Module: MungingUtils
Constant Summary
- NON_PLAIN_ASCII_RE =
all non-keyboard characters (that is, characters outside the 0x20 to 0x7e range)
/[^\x20-\x7e]/m
- CONTROL_CHARS_RE =
characters below 0x20
/[\x00-\x1f]/m
Instance Method Summary
- #safe_json_encode(string) ⇒ Object
Returns a JSON-encoded string, with all non-ASCII characters escaped.
- #safe_xml_encode(text) ⇒ Object
For follow-on escaping of XML-encoded text.
- #scrub_control_chars(text) ⇒ Object
Modifies the text in place, replacing all newlines, tabs, and other control characters (those below ASCII 0x20, but not including 0xff) with a space.
- #time_columns_from_time(time) ⇒ Object
- #warn_record(desc, record = nil) ⇒ Object
Instance Method Details
#safe_json_encode(string) ⇒ Object
Returns a JSON-encoded string, with all non-ASCII characters escaped.
# File 'lib/wu/munging.rb', line 52
def safe_json_encode(string)
  jsonized = MultiJson.encode(string)
  jsonized.gsub!(NON_PLAIN_ASCII_RE){|ch| "\\u%04x" % ch.ord } unless jsonized.ascii_only?
  jsonized
end
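A minimal usage sketch of the escaping step follows. It is a standalone copy rather than the library's code: the stdlib JSON.generate stands in for MultiJson.encode (an assumption made only to avoid a gem dependency), and the sample hash is made up.

```ruby
require 'json'

# Same pattern as NON_PLAIN_ASCII_RE: anything outside 0x20..0x7e.
NON_PLAIN_ASCII_RE = /[^\x20-\x7e]/m

# Standalone copy of the method; JSON.generate is a stand-in (assumption)
# for MultiJson.encode.
def safe_json_encode(string)
  jsonized = JSON.generate(string)
  # Rewrite each remaining non-ASCII character as a \uXXXX escape.
  jsonized.gsub!(NON_PLAIN_ASCII_RE) { |ch| "\\u%04x" % ch.ord } unless jsonized.ascii_only?
  jsonized
end

encoded = safe_json_encode({ "name" => "café" })
puts encoded                      # {"name":"caf\u00e9"}
puts JSON.parse(encoded)["name"]  # café
```

The output is plain ASCII, yet any JSON parser restores the original characters. Note that "%04x" yields more than four digits for code points above U+FFFF, so this sketch is only safe for characters in the Basic Multilingual Plane.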
#safe_xml_encode(text) ⇒ Object
For follow-on escaping of XML-encoded text.
Modifies the text in place, replacing all non-keyboard characters (newline, tab, anything not between ASCII 0x20 and 0x7e) with their XML entity encoding.
NOTE: does NOT escape the ampersand character.
NOTE: modifies the text in place.
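# File 'lib/wu/munging.rb', line 46
def safe_xml_encode(text)
  # The listing guarded this with `unless jsonized.ascii_only?`, but `jsonized`
  # is undefined in this method (NameError), and an ascii_only? check would
  # skip the newline and tab that the description says are escaped.
  # Hexadecimal character references need the "x" prefix: &#x000a;
  text.gsub!(NON_PLAIN_ASCII_RE){|ch| "&#x%04x;" % ch.ord }
  text
end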
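The behaviour described above (escape everything outside 0x20..0x7e, leave the ampersand alone, mutate the argument) can be sketched as a standalone method. The name xml_escape_non_keyboard is hypothetical, and two choices are assumptions of this sketch rather than statements about the library source: references are written in hexadecimal form (`&#x...;`), and the substitution runs unconditionally.

```ruby
# Same pattern as NON_PLAIN_ASCII_RE: anything outside 0x20..0x7e.
NON_PLAIN_ASCII_RE = /[^\x20-\x7e]/m

# Hypothetical helper name; mutates and returns its argument.
def xml_escape_non_keyboard(text)
  # gsub! returns nil when nothing matched, so return text explicitly.
  text.gsub!(NON_PLAIN_ASCII_RE) { |ch| "&#x%04x;" % ch.ord }
  text
end

line = "a\tb & caf\u00e9".dup   # dup so the literal is safe to mutate
xml_escape_non_keyboard(line)
puts line   # a&#x0009;b & caf&#x00e9;
```

The ampersand passes through untouched, which is what makes the method suitable only as a follow-on step for text whose markup characters have already been XML-encoded.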
#scrub_control_chars(text) ⇒ Object
Modifies the text in place, replacing all newlines, tabs, and other control characters (those below ASCII 0x20, but not including 0xff) with a space. This uses a whitelist.
Only use this if funny characters aren't supposed to be in there in the first place; there are safe, easy ways to properly encode, e.g. MultiJson.encode().
# File 'lib/wu/munging.rb', line 33
def scrub_control_chars(text)
  text.gsub!(CONTROL_CHARS_RE, ' ')
  text
end
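As a rough illustration of the in-place scrub, here is a standalone sketch. It assumes the pattern covers the full range 0x00 to 0x1f, matching the description "characters below 0x20"; the sample string is made up.

```ruby
# Assumed range: every character below 0x20.
CONTROL_CHARS_RE = /[\x00-\x1f]/m

def scrub_control_chars(text)
  # gsub! returns nil when nothing matched, so return text explicitly.
  text.gsub!(CONTROL_CHARS_RE, ' ')
  text
end

raw = "id\t42\nname\x1fzed".dup   # dup so the literal is safe to mutate
scrub_control_chars(raw)
puts raw   # id 42 name zed
```

Each control character becomes exactly one space, so the string keeps its length and field boundaries in tab-separated data are lost. That is why this is a last resort for data that should never have contained such characters.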
#time_columns_from_time(time) ⇒ Object
# File 'lib/wu/munging.rb', line 11
def time_columns_from_time(time)
  columns = []
  columns << "%04d%02d%02d" % [time.year, time.month, time.day]
  columns << "%02d%02d%02d" % [time.hour, time.min, time.sec]
  columns << time.to_i
  columns << time.wday
  return columns
end
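The method has no description, so a short sketch of what it returns may help: a four-element array holding the date as YYYYMMDD, the time as HHMMSS, the Unix epoch seconds, and the day of the week (0 is Sunday). The copy below is condensed from the listing, and the sample timestamp is arbitrary.

```ruby
# Condensed standalone copy of the listing above.
def time_columns_from_time(time)
  [
    "%04d%02d%02d" % [time.year, time.month, time.day],  # date, YYYYMMDD
    "%02d%02d%02d" % [time.hour, time.min, time.sec],    # time, HHMMSS
    time.to_i,                                           # Unix epoch seconds
    time.wday                                            # 0 = Sunday
  ]
end

t = Time.utc(2012, 3, 4, 5, 6, 7)
p time_columns_from_time(t)
# ["20120304", "050607", 1330837567, 0]
```

The date and time strings reflect whatever zone the Time object carries, while the epoch value is zone-independent, so passing a UTC time keeps all four columns consistent.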
#warn_record(desc, record = nil) ⇒ Object
# File 'lib/wu/munging.rb', line 20
def warn_record(desc, record=nil)
  record_info = MultiJson.encode(record)[0..1000] rescue "(unencodeable record) #{record.inspect[0..100]}"
  Log.warn [desc, record_info].join("\t")
  nil
end
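The fallback path is easier to see in a runnable form. The sketch below is a standalone approximation, not the library's setup: a stdlib Logger writing to a StringIO stands in for the Log object, JSON.generate stands in for MultiJson.encode, and the sample records are invented. A NaN value is used because the stdlib generator refuses to encode it.

```ruby
require 'json'
require 'logger'
require 'stringio'

LOG_OUTPUT = StringIO.new
# Stand-in (assumption) for the library's Log object.
Log = Logger.new(LOG_OUTPUT)
Log.formatter = proc { |_severity, _time, _progname, msg| "#{msg}\n" }

def warn_record(desc, record = nil)
  record_info =
    begin
      JSON.generate(record)[0..1000]   # truncate very large records
    rescue StandardError
      "(unencodeable record) #{record.inspect[0..100]}"
    end
  Log.warn [desc, record_info].join("\t")
  nil
end

warn_record("bad date", { "id" => 7 })
warn_record("bad number", { "x" => Float::NAN })   # NaN cannot be encoded
puts LOG_OUTPUT.string
```

The first call logs the description and the JSON text separated by a tab; the second falls back to the truncated inspect output prefixed with "(unencodeable record)". Both calls return nil, so the method can end a processing branch without emitting a value.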