Module: MungingUtils

Extended by:
MungingUtils
Included in:
MungingUtils
Defined in:
lib/wu/munging.rb

Constant Summary

NON_PLAIN_ASCII_RE =

all non-keyboard characters (that is, characters outside the 0x20 to 0x7e range)

/[^\x20-\x7e]/m
CONTROL_CHARS_RE =

characters below 0x20

/[\x00-\x1f]/m
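
To see exactly which ASCII code points the two patterns cover, the short script below enumerates 0x00 through 0x7f. It assumes the intended upper bound for CONTROL_CHARS_RE is 0x1f (the last code point below 0x20) and compares that with the /[\x00-\x19]/ form, which stops at 0x19 and so misses 0x1a through 0x1f, including ESC (0x1b). The helper matched_codes is hypothetical, for illustration only.

```ruby
# Copies of the two patterns; CONTROL_CHARS_RE uses 0x1f as its upper bound.
NON_PLAIN_ASCII_RE = /[^\x20-\x7e]/m
CONTROL_CHARS_RE   = /[\x00-\x1f]/m

# All 128 ASCII code points as one-character strings.
ASCII_CHARS = (0..127).map(&:chr)

# Hypothetical helper: the code points a pattern matches.
def matched_codes(re)
  ASCII_CHARS.select { |c| c.match?(re) }.map(&:ord)
end

non_plain = matched_codes(NON_PLAIN_ASCII_RE)  # 0x00-0x1f plus DEL (0x7f)
control   = matched_codes(CONTROL_CHARS_RE)    # 0x00-0x1f only
legacy    = matched_codes(/[\x00-\x19]/m)      # stops at 0x19

puts "NON_PLAIN_ASCII_RE: #{non_plain.size} codes"
puts "CONTROL_CHARS_RE:   #{control.size} codes"
puts "[\\x00-\\x19]:        #{legacy.size} codes (misses 0x1a-0x1f, e.g. ESC)"
```

Note that NON_PLAIN_ASCII_RE is a superset of the control-character range: it also catches DEL and every non-ASCII character.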

Instance Method Details

#safe_json_encode(string) ⇒ Object

Returns a JSON-encoded string in which every non-ASCII character is escaped as a \u sequence.

# File 'lib/wu/munging.rb', line 52

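def safe_json_encode(string)
  jsonized = MultiJson.encode(string)
  unless jsonized.ascii_only?
    # JSON \u escapes are UTF-16 code units, so characters above U+FFFF
    # must be written as a surrogate pair rather than one long escape.
    jsonized.gsub!(NON_PLAIN_ASCII_RE) do |ch|
      ch.encode('UTF-16BE').unpack('n*').map { |unit| "\\u%04x" % unit }.join
    end
  end
  jsonized
end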
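
A minimal, self-contained sketch of the same escaping idea, assuming Ruby's standard json library in place of MultiJson (a substitution for this example only). JSON's \u escapes are UTF-16 code units, so a character above U+FFFF has to be written as a surrogate pair; formatting it as "\\u%04x" % ch.ord would produce \u1f600 for U+1F600, which a JSON parser decodes as U+1F60 followed by a literal 0.

```ruby
require 'json'

# Same pattern as NON_PLAIN_ASCII_RE: anything outside 0x20..0x7e.
NON_PLAIN_ASCII_RE = /[^\x20-\x7e]/m

# Sketch only: stdlib JSON stands in for MultiJson here.
def safe_json_encode(obj)
  jsonized = JSON.generate(obj)
  return jsonized if jsonized.ascii_only?

  jsonized.gsub(NON_PLAIN_ASCII_RE) do |ch|
    # Encode as UTF-16 and emit one \uXXXX escape per 16-bit code unit,
    # which yields a surrogate pair for characters above U+FFFF.
    ch.encode('UTF-16BE').unpack('n*').map { |unit| format('\\u%04x', unit) }.join
  end
end

puts safe_json_encode("caf\u00e9")          # BMP character: one escape
puts safe_json_encode(["smile \u{1F600}"])  # astral character: surrogate pair
```

Because the output is pure ASCII, it survives transports or stores that mangle UTF-8, and any conforming JSON parser decodes it back to the original characters.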

#safe_xml_encode(text) ⇒ Object

For follow-on escaping of XML-encoded text.

Modifies the text in place, replacing all non-keyboard characters (newline, tab, and anything not between ASCII 0x20 and 0x7e) with XML numeric character references.

NOTE: does NOT escape the ampersand character.
NOTE: modifies the text in place.

# File 'lib/wu/munging.rb', line 46

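def safe_xml_encode(text)
  # Hexadecimal character references need the "x" marker (&#x...;).
  # Escape unconditionally: tab and newline are ASCII but must be encoded too.
  text.gsub!(NON_PLAIN_ASCII_RE){|ch| "&#x%04x;" % ch.ord }
  text
end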
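
A sketch of safe_xml_encode with two choices that are assumptions about the intent: it emits the hexadecimal form &#x...; (a bare &# prefix denotes a decimal reference, so hex digits need the x marker), and it escapes unconditionally, since tab and newline are ASCII and would slip past an ascii_only? check. The input is assumed to be XML-encoded already, which is why &amp; passes through untouched.

```ruby
# Same pattern as NON_PLAIN_ASCII_RE: anything outside 0x20..0x7e.
NON_PLAIN_ASCII_RE = /[^\x20-\x7e]/m

# Sketch: hexadecimal character references, applied unconditionally.
# Still modifies text in place and still leaves "&", "<" and ">" alone.
def safe_xml_encode(text)
  text.gsub!(NON_PLAIN_ASCII_RE) { |ch| format('&#x%04x;', ch.ord) }
  text
end

# Input is assumed to be XML-encoded already, hence the literal &amp;.
escaped = safe_xml_encode(+"Ben &amp; Jerry\u2019s\tshop")
puts escaped
```

One caveat: XML 1.0 permits only tab (0x09), newline (0x0a) and carriage return (0x0d) among the C0 control characters, even when written as character references, so text that may contain other control characters could be passed through scrub_control_chars first.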

#scrub_control_chars(text) ⇒ Object

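Modifies the text in place, replacing each newline, tab, and other control character (anything below ASCII 0x20) with a space. DEL (0x7f) and every character above it, including 0xff, are left unchanged. The pattern is a blacklist: only the listed control characters are replaced.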

Only use this if funny characters aren't supposed to be there in the first place; there are safe, easy ways to encode them properly, e.g. MultiJson.encode().

# File 'lib/wu/munging.rb', line 33

def scrub_control_chars(text)
  text.gsub!(CONTROL_CHARS_RE, ' ')
  text
end
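
The example below exercises scrub_control_chars with the pattern written as /[\x00-\x1f]/, on the assumption that "below 0x20" means all 32 C0 control characters. It uses a log line with a tab, a CRLF and an ANSI escape sequence; each control character becomes exactly one space, so the string's character length does not change.

```ruby
# Pattern with 0x1f as the upper bound (assumed intent of "below 0x20").
CONTROL_CHARS_RE = /[\x00-\x1f]/m

def scrub_control_chars(text)
  text.gsub!(CONTROL_CHARS_RE, ' ')  # one space per control character
  text
end

# Tab, CR, LF and ESC (0x1b, the start of an ANSI color code).
line = +"name\tvalue\r\n\e[0m"
scrub_control_chars(line)
p line                               # modified in place

# DEL (0x7f) and non-ASCII characters pass through unchanged.
p scrub_control_chars(+"caf\u00e9\x7f")
```

The printable remainder of the escape sequence ("[0m") survives, which is the expected trade-off of scrubbing rather than properly decoding the input.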

#time_columns_from_time(time) ⇒ Object
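
Returns four columns describing time: the date as a YYYYMMDD string, the time of day as an HHMMSS string, the Unix epoch seconds (Integer), and the day of the week (Integer, 0 = Sunday). The date and time fields are read in time's own zone; no conversion is applied.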

# File 'lib/wu/munging.rb', line 11

def time_columns_from_time(time)
  columns = []
  columns << "%04d%02d%02d" % [time.year, time.month, time.day]
  columns << "%02d%02d%02d" % [time.hour, time.min, time.sec]
  columns << time.to_i
  columns << time.wday
  return columns
end
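
A usage sketch for time_columns_from_time. The method is copied as-is; time_columns_via_strftime is a hypothetical equivalent written with Time#strftime, shown only to make the column formats explicit. Because the fields are read from the Time object as given, calling .utc first gives zone-independent columns.

```ruby
# Copied from lib/wu/munging.rb.
def time_columns_from_time(time)
  columns = []
  columns << "%04d%02d%02d" % [time.year, time.month, time.day]
  columns << "%02d%02d%02d" % [time.hour, time.min, time.sec]
  columns << time.to_i
  columns << time.wday
  return columns
end

# Hypothetical equivalent using Time#strftime (illustration only).
def time_columns_via_strftime(time)
  [time.strftime('%Y%m%d'), time.strftime('%H%M%S'), time.to_i, time.wday]
end

t = Time.utc(2013, 3, 5, 14, 7, 9)  # a Tuesday, so wday is 2
p time_columns_from_time(t)
p time_columns_via_strftime(t) == time_columns_from_time(t)
```

The zero-padded string columns sort lexically in chronological order, which is convenient when they are used as keys.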

#warn_record(desc, record = nil) ⇒ Object
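
Logs a warning via Log.warn consisting of desc and a JSON rendering of record, joined by a tab. The JSON is cut to its first 1001 characters ([0..1000]); if record cannot be encoded, the first 101 characters of record.inspect are logged instead, prefixed with "(unencodeable record)". Always returns nil.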

# File 'lib/wu/munging.rb', line 20

def warn_record(desc, record=nil)
  record_info = MultiJson.encode(record)[0..1000] rescue "(unencodeable record) #{record.inspect[0..100]}"
  Log.warn [desc, record_info].join("\t")
  nil
end
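
A runnable sketch of warn_record's behavior, assuming a stdlib Logger writing to a StringIO in place of the module's Log object, and stdlib JSON in place of MultiJson (both substitutions are for illustration only). The rescue here is narrowed to JSON::GeneratorError, which the json library raises for values such as NaN; the original's rescue modifier catches any StandardError.

```ruby
require 'json'
require 'logger'
require 'stringio'

# Assumption: a stdlib Logger stands in for the module's Log object.
LOG_IO = StringIO.new
Log = Logger.new(LOG_IO)

def warn_record(desc, record = nil)
  record_info =
    begin
      JSON.generate(record)[0..1000]  # keep at most 1001 characters
    rescue JSON::GeneratorError
      "(unencodeable record) #{record.inspect[0..100]}"
    end
  Log.warn([desc, record_info].join("\t"))
  nil
end

warn_record('bad row', { id: 1 })             # encodable record
warn_record('bad row', { score: Float::NAN })  # NaN is not valid JSON
puts LOG_IO.string
```

Narrowing the rescue keeps genuine bugs (for example a NoMethodError raised inside a custom to_json) from being silently turned into log lines, at the cost of no longer catching every failure the original does.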