Class: JSON::Stream::Buffer
- Inherits: Object
  - Object
    - JSON::Stream::Buffer
- Defined in: lib/json/stream/buffer.rb
Overview
A character buffer that expects a UTF-8 encoded stream of bytes. It handles truncated multi-byte characters correctly, so callers can feed it binary data and receive a well-formed UTF-8 String as output. For UTF-8 parsing details, see en.wikipedia.org/wiki/UTF-8 and tools.ietf.org/html/rfc3629#section-3.
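To illustrate the problem this class addresses, the following sketch uses only core Ruby String methods (it does not use the buffer itself). It splits a two-byte character across two hypothetical chunks, as a network read might, and shows that neither chunk is valid UTF-8 on its own. The sample text and variable names are assumptions chosen for the example.

```ruby
# "é" (U+00E9) is encoded in UTF-8 as the two bytes 0xC3 0xA9.
text  = "caf\u00E9"
bytes = text.b                    # binary (ASCII-8BIT) copy of the same 5 bytes

# Simulate a read boundary that falls in the middle of the last character.
first  = bytes.byteslice(0, 4)    # "caf" followed by 0xC3
second = bytes.byteslice(4, 1)    # 0xA9 alone

# Decoding each chunk independently yields invalid strings.
first_valid  = first.dup.force_encoding(Encoding::UTF_8).valid_encoding?
second_valid = second.dup.force_encoding(Encoding::UTF_8).valid_encoding?

# Only the rejoined bytes form the original, valid text.
joined = (first + second).force_encoding(Encoding::UTF_8)

puts first_valid    # false
puts second_valid   # false
puts joined == text # true
```

This is why a stream consumer has to hold back the trailing bytes of an incomplete character until the rest arrives.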
Instance Method Summary
- #<<(data) ⇒ Object
  Fill the buffer with a String of binary UTF-8 encoded bytes.
- #initialize ⇒ Buffer (constructor)
  A new instance of Buffer.
Constructor Details
#initialize ⇒ Buffer
Returns a new instance of Buffer.
    # File 'lib/json/stream/buffer.rb', line 13

    def initialize
      @state, @buf, @need = :start, [], 0
    end
Instance Method Details
#<<(data) ⇒ Object
Fill the buffer with a String of binary UTF-8 encoded bytes. Returns as much of the data as is currently available, as a UTF-8 String. Truncated multi-byte characters are saved in the buffer until the next call to this method, which is expected to supply the rest of the character.
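The contract described here (return complete characters now, hold back a truncated tail for the next call) can be sketched with a small stand-in class. `ChunkDecoder` is a hypothetical name and is not part of the library. It also uses a simpler technique than the documented method: it retries `valid_encoding?` while holding back up to three trailing bytes, so it reports malformed input later than a byte-by-byte state machine would.

```ruby
# Hypothetical stand-in that mimics the documented contract of #<<.
# It is NOT the library's implementation.
class ChunkDecoder
  def initialize
    @pending = ''.b   # binary bytes not yet returned to the caller
  end

  def <<(data)
    @pending << data.b
    # A truncated UTF-8 character is at most 3 bytes long, so try holding
    # back 0..3 trailing bytes until the remaining prefix is valid.
    (0..[3, @pending.bytesize].min).each do |keep|
      cut  = @pending.bytesize - keep
      head = @pending.byteslice(0, cut).force_encoding(Encoding::UTF_8)
      next unless head.valid_encoding?
      @pending = @pending.byteslice(cut, keep)
      return head
    end
    raise ArgumentError, 'Invalid UTF-8 byte sequence'
  end
end

euro    = "\u20AC".b   # "€" is the three bytes 0xE2 0x82 0xAC
decoder = ChunkDecoder.new

out1 = decoder << ('a'.b + euro.byteslice(0, 1))  # "a"; 0xE2 is held back
out2 = decoder << euro.byteslice(1, 1)            # "";  still incomplete
out3 = decoder << euro.byteslice(2, 1)            # "€"; character completed
```

Each call returns a valid UTF-8 String, which may be empty while a character is still incomplete.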
    # File 'lib/json/stream/buffer.rb', line 21

    def <<(data)
      bytes = []
      data.bytes.each do |b|
        case @state
        when :start
          if b < 128
            bytes << b
          elsif b >= 192
            @state = :multi_byte
            @buf << b
            @need = case
              when b >= 240 then 4
              when b >= 224 then 3
              when b >= 192 then 2 end
          else
            error('Expected start of multi-byte or single byte char')
          end
        when :multi_byte
          if b > 127 && b < 192
            @buf << b
            if @buf.size == @need
              bytes += @buf.slice!(0, @buf.size)
              @state = :start
            end
          else
            error('Expected continuation byte')
          end
        end
      end
      bytes.pack('C*').force_encoding(Encoding::UTF_8).tap do |str|
        error('Invalid UTF-8 byte sequence') unless str.valid_encoding?
      end
    end
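The byte classification in the listing above can be isolated as two small helpers. This is a sketch for illustration: `sequence_length` and `continuation?` are hypothetical names that do not exist in the library, and they reuse the same numeric thresholds as the `:start` and `:multi_byte` branches.

```ruby
# Hypothetical helper: number of bytes in the character that starts with
# this byte, or nil when the byte is a continuation byte (128..191).
def sequence_length(byte)
  if    byte < 128  then 1   # single-byte (ASCII) character
  elsif byte >= 240 then 4
  elsif byte >= 224 then 3
  elsif byte >= 192 then 2
  end
end

# Hypothetical helper: true for bytes that may only appear after a lead byte.
def continuation?(byte)
  byte > 127 && byte < 192
end

euro = "\u20AC".bytes                 # [226, 130, 172]
puts sequence_length(euro.first)      # 3
puts euro[1..].all? { |b| continuation?(b) }  # true
```

As in the listing, lead bytes of 248 and above are still classified as starting a 4-byte sequence. The listing rejects them later, through the final `valid_encoding?` check.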