Class: JSON::Stream::Buffer

Inherits:
Object
Defined in:
lib/json/stream/buffer.rb

Overview

A character buffer that expects a UTF-8 encoded stream of bytes. It handles multi-byte characters that are truncated at a chunk boundary, so callers can feed it arbitrary chunks of binary data and receive properly encoded UTF-8 Strings as output. For UTF-8 parsing details, see en.wikipedia.org/wiki/UTF-8 and tools.ietf.org/html/rfc3629#section-3.
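The problem the buffer solves is easy to reproduce in plain Ruby. The sketch below (illustrative variable names, no json-stream dependency) splits the UTF-8 bytes of "héllo" in the middle of the two-byte "é" (0xC3 0xA9) and forces each chunk to UTF-8 on its own:

```ruby
# The UTF-8 bytes of "héllo" as a binary (ASCII-8BIT) string.
data = "héllo".b

# Split after two bytes, which cuts "é" (0xC3 0xA9) in half.
first  = data.byteslice(0, 2)                   # bytes 0x68 0xC3
second = data.byteslice(2, data.bytesize - 2)   # bytes 0xA9 0x6C 0x6C 0x6F

# Decoding each chunk independently yields two invalid UTF-8 strings.
[first, second].each do |chunk|
  str = chunk.dup.force_encoding(Encoding::UTF_8)
  puts "#{chunk.bytes.inspect} valid? #{str.valid_encoding?}"
end
```

Fed the same two chunks, Buffer#<< returns "h" and then "éllo": the lone 0xC3 is held in the buffer until its continuation byte arrives in the next call.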

Instance Method Summary

Constructor Details

#initialize ⇒ Buffer

Returns a new instance of Buffer.



# File 'lib/json/stream/buffer.rb', line 13

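def initialize
  # @state: :start between characters, :multi_byte inside one
  # @buf:   bytes of a partially received multi-byte character
  # @need:  total byte length of the character being collected in @buf
  @state, @buf, @need = :start, [], 0
end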

Instance Method Details

#<<(data) ⇒ Object

Fill the buffer with a String of binary UTF-8 encoded bytes. Returns all complete characters received so far as a UTF-8 String. A multi-byte character truncated at the end of the chunk is saved in the buffer until the next call to this method, which is expected to supply the rest of its bytes.
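The same contract (return complete characters, hold back a truncated one) can also be met without a per-byte state machine. The sketch below is an alternative technique, not json-stream's: a hypothetical split_complete helper scans back at most four bytes from the end of a chunk for the last lead byte, and holds back the trailing bytes when there are fewer of them than that lead byte announces. Unlike Buffer#<<, it validates nothing, so a valid_encoding? check would still be needed.

```ruby
# Hypothetical helper (not part of json-stream). Splits a binary chunk into
# [complete, pending]: the bytes that end on a character boundary, and the
# bytes of a truncated multi-byte character at the end of the chunk.
# No validation is done; invalid sequences are passed through unchanged.
def split_complete(chunk)
  bytes = chunk.b
  size = bytes.bytesize
  # A UTF-8 character is at most 4 bytes, so look back at most 4 bytes.
  (1..[4, size].min).each do |back|
    b = bytes.getbyte(size - back)
    next if b.between?(0x80, 0xBF) # continuation byte: keep scanning back

    # Sequence length announced by this byte (same thresholds as Buffer#<<).
    need = if b >= 0xF0 then 4 elsif b >= 0xE0 then 3 elsif b >= 0xC0 then 2 else 1 end
    if back < need
      cut = size - back
      return [bytes.byteslice(0, cut), bytes.byteslice(cut, back)]
    end
    break # the trailing character is not truncated
  end
  [bytes, "".b]
end

# Usage: carry the pending bytes into the next chunk.
pending = "".b
["h\xC3".b, "\xA9llo".b].each do |chunk|
  complete, pending = split_complete(pending + chunk)
  puts complete.force_encoding(Encoding::UTF_8) # prints "h", then "éllo"
end
```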



# File 'lib/json/stream/buffer.rb', line 21

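def <<(data)
  bytes = [] # complete characters decoded from this chunk
  data.bytes.each do |b|
    case @state
    when :start
      # Between characters: expect an ASCII byte or a multi-byte lead byte.
      if b < 128
        bytes << b
      elsif b >= 192
        @state = :multi_byte
        @buf << b
        # Total sequence length announced by the lead byte.
        @need = case
          when b >= 240 then 4
          when b >= 224 then 3
          when b >= 192 then 2 end
      else
        error('Expected start of multi-byte or single byte char')
      end
    when :multi_byte
      # Inside a multi-byte character: only continuation bytes (10xxxxxx) are valid.
      if b > 127 && b < 192
        @buf << b
        if @buf.size == @need
          # Character complete: move its bytes from the buffer to the output.
          bytes += @buf.slice!(0, @buf.size)
          @state = :start
        end
      else
        error('Expected continuation byte')
      end
    end
  end
  # An incomplete trailing character stays in @buf for the next call.
  bytes.pack('C*').force_encoding(Encoding::UTF_8).tap do |str|
    error('Invalid UTF-8 byte sequence') unless str.valid_encoding?
  end
end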
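The decimal thresholds in the code above (128, 192, 224, 240) are the UTF-8 bit-pattern boundaries from RFC 3629, section 3. The sketch below uses a hypothetical classify helper that maps each byte the same way Buffer#<< does, written with hex ranges so the bit patterns are visible:

```ruby
# Hypothetical helper: classify a byte using the same boundaries as Buffer#<<.
def classify(b)
  case b
  when 0x00..0x7F then :single       # 0xxxxxxx, below 128
  when 0x80..0xBF then :continuation # 10xxxxxx, 128..191
  when 0xC0..0xDF then :lead2        # 110xxxxx, 192..223
  when 0xE0..0xEF then :lead3        # 1110xxxx, 224..239
  else :lead4                        # 11110xxx, 240 and up (Buffer#<< treats 0xF8..0xFF the same way)
  end
end

"aé€😀".bytes.each { |b| puts format("0x%02X %08b %s", b, b, classify(b)) }
```

Because these thresholds alone accept bytes that RFC 3629 forbids, such as 0xC0, 0xC1 and 0xF5 to 0xFF, it is the final valid_encoding? check in Buffer#<< that rejects them.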