Class: JSON::Stream::Buffer

Inherits:
Object
Defined in:
lib/json/stream/buffer.rb

Overview

A character buffer for a UTF-8 encoded stream of bytes. It handles multi-byte characters that are truncated at a chunk boundary, so callers can feed it binary data in arbitrary chunks and receive valid UTF-8 Strings as output.

More UTF-8 parsing details are available at:

http://en.wikipedia.org/wiki/UTF-8
http://tools.ietf.org/html/rfc3629#section-3
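
To illustrate the problem this class addresses, here is a minimal sketch in plain Ruby that does not use the library at all. It splits a UTF-8 string at a byte offset that falls inside a two-byte character; the variable names are illustrative only.

```ruby
text = "caf\u00E9"                  # "café": 4 characters, 5 bytes
raw  = text.b                       # binary (ASCII-8BIT) copy of the same bytes

# Split between the lead byte (0xC3) and the continuation byte (0xA9) of "é".
first  = raw.byteslice(0, 4)
second = raw.byteslice(4, 1)

# Neither fragment is valid UTF-8 on its own.
first_valid  = first.dup.force_encoding(Encoding::UTF_8).valid_encoding?
second_valid = second.dup.force_encoding(Encoding::UTF_8).valid_encoding?

# Only once the bytes are rejoined does the text decode cleanly.
joined = (first + second).force_encoding(Encoding::UTF_8)
```

Here `first_valid` and `second_valid` are both false, while `joined` equals the original text. A streaming reader therefore has to hold back the trailing lead byte until the rest of the character arrives.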


Constructor Details

#initialize ⇒ Buffer

Returns a new instance of Buffer.



# File 'lib/json/stream/buffer.rb', line 15

def initialize
  @state = :start
  @buffer = []
  @need = 0
end

Instance Method Details

#<<(data) ⇒ Object

Fill the buffer with a String of binary UTF-8 encoded bytes. Returns as much of the data in a UTF-8 String as we have. Truncated multi-byte characters are saved in the buffer until the next call to this method where we expect to receive the rest of the multi-byte character.

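data - A binary String holding the next chunk of UTF-8 encoded bytes. When no partial character is pending, it is re-tagged in place as UTF-8 with force_encoding, so pass an unfrozen String.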

Raises JSON::Stream::ParserError if the UTF-8 byte sequence is malformed.

Returns a UTF-8 encoded String.



# File 'lib/json/stream/buffer.rb', line 31

def <<(data)
  # Avoid state machine for complete UTF-8.
  if @buffer.empty?
    data.force_encoding(Encoding::UTF_8)
    return data if data.valid_encoding?
  end

  bytes = []
  data.each_byte do |byte|
    case @state
    when :start
      if byte < 128
        bytes << byte
      elsif byte >= 192
        @state = :multi_byte
        @buffer << byte
        @need =
          case
          when byte >= 240 then 4
          when byte >= 224 then 3
          when byte >= 192 then 2
          end
      else
        error('Expected start of multi-byte or single byte char')
      end
    when :multi_byte
      if byte > 127 && byte < 192
        @buffer << byte
        if @buffer.size == @need
          bytes += @buffer.slice!(0, @buffer.size)
          @state = :start
        end
      else
        error('Expected continuation byte')
      end
    end
  end

  # Build UTF-8 encoded string from completed codepoints.
  bytes.pack('C*').force_encoding(Encoding::UTF_8).tap do |text|
    error('Invalid UTF-8 byte sequence') unless text.valid_encoding?
  end
end
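
The lead-byte thresholds used in the :start branch above can be isolated into a small function. The sketch below is illustrative only: `sequence_length` is a hypothetical helper name and is not part of JSON::Stream::Buffer.

```ruby
# Hypothetical helper: how many bytes a character occupies, judged from its
# first byte, using the same thresholds as the :start branch.
# Returns nil for continuation bytes (128..191), which cannot start a character.
def sequence_length(byte)
  if byte < 128 then 1
  elsif byte >= 240 then 4
  elsif byte >= 224 then 3
  elsif byte >= 192 then 2
  end
end

# One sample character per sequence length: "a", "é", "€", and an emoji.
samples = ["a", "\u00E9", "\u20AC", "\u{1F600}"]
lengths = samples.map { |ch| sequence_length(ch.bytes.first) }
```

For these samples `lengths` is `[1, 2, 3, 4]`. Like the method above, this classification does not reject lead bytes that can never appear in valid UTF-8 (such as 0xC0 or 0xF8); the method relies on its final valid_encoding? check to catch those.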

#empty? ⇒ Boolean

Determine whether the buffer is empty, that is, whether it holds no bytes of a partially received multi-byte character that is still waiting on its remaining continuation bytes before a full codepoint is formed.

Examples

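buffer = JSON::Stream::Buffer.new
# Use a binary String rather than "é".bytes: #<< expects String chunks,
# and indexing a binary String yields one-byte Strings.
bytes = "é".b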

buffer << bytes[0]
buffer.empty?
# => false

buffer << bytes[1]
buffer.empty?
# => true

Returns true if the buffer is empty.

Returns:

  • (Boolean)


# File 'lib/json/stream/buffer.rb', line 92

def empty?
  @buffer.empty?
end
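
One use of an emptiness check like this is detecting input that ends in the middle of a character. The sketch below is self-contained and does not load the library: `TinyUtf8Buffer` is a hypothetical stand-in with the same `<<` and `empty?` interface. It uses a different, simpler technique (holding back up to three trailing bytes until the prefix is valid) rather than the state machine documented above.

```ruby
# Hypothetical stand-in, NOT the library's implementation.
class TinyUtf8Buffer
  def initialize
    @pending = "".b # bytes of an incomplete trailing character
  end

  # Returns the longest valid UTF-8 prefix, holding back at most 3 bytes.
  def <<(chunk)
    data = @pending + chunk.b
    (0..[3, data.bytesize].min).each do |hold|
      keep = data.bytesize - hold
      head = data.byteslice(0, keep).force_encoding(Encoding::UTF_8)
      next unless head.valid_encoding?

      @pending = data.byteslice(keep, hold)
      return head
    end
    raise ArgumentError, 'Invalid UTF-8 byte sequence'
  end

  def empty?
    @pending.empty?
  end
end

buffer = TinyUtf8Buffer.new
first = buffer << "caf\xC3".b        # chunk ends with the lead byte of "é"
truncated = !buffer.empty?           # a partial character is pending
second = buffer << "\xA9".b          # the continuation byte arrives
complete = buffer.empty?             # nothing pending at end of input
```

If `empty?` were still false after the last chunk, the input stopped partway through a character, and a caller would likely want to treat that as an error.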