BinData

A declarative way to read and write structured binary data.

What is it for?

Do you ever find yourself writing code like this?

io = File.open(...)
len = io.read(2).unpack("v")[0]  # unpack returns an Array
name = io.read(len)
width, height = io.read(8).unpack("VV")
puts "Rectangle #{name} is #{width} x #{height}"

It’s ugly, violates DRY and feels like you’re writing Perl, not Ruby. There is a better way.

class Rectangle < BinData::MultiValue
  uint16le :len
  string   :name, :read_length => :len
  uint32le :width
  uint32le :height
end

io = File.open(...)
r = Rectangle.read(io)
puts "Rectangle #{r.name} is #{r.width} x #{r.height}"

BinData makes it easy to specify the structure of the data you are manipulating.

Read on for the tutorial, or go straight to the download page.

Syntax

BinData declarations are easy to read. Here’s an example.

class MyFancyFormat < BinData::MultiValue
  stringz :comment
  uint8   :count, :check_value => lambda { (value % 2) == 0 }
  array   :some_ints, :type => :int32be, :initial_length => :count
end

The structure of the data in this example is

  1. A zero terminated string

  2. An unsigned 8-bit integer which must be even

  3. A sequence of signed 32-bit integers in big endian form, the total number of which is determined by the value of the 8-bit integer.

The BinData declaration matches the English description closely. Just for fun, let’s look at how we’d implement this using #pack and #unpack. Here’s the writing code; have a go at the reading code.

comment = "this is a comment"
some_ints = [2, 3, 8, 9, 1, 8]
File.open(...) do |io|
  io.write([comment, some_ints.size, *some_ints].pack("Z*CN*"))
end
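For comparison, here is one possible answer to the reading exercise. It is a minimal sketch using only the Ruby standard library; read_fancy_format is a hypothetical helper name, and a StringIO stands in for the file so the example is self-contained.

```ruby
require "stringio"

# Hypothetical helper (not part of BinData): reads the MyFancyFormat
# layout by hand and returns [comment, some_ints].
def read_fancy_format(io)
  # stringz: read up to and including the terminating zero byte, then drop it
  comment = io.gets("\0").chomp("\0")

  # uint8 count, which must be even
  count = io.read(1).unpack("C")[0]
  raise ArgumentError, "count must be even, got #{count}" unless count.even?

  # count big endian 32-bit integers ("l>" is signed, matching int32be)
  some_ints = io.read(4 * count).unpack("l>*")

  [comment, some_ints]
end

# Round trip using the same bytes that the writing code produces.
bytes = ["this is a comment", 6, 2, 3, 8, 9, 1, 8].pack("Z*CN*")
comment, some_ints = read_fancy_format(StringIO.new(bytes))
puts comment            # prints "this is a comment"
puts some_ints.inspect  # prints "[2, 3, 8, 9, 1, 8]"
```

Even in this small case the reading code has to repeat the layout that the writing code already encoded in "Z*CN*", which is the duplication the declaration avoids.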

The general format of a BinData declaration is a class containing one or more fields.

class MyName < BinData::MultiValue
  type field_name, :param1 => "foo", :param2 => bar, ...
  ...
end

type is the name of a supplied type (e.g. uint32be, string) or a user defined type. For user defined types, convert the class name from CamelCase to lowercase underscore_style.
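The naming rule can be illustrated with a few lines of plain Ruby. The underscore helper below is hypothetical and written only for this illustration; BinData’s own conversion may differ in detail.

```ruby
# Hypothetical helper illustrating the CamelCase to underscore_style rule.
def underscore(class_name)
  # Insert "_" between a lowercase letter or digit and a following
  # uppercase letter, then lowercase the whole name.
  class_name.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
end

puts underscore("PascalString")   # prints "pascal_string"
puts underscore("MyFancyFormat")  # prints "my_fancy_format"
puts underscore("Uint32be")       # prints "uint32be"
```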

field_name is the name by which you can access the data. Use either a String or a Symbol. You may specify a name as nil, but this is described later in the tutorial.

Each field may have parameters for how to process the data. The parameters are passed as a Hash using Symbols for keys.

Handling dependencies between fields

A common occurrence in binary file formats is one field depending upon the value of another, e.g. a string preceded by its length.

As an example, let’s assume a Pascal style string where the byte preceding the string contains the string’s length.

# reading
io = File.open(...)
len = io.getbyte  # getc returns a String, not an Integer, on Ruby 1.9+
str = io.read(len)
puts "string is " + str

# writing
io = File.open(...)
str = "this is a string"
io.putc(str.length)
io.write(str)

Here’s how we’d implement the same example with BinData.

class PascalString < BinData::MultiValue
  uint8  :len,  :value => lambda { data.length }
  string :data, :read_length => :len
end

# reading
io = File.open(...)
ps = PascalString.new
ps.read(io)
puts "string is " + ps.data

# writing
io = File.open(...)
ps = PascalString.new
ps.data = "this is a string"
ps.write(io)

This syntax needs explaining. Let’s simplify by examining reading and writing separately.

class PascalStringReader < BinData::MultiValue
  uint8  :len
  string :data, :read_length => :len
end

This states that when reading the string, the initial length of the string (and hence the number of bytes to read) is determined by the value of the len field.

Note that :read_length => :len is syntactic sugar for :read_length => lambda { len }, but more on that later.

class PascalStringWriter < BinData::MultiValue
  uint8  :len, :value => lambda { data.length }
  string :data
end

This states that the value of len is always equal to the length of data. len may not be manually modified.

Combining these two definitions gives the definition for PascalString as previously defined.

One thing to note with dependencies is that a field can only depend on fields that come before it. You can’t have a string which has the characters first and the length afterwards.
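To make the byte layout concrete, here is a minimal standard-library sketch of the same Pascal string round trip. The helper names write_pascal_string and read_pascal_string are hypothetical, and a StringIO stands in for the file. Note how the reader must see the length byte before it knows how much data to read, which is why the dependency can only point backwards.

```ruby
require "stringio"

# Hypothetical helpers (not part of BinData).
def write_pascal_string(io, str)
  # A single length byte can only describe up to 255 bytes of data.
  raise ArgumentError, "string too long for a one-byte length" if str.bytesize > 255
  io.write([str.bytesize].pack("C"))  # length byte first
  io.write(str)                       # then the data itself
end

def read_pascal_string(io)
  len = io.read(1).unpack("C")[0]     # the length must be read first
  io.read(len)                        # so we know how many bytes follow
end

io = StringIO.new
write_pascal_string(io, "this is a string")
puts io.string.bytes.first   # prints 16, the length byte
io.rewind
puts read_pascal_string(io)  # prints "this is a string"
```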

Predefined Types

These are the predefined types. Custom types can be created by composing these types.

BinData::String

A sequence of bytes.

BinData::Stringz

A zero terminated sequence of bytes.

BinData::Array

A list of objects of the same type.

BinData::Choice

A choice between several objects.

BinData::Struct

An ordered collection of named objects.

BinData::Int8

Signed 8 bit integer.

BinData::Int16le

Signed 16 bit integer (little endian).

BinData::Int16be

Signed 16 bit integer (big endian).

BinData::Int32le

Signed 32 bit integer (little endian).

BinData::Int32be

Signed 32 bit integer (big endian).

BinData::Int64le

Signed 64 bit integer (little endian).

BinData::Int64be

Signed 64 bit integer (big endian).

BinData::Uint8

Unsigned 8 bit integer.

BinData::Uint16le

Unsigned 16 bit integer (little endian).

BinData::Uint16be

Unsigned 16 bit integer (big endian).

BinData::Uint32le

Unsigned 32 bit integer (little endian).

BinData::Uint32be

Unsigned 32 bit integer (big endian).

BinData::Uint64le

Unsigned 64 bit integer (little endian).

BinData::Uint64be

Unsigned 64 bit integer (big endian).

BinData::FloatLe

Single precision floating point number (little endian).

BinData::FloatBe

Single precision floating point number (big endian).

BinData::DoubleLe

Double precision floating point number (little endian).

BinData::DoubleBe

Double precision floating point number (big endian).

BinData::Rest

Consumes the rest of the input stream.
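For readers who know Array#pack, the numeric types correspond roughly to the pack directives below. This table is an illustration assembled for this tutorial, not something BinData exposes; the constant name PACK_DIRECTIVES and the underscore_style keys are assumptions.

```ruby
# Illustrative mapping only: numeric type name => Array#pack directive.
PACK_DIRECTIVES = {
  "int8"      => "c",  "uint8"     => "C",
  "int16le"   => "s<", "int16be"   => "s>",
  "uint16le"  => "v",  "uint16be"  => "n",
  "int32le"   => "l<", "int32be"   => "l>",
  "uint32le"  => "V",  "uint32be"  => "N",
  "int64le"   => "q<", "int64be"   => "q>",
  "uint64le"  => "Q<", "uint64be"  => "Q>",
  "float_le"  => "e",  "float_be"  => "g",
  "double_le" => "E",  "double_be" => "G"
}.freeze

p [1].pack(PACK_DIRECTIVES["uint16le"]).bytes    # prints [1, 0]
p [1].pack(PACK_DIRECTIVES["uint16be"]).bytes    # prints [0, 1]
p [-2].pack(PACK_DIRECTIVES["int32be"]).bytes    # prints [255, 255, 255, 254]
p [1.0].pack(PACK_DIRECTIVES["float_be"]).bytes  # prints [63, 128, 0, 0]
```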

Parameters

class PascalStringWriter < BinData::MultiValue
  uint8  :len, :value => lambda { data.length }
  string :data
end

Revisiting the Pascal string writer, we see that a field can take parameters. Parameters are passed as a Hash, where the key is a symbol. It should be noted that parameters are designed to be lazily evaluated, possibly multiple times. This means that any parameter value must not have side effects.

Here are some examples of legal values for parameters.

* :param => 5
* :param => lambda { 5 + 2 }
* :param => lambda { foo + 2 }
* :param => :foo

The simplest case is when the value is a literal value, such as 5.

If the value is not a literal, it is expected to be a lambda. The lambda will be evaluated in the context of the parent, in this case the parent is an instance of PascalStringWriter.

If the value is a symbol, it is taken as syntactic sugar for a lambda containing the value of the symbol, e.g. :param => :foo is equivalent to :param => lambda { foo }.
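The three kinds of parameter value can be sketched in plain Ruby as follows. This is only an illustration of the rules described above, not BinData’s actual implementation; eval_param and the Struct used as the parent are hypothetical.

```ruby
# Hypothetical helper: resolves a parameter value against a parent object.
def eval_param(value, parent)
  case value
  when Symbol then parent.send(value)            # :foo is sugar for lambda { foo }
  when Proc   then parent.instance_exec(&value)  # lambda runs in the parent's context
  else value                                     # literals are returned as they are
  end
end

parent = Struct.new(:foo).new(3)

puts eval_param(5, parent)                     # prints 5
puts eval_param(lambda { 5 + 2 }, parent)      # prints 7
puts eval_param(lambda { foo + 2 }, parent)    # prints 5
puts eval_param(:foo, parent)                  # prints 3

# Evaluation is lazy: the parameter is looked up again on each use,
# so it follows later changes to the parent.
parent.foo = 10
puts eval_param(:foo, parent)                  # prints 10
```

Because a parameter may be evaluated at any time and any number of times, a lambda with side effects would behave unpredictably.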

Saving Typing

The endianness of numeric types must be explicitly defined so that the code produced is independent of architecture. Explicitly specifying the endianness of each numeric type can become tedious, so the following shortcut is provided.

class A < BinData::MultiValue
  endian :little

  uint16   :a
  uint32   :b
  double   :c
  uint32be :d
  array    :e, :type => :int16
end

is equivalent to:

class A < BinData::MultiValue
  uint16le  :a
  uint32le  :b
  double_le :c
  uint32be  :d
  array     :e, :type => :int16le
end

Using the endian keyword improves the readability of the declaration as well as reducing the amount of typing necessary. Note that the endian keyword will cascade to nested types, as illustrated with the array in the above example.
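To see why byte order has to be stated somewhere, whether per field or once via endian, consider how the same 16-bit value is laid out in each order. This is a small illustration using Array#pack and String#unpack from the standard library rather than BinData itself.

```ruby
value = 0x0102

little = [value].pack("v")  # 16-bit unsigned, little endian
big    = [value].pack("n")  # 16-bit unsigned, big endian

p little.bytes  # prints [2, 1]
p big.bytes     # prints [1, 2]

# Reading little endian bytes as big endian silently gives the wrong number.
p little.unpack("n")[0]  # prints 513 (0x0201)
```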

Creating custom types

Custom types should be created by subclassing BinData::MultiValue or BinData::SingleValue. Occasionally it may be useful to subclass BinData::Single. Subclassing other classes may have unexpected results and is unsupported.

Let us revisit the Pascal String example.

class PascalString < BinData::MultiValue
  uint8  :len,  :value => lambda { data.length }
  string :data, :read_length => :len
end

We’d like to make PascalString a custom type that behaves like a BinData::Single object so we can use :initial_value etc. Here’s an example usage of what we’d like:

class Favourites < BinData::MultiValue
  pascal_string :language, :initial_value => "ruby"
  pascal_string :os,       :initial_value => "unix"
end

f = Favourites.new
f.os = "freebsd"
f.to_s #=> "\004ruby\007freebsd"
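The expected output is simply the two Pascal strings written one after the other, each with its length byte in front. As a quick check that does not rely on BinData, the same bytes can be built by hand; pascal_bytes is a hypothetical helper used only for this illustration.

```ruby
# Hypothetical helper: one length byte followed by the string's bytes.
def pascal_bytes(str)
  [str.bytesize, str].pack("Ca*")
end

encoded = pascal_bytes("ruby") + pascal_bytes("freebsd")
p encoded.bytes.first                  # prints 4, the length of "ruby"
p encoded == "\004ruby\007freebsd"     # prints true
```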

We create this type of custom string by inheriting from BinData::SingleValue and implementing the #get and #set methods.

class PascalString < BinData::SingleValue
  uint8  :len,  :value => lambda { data.length }
  string :data, :read_length => :len

  def get;   self.data; end
  def set(v) self.data = v; end
end

If the type we are creating represents a single value then inherit from BinData::SingleValue, otherwise inherit from BinData::MultiValue.

License

BinData is released under the same license as Ruby.

Copyright © 2007, 2008 Dion Mendel