Module: DataMetaAvro

Defined in:
lib/dataMetaAvro.rb

Overview

DataMetaDOM and Avro Schemas.

For command-line details, check either the source of the new method or the usage section of the README.

Constant Summary

VERSION =

Current version

'1.0.1'
GEM_ROOT =

The root of the gem.

File.realpath(File.dirname(__FILE__) + '/../')
TMPL_ROOT =

Location of templates.

File.join(GEM_ROOT, 'tmpl')
AVRO_TYPES =

Mapping from a DataMeta DOM type to a matching renderer of Avro schema JSON. Each lambda expects a whole DataMetaDom::Field instance and must return the whole specification that you would put under the "type": JSON tag, such as:

"int"

or, for a type with a size:

{ "type": "fixed", "name": "theFieldName", "size": 16}

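Note that wrapping this type into an optional specification, i.e. a union with "null", is not done by these renderers. In the source shown on this page, that wrapping is performed by the wrapReqOptional method, while avroType only looks up and calls the renderer.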

{
  DataMetaDom::BOOL => lambda { |dt| %q<"boolean"> },
  DataMetaDom::INT => lambda { |dt|
    len = dt.length
    case
      when len <= 4; %q<"int">
      when len <= 8; %q<"long">
      else; raise "Invalid integer length #{len}"
    end
  },
  DataMetaDom::FLOAT => lambda { |dt|
    len = dt.length
    case
      when len <= 4; %q<"float">
      when len <= 8; %q<"double">
      else; raise "Invalid float length #{len}"
    end
  },
  DataMetaDom::RAW => lambda { |dt| %q<"bytes"> },
  DataMetaDom::STRING => lambda { |dt| %q<"string"> },
  # Unlike DataMeta DOM, Avro does not support temporal types such as date, time and datetime;
  # they have a ticket filed for it but no idea when it is going to be implemented.
  # They use {integral types}[http://avro.apache.org/docs/current/spec.html#Time+%28millisecond+precision%29]
  # for everything temporal.
  DataMetaDom::DATETIME => lambda { |dt| %q<"long"> },
  # No support for these in this release:
  # NUMERIC => lambda { |t| "BigDecimal" }
}
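
The lookup-and-render pattern behind AVRO_TYPES can be sketched without the DataMetaDom gem. Everything named Sketch or SKETCH below is a hypothetical stand-in introduced for illustration (the real keys are DataMetaDom constants and the real argument is whatever the gem passes to the lambdas); only the table shape and the length thresholds follow the source above.

```ruby
# Hypothetical stand-ins for DataMetaDom type constants and a typed field descriptor.
SKETCH_INT = :int
SKETCH_STRING = :string
SketchType = Struct.new(:type, :length)

# Same shape as AVRO_TYPES: type key => lambda returning the JSON for the "type" tag.
SKETCH_AVRO_TYPES = {
  SKETCH_INT => lambda { |dt|
    case
    when dt.length <= 4 then %q<"int">
    when dt.length <= 8 then %q<"long">
    else raise "Invalid integer length #{dt.length}"
    end
  },
  SKETCH_STRING => lambda { |_dt| %q<"string"> }
}.freeze

# Looks up the renderer for the type and calls it, as avroType does in the source.
def sketch_avro_type(dt)
  renderer = SKETCH_AVRO_TYPES[dt.type]
  raise "Unsupported type #{dt.type}" unless renderer
  renderer.call(dt)
end

puts sketch_avro_type(SketchType.new(SKETCH_INT, 4))    # prints "int"
puts sketch_avro_type(SketchType.new(SKETCH_INT, 8))    # prints "long"
puts sketch_avro_type(SketchType.new(SKETCH_STRING, 0)) # prints "string"
```

Keeping the renderers in a hash means supporting another type is a matter of adding one entry, and an unmapped type fails loudly at lookup time.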

Class Method Summary

Class Method Details

.assertMapKeyType(fld, type) ⇒ Object

Raises ArgumentError unless the given map key type is DataMetaDom::STRING, because Avro supports only strings as map keys.

Raises:

  • (ArgumentError)


# File 'lib/dataMetaAvro.rb', line 141

def assertMapKeyType(fld, type)
    raise ArgumentError, %<Field "#{fld.name}": Avro supports only strings as map keys, "#{
        type}" is not supported as a map key by Avro> unless type == DataMetaDom::STRING
end

.assertNamespace(fullName) ⇒ Object

Splits the full name of a class into the namespace and the base name. Returns an array of the namespace (an empty string if the name has no namespace) and the base name.

Examples:

  • 'BaseNameAlone' -> ['', 'BaseNameAlone']

  • 'one.package.another.pack.FinallyTheName' -> ['one.package.another.pack', 'FinallyTheName']



# File 'lib/dataMetaAvro.rb', line 113

def assertNamespace(fullName)
  ns, base = DataMetaDom::splitNameSpace(fullName)
  [DataMetaDom.validNs?(ns, base) ? ns : '', base]
end
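
The split described above can be approximated in plain Ruby. This is a minimal sketch, assuming the namespace is everything before the last dot; sketch_split_namespace is a hypothetical stand-in, not the actual DataMetaDom::splitNameSpace, and it skips the DataMetaDom.validNs? check that the real method applies.

```ruby
# Hypothetical stand-in for the namespace split; the real DataMetaDom helper may differ.
def sketch_split_namespace(full_name)
  # rpartition splits on the LAST dot: [before, separator, after].
  # With no dot present, "before" and "separator" are both empty strings.
  ns, _dot, base = full_name.rpartition('.')
  [ns, base]
end

p sketch_split_namespace('BaseNameAlone')
p sketch_split_namespace('one.package.another.pack.FinallyTheName')
```

Both documented examples map to the same pairs here: an empty namespace for a bare name, and the dotted prefix otherwise.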

.avroType(dataMetaType) ⇒ Object

Converts DataMeta DOM type to Avro schema type.



# File 'lib/dataMetaAvro.rb', line 73

def avroType(dataMetaType)
    renderer = AVRO_TYPES[dataMetaType.type]
    raise "Unsupported type #{dataMetaType}" unless renderer
    renderer.call(dataMetaType)
end

.genRecordJson(model, outFile, rec, nameSpace, base) ⇒ Object

Generates an Avro Schema for the given model’s record.

At first glance some parameters appear unused, but they are not: the ERB template reads them through the method's binding.

The parameters nameSpace and base could be derived from rec, but since the caller has already computed them via assertNamespace, they are simply passed in and reused.

  • Params:

    • model - DataMetaDom::Model

    • outFile - output file name

    • rec - DataMetaDom::Record

    • nameSpace - the namespace for the record

    • base - base name of the record



# File 'lib/dataMetaAvro.rb', line 100

def genRecordJson(model, outFile, rec, nameSpace, base)
    vars =  OpenStruct.new # for template's local variables. ERB does not make them visible to the binding
    IO.write(outFile, "#{ERB.new(IO.read("#{TMPL_ROOT}/dataClass.avsc.erb"), 0, '-').result(binding)}", {:mode => 'wb'})
end
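
To illustrate why the seemingly unused parameters matter, here is a small sketch of an ERB template reading method parameters through binding. The inline template, SKETCH_TEMPLATE, and sketch_render are hypothetical; the gem itself reads tmpl/dataClass.avsc.erb, whose contents are not shown on this page. The sketch also assumes a Ruby version where ERB.new accepts the trim_mode keyword, rather than the positional form used in the source above.

```ruby
require 'erb'

# Hypothetical inline template standing in for the gem's dataClass.avsc.erb.
SKETCH_TEMPLATE = <<~ERB_SRC
  {
    "type": "record",
    "namespace": "<%= name_space %>",
    "name": "<%= base %>"
  }
ERB_SRC

# name_space and base are never referenced in the Ruby body,
# but the template resolves them through the method's binding.
def sketch_render(name_space, base)
  ERB.new(SKETCH_TEMPLATE, trim_mode: '-').result(binding)
end

puts sketch_render('one.pack', 'FinallyTheName')
```

Passing binding exposes every local variable and parameter of the method to the template, which is why removing a parameter that looks unused would break rendering.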

.genSchema(model, outRoot) ⇒ Object

Generates the Avro Schema, one .avsc file per record.



# File 'lib/dataMetaAvro.rb', line 121

def genSchema(model, outRoot)
  model.records.values.each { |rec| # loop through all the records in the model
    nameSpace, base = assertNamespace(rec.name)
    FileUtils.mkdir_p outRoot # write schema files named as one.package.another.package.ClassName.avsc in one dir
    outFile = File.join(outRoot, "#{rec.name}.avsc")
    case
      when rec.kind_of?(DataMetaDom::Record)
        genRecordJson model, outFile, rec, nameSpace, base
      else # since we are cycling through records, should never get here
        raise "Unsupported Entity: #{rec.inspect}"
    end
  }
end
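
The one-file-per-record layout can be sketched as follows. This is an illustration only: sketch_gen_schema is a hypothetical helper that takes plain record names and writes placeholder JSON, whereas the real method iterates over a DataMetaDom::Model and renders each file from a template.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: writes one placeholder .avsc file per full record name,
# all in a single flat directory, and returns the paths written.
def sketch_gen_schema(record_names, out_root)
  FileUtils.mkdir_p(out_root)
  record_names.map do |name|
    out_file = File.join(out_root, "#{name}.avsc")
    File.write(out_file, %Q<{"type": "record", "name": "#{name}"}\n>)
    out_file
  end
end

Dir.mktmpdir do |dir|
  written = sketch_gen_schema(%w[one.pack.First one.pack.Second], File.join(dir, 'avro'))
  puts written.map { |path| File.basename(path) }
end
```

Because the full dotted name becomes the file name, records from different namespaces can share one output directory without colliding.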

.helpAvroSchemaGen(file, errorText = nil) ⇒ Object

Shortcut to help for the Avro Schema generator.



# File 'lib/dataMetaAvro.rb', line 136

def helpAvroSchemaGen(file, errorText=nil)
    DataMetaDom::help(file, "DataMeta DOM Avro Schema Generation ver #{VERSION}",
                      '<DataMeta DOM source> <Avro Schemas target dir>', errorText)
end

.wrapReqOptional(field, baseType) ⇒ Object

Returns baseType unchanged for a required field; for an optional field, wraps it in a JSON union with "null".



# File 'lib/dataMetaAvro.rb', line 80

def wrapReqOptional(field, baseType)
    field.isRequired ? baseType : %Q^[#{baseType}, "null"]^
end