Module: Wukong::Schema::ClassMethods

Defined in:
lib/wukong/schema.rb

Instance Method Summary

Instance Method Details

#pig_load(filename = nil) ⇒ Object

A Pig snippet to load a TSV file containing serialized instances of this class.

Assumes the first column is the resource name. You can, and probably should, follow the LOAD with an immediate GENERATE to drop that field.



# File 'lib/wukong/schema.rb', line 137

def pig_load filename=nil
  filename ||= resource_name.to_s+'.tsv'
  cmd = [
    "%-23s" % self.to_s.gsub(/^.*\W/, ""),
    "= LOAD '#{filename}'",
    "AS ( rsrc:chararray,", self.to_pig, ') ;',
  ].join(" ")
end
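
To show the shape of the generated statement, here is a minimal standalone sketch that mirrors the string assembly above without loading Wukong. The constants are hypothetical stand-ins for self.to_s, resource_name and to_pig, and the Pig type names are assumed for illustration.

```ruby
# Standalone sketch: mirrors the string assembly in pig_load without Wukong.
# These constants are hypothetical stand-ins for self.to_s, resource_name
# and to_pig; the Pig type names are assumptions.
PIG_CLASS_NAME    = "Analytics::ClickLog"
PIG_RESOURCE_NAME = :click_log
PIG_FIELDS        = "url: chararray, hits: int"

def pig_load_sketch(filename = nil)
  filename ||= PIG_RESOURCE_NAME.to_s + ".tsv"
  [
    "%-23s" % PIG_CLASS_NAME.gsub(/^.*\W/, ""), # strip the namespace, pad to 23 columns
    "= LOAD '#{filename}'",
    "AS ( rsrc:chararray,", PIG_FIELDS, ") ;",
  ].join(" ")
end

puts pig_load_sketch
```

The leading rsrc:chararray field is the resource-name column that the description suggests dropping with a GENERATE.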

#sql_create_table(primary_key = nil, drop_first = nil, table_options = '') ⇒ Object

A SQL snippet that creates a table for the Wukong class.

  • primary_key gives the name of one column to be set as the primary key

  • if drop_first is given, a “DROP TABLE IF EXISTS” statement will precede the snippet.

  • table_options sets the table parameters. Useful table_options for a read-only database in MySQL:

    ENGINE=MyISAM PACK_KEYS=0
    


# File 'lib/wukong/schema.rb', line 181

def sql_create_table primary_key=nil, drop_first=nil, table_options=''
  str = []
  str << %Q{DROP TABLE IF EXISTS `#{self.table_name}`;  } if drop_first
  str << %Q{CREATE TABLE         `#{self.table_name}` ( }
  str << self.to_sql
  if primary_key then str.last << ',' ; str << %Q{  PRIMARY KEY     \t(`#{primary_key}`)} ; end
  str << %Q{  ) #{table_options} ;}
  str.join("\n")
end
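
As an illustration of how the three arguments affect the result, the following standalone sketch reproduces the assembly steps with hand-written inputs. SKETCH_TABLE_NAME and SKETCH_COLUMN_DEFS are hypothetical stand-ins for table_name and to_sql, and the column types are assumed.

```ruby
# Standalone sketch of the statement assembly in sql_create_table.
# The two constants are hypothetical stand-ins for table_name and to_sql.
SKETCH_TABLE_NAME  = "click_logs"
SKETCH_COLUMN_DEFS = "  `url`\tVARCHAR(255),\n  `hits`\tINT"

def create_table_sketch(primary_key = nil, drop_first = nil, table_options = "")
  str = []
  str << "DROP TABLE IF EXISTS `#{SKETCH_TABLE_NAME}`;" if drop_first
  str << "CREATE TABLE `#{SKETCH_TABLE_NAME}` ("
  str << SKETCH_COLUMN_DEFS.dup
  if primary_key
    str.last << ","                      # the key clause follows the last column
    str << "  PRIMARY KEY (`#{primary_key}`)"
  end
  str << ") #{table_options};"
  str.join("\n")
end

puts create_table_sketch(:url, true, "ENGINE=MyISAM PACK_KEYS=0")
```

Calling it with no arguments yields only the CREATE TABLE statement, with no DROP and no PRIMARY KEY clause.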

#sql_load_mysql(filename = nil) ⇒ Object

A MySQL snippet to bulk load the tab-separated-values file emitted by a Wukong script.

Let’s say your class is ClickLog; its resource_name is “click_log” and thus its table_name is ‘click_logs’. sql_load_mysql will:

  • disable indexing on the table

  • import the file, replacing any existing rows. (Replacement is governed by primary key and unique index constraints; see the MySQL docs.)

  • re-enable indexing on that table

  • show the number of rows now in the table

The load portion will:

  • load into a table named click_logs

  • from a file named click_logs.tsv

  • where all rows have the string ‘click_log’ (the resource_name) in their first column

  • and all remaining fields in their #members order

  • assuming strings are wukong_encode’d and so shouldn’t be escaped or enclosed.

Why the “LINES STARTING BY” part? For map/reduce outputs that have many different objects jumbled together, you can just dump in the whole file, landing each object in its correct table.
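
The routing effect described above can be sketched in plain Ruby. This is only a rough emulation under stated assumptions: the records and resource names are made up, and the sketch requires the resource name to be followed by a tab, whereas MySQL itself matches on the prefix string.

```ruby
# Rough emulation of how LINES STARTING BY routes rows from a mixed file.
# The records and resource names below are made up for illustration.
MIXED_TSV = <<~TSV
  click_log\t/home\t3
  user\talice\t2009-01-01
  click_log\t/about\t1
TSV

# Keep only lines that begin with the given resource name, then drop that
# first column (the role @dummy plays in the generated statement).
def rows_for(resource_name, tsv)
  tsv.each_line(chomp: true)
     .select { |line| line.start_with?("#{resource_name}\t") }
     .map    { |line| line.split("\t", -1).drop(1) }
end

p rows_for("click_log", MIXED_TSV)
p rows_for("user", MIXED_TSV)
```

Each class's generated statement can therefore be run against the same jumbled file, and each one picks up only its own rows.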



# File 'lib/wukong/schema.rb', line 216

def sql_load_mysql(filename=nil)
  filename ||= ":resource_name.tsv"
  filename.gsub!(/:resource_name/, self.table_name)
  str = []
  # disable indexing during bulk load
  str << %Q{ALTER TABLE            `#{self.table_name}` DISABLE KEYS; }
  # Bulk load the tab-separated-values file.
  str << %Q{LOAD DATA LOCAL INFILE '#{filename}'}
  str << %Q{  REPLACE INTO TABLE   `#{self.table_name}`    }
  str << %Q{  COLUMNS                                         }
  str << %Q{    TERMINATED BY           '\\t'                 }
  str << %Q{    OPTIONALLY ENCLOSED BY  ''                    }
  str << %Q{    ESCAPED BY              ''                    }
  str << %Q{  LINES STARTING BY     '#{self.resource_name}'   }
  str << %Q{  ( @dummy,\n }
  str << '    '+self.sql_members
  str << %Q{\n  ); }
  # Re-enable indexing
  str << %Q{ALTER TABLE `#{self.table_name}` ENABLE KEYS ; }
  # Show it loaded correctly
  str << %Q{SELECT NOW(), COUNT(*), '#{self.table_name}' FROM `#{self.table_name}`; }
  str.join("\n")
end

#sql_members ⇒ Object

List off member names, to be stuffed into a SELECT or a LOAD DATA



# File 'lib/wukong/schema.rb', line 165

def sql_members
  members.map{|attr| "`#{attr}`" }.join(", ")
end

#table_name ⇒ Object

Table name for this class: the pluralized resource_name.



# File 'lib/wukong/schema.rb', line 111

def table_name
  resource_name.to_s.pluralize
end

#to_avro ⇒ Object

Export schema as an Avro record definition, serialized as JSON.



# File 'lib/wukong/schema.rb', line 246

def to_avro
  require 'json' # yikes
  h = {}
  h[:name]   = self.name
  h[:type]   = "record"
  h[:fields] =  []
  members.zip(mtypes).each do |member, type|
    h[:fields] << {:name => member.to_s, :type => type.to_avro}
  end
  h.to_json
end
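
For a sense of the resulting JSON, here is a minimal standalone sketch that builds the same hash structure from hand-supplied inputs. avro_schema_sketch is a hypothetical helper, and the Avro type names are passed in directly rather than obtained from each type's to_avro.

```ruby
require 'json'

# Builds an Avro record schema as JSON from a name and a member => type map.
# The Avro type names are supplied by hand here (an assumption); Wukong asks
# each member's type for to_avro instead.
def avro_schema_sketch(name, members)
  {
    name:   name,
    type:   "record",
    fields: members.map { |member, type| { name: member.to_s, type: type } },
  }.to_json
end

puts avro_schema_sketch("ClickLog", { url: "string", hits: "int" })
```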

#to_pig ⇒ Object

Export schema as Pig

Won’t correctly handle complex types (e.g., a struct having a struct as a member)



# File 'lib/wukong/schema.rb', line 124

def to_pig
  members.zip(mtypes).map do |member, type|
    member.to_s + ': ' + type.to_pig
  end.join(', ')
end

#to_sql ⇒ Object

Schema definition for use in a CREATE TABLE statement



# File 'lib/wukong/schema.rb', line 153

def to_sql
  sql_str = []
  members.zip(mtypes).each do |attr, type|
    type_str = type.respond_to?(:to_sql) ? type.to_sql : type.to_s.upcase
    sql_str << "  %-29s\t%s" %["`#{attr}`", type_str]
  end
  sql_str.join(",\n")
end
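
The fallback in the type lookup is easy to miss, so here is a small standalone sketch of it. Varchar255 is a hypothetical type that defines to_sql, standing in for Wukong types that do the same; a plain Ruby Integer has no to_sql and so falls back to its upcased class name.

```ruby
# A hypothetical type that knows its own SQL name, standing in for Wukong
# types that define to_sql.
class Varchar255
  def self.to_sql
    "VARCHAR(255)"
  end
end

def column_defs_sketch(members)
  members.map do |attr, type|
    # Prefer the type's own to_sql; otherwise fall back to its upcased name.
    type_str = type.respond_to?(:to_sql) ? type.to_sql : type.to_s.upcase
    "  %-29s\t%s" % ["`#{attr}`", type_str]
  end.join(",\n")
end

puts column_defs_sketch({ url: Varchar255, hits: Integer })
```

Each column name is left-aligned in a 29-character field, followed by a tab and the type string.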