Module: Wukong::Schema::ClassMethods
- Defined in:
- lib/wukong/schema.rb
Instance Method Summary
-
#pig_load(filename = nil) ⇒ Object
A pig snippet to load a tsv file containing serialized instances of this class.
-
#sql_create_table(primary_key = nil, drop_first = nil, table_options = '') ⇒ Object
Creates a table for the wukong class.
-
#sql_load_mysql(filename = nil) ⇒ Object
A MySQL snippet to bulk load the tab-separated-values file emitted by a Wukong script.
-
#sql_members ⇒ Object
List off member names, to be stuffed into a SELECT or a LOAD DATA.
-
#table_name ⇒ Object
Table name for this class.
-
#to_avro ⇒ Object
Export schema as an Avro record definition, serialized as JSON.
-
#to_pig ⇒ Object
Export schema as Pig.
-
#to_sql ⇒ Object
Schema definition for use in a CREATE TABLE statement.
Instance Method Details
#pig_load(filename = nil) ⇒ Object
A pig snippet to load a tsv file containing serialized instances of this class.
Assumes the first column is the resource name (you can, and probably should, follow with an immediate GENERATE to ditch that field.)
# File 'lib/wukong/schema.rb', line 137
def pig_load filename=nil
  filename ||= resource_name.to_s+'.tsv'
  cmd = [
    "%-23s" % self.to_s.gsub(/^.*\W/, ""),
    "= LOAD '#{filename}'",
    "AS ( rsrc:chararray,",
    self.to_pig,
    ') ;',
  ].join(" ")
end
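To make the shape of the generated Pig statement concrete, here is a minimal sketch that does not load Wukong at all. The `ClickLog` stand-in, its `PIG_FIELDS` table, and the Pig type names are assumptions made up for illustration; only the body of `pig_load` follows the listing above.

```ruby
# Hypothetical stand-in for a Wukong class; not the real library.
class ClickLog
  # Field names and Pig types are assumptions for illustration only.
  PIG_FIELDS = { :ip => 'chararray', :url => 'chararray', :status => 'int' }

  def self.resource_name
    :click_log
  end

  # Mirrors to_pig: "name: type" pairs joined with commas.
  def self.to_pig
    PIG_FIELDS.map { |member, pig_type| "#{member}: #{pig_type}" }.join(', ')
  end

  # Mirrors pig_load: padded relation name, LOAD clause, then the AS ( ... ) schema.
  def self.pig_load(filename = nil)
    filename ||= resource_name.to_s + '.tsv'
    [
      "%-23s" % to_s.gsub(/^.*\W/, ""),
      "= LOAD '#{filename}'",
      "AS ( rsrc:chararray,",
      to_pig,
      ') ;',
    ].join(" ")
  end
end

puts ClickLog.pig_load
```

The relation is named after the class, the file defaults to `click_log.tsv`, and the leading `rsrc` field is the resource-name column that a following GENERATE would normally drop.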
#sql_create_table(primary_key = nil, drop_first = nil, table_options = '') ⇒ Object
Creates a table for the wukong class.
-
primary_key gives the name of one column to be set as the primary key
-
if drop_first is given, a “DROP TABLE IF EXISTS” statement will precede the snippet.
-
table_options sets the table parameters. Useful table_options for a read-only database in MySQL:
ENGINE=MyISAM PACK_KEYS=0
# File 'lib/wukong/schema.rb', line 181
def sql_create_table primary_key=nil, drop_first=nil, table_options=''
  str = []
  str << %Q{DROP TABLE IF EXISTS `#{self.table_name}`; } if drop_first
  str << %Q{CREATE TABLE `#{self.table_name}` ( }
  str << self.to_sql
  if primary_key then str.last << ',' ; str << %Q{ PRIMARY KEY \t(`#{primary_key}`)} ; end
  str << %Q{ ) #{table_options} ;}
  str.join("\n")
end
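The three arguments are easiest to follow with a worked call. The sketch below is self-contained and does not use Wukong: the `ClickLog` stand-in, its `COLUMNS` table, and the column types are hypothetical, while the statement-building logic follows the listing above.

```ruby
# Hypothetical stand-in for a Wukong class; not the real library.
class ClickLog
  # Column names and SQL types are assumptions for illustration only.
  COLUMNS = { :id => 'INT', :url => 'VARCHAR(255)' }

  def self.table_name
    'click_logs'
  end

  # Mirrors to_sql: one backquoted, padded column definition per line.
  def self.to_sql
    COLUMNS.map { |attr, type_str| "  %-29s\t%s" % ["`#{attr}`", type_str] }.join(",\n")
  end

  # Mirrors sql_create_table: optional DROP, CREATE, optional PRIMARY KEY, options.
  def self.sql_create_table(primary_key = nil, drop_first = nil, table_options = '')
    str = []
    str << %Q{DROP TABLE IF EXISTS `#{table_name}`; } if drop_first
    str << %Q{CREATE TABLE `#{table_name}` ( }
    str << to_sql
    if primary_key
      str.last << ','                                  # comma after the last column
      str << %Q{ PRIMARY KEY \t(`#{primary_key}`)}
    end
    str << %Q{ ) #{table_options} ;}
    str.join("\n")
  end
end

puts ClickLog.sql_create_table(:id, true, 'ENGINE=MyISAM PACK_KEYS=0')
```

Calling it with no arguments yields only the CREATE TABLE statement, with no DROP, no PRIMARY KEY clause, and empty table options.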
#sql_load_mysql(filename = nil) ⇒ Object
A MySQL snippet to bulk load the tab-separated-values file emitted by a Wukong script.
Let’s say your class is ClickLog; its resource_name is “click_log” and thus its table_name is ‘click_logs’. sql_load_mysql will:
-
disable indexing on the table
-
import the file, replacing any existing rows. (Replacement is governed by primary key and unique index constraints – see the mysql docs).
-
re-enable indexing on that table
-
show the number of rows now in the table
The load portion will:
-
Load into a table named click_logs
-
from a file named click_logs.tsv
-
where all rows have the string ‘click_log’ (the resource_name) in their first column
-
and all remaining fields in their #members order
-
assuming strings are wukong_encode’d and so shouldn’t be escaped or enclosed.
Why the “LINES STARTING BY” part? For map/reduce outputs that have many different objects jumbled together, you can just dump in the whole file, landing each object in its correct table.
# File 'lib/wukong/schema.rb', line 216
def sql_load_mysql(filename=nil)
  filename ||= ":resource_name.tsv"
  filename.gsub!(/:resource_name/, self.table_name)
  str = []
  # disable indexing during bulk load
  str << %Q{ALTER TABLE `#{self.table_name}` DISABLE KEYS; }
  # Bulk load the tab-separated-values file.
  str << %Q{LOAD DATA LOCAL INFILE '#{filename}'}
  str << %Q{ REPLACE INTO TABLE `#{self.table_name}` }
  str << %Q{ COLUMNS }
  str << %Q{ TERMINATED BY '\\t' }
  str << %Q{ OPTIONALLY ENCLOSED BY '' }
  str << %Q{ ESCAPED BY '' }
  str << %Q{ LINES STARTING BY '#{self.resource_name}' }
  str << %Q{ ( @dummy,\n }
  str << ' '+self.sql_members
  str << %Q{\n ); }
  # Re-enable indexing
  str << %Q{ALTER TABLE `#{self.table_name}` ENABLE KEYS ; }
  # Show it loaded correctly
  str << %Q{SELECT NOW(), COUNT(*), '#{self.table_name}' FROM `#{self.table_name}`; }
  str.join("\n")
end
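The `:resource_name` placeholder and the `LINES STARTING BY` filter can be seen in a condensed sketch of just the LOAD DATA portion. Everything here is an assumption for illustration: the `ClickLog` stand-in, its `MEMBERS` list, and the helper name `sql_load_stanza` are hypothetical, and the ALTER TABLE and SELECT statements are left out.

```ruby
# Hypothetical stand-in for a Wukong class; not the real library.
class ClickLog
  # Member names are assumptions for illustration only.
  MEMBERS = [:ip, :url, :status]

  def self.resource_name
    :click_log
  end

  def self.table_name
    'click_logs'
  end

  # Mirrors sql_members: backquoted names joined with commas.
  def self.sql_members
    MEMBERS.map { |attr| "`#{attr}`" }.join(", ")
  end

  # Condensed LOAD DATA portion only (hypothetical helper, not part of Wukong).
  def self.sql_load_stanza(filename = nil)
    filename ||= ":resource_name.tsv"
    # Non-mutating gsub, unlike the gsub! in the listing above,
    # so a frozen string argument is left untouched.
    filename = filename.gsub(/:resource_name/, table_name)
    [
      "LOAD DATA LOCAL INFILE '#{filename}'",
      "  REPLACE INTO TABLE `#{table_name}`",
      "  COLUMNS TERMINATED BY '\\t' OPTIONALLY ENCLOSED BY '' ESCAPED BY ''",
      "  LINES STARTING BY '#{resource_name}'",
      "  ( @dummy, #{sql_members} );",
    ].join("\n")
  end
end

puts ClickLog.sql_load_stanza('out/:resource_name.tsv')
```

Note that the placeholder expands to the table name (`click_logs`), while the line filter uses the resource name (`click_log`); `@dummy` absorbs the first field so the remaining columns line up with the members.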
#sql_members ⇒ Object
List off member names, to be stuffed into a SELECT or a LOAD DATA
# File 'lib/wukong/schema.rb', line 165
def sql_members
  members.map{|attr| "`#{attr}`" }.join(", ")
end
#table_name ⇒ Object
Table name for this class
# File 'lib/wukong/schema.rb', line 111
def table_name
  resource_name.to_s.pluralize
end
#to_avro ⇒ Object
Export schema as an Avro record definition, serialized as JSON
# File 'lib/wukong/schema.rb', line 246
def to_avro
  require 'json' # yikes
  h = {}
  h[:name] = self.name
  h[:type] = "record"
  h[:fields] = []
  members.zip(mtypes).each do |member, type|
    h[:fields] << {:name => member.to_s, :type => type.to_avro}
  end
  h.to_json
end
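As a rough illustration of the JSON this produces, the sketch below pairs each member with a type object that answers `to_avro`. The `AvroType` wrapper, the `ClickLog` stand-in, and the member and type names are all hypothetical; in Wukong the member types supply their own `to_avro`.

```ruby
require 'json'

# Hypothetical type wrapper used only for this sketch.
AvroType = Struct.new(:avro_name) do
  def to_avro
    avro_name
  end
end

# Hypothetical stand-in for a Wukong class; not the real library.
class ClickLog
  def self.members
    [:ip, :status]
  end

  def self.mtypes
    [AvroType.new('string'), AvroType.new('int')]
  end

  # Mirrors to_avro: a record named after the class, one field per member.
  def self.to_avro
    h = { :name => name, :type => 'record', :fields => [] }
    members.zip(mtypes).each do |member, type|
      h[:fields] << { :name => member.to_s, :type => type.to_avro }
    end
    h.to_json
  end
end

puts ClickLog.to_avro
```

The result is a single JSON object with `name`, `type`, and `fields` keys, where `fields` holds one `{name, type}` pair per member in declaration order.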
#to_pig ⇒ Object
Export schema as Pig
Won’t correctly handle complex types (e.g., a struct that has another struct as a member)
# File 'lib/wukong/schema.rb', line 124
def to_pig
  members.zip(mtypes).map do |member, type|
    member.to_s + ': ' + type.to_pig
  end.join(', ')
end
#to_sql ⇒ Object
Schema definition for use in a CREATE TABLE statement
# File 'lib/wukong/schema.rb', line 153
def to_sql
  sql_str = []
  members.zip(mtypes).each do |attr, type|
    type_str = type.respond_to?(:to_sql) ? type.to_sql : type.to_s.upcase
    sql_str << " %-29s\t%s" %["`#{attr}`", type_str]
  end
  sql_str.join(",\n")
end
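The type lookup has two paths: a type that responds to `to_sql` supplies its own column type, and anything else falls back to its upcased name. A minimal sketch of both paths follows, assuming a hypothetical `Text255` type and a hypothetical `ClickLog` stand-in rather than real Wukong classes.

```ruby
# Hypothetical type that knows its own SQL column type.
class Text255
  def self.to_sql
    'VARCHAR(255)'
  end
end

# Hypothetical stand-in for a Wukong class; not the real library.
class ClickLog
  def self.members
    [:url, :status]
  end

  # Integer does not define to_sql, so it takes the fallback path.
  def self.mtypes
    [Text255, Integer]
  end

  # Mirrors to_sql: padded, backquoted column name, a tab, then the type string.
  def self.to_sql
    members.zip(mtypes).map do |attr, type|
      type_str = type.respond_to?(:to_sql) ? type.to_sql : type.to_s.upcase
      "  %-29s\t%s" % ["`#{attr}`", type_str]
    end.join(",\n")
  end
end

puts ClickLog.to_sql
```

Here `url` gets `VARCHAR(255)` from `Text255.to_sql`, while `status` becomes `INTEGER` through the fallback.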