Class: Migratrix::Transforms::Transform
- Includes:
- Loggable, ValidOptions
- Defined in:
- lib/migratrix/transforms/transform.rb
Overview
Transform base class. A transform takes a collection of extracted objects and returns a collection of transformed objects. To do this, it needs to know the following:
1. What kind of collection to create. (Array? Hash? Set? Custom?)
2. How to transform one extracted object into a transformed object.
   2.1 How to create the object that will hold the transformed object’s attributes (might be the transformed object’s class, might be a hash of attributes used to create the object itself, you might even try loading the target object from the destination db to see if it’s already been migrated, and if so, can it be skipped or does it need to be updated?)
   2.2 What attributes to pull off the extracted object
   2.3 HOW to pull an attribute off the extracted object
   2.4 How to transform each attribute
   2.5 What attributes to store on the target object
   2.6 HOW to store an attribute on the target object
   2.7 How to finalize the transformation
       2.7.1 Did you create the target object by class and have you been updating it all along? Then yay, this step is a no-op and you’re already done.
       2.7.2 If you’re doing a merge transformation, and you haven’t already checked the destination db for an existing record, now’s the time to try to load that sucker and, if successful, to merge it with your migrated values.
       2.7.3 A common Rails optimization is to create an attributes hash and new the ActiveRecord object in one call. This is faster* than creating a blank object and updating its attributes individually.
   2.8 How to store the finalized object in the collection. (hash=obj? set << obj?)
And remember, all of this is just to handle ONE record, albeit in the most complicated way possible. For sql->sql migrations the transformation step is pretty simple. (It doesn’t exist, ha ha. A cross-database INSERT SELECT means that the extract, transform and load all take place in the extract step.)
* DANGER DANGER DANGER: This is a completely unsubstantiated optimization claim. I’m sure it’s faster, though, so that’s good enough, right? TODO: BENCHMARK THIS OR SOMETHING WHATEVER
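The per-record steps listed above can be sketched in plain Ruby. This is only an illustration under stated assumptions, not Migratrix's implementation: the `rows` data, the attribute mapping, and the upcasing "transformation" are all made up for the example.

```ruby
# Hypothetical extracted rows; in a real migration these come from the extract step.
rows = [
  { "id" => 1, "name" => "Spot" },
  { "id" => 2, "name" => "Rex" }
]

# Hypothetical mapping of target attribute => source attribute.
mapping = { :id => "id", :name => "name" }

# 1. Decide what kind of collection to create (a Hash keyed by id here).
collection = {}

rows.each do |row|
  # 2.1 Create the object that will hold the transformed attributes.
  target = {}

  mapping.each do |target_attr, source_attr|
    value = row[source_attr]                     # 2.2/2.3 pull the attribute off the source
    value = value.upcase if value.is_a?(String)  # 2.4 transform it (illustrative only)
    target[target_attr] = value                  # 2.5/2.6 store it on the target
  end

  # 2.7 Finalize: a no-op here, because the hash already is the final object.
  # 2.8 Store the finalized object in the collection.
  collection[target[:id]] = target
end
```

Every numbered decision shows up as one line of the loop, which is why the Transform class exposes each of them as a separately configurable strategy.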
Instance Attribute Summary
-
#extraction ⇒ Object
Name of the extraction to use.
-
#name ⇒ Object
Returns the value of attribute name.
-
#options ⇒ Object
Returns the value of attribute options.
-
#transformations ⇒ Object
Returns the value of attribute transformations.
Instance Method Summary
- #apply_attribute(object, attribute_or_apply, value) ⇒ Object
- #create_new_object(extracted_row) ⇒ Object
- #create_transformed_collection ⇒ Object
- #extract_attribute(object, attribute_or_extract) ⇒ Object
-
#finalize_object(new_object) ⇒ Object
Both Missing: final_object = new_object
:final_class only: final_obj = FinalClass.new(new_obj)
:finalize only: final_obj = finalize(new_obj)
Both Present: final_obj = FinalClass.new(finalize(obj))
-
#initialize(name, options = {}) ⇒ Transform
constructor
A new instance of Transform.
- #store_transformed_object(object, collection) ⇒ Object
-
#transform(extracted_objects) ⇒ Object
This transform method has strategy magic at every turn.
Constructor Details
#initialize(name, options = {}) ⇒ Transform
Returns a new instance of Transform.
# File 'lib/migratrix/transforms/transform.rb', line 64
def initialize(name, options={})
  @name = name
  @options = options.symbolize_keys
  @transformations = options[:transform]
end
Instance Attribute Details
#extraction ⇒ Object
Name of the extraction to use. If omitted, returns our name.
# File 'lib/migratrix/transforms/transform.rb', line 71
def extraction
  @extraction
end
#name ⇒ Object
Returns the value of attribute name.
# File 'lib/migratrix/transforms/transform.rb', line 49
def name
  @name
end
#options ⇒ Object
Returns the value of attribute options.
# File 'lib/migratrix/transforms/transform.rb', line 49
def options
  @options
end
#transformations ⇒ Object
Returns the value of attribute transformations.
# File 'lib/migratrix/transforms/transform.rb', line 49
def transformations
  @transformations
end
Instance Method Details
#apply_attribute(object, attribute_or_apply, value) ⇒ Object
# File 'lib/migratrix/transforms/transform.rb', line 214
def apply_attribute(object, attribute_or_apply, value)
  raise NotImplementedError unless options[:apply_attribute]
  option = options[:apply_attribute]
  case option
  when Proc
    option.call(object, attribute_or_apply, value)
  when Symbol
    object.send(option, attribute_or_apply, value)
  else
    raise TypeError
  end
end
#create_new_object(extracted_row) ⇒ Object
# File 'lib/migratrix/transforms/transform.rb', line 200
def create_new_object(extracted_row)
  # TODO: this should work like finalize, taking a create and an init.
  raise NotImplementedError unless options[:transform_class]
  option = options[:transform_class]
  case option
  when Proc
    option.call(extracted_row)
  when Class
    option.new # laaame--should receive extracted_row, see todo above
  else
    raise TypeError
  end
end
#create_transformed_collection ⇒ Object
# File 'lib/migratrix/transforms/transform.rb', line 187
def create_transformed_collection
  raise NotImplementedError unless options[:transform_collection]
  option = options[:transform_collection]
  case option
  when Proc
    option.call
  when Class
    option.new
  else
    raise TypeError
  end
end
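Both create_new_object and create_transformed_collection use the same Proc-versus-Class dispatch. The sketch below restates it as a standalone helper so it can be tried outside Migratrix; the helper name `build_from` and the sample strategies are hypothetical and not part of the library.

```ruby
require "set"

# Hypothetical helper mirroring the dispatch above: a Proc is called with any
# extra arguments, a Class is instantiated with no arguments, and anything
# else is rejected.
def build_from(option, *args)
  raise NotImplementedError, "no strategy configured" if option.nil?

  case option
  when Proc  then option.call(*args)
  when Class then option.new
  else raise TypeError, "expected a Proc or a Class, got #{option.class}"
  end
end

# Class strategy: the collection is simply Array.new.
array_collection = build_from(Array)

# Proc strategy: the collection can be pre-seeded or configured.
set_collection = build_from(lambda { Set.new([:seed]) })

# Proc strategy for new objects: unlike the Class branch, it sees the extracted row.
seeded_object = build_from(lambda { |row| { :source_id => row["id"] } }, { "id" => 7 })
```

The Class branch ignores the extracted row, which is the limitation the TODO in create_new_object points at; a Proc is the way around it.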
#extract_attribute(object, attribute_or_extract) ⇒ Object
# File 'lib/migratrix/transforms/transform.rb', line 227
def extract_attribute(object, attribute_or_extract)
  raise NotImplementedError unless options[:extract_attribute]
  option = options[:extract_attribute]
  case option
  when Proc
    option.call(object, attribute_or_extract)
  when Symbol
    object.send(option, attribute_or_extract)
  else
    raise TypeError
  end
end
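In apply_attribute and extract_attribute, the Symbol branch sends the configured method name to the object, while the Proc branch receives the object and attribute explicitly. A standalone sketch follows; the helper names `extract_with` and `apply_with` are hypothetical, and choosing `:[]` and `:[]=` as the Symbol strategies is an assumption made for illustration.

```ruby
# Hypothetical helpers mirroring the two dispatch methods above.
def extract_with(option, object, attribute)
  case option
  when Proc   then option.call(object, attribute)
  when Symbol then object.send(option, attribute)
  else raise TypeError
  end
end

def apply_with(option, object, attribute, value)
  case option
  when Proc   then option.call(object, attribute, value)
  when Symbol then object.send(option, attribute, value)
  else raise TypeError
  end
end

source = { :name => "Spot" }

# Symbol strategies: :[] reads from a hash and :[]= writes to one.
target = {}
apply_with(:[]=, target, :name, extract_with(:[], source, :name))

# Proc strategies: transform while reading, then call a writer method by name.
pet_class = Struct.new(:name)
pet = pet_class.new
reader = lambda { |obj, attr| obj[attr].upcase }
writer = lambda { |obj, attr, value| obj.send("#{attr}=", value) }
apply_with(writer, pet, :name, extract_with(reader, source, :name))
```

The Symbol form is terse but only works when one method name fits every attribute; the Proc form costs a call per attribute and can do anything.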
#finalize_object(new_object) ⇒ Object
Both Missing: final_object = new_object
:final_class only: final_obj = FinalClass.new(new_obj)
:finalize only: final_obj = finalize(new_obj)
Both Present: final_obj = FinalClass.new(finalize(obj))
# File 'lib/migratrix/transforms/transform.rb', line 245
def finalize_object(new_object)
  return new_object unless options[:final_class] || options[:finalize_object]
  raise TypeError if options[:finalize_object] && !options[:finalize_object].is_a?(Proc)
  raise TypeError if options[:final_class] && !options[:final_class].is_a?(Class)
  new_object = options[:finalize_object].call(new_object) if options[:finalize_object]
  new_object = options[:final_class].new(new_object) if options[:final_class]
  new_object
end
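The four combinations described for finalize_object can be exercised with a standalone copy of its logic. This is a sketch, not the library code: `finalize_with` takes the options as an argument instead of reading them from the transform, and the `Pet` class is a hypothetical stand-in for a model that is constructed from an attributes hash.

```ruby
# Hypothetical stand-in for a model class built from an attributes hash.
class Pet
  attr_reader :attributes

  def initialize(attributes = {})
    @attributes = attributes
  end
end

# Standalone copy of the finalize logic, with options passed in explicitly.
def finalize_with(new_object, options = {})
  finalizer   = options[:finalize_object]
  final_class = options[:final_class]
  raise TypeError if finalizer && !finalizer.is_a?(Proc)
  raise TypeError if final_class && !final_class.is_a?(Class)

  new_object = finalizer.call(new_object) if finalizer      # finalize first...
  new_object = final_class.new(new_object) if final_class   # ...then wrap
  new_object
end

row = { :id => 1, :name => "Spot", :legacy_flag => true }
drop_legacy = lambda { |attrs| attrs.reject { |key, _| key == :legacy_flag } }

neither    = finalize_with(row)                                      # same object back
class_only = finalize_with(row, :final_class => Pet)                 # Pet.new(row)
proc_only  = finalize_with(row, :finalize_object => drop_legacy)     # filtered hash
both       = finalize_with(row, :final_class => Pet,
                                :finalize_object => drop_legacy)     # Pet.new(filtered hash)
```

Note the order when both options are present: the Proc runs first and the class wraps its result, matching FinalClass.new(finalize(obj)).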
#store_transformed_object(object, collection) ⇒ Object
# File 'lib/migratrix/transforms/transform.rb', line 254
def store_transformed_object(object, collection)
  raise NotImplementedError unless options[:store_transformed_object]
  option = options[:store_transformed_object]
  case option
  when Proc
    option.call(object, collection)
  when Symbol
    collection.send(option, object)
  else
    raise TypeError
  end
end
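A Symbol strategy works for any collection whose storing method takes just the object, while keyed storage needs a Proc because the key has to be computed. A minimal standalone sketch, assuming a hypothetical helper named `store_with` that copies the dispatch above:

```ruby
require "set"

# Hypothetical helper mirroring store_transformed_object's dispatch.
def store_with(option, object, collection)
  case option
  when Proc   then option.call(object, collection)
  when Symbol then collection.send(option, object)
  else raise TypeError
  end
end

pet = { :id => 1, :name => "Spot" }

# Symbol strategy on an Array: collection << object
array = []
store_with(:<<, pet, array)

# Symbol strategy on a Set: collection.add(object)
set = Set.new
store_with(:add, pet, set)

# Proc strategy on a Hash: the key is derived from the object itself.
by_id = {}
store_with(lambda { |item, hash| hash[item[:id]] = item }, pet, by_id)
```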
#transform(extracted_objects) ⇒ Object
This transform method has strategy magic at every turn. I expect it to be slow, but we can optimize it later, e.g. by rolling out a define_method or similar for all of the constant parts.
Map’s strategy, as used by PetsMigration
create_transformed_collection -> Hash.new
create_new_object -> Hash.new
transformation -> {:id => :id, :name => :name }
extract_attribute -> object
apply_attribute -> object = attribute_or_apply
finalize_object -> no-op
store_transformed_object -> collection[object] = object
Default strategy:
create_transformed_collection -> Array.new
create_new_object -> @options.new
transformations -> MUST BE SUPPLIED BY CHILD CLASS, e.g. Map’s => :src hash
extract_attribute -> attribute_or_extract.is_a?(Proc) ? attribute_or_extract.call(object) : object.send(attribute_or_extract)
apply_attribute -> object.send("#{attribute_or_apply}=", value)
finalize_object -> no-op
store_transformed_object -> collection << transformed_object
Now, can we represent these two strategies as configurations in the migration? For example, here’s one way of representing PetTypeMigration’s strategy, which in the Load step must save off a YAML dump of a hash of all the pet objects (with just id and name) keyed by id:
set_transform :repetition_types, :map,
  :map => {:id => :id, :name => :name },
  :extract_method => :index,
  :apply_method => :index,
  :new_class => Hash,
  :collection => Hash,
  :store_method => lambda {|item, hash| hash[item[:id]] = item }
So the backing magic is this:
-
We expect :new_class to respond to new() and give us a new, blank object.
-
Ditto for :collection; it should respond to new()
-
extract_method :index means attr = extracted_object
-
apply_method :index means the same thing
-
store_method is the only weirdness here, and it might actually make sense to have names for the most obvious strategies, like :store_by => { :index => :id } (An array could use :store_by => :push, Set by :add, etc.)
-
The :final_class option is optional; its presence interacts with the also-optional :finalize option.
-
The :finalize option is optional; its presence interacts with the also-optional :final_class option, as follows:
Both Missing: final_object = new_object
:final_class only: final_obj = FinalClass.new(new_obj)
:finalize only: final_obj = finalize(new_obj)
Both Present: final_obj = FinalClass.new(finalize(obj))
Examples/Notes:
-
To build ActiveRecord objects quickly, use :new_class => Hash, :final_class => ModelClass, :finalize => nil.
-
If your source row is a hash (e.g. select_all or a YAML load) that needs no transformation, but has more columns than your ActiveRecord model can accept, you could use a copy transform (basically a map transform without the map step; new_object = extracted_object) with something like :new_class => Hash, :final_class => ModelClass, :finalize => lambda {|hsh| hsh.dup.keep_if {|k,v| k.in?(ModelClass.new.attributes.keys)}} (That would be HORRIBLY inefficient, but there would be ways to memoize with e.g. before_finalize { @keys = ModelClass.new.attributes.keys })
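The column-filtering finalizer described above can be sketched without Rails. Everything named here is an assumption for illustration: `PetModel` is a hypothetical stand-in for an ActiveRecord model that rejects unknown attributes, plain `include?` replaces ActiveSupport's `in?`, and the accepted keys are computed once up front rather than through a `before_finalize` hook.

```ruby
# Hypothetical stand-in for an ActiveRecord model: it only accepts known columns.
class PetModel
  COLUMNS = %w[id name].freeze

  attr_reader :attributes

  def initialize(attributes = {})
    unknown = attributes.keys - COLUMNS
    raise ArgumentError, "unknown attributes: #{unknown.join(', ')}" unless unknown.empty?

    @attributes = COLUMNS.to_h { |column| [column, attributes[column]] }
  end
end

# Compute the accepted keys once, instead of instantiating a model for every key.
allowed_keys = PetModel.new.attributes.keys

# Hash#select returns a new hash, so the source row is left untouched.
finalize = lambda { |hsh| hsh.select { |key, _| allowed_keys.include?(key) } }

row = { "id" => 1, "name" => "Spot", "legacy_code" => "X9" }
pet = PetModel.new(finalize.call(row))
```

Passing `row` straight to `PetModel.new` would raise here, which is the situation the finalizer is meant to avoid.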
Advanced Magic:
-
new_class, collection can be procs returning a new object
-
extract_method, apply_method can be procs taking |obj, attr| and |obj, attr, value|
Super Advanced Magic (YAGNI):
-
instead of procs, take blocks and use define_method on them so they’re faster.
# File 'lib/migratrix/transforms/transform.rb', line 171
def transform(extracted_objects)
  info "Transform #{name} started transform."
  transformed_collection = create_transformed_collection
  extracted_objects.each do |extracted_object|
    new_object = create_new_object(extracted_object)
    transformations.each do |attribute_or_apply, attribute_or_extract|
      apply_attribute(new_object, attribute_or_apply, extract_attribute(extracted_object, attribute_or_extract))
    end
    transformed_object = finalize_object(new_object)
    store_transformed_object(transformed_object, transformed_collection)
  end
  info "Transform #{name} finished transform."
  transformed_collection
end
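To see how the strategies combine, here is a deliberately simplified, hypothetical stand-in for the class. `MiniTransform` is not part of Migratrix: it reuses the option keys read by the methods above, but skips logging, option validation, finalization, and the NotImplementedError and TypeError checks.

```ruby
# Hypothetical, stripped-down stand-in for Transform, for illustration only.
class MiniTransform
  def initialize(options = {})
    @options = options
  end

  def transform(extracted_objects)
    collection = build(@options[:transform_collection])
    extracted_objects.each do |extracted|
      new_object = build(@options[:transform_class], extracted)
      @options[:transform].each do |apply_to, extract_from|
        value = dispatch(@options[:extract_attribute], extracted, extract_from)
        dispatch(@options[:apply_attribute], new_object, apply_to, value)
      end
      store(new_object, collection)
    end
    collection
  end

  private

  # Proc: call it with any arguments. Class: instantiate it with none.
  def build(option, *args)
    option.is_a?(Proc) ? option.call(*args) : option.new
  end

  # Proc: call it with the object first. Symbol: send it to the object.
  def dispatch(option, object, *args)
    option.is_a?(Proc) ? option.call(object, *args) : object.send(option, *args)
  end

  # Proc: gets (object, collection). Symbol: sent to the collection.
  def store(object, collection)
    option = @options[:store_transformed_object]
    option.is_a?(Proc) ? option.call(object, collection) : collection.send(option, object)
  end
end

pets = MiniTransform.new(
  :transform                => { :id => "pet_id", :name => "pet_name" },
  :transform_collection     => Hash,
  :transform_class          => Hash,
  :extract_attribute        => :[],
  :apply_attribute          => :[]=,
  :store_transformed_object => lambda { |item, hash| hash[item[:id]] = item }
).transform([
  { "pet_id" => 1, "pet_name" => "Spot" },
  { "pet_id" => 2, "pet_name" => "Rex" }
])
```

Swapping `:transform_collection` to `Array` and `:store_transformed_object` to `:<<` would yield an array of the same hashes without touching the loop.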