Method: ActiveRecord::Base.bulk_import

Defined in:
lib/activerecord-import/import.rb

.bulk_import(*args) ⇒ Object

Imports a collection of values to the database.

This is more efficient than using ActiveRecord::Base#create or ActiveRecord::Base#save multiple times. This method works well if you want to create more than one record at a time and do not care about having ActiveRecord objects returned for each record inserted.

This can be used with or without validations. It does not utilize the ActiveRecord::Callbacks during creation/modification while performing the import.
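As an illustration, assuming a BlogPost model like the one defined in the Examples section below, replacing a loop of save calls with a single import might look like this:

posts = 10.times.map { |i| BlogPost.new(author_name: 'zdennis', title: "Post #{i}") }

# One multi-row INSERT statement instead of ten separate INSERTs;
# per-record callbacks are not run.
BlogPost.import posts

# Validations run by default; pass validate: false to skip them.
BlogPost.import posts, validate: false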

Usage

Model.import array_of_models
Model.import column_names, array_of_models
Model.import array_of_hash_objects
Model.import column_names, array_of_hash_objects
Model.import column_names, array_of_values
Model.import column_names, array_of_values, options

Model.import array_of_models

With this form you can call import passing in an array of model objects that you want imported.

Model.import column_names, array_of_values

The first parameter, column_names, is an array of symbols or strings which specify the columns that you want to import data into.

The second parameter, array_of_values, is an array of arrays. Each subarray is a single set of values for a new record. The order of values in each subarray should match up to the order of the column_names.

Model.import column_names, array_of_values, options

The first two parameters are the same as the above form. The third parameter, options, is a hash. This is optional. Please see below for what options are available.

Options

  • validate - true|false, tells import whether or not to use ActiveRecord validations. Validations are enforced by default. It skips the uniqueness validation for performance reasons. You can find more details here: github.com/zdennis/activerecord-import/issues/228

  • ignore - true|false, an alias for on_duplicate_key_ignore.

  • on_duplicate_key_ignore - true|false, tells import to discard records that contain duplicate keys. For Postgres 9.5+ it adds ON CONFLICT DO NOTHING, for MySQL it uses INSERT IGNORE, and for SQLite it uses INSERT OR IGNORE. Cannot be enabled on a recursive import. For database adapters that normally support setting primary keys on imported objects, this option prevents that from occurring.

  • on_duplicate_key_update - :all, an Array, or Hash, tells import to use MySQL’s ON DUPLICATE KEY UPDATE or Postgres/SQLite ON CONFLICT DO UPDATE ability. See On Duplicate Key Update below.

  • synchronize - an array of ActiveRecord instances for the model that you are currently importing data into. This synchronizes existing model instances in memory with updates from the import.

  • timestamps - true|false, when false, tells import not to set the timestamp columns (created_at/updated_at) on the imported records, even if record timestamps are enabled in ActiveRecord::Base.

  • recursive - true|false, tells import to import all has_many/has_one associations if the adapter supports setting the primary keys of the newly imported objects. PostgreSQL only.

  • batch_size - an integer value to specify the max number of records to include per insert. Defaults to the total number of records to import. (A short example combining several of these options follows this list.)
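A sketch combining several of these options, assuming the BlogPost model from the Examples section below plus a hypothetical has_many :comments association (Comment is not defined in this document):

columns = [ :author_name, :title ]
values  = [ [ 'zdennis', 'test post' ], [ 'jdoe', 'another test post' ] ]

# Discard conflicting rows, insert at most 1,000 rows per statement,
# and do not set created_at/updated_at.
BlogPost.import columns, values, on_duplicate_key_ignore: true, batch_size: 1_000, timestamps: false

# Import models together with their associations (PostgreSQL only).
# Comment is a hypothetical associated model used only for illustration.
posts = [ BlogPost.new(title: 'Foo', comments: [ Comment.new(body: 'Nice!') ]) ]
BlogPost.import posts, recursive: true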

Examples

class BlogPost < ActiveRecord::Base ; end

# Example using array of model objects
posts = [ BlogPost.new(author_name: 'Zach Dennis', title: 'AREXT'),
          BlogPost.new(author_name: 'Zach Dennis', title: 'AREXT2'),
          BlogPost.new(author_name: 'Zach Dennis', title: 'AREXT3') ]
BlogPost.import posts

# Example using array_of_hash_objects
# NOTE: column_names will be determined by using the keys of the first hash in the array. If later hashes in the
# array have different keys an exception will be raised. If you have hashes to import with different sets of keys
# we recommend grouping these into batches before importing.
values = [ {author_name: 'zdennis', title: 'test post'}, {author_name: 'jdoe', title: 'another test post'} ]
BlogPost.import values

# Example using column_names and array_of_hash_objects
columns = [ :author_name, :title ]
values = [ {author_name: 'zdennis', title: 'test post'}, {author_name: 'jdoe', title: 'another test post'} ]
BlogPost.import columns, values

# Example using column_names and array_of_values
columns = [ :author_name, :title ]
values = [ [ 'zdennis', 'test post' ], [ 'jdoe', 'another test post' ] ]
BlogPost.import columns, values

# Example using column_names, array_of_values and options
columns = [ :author_name, :title ]
values = [ [ 'zdennis', 'test post' ], [ 'jdoe', 'another test post' ] ]
BlogPost.import( columns, values, validate: false  )

# Example synchronizing existing instances in memory
post = BlogPost.where(author_name: 'zdennis').first
puts post.author_name # => 'zdennis'
columns = [ :author_name, :title ]
values = [ [ 'yoda', 'test post' ] ]
BlogPost.import columns, values, synchronize: [ post ]
puts post.author_name # => 'yoda'

# Example synchronizing unsaved/new instances in memory by using a unique imported field
posts = [BlogPost.new(title: "Foo"), BlogPost.new(title: "Bar")]
BlogPost.import posts, synchronize: posts, synchronize_keys: [:title]
puts posts.first.persisted? # => true

On Duplicate Key Update (MySQL)

The :on_duplicate_key_update option can be either :all, an Array, or a Hash.

Using :all

The :on_duplicate_key_update option can be set to :all, in which case all columns other than the primary key are updated. To update only a subset of columns, supply an array of column names instead, as shown in the next form. Below is an example:

BlogPost.import columns, values, on_duplicate_key_update: :all

Using an Array

The :on_duplicate_key_update option can be an array of column names. The column names are the only fields that are updated if a duplicate record is found. Below is an example:

BlogPost.import columns, values, on_duplicate_key_update: [ :date_modified, :content, :author ]

Using a Hash

The :on_duplicate_key_update option can be a hash of column names to model attribute name mappings. This gives you finer grained control over what fields are updated with what attributes on your model. Below is an example:

BlogPost.import columns, attributes, on_duplicate_key_update: { title: :title }

On Duplicate Key Update (Postgres 9.5+ and SQLite 3.24+)

The :on_duplicate_key_update option can be :all, an Array, or a Hash with up to three attributes, :conflict_target (and optionally :index_predicate) or :constraint_name (Postgres), and :columns.

Using :all

The :on_duplicate_key_update option can be set to :all, in which case all columns other than the primary key are updated. To update only a subset of columns, supply an array of column names instead, as shown in the next form. Below is an example:

BlogPost.import columns, values, on_duplicate_key_update: :all

Using an Array

The :on_duplicate_key_update option can be an array of column names. This option only handles inserts that conflict with the primary key. If a table does not have a primary key, this will not work. The column names are the only fields that are updated if a duplicate record is found. Below is an example:

BlogPost.import columns, values, on_duplicate_key_update: [ :date_modified, :content, :author ]

Using a Hash

The :on_duplicate_key_update option can be a hash with up to three attributes, :conflict_target (and optionally :index_predicate) or :constraint_name, and :columns. Unlike MySQL, Postgres requires the conflicting constraint to be explicitly specified. Using this option allows you to specify a constraint other than the primary key.

:conflict_target

The :conflict_target attribute specifies the columns that make up the conflicting unique constraint and can be a single column or an array of column names. This attribute is ignored if :constraint_name is included, but it is the preferred method of identifying a constraint. It will default to the primary key. Below is an example:

BlogPost.import columns, values, on_duplicate_key_update: { conflict_target: [ :author_id, :slug ], columns: [ :date_modified ] }

:index_predicate

The :index_predicate attribute optionally specifies a WHERE condition on :conflict_target, which is required for matching against partial indexes. This attribute is ignored if :constraint_name is included. Below is an example:

BlogPost.import columns, values, on_duplicate_key_update: { conflict_target: [ :author_id, :slug ], index_predicate: 'status <> 0', columns: [ :date_modified ] }

:constraint_name

The :constraint_name attribute explicitly identifies the conflicting unique index by name. Postgres documentation discourages using this method of identifying an index unless absolutely necessary. Below is an example:

BlogPost.import columns, values, on_duplicate_key_update: { constraint_name: :blog_posts_pkey, columns: [ :date_modified ] }

:condition

The :condition attribute optionally specifies a WHERE condition on :conflict_action. Only rows for which this expression returns true will be updated. Note that it’s evaluated last, after a conflict has been identified as a candidate to update. Below is an example:

BlogPost.import columns, values, on_duplicate_key_update: { conflict_target: [ :author_id ], condition: "blog_posts.title NOT LIKE '%sample%'", columns: [ :author_name ] }

:columns

The :columns attribute can be either :all, an Array, or a Hash.

Using :all

The :columns attribute can be :all, in which case all columns other than the primary key will be updated. To update only a subset of columns, supply an array of column names instead, as shown in the next form. Below is an example:

BlogPost.import columns, values, on_duplicate_key_update: { conflict_target: :slug, columns: :all }

Using an Array

The :columns attribute can be an array of column names. The column names are the only fields that are updated if a duplicate record is found. Below is an example:

BlogPost.import columns, values, on_duplicate_key_update: { conflict_target: :slug, columns: [ :date_modified, :content, :author ] }

Using a Hash

The :columns option can be a hash of column names to model attribute name mappings. This gives you finer grained control over what fields are updated with what attributes on your model. Below is an example:

BlogPost.import columns, attributes, on_duplicate_key_update: { conflict_target: :slug, columns: { title: :title } }

Returns

This returns an object which responds to failed_instances, num_inserts, ids, and results (an example follows the list below).

  • failed_instances - an array of objects that failed validation and were not committed to the database. This is an empty array if no validations are performed.

  • num_inserts - the number of insert statements it took to import the data.

  • ids - the primary keys of the imported records, if the adapter supports it; otherwise an empty array.

  • results - the import results, if the adapter supports it; otherwise an empty array.
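
For example, a sketch reusing the BlogPost model from the Examples above and assuming a presence validation on author_name (not defined in this document):

columns = [ :author_name, :title ]
values  = [ [ 'zdennis', 'test post' ], [ nil, 'missing author' ] ]
result  = BlogPost.import columns, values

result.failed_instances # => records that failed validation and were not inserted
result.num_inserts      # => number of insert statements executed
result.ids              # => primary keys of the inserted rows, if the adapter supports it
result.results          # => adapter-specific results, otherwise an empty array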



# File 'lib/activerecord-import/import.rb', line 530

def bulk_import(*args)
  if args.first.is_a?( Array ) && args.first.first.is_a?(ActiveRecord::Base)
    # Form: Model.import(array_of_models [, options]) -- pull any trailing
    # options hash off the arguments, then import the model objects.
    options = {}
    options.merge!( args.pop ) if args.last.is_a?(Hash)

    models = args.first
    import_helper(models, options)
  else
    # Remaining forms (column names, arrays of values, hashes of attributes)
    # are passed straight through to import_helper.
    import_helper(*args)
  end
end