InstDataShipper

This gem facilitates the upload of LTI datasets to Instructure Hosted Data.

Installation

Add this line to your application's Gemfile:

gem 'inst_data_shipper'

Then run bundle install, followed by the migrations:

bundle exec rake db:migrate

Usage

Dumper

The main tool provided by this gem is the InstDataShipper::Dumper class. It is used to define a "Dump", which is a combination of tasks that export data and the schema describing it.

Here is an example Dumper implementation, wrapped in an ActiveJob job:

class HostedDataPushJob < ApplicationJob
  # The schema serves two purposes: defining the destination tables' structure and mapping data into them
  SCHEMA = InstDataShipper::SchemaBuilder.build do
    # You can augment the Table-builder DSL with custom methods like so:
    extend_table_builder do
      # It may be useful to define custom column-definition helpers:
      def custom_column(*args, from: nil, **kwargs, &blk)
        # In this example, the helper reads the value from a `data` jsonb column - without it, you'd need
        #   to define `from: ->(row) { row.data["<KEY>"] }` on each column that needs to read from the jsonb
        from ||= args[0].to_s
        if from.is_a?(String)
          # Capture the key in its own local; closing the lambda over the reassigned `from` would
          #   make it read `row.data[<the lambda itself>]`
          key = from
          from = ->(row) { row.data[key] }
        end
        column(*args, **kwargs, from: from, &blk)
      end

      # `extend_table_builder` uses `class_eval`, so you could alternatively write your helpers in a Concern or Module and include them like normal:
      include SomeConcern
    end

    table(ALocalModel, "<TABLE DESCRIPTION>") do
      # If you define a table as incremental, it'll only export changes made since the start of the last successful Dumper run
      #  The first argument "scope" can be interpreted in different ways:
      #    If exporting a local model it may be a: (default: `updated_at`)
      #      Proc that will receive a Relation and return a Relation (use `incremental_since`)
      #      String of a column to compare with `incremental_since`
      #    If exporting a Canvas report it may be a: (default: `updated_after`)
      #      Proc that will receive report params and return modified report params (use `incremental_since`)
      #      String of a report param to set to `incremental_since`
      #  `on:` is passed to Hosted Data and is used as the unique key. It may be an array to form a composite-key
      #  `if:` may be a Proc or a Symbol (of a method on the Dumper)
      incremental "updated_at", on: [:id], if: ->() {}

      # Schemas may declaratively define the data source.
      # This can be used for basic schemas where there's a 1:1 mapping between source table and destination table, and there is no conditional logic that needs to be performed.
      # In order to apply these statements, your Dumper must call `auto_enqueue_from_schema`.
      source :local_table
      # A Proc can also be passed. The below is equivalent to the above
      source ->(table_def) { import_local_table(table_def[:model] || table_def[:warehouse_name]) }

      column :name_in_destinations, :maybe_optional_sql_type, "Optional description of column"

      # The type may usually be omitted if `table()` is passed a Model class, but string columns are an exception to this
      custom_column :name, :"varchar(128)"

      # `from:` may be...
      # A Symbol of a method to be called on the record
      custom_column :sis_type, :"varchar(32)", from: :some_model_method
      # A String of a column to read from the record
      custom_column :sis_type, :"varchar(32)", from: "sis_source_type"
      # A Proc to be called with each record
      custom_column :sis_type, :"varchar(32)", from: ->(rec) { ... }
      # Not specified. Will default to using the Schema Column Name as a String ("sis_type" in this case)
      custom_column :sis_type, :"varchar(32)"
    end

    table("my_table", model: ALocalModel) do
      # ...
    end

    table("proserv_student_submissions_csv") do
      column :canvas_id, :bigint, from: "canvas user id"
      column :sis_id, :"varchar(64)", from: "sis user id"
      column :name, :"varchar(64)", from: "user name"
      column :submission_id, :bigint, from: "submission id"
    end
  end

  Dumper = InstDataShipper::Dumper.define(schema: SCHEMA, include: [
    InstDataShipper::DataSources::LocalTables,
    InstDataShipper::DataSources::CanvasReports,
  ]) do
    import_local_table(ALocalModel)
    import_canvas_report_by_terms("proserv_student_submissions_csv", terms: Term.all.pluck(:canvas_id))

    # If the report_name/Model don't directly match the Schema, a schema_name: parameter may be passed:
    import_local_table(SomeModel, schema_name: "my_table")
    import_canvas_report_by_terms("some_report", terms: Term.all.pluck(:canvas_id), schema_name: "my_table")

    # Iterate through the Tables defined in the Schema and apply any defined `source` statements.
    # This is the default behavior if `define()` is called without a block.
    auto_enqueue_from_schema
  end

  def perform
    Dumper.perform_dump([
      "hosted-data://<JWT>@<HOSTED DATA SERVER>?table_prefix=example",
      "s3://<access_key_id>:<access_key_secret>@<region>/<bucket>/<path>",
    ])
  end
end
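
Once the job is defined, a dump can be triggered like any other ActiveJob. For example (standard ActiveJob invocations; how you schedule it is up to your application):

# Enqueue the dump to run in the background
HostedDataPushJob.perform_later

# Or run it inline, e.g. while debugging
HostedDataPushJob.perform_now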

Dumpers may also be defined as a normal Ruby subclass:

class HostedDataPushJob < ApplicationJob
  SCHEMA = InstDataShipper::SchemaBuilder.build do
    # ...
  end

  class Dumper < InstDataShipper::Dumper
    include InstDataShipper::DataSources::LocalTables
    include InstDataShipper::DataSources::CanvasReports

    def enqueue_tasks
      import_local_table(ALocalModel)
      import_canvas_report_by_terms("proserv_student_submissions_csv", terms: Term.all.pluck(:canvas_id))

      # auto_enqueue_from_schema
    end

    def table_schemas
      SCHEMA
    end
  end

  def perform
    Dumper.perform_dump([
      "hosted-data://<JWT>@<HOSTED DATA SERVER>?table_prefix=example",
      "s3://<access_key_id>:<access_key_secret>@<region>/<bucket>/<path>",
    ])
  end
end

Destinations

This gem is mainly designed for use with Hosted Data, but it abstracts the destination logic to allow for other destinations/backends. Out of the box, support for Hosted Data and S3 is included.

Destinations are passed as URI-formatted strings. Passing Hashes is also supported, but the format/keys are destination specific.

Destinations blindly accept URI fragments (the # portion at the end of the URI). These options are not used internally, but are made available as dest.user_config. Ideally the fragment is formatted like query parameters (x=1&y=2), which will be parsed into a Hash; otherwise it can be any string.
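
For example, given the following destination (hypothetical fragment values):

hosted-data://<JWT>@<HOSTED DATA SERVER>?table_prefix=example#label=nightly&priority=low

dest.user_config would contain something like { "label" => "nightly", "priority" => "low" }.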

Hosted Data

hosted-data://<JWT>@<HOSTED DATA SERVER>

Optional Parameters:
  • table_prefix: A string prefixed onto each table name when declaring the schema in Hosted Data
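
For example, with a (hypothetical) table_prefix of example_, a schema table named submissions would be declared in Hosted Data as example_submissions:

hosted-data://<JWT>@<HOSTED DATA SERVER>?table_prefix=example_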

S3

s3://<access_key_id>:<access_key_secret>@<region>/<bucket>/<optional path>
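
For example (hypothetical credentials and bucket; a secret containing URI-reserved characters such as / or + would need to be URL-encoded):

s3://AKIAIOSFODNN7EXAMPLE:<access_key_secret>@us-east-1/my-data-bucket/dumps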

Optional Parameters:

None

Development

When adding to or updating this gem, make sure you do the following:

  • Update the yardoc comments where necessary, and confirm the changes by running yard server --reload
  • Write specs
  • If you modify the model or migration templates, run bundle exec rake update_test_schema to update them in the Rails Dummy application (and commit those changes)

Docs

Docs can be generated using yard. To view the docs:

  • Clone this gem's repository
  • bundle install
  • yard server --reload

The yard server will give you a URL you can visit to view the docs.