InstDataShipper
This gem is intended to facilitate easy upload of LTI datasets to Instructure Hosted Data.
Installation
Add this line to your application's Gemfile:
gem 'inst_data_shipper'
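And then execute:
bundle install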
Then run the migrations:
bundle exec rake db:migrate
Usage
Dumper
The main tool provided by this Gem is the InstDataShipper::Dumper class. It is used to define a "Dump", which is a combination of tasks and schema.
Here is an example Dumper implementation, wrapped in an ActiveJob job:
class HostedDataPushJob < ApplicationJob
  # The schema serves two purposes: defining the schema and mapping data
  SCHEMA = InstDataShipper::SchemaBuilder.build do
    # You can augment the Table-builder DSL with custom methods like so:
    extend_table_builder do
      # It may be useful to define custom column definition helpers:
      def custom_column(*args, from: nil, **kwargs, &blk)
        # In this example, the helper reads the value from a `data` jsonb column - without it, you'd need
        # to define `from: ->(row) { row.data["<KEY>"] }` on each column that needs to read from the jsonb
        from ||= args[0].to_s
        if from.is_a?(String)
          # Capture the key in its own local so the Proc doesn't reference the reassigned `from` variable
          key = from
          from = ->(row) { row.data[key] }
        end
        column(*args, **kwargs, from: from, &blk)
      end

      # `extend_table_builder` uses `class_eval`, so you could alternatively write your helpers in a Concern or Module and include them like normal:
      include SomeConcern
    end

    table(ALocalModel, "<TABLE DESCRIPTION>") do
      # If you define a table as incremental, it'll only export changes made since the start of the last successful Dumper run.
      # The first argument ("scope") can be interpreted in different ways:
      #   If exporting a local model it may be a: (default: `updated_at`)
      #     - Proc that will receive a Relation and return a Relation (use `incremental_since`)
      #     - String of a column to compare with `incremental_since`
      #   If exporting a Canvas report it may be a: (default: `updated_after`)
      #     - Proc that will receive report params and return modified report params (use `incremental_since`)
      #     - String of a report param to set to `incremental_since`
      # `on:` is passed to Hosted Data and is used as the unique key. It may be an array to form a composite key.
      # `if:` may be a Proc or a Symbol (of a method on the Dumper)
      incremental "updated_at", on: [:id], if: ->() {}

      # Schemas may declaratively define the data source.
      # This can be used for basic schemas where there's a 1:1 mapping between source table and destination table and no conditional logic needs to be performed.
      # In order to apply these statements, your Dumper must call `auto_enqueue_from_schema`.
      source :local_table
      # A Proc can also be passed. The below is equivalent to the above
      source ->(table_def) { import_local_table(table_def[:model] || table_def[:warehouse_name]) }

      column :name_in_destinations, :maybe_optional_sql_type, "Optional description of column"

      # The type may usually be omitted if `table()` is passed a Model class, but strings are an exception to this
      custom_column :name, :"varchar(128)"

      # `from:` may be...
      #   A Symbol of a method to be called on the record
      custom_column :sis_type, :"varchar(32)", from: :some_model_method
      #   A String of a column to read from the record
      custom_column :sis_type, :"varchar(32)", from: "sis_source_type"
      #   A Proc to be called with each record
      custom_column :sis_type, :"varchar(32)", from: ->(rec) { ... }
      #   Not specified. Defaults to using the Schema Column Name as a String ("sis_type" in this case)
      custom_column :sis_type, :"varchar(32)"
    end

    table("my_table", model: ALocalModel) do
      # ...
    end

    table("proserv_student_submissions_csv") do
      column :canvas_id, :bigint, from: "canvas user id"
      column :sis_id, :"varchar(64)", from: "sis user id"
      column :name, :"varchar(64)", from: "user name"
      column :submission_id, :bigint, from: "submission id"
    end
  end

  Dumper = InstDataShipper::Dumper.define(schema: SCHEMA, include: [
    InstDataShipper::DataSources::LocalTables,
    InstDataShipper::DataSources::CanvasReports,
  ]) do
    import_local_table(ALocalModel)
    import_canvas_report_by_terms("proserv_student_submissions_csv", terms: Term.all.pluck(:canvas_id))

    # If the report_name/Model don't directly match the Schema, a schema_name: parameter may be passed:
    import_local_table(SomeModel, schema_name: "my_table")
    import_canvas_report_by_terms("some_report", terms: Term.all.pluck(:canvas_id), schema_name: "my_table")

    # Iterate through the Tables defined in the Schema and apply any defined `source` statements.
    # This is the default behavior if `define()` is called without a block.
    auto_enqueue_from_schema
  end

  def perform
    Dumper.perform_dump([
      "hosted-data://<JWT>@<HOSTED DATA SERVER>?table_prefix=example",
      "s3://<access_key_id>:<access_key_secret>@<region>/<bucket>/<path>",
    ])
  end
end
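Since the job above is a standard ActiveJob, a dump can be kicked off like any other job. A minimal usage sketch for the example class above:
HostedDataPushJob.perform_later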
Dumpers may also be defined as a normal Ruby subclass:
class HostedDataPushJob < ApplicationJob
  SCHEMA = InstDataShipper::SchemaBuilder.build do
    # ...
  end

  class Dumper < InstDataShipper::Dumper
    include InstDataShipper::DataSources::LocalTables
    include InstDataShipper::DataSources::CanvasReports

    def enqueue_tasks
      import_local_table(ALocalModel)
      import_canvas_report_by_terms("proserv_student_submissions_csv", terms: Term.all.pluck(:canvas_id))
      # auto_enqueue_from_schema
    end

    def table_schemas
      SCHEMA
    end
  end

  def perform
    Dumper.perform_dump([
      "hosted-data://<JWT>@<HOSTED DATA SERVER>?table_prefix=example",
      "s3://<access_key_id>:<access_key_secret>@<region>/<bucket>/<path>",
    ])
  end
end
Destinations
This Gem is mainly designed for use with Hosted Data, but it tries to abstract that a little to allow for other destinations/backends. Out of the box, support for Hosted Data and S3 is included.
Destinations are passed as URI-formatted strings. Passing Hashes is also supported, but the format/keys are destination specific.
Destinations blindly accept URI Fragments (the # chunk at the end of the URI). These options are not used internally, but are made available to your code as dest.user_config. Ideally the fragment is in the same format as query parameters (x=1&y=2, which will be parsed into a Hash), but any string is accepted.
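For example, a fragment can be appended to a destination string to tag a dump for your own bookkeeping. The keys below are arbitrary illustrations, not recognized options; they only surface via dest.user_config:
"hosted-data://<JWT>@<HOSTED DATA SERVER>?table_prefix=example#label=nightly&triggered_by=cron"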
Hosted Data
hosted-data://<JWT>@<HOSTED DATA SERVER>
Optional Parameters:
- table_prefix: An optional string to prefix onto each table name in the schema when declaring the schema in Hosted Data
S3
s3://<access_key_id>:<access_key_secret>@<region>/<bucket>/<optional path>
Optional Parameters:
None
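An illustrative example with the placeholders filled in (all values below are fake):
"s3://AKIAEXAMPLEKEY:examplesecret@us-east-1/my-data-bucket/dumps/example_app"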
Development
When adding to or updating this gem, make sure you do the following:
- Update the yardoc comments where necessary, and confirm the changes by running yardoc --server
- Write specs
- If you modify the model or migration templates, run bundle exec rake update_test_schema to update them in the Rails Dummy application (and commit those changes)
Docs
Docs can be generated using yard. To view the docs:
- Clone this gem's repository
- bundle install
- yard server --reload
The yard server will give you a URL you can visit to view the docs.