Class: Remi::DataSubject

Inherits:
Object
Defined in:
lib/remi/data_subject.rb

Overview

The DataSubject is the parent class for DataSource and DataTarget. It is not intended to be used as a standalone class.

A DataSubject is either a source or a target. It is largely used to associate a dataframe with a set of "fields" containing metadata describing how the vectors of the dataframe are meant to be interpreted. For example, one of the fields might represent a date with MM-DD-YYYY format.

DataSubjects can be defined either with the standard DataSubject.new(<args>) convention or through a DSL, which is convenient for data subjects defined as part of a job class definition.
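
As a rough illustration of the two conventions, the sketch below uses a hypothetical stand-in class, SketchSubject, rather than Remi itself. Its constructor signature and the combined reader/writer style of fields follow the methods documented on this page; the use of instance_exec to evaluate the block is an assumption made for the sketch.

```ruby
# Hypothetical stand-in for illustration only; this is not Remi's implementation.
class SketchSubject
  attr_reader :name

  def initialize(context = nil, name: 'NOT DEFINED', &block)
    @context = context
    @name = name
    @block = block
    @fields = {}
  end

  # Combined reader/writer: returns the metadata when called without an argument.
  def fields(arg = nil)
    return @fields unless arg
    @fields = arg
  end

  # Evaluates the stored block with the subject as self (assumed mechanism).
  def dsl_eval
    instance_exec(&@block) if @block
    self
  end
end

# Convention 1: construct the subject, then configure it.
plain = SketchSubject.new(name: :customers)
plain.fields({ signup_date: { type: :date } })

# Convention 2: configure the subject inside a block that is evaluated later.
via_dsl = SketchSubject.new(name: :customers) do
  fields({ signup_date: { type: :date } })
end
via_dsl.dsl_eval

puts plain.fields == via_dsl.fields
```

Both subjects end up with the same field metadata; the block form only defers the configuration until dsl_eval is called.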

Direct Known Subclasses

DataSource, DataTarget

Defined Under Namespace

Modules: CsvFile, Postgres, Salesforce

Constructor Details

#initialize(context = nil, name: 'NOT DEFINED', **kargs, &block) ⇒ DataSubject

Returns a new instance of DataSubject.

Parameters:

  • context (Object) (defaults to: nil)

    the context in which the DSL is evaluated

  • name (Symbol, String) (defaults to: 'NOT DEFINED')

    the name of the data subject

  • block (Proc)

    a block of code to be executed to define the data subject



# File 'lib/remi/data_subject.rb', line 19

def initialize(context=nil, name: 'NOT DEFINED', **kargs, &block)
  @context = context
  @name = name
  @block = block
  @df_type = :daru
  @fields = Remi::Fields.new
  @field_symbolizer = Remi::FieldSymbolizers[:standard]
end

Instance Attribute Details

#context ⇒ Object

Returns the value of attribute context.



# File 'lib/remi/data_subject.rb', line 28

def context
  @context
end

#name ⇒ Object

Returns the value of attribute name.



# File 'lib/remi/data_subject.rb', line 28

def name
  @name
end

Instance Method Details

#df ⇒ Remi::DataFrame

Returns the dataframe associated with this DataSubject.

Returns:

  • (Remi::DataFrame)



# File 'lib/remi/data_subject.rb', line 66

def df
  @dataframe ||= Remi::DataFrame.create(df_type, [], order: fields.keys)
end

#df=(new_dataframe) ⇒ Remi::DataFrame

Reassigns the dataframe associated with this DataSubject.

Parameters:

  • new_dataframe (Object)

    The new dataframe object to be associated.

Returns:

  • (Remi::DataFrame)



# File 'lib/remi/data_subject.rb', line 73

def df=(new_dataframe)
  if new_dataframe.respond_to? :df_type
    @dataframe = new_dataframe
  else
    @dataframe = Remi::DataFrame.create(df_type, new_dataframe)
  end
end
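
The branch on respond_to?(:df_type) is a duck-typing check: an object that already behaves like a Remi dataframe is stored as-is, and anything else is wrapped. A minimal sketch of that pattern, using hypothetical stand-ins (WrappedFrame and FrameHolder are not part of Remi):

```ruby
# Hypothetical stand-in for a dataframe that knows its own type.
WrappedFrame = Struct.new(:rows) do
  def df_type
    :sketch
  end
end

# Hypothetical holder that mirrors the shape of the df= setter above.
class FrameHolder
  attr_reader :dataframe

  def df=(new_dataframe)
    @dataframe =
      if new_dataframe.respond_to?(:df_type)
        new_dataframe                   # already a dataframe: keep the same object
      else
        WrappedFrame.new(new_dataframe) # raw data: wrap it first
      end
  end
end

holder = FrameHolder.new

existing = WrappedFrame.new([[1, 2]])
holder.df = existing
puts holder.dataframe.equal?(existing)

holder.df = [[3, 4]]
puts holder.dataframe.class
```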

#df_type(arg = nil) ⇒ Symbol

Returns the type of dataframe (defaults to :daru if not explicitly set).

Parameters:

  • arg (Symbol) (defaults to: nil)

    sets the type of dataframe to use for this subject

Returns:

  • (Symbol)

    the type of dataframe (defaults to :daru if not explicitly set)



# File 'lib/remi/data_subject.rb', line 33

def df_type(arg = nil)
  return get_df_type unless arg
  set_df_type arg
end

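#dsl_eval ⇒ self

Defines the subject by evaluating the DSL block provided to the constructor. The block is evaluated at most once; later calls return self without evaluating it again.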

Returns:

  • (self)


# File 'lib/remi/data_subject.rb', line 103

def dsl_eval
  dsl_eval! unless @dsl_evaluated
  @dsl_evaluated = true
  self
end

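#dsl_eval! ⇒ Object

Evaluates the DSL block unconditionally, even if it has already been evaluated. Returns self if no block was given.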



# File 'lib/remi/data_subject.rb', line 109

def dsl_eval!
  return self unless @block
  Dsl.dsl_eval(self, @context, &@block)
end
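
The difference between dsl_eval and dsl_eval! can be sketched with a hypothetical stand-in class (OnceEvaluator is not part of Remi). The guard flag follows the source shown above; the call counter and the use of instance_exec in place of Dsl.dsl_eval are assumptions added for the illustration.

```ruby
# Hypothetical stand-in showing the evaluate-once guard.
class OnceEvaluator
  attr_reader :calls

  def initialize(&block)
    @block = block
    @calls = 0
  end

  # Evaluates the block only on the first call.
  def dsl_eval
    dsl_eval! unless @dsl_evaluated
    @dsl_evaluated = true
    self
  end

  # Evaluates the block every time it is called.
  def dsl_eval!
    return self unless @block
    @calls += 1                # counter added only to make the behavior visible
    instance_exec(&@block)     # assumed stand-in for Dsl.dsl_eval
    self
  end
end

subject = OnceEvaluator.new { @configured = true }
subject.dsl_eval.dsl_eval
puts subject.calls   # 1: the second dsl_eval was a no-op

subject.dsl_eval!
puts subject.calls   # 2: the bang version always re-evaluates
```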

#enforce_types(*types) ⇒ self

Enforces the types defined in the field metadata and raises an error if a data element does not conform to its type. For example, if a field has metadata with type: :date, the type enforcer converts the data in that field into dates and raises an error if it is unable to parse any of the values. Fields that have no matching vector in the dataframe are skipped.

Parameters:

  • types (Array<Symbol>)

    the metadata types to enforce. If none are given, all types are enforced.

Returns:

  • (self)


# File 'lib/remi/data_subject.rb', line 90

def enforce_types(*types)
  sttm = SourceToTargetMap.new(df, source_metadata: fields)
  fields.keys.each do |field|
    next unless (types.size == 0 || types.include?(fields[field][:type])) && df.vectors.include?(field)
    sttm.source(field).target(field).transform(Remi::Transform::EnforceType.new).execute
  end

  self
end
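
To make the filtering and failure behavior concrete, here is a plain-Ruby sketch that does not use Remi. The names sketch_enforce_types, SKETCH_FIELDS, and SKETCH_CONVERTERS are hypothetical, columns are modeled as a Hash of arrays instead of a dataframe, and the MM-DD-YYYY date format is assumed from the example in the overview.

```ruby
require 'date'

# Hypothetical field metadata, keyed by field name.
SKETCH_FIELDS = {
  signup_date: { type: :date },
  plan:        { type: :string }
}.freeze

# Hypothetical converters, one per metadata type.
SKETCH_CONVERTERS = {
  date:   ->(value) { Date.strptime(value, '%m-%d-%Y') }, # raises on unparseable input
  string: ->(value) { value.to_s }
}.freeze

# Returns a new Hash of columns with the requested types enforced.
# With no types given, every type that has a converter is enforced.
def sketch_enforce_types(columns, fields, *types)
  columns.to_h do |field, values|
    type = fields.dig(field, :type)
    enforce = SKETCH_CONVERTERS.key?(type) && (types.empty? || types.include?(type))
    [field, enforce ? values.map { |v| SKETCH_CONVERTERS[type].call(v) } : values]
  end
end

raw = { signup_date: ['01-31-2016', '02-29-2016'], plan: %w[basic pro] }
typed = sketch_enforce_types(raw, SKETCH_FIELDS, :date)
puts typed[:signup_date].first.iso8601

begin
  sketch_enforce_types({ signup_date: ['13-45-2016'] }, SKETCH_FIELDS)
rescue ArgumentError => e
  puts "could not enforce type: #{e.message}"
end
```

Passing :date restricts enforcement to date fields, so the plan column is returned untouched; the second call fails because the value cannot be parsed as a date.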

#field_symbolizer(arg = nil) ⇒ Proc

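Field symbolizer used to convert field names into symbols. Pass a Symbol to select a registered symbolizer from Remi::FieldSymbolizers, or pass a Proc to use it directly; with no argument, the current symbolizer is returned. This method sets the symbolizer for the data subject and also sets the symbolizers for any associated parsers and encoders.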

Returns:

  • (Proc)

    the method for symbolizing field names



# File 'lib/remi/data_subject.rb', line 56

def field_symbolizer(arg = nil)
  return @field_symbolizer unless arg
  @field_symbolizer = if arg.is_a? Symbol
                        Remi::FieldSymbolizers[arg]
                      else
                        arg
                      end
end
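
The Symbol-or-callable dispatch above can be sketched without Remi as follows. SKETCH_SYMBOLIZERS and SymbolizerHolder are hypothetical, and the behavior of the :standard entry here is an assumption for illustration, not the definition Remi ships.

```ruby
# Hypothetical registry; the real Remi::FieldSymbolizers may behave differently.
SKETCH_SYMBOLIZERS = {
  standard: ->(name) { name.to_s.strip.downcase.gsub(/\W+/, '_').to_sym }
}.freeze

# Hypothetical holder mirroring the combined reader/writer shown above.
class SymbolizerHolder
  def field_symbolizer(arg = nil)
    return @field_symbolizer unless arg
    @field_symbolizer = arg.is_a?(Symbol) ? SKETCH_SYMBOLIZERS.fetch(arg) : arg
  end
end

holder = SymbolizerHolder.new

# Select a registered symbolizer by name.
holder.field_symbolizer(:standard)
puts holder.field_symbolizer.call('First Name').inspect

# Or supply a custom callable directly.
holder.field_symbolizer(->(name) { name.to_s.upcase.to_sym })
puts holder.field_symbolizer.call('First Name').inspect
```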

#fields(arg = nil) ⇒ Remi::Fields

Returns the field metadata for this data subject.

Parameters:

  • arg (Hash, Remi::Fields) (defaults to: nil)

    set the field metadata for this data subject

Returns:

  • (Remi::Fields)

    the field metadata for this data subject



# File 'lib/remi/data_subject.rb', line 40

def fields(arg = nil)
  return get_fields unless arg
  set_fields arg
end

#fields=(arg) ⇒ Remi::Fields

Sets the field metadata for this data subject.

Parameters:

  • arg (Hash, Remi::Fields)

    set the field metadata for this data subject

Returns:

  • (Remi::Fields)

    the field metadata for this data subject



# File 'lib/remi/data_subject.rb', line 47

def fields=(arg)
  @fields = Remi::Fields.new(arg)
end