Class: FlexColumns::Contents::ColumnData

Inherits:
Object
Defined in:
lib/flex_columns/contents/column_data.rb

Overview

ColumnData is one of the core classes in flex_columns. An instance of ColumnData represents the data present in a single row for a single flex column; it stores that data, is used to set and retrieve that data, and can serialize itself to, and deserialize itself from, JSON (with headers and optional compression added for binary storage).

Clients do not interact with ColumnData itself; rather, they interact with an instance of a generated subclass of FlexColumnsContentsBase, and it delegates core methods to this object.

Constructor Details

#initialize(field_set, options = { }) ⇒ ColumnData

Creates a new instance. field_set is the FlexColumns::Definition::FieldSet that contains the set of fields defined for this flex column; options can contain:

:storage_string

The data present in the column in the database; this can be omitted if creating an instance for a row that has no data, or for a new row.

:data_source

Where did that data come from? This can be any object, but it must respond to two methods, neither of which takes arguments. #describe_flex_column_data_source should return a String that is used in raised exceptions to let the client know what data caused the problem. #notification_hash_for_flex_column_data_source should return a Hash that is used to generate the payload for the ActiveSupport::Notifications calls this class makes. (This is, in practice, always an instance of the FlexColumnsContentsBase subclass generated for the column.)

:unknown_fields

Must pass :preserve or :delete. If there are keys in the serialized JSON that do not correspond to any fields that the FieldSet knows about, this determines what will happen to that data when re-serializing it to save: :preserve keeps that data, while :delete removes it. (In neither case is that data actually accessible; you must declare a field if you want access to it.)

:length_limit

If present, specifies the maximum length of data that can be stored in the underlying storage mechanism (the column). When serializing data, this object will raise an exception if the serialized form is longer than this limit. This is used to avoid cases where the database might otherwise silently truncate the data being stored (I’m looking at you, MySQL) and hence corrupt stored data.

:storage

This must be :binary, :text, or :json. If :text, standard, uncompressed JSON will always be stored. (It is not possible to store compressed data reliably in a text column, because the database will interpret the bytes as characters and may modify them or raise an exception if byte sequences are present that would be invalid characters in whatever encoding it’s using.) If :binary, then a very small header will be written that’s just for versioning (currently FC:01,), followed by a marker indicating if it’s compressed (1,) or not (0,), followed by either standard, uncompressed JSON encoded in UTF-8 or the GZipped version of the same. If :json, then we assume the database has a native JSON type (like PostgreSQL with sufficiently-recent ActiveRecord and PG gem), and deal in an actual Hash, which the database processes directly.

:compress_if_over_length

If present, must be set to an integer. If :storage is :binary and the JSON string is at least this many bytes long, then this class will try compressing it when producing its stored data (in #to_stored_data); the compressed version is used only if it is at most 95% (MIN_SIZE_REDUCTION_RATIO_FOR_COMPRESSION) as long as the uncompressed version. Otherwise, the uncompressed version is stored.

:binary_header

Must be true or false. If false, then, even if :storage is :binary, no header will be written to the binary column. (As a consequence, compression will also be disabled, since compression requires the header.)

:null

Must be true or false. If false, assumes the underlying column in the database is defined as non-NULL (although this is not recommended), and therefore will set an empty string (“”) on the column if there’s no data in it, rather than SQL NULL.
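
As a rough illustration of the options above, the following sketch shows an object that satisfies the :data_source contract, together with an options Hash that would pass every check in #initialize. The class name ExampleDataSource, its attributes, and all option values are assumptions made for this example; only the two method names and the option keys come from the documentation.

```ruby
# Hypothetical stand-in for the object passed as :data_source. The class name
# and its attributes are assumptions; only the two method names are documented.
class ExampleDataSource
  def initialize(model_name, row_id, column_name)
    @model_name = model_name
    @row_id = row_id
    @column_name = column_name
  end

  # Used in exception messages to say which data caused the problem.
  def describe_flex_column_data_source
    "#{@model_name} ID #{@row_id}, column #{@column_name}"
  end

  # Used to build the payload for notification events.
  def notification_hash_for_flex_column_data_source
    { :model_name => @model_name, :row_id => @row_id, :column_name => @column_name }
  end
end

data_source = ExampleDataSource.new("User", 42, :preferences)

# An options Hash with one value for every accepted key. The values are
# illustrative only; they are not the library's defaults.
options = {
  :storage_string          => '{"theme":"dark"}',
  :data_source             => data_source,
  :unknown_fields          => :preserve,
  :length_limit            => 65_535,
  :storage                 => :binary,
  :compress_if_over_length => 200,
  :binary_header           => true,
  :null                    => true
}
```

Given a real FlexColumns::Definition::FieldSet in a variable named field_set, these options would be passed as FlexColumns::Contents::ColumnData.new(field_set, options).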

Raises:

  • (ArgumentError)


# File 'lib/flex_columns/contents/column_data.rb', line 56

def initialize(field_set, options = { })
  options.assert_valid_keys(:storage_string, :data_source, :unknown_fields, :length_limit, :storage,
    :compress_if_over_length, :binary_header, :null)

  @storage_string = options[:storage_string]
  @field_set = field_set
  @data_source = options[:data_source]
  @unknown_fields = options[:unknown_fields]
  @length_limit = options[:length_limit]
  @storage = options[:storage]
  @compress_if_over_length = options[:compress_if_over_length]
  @binary_header = options[:binary_header]
  @null = options[:null]

  raise ArgumentError, "Invalid JSON string: #{storage_string.inspect}" if storage_string && (! storage_string.kind_of?(String)) && (! storage_string.kind_of?(Hash))
  raise ArgumentError, "Must supply a FieldSet, not: #{field_set.inspect}" unless field_set.kind_of?(FlexColumns::Definition::FieldSet)
  raise ArgumentError, "Must supply a data source, not: #{data_source.inspect}" unless data_source
  raise ArgumentError, "Invalid value for :unknown_fields: #{unknown_fields.inspect}" unless [ :preserve, :delete ].include?(unknown_fields)
  raise ArgumentError, "Invalid value for :length_limit: #{length_limit.inspect}" if length_limit && (! (length_limit.kind_of?(Integer) && length_limit >= 8))
  raise ArgumentError, "Invalid value for :storage: #{storage.inspect}" unless [ :binary, :text, :json ].include?(storage)
  raise ArgumentError, "Invalid value for :compress_if_over_length: #{compress_if_over_length.inspect}" if compress_if_over_length && (! compress_if_over_length.kind_of?(Integer))
  raise ArgumentError, "Invalid value for :binary_header: #{binary_header.inspect}" unless [ true, false ].include?(binary_header)
  raise ArgumentError, "Invalid value for :null: #{null.inspect}" unless [ true, false ].include?(null)


  @field_contents_by_field_name = nil
  @unknown_field_contents_by_key = nil
end

Instance Method Details

#[](field_name) ⇒ Object

Returns the data for the given field_name. Raises FlexColumns::Errors::NoSuchFieldError if there is no field of the given name. Returns nil if there is such a field, but no data for it.



# File 'lib/flex_columns/contents/column_data.rb', line 87

def [](field_name)
  field_name = validate_and_deserialize_for_field(field_name)
  field_contents_by_field_name[field_name]
end

#[]=(field_name, new_value) ⇒ Object

Sets the data for the given field_name to the given new_value. Raises FlexColumns::Errors::NoSuchFieldError if there is no field of the given name. Returns new_value.



# File 'lib/flex_columns/contents/column_data.rb', line 94

def []=(field_name, new_value)
  field_name = validate_and_deserialize_for_field(field_name)

  # We do this for a very good reason. When encoding as JSON, Ruby's JSON library happily accepts Symbols, but
  # encodes them as simple Strings in the JSON. (This makes sense, because JSON doesn't support Symbols.) This
  # means that if you save a value in a flex column as a Symbol, and then re-read that row from the database,
  # you'll get back a String, not the Symbol you put in.
  #
  # Unfortunately, this is different from what you'll get if there is no intervening save/load cycle, where it'd
  # otherwise stay a Symbol. This difference in behavior can be the source of some really annoying bugs. While
  # ActiveRecord has this annoying behavior, this is a chance to clean it up in a small way -- so, if you set a
  # Symbol, we return a String. (And, yes, this has no bearing on Symbols stored nested inside Arrays or Hashes;
  # and that's OK.)
  new_value = new_value.to_s if new_value.kind_of?(Symbol)

  old_value = field_contents_by_field_name[field_name]

  # We deliberately delete from the hash anything that's being set to +nil+; this is so that we don't end up just
  # binding keys to +nil+, and returning them in #keys, etc. (Yes, this means that you can't distinguish a key
  # explicitly set to +nil+ from a key that's not present; this is different from Ruby's semantics for a Hash,
  # but not by very much, and it makes use of +flex_columns+ a whole lot simpler.)
  if new_value == nil
    field_contents_by_field_name.delete(field_name)
    nil
  else
    field_contents_by_field_name[field_name] = new_value
  end
end
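
The two rules described in the comments above (Symbols become Strings, and assigning nil removes the key) can be seen in isolation with a small sketch. TinyFieldStore is a hypothetical class written for this example; it is not part of flex_columns and omits field validation and deserialization.

```ruby
require 'json'

# Without normalization, a Symbol stays a Symbol in memory but comes back as
# a String after a JSON round trip.
in_memory = { "status" => :active }
reloaded  = JSON.parse(JSON.generate(in_memory))

# Hypothetical minimal store that mimics the two rules in #[]=.
class TinyFieldStore
  def initialize
    @contents = { }
  end

  def []=(name, value)
    # Normalize Symbols up front so behavior matches a save/load cycle.
    value = value.to_s if value.kind_of?(Symbol)

    if value.nil?
      # Setting nil removes the key, so it no longer shows up in #keys.
      @contents.delete(name)
      nil
    else
      @contents[name] = value
    end
  end

  def [](name)
    @contents[name]
  end

  def keys
    @contents.keys
  end
end

store = TinyFieldStore.new
store["status"] = :active
store["nickname"] = "bob"
store["nickname"] = nil
```

After these assignments, store["status"] is the String "active", which matches what reloaded["status"] holds, and store.keys contains only "status".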

#deserialized? ⇒ Boolean

Has this object been deserialized? If it’s been deserialized, then we need to do things like run validations on it, save it back to the database when someone calls #save! on the parent object, and so on.

Not at all obvious: originally, we had a method called #touched? that let you know whether the given object had been changed at all. It simply got set on #[]=, above. The problem with this is that very frequently, flex_columns is used to store complex data structures (because that’s one of the things that’s dramatically easier in a serialized JSON blob than in a traditional relational structure). But if you have an array stored, and you call #<< on it to append an element, then #[]= never gets called at all – because it’s still the same object, just with different contents.

We could have worked around this by saving off a copy of each field when we deserialized, then comparing them using a deep equality (#== should work just fine) to determine if they’ve changed. However, this adds very significant overhead to each and every single use of a flex_column object, whether or not you rely on or care about this kind of tracking – we would have to #dup every flex column field every single time we deserialized, and, if you have large objects in there, that can get extremely expensive.

Since almost every object in Ruby is mutable – even Strings – there aren’t really any easy wins here. Numbers are the only commonplace objects that aren’t, and it’s not going to be a common use case that someone uses a flex_column with fields that each simply store one single number. (Storing an array or a hash of numbers is much more common, but then you’re talking about Arrays and Hashes, which are back to being mutable.)

Another option would be to #freeze all of the fields on a flex column, thus requiring clients to reassign them with a new object if they wanted to change them at all. That, however, presents an API that most users would hate – I don’t want to say user.prefs_map = user.prefs_map.merge(:foo => bar); I want to just say user.prefs_map[:foo] = bar.

Instead, once we deserialize a field, we just assume that it has changed. While this may end up causing the client to do extra work at times, it’s much higher-performance than doing the tracking every time.

(There is definitely room to add code that would make this configurable, on a per-flex-column or even per-field basis. As always, patches are welcome; as of this writing, it seems likely that it might just not be an issue big enough to worry about.)
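
The problem with the original #touched? approach can be reproduced with a short sketch. TouchTrackingStore is a hypothetical class written for this example; it only illustrates why a flag set in #[]= misses in-place mutation.

```ruby
# Hypothetical tracker that sets a flag only when #[]= is called.
class TouchTrackingStore
  def initialize(contents)
    @contents = contents
    @touched = false
  end

  def [](name)
    @contents[name]
  end

  def []=(name, value)
    @touched = true
    @contents[name] = value
  end

  def touched?
    @touched
  end
end

store = TouchTrackingStore.new({ "tags" => [ "a", "b" ] })

# Appending mutates the existing Array in place; #[]= is never called, so the
# flag stays false even though the stored data has changed.
store["tags"] << "c"
touched_after_append = store.touched?

# Reassignment does go through #[]=, so the flag is finally set.
store["tags"] = store["tags"] + [ "d" ]
touched_after_assign = store.touched?
```

Here touched_after_append is false although the array now has three elements, which is exactly the case that treating every deserialized object as changed avoids.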

Returns:

  • (Boolean)


# File 'lib/flex_columns/contents/column_data.rb', line 177

def deserialized?
  !! field_contents_by_field_name
end

#keys ⇒ Object

Returns an Array of all field names that are currently set to something.



# File 'lib/flex_columns/contents/column_data.rb', line 124

def keys
  deserialize_if_necessary!
  field_contents_by_field_name.keys
end

#to_hash ⇒ Object

Returns a representation of this data as a Hash. This should not be used in flex_columns to manipulate data, as it does not contain a full representation of a column (in particular, unknown-field data is not represented in the returned Hash); however, it’s useful to construct a string (e.g., FlexColumnsContentsBase#inspect) to help with debugging.



# File 'lib/flex_columns/contents/column_data.rb', line 133

def to_hash
  deserialize_if_necessary!
  field_contents_by_field_name.dup.with_indifferent_access
end

#to_json ⇒ Object

Returns a String with the current contents of this object as JSON. (This will deserialize from JSON, if it hasn’t already happened.)

Always returns a string encoded in UTF-8, if we’re running on a Ruby >= 1.9 (that is, with encoding support).



# File 'lib/flex_columns/contents/column_data.rb', line 185

def to_json
  deserialize_if_necessary!

  json_hash = to_json_hash
  as_string = JSON.generate(json_hash, :allow_nan => true)
  as_string = as_string.encode(Encoding::UTF_8) if as_string.respond_to?(:encode)

  as_string
end
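
The effect of the :allow_nan option passed to JSON.generate above can be checked with the standard library alone. The sample data is invented for this example.

```ruby
require 'json'

data = { "ratio" => Float::NAN, "name" => "caf\u00e9" }

# With :allow_nan, NaN is written as the bare token NaN, which is not strict
# JSON but lets the value be stored rather than rejected.
lenient = JSON.generate(data, :allow_nan => true)

# Without :allow_nan, the generator raises instead of emitting NaN.
strict_error = begin
  JSON.generate(data)
  nil
rescue JSON::GeneratorError => e
  e
end

# Same re-encoding step as in #to_json; the result is tagged as UTF-8.
utf8 = lenient.encode(Encoding::UTF_8)
```

One consequence is that a strict JSON parser may refuse to read such output back; Ruby's JSON.parse accepts it only when also given :allow_nan => true.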

#to_stored_data ⇒ Object

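Returns the exact data that should be stored in the database. For :text and :binary storage this is a String (compressed or not, with header or not, etc.), or, when there is no data, nil if :null is true and an empty string otherwise. Raises FlexColumns::Errors::JsonTooLongError if the string is too long to fit in the database.

(If :storage is :json, that is, under PostgreSQL with appropriate ActiveRecord and PostgreSQL support, this returns a Hash instead, which the database processes directly.)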



# File 'lib/flex_columns/contents/column_data.rb', line 199

def to_stored_data
  out = nil

  deserialize_if_necessary!

  return to_json_hash if storage == :json

  instrument("serialize") do
    if storage == :json
      out = to_json_hash
    else
      out = to_json

      if out.length < 8 && out =~ /^\s*\{\s*\}\s*$/i
        out = @null ? nil : ""
      else
        out = to_binary_storage(out) if storage == :binary
      end
    end
  end

  actual_length = out ? out.length : 0
  if length_limit && actual_length > length_limit
    raise FlexColumns::Errors::JsonTooLongError.new(data_source, length_limit, out)
  end

  out
end
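
The binary layout that the :storage and :compress_if_over_length options describe (a version header, a compression marker, then plain or GZipped JSON) can be sketched as follows. This is not the library's to_binary_storage; the method and constant names are hypothetical, and only the "FC:01," header, the "1,"/"0," markers, and the 95% ratio are taken from the documentation. Zlib.gzip and Zlib.gunzip require Ruby 2.4 or later.

```ruby
require 'json'
require 'zlib'

# Hypothetical names; the values come from the option descriptions.
HEADER = "FC:01,"
MAX_COMPRESSED_RATIO = 0.95

# Builds the bytes for a binary column: header, marker, then the payload.
def pack_for_binary_column(json, compress_if_over_length)
  if compress_if_over_length && json.bytesize >= compress_if_over_length
    compressed = Zlib.gzip(json)
    # Keep the compressed form only if it is meaningfully smaller.
    if compressed.bytesize <= MAX_COMPRESSED_RATIO * json.bytesize
      return (HEADER + "1,").b + compressed
    end
  end

  (HEADER + "0,").b + json.b
end

# Reverses pack_for_binary_column and returns the JSON as a UTF-8 String.
def unpack_from_binary_column(stored)
  raise ArgumentError, "missing header" unless stored.start_with?(HEADER)

  marker = stored.byteslice(HEADER.bytesize, 2)
  body   = stored.byteslice((HEADER.bytesize + 2)..-1)
  json   = (marker == "1,") ? Zlib.gunzip(body) : body
  json.dup.force_encoding(Encoding::UTF_8)
end

short_json = '{"a":1}'
long_json  = JSON.generate({ "text" => "x" * 1000 })

packed_short = pack_for_binary_column(short_json, 100)  # below threshold, stored as-is
packed_long  = pack_for_binary_column(long_json, 100)   # compresses well, so GZipped
```

Because the marker sits inside the header, a reader can tell compressed from uncompressed rows without guessing, which is why disabling :binary_header also disables compression.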

#touch! ⇒ Object

Does nothing, other than making sure the JSON has been deserialized. This has the effect both of ensuring that the stored data (if any) is valid and of causing any unknown keys to be removed (on save) if :unknown_fields was set to :delete.



# File 'lib/flex_columns/contents/column_data.rb', line 141

def touch!
  deserialize_if_necessary!
end
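
The effect that #touch! has on unknown keys at save time can be pictured with a small sketch of the :unknown_fields policy. The reserialize helper below is hypothetical and written only for this example; it is not how ColumnData is implemented internally.

```ruby
require 'json'

# Hypothetical helper: parses stored JSON, separates keys that match declared
# fields from those that do not, and rebuilds the JSON to save according to
# the :unknown_fields policy (:preserve or :delete).
def reserialize(stored_json, known_field_names, unknown_fields)
  parsed = JSON.parse(stored_json)

  known, unknown = parsed.partition { |key, _value| known_field_names.include?(key) }.map(&:to_h)

  to_save = (unknown_fields == :preserve) ? known.merge(unknown) : known
  JSON.generate(to_save)
end

stored = '{"theme":"dark","legacy_flag":true}'

preserved = reserialize(stored, [ "theme" ], :preserve)
deleted   = reserialize(stored, [ "theme" ], :delete)
```

With :preserve the legacy_flag key is written back unchanged, while with :delete only theme survives; in both cases the unknown key is never exposed as a field.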