Class: SQA::DataFrame

Inherits:
Object
Extended by:
Forwardable
Defined in:
lib/sqa/data_frame.rb,
lib/sqa/data_frame/data.rb,
lib/sqa/data_frame/alpha_vantage.rb,
lib/sqa/data_frame/yahoo_finance.rb

Overview

The website financial.yahoo.com no longer supports an API.

To get recent historical stock price updates, you have
to scrape the webpage.

Defined Under Namespace

Classes: AlphaVantage, Data, YahooFinance

Instance Attribute Summary

Class Method Summary

Instance Method Summary

Constructor Details

#initialize(raw_data = nil, mapping: {}, transformers: {}) ⇒ DataFrame

Creates a new DataFrame instance.

Examples:

With column mapping

df = SQA::DataFrame.new(data, mapping: { "Close" => "close_price" })

With transformers

df = SQA::DataFrame.new(data, transformers: { "price" => ->(v) { v.to_f } })

Parameters:

  • raw_data (Hash, Array, Polars::DataFrame, nil) (defaults to: nil)

    Initial data for the DataFrame

  • mapping (Hash) (defaults to: {})

    Column name mappings to apply (old_name => new_name)

  • transformers (Hash) (defaults to: {})

    Column transformers to apply (column => lambda)



# File 'lib/sqa/data_frame.rb', line 47

def initialize(raw_data = nil, mapping: {}, transformers: {})
  @data = Polars::DataFrame.new(raw_data || [])

  # IMPORTANT: Rename columns FIRST, then apply transformers
  # Transformers expect renamed column names
  rename_columns!(mapping) unless mapping.empty?
  apply_transformers!(transformers) unless transformers.empty?
end
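
A minimal sketch of how the two options interact; columns are renamed first, so transformers are keyed by the new column names (the column names and values here are hypothetical):

raw = { "Close" => ["100.0", "101.5"], "Volume" => [1_000, 1_200] }
df  = SQA::DataFrame.new(
  raw,
  mapping:      { "Close" => "close_price" },
  transformers: { "close_price" => ->(v) { v.to_f } }  # keyed by the renamed column
)
df.columns  # => ["close_price", "Volume"]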

Dynamic Method Handling

This class handles dynamic methods through the method_missing method

#method_missing(method_name, *args, &block) ⇒ Object

Delegates unknown methods to the underlying Polars DataFrame. This allows direct access to Polars methods like filter, select, etc.

Parameters:

  • method_name (Symbol)

    Method name being called

  • args (Array)

    Method arguments

  • block (Proc)

    Optional block

Returns:

  • (Object)

    Result from Polars DataFrame method



# File 'lib/sqa/data_frame.rb', line 282

def method_missing(method_name, *args, &block)
  return super unless @data.respond_to?(method_name)
  @data.send(method_name, *args, &block)
end
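
A short sketch of the delegation, assuming df already wraps price data with a "close_price" column (names are illustrative):

df.shape            # delegated to Polars::DataFrame#shape
df.head(3)          # delegated to Polars::DataFrame#head
df["close_price"]   # delegated column access, returns a Polars::Series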

Instance Attribute Details

#data ⇒ Polars::DataFrame

Returns The underlying Polars DataFrame.

Returns:

  • (Polars::DataFrame)

    The underlying Polars DataFrame



# File 'lib/sqa/data_frame.rb', line 33

def data
  @data
end

Class Method Details

.aofh_to_hofa(aofh, mapping: {}, transformers: {}) ⇒ Hash{String => Array}

Converts array of hashes to hash of arrays format.

Parameters:

  • aofh (Array<Hash>)

    Array of hash records

  • mapping (Hash) (defaults to: {})

    Column name mappings (unused, for API compatibility)

  • transformers (Hash) (defaults to: {})

    Column transformers (unused, for API compatibility)

Returns:

  • (Hash{String => Array})

    Hash with column names as keys and arrays as values



# File 'lib/sqa/data_frame.rb', line 425

def aofh_to_hofa(aofh, mapping: {}, transformers: {})
  hofa = Hash.new { |h, k| h[k.downcase] = [] }
  aofh.each { |entry| entry.each { |key, value| hofa[key.to_s.downcase] << value } }
  hofa
end
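
A quick sketch of the conversion; note that keys are downcased and stringified:

rows = [
  { "Date" => "2024-01-02", "Close" => 101.5 },
  { "Date" => "2024-01-03", "Close" => 102.0 }
]
SQA::DataFrame.aofh_to_hofa(rows)
# => { "date" => ["2024-01-02", "2024-01-03"], "close" => [101.5, 102.0] }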

.from_aofh(aofh, mapping: {}, transformers: {}) ⇒ SQA::DataFrame

Creates a DataFrame from an array of hashes.

Examples:

data = [{ "date" => "2024-01-01", "price" => 100.0 }]
df = SQA::DataFrame.from_aofh(data)

Parameters:

  • aofh (Array<Hash>)

    Array of hash records

  • mapping (Hash) (defaults to: {})

    Column name mappings to apply

  • transformers (Hash) (defaults to: {})

    Column transformers to apply

Returns:

  • (SQA::DataFrame)

    New DataFrame built from the records



# File 'lib/sqa/data_frame.rb', line 327

def from_aofh(aofh, mapping: {}, transformers: {})
  return new({}, mapping: mapping, transformers: transformers) if aofh.empty?

  # Sanitize keys to strings and convert to hash of arrays (Polars-compatible format)
  aoh_sanitized = aofh.map { |entry| entry.transform_keys(&:to_s) }
  columns = aoh_sanitized.first.keys

  # Convert array-of-hashes to hash-of-arrays for Polars
  hofa = columns.each_with_object({}) do |col, hash|
    hash[col] = aoh_sanitized.map { |row| row[col] }
  end

  df = Polars::DataFrame.new(hofa)
  new(df, mapping: mapping, transformers: transformers)
end

.from_csv_file(source, mapping: {}, transformers: {}) ⇒ SQA::DataFrame

Creates a DataFrame from a CSV file.

Parameters:

  • source (String, Pathname)

    Path to CSV file

  • mapping (Hash) (defaults to: {})

    Column name mappings to apply

  • transformers (Hash) (defaults to: {})

    Column transformers to apply

Returns:

  • (SQA::DataFrame)

    New DataFrame loaded from the CSV file



# File 'lib/sqa/data_frame.rb', line 349

def from_csv_file(source, mapping: {}, transformers: {})
  df = Polars.read_csv(source)
  new(df, mapping: mapping, transformers: transformers)
end
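
A usage sketch, assuming a hypothetical CSV file with a "Close" column:

df = SQA::DataFrame.from_csv_file(
  "prices.csv",
  mapping: { "Close" => "close_price" }
)
df.columns  # => includes "close_price"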

.from_json_file(source, mapping: {}, transformers: {}) ⇒ SQA::DataFrame

Creates a DataFrame from a JSON file.

Parameters:

  • source (String, Pathname)

    Path to JSON file containing array of objects

  • mapping (Hash) (defaults to: {})

    Column name mappings to apply

  • transformers (Hash) (defaults to: {})

    Column transformers to apply

Returns:

  • (SQA::DataFrame)

    New DataFrame built from the parsed JSON records



# File 'lib/sqa/data_frame.rb', line 360

def from_json_file(source, mapping: {}, transformers: {})
  aofh = JSON.parse(File.read(source)).map { |entry| entry.transform_keys(&:to_s) }
  from_aofh(aofh, mapping: mapping, transformers: transformers)
end
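
A usage sketch, assuming a hypothetical JSON file containing an array of objects:

# prices.json: [{"date": "2024-01-02", "close": 101.5}, ...]
df = SQA::DataFrame.from_json_file("prices.json")
df.size  # => number of records in the file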

.generate_mapping(keys) ⇒ Hash{String => Symbol}

Generates a mapping of original keys to underscored keys.

Parameters:

  • keys (Array<String>)

    Original key names

Returns:

  • (Hash{String => Symbol})

    Mapping from original to underscored keys



# File 'lib/sqa/data_frame.rb', line 369

def generate_mapping(keys)
  keys.each_with_object({}) do |key, hash|
    hash[key.to_s] = underscore_key(key.to_s)
  end
end
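
Example of the generated mapping (output follows the underscore_key rules documented below):

SQA::DataFrame.generate_mapping(["Adj Close", "openPrice"])
# => { "Adj Close" => :adj_close, "openPrice" => :open_price }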

.is_date?(value) ⇒ Boolean

Checks if a value appears to be a date string.

Parameters:

  • value (Object)

    Value to check

Returns:

  • (Boolean)

    true if value matches YYYY-MM-DD format



# File 'lib/sqa/data_frame.rb', line 271

def self.is_date?(value)
  value.is_a?(String) && !/\d{4}-\d{2}-\d{2}/.match(value).nil?
end
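
For example:

SQA::DataFrame.is_date?("2024-01-15")   # => true
SQA::DataFrame.is_date?("not a date")   # => false
SQA::DataFrame.is_date?(20240115)       # => false (not a String)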

.load(source:, transformers: {}, mapping: {}) ⇒ SQA::DataFrame

Loads a DataFrame from a file source. This is the primary method for loading persisted DataFrames.

Note: For cached CSV files, transformers and mapping should typically be empty since transformations were already applied when the data was first fetched. We only apply them if the CSV has old-format column names that need migration.

Parameters:

  • source (String, Pathname)

    Path to CSV file

  • transformers (Hash) (defaults to: {})

    Column transformations to apply (usually not needed for cached data)

  • mapping (Hash) (defaults to: {})

    Column name mappings (usually not needed for cached data)

Returns:

  • (SQA::DataFrame)

    New DataFrame loaded from the cached CSV file



# File 'lib/sqa/data_frame.rb', line 308

def load(source:, transformers: {}, mapping: {})
  df = Polars.read_csv(source.to_s)

  # Auto-detect if CSV needs migration (has old column names like "open" instead of "open_price")
  # Only apply mapping if explicitly provided (for migration scenarios)
  new(df, mapping: mapping, transformers: transformers)
end
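
A usage sketch, assuming a previously cached CSV file (the path is hypothetical):

df = SQA::DataFrame.load(source: "aapl.csv")
df.size     # => number of cached rows
df.columns  # => column names as stored in the CSV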

.normalize_keys(hash, adapter_mapping: {}) ⇒ Hash

Normalizes all keys in a hash to snake_case format.

Parameters:

  • hash (Hash)

    Hash with keys to normalize

  • adapter_mapping (Hash) (defaults to: {})

    Optional pre-mapping to apply first

Returns:

  • (Hash)

    Hash with normalized keys



# File 'lib/sqa/data_frame.rb', line 403

def normalize_keys(hash, adapter_mapping: {})
  hash = rename(hash, adapter_mapping) unless adapter_mapping.empty?
  mapping = generate_mapping(hash.keys)
  rename(hash, mapping)
end
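
A quick sketch; note the result uses the symbol keys produced by underscore_key:

SQA::DataFrame.normalize_keys({ "Adj Close" => 101.5, "Volume" => 1_000 })
# => { adj_close: 101.5, volume: 1000 }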

.rename(hash, mapping) ⇒ Hash

Renames keys in a hash according to a mapping.

Parameters:

  • hash (Hash)

    Hash to modify

  • mapping (Hash)

    Old key to new key mapping

Returns:

  • (Hash)

    Modified hash



# File 'lib/sqa/data_frame.rb', line 414

def rename(hash, mapping)
  mapping.each { |old_key, new_key| hash[new_key] = hash.delete(old_key) if hash.key?(old_key) }
  hash
end
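
For example (keys not present in the mapping are left untouched, and the input hash is modified in place):

SQA::DataFrame.rename(
  { "Close" => 101.5, "volume" => 1_000 },
  { "Close" => "close_price" }
)
# => { "volume" => 1000, "close_price" => 101.5 }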

.underscore_key(key) ⇒ Symbol Also known as: sanitize_key

Converts a key string to underscored snake_case format.

Examples:

underscore_key("closePrice")  # => :close_price
underscore_key("Close Price") # => :close_price

Parameters:

  • key (String)

    Key to convert

Returns:

  • (Symbol)

    Underscored key as symbol



# File 'lib/sqa/data_frame.rb', line 384

def underscore_key(key)
  key.to_s
     .gsub(/([A-Z]+)([A-Z][a-z])/, '\1_\2')
     .gsub(/([a-z\d])([A-Z])/, '\1_\2')
     .gsub(/[^a-zA-Z0-9]/, ' ')
     .squeeze(' ')
     .strip
     .tr(' ', '_')
     .downcase
     .to_sym
end

Instance Method Details

#append!(other_df) ⇒ void Also known as: concat!

This method returns an undefined value.

Appends another DataFrame to this one in place.

Examples:

df1.append!(df2)

Parameters:

  • other_df (SQA::DataFrame)

    DataFrame to append

Raises:

  • (RuntimeError)

    If the resulting row count doesn't match the expected count



# File 'lib/sqa/data_frame.rb', line 107

def append!(other_df)
  self_row_count = @data.shape[0]
  other_row_count = other_df.data.shape[0]

  @data = if self_row_count == 0
            other_df.data
          else
            @data.vstack(other_df.data)
          end

  post_append_row_count = @data.shape[0]
  expected_row_count = self_row_count + other_row_count
  return if post_append_row_count == expected_row_count

  raise "Append Error: expected #{expected_row_count}, got #{post_append_row_count} "

end

#apply_transformers!(transformers) ⇒ void

This method returns an undefined value.

Applies transformer functions to specified columns in place.

Examples:

df.apply_transformers!({ "price" => ->(v) { v.to_f }, "volume" => ->(v) { v.to_i } })

Parameters:

  • transformers (Hash{String, Symbol => Proc})

    Column name to transformer mapping



# File 'lib/sqa/data_frame.rb', line 65

def apply_transformers!(transformers)
  transformers.each do |col, transformer|
    col_name = col.to_s
    @data = @data.with_column(
      @data[col_name].apply(&transformer).alias(col_name)
    )
  end
end

#columns ⇒ Array<String>

Returns the column names of the DataFrame.

Returns:

  • (Array<String>)

    List of column names



# File 'lib/sqa/data_frame.rb', line 176

def columns
  @data.columns
end

#concat_and_deduplicate!(other_df, sort_column: "timestamp", descending: false) ⇒ Object

Concatenates another DataFrame, removes duplicates, and sorts. This is the preferred method for updating cached CSV data because it prevents duplicate rows.

NOTE: TA-Lib requires data in ascending (oldest-first) order. Using descending: true will produce a warning and force ascending order to prevent silent calculation errors.

Examples:

Merge new data with deduplication

stock = SQA::Stock.new(ticker: 'AAPL')
df = stock.df
df.size  # => 252

# Fetch recent data (may have overlapping dates)
new_df = SQA::DataFrame::AlphaVantage.recent('AAPL', from_date: Date.today - 7)
df.concat_and_deduplicate!(new_df)
# Duplicates removed, data sorted ascending (oldest first)
df.size  # => 255 (only 3 new unique dates added)

Maintains TA-Lib compatibility

df.concat_and_deduplicate!(new_df)  # Sorted ascending automatically
prices = df["adj_close_price"].to_a
rsi = SQAI.rsi(prices, period: 14)  # Works correctly with ascending data

Parameters:

  • other_df (SQA::DataFrame)

    DataFrame to append

  • sort_column (String) (defaults to: "timestamp")

    Column to use for deduplication and sorting (default: “timestamp”)

  • descending (Boolean) (defaults to: false)

    Sort order - false for ascending (oldest first, TA-Lib compatible), true for descending



# File 'lib/sqa/data_frame.rb', line 152

def concat_and_deduplicate!(other_df, sort_column: "timestamp", descending: false)
  # Enforce ascending order for TA-Lib compatibility
  if descending
    warn "[SQA WARNING] TA-Lib requires ascending (oldest-first) order. Forcing descending: false"
    descending = false
  end

  # Concatenate the dataframes
  @data = if @data.shape[0] == 0
            other_df.data
          else
            @data.vstack(other_df.data)
          end

  # Remove duplicates based on sort_column, keeping first occurrence
  @data = @data.unique(subset: [sort_column], keep: "first")

  # Sort by the specified column (Polars uses 'reverse' for descending)
  @data = @data.sort(sort_column, reverse: descending)
end

#fpl(column: 'adj_close_price', fpop: 14) ⇒ Array<Array<Float, Float>>

FPL Analysis - Calculate Future Period Loss/Profit

Examples:

stock = SQA::Stock.new(ticker: 'AAPL')
fpl_data = stock.df.fpl(fpop: 10)

Parameters:

  • column (String, Symbol) (defaults to: 'adj_close_price')

    Column name containing prices (default: “adj_close_price”)

  • fpop (Integer) (defaults to: 14)

    Future Period of Performance (days to look ahead)

Returns:

  • (Array<Array<Float, Float>>)

    Array of [min_delta, max_delta] pairs



# File 'lib/sqa/data_frame.rb', line 243

def fpl(column: 'adj_close_price', fpop: 14)
  prices = @data[column.to_s].to_a
  SQA::FPOP.fpl(prices, fpop: fpop)
end

#fpl_analysis(column: 'adj_close_price', fpop: 14) ⇒ Array<Hash>

FPL Analysis with risk metrics and classification

Examples:

analysis = stock.df.fpl_analysis(fpop: 10)
analysis.first[:direction]  # => :UP, :DOWN, :UNCERTAIN, or :FLAT
analysis.first[:magnitude]  # => Average expected movement percentage
analysis.first[:risk]       # => Volatility range

Parameters:

  • column (String, Symbol) (defaults to: 'adj_close_price')

    Column name containing prices (default: “adj_close_price”)

  • fpop (Integer) (defaults to: 14)

    Future Period of Performance

Returns:

  • (Array<Hash>)

    Array of analysis hashes



# File 'lib/sqa/data_frame.rb', line 261

def fpl_analysis(column: 'adj_close_price', fpop: 14)
  prices = @data[column.to_s].to_a
  SQA::FPOP.fpl_analysis(prices, fpop: fpop)
end

#keys ⇒ Array<String> Also known as: vectors

Returns the column names of the DataFrame. Alias for #columns.

Returns:

  • (Array<String>)

    List of column names



# File 'lib/sqa/data_frame.rb', line 184

def keys
  @data.columns
end

#ncols ⇒ Integer

Returns the number of columns in the DataFrame.

Returns:

  • (Integer)

    Column count



# File 'lib/sqa/data_frame.rb', line 228

def ncols
  @data.width
end

#rename_columns!(mapping) ⇒ void

This method returns an undefined value.

Renames columns according to the provided mapping in place.

Examples:

df.rename_columns!({ "open" => "open_price", "close" => "close_price" })

Parameters:

  • mapping (Hash{String, Symbol => String})

    Old column name to new column name mapping



# File 'lib/sqa/data_frame.rb', line 82

def rename_columns!(mapping)
  # Normalize mapping keys to strings for consistent lookup
  # mapping can have string or symbol keys, columns are always strings
  string_mapping = mapping.transform_keys(&:to_s)

  rename_mapping = @data.columns.each_with_index.map do |col, _|
    # Try exact match first, then lowercase match
    new_name = string_mapping[col] || string_mapping[col.downcase] || col
    # Polars requires both keys and values to be strings
    [col, new_name.to_s]
  end.to_h

  @data = @data.rename(rename_mapping)
end

#respond_to_missing?(method_name, include_private = false) ⇒ Boolean

Checks if the DataFrame responds to a method.

Parameters:

  • method_name (Symbol)

    Method name to check

  • include_private (Boolean) (defaults to: false)

    Include private methods

Returns:

  • (Boolean)

    true if method is available



# File 'lib/sqa/data_frame.rb', line 292

def respond_to_missing?(method_name, include_private = false)
  @data.respond_to?(method_name) || super
end

#size ⇒ Integer Also known as: nrows, length

Returns the number of rows in the DataFrame.

Returns:

  • (Integer)

    Row count



# File 'lib/sqa/data_frame.rb', line 219

def size
  @data.height
end

#to_csv(path_to_file) ⇒ void

This method returns an undefined value.

Writes the DataFrame to a CSV file.

Examples:

Save stock data to CSV

stock = SQA::Stock.new(ticker: 'AAPL')
stock.df.to_csv('aapl_prices.csv')

Export with custom path

df.to_csv(Pathname.new('data/exports/prices.csv'))

Parameters:

  • path_to_file (String, Pathname)

    Path to output CSV file



# File 'lib/sqa/data_frame.rb', line 212

def to_csv(path_to_file)
  @data.write_csv(path_to_file)
end

#to_h ⇒ Hash{Symbol => Array}

Converts the DataFrame to a Ruby Hash.

Examples:

df.to_h  # => { timestamp: ["2024-01-01", ...], close_price: [100.0, ...] }

Returns:

  • (Hash{Symbol => Array})

    Hash with column names as keys and column data as arrays



# File 'lib/sqa/data_frame.rb', line 196

def to_h
  @data.columns.map { |col| [col.to_sym, @data[col].to_a] }.to_h
end