Class: SQA::DataFrame
- Inherits: Object
- Extended by: Forwardable
- Defined in:
  lib/sqa/data_frame.rb,
  lib/sqa/data_frame/data.rb,
  lib/sqa/data_frame/alpha_vantage.rb,
  lib/sqa/data_frame/yahoo_finance.rb
Overview
The website financial.yahoo.com no longer supports an API.
To get recent historical stock price updates you have
to scrape the webpage.
Defined Under Namespace
Classes: AlphaVantage, Data, YahooFinance
Instance Attribute Summary
- #data ⇒ Polars::DataFrame
  The underlying Polars DataFrame.
Class Method Summary
- .aofh_to_hofa(aofh, mapping: {}, transformers: {}) ⇒ Hash{String => Array}
  Converts an array of hashes to a hash of arrays.
- .from_aofh(aofh, mapping: {}, transformers: {}) ⇒ SQA::DataFrame
  Creates a DataFrame from an array of hashes.
- .from_csv_file(source, mapping: {}, transformers: {}) ⇒ SQA::DataFrame
  Creates a DataFrame from a CSV file.
- .from_json_file(source, mapping: {}, transformers: {}) ⇒ SQA::DataFrame
  Creates a DataFrame from a JSON file.
- .generate_mapping(keys) ⇒ Hash{String => Symbol}
  Generates a mapping of original keys to underscored keys.
- .is_date?(value) ⇒ Boolean
  Checks whether a value appears to be a date string.
- .load(source:, transformers: {}, mapping: {}) ⇒ SQA::DataFrame
  Loads a DataFrame from a file source. This is the primary method for loading persisted DataFrames.
- .normalize_keys(hash, adapter_mapping: {}) ⇒ Hash
  Normalizes all keys in a hash to snake_case format.
- .rename(hash, mapping) ⇒ Hash
  Renames keys in a hash according to a mapping.
- .underscore_key(key) ⇒ Symbol (also: sanitize_key)
  Converts a key string to underscored snake_case format.
Instance Method Summary
- #append!(other_df) ⇒ void (also: #concat!)
  Appends another DataFrame to this one in place.
- #apply_transformers!(transformers) ⇒ void
  Applies transformer functions to specified columns in place.
- #columns ⇒ Array<String>
  Returns the column names of the DataFrame.
- #concat_and_deduplicate!(other_df, sort_column: "timestamp", descending: false) ⇒ Object
  Concatenates another DataFrame, removes duplicates, and sorts. This is the preferred method for updating CSV data because it prevents duplicates.
- #fpl(column: 'adj_close_price', fpop: 14) ⇒ Array<Array<Float, Float>>
  FPL analysis: calculates future period loss/profit.
- #fpl_analysis(column: 'adj_close_price', fpop: 14) ⇒ Array<Hash>
  FPL analysis with risk metrics and classification.
- #initialize(raw_data = nil, mapping: {}, transformers: {}) ⇒ DataFrame (constructor)
  Creates a new DataFrame instance.
- #keys ⇒ Array<String> (also: #vectors)
  Returns the column names of the DataFrame.
- #method_missing(method_name, *args, &block) ⇒ Object
  Delegates unknown methods to the underlying Polars DataFrame.
- #ncols ⇒ Integer
  Returns the number of columns in the DataFrame.
- #rename_columns!(mapping) ⇒ void
  Renames columns according to the provided mapping in place.
- #respond_to_missing?(method_name, include_private = false) ⇒ Boolean
  Checks if the DataFrame responds to a method.
- #size ⇒ Integer (also: #nrows, #length)
  Returns the number of rows in the DataFrame.
- #to_csv(path_to_file) ⇒ void
  Writes the DataFrame to a CSV file.
- #to_h ⇒ Hash{Symbol => Array}
  Converts the DataFrame to a Ruby Hash.
Constructor Details
#initialize(raw_data = nil, mapping: {}, transformers: {}) ⇒ DataFrame
Creates a new DataFrame instance.
# File 'lib/sqa/data_frame.rb', line 47

def initialize(raw_data = nil, mapping: {}, transformers: {})
  @data = Polars::DataFrame.new(raw_data || [])

  # IMPORTANT: Rename columns FIRST, then apply transformers.
  # Transformers expect the renamed column names.
  rename_columns!(mapping)          unless mapping.empty?
  apply_transformers!(transformers) unless transformers.empty?
end
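The rename-then-transform ordering matters because transformers are keyed by the new column names. A standalone sketch of that ordering, using a plain hash-of-arrays and hypothetical column names instead of a Polars::DataFrame:

```ruby
# Illustrative only: mimics the constructor's rename-then-transform order
# on a plain Ruby hash, without requiring the Polars library.
raw          = { "Adj Close" => ["1.5", "2.5"] }           # raw vendor keys
mapping      = { "Adj Close" => "adj_close_price" }        # rename first...
transformers = { "adj_close_price" => ->(v) { Float(v) } } # ...then transform,
                                                           # keyed by the NEW name

renamed = raw.to_h { |k, vals| [mapping.fetch(k, k), vals] }
result  = renamed.to_h do |k, vals|
  fn = transformers[k]
  [k, fn ? vals.map(&fn) : vals]
end
# result => { "adj_close_price" => [1.5, 2.5] }
```

Had the transformer been keyed by the old name ("Adj Close"), it would never fire after the rename, which is why the constructor renames before transforming.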
Dynamic Method Handling
This class handles dynamic methods through the method_missing method
#method_missing(method_name, *args, &block) ⇒ Object
Delegates unknown methods to the underlying Polars DataFrame. This allows direct access to Polars methods like filter, select, etc.
# File 'lib/sqa/data_frame.rb', line 282

def method_missing(method_name, *args, &block)
  return super unless @data.respond_to?(method_name)
  @data.send(method_name, *args, &block)
end
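The same delegation pattern can be seen in isolation. Here `Wrapper` is a hypothetical stand-in wrapping a plain Array in place of the Polars::DataFrame:

```ruby
# Minimal sketch of delegate-via-method_missing: unknown calls are
# forwarded to the wrapped object when it responds to them.
class Wrapper
  def initialize(data)
    @data = data
  end

  def method_missing(method_name, *args, &block)
    return super unless @data.respond_to?(method_name)
    @data.send(method_name, *args, &block)
  end

  def respond_to_missing?(method_name, include_private = false)
    @data.respond_to?(method_name) || super
  end
end

w = Wrapper.new([3, 1, 2])
w.sort  # => [1, 2, 3]  (delegated to Array#sort)
w.sum   # => 6          (delegated to Array#sum)
```

Pairing `method_missing` with `respond_to_missing?` keeps `respond_to?` honest, which is why SQA::DataFrame defines both.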
Instance Attribute Details
#data ⇒ Polars::DataFrame
Returns the underlying Polars DataFrame.
# File 'lib/sqa/data_frame.rb', line 33

def data
  @data
end
Class Method Details
.aofh_to_hofa(aofh, mapping: {}, transformers: {}) ⇒ Hash{String => Array}
Converts array of hashes to hash of arrays format.
# File 'lib/sqa/data_frame.rb', line 425

def aofh_to_hofa(aofh, mapping: {}, transformers: {})
  hofa = Hash.new { |h, k| h[k.downcase] = [] }
  aofh.each { |entry| entry.each { |key, value| hofa[key.to_s.downcase] << value } }
  hofa
end
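The pivot it performs can be seen on plain data (the column names here are made up for illustration):

```ruby
# The same array-of-hashes → hash-of-arrays pivot, on plain Ruby data.
# Keys are downcased as they are collected, mirroring the method above.
aofh = [
  { "Open" => 1.0, "Close" => 2.0 },
  { "Open" => 3.0, "Close" => 4.0 }
]

hofa = Hash.new { |h, k| h[k] = [] }
aofh.each { |entry| entry.each { |key, value| hofa[key.to_s.downcase] << value } }
hofa  # => { "open" => [1.0, 3.0], "close" => [2.0, 4.0] }
```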
.from_aofh(aofh, mapping: {}, transformers: {}) ⇒ SQA::DataFrame
Creates a DataFrame from an array of hashes.
# File 'lib/sqa/data_frame.rb', line 327

def from_aofh(aofh, mapping: {}, transformers: {})
  return new({}, mapping: mapping, transformers: transformers) if aofh.empty?

  # Sanitize keys to strings and convert to hash of arrays (Polars-compatible format)
  aoh_sanitized = aofh.map { |entry| entry.transform_keys(&:to_s) }
  columns       = aoh_sanitized.first.keys

  # Convert array-of-hashes to hash-of-arrays for Polars
  hofa = columns.each_with_object({}) do |col, hash|
    hash[col] = aoh_sanitized.map { |row| row[col] }
  end

  df = Polars::DataFrame.new(hofa)
  new(df, mapping: mapping, transformers: transformers)
end
.from_csv_file(source, mapping: {}, transformers: {}) ⇒ SQA::DataFrame
Creates a DataFrame from a CSV file.
# File 'lib/sqa/data_frame.rb', line 349

def from_csv_file(source, mapping: {}, transformers: {})
  df = Polars.read_csv(source)
  new(df, mapping: mapping, transformers: transformers)
end
.from_json_file(source, mapping: {}, transformers: {}) ⇒ SQA::DataFrame
Creates a DataFrame from a JSON file.
# File 'lib/sqa/data_frame.rb', line 360

def from_json_file(source, mapping: {}, transformers: {})
  aofh = JSON.parse(File.read(source)).map { |entry| entry.transform_keys(&:to_s) }
  from_aofh(aofh, mapping: mapping, transformers: transformers)
end
.generate_mapping(keys) ⇒ Hash{String => Symbol}
Generates a mapping of original keys to underscored keys.
# File 'lib/sqa/data_frame.rb', line 369

def generate_mapping(keys)
  keys.each_with_object({}) do |key, hash|
    hash[key.to_s] = underscore_key(key.to_s)
  end
end
.is_date?(value) ⇒ Boolean
Checks if a value appears to be a date string.
# File 'lib/sqa/data_frame.rb', line 271

def self.is_date?(value)
  value.is_a?(String) && !/\d{4}-\d{2}-\d{2}/.match(value).nil?
end
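Its behavior on a few values, reproduced as a standalone lambda (the sample inputs are made up):

```ruby
# Standalone copy of the documented check: a String containing a
# YYYY-MM-DD pattern anywhere in it counts as date-like.
date_like = ->(value) { value.is_a?(String) && !/\d{4}-\d{2}-\d{2}/.match(value).nil? }

date_like.call("2024-01-15")           # => true
date_like.call("ts=2024-01-15T09:30")  # => true  (the pattern is unanchored)
date_like.call("Jan 15, 2024")         # => false
date_like.call(20_240_115)             # => false (not a String)
```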
.load(source:, transformers: {}, mapping: {}) ⇒ SQA::DataFrame
Loads a DataFrame from a file source. This is the primary method for loading persisted DataFrames.
Note: For cached CSV files, transformers and mapping should typically be empty since transformations were already applied when the data was first fetched. We only apply them if the CSV has old-format column names that need migration.
# File 'lib/sqa/data_frame.rb', line 308

def load(source:, transformers: {}, mapping: {})
  df = Polars.read_csv(source.to_s)

  # Auto-detect if CSV needs migration (has old column names like "open" instead of "open_price")
  # Only apply mapping if explicitly provided (for migration scenarios)
  new(df, mapping: mapping, transformers: transformers)
end
.normalize_keys(hash, adapter_mapping: {}) ⇒ Hash
Normalizes all keys in a hash to snake_case format.
# File 'lib/sqa/data_frame.rb', line 403

def normalize_keys(hash, adapter_mapping: {})
  hash    = rename(hash, adapter_mapping) unless adapter_mapping.empty?
  mapping = generate_mapping(hash.keys)
  rename(hash, mapping)
end
.rename(hash, mapping) ⇒ Hash
Renames keys in a hash according to a mapping.
# File 'lib/sqa/data_frame.rb', line 414

def rename(hash, mapping)
  mapping.each { |old_key, new_key| hash[new_key] = hash.delete(old_key) if hash.key?(old_key) }
  hash
end
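The same destructive rename on a plain hash (column names here are illustrative); note that mapping entries whose old key is absent are simply skipped:

```ruby
# Keys present in the mapping are renamed in place via delete/insert;
# keys absent from the mapping are left untouched.
hash    = { "Open" => [1.0], "Close" => [2.0] }
mapping = { "Open" => "open_price", "Volume" => "volume" }  # "Volume" not in hash

mapping.each { |old_key, new_key| hash[new_key] = hash.delete(old_key) if hash.key?(old_key) }
hash  # => { "Close" => [2.0], "open_price" => [1.0] }
```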
.underscore_key(key) ⇒ Symbol Also known as: sanitize_key
Converts a key string to underscored snake_case format.
# File 'lib/sqa/data_frame.rb', line 384

def underscore_key(key)
  key.to_s
     .gsub(/([A-Z]+)([A-Z][a-z])/, '\1_\2')
     .gsub(/([a-z\d])([A-Z])/, '\1_\2')
     .gsub(/[^a-zA-Z0-9]/, ' ')
     .squeeze(' ')
     .strip
     .tr(' ', '_')
     .downcase
     .to_sym
end
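A standalone copy of the same chain shows how typical vendor column headers (the sample strings are made up) collapse to snake_case symbols:

```ruby
# Each step of the sanitizing chain, annotated with what it handles.
def underscore_key(key)
  key.to_s
     .gsub(/([A-Z]+)([A-Z][a-z])/, '\1_\2')  # "ADJClose" -> "ADJ_Close"
     .gsub(/([a-z\d])([A-Z])/, '\1_\2')      # "adjClose" -> "adj_Close"
     .gsub(/[^a-zA-Z0-9]/, ' ')              # punctuation -> spaces
     .squeeze(' ')                           # collapse runs of spaces
     .strip
     .tr(' ', '_')
     .downcase
     .to_sym
end

underscore_key("Adj Close")  # => :adj_close
underscore_key("adjClose")   # => :adj_close
underscore_key("5. volume")  # => :"5_volume"
```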
Instance Method Details
#append!(other_df) ⇒ void Also known as: concat!
This method returns an undefined value.
Appends another DataFrame to this one in place.
# File 'lib/sqa/data_frame.rb', line 107

def append!(other_df)
  self_row_count  = @data.shape[0]
  other_row_count = other_df.data.shape[0]

  @data = if self_row_count == 0
            other_df.data
          else
            @data.vstack(other_df.data)
          end

  post_append_row_count = @data.shape[0]
  expected_row_count    = self_row_count + other_row_count
  return if post_append_row_count == expected_row_count

  raise "Append Error: expected #{expected_row_count}, got #{post_append_row_count}"
end
#apply_transformers!(transformers) ⇒ void
This method returns an undefined value.
Applies transformer functions to specified columns in place.
# File 'lib/sqa/data_frame.rb', line 65

def apply_transformers!(transformers)
  transformers.each do |col, transformer|
    col_name = col.to_s
    @data = @data.with_column(
      @data[col_name].apply(&transformer).alias(col_name)
    )
  end
end
#columns ⇒ Array<String>
Returns the column names of the DataFrame.
# File 'lib/sqa/data_frame.rb', line 176

def columns
  @data.columns
end
#concat_and_deduplicate!(other_df, sort_column: "timestamp", descending: false) ⇒ Object
Concatenates another DataFrame, removes duplicates, and sorts. This is the preferred method for updating CSV data because it prevents duplicates.
NOTE: TA-Lib requires data in ascending (oldest-first) order. Using descending: true will produce a warning and force ascending order to prevent silent calculation errors.
# File 'lib/sqa/data_frame.rb', line 152

def concat_and_deduplicate!(other_df, sort_column: "timestamp", descending: false)
  # Enforce ascending order for TA-Lib compatibility
  if descending
    warn "[SQA WARNING] TA-Lib requires ascending (oldest-first) order. Forcing descending: false"
    descending = false
  end

  # Concatenate the dataframes
  @data = if @data.shape[0] == 0
            other_df.data
          else
            @data.vstack(other_df.data)
          end

  # Remove duplicates based on sort_column, keeping first occurrence
  @data = @data.unique(subset: [sort_column], keep: "first")

  # Sort by the specified column (Polars uses 'reverse' for descending)
  @data = @data.sort(sort_column, reverse: descending)
end
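The concat → dedupe (keep first) → ascending sort pipeline can be sketched with plain arrays of hashes in place of Polars (the sample rows are made up):

```ruby
# Sketch of the pipeline on plain rows: for a duplicate timestamp the
# EXISTING row wins (keep: "first"), and the result is sorted ascending.
existing = [{ "timestamp" => "2024-01-02", "close" => 10.0 }]
incoming = [
  { "timestamp" => "2024-01-02", "close" => 99.0 },  # duplicate date, dropped
  { "timestamp" => "2024-01-01", "close" => 9.0 }
]

rows = (existing + incoming)
       .uniq { |row| row["timestamp"] }     # keep first occurrence per timestamp
       .sort_by { |row| row["timestamp"] }  # oldest-first, as TA-Lib expects
# rows => [{"timestamp"=>"2024-01-01", "close"=>9.0},
#          {"timestamp"=>"2024-01-02", "close"=>10.0}]
```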
#fpl(column: 'adj_close_price', fpop: 14) ⇒ Array<Array<Float, Float>>
FPL Analysis - Calculate Future Period Loss/Profit
# File 'lib/sqa/data_frame.rb', line 243

def fpl(column: 'adj_close_price', fpop: 14)
  prices = @data[column.to_s].to_a
  SQA::FPOP.fpl(prices, fpop: fpop)
end
#fpl_analysis(column: 'adj_close_price', fpop: 14) ⇒ Array<Hash>
FPL Analysis with risk metrics and classification
# File 'lib/sqa/data_frame.rb', line 261

def fpl_analysis(column: 'adj_close_price', fpop: 14)
  prices = @data[column.to_s].to_a
  SQA::FPOP.fpl_analysis(prices, fpop: fpop)
end
#keys ⇒ Array<String> Also known as: vectors
Returns the column names of the DataFrame. Alias for #columns.
# File 'lib/sqa/data_frame.rb', line 184

def keys
  @data.columns
end
#ncols ⇒ Integer
Returns the number of columns in the DataFrame.
# File 'lib/sqa/data_frame.rb', line 228

def ncols
  @data.width
end
#rename_columns!(mapping) ⇒ void
This method returns an undefined value.
Renames columns according to the provided mapping in place.
# File 'lib/sqa/data_frame.rb', line 82

def rename_columns!(mapping)
  # Normalize mapping keys to strings for consistent lookup;
  # mapping can have string or symbol keys, columns are always strings
  string_mapping = mapping.transform_keys(&:to_s)

  rename_mapping = @data.columns.each_with_index.map do |col, _|
    # Try exact match first, then lowercase match
    new_name = string_mapping[col] || string_mapping[col.downcase] || col
    # Polars requires both keys and values to be strings
    [col, new_name.to_s]
  end.to_h

  @data = @data.rename(rename_mapping)
end
#respond_to_missing?(method_name, include_private = false) ⇒ Boolean
Checks if the DataFrame responds to a method.
# File 'lib/sqa/data_frame.rb', line 292

def respond_to_missing?(method_name, include_private = false)
  @data.respond_to?(method_name) || super
end
#size ⇒ Integer Also known as: nrows, length
Returns the number of rows in the DataFrame.
# File 'lib/sqa/data_frame.rb', line 219

def size
  @data.height
end
#to_csv(path_to_file) ⇒ void
This method returns an undefined value.
Writes the DataFrame to a CSV file.
# File 'lib/sqa/data_frame.rb', line 212

def to_csv(path_to_file)
  @data.write_csv(path_to_file)
end
#to_h ⇒ Hash{Symbol => Array}
Converts the DataFrame to a Ruby Hash.
# File 'lib/sqa/data_frame.rb', line 196

def to_h
  @data.columns.map { |col| [col.to_sym, @data[col].to_a] }.to_h
end
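The resulting shape, shown with plain data (column names are illustrative): string column names become symbol keys, each mapping to that column's values as an Array.

```ruby
# Same column-names → symbol-keyed hash shape, built from plain data.
columns = { "open" => [1.0, 2.0], "close" => [3.0, 4.0] }
h = columns.keys.map { |col| [col.to_sym, columns[col]] }.to_h
h  # => { open: [1.0, 2.0], close: [3.0, 4.0] }
```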