Module: DaruLite::Category
- Defined in:
- lib/daru_lite/category.rb
Constant Summary
- UNDEFINED = Object.new.freeze
- CODING_SCHEMES = %i[dummy deviation helmert simple].freeze
Instance Attribute Summary
-
#base_category ⇒ Object
Returns the value of attribute base_category.
-
#coding_scheme ⇒ Object
Returns the value of attribute coding_scheme.
-
#index ⇒ Object
Returns the value of attribute index.
-
#name ⇒ Object
Returns the value of attribute name.
Instance Method Summary
-
#==(other) ⇒ Object
Two categorical vectors are equal if their sizes, indexes and corresponding values are the same. Returns true if the two vectors are equal, false otherwise.
-
#[](*indexes) ⇒ Object
Returns vector for indexes/positions specified.
-
#[]=(*indexes, val) ⇒ Object
Modifies values at specified indexes/positions.
-
#add_category(*new_categories) ⇒ Object
Associates a category to the vector.
-
#at(*positions) ⇒ Object
Returns vector for positions specified.
-
#categories ⇒ Array
(also: #order)
Returns all the categories with the inherent order.
-
#categories=(cat_with_order) ⇒ Object
Sets order of the categories.
-
#contrast_code(opts = {}) ⇒ DaruLite::DataFrame
Contrast codes the vector according to the coding scheme set.
-
#count(category = UNDEFINED) ⇒ Object
Returns frequency of given category.
-
#count_values(*values) ⇒ Integer
Count the number of values specified.
-
#describe ⇒ DaruLite::Vector
Summarizes the data using the following parameters: size (size of the data), categories (total number of categories), max_freq (maximum number of times a category occurs), max_category (the category that occurs most often), min_freq (minimum number of times a category occurs), min_category (the category that occurs least often).
-
#dup ⇒ DaruLite::Vector
Duplicates a vector.
-
#each ⇒ Enumerator
Returns an enumerator that enumerates on categorical data.
-
#frequencies(type = :count) ⇒ DaruLite::Vector
Returns a vector storing count/frequency of each category.
-
#include_values?(*values) ⇒ true, false
Check if any one of mentioned values occur in the vector.
-
#indexes(*values) ⇒ Array
Return indexes of values specified.
-
#initialize_category(data, opts = {}) ⇒ Object
Initializes a vector to store categorical data.
-
#max ⇒ object
Returns the maximum category according to the order specified.
-
#min ⇒ object
Returns the minimum category according to the order specified.
-
#ordered=(bool) ⇒ Object
Make categorical data ordered or unordered.
-
#ordered? ⇒ Boolean
Tells whether vector is ordered or not.
- #positions(*values) ⇒ Object
-
#reindex!(idx) ⇒ DaruLite::Vector
Sets new index for vector.
-
#reject_values(*values) ⇒ DaruLite::Vector
Return a vector with specified values removed.
-
#remove_unused_categories ⇒ DaruLite::Vector
Removes the unused categories.
-
#rename_categories(old_to_new) ⇒ Object
Rename categories.
-
#reorder!(order) ⇒ Object
Reorder the vector with given positions.
-
#replace_values(old_values, new_value) ⇒ DaruLite::Vector
Replaces specified values with a new value.
-
#set_at(positions, val) ⇒ Object
Modifies values at specified positions.
-
#size ⇒ Object
Size of categorical data.
- #sort ⇒ Object
-
#sort! ⇒ DaruLite::Vector
Sorts the vector in the order specified.
-
#to_a ⇒ Array
Returns all categorical data.
-
#to_category ⇒ DaruLite::Vector
Does nothing since it is already of type category.
-
#to_ints ⇒ Array
Returns integer coding for categorical data in the order starting from 0.
-
#to_non_category ⇒ DaruLite::Vector
Converts a category type vector to non category type vector.
-
#where(bool_array) ⇒ DaruLite::Vector
For querying the data.
Instance Attribute Details
#base_category ⇒ Object
Returns the value of attribute base_category.
    # File 'lib/daru_lite/category.rb', line 5

    def base_category
      @base_category
    end
#coding_scheme ⇒ Object
Returns the value of attribute coding_scheme.
    # File 'lib/daru_lite/category.rb', line 6

    def coding_scheme
      @coding_scheme
    end
#index ⇒ Object
Returns the value of attribute index.
    # File 'lib/daru_lite/category.rb', line 6

    def index
      @index
    end
#name ⇒ Object
Returns the value of attribute name.
    # File 'lib/daru_lite/category.rb', line 6

    def name
      @name
    end
Instance Method Details
#==(other) ⇒ Object
Two categorical vectors are equal if their sizes, indexes and corresponding values are the same. Returns true if the two vectors are equal, false otherwise.
    # File 'lib/daru_lite/category.rb', line 492

    def ==(other)
      size == other.size &&
        to_a == other.to_a &&
        index == other.index
    end
#[](*indexes) ⇒ Object
Note: This method accepts both indexes and positions. In case of a collision, the argument is treated as an index.
Returns the vector for the indexes/positions specified.
    # File 'lib/daru_lite/category.rb', line 177

    def [](*indexes)
      positions = @index.pos(*indexes)
      return category_from_position(positions) if positions.is_a? Integer

      DaruLite::Vector.new positions.map { |pos| category_from_position pos },
                           index: @index.subset(*indexes),
                           name: @name,
                           type: :category,
                           ordered: @ordered,
                           categories: categories
    end
#[]=(*indexes, val) ⇒ Object
Note: To add a new category, you first need to associate it with the vector via #add_category.
Modifies values at specified indexes/positions.
    # File 'lib/daru_lite/category.rb', line 231

    def []=(*indexes, val)
      positions = @index.pos(*indexes)
      if positions.is_a? Numeric
        modify_category_at positions, val
      else
        positions.each { |pos| modify_category_at pos, val }
      end

      self
    end
#add_category(*new_categories) ⇒ Object
Associates a category to the vector.
    # File 'lib/daru_lite/category.rb', line 110

    def add_category(*new_categories)
      new_categories -= categories
      add_extra_categories new_categories
    end
#at(*positions) ⇒ Object
Returns vector for positions specified.
    # File 'lib/daru_lite/category.rb', line 200

    def at(*positions)
      original_positions = positions
      positions = coerce_positions(*positions)
      validate_positions(*positions)

      return category_from_position(positions) if positions.is_a? Integer

      DaruLite::Vector.new positions.map { |pos| category_from_position(pos) },
                           index: @index.at(*original_positions),
                           name: @name,
                           type: :category,
                           ordered: @ordered,
                           categories: categories
    end
#categories ⇒ Array Also known as: order
Returns all the categories with the inherent order
    # File 'lib/daru_lite/category.rb', line 301

    def categories
      @cat_hash.keys
    end
#categories=(cat_with_order) ⇒ Object
If extra categories are specified, they get added too.
Sets order of the categories.
    # File 'lib/daru_lite/category.rb', line 315

    def categories=(cat_with_order)
      validate_categories(cat_with_order)
      add_extra_categories(cat_with_order - categories)
      order_with cat_with_order
    end
#contrast_code(opts = {}) ⇒ DaruLite::DataFrame
Note: To set the coding scheme, use #coding_scheme=.
Contrast codes the vector according to the coding scheme set.
    # File 'lib/daru_lite/category.rb', line 474

    def contrast_code(opts = {})
      if opts[:user_defined]
        user_defined_coding(opts[:user_defined])
      else
        # TODO: Make various coding schemes code DRY
        send(:"#{coding_scheme}_coding", opts[:full] || false)
      end
    end
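The `dummy_coding` helper that `contrast_code` dispatches to is not shown on this page. As a rough illustration of what dummy coding generally means, the plain-Ruby sketch below builds one 0/1 indicator column per category other than the base category. The function name `dummy_code` and the returned Hash are assumptions made for illustration; DaruLite returns a DaruLite::DataFrame, and its column naming may differ.

```ruby
# Textbook dummy coding, NOT DaruLite's implementation (hypothetical helper).
# Returns category => indicator column for every category except the base.
def dummy_code(data, categories, base_category)
  (categories - [base_category]).to_h do |cat|
    [cat, data.map { |value| value == cat ? 1 : 0 }]
  end
end

data = %i[a b c a]
coded = dummy_code(data, %i[a b c], :a)
p coded[:b] # indicator column for :b
p coded[:c] # indicator column for :c
```

The base category gets no column of its own: a row that is all zeros stands for the base category.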
#count(category = UNDEFINED) ⇒ Object
Returns frequency of given category
    # File 'lib/daru_lite/category.rb', line 124

    def count(category = UNDEFINED)
      return @cat_hash.values.sum(&:size) if category == UNDEFINED # count all

      raise ArgumentError, "Invalid category #{category}" unless categories.include?(category)

      @cat_hash[category].size
    end
#count_values(*values) ⇒ Integer
Count the number of values specified
    # File 'lib/daru_lite/category.rb', line 691

    def count_values(*values)
      values.filter_map { |v| @cat_hash[v].size if @cat_hash.include? v }
            .sum
    end
#describe ⇒ DaruLite::Vector
Summarizes the data using the following parameters:
- size: size of the data
- categories: total number of categories
- max_freq: maximum number of times a category occurs
- max_category: the category that occurs most often
- min_freq: minimum number of times a category occurs
- min_category: the category that occurs least often
    # File 'lib/daru_lite/category.rb', line 614

    def describe
      DaruLite::Vector.new(
        size: size,
        categories: categories.size,
        max_freq: @cat_hash.values.map(&:size).max,
        max_category: @cat_hash.keys.max_by { |cat| @cat_hash[cat].size },
        min_freq: @cat_hash.values.map(&:size).min,
        min_category: @cat_hash.keys.min_by { |cat| @cat_hash[cat].size }
      )
    end
#dup ⇒ DaruLite::Vector
Duplicates a vector.
    # File 'lib/daru_lite/category.rb', line 94

    def dup
      DaruLite::Vector.new to_a.dup,
                           name: @name,
                           index: @index.dup,
                           type: :category,
                           categories: categories,
                           ordered: ordered?
    end
#each ⇒ Enumerator
Returns an enumerator that enumerates on categorical data
    # File 'lib/daru_lite/category.rb', line 66

    def each
      return enum_for(:each) unless block_given?

      @array.each { |pos| yield cat_from_int pos }
      self
    end
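The `enum_for(:each) unless block_given?` idiom used by `#each` can be tried out in isolation. The sketch below uses a hypothetical `TinyCategorical` class (not part of DaruLite) that, like the listing above, stores integer codes and translates them back to categories while iterating.

```ruby
# Hypothetical miniature of a categorical container; not DaruLite's class.
class TinyCategorical
  include Enumerable

  def initialize(order, codes)
    @order = order # categories, in order
    @codes = codes # one integer code per element
  end

  # Without a block, hand back an Enumerator; with one, yield each category.
  def each
    return enum_for(:each) unless block_given?

    @codes.each { |code| yield @order[code] }
    self
  end
end

vec = TinyCategorical.new(%i[a b], [1, 0, 1])
p vec.each.class # Enumerator
p vec.to_a       # [:b, :a, :b]
```

Because `each` returns an Enumerator when called without a block, `to_a` can be defined as `each.to_a`, which is what `#to_a` on this page does.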
#frequencies(type = :count) ⇒ DaruLite::Vector
Returns a vector storing count/frequency of each category
    # File 'lib/daru_lite/category.rb', line 143

    def frequencies(type = :count)
      counts = @cat_hash.values.map(&:size)
      values =
        case type
        when :count
          counts
        when :fraction
          counts.map { |c| c / size.to_f }
        when :percentage
          counts.map { |c| c / size.to_f * 100 }
        else
          raise ArgumentError, 'Type should be either :count, :fraction or ' \
                               ":percentage. #{type} not supported."
        end
      DaruLite::Vector.new values, index: categories, name: name
    end
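To make the three `type` options concrete, here is a plain-Ruby sketch of the same arithmetic. It assumes a `cat_hash` that maps each category to the positions where it occurs and returns a plain Hash instead of a DaruLite::Vector; the standalone `frequencies` function is a hypothetical stand-in, not the library method.

```ruby
# Plain-Ruby stand-in; cat_hash maps category => positions (assumed layout).
def frequencies(cat_hash, type = :count)
  counts = cat_hash.values.map(&:size)
  size = counts.sum
  values =
    case type
    when :count then counts
    when :fraction then counts.map { |c| c / size.to_f }
    when :percentage then counts.map { |c| c / size.to_f * 100 }
    else raise ArgumentError, "#{type} not supported."
    end
  cat_hash.keys.zip(values).to_h
end

cat_hash = { a: [0, 2, 5, 6], b: [1, 4], c: [3, 7] }
p frequencies(cat_hash)
p frequencies(cat_hash, :fraction)
p frequencies(cat_hash, :percentage)
```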
#include_values?(*values) ⇒ true, false
Check if any one of mentioned values occur in the vector
    # File 'lib/daru_lite/category.rb', line 661

    def include_values?(*values)
      values.any? { |v| @cat_hash.include?(v) && !@cat_hash[v].empty? }
    end
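The second condition in the listing above matters because a category can be registered without ever occurring in the data. The following plain-Ruby sketch reproduces the check against an assumed `cat_hash` (category to positions); the standalone function is illustrative only.

```ruby
# cat_hash maps category => positions (assumed layout, not the real object).
def include_values?(cat_hash, *values)
  values.any? { |v| cat_hash.include?(v) && !cat_hash[v].empty? }
end

cat_hash = { a: [0, 2], b: [1], c: [] } # :c is a known category that never occurs
p include_values?(cat_hash, :c)      # false
p include_values?(cat_hash, :c, :a)  # true
p include_values?(cat_hash, :zzz)    # false
```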
#indexes(*values) ⇒ Array
Return indexes of values specified
    # File 'lib/daru_lite/category.rb', line 703

    def indexes(*values)
      values &= categories
      index.to_a.values_at(*values.flat_map { |v| @cat_hash[v] }.sort)
    end
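The difference between `#indexes` and `#positions` is easiest to see side by side. This plain-Ruby sketch assumes an index represented as an Array of labels and a `cat_hash` mapping each category to its positions; both standalone functions are stand-ins for the methods on this page.

```ruby
# Positions of the given values, ignoring values that are not categories.
def positions(cat_hash, *values)
  values &= cat_hash.keys
  values.flat_map { |v| cat_hash[v] }.sort
end

# Index labels found at those positions.
def indexes(index, cat_hash, *values)
  index.values_at(*positions(cat_hash, *values))
end

index = %w[r0 r1 r2 r3 r4]
cat_hash = { a: [0, 3], b: [1, 4], c: [2] }

p positions(cat_hash, :c, :a, :zzz)      # [0, 2, 3]
p indexes(index, cat_hash, :c, :a, :zzz) # ["r0", "r2", "r3"]
```

Results come back in positional order, not in the order the values were passed, because of the final `sort`.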
#initialize_category(data, opts = {}) ⇒ Object
Base category is set to the first category encountered in the vector.
Initializes a vector to store categorical data.
    # File 'lib/daru_lite/category.rb', line 28

    def initialize_category(data, opts = {})
      @type = :category
      initialize_core_attributes data

      if opts[:categories]
        validate_categories(opts[:categories])
        add_extra_categories(opts[:categories] - categories)
        order_with opts[:categories]
      end

      # Specify if the categories are ordered or not.
      # By default its unordered
      @ordered = opts[:ordered] || false

      # The coding scheme to code with. Default is dummy coding.
      @coding_scheme = :dummy

      # Base category which won't be present in the coding
      @base_category = @cat_hash.keys.first

      # Stores the name of the vector
      @name = opts[:name]

      # Index of the vector
      @index = coerce_index opts[:index]

      self
    end
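The listings on this page suggest a storage layout in which `@array` holds one integer code per element and `@cat_hash` maps each category to the positions where it occurs. `initialize_core_attributes` itself is not shown here, so the hypothetical `build_core` below is only a guess at that layout, written in plain Ruby.

```ruby
# Hypothetical stand-in for initialize_core_attributes (not shown on this page).
# Returns [cat_hash, array]:
#   cat_hash: category => positions where it occurs (insertion-ordered)
#   array:    integer code of the category at each position
def build_core(data)
  cat_hash = data.each_with_index
                 .group_by(&:first)
                 .transform_values { |pairs| pairs.map(&:last) }
  codes = cat_hash.keys.each_with_index.to_h
  array = data.map { |cat| codes[cat] }
  [cat_hash, array]
end

cat_hash, array = build_core(%i[a b a c b a])
base_category = cat_hash.keys.first # first category encountered

p cat_hash
p array         # [0, 1, 0, 2, 1, 0]
p base_category # :a
```

Under this layout, counting a category is a matter of taking the size of its position list, which matches how `#count` and `#frequencies` read `@cat_hash`.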
#max ⇒ object
Note: This operation only works if the vector is ordered. To make the vector ordered, set `vector.ordered = true`.
Returns the maximum category according to the order specified.
    # File 'lib/daru_lite/category.rb', line 395

    def max
      assert_ordered :max
      categories.last
    end
#min ⇒ object
Note: This operation only works if the vector is ordered. To make the vector ordered, set `vector.ordered = true`.
Returns the minimum category according to the order specified.
    # File 'lib/daru_lite/category.rb', line 381

    def min
      assert_ordered :min
      categories.first
    end
#ordered=(bool) ⇒ Object
Make categorical data ordered or unordered.
    # File 'lib/daru_lite/category.rb', line 289

    def ordered=(bool)
      @ordered = bool
    end
#ordered? ⇒ Boolean
Tells whether vector is ordered or not.
    # File 'lib/daru_lite/category.rb', line 278

    def ordered?
      @ordered
    end
#positions(*values) ⇒ Object
    # File 'lib/daru_lite/category.rb', line 731

    def positions(*values)
      values &= categories
      values.flat_map { |v| @cat_hash[v] }.sort
    end
#reindex!(idx) ⇒ DaruLite::Vector
Note: Unlike #reorder!, which takes positions as input, this method takes an index as input to reorder the vector.
Sets new index for vector. Preserves index->value correspondence.
    # File 'lib/daru_lite/category.rb', line 546

    def reindex!(idx)
      idx = DaruLite::Index.new idx unless idx.is_a? DaruLite::Index
      raise ArgumentError, 'Invalid index specified' unless idx.to_a.sort == index.to_a.sort

      old_categories = categories
      data = idx.map { |i| self[i] }
      initialize_core_attributes data
      self.categories = old_categories
      self.index = idx

      self
    end
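"Preserves index->value correspondence" means every label keeps its value while the labels move to a new order. The plain-Ruby sketch below shows that idea with Arrays; the `reindex` function is a hypothetical stand-in and assumes the index labels are unique.

```ruby
# Reorders data to follow new_index while keeping each label's value.
# Assumes unique labels; a stand-in for the idea, not the library method.
def reindex(index, data, new_index)
  raise ArgumentError, 'Invalid index specified' unless new_index.sort == index.sort

  lookup = index.zip(data).to_h
  new_index.map { |label| lookup.fetch(label) }
end

index = %w[a b c]
data = %i[x y z]
p reindex(index, data, %w[c a b]) # [:z, :x, :y]
```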
#reject_values(*values) ⇒ DaruLite::Vector
Return a vector with specified values removed
    # File 'lib/daru_lite/category.rb', line 674

    def reject_values(*values)
      resultant_pos = size.times.to_a - values.flat_map { |v| @cat_hash[v] }
      dv = at(*resultant_pos)
      unless dv.is_a? DaruLite::Vector
        pos = resultant_pos.first
        dv = at(pos..pos)
      end
      dv.remove_unused_categories
    end
#remove_unused_categories ⇒ DaruLite::Vector
Note: If the base category is removed, then the first occurring category in the data is taken as the base category. The order of the remaining categories is preserved.
Removes the unused categories
    # File 'lib/daru_lite/category.rb', line 362

    def remove_unused_categories
      old_categories = categories

      initialize_core_attributes to_a
      self.categories = old_categories & categories
      self.base_category = @cat_hash.keys.first unless categories.include? base_category

      self
    end
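The order of the surviving categories is preserved because `Array#&` keeps the order of its left operand. A plain-Ruby sketch of that one step, assuming that rebuilding from the data yields the categories in order of first occurrence (the hypothetical `used_categories` helper below is not part of DaruLite):

```ruby
# Categories that actually occur in data, kept in their original order.
def used_categories(data, categories)
  categories & data.uniq
end

categories = %i[a b c]
data = %i[c a c]
p used_categories(data, categories) # [:a, :c]
```

Here `:b` is dropped because it never occurs, and `:a` still precedes `:c` even though `:c` appears first in the data.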
#rename_categories(old_to_new) ⇒ Object
Note: The order of the categories that are not renamed is preserved, and the new categories are added at the end of the order. The base category is also reassigned to the new value if it is renamed.
Rename categories.
    # File 'lib/daru_lite/category.rb', line 337

    def rename_categories(old_to_new)
      old_categories = categories
      data = to_a.map do |cat|
        old_to_new.include?(cat) ? old_to_new[cat] : cat
      end

      initialize_core_attributes data
      self.categories = (old_categories - old_to_new.keys) | old_to_new.values
      self.base_category = old_to_new[base_category] if old_to_new.include? base_category

      self
    end
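The expression `(old_categories - old_to_new.keys) | old_to_new.values` is what moves renamed categories to the end of the order. The plain-Ruby sketch below applies the same set arithmetic to Arrays; the `rename` function and its return shape are illustrative assumptions.

```ruby
# Returns [new_data, new_categories, new_base_category].
def rename(data, categories, base_category, old_to_new)
  new_data = data.map { |cat| old_to_new.fetch(cat, cat) }
  new_categories = (categories - old_to_new.keys) | old_to_new.values
  new_base = old_to_new.fetch(base_category, base_category)
  [new_data, new_categories, new_base]
end

new_data, new_categories, new_base =
  rename(%i[a b c a], %i[a b c], :a, { a: :x })

p new_data       # [:x, :b, :c, :x]
p new_categories # [:b, :c, :x]
p new_base       # :x
```

Note that `:x` lands after `:b` and `:c` even though it replaces `:a`, which used to come first.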
#reorder!(order) ⇒ Object
Note: Unlike #reindex!, which takes an index as input, this method takes positions as input to reorder the vector.
Reorder the vector with given positions
    # File 'lib/daru_lite/category.rb', line 523

    def reorder!(order)
      raise ArgumentError, 'Invalid order specified' unless order.sort == size.times.to_a

      # TODO: Room for optimization
      old_data = to_a
      new_data = order.map { |i| old_data[i] }
      initialize_core_attributes new_data

      self
    end
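The validity check in `reorder!` requires `order` to be a permutation of `0...size`. A plain-Ruby sketch of the check and the reordering step, using a hypothetical non-mutating `reorder` function on an Array:

```ruby
# order must be a permutation of 0...data.size.
def reorder(data, order)
  raise ArgumentError, 'Invalid order specified' unless order.sort == data.size.times.to_a

  order.map { |i| data[i] }
end

p reorder(%i[a b c], [2, 0, 1]) # [:c, :a, :b]
```

An order such as `[0, 0, 1]` is rejected because it repeats one position and omits another.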
#replace_values(old_values, new_value) ⇒ DaruLite::Vector
It performs the replace in place.
Replaces specified values with a new value
    # File 'lib/daru_lite/category.rb', line 725

    def replace_values(old_values, new_value)
      old_values = [old_values] unless old_values.is_a? Array
      rename_hash = old_values.to_h { |v| [v, new_value] }
      rename_categories rename_hash
    end
#set_at(positions, val) ⇒ Object
Modifies values at specified positions.
    # File 'lib/daru_lite/category.rb', line 256

    def set_at(positions, val)
      validate_positions(*positions)
      positions.map { |pos| modify_category_at pos, val }
      self
    end
#size ⇒ Object
Size of categorical data.
    # File 'lib/daru_lite/category.rb', line 268

    def size
      @array.size
    end
#sort ⇒ Object
    # File 'lib/daru_lite/category.rb', line 438

    def sort
      dup.sort!
    end
#sort! ⇒ DaruLite::Vector
Note: This operation only works if the vector is ordered. To make the vector ordered, set `vector.ordered = true`.
Sorts the vector in the order specified.
    # File 'lib/daru_lite/category.rb', line 415

    def sort!
      # TODO: Simply the code
      assert_ordered :sort

      # Build sorted index
      old_index = @index.to_a
      new_index = @cat_hash.values.map do |positions|
        old_index.values_at(*positions)
      end.flatten
      @index = @index.class.new new_index

      # Build sorted data
      @cat_hash = categories.inject([{}, 0]) do |acc, cat|
        hash, count = acc
        cat_count = @cat_hash[cat].size
        cat_count.times { |i| @array[count + i] = int_from_cat(cat) }
        hash[cat] = (count...(cat_count + count)).to_a
        [hash, count + cat_count]
      end.first

      self
    end
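The sort above never compares elements: it walks the categories in order and lays out each category's elements as one contiguous run. The plain-Ruby sketch below reproduces that idea without mutation. It assumes `cat_hash` keys are already in category order and that a category's integer code equals its rank in that order; `sort_by_category` is a hypothetical name.

```ruby
# index:    Array of labels
# cat_hash: category => positions, keys in category order (assumed layout)
# Returns [new_index, new_array, new_cat_hash].
def sort_by_category(index, cat_hash)
  # Labels follow their elements: gather them category by category.
  new_index = cat_hash.values.flat_map { |positions| index.values_at(*positions) }

  new_array = []
  new_cat_hash = {}
  cat_hash.each_with_index do |(cat, positions), code|
    start = new_array.size
    new_array.concat([code] * positions.size)       # one run per category
    new_cat_hash[cat] = (start...new_array.size).to_a
  end
  [new_index, new_array, new_cat_hash]
end

index = %w[w x y z]
cat_hash = { low: [1, 3], high: [0, 2] }
new_index, new_array, new_cat_hash = sort_by_category(index, cat_hash)

p new_index # ["x", "z", "w", "y"]
p new_array # [0, 0, 1, 1]
p new_cat_hash
```

This is effectively a counting sort, so the work is linear in the number of elements rather than O(n log n).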
#to_a ⇒ Array
Returns all categorical data
    # File 'lib/daru_lite/category.rb', line 79

    def to_a
      each.to_a
    end
#to_category ⇒ DaruLite::Vector
Does nothing since it is already of type category.
    # File 'lib/daru_lite/category.rb', line 627

    def to_category
      self
    end
#to_ints ⇒ Array
Returns the integer coding for the categorical data, following the category order and starting from 0. For example, if the order is [:a, :b, :c], then :a is coded as 0, :b as 1 and :c as 2.
    # File 'lib/daru_lite/category.rb', line 507

    def to_ints
      @array
    end
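The coding described above can be reproduced in plain Ruby. This sketch uses ordinary Arrays rather than a DaruLite vector, and the variable names are illustrative.

```ruby
order = %i[a b c]   # category order
data = %i[b a c a]  # categorical data

codes = order.each_with_index.to_h        # category => integer code
ints = data.map { |cat| codes.fetch(cat) }
decoded = ints.map { |i| order[i] }       # codes index back into the order

p ints    # [1, 0, 2, 0]
p decoded # [:b, :a, :c, :a]
```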
#to_non_category ⇒ DaruLite::Vector
Converts a category type vector to non category type vector
    # File 'lib/daru_lite/category.rb', line 633

    def to_non_category
      DaruLite::Vector.new to_a, name: name, index: index
    end
#where(bool_array) ⇒ DaruLite::Vector
For querying the data
    # File 'lib/daru_lite/category.rb', line 591

    def where(bool_array)
      DaruLite::Core::Query.vector_where self, bool_array
    end