Module: Daru::Category
- Defined in:
- lib/daru/category.rb
Constant Summary
- CODING_SCHEMES =
[:dummy, :deviation, :helmert, :simple].freeze
Instance Attribute Summary
-
#base_category ⇒ Object
Returns the value of attribute base_category.
-
#coding_scheme ⇒ Object
Returns the value of attribute coding_scheme.
-
#index ⇒ Object
Returns the value of attribute index.
-
#name ⇒ Object
Returns the value of attribute name.
Instance Method Summary
-
#==(other) ⇒ Object
Two categorical vectors are equal if their indexes and corresponding values are the same. Returns true if the two vectors are equal, false otherwise.
-
#[](*indexes) ⇒ Object
Returns vector for indexes/positions specified.
-
#[]=(*indexes, val) ⇒ Object
Modifies values at specified indexes/positions.
-
#add_category(*new_categories) ⇒ Object
Associates a category to the vector.
-
#at(*positions) ⇒ Object
Returns vector for positions specified.
-
#categories ⇒ Array
(also: #order)
Returns all the categories with the inherent order.
-
#categories=(cat_with_order) ⇒ Object
Sets order of the categories.
-
#contrast_code(opts = {}) ⇒ Daru::DataFrame
Contrast codes the vector according to the coding scheme set.
-
#count(category) ⇒ Object
Returns frequency of given category.
-
#count_values(*values) ⇒ Integer
Count the number of values specified.
-
#describe ⇒ Daru::Vector
Gives a summary of the data: its size, the number of categories, and the frequency and name of the most and least frequent categories.
-
#dup ⇒ Daru::Vector
Duplicates the vector.
-
#each ⇒ Enumerator
Returns an enumerator that enumerates on categorical data.
-
#frequencies(type = :count) ⇒ Daru::Vector
Returns a vector storing count/frequency of each category.
-
#include_values?(*values) ⇒ true, false
Checks whether any of the given values occurs in the vector.
-
#indexes(*values) ⇒ Array
Return indexes of values specified.
-
#initialize_category(data, opts = {}) ⇒ Object
Initializes a vector to store categorical data.
-
#max ⇒ object
Returns the maximum category according to the order specified.
-
#min ⇒ object
Returns the minimum category according to the order specified.
-
#ordered=(bool) ⇒ Object
Make categorical data ordered or unordered.
-
#ordered? ⇒ Boolean
Tells whether vector is ordered or not.
- #plotting_library=(lib) ⇒ Object
- #positions(*values) ⇒ Object
-
#reindex!(idx) ⇒ Daru::Vector
Sets new index for vector.
-
#reject_values(*values) ⇒ Daru::Vector
Return a vector with specified values removed.
-
#remove_unused_categories ⇒ Daru::Vector
Removes the unused categories.
-
#rename_categories(old_to_new) ⇒ Object
Rename categories.
-
#reorder!(order) ⇒ Object
Reorder the vector with given positions.
-
#replace_values(old_values, new_value) ⇒ Daru::Vector
Replaces specified values with a new value.
-
#set_at(positions, val) ⇒ Object
Modifies values at specified positions.
-
#size ⇒ Object
Size of categorical data.
- #sort ⇒ Object
-
#sort! ⇒ Daru::Vector
Sorts the vector in the order specified.
-
#to_a ⇒ Array
Returns all categorical data.
-
#to_category ⇒ Daru::Vector
Does nothing since it's already of type category.
-
#to_ints ⇒ Array
Returns integer coding for categorical data in the order starting from 0.
-
#to_non_category ⇒ Daru::Vector
Converts a category type vector to non category type vector.
-
#where(bool_array) ⇒ Daru::Vector
For querying the data.
Instance Attribute Details
#base_category ⇒ Object
Returns the value of attribute base_category.
# File 'lib/daru/category.rb', line 3
def base_category
  @base_category
end
#coding_scheme ⇒ Object
Returns the value of attribute coding_scheme.
# File 'lib/daru/category.rb', line 4
def coding_scheme
  @coding_scheme
end
#index ⇒ Object
Returns the value of attribute index.
# File 'lib/daru/category.rb', line 4
def index
  @index
end
#name ⇒ Object
Returns the value of attribute name.
# File 'lib/daru/category.rb', line 4
def name
  @name
end
Instance Method Details
#==(other) ⇒ Object
Two categorical vectors are equal if their indexes and corresponding values are the same. Returns true if the two vectors are equal, false otherwise.
# File 'lib/daru/category.rb', line 500
def == other
  size == other.size &&
    to_a == other.to_a &&
    index == other.index
end
#[](*indexes) ⇒ Object
Accepts both indexes and positions. In case of a collision, the argument is treated as an index.
Returns vector for indexes/positions specified.
# File 'lib/daru/category.rb', line 186
def [] *indexes
  positions = @index.pos(*indexes)
  return category_from_position(positions) if positions.is_a? Integer

  Daru::Vector.new positions.map { |pos| category_from_position pos },
    index: @index.subset(*indexes),
    name: @name,
    type: :category,
    ordered: @ordered,
    categories: categories
end
#[]=(*indexes, val) ⇒ Object
In order to add a new category you need to associate it via #add_category
Modifies values at specified indexes/positions.
# File 'lib/daru/category.rb', line 240
def []= *indexes, val
  positions = @index.pos(*indexes)

  if positions.is_a? Numeric
    modify_category_at positions, val
  else
    positions.each { |pos| modify_category_at pos, val }
  end
  self
end
#add_category(*new_categories) ⇒ Object
Associates a category to the vector.
# File 'lib/daru/category.rb', line 122
def add_category(*new_categories)
  new_categories -= categories
  add_extra_categories new_categories
end
#at(*positions) ⇒ Object
Returns vector for positions specified.
# File 'lib/daru/category.rb', line 209
def at *positions
  original_positions = positions
  positions = coerce_positions(*positions)
  validate_positions(*positions)

  return category_from_position(positions) if positions.is_a? Integer

  Daru::Vector.new positions.map { |pos| category_from_position(pos) },
    index: @index.at(*original_positions),
    name: @name,
    type: :category,
    ordered: @ordered,
    categories: categories
end
#categories ⇒ Array Also known as: order
Returns all the categories with the inherent order
# File 'lib/daru/category.rb', line 310
def categories
  @cat_hash.keys
end
#categories=(cat_with_order) ⇒ Object
If extra categories are specified, they get added too.
Sets order of the categories.
# File 'lib/daru/category.rb', line 324
def categories= cat_with_order
  validate_categories(cat_with_order)
  add_extra_categories(cat_with_order - categories)
  order_with cat_with_order
end
#contrast_code(opts = {}) ⇒ Daru::DataFrame
To set the coding scheme use #coding_scheme=
Contrast codes the vector according to the coding scheme set.
# File 'lib/daru/category.rb', line 482
def contrast_code opts={}
  if opts[:user_defined]
    user_defined_coding(opts[:user_defined])
  else
    # TODO: Make various coding schemes code DRY
    send("#{coding_scheme}_coding".to_sym, opts[:full] || false)
  end
end
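To make the idea of contrast coding concrete, here is a minimal plain-Ruby sketch of dummy coding, the default scheme. It is not daru's implementation: `dummy_code` and its arguments are hypothetical names, and reading the `full` flag as "also emit a column for the base category" is an assumption.

```ruby
# Plain-Ruby illustration of dummy coding; not daru's implementation.
# `dummy_code` is a hypothetical helper. Each coded category becomes a
# 0/1 indicator column. The base category is left out unless `full` is
# true (an assumption about what the :full option means).
def dummy_code(data, categories, base_category, full = false)
  coded = full ? categories : categories - [base_category]
  coded.map { |cat| [cat, data.map { |v| v == cat ? 1 : 0 }] }.to_h
end

data = [:a, :b, :c, :a]
columns      = dummy_code(data, [:a, :b, :c], :a)
full_columns = dummy_code(data, [:a, :b, :c], :a, true)
```

With the base category `:a` dropped, a row of all zeros in `columns` identifies an `:a` observation.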
#count(category) ⇒ Object
Returns frequency of given category
# File 'lib/daru/category.rb', line 134
def count category
  raise ArgumentError, "Invalid category #{category}" unless
    categories.include?(category)

  @cat_hash[category].size
end
#count_values(*values) ⇒ Integer
Count the number of values specified
# File 'lib/daru/category.rb', line 698
def count_values(*values)
  values.map { |v| @cat_hash[v].size if @cat_hash.include? v }
        .compact
        .inject(0, :+)
end
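The listing above suggests that each category maps to the list of positions where it occurs. The following plain-Ruby sketch mirrors that counting logic on a hypothetical hash standing in for `@cat_hash`; it is an illustration under that assumption, not the library code.

```ruby
# Hypothetical stand-in for @cat_hash: category => positions where it occurs.
cat_hash = { a: [0, 2], b: [1], c: [] }

# Mirrors count_values: unknown values are skipped, known ones contribute
# the number of positions they occupy.
def count_values(cat_hash, *values)
  values.map { |v| cat_hash[v].size if cat_hash.include?(v) }
        .compact
        .inject(0, :+)
end

total  = count_values(cat_hash, :a, :b, :z) # :z is not a category and is ignored
unused = count_values(cat_hash, :c)         # a category with no occurrences
```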
#describe ⇒ Daru::Vector
Gives the summary of data using following parameters
-
size: size of the data
-
categories: total number of categories
-
max_freq: Max no of times a category occurs
-
max_category: The category which occurs max no of times
-
min_freq: Min no of times a category occurs
-
min_category: The category which occurs min no of times
# File 'lib/daru/category.rb', line 621
def describe
  Daru::Vector.new(
    size: size,
    categories: categories.size,
    max_freq: @cat_hash.values.map(&:size).max,
    max_category: @cat_hash.keys.max_by { |cat| @cat_hash[cat].size },
    min_freq: @cat_hash.values.map(&:size).min,
    min_category: @cat_hash.keys.min_by { |cat| @cat_hash[cat].size }
  )
end
#dup ⇒ Daru::Vector
Duplicates the vector.
# File 'lib/daru/category.rb', line 106
def dup
  Daru::Vector.new to_a.dup,
    name: @name,
    index: @index.dup,
    type: :category,
    categories: categories,
    ordered: ordered?
end
#each ⇒ Enumerator
Returns an enumerator that enumerates on categorical data
# File 'lib/daru/category.rb', line 79
def each
  return enum_for(:each) unless block_given?
  @array.each { |pos| yield cat_from_int pos }
  self
end
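The `enum_for` guard is what lets `each` return an Enumerator when no block is given. A small sketch of the same pattern on a hypothetical `TinyCategory` class (not part of daru), which stores integer codes and decodes them on iteration:

```ruby
# Hypothetical class for illustration; it only imitates the shape of #each.
class TinyCategory
  include Enumerable

  def initialize(categories, codes)
    @categories = categories # e.g. [:a, :b]
    @array = codes           # integer codes into @categories
  end

  def each
    # Without a block, hand back an Enumerator over this same method.
    return enum_for(:each) unless block_given?
    @array.each { |code| yield @categories[code] }
    self
  end
end

vec = TinyCategory.new([:a, :b], [0, 1, 0])
enumerator = vec.each
decoded    = vec.to_a
```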
#frequencies(type = :count) ⇒ Daru::Vector
Returns a vector storing count/frequency of each category
# File 'lib/daru/category.rb', line 152
def frequencies type=:count
  counts = @cat_hash.values.map(&:size)
  values =
    case type
    when :count
      counts
    when :fraction
      counts.map { |c| c / size.to_f }
    when :percentage
      counts.map { |c| c / size.to_f * 100 }
    else
      raise ArgumentError, 'Type should be either :count, :fraction or'\
        " :percentage. #{type} not supported."
    end
  Daru::Vector.new values, index: categories, name: name
end
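The three `type` options differ only in how the raw counts are scaled. A plain-Ruby sketch of that arithmetic, assuming a hypothetical hash in place of `@cat_hash` and returning a Hash rather than a `Daru::Vector`:

```ruby
# Hypothetical stand-in for @cat_hash: category => positions.
cat_hash = { a: [0, 2, 3], b: [1] }

# Assumes the vector size equals the total number of stored positions.
def frequencies(cat_hash, type = :count)
  counts = cat_hash.values.map(&:size)
  size = counts.inject(0, :+)
  values =
    case type
    when :count      then counts
    when :fraction   then counts.map { |c| c / size.to_f }
    when :percentage then counts.map { |c| c / size.to_f * 100 }
    else raise ArgumentError, "#{type} not supported"
    end
  cat_hash.keys.zip(values).to_h
end

counts      = frequencies(cat_hash)
fractions   = frequencies(cat_hash, :fraction)
percentages = frequencies(cat_hash, :percentage)
```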
#include_values?(*values) ⇒ true, false
Checks whether any of the given values occurs in the vector.
# File 'lib/daru/category.rb', line 668
def include_values?(*values)
  values.any? { |v| @cat_hash.include?(v) && !@cat_hash[v].empty? }
end
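The second condition in the block matters because a category can be registered without occurring in the data. A sketch on a hypothetical hash standing in for `@cat_hash`:

```ruby
# Hypothetical stand-in for @cat_hash; :b is a known category with no occurrences.
cat_hash = { a: [0, 1], b: [] }

def include_values?(cat_hash, *values)
  values.any? { |v| cat_hash.include?(v) && !cat_hash[v].empty? }
end

only_unused = include_values?(cat_hash, :b)     # registered but never used
mixed       = include_values?(cat_hash, :b, :a) # :a does occur
unknown     = include_values?(cat_hash, :z)     # not a category at all
```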
#indexes(*values) ⇒ Array
Return indexes of values specified
# File 'lib/daru/category.rb', line 711
def indexes(*values)
  values &= categories
  index.to_a.values_at(*values.flat_map { |v| @cat_hash[v] }.sort)
end
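In words: unknown values are dropped, the positions of the remaining values are collected and sorted, and those positions are translated to index labels. A plain-Ruby sketch with hypothetical data (`index_labels` plays the role of `index.to_a`):

```ruby
# Hypothetical data: four index labels and a category => positions hash.
index_labels = %w[w x y z]
cat_hash = { a: [0, 3], b: [1], c: [2] }

def indexes_of(index_labels, cat_hash, *values)
  values &= cat_hash.keys # ignore values that are not categories
  index_labels.values_at(*values.flat_map { |v| cat_hash[v] }.sort)
end

labels = indexes_of(index_labels, cat_hash, :c, :a, :q)
```

The result follows the vector's position order, not the order in which the values were passed.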
#initialize_category(data, opts = {}) ⇒ Object
Base category is set to the first category encountered in the vector.
Initializes a vector to store categorical data.
# File 'lib/daru/category.rb', line 26
def initialize_category data, opts={}
  @type = :category
  initialize_core_attributes data

  if opts[:categories]
    validate_categories(opts[:categories])
    add_extra_categories(opts[:categories] - categories)
    order_with opts[:categories]
  end

  # Specify if the categories are ordered or not.
  # By default it's unordered
  @ordered = opts[:ordered] || false

  # The coding scheme to code with. Default is dummy coding.
  @coding_scheme = :dummy

  # Base category which won't be present in the coding
  @base_category = @cat_hash.keys.first

  # Stores the name of the vector
  @name = opts[:name]

  # Index of the vector
  @index = coerce_index opts[:index]

  self
end
#max ⇒ object
This operation only works if the vector is ordered. To make the vector ordered, use `vector.ordered = true`.
Returns the maximum category according to the order specified.
# File 'lib/daru/category.rb', line 404
def max
  assert_ordered :max
  categories.last
end
#min ⇒ object
This operation only works if the vector is ordered. To make the vector ordered, use `vector.ordered = true`.
Returns the minimum category according to the order specified.
# File 'lib/daru/category.rb', line 390
def min
  assert_ordered :min
  categories.first
end
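Both `min` and `max` read the ends of the category order rather than comparing values. The sketch below shows how that differs from Ruby's natural ordering. The helper names are hypothetical, and the guard only stands in for `assert_ordered`, whose body is not shown here, so the error class and message are assumptions.

```ruby
# Hypothetical category order, lowest first.
categories = [:low, :medium, :high]

# Stand-in for assert_ordered followed by categories.first.
def ordered_min(categories, ordered)
  raise ArgumentError, 'min needs an ordered vector' unless ordered
  categories.first
end

# Stand-in for assert_ordered followed by categories.last.
def ordered_max(categories, ordered)
  raise ArgumentError, 'max needs an ordered vector' unless ordered
  categories.last
end

smallest = ordered_min(categories, true)
largest  = ordered_max(categories, true)
natural  = categories.max # Ruby compares symbols alphabetically
```

Here `largest` follows the declared order, while `natural` is whichever symbol sorts last alphabetically.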
#ordered=(bool) ⇒ Object
Make categorical data ordered or unordered.
# File 'lib/daru/category.rb', line 298
def ordered= bool
  @ordered = bool
end
#ordered? ⇒ Boolean
Tells whether vector is ordered or not.
# File 'lib/daru/category.rb', line 287
def ordered?
  @ordered
end
#plotting_library=(lib) ⇒ Object
# File 'lib/daru/category.rb', line 60
def plotting_library= lib
  case lib
  when :gruff, :nyaplot
    @plotting_library = lib
    if Daru.send("has_#{lib}?".to_sym)
      extend Module.const_get(
        "Daru::Plotting::Category::#{lib.to_s.capitalize}Library"
      )
    end
  else
    raise ArgumentError, "Plotting library #{lib} not supported. "\
      'Supported libraries are :nyaplot and :gruff'
  end
end
#positions(*values) ⇒ Object
# File 'lib/daru/category.rb', line 739
def positions(*values)
  values &= categories
  values.flat_map { |v| @cat_hash[v] }.sort
end
#reindex!(idx) ⇒ Daru::Vector
Unlike #reorder!, which takes positions as input, this method takes an index as input to reorder the vector.
Sets new index for vector. Preserves index->value correspondence.
# File 'lib/daru/category.rb', line 553
def reindex! idx
  idx = Daru::Index.new idx unless idx.is_a? Daru::Index
  raise ArgumentError, 'Invalid index specified' unless
    idx.to_a.sort == index.to_a.sort

  old_categories = categories
  data = idx.map { |i| self[i] }
  initialize_core_attributes data
  self.categories = old_categories
  self.index = idx
  self
end
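"Preserves index->value correspondence" means each label keeps its value while the labels are rearranged. A plain-Ruby sketch using parallel arrays in place of the vector and its index (`reindex` is a hypothetical helper, not the daru method):

```ruby
# Hypothetical vector: labels and values held as parallel arrays.
index = [:x, :y, :z]
data  = [:a, :b, :c]

def reindex(index, data, new_index)
  # The new index must be a permutation of the old one.
  raise ArgumentError, 'Invalid index specified' unless new_index.sort == index.sort
  lookup = index.zip(data).to_h
  [new_index, new_index.map { |label| lookup[label] }]
end

new_index, new_data = reindex(index, data, [:z, :x, :y])
```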
#reject_values(*values) ⇒ Daru::Vector
Return a vector with specified values removed
# File 'lib/daru/category.rb', line 681
def reject_values(*values)
  resultant_pos = size.times.to_a - values.flat_map { |v| @cat_hash[v] }
  dv = at(*resultant_pos)
  unless dv.is_a? Daru::Vector
    pos = resultant_pos.first
    dv = at(pos..pos)
  end
  dv.remove_unused_categories
end
#remove_unused_categories ⇒ Daru::Vector
If the base category is removed, the first occurring category in the data becomes the base category. The order of the remaining categories is preserved.
Removes the unused categories.
# File 'lib/daru/category.rb', line 371
def remove_unused_categories
  old_categories = categories

  initialize_core_attributes to_a
  self.categories = old_categories & categories
  self.base_category = @cat_hash.keys.first unless
    categories.include? base_category
  self
end
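The order of the surviving categories is preserved because `Array#&` keeps the order of its left operand. A plain-Ruby sketch with hypothetical data, where `data.uniq` stands in for the categories found after re-reading the data:

```ruby
# Hypothetical state: four categories in a custom order, two of them unused.
old_categories = [:c, :a, :b, :d]
data = [:b, :c, :b]

used = data.uniq             # categories actually present, in data order
kept = old_categories & used # intersection keeps the order of old_categories
```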
#rename_categories(old_to_new) ⇒ Object
The order of the categories is preserved after renaming, but new categories are appended at the end of the order. The base category is also reassigned to its new value if it is renamed.
Rename categories.
# File 'lib/daru/category.rb', line 346
def rename_categories old_to_new
  old_categories = categories
  data = to_a.map do |cat|
    old_to_new.include?(cat) ? old_to_new[cat] : cat
  end

  initialize_core_attributes data
  self.categories = (old_categories - old_to_new.keys) | old_to_new.values
  self.base_category = old_to_new[base_category] if
    old_to_new.include? base_category
  self
end
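The ordering rule follows from the set expression in the listing: renamed categories are removed from the old order and their new names are unioned onto the end. A worked plain-Ruby example with hypothetical data:

```ruby
# Hypothetical state before renaming.
old_categories = [:a, :b, :c]
data = [:a, :b, :c, :b]
base_category = :b
old_to_new = { b: :z }

# Same expressions as in rename_categories, applied to plain arrays.
new_data = data.map { |cat| old_to_new.include?(cat) ? old_to_new[cat] : cat }
new_categories = (old_categories - old_to_new.keys) | old_to_new.values
base_category = old_to_new[base_category] if old_to_new.include?(base_category)
```

Note that `:z` lands at the end of the order even though `:b` sat in the middle.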
#reorder!(order) ⇒ Object
Unlike #reindex!, which takes an index as input, this method takes positions as input to reorder the vector.
Reorders the vector with the given positions.
# File 'lib/daru/category.rb', line 531
def reorder! order
  raise ArgumentError, 'Invalid order specified' unless
    order.sort == size.times.to_a

  # TODO: Room for optimization
  old_data = to_a
  new_data = order.map { |i| old_data[i] }
  initialize_core_attributes new_data
  self
end
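The guard accepts only a full permutation of the positions `0...size`; each entry of `order` then names the old position whose value moves into that slot. A plain-Ruby sketch (`reorder` is a hypothetical, non-mutating helper):

```ruby
def reorder(data, order)
  # order must contain every position exactly once.
  raise ArgumentError, 'Invalid order specified' unless
    order.sort == data.size.times.to_a
  order.map { |i| data[i] }
end

reordered = reorder([:a, :b, :c], [2, 0, 1])
```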
#replace_values(old_values, new_value) ⇒ Daru::Vector
It performs the replace in place.
Replaces specified values with a new value
# File 'lib/daru/category.rb', line 733
def replace_values old_values, new_value
  old_values = [old_values] unless old_values.is_a? Array
  rename_hash = old_values.map { |v| [v, new_value] }.to_h
  rename_categories rename_hash
end
#set_at(positions, val) ⇒ Object
Modifies values at specified positions.
# File 'lib/daru/category.rb', line 265
def set_at positions, val
  validate_positions(*positions)
  positions.map { |pos| modify_category_at pos, val }
  self
end
#size ⇒ Object
Size of categorical data.
# File 'lib/daru/category.rb', line 277
def size
  @array.size
end
#sort ⇒ Object
# File 'lib/daru/category.rb', line 447
def sort
  dup.sort!
end
#sort! ⇒ Daru::Vector
This operation only works if the vector is ordered. To make the vector ordered, use `vector.ordered = true`.
Sorts the vector in the order specified.
# File 'lib/daru/category.rb', line 424
def sort! # rubocop:disable Metrics/AbcSize
  # TODO: Simplify the code
  assert_ordered :sort

  # Build sorted index
  old_index = @index.to_a
  new_index = @cat_hash.values.map do |positions|
    old_index.values_at(*positions)
  end.flatten
  @index = @index.class.new new_index

  # Build sorted data
  @cat_hash = categories.inject([{}, 0]) do |acc, cat|
    hash, count = acc
    cat_count = @cat_hash[cat].size
    cat_count.times { |i| @array[count+i] = int_from_cat(cat) }
    hash[cat] = (count...(cat_count+count)).to_a
    [hash, count + cat_count]
  end.first

  self
end
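The sort needs no comparisons: it walks the categories in order and lays out each category's occurrences as one contiguous run. The sketch below replays the two steps on plain arrays. All variable names are hypothetical stand-ins for the instance variables, and `categories.index(cat)` is an assumed equivalent of `int_from_cat`.

```ruby
# Hypothetical state of an ordered vector holding [:high, :low, :high, :low].
categories = [:low, :high]                # category order, lowest first
array      = [1, 0, 1, 0]                 # integer codes into categories
cat_hash   = { low: [1, 3], high: [0, 2] } # category => positions
index      = %w[w x y z]

# Step 1: gather index labels category by category.
new_index = cat_hash.values.flat_map { |positions| index.values_at(*positions) }

# Step 2: rewrite the codes as contiguous runs and rebuild the position hash.
sorted_hash, = categories.inject([{}, 0]) do |(hash, count), cat|
  cat_count = cat_hash[cat].size
  cat_count.times { |i| array[count + i] = categories.index(cat) }
  hash[cat] = (count...(count + cat_count)).to_a
  [hash, count + cat_count]
end

sorted_data = array.map { |code| categories[code] }
```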
#to_a ⇒ Array
Returns all categorical data
# File 'lib/daru/category.rb', line 91
def to_a
  each.to_a
end
#to_category ⇒ Daru::Vector
Does nothing since it's already of type category.
# File 'lib/daru/category.rb', line 634
def to_category
  self
end
#to_ints ⇒ Array
Returns integer coding for categorical data in the order starting from 0. For example, if the order is [:a, :b, :c], then :a will be coded as 0, :b as 1 and :c as 2.
# File 'lib/daru/category.rb', line 515
def to_ints
  @array
end
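The coding described above can be reproduced with plain arrays. This sketch uses hypothetical `order` and `data` values and is only meant to illustrate the mapping, not how the vector stores it internally:

```ruby
order = [:a, :b, :c]     # category order
data  = [:c, :a, :b, :a] # categorical values

codes   = data.map { |value| order.index(value) } # category -> integer
decoded = codes.map { |code| order[code] }        # integer -> category
```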
#to_non_category ⇒ Daru::Vector
Converts a category type vector to non category type vector
# File 'lib/daru/category.rb', line 640
def to_non_category
  Daru::Vector.new to_a, name: name, index: index
end
#where(bool_array) ⇒ Daru::Vector
For querying the data
# File 'lib/daru/category.rb', line 598
def where bool_array
  Daru::Core::Query.vector_where self, bool_array
end