Class: Basset::FeatureSelector
- Inherits: Object
  - Object
    - Basset::FeatureSelector
- Defined in: lib/basset/feature_selector.rb
Overview
This class is the feature selector. Add every document in the training set to the selector. Once they are added, features can be selected by their chi-square value. When in doubt, call select_features with no arguments: it returns the features that have at least some statistical significance (a chi-square value of 1.0 or more by default) and occur in more than one document.
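This page does not show how the chi-square value itself is computed. As a rough illustration only, the textbook chi-square statistic for a 2x2 contingency table (feature present or absent, document inside or outside the class) can be sketched as below. The method name `chi_square_2x2` and its arguments are hypothetical, and the sketch is not claimed to match Basset's internal `chi_squared`.

```ruby
# Hypothetical helper, not part of Basset: the textbook chi-square statistic
# for a 2x2 contingency table.
#
#                  in class   not in class
#   has feature       a            b
#   lacks feature     c            d
def chi_square_2x2(a, b, c, d)
  n = a + b + c + d
  denominator = (a + b) * (c + d) * (a + c) * (b + d)
  return 0.0 if denominator.zero? # a degenerate table carries no signal

  n * ((a * d) - (b * c))**2 / denominator.to_f
end

# A feature seen in 8 of 10 documents in the class and 1 of 10 outside it.
strong = chi_square_2x2(8, 1, 2, 9)
# A feature spread evenly across both groups scores 0.0.
weak = chi_square_2x2(5, 5, 5, 5)
```

A feature concentrated in one class gets a high value, while an evenly spread feature scores zero, which is why a minimum chi value works as a selection threshold.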
Defined Under Namespace
Classes: FeatureValues
Instance Attribute Summary
- #docs ⇒ Object (readonly)
  Returns the value of attribute docs.
Instance Method Summary
- #add_document(document) ⇒ Object
  Adds a document to the feature selector.
- #all_feature_names ⇒ Object
  Returns all features, regardless of chi_square or frequency.
- #best_features(count = 10, classification = nil) ⇒ Object
  Returns an array of the best features for a given classification.
- #features_with_chi(classification) ⇒ Object
- #initialize ⇒ FeatureSelector (constructor)
  A new instance of FeatureSelector.
- #number_of_features ⇒ Object
- #select_features(chi_value = 1.0, classification = nil) ⇒ Object
  Returns an array of features that have a minimum or better chi_square value.
Constructor Details
#initialize ⇒ FeatureSelector
Returns a new instance of FeatureSelector.
# File 'lib/basset/feature_selector.rb', line 11
def initialize
  @docs = 0
  @docs_in_class = Hash.new(0)
  @features = Hash.new { |h, k| h[k] = FeatureValues.new }
end
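The constructor relies on two different Hash default mechanisms. The sketch below shows them in isolation. `FeatureValuesStandIn` is a hypothetical placeholder, since the real FeatureValues class is not shown on this page.

```ruby
# Hypothetical stand-in for Basset::FeatureSelector::FeatureValues.
FeatureValuesStandIn = Struct.new(:docs) do
  def initialize
    super(0) # start every counter at zero
  end
end

# Hash.new(0) returns 0 for a missing key but does not store it on read.
docs_in_class = Hash.new(0)
docs_in_class[:spam] += 1 # reads the default 0, then stores 1

# The block form builds and stores a fresh object for each missing key,
# so every feature gets its own counter object.
features = Hash.new { |h, k| h[k] = FeatureValuesStandIn.new }
features["pills"].docs += 1
features["pills"].docs += 1
```

The block form matters here: passing an object as a plain default (for example `Hash.new(FeatureValuesStandIn.new)`) would share a single instance across all keys.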
Instance Attribute Details
#docs ⇒ Object (readonly)
Returns the value of attribute docs.
# File 'lib/basset/feature_selector.rb', line 9
def docs
  @docs
end
Instance Method Details
#add_document(document) ⇒ Object
Adds a document to the feature selector. The document should respond to classification and to vector_of_features, which returns a vector of unique features. Each feature should respond to name.
# File 'lib/basset/feature_selector.rb', line 19
def add_document(document)
  @docs += 1
  @docs_in_class[document.classification] += 1
  document.vector_of_features.each do |feature|
    @features[feature.name].add_document_with_class(document.classification)
  end
end
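Judging from the method body, add_document only needs an object that responds to `classification` and `vector_of_features`, with each feature responding to `name`. A minimal sketch of such objects follows. `SketchDocument`, `SketchFeature`, and `count_documents` are hypothetical names, not Basset classes, and plain hashes stand in for the selector's internal state (the real code also tracks per-class counts inside FeatureValues, which is simplified away here).

```ruby
# Hypothetical stand-ins; Basset's own document and feature classes may differ.
SketchFeature  = Struct.new(:name)
SketchDocument = Struct.new(:classification, :tokens) do
  # Returns one feature per distinct token, so no feature is counted twice
  # for the same document.
  def vector_of_features
    tokens.uniq.map { |token| SketchFeature.new(token) }
  end
end

# Mirrors the counting loop in add_document, using plain hashes.
def count_documents(documents)
  docs_in_class = Hash.new(0)
  docs_with_feature = Hash.new(0)
  documents.each do |document|
    docs_in_class[document.classification] += 1
    document.vector_of_features.each do |feature|
      docs_with_feature[feature.name] += 1
    end
  end
  [docs_in_class, docs_with_feature]
end

training_set = [
  SketchDocument.new(:spam, %w[buy pills buy now]),
  SketchDocument.new(:ham,  %w[meeting now])
]
per_class, per_feature = count_documents(training_set)
```

Because the feature vector is unique per document, "buy" counts once for the first document even though the token appears twice.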
#all_feature_names ⇒ Object
Returns the names of all features, regardless of chi-square value or frequency.
# File 'lib/basset/feature_selector.rb', line 29
def all_feature_names
  @features.keys
end
#best_features(count = 10, classification = nil) ⇒ Object
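Returns an array of up to count feature names for the given classification, ordered from highest chi-square value to lowest. Only features that pass select_features with a chi value of 1.0 are considered.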
# File 'lib/basset/feature_selector.rb', line 38
def best_features(count = 10, classification = nil)
  select_features(1.0, classification).first(count)
end
#features_with_chi(classification) ⇒ Object
Returns an array of Feature objects, one per known feature name, each holding the chi-squared value of that feature for the given classification.
# File 'lib/basset/feature_selector.rb', line 42
def features_with_chi(classification)
  @features.keys.map do |feature_name|
    Feature.new(feature_name, chi_squared(feature_name, classification))
  end
end
#number_of_features ⇒ Object
Returns the number of distinct features seen across all added documents.
# File 'lib/basset/feature_selector.rb', line 33
def number_of_features
  @features.size
end
#select_features(chi_value = 1.0, classification = nil) ⇒ Object
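Returns an array of feature names whose chi-square value is at least chi_value and that occur in more than one document, ordered from highest chi-square value to lowest. If classification is nil, the classification of the first document added to the selector is used.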
# File 'lib/basset/feature_selector.rb', line 49
def select_features(chi_value = 1.0, classification = nil)
  classification ||= @docs_in_class.keys.first
  selected_features = features_with_chi(classification).select do |feature|
    (docs_with_feature(feature.name) > 1) && (feature.value >= chi_value)
  end
  selected_features.sort_by(&:value).reverse.collect(&:name)
end
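The filter-then-sort pipeline in select_features can be tried on precomputed values. In this sketch `ScoredFeature`, `select_names`, and the sample scores are hypothetical. The document counts that the real method obtains from docs_with_feature are passed in as a plain hash instead.

```ruby
# Hypothetical record pairing a feature name with a precomputed chi-square value.
ScoredFeature = Struct.new(:name, :value)

# Keeps features seen in more than one document whose value meets the
# threshold, then returns their names ordered from highest value to lowest.
def select_names(features, doc_counts, chi_value = 1.0)
  selected = features.select do |feature|
    doc_counts.fetch(feature.name, 0) > 1 && feature.value >= chi_value
  end
  selected.sort_by(&:value).reverse.map(&:name)
end

features = [
  ScoredFeature.new("now",     0.4),  # below the default threshold
  ScoredFeature.new("pills",   9.9),
  ScoredFeature.new("buy",     6.2),
  ScoredFeature.new("lottery", 12.0)  # high value, but in only one document
]
doc_counts = { "now" => 7, "pills" => 5, "buy" => 4, "lottery" => 1 }

select_names(features, doc_counts)          # => ["pills", "buy"]
select_names(features, doc_counts).first(1) # => ["pills"], the way best_features caps the list
```

Note that the document-count filter can drop a feature with the highest score, as happens to "lottery" here.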