Class: Basset::FeatureExtractor
- Inherits: Object
  - Object
  - Basset::FeatureExtractor
- Includes: YamlSerialization
- Defined in: lib/basset/feature_extractor.rb
Overview
Extracts features from a document. On initialization it expects the set of feature names to extract from documents. Each feature name is assigned an integer identifier, numbered in ascending order starting at 1. This makes it easy to output feature sets for libraries like svmlight.
Instance Method Summary
- #extract(document) ⇒ Object
  Returns only the features from the document that the extractor is interested in.
- #extract_numbered(document) ⇒ Object
  Returns an array of features with their names replaced by integer identifiers.
- #initialize(feature_names) ⇒ FeatureExtractor (constructor)
  Takes an array of feature names.
- #number_of_features ⇒ Object
Methods included from YamlSerialization
Constructor Details
#initialize(feature_names) ⇒ FeatureExtractor
The constructor takes an array of feature names. These are the features that will be extracted from documents; all others are ignored.
# File 'lib/basset/feature_extractor.rb', line 13
def initialize(feature_names)
  # Map each feature name to a 1-based integer identifier.
  @feature_names = {}
  feature_names.each_with_index { |feature_name, index| @feature_names[feature_name] = index + 1 }
end
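The numbering the constructor builds can be reproduced outside the class. The following is a minimal sketch of that logic only; the feature names and the local variable `numbering` are made up for illustration and are not part of Basset.

```ruby
# Hypothetical feature names, used only for illustration.
feature_names = %w[apple banana cherry]

# Same idea as the constructor: each name maps to its position plus one,
# so identifiers start at 1 rather than 0.
numbering = {}
feature_names.each_with_index { |name, index| numbering[name] = index + 1 }

puts numbering.inspect
```

Because the identifiers follow the order of the input array, passing the names in a different order would presumably yield a different numbering, so the same array should be reused wherever the identifiers need to stay stable.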
Instance Method Details
#extract(document) ⇒ Object
Returns only the features from the document that the extractor is interested in.
# File 'lib/basset/feature_extractor.rb', line 33
def extract(document)
  # Keep only features whose name was registered in the constructor.
  document.vector_of_features.find_all do |feature|
    @feature_names[feature.name]
  end
end
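To illustrate the filtering step in isolation, here is a small sketch. `SketchFeature` and `SketchDocument` are hypothetical stand-ins for Basset's own feature and document classes (they only provide `name`, `value`, and `vector_of_features`), and the hash literal mimics what the constructor would build.

```ruby
# Hypothetical stand-ins, not Basset's real classes.
SketchFeature  = Struct.new(:name, :value)
SketchDocument = Struct.new(:vector_of_features)

# Mimics the name => identifier hash built by the constructor.
feature_names = { "cat" => 1, "dog" => 2 }

document = SketchDocument.new([
  SketchFeature.new("dog", 3),
  SketchFeature.new("fish", 1), # not registered, so it is dropped
  SketchFeature.new("cat", 2)
])

# The hash lookup returns nil for unknown names, which find_all treats as false.
extracted = document.vector_of_features.find_all do |feature|
  feature_names[feature.name]
end

extracted_names = extracted.map(&:name)
puts extracted_names.inspect
```

Note that `find_all` preserves the document's original feature order; no sorting happens at this stage.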
#extract_numbered(document) ⇒ Object
Returns an array of features with their names replaced by integer identifiers. The result is sorted, so the features should appear in ascending identifier order. This is a generic representation that works well with other machine learning packages like svm_light.
# File 'lib/basset/feature_extractor.rb', line 25
def extract_numbered(document)
  # Swap each feature name for its integer identifier, keeping the value.
  numbered_features = extract(document).collect do |feature|
    Feature.new(@feature_names[feature.name], feature.value)
  end
  numbered_features.sort
end
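As a rough sketch of why the sorted, numbered form is convenient, the snippet below turns numbered features into one line of the sparse `label id:value ...` input format used by SVMlight-style tools. `NumberedFeature`, its `<=>` definition, and the `label` value are assumptions made for this example; Basset's own `Feature` class may order itself differently.

```ruby
# Hypothetical stand-in for a numbered feature; ordering by id is an assumption.
NumberedFeature = Struct.new(:id, :value) do
  # Array#sort relies on <=>, so compare by identifier.
  def <=>(other)
    id <=> other.id
  end
end

# Numbered features as extract_numbered might return them, before sorting.
numbered = [NumberedFeature.new(2, 3), NumberedFeature.new(1, 2)].sort

label = "+1" # hypothetical class label for this document
line = ([label] + numbered.map { |f| "#{f.id}:#{f.value}" }).join(" ")

puts line
```

Sorting first matters here because this kind of sparse format expects feature identifiers in increasing order within each line.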
#number_of_features ⇒ Object
# File 'lib/basset/feature_extractor.rb', line 18
def number_of_features
  @feature_names.size
end