Class: GraphRank::Keywords
Defined in: lib/graph-rank/keywords.rb
Overview
Implement the PageRank algorithm for unsupervised keyword extraction.
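The overall pipeline can be sketched end to end in plain Ruby: clean the text, split and filter it, link co-occurring words, then score the words with PageRank. This is a minimal, self-contained sketch that does not use the gem. The stop-word list, the symmetric window of 2 words (clamped at the array bounds), the damping factor 0.85, the fixed 30 iterations and the name keyword_ranks are all assumptions for illustration. The gem itself records edges through @ranking.add and scores them in the inherited #calculate_ranking, whose details are not shown on this page.

```ruby
require 'set'

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = Set.new(%w[the and for with that this are was])

# Hypothetical helper: returns a Hash mapping each word to its PageRank score.
# Requires Ruby 2.6+ (Array#to_h with a block).
def keyword_ranks(text, window: 2, damping: 0.85, iterations: 30)
  # Clean: lowercase, keep only a-z and spaces, split on whitespace.
  words = text.downcase.gsub(/[^a-z ]/, ' ').split
  # Filter: drop words shorter than 3 characters and stop words.
  words = words.reject { |w| w.length < 3 || STOP_WORDS.include?(w) }

  # Undirected co-occurrence graph over a symmetric window (an assumption).
  graph = Hash.new { |h, k| h[k] = Set.new }
  words.each_with_index do |w, i|
    lo = [i - window, 0].max
    hi = [i + window, words.length - 1].min
    (lo..hi).each do |j|
      next if j == i || words[j] == w
      graph[w] << words[j]
      graph[words[j]] << w
    end
  end

  nodes = graph.keys
  n = nodes.size
  return {} if n.zero?

  # PageRank: PR(v) = (1 - d) / N + d * sum over neighbours u of PR(u) / deg(u).
  rank = nodes.to_h { |v| [v, 1.0 / n] }
  iterations.times do
    rank = nodes.to_h do |v|
      incoming = graph[v].sum { |u| rank[u] / graph[u].size }
      [v, (1 - damping) / n + damping * incoming]
    end
  end
  rank
end
```

The top keywords can then be read off with, for example, keyword_ranks(text).max_by(3) { |_word, score| score }. Every node here has at least one neighbour, so the scores always sum to 1.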
Constant Summary
Constants inherited from TextRank
Instance Attribute Summary
Attributes inherited from TextRank
Instance Method Summary
- #build_graph ⇒ Object: Build the co-occurrence graph, linking each word to the words in a window around it whose size is set by @ngram_size.
- #clean_text ⇒ Object: Downcase the text and keep only the letters a-z, collapsing everything else into single spaces.
- #filter_features ⇒ Object: Remove short words and stop words.
- #get_features ⇒ Object: Clean the text and split it into words.
- #remove_short_words ⇒ Object: Remove words of one or two characters.
- #remove_stop_words ⇒ Object: Remove all stop words.
Methods inherited from TextRank
#calculate_ranking, #initialize, #run
Constructor Details
This class inherits a constructor from GraphRank::TextRank
Instance Method Details
#build_graph ⇒ Object
Build the co-occurrence graph, linking each word to the words in a window around it whose size is set by @ngram_size.
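```ruby
# File 'lib/graph-rank/keywords.rb', line 35

def build_graph
  @features.each_with_index do |f, i|
    # The window runs from i - @ngram_size up to, but not including, i + @ngram_size.
    min, max = i - @ngram_size, i + @ngram_size
    while min < max
      # Indices past either end of @features return nil and are skipped.
      # A negative min that is still in range reads from the end of @features.
      if @features[min] and min != i
        @ranking.add(@features[i], @features[min])
      end
      min += 1
    end
  end
end
```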
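The loop in build_graph can be replayed outside the class to see which word pairs it links. In this sketch, cooccurrence_pairs is a hypothetical helper that collects the pairs in an array instead of calling @ranking.add. The clamp: option is an addition for comparison: with clamp: true negative indices are skipped, while with clamp: false the loop behaves like the original, where a negative min wraps around to the end of the array.

```ruby
# Hypothetical helper: replays the loop from #build_graph and returns the
# [word, neighbour] pairs that would be passed to @ranking.add.
def cooccurrence_pairs(features, ngram_size, clamp: true)
  pairs = []
  features.each_with_index do |_word, i|
    min, max = i - ngram_size, i + ngram_size
    # clamp: true skips negative indices; the original loop does not,
    # so features[-1], features[-2], ... (the last words) are also visited.
    min = 0 if clamp && min < 0
    while min < max
      pairs << [features[i], features[min]] if features[min] && min != i
      min += 1
    end
  end
  pairs
end

words = %w[alpha beta gamma delta]
p cooccurrence_pairs(words, 1)
# prints [["beta", "alpha"], ["gamma", "beta"], ["delta", "gamma"]]
p cooccurrence_pairs(words, 1, clamp: false)
# prints [["alpha", "delta"], ["beta", "alpha"], ["gamma", "beta"], ["delta", "gamma"]]
```

Because max is exclusive, a window size of 1 links each word only to its left neighbour; in general the window covers ngram_size words before the current word and ngram_size - 1 words after it.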
#clean_text ⇒ Object
Downcase the text and keep only the letters a-z, collapsing everything else into single spaces.
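```ruby
# File 'lib/graph-rank/keywords.rb', line 18

def clean_text
  @text.downcase!
  @text.gsub!(/[^a-z ]/, ' ')  # replace every character other than a-z or space
  @text.gsub!(/\s+/, " ")      # collapse runs of whitespace into one space
end
```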
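The effect of clean_text on a sample string can be checked with a standalone copy of its three steps. The top-level clean_text(text) below is a hypothetical helper that works on a copy of its argument rather than on @text.

```ruby
# Hypothetical standalone copy of #clean_text that takes the text as an argument.
def clean_text(text)
  text = text.dup               # leave the caller's string untouched
  text.downcase!                # "Graph-Rank" -> "graph-rank"
  text.gsub!(/[^a-z ]/, ' ')    # digits, punctuation and newlines -> spaces
  text.gsub!(/\s+/, " ")        # collapse whitespace runs
  text
end

p clean_text("Graph-Rank 2.0:\nKeyword extraction!")
# prints "graph rank keyword extraction "
```

The trailing space left by the last substitution does no harm, since get_features splits with split(' '), which ignores leading and trailing whitespace. Non-ASCII letters such as é are also replaced by spaces.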
#filter_features ⇒ Object
Remove short and stop words.
```ruby
# File 'lib/graph-rank/keywords.rb', line 12

def filter_features
  remove_short_words
  remove_stop_words
end
```
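The two filters applied by filter_features can be tried on a plain array. This is a sketch: filter_features(features, stop_words) is a hypothetical standalone variant that takes its inputs as arguments, and the three-word stop list is illustrative, not the list the gem stores in @stop_words.

```ruby
require 'set'

# Hypothetical standalone version of #filter_features: takes the word list
# and stop words as arguments instead of reading @features and @stop_words.
def filter_features(features, stop_words)
  features = features.dup
  # remove_short_words: drop words with fewer than 3 characters.
  features.delete_if { |word| word.length < 3 }
  # remove_stop_words: drop every word found in the stop-word collection.
  features.delete_if { |word| stop_words.include?(word) }
  features
end

stop_words = Set.new(%w[the and for]) # illustrative list, not the gem's
p filter_features(%w[a to the graph and rank of keyword extraction], stop_words)
# prints ["graph", "rank", "keyword", "extraction"]
```

A Set keeps each include? lookup constant-time; whether the gem's @stop_words is an Array or a Set is not shown on this page. Running the length filter first also removes any one- or two-letter stop words before the second pass.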
#get_features ⇒ Object
Clean the text and split it into words.
```ruby
# File 'lib/graph-rank/keywords.rb', line 6

def get_features
  clean_text
  @features = @text.split(' ')
end
```
#remove_short_words ⇒ Object
Remove words of one or two characters.
```ruby
# File 'lib/graph-rank/keywords.rb', line 30

def remove_short_words
  @features.delete_if { |word| word.length < 3 }
end
```
#remove_stop_words ⇒ Object
Remove all stop words.
```ruby
# File 'lib/graph-rank/keywords.rb', line 25

def remove_stop_words
  @features.delete_if { |word| @stop_words.include?(word) }
end
```