Class: GraphRank::Keywords

Inherits:
TextRank
Defined in:
lib/graph-rank/keywords.rb

Overview

Implement the PageRank algorithm for unsupervised keyword extraction.

Constant Summary

Constants inherited from TextRank

TextRank::StopWords

Instance Attribute Summary

Attributes inherited from TextRank

#stop_words

Instance Method Summary

Methods inherited from TextRank

#calculate_ranking, #initialize, #run

Constructor Details

This class inherits a constructor from GraphRank::TextRank

Instance Method Details

#build_graph ⇒ Object

Build the co-occurrence graph, linking each feature to its neighbours within an n-gram window.



# File 'lib/graph-rank/keywords.rb', line 35

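def build_graph
  @features.each_with_index do |f, i|
    # Window around position i; the upper bound is exclusive.
    min, max = i - @ngram_size, i + @ngram_size
    while min < max
      # Skip negative indices, which would otherwise wrap around
      # to the end of the array, and skip the feature itself.
      if min >= 0 and @features[min] and min != i
        @ranking.add(@features[i], @features[min])
      end
      min += 1
    end
  end
end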
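
The window logic can be tried outside the class. The following is a minimal sketch, not the library's code: `cooccurrence_pairs` and its parameters are hypothetical names, the pairs are collected in a plain array instead of being passed to `@ranking.add`, and the window is clamped to the bounds of the array, which is a choice made here for illustration.

```ruby
# Standalone sketch of the sliding-window pairing; no GraphRank classes are used.
# `cooccurrence_pairs` is a hypothetical helper introduced for illustration.
def cooccurrence_pairs(features, ngram_size)
  pairs = []
  features.each_with_index do |feature, i|
    # Same shape as build_graph: lower bound inclusive, upper bound exclusive,
    # clamped here so that indices never leave the array.
    lower = [i - ngram_size, 0].max
    upper = [i + ngram_size, features.length].min
    (lower...upper).each do |j|
      pairs << [feature, features[j]] unless j == i
    end
  end
  pairs
end

pairs = cooccurrence_pairs(%w[graph rank keyword extraction], 2)
pairs.each { |a, b| puts "#{a} -> #{b}" }
```

With a window size of 2, each word is paired with up to two predecessors but only one successor, because the upper bound of the window is exclusive.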

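#clean_text ⇒ Object

Clean the text: downcase it, replace every character other than a-z and space with a space, then collapse runs of whitespace into a single space.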



# File 'lib/graph-rank/keywords.rb', line 18

def clean_text
  @text.downcase!
  @text.gsub!(/[^a-z ]/, ' ')
  @text.gsub!(/\s+/, " ")
end
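
To see the effect of the three steps on a concrete string, here is a small sketch. The helper `clean` is a hypothetical, non-mutating variant written for illustration; it applies the same regular expressions but returns a new string instead of modifying `@text`.

```ruby
# Hypothetical helper mirroring clean_text, but returning a new string
# instead of mutating an instance variable.
def clean(text)
  text.downcase                 # lowercase everything
      .gsub(/[^a-z ]/, ' ')     # digits, punctuation, accented letters -> space
      .gsub(/\s+/, ' ')         # collapse runs of whitespace
end

cleaned = clean("PageRank, in 2 steps: rank & extract!")
p cleaned  # => "pagerank in steps rank extract "
```

Note that a trailing space can remain after cleaning; it is harmless once the text is split on whitespace.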

#filter_features ⇒ Object

Remove short words and stop words.



# File 'lib/graph-rank/keywords.rb', line 12

def filter_features
  remove_short_words
  remove_stop_words
end
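
The two filtering steps can be sketched on a plain array. The stop-word list below is a made-up sample for illustration only; the real list comes from TextRank::StopWords and is not reproduced here.

```ruby
# Hypothetical stop-word sample, NOT the contents of TextRank::StopWords.
stop_words = %w[the and for with]

features = %w[the rank of a keyword is computed with the graph]

# Step 1: drop words shorter than three characters.
features.delete_if { |word| word.length < 3 }
# Step 2: drop the remaining words that appear in the stop-word list.
features.delete_if { |word| stop_words.include?(word) }

p features  # => ["rank", "keyword", "computed", "graph"]
```

Both steps use `Array#delete_if`, so the array is modified in place and the relative order of the surviving words is preserved.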

#get_features ⇒ Object

Clean the text, then split it into words on whitespace.



# File 'lib/graph-rank/keywords.rb', line 6

def get_features
  clean_text
  @features = @text.split(' ')
end
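
Splitting on the string `' '` (rather than on a regular expression) matters here: Ruby then ignores leading and trailing whitespace, so stray spaces left by the cleaning step do not produce empty features. A short sketch with a made-up input string:

```ruby
# Made-up input with leading and trailing spaces, as cleaning may produce.
cleaned = " pagerank in steps rank extract "

# Splitting on a single-space string skips leading and trailing whitespace.
features = cleaned.split(' ')
p features        # => ["pagerank", "in", "steps", "rank", "extract"]

# Splitting on a regexp keeps the leading empty string instead.
with_regexp = cleaned.split(/ /)
p with_regexp     # => ["", "pagerank", "in", "steps", "rank", "extract"]
```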

#remove_short_words ⇒ Object

Remove words of one or two characters.



# File 'lib/graph-rank/keywords.rb', line 30

def remove_short_words
  @features.delete_if { |word| word.length < 3 }
end

#remove_stop_words ⇒ Object

Remove all stop words.



# File 'lib/graph-rank/keywords.rb', line 25

def remove_stop_words
  @features.delete_if { |word| @stop_words.include?(word) }
end