Class: TextRank::KeywordExtractor
- Inherits:
- Object
  - Object
    - TextRank::KeywordExtractor
- Defined in:
- lib/text_rank/keyword_extractor.rb
Overview
Primary class for keyword extraction and hub for filters, tokenizers, and graph strategies that customize how the text is processed and how the TextRank algorithm is applied.
Class Method Summary
- .advanced(**options) ⇒ KeywordExtractor
  Creates an “advanced” keyword extractor with a larger set of default filters.
- .basic(**options) ⇒ KeywordExtractor
  Creates a “basic” keyword extractor with default options.
Instance Method Summary
- #add_char_filter(filter, **options) ⇒ nil
  Adds a new CharFilter for processing text before tokenization.
- #add_rank_filter(filter, **options) ⇒ nil
  Adds a new RankFilter for processing ranks after they are calculated.
- #add_token_filter(filter, **options) ⇒ nil
  Adds a new TokenFilter for processing tokens after tokenization.
- #add_tokenizer(tokenizer, **options) ⇒ nil
  Adds a tokenizer regular expression for producing tokens from filtered text.
- #extract(text, **options) ⇒ Hash<String, Float>
  Filters and tokenizes text, then returns the PageRank score of each token.
- #graph_strategy=(strategy) ⇒ Class, ...
  Sets the graph strategy for producing a graph from tokens.
- #initialize(**options) ⇒ KeywordExtractor (constructor)
  A new instance of KeywordExtractor.
- #tokenize(text) ⇒ Array<String>
  Filters and tokenizes text.
Constructor Details
#initialize(**options) ⇒ KeywordExtractor
Returns a new instance of KeywordExtractor.
# File 'lib/text_rank/keyword_extractor.rb', line 42
def initialize(**options)
  # Options forwarded to PageRank.new when #extract runs.
  @page_rank_options = {
    strategy:  options[:strategy] || :sparse,
    damping:   options[:damping],
    tolerance: options[:tolerance],
  }
  @char_filters   = options[:char_filters] || []
  @tokenizers     = options[:tokenizers] || [Tokenizer::Word]
  @token_filters  = options[:token_filters] || []
  @rank_filters   = options[:rank_filters] || []
  @graph_strategy = options[:graph_strategy] || GraphStrategy::Coocurrence
end
Class Method Details
.advanced(**options) ⇒ KeywordExtractor
Creates an “advanced” keyword extractor with a larger set of default filters.
# File 'lib/text_rank/keyword_extractor.rb', line 26
def self.advanced(**options)
  # Caller-supplied options override these defaults key by key.
  new(**{
    char_filters:   [:AsciiFolding, :Lowercase, :StripHtml, :StripEmail, :UndoContractions, :StripPossessive],
    tokenizers:     [:Url, :Money, :Number, :Word, :Punctuation],
    token_filters:  [:PartOfSpeech, :Stopwords, :MinLength],
    graph_strategy: :Coocurrence,
    rank_filters:   [:CollapseAdjacent, :NormalizeUnitVector, :SortByValue],
  }.merge(options))
end
.basic(**options) ⇒ KeywordExtractor
Creates a “basic” keyword extractor with default options.
# File 'lib/text_rank/keyword_extractor.rb', line 14
def self.basic(**options)
  # Caller-supplied options override these defaults key by key.
  new(**{
    char_filters:   [:AsciiFolding, :Lowercase],
    tokenizers:     [:Word],
    token_filters:  [:Stopwords, :MinLength],
    graph_strategy: :Coocurrence,
  }.merge(options))
end
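Both factory methods build a hash of defaults and call `merge` with the caller's options, so a caller-supplied key replaces the default value outright rather than being appended to it. The sketch below reproduces only that merge step in isolation. `BASIC_DEFAULTS` and `basic_options` are hypothetical names introduced for illustration and are not part of the library.

```ruby
# Hypothetical stand-in for the defaults hash built inside .basic.
BASIC_DEFAULTS = {
  char_filters:   [:AsciiFolding, :Lowercase],
  tokenizers:     [:Word],
  token_filters:  [:Stopwords, :MinLength],
  graph_strategy: :Coocurrence,
}.freeze

# Mirrors the `{ ...defaults }.merge(options)` step: caller keys win.
def basic_options(**options)
  BASIC_DEFAULTS.merge(options)
end

merged = basic_options(token_filters: [:Stopwords], damping: 0.85)

# :token_filters was replaced as a whole, so :MinLength is no longer present.
p merged[:token_filters]
# Keys the caller did not pass keep their defaults.
p merged[:char_filters]
# Keys without a default (such as :damping) are simply added.
p merged[:damping]
```

A practical consequence is that extending a default list, rather than replacing it, seems to require either passing the full list or calling one of the `add_*` methods on the returned instance.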
Instance Method Details
#add_char_filter(filter, **options) ⇒ nil
Adds a new CharFilter for processing text before tokenization.
# File 'lib/text_rank/keyword_extractor.rb', line 59
def add_char_filter(filter, **options)
  add_into(@char_filters, filter, **options)
  nil
end
#add_rank_filter(filter, **options) ⇒ nil
Adds a new RankFilter for processing ranks after they are calculated.
# File 'lib/text_rank/keyword_extractor.rb', line 93
def add_rank_filter(filter, **options)
  add_into(@rank_filters, filter, **options)
  nil
end
#add_token_filter(filter, **options) ⇒ nil
Adds a new TokenFilter for processing tokens after tokenization.
# File 'lib/text_rank/keyword_extractor.rb', line 84
def add_token_filter(filter, **options)
  add_into(@token_filters, filter, **options)
  nil
end
#add_tokenizer(tokenizer, **options) ⇒ nil
Adds a tokenizer regular expression for producing tokens from filtered text.
# File 'lib/text_rank/keyword_extractor.rb', line 68
def add_tokenizer(tokenizer, **options)
  add_into(@tokenizers, tokenizer, **options)
  nil
end
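The four `add_*` methods share one shape: each hands its list to `add_into` and then returns `nil`, so calls cannot be chained. `add_into` itself is not shown on this page, so the sketch below assumes the simplest behaviour (appending to the end) and models filters as lambdas instead of the library's filter classes. `MiniExtractor` is a hypothetical class used only to illustrate the pattern.

```ruby
# Hypothetical miniature of the add_* pattern; not the library's class.
class MiniExtractor
  def initialize
    @char_filters = []
  end

  # Appends a filter and returns nil, like the add_* methods above.
  # Assumption: the real add_into helper may support more than appending.
  def add_char_filter(filter)
    @char_filters << filter
    nil
  end

  # Applies each registered filter to the text in insertion order.
  def filter(text)
    @char_filters.reduce(text) { |current, f| f.call(current) }
  end
end

extractor = MiniExtractor.new
extractor.add_char_filter(->(text) { text.downcase })
extractor.add_char_filter(->(text) { text.gsub(/<[^>]+>/, ' ') })

puts extractor.filter('<b>Hello</b> World')
```

Because each filter receives the output of the previous one, the order in which filters are registered can change the result.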
#extract(text, **options) ⇒ Hash<String, Float>
Filters and tokenizes text, then returns the PageRank score of each token.
# File 'lib/text_rank/keyword_extractor.rb', line 110
def extract(text, **options)
  tokens = tokenize(text)
  graph = PageRank.new(**@page_rank_options)
  # Resolve the configured strategy and let it populate the graph.
  classify(@graph_strategy, context: GraphStrategy).build_graph(tokens, graph)
  ranks = graph.calculate(**options)
  apply_rank_filters(ranks, tokens: tokens, original_text: text)
end
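The listing above runs four steps: tokenize, build a graph from the tokens, calculate ranks, and apply rank filters. The following is a minimal, self-contained sketch of that flow under simplifying assumptions: a regex tokenizer, a co-occurrence window of two tokens, and a plain power-iteration PageRank with a damping factor of 0.85. Every method name here is hypothetical, and none of this is the library's own `PageRank` or `GraphStrategy` code.

```ruby
# Simplified, hypothetical stand-ins for the steps in #extract.

# Step 1: lowercase the text and split it into word tokens.
def simple_tokenize(text)
  text.downcase.scan(/[a-z]+/)
end

# Step 2: link each token to the tokens up to `window` positions after it.
def build_cooccurrence(tokens, window: 2)
  edges = Hash.new { |hash, key| hash[key] = Hash.new(0.0) }
  tokens.each_with_index do |token, i|
    ((i + 1)..[i + window, tokens.size - 1].min).each do |j|
      other = tokens[j]
      next if other == token
      # Undirected graph: record the edge weight in both directions.
      edges[token][other] += 1.0
      edges[other][token] += 1.0
    end
  end
  edges
end

# Step 3: power-iteration PageRank over the weighted graph.
def page_rank(edges, damping: 0.85, tolerance: 1e-6, max_iterations: 100)
  nodes = edges.keys
  return {} if nodes.empty?

  ranks = nodes.to_h { |node| [node, 1.0 / nodes.size] }
  max_iterations.times do
    updated = nodes.to_h do |node|
      # Each neighbour passes on rank in proportion to the edge weight.
      incoming = edges[node].sum do |neighbour, weight|
        ranks[neighbour] * weight / edges[neighbour].values.sum
      end
      [node, (1 - damping) / nodes.size + damping * incoming]
    end
    delta = nodes.sum { |node| (updated[node] - ranks[node]).abs }
    ranks = updated
    break if delta < tolerance
  end
  ranks
end

# Step 4: a rank filter; here, sort tokens by descending score.
def sort_by_value(ranks)
  ranks.sort_by { |_token, score| -score }.to_h
end

def simple_extract(text)
  tokens = simple_tokenize(text)
  graph = build_cooccurrence(tokens)
  sort_by_value(page_rank(graph))
end

simple_extract('graph ranking ranks graph nodes by graph links').each do |token, score|
  puts format('%-8s %.4f', token, score)
end
```

In this sketch a token that co-occurs with many others, such as `graph` in the sample sentence, accumulates the highest score, which is the intuition behind using PageRank for keyword extraction.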
#graph_strategy=(strategy) ⇒ Class, ...
Sets the graph strategy for producing a graph from tokens.
# File 'lib/text_rank/keyword_extractor.rb', line 76
def graph_strategy=(strategy)
  @graph_strategy = strategy
end
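The setter stores whatever it is given. Resolution happens later, when `#extract` passes the value through `classify(@graph_strategy, context: GraphStrategy)`. That helper is not shown on this page, but the listings use both a symbol (`:Coocurrence` in `.basic`) and a class (`GraphStrategy::Coocurrence` in `#initialize`), which suggests both forms are accepted. The sketch below shows one plausible way to resolve either form using `Module#const_get`. `DemoStrategy` and `resolve_strategy` are hypothetical names, and the real `classify` may behave differently.

```ruby
# Hypothetical namespace standing in for the library's GraphStrategy module.
module DemoStrategy
  class Coocurrence
    # Toy graph builder: records each adjacent token pair as an edge.
    def self.build_graph(tokens, graph)
      tokens.each_cons(2) { |a, b| graph << [a, b] }
      graph
    end
  end
end

# Assumed behaviour: use a class or module as-is, or look a name up in `context`.
def resolve_strategy(strategy, context:)
  case strategy
  when Module then strategy
  when Symbol, String then context.const_get(strategy)
  else raise ArgumentError, "unsupported strategy: #{strategy.inspect}"
  end
end

by_name  = resolve_strategy(:Coocurrence, context: DemoStrategy)
by_class = resolve_strategy(DemoStrategy::Coocurrence, context: DemoStrategy)

# Both forms resolve to the same class object.
puts by_name.equal?(by_class)
p by_name.build_graph(%w[a b c], [])
```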