Method: TextRank::KeywordExtractor.advanced

Defined in:
lib/text_rank/keyword_extractor.rb

.advanced(**options) ⇒ KeywordExtractor

Creates an "advanced" keyword extractor with a larger set of default filters.

Options Hash (**options):

  • :char_filters (Array<Class, Symbol, #filter!>)

    A list of filters to be applied prior to tokenization

  • :tokenizers (Array<Symbol, Regexp, String>)

    A list of tokenizer regular expressions to perform tokenization

  • :token_filters (Array<Class, Symbol, #filter!>)

    A list of filters to be applied to each token after tokenization

  • :graph_strategy (Class, Symbol, #build_graph)

    A class or strategy instance for producing a graph from tokens

  • :rank_filters (Array<Class, Symbol, #filter!>)

    A list of filters to be applied to the keyword ranks after keyword extraction

  • :strategy (Symbol)

    PageRank strategy to use (either :sparse or :dense)

  • :damping (Float)

    The probability of following the graph vs. randomly choosing a new node

  • :tolerance (Float)

    The desired accuracy of the results
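To make the roles of :damping and :tolerance concrete, here is a minimal power-iteration sketch of PageRank. It is an illustration only, not the gem's implementation: the method name `sketch_page_rank`, the `max_iterations` parameter, the default values, and reading :tolerance as a convergence threshold on the total rank change are all assumptions. It also assumes every node has at least one outgoing edge.

```ruby
# Illustrative PageRank power iteration (hypothetical helper, not part of TextRank).
# graph maps each node to the list of nodes it links to.
def sketch_page_rank(graph, damping: 0.85, tolerance: 0.0001, max_iterations: 100)
  nodes = graph.keys
  n = nodes.size
  ranks = nodes.to_h { |node| [node, 1.0 / n] }

  max_iterations.times do
    # With probability (1 - damping) the walker jumps to a random node.
    next_ranks = nodes.to_h { |node| [node, (1.0 - damping) / n] }

    # With probability damping the walker follows an outgoing edge.
    graph.each do |node, neighbors|
      share = damping * ranks[node] / neighbors.size
      neighbors.each { |neighbor| next_ranks[neighbor] += share }
    end

    # Stop once the ranks move less than the requested tolerance.
    change = nodes.sum { |node| (next_ranks[node] - ranks[node]).abs }
    ranks = next_ranks
    break if change < tolerance
  end

  ranks
end

graph = { a: %i[b c], b: %i[a c], c: %i[a b] }
ranks = sketch_page_rank(graph)
```

In this sketch a higher damping value gives more weight to the graph structure, and a smaller tolerance means more iterations before the loop stops.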

Returns:

  • (KeywordExtractor)

# File 'lib/text_rank/keyword_extractor.rb', line 26

def self.advanced(**options)
  new(**{
    char_filters:   %i[AsciiFolding Lowercase StripHtml StripEmail UndoContractions StripPossessive],
    tokenizers:     %i[Url Money Number Word Punctuation],
    token_filters:  %i[PartOfSpeech Stopwords MinLength],
    graph_strategy: :Coocurrence,
    rank_filters:   %i[CollapseAdjacent NormalizeUnitVector SortByValue],
  }.merge(options))
end
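
Because the defaults are combined with the caller's options through `Hash#merge`, a key passed by the caller replaces the whole default value for that key rather than being appended to it. The sketch below shows this behavior with a hypothetical stand-in class (`SketchExtractor` is not part of the gem, and its default lists are shortened for illustration).

```ruby
# Hypothetical stand-in for illustration; not the real TextRank::KeywordExtractor.
class SketchExtractor
  attr_reader :options

  def initialize(**options)
    @options = options
  end

  # Same shape as .advanced: defaults first, caller options merged on top.
  def self.advanced(**options)
    new(**{
      char_filters:   %i[AsciiFolding Lowercase],
      tokenizers:     %i[Word],
      graph_strategy: :Coocurrence,
    }.merge(options))
  end
end

custom = SketchExtractor.advanced(char_filters: %i[Lowercase], damping: 0.9)

# The caller's :char_filters replaces the default list entirely.
p custom.options[:char_filters]
# Keys the caller did not pass keep their defaults.
p custom.options[:tokenizers]
```

A caller who wants to extend a default list rather than replace it would therefore need to pass the full list, including the default entries.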