Module: Boilerpipe::Extractors
Defined in:
lib/boilerpipe/extractors/keep_everything_extractor.rb,
lib/boilerpipe/extractors/canola_extractor.rb,
lib/boilerpipe/extractors/article_extractor.rb,
lib/boilerpipe/extractors/default_extractor.rb,
lib/boilerpipe/extractors/largest_content_extractor.rb,
lib/boilerpipe/extractors/num_words_rules_extractor.rb,
lib/boilerpipe/extractors/article_sentence_extractor.rb,
lib/boilerpipe/extractors/keep_everything_with_k_min_words_extractor.rb
Overview
LargestContentExtractor is a full-text extractor that extracts the largest text component of a page. For news articles, it may perform better than DefaultExtractor, but it usually performs worse than ArticleExtractor.
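The idea of keeping only "the largest text component" can be illustrated with a minimal, self-contained sketch. This is not the library's implementation: `SketchBlock` and `largest_block` are hypothetical names introduced here, and the sketch assumes that a page has already been split into text blocks and that "largest" means "most words".

```ruby
# Hypothetical stand-in for one text block of a page.
# This is NOT a Boilerpipe class; it exists only for this illustration.
SketchBlock = Struct.new(:text) do
  # Count whitespace-separated tokens as words (an assumption of this sketch).
  def num_words
    text.split.size
  end
end

# Return the block with the most words, or nil when there are no blocks.
def largest_block(blocks)
  blocks.max_by(&:num_words)
end

blocks = [
  SketchBlock.new("Home | About | Contact"),
  SketchBlock.new("The quick brown fox jumps over the lazy dog near the river bank."),
  SketchBlock.new("Copyright 2017")
]

# Navigation and footer blocks are short, so the article-like block wins.
puts largest_block(blocks).text
```

On a page with one long article body this heuristic tends to discard navigation and footer text, but it also drops any content that lives outside the single largest block, which may be one reason an article-specific extractor usually does better on news pages.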
Defined Under Namespace
Classes: ArticleExtractor, ArticleSentenceExtractor, CanolaExtractor, DefaultExtractor, KeepEverythingExtractor, KeepEverythingWithKMinWordsExtractor, LargestContentExtractor, NumWordsRulesExtractor