Class: Boilerpipe::Extractors::KeepEverythingWithKMinWordsExtractor

Inherits:
Object
Defined in:
lib/boilerpipe/extractors/keep_everything_with_k_min_words_extractor.rb

Class Method Summary

  • .process(min, doc) ⇒ Object
  • .text(min, contents) ⇒ Object

Class Method Details

.process(min, doc) ⇒ Object

# File 'lib/boilerpipe/extractors/keep_everything_with_k_min_words_extractor.rb', line 13

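def self.process(min, doc)
  # Merge adjacent text blocks, then mark every block as content.
  ::Boilerpipe::Filters::SimpleBlockFusionProcessor.process doc
  ::Boilerpipe::Filters::MarkEverythingContentFilter.process doc
  # Demote content blocks that contain fewer than `min` words.
  ::Boilerpipe::Filters::MinWordsFilter.process min, doc
  # The filters mutate `doc` in place; return it for convenience.
  doc
end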
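
The filter order in .process can be illustrated without the gem. The following is a minimal sketch in plain Ruby, not the library's implementation: Block, mark_everything_content, min_words_filter, and keep_everything_with_k_min_words are hypothetical stand-ins, the block-fusion step is left out, and the word count is assumed to be a simple whitespace split.

```ruby
# Hypothetical stand-in for a text block; this is not the gem's own class.
Block = Struct.new(:text, :is_content) do
  # Assumption: words are counted by splitting on whitespace.
  def num_words
    text.split.size
  end
end

# Stand-in for the "mark everything as content" step.
def mark_everything_content(blocks)
  blocks.each { |b| b.is_content = true }
end

# Stand-in for the minimum-words step: content blocks with fewer
# than `min` words lose their content flag.
def min_words_filter(min, blocks)
  blocks.each { |b| b.is_content = false if b.is_content && b.num_words < min }
end

# Runs the two steps in the same order as .process (block fusion omitted)
# and joins the text of the blocks that are still flagged as content.
def keep_everything_with_k_min_words(min, blocks)
  mark_everything_content(blocks)
  min_words_filter(min, blocks)
  blocks.select(&:is_content).map(&:text).join("\n")
end

blocks = [
  Block.new('Home', false),
  Block.new('This paragraph has more than five words in it.', false),
  Block.new('Contact us', false)
]
puts keep_everything_with_k_min_words(5, blocks)
```

Because every block is first marked as content, the word-count threshold is the only criterion that removes anything in this sketch; short navigation-style blocks such as 'Home' drop out while longer paragraphs stay.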

.text(min, contents) ⇒ Object

# File 'lib/boilerpipe/extractors/keep_everything_with_k_min_words_extractor.rb', line 7

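def self.text(min, contents)
  # Parse the HTML string into a document of text blocks.
  doc = ::Boilerpipe::SAX::BoilerpipeHTMLParser.parse(contents)
  # Apply the filter pipeline; `doc` is modified in place.
  ::Boilerpipe::Extractors::KeepEverythingWithKMinWordsExtractor.process min, doc
  # Return the text of the blocks still marked as content.
  doc.content
end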
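
A call to .text might look like the sketch below. It assumes the gem is installed and loads with require 'boilerpipe'; the helper long_blocks_text and the sample HTML are hypothetical and exist only for illustration. The helper returns nil when the gem cannot be loaded, so the snippet runs either way.

```ruby
# Hypothetical helper: wraps the extractor call and tolerates a missing gem.
# Assumption: the gem is loadable via `require 'boilerpipe'`.
def long_blocks_text(html, min_words)
  require 'boilerpipe'
  Boilerpipe::Extractors::KeepEverythingWithKMinWordsExtractor.text(min_words, html)
rescue LoadError
  nil # gem not available in this environment
end

# Illustrative input: one short block and one longer paragraph.
html = <<~HTML
  <html><body>
    <div>Menu</div>
    <p>This paragraph is long enough to pass a five word minimum.</p>
  </body></html>
HTML

text = long_blocks_text(html, 5)
puts text unless text.nil?
```

Note the argument order: the minimum word count comes first and the HTML string second, matching the .text(min, contents) signature above.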