Class: Boilerpipe::Extractors::DefaultExtractor
- Inherits: Object
  - Object
  - Boilerpipe::Extractors::DefaultExtractor
- Defined in: lib/boilerpipe/extractors/default_extractor.rb
Class Method Details
.process(doc) ⇒ Object
    # File 'lib/boilerpipe/extractors/default_extractor.rb', line 9

    def self.process(doc)
      filters = ::Boilerpipe::Filters

      # merge adjacent blocks with equal text_density
      filters::SimpleBlockFusionProcessor.process doc

      # merge text blocks next to each other
      filters::BlockProximityFusion::MAX_DISTANCE_1.process doc

      # marks text blocks as content / non-content using boilerpipe alg
      filters::DensityRulesClassifier.process doc
      doc
    end
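The method above applies each filter to the same document in order, mutating it in place, and returns the document at the end. The following is a minimal sketch of that pattern only. `Block`, `Document`, `FuseEqualDensity`, `ClassifyByDensity`, `run_pipeline`, and the threshold value are hypothetical stand-ins invented for illustration; they are not part of Boilerpipe, and the real filters likely apply more elaborate rules.

```ruby
# Hypothetical stand-ins for illustration; none of these are Boilerpipe classes.
Block = Struct.new(:text, :density, :content)

class Document
  attr_reader :blocks

  def initialize(blocks)
    @blocks = blocks
  end
end

# Merges a block into its predecessor when both have the same density.
module FuseEqualDensity
  def self.process(doc)
    merged = doc.blocks.each_with_object([]) do |block, acc|
      prev = acc.last
      if prev && prev.density == block.density
        prev.text = "#{prev.text} #{block.text}"
      else
        acc << block
      end
    end
    doc.blocks.replace(merged) # mutate the document in place
  end
end

# Marks a block as content when its density reaches a made-up threshold.
module ClassifyByDensity
  THRESHOLD = 5.0 # assumed value, chosen only for this example

  def self.process(doc)
    doc.blocks.each { |block| block.content = block.density >= THRESHOLD }
  end
end

# Runs every filter against the same document, then returns it.
def run_pipeline(doc)
  [FuseEqualDensity, ClassifyByDensity].each { |filter| filter.process(doc) }
  doc
end

doc = Document.new([
  Block.new('Home', 1.0, false),
  Block.new('About', 1.0, false),
  Block.new('A long paragraph of article text.', 9.0, false)
])
run_pipeline(doc)
puts doc.blocks.select(&:content).map(&:text)
```

Because every stage shares one mutable document, the order of the stages matters: in this sketch, fusion runs before classification so that the classifier sees the merged blocks.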
.text(contents) ⇒ Object
    # File 'lib/boilerpipe/extractors/default_extractor.rb', line 3

    def self.text(contents)
      doc = ::Boilerpipe::SAX::BoilerpipeHTMLParser.parse(contents)
      ::Boilerpipe::Extractors::DefaultExtractor.process doc
      doc.content
    end
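A possible way to call `.text` from application code is sketched below. It assumes the library is loadable with `require 'boilerpipe'` and that `doc.content` yields a string; neither is stated on this page. The `main_text` helper and the sample HTML are hypothetical, and the helper returns `nil` when the library is not available.

```ruby
# Assumption: the library is installed and loadable under this name.
begin
  require 'boilerpipe'
rescue LoadError
  warn 'boilerpipe is not available; main_text will return nil'
end

# Hypothetical helper: returns the extracted text, or nil when the
# library could not be loaded.
def main_text(html)
  return nil unless defined?(::Boilerpipe::Extractors::DefaultExtractor)

  ::Boilerpipe::Extractors::DefaultExtractor.text(html)
end

# Sample input invented for this example.
html = <<~HTML
  <html>
    <body>
      <div><a href="/">Home</a> <a href="/about">About</a></div>
      <p>This paragraph stands in for the main article text. It is long
      enough to look like real content rather than navigation links.</p>
    </body>
  </html>
HTML

result = main_text(html)
puts result unless result.nil?
```

Calling `.text` is the shortest route when only the extracted text is needed; `.process` is the one to call when the classified document itself is needed.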