Module: Boilerpipe::Filters

Defined in:
lib/boilerpipe/filters/min_words_filter.rb,
lib/boilerpipe/filters/canola_classifier.rb,
lib/boilerpipe/filters/list_at_end_filter.rb,
lib/boilerpipe/filters/heuristic_filter_base.rb,
lib/boilerpipe/filters/block_proximity_fusion.rb,
lib/boilerpipe/filters/min_clause_words_filter.rb,
lib/boilerpipe/filters/boilerplate_block_filter.rb,
lib/boilerpipe/filters/density_rules_classifier.rb,
lib/boilerpipe/filters/keep_largest_block_filter.rb,
lib/boilerpipe/filters/terminating_blocks_finder.rb,
lib/boilerpipe/filters/num_words_rules_classifier.rb,
lib/boilerpipe/filters/simple_block_fusion_processor.rb,
lib/boilerpipe/filters/split_paragraph_blocks_filter.rb,
lib/boilerpipe/filters/expand_title_to_content_filter.rb,
lib/boilerpipe/filters/mark_everything_content_filter.rb,
lib/boilerpipe/filters/document_title_match_classifier.rb,
lib/boilerpipe/filters/ignore_blocks_after_content_filter.rb,
lib/boilerpipe/filters/trailing_headline_to_boilerplate_filter.rb,
lib/boilerpipe/filters/large_block_same_tag_level_to_content_filter.rb

Overview

Marks all blocks as content that:

- are on the same tag level as very likely main content
  (usually the level of the largest block)
- have a significant number of words, currently at least 100

Used downstream of KeepLargestBlockFilter.
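
The rule above can be sketched in a few lines of Ruby. Judging by its name, the description seems to correspond to LargeBlockSameTagLevelToContentFilter, but this is only an illustrative sketch and not the library's implementation: the `Block` struct, the `:very_likely_content` label, and the method name `mark_large_blocks_on_same_tag_level` are all hypothetical stand-ins. It assumes an earlier step has already flagged one block as very likely content.

```ruby
# Hypothetical stand-in for a text block; NOT the boilerpipe TextBlock API.
Block = Struct.new(:text, :tag_level, :num_words, :labels, :content) do
  def very_likely_content?
    labels.include?(:very_likely_content)
  end
end

MIN_WORDS = 100 # threshold stated in the overview

# Marks as content every non-content block that sits on the same tag level as
# the block labelled very likely content and has at least MIN_WORDS words.
# Returns true if any block was changed, false otherwise.
def mark_large_blocks_on_same_tag_level(blocks)
  anchor = blocks.find { |b| b.content && b.very_likely_content? }
  return false if anchor.nil?

  changed = false
  blocks.each do |b|
    next if b.content
    next unless b.tag_level == anchor.tag_level && b.num_words >= MIN_WORDS

    b.content = true
    changed = true
  end
  changed
end

blocks = [
  Block.new('nav', 2, 12, [], false),
  Block.new('main', 4, 350, [:very_likely_content], true),
  Block.new('second', 4, 140, [], false),
  Block.new('caption', 4, 9, [], false),
  Block.new('sidebar', 3, 200, [], false)
]

mark_large_blocks_on_same_tag_level(blocks)
puts blocks.select(&:content).map(&:text).inspect
```

In this sketch only `second` is promoted: `caption` shares the tag level but is too short, and `sidebar` is long enough but sits on a different level.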

Defined Under Namespace

Classes: BlockProximityFusion, BoilerplateBlockFilter, CanolaClassifier, DensityRulesClassifier, DocumentTitleMatchClassifier, ExpandTitleToContentFilter, HeuristicFilterBase, IgnoreBlocksAfterContentFilter, KeepLargestBlockFilter, LargeBlockSameTagLevelToContentFilter, ListAtEndFilter, MarkEverythingContentFilter, MinClauseWordsFilter, MinWordsFilter, NumWordsRulesClassifier, SimpleBlockFusionProcessor, SplitParagraphBlocksFilter, TerminatingBlocksFinder, TrailingHeadlineToBoilerplateFilter