Class: Scrubyt::PostProcessor

Inherits:
Object
Defined in:
lib/scrubyt/output/post_processor.rb

Overview

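The sole purpose of this class is to execute post-processing tasks on an extractor's results: enforcing the pattern-presence constraint, removing duplicates produced by multiple filters, and warning when nothing was extracted.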

Class Method Details

.apply_post_processing(root_pattern) ⇒ Object

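This is just a convenience method to call all the post-processing functionality and checks. The presence constraint is always applied; duplicate removal runs only when the first child of the root pattern has more than one filter, and the no-results report is skipped in :production mode.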



# File 'lib/scrubyt/output/post_processor.rb', line 18

def self.apply_post_processing(root_pattern)
  ensure_presence_of_pattern_full(root_pattern)      
  remove_multiple_filter_duplicates(root_pattern) if root_pattern.children[0].filters.size > 1
  report_if_no_results(root_pattern) if root_pattern.evaluation_context.extractor.get_mode != :production
end
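
The two guards in the method above can be illustrated in isolation. The following is a minimal sketch, not scRUBYt! code: `FakeRoot`, `FakeFilteredChild`, `planned_steps`, and the `:learning` mode value are hypothetical stand-ins, and the mode is stored directly on the root instead of being read through `evaluation_context.extractor.get_mode`.

```ruby
# Hypothetical stand-ins for illustration only; these are not scRUBYt! classes.
FakeFilteredChild = Struct.new(:filters)
FakeRoot          = Struct.new(:children, :mode)

# Returns the names of the post-processing steps that would run,
# mirroring the conditions used in apply_post_processing.
def planned_steps(root)
  steps = [:ensure_presence_of_pattern_full]   # always applied
  steps << :remove_multiple_filter_duplicates if root.children[0].filters.size > 1
  steps << :report_if_no_results if root.mode != :production
  steps
end

# One filter, production mode: only the presence check runs.
single_filter = FakeRoot.new([FakeFilteredChild.new([:f1])], :production)

# Two filters, non-production mode (assumed name :learning): all three steps run.
multi_filter = FakeRoot.new([FakeFilteredChild.new([:f1, :f2])], :learning)

p planned_steps(single_filter)
p planned_steps(multi_filter)
```

Note that, as in the original method, the sketch inspects only the first child's filters, so it assumes the root pattern has at least one child.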

.ensure_presence_of_pattern_full(pattern) ⇒ Object

Applies the ensure_presence_of_pattern constraint to the full extractor: first to the given pattern, then recursively to each of its children.



# File 'lib/scrubyt/output/post_processor.rb', line 27

def self.ensure_presence_of_pattern_full(pattern)
  ensure_presence_of_pattern(pattern)
  pattern.children.each {|child| ensure_presence_of_pattern_full(child)}
end
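
The recursion above visits the pattern itself before its children (a pre-order walk). A minimal sketch of that traversal order, assuming a hypothetical `FakePattern` struct and a `visit_all` helper that records names in place of calling `ensure_presence_of_pattern`:

```ruby
# Hypothetical stand-in for a pattern node; not a scRUBYt! class.
FakePattern = Struct.new(:name, :children)

# Visits a node first, then recurses into each child, as
# ensure_presence_of_pattern_full does with its constraint check.
def visit_all(pattern, visited = [])
  visited << pattern.name   # stands in for ensure_presence_of_pattern(pattern)
  pattern.children.each { |child| visit_all(child, visited) }
  visited
end

tree = FakePattern.new(:root, [
  FakePattern.new(:book, [
    FakePattern.new(:title, []),
    FakePattern.new(:price, [])
  ])
])

visit_order = visit_all(tree)
p visit_order
```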

.remove_multiple_filter_duplicates(pattern) ⇒ Object

Removes unneeded results of a pattern that are caused by evaluating multiple filters. In the B&N scenario, for example, the book titles are extracted twice for every pattern, since both examples generate the same XPath for them. Because only one of the two results ever has a price, the other is discarded.



# File 'lib/scrubyt/output/post_processor.rb', line 37

def self.remove_multiple_filter_duplicates(pattern)
  remove_multiple_filter_duplicates_intern(pattern) if pattern.parent_of_leaf
  pattern.children.each {|child| remove_multiple_filter_duplicates(child)}
end
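
The body of `remove_multiple_filter_duplicates_intern` is not shown on this page, so the following is only an illustration of the idea described above, not the library's implementation. It assumes results can be modelled as plain hashes and that the "complete" result is the one with the most non-nil fields; the `drop_filter_duplicates` helper and the sample records are made up.

```ruby
# Illustrative only: keeps, for each title, the record with the most
# non-nil fields, discarding the incomplete duplicate.
def drop_filter_duplicates(records)
  records.group_by { |record| record[:title] }.map do |_title, group|
    group.max_by { |record| record.values.compact.size }
  end
end

# Made-up data: each title was matched twice, but only one match has a price.
records = [
  { title: "Book A", price: "$10" },
  { title: "Book A", price: nil },
  { title: "Book B", price: nil },
  { title: "Book B", price: "$12" }
]

deduplicated = drop_filter_duplicates(records)
p deduplicated
```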

.report_if_no_results(root_pattern) ⇒ Object

Logs a warning if the extractor did not extract anything. This is probably because the structure of the page changed or because of a bug. In either case something is wrong, and the user needs to be informed.



# File 'lib/scrubyt/output/post_processor.rb', line 47

def self.report_if_no_results(root_pattern)
  results_found = false
  root_pattern.children.each {|child| return if (child.result.childmap.size > 0)}
  
  Scrubyt.log :WARNING, [
    "The extractor did not find any result instances. Most probably this is wrong.",
    "Check your extractor and if you are sure it should work, report a bug!"
  ]
end