Class: PDF::Reader::OverlappingRunsFilter
- Inherits: Object
  - Object
  - PDF::Reader::OverlappingRunsFilter
- Defined in: lib/pdf/reader/overlapping_runs_filter.rb
Overview
Removes duplicates from a collection of TextRun objects. This can be helpful when a PDF uses slightly offset, overlapping characters to achieve a fake ‘bold’ effect.
Constant Summary
- OVERLAPPING_THRESHOLD = 0.5
  This should be between 0 and 1. If TextRun B obscures this much of TextRun A (and they have identical characters) then one will be discarded.
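To make the threshold concrete, here is a minimal sketch of an overlap check. `Box` and `overlap_fraction` are hypothetical stand-ins, not part of the library; the sketch assumes the overlap is measured as intersection area divided by the area of the first box, which may differ from how TextRun computes `intersection_area_percent`.

```ruby
# Hypothetical stand-in for a TextRun's bounding box (not the library's class).
Box = Struct.new(:x, :y, :width, :height) do
  def endx; x + width; end
  def endy; y + height; end

  # Fraction of this box's area that is covered by `other` (0.0 to 1.0).
  def overlap_fraction(other)
    area = width * height
    return 0.0 if area.zero?
    w = [[endx, other.endx].min - [x, other.x].max, 0].max
    h = [[endy, other.endy].min - [y, other.y].max, 0].max
    (w * h).to_f / area
  end
end

OVERLAPPING_THRESHOLD = 0.5

glyph     = Box.new(10.0, 10.0, 10.0, 10.0)
fake_bold = Box.new(10.5, 10.0, 10.0, 10.0) # same glyph drawn again, nudged right
neighbour = Box.new(18.0, 10.0, 10.0, 10.0) # next glyph, only a sliver overlaps

puts glyph.overlap_fraction(fake_bold) >= OVERLAPPING_THRESHOLD # true  (0.95 of the area)
puts glyph.overlap_fraction(neighbour) >= OVERLAPPING_THRESHOLD # false (0.2 of the area)
```

With a threshold of 0.5, a slightly offset duplicate is well above the cut-off, while adjacent glyphs that merely touch or kern into each other stay below it.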
Class Method Summary
- .detect_intersection(sweep_line_status, event_point) ⇒ Object
- .exclude_redundant_runs(runs) ⇒ Object
Class Method Details
.detect_intersection(sweep_line_status, event_point) ⇒ Object
# File 'lib/pdf/reader/overlapping_runs_filter.rb', line 40
def self.detect_intersection(sweep_line_status, event_point)
  sweep_line_status.each do |point_in_sls|
    if event_point.x >= point_in_sls.run.x &&
        event_point.x <= point_in_sls.run.endx &&
        point_in_sls.run.intersection_area_percent(event_point.run) >= OVERLAPPING_THRESHOLD
      return true
    end
  end
  return false
end
.exclude_redundant_runs(runs) ⇒ Object
# File 'lib/pdf/reader/overlapping_runs_filter.rb', line 12
def self.exclude_redundant_runs(runs)
  sweep_line_status = Array.new
  event_point_schedule = Array.new
  to_exclude = []

  runs.each do |run|
    event_point_schedule << EventPoint.new(run.x, run)
    event_point_schedule << EventPoint.new(run.endx, run)
  end

  event_point_schedule.sort! { |a,b| a.x <=> b.x }

  while not event_point_schedule.empty? do
    event_point = event_point_schedule.shift
    break unless event_point

    if event_point.start? then
      if detect_intersection(sweep_line_status, event_point)
        to_exclude << event_point.run
      end
      sweep_line_status.push event_point
    else
      sweep_line_status.delete event_point
    end
  end
  runs - to_exclude
end
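The sweep over start and end x-coordinates can be illustrated with a small self-contained sketch. It is not the library's code: `Run`, `Event`, `THRESHOLD`, and `drop_overlapping` are hypothetical names, and the overlap measure is an assumed intersection-area ratio. It also differs from the listing in one respect: it keeps the active runs themselves in the list and removes a run by identity when its end event is reached.

```ruby
# Hypothetical stand-ins; the library itself works with TextRun and EventPoint.
Run = Struct.new(:text, :x, :y, :width, :height) do
  def endx; x + width; end
  def endy; y + height; end

  # Assumed overlap measure: intersection area over this run's area.
  def overlap_fraction(other)
    area = width * height
    return 0.0 if area.zero?
    w = [[endx, other.endx].min - [x, other.x].max, 0].max
    h = [[endy, other.endy].min - [y, other.y].max, 0].max
    (w * h).to_f / area
  end
end

Event = Struct.new(:x, :run, :kind)

THRESHOLD = 0.5

def drop_overlapping(runs)
  # Two events per run: one where it starts and one where it ends.
  events = runs.flat_map do |run|
    [Event.new(run.x, run, :start), Event.new(run.endx, run, :end)]
  end
  # Sort left to right; the index tiebreak keeps the order deterministic.
  events = events.each_with_index.sort_by { |e, i| [e.x, i] }.map(&:first)

  active   = [] # runs the sweep line currently crosses
  excluded = []
  events.each do |event|
    if event.kind == :start
      hit = active.any? { |other| other.overlap_fraction(event.run) >= THRESHOLD }
      excluded << event.run if hit
      active << event.run
    else
      active.delete_if { |r| r.equal?(event.run) }
    end
  end
  # Compare by identity so equal-valued runs are not dropped by accident.
  runs.reject { |r| excluded.any? { |e| e.equal?(r) } }
end

runs = [
  Run.new("H", 10.0, 10.0, 10.0, 10.0),
  Run.new("H", 10.5, 10.0, 10.0, 10.0), # fake-bold duplicate
  Run.new("i", 21.0, 10.0, 5.0, 10.0)
]
puts drop_overlapping(runs).map(&:text).inspect # ["H", "i"]
```

Because only runs whose x-ranges currently span the sweep line are compared, each new run is checked against a short active list rather than against every other run on the page.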