Class: PDF::Reader::PagesStrategy

Inherits:
AbstractStrategy
Defined in:
lib/pdf/reader/pages_strategy.rb

Overview

Walks the pages of the PDF file and calls the appropriate callback methods when something of interest is found.

The callback methods should exist on the receiver object passed into the constructor. Whenever content is found that triggers a callback, the receiver is checked to see whether it defines that callback. If it does, the callback is called; if not, processing continues.
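The check-then-call behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the library's own code: the `dispatch` helper and the `PageCounter` class are hypothetical names, and passing the parameters with a splat is an assumption.

```ruby
# Hypothetical stand-in for the strategy's internal dispatch (not the library's code).
# Assumption: parameters are handed to the callback as separate arguments.
def dispatch(receiver, name, params = [])
  # Skip silently when the receiver does not define the callback.
  return false unless receiver.respond_to?(name)

  receiver.public_send(name, *params)
  true
end

# Hypothetical receiver that only cares about page boundaries.
class PageCounter
  attr_reader :pages

  def initialize
    @pages = 0
  end

  def end_page(*params)
    @pages += 1
  end
end

counter = PageCounter.new
dispatch(counter, :begin_page)          # not defined on PageCounter, so ignored
dispatch(counter, :end_page)            # defined, so called
dispatch(counter, :show_text, ["Hi"])   # not defined, so ignored
puts counter.pages
```

Because undefined callbacks are skipped rather than raising, a receiver can stay as small as the one or two methods it actually needs.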

Available Callbacks

The following callbacks are available and should be methods defined on your receiver class. Only implement the ones you need - the rest will be ignored.

Some callbacks include parameters, which are passed in as an array. For callbacks that supply no parameters, or where you don’t need them, the *params argument can be left off. Some example callback method definitions are:

def begin_document
def end_page
def show_text(string, *params)
def fill_stroke(*params)

You should be able to infer the basic command the callback is reporting based on the name. For further experimentation, define the callback with just a *params parameter, then print out the contents of the array using something like:

puts params.inspect
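As a slightly fuller sketch of that experiment, the receiver below captures everything a callback is given and prints it. The class name `ParamLogger`, the `seen` accessor and the arguments used in the hand-made call are all hypothetical; in real use PDF::Reader makes the call and chooses the arguments.

```ruby
# Hypothetical receiver for exploring what a callback is given.
class ParamLogger
  attr_reader :seen

  def initialize
    @seen = []
  end

  # Capture every argument so nothing is lost, then print and keep a copy.
  def show_text_with_positioning(*params)
    puts params.inspect
    @seen << params.inspect
  end
end

logger = ParamLogger.new

# Called by hand with made-up arguments, purely to show the shape of the output.
logger.show_text_with_positioning(["Hel", -20, "lo"])
```

Once the shape of the parameters is known, the signature can be narrowed to name the values you care about, as in the show_text(string, *params) form above.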

Text Callbacks

All text passed into these callbacks will be encoded as UTF-8. Depending on where (and when) the PDF was generated, there’s a good chance the text is NOT stored as UTF-8 internally, so be careful when comparing strings returned from PDF::Reader (in unit tests, for example). The string may not be byte-for-byte identical to the string that was originally written to the PDF.
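The comparison pitfall can be illustrated with plain Ruby strings. In this sketch the ISO-8859-1 bytes stand in for whatever a PDF producer might have written; that choice of encoding is an assumption for illustration, and no PDF::Reader calls are involved.

```ruby
# "café" as a hypothetical producer might have stored it: ISO-8859-1 bytes.
written = [0x63, 0x61, 0x66, 0xE9].pack("C*").force_encoding(Encoding::ISO_8859_1)

# The same text in the form a callback delivers it: UTF-8.
received = "caf\u00E9"

# The raw comparison fails because the bytes and encodings differ.
puts written == received

# Transcoding the expected value to UTF-8 first makes the comparison meaningful.
puts written.encode(Encoding::UTF_8) == received
```

In a unit test, this suggests converting the expected string to UTF-8 before comparing it with text received from a callback.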

  • end_text_object

  • move_to_start_of_next_line

  • set_character_spacing

  • move_text_position

  • move_text_position_and_set_leading

  • set_text_font_and_size

  • show_text

  • show_text_with_positioning

  • set_text_leading

  • set_text_matrix_and_text_line_matrix

  • set_text_rendering_mode

  • set_text_rise

  • set_word_spacing

  • set_horizontal_text_scaling

  • move_to_next_line_and_show_text

  • set_spacing_next_line_show_text

If the :raw_text option was passed to the PDF::Reader class the following callbacks may also appear:

  • show_text_raw

  • show_text_with_positioning_raw

  • move_to_next_line_and_show_text_raw

  • set_spacing_next_line_show_text_raw

Graphics Callbacks

  • close_fill_stroke

  • fill_stroke

  • close_fill_stroke_with_even_odd

  • fill_stroke_with_even_odd

  • begin_marked_content_with_pl

  • begin_inline_image

  • begin_marked_content

  • begin_text_object

  • append_curved_segment

  • concatenate_matrix

  • set_stroke_color_space

  • set_nonstroke_color_space

  • set_line_dash

  • set_glyph_width

  • set_glyph_width_and_bounding_box

  • invoke_xobject

  • define_marked_content_with_pl

  • end_inline_image

  • end_marked_content

  • fill_path_with_nonzero

  • fill_path_with_even_odd

  • set_gray_for_stroking

  • set_gray_for_nonstroking

  • set_graphics_state_parameters

  • close_subpath

  • set_flatness_tolerance

  • begin_inline_image_data

  • set_line_join_style

  • set_line_cap_style

  • set_cmyk_color_for_stroking

  • set_cmyk_color_for_nonstroking

  • append_line

  • begin_new_subpath

  • set_miter_limit

  • define_marked_content_point

  • end_path

  • save_graphics_state

  • restore_graphics_state

  • append_rectangle

  • set_rgb_color_for_stroking

  • set_rgb_color_for_nonstroking

  • set_color_rendering_intent

  • close_and_stroke_path

  • stroke_path

  • set_color_for_stroking

  • set_color_for_nonstroking

  • set_color_for_stroking_and_special

  • set_color_for_nonstroking_and_special

  • paint_area_with_shading_pattern

  • append_curved_segment_initial_point_replicated

  • set_line_width

  • set_clipping_path_with_nonzero

  • set_clipping_path_with_even_odd

  • append_curved_segment_final_point_replicated

Misc Callbacks

  • begin_compatibility_section

  • end_compatibility_section

  • begin_document

  • end_document

  • begin_page_container

  • end_page_container

  • begin_page

  • end_page

  • metadata

  • xml_metadata

  • page_count

  • begin_form_xobject

  • end_form_xobject

Resource Callbacks

Each page can contain (or inherit) a range of resources required for the page, including things like fonts and images. The following callbacks may appear after begin_page if the relevant resources exist on a page:

  • resource_procset

  • resource_xobject

  • resource_extgstate

  • resource_colorspace

  • resource_pattern

  • resource_font

In most cases, these callbacks associate a name with each resource, allowing it to be referred to by name in the page content. For example, an XObject can hold an image. If it gets mapped to the name “IM1”, then it can be placed on the page using invoke_xobject “IM1”.
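A receiver can use that name mapping to connect a resource to the place it is used. The sketch below is hypothetical throughout: the class name is invented, it assumes the first parameter of resource_xobject is the resource name, and it assumes invoke_xobject receives that same name. The calls are made by hand with made-up names.

```ruby
# Hypothetical receiver that notes which declared XObjects are placed on a page.
# Assumption: the first parameter of each callback is the resource name.
class XObjectTracker
  attr_reader :known, :placed

  def initialize
    @known = []
    @placed = []
  end

  # Start each page with an empty set of known names.
  def begin_page(*params)
    @known.clear
  end

  def resource_xobject(name, *params)
    @known << name
  end

  def invoke_xobject(name, *params)
    @placed << name if @known.include?(name)
  end
end

tracker = XObjectTracker.new
tracker.begin_page
tracker.resource_xobject("IM1")
tracker.invoke_xobject("IM1")
tracker.invoke_xobject("IM2")   # never declared on this page, so not recorded
puts tracker.placed.inspect
```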

DEPRECATED: this class was deprecated in version 0.11.0 and will eventually be removed.

Constant Summary

OPERATORS =

:nodoc:

{
  'b'   => :close_fill_stroke,
  'B'   => :fill_stroke,
  'b*'  => :close_fill_stroke_with_even_odd,
  'B*'  => :fill_stroke_with_even_odd,
  'BDC' => :begin_marked_content_with_pl,
  'BI'  => :begin_inline_image,
  'BMC' => :begin_marked_content,
  'BT'  => :begin_text_object,
  'BX'  => :begin_compatibility_section,
  'c'   => :append_curved_segment,
  'cm'  => :concatenate_matrix,
  'CS'  => :set_stroke_color_space,
  'cs'  => :set_nonstroke_color_space,
  'd'   => :set_line_dash,
  'd0'  => :set_glyph_width,
  'd1'  => :set_glyph_width_and_bounding_box,
  'Do'  => :invoke_xobject,
  'DP'  => :define_marked_content_with_pl,
  'EI'  => :end_inline_image,
  'EMC' => :end_marked_content,
  'ET'  => :end_text_object,
  'EX'  => :end_compatibility_section,
  'f'   => :fill_path_with_nonzero,
  'F'   => :fill_path_with_nonzero,
  'f*'  => :fill_path_with_even_odd,
  'G'   => :set_gray_for_stroking,
  'g'   => :set_gray_for_nonstroking,
  'gs'  => :set_graphics_state_parameters,
  'h'   => :close_subpath,
  'i'   => :set_flatness_tolerance,
  'ID'  => :begin_inline_image_data,
  'j'   => :set_line_join_style,
  'J'   => :set_line_cap_style,
  'K'   => :set_cmyk_color_for_stroking,
  'k'   => :set_cmyk_color_for_nonstroking,
  'l'   => :append_line,
  'm'   => :begin_new_subpath,
  'M'   => :set_miter_limit,
  'MP'  => :define_marked_content_point,
  'n'   => :end_path,
  'q'   => :save_graphics_state,
  'Q'   => :restore_graphics_state,
  're'  => :append_rectangle,
  'RG'  => :set_rgb_color_for_stroking,
  'rg'  => :set_rgb_color_for_nonstroking,
  'ri'  => :set_color_rendering_intent,
  's'   => :close_and_stroke_path,
  'S'   => :stroke_path,
  'SC'  => :set_color_for_stroking,
  'sc'  => :set_color_for_nonstroking,
  'SCN' => :set_color_for_stroking_and_special,
  'scn' => :set_color_for_nonstroking_and_special,
  'sh'  => :paint_area_with_shading_pattern,
  'T*'  => :move_to_start_of_next_line,
  'Tc'  => :set_character_spacing,
  'Td'  => :move_text_position,
  'TD'  => :move_text_position_and_set_leading,
  'Tf'  => :set_text_font_and_size,
  'Tj'  => :show_text,
  'TJ'  => :show_text_with_positioning,
  'TL'  => :set_text_leading,
  'Tm'  => :set_text_matrix_and_text_line_matrix,
  'Tr'  => :set_text_rendering_mode,
  'Ts'  => :set_text_rise,
  'Tw'  => :set_word_spacing,
  'Tz'  => :set_horizontal_text_scaling,
  'v'   => :append_curved_segment_initial_point_replicated,
  'w'   => :set_line_width,
  'W'   => :set_clipping_path_with_nonzero,
  'W*'  => :set_clipping_path_with_even_odd,
  'y'   => :append_curved_segment_final_point_replicated,
  '\''  => :move_to_next_line_and_show_text,
  '"'   => :set_spacing_next_line_show_text,
}
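To show how a table like this can turn content stream operators into callback names, here is a small sketch. It copies a four-entry subset of the table so that it runs on its own; the token list is a made-up, pre-split example, and the loop is an illustration rather than the library's parser. It relies on PDF operators being postfix, so operands are collected until an operator is reached.

```ruby
# A small subset of the OPERATORS table, copied so the sketch is self-contained.
OPERATOR_SUBSET = {
  'BT' => :begin_text_object,
  'ET' => :end_text_object,
  'Tf' => :set_text_font_and_size,
  'Tj' => :show_text,
}.freeze

# Hypothetical, already-split content stream tokens; a real parser does the tokenising.
tokens = ['BT', '/F1', '12', 'Tf', '(Hello)', 'Tj', 'ET']

operands = []
events = []
tokens.each do |token|
  if OPERATOR_SUBSET.key?(token)
    # An operator closes the current run of operands and names the callback.
    events << [OPERATOR_SUBSET[token], operands]
    operands = []
  else
    operands << token
  end
end

events.each { |name, params| puts "#{name} #{params.inspect}" }
```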

Class Method Summary

Instance Method Summary

Methods inherited from AbstractStrategy

#initialize

Constructor Details

This class inherits a constructor from PDF::Reader::AbstractStrategy

Class Method Details

.to_sym ⇒ Object



# File 'lib/pdf/reader/pages_strategy.rb', line 259

def self.to_sym
  :pages
end

Instance Method Details

#process ⇒ Object

Begin processing the document



# File 'lib/pdf/reader/pages_strategy.rb', line 264

def process
  return false unless options[:pages]

  callback(:begin_document, [root])
  walk_pages(@ohash.object(root[:Pages]))
  callback(:end_document)
end