Module: CombinePDF

Defined in:
lib/combine_pdf.rb,
lib/combine_pdf/combine_pdf_pdf.rb,
lib/combine_pdf/combine_pdf_fonts.rb,
lib/combine_pdf/combine_pdf_filter.rb,
lib/combine_pdf/combine_pdf_parser.rb,
lib/combine_pdf/combine_pdf_decrypt.rb,
lib/combine_pdf/combine_pdf_operations.rb,
lib/combine_pdf/combine_pdf_basic_writer.rb

Overview

This is a pure Ruby library to combine/merge, stamp/overlay and number PDF files.

You can also use this library to write basic text content into new or existing PDF files (for authoring new PDF files, look at the Prawn Ruby library).

Here is the most basic application of the library, a one-liner that combines the PDF files and saves them:

(CombinePDF.new("file1.pdf") << CombinePDF.new("file2.pdf") << CombinePDF.new("file3.pdf")).save("combined.pdf")

Loading PDF data

PDF data can be loaded from the file system or directly from memory.

Loading data from a file is easy:

pdf = CombinePDF.new("file.pdf")

You can also parse PDF data from memory:

pdf_data = IO.read 'file.pdf' # for this demo, load a file to memory
pdf = CombinePDF.parse(pdf_data)

Loading from memory is especially useful for importing PDF data received over the internet or from a different authoring library such as Prawn.

Combine/Merge PDF files or Pages

To combine PDF files (or data):

pdf = CombinePDF.new
pdf << CombinePDF.new("file1.pdf")
pdf << CombinePDF.new("file2.pdf")
pdf.save "combined.pdf"

As demonstrated above, these calls can be chained into a one-liner.

You can also choose to add only specific pages.

In this example, only even pages will be added:

pdf = CombinePDF.new
i = 0
CombinePDF.new("file.pdf").pages.each do |page|
  i += 1
  pdf << page if i.even?
end
pdf.save "even_pages.pdf"

Note that adding the whole file is faster than adding each page separately.
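The page-selection loop above keeps a manual counter. As a minimal sketch of the same filtering logic, the snippet below uses a plain Array of labels as a stand-in for the pages Array (an assumption, so that it runs without any PDF file). The helper name `select_even_pages` is hypothetical and is not part of CombinePDF.

```ruby
# Stand-in for CombinePDF.new("file.pdf").pages (assumption: any Array behaves the same here).
pages = ["page 1", "page 2", "page 3", "page 4", "page 5"]

# Hypothetical helper: keep the items at even 1-based positions (2nd, 4th, ...).
def select_even_pages(pages)
  pages.select.with_index(1) { |_page, position| position.even? }
end

even_pages = select_even_pages(pages)
puts even_pages.inspect
```

Each selected item could then be appended to a new PDF object with `<<`, exactly as in the loop above.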

Add content to existing pages (Stamp / Watermark)

To add content to existing PDF pages, first import the new content from an existing PDF file. After that, add the content to each of the pages in your existing PDF.

In this example, a company logo will be stamped over each page:

stamp_page = CombinePDF.new("company_logo.pdf").pages[0]
pdf = CombinePDF.new "content_file.pdf"
pdf.pages.each {|page| page << stamp_page}
pdf.save "content_with_logo.pdf"

Notice the << operator is on a page and not a PDF object. The << operator acts differently on PDF objects and on Pages.

The << operator defaults to secure injection by renaming references to avoid conflicts.

Less recommended, but available - for overlaying pages using compressed data that might not be editable (due to limited filter support), you can use:

pdf.pages(nil, false).each {|page| page << stamp_page}

Page Numbering

Adding page numbers to a PDF object or file is as simple as can be:

pdf = CombinePDF.new "file_to_number.pdf"
pdf.number_pages
pdf.save "file_with_numbering.pdf"

Numbering can be done with many different options: different formatting, with or without a box object, and even with opacity values.

Writing Content

Page numbering actually adds content using the PDFWriter object (a very basic writer).

In this example, every PDF page will be stamped along the top with a red box containing blue text that reads “Draft, page #”. Here is the easy way (we can even use “number_pages” without page numbers, if we wish):

pdf = CombinePDF.new "file_to_stamp.pdf"
pdf.number_pages number_format: " - Draft, page %d - ", number_location: [:top], font_color: [0,0,1], box_color: [0.4,0,0], opacity: 0.75, font_size:16
pdf.save "draft.pdf"

For demonstration, the same stamp will now be coded the hard way, just so we can play more directly with some of the data.

pdf = CombinePDF.new "file_to_stamp.pdf"
page_number = 1
pdf.pages.each do |page|
  # create a "stamp" PDF page with the same size as the target page
  # we will do this because we will use this to center the box in the page
  mediabox = page[:MediaBox]
  # CombinePDF is pointer based...
  # so you can add the stamp to the page and still continue to edit its content!
  stamp = CombinePDF::PDFWriter.new mediabox
  page << stamp
  # set the visible dimensions to the CropBox, if it exists.
  cropbox = page[:CropBox]
  mediabox = cropbox if cropbox
  # set stamp text
  text = " Draft (page %d) " % page_number
  # write the textbox
  stamp.textbox text, x: mediabox[0]+30, y: mediabox[1]+30, width: mediabox[2]-mediabox[0]-60, height: mediabox[3]-mediabox[1]-60, font_color: [0,0,1], font_size: :fit_text, box_color: [0.4,0,0], opacity: 0.5
  # advance the counter for the next page
  page_number += 1
end
pdf.save "draft.pdf"
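The textbox arithmetic in the loop above can be checked in isolation. The following is a minimal sketch, assuming boxes are PDF rectangles of the form [left_x, bottom_y, right_x, top_y]; the helper name `stamp_geometry` and the sample CropBox are hypothetical and are not part of CombinePDF.

```ruby
# Hypothetical helper: compute the textbox rectangle used in the loop above.
# The CropBox, when present, replaces the MediaBox as the visible area.
def stamp_geometry(mediabox, cropbox = nil, margin = 30)
  visible = cropbox || mediabox
  {
    x: visible[0] + margin,
    y: visible[1] + margin,
    width: visible[2] - visible[0] - (2 * margin),
    height: visible[3] - visible[1] - (2 * margin)
  }
end

letter = [0.0, 0.0, 612.0, 792.0]
cropped = [36.0, 36.0, 576.0, 756.0] # sample CropBox, made up for this sketch

full_page = stamp_geometry(letter)
crop_page = stamp_geometry(letter, cropped)
puts full_page.inspect
puts crop_page.inspect
```

Subtracting the left and bottom coordinates matters when the visible box does not start at the origin, which is why the loop uses mediabox[2]-mediabox[0] rather than mediabox[2] alone.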

Font support for the writer is still in the works and is extremely limited. At the moment it is best to limit the fonts to the 14 standard Latin fonts (no Unicode).

Decryption & Filters

Some PDF files are encrypted and some are compressed (the use of filters)…

There is very little support for encrypted files and only very basic, limited support for compressed files.

I need help with that.

Comments and file structure

If you want to help with the code, please be aware:

I’m a self-taught hobbyist at heart. The documentation is lacking and the comments in the code are poor guidelines.

The code itself should be very straightforward, but feel free to ask whatever you want.

Credit

Caige Nichols wrote an amazing RC4 gem which I used in my code.

I wanted to install the gem, but I had issues with the internet and ended up copying the code itself into the combine_pdf_decrypt class file.

Credit for his wonderful work is given here. Please respect his license and copyright… and mine.

License

GPLv3

Defined Under Namespace

Modules: Fonts, PDFFilter, PDFOperations

Classes: PDF, PDFDecrypt, PDFParser, PDFWriter

Constant Summary

PRIVATE_HASH_KEYS =

lists the Hash keys used for PDF objects

the CombinePDF library doesn’t use special classes for its objects (PDFPage class, PDFStream class or anything like that).

there is only one PDF class which represents the whole of the PDF file.

this Hash lists the private Hash keys that the CombinePDF library uses to differentiate between complex PDF objects.

[:indirect_reference_id, :indirect_generation_number, :raw_stream_content, :is_reference_only, :referenced_object, :indirect_without_dictionary]
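Because PDF objects are plain Hashes, the private keys can be separated from the ordinary PDF dictionary entries with standard Hash methods. The sketch below is only an illustration: it uses a local copy of the constant so that it runs without the gem, and the sample object and its values are made up.

```ruby
# Local copy of the constant listed above (the library defines it as CombinePDF::PRIVATE_HASH_KEYS).
PRIVATE_HASH_KEYS = [:indirect_reference_id, :indirect_generation_number, :raw_stream_content,
                     :is_reference_only, :referenced_object, :indirect_without_dictionary].freeze

# Hypothetical page object: PDF dictionary entries mixed with private bookkeeping keys.
page_object = {
  indirect_reference_id: 12,
  indirect_generation_number: 0,
  Type: :Page,
  MediaBox: [0.0, 0.0, 612.0, 792.0]
}

# Split the Hash into bookkeeping entries and PDF dictionary entries.
bookkeeping, dictionary = page_object.partition { |key, _value| PRIVATE_HASH_KEYS.include?(key) }.map(&:to_h)

puts bookkeeping.inspect
puts dictionary.inspect
```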

Class Method Details

.create_page(mediabox = [0.0, 0.0, 612.0, 792.0]) ⇒ Object

makes a PDFWriter object

PDFWriter objects represent an empty page and have the method “textbox” that adds content to that page.

PDFWriter objects are used internally for numbering pages (by creating a PDF page with the page number and “stamping” it over the existing page).

mediabox

an Array representing the size of the PDF document. defaults to: [0.0, 0.0, 612.0, 792.0]

if the PDFWriter object is used as a stamp, the final size will be that of the original page.
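The mediabox values are expressed in PDF points (1/72 of an inch), so the default [0.0, 0.0, 612.0, 792.0] corresponds to US Letter (8.5 x 11 inches). As a minimal sketch, the hypothetical helpers below (not part of CombinePDF) convert physical page sizes into a mediabox Array that could be passed to create_page.

```ruby
POINTS_PER_INCH = 72.0
MM_PER_INCH = 25.4

# Hypothetical helper: build a mediabox from a size in inches.
def mediabox_from_inches(width_in, height_in)
  [0.0, 0.0, width_in * POINTS_PER_INCH, height_in * POINTS_PER_INCH]
end

# Hypothetical helper: build a mediabox from a size in millimeters,
# rounded to one decimal place.
def mediabox_from_mm(width_mm, height_mm)
  width = (width_mm * POINTS_PER_INCH / MM_PER_INCH).round(1)
  height = (height_mm * POINTS_PER_INCH / MM_PER_INCH).round(1)
  [0.0, 0.0, width, height]
end

letter = mediabox_from_inches(8.5, 11)
a4 = mediabox_from_mm(210, 297)
puts letter.inspect
puts a4.inspect
```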



# File 'lib/combine_pdf.rb', line 183

def create_page(mediabox = [0.0, 0.0, 612.0, 792.0])
  PDFWriter.new mediabox
end

.new(file_name = "") ⇒ Object

Create an empty PDF object or create a PDF object from a file (parsing the file).

file_name

is the name of a file to be parsed.

Raises:

  • (TypeError)


# File 'lib/combine_pdf.rb', line 161

def new(file_name = "")
  raise TypeError, "couldn't parse and data, expecting type String" unless file_name.is_a? String
  return PDF.new() if file_name == ''
  PDF.new( PDFParser.new(  IO.read(file_name).force_encoding(Encoding::ASCII_8BIT) ) )
end

.new_table(options = {}) ⇒ Object

makes a PDF object containing a table

all the pages in this PDF object are PDFWriter objects and are writable using the textbox function (should you wish to add a title, or more info)

the main intended use of this method is to create indexes (a table of contents) for merged data.

example:

pdf = CombinePDF.new_table headers: ["header 1", "another header"], table_data: [ ["this is one row", "with two columns"] , ["this is another row", "also two columns", "the third will be ignored"] ]
pdf.save "table_file.pdf"

accepts a Hash with any of the following keys as well as any of the PDFWriter#textbox options:

headers

an Array of strings with the headers (will be repeated every page).

table_data

an Array of Arrays, each containing a string for each column. the first row sets the number of columns. extra columns will be ignored.

font

a registered or standard font name (see PDFWriter). defaults to nil (:Helvetica).

header_font

a registered or standard font name for the headers (see PDFWriter). defaults to nil (the font for all the table rows).

max_font_size

the maximum font size. if the string doesn’t fit, it will be resized. defaults to 14.

column_widths

an array of relative column widths ([1,2] will display only the first two columns, the second twice as big as the first). defaults to nil (even widths).

header_color

the header color. defaults to [0.8, 0.8, 0.8] (light gray).

main_color

main row color. defaults to nil (transparent / white).

alternate_color

alternate row color. defaults to [0.95, 0.95, 0.95] (very light gray).

font_color

font color. defaults to [0,0,0] (black).

border_color

border color. defaults to [0,0,0] (black).

border_width

border width in PDF units. defaults to 1.

header_align

the header text alignment within each column (:right, :left, :center). defaults to :center.

row_align

the row text alignment within each column. defaults to :left (:right for RTL table).

direction

the table’s writing direction (:ltr or :rtl). this refers to the direction of the columns and doesn’t affect text (rtl text is automatically recognized). defaults to :ltr.

rows_per_page

the number of rows per page, INCLUDING the header row. defaults to 25.

page_size

the size of the page in PDF points. defaults to [0, 0, 595.3, 841.9] (A4).
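To make the column_widths and rows_per_page options more concrete, here is a minimal sketch of the layout arithmetic, written to follow the new_table source listed for this method: the table spans 80% of the page width, and the header row (when present) uses one of the rows on every page. The helper names `table_column_widths` and `table_page_count` are hypothetical and are not part of CombinePDF.

```ruby
# Hypothetical helper: widths of the columns in PDF points.
# Without relative widths the table width is split evenly; with them, the
# number of weights decides how many columns are drawn.
def table_column_widths(page_size, columns, relative_widths = nil, direction = :ltr)
  table_width = page_size[2] * 0.8
  widths =
    if relative_widths
      total = relative_widths.sum.to_f
      relative_widths.map { |weight| table_width * weight / total }
    else
      Array.new(columns) { table_width / columns }
    end
  direction == :rtl ? widths.reverse : widths
end

# Hypothetical helper: number of pages needed for a given number of data rows.
# With headers, each page holds (rows_per_page - 1) data rows.
def table_page_count(row_count, rows_per_page = 25, headers = true)
  data_rows_per_page = headers ? rows_per_page - 1 : rows_per_page
  (row_count.to_f / data_rows_per_page).ceil
end

page_size = [0, 0, 1000, 1000] # round numbers chosen to keep the arithmetic easy to follow
even_widths = table_column_widths(page_size, 4)
weighted_widths = table_column_widths(page_size, 2, [1, 3])
rtl_widths = table_column_widths(page_size, 2, [1, 3], :rtl)
pages_with_headers = table_page_count(50)
pages_without_headers = table_page_count(50, 25, false)
puts weighted_widths.inspect
puts pages_with_headers
```

With the default of 25 rows per page, 50 data rows with headers need a third page, because only 24 data rows fit on each page.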



# File 'lib/combine_pdf.rb', line 216

def new_table (options = {})
  defaults = {
    headers: nil,
    table_data: [[]],
    font: nil,
    header_font: nil,
    max_font_size: 14,
    column_widths: nil,
    header_color: [0.8, 0.8, 0.8],
    main_color: nil,
    alternate_color: [0.95, 0.95, 0.95],
    font_color: [0,0,0],
    border_color: [0,0,0],
    border_width: 1,
    header_align: :center,
    row_align: nil,
    direction: :ltr,
    rows_per_page: 25,
    page_size: [0, 0, 595.3, 841.9] #A4
  }
  options = defaults.merge options
  options[:header_font] = options[:font] unless options[:header_font]
  options[:row_align] ||= ( (options[:direction] == :rtl) ? :right : :left )
  # assert table_data is an array of arrays
  return false unless (options[:table_data].select {|r| !r.is_a?(Array) }).empty?
  # compute sizes
  page_size = options[:page_size]
  top = page_size[3] * 0.9
  height = page_size[3] * 0.8 / options[:rows_per_page]
  from_side = page_size[2] * 0.1
  width = page_size[2] * 0.8
  columns = options[:table_data][0].length
  column_widths = []
  columns.times {|i| column_widths << (width/columns) }
  if options[:column_widths]
    scale = 0
    options[:column_widths].each {|w| scale += w}
    column_widths = []
    options[:column_widths].each { |w|  column_widths << (width*w/scale) }
  end
  column_widths = column_widths.reverse if options[:direction] == :rtl
  # set pdf object and start writing the data
  table = PDF.new()
  page = nil
  rows_per_page = options[:rows_per_page]
  row_number = rows_per_page + 1

  options[:table_data].each do |row_data|
    if row_number > rows_per_page
      page = create_page page_size
      table << page
      row_number = 1
      # add headers
      if options[:headers]
        x = from_side
        headers = options[:headers]
        headers = headers.reverse if options[:direction] == :rtl
        column_widths.each_index do |i|
          text = headers[i].to_s
          page.textbox text, {x: x, y: (top - (height*row_number)), width: column_widths[i], height: height, box_color: options[:header_color], text_align: options[:header_align] }.merge(options).merge({font: options[:header_font]})
          x += column_widths[i]
        end
        row_number += 1
      end
    end
    x = from_side
    row_data = row_data.reverse if options[:direction] == :rtl
    column_widths.each_index do |i|
      text = row_data[i].to_s
      box_color = options[:main_color]
      box_color = options[:alternate_color] if options[:alternate_color] && row_number.odd?
      page.textbox text, {x: x, y: (top - (height*row_number)), width: column_widths[i], height: height, box_color: box_color, text_align: options[:row_align]}.merge(options)
      x += column_widths[i]
    end     
    row_number += 1
  end
  table
end

.parse(data) ⇒ Object

Create a PDF object from raw PDF data (parsing the data).

data

is a string that represents the content of a PDF file.

Raises:

  • (TypeError)


# File 'lib/combine_pdf.rb', line 168

def parse(data)
  raise TypeError, "couldn't parse and data, expecting type String" unless data.is_a? String
  PDF.new( PDFParser.new(data) )
end

.register_font(font_name, font_metrics, font_pdf_object, font_cmap = nil) ⇒ Object

adds a correctly formatted font object to the font library.

registered fonts will remain in the library and will only be embedded in PDF objects when they are used by PDFWriter objects (for example, for numbering pages).

this function enables plug-ins to extend the font functionality of CombinePDF.

font_name

a Symbol with the name of the font. if the font exists in the library, it will be overwritten!

font_metrics

a Hash of font metrics, of the format char => char_width, boundingbox: [left_x, bottom_y, right_x, top_y] where char == character itself (e.g. “ ” for space). The Hash should contain a special key :missing for the metrics of missing characters. an optional :wy might be supported in the future, for top-to-bottom fonts.

font_pdf_object

a Hash in the internal format recognized by CombinePDF, that represents the font object.

font_cmap

a CMap dictionary (Hash) which maps unicode characters to the hex CID for the font (e.g. {“a” => “61”, “z” => “7a” }).
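As an illustration of the font_cmap format only, the sketch below builds such a Hash for a few characters. It assumes that each character's CID equals its code point, which matches the {“a” => “61”, “z” => “7a”} example above but will not hold for every font; the helper name `simple_cmap` is hypothetical and is not part of CombinePDF.

```ruby
# Hypothetical helper: map each character to a two-digit (or longer) lowercase hex string.
# Assumption: the CID is the character's code point; a real font may use other CIDs.
def simple_cmap(characters)
  characters.each_char.each_with_object({}) do |char, cmap|
    cmap[char] = format("%02x", char.ord)
  end
end

cmap = simple_cmap("az ")
puts cmap.inspect
```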



# File 'lib/combine_pdf.rb', line 306

def register_font(font_name, font_metrics, font_pdf_object, font_cmap = nil)
  Fonts.register_font font_name, font_metrics, font_pdf_object, font_cmap
end

.register_font_from_pdf_object(font_name, font_object) ⇒ Object

adds an existing font (from any PDF Object) to the font library.

returns the font on success or false on failure.

VERY LIMITED SUPPORT:

  • at the moment it only imports Type0 fonts.

  • also, extracting the Hash of the actual font object you are looking for is not a trivial matter. I do it on the console.

font_name

a Symbol with the name of the font registry. if the font exists in the library, it will be overwritten!

font_object

a Hash in the internal format recognized by CombinePDF, that represents the font object.



# File 'lib/combine_pdf.rb', line 319

def register_font_from_pdf_object font_name, font_object
  Fonts.register_font_from_pdf_object font_name, font_object
end