podoff

A Ruby tool to deface PDF documents.

Uses "incremental updates" to do so.
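
For readers unfamiliar with the term: an incremental update leaves the original bytes of the PDF untouched and appends new objs, a small xref section and a trailer pointing back at the previous xref. The following is a minimal sketch of that idea, not podoff's own writer. The method append_update and its parameters are hypothetical, and it assumes generation 0 for every new obj.

```ruby
# Hypothetical sketch of an incremental update, not podoff's own writer.
# original: bytes of the existing PDF (never modified)
# new_objs: { obj number => obj body }, generation 0 assumed
# root, size, prev: values taken from the original trailer and startxref
def append_update(original, new_objs, root:, size:, prev:)
  out = original.b                      # binary copy, offsets are byte offsets
  out << "\n" unless out.end_with?("\n")
  offsets = {}
  new_objs.each do |num, body|
    offsets[num] = out.bytesize         # where this obj starts
    out << "#{num} 0 obj\n#{body}\nendobj\n".b
  end
  xref_at = out.bytesize
  out << "xref\n"
  offsets.each do |num, off|            # one subsection per new obj
    out << "#{num} 1\n" << format("%010d %05d n \n", off, 0)
  end
  new_size = [size, *offsets.keys.map { |n| n + 1 }].max
  out << "trailer\n<< /Size #{new_size} /Root #{root} /Prev #{prev} >>\n"
  out << "startxref\n#{xref_at}\n%%EOF\n"
  out
end
```

A reader that opens the result starts from the last startxref, finds the appended objs first, then follows /Prev to the original xref table for everything else.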

Podoff is used to write over PDF documents. Those documents must first be rewritten so that they expose a plain xref table (see "preparing documents for use with podoff" below).

require 'podoff'

d = Podoff.load('d2.pdf')
  # load my d2.pdf

fo = d.add_base_font('Helvetica')
  # make sure the document knows about "Helvetica"
  # (one of the base 13 or 14 fonts PDF readers know about)


pa = d.page(1)
  # grab first page of the document

pa.insert_font('/MyHelvetica', fo)
  # link "MyHelvetica" to the base font above for this page

st =
  d.add_stream {
    tf '/MyHelvetica', 12 # Helvetica size 12
    bt 100, 100, "#{Time.now} stamped via podoff" # text at bottom left
  }

pa.insert_content(st)
  # add content to page

d.write('d3.pdf')
  # write stamped document to d3.pdf

For more about the podoff "api", read "how I use podoff".

If you're looking for serious libraries, look at

preparing documents for use with podoff

Podoff is naive and can't read xref tables in object streams. You have to work against PDF documents that have vanilla xref tables. Qpdf to the rescue.

Given a doc0.pdf you can produce such a document by doing:

qpdf --object-streams=disable doc0.pdf doc1.pdf

doc1.pdf is now ready for overwriting with podoff.

qpdf has rewritten the PDF, extracting the xref table but keeping the streams compressed.
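
To tell whether a file still needs that qpdf pass, a rough check along these lines may help. It is only a sketch: vanilla_xref? is a hypothetical helper, not part of podoff, and it assumes that the offset announced by the last "startxref" is the one that matters.

```ruby
# Hypothetical helper, not part of podoff.
# Returns true when the last "startxref" offset points at a classic
# "xref" table rather than at a cross-reference stream obj.
def vanilla_xref?(pdf)
  pdf = pdf.b                            # offsets in a PDF are byte offsets
  i = pdf.rindex('startxref')
  return false unless i
  m = pdf[i..-1].match(/\Astartxref\s+(\d+)/)
  return false unless m
  pdf[m[1].to_i, 4] == 'xref'
end
```

Something like vanilla_xref?(File.binread('doc1.pdf')) should then answer true for a file produced by the qpdf command above.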

bin/podoff

bin/podoff is a command-line tool for preparing/checking PDFs before use.

$ ./bin/podoff -h

Usage: ./bin/podoff [option] {fname}

    -o, --objs                       List objs
    -w, --rewrite                    Rewrite
    -s, --stamp                      Apply time stamp at bottom of each page
    -r, --recompress                 Recompress
    --version                        Show version
    -h, --help                       Show this message

--recompress is mostly an alias for qpdf --object-streams=disable in.pdf out.pdf

--stamp is used to check whether podoff can add a time stamp on each page of an input PDF.

how I use podoff

In the application which necessitated the creation of podoff, there are two PDFs to generate from time to time.

I keep those two PDFs in memory.

# lib/myapp/pdf.rb

require 'podoff'

module MyApp::Pdf

  DOC0 = Podoff.load('pdf_templates/d0.pdf')
  DOC1 = Podoff.load('pdf_templates/d1.pdf')

  def generate_doc0(data, path)

    d = DOC0.dup # shallow copy of the document
    d.add_fonts

    pa2 = d.page(2)
    st = d.add_stream # open stream...

    st.font 'MyHelv', 12 # font is an alias to tf
    st.text 100, 100, data['customer_name']
    st.text 100, 80, data['customer_phone']
    st.text 100, 60, data['date'] if data['date']
      # fill in customer info on page 2

    pa2.insert_content(st) # ... close stream (yes, you can use a block too)

    pa3 = d.page(3)
    pa3.insert_content(d.add_stream { check 52, 100 }) if data['discount']
      # a single check on page 3 if the customer gets a discount

    d.write(path)
  end

  # ...
end

module Podoff # adding a few helper methods to the podoff classes

  class Document

    # Makes sure Helvetica and ZapfDingbats are available
    # on each page of the document
    #
    def add_fonts

      fo0 = add_base_font('/Helvetica')
      fo1 = add_base_font('/ZapfDingbats')

      pages.each { |pa|
        pa = re_add(pa)
        pa.insert_font('/MyHelv', fo0)
        pa.insert_font('/MyZapf', fo1)
      }
    end
  end

  class Stream

    # Places a check mark ✓ at x, y
    #
    def check(x, y)

      font = @font            # save current font
      self.tf '/MyZapf', 12   # switch to ZapfDingbats size 12
      self.bt x, y, '3'       # check mark
      @font = font            # get back to saved font
    end
  end
end

The documents are kept in memory. When a generation request comes, they get duplicated and incrementally updated, and the filled documents are written to disk. The duplication doesn't copy the whole document file, only the references to the "obj" in the document get copied.
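
What "only the references get copied" means can be illustrated with plain Ruby objects. FakeDoc below is a hypothetical stand-in and is not how podoff stores things internally. It only shows that a copy can receive new objs while the template stays as it was and the existing objs are shared.

```ruby
# Illustration only: FakeDoc is hypothetical, not a podoff class.
class FakeDoc
  attr_reader :objs

  def initialize(objs)
    @objs = objs
  end

  # dup copies the hash of references, not the objs it points to
  def initialize_copy(source)
    super
    @objs = source.objs.dup
  end
end

template = FakeDoc.new({ '1 0' => 'catalog', '2 0' => 'page' })
copy = template.dup
copy.objs['3 0'] = 'new stream'   # only the copy sees the added obj

template.objs.size                                # 2
copy.objs.size                                    # 3
copy.objs['1 0'].equal?(template.objs['1 0'])     # true, same obj shared
```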

Podoff::Document

class Podoff::Document

  def self.load(path, encoding='iso-8859-1')
    # Podoff.load(path, encoding) is a shortcut to this method

  def dup
    # Makes a shallow copy of the document

  def add_base_font(name)
    # Given a name in the base 13/14 fonts readers are supposed to know,
    # ensures the document has access to the font.
    # Usually "Helvetica" or "ZapfDingbats".

  def pages
    # Returns an array of all the objs that are pages

  def page(index)
    # Starts at 1, returns a page obj. Understands negative indexes, like
    # -1 for the last page.

  def add_stream(src=nil, &block)
    # Prepares a new obj with a stream
    # If src is given places the src string in the stream.
    # If a block is given executes the block in the context of the
    # Podoff::Stream instance.
    # If no src and no block, simply returns the Podoff::Stream wrapped inside
    # of the new obj (see example code above)

  def re_add(obj_or_ref)
    # Given an obj or a ref (like "1234 0") to an obj, copies that obj
    # and re-adds it to the document.
    # This is necessary for the incremental updates podoff uses, if you add
    # an obj to the Contents list of a page, you have to add it to the
    # re-added page, not directly to the original page.

  def write(path=:string)
    # Writes the document, with incremental updates to a file given by its path.
    # If the path is :string, will simply return the string containing the
    # whole document

  def rewrite(path=:string)
    # Like #write, but squashes the incremental updates in the document.
    # Takes more time and memory and might fail (remember, podoff is very
    # naive (as its author is)). Test with care...

  #
  # a bit lower-level...

  def objs
    # returns the hash { String/obj_ref => Podoff::Obj/obj_instance }
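
The 1-based, negative-friendly indexing described for #page can be pictured with a plain array. This is a sketch of the convention only, page_at is a hypothetical helper and not podoff's code, and returning nil for index 0 is an assumption.

```ruby
# Hypothetical stand-in for the lookup, not podoff's own code.
# pages is any Array; index is 1-based, negative counts from the end.
def page_at(pages, index)
  return nil if index == 0   # assumption: there is no page 0
  pages[index > 0 ? index - 1 : index]
end

pages = %w[ cover form terms ]
page_at(pages, 1)    # "cover"
page_at(pages, -1)   # "terms"
page_at(pages, 4)    # nil, out of range
```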

Podoff::Obj

A PDF document is mostly a hierarchy of obj elements. Podoff::Obj points to such elements (see Podoff::Document#objs).

class Podoff::Obj

  def insert_font(font_nick, font_obj_or_ref)
  def insert_contents(obj_or_ref)

Podoff::Stream

TODO

class Podoff::Stream

  def tf(font_name, font_size)
  alias :font :tf

  def bt(x, y, text)
  alias :text :bt
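
Until this section is written, here is a rough idea of what a tf/bt pair can boil down to in a PDF content stream. TinyStream is a hypothetical sketch and not Podoff::Stream itself. It assumes one text object (BT ... ET) per bt call and escapes backslashes and parentheses, as PDF literal strings require.

```ruby
# Hypothetical sketch, not Podoff::Stream itself.
class TinyStream
  def initialize
    @font = nil
    @lines = []
  end

  # Remember the font resource name and size for the text that follows
  def tf(font_name, font_size)
    name = font_name.start_with?('/') ? font_name : "/#{font_name}"
    @font = "#{name} #{font_size} Tf"
    self
  end

  # Emit one text object: BT, font, position, string, ET
  def bt(x, y, text)
    escaped = text.to_s.gsub(/[\\()]/) { |c| "\\#{c}" }
    @lines << ['BT', @font, "#{x} #{y} Td", "(#{escaped}) Tj", 'ET'].compact.join(' ')
    self
  end

  def to_s
    @lines.join("\n")
  end
end

st = TinyStream.new
st.tf('/MyHelvetica', 12)
st.bt(100, 100, 'stamped (via podoff)')
puts st.to_s
# BT /MyHelvetica 12 Tf 100 100 Td (stamped \(via podoff\)) Tj ET
```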

disclaimer

The author of this tool/library has no link whatsoever with the authors of the sample PDF documents found under pdfs/. Those documents have been selected because they are representative of the PDF forms podoff is meant to ~~deface~~fill.

known bugs

podoff parsing is naive: documents that contain uncompressed streams with "endobj", "startxref" or "/Root" will disorient podoff
podoff is completely candid about encoding (it has only been used for British English documents so far)
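
The first bug is easy to picture. The sketch below is not podoff's actual parser, naive_objs is a hypothetical helper that cuts objs at the first "endobj" it meets, which is enough to show how a keyword inside an uncompressed stream throws such a scan off.

```ruby
# Illustration of the pitfall, not podoff's actual parser:
# cut each obj at the first "endobj", without looking inside streams.
def naive_objs(pdf)
  pdf.scan(/^(\d+ \d+) obj\b(.*?)endobj/m)
end

pdf = <<~PDF
  1 0 obj
  << /Length 27 >>
  stream
  BT (see endobj below) Tj ET
  endstream
  endobj
PDF

ref, body = naive_objs(pdf).first
ref                          # "1 0"
body.include?('endstream')   # false, the obj was cut inside the text
```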

LICENSE

MIT, see LICENSE.txt