Module: CombinePDF
- Defined in:
- lib/combine_pdf.rb,
lib/combine_pdf/combine_pdf_pdf.rb,
lib/combine_pdf/combine_pdf_fonts.rb,
lib/combine_pdf/combine_pdf_filter.rb,
lib/combine_pdf/combine_pdf_parser.rb,
lib/combine_pdf/combine_pdf_decrypt.rb,
lib/combine_pdf/combine_pdf_operations.rb,
lib/combine_pdf/combine_pdf_basic_writer.rb
Overview
This is a pure Ruby library for merging PDF files. Stamping and watermarking PDFs is planned as a fully supported feature; it already works, but with some issues.
PDF objects can be used to combine or to inject data.
Combine/Merge PDF files or Pages
To combine PDF files (or data):
pdf = CombinePDF.new
pdf << CombinePDF.new("file1.pdf") # one way to combine, very fast.
pdf << CombinePDF.new("file2.pdf")
pdf.save "combined.pdf"
Or even as a one-liner:
(CombinePDF.new("file1.pdf") << CombinePDF.new("file2.pdf") << CombinePDF.new("file3.pdf")).save("combined.pdf")
You can also add just the odd or even pages:
pdf = CombinePDF.new
i = 0
CombinePDF.new("file.pdf").pages.each do |page|
  i += 1
  pdf << page if i.even?
end
pdf.save "even_pages.pdf"
Note that adding the pages one by one is slower than adding the whole file.
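The counter in the example above can be folded into a small helper. The following is a minimal sketch, not part of CombinePDF: `pages_at` and `save_pages` are hypothetical names, and the selection logic works on any Array, so it can be tried without the gem installed.

```ruby
# Hypothetical helper (not part of CombinePDF): keep only the items whose
# 1-based position is even (or odd).
def pages_at(pages, parity = :even)
  pages.each_with_index.select do |_page, index|
    position = index + 1
    parity == :even ? position.even? : position.odd?
  end.map(&:first)
end

# Hypothetical wrapper showing how the helper could be used with CombinePDF.
# The gem is only required when this method is actually called.
def save_pages(source, target, parity = :even)
  require 'combine_pdf'
  pdf = CombinePDF.new
  pages_at(CombinePDF.new(source).pages, parity).each { |page| pdf << page }
  pdf.save target
end
```

With the gem installed and a real file on disk, `save_pages("file.pdf", "even_pages.pdf")` should behave like the loop above.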
Add content to existing pages (Stamp / Watermark)
To add content to existing PDF pages, first import the new content from an existing PDF file. After that, add the content to each of the pages in your existing PDF.
In this example, we will add a company logo to each page:
company_logo = CombinePDF.new("company_logo.pdf").pages[0]
pdf = CombinePDF.new "content_file.pdf"
pdf.pages.each {|page| page << company_logo} # notice the << operator is on a page and not a PDF object.
pdf.save "content_with_logo.pdf"
Notice the << operator is on a page and not a PDF object. The << operator acts differently on PDF objects and on Pages.
The << operator defaults to secure injection by renaming references to avoid conflicts. For overlaying pages using compressed data that might not be editable (due to limited filter support), you can use:
pdf.pages(nil, false).each {|page| page << stamp_page}
Notice that page objects are Hash class objects and the << operator was added to the Page instances without altering the class.
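Adding an operator to individual instances without touching their class is a general Ruby technique. The sketch below illustrates it with `Object#extend`; the module name `StampablePage` and the `:overlays` key are assumptions made for illustration and do not describe how CombinePDF actually implements page injection.

```ruby
# Hypothetical module; CombinePDF's real implementation may differ.
module StampablePage
  # Record another page as an overlay and return self so calls can be chained.
  def <<(other)
    (self[:overlays] ||= []) << other
    self
  end
end

page  = { type: :page }                 # a plain Hash standing in for a page
stamp = { type: :page, name: 'logo' }   # another plain Hash standing in for a stamp

page.extend(StampablePage)  # only this one instance gains the << method
page << stamp

puts page[:overlays].length   # prints 1
puts({}.respond_to?(:<<))     # prints false: the Hash class itself is unchanged
```

Because only the extended instance responds to `<<`, other Hash objects in the program are unaffected.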
Page Numbering
Adding page numbers to a PDF object or file is as simple as can be:
pdf = CombinePDF.new "file_to_number.pdf"
pdf.number_pages
pdf.save "file_with_numbering.pdf"
Numbering can be done with many different options: with different formatting, with or without a box object, and even with opacity values.
Loading PDF data
PDF data can be loaded from the file system or directly from memory.
Loading data from a file is easy:
pdf = CombinePDF.new("file.pdf")
You can also parse PDF data from memory:
pdf_data = IO.read 'file.pdf' # for this demo, load a file to memory
pdf = CombinePDF.parse(pdf_data)
Loading from memory is especially effective for importing PDF data received over the internet or from a different authoring library such as Prawn.
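When the data arrives from the network or another library, a cheap sanity check before parsing can give a clearer error. This is a minimal sketch, not part of CombinePDF: `looks_like_pdf?` and `parse_pdf_data` are hypothetical helpers, and the check assumes the `%PDF-` header sits at the very first byte of the data.

```ruby
# Hypothetical helper: true when the data is a String starting with the
# PDF header marker. Assumes the header is at byte 0.
def looks_like_pdf?(data)
  data.is_a?(String) && data.b.start_with?('%PDF-')
end

# Hypothetical wrapper around CombinePDF.parse. The gem is only required
# once the data has passed the check.
def parse_pdf_data(data)
  raise ArgumentError, 'data does not look like a PDF' unless looks_like_pdf?(data)

  require 'combine_pdf'
  CombinePDF.parse(data)
end
```

With the gem installed, `parse_pdf_data(File.binread('file.pdf'))` reads the file as raw bytes and hands it to the parser.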
Decryption & Filters
Some PDF files are encrypted and some are compressed (the use of filters)…
There is very little support for encrypted files and only very basic, limited support for compressed files.
I need help with that.
Comments and file structure
If you want to help with the code, please be aware:
I’m a self-taught hobbyist at heart. The documentation is lacking and the comments in the code are poor guidelines.
The code itself should be very straightforward, but feel free to ask whatever you want.
Credit
Caige Nichols wrote an amazing RC4 gem which I used in my code.
I wanted to install the gem, but I had issues with the internet and ended up copying the code itself into the combine_pdf_decrypt class file.
Credit for his wonderful work is given here. Please respect his license and copyright… and mine.
License
GPLv3
Defined Under Namespace
Modules: Fonts, PDFFilter, PDFOperations
Classes: PDF, PDFDecrypt, PDFParser, PDFWriter
Constant Summary
- PRIVATE_HASH_KEYS =
Lists the Hash keys used for PDF objects.
The CombinePDF library doesn’t use special classes for its objects (PDFPage class, PDFStream class or anything like that).
There is only one PDF class, which represents the whole of the PDF file.
This Array lists the private Hash keys that the CombinePDF library uses to differentiate between complex PDF objects.
[:indirect_reference_id, :indirect_generation_number, :raw_stream_content, :is_reference_only, :referenced_object, :indirect_without_dictionary, :linked_fonts]
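To illustrate the idea of private keys living alongside ordinary PDF dictionary entries in a plain Hash, here is a minimal sketch. The list is copied from the constant above under the local name `PRIVATE_KEYS` so the code runs without the gem; the helper `pdf_entries` and the shape of the sample object are assumptions, not the library's documented behavior.

```ruby
# Copied from the PRIVATE_HASH_KEYS constant documented above, under a
# local (hypothetical) name so this sketch is self-contained.
PRIVATE_KEYS = [
  :indirect_reference_id, :indirect_generation_number, :raw_stream_content,
  :is_reference_only, :referenced_object, :indirect_without_dictionary,
  :linked_fonts
].freeze

# Hypothetical helper: return only the entries that are not bookkeeping keys.
def pdf_entries(object)
  object.reject { |key, _value| PRIVATE_KEYS.include?(key) }
end

# An assumed example of a PDF object stored as a plain Hash.
object = {
  indirect_reference_id: 12,
  indirect_generation_number: 0,
  Type: :Font,
  Subtype: :Type1
}

p pdf_entries(object)
```

The original Hash is left untouched, since `reject` returns a new Hash.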
Class Method Summary
- .create_page(mediabox = [0.0, 0.0, 612.0, 792.0]) ⇒ Object
Makes a PDFWriter object. mediabox is an Array representing the size of the PDF document.
- .new(file_name = "") ⇒ Object
Create an empty PDF object or create a PDF object from a file (parsing the file).
- .parse(data) ⇒ Object
Create a PDF object from raw PDF data (parsing the data).
Class Method Details
.create_page(mediabox = [0.0, 0.0, 612.0, 792.0]) ⇒ Object
Makes a PDFWriter object.
- mediabox: an Array representing the size of the PDF document. Defaults to [0.0, 0.0, 612.0, 792.0].
# File 'lib/combine_pdf.rb', line 136
def create_page(mediabox = [0.0, 0.0, 612.0, 792.0])
  PDFWriter.new mediabox
end
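The default mediabox of 612 by 792 corresponds to a US Letter page (8.5 by 11 inches) when measured in PDF points of 1/72 inch. The sketch below shows how a mediabox Array could be computed for another size; `POINTS_PER_INCH` and `mediabox_for_inches` are hypothetical names, not part of CombinePDF.

```ruby
# PDF user space defaults to 72 units (points) per inch.
POINTS_PER_INCH = 72.0

# Hypothetical helper: build a mediabox Array of the form
# [left, bottom, right, top] for a page size given in inches.
def mediabox_for_inches(width, height)
  [0.0, 0.0, width * POINTS_PER_INCH, height * POINTS_PER_INCH]
end

letter = mediabox_for_inches(8.5, 11)
p letter   # prints [0.0, 0.0, 612.0, 792.0]
```

With the gem installed, the result could be passed along as `CombinePDF.create_page(mediabox_for_inches(8.5, 11))`.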
.new(file_name = "") ⇒ Object
Create an empty PDF object or create a PDF object from a file (parsing the file).
- file_name: the name of a file to be parsed.
# File 'lib/combine_pdf.rb', line 123
def new(file_name = "")
  raise TypeError, "couldn't parse and data, expecting type String" unless file_name.is_a? String
  return PDF.new() if file_name == ''
  PDF.new( PDFParser.new( IO.read(file_name).force_encoding(Encoding::ASCII_8BIT) ) )
end
.parse(data) ⇒ Object
Create a PDF object from raw PDF data (parsing the data).
- data: a String that represents the content of a PDF file.
# File 'lib/combine_pdf.rb', line 130
def parse(data)
  raise TypeError, "couldn't parse and data, expecting type String" unless data.is_a? String
  PDF.new( PDFParser.new(data) )
end