File: README — Documentation for pdf

This gem was last updated on the 04.02.2024 (dd.mm.yyyy notation), at 04:38:42 o'clock.

This project can help with pdf-related activities, such as extracting a .pdf page, converting .pdf pages, merging .pdf files, splitting .pdf files, setting the title of a .pdf file and similar actions.

The project has to remain quite flexible. We may use external programs such as ghostscript or qpdf, or we may use pure ruby solutions, such as the gems combine_pdf, prawn or hexapdf.

The file here (README.gen, respectively the generated file called README.md), will describe some of the components that make up this gem.

There are many different pdf-related toolkits available if you look for them on the www.

For example, we have prawn, we have qpdf, we have calibre, we have hexapdf, we have ghostscript, and many more applications.

Some of these projects have unique features; and some of them have overlapping functionality, such as reading the content of .pdf files in a simplified manner (number of pages, title, author and so forth).

The PdfParadise project attempts to support as many different (open-source) projects as possible. From the point of view of the PdfParadise project, it is also permissible to support closed-source projects, provided that the code remains simple (and simple to change), for instance via ruby's system / `` calls for commandline binaries.

The primary focus for the pdf_paradise gem is on open-source projects, though, so closed-source is a secondary objective, at best.
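The idea of shelling out to whichever commandline binary happens to be installed can be sketched as follows. This is a minimal illustration and not code from the gem: the helper names find_executable and first_available_tool are hypothetical, and the list of tools is merely an example.

```ruby
# Sketch: locate an external binary on $PATH before shelling out to it.
# find_executable and first_available_tool are hypothetical helper names.

def find_executable(name, search_path = ENV.fetch('PATH', ''))
  # On Windows, executables usually carry an extension listed in PATHEXT.
  extensions = ENV.fetch('PATHEXT', '').split(';') + ['']
  search_path.split(File::PATH_SEPARATOR).each do |directory|
    extensions.each do |extension|
      candidate = File.join(directory, "#{name}#{extension}")
      return candidate if File.file?(candidate) && File.executable?(candidate)
    end
  end
  nil
end

# Returns the first tool name that can be found, or nil if none is installed.
def first_available_tool(names, search_path = ENV.fetch('PATH', ''))
  names.find { |name| find_executable(name, search_path) }
end

tool = first_available_tool(%w[qpdf gs pdfinfo])
if tool
  puts "Would shell out to: #{tool}"
  # system(tool, '--version') # uncomment to actually run the binary
else
  puts 'No external pdf tool found; a pure-ruby fallback would be needed.'
end
```

Checking for the binary first keeps the calling code simple: the decision which backend to use is made in one place, and everything else only sees a tool name.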

Why does the PdfParadise project attempt to support many different pdf-related projects?

The answer to this question is rather simple: on Linux I have a lot of flexibility and can use literally any pdf-related project just fine. On Windows, however, I am more restricted in what I can use. Not all programs are available on Windows, or can be easily compiled or installed there. Thus, in order to allow the pdf_paradise gem to work on Windows, we need a certain level of flexibility.

The reason why I added this subsection here in June 2021 was that I am slowly changing the sinatra-related part of the PdfParadise project, in order to embed the functionality into my main controller which is handled by the Roebe namespace. In that controller I wanted to easily offer pdf-related functionality "out of the box" when I start the sinatra-application on windows. Because I want to be able to offer pdf-related modifications on windows as well, the PdfParadise project had to become more flexible, so that a simple toplevel route, such as /pdf, will work properly, and lead to entry points (subroutes) that allow me to tap into the features offered by the PdfParadise project. That way I can then, for instance, easily display the number of pages in a .pdf file on windows as well.

So, the primary summary here is this: the PdfParadise project must remain flexible in order to support a proper workflow on windows systems as well. (We could use WSL on windows, but not every computer has this available, so I am targeting "vanilla" windows really.)

Note that one slight drawback is that the sinatra part of the PdfParadise project now has a dependency on the cyberweb project, so if you want to use that, you also have to install the cyberweb gem. This is a trade-off - for me the more important part is the long-term maintainability of the pdf_paradise project, so a unified code base had to be used in this regard.

Converting a .pdf file to text

Sometimes you may wish to have a text-file describing the content of a .pdf file, rather than the .pdf file itself.

Via class PdfParadise::ConvertPdfToText, residing in the file at pdf_paradise/convert_pdf_to_text.rb, you can convert a .pdf file to a text file.

Usage example from ruby, for the file called foobar.pdf:

PdfParadise::ConvertPdfToText.new(ARGV)
PdfParadise::ConvertPdfToText.new('foobar.pdf')

You can also use the bin/ file from the commandline:

convert_pdf_to_text
convert_pdf_to_text foobar.pdf

There is also a ruby-gtk3 widget that offers the functionality from class PdfParadise::ConvertPdfToText, if the user has gtk3 installed and the ruby-bindings to it as well.

You can start that ruby-gtk3 widget via:

convert_pdf_to_text --gui

Storing the .pdf pages that are currently open

If you need to store the .pdf files that are currently open, you can use the following commandline to do so:

pdfparadise --store-open-pdf-files

This will attempt to store the full path to the .pdf files into a local file. That way you may also be able to batch-open these .pdf files at a later time, e. g. when you switch your window manager or after a reboot.

As of October 2022 I no longer use this as much as before, because the roebe gem has a class called Books (at roebe/classes/books.rb) that handles .pdf files for me. I use that class as if I am reading different "books" - each individual .pdf file is then a "book".

Converting markdown .md files to .pdf files

If you use kramdown, prawn and kramdown-pdf-converter, then you can convert .md files on the commandline, via:

convert_markdown_to_pdf path_to_pdf_file_goes_here.pdf

Install the necessary gems prior to using this commandline functionality.

Querying the title of a .pdf file

class PdfParadise::QueryPdfTitle will report the title of any .pdf file that is passed into it, on the commandline.

This currently depends on exiftool but at a later time, this may change to also allow a query via prawn or other tools.

If you need to determine whether a given .pdf file has a title or whether it does not, you can use PdfParadise.does_this_pdf_file_have_a_title?, such as in:

PdfParadise.does_this_pdf_file_have_a_title? "foobar.pdf" # => true

This method will return true if the .pdf file at hand has a title; and false otherwise.

Determining how many pages a given .pdf file has

class PdfParadise::PdfFileNTotalPages can be used to query how many pages a given .pdf file has.

The executable called bin/n_pages (thus, n_pages) can be used to query this, on the commandline.

Example:

n_pages foobar.pdf

Do note that the class requires the external program called pdfinfo.

It is possible to query the number of pages in a given .pdf file without pdfinfo, but some .pdf files are a bit buggy, and pdfinfo is simply more reliable than the regex that was used until March 2020. So, past March 2020, the program pdfinfo is now used by default. Note that pdfinfo is part of the poppler software suite.

You can also use the following toplevel API for this:

PdfParadise.n_pages? 'THE_PATH_TO_THE_PDF_FILE_GOES_IN_HERE.pdf'
PdfParadise.n_pages? 'foobar.pdf'
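To illustrate the pdfinfo-based approach described above, here is a minimal sketch that reads the "Pages:" line from the output of pdfinfo. It is not the code used by class PdfParadise::PdfFileNTotalPages; the method names parse_page_count and n_pages_via_pdfinfo are hypothetical.

```ruby
require 'open3'

# Extract the page count from the text that pdfinfo prints.
# pdfinfo emits one "Key: value" pair per line, among them "Pages:".
def parse_page_count(pdfinfo_output)
  match = pdfinfo_output.match(/^Pages:\s+(\d+)/)
  match ? match[1].to_i : nil
end

# Hypothetical wrapper: returns nil if pdfinfo is missing or reports an error.
def n_pages_via_pdfinfo(path)
  output, _errors, status = Open3.capture3('pdfinfo', path)
  status.success? ? parse_page_count(output) : nil
rescue Errno::ENOENT
  nil # pdfinfo is not installed
end

sample = "Title:          Example\nPages:          12\nEncrypted:      no\n"
puts parse_page_count(sample) # prints 12
```

Keeping the parsing separate from the call to the external program makes it possible to test the parsing without having poppler installed.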

Adding page numbers to .pdf files

Via the combine_pdf gem it is now possible to add page numbers to .pdf files. This has a few limitations for complex .pdf files, due to combine_pdf having limitations in turn - but for simple .pdf files this should work really well.

How to use that functionality?

Consider using the following toplevel API:

PdfParadise.number_pages('this_file.pdf')

The file called this_file.pdf has to exist in order for this to work, of course.

The current default is to display the page numbers on the bottom right side. This is hardcoded, but you could modify the code to adapt to your needs; see also how combine_pdf does this. (You have to pass an option-hash.)

Various GUI components of the PdfParadise project

The PdfParadise project comes with some ruby-gtk3 specific GUIs, but a few ruby-gtk2 and ruby-tk bindings may exist as well. The ruby-gtk3 components constitute the main GUI elements of this project, though.

You can start, from the commandline, the gtk-wrapper over the split_pdf_file functionality.

In order to do this, do either one of the following:

pdf_paradise --gui
pdf_paradise --gtk

This will require the gtk_paradise project and the gtk bindings, so quite a lot. gem install gtk3 and gem install gtk_paradise should help.

The GUI for class SplitPdfFile is called PdfParadise::Gtk::SplitPdfFile. The idea behind it is to allow you to determine some of the parameters in a graphical fashion.

As of September 2019, there is also a mini-widget for quickly removing the first page of a .pdf file. This is really minimal right now and not very elegant; it may be improved in the future, but for the time being it is what it is. It is more of a proof-of-concept that it can work.

You can start this via:

require 'pdf_paradise/gui/gtk2/remove_first_page_of_pdf_file.rb'

PdfParadise.start_gtk_gui_remove_first_page_of_pdf_file

Note that as of January 2021 the gtk bindings will default to ruby-gtk3. Support for ruby-gtk2 will be retained, but new code may not necessarily be written with ruby-gtk2 in mind. ruby-gtk3 is now the main GUI target for this project.

I am slowly porting the individual widgets.

The following widgets have been ported so far:

PdfParadise::GUI::Gtk::StatisticsWidget # can be found under pdf_paradise/gui/gtk3/statistics_widget/statistics_widget.rb

Specification of the .pdf format

This subsection is a stub - I only needed it to gather information about the .pdf specification. This is NOT complete - it only shall contain some useful information and snippets about the .pdf specification.

PDF stands short for Portable Document Format.

PDF was standardized as ISO 32000 in 2008.

In the pdf-specification we can distinguish these entities:

Objects: these are not objects in the OOP sense, but simply the
basic data type of the PDF standard. There are 9 types of objects:
null, boolean, integer, real, name, string, array, dictionary and
stream.

Dictionary: this is an unordered collection of key-value pairs.
Dictionaries are denoted by << at the beginning and >> at the end.

Indirect Objects: these are objects that are referred to by
reference.

Direct Objects: these are objects that appear inline and are
obtained directly.

Conforming Reader: is an application that parses a PDF
file according to the PDF Standard.

A .pdf file is made up of a specific structure, usually a four-part layout.

These four parts are:

Header
Body
Cross-reference table
Trailer

The .pdf Header tag

The header may begin with an entry such as %PDF-1.7.

The general format for the header is:

%PDF- followed by the version number in the form of 1.N.

This is not valid for all .pdf files, though. Past PDF Version 1.4, the Version entry in the document's catalog dictionary, which is within the Root entry of the Trailer, may be used instead of the Header - if present.

If a .pdf file contains binary data - which most PDF files do nowadays, such as in stream objects - then the Header line shall be immediately followed by a comment line containing at least four binary characters. These are characters with codes of 128 or greater.
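The header rules above can be checked with a few lines of ruby. This is only a sketch under simplifying assumptions: it expects lines to end in a newline character, it ignores the Version entry of the catalog dictionary, and the method names pdf_header_version and binary_marker_line? are hypothetical.

```ruby
# Returns the version string (e.g. "1.7") from the first line of a PDF,
# or nil if the data does not start with a %PDF- header.
def pdf_header_version(data)
  first_line = data.b.lines.first.to_s
  match = first_line.match(/\A%PDF-(\d\.\d)/)
  match ? match[1] : nil
end

# True if the second line is a comment with at least four bytes >= 128.
def binary_marker_line?(data)
  second_line = data.b.lines[1].to_s
  second_line.start_with?('%') &&
    second_line.bytes.count { |byte| byte >= 128 } >= 4
end

sample_header = "%PDF-1.7\n%\xE2\xE3\xCF\xD3\n1 0 obj\n".b
puts pdf_header_version(sample_header)  # prints 1.7
puts binary_marker_line?(sample_header) # prints true
```

For a real file you would read only the first few bytes, for instance via File.binread(path, 64), rather than loading the whole document.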

The .pdf Body tag

The body of a PDF file consists of the aforementioned Indirect Objects, which represent the contents of a document.

Indirect Objects begin with a unique object identifier that allows other objects to refer to them.

That identifier is made up of the following two components:

(1) Object Number:     a positive Integer, can be in any arbitrary order
(2) Generation Number: a non-negative Integer

The Indirect Objects can be referred to from elsewhere by an Indirect Reference. This must consist of:

Object Number
Generation Number, and
keyword R # for instance: 4 0 R

After the identifier comes the keyword obj, which marks the start of the object; the keyword endobj marks its end. Everything in between describes the object, for example as key-value pairs in a dictionary.

A simple example showing the use of Indirect Objects follows:

1 0 obj % Object Number 1, Generation Number 0
<<
/Type /Pages % Describe type of object
/Kids [ 4 0 R ] % Kids entry referring to an indirect object (Object number 4, Generation number 0)
/Count 1
>>
endobj

2 0 obj % Object Number 2, Generation Number 0
<<
/Type /Catalog % Describe type of object
/Pages 1 0 R % Referring another object via unique object identifier
>>
endobj

The Body section of a .pdf file is thus a tree of objects that are linked together, ultimately coming down to the Root Object (Defined by the Root entry in the Trailer section, as a catalog dictionary).
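To make the notion of Indirect References more tangible, the following sketch collects all references of the form "4 0 R" from the text of an object. It is deliberately naive: it does not strip comments and it would also match inside strings or streams, so treat it as an illustration rather than a parser. The names INDIRECT_REFERENCE and indirect_references are hypothetical.

```ruby
# Matches "<object number> <generation number> R".
INDIRECT_REFERENCE = /(\d+)\s+(\d+)\s+R\b/

# Returns an array of [object_number, generation_number] pairs.
def indirect_references(object_body)
  object_body.scan(INDIRECT_REFERENCE).map do |object_number, generation_number|
    [object_number.to_i, generation_number.to_i]
  end
end

pages_object = <<~PDF
  1 0 obj
  <<
  /Type /Pages
  /Kids [ 4 0 R ]
  /Count 1
  >>
  endobj
PDF

p indirect_references(pages_object) # => [[4, 0]]
```

Following such references from the catalog dictionary downwards is how a reader walks the tree of objects described above.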

The Cross-Reference Table is a table that contains a list of byte offsets pointing to the indirect objects.

A pdf-conforming reader uses the Cross-Reference Table as a lookup table to access certain objects quickly when needed.

The format for entries in the Cross-Reference Table can be summarized as follows:

- In the following format nnnnnnnnnn ggggg n eol, a total of 20 bytes
- nnnnnnnnnn is a 10-digit byte offset in the decoded stream
- ggggg 5-digit generation number
- n keyword for in-use entry or f keyword for free entry
- eol 2 character end-of-line sequence (Like CR LF)

The Cross-Reference Table always begins with the special entry 0000000000 65535 - see the following example:

0000000000 65535 f % special entry, f denoting it is a free entry
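The 20-byte entry format described above is regular enough to be parsed with a single regular expression. The following is a minimal sketch; XrefEntry and parse_xref_entry are hypothetical names and not part of the pdf_paradise gem.

```ruby
# One entry of the Cross-Reference Table.
# Note: for free entries (keyword f) the first field is not a byte offset
# of an object; this sketch simply stores the number as it appears.
XrefEntry = Struct.new(:offset, :generation, :in_use)

# Parses an entry such as "0000000017 00000 n \n"; returns nil if the
# text does not follow the nnnnnnnnnn ggggg n/f layout.
def parse_xref_entry(entry)
  match = entry.match(/\A(\d{10}) (\d{5}) ([nf])/)
  return nil unless match

  XrefEntry.new(match[1].to_i, match[2].to_i, match[3] == 'n')
end

special = parse_xref_entry("0000000000 65535 f \n")
puts special.generation # prints 65535
puts special.in_use     # prints false
```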

Graphical User Interfaces (GUIs)

The pdf_paradise gem comes with a few small-ish widgets, primarily written in ruby-gtk. As of August 2021 I am also experimenting with libui, but this is a slow process - stay tuned for more updates in the coming months in this regard.

One big advantage of libui is that it works on windows out-of-the-box, so we can use GUIs on windows as well. \o/

Storing all open .pdf files in a yaml file

In February 2022 the yaml file working_on_these_pdf_files.yml was added at:

pdf_paradise/yaml/working_on_these_pdf_files.yml

The idea here is that this yaml-file retains the local path to any .pdf file that the user (in this case me) is working on, aka reading right now.

I needed this because I tend to work through .pdf files and remove page after page when I read it. The idea is that I do not lose that information when I reboot my computer or when said computer crashes; I needed to make this persistent information.

Why is this yaml file part of the pdf_paradise gem, though? This is mostly due to convenience. I wanted to have this available in one of my ruby gems by default. In the long run I will add code that allows other users to adjust this to their own use case (and perhaps in their home directory rather than store this in the gem itself). As of February 2022 code for the latter is currently not part of the gem, but I may add code for this - either in the pdf_paradise gem or the roebe gem.
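Persisting such a list of paths takes only a few lines with ruby's yaml library. The sketch below shows one possible way to do this; it is not the code of the gem, the method names store_open_pdf_files and load_open_pdf_files are hypothetical, and where the yaml file should live (for instance somewhere in the home directory) is left to the caller.

```ruby
require 'yaml'
require 'fileutils'

# Writes the absolute paths of the given .pdf files into a yaml file.
def store_open_pdf_files(paths, yaml_file)
  FileUtils.mkdir_p(File.dirname(yaml_file))
  absolute_paths = paths.map { |path| File.expand_path(path) }.uniq
  File.write(yaml_file, absolute_paths.to_yaml)
end

# Returns the stored paths, or an empty array if nothing was stored yet.
def load_open_pdf_files(yaml_file)
  return [] unless File.exist?(yaml_file)

  YAML.safe_load(File.read(yaml_file)) || []
end

# Usage example (the location is an assumption, adjust it to your needs):
#
#   yaml_file = File.join(Dir.home, 'working_on_these_pdf_files.yml')
#   store_open_pdf_files(['foobar.pdf'], yaml_file)
#   load_open_pdf_files(yaml_file).each { |path| puts path }
```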

Splitting a single .pdf file into several individual .pdf files

You can use the following toplevel API to split up a single .pdf file into several .pdf files:

PdfParadise.burst(ARGV)
PdfParadise.burst('foobar.pdf')

A commandline variant exists as well, at bin/burst_this_pdf_file, tapping into the code stored in the file pdf_paradise/utility_scripts/split_pdf.rb.

Usage example for the commandline variant:

burst_this_pdf_file foobar.pdf

(Make sure this bin file can be found in $PATH.)

Be careful when using this script: it will dump the generated individual .pdf files into the current working directory, so you may want to create a subdirectory before invoking this executable, and move your target .pdf into that subdirectory. While functionality could be added to automatically create a subdirectory and relocate the generated .pdf files into that subdirectory, for now we'll keep it simple here and just extract the individual .pdf pages into the current working directory.

Note that hexapdf can also be used for this functionality. In February 2023 it became the default; the old variant via imagemagick's convert is retained in the file pdf_paradise/utility_scripts/split_pdf.rb though.
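The advice about creating a subdirectory before splitting a .pdf file can be automated with a small helper. This is a sketch only: prepare_burst_directory is a hypothetical name, and it copies the file rather than moving it, so that the original stays where it is.

```ruby
require 'fileutils'

# Creates a subdirectory named after the .pdf file (without extension),
# copies the file into it and returns the path of the copy.
def prepare_burst_directory(pdf_path, parent_directory = Dir.pwd)
  target_directory = File.join(parent_directory, File.basename(pdf_path, '.pdf'))
  FileUtils.mkdir_p(target_directory)
  FileUtils.cp(pdf_path, target_directory)
  File.join(target_directory, File.basename(pdf_path))
end

# Usage, assuming the pdf_paradise gem is installed and foobar.pdf exists:
#
#   require 'pdf_paradise'
#   copy = prepare_burst_directory('foobar.pdf')
#   Dir.chdir(File.dirname(copy)) { PdfParadise.burst(File.basename(copy)) }
```

That way the individual pages end up in the new subdirectory instead of cluttering the directory you started from.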

Merging pdf files

class PdfParadise::MergePdf.new(ARGV) can be used for merging .pdf files. This functionality depends on external software, so you have to install this first.

Currently ghostscript and hexapdf can be used for the merging step.

Examples for how to use either of these two variants, as far as class PdfParadise::MergePdf is concerned, follow next:

mergepdf one.pdf two.pdf --use-ghostscript
mergepdf one.pdf two.pdf --use-hexapdf
mergepdf *.avif --use-hexapdf
mergepdf SCAN1.avif SCAN2.avif SCAN3.avif SCAN4.avif SCAN5.avif --use-hexapdf
mergepdf SCAN1.avif SCAN2.avif SCAN3.avif SCAN4.avif SCAN5.avif --use-ghostscript
mergepdf output-page1.pdf output-page2.pdf output-page3.pdf output-page4.pdf output-page5.pdf  --use-ghostscript
mergepdf SCAN1_GUTACHTEN.pdf SCAN2_GUTACHTEN.pdf SCAN3_GUTACHTEN.pdf SCAN4_GUTACHTEN.pdf SCAN5_GUTACHTEN.pdf --use-ghostscript
mergepdf SCAN1_GUTACHTEN.pdf SCAN2_GUTACHTEN.pdf SCAN3_GUTACHTEN.pdf SCAN4_GUTACHTEN.pdf SCAN5_GUTACHTEN.pdf --hexapdf

(The two leading hyphens, --, are mandatory for commandline arguments right now; any other argument is assumed to be a locally existing .pdf file.)
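The rule that arguments starting with two hyphens are options, while everything else is treated as a file, can be expressed in ruby like this. It is a sketch of the idea and not the actual argument handling of class PdfParadise::MergePdf; partition_arguments is a hypothetical name.

```ruby
# Splits commandline arguments into "--" options and file names.
def partition_arguments(arguments)
  flags, files = arguments.partition { |argument| argument.start_with?('--') }
  { flags: flags, files: files }
end

result = partition_arguments(%w[one.pdf two.pdf --use-hexapdf])
p result[:flags] # => ["--use-hexapdf"]
p result[:files] # => ["one.pdf", "two.pdf"]
```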

If you need to merge .pdf files from within ruby code, consider using the following code:

require 'pdf_paradise'
merge_pdf = PdfParadise::MergePdf.new('one.pdf two.pdf')
merge_pdf.feedback_where_it_is_stored # Call it manually.


Combining individual pages from .pdf files into a new .pdf file via class PdfParadise::CombineThesePdfPages

class PdfParadise::CombineThesePdfPages can be used to extract individual pdf pages from a given .pdf file and combine these into a new .pdf file.

There is also an executable at bin/combine_these_pdf_pages which can be used on the commandline.

This functionality depends on the hexapdf gem.

Usage example:

combine_these_pdf_pages foobar.pdf 3,4,5

This would retain the pages at 3, 4 and 5 and create a new .pdf file.
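A page list such as 3,4,5 has to be turned into numbers before any pages can be selected. The following sketch shows one way to do this with some basic validation; parse_page_list is a hypothetical name, and how class PdfParadise::CombineThesePdfPages handles its input internally may well differ.

```ruby
# Converts a string such as "3,4,5" into an array of page numbers.
# Raises ArgumentError for entries that are not positive integers.
def parse_page_list(specification)
  pages = specification.split(',').map { |entry| Integer(entry.strip, 10) }
  raise ArgumentError, 'page numbers start at 1' if pages.any? { |page| page < 1 }

  pages
end

p parse_page_list('3,4,5') # => [3, 4, 5]
```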

Extracting all images from a .pdf file

If you make use of poppler then you can extract all images from a given .pdf file.

A small libui-GUI was added for this functionality - this is mostly for quick demo purposes. It does not work extremely well.

On IceWM it looks like this right now:

Not pretty, but it took only about 20 minutes to write this.

pdfimages from poppler must be installed. On Windows you can probably download an executable for poppler here:

https://blog.alivate.com.au/poppler-windows/

I tested whether the above executables work on Windows, and indeed, they still work fine. I also tested the libui variant on Windows, and it works. The code is a bit brittle, so use with care, but I was able to use it successfully in August 2022 to extract all images from a given .pdf file. At a later time I may add a to-image converter via libui, probably in the other gem called image_paradise. Stay tuned in this regard.

To start the libui wrapper from the commandline, you can use the following:

/usr/bin/pdf_paradise --libui
bin/pdf_paradise  --libui
pdf_paradise --libui # This variant should work, or try the other
                     # variants; it is stored in bin/pdf_paradise
                     # of this gem

Numbering the pages in a given .pdf file automatically

If you use the external gem called combine_pdf then you can make use of automatic numbering via the pdf_paradise gem.

The API for this is:

PdfParadise.number_this_pdf_file('foobar.pdf')

It is not a very flexible API as of right now. Perhaps at a later point in time it may be extended.

class PdfParadise::ToPdf

class PdfParadise::ToPdf can be used for two main activities right now:

(1) You can convert .docx to .pdf files on the commandline, if you have libreoffice installed.

(2) If you pass in a directory, then all image files of that directory will be gathered, converted into a .pdf file, and then the .pdf file will be assembled.

The sinatra interface of the pdf_paradise gem

As of April 2019 there is a minimal sinatra interface to the PdfParadise project. Consider this an incomplete work-in-progress.

To start it, try:

pdf_paradise --sinatra

As of July 2023 this makes use of class Cyberweb::HtmlTemplate. This is the generic class I use for generating HTML files (or rather, the String that describes the .html file in question).

Flipping / Rotating a .pdf file

This subsection will try to explain how a .pdf file can be flipped / rotated, and how this may relate to the pdf_paradise gem here.

There are many ways to do so. Let's start with an example via qpdf.

To rotate clockwise, 90°, use:

qpdf --rotate=+90 foo.pdf bar.pdf

This will generate a flipped .pdf file, rotated 90°, and call it bar.pdf.
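If you want to issue the qpdf invocation from ruby, the command can be assembled as an array and then handed to system(). This is a sketch and not the code of class PdfParadise::RotatePdfFile; qpdf_rotate_command is a hypothetical name, and only the --rotate option shown above is used.

```ruby
# Builds the qpdf commandline for rotating all pages of a .pdf file.
# Positive values rotate clockwise, negative values counter-clockwise.
def qpdf_rotate_command(input, output, degrees = 90)
  unless [90, 180, 270].include?(degrees.abs)
    raise ArgumentError, 'the rotation must be 90, 180 or 270 degrees'
  end

  sign = degrees.negative? ? '-' : '+'
  ['qpdf', "--rotate=#{sign}#{degrees.abs}", input, output]
end

command = qpdf_rotate_command('foo.pdf', 'bar.pdf', 90)
puts command.join(' ') # prints: qpdf --rotate=+90 foo.pdf bar.pdf
# system(*command) # uncomment to run qpdf; requires qpdf in $PATH
```

Passing the command as an array to system() avoids problems with file names that contain spaces, since no shell is involved.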

The pdf_paradise gem has a class that is tasked with rotating .pdf files.

See:

require 'pdf_paradise/utility_scripts/rotate_pdf_file.rb'
PdfParadise::RotatePdfFile.new(ARGV)

To set the rotation you can invoke the method called .set_rotate().

There is also a bin/ commandline executable for this, called rotate_pdf.

There is also a little GUI wrapper around that functionality available, as part of the pdf_paradise project.

See:

PdfParadise::GUI::LibUI::RotatePdfFile.new

Deleting the last or the first page of a .pdf file

You can use class DeleteLastPageOfThisPdfFile, more accurately called class PdfParadise::DeleteLastPageOfThisPdfFile, to delete the last page in a .pdf file.

In ruby code, you can invoke this like so:

require 'pdf_paradise'

PdfParadise::DeleteLastPageOfThisPdfFile.new('path_to_the_pdf_file/goes_in_here.pdf')

or shorter:

require 'pdf_paradise'

PdfParadise.delete_last_page_of_this_pdf_file('foobar.pdf')

A very similar API exists for deleting the first page of a given .pdf file, too.

In ruby code, you can invoke this like so:

require 'pdf_paradise'

PdfParadise::DeleteTheFirstPageOfThisPdfFile.new('path_to_the_pdf_file/goes_in_here.pdf')

or shorter:

require 'pdf_paradise'

PdfParadise.delete_the_first_page_of_this_pdf_file('foobar.pdf')
PdfParadise.delete_first_page_of_this_pdf_file('foobar.pdf') # Both variants work.

Note that a small libui-wrapper exists for this functionality, under the gui/ subdirectory of this gem. It may look like this:

An older ruby-gtk3 variant also exists:

However, in October 2023 I found this layout confusing, and since I was also on a journey to write as many jruby-swing GUIs as possible, I rewrote the old ruby-gtk3 code, so that it can be used as a basis for the jruby code at a later time.

The rewrite did not change much, but the new layout makes more logical sense, I think - at the least compared to the prior variant:

In October 2023 the old class DeleteFirstPageOfThisPdfFile was rewritten and renamed, into DeleteTheFirstPageOfThisPdfFile. The code was improved, in particular when working on windows - that was one use case I had, that it had to work on the windows platform as well.

Commandline usage

You can use the pdf_paradise gem from the commandline, as other examples on this homepage show.

For instance, if you wish to modify the title of a .pdf file, you can use a commandline invocation such as:

pdf_paradise --use-this-pdf-file=location_to_your_pdf_file.pdf --set_title="The title you want to use goes in here."

You can also shrink a .pdf file, by using the commandline switch --shrink-pdf-size-of=foobar.pdf or just --shrink, such as:

pdf_paradise --shrink-pdf-size-of=foobar.pdf
pdf_paradise --shrink=foobar.pdf

The shrink functionality is contained in the module-method PdfParadise.reduce_size_of_this_pdf_file().

Converting .jpg files to .pdf files

If you have a use case to convert several .jpg files into .pdf files then the following commandline example should be helpful:

convert /path/to/image foobar.pdf
convert *.jpg foobar.pdf

Note that this requires ImageMagick. ImageMagick is not always perfect; it has a few problems, unfortunately.

For instance, in April 2022 when I tried the above, the image was repeated three times on the x-axis. I do not know why, but that makes absolutely no sense. It is just a single image, so why is the resulting .pdf file repeated three times? Perhaps imagemagick's convert tool does this automatically, but then I question the default behaviour - it makes no sense for the use case I have. One image should be one image, not three images or fifty images.

In the event that ImageMagick does not work very well for your use case, consider using another software suite, such as img2pdf.

The syntax for img2pdf goes something like this:

img2pdf *.jpg -o document.pdf
img2pdf SCAN1.jpg SCAN2.jpg SCAN3.jpg SCAN4.jpg SCAN5.jpg -o document.pdf

I liked this, so in April 2022 this was added to ImageParadise. The API for this is as follows:

ImageParadise.img2pdf('*.jpg') # If a '*' is part of the input Dir[] will be used.

As that functionality may be useful on the commandline as well, an executable has been added at bin/imageparadise_img2pdf. Simply pass the image files that you want to convert.

Usage example:

imageparadise_img2pdf *jpg

If you need the images to be ordered or sorted then you may have to do so when specifying the image file at hand specifically, e. g. the path to it.

So for instance:

imageparadise_img2pdf image3.jpg image1.jpg image2.png
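If the file names carry numbers, one way to obtain a sensible order without typing every path by hand is to sort them "naturally" before handing them to img2pdf. This is my own suggestion rather than something the gem does; natural_sort is a hypothetical helper.

```ruby
# Sorts file names so that embedded numbers are compared numerically,
# e.g. image2.jpg comes before image10.jpg.
def natural_sort(filenames)
  filenames.sort_by do |filename|
    filename.scan(/\d+|\D+/).map do |chunk|
      chunk.match?(/\A\d+\z/) ? [0, chunk.to_i, ''] : [1, 0, chunk]
    end
  end
end

sorted = natural_sort(%w[image10.jpg image2.jpg image1.jpg])
p sorted # => ["image1.jpg", "image2.jpg", "image10.jpg"]
# system('img2pdf', *sorted, '-o', 'document.pdf') # requires img2pdf
```

A plain alphabetical sort would put image10.jpg before image2.jpg, which is rarely what you want for scanned pages.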

The only drawback I have found with img2pdf so far is that you can not easily add text to an image. This makes it hard to identify which image is named how. A work around for this is to embed the filename into the image itself, e. g. create temporary images, and then pack them together via img2pdf.

Unfortunately in September 2023 I realised that img2pdf sometimes creates .pdf files that are flawed. So img2pdf may not always be an optimal choice.

Compressing a .pdf file (optimize the size of a .pdf file)

Sometimes you may want to reduce the filesize of a given .pdf file at hand, such as when you need to upload a .pdf file, and there is some file size limit in place, thus making it obligatory to reduce the .pdf file below a certain threshold. This actually happened to me a few times when using webmail-based email services, where an automatic notice was generated and issued to me when the .pdf file was too large, such as above 25MB in size or something similar.

So, let us now assume that you do have a use case such as described above, or any other use case - you want to reduce the file size of a given .pdf file at hand.

How can this be done?

Well, there are several ways of course.

One is to use online-based tools, which tend to work surprisingly well; I verified this in February 2022. One example for this is this website:

https://www.ilovepdf.com/compress_pdf

But, as far as the gem here is concerned, we will focus primarily on means that can be used by you, on your own, without having to depend on external websites.

Two methods will be described here - the first one requiring ghostscript, the second one requiring hexapdf.

The important parameter for ghostscript is the dPDFSETTINGS parameter. It determines the compression level, which ultimately affects the quality of the compressed .pdf file.

Available parameters to dPDFSETTINGS include /screen, /ebook, /printer, /prepress and /default.

The options are as follows:

-dPDFSETTINGS option       Explanation
-dPDFSETTINGS=/screen      Lower quality and smaller size (72 dpi)
-dPDFSETTINGS=/ebook       Better quality, but a slightly larger size (150 dpi)
-dPDFSETTINGS=/prepress    Output is of a higher size and quality (300 dpi)
-dPDFSETTINGS=/printer     Output is of a printer type quality (300 dpi)
-dPDFSETTINGS=/default     Output that is useful for multiple purposes; can produce large .pdf files

In particular /screen is optimal here if you want to reduce the file size. You can achieve, for instance, a compression from a .pdf file the size of 73 MB down to 14 MB - which is quite neat.
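A ghostscript invocation using this parameter can be assembled in ruby as shown below. This is a sketch, not the code of the pdf_paradise gem: ghostscript_compress_command and PDF_SETTINGS are hypothetical names, and the binary name gs is an assumption that holds on Linux; on Windows the executable typically has a different name.

```ruby
# The values for -dPDFSETTINGS that were listed above.
PDF_SETTINGS = %w[screen ebook printer prepress default].freeze

# Builds the ghostscript commandline for compressing a .pdf file.
def ghostscript_compress_command(input, output, setting = 'screen')
  unless PDF_SETTINGS.include?(setting)
    raise ArgumentError, "unknown -dPDFSETTINGS value: #{setting}"
  end

  [
    'gs', '-sDEVICE=pdfwrite', "-dPDFSETTINGS=/#{setting}",
    '-dNOPAUSE', '-dQUIET', '-dBATCH',
    "-sOutputFile=#{output}", input
  ]
end

command = ghostscript_compress_command('foobar.pdf', 'foobar_small.pdf')
puts command.join(' ')
# system(*command) # uncomment to run ghostscript; requires gs in $PATH
```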

class PdfParadise::CompressThisPdfFile can be of help here. Simply pass, as argument to .new(), the path of the local .pdf to that class.

This class resides at:

pdf_paradise/compress/compress_this_pdf_file.rb

Note that class PdfParadise::CompressThisPdfFile currently only uses ghostscript, so we have to use the above commandline options, such as -dPDFSETTINGS.

You can also use a toplevel method if you'd like to:

require 'pdf_paradise'
PdfParadise.compress_this_pdf_file
PdfParadise.compress_this_pdf_file('/foobar.pdf') # ← Pass the path to the .pdf file into this method.

The variant using hexapdf is called:

PdfParadise.compress_via_pdf
PdfParadise.compress_via_pdf('foobar.pdf')

The API names may change at a later point in time; perhaps we will just add a toplevel API called PdfParadise.compress(), but for the time being the above APIs will be retained as they are.

In February 2024 I noticed that qpdf can also be used to compress .pdf files.

Commandline variants in this regard may look like this:

qpdf --compress-streams=y --object-streams=generate --recompress-flate --optimize-images input_file_here.pdf output_file_there.pdf

To use the above in pdf_paradise you can use:

PdfParadise.compress_via_qpdf

Licence

In January 2024, the licence of this project was changed from GPL-2.0 to "MIT No Attribution". You can read up on this MIT licence here:

https://spdx.org/licenses/MIT-0.html

The two most important parts are the "no warranty", as well as "use this software how you want to", so it is a fairly liberal licence, with almost no restrictions.

I will also copy/paste the full licence here, for convenience to the reader:

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Contact information and mandatory 2FA (no longer) coming up in 2022 / 2023

If your creative mind has ideas and specific suggestions to make this gem more useful in general, feel free to drop me an email at any time, via:

shevy@inbox.lt

Before that email I used an email account at Google gmail, but in 2021 I decided to slowly abandon gmail, for various reasons. In order to limit the explanation here, allow me to just briefly state that I do not feel as if I want to promote any Google service anymore when the user becomes the end product (such as via data collection by upstream services, including other proxy-services). My feeling is that this is a hugely flawed business model to begin with, and I no longer wish to support this in any way, even if only indirectly so, such as by using services of companies that try to promote this flawed model.

In regards to responding to emails: please keep in mind that responding may take some time, depending on the amount of work I may have at that moment. So it is not that emails are ignored; it is more that I have not (yet) found the time to read and reply. This means there may be a delay of days, weeks and in some instances also months. There is, unfortunately, not much I can do when I need to prioritise my time investment, but I try to consider all feedback as an opportunity to improve my projects nonetheless.

In 2022 rubygems.org decided to make 2FA mandatory for every gem owner eventually:

see https://blog.rubygems.org/2022/06/13/making-packages-more-secure.html

Mandatory 2FA will eventually be extended to all rubygems.org developers and maintainers. As I can not use 2FA, for reasons I will skip explaining here, this means that my projects will eventually be removed, as I no longer have any control over my projects hosted on rubygems.org (because I can not use 2FA).

At that point, I no longer have any control what is done to my projects since whoever is controlling the gems ecosystem took away our control here. I am not sure at which point ruby became corporate-controlled - that was not the case several years ago, so something has changed.

Ruby also only allows 2FA users to participate on the issue tracker these days:

https://bugs.ruby-lang.org/issues/18800

But this has been reverted some months ago, so it is no longer applicable. Suffice to say that I do not think that we should only be allowed to interact on the world wide web when some 'authority' authenticated us, such as via mandatory 2FA, so I hope this won't come back again.

Fighting spam is a noble goal, but when it also means you lock out real human people then this is definitely NOT a good situation to be had.
