The Extracter project

The primary goal of the extracter gem

This gem can help the user if said user wants to extract a given archive.

Naturally this requires that ruby is installed on the target computer.

The extracter-gem also works on Windows as of March 2024, provided that the program 7z (7zip) is installed. See https://www.7-zip.org/ for binaries that will work on Windows.

The scope of class Extracter

The scope (and goals) for class Extracter, which is the primary class of the extracter-gem, are rather simple: throw any archive format at this class, and the class will try to extract it. That's it.

The class can be used on the commandline as well, via bin/extracter.

Note that in order to extract a given archive, some programs must be available. On Linux this typically involves programs such as tar, bzip2, xz and gzip. On Windows it is recommended to use 7-zip - I found that 7-zip (7z) works best on Windows, much better than unzip or rar/unrar.

How to find out whether a given file is a valid archive

You can find out whether a given input xyz is a valid archive by making use of the following toplevel-method:

Extracter.is_this_a_valid_archive?(path)

This will return either true or false, with false being the default.

In August 2022 the code was adapted slightly to return true regardless of whether the file extension is upcased or lowercased. In other words: a file called foobar.zip will work fine, as far as this method is concerned, as will the same file when it is called foobar.ZIP. I ran into the latter situation when I downloaded a BIOS firmware update in August 2022: the file had .ZIP as its extension and Extracter did not work as a consequence. So that was a tiny bug, or rather an oversight, in the extracter gem. It really should work on .ZIP files just fine as well, since these are simply .zip files anyway, in particular on Windows.

Keep in mind that not all archive formats have been added so far, so this method may return false for an archive that is valid but not yet supported. I am willing to add support for more archive formats in the future, as time permits.
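To illustrate the case-insensitive check described above, here is a minimal sketch in plain ruby. It does not show the actual implementation of Extracter.is_this_a_valid_archive?(); the method name looks_like_archive? and the list of suffixes are assumptions made up for this example.

```ruby
# Hypothetical list of archive suffixes; the extracter gem keeps its own list.
KNOWN_ARCHIVE_SUFFIXES = %w[.tar.xz .tar.bz2 .tar.gz .tgz .zip .jar .gz .xz].freeze

# Hypothetical helper: returns true if the file name ends with one of the
# suffixes above, ignoring whether the name is upcased or lowercased.
def looks_like_archive?(path)
  name = File.basename(path.to_s).downcase
  KNOWN_ARCHIVE_SUFFIXES.any? { |suffix| name.end_with?(suffix) }
end

puts looks_like_archive?('foobar.zip') # => true
puts looks_like_archive?('foobar.ZIP') # => true
puts looks_like_archive?('notes.txt')  # => false
```

Downcasing the name once, before comparing, is what makes foobar.ZIP and foobar.zip behave the same way.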

How to extract an archive through class Extracter, in pure ruby

You can use the following method to extract an archive, in ruby, into a specific target location on the given computer system:

Extracter.extract_what_to('foo-1.0.tar.xz', '/tmp/')

The first argument is the local path to an archive.

The second argument simply specifies the target, usually the target directory, such as /tmp/ in this case. Can't get any simpler than that!

Colour support

If the Colours gem (gem install colours) is available then colour support is possible for class Extracter::Extracter.

For class Extracter::Extracter, whether colours are supported is determined via the method called .colours?. If true then colours will be used - at the least the class will try to make use of colours.

If you do not want to have colours, for whatever reason or use case, then you can disable them, e.g. through the method call .disable_colours.
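The following is a minimal sketch of how such an optional dependency could be handled. It is not the code of class Extracter::Extracter; the class ColourAwareOutput is a hypothetical stand-in that only mirrors the two method names mentioned above, and it assumes that the Colours gem can be loaded via require 'colours'.

```ruby
# Hypothetical stand-in class, not part of the extracter gem.
class ColourAwareOutput
  def initialize
    # Colours are only enabled if the optional gem could be loaded.
    @use_colours = colours_gem_available?
  end

  # Query whether colours will be used.
  def colours?
    @use_colours
  end

  # Turn colours off, no matter whether the gem is installed.
  def disable_colours
    @use_colours = false
  end

  private

  # Assumption: the gem is loadable under the name 'colours'.
  def colours_gem_available?
    require 'colours'
    true
  rescue LoadError
    false
  end
end

output = ColourAwareOutput.new
output.disable_colours
puts output.colours? # => false
```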

class Extracter::ExtractIt

class Extracter::ExtractIt used to be a standalone .gem (named extract_it), but as of September 2020 it is part of "module Extracter".

Basically this class is a wrapper over the class Extracter, which will extract archives in general. As far as class ExtractIt is concerned, the archive is extracted into the current working directory.

What archives and files can be extracted?

class Extracter can extract .tar.xz, .tar.bz, .tar.gz, .tar.Z, .zip and numerous other file formats. This requires that tools such as tar or gzip or xz are installed and available on the target computer, as we will simply use ruby to shell out to these other programs. Ruby is the ultimate 'glue' language after all.

This class can also extract audio-files if ffmpeg is installed and the multimedia_paradise project is available (gem install multimedia_paradise), as well as squashfs files (typically .img files). In the latter case, such an .img file will be mounted in the current directory. You could then simply copy the content, to have a 'full' archive situation (mounting means read-only by default in this regard, which is why the copy-operation has to be done).

Support for extracting .jar files - used by Java - has been added in May 2020.

Support for 'extracting' .pdf files has been added in January 2021. This depends on poppler, and it is not really an extraction; we simply convert the file to a .txt (text) file. For this functionality the pdf_paradise project has to be available.

Support for extracting archives on Windows via 7zip has been added in August 2021. This was done because I had problems with 'tar' on Windows; using 7zip bypassed these problems, and I needed to be able to extract archives on Windows easily.
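Since this section describes shelling out to external tools, here is a minimal sketch of how a suitable command could be picked for a given archive. The helper extraction_command is hypothetical and not part of the extracter gem; which tool the gem really picks for which format is not shown here. The flags used (tar -xf … -C, unzip -d, 7z x -o) are the standard ones of these programs.

```ruby
# Hypothetical helper, not part of the extracter gem.
# Returns the external command, as an array of arguments, that could be
# used to extract the given archive into the target directory.
def extraction_command(archive, target, windows: Gem.win_platform?)
  if windows
    # 7z expects the output directory right after -o, without a space.
    ['7z', 'x', archive, "-o#{target}"]
  elsif archive.downcase.end_with?('.zip', '.jar')
    ['unzip', archive, '-d', target]
  else
    # GNU tar and bsdtar detect the compression format on their own.
    ['tar', '-xf', archive, '-C', target]
  end
end

p extraction_command('foo-1.0.tar.xz', '/tmp/', windows: false)
p extraction_command('foo-1.0.zip', 'C:/temp', windows: true)
```

Such an array can be handed to system(*command), which avoids having to quote file names that contain spaces.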

Strip components

Say that we have a path such as /home/x/htop/htop-3.0.5.tar.xz.

You want to extract this archive into a directory such as /tmp/.

Normally this would create the following directory, if everything goes according to plan:

/tmp/htop-3.0.5/

In most cases this is what the user wants; at the least I expect this, if an archive is correctly tarred up, such as a .tar.xz file.

But there are use cases where this is not what you want. For instance, say that you already created such a directory, cd-ed into it and now just want to extract right into this directory. Then you don't want to end up with:

/tmp/htop-3.0.5/htop-3.0.5/

tar allows us to avoid this, by using the commandline flag called --strip-components=1.

In October 2021 I needed this for the rbt-project. Since the old class Extracter did not support it, and the code was very messy, I ended up re-writing the project. It should now be better - at the least cleaner than it once was. And support for the above is possible as well now; even with a specific method call (which is .strip_components()).
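To make the effect of --strip-components=1 more concrete, here is a minimal sketch that builds the corresponding tar invocation. The helper tar_command is hypothetical and is not how the gem's .strip_components() method is implemented; only the tar flag itself is taken from the text above.

```ruby
# Hypothetical helper, not part of the extracter gem.
# Builds a tar command that extracts the archive into target and
# optionally drops leading path components of every archive member.
def tar_command(archive, target, strip_components: 0)
  command = ['tar', '-xf', archive, '-C', target]
  command << "--strip-components=#{strip_components}" if strip_components > 0
  command
end

command = tar_command('/home/x/htop/htop-3.0.5.tar.xz', '/tmp/htop-3.0.5',
                      strip_components: 1)
puts command.join(' ')
```

With strip_components: 1 the leading htop-3.0.5/ directory inside the archive is dropped, so the files end up directly in /tmp/htop-3.0.5/ rather than in /tmp/htop-3.0.5/htop-3.0.5/.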

Extracter.remove_archive_type

If you have a use case where you would like to remove the file suffix from an archive, such as turning the String "https://rubygems.org/rubygems/rubygems-3.3.14.tgz" into the String "rubygems-3.3.14" exactly, then you can use the toplevel method called Extracter.remove_archive_type() for this task.

A usage example for this follows:

Extracter.remove_archive_type(File.basename("https://rubygems.org/rubygems/rubygems-3.3.14.tgz")) # => "rubygems-3.3.14"
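If you are curious how such a suffix removal can work in principle, here is a minimal sketch in plain ruby. It is not the implementation of Extracter.remove_archive_type(); the method strip_archive_suffix and the list of suffixes in the regex are assumptions made up for this example.

```ruby
# Hypothetical suffix list; the extracter gem knows more formats than this.
ARCHIVE_SUFFIX_PATTERN =
  /\.(tar\.xz|tar\.bz2|tar\.gz|tar\.Z|tgz|zip|jar|gz|xz|bz2)\z/i

# Hypothetical helper: drops the directory (or URL) part and then the
# archive suffix at the very end of the file name.
def strip_archive_suffix(path)
  File.basename(path).sub(ARCHIVE_SUFFIX_PATTERN, '')
end

puts strip_archive_suffix('https://rubygems.org/rubygems/rubygems-3.3.14.tgz')
# => rubygems-3.3.14
puts strip_archive_suffix('/home/x/htop/htop-3.0.5.tar.xz')
# => htop-3.0.5
```

The \z anchor ensures that only a suffix at the very end is removed, so the dots inside a version number such as 3.3.14 stay untouched.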

Usage example of the primary API:

If you want to extract something, via ruby and Extracter, use:

_ = Extracter.extract_what_to('foobar-1.0.tar.bz2', '/tmp/')

You can also use the .new variant, which goes like so:

_ = Extracter.new('foo.gz', '/tmp', false) # Yes, you can extract .gz files as well.
_.run

The third argument, false, means "do not yet run". You have to invoke the .run method on your own in that case.

The second argument to .new is the target location, that is the directory that the archive will be extracted into.

Contact information and mandatory 2FA (no longer) coming up in 2022 / 2023

If your creative mind has ideas and specific suggestions to make this gem more useful in general, feel free to drop me an email at any time, via:

shevy@inbox.lt

Before that email I used an email account at Google gmail, but in 2021 I decided to slowly abandon gmail, for various reasons. In order to limit the explanation here, allow me to just briefly state that I do not feel as if I want to promote any Google service anymore when the user becomes the end product (such as via data collection by upstream services, including other proxy-services). My feeling is that this is a hugely flawed business model to begin with, and I no longer wish to support this in any way, even if only indirectly so, such as by using services of companies that try to promote this flawed model.

In regards to responding to emails: please keep in mind that responding may take some time, depending on the amount of work I may have at that moment. So it is not that emails are ignored; it is more that I have not (yet) found the time to read and reply. This means there may be a delay of days, weeks and in some instances also months. There is, unfortunately, not much I can do when I need to prioritise my time investment, but I try to consider all feedback as an opportunity to improve my projects nonetheless.

In 2022 rubygems.org decided to make 2FA mandatory for every gem owner eventually:

see https://blog.rubygems.org/2022/06/13/making-packages-more-secure.html

However, that has been reverted again, so I decided to shorten this paragraph. Mandatory 2FA may exclude users who do not have a smartphone device or other means to 'identify'. I do not feel it is a fair assumption by others to be made that non-identified people may not contribute code, which is why I reject it. Mandatory 2FA would mean an end to all my projects on rubygems.org, so let's hope it will never happen. (Keep in mind that I refer to mandatory 2FA; I have no qualms with people who use 2FA on their own, but this carrot-and-stick strategy by those who control the rubygems infrastructure is a very bad one to pursue.)