🖼 GsImgFetcher

gs_img_fetcher is a tool to download images from remote hosts and save them on your local storage.

Installation

Add this line to your application's Gemfile:

gem 'gs_img_fetcher'

And then execute:

$ bundle install

Or install it yourself as:

$ gem install gs_img_fetcher

Features

Concurrency

gs_img_fetcher is designed with concurrency in mind. It can fetch images either asynchronously or synchronously. By default, it runs asynchronously, and the maximum number of threads depends on what your machine allows. For relatively small inputs, consider passing the --no-async option. See the --async and --max_threads options.
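To illustrate the idea of capping the number of worker threads, here is a minimal sketch using only Ruby's standard library. It is not the gem's implementation: the helper name process_concurrently and its max_threads keyword are hypothetical and exist only for this example.

```ruby
# Hypothetical helper (not part of gs_img_fetcher): applies the given block
# to every item using at most max_threads worker threads.
def process_concurrently(items, max_threads: 4)
  queue = Queue.new
  items.each_with_index { |item, index| queue << [item, index] }
  queue.close # once drained, pop returns nil and the workers stop

  results = Array.new(items.size)
  worker_count = [max_threads, items.size].min

  workers = Array.new(worker_count) do
    Thread.new do
      while (pair = queue.pop)
        item, index = pair
        results[index] = yield(item) # each worker writes to its own slot
      end
    end
  end

  workers.each(&:join)
  results
end

# Usage: a stand-in for "fetch this URL" that just measures the string.
urls = %w[http://example.com/image1.jpg http://example.com/image1.png]
lengths = process_concurrently(urls, max_threads: 2) { |url| url.length }
puts lengths.inspect
```

Results are stored by index, so the output order matches the input order even though the work finishes in an arbitrary order.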

File size limit

You can cap the size of each downloaded image to avoid fetching unexpectedly large files and filling up your storage. By default, no limit is applied. Use the --max_size option to set one.
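One common way to enforce such a limit is to copy the response in chunks and stop as soon as the running total goes over the cap. The sketch below shows that technique on generic IO objects. It makes no claim about how gs_img_fetcher does it: copy_with_limit, SizeLimitExceeded and the max_bytes keyword are hypothetical names, and the use of bytes as the unit is an assumption of this example, not a statement about --max_size.

```ruby
require 'stringio'

# Hypothetical error class used only by this sketch.
class SizeLimitExceeded < StandardError; end

# Hypothetical helper: copies source to destination chunk by chunk and
# raises as soon as more than max_bytes have been read.
# A nil max_bytes means "no limit".
def copy_with_limit(source, destination, max_bytes: nil, chunk_size: 16 * 1024)
  total = 0
  while (chunk = source.read(chunk_size)) # read returns nil at end of input
    total += chunk.bytesize
    if max_bytes && total > max_bytes
      raise SizeLimitExceeded, "exceeded #{max_bytes} bytes"
    end
    destination.write(chunk)
  end
  total
end

# Usage with in-memory IO objects standing in for a download and a file.
source = StringIO.new('x' * 10)
target = StringIO.new
copied = copy_with_limit(source, target, max_bytes: 20)
puts "copied #{copied} bytes"
```

Checking the total while streaming means an oversized file is abandoned early instead of being downloaded in full first.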

Usage

CLI

Let's say your current directory contains a text file named urls.txt with a list of image URLs, one per line.

$ cat urls.txt
http://example.com/image1.jpg
http://example.com/image1.png
http://example.com/image1.svg

$ gs_img_fetcher run urls.txt output --max_size=5
I, [2020-05-17T13:09:01.420214 #87392]  INFO -- : Processing 3 URLs (3 valid, 0 invalid)
...
I, [2020-05-17T13:09:02.709097 #87392]  INFO -- : Fetch complete (3 successful, 0 failed)

$ ls output
1e8256aa-5cb7-4545-9109-65aaa550deac.jpg  49d4f436-110f-4206-a2d6-07cc6156fc56.png  a5b4ce07-1fc3-49e3-b558-44f8c4afaaab.svg

Running gs_img_fetcher run urls.txt output takes the URLs from urls.txt, downloads the images and saves them in the output directory.
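The filenames in the sample listing look like random UUIDs that keep the extension of the source URL. Assuming that is the naming scheme, a comparable name could be generated as sketched below. The helper unique_filename is hypothetical and is not taken from the gem.

```ruby
require 'securerandom'
require 'uri'

# Hypothetical helper: builds a unique filename that keeps the extension
# found in the URL's path (an empty string if the path has none).
def unique_filename(url)
  extension = File.extname(URI.parse(url).path)
  "#{SecureRandom.uuid}#{extension}"
end

puts unique_filename('http://example.com/image1.jpg')
```

Random names avoid collisions when different hosts serve files with the same basename, as in the sample urls.txt where all three images are called image1.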

Run gs_img_fetcher --help to show the usage guide.

Set the environment variable NOLOG to a truthy value to suppress logs.

Hooking GsImgFetcher into your own application

Fetching a single image

fetcher = GsImgFetcher::Fetcher.new(
  GsImgFetcher::Entry.new('http://example.com/image.png'),
  'output'
)
fetcher.fetch       # download the image
fetcher.save        # write it to the output directory
fetcher.successful? # check whether the fetch succeeded

Fetching multiple images

entry_set = GsImgFetcher::EntrySet.from_file('urls.txt')
# or build the entry set from an array of URLs
urls = ['http://example.com/image.png', 'http://example.com/image2.png']
entries = urls.map { |url| GsImgFetcher::Entry.new(url) }
entry_set = GsImgFetcher::EntrySet.new(entries)

manager = GsImgFetcher::Manager.new(entry_set, output_dir: 'output', async: false)
manager.setup.fetch

  • Manager is what controls the entire process of handling the input and fetching and saving the images.
  • EntrySet is responsible for finding the input file and parsing, sanitizing and validating the list of URLs.
  • Fetcher is responsible for downloading and saving images.
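
To give a feel for the parsing, sanitizing and validating step attributed to EntrySet above, here is a small sketch that splits raw input lines into valid and invalid URLs. It is an illustration under assumptions, not the gem's code: the helpers valid_http_url? and partition_urls are hypothetical, and treating only http and https URLs with a host as valid is a choice made for this example.

```ruby
require 'uri'

# Hypothetical helper: true for http/https URLs that have a host.
def valid_http_url?(text)
  uri = URI.parse(text)
  uri.is_a?(URI::HTTP) && !uri.host.to_s.empty? # URI::HTTPS inherits from URI::HTTP
rescue URI::InvalidURIError
  false
end

# Hypothetical helper: strips whitespace, drops blank lines and duplicates,
# then returns [valid_urls, invalid_urls].
def partition_urls(lines)
  cleaned = lines.map(&:strip).reject(&:empty?).uniq
  cleaned.partition { |line| valid_http_url?(line) }
end

lines = ["http://example.com/image1.jpg\n", "\n", "not a url\n"]
valid, invalid = partition_urls(lines)
puts "Processing #{valid.size + invalid.size} URLs " \
     "(#{valid.size} valid, #{invalid.size} invalid)"
```

Keeping the invalid entries instead of discarding them makes it possible to report counts like the ones shown in the CLI log.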

Development

After checking out the repo, run bin/setup to install dependencies. Then, run bundle exec rspec to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.

License

The gem is available as open source under the terms of the MIT License.