Down
Down is a utility tool for streaming, flexible and safe downloading of remote files. It can use open-uri + `Net::HTTP`, http.rb or wget as the backend HTTP library.
Installation
```ruby
gem "down", "~> 4.4"
```
Downloading
The primary method is `Down.download`, which downloads the remote file into a `Tempfile`:
```ruby
require "down"

tempfile = Down.download("http://example.com/nature.jpg")
tempfile #=> #<Tempfile:/var/folders/k7/6zx6dx6x7ys3rv3srh0nyfj00000gn/T/20150925-55456-z7vxqz.jpg>
```
Metadata
The returned `Tempfile` has some additional attributes extracted from the response data:
```ruby
tempfile.content_type      #=> "text/plain"
tempfile.original_filename #=> "document.txt"
tempfile.charset           #=> "utf-8"
```
Maximum size
When you're accepting URLs from an outside source, it's a good idea to limit the file size, because an attacker can point you at very large files to tie up your servers. Down allows you to pass a `:max_size` option:
```ruby
Down.download("http://example.com/image.jpg", max_size: 5 * 1024 * 1024) # 5 MB
# Down::TooLarge: file is too large (max is 5MB)
```
What is the advantage over simply checking the size after downloading? Down terminates the download very early, as soon as it gets the `Content-Length` header. And if the `Content-Length` header is missing, Down will terminate the download as soon as the downloaded content surpasses the maximum size.
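The two checks described above can be sketched in plain Ruby. This is only an illustration of the idea, not Down's actual implementation: `read_with_limit` and `TooLargeError` are hypothetical names, and the chunks come from an in-memory enumerator instead of an HTTP response.

```ruby
# Hypothetical error class standing in for Down::TooLarge.
class TooLargeError < StandardError; end

# Collects chunks into a string, aborting as soon as the limit is known to be exceeded.
# content_length: Integer taken from the Content-Length header, or nil when it is absent.
# chunks:         any Enumerable yielding String chunks of the body.
def read_with_limit(content_length, chunks, max_size:)
  # Early check: the header alone is enough to reject the download.
  if content_length && content_length > max_size
    raise TooLargeError, "file is too large (max is #{max_size} bytes)"
  end

  body = String.new
  chunks.each do |chunk|
    body << chunk
    # Fallback check: stop once the received bytes surpass the limit.
    if body.bytesize > max_size
      raise TooLargeError, "file is too large (max is #{max_size} bytes)"
    end
  end
  body
end

read_with_limit(3, ["ab", "c"], max_size: 5) # small body, returned in full
```

In both failure cases the remaining chunks are never requested, which is the saving compared to checking the size after the whole file has been written.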
Destination
By default the remote file will be downloaded into a temporary location and returned as a `Tempfile`. If you would like the file to be downloaded to a specific location on disk, you can specify the `:destination` option:
```ruby
Down.download("http://example.com/image.jpg", destination: "/path/to/destination")
#=> nil
```
In this case `Down.download` won't have any return value, so if you need a `File` object you'll have to create it manually.
Basic authentication
`Down.download` and `Down.open` will automatically detect and apply HTTP basic authentication from the URL:
```ruby
Down.download("http://user:password@example.org")
Down.open("http://user:password@example.org")
```
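For reference, the credentials in such a URL live in its userinfo component. A small sketch using Ruby's standard `uri` library shows what can be read from it. The username, password and host are placeholders, and this is not necessarily how Down extracts them internally.

```ruby
require "uri"

# Placeholder URL with credentials embedded in the userinfo component.
uri = URI.parse("http://janko:secret@example.com/image.jpg")

uri.user     # the part before the colon
uri.password # the part between the colon and the "@"
uri.host     # the host, without the credentials
```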
Progress
`Down.download` supports `:content_length_proc`, which gets called with the value of the `Content-Length` header as soon as it's received, and `:progress_proc`, which gets called with the current file size whenever a new chunk is downloaded.
```ruby
Down.download "http://example.com/movie.mp4",
  content_length_proc: -> (content_length) { ... },
  progress_proc:       -> (progress)       { ... }
```
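As a rough sketch of how the two callbacks might be combined into a percentage, the lambdas below share a `total` variable. The variable names are illustrative, and the calls are simulated by hand so the example runs without network access.

```ruby
total = nil
percentages = []

# Called once with the value of the Content-Length header.
content_length_proc = ->(content_length) { total = content_length }

# Called with the number of bytes downloaded so far.
progress_proc = ->(progress) do
  # Content-Length may be missing, in which case no percentage can be computed.
  percentages << (progress * 100 / total) if total && total > 0
end

# With Down these would be passed as:
#   Down.download(url, content_length_proc: content_length_proc, progress_proc: progress_proc)
# Here the callbacks are invoked manually to simulate a 2 KB download.
content_length_proc.call(2048)
[512, 1024, 2048].each { |bytes| progress_proc.call(bytes) }

percentages #=> [25, 50, 100]
```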
Streaming
Down has the ability to retrieve content of the remote file as it is being downloaded. The `Down.open` method returns a `Down::ChunkedIO` object which represents the remote file on the given URL. When you read from it, Down internally downloads chunks of the remote file, but only as much as is needed.
```ruby
remote_file = Down.open("http://example.com/image.jpg")
remote_file.size # read from the "Content-Length" header

remote_file.read(1024) # downloads and returns first 1 KB
remote_file.read(1024) # downloads and returns next 1 KB

remote_file.eof? #=> false
remote_file.read # downloads and returns the rest of the file content
remote_file.eof? #=> true

remote_file.close # closes the HTTP connection and deletes the internal Tempfile
```
The following IO methods are implemented:
- `#read` & `#readpartial`
- `#gets`
- `#seek`
- `#pos` & `#tell`
- `#eof?`
- `#rewind`
- `#close`
Caching
By default the downloaded content is internally cached into a `Tempfile`, so that when you rewind the `Down::ChunkedIO`, it continues reading the cached content that it had already retrieved.
```ruby
remote_file = Down.open("http://example.com/image.jpg")
remote_file.read(1*1024*1024) # downloads, caches, and returns first 1MB
remote_file.rewind
remote_file.read(1*1024*1024) # reads the cached content
remote_file.read(1*1024*1024) # downloads the next 1MB
```
If you want to save on IO calls and on disk usage, and don't need to be able to rewind the `Down::ChunkedIO`, you can disable caching downloaded content:
```ruby
Down.open("http://example.com/image.jpg", rewindable: false)
```
Yielding chunks
You can also yield chunks directly as they're downloaded via `#each_chunk`, in which case the downloaded content is not cached into a file regardless of the `:rewindable` option.
```ruby
remote_file = Down.open("http://example.com/image.jpg")
remote_file.each_chunk { |chunk| ... }
remote_file.close
```
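One thing chunk-by-chunk iteration is handy for is computing a checksum without holding the whole file in memory. The sketch below assumes only that the object responds to `#each_chunk`; `sha256_of` and `FakeRemoteFile` are hypothetical names, and the stand-in object exists so the example runs offline.

```ruby
require "digest"

# Computes a SHA-256 hex digest from any object that responds to #each_chunk.
def sha256_of(remote_file)
  digest = Digest::SHA256.new
  remote_file.each_chunk { |chunk| digest.update(chunk) }
  digest.hexdigest
end

# Stand-in with the same #each_chunk shape as the object returned by Down.open.
FakeRemoteFile = Struct.new(:chunks) do
  def each_chunk(&block)
    chunks.each(&block)
  end
end

checksum = sha256_of(FakeRemoteFile.new(["hello ", "world"]))
```

With a real download, the object returned by `Down.open` would be passed to `sha256_of` in place of the stand-in, followed by `#close`.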
Data
You can access the response status and headers of the HTTP request that was made:
```ruby
remote_file = Down.open("http://example.com/image.jpg")
remote_file.data[:status]   #=> 200
remote_file.data[:headers]  #=> { ... }
remote_file.data[:response] # returns the response object
```
Note that a `Down::ResponseError` exception will automatically be raised if the response status was 4xx or 5xx.
Down::ChunkedIO
`Down.open` performs the HTTP logic and returns an instance of `Down::ChunkedIO`. However, `Down::ChunkedIO` is a generic class that can wrap any kind of streaming. It accepts an `Enumerator` that yields chunks of content, and provides an IO-like interface over that enumerator, calling it whenever more content is needed.
```ruby
require "down/chunked_io"

Down::ChunkedIO.new(...)
```
- `:chunks` – `Enumerator` that yields chunks of content
- `:size` – size of the file if it's known (returned by `#size`)
- `:on_close` – called when streaming finishes or IO is closed
- `:data` – custom data that you want to store (returned by `#data`)
- `:rewindable` – whether to cache retrieved data into a file (defaults to `true`)
- `:encoding` – force content to be returned in specified encoding (defaults to `Encoding::BINARY`)
Here is an example of creating a streaming IO of a MongoDB GridFS file:
```ruby
require "down/chunked_io"

mongo = Mongo::Client.new(...)
bucket = mongo.database.fs

content_length = bucket.find(_id: id).first[:length]
stream = bucket.open_download_stream(id)

io = Down::ChunkedIO.new(
  size: content_length,
  chunks: stream.enum_for(:each),
  on_close: -> { stream.close },
)
```
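To make the "IO-like interface over an enumerator" idea more concrete, here is a deliberately simplified sketch. `TinyChunkedReader` is a hypothetical class written for illustration. It is not the actual `Down::ChunkedIO` implementation and it leaves out caching, rewinding, encodings and most IO methods.

```ruby
# Hypothetical, simplified reader that pulls chunks from an Enumerator on demand.
class TinyChunkedReader
  def initialize(chunks:, on_close: nil)
    @chunks   = chunks      # Enumerator yielding String chunks
    @buffer   = String.new  # bytes retrieved but not yet returned
    @on_close = on_close
    @finished = false
  end

  # Returns up to `length` bytes (or everything that is left when length is nil).
  def read(length = nil)
    fill_buffer until @finished || (length && @buffer.bytesize >= length)
    length ? @buffer.slice!(0, length) : @buffer.slice!(0, @buffer.bytesize)
  end

  def eof?
    fill_buffer if @buffer.empty? && !@finished
    @buffer.empty? && @finished
  end

  def close
    @on_close.call if @on_close
    @on_close = nil
  end

  private

  # Asks the enumerator for one more chunk, or marks the stream as finished.
  def fill_buffer
    @buffer << @chunks.next.b
  rescue StopIteration
    @finished = true
  end
end

pulled = 0
chunks = Enumerator.new { |y| %w[abc def gh].each { |c| pulled += 1; y << c } }

io = TinyChunkedReader.new(chunks: chunks)
first = io.read(4) # only the first two chunks are pulled to satisfy this read
rest  = io.read    # pulls the remaining chunk
io.close
```

The point of the design is that the enumerator is only advanced when the buffer cannot satisfy a read, which is what allows a remote file to be consumed lazily.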
Exceptions
Down tries to recognize various types of exceptions and re-raise them as one of the `Down::Error` subclasses. This is Down's exception hierarchy:
- `Down::Error`
  - `Down::TooLarge`
  - `Down::InvalidUrl`
  - `Down::TooManyRedirects`
  - `Down::ResponseError`
    - `Down::ClientError`
      - `Down::NotFound`
    - `Down::ServerError`
  - `Down::ConnectionError`
    - `Down::TimeoutError`
  - `Down::SSLError`
Backends
The following backends are available:
- Down::NetHttp (default)
- Down::Http
- Down::Wget
You can use the backend directly:
```ruby
require "down/net_http"

Down::NetHttp.download("...")
Down::NetHttp.open("...")
```
Or you can set the backend globally (default is `:net_http`):
```ruby
require "down"

Down.backend :http # use the Down::Http backend

Down.download("...")
Down.open("...")
```
Down::NetHttp
The `Down::NetHttp` backend implements downloads using open-uri and `Net::HTTP`.
```ruby
gem "down", "~> 4.4"
```
```ruby
require "down/net_http"

tempfile = Down::NetHttp.download("http://nature.com/forest.jpg")
tempfile #=> #<Tempfile:/var/folders/k7/6zx6dx6x7ys3rv3srh0nyfj00000gn/T/20150925-55456-z7vxqz.jpg>

io = Down::NetHttp.open("http://nature.com/forest.jpg")
io #=> #<Down::ChunkedIO ...>
```
`Down::NetHttp.download` is implemented as a wrapper around open-uri, and fixes some of open-uri's undesired behaviours:
- uses `URI::HTTP#open` or `URI::HTTPS#open` directly for security
- always returns a `Tempfile` object, whereas open-uri returns `StringIO` when the file is smaller than 10KB
- gives the extension to the `Tempfile` object from the URL
- allows you to limit the maximum number of redirects

On the other hand `Down::NetHttp.open` is implemented using `Net::HTTP` directly, as open-uri doesn't support downloading on-demand.
Redirects
`Down::NetHttp#download` turns off open-uri's own redirect following, as open-uri doesn't have a way to limit the maximum number of hops, and implements its own. By default a maximum of 2 redirects will be followed, but you can change it via the `:max_redirects` option:
```ruby
Down::NetHttp.download("http://example.com/image.jpg")                   # 2 redirects allowed
Down::NetHttp.download("http://example.com/image.jpg", max_redirects: 5) # 5 redirects allowed
Down::NetHttp.download("http://example.com/image.jpg", max_redirects: 0) # 0 redirects allowed

Down::NetHttp.open("http://example.com/image.jpg")                       # 2 redirects allowed
Down::NetHttp.open("http://example.com/image.jpg", max_redirects: 5)     # 5 redirects allowed
Down::NetHttp.open("http://example.com/image.jpg", max_redirects: 0)     # 0 redirects allowed
```
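The idea of a hop limit can be sketched independently of any HTTP library. The following is an illustration only, not Down's code: `resolve_redirects` and `TooManyRedirectsError` are hypothetical names, and the `fetch` callable is injected (here backed by a hash of fake responses) so that the example runs offline.

```ruby
# Hypothetical error standing in for Down::TooManyRedirects.
class TooManyRedirectsError < StandardError; end

REDIRECT_STATUSES = [301, 302, 303, 307, 308].freeze

# fetch: callable returning [status, headers] for a URL.
# Returns the final URL, or raises once more than max_redirects hops are needed.
def resolve_redirects(url, fetch:, max_redirects: 2)
  hops = 0
  loop do
    status, headers = fetch.call(url)
    return url unless REDIRECT_STATUSES.include?(status)

    raise TooManyRedirectsError, "too many redirects" if hops >= max_redirects
    url = headers.fetch("Location")
    hops += 1
  end
end

# Fake responses: /a redirects to /b, which redirects to /c.
routes = {
  "http://example.com/a" => [302, { "Location" => "http://example.com/b" }],
  "http://example.com/b" => [301, { "Location" => "http://example.com/c" }],
  "http://example.com/c" => [200, {}],
}
fetch = ->(url) { routes.fetch(url) }

final_url = resolve_redirects("http://example.com/a", fetch: fetch) # two hops, within the default limit
```

With `max_redirects: 0`, the first redirect response already raises, which matches the "0 redirects allowed" case above.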
Proxy
An HTTP proxy can be specified via the `:proxy` option:
```ruby
Down::NetHttp.download("http://example.com/image.jpg", proxy: "http://proxy.org")
Down::NetHttp.open("http://example.com/image.jpg", proxy: "http://user:password@proxy.org")
```
Timeouts
Timeouts can be configured via the `:open_timeout` and `:read_timeout` options:
```ruby
Down::NetHttp.download("http://example.com/image.jpg", open_timeout: 5)
Down::NetHttp.open("http://example.com/image.jpg", read_timeout: 10)
```
Headers
Request headers can be added via the `:headers` option:
```ruby
Down::NetHttp.download("http://example.com/image.jpg", headers: { "Header" => "Value" })
Down::NetHttp.open("http://example.com/image.jpg", headers: { "Header" => "Value" })
```
SSL options
The `:ssl_ca_cert` and `:ssl_verify_mode` options are supported, and they have the same semantics as in `open-uri`:
```ruby
Down::NetHttp.open("http://example.com/image.jpg",
  ssl_ca_cert:     "/path/to/cert",
  ssl_verify_mode: OpenSSL::SSL::VERIFY_PEER)
```
Additional options
Any additional options passed to `Down.download` will be forwarded to open-uri, so you can for example add basic authentication or a timeout:
```ruby
Down::NetHttp.download "http://example.com/image.jpg",
  http_basic_authentication: ["john", "secret"],
  read_timeout: 5
```
You can also initialize the backend with default options:
```ruby
net_http = Down::NetHttp.new(open_timeout: 3)

net_http.download("http://example.com/image.jpg")
net_http.open("http://example.com/image.jpg")
```
Down::Http
The `Down::Http` backend implements downloads using the http.rb gem.
```ruby
gem "down", "~> 4.4"
gem "http", "~> 4.0"
```
```ruby
require "down/http"

tempfile = Down::Http.download("http://nature.com/forest.jpg")
tempfile #=> #<Tempfile:/var/folders/k7/6zx6dx6x7ys3rv3srh0nyfj00000gn/T/20150925-55456-z7vxqz.jpg>

io = Down::Http.open("http://nature.com/forest.jpg")
io #=> #<Down::ChunkedIO ...>
```
Some features that give the http.rb backend an advantage over `open-uri` and `Net::HTTP` include:
- Low memory usage (10x less than `open-uri`/`Net::HTTP`)
- Proper SSL support
- Support for persistent connections
- Global timeouts (limiting how long the whole request can take)
- Chainable builder API for setting default options
Additional options
All additional options will be forwarded to `HTTP::Client#request`:
```ruby
Down::Http.download("http://example.org/image.jpg", headers: { "Foo" => "Bar" })
Down::Http.open("http://example.org/image.jpg", follow: { max_hops: 0 })
```
However, it's recommended to configure request options using http.rb's chainable API, as it's more convenient than passing raw options.
```ruby
Down::Http.open("http://example.org/image.jpg") do |client|
  client.timeout(connect: 3, read: 3)
end
```
You can also initialize the backend with default options:
```ruby
http = Down::Http.new(headers: { "Foo" => "Bar" })
# or
http = Down::Http.new { |client| client.timeout(connect: 3) }

http.download("http://example.com/image.jpg")
http.open("http://example.com/image.jpg")
```
Request method
By default `Down::Http` makes a `GET` request to the specified endpoint, but you can specify a different request method using the `:method` option:
```ruby
Down::Http.download("http://example.org/image.jpg", method: :post)
Down::Http.open("http://example.org/image.jpg", method: :post)

down = Down::Http.new(method: :post)
down.download("http://example.org/image.jpg")
```
Down::Wget (experimental)
The `Down::Wget` backend implements downloads using the `wget` command line utility.
```ruby
gem "down", "~> 4.4"
gem "posix-spawn" # omit if on JRuby
gem "http_parser.rb"
```
```ruby
require "down/wget"

tempfile = Down::Wget.download("http://nature.com/forest.jpg")
tempfile #=> #<Tempfile:/var/folders/k7/6zx6dx6x7ys3rv3srh0nyfj00000gn/T/20150925-55456-z7vxqz.jpg>

io = Down::Wget.open("http://nature.com/forest.jpg")
io #=> #<Down::ChunkedIO ...>
```
One major advantage of `wget` is that it automatically resumes downloads that were interrupted due to network failures, which is very useful when you're downloading large files.
However, the Wget backend should still be considered experimental, as it wasn’t easy to implement a CLI wrapper that streams output, so it’s possible that I’ve made mistakes. Let me know how it’s working out for you 😉.
Additional arguments
You can pass additional arguments to the underlying `wget` command via symbols:
```ruby
Down::Wget.download("http://nature.com/forest.jpg", :no_proxy, connect_timeout: 3)
Down::Wget.open("http://nature.com/forest.jpg", user: "janko", password: "secret")
```
You can also initialize the backend with default arguments:
```ruby
wget = Down::Wget.new(:no_proxy, connect_timeout: 3)

wget.download("http://nature.com/forest.jpg")
wget.open("http://nature.com/forest.jpg")
```
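To give a feel for how Ruby-style arguments could correspond to `wget` flags, here is a small sketch. The mapping shown (underscores become dashes, keyword values are attached with `=`) is an assumption made for illustration. `wget_flags` is a hypothetical helper, and the backend may build its command line differently.

```ruby
# Hypothetical helper: converts symbols and keyword options into wget-style long flags.
def wget_flags(*args, **options)
  # Bare symbols become boolean flags, e.g. :no_proxy -> "--no-proxy".
  flags = args.map { |name| "--#{name.to_s.tr('_', '-')}" }

  # Keyword options become flags with values, e.g. connect_timeout: 3 -> "--connect-timeout=3".
  options.each do |name, value|
    flags << "--#{name.to_s.tr('_', '-')}=#{value}"
  end

  flags
end

wget_flags(:no_proxy, connect_timeout: 3) #=> ["--no-proxy", "--connect-timeout=3"]
```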
Supported Ruby versions
- MRI 2.2
- MRI 2.3
- MRI 2.4
- MRI 2.5
- MRI 2.6
- JRuby
Development
You can run tests with:
```sh
bundle exec rake test
```
The test suite pulls and runs kennethreitz/httpbin as a Docker container, so you’ll need to have Docker installed and running.