Class: AIPP::Downloader
Overview
AIP downloader infrastructure
The downloader operates in the storage directory, where it creates two subdirectories, “archives” and “work”. The initializer looks for archive in “archives” and, if found, unzips its contents into “work”. When reading a document, the downloader looks for it in “work” and downloads it from url only if it is not found there. HTML documents are parsed to Nokogiri::HTML5::Document, PDF documents to AIPP::PDF. Finally, the contents of “work” are written back to archive.
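The "look in “work”, otherwise download" step described above can be sketched with the standard library alone. This is a minimal illustration, not the gem's code: `read_cached` and `demo_cached_read` are hypothetical names, and the block passed to `read_cached` stands in for the real HTTP download so the sketch runs offline.

```ruby
require 'pathname'
require 'tmpdir'

# Hypothetical helper (not part of AIPP): return the file from the work
# directory, calling the fetcher block only when the file is missing.
def read_cached(work_path, name, &fetcher)
  file = work_path.join(name)
  file.write(fetcher.call) unless file.exist?
  file.read
end

# Reads the same document twice and returns how often the fetcher ran.
def demo_cached_read
  calls = 0
  Dir.mktmpdir do |dir|
    work = Pathname(dir).join('work')
    work.mkpath
    2.times do
      read_cached(work, 'ENR-2.1.html') do
        calls += 1
        '<html></html>' # placeholder body instead of a real download
      end
    end
  end
  calls
end
```

The second read is served from the work directory, so the fetcher runs only once per document.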
Instance Attribute Summary
- #archive ⇒ String (readonly)
  Name of the archive (without extension “.zip”).
- #archive_file ⇒ Pathname (readonly)
  Full path to the archive.
- #storage ⇒ Pathname (readonly)
  Directory to operate within.
Instance Method Summary
- #initialize(storage:, archive:) ⇒ Downloader (constructor)
  A new instance of Downloader.
- #read(document:, url:, type: nil) ⇒ Nokogiri::HTML5::Document, AIPP::PDF
  Download and read document.
Constructor Details
#initialize(storage:, archive:) ⇒ Downloader
Returns a new instance of Downloader.
# File 'lib/aipp/downloader.rb', line 37
def initialize(storage:, archive:)
  @storage, @archive = storage, archive
  fail(ArgumentError, 'bad storage directory') unless Dir.exist? storage
  @archive_file = archives_path.join("#{@archive}.zip")
  prepare
  unzip if @archive_file.exist?
  yield self
  zip
ensure
  teardown
end
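The constructor yields the instance to a block and relies on `ensure` for cleanup. The sketch below reproduces only that control flow, under the assumption that `prepare`, `zip` and `teardown` behave as their names suggest. `LifecycleSketch` and `lifecycle_events` are hypothetical names, and the steps are recorded in a log instead of touching the file system.

```ruby
# Hypothetical class (not part of AIPP) mirroring the order of steps in
# the constructor: prepare, yield, zip, and teardown in an ensure clause.
class LifecycleSketch
  def initialize(log)
    log << :prepare
    yield self
    log << :zip
  ensure
    log << :teardown
  end
end

# Returns the recorded steps, optionally raising inside the block.
def lifecycle_events(raise_error: false)
  log = []
  begin
    LifecycleSketch.new(log) { raise 'download failed' if raise_error }
  rescue RuntimeError
    log << :rescued
  end
  log
end
```

In this sketch, an error raised inside the block skips the zip step but still reaches teardown. Because `yield` is called unconditionally, constructing an instance without a block raises `LocalJumpError`.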
Instance Attribute Details
#archive ⇒ String (readonly)
Returns name of the archive (without extension “.zip”).
# File 'lib/aipp/downloader.rb', line 30
def archive
  @archive
end
#archive_file ⇒ Pathname (readonly)
Returns full path to the archive.
# File 'lib/aipp/downloader.rb', line 33
def archive_file
  @archive_file
end
#storage ⇒ Pathname (readonly)
Returns directory to operate within.
# File 'lib/aipp/downloader.rb', line 27
def storage
  @storage
end
Instance Method Details
#read(document:, url:, type: nil) ⇒ Nokogiri::HTML5::Document, AIPP::PDF
Download and read document.
# File 'lib/aipp/downloader.rb', line 56
def read(document:, url:, type: nil)
  type ||= Pathname(URI(url).path).extname[1..-1].to_sym
  file = work_path.join([document, type].join('.'))
  unless file.exist?
    verbose_info "Downloading #{document}"
    IO.copy_stream(Kernel.open(url), file)
  end
  convert file
end
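When `type` is omitted, `#read` derives it from the extension of the URL path and uses it to compose the file name in “work”. The following sketch isolates those two expressions for experimentation. `type_from_url` and `work_file_name` are hypothetical helper names, and the example.com URLs are placeholders.

```ruby
require 'pathname'
require 'uri'

# Same expression as in #read: take the extension of the URL path (the
# query string is not part of the path) and convert it to a symbol.
def type_from_url(url)
  Pathname(URI(url).path).extname[1..-1].to_sym
end

# File name as composed in #read from the document name and the type.
def work_file_name(document, type)
  [document, type].join('.')
end

type_from_url('https://example.com/aip/ENR-2.1.html')   # => :html
type_from_url('https://example.com/aip/AD-2.pdf?v=2')   # => :pdf
work_file_name('ENR-2.1', :html)                        # => "ENR-2.1.html"
```

For a URL whose path has no extension, `extname` returns an empty string, `[1..-1]` returns `nil`, and `to_sym` raises `NoMethodError`. Such URLs therefore appear to need an explicit `type:` argument.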