Class: ImageDownloader

Inherits:
Object
Defined in:
lib/yamd.rb

Class Method Summary

Instance Method Summary

Constructor Details

#initialize(base_dir = Dir.pwd) ⇒ ImageDownloader

Returns a new instance of ImageDownloader.



# File 'lib/yamd.rb', line 140

def initialize(base_dir = Dir.pwd)
  @base_dir = base_dir
end

Class Method Details

.format_page_name(page, chapter, manga) ⇒ Object



# File 'lib/yamd.rb', line 190

def self.format_page_name(page, chapter, manga)
  # TODO: use log10 of chapter.pages.size to determine the zero padding
  page_path = Addressable::URI.parse(page.image_url).path
  format("%04d", page.number) + File.extname(page_path)
end
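The padding TODO above can be sketched as follows. This is a hypothetical helper, not part of the library: it derives the pad width from the chapter's total page count instead of hardcoding four digits.

```ruby
# Hypothetical sketch of the padding TODO: derive the width from the
# chapter's page count. `total_pages` stands in for chapter.pages.size.
def padded_page_name(page_number, total_pages, extension)
  width = [total_pages.to_s.length, 2].max # at least 2 digits
  format("%0#{width}d", page_number) + extension
end

padded_page_name(7, 250, '.jpg') # => "007.jpg"
```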

.sanitize_dir_name(name) ⇒ Object

TODO: check that all text from the site that is used to make directory or file names is sanitized (thanks to “Ranma 1/2”).



# File 'lib/yamd.rb', line 199

def self.sanitize_dir_name(name)
  # TODO: this is a hack, find a serious solution for every possible case
  name.gsub(/\//, '_')
end
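The "serious solution" the TODO asks for might look like the sketch below. `sanitize_name` is a hypothetical standalone helper, not the library's method: it replaces every character that is unsafe on common filesystems, not only `/`.

```ruby
# Hypothetical broader sanitizer: replace all characters that are invalid
# in file names on common filesystems (/, \, :, *, ?, ", <, >, |).
def sanitize_name(name)
  name.gsub(%r{[/\\:*?"<>|]}, '_').strip
end

sanitize_name('Ranma 1/2') # => "Ranma 1_2"
```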

Instance Method Details

#download(manga) ⇒ Object

TODO: Many, many things:

* Add a hash parameter with the parallelization options.
* Which parallelization options should exist? Parallelize chapters? Parallelize pages independently of chapters? Chapters within a sliding window? Pages within a sliding window?
* Add the retryable gem to all the IO actions, INCLUDING THE ABSTRACT CLASSES ABOVE.
* Avoid letting an error in one page or chapter stop the whole download. Log the algorithm's work and all failures to a file inside the manga directory, so the user can review the problems easily.
* Good and bad points of the parallelization options:
  * Chapter - start a thread for each chapter; download the pages of each chapter sequentially.
    * Good: The easiest to implement. For the average manga the granularity is good: between 10~100 threads of 19~45 pages each.
    * Bad: If things go bad, things go BAD. Every chapter may end up with missing pages, in which case the best option is to remove everything and start over. Doesn't work for unending shounens (One Piece is, in truth, ~800 pieces of 19 pages each).
  * Chapters within a window - start N threads and put them in a queue; wait for the first to finish, then add a new element at the end of the queue and wait again.
    * Good: Not very complex to implement. If things go bad we remove only the last N chapters, not everything; with a log we may need to remove even fewer. Works for unending shounens.
    * Bad: Adds a variable to be hardcoded or received. More complex than simply parallelizing every chapter.
  * Pages (or chapters and pages) - start a thread for every chapter, then a thread for every page of each chapter. Simple, isn't it?
    * Good: If no other option eats all of your bandwidth, this one will either freeze your computer or give you the best result. Easy to implement.
    * Bad: Have you ever seen chaos? This is it. If something fails and you didn't check the log, you will discover missing pages in the middle of chapters. It also doesn't have good granularity: there's a lot of thread-creation overhead for little work. And it will almost surely freeze your computer if the manga is big and your bandwidth and processing power are small.
  * Pages within a window - the same as chapters within a window, but with pages instead of chapters.
    * Good: Not very complex to implement. Works well for mangas where the uploader has compressed an entire volume into one chapter and there is only one volume. If things go bad you only need to delete the last N pages; if N is 40, for example, and it's a shounen with no fewer than 19 pages per chapter, delete the last 3 chapters.
    * Bad: Bad granularity. Not so bad if your bandwidth is small, but it will probably cost a lot of CPU for little benefit.
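The "chapters within a window" option above can be sketched with a `SizedQueue` as the window. This is a hedged sketch under assumptions: `download_chapter` is a hypothetical per-chapter worker, and `WINDOW_SIZE` is the N the TODO says must be hardcoded or received.

```ruby
# Sketch of the sliding-window option: at most WINDOW_SIZE chapters
# download concurrently; as soon as one finishes, the next one starts.
WINDOW_SIZE = 4

def download_in_window(chapters)
  slots = SizedQueue.new(WINDOW_SIZE) # push blocks while the window is full
  threads = chapters.map do |chapter|
    slots.push(true) # wait for a free slot before spawning
    Thread.new do
      begin
        download_chapter(chapter) # hypothetical per-chapter worker
      ensure
        slots.pop # release the slot even if the download fails
      end
    end
  end
  threads.each(&:join)
end
```

A `SizedQueue` keeps the bookkeeping minimal: the blocking `push` is the "wait for the first to finish" step, so no explicit queue of threads is needed.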


# File 'lib/yamd.rb', line 162

def download(manga)
  manga_name = self.class.sanitize_dir_name(manga.name)
  manga_dir = Pathname.new(@base_dir).join(manga_name + '/')
  if manga_dir.exist?
    p 'Manga dir exists. Skipping each existing chapter. If the script was forced to stop, the last downloaded chapter may be incomplete. Remove it to download it again.'
  else
    Dir.mkdir(manga_dir.to_s)
  end
  manga.chapters.each do | chapter |
    chapter_name = self.class.sanitize_dir_name(chapter.name)
    chapter_dir = manga_dir.join(chapter_name + '/')
    unless chapter_dir.exist?
      Dir.mkdir(chapter_dir.to_s)
      chapter.pages.each do | page |
        page_name = self.class.format_page_name(page, chapter, manga)
        page_abs_path = chapter_dir.join(page_name).to_s
        File.open(page_abs_path, 'wb') do | f |
          # Kernel#open handles a URL only when open-uri has been required
          open(page.clean_image_url) do | image |
            # TODO: check if copy_stream avoids allocating the whole image in
            # memory before starting to flush it
            IO.copy_stream(image, f)
          end
        end
      end
    end # end "unless chapter_dir.exist?"
  end
end
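The retry TODO could be prototyped without the retryable gem. The helper below is a minimal plain-Ruby sketch, not the library's API; the attempt count, wait time, and the set of rescued exceptions are assumptions.

```ruby
# Minimal retry helper standing in for the retryable gem mentioned in the
# TODOs: retries the block on common transient IO errors, sleeping between
# attempts, and re-raises once the attempts are exhausted.
def with_retries(attempts: 3, wait: 1)
  tries = 0
  begin
    yield
  rescue IOError, Errno::ECONNRESET, Errno::ETIMEDOUT
    tries += 1
    raise if tries >= attempts
    sleep(wait)
    retry
  end
end

# Usage around the per-page copy in #download:
#   with_retries { IO.copy_stream(image, f) }
```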