Class: ImageDownloader
- Inherits: Object
- Defined in: lib/yamd.rb
Class Method Summary
- .format_page_name(page, chapter, manga) ⇒ Object
- .sanitize_dir_name(name) ⇒ Object
  TODO: check that all the text from the site used to build directory or file names is sanitized, thanks to “Ranma 1/2”.
Instance Method Summary
- #download(manga) ⇒ Object
  TODO: many, many things: add a hash parameter with the parallelization options; decide which options should exist (chapters, pages independent of chapter, chapters within a walking window, pages within a walking window); add the retryable gem to all the IO actions, INCLUDING THE ABSTRACT CLASSES ABOVE.
- #initialize(base_dir = Dir.pwd) ⇒ ImageDownloader constructor
  A new instance of ImageDownloader.
Constructor Details
#initialize(base_dir = Dir.pwd) ⇒ ImageDownloader
Returns a new instance of ImageDownloader.
# File 'lib/yamd.rb', line 140

def initialize(base_dir = Dir.pwd)
  @base_dir = base_dir
end
Class Method Details
.format_page_name(page, chapter, manga) ⇒ Object
# File 'lib/yamd.rb', line 190

def self.format_page_name(page, chapter, manga)
  # TODO: use log10 of chapter.pages.size to determine the zero padding
  page_path = Addressable::URI.parse(page.image_url).path
  format("%04d", page.number) + File.extname(page_path)
end
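The padding TODO above could be addressed as in the sketch below. This is not part of the source: `format_page_name_dynamic` is a hypothetical helper, with `page_count` standing in for `chapter.pages.size` and `extname` for the extension extracted from the page URL.

```ruby
# Hypothetical helper sketching the padding TODO: derive the zero-padding
# width from the chapter's page count instead of hardcoding four digits.
def format_page_name_dynamic(number, page_count, extname)
  # Number of digits needed for the largest page number (at least 1).
  width = [Math.log10(page_count).floor + 1, 1].max
  format("%0#{width}d", number) + extname
end

format_page_name_dynamic(7, 120, '.jpg')  # => "007.jpg"
```

A 120-page chapter gets 3-digit names while a 9-page one gets 1-digit names, so directories sort correctly without wasted padding.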
.sanitize_dir_name(name) ⇒ Object
TODO: check that all the text from the site used to build directory or file names is sanitized, thanks to “Ranma 1/2”.
# File 'lib/yamd.rb', line 199

def self.sanitize_dir_name(name)
  # TODO: this is a hack, find a serious solution for every possible case
  name.gsub(/\//, '_')
end
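A broader sanitizer, as the TODO asks for, might look like the sketch below. It is not from the source; `sanitize_dir_name_strict` is a hypothetical replacement that also handles backslashes, Windows-reserved punctuation, control characters, and trailing dots or spaces.

```ruby
# Hypothetical stricter sanitizer: replaces path separators, characters
# reserved on Windows (: * ? " < > |), and control characters, then trims
# trailing dots and spaces (invalid at the end of Windows file names).
def sanitize_dir_name_strict(name)
  name.gsub(%r{[/\\:*?"<>|\x00-\x1f]}, '_').sub(/[. ]+\z/, '')
end

sanitize_dir_name_strict('Ranma 1/2')  # => "Ranma 1_2"
```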
Instance Method Details
#download(manga) ⇒ Object
TODO: Many, many things:
* Add a hash parameter with the parallelization options.
* What parallelization options should exist? Parallelize chapters? Parallelize pages independent of chapter? Chapters within a walking window? Pages within a walking window?
* Add the retryable gem to all the IO actions, INCLUDING THE ABSTRACT CLASSES ABOVE.
* Avoid letting an error in one page or chapter stop the whole download. Log the algorithm's work and all failures in a file inside the manga directory so the user can review the problems easily.
* Good and bad points of the parallelization options:
  * Chapter - start a thread for each chapter; download the pages of each chapter sequentially.
    * Good: the easiest to implement. For the average manga the granularity is good: between 10~100 threads of 19~45 pages each.
    * Bad: if things go bad, they go BAD. Every chapter may end up with missing pages, in which case the best option is to remove everything and start over. Doesn't work for never-ending shounens (One Piece is, in truth, ~800 pieces of 19 pages each).
  * Chapter within window - start N threads and put them in a queue; wait for the first to finish, then add a new element at the end of the queue and wait again.
    * Good: not very complex to implement. If things go bad we remove only the last N chapters, not everything; with a log we may need to remove even fewer. Works for never-ending shounens.
    * Bad: adds a variable to be hardcoded or received. More complex than simply parallelizing every chapter.
  * Pages (or chapters and pages) - start a thread for every chapter, then a thread for every page of the chapter. Simple, isn't it?
    * Good: if no other option eats all of your bandwidth, this one will either freeze your computer or give you the best result. Easy to implement.
    * Bad: have you ever seen chaos? This is it. If something fails and you don't check the log, you will discover missing pages in the middle of chapters. The granularity is also poor: a lot of thread-creation overhead for little work. It will also almost surely freeze your computer if the manga is big and your bandwidth and processing power are small.
  * Pages within window - the same as chapters within a window, but with pages instead of chapters.
    * Good: not very complex to implement. Works well for mangas where the uploader compressed an entire volume into a single chapter and there is only one volume. If things go bad you only need to delete the last N pages; if N is 40, for example, and it's a shounen with no fewer than 19 pages per chapter, delete the last 3 chapters.
    * Bad: poor granularity. Not so bad if your bandwidth is small, but it will probably cost a lot of CPU for little benefit.
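The "chapter within window" option can be sketched with a SizedQueue acting as the window. This is a minimal illustration, not part of yamd: `download_chapters_windowed` and `download_chapter` are hypothetical names, the latter standing in for the per-chapter body of #download.

```ruby
# Sketch of the "chapter within window" option: at most `window` chapter
# downloads run at once; each finished thread frees a slot for the next.
def download_chapters_windowed(chapters, window)
  slots = SizedQueue.new(window)   # push blocks once `window` slots are taken
  threads = chapters.map do |chapter|
    slots.push(true)               # waits here while the window is full
    Thread.new do
      begin
        download_chapter(chapter)  # hypothetical per-chapter download body
      ensure
        slots.pop                  # free the slot even if the download raised
      end
    end
  end
  threads.each(&:join)
end
```

This keeps the "remove only the last N chapters on failure" property described above, since at most `window` chapters are ever in flight.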
# File 'lib/yamd.rb', line 162

def download(manga)
  manga_name = self.class.sanitize_dir_name(manga.name)
  manga_dir = Pathname.new(@base_dir).join(manga_name + '/')
  if manga_dir.exist?
    p 'Manga dir exists. Skipping each existing chapter. If the script was forced to stop the last downloaded chapter can be incomplete. Remove it to be downloaded again.'
  else
    Dir.mkdir(manga_dir.to_s)
  end
  manga.chapters.each do | chapter |
    chapter_name = self.class.sanitize_dir_name(chapter.name)
    chapter_dir = manga_dir.join(chapter_name + '/')
    unless chapter_dir.exist?
      Dir.mkdir(chapter_dir.to_s)
      chapter.pages.each do | page |
        page_name = self.class.format_page_name(page, chapter, manga)
        page_abs_path = chapter_dir.join(page_name).to_s
        File.open(page_abs_path, 'wb') do | f |
          # Kernel#open on a URL requires open-uri (URI.open on Ruby >= 3.0)
          open(page.clean_image_url) do | image |
            # TODO: check if copy_stream avoids allocating the whole image in
            # memory before starting to flush it
            IO.copy_stream(image, f)
          end
        end
      end
    end # end "unless chapter_dir.exist?"
  end
end
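The retry TODO could be covered either by the retryable gem or by a plain retry loop like the sketch below. `with_retries` is an illustrative name, not an existing helper in yamd; inside #download, the File.open/IO.copy_stream body would be wrapped in `with_retries { ... }`.

```ruby
# Hedged sketch of retrying flaky IO (page fetches, image reads) without
# a gem: re-run the block up to `tries` times with a linear backoff.
def with_retries(tries: 3, base_delay: 1)
  attempt = 0
  begin
    yield
  rescue StandardError
    attempt += 1
    raise if attempt >= tries      # give up after the last attempt
    sleep(base_delay * attempt)    # linear backoff: 1s, 2s, ...
    retry
  end
end
```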