Class: BcCrawler::Release

Inherits: Object

Defined in: lib/bc_crawler/release.rb

Instance Attribute Summary

#about ⇒ Object readonly

Returns the value of attribute about.
#art_fullsize_url ⇒ Object readonly

Returns the value of attribute art_fullsize_url.
#art_id ⇒ Object readonly

Returns the value of attribute art_id.
#art_thumb_url ⇒ Object readonly

Returns the value of attribute art_thumb_url.
#artist ⇒ Object readonly

Returns the value of attribute artist.
#band_id ⇒ Object readonly

Returns the value of attribute band_id.
#credits ⇒ Object readonly

Returns the value of attribute credits.
#data ⇒ Object readonly

Returns the value of attribute data.
#featured_track_id ⇒ Object readonly

Returns the value of attribute featured_track_id.
#has_audio ⇒ Object readonly

Returns the value of attribute has_audio.
#html ⇒ Object readonly

Returns the value of attribute html.
#id ⇒ Object readonly

Returns the value of attribute id.
#purchase_url ⇒ Object readonly

Returns the value of attribute purchase_url.
#release_date ⇒ Object readonly

Returns the value of attribute release_date.
#title ⇒ Object readonly

Returns the value of attribute title.
#tracks ⇒ Object readonly

Returns the value of attribute tracks.
#type ⇒ Object readonly

Returns the value of attribute type.
#url ⇒ Object readonly

Returns the value of attribute url.

Instance Method Summary

#crawl(nodes = %w(artFullsizeUrl artThumbURL current hasAudio trackinfo url)) ⇒ Object

Scan the HTML for a particular JavaScript snippet where a variable named “TralbumData” is assigned.
#initialize(url) ⇒ Release constructor

A new instance of Release.
#load_release_info ⇒ Object

Assigns some of the main information to instance variables. TODO: make all information available as instance variables.
#load_track_info ⇒ Object

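Builds a Track object for each entry in the trackinfo node and appends it to tracks (tracks have their own class).
#to_s ⇒ Object

Returns a multi-line summary of the release: URL, artist, release title and number of tracks.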

Constructor Details

#initialize(url) ⇒ `Release`

Returns a new instance of Release.

# File 'lib/bc_crawler/release.rb', line 9

def initialize(url)
  @url = url
  @tracks = []
end

Instance Attribute Details

#about ⇒ `Object` (readonly)

Returns the value of attribute about.



# File 'lib/bc_crawler/release.rb', line 5

def about
  @about
end

#art_fullsize_url ⇒ `Object` (readonly)

Returns the value of attribute art_fullsize_url.



# File 'lib/bc_crawler/release.rb', line 5

def art_fullsize_url
  @art_fullsize_url
end

#art_id ⇒ `Object` (readonly)

Returns the value of attribute art_id.



# File 'lib/bc_crawler/release.rb', line 5

def art_id
  @art_id
end

#art_thumb_url ⇒ `Object` (readonly)

Returns the value of attribute art_thumb_url.



# File 'lib/bc_crawler/release.rb', line 5

def art_thumb_url
  @art_thumb_url
end

#artist ⇒ `Object` (readonly)

Returns the value of attribute artist.



# File 'lib/bc_crawler/release.rb', line 5

def artist
  @artist
end

#band_id ⇒ `Object` (readonly)

Returns the value of attribute band_id.



# File 'lib/bc_crawler/release.rb', line 5

def band_id
  @band_id
end

#credits ⇒ `Object` (readonly)

Returns the value of attribute credits.



# File 'lib/bc_crawler/release.rb', line 5

def credits
  @credits
end

#data ⇒ `Object` (readonly)

Returns the value of attribute data.



# File 'lib/bc_crawler/release.rb', line 5

def data
  @data
end

#featured_track_id ⇒ `Object` (readonly)

Returns the value of attribute featured_track_id.



# File 'lib/bc_crawler/release.rb', line 5

def featured_track_id
  @featured_track_id
end

#has_audio ⇒ `Object` (readonly)

Returns the value of attribute has_audio.



# File 'lib/bc_crawler/release.rb', line 5

def has_audio
  @has_audio
end

#html ⇒ `Object` (readonly)

Returns the value of attribute html.



# File 'lib/bc_crawler/release.rb', line 5

def html
  @html
end

#id ⇒ `Object` (readonly)

Returns the value of attribute id.



# File 'lib/bc_crawler/release.rb', line 5

def id
  @id
end

#purchase_url ⇒ `Object` (readonly)

Returns the value of attribute purchase_url.



# File 'lib/bc_crawler/release.rb', line 5

def purchase_url
  @purchase_url
end

#release_date ⇒ `Object` (readonly)

Returns the value of attribute release_date.



# File 'lib/bc_crawler/release.rb', line 5

def release_date
  @release_date
end

#title ⇒ `Object` (readonly)

Returns the value of attribute title.



# File 'lib/bc_crawler/release.rb', line 5

def title
  @title
end

#tracks ⇒ `Object` (readonly)

Returns the value of attribute tracks.



# File 'lib/bc_crawler/release.rb', line 5

def tracks
  @tracks
end

#type ⇒ `Object` (readonly)

Returns the value of attribute type.



# File 'lib/bc_crawler/release.rb', line 5

def type
  @type
end

#url ⇒ `Object` (readonly)

Returns the value of attribute url.



# File 'lib/bc_crawler/release.rb', line 5

def url
  @url
end

Instance Method Details

#crawl(nodes = %w(artFullsizeUrl artThumbURL current hasAudio trackinfo url)) ⇒ `Object`

Scan the HTML for a particular JavaScript snippet where a variable named “TralbumData” is assigned. TralbumData contains all information about the release (and its tracks), but has to be cleaned first in order to get a valid JSON object.

By default, only the main nodes in TralbumData are crawled. More nodes are available and can be passed explicitly through the nodes argument:

nodes = %w(album_is_preorder album_release_date artFullsizeUrl artist artThumbURL
           current defaultPrice featured_track_id FREE freeDownloadPage hasAudio
           id initial_track_num is_preorder item_type last_subscription_item
           maxPrice minPrice packages PAID playing_from preorder_count trackinfo url)
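
The cleaning step described above can be tried without a network connection. What follows is a minimal sketch, not the gem's own code: `SAMPLE_HTML` is a made-up page fragment and `extract_nodes` is a hypothetical helper, and real Bandcamp markup may differ from this sample.

```ruby
require 'json'

# Made-up page fragment for illustration only.
SAMPLE_HTML = <<~HTML
  <script>
  var TralbumData = {
      current: {"title":"Demo","artist":"Someone"},
      hasAudio: true,
      trackinfo: [{"title":"One","track_num":1}],
      url: "http://example.com" + "/album/demo"
  };
  </script>
HTML

# Hypothetical helper: pull the requested top-level nodes out of the
# TralbumData literal and return them as a Ruby hash.
def extract_nodes(html, nodes)
  body = html[/var TralbumData = \{(.*?)\};/m, 1]
  raise ArgumentError, 'TralbumData not found' if body.nil?

  # Drop tabs and join the string that is concatenated in the "url" node.
  body = body.delete("\t").gsub('" + "', '')

  pairs = nodes.map do |node|
    line = body[/^ *#{Regexp.escape(node)} *:.*$/]
    raise KeyError, "node not found: #{node}" if line.nil?

    # Quote the key (first occurrence only) and strip the trailing comma.
    line.sub(node, "\"#{node}\"").sub(/ *, *$/, '')
  end

  JSON.parse("{ #{pairs.join(', ')} }")
end

data = extract_nodes(SAMPLE_HTML, %w(current hasAudio trackinfo url))
puts data['current']['title'] # prints "Demo"
puts data['url']              # prints "http://example.com/album/demo"
```

Raising when the variable or a node is missing is a choice made for this sketch; it turns a silent `nil` into an error that names the missing node.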

# File 'lib/bc_crawler/release.rb', line 24

# Needs `require 'json'` and `require 'open-uri'` at the top of the file.
def crawl(nodes = %w(artFullsizeUrl artThumbURL current hasAudio trackinfo url))
  puts "Crawling #{@url}"
  @nodes = nodes

  # call the URL, fetch the JavaScript code (TralbumData) and clean the string
  @html = URI.open(@url).read                                         # Kernel#open no longer opens URLs on Ruby 3.0+
  js_content = html.gsub(/\n/, '~~')[/var TralbumData = \{(.*?)\};/, 1] # get content of JS variable TralbumData
                   .gsub('~~', "\n")                                  # undo line endings replacement
                   .gsub("\t", '')                                    # remove tabs
                   .gsub("\" + \"", '')                               # special bug in "url" node

  # scan the JavaScript code text for the given nodes
  json_nodes = []
  @nodes.each do |node|
    json_nodes << js_content[/^( )*#{Regexp.escape(node)}( )*:.*$/]   # fetch current node in JavaScript object
                           .sub(node, "\"#{node}\"")                  # add double quotes around the node name (key only)
                           .gsub(/( )*,( )*$/, '')                    # remove the trailing comma
  end

  @data = JSON.parse("{ #{ json_nodes.join(', ') } }")

  # Finally, we load the release info
  load_release_info
end

#load_release_info ⇒ `Object`

Assigns some of the main information to instance variables. TODO: make all information available as instance variables.

# File 'lib/bc_crawler/release.rb', line 51

def load_release_info
  @art_fullsize_url   = @data['artFullsizeUrl']
  @art_thumb_url      = @data['artThumbURL']
  @art_id             = @data['current']['art_id']
  @about              = @data['current']['about']
  @featured_track_id  = @data['current']['featured_track_id']
  @credits            = @data['current']['credits']
  @artist             = @data['current']['artist']
  @purchase_url       = @data['current']['purchase_url']
  @band_id            = @data['current']['band_id']
  @id                 = @data['current']['id']
  @release_date       = @data['current']['release_date']
  @type               = @data['current']['type']
  @title              = @data['current']['title']
  @has_audio          = @data['hasAudio']
  load_track_info
end
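
The assignments above all follow the same pattern: an attribute name paired with a path into the parsed hash. The sketch below shows one way that mapping could be written as data, which may be a starting point for the TODO. It is an illustration under assumptions: `FIELD_PATHS` and the sample `data` hash are hypothetical and not part of the gem.

```ruby
# Stand-in for the parsed TralbumData hash (values are made up).
data = {
  'artFullsizeUrl' => 'http://example.com/full.jpg',
  'hasAudio'       => true,
  'current'        => { 'title' => 'Demo', 'artist' => 'Someone', 'art_id' => 42 }
}

# Hypothetical table: attribute name => path into the parsed hash.
FIELD_PATHS = {
  art_fullsize_url: %w(artFullsizeUrl),
  has_audio:        %w(hasAudio),
  title:            %w(current title),
  artist:           %w(current artist),
  art_id:           %w(current art_id),
  credits:          %w(current credits)
}.freeze

# Hash#dig returns nil for a missing key instead of raising, so a node
# that was not crawled simply yields nil.
info = FIELD_PATHS.transform_values { |path| data.dig(*path) }

puts info[:title]          # prints "Demo"
puts info[:credits].inspect # prints "nil"
```

With `@data['current']['title']`, a crawl that omits the `current` node raises `NoMethodError` on `nil`; `dig` trades that error for a `nil` value, which may or may not be what a caller wants.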

#load_track_info ⇒ `Object`

Builds a Track object for each entry in the trackinfo node and appends it to tracks (tracks have their own class).

# File 'lib/bc_crawler/release.rb', line 70

def load_track_info
  @data['trackinfo'].each do |track|
    @tracks << Track.new(self, track)
  end
end
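
Because `@tracks` is created once in the constructor and `load_track_info` only appends, a second call to `crawl` on the same instance would list every track twice. The sketch below shows one way to avoid that by rebuilding the list on each load. `ReleaseSketch` and the `Track` struct are hypothetical stand-ins for illustration; the real `BcCrawler::Track` is documented separately.

```ruby
# Hypothetical stand-in for BcCrawler::Track.
Track = Struct.new(:release, :info)

class ReleaseSketch
  attr_reader :tracks

  def initialize
    @tracks = []
  end

  # Rebuild the list from scratch instead of appending to it, so that
  # loading twice does not duplicate entries.
  def load_track_info(trackinfo)
    @tracks = trackinfo.map { |track| Track.new(self, track) }
  end
end

release = ReleaseSketch.new
2.times { release.load_track_info([{ 'title' => 'One' }, { 'title' => 'Two' }]) }
puts release.tracks.count # prints 2
```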

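#to_s ⇒ `Object`

Returns a multi-line summary of the release: URL, artist, release title and number of tracks. If the artist is not set yet, the summary adds a hint to call crawl first.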

# File 'lib/bc_crawler/release.rb', line 76

def to_s
  <<-EOF
  URL : #{ @url }
  Artist : #{ @artist }
  Release title : #{ @title }
  Number of tracks : #{ @tracks.count }
  #{ '(use .crawl method to fetch the missing data)' if @artist.nil? }
  EOF
end
