Class: Youtube::SearchResultScraper

Inherits:
Object
Defined in:
lib/youtube/searchresultscraper.rb

Overview

Introduction

Youtube::SearchResultScraper scrapes video information from a search result page on www.youtube.com.

You can get the results as an array or as XML.

The XML format is the same as that of the YouTube Developer API (www.youtube.com/dev_api_ref?m=youtube.videos.list_by_tag).

Example

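require "rubygems"
require "youtube/searchresultscraper"

keyword = "ruby"  # example search keyword, UTF-8 encoded
page    = 1       # example result page number (optional)

scraper = Youtube::SearchResultScraper.new(keyword, page)
scraper.open      # fetch the search result page
scraper.scrape    # parse the page into video objects
puts scraper.get_xml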

More Information

www.ark-web.jp/sandbox/wiki/184.html (Japanese only)

Author

Yuki SHIDA <[email protected]>

Author

Konuma Akio <[email protected]>

Version

0.0.3

License

MIT license

Constant Summary

@@youtube_search_base_url =
"http://www.youtube.com/results?search_query="

Constructor Details

#initialize(keyword, page = nil) ⇒ SearchResultScraper

Creates a Youtube::SearchResultScraper object for the given keyword and page number.

The number of videos per page cannot be changed; it is always 20.

  • keyword - the keyword to search for on YouTube. It must be encoded in UTF-8.

  • page - the page number of the search results (optional)
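
Because the keyword must be UTF-8, input that arrives in another encoding needs transcoding before it is passed to the constructor. The following is a minimal sketch of one way to do that; utf8_keyword is a hypothetical helper and not part of the library.

```ruby
# Hypothetical helper (not part of the library): returns the keyword as
# valid UTF-8, transcoding from its current encoding when necessary.
def utf8_keyword(keyword)
  utf8 = keyword.encoding == Encoding::UTF_8 ? keyword : keyword.encode(Encoding::UTF_8)
  raise ArgumentError, "keyword is not valid UTF-8" unless utf8.valid_encoding?
  utf8
end

# A keyword that arrives in Shift_JIS, e.g. from a legacy form or file.
sjis_keyword = "\u732B".encode(Encoding::Shift_JIS)
keyword = utf8_keyword(sjis_keyword)
puts keyword.encoding  # prints "UTF-8"
```

The returned string could then be handed to Youtube::SearchResultScraper.new as the keyword argument.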



# File 'lib/youtube/searchresultscraper.rb', line 79

def initialize keyword, page=nil
  @keyword = keyword
  @page    = page if not page == nil
end

Instance Attribute Details

#keyword ⇒ Object

Returns the value of attribute keyword.



# File 'lib/youtube/searchresultscraper.rb', line 62

def keyword
  @keyword
end

#page ⇒ Object

Returns the value of attribute page.



# File 'lib/youtube/searchresultscraper.rb', line 63

def page
  @page
end

#video_count ⇒ Object (readonly)

Returns the value of attribute video_count.



# File 'lib/youtube/searchresultscraper.rb', line 64

def video_count
  @video_count
end

#video_from ⇒ Object (readonly)

Returns the value of attribute video_from.



# File 'lib/youtube/searchresultscraper.rb', line 65

def video_from
  @video_from
end

#video_to ⇒ Object (readonly)

Returns the value of attribute video_to.



# File 'lib/youtube/searchresultscraper.rb', line 66

def video_to
  @video_to
end

Instance Method Details

#each ⇒ Object

Iterates over the scraped videos, yielding each one to the given block.



# File 'lib/youtube/searchresultscraper.rb', line 126

def each
  @videos.each do |video|
    yield video
  end
end
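
Since #each only yields, it can also be wrapped in an Enumerator to reach methods such as map or select. The sketch below uses a hypothetical stand-in class, VideoList, with the same #each body and plain strings in place of video objects; it does not assume that the real class includes Enumerable.

```ruby
# Hypothetical stand-in with the same #each body as the scraper.
class VideoList
  def initialize(videos)
    @videos = videos
  end

  def each
    @videos.each do |video|
      yield video
    end
  end
end

list = VideoList.new(%w[first second third])

# Block form: each element is yielded in order.
list.each { |video| puts video }

# enum_for wraps #each in an Enumerator, which provides map, select, etc.
upcased = list.enum_for(:each).map(&:upcase)
```

With the real scraper, the same pattern would presumably apply after #scrape has populated the video list.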

#get_xml ⇒ Object

Returns the video information as XML.



# File 'lib/youtube/searchresultscraper.rb', line 133

def get_xml
  xml = "<ut_response status=\"ok\">" +
          "<video_count>" + @video_count.to_s +  "</video_count>" +
          "<video_list>\n"
  each do |video|
    xml += video.to_xml
  end
  xml += "</video_list></ut_response>"
end
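
To make the shape of the returned document easier to see, here is a minimal self-contained sketch of the same envelope. StubVideo and build_xml are hypothetical names, and the per-video element names are assumptions, since the output of Youtube::Video#to_xml is not shown on this page.

```ruby
# Hypothetical stand-in for Youtube::Video; the real #to_xml output is not
# documented here, so these element names are assumptions.
StubVideo = Struct.new(:id, :title) do
  def to_xml
    "<video><id>#{id}</id><title>#{title}</title></video>\n"
  end
end

# Mirrors the string concatenation in #get_xml: a ut_response envelope
# holding the video count followed by one entry per video.
def build_xml(video_count, videos)
  xml = "<ut_response status=\"ok\">" \
        "<video_count>#{video_count}</video_count>" \
        "<video_list>\n"
  videos.each { |video| xml += video.to_xml }
  xml + "</video_list></ut_response>"
end

xml = build_xml(2, [StubVideo.new("abc", "First"), StubVideo.new("def", "Second")])
puts xml
```

Only the ut_response, video_count, and video_list elements come from the source above; everything inside video_list depends on the video class.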

#open ⇒ Object

Gets the search result page from YouTube for the specified keyword.



# File 'lib/youtube/searchresultscraper.rb', line 85

def open
  @url = @@youtube_search_base_url + CGI.escape(@keyword)
  @url += "&page=#{@page}" if not @page == nil
  @html = Kernel.open(@url).read
  replace_document_write_javascript
  @search_result = Hpricot.parse(@html)
end
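
The URL-building part of #open can be tried in isolation, without any network access. The sketch below is a rough equivalent of the first two lines of the method; build_search_url and SEARCH_BASE_URL are hypothetical names introduced for illustration.

```ruby
require "cgi"

# Same value as the @@youtube_search_base_url class variable.
SEARCH_BASE_URL = "http://www.youtube.com/results?search_query="

# Hypothetical helper mirroring the URL-building lines of #open:
# the keyword is form-escaped and the page parameter is added only
# when a page was given.
def build_search_url(keyword, page = nil)
  url = SEARCH_BASE_URL + CGI.escape(keyword)
  url += "&page=#{page}" unless page.nil?
  url
end

puts build_search_url("ruby on rails")
puts build_search_url("ruby on rails", 2)
```

Note that fetching a URL through Kernel.open relies on open-uri, and that behaviour was removed in Ruby 3.0; on newer Rubies URI.open is the equivalent call.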

#scrape ⇒ Object

Scrapes video information from the search result HTML.



# File 'lib/youtube/searchresultscraper.rb', line 94

def scrape
  @videos = []

  @search_result.search("//div[@class='vEntry']").each do |video_html|
    video = Youtube::Video.new
    video.id             = scrape_id(video_html)
    video.author         = scrape_author(video_html)
    video.title          = scrape_title(video_html)
    video.length_seconds = scrape_length_seconds(video_html)
    video.rating_avg     = scrape_rating_avg(video_html)
    video.rating_count   = scrape_rating_count(video_html)
    video.description    = scrape_description(video_html)
    video.view_count     = scrape_view_count(video_html)
    video.thumbnail_url  = scrape_thumbnail_url(video_html)
    video.tags           = scrape_tags(video_html)
    video.url            = scrape_url(video_html)

    check_video video

    @videos << video
  end

  @video_count = scrape_video_count
  @video_from  = scrape_video_from
  @video_to    = scrape_video_to

  raise "scraping error" if (is_no_result != @videos.empty?)

  @videos
end
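
The guard near the end of #scrape raises when two observations disagree: whether the page reports no results, and whether any videos were actually scraped. The sketch below isolates that check. check_consistency is a hypothetical helper, and the reading of is_no_result as "the page reports that nothing matched" is an assumption, since that method's source is not shown here.

```ruby
# Hypothetical helper reproducing the guard at the end of #scrape.
# no_result_page: whether the page reports that nothing matched
#                 (assumed to be the role of is_no_result in the source).
# videos:         the entries scraped from the page.
def check_consistency(no_result_page, videos)
  raise "scraping error" if no_result_page != videos.empty?
  videos
end

check_consistency(true, [])           # consistent: no results, nothing scraped
check_consistency(false, [:a_video])  # consistent: results present and scraped

begin
  check_consistency(false, [])        # inconsistent: results expected, none scraped
rescue RuntimeError => e
  puts "caught: #{e.message}"
end
```

A mismatch in either direction most likely means the page layout no longer matches what the scraper expects.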