Class: Youtube::SearchResultScraper

Inherits:
Object
Defined in:
lib/youtube/searchresultscraper.rb

Overview

Introduction

Youtube::SearchResultScraper scrapes video information from a search result page on www.youtube.com.

You can get the result as an array or as XML.

The XML format is the same as that of the YouTube Developer API (www.youtube.com/dev_api_ref?m=youtube.videos.list_by_tag).

Example

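require "rubygems"
require "youtube/searchresultscraper"

keyword = "ruby"  # example search keyword, UTF-8 encoded
page    = 1       # example result page number (optional)

scraper = Youtube::SearchResultScraper.new(keyword, page)
scraper.open      # fetch the search result page
scraper.scrape    # parse it into Youtube::Video objects
puts scraper.get_xml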

More Information

www.ark-web.jp/sandbox/wiki/184.html (Japanese only)

Author

Yuki SHIDA <[email protected]>

Version

0.0.2

License

MIT license

Constant Summary

@@youtube_search_base_url =
"http://www.youtube.com/results?search_query="

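Instance Attribute Summary

  • #keyword ⇒ Object - Returns the value of attribute keyword.

  • #page ⇒ Object - Returns the value of attribute page.

Instance Method Summary

  • #each ⇒ Object - Iterator for scraped videos.

  • #get_xml ⇒ Object - Returns video information in XML format.

  • #initialize(keyword, page = nil) ⇒ SearchResultScraper - Creates a scraper for the given keyword and page number.

  • #open ⇒ Object - Fetches the search result page from YouTube for the specified keyword.

  • #scrape ⇒ Object - Scrapes video information from the search result HTML.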

Constructor Details

#initialize(keyword, page = nil) ⇒ SearchResultScraper

Creates a Youtube::SearchResultScraper object for the given keyword and page number.

You cannot specify the number of videos per page; it is always 20.

  • keyword - the keyword to search for on YouTube. It must be UTF-8 encoded.

  • page - the page number of the search results (optional)



# File 'lib/youtube/searchresultscraper.rb', line 75

def initialize(keyword, page = nil)
  @keyword = keyword
  # Only set @page when a page number was given.
  @page    = page unless page.nil?
end

Instance Attribute Details

#keyword ⇒ Object

Returns the value of attribute keyword.



# File 'lib/youtube/searchresultscraper.rb', line 61

def keyword
  @keyword
end

#page ⇒ Object

Returns the value of attribute page.



# File 'lib/youtube/searchresultscraper.rb', line 62

def page
  @page
end

Instance Method Details

#each ⇒ Object

Iterator for scraped videos.



# File 'lib/youtube/searchresultscraper.rb', line 113

def each 
  @videos.each do |video|
    yield video
  end
end
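
Because #each yields one video at a time, a class shaped like this can also mix in Enumerable to gain map, select and similar methods. The sketch below illustrates the idea with stand-ins: `DemoVideo` and `VideoList` are hypothetical names invented for this example, and nothing in the source shown here confirms that Youtube::SearchResultScraper itself includes Enumerable.

```ruby
# Hypothetical stand-in for Youtube::Video with only two fields.
DemoVideo = Struct.new(:id, :title)

# Hypothetical container that mirrors the #each shown above.
class VideoList
  include Enumerable # assumption: not confirmed for the real scraper

  def initialize(videos)
    @videos = videos
  end

  # Yield each stored video to the caller's block.
  def each
    @videos.each do |video|
      yield video
    end
  end
end

list = VideoList.new([DemoVideo.new("abc123", "First"),
                      DemoVideo.new("def456", "Second")])

# Plain iteration, in the same way one would call scraper.each.
list.each { |video| puts "#{video.id}: #{video.title}" }

# Enumerable methods become available once #each is defined.
titles = list.map(&:title)
```

With the real scraper, the equivalent loop would be `scraper.each { |video| ... }` after calling #open and #scrape.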

#get_xml ⇒ Object

Returns video information in XML format.



# File 'lib/youtube/searchresultscraper.rb', line 120

def get_xml
  xml = "<ut_response status=\"ok\"><video_list>\n"
  each do |video|
    xml += video.to_xml
  end
  xml += "</video_list></ut_response>"
end
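
To make the shape of the returned document concrete, here is a minimal sketch that wraps per-video fragments in the same `ut_response` / `video_list` envelope. `XmlVideo`, its `to_xml` body and the `videos_to_xml` helper are hypothetical: the real Youtube::Video#to_xml is not shown on this page, so the element names inside `<video>` and the escaping step are assumptions.

```ruby
require "cgi"

# Hypothetical stand-in for Youtube::Video; the real #to_xml is not shown here.
XmlVideo = Struct.new(:id, :title) do
  def to_xml
    # Escape text so characters such as & and < stay valid XML (an assumption
    # about sensible behavior, not a claim about the library).
    "<video><id>#{CGI.escapeHTML(id)}</id>" \
      "<title>#{CGI.escapeHTML(title)}</title></video>\n"
  end
end

# Wrap the per-video fragments in the same envelope that #get_xml uses.
def videos_to_xml(videos)
  body = videos.map(&:to_xml).join
  "<ut_response status=\"ok\"><video_list>\n#{body}</video_list></ut_response>"
end

puts videos_to_xml([XmlVideo.new("abc123", "Cats & dogs")])
```

Joining the fragments once avoids repeatedly reallocating the string with `+=`, which matters little for 20 videos but keeps the helper free of mutation.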

#open ⇒ Object

Fetches the search result page from YouTube for the specified keyword.



# File 'lib/youtube/searchresultscraper.rb', line 81

def open
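  # Build the search URL; the keyword is URL-encoded.
  url = @@youtube_search_base_url + CGI.escape(@keyword)
  url += "&page=#{@page}" unless @page.nil?
  # Fetch the result page and parse it with Hpricot.
  @html = Kernel.open(url).read
  replace_document_write_javascript
  @search_result = Hpricot.parse(@html)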
end
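
The URL construction in #open can be tried in isolation, without any network access. The sketch below is only an illustration: `build_search_url` and the `SEARCH_BASE_URL` constant are hypothetical names, with the base URL copied from the `@@youtube_search_base_url` class variable listed above.

```ruby
require "cgi"

# Base URL copied from @@youtube_search_base_url.
SEARCH_BASE_URL = "http://www.youtube.com/results?search_query="

# Hypothetical helper mirroring the URL logic of #open.
def build_search_url(keyword, page = nil)
  # CGI.escape percent-encodes the keyword and turns spaces into "+".
  url = SEARCH_BASE_URL + CGI.escape(keyword)
  # The page parameter is appended only when a page number was given.
  url += "&page=#{page}" unless page.nil?
  url
end

puts build_search_url("ruby on rails")
# prints http://www.youtube.com/results?search_query=ruby+on+rails
puts build_search_url("ruby on rails", 2)
# prints http://www.youtube.com/results?search_query=ruby+on+rails&page=2
```

This also shows why the keyword must be UTF-8 encoded: CGI.escape works on the raw bytes of the string, so the encoding of the keyword determines the percent-encoded query.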

#scrape ⇒ Object

Scrapes video information from the search result HTML.



# File 'lib/youtube/searchresultscraper.rb', line 90

def scrape
  @videos = []

  @search_result.search("//div[@class='vEntry']").each do |video_html|
    video = Youtube::Video.new
    video.id             = scrape_id(video_html)
    video.author         = scrape_author(video_html)
    video.title          = scrape_title(video_html)
    video.length_seconds = scrape_length_seconds(video_html)
    video.rating_avg     = scrape_rating_avg(video_html)
    video.rating_count   = scrape_rating_count(video_html)
    video.description    = scrape_description(video_html)
    video.view_count     = scrape_view_count(video_html)
    video.thumbnail_url  = scrape_thumbnail_url(video_html)
    video.tags           = scrape_tags(video_html)
    video.url            = scrape_url(video_html)
    @videos << video
  end

  @videos
end
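
The pattern in #scrape (select every `div` with class `vEntry`, then extract fields from each) can be sketched on a small sample. This example swaps Hpricot for REXML, which ships with Ruby, so that it runs without extra gems. The sample markup, the `scrape_entries` helper and the two extracted fields are all invented for illustration; the real page structure and the `scrape_*` helpers are not shown on this page.

```ruby
require "rexml/document"

# Invented, well-formed sample; the real YouTube markup is not shown here.
SAMPLE_HTML = <<~HTML
  <html><body>
    <div class="vEntry"><a href="/watch?v=abc123">First video</a></div>
    <div class="other">Not a video</div>
    <div class="vEntry"><a href="/watch?v=def456">Second video</a></div>
  </body></html>
HTML

# Hypothetical helper: collect title and URL from every vEntry block.
def scrape_entries(html)
  doc = REXML::Document.new(html)
  # Same XPath expression as in #scrape, evaluated by REXML instead of Hpricot.
  REXML::XPath.match(doc, "//div[@class='vEntry']").map do |entry|
    link = entry.elements["a"] # first <a> child of the entry
    { title: link.text, url: link.attributes["href"] }
  end
end

scrape_entries(SAMPLE_HTML).each do |video|
  puts "#{video[:title]} (#{video[:url]})"
end
```

REXML only accepts well-formed XML, so it would not cope with real-world HTML the way Hpricot does; it is used here purely to keep the sketch self-contained.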