Class: Ruboty::YMCrawl::Element

Inherits: Object

Object
Ruboty::YMCrawl::Element

Defined in: lib/ruboty/ymcrawl/crawler.rb

Overview

A class representing the part of a Page extracted by a selector.

Instance Method Summary

#get_content(target) ⇒ Object

Returns the value matching the given target (URL, image URL, image title, article title, or maximum page index).
#get_image_title ⇒ Object

Returns the title of the image.
#get_image_url ⇒ Object

Returns the URL of the image.
#get_page_index_max ⇒ Object

Returns the number of pages the article spans.
#get_title ⇒ Object

Returns the article title.
#get_url ⇒ Object
#initialize(doc) ⇒ Element constructor

A new instance of Element.

Constructor Details

#initialize(doc) ⇒ `Element`

Returns a new instance of Element.

# File 'lib/ruboty/ymcrawl/crawler.rb', line 81

def initialize(doc)
	@doc = doc
end

Instance Method Details

#get_content(target) ⇒ `Object`

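Returns the value matching the given target: the URL for :url, the image URL for :image, the image title for :image_title, the article title for :title, and the maximum page index for :page_index_max. Any other target returns nil.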

# File 'lib/ruboty/ymcrawl/crawler.rb', line 105

def get_content(target)
	return get_url            if target == :url
	return get_image_url      if target == :image
	return get_image_title    if target == :image_title
	return get_title          if target == :title
	return get_page_index_max if target == :page_index_max
end
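
The dispatch above can be tried without a real HTML parser. The following is a minimal sketch, assuming `doc` only needs to respond to `name`, `[]`, and `content` (the source does not state which parser supplies it). `StubNode` and `ElementSketch` are hypothetical names; `ElementSketch` is a trimmed copy of the methods listed on this page, not the class itself.

```ruby
# Hypothetical stand-in for the parsed node passed to the constructor.
# Assumption: the real object responds to #name, #[] and #content.
class StubNode
  attr_reader :name, :content

  def initialize(name, attrs = {}, content = "")
    @name = name
    @attrs = attrs
    @content = content
  end

  def [](key)
    @attrs[key]
  end
end

# Trimmed copy of the documented methods, for illustration only.
class ElementSketch
  def initialize(doc)
    @doc = doc
  end

  def get_url; @doc["href"] end
  def get_title; @doc.content end
  def get_page_index_max; @doc.content.to_i end

  def get_content(target)
    return get_url            if target == :url
    return get_title          if target == :title
    return get_page_index_max if target == :page_index_max
  end
end

node = StubNode.new("a", { "href" => "http://example.com/post/1" }, "First post")
link = ElementSketch.new(node)

url     = link.get_content(:url)            # "http://example.com/post/1"
title   = link.get_content(:title)          # "First post"
unknown = link.get_content(:no_such_target) # nil, because no guard matches
```

Since an unrecognized target falls through every guard and yields nil, callers may want to check the result before using it.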

#get_image_title ⇒ `Object`

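Returns the title of the image. For an img element this is its title attribute, or "noname" when the attribute is missing; for any other element it is the element's content.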

# File 'lib/ruboty/ymcrawl/crawler.rb', line 93

def get_image_title
	title = (@doc.name == "img") ? @doc["title"] : @doc.content
	(title == nil) ? "noname" : title
end
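
The three outcomes of this method can be sketched as follows. This is an illustration under assumptions: `StubNode` is a hypothetical stand-in for the parsed node (anything responding to `name`, `[]`, and `content`), and `ImageTitleSketch` is a hypothetical wrapper holding a copy of the method body shown above.

```ruby
# Hypothetical stand-in for the parsed node; not part of the library.
class StubNode
  attr_reader :name, :content

  def initialize(name, attrs = {}, content = nil)
    @name = name
    @attrs = attrs
    @content = content
  end

  def [](key)
    @attrs[key]
  end
end

# Copy of get_image_title from the listing above, for illustration only.
class ImageTitleSketch
  def initialize(doc)
    @doc = doc
  end

  def get_image_title
    title = (@doc.name == "img") ? @doc["title"] : @doc.content
    (title == nil) ? "noname" : title
  end
end

titled   = ImageTitleSketch.new(StubNode.new("img", { "title" => "Sunset" })).get_image_title
untitled = ImageTitleSketch.new(StubNode.new("img", { "src" => "a.png" })).get_image_title
anchor   = ImageTitleSketch.new(StubNode.new("a", { "href" => "a.png" }, "Photo link")).get_image_title

# titled   is "Sunset"     (img with a title attribute)
# untitled is "noname"     (img without a title attribute)
# anchor   is "Photo link" (non-img elements use their content)
```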

#get_image_url ⇒ `Object`

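Returns the URL of the image: the href attribute of an a element, or the src attribute of an img element. Any other element raises ArgumentError.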

Raises:

(ArgumentError)

# File 'lib/ruboty/ymcrawl/crawler.rb', line 86

def get_image_url
	return @doc["href"] if @doc.name == "a"
	return @doc["src"]  if @doc.name == "img"
	raise ArgumentError, "in Element"
end
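
Because this method raises for anything other than a or img elements, callers that cannot guarantee the element type may want to rescue the error. A minimal sketch, assuming a hypothetical `StubNode` in place of the real parsed node and a hypothetical `ImageUrlSketch` holding a copy of the method body shown above:

```ruby
# Hypothetical stand-in for the parsed node; not part of the library.
class StubNode
  attr_reader :name

  def initialize(name, attrs = {})
    @name = name
    @attrs = attrs
  end

  def [](key)
    @attrs[key]
  end
end

# Copy of get_image_url from the listing above, for illustration only.
class ImageUrlSketch
  def initialize(doc)
    @doc = doc
  end

  def get_image_url
    return @doc["href"] if @doc.name == "a"
    return @doc["src"]  if @doc.name == "img"
    raise ArgumentError, "in Element"
  end
end

from_link = ImageUrlSketch.new(StubNode.new("a", { "href" => "http://example.com/full.jpg" })).get_image_url
from_img  = ImageUrlSketch.new(StubNode.new("img", { "src" => "http://example.com/thumb.jpg" })).get_image_url

# A div is neither a link nor an image, so the call raises.
error_message = nil
begin
  ImageUrlSketch.new(StubNode.new("div")).get_image_url
rescue ArgumentError => e
  error_message = e.message # "in Element"
end
```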

#get_page_index_max ⇒ `Object`

Returns the number of pages the article spans.

# File 'lib/ruboty/ymcrawl/crawler.rb', line 102

def get_page_index_max; @doc.content.to_i end
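
The result depends on how Ruby's `String#to_i` parses the element's content. The sketch below assumes `@doc.content` is a String; `page_index_max_from` is a hypothetical helper that applies the same conversion outside the class.

```ruby
# Hypothetical helper mirroring the conversion in get_page_index_max.
# Assumption: content is a String taken from the selected element.
def page_index_max_from(content)
  content.to_i
end

plain       = page_index_max_from("12")       # 12
padded      = page_index_max_from(" 12\n")    # 12, leading whitespace is skipped
trailing    = page_index_max_from("12 pages") # 12, parsing stops at the first non-digit
non_numeric = page_index_max_from("Next")     # 0, no leading digits
```

Since non-numeric content converts to 0 rather than raising, a result of 0 may indicate that the selector matched the wrong element.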

#get_title ⇒ `Object`

Returns the article title.

# File 'lib/ruboty/ymcrawl/crawler.rb', line 99

def get_title; @doc.content end

#get_url ⇒ `Object`

# File 'lib/ruboty/ymcrawl/crawler.rb', line 83

def get_url; @doc["href"] end
