Class: DaimonSkycrawlers::Crawler::Base

Inherits:
Object
Includes:
DaimonSkycrawlers::Callbacks, DaimonSkycrawlers::ConfigMixin, DaimonSkycrawlers::Configurable, LoggerMixin
Defined in:
lib/daimon_skycrawlers/crawler/base.rb

Overview

The base class for crawlers.

Direct Known Subclasses

Default

Instance Attribute Summary

Instance Method Summary

Methods included from DaimonSkycrawlers::Configurable

#configure

Methods included from DaimonSkycrawlers::Callbacks

#before_process, #clear_before_process_callbacks, #run_before_callbacks

Methods included from LoggerMixin

included

Constructor Details

#initialize(base_url = nil, faraday_options: {}, options: {}) ⇒ Base

Returns a new instance of Base.

Parameters:

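  • base_url (String) (defaults to: nil)

    Base URL for the crawler

  • faraday_options (Hash) (defaults to: {})

    Options for Faraday

  • options (Hash) (defaults to: {})

    Additional options, stored in @options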



# File 'lib/daimon_skycrawlers/crawler/base.rb', line 38

def initialize(base_url = nil, faraday_options: {}, options: {})
  super()
  @base_url = base_url
  @faraday_options = faraday_options
  @options = options
  @prepare = ->(connection) {}
  @skipped = false
  @n_processed_urls = 0
end

Instance Attribute Details

#n_processed_urls ⇒ Object (readonly)

Returns the value of attribute n_processed_urls.



# File 'lib/daimon_skycrawlers/crawler/base.rb', line 32

def n_processed_urls
  @n_processed_urls
end

#storage ⇒ Object

Returns the storage instance, creating a Storage::RDB on first access.



# File 'lib/daimon_skycrawlers/crawler/base.rb', line 74

def storage
  @storage ||= Storage::RDB.new
end

Instance Method Details

#connection ⇒ Object



# File 'lib/daimon_skycrawlers/crawler/base.rb', line 82

def connection
  @connection ||= Faraday.new(@base_url, @faraday_options)
end

#fetch(path, message = {}) ⇒ Object

Raises:

  • (NotImplementedError)


# File 'lib/daimon_skycrawlers/crawler/base.rb', line 107

def fetch(path, message = {})
  raise NotImplementedError, "Must implement this method in subclass"
end
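
Because #fetch raises NotImplementedError here, a subclass is expected to override it. The following is a minimal sketch of that pattern only. SketchBase and EchoCrawler are hypothetical stand-ins written for illustration; they are not part of DaimonSkycrawlers and omit callbacks, storage, and the Faraday connection.

```ruby
# Hypothetical stand-in for the base class (NOT the real DaimonSkycrawlers code).
class SketchBase
  # Simplified: the real #process also runs callbacks and resolves the URL.
  def process(message)
    url = message.fetch(:url)
    fetch(url, message)
  end

  def fetch(path, message = {})
    raise NotImplementedError, "Must implement this method in subclass"
  end
end

# A subclass supplies the actual fetching behaviour.
class EchoCrawler < SketchBase
  def fetch(path, message = {})
    "fetched #{path}"
  end
end

result = EchoCrawler.new.process(url: "/index.html")

# Calling the base class directly raises, as documented above.
unimplemented =
  begin
    SketchBase.new.process(url: "/index.html")
    false
  rescue NotImplementedError
    true
  end
```

Note that NotImplementedError descends from ScriptError rather than StandardError, so a bare rescue clause does not catch it; it has to be named explicitly as in the sketch.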

#get(path, params = {}) ⇒ Object



# File 'lib/daimon_skycrawlers/crawler/base.rb', line 111

def get(path, params = {})
  # Uses @connection directly, so #connection or #setup_connection must have run first
  @connection.get(path, params)
end

#post(path, params = {}) ⇒ Object



# File 'lib/daimon_skycrawlers/crawler/base.rb', line 115

def post(path, params = {})
  # Uses @connection directly, so #connection or #setup_connection must have run first
  @connection.post(path, params)
end

#prepare(&block) ⇒ Object

Call this method before DaimonSkycrawlers.register_crawler. For example, you can log in before fetching a URL.



# File 'lib/daimon_skycrawlers/crawler/base.rb', line 67

def prepare(&block)
  @prepare = block
end
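
The stored block is later called with the connection each time #process runs, just before #fetch. The sketch below illustrates that flow under simplifying assumptions: PrepareSketch and FakeConnection are hypothetical stand-ins, and the cookie value is invented for the example. In the real class the block receives a Faraday connection.

```ruby
# Hypothetical stand-in for a connection; the real one is a Faraday connection.
FakeConnection = Struct.new(:headers)

class PrepareSketch
  def initialize
    @connection = FakeConnection.new({})
    @prepare = ->(connection) {} # default no-op, as in #initialize
  end

  # Store the block for later use.
  def prepare(&block)
    @prepare = block
  end

  # Simplified #process: call the prepare block, then fetch.
  def process(url)
    @prepare.call(@connection)
    fetch(url)
  end

  def fetch(url)
    [url, @connection.headers.dup]
  end
end

crawler = PrepareSketch.new
crawler.prepare do |connection|
  # Illustrative only: attach a session cookie before every fetch.
  connection.headers["Cookie"] = "session=example"
end

output = crawler.process("/private")
```

Since the block runs on every #process call, it is best kept cheap or made idempotent.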

#process(message, &block) ⇒ Object



# File 'lib/daimon_skycrawlers/crawler/base.rb', line 86

def process(message, &block)
  @skipped = false
  @n_processed_urls += 1

  setup_default_filters

  proceeding = run_before_callbacks(message)
  unless proceeding
    @skipped = true
    skip(message[:url])
    return
  end

  # url can be a path
  url = message.delete(:url)
  url = (URI(connection.url_prefix) + url).to_s

  @prepare.call(connection)
  fetch(url, message, &block)
end
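
The URL handling in #process relies on URI addition from Ruby's standard library, which resolves a reference against a base. The sketch below shows how different kinds of message[:url] values would resolve. The prefix http://example.com/blog/ and the message contents are made up for illustration; in the real method the base comes from connection.url_prefix.

```ruby
require "uri"

# Assumed base URL, standing in for connection.url_prefix.
prefix = URI("http://example.com/blog/")

# Hash#delete returns the value and removes the key from the message,
# so the message later passed to #fetch no longer contains :url.
message = { url: "entry/1", depth: 2 }
url = message.delete(:url)

relative = (prefix + url).to_s                          # resolved under /blog/
rooted   = (prefix + "/about").to_s                     # resolved from the host root
absolute = (prefix + "http://other.example.org/x").to_s # absolute URL replaces the base
```

An absolute URL in the message therefore bypasses the crawler's base URL entirely, while a bare path is joined onto it.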

#setup_connection(options = {}) {|faraday| ... } ⇒ Object

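Sets up the Faraday connection, merging the given options over the faraday_options passed to the constructor.

Parameters:

  • options (Hash) (defaults to: {})

    Options for Faraday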

Yields:

  • (faraday)

Yield Parameters:

  • faraday (Faraday)


# File 'lib/daimon_skycrawlers/crawler/base.rb', line 55

def setup_connection(options = {})
  merged_options = @faraday_options.merge(options)
  faraday_options = merged_options.empty? ? nil : merged_options
  @connection = Faraday.new(@base_url, faraday_options) do |faraday|
    yield faraday
  end
end
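
The option handling in #setup_connection can be looked at in isolation. The helper below, sketch_faraday_options, is a hypothetical name introduced only to mirror the first two lines of the method; the header values are invented for the example.

```ruby
# Hypothetical helper mirroring the merge logic of #setup_connection.
def sketch_faraday_options(defaults, overrides = {})
  merged = defaults.merge(overrides)
  # An empty hash is turned into nil before being handed to Faraday.new.
  merged.empty? ? nil : merged
end

defaults  = { headers: { "User-Agent" => "crawler/1.0", "Accept" => "text/html" } }
overrides = { headers: { "User-Agent" => "crawler/2.0" } }

nothing  = sketch_faraday_options({})
combined = sketch_faraday_options(defaults, overrides)
```

Hash#merge is shallow, so in this sketch the whole :headers hash from overrides replaces the default one and the "Accept" entry is lost. Callers who want to keep nested defaults would need to repeat them in the options they pass.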

#skipped? ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/daimon_skycrawlers/crawler/base.rb', line 78

def skipped?
  @skipped
end