Module: DaimonSkycrawlers

Defined in:
lib/daimon_skycrawlers.rb,
lib/daimon_skycrawlers/cli.rb,
lib/daimon_skycrawlers/queue.rb,
lib/daimon_skycrawlers/timer.rb,
lib/daimon_skycrawlers/config.rb,
lib/daimon_skycrawlers/filter.rb,
lib/daimon_skycrawlers/logger.rb,
lib/daimon_skycrawlers/crawler.rb,
lib/daimon_skycrawlers/storage.rb,
lib/daimon_skycrawlers/version.rb,
lib/daimon_skycrawlers/consumer.rb,
lib/daimon_skycrawlers/callbacks.rb,
lib/daimon_skycrawlers/processor.rb,
lib/daimon_skycrawlers/filter/base.rb,
lib/daimon_skycrawlers/storage/rdb.rb,
lib/daimon_skycrawlers/configurable.rb,
lib/daimon_skycrawlers/consumer/url.rb,
lib/daimon_skycrawlers/crawler/base.rb,
lib/daimon_skycrawlers/storage/base.rb,
lib/daimon_skycrawlers/storage/file.rb,
lib/daimon_skycrawlers/storage/null.rb,
lib/daimon_skycrawlers/consumer/base.rb,
lib/daimon_skycrawlers/generator/new.rb,
lib/daimon_skycrawlers/processor/base.rb,
lib/daimon_skycrawlers/processor/proc.rb,
lib/daimon_skycrawlers/sitemap_parser.rb,
lib/daimon_skycrawlers/commands/runner.rb,
lib/daimon_skycrawlers/crawler/default.rb,
lib/daimon_skycrawlers/commands/enqueue.rb,
lib/daimon_skycrawlers/generator/filter.rb,
lib/daimon_skycrawlers/processor/spider.rb,
lib/daimon_skycrawlers/generator/crawler.rb,
lib/daimon_skycrawlers/processor/default.rb,
lib/daimon_skycrawlers/generator/generate.rb,
lib/daimon_skycrawlers/generator/processor.rb,
lib/daimon_skycrawlers/filter/update_checker.rb,
lib/daimon_skycrawlers/consumer/http_response.rb,
lib/daimon_skycrawlers/filter/duplicate_checker.rb,
lib/daimon_skycrawlers/filter/robots_txt_checker.rb

Overview

Namespace for this library

Defined Under Namespace

Modules: Callbacks, ConfigMixin, Configurable, Consumer, Crawler, Filter, LoggerMixin, Processor, Storage, Timer

Classes: Configuration, Logger, Queue, SitemapParser

Constant Summary

VERSION = "1.0.0"

Version of this library

Class Method Details

.configuration ⇒ DaimonSkycrawlers::Configuration

Retrieve configuration object



# File 'lib/daimon_skycrawlers.rb', line 52

def configuration
  @configuration ||= DaimonSkycrawlers::Configuration.new.tap do |config|
    config.logger = DaimonSkycrawlers::Logger.default
    config.queue_name_prefix = "daimon-skycrawlers"
    config.crawler_interval = 1
    config.shutdown_interval = 10
  end
end
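
The `||=` plus `tap` combination above builds the configuration once and returns the same object afterwards. A minimal sketch of that pattern follows; `SketchConfig`, `MemoSketch` and `build_count` are stand-ins invented for illustration and are not part of the library.

```ruby
# Stand-in for illustration only; not the library's Configuration class.
SketchConfig = Struct.new(:queue_name_prefix, :crawler_interval)

module MemoSketch
  @build_count = 0

  class << self
    attr_reader :build_count

    # Same shape as the method above: build once, then reuse.
    def configuration
      @configuration ||= SketchConfig.new.tap do |config|
        @build_count += 1 # counts how often the defaults block runs
        config.queue_name_prefix = "daimon-skycrawlers"
        config.crawler_interval = 1
      end
    end
  end
end

first = MemoSketch.configuration
second = MemoSketch.configuration
puts first.equal?(second)   # true: the same object on every call
puts MemoSketch.build_count # 1: the defaults block ran once
```

Because the object is shared, a change made through one reference is visible to every later caller.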

.configure {|configuration| ... } ⇒ void

This method returns an undefined value.

Configure DaimonSkycrawlers

DaimonSkycrawlers.configure do |config|
  config.logger = DaimonSkycrawlers::Logger.default
  config.queue_name_prefix = "daimon-skycrawlers"
  config.crawler_interval = 1
  config.shutdown_interval = 10
end

  • logger: logger instance
  • queue_name_prefix: prefix for queue names
  • crawler_interval: interval between crawls
  • shutdown_interval: how long to wait after the queue becomes empty before shutting down

Yields:

  • (configuration)

Yield Parameters:

  • configuration (DaimonSkycrawlers::Configuration)

Yield Returns:

  • (void)


# File 'lib/daimon_skycrawlers.rb', line 82

def configure
  yield configuration
end
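
Since `configure` only yields the shared configuration object, a block can override a single setting and leave the other defaults alone. The sketch below illustrates this with stand-ins (`SketchSettings` and `ConfigureSketch` are hypothetical names, not library classes); it assumes nothing about the real `Configuration` beyond the attribute names shown above.

```ruby
# Stand-in settings object with the defaults listed above (logger omitted).
SketchSettings = Struct.new(:queue_name_prefix, :crawler_interval, :shutdown_interval)

module ConfigureSketch
  class << self
    def configuration
      @configuration ||= SketchSettings.new("daimon-skycrawlers", 1, 10)
    end

    # Same body as the method above: hand the shared object to the block.
    def configure
      yield configuration
    end
  end
end

ConfigureSketch.configure do |config|
  config.crawler_interval = 5 # override one setting only
end

puts ConfigureSketch.configuration.crawler_interval  # 5
puts ConfigureSketch.configuration.shutdown_interval # 10, default kept
```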

.env ⇒ Object

Return the current environment name: the value of SKYCRAWLERS_ENV, or "development" when it is unset



# File 'lib/daimon_skycrawlers.rb', line 125

def env
  ENV["SKYCRAWLERS_ENV"] || "development"
end
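
The fallback can be checked in isolation. This sketch copies the method body into a stand-in named `sketch_env` (a hypothetical name) so it runs without the library installed.

```ruby
# Stand-in with the same body as the env method above.
def sketch_env
  ENV["SKYCRAWLERS_ENV"] || "development"
end

ENV.delete("SKYCRAWLERS_ENV")
unset_result = sketch_env # falls back to "development"

ENV["SKYCRAWLERS_ENV"] = "production"
set_result = sketch_env   # picks up the variable's value

ENV.delete("SKYCRAWLERS_ENV") # leave the process environment clean
p [unset_result, set_result]  # ["development", "production"]
```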

.load_crawlers ⇒ void

This method returns an undefined value.

Load "app/crawlers/**/*.rb"



# File 'lib/daimon_skycrawlers.rb', line 103

def load_crawlers
  Dir.glob("app/crawlers/**/*.rb") do |path|
    require(File.expand_path(path, Dir.pwd)) &&
      DaimonSkycrawlers.configuration.logger.info("Loaded crawler: #{path}")
  end
end
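
Two details of the method above are easy to miss: the `**/` in the glob matches files at any depth under `app/crawlers`, and `require` returns `true` only the first time a file is loaded, so the `&&` logs each file at most once. The following sketch demonstrates both in a temporary directory; the file names and constants are invented for illustration.

```ruby
require "tmpdir"
require "fileutils"

loaded = []
skipped = []

Dir.mktmpdir do |root|
  Dir.chdir(root) do
    # Hypothetical project layout: one crawler at the top level, one nested.
    FileUtils.mkdir_p("app/crawlers/blog")
    File.write("app/crawlers/top_sketch.rb", "TOP_SKETCH_LOADED = true\n")
    File.write("app/crawlers/blog/post_sketch.rb", "POST_SKETCH_LOADED = true\n")

    2.times do
      Dir.glob("app/crawlers/**/*.rb").sort.each do |path|
        # require returns true only the first time a file is loaded.
        if require(File.expand_path(path, Dir.pwd))
          loaded << path
        else
          skipped << path
        end
      end
    end
  end
end

p loaded  # both files, matched at any depth
p skipped # the same two files, skipped on the second pass
```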

.load_init ⇒ void

This method returns an undefined value.

Load "config/init.rb"



# File 'lib/daimon_skycrawlers.rb', line 91

def load_init
  require(File.expand_path("config/init.rb", Dir.pwd))
rescue LoadError => ex
  puts ex.message
  exit(false)
end
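
When `config/init.rb` is missing, the method above prints the error message and ends the process with a failure status. The sketch below reproduces that path without terminating the interpreter by rescuing `SystemExit`; `sketch_load_init` and its `path` parameter are additions for this illustration only.

```ruby
# Stand-in that mirrors load_init above but takes the path as an argument.
def sketch_load_init(path)
  require(File.expand_path(path, Dir.pwd))
rescue LoadError => ex
  puts ex.message
  exit(false)
end

status = begin
  sketch_load_init("config/definitely_missing_init_sketch.rb")
  :loaded
rescue SystemExit => e
  # exit raises SystemExit; exit(false) carries a failure status.
  e.status
end

p status # a non-zero exit status
```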

.load_processors ⇒ void

This method returns an undefined value.

Load "app/processors/**/*.rb"



# File 'lib/daimon_skycrawlers.rb', line 115

def load_processors
  Dir.glob("app/processors/**/*.rb") do |path|
    require(File.expand_path(path, Dir.pwd)) &&
      DaimonSkycrawlers.configuration.logger.info("Loaded processor: #{path}")
  end
end

.register_crawler(crawler) ⇒ void

This method returns an undefined value.

Register a crawler

Parameters:

  • crawler (Crawler)

    instance which implements fetch method



# File 'lib/daimon_skycrawlers.rb', line 43

def register_crawler(crawler)
  DaimonSkycrawlers::Consumer::URL.register(crawler)
end

.register_processor(processor) ⇒ void
.register_processor {|message| ... } ⇒ void

Register a processor

Overloads:

  • .register_processor(processor) ⇒ void

    This method returns an undefined value.

    Parameters:

    • processor (Processor)

      instance which implements call method

  • .register_processor {|message| ... } ⇒ void

    This method returns an undefined value.

    Yields:

    • (message)

      Register given block as a processor.

    Yield Parameters:

    • message (Hash)

      A message from queue

    Yield Returns:

    • (void)


# File 'lib/daimon_skycrawlers.rb', line 33

def register_processor(processor = nil, &block)
  DaimonSkycrawlers::Consumer::HTTPResponse.register(processor, &block)
end
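
The two overloads differ only in how the processor is supplied: as an object that implements `call`, or as a block. The sketch below shows both forms against a stand-in registry. `SketchRegistry`, `TitlePrinter` and the `:url` message key are hypothetical; how `DaimonSkycrawlers::Consumer::HTTPResponse` stores and invokes processors is not shown on this page.

```ruby
# Stand-in registry for illustration; not the library's consumer class.
class SketchRegistry
  attr_reader :processors

  def initialize
    @processors = []
  end

  # Accept either a callable object or a block, like the signature above.
  def register(processor = nil, &block)
    callable = processor || block
    raise ArgumentError, "processor or block required" unless callable

    @processors << callable
  end
end

# Object form: anything that implements call.
class TitlePrinter
  def call(message)
    "object saw #{message[:url]}"
  end
end

registry = SketchRegistry.new
registry.register(TitlePrinter.new)
registry.register { |message| "block saw #{message[:url]}" }

# The :url key is a made-up message field for this sketch.
message = { url: "http://example.com/" }
results = registry.processors.map { |processor| processor.call(message) }
p results
```

With the library loaded, the corresponding calls would be `DaimonSkycrawlers.register_processor(TitlePrinter.new)` and `DaimonSkycrawlers.register_processor { |message| ... }`.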