Module: DaimonSkycrawlers
- Defined in:
- lib/daimon_skycrawlers.rb,
lib/daimon_skycrawlers/cli.rb,
lib/daimon_skycrawlers/queue.rb,
lib/daimon_skycrawlers/timer.rb,
lib/daimon_skycrawlers/config.rb,
lib/daimon_skycrawlers/filter.rb,
lib/daimon_skycrawlers/logger.rb,
lib/daimon_skycrawlers/crawler.rb,
lib/daimon_skycrawlers/storage.rb,
lib/daimon_skycrawlers/version.rb,
lib/daimon_skycrawlers/consumer.rb,
lib/daimon_skycrawlers/callbacks.rb,
lib/daimon_skycrawlers/processor.rb,
lib/daimon_skycrawlers/filter/base.rb,
lib/daimon_skycrawlers/storage/rdb.rb,
lib/daimon_skycrawlers/configurable.rb,
lib/daimon_skycrawlers/consumer/url.rb,
lib/daimon_skycrawlers/crawler/base.rb,
lib/daimon_skycrawlers/storage/base.rb,
lib/daimon_skycrawlers/storage/file.rb,
lib/daimon_skycrawlers/storage/null.rb,
lib/daimon_skycrawlers/consumer/base.rb,
lib/daimon_skycrawlers/generator/new.rb,
lib/daimon_skycrawlers/processor/base.rb,
lib/daimon_skycrawlers/processor/proc.rb,
lib/daimon_skycrawlers/sitemap_parser.rb,
lib/daimon_skycrawlers/commands/runner.rb,
lib/daimon_skycrawlers/crawler/default.rb,
lib/daimon_skycrawlers/commands/enqueue.rb,
lib/daimon_skycrawlers/generator/filter.rb,
lib/daimon_skycrawlers/processor/spider.rb,
lib/daimon_skycrawlers/generator/crawler.rb,
lib/daimon_skycrawlers/processor/default.rb,
lib/daimon_skycrawlers/generator/generate.rb,
lib/daimon_skycrawlers/generator/processor.rb,
lib/daimon_skycrawlers/filter/update_checker.rb,
lib/daimon_skycrawlers/consumer/http_response.rb,
lib/daimon_skycrawlers/filter/duplicate_checker.rb,
lib/daimon_skycrawlers/filter/robots_txt_checker.rb
Overview
Namespace for this library.
Defined Under Namespace
Modules: Callbacks, ConfigMixin, Configurable, Consumer, Crawler, Filter, LoggerMixin, Processor, Storage, Timer
Classes: Configuration, Logger, Queue, SitemapParser
Constant Summary
- VERSION =
Version of this library
"1.0.0"
Class Method Summary
- .configuration ⇒ DaimonSkycrawlers::Configuration
  Retrieve configuration object.
- .configure {|configuration| ... } ⇒ void
  Configure DaimonSkycrawlers.
- .env ⇒ Object
  Return current environment.
- .load_crawlers ⇒ void
  Load "app/crawlers/**/*.rb".
- .load_init ⇒ void
  Load "config/init.rb".
- .load_processors ⇒ void
  Load "app/processors/**/*.rb".
- .register_crawler(crawler) ⇒ void
  Register a crawler.
- .register_processor(processor = nil, &block) ⇒ Object
  Register a processor.
Class Method Details
.configuration ⇒ DaimonSkycrawlers::Configuration
Retrieve configuration object
# File 'lib/daimon_skycrawlers.rb', line 52
def configuration
  @configuration ||= DaimonSkycrawlers::Configuration.new.tap do |config|
    config.logger = DaimonSkycrawlers::Logger.default
    config.queue_name_prefix = "daimon-skycrawlers"
    config.crawler_interval = 1
    config.shutdown_interval = 10
  end
end
.configure {|configuration| ... } ⇒ void
This method returns an undefined value.
Configure DaimonSkycrawlers
DaimonSkycrawlers.configure do |config|
  config.logger = DaimonSkycrawlers::Logger.default
  config.queue_name_prefix = "daimon-skycrawlers"
  config.crawler_interval = 1
  config.shutdown_interval = 10
end
- logger: logger instance
- queue_name_prefix: prefix of the queue name
- crawler_interval: interval between crawls
- shutdown_interval: how long to wait after the queue becomes empty before shutting down
# File 'lib/daimon_skycrawlers.rb', line 82
def configure
  yield configuration
end
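The pairing of a memoized `configuration` with a yielding `configure` can be tried in isolation. The following is a minimal sketch, not the library's own code: `SketchApp` and `Settings` are hypothetical stand-ins, and only the `||=`, `tap`, and `yield` structure is taken from the source shown above.

```ruby
# Hypothetical stand-in module; it only mirrors the structure of the
# configuration/configure pair and is not part of DaimonSkycrawlers.
module SketchApp
  Settings = Struct.new(:queue_name_prefix, :crawler_interval, :shutdown_interval)

  class << self
    # Build the settings object once, then reuse it on every later call.
    def configuration
      @configuration ||= Settings.new.tap do |config|
        config.queue_name_prefix = "sketch"
        config.crawler_interval = 1
        config.shutdown_interval = 10
      end
    end

    # Yield the shared settings object so callers can override defaults.
    def configure
      yield configuration
    end
  end
end

# Override one value; the other defaults stay in place.
SketchApp.configure do |config|
  config.crawler_interval = 5
end
```

Because `configure` yields the same object that `configuration` returns, values set inside the block remain visible to every later call of `configuration`.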
.env ⇒ Object
Return current environment
# File 'lib/daimon_skycrawlers.rb', line 125
def env
  ENV["SKYCRAWLERS_ENV"] || "development"
end
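The fallback behaviour of this lookup can be sketched without touching the real process environment. `sketch_env` below is a hypothetical helper that takes the environment as an argument; the lookup expression itself is the one from the source above.

```ruby
# Hypothetical helper: same lookup as the method above, but the
# environment is passed in so the example does not modify ENV.
def sketch_env(environment = ENV)
  environment["SKYCRAWLERS_ENV"] || "development"
end

unset_result = sketch_env({})                                      # variable absent
set_result   = sketch_env({ "SKYCRAWLERS_ENV" => "production" })   # variable present
empty_result = sketch_env({ "SKYCRAWLERS_ENV" => "" })             # variable set but empty
```

Note that an empty string is truthy in Ruby, so a variable that is set but empty is returned as-is rather than falling back to "development".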
.load_crawlers ⇒ void
This method returns an undefined value.
Load "app/crawlers/**/*.rb"
# File 'lib/daimon_skycrawlers.rb', line 103
def load_crawlers
  Dir.glob("app/crawlers/**/*.rb") do |path|
    require(File.expand_path(path, Dir.pwd)) &&
      DaimonSkycrawlers.configuration.logger.info("Loaded crawler: #{path}")
  end
end
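Two details of this loader are easy to miss: the `**/` pattern matches files at any depth, including directly under the base directory, and `require` returns true only the first time a file is loaded, so the `&&` logs each file once. The sketch below demonstrates both in a temporary directory; the file names and constants are hypothetical.

```ruby
require "tmpdir"
require "fileutils"

loaded_first = []   # paths for which require returned true on pass 1
loaded_second = []  # paths for which require returned true on pass 2

Dir.mktmpdir do |root|
  # Hypothetical layout: one file directly under app/crawlers, one nested.
  FileUtils.mkdir_p(File.join(root, "app/crawlers/news"))
  File.write(File.join(root, "app/crawlers/sketch_top.rb"), "SKETCH_TOP = true\n")
  File.write(File.join(root, "app/crawlers/news/sketch_nested.rb"), "SKETCH_NESTED = true\n")

  Dir.chdir(root) do
    2.times do |pass|
      Dir.glob("app/crawlers/**/*.rb") do |path|
        # require returns true on first load and false when already loaded.
        newly_loaded = require(File.expand_path(path, Dir.pwd))
        (pass.zero? ? loaded_first : loaded_second) << path if newly_loaded
      end
    end
  end
end
```

After the run, `loaded_first` holds both paths and `loaded_second` is empty, which is why calling a loader of this shape twice would not log the same file again.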
.load_init ⇒ void
This method returns an undefined value.
Load "config/init.rb"
# File 'lib/daimon_skycrawlers.rb', line 91
def load_init
  require(File.expand_path("config/init.rb", Dir.pwd))
rescue LoadError => ex
  puts ex.message
  exit(false)
end
.load_processors ⇒ void
This method returns an undefined value.
Load "app/processors/**/*.rb"
# File 'lib/daimon_skycrawlers.rb', line 115
def load_processors
  Dir.glob("app/processors/**/*.rb") do |path|
    require(File.expand_path(path, Dir.pwd)) &&
      DaimonSkycrawlers.configuration.logger.info("Loaded processor: #{path}")
  end
end
.register_crawler(crawler) ⇒ void
This method returns an undefined value.
Register a crawler
# File 'lib/daimon_skycrawlers.rb', line 43
def register_crawler(crawler)
  DaimonSkycrawlers::Consumer::URL.register(crawler)
end
.register_processor(processor) ⇒ void
.register_processor {|message| ... } ⇒ void
Register a processor
# File 'lib/daimon_skycrawlers.rb', line 33
def register_processor(processor = nil, &block)
  DaimonSkycrawlers::Consumer::HTTPResponse.register(processor, &block)
end
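The two overloads (an object or a block) both pass through the single `processor = nil, &block` signature. How `DaimonSkycrawlers::Consumer::HTTPResponse.register` stores and invokes them is not shown on this page, so the following is only a sketch of the general pattern under the assumption that processors respond to `call`. `SketchRegistry` and `dispatch` are hypothetical names.

```ruby
# Hypothetical registry; the library's actual consumer may behave differently.
class SketchRegistry
  attr_reader :processors

  def initialize
    @processors = []
  end

  # Accept a callable object, a block, or both (assumed behaviour).
  def register(processor = nil, &block)
    @processors << processor if processor
    @processors << block if block
    self
  end

  # Call every registered processor with the same message.
  def dispatch(message)
    @processors.map { |processor| processor.call(message) }
  end
end

registry = SketchRegistry.new
registry.register(->(message) { message[:url].upcase })  # object form
registry.register { |message| message[:url].length }     # block form
SKETCH_RESULTS = registry.dispatch({ url: "http://example.com/" })
```

Both forms end up as callables in one list, so the dispatching code does not need to know which overload was used at registration time.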