Module: Wgit

Defined in:
lib/wgit/web_crawler.rb,
lib/wgit/url.rb,
lib/wgit/utils.rb,
lib/wgit/crawler.rb,
lib/wgit/version.rb,
lib/wgit/document.rb,
lib/wgit/assertable.rb,
lib/wgit/database/model.rb,
lib/wgit/database/database.rb,
lib/wgit/database/mongo_connection_details.rb

Overview

Author:

  • Michael Telford

Defined Under Namespace

Modules: Assertable, Model, Utils
Classes: Crawler, Database, Document, Url, WebCrawler

Constant Summary

VERSION =
"0.0.1".freeze
DB_PROVIDER =
:MongoLabs.freeze
CONNECTION_DETAILS =

MongoLabs (MongoDB 3.0)

{
  :host           => "ds037205.mongolab.com",
  :port           => "37205",
  :db             => "crawler",
  :uname          => "rubyapp",
  :pword          => "R5jUKv1fessb",
}.freeze
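
As an illustration of how a hash of this shape might be consumed, the following minimal sketch assembles a standard MongoDB connection string from it. The `mongo_uri` helper and the placeholder values are hypothetical and not part of Wgit; the library may well connect in a different way.

```ruby
# Placeholder hash with the same keys as CONNECTION_DETAILS.
# The values are made up for illustration.
details = {
  :host  => "db.example.com",
  :port  => "27017",
  :db    => "crawler",
  :uname => "user",
  :pword => "secret",
}.freeze

# Hypothetical helper (not part of Wgit): builds a connection string of the
# form mongodb://user:password@host:port/database.
# Credentials containing special characters would need percent-encoding first.
def mongo_uri(details)
  "mongodb://#{details[:uname]}:#{details[:pword]}@" \
    "#{details[:host]}:#{details[:port]}/#{details[:db]}"
end

uri = mongo_uri(details)
puts uri
```

Because the hash is frozen, code that needs different settings has to build a new hash (for example with `merge`) rather than modify the constant in place.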

Class Method Summary

Class Method Details

.crawl_the_web(max_sites_to_crawl = -1, max_data_size = 1048576000) ⇒ Object

Convenience method to crawl the World Wide Web. The default max_sites_to_crawl of -1 places no limit on the number of sites crawled. The default max_data_size is 1048576000 bytes (1000 MiB, roughly 1 GB).



# File 'lib/wgit/web_crawler.rb', line 12

def self.crawl_the_web(max_sites_to_crawl = -1, max_data_size = 1048576000)
  db = Wgit::Database.new
  web_crawler = Wgit::WebCrawler.new(db, max_sites_to_crawl, max_data_size)
  web_crawler.crawl_the_web
end
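
The delegation above can be sketched in a self-contained form. The `WgitSketch` module and its stub classes below are hypothetical stand-ins so that the example runs without the gem or a MongoDB instance; only the argument order and default values are taken from the source shown above.

```ruby
# Hypothetical stand-ins for Wgit::Database and Wgit::WebCrawler.
module WgitSketch
  class Database; end

  class WebCrawler
    def initialize(database, max_sites_to_crawl = -1, max_data_size = 1_048_576_000)
      @database = database
      @max_sites_to_crawl = max_sites_to_crawl
      @max_data_size = max_data_size
    end

    # Stand-in: reports the configured limits instead of crawling anything.
    def crawl_the_web
      { :max_sites => @max_sites_to_crawl, :max_bytes => @max_data_size }
    end
  end

  # Mirrors the convenience method: build a database handle, hand it and the
  # two limits to a crawler, then start the crawl.
  def self.crawl_the_web(max_sites_to_crawl = -1, max_data_size = 1_048_576_000)
    db = Database.new
    web_crawler = WebCrawler.new(db, max_sites_to_crawl, max_data_size)
    web_crawler.crawl_the_web
  end
end

# No arguments: unrestricted site count and the default data cap.
defaults = WgitSketch.crawl_the_web

# Limit the crawl to 100 sites and 50 MiB of data.
limited = WgitSketch.crawl_the_web(100, 50 * 1024 * 1024)
```

With the real library, the equivalent call would presumably be `Wgit.crawl_the_web(100, 50 * 1024 * 1024)`, which requires a reachable database.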