Module: NewsScraper

Extended by:
NewsScraper
Included in:
NewsScraper
Defined in:
lib/news_scraper.rb,
lib/news_scraper/cli.rb,
lib/news_scraper/errors.rb,
lib/news_scraper/scraper.rb,
lib/news_scraper/trainer.rb,
lib/news_scraper/version.rb,
lib/news_scraper/uri_parser.rb,
lib/news_scraper/configuration.rb,
lib/news_scraper/extractors/article.rb,
lib/news_scraper/extractors_helpers.rb,
lib/news_scraper/trainer/url_trainer.rb,
lib/news_scraper/transformers/article.rb,
lib/news_scraper/trainer/preset_selector.rb,
lib/news_scraper/extractors/google_news_rss.rb,
lib/news_scraper/transformers/trainer_article.rb,
lib/news_scraper/transformers/nokogiri/functions.rb,
lib/news_scraper/transformers/helpers/highscore_parser.rb

Defined Under Namespace

Modules: CLI, Extractors, ExtractorsHelpers, Trainer, Transformers
Classes: Configuration, ResponseError, Scraper, URIParser

Constant Summary

VERSION =
"1.1.1".freeze

Instance Attribute Summary

Instance Method Summary

Instance Attribute Details

#configuration ⇒ Object

# File 'lib/news_scraper.rb', line 47

def configuration
  @configuration ||= Configuration.new
end

Instance Method Details

#configure {|configuration| ... } ⇒ Object

Yields:

  • configuration: the shared Configuration instance returned by #configuration



# File 'lib/news_scraper.rb', line 55

def configure
  yield(configuration)
end
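
The block-yielding pattern above can be tried in isolation. The following is a minimal sketch, not the gem's code: `DemoScraper` is a stand-in module, and its `Configuration` struct with a `patterns_path` setting is a hypothetical name chosen for illustration (the real `NewsScraper::Configuration` attributes are not listed on this page).

```ruby
# Stand-in module for illustration only; it mirrors the shape of the
# methods documented above but is NOT the gem's implementation.
module DemoScraper
  extend self

  # Hypothetical setting name, assumed for this example.
  Configuration = Struct.new(:patterns_path)

  # Lazily builds the configuration once and reuses it afterwards.
  def configuration
    @configuration ||= Configuration.new
  end

  # Hands the shared configuration object to the caller's block.
  def configure
    yield(configuration)
  end
end

DemoScraper.configure do |config|
  config.patterns_path = 'config/article_scrape_patterns.yml'
end

puts DemoScraper.configuration.patterns_path
```

Because `configuration` memoizes its result, every `configure` block mutates the same object, so settings applied in one place are visible everywhere else in the process.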

#reset_configuration ⇒ Object



# File 'lib/news_scraper.rb', line 51

def reset_configuration
  @configuration = Configuration.new
end
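
A reset of this kind is typically useful between test cases, so that settings from one test do not leak into the next. The sketch below assumes a stand-in module (`ResetDemo`) and a hypothetical `patterns_path` setting; neither name comes from the gem.

```ruby
# Stand-in module for illustration only; NOT the gem's implementation.
module ResetDemo
  extend self

  # Hypothetical setting name, assumed for this example.
  Configuration = Struct.new(:patterns_path)

  def configuration
    @configuration ||= Configuration.new
  end

  # Discards the current object and replaces it with a fresh default.
  def reset_configuration
    @configuration = Configuration.new
  end
end

ResetDemo.configuration.patterns_path = '/tmp/custom.yml'
old_config = ResetDemo.configuration

ResetDemo.reset_configuration
new_config = ResetDemo.configuration

puts old_config.equal?(new_config)     # prints "false": a new object was built
puts new_config.patterns_path.inspect  # prints "nil": the custom value is gone
```

Note that any reference held to the old object (here `old_config`) still carries the old settings; only callers that go back through `configuration` see the fresh defaults.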

#train(query:) ⇒ Object

NewsScraper::train is an interactive command-line prompt that:

  1. Collates all articles for the given :query

  2. Greps for :data_types using the :presets set in the configuration

  3. Displays the results of each :preset grep for a given :data_type

  4. Prompts to select one of the :presets or define a pattern for that domain’s :data_type

     N.B.: The user may ignore all presets and configure the pattern manually in the YAML file

  5. Saves the selected :preset to config/article_scrape_patterns.yml

Params

  • query: a keyword argument specifying the query to train on

# File 'lib/news_scraper.rb', line 42

def train(query:)
  Trainer.train(query: query)
end
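
The training steps described for #train can be sketched non-interactively. This is a rough illustration under stated assumptions, not how `Trainer.train` is implemented: the `PRESETS` hash, the regular-expression patterns, the helper names `grep_presets` and `select_presets`, and the `example.com` domain are all hypothetical, and the interactive prompt is replaced by "pick the first preset that matched".

```ruby
require 'yaml'

# Hypothetical presets: data type => { preset name => pattern }.
PRESETS = {
  'title' => {
    'html_title' => %r{<title>(.*?)</title>}m,
    'og_title'   => /<meta property="og:title" content="(.*?)"/
  }
}.freeze

# Steps 2 and 3: run every preset for every data type against one article
# body and collect what each preset extracted (nil when nothing matched).
def grep_presets(html, presets)
  presets.transform_values do |by_name|
    by_name.transform_values { |pattern| html[pattern, 1] }
  end
end

# Step 4 stand-in: choose the first preset that matched instead of prompting.
def select_presets(results)
  results.transform_values do |by_name|
    name, _value = by_name.find { |_name, value| value }
    name
  end
end

# Step 1 stand-in: a single hard-coded article body instead of a live query.
html = '<html><head><title>Example headline</title></head></html>'

results  = grep_presets(html, PRESETS)
selected = select_presets(results)

# Step 5 stand-in: print the per-domain choice as YAML instead of writing
# config/article_scrape_patterns.yml.
puts({ 'example.com' => selected }.to_yaml)
```

The real prompt lets the user reject every preset and supply a custom pattern; in this sketch that case would simply leave the selected preset name as `nil` for the affected data type.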