Module: MyModule

Extended by:
MyModule, OpencBot, OpencBot::CompanyFetcherBot
Included in:
MyModule
Defined in:
lib/openc_bot/templates/lib/bot.rb,
lib/openc_bot/templates/lib/company_fetcher_bot.rb

Overview

To get the Date helper methods, uncomment the require line below (and the matching line further down). CSV and text helpers are also available.
require 'openc_bot/helpers/dates'

Constant Summary

Constants included from OpencBot

OpencBot::VERSION

Constants included from OpencBot::Helpers::RegisterMethods

OpencBot::Helpers::RegisterMethods::MAX_BUSY_RETRIES, OpencBot::Helpers::RegisterMethods::MAX_STALE_COUNT

Instance Method Summary

Methods included from OpencBot

db_location, db_name, export, extended, insert_or_update, root_directory, save_data, save_run_report, spotcheck, sqlite_busy_timeout, sqlite_magic_connection, table_summary, unlock_database, verbose?

Methods included from OpencBot::Helpers::Text

#normalise_utf8_spaces, #strip_all_spaces

Methods included from OpencBot::CompanyFetcherBot

fetch_datum, inferred_jurisdiction_code, primary_key_name, save_entity, save_entity!, schema_name

Methods included from OpencBot::Helpers::AlphaSearch

#alpha_terms, #each_search_term, #fetch_data_via_alpha_search, #get_results_and_extract_data_for, #letters_and_numbers, #numbers_of_chars_in_search

Methods included from OpencBot::Helpers::RegisterMethods

#datum_exists?, #default_stale_count, #fetch_data, #fetch_registry_page, #get_raw_data, #post_process, #prepare_and_save_data, #primary_key_name, #raise_when_saving_invalid_record, #raw_data_file_location, #registry_url, #registry_url_from_db, #save_entity, #save_entity!, #save_raw_data, #schema_name, #stale_entry_uids, #update_datum, #update_stale, #use_alpha_search, #validate_datum

Methods included from OpencBot::Helpers::IncrementalSearch

#entity_uid_prefixes, #entity_uid_suffixes, #fetch_data_via_incremental_search, #highest_entry_uid_result, #highest_entry_uids, #increment_number, #incremental_rewind_count, #incremental_search, #max_failed_count

Instance Method Details

#computed_registry_url(company_number) ⇒ Object

If the register exposes a URL, built from the company_number, that can be fetched with a plain GET request, define it here. With this in place, #fetch_datum should "just work".

# File 'lib/openc_bot/templates/lib/company_fetcher_bot.rb', line 30

def computed_registry_url(company_number)
  # e.g.
  # "http://some.register.com/path/to/#{company_number}"
end
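
A filled-in version of this stub might look like the sketch below. It is only an illustration: the module name and base URL are hypothetical placeholders, and escaping the number with ERB::Util.url_encode (Ruby standard library) is an assumption, there so that a company number containing a space or slash cannot break the URL path.

```ruby
require 'erb'

# Hypothetical stand-in for MyModule; "extend self" lets the methods be
# called directly on the module.
module RegistryUrlSketch
  extend self

  # Hypothetical base URL; replace with the real register's address.
  BASE_URL = 'http://some.register.com/path/to/'.freeze

  def computed_registry_url(company_number)
    # Percent-encode the number so spaces or slashes stay inside one path segment.
    BASE_URL + ERB::Util.url_encode(company_number.to_s.strip)
  end
end

puts RegistryUrlSketch.computed_registry_url('AB 123')
```

Percent-encoding (rather than form encoding) is used because the number ends up in the URL path, where a space should become %20 rather than +.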

#export_data ⇒ Object

# File 'lib/openc_bot/templates/lib/bot.rb', line 12

def export_data
  # This is the basic functionality for exporting the data from the database. By default the data
  # table (what is created when you save_data) is called ocdata, but it can be called anything else,
  # and the query can be more complex, returning, for example, only the most recent results.
  sql_query = "ocdata.* from ocdata"
  select(sql_query).collect do |raw_datum|
    # raw_datum will be a Hash of field names (as symbols) for the keys and the values for each field.
    # It should be converted to the format necessary for importing into OpenCorporates by using a
    # prepare_for_export method.
    prepare_for_export(raw_datum)
  end
end
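
The comments above note that the export can return, for example, only the most recent results. If that filtering is easier to express in Ruby than in SQL, a minimal sketch follows. It assumes, as the comment on raw_datum says, that each row is a Hash with symbol keys; the :uid and :date column names and the sample rows are hypothetical, and dates are assumed to be stored as ISO 8601 strings (YYYY-MM-DD), which sort correctly as plain strings.

```ruby
# Hypothetical helper module; not part of OpencBot.
module RecentRowsSketch
  extend self

  # Keep only the newest row for each uid, preserving first-seen uid order.
  def most_recent_per_uid(rows)
    rows.group_by { |row| row[:uid] }
        .map { |_uid, group| group.max_by { |row| row[:date] } }
  end
end

# Hypothetical rows shaped like the hashes select is described as returning.
rows = [
  { uid: '001', date: '2020-01-01', name: 'ACME LTD' },
  { uid: '001', date: '2021-06-30', name: 'ACME HOLDINGS LTD' },
  { uid: '002', date: '2019-05-05', name: 'ABBEY PLC' }
]
# Returns the 2021-06-30 row for '001' and the only row for '002'.
latest = RecentRowsSketch.most_recent_per_uid(rows)
puts latest.inspect
```

In export_data, the filtered rows could then be passed to prepare_for_export in place of the raw result of select. Pushing the same logic into the SQL query instead avoids loading superseded rows at all, which matters for large tables.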

#prepare_for_export(raw_data) ⇒ Object

# File 'lib/openc_bot/templates/lib/bot.rb', line 25

def prepare_for_export(raw_data)
  # do something here to convert the raw data from the database (if you are using one) into
  # the form required by the export.
end
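
As an illustration of what this conversion might involve, the sketch below maps a raw database row onto an export hash. Every column name on both sides (:uid, :name, :incorporation_date, :retrieved_at, and the output keys) is a hypothetical assumption; check the real field names against the schema your export must satisfy.

```ruby
require 'date'

# Hypothetical stand-in for MyModule's prepare_for_export.
module ExportSketch
  extend self

  def prepare_for_export(raw_data)
    {
      company_number: raw_data[:uid],
      name: raw_data[:name].to_s.strip,
      incorporation_date: iso_date(raw_data[:incorporation_date]),
      retrieved_at: raw_data[:retrieved_at]
    }.compact # drop fields that were missing in the raw row
  end

  # Normalise any parseable date string to YYYY-MM-DD; nil stays nil.
  def iso_date(value)
    value && Date.parse(value.to_s).iso8601
  end
end

puts ExportSketch.prepare_for_export({ uid: '123', name: ' Acme Ltd ', incorporation_date: '3 Feb 2001' }).inspect
```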

#process_datum(datum_hash) ⇒ Object

This method must be defined for all bots that can fetch and process individual records, e.g. incremental searchers, or bots that visit individual company pages in an alpha search. Where the bot cannot do this (e.g. where the underlying data is only available as a CSV file, or there are no individual pages for each company), it can be left as a stub method. It should return a hash that conforms to the company-schema; the hash will be checked against this schema before saving.

# File 'lib/openc_bot/templates/lib/company_fetcher_bot.rb', line 68

def process_datum(datum_hash)
  # write your code to parse what is in the company pages/data
end
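
A hedged sketch of an implementation follows. It assumes, purely for illustration, that the fetched record arrives as a JSON string under datum_hash[:data]; the source field names (company_name, number, status), the jurisdiction code 'xx', and the output keys are hypothetical, loosely modelled on company-schema fields and not checked against the actual schema.

```ruby
require 'json'

# Hypothetical stand-in for MyModule's process_datum.
module ProcessDatumSketch
  extend self

  # Hypothetical jurisdiction code; a real bot would use its own.
  JURISDICTION_CODE = 'xx'.freeze

  def process_datum(datum_hash)
    # Assumption: the raw record is a JSON string stored under :data.
    raw = JSON.parse(datum_hash[:data])
    {
      name: raw.fetch('company_name').strip,      # required: raises KeyError if absent
      company_number: raw.fetch('number').to_s,   # required
      jurisdiction_code: JURISDICTION_CODE,
      current_status: raw['status']               # optional
    }.compact
  end
end

datum = { data: '{"company_name":" Acme Ltd ","number":123,"status":"Active"}' }
puts ProcessDatumSketch.process_datum(datum).inspect
```

Using fetch for the required fields makes a malformed record fail loudly here, rather than producing a hash that only fails later at schema validation.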

#search_for_entities_for_term(term, options = {}) ⇒ Object

This method is called by #fetch_data_via_alpha_search (defined in the AlphaSearch helper) and is passed a search term, typically a short string of characters (e.g. ‘AB’, ‘AC’…). It should yield a series of hashes of company data, each of which can be validated against the company-schema.

# File 'lib/openc_bot/templates/lib/company_fetcher_bot.rb', line 91

def search_for_entities_for_term(term, options={})
  # write your code to search all the pages for the given term, and yield a series of company hashes
end
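
The yielding contract can be sketched as follows. The in-memory FAKE_REGISTER stands in for the register's search results pages and is entirely hypothetical, as is the prefix-matching rule; options is accepted to match the signature but unused here. Returning an Enumerator when no block is given is an optional convenience, not something the template requires.

```ruby
# Hypothetical stand-in for MyModule's search_for_entities_for_term.
module AlphaSearchSketch
  extend self

  # Hypothetical search results; a real bot would page through the register.
  FAKE_REGISTER = [
    { name: 'ABACUS LTD', company_number: '001' },
    { name: 'ACME LTD', company_number: '002' },
    { name: 'ABBEY PLC', company_number: '003' }
  ].freeze

  def search_for_entities_for_term(term, options = {})
    # Without a block, return an Enumerator so callers can use .to_a, .first, etc.
    return enum_for(:search_for_entities_for_term, term, options) unless block_given?

    FAKE_REGISTER.each do |company|
      next unless company[:name].start_with?(term.upcase)
      yield company.dup # yield a copy so callers can modify it safely
    end
  end
end

AlphaSearchSketch.search_for_entities_for_term('ab') do |company|
  puts company[:name]
end
```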

#update_data ⇒ Object

# File 'lib/openc_bot/templates/lib/bot.rb', line 30

def update_data
  # write code here (using other methods if necessary) for
  # updating your local database with data from the source
  # that you are scraping or fetching from
  #
  # See the README at https://github.com/openc/openc_bot for details
  # save_data([:uid,:date], my_data, sometablename)
  #
  # After updating the data you should run save_run_report, which
  # saves the status (and other data, if applicable)
  save_run_report(:status => 'success')
end