Module: MyModule
- Extended by:
- MyModule, OpencBot, OpencBot::CompanyFetcherBot
- Included in:
- MyModule
- Defined in:
- lib/openc_bot/templates/lib/bot.rb,
lib/openc_bot/templates/lib/company_fetcher_bot.rb
Overview
Uncomment the line below (and the matching line further down) to get the Date helper methods; CSV and text helpers are also available.
# require 'openc_bot/helpers/dates'
Constant Summary
Constants included from OpencBot
Constants included from OpencBot::Helpers::RegisterMethods
OpencBot::Helpers::RegisterMethods::MAX_BUSY_RETRIES, OpencBot::Helpers::RegisterMethods::MAX_STALE_COUNT
Instance Method Summary
- #computed_registry_url(company_number) ⇒ Object
  If the register has a GET’able URL based on the company_number, define it here.
- #export_data ⇒ Object
- #prepare_for_export(raw_data) ⇒ Object
- #process_datum(datum_hash) ⇒ Object
  This method must be defined for all bots that can fetch and process individual records, e.g. incremental searchers, or individual company pages in an alpha search.
- #search_for_entities_for_term(term, options = {}) ⇒ Object
  This method is called by #fetch_data_via_alpha_search (defined in the AlphaSearch helper) and is passed a search term, typically a short string of a few characters (e.g. ‘AB’, ‘AC’…).
- #update_data ⇒ Object
Methods included from OpencBot
db_location, db_name, export, extended, insert_or_update, root_directory, save_data, save_run_report, spotcheck, sqlite_busy_timeout, sqlite_magic_connection, table_summary, unlock_database, verbose?
Methods included from OpencBot::Helpers::Text
#normalise_utf8_spaces, #strip_all_spaces
Methods included from OpencBot::CompanyFetcherBot
fetch_datum, inferred_jurisdiction_code, primary_key_name, save_entity, save_entity!, schema_name
Methods included from OpencBot::Helpers::AlphaSearch
#alpha_terms, #each_search_term, #fetch_data_via_alpha_search, #get_results_and_extract_data_for, #letters_and_numbers, #numbers_of_chars_in_search
Methods included from OpencBot::Helpers::RegisterMethods
#datum_exists?, #default_stale_count, #fetch_data, #fetch_registry_page, #get_raw_data, #post_process, #prepare_and_save_data, #primary_key_name, #raise_when_saving_invalid_record, #raw_data_file_location, #registry_url, #registry_url_from_db, #save_entity, #save_entity!, #save_raw_data, #schema_name, #stale_entry_uids, #update_datum, #update_stale, #use_alpha_search, #validate_datum
Methods included from OpencBot::Helpers::IncrementalSearch
#entity_uid_prefixes, #entity_uid_suffixes, #fetch_data_via_incremental_search, #highest_entry_uid_result, #highest_entry_uids, #increment_number, #incremental_rewind_count, #incremental_search, #max_failed_count
Instance Method Details
#computed_registry_url(company_number) ⇒ Object
If the register has a GET’able URL based on the company_number, define it here. This should mean that #fetch_datum ‘just works’.
# File 'lib/openc_bot/templates/lib/company_fetcher_bot.rb', line 30
def computed_registry_url(company_number)
  # e.g.
  # "http://some.register.com/path/to/#{company_number}"
end
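A minimal sketch of a filled-in method, assuming a hypothetical register that serves one page per company at a predictable URL. The host and path are placeholders, not a real register. ERB::Util.url_encode (Ruby standard library) is used so that company numbers containing spaces or slashes still produce a valid URL, and `extend self` mirrors the template's "Extended by: MyModule".

```ruby
require "erb"

module MyModule
  extend self

  # Hypothetical base URL; replace it with the real register's address.
  REGISTRY_BASE_URL = "http://some.register.com/path/to/".freeze

  # Builds a GET-able URL for one company from its company number,
  # trimming stray whitespace and percent-encoding unsafe characters.
  def computed_registry_url(company_number)
    REGISTRY_BASE_URL + ERB::Util.url_encode(company_number.to_s.strip)
  end
end

puts MyModule.computed_registry_url(" 01234567 ")
# prints http://some.register.com/path/to/01234567
```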
#export_data ⇒ Object
# File 'lib/openc_bot/templates/lib/bot.rb', line 12
def export_data
  # This is the basic functionality for exporting the data from the database. By default the data
  # table (what is created when you save_data) is called ocdata, but it can be called anything else,
  # and the query can be more complex, returning, for example, only the most recent results.
  sql_query = "ocdata.* from ocdata"
  select(sql_query).collect do |raw_datum|
    # raw_datum will be a Hash of field names (as symbols) for the keys and the values for each field.
    # It should be converted to the format necessary for importing into OpenCorporates by using a
    # prepare_for_export method.
    prepare_for_export(raw_datum)
  end
end
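The comments above mention restricting the export to only the most recent results. Where that is awkward to express in SQL, the same idea can be sketched in plain Ruby. This assumes rows arrive as symbol-keyed hashes (as the comment describes) with hypothetical :uid and :date columns, matching the save_data([:uid,:date], ...) example in #update_data; latest_rows is a hypothetical helper, not part of OpencBot.

```ruby
require "date"

module MyModule
  extend self

  # Hypothetical helper: keeps only the most recent row for each :uid.
  # Rows are symbol-keyed hashes; :uid and :date are assumed column names.
  def latest_rows(rows)
    rows.group_by { |row| row[:uid] }
        .map { |_uid, versions| versions.max_by { |row| Date.parse(row[:date].to_s) } }
  end
end

rows = [
  { uid: "A1", date: "2013-01-01", name: "Old Name Ltd" },
  { uid: "A1", date: "2014-06-30", name: "New Name Ltd" },
  { uid: "B2", date: "2012-03-15", name: "Other Co" }
]
p MyModule.latest_rows(rows).map { |row| row[:name] }
# prints ["New Name Ltd", "Other Co"]
```

Inside #export_data this would wrap the query result, e.g. latest_rows(select(sql_query)).collect { |raw_datum| prepare_for_export(raw_datum) }. Filtering in SQL is usually preferable for large tables, since it avoids loading every historical row into memory.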
#prepare_for_export(raw_data) ⇒ Object
# File 'lib/openc_bot/templates/lib/bot.rb', line 25
def prepare_for_export(raw_data)
  # do something here to convert the raw data from the database (if you are using one) into
  # the form required by the export.
end
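A sketch of what a filled-in conversion might look like. Both the input columns (:uid, :name) and the output keys are hypothetical: the exact format required for import into OpenCorporates is not given here, so substitute the fields your table and target schema actually use.

```ruby
require "time"

module MyModule
  extend self

  # Converts one raw, symbol-keyed database row into an export hash.
  # All column and key names here are illustrative assumptions.
  def prepare_for_export(raw_data)
    {
      company_number: raw_data[:uid].to_s.strip,
      name: raw_data[:name].to_s.strip,
      # Records when the export was produced, as an ISO 8601 UTC timestamp.
      retrieved_at: Time.now.utc.iso8601
    }
  end
end

p MyModule.prepare_for_export(uid: " 123 ", name: "Acme Ltd ")
```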
#process_datum(datum_hash) ⇒ Object
This method must be defined for all bots that can fetch and process individual records, e.g. incremental searchers, or individual company pages in an alpha search. Where the bot cannot do this (e.g. where the underlying data is only available as a CSV file, or there are no individual pages for each company), it can be left as a stub method. It should return a hash that conforms to the company-schema schema; the hash is checked against this schema before it is saved.
# File 'lib/openc_bot/templates/lib/company_fetcher_bot.rb', line 68
def process_datum(datum_hash)
  # write your code to parse what is in the company pages/data
end
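A minimal sketch of a process_datum body, assuming the fetched record arrives as a hash of raw string fields from a hypothetical register. The raw keys ("CompanyName", "Number", "Registered"), the DD/MM/YYYY date format and the jurisdiction code "xx" are all assumptions; check the real company-schema for the fields it requires.

```ruby
require "date"

module MyModule
  extend self

  # Hypothetical jurisdiction code for the register being scraped.
  JURISDICTION_CODE = "xx".freeze

  # Maps one raw record onto a company-schema-style hash. Optional fields
  # that are missing in the source are dropped rather than sent as nil.
  def process_datum(datum_hash)
    registered = datum_hash["Registered"].to_s.strip
    record = {
      name: datum_hash["CompanyName"].to_s.strip,
      company_number: datum_hash["Number"].to_s.strip,
      jurisdiction_code: JURISDICTION_CODE,
      # Normalise the assumed DD/MM/YYYY source format to ISO 8601.
      incorporation_date: registered.empty? ? nil : Date.strptime(registered, "%d/%m/%Y").iso8601
    }
    record.compact
  end
end

p MyModule.process_datum("CompanyName" => "Acme Ltd ", "Number" => "0042", "Registered" => "05/03/2001")
```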
#search_for_entities_for_term(term, options = {}) ⇒ Object
This method is called by #fetch_data_via_alpha_search (defined in the AlphaSearch helper) and is passed a search term, typically a short string of a few characters (e.g. ‘AB’, ‘AC’…). It should yield company data hashes that can be validated against the company-schema.
# File 'lib/openc_bot/templates/lib/company_fetcher_bot.rb', line 91
def search_for_entities_for_term(term, options = {})
  # write your code to search all the pages for the given term, and yield a series of company hashes
end
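A sketch of the paging loop this method typically needs: fetch result pages for the term until one comes back empty, yielding one company hash per hit. The in-memory FAKE_REGISTER and the search_results_page helper are hypothetical stand-ins for the HTTP requests and HTML parsing a real bot would do.

```ruby
module MyModule
  extend self

  # Hypothetical stand-in for the remote register's search index.
  FAKE_REGISTER = [
    { name: "ABBEY HOLDINGS LTD", company_number: "001" },
    { name: "ABC TRADING LTD", company_number: "002" },
    { name: "ACORN FOODS LTD", company_number: "003" }
  ].freeze

  # Hypothetical helper: returns one page of results for the term, or an
  # empty array once the results are exhausted.
  def search_results_page(term, page, per_page: 1)
    matches = FAKE_REGISTER.select { |c| c[:name].start_with?(term) }
    matches.slice(page * per_page, per_page) || []
  end

  # Walks every results page for the term, yielding one company hash per hit.
  def search_for_entities_for_term(term, options = {})
    page = 0
    loop do
      results = search_results_page(term, page)
      break if results.empty?
      results.each { |company| yield company.merge(jurisdiction_code: "xx") }
      page += 1
    end
  end
end

# Two-character terms such as 'AA', 'AB', 'AC', as in the description above.
("AA".."AC").each do |term|
  MyModule.search_for_entities_for_term(term) { |company| puts "#{term}: #{company[:name]}" }
end
```

Stopping on the first empty page keeps the loop independent of how the register reports its total result count.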
#update_data ⇒ Object
# File 'lib/openc_bot/templates/lib/bot.rb', line 30
def update_data
  # write code here (using other methods if necessary) for
  # updating your local database with data from the source
  # that you are scraping or fetching from
  #
  #
  # See https://github.com/openc/openc_bot README for details
  # save_data([:uid,:date], my_data, sometablename)
  #
  # After updating the data you should run save_run_report, which
  # saves the status (and other data, if applicable)
  save_run_report(:status => 'success')
end
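A sketch of a filled-in update_data following the pattern in the comments: fetch records, stamp each with a date, save them keyed on [:uid, :date], then report success. So that it runs on its own, save_data and save_run_report are replaced by in-memory stubs; the stub assumes the first save_data argument lists the columns that together identify a row, and defaults the table to ocdata as described under #export_data. fetch_source_records is a hypothetical source.

```ruby
require "date"

module MyModule
  extend self

  # In-memory stand-ins for OpencBot's save_data and save_run_report.
  SAVED = {}
  REPORTS = []

  # Stub: rows with the same values for unique_keys overwrite each other.
  def save_data(unique_keys, rows, table_name = "ocdata")
    rows = [rows] if rows.is_a?(Hash)
    table = (SAVED[table_name] ||= {})
    rows.each { |row| table[row.values_at(*unique_keys)] = row }
  end

  def save_run_report(report)
    REPORTS << report
  end

  # Hypothetical source of raw records from the site being scraped.
  def fetch_source_records
    [{ uid: "001", name: "Acme Ltd" }, { uid: "002", name: "Beta plc" }]
  end

  def update_data
    today = Date.today.iso8601
    my_data = fetch_source_records.map { |record| record.merge(date: today) }
    save_data([:uid, :date], my_data)
    save_run_report(status: "success")
  end
end

MyModule.update_data
```

Because the rows are keyed on uid plus date, re-running the bot on the same day updates rows in place instead of duplicating them.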