Class: Ailurus::Dataset
- Inherits:
- Object
- Object
- Ailurus::Dataset
- Defined in:
- lib/ailurus/dataset.rb,
lib/ailurus/dataset/create.rb,
lib/ailurus/dataset/search.rb,
lib/ailurus/dataset/update.rb,
lib/ailurus/dataset/metadata.rb
Overview
Public: A class corresponding to a PANDA Dataset.
client - An Ailurus::Client instance (see `/lib/ailurus/client.rb`).
slug - The slug of a PANDA Dataset, as described at
http://panda.readthedocs.org/en/1.1.1/api.html#datasets
Instance Attribute Summary
- #client ⇒ Object
Returns the value of attribute client.
- #slug ⇒ Object
Returns the value of attribute slug.
Instance Method Summary
- #create(columns = [], additional_params = {}) ⇒ Object
Public: Create this Dataset on the server.
- #data_page(query = nil, page_num = 0, rows_per_page = 100, additional_params = {}) ⇒ Object
Internal: Retrieve a set of rows from the Dataset, specified by page number and page length.
- #data_rows(query = nil, offset = 0, limit = 100, additional_params = {}) ⇒ Object
Internal: Retrieve a set of rows from the Dataset, specified by offset and length.
- #get_indexed_name(field_name) ⇒ Object
Public: Get the indexed name for a field so you can perform more detailed searches if desired.
- #initialize(client, slug) ⇒ Dataset (constructor)
A new instance of Dataset.
- #metadata ⇒ Object
Public: Retrieve metadata about this Dataset.
- #search(query = nil, options = {}) ⇒ Object
Public: Search the Dataset with a given query.
- #update(rows = []) ⇒ Object
Public: Update the data in this Dataset.
Constructor Details
#initialize(client, slug) ⇒ Dataset
Returns a new instance of Dataset.
# File 'lib/ailurus/dataset.rb', line 15
def initialize(client, slug)
  @client = client
  @slug = slug
end
Instance Attribute Details
#client ⇒ Object
Returns the value of attribute client.
# File 'lib/ailurus/dataset.rb', line 13
def client
  @client
end
#slug ⇒ Object
Returns the value of attribute slug.
# File 'lib/ailurus/dataset.rb', line 13
def slug
  @slug
end
Instance Method Details
#create(columns = [], additional_params = {}) ⇒ Object
Public: Create this Dataset on the server.
This instance's @slug will also be used as its `name`, as that's defined in the API.
columns - An Array of Hashes, one per column, about columns expected to
be in the Dataset. Each Hash SHOULD contain at least a :name,
but it also MAY contain a :type (e.g., "int", "unicode",
"bool"--default is "unicode") and/or :index (true or false,
depending on whether you want the column to be indexed--default
is false) (default: none).
additional_params - A Hash of other properties to set on the Dataset,
such as description and title (default: none).
Returns a metadata object, such as the one returned by Dataset#metadata.
# File 'lib/ailurus/dataset/create.rb', line 19
def create(columns = [], additional_params = {})
  # Start with the bare minimum.
  payload = {
    "name" => @slug,
    "slug" => @slug
  }

  # Add the columns. This requires the addition of up to three separate
  # parameters, each comma-delimited and in a consistent order.
  column_info = {}
  if not columns.empty?
    column_info["columns"] = columns.each_with_index.map do |column, index|
      column.fetch(:name, "column_#{index}")
    end.join(",")
    column_info["column_types"] = columns.map do |column|
      column.fetch(:type, "unicode")
    end.join(",")
    column_info["typed_columns"] = columns.map do |column|
      # FIXME: Probably should check whether non-false values _actually_
      # are true.
      column.fetch(:index, false).to_s
    end.join(",")
  end

  # Add other properties as specified.
  payload.merge!(additional_params)

  # Let's do this thing!
  @client.make_request(
    "/api/1.0/dataset/", :method => :post, :query => column_info,
    :body => payload)
end
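The column handling above can be tried without a server. The sketch below reproduces only the flattening of the columns Array into three comma-delimited query parameters. The helper name `column_query_params` and the sample columns are hypothetical and are not part of Ailurus.

```ruby
# Hypothetical helper (not part of Ailurus): reproduces the column flattening
# performed inside Dataset#create, without contacting a server.
def column_query_params(columns)
  return {} if columns.empty?

  {
    # Columns without a :name fall back to "column_<index>".
    "columns" => columns.each_with_index.map { |column, index|
      column.fetch(:name, "column_#{index}")
    }.join(","),
    # Columns without a :type default to "unicode".
    "column_types" => columns.map { |column|
      column.fetch(:type, "unicode")
    }.join(","),
    # Columns without an :index flag default to false.
    "typed_columns" => columns.map { |column|
      column.fetch(:index, false).to_s
    }.join(",")
  }
end

params = column_query_params([
  { :name => "city" },
  { :name => "population", :type => "int", :index => true }
])
puts params.inspect
```

All three parameters keep the columns in the same order, which is what lets the server match each name to its type and index flag.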
#data_page(query = nil, page_num = 0, rows_per_page = 100, additional_params = {}) ⇒ Object
Internal: Retrieve a set of rows from the Dataset, specified by page number and page length.
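query - A query string to use when searching the data.
page_num - The 0-indexed page number of data to retrieve.
rows_per_page - The number of rows to include on each page.
additional_params - A Hash of extra query parameters, passed through to Dataset#data_rows (default: none).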
Returns an Array of Arrays.
# File 'lib/ailurus/dataset/search.rb', line 47
def data_page(
    query = nil, page_num = 0, rows_per_page = 100, additional_params = {})
  self.data_rows(
    query = query,
    offset = page_num * rows_per_page,
    limit = rows_per_page,
    additional_params = additional_params)
end
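The only logic Dataset#data_page adds on top of Dataset#data_rows is the conversion from a page number to a row offset. A minimal sketch of that arithmetic follows. The helper name `page_to_offset` is hypothetical and is not part of Ailurus.

```ruby
# Hypothetical helper (not part of Ailurus): the page-to-offset arithmetic
# that Dataset#data_page applies before delegating to Dataset#data_rows.
def page_to_offset(page_num, rows_per_page = 100)
  page_num * rows_per_page
end

# Pages are 0-indexed, so page 0 starts at the first row.
[0, 1, 2].each do |page_num|
  offset = page_to_offset(page_num)
  puts "page #{page_num}: offset=#{offset}, limit=100"
end
```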
#data_rows(query = nil, offset = 0, limit = 100, additional_params = {}) ⇒ Object
Internal: Retrieve a set of rows from the Dataset, specified by offset and length.
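query - A query string to use when searching the data. Required for now:
passing nil raises NotImplementedError.
offset - The number of rows to exclude from the beginning of the results
before returning what follows; for example, to get the last
third of a 30-row set, you would need an offset of 20.
limit - The maximum number of rows to return, after honoring the
offset; for example, to get the last third of a 30-row set, you
would need a limit of 10.
additional_params - A Hash of extra query parameters merged into the
request (default: none).
Returns an Array of Arrays.
Raises RangeError if no rows are available at the given offset.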
# File 'lib/ailurus/dataset/search.rb', line 15
def data_rows(query = nil, offset = 0, limit = 100, additional_params = {})
  endpoint = "/api/1.0/dataset/#{slug}/data/"
  params = {
    "offset" => offset,
    "limit" => limit
  }
  if query.nil?
    raise NotImplementedError, (
      "API returns unexpected results without a query present, so query is required for now.")
  else
    params["q"] = query
  end
  params.merge!(additional_params)

  res = @client.make_request(endpoint, :query => params)

  if res.objects.empty? && res.meta.next.nil?
    raise RangeError, "No data available for offset #{offset}"
  end

  res.objects.map { |row| row.data }
end
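To make the offset and limit semantics concrete, the sketch below applies them to a local Array instead of a PANDA server. The helper name `slice_rows` and the sample rows are hypothetical. The RangeError only approximates the behavior of Dataset#data_rows, which also checks that the server reports no further page.

```ruby
# Hypothetical helper (not part of Ailurus): applies offset and limit to a
# local Array the way the parameter descriptions above define them.
def slice_rows(rows, offset, limit)
  page = rows[offset, limit] || []
  # Approximates Dataset#data_rows, which raises RangeError when a request
  # comes back empty and there is no next page.
  raise RangeError, "No data available for offset #{offset}" if page.empty?
  page
end

# Local stand-in for a 30-row result set, one column per row.
all_rows = (1..30).map { |n| ["row #{n}"] }

# The last third of a 30-row set: skip 20 rows, then take up to 10.
last_third = slice_rows(all_rows, 20, 10)
puts last_third.length
puts last_third.first.inspect
```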
#get_indexed_name(field_name) ⇒ Object
Public: Get the indexed name for a field so you can perform more detailed searches if desired.
field_name - A String matching the name of a column in the Dataset.
Returns a String or nil, depending on whether the field is indexed.
# File 'lib/ailurus/dataset/metadata.rb', line 25
def get_indexed_name(field_name)
  column_schema = self.metadata.column_schema
  indexed_names_by_column_name = Hash[column_schema.map do |schema_entry|
    [schema_entry.name, schema_entry.indexed_name]
  end]
  indexed_names_by_column_name[field_name]
end
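The lookup can be illustrated with a stand-in for the `column_schema` of a metadata response. In this sketch the schema entries are OpenStruct objects, and the column names and the indexed name are invented for illustration. Real values come from the server.

```ruby
require "ostruct"

# Stand-in for the column_schema of a metadata response. The entries and
# the indexed name below are invented for illustration only.
column_schema = [
  OpenStruct.new(:name => "city", :indexed_name => nil),
  OpenStruct.new(:name => "population",
                 :indexed_name => "column_int_population")
]

# Same mapping that Dataset#get_indexed_name builds:
# column name => indexed name (nil when the column is not indexed).
indexed_names_by_column_name = column_schema.map { |schema_entry|
  [schema_entry.name, schema_entry.indexed_name]
}.to_h

puts indexed_names_by_column_name["population"].inspect
puts indexed_names_by_column_name["city"].inspect
```

A name that matches no column also yields nil, so a nil result alone does not distinguish an unindexed column from a misspelled one.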
#metadata ⇒ Object
Public: Retrieve metadata about this Dataset.
TODO: Figure out a good way to cache this so we don’t keep hitting it.
Returns a Hash, or nil if the server's response cannot be parsed as JSON.
# File 'lib/ailurus/dataset/metadata.rb', line 10
def metadata
  endpoint = "/api/1.0/dataset/#{@slug}/"
  begin
    @client.make_request(endpoint)
  rescue JSON::JSONError
    nil
  end
end
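Because a JSON error is turned into nil, callers may want to check the result before using it. The sketch below shows the same rescue pattern in isolation. The `broken_request` lambda stands in for a client call that receives a non-JSON body. It and `fetch_metadata` are hypothetical names and are not part of Ailurus.

```ruby
require "json"

# Hypothetical stand-in for a client request that receives a non-JSON body
# (for example, an HTML error page) and fails to parse it.
broken_request = lambda do |endpoint|
  JSON.parse("<html>Server Error</html>")
end

# Same rescue pattern as Dataset#metadata: a JSON error becomes nil.
def fetch_metadata(request, slug)
  request.call("/api/1.0/dataset/#{slug}/")
rescue JSON::JSONError
  nil
end

metadata = fetch_metadata(broken_request, "my-dataset")
if metadata.nil?
  puts "Metadata unavailable; the response was not valid JSON."
else
  puts metadata.inspect
end
```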
#search(query = nil, options = {}) ⇒ Object
Public: Search the Dataset with a given query.
Queries currently are required due to some observed problems with the PANDA API. See Dataset#data_rows.
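query - A query string to use when searching the data.
options - A Hash of optional settings (default: none):
:max_results - The maximum number of rows to return (default: no limit).
:options - A Hash of extra query parameters sent with each page request (default: none).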
Returns an Array of Arrays.
# File 'lib/ailurus/dataset/search.rb', line 64
def search(query = nil, options = {})
  # Handle optional arguments.
  max_results = options.fetch(:max_results, nil)
  additional_params = options.fetch(:options, {})

  rows = []
  page_num = 0
  while true  # Warning: Infinite loop! Remember to break.
    # Get the current page of results. If there aren't any results on that
    # page, we're done.
    begin
      rows.concat(self.data_page(
        query = query,
        page_num = page_num,
        rows_per_page = 100,
        additional_params = additional_params))
    rescue RangeError
      break  # Escape the infinite loop!
    end

    # If we have at least as many results as we're supposed to return,
    # we're done! (Truncating as necessary, of course.)
    if !max_results.nil? && rows.length >= max_results
      rows.slice!(max_results, rows.length - max_results)
      break  # Escape the infinite loop!
    end

    # Move on to the next page.
    page_num += 1
  end

  rows
end
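The paging loop above can be exercised against local data. In this sketch, `fake_data_page` stands in for Dataset#data_page and `collect_rows` mirrors the accumulate-and-truncate logic. Both names are hypothetical and are not part of Ailurus, and the query argument is omitted for brevity.

```ruby
# Hypothetical stand-in for Dataset#data_page: serves pages from a local
# Array and raises RangeError once the pages run out.
def fake_data_page(all_rows, page_num, rows_per_page)
  page = all_rows[page_num * rows_per_page, rows_per_page] || []
  raise RangeError, "No data available" if page.empty?
  page
end

# Sketch of the loop in Dataset#search: fetch pages until they run out or
# until max_results rows have been collected, then truncate to the cap.
def collect_rows(all_rows, max_results = nil, rows_per_page = 100)
  rows = []
  page_num = 0
  loop do
    begin
      rows.concat(fake_data_page(all_rows, page_num, rows_per_page))
    rescue RangeError
      break # No more pages.
    end
    if !max_results.nil? && rows.length >= max_results
      rows = rows.first(max_results) # Drop anything past the cap.
      break
    end
    page_num += 1
  end
  rows
end

all_rows = (1..250).map { |n| [n.to_s] }
puts collect_rows(all_rows).length
puts collect_rows(all_rows, 120).length
```

Without a cap, the loop makes one extra request past the last page and relies on the RangeError to stop.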
#update(rows = []) ⇒ Object
Public: Update the data in this Dataset.
rows - An Array of data Hashes containing the following properties:
"data" - An Array of Strings in the same order as this
Dataset's columns. If you don't know what order
your columns are in, call Dataset#metadata and
check the result's `column_schema` attribute.
"external_id" - An optional String identifying this row of data.
Providing an external ID will allow future calls
to Dataset#update to update this row with new
information (assuming the same ID is used for one
of its rows) rather than create a new row
altogether. See http://bit.ly/1zeeax1 for more
information.
Returns an OpenStruct describing the rows that were created and/or updated.
# File 'lib/ailurus/dataset/update.rb', line 20
def update(rows = [])
  @client.make_request(
    "/api/1.0/dataset/#{@slug}/data/",
    :method => :put,
    :body => {
      "objects" => rows
    })
end