Class: ElasticGraph::GraphQL::DatastoreQuery

Inherits:
Object
Defined in:
lib/elastic_graph/graphql/datastore_query.rb,
lib/elastic_graph/graphql/datastore_query/paginator.rb,
lib/elastic_graph/graphql/datastore_query/routing_picker.rb,
lib/elastic_graph/graphql/datastore_query/document_paginator.rb,
lib/elastic_graph/graphql/datastore_query/index_expression_builder.rb

Overview

An immutable class that represents a datastore query. Because it represents a datastore query rather than a GraphQL query, all of its data is modeled in datastore terms, not GraphQL terms. For example, any field names in a `Query` should be references to index fields, not GraphQL fields.

Filters are modeled as a `Set` of filtering hashes. While we usually expect only a single `filter` hash, modeling filters as a set makes it easy to support merging queries. The datastore knows how to apply multiple `must` clauses to the same field, which gives us exactly the semantics we want in that situation with minimal effort.
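The following is a minimal sketch of why a `Set` makes merging cheap. The filter hashes, the field name `amount`, and the simplified translation into `range` clauses are hypothetical illustrations, not ElasticGraph's actual filter format or query translation. It shows that a set union collapses identical filters, keeps distinct filters on the same field, and that each remaining filter can become its own `must` clause, which the datastore ANDs together.

```ruby
require "set"

# Hypothetical filter hashes, keyed by index field name.
filters_a = Set[{"amount" => {"gt" => 10}}]
filters_b = Set[{"amount" => {"gt" => 10}}, {"amount" => {"lt" => 100}}]

# Merging two queries unions their filter sets. Hashes compare by value, so the
# duplicated "gt" filter collapses into one entry while the "lt" filter is kept.
merged = filters_a | filters_b

# Simplified translation (hypothetical): each filter hash becomes its own
# `must` clause; all `must` clauses have to match, even on the same field.
must_clauses = merged.map do |filter|
  field, predicate = filter.first
  {range: {field => predicate}}
end

query_body = {bool: {must: must_clauses}}
```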

Defined Under Namespace

Classes: Builder, IndexExpression, Paginator

Class Method Details

.perform(queries) ⇒ Object

Performs a list of queries by building a hash of datastore msearch header/body tuples (keyed by query), yielding them to the caller, and then post-processing the results. The caller is responsible for returning a hash of responses by query from its block.

Note that some of the passed queries may not be yielded to the caller: when we can tell that a query does not need to be sent to the datastore (see `#empty?`), we skip yielding it and use a default empty response for it instead. Therefore, the caller should not assume that all queries passed to this method will be yielded back.

The return value is a hash of `DatastoreResponse::SearchResponse` objects by query.

Note: this method uses `send` to work around Ruby visibility rules. We do not want `#decoded_cursor_factory` to be public, as we only need it here, but we cannot access it from a class method without using `send`.
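To illustrate the visibility rule the note refers to, here is a small sketch. `CursorQuery` and its `factories_for` class method are hypothetical names; only the private method name mirrors the one mentioned above. A class method runs with the class as `self`, so calling a private instance method on some other object with an explicit receiver raises `NoMethodError`, while `send` skips the visibility check.

```ruby
class CursorQuery
  # Class method that needs a private instance method of each query.
  def self.factories_for(queries)
    # `query.decoded_cursor_factory` would raise NoMethodError here because the
    # method is private; `send` bypasses the visibility check.
    queries.map { |query| query.send(:decoded_cursor_factory) }
  end

  def initialize(id)
    @id = id
  end

  private

  # Hypothetical stand-in for the real private method.
  def decoded_cursor_factory
    "factory-#{@id}"
  end
end
```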



# File 'lib/elastic_graph/graphql/datastore_query.rb', line 93

def self.perform(queries)
  empty_queries, present_queries = queries.partition(&:empty?)

  responses_by_query = Aggregation::QueryOptimizer.optimize_queries(present_queries) do |optimized_queries|
    header_body_tuples_by_query = optimized_queries.each_with_object({}) do |query, hash|
      hash[query] = query.to_datastore_msearch_header_and_body
    end

    yield(header_body_tuples_by_query)
  end

  empty_responses = empty_queries.each_with_object({}) do |query, hash|
    hash[query] = DatastoreResponse::SearchResponse::RAW_EMPTY
  end

  empty_responses.merge(responses_by_query).each_with_object({}) do |(query, response), hash|
    hash[query] = DatastoreResponse::SearchResponse.build(response, decoded_cursor_factory: query.send(:decoded_cursor_factory))
  end.tap do |responses_hash|
    # Callers expect this `perform` method to provide an invariant: the returned hash MUST contain one entry
    # for each of the `queries` passed in the args. In practice, violating this invariant primarily causes a
    # problem when the caller uses the `GraphQL::Dataloader` (which happens for every GraphQL request in production...).
    # However, our tests do not always run queries end-to-end, so this is an added check we want to do, so that
    # anytime our logic here fails to include a query in the response in any test, we'll be notified of the
    # problem.
    expected_queries = queries.to_set
    actual_queries = responses_hash.keys.to_set

    if expected_queries != actual_queries
      missing_queries = expected_queries - actual_queries
      extra_queries = actual_queries - expected_queries

      raise SearchFailedError, "The `responses_hash` does not have the expected set of queries as keys. " \
        "This can cause problems for the `GraphQL::Dataloader` and suggests a bug in the logic that should be fixed.\n\n" \
        "Missing queries (#{missing_queries.size}):\n#{missing_queries.map(&:inspect).join("\n")}.\n\n" \
        "Extra queries (#{extra_queries.size}): #{extra_queries.map(&:inspect).join("\n")}"
    end
  end
end
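The caller contract of `.perform` can be sketched with plain Ruby objects. This is a simplified model under stated assumptions: `SketchQuery`, `perform_sketch`, and `RAW_EMPTY_RESPONSE` are hypothetical stand-ins, and the sketch omits the aggregation query optimizer and the `SearchResponse.build` post-processing. It shows the three behaviors described above: empty queries are never yielded, the caller returns raw responses keyed by the yielded queries, and the result must contain exactly one entry per query passed in.

```ruby
require "set"

# Hypothetical, simplified stand-in for a DatastoreQuery.
SketchQuery = Struct.new(:name, :skip, keyword_init: true) do
  def empty?
    skip
  end

  def to_msearch_header_and_body
    [{index: "#{name}_index"}, {size: 10}]
  end
end

# Hypothetical stand-in for DatastoreResponse::SearchResponse::RAW_EMPTY.
RAW_EMPTY_RESPONSE = {"hits" => {"hits" => []}}.freeze

def perform_sketch(queries)
  empty_queries, present_queries = queries.partition(&:empty?)

  # Only queries that need the datastore are yielded; the caller must return
  # raw responses keyed by those same query objects.
  tuples_by_query = present_queries.to_h { |query| [query, query.to_msearch_header_and_body] }
  responses_by_query = yield(tuples_by_query)

  results = empty_queries.to_h { |query| [query, RAW_EMPTY_RESPONSE] }.merge(responses_by_query)

  # The invariant: exactly one entry per query that was passed in.
  raise "responses must cover exactly the queries passed in" unless results.keys.to_set == queries.to_set

  results
end

widgets = SketchQuery.new(name: "widgets", skip: false)
nothing = SketchQuery.new(name: "nothing", skip: true)

responses = perform_sketch([widgets, nothing]) do |tuples_by_query|
  # A real caller would send these tuples in a single msearch request.
  tuples_by_query.transform_values { |_header_and_body| {"hits" => {"hits" => [{"_id" => "w1"}]}} }
end
```

In this sketch `nothing` is never yielded and receives `RAW_EMPTY_RESPONSE`, mirroring how the real method fills in `DatastoreResponse::SearchResponse::RAW_EMPTY` for empty queries.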

Instance Method Details

#cluster_name ⇒ Object

Returns the name (a `String`) of the datastore cluster this query should be sent to. Raises a `ConfigError` unless exactly one cluster name is found.

Raises:

  • (ConfigError)


# File 'lib/elastic_graph/graphql/datastore_query.rb', line 181

def cluster_name
  cluster_name = search_index_definitions.map(&:cluster_to_query).uniq
  return cluster_name.first if cluster_name.size == 1
  raise ConfigError, "Found different datastore clusters (#{cluster_name}) to query " \
    "for query targeting indices: #{search_index_definitions}"
end
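The "exactly one cluster" rule reduces to a `uniq` followed by a size check. In the sketch below, `IndexDef` and `pick_cluster_name` are hypothetical names and the error message is illustrative. One consequence worth noting: an empty list of index definitions also raises, because zero cluster names is not exactly one.

```ruby
class ConfigError < StandardError; end

# Hypothetical stand-in for an index definition: only the cluster matters here.
IndexDef = Struct.new(:name, :cluster_to_query)

# Returns the single cluster shared by all index definitions, or raises.
def pick_cluster_name(index_defs)
  names = index_defs.map(&:cluster_to_query).uniq
  return names.first if names.size == 1

  raise ConfigError, "Expected exactly one datastore cluster, found #{names.inspect} " \
    "for indices: #{index_defs.map(&:name).inspect}"
end
```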

#document_paginator ⇒ Object



# File 'lib/elastic_graph/graphql/datastore_query.rb', line 252

def document_paginator
  @document_paginator ||= DocumentPaginator.new(
    sort_clauses: sort_with_tiebreaker,
    individual_docs_needed: individual_docs_needed,
    total_document_count_needed: total_document_count_needed,
    decoded_cursor_factory: decoded_cursor_factory,
    schema_element_names: schema_element_names,
    paginator: Paginator.new(
      default_page_size: default_page_size,
      max_page_size: max_page_size,
      first: document_pagination[:first],
      after: document_pagination[:after],
      last: document_pagination[:last],
      before: document_pagination[:before],
      schema_element_names: schema_element_names
    )
  )
end

#empty? ⇒ Boolean

Indicates if the query does not need any results from the datastore. As an optimization, we can reply with a default “empty” response for an empty query.

Returns:

  • (Boolean)


# File 'lib/elastic_graph/graphql/datastore_query.rb', line 224

def empty?
  # If we are searching no indices or routing to an empty set of shards, there is no need to query the datastore at all.
  # This only happens when our filter processing has deduced that the query will match no results.
  return true if search_index_expression.empty? || shard_routing_values&.empty?

  datastore_body = to_datastore_body
  datastore_body.fetch(:size) == 0 && !datastore_body.fetch(:track_total_hits) && aggregations_datastore_body.empty?
end
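The emptiness check can be restated over plain data, which makes its two stages easier to see. The function name `query_empty?` and its keyword arguments are hypothetical; `body` stands in for the result of `to_datastore_body` and `aggregations` for `aggregations_datastore_body`.

```ruby
# Sketch of the emptiness check over plain inputs (hypothetical signature).
def query_empty?(index_expression:, routing_values:, body:, aggregations:)
  # No indices to search, or routing to zero shards: the filters already
  # guarantee that nothing can match.
  return true if index_expression.empty? || routing_values&.empty?

  # Otherwise the query is empty only if it asks for no documents, no total
  # hit count, and no aggregations.
  body.fetch(:size) == 0 && !body.fetch(:track_total_hits) && aggregations.empty?
end
```

Note that `routing_values` of `nil` (search all shards) does not make a query empty, while `[]` (search no shards) does.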

#hash ⇒ Object

`DatastoreQuery` objects are used as hash keys. Computing `#hash` can be expensive (given how many fields a `DatastoreQuery` has), and it is safe to cache because `DatastoreQuery` instances are immutable, so we memoize it here. We've observed this making a very noticeable difference in our test suite runtime.



# File 'lib/elastic_graph/graphql/datastore_query.rb', line 248

def hash
  @hash ||= super
end
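The memoization pattern is shown below on a hypothetical value class, `ImmutableQuery`, that only exposes readers. The `hash_computations` counter exists purely to demonstrate that the hash is computed once. Caching is only safe because nothing that feeds into `#hash` can change after construction; note also that a frozen object could not assign `@hash` lazily, so this pattern assumes the instance itself is not frozen.

```ruby
# Hypothetical immutable value type used as a Hash key.
class ImmutableQuery
  attr_reader :filters, :sort

  def initialize(filters:, sort:)
    @filters = filters.freeze
    @sort = sort.freeze
  end

  # Value equality: required for two equal queries to find the same Hash entry.
  def ==(other)
    other.is_a?(ImmutableQuery) && filters == other.filters && sort == other.sort
  end
  alias_method :eql?, :==

  # Computed on first use, then cached for every later Hash lookup.
  def hash
    @hash ||= compute_hash
  end

  # Demonstration-only counter of how many times the hash was computed.
  def hash_computations
    @hash_computations || 0
  end

  private

  def compute_hash
    @hash_computations = hash_computations + 1
    [self.class, filters, sort].hash
  end
end
```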

#inspect ⇒ Object



# File 'lib/elastic_graph/graphql/datastore_query.rb', line 233

def inspect
  description = to_datastore_msearch_header.merge(to_datastore_body).map do |key, value|
    "#{key}=#{(key == :query) ? "<REDACTED>" : value.inspect}"
  end.join(" ")

  "#<#{self.class.name} #{description}>"
end

#merge(other_query) ⇒ Object

Merges the provided query, returning a new combined query object. Both query objects are left unchanged.



# File 'lib/elastic_graph/graphql/datastore_query.rb', line 134

def merge(other_query)
  if search_index_definitions != other_query.search_index_definitions
    raise ElasticGraph::InvalidMergeError, "`search_index_definitions` conflict while merging between " \
      "#{search_index_definitions} and #{other_query.search_index_definitions}"
  end

  with(
    individual_docs_needed: individual_docs_needed || other_query.individual_docs_needed,
    total_document_count_needed: total_document_count_needed || other_query.total_document_count_needed,
    filters: filters + other_query.filters,
    sort: merge_attribute(other_query, :sort),
    requested_fields: requested_fields + other_query.requested_fields,
    document_pagination: merge_attribute(other_query, :document_pagination),
    monotonic_clock_deadline: [monotonic_clock_deadline, other_query.monotonic_clock_deadline].compact.min,
    aggregations: aggregations.merge(other_query.aggregations)
  )
end
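The merge rules above follow a few simple patterns, sketched here on a trimmed-down, hypothetical `MergeableQuery` struct. The sketch omits the `search_index_definitions` conflict check and the `merge_attribute` handling of `sort` and `document_pagination`, whose rules are not shown here. It demonstrates: booleans are ORed, collections are unioned or merged, and the earliest non-`nil` deadline wins.

```ruby
require "set"

# Hypothetical query value holding a few of the fields `#merge` combines.
MergeableQuery = Struct.new(:individual_docs_needed, :filters, :requested_fields,
  :monotonic_clock_deadline, :aggregations, keyword_init: true) do
  def merge(other)
    self.class.new(
      # The merged query needs documents if either side does.
      individual_docs_needed: individual_docs_needed || other.individual_docs_needed,
      # Set union for filters and requested fields.
      filters: filters + other.filters,
      requested_fields: requested_fields + other.requested_fields,
      # Keep the earliest deadline, ignoring a missing (nil) one.
      monotonic_clock_deadline: [monotonic_clock_deadline, other.monotonic_clock_deadline].compact.min,
      # Hash merge: on a key conflict, the other query's entry wins.
      aggregations: aggregations.merge(other.aggregations)
    )
  end
end
```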

#merge_with(**query_options) ⇒ Object

Convenience method for merging when you do not have access to a `DatastoreQuery::Builder`. Allows you to pass the query options you would like to merge. As with `#merge`, leaves the original query unchanged and returns a combined query object.



# File 'lib/elastic_graph/graphql/datastore_query.rb', line 156

def merge_with(**query_options)
  merge(with(**query_options))
end

#route_with_field_paths ⇒ Object

Returns a list of unique field paths that should be used for shard routing during searches.

If a search is filtering on one of these fields, we can optimize the search by routing it to only the shards containing documents for that routing value.

Note that this returns a list because of our support for type unions. A union type can be composed of subtypes that use different shard routing; this method returns the set union of their routing field paths.



# File 'lib/elastic_graph/graphql/datastore_query.rb', line 196

def route_with_field_paths
  search_index_definitions.map(&:route_with).uniq
end
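For a union type, the result is simply the distinct routing fields across the index definitions being searched. The `RoutedIndexDef` struct and the index and field names below are hypothetical examples.

```ruby
# Hypothetical index definitions for a union type whose subtypes live in
# different indices with different routing fields.
RoutedIndexDef = Struct.new(:name, :route_with)

definitions = [
  RoutedIndexDef.new("widgets", "workspace_id"),
  RoutedIndexDef.new("components", "workspace_id"),
  RoutedIndexDef.new("parts", "id")
]

# Same shape as `#route_with_field_paths`: distinct routing fields, in first-seen order.
route_with_field_paths = definitions.map(&:route_with).uniq
```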

#search_index_expression ⇒ Object

Returns an index_definition expression string to use for searches. This string can specify multiple indices, use wildcards, etc. For info about what is supported, see: www.elastic.co/guide/en/elasticsearch/reference/current/multi-index.html



# File 'lib/elastic_graph/graphql/datastore_query.rb', line 169

def search_index_expression
  @search_index_expression ||= index_expression_builder.determine_search_index_expression(
    filters,
    search_index_definitions,
    # When we have aggregations, we must require indices to search. When we search no indices, the datastore does not return
    # the standard aggregations response structure, which causes problems.
    require_indices: !aggregations_datastore_body.empty?
  ).to_s
end
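The datastore's multi-target syntax accepts a comma-separated list of index names and wildcard patterns, and a leading `-` excludes indices matched by an earlier pattern. The hypothetical `build_index_expression` below is a rough sketch of producing such a string. It is not ElasticGraph's `IndexExpression` implementation, and it does not model the `require_indices` option. Returning `""` when there is nothing to search matches how `#empty?` treats an empty `search_index_expression`.

```ruby
# Hypothetical builder: positive names/patterns first, then "-" exclusions,
# since an exclusion only removes indices matched by an earlier pattern.
def build_index_expression(patterns:, excluded: [])
  return "" if patterns.empty? # nothing to search

  (patterns + excluded.map { |name| "-#{name}" }).join(",")
end

expression = build_index_expression(
  patterns: ["widgets_rollover__*"],
  excluded: ["widgets_rollover__2019-*"]
)
```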

#shard_routing_values ⇒ Object

The shard routing values used for this search. Can be `nil` if the query will hit all shards; `[]` means that we are routing to no shards.



# File 'lib/elastic_graph/graphql/datastore_query.rb', line 202

def shard_routing_values
  return @shard_routing_values if defined?(@shard_routing_values)
  routing_values = routing_picker.extract_eligible_routing_values(filters, route_with_field_paths)

  @shard_routing_values ||=
    if routing_values&.empty? && !aggregations_datastore_body.empty?
      # If we return an empty array of routing values, no shards will get searched, which causes a problem for aggregations.
      # When a query includes aggregations, there are normally aggregation structures on the response (even when there are no
      # search hits to aggregate over!) but if there are no routing values, those aggregation structures will be missing from
      # the response. It's complex to handle that in our downstream response handling code, so we prefer to force a "fallback"
      # routing value here to ensure that at least one shard gets searched. Which shard gets searched doesn't matter; the search
      # filter that led to an empty set of routing values will match on documents on any shard.
      ["fallback_shard_routing_value"]
    elsif contains_ignored_values_for_routing?(routing_values)
      nil
    else
      routing_values&.sort # order doesn't matter, but sorting it makes it easier to assert on in our tests.
    end
end
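The three-way decision in `#shard_routing_values` can be isolated as a pure function. In this sketch `choose_routing` is a hypothetical name, `routing_values` plays the role of the extracted values (`nil` means "search all shards"), and the `ignored_value_present` flag stands in for `contains_ignored_values_for_routing?`, whose internals are not shown here.

```ruby
FALLBACK_ROUTING_VALUE = "fallback_shard_routing_value"

def choose_routing(routing_values, has_aggregations:, ignored_value_present:)
  if routing_values&.empty? && has_aggregations
    # Search at least one shard so the aggregation structures still come back.
    [FALLBACK_ROUTING_VALUE]
  elsif ignored_value_present
    nil # cannot route safely, so search every shard
  else
    routing_values&.sort # order is irrelevant; sorting keeps results deterministic
  end
end
```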

#to_datastore_msearch_header ⇒ Object



# File 'lib/elastic_graph/graphql/datastore_query.rb', line 241

def to_datastore_msearch_header
  @to_datastore_msearch_header ||= {index: search_index_expression, routing: shard_routing_values&.join(",")}.compact
end
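The header construction relies on `Hash#compact` to omit the `routing` key entirely when `shard_routing_values` is `nil`. Multiple routing values are joined into one comma-separated string. `msearch_header` is a hypothetical free-standing version of the method above. An empty array would produce `routing: ""`, but `#empty?` already short-circuits such queries so they are never sent.

```ruby
# Hypothetical free-standing version of the header construction.
def msearch_header(index_expression, routing_values)
  {index: index_expression, routing: routing_values&.join(",")}.compact
end

all_shards = msearch_header("widgets", nil)           # routing key omitted
two_shards = msearch_header("widgets", ["abc", "xyz"]) # routing: "abc,xyz"
```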

#to_datastore_msearch_header_and_body ⇒ Object

Pairs the multi-search headers and body into a tuple, as per the format required by the datastore: www.elastic.co/guide/en/elasticsearch/reference/current/search-multi-search.html#search-multi-search-api-desc



# File 'lib/elastic_graph/graphql/datastore_query.rb', line 162

def to_datastore_msearch_header_and_body
  @to_datastore_msearch_header_and_body ||= [to_datastore_msearch_header, to_datastore_body]
end
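The multi-search API takes newline-delimited JSON: for each search, a header line followed by a body line, with every line (including the last) terminated by a newline. The hypothetical helper below sketches how a list of `[header, body]` tuples, like those returned above, could be serialized into that format. The actual request is presumably built by the datastore client, so this is only an illustration of the wire format.

```ruby
require "json"

# Hypothetical serializer for [header, body] tuples into an msearch NDJSON body.
def msearch_ndjson(header_body_tuples)
  header_body_tuples
    .flat_map { |header, body| [JSON.generate(header), JSON.generate(body)] }
    .map { |line| "#{line}\n" } # every line, including the last, ends with "\n"
    .join
end

payload = msearch_ndjson([
  [{index: "widgets", routing: "abc"}, {size: 1}],
  [{index: "parts"}, {size: 0, track_total_hits: true}]
])
```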