Class: ElasticGraph::GraphQL::Aggregation::QueryOptimizer

Inherits:
Object
Defined in:
lib/elastic_graph/graphql/aggregation/query_optimizer.rb

Overview

This class is used by `DatastoreQuery.perform` to optimize away an inefficiency that’s present in our aggregations API. To explain what this does, it’s useful to see an example:

```
query WidgetsBySizeAndColor($filter: WidgetFilterInput) {
  by_size: widgetAggregations(filter: $filter) {
    edges { node {
      size
      count
    } }
  }

  by_color: widgetAggregations(filter: $filter) {
    edges { node {
      color
      count
    } }
  }
}
```

With this API, two separate datastore queries get built: one for `by_size`, and one for `by_color`. While we’re able to send them to the datastore in a single `msearch` request, that is not ideal, because the datastore allows a single search to have multiple aggregations in it. The aggregations API we offered before April 2023 directly supported this, allowing for more efficient queries (but it had other significant downsides).

We found that sending two queries is significantly slower than sending one combined query (results from `benchmarks/aggregations_old_vs_new_api.rb`):

Benchmarks for old API (300 times):
  Average took value: 15
  Median took value: 14
  P99 took value: 45

Benchmarks for new API (300 times):
  Average took value: 28
  Median took value: 25
  P99 took value: 75

This class optimizes this case by merging `DatastoreQuery` objects together when we can safely do so, in order to execute fewer datastore queries. Notably, while this was designed for this specific aggregations case, the merging logic can also apply in non-aggregation cases.

Note that we want to err on the side of safety here. We only merge queries if their datastore payloads are byte-for-byte identical when aggregations are excluded. There are some cases where we could merge slightly differing queries in clever ways (for example, if the only difference is `track_total_hits: false` vs `track_total_hits: true`, we could merge to a single query with `track_total_hits: true`), but that’s significantly more complex and error-prone, so we do not do it. We can always improve this further in the future to cover more cases.
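The merge rule described above can be sketched roughly as follows. This is only an illustration: plain hashes stand in for datastore payloads, the `:aggs` key and the `mergeable_groups` helper are assumptions made for the example, and hash equality is used as an approximation of the byte-for-byte comparison.

```ruby
# Hypothetical stand-ins: each "query" is a plain Hash payload with its
# aggregations under :aggs (an assumption made for this sketch).
def mergeable_groups(payloads)
  # Queries are grouped together only when everything except the
  # aggregations is exactly equal.
  payloads.group_by { |payload| payload.reject { |key, _| key == :aggs } }.values
end

by_size  = {query: {term: {in_stock: true}},  size: 0, aggs: {by_size: {terms: {field: "size"}}}}
by_color = {query: {term: {in_stock: true}},  size: 0, aggs: {by_color: {terms: {field: "color"}}}}
other    = {query: {term: {in_stock: false}}, size: 0, aggs: {by_size: {terms: {field: "size"}}}}

# by_size and by_color differ only in their aggregations, so they share a
# group; other has a different filter and stays on its own.
groups = mergeable_groups([by_size, by_color, other])
```

Grouping on the payload minus its aggregations keeps the rule conservative: any other difference, however small, puts the queries in separate groups.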

NOTE: the `QueryOptimizer` assumes that `Aggregation::Query` will always produce aggregation keys using `Aggregation::Query#name` such that `Aggregation::Key.extract_aggregation_name_from` is able to extract the original name from response keys. If that is violated, it will not work properly and subtle bugs can result. However, we have a test helper method which is hooked into our unit and integration tests for `DatastoreQuery` (`verify_aggregations_satisfy_optimizer_requirements`) which verifies that this requirement is satisfied.

Constructor Details

#initialize(original_queries, logger:) ⇒ QueryOptimizer

Returns a new instance of QueryOptimizer.



# File 'lib/elastic_graph/graphql/aggregation/query_optimizer.rb', line 79

def initialize(original_queries, logger:)
  @original_queries = original_queries
  @logger = logger
  last_id = 0
  @unique_prefix_by_query = ::Hash.new { |h, k| h[k] = "#{last_id += 1}_" }
end
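The default block on `@unique_prefix_by_query` can be tried in isolation. In the sketch below, symbols stand in for query objects (an assumption made for the example); the hash itself is built the same way as in the constructor.

```ruby
last_id = 0
# The default block runs only the first time a key is looked up, so each
# distinct key gets the next number and keeps it afterwards.
unique_prefix_by_query = Hash.new { |hash, key| hash[key] = "#{last_id += 1}_" }

first_prefix  = unique_prefix_by_query[:by_size]   # first lookup assigns "1_"
second_prefix = unique_prefix_by_query[:by_color]  # first lookup assigns "2_"
repeat_prefix = unique_prefix_by_query[:by_size]   # already stored, still "1_"
```

Because the block stores the value under the key, repeated lookups for the same query are stable and do not advance the counter.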

Class Method Details

.optimize_queries(queries) ⇒ Object



# File 'lib/elastic_graph/graphql/aggregation/query_optimizer.rb', line 72

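def self.optimize_queries(queries)
  return {} if queries.empty?
  optimizer = new(queries, logger: (_ = queries.first).logger)
  # The caller's block executes the merged queries and returns responses keyed by merged query.
  responses_by_query = yield optimizer.merged_queries
  # Map the merged responses back onto the original queries.
  optimizer.unmerge_responses(responses_by_query)
end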
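The merge, yield, unmerge flow of `optimize_queries` can be illustrated with a toy stand-in. Nothing below is ElasticGraph code: `optimize_toy_queries`, the `:filter`/`:aggs` hash shape, and the canned "datastore" responses are all assumptions made for the sketch.

```ruby
# Toy stand-in for the flow in `optimize_queries`; not ElasticGraph code.
# A "query" is a Hash with a :filter and a list of :aggs (assumed shape).
def optimize_toy_queries(queries)
  return {} if queries.empty?

  # Merge: one combined query per distinct filter, carrying every agg.
  originals_by_merged = queries.group_by { |q| q[:filter] }.to_h do |filter, originals|
    [{filter: filter, aggs: originals.flat_map { |q| q[:aggs] }}, originals]
  end

  # The block executes the merged queries and returns responses keyed by merged query.
  responses_by_merged = yield originals_by_merged.keys

  # Unmerge: give each original query only the aggs it asked for.
  originals_by_merged.flat_map do |merged, originals|
    originals.map { |q| [q, responses_by_merged.fetch(merged).slice(*q[:aggs])] }
  end.to_h
end

by_size  = {filter: "in_stock", aggs: ["size"]}
by_color = {filter: "in_stock", aggs: ["color"]}

executed = nil
results = optimize_toy_queries([by_size, by_color]) do |merged_queries|
  executed = merged_queries
  # Pretend datastore: one canned result per requested aggregation.
  merged_queries.to_h { |q| [q, q[:aggs].to_h { |agg| [agg, "#{agg} buckets"] }] }
end
```

In this sketch the block sees a single merged query, while the caller still receives one response per original query.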

Instance Method Details

#merged_queries ⇒ Object



# File 'lib/elastic_graph/graphql/aggregation/query_optimizer.rb', line 86

def merged_queries
  original_queries_by_merged_query.keys
end

#unmerge_responses(responses_by_merged_query) ⇒ Object



# File 'lib/elastic_graph/graphql/aggregation/query_optimizer.rb', line 90

def unmerge_responses(responses_by_merged_query)
  original_queries_by_merged_query.flat_map do |merged, originals|
    # When we only had a single query to start with, we didn't change the query at all, and don't need to unmerge the response.
    needs_unmerging = originals.size > 1

    originals.filter_map do |orig|
      if (merged_response = responses_by_merged_query[merged])
        response = needs_unmerging ? unmerge_response(merged_response, orig) : merged_response
        [orig, response]
      end
    end
  end.to_h
end
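The control flow of `unmerge_responses` can be exercised with stand-in data. In the sketch below, symbols replace query objects, and `strip_prefix` is a hypothetical substitute for the private `unmerge_response` helper, which is not shown in this section; the prefix-based filtering it performs is an assumption based on the naming note in the overview.

```ruby
# Hypothetical substitute for `unmerge_response`: keeps only the entries
# carrying the given prefix and strips that prefix from their keys.
def strip_prefix(response, prefix)
  response.select { |key, _| key.start_with?(prefix) }
          .transform_keys { |key| key.delete_prefix(prefix) }
end

originals_by_merged = {
  merged_ab: [:a, :b], # two originals were merged into one query
  merged_c: [:c],      # a single original, left unchanged
  merged_d: [:d]       # no response came back for this one
}
prefixes = {a: "1_", b: "2_"}
responses_by_merged = {
  merged_ab: {"1_by_size" => 3, "2_by_color" => 5},
  merged_c: {"by_shape" => 7}
}

result = originals_by_merged.flat_map do |merged, originals|
  needs_unmerging = originals.size > 1

  # filter_map drops the nil produced when a merged query has no response.
  originals.filter_map do |orig|
    if (merged_response = responses_by_merged[merged])
      response = needs_unmerging ? strip_prefix(merged_response, prefixes.fetch(orig)) : merged_response
      [orig, response]
    end
  end
end.to_h
```

Two behaviors are worth noting here: a query that was never merged gets its response back untouched, and an original whose merged query has no response is simply absent from the result rather than mapped to `nil`.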