Class: ElasticGraph::GraphQL::Aggregation::QueryOptimizer
- Inherits: Object
- Defined in: lib/elastic_graph/graphql/aggregation/query_optimizer.rb
Overview
This class is used by `DatastoreQuery.perform` to optimize away an inefficiency that’s present in our aggregations API. To explain what this does, it’s useful to see an example:
```
query WidgetsBySizeAndColor($filter: WidgetFilterInput) {
  by_size: widgetAggregations(filter: $filter) {
    edges { node {
      size
      count
    } }
  }
  by_color: widgetAggregations(filter: $filter) {
    edges { node {
      color
      count
    } }
  }
}
```
With this API, two separate datastore queries get built: one for `by_size` and one for `by_color`. We’re able to send them to the datastore in a single `msearch` request, but that is still not ideal, because the datastore allows a single search to have multiple aggregations in it. The aggregations API we offered before April 2023 directly supported this, allowing for more efficient queries (but it had other significant downsides).
We found that sending 2 queries is significantly slower than sending one combined query (from benchmarks/aggregations_old_vs_new_api.rb):
Benchmarks for old API (300 times):
- Average took value: 15
- Median took value: 14
- P99 took value: 45

Benchmarks for new API (300 times):
- Average took value: 28
- Median took value: 25
- P99 took value: 75
This class optimizes this case by merging `DatastoreQuery` objects together when we can safely do so, in order to execute fewer datastore queries. Notably, while this was designed for this specific aggregations case, the merging logic can also apply in non-aggregation cases.
Note that we want to err on the side of safety here. We only merge queries if their datastore payloads are byte-for-byte identical when aggregations are excluded. There are some cases where we could merge slightly differing queries in clever ways (for example, if the only difference is `track_total_hits: false` vs `track_total_hits: true`, we could merge to a single query with `track_total_hits: true`), but that’s significantly more complex and error-prone, so we do not do it. We can always improve this further in the future to cover more cases.
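The merge rule described above can be sketched in a few lines. This is a minimal illustration, not the actual implementation: `ToyQuery`, `merge_toy_queries`, the `"aggs"` payload key, and the way numeric prefixes are attached to aggregation names are all assumptions made for the example. Only the rule itself (merge when the payloads match once aggregations are excluded) comes from the description above.

```ruby
require "json"

# Hypothetical stand-in for a datastore query: a name plus a search payload hash.
ToyQuery = Struct.new(:name, :payload)

# Groups queries whose payloads are identical once aggregations are excluded,
# and returns a hash of merged payload => original queries.
def merge_toy_queries(queries)
  groups = queries.group_by do |query|
    # Compare serialized payloads so that only exact matches are grouped together.
    JSON.generate(query.payload.reject { |key, _| key == "aggs" })
  end

  groups.values.to_h do |group|
    # A query with no merge partner is passed through unchanged.
    next [group.first.payload, group] if group.size == 1

    merged_aggs = {}
    group.each_with_index do |query, index|
      query.payload.fetch("aggs", {}).each do |agg_name, agg_body|
        # Prefix each aggregation key so it can be traced back to its original query.
        merged_aggs["#{index + 1}_#{agg_name}"] = agg_body
      end
    end

    base = group.first.payload.reject { |key, _| key == "aggs" }
    [base.merge("aggs" => merged_aggs), group]
  end
end

filter = {"term" => {"color" => "red"}}
size_aggs = {"by_size" => {"terms" => {"field" => "size"}}}
color_aggs = {"by_color" => {"terms" => {"field" => "color"}}}

by_size = ToyQuery.new("by_size", {"query" => filter, "size" => 0, "aggs" => size_aggs})
by_color = ToyQuery.new("by_color", {"query" => filter, "size" => 0, "aggs" => color_aggs})
unrelated = ToyQuery.new("unrelated", {"query" => {"match_all" => {}}, "size" => 0, "aggs" => size_aggs})

originals_by_merged_payload = merge_toy_queries([by_size, by_color, unrelated])
```

Here `by_size` and `by_color` collapse into one payload with two aggregations, while `unrelated` stays on its own because its non-aggregation payload differs. Comparing serialized payloads is deliberately strict: two payloads with the same keys in a different order would not be merged, which errs on the side of safety.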
NOTE: the `QueryOptimizer` assumes that `Aggregation::Query` will always produce aggregation keys using `Aggregation::Query#name` such that `Aggregation::Key.extract_aggregation_name_from` is able to extract the original name from response keys. If that is violated, it will not work properly and subtle bugs can result. However, we have a test helper method which is hooked into our unit and integration tests for `DatastoreQuery` (`verify_aggregations_satisfy_optimizer_requirements`) which verifies that this requirement is satisfied.
Class Method Summary
- .optimize_queries(queries) ⇒ Object
Instance Method Summary
- #initialize(original_queries, logger:) ⇒ QueryOptimizer (constructor)
  A new instance of QueryOptimizer.
- #merged_queries ⇒ Object
- #unmerge_responses(responses_by_merged_query) ⇒ Object
Constructor Details
#initialize(original_queries, logger:) ⇒ QueryOptimizer
Returns a new instance of QueryOptimizer.
# File 'lib/elastic_graph/graphql/aggregation/query_optimizer.rb', line 79
def initialize(original_queries, logger:)
  @original_queries = original_queries
  @logger = logger
  last_id = 0
  @unique_prefix_by_query = ::Hash.new { |h, k| h[k] = "#{last_id += 1}_" }
end
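The `@unique_prefix_by_query` hash relies on Ruby's `Hash.new` default block, which runs only on the first lookup of a missing key. The standalone snippet below shows that behavior in isolation; the symbol keys are placeholders for query objects and are not taken from the library.

```ruby
last_id = 0

# The block is called only when a key is missing; it stores and returns the new prefix.
unique_prefix_by_query = Hash.new { |hash, key| hash[key] = "#{last_id += 1}_" }

first_prefix = unique_prefix_by_query[:by_size]    # first lookup assigns "1_"
second_prefix = unique_prefix_by_query[:by_color]  # a new key gets the next id, "2_"
repeat_prefix = unique_prefix_by_query[:by_size]   # already stored, so still "1_"
```

Because the block stores its result with `hash[key] = ...`, each query keeps the same prefix for as long as the optimizer instance lives, and the counter only advances for queries that have not been seen before.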
Class Method Details
.optimize_queries(queries) ⇒ Object
# File 'lib/elastic_graph/graphql/aggregation/query_optimizer.rb', line 72
def self.optimize_queries(queries)
  return {} if queries.empty?
  optimizer = new(queries, logger: (_ = queries.first).logger)
  responses_by_query = yield optimizer.merged_queries
  optimizer.unmerge_responses(responses_by_query)
end
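The calling convention of `optimize_queries` is that the block receives the merged queries, executes them, and returns a hash of responses keyed by those same merged queries. The method then returns responses keyed by the original queries. The toy below mimics only that contract so it can run on its own. `ToyRequest` and `optimize_toy_requests` are hypothetical names, and the "merging" is plain de-duplication by request body rather than the real aggregation merge.

```ruby
# Hypothetical request type: two requests with the same body can share one execution.
ToyRequest = Struct.new(:name, :body)

def optimize_toy_requests(requests)
  return {} if requests.empty?

  originals_by_body = requests.group_by(&:body)

  # The block does the actual I/O and returns responses keyed by what it was given.
  responses_by_body = yield originals_by_body.keys

  # Hand each original request the response of the body it was folded into.
  requests.to_h { |request| [request, responses_by_body[request.body]] }
end

executed_bodies = []

responses_by_request = optimize_toy_requests([
  ToyRequest.new("first", "count red widgets"),
  ToyRequest.new("second", "count red widgets"),
  ToyRequest.new("third", "count blue widgets")
]) do |bodies|
  executed_bodies.concat(bodies)
  bodies.to_h { |body| [body, "response to #{body}"] }
end
```

Three requests go in, but the block only sees two bodies, and the caller still gets one response per original request. Keeping the I/O inside the block means the optimizer itself never needs to know how queries are sent.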
Instance Method Details
#merged_queries ⇒ Object
# File 'lib/elastic_graph/graphql/aggregation/query_optimizer.rb', line 86
def merged_queries
  original_queries_by_merged_query.keys
end
#unmerge_responses(responses_by_merged_query) ⇒ Object
# File 'lib/elastic_graph/graphql/aggregation/query_optimizer.rb', line 90
def unmerge_responses(responses_by_merged_query)
  original_queries_by_merged_query.flat_map do |merged, originals|
    # When we only had a single query to start with, we didn't change the query at all, and don't need to unmerge the response.
    needs_unmerging = originals.size > 1

    originals.filter_map do |orig|
      if (merged_response = responses_by_merged_query[merged])
        response = needs_unmerging ? unmerge_response(merged_response, orig) : merged_response
        [orig, response]
      end
    end
  end.to_h
end
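The private `unmerge_response` helper is not shown on this page, so the sketch below is only a guess at its general shape. It assumes merged aggregation keys carry the per-query prefix (`"1_"`, `"2_"`, ...) at the front and that responses keep aggregations under an `"aggregations"` key. `unmerge_toy_response` and the sample data are hypothetical. The surrounding `flat_map` / `filter_map` / `to_h` structure mirrors the method above, including the way an original query is dropped from the result when its merged query has no response.

```ruby
# Hypothetical: extracts one original query's aggregations from a merged response
# by selecting the keys that start with its prefix and stripping that prefix.
def unmerge_toy_response(merged_response, prefix)
  own_aggs = merged_response.fetch("aggregations", {})
    .select { |key, _| key.start_with?(prefix) }
    .transform_keys { |key| key.delete_prefix(prefix) }

  merged_response.merge("aggregations" => own_aggs)
end

# Each original is represented as [name, prefix]; merged queries are plain labels here.
originals_by_merged = {
  "merged_a" => [["by_size", "1_"], ["by_color", "2_"]],
  "merged_b" => [["solo", "1_"]],
  "merged_c" => [["lost", "1_"]]
}

responses_by_merged = {
  "merged_a" => {
    "took" => 5,
    "aggregations" => {"1_by_size" => {"buckets" => []}, "2_by_color" => {"buckets" => []}}
  },
  "merged_b" => {"took" => 3, "aggregations" => {"solo" => {"buckets" => []}}}
  # No response for "merged_c", so "lost" is omitted from the result.
}

unmerged = originals_by_merged.flat_map do |merged, originals|
  needs_unmerging = originals.size > 1

  originals.filter_map do |(name, prefix)|
    if (response = responses_by_merged[merged])
      [name, needs_unmerging ? unmerge_toy_response(response, prefix) : response]
    end
  end
end.to_h
```

Non-aggregation fields such as `"took"` are copied to every unmerged response, while a query that was never merged receives its response object untouched.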