Class: Gitlab::Database::SimilarityScore

Inherits: Object

Defined in: lib/gitlab/database/similarity_score.rb

Constant Summary

EMPTY_STRING =

Arel.sql("''").freeze

EXPRESSION_ON_INVALID_INPUT =

Arel::Nodes::NamedFunction.new('CAST', [Arel.sql('0').as('integer')]).freeze

DEFAULT_MULTIPLIER =

DISPLAY_NAME =

self.name.underscore.freeze

SIMILARITY_FUNCTION_CALL_WITH_ANNOTATION =

Adds a “magic” comment to the generated SQL expression so that we can tell whether we’re sorting by similarity. Example: /* gitlab/database/similarity_score */ SIMILARITY(COALESCE…

"/* #{DISPLAY_NAME} */ SIMILARITY"

Class Method Summary

.build_expression(search:, rules:) ⇒ Object

This method returns an Arel expression that can be used in an ActiveRecord query to order the resultset by similarity.

.order_by_similarity?(arel_query) ⇒ Boolean

Class Method Details

.build_expression(search:, rules:) ⇒ `Object`

This method returns an Arel expression that can be used in an ActiveRecord query to order the resultset by similarity.

Note: Calculating the similarity score for a large volume of records is inefficient. Use SimilarityScore only on a smaller result set that has already been filtered by other conditions (< 10_000 records).

Parameters

search - [String] the user-provided search string
rules - [{ column: COLUMN, multiplier: 1 }, { column: COLUMN_2, multiplier: 0.5 }] rules for the scoring.
- column - Arel column expression, example: Project.arel_table['name']
- multiplier - Integer or Float to increase or decrease the score (optional, defaults to 1)

Use case

We’d like to search for projects by path, name and description. We want path and name matches to rank higher, since the user is more likely to remember the path or the name of the project than its description.

Rules:

[
  { column: Project.arel_table['path'], multiplier: 1 },
  { column: Project.arel_table['name'], multiplier: 1 },
  { column: Project.arel_table['description'], multiplier: 0.5 }
]
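To make the effect of the multipliers concrete, here is a minimal plain-Ruby sketch of the arithmetic. It assumes each rule contributes its column's similarity multiplied by the rule's multiplier, and that the sum is rounded to two decimals. The helpers `trigrams`, `similarity` and `weighted_score` are hypothetical names for this illustration only; `similarity` only approximates the trigram comparison PostgreSQL performs, and the real class does all of this in SQL, not in Ruby.

```ruby
require 'set'

# Approximate trigram extraction: lowercase the text, split it into
# alphanumeric words, pad each word with two leading spaces and one
# trailing space, then collect every 3-character window.
def trigrams(text)
  words = text.to_s.downcase.scan(/[[:alnum:]]+/)
  grams = words.flat_map do |word|
    padded = "  #{word} "
    (0..padded.length - 3).map { |i| padded[i, 3] }
  end
  grams.to_set
end

# Shared trigrams divided by all distinct trigrams (0.0 when both are empty).
def similarity(left, right)
  a = trigrams(left)
  b = trigrams(right)
  union = a | b
  return 0.0 if union.empty?

  (a & b).size.to_f / union.size
end

# Assumption: score = sum(similarity * multiplier), rounded to two decimals.
# `row` is a plain Hash standing in for a database record.
def weighted_score(search, row, rules)
  total = rules.sum do |rule|
    similarity(row[rule[:column]], search) * rule.fetch(:multiplier, 1)
  end
  total.round(2)
end

rules = [
  { column: :path, multiplier: 1 },
  { column: :name, multiplier: 1 },
  { column: :description, multiplier: 0.5 }
]

project = { path: 'gitlab', name: 'GitLab', description: 'A DevOps platform' }
puts weighted_score('gitlab', project, rules)
```

With these rules, a perfect match on description alone can contribute at most 0.5 to the score, while a perfect match on path or name contributes 1.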

Examples

Similarity calculation based on one column:

Gitlab::Database::SimilarityScore.build_expression(search: 'my input', rules: [{ column: Project.arel_table['name'] }])

Similarity calculation based on two columns, where the second column has lower priority:

Gitlab::Database::SimilarityScore.build_expression(search: 'my input', rules: [
  { column: Project.arel_table['name'], multiplier: 1 },
  { column: Project.arel_table['description'], multiplier: 0.5 }
])

Integration with an ActiveRecord query:

table = Project.arel_table

order_expression = Gitlab::Database::SimilarityScore.build_expression(search: 'input', rules: [
  { column: table['name'], multiplier: 1 },
  { column: table['description'], multiplier: 0.5 }
])

Project.where("name LIKE ?", '%' + 'input' + '%').order(order_expression.desc)

The expression can also be used in SELECT:

results = Project.select(order_expression.as('similarity')).where("name LIKE ?", '%' + 'input' + '%').order(similarity: :desc)
puts results.map(&:similarity)

# File 'lib/gitlab/database/similarity_score.rb', line 67

def self.build_expression(search:, rules:)
  return EXPRESSION_ON_INVALID_INPUT if search.blank? || rules.empty?

  quoted_search = ApplicationRecord.connection.quote(search.to_s)

  first_expression, *expressions = rules.map do |rule|
    rule_to_arel(quoted_search, rule)
  end

  # (SIMILARITY ...) + (SIMILARITY ...)
  additions = expressions.inject(first_expression) do |expression1, expression2|
    Arel::Nodes::Addition.new(expression1, expression2)
  end

  score_as_numeric = Arel::Nodes::NamedFunction.new('CAST', [Arel::Nodes::Grouping.new(additions).as('numeric')])

  # Rounding the score to two decimals
  Arel::Nodes::NamedFunction.new('ROUND', [score_as_numeric, 2])
end
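The method above composes Arel nodes, so the resulting SQL is not visible in the source. The following plain-Ruby sketch builds a string with roughly the shape the method appears to produce. It is an approximation under stated assumptions: `rule_to_arel` is not shown on this page, so the per-rule term (an annotated SIMILARITY over a COALESCE'd column, multiplied by the multiplier) is inferred from the constants above. `sketch_similarity_sql` and `quote_literal` are hypothetical helpers, and columns are passed as plain strings instead of Arel attributes.

```ruby
# Hard-coded here; in the class this comes from DISPLAY_NAME.
ANNOTATED_SIMILARITY = '/* gitlab/database/similarity_score */ SIMILARITY'

# Minimal stand-in for connection.quote: wraps the value in single quotes
# and doubles any embedded single quote.
def quote_literal(value)
  "'" + value.to_s.gsub("'", "''") + "'"
end

# Returns an approximation of the SQL text, not the exact Arel output.
def sketch_similarity_sql(search:, rules:)
  # Mirrors the guard clause: invalid input yields a constant zero.
  return 'CAST(0 AS integer)' if search.to_s.strip.empty? || rules.empty?

  quoted_search = quote_literal(search)

  # Assumed shape of each rule: (SIMILARITY(COALESCE(column, ''), search) * multiplier)
  terms = rules.map do |rule|
    multiplier = rule.fetch(:multiplier, 1)
    "(#{ANNOTATED_SIMILARITY}(COALESCE(#{rule[:column]}, ''), #{quoted_search}) * #{multiplier})"
  end

  # The terms are added, cast to numeric, and rounded to two decimals.
  "ROUND(CAST((#{terms.join(' + ')}) AS numeric), 2)"
end

puts sketch_similarity_sql(
  search: 'input',
  rules: [
    { column: 'projects.name', multiplier: 1 },
    { column: 'projects.description', multiplier: 0.5 }
  ]
)
```

Casting to numeric before ROUND matters on PostgreSQL because the two-argument form of ROUND is defined for numeric values.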

.order_by_similarity?(arel_query) ⇒ `Boolean`

Returns:

(Boolean)

# File 'lib/gitlab/database/similarity_score.rb', line 87

def self.order_by_similarity?(arel_query)
  arel_query.to_sql.include?(SIMILARITY_FUNCTION_CALL_WITH_ANNOTATION)
end
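The check is a plain substring match on the generated SQL, which is why the “magic” comment is embedded in the function call. A minimal sketch of the same idea on raw SQL strings follows. `annotated_similarity?` is a hypothetical helper, and the annotation is hard-coded on the assumption that DISPLAY_NAME evaluates to gitlab/database/similarity_score, as in the example comment shown in the constant summary.

```ruby
# Assumed value of SIMILARITY_FUNCTION_CALL_WITH_ANNOTATION.
ANNOTATION = '/* gitlab/database/similarity_score */ SIMILARITY'

# True only when the SQL contains the annotated SIMILARITY call.
def annotated_similarity?(sql)
  sql.include?(ANNOTATION)
end

annotated = "SELECT * FROM projects ORDER BY #{ANNOTATION}(COALESCE(name, ''), 'input') DESC"
plain     = "SELECT * FROM projects ORDER BY SIMILARITY(name, 'input') DESC"

puts annotated_similarity?(annotated)
puts annotated_similarity?(plain)
```

A SIMILARITY call written by hand without the comment is not detected, so under this approach only expressions produced by this class count as similarity ordering.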