Class: Gitlab::Database::SimilarityScore

Inherits:
Object
Defined in:
lib/gitlab/database/similarity_score.rb

Constant Summary

EMPTY_STRING =
Arel.sql("''").freeze
EXPRESSION_ON_INVALID_INPUT =
Arel::Nodes::NamedFunction.new('CAST', [Arel.sql('0').as('integer')]).freeze
DEFAULT_MULTIPLIER =
1
DISPLAY_NAME =
self.name.underscore.freeze
SIMILARITY_FUNCTION_CALL_WITH_ANNOTATION =

Adds a “magic” comment to the generated SQL expression so that we can tell whether a query is sorted by similarity. Example: /* gitlab/database/similarity_score */ SIMILARITY(COALESCE…

"/* #{DISPLAY_NAME} */ SIMILARITY"

Class Method Details

.build_expression(search:, rules:) ⇒ Object

This method returns an Arel expression that can be used in an ActiveRecord query to order the result set by similarity.

Note: Calculating the similarity score for a large volume of records is inefficient. Use SimilarityScore only on a smaller result set that has already been filtered by other conditions (< 10_000 records).

Parameters

  • search - [String] the user provided search string

  • rules - [{ column: COLUMN, multiplier: 1 }, { column: COLUMN_2, multiplier: 0.5 }] rules for the scoring.

    • column - Arel column expression, example: Project.arel_table['name']

    • multiplier - Integer or Float to increase or decrease the score (optional, defaults to 1)

Use case

We’d like to search for projects by path, name and description. We want path and name matches to rank higher, since the user is more likely to remember the path or the name of a project than its description.

Rules:

[
  { column: Project.arel_table['path'], multiplier: 1 },
  { column: Project.arel_table['name'], multiplier: 1 },
  { column: Project.arel_table['description'], multiplier: 0.5 }
]
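
To make the effect of the multipliers concrete, the following plain-Ruby sketch combines per-column similarity values into one score. The similarity values are invented for illustration (in practice the database computes them), columns are represented by plain strings instead of Arel columns, and weighted_score is a hypothetical helper that is not part of this class.

```ruby
# Hypothetical per-column similarity values for a single project row.
# In practice these are computed by the database, not in Ruby.
similarities = { 'path' => 0.4, 'name' => 0.6, 'description' => 0.2 }

# Same weighting as the rules above, with strings standing in for Arel columns.
rules = [
  { column: 'path', multiplier: 1 },
  { column: 'name', multiplier: 1 },
  { column: 'description', multiplier: 0.5 }
]

# Hypothetical helper: sums each column's similarity scaled by its multiplier
# (defaulting to 1) and rounds the total to two decimals.
def weighted_score(similarities, rules)
  total = rules.sum do |rule|
    similarities.fetch(rule[:column]) * rule.fetch(:multiplier, 1)
  end
  total.round(2)
end

score = weighted_score(similarities, rules)
puts score
```

With these made-up values, the description contributes only half of its similarity (0.2 * 0.5), so a row matching on path or name outranks one that matches equally well on description alone.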

Examples

Similarity calculation based on one column:

Gitlab::Database::SimilarityScore.build_expression(search: 'my input', rules: [{ column: Project.arel_table['name'] }])

Similarity calculation based on two columns, where the second column has lower priority:

Gitlab::Database::SimilarityScore.build_expression(search: 'my input', rules: [
  { column: Project.arel_table['name'], multiplier: 1 },
  { column: Project.arel_table['description'], multiplier: 0.5 }
])

Integration with an ActiveRecord query:

table = Project.arel_table

order_expression = Gitlab::Database::SimilarityScore.build_expression(search: 'input', rules: [
  { column: table['name'], multiplier: 1 },
  { column: table['description'], multiplier: 0.5 }
])

Project.where("name LIKE ?", '%' + 'input' + '%').order(order_expression.desc)

The expression can also be used in SELECT:

results = Project.select(order_expression.as('similarity')).where("name LIKE ?", '%' + 'input' + '%').order(similarity: :desc)
puts results.map(&:similarity)


# File 'lib/gitlab/database/similarity_score.rb', line 67

def self.build_expression(search:, rules:)
  return EXPRESSION_ON_INVALID_INPUT if search.blank? || rules.empty?

  quoted_search = ApplicationRecord.connection.quote(search.to_s)

  first_expression, *expressions = rules.map do |rule|
    rule_to_arel(quoted_search, rule)
  end

  # (SIMILARITY ...) + (SIMILARITY ...)
  additions = expressions.inject(first_expression) do |expression1, expression2|
    Arel::Nodes::Addition.new(expression1, expression2)
  end

  score_as_numeric = Arel::Nodes::NamedFunction.new('CAST', [Arel::Nodes::Grouping.new(additions).as('numeric')])

  # Rounding the score to two decimals
  Arel::Nodes::NamedFunction.new('ROUND', [score_as_numeric, 2])
end
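
The fold over the rules can be illustrated without Arel by assembling the SQL as plain strings. This is only a sketch of the overall shape of the expression: rule_to_arel is not shown on this page, so the per-rule fragment used below (SIMILARITY over a COALESCE of the column, multiplied by the multiplier) is an assumption, and rule_to_sql and sketch_expression are hypothetical names. The quoting is deliberately naive, whereas the real method delegates to the database connection.

```ruby
# Assumed per-rule fragment; the real rule_to_arel is not shown on this page.
def rule_to_sql(quoted_search, rule)
  multiplier = rule.fetch(:multiplier, 1)
  "(SIMILARITY(COALESCE(#{rule[:column]}, ''), #{quoted_search}) * #{multiplier})"
end

# Hypothetical string-based stand-in for build_expression.
def sketch_expression(search:, rules:)
  # Same guard as build_expression: with nothing to score, return a constant 0.
  return 'CAST(0 AS integer)' if search.to_s.strip.empty? || rules.empty?

  # Naive quoting for illustration only (doubles embedded single quotes).
  quoted_search = "'#{search.to_s.gsub("'", "''")}'"

  first, *rest = rules.map { |rule| rule_to_sql(quoted_search, rule) }

  # (SIMILARITY ...) + (SIMILARITY ...)
  additions = rest.inject(first) { |left, right| "#{left} + #{right}" }

  # Cast the grouped sum to numeric and round it to two decimals.
  "ROUND(CAST((#{additions}) AS numeric), 2)"
end

sql = sketch_expression(search: 'my input', rules: [
  { column: 'name', multiplier: 1 },
  { column: 'description', multiplier: 0.5 }
])
puts sql
```

Splitting off the first expression and injecting the rest means a single rule yields just that one term, while each further rule adds one more addition to the sum.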

.order_by_similarity?(arel_query) ⇒ Boolean

Returns:

  • (Boolean) true if the SQL generated from arel_query contains the annotated SIMILARITY call


# File 'lib/gitlab/database/similarity_score.rb', line 87

def self.order_by_similarity?(arel_query)
  arel_query.to_sql.include?(SIMILARITY_FUNCTION_CALL_WITH_ANNOTATION)
end
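
Because the check is a plain substring match on the generated SQL, its behavior can be sketched with strings alone. The two SQL strings below are hand-written stand-ins rather than actual output of this class, and annotated? is a hypothetical helper mirroring order_by_similarity?.

```ruby
DISPLAY_NAME = 'gitlab/database/similarity_score'
ANNOTATION = "/* #{DISPLAY_NAME} */ SIMILARITY"

# Hypothetical helper: true when the SQL text carries the annotated call.
def annotated?(sql)
  sql.include?(ANNOTATION)
end

# Hand-written example queries, for illustration only.
by_similarity = "SELECT * FROM projects ORDER BY #{ANNOTATION}(COALESCE(name, ''), 'input') DESC"
by_name = 'SELECT * FROM projects ORDER BY name ASC'

puts annotated?(by_similarity)
puts annotated?(by_name)
```

Matching on the comment plus the function name, rather than on SIMILARITY alone, keeps the check from firing on queries that call SIMILARITY without going through this class.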