Module: Similars
- Included in:
- ItemBasedRecommendation, UserBasedRecommendation
- Defined in:
- lib/utils/similars.rb
Class Method Summary
- .calculate_dot_product(d1, d2) ⇒ Object
  Finds the dot product of two document vectors by multiplying them attribute by attribute and summing the products.
- .calculate_vector_magnitude(vector) ⇒ Object
  Calculates a document vector's magnitude by summing the squares of its attributes and taking the square root of the sum.
- .fix_missing_values(d1, d2) ⇒ Object
  Fixes vectors of differing length by appending zeros to d2 until it is as long as d1.
- .get_similars(target, data) ⇒ Object
  Returns a hash of similarity indexes, sorted in descending order.
- .similarity_index(d1, d2) ⇒ Object
  Calculates a similarity index using the cosine similarity model: cos(d1, d2) = (d1 ● d2) / (||d1|| * ||d2||), where (d1 ● d2) is the dot product (the sum of the products of corresponding attributes) and ||d1||, ||d2|| are the vector magnitudes (the square root of the sum of the squared attributes).
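For reference, the cosine similarity used by similarity_index can be written out in full as below, with n the common length of the two vectors after zero-padding. The second line is a small worked example; the vectors (1, 2, 0) and (2, 1, 1) are illustrative values chosen for this note, not data from the library.

```latex
\cos(d_1, d_2) = \frac{d_1 \cdot d_2}{\lVert d_1 \rVert \, \lVert d_2 \rVert}
  = \frac{\sum_{i=1}^{n} d_{1,i} \, d_{2,i}}{\sqrt{\sum_{i=1}^{n} d_{1,i}^{2}} \; \sqrt{\sum_{i=1}^{n} d_{2,i}^{2}}}

\cos\big((1,2,0), (2,1,1)\big) = \frac{1 \cdot 2 + 2 \cdot 1 + 0 \cdot 1}{\sqrt{5} \, \sqrt{6}} = \frac{4}{\sqrt{30}} \approx 0.7303
```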
Class Method Details
.calculate_dot_product(d1, d2) ⇒ Object
Finds the dot product of the document vectors. The dot product is computed by multiplying the vectors attribute by attribute and summing the products.
# File 'lib/utils/similars.rb', line 18
def self.calculate_dot_product(d1,d2)
  sum = 0
  i = 0
  while i < d1.length
    sum = sum + d1[i] * d2[i]
    i += 1
  end
  return sum.to_f
end
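The same computation can be sketched more compactly with Array#zip and Enumerable#sum. This is a hypothetical helper named dot_product for illustration only; it is not part of lib/utils/similars.rb. Like the original loop, it walks over d1, so d2 is assumed to be at least as long as d1 (which fix_missing_values arranges).

```ruby
# Hypothetical helper, not part of lib/utils/similars.rb:
# an idiomatic equivalent of Similars.calculate_dot_product.
def dot_product(d1, d2)
  # zip pairs the attributes position by position (result has d1's length);
  # each pair is multiplied and the products are summed, then returned as a Float.
  d1.zip(d2).sum { |a, b| a * b }.to_f
end

puts dot_product([1, 2, 0], [2, 1, 1]) # prints 4.0
```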
.calculate_vector_magnitude(vector) ⇒ Object
The document vector magnitude is calculated by summing the squares of the attributes and taking the square root of the sum.
# File 'lib/utils/similars.rb', line 29
def self.calculate_vector_magnitude(vector)
  sum = 0
  i = 0
  while i < vector.length
    sum = sum + vector[i] * vector[i]
    i += 1
  end
  return Math.sqrt(sum).to_f
end
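A minimal sketch of the same magnitude calculation, using a hypothetical helper named vector_magnitude (not part of the library). Note that an all-zero vector has magnitude 0.0, which is what later makes the cosine similarity come out as NaN.

```ruby
# Hypothetical helper mirroring Similars.calculate_vector_magnitude:
# square each attribute, add the squares, take the square root.
def vector_magnitude(vector)
  Math.sqrt(vector.sum { |x| x * x }).to_f
end

puts vector_magnitude([3, 4]) # prints 5.0
puts vector_magnitude([0, 0]) # prints 0.0
```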
.fix_missing_values(d1, d2) ⇒ Object
Fixes vectors of differing length by appending zeros to d2 until it is as long as d1. Note that d2 is modified in place.
# File 'lib/utils/similars.rb', line 56
def self.fix_missing_values(d1,d2)
  i = 0
  length_diff = d1.length - d2.length
  if length_diff > 0
    # push modifies d2 in place, so the caller's array is padded as well
    while i < length_diff
      d2.push(0)
      i += 1
    end
  end
  return d2
end
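If the in-place padding is undesirable, a non-mutating variant can be sketched as follows. The name pad_with_zeros is hypothetical; this is an alternative for illustration, not the library's method.

```ruby
# Hypothetical variant of Similars.fix_missing_values. The original appends
# zeros with d2.push(0), which modifies the caller's array (in get_similars,
# the array stored in the data hash). This sketch returns a new padded array
# instead and leaves d2 untouched.
def pad_with_zeros(d1, d2)
  length_diff = d1.length - d2.length
  return d2 if length_diff <= 0 # d2 is already at least as long as d1
  d2 + Array.new(length_diff, 0)
end

original = [5, 1]
padded = pad_with_zeros([1, 2, 3, 4], original)
p padded   # prints [5, 1, 0, 0]
p original # prints [5, 1] (unchanged)
```

Trailing zeros do not change a dot product or a magnitude, so the original's in-place padding does not alter any similarity score; the variant only avoids the side effect on the caller's data.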
.get_similars(target, data) ⇒ Object
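Returns a hash mapping each other record's key to its similarity index with the target, sorted by similarity index in descending order. Only records with a positive, non-NaN similarity index are included.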
# File 'lib/utils/similars.rb', line 41
def self.get_similars(target,data)
  sim = Hash.new
  d1 = data[target.id]
  data.each do |key, d2|
    if key != target.id
      index = similarity_index(d1, d2)
      # Only store records whose similarity index is greater than 0 and not NaN
      # (NaN results when a document vector contains only zeros, giving 0.0 / 0.0)
      if index > 0 && !index.nan?
        sim[key] = index
      end
    end
  end
  # sort_by converts the hash to an array of [key, value] pairs, so convert back to a hash after sorting
  return sim.sort_by {|k,v| -v}.to_h
end
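To show the whole get_similars flow end to end, here is a self-contained sketch under stated assumptions: Target is a hypothetical stand-in for whatever record the recommenders pass in (only #id is used), ratings is made-up example data keyed by id, and cosine and rank_similars are hypothetical re-implementations rather than the library's methods. Record 4 is an all-zero vector (its score is NaN) and record 3 shares no attributes with the target (its score is 0.0), so both are filtered out; record 5 is shorter than the target's vector and gets zero-padded.

```ruby
# Hypothetical stand-in for the target record; only #id is used.
Target = Struct.new(:id)

# Cosine similarity with zero-padding of d2 (no mutation of the input).
def cosine(d1, d2)
  d2 = d2 + Array.new([d1.length - d2.length, 0].max, 0)
  dot = d1.zip(d2).sum { |a, b| a * b }.to_f
  dot / (Math.sqrt(d1.sum { |x| x * x }) * Math.sqrt(d2.sum { |x| x * x }))
end

# Same filtering and ordering as Similars.get_similars.
def rank_similars(target, data)
  d1 = data[target.id]
  sim = {}
  data.each do |key, d2|
    next if key == target.id
    index = cosine(d1, d2)
    sim[key] = index if index > 0 && !index.nan? # drop zero, negative and NaN scores
  end
  sim.sort_by { |_key, v| -v }.to_h # highest similarity first
end

ratings = {
  1 => [5, 3, 0],
  2 => [4, 3, 0],
  3 => [0, 0, 5],
  4 => [0, 0, 0],
  5 => [1, 1]
}
p rank_similars(Target.new(1), ratings).keys # prints [2, 5]
```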
.similarity_index(d1, d2) ⇒ Object
Calculates a similarity index using the cosine similarity model. The formula is cos(d1, d2) = (d1 ● d2) / (||d1|| * ||d2||), where (d1 ● d2) is the dot product, i.e. the sum of the products of corresponding attributes of the two document vectors, and ||d1||, ||d2|| are their magnitudes. A document vector's magnitude is calculated by squaring each attribute, summing the squares, and taking the square root of the sum.
# File 'lib/utils/similars.rb', line 8
def self.similarity_index(d1,d2)
  d2 = fix_missing_values(d1,d2)
  dot_product = calculate_dot_product(d1,d2)
  d1_magnitude = calculate_vector_magnitude(d1)
  d2_magnitude = calculate_vector_magnitude(d2)
  # Both helpers return floats (via to_f), so this division keeps its decimal places
  return dot_product / (d1_magnitude * d2_magnitude)
end
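A standalone sketch of the value similarity_index computes, using a hypothetical cosine_similarity helper (not the library's method). Unlike fix_missing_values, which pads only d2, this sketch pads whichever vector is shorter; trailing zeros do not affect the result either way. The examples show an ordinary score, two parallel vectors of different lengths, and the NaN case for an all-zero vector that get_similars filters out.

```ruby
# Hypothetical standalone sketch of the cosine similarity computed by
# Similars.similarity_index. Returns a Float in [-1, 1], or NaN when either
# vector is all zeros (0.0 / 0.0 in floating point).
def cosine_similarity(d1, d2)
  n = [d1.length, d2.length].max
  a = d1 + Array.new(n - d1.length, 0) # zero-pad both vectors to length n
  b = d2 + Array.new(n - d2.length, 0)
  dot = a.zip(b).sum { |x, y| x * y }.to_f
  dot / (Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x }))
end

puts cosine_similarity([1, 2, 0], [2, 1, 1]).round(4) # prints 0.7303
puts cosine_similarity([1, 2], [2, 4, 0])             # parallel vectors, approximately 1.0
puts cosine_similarity([0, 0], [1, 1]).nan?           # prints true
```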