Module: Similars

Included in:
ItemBasedRecommendation, UserBasedRecommendation
Defined in:
lib/utils/similars.rb

Class Method Summary

Class Method Details

.calculate_dot_product(d1, d2) ⇒ Object

Finds the dot product of two document vectors. The dot product is computed by multiplying the vectors attribute by attribute and summing the products.



# File 'lib/utils/similars.rb', line 18

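def self.calculate_dot_product(d1,d2)
    sum = 0
    # Multiply the vectors attribute by attribute and accumulate the products.
    # Assumes d2 is at least as long as d1 (similarity_index pads it with
    # fix_missing_values first); otherwise d2[i] is nil and this raises.
    d1.each_with_index do |value, i|
        sum += value * d2[i]
    end
    return sum.to_f
end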
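The same dot product can be written more compactly with `Array#zip` and `Array#sum`. This is a minimal sketch rather than the module's own code: the helper name `dot_product` is hypothetical, and it assumes both vectors already have the same length (for example after `fix_missing_values`), because `zip` fills missing positions in the second array with `nil`.

```ruby
# Hypothetical compact equivalent of Similars.calculate_dot_product.
# Assumes d1 and d2 have the same length; zip pads a shorter d2 with nil,
# which would raise a TypeError in the multiplication.
def dot_product(d1, d2)
  d1.zip(d2).sum { |a, b| a * b }.to_f
end

puts dot_product([1, 2, 3], [4, 5, 6]) # prints 32.0
```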

.calculate_vector_magnitude(vector) ⇒ Object

Calculates the magnitude of a document vector by summing the squares of its attributes and taking the square root of the sum.



# File 'lib/utils/similars.rb', line 29

def self.calculate_vector_magnitude(vector)
    sum = 0
    # Sum the squares of every attribute of the vector
    vector.each do |value|
        sum += value * value
    end
    # Math.sqrt always returns a Float, so no further conversion is needed
    return Math.sqrt(sum)
end
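For comparison, the magnitude (Euclidean norm) fits in a single expression using `Array#sum` with a block. This is a sketch, not the module's code; `vector_magnitude` is a hypothetical name.

```ruby
# Hypothetical compact equivalent of Similars.calculate_vector_magnitude:
# the square root of the sum of the squared attributes (Euclidean norm).
def vector_magnitude(vector)
  Math.sqrt(vector.sum { |x| x * x })
end

puts vector_magnitude([3, 4]) # prints 5.0
```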

.fix_missing_values(d1, d2) ⇒ Object

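Pads d2 with trailing zeros until it is as long as d1 and returns it. If d2 is already as long as or longer than d1, it is returned unchanged; d1 is never padded.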



# File 'lib/utils/similars.rb', line 56

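def self.fix_missing_values(d1,d2)
    length_diff = d1.length - d2.length
    # Nothing to pad when d2 is already at least as long as d1
    return d2 if length_diff <= 0
    # Return a zero-padded copy instead of pushing onto d2, so the caller's
    # array (e.g. a value in the data hash passed to get_similars) is not mutated
    return d2 + Array.new(length_diff, 0)
end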
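`fix_missing_values` only pads `d2`. When `d2` is the longer vector it is returned unchanged, which `similarity_index` tolerates because the dot-product loop runs over `d1.length` and the missing `d1` attributes would be zeros anyway. If a caller wants both vectors at the same length regardless of which one is shorter, a symmetric, non-mutating variant might look like the sketch below. The name `pad_to_same_length` is hypothetical and not part of `Similars`.

```ruby
# Hypothetical helper: returns zero-padded copies of both vectors so they
# share the same length. The input arrays are not modified.
def pad_to_same_length(d1, d2)
  length = [d1.length, d2.length].max
  [d1 + Array.new(length - d1.length, 0),
   d2 + Array.new(length - d2.length, 0)]
end

a, b = pad_to_same_length([1, 2, 3], [4])
p a # prints [1, 2, 3]
p b # prints [4, 0, 0]
```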

.get_similars(target, data) ⇒ Object

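Returns a hash mapping each record key (other than target.id) to its similarity index with the target, sorted by similarity index in descending order. Only records whose similarity index is greater than 0 and not NaN are included.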



# File 'lib/utils/similars.rb', line 41

def self.get_similars(target,data)
    sim = {}
    d1 = data[target.id]
    data.each do |key, d2|
        next if key == target.id # do not compare the target with itself
        index = similarity_index(d1, d2)
        # Only store records whose similarity index is greater than 0 and not NaN
        # (NaN arises when a document vector contains only zeros, giving 0.0 / 0.0)
        sim[key] = index if index > 0 && !index.nan?
    end
    # sort_by converts the hash to an array of [key, value] pairs, so convert it back to a hash after sorting
    return sim.sort_by { |_key, value| -value }.to_h
end
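To illustrate the arguments `get_similars` works with, here is a self-contained sketch of the same filtering and ranking. It assumes `data` is a hash from record id to document vector and that `target` responds to `id`; the `Record` struct, the `cosine` and `rank_similars` names, and the sample data are all hypothetical. Unlike the module, `cosine` assumes equal-length vectors and skips the padding step.

```ruby
# Hypothetical stand-in for a user or item object; only #id is used.
Record = Struct.new(:id)

# Cosine similarity of two equal-length vectors (NaN if either is all zeros).
def cosine(d1, d2)
  dot = d1.zip(d2).sum { |a, b| a * b }.to_f
  dot / (Math.sqrt(d1.sum { |x| x * x }) * Math.sqrt(d2.sum { |x| x * x }))
end

# Mirrors get_similars: skip the target itself, keep positive non-NaN scores,
# and return a hash ordered by descending similarity.
def rank_similars(target, data)
  d1 = data[target.id]
  scores = data.each_with_object({}) do |(key, d2), sim|
    next if key == target.id
    index = cosine(d1, d2)
    sim[key] = index if index > 0 && !index.nan?
  end
  scores.sort_by { |_key, value| -value }.to_h
end

data = {
  1 => [5, 3, 0],
  2 => [1, 1, 1],
  3 => [4, 3, 0],
  4 => [0, 0, 4], # orthogonal to record 1: score 0.0, dropped
  5 => [0, 0, 0]  # all zeros: score NaN, dropped
}
p rank_similars(Record.new(1), data).keys # prints [3, 2]
```

Because Ruby hashes preserve insertion order, converting the sorted pairs back with `to_h` keeps the descending order.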

.similarity_index(d1, d2) ⇒ Object

Calculates the similarity index using the cosine similarity model, defined as

$$\cos(d_1, d_2) = \frac{d_1 \cdot d_2}{\lVert d_1 \rVert \, \lVert d_2 \rVert}$$

where d1 · d2 is the dot product (the sum of the products of corresponding attributes of the two document vectors) and ||d1||, ||d2|| are the magnitudes of the vectors. A document vector's magnitude is the square root of the sum of the squares of its attributes.



# File 'lib/utils/similars.rb', line 8

def self.similarity_index(d1,d2)
    d2 = fix_missing_values(d1,d2) # pad d2 with zeros so the element-wise loops line up
    dot_product = calculate_dot_product(d1,d2)
    d1_magnitude = calculate_vector_magnitude(d1)
    d2_magnitude = calculate_vector_magnitude(d2)
    # Both helpers return Floats, so this is floating-point division (no truncation).
    # If either vector is all zeros, this evaluates 0.0 / 0.0 and returns NaN.
    return dot_product / (d1_magnitude * d2_magnitude)
end
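A short worked example of the formula, using made-up vectors of different lengths so that the zero-padding step performed by `fix_missing_values` comes into play:

```latex
\begin{aligned}
d_1 &= (1, 2, 3), \qquad d_2 = (1, 2) \;\to\; (1, 2, 0) \text{ after zero-padding} \\
d_1 \cdot d_2 &= 1 \cdot 1 + 2 \cdot 2 + 3 \cdot 0 = 5 \\
\lVert d_1 \rVert &= \sqrt{1 + 4 + 9} = \sqrt{14}, \qquad \lVert d_2 \rVert = \sqrt{1 + 4 + 0} = \sqrt{5} \\
\cos(d_1, d_2) &= \frac{5}{\sqrt{14}\,\sqrt{5}} = \frac{5}{\sqrt{70}} \approx 0.598
\end{aligned}
```

The padded zeros add nothing to either the dot product or the magnitude of d2, so padding does not change the score; it only makes the element-wise loops line up.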