Class: SupervisedLearning::LinearRegression

Inherits:
  • Object
Defined in:
lib/supervised_learning.rb

Overview

This class uses linear regression to make predictions based on a training set. For datasets with fewer than 1000 columns, use #predict, since it gives the most accurate prediction. For larger datasets where #predict becomes too slow, use #predict_advanced. The algorithms in #predict and #predict_advanced were provided by Andrew Ng (Stanford University).

Author:

  • Michael Imstepf

Instance Method Summary

Constructor Details

#initialize(training_set) ⇒ LinearRegression

Initializes a LinearRegression object with a training set

Parameters:

  • training_set (Matrix)

    the training set; each feature/dimension occupies one column, and the last column is the output column (the type of value #predict will return)

Raises:

  • (ArgumentError)

    if training_set is not a Matrix or does not have at least two columns and one row



# File 'lib/supervised_learning.rb', line 17

def initialize(training_set)      
  @training_set = training_set
  raise ArgumentError, 'input is not a Matrix' unless @training_set.is_a? Matrix
  raise ArgumentError, 'Matrix must have at least 2 columns and 1 row' unless @training_set.column_size > 1 && @training_set.row_size >= 1

  @number_of_features = @training_set.column_size - 1
  @number_of_training_examples = @training_set.row_size      

  @feature_set = @training_set.clone
  @feature_set.hpop # remove output set

  @output_set = @training_set.column_vectors.last      
end
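
The expected shape of the training set can be sketched with Ruby's stdlib Matrix (hypothetical housing data for illustration; the gem's own helpers such as Matrix#hpop come from a Matrix extension and are not needed here):

```ruby
require 'matrix'

# Hypothetical training set: house size (sq m) and rooms => price.
# Each row is one training example; the last column is the output.
training_set = Matrix[
  [50,  2, 150_000],
  [80,  3, 240_000],
  [120, 4, 360_000]
]

# The same quantities the constructor derives:
number_of_features          = training_set.column_size - 1 # => 2
number_of_training_examples = training_set.row_size        # => 3
output_set                  = training_set.column_vectors.last
```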

Instance Method Details

#predict(prediction) ⇒ Object

Makes a prediction using the normal equation. This algorithm is the most accurate one, but with large sets (more than 1000 columns) it might take too long to calculate.

Parameters:

  • prediction (Matrix)

    the input to predict an output for: a Matrix with one row and one column per feature



# File 'lib/supervised_learning.rb', line 35

def predict(prediction)
  # add ones to feature set
  feature_set = Matrix.hconcat(Matrix.one(@number_of_training_examples, 1), @feature_set)

  validate_prediction_input(prediction)
        
  transposed_feature_set = feature_set.transpose # only transpose once for efficiency                  
  theta = (transposed_feature_set * feature_set).inverse * transposed_feature_set * @output_set

  # add column of ones to prediction
  prediction = Matrix.hconcat(Matrix.one(prediction.row_size, 1), prediction)
  
  result_vectorized = prediction * theta
  result = result_vectorized.to_a.first.to_f
end
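
The theta computation above is the closed-form normal equation, theta = (XᵀX)⁻¹ Xᵀ y. A minimal sketch with Ruby's stdlib Matrix on hypothetical, exactly linear data (y = 2x + 1), where the recovered coefficients come out exact:

```ruby
require 'matrix'

# Hypothetical training data: y = 2x + 1.
x = [1, 2, 3, 4]
y = Vector[3, 5, 7, 9]

# Prepend a column of ones for the intercept term.
feature_set = Matrix.rows(x.map { |xi| [1, xi] })

# Normal equation: theta = (X^T X)^-1 X^T y
transposed = feature_set.transpose
theta = (transposed * feature_set).inverse * transposed * y
# theta recovers intercept 1 and slope 2 exactly (as Rationals)

# Predict for x = 10: [1, 10] * theta => 21
prediction = (Matrix[[1, 10]] * theta)[0]
```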

#predict_advanced(prediction, learning_rate = 0.01, iterations = 1000, debug = false) ⇒ Object

Makes a prediction using gradient descent. This algorithm requires less computing power than #predict but is less accurate, since it uses approximation.

Parameters:

  • prediction (Matrix)

    the input to predict an output for: a Matrix with one row and one column per feature



# File 'lib/supervised_learning.rb', line 55

def predict_advanced(prediction, learning_rate = 0.01, iterations = 1000, debug = false) 
  validate_prediction_input(prediction)

  feature_set = normalize_feature_set(@feature_set)
  # add ones to feature set after normalization      
  feature_set = Matrix.hconcat(Matrix.one(@number_of_training_examples, 1), feature_set)

  # prepare theta column vector with zeros
  theta = Matrix.zero(@number_of_features+1, 1)

  iterations.times do        
    theta = theta - (learning_rate * (1.0/@number_of_training_examples) * (feature_set * theta - @output_set).transpose * feature_set).transpose
    if debug
      puts "Theta: #{theta}"
      puts "Cost: #{calculate_cost(feature_set, theta)}"
    end
  end

  # normalize prediction
  prediction = normalize_prediction(prediction)

  # add column of ones to prediction
  prediction = Matrix.hconcat(Matrix.one(prediction.row_size, 1), prediction)

  result_vectorized = prediction * theta
  result = result_vectorized[0,0]
end
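
The update inside the loop is batch gradient descent, theta := theta − (α/m) Xᵀ(Xθ − y). A stdlib-Matrix sketch on hypothetical data that is already well-scaled, so the normalization step can be omitted:

```ruby
require 'matrix'

# Hypothetical data: y = 2x + 1 on a small-range feature, so raw values
# are already well-scaled and gradient descent converges without normalization.
feature_set = Matrix[[1.0, 0.0], [1.0, 0.5], [1.0, 1.0], [1.0, 1.5]] # first column: ones
output_set  = Vector[1.0, 2.0, 3.0, 4.0]

learning_rate = 0.1
m     = feature_set.row_size
theta = Vector[0.0, 0.0] # start from zeros, as in the method above

5000.times do
  # Gradient of the cost: (1/m) * X^T (X*theta - y)
  gradient = feature_set.transpose * (feature_set * theta - output_set) * (1.0 / m)
  theta = theta - gradient * learning_rate
end
# theta converges toward intercept 1.0 and slope 2.0
```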