Class: SupervisedLearning::LinearRegression

Inherits:
Object
Defined in:
lib/supervised_learning.rb

Overview

This class uses linear regression to make predictions based on a training set. For datasets with fewer than 1000 columns, use #predict, since it gives the most accurate prediction. For larger datasets, where #predict becomes too slow, use #predict_advanced. The algorithms behind #predict and #predict_advanced were provided by Andrew Ng (Stanford University).

Author:

  • Michael Imstepf

Instance Method Summary

Constructor Details

#initialize(training_set) ⇒ LinearRegression

Initializes a LinearRegression object with a training set

Parameters:

  • training_set (Matrix)

    training set; each feature/dimension occupies one column, and the last column holds the output values (the type of value #predict will return)

Raises:

  • (ArgumentError)

    if training_set is not a Matrix or does not have at least two columns and one row



# File 'lib/supervised_learning.rb', line 16

def initialize(training_set)      
  @training_set = training_set
  raise ArgumentError, 'input is not a Matrix' unless @training_set.is_a? Matrix
  raise ArgumentError, 'Matrix must have at least 2 columns and 1 row' unless @training_set.column_size > 1 && @training_set.row_size > 0

  @number_of_training_set_columns = @training_set.column_size
  @number_of_features = @number_of_training_set_columns - 1
  @number_of_training_examples = @training_set.row_size      

  @output_set = @training_set.column_vectors.last      
end
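
The expected input shape can be sketched with Ruby's standard Matrix library alone (no gem required). The feature values below are made up purely for illustration:

```ruby
require 'matrix'

# Each row is one training example; features fill the leading
# columns and the output -- the value #predict will return --
# sits in the last column.
training_set = Matrix[
  [2104, 3, 399_900],
  [1600, 3, 329_900],
  [2400, 4, 369_000]
]

# These are the properties the constructor validates:
training_set.is_a?(Matrix)   # => true
training_set.column_size > 1 # => true
```

Passing anything that is not a Matrix, or a Matrix with a single column, raises an ArgumentError.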

Instance Method Details

#predict(prediction) ⇒ Object

Makes a prediction using the normal equation. This algorithm is the most accurate, but on large sets (more than 1000 columns) the matrix inversion may take too long to compute.

Parameters:

  • prediction (Matrix)

    row matrix of feature values (one column per feature) for which to predict the output



# File 'lib/supervised_learning.rb', line 32

def predict(prediction)
  feature_set = get_feature_set(@training_set, true)      

  validate_prediction_input(prediction)
        
  transposed_feature_set = feature_set.transpose # only transpose once for efficiency                  
  theta = (transposed_feature_set * feature_set).inverse * transposed_feature_set * @output_set

  # add column of ones to prediction
  prediction = get_feature_set(prediction, true)
  
  result_vectorized = prediction * theta
  result = result_vectorized.to_a.first.to_f
end
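
The computation above is the normal equation, theta = (XᵀX)⁻¹Xᵀy. The following self-contained sketch reproduces it with Ruby's standard Matrix library, mirroring what the private get_feature_set helper does by prepending a column of ones for the intercept term; the training data (y = 2x) is made up for illustration:

```ruby
require 'matrix'

# Training set: column 1 = feature, column 2 = output (here y = 2x).
training_set = Matrix[[1, 2], [2, 4], [3, 6]]
output_set   = training_set.column_vectors.last

# Prepend a column of ones (intercept term).
feature_set = Matrix.columns([Array.new(training_set.row_size, 1),
                              *training_set.column_vectors[0...-1].map(&:to_a)])

# Normal equation: theta = (X^T X)^-1 X^T y
transposed = feature_set.transpose
theta = (transposed * feature_set).inverse * transposed * output_set

# Predict for x = 4: prediction row also gets a leading 1.
prediction = Matrix[[1, 4]]
result = (prediction * theta).to_a.first.to_f
# result => 8.0
```

Because Ruby's Matrix performs exact Rational arithmetic on integer input, the result here is exact; with Float input, expect the usual floating-point rounding.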

#predict_advanced(prediction, learning_rate = 0.01, iterations = 1000, debug = false) ⇒ Object

Makes a prediction using gradient descent. This algorithm requires less computing power than #predict but is less accurate, since it relies on iterative approximation.

Parameters:

  • prediction (Matrix)

    row matrix of feature values (one column per feature) for which to predict the output

  • learning_rate (Float) (defaults to: 0.01)

    step size for each gradient descent update

  • iterations (Integer) (defaults to: 1000)

    number of gradient descent iterations

  • debug (Boolean) (defaults to: false)

    when true, prints theta and the cost on each iteration



# File 'lib/supervised_learning.rb', line 51

def predict_advanced(prediction, learning_rate = 0.01, iterations = 1000, debug = false) 
  validate_prediction_input(prediction)

  feature_set = get_feature_set(@training_set, false) 
  feature_set = normalize_feature_set(feature_set)
  # add ones to feature set after normalization      
  feature_set = get_feature_set(feature_set, true)

  # prepare theta column vector with zeros
  theta = Array.new(@number_of_training_set_columns, 0)
  theta = Matrix.columns([theta])

  iterations.times do        
    theta = theta - (learning_rate * (1.0/@number_of_training_examples) * (feature_set * theta - @output_set).transpose * feature_set).transpose
    if debug
      puts "Theta: #{theta}"
      puts "Cost: #{calculate_cost(feature_set, theta)}"
    end
  end

  # normalize prediction
  prediction = normalize_prediction(prediction)

  # add column of ones to prediction
  prediction = get_feature_set(prediction, true)

  result_vectorized = prediction * theta
  result = result_vectorized[0,0]
end
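
The gradient descent loop can be sketched with the standard Matrix library alone. The sketch below assumes mean/range normalization (subtract the column mean, divide by max minus min), which is one common choice for what a normalize_feature_set helper might do; the source shown here does not reveal the exact scheme, and the data (y = 2x) is made up for illustration:

```ruby
require 'matrix'

# Training set: one feature column plus the output column (y = 2x).
training_set = Matrix[[1, 2], [2, 4], [3, 6]]
m            = training_set.row_size
output_set   = training_set.column_vectors.last

# Normalize the feature column (assumed mean/range normalization).
feature    = training_set.column_vectors.first.to_a.map(&:to_f)
mean       = feature.sum / m
range      = feature.max - feature.min
normalized = feature.map { |v| (v - mean) / range }

# Feature matrix with a leading column of ones, added after normalization.
x = Matrix.columns([Array.new(m, 1.0), normalized])

# Batch gradient descent, mirroring the update rule in the source.
theta = Matrix.columns([[0.0, 0.0]])
learning_rate = 0.1
1000.times do
  theta -= (learning_rate * (1.0 / m) * (x * theta - output_set).transpose * x).transpose
end

# Predict for x = 4, applying the same normalization to the input.
prediction = Matrix[[1.0, (4 - mean) / range]]
result = (prediction * theta)[0, 0]
# result is approximately 8.0
```

Note that Ruby's Matrix arithmetic treats a Vector operand as a column matrix, which is why `x * theta - output_set` works in both the source and this sketch. Unlike the normal equation, the answer is an approximation whose accuracy depends on learning_rate and iterations.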