Class: Veritable::Prediction

Inherits:
Hash
  • Object
Defined in:
lib/veritable/api.rb

Overview

Represents the result of a Veritable prediction

A Veritable::Prediction is a Hash whose keys are the columns in the prediction request, and whose values are standard point estimates for predicted columns. For fixed (conditioning) columns, the value is the fixed value. For predicted values, the point estimate varies by datatype:

  • real – mean

  • count – mean rounded to the nearest integer

  • categorical – mode

  • boolean – mode
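As a rough illustration of these rules, the sketch below computes a point estimate from an Array of sampled values for a single column. The helper name `sketch_point_estimate` is hypothetical and not part of the Veritable client. The client's own private `point_estimate` method is not shown on this page and may differ in detail, for example in how ties for the mode are broken.

```ruby
# Hypothetical helper (not part of the Veritable client): derives a point
# estimate from the sampled values of one column, following the rules above.
def sketch_point_estimate(values, type)
  case type
  when 'real'
    values.sum.to_f / values.size            # mean
  when 'count'
    (values.sum.to_f / values.size).round    # mean rounded to nearest integer
  when 'categorical', 'boolean'
    # Mode: the value that occurs most often among the samples.
    values.group_by { |v| v }.max_by { |_value, group| group.size }.first
  else
    raise ArgumentError, "unknown datatype: #{type}"
  end
end
```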

The object also gives access to the original predictions request, the predicted distribution on missing values, the schema of the analysis used to make predictions, and standard measures of uncertainty for the predicted values.

Attributes

  • request – a Hash containing the original predictions request. Keys are column names; conditioning values are present, predicted values are nil.

  • distribution – the underlying predicted distribution as an Array of Hashes, each of which represents a single sample from the predictive distribution.

  • schema – the schema for the columns in the predictions request

  • uncertainty – a Hash containing measures of uncertainty for each predicted value.

Methods

  • prob_within – calculates the probability a column’s value lies within a range

  • credible_values – calculates a credible range for the value of a column

See also: dev.priorknowledge.com/docs/client/ruby

Constructor Details

#initialize(request, distribution, schema, request_id = nil) ⇒ Prediction

Initializes a Veritable::Prediction

Users should not call this constructor directly. Instead, call Veritable::Analysis#predict.

See also: dev.priorknowledge.com/docs/client/ruby



# File 'lib/veritable/api.rb', line 896

def initialize(request, distribution, schema, request_id=nil)
  @request = request
  @request.delete '_request_id'

  @schema = Schema.new(schema)
  @request_id = request_id

  fixed = {}
  @request.each { |k,v| 
    if not v.nil?
        fixed[k] = v
    end
  }
  @distribution = distribution
  @distribution.each {|d| 
    d.delete '_request_id'
    d.update(fixed)
  }

  @uncertainty = Hash.new()
  @request.each { |k,v|
    if v.nil?
      self[k] = point_estimate k
      @uncertainty[k] = calculate_uncertainty k
    else
      self[k] = v
      @uncertainty[k] = 0.0
    end
  }
end
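
The constructor's handling of conditioning values can be illustrated in isolation. The sketch below uses a hypothetical helper, `sketch_merge_fixed`, and made-up column names to copy every non-nil request value into each draw. Unlike the constructor above, which updates the draws in place, this version returns new Hashes and leaves its inputs untouched.

```ruby
# Hypothetical helper (not part of the Veritable client): copies the fixed
# (non-nil) request values into every draw of the predictive distribution.
def sketch_merge_fixed(request, draws)
  fixed = request.reject { |_column, value| value.nil? }
  draws.map { |draw| draw.merge(fixed) }
end

# Column names and values are invented for illustration.
request = { 'age' => 30, 'income' => nil }
draws   = [{ 'income' => 41_000.0 }, { 'income' => 52_500.0 }]
merged  = sketch_merge_fixed(request, draws)
# Every Hash in merged now carries 'age' => 30 alongside its sampled 'income'.
```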

Instance Attribute Details

#distribution ⇒ Object (readonly)

The underlying predicted distribution, as an Array of Hashes

Each Hash represents a single draw from the predictive distribution, and should be regarded as equiprobable with the others.
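
Because the draws are equiprobable, the probability of any event can be approximated by the fraction of draws in which it occurs, including events that span several columns. The sketch below assumes the distribution is a plain Array of Hashes as described above; `sketch_joint_probability` is a hypothetical helper, not a method of Veritable::Prediction.

```ruby
# Hypothetical helper (not part of the Veritable client): estimates the
# probability of an arbitrary event as the share of draws for which the
# given block returns a truthy value.
def sketch_joint_probability(draws)
  hits = draws.count { |row| yield row }
  hits.to_f / draws.size
end

# Invented draws with one real column and one boolean column.
draws = [
  { 'income' => 41_000.0, 'owns_home' => true },
  { 'income' => 52_500.0, 'owns_home' => true },
  { 'income' => 61_000.0, 'owns_home' => false },
  { 'income' => 75_250.0, 'owns_home' => true }
]
estimate = sketch_joint_probability(draws) do |row|
  row['income'] > 50_000 && row['owns_home']
end
```

With so few draws the estimate is coarse; it becomes more stable as the number of samples in the distribution grows.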

See also: dev.priorknowledge.com/docs/client/ruby



# File 'lib/veritable/api.rb', line 875

def distribution
  @distribution
end

#request ⇒ Object (readonly)

The original predictions request, as a Hash



# File 'lib/veritable/api.rb', line 865

def request
  @request
end

#request_id ⇒ Object (readonly)

The ‘_request_id’ of the original prediction request, or nil if none was specified



# File 'lib/veritable/api.rb', line 868

def request_id
  @request_id
end

#schema ⇒ Object (readonly)

The schema for the columns in the predictions request



# File 'lib/veritable/api.rb', line 878

def schema
  @schema
end

#uncertainty ⇒ Object (readonly)

A Hash of standard uncertainty measures

Keys are the columns in the prediction request and values are uncertainty measures associated with each point estimate. A higher value indicates greater uncertainty. These measures vary by datatype:

  • real – length of 90% credible interval

  • count – length of 90% credible interval

  • categorical – total probability of all non-modal values

  • boolean – probability of the non-modal value
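
One way these measures could be computed from sampled values is sketched below. The client's private `calculate_uncertainty` method is not shown on this page, so this is an assumption about its behavior rather than a copy of it; the helper name `sketch_uncertainty` is hypothetical. The interval endpoints are chosen by trimming an equal share of sorted samples from each tail.

```ruby
# Hypothetical helper (not part of the Veritable client): an uncertainty
# measure for one column's sampled values, following the list above.
def sketch_uncertainty(values, type)
  case type
  when 'real', 'count'
    sorted = values.sort
    n = sorted.size
    # Number of samples trimmed from each tail for a 90% interval.
    a = (n * (1.0 - 0.9) / 2.0).round
    sorted[n - 1 - a] - sorted[a]            # length of the interval
  when 'categorical', 'boolean'
    modal_count = values.group_by { |v| v }.values.map(&:size).max
    1.0 - modal_count.to_f / values.size     # probability of non-modal values
  else
    raise ArgumentError, "unknown datatype: #{type}"
  end
end
```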

See also: dev.priorknowledge.com/docs/client/ruby



# File 'lib/veritable/api.rb', line 889

def uncertainty
  @uncertainty
end

Instance Method Details

#credible_values(column, p = nil) ⇒ Object

Based on the underlying predicted distribution, calculates a range within which the predicted value for the column lies with the specified probability.

Arguments

  • column – the column for which to calculate the range

  • p – the desired probability. Defaults to nil, in which case 0.5 is used for boolean and categorical columns and 0.9 for count and real columns.

Returns

For boolean and categorical columns, a Hash whose keys are categorical values in the calculated range and whose values are probabilities; for real and count columns, an Array of the [min, max] values for the calculated range.

See also: dev.priorknowledge.com/docs/client/ruby



# File 'lib/veritable/api.rb', line 978

def credible_values(column, p=nil)
  col_type = schema.type column
  Veritable::Util.check_datatype(col_type, "Credible values -- ")
  if col_type == 'boolean' or col_type == 'categorical'
    p = 0.5 if p.nil?
    tf = Hash.new
    ((freqs(counts(column)).sort_by {|k, v| v}).reject {|c, a| a < p}).each {|k, v| tf[k] = v}
    tf
  elsif col_type == 'count' or col_type == 'real'
    p = 0.9 if p.nil?
    n = distribution.size
    a = (n * (1.0 - p) / 2.0).round.to_i
    sv = sorted_values column
    n = sv.size
    lo = sv[a]
    hi = sv[n - 1 - a]
    [lo, hi]
  end
end
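
The two branches of the listing above can be tried on plain Arrays with the standalone sketch below. The helper names are hypothetical, and the categorical version assumes that the client's private `freqs` and `counts` helpers (not shown on this page) yield each value's relative frequency among the draws. Note that, as in the listing, a categorical value is kept only when its own frequency is at least p.

```ruby
# Hypothetical helper: [min, max] of the central share p of the samples,
# trimming the same number of sorted samples from each tail.
def sketch_credible_interval(values, p = 0.9)
  sorted = values.sort
  n = sorted.size
  a = (n * (1.0 - p) / 2.0).round
  [sorted[a], sorted[n - 1 - a]]
end

# Hypothetical helper: values whose relative frequency is at least p,
# mapped to that frequency, ordered from least to most frequent.
def sketch_credible_categories(values, p = 0.5)
  n = values.size.to_f
  freqs = values.group_by { |v| v }.map { |value, group| [value, group.size / n] }
  freqs.sort_by { |_value, f| f }.reject { |_value, f| f < p }.to_h
end
```

For 100 samples and p = 0.9, five samples are trimmed from each tail, so the interval runs from the 6th smallest to the 95th smallest sampled value.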

#inspect ⇒ Object

Returns a string representation of the prediction results



# File 'lib/veritable/api.rb', line 999

def inspect; to_s; end

#prob_within(column, range) ⇒ Object

Calculates the probability a column’s value lies within a range.

Based on the underlying predicted distribution, calculates the marginal probability that the predicted value for the given column lies within the specified range.

Arguments

  • column – the column for which to calculate probabilities

  • range – a representation of the range for which to calculate probabilities. For real and count columns, this is an Array of [start, end] representing a closed interval; per the source below, a nil bound leaves that side of the interval open. For boolean and categorical columns, this is an Array of discrete values.

Returns

A probability as a Float

See also: dev.priorknowledge.com/docs/client/python



# File 'lib/veritable/api.rb', line 939

def prob_within(column, range)
  col_type = schema.type column
  Veritable::Util.check_datatype(col_type, "Probability within -- ")
  if col_type == 'boolean' or col_type == 'categorical'
    count = distribution.inject(0) {|memo, row|
      if range.include? row[column]
        memo + 1 
      else
        memo
      end
    }
    count.to_f / distribution.size
  elsif col_type == 'count' or col_type == 'real'
    mn = range[0]
    mx = range[1]
    count = distribution.inject(0) {|memo, row|
      v = row[column]
      if (mn.nil? or v >= mn) and (mx.nil? or v <=mx)
        memo + 1 
      else
        memo
      end
    }
    count.to_f / distribution.size
  end
end
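
The counting logic in the listing above can be exercised without a live analysis. The sketch below is a standalone approximation that takes the draws and the column's datatype as explicit arguments; `sketch_prob_within` is a hypothetical helper, and the column names and values are invented.

```ruby
# Hypothetical helper (not part of the Veritable client): fraction of draws
# whose value for the column falls within the given range.
def sketch_prob_within(draws, column, range, type)
  hits =
    if %w[boolean categorical].include?(type)
      # range is a list of acceptable discrete values
      draws.count { |row| range.include?(row[column]) }
    else
      # range is [lo, hi]; a nil bound leaves that side open
      lo, hi = range
      draws.count do |row|
        v = row[column]
        (lo.nil? || v >= lo) && (hi.nil? || v <= hi)
      end
    end
  hits.to_f / draws.size
end

draws = [
  { 'height' => 1.0, 'color' => 'red' },
  { 'height' => 2.0, 'color' => 'blue' },
  { 'height' => 3.0, 'color' => 'red' },
  { 'height' => 4.0, 'color' => 'green' }
]
closed  = sketch_prob_within(draws, 'height', [2.0, 3.0], 'real')
open_lo = sketch_prob_within(draws, 'height', [nil, 3.0], 'real')
colors  = sketch_prob_within(draws, 'color', %w[red green], 'categorical')
```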

#to_s ⇒ Object

Returns a string representation of the prediction results



# File 'lib/veritable/api.rb', line 1002

def to_s; "<Veritable::Prediction #{super}>"; end