Module: OpenTox::Algorithm::Similarity
- Defined in:
- lib/utils.rb
Overview
Similarity calculations
Class Method Summary
- .cosine(fingerprints_a, fingerprints_b, weights = nil) ⇒ Float
  Cosine similarity.
- .cosine_num(a, b) ⇒ Float
  Cosine similarity.
- .outliers(params) ⇒ Object
  Outlier detection based on Mahalanobis distances: multivariate detection on X, univariate detection on y. Uses an existing RinRuby instance, if possible. The keys query_matrix, data_matrix and acts are required; r and p_outlier are optional. Returns indices identifying outliers (an index may occur several times; this is intended).
- .tanimoto(fingerprints_a, fingerprints_b, weights = nil, params = nil) ⇒ Float
  Tanimoto similarity.
Class Method Details
.cosine(fingerprints_a, fingerprints_b, weights = nil) ⇒ Float
Cosine similarity
    # File 'lib/utils.rb', line 291

    def self.cosine(fingerprints_a,fingerprints_b,weights=nil)

      # fingerprints are hashes
      if fingerprints_a.class == Hash && fingerprints_b.class == Hash
        a = []; b = []
        common_features = fingerprints_a.keys & fingerprints_b.keys
        if common_features.size > 1
          common_features.each do |p|
            a << fingerprints_a[p]
            b << fingerprints_b[p]
          end
        end

      # fingerprints are arrays
      elsif fingerprints_a.class == Array && fingerprints_b.class == Array
        a = fingerprints_a
        b = fingerprints_b
      end

      (a.size > 0 && b.size > 0) ? self.cosine_num(a.to_gv, b.to_gv) : 0.0

    end
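The hash branch of the listing above compares only the features present in both fingerprints, and leaves both vectors empty (so the result is 0.0) unless more than one feature is shared. The following is a minimal sketch of that alignment step in plain Ruby. The helper name `align_on_common_features` and the sample feature keys are hypothetical and not part of OpenTox.

```ruby
# Hypothetical helper (not part of OpenTox): build two value arrays
# restricted to the features that occur in both fingerprint hashes.
def align_on_common_features(fp_a, fp_b)
  common = fp_a.keys & fp_b.keys
  # Mirror the listing: with fewer than two shared features,
  # both vectors stay empty.
  return [[], []] unless common.size > 1
  [common.map { |f| fp_a[f] }, common.map { |f| fp_b[f] }]
end

fp_a = { "x" => 1.0, "y" => 2.0, "z" => 3.0 }
fp_b = { "y" => 5.0, "z" => 6.0, "w" => 7.0 }

a, b = align_on_common_features(fp_a, fp_b)
# a and b now hold the values for the shared features "y" and "z",
# in the key order of fp_a.
```

Features that occur in only one fingerprint ("x" and "w" here) do not contribute to the cosine value at all, which differs from the Tanimoto method, where they enlarge the denominator.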
.cosine_num(a, b) ⇒ Float
Cosine similarity
    # File 'lib/utils.rb', line 319

    def self.cosine_num(a, b)
      if a.size>12 && b.size>12
        a = a[0..11]
        b = b[0..11]
      end
      a.dot(b) / (a.norm * b.norm)
    end
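The listing computes dot(a, b) / (|a| * |b|) on vector objects and, when both inputs have more than 12 components, uses only the first 12. As a rough illustration, the same arithmetic can be written for plain Ruby arrays. This is a sketch, not the library's implementation: `cosine_num_sketch` is a hypothetical name, and it assumes both arrays have equal length and neither is all zeros.

```ruby
# Hypothetical plain-array version of the cosine formula (not part of OpenTox).
# Assumes a and b have the same length and non-zero norms.
def cosine_num_sketch(a, b)
  # Mirror the listing: keep only the first 12 components
  # when both vectors are longer than 12.
  if a.size > 12 && b.size > 12
    a = a[0..11]
    b = b[0..11]
  end
  dot    = a.zip(b).sum(0.0) { |x, y| x * y }
  norm_a = Math.sqrt(a.sum(0.0) { |x| x * x })
  norm_b = Math.sqrt(b.sum(0.0) { |x| x * x })
  dot / (norm_a * norm_b)
end

orthogonal = cosine_num_sketch([1.0, 0.0], [0.0, 1.0])   # perpendicular vectors
parallel   = cosine_num_sketch([1.0, 2.0], [2.0, 4.0])   # same direction

# The 13th component is ignored because both inputs exceed 12 entries.
long_a = Array.new(12, 1.0) + [100.0]
long_b = Array.new(12, 1.0) + [-100.0]
truncated = cosine_num_sketch(long_a, long_b)
```

One consequence of the truncation is visible in the last call: two long vectors that differ only after the 12th position are treated as identical.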
.outliers(params) ⇒ Object
Outlier detection based on Mahalanobis distances.
Multivariate detection on X, univariate detection on y.
Uses an existing RinRuby instance, if possible.
Parameters:
- params: the keys query_matrix, data_matrix and acts are required; r and p_outlier are optional.
Returns:
- indices identifying outliers (an index may occur several times; this is intended).
    # File 'lib/utils.rb', line 333

    def self.outliers(params)
      outlier_array = []
      data_matrix = params[:data_matrix]
      query_matrix = params[:query_matrix]
      acts = params[:acts]
      begin
        # NOTE: the log message reports a default of 0.9999, but the threshold set below defaults to 0.999
        LOGGER.debug "Outliers (p=#{params[:p_outlier] || 0.9999})..."
        r = ( params[:r] || RinRuby.new(false,false) )
        r.eval "suppressPackageStartupMessages(library(\"robustbase\"))"
        r.eval "outlier_threshold = #{params[:p_outlier] || 0.999}"
        nr_cases, nr_features = data_matrix.to_a.size, data_matrix.to_a[0].size
        r.odx = data_matrix.to_a.flatten
        r.q = query_matrix.to_a.flatten
        r.y = acts.to_a.flatten
        r.eval "odx = matrix(odx, #{nr_cases}, #{nr_features}, byrow=T)"
        r.eval 'odx = rbind(q,odx)' # query is nr 0 (1) in ruby (R)
        r.eval 'mah = covMcd(odx)$mah' # run MCD alg
        r.eval "mah = pchisq(mah,#{nr_features})"
        r.eval 'outlier_array = which(mah>outlier_threshold)' # multivariate outliers using robust mahalanobis
        outlier_array = r.outlier_array.to_a.collect{|v| v-2 } # translate to ruby index (-1 for q, -1 due to ruby)
        r.eval 'fqu = matrix(summary(y))[2]'
        r.eval 'tqu = matrix(summary(y))[5]'
        r.eval 'outlier_array = which(y>(tqu+1.5*IQR(y)))' # univariate outliers due to Tukey (http://goo.gl/mwzNH)
        outlier_array += r.outlier_array.to_a.collect{|v| v-1 } # translate to ruby index (-1 due to ruby)
        r.eval 'outlier_array = which(y<(fqu-1.5*IQR(y)))'
        outlier_array += r.outlier_array.to_a.collect{|v| v-1 }
      rescue Exception => e
        LOGGER.debug "#{e.class}: #{e.message}"
        #LOGGER.debug "Backtrace:\n\t#{e.backtrace.join("\n\t")}"
      end
      outlier_array
    end
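The univariate part of the listing applies Tukey's rule to the activities y: a value is flagged when it lies more than 1.5 times the interquartile range above the third quartile or below the first quartile. The sketch below reproduces only that step in plain Ruby, without R. It assumes R's default quantile definition (linear interpolation, "type 7"); the names `quantile7` and `tukey_outlier_indices` are hypothetical, and the multivariate Mahalanobis step (covMcd from robustbase) is not covered.

```ruby
# Hypothetical helpers (not part of OpenTox).

# Quantile with linear interpolation between order statistics,
# assumed here to match R's default (type 7). Expects a sorted array.
def quantile7(sorted, p)
  h  = (sorted.size - 1) * p
  lo = h.floor
  hi = [lo + 1, sorted.size - 1].min
  sorted[lo] + (h - lo) * (sorted[hi] - sorted[lo])
end

# Returns 0-based indices of values outside the Tukey fences,
# upper outliers first and lower outliers second, as in the listing.
def tukey_outlier_indices(y)
  sorted = y.sort
  q1  = quantile7(sorted, 0.25)
  q3  = quantile7(sorted, 0.75)
  iqr = q3 - q1
  upper = y.each_index.select { |i| y[i] > q3 + 1.5 * iqr }
  lower = y.each_index.select { |i| y[i] < q1 - 1.5 * iqr }
  upper + lower
end

acts = [1.0, 2.0, 3.0, 4.0, 5.0, 100.0]
flagged = tukey_outlier_indices(acts)
# Quartiles are 2.25 and 4.75, so the fences are -1.5 and 8.5;
# only the last value (index 5) falls outside them.
```

In the listing, these indices are appended to the multivariate ones without deduplication, which is why the return value may contain the same index more than once.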
.tanimoto(fingerprints_a, fingerprints_b, weights = nil, params = nil) ⇒ Float
Tanimoto similarity
    # File 'lib/utils.rb', line 258

    def self.tanimoto(fingerprints_a,fingerprints_b,weights=nil,params=nil)

      common_p_sum = 0.0
      all_p_sum = 0.0

      # fingerprints are hashes
      if fingerprints_a.class == Hash && fingerprints_b.class == Hash
        common_features = fingerprints_a.keys & fingerprints_b.keys
        all_features = (fingerprints_a.keys + fingerprints_b.keys).uniq
        if common_features.size > 0
          common_features.each{ |f| common_p_sum += [ fingerprints_a[f], fingerprints_b[f] ].min }
          all_features.each{ |f| all_p_sum += [ fingerprints_a[f],fingerprints_b[f] ].compact.max } # compact, since one fp may be empty at that pos
        end

      # fingerprints are arrays
      elsif fingerprints_a.class == Array && fingerprints_b.class == Array
        size = [ fingerprints_a.size, fingerprints_b.size ].min
        LOGGER.warn "fingerprints don't have equal size" if fingerprints_a.size != fingerprints_b.size
        (0...size).each { |idx|
          common_p_sum += [ fingerprints_a[idx], fingerprints_b[idx] ].min
          all_p_sum += [ fingerprints_a[idx], fingerprints_b[idx] ].max
        }
      end

      (all_p_sum > 0.0) ? (common_p_sum/all_p_sum) : 0.0

    end
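For hash fingerprints, the listing divides the sum of per-feature minima over the shared features by the sum of per-feature maxima over all features. A small worked example in plain Ruby may make the arithmetic concrete. The function name `tanimoto_sketch` and the feature keys are hypothetical; this is a standalone sketch of the same formula, not the library method.

```ruby
# Hypothetical standalone version of the hash branch (not part of OpenTox).
def tanimoto_sketch(fp_a, fp_b)
  common = fp_a.keys & fp_b.keys
  return 0.0 if common.empty?
  all = fp_a.keys | fp_b.keys
  # Numerator: smaller of the two values for each shared feature.
  common_sum = common.sum(0.0) { |f| [fp_a[f], fp_b[f]].min }
  # Denominator: larger value for each feature; compact drops the nil
  # that appears when a feature is missing from one fingerprint.
  all_sum = all.sum(0.0) { |f| [fp_a[f], fp_b[f]].compact.max }
  all_sum > 0.0 ? common_sum / all_sum : 0.0
end

fp_a = { "C-C" => 1.0, "C=O" => 2.0, "N" => 1.0 }
fp_b = { "C-C" => 1.0, "C=O" => 1.0, "O" => 3.0 }

similarity = tanimoto_sketch(fp_a, fp_b)
# Numerator:   min(1, 1) + min(2, 1)       = 2.0
# Denominator: max(1, 1) + max(2, 1) + 1 + 3 = 7.0
# similarity is therefore 2.0 / 7.0
```

With binary fingerprints (every value equal to 1), this reduces to the usual set-based Tanimoto coefficient: shared features divided by total distinct features.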