Class: BrowserWebData::EntitySumarization::PredicatesSimilarity
- Inherits: Object
  - Object
  - BrowserWebData::EntitySumarization::PredicatesSimilarity
- Includes: BrowserWebData::EntitySumarizationConfig
- Defined in: lib/browser_web_data_entity_sumarization/entity_sumarization_predicates_similarity.rb
Overview
This class provides methods for identifying identical predicates.
Constant Summary
Constants included from BrowserWebData::EntitySumarizationConfig
BrowserWebData::EntitySumarizationConfig::COMMON_PROPERTIES, BrowserWebData::EntitySumarizationConfig::IDENTICAL_PROPERTY_LIMIT, BrowserWebData::EntitySumarizationConfig::IMPORTANCE_TO_IDENTIFY_MAX_COUNT, BrowserWebData::EntitySumarizationConfig::NO_SENSE_PROPERTIES, BrowserWebData::EntitySumarizationConfig::SCAN_REGEXP
Class Method Summary
- .get_key(predicates) ⇒ String
  Returns the key for a group of identical predicates.
- .parse_key(key) ⇒ Array<String>
  Returns the identical predicates encoded in a key.
Instance Method Summary
- #add_different(values) ⇒ Object
  Adds a new group of different values to local storage.
- #add_identical(values) ⇒ Object
  Adds a new group of identical values to local storage.
- #find_different(value) ⇒ String, NilClass
  Checks whether the given predicates are already marked as different.
- #find_identical(value) ⇒ String, NilClass
  Checks whether the given predicates are already marked as identical.
- #identify_identical_predicates(predicates, identical_limit = @identical_limit) ⇒ Object
  Verifies every combination of two predicates.
- #initialize(results_dir_path, identical_limit = IDENTICAL_PROPERTY_LIMIT, console_output = false) ⇒ PredicatesSimilarity (constructor)
  Creates a new instance of the PredicatesSimilarity class.
- #is_identical_property_ontology?(values) ⇒ TrueClass, FalseClass
  Automatically identifies identical predicates when a DBpedia property and a DBpedia ontology predicate share the same name.
- #recursive_find_identical(keys, values) ⇒ Array<String>
  Collects chains of identical predicates.
- #reduce_identical ⇒ Object
  Reduces identical predicates by joining groups that share a common predicate.
Constructor Details
#initialize(results_dir_path, identical_limit = IDENTICAL_PROPERTY_LIMIT, console_output = false) ⇒ PredicatesSimilarity
Creates a new instance of the PredicatesSimilarity class.

# File 'lib/browser_web_data_entity_sumarization/entity_sumarization_predicates_similarity.rb', line 22
def initialize(results_dir_path, identical_limit = IDENTICAL_PROPERTY_LIMIT, console_output = false)
  @results_dir_path = results_dir_path
  @console_output = console_output
  @identical_limit = identical_limit
  @query = SPARQLRequest.new

  load_identical_predicates
  load_different_predicates
  load_counts
end
Class Method Details
.get_key(predicates) ⇒ String
Returns the key for a group of identical predicates. A single predicate is wrapped in an Array; the predicates are then sorted and joined into the form "<a><b>". Returns nil for an empty Array.

# File 'lib/browser_web_data_entity_sumarization/entity_sumarization_predicates_similarity.rb', line 40
def self.get_key(predicates)
  predicates = [predicates] unless predicates.is_a?(Array)
  "<#{predicates.sort.join('><')}>" if predicates && !predicates.empty?
end
.parse_key(key) ⇒ Array<String>
Returns the identical predicates encoded in a key.

# File 'lib/browser_web_data_entity_sumarization/entity_sumarization_predicates_similarity.rb', line 51
def self.parse_key(key)
  key.to_s.scan(SCAN_REGEXP[:identical_key]).reduce(:+)
end
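The two class methods above are meant to round-trip: get_key encodes a group of predicates and parse_key decodes it. The following is a minimal standalone sketch of that round trip. The names build_key, split_key and IDENTICAL_KEY_PATTERN are hypothetical, and the pattern is only an assumption, since the real value of SCAN_REGEXP[:identical_key] is not shown on this page.

```ruby
# Assumed pattern; the library's SCAN_REGEXP[:identical_key] may differ.
IDENTICAL_KEY_PATTERN = /<([^>]+)>/

# Mirrors get_key: wrap a single value, sort, and join into "<a><b>" form.
def build_key(predicates)
  predicates = [predicates] unless predicates.is_a?(Array)
  "<#{predicates.sort.join('><')}>" if predicates && !predicates.empty?
end

# Mirrors parse_key: scan yields one single-element Array per match,
# and reduce(:+) concatenates them (nil when nothing matches).
def split_key(key)
  key.to_s.scan(IDENTICAL_KEY_PATTERN).reduce(:+)
end

key = build_key(['http://dbpedia.org/property/birthPlace',
                 'http://dbpedia.org/ontology/birthPlace'])
predicates = split_key(key)
```

Because the predicates are sorted before joining, the same group always yields the same key regardless of input order.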
Instance Method Details
#add_different(values) ⇒ Object
Adds a new group of different values to local storage. As the listing shows, the stored keys are written out only after more than 100 new groups have been added since the last write.

# File 'lib/browser_web_data_entity_sumarization/entity_sumarization_predicates_similarity.rb', line 173
def add_different(values)
  values = values.map { |p| p.to_s }.uniq.sort
  group_key = PredicatesSimilarity.get_key(values)

  unless @different_predicates.include?(group_key)
    @different_predicates << group_key

    @new_diff_counter ||= 0
    @new_diff_counter += 1

    if @new_diff_counter > 100
      store_different_predicates
      @new_diff_counter = 0
    end
  end
end
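The counter in add_different batches writes so that storage is not touched on every new key. A rough sketch of that pattern in isolation follows; the BatchedStore class and its flush method are hypothetical stand-ins, and flush only counts calls instead of writing a file.

```ruby
# Hypothetical illustration of counter-based batched persistence.
class BatchedStore
  attr_reader :items, :flush_count

  def initialize(batch_size)
    @batch_size = batch_size
    @items = []
    @pending = 0
    @flush_count = 0
  end

  # Adds an item once; flushes when more than batch_size new items are pending.
  def add(item)
    return if @items.include?(item)

    @items << item
    @pending += 1
    flush if @pending > @batch_size
  end

  # Stand-in for writing @items to disk.
  def flush
    @flush_count += 1
    @pending = 0
  end
end

store = BatchedStore.new(100)
101.times { |i| store.add("key-#{i}") }
store.add('key-0') # duplicate, ignored
```

With a batch size of 100 the first flush happens on the 101st new key, matching the `> 100` comparison in the listing. A caller using this sketch would need one explicit flush at the end to persist the remaining pending items.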
#add_identical(values) ⇒ Object
Adds a new group of identical values to local storage. Unlike #add_different, every new group is stored immediately.

# File 'lib/browser_web_data_entity_sumarization/entity_sumarization_predicates_similarity.rb', line 159
def add_identical(values)
  values = values.map { |p| p.to_s }.uniq.sort
  group_key = PredicatesSimilarity.get_key(values)

  unless @identical_predicates.include?(group_key)
    @identical_predicates << group_key
    store_identical_properties
  end
end
#find_different(value) ⇒ String, NilClass
Checks whether the given predicate, or pair of predicates, is already marked as different. Raises a RuntimeError when more than two predicates are given.

# File 'lib/browser_web_data_entity_sumarization/entity_sumarization_predicates_similarity.rb', line 140
def find_different(value)
  raise RuntimeError.new('No support identify identical for more than 2 predicates.') if value.is_a?(Array) && value.size > 2

  key = case value
        when Array
          value = value.map { |v| PredicatesSimilarity.get_key(v) }
          @different_predicates.find { |p| p[value[0]] && p[value[1]] }
        else
          value = PredicatesSimilarity.get_key(value)
          @different_predicates.find { |p| p[value] }
        end

  PredicatesSimilarity.parse_key(key)
end
#find_identical(value) ⇒ String, NilClass
Checks whether the given predicate, or pair of predicates, is already marked as identical. Raises a RuntimeError when more than two predicates are given.

# File 'lib/browser_web_data_entity_sumarization/entity_sumarization_predicates_similarity.rb', line 115
def find_identical(value)
  raise RuntimeError.new('No support identify identical for more than 2 predicates.') if value.is_a?(Array) && value.size > 2

  predicates_key = case value
                   when Array
                     value = value.map { |v| PredicatesSimilarity.get_key(v) }
                     @identical_predicates.find { |p| p[value[0]] && p[value[1]] }
                   else
                     value = PredicatesSimilarity.get_key(value)
                     @identical_predicates.find { |p| p[value] }
                   end

  PredicatesSimilarity.parse_key(predicates_key)
end
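Both find methods look up stored keys by substring: each predicate is wrapped as "<predicate>" and tested against every stored key with String#[]. A small sketch of that lookup, using a hypothetical find_pair helper and made-up example.org predicates:

```ruby
# Hypothetical stored keys in the "<a><b>" format produced by get_key.
stored_keys = [
  '<http://example.org/a><http://example.org/b>',
  '<http://example.org/c><http://example.org/d>'
]

# String#[] with a String argument returns the substring when present, else nil,
# so each wrapped predicate acts as a containment test against the stored key.
def find_pair(stored_keys, first, second)
  wrapped = [first, second].map { |p| "<#{p}>" }
  stored_keys.find { |key| key[wrapped[0]] && key[wrapped[1]] }
end

hit = find_pair(stored_keys, 'http://example.org/b', 'http://example.org/a')
miss = find_pair(stored_keys, 'http://example.org/a', 'http://example.org/c')
```

Wrapping each predicate in angle brackets keeps a predicate from matching a longer predicate that merely starts with the same text. The argument order does not matter, because both wrapped values only need to occur somewhere in the key.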
#identify_identical_predicates(predicates, identical_limit = @identical_limit) ⇒ Object
Verifies every combination of two predicates. Each evaluated combination is stored in one of two files, identical_predicates.json or different_predicates.json; each file contains an Array of combination keys. The given predicates are reduced to the first IMPORTANCE_TO_IDENTIFY_MAX_COUNT (250) entries.

# File 'lib/browser_web_data_entity_sumarization/entity_sumarization_predicates_similarity.rb', line 62
def identify_identical_predicates(predicates, identical_limit = @identical_limit)
  combination = predicates.take(IMPORTANCE_TO_IDENTIFY_MAX_COUNT).map { |p| p.to_sym }.combination(2)
  times_count = combination.size / 10.0

  combination.each_with_index { |values, i|
    already_mark_same = find_identical(values)
    already_mark_different = find_different(values)

    if already_mark_same.nil? && already_mark_different.nil?
      # in case of dbpedia ontology vs. property
      # automatically became identical
      unless is_identical_property_ontology?(values)
        unless @counts[values[0]]
          @counts[values[0]] = @query.get_count_of_identical_predicates(values[0])
        end

        unless @counts[values[1]]
          @counts[values[1]] = @query.get_count_of_identical_predicates(values[1])
        end

        x = @counts[values[0]]
        y = @counts[values[1]]
        z = @query.get_count_of_identical_predicates(values)

        identical_level = z / [x, y].max

        if identical_level >= identical_limit
          puts " - result[#{identical_level}] z[#{z}] x[#{x}] y[#{y}] #{values.inspect}" if @console_output
          add_identical(values)
        else
          add_different(values)
        end
      end
    end

    if @console_output && ( i == 0 || (i+1) % times_count == 0 )
      puts "#{Time.now.localtime} | #{(((i+1)/combination.size.to_f) * 100).round(0)}% | [#{(i+1)}/#{combination.size}]"
    end
  }

  store_counts
end
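The decision in the listing comes down to one ratio: the joint count z divided by the larger of the two individual counts x and y, compared against the limit. The sketch below isolates that calculation. The helper name identical_level, the zero guard, the explicit to_f and the threshold 0.8 are all assumptions added for illustration. The return type of get_count_of_identical_predicates is not shown here; if it returns Integers, Ruby's `/` would truncate the ratio, which is why the sketch converts to Float.

```ruby
# Hypothetical helper: z is the joint count, x and y are the individual counts.
def identical_level(x, y, z)
  largest = [x, y].max
  return 0.0 if largest.zero? # guard added for this sketch; the listing has no such check

  z.to_f / largest # to_f avoids Integer truncation (108 / 120 == 0 with Integers)
end

limit = 0.8 # illustrative threshold, not the library's IDENTICAL_PROPERTY_LIMIT
level = identical_level(100, 120, 108)
verdict = level >= limit ? :identical : :different
```

Dividing by the larger count makes the measure conservative: a rarely used predicate that always co-occurs with a very common one still scores low.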
#is_identical_property_ontology?(values) ⇒ TrueClass, FalseClass
Automatically identifies identical predicates in the case of a DBpedia property versus a DBpedia ontology predicate. When both predicates end in the same name and the pair contains one 'property/' and one 'ontology/' predicate, the pair is stored as identical and the method returns true.

# File 'lib/browser_web_data_entity_sumarization/entity_sumarization_predicates_similarity.rb', line 198
def is_identical_property_ontology?(values)
  group_key = PredicatesSimilarity.get_key(values)

  temp = values.map { |val| val.to_s.split('/').last }.uniq
  if temp.size == 1 && group_key['property/'] && group_key['ontology/']
    add_identical(values)
    true
  else
    false
  end
end
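The name comparison can be tried on its own. This sketch uses a hypothetical same_name_property_and_ontology? helper that reproduces only the test, without the add_identical side effect of the listing; the DBpedia URIs are examples.

```ruby
# Hypothetical, side-effect-free version of the check in the listing.
def same_name_property_and_ontology?(predicates)
  # Compare only the last path segment of each predicate.
  names = predicates.map { |p| p.to_s.split('/').last }.uniq
  joined = predicates.map(&:to_s).join(' ')

  names.size == 1 && joined.include?('property/') && joined.include?('ontology/')
end

same = same_name_property_and_ontology?(
  ['http://dbpedia.org/property/birthPlace', 'http://dbpedia.org/ontology/birthPlace']
)
other = same_name_property_and_ontology?(
  ['http://dbpedia.org/property/birthPlace', 'http://dbpedia.org/ontology/deathPlace']
)
```

Here `same` is true and `other` is false, because the second pair differs in its final segment.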
#recursive_find_identical(keys, values) ⇒ Array<String>
Collects chains of identical predicates. Starting from the given values, it follows every stored key that shares at least one predicate with them and returns the accumulated predicates.

# File 'lib/browser_web_data_entity_sumarization/entity_sumarization_predicates_similarity.rb', line 238
def recursive_find_identical(keys, values)
  keys = [keys] unless keys.is_a?(Array)

  @identical_predicates.each { |this_key|
    next if keys.include?(this_key)
    temp = PredicatesSimilarity.parse_key(this_key)

    unless (temp & values).empty?
      keys << this_key
      return recursive_find_identical(keys, (values + temp).uniq)
    end
  }

  values
end
#reduce_identical ⇒ Object
Reduces identical predicates by joining groups that share a common predicate.

# File 'lib/browser_web_data_entity_sumarization/entity_sumarization_predicates_similarity.rb', line 213
def reduce_identical
  new_identical = []

  @identical_predicates.each { |key|
    values = PredicatesSimilarity.parse_key(key)
    next if new_identical.find { |v| !(v & values).empty? }

    ## find nodes with values predicates
    values = recursive_find_identical(key, values)

    new_identical << values.uniq.sort
  }

  @identical_predicates = new_identical.map { |v| PredicatesSimilarity.get_key(v) }

  store_identical_properties
end
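The effect of reduce_identical is that pairs such as (a, b) and (b, c) collapse into one group (a, b, c). The sketch below shows the same outcome with an iterative merge rather than the library's recursive search; merge_overlapping is a hypothetical helper working on plain Arrays instead of stored keys.

```ruby
# Hypothetical iterative alternative: merge every group that shares a member.
def merge_overlapping(groups)
  merged = []

  groups.each do |group|
    current = group.uniq
    # Pull out every already-merged group that shares a predicate with this one.
    overlapping, merged = merged.partition { |other| !(other & current).empty? }
    overlapping.each { |other| current |= other }
    merged << current.sort
  end

  merged
end

groups = [%w[a b], %w[c d], %w[b c], %w[x y]]
result = merge_overlapping(groups)
```

In this example the pair (b, c) bridges the first two groups, so `result` holds one group of a, b, c, d and a separate group of x, y.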