Class: ContentData::ContentData
- Inherits: Object
  - Object
  - ContentData::ContentData
- Defined in: lib/content_data/content_data.rb
Overview
A Content Data (CD) object holds file information as contents and instances. The file information retrieved from hardware is: checksum, size, modification time, server, device and path. These attributes are divided into content and instance attributes:
- the unique checksum and the size are content attributes
- modification time, server, device and path are instance attributes
The relationship between a content and its instances is 1:many, meaning that a content can have instances on many servers. A content also has a time attribute, whose value is the modification time of the first instance added. This can be changed with the unify_time method, which sets the time attributes of a content and all of its instances to the minimum time among them. Different files (instances) with the same content (checksum) are grouped together under that content. Interface methods include:
- iterating over contents and instances info,
- unify time, add/remove instance, queries, merge, remove directory and more.
Content info data structure:
@contents_info = { Checksum -> [size, *instances*, content_modification_time] }
*instances* = {[server,path] -> instance_modification_time }
Notes:
1. content_modification_time is the instance_modification_time of the first
instance that was added to @contents_info
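The layout above can be modelled with plain hashes. The following is a minimal sketch, not the library class itself: sketch_add_instance and the sample checksums, servers and paths are hypothetical names used only for illustration.

```ruby
# Hypothetical stand-alone model of the documented layout (not the library class):
#   contents_info:  checksum => [size, { [server, path] => instance_mtime }, content_mtime]
#   instances_info: [server, path] => checksum
def sketch_add_instance(contents_info, instances_info, checksum, size, server, path, mtime)
  location = [server, path]
  info = contents_info[checksum]
  if info.nil?
    # The first instance added also sets the content modification time.
    contents_info[checksum] = [size, { location => mtime }, mtime]
  else
    info[0] = size
    info[1][location] = mtime
  end
  instances_info[location] = checksum
end

contents_info = {}
instances_info = {}
sketch_add_instance(contents_info, instances_info, 'abc123', 10, 'server_a', '/data/f.txt', 2000)
sketch_add_instance(contents_info, instances_info, 'abc123', 10, 'server_b', '/backup/f.txt', 1500)

size, instances, content_mtime = contents_info['abc123']
puts "size=#{size} instances=#{instances.length} content_mtime=#{content_mtime}"
# prints: size=10 instances=2 content_mtime=2000
```

The content time stays at 2000, the time of the first instance, even though the second instance is older.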
Instance Method Summary
- #==(other) ⇒ Object
- #add_instance(checksum, size, server, path, modification_time) ⇒ Object
- #checksum_instances_size(checksum) ⇒ Object
- #clone_contents_info ⇒ Object
- #clone_instances_info ⇒ Object
- #content_each_instance(checksum, &block) ⇒ Object
  Iterator over the instances of a specific content. The block is provided with: checksum, size, content modification time, instance modification time, server and file path.
- #content_exists(checksum) ⇒ Object
- #contents_size ⇒ Object
- #each_content(&block) ⇒ Object
  Iterator over the @contents_info data structure (not including instances). The block is provided with: checksum, size and content modification time.
- #each_instance(&block) ⇒ Object
  Iterator over the @contents_info data structure (including instances). The block is provided with: checksum, size, content modification time, instance modification time, server and file path.
- #empty? ⇒ Boolean
- #from_file(filename) ⇒ Object
  TODO: validation that the file indeed contains ContentData is missing.
- #get_instance_mod_time(checksum, location) ⇒ Object
- #get_query(variable, params) ⇒ Object
  TODO: simplify conditions. This method is experimental and shouldn't be used. nil is used to define +/- infinity for the to/from method arguments. from/to values are exclusive in the condition's calculations. Care is needed with the '==' operation, which is used for object comparison.
- #initialize(other = nil) ⇒ ContentData (constructor)
  A new instance of ContentData.
- #instance_exists(path, server) ⇒ Object
- #instances_size ⇒ Object
- #remove_content(checksum) ⇒ Object
- #remove_directory(server, dir_to_remove) ⇒ Object
  Removes all instance records located under the input param dir_to_remove.
- #remove_instance(server, path) ⇒ Object
  Removes an instance record from both @contents_info and @instances_info.
- #to_file(filename) ⇒ Object
- #to_s ⇒ Object
- #unify_time ⇒ Object
  For each content, all time fields (content and instances) are replaced with the minimum time found across those fields.
- #unique_id ⇒ ID
  Content Data unique identification.
- #validate(params = nil) ⇒ Boolean
  Validates the index against the file system, checking that all instances hold correct data about the files they represent.
Constructor Details
#initialize(other = nil) ⇒ ContentData
Returns a new instance of ContentData.
# File 'lib/content_data/content_data.rb', line 32
def initialize(other = nil)
  if other.nil?
    @contents_info = {}  # Checksum --> [size, paths-->time(instance), time(content)]
    @instances_info = {}  # location --> checksum to optimize instances query
  else
    @contents_info = other.clone_contents_info
    @instances_info = other.clone_instances_info  # location --> checksum to optimize instances query
  end
end
Instance Method Details
#==(other) ⇒ Object
# File 'lib/content_data/content_data.rb', line 203
def ==(other)
  return false if other.nil?
  return false if @contents_info.length != other.contents_size
  other.each_instance { |checksum, size, content_mod_time, instance_mod_time, server, path|
    return false if instance_exists(path, server) != other.instance_exists(path, server)
    local_content_info = @contents_info[checksum]
    return false if local_content_info.nil?
    return false if local_content_info[0] != size
    return false if local_content_info[2] != content_mod_time
    #check instances
    local_instances = local_content_info[1]
    return false if other.checksum_instances_size(checksum) != local_instances.length
    location = [server, path]
    local_instance_mod_time = local_instances[location]
    return false if local_instance_mod_time.nil?
    return false if local_instance_mod_time != instance_mod_time
  }
  true
end
#add_instance(checksum, size, server, path, modification_time) ⇒ Object
# File 'lib/content_data/content_data.rb', line 138
def add_instance(checksum, size, server, path, modification_time)
  location = [server, path]
  content_info = @contents_info[checksum]
  if content_info.nil?
    @contents_info[checksum] = [size,
                                {location => modification_time},
                                modification_time]
  else
    if size != content_info[0]
      Log.warning('File size different from content size while same checksum')
      Log.warning("instance location:server:'#{location[0]}' path:'#{location[1]}'")
      Log.warning("instance mod time:'#{modification_time}'")
    end
    #override file if needed
    content_info[0] = size
    instances = content_info[1]
    instances[location] = modification_time
  end
  @instances_info[location] = checksum
end
#checksum_instances_size(checksum) ⇒ Object
# File 'lib/content_data/content_data.rb', line 125
def checksum_instances_size(checksum)
  content_info = @contents_info[checksum]
  return 0 if content_info.nil?
  content_info[1].length
end
#clone_contents_info ⇒ Object
# File 'lib/content_data/content_data.rb', line 55
def clone_contents_info
  @contents_info.keys.inject({}) { |clone_contents_info, checksum|
    instances = @contents_info[checksum]
    size = instances[0]
    content_time = instances[2]
    instances_db = instances[1]
    instances_db_cloned = {}
    instances_db.keys.each { |location|
      instance_mtime = instances_db[location]
      instances_db_cloned[[location[0].clone,location[1].clone]]=instance_mtime
    }
    clone_contents_info[checksum] = [size,
                                     instances_db_cloned,
                                     content_time]
    clone_contents_info
  }
end
#clone_instances_info ⇒ Object
# File 'lib/content_data/content_data.rb', line 48
def clone_instances_info
  @instances_info.keys.inject({}) { |clone_instances_info, location|
    clone_instances_info[[location[0].clone, location[1].clone]] = @instances_info[location].clone
    clone_instances_info
  }
end
#content_each_instance(checksum, &block) ⇒ Object
Iterator over the instances of a specific content. The block is provided with: checksum, size, content modification time, instance modification time, server and file path.
# File 'lib/content_data/content_data.rb', line 102
def content_each_instance(checksum, &block)
  content_info = @contents_info[checksum]
  content_info[1].keys.each {|location|
    # provide the block with: checksum, size, content modification time,instance modification time,
    #   server and path.
    instance_modification_time = content_info[1][location]
    block.call(checksum,content_info[0], content_info[2], instance_modification_time,
               location[0], location[1])
  }
end
#content_exists(checksum) ⇒ Object
# File 'lib/content_data/content_data.rb', line 163
def content_exists(checksum)
  @contents_info.has_key?(checksum)
end
#contents_size ⇒ Object
# File 'lib/content_data/content_data.rb', line 113
def contents_size()
  @contents_info.length
end
#each_content(&block) ⇒ Object
Iterator over the @contents_info data structure (not including instances). The block is provided with: checksum, size and content modification time.
# File 'lib/content_data/content_data.rb', line 75
def each_content(&block)
  @contents_info.keys.each { |checksum|
    content_val = @contents_info[checksum]
    # provide checksum, size and content modification time to the block
    block.call(checksum,content_val[0], content_val[2])
  }
end
#each_instance(&block) ⇒ Object
Iterator over the @contents_info data structure (including instances). The block is provided with: checksum, size, content modification time, instance modification time, server and file path.
# File 'lib/content_data/content_data.rb', line 86
def each_instance(&block)
  @contents_info.keys.each { |checksum|
    content_info = @contents_info[checksum]
    content_info[1].keys.each {|location|
      # provide the block with: checksum, size, content modification time,instance modification time,
      #   server and path.
      instance_modification_time = content_info[1][location]
      block.call(checksum,content_info[0], content_info[2], instance_modification_time,
                 location[0], location[1])
    }
  }
end
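To illustrate the order of the six values handed to the block, here is a small sketch over a plain hash with the documented layout. sketch_each_instance and the sample data are hypothetical; the real method iterates over the object's own @contents_info.

```ruby
# Hypothetical stand-alone iterator over a hash shaped like @contents_info:
#   checksum => [size, { [server, path] => instance_mtime }, content_mtime]
def sketch_each_instance(contents_info)
  contents_info.each do |checksum, (size, instances, content_mtime)|
    instances.each do |(server, path), instance_mtime|
      # Same argument order as documented for each_instance.
      yield checksum, size, content_mtime, instance_mtime, server, path
    end
  end
end

contents = {
  'abc123' => [10, { ['server_a', '/data/f.txt'] => 2000,
                     ['server_b', '/backup/f.txt'] => 1500 }, 2000]
}

sketch_each_instance(contents) do |checksum, size, content_mtime, instance_mtime, server, path|
  puts "#{checksum} #{size} #{content_mtime} #{instance_mtime} #{server}:#{path}"
end
```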
#empty? ⇒ Boolean
# File 'lib/content_data/content_data.rb', line 159
def empty?
  @contents_info.empty?
end
#from_file(filename) ⇒ Object
TODO: validation that the file indeed contains ContentData is missing.
# File 'lib/content_data/content_data.rb', line 266
def from_file(filename)
  lines = IO.readlines(filename)
  number_of_contents = lines[0].to_i
  i = 1 + number_of_contents
  number_of_instances = lines[i].to_i
  i += 1
  number_of_instances.times {
    if lines[i].nil?
      Log.warning("line ##{i} is nil !!!, Backing filename: #{filename} to #{filename}.bad")
      FileUtils.cp(filename, "#{filename}.bad")
      Log.warning("Lines:\n#{lines[i].join("\n")}")
    else
      parameters = lines[i].split(',')
      # bugfix: if file name consist a comma then parsing based on comma separating fails
      if (parameters.size > 5)
        (4..parameters.size-2).each do |i|
          parameters[3] = [parameters[3], parameters[i]].join(",")
        end
        (4..parameters.size-2).each do |i|
          parameters.delete_at(4)
        end
      end

      add_instance(parameters[0],
                   parameters[1].to_i,
                   parameters[2],
                   parameters[3],
                   DateTime.parse(parameters[4]).to_time.to_i)
    end
    i += 1
  }
end
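The comma handling in the listing can be hard to follow. The sketch below shows the same idea in isolation: the first three fields and the last field are fixed, and everything in between is rejoined into the path. sketch_parse_instance_line is a hypothetical helper, and it assumes that the checksum, size, server and time fields never contain a comma themselves.

```ruby
# Hypothetical helper: parse one "checksum,size,server,path,mtime" line
# where the path itself may contain commas.
def sketch_parse_instance_line(line)
  fields = line.chomp.split(',')
  checksum = fields[0]
  size     = fields[1].to_i
  server   = fields[2]
  mtime    = fields[-1]               # last field is the time string
  path     = fields[3..-2].join(',')  # rejoin whatever lies in between
  [checksum, size, server, path, mtime]
end

p sketch_parse_instance_line("abc123,10,server_a,/data/a,b.txt,2013-01-01 00:00:00 +0000\n")
```

The time is left as a string here; the listing converts it with DateTime.parse.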
#get_instance_mod_time(checksum, location) ⇒ Object
# File 'lib/content_data/content_data.rb', line 131
def get_instance_mod_time(checksum, location)
  content_info = @contents_info[checksum]
  return nil if content_info.nil?
  instances = content_info[1]
  instance_time = instances[location]
end
#get_query(variable, params) ⇒ Object
TODO: simplify conditions. This method is experimental and shouldn't be used. nil is used to define +/- infinity for the to/from method arguments. from/to values are exclusive in the condition's calculations. Care is needed with the '==' operation, which is used for object comparison. If needed, the user should define their own '==' implementation.
# File 'lib/content_data/content_data.rb', line 454
def get_query(variable, params)
  raise RuntimeError.new 'This method is experimental and shouldn\'t be used'

  exact = params['exact'].nil? ? Array.new : params['exact']
  from = params['from']
  to = params ['to']
  is_inside = params['is_inside']

  unless ContentInstance.new.instance_variable_defined?("@#{attribute}")
    raise ArgumentError "#{variable} isn't a ContentInstance variable"
  end

  if (exact.nil? && from.nil? && to.nil?)
    raise ArgumentError 'At least one of the argiments {exact, from, to} must be defined'
  end

  if (!(from.nil? || to.nil?) && from.kind_of?(to.class))
    raise ArgumentError 'to and from arguments should be comparable one with another'
  end

  # FIXME add support for from/to for Strings
  if ((!from.nil? && !from.kind_of?(Numeric.new.class))\
      || (!to.nil? && to.kind_of?(Numeric.new.class)))
    raise ArgumentError 'from and to options supported only for numeric values'
  end

  if (!exact.empty? && (!from.nil? || !to.nil?))
    raise ArgumentError 'exact and from/to options are mutually exclusive'
  end

  result_index = ContentData.new
  instances.each_value do |instance|
    is_match = false
    var_value = instance.instance_variable_get("@#{variable}")

    if exact.include? var_value
      is_match = true
    elsif (from.nil? || var_value > from) && (to.nil? || var_value < to)
      is_match = true
    end

    if (is_match && is_inside) || (!is_match && !is_inside)
      checksum = instance.checksum
      result_index.add_content(contents[checksum]) unless result_index.content_exists(checksum)
      result_index.add_instance instance
    end
  end
  result_index
end
#instance_exists(path, server) ⇒ Object
# File 'lib/content_data/content_data.rb', line 167
def instance_exists(path, server)
  @instances_info.has_key?([server, path])
end
#instances_size ⇒ Object
# File 'lib/content_data/content_data.rb', line 117
def instances_size()
  counter=0
  @contents_info.values.each { |content_info|
    counter += content_info[1].length
  }
  counter
end
#remove_content(checksum) ⇒ Object
# File 'lib/content_data/content_data.rb', line 223
def remove_content(checksum)
  content_info = @contents_info[checksum]
  if content_info
    content_info[1].each_key { |location|
      @instances_info.delete(location)
    }
    @contents_info.delete(checksum)
  end
end
#remove_directory(server, dir_to_remove) ⇒ Object
Removes all instance records located under the input param dir_to_remove. Found records are removed from both @contents_info and @instances_info. The input params server and dir_to_remove are checked against each instance's unique key (called location). A content is also removed if it becomes empty after its instances are removed.
# File 'lib/content_data/content_data.rb', line 189
def remove_directory(server, dir_to_remove)
  @contents_info.keys.each { |checksum|
    instances = @contents_info[checksum][1]
    instances.each_key { |location|
      if location[0] == server and location[1].scan(dir_to_remove).size > 0
        instances.delete(location)
        @instances_info.delete(location)
      end
    }
    @contents_info.delete(checksum) if instances.empty?
  }
end
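The listing matches dir_to_remove anywhere in the path via String#scan. As a point of comparison, the sketch below applies the same two-index cleanup to plain hashes but with a stricter prefix test, which is an assumption about the intent of "located under" and not the library's behaviour. sketch_remove_directory and the sample data are hypothetical.

```ruby
# Hypothetical variant on plain hashes: remove every instance on `server`
# whose path lies under `dir`, and drop contents left without instances.
# Assumption: "under" means the path starts with "dir/".
def sketch_remove_directory(contents_info, instances_info, server, dir)
  prefix = dir.end_with?('/') ? dir : "#{dir}/"
  contents_info.keys.each do |checksum|
    instances = contents_info[checksum][1]
    # Iterate over a snapshot of the keys so deleting is safe.
    instances.keys.each do |location|
      next unless location[0] == server && location[1].start_with?(prefix)
      instances.delete(location)
      instances_info.delete(location)
    end
    contents_info.delete(checksum) if instances.empty?
  end
end

contents = {
  'a' => [1, { ['s', '/data/x'] => 1, ['s', '/database/y'] => 1 }, 1],
  'b' => [2, { ['s', '/data/z'] => 2 }, 2]
}
index = { ['s', '/data/x'] => 'a', ['s', '/database/y'] => 'a', ['s', '/data/z'] => 'b' }

sketch_remove_directory(contents, index, 's', '/data')
p contents.keys  # content 'b' had only one instance under /data, so it is gone
```

With a substring test, '/database/y' would also match '/data'; the prefix test keeps it.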
#remove_instance(server, path) ⇒ Object
Removes an instance record from both @contents_info and @instances_info. The input params server and path form the instance's unique key (called location). The content is also removed if it becomes empty after the instance is removed.
# File 'lib/content_data/content_data.rb', line 174
def remove_instance(server, path)
  location = [server, path]
  checksum = @instances_info[location]
  content_info = @contents_info[checksum]
  return nil if content_info.nil?
  instances = content_info[1]
  instances.delete(location)
  @contents_info.delete(checksum) if instances.empty?
  @instances_info.delete(location)
end
#to_file(filename) ⇒ Object
# File 'lib/content_data/content_data.rb', line 250
def to_file(filename)
  content_data_dir = File.dirname(filename)
  FileUtils.makedirs(content_data_dir) unless File.directory?(content_data_dir)
  file = File.open(filename, 'w')
  file.write("#{@contents_info.length}\n")
  each_content { |checksum, size, content_mod_time|
    file.write("#{checksum},#{size},#{Time.at(content_mod_time)}\n")
  }
  file.write("#{@instances_info.length}\n")
  each_instance { |checksum, size, _, instance_mod_time, server, path|
    file.write("#{checksum},#{size},#{server},#{path},#{Time.at(instance_mod_time)}\n")
  }
  file.close
end
#to_s ⇒ Object
# File 'lib/content_data/content_data.rb', line 233
def to_s
  return_str = ""
  contents_str = ""
  instances_str = ""
  each_content { |checksum, size, content_mod_time|
    contents_str << "%s,%d,%d\n" % [checksum, size, content_mod_time]
  }
  each_instance { |checksum, size, content_mod_time, instance_mod_time, server, path|
    instances_str <<  "%s,%d,%s,%s,%d\n" % [checksum, size, server, path, instance_mod_time]
  }
  return_str << "%d\n" % [@contents_info.length]
  return_str << contents_str
  return_str << "%d\n" % [@instances_info.length]
  return_str << instances_str
  return_str
end
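The text layout produced by to_s is a count line, the content rows, a second count line, then the instance rows. The sketch below reproduces that layout for a plain hash with the documented structure. sketch_serialize and the sample data are hypothetical and only illustrate the format.

```ruby
# Hypothetical serializer for a hash shaped like @contents_info:
#   checksum => [size, { [server, path] => instance_mtime }, content_mtime]
def sketch_serialize(contents_info)
  content_lines = []
  instance_lines = []
  contents_info.each do |checksum, info|
    size, instances, content_mtime = info
    content_lines << "#{checksum},#{size},#{content_mtime}"
    instances.each do |(server, path), instance_mtime|
      instance_lines << "#{checksum},#{size},#{server},#{path},#{instance_mtime}"
    end
  end
  # count, content rows, count, instance rows; one item per line
  rows = [content_lines.length] + content_lines + [instance_lines.length] + instance_lines
  rows.join("\n") + "\n"
end

puts sketch_serialize('abc123' => [10, { ['server_a', '/data/f.txt'] => 2000 }, 2000])
# prints:
# 1
# abc123,10,2000
# 1
# abc123,10,server_a,/data/f.txt,2000
```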
#unify_time ⇒ Object
For each content, all time fields (content and instances) are replaced with the minimum time found across those fields.
# File 'lib/content_data/content_data.rb', line 301
def unify_time()
  @contents_info.keys.each { |checksum|
    content_info = @contents_info[checksum]
    min_time_per_checksum = content_info[2]
    instances = content_info[1]
    instances.keys.each { |location|
      instance_mod_time = instances[location]
      if instance_mod_time < min_time_per_checksum
        min_time_per_checksum = instance_mod_time
      end
    }
    # update all instances with min time
    instances.keys.each { |location|
      instances[location] = min_time_per_checksum
    }
    # update content time with min time
    content_info[2] = min_time_per_checksum
  }
end
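A compact way to see the effect is to apply the same rule to a plain hash. This is a sketch under the documented layout, not the library method: sketch_unify_time and the sample values are hypothetical.

```ruby
# Hypothetical stand-alone version: for every content, set the content time
# and all instance times to the minimum of those values.
def sketch_unify_time(contents_info)
  contents_info.each_value do |info|
    instances = info[1]
    min_time = ([info[2]] + instances.values).min
    instances.keys.each { |location| instances[location] = min_time }
    info[2] = min_time
  end
  contents_info
end

contents = {
  'abc123' => [10, { ['server_a', '/data/f.txt'] => 2000,
                     ['server_b', '/backup/f.txt'] => 1500 }, 2000]
}
sketch_unify_time(contents)
p contents['abc123'][2]         # content time after unification
p contents['abc123'][1].values  # instance times after unification
```

Both the content time and every instance time end up at 1500, the smallest value present.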
#unique_id ⇒ ID
Content Data unique identification
# File 'lib/content_data/content_data.rb', line 44
def unique_id
  @instances_info.hash
end
#validate(params = nil) ⇒ Boolean
Validates the index against the file system, checking that all instances hold correct data about the files they represent.
There are two levels of validation, controlled by the instance_check_level system parameter:
- shallow - quick; tests the instance for file existence and attributes.
- deep - can take more time; in addition to the shallow checks, recalculates the hash sum.
# File 'lib/content_data/content_data.rb', line 333
def validate(params = nil)
  # used to answer whether specific param was set
  param_exists = Proc.new do |param|
    !(params.nil? || params[param].nil?)
  end

  # used to process method parameters centrally
  process_params = Proc.new do |values|
    if param_exists.call(:failed)
      info = values[:details]
      unless info.nil?
        checksum = info[0]
        content_mtime = info[1]
        size = info[2]
        inst_mtime = info[3]
        server = info[4]
        file_path = info[5]
        params[:failed].add_instance(checksum, size, server, file_path, inst_mtime)
      end
    end
  end

  is_valid = true
  @contents_info.keys.each { |checksum|
    instances = @contents_info[checksum]
    content_size = instances[0]
    content_mtime = instances[2]
    instances[1].keys.each { |unique_path|
      instance_mtime = instances[1][unique_path]
      instance_info = [checksum, content_mtime, content_size, instance_mtime]
      instance_info.concat(unique_path)
      unless check_instance(instance_info)
        is_valid = false

        unless params.nil? || params.empty?
          process_params.call({:details => instance_info})
        end
      end
    }
  }
  is_valid
end
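The check_instance helper called above is not part of this listing. The following is a hedged sketch of what a shallow and a deep check could look like for a single file. The name sketch_check_instance, the deep keyword argument and the choice of SHA1 are all assumptions made for illustration; the library's actual checksum algorithm and parameter handling may differ.

```ruby
require 'digest/sha1'

# Hypothetical per-instance check (not the library's check_instance).
# Shallow: the file exists and its size and modification time match the index.
# Deep: additionally recompute the checksum (SHA1 is an assumption here).
def sketch_check_instance(path, size, mtime, checksum, deep: false)
  return false unless File.exist?(path)
  stat = File.stat(path)
  return false unless stat.size == size && stat.mtime.to_i == mtime
  return true unless deep
  Digest::SHA1.file(path).hexdigest == checksum
end
```

A caller would typically run the shallow form on every instance and reserve deep: true for periodic full verification, since rehashing reads the whole file.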