Class: ContentData::ContentData
- Inherits: Object
  - Object
  - ContentData::ContentData
- Defined in: lib/content_data/content_data.rb
Overview
A Content Data (CD) object holds file information as contents and instances. The file info is retrieved from hardware: checksum, size, modification time, server, device and path. Those attributes are divided into content and instance attributes:
- unique checksum and size are content attributes
- modification time, server, device and path are instance attributes
The relationship between a content and its instances is 1:many, meaning that a content can have instances on many servers. A content also has a time attribute, which holds the time of the first instance. This can be changed with the unify_time method, which sets all time attributes of a content and its instances to the minimum time of all. Different files (instances) with the same content (checksum) are grouped together under that content. Interface methods include:
- iterating over contents and instances info,
- unify time, add/remove instance, queries, merge, remove directory and more.
Content info data structure:
@contents_info = { Checksum -> [size, *instances*, content_modification_time] }
*instances* = {[server,path] -> [instance_modification_time,index_time] }
Notes:
1. content_modification_time is the instance_modification_time of the first
instance that was added to @contents_info
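The layout above can be sketched with plain Ruby Hashes. This is only an illustration of the documented shape, not the library's code: `add_record` is a hypothetical helper, and the checksums, servers, paths and times are made-up values.

```ruby
# Plain-Hash sketch of the documented layout (not the library's implementation).
contents_info  = {} # checksum => [size, instances, content_modification_time]
instances_info = {} # [server, path] => checksum (reverse index for lookups)

# Hypothetical helper that stores one instance in both structures.
def add_record(contents_info, instances_info, checksum, size, server, path, mod_time, index_time)
  location = [server, path]
  # The first instance added for a checksum fixes the content modification time.
  entry = (contents_info[checksum] ||= [size, {}, mod_time])
  entry[1][location] = [mod_time, index_time]
  instances_info[location] = checksum
end

add_record(contents_info, instances_info, 'abc123', 10, 'server1', '/data/a.txt', 1000, 5000)
add_record(contents_info, instances_info, 'abc123', 10, 'server2', '/backup/a.txt', 900, 5001)

size, instances, content_time = contents_info['abc123']
```

Here one content ends up with two instances, and `content_time` keeps the modification time of the instance that was added first, even though the second instance is older.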
Constant Summary
- CHUNK_SIZE = 5000
Instance Method Summary
- #==(other) ⇒ Object
- #add_instance(checksum, size, server, path, modification_time, index_time = Time.now.to_i) ⇒ Object
- #add_symlink(server, path, target) ⇒ Object
- #checksum_instances_size(checksum) ⇒ Object
- #clone_contents_info ⇒ Object
- #clone_instances_info ⇒ Object
- #clone_symlinks_info ⇒ Object
-
#content_each_instance(checksum, &block) ⇒ Object
Iterates over the instances of a specific content. The block is provided with: checksum, size, content modification time, instance modification time, server and file path.
- #content_exists(checksum) ⇒ Object
- #contents_size ⇒ Object
-
#each_content(&block) ⇒ Object
Iterates over the @contents_info data structure (not including instances). The block is provided with: checksum, size and content modification time.
-
#each_instance(&block) ⇒ Object
Iterates over the @contents_info data structure (including instances). The block is provided with: checksum, size, content modification time, instance modification time, server, file path and instance index time.
-
#each_symlink(&block) ⇒ Object
Iterates over the @symlinks_info data structure. The block is provided with: server, file path and target.
- #empty? ⇒ Boolean
-
#from_file(filename) ⇒ Object
Imports content data from file.
-
#from_file_old(filename) ⇒ Object
TODO: validate that the file indeed contains ContentData. TODO: make this a class-level method? Loads the db from file in chunks for better memory performance.
- #get_instance_mod_time(checksum, location) ⇒ Object
-
#get_query(variable, params) ⇒ Object
TODO: simplify conditions. This method is experimental and shouldn't be used. nil is used to define +/- infinity for the to/from method arguments. from/to values are exclusive in the condition's calculations. Take care with the '==' operation that is used for object comparison.
-
#initialize(other = nil) ⇒ ContentData
Constructor.
NOTE: Cloning is a time/memory expensive operation.
- #instance_exists(path, server) ⇒ Object
- #instances_size ⇒ Object
- #read_old_contents_chunk(filename, file, chunk_size) ⇒ Object
- #read_old_instances_chunk(filename, file, chunk_size) ⇒ Object
- #remove_content(checksum) ⇒ Object
-
#remove_directory(dir_to_remove, server) ⇒ Object
Removes all instance records located under the input param dir_to_remove.
-
#remove_instance(server, path) ⇒ Object
Removes an instance record from both @contents_info and @instances_info.
- #remove_symlink(server, path) ⇒ Object
- #symlink_exists(path, server) ⇒ Object
- #symlinks_size ⇒ Object
-
#to_file(filename) ⇒ Object
Write content data to file.
- #to_file_old(filename) ⇒ Object (deprecated)
- #to_old_file_contents_chunk(file, contents_enum, chunk_size) ⇒ Object
- #to_old_file_instances_chunk(file, contents_enum, chunk_size) ⇒ Object
-
#to_s ⇒ Object
Don’t use in production, only for testing.
-
#unify_time ⇒ Object
For each content, all time fields (content and instances) are replaced with the minimum time found while going through all of them.
-
#unique_id ⇒ ID
Content Data unique identification.
-
#validate(params = nil) ⇒ Boolean
Validates the index against the file system, checking that all instances hold correct data about the files they represent.
Constructor Details
#initialize(other = nil) ⇒ ContentData
NOTE: Cloning is a time/memory expensive operation. It is highly recommended to avoid it.
# File 'lib/content_data/content_data.rb', line 38 def initialize(other = nil) if other.nil? @contents_info = {} # Checksum --> [size, paths-->time(instance), time(content)] @instances_info = {} # location --> checksum to optimize instances query @symlinks_info = {} # [server,symlink path] -> target else @contents_info = other.clone_contents_info @instances_info = other.clone_instances_info # location --> checksum to optimize instances query @symlinks_info = other.clone_symlinks_info end end |
Instance Method Details
#==(other) ⇒ Object
# File 'lib/content_data/content_data.rb', line 284 def ==(other) return nil if other.nil? # for this case: content_data == nil unique_id == other.unique_id end |
#add_instance(checksum, size, server, path, modification_time, index_time = Time.now.to_i) ⇒ Object
# File 'lib/content_data/content_data.rb', line 187 def add_instance(checksum, size, server, path, modification_time, index_time=Time.now.to_i) location = [server, path] # file was changed but remove_instance was not called if (@instances_info.include?(location) && @instances_info[location] != checksum) Log.warning("#{server}:#{path} file already exists with different checksum") remove_instance(server, path) end content_info = @contents_info[checksum] if content_info.nil? @contents_info[checksum] = [size, {location => [modification_time,index_time]}, modification_time] else if size != content_info[0] Log.warning('File size different from content size while same checksum') Log.warning("instance location:server:'#{location[0]}' path:'#{location[1]}'") Log.warning("instance mod time:'#{modification_time}'") end #override file if needed content_info[0] = size instances = content_info[1] instances[location] = [modification_time, index_time] end @instances_info[location] = checksum end |
#add_symlink(server, path, target) ⇒ Object
# File 'lib/content_data/content_data.rb', line 215 def add_symlink(server, path, target) @symlinks_info[[server,path]] = target end |
#checksum_instances_size(checksum) ⇒ Object
# File 'lib/content_data/content_data.rb', line 173 def checksum_instances_size(checksum) content_info = @contents_info[checksum] return 0 if content_info.nil? content_info[1].length end |
#clone_contents_info ⇒ Object
# File 'lib/content_data/content_data.rb', line 66 def clone_contents_info clone_contents_info = {} contents_info_enum = @contents_info.each_key loop { checksum = contents_info_enum.next rescue break instances = @contents_info[checksum] size = instances[0] content_time = instances[2] instances_db = instances[1] instances_db_cloned = {} instances_db_enum = instances_db.each_key loop { location = instances_db_enum.next rescue break inst_mod_times = instances_db[location] # we use deep clone for location since map key is using shallow clone. # we dont want references between new content data # and orig object. This will help the GC dispose the orig object if not used any more. instances_db_cloned[[location[0].clone,location[1].clone]] = inst_mod_times.clone } clone_contents_info[checksum] = [size, instances_db_cloned, content_time] } clone_contents_info end |
#clone_instances_info ⇒ Object
# File 'lib/content_data/content_data.rb', line 56 def clone_instances_info clone_instances_info = {} instances_info_enum = @instances_info.each_key loop { location = instances_info_enum.next rescue break clone_instances_info[[location[0].clone, location[1].clone]] = @instances_info[location].clone } clone_instances_info end |
#clone_symlinks_info ⇒ Object
# File 'lib/content_data/content_data.rb', line 92
def clone_symlinks_info
  symlinks_info_enum = @symlinks_info.each_key
  cloned_symlinks = {}
  loop {
    symlink_key = symlinks_info_enum.next rescue break
    # clone both server (index 0) and path (index 1) so the key stays [server, path]
    cloned_symlinks[[symlink_key[0].clone, symlink_key[1].clone]] = @symlinks_info[symlink_key].clone
  }
  cloned_symlinks
end
#content_each_instance(checksum, &block) ⇒ Object
Iterates over the instances of a specific content. The block is provided with: checksum, size, content modification time, instance modification time, server and file path.
# File 'lib/content_data/content_data.rb', line 137 def content_each_instance(checksum, &block) content_info = @contents_info[checksum] instances_db_enum = content_info[1].each_key loop { location = instances_db_enum.next rescue break # provide the block with: checksum, size, content modification time,instance modification time, # server and path. inst_mod_time,_ = content_info[1][location] block.call(checksum,content_info[0], content_info[2], inst_mod_time, location[0], location[1]) } end |
#content_exists(checksum) ⇒ Object
# File 'lib/content_data/content_data.rb', line 227 def content_exists(checksum) @contents_info.has_key?(checksum) end |
#contents_size ⇒ Object
# File 'lib/content_data/content_data.rb', line 161 def contents_size() @contents_info.length end |
#each_content(&block) ⇒ Object
Iterates over the @contents_info data structure (not including instances). The block is provided with: checksum, size and content modification time.
# File 'lib/content_data/content_data.rb', line 104 def each_content(&block) contents_enum = @contents_info.each_key loop { checksum = contents_enum.next rescue break content_val = @contents_info[checksum] # provide checksum, size and content modification time to the block block.call(checksum,content_val[0], content_val[2]) } end |
#each_instance(&block) ⇒ Object
Iterates over the @contents_info data structure (including instances). The block is provided with: checksum, size, content modification time, instance modification time, server, file path and instance index time.
# File 'lib/content_data/content_data.rb', line 117 def each_instance(&block) contents_enum = @contents_info.each_key loop { checksum = contents_enum.next rescue break content_info = @contents_info[checksum] content_info_enum = content_info[1].each_key loop { location = content_info_enum.next rescue break # provide the block with: checksum, size, content modification time,instance modification time, # server and path. inst_mod_time, inst_index_time = content_info[1][location] block.call(checksum,content_info[0], content_info[2], inst_mod_time, location[0], location[1], inst_index_time) } } end |
#each_symlink(&block) ⇒ Object
Iterates over the @symlinks_info data structure. The block is provided with: server, file path and target.
# File 'lib/content_data/content_data.rb', line 152 def each_symlink(&block) symlink_enum = @symlinks_info.each_key loop { symlink_key = symlink_enum.next rescue break symlink_target = @symlinks_info[symlink_key] block.call(symlink_key[0], symlink_key[1], symlink_target) } end |
#empty? ⇒ Boolean
# File 'lib/content_data/content_data.rb', line 223 def empty? @contents_info.empty? and @symlinks_info.empty? end |
#from_file(filename) ⇒ Object
Imports content data from file. This method will raise an exception if the file is not in the correct format.
# File 'lib/content_data/content_data.rb', line 343
def from_file(filename)
  unless File.exist? filename
    raise ArgumentError.new "No such a file #{filename}"
  end

  number_of_instances = nil
  number_of_symlinks = nil
  Zlib::GzipReader.open(filename) { |gz|
    gz.each_line do |line|
      row = line.parse_csv
      if number_of_instances == nil
        begin
          # get number of instances
          number_of_instances = row[0].to_i
        rescue ArgumentError
          raise("Parse error of content data file:#{filename} line ##{$.}\n" +
                "number of instances should be a Number. We got:#{number_of_instances}")
        end
      elsif number_of_instances > 0
        if (6 != row.length)
          raise("Parse error of content data file:#{filename} line ##{$.}\n" +
                "Expected to read 6 fields (comma separated) but got #{row.length}.\nLine:#{line}")
        end
        add_instance(row[0],       # checksum
                     row[1].to_i,  # size
                     row[2],       # server
                     row[3],       # path
                     row[4].to_i,  # mod time
                     row[5].to_i)  # index time
        number_of_instances -= 1
      elsif number_of_symlinks == nil
        begin
          # get number of symlinks
          number_of_symlinks = row[0].to_i
        rescue ArgumentError
          raise("Parse error of content data file:#{filename} line ##{$.}\n" +
                "number of symlinks should be a Number. We got:#{number_of_symlinks}")
        end
      elsif number_of_symlinks > 0
        if (3 != row.length)
          raise("Parse error of content data file:#{filename} line ##{$.}\n" +
                "Expected to read 3 fields (comma separated) but got #{row.length}.\nLine:#{line}")
        end
        @symlinks_info[[row[0], row[1]]] = row[2]
        number_of_symlinks -= 1
      end
    end
  }
end
#from_file_old(filename) ⇒ Object
TODO: validate that the file indeed contains ContentData. TODO: make this a class-level method? Loads the db from file in chunks for better memory performance.
# File 'lib/content_data/content_data.rb', line 466 def from_file_old(filename) # read first line (number of contents) # calculate line number (number of instances) # read number of instances. # loop over instances lines (using chunks) and add instances unless File.exists? filename raise ArgumentError.new "No such a file #{filename}" end File.open(filename, 'r') { |file| # Get number of contents (at first line) number_of_contents = file.gets # this gets the next line or return nil at EOF unless (number_of_contents and number_of_contents.match(/^[\d]+$/)) # check that line is of Number format raise("Parse error of content data file:#{filename} line ##{$.}\n" + "number of contents should be a number. We got:#{number_of_contents}") end number_of_contents = number_of_contents.to_i # advance file lines over all contents. We need only the instances data to build the content data object # use chunks and GC contents_chunks = number_of_contents / CHUNK_SIZE contents_chunks += 1 if (contents_chunks * CHUNK_SIZE < number_of_contents) chunk_index = 0 while chunk_index < contents_chunks chunk_size = CHUNK_SIZE if chunk_index + 1 == contents_chunks # update last chunk size chunk_size = number_of_contents - (chunk_index * CHUNK_SIZE) end return unless read_old_contents_chunk(filename, file, chunk_size) GC.start chunk_index += 1 end # get number of instances number_of_instances = file.gets unless (number_of_instances and number_of_instances.match(/^[\d]+$/)) # check that line is of Number format raise("Parse error of content data file:#{filename} line ##{$.}\n" + "number of instances should be a Number. 
We got:#{number_of_instances}") end number_of_instances = number_of_instances.to_i # read in instances chunks and GC instances_chunks = number_of_instances / CHUNK_SIZE instances_chunks += 1 if (instances_chunks * CHUNK_SIZE < number_of_instances) chunk_index = 0 while chunk_index < instances_chunks chunk_size = CHUNK_SIZE if chunk_index + 1 == instances_chunks # update last chunk size chunk_size = number_of_instances - (chunk_index * CHUNK_SIZE) end return unless read_old_instances_chunk(filename, file, chunk_size) GC.start chunk_index += 1 end # get number of symlinks number_of_symlinks = file.gets unless (number_of_symlinks and number_of_symlinks.match(/^[\d]+$/)) # check that line is of Number format raise("Parse error of content data file:#{filename} line ##{$.}\n" + "number of symlinks should be a Number. We got:#{number_of_symlinks}") end number_of_symlinks.to_i.times { symlinks_line = file.gets.strip unless symlinks_line raise("Parse error of content data file:#{filename} line ##{$.}\n" + "Expected to read symlink line but reached EOF") end parameters = symlinks_line.split(',') if (3 != parameters.length) raise("Parse error of content data file:#{filename} line ##{$.}\n" + "Expected to read 3 fields (comma separated) but got #{parameters.length}.\nLine:#{symlinks_line}") end @symlinks_info[[parameters[0],parameters[1]]] = parameters[2] } } end |
#get_instance_mod_time(checksum, location) ⇒ Object
# File 'lib/content_data/content_data.rb', line 179 def get_instance_mod_time(checksum, location) content_info = @contents_info[checksum] return nil if content_info.nil? instances = content_info[1] instance_time,_ = instances[location] instance_time end |
#get_query(variable, params) ⇒ Object
TODO: simplify conditions. This method is experimental and shouldn't be used. nil is used to define +/- infinity for the to/from method arguments. from/to values are exclusive in the condition's calculations. Take care with the '==' operation that is used for object comparison; if needed, the user should define their own '==' implementation.
# File 'lib/content_data/content_data.rb', line 754 def get_query(variable, params) raise RuntimeError.new 'This method is experimental and shouldn\'t be used' exact = params['exact'].nil? ? Array.new : params['exact'] from = params['from'] to = params ['to'] is_inside = params['is_inside'] unless ContentInstance.new.instance_variable_defined?("@#{attribute}") raise ArgumentError "#{variable} isn't a ContentInstance variable" end if (exact.nil? && from.nil? && to.nil?) raise ArgumentError 'At least one of the argiments {exact, from, to} must be defined' end if (!(from.nil? || to.nil?) && from.kind_of?(to.class)) raise ArgumentError 'to and from arguments should be comparable one with another' end # FIXME add support for from/to for Strings if ((!from.nil? && !from.kind_of?(Numeric.new.class))\ || (!to.nil? && to.kind_of?(Numeric.new.class))) raise ArgumentError 'from and to options supported only for numeric values' end if (!exact.empty? && (!from.nil? || !to.nil?)) raise ArgumentError 'exact and from/to options are mutually exclusive' end result_index = ContentData.new instances.each_value do |instance| is_match = false var_value = instance.instance_variable_get("@#{variable}") if exact.include? var_value is_match = true elsif (from.nil? || var_value > from) && (to.nil? || var_value < to) is_match = true end if (is_match && is_inside) || (!is_match && !is_inside) checksum = instance.checksum result_index.add_content(contents[checksum]) unless result_index.content_exists(checksum) result_index.add_instance instance end end result_index end |
#instance_exists(path, server) ⇒ Object
# File 'lib/content_data/content_data.rb', line 231 def instance_exists(path, server) @instances_info.has_key?([server, path]) end |
#instances_size ⇒ Object
# File 'lib/content_data/content_data.rb', line 165 def instances_size() @instances_info.length end |
#read_old_contents_chunk(filename, file, chunk_size) ⇒ Object
# File 'lib/content_data/content_data.rb', line 545 def read_old_contents_chunk(filename, file, chunk_size) chunk_index = 0 while chunk_index < chunk_size unless file.gets raise("Parse error of content data file:#{filename} line ##{$.}\n" + "Expecting content line but reached end of file") end chunk_index += 1 end true end |
#read_old_instances_chunk(filename, file, chunk_size) ⇒ Object
# File 'lib/content_data/content_data.rb', line 557 def read_old_instances_chunk(filename, file, chunk_size) chunk_index = 0 while chunk_index < chunk_size instance_line = file.gets unless instance_line raise("Parse error of content data file:#{filename} line ##{$.}\n" + "Expected to read Instance line but reached EOF") end parameters = instance_line.split(',') # bugfix: if file name consist a comma then parsing based on comma separating fails if (parameters.size > 6) (4..parameters.size-3).each do |i| parameters[3] = [parameters[3], parameters[i]].join(",") end (4..parameters.size-3).each do |i| parameters.delete_at(4) end end add_instance(parameters[0], #checksum parameters[1].to_i, # size parameters[2], # server parameters[3], # path parameters[4].to_i, # mod time parameters[5].to_i) # index time chunk_index += 1 end true end |
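The comma handling in read_old_instances_chunk relies on the path being the only field that may itself contain commas, so the first three and last two fields can be taken as fixed. A minimal sketch of the same idea, written more compactly; `split_instance_line` is a hypothetical helper, and unlike the method above it strips the trailing newline first.

```ruby
# Hypothetical helper: split an old-format instance line into its six fields,
# re-joining path pieces when the path itself contains commas.
def split_instance_line(line)
  fields = line.chomp.split(',')
  return fields if fields.size <= 6
  # checksum, size, server | path pieces | mod time, index time
  fields[0, 3] + [fields[3..-3].join(',')] + fields[-2, 2]
end

plain  = split_instance_line("abc123,10,server1,/data/a.txt,1000,5000\n")
commas = split_instance_line("abc123,10,server1,/data/a,b,c.txt,1000,5000\n")
```

This only works because the server name and the numeric fields are assumed never to contain commas; a comma in the server name would still be misparsed.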
#remove_content(checksum) ⇒ Object
# File 'lib/content_data/content_data.rb', line 289 def remove_content(checksum) content_info = @contents_info[checksum] if content_info content_info[1].each_key { |location| @instances_info.delete(location) } @contents_info.delete(checksum) end end |
#remove_directory(dir_to_remove, server) ⇒ Object
Removes all instance records located under the input param dir_to_remove. Found records are removed from @contents_info, @instances_info and @symlinks_info. The input params server and dir_to_remove are checked against each instance's unique key (called location). A content is also removed if it becomes empty after its instances are removed.
# File 'lib/content_data/content_data.rb', line 258 def remove_directory(dir_to_remove, server) contents_enum = @contents_info.each_key loop { checksum = contents_enum.next rescue break instances = @contents_info[checksum][1] instances_enum = instances.each_key loop { location = instances_enum.next rescue break if location[0] == server and location[1].scan(dir_to_remove).size > 0 instances.delete(location) @instances_info.delete(location) end } @contents_info.delete(checksum) if instances.empty? } # handle symlinks symlinks_enum = @symlinks_info.each_key loop { symlink_key = symlinks_enum.next rescue break if symlink_key[0] == server and symlink_key[1].scan(dir_to_remove).size > 0 @symlinks_info.delete(symlink_key) end } end |
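Note that the listing above matches paths with `scan(dir_to_remove).size > 0`, which is a substring test rather than a prefix test. The sketch below contrasts the two on made-up paths; the prefix variant is shown only for comparison and is not what the method does.

```ruby
paths = ['/tmp/a.txt', '/data/tmp/b.txt', '/data/c.txt']
dir_to_remove = '/tmp'

# Same test as in remove_directory: true when dir_to_remove occurs anywhere in the path.
substring_matches = paths.select { |path| path.scan(dir_to_remove).size > 0 }

# Stricter alternative (an assumption, not the library's behavior): path must start with the directory.
prefix_matches = paths.select { |path| path.start_with?(dir_to_remove + '/') }
```

With these inputs the substring test also selects '/data/tmp/b.txt', so callers may want to pass a directory string that is specific enough to avoid unintended matches.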
#remove_instance(server, path) ⇒ Object
Removes an instance record from both @contents_info and @instances_info. The input params server and path form the instance's unique key (called location). The content is also removed if it becomes empty after removing the instance.
# File 'lib/content_data/content_data.rb', line 243 def remove_instance(server, path) location = [server, path] checksum = @instances_info[location] content_info = @contents_info[checksum] return nil if content_info.nil? instances = content_info[1] instances.delete(location) @contents_info.delete(checksum) if instances.empty? @instances_info.delete(location) end |
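The removal logic can be tried out on plain Hashes. This is a sketch under the assumption that the two structures have the shapes described in the overview; `remove_record` is a hypothetical stand-in for the method and the data is made up.

```ruby
contents_info = {
  'abc123' => [10, { ['server1', '/a.txt'] => [1000, 5000],
                     ['server2', '/a.txt'] => [1000, 5001] }, 1000]
}
instances_info = {
  ['server1', '/a.txt'] => 'abc123',
  ['server2', '/a.txt'] => 'abc123'
}

# Hypothetical helper: drop one instance and, if it was the last one, its content.
def remove_record(contents_info, instances_info, server, path)
  location = [server, path]
  checksum = instances_info.delete(location)
  entry = contents_info[checksum]
  return nil if entry.nil?
  entry[1].delete(location)
  contents_info.delete(checksum) if entry[1].empty?
  checksum
end

remove_record(contents_info, instances_info, 'server1', '/a.txt')
content_kept_after_first = contents_info.key?('abc123')
remove_record(contents_info, instances_info, 'server2', '/a.txt')
content_kept_after_second = contents_info.key?('abc123')
```

The content survives the first removal because one instance remains, and disappears once its last instance is removed.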
#remove_symlink(server, path) ⇒ Object
# File 'lib/content_data/content_data.rb', line 219 def remove_symlink(server, path) @symlinks_info.delete([server,path]) end |
#symlink_exists(path, server) ⇒ Object
# File 'lib/content_data/content_data.rb', line 235 def symlink_exists(path, server) @symlinks_info.has_key?([server, path]) end |
#symlinks_size ⇒ Object
# File 'lib/content_data/content_data.rb', line 169 def symlinks_size() @symlinks_info.length end |
#to_file(filename) ⇒ Object
Write content data to file.
# File 'lib/content_data/content_data.rb', line 326 def to_file(filename) content_data_dir = File.dirname(filename) FileUtils.makedirs(content_data_dir) unless File.directory?(content_data_dir) Zlib::GzipWriter.open(filename) do |gz| gz.write [@instances_info.length].to_csv each_instance { |checksum, size, content_mod_time, instance_mod_time, server, path, inst_index_time| gz.write [checksum, size, server, path, instance_mod_time, inst_index_time].to_csv } gz.write [@symlinks_info.length].to_csv each_symlink { |file, path, target| gz.write [file, path, target].to_csv } end end |
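The file layout produced by to_file (an instance count, the instance rows, a symlink count, the symlink rows, all as gzip-compressed CSV) can be reproduced with the standard zlib and csv libraries. The sketch below writes and reads back such a file without using ContentData itself; the rows and the temporary file are illustrative assumptions.

```ruby
require 'csv'
require 'zlib'
require 'tempfile'

# Made-up instance rows: checksum, size, server, path, mod time, index time.
rows = [
  ['abc123', 10, 'server1', '/data/a,b.txt', 1000, 5000],
  ['def456', 20, 'server2', '/data/c.txt', 2000, 5001]
]

tmp = Tempfile.new(['content_data', '.gz'])
tmp.close

Zlib::GzipWriter.open(tmp.path) do |gz|
  gz.write [rows.length].to_csv          # number of instances
  rows.each { |row| gz.write row.to_csv }
  gz.write [0].to_csv                    # number of symlinks (none here)
end

# Read the file back line by line, as from_file does.
lines = []
Zlib::GzipReader.open(tmp.path) do |gz|
  gz.each_line { |line| lines << line.parse_csv }
end
tmp.unlink
```

Because the rows go through CSV quoting, a path containing a comma survives the round trip as a single field. All values come back as strings, which is why from_file converts sizes and times with to_i.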
#to_file_old(filename) ⇒ Object
DEPRECATED: the old from/to file methods are still needed for migration purposes. Writes content data to file. The write uses chunks (for both contents and instances) to maximize the GC effect: the temporary memory of each chunk is garbage collected. Without the chunks being handled at a deeper stack level, the GC keeps the temporary objects as part of the stack context.
# File 'lib/content_data/content_data.rb', line 399 def to_file_old(filename) content_data_dir = File.dirname(filename) FileUtils.makedirs(content_data_dir) unless File.directory?(content_data_dir) File.open(filename, 'w') { |file| # Write contents file.write("#{@contents_info.length}\n") contents_enum = @contents_info.each_key content_chunks = @contents_info.length / CHUNK_SIZE + 1 chunks_counter = 0 while chunks_counter < content_chunks to_old_file_contents_chunk(file,contents_enum, CHUNK_SIZE) GC.start chunks_counter += 1 end # Write instances file.write("#{@instances_info.length}\n") contents_enum = @contents_info.each_key chunks_counter = 0 while chunks_counter < content_chunks to_old_file_instances_chunk(file,contents_enum, CHUNK_SIZE) GC.start chunks_counter += 1 end # Write symlinks symlinks_info_enum = @symlinks_info.each_key file.write("#{@symlinks_info.length}\n") loop { symlink_key = symlinks_info_enum.next rescue break file.write("#{symlink_key[0]},#{symlink_key[1]},#{@symlinks_info[symlink_key]}\n") } } end |
#to_old_file_contents_chunk(file, contents_enum, chunk_size) ⇒ Object
# File 'lib/content_data/content_data.rb', line 434 def to_old_file_contents_chunk(file, contents_enum, chunk_size) chunk_counter = 0 while chunk_counter < chunk_size checksum = contents_enum.next rescue return content_info = @contents_info[checksum] file.write("#{checksum},#{content_info[0]},#{content_info[2]}\n") chunk_counter += 1 end end |
#to_old_file_instances_chunk(file, contents_enum, chunk_size) ⇒ Object
# File 'lib/content_data/content_data.rb', line 444 def to_old_file_instances_chunk(file, contents_enum, chunk_size) chunk_counter = 0 while chunk_counter < chunk_size checksum = contents_enum.next rescue return content_info = @contents_info[checksum] instances_db_enum = content_info[1].each_key loop { location = instances_db_enum.next rescue break # provide the block with: checksum, size, content modification time,instance modification time, # server and path. instance_modification_time,instance_index_time = content_info[1][location] file.write("#{checksum},#{content_info[0]},#{location[0]},#{location[1]}," + "#{instance_modification_time},#{instance_index_time}\n") } chunk_counter += 1 break if chunk_counter == chunk_size end end |
#to_s ⇒ Object
Don’t use in production, only for testing.
# File 'lib/content_data/content_data.rb', line 300 def to_s return_str = "" contents_str = "" instances_str = "" each_content { |checksum, size, content_mod_time| contents_str << "%s,%d,%d\n" % [checksum, size, content_mod_time] } each_instance { |checksum, size, content_mod_time, instance_mod_time, server, path| instances_str << "%s,%d,%s,%s,%d\n" % [checksum, size, server, path, instance_mod_time] } return_str << "%d\n" % [@contents_info.length] return_str << contents_str return_str << "%d\n" % [@instances_info.length] return_str << instances_str symlinks_str = "" each_symlink { |server, path, target| symlinks_str << "%s,%s,%s\n" % [server, path, target] } return_str << symlinks_str return_str end |
#unify_time ⇒ Object
For each content, all time fields (content and instances) are replaced with the minimum time found while going through all of them.
# File 'lib/content_data/content_data.rb', line 591 def unify_time() contents_enum = @contents_info.each_key loop { checksum = contents_enum.next rescue break content_info = @contents_info[checksum] min_time_per_checksum = content_info[2] instances = content_info[1] instances_enum = instances.each_key loop { location = instances_enum.next rescue break instance_mod_time = instances[location][0] if instance_mod_time < min_time_per_checksum min_time_per_checksum = instance_mod_time end } # update all instances with min time instances_enum = instances.each_key loop { location = instances_enum.next rescue break instances[location][0] = min_time_per_checksum } # update content time with min time content_info[2] = min_time_per_checksum } end |
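The effect of unify_time can be illustrated on a plain Hash with the same shape as @contents_info. This is a condensed sketch using made-up data, not the method's own enumerator-based loop.

```ruby
# checksum => [size, { [server, path] => [mod_time, index_time] }, content_time]
contents_info = {
  'abc123' => [10, { ['server1', '/a.txt'] => [1000, 5000],
                     ['server2', '/b.txt'] => [900, 5001] }, 1000]
}

contents_info.each_value do |entry|
  instances = entry[1]
  # Minimum over the content time and every instance modification time.
  min_time = ([entry[2]] + instances.values.map(&:first)).min
  instances.each_value { |times| times[0] = min_time }
  entry[2] = min_time
end

unified = contents_info['abc123']
```

Only the modification times are rewritten; the index times stay as they were.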
#unique_id ⇒ ID
Content Data unique identification
# File 'lib/content_data/content_data.rb', line 52 def unique_id [@contents_info.hash,@symlinks_info.hash] end |
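Since unique_id is built from `Hash#hash`, two objects compare equal through `==` when their contents and symlinks Hashes have equal content. A small sketch with plain Hashes shaped like @contents_info (all values made up):

```ruby
first  = { 'abc123' => [10, { ['server1', '/a.txt'] => [1000, 5000] }, 1000] }
second = { 'abc123' => [10, { ['server1', '/a.txt'] => [1000, 5000] }, 1000] }
later  = { 'abc123' => [10, { ['server1', '/a.txt'] => [1000, 6000] }, 1000] } # index time differs

no_symlinks = {}

# Same shape as unique_id: [contents hash, symlinks hash]
id_first  = [first.hash, no_symlinks.hash]
id_second = [second.hash, no_symlinks.hash]

same_content_same_id    = (id_first == id_second)
only_index_time_differs = (first != later)
```

One consequence, judging from the listing alone: the index time is part of the hashed data, so two objects describing the same files but indexed at different times would likely not compare equal.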
#validate(params = nil) ⇒ Boolean
Validates the index against the file system, checking that all instances hold correct data about the files they represent.
There are two levels of validation, controlled by the instance_check_level system parameter:
- shallow - quick; tests an instance for file existence and attributes.
- deep - can take more time; in addition to the shallow checks, recalculates the hash sum.
# File 'lib/content_data/content_data.rb', line 629 def validate(params = nil) # used to answer whether specific param was set param_exists = Proc.new do |param| !(params.nil? || params[param].nil?) end # used to process method parameters centrally process_params = Proc.new do |values| if param_exists.call(:failed) info = values[:details] unless info.nil? checksum = info[0] content_mtime = info[1] size = info[2] inst_mtime = info[3] server = info[4] file_path = info[5] params[:failed].add_instance(checksum, size, server, file_path, inst_mtime) end end end is_valid = true contents_enum = @contents_info.each_key loop { checksum = contents_enum.next rescue break instances = @contents_info[checksum] content_size = instances[0] content_mtime = instances[2] instances_enum = instances[1].each_key loop { unique_path = instances_enum.next rescue break instance_mtime = instances[1][unique_path][0] instance_info = [checksum, content_mtime, content_size, instance_mtime] instance_info.concat(unique_path) unless check_instance(instance_info) is_valid = false unless params.nil? || params.empty? process_params.call({:details => instance_info}) end end } } is_valid end |