Class: ContentData::ContentData
- Inherits: Object
  - Object
  - ContentData::ContentData
- Defined in: lib/content_data/content_data.rb
Overview
A ContentData (CD) object holds file information as contents and instances. The file information retrieved from hardware is: checksum, size, modification time, server, device and path. These attributes are divided into content attributes and instance attributes:
unique checksum and size are content attributes
modification time, server, device and path are instance attributes
The relationship between a content and its instances is 1:many, meaning that a content can have instances on many servers. A content also has a time attribute, whose value is the modification time of the first instance added. This can be changed with the unify_time method, which sets the time attributes of a content and all of its instances to the minimum time among them. Different files (instances) with the same content (checksum) are grouped together under that content. Interface methods include:
iterating over contents and instances info,
unifying time, adding/removing instances, queries, merge, removing a directory and more.
Content info data structure:
@contents_info = { Checksum -> [size, *instances*, content_modification_time] }
*instances* = {[server,path] -> instance_modification_time }
Notes:
1. content_modification_time is the instance_modification_time of the first
instance that was added to @contents_info
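The layout above can be modeled with plain Ruby hashes. The following is a minimal sketch, not the library's code: `add_instance_sketch` is a hypothetical helper that mirrors what the documented `add_instance` does to `@contents_info` (without the logging), and the checksum, server names and paths are made up for illustration.

```ruby
# Hypothetical stand-in for @contents_info:
#   checksum => [size, { [server, path] => instance_mtime }, content_mtime]
def add_instance_sketch(contents_info, checksum, size, server, path, mtime)
  location = [server, path]
  content_info = contents_info[checksum]
  if content_info.nil?
    # First instance of this content: its mtime becomes the content mtime.
    contents_info[checksum] = [size, { location => mtime }, mtime]
  else
    # Known content: record (or overwrite) the instance, keep the content mtime.
    content_info[0] = size
    content_info[1][location] = mtime
  end
  contents_info
end

contents_info = {}
add_instance_sketch(contents_info, 'abc123', 10, 'server_a', '/data/f.txt', 1000)
add_instance_sketch(contents_info, 'abc123', 10, 'server_b', '/backup/f.txt', 900)

size, instances, content_mtime = contents_info['abc123']
puts "size=#{size} instances=#{instances.size} content_mtime=#{content_mtime}"
# prints: size=10 instances=2 content_mtime=1000
```

The content time stays at 1000 even though the second instance is older, which matches note 1: only the first instance added determines content_modification_time until unify_time is called.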
Class Method Summary
- .finalize(id) ⇒ Object
Instance Method Summary
- #==(other) ⇒ Object
- #add_instance(checksum, size, server, path, modification_time) ⇒ Object
- #clone_contents_info ⇒ Object
  Returns a cloned copy of the contents database.
- #content_each_instance(checksum, &block) ⇒ Object
  Iterates over the instances of a specific content. The block is provided with: checksum, size, content modification time, instance modification time, server and file path.
- #content_exists(checksum) ⇒ Object
- #contents_size ⇒ Object
- #each_content(&block) ⇒ Object
  Iterates over the @contents_info data structure (not including instances). The block is provided with: checksum, size and content modification time.
- #each_instance(&block) ⇒ Object
  Iterates over the @contents_info data structure (including instances). The block is provided with: checksum, size, content modification time, instance modification time, server and file path.
- #empty? ⇒ Boolean
- #from_file(filename) ⇒ Object
  TODO: validation that the file actually contains ContentData is missing.
- #get_instance_mod_time(checksum, location) ⇒ Object
- #get_query(variable, params) ⇒ Object
  TODO: simplify conditions. This method is experimental and shouldn't be used. nil defines +/- infinity for the from/to arguments; from/to values are exclusive in the condition's calculations. Take care with the '==' operation, which is used for object comparison.
- #initialize(other = nil) ⇒ ContentData
  constructor
  A new instance of ContentData.
- #instance_exists(path, server, checksum = nil) ⇒ Object
  TODO (genadyp): consider using a hash for optional parameters.
- #instances_size(checksum) ⇒ Object
- #remove_content(checksum) ⇒ Object
- #remove_directory(dir_to_remove, server) ⇒ Object
- #remove_instance(location, checksum = nil) ⇒ Object
  Removes an instance from a known content (faster than from an unknown content). Also removes the content if it becomes empty.
- #stats_by_location(location) ⇒ Object
- #to_file(filename) ⇒ Object
- #to_s ⇒ Object
- #unify_time ⇒ Object
  For each content, replaces all time fields (content and instances) with the minimum time found across them.
- #validate(params = nil) ⇒ Boolean
  Validates the index against the file system, checking that all instances hold correct data about the files they represent.
Constructor Details
#initialize(other = nil) ⇒ ContentData
Returns a new instance of ContentData.
# File 'lib/content_data/content_data.rb', line 32

def initialize(other = nil)
  ObjectSpace.define_finalizer(self, self.class.method(:finalize).to_proc)
  if Params['enable_monitoring']
    ::ContentServer::Globals.process_vars.inc('obj add ContentData')
  end
  if other.nil?
    @contents_info = {}  # Checksum --> [size, paths-->time(instance), time(content)]
  else
    @contents_info = other.clone_contents_info
  end
end
Class Method Details
.finalize(id) ⇒ Object
# File 'lib/content_data/content_data.rb', line 45

def self.finalize(id)
  if Params['enable_monitoring']
    ::ContentServer::Globals.process_vars.inc('obj rem ContentData')
  end
end
Instance Method Details
#==(other) ⇒ Object
# File 'lib/content_data/content_data.rb', line 210

def ==(other)
  return false if other.nil?
  return false if @contents_info.size != other.contents_size
  other.each_instance { |checksum, size, content_mod_time, instance_mod_time, server, path|
    local_content_info = @contents_info[checksum]
    return false if local_content_info.nil?
    return false if local_content_info[0] != size
    return false if local_content_info[2] != content_mod_time
    #check instances
    local_instances = local_content_info[1]
    return false if other.instances_size(checksum) != local_instances.size
    location = [server, path]
    local_instance_mod_time = local_instances[location]
    return false if local_instance_mod_time.nil?
    return false if local_instance_mod_time != instance_mod_time
  }
  true
end
#add_instance(checksum, size, server, path, modification_time) ⇒ Object
# File 'lib/content_data/content_data.rb', line 127

def add_instance(checksum, size, server, path, modification_time)
  location = [server, path]
  content_info = @contents_info[checksum]
  if content_info.nil?
    @contents_info[checksum] = [size,
                                {location => modification_time},
                                modification_time]
  else
    if size != content_info[0]
      Log.warning 'File size different from content size while same checksum'
      Log.warning("instance location:server:'#{location[0]}' path:'#{location[1]}'")
      Log.warning("instance mod time:'#{modification_time}'")
    end
    #override file if needed
    content_info[0] = size
    instances = content_info[1]
    instances[location] = modification_time
  end
end
#clone_contents_info ⇒ Object
Returns a cloned copy of the contents database.
# File 'lib/content_data/content_data.rb', line 52

def clone_contents_info
  @contents_info.keys.inject({}) { |clone_contents_info, checksum|
    instances = @contents_info[checksum]
    size = instances[0]
    content_time = instances[2]
    instances_db = instances[1]
    instances_db_cloned = {}
    instances_db.keys.each { |location|
      instance_mtime = instances_db[location]
      instances_db_cloned[[location[0].clone,location[1].clone]]=instance_mtime
    }
    clone_contents_info[checksum] = [size,
                                     instances_db_cloned,
                                     content_time]
    clone_contents_info
  }
end
#content_each_instance(checksum, &block) ⇒ Object
Iterates over the instances of a specific content. The block is provided with: checksum, size, content modification time,
instance modification time, server and file path.
# File 'lib/content_data/content_data.rb', line 99

def content_each_instance(checksum, &block)
  content_info = @contents_info[checksum]
  content_info[1].keys.each {|location|
    # provide the block with: checksum, size, content modification time,instance modification time,
    #   server and path.
    instance_modification_time = content_info[1][location]
    block.call(checksum,content_info[0], content_info[2], instance_modification_time,
               location[0], location[1])
  }
end
#content_exists(checksum) ⇒ Object
# File 'lib/content_data/content_data.rb', line 151

def content_exists(checksum)
  @contents_info.has_key?(checksum)
end
#contents_size ⇒ Object
# File 'lib/content_data/content_data.rb', line 110

def contents_size()
  @contents_info.size
end
#each_content(&block) ⇒ Object
Iterates over the @contents_info data structure (not including instances). The block is provided with: checksum, size and content modification time.
# File 'lib/content_data/content_data.rb', line 72

def each_content(&block)
  @contents_info.keys.each { |checksum|
    content_val = @contents_info[checksum]
    # provide checksum, size and content modification time to the block
    block.call(checksum,content_val[0], content_val[2])
  }
end
#each_instance(&block) ⇒ Object
Iterates over the @contents_info data structure (including instances). The block is provided with: checksum, size, content modification time,
instance modification time, server and file path.
# File 'lib/content_data/content_data.rb', line 83

def each_instance(&block)
  @contents_info.keys.each { |checksum|
    content_info = @contents_info[checksum]
    content_info[1].keys.each {|location|
      # provide the block with: checksum, size, content modification time,instance modification time,
      #   server and path.
      instance_modification_time = content_info[1][location]
      block.call(checksum,content_info[0], content_info[2], instance_modification_time,
                 location[0], location[1])
    }
  }
end
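To make the block arguments concrete, here is a minimal sketch of the same traversal over a plain hash. `each_instance_sketch` is a hypothetical helper, not part of the library, and the sample data is invented; it only assumes the `checksum => [size, instances, content_mtime]` layout described in the Overview.

```ruby
# Hypothetical stand-in for @contents_info, with made-up values.
contents_info = {
  'abc123' => [10, { ['server_a', '/data/f.txt']   => 1000,
                     ['server_b', '/backup/f.txt'] => 900 }, 1000],
  'def456' => [20, { ['server_a', '/data/g.txt']   => 1500 }, 1500]
}

# Yields one tuple per instance, in the argument order documented for each_instance:
# checksum, size, content mtime, instance mtime, server, path.
def each_instance_sketch(contents_info)
  contents_info.each do |checksum, (size, instances, content_mtime)|
    instances.each do |(server, path), instance_mtime|
      yield checksum, size, content_mtime, instance_mtime, server, path
    end
  end
end

rows = []
each_instance_sketch(contents_info) { |*fields| rows << fields }
rows.each { |row| p row }
```

Each content contributes one row per instance, so the two contents above produce three rows in total.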
#empty? ⇒ Boolean
# File 'lib/content_data/content_data.rb', line 147

def empty?
  @contents_info.empty?
end
#from_file(filename) ⇒ Object
TODO: validation that the file actually contains ContentData is missing.
# File 'lib/content_data/content_data.rb', line 260

def from_file(filename)
  lines = IO.readlines(filename)
  number_of_contents = lines[0].to_i
  i = 1 + number_of_contents
  number_of_instances = lines[i].to_i
  i += 1
  number_of_instances.times {
    if lines[i].nil?
      Log.warning "line ##{i} is nil !!!, Backing filename: #{filename} to #{filename}.bad"
      FileUtils.cp(filename, "#{filename}.bad")
      Log.warning("Lines:\n#{lines[i].join("\n")}")
    else
      parameters = lines[i].split(',')
      # bugfix: if file name consist a comma then parsing based on comma separating fails
      if (parameters.size > 5)
        (4..parameters.size-2).each do |i|
          parameters[3] = [parameters[3], parameters[i]].join(",")
        end
        (4..parameters.size-2).each do |i|
          parameters.delete_at(4)
        end
      end

      add_instance(parameters[0],
                   parameters[1].to_i,
                   parameters[2],
                   parameters[3],
                   parameters[4].to_i)
    end
    i += 1
  }
end
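The comma handling in from_file treats everything between the third field (server) and the last field (modification time) as the path. A compact sketch of that idea follows; `parse_instance_line` is a hypothetical helper written for illustration, not a method of ContentData, and the sample lines are made up.

```ruby
# Hypothetical helper: splits one instance line of the form
#   checksum,size,server,path,mtime
# where only the path is assumed to contain extra commas.
def parse_instance_line(line)
  fields = line.chomp.split(',')
  checksum, size, server = fields[0], fields[1], fields[2]
  mtime = fields[-1]
  # Everything between the server and the final mtime field is the path.
  path = fields[3..-2].join(',')
  [checksum, size.to_i, server, path, mtime.to_i]
end

p parse_instance_line("abc123,10,server_a,/data/a,b.txt,1000\n")
# prints: ["abc123", 10, "server_a", "/data/a,b.txt", 1000]
p parse_instance_line("abc123,10,server_a,/data/f.txt,1000\n")
# prints: ["abc123", 10, "server_a", "/data/f.txt", 1000]
```

Like the listing, this sketch cannot recover a line whose checksum or server field contains a comma, because only the path is re-joined.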
#get_instance_mod_time(checksum, location) ⇒ Object
# File 'lib/content_data/content_data.rb', line 120

def get_instance_mod_time(checksum, location)
  content_info = @contents_info[checksum]
  return nil if content_info.nil?
  instances = content_info[1]
  instance_time = instances[location]
end
#get_query(variable, params) ⇒ Object
TODO: simplify conditions. This method is experimental and shouldn't be used. nil defines +/- infinity for the from/to arguments; from/to values are exclusive in the condition's calculations. Take care with the '==' operation, which is used for object comparison; if needed, the user should define their own '==' implementation.
# File 'lib/content_data/content_data.rb', line 448

def get_query(variable, params)
  raise RuntimeError.new 'This method is experimental and shouldn\'t be used'

  exact = params['exact'].nil? ? Array.new : params['exact']
  from = params['from']
  to = params ['to']
  is_inside = params['is_inside']

  unless ContentInstance.new.instance_variable_defined?("@#{attribute}")
    raise ArgumentError "#{variable} isn't a ContentInstance variable"
  end

  if (exact.nil? && from.nil? && to.nil?)
    raise ArgumentError 'At least one of the argiments {exact, from, to} must be defined'
  end

  if (!(from.nil? || to.nil?) && from.kind_of?(to.class))
    raise ArgumentError 'to and from arguments should be comparable one with another'
  end

  # FIXME add support for from/to for Strings
  if ((!from.nil? && !from.kind_of?(Numeric.new.class))\
      || (!to.nil? && to.kind_of?(Numeric.new.class)))
    raise ArgumentError 'from and to options supported only for numeric values'
  end

  if (!exact.empty? && (!from.nil? || !to.nil?))
    raise ArgumentError 'exact and from/to options are mutually exclusive'
  end

  result_index = ContentData.new
  instances.each_value do |instance|
    is_match = false
    var_value = instance.instance_variable_get("@#{variable}")

    if exact.include? var_value
      is_match = true
    elsif (from.nil? || var_value > from) && (to.nil? || var_value < to)
      is_match = true
    end

    if (is_match && is_inside) || (!is_match && !is_inside)
      checksum = instance.checksum
      result_index.add_content(contents[checksum]) unless result_index.content_exists(checksum)
      result_index.add_instance instance
    end
  end
  result_index
end
#instance_exists(path, server, checksum = nil) ⇒ Object
TODO (genadyp): consider using a hash for optional parameters.
# File 'lib/content_data/content_data.rb', line 157

def instance_exists(path, server, checksum=nil)
  location = [server, path]
  if checksum.nil?
    @contents_info.values.any? { |content_db|
      content_db[1].has_key?(location)
    }
  else
    content_info = @contents_info[checksum]
    return false if content_info.nil?
    content_info[1].has_key?(location)
  end
end
#instances_size(checksum) ⇒ Object
# File 'lib/content_data/content_data.rb', line 114

def instances_size(checksum)
  content_info = @contents_info[checksum]
  return 0 if content_info.nil?
  content_info[1].size
end
#remove_content(checksum) ⇒ Object
# File 'lib/content_data/content_data.rb', line 229

def remove_content(checksum)
  @contents_info.delete(checksum)
end
#remove_directory(dir_to_remove, server) ⇒ Object
# File 'lib/content_data/content_data.rb', line 199

def remove_directory(dir_to_remove, server)
  @contents_info.keys.each { |checksum|
    instances = @contents_info[checksum][1]
    instances.delete_if { |location, _|
      location[0] == server and location[1].scan(dir_to_remove).size > 0
    }
    @contents_info.delete(checksum) if instances.empty?
  }
end
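In the remove_directory listing, the path test is `location[1].scan(dir_to_remove).size > 0`, which is true whenever dir_to_remove occurs anywhere in the path, not only at its start. The sketch below illustrates this on a plain hash with made-up servers and paths. The prefix-based alternative is shown only for comparison; it is an assumption of this example, not something the library does.

```ruby
# Made-up instances map: [server, path] => instance mtime.
instances = {
  ['server_a', '/data/logs/a.txt']        => 1000,
  ['server_a', '/backup/data/logs/b.txt'] => 1001,
  ['server_b', '/data/logs/c.txt']        => 1002
}
dir_to_remove = '/data/logs'
server = 'server_a'

# Same predicate as the listing: substring match anywhere in the path.
substring_hits = instances.keys.select { |loc|
  loc[0] == server && loc[1].scan(dir_to_remove).size > 0
}

# Stricter alternative (an assumption, not the library's behavior): prefix match.
prefix_hits = instances.keys.select { |loc|
  loc[0] == server && loc[1].start_with?(dir_to_remove + '/')
}

puts "substring: #{substring_hits.map { |loc| loc[1] }.inspect}"
puts "prefix:    #{prefix_hits.map { |loc| loc[1] }.inspect}"
```

With this data the substring test selects both server_a paths, including the one under /backup, while the prefix test selects only /data/logs/a.txt.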
#remove_instance(location, checksum = nil) ⇒ Object
Removes an instance from a known content (faster than from an unknown content). Also removes the content if it becomes empty.
# File 'lib/content_data/content_data.rb', line 182

def remove_instance(location, checksum=nil)
  if checksum.nil?
    @contents_info.keys.each { |checksum|
      instances = @contents_info[checksum][1]
      instances.delete(location)
      @contents_info.delete(checksum) if instances.empty?
    }
  else
    content_info = @contents_info[checksum]
    unless content_info.nil?
      instances = content_info[1]
      instances.delete(location)
      @contents_info.delete(checksum) if instances.empty?
    end
  end
end
#stats_by_location(location) ⇒ Object
# File 'lib/content_data/content_data.rb', line 170

def stats_by_location(location)
  @contents_info.each_value { |content_db|
    if content_db[1].has_key?(location)
      return [content_db[0], content_db[1][location]]
    end
  }
  return nil
end
#to_file(filename) ⇒ Object
# File 'lib/content_data/content_data.rb', line 253

def to_file(filename)
  content_data_dir = File.dirname(filename)
  FileUtils.makedirs(content_data_dir) unless File.directory?(content_data_dir)
  File.open(filename, 'w') {|f| f.write(to_s) }
end
#to_s ⇒ Object
# File 'lib/content_data/content_data.rb', line 233

def to_s
  return_str = ""
  contents_str = ""
  instances_str = ""
  instances_counter = 0
  each_content { |checksum, size, content_mod_time|
    contents_str << "%s,%d,%d\n" % [checksum, size, content_mod_time]
  }
  instances_counter = 0
  each_instance { |checksum, size, content_mod_time, instance_mod_time, server, path|
    instances_counter += 1
    instances_str << "%s,%d,%s,%s,%d\n" % [checksum, size, server, path, instance_mod_time]
  }
  return_str << "%d\n" % [@contents_info.size]
  return_str << contents_str
  return_str << "%d\n" % [instances_counter]
  return_str << instances_str
  return_str
end
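The text produced by to_s (and written by to_file) has four parts: the number of contents, one line per content, the number of instances, and one line per instance. The sketch below reproduces that layout from a plain hash; `serialize_sketch` is a hypothetical helper and the data is invented, so treat it as an illustration of the format rather than the library's implementation.

```ruby
# Hypothetical stand-in for @contents_info, with made-up values.
contents_info = {
  'abc123' => [10, { ['server_a', '/data/f.txt']   => 1000,
                     ['server_b', '/backup/f.txt'] => 900 }, 1000]
}

# Builds the same four-part layout as the to_s listing.
def serialize_sketch(contents_info)
  contents_str = String.new
  instances_str = String.new
  instances_counter = 0
  contents_info.each do |checksum, (size, instances, content_mtime)|
    contents_str << "%s,%d,%d\n" % [checksum, size, content_mtime]
    instances.each do |(server, path), instance_mtime|
      instances_counter += 1
      instances_str << "%s,%d,%s,%s,%d\n" % [checksum, size, server, path, instance_mtime]
    end
  end
  "#{contents_info.size}\n#{contents_str}#{instances_counter}\n#{instances_str}"
end

text = serialize_sketch(contents_info)
puts text
# prints:
# 1
# abc123,10,1000
# 2
# abc123,10,server_a,/data/f.txt,1000
# abc123,10,server_b,/backup/f.txt,900
```

Note that from_file, as listed, skips the content lines and rebuilds everything from the instance lines alone.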
#unify_time ⇒ Object
For each content, replaces all time fields (content and instances) with the minimum time found across them.
# File 'lib/content_data/content_data.rb', line 295

def unify_time()
  @contents_info.keys.each { |checksum|
    content_info = @contents_info[checksum]
    min_time_per_checksum = content_info[2]
    instances = content_info[1]
    instances.keys.each { |location|
      instance_mod_time = instances[location]
      if instance_mod_time < min_time_per_checksum
        min_time_per_checksum = instance_mod_time
      end
    }
    # update all instances with min time
    instances.keys.each { |location|
      instances[location] = min_time_per_checksum
    }
    # update content time with min time
    content_info[2] = min_time_per_checksum
  }
end
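The effect of unify_time can be seen on a small example. This is a minimal sketch over a plain hash that assumes the `checksum => [size, instances, content_mtime]` layout from the Overview; the data is made up and the loop is a condensed equivalent of the listing, not the library's code.

```ruby
# Hypothetical stand-in for @contents_info, with made-up values.
contents_info = {
  'abc123' => [10, { ['server_a', '/data/f.txt']   => 1000,
                     ['server_b', '/backup/f.txt'] => 900 }, 1000]
}

contents_info.each_value do |content_info|
  instances = content_info[1]
  # Minimum over the content time and every instance time.
  min_time = [content_info[2], *instances.values].min
  # Overwrite every instance time, then the content time, with that minimum.
  instances.keys.each { |location| instances[location] = min_time }
  content_info[2] = min_time
end

p contents_info['abc123'][2]         # prints: 900
p contents_info['abc123'][1].values  # prints: [900, 900]
```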
#validate(params = nil) ⇒ Boolean
Validates the index against the file system, checking that all instances hold correct data about the files they represent.
There are two levels of validation, controlled by the instance_check_level system parameter:
- shallow: quick; tests each instance for file existence and attributes.
- deep: can take more time; in addition to the shallow checks, recalculates the hash sum.
# File 'lib/content_data/content_data.rb', line 327

def validate(params = nil)
  # used to answer whether specific param was set
  param_exists = Proc.new do |param|
    !(params.nil? || params[param].nil?)
  end

  # used to process method parameters centrally
  process_params = Proc.new do |values|
    if param_exists.call(:failed)
      info = values[:details]
      unless info.nil?
        checksum = info[0]
        content_mtime = info[1]
        size = info[2]
        inst_mtime = info[3]
        server = info[4]
        file_path = info[5]
        params[:failed].add_instance(checksum, size, server, file_path, inst_mtime)
      end
    end
  end

  is_valid = true
  @contents_info.keys.each { |checksum|
    instances = @contents_info[checksum]
    content_size = instances[0]
    content_mtime = instances[2]
    instances[1].keys.each { |unique_path|
      instance_mtime = instances[1][unique_path]
      instance_info = [checksum, content_mtime, content_size, instance_mtime]
      instance_info.concat(unique_path)
      unless check_instance(instance_info)
        is_valid = false

        unless params.nil? || params.empty?
          process_params.call({:details => instance_info})
        end
      end
    }
  }
  is_valid
end