Class: PureCDB::Writer
Overview
Writes 32-bit or 64-bit CDB files.
Memory considerations
While each entry is written to the target object immediately on calling #store, the actual hash tables cannot be written until the full dataset is ready. You must therefore be able to hold in memory, while building the CDB file, the hash of each key (including duplicates) and the file position at which the full entry is stored.
It would be possible to write this to a temporary file at the cost of performance, but the current implementation does not do this.
As a compromise, the current implementation stores the hashes and positions as one BER-encoded string per hash bucket until it is ready to write them to disk.
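As a rough illustration of that compromise, the sketch below packs (hash, position) pairs into one BER-compressed string per bucket using the `w` directive of Ruby's `Array#pack`, then unpacks one bucket again. The bucket count of 256 and the sample numbers are assumptions chosen for the example, not values read from the library.

```ruby
# Sketch only: NUM_BUCKETS and the sample pairs are illustrative assumptions.
NUM_BUCKETS = 256
buckets = Array.new(NUM_BUCKETS) # each slot stays nil until first used

# (hash, file position) pairs, as a writer might collect them while storing
pairs = [[0x1234_5601, 2048], [0x0000_0001, 2070], [0x1234_5601, 2100]]

pairs.each do |h, pos|
  i = h % NUM_BUCKETS
  buckets[i] ||= "".b                 # binary string, created lazily
  buckets[i] << [h, pos].pack("ww")   # "w" = BER-compressed integer
end

# Reading a bucket back yields a flat list: hash, pos, hash, pos, ...
decoded = buckets[1].unpack("w*").each_slice(2).to_a
packed_bytes = buckets[1].bytesize
```

Small integers take fewer bytes in BER form, so file positions near the start of the file cost less memory than a fixed-width encoding would.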
Constant Summary
Constants inherited from Base
Base::CDB64_MAGIC, Base::DEFAULT_HASHPTR_SIZE, Base::DEFAULT_LENGTH_SIZE, Base::DEFAULT_NUM_HASHES
Instance Attribute Summary
-
#hash_fill_factor ⇒ Object
How full any given hash table is allowed to get, as a float between 0 and 1.
Attributes inherited from Base
#hashptr_size, #length_size, #mode, #num_hashes
Class Method Summary
-
.open(target, *options, &block) ⇒ Object
Alternative to PureCDB::Writer.new(target,options) ..
Instance Method Summary
-
#close ⇒ Object
Write out the hashes and hash pointers, and close the target if it responds to #close.
-
#initialize(target, *options) ⇒ Writer
constructor
Open a CDB file for writing, or prepare an IO-like object for writing.
-
#store(key, value) ⇒ Object
Store ‘value’ under ‘key’.
Methods inherited from Base
#hash, #hash_size, #hashref_size, #set_mode, #set_stream
Constructor Details
#initialize(target, *options) ⇒ Writer
Open a CDB file for writing, or prepare an IO-like object for writing.
:call-seq:
w = PureCDB::Writer.new(target)
w = PureCDB::Writer.new(target, *options)
PureCDB::Writer.new(target) {|w| ... }
PureCDB::Writer.new(target, *options) {|w| ... }
If :mode is passed in options, it must be the integer 32 or 64, indicating whether you wish to write a standard (32-bit) CDB file or a 64-bit CDB-like file. The default is 32.
If target is a String it is treated as a filename of a file to be opened to write to. Otherwise target is assumed to be an IO-like object that ideally responds to #sysseek and #syswrite. If it doesn’t, it will be wrapped with an object delegating #sysseek and #syswrite to #seek and #write respectively, and these must be present.
(IO and StringIO both satisfy these requirements)
If passed a block, the writer is yielded to the block and PureCDB::Writer#close is called afterwards.
WARNING: To complete writing the hash tables, you must ensure #close is called when you are done.
```ruby
# File 'lib/purecdb/writer.rb', line 51
def initialize target, *options
  super(*options)

  @hash_fill_factor = 0.7

  # Fall back to standard 32-bit CDB when no explicit mode was given
  set_mode(32) if @mode == :detect

  if target.is_a?(String)
    @io = File.new(target,"wb")
  else
    set_stream(target)
  end

  @hashes = [nil] * num_hashes

  # Reserve space for the hash pointers; #close rewrites them with real values
  @hashptrs = [0] * num_hashes * 2
  write_hashptrs

  @pos = hash_size

  if block_given?
    yield(self)
    close
    nil
  else
    self
  end
end
```
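The delegation described above for targets that lack #sysseek and #syswrite can be pictured with a small adapter. This is a minimal sketch, not the library's own wrapper: the names `SeekWriteAdapter` and `wrap_target` are hypothetical and exist only for this example.

```ruby
require "stringio"

# Hypothetical adapter (not the library's own class): exposes #sysseek and
# #syswrite on top of an object that only offers #seek and #write.
class SeekWriteAdapter
  def initialize(io)
    @io = io
  end

  def sysseek(offset, whence = IO::SEEK_SET)
    @io.seek(offset, whence)
  end

  def syswrite(data)
    @io.write(data)
  end

  def close
    @io.close if @io.respond_to?(:close)
  end
end

# Wrap only when the target lacks the sys* methods.
def wrap_target(target)
  if target.respond_to?(:sysseek) && target.respond_to?(:syswrite)
    target
  else
    SeekWriteAdapter.new(target)
  end
end

buffer = StringIO.new("".b)
out = wrap_target(buffer)
out.syswrite("abcdef")
out.sysseek(2)          # jump back, as a writer would to fill in pointers
out.syswrite("XY")
result = buffer.string
```

Seeking back and overwriting is the access pattern that matters here, since the pointer area at the start of the file is written once as a placeholder and again on #close.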
Instance Attribute Details
#hash_fill_factor ⇒ Object
How full any given hash table is allowed to get, as a float between 0 and 1.
Must be <= 1. The lower it is, the fewer records will collide. The closer it is to 1, the more often the reader may have to probe at length to find the right entry (in the worst case, scanning all the records).
```ruby
# File 'lib/purecdb/writer.rb', line 24
def hash_fill_factor
  @hash_fill_factor
end
```
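One common way to apply a fill factor is to size each table so that the ratio of entries to slots never exceeds it. The sketch below shows that arithmetic. Whether PureCDB sizes its tables exactly this way is an assumption, and `slots_for` is a hypothetical helper, not part of the library.

```ruby
# Sketch only: slots_for is a hypothetical helper, not part of PureCDB.
# It returns the smallest slot count that keeps a table at or below
# fill_factor for the given number of entries.
def slots_for(entries, fill_factor)
  unless fill_factor > 0 && fill_factor <= 1
    raise ArgumentError, "fill factor must be greater than 0 and at most 1"
  end
  (entries / fill_factor.to_f).ceil
end

half = slots_for(100, 0.5)  # twice as many slots as entries
full = slots_for(100, 1.0)  # no spare slots, so probing is longest
```

A lower factor trades file size for shorter probe sequences, which is the trade-off the attribute description refers to.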
Class Method Details
.open(target, *options, &block) ⇒ Object
Alternative to PureCDB::Writer.new(target,options) ..
```ruby
# File 'lib/purecdb/writer.rb', line 109
def self.open target, *options, &block
  Writer.new(target, *options, &block)
end
```
Instance Method Details
#close ⇒ Object
Write out the hashes and hash pointers, and close the target if it responds to #close.
```ruby
# File 'lib/purecdb/writer.rb', line 82
def close
  write_hashes
  write_hashptrs
  @io.close if @io.respond_to?(:close)
end
```
#store(key, value) ⇒ Object
Store ‘value’ under ‘key’.
Multiple values can be stored for the same key by calling #store multiple times with the same key.
```ruby
# File 'lib/purecdb/writer.rb', line 92
def store key,value
  # In an attempt to save memory, we pack the hash data we gather into
  # strings of BER compressed integers...
  h = hash(key)
  hi = (h % num_hashes)
  @hashes[hi] ||= ""

  header = build_header(key.length, value.length)
  @io.syswrite(header+key+value)
  size = header.size + key.size + value.size
  @hashes[hi] += [h,@pos].pack("ww") # BER compressed
  @pos += size
end
```
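To show what storing the same key twice does to this bookkeeping, here is a stripped-down stand-in that runs without the library. It is a sketch under stated assumptions: the classic CDB hash function, 256 buckets, a 2048-byte pointer area at the start of the file, and a header of two 32-bit little-endian lengths. None of these are read from PureCDB itself, and `cdb_hash` and `store` are names made up for the example.

```ruby
require "stringio"

# Sketch only: a stand-in for the writer's bookkeeping, not PureCDB code.
# NUM_HASHES, HEADER_AREA, the hash function and the header layout are
# all assumptions based on the classic 32-bit CDB format.
NUM_HASHES = 256
HEADER_AREA = 2048

def cdb_hash(key)
  key.each_byte.reduce(5381) { |h, c| (((h << 5) + h) ^ c) & 0xffffffff }
end

io = StringIO.new("".b)
io.write("\0" * HEADER_AREA)   # placeholder for the pointer area
hashes = Array.new(NUM_HASHES)
pos = HEADER_AREA

store = lambda do |key, value|
  h = cdb_hash(key)
  hi = h % NUM_HASHES
  hashes[hi] ||= "".b
  # Record layout: key length, value length, key bytes, value bytes
  record = [key.bytesize, value.bytesize].pack("VV") + key.b + value.b
  io.write(record)
  hashes[hi] << [h, pos].pack("ww")   # remember where the record starts
  pos += record.bytesize
end

store.call("fruit", "apple")
store.call("fruit", "pear")   # same key again: a second record is appended

bucket = cdb_hash("fruit") % NUM_HASHES
entries = hashes[bucket].unpack("w*").each_slice(2).to_a
```

Both calls leave an entry in the same bucket with the same hash but different positions, which is how a reader can later return every value stored under one key.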