Class: Merge3
- Inherits:
-
Object
- Object
- Merge3
- Defined in:
- lib/merge3.rb
Overview
so here comes the main algorithm. We have the class to carry the original file and it’s hash codes we need for the matching of the files
Class Method Summary collapse
-
.all_matches(chunk) ⇒ Object
collect all matches (for the start file, so it doesn’t have to be done twice).
-
.match(chunk, around) ⇒ Object
find matches for around the given length.
-
.match_newlines(chunk) ⇒ Object
find matches according to newline seperation.
-
.matches(chunk, size) ⇒ Object
build a hash of chunks( hash of the chunks string against the chunk).
-
.merge(one_seq, two_seq) ⇒ Object
given the Sequences of both files, attempt merging them.
-
.three_way(start_file, one_file, two_file, ignore_white_space = false) ⇒ Object
make a three way merge, passing in strings and returning a string strings are byte arrays (not line arrays as in diff).
Class Method Details
.all_matches(chunk) ⇒ Object
collect all matches (for the start file, so it doesn’t have to be done twice)
182 183 184 185 186 187 188 189 |
# File 'lib/merge3.rb', line 182 def Merge3::all_matches( chunk ) # order counts, so large chunks overwrites small for hash collisions matches = Merge3::match(chunk,16) matches.update(match(chunk,32)) matches.update(match(chunk,48)) matches.update(match_newlines(chunk)) matches end |
.match(chunk, around) ⇒ Object
find matches for around the given length. Exactly the length for binary files, but if there is whitespace, we use that to break within 70% of the given value
226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 |
# File 'lib/merge3.rb', line 226 def Merge3::match( chunk , around ) match = {} # the hash of matches , using the strings as key, chunks as value index = 0 #count through the string length while index < chunk.length # also possible here to do half from the left = chunk.length - index #front and half from the back (?) if left > 0.7*around and left <= 1.5*around # are we at the end len = chunk.length - index else sub = chunk.str[index , 1.5*around] # take the maximum length we want if (white = sub.index( /\W/ , 0.7 * around )).nil? # and check for non char len = around # either taking the whole else #or the "words" we found len = white #hoping this is better for text, so we have no allignment end # issues end #then create a chunk with the indexes and string cop = chunk.subchunk( index , len , index ) match[cop.str] = cop # and store in our hash (no collision detection) index += len end # now we only take one from the back,because that's often a match and will be at = (chunk.length - 0.7*around).to_i cop = chunk.subchunk( at , (0.7*around).to_i , at ) match[cop.str] = cop match end |
.match_newlines(chunk) ⇒ Object
find matches according to newline seperation.
192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 |
# File 'lib/merge3.rb', line 192 def Merge3::match_newlines( chunk ) match = {} # the hash of matches , using the strings as key, chunk as value index = 0 #count through the string length if chunk.whitespace.empty? # whitespace compression on or not ? # because there are only newline if whitespace compression is off chunk.str.split("\n").each do |line| # this logic strips the whitespace off both sides of the line, in an # effort to match things like changed indentation offset = 0 offset += 1 while line[offset,1] =~ /\s/ index += offset line.strip! cop = chunk.subchunk( index , line.length , index ) match[cop.str] = cop # and store in our hash (no collision detection) #puts "LINE #{index} #{line.length} #{offset}--#{cop.str}--" index += line.length + 1 # 1 is the newline end else #whitespace compression is on, so the whitespace table contains # the newlines, just have to find them chunk.whitespace.each do | at , str | #puts "###{str}##" next unless str.include?( "\n" ) cop = chunk.subchunk( index + 1 , at - index , index ) index =at #puts "###{cop.str}##" if cop.str.include?("Four") match[cop.str] = cop # and store in our hash (no collision detection) end end match end |
.matches(chunk, size) ⇒ Object
build a hash of chunks( hash of the chunks string against the chunk). This is the backbone of the matching algorithm, so that finding a match is basically a hash lookup Matches may be of different sizes, we use 16,32,48 and -1 for line matching
173 174 175 176 177 178 179 |
# File 'lib/merge3.rb', line 173 def Merge3::matches( chunk , size ) if size == -1 then return Merge3::match_newlines( chunk ) else return Merge3::match( chunk , size ) end end |
.merge(one_seq, two_seq) ⇒ Object
given the Sequences of both files, attempt merging them. the chunks in the sequences are non-overlapping, so we can always work on whole chunks. In a sentence, the algorithm is now: Follow the line of change
111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 |
# File 'lib/merge3.rb', line 111 def Merge3::merge( one_seq , two_seq ) # we go along the original file's order and change to the other sequence # if there has been a change compared to the original file. out = [] # this is the array of string to be outputted, the result # this doesn't consider changes at the beginning of the file # which just goes to prove that we're researchers one_chunk = one_seq.chunks.first two_chunk = two_seq.chunks.first last_choosen = one_chunk break_flag = nil while break_flag.nil? if DEBUG and one_chunk.from != two_chunk.from #chunks are non overlapping, so they must be the same raise "CHUNKS differ #{one_chunk}\n#{two_chunk}" end # we start with the next in the edited file order and then check where next_one = one_seq.get_next( one_chunk ) # the change is next_two = two_seq.get_next( two_chunk ) break_flag = true if next_one.nil? or next_two.nil? #eof # so if the change is in the first file, if break_flag.nil? and one_chunk.org_stop != next_one.from raise "conflict at #{one_chunk.org_stop}" if two_chunk.org_stop != next_two.from # we find the one in file two that starts where it (one) ends next_two = two_seq.find_from( next_one.from ) puts "ONE #{one_chunk}\n #{two_chunk} nNEXT #{next_one}\n #{next_two}" if DEBUG next_choosen = next_one # otherwise the change must be in file two elsif break_flag.nil? # and we look for the chunk in one, that starts where two ends next_one = one_seq.find_from( next_two.from ) puts "TWO #{two_chunk}\n #{one_chunk}\nNEXT #{next_two}\n #{next_one}" if DEBUG next_choosen = next_two else next_one , next_two = nil ,nil end this_del = ( one_chunk.kind_of?(Deleted) or two_chunk.kind_of?(Deleted) ) # Output the associated (real==whitespace expanded) string if not deleted out << last_choosen.real_string unless this_del puts "OUT #{Chunk::short_string(one_chunk.real_string)}" if DEBUG and not this_del # we ignore adds (from one file) that happen to fall in the middle of a deleted in the other one_del = ( one_chunk.kind_of?( Deleted) and next_one.kind_of?( Deleted) ) two_del = ( two_chunk.kind_of?( Deleted) and next_two.kind_of?( Deleted) ) unless one_del or two_del # otherwise we output added text from both files puts "ADD one #{one_chunk.added}" if DEBUG and not one_chunk.added.nil? out << one_chunk.added.real_string unless one_chunk.added.nil? # and if both have added text create an asymetric result, hmm puts "ADD two #{ two_chunk.added}" if DEBUG and not two_chunk.added.nil? out << two_chunk.added.real_string unless two_chunk.added.nil? end one_chunk = next_one two_chunk = next_two last_choosen = next_choosen end puts "RESULT" , out.join , "END" if DEBUG return out.join # joins the pieces and returns the merged string end |
.three_way(start_file, one_file, two_file, ignore_white_space = false) ⇒ Object
make a three way merge, passing in strings and returning a string strings are byte arrays (not line arrays as in diff)
80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 |
# File 'lib/merge3.rb', line 80 def Merge3::three_way( start_file , one_file , two_file , ignore_white_space = false ) # We also create chunks of the whole files, in case whitespace # handling is on. Chunk::subchunk passed compressed whitespace on start_chunk = Chunk.new( 0 , start_file.length , start_file ) one_chunk = Chunk.new( 0 , one_file.length , one_file ) two_chunk = Chunk.new( 0 , two_file.length , two_file ) if ignore_white_space start_chunk.squeeze # all consecutive whitespace is then squeezed one_chunk.squeeze # into one, and a talbe create to unsqueeze later two_chunk.squeeze # Yes, this is expensive end # only create the macthes hash once for the start file start_matches = Merge3::all_matches(start_chunk) # Create the chunks( moves first , then added and deleted) one_seq = Sequence.new( one_chunk , start_chunk , start_matches ) two_seq = Sequence.new( two_chunk , start_chunk , start_matches ) puts "Diff original 0" , one_seq if DEBUG puts "Diff original 1" , two_seq if DEBUG # break all chunks into non overlapping regions (in the original file order) one_seq.no_overlap( two_seq ) two_seq.no_overlap( one_seq ) puts "Diff non-overlapping 0" , one_seq if DEBUG puts "Diff non-overlapping 1" , two_seq if DEBUG # and finally (and most simply) do the merge Merge3::merge( one_seq , two_seq ) end |