Skip to content

philhanna/ListDifferencer

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

ListDifferencer

Java library that computes the difference between two lists. Written as a generic class that can handle lists of anything that implements equals(Object other) and hashCode(), so that it can find the difference between two lists of strings, or two XML files, or whatever can be made to implement these methods.

Algorithm

The algorithm consists of recursively finding the longest common subsequence (LCS) of two lists as follows: Let

X = (x1, x2, ..., xn)

and

Y = (y1, y2, ..., ym).

Then

LCS(X,Y) = {
   if either X or Y is empty then
      return the empty list
   else if the last element of X is the same as the last element of Y then
      return LCS(X-1, Y-1) + common last element of X and Y
   else if the last elements are different then
      return longer(LCS(X, Y-1) and LCS(X-1, Y))
}
where X-1 means the sublist of X that omits the last element of X

Since the algorithm is recursive, it may need to compute the same subsequences many times. The working results are cached for performance reasons.

Note that the definition of a common subsequence means a list of elements all of whom exist in both lists in the same order, but not necessarily contiguously. In other words, in these two lists:

list 1 = {"the", "cat", "in", "the", "hat", "came", "back", "today"}
list 2 = {"but, "the" "cat", "does", "not", "have", "a", "hat", "today"}

the longest common subsequence is

common = {"the" "cat", "hat", "today"}

This common subsequence is found in consecutive (but not contiguous) elements 1, 2, 5, and 8 of list 1, and elements 2, 3, 8, and 9 of list 2. Visually, this is:

list 1      common      list 2
------      ------      1. but
1. the      the         2. the
2. cat      cat         3. cat
3. in       ------      4. does
4. the      ------      5. not
------      ------      6. have
------      ------      7. a
5. hat      hat         8. hat
6. came     ------      ------    
7. back     ------      ------    
8. today    today       9. today

Once the longest common subsequence is found, the difference list is computed in a straightforward manner: Any element in the first list that is not in the common list is a deletion. Any element in the second list that is not in the common list is an addition.

References

About

Java library that computes the difference between two lists.

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages