Bram Cohen announces:
Those of you who have used version control systems for a long time, or anything which has to diff extensively for that matter, know that once in a while they completely bungle a diff and give output which is, to put it charitably, extremely confusing.
Coming up with an algorithm which reliably gives good diffs is far from trivial, but is a problem which has now thankfully been solved. If you get the latest version of bazaar and use its diff utility it will give reasonable diff output on almost anything. All other version control systems (and I mean all others) should switch to using bazaar’s diff algorithm as a swap-in replacement, a change which has essentially no downside. (It also has better asymptotic runtime, and code which is easier to understand and debug.)
Later, in the comments, he briefly explains the new algorithm:
Instead of doing a longest common subsequence on everything, it does a longest common subsequence on lines which occur exactly once on both sides, then recurses between lines which got matched on that pass.
Read at: Bram Cohen: “The diff problem has been solved”.
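To make that one-sentence description concrete, here is a rough Python sketch of the idea as I read it. It is not Bazaar's actual code: the function names (`unique_common_pairs`, `longest_increasing_by_j`, `match_lines`) are made up for illustration, and the fallback for regions that contain no unique lines (simply matching identical leading and trailing lines) is my own assumption, since the comment does not say what happens there.

```python
from bisect import bisect_left
from collections import Counter


def unique_common_pairs(a, b, alo, ahi, blo, bhi):
    """Pairs (i, j) where a[i] == b[j] and the line occurs exactly once
    in a[alo:ahi] and exactly once in b[blo:bhi]."""
    count_a = Counter(a[alo:ahi])
    count_b = Counter(b[blo:bhi])
    pos_b = {line: j for j, line in enumerate(b[blo:bhi], blo)
             if count_b[line] == 1}
    return [(i, pos_b[line]) for i, line in enumerate(a[alo:ahi], alo)
            if count_a[line] == 1 and line in pos_b]


def longest_increasing_by_j(pairs):
    """Longest subsequence of pairs (already sorted by i) with increasing j.
    For lines that are unique on both sides this is exactly their LCS."""
    tails = []      # tails[k]: smallest final j of an increasing run of length k + 1
    tail_idx = []   # index into pairs of the element stored in tails[k]
    prev = [None] * len(pairs)
    for n, (_, j) in enumerate(pairs):
        k = bisect_left(tails, j)
        if k == len(tails):
            tails.append(j)
            tail_idx.append(n)
        else:
            tails[k] = j
            tail_idx[k] = n
        prev[n] = tail_idx[k - 1] if k > 0 else None
    result = []
    n = tail_idx[-1] if tail_idx else None
    while n is not None:
        result.append(pairs[n])
        n = prev[n]
    return result[::-1]


def match_lines(a, b, alo=0, ahi=None, blo=0, bhi=None):
    """Return matched (index_in_a, index_in_b) pairs, in increasing order."""
    ahi = len(a) if ahi is None else ahi
    bhi = len(b) if bhi is None else bhi
    anchors = longest_increasing_by_j(
        unique_common_pairs(a, b, alo, ahi, blo, bhi))
    if not anchors:
        # Assumed fallback: no unique lines here, so only match identical
        # lines at the start and at the end of the region.
        head, tail = [], []
        while alo < ahi and blo < bhi and a[alo] == b[blo]:
            head.append((alo, blo))
            alo, blo = alo + 1, blo + 1
        while alo < ahi and blo < bhi and a[ahi - 1] == b[bhi - 1]:
            ahi, bhi = ahi - 1, bhi - 1
            tail.append((ahi, bhi))
        return head + tail[::-1]
    matches = []
    last_a, last_b = alo, blo
    for i, j in anchors:
        # Recurse into the gap between the previous anchor and this one.
        matches.extend(match_lines(a, b, last_a, i, last_b, j))
        matches.append((i, j))
        last_a, last_b = i + 1, j + 1
    matches.extend(match_lines(a, b, last_a, ahi, last_b, bhi))
    return matches


old = ["a", "b", "c", "d"]
new = ["a", "x", "c", "d"]
print(match_lines(old, new))  # [(0, 0), (2, 2), (3, 3)]
```

Everything that is not in the returned list of matches would be reported as a deletion (if it is in the old file) or an addition (if it is in the new one). The point of anchoring on unique lines first is that repeated lines such as lone braces or blank lines cannot pull the match out of alignment.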
I’d like to try it out when it becomes available for Subversion or Git.
Recent posts on similar topics
- RFC: let's make textual conflicts more personal - December 17th, 2008
- Dave Dribin: "Choosing a Distributed Version Control System" - February 10th, 2008
- Mark Shuttleworth on renaming and merging - October 7th, 2007
- Ian Clatworthy: "The Future Is Adaptive" - October 6th, 2007