Archive for the 'Ideas' Category

Mark Shuttleworth on renaming and merging

Sunday, October 7th, 2007

Mark Shuttleworth (of Thawte, Ubuntu, and Canonical fame) wrote four posts on the rename-tracking feature in version control systems and its impact on merging as a social process.

Read at his blog:

Those posts attracted a lot of attention back in June. Frankly, the issue seems somewhat overrated to me, though the posts are still recommended reading. Mark’s Canonical Ltd. supports the development of Bazaar, and Bazaar has the feature under discussion: merging with rename tracking. Git does not have it, but Git has a different argument on its side: huge merging traffic. Linux kernel development is all about merging, and the codebase is huge. So I guess this feature is not that crucial: Linus is well known for his uncompromising approach to tools.

Ian Clatworthy: “The Future Is Adaptive”

Saturday, October 6th, 2007

Ian Clatworthy, one of the primary developers of Bazaar, has posted a series of articles on version control in a broad modern context.

His primary thesis is:

Beyond market acceptance, there are 6 main criteria I consider when evaluating collaboration tools:

  1. Reliability
  2. Adaptability
  3. Usability
  4. Extensibility
  5. Integration
  6. Administration (including Total Cost of Ownership)

Read the whole series at:

A couple of memorable quotes:

Likewise, in the field of collaboration, I think there are 5 interesting numbers: 1, 2, 10, 100 and 1000. These numbers represent:

  • an Individual
  • a Partnership
  • a Team
  • a Company
  • a Community

[…]

As a young software engineer back in the early 90s, 10s of thousands of people woke up to cold showers in Sydney one morning because of a corner-case bug in my code controlling the off peak hot water system. That sort of event tends to have a life long impact on how one designs software!

Ken Takusagawa: “Version control filesystem for software packages and system configuration”

Monday, June 25th, 2007
A simple and cool idea: a package manager should be a kind of version control software. It would obviously handle version rollback and history, make trojans easy to detect, and keep the system generally clean. The downside: it has to be implemented across the whole system, with strict discipline for software upgrades. Then again, the same discipline is needed for any change-managed activity. Read at Ken Takusagawa’s Blog: “Version control filesystem for software packages and system configuration”
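To illustrate why tampering would be easy to spot under such a scheme, here is a minimal Python sketch. It is not taken from Ken Takusagawa's post: the function names `snapshot` and `changes` are hypothetical, and it assumes the "version control" part boils down to recording a content hash for every installed file and comparing two such records.

```python
import hashlib
from pathlib import Path


def snapshot(root):
    """Map each file under root (by relative path) to the SHA-256 of its contents."""
    root = Path(root)
    return {
        str(path.relative_to(root)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changes(old, new):
    """Compare two snapshots and return (added, removed, modified) path lists."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    modified = sorted(p for p in set(old) & set(new) if old[p] != new[p])
    return added, removed, modified
```

Any file that shows up in `modified` without a corresponding recorded upgrade would be a candidate for closer inspection. A real system would also need to protect the stored snapshots themselves, which this sketch ignores.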

Bram Cohen: “The diff problem has been solved”

Sunday, June 24th, 2007
Bram Cohen announces:
Those of you who have used version control systems for a long time, or anything which has to diff extensively for that matter, know that once in a while they completely bungle a diff and give output which is, to put it charitably, extremely confusing. Coming up with an algorithm which reliably gives good diffs is far from trivial, but is a problem which has now thankfully been solved. If you get the latest version of bazaar and use its diff utility it will give reasonable diff output on almost anything. All other version control systems (and I mean all others) should switch to using bazaar’s diff algorithm as a swap-in replacement, a change which has essentially no downside. (It also has better asymptotic runtime, and code which is easier to understand and debug.)
Later, in the comments, he briefly explains the new algorithm:
Instead of doing a longest common subsequence on everything, it does a longest common subsequence on lines which occur exactly once on both sides, then recurses between lines which got matched on that pass.
Read at: Bram Cohen: “The diff problem has been solved”. I’d like to try it out when it becomes available for Subversion or Git.
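The description quoted above (the approach is commonly known as patience diff) can be sketched in Python as follows. This is an illustration under stated assumptions, not Bazaar's actual code: the function names are hypothetical, and the fallback used when a region has no unique common lines (pairing identical leading and trailing lines) is a simplification chosen for this sketch.

```python
from bisect import bisect_left
from collections import Counter


def unique_common_lines(a, b, alo, ahi, blo, bhi):
    """Pairs (i, j) with a[i] == b[j] for lines occurring exactly once in each region."""
    count_a = Counter(a[alo:ahi])
    count_b = Counter(b[blo:bhi])
    index_b = {line: j for j, line in enumerate(b[blo:bhi], blo) if count_b[line] == 1}
    return [(i, index_b[line]) for i, line in enumerate(a[alo:ahi], alo)
            if count_a[line] == 1 and line in index_b]


def longest_increasing(pairs):
    """Longest run of pairs (already sorted by i) whose j values also increase."""
    tails, tail_js = [], []        # best run endings, and their j values (kept sorted)
    prev = [None] * len(pairs)     # back-pointers for reconstructing the run
    for n, (_, j) in enumerate(pairs):
        k = bisect_left(tail_js, j)
        prev[n] = tails[k - 1] if k > 0 else None
        if k == len(tails):
            tails.append(n)
            tail_js.append(j)
        else:
            tails[k], tail_js[k] = n, j
    run, n = [], (tails[-1] if tails else None)
    while n is not None:
        run.append(pairs[n])
        n = prev[n]
    return run[::-1]


def patience_matches(a, b, alo=0, ahi=None, blo=0, bhi=None):
    """Return (i, j) index pairs of lines matched between a and b."""
    ahi = len(a) if ahi is None else ahi
    bhi = len(b) if bhi is None else bhi
    anchors = longest_increasing(unique_common_lines(a, b, alo, ahi, blo, bhi))
    if not anchors:
        # Assumed fallback: pair identical leading lines, then identical trailing lines.
        head, tail = [], []
        while alo < ahi and blo < bhi and a[alo] == b[blo]:
            head.append((alo, blo))
            alo, blo = alo + 1, blo + 1
        while alo < ahi and blo < bhi and a[ahi - 1] == b[bhi - 1]:
            ahi, bhi = ahi - 1, bhi - 1
            tail.append((ahi, bhi))
        return head + tail[::-1]
    matches = []
    # Recurse into the gaps between anchors; (ahi, bhi) is a sentinel for the last gap.
    for i, j in anchors + [(ahi, bhi)]:
        matches.extend(patience_matches(a, b, alo, i, blo, j))
        if i < ahi:
            matches.append((i, j))
        alo, blo = i + 1, j + 1
    return matches
```

Calling `patience_matches(old_lines, new_lines)` yields the matched line pairs; everything left unmatched on either side is what a diff would report as deleted or inserted. Anchoring on unique lines first is what keeps repeated lines such as blank lines or closing braces from being matched in confusing ways.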

MolhadoRef: A refactoring-aware version control system

Tuesday, February 20th, 2007
Dan Moore writes about MolhadoRef, a research project developed by Danny Dig that implements a version control system in which refactoring is a first-class concept. When you apply well-known Java refactorings, such as “rename package, rename class, rename method, move class, move method, and changing the method signature”, MolhadoRef detects that you are doing just that and records the refactoring. Later, during merging, it tries to use the refactoring history to resolve merge conflicts more intelligently. The usual text-level conflict resolution is left as a fallback. Dig’s experiments show a radical decrease in the number of merge conflicts with this merge strategy (see “Evaluation: effectiveness”). You can also see the history of code objects, such as methods and classes, and how they evolved over time.

See Dan Moore’s post on MolhadoRef for notes from Dig’s talk “Refactoring-Aware Software Merging and Version Control for Object-Oriented Programs”.

P.S.: About two years ago, during the discussion of early Git development, Linus Torvalds outlined his idea of tracking the movement of code objects, as opposed to simply tracking file renames and line changes. See the discussion at Linus Torvalds: Re: Merge with git-pasky II (Message-ID: Pine.LNX.4.58.0504141728590.7211@ppc970.osdl.org).

Overview: database schema version control

Sunday, November 5th, 2006

It seems that version control for database schema changes has traditionally received little developer attention. Almost everyone just goes ahead and changes the database schema, trying to keep it backwards-compatible.

That’s possible if the change is small or non-intrusive (as it is in most cases). For more complex changes you could write an ad-hoc script that makes the necessary alterations and data conversions. In the most serious cases you stop the service, back up the database (everyone skips this step once in a lifetime, and only once), upgrade the code, run the conversion procedure, start the service again, and hope that it all went OK.

This path of least resistance works reasonably well and rarely leads to disasters. However, it is still error-prone, and the trend is to make schema upgrades more relaxed for both developers and production admins.

More integrated development environments seem to acquire built-in database schema versioning naturally. Less integrated ones get a consistent migration strategy only at the level of individual products, and the tools are usually ad-hoc.

For example, Ruby on Rails has a very refreshing idea of Active Migrations. They work seamlessly, because:

  • Rails is tightly integrated with the underlying database;
  • migrations are not tied to SQL: to create a table or add a new field to it, you use Ruby itself, which helps you think about data changes in the right way;
  • keeping a Rails application under Subversion is the encouraged practice;
  • the rake utility knows about migrations and helps in creating and deploying them;
  • the capistrano utility also knows about migrations and helps in pushing them to remote production/testing servers.
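The core bookkeeping behind this style of migration can be sketched in a few lines of Python with the standard `sqlite3` module. This is only an illustration of the idea, not how Rails implements it: the `schema_info` table, the `MIGRATIONS` list, and the two example steps are assumptions made for this sketch, and unlike Rails the steps below still contain raw SQL rather than a database-independent DSL.

```python
import sqlite3


# Hypothetical migration steps: each one moves the schema forward by one version.
def create_users(conn):
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")


def add_email(conn):
    conn.execute("ALTER TABLE users ADD COLUMN email TEXT")


# Ordered list of (version number, step); kept under version control with the code.
MIGRATIONS = [(1, create_users), (2, add_email)]


def current_version(conn):
    """Read the schema version stored in the database, creating the record if needed."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_info (version INTEGER NOT NULL)")
    row = conn.execute("SELECT version FROM schema_info").fetchone()
    if row is None:
        conn.execute("INSERT INTO schema_info (version) VALUES (0)")
        return 0
    return row[0]


def migrate(conn):
    """Apply every migration newer than the stored version, in order."""
    version = current_version(conn)
    for number, step in MIGRATIONS:
        if number > version:
            step(conn)
            conn.execute("UPDATE schema_info SET version = ?", (number,))
            conn.commit()
            version = number
    return version
```

Because the database itself remembers which version it is at, running `migrate(sqlite3.connect("app.db"))` on a development machine, a test server, or production applies only the steps that are still missing there. A fuller tool would also support downgrade steps, which this sketch leaves out.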

Perl is, of course, less integrated, and its primary offering, DBIx::Migration, could use more attention. Being tied to SQL is its main drawback.

I believe that mimicking Active Migrations functionality should become the new standard for every major open-source language, such as Perl, Python, or PHP. In the same way, everyone now expects something similar to CPAN from every new language that tries to attract serious attention.

Meanwhile, Microsoft is taking advantage of its integrated development environment and making it easier to keep the database schema under common version control.

Jeff Lynch writes in Microsoft Visual Studio Team Edition for Database Professionals:

…with Visual Studio Team Edition for Database Professionals, database development now becomes a fully supported part of your application’s development lifecycle. Now all database development is done “off line” and in a “sandbox” environment (this should make your dba stand up and sing!). All user defined functions and stored procedures can be fully unit tested using representative test data automatically generated by the toolset. And best of all, this new Visual Studio sku fully integrates into Team Foundation Server so your database schema (SQL scripts) can be put under source control just like any other C#, BizTalk or Web Application!

Sachin Rekhi writes in Extensibility in Team Edition for Database Professionals:

Test Conditions. The database unit testing feature allows you verify tests using either SQL assertions or easily configurable UI client-side test conditions. We ship a set of test conditions in the box, including row count, scalar value, empty resultset, etc. But these test conditions are completely extensible so you can imagine creating your own to do more powerful test verification. Check-in Policies. Team System also allows you to create custom check-in policies that require certain actions to be performed prior to check-in. For example, a testing policy that ships with TFS enforces that a specific set of tests is run prior to checking in your code. You can implement other such db specific policies if you desired.

Several third-party tools are available to take control of your database schema for MS SQL Server, e.g.:

What’s your experience with such tools? What do you use? How did it affect your development habits? What are the drawbacks? How should we design database migration tools for open-source languages?

Martin Fowler: PervasiveVersioning

Tuesday, October 24th, 2006
Martin Fowler writes about Apple Time Machine, which is an attempt to give something like version control to ordinary users. He advocates the idea that more and more applications for non-developers could incorporate some aspects of version control, especially for collaboration needs.
So my hope that is that Time Machine will spur development of applications that are aware of versioning and can take advantage of it, which will in turn shift to more effective collaboration.
Read more at Martin Fowler: “Pervasive Versioning” and “More Version Control”.