Archive for November, 2006

Keith Packard: “Repository Formats Matter”

Tuesday, November 14th, 2006

Keith Packard, a well-known X.org hacker, describes his reasons for choosing Git as a version control system for X.org.

X.org uses a mostly centralized development style, and Git lets you continue working that way:

[…]it all depends on the conventions used within a project and individual developer style.

At X.org, we migrated from CVS to Git and yet have retained our largely centralized development model. There are few people publishing alternate trees, and we grant direct repository access to the same set of developers who used to have CVS access.
According to Keith, Git also adds several useful features, thanks to its completely distributed origin, and to the format of its repository:
  • Offline repository access. You can keep a private repository on, say, your laptop and carry on developing while offline. You commit as often as needed, recording your history at whatever level of detail you like. When you are back online, you push your private repository to the public one, and all your changes land there with little friction and with the entire history preserved.
  • Private branches. Some developers may have truly secret development efforts based on a public codebase (such as drivers for hardware that is not yet released). Git makes this supremely easy by allowing us to keep the ultra-secret new hardware changes in a private repository while still tracking the public repository. When we’re allowed to release the source code for the new hardware, we simply merge the private branch to the upstream master and push that to the public repository. All of the development history for the new hardware then becomes a part of the public source repository.
  • Distributed backups. With Git, when you clone the public repository, you get it in its entirety. Thus, every developer has a backup copy of the repository, immediately available in case something happens to the central server. Also, because repository objects are identified by cryptographic checksums, you can verify that a backup repository was not altered in any way: if it was, the checksums just won’t match.
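To make the checksum point concrete, here is a minimal sketch of how a file's content maps to an object ID, assuming Git's classic SHA-1 object format (a `blob <size>` header, a NUL byte, then the raw content). The function names `git_blob_id` and `is_unaltered` are made up for this illustration and are not part of Git.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git would assign to a blob with this content."""
    # Git hashes a short header ("blob <size>" plus a NUL byte) followed by the raw bytes.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def is_unaltered(content: bytes, expected_id: str) -> bool:
    """Check a backup copy of a file against the ID recorded for it."""
    return git_blob_id(content) == expected_id


if __name__ == "__main__":
    original = b"int main(void) { return 0; }\n"
    recorded_id = git_blob_id(original)
    # The untouched copy matches its recorded ID; the modified one does not.
    print(is_unaltered(original, recorded_id))
    print(is_unaltered(original + b"/* tampered */\n", recorded_id))
```

Because the ID is derived from the content itself, changing a single byte in a backup produces a different ID, which is what makes silent alteration detectable.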

One interesting piece of information is the repository size for large projects. It turns out that Git has very low overhead: The Mozilla CVS repository was 2.7GB, imported to Subversion it grew to 8.2GB. Under Git, it shrunk to 450MB. Given that a Mozilla checkout is around 350MB, it’s fairly nice to have the whole project history (from 1998) in only slightly more space. (Similar numbers are reported in Tracking CVS repository with Git).

Read more at: “Repository Formats Matter” (title is a bit misleading) and “Tyrannical SCM selection”.

Overview: database schema version control

Sunday, November 5th, 2006

It seems that version control for database schema changes has traditionally received little developer attention. Almost everyone just goes ahead and changes the database schema, trying to keep it backwards-compatible.

That’s possible if the change is small or non-intrusive (as it is in most cases). For more complex changes you could write an ad-hoc script that makes the necessary alterations and converts the data. In the most serious cases you stop the service, back up the database (a step everyone skips once in a lifetime, and only once), upgrade the code, run the conversion procedure, start the service again, and hope that it all went OK.
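The "back up, then convert" part of that procedure can be sketched roughly as follows. This is only an illustration using Python's standard `sqlite3` module; the function name `upgrade_with_backup` is made up for this example, and it assumes the connection was opened with `isolation_level=None` so that the transaction is controlled explicitly.

```python
import sqlite3


def upgrade_with_backup(conn, backup_conn, statements):
    """Copy the database to backup_conn, then apply all statements or none.

    Assumes conn was opened with isolation_level=None (autocommit mode),
    so the BEGIN/COMMIT/ROLLBACK below are the only transaction control.
    """
    # Step 1: take the backup that is so tempting to skip.
    conn.backup(backup_conn)

    # Step 2: run the whole conversion inside a single transaction.
    conn.execute("BEGIN")
    try:
        for sql in statements:
            conn.execute(sql)
        conn.execute("COMMIT")
    except sqlite3.Error:
        # Any failure undoes every statement applied so far.
        conn.execute("ROLLBACK")
        raise
```

SQLite happens to roll back schema changes along with data changes. Not every database does, which is one reason the backup step matters.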

This path of least resistance works reasonably well and rarely leads to disasters. However, it is still error-prone, and the trend is to make schema upgrades more relaxed for both developers and production admins.

More integrated development environments tend to acquire built-in database schema versioning naturally. Less integrated ones get a consistent migration strategy only at the level of individual products, and the tools are usually ad hoc.

For example, Ruby on Rails has a very refreshing idea of Active Migrations. They work seamlessly, because:

  • Rails is tightly integrated with the underlying database;
  • migrations are not tied to SQL: to create a table or add a new field to it, you use Ruby itself, which helps you think about data changes in the right way;
  • keeping a Rails application under Subversion is the encouraged practice;
  • the rake utility knows about migrations, and helps in creating and deploying them;
  • the capistrano utility also knows about migrations, and helps in pushing them to remote production/testing servers.
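The core idea behind such migrations (numbered steps that can be applied and undone, plus a version number stored in the database itself) can be sketched in a few lines. This is a rough Python illustration, not the Rails API and not DBIx::Migration; the table names, the `MIGRATIONS` list, and the functions are all made up, versions are assumed to be consecutive integers starting at 1, and the steps are plain SQL strings only to keep the sketch short.

```python
import sqlite3

# Each migration is (version, up_sql, down_sql). All names are hypothetical.
MIGRATIONS = [
    (1, "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)",
        "DROP TABLE users"),
    (2, "CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT)",
        "DROP TABLE posts"),
]


def current_version(conn):
    """Read the schema version stored in the database, starting at 0."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_info (version INTEGER NOT NULL)")
    row = conn.execute("SELECT version FROM schema_info").fetchone()
    if row is None:
        conn.execute("INSERT INTO schema_info (version) VALUES (0)")
        conn.commit()
        return 0
    return row[0]


def migrate(conn, target):
    """Move the schema up or down until it is at the target version."""
    version = current_version(conn)
    if target >= version:
        # Apply pending "up" steps in ascending order.
        steps = [(v, up) for v, up, _down in MIGRATIONS if version < v <= target]
    else:
        # Undo applied steps in descending order; each leaves us one version lower.
        steps = [(v - 1, down) for v, _up, down in reversed(MIGRATIONS)
                 if target < v <= version]
    for new_version, sql in steps:
        conn.execute(sql)
        conn.execute("UPDATE schema_info SET version = ?", (new_version,))
        conn.commit()
    return current_version(conn)


def table_names(conn):
    """List the tables currently present, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    return [r[0] for r in rows]
```

Calling `migrate(conn, 2)` on an empty database creates both tables, and `migrate(conn, 1)` afterwards drops `posts` again. Since the migration list lives in a source file, it goes under version control together with the code that depends on it.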

Perl is, of course, less integrated, and its primary offering, DBIx::Migration, could use more attention. Its main drawback is being tied to SQL.

I believe that mimicking Active Migrations functionality should become a new standard for every major open-source language, such as Perl, Python, or PHP, just as everyone now expects something similar to CPAN from every new language that tries to attract serious attention.

Meanwhile, Microsoft is taking advantage of its integrated development environment and making it easier to keep the database schema under common version control.

Jeff Lynch writes in Microsoft Visual Studio Team Edition for Database Professionals:

…with Visual Studio Team Edition for Database Professionals, database development now becomes a fully supported part of your application’s development lifecycle. Now all database development is done “off line” and in a “sandbox” environment (this should make your dba stand up and sing!). All user defined functions and stored procedures can be fully unit tested using representative test data automatically generated by the toolset. And best of all, this new Visual Studio sku fully integrates into Team Foundation Server so your database schema (SQL scripts) can be put under source control just like any other C#, BizTalk or Web Application!

Sachin Rekhi writes in Extensibility in Team Edition for Database Professionals:

Test Conditions. The database unit testing feature allows you verify tests using either SQL assertions or easily configurable UI client-side test conditions. We ship a set of test conditions in the box, including row count, scalar value, empty resultset, etc. But these test conditions are completely extensible so you can imagine creating your own to do more powerful test verification. Check-in Policies. Team System also allows you to create custom check-in policies that require certain actions to be performed prior to check-in. For example, a testing policy that ships with TFS enforces that a specific set of tests is run prior to checking in your code. You can implement other such db specific policies if you desired.

Several third-party tools are available to take control of your database schema for MS SQL Server, e.g.:

What’s your experience with such tools? What do you use? How did it affect your development habits? What are the drawbacks? How should we design database migration tools for open-source languages?

The Daily WTF: “Happy Merge Day!”

Saturday, November 4th, 2006
Our favourite source of optimism, The Daily WTF, shares a story about version control done strictly wrong. Read it at “Happy Merge Day”.