Archive for the 'Use cases' Category

configure.in and version control

Monday, January 8th, 2007
Karl Fogel writes in the “Using the Version Control System” section of his book “Producing Open Source Software”:
Don’t keep generated files under version control. They are not truly editable data, since they are produced programmatically from other files. For example, some build systems create configure based on the template configure.in. To make a change to the configure, one would edit configure.in and then regenerate; thus, only the template configure.in is an “editable file.” Always version only the templates. If you version the result files as well, people will inevitably forget to regenerate when they commit a change to a template, and the resulting inconsistencies will cause no end of confusion.
I believe that this advice is only partially correct. There is almost always one important use case for the repository: someone may want to check out the current sources and build them on some unknown machine. Practice shows that you have to match the versions of all three tools, Automake, Autoconf, and Libtool, used by the project almost exactly, or you will get random build errors that are very hard to diagnose and fix. You should spare that unknown person such pain. So we always add the configure and Makefile.in files to the repository, although technically they are “generated” (from configure.ac and Makefile.am, respectively).
To fix the “people will forget to regenerate” problem, just always pass the --enable-maintainer-mode switch to ./configure; those files will then be regenerated automatically the next time you run make. (Hopefully, you commit only if the thing at least compiles.) Remember that many Autotools by-products, such as config.cache and stamp-h*, must be added to the ignore list of your SCM tool, so that they do not distract you.
This not only allows you to build from the sources, but also to make quick fixes, as long as they touch only existing source files and do not require adjusting the build infrastructure. Obviously, most fixes fall into that category. If our unknown developer needs to add a new file, change a dependency, or add a clause to configure.ac, she will have to install the correct versions of the Autotools (which should be documented exactly in your project’s developer documentation).
Of course, the above does not apply to “ordinary” generated files. If you have DocBook-based documentation and you generate a bunch of HTML files from it, you should never add them to the repository. Only the master .xml files should be versioned.
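The “forgot to regenerate” risk can also be checked for before committing. What follows is a minimal Python sketch, not part of any Autotools release. It assumes configure is generated from configure.ac and Makefile.in from Makefile.am in the same directory (older projects use configure.in, so adjust the mapping), and it treats file modification times as a proxy for “regenerated after the last template edit”. The names stale_generated_files, GENERATED_FROM and IGNORE_PATTERNS are hypothetical. config.cache and stamp-h* come from the text above; config.log, config.status and autom4te.cache are other common Autotools by-products.

```python
import fnmatch
import os
from pathlib import Path

# Generated file -> template it is produced from (the pairs named in the post).
GENERATED_FROM = {
    "configure": "configure.ac",
    "Makefile.in": "Makefile.am",
}

# Autotools by-products for the SCM ignore list. config.cache and stamp-h*
# come from the post; the remaining patterns are other common by-products.
IGNORE_PATTERNS = ["config.cache", "stamp-h*", "config.log", "config.status", "autom4te.cache"]


def is_ignored(name: str) -> bool:
    """True if a file or directory name matches one of the ignore patterns."""
    return any(fnmatch.fnmatch(name, pattern) for pattern in IGNORE_PATTERNS)


def stale_generated_files(root: Path) -> list[Path]:
    """Generated files that are missing or older than their template.

    Modification time is only a heuristic: a template edited after its
    generated file suggests that someone forgot to regenerate.
    """
    problems = []
    for dirpath, dirnames, _filenames in os.walk(root):
        # Do not descend into ignored directories such as autom4te.cache.
        dirnames[:] = [d for d in dirnames if not is_ignored(d)]
        for generated, template in GENERATED_FROM.items():
            gen_path = Path(dirpath) / generated
            tpl_path = Path(dirpath) / template
            if not tpl_path.exists():
                continue  # nothing to generate in this directory
            if not gen_path.exists() or tpl_path.stat().st_mtime > gen_path.stat().st_mtime:
                problems.append(gen_path)
    return sorted(problems)
```

A pre-commit hook could refuse the commit whenever the returned list is non-empty. Because a fresh checkout may stamp files in arbitrary order, the check is most meaningful in a working copy where the templates were actually edited.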

Brian Harry: “Branch structure in Developer Division at Microsoft”

Sunday, December 24th, 2006
Brian Harry shares an inside story about how branches are organized in the Developer Division at Microsoft. They use many branches (called “virtual build labs”), with strict rules for moving changes between labs. Private branches and release branches are also used extensively. The post is quite old (November 2005), and they already seem to be switching to feature branches. Nevertheless, it’s a good read.
Some arithmetic:
I’d estimate that we have on the order of 800 developers[…]. Let’s imagine for a minute that any given developer has a brain fart and checks in a build break, serious bug, etc once every 6 months (which in my opinion would be a team of 800 of the best developers I’ve ever met). There’s about 250 working days in a year and simple math says 800 * 2 / 250 > 6 serious bugs checked in every single working day. The fact is it’s higher than that.
Read more about how they cope with bugs at that scale in Brian Harry: “Branch structure in Developer Division at Microsoft”. (via notgartner blog)
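The arithmetic in the quote can be checked with a tiny Python helper. The name expected_breaks_per_day is hypothetical; the inputs are exactly Brian Harry’s assumptions: 800 developers, one break per developer every six months (that is, 2 per year), and 250 working days per year.

```python
def expected_breaks_per_day(developers: int, breaks_per_dev_per_year: float,
                            working_days_per_year: int) -> float:
    """Average number of breaking check-ins per working day across the team."""
    return developers * breaks_per_dev_per_year / working_days_per_year


# Brian Harry's figures: 800 developers, 2 breaks per developer per year,
# 250 working days per year.
rate = expected_breaks_per_day(800, 2, 250)
print(rate)  # 6.4
```

So even this optimistic team would break a single shared branch more than six times per working day on average, which helps explain why changes are isolated in separate labs.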

Books: Karl Fogel “Producing Open Source Software”

Monday, December 4th, 2006
Karl Fogel, one of the lead developers behind Subversion and CVS, wrote a book on open source software development. A large part of the text is dedicated to the social side of software development: communication, finances, managing volunteers, licenses. The technical side is also strong, as could be expected from a developer of such caliber. Of course, Karl has quite a lot to say about version control as applied to open source projects. The book is freely available under an open license. You may browse it online at Producing Open Source Software: How to Run a Successful Free Software Project. In particular, the version control section is at Chapter 3. Technical Infrastructure: Version Control. You may also buy the book at Amazon: Karl Fogel: “Producing Open Source Software: How to Run a Successful Free Software Project”.

Joel Spolsky: “Using source control tools on huge projects”

Monday, December 4th, 2006
Joel Spolsky forwarded more information about source control in the Windows team:
“… prior to the restart effort of Longhorn, there were about seven [branches], reverse-integrating into one main branch every two or three weeks perhaps. Now, imagine several thousand developers checking in directly into seven branches. This will lead to two things:
1. you check in frequently, and there’s a very high chance of either breaking the build, or breaking functionality in the OS, or
2. as a counter-reaction, you don’t check in very often, which clearly is bad, since now you don’t have a good delta history of what you did.
So this clearly didn’t scale. As part of the restart effort, we decided that each team would get its own feature branch, each feature area (multiple teams) would go up to an aggregation branch, and those would lead up to the final main branch. (As such there’s now north of a hundred branches in tiers, leading up to about six aggregation branches.) Teams were free to choose how many sub-feature branches they wanted, if any, and they were free to choose how often they wanted to push up their changes to the aggregation branch. As part of the reverse-integration (RI, i.e. pushing up) process, various quality gates had to pass, including performance tests. Due to how comprehensive those gates ended up being, this would take at least a day to run, plus perhaps a day or two to triage issues if any cropped up; so there was a possibly considerable cost to doing an RI in the first place.
However, these gates were essential in upholding the quality of the main branch, and had they not existed, the OS would have never shipped. I suppose it’s one of those ‘what doesn’t kill you…’ type deals.”
Joel adds:
When you’re working with source control on a huge team, the best way to organize things is to create branches and sub-branches that correspond to your individual feature teams, down to a high level of granularity. If your tools support it, you can even have private branches for every developer. So they can check in as often as they want, only merging up when they feel that their code is stable. Your QA department owns the “junction points” above each merge. That is, as soon as a developer merges their private branch with their team branch, QA gets to look at it and they only merge it up if it meets their quality bar.
Read it all at Joel Spolsky: “Using source control tools on huge projects”.
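Joel’s scheme, like the Windows tiers quoted above, boils down to a tree of branches where each upward merge (reverse integration) must pass a quality gate. The following minimal Python sketch models that idea under stated assumptions: Branch, reverse_integrate, always_pass and no_known_breaks are hypothetical names, a “change” is just a string, and the gate is a stand-in for QA’s real build, test and performance checks.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


def always_pass(changes: list[str]) -> bool:
    """Default gate: accept everything (e.g. a private branch merging into its team branch)."""
    return True


@dataclass
class Branch:
    name: str
    parent: Optional["Branch"] = None
    changes: list[str] = field(default_factory=list)
    # Quality gate guarding the junction between this branch and its parent.
    gate: Callable[[list[str]], bool] = always_pass


def reverse_integrate(branch: Branch) -> bool:
    """Push a branch's new changes one level up if its gate accepts them."""
    if branch.parent is None:
        raise ValueError(f"{branch.name} is the root and has no parent")
    pending = [c for c in branch.changes if c not in branch.parent.changes]
    if not branch.gate(pending):
        return False  # rejected: the changes stay on the lower branch
    branch.parent.changes.extend(pending)
    return True


def no_known_breaks(changes: list[str]) -> bool:
    """Toy QA gate: reject any batch containing a change marked as breaking."""
    return not any(c.startswith("BREAK") for c in changes)


main = Branch("main")
team = Branch("team", parent=main, gate=no_known_breaks)
private = Branch("dev-private", parent=team)

private.changes += ["fix menu layout", "BREAK: half-finished refactor"]
private_ok = reverse_integrate(private)  # True: the team branch accepts it
team_ok = reverse_integrate(team)        # False: the QA gate keeps main clean
```

Because rejected changes stay on the lower branch, a broken check-in only affects its own team until it is fixed; the price, as the Windows quote notes, is that every reverse integration has to get through the gate.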

Perforce as the version control system at Google

Sunday, December 3rd, 2006
It is well known that Google uses Perforce as its internal source management system (it has a source license). Niall Kennedy writes:
Google uses a company-wide Perforce depot with almost no developer branches. Each developer has their own NFS workspace readable by anyone in the company, including automated processes. An administrative process takes snapshots of each developer workspace including local development environments accessed over SSH. Files within these snapshots can be compared to checked-in data, encrypted, and archived.
Dan Bloch gave a presentation called “Performance and Database Locking at Large Perforce Sites” at the Perforce European User Conference. It contains statistics on Google’s Perforce depot, such as:
  • More than 3000 users and 100 GB of metadata on one primary server;
  • Hardware is an HP DL585 4-way Opteron with 128 GB of memory;
  • Depot is on a NetApp filer;
  • Metadata and journal are on RAID-10 local disk.
(via Niall Kennedy: “Google Mondrian: web-based code review and storage”)
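Niall Kennedy’s description says only that files in workspace snapshots “can be compared to checked-in data”; how Google actually does this is not stated. As a hypothetical illustration, such a comparison could classify each path by content digest. compare_snapshot and digest are assumed names, and so are the inputs: the snapshot is a mapping of paths to file contents, and the checked-in side is a mapping of paths to SHA-256 hex digests.

```python
import hashlib


def digest(data: bytes) -> str:
    """SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(data).hexdigest()


def compare_snapshot(snapshot_files: dict[str, bytes],
                     checked_in_digests: dict[str, str]) -> dict[str, list[str]]:
    """Classify snapshot paths as modified, added, or missing relative to checked-in data."""
    snapshot_digests = {path: digest(data) for path, data in snapshot_files.items()}
    common = snapshot_digests.keys() & checked_in_digests.keys()
    return {
        # Present on both sides but with different contents.
        "modified": sorted(p for p in common if snapshot_digests[p] != checked_in_digests[p]),
        # Only in the developer's workspace: not yet checked in.
        "added": sorted(snapshot_digests.keys() - checked_in_digests.keys()),
        # Checked in, but deleted from the workspace.
        "missing": sorted(checked_in_digests.keys() - snapshot_digests.keys()),
    }
```

Keeping only digests for the checked-in side makes it cheap to hold; the encryption and archiving steps mentioned in the quote are left out of this sketch.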

Moishe Lettvin “The Windows Shutdown crapfest”

Thursday, November 30th, 2006
One of the hottest links at the moment is Moishe Lettvin’s blog post about how the Windows Vista shutdown menu came to be the way it is. What caught my attention, of course, is how version control is deployed inside Microsoft. Quoting Moishe:
I’d also like to sketch out how actual coding — what there is of it — works on the Windows team. In small programming projects, there’s a central repository of code. Builds are produced, generally daily, from this central repository. Programmers add their changes to this central repository as they go, so the daily build is a pretty good snapshot of the current state of the product.
In Windows, this model breaks down simply because there are far too many developers to access one central repository — among other problems, the infrastructure just won’t support it. So Windows has a tree of repositories: developers check in to the nodes, and periodically the changes in the nodes are integrated up one level in the hierarchy. At a different periodicity, changes are integrated down the tree from the root to the nodes.
In Windows, the node I was working on was 4 levels removed from the root. The periodicity of integration decayed exponentially and unpredictably as you approached the root so it ended up that it took between 1 and 3 months for my code to get to the root node, and some multiple of that for it to reach the other nodes.
It should be noted too that the only common ancestor that my team, the shell team, and the kernel team shared was the root. So in addition to the above problems with decision-making, each team had no idea what the other team was actually doing until it had been done for weeks.
Read the whole story: Moishe Lettvin “The Windows Shutdown crapfest”.
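Moishe’s point about latency can be made concrete with a toy model of a tree of repositories. In this Python sketch the tree shape loosely mirrors his description (his node four levels below the root, and another team sharing only the root as a common ancestor), but every node name and integration interval is hypothetical. The model assumes a change just misses each integration, so its worst-case delay is the sum of the intervals on the path up to the lowest common ancestor and back down, with the same interval assumed for up- and down-integration on each edge.

```python
from typing import Optional

# Hypothetical tree of repositories: node -> parent (None for the root).
PARENT: dict[str, Optional[str]] = {
    "root": None,
    "agg-1": "root",
    "agg-1a": "agg-1",
    "agg-1b": "agg-1a",
    "team-a": "agg-1b",  # four levels below the root
    "agg-2": "root",
    "team-b": "agg-2",   # shares only the root with team-a
}

# Hypothetical worst-case wait, in days, to cross the edge from a node to its
# parent (or back down); intervals grow toward the root.
INTERVAL_DAYS: dict[str, int] = {
    "team-a": 7, "agg-1b": 14, "agg-1a": 21, "agg-1": 30,
    "team-b": 7, "agg-2": 30,
}


def path_to_root(node: str) -> list[str]:
    """Nodes from `node` up to and including the root."""
    path = [node]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path


def lowest_common_ancestor(a: str, b: str) -> str:
    """First node on b's path to the root that is also an ancestor of a."""
    ancestors_of_a = set(path_to_root(a))
    for node in path_to_root(b):
        if node in ancestors_of_a:
            return node
    raise ValueError("nodes are not in the same tree")


def worst_case_visibility_days(src: str, dst: str) -> int:
    """Days until a change checked in at src is visible at dst, in the worst case."""
    lca = lowest_common_ancestor(src, dst)
    src_path, dst_path = path_to_root(src), path_to_root(dst)
    up = src_path[: src_path.index(lca)]    # edges climbed from src to the LCA
    down = dst_path[: dst_path.index(lca)]  # edges descended from the LCA to dst
    return sum(INTERVAL_DAYS[node] for node in up + down)


print(worst_case_visibility_days("team-a", "root"))    # 72
print(worst_case_visibility_days("team-a", "team-b"))  # 109
```

With these made-up numbers a change needs 72 days to reach the root, within the 1 to 3 months Moishe reports, and more than three months to reach a team whose only common ancestor is the root.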