January 2005 archives from
Piotr's R&D blog

Reef CVS adapter done

Tuesday, January 25, 2005, 06:30PM - category General

Reef's CVS adapter is (provisionally) done. Reef will now connect to a CVS repository, seek out any changes, update a local working copy and analyze the log data to present a unified modifications report. Subsequent phases use this report to process only what has changed, which will hopefully improve the application's performance greatly on huge code bases.
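To illustrate how a modifications report can let later phases skip unchanged files, here is a minimal Python sketch. It assumes the report boils down to comparing two snapshots of the working copy, each a mapping from file path to CVS revision, taken before and after the update. The function name `changed_files` and the snapshot format are hypothetical and not Reef's actual report format.

```python
from typing import Dict, Set


def changed_files(previous: Dict[str, str], current: Dict[str, str]) -> Dict[str, Set[str]]:
    """Compare two hypothetical path -> revision snapshots of a working copy.

    Returns the sets of added, removed and modified paths, so that later
    phases only need to look at files that actually changed.
    """
    added = {p for p in current if p not in previous}
    removed = {p for p in previous if p not in current}
    # A file counts as modified if it exists in both snapshots
    # but its revision number differs.
    modified = {p for p in current if p in previous and current[p] != previous[p]}
    return {"added": added, "removed": removed, "modified": modified}
```

A downstream phase would then re-analyze only `added | modified` and drop whatever it had cached for `removed`, leaving everything else untouched.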

The modification extraction is particularly interesting. I found out that CVSNT, an alternative but generally compatible implementation of the CVS system, tags log entries with automatically generated commit IDs. This allows for easy reconstruction of the atomic commits actually performed by the user, and was not mentioned in the papers I read about the topic. Of course, if the commit IDs are not present for any reason, Reef will use a standard sliding-window algorithm to synthesize best-guess multi-file commits. The papers made this out to be a big deal, but I found that (with the backing of an XML database that supports XQuery) the algorithm comes down to less than a screenful of code. Once you've fixed the bugs in the database, of course... ah, the wonder of open source projects!
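The two-step grouping described above can be sketched roughly as follows. Reef itself does this with XQuery against an XML database; this is a Python stand-in, not Reef's code. It assumes each per-file log entry carries an optional CVSNT commit ID. When the ID is present, entries are grouped on it exactly. Otherwise a common sliding-window heuristic is applied, under assumptions of my own: entries with the same author and log message join the current group if they fall within a window (3 minutes here, chosen arbitrarily) of the group's most recent entry, and a file appearing twice forces a new commit. The `LogEntry` fields and `group_commits` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class LogEntry:
    path: str
    revision: str
    author: str
    message: str
    timestamp: datetime
    commit_id: Optional[str] = None  # CVSNT commit ID; absent on plain CVS


def group_commits(entries: List[LogEntry],
                  window: timedelta = timedelta(minutes=3)) -> List[List[LogEntry]]:
    """Reconstruct best-guess atomic commits from per-file log entries."""
    commits: List[List[LogEntry]] = []

    # Step 1: entries tagged with a commit ID are grouped exactly on that ID.
    by_id: Dict[str, List[LogEntry]] = {}
    untagged: List[LogEntry] = []
    for e in entries:
        if e.commit_id:
            by_id.setdefault(e.commit_id, []).append(e)
        else:
            untagged.append(e)
    commits.extend(by_id.values())

    # Step 2: sliding window over the untagged entries, in time order.
    # Each (author, message) pair has at most one "open" group; the window
    # slides because it is measured from the group's latest entry.
    open_groups: Dict[Tuple[str, str], List[LogEntry]] = {}
    for e in sorted(untagged, key=lambda x: x.timestamp):
        key = (e.author, e.message)
        group = open_groups.get(key)
        if (group is not None
                and e.timestamp - group[-1].timestamp <= window
                and all(g.path != e.path for g in group)):
            group.append(e)
        else:
            # Start a new commit; the previous group stays in `commits`.
            group = [e]
            open_groups[key] = group
            commits.append(group)

    # Present commits in chronological order of their first entry.
    commits.sort(key=lambda c: min(x.timestamp for x in c))
    return commits
```

With entries for `a.c`, `b.c` and `c.c` committed two minutes apart by the same author with the same message, all three end up in one commit. A later change to `a.c` with that same message starts a fresh commit even inside the window, because `a.c` is already in the open group.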

There are a couple of things the adapter won't do yet: deal with module aliases or branching. To deal with branching, I'd have to figure out a merging mechanism for diagram edits, which I frankly don't feel like doing yet. Hopefully there are enough projects out there without branches that will still find Reef useful. As for module aliases (and other CVSROOT/modules tricks), does anyone know how prevalent they are in practice? I suspect it should be possible to work around at least a few of the problems, but I'd rather not unless they block a significant number of potential deployments.