An Introduction to the Conary Software Provisioning System


Abstract

rPath, Inc. introduces the Conary Software Provisioning System, a fresh approach to open source software management and provisioning designed to facilitate maintaining branches, ease system administration, and reduce resource use.

Introduction

Managing and customizing Linux systems has long been hampered by the very heart of system maintenance: the software management system. With the packaging systems and tools currently available for Linux, safekeeping local changes to source code and configuration files has always been left to users and administrators, who must synchronize those changes by hand whenever the operating system distributor makes changes of its own.

Conary introduces software management that combines repository-based source code management with traditional package management. Users and administrators can now keep their local changes intact across changes to the operating system such as upgrades, security patches, and bug fixes. With technologies such as repositories, intelligent branching, shadowing, and changeset-based management, rPath's Linux offerings will benefit businesses, system administrators, developers, and users.

The Repository

Conary distinguishes itself from classical Linux software management tools by using a versioned repository. Where once there was a large set of package files, there is now a repository of source and binary files. The Conary repository is a network-accessible database that contains files for applications, libraries, and other elements of the operating system. In addition, the repository contains multiple versions of these files on multiple development branches. In simple terms, Conary can be described as a packaging system that works like a source control system.

Within its repository, Conary organizes files by grouping them first into components, which are then grouped into one or more packages. Conary applies the same systematic versioning scheme throughout the system to avoid confusion. Because packages are collections of files in a repository, a version is specified as the repository location, followed by the original version number (from the authors of the software), the source revision number, and the binary build revision number.
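The four parts of a version can be illustrated with a small parser. This is a minimal sketch, assuming a hypothetical textual layout of `/<repository>/<upstream>-<source>-<build>`; the host name `conary.example.com`, the `Version` class, and `parse_version` are made up for illustration and are not Conary's own API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    """Hypothetical decomposition of a full version string."""
    repository: str   # repository location, e.g. "conary.example.com@rpl:devel"
    upstream: str     # version number chosen by the software's authors
    source_rev: int   # source revision number
    build_rev: int    # binary build revision number


def parse_version(text: str) -> Version:
    # Assumed (hypothetical) layout: /<repository>/<upstream>-<source>-<build>
    _, repository, revision = text.split("/")
    # Split from the right so the two revision numbers are always the last two fields.
    upstream, source_rev, build_rev = revision.rsplit("-", 2)
    return Version(repository, upstream, int(source_rev), int(build_rev))


v = parse_version("/conary.example.com@rpl:devel/2.6.11-3-1")
print(v.upstream, v.source_rev, v.build_rev)
```

Keeping the repository location inside the version means two builds from different repositories can never be mistaken for one another, even when their upstream numbers match.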

Components contain all the files Conary needs to install the application or library, and are stored with the files themselves in a network-enabled repository. This allows the applications to be "checked out" as in a source control system. Similarly, all the sources required to build components are stored in the repository using the same version system, so that changes to the source can be made in an environment that maintains the relationships between sources and binaries.

In addition, source code that builds more than one component is represented by only one instance in the repository. For example, if the same source code builds the applications mozilla and mozilla-chat, there is no duplication of the source code in the repository or on the user's machine. Also, when updating packages to new versions, Conary updates only files that have actually changed in some way. Together, these behaviors let Conary use significantly fewer system and user resources than traditional packaging applications.
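One plausible way to get both properties is content-addressed storage: file contents are keyed by a hash, so identical contents are stored once, and comparing hashes shows which files actually changed. The sketch below assumes SHA-1 digests and a hypothetical `ContentStore` class; it illustrates the idea rather than Conary's actual storage layout.

```python
import hashlib


class ContentStore:
    """Hypothetical store that keeps each distinct file body exactly once."""

    def __init__(self):
        self.blobs = {}      # SHA-1 hex digest -> file contents
        self.manifests = {}  # trove version -> {path: digest}

    def add(self, version, files):
        manifest = {}
        for path, data in files.items():
            digest = hashlib.sha1(data).hexdigest()
            # Identical contents map to the same digest, so they are stored once.
            self.blobs.setdefault(digest, data)
            manifest[path] = digest
        self.manifests[version] = manifest

    def changed_paths(self, old, new):
        """Paths that are new, or whose contents differ, in `new` relative to `old`."""
        before, after = self.manifests[old], self.manifests[new]
        return {path for path, digest in after.items() if before.get(path) != digest}


store = ContentStore()
store.add("app 1.0", {"/usr/bin/app": b"v1", "/usr/share/doc/app/README": b"docs"})
store.add("app 1.1", {"/usr/bin/app": b"v2", "/usr/share/doc/app/README": b"docs"})
print(len(store.blobs), store.changed_paths("app 1.0", "app 1.1"))
```

The unchanged README is stored once and is not reported as changed, so an update only needs to touch `/usr/bin/app`.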

Branch Structure

Classical software management suffers from a few shortcomings. In particular, most packaging systems use simple version numbers so that package versions can be sorted into "older" and "newer" packages, adding concepts such as epochs to work around version numbers that do not follow the packaging system's assumptions about how version numbers increase. While the concepts of "newer" and "older" seem simple, they break down when multiple streams of development are maintained simultaneously. For example, different versions of a Linux distribution include different versions of the same libraries, so the exact same source code built for different distribution versions yields different binary packages. A linear ordering of version numbers cannot represent this situation, which grows more complicated with every stream that is maintained. Neither of the binary packages is newer than the other; the packages simply apply to different contexts.

Conary instead uses descriptive strings to specify both the version numbers and the branch structure for any given component. The version encodes not only this information but also the location of the repository on the network, whether that repository is external or on the local machine. Although this makes the full version long, Conary normally abbreviates it into a form that closely resembles the versions other software management systems use.

In addition to the repository location, Conary uses further versioning conventions to avoid build conflicts. The numeric portion of the version contains the upstream version number, followed by the source build number (how many times the sources have changed) and the binary build number (how many times this particular set of sources has been built). These source and build numbers are specific to the repository in which they are built. Conary compares two upstream versions only to see whether they are the same or different; the real meaning of the version is derived from the source build number and binary build number in relation to the branch and repository names.
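The comparison rules above can be sketched as a function that refuses to order versions from different branches. This is a minimal sketch under stated assumptions: a version is a hypothetical `(branch, upstream, source_rev, build_rev)` tuple, and when two versions on the same branch have different upstream strings, the sketch orders them by their (assumed) commit order on the branch instead of parsing the upstream numbers. The `compare` function and `history` mapping are illustrative names, not Conary's API.

```python
from typing import Optional

# (branch, upstream, source_rev, build_rev)
Version = tuple[str, str, int, int]


def compare(a: Version, b: Version, history: dict[str, list[Version]]) -> Optional[int]:
    """Return -1, 0 or 1 for versions on the same branch, None across branches."""
    if a[0] != b[0]:
        return None  # different branches: neither is "newer", they apply to different contexts
    if a[1] == b[1]:
        # Same upstream version: source revision first, then build revision.
        key_a, key_b = a[2:], b[2:]
    else:
        # Upstream strings are only tested for equality, never parsed;
        # assume order is taken from the sequence of commits on the branch.
        order = history[a[0]]
        key_a, key_b = order.index(a), order.index(b)
    return (key_a > key_b) - (key_a < key_b)


devel = "conary.example.com@rpl:devel"
stable = "conary.example.com@rpl:1"
history = {devel: [(devel, "2.0rc1", 1, 1), (devel, "2.0", 1, 1)]}
print(compare((devel, "2.0", 1, 1), (devel, "2.0rc1", 1, 1), history))
print(compare((devel, "2.0", 1, 1), (stable, "2.0", 1, 1), history))
```

Here "2.0" correctly counts as newer than "2.0rc1" even though a naive string sort would put it first, and the same build on two branches is reported as incomparable rather than forced into an order.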

Similarly, when sources are branched, Conary creates a branch label to distinguish what has changed from the original sources. Because the version is quite long at that point, the branch portion is normally hidden from the user. Nevertheless, the full string describes the version precisely and prevents version conflicts.
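A branched version can be modelled as the path of labels it was branched through plus the revision on the innermost branch, with a short form for display. This sketch assumes a hypothetical layout in which labels are joined with `/`; the `BranchedVersion` class and the label names are illustrative only and do not reproduce Conary's exact string format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BranchedVersion:
    """Hypothetical model: labels the trove was branched through, plus a revision."""
    labels: tuple[str, ...]  # oldest label first
    revision: str            # "upstream-source-build" on the innermost branch

    def branch(self, new_label: str) -> "BranchedVersion":
        # Branching appends a label but keeps the whole parent path,
        # so the full version still records where the branch came from.
        return BranchedVersion(self.labels + (new_label,), self.revision)

    def full(self) -> str:
        return "/" + "/".join(self.labels) + "/" + self.revision

    def short(self) -> str:
        # The abbreviated form a user normally sees.
        return self.revision


trunk = BranchedVersion(("conary.example.com@rpl:devel",), "2.6.11-3-1")
local = trunk.branch("example.com@local:mydriver")
print(local.full())
print(local.short())
```

The long form is unambiguous because it carries the whole branch path, while the short form stays as readable as a conventional version number.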

Conary is designed to make branching an inherent part of maintaining and customizing the system, while avoiding the version number conflicts that have long affected both users and developers.

Shadows

One persistent problem in the open source community is the maintenance and customization of applications and libraries that change often. Given the speed of change inherent in the high-tech world, conflicts often arise when a developer or administrator creates local changes and then tries to track changing upstream development.

The most powerful way to manage local changes is, of course, to build them in from the source code. Conary makes this possible in two ways. The first is the simple branch, just as you would create with any source code control software. Unfortunately, this is not always the best solution.

If, for example, you were maintaining a version of the Linux kernel in which you had to compile in a specific driver, you could create a branch to add your driver, but all the work you do would be relative to the kernel version you started with. Creating a new branch to track another version of the kernel does not help, because the new branch would go off in its own direction just like your first branch. Therefore, when a new kernel is released and committed to the repository, the only way to bring its changes onto your branch would be to compare the changes manually, apply them, bring your patch up to date, and commit the result. This time-consuming work would have to be repeated for every new kernel release.

Conary introduces a new concept: the shadow. A shadow acts primarily as a new layer for keeping local changes while tracking upstream changes. Shadows allow local changes to be kept distinct from the branching structure of a component being tracked; this makes it straightforward to reapply those changes to other locations in the version tree. Shadows are not designed to facilitate forking, but rather as a tool to allow local changes to track another repository. As with all aspects of the Conary system, shadows are labeled intelligently for the maintainer's ease of use.

With shadows, maintaining the example kernel above is simply a matter of updating the shadow, modifying the local patch if necessary, and committing the new changes to the shadow. Essentially, you are able to track the changes in the kernel while easily maintaining a patch. This maintenance and customization takes less work and less time than maintaining a branch, whether your task is maintaining small changes on frequently-updated components or managing a large set of changes relative to an entire operating system.
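The shadow workflow can be sketched as a layer of local changes sitting on top of whichever upstream version is currently tracked. This is a simplified sketch, assuming files are represented as a hypothetical `path -> contents` mapping and local changes as whole-file overrides; the `Shadow` class is illustrative and not Conary's implementation. Updating the shadow swaps in the new upstream and reports which local changes sit on files that upstream also modified, which are the ones that may need their patch revisited.

```python
class Shadow:
    """Hypothetical sketch: a layer of local changes over a tracked upstream."""

    def __init__(self, upstream: dict[str, str], local: dict[str, str]):
        self.upstream = dict(upstream)  # path -> contents of the tracked upstream version
        self.local = dict(local)        # path -> locally modified contents

    def contents(self) -> dict[str, str]:
        # Local changes take precedence; everything else comes from upstream.
        return {**self.upstream, **self.local}

    def update(self, new_upstream: dict[str, str]) -> set[str]:
        """Track a new upstream release while keeping the local layer.

        Returns the locally changed paths whose upstream contents also changed,
        i.e. the local patches that may need to be modified by hand.
        """
        stale = {p for p in self.local if self.upstream.get(p) != new_upstream.get(p)}
        self.upstream = dict(new_upstream)
        return stale


kernel = Shadow(
    {"Makefile": "k-2.6.10", "drivers/Kconfig": "base-10"},
    {"drivers/Kconfig": "base-10 + mydriver"},
)
print(kernel.update({"Makefile": "k-2.6.11", "drivers/Kconfig": "base-10"}))
print(kernel.update({"Makefile": "k-2.6.12", "drivers/Kconfig": "base-12"}))
```

The first update carries the driver change forward with no manual work; the second flags `drivers/Kconfig` because upstream touched the same file.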

Changesets

Anyone responsible for system maintenance or system configuration wants to accomplish their tasks in the simplest and safest manner. Traditional packaging systems make loading a new release of an application or library easy, but do so in a "blanket" manner. When traditional systems update packages, they do not check whether the files being replaced are pristine; files are simply overwritten whether they have been changed or not. Writing unchanged files over again creates extra overhead and is intrusive to a well-running system. The risk is normally small, but the overhead is significant.

Just as source code control systems use patch files to describe the differences between two versions of a file, Conary uses changesets to describe the differences between versions of components. These changesets include changes to the contents of existing files, the contents of new files, name changes (if a file is renamed but otherwise unchanged, only the change in name is included), permission changes, and so forth. They can also include changes to components as well as to individual files.
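Computing a changeset between two versions of a component can be sketched as a diff over file records. This sketch assumes each version is a mapping from a hypothetical stable file id to a `FileInfo` record, which is what allows a rename to be recognized as the same file under a new name; `FileInfo` and `make_changeset` are illustrative names. For brevity it ships the whole new contents of a modified file, where a real changeset could ship a smaller diff.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    path: str
    contents: bytes
    mode: int


def make_changeset(old: dict[str, FileInfo], new: dict[str, FileInfo]) -> dict:
    """Describe how to turn `old` into `new`; keys are (hypothetical) stable file ids."""
    changeset = {"new": {}, "changed": {}, "removed": []}
    for fid, after in new.items():
        before = old.get(fid)
        if before is None:
            changeset["new"][fid] = after   # brand-new file: ship everything
            continue
        delta = {}
        if after.path != before.path:
            delta["path"] = after.path      # rename: only the new name is recorded
        if after.contents != before.contents:
            delta["contents"] = after.contents
        if after.mode != before.mode:
            delta["mode"] = after.mode      # permission change only
        if delta:
            changeset["changed"][fid] = delta  # unchanged files do not appear at all
    changeset["removed"] = sorted(fid for fid in old if fid not in new)
    return changeset


old = {
    "f1": FileInfo("/etc/app.conf", b"a=1\n", 0o644),
    "f2": FileInfo("/usr/bin/app", b"BIN1", 0o755),
    "f3": FileInfo("/usr/share/app/old.txt", b"x", 0o644),
    "f5": FileInfo("/usr/share/app/gone.txt", b"z", 0o644),
}
new = {
    "f1": FileInfo("/etc/app.conf", b"a=1\n", 0o600),
    "f2": FileInfo("/usr/bin/app", b"BIN2", 0o755),
    "f3": FileInfo("/usr/share/app/renamed.txt", b"x", 0o644),
    "f4": FileInfo("/usr/share/app/new.txt", b"y", 0o644),
}
print(make_changeset(old, new)["changed"])
```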

Conary changesets are quite often transient objects; they are created as part of an operation (such as fetching a new version from a repository) and disappear when that operation has completed. They can be stored in files, however, which allows them to be distributed like the package files produced by a classical package management system. Applying changesets rather than installing whole new versions of libraries and applications allows Conary to update only the parts of the system which have changed, rather than blindly reinstalling every file. Changesets are more efficient than classic packages in at least two ways: they take less space to express what changes to make on the system, and they take less time to apply the changes to the system when the set of changes required is small. These benefits apply whether the changesets are acquired through a network connection to a repository, on a CD, or via any other method.
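Applying a changeset touches only the files it mentions. The sketch below assumes the same hypothetical representation as a changeset with `new`, `changed`, and `removed` entries keyed by stable file ids; `apply_changeset` is an illustrative name, and it returns the list of touched file ids so the "update only what changed" property is visible.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FileInfo:
    path: str
    contents: bytes
    mode: int


def apply_changeset(installed: dict[str, FileInfo], changeset: dict):
    """Return the updated state and the ids of the files that were actually touched."""
    state = dict(installed)  # leave the caller's view of the system untouched
    touched = []
    for fid in changeset.get("removed", []):
        del state[fid]
        touched.append(fid)
    for fid, info in changeset.get("new", {}).items():
        state[fid] = info
        touched.append(fid)
    for fid, delta in changeset.get("changed", {}).items():
        # Only the attributes listed in the delta change (path, contents and/or mode).
        state[fid] = replace(state[fid], **delta)
        touched.append(fid)
    return state, touched


installed = {
    "f1": FileInfo("/etc/app.conf", b"a=1\n", 0o644),
    "f2": FileInfo("/usr/bin/app", b"BIN", 0o755),
}
state, touched = apply_changeset(installed, {"changed": {"f1": {"mode": 0o600}}})
print(touched, oct(state["f1"].mode))
```

A permission-only update rewrites one metadata field on one file; the binary is never rewritten.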

Representing updates as changesets not only saves space and bandwidth; it also allows merging. Conary intelligently merges changes to file contents and to file metadata such as permissions. This capability is very useful if you wish to maintain a branch of an application or library, keeping current with vendor maintenance while adding a couple of patches to meet local needs.

Conary also preserves local changes in essentially the same way. When, for example, you add a few lines to a configuration file on an installed system, and a new version of the application is then released with changes to that configuration file, Conary can merge the two unless there is a direct conflict (unusual, but possible). If there is a conflict, it is marked as such so that it can be resolved by hand. Also, if you change something as simple as a file's permissions, those changes will be preserved across upgrades.
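The merge described above is a three-way merge: the originally shipped file, the locally edited file, and the newly shipped file. To keep the sketch short, it merges `key = value` settings rather than arbitrary lines, which is a simplification of line-based merging; `parse` and `merge3` are hypothetical helpers. A setting changed on only one side takes that side's value, and a setting changed differently on both sides is reported as a conflict while the local value is kept.

```python
def parse(text: str) -> dict[str, str]:
    """Read simple 'key = value' lines, skipping blanks and comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            key, _, value = line.partition("=")
            settings[key.strip()] = value.strip()
    return settings


def merge3(base, local, new):
    """base: as originally shipped; local: as edited here; new: as shipped in the update."""
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(local) | set(new)):
        b, l, n = base.get(key), local.get(key), new.get(key)  # None means "absent"
        if l == n:
            result = l          # both sides agree, or neither changed it
        elif l == b:
            result = n          # only the update changed it
        elif n == b:
            result = l          # only the local administrator changed it
        else:
            conflicts.append(key)
            result = l          # keep the local value until the conflict is resolved
        if result is not None:
            merged[key] = result
    return merged, conflicts


base = parse("port = 80\nworkers = 4\n")
local = parse("port = 8080\nworkers = 4\nlog = /var/log/app\n")
new = parse("port = 80\nworkers = 8\n")
print(merge3(base, local, new))
```

The local port and log settings survive the upgrade, the vendor's new worker count is picked up, and nothing needs manual attention.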

A local changeset is a special changeset that represents the changes made on a local system. Conary allows you to use local changesets in two ways: committing them to a repository, or distributing them directly to individual systems. The first suits systems with entirely centralized management policies; the second suits individual systems that are expected to update themselves autonomously and asynchronously.

Changesets represent a sane approach to preserving changes to a system while ensuring system integrity and limiting resources used to make such changes. System customization and maintenance are no longer an obstacle with Conary.

Summary

Conary is a powerful new software management system. rPath has applied lessons learned from the drawbacks of traditional packaging applications to create a software management system that puts the customization of a system first. While it is easy to describe Conary as a cross between packaging and source control applications, it is in fact a very powerful provisioning system. From tracking upstream releases with shadows to the security and low overhead of changesets, all within a database-driven repository, Conary has redrawn the playing field for Linux operating system management and customization.