Drupal Deployment part 1: the challenge of deploying to large and busy sites

This is the first in a three-part series looking at how we deploy updates to our Drupal sites nzpost.co.nz, stamps.nzpost.co.nz, and coins.nzpost.co.nz. In part one, senior Drupal developer Neil Bertram from Catalyst IT outlines some of the problems with Drupal deployments on large sites and looks at two of the foundations we depend on at New Zealand Post: source control and deployment with Debian packages. In part two, we talk about the modules that make things easier, custom hooks, and how we deploy. In part three, we dream about seamless releases that we can do on demand, and discuss the three biggest issues we currently face with deployment.

As touched on in earlier posts, the New Zealand Post website runs on a “slightly” customised version of Drupal 6. Releasing new features or changes to the site is a somewhat complex process, in many ways not made easier by Drupal’s architecture.

Traditionally, when Drupal was seldom used for larger sites, changes could be made directly in the live environment with little testing or scripted procedure. Because Drupal is intended to be very easy to install and maintain, a lot of design decisions in Drupal core have historically leaned towards making it very easy to reconfigure through point-and-click administration pages. It is equally easy to install and set up third-party contributed modules without much code knowledge.

This design persists in the latest version of Drupal. The core itself is biased towards users who don’t really know how things work under the hood, which in some ways penalises users who know what they want and just need a simple way to manipulate and migrate site configuration predictably. Because Drupal stores most of its configuration in the database, migrating the entire site configuration between development, testing and live environments requires a bit more thought and process than it would if we had all our settings stored in configuration files.

As is becoming increasingly typical, the community has written tools to fill the gap: third-party contributed modules that give serious sites the control they need. These started to appear late in the lifecycle of Drupal 5, but have only really become mainstream with Drupal 6. Tools such as Features, Strongarm, Drush, Chaos tools, Environment and even the Deploy module offer a bunch of (occasionally competing) ways to prepare and ship changes into a production environment. We are using several of these and a few more to manage our deployment process at New Zealand Post.

This series will cover how we get our changes out in a predictable manner and with minimum disruption to the users on our site, and round up with some unsolved mysteries that maybe our readers can suggest solutions to!

It all starts with source control

Source control is the foundation of our change tracking: all changes can be reliably tracked and migrated between our environments. For the New Zealand Post site we use Git. Git allows developers to maintain their own branches for features that aren’t ready for prime time yet, and makes it easy to see what’s changed between two releases. When a feature is ready for release it is merged into the master branch.

It helps that Drupal core (and modules) switched from CVS to Git a couple of years back. This means we can easily pull in security and bug fixes from the upstream project with no more fuss than if we had made the changes ourselves. It also makes it easier for us to contribute patches back to the community for modifications we’ve made to Drupal itself, as those patches need to be in a standardised Git format.

Of course, like many larger sites, we have a few custom patches to Drupal that are specific to the way we do things, or to remove limitations that don’t make sense to work around in other ways. Git gives us the ability to easily look at the difference between the version of Drupal we’re using and what the upstream project has.

When we’re ready to bundle up a set of work to move through testing and eventually into production, we simply take a snapshot of the Git master branch at that particular time and package it up. A tag is left in Git for that release, so if we need to get a copy of the code as it exists on production, we can do that easily by checking out the code marked with that tag. This is fairly standard practice across various other software projects and source control systems, so this should be no surprise to anyone who deals with controlled deployment of software.
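As a rough sketch of the tag-and-snapshot step described above, the following Python builds the two Git commands involved: an annotated tag on master, and a `git archive` export of that tag. The `release-<version>` tag scheme, the `example-site` project name and the helper names are placeholders chosen for illustration, not a record of the conventions used on our site.

```python
import shlex
import subprocess


def release_commands(version, project="example-site"):
    """Return the git commands that tag master and export that tag as a tarball.

    The tag scheme and project name are hypothetical placeholders.
    """
    tag = f"release-{version}"
    tarball = f"{project}-{version}.tar.gz"
    return [
        # Leave an annotated tag on master so this release can be checked out later.
        ["git", "tag", "-a", tag, "-m", f"Release {version}", "master"],
        # Export the tagged tree (without the .git directory) ready for packaging.
        ["git", "archive", "--format=tar.gz", f"--prefix={project}-{version}/",
         "-o", tarball, tag],
    ]


def cut_release(version, repo_dir="."):
    """Run the commands in order, stopping on the first failure."""
    for cmd in release_commands(version):
        subprocess.run(cmd, cwd=repo_dir, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, as a dry run.
    for cmd in release_commands("1.4.0"):
        print(shlex.join(cmd))
```

With a tag like this in place, getting the code as it exists on production is a matter of checking out that tag, for example `git checkout release-1.4.0`.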

A little bit different – deployment with Debian packages

Our site runs on a cluster of Ubuntu servers, which manage installed software using the Debian package format and APT package manager. Our site is also deployed using this “deb” format, rather than by simply cloning it out of Git. This may seem a little strange to those familiar with web software deployment, but I’ll explain.

One of the ways to make life easier for systems administrators is to reduce the number of exceptions they need to deal with across the vast number of systems they administer. Because the servers that run our website are looked after by admins who also look after many hundreds of other servers that do vastly different things, we decided to be helpful and use Debian packages to keep things standard for them. It’s one less thing they need to learn to be productive. Environments that aren’t standardised on Debian or Ubuntu may have other priorities; Red Hat shops, for example, often deploy software using RPMs instead.

Debian packages aren’t a bad fit for web software though. Indeed, Debian/Ubuntu already ships with many popular web applications bundled up into this format for easy installation. Debian packages offer versioning, dependency management, configuration and standard install locations for software that make life relatively easy. In our case, we manage configuration (including for Drupal) using Puppet, but the actual code comes from Debian packages.

Getting a Drupal site from Git into a Debian package is relatively straightforward, thanks to the Drush Debian Packaging scripts.
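Without reproducing what those scripts do, the versioning and dependency metadata mentioned earlier can be made concrete with a small Python sketch that renders the kind of `control` stanza a Debian package carries. The package name, maintainer address and dependency list below are invented for illustration and are not taken from our packages.

```python
def render_control(package, version, depends, maintainer, description):
    """Render a minimal binary-package control stanza as text."""
    fields = [
        ("Package", package),
        ("Version", version),
        # PHP code is not compiled per CPU architecture, so "all" is assumed here.
        ("Architecture", "all"),
        ("Maintainer", maintainer),
        # APT resolves these before the site code is unpacked.
        ("Depends", ", ".join(depends)),
        ("Description", description),
    ]
    return "".join(f"{name}: {value}\n" for name, value in fields)


if __name__ == "__main__":
    # All values below are hypothetical examples.
    print(render_control(
        package="example-drupal-site",
        version="1.4.0-1",
        depends=["apache2", "php5", "php5-mysql"],
        maintainer="Web Team <webteam@example.com>",
        description="Example Drupal 6 site code",
    ))
```

Because the version lives in the package metadata, the standard APT tooling the admins already use can show which release is installed on each server, and can move a server to a specific version.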

There are many other ways to deploy code and configuration for a Drupal site. Most involve deploying directly from source control systems. We’ve looked at using Capistrano and Fabric due to their better cluster awareness, but so far Debian packages with some help from cluster-aware shell scripts do the job for us just fine.

Any ideas for us?

Are you looking after a major Drupal site? We’re always interested in hearing how others solve the problems we’ve faced.

Next time we’ll look at the modules we rely on, custom hooks, and how we deploy.

Thanks for reading!
