Or "Don't use database dumps to deploy your site".
In the world of Drupal consulting, there are few groups who can touch the sheer excellence, the collective brain power, the average (not just the median, we're talking the true statistical average) awesomeness of Lullabot. From writing books, to creating entire conferences, to training camps, to their multi-year podcast project... not to mention the contribution of some of the most important Drupal modules (outside of what Earl Miles creates, anyway) and the fact that their brainpower includes the official Drupal 7 co-maintainer, Angie Byron aka webchick, and the leader of the Drupal documentation efforts, Addison Berry aka add1sun. Anyone who is even slightly involved in working with Drupal will use, learn from or be affected by something they have had their hands in.
I was also lucky enough to have in-person training from both Angie and Nate Haug / quicksketch in 2007, so I know first-hand how smart and knowledgeable they are with all things Drupal. I truly have the utmost respect for every member of their team.
So it came as a tremendous shock when I discovered last week that they had built an "installation profile" that was really just a database dump, and then today saw it followed up with an article by Angie that appears to promote this as a reasonable way of deploying sites.
Fie, I say! With all of your effort to show people the right way of developing with and for Drupal, do not give in to the short win but long-term failure of using database dumps for deployment, especially when there already is a better way!
To me this is like the Iron Chefs saying "don't take all that time to make your own food like we do, just grab some drive-thru or delivered pizza instead!"
But what's wrong with database dumps?
First off, I don't think building a wrapper around a database dump can be called a real "installation profile"; it's nothing more than a shortcut to running "mysql -u [username] --password=[password] [databasename] < drupal.dump", one single database command that could just as easily be run from a shell script.
Other ways it sucks include:
- Tracking your changes as development progresses involves keeping multiple dumps of the database. With complex sites this can get very large, very quickly.
- You cannot make granular changes without diving into a multi-thousand-line file. With MySQL this becomes even trickier, as the mysqldump command's default is to put all record inserts on one line per table.
- You cannot extract pieces of it or make parts of it optional, e.g. give site installers an option on which content types should be enabled - with database dumps it is all or nothing.
- Adding or removing a module requires the dump be recreated; you can't just add a line to enable it and then add a few settings lines.
- Because the entire database is installed in one go, it cannot transparently handle updates to modules; instead you'll have to run the system update script right after installing to ensure the schema is correctly updated.
- You cannot support more than one database per dump file, so all that work to make Drupal database agnostic gets thrown out the window and you're locked into MySQL. Supporting other databases then requires the same site be recreated on each separate database installation: twice the amount of work for two databases, three times for three databases.
- It works against efforts to ensure all modules have clean APIs for programmatically managing their data structures.
- If bugs crop up during development, you have to roll back to an earlier copy of the database; you can't track it down to "oh, this array field on the content_field_create() makes the FileField module misbehave".
"What's it all abou' Alfie?"
Before I go any further let me quickly explain what an installation profile is for anyone who's unfamiliar with the term.
In Drupal when you go through the first installation to create the initial bare site it is actually running an internal script which creates the bare Story and Page content types, and unfortunately out of the box that's really all you get. Thankfully you can take this script, called an installation profile, and add a tremendous amount of functionality to your site, so that when your client (or mom, or neighbor, or..) finishes going through the installation process they can have a fully functioning and awesome site ready to go. It's magic!
An installation profile also lends itself well to being shared with other developers, because you'll ultimately have a clean way to install your site, and to be able to hand it off to others.
The script itself has a few functions to control the process - you first tell the system which modules (aka plugins) to enable and then any additional instructions you want. With a few lines of code you can create content types, add fields to the content types, add blocks, create menus, load views, define lots of settings and even install starter content! Additionally you can make them even more advanced by adding options to the process, so you can have one large profile which can be reused for different situations (e.g. your mom wants a blog but your neighbor doesn't).
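The idea of profile options can be sketched in plain PHP. This is a minimal, hypothetical example: the mycoolsite_* function names, the option keys and the module groupings are made up for illustration, and how the installer actually asks which options to use is left out. It only shows how one profile could compute a different module list for different sites.

```php
<?php
// Hypothetical sketch: one profile reused for different sites by switching
// optional feature sets on or off. Option names and groupings are made up.

// Modules every site built from this profile gets.
function mycoolsite_base_modules() {
  return array('block', 'filter', 'menu', 'node', 'system', 'taxonomy', 'user');
}

// Optional feature sets, keyed by option name.
function mycoolsite_optional_modules() {
  return array(
    'blog' => array('blog', 'comment'),
    'gallery' => array('content', 'filefield', 'imagefield', 'imageapi', 'imagecache'),
  );
}

// Returns the base modules plus those of each enabled option, without duplicates.
function mycoolsite_modules_for(array $enabled_options) {
  $modules = mycoolsite_base_modules();
  $optional = mycoolsite_optional_modules();
  foreach ($enabled_options as $option) {
    if (!isset($optional[$option])) {
      throw new InvalidArgumentException("Unknown option: $option");
    }
    $modules = array_merge($modules, $optional[$option]);
  }
  return array_values(array_unique($modules));
}
```

So your mom's site would get mycoolsite_modules_for(array('blog')), while your neighbor's gets mycoolsite_modules_for(array()) and no blog at all.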
Some more reasons why you should use installation profiles:
- An install profile allows you to build a standard toolkit for your new Drupal sites. Suppose you always do articles the same way, photo galleries the same way, and have the same elaborate Contact Us page built with Webform; that can all be placed in an install profile to make future projects quicker.
- If something breaks you have a clean way of going back in time to pinpoint the change.
- You can use table prefixes, which lets you use the same database for multiple purposes, including multiple Drupal sites. Or maybe you want to share the user tables across multiple sites.
- All changes are in code, so they can be tracked in your revision control system, e.g. Subversion, git, etc.
- It becomes very easy to make minor changes and verify that each change works correctly.
- As modules are updated you'll be able to keep your profile up-to-date by checking whether the functions accessed have changed.
- The end result is reproducible and so it's easier to detect if something stops working during the development cycle, making bug detection and elimination that much easier.
- By using the built-in functions to create the data structures you will be building your profile to be database agnostic, presuming the individual modules are. With the work in Drupal 7's database layer to make database agnosticism the absolute rule of the land, you're opening the door for anyone to run your profile on any database.
- You can build a SimpleTest suite to ensure your site is working correctly: are my users able to register and fill in the optional fields, can you rate content, is the image gallery carousel displaying in the correct location? The list just goes on.
- More and more modules are being built to support exporting their data structures, e.g. Views, Panels, and recently ImageCache and others. This makes it super easy to test something and save it out for your profile, rather than dealing with just the regular APIs.
- Lastly, it's The Dries' Way, going back to his October 2006 post about building distributions to customize Drupal for specific audiences rather than forking the project.
So that's the basics.
The chocolate's cabbage-flavored center
While this sounds all great, there can be some problems developing an install profile:
- It takes time, effort and lots of time. Yes. It can take lots of time, especially the first time you make one. You have to start small, try something, re-deploy, see if it worked, adjust as necessary, rinse, repeat. If you're on a slow machine this can drag on as you wait 5-15 minutes for your deployment script to finish before you can verify if everything worked correctly.
- It isn't as easy as just fiddling with the admin control panel for a few hours.
- You have to write all of your desired functionality in code, which can get rather verbose.
- Not everything has an easy way of exporting settings. Thankfully you can export content type definitions, Views, Panels and even ImageCache profiles, so those can be put into code very easily & quickly, but not taxonomy structures, blocks, menus, or all those miscellaneous settings.
- Not all data structures & modules will have a clean & documented API (i.e. functions) to create the required data.
- You occasionally have to dig into the code to work out the correct variables to pass to a function, or how to make something work the way you want.
- Occasionally the functions will be buggy and you'll have to track down the bug or work around it.
- You can end up with some really dirty nasty code, with nested arrays being passed into a function.
- There's unfortunately no single step to building a clean installation profile from an existing site. There was one for Drupal 5 called Profile_Generator but it hasn't been updated, nor has there been any real effort (in the module's CVS tree) to try.
A few specific problems I've run into:
- Early on, the collection of modules I had was introducing a bug whereby I could no longer export content types. This forced me to recreate all of the content types manually, including all of the individual fields and field groups, which obviously took much longer. This was definitely a bug in a module somewhere, as it eventually worked its way out.
- Many data structures I wanted to work with had kinda kludgy data creation APIs. I ended up using the install_profile_api module to help simplify some of it.
- Install_profile_api wasn't always the best way of doing things; on some occasions I ended up writing about the same amount of code to use its functions as I would have with the built-in APIs, which defeats the purpose. Obviously it needs further work.
- There can be unexpected dependencies in your intended setup, ranging from not being able to add menu items to taxonomy pages for terms that don't exist yet (i.e. add menus after taxonomy terms), to needing to assign role permissions after content types are created so you can set who should have access to them.
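One way to keep ordering surprises like these from biting twice is to write the dependencies down as data and check them. The sketch below is hypothetical: the step names and the mycoolsite_check_step_order() helper are not part of Drupal or install_profile_api. It simply reports any step that is scheduled before something it needs.

```php
<?php
// Hypothetical sketch: $requires maps each profile step to the steps whose
// data it depends on. Returns a message for every dependency that is
// missing from $order or scheduled after the step that needs it.
function mycoolsite_check_step_order(array $order, array $requires) {
  $position = array_flip($order);  // step name => index in $order
  $problems = array();
  foreach ($requires as $step => $deps) {
    foreach ($deps as $dep) {
      if (!isset($position[$step], $position[$dep])) {
        $problems[] = "$step or $dep is not in the step list";
      }
      elseif ($position[$dep] > $position[$step]) {
        $problems[] = "$step runs before $dep";
      }
    }
  }
  return $problems;
}

// Dependencies like the ones described above, expressed as data.
$requires = array(
  'menus' => array('taxonomy'),             // menu links point at term pages
  'permissions' => array('content_types'),  // per-type permissions need the types
);
```

An empty result means every step comes after what it depends on; anything else names the offending pair.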
Isn't there another way?
While the above can, and does, take a good amount of time to do, and can lead to occasionally unsightly code, there currently aren't any fully working solutions that can completely replace it. That said, several projects are making definite headway and are worth keeping an eye on:
- Features - packages up combinations of views, content types and other functionality into packaged modules, the idea being that you could load a module and gain X functionality. Currently in alpha state for Drupal 6 and requires the new (but awesome) Context module.
- Patterns - create an XML (or YAML or PHP array) structure which defines specific groups of functionality. There's also Patterns_Profile which lets you select functionality in the installation profile. Currently the Drupal 6 version is under heavy development.
- Distribution Export - intended as a script to compile an installation profile from an existing site. Not even an actual module yet, so it has a long way to go.
- The Aegir platform, including the Provision, Hosting and Hostmaster modules. This is currently only at v0.2 (or rather the first release candidate of v0.2) so it has a long way to go.
So maybe in another six months this blog post will be irrelevant; one can hope, at least.
My learning curve
My first real Drupal site, skinet.com built in 2008, was built the old-fashioned way: through the GUI. When it came time to deploy the site to the production environment, it was through the old database-dump-and-upload technique, and when something needed to be changed I had to ensure I remembered every single step, which could take hours depending on the amount of changes. Having done this once, I wanted to find a better way.
My second Drupal site, scubadiving.com built in early 2009, started the same way. While I tried looking at using an installation profile, I found it to be a bit cumbersome. So, for several days I worked to build the architecture on my laptop with the intention of doing the database fandango when I had to deploy it to other servers for testing. Well, that seemed great until about 1am on a certain Friday night when I had two edit forms open simultaneously. I saved both forms only to discover that I was now getting some weird error: a conflict had arisen from the submission and a faulty data structure was now causing errors. I then proceeded to waste two days trying to track down the bug, as I foolishly didn't want to lose what I had already done by starting over.
When it became obvious that I was not going to track down the error without a) putting the project in jeopardy or b) going crazy, I went back to the drawing board. Yes, I started with a fresh install. I didn't delete what I already had, instead I created a separate directory and local hostname to work from so I could reproduce the parts that did work cleanly.
I scoured the net to find as much information as I could about creating install profiles. Unfortunately there weren't very many guides available to explain how to build installation profiles, so it took another few days of testing different things before I finally worked out a good routine and got caught up. For the rest of the project I built out everything in the site (excluding the main data migration from the site's existing CMS) so that at any time I could scrap my database and reinstall, which I did frequently.
Several months later, I'm taking this concept a step further and using it to build an even larger install profile for managing multiple sites that are much more complex than scubadiving.com.
At this point I'm convinced that this is the best and only reliable way to build a large Drupal site so that it remains completely stable, which is ultimately what clients want.
Anatomy of an install profile
An installation profile has three main functions:
- hook_profile_details() - this is a simple array that tells the system about it. This should return an array with an element named 'name' which defines the profile's name, e.g. array('name' => 'My swanky profile'). You can also add a 'description' element to give a longer explanation of what your profile does.
- hook_profile_modules() - with this function you simply return an array of modules you want enabled by default.
- hook_profile_final() - while not strictly necessary, your profile isn't really going to do much if it only enables modules. This is where you define all of the real guts of your profile (the content types, taxonomies, etc.), and this is where you'll spend most of your time building the profile. Note that this function is executed after the modules have been fully installed and enabled, so you have full access not only to the entire Drupal function set but also to that of all the enabled modules. Go have some fun! (In Drupal 6 this final step is handled by hook_profile_tasks() instead.)
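A bare-bones profile following the three functions above might look like the sketch below. The profile name "mycoolsite", the module list and the setting in the final step are assumptions for illustration. The "hook" part of each function name is replaced by the profile's machine name; variable_set() is Drupal's own function for saving a setting, so the final step only does anything inside a real installation.

```php
<?php
// mycoolsite.profile: a minimal, hypothetical install profile.
// Each hook is prefixed with the profile's machine name, here "mycoolsite".

/**
 * Describes the profile on the installer's profile selection screen.
 */
function mycoolsite_profile_details() {
  return array(
    'name' => 'My swanky profile',
    'description' => 'Sets up a ready-to-go site with content types, menus and blocks.',
  );
}

/**
 * Lists the modules to enable; start small and add more as needed.
 */
function mycoolsite_profile_modules() {
  return array(
    // Core modules.
    'block', 'filter', 'menu', 'node', 'path', 'system', 'taxonomy', 'user',
    // Contributed modules (must already be in the modules directory).
    'content', 'text', 'token', 'pathauto', 'views',
  );
}

/**
 * Runs once all of the modules above are installed and enabled.
 * (Drupal 6 calls mycoolsite_profile_tasks(&$task, $url) for this stage.)
 */
function mycoolsite_profile_final() {
  // The full Drupal API is available here; one setting as an example.
  variable_set('site_name', 'My cool site');
}
```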
How I build install profiles
Now, after all the meta discussion, let's get into the heart of it: how I build installation profiles.
Step 1: Two sites
The first step is to create two local hostnames to work from: one for testing out ideas, the other to build & test the profile, both running from the same directory structure but with different hostnames and databases. This might seem like mere testing, but honestly you won't know how your profile should work until you try different options and decide how it should all fit together.
The way I work with the two sites is to treat one as the profile master and the other as the feature test bed. The profile master is where I add code to my profile, then empty the database and re-run the installation, while the feature test bed is where I take the currently working profile and, through the normal GUI, test out various combinations of features until I decide upon a specific solution.
Step 2: Build the structure
While an installation profile is basically just one file named e.g. "mycoolsite.profile", if you build everything into this one file you'll go crazy; you need to split it into separate files for easier management. You also have issues of dependencies: you can't add a menu item for a node that doesn't exist, or add a taxonomy to a content type that hasn't been added yet. For scubadiving.com, after lots of go-arounds and juggling dependencies, I had ten sub-files in this order:
1. Modules enabled at the start.
2. Initial settings for Drupal core and the modules above, like setting the homepage to load a custom view rather than the default, assigning the standard theme, etc.
3. Roles and permissions; the site used the Content_Permissions module so this got rather long.
4. Users that were needed at the start (a generic site editor and one for the main site's editor).
5. Content types with all required fields.
6. Taxonomies, including all terms.
7. Nodes. In order to do step 8, build the menus, we needed some of the generic nodes available, like "Contact Us", "Customer Service", etc.
8. Three different menu structures, most of which were links to taxonomy term pages created in step 6, though many were links to specific node pages created in step 7.
9. Blocks were mostly added through a module enabled in step 1 and just needed some adjusting & visibility rules assigned, but there were also some boxes (generic blocks) manually added with some custom PHP.
10. Final steps, including adding some filters, updating some URLs, building a few actions, etc.
With each step's contents separated out into a different file, it became very easy to manage and build in a clean & reliable way. If you realize one of your files is getting overly large, feel free to chop it up into smaller files; basically, do whatever works for you to keep it manageable. You may also discover that you have additional dependencies, so feel free to shuffle things around as needed.
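A minimal sketch of how the main profile file could pull in the sub-files in order. The file names, the ".inc" extension and both helper functions are hypothetical; the point is that the dependency order lives in one list, and a missing file fails loudly instead of being silently skipped.

```php
<?php
// Hypothetical sub-file names, in dependency order: each step may rely on
// data created by the steps before it (e.g. menus need terms and nodes).
function mycoolsite_profile_steps() {
  return array(
    'modules',        // 1. modules enabled at the start
    'settings',       // 2. core and module settings
    'permissions',    // 3. roles and permissions
    'users',          // 4. initial user accounts
    'content_types',  // 5. content types and fields
    'taxonomy',       // 6. vocabularies and terms
    'nodes',          // 7. generic pages such as "Contact Us"
    'menus',          // 8. menus linking to terms and nodes
    'blocks',         // 9. block settings and custom boxes
    'final',          // 10. filters, URL fixes, actions, etc.
  );
}

// Loads each step file from $dir in order and returns the steps loaded.
function mycoolsite_run_steps($dir) {
  $loaded = array();
  foreach (mycoolsite_profile_steps() as $step) {
    $file = $dir . '/' . $step . '.inc';
    if (!file_exists($file)) {
      // Stop rather than skip: later steps probably depend on this one.
      throw new RuntimeException("Missing profile step: $file");
    }
    require_once $file;
    $loaded[] = $step;
  }
  return $loaded;
}
```

Inside the profile, the final step could then call something like mycoolsite_run_steps(dirname(__FILE__) . '/steps'), assuming the step files sit in a steps/ subdirectory next to the .profile file. Reordering the profile then means editing one array.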
For a new site I'm doing I'm going a step beyond that to having a standard set of shared settings and allowing them to be overridden per-site, but that's left as an exercise for the reader. Or at least I won't blog about it until after I'm finished ;-)
Step 3: Get modular
The first part of the installation involves the hook_profile_modules() function and deciding which modules you want to have. To do this you just have your function return an array of names of the modules you want; simple stuff. While you can go gaga and throw everything in at first, I suggest starting off with some basics and building up as needed.
Step 4: Add feature, test
The second step is feature building: adding features to the feature test bed machine to see how well different options work. For example, maybe you aren't sure whether to use the GMap or GMaps module for your geo-mapping, or whether you should use the Image module or ImageField & ImageCache modules for your image embedding (hint: use ImageField). If at any point something breaks you can empty the database and re-run the deployment script to get back to where you were; while this doesn't happen often, it can happen when using in-development or alpha-release modules, so it's good to be able to do.
Step 5: Copy to the profile, test
Once a specific piece of functionality is decided upon you then have to work out how to reproduce it in your profile. While some modules, like Views, Panels and ImageCache, let you export your data structures, you aren't so lucky with most of them, so you'll usually have to dig into the specific module to find the key functions. I've posted some specific tips below.
Once you've worked out how you're going to reproduce the functionality, you have to add it. I recommend going slowly through this, making small changes and testing on the profile master site whether each works as intended before continuing to the next part. For example, rather than saying "I'm going to add this block," try adding it in steps: first create the block with its content, make sure it comes up correctly, then add the visibility rules, etc. It's also very important to commit your changes to your code repository (e.g. svn, git, etc.) after verifying each minor change, so that if you accidentally break something you can easily go back to the last version that was known to work correctly.
Step 6: Lather, rinse, repeat
Continue repeating steps four and five until everything is done. Remember to make changes in small pieces and save each step to your repository to help ensure you avoid problems.
Tips for specific modules
Having built a site with an installation profile already, here are some tips for dealing with specific modules.
- The module selection step, in the hook_profile_modules() function, requires that all modules added to the array have the exact same directory name as internal module name. This shouldn't be a problem, as modules are supposed to follow the same strict naming conventions as functions, so e.g. the "Node Export" module should have its directory named "node_export" and its files named e.g. "node_export.module". As it turned out, until recently this module was using the file name "export.module", so in order for it to work in the profile the directory had to be named "export" and the profile had to list it as "export".
- The install_profile_api functions can particularly help at times, so it's worth giving it a look. The developers are also very interested in improving the module, so please submit any patches you come up with if they'd be of use to others.
- That said, sometimes the install_profile_api functions can feel a bit overly verbose versus the built-in functionality. For example, compare install_profile_api's install_create_field() function versus the built-in content_field_instance_create() function - I've found the latter can be simpler to work with. Something to work on, I think.
- While the memcache module is an extra layer of awesomeness for your site, it seems to cause problems with the installation, so you'll need to wait until after the installation to enable it. Hopefully this can be resolved soon, to remove an extra step.
- Most of the time you'll want to assign your settings right after enabling the modules.
- Google_CSE has to be enabled after Clean URLs is configured, so it's best to leave that until the final wrapup stage of the profile.
- Make sure you define the pathauto variables before inserting any taxonomy terms or nodes, otherwise you'll end up with some butt-ugly URLs.
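A quick sanity check before running the installer can catch the kind of directory-naming mismatch described in the module selection tip. This helper is hypothetical (it is not part of Drupal); it checks the convention from that tip by looking for a matching name/name.module file for each module name under a given modules directory, such as sites/all/modules.

```php
<?php
// Hypothetical helper (not part of Drupal): returns the module names from
// $modules that have no matching "<name>/<name>.module" file under
// $modules_dir. A module that is missing altogether is reported too.
function mycoolsite_check_module_dirs($modules_dir, array $modules) {
  $problems = array();
  foreach ($modules as $name) {
    if (!is_file("$modules_dir/$name/$name.module")) {
      $problems[] = $name;
    }
  }
  return $problems;
}
```

An empty array means every module in the list follows the convention; the old Node Export layout (a node_export directory holding export.module) would be reported as a problem.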
Life can be better
As you can guess from the above, this can take a good amount of effort. For a large site I definitely believe it's worth taking the time to ensure everything works correctly.
The single best way to improve the ability to develop installation profiles will be for more developers to support the ability to export & import data structures for their modules. Thankfully Earl Miles has done everyone a great favor by adding an API to his Chaos Tool Suite module for other developers to make doing so easier, so now there's no excuse. Thanks Earl!
Also, as mentioned earlier there are several projects which are looking to improve this whole kettle of fish, so hopefully by the time Drupal 7 graces us with its presence we'll have an easier time of it.
Until then, please don't make Prince Vultan angry: create installation profiles the proper way rather than using database dumps to deploy sites. He's gotten to be so mellow since Flash dispatched Emperor Ming; all the kids love him now!
- 6/7/09 - Added the anatomy section.