What Makes a Content Management System?
I got to thinking the other day: exactly when do you have a “content management system?” We’ve all built apps that manage content, but when do you graduate from a “relational database with an admin section” (RDBWAAS) to the lofty and deserved title of “content management system?”
(Incidentally, I struggled with what to call the venerable “relational database with an admin section,” to the point of asking a group of colleagues what they would call it. “Ree-dee-bee-wazz” became the default choice.)
I was working on a site the other day that was built (by someone else) in classic ASP back in 2001, and it was just what you’d expect: a bunch of hand-coded admin interfaces to an Access database with ASP pages full of embedded code for the presentation. It was the very definition of RDBWAAS.
Was this a content management system? It was indeed a system that managed content, but somehow I just couldn’t bring myself to call it a CMS.
If we look at content management functionality as a continuum, there’s a graduated scale between the two. On the one side, you have something simple – an “articles” table with a couple of password-protected pages to update it. On the other side, you have a commercial CMS that you paid $50K for with all the bells and whistles. Specifically, how are the two different?
In terms of feature sets, here’s where the two models overlap pretty clearly.
Content Modeling and Storage: This is the process by which you take the content you want to manage, and turn it into data the system can process and store. Ironically, this is actually one place where the RDBWAAS system shines – there’s no more granular way to model content than a custom relational database. In this respect, most content management systems are a shadow of what you can do with an empty database and a copy of phpMyAdmin. (So why use a CMS at all? Well, to get everything else on this list…) On the content management side, systems vary wildly in how well they model content. Some allow you to create custom XML files, others have object-oriented relational databases, and I found one the other day that essentially said, “Create your own relational database and just tell the system how it works.”
We’ve talked about this aspect of content management ad nauseam: The Necessity of Subcontent, Open and Closed Content Management, Discrete vs. Relational Content Modeling, Is the Relational Model the Best Model?, etc.
Content Editing: At the risk of being too basic, this one is obvious – both extremes allow you to create new content, and edit and delete existing content. We’ll also lump WYSIWYG editing under here, since the quality of the rich-text editing interface is usually completely independent and separable from the larger application. Even the most basic RDBWAAS system can sport a really advanced WYSIWYG editor without too much trouble.
Publishing and Templating: Both extremes allow you to present content at a URL for visitors to consume it. Refer back to our post on Content Publishing Models for a discussion on just how content gets from your content repository to a URL where people can view it. While it seems obvious, it’s worth mentioning because there’s such a wide range of ways to do it, and there’s quite a range of how separate the two steps – templating and publishing – are. They may happen at the same instant as the content is requested (a PHP page that retrieves and formats database records, for instance), or your system may use a template to convert data into an output file that it then FTPs, file copies, or otherwise moves to a publishing location.
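To make the two publishing styles concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the template markup, the render and publish_to_file names, and the article-348.html file name are hypothetical, and no HTML escaping is attempted. Calling render at request time is the "same instant" model; publish_to_file is the decoupled model, where the template produces an output file that is then placed in a publishing location.

```python
import tempfile
from pathlib import Path
from string import Template

# Hypothetical template; a real system would keep these outside the code.
ARTICLE_TEMPLATE = Template("<h1>$title</h1>\n<p>$body</p>\n")


def render(article: dict) -> str:
    """Templating step: turn a stored record into HTML."""
    return ARTICLE_TEMPLATE.substitute(article)


def publish_to_file(article: dict, publish_dir: Path) -> Path:
    """Decoupled publishing: render once, then write a static file."""
    target = publish_dir / f"article-{article['id']}.html"
    target.write_text(render(article), encoding="utf-8")
    return target


article = {"id": 348, "title": "Hello", "body": "World"}

# Coupled model: render when the page is requested.
html = render(article)

# Decoupled model: render ahead of time into a publishing location
# (a temporary directory stands in for the live site here).
with tempfile.TemporaryDirectory() as tmp:
    path = publish_to_file(article, Path(tmp))
    file_name = path.name
    published = path.read_text(encoding="utf-8")
```

Both paths go through the same render function, which is the point: the template is one step, and how its output reaches a URL is another.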
From this point, you move into “higher level” content management functions. What can get a little tricky here is figuring out where the functionality actually lies. In a CMS environment, functionality can source from three places:
The operating system or some application external to the CMS
The CMS itself
Functionality built on top of the CMS
For example –
If you have your CMS in a staging environment and want to push the content to a live database, you could always configure some database replication or a timed export job. In this case, the “remote publishing” or “repository replication” feature is actually “under” the CMS – the CMS knows nothing about it. This functionality is coming from the underlying operating system and supporting applications.
At the opposite extreme, let’s say you need a calendaring application. Using your CMS, you design a “calendar” object which contains “event” objects. This content model is managed by your CMS, and uses the CMS publishing tools to deliver a calendar to a “consumable” URL. In this case, this functionality didn’t really come from your CMS either – it came from an application built on top of your CMS using the tools your CMS provides.
So, in talking about “higher level” CMS functions, we’re going to try to stay strictly within the bounds of the CMS itself. We’ll start with the absolute “core” functionality – things that just about everything calling itself a “content management system” had better be able to do. Here goes:
Versioning: In a versioning system, when content is updated, the older version is kept. If something needs to be rolled back, the older version can be restored – usually to a draft state, ready to be published as a new version. Versioning is usually simple and serial, but higher-end systems can have branching, merging, and all the other goodness you normally associate with source code management systems like Subversion.
In my mind, versioning is utterly necessary for any claim of “content management.” Managing content is largely concerned with keeping it safe, and making sure old versions are recoverable is a big part of that. Remember, if you don’t version, then there’s no functional difference between the permission to update content, and the permission to delete it.
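Simple, serial versioning with rollback-to-draft might look something like the Python sketch below. The VersionedItem class and its method names are made up for illustration and don't come from any particular CMS; a real system would also record who saved each version and when.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VersionedItem:
    """Content item that keeps every published version, oldest first."""
    versions: List[str] = field(default_factory=list)
    draft: Optional[str] = None  # unpublished working copy

    def publish(self, body: str) -> int:
        """Store body as a new version and return its 1-based number."""
        self.versions.append(body)
        self.draft = None
        return len(self.versions)

    def rollback_to_draft(self, number: int) -> None:
        """Copy an older version into the draft slot; nothing is overwritten."""
        self.draft = self.versions[number - 1]

    @property
    def published(self) -> str:
        """The most recently published version."""
        return self.versions[-1]


item = VersionedItem()
item.publish("First take")
item.publish("Second take, with a mistake")
item.rollback_to_draft(1)   # version 1 comes back as a draft
item.publish(item.draft)    # publishing the draft creates version 3
```

Note that the rollback never destroys anything: the mistaken version 2 is still in the history, so an update can't quietly act as a delete.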
Granular User Management: Most RDBWAAS systems have binary access – there’s a password, and if you know it, you can do anything. More advanced user management allows you to put users into groups, to which you can assign specific actions (edit, update, delete, etc.) on specific content. For a system with a sufficiently large number of authors, permissions are everything. Josh Clark of Big Medium fame says that “the #1 category of feature requests I get is how to restrict people from doing x, y and z.” My experience bears that out as well.
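As a rough sketch of what "groups, actions, and specific content" means in practice, here is a tiny permission check in Python. The group names, section names, and the can function are all hypothetical; real systems usually store this in the repository and often add inheritance down the content tree.

```python
# Hypothetical permission table: (group, section) -> allowed actions.
PERMISSIONS = {
    ("editors", "news"): {"create", "edit", "delete"},
    ("authors", "news"): {"create", "edit"},
    ("authors", "politics"): {"edit"},
}

# Hypothetical group membership.
USER_GROUPS = {
    "alice": {"editors"},
    "bob": {"authors"},
}


def can(user: str, action: str, section: str) -> bool:
    """True if any of the user's groups grants the action on the section."""
    return any(
        action in PERMISSIONS.get((group, section), set())
        for group in USER_GROUPS.get(user, set())
    )
```

With this table, can("bob", "edit", "news") is allowed but can("bob", "delete", "news") is not, which is exactly the "restrict people from doing x, y and z" case.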
Content Organization and Relation: I’ve written several posts about this – mainly The Content Tree, The Necessity of Subcontent, and Discrete vs. Relational Content Modeling – but let me say again that the ability to position content in larger organizational structures and in relation to other content is one of the most crucial functions of a CMS.
File and Image Management: In addition to textual data, content is often supported by binary files – images, PDFs, etc. A CMS needs to store these files somehow, preferably in relation to the content that uses them. See our post on File and Image Handling in Content Management from just a few days ago for more on this.
Multi-State Content: It’s awfully handy to be able to leave content in a half-finished state before publishing it, or to be able to “archive” content so it comes off the published site without actually deleting it from the repository. Even something as simple as an “active” checkbox can be a Godsend in a lot of cases.
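One step up from the "active" checkbox is a small set of named states with rules about which moves are allowed. The Python sketch below is only an illustration: the three states and the transition table are assumptions, not a standard.

```python
from enum import Enum


class State(Enum):
    DRAFT = "draft"
    PUBLISHED = "published"
    ARCHIVED = "archived"


# Hypothetical rules: which states each state may move to.
ALLOWED = {
    State.DRAFT: {State.PUBLISHED},
    State.PUBLISHED: {State.DRAFT, State.ARCHIVED},
    State.ARCHIVED: {State.PUBLISHED},
}


def transition(current: State, target: State) -> State:
    """Return the new state, or raise if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot go from {current.value} to {target.value}")
    return target


def visible_on_site(state: State) -> bool:
    """Only published content is shown; archived content stays in the repository."""
    return state is State.PUBLISHED
```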
From here comes “extra” functionality, which is where systems start to diverge widely.
Workflow or Approval Chains: As I’ve said before, workflow can get crazy complicated and people usually think they need much more complicated workflow capabilities than they really do. Usually they just need serial approval chains, a la Ektron and others. I read a good quote in a book once: “Workflow is the most over-purchased aspect of content management.” (I think it was Bob Boiko’s book, but with 1,200 pages, the exact citation escapes me.)
Check In/Out: Got a lot of people working on your content at the same time? If you’re able to check content out, it means no one else can touch it while you’re in it, saving you from concurrency conflicts or the “last one to the submit button wins” problem. (Some versioning systems can approximate this. The trick is determining when the new version is generated. eZ Publish generates a new version when the “edit” button is pressed, not when the edits are saved. This is crucial, because it means that even if two people hit “edit” at the same time, they each get their own new version to work with. They both save those versions independently, at which time they can work out which one actually gets published.)
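The basic check-out idea is just an exclusive lock per content item. Here is a minimal in-memory Python sketch; the Checkouts class, the CheckoutError exception, and the content IDs are hypothetical, and a real system would persist the locks and probably let them time out.

```python
class CheckoutError(Exception):
    """Raised when content is held by someone else."""


class Checkouts:
    """Tracks which user, if any, currently holds each content item."""

    def __init__(self):
        self._holder = {}  # content id -> user name

    def check_out(self, content_id: str, user: str) -> None:
        holder = self._holder.get(content_id)
        if holder is not None and holder != user:
            raise CheckoutError(f"{content_id} is checked out by {holder}")
        self._holder[content_id] = user

    def check_in(self, content_id: str, user: str) -> None:
        if self._holder.get(content_id) != user:
            raise CheckoutError(f"{content_id} is not checked out by {user}")
        del self._holder[content_id]

    def holder(self, content_id: str):
        """Return the current holder, or None if the item is free."""
        return self._holder.get(content_id)


desk = Checkouts()
desk.check_out("article-348", "alice")
```

While alice holds the item, a check_out by anyone else raises CheckoutError instead of letting the last submit win.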
Extensibility and Integration: Oftentimes you need your system to do something just a little bit extra. Higher-level systems have hooks and filters in place for you to add functionality not built in, and programmatic APIs and other ways of getting into the repository to allow you to connect and manipulate content from other systems.
Shared User Directory: Having yet another database of users can be a pain, so you find a lot of systems that have LDAP integration or other methods of plugging into other authentication schemes to manage the user base.
Scheduled Publishing and Expiration: It’s handy to be able to schedule when content should automatically go live and when it should come down.
Ektron has a nice implementation that takes this a step further by defining what should happen when the content expires. Does it actually come off the site or does some other administrative action happen like the content getting added to an “Expired Content Report” or generating a task for someone?
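At its simplest, scheduling boils down to comparing the current time against two optional dates. The Python sketch below assumes a hypothetical is_live function and treats the expiration time as exclusive; it does not model what happens on expiration (reports, tasks, and so on).

```python
from datetime import datetime
from typing import Optional


def is_live(
    now: datetime,
    go_live: Optional[datetime] = None,
    expires: Optional[datetime] = None,
) -> bool:
    """True if content should be visible at the given moment.

    A missing go_live means "live immediately"; a missing expires
    means "never comes down".
    """
    if go_live is not None and now < go_live:
        return False
    if expires is not None and now >= expires:
        return False
    return True


# Example schedule (dates are arbitrary).
go_live = datetime(2024, 1, 10, 9, 0)
expires = datetime(2024, 2, 1, 0, 0)
```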
Task Management and Collaboration: We’ve discussed this a bit in our post on The First 85%. More and more systems are providing task management subsystems to handle the real “management” of the content – the discussions and preliminary communications that occur when teams are working on content together. The benefit of having these systems integrated into your CMS is that you can bind discussions, tasks, and historical comments to specific content items, so they can be viewed in-context and retained for future reference regarding that content.
Image Manipulation: Some systems will allow you to store images in a “pure” form, then have them automatically manipulated by the system (usually re-sized) for delivery with content. So if you want to change the size of all the images with your news articles someday, you have the source image which the system can just re-manipulate in a new way. See Image Abstractions and Implementations in Content Management.
This is fairly closely related to multi-format publishing (see below). In both cases, you have a “pure” object which is published in multiple “renditions.” The “pure” version stays, well, pure. The renditions are more or less disposable because they can be regenerated as necessary.
Auditing: This is versioning taken a step further. A lot of organizations need to know every single thing that has ever happened in the life of a piece of content – who created it, who has looked at it and when, who has edited it, and finally when it got deleted and by whom. The more regulated the company, the more important this becomes.
In-Context Editing: This is the anti-admin section. When logged in, the system provides controls to initiate actions on content from the public-facing side of the site, so you can browse the site like a visitor, and do things without having to wade through an admin section. The actual interface varies. Often you just see menus and hyperlinks (“Edit this Content”) that are hidden from the general public. Ektron uses a right-click context menu. Documentum Web Publisher had special HTML comments you put in your template that, when viewed through a proxy page, exposed editing menus. (Why the complexity? Since the logic was built into their proxy page, they could provide the same in-context editing no matter what templating language you used, from ASP.Net to Perl.)
Membership Management: Fewer and fewer systems these days are anonymous-only. More often than not, there are membership systems in place so that visitors to the site can be “known” to the system. This enables you to allow content on a subscription basis, tune analytics, and store preferences and other settings on a per-user basis. The more graceful implementations of this make everyone a “user,” just giving some users more permissions than others (e.g. – the ability to edit content).
Pre-Built Functionality or “Widgets”: Reading this list, I can hear a lot of CMS vendors crying out, “What about discussion forums?” or “What about our RSS generator?” or “What about our ‘Get the Weather for your Neighborhood’ control?” All this, and more, falls under the heading of “pre-built widgets.” There’s a lot of them, and CMS vendors throw them out all the time as simple solutions to common problems. Sometimes they’re good, but a lot of the time they do everything you want, except that one little thing, so you end up re-writing them yourself anyway. I spent a lot of time harping on this in a post called Architecture and Functionality in Content Management. The bottom line: functionality is nice, but a system will live or die on architecture. Make sure the core architecture of your system is in good shape before you start writing little widgets.
In-Context Preview: If you have pending changes, some systems will allow you to “publish” them to your session only, so you – and only you – can browse your Web site as if all staged content was published. Ektron does this nicely – you can enter “Preview Mode” where you can browse your entire Web site through “publish-it-all glasses.” A lesser implementation of this just allows you to see pending content rendered in its corresponding template, on a piece-by-piece basis.
Integrated Analytics: I’m not a huge fan of integrated analytics, as I’ve never seen any package built in to a CMS that provided even 10% of the functionality that Google Analytics provides. Still, a lot of people want something simple and integrated with their CMS so they can view analytics from their admin interface.
Search: As with analytics, I’ve never been a big fan of integrated search – I almost always end up using something outside of the CMS. The problems with integrated search are (1) the search is from the CMS perspective, rather than from the user’s perspective; and (2) you often want to search things contained outside the CMS. That said, some search functionality is generally required with a CMS, for administrative management, if nothing else. When you say “search,” you usually think full-text search. However, a core piece of architecture is the ability to fetch content from the repository, sometimes with finely-grained criteria (think a SELECT statement in SQL with lots of WHERE clauses). Some systems are better than others.
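To illustrate the "SELECT with lots of WHERE clauses" kind of fetch, here is a small Python sketch using the standard sqlite3 module and an in-memory database. The articles table, its columns, and the fetch_titles helper are assumptions made up for the example; they don't reflect any particular CMS's schema or API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles ("
    "id INTEGER PRIMARY KEY, section TEXT, status TEXT, title TEXT)"
)
conn.executemany(
    "INSERT INTO articles (section, status, title) VALUES (?, ?, ?)",
    [
        ("politics", "published", "Budget vote"),
        ("politics", "draft", "Election preview"),
        ("sports", "published", "Cup final"),
    ],
)

# Column names can't be bound as parameters, so only known ones are accepted.
FILTERABLE = ("section", "status")


def fetch_titles(conn, **criteria):
    """Fetch titles matching every column=value pair, in id order."""
    unknown = set(criteria) - set(FILTERABLE)
    if unknown:
        raise ValueError(f"cannot filter on: {sorted(unknown)}")
    sql = "SELECT title FROM articles"
    if criteria:
        sql += " WHERE " + " AND ".join(f"{col} = ?" for col in criteria)
    sql += " ORDER BY id"
    return [row[0] for row in conn.execute(sql, list(criteria.values()))]
```

For example, fetch_titles(conn, section="politics", status="published") narrows the repository down to the one article that matches both criteria.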
URL Management: If your system ever renders a link to content within itself, it needs to know the URL at which that content gets accessed. For example, the CMS may know that your article ID is 348, but you have to somehow tell it that when it publishes a link to that article in the navigation, the template (and consequent URL) it should use is at /article.php…except for things in the “politics” section – they use “/politics/article.php,” etc. As such, a CMS needs to be “URL-aware” so it knows the publicly-accessible URL that maps to the content contained within it.
Additionally, many systems allow you to alias URLs for usability and search engine optimization. (However, rewrite engines at the Web server level – mod_rewrite or ISAPI Rewrite – can usually solve this problem.)
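The article-348 example can be sketched as a lookup from section to template, with a default for everything else. This is a hypothetical Python illustration: the url_for function is invented, and the ?id= query-string format is an assumption, since the URL shape depends entirely on how your templates take their input.

```python
# Template paths from the example; the mapping structure is an assumption.
DEFAULT_TEMPLATE = "/article.php"
SECTION_TEMPLATES = {
    "politics": "/politics/article.php",
}


def url_for(article_id: int, section: str) -> str:
    """Build the public URL for an article based on its section.

    The "?id=" query-string format is assumed for illustration.
    """
    template = SECTION_TEMPLATES.get(section, DEFAULT_TEMPLATE)
    return f"{template}?id={article_id}"
```

Any template that renders navigation can then ask url_for(348, "politics") rather than hard-coding where politics articles live.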
Multiple-Format Publishing: Though your primary publishing format is usually HTML, many systems will allow you to define other “renditions” of content, from a simple printer-friendly version to a full-blown PDF. In practice, however, I’ve found that the utility of this is limited. Printer-friendly versions are best handled with CSS these days, and alternate formats like low-bandwidth and mobile versions can be handled at the template level. That said, one handy application of this is the automatic PDF generation of binary files. Users store a Word document, but a PDF gets published. This saves you from having to manage an explicit PDF rendition – the CMS creates a new one whenever a new Word document is published. (The best implementation I’ve seen of this is Cascade Server from Hannon Hill. Go to any page on their site, and you can view the same content in six different formats, from rich text to WML. Crazy.)
Multiple Languages: Same content, multiple languages. It sounds simple, but it gets awfully complex pretty quickly. What you find is that everything has to be multi-lingual – every piece of content, every bit of navigation, even publishing templates since they often have text built into them. (Not to mention the admin interface of the CMS itself.) Most systems will let you select a default language. If content doesn’t exist in the requested language, the two options are (1) don’t show that content, or (2) show the content in the default language. (What gets really fun is when you have content published in multiple formats and in multiple languages. One article published in six languages with HTML, printer friendly, PDF, and WML versions comes to…24 renditions of your single piece of content.)
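Both points here, the two fallback options and the rendition arithmetic, are easy to sketch in Python. The language codes, format names, and the pick_translation function below are hypothetical and chosen only to match the six-languages-by-four-formats example.

```python
from itertools import product
from typing import Dict, Optional

# Six languages and four formats, as in the example (codes are arbitrary).
LANGUAGES = ["en", "fr", "de", "es", "it", "ja"]
FORMATS = ["html", "print", "pdf", "wml"]

# Every (language, format) pair is one rendition: 6 x 4 = 24.
renditions = list(product(LANGUAGES, FORMATS))


def pick_translation(
    translations: Dict[str, str],
    requested: str,
    default: str = "en",
    fall_back: bool = True,
) -> Optional[str]:
    """Return content in the requested language.

    If it is missing, either fall back to the default language
    (option 2) or return None so the content is not shown (option 1).
    """
    if requested in translations:
        return translations[requested]
    if fall_back:
        return translations.get(default)
    return None
```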
Email Integration: This is kind of an odd one. I’m including it because it’s getting more and more common for systems to treat email as another publishing channel altogether, but really, this is just a compilation of a lot of the above features: it’s a pre-built widget, an alternate format, and a manifestation of an extension architecture all at the same time.
So, there you have it – a brief survey of what content management systems do over the RDBWAAS systems we all start with. It’s a broad survey, and I’m sure that I left some things out, so…
I hereby announce this entry will stay open indefinitely. Comment away with your opinions about where I was right, where I was wrong, and what I left out. I will periodically add to this entry as necessary.