Decoupled Content Management 101
I’m curious how many developers have worked with “decoupled” content management systems. This is a system that sits behind your firewall, manages your content, and pushes it into your delivery tier on-demand. In these environments, you have a “repository server” and a “publishing server,” and never the two shall meet, except for the conjugal moments when you push content over the wall.
These systems are heavily stacked towards the upper end of the content management scale. Decoupled systems are usually very enterprisey, and most often Java-based. (More on why this is later.)
I’d guess that a relatively small percentage of developers have experience here. I’ve done a tiny bit of work in this space, but my stock-in-trade has always been coupled systems where the CMS and delivery server are one-and-the-same – systems like Drupal, Ektron, Episerver, eZ publish, etc.
When I was presenting at Gilbane Boston last December, I took an impromptu survey of the attendees at my session. Of the 60 or so people with active CMS projects, only one of them was on a decoupled system.
If all you’ve ever worked with are coupled systems, you might be pondering all the limitations this architecture brings with it and wondering why anyone would need this. It seems so…out-dated, right?
The primary benefit to decoupled systems is that you don’t have to run a CMS in your delivery tier (in most cases – more later on this). This imparts all sorts of secondary benefits.
It allows your repository system and publishing system to be on different architectures. You could easily have a Java-based repository pushing content onto a Windows server running .Net.
You can usually make your delivery layer more secure. If you’re just publishing static HTML assets to a stripped-down Apache server, your attack surface drops through the floor – the potential hack points on a well-maintained Linux/Apache server are minuscule compared to your average Joomla install. Having a full-blown CMS in the delivery tier has been known to give panic attacks to paranoid sysadmins.
You can publish content to multiple servers. A large media enterprise might have hundreds of servers in dozens of countries on multiple continents. Deploying content is much more complex than changing the “published” column in some database table. Edge servers need to be updated, media needs to be pushed into a CDN, reverse-proxies need to be reset, failover servers need to be updated, etc.
Since you don’t need to install a CMS on all these delivery servers, you don’t need to license them (usually – again, more later). Depending on the size of your delivery environment, this could save enormous amounts of money.
It can be easier to scale a decoupled delivery tier. Bring a new server online, add it to your publishing script, and tell your load-balancer about it. That’s a bit of an over-simplification, but I’ve heard about installs on grid networks like EC2 that can bring a new batch of servers online in literally minutes.
Reliability is usually higher. No CMS in the delivery tier means fewer moving parts. Static HTML files usually don’t throw exceptions.
You can publish content from multiple repositories and systems. Your CMS may only be one system of many that generate content, so your delivery tier needs to publish content without knowing or caring where it came from. (For much more on this, see The Dawn of the Web Content Delivery System from last year.)
In some cases, the content of the Web site is secondary to its purpose. Wells Fargo’s Web site, for instance, is a massive, custom-built banking platform that incidentally also displays some content. You can’t drop a CMS on top of this. The CMS has to be subservient to it, exist somewhere else, and push content into it.
Some editors want a “staging environment” in which to develop content. I think these are overblown and usually unnecessary, but a lot of people are still in the mode of developing and reviewing content in one environment, then publishing it to another. The idea of “editing content in production” freaks them out.
Late last year, I was researching my session on a related subject for Gilbane, and I had the privilege of talking to a lot of really smart people in the content management field about publishing models. People like Kevin Cochrane from Day, Seth Gottlieb from Content Here, Tom Wentworth from Ektron, Tony Byrne of the Real Story Group, Peter Monks from Alfresco, and John Peterson from Sutro Software, among others.
What I learned is that being coupled or decoupled is not a binary state. Between the two extremes, there are lots of shades of gray and levels of “purity.” This has evolved over the last decade-and-a-half.
Way back in the day, decoupled systems were really decoupled. Many, many systems did nothing but push static file assets through to the delivery tier. You’d run some batch job, and a bunch of HTML files and JPGs would get pulled out of your repository and FTPed or copied somewhere to be consumed.
This is what we’ll call “pure” decoupling. I’m thinking of Cascade Server, the old Serena Collage, and even Movable Type (the humble blogging platform on which this very site still runs).
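To make the batch-job idea concrete, here is a minimal sketch of a pure static publish in Python. Everything in it is an assumption for illustration: the in-memory `REPOSITORY`, the one-line `PAGE_TEMPLATE`, and the local directories standing in for delivery servers (a real job would FTP or rsync instead of copying).

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical in-memory "repository"; a real CMS would query its own store.
REPOSITORY = {
    "index": {"title": "Home", "body": "Welcome."},
    "about": {"title": "About", "body": "Who we are."},
}

# Hypothetical template; all templating happens here, at publish time.
PAGE_TEMPLATE = (
    "<html><head><title>{title}</title></head>"
    "<body><h1>{title}</h1><p>{body}</p></body></html>"
)


def bake(repository, staging_dir):
    """Render every page in the repository to a static .html file."""
    staging_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for slug, page in repository.items():
        target = staging_dir / f"{slug}.html"
        target.write_text(PAGE_TEMPLATE.format(**page), encoding="utf-8")
        written.append(target)
    return written


def push(staging_dir, delivery_dirs):
    """Copy the baked files to each delivery target (stand-in for FTP/rsync)."""
    for delivery_dir in delivery_dirs:
        shutil.copytree(staging_dir, delivery_dir, dirs_exist_ok=True)


def publish_demo():
    """Bake the repository once, then push the result to two fake servers."""
    root = Path(tempfile.mkdtemp())
    staging = root / "staging"
    targets = [root / "web1", root / "web2"]
    bake(REPOSITORY, staging)
    push(staging, targets)
    return targets
```

Note that the delivery targets end up with nothing but files. Nothing on that side knows or cares what produced them.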
Today, this is more and more rare, even among systems that call themselves decoupled. A lot of systems can still do this, but it’s not the preferred way of delivering content anymore.
The fact is, things need to happen in the delivery tier. There’s a reason we no longer code static HTML files in FrontPage – because a decade or so ago, we decided we wanted our Web pages to do things. They suddenly had to act like databases, react to the user, be searched, allow personalization, and all those other great things. This can’t be done with static HTML. (There’s a reason why things like ASP, PHP, CGI were invented, after all.)
(One other challenge with decoupling we’re not going to talk about much is what to do with user-generated content. Some content, like comments, for instance, is created in the delivery tier. When you’re decoupled, you need to somehow push that content “backwards” to your repository if you want it managed. Pure or not, figuring this out is tricky in a decoupled environment.)
If you’re publishing pure static content, you can have very basic problems due to the fact that your content is frozen at publish time. For example, if you generate the navigation elements at publish time, then you essentially have to re-publish every page when even one thing changes, because you have no way of knowing if Page X appears in some menu on Page Y.
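A small sketch shows how fast this bites. It assumes a hypothetical three-page site where every page bakes the full navigation into its own HTML; the function names and markup are mine, not from any particular CMS.

```python
def render(slug, pages):
    """Bake one page, including a navigation bar that lists every page."""
    nav = " | ".join(
        f'<a href="/{s}.html">{p["title"]}</a>' for s, p in pages.items()
    )
    page = pages[slug]
    return f"<nav>{nav}</nav><h1>{page['title']}</h1><p>{page['body']}</p>"


def bake_all(pages):
    """Bake the whole site; returns a mapping of slug to rendered HTML."""
    return {slug: render(slug, pages) for slug in pages}


def stale_pages(before, after):
    """Slugs whose baked output differs between two publishes."""
    return sorted(slug for slug in after if before.get(slug) != after[slug])


pages = {
    "index": {"title": "Home", "body": "Welcome."},
    "about": {"title": "About", "body": "Who we are."},
    "contact": {"title": "Contact", "body": "Write to us."},
}
first_publish = bake_all(pages)

# Rename a single page...
pages["about"]["title"] = "About Us"
second_publish = bake_all(pages)

# ...and every page on the site is now out of date.
changed = stale_pages(first_publish, second_publish)
```

Here `changed` comes out as `['about', 'contact', 'index']`: one title edit invalidates the entire site, because the old title was frozen into every page's menu.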
What usually happens with this model is you start publishing file includes, or scriptable files that execute or assemble at request time. This site, for instance, uses Movable Type to generate PHP files that define a “Post” class and are templated when requested. But you still have the limitation that you need an “overhead view” of your content to make decisions about how to display it, especially when rendering navigation and making decisions based on context.
To make this easier, static publishing naturally evolved to push data rather than files. You had systems that were now able to contact a database server in the delivery tier and “publishing” now meant just throwing database rows over the wall. With this, you could have all your pages in a table, and render them dynamically out there in the templating language of your choice. This pushes your templating to request time, rather than publish time, which is much easier to develop against (the classic “baked” vs. “fried” dichotomy).
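Here is a rough sketch of the "fried" approach, using SQLite as a stand-in for the delivery-tier database. The `pages` table, its columns, and both function names are assumptions for illustration; the body is treated as plain text and escaped.

```python
import sqlite3
from html import escape


def publish_rows(conn, pages):
    """CMS side: throw (slug, title, body) rows over the wall."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        "slug TEXT PRIMARY KEY, title TEXT NOT NULL, body TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO pages (slug, title, body) VALUES (?, ?, ?)",
        pages,
    )
    conn.commit()


def render_at_request(conn, slug):
    """Delivery side: template the page when it is requested."""
    row = conn.execute(
        "SELECT title, body FROM pages WHERE slug = ?", (slug,)
    ).fetchone()
    if row is None:
        return None  # the caller would turn this into a 404
    title, body = row
    # Navigation is built from whatever rows exist right now.
    nav_rows = conn.execute(
        "SELECT slug, title FROM pages ORDER BY slug"
    ).fetchall()
    nav = " | ".join(
        f'<a href="/{escape(s)}">{escape(t)}</a>' for s, t in nav_rows
    )
    return f"<nav>{nav}</nav><h1>{escape(title)}</h1><p>{escape(body)}</p>"
```

With this shape, republishing a single row is enough: the next request for any page picks up the new title in its navigation, because nothing was frozen at publish time.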
I wrote about this eight years ago when I had a sudden epiphany that managing content was not the same as delivering content: The Value-Add Side of Content Management. A year later, I opined about rendering a site in an entirely different language than the one in which the content was managed: CMS Administration vs. Presentation Languages.
So, this concept of pushing data over the wall – is this still “pure” decoupling? Yes, because the delivery tier has no idea where the content came from. It’s published as a neutral format. With either HTML files or database records, your delivery server really has no idea where the content came from and doesn’t really care. You could even swap in another CMS, so long as it published stuff in the same format.
And this is where we saw the rise of “the runtime” (I totally stole that word from Kevin Cochrane of Day). By runtime, I mean code on your delivery server which acts on the content to render a Web site. These systems were built to make sense of all the published content assets and generate a dynamic Web site out of it. They didn’t manage the content (that was being done somewhere else, remember), but they delivered it. The runtime enables things like dynamic navigation, personalization, permissions, etc.
So long as our decoupling is pure, these runtimes don’t have to know anything about our CMS. This can be handy, because you could have entirely different sets of developers working on the runtime and the CMS. The runtime developers would just explain to the CMS developers what content they needed and in what format, and the CMS developers would see that they got it.
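That agreement between the two teams can be as small as a list of required fields. The sketch below assumes a JSON payload with `slug`, `title`, and `body`; those names, and the `PublishedPage` type, are hypothetical. The point is only that the runtime checks the format and never asks which CMS sent it.

```python
import json
from dataclasses import dataclass

# The contract the runtime developers hand to the CMS developers.
REQUIRED_FIELDS = ("slug", "title", "body")


@dataclass(frozen=True)
class PublishedPage:
    slug: str
    title: str
    body: str


def parse_payload(raw):
    """Accept a published page from any source that honors the contract."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("payload must be a JSON object")
    missing = [name for name in REQUIRED_FIELDS if name not in data]
    if missing:
        raise ValueError(
            f"payload is missing required fields: {', '.join(missing)}"
        )
    # Extra, CMS-specific fields are ignored on purpose.
    return PublishedPage(**{name: str(data[name]) for name in REQUIRED_FIELDS})
```

Two different systems can tack on whatever extra fields they like, and the runtime still ends up with the same `PublishedPage`. That is what makes swapping the CMS possible.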
Additionally, there’s an entirely different set of skills required for developing in the runtime. As general Web developers, we tend to look at our skills as all bundled together, but there’s actually a clear dichotomy. HTML, CSS, user interaction, analytics, social media integration, public user interaction, etc. are content delivery concerns. Content modeling, workflow, editing, indexing, etc. are content management concerns. (I have a much larger blog post percolating on this point, I promise…)
However, eventually we started to see “proprietary runtimes.” This is when the company making the CMS wanted to extend their reach and have a hand in doing things in the delivery tier, so they started shipping a delivery component as a separate product. (I have no doubt some of them were also motivated by a desire to get licensing fees from the delivery tier.)
Now, don’t confuse a proprietary runtime with what we’ll call a “delivery receiver” in the delivery tier. This was some process that the CMS contacted to hand off assets (this was before Web services were so standardized – I remember Documentum even did some low-level socket connection). However, after content assets were transferred the receiver was out of it.
Proprietary runtimes stayed in the game after the moment of publishing. They were actively involved in manipulating content during the request, either by providing an API that the runtime could use, or even intercepting the request itself and mapping it to content. Interwoven was one of the first to do it with LiveSite – their CMS product, TeamSite, would push content to LiveSite, which would manage the delivery layer.
If you expand the runtime enough, you get to the logical next step – you just publish into a full-blown remote CMS install. So, you have two CMS installs – one behind the firewall, and one in the DMZ. You manage content in the install behind the firewall, and it pushes content into the install in the DMZ.
This negates a few of the advantages from above:
Your delivery tier is just as complicated as anything else
You have to license the delivery tier (though, you may get a discount for simple delivery servers)
It has to be on the same architecture
However, you maintain some other advantages:
You can develop content in a separate staging environment
You can usually remove the admin interface from the delivery installs to make them more secure
You can publish to multiple remote installs
Plus there’s another huge advantage – your CMS is acutely aware of your delivery tier. It shares the same functionality and architecture. A user group on the management server is the same as a user group on the delivery server. Functionality present on the management server is natively supported on the production server, so the two sides of the management/delivery equation are playing on the same ball field.
This type of decoupling is really common in the .Net CMS space. .Net systems are traditionally coupled, so you have a wealth of installs using all the power of a coupled, real-time system, and it’s a tough sell to strip all that functionality out to make the system decoupled. So, to make sure the decoupled runtime is just as functional, you pretty much have to shove the entire CMS out there.
Ektron has gone this route with their eSync product, and Episerver does the same with content mirroring. I know that Sitecore can publish into an active environment as well. Still, decoupled publishing is rare in the .Net world.
So, to conclude this rambling history of the evolution of content decoupling, I present to you my labels for the four levels of decoupling.
Stage 1: CMS publishes file assets only. (Yes, it could also publish XML, which would sort of be a data asset.) This is considered “pure,” by our definition, because the assets are vendor-neutral.
Stage 2: CMS publishes data assets, in the form of database records or other non-file data storage methods. This is also pure, as the data is vendor-neutral.
Stage 3: CMS publishes into a proprietary runtime, though not a full CMS install. This is not pure, since the runtime is provided by the CMS vendor.
Stage 4: CMS publishes into a complete remote CMS install. Obviously, not pure.
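If it helps to see the four stages as a decision rule, here is one way to write them down. This is only a restatement of the labels above; the function and parameter names are my own shorthand.

```python
from enum import Enum


class Stage(Enum):
    FILE_ASSETS = 1          # Stage 1: files only
    DATA_ASSETS = 2          # Stage 2: database records or similar
    PROPRIETARY_RUNTIME = 3  # Stage 3: vendor runtime, not a full install
    REMOTE_INSTALL = 4       # Stage 4: complete remote CMS install


def classify(publishes_data, vendor_runtime, full_install):
    """Map what lives in the delivery tier to one of the four stages."""
    if full_install:
        return Stage.REMOTE_INSTALL
    if vendor_runtime:
        return Stage.PROPRIETARY_RUNTIME
    if publishes_data:
        return Stage.DATA_ASSETS
    return Stage.FILE_ASSETS


def is_pure(stage):
    """Pure means the published assets are vendor-neutral (Stages 1 and 2)."""
    return stage in (Stage.FILE_ASSETS, Stage.DATA_ASSETS)
```

The order of the checks matters: anything the vendor ships into the delivery tier trumps the format of the assets, which is why a system pushing database rows into its own runtime still lands in Stage 3.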
My final point is one that puzzles me quite a bit – there’s a clear alignment between platform and decoupling. The vast majority of pure decoupled systems are Java. This is rare in .Net, and – to my knowledge – decoupling in PHP is non-existent. I know of no PHP system that works like this. (No LAMP system, even.)
I think a lot of the difference between Java and PHP can be chalked up to the relative “enterprisey-ness” of decoupling. To decouple, you usually have more than one server, in a managed network environment, perhaps with multiple load-balanced front-line servers, etc. Additionally, you likely have multiple competing human factors – sysadmins who don’t want a CMS on a public server, editors who want a staging environment, CIOs who don’t want to license multiple servers, etc. Organizations in these situations are much more likely to run Java than PHP.
PHP, for its part, has a very low barrier to entry, so it gets used a lot by smaller shops and hackers who are more likely to have a single server, often hosted somewhere. PHP projects are also more likely to be focused around a single developer who doesn’t have a lot of competing interests to juggle. (Relax, fanboys, I’m generalizing here…)
But it’s the .Net gap I can’t figure out. Other than the multi-install architectures discussed above, there’s not a lot out there. I know of no “pure” decoupled architectures on the .Net side. I don’t even know of any proprietary runtimes (I vaguely remember something about Sitecore publishing to an abbreviated runtime, but I’m no expert). Decoupling in .Net is all multi-install publishing.
(Update: I found a pure decoupled .Net system: Ingeniux)
I posted this question to Quora (my first question, no less). The answers were slim. Tom from Ektron chimed in that eSync enabled this, but I still maintain it’s really publishing to a remote install. You have to have a full Ektron install in the runtime for this to work.
Some may say that the gap is because .Net isn’t enterprisey enough, but I doubt that. In my experience, more large organizations are running on .Net than Java.
I’m wondering if it’s a reflection of the fact that .Net CMS are relatively new. .Net isn’t even 10 years old, and the oldest .Net CMS are perhaps 5-6 years old. Java has been around much longer. Is this a factor because pure decoupling is an out-dated architecture? Are we not publishing static assets anymore, so the newer batch of .Net systems have never seen a demand for it?
Sadly, however, I am seeing a demand for it. More and more, we’re finding larger .Net clients who either want a CMS-less runtime, or who have an existing runtime into which they just need content injected – they can’t have a CMS take over the whole thing. In these instances, we’ve looked for a .Net system to fill this gap, and come up short.
I’m doing a little exploratory work now with Episerver to see if I can make a decent pure decoupled system out of it. Episerver has great management tools, so this becomes a process of building a platform-independent templating system and publishing process. My goal is to make it as pure as possible (Episerver already does remote install mirroring).
I’d be curious about any opinions you might have on the “decoupling gap” from Java to .Net, and whether you have any knowledge of a system like this on the LAMP side. If you have any thoughts, please comment.
(Also, if you’re a CMS vendor and you feel like I’ve inaccurately characterized your system in some way, please let me know how to correct that. I’ll gladly update the post.)