Content Transformation as a Drivetrain
The drivetrain of a car is complicated and does a lot of stuff. Let’s consider the engine as the raw content stored in the repository, and the point where the tires touch the pavement as the final step in presentation to our audience.
The relationship of the power that comes off the engine and the rotational force applied to the ground through the tires is complicated and takes a lot of work to sort out.
You can’t hook the wheels directly up to the crankshaft of the engine. The engine has to constantly turn, and if the wheels were connected directly, they’d have to turn about 500 times per minute at idle. To fix this, we have clutches and torque converters to disconnect the engine so it can run while the car isn’t moving.
In many cases, the crankshaft of the engine is a single rotational device turning on a north-south axis, while there are (at least) two drive wheels turning east-west. This is why we have differentials – they’re a series of gears which let the power of the engine turn 90 degrees and split into two outputs.
Engines can rev from 500 rpm to sometimes 10,000 rpm, but they make peak power in a more narrow range called the “powerband.” Thus, we have transmissions which match the speed of the car to the powerband, so the engine stays as efficient as possible.
And this is a simplified analysis – we didn’t even talk about flywheels, transfer cases, driveshafts, and wheel bearings. A lot of stuff has to happen to get a single horsepower at the crankshaft to do anything productive at the wheels. Remember, in its pure form, an internal combustion engine makes pretty useless power. It has to keep running, it only turns one way, and it gets irritated when you try to put load on it outside a particular RPM range. So the drivetrain of a car is a mechanical pipeline that “fixes” a series of problems along the way, and what comes out at the end – tire on pavement – is useful power.
Is your content the same way? Is the raw content stored in your CMS of the same value as the content that ends up in the hands of the consumer?
What the drivetrain of the car is designed to do – much like transformation and templating – is progressively refine (there’s that phrase again) the power of the engine until it’s an appropriate form to be applied to the pavement through the tires.
And here’s the point to remember: there’s one drivetrain that powers all the wheels. Each wheel doesn’t have its own transmission. We’ll come back to this in a second.
A CMS needs the ability to transform content prior to delivery to a consuming application where it will be further templated. This means transforming content as close to the “engine” (the repository) as possible so that all “tires” (consuming channels) benefit from it. This is in opposition to delivering nothing but raw content and expecting the consuming channels to bear all of the templating workload.
The problem with leaving all templating to the consuming channels is two-fold:
Latency: some transformation is complex and involves considerable computational power and perhaps external resources. Doing this on every read can be painful.
Multiple channels/platforms/languages: if we’re pushing this content into multiple channels, there’s some basic transformation that we simply don’t want to repeat in every channel.
Put another way, having content in its raw form might be less than helpful (remember our useless raw horsepower). Here are some real-life examples:
Example: Domain Specific Languages (DSL)
A health system had 900 clinics, each of which had to maintain its hours online. This meant storing open and close times for seven days of the week, and occasionally there was a clinic that was open for separate time periods in the same day – for example, from 8 a.m. to noon, then again from 2 p.m. to 5 p.m.
This was a painful interface to model. At a minimum, you’re looking at 14 different properties (open and close times for seven days of the week), or more when you consider the odd exceptions. Additionally, it was tedious. Time selectors generally suck, and editors were faced with 14 of them. It wasn’t pretty.
So what we did was train editors to use a simple domain-specific language, like this:
M-F: 8-5
Sa: 8-noon
Or:
M: 8-noon, 2-5
T-Th: 8-5
Or:
*: 8-3
We wrote a parser to turn this into a strongly typed object, which could then be turned into JSON or XML (and subsequently cached). The logic to do this parsing was complex and took considerable debugging to work out. Additionally, it wasn’t computationally cheap. But it worked beautifully, and with about five minutes of training, it made perfect sense to the editors.
(For a more common example, consider Markdown. That’s a perfect example of a DSL designed to simplify the generation of HTML.)
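To make the clinic-hours language concrete, here is a minimal sketch of what such a parser could look like in Python. This is not the parser we actually wrote. The day abbreviations, the Period type, and especially the rule that bare hours 1 through 6 mean afternoon are all assumptions I made up for illustration.

```python
from dataclasses import dataclass

# Assumed day abbreviations, in week order (not taken from the real system).
DAYS = ["M", "T", "W", "Th", "F", "Sa", "Su"]


@dataclass(frozen=True)
class Period:
    open_hour: int   # 24-hour clock
    close_hour: int  # 24-hour clock


def parse_hour(token: str) -> int:
    """Convert '8', '5', or 'noon' to a 24-hour value.

    Assumption: bare hours 1-6 mean afternoon, 7-11 mean morning.
    """
    token = token.strip().lower()
    if token == "noon":
        return 12
    hour = int(token)
    return hour + 12 if 1 <= hour <= 6 else hour


def expand_days(spec: str) -> list[str]:
    """Expand '*', a single day, or a range such as 'M-F' into day codes."""
    spec = spec.strip()
    if spec == "*":
        return list(DAYS)
    if "-" in spec:
        first, last = (part.strip() for part in spec.split("-", 1))
        return DAYS[DAYS.index(first): DAYS.index(last) + 1]
    if spec not in DAYS:
        raise ValueError(f"unknown day: {spec!r}")
    return [spec]


def parse_hours(text: str) -> dict[str, list[Period]]:
    """Parse the whole block into a mapping of day code to open periods."""
    schedule: dict[str, list[Period]] = {}
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        day_spec, _, period_spec = line.partition(":")
        periods = []
        for chunk in period_spec.split(","):
            start, end = chunk.split("-", 1)
            periods.append(Period(parse_hour(start), parse_hour(end)))
        for day in expand_days(day_spec):
            schedule[day] = list(periods)
    return schedule
```

With these assumptions, parse_hours("M: 8-noon, 2-5") gives Monday two periods (8 to 12 and 14 to 17), and "*: 8-3" gives every day a single period from 8 to 15. A real parser would need error messages editors can act on, which is where most of the debugging effort tends to go.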
Example: Markup Macros
In many CMSs, editors can insert “macros” in rich text. These are text shortcuts which “expand” to larger text constructs – usually HTML, but they could theoretically be anything.
For example, the site for my book has a glossary with wiki-like links between pages. This is what they look like in the editor (from the entry for “version control”):
{%{
A version label is a designation applied to any particular
version. For example, a version might be labeled
"{{publication|Published}}" or "{{Draft}}". By versioning
content, the {{CMS}} retains a history of all the changes
to a piece of content over time allowing {{editor|+s}} to
rollback to prior versions if necessary or audit changes
to content for security or regulatory purposes.
}%}
See those words in double brackets? Those are wiki-ish links that map to other terms based on ID slug and may display different text – different words entirely, or modified versions of the source word (e.g., “editor|+s” links to “editor” but displays “editors”). I’ve written a parser to “expand” all these “macros” – to find the correct term, modify the source word, and add the hyperlink.
(And it’s not just me: Drupal has an entire architecture of “text filters,” and WordPress has a very popular system for handling “shortcodes” – I’ve used a couple in this very blog post, in fact. A more advanced version might be Episerver’s re-usable block elements that can be dragged into rich text fields.)
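For a sense of what the wiki-link expansion involves, here is a rough Python sketch. The real parser is in .NET and looks terms up in the glossary. This version skips the lookup, and the slug rule, the /glossary/ URL prefix, and the function names are all hypothetical. It also ignores the outer {%{ ... }%} wrapper.

```python
import re
from html import escape

# Matches {{term}} or {{term|modifier}}.
MACRO = re.compile(r"\{\{([^{}|]+)(?:\|([^{}]+))?\}\}")


def slugify(term: str) -> str:
    """Hypothetical slug rule: lowercase, spaces to hyphens."""
    return term.strip().lower().replace(" ", "-")


def expand_links(text: str, base_url: str = "/glossary/") -> str:
    """Replace each double-bracket macro with an HTML link."""

    def replace(match: re.Match) -> str:
        term, modifier = match.group(1), match.group(2)
        if modifier is None:
            label = term                  # {{CMS}} displays "CMS"
        elif modifier.startswith("+"):
            label = term + modifier[1:]   # {{editor|+s}} displays "editors"
        else:
            label = modifier              # {{publication|Published}}
        href = escape(base_url + slugify(term))
        return f'<a href="{href}">{escape(label)}</a>'

    return MACRO.sub(replace, text)
```

Under those assumptions, expand_links("{{editor|+s}}") produces a link to /glossary/editor with the text "editors". Even this toy version has three display rules, which hints at why nobody wants to port the real thing to a second language.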
Example: Volume and Consistency
A law firm has 48 separate web properties. They all run on different platforms (some WordPress, some Sitecore, some hand-coded on non-executing environments like S3). They also have a series of blog posts, which they want to be able to insert into different sites (perhaps through a macro system, as described above).
However, they want the exact same format for the blog posts. They want a title, date, site name, preview, and link. They have a common HTML construct which can be styled for the individual property. What they don’t want is for each potential environment to “roll their own” templating code and form HTML from raw content. They want these all to appear the same way, and some environments can’t execute templating code at all.
What they need, then, is transformation at the repository (the “drivetrain”). So each blog post has all its raw, editorial content, but it also contains an HTML representation generated the last time it was saved, which can be rendered to the page, unaltered, in any language, on any platform. This is a problem if the only thing the consuming platforms can get is the raw content to template separately. They don’t need that – they need a bundle of content and rendering logic, cached to a string.
(Could these posts be pre-rendered and cached as HTML in some central location? Sure, but then you lose a couple things – searchability and any vendor-provided CDN or caching functionality – which we’ll discuss more below.)
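The “rendered at save time” idea from the law firm example can be sketched in a few lines of Python. Everything here is an assumption for illustration: the BlogPost class, the field names, the post-teaser markup, and the shape of the payload are mine, not any particular CMS’s API.

```python
from dataclasses import dataclass, field
from datetime import date
from html import escape


@dataclass
class BlogPost:
    title: str
    published: date
    site_name: str
    preview: str
    link: str
    # Cached HTML, regenerated on every save (never set by editors).
    rendered_html: str = field(default="", init=False)

    def save(self) -> None:
        """Re-render the shared HTML construct whenever the post is saved."""
        self.rendered_html = render_post(self)

    def to_payload(self) -> dict[str, str]:
        """What a consuming channel receives: raw fields plus the cached HTML."""
        return {
            "title": self.title,
            "date": self.published.isoformat(),
            "site": self.site_name,
            "preview": self.preview,
            "link": self.link,
            "html": self.rendered_html,
        }


def render_post(post: BlogPost) -> str:
    """The one canonical template (hypothetical markup)."""
    return (
        '<article class="post-teaser">'
        f'<h3><a href="{escape(post.link)}">{escape(post.title)}</a></h3>'
        f'<p class="meta">{post.published.isoformat()} | {escape(post.site_name)}</p>'
        f'<p class="preview">{escape(post.preview)}</p>'
        "</article>"
    )
```

A WordPress site, a Sitecore site, and a static page on S3 can all drop the "html" string into the page as-is and style it locally. The raw fields still travel with it, so search and anything else that needs structured content keeps working.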
The Need for Repository-Centric Transformation
These three examples all lead me to the same conclusion: I only want to do this early-stage transformation once. Here’s why:
The end result of these situations doesn’t change post-publish. (Markdown renders to HTML, full stop.)
It’s relatively painful to do, computationally.
It would be a lot of work to re-implement and maintain this logic for more than one language or platform. (What if it depends on a library that doesn’t exist for a particular platform?)
Beyond the sheer workload, I really only want one canonical instance of this logic. (Markdown has many flavors. I don’t want developers applying their own preferred flavor to my source content.)
Think back to our car analogy – this is way up at the transmission level, right after power leaves the engine. Late-stage templating is way back at the wheels. And once we start pushing content to multiple channels, this becomes a real problem. Remember: Each wheel on the car doesn’t have its own transmission. There’s one transmission, which all the wheels benefit from, just like there is content transformation we don’t want to repeat in each consuming channel.
(Consider the content from my glossary site. If I wanted to use that here, on Gadgetopia, I would have to rewrite all that wiki link parsing and expansion logic in PHP (it’s in .NET right now). Believe me when I tell you that I don’t want to do this.)
“Back End for Front End”
None of the issues above are problems if we own the entire content publishing apparatus, from start to finish. However, with the new crop of headless CMSs, we often don’t. These are often SaaS systems, with which we interact through web APIs alone.