Output and Publication Management
There’s a funny Far Side cartoon showing a group of people in a karate studio. Through the window, you can see a flying saucer has landed on the street outside and “aliens” made of bricks and boards are walking down the ramp, preparing to terrorize the town.
The caption reads:
The class abruptly stopped practicing. Here was an opportunity to not only employ their skills, but also to save the entire town.
The implication is that the karate students were very skilled at breaking boards and bricks, and never expected they’d have a chance to use these skills to actually do something productive in the real world.
The same goes for our CMS. We’re now at the point where we have to output some content for a visitor to consume. Up to this point, we’ve basically been practicing. We’ve modeled our content, determined how to aggregate it into groups, and identified the editorial tools necessary to enable our editors to work with it.
The one thing we haven’t done is actually publish it. To provide value, a web content management system has to generate web content at some point. We have to get the abstract notion of content out of our CMS and into a form and location where it does some good.
In other words, it’s time to get out of the studio, break some boards, and save the town.
The Difference Between Content and Presentation
As I’ve said before, there’s a tendency to look at a news article that you’ve modeled using your CMS and delivered into a browser window and say, “That’s my content.”
But it’s not. That’s a web page. It just happens to be displaying your news article. Your news article and the web page it rode in on are not the same thing.
What if you published the same article (in a shortened form) to Twitter? Would that be your news article? No, that would be a tweet, just displaying some different information from your article.
The fact is, that same article might be published into 20 different distribution channels, your website just being one among many. In each one, a new presentation artifact is created using information from your news article. These artifacts are not your news article; they’re just things created from it.
The article might even be presented in different ways on the same website. For example, while the article has a “main” view where a reader can consume the entire thing, it likely appears in some other form on several news listing pages, which just use the title, the summary, and perhaps an image. And what about when the article appears in search listings? That’s yet another presentation of the same article.
The key here is to separate your content in its pure form – its raw, naked data – from the ways in which it’s used. Your article might consist of the following information (attributes, in the content model):
- Title
- Summary
- Image
- Byline
- Body
This is the pure content that makes up the article. It might also need to have multiple attributes to help when it’s presented in various channels, such as:
- Tweet Body
- Sidebar Position
- Facebook Link Text
This information isn’t your content. It’s not critical to the “spirit” or core of the news article. It exists merely to aid in the translation of your news article into a format necessary for one or more channels.
In the end, does this matter? In many cases, no, this is merely an academic argument. But as we continue, it’s important to note the difference between the content and the presentation in which it’s displayed. Some practices are universal to both, while others only make sense in the context of one or the other.
Templating
Templating is the process of generating output based on managed content. In a very general sense, the output will be a string of text characters, usually HTML. Less commonly, a CMS will generate binary content such as PDFs.
A CMS blends two things together to generate output:
- Templating code, or text entered by a developer, usually created and stored in a file. Different templates create different output. One template might generate a web page, while another might generate a tweet.
- Managed content, or text managed by an editor, created and stored in the CMS.
The combination of both is the final, rendered output.
Multiple templates are provided to the same content object to provide multiple outputs
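As a rough sketch of this blending, the following Python uses the standard library’s string.Template as a stand-in for a CMS templating engine. The article data, both template strings, and the field names are invented for illustration and do not come from any particular system.

```python
from string import Template

# Managed content: created by an editor and stored in the CMS (hypothetical data).
article = {
    "title": "The Migration Patterns of the Dodo Bird",
    "summary": "Where the dodo went, and why.",
    "url": "/science/dodo-migration",
}

# Templating code: written by a developer, one template per kind of output.
web_page_template = Template("<h1>$title</h1>\n<p>$summary</p>")
tweet_template = Template("$title $url")

# The same content object passed through two templates yields two artifacts.
web_page = web_page_template.substitute(article)
tweet = tweet_template.substitute(article)
```

Neither output is the article itself. Each is a presentation artifact built from it, and adding a third channel would mean adding a third template rather than changing the content.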
One of the constant balancing acts in a CMS implementation is where this dichotomy lies. How much of the rendered output should be created and managed by a developer in the template, and how much should be created and managed by the editor in the CMS as content? The answer to this question has a huge impact on the usability and manageability of the final system.
Some might argue that editors should have as much control over the page design and layout as possible, which can be accomplished either by providing configuration options for every possible bit of output, or by allowing editors free access to the templates (often from the CMS interface). This rarely works well. HTML/CSS technologies have advanced to the point where someone without training can do real damage if they’re careless.
Additionally, in most cases, templated content is an advantage, not a limitation. As I mentioned earlier when discussing dynamic page composition, allowing an editor to change a page for no other reason than aesthetic preference might violate the site’s style guidelines, and dealing with exceptions (e.g., where all content is a certain way except News Release X) is very rarely a good thing.
In general, clean separation of responsibility between templating code and editorial content is a desirable thing to have.
Templating Philosophy
There are varying schools of thought on the scope of templating that revolve around how much power the templates should have. The two sides of the argument look like this:
- Templates should not have any data that they are not directly given. Templates should be given a defined bundle of data, and they can format this data only.
- Templates should be able to retrieve and generate data as necessary. They should be small, encapsulated units of code that can reach “outside” themselves if they need to.
The first option is clearly more limiting. The CMS will “give” the template some data, and that’s all the template has to work with. The argument in favor of this is one of maintainability. Template developers shouldn’t be allowed unrestricted logic, or confusion will result because there’s now one more location for something to go wrong.
Terence Parr, the creator of the StringTemplate templating engine, has written an entire white paper on this subject. In it, he says:
The mantra of every experienced web application developer is the same: thou shalt separate business logic from display. Ironically, almost all template engines allow violation of this separation principle, which is the very impetus for HTML template engine development.
It’s a valid point. If a template can do anything, then in what sense is it a “template” at all, and how is it different from any other code?
The other side of the debate might argue that this is limiting and that logic as it relates to presentation is perfectly acceptable. If a template needs to present a certain set of data, it’s simpler for the template to be able to retrieve that data instead of having to depend on the invoking code or system to provide it. Templates only exist to make it easier to intersperse code amongst presentational markup, not to set the code apart for any other reason.
Regardless of your position, the fact is that different systems enforce different models, and many are settling into a hybrid approach: the template is given a bundle of data but can perform other operations as necessary, unless explicitly disallowed by configuration.
In practical terms, this means that most templates will execute in the context of a known set of data. Data will be provided, and most operations in the template will be specifically to format this data.
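For instance, in ASP.NET MVC’s Razor templating language, the data structure is conventionally known as the “model.” When populating the TITLE tag of a page, that piece of data is retrieved from the model:
<title>@Model.PageTitle</title>
In a language such as Twig, where data is typically exposed to the template as named variables, the equivalent might be:
<title>{{ title }}</title>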
URL mapping and the operative content object
Closely related to the architectural concept of how the template engine operates is how the CMS determines what information to give the template to work with. In a coupled system, this is usually accomplished by mapping a URL to a content object to be operated on – what we’ll call the operative content object.
Consider the inbound URL:
/politics/2015/debate-report
In a coupled system, there is no “politics” or “2015” directory, and no file named “debate-report.” Rather, this URL is mapped to the operative content object. When the request for the URL is received, that content object is retrieved and the CMS determines what template should render it. That template is given the content object (and often additional data) and executed to provide output that is returned to the client.
In the prior section I said that templates operate in the context of a specific set of data. For content management specifically, we can say that templates usually execute in the context of the operative content object.
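The request flow just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the CONTENT_BY_URL and TEMPLATES dictionaries, the handle_request function, and the sample data are all hypothetical, and a real coupled CMS would resolve URLs against its repository rather than an in-memory dictionary.

```python
from string import Template

# Hypothetical repository: URLs map to content objects, not to files on disk.
CONTENT_BY_URL = {
    "/politics/2015/debate-report": {
        "type": "article",
        "title": "Debate Report",
        "body": "The candidates met last night.",
    },
}

# Hypothetical template registry keyed by content type.
TEMPLATES = {
    "article": Template("<title>$title</title>\n<p>$body</p>"),
}


def handle_request(url):
    """Resolve the operative content object, pick its template, and render it."""
    operative = CONTENT_BY_URL.get(url)
    if operative is None:
        return "404 Not Found"
    template = TEMPLATES[operative["type"]]
    # The template executes in the context of the operative content object.
    return template.substitute(operative)
```

Calling handle_request("/politics/2015/debate-report") returns markup built from the mapped object, even though no directory or file by that name exists anywhere.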
In Episerver (using Razor), the operative object is provided as part of the model, under a property called CurrentPage:
<title>@Model.CurrentPage.Name</title>
In Sitecore (also Razor):
<title>@Html.Sitecore().Field("Title", Sitecore.Context.Item)</title>
In eZ Platform (Symfony and Twig):
<title>{{ ez_field_value( content, 'title' ) }}</title>
In WordPress (raw PHP):
<title><?php wp_title(); ?></title>
The point here is that the operative content object will be known and provided to the template. In the previous examples, this object was known to the template and referenced as Model, Sitecore.Context.Item, content, and wp_title(), respectively.
In a decoupled system that writes files to the filesystem, the URL mapping model is reversed. Instead of a URL being received and mapped to an object, that URL is specified on the object and simply used to generate the file in the appropriate location. Put another way, the file exists before the request. When a request is received, it’s handled by the underlying web server without invoking the CMS at all.
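A minimal sketch of that reversed model, in Python, might look like the following. The publish function, the PAGE template, and the convention of writing an index.html file inside a directory named after the URL are assumptions made for illustration; actual decoupled systems choose their own file layout.

```python
import tempfile
from pathlib import Path
from string import Template

# Hypothetical page template used at publish time.
PAGE = Template("<title>$title</title>")


def publish(content, web_root):
    """Render a content object and write it to the path its URL implies."""
    # Assumed convention: /some/url becomes <web_root>/some/url/index.html.
    target = Path(web_root) / content["url"].lstrip("/") / "index.html"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(PAGE.substitute(content), encoding="utf-8")
    return target


# Publishing happens ahead of any request; a web server would later serve the file.
web_root = tempfile.mkdtemp()
written = publish(
    {"url": "/politics/2015/debate-report", "title": "Debate Report"},
    web_root,
)
```

Here the URL is a property of the object that decides where the file lands, which is the opposite of looking an object up from an incoming URL.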
Templating Language Functionality
All systems invariably have a language for generating textual output. When it comes time for the merging of content and templating, there is always some type of shorthand for making this happen. It consists of templating code with markers indicating where the managed content should go and how it should behave.
Very few CMSs implement their own custom templating languages. Most modern CMSs use an existing templating language in common use, coupled with some custom extensions and data structures specific to that CMS.
In ASP.NET this is Web Forms or Razor for MVC projects. In PHP, Twig is currently very popular, and Smarty has been well used in the past (to say nothing of just using PHP itself). For Java, FreeMarker and Velocity are popular.
There are three major “levels” of functionality in templating languages. We’ll look at these next.
Simple token replacement
By definition, a system will always have the ability to replace “tokens” (or “variables”) in the templating code with managed content. A token is simply a placeholder that is replaced with information specific to the operative content object being templated.
Consider the following completely hypothetical code:
The name of this article is "{article.title}"
and it was written by {article.author}.
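Rendered in the context of a specific article, this might output:
The name of this article is "The Migration Patterns of
the Dodo Bird" and it was written by Bob Jones.
Many templating languages also allow a token to be formatted or filtered as it’s output. For example: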
This article was written on {article.date|"MMM DD, YYYY"}.
In this case, the date of the article is output in a specific format. Depending on the platform, MMM DD, YYYY might result in “September 3, 2015.” The format of this date is dictated by what is placed in the template by the template editor.
Other common filtering needs include:
Causing the output to be in currency format with two decimal places:
The product costs {product.price|"$#.##"}.
Causing the output to read “5 days ago,” rather than a specific date.
Posted {article.publish_date|relative} ago.
This might cause the word “result” or “results” to appear, depending on how many search results were available.
There are {search.result_count|pluralize:"result"}.
Token replacement is core to any templating language. Templating would effectively be impossible without it.
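To make the mechanics concrete, here is a small Python sketch of token replacement with filters. It is not how any specific templating language is implemented: the token grammar (which supports only the {object.attribute|filter:"argument"} form), the FILTERS table, and the render function are all hypothetical.

```python
import re

# Hypothetical filters; each receives the raw value and an optional argument.
FILTERS = {
    "upper": lambda value, arg: str(value).upper(),
    "currency": lambda value, arg: f"${value:,.2f}",
    "pluralize": lambda value, arg: f"{value} {arg}" + ("" if value == 1 else "s"),
}

# Matches {object.attribute}, {object.attribute|filter}, and
# {object.attribute|filter:"argument"}.
TOKEN = re.compile(r'\{(\w+)\.(\w+)(?:\|(\w+)(?::"([^"]*)")?)?\}')


def render(template, context):
    """Replace each token with the matching value from context, filtered if asked."""
    def replace(match):
        obj, attr, filter_name, arg = match.groups()
        value = context[obj][attr]
        if filter_name:
            value = FILTERS[filter_name](value, arg)
        return str(value)

    return TOKEN.sub(replace, template)
```

For example, render('There are {search.result_count|pluralize:"result"}.', {"search": {"result_count": 3}}) yields a sentence with the plural form, while a count of 1 yields the singular.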
Limited control structures
Where token replacement runs short is when templates need to perform more advanced logic, such as repeating actions for multiple items or deciding whether or not to output something based on criteria. These concepts are foundational to programming in general, and are collectively known as “control structures” or “flow control.”
Note that the existence of control structures in a templating language is never in place of token replacement – languages with control structures and logical processing as described in this section will also always have the token replacement capabilities described in the previous section. Control structures are an extension of token replacement.
The two core control structures are:
- Looping
- Branching
Consider this (again, hypothetical) templating code:
Other articles about this topic include {article.related_articles}.
What we’d really like to do here is something like this:
Other articles about this topic include:
{foreach related_article in article.related_articles}
* "{related_article.title}" by {related_article.author}
{endforeach}
What we’ve created here is a “for each” loop, which is a programming control structure. Assuming that related_articles is a reference attribute to multiple other articles, this code will loop through them, and inside the loop the token related_article will be fully populated as an Article content object, from which we can output information. We’re saying: “For each article in the related_articles collection, do this…”
We can usually refer to related_article from inside the loop only. Outside the loop – before the foreach or after the endforeach token – the related_article token has no value. This is a matter of scope: the token related_article is only “in scope” inside the loop, and it has a different value during each pass through or iteration over the loop. Outside the loop, it has no value (it’s “out of scope”), and referring to it might even result in an error.
A for each loop is a very common programming construct, and one of multiple ways to loop over a collection of items. The actual implementation will vary from system to system.
In addition to looping, we’ll often need to make decisions to output information based on criteria inside a content object. For instance, what if an article had no related articles? In this case, the related_articles property would be empty, and there would be nothing to loop over, leaving just this in the output:
Other articles about this topic include:
This would look odd, and leave visitors wondering if they’d missed something. We need to remove everything referring to related articles if there are none.
In this case, we could attempt something like this:
{if article.related_articles.count > 0}
Other articles about this topic include:
{foreach related_article in article.related_articles}
* "{related_article.title}" by {related_article.author}
{endforeach}
{endif}
Almost all templating languages have some capacity for at least primitive control structures. Without them, you’re limited to basic token replacement, which will quickly fall short of even basic templating tasks.
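For comparison, the same branch-then-loop logic can be written as an ordinary Python function. This is only a sketch of what the hypothetical template above is asking the engine to do; the render_related name and the dictionary shape of the article are assumptions.

```python
def render_related(article):
    """Render the related-articles block, or nothing at all if there are none."""
    related = article.get("related_articles", [])
    # Branching: skip the whole block, heading included, when the list is empty.
    if not related:
        return ""
    lines = ["Other articles about this topic include:"]
    # Looping: one output line per related article.
    for item in related:
        lines.append(f'* "{item["title"]}" by {item["author"]}')
    return "\n".join(lines)
```

The heading lives inside the branch, so an article without related content produces no orphaned “Other articles about this topic include:” line.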
Native programming language
Templating code can often get complicated. When branching and looping are introduced, templates effectively become little procedural computer programs. The line between the template code and the actual code of a CMS can begin to get blurry.
This causes some to ask, why do templating languages exist at all? If native computer languages are available, why not simply use them? It might seem silly or even unfair to constrain a developer into a more primitive language. The underlying language of your CMS – PHP, C#, or Ruby, for example – can no doubt do a great many things, so why can’t you just do your templating in that language?
In some cases you can, and this often removes the need for a separate templating language altogether. For example, our original token replacement example could be written in PHP like this:
The name of this article is "<?php print $article->title; ?>"
and it was written by <?php print $article->author; ?>.
WordPress templates work this way, calling PHP functions directly from within the markup:
<a href="<?php the_permalink(); ?>">
<?php the_title(); ?>
</a>
<p class="date"><?php echo get_the_date(); ?></p>
<?php the_excerpt(); ?>
So why isn’t templating always done in a full programming language, rather than in a dedicated templating language?
Remember back in The Content Management Team, we identified a subset of developers responsible for the frontend of the website – mainly the HTML/CSS and the templating. This template developer might not be the same person as the backend or server-side developer responsible for completely integrating the CMS. The roles and responsibilities are different. While the server-side developer is concerned with the grand architecture of the entire system, the template developer is only concerned with how things are rendered.
As such, it’s generally desirable for a template developer to only work with a subset of programming functionality, rather than having access to the full scope and power of the underlying programming language in use by the CMS. Giving a template developer unrestricted access to the full programming language introduces three problems:
First, programming languages can be fundamentally complex. There are often many nonintuitive things that a programmer needs to understand, such as variable scoping, the difference between reference and value types, and recursion. These concepts are far beyond what’s necessary to render a simple page of content.
In 2006, Tim Berners-Lee (the inventor of the World Wide Web itself) and Noah Mendelsohn edited a paper called “The Rule of Least Power.” Their abstract states:
When designing computer systems, one is often faced with a choice between using a more or less powerful language…. The “Rule of Least Power” suggests choosing the least powerful language suitable for a given purpose.
More power almost always involves more complication, and most programming languages are designed to solve problems more complex than templating.
A dedicated templating language can be domain-specific, meaning it is aware of its intended usage and can contain constructs and concepts designed solely to make it easier to achieve that goal – to generate textual (usually HTML) output, in most cases. The full programming language, by contrast, is designed to do anything and everything a programmer may be tasked with doing.
Second, and closely related, is the issue of security. If a full programming language is available to a template developer, that template could then be allowed to do basically anything the programming language allows. Just because the code is executing in the context of a template doesn’t make it any less dangerous.
Something like this would cause an error during rendering:
Here's what happens when you divide something by zero: <?php print 1/0; ?>.
Finally, templating languages are designed to be stable by being fault tolerant. If an error occurs, it’s often ignored and the template simply carries on with execution. Templates do not (or at least, should not) manipulate data, so the risk of data corruption is low. Additionally, template logic issues can be isolated so that they simply affect one portion of a page, and continued execution can still generate usable content. Issues that arise during templating are rarely something that will or should damage the stability of the website as a whole.
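One way to picture this fault tolerance is a renderer that isolates each region of a page. The Python below is a sketch under assumptions, not a description of any real engine: render_region, render_page, and the three sample regions are hypothetical, and a production system would log the failure rather than silently discard it.

```python
def render_region(render_fn, content):
    """Render one region of a page; on failure, emit nothing instead of raising."""
    try:
        return render_fn(content)
    except Exception:
        # A real system would log the error; here the region is simply skipped.
        return ""


def render_page(content, region_fns):
    """Render each region independently so one bad region cannot sink the page."""
    return "".join(render_region(fn, content) for fn in region_fns)


regions = [
    lambda c: f"<h1>{c['title']}</h1>",
    lambda c: f"<p>{1 / c['divisor']}</p>",  # raises when divisor is 0
    lambda c: f"<p>{c['body']}</p>",
]
page = render_page({"title": "Hello", "divisor": 0, "body": "World"}, regions)
```

The division by zero affects only the middle region. The title and body still render, which is the behavior the paragraph above describes.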
The Surround
When considering a rendered HTML page, there’s a need to distinguish between the managed content of the page and “the surround.” The surround is everything that (wait for it) surrounds the content object on the page.
The concept of the surround has been with us since long before content management. Server Side Includes have long allowed web developers to provide common markup for headers and footers, and some client-side editing systems provided explicit support for this concept, such as Microsoft FrontPage’s “Shared Borders” feature.
Consider the news article in the image below. Several items on this page are the direct result of the operative content object being rendered on the page:
- The title
- The byline
- The body
Then, there’s everything else above, below, and to the sides of the news article. The “everything else” is the surround.
A news article from the New York Times: everything outlined in black – the headline, byline, and body of the article – is content from the actual (operative) content object, and everything else is the surround
In most systems, these items are handled by two different templates. The surround is the outer shell of the HTML document, which is common to all content, while the content object has its own template. The content object is rendered by its template, and placed inside the surround template.
Here’s an example of a surround from the Razor templating language of ASP.NET MVC:
<html>
<body>
<h1>Website Title</h1>
@RenderBody()
</body>
</html>
The @RenderBody() is a method call that will render the subtemplate for the content in that location. Here’s an example of that template:
<h2>@Article.Title</h2>
<p>
@Article.Body
</p>
As in our previous examples, @Article.Body and @Article.Title are tokens that are replaced with managed content. The entire result is then embedded in the larger surround and delivered to the end user.
The final result looks like this:
<html>
<body>
<h1>Website Title</h1>
<h2>Article Title</h2>
<p>
by Bob Jones
</p>
<p>Lorem ipsum dolor...</p>
<p>More paragraphs of content here...</p>
</body>
</html>
The surround is valuable because there is often infrastructural HTML that is common to every single page on a website. Every page may require a reference to the same stylesheet in the HEAD tag, or open with the same containing DIV. Keeping this code in one place is simply a good design practice.
Where templates depart from one another is often in the rendering of different content types. Your Employee Bio content type has fundamentally different information than your News Release content type. Each of these types will likely have its own template, though the output of these templates will be placed within the same surround for final delivery.
It’s possible that different content types might have entirely different surrounds, but this is more rare than you’d think. Occasionally, a “landing page” content type might have a very bare surround, or certain content designed for machine consumption (an RSS feed, for example) will have no surround at all. However, the vast majority of types in the average content management installation will be rendered in the same surround.
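The two-template arrangement can be sketched in Python as follows. The SURROUND and INNER templates, the render_page function, and the field names are hypothetical, with string.Template standing in for a real templating language.

```python
from string import Template

# The surround: markup shared by every page (hypothetical).
SURROUND = Template(
    "<html>\n<body>\n<h1>Website Title</h1>\n$body\n</body>\n</html>"
)

# Inner templates, one per content type (hypothetical).
INNER = {
    "article": Template("<h2>$title</h2>\n<p>$body</p>"),
    "employee_bio": Template("<h2>$first_name $last_name</h2>\n<p>$bio</p>"),
}


def render_page(content):
    """Render the content with its own template, then place it in the surround."""
    inner = INNER[content["type"]].substitute(content)
    return SURROUND.substitute(body=inner)
```

An article and an employee bio pass through different inner templates but come out wrapped in identical outer markup, so a change to the surround reaches every page at once.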
Context in the surround
In the examples just presented, the surround is completely ignorant of the content being ultimately rendered inside of it. Our sample surround will render the exact same way each time regardless of the content type.
But let’s add a HEAD and a TITLE tag to the current surround:
<html>
<head>
<title>...</title>
</head>
<body>
<h1>Website Title</h1>
{object_template}
</body>
</html>
The question now becomes, what do we put in our TITLE tag and how do we get it there? The article template itself (the “inner” template) clearly knows how to do this with the {article.title} token, but what about the surround? What does it “know” about the content rendering inside of it?
Remember, all the templates we’ve discussed so far have known about the operative content object. They’ve all executed in the context of a specific content object to which they could refer. Does the surround have this same luxury? Or is it completely ignorant of what happens inside the inner template being placed within it?
This is a matter of context, or the ability for the surround to take action based on an understanding of the content that is ultimately being rendered. In our example, we could do this:
<title>{article.title}</title>
But this assumes that the content being rendered is an article. If it isn’t, would the templating language even understand the token {article}? We’re not necessarily rendering an article anymore. The surround has to be generic enough to handle any content type we throw at it.
Here’s the brute force approach to solving this problem:
<title>
{if object.type == "Article"}
{article.title}
{endif}
{if object.type == "EmployeeBio"}
{employee.first_name} {employee.last_name}
{endif}
</title>
This would work – and I’m sure it’s been done – but it’s not very scalable. We’d have to add to this mess of code for every possible content type.
There might be a better way to solve this problem. Back in Content Modeling, we talked about inheritance, where content types can inherit from more general types and gain all their properties in the process.
Using that, we could create a Web Page content type with a text attribute of Title Tag. Then, our News Article and Employee Bio types could inherit from the Web Page type and get the Title Tag attribute in the process. Then we might do something like this:
<title>{object.title_tag}</title>
Note that we’re using an {object} token in the templating code of the surround. This is purely hypothetical, but common. The surround usually has access to a piece of content in a form that has information common to all content. It might not be able to dig into the specifics of the content object, but it can deal in generalities.
In reality, most web CMSs have specific ways of handling the TITLE tag, but this is just one example of how the surround often needs to deal with functionality that is specific to the content that is being rendered.
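The inheritance approach to the title problem might be sketched like this in Python. The WebPage, NewsArticle, and EmployeeBio classes and the render_title function are hypothetical names chosen to mirror the content types discussed in the text, not the API of any CMS.

```python
from dataclasses import dataclass


@dataclass
class WebPage:
    """Base type: anything the surround can render has a title tag."""
    title_tag: str


@dataclass
class NewsArticle(WebPage):
    body: str = ""


@dataclass
class EmployeeBio(WebPage):
    first_name: str = ""
    last_name: str = ""


def render_title(obj: WebPage) -> str:
    """The surround relies only on the attribute every WebPage shares."""
    return f"<title>{obj.title_tag}</title>"
```

Because render_title touches nothing beyond title_tag, adding a new content type that inherits from WebPage requires no change to the surround, unlike the brute force version with one branch per type.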
Consider the common requirement of “Related Content.” CMS integrators see this all the time in wireframes – the idea that content related to the content being viewed can be magically conjured out of thin air.
To render this, the surround has to know enough about the specific content being viewed – the content in the “inner” template. Will it have this information in enough detail to act on it?
Navigation is another very common contextual requirement. Often, the surround needs to know where the content lives in the larger content geography. For the left navigation menu of the site, perhaps your plan is to render links to all of the “sibling” pages to the one being viewed, or simply to format the link to the current page differently. To do this, the surround has to know what content is currently being rendered. A crumbtrail is another example – a crumbtrail only makes sense when the position of the current content is known in relation to other content.
Is your surround going to have access to this information? Will it be able to get references to the current object so it can query the repository for the sibling pages?
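Assuming the surround can get a reference to the current object, both the crumbtrail and the sibling list reduce to simple queries over the content tree. The Python below is a sketch only: the CONTENT dictionary, its parent field, and both functions are invented for illustration, and a real repository would expose its own query API.

```python
# Hypothetical content tree: each object records the ID of its parent.
CONTENT = {
    1: {"title": "Home", "parent": None},
    2: {"title": "Politics", "parent": 1},
    3: {"title": "Debate Report", "parent": 2},
    4: {"title": "Election Results", "parent": 2},
}


def crumbtrail(content_id):
    """Walk up the parent chain and return titles from the root down."""
    trail = []
    while content_id is not None:
        node = CONTENT[content_id]
        trail.append(node["title"])
        content_id = node["parent"]
    return list(reversed(trail))


def siblings(content_id):
    """Return titles of the other objects that share this object's parent."""
    parent = CONTENT[content_id]["parent"]
    return [
        node["title"]
        for other_id, node in CONTENT.items()
        if node["parent"] == parent and other_id != content_id
    ]
```

Both functions take the ID of the current object as their starting point, which is exactly the piece of context a surround may or may not be given.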
Lack of context in the surround can occasionally be supremely frustrating. While a template for a specific content object is relatively simple, other things can be made unnecessarily complex by a lack of abstraction and lack of awareness when rendering the surround.
Template Selection
Content objects need a template to render. How is that template selected? How are objects and templates matched up for rendering?
In most cases, templates are selected based on content type. This is natural because a content type is the most obvious determinant of what a template needs to do. The templating code required to render an Employee Bio will almost always be very different from the code required to render a News Release.
In some cases, however, the template selected to render a content object can differ based on factors other than type.
Editors may have a selection of templates, usually in order to alter layout. For instance, an editor might select a “two-column” template in order to display a sidebar.
In these instances, confusion might result from the fact that a different template may require different content to render, and the existence of that content might be a better way to select the template automatically.
In the case of our two-column template, content has to exist for that sidebar column. Does the content object have an attribute for Sidebar Content? And if it does, could the regular template simply show or hide the sidebar based on whether that property was populated? It would be confusing for an editor to populate a Sidebar Content attribute but still not see a sidebar simply because she had failed to select a template that supports it.
In other cases, we might want to supply a different template for a specific content object to enable some extended functionality. If, for instance, we had a custom-programmed mortgage calculator, we could create a Mortgage Calculator content type, with its own template based on the type. Depending on the effort required, however, this might be a waste for a content type that will only be used one time – there will be exactly one content object created from that type.
It might be easier to simply create a Page object and call it “Mortgage Calculator,” then use a different template for that specific object that contains the code to render our calculator. This could be by editorial selection, but that runs the risk of an editor selecting this template for other pages as well. It would likely be better to force this template for that content object at the code or configuration level.
Some systems do this by filenaming standards, with a defined “fallback” list for how a content object will render. The system will look for a template from most to least specific. For example, say we have an Employee Bio content object that has a unique ID of #632. Our system might look in the templates directory for files named:
content-id-632.tpl
content-employee-bio.tpl
content.tpl
The system will look for a template specific to the ID first (content-id-632.tpl). If it doesn’t find this, it will look for a template specific to the content type (content-employee-bio.tpl). If it doesn’t find that, it will use a generic template (content.tpl) for all content (which, in most cases, would be highly undesirable – one would hope that there would be a template for each content type, at the very least; how would we possibly render completely different content types from the same template?).
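The fallback lookup is straightforward to sketch in Python. The select_template function and the temporary directory used for the demonstration are assumptions; only the three filename patterns come from the example above.

```python
import tempfile
from pathlib import Path


def select_template(templates_dir, content_id, content_type):
    """Return the most specific template file that exists, or None."""
    candidates = [
        f"content-id-{content_id}.tpl",   # specific to one content object
        f"content-{content_type}.tpl",    # specific to the content type
        "content.tpl",                    # generic fallback
    ]
    for name in candidates:
        path = Path(templates_dir) / name
        if path.is_file():
            return path
    return None


# Demonstration against a throwaway directory holding two of the three files.
templates_dir = tempfile.mkdtemp()
for name in ("content-employee-bio.tpl", "content.tpl"):
    (Path(templates_dir) / name).write_text("", encoding="utf-8")

chosen = select_template(templates_dir, 632, "employee-bio")
```

With no content-id-632.tpl present, the type-level template is chosen. Dropping a file with that name into the directory would override the template for that one object without touching any code.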
While falling back based on file naming is common, other systems have much more elaborate ways of determining template selection, including evaluation of specific properties or specific locations in the geography, and even advanced rules engines involving esoteric combinations of environment and content variables.
Finally, many systems will also provide developer tools to override template selection at the code level. A developer might be able to write code that takes any variables into consideration when assigning a template to a content object for rendering.
Template Abstraction and Inclusion
In addition to the relationship between the template and its surround, a template will quite often contain “subtemplates” or “included templates,” which are separate templates injected into specific places in the “containing” template.
This is the continuation of a very common technique of web programming languages. Server Side Includes have been used for years to insert chunks of HTML and programming code in languages like PHP, Classic ASP, and ColdFusion. And this itself is a continuation of the programming principle of DRY (“Don’t Repeat Yourself”), which encourages programmers to elevate common code to central “libraries” that are referenced in multiple places.
The goal of this model is to avoid repetition and ease the maintenance of templates as changes need to be made. If common template code is concentrated in one location, it can be changed once with potentially wide-ranging effects.
For example, in several places in a website, we might want to generate an HTML structure like this:
<ul>
<li><a href="/article1">Article #1</a></li>
<li><a href="/article2">Article #2</a></li>
<li><a href="/article3">Article #3</a></li>
</ul>
This is a simple bulleted list of three articles and their titles. We might use this in our Related Content sidebar, our Latest News menu, and our Other Articles in This Series promotional box. In each case, it would display different articles, but the general presentational structure of displaying a list of articles and titles would apply in all cases.
The code to generate this output might look like this:
<ul>
{foreach article in article_list}
<li><a href="{article.url}">{article.title}</a></li>
{endforeach}
</ul>
We could, of course, simply include that template code in all three places in our templates. But what if we wanted to change it? Rather than including it three times, it would be more efficient to have the code in one place, and simply refer to it.
Perhaps instead, we could insert the following code:
{include:article_list.tpl}
This code would find the article_list.tpl file, in which our code lives, and insert the contents in that location. Used in multiple places, this code would have the effect of centralizing the template structure and allowing us to maintain it in one place.
Remember that the actual articles will be different in each of our three use cases, so we need a way to specify what the article_list variable means inside the subtemplate. This is usually accomplished by specifying the value when calling the template:
{include:article_list.tpl article_list=article.related_articles}
In this case, we’re calling the subtemplate and telling it that – for this instance only – the article_list variable refers to the related_articles attribute of the article we’re rendering.
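If it helps to see the same idea outside of a templating language, here is a loose Python analogy in which each template is modeled as a function. The function names and the dictionary shape of an article are assumptions made for this sketch; the point is only that the subtemplate knows nothing about where its list comes from, and the caller binds article_list at the moment of inclusion.

```python
from html import escape


def article_list_tpl(article_list):
    """Stand-in for article_list.tpl: renders whatever list it is given."""
    items = "".join(
        f'<li><a href="{escape(a["url"])}">{escape(a["title"])}</a></li>'
        for a in article_list
    )
    return f"<ul>{items}</ul>"


def article_page_tpl(article):
    """Containing template: binds article_list for this one inclusion."""
    related = article_list_tpl(article_list=article["related_articles"])
    return f"<h1>{escape(article['title'])}</h1><aside>{related}</aside>"
```

A Latest News menu could call article_list_tpl with a completely different list and get the same HTML structure back, which is exactly the reuse that inclusion is meant to provide.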
Template inclusion is quite common (both in CMSs and web development in general), and is extremely helpful to reduce the complexity of templates by abstracting common output structures into their own templates and managing them there.
Template Development and Management
We’ve spent lots of time talking about templates, but what are they exactly, and how do they differ from content itself?
Templates are almost always file-based. Whereas content exists in the CMS as something editors work with through the interface, templates exist on the filesystem as files that developers work on using their standard development tools.
Some systems also allow for template editing through the interface, though this is rare and would usually only be done in an emergency when access to the underlying code was not available. A textarea in an HTML page offers very little of the coding support that even the most rudimentary code editor provides: line numbering, syntax highlighting, autocomplete, etc.
The existence of file-based templates highlights another difference between templates and content – templates are a code asset, not a content asset. A template change will usually be treated as a code-level change and subject to the developer’s workflow process, not the editors’ workflow process. The two workflow processes are quite different.
Templates are normally stored in a source code management system such as Git or Team Foundation Server. Sometimes they’re stored alongside the CMS code itself, and sometimes separately. Changes to templates are often tested and deployed through well-known build tools like Jenkins or CruiseControl (we’ll talk more about development tools in The CMS Implementation).
The relationship between code and content is often misunderstood, and the two are often conflated. Editors might expect content to be handled like code, and code to be handled like content. Understanding the difference between the two and the boundaries between them is critical to an overall understanding of the CMS itself.
Responsive Design and Output Agnosticism
More and more, prospective CMS customers are asking to what extent a CMS enables or inhibits responsive design. The answer to either question should be “not at all.” Responsive design is largely a byproduct of HTML and CSS markup, and a CMS should neither enable nor inhibit any particular output paradigm. A CMS should ideally strive to be “output-agnostic.”
Some CMSs do provide device detection and use this information for template selection (technically, this is adaptive design, not responsive), but even in systems that don’t, this functionality can be provided by the web server or some other element in the technology stack.
The earlier warning about prebuilt interface widgets looms large here. A canned HTML structure provided by a CMS “feature” will stick out like a sore thumb when it’s the only nonresponsive element on a page or doesn’t respond in the way everything else does. And given that the HTML for your responsive design will be highly specific to the CSS framework you choose, how will these prebuilt widgets decide what HTML to output?
In a larger sense, this question speaks to the division of responsibilities. Is it the responsibility of a CMS to manage the detection of devices and the generation of responsive HTML? Most developers would say no – this should be handled by other components in the technology stack. So long as the CMS does not hinder the generation of any HTML the template developer desires, then the responsiveness of the output is not the CMS’s concern.
Publishing Content
Once we understand the relationship between our content and our presentation, and we’ve developed templates to render content in the format we want, then we need to get this content into a state where someone can consume it. How we do this depends highly on the relationship between our management environment and our delivery environment.
Coupled Versus Decoupled Content Management
One of the more significant architectural principles behind a CMS is the coupling model between its management and delivery environments. By “management,” I mean the system in which content is created, edited, and managed. By “delivery,” I mean the system from which content is consumed by a visitor.
In many cases, these are the same system. Editors manage content and visitors consume it from the same server, using the same execution environment. For example, an editor working on content in Sitecore and a visitor reading that content are both talking to the same Sitecore installation, just from different sides.
These systems are said to be “coupled.” Management and delivery are inextricably linked in the same environment.
Contrast this to a system where the authoring and management environment is on one server, and the delivery environment is on a completely different server, perhaps in a different data center, and even in a different geographic location entirely. Content is created and managed in one place, and is then transmitted to another place where it’s consumed by visitors.
The delivery environment might be only vaguely aware the management environment even exists. If content is placed onto it via FTP or file copy, the web server in the delivery environment will dutifully serve the content up without knowing or caring where it came from.
These systems are said to be “decoupled.” Management and delivery are separated into two environments.
When it comes to actually publishing content, the two options are handled quite differently:
- With a coupled system, the act of publishing content simply means changing a setting on the content to make it publicly available. From the first moment it’s created, content is already in the delivery environment; it’s just hidden from public view. Referring back to the versioning discussion in the last chapter, to make it available for public view, we simply mark one of the versions as “published.” It’s almost anticlimactic.
- With a decoupled system, we have to actually move the data from one environment to another. All content intended for publishing is gathered up from the management environment, then transmitted to another server entirely.
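The contrast between the two publishing models can be sketched in Python. This is a deliberately simplified illustration under stated assumptions: a "version" is modeled as a dictionary with number and published keys, and the decoupled "delivery environment" is just another directory reached by file copy. Neither function reflects a specific CMS.

```python
import shutil
from pathlib import Path


def publish_coupled(versions, version_number):
    """Coupled: the content is already there; just mark one version live."""
    for version in versions:
        version["published"] = version["number"] == version_number


def publish_decoupled(rendered_dir, delivery_dir):
    """Decoupled: copy every rendered file into the delivery location."""
    source_root = Path(rendered_dir)
    delivery_root = Path(delivery_dir)
    copied = []
    for src in sorted(source_root.rglob("*")):
        if src.is_file():
            dest = delivery_root / src.relative_to(source_root)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)
            copied.append(dest)
    return copied
```

Note how little the coupled version does: nothing moves, and only a flag changes. The decoupled version has to walk the output and physically place it somewhere else.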
These two models often make the concept of a “staging environment” confusing. In a coupled system, the staging environment is virtual – if you have permission to see draft content and are perhaps in a “preview mode,” then you’re effectively viewing the staging environment on the same server as the production environment. With a decoupled system, a staging environment might be a literally different environment to which content is transmitted for preview.
Which is the default architecture?
Back in the early days of content management, decoupling was the default architecture. Content management systems were largely static file generators that simply helped website managers turn data into formatted HTML files that were then copied to the root of the website.
But as web programming languages and websites became more sophisticated, the decoupling model began to show cracks. Having simple, static HTML files worked well when content didn’t change much and wasn’t required to do anything, but the market was starting to demand that content become active.
Website managers wanted users to interact with content in contextual ways – they wanted to hide some content from users who weren’t logged in, or they wanted to change the way content was organized based on the user, or they wanted to enable real-time search of content. Static HTML files didn’t adapt well to these needs.
Gradually, the CMS and the content it managed began to become more coupled. Why write out an HTML file when a PHP script could simply query and retrieve content from a database in real time? In the years since, the coupled CMS has become the default model, and decoupled systems are becoming harder to find (though this might be changing; we’ll talk about this a bit later in the book).
In situations where decoupling is still used, the CMS normally either publishes scripted web pages (PHP files, for example) that execute on request, or doesn’t publish files at all. Some systems publish pure data records into a database.
The argument for decoupling
While not appropriate for many situations, decoupling does have undeniable advantages:
- It allows your repository system and publishing system to be on different architectures. You could have a Java-based CMS pushing content onto a Windows server running .NET. Your delivery environment is not limited by your management environment.
- It will usually result in a more secure delivery environment. The potential hack points on a stripped-down web server are tiny compared to a full CMS.
- You can publish content to multiple delivery environments. A large media enterprise might have hundreds of servers in dozens of countries on multiple continents. In these situations, deploying content is much more complex than just manipulating the version stack. Caching servers need to be updated, media needs to be pushed into a CDN, reverse proxies need to be reset, failover servers need to be updated, etc.
- Since you don’t need to install a CMS on all the delivery servers, you might not need to license them. Depending on the size of your delivery environment, this could save you enormous amounts of money.
- It can be easier to scale a decoupled delivery tier. Adding a simple web server to your load balancer is vastly easier than bringing up a new CMS installation and somehow synchronizing it with the others. Your management environment might actually be quite modest, but it could publish content into a mammoth delivery architecture.
- Reliability is usually higher. Not having a CMS in the delivery environment means fewer moving parts. There are fewer chances for error when serving static HTML files.
- You can publish content from multiple repositories and systems. Your CMS may only be one system of many that generate content, so your delivery tier might need to publish content without knowing (or caring) where it came from. It’s easier to blend content from multiple points of origin in a decoupled environment.
- In some cases, the content in a CMS is secondary to the website’s primary purpose. An online banking system, for instance, might be a massive, custom-built banking platform that incidentally also displays some content. You can’t simply drop a coupled CMS on top of this – the CMS can’t “own” the delivery environment. Instead, the CMS has to be subservient to it, exist somewhere else, and push content into it.
- Some editors demand a true “staging environment” from which to develop content. They want a sandbox in which to publish content for preview before public delivery.
Decoupled Publishing Targets
In decoupled environments, the CMS transmits content to “publishing targets,” which are environments intended for content delivery. Most systems can support more than one publishing environment and publish content to them simultaneously.
The actual method of transmission is often one of the following:
- FTP or SFTP
- File copy (obviously, the two systems would need to be on the same network or VPN)
- WebDAV
- rsync
- SCP
- Web service
Some transmission methods are universal (almost every server will support FTP), while others need something on the other end to receive the content. There is no universal web service, for example, that would receive content from any CMS. Therefore, a CMS might provide a web service to run in the delivery environment that will be used to get content from one environment to another.
Once this happens, the neutrality of the decoupled model is broken. If the delivery environment needs something running inside it, then that environment becomes an extension of the CMS, to some extent. The CMS no longer publishes to a neutral environment, but instead publishes to a known endpoint that is prepared to receive the content from it.
Some systems are even more specific – they run proprietary software to receive content in the delivery environment. The CMS effectively comes in two pieces, resulting in a system where management software pushes content into delivery software that is required on every delivery server.
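One way to picture publishing targets is as interchangeable objects that all accept the same "send this file" request, whatever transport sits behind them. The Python sketch below assumes that shape; the PublishingTarget protocol and both classes are hypothetical names for illustration. Only file copy is actually implemented, and RecordingTarget merely stands in for a remote transport such as SFTP or a web service.

```python
from pathlib import Path
from typing import Protocol


class PublishingTarget(Protocol):
    """Anything that can receive a published file."""

    def send(self, relative_path: str, data: bytes) -> None: ...


class FileCopyTarget:
    """Writes published files under a root directory (same network or VPN)."""

    def __init__(self, root):
        self.root = Path(root)

    def send(self, relative_path, data):
        dest = self.root / relative_path
        dest.parent.mkdir(parents=True, exist_ok=True)
        dest.write_bytes(data)


class RecordingTarget:
    """Placeholder for a remote transport; it only records what was sent."""

    def __init__(self):
        self.sent = {}

    def send(self, relative_path, data):
        self.sent[relative_path] = data


def publish(targets, relative_path, data):
    """Push the same content to every configured target."""
    for target in targets:
        target.send(relative_path, data)
```

A target backed by a vendor's own receiving software would fit the same interface, which is precisely where the neutrality of the delivery environment starts to erode.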
Delivery environment synchronization
Here’s a seemingly simple question: how does a decoupled system delete published content from the delivery environment? Say your decoupled CMS pushes an HTML file (a database record, whatever) to a delivery server. Later, the content “behind” that file gets deleted from the repository. Does the decoupled system then delete the output file from the delivery server?
Some do, but others might only update the delivery environment when you publish, which means that when you delete, there’s an orphaned file sitting out there. Over time, these accumulate, and you wind up with a mix of active files that represent content and orphaned files that have no corresponding content. How can you tell the two apart?
Can you just wipe out the delivery environment and republish the entire repository from scratch? Only if everything for your site is in the CMS. Some solutions are just partially managed – supporting files live in the delivery environment (or are deployed there from source control) and content files are published from the CMS, and these files are all intermingled in the same locations on the delivery server. How can you tell which files were published from the CMS and which files exist only in the delivery environment?
This raises a larger question: how does a decoupled CMS ensure it stays perfectly in sync with its published environment? Does it “own” the delivery environment and exert ironclad control over it? Or is it designed to “contribute” to the delivery environment and not disrupt files that are already out there? The answer to this question varies by system.
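One possible approach for a CMS that "contributes" rather than "owns" is to keep a manifest of what it has published, so it can later tell its own files from everyone else's. The Python sketch below assumes that design; the manifest file name and the sync function are invented for this example and are not taken from any particular system.

```python
import json
from pathlib import Path

MANIFEST = ".cms-manifest.json"  # hypothetical name for the CMS's own record


def load_manifest(delivery_dir):
    """Return the set of relative paths the CMS published last time."""
    path = Path(delivery_dir) / MANIFEST
    return set(json.loads(path.read_text())) if path.is_file() else set()


def sync(delivery_dir, published):
    """Write current content, then delete only CMS-published orphans.

    `published` maps relative paths to file text. Files that were never
    in the manifest (stylesheets, scripts, and so on) are left alone.
    """
    root = Path(delivery_dir)
    previous = load_manifest(root)
    for relative_path, text in published.items():
        dest = root / relative_path
        dest.parent.mkdir(parents=True, exist_ok=True)
        dest.write_text(text)
    orphans = previous - set(published)
    for relative_path in orphans:
        (root / relative_path).unlink(missing_ok=True)
    (root / MANIFEST).write_text(json.dumps(sorted(published)))
    return sorted(orphans)
```

With a record like this, deleting content from the repository results in the matching output file being removed on the next sync, while supporting files deployed from source control are never touched.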
A Summary of Output Management and Publication Features
The following checklists provide you with some guidance on points to keep in mind when evaluating your output and publication needs.
Architecture
- Is the system a coupled or decoupled system?
- Does the system manage content without reference to delivery on the Web, or does it have web-centric features built in?
Templating
- Is templating done in a domain-specific language, or is it done in the underlying programming language of the CMS itself?
- Does the language allow for token replacement? Does it permit filtering and formatting of replaced values?
- What control structures are available for template logic?
- How does the templating system allow you to manage and include the surround? How can the surround obtain the correct context of the content object being rendered?
- How does the templating system allow you to abstract and include other templates?
- How are templates selected for content? How are you able to affect this selection?
- How are templates developed and managed by template developers?
Decoupled Publishing
- How is content transmitted to publishing targets?
- How are publishing targets configured and managed? Are they required to run CMS-specific software?
- Can the CMS capture publishing events? Can processes be run before or after content is pushed into the delivery environment?
- What data artifacts are actually published? Just files, or can the system publish records to a database or other non-filesystem storage method?
- How does the CMS ensure the delivery environment stays in sync with the repository?
- Is the CMS expected to manage non-content files as well, or should supporting files live in the delivery environment and only content files be published from the CMS?
- What needs to be installed on the delivery servers in order for them to receive content from the management environment?
“Enforcing Strict Model-View Separation in Template Engines” (PDF), May 2004.
This is both philosophical and practical. Philosophically, “Model” is the “M” in “MVC.” Practically, this data is referred to in the template as the variable named Model.
If that sounds a little cynical, it is. There is no Grand Unified Theory of Related Content, though wireframe designers usually assume it just magically happens somehow, so they routinely throw it into every sidebar.
Again, this is hypothetical, but “.tpl” is a very common extension for template files. The files are simple text files, and could just as easily use an “.html” or “.txt” extension, but the “.tpl” extension identifies their purpose by name, which can be helpful.
Very few other systems force template development solely through the interface. This is rare, but you see it occasionally, and it often throws developers into disarray. Files are the universal container of web development – they’re the thing that developers base their work on and use as an encapsulation and transport mechanism for code. Almost all programming processes assume that code exists in files, not database records, and without file artifacts to manage, many programming methodologies and workflows will completely break down.
Usually. Some installations might simply publish content to a different location on the same server.
I’ve even worked on a highly specialized build that simply populated an entire SQLite database with content and pushed that into the delivery environment. So, the CMS swapped out the entire data source of a running website whenever content was published. While clearly not appropriate for many situations, it was the right choice for those particular requirements and demonstrates that decoupled data can be published in many different formats beyond static files.
Clearly, another benefit for commercial CMS companies in this model is that all the delivery servers need to be accounted for and subsequently licensed. The cost of licensing the delivery environment might constitute the largest portion of the vendor’s total price tag.