Editorial Tools and Workflow

The hosts of the TV show MythBusters did an experiment once where they interleaved the pages of two phone books. In effect, they set two phone books together, then pushed them into each other so that their pages alternated, and every page of one phone book was lying between two pages of the other. The only thing holding the two phone books together was the friction of the pages on one another.

Then they tried to pull the two phone books apart.

They tried pulling with a dozen people, then they dangled a person from one of them, then they lifted a car off the ground, then they tried to use power equipment in the shop, then they tried two cars moving in opposite directions. Nothing could pull the two books apart until they got two World War II-era armored vehicles. The phone books finally came apart under 8,000 lbs of force.

Do not underestimate friction. It can sneak up on you and bring everything to a grinding halt.

Your CMS necessarily introduces some degree of editorial friction. To do their jobs, your editors will have to interact with the CMS, use the tools it offers, and suffer without the tools it doesn’t. The CMS can either enable them to efficiently breeze through their work, or introduce friction through poor usability, needless repetition, error-prone interfaces, and poor conceptual models.

The capabilities of the CMS that editors use to perform the editorial process are collectively known as editorial tools or editorial workflow (literally meaning “flow of work,” rather than workflow as a specific CMS concept, which we’ll discuss further later in this chapter).

This is really the “management” of content management systems. These are the tools that increase editors’ ability to create better content and gain more control over the content under their care. This is the side of the CMS that editors are going to use, day in and day out.

This is a critical area of functionality, because poor tools and workflow can cripple editors and destroy morale. Sadly, editorial usability is one area of CMS development that gets skipped over too often. As we’ve discussed, CMSs are created by developers, but they’re often also created for developers, first and only. A developer understands things differently than the average content editor, and when designing editorial interfaces and tools, developers will often take leaps and liberties that make sense to them, but not necessarily to people with other perspectives.

With commercial systems and larger open source systems, these usability shortcomings tend to be corrected over time due to market pressures and large editorial usage. However, in smaller open source systems that don’t have to collect a license fee and might not have a large editor community, editorial usability problems can persist for years without correction.

While editorial friction directly impedes editor productivity in the short term, the more damaging aspect is the chronic drag it has on morale in the long term. Many an editorial team has grown increasingly frustrated and resentful over time with a poorly architected or implemented CMS. More than once, I’ve encountered teams that were fraying at the edges and losing staff because they were tired of the extra workload imposed on them by the system they were forced to use.

Solid, well-implemented editorial tools enhance the editorial process. Poor or nonexistent tools will destroy it over time. At an absolute minimum, a CMS needs to stay out of the way and not impose any friction beyond what’s absolutely necessary.

The Content Lifecycle

From the moment it’s conceived to the moment it’s irrevocably deleted, content goes through multiple stages of “life.” The stage where it’s actually published on a website and can be consumed by a visitor is just one among many (and might sometimes be quite short – a news release announcing an event might be created then deleted a week later).

These stages are collectively called the “lifecycle” of content. There is no universally accepted definition of the exact stages and their order, but I’ll try to present a definition here that encompasses many of the commonly accepted stages.

The content lifecycle can be described as having the following stages:

Some of these stages are iterative and may apply simultaneously to different versions of the same content.

For example, a piece of content may be published for some time, then need to be changed. At this time (and depending on the CMS), a new version is created as a draft (Edit and Collaborate), is submitted for approval (Submit and Approve), and then is finally Published, which causes the previous version to Archive. There are now two versions of this content, in different stages of their lifecycles – one is archived, the other is published.

This is not the only way the content lifecycle can be described, and the language used depends highly on the perspective and professional role of the observer. Marketers, for instance, would tend to describe content in terms of “creating, distributing, and analyzing,” without getting into the nitty-gritty of editing, approval, and archiving that a content manager is concerned with.

The Archive stage is particularly nebulous, with very few practitioners completely agreeing on its definition. For some, to archive content just means to make it not visible to the end consumer, without deleting it. For others, it means moving it “somewhere else” in the CMS, out of the way of the non-archived content, but perhaps still leaving it accessible to visitors via a different method. For others, it may mean moving it to different storage – even into offline archival storage media.

Regardless of the particular stages of the lifecycle, a good CMS provides functionality across the entire scope of a content object’s existence in your website.

The Editing Interface

The first job of an editing environment is to be usable and to provide content editors with a competent and functional interface in which to create and edit content. If a CMS fails at this, it’s tough to recover. Editors who hate working with content in their CMS will be hard pressed to create anything of value over the long term.

Content Findability and Traversal

To edit content, an editor first has to find it. In some websites, this is simple – if a website has 20 pages, it’s not hard to locate the right one. However, when a website has thousands and thousands of pages, it becomes more difficult. How do you keep track of them all?

Traditionally, websites offered dedicated management interfaces designed to be used by editors solely to browse the content in the repository. Content would be listed in a simple table, with search tools to assist in finding it.

As more and more CMSs embraced the content tree geography, management interfaces moved into a collapsible tree structure, where editors would traverse down through parent and child relationships to identify content.

Today, these interfaces are increasingly giving way to in-context management, where editors simply browse their websites like content consumers do. When the editors are authenticated to the CMS, however, they have editing tools available, ranging from a simple “Edit This Page” link to more complex pop-up/overlay interfaces that allow them to enter an editing mode.

This style of finding content can be difficult for decoupled systems. When the publishing/delivery server is separate from the management/repository server, it often has no capability to authenticate someone as an editor, and therefore has no way to show these users editing tools on the page. Many systems get around this by generating a proxied version of the site – editors browse in a management interface that shows the website in an IFRAME or proxies the entire website to another server, to which the editor is authenticated.

This method of content traversal has become common because as website usability has increased, the tools available for content consumers to find content have become more similar to the tools editors use. Why build a set of editor tools when the CMS has already provided the end user with an array of sorting and search functionality? As we come up with better and better search technologies and interfaces for our visitors, we’re becoming hard pressed to improve on these for editors. And it would be tough to explain if your editors had to use tools that were actually worse than what site visitors used.

However, there are still content filtering tools that can be considered editor-specific. These are tools that allow you to filter.

In all these situations, the criterion for locating content is not something that would be visible to the content consumer, and therefore a CMS might have special editorial tools with which to handle these situations.

Given that editors will always have some special needs beyond those of content consumers, most CMSs pursue a hybrid approach. There is a dedicated administrative interface, supplemented by in-context editing tools of varying depth.

Type Selection

When creating a new content object, the first task of an editor is usually to select the content type on which to base it (see Content Modeling). The CMS needs to ensure the editor can select the appropriate content type for the situation, and not a content type that doesn’t work quite right, or will break the website should it be published.

In many cases, the integrity of a content tree is enforced through restrictions on type parentage. We may have an Issue type, which contains one or more Article Category types, which each contain one or more Article objects. The hierarchy is logical and necessary to the proper modeling and rendering of content.

If an editor were allowed to create content based on the Issue type as a child of an Article, what would happen? The code in the template wouldn’t be expecting content of that particular type in that location. In the best case, if the developer were checking the type of everything, the template would simply ignore it. However, in many situations, the templating would break – the developer would probably assume that the CMS would enforce the type hierarchy, and would thus trust that an Issue would never be found as a child of an Article.

In these situations, the CMS needs to be able to dictate the correct relationship between types. Editors create content in specific locations in the content geography, and the CMS should be able to dictate what content they can create at any particular location. Content types therefore need to indicate what types can be created as children of a specific type.

Additionally, some content types might be restricted from certain editors. It’s common that certain types are more volatile and advanced than others. As we discussed in The Content Management Team, not all editors are created equal. A power editor might be allowed greater liberties than others, and the content types to which these editors have access should reflect this.

In some installations, for example, a power editor might have access to a content type that allows raw HTML inclusion – the editor might be able to type raw HTML/JavaScript code that is included into the page, unchanged. Clearly, this can be dangerous – HTML could be introduced that prematurely closed tags, or JavaScript could be added that opened the site to cross-site scripting attacks.

Other items, like image carousels, might require more care and experience than the training of the mainstream editor allows. Editors might need to understand how to select and edit images, address accessibility concerns of image inclusion, and deal with other usability issues.

In these cases, these content types should be restricted to editors trained to use them correctly.

Finally, in some installations, the list of available content types can become quite large, occasionally numbering in the dozens. Several different features can make type selection easier for editors:

The type selection interface in Episerver

In all cases, the goal is to provide editors with a list of content types that are allowed and appropriate to the current content problem. A CMS should be able to alter this list based on the location of the desired content object and the role of the particular editor.
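As a rough sketch of how a type list might be narrowed by both location and role, consider the following. The type names beyond Issue, Article Category, and Article, the role names, and the `selectable_types` function are all hypothetical; a real CMS will expose this through its own configuration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ContentType:
    name: str
    allowed_children: frozenset = frozenset()  # types that may be created beneath this one
    required_role: str = "editor"              # hypothetical minimum role needed to create it

# Hypothetical role ranking: a higher number means greater liberties.
ROLE_RANK = {"editor": 1, "power_editor": 2}

# Illustrative model only; "Raw HTML Block" is an assumed restricted type.
TYPES = {
    "Issue": ContentType("Issue", frozenset({"Article Category"})),
    "Article Category": ContentType("Article Category", frozenset({"Article"})),
    "Article": ContentType("Article", frozenset({"Raw HTML Block"})),
    "Raw HTML Block": ContentType("Raw HTML Block", required_role="power_editor"),
}

def selectable_types(parent_type, role):
    """List the types this role may create as children of parent_type."""
    parent = TYPES[parent_type]
    return sorted(
        name
        for name in parent.allowed_children
        if ROLE_RANK[role] >= ROLE_RANK[TYPES[name].required_role]
    )
```

Here an ordinary editor working under an Article would be offered nothing, while a power editor would be offered the raw HTML type, and nobody would ever be offered an Issue beneath an Article.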

Content Preview

While we’ll discuss the actual editing interface shortly, it’s important to consider the relationship between editing content and previewing it. When editors make changes to content, they usually want to see those changes before they publish their content to the world. Hence, editors need preview functionality to enable them to see these changes “in context,” which means in the visual environment to which they’re about to be published.

There are two schools of thought about how content preview should relate to the editing interface:

While in-context editing seems advantageous on its face – who wouldn’t want to see live, real-time previews? – there are a couple of extenuating issues:

Mark Boulton considered this very issue in a blog post:

The problem is this: The question content people ask when finishing adding content to a CMS is “how does this look?”. And this is not a question a CMS can answer any more – even with a preview. How we use the web today has meant that the answer to that questions is, “in what?”

More modern CMSs have multipreview features where an editor can pick a view to preview the content – as a web page, a tweet, or an RSS item, for example. However, this preview functionality is not common, and it generally requires additional setup, development, and templating to provide accurate views for all possible content channels.

Understand that multichannel preview is not just a technical issue. Left to their own devices, editors will be biased toward the main, intended output format, which will likely be HTML. Don’t underestimate the workflow, training, and governance challenges involved with mandating multichannel preview before publication.

The problem of personalization and preview

One of the new frontiers in content management over the last half-decade has been personalization (discussed more in Other Features), or the use of behaviors and contextual information to personalize the web experience for each visitor.

Unfortunately, this complicates preview even further. When previewing content, how can you account for all the possible permutations and combinations of factors that might affect that content?

For example, if a visitor has viewed three pages in the section of a university’s website about the nursing program, then a stock image of a nursing student should be displayed on the Admissions page, rather than a more generic image.

How can you preview this? Will you have to manually view three pages yourself in order to mock up the behavior your CMS requires? Or does your CMS have tools to allow you to “spoof” your membership in a personalization group?

Combine this with device and distribution channels, and the possible outcomes can be endless. For one web page, an editor could conceivably create hundreds of different combinations of visitor behavior, consuming device, and distribution channel, all with their own specific previews.

Editing Interface Elements

The editorial rubber meets the road in the interface. There comes a point where an editor actually types or clicks something and creates some content. Generally speaking, a CMS should present editors with the correct editing element for the information they’re trying to edit. A good editing interface guides editors into the right decisions and protects them from doing damage.

Imagine if the editing interface were simply a list of empty text boxes for all attributes. For text, this might be appropriate, but what about for a yes/no value? Should the editor just type “yes” or “no”? How about “true” or “false”? Does capitalization matter? Clearly in this instance the appropriate interface element is a checkbox, which is checked for “yes” and unchecked for “no.”

A CMS should render the editing interface to conform to the content model, making intelligent assumptions when selecting the correct element to present to editors (and allowing administrative overrides where needed). The goal is to present a highly productive working environment avoiding unnecessary error or guesswork.
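One way to picture this is a default mapping from an attribute’s datatype to an interface element, with an optional override. This is only a sketch: the datatype names, the widget names, and the `choose_widget` function are assumptions made for illustration, not any particular system’s API.

```python
# Hypothetical defaults: which interface element to show for each datatype.
DEFAULT_WIDGETS = {
    "boolean": "checkbox",
    "text": "text_box",
    "number": "number_box",
    "date": "date_picker",
}

def choose_widget(attribute):
    """Pick the editing element for an attribute described as a dict.

    An administrator-supplied "widget" key wins; otherwise the datatype
    decides, falling back to a plain text box for unknown datatypes.
    """
    override = attribute.get("widget")
    if override:
        return override
    return DEFAULT_WIDGETS.get(attribute["datatype"], "text_box")
```

For example, `choose_widget({"name": "featured", "datatype": "boolean"})` would yield a checkbox, while an attribute carrying an explicit `"widget"` value would use that element instead.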

In addition to the aforementioned checkbox, here are some other element choices:

Validation

In addition to accepting input, a content editing interface must ensure the input is valid to prevent errors from compromising the content. Validation can be guided by the use of editing interface elements, as discussed in the previous section; however, the CMS should always validate data independently of the interface in the case of data being entered through an API or service (and therefore not being subject to the restrictions of the editing interface).

Understand that the validation of content is related to its logical value, not necessarily its pure datatype. Datatypes do not understand context; they only understand pure data, completely separate from how the data will be used.

As we discussed in Content Modeling, if an attribute represents a year, then the underlying datatype might be number (or integer). However, the logical idea of a year presupposes several other restrictions:

In this case, the datatype of number is wholly insufficient to enforce the necessary restrictions around the logical value type of year. Additional validation will have to take place.

Some systems offer expanded validation types for these instances. For instance, in the case of a number, a range might be allowed to ensure the number is valid. The same could be true of dates, to ensure an entered date is in the future, in the past, or between two landmark dates.
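A range check of this kind might look something like the following sketch. The function name and the error wording are hypothetical; the same comparison works for numbers and for dates, since both can be ordered.

```python
from datetime import date

def validate_in_range(value, minimum=None, maximum=None):
    """Return a list of error messages; an empty list means the value is valid.

    Either bound may be omitted, so the same check covers "in the future"
    (minimum only), "in the past" (maximum only), or "between two landmarks".
    """
    errors = []
    if minimum is not None and value < minimum:
        errors.append(f"{value} is below the minimum of {minimum}")
    if maximum is not None and value > maximum:
        errors.append(f"{value} is above the maximum of {maximum}")
    return errors
```

A call such as `validate_in_range(date(2020, 1, 1), minimum=date(2021, 1, 1))` would report one error, and because the check lives outside the editing interface, it could be run on data arriving through an API as well.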

Regular expressions (“regexes”) can be used in many cases to validate text patterns. While a discussion of regex is far beyond the scope of this book, at a high level a regex is a definition of a text pattern, which can be tested for validity.

For example, in the case of our movie release date, we can define a regular expression to enforce:

(19|20)\d{2}

This pattern, when applied to entered text, will ensure that the first two digits are “19” or “20” and that they are followed by two additional digits. This would limit data to years between 1900 and 2099.
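One detail worth noting is that the pattern only limits the data this way if it must match the entire entered value; a pattern that is merely searched for would also accept “19775.” A minimal sketch in Python, where the function name `is_valid_year` is hypothetical:

```python
import re

# The pattern from the text: "19" or "20" followed by two more digits.
YEAR_PATTERN = re.compile(r"(19|20)\d{2}")

def is_valid_year(text):
    """Accept the input only if the whole string matches the year pattern."""
    return YEAR_PATTERN.fullmatch(text) is not None
```

Many systems apply the pattern to the whole value automatically; where they don’t, anchoring it (for example, as `^(19|20)\d{2}$`) has a similar effect.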

If we know that our product numbers begin with “A,” followed by two other alphabetic characters, then a dash and four digits, we can write a pattern like this:

A[A-Z]{2}-\d{4}

Invariably, some validation needs simply can’t be predefined by a CMS and must be implemented by custom code. In this example, we can certainly enforce the format of a product number, but we can’t ensure that this product exists in our catalog.

To validate that fact, we will need to write custom code to connect to our product database, check for the entered product number, then tell the interface whether to accept the entered data or display an error to the editor. Different CMSs will offer different levels of functionality in this regard.
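The two layers of validation might be combined along these lines. This is a sketch under assumptions: `validate_product_number` and the `product_exists` callback are hypothetical names, and a simple set stands in for the product database.

```python
import re

# The product number pattern from the text: "A", two letters, a dash, four digits.
PRODUCT_PATTERN = re.compile(r"A[A-Z]{2}-\d{4}")

def validate_product_number(value, product_exists):
    """Return an error message for the editor, or None if the value is acceptable.

    product_exists is any callable that answers whether the number is in the
    catalog; in a real installation it would query the product database.
    """
    if PRODUCT_PATTERN.fullmatch(value) is None:
        return "Product numbers must look like ABC-1234 and begin with 'A'."
    if not product_exists(value):
        return f"No product numbered {value} exists in the catalog."
    return None

# Stand-in for the product database (assumed data, for illustration only).
catalog = {"ABC-1234", "AXY-0042"}
```

Calling `validate_product_number("ABC-9999", catalog.__contains__)` passes the format check but fails the catalog check, which is exactly the case a predefined pattern cannot catch.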

Rich text editing

Most CMSs include a rich text interface to allow editors to create HTML content as an attribute of a content object.

For example, almost all implementations will have a content type for Text Page or Standard Page. This content type can be as simple as a Title and a Body, which will often be rich text. Inside the editing interface for the Body, the editor will have buttons for formatting items like bold text, italics, bulleted and numbered lists, image insertion, hyperlink creation, etc. “Just like Microsoft Word,” is a common phrase used to describe these editors.

Usage of rich text can be divisive. Editors enjoy the control, but developers and designers can get nervous about the freedom it allows. Editors have been known to use formatting liberally, and often in defiance of style guides and conventions. Additionally, if they have access to the HTML source, they can manually edit the HTML, which might cause rendering problems with the template in which the content is displayed (worse, a nefarious editor can write HTML that compromises the security of the site itself).

Ideally, a CMS should be very careful about the formatting and access to source it allows. Some common protective features include:

There is a recent trend to avoid rich text altogether, and instead attempt to “structure away” the problem by breaking content down into attributes small enough to not need rich text at all. While this might make developers happy, it’s probably not entirely realistic. Most editors will always want formatting tools.

Alternately, some implementations are moving toward very lightweight markup languages rather than HTML. These languages can be edited inside simple text area elements and use character combinations that convert to HTML later, in the page rendering stage. The most common example is Markdown, which looks like this:

This text is in _italics_ and this text is **bold.**
This is [a link](http://oreilly.com/).

Other examples of alternate markup languages are Textile, Pandoc, and WikiText. Some CMSs, like Ghost, offer real-time preview of these languages in a side-by-side style interface, with changes in one pane reflected in the other.

Editing in Ghost using the Markdown syntax in the lefthand pane with real-time preview in the righthand pane

Reference content selection

In most implementations, content will need to be linked together, in one or more ways:

In all these cases, an editor will need to find the remote content object from the editing interface. Methods of doing this vary, but commonly the editor will be presented with a pop-up window that offers multiple methods to find the content – editors will usually be able to browse for it, and might be able to search for it. This becomes more and more important as the number of content objects scales up. Trying to browse for a specific article among thousands can be frustratingly difficult.

What becomes critical is the ways in which this interface can be restricted. For example:

Additionally, a subtle but critical point is whether the reference to an object is attached to the object itself, or to the current URL of the object. The latter is always going to be problematic. If a CMS requires an editor to simply find the URL of another page and paste it into the hyperlink box, what happens if the URL of that second content object changes?

URLs can change, so links between content should be resolved as late as possible in the content delivery cycle. Any inserted link should just be a reference to the content, not its actual, current URL. The reference should then be replaced with the correct and current URL to that content when the content is rendered.
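To make late resolution concrete, here is a minimal sketch in which stored content holds a reference token rather than a URL. The `{{ref:ID}}` token format, the `render_links` function, and the fallback value are all hypothetical; each CMS has its own internal reference format.

```python
import re

# Hypothetical stored form of a link: {{ref:42}} points at content object 42.
REFERENCE = re.compile(r"\{\{ref:(\d+)\}\}")

def render_links(body, current_urls):
    """Swap each reference token for the target's current URL at render time."""
    def replace(match):
        content_id = int(match.group(1))
        # Assumed fallback for targets that no longer resolve.
        return current_urls.get(content_id, "#missing-content")
    return REFERENCE.sub(replace, body)
```

If the stored body is `<a href="{{ref:42}}">About</a>`, it renders with `/about` today and with `/company/about` after the page moves, without anyone editing the content that links to it.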

In-context help and documentation

You can’t merely assume that editors will always understand all the nuances of the content model. Content changes can have subtle implications that it may be hard for them to keep track of after the training session. This is especially true of seldom-used properties and features.

Systems vary in their ability to provide editors with help in the editing interface itself. At the very least, properties should be labeled clearly. The ability to provide a few sentences of documentation sometimes makes all the difference.

For example, when presented with a summary field, these few sentences might be invaluable:

Content entered in the summary will be used along with the title when this content is referenced from other locations in the website. If left blank (not recommended), the first paragraph from the body will be used for the summary.

If there’s one thing an editor hates, it’s not knowing what to do and getting stuck. Worse still is doing something and having it cause unintended side effects, or even an error. In-context documentation vastly reduces uncertainty along with the ensuing questions and frustration.

Versioning, Version Control, and Version Labels

Versioning is the act of not overwriting content with changes, but instead saving content in a new version of an existing content object. This means that content objects that have been edited have multiple versions – indeed, they might have hundreds of versions – each representing how that content object looked at a certain moment in time.

Editors can use versioning in several ways:

Some systems make versioning automatic, while others require it to be specified by content type. Some systems just version content, while others version everything – content, users, media, and settings.

At any given time, an editor should be able to review a timeline of content changes for an object and see who changed the content and when. Some systems take this a step further by allowing editors to compare versions, either in side-by-side windows or sometimes in a “Track Changes” style where additions, deletions, and edits are shown inline.

Conceptually, versions become a “stack,” stretching back in time. The initial version of content is on the bottom of the stack, with new versions stacked on top of it. Versions are usually labeled with a status, with one of the versions being considered the “published” version.

You might envision an arrow pointing to one version on the stack, which is the published version. This is hypothetical, and the actual implementation of the concept might vary, but it’s a handy metaphor to envision the relationship between the version stack and the various states of content within it.

To change which version is published is to “roll back” to a previous version. Different systems handle this in different ways – some will simply move the “published” arrow to a different version, while others will copy the desired older version and make it a new draft version at the top of the stack (thus ensuring that the published version of the content is always the latest version).

Changes to content are considered a new unpublished version – with a label of Draft or Awaiting Publication – sitting on top of the stack. When a new version is published, the publication arrow is simply moved. In some systems you can edit a prior version, while in others you cannot; any change to any version becomes a new version at the top of the stack.

Logically, only one version of content can be published at any one time.

The version stack is conceptually a pile of versions, from latest to oldest, sometimes with a designator showing which one is published (other implementations might just consider the top version published)
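The stack-and-arrow metaphor, including both styles of rollback described above, can be sketched as a small class. Everything here is illustrative: the class and method names are hypothetical, and versions are reduced to plain strings.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class VersionStack:
    versions: list = field(default_factory=list)  # index 0 is the oldest version
    published_index: Optional[int] = None          # the "arrow"; None means unpublished

    def save_draft(self, body):
        """Any change becomes a new version on top of the stack."""
        self.versions.append(body)
        return len(self.versions) - 1

    def publish(self, index):
        """Move the arrow; only one version can be published at a time."""
        if not 0 <= index < len(self.versions):
            raise IndexError("no such version")
        self.published_index = index

    def rollback_by_pointer(self, index):
        """Rollback style 1: simply point the arrow at an older version."""
        self.publish(index)

    def rollback_by_copy(self, index):
        """Rollback style 2: copy the older version to the top, then publish it."""
        new_index = self.save_draft(self.versions[index])
        self.publish(new_index)
        return new_index

    @property
    def published(self):
        if self.published_index is None:
            return None
        return self.versions[self.published_index]
```

With the second style the published version is always the newest one on the stack, at the cost of an extra version each time an editor rolls back.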

Some systems will also allow mass rollback, which will allow an editor to essentially step back in time and view the entire site as if all content was rolled back to its version at that moment.

Almost all versioning in web content management is “serial,” meaning versions of a content object are simply stacked in order of date. Some more advanced document management systems, however, offer branching, where content can be split into multiple branches – version 1 can have version 1.1 and 1.2, which is further split into 1.2.1, etc. This gets very confusing for most editors, and is only required in highly controlled document scenarios (this is common in the management of technical documentation, for example).

Even if never used, versioning is a handy feature to have lurking in the background. The only reason against using versioning might be storage, since multiple versions will obviously consume more disk space.

Versioning is designed to keep content safe, so the ability to delete or purge versions is usually not available by default. A limit can often be set and a scheduled job will delete excess versions beyond that limit (in the event that there are storage limitations), or the permission to delete individual versions can be granted on an exception basis to specific editors.

When considering versioning, remember that without it, the power to edit is essentially the power to delete. An innocent mistake by an editor who is too quick with the save button can be disastrous.

Version control

Version control is about the management of versions of content between multiple editors working simultaneously. If two editors want to work on the same piece of content, how does this get managed? There are a few options:

At the very least, a CMS should have some indication that a content object is being edited, or that there’s a draft version of this content object above the published version on the version stack. The editor should be notified of this, and given the option of working with the existing draft version (which might be locked), or creating a new draft (which might require change reconciliation with the existing draft at some point in the future).
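As one possible illustration of that minimum behavior, the sketch below tracks whether a draft exists and who holds it. It assumes a simple locking model, which is only one of several approaches a CMS might take, and the `ContentObject` fields and `begin_edit` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ContentObject:
    title: str
    has_draft: bool = False          # is there a draft above the published version?
    locked_by: Optional[str] = None  # editor currently holding the draft, if any

def begin_edit(content, editor):
    """Try to start editing; return (allowed, message to show the editor)."""
    if content.locked_by and content.locked_by != editor:
        return False, f"A draft is already being edited by {content.locked_by}."
    if content.has_draft:
        message = "Continuing the existing draft."
    else:
        message = "Created a new draft."
    content.has_draft = True
    content.locked_by = editor
    return True, message
```

A system that allows parallel drafts would replace the refusal with an offer to create a second draft, accepting that the two will need to be reconciled later.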

Dependency Management

Two content objects can be related in a number of ways:

Many CMSs will keep track of these links and their purpose (HTML link, attribute reference, media embed, etc.) so it’s possible to know which content depends on which other content. This enables some helpful functionality:

A deletion warning in Sitecore – the content pending deletion is depended on by other content, and the system needs to know how the editor wants to handle the broken dependencies
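A dependency index of this sort can be sketched as a reverse lookup from each target to the content that links to it. The class, its methods, and the warning text are hypothetical, and this is not a description of how Sitecore or any other product implements the feature.

```python
from collections import defaultdict

class DependencyIndex:
    """Remembers which content depends on which other content, and why."""

    def __init__(self):
        self._inbound = defaultdict(set)  # target -> {(source, purpose), ...}

    def add_link(self, source, target, purpose):
        self._inbound[target].add((source, purpose))

    def dependents_of(self, target):
        """Everything that would break if target went away."""
        return sorted(self._inbound.get(target, set()))

    def deletion_warning(self, target):
        """Return a warning for the editor, or None if deletion is safe."""
        dependents = self.dependents_of(target)
        if not dependents:
            return None
        sources = ", ".join(source for source, _ in dependents)
        return f"{target} is used by {len(dependents)} other item(s): {sources}"
```

Recording the purpose alongside the link lets the interface explain whether the editor is about to break a hyperlink, an attribute reference, or an embedded image.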

Content Scheduling and Expiration

Often, an editor doesn’t want content to be published immediately. Rather, content should be scheduled for publication in the future.

This is intertwined with versioning, because what an editor is essentially doing is scheduling the change in version labels. The content she’s scheduling is considered to be Draft, with the version label to be changed to Published in the future. (Remember our conceptual “publication arrow”? All the editor is doing is scheduling a time where it moves to a different version in the stack.)

Publication scheduling has two basic forms:

This can be slightly complex in some cases when editors begin working on a new version of content before the latest version of content has been published. In these cases, you have the published version several levels back, then one or more versions awaiting publication, then one or more versions ahead of those in draft, which might then get scheduled.

And what happens if an editor elects to publish a new version directly? Or schedules it to publish before one of the versions behind it in the version stack? Some systems might not allow this, others might negate the scheduled publication of anything behind it in the stack, and others might simply blindly follow instructions, which means the scheduled publication would actually move the publication arrow backward in the stack, rather than forward.
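The difference between these policies can be shown with a small sketch. Versions are identified by their position in the stack, and the `apply_schedule` function and its `allow_backward` flag are hypothetical names for two of the behaviors described above.

```python
from datetime import datetime

def apply_schedule(published_index, schedule, now, allow_backward=False):
    """Work through due publications in time order and return the new arrow position.

    schedule is a list of (publish_at, version_index) pairs. With
    allow_backward=False, a scheduled publication that would move the arrow
    to an older version is ignored; with True, instructions are followed blindly.
    """
    for publish_at, version_index in sorted(schedule):
        if publish_at > now:
            continue  # not due yet
        moves_backward = published_index is not None and version_index < published_index
        if moves_backward and not allow_backward:
            continue
        published_index = version_index
    return published_index
```

Suppose version 3 is scheduled for Monday and the older version 2 for Tuesday. By Wednesday the stricter policy leaves version 3 published, while the blind policy has quietly stepped back to version 2.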

Thankfully, it almost never gets this complicated, but poor communication between editors can sometimes bring about complicated scheduling logic problems that can be tricky to sort out.

Changeset Publication

Oftentimes, editors are working on a content project that requires changes to multiple content objects separately. Editors would like these changes tracked as a group and scheduled, approved, and published together.

This is known by several names, but most commonly as a changeset (other common names are: “project,” “edition,” and “package”). A changeset is created with all related content bound to it. The changeset itself is scheduled, rather than the individual content versions. When the changeset reaches publication, all of the content objects are published simultaneously.
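In sketch form, a changeset is little more than a named group of content versions with a single publication time. The `Changeset` class and its methods are hypothetical, and the `published` dictionary stands in for wherever a CMS records which version of each object is live.

```python
from datetime import datetime

class Changeset:
    """A group of content versions that are scheduled and published together."""

    def __init__(self, name, publish_at):
        self.name = name
        self.publish_at = publish_at
        self.items = {}  # content_id -> version index to publish

    def add(self, content_id, version_index):
        self.items[content_id] = version_index

    def publish_if_due(self, now, published):
        """Publish every bound version at once, or nothing at all."""
        if now < self.publish_at:
            return False
        published.update(self.items)
        return True
```

Because the schedule belongs to the changeset rather than to the individual versions, there is no moment at which only half of a campaign is live.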

Content Expiration

Mercifully, content expiration is quite a bit simpler. At a given point in time, content is simply removed from publication. This means our imaginary arrow disappears from the stack completely, and no version of the content is considered to be published. This is not a deletion. The content still exists, it’s just not viewable.

The only caveat here is that the unattended removal of content from a site can cause some issues when an editor is not available to be notified. When attempting to delete content directly, for example, an editor might be notified that the content is linked to and from several other content objects and that deleting it will break these links. If content is expired unattended, links might break silently without warning.
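One way a system might soften this is to have the expiration job record the links it breaks, so someone can review them later. The sketch below assumes the CMS tracks inbound links; the function name, the data shapes, and the message format are hypothetical.

```python
from datetime import datetime

def expire_due_content(expirations, published, inbound_links, now):
    """Unpublish expired content and report links that now point at nothing.

    expirations:   content_id -> expiration time
    published:     set of currently published content ids (modified in place)
    inbound_links: content_id -> set of content ids that link to it
    """
    warnings = []
    for content_id, expires_at in expirations.items():
        if expires_at <= now and content_id in published:
            published.discard(content_id)  # not a deletion, just no longer viewable
            for source in sorted(inbound_links.get(content_id, ())):
                if source in published:
                    warnings.append(f"{source} links to expired content {content_id}")
    return warnings
```

The returned warnings could be written to a log or turned into tasks for an editor, which at least makes the breakage visible after the fact.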

Workflow and Approvals

Workflow is the logical movement of a content object through various steps or stages (though we’ll avoid the word “stage” from here on out so as not to confuse this with the content lifecycle discussed previously). Workflow is often conflated with approvals, and they overlap heavily, so we’ll discuss both.

Approvals

After an editor makes a change to content, he might be able to publish it directly, or he might have to submit the change for approval. Many systems separate Edit and Publish permissions. If an editor can edit but not publish, then he can make a change, but it can only be published by someone with that permission.

Conceptually, the editor needs a way to signify, “I am done working on this content, but cannot publish it directly. Someone who can publish it directly needs to review the content and publish it for me.”

Two questions need to be resolved:

For the former, the “owning user” (or group of users) can be specified to receive notices of changes or submissions. In other cases, any editor with permission to publish might be notified. In still other cases, specific workflows are created (see the next section) that identify the responsible party.

Notification is usually handled via simple email or through the CMS’s task management system, which might also generate an email.

Workflow

Generally, workflow is a larger, more abstract concept than simple approval. The approval of content can be a type of workflow, but many workflows have nothing to do with content approval.

Workflow is the movement of content through a map or network of discrete steps. A workflow step can be almost any process that takes some action before moving the content to another step. As a rule, content can only be in a single step at any time in a given workflow. A workflow step (sometimes called an “activity” or a “task”) is a clear boundary that defines a state the content is in at a moment in time.

Some examples:

In all cases, one or more steps will have no subsequent step, which ends the workflow. Any content currently in a step is in an active workflow, and when the content progresses past the last step, the workflow ends.

It’s important to differentiate between a workflow template and an actual running workflow. Nomenclature varies, but like content, workflows have types (templates) and actual instantiations of those templates currently operating on content.

For example, a news publishing organization might have a News Approval workflow that moves content from Submitted to Published. This is a template that defines how the workflow should operate. In a busy newsroom, articles might be submitted for publication every 5 minutes, so while there is one News Approval workflow template, there may be 20 – 30 instances of this workflow active at any given time, all moving individual content items through steps toward publication.
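The template/instance distinction can be sketched in a few lines. The code below is illustrative only: the class names are hypothetical, the intermediate “In Review” step is an assumption added to make the progression visible, and the workflow is treated as finished once content reaches the final step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkflowTemplate:
    name: str
    steps: tuple  # ordered step names; reaching the last one ends the workflow


class WorkflowInstance:
    def __init__(self, template: WorkflowTemplate, content_id: str):
        self.template = template
        self.content_id = content_id
        self._index = 0  # content is in exactly one step at a time

    @property
    def current_step(self) -> str:
        return self.template.steps[self._index]

    @property
    def is_finished(self) -> bool:
        return self._index == len(self.template.steps) - 1

    def advance(self) -> str:
        if self.is_finished:
            raise RuntimeError("workflow has already ended")
        self._index += 1
        return self.current_step


# One template, several running instances (one per article).
news_approval = WorkflowTemplate(
    "News Approval", ("Submitted", "In Review", "Published")
)
running = [WorkflowInstance(news_approval, f"article-{n}") for n in range(1, 4)]

running[0].advance()
running[0].advance()
steps = [instance.current_step for instance in running]
```

Listing `current_step` across `running` is essentially what a workflow reporting interface does: it shows where each piece of content currently sits.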

Many systems have reporting interfaces to view all the running instances of a particular workflow, including which step the content is currently in. In some cases, content can get “stuck” in a workflow step, which means it is waiting for an action that will never take place, for whatever reason. Content stuck in a workflow can usually be manually progressed, or have its workflow forcibly ended.

While not common for most organizations, content can even be in more than one workflow at a time. For example, a news article might be in the News Approval workflow, while at the same time it is in the Media Request workflow, awaiting photography.

What constitutes a workflow step can be vague, and it depends highly on what a particular system allows. In some cases, workflow steps are only human-based approvals (one system even calls workflows “approval chains”), while in other cases there are numerous prepackaged activities and actions that can happen, and many allow arbitrary code execution.

While editors tend to envision workflows as human-centered processes, some workflows have no human-powered steps at all, and are more accurately considered arbitrary processes that can be initiated and performed on content. For example:

In these cases, workflow is perhaps more accurately described as “work actions.” The initiator is, in effect, saying “execute this action on this content,” and there might not be multiple steps through which it progresses. Rather, there might be a single conceptual action that happens at a moment in time and then ends.

Clearly, workflow is a broad and vague concept that defies attempts at clear definition. What one system calls “workflow” might be simple approvals in another, or code-level events in a third. Additionally, the scope and functionality of a workflow event vary widely. Some systems allow workflow to be used for approvals only, others allow a broad definition of processes to be performed on content, and still others use their own internal workflow framework to manage content publication.

Collaboration

In multieditor scenarios, there’s often a need to specify a unit of work, or have a discussion or collaboration session, specifically related to a piece of content.

To address this, some systems have task management or lightweight groupware built in. The utility varies widely, but some common features include:

Clearly, this functionality overlaps heavily with non-CMS tools that editors might be using, such as Slack, Skype, Exchange, and even email. The specific difference is the ability for these discussions and tasks to be bound to and make changes to specific content, and for this information to be displayed in the CMS interface. Within the context of the CMS, these features are aware of the content and can be directed in relation to it.

As with workflow, though, it’s worth mentioning that these features are not often used. Collaboration tools inside a CMS are not the primary focus of the software, and their functionality won’t be able to compete with the dedicated collaboration tools your organization likely uses every day. Left to their own devices, editors will usually revert to things like email and group chat to work with other editors on content.

The key in evaluating the usefulness of a CMS collaboration system is determining what advantages it offers by being embedded in the CMS. Sometimes, that intimacy with content brings nice advantages. But in many cases, the advantages aren’t worth the disruption of yet one more collaboration environment.

Content File Management

Content files are the files (usually binary) that support the editorial process. These are images, PDF files, Word documents, or other downloads that are not structured, modeled content, but are delivered as fully intact files by the CMS.

In many systems, files are “second-class” content. You can manage them, but in a more rudimentary fashion than “first-class” content (modeled content types). In these instances, binary files are often missing the following features:

In a more pure and functional implementation, binary files are simply managed content types like any other, with one additional property – the content of the binary file itself. So, the file is “wrapped” in a full-fledged content object that allows modeling of additional information (copyright notice, image caption, etc.), workflow, permissions, and so on.

Adding Content Files

Until the last few years, browsers were never stellar at file uploads, and web CMSs were bound by these limitations. Uploading dozens of files was a tedious exercise, with editors having to manually transfer one file after another.

Simple file upload still works and is available, but better methods now exist for getting files into your CMS. These include:

Content Association

Files are different from other content in that they rarely exist in isolation. To get to a file, a user has to navigate to other content, and a file download is almost always represented by a link on an HTML page (which is likely represented by a content object).

Consider how you typically download files. Unless someone has emailed you a direct link, how often do you navigate directly to a file download without touching any other page on a website? Usually you access a download page, then click a download link.

Additionally, many content files serve solely in support of specific content. A photo gallery will have multiple images that it renders. These images might not be used anywhere else on the site, and serve no purpose other than in support of that single photo gallery.

This means that files are often associated with specific content – they are “attached” to that content and operate under the same management umbrella. In these cases, the content objects and the files that support them should be managed as a package.

For example:

Many systems will provide for this by having files that are specifically associated with another content object and only available for use by that object, while also allowing for global files that are available to all the content in the system.

Image Processing

There’s a difference between an image itself (the original) and a specific file representation of that image. An image of a sailboat might need to be converted into multiple files at different resolutions and file sizes for insertion in content at different locations.

Many systems will preserve the original uploaded image, but create additional renditions of it based on a set of configurable rules that allow for multiple styles of the image to be available to editors and template developers. For example, upon uploading an image, the CMS might resize it to three different sizes.

This manifests itself in two main ways:

In addition to automatic image manipulation, many systems provide some manual image editing capability – the unspoken goal being “Photoshop in the browser” – with varying degrees of effectiveness. Simple image editing, such as resizing and cropping, is common, but more in-depth transforms usually require images to be edited offline, and might require additional training for editors.
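For the automatic renditions described above, the underlying arithmetic is simple. The sketch below computes target dimensions for a set of named renditions while preserving aspect ratio. The rendition names and maximum widths are hypothetical, and it only calculates sizes; actual pixel resizing would be handled by an imaging library.

```python
def rendition_sizes(width, height, max_widths):
    """Scale (width, height) down to each maximum width, keeping aspect ratio."""
    sizes = {}
    for name, max_width in max_widths.items():
        if width <= max_width:
            # Never upscale: a small original is kept at its own size.
            sizes[name] = (width, height)
        else:
            sizes[name] = (max_width, round(height * max_width / width))
    return sizes


# Hypothetical configurable rules: rendition name -> maximum width in pixels.
RULES = {"thumbnail": 150, "medium": 600, "large": 1200}

sizes = rendition_sizes(3000, 2000, RULES)
# {'thumbnail': (150, 100), 'medium': (600, 400), 'large': (1200, 800)}
```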

Permissions

Content permissions are meant to prevent malicious manipulation of content, or (more likely) to protect editors from doing things they don’t intend to do. Preventing an editor from changing the home page is both good editorial policy and helpful for the editor, who might accidentally be making global changes without realizing it.

The concept of permissions in a CMS ties heavily into (and borrows liberally from) the permissions systems that have been in use on filesystems for years. Windows and Linux filesystems have had global permissions models since they were invented, and many of the concepts in the modern CMS are based around them.

For example, an “access control list” or ACL is a generic computing concept. The definition from Wikipedia:

An access control list (ACL), with respect to a computer filesystem, is a list of permissions attached to an object. An ACL specifies which users or system processes are granted access to objects, as well as what operations are allowed on given objects.

In the case of a CMS, an “object” is usually a content object, as opposed to a file on a hard disk. Many CMSs use both the concept and nomenclature of an ACL to control their own permissions.

A permission – or, technically, an access control entry (ACE), an ACL is a bundle of ACEs – is an intersection between three things:

In any situation involving permissions, we must ask ourselves: (1) who is trying to (2) do what action, (3) on what object? An ACE, bundled into an ACL, governs what is allowable.
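Those three questions map naturally onto a small data structure. The following is a minimal sketch, not any specific product’s API: an ACE is a (who, action, object) triple, and an ACL is a collection of them that can be asked whether a given combination is allowed. All names and identifiers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessControlEntry:
    principal: str  # (1) who: a user or group name
    action: str     # (2) what action: e.g. "edit" or "publish"
    object_id: str  # (3) on what object


class AccessControlList:
    def __init__(self, entries=()):
        self._entries = set(entries)

    def allows(self, principal: str, action: str, object_id: str) -> bool:
        # Allowed only if a matching entry exists; everything else is refused.
        return AccessControlEntry(principal, action, object_id) in self._entries


acl = AccessControlList([AccessControlEntry("mary", "edit", "press-release-42")])

can_edit = acl.allows("mary", "edit", "press-release-42")
can_publish = acl.allows("mary", "publish", "press-release-42")
```

Here `can_edit` is true and `can_publish` is false, because no entry grants Mary the publish action on that object.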

Users

First, we must identify the user context in which an action will take place. For this, we must take into account roles and permissions. For example:

Users can be identified directly (Mary and Fred in the preceding examples), or by group. A group or role is an aggregation of users. Users are assigned to a group, or are considered “in” that group. If a permission is granted to a group, then any user who is a member of that group gets that permission.

Thus, we can adjust our previous examples as follows:

In most cases, this will make more sense.

In general, permissions should always be assigned by group, even when just a single user is in that group. Permissions are usually related to the role someone is performing, rather than to that user as a specific person.

For example, if posts to the CEO’s blog have to be approved by Jessica the CEO, is this because she’s…well, Jessica? No, clearly it’s because she’s the CEO, and if she’s ever not the CEO, then she should lose this permission. The permission belongs to the role of CEO, not the person fulfilling that role.

In this situation, it would be entirely appropriate to create a group called “CEO,” put Jessica in it as the only member, and assign the permission to the group. When Tilly deposes Jessica in a coup and assumes control of the company, we simply remove Jessica from the CEO group and add Tilly to the group, and Tilly assumes all of Jessica’s powers. [Insert maniacal laugh here.]
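The handover can be sketched as follows. This is an illustrative model only, with hypothetical class, method, and permission names: permissions are granted to the group, so changing who holds the permission is purely a membership change.

```python
class GroupDirectory:
    def __init__(self):
        self._members = {}    # group name -> set of user names
        self._grants = set()  # (group name, action) pairs

    def add_member(self, group: str, user: str) -> None:
        self._members.setdefault(group, set()).add(user)

    def remove_member(self, group: str, user: str) -> None:
        self._members.get(group, set()).discard(user)

    def grant(self, group: str, action: str) -> None:
        # Permissions attach to the group, never to an individual user.
        self._grants.add((group, action))

    def can(self, user: str, action: str) -> bool:
        return any(
            user in self._members.get(group, set())
            for group, granted in self._grants
            if granted == action
        )


directory = GroupDirectory()
directory.grant("CEO", "approve_ceo_blog_post")
directory.add_member("CEO", "jessica")
before = directory.can("jessica", "approve_ceo_blog_post")

# The coup: only group membership changes; the grant is untouched.
directory.remove_member("CEO", "jessica")
directory.add_member("CEO", "tilly")
jessica_after = directory.can("jessica", "approve_ceo_blog_post")
tilly_after = directory.can("tilly", "approve_ceo_blog_post")
```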

Group management can get complex. In some cases, groups can contain other groups. So, the Corporate Counsel group could contain a subgroup called Really Important Lawyers. Being in the Really Important Lawyers group would allow all the rights and roles of the larger Corporate Counsel group, plus perhaps some additional rights.

Additionally, some systems have a differentiation between groups and roles. Groups identify users as members, while roles indicate what they do.

For example, you may have an Editor group, in which you place all your editors, and then have multiple roles for News Article Editor, Media Editor, etc. Permissions are assigned to the roles, which are then assigned to the groups. Users are aggregated into groups, permissions are aggregated into roles, and then roles and groups meet to allow actions to take place.

Yes, this can get confusing. There’s actually an entire discipline and body of theory called identity management. Again, from Wikipedia:

In computing, identity management (IdM) describes the management of individual principals, their authentication, authorization, and privileges within or across system and enterprise boundaries.

In most situations, however, groups and roles are simply conflated. Even in situations where they’re separated, the benefit in most cases is merely hypothetical and semantic. There are no doubt scenarios where the differentiation is important, but it’s not common. Most systems will have a method of aggregating users and assigning permissions to those aggregations, whether they are called “groups” or “roles” or something else.

Objects

In an abstract sense, an “object” is anything in a system that may need to be acted upon. This includes:

Different systems have different granularity in assigning and managing the permissions to act upon different objects. It’s quite possible that a CMS will have a complex group/role ACL structure in place for everything. In many other cases, ACL-style permissions are reserved for content, and permissions for managing other items in the system – like templates or users – are simply binary: designated people can do it, and other people cannot.

In most cases, permissions apply to content. These permissions are granted or denied on specific objects, but rarely are they directly assigned to those objects. Permissions are usually inferred from either the type of content or its location in the larger content geography. For example:

Occasions exist when a specific content object has different permissions from other content of the same type or in the same location, but this is rare. Specifying those objects as such would become unmanageable over the long term. Therefore, managing permissions in aggregate becomes the only reasonable method.

In many cases, permissions are inherited from some other object – often the parent object or folder. Changing the permission of an object will also change the permissions of all its descendant objects, unless that descendant has been specifically declared to “break” this inheritance and manage its own permissions (at which point it might be the target for the inheritance of its child objects). This is appropriate as permissions are often based on location, and this effectively cascades permissions down branches of a tree.

This is an example of CMS permission models often mimicking filesystems. In Windows, for example, a new file inherits the permissions of its containing folder, unless this connection is specifically broken. Many CMSs use this same logical model.
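A minimal sketch of this inheritance model, assuming a simple content tree in which each node either carries its own permissions or defers to its nearest ancestor that does. The `Node` class and the group names used in the example are hypothetical.

```python
class Node:
    def __init__(self, name, parent=None, permissions=None):
        self.name = name
        self.parent = parent
        # None means "inherit from the parent"; a dict means inheritance
        # is broken here and this node manages its own permissions.
        self.own_permissions = permissions

    def effective_permissions(self):
        # Walk up the tree until a node with its own permissions is found.
        node = self
        while node is not None:
            if node.own_permissions is not None:
                return node.own_permissions
            node = node.parent
        return {}


root = Node("site", permissions={"Editors": {"edit"}})
news = Node("news", parent=root)
article = Node("article", parent=news)
legal = Node("legal", parent=root,
             permissions={"Corporate Counsel": {"edit", "publish"}})
contract = Node("contract", parent=legal)

article_before = article.effective_permissions()

# Changing the root cascades to descendants that still inherit...
root.own_permissions = {"Editors": {"edit", "publish"}}
article_after = article.effective_permissions()
# ...but not to branches that broke inheritance.
contract_after = contract.effective_permissions()
```

Note that `contract` inherits from `legal`, not from `root`: once a node breaks inheritance, it becomes the source for its own children.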

Actions

Once we know the user and the object to be acted on, we need to allow or deny specific actions. While a user could have so-called “full control” of an object (a phrase borrowed from the Windows permissions model), there’s a greater chance that what the editor can do is limited.

Some of the more common permissions in relation to a content object are:

Different systems have different granularity around what permissions can be assigned to an object. Some just have binary access – either you can create/edit/publish content, or you can’t. Others have extremely fine-grained control over specific actions.

Some actions might presuppose other actions. It could be that the right to publish content also confers the right to create and edit it, though situations could be conceived where this doesn’t apply. Likewise, an editor might be given the right to delete content but do nothing else to it, though envisioning a realistic usage scenario for this is harder.

Some systems are also extensible, allowing developers to create their own permissions to govern customizations and extensions they write to a CMS. The image below shows one of the permissions interfaces in the Sitecore CMS, which offers fine-grained control.

One of several permissions interfaces in Sitecore – permissions can be allowed or denied to both content and administrative objects, and permissions “cascade” down the tree to child objects, unless overridden

Permission conflict resolution

Permissions can get complex, especially when different rules come into conflict. In these cases, each system will have some defined method of resolution.

For example, some systems simplify by only offering Allow permissions, but others have explicit Deny permissions as well, which often take precedence over Allow. Additionally, inheritance rules can come into play. Does an inherited permission take precedence over an explicit permission? Usually not; however, as Deny often takes precedence over Allow, an inherited Deny might overrule an explicit Allow.
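To make the two competing policies concrete, here is a small sketch of a resolver. It is one possible set of rules, not how any particular CMS behaves: with `deny_always_wins` set, any Deny (inherited or explicit) blocks the action; without it, explicit rules are consulted first and inherited rules only matter when no explicit rule exists. The `Rule` type and the flag name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    effect: str      # "allow" or "deny"
    inherited: bool  # True if it came from an ancestor object


def resolve(rules, deny_always_wins=True) -> bool:
    """Return True if the action is permitted under the given rules."""
    if not rules:
        return False  # nothing granted, so nothing allowed
    if deny_always_wins and any(r.effect == "deny" for r in rules):
        return False
    # Otherwise explicit rules take precedence over inherited ones.
    explicit = [r for r in rules if not r.inherited]
    candidates = explicit or list(rules)
    # Within the same level, Deny still beats Allow.
    if any(r.effect == "deny" for r in candidates):
        return False
    return any(r.effect == "allow" for r in candidates)


rules = [Rule("deny", inherited=True), Rule("allow", inherited=False)]

strict = resolve(rules)                           # inherited Deny overrules
lenient = resolve(rules, deny_always_wins=False)  # explicit Allow wins
```

The same pair of rules yields opposite answers under the two policies, which is why it pays to know exactly which one your system implements.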

It all depends on the system, and how that system implements its security model. That said, the simpler you can keep your permissions model, the better. A more distributed editorial base requires more complicated permissions, which can lead to some complex models, and occasionally extended debugging when an editor can’t do what he should be able to do.

A Summary of Editorial Tools

We’ve covered a lot of ground here. Here are some questions you might want to keep in mind. More than in any other chapter, the warning applies here that these checklists are simplistic and crude tools for analysis. Editorial tools run the gamut of functionality and polish. The room for interpretation is wide, and wide-eyed editors sick of their current CMSs can be easily seduced by glamorous features they might never use.

Content Traversal and Navigation

Type Selection

Content Preview

The Editing Interface

Versioning, Version Control, Scheduling, and Expiration

Workflow and Approvals

Content File Management

Permissions

Footnote #1

It actually wasn’t that simple. They had to individually turn thousands of phone book pages over each other. That part is boring, but the rest of the video is incredibly entertaining.

Footnote #2

On LinkedIn, a group of content managers attempted to define “archiving.” The range of responses was considerable, and they were collected in “Perspectives On What ‘Archiving’ Means in Content Management.”

Footnote #3

Yes, clearly, this would be a failure in error checking. But when working with a CMS, developers will often trust it to enforce certain standards and forgo exhaustive error checking for the sake of practicality.

Footnote #4

“WYSIWTFFTWOMG!”, September 3, 2013.

Footnote #5

I wrote this book in O’Reilly’s Atlas editing platform. I wrote in a text format called AsciiDoc, and could “build” into multiple formats at any given time. I obsessed over the formatting of the print (PDF) version of the book, showing a clear bias toward what it would look like when printed. Only later in the writing process did I start looking at the EPUB, Mobi, and HTML output options. Often, formatting that worked in one was problematic in another.

Footnote #6

Mastering Regular Expressions by Jeffrey E.F. Friedl (O’Reilly), currently in its third edition, is the seminal and authoritative text on regexes.

Footnote #7

As mentioned earlier, I wrote this entire book in a Markdown-like markup language called AsciiDoc.

Footnote #8

I once had a client who was concerned about this exact scenario. While the logical problem could not be solved short of simply not allowing expiration on objects that were the target of links, we did create a scheduled job that emailed the webmaster every night if it found content that (1) was the target of one or more links, and (2) was expiring in the next 72 hours. This at least gave the client some notice so they could resolve the situation gracefully rather than have links break.

Footnote #9

While it would be convenient to call them “workflow types” to parallel “content types,” it seems to be an industry convention to call them “workflow templates.”

Footnote #10

If you have an interest in workflow as a general process, the Workflow Patterns website is a project by two universities “to provide a conceptual basis for process technology.” If nothing else, the site will demonstrate that workflow is a discipline that originated and is practiced far beyond the bounds of content management.

Footnote #11

Many systems refer to content files as “binary files,” even though they’re not technically required to be binary. There’s nothing stopping an editor from uploading a text file to the CMS, for example.
