Editorial Tools and Workflow
The hosts of the TV show MythBusters did an experiment once where they interleaved the pages of two phone books. In effect, they set two phone books together, then pushed them into each other so that their pages alternated, and every page of one phone book was lying between two pages of the other.
Then they tried to pull the two phone books apart.
They tried pulling with a dozen people, then they dangled a person from one of them, then they lifted a car off the ground, then they tried to use power equipment in the shop, then they tried two cars moving in opposite directions. Nothing could pull the two books apart until they got two World War II-era armored vehicles. The phone books finally came apart under 8,000 lbs of force.
Do not underestimate friction. It can sneak up on you and bring everything to a grinding halt.
Your CMS necessarily introduces some degree of editorial friction. To do their jobs, your editors will have to interact with the CMS, use the tools it offers, and suffer without the tools it doesn’t. The CMS can either enable them to efficiently breeze through their work, or introduce friction through poor usability, needless repetition, error-prone interfaces, and poor conceptual models.
The capabilities of the CMS that editors use to perform the editorial process are collectively known as editorial tools or editorial workflow (literally meaning “flow of work,” rather than workflow as a specific CMS concept, which we’ll discuss further later in this chapter).
This is really the “management” of content management systems. These are the tools that increase editors’ ability to create better content and gain more control over the content under their care. This is the side of the CMS that editors are going to use, day in and day out.
This is a critical area of functionality, because poor tools and workflow can cripple editors and destroy morale. Sadly, editorial usability is one area of CMS development that gets skipped over too often. As we’ve discussed, CMSs are created by developers, but they’re often also created for developers, first and only. A developer understands things differently than the average content editor, and when designing editorial interfaces and tools, developers will often take leaps and liberties that make sense to them, but not necessarily to people with other perspectives.
With commercial systems and larger open source systems, these usability shortcomings tend to be corrected over time due to market pressures and large editorial usage. However, in smaller open source systems that don’t have to collect a license fee and might not have a large editor community, editorial usability problems can persist for years without correction.
While editorial friction directly impedes editor productivity in the short term, the more damaging aspect is the chronic drag it has on morale in the long term. Many an editorial team has grown increasingly frustrated and resentful over time with a poorly architected or implemented CMS. More than once, I’ve encountered teams that were fraying at the edges and losing staff because they were tired of the extra workload imposed on them by the system they were forced to use.
Solid, well-implemented editorial tools enhance the editorial process. Poor or nonexistent tools will destroy it over time. At an absolute minimum, a CMS needs to stay out of the way and not impose any friction beyond what’s absolutely necessary.
The Content Lifecycle
From the moment it’s conceived to the moment it’s irrevocably deleted, content goes through multiple stages of “life.” The stage where it’s actually published on a website and can be consumed by a visitor is just one among many (and might sometimes be quite short – a news release announcing an event might be created then deleted a week later).
These stages are collectively called the “lifecycle” of content. There is no universally accepted definition of the exact stages and their order, but I’ll try to present a definition here that encompasses many of the commonly accepted stages.
The content lifecycle can be described as having the following stages:
- Create: Content is initiated in the CMS. It is not complete, but exists as a managed content object. It is not visible to the content consumer.
- Edit and Collaborate: Content is actively edited and/or collaborated on by one or more editors. Content is still not visible.
- Submit and Approve: Content is created and edited, and has been submitted for one or more approvals. Content is still not visible.
- Publish: Content has been approved and is published on the website. Content in this state is visible to the content consumer.
- Archive: Content is removed from public access, but not deleted. It is usually no longer visible.
- Delete: Content is irreversibly deleted from the CMS.
Some of these stages are iterative and may apply simultaneously to different versions of the same content.
For example, a piece of content may be published for some time, then need to be changed. At this time (and depending on the CMS), a new version is created as a draft (Edit and Collaborate), is submitted for approval (Submit and Approve), and then is finally Published, which causes the previous version to Archive. There are now two versions of this content, in different stages of their lifecycles – one is archived, the other is published.
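To make the stages concrete, here is a minimal Python sketch that treats the lifecycle as a table of allowed transitions. The transition table and the function names are assumptions of mine for illustration; every CMS defines its own stages and the moves between them.

```python
from enum import Enum


class Stage(Enum):
    CREATE = "Create"
    EDIT = "Edit and Collaborate"
    SUBMIT = "Submit and Approve"
    PUBLISH = "Publish"
    ARCHIVE = "Archive"
    DELETE = "Delete"


# Assumed transition table; a real CMS may allow more or fewer moves.
ALLOWED = {
    Stage.CREATE: {Stage.EDIT, Stage.DELETE},
    Stage.EDIT: {Stage.SUBMIT, Stage.DELETE},
    Stage.SUBMIT: {Stage.EDIT, Stage.PUBLISH},  # rejected or approved
    Stage.PUBLISH: {Stage.ARCHIVE},
    Stage.ARCHIVE: {Stage.DELETE},
    Stage.DELETE: set(),
}


def is_visible(stage):
    """Only published content is visible to the content consumer."""
    return stage is Stage.PUBLISH


def advance(current, target):
    """Move a version to a new stage, refusing moves the table does not allow."""
    if target not in ALLOWED[current]:
        raise ValueError(f"Cannot move from {current.value} to {target.value}")
    return target


def publish(versions):
    """Publish the newest version (last in the list) and archive the one it replaces."""
    older = [advance(s, Stage.ARCHIVE) if s is Stage.PUBLISH else s
             for s in versions[:-1]]
    return older + [advance(versions[-1], Stage.PUBLISH)]
```

Calling `publish([Stage.PUBLISH, Stage.SUBMIT])` mirrors the example above: the older version moves to Archive while the newly approved one becomes the published version.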
This is not the only way the content lifecycle can be described, and the language used depends highly on the perspective and professional role of the observer. Marketers, for instance, would tend to describe content in terms of “creating, distributing, and analyzing,” without getting into the nitty-gritty of editing, approval, and archiving that a content manager is concerned with.
The Archive stage is particularly nebulous, with very few practitioners completely agreeing on its definition. For some, to archive content just means to make it not visible to the end consumer, without deleting it. For others, it means moving it “somewhere else” in the CMS, out of the way of the non-archived content, but perhaps still leaving it accessible to visitors via a different method. For others, it may mean moving it to different storage – even into offline archival storage media.
Regardless of the particular stages of the lifecycle, a good CMS provides functionality across the entire scope of a content object’s existence in your website.
The Editing Interface
The first job of an editing environment is to be usable and to provide content editors with a competent and functional interface in which to create and edit content. If a CMS fails at this, it’s tough to recover. Editors who hate working with content in their CMS will be hard pressed to create anything of value over the long term.
Content Findability and Traversal
To edit content, an editor first has to find it. In some websites, this is simple – if a website has 20 pages, it’s not hard to locate the right one. However, when a website has thousands and thousands of pages, it becomes more difficult. How do you keep track of them all?
Traditionally, websites offered dedicated management interfaces designed to be used by editors solely to browse the content in the repository. Content would be listed in a simple table, with search tools to assist in finding it.
As more and more CMSs embraced the content tree geography, management interfaces moved into a collapsible tree structure, where editors would traverse down through parent and child relationships to identify content.
Today, these interfaces are increasingly giving way to in-context management, where editors simply browse their websites like content consumers do. When the editors are authenticated to the CMS, however, they have editing tools available, ranging from a simple “Edit This Page” link to more complex pop-up/overlay interfaces that allow them to enter an editing mode.
This style of finding content can be difficult for decoupled systems. When the publishing/delivery server is separate from the management/repository server, it often has no capability to authenticate someone as an editor, and therefore has no way to show these users editing tools on the page. Many systems get around this by generating a proxied version of the site – editors browse in a management interface that shows the website in an IFRAME or proxies the entire website to another server, to which the editor is authenticated.
This method of content traversal has become common because as website usability has increased, the tools available for content consumers to find content have become more similar to the tools editors use. Why build a set of editor tools when the CMS has already provided the end user with an array of sorting and search functionality? As we come up with better and better search technologies and interfaces for our visitors, we’re becoming hard pressed to improve on these for editors. And it would be tough to explain if your editors had to use tools that were actually worse than what site visitors used.
However, there are still content filtering tools that can be considered editor-specific. These are tools that allow you to filter content:
- By workflow or publication status
- By administrative task status
- By content owner
- By archival or deletion status
- By custom editorial metadata (for example, locating content tagged with “needs review”)
In all these situations, the criterion for locating content is not something that would be visible to the content consumer, and therefore a CMS might have special editorial tools with which to handle these situations.
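As a rough illustration, an editor-only filter might look something like the Python sketch below. The `ContentItem` fields and the `find_content` function are hypothetical names of my own, not the API of any particular CMS.

```python
from dataclasses import dataclass, field


@dataclass
class ContentItem:
    title: str
    status: str   # e.g. "draft", "awaiting approval", "published", "archived"
    owner: str
    tags: set = field(default_factory=set)  # editorial metadata, never shown to visitors


def find_content(items, status=None, owner=None, tag=None):
    """Return the items that match every criterion that was supplied."""
    results = []
    for item in items:
        if status is not None and item.status != status:
            continue
        if owner is not None and item.owner != owner:
            continue
        if tag is not None and tag not in item.tags:
            continue
        results.append(item)
    return results


items = [
    ContentItem("Spring Open House", "published", "alice", {"needs review"}),
    ContentItem("Tuition Changes", "draft", "bob"),
    ContentItem("Old Press Release", "archived", "alice"),
]
needs_review = find_content(items, tag="needs review")
alices_archive = find_content(items, status="archived", owner="alice")
```

None of these criteria would make sense in a visitor-facing search box, which is why they tend to live in the administrative interface.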
Given that editors will always have some special needs beyond those of content consumers, most CMSs pursue a hybrid approach. There is a dedicated administrative interface, supplemented by in-context editing tools of varying depth.
Type Selection
When creating a new content object, the first task of an editor is usually to select the content type on which to base it (see Content Modeling). The CMS needs to ensure the editor can select the appropriate content type for the situation, and not a content type that doesn’t work quite right, or will break the website should it be published.
In many cases, the integrity of a content tree is enforced through restrictions on type parentage. We may have an Issue type, which contains one or more Article Category types, which each contain one or more Article objects. The hierarchy is logical and necessary to the proper modeling and rendering of content.
If an editor was allowed to create content based on the Issue type as a child of an Article, what would happen? The code in the template wouldn’t be expecting content of that particular type in that location. In the best case, if the developer were checking the type of everything, the template would simply ignore it. However, in many situations, the templating would break – the developer would probably assume that the CMS would enforce the type hierarchy, and would thus trust that an Issue would never be found as a child of an Article.
In these situations, the CMS needs to be able to dictate the correct relationship between types. Editors create content in specific locations in the content geography, and the CMS should be able to dictate what content they can create at any particular location. Content types therefore need to indicate what types can be created as children of a specific type.
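A minimal sketch of that rule in Python, using the Issue, Article Category, and Article hierarchy from above. The `ALLOWED_CHILDREN` table and the function names are illustrative assumptions; real systems usually store this as part of the content type definition.

```python
# Which types may be created as children of each type.
ALLOWED_CHILDREN = {
    "Issue": {"Article Category"},
    "Article Category": {"Article"},
    "Article": set(),  # articles have no children in this model
}


def can_create(child_type, parent_type):
    """True if the content model allows child_type beneath parent_type."""
    return child_type in ALLOWED_CHILDREN.get(parent_type, set())


def create_child(parent_type, child_type):
    """Refuse to create content that would violate the type hierarchy."""
    if not can_create(child_type, parent_type):
        raise ValueError(f"{child_type} cannot be created under {parent_type}")
    return {"type": child_type, "parent_type": parent_type}
```

Because the check runs when content is created, template code can trust that an Issue never appears beneath an Article.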
Additionally, some content types might be restricted from certain editors. It’s common that certain types are more volatile and advanced than others. As we discussed in The Content Management Team, not all editors are created equal. A power editor might be allowed greater liberties than others, and the content types to which these editors have access should reflect this.
In some installations, for example, a power editor might have access to a content type that allows raw HTML inclusion – the editor might be able to type raw HTML/JavaScript code that is included into the page, unchanged. Clearly, this can be dangerous – HTML could be introduced that prematurely closed tags, or JavaScript could be added that opened the site to cross-site scripting attacks.
Other items, like image carousels, might require more care and experience than the training of the mainstream editor allows. Editors might need to understand how to select and edit images, address accessibility concerns of image inclusion, and deal with other usability issues.
In these cases, these content types should be restricted to editors trained to use them correctly.
Finally, in some installations, the list of available content types can become quite large, occasionally numbering in the dozens. Several different features can make type selection easier for editors:
- Intelligent naming can help with content type selection. Types often have a “system name” for reference from code and a “display name,” which can be expanded to make more sense for editors.
- Some CMSs will allow a sentence or two of description about what the type does to further explain its usage.
- Others will allow a thumbnail image of an example of a fully rendered type that can jog an editor’s memory as to its correct usage.
- Some CMSs will even learn over time, and present editors with their most commonly used types, or types most commonly used in similar situations – as children of the parent type.
The type selection interface in Episerver
In all cases, the goal is to provide editors with a list of content types that are allowed and appropriate to the current content problem. A CMS should be able to alter this list based on the location of the desired content object and the role of the particular editor.
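The sketch below shows one way such a list could be assembled in Python, filtering by both location and editor role. The type names, role names, and the `selectable_types` function are hypothetical, chosen only to match the examples in this section.

```python
from dataclasses import dataclass

# Assumed role ordering: a higher rank includes everything a lower rank may use.
ROLE_RANK = {"editor": 0, "power editor": 1}


@dataclass(frozen=True)
class ContentType:
    system_name: str            # referenced from code
    display_name: str           # shown to editors
    description: str            # a sentence of guidance for the selection screen
    allowed_parents: frozenset  # system names of types this may sit beneath
    minimum_role: str = "editor"


TYPES = [
    ContentType("article", "Article", "A standard news or feature article.",
                frozenset({"article_category"})),
    ContentType("image_carousel", "Image Carousel",
                "A rotating set of images; each needs alternative text.",
                frozenset({"article_category", "home"}), "power editor"),
    ContentType("raw_html", "Raw HTML Block",
                "Markup included in the page unchanged.",
                frozenset({"home"}), "power editor"),
]


def selectable_types(parent_type, editor_role):
    """List the types this editor may create beneath the given parent type."""
    rank = ROLE_RANK[editor_role]
    return [t for t in TYPES
            if parent_type in t.allowed_parents
            and rank >= ROLE_RANK[t.minimum_role]]
```

A mainstream editor working under an article category would see only Article, while a power editor in the same location would also be offered the Image Carousel.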
Content Preview
While we’ll discuss the actual editing interface shortly, it’s important to consider the relationship between editing content and previewing it. When editors make changes to content, they usually want to see those changes before they publish their content to the world. Hence, editors need preview functionality to enable them to see these changes “in context,” which means in the visual environment to which they’re about to be published.
There are two schools of thought about how content preview should relate to the editing interface:
Presentation-free editing was the default standard for many years. In this case, the editor works in a standard web form, with text boxes, text areas, dropdown lists, etc. To preview content, the editor navigates to a separate interface, then back again to continue editing.
In-context editing, by contrast, seeks to make the editing interface look as close to the final content production as possible. An editor might find himself editing in something that looks very much like the finished page, complete with styling. When he types the title, for instance, the letters come out in 24 pt Arial Bold, just as they will when the page is published. The goal is to try to disguise that this is an editing interface at all, and make it seem like the editor is simply working in the finished page. Preview becomes effectively real-time.
While in-context editing seems advantageous on its face – who wouldn’t want to see live, real-time previews? – there are a couple of extenuating issues:
In-context editing doesn’t handle nonvisual properties, like META tag content, or configuration content, such as a checkbox for “Show Sidebar,” for instance. If the content is meant to alter page behavior rather than be consumed as content itself, it’s harder to work into an in-context interface.
In-context editing will often only represent a single view of the content – that of a single web page. In today’s multichannel world, content might be published in many places around the Internet and consumed by many different devices, so the question becomes, what preview are you viewing?
Mark Boulton considered this very issue in a blog post:
The problem is this: The question content people ask when finishing adding content to a CMS is “how does this look?”. And this is not a question a CMS can answer any more – even with a preview. How we use the web today has meant that the answer to that questions is, “in what?”
More modern CMSs have multipreview features where an editor can pick a view to preview the content – as a web page, a tweet, or an RSS item, for example. However, this preview functionality is not common, and it generally requires additional setup, development, and templating to provide accurate views for all possible content channels.
Understand that multichannel preview is not just a technical issue. Left to their own devices, editors will be biased toward the main, intended output format, which will likely be HTML. Don’t underestimate the workflow, training, and governance challenges involved with mandating multichannel preview before publication.
The problem of personalization and preview
One of the new frontiers in content management over the last half-decade has been personalization (discussed more in Other Features), or the use of behaviors and contextual information to personalize the web experience for each visitor.
Unfortunately, this complicates preview even further. When previewing content, how can you account for all the possible permutations and combinations of factors that might affect that content?
For example, if a visitor has viewed three pages in the section of a university’s website about the nursing program, then a stock image of a nursing student should be displayed on the Admissions page, rather than a more generic image.
How can you preview this? Will you have to manually view three pages yourself in order to mock up the behavior your CMS requires? Or does your CMS have tools to allow you to “spoof” your membership in a personalization group?
Combine this with device and distribution channels, and the possible outcomes can be endless. For one web page, an editor could conceivably create hundreds of different combinations of visitor behavior, consuming device, and distribution channel, all with their own specific previews.
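The nursing example, and the idea of spoofing group membership, can be sketched in Python as follows. The image filenames, the spoofed visitor profiles, and the lists of devices and channels are all invented for illustration.

```python
from itertools import product

NURSING_THRESHOLD = 3  # pages viewed in the nursing section, per the example


def admissions_image(pages_viewed):
    """Pick the Admissions page image from the sections a visitor has viewed."""
    if pages_viewed.count("nursing") >= NURSING_THRESHOLD:
        return "nursing-student.jpg"
    return "generic-campus.jpg"


# Spoofed visitors let an editor preview each outcome without browsing the site.
SPOOFED_VISITORS = {
    "new visitor": [],
    "prospective nursing student": ["nursing", "nursing", "nursing"],
}


def preview_as(group):
    """Render the personalization decision as if the editor belonged to a group."""
    return admissions_image(SPOOFED_VISITORS[group])


# Every extra dimension multiplies the number of possible previews.
groups = list(SPOOFED_VISITORS)
devices = ["desktop", "tablet", "phone"]
channels = ["web page", "RSS item", "tweet"]
combinations = list(product(groups, devices, channels))
```

Even this toy setup, with two groups, three devices, and three channels, already yields 18 distinct previews for a single page.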
Editing Interface Elements
The editorial rubber meets the road in the interface. There comes a point where an editor actually types or clicks something and creates some content. Generally speaking, a CMS should present editors with the correct editing element for the information they’re trying to edit. A good editing interface guides editors into the right decisions and protects them from doing damage.
Imagine if the editing interface was simply a list of empty text boxes for all attributes. For text, this might be appropriate, but what about for a yes/no value? Should the editor just type “yes” or “no”? How about “true” or “false”? Does capitalization matter? Clearly in this instance the appropriate interface element is a checkbox, which is checked for “yes” and unchecked for “no.”
A CMS should render the editing interface to conform to the content model, making intelligent assumptions when selecting the correct element to present to editors (and allowing administrative overrides where needed). The goal is to present a highly productive working environment avoiding unnecessary error or guesswork.
In addition to the aforementioned checkbox, here are some other element choices:
- A simple text box or text area for short or long text entry
- A checkbox list for an attribute that can support multiple values from a predefined list
- A radio button list or drop-down selector for an attribute that allows only one value from a predefined list
- A rich-text (WYSIWYG) editor for editing HTML
- A calendar drop-down for selecting a date
- A Google Maps interface for selecting a geographic point on a map
- A custom interface providing a search tool to locate an SKU from your product catalog
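One way to picture this is a function that inspects an attribute's definition in the content model and chooses an element, as in the Python sketch below. The dictionary keys (`datatype`, `options`, `multiple`, `long`, `element`) are assumptions of mine rather than the schema of any real system.

```python
def pick_element(attribute):
    """Choose an editing element from an attribute definition in the content model."""
    if "element" in attribute:
        return attribute["element"]  # administrative override wins

    datatype = attribute["datatype"]
    if datatype == "boolean":
        return "checkbox"
    if "options" in attribute:
        # A predefined list: many values or exactly one.
        return "checkbox list" if attribute.get("multiple") else "drop-down selector"
    if datatype == "date":
        return "calendar drop-down"
    if datatype == "html":
        return "rich text editor"
    if datatype == "text" and attribute.get("long"):
        return "text area"
    return "text box"  # safe default for anything else


show_sidebar = {"name": "Show Sidebar", "datatype": "boolean"}
topics = {"name": "Topics", "datatype": "text",
          "options": ["News", "Events"], "multiple": True}
location = {"name": "Location", "datatype": "text", "element": "map picker"}
```

The override on the last attribute is the kind of escape hatch that lets an administrator swap in a custom interface, such as a map or a product search tool.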
Validation
In addition to accepting input, a content editing interface must ensure the input is valid to prevent errors from compromising the content. Validation can be guided by the use of editing interface elements, as discussed in the previous section; however, the CMS should always validate data independently of the interface in the case of data being entered through an API or service (and therefore not being subject to the restrictions of the editing interface).
Understand that the validation of content is related to its logical value, not necessarily its pure datatype. Datatypes do not understand context; they only understand pure data, completely separate from how the data will be used.
As we discussed in Content Modeling, if an attribute represents a year, then the underlying datatype might be number (or integer). However, the logical idea of a year presupposes several other restrictions:
- It must have four digits.
- It might have to be in the past.
- It might have to fit into a logical range (while 4538 AD might make for a good science fiction novel, it does not work in the context of when a movie was released).
In this case, the datatype of number is wholly insufficient to enforce the necessary restrictions around the logical value type of year. Additional validation will have to take place.
Some systems offer expanded validation types for these instances. For instance, in the case of a number, a range might be allowed to ensure the number is valid. The same could be true of dates, to ensure an entered date is in the future, in the past, or between two landmark dates.
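A minimal Python sketch of validating the logical value "release year" rather than the bare datatype. The function name, the default lower bound of 1900, and the error messages are assumptions for illustration. Because the function accepts raw input, it applies equally to data arriving from the editing interface or from an API.

```python
from datetime import date


def validate_release_year(value, earliest=1900, today=None):
    """Return a list of error messages; an empty list means the value is valid."""
    current_year = (today or date.today()).year
    try:
        year = int(str(value).strip())
    except ValueError:
        return ["Year must be a whole number."]

    errors = []
    if not 1000 <= year <= 9999:
        errors.append("Year must have four digits.")
    if year > current_year:
        errors.append("Year must not be in the future.")
    if year < earliest:
        errors.append(f"Year must be {earliest} or later.")
    return errors
```

The `today` parameter exists only so the check can be exercised against a fixed date; in normal use it is left out and the current date applies.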
Regular expressions (“regexes”) can be used in many cases to validate text patterns. While a discussion of regex is far beyond the scope of this book, a couple of simple patterns are enough to show how it can validate entered text.
For example, in the case of our movie release date, we can define a regular expression to enforce:
(19|20)\d{2}
This pattern, when it must match the entered text in its entirety, will ensure that the text consists of “19” or “20” followed by two additional digits. This would limit data to years between 1900 and 2099.
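How a pattern is applied matters as much as the pattern itself. The Python sketch below uses the standard `re` module and requires the whole input to match; the function name is my own. A plain search for the same pattern would accept any text that merely contains a valid year.

```python
import re

YEAR_PATTERN = re.compile(r"(19|20)\d{2}")


def is_valid_year(text):
    """True only if the entire text is a year from 1900 through 2099."""
    # fullmatch() requires the whole string to match the pattern.
    return YEAR_PATTERN.fullmatch(text) is not None


# search() finds the pattern anywhere, so it lets "19999" through.
loose_result = YEAR_PATTERN.search("19999") is not None
strict_result = is_valid_year("19999")
```

Some systems anchor validation patterns automatically and others do not, so it is worth testing a few invalid values before trusting a pattern.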
If we know that our product numbers begin with “A,” followed by two other alphabetic characters, then a dash and four digits, we can write a pattern like this:
A[A-Z]{2}-\d{4}
Invariably, some validation needs simply can’t be predefined by a CMS and must be implemented by custom code. In this example, we can certainly enforce the format of a product number, but we can’t ensure that this product exists in our catalog.
To validate that fact, we will need to write custom code to connect to our product database, check for the entered product number, then tell the interface whether to accept the entered data or display an error to the editor. Different CMSs will offer different levels of functionality in this regard.
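A sketch of that two-step check in Python. The `CATALOG` set stands in for a real product database, and its contents, the function name, and the messages are invented for the example.

```python
import re

PRODUCT_PATTERN = re.compile(r"A[A-Z]{2}-\d{4}")

# Stand-in for the product database; a real check would query it.
CATALOG = {"ABC-1234", "AXY-0001"}


def validate_product_number(value, catalog=CATALOG):
    """Return an error message for the editor, or None if the value is acceptable."""
    # Step 1: the format check a CMS could handle with a configured pattern.
    if PRODUCT_PATTERN.fullmatch(value) is None:
        return "Product numbers look like ABC-1234."
    # Step 2: the existence check that needs custom code.
    if value not in catalog:
        return f"Product {value} was not found in the catalog."
    return None
```

Running the cheap format check first avoids a trip to the database for input that could never be valid.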
Rich text editing
Most CMSs include a rich text interface to allow editors to create HTML content as an attribute of a content object.
For example, almost all implementations will have a content type for Text Page or Standard Page. This content type can be as simple as a Title and a Body, which will often be rich text. Inside the editing interface for the Body, the editor will have buttons for formatting items like bold text, italics, bulleted and numbered lists, image insertion, hyperlink creation, etc. “Just like Microsoft Word,” is a common phrase used to describe these editors.
Usage of rich text can be divisive. Editors enjoy the control, but developers and designers can get nervous about the freedom it allows. Editors have been known to use formatting liberally, and often in defiance of style guides and conventions. Additionally, if they have access to the HTML source, they can manually edit the HTML, which might cause rendering problems with the template in which the content is displayed (worse, a nefarious editor can write HTML that compromises the security of the site itself).
Ideally, a CMS should be very careful about the formatting and access to source it allows. Some common protective features include:
- The buttons displayed in the formatting toolbar should be centrally controlled and contextual to both editor (certain editors get more options than others) and attribute (more options are available when editing the Body than when editing an Author Note, for instance).
- Regardless of editor or context, formatting tools should be heavily scrubbed of anything that might compromise the style of the site, including font selectors, text color palettes, font size controls, etc.
- Access to HTML source should be carefully controlled. Invalid HTML can be introduced through direct HTML editing, in addition to malicious JavaScript opening the site up to cross-site scripting attacks.
- HTML validation should be enabled and strict. When rich text content is saved, it should be checked and corrected for invalid HTML.
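To give a feel for the last two points, here is a small allowlist-based scrubber built on Python's standard `html.parser` module. The allowed tags, the URL prefixes, and the `scrub` function are assumptions for illustration. It is a teaching sketch only; a production site should rely on a well-tested sanitizing library rather than something like this.

```python
from html import escape
from html.parser import HTMLParser

ALLOWED_TAGS = {"p", "strong", "em", "ul", "ol", "li", "a", "br"}
VOID_TAGS = {"br"}
ALLOWED_ATTRS = {"a": {"href"}}
SAFE_URL_PREFIXES = ("http://", "https://", "mailto:", "/", "#")
DROP_WITH_CONTENT = {"script", "style"}


class Scrubber(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.open_tags = []
        self.skip_depth = 0  # greater than zero while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # unknown tags are dropped, their text is kept
        kept = []
        for name, value in attrs:
            if name not in ALLOWED_ATTRS.get(tag, set()) or value is None:
                continue
            if name == "href" and not value.lower().startswith(SAFE_URL_PREFIXES):
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in self.open_tags:
            return
        # Close anything left open inside this tag, then the tag itself.
        while self.open_tags:
            open_tag = self.open_tags.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def scrub(html_text):
    """Return html_text with disallowed markup removed and open tags closed."""
    scrubber = Scrubber()
    scrubber.feed(html_text)
    scrubber.close()
    while scrubber.open_tags:
        scrubber.out.append(f"</{scrubber.open_tags.pop()}>")
    return "".join(scrubber.out)
```

Run on save, a scrubber like this strips font tags and script blocks and closes tags an editor left open, so the stored content is always something the template can render safely.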
There is a recent trend to avoid rich text altogether, and instead attempt to “structure away” the problem by breaking content down into attributes small enough to not need rich text at all. While this might make developers happy, it’s probably not entirely realistic. Most editors will always want formatting tools.
Alternately, some implementations are moving toward very lightweight markup languages rather than HTML. These languages can be edited inside simple text area elements and use character combinations that convert to HTML later, in the page rendering stage. The most common example is Markdown, which looks like this:
This text is in _italics_ and this text is **bold**.
This is [a link](http://oreilly.com/).
Other examples of alternate markup languages are Textile, Pandoc, and WikiText.
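To show what "convert to HTML later" means, the toy Python function below handles just three Markdown constructs: emphasis, strong emphasis, and inline links. It is an illustrative sketch of my own, far from a complete or safe Markdown implementation, and a real site would use an established Markdown library at render time.

```python
import re
from html import escape


def tiny_markdown(line):
    """Convert a single line using a tiny subset of Markdown into HTML."""
    text = escape(line)  # neutralize any raw HTML typed by the editor
    # [label](url) becomes a hyperlink
    text = re.sub(r"\[([^\]]+)\]\(([^)\s]+)\)", r'<a href="\2">\1</a>', text)
    # **text** becomes strong emphasis
    text = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)
    # *text* or _text_ becomes emphasis
    text = re.sub(r"(?<!\w)([*_])(.+?)\1(?!\w)", r"<em>\2</em>", text)
    return text


html_line = tiny_markdown("Read the _style guide_ before you **publish**.")
```

The editor stores only the plain text with its lightweight markup; the HTML exists only at rendering time, which keeps arbitrary tags out of the repository.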
Editing in Ghost using the Markdown syntax in the lefthand pane with real-time preview in the righthand pane
Reference content selection
In most implementations, content will need to be linked together, in one or more ways:
- The rich text in one content object might contain a hyperlink to another content object.
- A content object might use an image stored elsewhere in the repository.
- A content object might have an attribute that references another content object.
In all these cases, an editor will need to find the remote content object from the editing interface. Methods of doing this vary, but commonly the editor will be presented with a pop-up window that offers multiple methods to find the content – editors will usually be able to browse for it, and might be able to search for it. This becomes more and more important as the number of content objects scales up. Trying to browse for a specific article among thousands can be frustratingly difficult.
What becomes critical is the ways in which this interface can be restricted. For example:
- An attribute reference might only be allowed to a specific content type. The Manager attribute of an Employee content object should only be linked to another Employee content object.
- An attribute reference might be restricted to a specific location in the geography. Perhaps the editor can only select children of the current issue for the Featured Article.
Additionally, a subtle but critical point is whether the reference to an object is attached to the object itself, or to the current URL of the object. The latter is always going to be problematic. If a CMS requires an editor to simply find the URL of another page and paste it into the hyperlink box, what happens if the URL of that second content object changes?
URLs can change, so links between content should be resolved as late as possible in the content delivery cycle. Any inserted link should just be a reference to the content, not its actual, current URL. The reference should then be replaced with the correct and current URL to that content when the content is rendered.
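A minimal sketch of late link resolution in Python. The `{{link:ID}}` token syntax, the `render` function, and the fallback of `#` for missing content are hypothetical choices of mine; each CMS has its own internal reference format.

```python
import re

# Stored rich text holds a reference to content ID 42, not a URL.
BODY = '<p>Read the <a href="{{link:42}}">launch announcement</a>.</p>'

REFERENCE = re.compile(r"\{\{link:(\d+)\}\}")


def render(body, url_for):
    """Replace each content reference with that content's current URL."""
    def resolve(match):
        content_id = int(match.group(1))
        # Assumed fallback: a dead reference renders as a harmless "#".
        return url_for.get(content_id, "#")
    return REFERENCE.sub(resolve, body)


urls = {42: "/news/launch/"}
before_move = render(BODY, urls)

urls[42] = "/archive/2015/launch/"  # the target content is moved
after_move = render(BODY, urls)
```

The stored body never changes when the target moves; only the lookup table does, so every page that links to the content picks up the new URL the next time it renders.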
In-context help and documentation
You can’t merely assume that editors will always understand all the nuances of the content model. Content changes can have subtle implications that it may be hard for them to keep track of after the training session. This is especially true of seldom-used properties and features.
Systems vary in their ability to provide editors with help in the editing interface itself. At the very least, properties should be labeled clearly. The ability to provide a few sentences of documentation sometimes makes all the difference.
For example, when presented with a summary field, these few sentences might be invaluable:
Content entered in the summary will be used along with the title when this content is referenced from other locations in the website. If left blank (not recommended), the first paragraph from the body will be used for the summary.
If there’s one thing an editor hates, it’s not knowing what to do and getting stuck. Worse still is doing something and having it cause unintended side effects, or even an error. In-context documentation vastly reduces uncertainty along with the ensuing questions and frustration.
Versioning, Version Control, and Version Labels
Versioning is the act of not overwriting content with changes, but instead saving content in a new version of an existing content object. This means that content objects that have been edited have multiple versions – indeed, they might have hundreds of versions – each representing how that content object looked at a certain moment in time.
Editors can use versioning in several ways:
- As a safeguard against improper or malicious changes. Versioning is like real-time backup for that single content object.
- As an audit trail to ensure they always have a record of what content was presented to the public at any given time, perhaps for legal and compliance reasons.
- To separate changes to content from the currently published content, so that changes can be approved and scheduled independently of the content currently shown to the public.
- To enable one version to be compared to another to determine what has changed, which is helpful for approvals (discussed later in this chapter).
Some systems make versioning automatic, while others require it to be specified by content type. Some systems just version content, while others version everything – content, users, media, and settings.
At any given time, an editor should be able to review a timeline of content changes for an object and see who changed the content and when. Some systems take this a step further by allowing editors to compare versions, either in side-by-side windows or sometimes in a “Track Changes” style where additions, deletions, and edits are shown inline.
Conceptually, versions become a “stack,” stretching back in time. The initial version of content is on the bottom of the stack, with new versions stacked on top of it. Versions are usually labeled with a status, with one of the versions being considered the “published” version.
You might envision an arrow pointing to one version on the stack, which is the published version. This is hypothetical, and the actual implementation of the concept might vary, but it’s a handy metaphor to envision the relationship between the version stack and the various states of content within it.
To change which version is published is to “roll back” to a previous version. Different systems handle this in different ways – some will simply move the “published” arrow to a different version, while others will copy the desired older version and make it a new draft version at the top of the stack (thus ensuring that the published version of the content is always the latest version).
Changes to content are considered a new unpublished version – with a label of Draft or Awaiting Publication – sitting on top of the stack. When a new version is published, the publication arrow is simply moved. In some systems you can edit a prior version, while in others you cannot; any change to any version becomes a new version at the top of the stack.
Logically, only one version of content can be published at any one time.
The version stack is conceptually a pile of versions, from latest to oldest, sometimes with a designator showing which one is published (other implementations might just consider the top version published)
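The stack-and-arrow metaphor translates fairly directly into code. The Python sketch below models both rollback styles described above; the class and method names are my own, and real systems store considerably more per version (author, timestamp, status label).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VersionStack:
    versions: list = field(default_factory=list)  # oldest first; the last item is the top
    published: Optional[int] = None                # the "arrow": index of the published version

    def save(self, body):
        """Any change becomes a new unpublished version on top of the stack."""
        self.versions.append(body)
        return len(self.versions) - 1

    def publish(self, index):
        """Move the arrow. Only one version is published at any one time."""
        if not 0 <= index < len(self.versions):
            raise IndexError("No such version")
        self.published = index

    def roll_back_by_pointer(self, index):
        """Style 1: simply point the arrow at an older version."""
        self.publish(index)

    def roll_back_by_copy(self, index):
        """Style 2: copy the older version to the top, then publish the copy."""
        new_index = self.save(self.versions[index])
        self.publish(new_index)
        return new_index

    def published_body(self):
        return None if self.published is None else self.versions[self.published]
```

With the copy style, the published version is always the top of the stack and the history shows the rollback as an event of its own; with the pointer style, the stack is untouched and only the arrow moves.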
Some systems will also allow mass rollback, which will allow an editor to essentially step back in time and view the entire site as if all content was rolled back to its version at that moment.
Almost all versioning in web content management is “serial,” meaning versions to a content object are simply stacked in order of date. Some more advanced document management systems, however, offer branching, where content can be split into multiple branches – version 1 can have version 1.1 and 1.2, which is further split into 1.2.1, etc. This gets very confusing for most editors, and is only required in highly controlled document scenarios (this is common in the management of technical documentation, for example).
Even if never used, versioning is a handy feature to have lurking in the background. The only reason against using versioning might be storage, since multiple versions will obviously consume more disk space.
Versioning is designed to keep content safe, so the ability to delete or purge versions is usually not available by default. A limit can often be set and a scheduled job will delete excess versions beyond that limit (in the event that there are storage limitations), or the permission to delete individual versions can be granted on an exception basis to specific editors.
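A scheduled purge job of this kind might boil down to something like the following sketch. The function name and the rule that the published version is never purged are assumptions; a real system may apply different retention rules.

```python
from typing import List, Set


def versions_to_purge(numbers: List[int], published: int, keep: int) -> Set[int]:
    """Return version numbers beyond the newest `keep`, never the published one."""
    newest_first = sorted(numbers, reverse=True)
    excess = set(newest_first[keep:])
    excess.discard(published)  # assumed safeguard: the live version always survives
    return excess


# Six versions, version 2 is published, and the limit is three versions.
purge = versions_to_purge([1, 2, 3, 4, 5, 6], published=2, keep=3)
```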
When considering versioning, remember that without it, the power to edit is essentially the power to delete. An innocent mistake by an editor who is too quick with the save button can be disastrous.
Version control
Version control is about the management of versions of content between multiple editors working simultaneously. If two editors want to work on the same piece of content, how does this get managed? There are a few options:
- Does the system create a version for each, calling them both “Draft”? This can be a bit frustrating, since the two editors might not realize someone else is working on the same content. When they go to publish, they will realize that they have to reconcile changes between two versions for the content to be correct.
- Does the system allow an editor to “lock” a content object to prevent anyone else from working on it? This is certainly effective, but it essentially blocks the workflow of other editors, and if used haphazardly can cause a piece of content to be locked for longer than necessary, perhaps even while a careless editor goes on vacation.
- Does the system allow two editors to work simultaneously on the same version, à la Google Docs?
At the very least, a CMS should have some indication that a content object is being edited, or that there’s a draft version of this content object above the published version on the version stack. The editor should be notified of this, and given the option of working with the existing draft version (which might be locked), or creating a new draft (which might require change reconciliation with the existing draft at some point in the future).
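The locking option can be sketched in a few lines. This is an illustration only: the `LockTable` class and the eight-hour timeout are hypothetical, and the automatic expiry is an assumed safeguard against the vacationing-editor problem rather than something every system offers.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Optional

LOCK_TIMEOUT = timedelta(hours=8)  # hypothetical policy value


@dataclass
class Lock:
    editor: str
    acquired: datetime


class LockTable:
    def __init__(self) -> None:
        self._locks: Dict[str, Lock] = {}

    def holder(self, content_id: str, now: datetime) -> Optional[str]:
        """Return the editor holding a live lock, or None."""
        lock = self._locks.get(content_id)
        if lock is None or now - lock.acquired >= LOCK_TIMEOUT:
            return None
        return lock.editor

    def acquire(self, content_id: str, editor: str, now: datetime) -> bool:
        """Take the lock unless another editor holds a live one."""
        current = self.holder(content_id, now)
        if current is not None and current != editor:
            return False
        self._locks[content_id] = Lock(editor, now)
        return True


locks = LockTable()
monday = datetime(2024, 1, 1, 9, 0)
first = locks.acquire("privacy-policy", "fred", monday)
blocked = locks.acquire("privacy-policy", "mary", monday + timedelta(hours=1))
after_expiry = locks.acquire("privacy-policy", "mary", monday + timedelta(days=1))
```

Here Mary is blocked an hour after Fred takes the lock, but succeeds the next day because Fred’s lock has lapsed. The `holder` method is also what an interface would call to show the “this content is being edited” indicator.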
Dependency Management
Two content objects can be related in a number of ways:
- The HTML of an attribute might contain a hyperlink to another content object.
- An attribute reference might link to another content object.
- A content object might be embedded in the rich text of another content object.
- The HTML of an attribute might contain embedded media that is a separate content object.
Many CMSs will keep track of these links and their purpose (HTML link, attribute reference, media embed, etc.) so it’s possible to know which content depends on which other content. This enables some helpful functionality:
- A predeletion check can inform the editor that the content he’s about to delete is referenced from other content.
- Broken link reports can identify content that links to a target that is no longer available, either due to forced deletion, content expiration, or permission changes.
- An orphaned content report could identify content not in use by any other content.
- Dependencies can be used to determine the cascading scope of changes. When content is published, for example, the system can know what other content needs to be republished, in the case of a decoupled CMS, or reindexed for search.
- Search optimization can use dependencies when weighting result pages, assuming that popular content is referenced more often and should be more heavily weighted. Other information architecture functionality might use the link graph to extract information from content relationships.
A deletion warning in Sitecore – the content pending deletion is depended on by other content, and the system needs to know how the editor wants to handle the broken dependencies
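Several of the reports listed above fall out of a single table of tracked links. The sketch below assumes the CMS records each dependency as a (source, target, purpose) triple; the content IDs and function names are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (source, target, purpose): the source depends on the target
links: List[Tuple[str, str, str]] = [
    ("home", "about", "html_link"),
    ("home", "logo.png", "media_embed"),
    ("about", "team", "attribute_reference"),
    ("press", "old-release", "html_link"),  # target has since been removed
]
existing: Set[str] = {"home", "about", "team", "logo.png", "press", "faq"}

# Index the links by target so "who points at this?" is a single lookup.
inbound: Dict[str, Set[str]] = defaultdict(set)
for source, target, _purpose in links:
    inbound[target].add(source)


def predeletion_check(content_id: str) -> List[str]:
    """Content that would be left with a broken reference."""
    return sorted(inbound.get(content_id, set()))


def broken_links() -> List[Tuple[str, str]]:
    """Links whose target is no longer available."""
    return [(s, t) for s, t, _ in links if t not in existing]


def orphans(roots: Set[str]) -> List[str]:
    """Content that nothing else references, excluding known entry points."""
    return sorted(c for c in existing if not inbound.get(c) and c not in roots)
```

The same `inbound` index could drive cascading republication: when a target is published, everything in its inbound set is a candidate for republishing or reindexing.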
Content Scheduling and Expiration
Often, an editor doesn’t want content to be published immediately. Rather, content should be scheduled for publication in the future.
This is intertwined with versioning, because what an editor is essentially doing is scheduling the change in version labels. The content she’s scheduling is considered to be Draft, with the version label to be changed to Published in the future. (Remember our conceptual “publication arrow”? All the editor is doing is scheduling a time where it moves to a different version in the stack.)
Publication scheduling has two basic forms:
- Scheduling of new content: The content object isn’t displayed anywhere on the site until a point in time at which it appears.
- Scheduling of a new version: The content object is displayed in its current form until a point in time, when a new version takes its place.
This can be slightly complex in some cases when editors begin working on a new version of content before the latest version of content has been published. In these cases, you have the published version several levels back, then one or more versions awaiting publication, then one or more versions ahead of those in draft, which might then get scheduled.
And what happens if an editor elects to publish a new version directly? Or schedules it to publish before one of the versions behind it in the version stack? Some systems might not allow this, others might negate the scheduled publication of anything behind it in the stack, and others might simply blindly follow instructions, which means the scheduled publication would actually move the publication arrow backward in the stack, rather than forward.
Thankfully, it almost never gets this complicated, but poor communication between editors can sometimes bring about complicated scheduling logic problems that can be tricky to sort out.
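The “blindly follow instructions” behavior is easy to see in a small sketch. The schedule below is hypothetical, as are the function names. The first function replays scheduled publications in time order, and the second flags entries that would move the publication arrow backward, which is the kind of check a more defensive system might run.

```python
from datetime import datetime
from typing import List, Optional, Tuple

# (publish_at, version_number): a hypothetical schedule for one content object
schedule: List[Tuple[datetime, int]] = [
    (datetime(2024, 3, 1, 9, 0), 4),  # the newer version goes out first
    (datetime(2024, 3, 5, 9, 0), 3),  # an older version is scheduled later
]


def published_at(moment: datetime, initially: Optional[int]) -> Optional[int]:
    """Replay due entries in time order, blindly moving the arrow."""
    arrow = initially
    for publish_at, version in sorted(schedule):
        if publish_at <= moment:
            arrow = version
    return arrow


def backward_moves(initially: Optional[int]) -> List[Tuple[datetime, int]]:
    """Entries that would move the arrow to an older version."""
    flagged = []
    arrow = initially
    for publish_at, version in sorted(schedule):
        if arrow is not None and version < arrow:
            flagged.append((publish_at, version))
        arrow = version
    return flagged
```

With version 2 initially published, the arrow moves to version 4 on March 1 and then back to version 3 on March 5. Passing `None` as the initial version models brand-new content, which is not displayed at all until its first scheduled time.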
Changeset Publication
Oftentimes, editors are working on a content project that requires changes to multiple content objects separately. Editors would like these changes tracked as a group and scheduled, approved, and published together.
This is known by several names, but most commonly as a changeset (other common names are: “project,” “edition,” and “package”). A changeset is created with all related content bound to it. The changeset itself is scheduled, rather than the individual content versions. When the changeset reaches publication, all of the content objects are published simultaneously.
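As a rough sketch, a changeset is little more than a named bundle of (content, version) pairs with a single publication time. The `Changeset` class and the `published` lookup below are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, Optional

# content ID -> currently published version number (None = not published)
published: Dict[str, Optional[int]] = {"home": 3, "pricing": 7, "promo": None}


@dataclass
class Changeset:
    name: str
    publish_at: datetime
    members: Dict[str, int] = field(default_factory=dict)  # content ID -> version

    def add(self, content_id: str, version: int) -> None:
        """Bind a specific version of a content object to this changeset."""
        self.members[content_id] = version

    def publish_if_due(self, now: datetime) -> bool:
        """Publish every bound version together, or nothing at all."""
        if now < self.publish_at:
            return False
        published.update(self.members)
        return True


spring = Changeset("Spring sale", datetime(2024, 3, 20, 8, 0))
spring.add("home", 4)
spring.add("promo", 1)
too_early = spring.publish_if_due(datetime(2024, 3, 19, 8, 0))
on_time = spring.publish_if_due(datetime(2024, 3, 20, 8, 0))
```

The schedule lives on the changeset, not on the individual versions, so the home page and the promo page go live in the same step while the untouched pricing page stays where it was.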
Content Expiration
Mercifully, content expiration is quite a bit simpler. At a given point in time, content is simply removed from publication. This means our imaginary arrow disappears from the stack completely, and no version of the content is considered to be published. This is not a deletion. The content still exists, it’s just not viewable.
The only caveat here is that the unattended removal of content from a site can cause some issues when an editor is not available to be notified. When attempting to delete content directly, for example, an editor might be notified that the content is linked to and from several other content objects and that deleting it will break these links. If content is expired unattended, links might break silently without warning.
Workflow and Approvals
Workflow is the logical movement of a content object through various steps or stages (though we’ll avoid using the word “stage” from here on out so as not to confuse this with the content lifecycle discussed previously). Workflow is often conflated with approvals, and they overlap heavily, so we’ll discuss both.
Approvals
After an editor makes a change to content, he might be able to publish it directly, or he might have to submit the change for approval. Many systems separate Edit and Publish permissions. If an editor can edit but not publish, then he can make a change, but it can only be published by someone with that permission.
Conceptually, the editor needs a way to signify, “I am done working on this content, but cannot publish it directly. Someone who can publish it directly needs to review the content and publish it for me.”
Two questions need to be resolved:
- Who gets notified?
- How are they notified?
For the former, the “owning user” (or group of users) can be specified to receive notices of changes or submissions. In other cases, any editor with permission to publish might be notified. In still other cases, specific workflows are created (see the next section) that identify the responsible party.
Notification is usually handled via simple email or through the CMS’s task management system, which might also generate an email.
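The three ways of answering “who gets notified?” can be combined into a simple lookup. The order of precedence used here (workflow assignee first, then owners, then anyone who can publish) is an assumption for the sake of the sketch, as are the names and data.

```python
from typing import Dict, List, Optional, Set

publishers: Set[str] = {"mary", "tilly"}                     # users with Publish permission
owners: Dict[str, List[str]] = {"privacy-policy": ["mary"]}  # "owning" users per content object


def approvers_for(content_id: str, workflow_assignee: Optional[str] = None) -> List[str]:
    """Decide who should be told that content is awaiting approval."""
    if workflow_assignee is not None:
        return [workflow_assignee]       # a workflow named the responsible party
    if content_id in owners:
        return list(owners[content_id])  # fall back to the owning user(s)
    return sorted(publishers)            # otherwise, anyone who can publish
```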
Workflow
Generally, workflow is a larger, more abstract concept than simple approval. The approval of content can be a type of workflow, but many workflows have nothing to do with content approval.
Workflow is the movement of content through a map or network of discrete steps. A workflow step can be almost any process that takes some action before moving the content to another step. As a rule, content can only be in a single step at any time in a given workflow. A workflow step (sometimes called an “activity” or a “task”) is a clear boundary that defines a state the content is in at a moment in time.
Some examples:
- Content is waiting in a step for an editor to approve its movement to the next step, which publishes it. It might wait in this step for three minutes or three days, depending on how long it takes for the editor to take the action necessary for it to continue.
- Content is moved to a step that triggers the execution of code to post a summary of the content to Twitter. When this is complete, the content is automatically moved to the next step.
- Content is waiting in a step for a translator to complete a Spanish translation of the content. When this is done, the translator will signify this completion and move the content to the next step, which creates a task for a reviewer.
In all cases, one or more steps will have no subsequent step, which ends the workflow. Any content currently in a step is in an active workflow, and when the content progresses past the last step, the workflow ends.
It’s important to differentiate between a workflow template and an actual running workflow. Nomenclature varies, but like content, workflows have types (templates) and actual instantiations of those templates currently operating on content.
For example, a news publishing organization might have a News Approval workflow that moves content from Submitted to Published. This is a template that defines how the workflow should operate. In a busy newsroom, articles might be submitted for publication every 5 minutes, so while there is one News Approval workflow template, there may be 20 – 30 instances of this workflow active at any given time, all moving individual content items through steps toward publication.
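The template/instance split looks something like the following sketch. The class names are hypothetical, and the intermediate “In Review” step is an assumption added so there is something between Submitted and Published.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WorkflowTemplate:
    name: str
    steps: List[str]  # ordered; the last step has no subsequent step

    def next_step(self, step: str) -> Optional[str]:
        index = self.steps.index(step)
        return self.steps[index + 1] if index + 1 < len(self.steps) else None


@dataclass
class WorkflowInstance:
    template: WorkflowTemplate
    content_id: str
    step: Optional[str]  # None once the workflow has ended

    def advance(self) -> None:
        """Move to the next step; past the last step, the workflow ends."""
        if self.step is not None:
            self.step = self.template.next_step(self.step)


# One template...
news_approval = WorkflowTemplate("News Approval", ["Submitted", "In Review", "Published"])
# ...and several running instances, one per article.
running = [WorkflowInstance(news_approval, f"article-{n}", "Submitted") for n in range(3)]
running[0].advance()

# The kind of data a reporting interface would show: content ID -> current step.
report = {instance.content_id: instance.step for instance in running}
```

Each instance holds exactly one current step, which matches the rule that content can only be in a single step at any time in a given workflow.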
Many systems have reporting interfaces to view all the running instances of a particular workflow, including which step the content is currently in. In some cases, content can get “stuck” in a workflow step, which means it is waiting for an action that will never take place, for whatever reason. Content stuck in a workflow can usually be manually progressed, or have its workflow forcibly ended.
While not common for most organizations, content can even be in more than one workflow at a time. For example, a news article might be in the News Approval workflow, while at the same time it is in the Media Request workflow, awaiting photography.
What constitutes a workflow step can be vague, and it depends highly on what a particular system allows. In some cases, workflow steps are only human-based approvals (one system even calls workflows “approval chains”), while in other cases there are numerous prepackaged activities and actions that can happen, and many allow arbitrary code execution.
While editors tend to envision workflows as human-centered processes, some workflows have no human-powered steps at all, and are more accurately considered arbitrary processes that can be initiated and performed on content. For example:
- A Post to Twitter workflow might have one step that takes mere seconds to execute, then the workflow ends.
- An Export Content workflow could serialize content to a file, write it to the filesystem, and notify an external process that the file is available to be moved to offline storage.
In these cases, workflow is perhaps more accurately described as “work actions.” The initiator is, in effect, saying “execute this action on this content,” and there might not be multiple steps through which it progresses. Rather, there might be a single conceptual action that happens at a moment in time and then ends.
Clearly, workflow is a broad and vague concept that defies attempts at clear definition.
Collaboration
In multieditor scenarios, there’s often a need to specify a unit of work, or have a discussion or collaboration session, specifically related to a piece of content.
To address this, some systems have task management or lightweight groupware built in. The utility varies widely, but some common features include:
- The ability to create and assign a task, specifically bound to a content object or changeset. An editor might create a task entitled “Update the Privacy Policy,” then attach that content object to the task, and assign it to another editor. This often dovetails into workflow, as the act of creating the task might have created a workflow. Alternately, a workflow might use the task subsystem heavily when notifying editors of pending approvals. In some cases, the tasks attached to a content object can be viewable in the administrative interface from that object.
- The ability to leave notes for other editors regarding specific content; provide notes on specific versions explaining what was changed; or have threaded, multiuser discussions about content.
- The ability to store editorial metadata (in the event that you want this data separate from the actual content model).
- The ability to have real-time group chats within the CMS interface.
Clearly, this functionality overlaps heavily with non-CMS tools that editors might be using, such as Slack, Skype, Exchange, and even email. The specific difference is the ability for these discussions and tasks to be bound to and make changes to specific content, and for this information to be displayed in the CMS interface. Within the context of the CMS, these features are aware of the content and can be directed in relation to it.
As with workflow, though, it’s worth mentioning that these features are not often used. Collaboration tools inside a CMS are not the primary focus of the software, and their functionality won’t be able to compete with the dedicated collaboration tools your organization likely uses every day. Left to their own devices, editors will usually revert to things like email and group chat to work with other editors on content.
The key in evaluating the usefulness of a CMS collaboration system is determining what advantages it offers by being embedded in the CMS. Sometimes, that intimacy with content brings nice advantages. But in many cases, the advantages aren’t worth the disruption of yet one more collaboration environment.
Content File Management
Content files are the files (usually binary
In many systems, files are “second-class” content. You can manage them, but in a more rudimentary fashion than “first-class” content (modeled content types). In these instances, binary files are often missing the following features:
- Granular permissions
- Workflow
- Language translation
- Metadata, or additional modeled data
- Personalization
In a more pure and functional implementation, binary files are simply managed content types like any other, with one additional property – the content of the binary file itself. So, the file is “wrapped” in a full-fledged content object that allows modeling of additional information (copyright notice, image caption, etc.), workflow, permissions, and so on.
Adding Content Files
Until the last few years, browsers were never stellar at file uploads, and web CMSs were bound by these limitations. Uploading dozens of files was a tedious exercise, with editors having to manually transfer one file after another.
Simple file upload still works and is available, but better methods now exist for getting files into your CMS. These include:
- Drag and drop: Many systems will allow editors to simply drag one or more files into the browser window and onto a designated location in the interface. All files will then upload simultaneously.
- Pseudo filesystem access: Some systems support protocols allowing for the repository to be accessed like a filesystem. Users might be able to “map a drive” to the CMS, or access the system via FTP or WebDAV clients. Additionally, when the repository is available natively to the filesystem, it’s much easier for automated processes to upload content files – a scheduled script might copy files into the system every night, for instance.
- External repository integrations: Many systems have “connectors” that expose other repositories to the CMS, such as DAM systems, ECM systems, or even remote services like Amazon S3 or Dropbox. Editors working with content might be able to insert images directly from a SharePoint library, for instance, without having to upload it first.
Content Association
Files are different from other content in that they rarely exist in isolation. To get to a file, a user has to navigate to other content, and a file download is almost always represented by a link on an HTML page (which is likely represented by a content object).
Consider how you typically download files. Unless someone has emailed you a direct link, how often do you navigate directly to a file download without touching any other page on a website? Usually you access a download page, then click a download link.
Additionally, many content files serve solely in support of specific content. A photo gallery will have multiple images that it renders. These images might not be used anywhere else on the site, and serve no purpose other than in support of that single photo gallery.
This means that files are often associated with specific content – they are “attached” to that content and operate under the same management umbrella. In these cases, the content objects and the files that support them should be managed as a package.
For example:
- A file associated with a page of content might need to mirror the permissions of that page. If the page is only available to logged-in users, the file should have that same limitation. If the permissions of the page change, the file permissions should change as well.
- When selecting files to link or insert, editors should have the option to isolate that selection to files associated with that page, rather than wading through all the files in the system.
- When a page of content is deleted or archived, any associated files should suffer the same fate. The lack of this feature inevitably results in a massive archive of old content files, the vast majority of which are not in use by the CMS any longer.
Many systems will provide for this by having files that are specifically associated with another content object and only available for use by that object, while also allowing for global files that are available to all the content in the system.
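Managing a page and its files “as a package” might look like this in miniature. The `Page` class, the file names, and the rule that global files are always downloadable are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Page:
    page_id: str
    allowed_groups: Set[str]
    attachments: List[str] = field(default_factory=list)  # files owned by this page
    archived: bool = False


global_files: Set[str] = {"logo.png"}  # available to all content
archived_files: Set[str] = set()


def can_download(page: Page, filename: str, user_groups: Set[str]) -> bool:
    """An attached file mirrors the permissions of its page."""
    if filename in global_files:
        return True
    return filename in page.attachments and bool(page.allowed_groups & user_groups)


def selectable_files(page: Page, only_attached: bool) -> List[str]:
    """Let editors narrow the file picker to this page's own files."""
    files = set(page.attachments)
    if not only_attached:
        files |= global_files
    return sorted(files)


def archive(page: Page) -> None:
    """Archiving a page takes its attached files with it."""
    page.archived = True
    archived_files.update(page.attachments)


report = Page("annual-report", {"members"}, ["report.pdf"])
anonymous_ok = can_download(report, "report.pdf", {"anonymous"})
member_ok = can_download(report, "report.pdf", {"members"})
archive(report)
```

Because the file’s access check reads the page’s groups directly, changing the page’s permissions changes the file’s permissions with no extra bookkeeping.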
Image Processing
There’s a difference between an image itself (the original) and a specific file representation of that image. An image of a sailboat might need to be converted into multiple files at different resolutions and file sizes for insertion in content at different locations.
Many systems will preserve the original uploaded image, but create additional renditions of it based on a set of configurable rules that allow for multiple styles of the image to be available to editors and template developers. For example, upon uploading an image, the CMS might resize it to three different sizes.
This manifests itself in two main ways:
- When an image is delivered, the templating system might have constructs for the selection of different renditions, or even the detection of the container and automatic insertion of the correct size.
- Editors might be able to select from different sizes and renditions when inserting images into rich text content.
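The arithmetic behind a rendition rule set is straightforward. The sketch below only computes target dimensions (it does not touch image data), and the rendition names and widths are hypothetical configuration values.

```python
from typing import Dict, Tuple

# Rendition name -> maximum width in pixels (a hypothetical rule set)
RENDITION_WIDTHS: Dict[str, int] = {"thumbnail": 150, "article": 600, "hero": 1200}


def rendition_sizes(width: int, height: int) -> Dict[str, Tuple[int, int]]:
    """Scale the original down to each width, keeping the aspect ratio.

    Renditions are never upscaled beyond the original's width.
    """
    sizes = {}
    for name, max_width in RENDITION_WIDTHS.items():
        target = min(max_width, width)
        sizes[name] = (target, max(1, round(height * target / width)))
    return sizes


def pick_rendition(container_width: int) -> str:
    """Choose the smallest rendition that still fills the container."""
    for name, max_width in sorted(RENDITION_WIDTHS.items(), key=lambda kv: kv[1]):
        if max_width >= container_width:
            return name
    return "original"  # nothing configured is wide enough


sailboat = rendition_sizes(3000, 2000)
```

The first function corresponds to what happens at upload time, and the second to what a templating system might do when it detects the size of the container an image is being placed in.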
In addition to automatic image manipulation, many systems provide some manual image editing capability – the unspoken goal being “Photoshop in the browser” – with varying degrees of effectiveness. Simple image editing, such as resizing and cropping, is common, but more in-depth transforms usually require images to be edited offline, and might require additional training for editors.
Permissions
Content permissions are meant to prevent malicious manipulation of content, or (more likely) to protect editors from doing things they don’t intend to do. Preventing an editor from changing the home page is both good editorial policy and helpful for the editor, who might accidentally be making global changes without realizing it.
The concept of permissions in a CMS ties heavily into (and borrows liberally from) the permissions systems that have been in use on filesystems for years. Windows and Linux filesystems have had global permissions models since they were invented, and many of the concepts in the modern CMS are based around them.
For example, an “access control list” or ACL is a generic computing concept. The definition from Wikipedia:
An access control list (ACL), with respect to a computer filesystem, is a list of permissions attached to an object. An ACL specifies which users or system processes are granted access to objects, as well as what operations are allowed on given objects.
In the case of a CMS, an “object” is usually a content object, as opposed to a file on a hard disk. Many CMSs use both the concept and nomenclature of an ACL to control their own permissions.
A permission (or, technically, an access control entry, or ACE; an ACL is a bundle of ACEs) is an intersection of three things:
- User
- Action
- Object
In any situation involving permissions, we must ask ourselves: (1) who is trying to (2) do what action, (3) on what object? An ACE, bundled into an ACL, governs what is allowable.
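In its simplest possible form, that three-part question can be modeled as a set of tuples. This is a bare sketch with made-up users and an Allow-only model; real systems layer groups, inheritance, and Deny rules on top of it.

```python
from typing import NamedTuple, Set


class ACE(NamedTuple):
    user: str    # (1) who
    action: str  # (2) is trying to do what
    obj: str     # (3) on what object


# An ACL is simply a bundle of ACEs.
acl: Set[ACE] = {
    ACE("fred", "edit", "privacy-policy"),
    ACE("mary", "publish", "privacy-policy"),
}


def is_allowed(user: str, action: str, obj: str) -> bool:
    """Allowed only if a matching entry exists in the ACL."""
    return ACE(user, action, obj) in acl
```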
Users
First, we must identify the user context in which an action will take place. For this, we must take into account roles and permissions. For example:
- Fred the Editor has been given Edit permission for the privacy policy, but not Publish permission.
- Mary the Corporate Counsel has been given Publish permission for the privacy policy.
Users can be identified directly (Mary and Fred in the preceding examples), or by group. A group or role is an aggregation of users. Users are assigned to a group, or are considered “in” that group. If a permission is granted to a group, then any user who is a member of that group gets that permission.
Thus, we can adjust our previous examples as follows:
- Anyone in the Editors group has been given Edit permission for the privacy policy, but not Publish permission.
- Anyone in the Corporate Counsel group has been given Publish permission for the privacy policy.
In most cases, this will make more sense.
In general, permissions should always be assigned by group, even when just a single user is in that group. Permissions are usually related to the role someone is performing, rather than to that user as a specific person.
For example, if posts to the CEO’s blog have to be approved by Jessica the CEO, is this because she’s…well, Jessica? No, clearly it’s because she’s the CEO, and if she’s ever not the CEO, then she should lose this permission. The permission belongs to the role of CEO, not the person fulfilling that role.
In this situation, it would be entirely appropriate to create a group called “CEO,” put Jessica in it as the only member, and assign the permission to the group. When Tilly deposes Jessica in a coup and assumes control of the company, we simply remove Jessica from the CEO group and add Tilly to the group, and Tilly assumes all of Jessica’s powers. [Insert maniacal laugh here.]
Group management can get complex. In some cases, groups can contain other groups. So, the Corporate Counsel group could contain a subgroup called Really Important Lawyers. Being in the Really Important Lawyers group would allow all the rights and roles of the larger Corporate Counsel group, plus perhaps some additional rights.
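Here is a small sketch of group-based permissions with one level of nesting. The data structures, the user “lee,” and the specific grants are assumptions for illustration. The point is that permissions attach to groups, and users only ever gain or lose them through membership.

```python
from typing import Dict, Optional, Set, Tuple

# group -> direct user members
members: Dict[str, Set[str]] = {
    "CEO": {"jessica"},
    "Corporate Counsel": {"mary"},
    "Really Important Lawyers": {"lee"},
}
# subgroup -> parent group whose rights it also receives
parent_group: Dict[str, str] = {"Really Important Lawyers": "Corporate Counsel"}
# (group, action, object) grants
grants: Set[Tuple[str, str, str]] = {
    ("CEO", "approve", "ceo-blog"),
    ("Corporate Counsel", "publish", "privacy-policy"),
    ("Really Important Lawyers", "hard_delete", "privacy-policy"),
}


def groups_of(user: str) -> Set[str]:
    """Direct groups plus every ancestor group."""
    found: Set[str] = set()
    for group, users in members.items():
        current: Optional[str] = group if user in users else None
        while current is not None and current not in found:
            found.add(current)
            current = parent_group.get(current)
    return found


def is_allowed(user: str, action: str, obj: str) -> bool:
    return any((group, action, obj) in grants for group in groups_of(user))


def replace_ceo(old: str, new: str) -> None:
    """Swap the person; the permission stays with the role."""
    members["CEO"].discard(old)
    members["CEO"].add(new)


lee_can_publish = is_allowed("lee", "publish", "privacy-policy")     # via the parent group
mary_can_delete = is_allowed("mary", "hard_delete", "privacy-policy")
replace_ceo("jessica", "tilly")
```

After the coup, no grant has been touched. Only the membership of the CEO group changed.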
Additionally, some systems have a differentiation between groups and roles. Groups identify users as members, while roles indicate what they do.
For example, you may have an Editor group, in which you place all your editors, and then have multiple roles for News Article Editor, Media Editor, etc. Permissions are assigned to the roles, which are then assigned to the groups. Users are aggregated into groups, permissions are aggregated into roles, and then roles and groups meet to allow actions to take place.
Yes, this can get confusing. There’s actually an entire discipline and body of theory called identity management. Again, from Wikipedia:
In computing, identity management (IdM) describes the management of individual principals, their authentication, authorization, and privileges within or across system and enterprise boundaries.
In most situations, however, groups and roles are simply conflated. Even in situations where they’re separated, the benefit in most cases is merely hypothetical and semantic. There are no doubt scenarios where the differentiation is important, but it’s not common. Most systems will have a method of aggregating users and assigning permissions to those aggregations, whether they are called “groups” or “roles” or something else.
Objects
In an abstract sense, an “object” is anything in a system that may need to be acted upon. This includes:
- Content objects
- Users
- Content types
- Settings
- Templates
Different systems have different granularity in assigning and managing the permissions to act upon different objects. It’s quite possible that a CMS will have a complex group/role ACL structure in place for everything. In many other cases, ACL-style permissions are reserved for content, and permissions for managing other items in the system – like templates or users – are simply binary: designated people can do it, and other people cannot.
In most cases, permissions apply to content. These permissions are granted or denied on specific objects, but rarely are they directly assigned to those objects. Permissions are usually inferred from either the type of content or its location in the larger content geography. For example:
- A user has full rights to any News Release anywhere in the system.
- A user can create any allowable object under the News section of the content tree.
Occasions exist when a specific content object has different permissions from other content of the same type or in the same location, but this is rare. Specifying those objects as such would become unmanageable over the long term. Therefore, managing permissions in aggregate becomes the only reasonable method.
In many cases, permissions are inherited from some other object – often the parent object or folder. Changing the permission of an object will also change the permissions of all its descendant objects, unless that descendant has been specifically declared to “break” this inheritance and manage its own permissions (at which point it might be the target for the inheritance of its child objects). This is appropriate as permissions are often based on location, and this effectively cascades permissions down branches of a tree.
This is an example of CMS permission models often mimicking filesystems. In Windows, for example, a new file inherits the permissions of its containing folder, unless this connection is specifically broken. Many CMSs use this same logical model.
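Inheritance with the option to break it can be sketched as a walk up the content tree. The `Node` class and the sample tree are hypothetical. A node with its own permissions stops the walk, and a node without them defers to its nearest ancestor that has some.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class Node:
    name: str
    parent: Optional["Node"] = None
    # group -> allowed actions; None means "inherit from the parent"
    own_permissions: Optional[Dict[str, Set[str]]] = None

    def effective_permissions(self) -> Dict[str, Set[str]]:
        """Walk up the tree to the nearest node that manages its own permissions."""
        node: Optional[Node] = self
        while node is not None:
            if node.own_permissions is not None:
                return node.own_permissions
            node = node.parent
        return {}


root = Node("Home", own_permissions={"Editors": {"edit"}})
news = Node("News", parent=root)  # inherits from Home
legal = Node(  # breaks inheritance and manages its own permissions
    "Legal", parent=root, own_permissions={"Corporate Counsel": {"edit", "publish"}}
)
policy = Node("Privacy Policy", parent=legal)  # inherits from Legal, not Home
```

Because nothing is copied down the tree, a change at Home is immediately visible at News, while Legal and everything beneath it are unaffected.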
Actions
Once we know the user and the object to be acted on, we need to allow or deny specific actions. While a user could have so-called “full control” of an object (a phrase borrowed from the Windows permissions model), there’s a greater chance that what the editor can do is limited.
Some of the more common permissions in relation to a content object are:
- Creating content of a specific type
- Editing a content object
- Publishing a content object
- Viewing an unpublished content object
- Rolling back to a previous version of a content object
- Initiating a specific workflow on a type of content object
- Editing a single, specific attribute of an object of a specific type
- So-called “soft deleting” an object by moving it to the Trash or the Recycle Bin
- Irretrievably “hard deleting” a content object
Different systems have different granularity around what permissions can be assigned to an object. Some just have binary access – either you can create/edit/publish content, or you can’t. Others have extremely fine-grained control over specific actions.
Some actions might presuppose other actions. It could be that the right to publish content also confers the right to create and edit it, though situations could be conceived where this doesn’t apply. Likewise, an editor might be given the right to delete content but do nothing else to it, though envisioning a realistic usage scenario for this is harder.
Some systems are also extensible, allowing developers to create their own permissions to govern customizations and extensions they write to a CMS. The image below shows one of the permissions interfaces in the Sitecore CMS, which offers fine-grained control.
One of several permissions interfaces in Sitecore – permissions can be allowed or denied to both content and administrative objects, and permissions “cascade” down the tree to child objects, unless overridden
Permission conflict resolution
Permissions can get complex, especially when different rules come into conflict. In these cases, each system will have some defined method of resolution.
For example, some systems simplify by only offering Allow permissions, but others have explicit Deny permissions as well, which often take precedence over Allow. Additionally, inheritance rules can come into play. Does an inherited permission take precedence over an explicit permission? Usually not; however, as Deny often takes precedence over Allow, an inherited Deny might overrule an explicit Allow.
It all depends on the system, and how that system implements its security model. That said, the simpler you can keep your permissions model, the better. A more distributed editorial base requires more complicated permissions, which can lead to some complex models, and occasionally extended debugging when an editor can’t do what he should be able to do.
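To see how much the outcome depends on the chosen policy, consider two hypothetical resolution strategies applied to the same set of rules. Neither function is taken from a real CMS; they illustrate “Deny always wins” versus “explicit rules beat inherited ones.”

```python
from typing import List, Tuple

# Each rule is (effect, source): effect is "allow" or "deny",
# and source is "explicit" or "inherited".
Rule = Tuple[str, str]


def resolve(rules: List[Rule]) -> bool:
    """Policy A: Deny wins over Allow regardless of source; no rules, no access."""
    effects = {effect for effect, _source in rules}
    if "deny" in effects:
        return False
    return "allow" in effects


def resolve_explicit_first(rules: List[Rule]) -> bool:
    """Policy B: explicit rules, if any exist, override inherited ones."""
    explicit = [rule for rule in rules if rule[1] == "explicit"]
    return resolve(explicit) if explicit else resolve(rules)


conflict: List[Rule] = [("allow", "explicit"), ("deny", "inherited")]
```

Under the first policy the inherited Deny overrules the explicit Allow and the editor is refused. Under the second, the explicit Allow prevails. This is exactly the kind of difference that leads to extended debugging sessions.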
A Summary of Editorial Tools
We’ve covered a lot of ground here. Here are some questions you might want to keep in mind. More than in any other chapter, the warning applies here that these checklists are simplistic and crude tools for analysis. Editorial tools run the gamut of functionality and polish. The room for interpretation is wide, and wide-eyed editors sick of their current CMSs can be easily seduced by glamorous features they might never use.
Content Traversal and Navigation
- Is there a dedicated administrative interface?
- Are there in-context editorial tools for authenticated editors?
- How is content presented and organized for review and selection?
- How can content be organized and grouped for editors?
Type Selection
- How are editors presented with types for creation? Is the interface usable and helpful? How will it scale for potentially dozens of different content types?
- Can available types be restricted by editor role?
- Can available types be restricted by parentage or location in the geography?
Content Preview
- Can the content be previewed prior to publication?
- Can the content be edited in preview mode?
- Can the editor select multiple preview modes to see how the content will appear in different channels?
- Can the editor spoof demographic or session information to invoke various personalization states?
The Editing Interface
- How usable are the editorial interface elements?
- What interface elements can be selected and configured for each property type?
- How can content be validated during entry? How much control is available for error messages?
- Is in-context help available to assist editors during content creation?
- How can the editorial interface be customized? Is it possible to remove functionality based on role? Is it possible to add links, buttons, and other functionality?
Versioning, Version Control, Scheduling, and Expiration
- Does the system version content at all? Is it optional or required?
- Is the versioning serial or branching?
- Can content be rolled back to a prior version?
- How can versions be compared?
- Can new content be scheduled for publication?
- Can a new version of existing content be scheduled for publication?
- Can content be scheduled for expiration?
- Is there a concept of archiving, and what does it mean? Is content actually moved to another location in the geography? Is it deleted? Can it be retrieved?
Workflow and Approvals
- What is the process for content approvals?
- Can approvals be achieved through simple manipulation of permissions?
- How are approvers for specific content identified?
- How are approvers notified that there is content awaiting their review?
- Is there a workflow engine?
- How are workflows created? From the interface? From code? By configuration?
- What constitutes a workflow step? Are there predefined actions that can be taken in a step? Can these steps be customized?
- Is there a task management system? How does this differ from the non-CMS collaboration tools your organization uses today?
Content File Management
- Can content files be managed from the CMS?
- Can these files be associated with specific content?
- Can their permissions and archiving/deletion be synced with content?
- How are files uploaded? Is there a mass-upload feature?
- Can external file repositories be connected to the CMS?
- What automatic image processing features are available?
- What in-browser image editing is available?
Permissions
- What level of permissions does the system offer beyond binary “full control” access?
- How can users be aggregated? Does the system offer both groups and roles?
- How can content be identified for the application of permissions? By location in the geography? By content type?
- Can content inherit or reference its permissions from another location or object, or are all permissions directly applied?
It actually wasn’t that simple. They had to individually turn thousands of phone book pages over each other. That part is boring, but the rest of the video is incredibly entertaining.
On LinkedIn, a group of content managers attempted to define “archiving.” The range of responses was considerable, and they were collected in “Perspectives On What ‘Archiving’ Means in Content Management.”
Yes, clearly, this would be a failure in error checking. But when working with a CMS, developers will often trust it to enforce certain standards and forgo exhaustive error checking for the sake of practicality.
“WYSIWTFFTWOMG!”, September 3, 2013.
I wrote this book in O’Reilly’s Atlas editing platform. I wrote in a text format called AsciiDoc, and could “build” into multiple formats at any given time. I obsessed over the formatting of the print (PDF) version of the book, showing a clear bias toward what it would look like when printed. Only later in the writing process did I start looking at the EPUB, Mobi, and HTML output options. Often, formatting that worked in one was problematic in another.
Mastering Regular Expressions by Jeffrey E.F. Friedl (O’Reilly), currently in its third edition, is the seminal and authoritative text on regexes.
As mentioned earlier, I wrote this entire book in AsciiDoc, a lightweight markup language similar to Markdown.
I once had a client who was concerned about this exact scenario. While the logical problem could not be solved short of simply not allowing expiration on objects that were the target of links, we did create a scheduled job that emailed the webmaster every night if it found content that (1) was the target of one or more links, and (2) was expiring in the next 72 hours. This at least gave the client some notice so they could resolve the situation gracefully rather than have links break.
While it would be convenient to call them “workflow types” to parallel “content types,” it seems to be an industry convention to call them “workflow templates.”
If you have an interest in workflow as a general process, the Workflow Patterns website is a project by two universities “to provide a conceptual basis for process technology.” If nothing else, the site will demonstrate that workflow is a discipline that originated and is practiced far beyond the bounds of content management.
Many systems refer to content files as “binary files,” even though they’re not technically required to be binary. There’s nothing stopping an editor from uploading a text file to the CMS, for example.