Chasing the Ideal: Relational Content Modeling in Content Management

By Deane Barker • April 11, 2011

Different Web sites require vastly different levels of structure. Content models range from simplistic to absurdly complex, and where yours falls in that range should heavily influence which CMS powers the site.

A brochureware site may just consist of pages grouped together in menus. Pages have no significant relationship to each other beyond what’s implied by the navigation. At the other extreme, an online database of the James Bond movies would have multiple entities tied together in a very formal data model.

What we’re talking about is something I wrote about five years ago and called “relational content modeling.” This is the concept of how different, separately-managed pieces of content relate to each other. (This is distinct from “discrete content modeling,” which is how you structure a single piece of content.)

Relational content modeling deals with the same things you were dealing with when writing your first database:

Each Movie object should have a field called James Bond Actor which must link to a single Actor object.
Each Movie object has a field called Villains which can link to one or more Character objects. This field must have at least one value.
Each Portrayal object has a field called Movie which must link to a single Movie object and a field called Actor which must link to a single Actor object.
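
To make those three rules concrete, here is a minimal sketch of how they might look if you were free to write the model yourself. It uses plain Python dataclasses; the class and field names simply mirror the rules above, and the validation approach is an assumption for illustration, not how any particular CMS does it.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Actor:
    name: str


@dataclass(frozen=True)
class Character:
    name: str


@dataclass
class Movie:
    title: str
    james_bond_actor: Actor  # must link to a single Actor
    villains: list[Character] = field(default_factory=list)  # one or more Characters

    def __post_init__(self) -> None:
        if not isinstance(self.james_bond_actor, Actor):
            raise TypeError("james_bond_actor must be an Actor")
        if not self.villains:
            raise ValueError("villains must have at least one value")
        if not all(isinstance(v, Character) for v in self.villains):
            raise TypeError("villains may only contain Character objects")


@dataclass
class Portrayal:
    movie: Movie  # must link to a single Movie
    actor: Actor  # must link to a single Actor

    def __post_init__(self) -> None:
        if not isinstance(self.movie, Movie):
            raise TypeError("movie must be a Movie")
        if not isinstance(self.actor, Actor):
            raise TypeError("actor must be an Actor")


# Usage: a valid Movie needs one Bond actor and at least one villain.
connery = Actor("Sean Connery")
goldfinger = Movie("Goldfinger", connery, [Character("Auric Goldfinger")])
portrayal = Portrayal(goldfinger, Actor("Gert Fröbe"))
```

Every rule here is a type check or a cardinality check. Those are exactly the two things a packaged CMS has to let you express if it wants to get close to the custom model.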

I know, I know – this stuff is Database Design 101. But you’re not working with a database that you can custom-design, remember? You’re working with a packaged CMS. Can it handle this?

In this regard, I think all CMS implementations are chasing the ideal of the custom relational database. Their competence in relationally modeling content is defined by how close they get to this ideal. Can any CMS reach it?

Generally speaking, no. Some systems are better than others, but chasing the ideal of the custom data model you would write if you were building your database from scratch is going to frustrate you pretty quickly. If you’re working with your first CMS and bringing a lot of experience with database design, you’ll likely get a pretty harsh introduction to your system’s limitations in this regard.

However, there are a few exceptions to the rule.

Refresh Software makes a system called SR2 which has a pretty unique content model – you design a database, then simply tell the CMS about it. So the data resides in your own database, and the CMS just manages it.
WebNodes is a .Net system out of Norway which bills itself as a “semantic CMS.” It’s essentially an ORM with a CMS wrapped around it. Your data model can be extremely complex, and based on my limited experience with it, WebNodes is actually writing out underlying database tables to support it.
dotCMS is a commercial open-source system which has a pretty sophisticated system of designing content relationships, indicating what can be related to what else, and how.
Seth told me about Ellington, which is a Python system written in Django. However, I’m going to lump that in with a lot of other Django/Rails systems. For those two frameworks, you see far more custom development than canned CMSs. Custom-developed systems are almost always based around a custom data storage model, and I know Rails has enough plugins to essentially turn any relational database into a fairly competent CMS (example: Acts as Versioned).
Drupal has a series of modules that allow it to be highly relational (ironic, given that Drupal is the granddaddy of the “big flat pool of content” model). Modules like Node Relationships and Node Relativity can get you pretty close.

Relational content modeling is core primarily because it harkens back to the simple object-oriented programming concept of composition:

In computer science, object composition is a way to combine simple objects or data types into more complex ones. Compositions are a critical building block of many basic data structures […]

In content modeling, we often need to take simple content structures and combine them to form more complex structures.

For example, I was working with a magazine publisher who wanted to represent the following.

A Publication contains many…
Issues, which contain many…
Sections, which contain many…
Articles, which may contain one or more
Subarticles (really just an article, called a “subarticle” when it was subordinate to another article)

This is a simple parent/child hierarchy, but it demonstrates the idea of taking multiple discrete content objects (articles, sections, issues) and “rolling them up” into a more complex content object (publication).
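
As a rough sketch of that roll-up (in Python, with class names taken from the list above and an illustrative `all_titles` helper that is my own invention), composition might look like this:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Article:
    title: str
    # A subarticle is just an Article that sits underneath another Article.
    subarticles: list[Article] = field(default_factory=list)


@dataclass
class Section:
    name: str
    articles: list[Article] = field(default_factory=list)


@dataclass
class Issue:
    label: str
    sections: list[Section] = field(default_factory=list)


@dataclass
class Publication:
    name: str
    issues: list[Issue] = field(default_factory=list)

    def all_titles(self) -> list[str]:
        """Roll up every article and subarticle title, in tree order."""
        titles: list[str] = []
        for issue in self.issues:
            for section in issue.sections:
                for article in section.articles:
                    titles.append(article.title)
                    titles.extend(sub.title for sub in article.subarticles)
        return titles


# Usage with made-up content.
cover = Article("Cover Story", [Article("Sidebar")])
magazine = Publication(
    "Example Monthly",
    [Issue("Issue 1", [Section("Features", [cover, Article("Interview")])])],
)
print(magazine.all_titles())
```

Note that there is no separate Subarticle class. The "subarticle" is a role an Article plays because of where it sits, which matches how the publisher described it.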

Seth Gottlieb is always partial to a “quiz” example, which might look something like this:

A Quiz contains many
Questions, which contain one
Correct Answer, and many
Incorrect Answers
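
The quiz differs from the magazine in that one of the slots holds exactly one item while its sibling holds many. A minimal Python sketch, assuming answers are plain strings and adding a hypothetical `score` helper for illustration:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Question:
    text: str
    correct_answer: str  # exactly one
    incorrect_answers: list[str] = field(default_factory=list)  # many


@dataclass
class Quiz:
    title: str
    questions: list[Question] = field(default_factory=list)

    def score(self, responses: list[str]) -> int:
        """Count responses that match the correct answer, question by question."""
        return sum(
            question.correct_answer == response
            for question, response in zip(self.questions, responses)
        )


# Usage with a made-up quiz.
quiz = Quiz(
    "Bond Trivia",
    [
        Question(
            "Who played James Bond in Goldfinger?",
            "Sean Connery",
            ["Roger Moore", "George Lazenby"],
        )
    ],
)
```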

In my experience implementing content management for 10+ years, excellence in relational content modeling really boils down to two main competencies, each of which has multiple sub-competencies.

Here’s how to do it well:

Competency #1

You need an intuitive method to model parent/child relationships, which takes me back to my preference for systems with a strong content tree.

Parent/child is the most common relationship in content management (probably in all of information architecture, even). The benefit of a content tree is that you get implicit enforcement of the model and its referential integrity (each child must have a parent; delete the parent, and the children go with it).

There are several sub-competencies here:

1.1: You need to be able to limit parent/child relationships by type. So, a “Procedure Section” can contain only “Procedure” objects.
1.2: It’s nice to be able to limit the number of items, so a “State” can only contain one “Capital,” for instance.
1.3: An object should be able to appear as a child of multiple objects, though one should be designated as the “main” relationship.
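
Here is one way those three sub-competencies could be sketched in Python. Everything in it is hypothetical: the `ALLOWED_CHILDREN` rule table, the `Node` class, and the convention that the first parent is the "main" one are assumptions made for illustration, not any real CMS's API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

# Hypothetical rule table: parent type -> {allowed child type: max count or None}.
ALLOWED_CHILDREN: dict[str, dict[str, Optional[int]]] = {
    "Procedure Section": {"Procedure": None},  # 1.1: only Procedures, any number
    "State": {"Capital": 1},                   # 1.2: at most one Capital
}


@dataclass(eq=False)
class Node:
    content_type: str
    name: str
    children: list[Node] = field(default_factory=list)
    # 1.3: a node may have several parents; by assumption, the first is "main".
    parents: list[Node] = field(default_factory=list, repr=False)

    def add_child(self, child: Node) -> None:
        rules = ALLOWED_CHILDREN.get(self.content_type, {})
        if child.content_type not in rules:
            raise TypeError(
                f"{self.content_type} cannot contain {child.content_type}"
            )
        limit = rules[child.content_type]
        existing = sum(
            1 for c in self.children if c.content_type == child.content_type
        )
        if limit is not None and existing >= limit:
            raise ValueError(
                f"{self.content_type} allows at most {limit} {child.content_type}"
            )
        self.children.append(child)
        child.parents.append(self)

    @property
    def main_parent(self) -> Optional[Node]:
        return self.parents[0] if self.parents else None


# Usage: one Procedure appears under two sections; the first stays "main".
safety = Node("Procedure Section", "Safety")
onboarding = Node("Procedure Section", "Onboarding")
evacuation = Node("Procedure", "Evacuation")
safety.add_child(evacuation)
onboarding.add_child(evacuation)
```

The rules live in data rather than in code, which is roughly what a CMS with a good content tree lets an administrator configure without a developer.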

Competency #2

You need a method to allow the specification of other content in the context of a specific property on a specific content object. So, on your “Article” content object, you can have an “Author” property into which you select a “Person” object from somewhere else on the system.

There are several sub-competencies here:

2.1: It needs to support single or multiple selections. You need to be able to specify whether you can pick only one “Person” object or multiple.
2.2: Whenever selecting multiple objects, you need to be able to order them arbitrarily. These selections become a discrete collection of content that exists in that exact context – a specific property on a specific content object. Each instance should be allowed to define its own ordering.
2.3: You need to be able to limit the pool of content from which someone can select. These limits are usually based on content type or location. For our “Author” property, we can only pick “Person” objects from the “Authors” branch of the content tree.
2.4: You need to be able to enforce referential integrity. So a content object that is selected into a property somewhere else in the system cannot be deleted, or it can be and will remove itself from any properties in which it was specified (however, this becomes a problem if that removal leaves content in an invalid state – your “Article” now has zero “Person” objects in its “Author” property, and that breaks the rendering).
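
The four sub-competencies above can be sketched as a single "reference property" object. This is a minimal illustration in Python under stated assumptions: `ContentObject`, `ReferenceProperty`, `delete_object`, and the idea of identifying a branch by a path string are all hypothetical names chosen for the example, and the delete policy shown (cascade the removal unless it would empty a required property) is just one of the options described in 2.4.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class ContentObject:
    content_type: str
    name: str
    branch: str  # hypothetical: path of the content tree branch it lives in


@dataclass
class ReferenceProperty:
    allowed_type: str       # 2.3: limit the pool by content type...
    allowed_branch: str     # ...and by location in the tree
    multiple: bool = False  # 2.1: single or multiple selection
    required: bool = True
    selected: list[ContentObject] = field(default_factory=list)

    def select(self, obj: ContentObject, position: Optional[int] = None) -> None:
        in_branch = obj.branch == self.allowed_branch or obj.branch.startswith(
            self.allowed_branch + "/"
        )
        if obj.content_type != self.allowed_type or not in_branch:
            raise ValueError(f"{obj.name} is outside the allowed pool")
        if obj in self.selected:
            raise ValueError(f"{obj.name} is already selected")
        if not self.multiple and self.selected:
            raise ValueError("this property accepts a single selection")
        # 2.2: each property instance keeps its own arbitrary ordering.
        index = len(self.selected) if position is None else position
        self.selected.insert(index, obj)


def delete_object(obj: ContentObject, properties: list[ReferenceProperty]) -> None:
    """2.4: remove obj from every property, unless that empties a required one."""
    for prop in properties:
        if obj in prop.selected and prop.required and len(prop.selected) == 1:
            raise RuntimeError(f"deleting {obj.name} would leave a required property empty")
    for prop in properties:
        if obj in prop.selected:
            prop.selected.remove(obj)


# Usage: an Article's "Author" property drawing from the "Authors" branch.
alice = ContentObject("Person", "Alice", "/Authors")
bob = ContentObject("Person", "Bob", "/Authors")
author = ReferenceProperty("Person", "/Authors", multiple=True)
author.select(alice)
author.select(bob, position=0)  # editor moves Bob ahead of Alice
```

Checking every property before removing anything keeps the delete all-or-nothing, so a refusal never leaves the content half-updated.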

(For this content selection, the interface becomes pretty important. A great competency in this area can be ruined by a clunky interface. Ideally, you can choose different interfaces, based on the situation. In some cases, perhaps give editors a content tree they can navigate to find something; in other cases, give editors a dropdown list of qualifying objects to select from.)

When considering these competencies, a CMS can fall into three states:

It can do this out-of-the-box
It can do this with some custom development or plugins
It cannot do this

Obviously, the first one is desirable. The second isn’t bad either, since you could dial it in for your specific situation. (What might be the best option is that it’s handled out-of-the-box, but the system is customizable enough that you can develop against the feature to make it fit your situation perfectly.)

The last item there is not good, and will likely be a source of massive frustration for you. Content modeling is one of the first things you do in a CMS implementation (it’s even first on my list of the Four Disciplines of Content Management), and the inability to accurately model your content the way you want it will impact every other stage of your implementation, from training to governance to templating.

So, which CMSs support both of these competencies and all their sub-competencies?

From my experience, eZ publish does all of this quite well out-of-the-box. Episerver can do it all too, with some plugins. (It’s also worth mentioning that Ektron’s significant weakness in this regard remains my chief complaint with that system.)

I’d like to hear from you. Please comment about any systems you have experience with and how they stack up.