Uber-Text Pages and the Lack of Inheritance in Content Management

By Deane Barker • April 21, 2008

We had a build meeting the other day for a client’s site, and we walked through the site map to determine what content types we were going to need to pull this off.

In these cases, the first content type you inevitably define is the ubiquitous “text page.” This is a simple page. Of text. Duh.

Text pages usually consist of a title, a summary (for index pages where you’re listing a bunch of them), and a body of text. Many content management systems support this model explicitly (it’s built-in this way – think of a blogging platform), or you end up modeling your page like this if your system gives you that ability.

But how far do you…push the text page? There are a lot of opportunities to re-use this content type. How far do you take it?

This particular client also needed an “announcement.” We took a long view of it, and determined that their announcements section was really just a group of text pages, reverse-ordered by date. So, we thought, let’s just tack a “date” field on the text page model, and be done with it. If the text page is in an announcements section, we’ll order by that date. If not, we’ll ignore it.

Then, the client needed an “article” content type. Well, what is an article? It’s a text page…with a date…and an author. So, let’s just tack an “author” datatype on the “text page” model, and we’re good…right? We can use it when we need to, and ignore it when we don’t.

Later, the client needed a “newsletter” content type. Turns out, this is just a text page with a PDF file attachment. So, we tacked on a “file” datatype…

Now, in truth, this situation was hypothetical. But you see the idea at work here? How content types are really just derivative of a core content type? The fact is, an awful lot of content types can be defined as:

Title
Summary
Text body

Tack on these datatypes –

Date
Author
File attachment

– and you’ve handled four separate, logical types in our mythical client’s content model: text page, announcement, article, and newsletter.
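To make the uber-text page concrete, here is a minimal Python sketch of it. The class name `TextPage`, the field names, and the `list_section` helper are all hypothetical, not taken from any particular CMS. The tacked-on fields are optional and simply stay empty when a page doesn't need them.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class TextPage:
    # Core properties every page has.
    title: str
    summary: str
    body: str
    # Tacked-on properties; None means "this page ignores the field".
    published: Optional[date] = None   # used by announcements and articles
    author: Optional[str] = None       # used by articles
    attachment: Optional[str] = None   # used by newsletters (path to a PDF)


def list_section(pages, section):
    """Return pages for an index listing; announcements sort newest first."""
    if section == "announcements":
        return sorted(pages, key=lambda p: p.published or date.min, reverse=True)
    return list(pages)


pages = [
    TextPage("Office closed", "Holiday hours", "Body text", published=date(2008, 1, 1)),
    TextPage("New hire", "Welcome", "Body text", published=date(2008, 3, 15)),
]
newest_first = [p.title for p in list_section(pages, "announcements")]
```

Notice that nothing in the type itself says which fields an "article" needs. That knowledge lives in the section logic and in the editor's head, which is exactly where this approach starts to get risky.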

So, the question is, did we take this too far? Or is what we have planned here an elegant solution to modeling this content?

In the end, it depends. It depends on a lot of your content management system’s functionality external to content modeling. Dividing these four logical types into multiple actual types is often valuable for more than just content modeling – many systems will drive things like templating and permissions by content type. And what happens when you need to add a property to your Announcements, but not your Articles? So having everything as some uber-text page can lead to other issues.

In the end, it comes down to repetition vs. elegance. While duplicating your core set of properties on every content type is a pain, you avoid some tricky issues. Conversely, pushing the envelope with a single content type is elegant, but you can paint yourself into a corner pretty quickly.

But, my point here is that we shouldn’t have to do this. And here’s why –

Very few content management systems are using the object-oriented concept of inheritance these days. Inheritance says that Class B is a superset of Class A – it includes all of Class A’s functionality, and then some more. So if I happen to change Class A, Class B will change too.

In this case, I would model a “Page” object with these properties:

Title
Menu Title (for implicitly menued systems)
Summary
Body
META keywords
META description

Then, I would extend this base “Page” object into the “Announcement” object by adding a “date.” I would extend that into an “Article” object by adding an “author,” and into the “Newsletter” object by adding a “file.”

Then, say I want to geolocate everything someday. I just add a “location” attribute to the base object, and everything extends from that.
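In code, that hierarchy might look something like the following Python sketch. The class and field names are illustrative only. One assumption worth flagging: Newsletter is derived straight from Page here (a text page plus a file), while Article is derived from Announcement. The `location` field on the base stands in for the "geolocate everything" change.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import Optional


@dataclass
class Page:
    title: str = ""
    menu_title: str = ""
    summary: str = ""
    body: str = ""
    meta_keywords: str = ""
    meta_description: str = ""
    # Added once on the base; every derived type picks it up.
    location: Optional[str] = None


@dataclass
class Announcement(Page):
    published: Optional[date] = None


@dataclass
class Article(Announcement):
    author: str = ""


@dataclass
class Newsletter(Page):  # assumption: derived from Page, not Announcement
    attachment: str = ""


def field_names(content_type):
    """List the properties a content type ends up with, inherited ones first."""
    return [f.name for f in fields(content_type)]


article_fields = field_names(Article)
newsletter_fields = field_names(Newsletter)
```

Both `article_fields` and `newsletter_fields` include `location` without either class mentioning it, which is the whole appeal.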

Very few content management systems allow this. I’ve seen it in exactly two systems, both heavy on document management – Alfresco and Documentum. It’s elegant, it’s precise, and it’s powerful, which should be obvious since it’s been a core tenet of object-oriented programming for years.

Sadly, implementing this kind of system is complicated, and usually computationally expensive. Documentum, for example, maintained a database table for every level of inheritance, and did one-to-one joins all the way down the inheritance tree to return a big database row for an object. (But, on the other hand, this is built-in to Postgres, so WTH?)
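The table-per-level approach can be sketched with Python's built-in `sqlite3` module. To be clear, this is not Documentum's actual schema: the table and column names are invented for illustration, and the point is only to show the one-to-one joins down the tree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One table per level of the inheritance tree, sharing a primary key.
conn.executescript("""
    CREATE TABLE page (id INTEGER PRIMARY KEY, title TEXT, body TEXT);
    CREATE TABLE announcement (id INTEGER PRIMARY KEY REFERENCES page(id), published TEXT);
    CREATE TABLE article (id INTEGER PRIMARY KEY REFERENCES announcement(id), author TEXT);

    INSERT INTO page VALUES (1, 'Inheritance in CMS', 'Body text');
    INSERT INTO announcement VALUES (1, '2008-04-21');
    INSERT INTO article VALUES (1, 'Deane Barker');
""")

# Loading one article means joining every level back up to the base table.
row = conn.execute(
    """
    SELECT page.id, page.title, page.body, announcement.published, article.author
    FROM article
    JOIN announcement ON announcement.id = article.id
    JOIN page ON page.id = announcement.id
    WHERE article.id = ?
    """,
    (1,),
).fetchone()
```

Every extra level of inheritance adds another join to that query, which is where the cost comes from. In Postgres, the built-in version of this is the `INHERITS` clause on `CREATE TABLE`.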

Even if a system didn’t let you do traditional inheritance, N-levels deep, it would be handy if you had a “base object” from which you could derive your types. Meaning, you could alter a base object to include things like the title, summary, text, etc., then each type would add its own properties to this base type. You couldn’t go more than the one level deep, but it would still solve a number of problems.
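A single-level base object is simple enough to sketch in a few lines of Python. The names `BASE_FIELDS`, `CONTENT_TYPES`, and `fields_for` are hypothetical; the idea is just that every type's field list is the shared base plus its own additions.

```python
# Properties every content type gets from the base object.
BASE_FIELDS = ["title", "summary", "body"]

# Each type lists only what it adds on top of the base (one level deep).
CONTENT_TYPES = {
    "text_page": [],
    "announcement": ["date"],
    "article": ["date", "author"],
    "newsletter": ["file"],
}


def fields_for(type_name):
    """Return the base fields followed by the type's own fields."""
    return BASE_FIELDS + CONTENT_TYPES[type_name]


before = fields_for("article")

# Altering the base object changes every type at once.
BASE_FIELDS.append("location")
after = fields_for("newsletter")
```

Note that "article" has to repeat "date" here, since it can't derive from "announcement". That repetition is the price of stopping at one level.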

If your CMS has a strong content tree, you could fake inheritance a bit. You could create a base content type, then add subcontent to “flavor” it. Your base type would have the core properties, and you could add subcontent underneath it to hold other information specific to the pseudo-class you need that particular object to act like. This is hack-ish, but it might work well in some cases, and it fits the model of “Custom Field Sets” we discussed several years ago.
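Here is a rough Python sketch of that subcontent trick, assuming a generic content tree. The `Node` class, the `_fields` naming convention for the "flavoring" children, and the `effective_properties` helper are all made up for the example; a real CMS would have its own API for walking the tree.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    type_name: str
    properties: Dict[str, str]
    children: List["Node"] = field(default_factory=list)


def effective_properties(node):
    """Merge the base node's properties with those of its field-set children."""
    merged = dict(node.properties)
    for child in node.children:
        # Convention (assumed): field-set children have a type ending in "_fields".
        if child.type_name.endswith("_fields"):
            merged.update(child.properties)
    return merged


# A plain base page...
page = Node("page", {"title": "Spring issue", "summary": "Quarterly news", "body": "Body text"})
# ...flavored as a newsletter by hanging a field set underneath it.
page.children.append(Node("newsletter_fields", {"attachment": "spring.pdf"}))

props = effective_properties(page)
```

Templates would then read from the merged properties and never need to know whether a value came from the base page or from a child underneath it.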

In the end, content type inheritance is the holy grail of content modeling, and you don’t see it that often, which is too bad. It would be a huge asset to any CMS that implemented it. eZ publish claims that it’s on the roadmap, but I’ve yet to see anyone put a date on it.