To Structure or Not to Structure
I’ve talked a lot over the years about content modeling. Open and Closed Content Management is probably the most self-referenced post on this site. Recently I called content modeling one of the Four Disciplines of Content Management.
But lingering behind all the questions about how to model something is a bigger question: do you model it at all? When is it obviously worth structuring some content, and when do you just throw it into the “WYSIWYG pile”?
We were meeting with a client the other day about applying some content management to their Web site. We came upon a page of “business partners.” It had a repeated HTML structure consisting of a logo for the partner, their name, their URL, and a few paragraphs about them. There were maybe a dozen or so partners listed.
It looked like this:
From a content modeling perspective, you have three ways to handle content like this:
No structure: This is a perfectly viable option. Provided you don’t mind a TABLE, the HTML represented here is nothing a decent WYSIWYG editor couldn’t handle.
Structured as a single content object: For most systems, this means an XML document, with a repeating “partner” element, and sub-elements therein for “logo,” “name,” “url,” and “description.”
Structured as multiple content objects: You could create a “partner” content type, with fields for “logo,” “name,” “url,” and “description.” This page would be a rendering of multiple partner content objects, sequentially down the page.
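To make the three options concrete, here is a minimal Python sketch of the same partner data in each shape. The partner names, URLs, and the `Partner` class are invented for illustration; no particular CMS stores content exactly this way.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Option 1: no structure. The whole page is one opaque blob of editor HTML.
page_html = (
    "<table><tr><td><img src='acme.png'></td>"
    "<td><a href='https://acme.example'>Acme</a><p>Makes anvils.</p></td></tr></table>"
)

# Option 2: a single content object with a repeating "partner" element.
page_xml = ET.fromstring(
    "<partners>"
    "<partner><logo>acme.png</logo><name>Acme</name>"
    "<url>https://acme.example</url><description>Makes anvils.</description></partner>"
    "<partner><logo>globex.png</logo><name>Globex</name>"
    "<url>https://globex.example</url><description>Makes widgets.</description></partner>"
    "</partners>"
)


# Option 3: a "partner" content type; the page is a rendering of many objects.
@dataclass
class Partner:
    logo: str
    name: str
    url: str
    description: str


# Each <partner> element maps onto one standalone content object.
partners = [
    Partner(**{field.tag: field.text for field in element})
    for element in page_xml.findall("partner")
]
```

Note that the first form gives you nothing to query, while the second and third carry the same fields and differ only in whether a partner lives inside the page or on its own.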
Not surprisingly, there are advantages and disadvantages for each, and we threw them around with this client.
No Structure
The simplest, fastest, and cheapest thing to do.
It’s flexible. Structured content needs to be rigid and consistent. However, the client will invariably have an exception to the rule, and a WYSIWYG editor will let them do whatever they want.
It’s likely more usable for the end-user. They have content in their WYSIWYG editor that looks like it does on the page, and it’s tough to make more sense than that.
You’re trusting the user to not screw up the formatting. That can be a huge leap of faith.
You can’t isolate the individual partner records. If they suddenly decided they wanted to feature a random partner on the home page, you can’t easily go pick one out of the list.
WYSIWYG editors don’t do CSS layouts well, so you may have to settle for some messed up HTML. The above layout, for instance, would be hard for the average content editor to do in anything but a big TABLE tag.
Structured as a Single Content Object
You can control the formatting at the template level, which means it’s much harder for the user to screw it up. You’ve taken the formatting out of the user’s hands and they’re now working with pure content.
You can programmatically resize and otherwise process the images.
The page remains a single content object, which is simpler for the end user (as opposed to multiple content objects, in the next scenario).
There’s now enough structure to get at the individual records if you need to.
With structure comes rigidity. If the client wants an exception to the format for Partner X, it can become a complicated exception process, or it can be nigh impossible.
Changes to the format now become a developer concern, rather than an editor concern. If they suddenly decide they want to include a line for “location,” they can’t do that themselves without dual-purposing another field, which kind of defeats the purpose.
To effect this solution, you need to have a CMS that can handle repeating form elements in a content type. This usually means XML, and an editor that allows repeating XML elements.
While arbitrary ordering of records is simple (most XML editors will let you move child elements around within the parent), sorting based on a property can be problematic. Support for sorting in XSL is simplistic, and to sort otherwise would require you to parse the XML into some other, sortable data structure, sort it, then publish it from there, or put it back into XML to be transformed.
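The "parse it, sort it, put it back" route mentioned above can be sketched in a few lines of Python using the standard library's ElementTree. This assumes the hypothetical `<partners>`/`<partner>` layout from this example and is not how any particular CMS does it.

```python
import xml.etree.ElementTree as ET


def sort_partners(xml_text: str, key: str = "name") -> str:
    """Return the XML with <partner> children reordered by a sub-element."""
    root = ET.fromstring(xml_text)
    records = root.findall("partner")
    # Sort case-insensitively on the text of the chosen sub-element;
    # records missing that sub-element sort first.
    records.sort(key=lambda p: (p.findtext(key) or "").casefold())
    # Detach the records and re-append them in sorted order.
    for record in records:
        root.remove(record)
    root.extend(records)
    return ET.tostring(root, encoding="unicode")
```

The sorted XML can then go through the same transform as before, so the template doesn't need to know that sorting happened.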
Structured as Multiple Content Objects
As before, you can control the formatting at the template level.
You can manage each partner record individually. For each one, this means you can permission them, version them, subject them to workflow, etc. You could even give each partner a login and let them manage their own record.
Getting at each of the records individually is as simple as possible.
Almost all CMSs can handle this type of structure (as opposed to the need for repeating elements in the prior scenario).
As before, with structure comes rigidity.
Can be more complex for the end user. They have more than one object floating around. Some users might find this just as simple, some may find it complicated.
Depending on the CMS, ordering the records on the page can be a problem. You might want to arbitrarily order these records (rather than sorting on a property), and some CMSs do that better than others.
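As a rough illustration of the ordering problem, and of how easy individual records are to reach, here is a Python sketch with separate partner objects. The `position` field is an assumption on my part: it stands in for whatever manual-ordering mechanism your CMS offers, if it offers one at all.

```python
import random
from dataclasses import dataclass


@dataclass
class Partner:
    name: str
    url: str
    position: int  # hypothetical editor-assigned weight for manual ordering


def page_order(partners):
    """Arbitrary ordering: whatever sequence the editor chose."""
    return sorted(partners, key=lambda p: p.position)


def by_name(partners):
    """Property-based ordering: alphabetical, ignoring case."""
    return sorted(partners, key=lambda p: p.name.casefold())


def featured(partners, rng=random):
    """Pick one record at random, e.g. to feature on the home page."""
    return rng.choice(partners)
```

Sorting on a property comes for free here. The manual order is the part that needs explicit support, which is why some systems handle it better than others.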
So, there’s a run-down of the advantages and the disadvantages of the major approaches. But which one to choose?
As you’d imagine, there’s no clear-cut answer. Here are some of the factors to consider:
To what extent do you need to sub-divide the content? In the above example, will you ever need to isolate a single partner?
If your answer to the above is “no,” how sure are you of that? How well does the end-user understand the risk they’re taking by not making the content sub-dividable? Will they accept extra expense if it has to be structured later?
How many sub-dividable units are there? The stakes are much lower with a dozen than with 15,000. Additionally, as the number of units goes up, so does the administrative overhead of managing them all as a group (finding the one that changed out of a dozen is easier than out of 15,000, no matter how good your diffing tool).
What is the technical sophistication of the end-user? How well do they understand content management? Can they grasp the concept of compositing a page out of sub-elements, or are they going to be confused if what’s in the editor doesn’t look like what’s on the page?
How often is the content going to change? When it does change, how often will it be a single element? Will changes to single units overlap, so if one is in the middle of a workflow, would it help to be able to send another one through workflow as a separate unit?
How intricate is the formatting?
How do they want to order the individual records?
Is there any processing you need to do on sub-elements of a record? In the example, the image needs to be resized consistently every time. In other cases, you may need to do consistent calculations of user-supplied values that are best run through a stable algorithm.
Are permissions different on individual units? In the example, if partners can manage their own record, then they have to be managed separately.
How sophisticated is your CMS editing tool? Can it even do repeating elements at all? If you choose the second option, and structure within a single content object, how closely can the editing form resemble what’s on the page? (Put another way, how easily can you “trick” the end-user into structuring content by making it look like WYSIWYG?) (Incidentally, Ektron does a good job of this. You can make input forms with repeating elements that come very close to what the end result will be. Joseph Scott’s Edit in Place would do well here too.)
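On the processing factor above, the point of structure is that a rule like "resize every logo the same way" gets applied by code rather than by each editor's eye. A minimal Python sketch of such a rule follows; the 150x100 bounding box and the function name are made up for illustration, and an imaging library would do the actual pixel work.

```python
def fit_within(width: int, height: int,
               max_width: int = 150, max_height: int = 100) -> tuple[int, int]:
    """Scale dimensions to fit a bounding box, preserving aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    # Use the tighter of the two constraints; cap at 1.0 so small logos
    # are never scaled up.
    scale = min(max_width / width, max_height / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))
```

With these assumed limits, a 600x200 logo comes out at 150x50, and an 80x60 logo is left alone.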
In writing this post, I tossed it over the pond to Josh Clark for his input. In his response, he captured one of the more succinct differences between how we (developers) look at content and how the end users do. This, too, needs to play a role in your decision (emphasis mine):
The big advantage to structuring content, of course, is that it lets you repackage it and present it in different forms and contexts. The downside is that it forces editors to approach their content like machines, thinking in terms of abstract fields that might be mixed and matched down the road. The benefits often outweigh this usability cost if you’re going to present the content elements in multiple contexts and/or offer various sorting options with a large number of elements. If not, then I typically go with unstructured.
That’s brilliant, and it’s so true. Understand that structuring content can suck the soul out of the authoring process for a lot of people. Like Josh said, often the advantages are clear enough to justify some soul-sucking, but always approach this with care.
I remember a client for whom we were building a “case studies” section of their Web site. I kept trying to get them to structure the case studies. I would say things like:
If you kept your case studies in an Excel spreadsheet, and each row was a case study, what would the column headings be?
Now, this is a good question and one that’s worked well for me in the past, but this client was just not getting it. Finally, Joe said, “Dude, I think they just want a page…” And he was right. The client wasn’t thinking in terms of structure; they were thinking in terms of a page with stuff on it. They figured they could just WYSIWYG it up, and in the end, they did it this way and they were fine.
Postscript: So, what are we going to do with our original example? At this point, I can’t say, but I’m leaning away from pure WYSIWYG because of the image processing. If we get this client on eZ publish, I imagine we’ll do it in separate records because eZ can’t repeat sub-elements within the same record. If we were to go with a CMS that allowed that, then my inclination would be to do it as a structured single record.