Varying Levels of Content Structure

By Deane Barker

I’ve talked ad nauseam on this site about the idea of content modeling – designing a content structure to correctly model your content inside your CMS. I’ve talked about how discrete content modeling differs from relational content modeling.

Just recently, however, I was struck by two other “levels” of content structure, which I’ve managed to fit into a four-level range of structure.

Going from localized structure to more global structure –

1. Intra-Property Structure

This is structure inside an identified property of a content object, generally denoted by some type of markup, be it HTML or XML.

When editing a news article in a WYSIWYG editor, for example, we divide it up into sections, and include a heading or title for each section as an H2 tag. This is clearly structure. While we would normally look at this as formatting, you can programmatically use this information to add functionality. On the client side, a jQuery plugin can turn heading tags into a table of contents, and on the server side, you could divide the article up into pages based on these headings.
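As a rough sketch of the table of contents idea (in Python rather than jQuery, and using only the standard library), the headings can be pulled out of a body property like this. The class and function names here are mine, made up for illustration, and the sample body is contrived.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collects the text of every <h2> element in a chunk of HTML."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._in_h2 = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_h2 = True
            self._buffer = []

    def handle_data(self, data):
        # Text inside nested tags (like <em>) still arrives here.
        if self._in_h2:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "h2" and self._in_h2:
            self.headings.append("".join(self._buffer).strip())
            self._in_h2 = False


def table_of_contents(body_html):
    """Return the H2 headings of a body property, in document order."""
    collector = HeadingCollector()
    collector.feed(body_html)
    collector.close()
    return collector.headings


body = (
    "<h2>Background</h2><p>Some text.</p>"
    "<h2>Why I Love the <em>Gimmel</em> Widget</h2><p>More text.</p>"
)
print(table_of_contents(body))
```

The same list of headings could just as easily drive server-side pagination: split the body wherever a heading starts.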

Other formatting-ish devices can be used to derive more formal structure. Marking a paragraph with a class of “sidebar” clearly implies some semantics to the contents – it’s not the core content, but rather some ancillary content.

HTML5 has just pushed this further – inside a WYSIWYG editor for a single property, I can now mark something as an “article,” or a “footer.” This has made formal the ability to semantically structure our content down to the markup level.

This is the property level – how raw content (literally raw words and letters) is organized inside a single property.

2. Discrete Structure

This is the structure of a single content object, formed by dividing the object up into properties. For our news article, we divide it up into a Title, a Summary, an Author, and a Body.

This level is very traditional and accepted, and really what everyone thinks of when we say “structured content.” Individual fields can be validated and accessed independently of other fields for the purposes of automation or templating.
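To make “validated and accessed independently” concrete, here is a minimal sketch of the news article as a typed object. The validation rules (a required title and author, a 200-character summary limit) are assumptions I invented for the example, not rules from any particular CMS.

```python
from dataclasses import dataclass


@dataclass
class NewsArticle:
    title: str
    summary: str
    author: str
    body: str


def validate(article, max_summary_length=200):
    """Check each property on its own and return a list of problems."""
    problems = []
    if not article.title.strip():
        problems.append("title is required")
    if len(article.summary) > max_summary_length:
        problems.append("summary is too long")
    if not article.author.strip():
        problems.append("author is required")
    return problems


article = NewsArticle(
    title="Connectors for the Whizbang 5000",
    summary="A look at the connectors.",
    author="",
    body="<h2>Overview</h2><p>Details go here.</p>",
)
print(validate(article))
```

A template can likewise pull `article.summary` for a listing page without ever touching the body.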

This is the content object level – how properties are organized inside a single content object.

3a. Relational Structure (Intra-Repository)

This is the structure of different content objects as they relate to each other in the same domain of content. Back to our news article, if we have an Author content object for Bob Smith, then the Author property on our news article might link to our Bob Smith author object, creating a relation between these two objects.
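A small sketch of that relation, assuming a toy repository that is nothing more than a dictionary keyed by object ID. The IDs, class names, and helper function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Author:
    id: str
    name: str


@dataclass
class Article:
    id: str
    title: str
    author_id: str  # a reference to an Author object, not just a name


# One repository holding both kinds of content object.
repository = {
    "author-1": Author(id="author-1", name="Bob Smith"),
    "article-1": Article("article-1", "Connectors for the Whizbang 5000", "author-1"),
    "article-2": Article("article-2", "My Theories on Widgets", "author-1"),
}


def author_of(article_id, repo):
    """Follow the relation from an article to its author object."""
    return repo[repo[article_id].author_id]


def articles_by(author_id, repo):
    """Follow the relation the other way: every article by one author."""
    return [
        obj.title
        for obj in repo.values()
        if isinstance(obj, Article) and obj.author_id == author_id
    ]


print(author_of("article-1", repository).name)
print(articles_by("author-1", repository))
```

Because the article stores a reference rather than the string “Bob Smith,” an author page listing everything Bob has written falls out of the model for free.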

I’ve talked about this numerous times before, most recently in relation to the idea that a content management system often tries to imitate a relational database (the ultimate in relational structure) with varying degrees of success.

This is the repository level – how content objects are organized within a single repository.

3b. Relational Structure (Inter-Repository)

Your completed content objects might relate to content in another repository. Say our news article written by Bob Smith is on our intranet. It’s therefore somewhat related to Bob Smith’s Active Directory record (which is also content, just not in a traditional content management system).

This link could be loose or formal. Bob Smith’s name on the byline could be a mailto link. Clicking it would pop open a new email in Outlook addressed to Bob, and clicking on Bob’s name in the email would open a dialog with all the information about him stored in Active Directory.

More formally, the link can be made more explicit via something like RDF. Our Active Directory system might expose Bob’s data via a Web service, and our intranet might have a page that pulls that data and renders an employee profile page we can get to from our news article.

Even more explicitly, Active Directory might expose a URI, and the CMS might hold our news article, which references the Active Directory URI via RDF or some other semantic linking scheme. So there is now a “hard” link between these two content objects in two different repositories.
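To show the shape of such a “hard” link, here is a sketch that models RDF-style statements as plain subject/predicate/object tuples rather than using a real RDF library. Both example.com URIs are invented, and my choice of the Dublin Core “creator” term as the predicate is an assumption; any agreed-upon vocabulary would do.

```python
# Hypothetical URIs for the two content objects in two repositories.
ARTICLE = "https://cms.example.com/news/connectors-for-the-whizbang-5000"
BOB = "https://directory.example.com/people/bob.smith"

# Predicate borrowed from Dublin Core terms (an assumption for this sketch).
CREATOR = "http://purl.org/dc/terms/creator"

# A tiny "graph": a set of (subject, predicate, object) statements.
triples = {(ARTICLE, CREATOR, BOB)}


def objects(subject, predicate, graph):
    """Return everything the subject points at via the given predicate."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


print(objects(ARTICLE, CREATOR, triples))
```

The point is that the link is a statement either system can read, rather than a name that happens to match in two places.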

This is the inter-repository level – still talking about content objects, but how they’re organized between repositories, rather than within a repository.

4. Information Concept Structure

A repository of content might be related to another repository of content, and the two might then be brought together as part of a larger information concept to which they both belong.

Say our humble news article is in a content management system inside our company as part of a larger intranet. It may be a piece of a larger construct. For instance, if it’s a news article about the new Whizbang 5000 project, then other content from other repositories might also relate. If someone is searching the organization for information on the W5000, this news article might come back, as well as 453 emails from their index, 13 wiki pages, 57 Word documents, and 19 images. All this content together from all these repositories forms a larger – albeit amorphous – construct of “Whizbang 5000 Data.”
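A toy version of that search, assuming each repository is just a list of titles and that “search” means a case-insensitive substring match. Real enterprise search is far more involved; the repository names and contents below are made up.

```python
# Hypothetical repositories, each reduced to a list of item titles.
repositories = {
    "cms": ["Connectors for the Whizbang 5000", "Cafeteria Menu"],
    "wiki": ["Whizbang 5000 assembly notes"],
    "email": ["Re: whizbang 5000 schedule", "Lunch?"],
}


def search_all(term, repos):
    """Return matching titles grouped by the repository they came from."""
    term = term.lower()
    hits = {}
    for name, titles in repos.items():
        matches = [title for title in titles if term in title.lower()]
        if matches:
            hits[name] = matches
    return hits


results = search_all("whizbang 5000", repositories)
for repo_name, titles in results.items():
    print(repo_name, titles)
```

Nothing in any one repository declares that these items belong together. The construct only exists in the combined result set.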

We’re still talking about content objects here, but now as repositories of content objects within larger information concepts.

In an attempt to put it all together, here’s a completely contrived example that pretty much works backwards from the stages explained above –

Mary needs to know why the Whizbang 5000 uses a Gimmel Widget to tighten the Whatoozit Nut.

She searches the intranet for “whizbang 5000 gimmel whatoozit.” Given the search parameters, the results are all part of a very loose construct of “Whizbang 5000 Data.”

She looks through the results, and finds some news articles in a content management system (a single repository) from the communication team about the development of the Whizbang 5000. She pages through these, and sees the title of a single article called “Connectors for the Whizbang 5000” written by a veteran company engineer named Bob Smith.

Opening this single content object, she doesn’t find what she needs, but thinks maybe the author would have written something else. She clicks on Bob’s name and goes to his author page (a different, related content object), where she finds he’s written another article called “My Theories on Widgets.” She reads the summary (a single content property) of this article, and thinks it’s close to what she wants.

Opening the article, she finds a table of contents at the top, which is built by extracting the headings from the body of the article. There’s a heading called “Why I Love the Gimmel Widget.” She clicks that heading to scroll down to the content, and, under the heading, she finds that Bob’s mother’s middle name was Gimmel.
