Database / XML Hybrid Content Management

Thoughts on Content Management: This guy and I think alike. In the beginning of the article he touches on the same things I talked about when I compared open and closed content management systems.

Then, he runs into the same problem: there are too many types of content, each storing their own stuff. So…he comes up with the same solution I’ve tried.

Treat every piece of content as the same as every other, and store it all in a single table. Preposterous? Probably. But bear in mind that there will be a common set of metadata attributes that every piece of content will have (at least in this context): a unique name or identifier (the guid), a date it was created, a title, a description. And of course, there would have to be a “body” field for the content itself. Roll those into the table structure.

The theory is to have separate fields for the common metadata, then a field for the actual content, stored in XML. I’ve done a quick-and-dirty variation of this with a table that just had a bunch of columns: field1, field 2, field3, etc. with a table that tracked what each field stored for each type of content. Yes, it’s kludge, but it worked really well.

The XML solution is much cleaner. However, the problem is that the XML field is a black box that – on most database platforms – you can’t look inside. What if you want a list of articles written by a particular author? Well, you need to use SQL to get all the XML back, spin that collection, XPath into every single one to find the value author of the author node, then keep that record it matches.

In my mind, this won’t be a totally viable solution until databases that allow XPath queries on fields are commonplace. The ability to query like this would break this whole puzzle wide open.

SELECT * FROM articles WHERE XPath(XML, /article/author) = 'bob'

Right now, I think that Oracle and SQL Server (2000 and up) are the only database platforms that allow this. It’ll be a happy, happy day when it makes it in to MySQL.

In thinking about this, I’m reminded of my post from several months ago about managing XML files in aggregate. This is just another attempt to solve that problem: use a database and the ability to do SQL/XPath queries to manage all the individual XML documents. I guess I’m not sure if we’ve solved the problem yet.

And, now that I think back, Documentum Web Publisher does this same thing: stores all the non-meta content in an XML file. Even with that, it was tough to really slice and dice the content due to the limitations described above.