How Content Is Stored

Under the covers, the actual data stored in our repository is usually just one of four types:

A text string (a series of characters)
A number
A date
A block of bytes (a binary large object, or BLOB)

These four things are primitive datatypes. (That’s a loaded term whose exact meaning varies by programming language, but it fits our general usage here.)

Almost every attribute type will convert its logical value into one of those primitive values before storing it. This happens because we don’t create a custom database schema for every possible attribute type – they all have to make do with a general schema, so their values have to be converted to a storable form.
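To make the idea of a general schema concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, the column names, and the store function are all hypothetical. They are not taken from any particular CMS, and real systems organize their storage in many different ways. The only point is that a single generic table, with one column per primitive datatype, can hold values from any attribute type.

```python
import sqlite3
from datetime import date

# Hypothetical general-purpose schema: one row per attribute value,
# with one column for each of the four primitive datatypes.
# This is an illustration, not the schema of any real CMS.
SCHEMA = """
CREATE TABLE attribute_values (
    content_id   INTEGER NOT NULL,
    attribute    TEXT    NOT NULL,
    text_value   TEXT,
    number_value REAL,
    date_value   TEXT,   -- stored as an ISO 8601 date string
    blob_value   BLOB
)
"""


def store(conn, content_id, attribute, value):
    """Place a primitive value in the column that matches its type."""
    columns = {
        "text_value": None,
        "number_value": None,
        "date_value": None,
        "blob_value": None,
    }
    if isinstance(value, str):
        columns["text_value"] = value
    elif isinstance(value, (int, float)):
        columns["number_value"] = value
    elif isinstance(value, date):
        columns["date_value"] = value.isoformat()
    elif isinstance(value, bytes):
        columns["blob_value"] = value
    else:
        # Anything else must be serialized to a primitive first.
        raise TypeError(f"not a primitive value: {type(value).__name__}")
    conn.execute(
        "INSERT INTO attribute_values VALUES (?, ?, ?, ?, ?, ?)",
        (content_id, attribute, *columns.values()),
    )


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
store(conn, 1, "title", "Hello, world")
store(conn, 1, "rating", 4.5)
store(conn, 1, "published", date(2020, 1, 15))
store(conn, 1, "thumbnail", b"\x89PNG")
```

Notice that the store function rejects anything that isn't already a primitive. In this sketch, it is the attribute type's job to reduce its logical value to a string, number, date, or block of bytes before storage.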

This is a fundamental theme of CMS in general – we’re storing highly custom data in a generalized system. If we created a custom storage system for everything then we wouldn’t need a CMS. Indeed, our desire not to do this is why we have CMSs in the first place.

A CMS is, in many ways, just a customization or an extension to an underlying datastore. Every CMS is backed by some datastore – usually a relational database. Any database will store information. What makes it a content management system is the set of services it provides over and above simple storage.

We’ll discuss the idea of a content service model in a later chapter.

The concept of converting a more complex data structure to a simpler one for storage is called serialization, while the process of restoring the complex structure from the simpler value is deserialization.

For example, if we need to edit and store a set of map coordinates, we will likely give our editors a visual map interface from which to pick a location. When they do, the CMS will serialize that logical value – the set of coordinates, and perhaps the zoom level the editor was at when they picked it – into something simpler. The actual value stored in the database is just a string of structured text, like this (XML, in this example):

<coordinates lat="43.55" long="-96.7" zoom="1.5"/>

Next time the editorial interface for this attribute needs to render and populate with the stored value, the CMS deserializes that XML back into the logical location value, and uses it to position the map in the editing interface.
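The round trip just described might be sketched like this in Python, using the standard library's XML module. The MapLocation class and the serialize and deserialize functions are hypothetical names chosen for illustration. A real CMS would have its own classes and might well prefer a different text format, such as JSON.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class MapLocation:
    """Hypothetical logical value behind a map-picker attribute."""
    lat: float
    long: float
    zoom: float


def serialize(location: MapLocation) -> str:
    """Reduce the logical value to a primitive: a string of XML."""
    element = ET.Element("coordinates", {
        "lat": str(location.lat),
        "long": str(location.long),
        "zoom": str(location.zoom),
    })
    return ET.tostring(element, encoding="unicode")


def deserialize(stored: str) -> MapLocation:
    """Rebuild the logical value from the stored string."""
    element = ET.fromstring(stored)
    return MapLocation(
        lat=float(element.get("lat")),
        long=float(element.get("long")),
        zoom=float(element.get("zoom")),
    )


# What the editor picked on the map...
picked = MapLocation(lat=43.55, long=-96.7, zoom=1.5)
# ...what actually goes into the database...
stored = serialize(picked)
# ...and what the editing interface gets back next time.
restored = deserialize(stored)
```

The database only ever sees the string in stored. Everything map-specific lives in the two functions on either side of it.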

The point of this chapter is that many attribute types are just fancy editorial interfaces for arriving at one of the primitive datatypes that can be stored.

Another example: a CMS might offer a “Number” attribute type, into which an editor can enter a number. It might also offer a “Slider” field which lets the editor drag a slider to the right or left to increase or decrease a numeric value. Both of these attributes store primitive numbers; they differ only in the interface they offer editors to express the desired value.
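Here is a rough sketch of that idea in Python. The NumberField and SliderField classes and their to_stored methods are hypothetical, as is the assumption that a slider reports its position as a fraction between 0.0 (far left) and 1.0 (far right). What matters is that both attribute types hand the same kind of primitive to the repository.

```python
from dataclasses import dataclass


class NumberField:
    """Hypothetical attribute type: the editor types a number into a box."""

    def to_stored(self, editor_input: str) -> float:
        # Raises ValueError if the text is not a valid number.
        return float(editor_input.strip())


@dataclass
class SliderField:
    """Hypothetical attribute type: the editor drags a slider."""
    minimum: float
    maximum: float

    def to_stored(self, position: float) -> float:
        # position is assumed to be the fraction of the track covered,
        # from 0.0 (far left) to 1.0 (far right).
        if not 0.0 <= position <= 1.0:
            raise ValueError("slider position must be between 0.0 and 1.0")
        return self.minimum + position * (self.maximum - self.minimum)


typed = NumberField().to_stored(" 42 ")
dragged = SliderField(minimum=0, maximum=100).to_stored(0.5)
```

Both typed and dragged end up as plain floating-point numbers. Nothing in the stored value records which interface produced it.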

To restate the note from the beginning of the chapter, this information certainly isn’t necessary to work with a content model or a CMS. In fact, CMSs go to great lengths to hide all of this from you. A CMS works very hard to make storing content seem like magic, and to avoid exposing you to the gory details of how it gets done.