Archiving and Retention in Content Cloud
Optimizely has a flexible set of tools to help you implement your organization’s archival and retention policies.
There are probably few features less interesting to most people than archival and retention. This is the practice of storing old versions of content so you can find them later if an auditor asks for them.
The excitement level here is a little low.
Then again, that’s kind of the point.
The truth about archiving is that when you need it, you really need it. Lots of organizations exist under strict archival policies, and the functionality for this is going to be verified by someone who verifies things for a living.
I maintain that archiving should be boring like a trip to the dentist is boring. You absolutely want your trips to the dentist to be boring. An “exciting” trip to the dentist is rarely a good thing.
Content Cloud – like most modern CMSs – will “version” content. Meaning, when you change content and save those changes, it doesn’t save on top of the existing content. Rather, Optimizely saves a list of your content changes over time.
You can view different versions side-by-side to see what’s different between two of them. You can “roll back” to a prior version if something went wrong, or you can take a prior version, create a new draft from it, edit it, then publish it.
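To make the versioning idea concrete, here is a minimal sketch in Python. It is a toy model, not Optimizely's actual API: the `Version` and `VersionedContent` names and the `rollback_to` method are hypothetical. The point it shows is that saving appends to a history instead of overwriting, and that rolling back creates a new version from an old one.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Version:
    number: int
    body: str


@dataclass
class VersionedContent:
    """Hypothetical model of versioned content; not an Optimizely type."""
    content_id: int
    versions: list = field(default_factory=list)

    def save(self, body):
        # Saving never overwrites: it appends a new version to the history.
        version = Version(len(self.versions) + 1, body)
        self.versions.append(version)
        return version

    def rollback_to(self, number):
        # Rolling back copies an old version forward as a new one,
        # so the history itself is never rewritten.
        prior = self.versions[number - 1]
        return self.save(prior.body)


page = VersionedContent(content_id=42)
page.save("Hello")
page.save("Hello, world")
restored = page.rollback_to(1)
```

After the rollback, the page has three versions: the two original saves plus a third that repeats the body of the first.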
But archival is not quite the same thing.
First, when you delete content in Optimizely, as in most other CMSs, all versions of that content are deleted with it. Delete is delete, and delete is forever (…almost, see below). If you just want to remove content from your channel, you might actually be wiping out every historical record of it. This is a problem for most archiving and retention policies.
Second, some archival policies require the archives to be moved to a location that’s less accessible than a past version – they need to be moved to “warm” or “cold” storage. It’s not enough to save versions alongside the content that’s being edited.
All that said, here’s what Optimizely provides that customers can use to fulfill their archiving policies:
At the most basic level, Content Cloud has a permissions model that lets you simply prevent editors from deleting content. These editors can edit the content to “unpublish” it (which simply sets the Expiration Date on the content to that moment), but they can’t remove it.
Additionally, Content Cloud doesn’t actually “delete” from the user interface. From the UI, the option is called “Move to Trash.”
From there, Content Cloud has a Trash Bin where deleted content goes and can be recovered. But what’s handy is that the Trash Bin is just another node in the system and is assigned permissions like anything else. It’s quite simple to allow editors to “trash” content, but not allow them to view or recover content from the Trash Bin. If this is the case, they can’t empty the Trash Bin either, meaning they can’t permanently remove content from the system without an administrator’s help.
There is a scheduled job to automatically remove content over 30 days old from the Trash Bin. You don’t have to run this job (you can “de-schedule” it directly from the UI), or you can implement your own job with longer timeframes.
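If you did write your own job with a longer timeframe, the core logic is small. The sketch below is Python and purely illustrative: the `purge_trash` function and the `trashed_at` field are hypothetical names, and it assumes age is measured from the moment content was moved to the Trash Bin.

```python
from datetime import datetime, timedelta


def purge_trash(trash, now, max_age=timedelta(days=30)):
    """Split trashed items into (kept, purged) by time spent in the trash."""
    kept, purged = [], []
    for item in trash:
        age = now - item["trashed_at"]
        (purged if age > max_age else kept).append(item)
    return kept, purged


now = datetime(2024, 6, 30)
trash = [
    {"id": 1, "trashed_at": datetime(2024, 5, 1)},   # 60 days in the trash
    {"id": 2, "trashed_at": datetime(2024, 6, 20)},  # 10 days in the trash
]

# Default 30-day window: item 1 is purged, item 2 is kept.
kept, purged = purge_trash(trash, now)

# A one-year retention window purges nothing yet.
kept_longer, purged_longer = purge_trash(trash, now, max_age=timedelta(days=365))
```

Making the window a parameter is the whole design choice here: the retention period comes from your policy, not from the job.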
(Of course, if you don’t like how any of this works, you can customize the system to pre-empt it, because Content Cloud is customizable down to its core. An event is raised just before content is deleted. It’s quite simple to capture this event, do something entirely different with the content, then cancel the original event. More on this below.)
Content in Content Cloud can be scheduled to publish and to expire. When content is expired, it no longer appears on your site, in search results, or in any API call.
That’s fairly standard CMS behavior, but you can also specify an archive location for content (another location in the larger content structure), and the content will be moved to this location when it expires. The permissions on this location can be adjusted so editors cannot view the content there. In this sense, it’s like the Trash Bin, but it coincides with content expiration.
In addition to tracking content, Content Cloud can track editor behavior. The system has a change log that tracks just about everything that happens in the system, including changes to content, content deletions, project actions, approval actions, etc. This goes beyond moving content to the Trash Bin: even if you empty the Trash Bin and try to remove every remnant of the content, a record of it and all actions taken on it will still exist in the change log.
Plus, the change log has a full API. It would be quite easy to push all of its records to an external location via a timed batch job.
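As a rough illustration of what such a batch job might do, here is a Python sketch. It does not call the real log API: the log is modeled as a plain list of dictionaries, and `export_new_entries`, the `id` field, and the in-memory `sink` are all hypothetical stand-ins. The idea it shows is a "high-water mark", so each run exports only the records added since the last run.

```python
import io
import json


def export_new_entries(entries, last_exported_id, sink):
    """Write entries newer than last_exported_id to sink as JSON lines.

    Returns the new high-water mark to store for the next run.
    """
    new_entries = sorted(
        (e for e in entries if e["id"] > last_exported_id),
        key=lambda e: e["id"],
    )
    for entry in new_entries:
        sink.write(json.dumps(entry, sort_keys=True) + "\n")
    return new_entries[-1]["id"] if new_entries else last_exported_id


log = [
    {"id": 1, "action": "published", "content_id": 42},
    {"id": 2, "action": "moved_to_trash", "content_id": 42},
]
sink = io.StringIO()  # stand-in for the external location

# First run exports everything; the second exports only the new record.
mark = export_new_entries(log, 0, sink)
log.append({"id": 3, "action": "trash_emptied", "content_id": 42})
mark = export_new_entries(log, mark, sink)

lines = sink.getvalue().splitlines()
```

In a real job, the mark would be persisted between runs and the sink would be whatever external store your policy requires.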
Finally, as we discussed a bit above, Content Cloud has an event-based API which makes it easy to capture anything that happens to content during its lifecycle. From cradle to grave, a content object has multiple places where code can be “hooked.” For example:
- Content Created
- Content Saved
- Content Published
- Content Moved
- Content Deleted
At any time, a developer can inject their own code into the Optimizely process and do whatever they like with the content. Events taking place before the action in question can also be canceled.
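The “capture the event, do something else, then cancel” pattern can be sketched in a few lines. This is Python and a toy event model, not Optimizely's event API: `ContentEvent`, `ContentEvents`, and the `cancel` flag are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ContentEvent:
    """Hypothetical event arguments passed to 'before delete' handlers."""
    content_id: int
    cancel: bool = False
    cancel_reason: str = ""


class ContentEvents:
    """Toy event source: runs handlers before deleting, and honors cancel."""

    def __init__(self):
        self.deleting_handlers = []
        self.deleted = []

    def delete(self, content_id):
        event = ContentEvent(content_id)
        for handler in self.deleting_handlers:
            handler(event)
            if event.cancel:
                return False  # a handler vetoed the delete
        self.deleted.append(content_id)
        return True


archived = []


def archive_instead_of_delete(event):
    # Do something entirely different with the content, then cancel.
    archived.append(event.content_id)
    event.cancel = True
    event.cancel_reason = "Archived instead of deleted"


events = ContentEvents()
events.deleting_handlers.append(archive_instead_of_delete)
result = events.delete(42)
```

Here the delete never happens: the handler records the content ID in `archived` and cancels, so `events.deleted` stays empty.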
To prove this, I spent 15 minutes writing a proof-of-concept that did this:
- Captured the action of content being published
- In a separate execution space (so it didn’t slow down the UI or cause any problems), it turned that content object into text (I used XML, but you could write it to whatever format you wanted).
- It took that text and pushed it into an Azure File Storage account.
- It created a directory for the content ID, then wrote the file under the version number of the content. This means that nothing would be overwritten: there is a separate file for every version of every piece of content that was ever published.
- That file would stay there forever. The file has no knowledge of anything that happens to that content in the future, and Azure doesn’t even know where the file came from. It’s completely disconnected from Content Cloud. Your CMS could vanish entirely, and all those content files will still be sitting in Azure storage.
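The steps above can be sketched roughly as follows. This is not the proof-of-concept itself. It is a Python illustration under stated assumptions: a local temporary directory stands in for the Azure File Storage account, a thread pool stands in for the “separate execution space,” and `to_xml`, `archive_version`, and `on_published` are hypothetical names.

```python
import copy
import tempfile
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def to_xml(content):
    """Turn a content dictionary into an XML string."""
    root = ET.Element("content", id=str(content["id"]), version=str(content["version"]))
    for name, value in content["properties"].items():
        ET.SubElement(root, "property", name=name).text = str(value)
    return ET.tostring(root, encoding="unicode")


def archive_version(root_dir, content):
    """Write <root_dir>/<content id>/<version>.xml, refusing to overwrite."""
    folder = Path(root_dir) / str(content["id"])
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / f"{content['version']}.xml"
    # Mode "x" raises FileExistsError if this version was already archived.
    with open(target, "x", encoding="utf-8") as f:
        f.write(to_xml(content))
    return target


executor = ThreadPoolExecutor(max_workers=1)


def on_published(root_dir, content):
    # Snapshot the content, then archive off the calling thread so the
    # publish action is not slowed down by the write.
    return executor.submit(archive_version, root_dir, copy.deepcopy(content))


archive_root = tempfile.mkdtemp()  # stand-in for the remote file share
future = on_published(
    archive_root, {"id": 42, "version": 3, "properties": {"Title": "Hello"}}
)
path = future.result()
```

Opening the file in exclusive-create mode is one way to enforce the “nothing is overwritten” rule: archiving the same version twice fails instead of silently replacing the earlier file.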
Again, this was 15 minutes of work, made possible by how flexible Optimizely’s API is. This system (which could no doubt be improved) will run in the background of my Optimizely installation, quietly archiving every published version of content to offline storage. Editors won’t even know it happened. The archive file can just sit there until needed, or any number of Azure processes could act on it.
I’ll refer back to my earlier point: archival requirements are very specific to organizations. Content Cloud and other tools have the development flexibility to effectively implement any strategy you might need.