Eval Criteria # 20

How can types be changed after object creation?

Content models are foundational. They’re literally the framework on which your domain of content is built. And since every content object is based on a type, this means that some types might have dozens, hundreds, or even thousands of content objects that have been created from them.

Each of these content objects “fleshes out” the base skeleton of information that the type represents. Layered on the type’s collection of attributes are the actual values, which comprise that unique object of content.

This means that changing a type after content has been created from it is a lot like changing the foundation of a house after the walls and floors have been built on top of it. If the foundation defines the shape of the house, what do you do when that shape changes?

Common Type Changes

Changing a type might involve:

Adding a new attribute
Renaming an existing attribute
Splitting an attribute into two or more new attributes
Changing the attribute type of an existing attribute
Deleting an attribute

Adding an attribute is generally not problematic, because you have no existing attribute values to “protect.” The new attribute just adds content alongside the existing attributes.

Renaming an existing attribute can be an issue, depending on the system. Ideally this is a simple operation, but some systems were never designed to allow it. We discussed earlier that attributes have an internal name, and in some systems, this is sacrosanct and can’t be changed once the type is defined.

If you can change an attribute name, you need to determine how these changes need to ripple through the rest of your integration and templating code. The name of an attribute is likely a unique identifier at some level, and is therefore used to refer to that attribute throughout an implementation. Sometimes, the only way to find all these references is to set up a duplicate of your production environment, change the attribute name, and see what breaks.

Splitting an attribute falls under a discipline called migration. A migration is the transfer of data from one storage scheme to another.

In most cases, this is accomplished using the API of the CMS to read data, manipulate it, and rewrite it back in a new format. Clearly, this will require developer support.

There’s always some element of danger here since you risk leaving your content in an invalid state, but this can be mitigated by good testing and backup practices. Additionally, it’s perfectly acceptable to create new attributes while leaving the deprecated attribute in place.

For example, say you have a Name attribute which you want to split into First Name and Last Name. Using the API of the CMS, you can read each Name value, split it into the two new values, and save the results back to First Name and Last Name, _while preserving the original Name_. Upon completion, you can test the result before finally removing Name. Or, you can simply disable Name and perhaps hide it from the editorial interface, preserving the data in case a problem is discovered later.
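The Name example can be sketched as a small migration script. This is not any real CMS API: `ContentObject` and `split_name` are hypothetical stand-ins for whatever read and save calls your system exposes, and splitting on the first space is a deliberately naive assumption. The important part is that Name is left in place and questionable results are flagged for review.

```python
from dataclasses import dataclass, field


@dataclass
class ContentObject:
    """Hypothetical stand-in for a CMS content object: an ID plus attribute values."""
    id: int
    attributes: dict[str, str] = field(default_factory=dict)


def split_name(objects: list[ContentObject]) -> list[int]:
    """Copy each Name value into First Name and Last Name, leaving Name in place.

    Returns the IDs of objects whose Name did not split cleanly, so an editor
    can review them before Name is disabled or removed.
    """
    needs_review = []
    for obj in objects:
        name = obj.attributes.get("Name", "").strip()
        # Naive rule (an assumption): everything before the first space is the
        # first name, everything after it is the last name.
        first, _, last = name.partition(" ")
        last = last.strip()
        obj.attributes["First Name"] = first
        obj.attributes["Last Name"] = last
        if not first or not last:
            needs_review.append(obj.id)
    return needs_review


objects = [
    ContentObject(1, {"Name": "Ada Lovelace"}),
    ContentObject(2, {"Name": "Cher"}),
]
print(split_name(objects))   # [2]
print(objects[0].attributes)
# {'Name': 'Ada Lovelace', 'First Name': 'Ada', 'Last Name': 'Lovelace'}
```

Because Name is never overwritten, the script can be rerun after the splitting rule is improved, and object 2 can be fixed by hand before Name is hidden or deleted.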

Migrations like this are normally a code-level operation. It would be difficult to encapsulate this functionality in a UI, since the required data manipulation can vary widely from one migration to the next. Additionally, the migration might be a long-running operation, sometimes taking several minutes or more depending on volume, which can be difficult to manage from a web-based UI.
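One common way to cope with long-running migrations is to run them as a script outside the web UI, in batches, with a checkpoint of completed objects so an interrupted run can resume. The sketch below assumes the per-object work (`migrate_one`, hypothetical) is safe to repeat, and keeps the checkpoint in memory purely for illustration.

```python
from typing import Callable, Iterator


def batches(ids: list[int], size: int) -> Iterator[list[int]]:
    """Yield consecutive slices of ids, each at most `size` long."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def run_migration(ids: list[int], migrate_one: Callable[[int], None],
                  done: set[int], batch_size: int = 100) -> None:
    """Run migrate_one on every ID not already recorded in `done`.

    `done` is the checkpoint. In a real script it would be persisted (to a
    file or table) after each batch, so that rerunning an interrupted job
    skips objects that were already migrated.
    """
    pending = [i for i in ids if i not in done]
    for batch in batches(pending, batch_size):
        for object_id in batch:
            migrate_one(object_id)
            done.add(object_id)
        print(f"migrated {len(done)} of {len(ids)}")


# Usage: pretend objects 0-249 exist and 0-99 were migrated before a crash.
migrated: list[int] = []
checkpoint = set(range(100))
run_migration(list(range(250)), migrated.append, checkpoint, batch_size=100)
# prints "migrated 200 of 250", then "migrated 250 of 250"
```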

Changing the underlying type of an attribute can be very difficult. Remember that an attribute’s value is stored as a primitive value in the underlying data storage system, serialized from its logical type. Every attribute type serializes differently, so there is very little chance your new attribute type will deserialize the existing primitive in any sensible way. If both attribute types store simple numbers or unstructured strings, it might work, but test this carefully.
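To make the serialization problem concrete, here is a toy model in which every attribute type stores its value as a string. The serializer functions are hypothetical; real systems use their own formats. A date retyped as an integer fails outright, while an integer retyped as a decimal happens to work because both store plain numbers.

```python
from datetime import date
from typing import Callable


# Hypothetical serializers: every attribute type stores its value as a string.
def serialize_date(value: date) -> str:
    return value.isoformat()


def deserialize_integer(raw: str) -> int:
    return int(raw)


def deserialize_decimal(raw: str) -> float:
    return float(raw)


def try_retype(raw: str, deserialize: Callable[[str], object]) -> str:
    """Report whether a stored primitive can be read by a different attribute type."""
    try:
        return f"ok: {deserialize(raw)!r}"
    except ValueError as err:
        return f"failed: {err}"


# A Date attribute retyped as Integer: the stored "2021-03-04" is meaningless to it.
print(try_retype(serialize_date(date(2021, 3, 4)), deserialize_integer))
# failed: invalid literal for int() with base 10: '2021-03-04'

# An Integer attribute retyped as Decimal: both store plain numbers.
print(try_retype("42", deserialize_decimal))
# ok: 42.0
```

Note that a successful parse only proves the primitive was readable; it does not prove the value means the same thing under the new type, which is why careful testing still matters.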

For this reason, attribute type changes are rarely supported. Normally, you need to delete an attribute – and lose all its data – then create a new attribute with the same name.

Deleting an attribute is just as dangerous as it sounds – when the attribute goes away, all the data stored in that attribute will likely disappear immediately. And while many systems provide content versioning, delete is still delete.

If you delete an attribute, there’s little chance its values will still be available in prior versions. Some systems might have a separate “archiving” subsystem from which more complete, historical data is retrievable, but since there is no longer an attribute to load that value into, this would likely be a manual recovery operation.

Type Conversions

Occasionally, you’ll want to swap types – take some existing content objects, and change the type that defines their structure.

Normally, you wouldn’t do this with all content objects of a type. If you have an Article type, then create a Blog Post type and decide to convert all 1,363 Article content objects to it, you have to ask why you didn’t just modify the Article type. If you’re moving all the content objects of a type into another type, what’s the point of keeping the first type?

Usually, you want to move a subset: you isolate a specific set of objects and associate them with a new type.

Say you’ve been using an Article type for the CEO’s blog posts, but now the CEO wants to store an attribute indicating the Mood they were in when they wrote it. You don’t want to clutter up Article with that extra attribute, so you decide to create a new type specifically for Blog Post. Of the aforementioned 1,363 Article objects, there are 29 of them that represent blog posts by the CEO, so you just need to swap the types of those 29 and leave the other 1,334 alone.

How much trouble this causes depends on how different the two type structures are. In our example, there’s no problem because the new target type has a matching attribute for everything on the old source type (plus one extra attribute, which will initially be empty).

Consider the inverse. Our CEO uses the new Blog Post type for a while (say, 16 more posts), then abandons the idea of storing the Mood. To simplify our model, you want to turn those 16 objects back into Article objects.

The problem is that those 16 objects now have a Mood attribute, inherited from the source Blog Post type, for which _no matching attribute exists on the target Article_. What do you do with it?

In this case, you can probably just throw it away, since getting rid of it was the whole idea. In other cases, you might be confronted with a dozen attributes on the source type that have no “place to land” on the target type, and suddenly reconsider switching types in the first place. Imagine a really unpleasant game of musical chairs: the music stops, and someone doesn’t have a place to sit down anymore.
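The “place to land” check can be sketched as a set difference between source and target attributes. Matching purely by name is a simplifying assumption here (real systems usually also compare attribute types, as discussed below), and `plan_attribute_mapping` is a hypothetical helper that refuses the conversion unless every homeless attribute has been explicitly accepted as a loss.

```python
def plan_attribute_mapping(source: set[str], target: set[str],
                           acknowledged_discards: set[str]) -> dict[str, str]:
    """Map source attribute names onto target attribute names.

    Matching is by name only (a simplifying assumption). Any source attribute
    with no place to land on the target must be listed in
    acknowledged_discards; otherwise the conversion is refused.
    """
    unmatched = source - target
    unacknowledged = unmatched - acknowledged_discards
    if unacknowledged:
        raise ValueError(f"no target attribute for: {sorted(unacknowledged)}")
    return {name: name for name in sorted(source & target)}


article = {"Title", "Body"}
blog_post = {"Title", "Body", "Mood"}

# Article -> Blog Post: everything lands; Mood simply starts out empty.
print(plan_attribute_mapping(article, blog_post, set()))
# {'Body': 'Body', 'Title': 'Title'}

# Blog Post -> Article: Mood has nowhere to go until someone accepts losing it.
try:
    plan_attribute_mapping(blog_post, article, set())
except ValueError as err:
    print(err)  # no target attribute for: ['Mood']
print(plan_attribute_mapping(blog_post, article, {"Mood"}))
```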

The type conversion interface in Episerver. The user has selected a tree branch under which they’d like to convert all Simple Text objects into Update objects. The dropdowns for each attribute let the user match a source attribute to a target attribute. Note that two source attributes, Main Body and Summary, have matching target attributes, but three others do not, and the editor is being forced to acknowledge that they’re prepared to throw them away.

Episerver allows you to do a “test conversion.” It will run a mock conversion and show you the results – what objects would change, and what properties would convert. You can decide after that if you want to go ahead with the actual conversion.
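The same “test first” idea can be applied to a hand-written conversion script. The sketch below is not Episerver’s implementation; `ContentObject` and `convert` are hypothetical. It selects a subset of objects with a predicate (such as the CEO’s blog posts from the earlier example), reports which attributes each would lose, and changes nothing unless `dry_run` is turned off.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ContentObject:
    """Hypothetical content object: ID, type name, and attribute values."""
    id: int
    type_name: str
    attributes: dict[str, str] = field(default_factory=dict)


def convert(objects: list[ContentObject], select: Callable[[ContentObject], bool],
            target_type: str, target_attributes: set[str],
            dry_run: bool = True) -> list[tuple[int, list[str]]]:
    """Convert the selected objects to target_type.

    Returns (object ID, attributes that would be dropped) for each selected
    object. With dry_run=True nothing is modified, so the report can be
    reviewed before committing to the real conversion.
    """
    report = []
    for obj in objects:
        if not select(obj):
            continue
        dropped = sorted(set(obj.attributes) - target_attributes)
        report.append((obj.id, dropped))
        if not dry_run:
            obj.type_name = target_type
            obj.attributes = {name: value for name, value in obj.attributes.items()
                              if name in target_attributes}
    return report


def is_blog_post(obj: ContentObject) -> bool:
    return obj.type_name == "Blog Post"


content = [
    ContentObject(1, "Blog Post", {"Title": "Q3 update", "Mood": "upbeat"}),
    ContentObject(2, "Article", {"Title": "Office move"}),
]

print(convert(content, is_blog_post, "Article", {"Title", "Body"}))
# [(1, ['Mood'])], and object 1 is still a Blog Post
convert(content, is_blog_post, "Article", {"Title", "Body"}, dry_run=False)
print(content[0])
# ContentObject(id=1, type_name='Article', attributes={'Title': 'Q3 update'})
```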

Each system has its own method of determining source and target attribute matching and compatibility. The name of the attribute usually signals the intention that these are compatible attributes, and then a secondary comparison is made against attribute type.

How a system decides what constitutes attribute type compatibility is specific to the system. You might end up with some interesting values post-conversion, depending on how the new type deserializes the primitive value – for example, an image type might convert to a text type by simply storing the file path to the image.
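A sketch of that two-step matching, first by name and then by attribute type, might look like this. The compatibility table is entirely hypothetical; each system defines its own rules. The `("image", "text")` entry mirrors the file-path example above.

```python
from typing import Any, Callable

# Hypothetical compatibility table: (source type, target type) -> converter.
# Real systems define their own rules; these entries are illustrative only.
CONVERTERS: dict[tuple[str, str], Callable[[Any], Any]] = {
    ("text", "text"): lambda value: value,
    ("integer", "decimal"): float,
    ("image", "text"): lambda image: image["path"],  # only the file path survives
}


def match_attributes(source: dict[str, str], target: dict[str, str]
                     ) -> tuple[dict[str, Callable[[Any], Any]], list[str]]:
    """Pair attributes by name first, then check attribute type compatibility.

    source and target map attribute name -> attribute type. Returns the
    converters for compatible pairs, plus the names that exist on both types
    but have no known conversion. Names missing from target are ignored here.
    """
    matches: dict[str, Callable[[Any], Any]] = {}
    rejected: list[str] = []
    for name, source_type in source.items():
        if name not in target:
            continue
        converter = CONVERTERS.get((source_type, target[name]))
        if converter is None:
            rejected.append(name)
        else:
            matches[name] = converter
    return matches, rejected


source = {"Title": "text", "Photo": "image", "Published": "date"}
target = {"Title": "text", "Photo": "text", "Published": "integer"}
matches, rejected = match_attributes(source, target)
print(sorted(matches), rejected)  # ['Photo', 'Title'] ['Published']
print(matches["Photo"]({"path": "/media/ceo.jpg", "alt": "The CEO"}))  # /media/ceo.jpg
```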

Metadata Retention and Type Conversions

In many cases, a “conversion” is actually a “delete and re-creation.” You’re not actually converting an object in place, but rather deleting it and swapping it with a new object to which you’ve transferred some data. This can cause some subtle issues.

Aside from the actual attribute values, what is transferred “with” the source object in a type conversion? There’s no way to generalize this, but consider that a content object carries information beyond the content encapsulated in its attribute values. Depending on the system, some of the following information might constitute additional “baggage” on a content object:

Unique identifier
Versioning history
Permissions
URL addressability
Template assignment
Publication status

We talked about the difference between “content” and “content-ish” in an earlier chapter.

There may or may not be parallels from source type to target type. If you try to convert a non-URL-addressable content object to one that should have a URL, does one automatically get assigned? If you convert an object that has existed for years and has dozens of edits, do you lose its entire version history?
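To see why delete-and-recreate matters, consider a purely hypothetical model of a system that implements conversion that way. It says nothing about how any particular CMS behaves; it only shows that anything not explicitly copied (ID, version history, permissions, URL) is silently reset on the new object.

```python
import itertools
from dataclasses import dataclass, field
from typing import Optional

_ids = itertools.count(1)  # hypothetical ID generator


@dataclass
class ContentObject:
    type_name: str
    attributes: dict[str, str]
    id: int = field(default_factory=lambda: next(_ids))
    versions: list[dict[str, str]] = field(default_factory=list)
    permissions: set[str] = field(default_factory=set)
    url: Optional[str] = None


def convert_by_recreation(old: ContentObject, target_type: str,
                          keep: set[str]) -> ContentObject:
    """Model a 'conversion' that is really delete-and-recreate.

    Only the attribute values named in `keep` are copied. The ID, version
    history, permissions, and URL are not carried over; the caller would
    have to copy each of them explicitly, if the system even allows it.
    """
    values = {name: value for name, value in old.attributes.items() if name in keep}
    return ContentObject(type_name=target_type, attributes=values)


old = ContentObject("Blog Post", {"Title": "Q3 update", "Mood": "upbeat"},
                    versions=[{"Title": "Q3 draft"}], permissions={"ceo-office"},
                    url="/blog/q3-update")
new = convert_by_recreation(old, "Article", keep={"Title"})
print(old.id, new.id)                          # 1 2
print(new.versions, new.permissions, new.url)  # [] set() None
```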

Every system is different. You need to test for specific use cases.

In tree-based systems you might have the added complication of type restrictions.

You might have a Quiz type that only allows Question children. If this is the case, you’d hope the system would prevent you from turning any Question objects into Article objects, since that would violate the type constraints of the tree.

The same is true for parents. If you attempt to convert a Quiz into an Article, but Article doesn’t allow Question child objects, then that conversion should be refused.
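Both checks, the parent accepting the new type and the new type accepting the existing children, can be sketched against a table of allowed child types. The table and the `conversion_problems` helper are hypothetical; they simply encode the Quiz and Question example.

```python
from typing import Optional

# Hypothetical tree rules: parent type -> child types it allows.
ALLOWED_CHILDREN: dict[str, set[str]] = {
    "Section": {"Article", "Quiz"},
    "Quiz": {"Question"},
    "Article": set(),
    "Question": set(),
}


def conversion_problems(parent_type: Optional[str], child_types: list[str],
                        target_type: str) -> list[str]:
    """List the tree rules a conversion to target_type would break (empty if none)."""
    problems = []
    # The object's parent must accept the new type as a child...
    if parent_type is not None:
        if target_type not in ALLOWED_CHILDREN.get(parent_type, set()):
            problems.append(f"{parent_type} does not allow {target_type} children")
    # ...and the new type must accept every existing child.
    allowed = ALLOWED_CHILDREN.get(target_type, set())
    for child in sorted(set(child_types)):
        if child not in allowed:
            problems.append(f"{target_type} does not allow {child} children")
    return problems


# A Question inside a Quiz cannot become an Article.
print(conversion_problems("Quiz", [], "Article"))
# ['Quiz does not allow Article children']

# A Quiz with Question children cannot become an Article either.
print(conversion_problems("Section", ["Question", "Question"], "Article"))
# ['Article does not allow Question children']
```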

Type restrictions on referential attributes should also prevent invalid changes. If an Author attribute only accepts Author objects and links to a particular Author object, you should be prevented from changing the type of that object.
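A minimal sketch of that referential check, assuming the system can list every inbound reference to an object along with the types each referencing attribute accepts. `Reference` and `blocking_references` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    """A link held in a referential attribute (hypothetical model)."""
    from_id: int
    attribute: str
    to_id: int
    allowed_types: frozenset[str]


def blocking_references(object_id: int, target_type: str,
                        references: list[Reference]) -> list[Reference]:
    """Return inbound references whose type restriction the conversion would violate."""
    return [ref for ref in references
            if ref.to_id == object_id and target_type not in ref.allowed_types]


# Article 10 links to object 7 through an Author attribute restricted to Author objects.
refs = [Reference(from_id=10, attribute="Author", to_id=7,
                  allowed_types=frozenset({"Author"}))]
print(blocking_references(7, "Article", refs))  # the Author link blocks the change
print(blocking_references(7, "Author", refs))   # []
```

A non-empty result means the conversion should be refused, or the referencing objects fixed first.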

Type conversions can get tricky for all the reasons mentioned above. They’re generally something done only for significant site re-organization in conjunction with a developer, and with very recent backups standing by in case something goes wrong.

In some cases, it’s not enough to just convert a type, and more complicated processing needs to be done. In these cases, a developer might just have to manually script the conversion to manipulate the data in a fully featured programming language.

When doing extensive type conversions, the only universal advice is to test, test, test…and then test some more.

Set up a parallel instance and do deep regression testing to guard against subtle effects that might surface even after a “successful” conversion.

Evaluation Questions

What capabilities exist to change types after creation? Can attributes be added or deleted?
Can attribute types be changed after creation? How are the attribute values converted?
Are content type definitions versioned? Is an object connected to a specific version of the definition of its type?
Can content objects have their underlying content types changed after creation? If so, how are attributes mapped between the source and target types? What happens to source attributes that don’t exist on the target type?
If an attribute is deleted, does its value exist in any prior versions of affected content objects?
What information from an object is discarded during a type change? Does the object retain its version history? Permissions? Unique ID?
In tree-based systems with typing restrictions, are these restrictions enforced when evaluating type changes?