Eval Criteria # 21

How does the system model file assets?

So far, we’ve been talking about text-based content. The attribute values we’ve discussed are storable as strings of characters – even the ones that represent more complicated content structures.

But not everything is text, and you’ll eventually need to store file assets: PDF documents, images, video, etc. Are these modeled as content, and if so, how does their modeling differ from the text-based content we’ve discussed?

Assets as Content

The first question is whether assets are modeled as content at all. Older systems usually didn’t consider assets to be managed content.

It was typical for these systems to just offer an interface to manage a directory structure of files on the server’s file system. Content objects could establish links to these files, and the files were exposed by the web server and directly URL addressable.

Unfortunately, this meant that assets didn’t get the benefits of full content objects. They couldn’t hold any other information, were rarely versioned, didn’t have workflow, and usually had some alternative, simpler permission system.

Over time, in many systems, assets were slowly changed into “true” content, and now it’s more common to find systems that consider an asset the same as a regular content object, and thus worthy of all the benefits of content object status. There are some differences in usage that we’ll discuss below, but many systems will allow assets to be modeled like any other content object.

Refer back to the prior chapter where we discussed what is “content” and what isn’t.

Assets as Attribute Values

In addition to general asset creation, some systems allow assets to be uploaded as the value of a specific attribute. Many systems will have an attribute type for “File Upload,” or something similar. This will render an editorial element as a browser file upload interface and allow a file to be specifically uploaded and stored as the value for a named attribute.

We might distinguish attached assets, which are tied to a specific attribute, from detached assets, which are created on their own.

If an asset is uploaded in this way, the asset is likely not accessible to other content and therefore cannot be reused. If the same file needs to be attached to multiple content objects, it will need to be uploaded multiple times.

This differs from a referential attribute, which uses an existing asset as its target. In that case, multiple attributes on multiple objects can refer to the same asset.

In some systems, uploading an asset for an attribute will actually create a general asset, and the attribute value will be a reference. In these situations, the attribute file upload is simply a convenience, and the end result doesn’t differ from uploading the asset separately and then referring to it.
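A minimal sketch of that last arrangement may help. It assumes a hypothetical in-memory AssetLibrary and an upload_to_attribute helper; none of these names come from a real system. The upload creates a general asset, and the attribute stores only a reference, so a second object can point at the same file without another upload.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Asset:
    """A general asset stored once in a shared library (hypothetical model)."""
    asset_id: str
    filename: str
    payload: bytes


@dataclass
class AssetReference:
    """Value of a referential attribute: points at an asset by ID."""
    asset_id: str


@dataclass
class ContentObject:
    title: str
    attributes: Dict[str, Union[str, AssetReference]] = field(default_factory=dict)


class AssetLibrary:
    def __init__(self) -> None:
        self.assets: Dict[str, Asset] = {}

    def add(self, filename: str, payload: bytes) -> Asset:
        asset = Asset(f"asset-{len(self.assets) + 1}", filename, payload)
        self.assets[asset.asset_id] = asset
        return asset


def upload_to_attribute(library: AssetLibrary, obj: ContentObject,
                        attribute: str, filename: str,
                        payload: bytes) -> AssetReference:
    """Convenience upload: create a general asset, store only a reference."""
    asset = library.add(filename, payload)
    reference = AssetReference(asset.asset_id)
    obj.attributes[attribute] = reference
    return reference


library = AssetLibrary()
press_release = ContentObject("Press Release")
product_page = ContentObject("Product Page")

# Uploading through the attribute creates a general asset behind the scenes.
ref = upload_to_attribute(library, press_release, "brochure",
                          "brochure.pdf", b"%PDF-1.7")

# A second object reuses the same asset by reference; no second upload.
product_page.attributes["download"] = AssetReference(ref.asset_id)
```

With a truly attached asset, by contrast, the bytes would live inside the attribute value itself, and the second object would need its own copy.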

The Utility of Modeling Assets as Content

In its purest sense, an asset modeled as content would be a regular content object with an addressable URL and one special attribute that carries a bytestream to represent the actual content of the associated file. Boiled down to its essence, this is all an asset is – attributes surrounding a payload of bytes.

An asset could have attributes modeled for when it’s referenced in other content. If we wanted to display a list of assets, for example, we might include a Title, Description, File Type, and other information.

Another common use of extra attributes is for search and categorization. An image doesn’t have any text for a search engine to index, so it’s often surrounded by editorially supplied search terms, perhaps in addition to categorization.
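As a rough illustration of “attributes surrounding a payload of bytes,” the sketch below models an asset with a bytestream plus a few descriptive attributes, and a naive search that matches only the editorially supplied text. The attribute names and the search function are assumptions made for the example, not the design of any particular system.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Asset:
    url: str
    payload: bytes                      # the bytestream itself
    title: str = ""
    description: str = ""
    file_type: str = ""
    search_terms: List[str] = field(default_factory=list)


def search(assets: List[Asset], query: str) -> List[Asset]:
    """Match the query against the text attributes, never the payload."""
    needle = query.lower()
    matches = []
    for asset in assets:
        haystack = [asset.title, asset.description, *asset.search_terms]
        if any(needle in text.lower() for text in haystack):
            matches.append(asset)
    return matches


logo = Asset(
    url="/assets/logo.png",
    payload=b"\x89PNG\r\n\x1a\n",       # placeholder bytes, not a real image
    title="Company Logo",
    description="Primary logo on a white background",
    file_type="image/png",
    search_terms=["brand", "identity"],
)
report = Asset(
    url="/assets/annual-report.pdf",
    payload=b"%PDF-1.7",
    title="Annual Report",
    file_type="application/pdf",
)

results = search([logo, report], "brand")
```

The image has no indexable text of its own, so it is found only because an editor supplied the term “brand.”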

Assets and Delivery Contexts

We can even consider assets in terms of the delivery contexts we’ve discussed previously. When accessed via a URL, the delivery context for an asset would simply dump that array of bytes to the response buffer, and perhaps set a Content-Type header from a Content Type attribute that could be set on file upload (although the type can often be inferred if the URL has a file extension). The biggest difference between this and other content is that the delivery context is just passing through the value of a single attribute: the bytestream.
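A pass-through delivery context of this kind could be sketched as follows. The Asset class and deliver function are hypothetical; the only real library call is Python’s standard mimetypes.guess_type, used here as the fallback when no Content Type attribute was set on upload.

```python
import mimetypes
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Asset:
    url: str
    payload: bytes
    content_type: Optional[str] = None   # set on upload, if known


def deliver(asset: Asset) -> Tuple[Dict[str, str], bytes]:
    """Return response headers and the untouched bytestream."""
    content_type = asset.content_type
    if content_type is None:
        # Fall back to guessing from the file extension in the URL.
        guessed, _encoding = mimetypes.guess_type(asset.url)
        content_type = guessed or "application/octet-stream"
    headers = {
        "Content-Type": content_type,
        "Content-Length": str(len(asset.payload)),
    }
    return headers, asset.payload


headers, body = deliver(Asset("/files/report.pdf", b"%PDF-1.7"))
```

Nothing is rendered or templated here; the body is the value of the payload attribute, byte for byte.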

An asset might even be “templated” in the sense that the delivery context might change what is ultimately delivered in response to a request. Variables in the delivery context (e.g., querystring arguments) might modify assets from their source form. For example:

A specific video frame might be captured and returned as an image
An image might be resized and returned at a smaller size
An image might undergo batched stylistic changes; for example, it might be switched to grayscale, injected with some blur and static-like artifacts, tilted five degrees, and surrounded by a white border to resemble an old photograph
A specific page range might be extracted from a PDF document, or a specific slide from a PowerPoint presentation
Using machine learning, the most prominent face in an image might be detected, and the image cropped around it
A zip archive might be opened, and a specific file extracted and returned
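The last item in the list can be sketched with nothing but the standard library. The querystring argument name (file) and the function name are assumptions for illustration; a real system would define its own conventions, and would also need error handling and access checks that are omitted here.

```python
import io
import zipfile
from urllib.parse import parse_qs, urlsplit


def deliver_from_zip(payload: bytes, url: str) -> bytes:
    """Return the whole archive, or one member if ?file=... is present."""
    query = parse_qs(urlsplit(url).query)
    member = query.get("file", [None])[0]
    if member is None:
        return payload                    # no variable: pass through as-is
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        return archive.read(member)       # raises KeyError if not found


# Build a small archive in memory to stand in for a stored asset.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("readme.txt", "hello")
    archive.writestr("data/notes.txt", "notes")
archive_bytes = buffer.getvalue()

whole = deliver_from_zip(archive_bytes, "/assets/bundle.zip")
single = deliver_from_zip(archive_bytes, "/assets/bundle.zip?file=readme.txt")
```

The stored asset never changes; only the response does, depending on what the request asked for.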

Usage and Access Patterns of Assets

Assets are used differently from other content objects in a few ways.

First, assets are accessed directly less often. Usually, they are referenced in HTML. An image is embedded with an img tag, or a video is referenced with a video tag. A PDF or other downloadable file might be addressed directly, but the vast majority of assets are embedded.

Because of this, assets usually require an addressable URL endpoint.

Additionally, assets are generally used in support of other content. This means that they need to be easily findable from the editorial interface, as they’re often the target of a referential attribute.

Finally, assets commonly need to exist in the same editorial lifecycle “space” as an associated content object. If an asset is only used in support of a specific content object, it’s helpful if the asset –

Mirrors the permissions of that object
Is subject to the same publishing status events, meaning it publishes and expires at the same times
Is deleted when the object is deleted

All of these factors combine to imply an asset might be “owned” by a particular content object, which means some pseudo or explicit attribute exists to refer to that object. In some tree-based systems, the asset might be a child of that owning object, or it might exist inside a container (an “asset folder,” for example) which is owned by that object.
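One way to picture ownership is an explicit owner attribute on the asset, with permissions, publishing status, and deletion all derived from the owning object. The sketch below is an assumption-laden toy: the Repository class, the owner_id attribute, and the rule that unowned assets are readable by everyone are all invented to keep the example short.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class ContentObject:
    object_id: str
    published: bool = False
    readers: Set[str] = field(default_factory=set)


@dataclass
class Asset:
    asset_id: str
    payload: bytes
    owner_id: Optional[str] = None        # None means a shared, unowned asset


class Repository:
    def __init__(self) -> None:
        self.objects: Dict[str, ContentObject] = {}
        self.assets: Dict[str, Asset] = {}

    def can_read(self, asset_id: str, user: str) -> bool:
        """An owned asset mirrors its owner's status and permissions."""
        asset = self.assets[asset_id]
        if asset.owner_id is None:
            return True                   # simplification for this sketch
        owner = self.objects[asset.owner_id]
        return owner.published and user in owner.readers

    def delete_object(self, object_id: str) -> None:
        """Deleting an object also deletes the assets it owns."""
        del self.objects[object_id]
        owned = [a.asset_id for a in self.assets.values()
                 if a.owner_id == object_id]
        for asset_id in owned:
            del self.assets[asset_id]


repo = Repository()
article = ContentObject("article-1", published=False, readers={"alice"})
repo.objects[article.object_id] = article
repo.assets["img-1"] = Asset("img-1", b"...", owner_id="article-1")
repo.assets["shared"] = Asset("shared", b"...")

before = repo.can_read("img-1", "alice")  # owner is not yet published
article.published = True
after = repo.can_read("img-1", "alice")   # asset publishes with its owner
outsider = repo.can_read("img-1", "bob")  # bob cannot read the owner
repo.delete_object("article-1")           # cascades to img-1
```

In a tree-based system, the same effect might fall out of the parent-child relationship instead of an explicit owner attribute.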

Digital Asset Management Systems

Most CMSs aren’t designed around managing file assets. It’s a “good enough” feature: something a system does to check a box on a list of requirements, without attempting to excel at it. Modeling asset files is just not as important as other content to most users.

However, some scenarios are asset-centric. For example, the website for a television station might have video files as its main content type. And clearly, something like a stock photo image website will rely heavily on asset management.

In these situations, there is a specific genre of content management called Digital Asset Management or “DAM” (unfortunately pronounced just like it’s written). DAM systems provide more tools around asset-based content.

Some common features are:

Mass file upload (often called “intake”) and object creation; for example, a photographer covering a sporting event might take 1,000 pictures they want to dump in the system all at once
Extensive reporting capabilities
Advanced metadata, categorization, and tagging systems
Automatic data extraction and attribute value assignment from data embedded in the media; for example, EXIF data from an image, or an automatic textual transcript of audio
More sophisticated physical storage options, such as near-line and off-line storage, and partitioning of logical storage
Integration options with other systems; for example, a web CMS might be able to “delegate” all its asset management to a connected DAM
Integration with content delivery networks to provide cached storage of published assets
Advanced renditioning and transformation services (see examples above)
AI or machine learning systems to identify subject matter in images
In-system editing, such as image cropping and resizing, or video splicing
Automatic watermarking or DRM embedding

These options are driven in part by the unique characteristics of the assets themselves, but also by how the different systems are used.

It’s common to use a DAM as a generalized, enhanced storage location. Whereas a web CMS might only contain content that’s specifically published, an organization might push thousands of assets into its DAM system merely as storage, and only ever actively publish a tiny percentage of the total.

While a corporate website might have several hundred content objects, it’s not uncommon for a DAM to have tens or hundreds of thousands. This makes features like reporting and tagging systems more important.

DAM systems are generally used for media assets, which are loosely defined as images, video, audio, and occasionally some more esoteric types, such as augmented reality (AR), virtual reality (VR), legacy formats like Flash, or source files for other rendered formats (PSD, FLA). Most DAM systems will allow you to store any file as a general bytestream.

However, for non-media files, such as text formats like PDF and Word, a genre known as enterprise content management (ECM) is common. Like DAM, an ECM system is used more for storage and publication, and has features heavily tilted toward reporting and asset findability, in addition to an emphasis on permissions and workflow.

File assets often need modeling just like other content. They are the manifestation of the “content” and “metadata” dichotomy discussed earlier. The bytestream is the thing, and other attributes can be placed “around” the bytes in order to record additional information that enhances it in some way, or provides extra information for handling the file.

Evaluation Questions

Does the system differentiate between asset and non-asset content objects? If so, how do assets differ from non-asset content?
Are inbound requests for assets handled differently than those for non-assets? Is there still a delivery context with the potential for code execution and templating/transformation?
How is the URL for assets formed? Does this differ from the URL logic for non-assets?
Can assets be “owned” by a content object so that they mirror its permission set and editorial lifecycle?
Are there any built-in transformations available for different asset types that might prevent the need to store different versions of the asset?