Queen Bees: The Resurgence of CMS Repositories in the Age of AI
We’re gonna need to get back to the boring basics if we’re going to survive the next decade.
I’m wondering if the coming age of AI will force CMSs to retrench back to the repository level.
Since the birth of the CMS, we’ve walked away from the repositories. We initially imbued them with lots of tools and features – modeling, versioning, workflows, permissions, version control, etc. – but these are ultimately… unsexy. It’s hard to sell tools that are literally designed to keep your life boring and uneventful.
Soon, building content output became more important. The content itself gave way to site management and site building.
Then, those even gave way to post-publish optimization tools. No one cared what happened “to the left of the Publish button.” All the cool kids moved to the right of it, and just wanted to talk about what happened once the content was published.
In the background, our repositories haven’t really evolved. Show me a vendor that regularly releases new repository service features, and I’ll show you a vendor no one is talking about.
But let’s back up a bit –
Right now, CMSs do a lot, especially if they’ve made the jump to that vague definition of “digital experience platform” (DXP). Let’s build it out from the bottom up:
Content Modeling: defining what the content is – types, properties, datatypes, validation rules, etc.
Content Storage and Access: providing basic persistence and access to content; APIs, both remote and native; querying facilities
Content “Management:” allowing everything that happens from creation to deletion, like permissions, workflow, lifecycle management, auditing, etc. (in quotes because this is the actual management stuff; that word does too much work)
Content Aggregation: grouping content in ways to provide increased value; trees, menus, tagging, collections, relations, assemblies, etc.
Content Publishing: generating and transmitting some kind of output artifact
Content Optimization: adapting content to immediate context; personalization and experimentation and customer behavior tracking and all that
This is an onion of successive layers.
The first few layers – the first two? the first three? the first four? – are very core to the idea of “managing content.” But as we go further down the list we get into more and more abstract levels of functionality that “wrap” the core.
AI might start to peel this onion, meaning it will start to replace some of the outer layers and even redefine what they mean. Consequently, CMSs might start to shed those layers, effectively ceding them to AI, and retrench back to the core.
When this happens, the question will become: where does it stop? How close to the core do we let AI absorb? At what point do we say:
“Enough. You can go no farther. The features and functionality this close to the core are managed by humans and cannot be manipulated by AI.”
In particular, I’m thinking about content modeling. This is on my mind because of a group discussion we had at CMS Connect 25 in Montreal last week. A bunch of us were batting around what content modeling looks like in the age of AI, and I started to get a little uneasy.
The core question we all seemed to be dancing around was this: do we let AI manage our content models?
Right now, I say NO.
To me, this is the cliff that prevents us from retreating any further. While we might retreat to the very concept of defining what our content is, that’s where we turn toward the invader and make our stand.
I feel like content models should be inviolate. They’re built with a very human understanding of the nature of your content and how it relates to your organizational domain. So many decisions are implemented in content models that are based on a human’s understanding of what works at that organization and the people who comprise it.
Additionally, a good model represents not only what you need out of content now, but what you might need in the future. A smart human doesn’t have a myopic focus on only the output or usage they need immediately, but what they might do in the future, along multiple axes, including how to output the content, manage the content, operate on the content, govern the content, etc.
Yes, this might be a Pollyanna-esque view of the whole thing. I get that. But this is where my head is at right now.
For some years now, I’ve managed a database of all my books. As an initial build-time task, I needed to define a model of what constitutes a “book.”
Each one has a Title and a Subtitle and a bunch of other obvious stuff, of course. But then I include some data that only makes sense to me – fields for “Loaned To,” and “Acquisition Notes” (how I came to possess it), and “Repair Notes” (poor bindings and stuff that could be repaired), “Reading Environment Notes” (where I was when I read it, if it was somewhere interesting, like on vacation or a business trip or a road trip).
A lot of this is based purely on what I want to know and store about my books. In particular, it’s not related to any existing output (I don’t need to publish information about how a book needs to be repaired, for example), so there’s no existing thing to reverse-engineer. In fact, when I first put this content model together, I wasn’t even planning to output the data anywhere – it was just a tracking exercise.
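A rough sketch of that model, just to make the idea concrete. The field names follow what I described above; the types and defaults are my own assumptions, not the actual database:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Book:
    """Sketch of the book content model described above.
    Field names follow the post; types are assumptions."""
    title: str
    subtitle: Optional[str] = None
    loaned_to: Optional[str] = None                   # who currently has it
    acquisition_notes: Optional[str] = None           # how I came to possess it
    repair_notes: Optional[str] = None                # bindings and repairs needed
    reading_environment_notes: Optional[str] = None   # where I was when I read it
    tags: list[str] = field(default_factory=list)
```

Notice that nothing here is derived from an output – fields like `repair_notes` exist purely because I want to track them.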
To me, this content model is something that I own. This is the basic idea – the nature, the notion – of what I’m trying to manage. To be overly dramatic, this content represents the “soul” of what a book represents to me.
I’m not ready to surrender this to AI. I don’t think I ever will be.
Right now, I’m using this to output a library inventory and a reading list, but who knows what I’ll do with it in the future? The model exists apart from any particular output. In fact, the outputs can be considered ephemeral. They’ve existed for many years, but who knows? I might throw them away someday and do something else with the data.
Some people would say that AI should manage the model. In particular, it’s kind of vogue to say, “just show AI what you want to get out of it, and it will suggest and manage a content model for you.”
For smaller one-off projects where the content exists for the sake of that one output, maybe this would be fine. But for larger notions of enterprise content, I feel like this is short-sighted.
In my situation:
I didn’t have an output planned when I created the database. There was literally nothing to show to an AI.
I have at least two outputs now. And who knows what else I’ll do later? I can’t show AI “what I want out of the data” because the end of that story hasn’t been written and likely never will be. My model and my data are a bundle of potential energy which I haven’t finished harnessing in different ways yet.
If I had asked an AI to work backwards to create my model, I would have gotten a model really good for one particular output – the one I showed to the AI. I worry that an “output-centric model” is way too specific to that particular output.
So, at this moment, I’m settling on this working premise –
Humans create the content model. An AI may work only within the bounds of it. AI is subservient to the content model. It can operate on content, but it cannot define what content is.
A good content repository in the age of AI should be well-defended and well-described.
- Well-defended: It should protect content with concise and expressive data and validation rules
- Well-described: It should expose its modeling as openly and clearly as possible
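To sketch what those two properties might look like in practice: a model definition that is machine-readable (well-described) and a validation gate that every write must pass through (well-defended). Everything here – the names, the shape of the schema – is my own invention, not any vendor’s API:

```python
# Hypothetical model definition: machine-readable ("well-described")
# and enforced on every write ("well-defended").
BOOK_MODEL = {
    "type": "Book",
    "properties": {
        "title": {"datatype": "string", "required": True},
        "tags": {"datatype": "list[string]", "required": False},
        "date_completed": {"datatype": "date", "required": False},
    },
}

def validate_write(record: dict, model: dict = BOOK_MODEL) -> list[str]:
    """Return a list of violations; an empty list means the write is allowed."""
    errors = []
    props = model["properties"]
    for name, rules in props.items():
        if rules.get("required") and not record.get(name):
            errors.append(f"missing required property: {name}")
    for name in record:
        if name not in props:
            # Worker bees cannot invent new properties; the model is inviolate.
            errors.append(f"unknown property: {name}")
    return errors
```

The point of the second loop is the whole thesis in miniature: an AI can write content, but it cannot smuggle new properties into the model.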
This is the only way we’re going to keep everything under control when multiple AIs start working with our content as they evolve over the next decade.
Imagine that our ironclad repository and model is the queen bee of a hive. Lots of AIs are hovering around it, working independently and concurrently, attending to The Queen’s needs and otherwise acting in service of The Queen.
But they are not The Queen. They are ultimately disposable.
And the last point is non-trivial. What an AI does is often ephemeral – or can be – and should be treated as such, which is utterly antithetical to the principles a repository is designed around.
Let me tangent a bit on this point –
Remember when computer virtualization became a thing? My career started when a server was 1:1 to a plastic box in a data center somewhere. Servers were a big deal – you ordered them, and they showed up on a wooden pallet, and you got a buddy to hold them for you while you bolted them into the rack.
But one day we got virtual servers, and so long as we could wrangle an IT guy, we could get a new server by filing a ticket. Then we got AWS and Azure, and suddenly we had the option to create servers in seconds, without even involving IT if we didn’t want to.
And – this was the really weird part – we could shut them down just as fast. Given the hourly billing, we were introduced to the idea of creating a server to do a set amount of work, then throwing it away. Servers really became “freelance computation” that you could hire for the specific period of time you needed.
Computational power became disposable.
This is where I think AI is going in relation to content. We’ll use AI to create artifacts from content, or UIs to work with content, with the understanding that what it creates is basically disposable. Why? Because the cost to create it has been effectively driven to zero. When creating something doesn’t cost anything, you don’t need to keep the output. You can get it back by just asking for it.
Let’s say that one day, I have a little extra time. To make sure all my books are tagged, I could very easily just say:
Hey Claude, connect to this repository and learn all about the content model and the content in it.
Then produce an optimized UI as a browser-based app for me to apply tags to any book records that don’t already have them. In the UI, show me some details about the book, and maybe prefill the form field with some suggested tags. Also, show me all the tags currently across all books in use and the number of times they appear. Finally, show me a running percentage of how far I am toward getting all books tagged.
I get my UI in a few minutes and run it (maybe just on localhost). I work away until all my books are tagged, and then I just throw the UI away. I don’t need it anymore, and if I ever do, I’ll just ask for it from scratch again.
In the end, I didn’t want the UI. I wanted the outcome of having it. I didn’t want the shovel, I wanted a hole in the ground.
This really re-frames the concept of “vibe-coding.” I care less about the textbook correctness of code that I’m not going to keep. And remember, since my content model is “well-described” and my repository “well-defended,” even stupid code is prevented from doing damage. I’ve put rules around The Queen, and I don’t much care how a worker bee does their job, so long as they obey the rules.
A key point is that what I just did with Claude is only one thing that might be happening in the repository at any given time. Somewhere else, some other person might be doing this:
Hey ChatGPT, connect to this repository and learn all about the content model and the content in it.
Then execute a batch process to identify books that have a Date Completed value for which there is no physical copy in the library. Append the tag “read-but-unowned” to the current value of the “Tags” field.
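What that batch job boils down to might be something like the following, assuming the repository hands back plain records (the field names and shape here are guesses at my book model, not a real API):

```python
def tag_read_but_unowned(books: list[dict]) -> int:
    """Append 'read-but-unowned' to books that have a Date Completed
    value but no physical copy. Returns how many records changed."""
    changed = 0
    for book in books:
        if book.get("date_completed") and not book.get("physical_copy"):
            tags = book.setdefault("tags", [])
            if "read-but-unowned" not in tags:
                tags.append("read-but-unowned")
                changed += 1
    return changed
```

Trivial code – which is exactly the point. It exists to do one job and then be thrown away.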
Think about what’s going on here:
- This operation is being done by a different AI engine. Left to its own devices, it might interpret the content differently unless it has access to a clear, concise description of how the content is defined.
- This operation is being done simultaneously with the other operation. This means they’re both depending on a stable content model. In this particular case, the two operations are “sharing” access to the Tags property, and each AI is ignorant of the fact that the other one is working with the same field. Each AI needs the definition of that property to stay the same as it was when the operation started.
- This operation is transactional and ephemeral. I’m just asking it to do a job and then stop. Will I need to do this again? Maybe, but probably not enough to keep the code around. Note that I’m not specifically asking it to “create” anything. In the first instance I said “produce a UI” and in the other I said “execute a batch process.” I don’t want the shovel – I just want the hole in the ground.
Let’s do it again:
Hey Gemini, connect to this repository and learn all about the content model and the content in it.
Then generate a static website that has a home page that lists all the books, a tag overview page that lists all the tags in use, a single page for each tag that lists the books with that tag, and then a single page for every book, to which any reference of the book is linked. Search Amazon to find the product URL for the book. If you find one, put that on the page too, using affiliate link token “jamesbondfan007.” Give this to me as a ZIP file of HTML documents that I can just extract to a web server root.
Another AI engine. Another ephemeral job. Another worker bee.
What do all these scenarios have in common?
The Queen.
A content model, and a repository. They all depend on some common definition of what content is, and some common location to access the content itself.
They are worker bees, performing actions on behalf of The Queen. They can’t usurp The Queen, only serve her and then die.
Where we might have depended on tools in a CMS to do the jobs we’ve described, in the future we’ll use temporary AI agents to do them. We’ll probably use a lot of them with all sorts of different models and capabilities, invoked and given direction by different people, starting and stopping in overlap with each other, and occasionally coming into conflict with each other. Lots and lots of worker bees, all trying to serve The Queen at the same time.
With the withering state of repositories, what possible hope do we have of controlling and coordinating all this, while keeping our content safe and avoiding risk?
Primarily, we need a well-defended, well-described content model that no AI can change directly.
In my mind, this becomes the limit of our retrenchment. This is the core of the onion. We cede no more territory than this.
If an AI requires a change (a different property, a different type, whatever…), then it either needs to explain the change to a human, or put that property and its data somewhere else and manage it from there.
Secondly, we’ll need to revisit security, audit, and concurrency capabilities. We’ll need to acknowledge the increasing frequency of “user agency” – AIs acting on content on behalf of human users.
If I give an AI either of the first two scenarios above, how do we record those write operations? Did I do that? Or did the AI do that? Well, we both did. We’ll need to account for the scenario of an AI acting at a human’s direction. In addition to being clear that this was a delegated action, we might want to store metadata about the AI that was used – vendor, model, instance, etc. – and we might even store the prompt so that we can audit and recreate what happened later, if we need to.
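One way to picture such an audit record – a delegated-action entry that names both the human and the agent, plus the prompt that drove the write. The shape is entirely hypothetical:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class DelegatedWriteAudit:
    """Audit entry for an AI writing content at a human's direction."""
    content_id: str
    human_user: str      # the person who gave the instruction
    agent_vendor: str    # which AI vendor acted
    agent_model: str     # specific model identifier
    prompt: str          # stored so the action can be audited or recreated
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
```

The key design point is that `human_user` and the agent fields coexist on the same record: neither “user” acted alone.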
Thirdly, very boring, unsexy risk mitigation features will have renewed importance. Versioning, version control, rollback, and concurrency and conflict management will become bigger concerns, and will need to be more granular. We’ll need to track changes to content in a way that lets them be reconciled, merged, or canceled after the fact, because the proliferation of worker bees will cause them to run over each other, in addition to the fact that AI will simply magnify the volume of activity a single human can initiate.
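Even the simplest form of conflict management – optimistic versioning, where a write is rejected if the record changed since the caller read it – would catch two worker bees stomping on the same Tags field. A toy sketch, not any vendor’s implementation:

```python
class ConflictError(Exception):
    """Raised when another agent changed the record first."""

def apply_write(record: dict, update: dict, expected_version: int) -> dict:
    """Optimistic concurrency: the caller states which version it read;
    the write is rejected if the record has moved on since then."""
    current = record.get("version", 0)
    if current != expected_version:
        raise ConflictError(f"expected v{expected_version}, found v{current}")
    record.update(update)
    record["version"] = expected_version + 1
    return record
```

In the two-AI scenario above, whichever engine writes second gets a `ConflictError` and has to re-read the record, instead of silently overwriting the other’s tags.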
Simply put: content repositories will need to get better and stronger.
The Queen must rule. Long live The Queen.
On the other side, the outputs – the worker bees – are less important. They can unbundle. I might use one worker bee for a specific project, but use a different one the next week, maybe even to re-do that same project. Worker bees are easier. I don’t bet the company on a worker bee.
This is the future we’re in for, I think –
- The resurgence of repository services as a key enabler of AI capabilities. Some repositories will perform well in the face of an AI swarm. Others will not.
- Rich, inviolate content models no AI can modify directly. Well-defended and well-described.
- Some common description protocol for these models. MCP is the telephone, but what language are we speaking? RDF? JCR? OWL? JSON Schema?
- External APIs for description (“Connect to this repository and learn all about the content model…”)
- Ephemeral content operations (“worker bees”) from which we simply want to retain the outcome, not the tool used to achieve it
- Stronger and more granular control and auditing features, particularly those that increase the level of concurrency management
- Evolved security models that understand humans and AIs will work together, and so two “users” can and will collaborate on the same action
Who is going to lead this charge? Which vendors and which architectural paradigms are closest to it already?
Generally speaking: CMS vendors who have continued to improve their “boring” repositories over time, resisting the desire to concentrate only on more glamorous output and optimization tooling, and ensuring that their repositories provided a solid foundational base for everything else.
I’ll resist the urge to be more specific than that. I’ll let others make their own arguments.
Good luck, everyone.