Making Your Fields Do Their Own Dirty Work
At one point or another, all content management systems (CMS) come down to some kind of datatype. You have to be able to set a field to a string, or an integer, or whatever, and then enforce and manage that piece of data. The idea is that you take these datatypes and glue them together to form classes of objects.
(Note: “field” and “datatype” are two different things. A “datatype” is a description of a type of data: string, datetime, etc. A field is an example of that: title, body, author, etc. You may have a dozen fields in a database, all of the “varchar” datatype.)
In a lot of systems that are specific to one type of content, the fields are known to the system. Movable Type, for instance, knows that the title field is going to be a string datatype, so it can handle it as such.
But what about systems that allow developers to create their own classes by gluing datatypes together? (See “Open and Closed Content Management” for some more insight here.) How do you validate an object if it could be comprised of any kind of data?
The answer is to make the fields smart: don’t do their work for them, make them do it themselves. Then all you need is some “controller” code that only needs to know one thing: what fields do I need to order around? The controller doesn’t know or care how the fields are completing each action it tells them to complete – that’s the field’s responsibility.
I’ve done this on a fairly large scale, and it’s worked beautifully.
The Field Object
Each field in my database corresponds to a “Field” object instance. This object knows its datatype – whether it’s a string or an integer or whatever. (Technically speaking, it could extend or mix in a “Datatype_String” or “Datatype_Integer” class in order to get all the methods it needs. In my case, there’s a “meta table” in the database that describes all the fields of the various content objects and what the datatypes are for each.)
Here are some methods I’ve put on my “Field” class which correspond to stuff every self-respecting field needs to be able to do for itself:
Retrieve from POST: gets whatever values from the POST that belong to it after an HTML form is submitted
Render for View: returns the field’s value, formatted correctly for viewing (you can even allow it to accept a “view_type” argument and return different formats of the data – a table row compared to a list item, for instance)
Render for Edit: returns the field in the correct input element (or elements, if you’re serializing – see below)
Store: returns SQL for that specific field (you pass in an “INSERT” or “UPDATE” flag)
Validate: returns NULL on success or an error message (a string) on failure
Load: loads itself up from a value in the underlying data table (unserializes itself, if necessary – see below)
Clean: returns a “cleaned” version of itself, suitable for search engine indexing
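To make that list concrete, here is a minimal sketch of such a class in Python. Everything in it is an assumption made for illustration: the class and method names, the choice to return a parameterized SQL fragment plus bound values from Store (rather than literal SQL), and the single "string" datatype shown. It is one possible shape for the contract, not the code from the system described here.

```python
import re
from abc import ABC, abstractmethod
from html import escape
from typing import List, Mapping, Optional, Tuple


class Field(ABC):
    """Hypothetical base class: one instance per field on a content object."""

    def __init__(self, name: str, value: Optional[str] = None) -> None:
        self.name = name
        self.value = value

    def retrieve_from_post(self, post: Mapping[str, str]) -> None:
        """Take only the value that belongs to this field."""
        self.value = post.get(self.name)

    def load(self, raw: Optional[str]) -> None:
        """Populate from the stored column value (simple types need no unserializing)."""
        self.value = raw

    def store(self, mode: str) -> Tuple[str, List[object]]:
        """Return a SQL fragment plus its bound values for INSERT or UPDATE."""
        if mode == "INSERT":
            return self.name, [self.value]
        if mode == "UPDATE":
            return f"{self.name} = ?", [self.value]
        raise ValueError(f"unknown mode: {mode}")

    @abstractmethod
    def validate(self) -> Optional[str]:
        """Return None on success or an error message on failure."""

    @abstractmethod
    def render_for_view(self, view_type: str = "inline") -> str:
        """Return the value formatted for display."""

    @abstractmethod
    def render_for_edit(self) -> str:
        """Return the input element(s) for an edit form."""

    @abstractmethod
    def clean(self) -> str:
        """Return plain text suitable for a search indexer."""


class StringField(Field):
    """The simplest datatype: the value is stored as-is."""

    def validate(self) -> Optional[str]:
        if not isinstance(self.value, str):
            return f"{self.name} must be text"
        return None

    def render_for_view(self, view_type: str = "inline") -> str:
        text = escape(self.value or "")
        return f"<li>{text}</li>" if view_type == "list_item" else text

    def render_for_edit(self) -> str:
        return f'<input type="text" name="{escape(self.name)}" value="{escape(self.value or "")}">'

    def clean(self) -> str:
        # Replace punctuation with spaces, lowercase, and collapse whitespace.
        words = re.sub(r"[^\w\s]", " ", self.value or "").lower().split()
        return " ".join(words)
```

A new datatype would subclass Field and override only the methods where it behaves differently from a plain column value.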
Making These Methods Do Something Useful
Consider the moment an edit form is submitted. Using our Field class, the controlling code (the code that fields the inbound request) just needs to know what fields are expected, then iterate over them and:
call the Retrieve from POST method on each to get the incoming values
call the Validate method on each, and, assuming no errors are returned…
call the Store method on each and glue the resulting SQL into an INSERT or UPDATE statement
call the Clean method on each, glue all that together and dump it somewhere where the indexer can find it
The key here is that the controlling code doesn’t know how any of this happens – all that is handled inside the various Field objects. It just needs to issue the order to each field in turn.
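The four steps above might be sketched like this in Python. The names (TextField, handle_submit), the dictionary it returns, and the "id" key column are all hypothetical, and the table name is assumed to come from trusted code rather than from the request. The stand-in field class is only there so the example runs on its own; the controller never looks inside it.

```python
from typing import Dict, List, Mapping, Optional, Tuple


class TextField:
    """Stand-in field; any object with these four methods would work."""

    def __init__(self, name: str, required: bool = True) -> None:
        self.name = name
        self.required = required
        self.value = ""

    def retrieve_from_post(self, post: Mapping[str, str]) -> None:
        self.value = post.get(self.name, "").strip()

    def validate(self) -> Optional[str]:
        if self.required and not self.value:
            return f"{self.name} is required"
        return None

    def store(self, mode: str) -> Tuple[str, List[object]]:
        return f"{self.name} = ?", [self.value]

    def clean(self) -> str:
        return self.value.lower()


def handle_submit(fields, post: Mapping[str, str], table: str, row_id: int) -> Dict:
    """Order each field around in turn; how it complies is the field's business."""
    # 1. Each field picks its own values out of the POST.
    for field in fields:
        field.retrieve_from_post(post)

    # 2. Each field validates itself; stop if any of them complain.
    errors = {}
    for field in fields:
        message = field.validate()
        if message is not None:
            errors[field.name] = message
    if errors:
        return {"errors": errors}

    # 3. Glue the per-field SQL fragments into a single UPDATE statement.
    fragments, params = [], []
    for field in fields:
        fragment, values = field.store("UPDATE")
        fragments.append(fragment)
        params.extend(values)
    sql = f"UPDATE {table} SET {', '.join(fragments)} WHERE id = ?"
    params.append(row_id)

    # 4. Glue the cleaned text together for the indexer.
    index_text = " ".join(field.clean() for field in fields)
    return {"errors": {}, "sql": sql, "params": params, "index_text": index_text}
```

Notice that handle_submit never mentions a datatype. Adding a new kind of field changes nothing here.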
There actually may be more actions that need to be taken (you could have a “Log Change” method, for instance), but that doesn’t matter – the point is that the controller has elevated itself above the dirty work of doing these things. It just sits back and tells the fields to do it to themselves, and since the fields know what datatype they are, they know how to do it.
Lame analogy: Instead of tying the shoes of every kindergartner in the classroom, you teach them how to tie their own shoes (which could be different for each one, given sandals, boots, Velcro straps, etc.), then you just walk down the row and tell each one: “Tie your shoes.”
A Note on Validation
Field types can actually be classified into “Data Types” and “Value Types.”
Data Type is the actual type of the column in the database – be it varchar, int, datetime, whatever. This is the value that the machine cares about. Value Type is the functional value of the field – what the field represents. This is the value that the human cares about.
For instance –
A field may have a Data Type of “int,” meaning it needs to be an integer to fit into the database. However it might have a Value Type of “year,” meaning it’s comprised of exactly four numerals. Value Type is necessarily more restrictive than Data Type – all years are valid integers, but not all integers are valid years.
This means that validating the field should go from Value Type to Data Type – if the Value Type validation passes (the value is exactly four numerals), then the Data Type validation can be assumed to pass as well, and there is no need to run it separately.
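As a rough illustration of that ordering, here is a short Python sketch. The function names and error messages are invented for the example. The assumption is that a field with a Value Type runs only that check, and a field without one falls back to the Data Type check.

```python
import re
from typing import Callable, Optional

Check = Callable[[str], Optional[str]]


def validate_int(raw: str) -> Optional[str]:
    """Data Type check: will the value fit in an integer column?"""
    try:
        int(raw)
    except (TypeError, ValueError):
        return "must be an integer"
    return None


def validate_year(raw: str) -> Optional[str]:
    """Value Type check: stricter than the Data Type check beneath it."""
    if not re.fullmatch(r"[0-9]{4}", raw or ""):
        return "must be exactly four numerals"
    return None


def validate_field(raw: str,
                   value_type_check: Optional[Check] = None,
                   data_type_check: Check = validate_int) -> Optional[str]:
    """Return None on success or an error message on failure."""
    if value_type_check is not None:
        # Passing the stricter check implies the looser one would pass too.
        return value_type_check(raw)
    return data_type_check(raw)
```

This shortcut only holds when the Value Type really is a subset of the Data Type, as with years and integers.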
Complex Datatypes and Serialization
Datatypes can be “simple” or “complex.” A simple type is data that was really meant to be stored in a single database field – a string, for instance. A complex type is data that wasn’t meant to be stored in a single field. Since you don’t want to hack the data model for every new datatype, you need a method to force fit the data.
For instance, an “image” datatype could actually store several things (“sub-fields,” if you will):
The image file itself (usually a reference to a file on the file system)
The caption to go under the image
The ALT text for the image
The alignment of the image (left, right, whatever)
Any extra style information (masked images would have “border: none;” for instance)
All this information needs to be “rolled up” into a string of XML or YAML or whatever so you can insert it. This is part of the Store method – each field knows how to serialize itself – the “string” datatype returns its value pretty much unchanged, while the “image” datatype does some serious serialization acrobatics before returning the “storable” representation of itself.
(Additionally, what’s great about making the fields responsible is that the “Render for Edit” method can render an input widget however it needs to be – in this case, it would actually return a chunk of HTML with four or five input fields that would have names like image[file] and image[caption]. Then the “Retrieve from POST” method would know how to gather all these from the POST and “reconstitute” them into an object.)
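Here is a small Python sketch of that round trip for the image example. It makes a few assumptions: the class and sub-field names are made up, the POST is treated as a flat mapping keyed by the raw input names (some frameworks would hand you a nested structure instead), and JSON stands in for the XML or YAML mentioned above only because it ships with Python's standard library.

```python
import json
from html import escape
from typing import Mapping

# Hypothetical sub-fields, following the list above.
SUB_FIELDS = ("file", "caption", "alt", "align", "style")


class ImageField:
    """A complex datatype that rolls several sub-fields into one stored string."""

    def __init__(self, name: str = "image") -> None:
        self.name = name
        self.parts = {key: "" for key in SUB_FIELDS}

    def render_for_edit(self) -> str:
        """One input per sub-field, named image[file], image[caption], and so on."""
        inputs = [
            f'<input type="text" name="{self.name}[{key}]" value="{escape(self.parts[key])}">'
            for key in SUB_FIELDS
        ]
        return "\n".join(inputs)

    def retrieve_from_post(self, post: Mapping[str, str]) -> None:
        """Gather only the inputs that belong to this field."""
        for key in SUB_FIELDS:
            self.parts[key] = post.get(f"{self.name}[{key}]", "")

    def store(self) -> str:
        """Serialize the sub-fields into a single storable string."""
        return json.dumps(self.parts, sort_keys=True)

    def load(self, raw: str) -> None:
        """Unserialize a stored string back into sub-fields."""
        data = json.loads(raw)
        for key in SUB_FIELDS:
            self.parts[key] = data.get(key, "")
```

The controlling code only ever sees one string going into the database and one coming out. The five inputs are the field's private business.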
The one problem with serialization is pretty obvious: marshalled data is tough to search with SQL. Don’t forget that you’re denormalizing your data and that has consequences. What if you wanted to find all images with “gumby” in the caption? Tough to do when that information is buried in an XML string (unless your database can do XPath queries on fields – then you’re in great shape). We talked about this same thing here, where I called it “data globbing.”
Making It All Work
Yes, this is complicated, but it all comes together beautifully in that fateful moment when you have to create a new datatype. Like this –
Say you have a system to track all the dates you’ve ever been on (hint: this means you’re a loser). Within this system, you have a “Date” object. This object has fields for “Partner,” “Date and Time,” “How Fun Was It,” etc.
You decide you want a field for “Movie We Went To” and you want to track this information (these “sub-fields”):
Movie Title
Did I Buy Popcorn
Running Time
My Review
Now, normally “Movie” would become a new class in the system and a “Date” object would store a reference to a “Movie” object. However, if you’ve built your datatype right, it can all be stored and managed as a field on another object, you just need to honor the contracts of the methods we’ve discussed:
Render for View would simply render the field in some pleasant way, returning HTML. It could be as simple or complex as you like. We could just return a simple HTML table, or we could connect to a Web service, get back a URL to an image of the movie poster, and include it with the data. It doesn’t matter how far we take it because none of it ever gets out of the object – the controlling code just depends on the object to return some kind of HTML.
Render for Edit would return a chunk of HTML with input fields for all four pieces of information we want to store – there would be textboxes for the title and running time, a checkbox for the popcorn, and a textarea for the review.
Retrieve from POST would check the POST and populate properties from the data there that pertains to itself.
Validate would check each bit of information and return an error message if the data wasn’t in the shape it should be.
Store would convert the four sub-fields to a simple YAML string and return it in a SQL fragment.
Clean would glue the four pieces of data together, strip out all punctuation, and return the result. The controlling code would decide what to do with it, whether it be to insert it into a “search_index” table or write it to a file where the indexer will pick it up, or whatever.
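Pulling those six contracts together, a "Movie" datatype could look something like the Python sketch below. Treat it as one possible reading rather than the real thing: the class, the input names, and the error messages are invented, the running time is assumed to be whole minutes, Store returns a parameterized fragment, and JSON replaces YAML so the example needs nothing outside the standard library.

```python
import json
import re
from html import escape
from typing import List, Mapping, Optional, Tuple


class MovieField:
    """Hypothetical complex datatype with four sub-fields."""

    def __init__(self, name: str = "movie") -> None:
        self.name = name
        self.title = ""
        self.popcorn = False
        self.running_time = ""
        self.review = ""

    def _key(self, sub: str) -> str:
        return f"{self.name}[{sub}]"

    def retrieve_from_post(self, post: Mapping[str, str]) -> None:
        self.title = post.get(self._key("title"), "").strip()
        # Browsers omit unchecked checkboxes, so presence means "yes".
        self.popcorn = self._key("popcorn") in post
        self.running_time = post.get(self._key("running_time"), "").strip()
        self.review = post.get(self._key("review"), "").strip()

    def validate(self) -> Optional[str]:
        if not self.title:
            return "Movie title is required"
        if not re.fullmatch(r"[0-9]{1,3}", self.running_time):
            return "Running time must be a whole number of minutes"
        return None

    def render_for_view(self) -> str:
        rows = [
            ("Movie", self.title),
            ("Popcorn", "yes" if self.popcorn else "no"),
            ("Running time", f"{self.running_time} min"),
            ("Review", self.review),
        ]
        cells = "".join(
            f"<tr><th>{label}</th><td>{escape(value)}</td></tr>" for label, value in rows
        )
        return f"<table>{cells}</table>"

    def render_for_edit(self) -> str:
        checked = " checked" if self.popcorn else ""
        return "\n".join([
            f'<input type="text" name="{self._key("title")}" value="{escape(self.title)}">',
            f'<input type="checkbox" name="{self._key("popcorn")}"{checked}>',
            f'<input type="text" name="{self._key("running_time")}" value="{escape(self.running_time)}">',
            f'<textarea name="{self._key("review")}">{escape(self.review)}</textarea>',
        ])

    def store(self, mode: str) -> Tuple[str, List[object]]:
        """Call only after validate() has passed."""
        payload = json.dumps({
            "title": self.title,
            "popcorn": self.popcorn,
            "running_time": int(self.running_time),
            "review": self.review,
        }, sort_keys=True)
        fragment = self.name if mode == "INSERT" else f"{self.name} = ?"
        return fragment, [payload]

    def clean(self) -> str:
        popcorn = "popcorn" if self.popcorn else ""
        text = f"{self.title} {popcorn} {self.running_time} {self.review}"
        # Replace punctuation with spaces, lowercase, and collapse whitespace.
        return " ".join(re.sub(r"[^\w\s]", " ", text).lower().split())
```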
Remember that these methods all run in among the other fields. This field is called in turn as the controller iterates over the fields, each one validating its contents in its own special way.
I’ve used this theory on a sizable system and it works beautifully. I implemented it when I was confronted by a boss who would march down to my desk at any moment and say “I want a field for X” and expect that field to be in the interface and fully functional by the time he got back to his desk (in fact, I wrote this post about just that problem).
One table in this system has grown to 108 fields. This is a lot of fields for a database table, but it’s manageable because the fields all have datatypes, and the datatypes know how to manage themselves. So when the boss wants to store something new, I just add that field to the database table, then add a record to the “fields” table which describes (1) the name of the field, and (2) the datatype of the field.
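The two-step "add a column, add a record" routine could be sketched like this in Python with SQLite. The table layout, the DATATYPES registry, and the function names are assumptions for the sake of a runnable example. The real "fields" table may well hold more than a name and a datatype.

```python
import sqlite3


class StringField:
    sql_type = "TEXT"

    def __init__(self, name: str) -> None:
        self.name = name


class IntegerField:
    sql_type = "INTEGER"

    def __init__(self, name: str) -> None:
        self.name = name


# Hypothetical registry mapping a datatype name in the meta table to its class.
DATATYPES = {"string": StringField, "integer": IntegerField}


def add_field(conn: sqlite3.Connection, table: str, name: str, datatype: str) -> None:
    """Add the column, then describe it in the 'fields' meta table."""
    if datatype not in DATATYPES:
        raise ValueError(f"unknown datatype: {datatype}")
    # Identifiers cannot be bound as parameters, so insist on plain names.
    if not (table.isidentifier() and name.isidentifier()):
        raise ValueError("table and field names must be simple identifiers")
    sql_type = DATATYPES[datatype].sql_type
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {name} {sql_type}")
    conn.execute("INSERT INTO fields (name, datatype) VALUES (?, ?)", (name, datatype))


def load_fields(conn: sqlite3.Connection) -> list:
    """Build one field object per row of the meta table, in insertion order."""
    rows = conn.execute("SELECT name, datatype FROM fields ORDER BY rowid").fetchall()
    return [DATATYPES[datatype](name) for name, datatype in rows]
```

With something like this in place, the controlling code asks load_fields for its list and iterates. It never hard-codes a field name.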
The rest is pretty simple. So long as the field knows where to put its data and how to take care of itself, there’s not much to it and you can start writing new datatypes for anything that may come along.