Preventing Data Corruption Due to User Interface Bugs
In a non-versioning system (CMS or otherwise), user interface bugs can lead to data loss more easily than you think.
There’s a standard way of handling object updates via HTML forms. Generally speaking, when the user selects an object to edit, you populate an HTML form with all the data from the object, post all these fields when the user presses Submit, then update all the fields of the record with the data in the HTTP request.
So, if I’m going to edit an article, the HTML form contains all the article properties, and – even if I just change one – the SQL that results after I press Submit rewrites ALL the fields with what it receives in the request.
This has always made me just a little nervous because it assumes that the request is perfectly accurate – that it accurately represents what we want the object to become. But is this always the case?
Say you have some kind of bug in your HTML form, and one of the fields doesn’t populate with the existing object data when you bring up the editor. You might not notice this when you change the one field you needed to change, and you hit save and – BAM! – you just lost all the data for one of your object properties.
Consider this scenario:
One day, someone comes to you and tells you that they just noticed email addresses are disappearing from the mission critical contact database. Yikes.
You do a little debugging and realize that when a user edits a contact, a bug in the HTML form is leaving the email field blank – it’s not loading the current email address in the field.
The users haven’t noticed this – after all, not all contacts have email addresses, and there are a lot of fields – so they hit submit, and the resulting request has no email value, so the contact updates with no email.
Your heart skips a beat. You look back at your calendar and realize that the change that caused the bug happened four weeks ago. How many contacts get updated every day? Dozens. That’s hundreds and hundreds of records that have been updated in the last four weeks. How many of them had email addresses? Half? That’s hundreds of email addresses…gone.
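To make the failure concrete, here is a minimal sketch of the overwrite-everything update. The `contacts` table, the column list, and the `buildFullUpdate` helper are all made up for illustration; the only point is that a blank `email` in the request becomes a blank `email` in the statement's parameters.

```javascript
// Hypothetical column list for an illustrative "contacts" table.
const CONTACT_COLUMNS = ["name", "phone", "email"];

// Builds the "rewrite every column" UPDATE: each column is written
// whether or not the user touched it.
function buildFullUpdate(table, id, request) {
  const assignments = CONTACT_COLUMNS.map((col) => `${col} = ?`).join(", ");
  // A field that is missing or blank in the request is written as "".
  const params = CONTACT_COLUMNS.map((col) => request[col] ?? "");
  return {
    sql: `UPDATE ${table} SET ${assignments} WHERE id = ?`,
    params: [...params, id],
  };
}

// The user only changed the phone number, but the buggy form
// posted the email field blank.
const posted = { name: "Ann", phone: "555-0199", email: "" };
const fullUpdate = buildFullUpdate("contacts", 345, posted);
console.log(fullUpdate.sql);
console.log(fullUpdate.params); // the third parameter is the empty email
```

Nothing in this code can tell a deliberately cleared email from one the form failed to load, which is the whole problem.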
So, how do you prevent this? All the solutions I’ve been throwing around are based around the same principle: only update the fields in the database that have been changed in the form.
It’s logical – if your article object has 37 fields, and someone only changes the title, then the bulk of your SQL is simply writing identical data back to the other 36 fields. This is a little wasteful, isn’t it? (Actually, it’s probably not – in the deep, dark recesses of the database code, there’s probably some optimization or something that means it doesn’t make a difference…)
And what about the people who hit Submit without changing anything just to get out of the form? Now you’re making a completely pointless database change which wastes CPU cycles and is another possible point of failure.
This concept reminds me of this SQL which I see a lot in other people’s apps:
DELETE FROM table WHERE id = 345 LIMIT 1
Here you have a WHERE clause which specifies a value for a primary key. Logically, this can only match one row. So why the LIMIT 1? As near as I can tell, it’s just in case. God only knows what can happen when you’re dealing with user-entered data, so better safe than sorry lest you wipe out more records than you want to.
So, back to our problem – how do you limit your SQL UPDATE statement only to the fields that have changed? I’ve had an idea floating around my head for years, but it might suck. Here goes –
In your form, have an empty, hidden INPUT field. For all your other input fields, have a function fired by the onchange event. When a field is changed, this function appends the name of that field to the hidden INPUT field.
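A minimal sketch of that onchange idea follows. The hidden field name `changed_fields` and the form id `editor` are assumptions for the example, not anything a framework requires. The append logic is kept as a pure function so it can be tried outside a browser.

```javascript
// Appends a field name to the comma-delimited "changed string".
// Duplicates are deliberately not filtered here.
function appendChanged(changedString, fieldName) {
  return changedString === "" ? fieldName : `${changedString},${fieldName}`;
}

// Browser wiring: every named field except the hidden one reports its
// name when its change event fires.
function trackChanges(form) {
  const hidden = form.elements["changed_fields"]; // hypothetical field name
  for (const field of form.elements) {
    if (field === hidden || !field.name) continue;
    field.addEventListener("change", () => {
      hidden.value = appendChanged(hidden.value, field.name);
    });
  }
}

// Only runs in a browser, and only if the assumed form exists.
if (typeof document !== "undefined") {
  const form = document.getElementById("editor"); // hypothetical form id
  if (form) trackChanges(form);
}
```

Using `addEventListener("change", ...)` instead of an inline `onchange` attribute on each field is a design choice for the sketch; either fires on the same event.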
So after your user has changed all the fields they need to change, this hidden field would have a comma-delimited list of only the fields that have been changed by the user. When you get the HTTP request back server-side, split the list on the commas and you have an array of the only database fields that need to be addressed in your UPDATE statement.
This method is a little inefficient because the field gets put in the “changed string” multiple times if it changes multiple times (you can filter out dupes server-side, of course), and even if the user changes it back to the original value, the field is still in the “changed” string.
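On the server, the split-and-dedupe step might look like the sketch below. The `changed_fields` request key, the `articles` columns, and the `buildPartialUpdate` helper are all hypothetical. One thing worth adding to the plan: the changed string comes from the client, so each name should be checked against a list of known columns before it goes anywhere near the SQL.

```javascript
// Hypothetical whitelist of columns the edit form is allowed to update.
const ALLOWED_COLUMNS = new Set(["title", "body", "author"]);

// Builds an UPDATE touching only the columns named in the changed string.
// Returns null when nothing valid changed, so the caller can skip the query.
function buildPartialUpdate(table, id, request) {
  const names = (request.changed_fields || "")
    .split(",")
    .map((name) => name.trim())
    .filter(
      (name) =>
        ALLOWED_COLUMNS.has(name) &&
        Object.prototype.hasOwnProperty.call(request, name)
    );
  const columns = [...new Set(names)]; // drop duplicates, keep first-seen order
  if (columns.length === 0) return null;
  const assignments = columns.map((col) => `${col} = ?`).join(", ");
  return {
    // `table` is chosen by server code, never taken from the request.
    sql: `UPDATE ${table} SET ${assignments} WHERE id = ?`,
    params: [...columns.map((col) => request[col]), id],
  };
}

const request = {
  changed_fields: "title,title,body,bogus",
  title: "New title",
  body: "New body",
  author: "unchanged",
};
console.log(buildPartialUpdate("articles", 345, request));
console.log(buildPartialUpdate("articles", 345, { changed_fields: "" })); // null
```

Returning `null` for an empty list also covers the user who hits Submit without changing anything: no statement is built, so no pointless write happens.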
A more anal-retentive way would be to have a hidden field for every displayed field that contains the original value of the field. Bind the onsubmit event on the form to a function that compares the value of the field at the time of submission (the displayed field that the user can modify) to the value when the form loaded (the hidden field, which would be the current value of the object back on the server). If the value at time of submission differs from the original value, log that field in the “changed string” as above.
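The compare-at-submit variant could be sketched like this. It assumes each editable field `x` has a hidden twin named `orig_x`, and that the list is written to a hidden field called `changed_fields`; both naming conventions are invented for the example.

```javascript
// Returns the names of fields whose current value differs from the original.
// Both arguments are plain name -> value maps.
function diffFields(originals, current) {
  return Object.keys(originals).filter(
    (name) => current[name] !== originals[name]
  );
}

// Browser wiring: on submit, pair each "orig_x" hidden field with field "x"
// and write the names that differ into the hidden "changed_fields" input.
function recordChangesOnSubmit(form) {
  form.addEventListener("submit", () => {
    const originals = {};
    const current = {};
    for (const field of form.elements) {
      if (!field.name || !field.name.startsWith("orig_")) continue;
      const name = field.name.slice("orig_".length);
      const visible = form.elements[name];
      if (!visible) continue; // no matching editable field
      originals[name] = field.value;
      current[name] = visible.value;
    }
    form.elements["changed_fields"].value =
      diffFields(originals, current).join(",");
  });
}
```

Because the comparison happens once, at submit time, a field that was edited and then restored to its original value never lands in the list, and no field can appear twice.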
Now, all this depends on client-side JavaScript, which may make you nervous since you can never absolutely depend on it. But the beauty of the system is it fails closed – if they don’t have JavaScript, they can change fields all day long, and nothing will write to the “changed string,” thus no database updating will happen when the request gets back to the server.
(If the concept of the “changed string” bothers you, there’s another alternative: roll through every field in your form, and disable those which haven’t changed. Now your HTTP request is very pure – it only contains information that has changed since browsers don’t post disabled fields, which has the added benefit of making the request smaller and therefore faster.)
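A rough sketch of the disable-unchanged alternative is below. It leans on the `defaultValue` property, which text inputs and textareas expose as the value they loaded with. Checkboxes, radio buttons, and selects track their initial state differently (`defaultChecked`, `defaultSelected`), so this version simply leaves them enabled. Hidden fields are skipped on the assumption that they carry things like the record id.

```javascript
// Disables every field whose value still equals the value it loaded with.
// Works on anything with `type`, `value`, `defaultValue`, and `disabled`.
function disableUnchanged(fields) {
  for (const field of fields) {
    if (field.type === "hidden") continue; // keep ids and similar posting
    if (field.value === field.defaultValue) field.disabled = true;
  }
}

// Browser wiring: run the check when the form is submitted, so the
// disabled fields are left out of the request.
function attachDisableUnchanged(form) {
  form.addEventListener("submit", () => disableUnchanged(form.elements));
}
```

Note that this version does not fail closed the way the “changed string” does: without JavaScript nothing gets disabled, and every field posts as usual.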
If you want to keep it all server-side, you have fewer options. You can compare the request data to the original object data, but when you find a difference, how can you tell if it was changed on purpose or due to an error? You can’t.
Since UI errors will almost always result in a value disappearing completely, you could look for values which are populated in the object, but empty in the request. Then you could warn the user that they’re about to lose data, and do they really want to do this?
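That server-side check is easy to sketch. The helper name and the shape of the `stored` and `request` objects are assumptions; the list of `fields` is passed in so that only form-editable properties are examined.

```javascript
// True when a value is missing or contains only whitespace.
function isBlank(value) {
  return value === undefined || value === null || String(value).trim() === "";
}

// Returns the names of fields that are populated in the stored object
// but blank or missing in the incoming request.
function findBlankedFields(stored, request, fields) {
  return fields.filter(
    (name) => !isBlank(stored[name]) && isBlank(request[name])
  );
}

const stored = { name: "Ann", email: "ann@example.com", fax: "" };
const incoming = { name: "Ann", email: "" };
const blanked = findBlankedFields(stored, incoming, ["name", "email", "fax"]);
console.log(blanked); // only "email" had a value and lost it
```

A non-empty result is the cue to warn the user before writing anything.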
You could even take it a step further and show the user all the changes – for this field, this is the old data, and this is what you are going to change it to. Do you really want to do this? But this is clumsy, and unless your users are a really dedicated bunch (trust me, they’re not), everyone is going to skip this and just press submit twice. So this just brings me back to my JavaScript solution.
So, I have two issues here to be evaluated –
- Is there value to only updating fields that have changed?
- Is the JavaScript plan detailed above a valid way to do this?
Anyone see any problems with this? (I mean, besides the fact that it proves I’m anal-retentive, paranoid, and borderline obsessive-compulsive…)