Data Globbing with MySQL Regex
As I become a more experienced developer, I’m learning when you should and shouldn’t break the rules. While following every rule of programming and data modeling is wonderful, sometimes you need to bend the rules for the sake of simplicity and expediency.
Always remember, an app in the hand is worth a thousand on the white board.
This being the case, lately I’ve been known to “glob” up data in database fields. Yes, I know this breaks the first normal form – that of atomicity – but there are times when doing it right would involve three more queries, two more database tables, another UI screen, etc. Often it makes the cure worse than the disease.
For instance, consider this little XML document as the contents of the “children” field for one of the records in my “church_attender” table:
<children> <child> <first_name>Isabella</first_name> <last_name>Barker</last_name> </child> <child> <first_name>Gabrielle</first_name> <last_name>Barker</last_name> </child> </children>
Now if I never wanted to search for individual children, I would make no excuses for this. It saves us a database table, a join, and a ton of complexity in the interface. Life is good.
Searching a globbed up field is a problem, though. We alluded to it in this post when we said:
However, the problem is that the XML field is a black box that – on most database platforms – you can’t look inside. What if you want a list of articles written by a particular author? Well, you need to use SQL to get all the XML back, spin that collection, XPath into every single one to find the value author of the author node, then keep that record it matches.
So what if I want to find a person with a child named Gabrielle? Some databases (Oracle, for one), will let you do something like this:
SELECT * FROM church_attender WHERE XPATH(children,'/child/first_name') = 'Gabrielle'
That’d be great, but I don’t have Oracle. However, given MySQL regular expression capabilities, how unacceptable would this be:
SELECT * FROM church_attender WHERE children LIKE '%Gabrielle%' AND children RLIKE '<children>.*<child>.*<first_name> [line break] Gabrielle </first_name>.*</child>.*</children>'
(Note that there are some extra spaces in there just so the lines would wrap.)
Yes, yes, I know the database Gods would frown on this, but given the enormous amount of complexity it would save us, is it acceptable? Does the good outweigh the bad?
Fishing for opinions here. Let’s hear them.