How This Document Came About
I am a developer. I have been working in application development and databases for almost two decades.
Corey is an employee of my company. He is not a developer. His title is “User Interaction Strategist,” which is a fancy title for The Guy Who Helps You Figure Out What You Really Need.
What Corey does is help you take your very abstract idea of “we need a new application” and figure out what that actually means. Why do you think you need a new application? What audiences will this application serve? What do those people want from it? What specific tasks are they trying to accomplish? What do we need to deliver to these people to make them happy?
Corey takes in all this information, synthesizes it, and outputs a plan which tells a developer, “this is what this person really wants to do.” (In the enterprise, Corey would likely be called a Systems Analyst.)
One day, Corey and I met with a company who really had very little tracking around the core function of their job. This company made deliveries of a sort, and they had a handful of old applications to help them work out various aspects of these deliveries:
- One application would help them determine sales tax for a specific type of delivery.
- Another application would help them write a quote for a delivery.
- Another application would help them track the equipment used in a delivery.
And so on. These applications were completely isolated, knew nothing about each other, and had been abandoned by their IT departments. The client wanted something integrated so they could get a better view of their work.
As the client was talking about their problem, Corey was sifting through screenprints of the various applications they provided, and I was drawing boxes and arrows in my notebook. The client would say something, and I would draw another box, or cross a box out, and replace it with an arrow. Corey would drop one piece of paper and pick up another one.
When the meeting was done, I don’t know how Corey felt about it, but the problem was crystal clear to me. As the client had been talking, I had been distilling their problem down to pure data concepts.
This was quite different from how Corey was looking at it:
I was analyzing the problem at the data level. The doodles in my notebook were actually a crude form what what’s called an entity relationship diagram , which visualizes how different pieces of data fit together.
Corey was analyzing this problem from the user interface level, by examining each of the the different screens and interfaces that a user would touch to do parts of their job.
After the meeting, it occurred to me that the right way to search for a solution is to examine both sides of the problem at the same time.
While the user interface is what the user is interacting with at any given moment, a user’s higher level understanding of the data they work with is rooted in how the designer of an application has modeled that data. If the person who designed an application centered it around a specific type of data, then the interfaces for that application, and many of the ways the application will behave will be centered around that piece of data.
There’s a common premise that gets thrown around a lot in the design world:
“The interface is your application.”
The idea behind that is that it doesn’t matter what your application does, only how the user interacts with it. This is all they see, so to them this is the application.
I don’t subscribe to this. Users work with interfaces only to achieve a higher-level goal. Yes, that mortgage application is pretty and very intuitive, but what I’m really trying to do is input the data necessary to start getting a mortgage. I love all the design work that went into that car-selling application, but what I need is a clear understanding of how to form this search to find that Porsche 911 I’ve been dreaming about since I was a kid.
So, to this end, I offer this:
“The data is your application.”
Interfaces manipulate data. Get the data wrong, all the design in the world isn’t going to solve your problem and the entire application will suffer for it. In this sense, the interface level of an application is simply a reflection of how the data for that application has been modeled.
These interfaces can certainly be done well or poorly – and a visual designer is critical in this process – but if you tell a designer to give you an interface for working with a widget, and what the user really needs to work with is a doodad, then all the Photoshop in the world isn’t going to work that problem out.
I realize, of course, that I may have ignited a Holy War with this comment. Before I get any impeccably designed hate mail, let me say that my point is not to denigrate design, but at least elevate the data modeling behind an application to the same level as the pretty face that gets put on it.
It was right about this moment that I realized that Corey need to understand the concepts of a relational database so that he could develop the ability to examine problems from both a pure data perspective and from an interaction perspective at the same time.
Later on, while writing this document, I came to acknowledge that relational databases were simply a convenient medium through which to teach basic concepts of data modeling. The term “databases” encompasses a large variety of information storage methods, but they’re all centered around rules which govern the management of data and require you to think in pure data concepts. Relational databases provide a very good introduction to thinking this way.
This document is a wild expansion from the original set of notes I created to teach Corey how relational databases worked.
Why Should a Non-Developer Care About Relational Databases?
If there’s anything non-sexy in the world of information, it’s probably databases.
Correction: If there’s anything non-sexy in the world, it’s probably databases.
There’s likely no drier subject – nothing deeper, darker, and more under-the-covers than systems which do nothing but hold…data. In fact it can almost be said that databases are designed to make data boring. Because “boring” is often a euphemism for “stable, clear, and unexciting.”
And this is this what you want with your business-critical data: stability, clarity, and a distinct lack of excitement.
So, as a non-developer, why should you spend any time learning any more than you have to about them? These are for people with neck beards, right?
Here’s why –
Because databases are pervasive
The fact is, you interact with databases all the time. Unless you never use anything but entirely transient applications that don’t need to persist data longer than the time you spend actively working with them, then that data has to be stored somewhere
This data is the only thing that gives that application any context or continuity from session to session, and it’s likely stored in a database somewhere. If not for this data and our ability to manage it, all your web applications would simply forget you from visit to visit, and where would that leave us?
Databases are really the collective memory of the enterprise and of the Internet as a whole. Everywhere you go, you leave digital footprints in a database somewhere, and it’s largely because of these that information technology doesn’t forget everything it knows from second to second.
Databases, therefore, are perhaps the most core part of an IT infrastructure after the computing and networking hardware itself. Almost every application in the enterprise uses a database in some form, and this is the backbone of the application’s functionality. Everything many applications do is simply create and manipulate data in a database somewhere.
Most web applications, in fact, are simply wrappers around a database that let users view the data in different ways in an attempt to draw some context out of it and have it makes sense to their current task.
Many enterprises are purely based around the data in their databases, and the management and security of this data is synonymous with the business itself. Banks have all their customer information and balances in databases. No data means no business.
Never forget this –
That balance you think you have in your checking account? It’s just a record in a database somewhere, and the only reason you “have” that money is because the database says you do.
So, sleep well tonight.
Because they affect almost every development project
For those of you in the creative industry, you’ll come into contact with databases more often than you think, often in the context of a development project. This will bring you into contact with database architects and engineers and require you to speak their language.
When they start throwing around terms like “referential integrity” and “normal forms,” you need to be able to understand what applies to you and what doesn’t.
Perhaps more important than speaking their language, however, you need to be able to think like they do and put yourself in their shoes. You might propose things that they’re uncomfortable with, and when they resist, you need to understand why so you can discuss and resolve the problem the best possible way for both sides. You need to have a basic understanding of the pressures they’re under to perform in their roles, so you don’t inadvertently push them to do things that cause them to push back because of some reason four levels removed from what you’re thinking.
Why are they so uptight about something called a “primary key”? Why do they always have to know if this piece of data is an exception, or if there are going to be more like it? Why do they want our logs written to files instead of a database table?
As we said before, databases are the core of many applications. Building an application without understanding how an absolutely key piece of infrastructure works is like building a house with no understanding of how a nail holds two pieces of wood together.
Because reification is a core information skill
Databases the most primary example of the need to operationalize logical concepts. Databases are where vague ideas are solidified into actual, manageable units of information, and the skills around doing this are critical for anyone working in IT.
In many projects, goals are discussed in very conceptual and abstract ways.
“You know what we need about here? A database of all our of press releases!”
While this is certainly a noble goal in the abstract, to address this need in practical terms, you need to operationalize this information – you need to drag the concept of a press release down to some manageable format.
To do this, you need to be able to distill a press release into a pure information unit, then dissect it into its constituent parts, define which combination of those parts represents logical objects, and determine how those objects relate to each other.
The proper name for this process is reification , which is literally translated from Latin as “to make a thing.” It refers to the process of taking an abstract concept and turning it into a concrete thing.
The more common name for this is data modeling , which is an extremely valuable skill, both in the direct sense of what you do with it, but also in the indirect sense of the secondary effects this process will have on your project.
Nothing helps define a project more clearly than asking hard questions about very abstract concepts in order to model these concepts in some structured format, like that which a database requires. It’s at this stage of a project where the needs become clearer, the goals become sharper, and the scope starts to take shape. If your overall goal is unreachable, it’s often at this point where that becomes clear. I would wager that more projects have been scrapped at the data modeling phase than in any other, because it’s at this moment that absurdities stick out like sore thumbs.
Databases are the gears through which abstract ideas and pleasant notions are ground through in order to see if they still make sense in the stark, functional arena of tables, rows and columns.
The ability to understand and manage the tools to accomplish this process is a core skill of working with information in any capacity.
History, Purpose, and Usage
As a generic term, a “database” is any structured source of data which can be manipulated and searched. A directory of text files could be considered a database, as long as they were in such a structure that we could search through them and find a particular piece of information we needed.
A “relational database” is a source of data in a specific format involving interrelated sets of tabular data. The core concepts of the relational database were invented by Edgar Codd of IBM. In 1970, he published a paper entitled “A Relational Model of Data for Large Shared Data Banks” which established how a relational database should work in theory. All of today’s relational databases have been based on these theories. (Codd is often referred to as “the father of the relational database.”)
There are other kinds of databases – XML databases have been around for a while, and so-called “object databases” are all the rage right now – but in 99% of cases, when someone in this industry says “database,” they mean a standard relational database. For the purposes of this document, “database” and “relational database” are equivalent terms.
Very lightweight consumer-oriented databases can run completely on your local machine. The most common PC database is Microsoft Access (part of the Office suite). For Macs, it’s Filemaker.
However, at Blend, when we say “database,” we’re usually always talking about a system that runs in the background of a centralized server. These are systems that can manage data for multiple different applications. You may have applications on a dozen different servers all talking to the same database server.
The nomenclature can a little tricky here – a database server is a process running in the background of a server. It usually always manages multiple databases , even hundreds at the same time. These databases can be for completely separate things and have nothing to do with each other (much like a web server can serve hundreds of independent web sites).
Databases are the most industrial strength example of enterprise software. Very few other types of software are expected to support the sustained load and data integrity needs of databases. In many organizations, every piece of information the company needs to do business is in a database somewhere. No other type of software has been optimized, refined, tested, and developed to the level that database servers have been. Some of most cutting-edge research in computing and some of the most arcane computing architectures have been developed to make database servers perform faster and more reliably.
Companies take their databases very seriously, and spend massive amounts of money installing and maintaining them. There’s an entire job description dedicated to databases: the DBA (“Database Administrator”). All DBAs do all day is plan, design, manage, and monitor the various databases of a company. Experienced Oracle DBAs, in particular, are some of the highest paid people in all of IT.
In IT lore, DBAs are also considered the crankiest, most easily irritated type of employee. Think of a troll trying to guard his treasure, and you have a rough idea of the perception. Generally speaking, no one enjoys a conversation with a DBA. And if you screw up their data, God help you.
The central database server of a larger organization will traditionally be load-balanced across several servers, meaning the logic runs on many different computers, and tasks are sent to whichever machine is under the least load. Additionally, the data is often replicated in real-time to multiple physical locations to guard against data loss. For a company like a bank, the records in their databases is the business. The company doesn’t exist without it.
Microsoft’s commercial database system is called SQL (“sequel”) Server. It only runs on Windows servers. SQL Server is extremely common in the enterprise, and most every company with a Windows server architecture (90% of all U.S. companies, at least) will be running one or more instances of SQL Server.
The big competitor to SQL Server is Oracle , which runs on a variety of platforms. Oracle is vastly more expensive. It’s licensed by processor, and you can easily pay six or even seven figures for an Oracle license on a machine with many processing cores or for many servers in a data center. You see Oracle less and less these days as SQL Server has gotten better and better.
The open-source system for Linux that most apps use is called MySQL (we say “my-sequel,” though you might hear “my-ess-cue-elle” too). Another common one on Linux is PostgreSQL (commonly just “Postgres”). These two systems offer less functionality than the commercial options, but are often considered good enough. (Additionally, MySQL is well-known for being extremely fast, as it’s considerably stripped down.)
Some other less common database platforms are DB2 , Firebird , Informix , SQLite , Sybase , and Ingres.