By Deane Barker tags: math, statistics

Thomas Bayes was an 18th-century English polymath. He documented a method for statistical analysis and probability which has become core to those disciplines.

In its simplest form, Bayes Theorem allows us to find the probability of an event occurring when we know the probability of multiple related events occurring.

  • If we know that A leads to D Y% of the time
  • And we know that B leads to D X% of the time
  • And we also know that if C occurs, then D did not occur Z% of the time
  • So, if A, B, and C all occur, we can now calculate how likely it is that D will occur

Roughly speaking, Bayes Theorem allows us to predict the future by knowing the past.

If we know that when a country’s inflation rate hits 30%, that country is 45% likely to have a coup in the next 30 days; and when a country has a significant natural disaster, it is 3% likely to have a coup in the next 30 days; then we can predict the likelihood of a coup when those two things happen together.

Bayes Theorem is a specific mathematical equation. I won’t reproduce the math here because it’s easily findable and well-known.

The word “Bayesian” is used in a lot of ways – “Bayesian analysis” and “Bayesian inference” are both popular. In essence, these refer to the same thing: using the probability of known past events to predict the likelihood of a future event.

Why I Looked It Up

I first heard the word 20 years ago when Bayesian analysis became popular in email spam detection. It went something like this:

If the word “penis” is in an email, there’s a 91% chance it’s spam. If the word “enlargement” is in an email, there’s a 57% chance it’s spam. But if the word “accounting” is in an email, there’s only a 3% chance it’s spam. If all three words occur, what is the chance that it’s spam?
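One common way to answer that question, as a rough sketch: treat each word as independent evidence, and assume spam and legitimate mail are equally likely before looking at any words. Neither assumption comes from the filters described here; they are simplifications that make the arithmetic short. The function name `combine_spam_probabilities` is made up for illustration.

```python
from math import prod


def combine_spam_probabilities(word_probs):
    """Combine per-word P(spam | word) values into a single spam score.

    Assumptions (not from the original post): the words are independent
    of each other, and spam and non-spam are equally likely beforehand.
    """
    spam_evidence = prod(word_probs)                 # all words point to spam
    ham_evidence = prod(1 - p for p in word_probs)   # all words point to non-spam
    return spam_evidence / (spam_evidence + ham_evidence)


# The three per-word figures from the example above: 91%, 57%, 3%.
score = combine_spam_probabilities([0.91, 0.57, 0.03])
print(f"{score:.1%}")  # prints 29.3%
```

Under those assumptions the answer is roughly 29%. The one strongly "innocent" word pulls the score well below what the first word alone would suggest, because a 3% figure is much stronger evidence than a 57% one.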

Bayesian analysis was used in most of the original spam suppression systems. The system would “learn” from pre-categorized emails (either your own, or a general-purpose set that someone else had compiled), and then would apply what it had learned to incoming email.
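The “learning” step is mostly counting. Here is a minimal sketch, assuming each training email arrives as a `(text, is_spam)` pair and that there is at least one example of each kind. The four sample emails and the function name `learn_word_probabilities` are invented for illustration and are not taken from any real filter.

```python
from collections import Counter


def learn_word_probabilities(labeled_emails):
    """Estimate P(spam | word) for every word seen in the training emails."""
    spam_counts, ham_counts = Counter(), Counter()
    n_spam = n_ham = 0
    for text, is_spam in labeled_emails:
        words = set(text.lower().split())  # count each word once per email
        if is_spam:
            spam_counts.update(words)
            n_spam += 1
        else:
            ham_counts.update(words)
            n_ham += 1
    if n_spam == 0 or n_ham == 0:
        raise ValueError("need at least one spam and one non-spam email")

    probs = {}
    for word in spam_counts.keys() | ham_counts.keys():
        spam_rate = spam_counts[word] / n_spam  # share of spam containing word
        ham_rate = ham_counts[word] / n_ham     # share of non-spam containing word
        probs[word] = spam_rate / (spam_rate + ham_rate)
    return probs


# Hypothetical, hand-made training set.
training = [
    ("cheap enlargement pills", True),
    ("enlargement offer today", True),
    ("accounting report attached", False),
    ("accounting offer review", False),
]
probs = learn_word_probabilities(training)
print(probs["enlargement"], probs["offer"], probs["accounting"])  # 1.0 0.5 0.0
```

With so little data, some words land on exactly 0 or 1. A practical filter would need far more examples and would likely keep those extremes from dominating the final score.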

I used a Bayesian filter on a Microsoft Exchange server back in 2004 or so, and it was very effective.

Since then, I’ve heard the word Bayesian used in a lot of books on politics and intelligence. It’s quite commonly used by intelligence services to predict the likelihood that some event will occur.

I had a general understanding of why it was used, but I wanted specifics.


Added on

In Spam: A Shadow History of the Internet, there’s a great analogy to explain Bayesian theory:

It can be briefly summarized by a common analogy using black and white marbles. Imagine someone new to this world seeing the first sunset of her life. Her question: will the sun rise again tomorrow? In ignorance, she defaults to a fifty-fifty chance and puts a black marble and a white marble into a bag. When the sun rises, she puts another white marble in. The probability of randomly picking a white from the bag – that is, the probability of the sun rising based on her present evidence – has gone from 1 in 2 to 2 in 3. The next day, when the sun rises, she adds another marble, moving it to 3 in 4, and so on. Over time, she will approach (but never reach) certainty that the sun will rise. If, one terrible morning, the sun does not rise, she will put in a black marble and the probability will decline in proportion to the history of her observations. This system can be extended to very complex problems in which each of the marbles in the bag is itself a bag of marbles: a total probability made up of many individual, varying probabilities…
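The marble bag in that passage is easy to sketch in a few lines of Python. This version starts with one white and one black marble, as the quote describes, and adds one marble per observation; this counting scheme is usually known as Laplace’s rule of succession. The function name `sunrise_probability` is just an illustrative choice.

```python
from fractions import Fraction


def sunrise_probability(sunrises, no_sunrises=0):
    """Chance of drawing a white marble after the given observations.

    The bag starts with one white and one black marble (the fifty-fifty
    default), then gains a white marble per sunrise and a black marble
    per morning without one.
    """
    white = 1 + sunrises
    black = 1 + no_sunrises
    return Fraction(white, white + black)


for days in range(4):
    print(days, sunrise_probability(days))
# 0 1/2
# 1 2/3
# 2 3/4
# 3 4/5
```

The fraction keeps climbing toward 1 but never gets there, which matches the “approach (but never reach) certainty” line in the quote. One missed sunrise after two observed ones, `sunrise_probability(2, 1)`, drops the figure from 3/4 to 3/5.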
