Suppose you’re on a game show, and you’re given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what’s behind the doors, opens another door, say No. 3, which has a goat. He then says to you, “Do you want to pick door No. 2?” Is it to your advantage to switch your choice?
Spoiler alert: the answer is yes, it is to your advantage (assuming you’re like me and would fancy owning a nice car over, err, a goat).
The Monty Hall problem is famous because it seems completely counter-intuitive. It’s brain-melting stuff. I remember the first time I came across it; I was completely dumbfounded. Actually, I was dumbfounded for years. I refused to believe it. So did many people—including many PhDs and top scientists—when the problem was first published.
It was only when I started learning about Bayesian inference at university that my brain magically re-wired itself and all became clear. I saw the light, and it was glorious.
In this post, I’ll give an introduction to Bayesian inference and use it to prove that yes, it is indeed in your best interest to change your opinion, unless you want to end up with a goat wandering around in your apartment. But before I get to the nitty gritty, let’s start with the basics.
What is Bayesian inference?
Simply put, Bayesian inference is a statistical method in which the extent of our belief in some hypothesis H holding is updated as more evidence is provided. Ever seen one of those over-acted courtroom dramas where the prosecutor urges the jurors to find the defendant guilty beyond a reasonable doubt? It’s a great example of Bayesian inference. During the deliberation process, jurors start with a hypothesis of not guilty and update their belief in this hypothesis as they go through the evidence presented throughout the trial. A gun was found with the defendant’s fingerprints on it? The belief in innocence is lowered. The defendant was out of town during the night of the murder? The belief in innocence becomes stronger.
At the heart of Bayesian inference is Bayes’ theorem, devised by English statistician, philosopher and minister Reverend Thomas Bayes. Mathematically speaking, it’s an equation. Being an equation, it can be expressed in various different ways, each possibly more incomprehensible than the last. Here’s the form I find easiest to grasp:
P(H|E) = ( P(E|H) * P(H) ) / P(E)
- P(H|E) = The probability of H occurring given that E has occurred. In Bayes-speak, P(H|E) is known as the posterior probability or simply posterior. I usually think of the posterior as “the stuff on the left-hand side of the equation”.
- P(E|H) = The probability of E occurring given that H has occurred. Also known as the likelihood (which I find slightly confusing given all this talk of probabilities).
- P(H) = The probability of H occurring regardless of any other information. This is known as the prior probability or prior.
- P(E) = The probability of E occurring regardless of any other information. This is known as the marginal likelihood or model evidence.
The objective is to calculate the posterior probability based on an initial probability (our prior) and the probability of a piece of evidence E. We can perform this calculation once, but better still, we can update the posterior probability when new evidence becomes available by simply using the previous posterior as the prior for our new calculation (this is known as Bayesian updating). It’s like Bayes’ theorem inside Bayes’ theorem. It quickly becomes complex, but for solving the Monty Hall problem, we don’t need to update the posterior at all.
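In code, the theorem itself is a one-liner. Here’s a minimal sketch in Python; note that the likelihood and evidence values in the updating loop below are made-up numbers, purely to illustrate how one posterior feeds in as the next prior:

```python
# Bayes' theorem: posterior = likelihood * prior / evidence
def bayes(likelihood, prior, evidence):
    return likelihood * prior / evidence

# Bayesian updating: the posterior from one round becomes
# the prior for the next piece of evidence.
belief = 0.5  # initial prior (illustrative value)
for likelihood, evidence in [(0.8, 0.6), (0.9, 0.7)]:  # made-up numbers
    belief = bayes(likelihood, belief, evidence)
```

After the loop, `belief` holds the posterior given both (hypothetical) pieces of evidence.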
Solving the Monty Hall problem using Bayesian inference
Before we apply Bayes’ theorem, let’s take note of what information we already have and formulate our hypotheses. Before the game show contestant (let’s call him Jim) has done anything, he has three doors to choose from. The probability of a given door having a car behind it is 1/3. Let A, B and C be the following hypotheses:
- A: The car is behind the first door.
- B: The car is behind the second door.
- C: The car is behind the third door.
Trivially, P(A) = P(B) = P(C) = 1/3.
Our evidence comes in the form of the host opening a particular door. This gives us new information with which to update our posterior probability. According to the rules of the game show, Monty will never open the door with a car behind it, nor will he open the door currently chosen by the contestant. That leaves two possible doors. From Jim’s perspective, it’s equally likely for Monty to choose one over the other. Thus, our marginal likelihood is P(E) = 1/2.
Next, we consider our hypotheses in light of the evidence. Overall, we are interested in the probability of the car being behind a particular door given the evidence (the host opening another door). Let’s follow the Monty Hall problem definition and say Jim initially chooses the first door, after which Monty opens door number three to reveal a goat. Since hypothesis C will never hold (Monty will never open a door he knows to have a car behind it), only hypotheses A and B are of interest.
Let’s start with hypothesis A (the car is behind the first door). Our prior is the probability of the car being behind door A regardless of any other information we have. P(A) = 1/3.
Our marginal likelihood is the probability of Monty choosing a given door regardless of any other information. Since he only ever has a choice between two doors, P(E) = 1/2 as stated above.
Our likelihood, i.e. the probability that Monty chooses door three given that the car is behind the first door, is 1/2 (both the second and third doors have goats behind them). P(E|A) = 1/2.
Finally, we can calculate the posterior:
P(A|E) = (P(E|A) * P(A)) / P(E) = (1/2 * 1/3) / (1/2) = 1/3.
The probability of the car being behind the first door is 1/3. That means Jim has a one in three chance of winning if he sticks with his original choice.
Now, what about the posterior of hypothesis B? The marginal likelihood is the same as before, since marginal likelihoods don’t change based on the hypotheses under consideration. The prior also happens to be the same as before, 1/3. But the likelihood isn’t. If we assume the hypothesis holds and say that the car is behind the second door, Monty has no choice but to open door three (remember, he’ll never open a door with a car behind it, nor will he open the door the contestant initially chose). Thus, P(E|B) is 1.
Our posterior is thus:
P(B|E) = (P(E|B) * P(B)) / P(E) = (1 * 1/3) / (1/2) = 2/3.
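As a quick sanity check, we can plug both calculations into Python’s fractions module to keep the arithmetic exact; this simply mirrors the two formulas above:

```python
from fractions import Fraction

prior = Fraction(1, 3)     # P(A) = P(B) = 1/3
evidence = Fraction(1, 2)  # P(E): Monty picks one of two allowed doors

# P(E|A) = 1/2: car behind door one, Monty may open door two or three
posterior_stay = (Fraction(1, 2) * prior) / evidence
# P(E|B) = 1: car behind door two, Monty must open door three
posterior_switch = (Fraction(1, 1) * prior) / evidence

print(posterior_stay, posterior_switch)  # 1/3 2/3
```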
In terms of Bayesian inference, we can state that our belief in hypothesis B is stronger than our belief in A, since 2/3 is greater than 1/3. In other words, Jim has a two thirds chance of winning a car if he changes his mind and chooses a different door. He doubles his chances of winning a car if he changes his mind. With the prospect of dragging a goat across town, that’s a bet worth taking.
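If Bayes’ theorem still feels like sleight of hand, a quick simulation makes the result concrete. Here’s a sketch that plays the game many times under the rules described above and counts how often each strategy wins:

```python
import random

def play(switch, trials=100_000):
    wins = 0
    for _ in range(trials):
        car = random.randrange(3)   # door hiding the car
        pick = random.randrange(3)  # contestant's initial pick
        # Monty opens a door that is neither the pick nor the car
        opened = next(d for d in range(3) if d != pick and d != car)
        if switch:
            # switch to the remaining unopened door
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += (pick == car)
    return wins / trials

print(play(switch=False))  # ≈ 1/3
print(play(switch=True))   # ≈ 2/3
```

Run it a few times: sticking hovers around 0.33, switching around 0.67, just as the posteriors predict.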
Besides solving the Monty Hall problem, what else can you do with Bayesian inference?
Properly answering this question would take at least a couple of PhD theses, so I’ll give you a highly abridged version. Generalising the Monty Hall problem, Bayesian inference can be used to assign probabilities to all kinds of hypotheses. In addition, one can use Bayesian inference to predict the probable values of new, unobserved data points, which makes the method great for all kinds of forecasting. Though the calculations quickly become complex—often containing hundreds, if not thousands, of integrals—for problems more complicated than the Monty Hall problem, advances in cloud computing and breakthroughs in applied probability theory (Markov Chain Monte Carlo, I’m looking at you) allow us to steer clear of the massive equations and symbol hieroglyphics only a mathematician could love.
Given its capability to assign probabilities to hypotheses and predict things, the number of possible applications for Bayesian inference is virtually limitless. As a developer, I can use the method to administer statistically sound A/B tests, write machine learning algorithms, and develop recommendation systems, to name but a few. In an era where we rely on data to inform our decision-making, it’s to statistical methods like Bayesian inference we owe our sincerest thanks. They drive the design and development of the cutting-edge services of today, and enable the future innovations of tomorrow.
For fear that I may be understating just how important a role Bayesian inference plays in our universe, I’ll leave you with this parting thought:
If humans make decisions based on past experiences, doesn’t that mean that the human brain is simply a complex Bayesian computing machine?
I’ll be completely honest: I would not be surprised if, one day, scientific research found this to be true.
This post was written by Max Pagels, a Data Science Specialist at SC5 who spends an inordinate amount of time getting lost in statistical complexities of his own making. Thanks to Samuli Savolainen, Mikael Blomberg, and Niklas Lindgren for proofreading and feedback, and to Antti Salo for providing the cover art.