Creative Commons License
This pyAgrum notebook is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Author: Aymen Merrouche and Pierre-Henri Wuillemin.

**Monty Hall Problem**

This notebook follows the examples from "The Book Of Why" (Pearl, 2018) chapter 6 page 178.

Assume that you are playing a game in which you choose one of three doors to open, revealing the prize you have won. Behind one of the three doors is a car, and behind the other two are goats. After you make your choice, the host opens one of the two remaining doors, behind which is a goat. Do you want to change your choice?

The host can't choose the winning door. Let's enumerate all the possible cases:

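Suppose that you choose door 1:

| Door 1 | Door 2 | Door 3 | Outcome if you switch | Outcome if you stay |
|--------|--------|--------|-----------------------|---------------------|
| car    | goat   | goat   | lose                  | win                 |
| goat   | car    | goat   | win                   | lose                |
| goat   | goat   | car    | win                   | lose                |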

Simulation:

$P(winning \mid switched) = \frac{2}{3}$
$P(winning \mid stayed) = \frac{1}{3}$
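
The two estimates above can be reproduced with a short Monte Carlo sketch. This is not the notebook's original simulation code: the function names `play` and `win_rate`, the number of games and the seed are all illustrative assumptions, and only Python's standard library is used.

```python
import random

DOORS = (1, 2, 3)

def play(switch, rng):
    """Play one game with a host who never opens the player's door or the car's door."""
    car = rng.choice(DOORS)
    choice = rng.choice(DOORS)
    # The host picks at random among the doors he is allowed to open.
    opened = rng.choice([d for d in DOORS if d != choice and d != car])
    if switch:
        # Exactly one door is neither the first choice nor the opened one.
        choice = next(d for d in DOORS if d != choice and d != opened)
    return choice == car

def win_rate(switch, n_games=100_000, seed=0):
    """Estimate the probability of winning for a fixed strategy (assumed sample size)."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(n_games)) / n_games

switch_rate = win_rate(switch=True)
stay_rate = win_rate(switch=False)
print(f"P(winning | switched) ~ {switch_rate:.3f}")
print(f"P(winning | stayed)   ~ {stay_rate:.3f}")
```

With enough games the two estimates should settle close to $\frac{2}{3}$ and $\frac{1}{3}$; the exact digits depend on the seed.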

How is that? How can systematically changing my choice be favorable? Intuition suggests that once the host opens a door, I am left with two choices, so I should have a $\frac{1}{2}$ probability of winning if I switch and a $\frac{1}{2}$ probability of winning if I stick to my choice. This intuition ignores how the host decided which door to open.

Causal diagram:

The host can't pick the winning door, so the variable "Door Opened" has two causes: the door you choose can't be opened ($Your Door \rightarrow Door Opened$), and the host can't open the door hiding the car ($Location Of The Car \rightarrow Door Opened$). The corresponding causal diagram is the following:

Suppose that you choose door number 1, and the host opens door number 3:

The Bayesian network confirms it: systematically switching is the best strategy!

In the beginning, I have to choose between three doors with no prior knowledge:

$P(Location Of The Car = 1) = \frac{1}{3}$ $P(Location Of The Car = 2) = \frac{1}{3}$ $P(Location Of The Car = 3) = \frac{1}{3}$

I choose to open door number 1.

The host opens door number 3. Knowing that he can't pick the winning door, we get:

$P(DoorOpened = 3 \mid LocationOfTheCar = 3) = 0$ $P(DoorOpened = 3 \mid LocationOfTheCar = 2) = 1$ $P(DoorOpened = 3 \mid LocationOfTheCar = 1) = \frac{1}{2}$

We apply Bayes' rule: $P(LocationOfTheCar = 2 \mid DoorOpened = 3) = \frac{P(DoorOpened = 3 \mid LocationOfTheCar = 2)\times P(LocationOfTheCar = 2)}{\sum_{i=1}^{3}{P(DoorOpened = 3 \mid LocationOfTheCar = i) \times P(LocationOfTheCar = i)}} = \frac{1 \times \frac{1}{3}}{\frac{1}{3} \times (0+1+\frac{1}{2})} = \frac{2}{3} $
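
The same application of Bayes' rule can be checked with exact arithmetic. The sketch below is only an illustration: the dictionary names `prior`, `likelihood` and `posterior` are assumptions, and it hard-codes the situation described above (you chose door 1, the host opened door 3).

```python
from fractions import Fraction

# Prior: the car is equally likely to be behind each door.
prior = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}

# Likelihood P(DoorOpened = 3 | LocationOfTheCar = i) when you chose door 1
# and the host never opens your door or the winning door.
likelihood = {1: Fraction(1, 2), 2: Fraction(1), 3: Fraction(0)}

# Denominator of Bayes' rule: P(DoorOpened = 3).
evidence = sum(likelihood[i] * prior[i] for i in prior)

# Posterior P(LocationOfTheCar = i | DoorOpened = 3) for every door i.
posterior = {i: likelihood[i] * prior[i] / evidence for i in prior}

for door, p in posterior.items():
    print(f"P(LocationOfTheCar = {door} | DoorOpened = 3) = {p}")
```

The posterior is $\frac{1}{3}$ for door 1, $\frac{2}{3}$ for door 2 and $0$ for door 3, matching the hand computation.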

This is explained by collider bias. "Door Opened" is a collider: once we observe it, it is as if we were controlling for it, which opens a non-causal path between "Your Door" and "Location Of The Car". If you choose door number 1 and the host opens door number 3, the car is twice as likely to be behind door number 2 as behind door number 1. The host must not open the winning door, so why did he open door number 3 and not door number 2? Maybe because door number 2 is the winning door. Door number 1 gets no such support: the host could never open the door you chose, so the fact that it stayed closed tells us nothing about it.
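
The collider effect can be made concrete by enumerating the full joint distribution of the three variables. This is a sketch under an added assumption that is not stated in the text: the player picks a door uniformly at random, independently of where the car is. The helper names (`host_prob`, `prob`, `cond`) are hypothetical.

```python
from fractions import Fraction
from itertools import product

DOORS = (1, 2, 3)

def host_prob(opened, yours, car):
    """P(opened | yours, car) for a host who avoids your door and the car's door."""
    allowed = [d for d in DOORS if d != yours and d != car]
    return Fraction(1, len(allowed)) if opened in allowed else Fraction(0)

# Joint P(yours, car, opened); yours and car are assumed independent and uniform.
joint = {
    (yours, car, opened): Fraction(1, 9) * host_prob(opened, yours, car)
    for yours, car, opened in product(DOORS, repeat=3)
}

def prob(event):
    """Probability of the set of outcomes (yours, car, opened) satisfying event."""
    return sum(p for outcome, p in joint.items() if event(*outcome))

def cond(event, given):
    """Conditional probability P(event | given)."""
    return prob(lambda y, c, o: event(y, c, o) and given(y, c, o)) / prob(given)

# Before observing the collider, your door says nothing about the car.
before = cond(lambda y, c, o: c == 2, lambda y, c, o: y == 1)

# After observing the collider, your door and the car's location become dependent.
after = cond(lambda y, c, o: c == 2, lambda y, c, o: y == 1 and o == 3)

print(f"P(car = 2 | yours = 1)             = {before}")
print(f"P(car = 2 | yours = 1, opened = 3) = {after}")
```

Knowing only your own choice leaves the probability at $\frac{1}{3}$; additionally conditioning on "Door Opened" moves it to $\frac{2}{3}$.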

Version 2:

In this version of the game, the host picks one of the two remaining doors at random (possibly the door hiding the car). For a given choice of yours, there are 6 cases to enumerate (3 car locations × 2 doors the host may open). Again, we will only account for cases where the host opens one of the two remaining doors and reveals a goat.

Simulation:

In this simulation, we only look at the cases where the host doesn't reveal the car, because that is the only data we are left with. Finding a result different from the first simulation highlights the importance of the process that generated the data.

And in this case:
$P(winning \mid switched) = \frac{1}{2}$
$P(winning \mid stayed) = \frac{1}{2}$
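
A minimal sketch of such a filtered simulation is given below. It is not the notebook's original code: the names `play_random_host` and `conditional_win_rates`, the sample size and the seed are assumptions. Games in which the host reveals the car are discarded before computing the win rates.

```python
import random

DOORS = (1, 2, 3)

def play_random_host(rng):
    """Play one game with a host who opens a random door other than the player's.

    Returns None when the host reveals the car (the game is discarded),
    otherwise a pair (stay_wins, switch_wins).
    """
    car = rng.choice(DOORS)
    choice = rng.choice(DOORS)
    opened = rng.choice([d for d in DOORS if d != choice])
    if opened == car:
        return None
    other = next(d for d in DOORS if d != choice and d != opened)
    return choice == car, other == car

def conditional_win_rates(n_games=200_000, seed=0):
    """Estimate win rates among the games where a goat was revealed."""
    rng = random.Random(seed)
    kept = stay_wins = switch_wins = 0
    for _ in range(n_games):
        result = play_random_host(rng)
        if result is None:
            continue  # the host revealed the car: not part of the observed data
        kept += 1
        stay_wins += result[0]
        switch_wins += result[1]
    return stay_wins / kept, switch_wins / kept, kept / n_games

stay_rate_v2, switch_rate_v2, kept_fraction = conditional_win_rates()
print(f"P(winning | stayed)   ~ {stay_rate_v2:.3f}")
print(f"P(winning | switched) ~ {switch_rate_v2:.3f}")
print(f"fraction of games kept ~ {kept_fraction:.3f}")
```

About one game in three is discarded, and among the remaining games the two strategies should win roughly equally often.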

We create the causal diagram:

Since the host can open any door apart from the one you chose, "Location of the car" is no longer a cause of "Door opened". The corresponding causal diagram is the following:

Suppose that you choose door number 1, and the host opens door number 3 revealing a goat:

The host chooses randomly between the remaining doors:

In the beginning, I have to choose between three doors with no prior knowledge: $P(Location Of The Car = 1) = \frac{1}{3}$ $P(Location Of The Car = 2) = \frac{1}{3}$ $P(Location Of The Car = 3) = \frac{1}{3}$
I choose to open door number 1.

The host randomly opens door number 3. We get:
$P(DoorOpened = 3 \mid LocationOfTheCar = 3) = \frac{1}{2}$ $P(DoorOpened = 3 \mid LocationOfTheCar = 2) = \frac{1}{2}$ $P(DoorOpened = 3 \mid LocationOfTheCar = 1) = \frac{1}{2}$

We apply Bayes' rule: $P(LocationOfTheCar = 2 \mid DoorOpened = 3) = \frac{P(DoorOpened = 3 \mid LocationOfTheCar = 2)\times P(LocationOfTheCar = 2)}{\sum_{i=1}^{3}{P(DoorOpened = 3 \mid LocationOfTheCar = i) \times P(LocationOfTheCar = i)}} = \frac{\frac{1}{2} \times \frac{1}{3}}{\frac{1}{3} \times (\frac{1}{2}+\frac{1}{2}+\frac{1}{2})} = \frac{1}{3} = P(LocationOfTheCar = 2)$

In this case, the host chooses randomly between door number 2 and door number 3. Why did he pick door number 3 and not door number 2? There is no reason for his choice. If he had revealed the car, it would mean that you made the wrong choice (you have no chance of winning whether you switch or not). If he reveals a goat, he was in no way forced to do so, because his choice was completely random: door number 3 is simply ruled out, and the remaining probability is split equally between door number 1 and door number 2 (you have the same chance of winning, $\frac{1}{2}$, whether you switch or not).
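
The two pieces of information in this version (which door was opened, then what was behind it) can be applied as two successive Bayesian updates. The sketch below uses exact fractions; `bayes_update` and the variable names are hypothetical helpers introduced for illustration only.

```python
from fractions import Fraction

def bayes_update(prior, likelihood):
    """Return the posterior proportional to prior[i] * likelihood[i]."""
    unnormalized = {i: prior[i] * likelihood[i] for i in prior}
    total = sum(unnormalized.values())
    return {i: p / total for i, p in unnormalized.items()}

uniform = {door: Fraction(1, 3) for door in (1, 2, 3)}

# Step 1: the host opens door 3 at random, with probability 1/2 wherever the car is.
opened_3 = {1: Fraction(1, 2), 2: Fraction(1, 2), 3: Fraction(1, 2)}
after_opening = bayes_update(uniform, opened_3)

# Step 2: a goat is seen behind door 3, which is impossible if the car is there.
goat_behind_3 = {1: Fraction(1), 2: Fraction(1), 3: Fraction(0)}
after_goat = bayes_update(after_opening, goat_behind_3)

print("after the random opening:", {d: str(p) for d, p in after_opening.items()})
print("after seeing the goat:   ", {d: str(p) for d, p in after_goat.items()})
```

The first update leaves every door at $\frac{1}{3}$, since a random opening carries no information about the car. The second update removes door 3 and leaves doors 1 and 2 at $\frac{1}{2}$ each.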