Bayesian updating
Numeracy · 0 connections
A belief should move when evidence arrives, and exactly as far as that evidence favours it over the alternatives.
Near-universal rule
Explanation
A city has two kinds of cab: 15% are blue and 85% are green. A witness who is right 80% of the time says the cab in a hit-and-run was blue. What is the probability that the cab was blue? Most people say 80%. The correct answer is about 41%. “How likely is the witness to say blue if the cab is blue?” (80%) is not the same question as “How likely is the cab to be blue, given that the witness said blue?” (41%).

Picture 100 cabs in the city: 15 blue and 85 green. The witness is right 8 times out of 10 and wrong 2 times out of 10. Now imagine the witness looks at all 100 cabs one by one, and so gets 20 of them wrong. Of the 15 blue cabs, the witness says “blue” for 12 (80% of 15). Of the 85 green cabs, the witness mistakenly says “blue” for 17 (20% of 85). So the witness says “blue” in 12 + 17 = 29 cases, and in only 12 of those is the cab actually blue. The chance is 12 out of 29, which is about 41%. Even though the witness said blue, the cab is really blue only about 41 times out of 100.

Prior (what you know before the witness speaks): before anyone says anything, you already know that only 15 out of 100 cabs are blue. This is the prior probability, or base rate: 15%, or 0.15. It is your starting belief, taken from the mix of cabs in the city.

Posterior (what you believe after the witness speaks): the witness's report is new information, so you update your belief using the counts above. The updated belief is called the posterior probability. It is 12 out of 29, which is about 41%, or 0.41. The update runs from the prior (15%) to the posterior (41%).

The number goes up from 15% because the witness is usually right, but it does not go all the way to 80% because the many green cabs leave plenty of room for mistaken “blue” reports.
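The cab count above can also be checked with Bayes' rule directly. What follows is a minimal Python sketch rather than part of the original entry; the function name `posterior` and its parameter names are illustrative assumptions.

```python
def posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """Return P(hypothesis | evidence) from a prior and two likelihoods."""
    # Total probability of seeing the evidence, whether or not the hypothesis holds.
    p_evidence = prior * p_evidence_if_true + (1 - prior) * p_evidence_if_false
    return prior * p_evidence_if_true / p_evidence


# Cab problem: 15% of cabs are blue. The witness is right 80% of the time,
# so says "blue" for 80% of blue cabs and for 20% of green cabs.
p_blue = posterior(prior=0.15, p_evidence_if_true=0.80, p_evidence_if_false=0.20)
print(round(p_blue, 2))  # 0.41
```

The numerator is the 12 blue cabs correctly called blue; the denominator is all 29 "blue" reports.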
When it applies
Use it when you can state a belief as a number before the evidence arrives — even a rough one. "Probably" is not enough; "about 60%" is. It fits most cleanly when the problem has the structure of diagnosis: a prior probability (how common is this condition, this failure, this outcome in cases like this one?), a piece of evidence with known reliability, and a conclusion to draw. Medical testing is the clearest case, but the structure appears in any field that keeps base-rate records. It sharpens when several explanations are competing at once. The update runs against each candidate; evidence that discriminates between them moves the probabilities apart, evidence that says nothing either way leaves them where they were. The prior should come from a reference class where one exists — how often has this kind of thing turned out to be true — not from how this particular case feels.
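The case of several competing explanations can be sketched in a few lines of Python. This is an illustration under stated assumptions: the hypothesis labels, the priors, and the likelihoods are all hypothetical, and `update` is a name chosen here, not one taken from the entry.

```python
def update(priors, likelihoods):
    """Update hypothesis probabilities given P(evidence | hypothesis) for each."""
    # Weight each prior by how well that hypothesis predicts the evidence.
    unnormalised = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalised.values())
    # Rescale so the probabilities sum to 1 again.
    return {h: weight / total for h, weight in unnormalised.items()}


# Hypothetical priors for three competing explanations.
priors = {"A": 0.5, "B": 0.3, "C": 0.2}

# Evidence equally likely under every explanation leaves the priors where they were.
uninformative = update(priors, {"A": 0.4, "B": 0.4, "C": 0.4})

# Evidence much more likely under B moves the probabilities apart.
discriminating = update(priors, {"A": 0.1, "B": 0.8, "C": 0.1})

print(uninformative)
print(discriminating)
```

Only the ratios between the likelihoods matter: multiplying all three by the same constant gives the same result after rescaling.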
Where it stops
It stops where the prior has no ground under it. If no base rate exists and no substantive knowledge can be translated into a starting probability, the prior is a guess — and multiplying evidence against a guess produces a number that looks more precise than it is. The calculation runs; the result does not mean what it appears to mean. It also stops where the evidence cannot be assigned a probability in the first place. Some questions do not yield to probability estimates at all. When neither the prior nor the likelihood can be stated even roughly, the framework has no valid inputs and should not run.
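One way to see the problem with a guessed prior is to run the same evidence against several guesses. The sketch below reuses the cab witness's reliability (80% right, 20% wrong); the list of guessed priors is a hypothetical assumption chosen for illustration.

```python
def posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """Return P(hypothesis | evidence) from a prior and two likelihoods."""
    p_evidence = prior * p_evidence_if_true + (1 - prior) * p_evidence_if_false
    return prior * p_evidence_if_true / p_evidence


# Same evidence every time; only the guessed prior changes.
guessed_priors = [0.01, 0.05, 0.15, 0.50]
posteriors = [posterior(p, 0.80, 0.20) for p in guessed_priors]

for p, post in zip(guessed_priors, posteriors):
    print(f"prior {p:.2f} -> posterior {post:.2f}")
```

With these inputs the posterior ranges from about 4% to 80%. Each run prints a tidy two-decimal figure, but the spread across runs is the honest measure of what is known when the prior is only a guess.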
The misuse
A Dutch court convicted a nurse of multiple murders. The central statistical evidence was that the probability of the pattern of deaths and resuscitations on her shifts occurring by chance was roughly 1 in 342 million. Presented in court, it did its work. The error, committed by someone who calculated the figure, was treating it as the probability that she was not guilty. The figure measured how likely this cluster would be if she were not causing the incidents, P(cluster | not-guilty): “If the nurse is innocent, how likely is it to see this bad pattern of deaths on her shifts?” The answer is: extremely unlikely. The court needed how likely she was to be not guilty, given the cluster, P(not-guilty | cluster): “Given this bad pattern of deaths, how likely is it that the nurse is actually innocent?” Getting from one to the other requires the prior: how likely any given nurse is to be a serial killer before examining shift data. That prior is very small. It was never entered. The likelihood stood in for the posterior, and the conditional was inverted. She was later acquitted after independent statistical review.
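The gap between the two conditionals can be made concrete with a short Python sketch. All three inputs below are hypothetical and chosen only for illustration; they are not the figures from the case, and the function name `p_not_h_given_e` is an assumption of this sketch.

```python
def p_not_h_given_e(prior_h, p_e_if_h, p_e_if_not_h):
    """Return P(not-H | E): how likely the hypothesis is false, given the evidence."""
    p_e = prior_h * p_e_if_h + (1 - prior_h) * p_e_if_not_h
    return (1 - prior_h) * p_e_if_not_h / p_e


# Hypothetical inputs, for illustration only.
p_e_if_not_h = 1e-6  # P(E | not-H): how surprising the evidence is if H is false
prior_h = 1e-7       # P(H): the hypothesis is very rare before any evidence
p_e_if_h = 1.0       # P(E | H): the evidence is certain if H is true

answer = p_not_h_given_e(prior_h, p_e_if_h, p_e_if_not_h)
print(p_e_if_not_h)      # the likelihood: one in a million
print(round(answer, 2))  # 0.91
```

With these made-up numbers, evidence that is a one-in-a-million surprise under not-H still leaves not-H at about 91%, because the prior for H was smaller still. Reading the likelihood as the posterior would be wrong by several orders of magnitude.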
A worked example
In a multi-year U.S. intelligence research program, volunteer forecasters were scored for accuracy on geopolitical questions alongside professional analysts who had access to classified information the volunteers never saw. The volunteers won — roughly 30% better than the analyst comparison group, about 60% better than chance. The difference was not what the volunteers knew. It was what they did when new information arrived. The professional analysts tended to set an estimate and hold it, revising rarely and in large jumps when they did. The top-performing volunteers revised more often and in smaller steps — a 60% probability nudged to 65% on confirming news, trimmed to 55% on news that cut against. Each small revision brought their estimates closer to what eventually happened.
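Small, frequent revisions of this kind can be written as repeated updates in odds form. The sketch below is an illustration, not a description of how the forecasters actually worked: the likelihood ratios are hypothetical, and `nudge` is a name chosen here.

```python
def nudge(probability, likelihood_ratio):
    """Update a probability by a likelihood ratio, working in odds."""
    odds = probability / (1 - probability)
    new_odds = odds * likelihood_ratio
    return new_odds / (1 + new_odds)


# Hypothetical likelihood ratios: above 1 supports the forecast, below 1 cuts against it.
estimate = 0.60
history = [estimate]
for ratio in [1.25, 0.8, 1.25, 1.25]:
    estimate = nudge(estimate, ratio)
    history.append(estimate)

print([round(p, 2) for p in history])
```

Starting from 60%, a likelihood ratio of 1.25 gives roughly 65%, and a ratio of 0.8 gives roughly 55%, the size of step described above. Because the ratios multiply, the order in which the news arrives does not change the final estimate.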
Push
When new evidence arrives, combine it with what you already believed. Do not replace your prior with the evidence. Do not ignore the evidence and leave your prior untouched.
Veto
Do not treat the probability of the evidence as the probability of the hypothesis. A figure measuring how surprising your evidence would be if the hypothesis were false — a match probability, a cluster probability, a p-value — may not be reported or acted on as the probability that the hypothesis is false. The two are related only through the prior. Read off the likelihood as the answer and you have not updated at all; you have inverted a conditional without permission.
Go deeper
Superforecasting (Tetlock and Gardner, 2015): If the entry opened a question about your own forecasting practice, this shows updating as a discipline that can be learned, with real tournament data on what separates the better forecasters from the worse ones.
Book
Thinking, Fast and Slow (Kahneman, 2011): If the entry left you wanting a fuller account of how and why the prior gets dropped, this maps the failure modes (base-rate neglect, anchoring, and coherence errors) in narrative accessible to any reader.
Book
Understanding Uncertainty (Lindley, 2006): If you want the philosophical foundation without the mathematics, this gives the clearest account of why probability is the only consistent language for degrees of belief, and why priors are not optional.
Book
Mastery question
**Question:** A test you trust completely returns a result that points hard toward one conclusion. The number it hands you is accurate. Why is that number still not the probability that the conclusion is true?

**The answer:** Because the number measures how expected this result would be if the conclusion were true: P(evidence | hypothesis), the likelihood. That is a different quantity from how likely the conclusion is given the result: P(hypothesis | evidence), the thing you actually want. The only bridge between the two is your prior: you have to carry the result back through what you believed beforehand. A perfectly accurate likelihood is still the wrong conditional until it is multiplied through the prior. If the conclusion was very unlikely going in, even strong evidence may leave it unlikely.

**The answer that misses it:** "Because no test is perfect. There's always some error rate, so you can't be fully certain." This person reaches for measurement noise: they treat the gap between the number and the truth as a question of reliability, to be closed by a more accurate test.

**Why the difference matters:** The memoriser explains the gap as imperfection, as if more accuracy would close it. But the gap survives a perfect test, because it is not about error; it is about which conditional the number measures. Someone who has internalised the concept locates the gap in the inversion and names the prior as the missing operation.