TED: How juries are fooled by statistics by Peter Donnelly

November 20, 2008 — aboytsov

If you have a disease test which is 99% accurate, and it has come back positive for a patient, what is the probability that the patient indeed has the disease? 99%? Wrong! Think again! You really can’t tell unless you know the prior probability of the disease itself, i.e. how likely a random patient is to have it. For a fairly rare disease, a single positive test may leave you only 1% confident, or even less. Although this is obvious to a statistics specialist, most people are completely stumped by it and find it very hard to grasp. I won’t explain here why it is so, since I doubt I’ll do a better job than the speaker I’m about to present.
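For readers who want to see the numbers, here is a minimal sketch of the calculation using Bayes' formula. It assumes that "99% accurate" means both that 99% of sick patients test positive and that 99% of healthy patients test negative; the function name and the prevalence values are made up for illustration.

```python
def posterior_given_positive(prior, sensitivity=0.99, specificity=0.99):
    """P(disease | positive test) by Bayes' formula.

    Assumption: "99% accurate" is read as sensitivity = specificity = 0.99.
    """
    true_positives = sensitivity * prior                   # sick and tested positive
    false_positives = (1.0 - specificity) * (1.0 - prior)  # healthy but tested positive
    return true_positives / (true_positives + false_positives)


# Illustrative prevalence values, not taken from the talk.
for prior in (0.0001, 0.001, 0.01, 0.5):
    result = posterior_given_positive(prior)
    print(f"prevalence {prior:.2%}: P(disease | positive) = {result:.2%}")
```

Under these assumptions, a disease that affects 1 person in 10,000 gives a positive patient a chance of slightly under 1% of actually being sick, while a prevalence of 1 in 100 gives exactly 50%.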

Peter Donnelly is an Oxford mathematician, specializing in applied probability. In this highly educational (and quite jaw-dropping for most) talk, he reveals common mistakes in interpreting statistics – and the devastating impact they can have. This is a great presentation of how important statistics is, and how crucial, and yet so uncommon, it is to understand it.

There are so many ways to misuse or misinterpret statistics, with one of the favorites being, of course, treating correlation as causation. And yet, I believe statistics is a cornerstone of modern science. It may not play such a central role in theoretical physics, of course, but medicine, sociology, cognitive science, etc. all depend on it to interpret the results of experiments correctly. If we all had a better education in this area, maybe we wouldn’t be interviewing successful people so much.

Please watch this talk. It might be a little boring in the beginning, but it gets much more fun pretty soon.

November 21, 2008 at 4:17 am

My teacher told me: there are three kinds of lies: a lie, a Big Lie and statistics.

But your example with the disease test is not correct: the answer depends on how accuracy is defined. If the accuracy is defined as “Out of 100 infected persons, 99 have positive tests”, it is exactly as you wrote – it depends on the probability of the disease and you need to use Bayes’ formula to calculate the answer.

But if the accuracy is defined as “Out of 100 persons who had positive tests, 99 actually have the disease”, the answer does NOT depend on other factors.

The problem is that people rarely know the concrete definition of “accuracy” and just assume the one that seems more natural to them.

November 21, 2008 at 2:06 pm

I think your teacher borrowed it from someone else. 🙂 “There are three kinds of lies: lies, damned lies, and statistics”. Attributed to Benjamin Disraeli and popularized by Mark Twain – http://en.wikipedia.org/wiki/Lies,_damned_lies,_and_statistics.

I would rather he had not popularized it, since it’s now used as a popular defense by people who just want to keep believing what they believe and refuse to look at the evidence (=data =statistics).

As for your accuracy argument – you’re right, and it’s a good one. Thanks!

November 21, 2008 at 6:24 pm

The “99%” answer isn’t necessarily wrong. It’s just based on an implicit assumption about the prior distribution.

November 23, 2008 at 8:23 pm

Agreed, but the assumption would have to be completely unrealistic – that every second person in the population has the disease.
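A quick way to check the "every second person" figure is to invert Bayes' formula and ask which prior produces a 99% posterior. This is only a sketch: it assumes the test is 99% accurate for both sick and healthy patients, and the function name is made up for illustration.

```python
def prior_for_posterior(target, sensitivity=0.99, specificity=0.99):
    """Prevalence needed for P(disease | positive) to equal `target`.

    Obtained by solving Bayes' formula for the prior.
    Assumption: sensitivity = specificity = 0.99.
    """
    false_positive_rate = 1.0 - specificity
    numerator = target * false_positive_rate
    denominator = sensitivity * (1.0 - target) + target * false_positive_rate
    return numerator / denominator


needed = prior_for_posterior(0.99)
print(f"prior needed for a 99% posterior: {needed:.2%}")
```

With these assumptions the required prior comes out at 50%, i.e. half of the population would have to be sick.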

February 22, 2010 at 10:04 pm

Absolutely fascinating. There is a 99% chance I would like another statistics class if I took one! TED is my dream convention! xx den