cause, chance and Bayesian statistics

a briefing document

Cause, chance and Bayesian statistics is one in a series of documents showing how to apply empiric reasoning to social and psychological problems.
	Intelligence: misuse and abuse of statistics	drugs, smoking and addiction
	establishment psycho-bunk	cause, chance and Bayesian statistics
	For related empiric reasoning documents, start with Why Aristotelian logic does not work

Index
Introduction
Black and blue taxis
Testing for rare conditions
How bad can it get?
Endnotes

Introduction

Bayes [1], Thomas, 1702-1761. An English theologian and mathematician who was the first to use probability assessments inductively; that is, to calculate the probability of a new event on the basis of earlier probability estimates derived from empiric data. Bayes set down his ideas on probability in “Essay Towards Solving a Problem in the Doctrine of Chances” (1763, published posthumously). That work became the basis of a statistical technique now called Bayesian statistics.

A key feature of Bayesian methods is the notion of using an empirically derived probability distribution for a population parameter. The Bayesian approach permits the use of objective data or subjective opinion [2] in specifying a prior distribution [3]. With the Bayesian approach, different individuals might specify different prior distributions. Classical statisticians argue that, for this reason, Bayesian methods suffer from a lack of objectivity. Bayesian proponents argue, correctly, that the classical methods of statistical inference have built-in subjectivity (through the choice of a sampling plan and the assumption of ‘randomness’ of distributions) and that an advantage of the Bayesian approach is that the subjectivity is made explicit [4]. However, a prior distribution cannot easily be argued to be strongly ‘subjective’.

Bayesian methods have been used extensively in statistical decision theory. In this context, Bayes's theorem provides a mechanism for combining a prior probability distribution for the states of nature with new sample information. The combination gives a revised probability distribution about the states of nature, which can then be used as a prior probability with a future new sample, and so on. The intent is that the earlier probabilities are then used to make ever better decisions. Thus, this is an iterative or learning process, and is a common basis for establishing computer programmes that learn from experience (see Feedback and crowding).
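The iterative process just described, where each revised probability serves as the prior for the next sample, can be sketched in a few lines of Python. This is only an illustration: the function name update and all the numbers (a starting prior of 1 in 10, evidence that turns up 4 times in 5 when the hypothesis is true and 1 time in 5 when it is false) are assumptions made for the example, not figures from any real study.

```python
from fractions import Fraction


def update(prior, p_evidence_if_true, p_evidence_if_false):
    """Bayes's theorem for one hypothesis and one new piece of evidence."""
    numerator = prior * p_evidence_if_true
    return numerator / (numerator + (1 - prior) * p_evidence_if_false)


# Hypothetical numbers, chosen only for illustration.
belief = Fraction(1, 10)       # initial prior probability of the hypothesis
HIT = Fraction(4, 5)           # chance of the evidence if the hypothesis is true
FALSE_ALARM = Fraction(1, 5)   # chance of the evidence if the hypothesis is false

history = [belief]
for _ in range(3):             # three independent observations, all supporting
    belief = update(belief, HIT, FALSE_ALARM)   # the posterior becomes the new prior
    history.append(belief)

print([str(b) for b in history])
```

With these assumed numbers the belief climbs from 1/10 to 4/13, then 16/25, then 64/73. The sketch treats the observations as independent of one another, which real evidence often is not.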

Black and blue taxis

Consider the witness problem in law courts. Witness reports are notoriously unreliable, which does not stop people being locked away on the basis of little more.

Consider a commonly cited scenario.

First piece of data:
A town has two taxi companies, one runs blue taxi-cabs and the other uses black taxi-cabs. It is known that Blue Company has 15 taxis and the Black Cab Company has 85 vehicles. Late one night, there is a hit-and-run accident involving a taxi. It is assumed that all 100 taxis were on the streets at the time.

Second piece of data:
A witness sees the accident and claims that a blue taxi was involved. At the request of the defence, the witness undergoes a vision test under conditions similar to those on the night in question. Presented repeatedly with a blue taxi and a black taxi, in ‘random’ order, the witness shows he can successfully identify the colour of the taxi 4 times out of 5 (80% of the time). The remaining 1/5 of the time, he misidentifies a blue taxi as black or a black taxi as blue.

Bayesian probability theory asks the following question, “If the witness reports seeing a blue taxi, how likely is it that he has the colour correct?”

As the witness is correct 80% of the time (that is, 4 times in 5), he is also incorrect 1 time in 5, on average.

For the 15 blue taxis, he would (correctly) identify 80% of them as being blue, namely 12, and misidentify the other 3 blue taxis as being black.

For the 85 black taxis, he would also incorrectly identify 20% of them as being blue, namely 17.

Thus, in all, he would have misidentified the colour of 20 of the taxis. Also, he would have called 29 of the taxis blue where there are only 15 blue taxis in the town!

In the situation in question, the witness is telling us that the taxi was blue.

But he would have identified 29 of the taxis as being blue. That is, he has called 12 blue taxis ‘blue’, and 17 black taxis he has also called ‘blue’.

Therefore, in the test the witness has said that 29 taxis are blue and only been correct 12 times!

Thus, given the witness's identification ability, the probability that a taxi the witness claims to be blue actually is blue is 12/29, i.e. about 0.41.

Therefore, when the witness said the taxi was blue, he was incorrect nearly 3 times out of every 5. The test showed the witness to be correct less than half the time.
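The counting argument above can be checked with a short Python sketch. It uses only the figures given in the scenario (15 blue taxis, 85 black taxis, a witness who is right 4 times in 5); the variable names are my own, and exact fractions are used so that no rounding creeps in.

```python
from fractions import Fraction

BLUE, BLACK = 15, 85             # taxis in the town, from the scenario
ACCURACY = Fraction(4, 5)        # witness names the colour correctly 80% of the time

blue_called_blue = BLUE * ACCURACY                # blue taxis correctly called 'blue'
blue_called_black = BLUE - blue_called_blue       # blue taxis wrongly called 'black'
black_called_blue = BLACK * (1 - ACCURACY)        # black taxis wrongly called 'blue'
black_called_black = BLACK - black_called_blue    # black taxis correctly called 'black'

misidentified = blue_called_black + black_called_blue
total_called_blue = blue_called_blue + black_called_blue

# Of all the taxis the witness calls 'blue', what share really are blue?
p_blue_given_report = blue_called_blue / total_called_blue

print(total_called_blue, misidentified, p_blue_given_report)
```

The sketch reproduces the numbers in the text: 29 taxis called ‘blue’, 20 taxis misidentified in all, and a probability of 12/29 that a ‘blue’ report is right.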

Bayesian probability takes account of the real distribution of taxis in the town. It takes account, not just of the ability of a witness to identify blue taxis correctly (80%), but also the witness’s ability to identify the colour of blue taxis among all the taxis in town. In other words, Bayesian probability takes account of the witness’s propensity to misidentify black taxis as well. In the trade, these are called ‘false positives’.

The ‘false negatives’ were the blue taxis that the witness misidentified as black. Bayesian probability statistics (BPS) becomes most important when attempting to calculate comparatively small risks. BPS becomes important in situations where the underlying distribution is lopsided, as in this case where there were far more black taxis than blue ones.

Had the witness called the offending taxi black, the calculation would have been {the 68 taxis the witness correctly named as black} over {the 71 taxis the witness thought were black}. That is, 68/71 (the difference being the 3 blue taxis the witness thought were black); or nearly 96% of the time, when the witness thought the taxi was black, it was indeed black.
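The same two answers can be reached without counting taxis, by applying Bayes's theorem directly to the proportions. The Python sketch below is one way to write it; the function name posterior and its parameter names are my own labels, and it assumes, as the scenario does, that the witness is equally accurate for both colours.

```python
from fractions import Fraction


def posterior(prior, p_report_if_true, p_report_if_false):
    """Probability the report is right, given how common the colour is."""
    right = prior * p_report_if_true
    wrong = (1 - prior) * p_report_if_false
    return right / (right + wrong)


p_blue = Fraction(15, 100)     # share of the town's taxis that are blue
accuracy = Fraction(4, 5)      # witness is correct 80% of the time

# Witness says 'blue': right if the taxi is blue, wrong if it is black.
blue_given_blue_report = posterior(p_blue, accuracy, 1 - accuracy)

# Witness says 'black': right if the taxi is black, wrong if it is blue.
black_given_black_report = posterior(1 - p_blue, accuracy, 1 - accuracy)

print(blue_given_blue_report, black_given_black_report)
```

The two results are 12/29 and 68/71, matching the counts worked out by hand. The witness is the same in both cases; only the starting proportion of taxis differs.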

Unfortunately, most people untrained in the analysis of probability tend to intuit, from the 80% accuracy of the witness, that the witness can identify blue taxis among many others with an 80% rate of accuracy. I hope the example above will convince you that this is a very unsafe belief. Thus, in a court trial, it is not the ability of the person to identify a person among 8 (with a 1/8, or 12.5%, chance of guessing ‘right’ by luck!) in a pre-arranged line up that matters, but their ability to recognise them in a crowded street or a darkened alleyway in conditions of stress.

Testing for rare conditions

Virtually every lab-conducted test involves sources of error. Test samples can be contaminated, or one sample can be confused with another. The report on a test you receive from your doctor just may belong to someone else, or the test may have been sloppily performed. When the supposed results are bad, such tests can produce fear. But let us assume the laboratory has done its work well, and the medic is not currently drunk and incapable.

The problem of false positives is still a considerable difficulty. Virtually every medical test designed to detect a disease or medical condition has a built-in margin of error. The size of the margin of error varies from one test procedure to another, but it is often in the range of 1-5%, although sometimes it can be much greater than this. Error here means that the test will sometimes indicate the presence of the disease, even when there is no disease present.

Suppose a lab is using a test for a rare condition, a test that has a 2% false-positive rate. This means that the test will indicate the disease in 2% of people who do not have the condition.

Among 1,000 people who are tested for the disease and who do not have it, the test will suggest that about 20 do have it. If, as we are supposing, the disease is rare (say it occurs in 0.1% of the population, 1 in 1000), then a group of 1,000 people tested will contain about one real case alongside those 20 or so false alarms. It follows that the great majority (here, about 95%, or 20 in every 21) of the people whom the tests report to have the disease will be misdiagnosed!
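The arithmetic for the rare condition can be laid out as a small Python sketch. The 2% false-positive rate and the 1 in 1000 prevalence come from the text; the function name screening_counts is my own, and the sensitivity of 1.0 (the test never misses a real case) is an assumption added for simplicity, since the text does not give a figure for missed cases.

```python
def screening_counts(population, prevalence, false_positive_rate, sensitivity=1.0):
    """Expected numbers of true and false positive results in a tested group.

    sensitivity=1.0 is an assumption: every real case is taken to be detected.
    """
    sick = population * prevalence
    healthy = population - sick
    true_positives = sick * sensitivity
    false_positives = healthy * false_positive_rate
    return true_positives, false_positives


true_pos, false_pos = screening_counts(1000, 0.001, 0.02)

# Share of all positive reports that are wrong.
share_false = false_pos / (true_pos + false_pos)

print(f"true positives: {true_pos:.2f}")
print(f"false positives: {false_pos:.2f}")
print(f"share of positive reports that are wrong: {share_false:.1%}")
```

Under these assumptions roughly 20 positive reports in every 21 are false alarms. If the test also missed some real cases, the share of wrong positives would be slightly higher still.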

Consider a concrete example [5]. Suppose that a woman (let us suppose her to be a white female, who has not recently had a blood transfusion and who does not take drugs and doesn’t have sex with intravenous drug users or bisexuals) goes to her doctor and requests an HIV test. Given her demographic profile, her risk of being HIV-positive is about 1 in 100,000. Even if the HIV test was so good that it had a false-positive rate as low as 0.1% (and it is nothing like that good), this means that approximately 100 women among 100,000 similar women will test positive for HIV, even though only one of them is actually infected with HIV.

When considering both the traumatising effects of such reports on people and the effects on future insurability, employability and the like, it becomes clear that the false-positive problem is much more than just an interesting technical flaw.

If your medic ever reports that you tested positive for some rare disorder, you should be extremely skeptical. There is a considerable likelihood the diagnosis itself is mistaken. Knowing this, intelligent physicians are very careful in their use of test results and in their subsequent discussion with patients. But not all doctors have the time or the ability to treat test results with the skepticism that they often deserve.

How bad can it get?

In general:

The more rare a condition and the less precise the test (or judgement), then the more likely (frequent) the error.

Consider the HIV test above. Many such tests are wrong 5%, or more, of the time. Remember that the real risk for our heterosexual white woman was around 1 in 100,000, but the test would indicate positive for about 5000 of every 100,000 tested! Thus, if applied to a low risk group like white heterosexual females (who did not inject drugs, and did not have sex with a member of a high-risk group like bisexuals, or haemophiliacs, or drug injectors), a positive result from the HIV test would be incorrect about 4999 times out of 5000!

In general, if the risk were even less and the test method still had a 5% error rate, the proportion of positive results that are false would be even greater. That proportion would also increase if the test accuracy were lower.
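The general rule can be explored by varying both the rarity of the condition and the error rate of the test. The Python sketch below does this for the figures used in this document (1 in 1000 and 1 in 100,000; error rates of 0.1%, 2% and 5%) plus one rarer case of 1 in 1,000,000 that is added purely for illustration. The function name is my own, and as before the sketch assumes the test never misses a real case.

```python
def false_share_of_positives(prevalence, false_positive_rate, sensitivity=1.0):
    """Fraction of positive results that are false alarms.

    sensitivity=1.0 is an assumption: every real case is taken to be detected.
    """
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * false_positive_rate
    return false_pos / (true_pos + false_pos)


results = {}
for one_in in (1_000, 100_000, 1_000_000):     # rarity of the condition
    for error_rate in (0.001, 0.02, 0.05):     # false-positive rate of the test
        share = false_share_of_positives(1 / one_in, error_rate)
        results[(one_in, error_rate)] = share
        print(f"1 in {one_in:>9,}  error rate {error_rate:.1%}  "
              f"positives that are false: {share:.3%}")
```

Reading down the output, the share of false alarms among positive results rises both as the condition gets rarer and as the error rate gets larger, which is the pattern stated above.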

Endnotes

[1] Encyclopaedia Britannica (with much modification).
[2] An example where notional probabilities are quoted can be seen in an American government report on ‘lie detection’, where it is very optimistically assumed that it is possible to ‘detect lies’ correctly 90% of the time! (There is further discussion of ‘lie detection’, and of this report, at establishment psycho-bunk 1: ‘lie detection’.) If you examine the taxi example further on, you will come to understand that even were a 90% ‘detection’ level possible, it would still amount to a very poor performance. But there is worse to come!

Supposing the test (‘lie detection’) were used for detecting spies in government recruiting. Then, if spies occur only once among every 50,000 applicants, 5000 genuine applicants would be rejected for every spy rejected, although one spy in 10 would also be missed. However, the ‘tests’ are actually done on students who have not been trained to fool ‘lie detectors’. A spy would be taught to relax when lying, and to randomly become tense when answering truthfully.

The students in the laboratory ‘tests’ stand to lose nothing if they ‘fail’ the test. Also, they will often know the purpose of the test. A career criminal is used to lying habitually, often without concern; after all, it is such people who readily choose a criminal career, whereas an ordinary person, brought in from the street, is going to be extremely disturbed if asked whether they murdered Mister X, and whether they were in a certain area Sunday last.

But the situation worsens still further... There is no obvious means of determining a background probability in real-world investigations. Is the suspect to be regarded as one among ten, or one among ten million? If among ten million, with a 90% ‘detection’ rate, the test would throw up a million false positives. If among ten suspects, the false positives can be expected to be one. Assume that you now have one, or even two (including the one wot dun it), then at best you have a fifty-fifty chance. Perhaps you should toss a coin? Is the one who ‘failed’ the test a false positive, or is he the real thing? Further, even should you be lucky enough to pinpoint the appropriate target, no self-respecting criminal, knowing full well that ‘lie detection’ is a nonsense, is likely to confess.

Every reader of spy thrillers knows full well that the major objective of the spy-masters is usually to suborn or blackmail people already in the organisation, not to send Colonel Kleb under cover to obtain a job in the cipher office. In either case, the spy-master will train their asset to avoid detection.

The only possible use for such mumbo-jumbo is to deter those naïve enough to believe in it, or to obtain a confession, which may or may not be false, from the simple-minded; a form of psychological thumb-screw.
[3] A prior distribution means previously collected data. The word ‘distribution’ is a statistician’s word for a real data-set (set of data) or a theoretical model. An example of a theoretical distribution would be the normal curve, often referred to as the Bell Curve or as a Gaussian distribution.
[4] Go to ‘Intelligence’: misuse and abuse of statistics to read on problems with the interpretation of ‘classical’ statistics.
[5] Example adapted from: Larry Laudan, Danger Ahead, Wiley, 1997, ISBN 0471134406, pp. 30, 31 and 66-71.
(Review probably to follow.)

email abelard at abelard.org

the address for this document is http://www.abelard.org/briefings/bayes.htm
