In other words, if we want to reverse the conditional operator, we need to make allowances for the ‘base frequency’, the ordinary frequency with which each event occurs on its own. To overcome **base frequency neglect**,^{6} we have a mathematical tool, courtesy of the good Revd. Thomas Bayes, who sayeth that, verily,

$latex p(B \mid A) = \frac{p(A \mid B) p(B)}{p(A)}$

Or, in words: if you want to reverse the probabilities, **you will have to take the base rates of each event into account**. If what we know is **the likelihood that you would get snapped even if you were not speeding** and what we’re interested in is **the likelihood that someone getting snapped is indeed speeding**, we’ll need to know a few more things.

- We know that the speed cameras have a Type II (false negative) error rate of zero – in other words, if you are speeding ($latex B$), you are guaranteed to get snapped ($latex A$) – thus, $latex p(A \mid B)$ is 1.
- We also know from the Highway Authority, who were using a different and more accurate measurement system, that approximately one in 1,000 drivers is speeding ($latex p(B) = 0.001$).
- Finally, we know that of 1,000 drivers, 31 will be snapped – the one speeder, plus the 3% false positive rate applied to the rest – yielding $latex p(A) = 0.031$.

Putting that into our equation,

$latex p(B \mid A) = \frac{1 \times 0.001}{0.031} \approx 0.032$

In other words, the likelihood that we did indeed exceed the speed limit is just barely north of 3%. That’s a far cry from the ‘intuitive’ answer of 97% (quite accidentally, it’s almost the inverse).
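The speed-camera calculation is easy to check numerically. A minimal sketch, using the figures from the text (the variable names are mine):

```python
def posterior(p_a_given_b: float, p_b: float, p_a: float) -> float:
    """Bayes' theorem: p(B | A) = p(A | B) * p(B) / p(A)."""
    return p_a_given_b * p_b / p_a

p_snapped_given_speeding = 1.0  # zero false negative rate
p_speeding = 0.001              # one in 1,000 drivers is speeding
p_snapped = 0.031               # 31 in 1,000 drivers get snapped

p_speeding_given_snapped = posterior(
    p_snapped_given_speeding, p_speeding, p_snapped
)
print(round(p_speeding_given_snapped, 4))  # 0.0323 – just north of 3%
```

Note how the tiny base rate of speeding (0.1%) drags the posterior down, despite the camera never missing a speeder.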

The procedure of medical diagnostics is ultimately a relatively simple algorithm:

- create a list of possibilities, however remote (the process of differential diagnostics),
- order them by likelihood,
- update priors as you run tests.^{7}

From a statistical perspective, this is implemented as follows.

- We begin by running a number of tests, specifically $latex m$ of them. It is assumed that the tests are independent from each other, i.e. the value of one does not affect the value of another. Let $latex R_j$ denote the results of test $latex j \leq m$.
- For each test, we need to iterate over all our differentials $latex D_i$, and determine the probability of each in light of the new evidence, i.e. $latex p(D_i \mid R_j)$.
- So, let’s take the results of test $latex j$, which yielded the results $latex R_j$, and the putative diagnosis $latex D_i$. What we’re interested in is $latex p(D_i \mid R_j)$, that is, the probability of the putative diagnosis *given the new evidence*. Or, to use Bayesian lingo, we are *updating our prior*: we had a previous probability assigned to $latex D_i$, which may have been a uniform probability or some other probability, and we are now *updating* it – seeing how likely it is *given the new evidence* – getting what is referred to as a *posterior*.^{8}
- To calculate the posterior $latex p(D_i \mid R_j)$, we need to know three things – the sensitivity and specificity of the test (I’ll call these $latex S_e$ and $latex S_p$, respectively), the overall incidence of $latex D_i$,^{9} and the overall incidence of the particular result $latex R_j$.
- Plugging these variables into our beloved Bayesian formula, we get $latex p(D_i \mid R_j) = \frac{p(R_j \mid D_i) \, p(D_i)}{p(R_j)}$.
- We know that $latex p(R_j \mid D_i)$ – that is, the probability that someone will test a particular way if they do have the condition $latex D_i$ – is connected to sensitivity and specificity: if $latex R_j$ is a positive result, then $latex p(R_j \mid D_i) = S_e$ (the sensitivity), whereas the specificity $latex S_p$ gives the probability of a negative result in the *absence* of $latex D_i$.
- We also know, or are supposed to know, the overall incidence of $latex D_i$ and the probability of a particular outcome, $latex p(R_j)$. With that, we can update our prior for $latex D_i$.
- We iterate over each of the $latex m$ tests, updating the priors every time new evidence comes in.

This may sound daunting and highly mathematical, but in fact most physicians have this down to an innate skill, so much so that when I explained this to a group of FY2 doctors, they couldn’t believe it – until they thought about how they thought. And that’s a key issue here: thinking about the way we arrive at results is important, because those thought processes are the bedrock of what we need to make those results intelligible to others.

For a worked example of this in the diagnosis of coeliac disease, check Notebook 1: ATA case study. It puts things in the context of sensitivity and specificity in medical testing, and is in many ways quite similar to the above example, except here, we’re working with a real-world test with real-world uncertainties.

There are several ways of testing for __coeliac disease, an autoimmune disorder__ in which the body responds to gluten proteins (gliadins and glutenins) in wheat, wheat hybrids, barley, oats and rye. One diagnostic approach looks at genetic markers in the HLA-DQ (Human Leukocyte Antigen type DQ), part of the MHC (Major Histocompatibility Complex) Class II receptor system. Genetic testing for a particular haplotype of the HLA-DQ2 gene, called DQ2.5, can lead to a diagnosis in most patients. Unfortunately, it’s slow and expensive. Another test, a colonoscopic biopsy of the intestines, looks at the intestinal villi, short protrusions (about 1mm long) into the intestine, for tell-tale damage – but this test is unpleasant, possibly painful and costly.

So, a more frequent way is by looking for evidence of an autoantibody called anti-tissue transglutaminase antibody (ATA) – unrelated to __this gene__, sadly. ATA testing is cheap and cheerful, and relatively good, with a sensitivity ($latex S_e$) of 85% and a specificity ($latex S_p$) of 97%.^{10} We also know the rough probability of a sample being from someone who actually has coeliac disease – for a referral lab, it’s about 1%.

Let’s consider the following case study. A patient gets tested for coeliac disease using the ATA test described above. Depending on whether the test is positive or negative, what are the chances she has coeliac disease?

If you’ve read the notebook, you know by now that the probability of having coeliac disease if testing positive is around 22%, or a little better than one-fifth. And from the visualisation to the left, you could see that small incremental improvements in specificity would yield a lot more increase in accuracy (marginal accuracy gain) than increases in sensitivity.
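The notebook’s headline numbers can be reproduced directly from the figures quoted above ($latex S_e = 0.85$, $latex S_p = 0.97$, a 1% prior); a minimal sketch:

```python
# Posterior probability of coeliac disease given an ATA test result.
# Figures from the text: Se = 0.85, Sp = 0.97, prior (referral lab) = 0.01.

se, sp, prior = 0.85, 0.97, 0.01

# p(positive) = p(+ | disease) p(disease) + p(+ | healthy) p(healthy)
p_pos = se * prior + (1 - sp) * (1 - prior)
p_disease_given_pos = se * prior / p_pos

# p(negative) = p(- | disease) p(disease) + p(- | healthy) p(healthy)
p_neg = (1 - se) * prior + sp * (1 - prior)
p_disease_given_neg = (1 - se) * prior / p_neg

print(round(p_disease_given_pos, 4))  # 0.2225 – around 22%, as in the text
print(round(p_disease_given_neg, 4))  # 0.0016 – a negative test is reassuring
```

Rerunning this with `sp` nudged up versus `se` nudged up is a quick way to see the marginal accuracy gain the visualisation illustrates: the false positives from the 99% healthy majority dominate the denominator.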

While quite simple, this is a good case study because it emphasises a few essential things about Bayesian reasoning:

**Always know your baselines.** In this case, we took a baseline of 1%, even though the average incidence of coeliac disease in the population is closer to a quarter of that (about 0.25%). Why? Because we don’t spot-test people for coeliac disease. People who do get tested get tested because they exhibit symptoms that may or may not be coeliac disease, and by definition they have a higher prevalence^{11} of coeliac disease. The factor is, of course, entirely imaginary – you would, normally, need to know or have a