src="//" alt="A" title="A" class="latex" /> and B, in and of themselves, are material. Consider, for a moment, the unlikely scenario of living in that mythical wonderland of law-abiding citizens where nobody speeds. Then, it does not matter how many drivers are snapped – all of them are false positives, and thus p(B|A), the probability of speeding (B) given that one got snapped by a speed camera (A), is actually zero.

In other words, if we want to reverse the conditional operator, we need to make allowances for the ‘base frequency’, the ordinary frequency with which each event occurs on its own. To overcome base frequency neglect,6 we have a mathematical tool, courtesy of the good Revd. Thomas Bayes, who sayeth that, verily,

$latex p(B \mid A) = \frac{p(A \mid B) p(B)}{p(A)}$

Or, in words: if you want to reverse the probabilities, you will have to take the base rates of each event into account. If what we know is the likelihood of getting snapped given that you were not speeding, and what we’re interested in is the likelihood that someone who got snapped was indeed speeding, we’ll need to know a few more things.

Case study 1: Speed cameras – continued

Putting that into our equation,

$latex p(B \mid A) = \frac{p(A \mid B) p(B)}{p(A)} = \frac{1 \cdot 0.001}{0.031} \approx 0.032$

In other words, the likelihood that we indeed did exceed the speed limit is just barely north of 3%. That’s a far cry from the ‘intuitive’ answer of 97% (quite accidentally, it’s almost the complement).
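The arithmetic above can be checked with a few lines of Python. This is only a sketch: the three probabilities are the ones in the equation, while the 3% false-positive rate used in the cross-check is an assumption of mine, chosen because it reproduces p(A) ≈ 0.031 via the law of total probability.

```python
def posterior(p_a_given_b: float, p_b: float, p_a: float) -> float:
    """Bayes' theorem: p(B|A) = p(A|B) * p(B) / p(A)."""
    return p_a_given_b * p_b / p_a


# Values taken from the equation above.
p_snapped_given_speeding = 1.0   # p(A|B)
p_speeding = 0.001               # p(B), the base rate of speeding
p_snapped = 0.031                # p(A), the base rate of getting snapped

p_speeding_given_snapped = posterior(
    p_snapped_given_speeding, p_speeding, p_snapped
)
print(f"p(B|A) = {p_speeding_given_snapped:.3f}")

# Cross-check of p(A) via the law of total probability.
# ASSUMPTION: a 3% false-positive rate, not stated in this section.
p_snapped_given_not_speeding = 0.03
p_snapped_total = (
    p_snapped_given_speeding * p_speeding
    + p_snapped_given_not_speeding * (1 - p_speeding)
)
print(f"p(A)   = {p_snapped_total:.3f}")
```

The cross-check makes the role of the base rate visible: almost all of p(A) comes from the non-speeding majority, which is why the posterior ends up so small.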

Diagnostics, probabilities and Bayesian logic

The procedure of medical diagnostics is ultimately a relatively simple algorithm:

  1. create a list of possibilities, however remote (the process of differential diagnostics),
  2. order them in order of likelihood,
  3. update priors as you run tests.7

From a statistical perspective, this is implemented as follows.

  1. We begin by running a number of tests, specifically $latex m$ of them. It is assumed that the tests are independent of each other, i.e. the value of one does not affect the value of another. Let $latex R_j$ denote the results of test $latex j \leq m$.
    1. For each test, we need to iterate over all our differentials $latex D_{1 \ldots n}$, and determine the probability of each in light of the new evidence, i.e. $latex p(D_i \mid R_j)$.
    2. So, let’s take the results of test $latex j$ that yielded the results $latex R_j$, and the putative diagnosis $latex D_i$. What we’re interested in is $latex p(D_i \mid R_j)$, that is, the probability of the putative diagnosis given the new evidence. Or, to use Bayesian lingo, we are updating our prior: we had a previous probability assigned to $latex D_i$, which may have been a uniform probability or some other probability, and we are now updating it – seeing how likely it is given the new evidence, getting what is referred to as a posterior.8
    3. To calculate the posterior $latex p(D_i \mid R_j)$, we need to know three things – the sensitivity and specificity of the test $latex j$ (I’ll call these $latex S^+_j$ and $latex S^-_j$, respectively), the overall incidence of $latex D_i$,9 and the overall incidence of the particular result $latex R_j$.
    4. Plugging these variables into our beloved Bayesian formula, we get $latex p(D_i \mid R_j) = \frac{p(R_j \mid D_i) p(D_i)}{p(R_j)}$.
    5. We know that $latex p(R_j \mid D_i)$, that is, the probability that someone will test a particular way if they do have the condition $latex D_i$, is connected to sensitivity and specificity: if $latex R_j$ is supposed to be positive if the patient has $latex D_i$, then $latex p(R_j \mid D_i) = S^+_j$ (sensitivity), whereas if the test is supposed to be negative if the patient has $latex D_i$, then $latex p(R_j \mid D_i) = S^-_j$ (specificity).
    6. We also know, or are supposed to know, the overall incidence of $latex D_i$ and the probability of a particular outcome, $latex R_j$. With that, we can update our prior for $latex D_i$ given $latex R_j$.
  2. We iterate over each of the tests, updating the priors every time new evidence comes in.
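The loop described above can be sketched in Python as follows. Everything concrete here is hypothetical: the diagnosis labels, the priors and the per-test likelihoods p(R_j | D_i) are made-up numbers for illustration. The sketch also assumes that the differentials are mutually exclusive and exhaustive, so that p(R_j) can be obtained by summing over them, and that the tests are independent, as stated in step 1.

```python
from typing import Dict, List


def update(priors: Dict[str, float], likelihoods: Dict[str, float]) -> Dict[str, float]:
    """One Bayesian update: p(D_i | R) = p(R | D_i) * p(D_i) / p(R)."""
    # p(R) via the law of total probability.
    # ASSUMPTION: the differentials are mutually exclusive and exhaustive.
    p_result = sum(likelihoods[d] * priors[d] for d in priors)
    return {d: likelihoods[d] * priors[d] / p_result for d in priors}


# Hypothetical priors over three differentials (must sum to 1).
priors = {"D1": 0.5, "D2": 0.3, "D3": 0.2}

# Hypothetical likelihoods p(R_j | D_i) of the observed result of each test.
observed: List[Dict[str, float]] = [
    {"D1": 0.9, "D2": 0.2, "D3": 0.1},    # test 1
    {"D1": 0.7, "D2": 0.6, "D3": 0.05},   # test 2
]

beliefs = dict(priors)
for j, likelihoods in enumerate(observed, start=1):
    # The posterior after test j becomes the prior for test j + 1.
    beliefs = update(beliefs, likelihoods)
    ranked = sorted(beliefs.items(), key=lambda kv: kv[1], reverse=True)
    print(f"after test {j}:", ", ".join(f"{d}={p:.3f}" for d, p in ranked))
```

Because the tests are treated as independent, the order in which the results arrive does not change the final posterior; only the intermediate rankings differ.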

This may sound daunting and highly mathematical, but in fact most physicians have this down to an innate skill, so much so that when I explained this to a group of FY2 doctors, they couldn’t believe it – until they thought about how they thought. And that’s a key issue here: thinking about the way we arrive at results is important, because it is the bedrock of what we need to make those results intelligible to others.

Case study 2: ATA testing for coeliac disease

For a worked example of this in the diagnosis of coeliac disease, check Notebook 1: ATA case study. It puts things in the context of sensitivity and specificity in medical testing, and is in many ways quite similar to the above example, except here, we’re working with a real-world test with real-world uncertainties.

There are several ways of testing for coeliac disease, an autoimmune disorder in which the body responds to gluten proteins (gliadins and glutenins) in wheats, wheat hybrids, barley, oats and rye. One diagnostic approach looks at genetic markers in the HLA-DQ (Human Leukocyte Antigen type DQ), part of the MHC (Major Histocompatibility Complex) Class II receptor system. Genetic testing for a particular haplotype of the HLA-DQ2 gene, called DQ2.5, can lead to a diagnosis in most patients. Unfortunately, it’s slow and expensive. Another test, an endoscopic biopsy of the small intestine, looks at the intestinal villi, short protrusions (about 1 mm long) into the intestine, for tell-tale damage – but this test is unpleasant, possibly painful and costly.

So, a more common approach is to look for evidence of an autoantibody called anti-tissue transglutaminase antibody (ATA) – unrelated to this gene, sadly. ATA testing is cheap and cheerful, and relatively good, with a sensitivity ($latex S^+_{ATA}$) of 85% and specificity ($latex S^-_{ATA}$) of 97%.10 We also know the rough probability of a sample being from someone who actually has coeliac disease – for a referral lab, it’s about 1%.

Let’s consider the following case study. A patient gets tested for coeliac disease using the ATA test described above. Depending on whether the test is positive or negative, what are the chances she has coeliac disease?

Sensitivity and specificity trade-off for an ATA test given various values of true coeliac disease prevalence in the population.

If you’ve read the notebook, you know by now that the probability of having coeliac disease if testing positive is around 22%, or a little better than one-fifth. And from the visualisation to the left, you can see that small incremental improvements in specificity yield a far larger gain in accuracy (marginal accuracy gain) than equal improvements in sensitivity.
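For readers without the notebook to hand, here is a minimal sketch of the same calculation. It is not the notebook's code. The sensitivity, specificity and prevalence are the figures quoted above; the function names and the one-percentage-point improvements are illustrative choices of mine.

```python
def p_disease_given_positive(sens: float, spec: float, prev: float) -> float:
    """Posterior probability of disease after a positive test."""
    true_pos = sens * prev
    false_pos = (1 - spec) * (1 - prev)
    return true_pos / (true_pos + false_pos)


def p_disease_given_negative(sens: float, spec: float, prev: float) -> float:
    """Posterior probability of disease after a negative test."""
    false_neg = (1 - sens) * prev
    true_neg = spec * (1 - prev)
    return false_neg / (false_neg + true_neg)


SENS, SPEC, PREV = 0.85, 0.97, 0.01   # figures quoted in the text

pos = p_disease_given_positive(SENS, SPEC, PREV)
neg = p_disease_given_negative(SENS, SPEC, PREV)
print(f"p(coeliac | positive) = {pos:.1%}")
print(f"p(coeliac | negative) = {neg:.2%}")

# Marginal gain from improving each parameter by one percentage point
# (an illustrative step size, not a figure from the text).
gain_spec = p_disease_given_positive(SENS, SPEC + 0.01, PREV) - pos
gain_sens = p_disease_given_positive(SENS + 0.01, SPEC, PREV) - pos
print(f"gain from +1 point specificity: {gain_spec:+.3f}")
print(f"gain from +1 point sensitivity: {gain_sens:+.3f}")
```

At a prevalence of 1%, false positives from the healthy 99% dominate the positive results, which is why trimming the false-positive rate moves the posterior far more than catching a few more true cases.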

While quite simple, this is a good case study because it emphasises a few essential things about Bayesian reasoning: