On the truth and falsity of models

Over the last week, I have published two pieces (one | two) sharply critical of Dr Neil Ferguson’s COVID-19 models. Inevitably (but still disappointingly), much of the response to this fell into two camps: those who believe the Imperial model approximated deaths pretty well, and was consequently ‘right’, and those who believe the Imperial model overstated COVID-19 related mortality and morbidity, and hence was ‘wrong’.

My contention in the following is that neither of these statements is true. Models are not analytically truth-apt: they cannot be ‘true’ or ‘false’. They can, however, be ‘good’ or ‘bad’ (normative).

It is a frequently quoted aphorism, ascribed to the English statistician George Box, that all models are wrong, but some are useful. Most critics of stochastic modelling take issue with the second part. I will, unusually, take issue with the first.

Box’s statement that ‘all models are wrong’ is either trivially true or profoundly false. It is trivially true in that any model is a limited depiction of reality, akin to a lower-dimensional projection of a higher-dimensional space, the way we reduce high-dimensional data to lower dimensionality all the time in statistics (e.g. PCA). It is profoundly false in that a model cannot be ‘true’ or ‘false’. Truth or falsity of an assertion ordinarily refers to whether the assertion is logically valid, there usually being an external yardstick against which to measure truth. For instance, if I were to assert that the sky was green (which it occasionally is), the truth or falsity of that statement could be measured against the objective standard of the light spectrum emitted by the sky, as measurable by e.g. a spectrometer.

For a future projection, such a standard does not exist. If I were to assert instead that tomorrow the sky will indeed be green, you would be reasonable in doubting that I would be right. The probability of the sky being green tomorrow is by all means quite low, and undoubtedly, my prediction will not hold up to objective verification tomorrow. Does that make my prediction false?

It is arguably incorrect to say that a valid prediction (more about validity later – for now, let’s equate it to logical soundness) is right or wrong at the time it is made, because by definition the yardstick against which correctness is measured does not exist at the time. The model is not right or wrong but undefined at the very least until the time of the prediction.1)

My contention is that this does not change with reaching the time of the prediction. It would be comforting to assume one of two (almost equally wrong) descriptions:

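•  As time $t$ converges to the time of the prediction, $t_p$, the value of the prediction $\hat{y}(t)$ reaches the ‘true’ value, i.e. $\lim_{t \to t_p} \hat{y}(t) = \mathbb{E}[y]$, with $\mathbb{E}[y]$ being the expectation value.

•  As time $t$ converges to the time of the prediction, $t_p$, nothing happens. Then, suddenly, as time reaches $t_p$, the value of $\hat{y}(t_p)$, being $\mathbb{E}[y]$, blinks into existence.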

The first of these is more obviously wrong than the second. It is wrong because – as stated – the truth value of the prediction couldn’t possibly exist until the time of prediction is reached. Certainly we may make various statements about how likely it is, given data so far, that the prediction will pan out – Bayesian statistics has made a pretty good business out of doing so, and that’s a worthwhile statistical endeavour, but a truth value it is not.

The second is slightly more complex. It does satisfy our inherent desire to measure a prediction against something (anything, dear G–d, please!), but that is of course quite misleading. Predictions and realities are non-commensurate. By this term, I mean to indicate that they exist as if they were in entirely different universes. A prediction is a hypothetical probability statement of an expected value or distribution.2) Predictions do not ‘happen’: things ‘happen’, and a prediction may or may not be congruent with it, but that does not mean they share the same reality to the slightest extent.

All of this sounds quite esoteric and philosophical, but in fact it is not. Consider the following: on an unseasonably warm night in Atlantic City, I am gambling my entire fortune straight up on some number,3) say, Black 33. The odds on a straight up bet are, at least in America, 37:1, i.e. I have a $\frac{1}{38}$ chance of not going home extremely broke.4) These are the odds when I make the bet. The question is, what are the odds of winning a straight up bet after the wheel has spun and the ball has landed?

Notice that if this is how I ask the question, your instinct is to say ‘the same’, whereas if I were to ask what the probability of Black 33 was after the ball had landed in, say, Red 1 (right next to it, dang!), you would be inclined to say zero. It clearly cannot be both.
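The two competing answers can be put side by side in a few lines of Python. This is only a minimal sketch, assuming an American wheel with 38 pockets (which is what 37:1 implies); the function names and the 0..37 pocket labels are mine, chosen purely for illustration.

```python
import random
from fractions import Fraction

POCKETS = 38  # assumption: American wheel, 1-36 plus 0 and 00
BET = 33      # hypothetical labelling: pockets are 0..37, with 37 standing in for 00


def probability_before_spin() -> Fraction:
    """Probability that a straight up bet wins, stated when the bet is made."""
    return Fraction(1, POCKETS)


def probability_after_spin(bet: int, outcome: int) -> Fraction:
    """The 'collapsed' answer: conditional on the observed pocket, it is 0 or 1."""
    return Fraction(1) if bet == outcome else Fraction(0)


def simulated_win_rate(bet: int, spins: int, seed: int = 0) -> float:
    """Long-run frequency of wins over many independent spins of a fair wheel."""
    rng = random.Random(seed)
    wins = sum(rng.randrange(POCKETS) == bet for _ in range(spins))
    return wins / spins


print(probability_before_spin())         # 1/38
print(probability_after_spin(BET, 1))    # 0
print(probability_after_spin(BET, BET))  # 1
print(simulated_win_rate(BET, 100_000))  # close to 1/38, never exactly 0 or 1
```

Both functions answer a question about ‘the probability of Black 33’, but they are answering different questions: one is a statement about the bet, the other a statement about a spin that has already happened.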

There is a somewhat preferred solution by which a probability function of time, $P(t)$, assumes various values as $t$ converges from $t_0$ (the time at the prediction) to $t_p$ (the time of the prediction), after which it remains immobile. Those given to such language have described this analogously to a wavefunction collapse in quantum mechanics, in which the superposition of eigenstates (analogous here to the probabilities, being the superposition of all possible outcomes) collapses to a single eigenstate (analogous here to the actual outcome). The box is opened and the cat is either alive or dead.

The problem with this explanation is that it is altogether useless for causality, pace Schrödinger. Causality is not only about what led to a certain event but about what contributed to a particular event. In the law of civil wrongs, esp. negligence, this is sometimes conceptualised as a ‘loss of chance’: for instance, a delay in diagnosing an illness may decrease the chances of survival. The problem with a ‘dead/alive cat’ model is that it is incapable of ascribing values to that decrease in survival odds, because it has to conclude either that the cat’s dead or that it is alive. If the cat (or the patient) is alive, then the probability function over time has collapsed at one, which by the way is the maximum value, hence there’s nothing to compensate for. If the cat (or the patient) is dead, the probability function has collapsed at zero. For an excellent display of the ineptitude of the highest English court (with the laudable exception of Lord Hoffmann) to deal with this problem, you might enjoy reading this.
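To make the contrast concrete, here is a small sketch of the two ways of accounting for a delayed diagnosis. The survival probabilities are invented for illustration only and are not taken from any case or study; the function names are likewise hypothetical.

```python
def loss_of_chance(p_timely: float, p_delayed: float) -> float:
    """Reduction in survival probability attributable to the delay."""
    for p in (p_timely, p_delayed):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return p_timely - p_delayed


def collapsed_probability(survived: bool) -> float:
    """The 'dead/alive cat' view: once the outcome is known, only 0 or 1 remains."""
    return 1.0 if survived else 0.0


# Hypothetical figures: 60% survival with timely diagnosis, 40% after the delay.
print(round(loss_of_chance(0.60, 0.40), 2))  # 0.2
print(collapsed_probability(True))           # 1.0
print(collapsed_probability(False))          # 0.0
```

The first function can put a number on what the delay cost. The second cannot, whichever way the outcome went, which is precisely the problem.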

Which leads me to COVID-19 models, and what truth and falsity are in a model. I hope the foregoing was convincing enough to assure you that truth and falsity cannot be defined for a predictive model. They are not, as Box said, all wrong. They are not even wrong, to borrow the words of Wolfgang Pauli.

And that circles us back to how we can assess models in a critical space (or just generally assess them, critical or not). Validation against experimental data can disprove a model, but it cannot ‘prove’ it to be right. If a model consistently does not align with empirical data, it is in all likelihood a bad model (not wrong, but bad, meaning doing a bad job at reflecting the reality it is supposed to reflect). If a model does align with empirical data, it may be a matter of coincidence or not. The Bayesian answer is that evidence helps us update our priors on how good a model is, and in practice, this will hover between zero and one most of the time after a non-trivial warm-up. This is a somewhat better depiction of what’s really going on than concluding that a model that predicts the future relatively accurately is ‘right’ or ‘wrong’.
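The Bayesian updating described above can be sketched as follows. This is a deliberately crude two-hypothesis setup, not a recipe: I assume that a ‘good’ model agrees with each new observation with probability 0.8 and a ‘bad’ one with probability 0.4, and both figures, like the function names, are assumptions made up for the example.

```python
def update_goodness(prior: float, hit: bool,
                    p_hit_good: float = 0.8, p_hit_bad: float = 0.4) -> float:
    """One Bayes update of P(model is good) after a single observation.

    `hit` records whether the observation agreed with the model's prediction.
    The two hit rates are illustrative assumptions, not empirical values.
    """
    like_good = p_hit_good if hit else 1.0 - p_hit_good
    like_bad = p_hit_bad if hit else 1.0 - p_hit_bad
    evidence = prior * like_good + (1.0 - prior) * like_bad
    return prior * like_good / evidence


def posterior_path(prior: float, observations: list[bool]) -> list[float]:
    """Posterior after each observation, starting from the prior."""
    path = [prior]
    for hit in observations:
        path.append(update_goodness(path[-1], hit))
    return path


# Hypothetical record of agreement between model and data: hit, miss, hit.
for p in posterior_path(0.5, [True, False, True]):
    print(round(p, 3))
```

Starting from a prior of 0.5, the posterior moves up on a hit and down on a miss, and with hit rates strictly between zero and one it never lands on either extreme after finitely many observations.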

This leaves us with assessing whether the model is ‘good’ or ‘bad’. A good model can be (but is very unlikely to consistently be) incorrect vis-a-vis reality as it ensues, and a bad model can be (but is very unlikely to consistently be) correct vis-a-vis reality as it ensues. However, the ‘goodness’ or ‘badness’ of a model is not a relationship of fit to a particular expected value. Rather, these are inherent qualities. Are the assumptions reasonable? Do the assumptions reflect empirical evidence? Does the model make unwarranted assumptions? Does the model make assumptions that are contrary to empirical evidence for no good reason?

While these questions are much harder to answer than calculating an RMSE or some other metric of divergence, they are the way to assess a model accurately – at least initially. That, of course, can then be updated by evidence continuously, yielding a Bayesian posterior of model ‘goodness’.
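For comparison, this is roughly how little work the easy part takes. The sketch below computes an RMSE for two hypothetical models against the same made-up observations; none of the numbers come from any real model or dataset.

```python
import math


def rmse(predicted: list[float], observed: list[float]) -> float:
    """Root mean squared error between predictions and observations."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


observed = [10.0, 12.0, 14.0]  # invented observations
model_a = [11.0, 13.0, 15.0]   # hypothetical model that overshoots by one each time
model_b = [9.0, 11.0, 13.0]    # hypothetical model that undershoots by one each time

print(rmse(model_a, observed))  # 1.0
print(rmse(model_b, observed))  # 1.0
```

The two models earn an identical score, and the score says nothing about whether either rests on reasonable assumptions. That part still has to be argued.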


1 Nomenclature: time at the prediction is the time at which the prediction is made, while time of the prediction is the time for which the predicted value is estimated.
2 A singular value is, in fact, a special distribution, but that’s a different story.
3 For people who are neither mathematicians nor degenerate gamblers: straight up is when you bet on one particular number.
4 I actually have never played a game of pure luck for money in my entire life. I did fleece much of my university class in poker, though.
I'm a data scientist and computational epidemiologist focusing on the intersection of public health, data science and artificial intelligence.
