“Yeah, but what is it you actually DO?” A friendly explainer.

People ask me what I do a lot. I used to say “I work with computers”, but I realized this would earn me an invite to check out broken routers or messed up Outlook installations. Sometimes, I told people I was an epidemiologist, and a few insisted on asking me about skin disorders (err no… that’s dermatology). And eventually, in my exasperation, I asked around on our computational epidemiology mailing list how others deal with this question.

What you can expect next time you’re flying if you’re too specific about your research interests. That, or a spot on the no-fly list.

Turns out there’s no good answer. And I mean that, to the point that some people would rather actually check on Aunt Velma’s busted router. “Public health worker” sounds like a social worker. “I’m a mathematician that’s interested in diseases” strikes people as weird at best. And “Oh… I’m into ebola and stuff” gets you on the no-fly list faster than you can say Elizabethkingia.

If we’re lucky, our audience will politely nod once told what we do, and if we’re super lucky, they might google it later on. That you’re looking at this website may mean one of four things:

  1. You might be that exceptional lady or gentleman who did meet a computational epidemiologist, who handed you this link. Yay! You keep good company.
  2. You may have heard of what we do – or that we even exist – for the first time. Excellent! The longest journey begins with a single step. Or so the fortune cookie from last night’s Mama Chang’s says.
  3. You’ve been told you’re going to work with one of us, and you’re like ‘a whut?’. Cool! We’re a rare and endangered species, so please don’t shoot us.
  4. You’ve been considering, or you’ve been voluntold,[1] to go into the computational epidemiology field. That’s because you’re either an epidemiologist who is a bit of a geek or a geek who has a penchant for epidemiology. Welcome!

In the following, we’ll go over some of the basics of who we aren’t, what we do and how to care for your friendly neighbourhood computational epidemiologist, including some basic do’s and don’ts. With adequate care, your computational epidemiologist may live a long, happy life only occasionally punctuated by swearing at the command line. Let’s get started!

Who we aren’t

To get this out of the way: computational epidemiologists are NOT infectious disease doctors (in fact, despite the name, epidemiology deals with all sorts of illnesses, including non-infectious ones!), most don’t have an MD, and no, most of us never wanted to be doctors. It’s not where ‘doctors who couldn’t hack it go’. That’s naturopathic medicine you’re thinking of.

While statistically, ugly sweaters are more prevalent among computationalists, Kate Winslet’s character in Contagion (2011) is actually a regular field epidemiologist. We know that because she dies. Computational epidemiologists are presumably immortal.

We’re also, mostly, not field workers. Remember that movie where somebody brings home some zoonotic meningitis/flu hybrid from South East Asia and the whole world starts dying, and Kate Winslet plays an EIS officer? Well, while technically there are computationalists who do end up in the EIS, most of us prefer our subjects of study to be as far away from us as possible. Given that I study filoviridae in particular and their transmission dynamics in various simulated populations, I’m rather glad I can do so without ever having to see a filovirus. Not saying they’re not pretty (they do have their own kind of beauty), but I’d rather keep my distance from things that dissolve your cells, turn you into sludge and make you bleed out of orifices you didn’t know you could bleed out of.

This transmission network of tuberculosis was done on the basis of contact tracing data. We rely on field epidemiologists to get us that data so that we can then understand spread and risk factors better.
Source: Bjorn-Mortensen et al., 2017.

We do, however, depend on field workers quite a bit – all the nice graph models we look at began life as a bunch of scared graduate students in Tyvek suits, sweating their butts off and trying, in laboured French/English/Swahili/Igbo/other language, to ascertain whom the sickly-looking gentleman in the corner of the room has been in contact with over the last few weeks (that’s called ‘contact tracing’, and it is one of the most indispensable yet most tiresome parts of field epidemiology).

We’re also not infectious disease biologists, although if you’re really interested in how to create a disease that will kill a lot of people, ask a computational epidemiologist. They won’t answer, because you look shifty and anyone who asks questions like that shouldn’t know anyway, but we’re the ones who would know.

What we actually do

An early triumph of geospatial methods in epidemiology: John Snow’s cholera map.

We’re in a somewhat ill-defined field. At the margins, we touch mathematics, statistics, bioinformatics, computer science and computability theory, genomics, medicine, public health and probably a dozen other fields I forgot about. Ultimately, the core of what we do is using (and improving and developing) computational methods to answer complex questions about health and sickness.

One thing that surprises people when I talk to them is that we don’t only work with infectious diseases. I was bitten by the epidemiology bug[2] when working on a Britain-wide project on predictive factors of heart disease. Epidemiologists also deal with obesity, drug use, workplace accidents and toxic exposure, radiation leaks and pretty much anything that can make you sick.[3]

The name does suggest an association with epidemics, but its roots are actually worth considering. Epidemiology is the art/science (logos) of what is ‘on’ (epi) the people (demos) – that is, the study of what has befallen the people. And an epidemic, of course, is something that befalls enough people that it might be worth considering as a population level phenomenon. The Greeks knew their thing when it came to naming stuff, and in a sense, this is brilliant naming, for it points out exactly what the great strength and the great weakness of epidemiology is: epidemiology is ‘good’ at answering questions about what’s going on with an entire population. It is notoriously less good at answering questions about what’s going on with an individual.

And so, when you tell an epidemiologist you have a sore throat, enlarged lymph nodes and a fever, they’ll tell you that statistically, you probably have acute bacterial pharyngitis caused by group A beta-hemolytic Streptococcus, aka ‘strep throat’. But statistically, you may also have infectious mononucleosis or lymphoma. That’s because if you look at a million patients with a sore throat, enlarged lymph nodes and a fever, most will indeed have acute bacterial pharyngitis (and most of those where a cause can be identified will have GABHS bacterial pharyngitis),[4] and the rest will have an odd smattering of cases. Because on a population level, every patient is a Schrödinger’s Cat: until they get conclusively diagnosed, they have 80% bacterial pharyngitis, 15% mono, 3% lymphoma and 2% some weird bug the lab sees once a decade.[5]

The power of this kind of analysis becomes apparent when we do need to deal with a population at large rather than a small number of individual patients. This is the domain of public health, and that’s why epidemiologists and public health physicians are best of buddies.

Consider the following scenario. Every once in a while, someone is unintentionally exposed to HIV, the pathogen that causes AIDS. In some cases, a regimen of antiviral drugs administered right after the exposure (known as post-exposure prophylaxis or PEP) can reduce the likelihood of HIV remaining in the system and of the patient developing AIDS. However, the antivirals in PEP are both expensive and, well, fairly nasty drugs, with their own side effect profile. Should PEP be routinely offered? And for how long?

Enter epidemiology to save the day and answer the question. Based on studies about the likelihood that PEP will prevent seroconversion (being infected by HIV), studies about the per-thousand-doses risk of the antivirals used for PEP, data about PEP’s effectiveness and so on, a more coherent statistical picture can be constructed. This allows for the quantification of risks vs benefits in light of effectiveness and efficiency. Epidemiology can answer questions that literally save lives every day. It factors into individual health and treatment decisions, into the guidelines that govern care and into public health policies, such as the availability of neuraminidase inhibitors[6] in a flu epidemic. It governs emergency responses, such as whether to respond to an outbreak by mass vaccination, ring vaccination, antivirals, isolation, quarantine or other measures. Ultimately, epidemiology is the art and science of using data to help physicians, public health officials and clinicians make the right decisions based on the right data.
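To make the risk-versus-benefit arithmetic a little more concrete, here is a minimal Python sketch. Every number in it is made up for illustration (the baseline risk, the relative reduction and the harm rate are assumptions, not figures from any PEP study), and the `Intervention` class and function names are hypothetical. It only shows the shape of the calculation: how many people would need to be treated to prevent one infection, against how many courses it takes on average to see one serious side effect.

```python
from dataclasses import dataclass


@dataclass
class Intervention:
    """Hypothetical inputs; none of these figures come from real studies."""
    baseline_risk: float       # probability of infection without prophylaxis
    relative_reduction: float  # fraction of infections the drug prevents
    harm_risk: float           # probability of a serious side effect per course


def absolute_risk_reduction(iv: Intervention) -> float:
    # Infections prevented per person treated.
    return iv.baseline_risk * iv.relative_reduction


def number_needed_to_treat(iv: Intervention) -> float:
    # People who must be treated to prevent one infection, on average.
    return 1.0 / absolute_risk_reduction(iv)


def number_needed_to_harm(iv: Intervention) -> float:
    # Courses given, on average, before one serious side effect occurs.
    return 1.0 / iv.harm_risk


# Made-up example values, chosen only to keep the arithmetic readable.
pep = Intervention(baseline_risk=0.003, relative_reduction=0.8, harm_risk=0.01)
print("Number needed to treat:", round(number_needed_to_treat(pep)))
print("Number needed to harm:", round(number_needed_to_harm(pep)))
```

With these invented inputs, roughly 417 people would need treatment to prevent one infection, while one in 100 would experience a serious side effect. Whether that trade is worth it depends on how bad each outcome is, which is exactly the kind of judgement the real studies feed into.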

The computational part

There are lots of ways to categorise epidemiologists (and none of them will be met with much favour – we like being uncategorised, thanks!). Sometimes, epidemiologists are categorised by the disease area examined (e.g. oncoepidemiology or psychiatric epidemiology), by patient group (such as pediatric, perinatal or geriatric epidemiology), by disease type (in particular, chronic illness epidemiology) and finally, by the particular methods of epidemiological research they use. Molecular and genetic/genomic epidemiology, for instance, looks at the genetic correlates of sickness and health, and what genetic factors determine or predispose to particular conditions – or the lack thereof. Much of this relies on association studies – studies in which medical histories and genotyping are correlated to determine what factors are most strongly associated with a particular outcome.[7] Computational approaches are just another methodological category.

So what sets computationalists like me apart? We are concerned with using computationally and mathematically efficient methods to deliver results. Our quest is no different from that of other epidemiologists, but our arsenal of tools is. By combining statistical, machine learning, geospatial, graph-based and simulation tools (and whatever else we found in the kitchen sink), we can provide more accurate and detailed answers to questions that can avoid side effects, alleviate suffering and help people live longer, healthier, more fulfilling lives.

Computational epidemiology grew out of the increasing digitization of health data – the volume of detailed health information in electronic medical records is a tremendous resource – and the rapid growth in high performance computational methods over the last few decades. Computational epidemiologists are trained in a uniquely interdisciplinary fashion, and are expected to be conversant with anything from linear algebra to graph theory, from efficient tensor algorithm programming to computationally solving differential equations, from filoviridae to predicting zoonotic vectors, and from modeling the spread of waterborne infections to mining Facebook – in short, just about anything and everything. The reason is that we need to carry as extensive a toolbox as possible to tackle the widest range of challenges. Ultimately, we rarely encounter the same issue twice, and the ability to solve problems often has to do with combining the right tools in the right constellation for the job.
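For a taste of what ‘computationally solving differential equations’ can look like in practice, here is a minimal sketch of the textbook SIR (susceptible, infected, recovered) compartment model, integrated with a plain forward Euler step. This is an illustration and not a model from any study mentioned here: the function name `simulate_sir` and the parameter values (`beta`, `gamma`, the population sizes) are assumptions picked only to produce a recognisable epidemic curve.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, dt=0.1):
    """Integrate the SIR equations with a forward Euler step.

    dS/dt = -beta * S * I / N
    dI/dt =  beta * S * I / N - gamma * I
    dR/dt =  gamma * I

    Returns a list of (time, S, I, R) tuples, one per step.
    """
    n = s0 + i0 + r0
    s, i, r = float(s0), float(i0), float(r0)
    history = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for step in range(1, steps + 1):
        # Flows between compartments during this small time step.
        new_infections = beta * s * i / n * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((step * dt, s, i, r))
    return history


# Illustrative values only: 10 initial cases in a population of 10,000.
history = simulate_sir(beta=0.3, gamma=0.1, s0=9990, i0=10, r0=0, days=160)
peak_time, _, peak_infected, _ = max(history, key=lambda row: row[2])
print(f"Infections peak around day {peak_time:.0f} "
      f"with about {peak_infected:.0f} people infected at once.")
```

Forward Euler is the crudest integrator there is, and a real analysis would more likely reach for an adaptive solver and fit the parameters to data. The point here is only that a few lines of arithmetic turn two rates into a whole epidemic curve.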

The future

Computational epidemiology is a new profession, and yet every now and then, I get asked whether new developments in technology will someday make us obsolete.

No doubt, the many new developments in various fields of computer science and mathematics have given us some new tools to tackle challenges faster and better:

  • Social networks provide a volume of data about individuals that together may predict disease trends, and web-scale databases can efficiently analyze it. An example of this is Google Flu Trends, which uses search terms for flu-related concepts – things someone with influenza-like illness would search for, such as ‘is it normal to use more than 250 tissues a day?’ – to estimate the expected level of flu activity in a particular region. The efficacy of Google Flu Trends is controversial,[8] but the approach is definitely one that has a future. Efficient web crawling is another aspect of this.
  • Natural language processing helps analyse and interpret unstructured, verbal information, such as news items or social media comments. NLP needs a lot of work as it currently stands, including the ability to understand non-literal language (irony, metaphor, etc.),[9] but it has come an enormous way over the last decades and has become a workhorse of analysing medical and social data.
  • Machine learning is a wide field, and it’s without doubt that the rising ML tide lifts our boats, too. Deep learning, for instance, has made large genome-wide association studies (GWASs) with thousands or even millions of dimensions feasible[10] where they hitherto would have required vast computing power reserved to only a few major institutions.
  • GPU-based processing, together with the liberalization of access to computing power (consider AWS’s EC2 or Microsoft Azure), has opened up computationally intensive tasks to millions, and reduced the barriers to entry immensely by allowing pay-by-use instead of a massive upfront investment. The effect of this on our work cannot be overestimated.
  • The recent developments in synthetic populations and large agent-based models have made simulations of massive numbers of people based on real population data possible. This is an incredible opportunity, as it allows simulations to be conducted with a statistically representative population while safeguarding patient privacy. Agent-based models are enormously useful not only to predict but also to test potential interventional scenarios.
  • Graph databases can ‘organically’ represent relationship data, such as traced contacts. The result is the ability to perform searches (known as traversal in graph analysis lingo) over large networks in less time.
  • Optimisation and algorithmic research, in particular trying to parallelise as many problems as possible, is accelerating what we do every day.
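To illustrate the traversal idea from the graph database item above, here is a minimal in-memory sketch in Python. It is not how a graph database works internally. The contact list, the names in it and the `trace_contacts` function are all hypothetical. It uses a plain breadth-first search to find everyone within a given number of contact ‘hops’ of an index case.

```python
from collections import deque


def trace_contacts(contacts, index_case, max_hops):
    """Breadth-first traversal of a contact network.

    `contacts` maps each person to the people they were in contact with.
    Returns {person: number of hops from the index case} for everyone
    reachable within `max_hops` hops.
    """
    distance = {index_case: 0}
    queue = deque([index_case])
    while queue:
        person = queue.popleft()
        if distance[person] == max_hops:
            continue  # do not expand beyond the hop limit
        for other in contacts.get(person, ()):
            if other not in distance:
                distance[other] = distance[person] + 1
                queue.append(other)
    return distance


# Hypothetical contact tracing data: A met B and C, B met D, D met E.
contacts = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["B", "E"],
    "E": ["D"],
}
within_two = trace_contacts(contacts, "A", max_hops=2)
print(within_two)
```

Here E is three hops from A and so falls outside the two-hop search. A graph database runs the same kind of query over millions of relationships without you having to hold the whole network in a Python dictionary.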

But in the end, most epidemiologists find their strength and resilience in the fact that they are helping other humans live better, healthier lives. Unlike clinicians and field epidemiologists, we may not meet individual patients too often. But at the end of the day, behind all the numbers is a shared endeavor by a massive web of interdependent professions, from clinicians through pharmacologists and lab techs to public health workers and us computational epidemiologists to help people, to make their lives healthier and to alleviate suffering. It’s why we pull the crazy hours. It’s why we spend long days away from those we love. It’s what keeps us going after failed simulation #889982984234. And I doubt any algorithm, AI or robot, however advanced, can find that in their hearts.

While a lot of what’s in this post is important and factually true, the tone is occasionally a little more tongue in cheek than you might be used to on this blog. This is a profession for the slightly insane. Talking about it without a degree of self-deprecating irony is as pointless as it’s impossible.

References

1. Volunteering a la 12 Monkeys: “you, you and you!”.
2. That’s a metaphor. Epidemiologists are not made by insect bite, that’s Spider-Man. Epidemiologists are made, usually in small batches (use non-stick baking parchment and keep temperature below 250ºF). The process of creating decent epidemiologists has been engraved on the handle of John Snow’s pump. It hasn’t been seen since the 1854 London Cholera outbreak, sadly, so most of us have been winging it. Don’t complain, you’re still alive, right?
3. Controversially, this includes gun violence. Not to get into this fray, but epidemiology has neither the right tools nor the right approach to address the tragedy of accidental and avoidable gun deaths and even less so for intentional and non-avoidable gun deaths. At some point, I’ll rant a little about how a lot of this has to do with the fact that when research money is at stake, everything looks like the kind of nail eminently suited for smashing by the hammer your working group came up with. Until then, I’ll stay quiet and try not to piss off too many people.
4. Note that because we don’t really bother with pharyngitis if it does the polite thing and goes away eventually, the definite cause of about a third of all cases of pharyngitis is never discovered. To put it in perspective: as the most frequent single pathogen causing bacterial pharyngitis, GABHS accounts for about 20% of all cases where the cause is known – in other words, ‘no known cause’ is still almost three times as frequent as the most frequent known bacterial cause!
5. These are not the real numbers, but you get the point.
6. Such as oseltamivir (Tamiflu), zanamivir (Relenza) and laninamivir (Inavir), used in the treatment of symptomatic influenza A and B infections. Controversially, their effectiveness is somewhat limited – at most, they cut a few days off the disease length, and for normal healthy patients, they may not make much of a difference – but they have pronounced side effects, some of which long outlast administration. At this point, meta-analyses show that in otherwise healthy patients, neuraminidase inhibitors do not have a favourable risk/benefit ratio.
7. A famous example is the case of PCSK9, a gene coding for a protein involved in lipoprotein homeostasis, which in turn is strongly correlated with heart disease and cardiac mortality. In some forms of familial hypercholesterolaemia, a heritable condition where people present with high cholesterol levels largely regardless of diet, a gain-of-function mutation of the PCSK9 gene is present. It was found that reducing the levels of PCSK9, such as by a targeted antibody that binds PCSK9 (the monoclonal antibodies evolocumab and alirocumab, for example), would reduce LDL (low density lipoprotein or ‘bad cholesterol’) and thus have cardiac benefits. In the end, so far, the effect has been quite modest compared to the exorbitant price tag, considering the much lower cost of statins that accomplish, more or less, the same purpose.
8. For a good summary, see Lazer, D., Kennedy, R., King, G., & Vespignani, A. (2014). The parable of Google Flu: traps in big data analysis. Science, 343(6176), 1203-1205.
9. Reyes, A., Rosso, P., & Buscaldi, D. (2012). From humor recognition to irony detection: The figurative language of social media. Data & Knowledge Engineering, 74, 1-12.
10. Szymczak, S., Biernacka, J. M., Cordell, H. J., González‐Recio, O., König, I. R., Zhang, H., & Sun, Y. V. (2009). Machine learning in genome‐wide association studies. Genetic Epidemiology, 33(S1).
