A different shade of grey

computational chemistry

A response to Urbina et al. (2022) and the unreasonable spectre of AI coming up with chemical warfare agents (as if we didn’t already have enough of them).


Chris von Csefalvay


13 April 2022

In a recent paper that has attracted the interest of popular media as well, Fabio Urbina and colleagues examined the use (or rather, the abuse) of computational chemistry models of toxicity for generating toxic compounds and potential chemical agent candidates.(Urbina et al. 2022) Urbina and colleagues conclude that

Urbina, Fabio, Filippa Lentzos, Cédric Invernizzi, and Sean Ekins. 2022. ‘Dual Use of Artificial-Intelligence-Powered Drug Discovery’. Nature Machine Intelligence 4 (3): 189–91.

By going as close as we dared, we have still crossed a grey moral boundary, demonstrating that it is possible to design virtual potential toxic molecules without much in the way of effort, time or computational resources.

I agree with the conclusion, but for rather different reasons.


Computational chemistry is the branch of computational science that focuses on applications in the chemical field. This includes pharmacology and rational drug design (RDD) in particular. The purpose of RDD is to generate drug candidates that show favourable indicators of effectiveness (such as high binding affinity to a target protein) along with indicators of biological suitability (such as no or low interference with other proteins, low toxicity and no inhibition of metabolic ‘bottlenecks’ like CYP3A4). The latter part is typically handled by a toxicity model.

Rather simply put, a toxicity model infers the structural associations (the chemical structures associated with undesirable effects) from a library of known compounds with known effects. For instance, the Toxicology in the 21st Century (Tox21) programme of the US federal government has performed over sixty different assays (typically, enzyme inhibition assays) for over 13,000 different compounds. (Tice et al. 2013) Using molecular fingerprinting, which we have discussed in a previous post on this blog in this very same context, it is possible to build relatively easy models for toxicity. Where a particular desired toxicity is known, say mitochondrial toxicity, it’s not difficult to build a pipeline that generates candidate compounds, derives the molecular fingerprint and evaluates the likelihood that the molecule that is obtained will be an effective agent. In this sense, I wholeheartedly agree with Urbina: the cat is very much out of the bag. Even if Tox21’s public data does not include the classical target of modern chemical weapons (acetylcholinesterase), such data is not exactly hard to come by or, for a nation-state actor, to generate. A near-peer adversary could create such assays for cents on the compound.

Tice, Raymond R, Christopher P Austin, Robert J Kavlock, and John R Bucher. 2013. ‘Improving the Human Hazard Characterization of Chemicals: A Tox21 Update’. Environmental Health Perspectives 121 (7): 756–65.

Nothing about the above is controversial. In fact, Urbina’s paper is an example of ‘trivial genius’: just about anyone who has ever done computational chemistry in the pharmacological/drug design space knows that any algorithm that is intended to optimise towards lower toxicity can be inverted to optimise towards higher toxicity, and the same models used to create effective enzyme inhibitors to treat cancer, depression, schizophrenia, allergies or autoimmune disease can be repurposed in a few hours and about $100 in AWS credits to something that will generate potent acetylcholinesterase inhibitors (AChEIs). Notably, this is not to say that all research aimed at acetylcholinesterase inhibition is aimed at creating a chemical warfare agent. AChEIs are used in a clinical context, e.g. for myasthenia gravis. They are, however, also the archetypal “nerve agent”. Which leads me to my second point of agreement with Urbina et al.: the tools of computational pharmacology and RDD are — and have been, for a long time! — open to misuse.

The reverse of the medal

On the other hand, the likelihood of an AI-generated chemical agent ever posing a threat beyond the theoretical is very, very low. There are a few reasons for that, and they’re inherent partly in computational chemistry, partly in weapons design.

The computational chemistry part pertains to the fact that molecular fingerprinting and similar models only give us a narrow view of the outcomes we may expect. For starters, no model is able to reliably assess the feasibility and cost-effectiveness of synthesis. There are plenty of drug candidates that have performed admirably in vitro and sometimes even in clinical trials, but for which no feasible way of cost-efficient, large-scale synthesis could be found. Then, there are the drugs that ought to work, and might even work in vitro, but end up failing in clinical trials with no effect or an unexpected toxicity. Effect inference from chemical structure looks only at one side of the medal, and not even all of that.

The bigger problem is the weapons design part. To avoid a late-night visit from some mild-mannered federal employees in a dark SUV, I’d like to point out that anything I discuss here is well in the public domain. With that said: just as pharmaceutical chemists want some things from their target compounds (such as relatively little inference, predictable metabolism, a wide therapeutic margin and few adverse effects), designers of chemical weapons have their own considerations for which to optimise. VX, for instance, is immensely popular because its oily consistency gives it beneficial physical properties. Similarly, a potential chemical weapon candidate must be stable vis-a-vis e.g. UV exposure, but not too stable. An example of the latter is the Red Zone in France, the World War I battlefields that have been bombarded with so many chemical weapons that to this day, they are heavily contaminated by arsenic, among others. The preference for binary agents (which contain two relatively stable and relatively non-toxic chemicals that are mixed, typically during the flight time of a shell, to form the active agent) means that a less toxic agent that can be reliably produced from the simple admixture of two relatively stable agents may be preferred to a more lethal unary agent. And this, of course, all assumes a state actor willing and able to violate international law on chemical weapons.

Finally, there is no real need for novel chemical agents, at least not in the nerve agent category. Not only does using a known agent provide plausible deniability, there is also no real need to create anything more lethal than VX. Even relatively old chemical agents, such as mustard agents, are effective enough. A novel chemical structure may not guarantee that the agent escapes chemical detection, and functional antidotes are going to be just as effective against novel agents. To an atropine/pralidoxime NAAK autoinjector’s efficacy, it makes no difference whether the acetylcholinesterase inhibition comes from sarin, VX, Tetram or inadvertent exposure to organophosphate pesticides. Arguably, this becomes somewhat more complex with other agents, where the biological targets — and hence the antidotes — are more specific. Nevertheless, it is hard to conceive of anyone possessed of a burning rationale to start creating novel chemical agents.

A lighter shade of grey

I commend the authors for discussing the moral aspects of this exercise. It is rather uncomfortable to write about the use of artificial intelligence to create tools whose predominant use would be to extinguish human lives (although, as noted, many of these compounds can, and often do, have a medicinal use).

Where I cannot agree with the authors is the conclusion that this situation calls for regulation (be it self-regulation or imposed from above).

Although MegaSyn is a commercial product and thus we have control over who has access to it, going forward, we will implement restrictions or an API for any forward-facing models. A reporting structure or hotline to authorities, for use if there is a lapse or if we become aware of anyone working on developing toxic molecules for non-therapeutic uses, may also be valuable. Finally, universities should redouble their efforts toward the ethical training of science students and broaden the scope to other disciplines, and particularly to computing students, so that they are aware of the potential for misuse of AI from an early stage of their career, as well as understanding the potential for broader impact.

This is all well and good, but — assuming that there were a realistic danger of people coming to grief from AI-generated chemical weapons — it solves a problem that the authors have failed to substantiate exists. The tools to do this have been around for a long time. For the reasons laid out in the previous section, there is no burning desire anywhere in the world right now, as far as I can see, to develop a successor to VX purely on a structural basis. Of course, a chemical agent that can be synthesised without relying on any OPCW listed substances, or which can have novel effects, or which can escape traditional methods of detection (GC/MS, typically), would be of interest to potential bad actors, but current models of computational chemistry do not help with that. Creating permutational or functional analogues of VX does not, realistically, put anyone a single step closer to the ability to carry out an atrocity using such reprehensible weapons.

On the other hand, I am concerned about the moral aspects that derive from the consequences of Urbina et al.’s paper. Given that the overwhelming majority of users of computational chemistry and RDD do so for benign purposes, the public attention garnered by such articles may create a regulatory push that is not going to make anyone safer, but will impede scientific inquiry. Urbina et al. point out the reputational risk, and the risk of a single bad apple spoiling the lot — I am rather uncertain whether an attention-grabbing headline in The Verge, raising the spectre of the scary AI that generates tens of thousands of killer compounds in hours (virtually all of which are very likely to be practically useless), is doing our discipline any favours.

The Scythian prince Anacharsis likened laws to cobwebs: strong enough to catch the weak, but too weak to impede the powerful. In that sense, a sufficiently determined adversary with the right tools — scientists, labs, an AWS account with a few hundred bucks of credit — will be able to subvert the art of creating chemistry to save lives and turn it into a way to destroy them. No amount of regulation, controlled APIs and lectures on the Hague Ethical Guidelines are going to be any impediment. To a near-peer adversary or even a well-funded non-state actor, the door has been open for rational chemical agent discovery (RCAD) for a very long time. If the past is anything to go by, it has not really yielded rich fruit. Pretty much the only new thing under the sun for chemical agents in recent decades was the (possible, theorised) use of binary agents on Kim Jong-nam — not exactly a result of assiduous, AI-driven research into novel agents, but rather the every-day fare of a paranoid, despotic regime driven by cruelty and ignorance.

The only thing to fear, then, is fear itself. There is a justified concern in the AI community that while we do indeed need to discuss ethical use of artificial intelligence in the RDD domain, there is a time and a place for that. Sensationalism and alarmist headlines of poison-spewing machine learning models are great clickbait, but they do not benefit the discipline. Realism, not alarmism, is needed to tackle these issues.


BibTeX citation:
  author = {Chris von Csefalvay},
  title = {A Different Shade of Grey},
  date = {2022-04-13},
  url = {https://chrisvoncsefalvay.com/posts/a-different-shade-of-grey},
  doi = {10.59350/bwee1-hee31},
  langid = {en-GB}
For attribution, please cite this work as:
Chris von Csefalvay. 2022. “A Different Shade of Grey.” https://doi.org/10.59350/bwee1-hee31.