# MedDRA + VAERS: A marriage made in hell?

This post is a Golden DDoS Award winner

So far, this blog was DDoS’d only three times within 24 hours of its publication. That deserves a prize.

Quick: what do a broken femur, Henoch-Schönlein purpura, fainting, an expired vaccine and a healthy childbirth have in common? If your answer was “they’re all valid MedDRA codes”, you’re doing pretty well. If you, from that, deduced that they all can be logged on VAERS as adverse effects of vaccination, you’re exceeding expectations. And if you also realise that the idea that Jane got an expired HPV vaccine, and as a consequence broke her femur, developed Henoch-Schönlein purpura, and suddenly gave birth to a healthy baby boy is completely idiotic and yet can be logged on VAERS, you’re getting where I’m going.

MedDRA is a medical nomenclature specifically developed for the purposes of pharmacovigilance. The idea is, actually, not dreadful – there are some things in a usual medical nomenclature like ICD-10 that are not appropriate for a nomenclature used for pharmacovigilance reporting (V97.33: sucked into jet engine comes to my mind), and then there are things that are specific to pharmacovigilance, such as “oh shoot, that was not supposed to go up his bum!” (MedDRA 10013659: vaccine administered at inappropriate site), “we overdosed little Johnny on the flu vaccine!” (MedDRA 10000381: drug overdose, accidental) and other joys that generally do only happen in the context of pharmacovigilance. So far, so good.

At the same time, MedDRA is non-hierarchical, at least on the coding level. Thus, while the ICD code V97.33 tells you that you’re dealing with an external cause of mortality and morbidity (V and Y codes), specifically air and space transport (V95-97), more specifically ‘other’ specific air transport accidents, specifically getting sucked into a jet engine (V97.33), there’s no way to extract from MedDRA 10000381 what the hell we’re dealing with. Not only do we not know if it’s a test result, a procedure, a test or a disease, we are hopelessly lost as to figuring out what larger categories it belongs to. To make matters worse, MedDRA is proprietary – which in and of itself is offensive to the extreme to the idea of open research on VAERS and other public databases: a public database should not rely on proprietary encoding! -, and it lacks the inherent logic of ICD-10. Consider the encoding of the clinical diagnosis of unilateral headache in both:

We know that an ICD code beginning with F will be something psychiatric and G will be neurological, and from that alone we can get some easy analytical approaches (a popular one is looking at billed codes and drilling down by hierarchical level of ICD-10 codes, something in which the ICD-10 is vastly superior to its predecessor). MedDRA, alas, does not help us such.

## Garbage in, garbage out

OK, so we’ve got a nomenclature where the codes for needlestick injury, death, pneumonia, congenital myopathy and a CBC look all the same. That’s already bad enough. It gets worse when you can enter any and all of these into the one single field. Meet VAERS.

The idea of VAERS is to allow physicians, non-physicians and ‘members of the public’ to report incidents. These are then coded by the CDC and depending on seriousness, they may or may not be investigated (all reports that are regarded as ‘serious’ are investigated, according to the CDC). The problem is that this approach is susceptible to three particular vulnerabilities:

• The single field problem: VAERS has a single field for ‘symptoms’. Everything’s a symptom. This includes pre-existing conditions, new onset conditions, vaccination errors, lab tests (not merely results, just the tests themselves!), interventions (without specifying if they’re before or after the vaccine), and so on. There is also no way to filter out factors that definitely have nothing to do with the vaccine, such as a pre-existing birth defect. The History/Allergies field is not coded.
• The coding problem: what gets coded and what does not is sometimes imperfect. This being a human process, it’s impossible to expect perfection, but the ramifications to this to certain methods of analysis are immense. For instance. if there are 100 cases of uncontrollable vomiting, that may be a signal. But if half of those are coded as ‘gastrointestinal disorder’ (also an existing code), you have two values of 50, neither of which may end up being a signal.
• The issue of multiple coding: because MedDRA is non-hierarchical, it is not possible to normalise at a higher level (say, with ICD-10 codes, at chapter or block level), and it is not clear if two codes are hierarchically related. In ICD-10, if a record contains I07 (rheumatic tricuspid valve disease) and I07.2 (tricuspid stenosis with tricuspid insufficiency), one can decide to retain the more specific or the less specific entry, depending on intended purpose of the analysis.

In the following, I will demonstrate each of these based on randomly selected reports from VAERS.

### The Single Field Problem (SFP)

The core of the SFP is that there is only one codeable field, ‘symptoms’.

VAERS ID 375693-1 involves a report, in which the patient claims she developed, between the first and second round of Gardasil,

severe stomach pain, cramping, and burning that lasted weeks. Muscle aches and overall feeling of not being well. In August 2009 patient had flu like symptoms, anxiety, depression, fatigue, ulcers, acne, overall feeling of illness or impending death.

Below is the patient’s symptom transposition into MedDRA entities (under Symptoms):

The above example shows the mixture of symptoms, diagnostic procedures and diagnostic entities that are coded in the ‘Symptoms’ field. The principal problem with this is that when considering mass correlations (all drugs vs all symptoms, for instance), this system would treat a blood test just as much as a contributor to a safety signal as anxiety or myalgia, which might be true issues, or depression, which is a true diagnosis. Unfiltered, this makes VAERS effectively useless for market basket analysis based (cooccurrence frequency) analyses.

Consider for instance, that $PRR$ is calculated as

$PRR_{V,R} = \frac{\Sigma (R \mid V) \/ \Sigma (V)}{\Sigma (R \mid \neg V) \/ \Sigma (\neg V)} = \frac{\Sigma (R \mid V)}{\Sigma (V)} \cdot \frac{\Sigma (\neg V)}{\Sigma (R \mid \neg V)}$

where $V$ denotes the vaccine of interest, $R$ denotes the reaction of interest, and the $\Sigma$ operator denotes the sum of rows or columns that fulfill the requisite criteria (a more detailed, matrix-based version of this equation is presented here). But if $\{R\}$, the set of all $R$, contains not merely diagnoses but also various ‘non-diagnoses’, the PRR calculation will be distorted. For constant $V$ and an unduly large $R$, the values computationally obtained from the VAERS data that ought to be $\Sigma(R \mid V)$ and $\Sigma(R \mid \neg V)$ will both be inaccurately inflated. This will yield inaccurate final results.

Just how bad IS this problem? About 30% bad, if not more. A manual tagging of the top 1,000 symptoms (by $N$, i.e. by the number of occurrences) was used as an estimate for how many of the diagnostic entities do not disclose an actual problem with the vaccine.

According to the survey of the top 1,000 codes, only a little more than 70% of the codes themselves disclose a relevant issue with the vaccine. In other words, almost a third of disclosed symptoms must be pruned, and these cannot be categorically pruned because unlike ICD-10, MedDRA does not disclose hierarchies based on which such pruning would be possible. As far as the use of MedDRA goes, this alone should be a complete disaster.

Again, for effect: a third of the codes do not disclose an actual side effect of the medication. These are not separate or identifiable in any way other than manually classifying them and seeing whether they disclose an actual side effect or just an ancillary issue. Pharmacovigilance relies on accurate source data, and VAERS is not set up, with its current use of MedDRA, to deliver that.

### The coding problem

Once a VAERS report is received, it is MedDRA coded at the CDC. Now, no manual coding is perfect, but that’s not the point here. The problem is that a MedDRA code does not, in and of itself,  indicate the level of detail it holds. For instance, 10025169 and 10021881 look all alike, where in fact the first is a lowest-level entity (an LLT – Lower-Level Term – in MedDRA lingo) representing Lyme disease, while the former is the top-level class (SOC – System Organ Class) corresponding to infectious diseases. What this means is that once we see a MedDRA coded entity as its code, we don’t know what level of specificity we are dealing with.

The problem gets worse with named entities. You see, MedDRA has a ‘leaf’ structure: every branch must terminate in one or more (usually one) LLT. Often enough, LLTs have the same name as their parent PT, so you get PT Lyme disease and LLT Lyme disease. Not that it terrifically matters for most applications, but when you see only the verbose output, as is the case in VAERS, you don’t know if this is a PT, an LLT, or, God forbid, a higher level concept with a similar name.

Finally, to put the cherry on top of the cake, where a PT is also the LLT, they have the same code. So for Lyme disease, the PT and LLT both have the code 10025169. I’m sure this seemed like a good idea at the time.

### The issue of multiple coding

As this has been touched upon previously, because MedDRA lacks an inherent hierarchy, a code cannot be converted into its next upper level without using a lookup table, whereas with, say, ICD-10, one can simply normalise to the chapter and block (the ‘part left of the dot’). More problematically, however, the same code may be a PT or an LLT, as is the case for Lyme disease (10025169).

Let’s look at this formally. Let the operator $\in^*$ denote membership under the transitive closure of the set membership relation, so that

1. if $x \in A$, then $x \in^* A$,
2. if $x \in A$ and $A \subseteq B$, then $x \in^* B$.

and so on, recursively, ad infinitum. Let furthermore $\in^*_{m}$ denote the depth of recursion, so that

1. for $x \in A$:  $x \in^*_{0} A$,
2. for $x \in A \mid A \subseteq B$:  $x \in^*_{1} B$,

and, once again, so on, recursively, ad infinitum.

Then let a coding scheme $\{S_{1...n}\}$ exhibit the Definite Degree of Transitiveness (DDoT) property iff (if and only if) for any $S_m \mid m \leq n$, there exists exactly one $p$ for which it is true that $S_m \in^*_{p} S$.

Or, in other words, two codes $S_q, S_r \mid q, r \leq n$, may not be representable identically if $p_q \neq p_r$. Less formally: two codes on different levels may not be identical. This is clearly violated in MedDRA, as the example below shows.

### Bonus: the ethical problem

To me as a public health researcher, there is a huge ethical problem with the use of MedDRA in VAERS. I believe very strongly in open data and in the openness of biomedical information. I’m not alone: for better or worse, the wealth – terabytes upon terabytes – of biomedical data, genetics, X-ray crystallography, models, sequences  prove that if I’m a dreamer, I’m not the only one.

Which is why it’s little short of an insult to the public that a pharmacovigilance system is using a proprietary encoding model.

Downloads from VAERS, of course, provide the verbose names of the conditions or symptoms, but not what hierarchical level they are, nor what structure they are on. For that, unless you are a regulatory authority or a ‘non-profit’ or ‘non-commercial’ (which would already exclude a blogger who unlike me has ads on their blog to pay for hosting, or indeed most individual researchers, who by their nature could not provide the documentation to prove they aren’t making any money), you have to shell out some serious money.

Worse, the ‘non-profit’ definition does not include a non-profit research institution or an individual non-profit researcher, or any of the research bodies that are not medical libraries or affiliated with educational institutions but are funded by third party non-profit funding:

There is something rotten with the use of MedDRA, and it’s not just how unsuitable it is for the purpose, it is also the sheer obscenity of a public database of grave public interest being tied to a (vastly unsuitable and flawed, as I hope it has been demonstrated above) nomenclature.

## Is VAERS lost?

### Resolving the MedDRA issue

Unlike quite a few people in the field, I don’t think VAERS is hopelessly lost. There’s, in fact, great potential in it. But the way it integrates with MedDRA has to be changed. This is both a moral point – a point of commitment to opening up government information – and one of facilitating research.

There are two alternatives at this point for the CDC.

1. MedDRA has to open up at least the 17% of codes, complete with hierarchy, that are used within VAERS. These should be accessible, complete with the hierarchy, within VAERS, including the CDC WONDER interface.
2. The CDC has to switch to a more suitable system. ICD-10 alone is not necessarily the best solution, and there are few alternatives, which puts MedDRA into a monopoly position that it seems to mercilessly exploit at the time. This can – and should – change.

### Moving past the Single Field Problem

MedDRA apart, it is crucial for VAERS to resolve the Single Field Problem. It is clear that from the issues presented in the first paragraph – a broken femur, Henoch-Schönlein purpura, fainting, an expired vaccine and a healthy childbirth – that there is a range of issues that need to be logged. A good structure would be

1. pre-existing conditions and risk factors,
2. symptoms that arose within 6 hours of administration,
3. symptoms that arose within 48 hours of administration,
4. symptoms that arose later than 48 hours of administration,
5. non-symptoms,
6. clinical tests without results,
7. clinical tests segmented by positive and negative results, and
8. ancillary circumstances, esp. circumstances pertaining to vaccination errors such as wrong vaccine administered, expired vaccine, etc.

The use of this segmentation would be able to differentiate not only time of occurrence, but also allow for adequate filtering to identify the correct denominators for the $PRR$.

### A future with (for?) MedDRA

As said, I am not necessarily hostile to MedDRA, even if the closet libertarian in me bristles at the fact that MedDRA is mercilessly exploiting what is an effective monopoly position. But MedDRA can be better, and needs to be better – if not for its own economic interests, then for the interests of those it serves. There are three particular suggestions MedDRA needs to seriously consider.

1. MedDRA’s entity structure is valuable – arguably, it’s the value in the entire project. If coding can be structured to reflect its internal hierarchy, MedDRA becomes parseable without a LUT,1 and kinship structures become parseable without the extra step of a LUT.
2. MedDRA needs to open up, especially to researchers not falling within its narrowly defined confines of access. Especially given the inherent public nature of its use – PhV and regulation are quintessentially public functions, and this needs an open system.
3. MedDRA’s entity structure’s biggest strength is that it comprises a range of different things, from administrative errors through physical injuries to test results and the simple fact of tests.

## Conclusion

VAERS is a valuable system with a range of flaws. All of them are avoidable and correctable – but would require the requisite level of will and commitment – both on CDC’s side and that of MedDRA. For any progress in this field, it is imperative that the CDC understand that a public resource maintained in the public interest cannot be driven by a proprietary nomenclature, least of all one that is priced out of the range of the average interested individual: and if they cannot be served, does the entire system even fulfill its governmental function of being of the people and for the people? It is ultimately CDC’s asset, and it has a unique chance to leverage its position to ensure that at least as far as the 17% of MedDRA codes go that are used in VAERS, these are released openly.

In the end, however sophisticated our dissimilarity metrics, when 30% of all entities are non-symptoms and we need to manually prune the key terms to avoid denominator bloat due to non-symptom entities, such as diagnostic tests without results or clearly unconnected causes of morbidity and mortality like motor vehicle accidents, dissimilarity based approaches will suffer from serious flaws. In the absence of detailed administration and symptom tracking at an individual or institutional level, dissimilarity metrics are the cheapest and most feasible ways of creating value out of post marketing passive reports. If VAERS is to be a useful research tool, as I firmly believe it was intended to be, it must evolve to that capability for all.

References   [ + ]

 1 ↑ Look-up table