Installing RStudio Server on Debian 9

Oh, wouldn’t it be just wonderful if you could have your own RStudio installation on a server that you could then access from whatever device you currently have, including an iPad? It totally would. Except it’s some times far from straightforward. Here’s how to do it relatively painlessly.

Step 1: Get a server

Choose a suitable (and affordable) server, pick a location near you in the drop-down menu on the bottom, and press Add this Linode! to set up your first Linode.

I use Linode,1 and in general, their 4096 server is pretty good. Linodes can be very easily resized, so this should not be a worry.

Step 2: Set up the server

Once your Linode is up, it will turn up on your dashboard with a random name (linode1234567, typically). If you click on it, you will see your Linode is ‘Brand New’, which means you need to configure it. I usually keep them in groups depending on purpose: blog servers, various processing servers, hosts, research servers. Each of them then gets a name. Choose whatever nomenclature fits your needs best.

Give your Linode a name you can recognise it by, and assign it to a category.

Next, click on the Rebuild tab, and configure the root password, the operating system (we’ll be using Debian 9), the swap disk size (for R, it’s a good idea to set this as large as you can) and finally, set the deployment disk size. I usually set that for 66%.

Step 3: Install R

The Remote Access field shows the SSH access command, complete with the root user, as well as public IPs, default gateways and other networking stuff.

SSH into your Linode and log in as root. You will find your Linode’s IP address and other interesting factoids about it under the Remote access tab. I have obscured some information as I don’t want you scallywags messing around with my server, but the IP address is displayed both in the SSH link (the one that goes ssh [email protected] or something along these lines) and below under public IPs.

Using vim, open /etc/apt/sources.list and add the following source:

# CRAN server for Debian stretch (R and related stuff)
deb http://cran.rstudio.com/bin/linux/debian stretch-cran34/

Save the file, and next install dirmngr (sudo apt-get install dirmngr). Then, add the requisite GPG key for CRAN, update the repository and install r-base:

sudo apt-key adv --keyserver keys.gnupg.net --recv-key 'E19F5F87128899B192B1A2C2AD5F960A256A04AF'

sudo apt update
sudo apt install r-base

At this point, you can enter R to test if your R installation works, and try to install a package, like ggplot2. It should work, but may ask you to select an installation server. If all is well, this is what you should be getting:

[email protected]:~# R

R version 3.4.4 (2018-03-15) -- "Someone to Lean On"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

In the prompt, enter install.packages("ggplot2"). Select a server if requested. Otherwise, watch ggplot2 (and a bazillion other packages) install. Then, quit R by calling q().

Step 4: Downgrade libssl

For some inscrutable reason, RStudio is currently set up to work with libssl1.0.0, whereas Debian 9 comes with libssl1.1.0 out of the box. Clearly that’s not going to work, so we’ll have to roll back our libssl. We’ll do so by creating a file called /etc/apt/sources.list.d/jessie.list using vim. And we’ll fill it with the following:

deb http://httpredir.debian.org/debian jessie main contrib non-free
deb-src http://httpredir.debian.org/debian jessie main contrib non-free

deb http://security.debian.org/ jessie/updates main contrib non-free
deb-src http://security.debian.org/ jessie/updates main contrib non-free

In case you’re curious: this creates a sources list called jessie, and allows you to draw from Debian Jessie. So let’s have apt get with the program (sudo apt update), and install libssl1.0.0 (sudo apt install libssl1.0.0). Ift may be prudent to also install the openssl tool corresponding to the libssl version (sudo apt install openssl/jessie).

Step 5: Time to install RStudio Server!

First, install gDebi, a package installer, by typing sudo apt-get install gdebi. Next, we’ll be grabbing the latest RStudio version using wget, and installing it using gDebi. Make sure you either do this in your home directory or in /tmp/, ideally. Note that the versions of RStudio tend to change – 1.1.442 was the current version, released 12 March 2018, at the time of writing this post, but may by now have changed. You can check the current version number here.

wget https://download2.rstudio.org/rstudio-server-1.1.442-amd64.deb
sudo gdebi rstudio-server-1.1.442-amd64.deb

If all is well and you said yes to the dress question about whether you actually want to install RStudio Server (no, you’ve been going through this whole pain in the rear for excrement and jocularity, duh) , you should see something like this:

(Reading database ... 55651 files and directories currently installed.)
Preparing to unpack rstudio-server-1.1.442-amd64.deb ...
Unpacking rstudio-server (1.1.442) ...
Setting up rstudio-server (1.1.442) ...
groupadd: group 'rstudio-server' already exists
rsession: no process found
Created symlink /etc/systemd/system/multi-user.target.wants/rstudio-server.service → /etc/systemd/system/rstudio-server.service.
● rstudio-server.service - RStudio Server
   Loaded: loaded (/etc/systemd/system/rstudio-server.service; enabled; vendor preset: enabled)
   Active: active (running) since Sat 2018-03-31 21:59:25 UTC; 1s ago
  Process: 16478 ExecStart=/usr/lib/rstudio-server/bin/rserver (code=exited, status=0/SUCCESS)
 Main PID: 16479 (rserver)
    Tasks: 3 (limit: 4915)
   CGroup: /system.slice/rstudio-server.service
           └─16479 /usr/lib/rstudio-server/bin/rserver

Mar 31 21:59:25 localhost systemd[1]: Starting RStudio Server...
Mar 31 21:59:25 localhost systemd[1]: Started RStudio Server.

Now, if all goes well, you can navigate to your server’s IP at port 8787, and you should behold something akin to this:

The login interface. Technically, you could simply log in with your root password and root as the username. But just don’t yet.

There are a few more things you wish to install at this point – these are libraries that will help with SSL and other functionality from within R. Head back to the terminal and install the following:

sudo apt-get install libssl-dev libcurl4-openssl-dev libssh2-1-dev

Step 6: Some finishing touches

The login screen will draw on PAM, Unix’s own authentication module, in lieu of a user manager. As such, to create access, you will have to create a new Unix user with adduser, and assign a password to it. This will grant it a directory of its own, and you the ability to fine-tune what they should, and what they shouldn’t, have access to. Win-win! This will allow you to share a single installation among a range of people.

Step 7: To reverse proxy, or to not reverse proxy?

There are diverging opinions as to whether a reverse proxy such as NGINX carries any benefit. In my understanding, they do not, and there’s a not entirely unpleasant degree of security by obscurity in having a hard to guess port (which, by the way, you can change). It also makes uploads, on which you will probably rely quite a bit, more difficult. On the whole, there are more arguments against than in favour of a reverse proxy, but I may add specific guidance on reverse proxying here if there’s interest.

References   [ + ]

1. Using this link gives me a referral bonus of $20 as long as you remain a customer for 90 days. If you do not wish to do so, please use this link. It costs the same either way, and I would be using Linode anyway as their service is superbly reliable.

SafeGram: visualising drug safety

Update: an RMarkdown notebook explaining the whole process is available here.

Visualising vaccine safety is hard. Doing so from passive (or, as we say it in Britain, ‘spontaneous’!) pharmacovigilance (PhV) sources is even harder. Unlike in active or trial pharmacovigilance, where you are essentially dividing the number of incidents by the person-time or the number of patients in the cohort overall, in passive PhV, only incidents are reported. This makes it quite difficult to figure out their prevalence overall, but fortunately, we have some metrics we can use to better understand the issues with a particular medication or vaccine. The proportional reporting ratio (PRR) is a metric that can operate entirely on spontaneous reporting, and reflect how frequent a particular symptom is for a particular treatment versus all other treatments.

Defining PRR

For convenience’s sake, I will use the subscript * operator to mean a row or column sum of a matrix, so that

N_{i,*} = \displaystyle \sum_{j=1}^{n} N_{i,j}

and

N_{*,j} = \displaystyle \sum_{i=1}^{m} N_{i,j}

and furthermore, I will use the exclusion operator * \neg to mean all entities except the right hand value. So e.g.

N_{i, * \neg k} = \displaystyle \sum_{j=1, j \neq k}^m N_{i,j}

Conventionally, the PRR is often defined to with reference to a 2×2 contingency table that cross-tabulates treatments (m axis) with adverse effects (n axis):

Adverse effect of interest
(i)
All other adverse effects
(\neg i)
TOTAL
Treatment of interest
(j)
a = D_{i,j} b = D_{i, * \neg j} a + b = D_{i, *} = \displaystyle \sum_{j = 1}^{n} D_{i, j}
All other treatments
(\neg j)
c = D_{* \neg i, j} d = D_{* \neg i, * \neg j} c + d = D_{* \neg i, *} = \displaystyle \sum_{k=1, k \neq i}^{m} \sum_{l = 1}^{n} D_{k, l}

 

With reference to the contingency table, the PRR is usually defined as

\frac{a / (a+b)}{c / (c+d)} = \frac{a}{a + b} \cdot \frac{c + d}{c}

However, let’s formally define it over any matrix D.

Definition 1. PRR. Let D be an m \times n matrix that represents the frequency with which each of the m adverse effects occur for each of the n drugs, so that D_{i,j} (i \in m, j \in n) represents the number of times the adverse effect j has occurred with the treatment i.

For convenience’s sake, let D_{*,j} denote \sum_{i=1}^{m} D_{i,j}, let D_{i,*} denote \sum_{j=1}^{n} D_{i,j}, and let D_{*,*} denote \sum_{i=1}^{m} \sum_{j=1}^{n} D_{i,j}. Furthermore, let D_{* \neg i, j} denote \sum_{k \neq i}^{m} D_{k,j} and D_{i, * \neg j} denote \sum_{k \neq j}^{n} D_{i, k}.

Then, PRR can be calculated for each combination D_{i,j} by the following formula:

PRR_{i,j} = \frac{D_{i,j} / D_{i,*}}{D_{* \neg i, j} / D_{* \neg i, *}} = \frac{D_{i,j}}{D_{i,*}} \cdot \frac{D_{*\neg i, *}}{D_{*\neg i, j}}

Expanding this, we get

PRR_{i,j} = \frac{D_{i,j}}{\displaystyle\sum_{q=1}^n D_{i,q}} \cdot \frac{\displaystyle\sum_{r=1, r\ne i}^{m} \displaystyle\sum_{s=1}^{n} D_{r,s}}{\displaystyle\sum_{t=1, t\ne i}^{m} D_{t,j}}

Which looks and sounds awfully convoluted until we start to think of it as a relatively simple query operation: calculate the sum of each row, then calculate the quotient of the ADR of interest associated with the treatment of interest divided by all uses of the treatment of interest on one hand and the ADR of interest associated with all other drugs (j \mid \neg i or c) divided by all ADRs associated with all treatments other than the treatment of interest. Easy peasy!

Beyond PRR

However, the PRR only tells part of the story. It does show whether a particular symptom is disproportionately often reported – but does it show whether that particular symptom is frequent at all? Evans (1998) suggested using a combination of an N-minimum, a PRR value and a chi-square value to identify a signal.1 In order to represent the overall safety profile of a drug, it’s important to show not only the PRR but also the overall incidence of each risk. The design of the SafeGram is to show exactly that, for every known occurred side effect. To show a better estimate, instead of plotting indiviual points (there are several hundreds, or even thousands, of different side effects), the kernel density is plotted.

This SafeGram shows four vaccines – meningococcal, oral and injectable polio and smallpox -, and their safety record based on VAERS data between 2006 and 2016.

The reason why SafeGrams are so intuitive is because they convey two important facts at once. First, the PRR cut-off (set to 3.00 in this case) conclusively excludes statistically insignificant increases of risk.2 Of course, anything above that is not necessarily dangerous or proof of a safety signal. Rather, it allows the clinician to reason about the side effect profile of the particular medication.

  • The meningococcal vaccine (left upper corner) had several side effects that occurred frequently (hence the tall, ‘flame-like’ appearance). However, these were largely side effects that were shared among other vaccines (hence the low PRR). This is the epitome of a safe vaccine, with few surprises likely.
  • The injectable polio vaccine (IPV) has a similar profile, although the wide disseminated ‘margin’ (blue) indicates that ht has a wider range of side effects compared to the meningococcal vaccine, even though virtually all of these were side effects shared among other vaccines to the same extent.
  • The oral polio vaccine (OPV, left bottom corner) shows a flattened pattern typical for vaccines that have a number of ‘peculiar’ side effects. While the disproportionately frequently reported instances are relatively infrequent, the ‘tail-like’ appearance of the OPV SafeGram is a cause for concern. The difference between meningococcal and IPV on one hand and OPV on the other is explained largely by the fact that OPV was a ‘live’ vaccine, and in small susceptible groups (hence the low numbers), they could provoke adverse effects.
  • The smallpox vaccine, another live vaccine, was known to have a range of adverse effects, with a significant part of the population (about 20%) having at least one contraindication. The large area covered indicates that there is a rather astonishing diversity of side effects, and many of these – about half of the orange kernel – lies above the significance boundary of 3.00. The large area covered by the kernel density estimate and the reach into the right upper corner indicates a very probable safety signal worth examining.

Interpretation

A SafeGram for each vaccine shows the two-dimensional density distribution of two things – the frequency and the proportional reporting rate of each vaccine (or drug or device or whatever it is applied to). When considering the safety of a particular product, the most important question is whether a particular adverse effect is serious – a product with a low chance of an irreversible severe side effect is riskier than one with a high probability of a relatively harmless side effect, such as localized soreness after injection. But the relative severity of a side effect is hard to quantify, and a better proxy for that is to assume that in general, most severe side effects will be unique to a particular vaccine. So for instance while injection site reactions and mild pyrexia following inoculation are common to all vaccines and hence the relative reporting rates are relatively low, reflecting roughly the number of inoculations administered, serious adverse effects tend to be more particular to the vaccine (e.g. the association of influenza vaccines with Guillain-Barré syndrome in certain years means that GBS has an elevated PRR, despite the low number of occurrences, for the flu vaccines). Discarding vaccines with a very low number of administered cases, the SafeGram remains robust to differences between the number of vaccines administered. Fig. 1 above shows a number of typical patterns. In general, anything to the left of the vertical significance line can be safely ignored, as they are generally effects shared between most other vaccines in general and exhibit no specific risk signal for the particular vaccine. On the other hand, occurrences to the right of the vertical significance line may – but don’t necessarily do – indicate a safety signal. Of particular concern are right upper quadrant signals – these are frequent and at the same time peculiar to a particular vaccine, suggesting that it is not part of the typical post-inoculation syndrome (fever, fatigue, malaise) arising from immune activation but rather a specific issue created by the antigen or the adjuvant. In rare cases, there is a lower right corner ‘stripe’, such as for the OPV, where a wide range of unique but relatively infrequent effects are produced. These, too, might indicate the need for closer scrutiny. It is crucial to note that merely having a density of signals in the statistically significant range does not automatically mean that there is a PhV concern, but rather that such a concern cannot be excluded. Setting the PRR significance limit is somewhat arbitrary, but Evans et al. (2001) have found a PRR of 2, more than 3 cases over a two year period and a chi-square statistic of 4 or above to be suggestive of a safety signal. Following this lead, the original SafeGram code looks at a PRR of 3.0 and above, and disregards cases with an overall frequency of 3Y, where Y denotes the number of years considered.

Limitations

The SafeGram inherently tries to make the best out of imperfect data. Acknowledging that passive reporting data is subject to imperfections, some caveats need to be kept in mind.

  • The algorithm assigns equal weight to every ‘symptom’ reported. VAERS uses an unfiltered version of MedDRA, a coding system for regulatory activities, and this includes a shocking array of codes that do not suggest any pathology. For instance, the VAERS implementation of MedDRA contains 530 codes for normal non-pathological states (e.g. “abdomen scan normal”), and almost 18,000 (!) events involve at least one of these ‘everything is fine!’ markers. This may be clinically useful because they may assist in differential diagnosis and excluding other causes of symptoms, but since they’re not treated separately from actually pathological symptoms, they corrupt the data to a minor but not insignificant extent. The only solution is manual filtering, and with tens of thousands of MedDRA codes, one would not necessarily be inclined to do so. The consequence is that some symptoms aren’t symptoms at all – they’re the exact opposite. This is not a problem for the PRR because it compares a symptom among those taking a particular medication against the same symptom among those who are not.
  • A lot of VAERS reports are, of course, low quality reports, and there is no way for the SafeGram to differentiate. This is a persistent problem with all passive reporting systems.
  • The SafeGram gives an overall picture of a particular drug’s or vaccine’s safety. It does not differentiate between the relative severity of a particular symptom.
  • As usual, correlation does not equal causation. As such, none of this proves the actual risk or danger of a vaccine, but rather the correlation or, in other words, potential safety signals that are worth examining.
Grouped by pathogen, the safety of a range of vaccines was examined by estimating the density of adverse event occurrence versus adverse event PRR. Note that adverse events reported in VAERS do not show or prove causation, only correlation. This shows that for the overwhelming majority of vaccines, most AEFI reports are below the PRR required to be considered a true safety signal.

SafeGrams are a great way to show the safety of vaccines, and to identify which vaccines have frequently occurring and significantly distinct (high-PRR) AEFIs that may be potential signals. It is important to note that for most common vaccines, including controversial ones like HPV, the centre of the density kernel estimate are below the margin of the PRR signal limit. The SafeGram is a useful and visually appealing proof of the safety of vaccines that can get actionable intelligence out of VAERS passive reporting evidence that is often disregarded as useless.

References   [ + ]

1. Evans, S. J. W. et al. (1998). Proportional reporting ratios: the uses of epidemiological methods for signal generation. Pharmacoepidemiol Drug Saf, 7(Suppl 2), 102.
2. According to Evans et al., the correct figure for PRR exclusion is 2.00, but they also use N-restriction and a minimum chi-square of 4.0.