Structuring R projects

There are some things that I call Smith goods:1 things I want, nay, require, but hate doing. A clean room is one of these – I have a visceral need to have some semblance of tidiness around me, I just absolutely hate tidying, especially in the summer.2 Starting and structuring packages and projects is another of these things, which is why I’m so happy things like cookiecutter exist that do it for you in Python.

While I don’t like structuring R projects, I keep doing it, because I know it matters. That’s a pearl of wisdom that came occasionally at a great price.
I am famously laid back about structuring R projects – my chill attitude is only occasionally compared to the Holy Inquisition, the other Holy Inquisition and Gunny R. Lee Ermey’s portrayal of Drill Sgt. Hartman, and it’s been months since I last gutted an intern for messing up namespaces.3 So while I don’t like structuring R projects, I keep doing it, because I know it matters. That’s a pearl of wisdom that came occasionally at a great price, some of which I am hoping to save you by this post.

Five principles of structuring R projects

Every R project is different. Therefore, when structuring R projects, there has to be a lot more adaptability than there is normally When structuring R projects, I try to follow five overarching principles.

  1. The project determines the structure. In a small exploratory data analysis (EDA) project, you might have some leeway as to structural features that you might not have when writing safety-critical or autonomously running code. This variability in R – reflective of the diversity of its use – means that it’s hard to devise a boilerplate that’s universally applicable to all kinds of projects.
  2. Structure is a means to an end, not an end in itself. The reason why gutting interns, scalping them or yelling at them Gunny style are inadvisable is not just the additional paperwork it creates for HR. Rather, the point of the whole exercise is to create people who understand why the rules exists and organically adopt them, understanding how they help.
  3. Rules are good, tools are better. When tools are provided that take the burden of adherence – linters, structure generators like cookiecutter, IDE plugins, &c. – off the developer, adherence is both more likely and simpler.
  4. Structures should be interpretable to a wide range of collaborators. Even if you have no collaborators, thinking from the perspective of an analyst, a data scientist, a modeller, a data engineer and, most importantly, the client who will at the very end receive the overall product.
  5. Structures should be capable of evolution. Your project may change objectives, it may evolve, it may change. What was a pet project might become a client product. What was designed to be a massive, error-resilient superstructure might have to scale down. And most importantly, your single-player adventure may end up turning into an MMORPG. Your structure has to be able to roll with the punches.

A good starting structure

Pretty much every R project can be imagined as a sort of process: data gets ingested, magic happens, then the results – analyses, processed data, and so on – get spit out. The absolute minimum structure reflects this:

.
└── my_awesome_project
    ├── src
    ├── output
    ├── data
    │   ├── raw
    │   └── processed
    ├── README.md
    ├── run_analyses.R 
    └── .gitignore

In this structure, we see this reflected by having a data/ folder (a source), a folder for the code that performs the operations (src/) and a place to put the results (output/). The root analysis file (the sole R file on the top level) is responsible for launching and orchestrating the functions defined in the src/ folder’s contents.

The data folder

The data folder is, unsurprisingly, where your data goes. In many cases, you may not have any file-formatted raw data (e.g. where the raw data is accessed via a *DBC connection to a database), and you might even keep all intermediate files there, although that’s pretty uncommon on the whole, and might not make you the local DBA’s favourite (not to mention data protection issues). So while the raw/ subfolder might be dispensed with, you’ll most definitely need a data/ folder.

When it comes to data, it is crucial to make a distinction between source data and generated data. Rich Fitzjohn puts it best when he says to treat

  • source data as read-only, and
  • generated data as disposable.

The preferred implementation I have adopted is to have

  • a data/raw/ folder, which is usually is symlinked to a folder that is write-only to clients but read-only to the R user,4,
  • a data/temp/ folder, which contains temp data, and
  • a data/output/ folder, if warranted.

The src folder

Some call this folder R– I find this a misleading practice, as you might have C++, bash and other non-R code in it, but is unfortunately enforced by R if you want to structure your project as a valid R package, which I advocate in some cases. I am a fan of structuring the src/ folder, usually by their logical function. There are two systems of nomenclature that have worked really well for me and people I work with:

  • The library model: in this case, the root folder of src/ holds individual .R scripts that when executed will carry out an analysis. There may be one or more such scripts, e.g. for different analyses or different depths of insight. Subfolders of src/ are named after the kind of scripts they contain, e.g. ETL, transformation, plotting. The risk with this structure is that sometimes it’s tricky to remember what’s where, so descriptive file names are particularly important.
  • The pipeline model: in this case, there is a main runner script or potentially a small number. These go through scripts in a sequence. It is a sensible idea in such a case to establish sequential subfolders or sequentially numbered scripts that are executed in sequence. Typically, this model performs better if there are at most a handful distinct pipelines.

Whichever approach you adopt, a crucial point is to keep function definition and application separate. This means that only the pipeline or the runner scripts are allowed to execute (apply) functions, and other files are merely supposed to define them. Typically, folder level segregation works best for this:

  • keep all function definitions in subfolders of src/, e.g. src/data_engineering, and have the directly-executable scripts directly under src/ (this works better for larger projects), or
  • keep function definitions in src/, and keep the directly executable scripts in the root folder (this is more convenient for smaller projects, where perhaps the entire data engineering part is not much more than a single script).

output and other output folders

Output may mean a range of things, depending on the nature of your project. It can be anything from a whole D.Phil thesis written in a LaTeX-compliant form to a brief report to a client. There are a couple of conventions with regard to output folders that are useful to keep in mind.

Separating plot output

My personal preference is that plot output folders should be subfolders of output/, rather than top-tier folders, unless the plots themselves are the objective.
It is common to have a separate folder for plots (usually called figs/ or plots/), usually so that they could be used for various purposes. My personal preference is that plot output folders should be subfolders of output folders, rather than top-tier folders, unless they are the very output of the project. That is the case, for instance, where the project is intended to create a particular plot on a regular basis. This was the case, for instance, with the CBRD project whose purpose was to regularly generate daily epicurves for the DRC Zaire ebolavirus outbreak.

With regard to maps, in general, the principle that has worked best for teams I ran was to treat static maps as plots. However, dynamic maps (e.g. LeafletJS apps), tilesets, layers or generated files (e.g. GeoJSON files) tend to deserve their own folder.

Reports and reporting

For business users, automatically getting a beautiful PDF report can be priceless.
Not every project needs a reporting folder, but for business users, having a nice, pre-written reporting script that can be run automatically and produces a beautiful PDF report every day can be priceless. A large organisation I worked for in the past used this very well to monitor their Amazon AWS expenditure.5 A team of over fifty data scientists worked on a range of EC2 instances, and runaway spending from provisioning instances that were too big, leaving instances on and data transfer charges resulting from misconfigured instances6 was rampant. So the client wanted daily, weekly, monthly and 10-day rolling usage nicely plotted in a report, by user, highlighting people who would go on the naughty list. This was very well accomplished by an RMarkdown template that was ‘knit‘ every day at 0600 and uploaded as an HTML file onto an internal server, so that every user could see who’s been naughty and who’s been nice. EC2 usage costs have gone down by almost 30% in a few weeks, and that was without having to dismember anyone!7

Probably the only structural rule to keep in mind is to keep reports and reporting code separate. Reports are client products, reporting code is a work product and therefore should reside in src/.

Requirements and general settings

I am, in general, not a huge fan of outright loading whole packages to begin with. Too many users of R don’t realise that

  • you do not need to attach (library(package)) a package in order to use a function from it – as long as the package is available to R, you can simply call the function as package::function(arg1, arg2, ...), and
  • importing a package using library(package) puts every single function from that package into the namespace, overwriting by default all previous entries. This means that in order to deterministically know what any given symbol means, you would have to know, at all times, the order of package imports. Needless to say, there is enough stuff to keep in one’s mind when coding in R to worry about this stuff.

However, some packages might be useful to import, and sometimes it’s useful to have an initialisation script. This may be the case in three particular scenarios:

  • You need a particular locale setting, or a particularly crucial environment setting.
  • It’s your own package and you know you’re not going to shadow already existing symbols.
  • You are not using packrat or some other package management solution, and definitely need to ensure some packages are installed, but prefer not to put the clunky install-if-not-present code in every single thing.

In these cases, it’s sensible to have a file you would source before every top-level script – in an act of shameless thievery from Python, I tend to call this requirements.R, and it includes some fundamental settings I like to rely on, such as setting the locale appropriately. It also includes a CRAN install check script, although I would very much advise the use of Packrat over it, since it’s not version-sensitive.

Themes, house style and other settings

It is common, in addition to all this, to keep some general settings. If your institution has a ‘house style’ for ggplot2 (as, for instance, a ggthemr file), for instance, this could be part of your project’s config. But where does this best go?

I’m a big fan of keeping house styles in separate repos, as this ensures consistency across the board.
It would normally be perfectly fine to keep your settings in a config.R file at root level, but a config/ folder is much preferred as it prevents clutter if you derive any of your configurations from a git submodule. I’m a big fan of keeping house styles and other things intended to give a shared appearance to code and outputs (e.g. linting rules, text editor settings, map themes) in separate – and very, very well managed! – repos, as this ensures consistency across the board over time. As a result, most of my projects do have a config folder instead of a single configuration file.

It is paramount to separate project configuration and runtime configuration:

  • Project configuration pertains to the project itself, its outputs, schemes, the whole nine yards. For instance, the paper size to use for generated LaTeX documents would normally be a project configuration item. Your project configuration belongs in your config/ folder.
  • Runtime configuration pertains to parameters that relate to individual runs. In general, you should aspire to have as few of these, if any, as possible – and if you do, you should keep them as environment variables. But if you do decide to keep them as a file, it’s generally a good idea to keep them at the top level, and store them not as R files but as e.g. JSON files. There are a range of tools that can programmatically edit and change these file formats, while changing R files programmatically is fraught with difficulties.

Keeping runtime configuration editable

A few years ago, I worked on a viral forecasting tool where a range of model parameters to build the forecast from were hardcoded as R variables in a runtime configuration file. It was eventually decided to create a Python-based web interface on top of it, which would allow users to see the results as a dashboard (reading from a database where forecast results would be written) and make adjustments to some of the model parameters. The problem was, that’s really not easy to do with variables in an R file.

On the other hand, Python can easily read a JSON file into memory, change values as requested and export them onto the file system. So instead of that, the web interface would store the parameters in a JSON file, from which R would then read them and execute accordingly. Worked like a charm. Bottom line – configurations are data, and using code to store data is bad form.

Dirty little secrets

Everybody has secrets. In all likelihood, your project is no different: passwords, API keys, database credentials, the works. The first rule of this, of course, is never hardcode credentials in code. But you will need to work out how to make your project work, including via version control, while also not divulging credentials to the world at large. My preferred solutions, in order of preference, are:

  1. the keyring package, which interacts with OS X’s keychain, Windows’s Credential Store and the Secret Service API on Linux (where supported),
  2. using environment variables,
  3. using a secrets file that is .gitignored,
  4. using a config file that’s .gitignored,
  5. prompting the user.

Let’s take these – except the last one, which you should consider only as a measure of desperation, as it relies on RStudio and your code should aspire to run without it – in turn.

Using keyring

keyring is an R package that interfaces with the operating system’s keychain management solution, and works without any additional software on OS X and Windows.8 Using keyring is delightfully simple: it conceives of an individual key as belonging to a keyring and identified by a service. By reference to the service, it can then be retrieved easily once the user has authenticated to access the keychain. It has two drawbacks to be aware of:

  • It’s an interactive solution (it has to get access permission for the keychain), so if what you’re after is R code that runs quietly without any intervention, this is not your best bet.
  • A key can only contain a username and a password, so it cannot store more complex credentials, such as 4-ple secrets (e.g. in OAuth, where you may have a consumer and a publisher key and secret each). In that case, you could split them into separate keyring keys.

However, for most interactive purposes, keyring works fine. This includes single-item secrets, e.g. API keys, where you can use some junk as your username and hold only on to the password.

For most interactive purposes, keyring works fine. This includes single-item secrets, e.g. API keys.
By default, the operating system’s ‘main’ keyring is used, but you’re welcome to create a new one for your project. Note that users may be prompted for a keychain password at call time, and it’s helpful if they know what’s going on, so be sure you document your keyring calls well.

To set a key, simply call keyring::key_set(service = "my_awesome_service", username = "my_awesome_user). This will launch a dialogue using the host OS’s keychain handler to request authentication to access the relevant keychain (in this case, the system keychain, as no keychain is specified), and you can then retrieve

  • the username: using keyring::key_list("my_awesome_service")[1,2], and
  • the password: using keyring::key_get("my_awesome_service").

Using environment variables

The thing to remember about environment variables is that they’re ‘relatively private’: everyone in the user session will be able to read them.
Using environment variables to hold certain secrets has become extremely popular especially for Dockerised implementations of R code, as envvars can be very easily set using Docker. The thing to remember about environment variables is that they’re ‘relatively private’: they’re not part of the codebase, so they will definitely not accidentally get committed to the VCS, but everyone who has access to the particular user session  will be able to read them. This may be an issue when e.g. multiple people are sharing the ec2-user account on an EC2 instance. The other drawback of envvars is that if there’s a large number of them, setting them can be a pain. R has a little workaround for that: if you create an envfile called .Renviron in the working directory, it will store values in the environment. So for instance the following .Renviron file will bind an API key and a username:

api_username = "my_awesome_user"
api_key = "e19bb9e938e85e49037518a102860147"

So when you then call Sys.getenv("api_username"), you get the correct result. It’s worth keeping in mind that the .Renviron file is sourced once, and once only: at the start of the R session. Thus, obviously, changes made after that will not propagate into the session until it ends and a new session is started. It’s also rather clumsy to edit, although most APIs used to ini files will, with the occasional grumble, digest .Renvirons.

Needless to say, committing the .Renviron file to the VCS is what is sometimes referred to as making a chocolate fireman in the business, and is generally a bad idea.

Using a .gitignored config or secrets file

config is a package that allows you to keep a range of configuration settings outside your code, in a YAML file, then retrieve them. For instance, you can create a default configuration for an API:

default:
    my_awesome_api:
        url: 'https://awesome_api.internal'
        username: 'my_test_user'
        api_key: 'e19bb9e938e85e49037518a102860147'

From R, you could then access this using the config::get() function:

my_awesome_api_configuration <- config::get("my_awesome_api")

This would then allow you to e.g. refer to the URL as my_awesome_api_configuration$url, and the API key as my_awesome_api_configuration$api_key. As long as the configuration YAML file is kept out of the VCS, all is well. The problem is that not everything in such a configuration file is supposed to be secret. For instance, it makes sense for a database access credentials to have the other credentials DBI::dbConnect() needs for a connection available to other users, but keep the password private. So .gitignoreing a config file is not a good idea.

A dedicated secrets file is a better place for credentials than a config file, as this file can then be wholesale .gitignored.
A somewhat better idea is a secrets file. This file can be safely .gitignored, because it definitely only contains secrets. As previously noted, definitely create it using a format that can be widely written (JSON, YAML).9 For reasons noted in the next subsection, the thing you should definitely not do is creating a secrets file that consists of R variable assignments, however convenient an idea that may appear at first. Because…

Whatever you do…

One of the best ways to mess up is creating a fabulous way of keeping your secret credentials truly secret… then loading them into the global scope. Never, ever assign credentials. Ever.

You might have seen code like this:

dbuser <- Sys.getenv("dbuser")
dbpass <- Sys.getenv("dbpass")

conn <- DBI::dbConnect(odbc::odbc(), UID = dbuser, PWD = dbpass)
Never, ever put credentials into any environment if possible – especially not into the global scope.
This will work perfectly, except once its done, it will leave the password and the user name, in unencrypted plaintext (!), in the global scope, accessible to any code. That’s not just extremely embarrassing if, say, your wife of ten years discovers that your database password is your World of Warcraft character’s first name, but also a potential security risk. Never put credentials into any environment if possible, and if it has to happen, at least make it happen within a function so that they don’t end up in the global scope. The correct way to do the above would be more akin to this:

create_db_connection <- function() {
    DBI::dbConnect(odbc::odbc(), UID = Sys.getenv("dbuser"), PWD = Sys.getenv("dbpass")) %>% return()
}

Concluding remarks

Structuring R projects is an art, not just a science. Many best practices are highly domain-specific, and learning these generally happens by trial and pratfall error. In many ways, it’s the bellwether of an R developer’s skill trajectory, because it shows whether they possess the tenacity and endurance it takes to do meticulous, fine and often rather boring work in pursuance of future success – or at the very least, an easier time debugging things in the future. Studies show that one of the greatest predictors of success in life is being able to tolerate deferred gratification, and structuring R projects is a pure exercise in that discipline.

Structuring R projects is an art, not just a science. Many best practices are highly domain-specific, and learning these generally happens by trial and error.
At the same time, a well-executed structure can save valuable developer time, prevent errors and allow data scientists to focus on the data rather than debugging and trying to find where that damn snippet of code is or scratching their head trying to figure out what a particularly obscurely named function does. What might feel like an utter waste of time has enormous potential to create value, both for the individual, the team and the organisation.

As long as you keep in mind why structure matters and what its ultimate aims are, you will arrive at a form of order out of chaos that will be productive, collaborative and useful.
I’m sure there are many aspects of structuring R projects that I have omitted or ignored – in many ways, it is my own experiences that inform and motivate these commentaries on R. Some of these observations are echoed by many authors, others diverge greatly from what’s commonly held wisdom. As with all concepts in development, I encourage you to read widely, get to know as many different ideas about structuring R projects as possible, and synthesise your own style. As long as you keep in mind why structure matters and what its ultimate aims are, you will arrive at a form of order out of chaos that will be productive, collaborative and mutually useful not just for your own development but others’ work as well.

My last commentary on defensive programming in R has spawned a vivid and exciting debate on Reddit, and many have made extremely insightful comments there. I’m deeply grateful for all who have contributed there. I hope you will also consider posting your observations in the comment section below. That way, comments will remain together with the original content.

References   [ + ]

1.As in, Adam Smith.
2.It took me years to figure out why. It turns out that I have ZF alpha-1 antitrypsin deficiency. As a consequence, even minimal exposure to small particulates and dust can set off violent coughing attacks and impair breathing for days. Symptoms tend to be worse in hot weather due to impaired connective tissue something-or-other.
3.That’s a joke. I don’t gut interns – they’re valuable resources, HR shuns dismembering your coworkers, it creates paperwork and I liked every intern I’ve ever worked with – but most importantly, once gutted like a fish, they are not going to learn anything new. I prefer gentle, structured discussions on the benefits of good package structure. Please respect your interns – they are the next generation, and you are probably one of their first example of what software development/data science leadership looks like. The waves you set into motion will ripple through generations, well after you’re gone. You better set a good example.
4.Such a folder is often referred to as a ‘dropbox’, and the typical corresponding octal setting, 0422, guarantees that the R user will not accidentally overwrite data.
5.The organisation consented to me telling this story but requested anonymity, a request I honour whenever legally possible.
6.In case you’re unfamiliar with AWS: it’s a cloud service where elastic computing instances (EC2 instances) reside in ‘regions’, e.g. us-west-1a. There are (small but nonzero) charges for data transfer between regions. If you’re in one region but you configure the yum repo server of another region as your default, there will be costs, and, eventually, tears – provision ten instances with a few GBs worth of downloads, and there’ll be yelling. This is now more or less impossible to do except on purpose, but one must never underestimate what users are capable of from time to time!
7.Or so I’m told.
8.Linux users will need libsecret 0.16 or above, and sodium.
9.XML is acceptable if you’re threatened with waterboarding.

Bayesian reasoning in clinical diagnostics: a primer.

We know, from the source of eternal wisdom that is Saturday Morning Breakfast Cereal, that insufficient math education is the basis of the entire Western economy.1 This makes Bayesian logic and reasoning about probabilities almost like a dark art, a well-kept secret that only a few seem to know (and it shouldn’t be… but that’s a different story). This weird-wonderful argument, reflecting a much-reiterated meme about vaccines and vaccine efficacy, is a good example:

The argument, here, in case you are not familiar with the latest in anti-vaccination fallacies, is that vaccines don’t work, and they have not reduced the incidence of vaccine-preventable diseases. Rather, if a person is vaccinated for, say, measles, then despite displaying clinical signs of measles, he will be considered to have a different disease, and therefore all disease statistics proving the efficacy of vaccines are wrong. Now, that’s clearly nonsense, but it highlights one interesting point, one that has a massive bearing on computational systems drawing conclusions from evidence: namely, the internal Bayesian logic of the diagnostic process.

Which, incidentally, is the most important thing that they didn’t teach you in school. Bayesian logic, that is. Shockingly, they don’t even teach much of it in medical school unless you do research, and even there it’s seen as a predictive method, not a tool to make sense of analytical process. Which is a pity. The reason why idiotic arguments like the above by @Cattlechildren proliferate is that physicians have been taught how to diagnose well, but never how to explain and reason about the diagnostic process. This was true for the generations before me, and is more or less true for those still in med school today. What is often covered up with nebulous concepts like ‘clinical experience’ is in fact solid Bayesian reasoning. Knowing the mathematical fundamentals of the thought process you are using day to day, and which help you make the right decisions every day in the clinic, helps you reason about it, find weak points, answer challenges and respond to them. For this reason, my highest hope is that as many MDs, epidemiologists, med students, RNs, NPs and other clinical decision-makers will engage with this topic, even if it’s a little long. I promise, it’s worth it.

Some basic ideas about probability

In probability, an event, usually denoted with a capital and customarily starting at A (I have no idea why, as it makes things only more confusing!), is any outcome or incidence that we’re interested in – as long as they’re binary, that is, they either happen or don’t happen, and discrete, that is, there’s a clear definition for it, so that we can decide if it’s happened or not – no half-way events for now.2 In other words, an event can’t happen and not happen at the same time. Or, to get used to the notation of conditionality, p(A \mid \neg A) = 0.3 A thing cannot be both true and false.

Now, we may be interested in how likely it is for an event to happen if another event happens: how likely is A if B holds true? This is denoted as p(A|B), and for now, the most important thing to keep in mind about it is that it is not necessarily the same as p(B|A)!4

Bayesian logic deals with the notion of conditional probabilities – in other words, the probability of one event, given another.5 It is one of the most widely misunderstood part of probability, yet it is crucial to understand to our own idea of the way we reason about things.

Just to understand how important this is, let us consider a classic example.


Case study 1: speed cameras

Your local authority is broke. And so, it does what local authorities do when they’re broke: play poker with the borough credit card set up a bunch of speed cameras and fine drivers. Over this particular stretch of road, the speed limit is 60mph.

According to the manufacturer, the speed cameras are very sensitive, but not very specific. In other words, they never falsely indicate that a driver was below the speed limit, but they may falsely indicate that the driver was above it, in about 3% of the cases (the false positive rate).

One morning, you’re greeted by a message in your postbox, notifying you that you’ve driven too fast and fining you a rather respectable amount of cash. What is the probability that you have indeed driven too fast?

You may feel inclined to blurt out 97%. That, in fact, is wrong.


Explanation

It’s rather counter-intuitive at first to understand why, until we consider the problem in formal terms. We know the probability p(A|\not B), that is, the probability of being snapped (A) even though you were not speeding (\not B). But what the question asks is what the likelihood that you were, in fact, speeding (B) given the fact that you were snapped (A). And as we have learned, the conditional probability operator is not commutative, that is, p(A|B) is not necessarily the same as p(B|A).

Why is that the case? Because base rates matter. In other words, the probabilities of A and B, in and of themselves, are material. Consider, for a moment, the unlikely scenario of living in that mythical wonderland of law-abiding citizens where nobody speeds. Then, it does not matter how many drivers are snapped – all of them are false positives, and thus p(B|A), the probability of speeding (B) given that one got snapped by a speed camera (A), is actually zero.

In other words, if we want to reverse the conditional operator, we need to make allowances for the ‘base frequency’, the ordinary frequency with which each event occurs on its own. To overcome base frequency neglect,6 we have a mathematical tool, courtesy of the good Revd. Thomas Bayes, who sayeth that, verily,

$latex p(B \mid A) = \frac{p(A \mid B) p(B)}{p(A)}

Or, in words: if you want to reverse the probabilities, you will have to take the base rates of each event into account. If what we know is the likelihood that you were not speeding if you were snapped and what we’re interested in is the likelihood that someone getting snapped is indeed speeding, we’ll need to know a few more things.


Case study 1: Speed cameras – continued

  • We know that the speed cameras have a Type II (false negative) error rate of zero – in other words, if you are speeding (B), you are guaranteed to get snapped (A) – thus, $p(A \mid B)$ is 1.
  • We also know from the Highway Authority, who were using a different and more accurate measurement system, that approximately one in 1,000 drivers is speeding (p(B) = 0.001).
  • Finally, we know that of 1,000 drivers, 31 will be snapped – the one speeder and 3% accounting for the false positive rate –, yielding p(A) = 0.031.

Putting that into our equation,

p(B|A) = \frac{p(A \mid B) p(B)}{p(A)} = \frac{1 \cdot 0.001}{0.031} = 0.032

In other words, the likelihood that we indeed did exceed the speed limit is just barely north of 3%. That’s a far cry from the ‘intuitive’ answer of 97% (quite accidentally, it’s almost the inverse).


Diagnostics, probabilities and Bayesian logic

The procedure of medical diagnostics is ultimately a relatively simple algorithm:

  1. create a list of possibilities, however remote (the process of differential diagnostics),
  2. order them in order of likelihood,
  3. update priors as you run tests.7

From a statistical perspective, this is implemented as follows.

  1. We begin by running a number of tests, specifically m of them. It is assumed that the tests are independent from each other, i.e. the value of one does not affect the value of another. Let R_j denote the results of test $j \leq m$.
    1. For each test, we need to iterate over all our differentials D_{i \ldots n}, and determine the probability of each in light of the new evidence, i.e. $latex p(D_i \mid R_j).
    2. So, let’s take the results of test j that yielded the results R_j, and the putative diagnosis D_i. What we’re interested in is p(D_i \mid R_j), that is, the probability of the putative diagnosis given the new evidence. Or, to use Bayesian lingo, we are updating our prior: we had a previous probability assigned to D_i, which may have been a uniform probability or some other probability, and we are now updating it – seeing how likely it is given the new evidence, getting what is referred to as a posterior.8
    3. To calculate the posterior P(D_i | R_j), we need to know three things – the sensitivity and specificity of the test j (I’ll call these S^+_j and S^-_j, respectively), the overall incidence of D_i,9 and the overall incidence of the particular result R_j.
    4. Plugging these variables into our beloved Bayesian formula, we get p(D_i \mid R_j) = \frac{p(R_j \mid D_i) p(D_i)}{p(R_j)}.
    5. We know that p(R_j \mid D_i), that is, the probability that someone will test a particular way if they do have the condition D_i, is connected to sensitivity and specificity: if R_j is supposed to be positive if the patient has D_i, then p(R_j \mid D_i) = S^-_j (sensitivity), whereas if the test is supposed to be negative if the patient has D_i, then p(R_j \mid D_i) = S^+_j (specificity).
    6. We also know, or are supposed to know, the overall incidence of D_i and the probability of a particular outcome, R_j. With that, we can update our prior for D_i \mid R_j.
  2. We iterate over each of the tests, updating the priors every time new evidence comes in.

This may sound daunting and highly mathematical, but in fact most physicians have this down to an innate skill, so much so that when I explained this to a group of FY2 doctors, they couldn’t believe it – until they thought about how they thought. And that’s a key issue here: thinking about the way we arrive at results is important, because they are the bedrock of what we need to make those results intelligible to others.


Case study 2: ATA testing for coeliac disease

For a worked example of this in the diagnosis of coeliac disease, check Notebook 1: ATA case study. It puts things in the context of sensitivity and specificity in medical testing, and is in many ways quite similar to the above example, except here, we’re working with a real-world test with real-world uncertainties.

There are several ways of testing for coeliac disease, a metabolic disorder in which the body responds to gluten proteins (gliadins and glutenins) in wheats, wheat hybrids, barley, oats and rye. One diagnostic approach looks at genetic markers in the HLA-DQ (Human Leukocyte Antigen type DQ), part of the MHC (Major Histocompatibility Complex) Class II receptor system. Genetic testing for a particular haplotype of the HLA-DQ2 gene, called DQ2.5, can lead to a diagnosis in most patients. Unfortunately, it’s slow and expensive. Another test, a colonoscopic biopsy of the intestines, looks at the intestinal villi, short protrusions (about 1mm long) into the intestine, for tell-tale damage – but this test is unpleasant, possibly painful and costly.

So, a more frequent way is by looking for evidence of an autoantibody called anti-tissue transglutaminase antibody (ATA) – unrelated to this gene, sadly. ATA testing is cheap and cheerful, and relatively good, with a sensitivity (S^+_{ATA}) of 85% and specificity (S^+_{ATA}) of 97%.10 We also know the rough probability of a sample being from someone who actually has coeliac disease – for a referral lab, it’s about 1%.

Let’s consider the following case study. A patient gets tested for coeliac disease using the ATA test described above. Depending on whether the test is positive or negative, what are the chances she has coeliac disease?

Sensitivity and specificity trade-off for an ATA test given various values of true coeliac disease prevalence in the population.

If you’ve read the notebook, you know by now that the probability of having coeliac disease if testing positive is around 22%, or a little better than one-fifth. And from the visualisation to the left, you could see that small incremental improvements in specificity would yield a lot more increase in accuracy (marginal accuracy gain) than increases in sensitivity.

While quite simple, this is a good case study because it emphasises a few essential things about Bayesian reasoning:

  • Always know your baselines. In this case, we took a baseline of 1%, even though the average incidence of coeliac disease in the population is closer to about 0.25% of that. Why? Because we don’t spot-test people for coeliac disease. People who do get tested get tested because they exhibit symptoms that may or may not be coeliac disease, and by definition they have a higher prevalence11 of coeliac disease. The factor is, of course, entirely imaginary – you would, normally, need to know or have a way to figure out the true baseline values.
  • Use independent baselines. It is absolutely crucial to make sure that you do not get the baselines from your own measurement process. In this case, for instance, the incidence of coeliac disease should not be calculated by reference to your own lab’s number of positive tests divided by total tests. This merely allows for further proliferation of false positives and negatives, however minuscule their effect. A good way is to do follow-up studies, checking how many of the patients tested positive or negative for ATA were further tested using other methodologies, many of which may be more reliable, and calculate the proportion of actual cases coming through your door by reference to that.

Case study 3: Vaccines in differential diagnosis

This case is slightly different, as we are going to compare two different scenarios. Both concern D_{VPD}, a somewhat contrived vaccine-preventable illness. D_{VPD} produces a very particular symptom or symptom set, S, and produces this symptom or symptom set in every case, without fail.12 The question is – how does the vaccination status affect the differential diagnosis of two identical patients,13 presenting with the same symptoms S, one of whom is unvaccinated?

No. That’s not how this works. That’s not how ANY of this works. Nnnnope.

It has been a regrettably enduring trope of the anti-vaccination movement that because doctors believe vaccines work, they will not diagnose a patient with a vaccine-preventable disease (VPD), simply striking it off the differential diagnosis or substitute a different diagnosis for it.14 The reality is explored in this notebook, which compares two scenarios, of the same condition, with two persons with the sole difference of vaccination status. That difference makes a massive – about 7,800x – difference between the likelihood of the vaccinated and the unvaccinated person having the disease. The result is that a 7,800 times less likely outcome slides down the differential. As NZ paediatrician Dr Greenhouse (@greenhousemd) noted in the tweet, “it’s good medical care”. In the words of British economist John Maynard Keynes,15 “when the facts change, I change my mind”. And so do diagnosticians.

Quite absolutely simply put: it’s not an exclusion or fudging data or in any sensible way proof that “no vaccine in history has ever worked”. It’s quite simply a reflection of the reality that if in a population a condition is almost 8,000 times less likely, then, yes, other more frequent conditions push ahead.


Lessons learned

Bayesian analysis of the diagnostic procedure allows not only increased clarity about what one is doing as a clinician. Rather, it allows the full panoply of tools available to mathematical and logical reasoning to investigate claims, objections and contentions – and like in the case of the alleged non-diagnosis of vaccines, discard them.

The most powerful tool anyone who utilises any process of structured clinical reasoning – be it clinical reasoning in diagnostics, algorithmic analysis, detective work or intelligence analysis – is to be able to formally reason about one’s own toolkit of structured processes. It is my hope that if you’ve never thought about your clinical diagnostic process in these terms, you will now be able to see a new facet of it.

References   [ + ]

1.The basis of non-Western economies tends to be worse. That’s about as much as Western economies have going for them. See: Venezuela and the DPRK.
2.There’s a whole branch of probability that deals with continuous probabilities, but discrete probabilities are crazy enough for the time being.
3.Read: The probability of A given not-A is zero. A being any arbitrary event: the stock market crashing, the temperature tomorrow exceeding 30ºC, &.
4.In other words, it may be the same, but that’s pure accident. Mathematically, they’re almost always different.
5.It’s tempting to assume that this implies causation, or that the second event must temporally succeed the first, but none of those are implied, and in fact only serve to confuse things more.
6.You will also hear this referred to as ‘base rate neglect’ or ‘base rate fallacy’. As an epidemiologist, ‘rate’ has a specific meaning for us – it generally means events over a span of time. It’s not a rate unless it’s necessarily over time. I know, we’re pedantic like that.
7.This presupposes that these tests are independent of each other, like observations of a random variable. They generally aren’t – for instance, we run the acute phase protein CRP, W/ESR (another acute phase marker) and a WBC count, but these are typically not independent from each other. In such cases, it’s legitimate to use B = B_1 \cap B_2 \cap \ \ldots \cap B_n or, as my preferred notation goes, B = \bigcap^n_{k=1} B_k. I know ‘updating’ is the core mantra of Bayesianism, but knowing what to update and knowing where to simply calculate the conjoint probability is what experts in Bayesian reasoning rake in the big bucks for.
8.Note that a posterior from this step can, upon more new evidence, become the prior in the next round – the prior for j may be the inferred probability p(D_i), but the prior for j + 1 is p(D_i \mid R_j), and so on. More about multiple observations later.
9.It’s important to note that this is not necessarily the population incidence. For instance, the overall incidence and thus the relevant D for EBOV (D_{EBOV}) is going to be different for a haemorrhagic fever referral lab in Kinshasa and a county hospital microbiology lab in Michigan.
10.Lock, R.J. et al. (1999). IgA anti-tissue transglutaminase as a diagnostic marker of gluten sensitive enteropathy. J Clin Pathol 52(4):274-7.
11.More epidemiopedantry: ‘incidence’ refers to new cases over time, ‘prevalence’ refers to cases at a moment in time.
12.This is, of course, unrealistic. I will do a walkthrough of an example of multiple symptoms that each have an association with the illness in a later post.
13.It’s assumed gender is irrelevant to this disease.
14.Presumably hoping that refusing to diagnose a patient with diphtheria and instead diagnosing them with a throat staph infection will somehow get the patient okay enough that nobody will notice the insanely prominent pseudomembrane…
15.Or not…

Are you looking for a data science sensei?

Maybe you’re a junior data scientist, maybe you’re a software developer who wants to go into data science, or perhaps you’ve dabbled in data for years in Excel but are ready to take the next step.

If so, this post is all about you, and an opportunity I offer every year.

You see, life has been very good to me in terms of training as a data scientist. I have been spoiled, really – I had the chance to learn from some of the best data scientists, work with some exceptional epidemiologists, experience some unusual challenges and face many of the day-to-day hurdles of working in data analytics. I’ve had the fortune to see this profession in all its contexts, from small enterprises to multi-million dollar FTSE100 companies, from well-run agile start-ups to large and sometimes pretty slow dinosaurs, from government through the private sector to NGOs: I’ve seen it all. I’ve done some great things. And I’ve made some superbly dumb mistakes.

And so, at the start of every year, I have opened applications for young, start-of-career data scientists looking for their Mr. Miyagi. Don’t worry: no car waxing involved. I will be choosing a single promising young data scientist and pass on as much as I can of my so-called wisdom. At the end, your skills will shine like Mr. Miyagi’s 1947 Ford Deluxe Convertible. There’s no catch, no hidden trap, no fees or charges involved (except the one mentioned below).

Eligibility criteria

To be eligible, you must be:

  • 18 or above if you are taking a gap year or not attending a university/college.
  • You do not have to have a formal degree in data science or a relevant subject, but you must have completed it if you do. In other words: if you’re in your 3rd year of an English Lit degree, you’re welcome to apply, but if you’re in the middle of your CS degree, you have to wait until you’re finished – sorry. The same goes if you intend to go straight on to a data science-related postgrad within the year.
  • Have a solid basis in mathematics: decent statistics, combinatorics, linear algebra and some high school calculus are the very minimum.
  • You must be familiar with Python (3.5 and above), and either familiar with the scientific Python stack (SciPy, NumPy, Pandas, matplotlib) or willing to pick up a lot on the go.
  • Be willing to put in the work: we’ll be convening about once every week to ten days by Skype for an hour, and you’ll probably be doing 6-10 hours’ worth of reading and work for the rest of the week. Please be realistic if you can sustain this.
  • If, as recommended, you are working on an AWS EC2 instance, be aware this might cost money and make sure you can cover the costs. In practice, these are negligible.
  • You must understand that this is a physically and intellectually strenuous endeavor, and it is your responsibility to know whether you’re physically and mentally up for the job. However, no physical or mental disabilities are regarded as automatically excluding you of consideration.
  • You must not live in, reside in or be a citizen of any of the countries listed in CFR Title 22 Part 126, §126.1(d)(1) and (2).
  • You must not have been convicted of a felony anywhere. This includes ‘spent’ UK criminal convictions.

Sounds good? Apply here.

Preferred applicants

When assessing applications, the following groups are given preference:

  • Persons with mental or physical disabilities whose disability precludes them from finding conventional employment – please outline this situation on the application form.
  • Honourably discharged (or equivalent) veterans of NATO forces and the IDF – please include member 4 copy of DD-214, Wehrdienstzeitbescheinigung or equivalent document that lists type of discharge.

What we’ll be up to

Don’t worry. None of this car waxing crap.

Over the 42 weeks to follow, you will be undergoing a rigorous and structured semi-self-directed training process. This will take your background, interests and future ambitions into account, but at the core, you will:

  • master Python’s data processing stack,
  • learn how to visualize data in Python,
  • work with networks and graph databases, including Neo4j,
  • acquire the correct way of presenting results in data science to stakeholders,
  • delve into cutting-edge methods of machine learning, such as deep learning using keras,
  • work on problems in computer vision and get familiar with the Python bindings of OpenCV,
  • scrape data from social networks, and
  • learn convenient ways of representing, summarizing and distributing our results.

The programme is divided into three ‘terms’ of 14 weeks each, which each consist of 9 weeks of directed study, 4 weeks of self-directed project work and one week of R&R.

What you’ll be getting out of this

Since the introduction of Docker, tolerance for wanton destruction as part of coursework has increased, but still won’t earn you a passing grade by itself.

In the past years, mentees have noted the unusual breadth of knowledge they have acquired about data science, as well as the diversity of practical topics and the realistic question settings, with an emphasis on practical applications of data science such as presenting data products. I hope that this year, too, I’ll be able to convey the same important topics. Every year is a little different as I try to adjust the course to meet the individual participant’s needs.

The programme is not, of course, accredited by any accreditation body, but a certificate of completion will be issued to any participant who wishes so.

Application process

Simply fill in the form below and send it off by 14 January 2018. The top contenders will be contacted by e-mail or telephone for a brief conversation thereafter. Finally, a lucky winner will be picked by the 21st January 2018. Easy peasy!

 

FAQ

Q: What does ‘semi-self-directed’ mean? Is there a fixed curriculum?

A: No. There are some basic topics (see list above) that I think are quite likely to come up, but ultimately, this is about making you the data scientist you want to be. For this reason, we’ll begin by planning out where you want to improve – kinda like a PT gives you a training plan before you start out at their gym. We will then adjust as needed. This is not an exam prep, it’s a learning experience, and for that reason, we can focus on delving deeper and getting the fundaments right over other cramming in a particular curriculum.

Q: Can I bring your own data?

A: Sure. In general, we’ll be using standard data sets, because they’re well-known and high-quality data. But if you have a dataset you collected or are otherwise entitled to use that would do equally well, there’s no reason why we couldn’t use it! Note that you must have the right to use and share the data set, meaning it’s unlikely you’re able to use data sets from your day job.

Q: Will this give me an employment advantage?

A: I don’t quite know – it’s impossible to predict. The field of data science degrees is something of a Wild West still, and while some reputable degrees have emerged, others are dubious. Employers still don’t know what to go by. However, you will most definitely be better prepared for an employment interview in data science!

Q: Why are you so keen on presenting data the right way?

A: Because as data scientists, we’re expected to not merely understand the data and draw the right conclusions, but also to convey them to stakeholders at various levels, from plant management to C-suite, in a way that gets the right message across at the first go.

Q: You’re a computational epidemiologist. Can I apply even if my work doesn’t really involve healthcare?

A: Sure. The principles are the same, and we’re largely focusing on generic topics. You might be exposed to bits and pieces of epidemiology, but I can guarantee it won’t hurt.

Q: Why do you only take on one mentee?

A: To begin with, my life is pretty busy – I have a demanding job, a family and – shock horror! – I even need to sleep every once in a while. More importantly, I want to devote my undivided attention to a worthy candidate.

Q: How come I’ve never heard of this before?

A: Until now, I’ve largely gotten mentees by word of mouth. I am concerned that this is keeping some talented people out and limiting the pool of people we should have in. That’s why this year, I have tried to make this process much more transparent.

Q: You’re rather fond of General ‘Mad Dog’ Mattis. Will there be yelling?

No.

Q: There seems to be no upper age limit. Is that a mistake?

No.

Q: I have more questions.

A: You can ask them here.

A deep learning

There are posts that are harder to write than others. This one perhaps has been one of the hardest. It took me the best part of four months and dozens of rewrites.

Because it’s about something I love. And about someone I love. And about something else I love. And how these three came to come into a conflict. And, perhaps, what we all can learn from that.


As many of you might know, deep learning is my jam. Not in a faddish, ‘it’s what cool kids do these days’ sense. Nor, for that matter, in the sense so awfully prevalent in Silicon Valley, whereby the utility of something is measured in how many jobs it will get rid of, presumably freeing off humans to engage in more cerebral pursuits, or how it may someday cure intrinsically human problems if only those pesky humans were to listen to their technocratic betters for once. Rather, I’m a deep learning and AI researcher who believes in what he’s doing. I believe with all I am and all I’ve got that deep learning is right now our best chance to find better ways of curing cancer, producing more with less emissions, building structures that can withstand floods on a dime, identifying terrorists and, heck, create entertaining stuff. I firmly believe that it’s one of the few intellectual pursuits I am somewhat suited for that is also worth my time, not the least because I firmly believe that it will make me have more of it – and if not me, maybe someone equally worthy.

Which is why it was so hard for me to watch this video, of my lifelong idol Hayao Miyazaki ripping a deep learning researcher to shreds.

Now, quite frankly, I have little time for the researcher and his proposition. It’s badly made, dumb and pointless. Why one would inundate Miyazaki-san with it is beyond me. His putdown is completely on point, and not an ounce too harsh. All of his words are well deserved. As someone with a neurological chronic pain disorder that makes me sometimes feel like that creature writhing on the floor, I don’t have a shred of sympathy for this chap.1

Rather, it’s the last few words of Miyazaki-san that have punched a hole in my heart and have held my thoughts captive for months now, coming back into the forefront of my thoughts like a recurring nightmare.

“I feel like we are nearing the end of times,” he says, the camera gracefully hovering over his shoulder as he sketches through his tears. “We humans are losing faith in ourselves.”


Deep learning is something formidable, something incredible, something so futuristic yet so simple. Deep down (no pun intended), deep learning is really not much more than a combination of a few relatively simple tricks, some the best part of a century old, that together create something fantastic. Let me try to put it into layman’s terms (if you’re one of my fellow ML /AI nerds, you can just jump over this part).

Consider you are facing the arduous and yet tremendously important task of, say, identifying whether an image depicts a cat or a dog. In ML lingo, this is what we call a ‘classification’ task. One traditional approach used to be to define what cats are versus what dogs are, and provide rules. If it’s got whiskers, it’s a cat. If it’s got big puppy eyes, it’s, well, a puppy. If it’s got forward pointing eyes and a roughly circular face, it’s almost definitely a kitty. If it’s on a leash, it’s probably a dog. And so on, ad infinitum, your model of a cat-versus-dog becoming more and more accurate with each rule you add.

This is a fairly feasible approach, and is still used. In fact, there’s a whole school of machine learning called decision trees that relies on this kind of definition of your subjects. But there are three problems with it.

  1. You need to know quite a bit about cats and dogs to be able to do this. At the very least, you need to be able to, and take the time and effort to, describe cats and dogs. It’s not enough to merely feed images of each to the computer.2
  2. You are limited in time and ability to put down distinguishing features – your program cannot be infinitely large, nor do you have infinite time to write it. You must prioritise by identifying the factors with the greatest differentiating potential first. In other words, you need to know, in advance, what the most salient characteristics of cats versus dogs are – that is, what characteristics are almost omnipresent among cats but hardly ever occur among dogs (and vice versa)? All dogs have a snout and no cat has a snout, whereas some cats do have floppy ears and some dogs do have almost catlike triangular ears.
  3. You are limited to what you know. Silly as that may sound, there might be some differentia between cats and dogs that are so arcane, so mathematical that no human would think of it – but which might come trivially evident to a computer.

Deep learning, like friendship, is magic. Unlike most other techniques of machine learning, you don’t need to have the slightest idea of what differentiates cats from dogs. What you need is a few hundred images of each, preferably with a label (although that is not strictly necessary – classifiers can get by just fine without needing to be told what the names of the things they are classifying are: as long as they’re told how many different classes they are to split the images into, they will find differentiating features on their own and split the images into ‘images with thing 1’ versus  ‘images with thing 2’. – magic, right?). Using modern deep learning libraries like TensorFlow and their high level abstractions (e.g. keras, tflearn) you can literally write a classifier that identifies cats versus dogs with a very high accuracy in less than 50 lines of Python that will be able to classify thousands of cat and dog pics in a fraction of a minute, most of which will be taken up by loading the images rather than the actual classification.

Told you it’s magic.


What makes deep learning ‘deep’, though? The origins of deep learning are older than modern computers. In 1943, McCullough and Pitts published a paper3 that posited a model of neural activity based on propositional logic. Spurred by the mid-20th century advances in understanding how the nervous system works, in particular how nerve cells are interconnected, McCulloch and Pitts simply drew the obvious conclusion: there is a way you can represent neural connections using propositional logic (and, actually, vice versa). But it wasn’t until 1958 that this idea was followed up in earnest. Rosenblatt’s ground-breaking paper4 introduced this thing called the perceptron, something that sounds like the ideal robotic boyfriend/therapist but in fact was intended as a mathematical model for how the brain stores and processes information. A perceptron is a network of artificial neurons. Consider the cat/dog example. A simple single-layer perceptron has a list of input neurons   x_1 ,   x_2   and so on. Each of these describe a particular property. Does the animal have a snout? Does it go woof? Depending on how characteristic they are, they’re multiplied by a weight  w_n . For instance, all dogs and no cats have snouts, so   w_1   will be relatively high, while there are cats that don’t have long curly tails and dogs that do, so  w_n   will be relatively low.

At the end, the output neuron (denoted by the big  \Sigma  ) sums up these results, and gives an estimate as to whether it’s a cat or a dog.

What was initially designed to model the way the brain works has soon shown remarkable utility in applied computation, to the point that the US Navy was roped into building an actual, physical perceptron machine – the first application of computer vision. However, it was a complete bust. It turned out that a single layer perceptron couldn’t really recognise a lot of patterns. What it lacked was depth.

What do we mean by depth? Consider the human brain. The brain actually doesn’t have a single part devoted to vision. Rather, it has six separate areas5 – the striate cortex (V1) and the extrastriate areas (V2-V6). These form a feedforward pathway of sorts, where V1 feeds into V2, which feeds into V3 and so on. To massively oversimplify: V1 detects optical features like edges, which it feeds on to V2, which breaks these down into more complex features: shapes, orientation, colour &c. As you proceed towards the back of the head, the visual centres detect increasingly complex abstractions from the simple visual information. What was found is that by putting layers and layers of neurons after one another, even very complex patterns can be identified accurately. There is a hierarchy of features, as the facial recognition example below shows.

The first hidden layer recognises simple geometries and blobs at different parts of the zone. The second hidden layer fires if it detects particular manifestations of parts of the face – noses, eyes, mouths. Finally, the third layer fires if it ‘sees’ a particular combination of these. Much like an identikit image, a face is recognised because it contains parts of a face, which in turn are recognised because they contain a characteristic spatial alignment of simple geometries.

There’s much more to deep learning than what I have tried to convey in a few paragraphs. The applications are endless. With the cost of computing decreasing rapidly, deep learning applications have now become feasible in just about all spheres where they can be applied. And they excel everywhere, outpacing not only other machine learning approaches (which makes me absolutely stoked about the future!) but, at times, also humans.


Which leads me back to Miyazaki. You see, deep learning can’t just classify things or predict stock prices. It can also create stuff. To put an old misunderstanding to rest quite early: generative neural networks are genuinely creating new things. Rather than merely combining pre-programmed elements, they come as close as anything non-human can come to creativity.

The pinnacle of it all, generating enjoyable music, is still some ways off, and we have yet to enjoy a novel written by a deep learning engine. But to anyone who has been watching the rapid development of deep learning and especially generative algorithms based on deep learning, these are literally just questions of time.

Or perhaps, as Miyazaki said, questions of the ‘end of times’.

What sets a computer-generated piece apart from a human’s composition? Someday, they will be, as far as quality is concerned, indistinguishable. Yet something that will always set them apart is the absence of a creator.

In what is probably one of the worst written essays in  20th century literary criticism, a field already overflowing with bad prose for bad prose’s sake, Roland Barthes’s 1967 essay La mort de l’auteur posited a sort of separation between the author and the text, countering centuries of literary criticism that sought to explain the meaning of the latter by reference to the former.  According to Barthes, texts (and so, compositions, paintings &.) have a life and existence of their own. To liberate works of art of an  ‘interpretive  tyranny’ that is almost self-explanatorily imposed on it, they must be read, interpreted and understood by reference to its audience and not its author. Indeed, Barthes eschews the term in favour of the term ‘scriptor‘, the latter hearkening back to the Medieval monks who copied manuscripts: like them, the scriptor is not in control of the narrative or work of art that he or she composes. Devoid of the author’s authority, the work of art is now free to exist in a liberated state that allows you – the recipient – to establish its essential meaning.

Oddly, that’s not entirely what post-modernism seems to have created. If anything, there is now an increased focus on the author, at the very least in one particular sense. Consider the curious case of Wagner’s works in Israel. Because of his anti-Semitic views, arguably as well as due to the favour his music found during the tragic years of the Third Reich, Wagner’s works – even those that do not even remotely express a political position – are rarely played in Israel. Even in recent years, other than Holocaust survivor Mendi Roman’s performance of Siegfried in 2000, there have been very few instances of Wagner played in Israel – despite the curious fact that Theodor Herzl, founder of Zionism, admired Wagner’s music (if not his vile racial politics). Rather than the death of the author, we more often witness the death of the work. The taint of the author’s life comes to haunt the chords of his composition and the stanzas of his poetry, every brush-stroke of theirs forever imbued with the often very human sins and mistakes of their lives.

Less dramatic, perhaps, than Wagner’s case are the increasingly frequent boycotts, outbursts and protests against works of art solely based on the character of the author or composer. One must only look at the recent past to see protests, for instance, against the works of HP Lovecraft, themselves having to do more with eldritch horrors than racist horridness, due to the author’s admittedly reprehensible views on matters of race. Outrages about one author or another, one artist or the next, are commonplace, acted out on a daily basis on the Twitter gibbets and the Facebook  pillory. Rather than the death of the author, we experience the death of art, amidst an increasingly intolerant culture towards  the works of flawed or sinful creators.

This is, of course, not to excuse any of those sins or flaws. They should not, and cannot, be excused. Rather, perhaps, it is to suggest that part of a better understanding of humanity is that artists are a cross-section of us as a species, equally prone to be misled and deluded into adopting positions that, as the famous German anti-Fascist and children’s book author Erich Kästner said, ‘feed the animal within man’. Nor is this to condone or justify art that actively expresses those reprehensible views – an entirely different issue. Rather, I seek merely to draw attention to the increased tendency to condemn works of art for the artist’s political sins. In many cases, these sins are far from being as straightforward as Lovecraft’s bigotry and Wagner’s anti-Semitism. In many cases, these sins can be as subtle as going against the drift of public opinion, the Orwellian sin of ‘wrongthink’. With the internet having become a haven of mob mentality (something I personally was subjected to a few years ago), the threshold of what sins  of the creator shall be visited upon their creations has significantly decreased. It’s not the end of days, but you can see it from here.

In which case perhaps Miyazaki is right.

Perhaps what we need is art produced by computers.


As Miyazaki-san said, we are losing faith in ourselves. Not in our ability to create wonderful works of art, but in our ability to measure up to some flawless ethos, to some expectation of the artist as the flawless being. We are losing faith in our artists. We are losing faith in our creators, our poets and painters and sculptors and playwrights and composers, because we fear that with the inevitable revelation of greater – or perhaps lesser – misdeeds or wrongful opinions from their past shall not merely taint them: they shall no less taint us, the fans and aficionados and cognoscenti. Put not your faith in earthly artists, for they are fickle, and prone to having opinions that might be unacceptable, or be seen as such someday. Is it not a straightforward response then to  declare one’s love for the intolerable synthetic Baroque of Stanford machine learning genius Cary Kaiming Huang’s research? In a society where the artist’s sins taint the work of art and through that, all those who confessed to enjoy his works, there’s no other safe bet. Only the AI can cast the first stone.

And if the cost of that is truly the chirps of Cary’s synthetic Baroque generator, Miyazaki is right on the other point, too. It truly is the end of days.

References   [ + ]

1.Least of all because I know how rudimentary and lame his work is. I’ve built evolutionary models of locomotion where the first stages look like this. There’s no cutting edge science here.
2.There’s a whole aspect of the story called feature extraction, which I will ignore for the sake of simplicity, and assume that it just happens. It doesn’t, of course, and it plays a huge role in identifying things, but this story is complex enough already as it is.
3.McCulloch, W and Pitts, W (1943). A Logical Calculus of Ideas Immanent in Nervous Activity. Bulletin of Mathematical Biophysics 5 (4): 115–133. doi:10.1007/BF02478259.
4.Rosenblatt, F. (1958). The Perceptron: A Probabilistic Model For Information Storage And Organization In The Brain. Psych Rev 65 (6): 386–408. doi:10.1037/h0042519
5.Or five, depending on whether you consider the dorsomedial area a separate area of the extrastriate cortex

The sinful algorithm

In 1318, amidst a public sentiment that was less than enthusiastic about King Edward II, a young clerk from Oxford, John Deydras of Powderham, claimed to be the rightful ruler of England. He spun a long and rather fantastic tale that involved sows biting off the ears of children and other assorted lunacy.1 Edward II took much better to the pretender than his wife, the all-around badass Isabella of France, who was embarrassed by the whole affair, and Edward’s barons, who feared more sedition if they let this one slide. As such, eventually, Deydras was tried for sedition.

Deydras’s defence was that he has been convinced to engage in this charade by his cat, through whom the devil appeared to him.2 That did not meet with much leniency, it did however result in one of the facts that exemplified the degree to which medieval criminal jurisprudence was divorced from reason and reality: besides Deydras, his cat, too, was tried, convicted, sentenced to death and hung, alongside his owner.

Before the fashionable charge of unreasonableness is brought against the Edwardian courts, let it be noted that other times and cultures have fared no better. In the later middle ages, it was fairly customary for urban jurisdictions to remove objects that have been involved in a crime beyond the city limits, giving rise to the term extermination (ex terminare, i.e., [being put] beyond the ends).3 The Privileges of Ratisbon (1207) allowed the house in which a crime took place or which harboured an outlaw to be razed to the ground – the house itself was as guilty as its owner.4 And even a culture as civilised and rationalistic as the Greeks fared no better, falling victim to the same surge of unreason. Hyde describes

The Prytaneum was the Hôtel de Ville of Athens as of every Greek town. In it was the common hearth of the city, which represented the unity and vitality of the community. From its perpetual fire, colonists, like the American Indians, would carry sparks to their new homes, as a symbol of fealty to the mother city, and here in very early times the prytanis or chieftain probably dwelt. In the Prytaneum at Athens the statues of Eirene (Peace) and Hestia (Hearth) stood; foreign ambassadors, famous citizens, athletes, and strangers were entertained there at the public expense; the laws of the great law-giver Solon were displayed within it and before his day the chief archon made it his home.
One of the important features of the Prytaneum at Athens were the curious murder trials held in its immediate vicinity. Many Greek writers mention these trials, which appear to have comprehended three kinds of cases. In the first place, if a murderer was unknown or could not be found, he was nevertheless tried at this court. Then inanimate things – such as stones, beams, pliece of iron, etc., – which had caused the death of a man by falling upon him-were put on trial at the Prytaneum, and lastly animals, which had similarly been the cause of death.
Though all these trials were of a ceremonial character, they were carried on with due process of law. Thus, as in all murder trials at Athens, because of the religious feeling back of them that such crimes were against the gods as much as against men, they took place in the open air, that the judges might not be contaminated by the pollution supposed to exhale from the prisoner by sitting under the same roof with him.
(…)
[T]he trial of things, was thus stated by Plato:
“And if any lifeless thing deprive a man of life, except in the case of a thunderbolt or other fatal dart sent from the gods – whether a man is killed by lifeless objects falling upon him, or his falling upon them, the nearest of kin shall appoint the nearest neighbour to be a judge and thereby acquit himself and the whole family of guilt. And he shall cast forth the guilty thing beyond the border.”
Thus we see that this case was an outgrowth from, or amplification of the [courts’ jurisdiction trying and punishing criminals in absentia]; for if the murderer could not be found, the thing that was used in the slaying, if it was known, was punished.5


Looking at the current wave of fashionable statements about the evils of algorithms have reminded me eerily of the superstitious pre-Renaissance courts, convening in damp chambers to mete out punishments not only on people but also on impersonal objects. The same detachment from reality, from the Prytaneum through Xerxes’s flogging of the Hellespont through hanging cats for being Satan’s conduits, is emerging once again, in the sophisticated terminology of ‘systematized biases’:

Clad in the pseudo-sophistication of a man who bills himself as ‘one of the world’s leading thinkers‘, a wannabe social theorist with an MBA from McGill and a career full of buzzwords (everything is ‘foremost’, ‘agenda-setting’ or otherwise ‘ultimative’!) that now apparently qualifies him to discuss algorithms, Mr Haque makes three statements that have now become commonly accepted dogma among certain circles when discussing algorithms.

  1. Algorithms are means to social control, or at the very least, social influence.
  2. Algorithms are made by a crowd of ‘geeks’, a largely homogenous, socially self-selected group that’s mostly white, male, middle to upper middle class and educated to a Masters level.
  3. ‘Systematic biases’, by which I presume he seeks to allude to the concept of institutional -isms in the absence of an actual propagating institution, mean that these algorithms are reflective of various biases, effectively resulting in (at best) disadvantage and (at worst) actual prejudice and discrimination against groups that do not fit the majority demographic of those who develop code.

Needless to say, leading thinkers and all that, this is absolute, total and complete nonsense. Here’s why.


A geek’s-eye view of algorithms

We live in a world governed by algorithms – and we have ever since men have mastered basic mathematics. The Polynesian sailors navigating based on stars and the architects of Solomon’s Temple were no less using algorithms than modern machine learning techniques or data mining outfits are. Indeed, the very word itself is a transliteration of the name of the 8th century Persian mathematician Al-Khwarazmi.6 And for most of those millennia of unwitting and untroubled use of algorithms, there were few objections.

The problem is that algorithms now play a social role. What you read is determined by algorithms. The ads on a website? Algorithms. Your salary? Ditto. A million other things are algorithmically calculated. This has endowed the concept of algorithms with an air of near-conspiratorial mystery. You totally expect David Icke to jump out of your quicksort code one day.

Whereas, in reality, algorithms are nothing special to ‘us geeks’. They’re ways to do three things:

  1. Execute things in a particular order, sometimes taking the results of previous steps as starting points. This is called sequencing.
  2. Executing things a particular number of times. This is called iteration.
  3. Executing things based on a predicate being true or false. This is conditionality.

From these three building blocks, you can literally reconstruct every single algorithm that has ever been used. There. That’s all the mystery.

So quite probably, what people mean when they rant about ‘algorithms’ is not the concept of algorithms but particular types of algorithm. In particular, social algorithms, content filtering, optimisation and routing algorithms are involved there.

Now, what you need to understand is that geeks care relatively little about the real world ‘edges’ of problems. They’re not doing this out of contempt or not caring, but rather to compartmentalise problems to manageable little bits. It’s easier to solve tiny problems and make sure the solutions can interoperate than creating a single, big solution that eventually never happens.

To put it this way: to us, most things, if not everything, is an interface. And this largely determines what it means when we talk about the performance of an algorithm.

Consider your washing machine: it can be accurately modelled in the following way.

The above models a washing machine. Given supply water and power, if you put in dirty clothes and detergent, you will eventually get clean clothes and grey water. And that is all.
The above models a washing machine. Given supply water and power, if you put in dirty clothes and detergent, you will eventually get clean clothes and grey water. And that is all.

Your washing machine is an algorithm of sorts. It’s got parameters (water, power, dirty clothes) and return values (greywater tank levels, clean clothes). Now, as long as your washing machine fulfils a certain specification (sometimes called a promise7 or a contract), according to which it will deliver a given set of predictable outputs to a given set of inputs, all will be well. Sort of.

“Sort of”, because washing machines can break. A defect in an algorithm is defined as ‘betraying the contract’, in other words, the algorithm has gone wrong if it has been given the right supply and yields the wrong result. Your washing machine might, however, fail internally. The motor might die. A sock might get stuck in it. The main control unit might short out.


Now consider the following (extreme simplification of an) algorithm. MD5 is what we call a cryptographic hash function. It takes something – really, anything that can be expressed in binary – and gives a 128-bit hash value. On one hand, it is generally impossible to invert the process (i.e. it is not possible to conclusively deduce what the original message was), while at the same time the same message will always yield the same hash value.

Without really having an understanding of what goes on behind the scenes,8 you can rely on the promise given by MD5. This is so in every corner of the universe. The value of MD5("Hello World!") is 0xed076287532e86365e841e92bfc50d8c in every corner of the universe. It was that value yesterday. It will be that value tomorrow. It will be that value at the heat death of the universe. What we mean when we say that an algorithm is perfect is that it upholds, and will uphold, its promise. Always.

At the same time, there are aspects of MD5 that are not perfect. You see, perfection of an algorithm is quite context-dependent, much as the world’s best, most ‘perfect’ hammer is utterly useless when what you need is a screwdriver. As such, for instance, we know that MD5 has to map every possible bit value of every possible length to a limited number of possible hash values (128 bit worth of values, to be accurate, which equates to 2^128 or approximately 3.4×10^38 distinct values). These seem a lot, but are actually teensy when you consider that they are used to map every possible amount of binary data, of every possible length. As such, it is known that sometimes different things can have the same hash value. This is called a ‘collision’, and it is a necessary feature of all hash algorithms. It is not a ‘fault’ or a ‘shortcoming’ of the algorithm, no more than we regard the non-commutativity of division a ‘shortcoming’.

Which is why it’s up to you, when you’re using an algorithm, to know what it can and cannot do. Algorithms are tools. Unlike the weird perception in Mr Haque’s swirl of incoherence, we do not worship algorithms. We don’t tend to sacrifice small animals to quicksort and you can rest assured we don’t routinely bow to a depiction of binary search trees. No more do we believe in the ‘perfection’ of algorithms than a surgeon believes in the ‘perfection’ of his scalpel or a pilot believes in the ‘perfection’ of their aircraft. Both know their tools have imperfections. They merely rely on the promise that if used with an understanding of its limitations, you can stake your, and others’, lives on it. That’s not tool-worship, that’s what it means to be a tool-using human being.


The Technocratic Spectre

We don’t know the name of the first human who banged two stones together to make fire, and became the archetype for Prometheus, but I’m rather sure he was rewarded by his fellow humans rewarded with very literally tearing out his very literal and very non-regrowing liver. Every progress in the history of humanity had those who not merely feared progress and the new, but immediately saw seven kinds of nefarious scheming behind it. Beyond (often justified!) skepticism and a critical stance towards new inventions and a reserved approach towards progress (all valid positions!), there is always a caste of professional fear-mongerers, who, after painting a spectre of disaster, immediately proffer the solution: which, of course, is giving them control over all things new, for they are endowed with the mythical talents that one requires to be so presumptuous as to claim to be able to decide for others without even hearing their views.

The difference is that most people have become incredibly lazy. The result is that there is now a preference for fear over informed understanding that comes at the price of investing some time in reading up on the technologies that now are playing such a transformative role. How many Facebook users do you think have re-posted the “UCC 1-308 and Rome Statute” nonsense? And how many of them, you reckon, actually know how Facebook uses their data? While much of what they do is proprietary, the Facebook graph algorithms are partly primitives9 and partly open. If you wanted, you could, with a modicum of mathematical and computing knowledge, have a good stab at understanding what is going on. On the other hand, posting bad legalese is easier. Much easier.

And thus, as a result, we have a degree of skepticism towards ‘algorithms’, mostly by people like Mr Haque who do not quite understand what they are talking about and are not actually referring to algorithms but their social use.

And there lieth the Technocratic Spectre. It has always been a fashionable argument against progress, good or ill, that it is some mysterious machination by a scientific-technical elite aimed at the common man’s detriment. There is now a new iteration of this philosophy, and it is quite surprising how the backwards, low-information edges of the far right reach hands to the far left’s paranoid and misinformed segment. At least the Know-Nothings of the right live in an honest admission of ignorance, eschewing the over-blown credentials and inflated egos of their left-wing brethren like Mr Haque. But in ignorance, they both are one another’s match.

The left-wing argument against technological progress is an odd one, for the IT business, especially the part heavy on research and innovation that comes up with algorithms and their applications, is a very diverse and rather liberal sphere. Nor does this argument square too well with the traditional liberal values of upholding civil liberties, first and foremost that of freedom of expression and conscience. Instead, the objective seems to be an ever more expansive campaign, conducted entirely outside parliamentary procedure (basing itself on regulating private services from the inside and a goodly amount of shaming people into doing their will through the kind of agitated campaigning that I have never had the displeasure to see in a democracy), of limiting the expression of ideas to a rather narrowly circumscribed set, with the pretense that some minority groups are marginalised and even endangered by wrongthink.10


Their own foray at algorithms has not fared well. One need only look at the misguided efforts of a certain Bay Area developer notorious for telling people to set themselves on fire. Her software, intended to block wrongthink on the weirder-than-weird cultural phenomenon of Gamergate by blocking Twitter users who have followed a small number of acknowledged wrongthinkers, expresses the flaws of this ideology beautifully. Not only is subtleness and a good technical understanding lacking. There is also a distinct shortage of good common sense and, most of all, an understanding of how to use algorithms. While terribly inefficient and horrendously badly written 11, the algorithm behind the GGAutoblocker is sound. It does what its creator intended it to do on a certain level: allow you to block everyone who is following controversial personalities. That this was done without an understanding of the social context (e.g. that this is a great way to block the uncommitted and those who wish to be as widely informed as possible, is of course the very point.

The problem is not with “geeks”.

The problem is when “geeks” decide to play social engineering. Whey they suddenly throw down their coding gear and decide they’re going to transform who talks with whom and how information is exchanged. The problem is exactly the opposite: it happens when geeks cease to be geeks.

It happens when Facebook experiments with users’ timelines without their consent. It happens when companies implement policies aimed at a really laudable goal (diversity and inclusion) that leads to statements by employees that should make any sane person shudder (You know who you are, Bay Area). It happens when Twitter decides they are going to experiment with their only asset. This is how it is rewarded.

$TWTR crash
Top tip: when you’ve got effectively no assets to speak of, screwing with your user base can be very fatal very fast.

The problem is not geeks seeing a technical solution to every socio-political issue.

The problem is a certain class of ‘geeks’ seeing a socio-political use to every tool.


Sins of the algorithm

Why algorithms? Because algorithms are infinitely dangerous: because they are, as I noted above, within their area of applicability universally true and correct.

But they’re also resilient. An algorithm feels no shame. An algorithm feels no guilt. You can’t fire it. You can’t tell them to set themselves on fire or, as certain elements have done to me for a single statistical analysis, threaten to rape my wife and/or kill me. An algorithm cannot be guilted into ‘right-think’. And worst of all, algorithms cannot be convincingly presented as having an internal political bias. Quicksort is not Republican. R/B trees are not Democrats. Neural nets can’t decide to be homophobic.

And for people whose sole argumentation lies on the plane of politics, in particular grievance and identity politics, this is a devastating strike. Algorithms are the greased eels unable to be framed for the ideological sins that are used to attack and remove undesirables from political and social discourse. And to those who wish to govern this discourse by fear and intimidation, a bunch of code that steadfastly spits out results and to hell with threats is a scary prospect.

And so, if you cannot invalidate the code, you have to invalidate the maker. Algorithms perpetuate real equality by being by definition unable to exercise the same kind of bias humans do (not that they don’t have their own kind of bias, but the similarity ends with the word – if your algorithm has a racial or ethnic or gender bias, you’re using it wrong). Algorithms are meritocratic, being immune to nepotism and petty politicking. A credit scorer does not care about your social status the way Mr Jones at the bank might privilege the child of his golf partners over a young unmarried ethnic couple. Trading algorithms don’t care whether you’re a severely ill young man playing the markets from hospital.12 Without human intervention, algorithms have a purity and lack of bias that cannot easily be replicated once humans have touched the darn things.

And so, those whose stock in life is a thorough education in harnessing grievances for their own gain are going after “the geeks”.


Perhaps the most disgusting thing about Mr Haque’s tweet is the contraposition between “geeks” and “regular humans”, with the assumption that “regular humans” know all about algorithms and unlike the blindly algorithm-worshipping geeks, understand how ‘life is more complicated’ and algorithms are full of geeky biases.

For starters, this is hard to take seriously when in the same few tweets, Mr Haque displays a lack of understanding of algorithms that doesn’t befit an Oregon militia hick, never mind somebody who claims spurious credentials as a foremost thinker.

“Regular humans”, whatever they are that geeks aren’t (and really, I’m not one for geek supremacy, but if Mr Haque had spent five minutes among geeks, he’d know the difference is not what, and where, he thinks it is), don’t have some magical understanding of the shortcomings of algorithms. Heck, usually, they don’t have a regular understanding of algorithms, never mind magical. But it sure sounds good when you’re in the game of shaming some of the most productive members of society unless they contribute to the very problem you’re complaining about. For of course ‘geeks’ can atone for their ‘geekdom’ by becoming more of a ‘regular human’, by starting to engage in various ill-fated political forays that end with the problems that sent the blue bird into a dive on Friday.


Little of this is surprising, though. Anyone who has been paying attention could see the warning signs of a forced politicisation of technology, under the guise of making it more equal and diverse. In my experience, diverse teams perform better, yield better results, work a little faster, communicate better and make fewer big mistakes (albeit a little more small ones). In particular, gender-diverse and ethnically diverse teams are much more than the sum of their parts. This is almost universally recognised, and few businesses that have intentionally resisted creating diverse, agile teams have fared well in the long run.13 I’m a huge fan of diversity – because it lives up to a meritocratic ideal, one to which I am rather committed after I’ve had to work my way into tech through a pretty arduous journey.

Politicising a workplace, on the other hand, I am less fond of. Quite simply, it’s not our job. It’s not our job, because for what it’s worth, we’re just a bunch of geeks. There are things we’re better at. Building algorithms is one.

But they are now the enemy. And because they cannot be directly attacked, we’ll become targets. With the passion of a zealot, it will be taught that algorithms are not clever mathematical shortcuts but merely geeks’ prejudices expressed in maths.

And that’s a problem. If you look into the history of mathematics, most of it is peppered by people who held one kind of unsavoury view or another. Moore was a virulent racist. Pauli loved loose women. Half the 20th century mathematicians were communists at some point of their career. Haldane thought Stalin was a great man. And I could go on. But I don’t, because it does not matter. Because they took part in the only truly universal human experience: discovery.


But discovery has its enemies and malcontents. The attitude they display, evidenced by Haque’s tweet too, is ultimately eerily reminiscent of the letter that sounded the death knell on the venerable pre-WW II German mathematical tradition. Titled Kunst des Zitierens (The Art of Citing), it was written in 1934 by Ludwig Bieberbach, a vicious anti-Semite and generally unpleasant character, who was obsessed with the idea of a ‘German mathematics’, free of the Hilbertian internationalism, of what he saw as Jewish influence, of the liberalism of the German mathematical community in the inter-war years. He writes:

“Ein Volk, das eingesehen hat, wie fremde Herrschaftsgelüste an seinem Marke nagen, wie Volksfremde daran arbeiten, ihm fremde Art aufzuzwingen, muss Lehrer von einem ihm fremden Typus ablehnen.”

 

“A people that has recognised how foreign ambitions of power attack its brand, how aliens work on imposing foreign ways on it, has to reject teachers from a type alien to it.”

Algorithms, and the understanding of what they do, protect us from lunatics like Bieberbach. His ‘German mathematics’, suffused with racism and Aryan mysticism, was no less delusional than the idea that a cabal of geeks is imposing a ‘foreign way’ of algorithmically implementing their prejudices, as if geeks actually cared about that stuff.

Every age will produce its Lysenko and its Bieberbach, and every generation has its share of zealots that demand ideological adherence and measure the merit of code and mathematics based on the author’s politics.

Like on Lysenko and Bieberbach, history will have its judgment on them, too.

Head image credits: Max Slevogt, Xerxes at the Hellespont (Allegory on Sea Power). Bildermann 13, Oct. 5, 1916. With thanks to the President and Fellows of Harvard College.

References   [ + ]

1.It is now more or less consensus that Deydras was mentally ill and made the whole story up. Whether he himself believed it or not is another question.
2.As an obedient servant to a kitten, I have trouble believing this!
3.Falcón y Tella, Maria J. (2014). Justice and law, 60. Brill Nijhoff, Leiden
4.Falcón y Tella, Maria J. and Falcón y Tella, Fernando (2006). Punishment and Culture: a right to punish? Nijhoff, Leiden.
5.Hyde, Walter W. (1916). The Prosecution and Punishment of Animals and Lifeless Things in the Middle Ages and Modern Times. 64 U.Pa.LRev. 696.
6.Albeit what we currently regard as the formal definition of an algorithm is largely informed by the work of Hilbert in the 1920s, Church’s lambda calculus and, eventually, the emergence of Turing machines.
7.I discourage the promise terminology here as I’ve seen it confuzzled with the asynchronous meaning of the word way too often
8.In case you’re interested, RFC1321 explains MD5’s internals in a lot of detail.
9.Building blocks commonly used that are well-known and well-documented
10.Needless to say, a multiple-times-over minority in IT, the only people who have marginalised and endangered me were these stalwart defenders of the right never to have to face a controversial opinion.
11.Especially for someone who declaims, with pride, her 15-year IT business experience…
12.It was a great distraction.
13.Not that any statement about this matter is not shut down by reference to ludicrous made-up words like ‘mansplaining’.