A deep learning

There are posts that are harder to write than others. This one has perhaps been one of the hardest: it took me the best part of four months and dozens of rewrites.

Because it’s about something I love. And about someone I love. And about something else I love. And how these three came into conflict. And, perhaps, what we can all learn from that.


As many of you might know, deep learning is my jam. Not in a faddish, ‘it’s what cool kids do these days’ sense. Nor, for that matter, in the sense so awfully prevalent in Silicon Valley, whereby the utility of something is measured in how many jobs it will get rid of, presumably freeing up humans to engage in more cerebral pursuits, or how it may someday cure intrinsically human problems if only those pesky humans were to listen to their technocratic betters for once. Rather, I’m a deep learning and AI researcher who believes in what he’s doing. I believe with all I am and all I’ve got that deep learning is right now our best chance to find better ways of curing cancer, producing more with fewer emissions, building structures that can withstand floods on a dime, identifying terrorists and, heck, creating entertaining stuff. I firmly believe that it’s one of the few intellectual pursuits I am somewhat suited for that is also worth my time, not least because I firmly believe that it will give me more of it – and if not me, maybe someone equally worthy.

Which is why it was so hard for me to watch this video, of my lifelong idol Hayao Miyazaki ripping a deep learning researcher to shreds.

Now, quite frankly, I have little time for the researcher and his proposition. It’s badly made, dumb and pointless. Why one would confront Miyazaki-san with it is beyond me. His putdown is completely on point, and not an ounce too harsh. All of his words are well deserved. As someone with a chronic neurological pain disorder that makes me sometimes feel like that creature writhing on the floor, I don’t have a shred of sympathy for this chap.[1]

Rather, it’s the last few words of Miyazaki-san that have punched a hole in my heart and held my thoughts captive for months now, coming back to the forefront of my mind like a recurring nightmare.

“I feel like we are nearing the end of times,” he says, the camera gracefully hovering over his shoulder as he sketches through his tears. “We humans are losing faith in ourselves.”


Deep learning is something formidable, something incredible, something so futuristic yet so simple. Deep down (no pun intended), deep learning is really not much more than a combination of a few relatively simple tricks, some the best part of a century old, that together create something fantastic. Let me try to put it into layman’s terms (if you’re one of my fellow ML/AI nerds, you can just jump over this part).

Suppose you are facing the arduous and yet tremendously important task of, say, identifying whether an image depicts a cat or a dog. In ML lingo, this is what we call a ‘classification’ task. One traditional approach used to be to define what cats are versus what dogs are, and provide rules. If it’s got whiskers, it’s a cat. If it’s got big puppy eyes, it’s, well, a puppy. If it’s got forward-pointing eyes and a roughly circular face, it’s almost definitely a kitty. If it’s on a leash, it’s probably a dog. And so on, ad infinitum, your model of cat-versus-dog becoming more and more accurate with each rule you add.
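A caricature of this rule-based approach in code – the features and the dictionary representation are invented purely for illustration:

```python
# Hand-written rules, one per distinguishing feature, mirroring the
# whiskers/puppy-eyes/leash examples above. Purely illustrative.
def classify(animal):
    if animal.get("has_whiskers"):
        return "cat"
    if animal.get("has_big_puppy_eyes"):
        return "dog"
    if animal.get("forward_pointing_eyes") and animal.get("roughly_circular_face"):
        return "cat"
    if animal.get("on_a_leash"):
        return "dog"
    return "no idea"  # ...and so on, ad infinitum

print(classify({"on_a_leash": True}))  # 'dog'
```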

This is a fairly feasible approach, and it is still used. In fact, there’s a whole family of machine learning models called decision trees that relies on this kind of definition of your subjects. But there are three problems with it.

  1. You need to know quite a bit about cats and dogs to be able to do this. At the very least, you need to be able to, and take the time and effort to, describe cats and dogs. It’s not enough to merely feed images of each to the computer.[2]
  2. You are limited in time and ability to put down distinguishing features – your program cannot be infinitely large, nor do you have infinite time to write it. You must prioritise by identifying the factors with the greatest differentiating potential first. In other words, you need to know, in advance, what the most salient characteristics of cats versus dogs are – that is, what characteristics are almost omnipresent among cats but hardly ever occur among dogs (and vice versa)? All dogs have a snout and no cat has a snout, whereas some cats do have floppy ears and some dogs do have almost catlike triangular ears.
  3. You are limited to what you know. Silly as that may sound, there might be differentiae between cats and dogs so arcane, so mathematical, that no human would think of them – but which might be trivially evident to a computer.

Deep learning, like friendship, is magic. Unlike most other techniques of machine learning, you don’t need to have the slightest idea of what differentiates cats from dogs. What you need is a few hundred images of each, preferably with a label (although that is not strictly necessary – classifiers can get by just fine without being told the names of the things they are classifying: as long as they’re told how many different classes to split the images into, they will find differentiating features on their own and sort the images into ‘images with thing 1’ versus ‘images with thing 2’ – magic, right?). Using modern deep learning libraries like TensorFlow and their high-level abstractions (e.g. keras, tflearn), you can literally write a classifier that identifies cats versus dogs with very high accuracy in less than 50 lines of Python. It will be able to classify thousands of cat and dog pics in a fraction of a minute, most of which will be taken up by loading the images rather than the actual classification.
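To give a flavour of what those 50 lines might look like, here is a minimal sketch using the Keras API on top of TensorFlow. The directory layout, image size and layer sizes are illustrative assumptions, not a recipe – the point is how little code stands between you and a working classifier.

```python
# A minimal cat-vs-dog classifier sketch with tf.keras. Assumes images are
# sorted into data/train/cats and data/train/dogs (an assumption made for
# this example only).
import tensorflow as tf

train_ds = tf.keras.utils.image_dataset_from_directory(
    "data/train", image_size=(128, 128), batch_size=32)

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 255),
    # Each convolutional layer picks up increasingly abstract features:
    # edges, then textures, then parts of animals.
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # 0 = cat, 1 = dog
])

model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, epochs=10)
```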

Told you it’s magic.


What makes deep learning ‘deep’, though? The origins of deep learning are older than modern computers. In 1943, McCulloch and Pitts published a paper[3] that posited a model of neural activity based on propositional logic. Spurred by the mid-20th-century advances in understanding how the nervous system works, in particular how nerve cells are interconnected, McCulloch and Pitts simply drew the obvious conclusion: there is a way you can represent neural connections using propositional logic (and, actually, vice versa). But it wasn’t until 1958 that this idea was followed up in earnest. Rosenblatt’s ground-breaking paper[4] introduced this thing called the perceptron, something that sounds like the ideal robotic boyfriend/therapist but in fact was intended as a mathematical model for how the brain stores and processes information. A perceptron is a network of artificial neurons. Consider the cat/dog example. A simple single-layer perceptron has a list of input neurons x_1, x_2 and so on. Each of these describes a particular property. Does the animal have a snout? Does it go woof? Depending on how characteristic they are, each input is multiplied by a weight w_i. For instance, all dogs and no cats have snouts, so the snout weight w_1 will be relatively high, while there are cats that don’t have long curly tails and dogs that do, so the curly-tail weight w_2 will be relatively low.

At the end, the output neuron (denoted by the big \Sigma) sums up these weighted inputs and gives an estimate as to whether it’s a cat or a dog.
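In code, the whole single-layer perceptron fits in a few lines. The two features and their weights below are invented solely to mirror the snout/curly-tail example:

```python
# A toy single-layer perceptron: weighted sum of the inputs, thresholded
# to a yes/no decision. Features and weights are illustrative only.
import numpy as np

def perceptron(x, w, b):
    return int(np.dot(w, x) + b > 0)  # 1 -> 'dog', 0 -> 'cat'

# x = [has_snout, has_long_curly_tail]
w = np.array([2.0, 0.3])   # snouts are strongly dog-like, curly tails only weakly so
b = -1.0                   # decision threshold

print(perceptron(np.array([1, 1]), w, b))  # 1 -> dog
print(perceptron(np.array([0, 0]), w, b))  # 0 -> cat
```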

What was initially designed to model the way the brain works soon showed remarkable utility in applied computation, to the point that the US Navy was roped into building an actual, physical perceptron machine – one of the first forays into computer vision. However, it was a complete bust. It turned out that a single-layer perceptron couldn’t really recognise a lot of patterns. What it lacked was depth.

What do we mean by depth? Consider the human brain. The brain doesn’t actually have a single part devoted to vision. Rather, it has six separate areas[5] – the striate cortex (V1) and the extrastriate areas (V2–V6). These form a feedforward pathway of sorts, where V1 feeds into V2, which feeds into V3 and so on. To massively oversimplify: V1 detects optical features like edges, which it feeds on to V2, which assembles these into more complex features – shapes, orientation, colour &c. As you proceed towards the back of the head, the visual centres detect increasingly complex abstractions from the simple visual information. What was found is that by putting layers and layers of neurons after one another, even very complex patterns can be identified accurately. There is a hierarchy of features, as the facial recognition example below shows.

The first hidden layer recognises simple geometries and blobs in different parts of the image. The second hidden layer fires if it detects particular manifestations of parts of the face – noses, eyes, mouths. Finally, the third layer fires if it ‘sees’ a particular combination of these. Much like an identikit image, a face is recognised because it contains parts of a face, which in turn are recognised because they contain a characteristic spatial alignment of simple geometries.

There’s much more to deep learning than what I have tried to convey in a few paragraphs. The applications are endless. With the cost of computing decreasing rapidly, deep learning applications have now become feasible in just about all spheres where they can be applied. And they excel everywhere, outpacing not only other machine learning approaches (which makes me absolutely stoked about the future!) but, at times, also humans.


Which leads me back to Miyazaki. You see, deep learning doesn’t just classify things or predict stock prices. It can also create stuff. To put an old misunderstanding to rest quite early: generative neural networks genuinely create new things. Rather than merely combining pre-programmed elements, they come as close as anything non-human can come to creativity.

The pinnacle of it all, generating enjoyable music, is still some way off, and we have yet to enjoy a novel written by a deep learning engine. But to anyone who has been watching the rapid development of deep learning, and especially of generative algorithms based on it, these are simply questions of time.

Or perhaps, as Miyazaki said, questions of the ‘end of times’.

What sets a computer-generated piece apart from a human’s composition? Someday, they will be, as far as quality is concerned, indistinguishable. Yet something that will always set them apart is the absence of a creator.

In what is probably one of the worst-written essays in 20th-century literary criticism, a field already overflowing with bad prose for bad prose’s sake, Roland Barthes’s 1967 La mort de l’auteur posited a sort of separation between the author and the text, countering centuries of literary criticism that sought to explain the meaning of the latter by reference to the former. According to Barthes, texts (and so, compositions, paintings &c.) have a life and existence of their own. To liberate works of art from an ‘interpretive tyranny’ almost self-evidently imposed on them, they must be read, interpreted and understood by reference to their audience and not their author. Indeed, Barthes eschews the term ‘author’ altogether in favour of ‘scriptor’, the latter hearkening back to the medieval monks who copied manuscripts: like them, the scriptor is not in control of the narrative or work of art that he or she composes. Devoid of the author’s authority, the work of art is now free to exist in a liberated state that allows you – the recipient – to establish its essential meaning.

Oddly, that’s not entirely what post-modernism seems to have created. If anything, there is now an increased focus on the author, at the very least in one particular sense. Consider the curious case of Wagner’s works in Israel. Because of his anti-Semitic views, as well as, arguably, the favour his music found during the tragic years of the Third Reich, Wagner’s works – even those that do not even remotely express a political position – are rarely played in Israel. Even in recent years, other than Holocaust survivor Mendi Rodan’s performance of the Siegfried Idyll in 2000, there have been very few instances of Wagner played in Israel – despite the curious fact that Theodor Herzl, founder of Zionism, admired Wagner’s music (if not his vile racial politics). Rather than the death of the author, we more often witness the death of the work. The taint of the author’s life comes to haunt the chords of their compositions and the stanzas of their poetry, every brush-stroke forever imbued with the often very human sins and mistakes of their lives.

Less dramatic, perhaps, than Wagner’s case are the increasingly frequent boycotts, outbursts and protests against works of art based solely on the character of the author or composer. One need only look at the recent past to see protests, for instance, against the works of HP Lovecraft – themselves having to do more with eldritch horrors than racist horridness – due to the author’s admittedly reprehensible views on matters of race. Outrages about one author or another, one artist or the next, are commonplace, acted out on a daily basis on the Twitter gibbets and the Facebook pillory. Rather than the death of the author, we experience the death of art, amidst a culture increasingly intolerant towards the works of flawed or sinful creators.

This is, of course, not to excuse any of those sins or flaws. They should not, and cannot, be excused. Rather, perhaps, it is to suggest that part of a better understanding of humanity is that artists are a cross-section of us as a species, equally prone to be misled and deluded into adopting positions that, as the famous German anti-Fascist and children’s book author Erich Kästner said, ‘feed the animal within man’. Nor is this to condone or justify art that actively expresses those reprehensible views – an entirely different issue. Rather, I seek merely to draw attention to the increased tendency to condemn works of art for the artist’s political sins. In many cases, these sins are far from being as straightforward as Lovecraft’s bigotry and Wagner’s anti-Semitism. In many cases, these sins can be as subtle as going against the drift of public opinion, the Orwellian sin of ‘wrongthink’. With the internet having become a haven of mob mentality (something I personally was subjected to a few years ago), the threshold of what sins  of the creator shall be visited upon their creations has significantly decreased. It’s not the end of days, but you can see it from here.

In which case perhaps Miyazaki is right.

Perhaps what we need is art produced by computers.


As Miyazaki-san said, we are losing faith in ourselves. Not in our ability to create wonderful works of art, but in our ability to measure up to some flawless ethos, to some expectation of the artist as a flawless being. We are losing faith in our artists. We are losing faith in our creators, our poets and painters and sculptors and playwrights and composers, because we fear that the inevitable revelation of greater – or perhaps lesser – misdeeds or wrongful opinions from their past shall not merely taint them: it shall no less taint us, the fans and aficionados and cognoscenti. Put not your faith in earthly artists, for they are fickle, and prone to having opinions that might be unacceptable, or be seen as such someday. Is it not a straightforward response, then, to declare one’s love for the intolerable synthetic Baroque of Stanford machine learning genius Cary Kaiming Huang’s research? In a society where the artist’s sins taint the work of art and, through that, all those who confessed to enjoying his works, there’s no other safe bet. Only the AI can cast the first stone.

And if the cost of that is truly the chirps of Cary’s synthetic Baroque generator, Miyazaki is right on the other point, too. It truly is the end of days.

References

1. Least of all because I know how rudimentary and lame his work is. I’ve built evolutionary models of locomotion where the first stages look like this. There’s no cutting edge science here.
2. There’s a whole aspect of the story called feature extraction, which I will ignore for the sake of simplicity, and assume that it just happens. It doesn’t, of course, and it plays a huge role in identifying things, but this story is complex enough already as it is.
3. McCulloch, W.S. and Pitts, W. (1943). A Logical Calculus of the Ideas Immanent in Nervous Activity. Bulletin of Mathematical Biophysics 5 (4): 115–133. doi:10.1007/BF02478259.
4. Rosenblatt, F. (1958). The Perceptron: A Probabilistic Model For Information Storage And Organization In The Brain. Psychological Review 65 (6): 386–408. doi:10.1037/h0042519.
5. Or five, depending on whether you consider the dorsomedial area a separate area of the extrastriate cortex.

The one study you shouldn’t write

I might have my own set of ideological prejudices,[1] but of one thing I am more certain than I am about any of them: show me proof that contradicts my most cherished beliefs, and I will read it, evaluate it critically and, if it holds up, learn from it. This, incidentally, is how I ended up believing in God and casting away the atheism of my early teens, but that’s a side point.

As such, I’m in support of every kind of inquiry that does not, in its process, harm humans (I am, you may be shocked to learn, far more supportive of torturing raw data than people). There’s one exception. There is that one study for every sociologist, every data scientist, every statistician, every psychologist, everyone – that one study that you should never write: the study that proves how your ideological opponents are morons, psychotics and/or terminally flawed human beings.[2]

Virginia Commonwealth University scholar Brad Verhulst, Pete Hatemi (now at Penn State, my sources tell me) and poor old Lindon Eaves, who of all of the aforementioned should really know better than to darken his reputation with this sort of nonsense, have just learned this lesson at what I believe will be a minuscule cost to their careers compared to the consequence this error ought to cost any researcher in any field.

In 2012, the trio published an article in the American Journal of Political Science, titled Correlation not causation: the relationship between personality traits and political ideologies. Its conclusion was, erm, ground-breaking for anyone who knows conservatives from more than the caricatures they have been reduced to in the media:

First, in line with our expectations, higher P scores correlate with more conservative military attitudes and more socially conservative beliefs for both females and males. For males, the relationship between P and military attitudes (r = 0.388) is larger than the relationship between P and social attitudes (r = 0.292). Alternatively, for females, social attitudes correlate more highly with P (r = 0.383) than military attitudes (r = 0.302).

Further, we find a negative relationship between Neuroticism and economic conservatism (r_{females} = −0.242, r_{males} = −0.239). People higher in Neuroticism tend to be more economically liberal.

(P, in the above, being the score in Eysenck’s psychoticism inventory.)

The most damning words in the above were among the very first. I am not sure what’s worse here: that actual educated people believe psychoticism correlates with military attitudes (because the military is known for courting psychotics, am I right? No? NO?!), or that they think it helps any case to disclose such a blatant bias quite so openly. In my lawyering years, if the prosecution expert had stated that the fingerprints on the murder weapon “matched those of that dirty crook over there, as I expected”, I’d have torn him to shreds, and so would any good lawyer. And that’s not because we’re born and raised bloodhounds but because we prefer people not to have biases in what they are supposed to opine on in a dispassionate, clear, clinical manner.

And this story confirms why that matters.

Four years after the paper appeared in print (why so late?), an erratum had to be published – one that, by the way, is still not reflected on a lot of the sites that republished the piece. It turns out that the gentlemen who wrote the study had ‘misread’ their numbers. Like, real bad.

The authors regret that there is an error in the published version of “Correlation not Causation: The Relationship between Personality Traits and Political Ideologies” American Journal of Political Science 56 (1), 34–51. The interpretation of the coding of the political attitude items in the descriptive and preliminary analyses portion of the manuscript was exactly reversed. Thus, where we indicated that higher scores in Table 1 (page 40) reflect a more conservative response, they actually reflect a more liberal response. Specifically, in the original manuscript, the descriptive analyses report that those higher in Eysenck’s psychoticism are more conservative, but they are actually more liberal; and where the original manuscript reports those higher in neuroticism and social desirability are more liberal, they are, in fact, more conservative. We highlight the specific errors and corrections by page number below:

It also magically turns out that the military is not full of psychotics.[3] Whodda thunk.

…P is substantially correlated with liberal military and social attitudes, while Social Desirability is related to conservative social attitudes, and Neuroticism is related to conservative economic attitudes.

“No shit, Sherlock,” as they say.

The authors’ explanation is that the dog ate their homework. Ok, only a little bit better: the responses were “miscoded”, i.e. it’s all the poor grad student sods’ fault. Their academic highnesses remain faultless:

The potential for an error in our article initially was pointed out by Steven G. Ludeke and Stig H. R. Rasmussen in their manuscript, “(Mis)understanding the relationship between personality and sociopolitical attitudes.” We found the source of the error only after an investigation going back to the original copies of the data. The data for the current paper and an earlier paper (Verhulst, Hatemi and Martin (2010) “The nature of the relationship between personality traits and political attitudes.” Personality and Individual Differences 49:306–316) were collected through two independent studies by Lindon Eaves in the U.S. and Nichols Martin in Australia. Data collection began in the 1980’s and finished in the 1990’s. The questionnaires were designed in collaboration with one of the goals being to be compare and combine the data for specific analyses. The data were combined into a single data set in the 2000’s to achieve this goal. Data are extracted on a project-by-project basis, and we found that during the extraction for the personality and attitudes project, the specific codebook used for the project was developed in error.

As a working data scientist and statistician, I’m not buying this. This study has, for all its faults, intricate statistical methods. It’s well done from a technical standpoint. It uses Cholesky decomposition and displays a relatively sophisticated statistical approach, even if it’s at times bordering on the bizarre. The causal analysis is an absolute mess, and I have no idea where the authors got the idea that a correlation over 0.2 is “large enough for further consideration”. That’s not a scientifically accepted idea. A correlation is either significant or it is not. There is no weird middle way of “give us more money, let’s look into it more”. The point remains, however, that the authors, while practising a good deal of cargo cult science, managed to overlook an epic blunder like this. How could that have happened?
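As an aside on that 0.2 point, the distinction is easy to make concrete: a correlation coefficient comes with a significance test, and the two are not interchangeable. A quick sketch with made-up data (the numbers below are invented, not taken from the paper):

```python
# Illustrative only: synthetic data standing in for a personality score and
# an attitude score. pearsonr returns the correlation coefficient together
# with the p-value of the test that the true correlation is zero.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(42)
x = rng.normal(size=500)             # stand-in 'trait' score
y = 0.25 * x + rng.normal(size=500)  # weakly related stand-in 'attitude' score

r, p = pearsonr(x, y)
print(f"r = {r:.3f}, p = {p:.2g}")
# Whether r merits 'further consideration' is a question of the test and of
# effect-size conventions, not of an arbitrary 0.2 cut-off.
```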

Well, really, how could it have happened? I trust the answer lies in the words I pointed out earlier. The authors suffered from what the field of criminal forensic science calls “cognitive contamination”. They had an idea about conservatives and liberals and what they are like, and those ideas were caricatures in the extreme. They were blind as bats, blinded by their own ideological biases.

And therein lies my point. There are, sometimes, articles that you shouldn’t write.

Let me give you an analogy. My religion has some pretty clear rules about what married people are, and aren’t, allowed to do. What my religion also happens to say is that it’s easier not to mess these things up if you don’t court temptation. If you are a drug addict, you should not hang out with coke heads. If you are a recovering alcoholic, you would not exactly benefit from joining your friends on a drunken revelry. And if you’ve got political convictions, you are more prone to saying stupid things when you find a result that confirms your ideas. The term for this is ‘confirmation bias’; the reality behind it is the simple human proneness to see what we want to see.

Do you remember how, as a child, you used to play the game of seeing shapes in clouds? Puppies, cows, elephants and horses? The human brain works on the basis of the Gestalt principle of reification, allowing us to reconstruct known things from their parts. It’s essential to the way our brain works. But it also makes us see the things we want to see, not what is actually in front of us.

And that’s why you should never write that one article. The one where you explain why the other side is dumb, evil or has psychotic and/or neurotic traits.

References

1. Largely, they presume outlandish stuff like ‘human life is exceptional and always worth defending’ or ‘death does not cure illnesses’, you get my drift.
2. For starters, I maintain we all are at the very least the latter, quite probably the middle one at least a portion of the time and, frankly, the first one more often than we would believe ourselves.
3. Yes, I know a high Eysenck P score does not mean a person is ‘psychotic’ and Eysenck’s test is a personality trait test, not a test to diagnose a psychotic disorder.

The sinful algorithm

In 1318, amidst a public sentiment that was less than enthusiastic about King Edward II, a young clerk from Oxford, John Deydras of Powderham, claimed to be the rightful ruler of England. He spun a long and rather fantastic tale that involved sows biting off the ears of children and other assorted lunacy.[1] Edward II took much better to the pretender than his wife, the all-around badass Isabella of France, who was embarrassed by the whole affair, and Edward’s barons, who feared more sedition if they let this one slide. As such, eventually, Deydras was tried for sedition.

Deydras’s defence was that he had been convinced to engage in this charade by his cat, through whom the devil had appeared to him.[2] That did not meet with much leniency. It did, however, give rise to one of those episodes that exemplify the degree to which medieval criminal jurisprudence was divorced from reason and reality: besides Deydras, his cat, too, was tried, convicted, sentenced to death and hanged alongside its owner.

Before the fashionable charge of unreasonableness is brought against the Edwardian courts, let it be noted that other times and cultures have fared no better. In the later Middle Ages, it was fairly customary for urban jurisdictions to remove objects that had been involved in a crime beyond the city limits, giving rise to the term extermination (ex terminare, i.e., [being put] beyond the boundaries).[3] The Privileges of Ratisbon (1207) allowed the house in which a crime took place, or which harboured an outlaw, to be razed to the ground – the house itself was as guilty as its owner.[4] And even a culture as civilised and rationalistic as the Greeks fell victim to the same surge of unreason. Hyde describes it thus:

The Prytaneum was the Hôtel de Ville of Athens as of every Greek town. In it was the common hearth of the city, which represented the unity and vitality of the community. From its perpetual fire, colonists, like the American Indians, would carry sparks to their new homes, as a symbol of fealty to the mother city, and here in very early times the prytanis or chieftain probably dwelt. In the Prytaneum at Athens the statues of Eirene (Peace) and Hestia (Hearth) stood; foreign ambassadors, famous citizens, athletes, and strangers were entertained there at the public expense; the laws of the great law-giver Solon were displayed within it and before his day the chief archon made it his home.
One of the important features of the Prytaneum at Athens were the curious murder trials held in its immediate vicinity. Many Greek writers mention these trials, which appear to have comprehended three kinds of cases. In the first place, if a murderer was unknown or could not be found, he was nevertheless tried at this court. Then inanimate things – such as stones, beams, pieces of iron, etc. – which had caused the death of a man by falling upon him were put on trial at the Prytaneum, and lastly animals, which had similarly been the cause of death.
Though all these trials were of a ceremonial character, they were carried on with due process of law. Thus, as in all murder trials at Athens, because of the religious feeling back of them that such crimes were against the gods as much as against men, they took place in the open air, that the judges might not be contaminated by the pollution supposed to exhale from the prisoner by sitting under the same roof with him.
(…)
[T]he trial of things, was thus stated by Plato:
“And if any lifeless thing deprive a man of life, except in the case of a thunderbolt or other fatal dart sent from the gods – whether a man is killed by lifeless objects falling upon him, or his falling upon them, the nearest of kin shall appoint the nearest neighbour to be a judge and thereby acquit himself and the whole family of guilt. And he shall cast forth the guilty thing beyond the border.”
Thus we see that this case was an outgrowth from, or amplification of the [courts’ jurisdiction trying and punishing criminals in absentia]; for if the murderer could not be found, the thing that was used in the slaying, if it was known, was punished.[5]


The current wave of fashionable statements about the evils of algorithms has reminded me eerily of those superstitious pre-Renaissance courts, convening in damp chambers to mete out punishments not only on people but also on impersonal objects. The same detachment from reality, from the Prytaneum through Xerxes’s flogging of the Hellespont to hanging cats for being Satan’s conduits, is emerging once again, in the sophisticated terminology of ‘systematized biases’.

Clad in the pseudo-sophistication of a man who bills himself as ‘one of the world’s leading thinkers’, a wannabe social theorist with an MBA from McGill and a career full of buzzwords (everything is ‘foremost’, ‘agenda-setting’ or otherwise ‘ultimative’!) that now apparently qualifies him to discuss algorithms, Mr Haque makes three statements that have become commonly accepted dogma in certain circles when discussing algorithms.

  1. Algorithms are means to social control, or at the very least, social influence.
  2. Algorithms are made by a crowd of ‘geeks’, a largely homogenous, socially self-selected group that’s mostly white, male, middle to upper middle class and educated to a Masters level.
  3. ‘Systematic biases’, by which I presume he seeks to allude to the concept of institutional -isms in the absence of an actual propagating institution, mean that these algorithms are reflective of various biases, effectively resulting in (at best) disadvantage and (at worst) actual prejudice and discrimination against groups that do not fit the majority demographic of those who develop code.

Needless to say, leading thinkers and all that, this is absolute, total and complete nonsense. Here’s why.


A geek’s-eye view of algorithms

We live in a world governed by algorithms – and we have ever since men mastered basic mathematics. The Polynesian sailors navigating by the stars and the architects of Solomon’s Temple were using algorithms no less than modern machine learning or data mining outfits do. Indeed, the very word is a transliteration of the name of the 9th-century Persian mathematician al-Khwarizmi.[6] And for most of those millennia of unwitting and untroubled use of algorithms, there were few objections.

The problem is that algorithms now play a social role. What you read is determined by algorithms. The ads on a website? Algorithms. Your salary? Ditto. A million other things are algorithmically calculated. This has endowed the concept of algorithms with an air of near-conspiratorial mystery. You totally expect David Icke to jump out of your quicksort code one day.

Whereas, in reality, algorithms are nothing special to ‘us geeks’. They’re ways to do three things:

  1. Executing things in a particular order, sometimes taking the results of previous steps as starting points. This is called sequencing.
  2. Executing things a particular number of times. This is called iteration.
  3. Executing things based on a predicate being true or false. This is called conditionality.

From these three building blocks, you can literally reconstruct every single algorithm that has ever been used. There. That’s all the mystery.
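To make that concrete, here is a deliberately trivial sketch – a plain linear search, written so that all three building blocks are visible:

```python
# Linear search: sequencing (steps run in order), iteration (walk the list),
# conditionality (compare each element against the target).
def linear_search(items, target):
    for index, item in enumerate(items):   # iteration
        if item == target:                 # conditionality
            return index                   # sequencing: stop as soon as it's found
    return -1

print(linear_search([3, 1, 4, 1, 5], 4))   # 2
print(linear_search([3, 1, 4, 1, 5], 9))   # -1
```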

So quite probably, what people mean when they rant about ‘algorithms’ is not the concept of algorithms but particular types of algorithm – in particular, social ranking, content filtering, optimisation and routing algorithms.

Now, what you need to understand is that geeks care relatively little about the real-world ‘edges’ of problems. This is not out of contempt or indifference, but rather a way to compartmentalise problems into manageable little bits. It’s easier to solve tiny problems and make sure the solutions can interoperate than to create a single, big solution that never quite materialises.

To put it this way: to us, most things, if not everything, is an interface. And this largely determines what it means when we talk about the performance of an algorithm.

Consider your washing machine: it can be accurately modelled in the following way.

The above models a washing machine. Given supply water and power, if you put in dirty clothes and detergent, you will eventually get clean clothes and grey water. And that is all.

Your washing machine is an algorithm of sorts. It’s got parameters (water, power, dirty clothes) and return values (grey water, clean clothes). Now, as long as your washing machine fulfils a certain specification (sometimes called a promise[7] or a contract), according to which it will deliver a given set of predictable outputs to a given set of inputs, all will be well. Sort of.

“Sort of”, because washing machines can break. A defect in an algorithm is defined as ‘betraying the contract’: the algorithm has gone wrong if it has been given the right supply and yields the wrong result. Your washing machine might, however, fail internally. The motor might die. A sock might get stuck in it. The main control unit might short out.
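Here is that distinction sketched in code. Everything here – the names, the types, the exception – is invented for illustration; the point is that the caller relies only on the typed ‘contract’, and that an internal fault is different from silently returning a wrong result:

```python
# Hypothetical washing-machine contract: promised inputs in, promised
# outputs out. Names and types are invented for illustration.
from dataclasses import dataclass

@dataclass
class WashResult:
    clean_clothes: list
    grey_water_litres: float

class MachineFailure(Exception):
    """Internal fault (dead motor, stuck sock) - not a broken contract."""

def wash(dirty_clothes: list, detergent_ml: float,
         water_litres: float = 50.0) -> WashResult:
    if detergent_ml <= 0 or water_litres <= 0:
        raise ValueError("contract precondition violated")
    # The internals are a black box to the caller. If the motor dies in
    # here, raise MachineFailure rather than quietly returning a wrong
    # result - the latter is what 'betraying the contract' means.
    return WashResult(clean_clothes=dirty_clothes, grey_water_litres=water_litres)
```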


Now consider a different, extremely simplified example of an algorithm. MD5 is what we call a cryptographic hash function. It takes something – really, anything that can be expressed in binary – and gives back a 128-bit hash value. On the one hand, it is generally impossible to invert the process (i.e. it is not possible to conclusively deduce what the original message was); at the same time, the same message will always yield the same hash value.

Without really understanding what goes on behind the scenes,[8] you can rely on the promise given by MD5. The value of MD5("Hello World!") is 0xed076287532e86365e841e92bfc50d8c in every corner of the universe. It was that value yesterday. It will be that value tomorrow. It will be that value at the heat death of the universe. What we mean when we say that an algorithm is perfect is that it upholds, and will uphold, its promise. Always.
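You can check that promise yourself with nothing but the standard library:

```python
# The MD5 promise, verified: the same input yields the same digest,
# everywhere, every time.
import hashlib

print(hashlib.md5(b"Hello World!").hexdigest())
# ed076287532e86365e841e92bfc50d8c
```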

At the same time, there are aspects of MD5 that are not perfect. You see, the perfection of an algorithm is quite context-dependent, much as the world’s best, most ‘perfect’ hammer is utterly useless when what you need is a screwdriver. For instance, we know that MD5 has to map every possible bit value of every possible length to a limited number of possible hash values (128 bits’ worth of values, to be accurate, which equates to 2^128 or approximately 3.4×10^38 distinct values). That seems like a lot, but it is actually teensy when you consider that these values have to cover every possible amount of binary data, of every possible length. As such, it is known that sometimes different things can have the same hash value. This is called a ‘collision’, and it is a necessary feature of all hash algorithms. It is not a ‘fault’ or a ‘shortcoming’ of the algorithm, no more than we regard the non-commutativity of division as a ‘shortcoming’.

Which is why it’s up to you, when you’re using an algorithm, to know what it can and cannot do. Algorithms are tools. Contrary to the weird perception in Mr Haque’s swirl of incoherence, we do not worship algorithms. We don’t tend to sacrifice small animals to quicksort and you can rest assured we don’t routinely bow to a depiction of binary search trees. We believe in the ‘perfection’ of algorithms no more than a surgeon believes in the ‘perfection’ of their scalpel or a pilot believes in the ‘perfection’ of their aircraft. Both know their tools have imperfections. They merely rely on the promise that, used with an understanding of their limitations, you can stake your, and others’, lives on them. That’s not tool-worship; that’s what it means to be a tool-using human being.


The Technocratic Spectre

We don’t know the name of the first human who banged two stones together to make fire, and became the archetype for Prometheus, but I’m rather sure he was rewarded by his fellow humans with the very literal tearing out of his very literal and very non-regrowing liver. Every advance in the history of humanity has had those who not merely feared progress and the new, but immediately saw seven kinds of nefarious scheming behind it. Beyond (often justified!) skepticism, a critical stance towards new inventions and a reserved approach towards progress (all valid positions!), there is always a caste of professional fear-mongers who, after painting a spectre of disaster, immediately proffer the solution: which, of course, is giving them control over all things new, for they are endowed with the mythical talents one requires to be so presumptuous as to claim to be able to decide for others without even hearing their views.

The difference is that most people have become incredibly lazy. The result is a preference for fear over the informed understanding that comes at the price of investing some time in reading up on the technologies now playing such a transformative role. How many Facebook users do you think have re-posted the “UCC 1-308 and Rome Statute” nonsense? And how many of them, you reckon, actually know how Facebook uses their data? While much of what they do is proprietary, the Facebook graph algorithms are partly primitives[9] and partly open. If you wanted, you could, with a modicum of mathematical and computing knowledge, have a good stab at understanding what is going on. On the other hand, posting bad legalese is easier. Much easier.

And thus we end up with a degree of skepticism towards ‘algorithms’, mostly from people like Mr Haque who do not quite understand what they are talking about and are not actually referring to algorithms but to their social use.

And there lieth the Technocratic Spectre. It has always been a fashionable argument against progress, good or ill, that it is some mysterious machination by a scientific-technical elite aimed at the common man’s detriment. There is now a new iteration of this philosophy, and it is quite surprising how the backwards, low-information edges of the far right join hands with the far left’s paranoid and misinformed segment. At least the Know-Nothings of the right live in an honest admission of ignorance, eschewing the over-blown credentials and inflated egos of their left-wing brethren like Mr Haque. But in ignorance, they are one another’s match.

The left-wing argument against technological progress is an odd one, for the IT business, especially the part heavy on research and innovation that comes up with algorithms and their applications, is a very diverse and rather liberal sphere. Nor does this argument square too well with the traditional liberal values of upholding civil liberties, first and foremost freedom of expression and conscience. Instead, the objective seems to be an ever more expansive campaign, conducted entirely outside parliamentary procedure (basing itself on regulating private services from the inside and a goodly amount of shaming people into compliance through the kind of agitated campaigning I had never before had the displeasure of seeing in a democracy), of limiting the expression of ideas to a rather narrowly circumscribed set, under the pretense that some minority groups are marginalised and even endangered by wrongthink.[10]


Their own forays into algorithms have not fared well. One need only look at the misguided efforts of a certain Bay Area developer notorious for telling people to set themselves on fire. Her software, intended to block wrongthink around the weirder-than-weird cultural phenomenon of Gamergate by blocking Twitter users who follow a small number of acknowledged wrongthinkers, expresses the flaws of this ideology beautifully. Not only are subtlety and a good technical understanding lacking; there is also a distinct shortage of good common sense and, most of all, of an understanding of how to use algorithms. While terribly inefficient and horrendously badly written,[11] the algorithm behind the GGAutoblocker is sound. It does what its creator intended it to do, on a certain level: allow you to block everyone who is following controversial personalities. That this was done without an understanding of the social context (e.g. that this is a great way to block the uncommitted and those who wish to be as widely informed as possible) is, of course, the very point.

The problem is not with “geeks”.

The problem is when “geeks” decide to play at social engineering. When they suddenly throw down their coding gear and decide they’re going to transform who talks with whom and how information is exchanged. The problem is exactly the opposite: it happens when geeks cease to be geeks.

It happens when Facebook experiments with users’ timelines without their consent. It happens when companies implement policies aimed at a genuinely laudable goal (diversity and inclusion) that lead to statements by employees that should make any sane person shudder (you know who you are, Bay Area). It happens when Twitter decides it is going to experiment with its only asset. This is how it is rewarded.

[Chart: $TWTR crash.] Top tip: when you’ve got effectively no assets to speak of, screwing with your user base can be fatal very fast.

The problem is not geeks seeing a technical solution to every socio-political issue.

The problem is a certain class of ‘geeks’ seeing a socio-political use for every tool.


Sins of the algorithm

Why algorithms? Because algorithms are infinitely dangerous: because they are, as I noted above, universally true and correct within their area of applicability.

But they’re also resilient. An algorithm feels no shame. An algorithm feels no guilt. You can’t fire it. You can’t tell it to set itself on fire, or threaten it the way certain elements have threatened me over a single statistical analysis – with raping my wife and/or killing me. An algorithm cannot be guilted into ‘right-think’. And worst of all, algorithms cannot be convincingly presented as having an internal political bias. Quicksort is not Republican. R/B trees are not Democrats. Neural nets can’t decide to be homophobic.

And for people whose sole argumentation lies on the plane of politics, in particular grievance and identity politics, this is a devastating blow. Algorithms are greased eels that cannot be framed for the ideological sins used to attack and remove undesirables from political and social discourse. And to those who wish to govern this discourse by fear and intimidation, a bunch of code that steadfastly spits out results, threats be damned, is a scary prospect.

And so, if you cannot invalidate the code, you have to invalidate the maker. Algorithms perpetuate real equality by being by definition unable to exercise the same kind of bias humans do (not that they don’t have their own kind of bias, but the similarity ends with the word – if your algorithm has a racial or ethnic or gender bias, you’re using it wrong). Algorithms are meritocratic, being immune to nepotism and petty politicking. A credit scorer does not care about your social status the way Mr Jones at the bank might privilege the child of his golf partners over a young unmarried ethnic couple. Trading algorithms don’t care whether you’re a severely ill young man playing the markets from hospital.[12] Without human intervention, algorithms have a purity and lack of bias that cannot easily be replicated once humans have touched the darn things.

And so, those whose stock in trade is a thorough education in harnessing grievances for their own gain are going after “the geeks”.


Perhaps the most disgusting thing about Mr Haque’s tweet is the opposition it sets up between “geeks” and “regular humans”, with the assumption that “regular humans” know all about algorithms and, unlike the blindly algorithm-worshipping geeks, understand how ‘life is more complicated’ and algorithms are full of geeky biases.

For starters, this is hard to take seriously when in the same few tweets, Mr Haque displays a lack of understanding of algorithms that doesn’t befit an Oregon militia hick, never mind somebody who claims spurious credentials as a foremost thinker.

“Regular humans”, whatever they are that geeks aren’t (and really, I’m not one for geek supremacy, but if Mr Haque had spent five minutes among geeks, he’d know the difference is not what, and where, he thinks it is), don’t have some magical understanding of the shortcomings of algorithms. Heck, usually, they don’t have a regular understanding of algorithms, never mind magical. But it sure sounds good when you’re in the game of shaming some of the most productive members of society unless they contribute to the very problem you’re complaining about. For of course ‘geeks’ can atone for their ‘geekdom’ by becoming more of a ‘regular human’, by starting to engage in various ill-fated political forays that end with the problems that sent the blue bird into a dive on Friday.


Little of this is surprising, though. Anyone who has been paying attention could see the warning signs of a forced politicisation of technology, under the guise of making it more equal and diverse. In my experience, diverse teams perform better, yield better results, work a little faster, communicate better and make fewer big mistakes (albeit a few more small ones). In particular, gender-diverse and ethnically diverse teams are much more than the sum of their parts. This is almost universally recognised, and few businesses that have intentionally resisted creating diverse, agile teams have fared well in the long run.[13] I’m a huge fan of diversity – because it lives up to a meritocratic ideal, one to which I am rather committed after having had to work my way into tech through a pretty arduous journey.

Politicising a workplace, on the other hand, I am less fond of. Quite simply, it’s not our job. It’s not our job, because for what it’s worth, we’re just a bunch of geeks. There are things we’re better at. Building algorithms is one.

But they are now the enemy. And because they cannot be directly attacked, we’ll become targets. With the passion of a zealot, it will be taught that algorithms are not clever mathematical shortcuts but merely geeks’ prejudices expressed in maths.

And that’s a problem. The history of mathematics is peppered with people who held one kind of unsavoury view or another. Moore was a virulent racist. Pauli loved loose women. Half the 20th-century mathematicians were communists at some point in their careers. Haldane thought Stalin was a great man. And I could go on. But I don’t, because it does not matter. Because they took part in the only truly universal human experience: discovery.


But discovery has its enemies and malcontents. The attitude they display, evidenced by Haque’s tweet too, is ultimately eerily reminiscent of the letter that sounded the death knell for the venerable pre-WWII German mathematical tradition. Titled Kunst des Zitierens (The Art of Citing), it was written in 1934 by Ludwig Bieberbach, a vicious anti-Semite and generally unpleasant character, who was obsessed with the idea of a ‘German mathematics’, free of Hilbertian internationalism, of what he saw as Jewish influence, and of the liberalism of the German mathematical community in the inter-war years. He writes:

“Ein Volk, das eingesehen hat, wie fremde Herrschaftsgelüste an seinem Marke nagen, wie Volksfremde daran arbeiten, ihm fremde Art aufzuzwingen, muss Lehrer von einem ihm fremden Typus ablehnen.”

 

“A people that has recognised how foreign lusts for power gnaw at its marrow, how those alien to the Volk work to impose a foreign manner upon it, must reject teachers of a type alien to it.”

Algorithms, and the understanding of what they do, protect us from lunatics like Bieberbach. His ‘German mathematics’, suffused with racism and Aryan mysticism, was no less delusional than the idea that a cabal of geeks is imposing a ‘foreign way’ of algorithmically implementing their prejudices, as if geeks actually cared about that stuff.

Every age will produce its Lysenko and its Bieberbach, and every generation has its share of zealots that demand ideological adherence and measure the merit of code and mathematics based on the author’s politics.

As with Lysenko and Bieberbach, history will pass its judgment on them, too.

Head image credits: Max Slevogt, Xerxes at the Hellespont (Allegory on Sea Power). Bildermann 13, Oct. 5, 1916. With thanks to the President and Fellows of Harvard College.

References

1. It is now more or less consensus that Deydras was mentally ill and made the whole story up. Whether he himself believed it or not is another question.
2. As an obedient servant to a kitten, I have trouble believing this!
3. Falcón y Tella, Maria J. (2014). Justice and law, 60. Brill Nijhoff, Leiden
4. Falcón y Tella, Maria J. and Falcón y Tella, Fernando (2006). Punishment and Culture: a right to punish? Nijhoff, Leiden.
5. Hyde, Walter W. (1916). The Prosecution and Punishment of Animals and Lifeless Things in the Middle Ages and Modern Times. 64 U.Pa.LRev. 696.
6. Albeit what we currently regard as the formal definition of an algorithm is largely informed by the work of Hilbert in the 1920s, Church’s lambda calculus and, eventually, the emergence of Turing machines.
7. I discourage the promise terminology here as I’ve seen it confuzzled with the asynchronous meaning of the word way too often
8. In case you’re interested, RFC1321 explains MD5’s internals in a lot of detail.
9. Building blocks commonly used that are well-known and well-documented
10. Needless to say, a multiple-times-over minority in IT, the only people who have marginalised and endangered me were these stalwart defenders of the right never to have to face a controversial opinion.
11. Especially for someone who declaims, with pride, her 15-year IT business experience…
12. It was a great distraction.
13. Not that any statement about this matter is not shut down by reference to ludicrous made-up words like ‘mansplaining’.