Using screen to babysit long-running processes

In machine learning, especially in deep learning, long-running processes are quite common. Just yesterday, I finished running an optimisation process that ran for the best part of four days –  and that’s on a 4-core machine with an Nvidia GRID K2, letting me crunch my data on 3,072 GPU cores!  Of course, I did not want to babysit the whole process. Least of all did I want to have to do so from my laptop. There’s a reason we have tools like Sentry, which can be easily adapted from webapp monitoring to letting you know how your model is doing.

One solution is to spin up another virtual machine, ssh into that, and from there ssh into the machine running the code: if your connection to the intermediate machine drops, the session between it and the machine doing the work survives. There is also nohup, which makes sure that the process is not killed when you ‘hang up’ the ssh connection; you will, however, not be able to reattach to it later. There are also reparenting tools like reptyr, but the need they meet is somewhat different. Enter terminal multiplexers.

Terminal multiplexers are old. They date from the era of time-sharing systems and other antiquities whose purpose was to allow a large number of users to share time on a mainframe designed to serve hundreds, even thousands of users. With the advent of personal computers that had decent computational power on their own, terminal multiplexers remained the preserve of universities and other weirdos still using mainframe architectures. Fortunately for us, two great terminal multiplexers, screen (aka GNU Screen) and tmux, are still being actively developed, and are almost definitely available for your *nix of choice. This gives us a convenient tool to sneak a peek at what’s going on with our long-suffering process. Here’s how.

Step 1
ssh into your remote machine and launch screen (screen). You may need to do this as sudo if you encounter the error where screen, instead of starting up a new shell, returns [screen is terminating] and quits. If screen has started up correctly, you should see a slightly different shell prompt (and if you started it as sudo, you will now be logged in as root).
In some scenarios, you may want to ‘name’ your screen session. Typically, this is the case when you want to share your screen with another user, e.g. for pair programming. To create a named screen, invoke screen using the session name parameter -S, as in e.g. screen -S my_shared_screen.
Step 2
In this step, we will be launching the actual script to run. If your script is Python based and you are using virtualenv (as you ought to!), activate the environment now using source <virtualenv folder>/<virtualenv name>/bin/activate, replacing <virtualenv folder> with the folder where your virtualenvs live (for me, that’s the environments folder; often enough it’s something like ~/.virtualenvs) and <virtualenv name> with the name of your virtualenv (in my case, research). You have to activate your virtualenv even if you have done so outside of screen already (remember, screen means you’re in an entirely new shell, with all environment configurations, settings, aliases &c. gone)!

With your virtualenv activated, launch your script as normal — no need to launch it in the background. Indeed, one of the big advantages is the ability to see verbose mode progress indicators. If your script does not log progress to stdout but to a logfile, you can start it using nohup, then put it into the background (Ctrl-Z, then bg) and track progress using tail -f logfile.log (where logfile.log is, of course, to be substituted by the filename of your logfile).
Step 3
Press Ctrl-A followed by Ctrl-D to detach from the current screen. This will take you back to your original shell after noting the address of the screen you’re detaching from. These always follow the format <identifier>.<session id>.<hostname>, where <hostname> is, of course, the hostname of the computer from which the screen session was started, <session id> stands for the name you gave your screen (if any), and <identifier> is an autogenerated 4-6 digit socket identifier. In general, as long as you are on the same machine, the screen identifier or the session name will be sufficient – the full canonical name is only necessary when trying to access a screen on another host.

To see a list of all screens running under your current username, enter screen -list. Refer to that listing or the address echoed when you detached from the screen to reattach to the process using screen -r <socket identifier>[.<session identifier>.<hostname>]. This will return you to the script, which keeps executing in the background.
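
Putting it all together, a typical session looks roughly like this (session name, virtualenv and script are placeholders, and the exact output of screen varies by version):

$ screen -S training
# inside the new shell that screen opens:
$ source ~/.virtualenvs/research/bin/activate
(research) $ python train_model.py
# press Ctrl-A, then Ctrl-D to detach
[detached from 12345.training]
$ screen -list
There is a screen on:
        12345.training  (Detached)
$ screen -r training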
Result
Reattaching to the process running in the background, you can now follow the progress of the script. Use the key combination in Step 3 to step out of the process at any time, and screen -r, as described above, to return to it.

Bugs
There is a known issue, caused by strace, that leads to screen immediately closing, with the message [screen is terminating] upon invoking screen as a non-privileged user.

There are generally two ways to resolve this issue.

The overall effect of both solutions is the same. Notably, both may be undesirable from a security perspective. As always, weigh risks against utility.

Do you prefer screen to staying logged in? Do you have any other cool hacks to make monitoring a machine learning process that takes considerable time to run? Let me know in the comments!

Image credits: Zenith Z-19 by ajmexico on Flickr

A deep learning

There are posts that are harder to write than others. This one perhaps has been one of the hardest. It took me the best part of four months and dozens of rewrites.

Because it’s about something I love. And about someone I love. And about something else I love. And how these three came into conflict. And, perhaps, what we all can learn from that.


As many of you might know, deep learning is my jam. Not in a faddish, ‘it’s what cool kids do these days’ sense. Nor, for that matter, in the sense so awfully prevalent in Silicon Valley, whereby the utility of something is measured in how many jobs it will get rid of, presumably freeing up humans to engage in more cerebral pursuits, or how it may someday cure intrinsically human problems if only those pesky humans were to listen to their technocratic betters for once. Rather, I’m a deep learning and AI researcher who believes in what he’s doing. I believe with all I am and all I’ve got that deep learning is right now our best chance to find better ways of curing cancer, producing more with less emissions, building structures that can withstand floods on a dime, identifying terrorists and, heck, creating entertaining stuff. I firmly believe that it’s one of the few intellectual pursuits I am somewhat suited for that is also worth my time, not least because I firmly believe that it will make me have more of it – and if not me, maybe someone equally worthy.

Which is why it was so hard for me to watch this video, of my lifelong idol Hayao Miyazaki ripping a deep learning researcher to shreds.

Now, quite frankly, I have little time for the researcher and his proposition. It’s badly made, dumb and pointless. Why one would inundate Miyazaki-san with it is beyond me. His putdown is completely on point, and not an ounce too harsh. All of his words are well deserved. As someone with a neurological chronic pain disorder that makes me sometimes feel like that creature writhing on the floor, I don’t have a shred of sympathy for this chap.[1]

Rather, it’s the last few words of Miyazaki-san that have punched a hole in my heart and held my thoughts captive for months now, coming back to the forefront of my mind like a recurring nightmare.

“I feel like we are nearing the end of times,” he says, the camera gracefully hovering over his shoulder as he sketches through his tears. “We humans are losing faith in ourselves.”


Deep learning is something formidable, something incredible, something so futuristic yet so simple. Deep down (no pun intended), deep learning is really not much more than a combination of a few relatively simple tricks, some the best part of a century old, that together create something fantastic. Let me try to put it into layman’s terms (if you’re one of my fellow ML/AI nerds, you can just jump over this part).

Consider you are facing the arduous and yet tremendously important task of, say, identifying whether an image depicts a cat or a dog. In ML lingo, this is what we call a ‘classification’ task. One traditional approach used to be to define what cats are versus what dogs are, and provide rules. If it’s got whiskers, it’s a cat. If it’s got big puppy eyes, it’s, well, a puppy. If it’s got forward pointing eyes and a roughly circular face, it’s almost definitely a kitty. If it’s on a leash, it’s probably a dog. And so on, ad infinitum, your model of a cat-versus-dog becoming more and more accurate with each rule you add.

This is a fairly feasible approach, and is still used. In fact, there’s a whole school of machine learning called decision trees that relies on this kind of definition of your subjects. But there are three problems with it.

  1. You need to know quite a bit about cats and dogs to be able to do this. At the very least, you need to be able to, and take the time and effort to, describe cats and dogs. It’s not enough to merely feed images of each to the computer.[2]
  2. You are limited in time and ability to put down distinguishing features – your program cannot be infinitely large, nor do you have infinite time to write it. You must prioritise by identifying the factors with the greatest differentiating potential first. In other words, you need to know, in advance, what the most salient characteristics of cats versus dogs are – that is, what characteristics are almost omnipresent among cats but hardly ever occur among dogs (and vice versa)? All dogs have a snout and no cat has a snout, whereas some cats do have floppy ears and some dogs do have almost catlike triangular ears.
  3. You are limited to what you know. Silly as that may sound, there might be some differentia between cats and dogs that are so arcane, so mathematical, that no human would think of them – but which might be trivially evident to a computer.

Deep learning, like friendship, is magic. Unlike most other techniques of machine learning, you don’t need to have the slightest idea of what differentiates cats from dogs. What you need is a few hundred images of each, preferably with a label (although that is not strictly necessary – classifiers can get by just fine without needing to be told what the names of the things they are classifying are: as long as they’re told how many different classes they are to split the images into, they will find differentiating features on their own and split the images into ‘images with thing 1’ versus ‘images with thing 2’ – magic, right?). Using modern deep learning libraries like TensorFlow and their high level abstractions (e.g. keras, tflearn), you can literally write a classifier that identifies cats versus dogs with very high accuracy in fewer than 50 lines of Python. The result will be able to classify thousands of cat and dog pics in a fraction of a minute, most of which will be taken up by loading the images rather than by the actual classification.
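
To make that concrete, here is roughly what such a classifier can look like with Keras – a minimal sketch, assuming your images live under data/train/cats and data/train/dogs, and glossing over validation, augmentation and every other nicety:

from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Assumed layout: data/train/cats/*.jpg and data/train/dogs/*.jpg
train = ImageDataGenerator(rescale=1.0 / 255).flow_from_directory(
    "data/train", target_size=(128, 128), class_mode="binary", batch_size=32)

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(128, 128, 3)),
    layers.MaxPooling2D(),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # single output: cat (0) or dog (1)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(train, epochs=5)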

Told you it’s magic.


What makes deep learning ‘deep’, though? The origins of deep learning are older than modern computers. In 1943, McCulloch and Pitts published a paper[3] that posited a model of neural activity based on propositional logic. Spurred by the mid-20th century advances in understanding how the nervous system works, in particular how nerve cells are interconnected, McCulloch and Pitts simply drew the obvious conclusion: there is a way you can represent neural connections using propositional logic (and, actually, vice versa). But it wasn’t until 1958 that this idea was followed up in earnest. Rosenblatt’s ground-breaking paper[4] introduced this thing called the perceptron, something that sounds like the ideal robotic boyfriend/therapist but in fact was intended as a mathematical model for how the brain stores and processes information. A perceptron is a network of artificial neurons. Consider the cat/dog example. A simple single-layer perceptron has a list of input neurons x_1, x_2 and so on. Each of these describes a particular property. Does the animal have a snout? Does it go woof? Depending on how characteristic they are, they’re multiplied by a weight w_n. For instance, all dogs and no cats have snouts, so w_1 will be relatively high, while there are cats that do have long curly tails and dogs that don’t, so w_2 will be relatively low.

At the end, the output neuron (denoted by the big \Sigma) sums up these results, and gives an estimate as to whether it’s a cat or a dog.
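
In code, that single-layer perceptron really is just a weighted sum pushed through a threshold – a toy sketch, with features, weights and bias made up purely for illustration:

import numpy as np

# Illustrative features for one animal: [has a snout, goes woof, has a long curly tail]
x = np.array([1.0, 1.0, 0.0])
# Illustrative weights: w_1 is high (snouts are decisive), w_3 is low (tails are ambiguous)
w = np.array([0.6, 0.3, 0.1])
bias = -0.5

sigma = np.dot(w, x) + bias           # the big Sigma: the weighted sum
print("dog" if sigma > 0 else "cat")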

What was initially designed to model the way the brain works has soon shown remarkable utility in applied computation, to the point that the US Navy was roped into building an actual, physical perceptron machine – the first application of computer vision. However, it was a complete bust. It turned out that a single layer perceptron couldn’t really recognise a lot of patterns. What it lacked was depth.

What do we mean by depth? Consider the human brain. The brain actually doesn’t have a single part devoted to vision. Rather, it has six separate areas[5] – the striate cortex (V1) and the extrastriate areas (V2-V6). These form a feedforward pathway of sorts, where V1 feeds into V2, which feeds into V3 and so on. To massively oversimplify: V1 detects optical features like edges, which it feeds on to V2, which breaks these down into more complex features: shapes, orientation, colour &c. As you proceed towards the back of the head, the visual centres detect increasingly complex abstractions from the simple visual information. What was found is that by putting layers and layers of neurons after one another, even very complex patterns can be identified accurately. There is a hierarchy of features, as the facial recognition example below shows.

The first hidden layer recognises simple geometries and blobs at different parts of the image. The second hidden layer fires if it detects particular manifestations of parts of the face – noses, eyes, mouths. Finally, the third layer fires if it ‘sees’ a particular combination of these. Much like an identikit image, a face is recognised because it contains parts of a face, which in turn are recognised because they contain a characteristic spatial alignment of simple geometries.

There’s much more to deep learning than what I have tried to convey in a few paragraphs. The applications are endless. With the cost of computing decreasing rapidly, deep learning applications have now become feasible in just about all spheres where they can be applied. And they excel everywhere, outpacing not only other machine learning approaches (which makes me absolutely stoked about the future!) but, at times, also humans.


Which leads me back to Miyazaki. You see, deep learning doesn’t just classify things or predict stock prices. It can also create stuff. To put an old misunderstanding to rest quite early: generative neural networks genuinely create new things. Rather than merely combining pre-programmed elements, they come as close as anything non-human can come to creativity.

The pinnacle of it all, generating enjoyable music, is still some ways off, and we have yet to enjoy a novel written by a deep learning engine. But to anyone who has been watching the rapid development of deep learning and especially generative algorithms based on deep learning, these are literally just questions of time.

Or perhaps, as Miyazaki said, questions of the ‘end of times’.

What sets a computer-generated piece apart from a human’s composition? Someday, they will be, as far as quality is concerned, indistinguishable. Yet something that will always set them apart is the absence of a creator.

In what is probably one of the worst written essays in 20th century literary criticism, a field already overflowing with bad prose for bad prose’s sake – his 1967 La mort de l’auteur – Roland Barthes posited a sort of separation between the author and the text, countering centuries of literary criticism that sought to explain the meaning of the latter by reference to the former. According to Barthes, texts (and so, compositions, paintings &c.) have a life and existence of their own. To liberate works of art from an ‘interpretive tyranny’ that is almost self-explanatorily imposed on them, they must be read, interpreted and understood by reference to their audience and not their author. Indeed, Barthes eschews the term ‘author’ in favour of ‘scriptor’, the latter hearkening back to the Medieval monks who copied manuscripts: like them, the scriptor is not in control of the narrative or work of art that he or she composes. Devoid of the author’s authority, the work of art is now free to exist in a liberated state that allows you – the recipient – to establish its essential meaning.

Oddly, that’s not entirely what post-modernism seems to have created. If anything, there is now an increased focus on the author, at the very least in one particular sense. Consider the curious case of Wagner’s works in Israel. Because of his anti-Semitic views, as well as, arguably, the favour his music found during the tragic years of the Third Reich, Wagner’s works – even those that do not even remotely express a political position – are rarely played in Israel. Even in recent years, other than Holocaust survivor Mendi Roman’s performance of Siegfried in 2000, there have been very few instances of Wagner played in Israel – despite the curious fact that Theodor Herzl, founder of Zionism, admired Wagner’s music (if not his vile racial politics). Rather than the death of the author, we more often witness the death of the work. The taint of the author’s life comes to haunt the chords of their compositions and the stanzas of their poetry, every brush-stroke of theirs forever imbued with the often very human sins and mistakes of their lives.

Less dramatic, perhaps, than Wagner’s case are the increasingly frequent boycotts, outbursts and protests against works of art solely based on the character of the author or composer. One need only look at the recent past to see protests, for instance, against the works of HP Lovecraft, themselves having to do more with eldritch horrors than racist horridness, due to the author’s admittedly reprehensible views on matters of race. Outrages about one author or another, one artist or the next, are commonplace, acted out on a daily basis on the Twitter gibbets and the Facebook pillory. Rather than the death of the author, we experience the death of art, amidst an increasingly intolerant culture towards the works of flawed or sinful creators.

This is, of course, not to excuse any of those sins or flaws. They should not, and cannot, be excused. Rather, perhaps, it is to suggest that part of a better understanding of humanity is that artists are a cross-section of us as a species, equally prone to be misled and deluded into adopting positions that, as the famous German anti-Fascist and children’s book author Erich Kästner said, ‘feed the animal within man’. Nor is this to condone or justify art that actively expresses those reprehensible views – an entirely different issue. Rather, I seek merely to draw attention to the increased tendency to condemn works of art for the artist’s political sins. In many cases, these sins are far from being as straightforward as Lovecraft’s bigotry and Wagner’s anti-Semitism. In many cases, these sins can be as subtle as going against the drift of public opinion, the Orwellian sin of ‘wrongthink’. With the internet having become a haven of mob mentality (something I personally was subjected to a few years ago), the threshold of what sins  of the creator shall be visited upon their creations has significantly decreased. It’s not the end of days, but you can see it from here.

In which case perhaps Miyazaki is right.

Perhaps what we need is art produced by computers.


As Miyazaki-san said, we are losing faith in ourselves. Not in our ability to create wonderful works of art, but in our ability to measure up to some flawless ethos, to some expectation of the artist as the flawless being. We are losing faith in our artists. We are losing faith in our creators, our poets and painters and sculptors and playwrights and composers, because we fear that the inevitable revelation of greater – or perhaps lesser – misdeeds or wrongful opinions from their past will not merely taint them: it will no less taint us, the fans and aficionados and cognoscenti. Put not your faith in earthly artists, for they are fickle, and prone to having opinions that might be unacceptable, or be seen as such someday. Is it not a straightforward response, then, to declare one’s love for the intolerable synthetic Baroque of Stanford machine learning genius Cary Kaiming Huang’s research? In a society where the artist’s sins taint the work of art and, through that, all those who confessed to enjoying his works, there’s no other safe bet. Only the AI can cast the first stone.

And if the cost of that is truly the chirps of Cary’s synthetic Baroque generator, Miyazaki is right on the other point, too. It truly is the end of days.

References

1. Least of all because I know how rudimentary and lame his work is. I’ve built evolutionary models of locomotion where the first stages look like this. There’s no cutting edge science here.
2. There’s a whole aspect of the story called feature extraction, which I will ignore for the sake of simplicity, and assume that it just happens. It doesn’t, of course, and it plays a huge role in identifying things, but this story is complex enough already as it is.
3. McCulloch, W and Pitts, W (1943). A Logical Calculus of Ideas Immanent in Nervous Activity. Bulletin of Mathematical Biophysics 5 (4): 115–133. doi:10.1007/BF02478259.
4. Rosenblatt, F. (1958). The Perceptron: A Probabilistic Model For Information Storage And Organization In The Brain. Psych Rev 65 (6): 386–408. doi:10.1037/h0042519
5. Or five, depending on whether you consider the dorsomedial area a separate area of the extrastriate cortex

How I predicted Trump’s victory

Introit

“Can you, just once, explain it in intelligible words?”, my wife asked.

We had been talking about American politics for about an hour, and I had made a valiant effort at explaining to her how my predictive model for the election worked, what it took into account and what it did… but twenty minutes in, I was torn between either using terms like stochastic gradient descent and confusing her, or having to build everything up from high school times tables onwards.

Now, my wife is no dunce. She is one of the most intelligent people I’ve ever had the honour to encounter, and I’ve spent years moving around academia and industry and science. She’s not only a wonderful artist and a passionate supporter of the arts, she’s also endowed with that clear, incisive intelligence that can whittle down the smooth, impure rock of a nascent theory into the Koh-I-Noor clarity of her theoretical work.

Yet, the fact is, we’ve become a very specialised industry. We, who are in the business of predicting the future, now do so with models that are barely intelligible to outsiders – some of them barely intelligible even to those who do not share the author’s subfield (I’m looking at you, my fellow topological analytics theorists!). Quite frankly, then: the world is run by algorithms that at best a fraction of us understand.

So when asked to write an account of how I predicted Trump’s victory, I’ve tried to write one for a ‘popular audience’.[1] That means there’s more I want to get across than the way I built some model that for once turned out to be right. I also want to give you an insight into a world that’s generally pretty well hidden behind a wall made of obscure theory, social anxiety and plenty of confusing language. The latter, in and of itself, takes some time and patience to whittle down. People have asked me repeatedly what this support vector machine I was talking about all the time looked like, and were disappointed to hear it was not an actual machine with cranks and levers, just an algorithm. And the joke is not really on them, it’s largely on us. And so is the duty to make ourselves intelligible.

Prelude

I don’t think there’s been a Presidential election as controversial as Trump’s in recent history. Certainly I cannot remember any recent President having aroused the same sort of fervent reactions from supporters and opponents alike. To a quintessentially apolitical person like me, that was the kind of odd that attracts data scientists like flies. And so, about a year ago, amidst moving stacks of boxes into my new office, I thought about modelling the outcome of the US elections.

It was a big gamble, and it was a game for a David with his sling. Here I was, with a limited (at best) understanding of the American political system, not much access to private polls the way major media and their court political scientists have, and generally having to rely on my own means to do it. I had no illusions about the chances.

After the first debate, I tweeted this:

Also, as so many asked: post debate indicators included, only 1 of over 200 ensemble models predict a HRC win. Most are strongly Trump win.

– Chris (@DoodlingData), September 28, 2016

To recall, this was a month and a half ago, and chances for Trump looked dim. He was assailed from a dozen sides. He was embroiled in what looked at the time like the largest mass accusation of sexual misconduct ever levelled against a candidate. He had, as many on the right and left were keen on pointing out, “no ground game”, polling unanimously went against him, and I was fairly sure dinner on 10 November at our home would include crow.

But then, I had precious little to lose. I was never part of the political pundits’ cocoon, nor did I ever have a wish to be so. There’s only so much you can offer a man in consideration of a complete commonsensectomy. I do, however, enjoy playing with numbers – even if it’s a Hail Mary pass of predicting a turbulent, crazy election.

I’m not alone with that – these days, the average voter is assailed by a plethora of opinions, quantifications, pontifications and other -fications about the vote. It’s difficult to make sense of most of it. Some speak of their models with the same reverence with which one might once have invoked the name of the Pythiae of the Delphic Oracle. Others brashly assert that ‘math says’ one or the other party has ‘already won’ the elections, a month ahead. And I would entirely forgive anyone who thought that we are, all in all, a bunch of charlatans with slightly more high-tech dowsing rods and flashier crystal balls.

Like every data scientist, I’ve been asked a few times what I ‘really’ do. Do I wear a lab coat? I work in a ‘lab’, after all, so many deduced I would be some sort of experimental scientist. Or am I the Moneyball dude? Or Nate Silver?

Thankfully, none of that is true. I hate working in the traditional experimental science lab setting (it’s too crowded and loud for my tastes), I don’t wear a lab coat (except as a joke at the expense of one of my long-term harassers), I don’t know anything about baseball statistics and, thanks be to God, I am not Nate Silver.

I am, however, in the business of predicting the future. Which sounds very much like theorising about spaceships and hoverboards, but is in fact quite a bit narrower. You see, I’m a data scientist specialising in several fields of working with data, one of which is ‘predictive analytics’ (PA). PA emerged from combinatorics (glorified dice throwing), statistics (lies, damned lies and ~) and some other parts of math (linear algebra, topology, etc.), and altogether aims to look at the past and find features that might help predict the future. Over the last few years, this field has experienced an absolute explosion, thanks to a concept called machine learning (ML).

ML is another of those notions that evokes more passionate fear than understanding. In fact, when I explained to a kindly old lady with an abundance of curiosity that I worked in machine learning, she asked me what kind of machines I was teaching, and what I was teaching them – and whether I had taught children before. The reality is, we don’t sit around and read Moby Dick to our computers. Nor is ML some magic step towards artificial intelligence, like Cortana ingesting the entire Forerunner archives in Halo. No, machine learning is actually quite simple: it’s the art and science of creating applications that, at least when they work well, perform better each time than the time before.

It is high art and hard science. Most of modern ML is unintelligible without very solid mathematical foundations, and yet knowledge has never really been able to substitute for experience and a flair for constructing, applying and chaining mathematical methods to the point of accomplishing the best, most accurate result.

Wait, I haven’t talked about results yet! In machine learning, we have two kinds of ‘result’. We have processes we call ‘supervised learning’, where we give the computer a pattern and expect it to keep applying it. For instance, we give it a set (known in this context as the training set) of heart rhythm (ECG) tracings, and tell it which ones are fine and which ones are pathological. We then expect the computer to accurately label any heart rhythm we give to it.

There is also another realm of machine learning, called ‘unsupervised learning’. In unsupervised learning, we let the computer find the similarities and connections it wants to. One example would be giving the computer the same set of heart traces. It would then return what we call a ‘clustering’ – a group of heartbeats on one hand that are fine, and the pathological heartbeats on the other. We are somewhat less concerned with this type of machine learning. Electoral prediction is pretty much a straightforward supervised learning task, although there are interesting addenda that one can indeed do by leveraging certain unsupervised techniques. For instance, groups of people designated by certain characteristics might vote together, and a supervised model might be ‘told’ that a given number of people have to vote ‘as a block’.
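
To see the two flavours side by side, here is a toy sketch using scikit-learn on a synthetic two-class dataset – nothing to do with the election models, purely an illustration:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# Supervised: we hand the algorithm the labels and expect it to reproduce them
# on new data.
clf = LogisticRegression().fit(X, y)
print(clf.predict(X[:5]))

# Unsupervised: we only tell it how many groups to look for, never what they mean.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_[:5])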

These results are what we call ‘models’.

On models

Ever since Nate Silver allegedly predicted the Obama win, there has been a bit of a mystery-and-no-science-theatre around models, and how they work. Quite simply, a model is a function, like any other. You feed it source variables, it spits out a target variable. Like your washing machine:

f(C_d, W, E_{el}, P_w) = (C_c)

That is, put in dirty clothes (C_d ), water (W ), electricity (E_{el} ) and washing powder (P_w ), get clean clothes (C_c ) as a result. Simple, no?

The only reason why a model is a little different is that it is, or is supposed to be, based on the relationship between some real entities on each side of the equality, so that if we know what’s on the left side (generally easy-to-measure things), we can get what’s on the right side. And normally, models were developed in some way by reference to data where we do have both sides of the equation. An example is Henssge’s nomogram – a nomogram being a visual representation of certain physical relationships. That particular model was developed from hundreds, if not thousands, of (get your retching bag ready) butthole temperature measurements of dead bodies where the time of death actually was known. As I’m certain you know, when you die, you slowly assume room temperature. There are a million factors that influence this, and calculating the time since death from first principles could certainly break a supercomputer. And it would be accurate, but not much more accurate than Henssge’s method. Turns out, a gentleman called Claus Henssge discovered that three and a half factors are pretty much enough to estimate the time since death with reasonable accuracy: the ambient temperature, the aforementioned butthole temperature, the decedent’s body weight, and a corrective factor to take account of the decedent’s state of nakedness. Those factors altogether give you 95% or so accuracy – which is pretty good.

The Henssge nomogram illustrates two features of every model:

  1. They’re all based on past or known data.
  2. They’re all, to an extent, simplifications.

Now, traditionally, a model used to be built by people who reasoned deductively, then did some inductive stuff such as testing to assuage the more scientifically obsessed. And so it was with the Henssge nomogram, where data was collected, but everyone had a pretty decent hunch that time of death would correlate best with body weight and the difference between ambient and core (= rectal) temperature. That’s because heat transfer from a body to its environment generally depends on the temperature differential and the area of the surface of exchange:

Q = hA(T_a - T_b)

where Q is heat transferred per unit time, h is the heat transfer coefficient, A is the area of the object and T_a - T_b is the temperature difference. So from that, it then follows that T_a and T_b can be measured, h is relatively constant for humans (most humans are composed of the same substance) and A can be relatively well extrapolated from body weight.[2]
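
For the curious, the back-of-the-envelope argument in the footnote is easy to play with numerically – a quick sketch, treating the body as a homogeneous sphere with an entirely made-up density:

import math

def surface_area_from_mass(mass_kg, rho=1000.0):
    """Radius of a homogeneous sphere of the given mass and density, then its surface area."""
    r = ((3.0 * mass_kg) / (4.0 * math.pi * rho)) ** (1.0 / 3.0)
    return 4.0 * math.pi * r ** 2

print(surface_area_from_mass(80))  # roughly 0.9 m^2 for an 80 kg 'spherical human'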

The entire story of modelling can be understood to focus on one thing, and do it really well: based on a data set (the training set), it creates a model that seeks to describe the essence of the relationship between the variables involved in the training set. The simplest such relationships are linear: for instance, if the training set consists of {number of hamburgers ordered; amount paid}, the model will be a straight line – for every increase on the hamburger axis, there will be the same increase on the amount paid axis. Some models are more complex – when they can no longer be described as a combination of straight lines, they’re called ‘nonlinear’. And eventually, they get way too complex to be adequately plotted. That is often the consequence of the training dataset consisting not merely of two fields (number of hamburgers and the target variable, i.e. price), but a whole list of other fields. These fields are called elements of the feature vector, and when there are a lot of them, we speak of a high-dimensional dataset. The idea of a ‘higher dimension’ might sound mysterious, but true to fashion, mathematicians can make it sound boring. In data science, we regularly throw around data sets of several hundred or thousand dimensions or even more – so many, in fact, that there are whole techniques intended to reduce this number to something more manageable.

But just how do we get our models?

Building our model

In principle, you can sit down, think about a process and create a model based on some abstract simplifications and some other relationships you are aware of. That’s how the Henssge model was born – you need no experimental data to figure out that heat loss will depend on the radiating area, the temperature difference to ‘radiate away’ and the time the body has been left to assume room temperature: these things more or less follow from an understanding of how physics happens to work. You can then use data to verify or disprove your model, and if all goes well, you will get a result in the end.

There is another way of building models, however. You can feed a computer a lot of data, and have it come up with whatever representation gives the best result. This is known as machine learning, and is generally a bigger field than I could even cursorily survey here. It comes in two flavours – unsupervised ML, in which we let the computer loose on some data and hope it turns out ok, and supervised ML, in which we give the computer a very clear indication of what appropriate outputs are for given input values. We’re going to be concerned with the latter. The general idea of supervised ML is as follows.

  1. Give the algorithm a lot of value pairs from both sides of the function – that is, show the algorithm what comes out given a particular input. The inputs, and sometimes even the outputs, may be high-dimensional – in fact, in the field I deal with normally, known as time series analytics, thousands of dimensions of data are pretty frequently encountered. This data set is known as the training set.
  2. Look at what the algorithm came up with. Start feeding it some more data for which you know the ‘correct’ output, so to speak – data which you haven’t used as part of the training set. Examine how well your model is doing at predicting this test set.
  3. Tweak model parameters until you get closer to higher accuracy. Often, an algorithm called gradient descent is used, which is basically a fancy way of saying ‘look at whether changing a model parameter in a particular direction by \mu makes the model perform better, and if so, keep doing it until it doesn’t’. \mu is known as the ‘learning rate’, and determines on one hand how fast the model will get to the best possible approximation of the result (how fast the model will converge), and on the other, how close it will be to the true best settings. Finding a good learning rate is more dark art than science, but something people eventually get better at with practice. (A bare-bones sketch of this loop follows the list.)
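
Here is that loop in miniature – a toy model y = w * x fitted to made-up data, nudging the single parameter w against the gradient of the squared error:

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 8.1]    # made-up data, roughly y = 2x

w, mu = 0.0, 0.01            # initial guess and learning rate
for step in range(1000):
    # derivative of the mean squared error with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= mu * grad           # step against the gradient
print(w)                     # converges to roughly 2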

In this case, I was using a modelling approach called a backpropagation neural network. An artificial neural network (ANN) is basically a bunch of nodes, known as neurons, connected to each other. Each node runs a function on the input value and spits it out to its output. An ANN has these neurons arranged in layers, and generally nodes feed in one direction (‘forward’), i.e. from one layer to the next, and never among nodes in the same layer.

Neurons are connected by ‘synapses’ that are basically weighted connections (weighting simply means multiplying each input to a neuron by a value that emphasises its significance, so that these values all add up to 1). The weights are the ‘secret sauce’ to this entire algorithm. For instance, you may have an ANN set to recognise handwritten digits. The layers would get increasingly complex. So one node may respond to whether the digit has a straight vertical line. The output node for the digit 1 would weight the output from this node quite strongly, while the output node for 8 would weight it very weakly. Now, it’s possible to pick the functions and determine the weights manually, but there’s something better – an algorithm called backpropagation that, basically, keeps adjusting weights using gradient descent (as described above) to reach an optimal weighting, i.e. one that’s most likely to return accurate values.
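
To show the moving parts, here is a deliberately tiny network trained by backpropagation to learn XOR – nothing like the election models; the architecture, learning rate and data are arbitrary:

import numpy as np

rng = np.random.default_rng(42)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)    # XOR

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)      # input -> hidden synapses
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)      # hidden -> output synapses
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(20000):
    h = sigmoid(X @ W1 + b1)                       # forward pass
    out = sigmoid(h @ W2 + b2)
    d_out = (out - y) * out * (1 - out)            # error at the output...
    d_h = (d_out @ W2.T) * h * (1 - h)             # ...propagated back a layer
    W2 -= 0.5 * h.T @ d_out; b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * X.T @ d_h;   b1 -= 0.5 * d_h.sum(axis=0)

print(out.round(2))   # should end up close to [[0], [1], [1], [0]], though exact
                      # values depend on the random initialisation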

My main premise for creating the models was threefold.

  1. No polling. None at all. The explanation for that is twofold. First, I am not a political scientist. I don’t understand polls as well as I ought to, and I don’t trust things I don’t understand completely (and neither should you!). Most of all, though, I worry that polls are easy to influence. I witnessed the 1994 Hungarian elections, where the incumbent right-wing party won all polls and exit-poll surveys by a mile… right up until eventually the post-communists won the actual elections. How far that was a stolen election is a different question: what matters is that ever since, I have no faith at all in polling, and that hasn’t gotten better lately. Especially in the current elections, a stigma has developed around voting Trump – people have been beaten up, verbally assaulted and professionally ostracised for it. Clearly asking them politely will not give you the truth.
  2. No prejudice for or against particular indicators. The models were generated from a vast pool of indicators, and, to put it quite simply, a machine was created that looked for correlations between electoral results and various input indicators. I’m pretty sure many, even perhaps most, of those correlations were spurious. At the same time, spurious correlations don’t hurt a predictive model if you’re not intending to use the model for anything other than prediction.
  3. Assumed ergodicity. Ergodicity, quite simply, means that the average of an indicator over time is the same as the average of that indicator over space. To give you an example:[3] assume you’re interested in the ‘average price’ of shoes. You may either spend a day visiting every shoe store and calculate the average of their prices (the average over space), or you may swing past the window of the same shoe store on your way to work and look at the prices every day for a year or so (the average over time). If the price of shoes is ergodic, the two averages will be the same. I thus made a pretty big and almost certainly false assumption, namely that the effect of certain indicators on individual Senate and House races is the same as on the Presidency. As said, while this is almost certainly false, it did make the model a little more accurate, and it was the best model I could use for things for which I do not have a long history of measurements, such as Twitter prevalence.

One added twist was the use of cohort models. I did not want to pick one model to stake all on – I wanted to generate groups (cohorts) of 200 models each, wherein each would be somewhat differently tuned. Importantly, I did not want to create a ‘superteam’ of the best 200 models generated in different runs. Rather, I wanted to select the group of 200 models that is most likely to give a correct overall prediction, i.e. in which the actual outcome would most likely be the outcome predicted by the majority of the models. This allows for picking models where we know they will, ultimately, act together as an effective ensemble, and models will ‘balance out’ each other.
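
Mechanically, the ‘acting together’ part is nothing exotic: each model in the cohort casts a vote and the majority wins. The helper below is purely illustrative (models is assumed to be any collection of objects exposing a predict method) – the interesting part was not the vote itself but choosing which 200 models end up in the cohort:

from collections import Counter

def cohort_prediction(models, features):
    """Majority vote of a cohort: the outcome most of its models predict."""
    votes = Counter(model.predict(features) for model in models)
    return votes.most_common(1)[0][0]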

A supercohort of 1,000 cohorts of 200 models each was trained on electoral data since 1900. Because of the ergodicity assumption (as detailed above), the models included non-Presidential elections, but anything ‘learned’ from such elections was penalised. This is a decent compromise if we consider the need for ergodicity. For example, I have looked at the (normalised fraction[4] of the) two candidates’ media appearances and their volume of bought advertising, but mass media hasn’t always been around for the last 116 years in its current form. So I looked at the effect that this had on smaller elections. All variables were weighted to ‘decay’ depending on their age.
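
Two of those housekeeping details, sketched out – the decay schedule below is only a placeholder, not the one actually used in the models:

def normalised_fraction(a, b):
    """Divide the smaller value by the larger, so the result lies in (0, 1]."""
    return min(a, b) / max(a, b)

def age_weight(observation_year, reference_year=2016, half_life=24.0):
    """Exponentially down-weight evidence from older elections (illustrative only)."""
    return 0.5 ** ((reference_year - observation_year) / half_life)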

Tuning of model hyperparameters and deep architecture was attempted in two ways. I began with a classical genetic algorithm for tuning hyperparameters and architecture, aware that this was less efficient than gradient descent based algorithms but more likely to give a diversity of hyperparameters and far more suited to multi-objective systems. Compared with gradient descent algorithms, genetic algorithms took longer but performed better. This was an acceptable tradeoff to me, so I eventually adapted a multi-objective genetic algorithm implementation, drawing on the Python DEAP package and some (ok, a LOT of) custom code. Curiously (or maybe not – I recently learned this was a ‘well known’ finding – apparently not as well known after all!), the best models came out of ‘split training’: the convolutional layers and the overall structure were genetically optimised, while the non-convolutional layers were trained using backpropagation.
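
For the curious, the bare bones of a DEAP-driven genetic algorithm look roughly like the sketch below – a generic, single-objective skeleton with a placeholder fitness function, not the multi-objective setup described above. In a real pipeline, evaluate would decode the genome into hyperparameters, train a model and score it:

import random
from deap import base, creator, tools, algorithms

creator.create("FitnessMax", base.Fitness, weights=(1.0,))
creator.create("Individual", list, fitness=creator.FitnessMax)

toolbox = base.Toolbox()
toolbox.register("gene", random.random)
toolbox.register("individual", tools.initRepeat, creator.Individual, toolbox.gene, n=5)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)

def evaluate(individual):
    # Placeholder fitness: a real pipeline would build and score a model here.
    return (sum(individual),)

toolbox.register("evaluate", evaluate)
toolbox.register("mate", tools.cxTwoPoint)
toolbox.register("mutate", tools.mutGaussian, mu=0, sigma=0.2, indpb=0.2)
toolbox.register("select", tools.selTournament, tournsize=3)

population = toolbox.population(n=50)
algorithms.eaSimple(population, toolbox, cxpb=0.5, mutpb=0.2, ngen=20, verbose=False)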

Another twist was the use of ‘time contingent parameters’. That’s a fancy way of saying data that’s not available ab initio. An example would be post-debate changes in web search volumes for certain keywords associated with each candidate. Trivially, that information does not exist until a week or so post-debate. These models were therefore trained in ‘variants’: if a particular model had information missing, it defaulted to an equally weighted model without the nodes that would have required that information. Much as this was a hacky solution, it was acceptable to me, as I knew that by late October every model would have complete information.

I wrote a custom model runner in Python with an easy-as-heck output interface – I was not concerned with creating pretty, I was concerned with creating good. The runner first pulled all the data it required once again, diffed it against the previous version, reran feature extractors where there was a change, then ran the models over the feature vectors. Results went into CSV files and a simple console summary that looked like this (welcome to 1983):

CVoncsefalvay @ orinoco ~/Developer/mfarm/election2016 $ mrun --all

< lots of miscellaneous debug outputs go here >

[13:01:06.465 02 Nov 2016 +0000] OK DONE.
[13:01:06.590 02 Nov 2016 +0000] R 167; D 32; DNC 1
[13:01:06.630 02 Nov 2016 +0000] Output written to outputs/021301NOV2016.mconfdef.csv

That’s basically saying that (after spending the best part of a day scoring through all the models) 167 models were predicting a Republican victory, 32 a Democratic victory, and one model crashed, did not converge somewhere or otherwise broke. The CSV output file would then give further data about each submodel, such as predicted turnout, predictions of the electoral college and popular vote, etc. The model was run with a tolerance of 1%, i.e. up to two models could break and the run would still be acceptable. Any more than that, and a rerun would be initiated automatically. One cool thing: this was my first application using the Twilio API to send me messages keeping me up to date on the model. Yes, I know, the 1990s called, they want SMS messaging back.

By the end of the week, the first models had phoned back. I was surprised: was Trump really that far ahead? The polls had slammed him, he seemed hopeless, he was not exactly anyone’s idea of the next George Washington, and he ran against more money, more media and more political capital. I had to spend the best part of a weekend confirming the models, going over them line by line, doing tests and cross-validation, until I was willing to trust my models somewhat.

But part of our story in science is to believe evidence with the same fervour we disbelieve assertions without it. And so, after being unable to find the much expected error in my code and the models, I concluded they must be right.

Living with the models

The unique exhilaration, but also the most unnerving feature, of creating these models was how different they are from my day-to-day fare. When I write predictive models, the approach is, and remains, quintessentially iterative. We build models, we try them, and iteratively improve on them. It is dangerous to fall in love with one’s own models – today’s hero is in all likelihood destined for tomorrow’s dungheap, with another, better model taking its place – until that model, too, is discarded for a better approach, and so on. We do this because of the understanding that reality is a harsh taskmaster, and it always has some surprises in store for us. This is not to say that data scientists build and sell half-assed, flawed products – quite the opposite: we give you the best possible insight we can with the information we’ve got. But how reality pans out will give us more new information, and we can work with that to move another step closer to the elusive truth of predicting the future. And one day, maybe, we’ll get there. But every day, if we play the game well, we get closer.

Predicting a one-time event is different. You don’t get pointers as to whether you are on the right track or not. There are no subtle indications of whether the model is going to work or not. I have rarely had a problem sticking by a model I built that I knew was correct, because I knew every day that new information would either confirm or improve my model – and after all, turning out the best possible model is the important part, not getting it in one shot, right? It was unnerving to have a model built on fairly experimental techniques, with the world predicting a Clinton win with a shocking unanimity. There were extremely few who predicted a Trump win, and we all were at risk of being labelled either partisans for Trump (a rather hilarious accusation when levelled at me!) or just plain crackpots. So I pledged not to discuss the technical details of my models unless and until the elections confirmed they were right.

So it came to pass that it was me, the almost apolitical one, rather than my extremely clever and politically very passionate wife, who stayed up until the early hours of the morning, watching the results pour in. With CNN, Fox and Twitter over three screens, refreshing all the time, I watched as Trump surged ahead early and maintained a steady win.

My model was right.

Coda

It’s the 16th of November today. It’s been almost a week since the elections, and America is slowly coming to terms with the unexpected. It is a long process, it is a traumatic process, and the polling and ‘quantitative social science’ professions are, to an extent, responsible for this. There was sloppiness of all kinds, multiplication of received wisdom, ‘models’ that in fact were thin confirmations of the author’s prejudices in mathematical terms, and a great deal of stupidity. That does sound harsh, but there’s no better way really to describe articles that, weeks before the election, stated without a shade of doubt that we needed to ‘move on’, for Clinton had already won. I wonder if Mr Frischling had a good family recipe for crow? And on the note of election night menus, he may exchange tips with Dr Sam Wang, whom Wired declared 2016’s election data hero in an incredibly complimentary puff piece, apparently more on the basis that the author, Jeff Nesbit, hoped Wang was right than on any indication of analytical superiority.

The fact is, the polling profession failed America and has no real reason to continue to exist. The only thing it has done is make campaigns more expensive and add to the pay-to-play of American politics. I don’t really see myself crying salt tears at the polling profession’s funeral.

The jury is still out on the ‘quantitative social sciences’, but it’s not looking good. The ideological homogeneity in social science faculties worldwide, but especially in America, has contributed to the kind of disaster that happens when people live in a bubble. As scientists, we should never forget to sanity check our conclusions against our experiences, and intentionally cultivate the most diverse circle of friends we can to get as many little slivers of the human experience as we can. When one’s entire milieu consists of pro-Clinton academics, it’s hard to even entertain doubt about who is going to win – the availability heuristic is a strong and formidable adversary, and the only way to beat it is by recruiting a wide array of familiar people, faces, notions, ideas and experiences to rely on.

As I write this, I have an inch-thick pile of papers next to me: calculations, printouts, images, drafts of a longer academic paper that explains the technical side of all this in detail. Over the last few days, I’ve fielded my share of calls from the media – which was somewhat flattering, but this is not my field. I’m just an amateur who might have gotten very lucky – or maybe not.

Time will tell.

In a few months, I will once again be sharing a conference room with my academic brethren. We will discuss, theorise, ideate and exchange views; a long, vivid conversation written for a 500-voice chorus, with all the beauty and passion and dizzying heights and tumbling downs of Tallis’s Spem in Alium. The election featured prominently in those conversations last time, and no doubt that will be the case again. Many are, at least from an academic perspective, energised by what happened. Science is the only game where you actually want to lose from time to time. You want to be proven wrong, you want to see you don’t know anything, you want to be miles off, because that means there is still something else to discover, still some secrets this Creation conceals from our sight with sleights of hand and blurry mirrors. And so, perhaps the real winners are not those few, those merry few, who got it right this time. The real winners are those who, led by their curiosity about their failure to predict this election, find new solutions, new answers and, often enough, new puzzles.

That’s not a consolation prize. That’s how science works.

And while it’s cool to have predicted the election results more or less correctly, the real adventure is not the destination. The real adventure is the journey, and I hope that I have been able to grant you a little insight into this adventure some of us are on every hour of every day.

References

1. There is an academic paper with a lot more details forthcoming on the matter – incidentally, because republication is generally not permitted, it will contain many visualisations I was not able or allowed to put into this blog post. So just for that, it may be worth reading once it’s out. I will post a link to it here.
2. The reasoning here is roughly as follows. Assume the body is a sphere. All bodies are assumed of being made of the same material, which is also assumed to be homogenous. The volume of a sphere V = \frac{4}{3} \pi r^3 , and its weight is that multiplied by its density \rho . Thus the radius of a sphere of a matter of known density \rho can be calculated as r = \sqrt[3]{\frac{3}{4} \frac{M}{\pi \rho}} . From this, the surface area can be calculated (A = 4 \pi r^2 ). Thus, body weight is a decent stand-in for surface area.
3. I am indebted to Nassim Nicholas Taleb for this example.
4. Divide the smaller by the larger value, normalise to 1.

Give your Twitter account a memory wipe… for free.

The other day, my wife decided to get rid of all the tweets on one of her Twitter accounts, while of course retaining all the followers. But bulk deleting tweets is far from easy. There are, fortunately, plenty of tools that offer you the service of bulk deleting your tweets… for a cost, of course. One had a freemium model that allowed three free deletes per day. I quickly calculated that it would have taken my wife something on the order of twelve years to get rid of all her tweets. No, seriously. That’s silly. I can write some Python code to do that faster, can’t I?

Turns out you can. First, of course, you’ll need to create a Twitter app from the account you wish to wipe and generate an access token, since we’ll also be performing actions on behalf of the account.

import tweepy
import time

CONSUMER_KEY=<your consumer key>
CONSUMER_SECRET=<your consumer secret>
ACCESS_TOKEN=<your access token>
ACCESS_TOKEN_SECRET=<your access token secret>
SCREEN_NAME=<your screen name, without the @>

Time to use tweepy’s OAuth handler to connect to the Twitter API:

auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)

api = tweepy.API(auth)

Now, we could technically write an extremely sophisticated script, which looks at the returned headers to determine when we will be cut off by the API throttle… but we’ll use the easy and brutish route of holding off for a whole hour if we get cut off. At 350 requests per hour, each capable of deleting 100 tweets, we can get rid of a 35,000 tweet account in a single hour with no waiting time, which is fairly decent.
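(As an aside: if you would rather not hand-roll the throttling at all, newer tweepy releases can watch the rate-limit headers for you. A minimal sketch, assuming a tweepy 3.x install where these flags are available:)

# Let tweepy inspect the rate-limit headers and sleep automatically,
# instead of our blunt one-hour nap below. Assumes tweepy 3.x.
api = tweepy.API(auth,
                 wait_on_rate_limit=True,         # sleep until the window resets
                 wait_on_rate_limit_notify=True)  # print a note when it does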

The approach will be simple: we ask for batches of 100 tweets, then call the .destroy() method on each of them, which thanks to tweepy is now bound into the object representing every tweet we receive. If we encounter errors, we respond accordingly. If it’s a RateLimitError – an error object from tweepy that, as its name suggests, shows that the rate limit has been exceeded – we hold off for an hour (we could elicit the reset time from the headers, but this is much simpler… and we’ve got time!). If the API can’t find the status, we simply leap over it (sometimes that happens, especially when someone is doing some manual deleting at the same time). Otherwise, we break out of the loops.

def destroy():
    while True:
        # Fetch the next batch of (up to) 100 tweets from the timeline.
        q = api.user_timeline(screen_name=SCREEN_NAME,
                              count=100)
        if not q:
            # Nothing left on the timeline -- we're done.
            break
        for each in q:
            try:
                each.destroy()
            except tweepy.RateLimitError as e:
                print(u"Rate limit exceeded: {0}".format(e))
                time.sleep(3600)
            except tweepy.TweepError as e:
                # Sometimes a status vanishes under us (e.g. manual deletion);
                # just skip it and move on.
                if "No status found with that ID" in str(e):
                    continue
            except Exception as e:
                print(u"Encountered undefined error: {0}".format(e))
                return

Finally, we’ll make sure this is called as the module default:

if __name__ == '__main__':
    destroy()

Happy destruction!

Immortal questions

When asked for a title for his 1979 collection of philosophical papers, my all-time favourite philosopher1 Thomas Nagel chose the title Mortal Questions, an apt title, for most of our philosophical preoccupations (and especially those pertaining to the broad realm of moral philosophy) stem from the simple fact that we’re all mortal, and human life is as such an irreplaceable good. By extension, most things that can be created by humans are capable of being destroyed by humans.

That time is ending, and we need a new ethics for that.

Consider the internet. We all know it’s vulnerable, but is it existentially vulnerable?2 The answer is probably no. Neither would any significantly distributed self-provisioning pseudo-AI be. And by pseudo-AI, I don’t even mean a particularly clever or futuristic or independently reasoning system, but rather a system that can provision resources for itself in response to threat factors just as certain balancers and computational systems we write and use on a day to day basis can commission themselves new cloud resources to carry out their mandate. Based on their mandate, such systems are potentially existentially immortal/existentially indestructible.3

The human factor in this is that such a system will be constrained by the mandates we give it. Ergo,4 those mandates are as fit a subject for human moral reasoning as any other human action.

Which means we’re going to need that new ethics pretty darn’ fast, for there isn’t a lot of time left. Distributed systems, smart contracts, trustless M2M protocols, and the plethora of algorithms that each bring us a bit closer to a machine capable of drawing subtle conclusions from source data (hidden Markov models, 21st-century incarnations of fuzzy logic, certain sorts of programmatic higher-order logic and a few others) are all moving towards an expansion of what we as humans can create and the freedom we can give our applications. Who, even ten years ago, would have thought that one day I would be able to give a computing cluster my credit card so that, if it ran out of juice, it could commission additional resources until it bled me dry and I had to field angry questions from my wife? And that was a simple, dumb computing cluster. Can you teach a computing cluster to defend itself? Why the heck not, right?

Geeks who grew up on Asimov’s laws of robotics, myself included, think of this sort of problem as largely being one of giving the ‘right’ mandates to the system: overriding mandates to keep itself safe, not to harm humans,5 or the like. But any sufficiently well-written system will eventually grow to the level of the annoying six-year-old, who lives for the sole purpose of trying to twist and redefine his parents’ words to mean the opposite of what they intended.6 In the human world, a mandate takes place in a context. A writ is executed within a legal system. An order by a superior officer is executed according to the applicable rules of military justice, including circumstances when the order ought not be carried out. Passing these complex human contexts, which most of us ignore as we do all the things we grew up with and take for granted, into a more complicated model may not be feasible. Rules cannot be formulated exhaustively,7 as such a formulation by definition would have to encompass all past, present and future – all that potentially can happen. Thus, the issue soon moves on from merely providing mandates to what in the human world is known as ‘statutory construction’, or the interpretation of legislative works. How are computers equipped to reason about symbolic propositions according to rules that we humans can predict? In other words, how can we teach rules of reasoning about rules in a way that does not inherently recurse on this question (i.e. is not based on a simple conditional rule-based framework)?

Which means that the best that can be provided in such a situation is a framework based on values and on target-optimisation algorithms (i.e. what’s the best way to reach the overriding objective with the least damage to other objectives, and so on). Which in turn will require a good bit of rethinking of ethical norms.

But the bottom line is quite simple: we’re about to start creating immortals. Right now, you can put data on distributed file infrastructures like IPFS that is effectively impossible to destroy using a reasonable amount of resources. Equally, distributed applications via survivable infrastructures such as the blockchain, as well as smart contract platforms, are relatively immortal. The creation of these is within the power of just about everyone with a modicum of computing skills. The rise of powerful distributed execution engines for smart contracts, like Maverick Labs’ Aletheia Platform,8 will give a burst of impetus to systems’ ability to self-provision, enter into contracts, procure services and thus even effect their own protection (or destruction). They are incarnate, and they are immortal. For what it’s worth, man is steps away from creating his own brand of deities.9

What are the ethics of creating a god? What is right and wrong in this odd, novel context? What is good and evil to a device?

The time to figure out these questions is running out with merciless rapidity.

Title image: God the Architect of the Universe, Codex Vindobonensis 2554, f1.v

References

1. That does not mean I agree with even half of what he’s saying. But I do undoubtedly acknowledge his talent, agility of mind, style of writing, his knowledge and his ability to write good and engaging papers that have not yet fallen victim to the neo-sophistry dominating universities.
2. I define existential vulnerability as being capable of being destroyed by an adversary that does not require the adversary to accept an immense loss or undertake a nonsensically arduous task. For example, it is possible to kill the internet by nuking the whole planet, but that would be rather disproportionate. Equally, destruction of major lines of transmission may at best isolate bits of the internet (think of it in graph theory terms as turning the internet from a connected graph into a spanning acyclic tree), but it takes rather more to kill off everything. On the other hand, your home network is existentially vulnerable. I kill router, game over, good night and good luck.
3. As in, lack existential vulnerability.
4. According to my professors at Oxford, my impatience towards others who don’t see the connections I do has led me to try to make up for it by the rather annoying verbal tic of overusing ‘thus’ at the start of every other sentence. I wrote a TeX macro that automatically replaced it with neatly italicised ‘Ergo‘. Sometimes, I wonder why they never decided to drown me in the Cherwell.
5. …or at least not to harm a given list of humans or a given type of humans.
6. Many of these, myself included, are at risk of becoming lawyers. Parents, talk to your kids. If you don’t talk to them about the evils of law school, who will?
7. H.L.A. Hart makes some good points regarding this
8. Mandatory disclosure: I’m one of the creators of Aletheia, and a shareholder and CTO of its parent corporation.
9. For the avoidance of doubt: as a Christian, a scientist and a developer of some pretty darn complex things, I do not believe that these constructs, even if omnipotent, omniscient and omnipresent as they someday will be by leveraging IoT and surveillance networks, are anything like my capital-G God. For lack of space, there’s no way to go into an exhaustive level of detail here, but my God is not defined by its omniscience and omnipotence, it’s defined by his grace, mercy and love for us. I’d like to see an AI become incarnate and then suffer and die for the salvation of all of humanity and the forgiveness of sins. The true power of God, which no machine will ever come close to, was never as strongly demonstrated as when the child Jesus lay in the manger, among animals, ready to give Himself up to save a fallen, broken humanity. And I don’t see any machine ever coming close to that.

Actually, yes, you should sometimes share your talent for free.

Adam Hess is a ‘comedian’. I don’t know what that means these days, so I’ll give him the benefit of the doubt here and assume that he’s someone paid to be funny rather than someone living with their parents and occasionally embarrassing themselves at Saturday Night Open Mic. I came across his tweet from yesterday, in which he attempted some sarcasm aimed at an advertisement in which Sainsbury’s was looking for an artist who would, free of charge, refurbish their canteen in Camden.

Now, I’m married to an artist. I have dabbled in art myself, though with the acute awareness that I’ll never make a darn penny anytime soon given my utter lack of a) skills, b) talent. As such, I have a good deal of compassion for artists who are upset when clients, especially fairly wealthy ones, ask young artists and designers at the beginning of their career to create something for free. You wouldn’t tell a junior solicitor or a freshly qualified accountant to do your legal matters or your accounts for free to ‘gain experience’, ‘get some exposure’ and ‘perhaps get some future business’. It disregards the fact that artists, like members of any other profession, work for a living and have bills to pay.

Then there’s the reverse of the medal. I spend my life in a profession that has a whole culture of giving our knowledge, skills and time away for free. The result is an immense body of code and knowledge that is, I repeat, publicly available for free. Perhaps, if you’re not in the tech industry, you might want to stop and think about this for five minutes. The multi-trillion-dollar industry that is the internet and its associated revenue streams, from e-commerce through Netflix to, uh, porn (regrettably, a major source of internet-based revenue), relies for its very operation on software that people have built for no recompense at all, and/or which was open-sourced by large companies. Over half of all web servers globally run Apache or nginx, both having open-source licences.1 To put it in other words – over half the servers on the internet use software for which the creators are not paid a single penny.

The most widespread blog engine, WordPress, is open source. Most servers running SaaS products use an open-source OS, usually something *nix based. Virtually all programming languages are open-source – freely available and provided for no recompense. Closer to the base layer of the internet, the entire TCP/IP stack is open, as is BIND, the de facto gold standard for DNS servers.2 And whatever your field, chances are, there is a significant open source community in it.

Over the last decade and a bit, I have open-sourced quite a bit of code myself. That’s, to use Mr Hess’s snark, free stuff I produced to, among other things, ‘impress’ employers. A few years ago, I attended an interview for the data department of a food retailer. As a ‘show and tell’ piece, I brought them a client for their API that I had built and open-sourced over the days preceding the interview.3 They were ready to offer me the job right there and then. But it takes patience and faith – patience to understand that rewards for this sort of work are not immediate, and faith in one’s own skills to know that they will someday be recognised. That is, of course, not the sole reason – or even the main reason – why I open-source software, but I would lie if I pretended it was not sometimes at the back of my head.

At which point it’s somewhat ironic to see Mr Hess complain about an artist being asked to do something for free (and he wasn’t even approached – this is a public advertisement in a local fishwrap!) while using a software pipeline worth millions that people have built, and simply given away, for free, for the betterment of our species and our shared humanity.

Worse, it’s quite clear that this seems to be an initiative not by Sainsbury’s but rather by a few workers who want slightly nicer surroundings but cannot afford to pay for them. Note that it’s the staff canteen, rather than customer areas, that is to be decorated. At this point, Mr Hess sounds greedier than Sainsbury’s. Who, really, is ‘exploiting’ whom here?

In my business life, I would estimate the long-term return I get from work done free of charge at 200-300%. That includes, for the avoidance of doubt, people for whom I’ve done work who ended up not paying me anything at all, ever. I’m not sure how it works in comedy, but in the real world, occasionally doing something for someone else without demanding recompense is not only lucrative, it’s also beneficial in other ways:

  • It builds connections because it personalises a business relationship.
  • It builds character because it teaches the value of selflessness.
  • And it’s fun. Frankly, the best times I’ve had during my working career usually involved unpaid engagements, free-of-charge investments of time, open-source contributions or volunteer work.

The sad fact is that many, like Mr Hess, confuse righteous indignation about those who seek to profit off ‘young artists’ by exploiting them with the terrific, horrific, scary prospect of doing something for free just once in a blue moon.

Fortunately, there are plenty of young artists eager to show their skills who either have more business acumen than Mr Hess or more common sense than to publicly turn their noses up at the fearsome prospect of actually doing something they are [supposed to be] enjoying for free. As such, I doubt that the Camden Sainsbury’s canteen will go undecorated.

Of the 800 or so retweets, I see few who would heed a word of wisdom (the retweets are awash with remarks that are various degrees of confused, irate or just full of creative smuggity smugness), but for the rest, I’d venture the following:4

If you want to make a million dollars, you’ve got to first make a million people happy.

The much-envied wealth of Silicon Valley did not happen because they greedily demanded an hourly rate for every line of code they ever produced. It happened because of the realisation that we all are but dwarfs on the shoulders of giants, and ultimately our lives are going to be made not by what we secret away but by what others share to lift us up, and what we share to lift up others with.

You are the light of the world. A city seated on a mountain cannot be hid. Neither do men light a candle and put it under a bushel, but upon a candlestick, that it may shine to all that are in the house. So let your light shine before men, that they may see your good works, and glorify your Father who is in heaven.

Matthew 5:14-16

Title image: The blind Orion carries Cedalion on his shoulders, from Nicolas Poussin’s The Blind Orion Searching for the Rising Sun, 1658. Oil on canvas; 46 7/8 x 72 in. (119.1 x 182.9 cm), Metropolitan Museum of Art.

References

1. Apache and a BSD variant licence, respectively.
2. DNS servers translate verbose and easy-to-remember domain names to IP addresses, which are not that easy to remember.
3. An API is the way third-party software can communicate with a service. API wrappers or API clients are applications written for a particular language that translate the API to objects native to that language.
4. Credited to Dale Carnegie, but reportedly in use much earlier.

Diffie-Hellman in under 25 lines

How can you and I agree on a secret without anyone eavesdropping being able to intercept our communications? At first, the idea sounds absurd – for the longest time, without a pre-shared secret, encryption was seen as impossible. In World War II, the Enigma machines relied on a fairly complex pre-shared secret – the Enigma configurations (consisting of the rotor drum wirings and number of rotors specific to the model, the Ringstellung of the day, and Steckbrett configurations) were effectively the pre-shared key. During the Cold War, field operatives were provided with randomly (if they were lucky) or pseudorandomly (if they weren’t, which was most of the time) generated1 one-time pads (OTPs) with which to encrypt their messages. Cold War era Soviet OTPs were, of course, vulnerable because, like most Soviet things, they were manufactured sloppily.2 But OTPs are vulnerable to a big problem: if the key is known, the entire scheme of encryption is defeated. And somehow, you need to get that key to your field operative.

Enter the triad of Merkle, Diffie and Hellman, who in 1976 found a way to exploit the fact that modular exponentiation is easy to compute but hard to reverse – the discrete logarithm problem. From this, they derived the algorithm that came to be known as the Diffie-Hellman algorithm.3


How to cook up a key exchange algorithm

The idea of a key exchange algorithm is to end up with a shared secret without having to exchange anything that would require transmission of the secret. In other words, the assumption is that the communication channel is unsafe. The algorithm must withstand an eavesdropper knowing every single exchange.

Alice and Bob must first agree to use a modulus p and a base g, such that the base is a primitive root modulo the modulus.

Alice and Bob each choose a secret key a and b respectively – ideally, randomly generated. The parties then exchange A = g^a \mod(p) (for Alice) and B = g^b \mod(p) (for Bob).

Alice now has received B. She goes on to compute the shared secret s by calculating B^a \mod(p) and Bob computes it by calculating A^b \mod(p).

The whole story is premised on the equality of

A^b \mod(p) = B^a \mod(p)

That this holds nearly trivially true should be evident from substituting g^b for B and g^a for A. Then,

g^{ab} \mod(p) = g^{ba} \mod(p)

Thus, both parties get the same shared secret. An eavesdropper would be able to get A and B. Given a sufficiently large prime for p, in the range of 600-700 digits, the discrete logarithm problem of retrieving a from A = g^a \mod(p) in the knowledge of g and p is not efficiently solvable, not even given fairly extensive computing resources.
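To see the whole dance in runnable form, here is a toy Python sketch. The modulus and base below are small stand-ins I am assuming purely to keep the numbers readable (a real exchange uses standardised primes of 2048 bits and up, and a vetted generator), but the steps are exactly the ones described above.

import secrets

# Toy Diffie-Hellman sketch -- illustration only.
p = 0xFFFFFFFB  # assumed toy modulus (a small prime), far too small for real use
g = 5           # assumed base for this toy example

a = secrets.randbelow(p - 2) + 1   # Alice's secret exponent
b = secrets.randbelow(p - 2) + 1   # Bob's secret exponent

A = pow(g, a, p)   # Alice transmits A = g^a mod p
B = pow(g, b, p)   # Bob transmits   B = g^b mod p

s_alice = pow(B, a, p)   # Alice computes B^a mod p
s_bob   = pow(A, b, p)   # Bob computes   A^b mod p

assert s_alice == s_bob  # both sides now hold the same shared secret
print(hex(s_alice))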

References

1. As a child, I once built a pseudorandom number generator from a sound card, a piece of wire and some stray radio electronics, which basically rested on a sampling of atmospheric noise. I was surprised to learn much later that this was the method the KGB used as well.
2. Under pressure from the advancing German Wehrmacht in 1941, they had duplicated over 30,000 pages worth of OTP code. This broke the golden rule of OTPs of never, ever reusing code, and ended up with a backdoor that two of the most eminent female cryptanalysts of the 20th century, Genevieve Grotjan Feinstein and Meredith Gardner, on whose shoulders the success of the Venona project rested, could exploit.
3. It deserves noting that the D-H key exchange algorithm was another of those inventions that were invented twice but published once. In 1975, the GCHQ team around Clifford Cocks invented the same algorithm, but was barred from publishing it. Their achievements weren’t recognised until 1997.

What’s the value of a conditional clause?

No, seriously, bear with me. I haven’t lost my mind. Consider the following.

Joe, a citizen of Utopia, makes Ut$142,000 a year. In Utopia, you pay 25% on your first Ut$120,000 and from there on 35% on all earnings above that. Let’s calculate Joe’s tax bill.

Trivial, no? JavaScript:

var income = 142000;
var tax_to_pay = 0;

if(income <= 120000){
 tax_to_pay = income * 0.25;
} else {
 tax_to_pay = 30000 + (income - 120000) * 0.35;
}

console.log(tax_to_pay);

And Python:

income = 142000

if income <= 120000:
    tax_to_pay = income * 0.25
else:
    tax_to_pay = 30000 + (income - 120000) * 0.35

print(tax_to_pay)

And so on. Now let’s consider the weirdo in the ranks, Julia:

income = 142000

if income <= 120000
    tax_to_pay = income * 0.25
else
    tax_to_pay = 30000 + (income - 120000) * 0.35
end

Returns 37700.0 all right. But now watch what Julia can do and the other languages (mostly) can’t! The following is perfectly correct Julia code.

income = 142000

tax_to_pay = (if income <= 120000
                  income * 0.25
              else
                  30000 + (income - 120000) * 0.35
              end)

print(tax_to_pay)

This, too, will return 37700.0. Now, you might say, that’s basically no different from a ternary operator. Except unlike with ternary ops in most languages, you can actually put as much code there as you want and have as many side effects as your heart desires, while still being able to assign the result of the last calculation within the block that ends up getting executed to the variable at the head.
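For comparison, here is roughly where most languages stop. Python’s conditional expression handles the one-liner, but unlike Julia’s block form it only accepts expressions, so you cannot pile statements and side effects into either branch and still hand the last value back.

income = 142000

# Python's conditional *expression* covers the simple case...
tax_to_pay = income * 0.25 if income <= 120000 else 30000 + (income - 120000) * 0.35

# ...but it only accepts expressions: you cannot log, assign intermediate
# results or otherwise stack statements inside either branch the way the
# Julia block above does.
print(tax_to_pay)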

Now, that raises the question of what the value of a while expression is. Any guesses?

i = 0

digits = (while i < 10
                print(i)
                i = i + 1
          end)

Well, Julia says 0123456789 when executing it, so digits surely must be…

julia> digits


Wait, wat?! That must be wrong. Let’s type check it.

julia> typeof(digits)
Void

So there you have it. A conditional has a value, a while loop doesn’t… even if both return a value. Sometimes, you’ve gotta love Julia, kick back with a stiff gin and listen to Gary Bernhardt.

The sinful algorithm

In 1318, amidst a public sentiment that was less than enthusiastic about King Edward II, a young clerk from Oxford, John Deydras of Powderham, claimed to be the rightful ruler of England. He spun a long and rather fantastic tale that involved sows biting off the ears of children and other assorted lunacy.1 Edward II took much better to the pretender than his wife, the all-around badass Isabella of France, who was embarrassed by the whole affair, and Edward’s barons, who feared more sedition if they let this one slide. As such, eventually, Deydras was tried for sedition.

Deydras’s defence was that he had been convinced to engage in this charade by his cat, through whom the devil appeared to him.2 That did not meet with much leniency. It did, however, result in one of the facts that exemplified the degree to which medieval criminal jurisprudence was divorced from reason and reality: besides Deydras, his cat, too, was tried, convicted, sentenced to death and hanged, alongside his owner.

Before the fashionable charge of unreasonableness is brought against the Edwardian courts, let it be noted that other times and cultures have fared no better. In the later middle ages, it was fairly customary for urban jurisdictions to remove objects that had been involved in a crime beyond the city limits, giving rise to the term extermination (ex terminare, i.e., [being put] beyond the ends).3 The Privileges of Ratisbon (1207) allowed the house in which a crime took place or which harboured an outlaw to be razed to the ground – the house itself was as guilty as its owner.4 And even a culture as civilised and rationalistic as the Greeks fared no better, falling victim to the same surge of unreason. Hyde describes:

The Prytaneum was the Hôtel de Ville of Athens as of every Greek town. In it was the common hearth of the city, which represented the unity and vitality of the community. From its perpetual fire, colonists, like the American Indians, would carry sparks to their new homes, as a symbol of fealty to the mother city, and here in very early times the prytanis or chieftain probably dwelt. In the Prytaneum at Athens the statues of Eirene (Peace) and Hestia (Hearth) stood; foreign ambassadors, famous citizens, athletes, and strangers were entertained there at the public expense; the laws of the great law-giver Solon were displayed within it and before his day the chief archon made it his home.
One of the important features of the Prytaneum at Athens were the curious murder trials held in its immediate vicinity. Many Greek writers mention these trials, which appear to have comprehended three kinds of cases. In the first place, if a murderer was unknown or could not be found, he was nevertheless tried at this court. Then inanimate things – such as stones, beams, pieces of iron, etc. – which had caused the death of a man by falling upon him – were put on trial at the Prytaneum, and lastly animals, which had similarly been the cause of death.
Though all these trials were of a ceremonial character, they were carried on with due process of law. Thus, as in all murder trials at Athens, because of the religious feeling back of them that such crimes were against the gods as much as against men, they took place in the open air, that the judges might not be contaminated by the pollution supposed to exhale from the prisoner by sitting under the same roof with him.
(…)
[T]he trial of things, was thus stated by Plato:
“And if any lifeless thing deprive a man of life, except in the case of a thunderbolt or other fatal dart sent from the gods – whether a man is killed by lifeless objects falling upon him, or his falling upon them, the nearest of kin shall appoint the nearest neighbour to be a judge and thereby acquit himself and the whole family of guilt. And he shall cast forth the guilty thing beyond the border.”
Thus we see that this case was an outgrowth from, or amplification of the [courts’ jurisdiction trying and punishing criminals in absentia]; for if the murderer could not be found, the thing that was used in the slaying, if it was known, was punished.5


Looking at the current wave of fashionable statements about the evils of algorithms has reminded me eerily of the superstitious pre-Renaissance courts, convening in damp chambers to mete out punishments not only on people but also on impersonal objects. The same detachment from reality, from the Prytaneum through Xerxes’s flogging of the Hellespont to hanging cats for being Satan’s conduits, is emerging once again, in the sophisticated terminology of ‘systematized biases’:

Clad in the pseudo-sophistication of a man who bills himself as ‘one of the world’s leading thinkers‘, a wannabe social theorist with an MBA from McGill and a career full of buzzwords (everything is ‘foremost’, ‘agenda-setting’ or otherwise ‘ultimative’!) that now apparently qualifies him to discuss algorithms, Mr Haque makes three statements that have now become commonly accepted dogma among certain circles when discussing algorithms.

  1. Algorithms are means to social control, or at the very least, social influence.
  2. Algorithms are made by a crowd of ‘geeks’, a largely homogenous, socially self-selected group that’s mostly white, male, middle to upper middle class and educated to a Masters level.
  3. ‘Systematic biases’, by which I presume he seeks to allude to the concept of institutional -isms in the absence of an actual propagating institution, mean that these algorithms are reflective of various biases, effectively resulting in (at best) disadvantage and (at worst) actual prejudice and discrimination against groups that do not fit the majority demographic of those who develop code.

Needless to say, leading thinkers and all that, this is absolute, total and complete nonsense. Here’s why.


A geek’s-eye view of algorithms

We live in a world governed by algorithms – and we have ever since humans mastered basic mathematics. The Polynesian sailors navigating by the stars and the architects of Solomon’s Temple were no less using algorithms than modern machine learning techniques or data mining outfits are. Indeed, the very word itself is a transliteration of the name of the 9th-century Persian mathematician al-Khwarizmi.6 And for most of those millennia of unwitting and untroubled use of algorithms, there were few objections.

The problem is that algorithms now play a social role. What you read is determined by algorithms. The ads on a website? Algorithms. Your salary? Ditto. A million other things are algorithmically calculated. This has endowed the concept of algorithms with an air of near-conspiratorial mystery. You totally expect David Icke to jump out of your quicksort code one day.

Whereas, in reality, algorithms are nothing special to ‘us geeks’. They’re ways to do three things:

  1. Executing things in a particular order, sometimes taking the results of previous steps as starting points. This is called sequencing.
  2. Executing things a particular number of times. This is called iteration.
  3. Executing things based on a predicate being true or false. This is conditionality.

From these three building blocks, you can literally reconstruct every single algorithm that has ever been used. There. That’s all the mystery.
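To make that concrete, here is a deliberately mundane Python sketch; the prices and the discount rule are made up purely for illustration, but sequencing, iteration and conditionality each appear exactly once.

# Sequencing: steps happen in order, later steps using earlier results.
prices = [12.0, 7.5, 3.25]   # made-up inputs, purely for illustration
total = 0.0

# Iteration: do something a number of times (once per item here).
for price in prices:
    # Conditionality: act on a predicate being true or false.
    if price > 5.0:
        total = total + price * 0.9   # pretend bulk discount
    else:
        total = total + price

print(total)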

So quite probably, what people mean when they rant about ‘algorithms’ is not the concept of algorithms but particular types of algorithm. In particular, social algorithms, content filtering, optimisation and routing algorithms are involved there.

Now, what you need to understand is that geeks care relatively little about the real-world ‘edges’ of problems. This is not out of contempt or indifference, but rather a way to compartmentalise problems into manageable little bits. It’s easier to solve tiny problems and make sure the solutions can interoperate than to create a single, big solution that ends up never happening.

To put it this way: to us, most things, if not everything, is an interface. And this largely determines what it means when we talk about the performance of an algorithm.

Consider your washing machine: it can be accurately modelled in the following way.

The above models a washing machine. Given supply water and power, if you put in dirty clothes and detergent, you will eventually get clean clothes and grey water. And that is all.

Your washing machine is an algorithm of sorts. It’s got parameters (water, power, dirty clothes) and return values (grey water, clean clothes). Now, as long as your washing machine fulfils a certain specification (sometimes called a promise7 or a contract), according to which it will deliver a given set of predictable outputs to a given set of inputs, all will be well. Sort of.

“Sort of”, because washing machines can break. A defect in an algorithm is defined as ‘betraying the contract’, in other words, the algorithm has gone wrong if it has been given the right supply and yields the wrong result. Your washing machine might, however, fail internally. The motor might die. A sock might get stuck in it. The main control unit might short out.


Now consider the following (extreme simplification of an) algorithm. MD5 is what we call a cryptographic hash function. It takes something – really, anything that can be expressed in binary – and gives a 128-bit hash value. On one hand, it is generally impossible to invert the process (i.e. it is not possible to conclusively deduce what the original message was), while at the same time the same message will always yield the same hash value.

Without really having an understanding of what goes on behind the scenes,8 you can rely on the promise given by MD5. This is so in every corner of the universe. The value of MD5("Hello World!") is 0xed076287532e86365e841e92bfc50d8c in every corner of the universe. It was that value yesterday. It will be that value tomorrow. It will be that value at the heat death of the universe. What we mean when we say that an algorithm is perfect is that it upholds, and will uphold, its promise. Always.
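You can check that promise yourself in a couple of lines of Python – the digest printed below is the very value quoted above:

import hashlib

# The same input yields the same 128-bit digest, today and at the heat death
# of the universe.
print(hashlib.md5(b"Hello World!").hexdigest())
# -> ed076287532e86365e841e92bfc50d8c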

At the same time, there are aspects of MD5 that are not perfect. You see, the perfection of an algorithm is quite context-dependent, much as the world’s best, most ‘perfect’ hammer is utterly useless when what you need is a screwdriver. As such, for instance, we know that MD5 has to map every possible bit value of every possible length to a limited number of possible hash values (128 bits’ worth of values, to be accurate, which equates to 2^128 or approximately 3.4×10^38 distinct values). These seem like a lot, but are actually teensy when you consider that they are used to map every possible amount of binary data, of every possible length. As such, it is known that sometimes different things can have the same hash value. This is called a ‘collision’, and it is a necessary feature of all hash algorithms. It is not a ‘fault’ or a ‘shortcoming’ of the algorithm, any more than we regard the non-commutativity of division as a ‘shortcoming’.

Which is why it’s up to you, when you’re using an algorithm, to know what it can and cannot do. Algorithms are tools. Unlike the weird perception in Mr Haque’s swirl of incoherence, we do not worship algorithms. We don’t tend to sacrifice small animals to quicksort and you can rest assured we don’t routinely bow to a depiction of binary search trees. No more do we believe in the ‘perfection’ of algorithms than a surgeon believes in the ‘perfection’ of his scalpel or a pilot believes in the ‘perfection’ of their aircraft. Both know their tools have imperfections. They merely rely on the promise that if used with an understanding of its limitations, you can stake your, and others’, lives on it. That’s not tool-worship, that’s what it means to be a tool-using human being.


The Technocratic Spectre

We don’t know the name of the first human who banged two stones together to make fire, and became the archetype for Prometheus, but I’m rather sure he was rewarded by his fellow humans with very literally having his very literal and very non-regrowing liver torn out. Every advance in the history of humanity has had those who not merely feared progress and the new, but immediately saw seven kinds of nefarious scheming behind it. Beyond (often justified!) skepticism and a critical stance towards new inventions and a reserved approach towards progress (all valid positions!), there is always a caste of professional fearmongers who, after painting a spectre of disaster, immediately proffer the solution: which, of course, is giving them control over all things new, for they are endowed with the mythical talents one requires to be so presumptuous as to claim to be able to decide for others without even hearing their views.

The difference is that most people have become incredibly lazy. The result is a preference for fear over informed understanding, because the latter comes at the price of investing some time in reading up on the technologies that now play such a transformative role. How many Facebook users do you think have re-posted the “UCC 1-308 and Rome Statute” nonsense? And how many of them, you reckon, actually know how Facebook uses their data? While much of what they do is proprietary, the Facebook graph algorithms are partly primitives9 and partly open. If you wanted, you could, with a modicum of mathematical and computing knowledge, have a good stab at understanding what is going on. On the other hand, posting bad legalese is easier. Much easier.

And thus we have a degree of skepticism towards ‘algorithms’, mostly from people like Mr Haque who do not quite understand what they are talking about and are not actually referring to algorithms but to their social use.

And there lieth the Technocratic Spectre. It has always been a fashionable argument against progress, good or ill, that it is some mysterious machination by a scientific-technical elite aimed at the common man’s detriment. There is now a new iteration of this philosophy, and it is quite surprising how the backwards, low-information edges of the far right reach hands to the far left’s paranoid and misinformed segment. At least the Know-Nothings of the right live in an honest admission of ignorance, eschewing the over-blown credentials and inflated egos of their left-wing brethren like Mr Haque. But in ignorance, they both are one another’s match.

The left-wing argument against technological progress is an odd one, for the IT business, especially the part heavy on research and innovation that comes up with algorithms and their applications, is a very diverse and rather liberal sphere. Nor does this argument square too well with the traditional liberal values of upholding civil liberties, first and foremost freedom of expression and conscience. Instead, the objective seems to be an ever more expansive campaign, conducted entirely outside parliamentary procedure (basing itself on regulating private services from the inside and a goodly amount of shaming people into doing their will through the kind of agitated campaigning that I had never before had the displeasure of seeing in a democracy), of limiting the expression of ideas to a rather narrowly circumscribed set, with the pretense that some minority groups are marginalised and even endangered by wrongthink.10


Their own foray into algorithms has not fared well. One need only look at the misguided efforts of a certain Bay Area developer notorious for telling people to set themselves on fire. Her software, intended to block wrongthink on the weirder-than-weird cultural phenomenon of Gamergate by blocking Twitter users who follow a small number of acknowledged wrongthinkers, expresses the flaws of this ideology beautifully. Not only are subtlety and a good technical understanding lacking; there is also a distinct shortage of good common sense and, most of all, of an understanding of how to use algorithms. While terribly inefficient and horrendously badly written,11 the algorithm behind the GGAutoblocker is sound. It does what its creator intended it to do on a certain level: allow you to block everyone who is following controversial personalities. That this was done without an understanding of the social context (e.g. that this is a great way to block the uncommitted and those who wish to be as widely informed as possible) is, of course, the very point.

The problem is not with “geeks”.

The problem is when “geeks” decide to play social engineering. When they suddenly throw down their coding gear and decide they’re going to transform who talks with whom and how information is exchanged. The problem is exactly the opposite: it happens when geeks cease to be geeks.

It happens when Facebook experiments with users’ timelines without their consent. It happens when companies implement policies aimed at a really laudable goal (diversity and inclusion) that leads to statements by employees that should make any sane person shudder (You know who you are, Bay Area). It happens when Twitter decides they are going to experiment with their only asset. This is how it is rewarded.

[$TWTR share price chart]
Top tip: when you’ve got effectively no assets to speak of, screwing with your user base can be very fatal very fast.

The problem is not geeks seeing a technical solution to every socio-political issue.

The problem is a certain class of ‘geeks’ seeing a socio-political use to every tool.


Sins of the algorithm

Why algorithms? Because algorithms are infinitely dangerous: because they are, as I noted above, within their area of applicability universally true and correct.

But they’re also resilient. An algorithm feels no shame. An algorithm feels no guilt. You can’t fire it. You can’t tell it to set itself on fire or, as certain elements have done to me over a single statistical analysis, threaten to rape my wife and/or kill me. An algorithm cannot be guilted into ‘right-think’. And worst of all, algorithms cannot be convincingly presented as having an internal political bias. Quicksort is not Republican. R/B trees are not Democrats. Neural nets can’t decide to be homophobic.

And for people whose sole argumentation lies on the plane of politics, in particular grievance and identity politics, this is a devastating strike. Algorithms are the greased eels unable to be framed for the ideological sins that are used to attack and remove undesirables from political and social discourse. And to those who wish to govern this discourse by fear and intimidation, a bunch of code that steadfastly spits out results and to hell with threats is a scary prospect.

And so, if you cannot invalidate the code, you have to invalidate the maker. Algorithms perpetuate real equality by being by definition unable to exercise the same kind of bias humans do (not that they don’t have their own kind of bias, but the similarity ends with the word – if your algorithm has a racial or ethnic or gender bias, you’re using it wrong). Algorithms are meritocratic, being immune to nepotism and petty politicking. A credit scorer does not care about your social status the way Mr Jones at the bank might privilege the child of his golf partners over a young unmarried ethnic couple. Trading algorithms don’t care whether you’re a severely ill young man playing the markets from hospital.12 Without human intervention, algorithms have a purity and lack of bias that cannot easily be replicated once humans have touched the darn things.

And so, those whose stock in life is a thorough education in harnessing grievances for their own gain are going after “the geeks”.


Perhaps the most disgusting thing about Mr Haque’s tweet is the contraposition between “geeks” and “regular humans”, with the assumption that “regular humans” know all about algorithms and unlike the blindly algorithm-worshipping geeks, understand how ‘life is more complicated’ and algorithms are full of geeky biases.

For starters, this is hard to take seriously when in the same few tweets, Mr Haque displays a lack of understanding of algorithms that doesn’t befit an Oregon militia hick, never mind somebody who claims spurious credentials as a foremost thinker.

“Regular humans”, whatever they are that geeks aren’t (and really, I’m not one for geek supremacy, but if Mr Haque had spent five minutes among geeks, he’d know the difference is not what, and where, he thinks it is), don’t have some magical understanding of the shortcomings of algorithms. Heck, usually, they don’t have a regular understanding of algorithms, never mind magical. But it sure sounds good when you’re in the game of shaming some of the most productive members of society unless they contribute to the very problem you’re complaining about. For of course ‘geeks’ can atone for their ‘geekdom’ by becoming more of a ‘regular human’, by starting to engage in various ill-fated political forays that end with the problems that sent the blue bird into a dive on Friday.


Little of this is surprising, though. Anyone who has been paying attention could see the warning signs of a forced politicisation of technology, under the guise of making it more equal and diverse. In my experience, diverse teams perform better, yield better results, work a little faster, communicate better and make fewer big mistakes (albeit a few more small ones). In particular, gender-diverse and ethnically diverse teams are much more than the sum of their parts. This is almost universally recognised, and few businesses that have intentionally resisted creating diverse, agile teams have fared well in the long run.13 I’m a huge fan of diversity – because it lives up to a meritocratic ideal, one to which I am rather committed after I’ve had to work my way into tech through a pretty arduous journey.

Politicising a workplace, on the other hand, I am less fond of. Quite simply, it’s not our job. It’s not our job, because for what it’s worth, we’re just a bunch of geeks. There are things we’re better at. Building algorithms is one.

But they are now the enemy. And because they cannot be directly attacked, we’ll become targets. With the passion of a zealot, it will be taught that algorithms are not clever mathematical shortcuts but merely geeks’ prejudices expressed in maths.

And that’s a problem. If you look into the history of mathematics, most of it is peppered with people who held one kind of unsavoury view or another. Moore was a virulent racist. Pauli loved loose women. Half the mathematicians of the 20th century were communists at some point in their careers. Haldane thought Stalin was a great man. And I could go on. But I don’t, because it does not matter. Because they took part in the only truly universal human experience: discovery.


But discovery has its enemies and malcontents. The attitude they display, evidenced by Haque’s tweet too, is ultimately eerily reminiscent of the letter that sounded the death knell for the venerable pre-WW II German mathematical tradition. Titled Kunst des Zitierens (The Art of Citing), it was written in 1934 by Ludwig Bieberbach, a vicious anti-Semite and generally unpleasant character, who was obsessed with the idea of a ‘German mathematics’, free of the Hilbertian internationalism, of what he saw as Jewish influence, of the liberalism of the German mathematical community in the inter-war years. He writes:

“Ein Volk, das eingesehen hat, wie fremde Herrschaftsgelüste an seinem Marke nagen, wie Volksfremde daran arbeiten, ihm fremde Art aufzuzwingen, muss Lehrer von einem ihm fremden Typus ablehnen.”

 

“A people that has realised how foreign lusts for power gnaw at its marrow, how those alien to it work to impose a foreign manner upon it, must reject teachers of a type alien to it.”

Algorithms, and the understanding of what they do, protect us from lunatics like Bieberbach. His ‘German mathematics’, suffused with racism and Aryan mysticism, was no less delusional than the idea that a cabal of geeks is imposing a ‘foreign way’ of algorithmically implementing their prejudices, as if geeks actually cared about that stuff.

Every age will produce its Lysenko and its Bieberbach, and every generation has its share of zealots that demand ideological adherence and measure the merit of code and mathematics based on the author’s politics.

Like on Lysenko and Bieberbach, history will have its judgment on them, too.

Head image credits: Max Slevogt, Xerxes at the Hellespont (Allegory on Sea Power). Bildermann 13, Oct. 5, 1916. With thanks to the President and Fellows of Harvard College.

References

1. It is now more or less consensus that Deydras was mentally ill and made the whole story up. Whether he himself believed it or not is another question.
2. As an obedient servant to a kitten, I have trouble believing this!
3. Falcón y Tella, Maria J. (2014). Justice and law, 60. Brill Nijhoff, Leiden
4. Falcón y Tella, Maria J. and Falcón y Tella, Fernando (2006). Punishment and Culture: a right to punish? Nijhoff, Leiden.
5. Hyde, Walter W. (1916). The Prosecution and Punishment of Animals and Lifeless Things in the Middle Ages and Modern Times. 64 U.Pa.LRev. 696.
6. Albeit what we currently regard as the formal definition of an algorithm is largely informed by the work of Hilbert in the 1920s, Church’s lambda calculus and, eventually, the emergence of Turing machines.
7. I discourage the promise terminology here as I’ve seen it confuzzled with the asynchronous meaning of the word way too often
8. In case you’re interested, RFC1321 explains MD5’s internals in a lot of detail.
9. Building blocks commonly used that are well-known and well-documented
10. Needless to say, a multiple-times-over minority in IT, the only people who have marginalised and endangered me were these stalwart defenders of the right never to have to face a controversial opinion.
11. Especially for someone who declaims, with pride, her 15-year IT business experience…
12. It was a great distraction.
13. Not that any statement about this matter is not shut down by reference to ludicrous made-up words like ‘mansplaining’.