A deep learning

There are posts that are harder to write than others. This one perhaps has been one of the hardest. It took me the best part of four months and dozens of rewrites.

Because it’s about something I love. And about someone I love. And about something else I love. And how these three came into conflict. And, perhaps, what we all can learn from that.

As many of you might know, deep learning is my jam. Not in a faddish, ‘it’s what cool kids do these days’ sense. Nor, for that matter, in the sense so awfully prevalent in Silicon Valley, whereby the utility of something is measured in how many jobs it will get rid of, presumably freeing up humans to engage in more cerebral pursuits, or how it may someday cure intrinsically human problems if only those pesky humans were to listen to their technocratic betters for once. Rather, I’m a deep learning and AI researcher who believes in what he’s doing. I believe with all I am and all I’ve got that deep learning is right now our best chance to find better ways of curing cancer, producing more with fewer emissions, building structures that can withstand floods on a dime, identifying terrorists and, heck, creating entertaining stuff. I firmly believe that it’s one of the few intellectual pursuits I am somewhat suited for that is also worth my time, not least because I firmly believe that it will make me have more of it – and if not me, maybe someone equally worthy.

Which is why it was so hard for me to watch this video, of my lifelong idol Hayao Miyazaki ripping a deep learning researcher to shreds.

Now, quite frankly, I have little time for the researcher and his proposition. It’s badly made, dumb and pointless. Why one would inundate Miyazaki-san with it is beyond me. His putdown is completely on point, and not an ounce too harsh. All of his words are well deserved. As someone with a neurological chronic pain disorder that makes me sometimes feel like that creature writhing on the floor, I don’t have a shred of sympathy for this chap.[1]

Rather, it’s the last few words of Miyazaki-san that have punched a hole in my heart and have held my thoughts captive for months now, coming back into the forefront of my thoughts like a recurring nightmare.

“I feel like we are nearing the end of times,” he says, the camera gracefully hovering over his shoulder as he sketches through his tears. “We humans are losing faith in ourselves.”

Deep learning is something formidable, something incredible, something so futuristic yet so simple. Deep down (no pun intended), deep learning is really not much more than a combination of a few relatively simple tricks, some the best part of a century old, that together create something fantastic. Let me try to put it into layman’s terms (if you’re one of my fellow ML/AI nerds, you can just jump over this part).

Consider you are facing the arduous and yet tremendously important task of, say, identifying whether an image depicts a cat or a dog. In ML lingo, this is what we call a ‘classification’ task. One traditional approach used to be to define what cats are versus what dogs are, and provide rules. If it’s got whiskers, it’s a cat. If it’s got big puppy eyes, it’s, well, a puppy. If it’s got forward pointing eyes and a roughly circular face, it’s almost definitely a kitty. If it’s on a leash, it’s probably a dog. And so on, ad infinitum, your model of a cat-versus-dog becoming more and more accurate with each rule you add.
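To make the rule-based approach concrete, here is a minimal sketch in Python. The feature names, the scoring scheme and the tie-breaking choice are all assumptions made up for illustration, and the sketch presumes somebody has already worked out these features for each animal.

```python
# A hand-written rule set for cat-versus-dog, purely illustrative.
# Every feature name below is hypothetical; none comes from a real library.

def classify_by_rules(animal: dict) -> str:
    """Add up hand-written rules: positive favours 'dog', negative favours 'cat'."""
    score = 0
    if animal.get("has_whiskers"):
        score -= 1          # whiskers suggest a cat
    if animal.get("has_puppy_eyes"):
        score += 1          # big puppy eyes suggest a dog
    if animal.get("forward_eyes_round_face"):
        score -= 2          # "almost definitely a kitty", so a stronger rule
    if animal.get("on_leash"):
        score += 1          # probably a dog
    # Arbitrary assumption: a tie (score of zero) falls back to 'cat'.
    return "dog" if score > 0 else "cat"


rex = {"has_puppy_eyes": True, "on_leash": True}
tom = {"has_whiskers": True, "forward_eyes_round_face": True}
print(classify_by_rules(rex))
print(classify_by_rules(tom))
```

Every new rule means another `if` that somebody has to think of, write and weigh by hand.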

This is a fairly feasible approach, and is still used. In fact, there’s a whole school of machine learning called decision trees that relies on this kind of definition of your subjects. But there are three problems with it.

1. You need to know quite a bit about cats and dogs to be able to do this. At the very least, you need to be able to, and take the time and effort to, describe cats and dogs. It’s not enough to merely feed images of each to the computer.[2]
2. You are limited in time and ability to put down distinguishing features – your program cannot be infinitely large, nor do you have infinite time to write it. You must prioritise by identifying the factors with the greatest differentiating potential first. In other words, you need to know, in advance, what the most salient characteristics of cats versus dogs are – that is, what characteristics are almost omnipresent among cats but hardly ever occur among dogs (and vice versa)? All dogs have a snout and no cat has a snout, whereas some cats do have floppy ears and some dogs do have almost catlike triangular ears.
3. You are limited to what you know. Silly as that may sound, there might be some differentiae between cats and dogs that are so arcane, so mathematical that no human would think of them – but which might be trivially evident to a computer.

Deep learning, like friendship, is magic. Unlike most other techniques of machine learning, you don’t need to have the slightest idea of what differentiates cats from dogs. What you need is a few hundred images of each, preferably with a label. Labels are not strictly necessary, though: some models get by just fine without being told the names of the things they are sorting. As long as they’re told how many different classes to split the images into, they will find differentiating features on their own and split the images into ‘images with thing 1’ versus ‘images with thing 2’. Magic, right? Using modern deep learning libraries like TensorFlow and their high-level abstractions (e.g. Keras, TFLearn), you can literally write a classifier that identifies cats versus dogs with very high accuracy in less than 50 lines of Python. It will classify thousands of cat and dog pics in a fraction of a minute, most of which will be taken up by loading the images rather than the actual classification.

Told you it’s magic.
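The ‘no labels, just tell it how many groups’ trick can be sketched without any deep learning at all. What follows is plain k-means clustering with two groups, a much older and simpler technique that I am swapping in here only because it fits in a few lines. The two numbers per animal (ear pointiness, snout length) are made-up values, not real measurements, and the starting guess for the group centres is an arbitrary choice.

```python
# k-means with k = 2 in plain Python: split unlabelled points into two groups.
# This is NOT deep learning; it only illustrates grouping without labels.

def two_means(points, iterations=10):
    """Return two lists of points, grouped by nearest centre."""
    # Deterministic starting guess (an assumption): smallest and largest point.
    centres = [min(points), max(points)]
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for p in points:
            # Squared distance from this point to each centre.
            dists = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centres]
            groups[dists.index(min(dists))].append(p)
        # Move each centre to the mean of its group (keep it if the group is empty).
        centres = [
            (sum(p[0] for p in g) / len(g), sum(p[1] for p in g) / len(g)) if g else c
            for g, c in zip(groups, centres)
        ]
    return groups


# Hypothetical (ear_pointiness, snout_length) pairs; nobody tells the code which is which.
cats = [(0.9, 0.1), (0.8, 0.2), (0.85, 0.15)]
dogs = [(0.2, 0.9), (0.3, 0.8), (0.25, 0.85)]
thing_1, thing_2 = two_means(cats + dogs)
print(thing_1)
print(thing_2)
```

The code never sees the words ‘cat’ or ‘dog’. It only knows that there are two kinds of thing and sorts the points accordingly.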

What makes deep learning ‘deep’, though? The origins of deep learning are older than modern computers. In 1943, McCulloch and Pitts published a paper[3] that posited a model of neural activity based on propositional logic. Spurred by the mid-20th century advances in understanding how the nervous system works, in particular how nerve cells are interconnected, McCulloch and Pitts simply drew the obvious conclusion: there is a way you can represent neural connections using propositional logic (and, actually, vice versa). But it wasn’t until 1958 that this idea was followed up in earnest. Rosenblatt’s ground-breaking paper[4] introduced this thing called the perceptron, something that sounds like the ideal robotic boyfriend/therapist but in fact was intended as a mathematical model for how the brain stores and processes information. A perceptron is a network of artificial neurons. Consider the cat/dog example. A simple single-layer perceptron has a list of input neurons $x_1$, $x_2$ and so on. Each of these describes a particular property. Does the animal have a snout? Does it go woof? Depending on how characteristic they are, they’re multiplied by a weight $w_n$. For instance, all dogs and no cats have snouts, so $w_1$ will be relatively high, while there are cats that don’t have long curly tails and dogs that do, so $w_n$ will be relatively low.

At the end, the output neuron (denoted by the big $\Sigma$ ) sums up these results, and gives an estimate as to whether it’s a cat or a dog.
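In code, the output neuron’s job is a weighted sum, $\sum_n w_n x_n$, followed by a decision. The sketch below assumes three yes/no inputs (snout, woof, long curly tail) and uses hand-picked weights and a bias term that I made up for illustration. A real perceptron would learn those numbers from examples.

```python
# A single-layer perceptron's forward pass, with illustrative hand-set weights.

def perceptron_output(inputs, weights, bias=0.0):
    """Weighted sum of the inputs; a positive total is read as 'dog'."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return "dog" if total > 0 else "cat"


# Inputs: x1 = has a snout, x2 = goes woof, x3 = has a long curly tail (1 = yes, 0 = no).
# Snout and woof are highly characteristic, so they get high weights;
# the tail tells us little, so its weight is low. All values are assumptions.
weights = [2.0, 2.0, 0.1]
bias = -1.0

print(perceptron_output([1, 1, 0], weights, bias))
print(perceptron_output([0, 0, 1], weights, bias))
```

With these numbers, a snouted, woofing animal totals 3.0 and lands on the dog side, while a curly-tailed animal with neither totals -0.9 and lands on the cat side.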

What was initially designed to model the way the brain works soon showed remarkable utility in applied computation, to the point that the US Navy was roped into building an actual, physical perceptron machine – the first application of computer vision. However, it was a complete bust. It turned out that a single layer perceptron couldn’t really recognise a lot of patterns. What it lacked was depth.

What do we mean by depth? Consider the human brain. The brain actually doesn’t have a single part devoted to vision. Rather, it has six separate areas[5] – the striate cortex (V1) and the extrastriate areas (V2-V6). These form a feedforward pathway of sorts, where V1 feeds into V2, which feeds into V3 and so on. To massively oversimplify: V1 detects optical features like edges, which it feeds on to V2, which breaks these down into more complex features: shapes, orientation, colour &c. As you proceed along this pathway, the visual centres detect increasingly complex abstractions from the simple visual information. The same turned out to hold for artificial networks: by putting layers and layers of neurons after one another, even very complex patterns can be identified accurately. There is a hierarchy of features, as the facial recognition example below shows.

The first hidden layer recognises simple geometries and blobs at different parts of the zone. The second hidden layer fires if it detects particular manifestations of parts of the face – noses, eyes, mouths. Finally, the third layer fires if it ‘sees’ a particular combination of these. Much like an identikit image, a face is recognised because it contains parts of a face, which in turn are recognised because they contain a characteristic spatial alignment of simple geometries.
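A toy example shows why the extra layer matters. ‘Exclusive or’ (one input is on, but not both) is the textbook pattern that no single-layer perceptron can represent, yet two stacked layers handle it easily. The weights below are set by hand purely for illustration; nothing here is learned, and the function names are my own.

```python
# Two layers of simple threshold neurons computing XOR with hand-set weights.

def step(value):
    """Fire (1) if the weighted sum is positive, stay silent (0) otherwise."""
    return 1 if value > 0 else 0


def neuron(inputs, weights, bias):
    return step(sum(x * w for x, w in zip(inputs, weights)) + bias)


def two_layer_xor(x1, x2):
    # Hidden layer: two simple feature detectors.
    h_or = neuron([x1, x2], [1, 1], -0.5)     # fires if at least one input is on
    h_and = neuron([x1, x2], [1, 1], -1.5)    # fires only if both inputs are on
    # Output layer: combines the simple features into a more complex one.
    return neuron([h_or, h_and], [1, -2], -0.5)  # "or, but not and"


for a in (0, 1):
    for b in (0, 1):
        print(a, b, two_layer_xor(a, b))
```

The hidden neurons play the part of the ‘simple geometries’ above: each detects something trivial, and the next layer fires only on a particular combination of them.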

There’s much more to deep learning than what I have tried to convey in a few paragraphs. The applications are endless. With the cost of computing decreasing rapidly, deep learning applications have now become feasible in just about all spheres where they can be applied. And they excel everywhere, outpacing not only other machine learning approaches (which makes me absolutely stoked about the future!) but, at times, also humans.

Which leads me back to Miyazaki. You see, deep learning can do more than classify things or predict stock prices. It can also create stuff. To put an old misunderstanding to rest quite early: generative neural networks are genuinely creating new things. Rather than merely combining pre-programmed elements, they come as close as anything non-human can come to creativity.

The pinnacle of it all, generating enjoyable music, is still some ways off, and we have yet to enjoy a novel written by a deep learning engine. But to anyone who has been watching the rapid development of deep learning and especially generative algorithms based on deep learning, these are literally just questions of time.

Or perhaps, as Miyazaki said, questions of the ‘end of times’.

What sets a computer-generated piece apart from a human’s composition? Someday, they will be, as far as quality is concerned, indistinguishable. Yet something that will always set them apart is the absence of a creator.

In La mort de l’auteur (1967), probably one of the worst-written essays in 20th century literary criticism, a field already overflowing with bad prose for bad prose’s sake, Roland Barthes posited a sort of separation between the author and the text, countering centuries of literary criticism that sought to explain the meaning of the latter by reference to the former. According to Barthes, texts (and so, compositions, paintings &c.) have a life and existence of their own. To liberate works of art from an ‘interpretive tyranny’ that is almost self-explanatorily imposed on them, they must be read, interpreted and understood by reference to their audience and not their author. Indeed, Barthes eschews the term ‘author’ in favour of ‘scriptor’, the latter hearkening back to the Medieval monks who copied manuscripts: like them, the scriptor is not in control of the narrative or work of art that he or she composes. Devoid of the author’s authority, the work of art is now free to exist in a liberated state that allows you – the recipient – to establish its essential meaning.

Oddly, that’s not entirely what post-modernism seems to have created. If anything, there is now an increased focus on the author, at the very least in one particular sense. Consider the curious case of Wagner’s works in Israel. Because of his anti-Semitic views, arguably as well as due to the favour his music found during the tragic years of the Third Reich, Wagner’s works – even those that do not even remotely express a political position – are rarely played in Israel. Even in recent years, other than Holocaust survivor Mendi Rodan’s performance of the Siegfried Idyll in 2000, there have been very few instances of Wagner played in Israel – despite the curious fact that Theodor Herzl, founder of Zionism, admired Wagner’s music (if not his vile racial politics). Rather than the death of the author, we more often witness the death of the work. The taint of the authors’ lives comes to haunt the chords of their compositions and the stanzas of their poetry, every brush-stroke of theirs forever imbued with the often very human sins and mistakes of their lives.

Less dramatic, perhaps, than Wagner’s case are the increasingly frequent boycotts, outbursts and protests against works of art solely based on the character of the author or composer. One must only look at the recent past to see protests, for instance, against the works of HP Lovecraft, themselves having to do more with eldritch horrors than racist horridness, due to the author’s admittedly reprehensible views on matters of race. Outrages about one author or another, one artist or the next, are commonplace, acted out on a daily basis on the Twitter gibbets and the Facebook pillory. Rather than the death of the author, we experience the death of art, amidst a culture increasingly intolerant of the works of flawed or sinful creators.

This is, of course, not to excuse any of those sins or flaws. They should not, and cannot, be excused. Rather, perhaps, it is to suggest that part of a better understanding of humanity is that artists are a cross-section of us as a species, equally prone to be misled and deluded into adopting positions that, as the famous German anti-Fascist and children’s book author Erich Kästner said, ‘feed the animal within man’. Nor is this to condone or justify art that actively expresses those reprehensible views – an entirely different issue. Rather, I seek merely to draw attention to the increased tendency to condemn works of art for the artist’s political sins. In many cases, these sins are far from being as straightforward as Lovecraft’s bigotry and Wagner’s anti-Semitism. In many cases, these sins can be as subtle as going against the drift of public opinion, the Orwellian sin of ‘wrongthink’. With the internet having become a haven of mob mentality (something I personally was subjected to a few years ago), the threshold for which sins of the creator shall be visited upon their creations has dropped significantly. It’s not the end of days, but you can see it from here.

In which case perhaps Miyazaki is right.

Perhaps what we need is art produced by computers.

As Miyazaki-san said, we are losing faith in ourselves. Not in our ability to create wonderful works of art, but in our ability to measure up to some flawless ethos, to some expectation of the artist as the flawless being. We are losing faith in our artists. We are losing faith in our creators, our poets and painters and sculptors and playwrights and composers, because we fear that the inevitable revelation of greater – or perhaps lesser – misdeeds or wrongful opinions from their past shall not merely taint them: it shall no less taint us, the fans and aficionados and cognoscenti. Put not your faith in earthly artists, for they are fickle, and prone to having opinions that might be unacceptable, or be seen as such someday. Is it not a straightforward response then to declare one’s love for the intolerable synthetic Baroque of Stanford machine learning genius Cary Kaiming Huang’s research? In a society where the artist’s sins taint the work of art and through that, all those who confessed to enjoy his works, there’s no other safe bet. Only the AI can cast the first stone.

And if the cost of that is truly the chirps of Cary’s synthetic Baroque generator, Miyazaki is right on the other point, too. It truly is the end of days.

References

[1] Least of all because I know how rudimentary and lame his work is. I’ve built evolutionary models of locomotion where the first stages look like this. There’s no cutting edge science here.
[2] There’s a whole aspect of the story called feature extraction, which I will ignore for the sake of simplicity, and assume that it just happens. It doesn’t, of course, and it plays a huge role in identifying things, but this story is complex enough already as it is.
[3] McCulloch, W and Pitts, W (1943). A Logical Calculus of Ideas Immanent in Nervous Activity. Bulletin of Mathematical Biophysics 5 (4): 115–133. doi:10.1007/BF02478259.
[4] Rosenblatt, F. (1958). The Perceptron: A Probabilistic Model For Information Storage And Organization In The Brain. Psych Rev 65 (6): 386–408. doi:10.1037/h0042519.
[5] Or five, depending on whether you consider the dorsomedial area a separate area of the extrastriate cortex.

Adam Hess is a ‘comedian’. I don’t know what that means these days, so I’ll give him the benefit of the doubt here and assume that he’s someone paid to be funny rather than someone living with their parents and occasionally embarrassing themselves at Saturday Night Open Mic. I came across his tweet from yesterday, in which he attempted some sarcasm aimed at an advertisement in which Sainsbury’s was looking for an artist who would, free of charge, refurbish their canteen in Camden.

Now, I’m married to an artist. I have dabbled in art myself, though with the acute awareness that I’ll never make a darn penny anytime soon given my utter lack of a) skills, b) talent. As such, I have a good deal of compassion for artists who are upset when clients, especially fairly wealthy ones, ask young artists and designers at the beginning of their career to create something for free. You wouldn’t tell a junior solicitor or a freshly qualified accountant to do your legal matters or your accounts for free to ‘gain experience’, ‘get some exposure’ and ‘perhaps get some future business’. It invalidates the fact that artists are, like any other profession, working for a living and have got bills to pay.

Then there’s the reverse of the medal. I spend my life in a profession that has a whole culture of giving our knowledge, skills and time away for free. The result is an immense body of code and knowledge that is, I repeat, publicly available for free. Perhaps, if you’re not in the tech industry, you might want to stop and think about this for five minutes. The multi-trillion industry that is the internet and its associated revenue streams, from e-commerce through Netflix to, uh, porn (regrettably, a major source of internet-based revenue), relies for its very operation on software that people have built for no recompense at all, and/or which was open-sourced by large companies. Over half of all web servers globally run Apache or nginx, both having open-source licences.[1] To put it in other words – over half the servers on the internet use software for which the creators are not paid a single penny.

The most widespread blog engine, WordPress, is open source. Most servers running SaaS products use an open-source OS, usually something *nix based. Virtually all programming languages are open-source – freely available and provided for no recompense. Closer to the base layer of the internet, the entire TCP/IP stack is open, as is BIND, the de facto gold standard for DNS servers.[2] And whatever your field, chances are, there is a significant open source community in it.

Over the last decade and a bit, I have open-sourced quite a bit of code myself. That’s, to use Mr Hess’s snark, free stuff I produced to, among others, ‘impress’ employers. A few years ago, I attended an interview for the data department of a food retailer. As a ‘show and tell’ piece, I brought them a client for their API that I built and open-sourced over the days preceding the interview.[3] They were ready to offer me the job right there and then. But it takes patience and faith – patience to understand that rewards for this sort of work are not immediate and faith in one’s own skills to know that they will someday be recognised. That is, of course, not the sole reason – or even the main reason – why I open-source software, but I would lie if I pretended it was not sometimes at the back of my head.

At which point it’s somewhat ironic to see Mr Hess complain about an artist being asked to do something for free (and he wasn’t even approached – this is a public advertisement in a local fishwrap!) while using a software pipeline worth millions that people have built, and simply given away, for free, for the betterment of our species and our shared humanity.

Worse, it’s quite clear that this seems to be an initiative not by Sainsbury’s but rather by a few workers who want slightly nicer surroundings but cannot afford to pay for it. Note that it’s the staff canteen, rather than customer areas, that is to be decorated. At this point, Mr Hess sounds greedier than Sainsbury’s. Who, really, is ‘exploiting’ whom here?

In my business life, I would estimate the return I get from work done free of charge at 200–300% long term. That includes, for the avoidance of doubt, people for whom I’ve done work who ended up not paying me anything at all ever. I’m not sure how it works in comedy, but in the real world, occasionally doing something for someone else without demanding recompense is not only lucrative, it’s also beneficial in other ways:

• It builds connections because it personalises a business relationship.
• It builds character because it teaches the value of selflessness.
• And it’s fun. Frankly, the best times I’ve had during my working career usually involved unpaid engagements, free-of-charge investments of time, open-source contributions or volunteer work.

The sad fact is that many, like Mr Hess, confuse righteous indignation about those who seek to profit off ‘young artists’ by exploiting them with the terrific, horrific, scary prospect of doing something for free just once in a blue moon.

Fortunately, there are plenty of young artists eager to show their skills who either have more business acumen than Mr Hess or more common sense than to publicly thumb their noses at the fearsome prospect of actually doing something they are [supposed to be] enjoying for free. As such, I doubt that the Camden Sainsbury’s canteen will go undecorated.

Of the 800 or so retweets, I see few who would heed a word of wisdom (the retweets are awash with remarks that are various degrees of confused, irate or just full of creative smuggity smugness), but for the rest, I’d venture the following word of wisdom:[4]

If you want to make a million dollars, you’ve got to first make a million people happy.

The much-envied wealth of Silicon Valley did not happen because they greedily demanded an hourly rate for every line of code they ever produced. It happened because of the realisation that we all are but dwarfs on the shoulders of giants, and ultimately our lives are going to be made not by what we secrete away but by what others share to lift us up, and what we share to lift up others with.

You are the light of the world. A city seated on a mountain cannot be hid. Neither do men light a candle and put it under a bushel, but upon a candlestick, that it may shine to all that are in the house. So let your light shine before men, that they may see your good works, and glorify your Father who is in heaven.

Title image: The blind Orion carries Cedalion on his shoulders, from Nicolas Poussin’s The Blind Orion Searching for the Rising Sun, 1658. Oil on canvas; 46 7/8 x 72 in. (119.1 x 182.9 cm), Metropolitan Museum of Art.

References

[1] Apache and a BSD variant licence, respectively.
[2] DNS servers translate verbose and easy-to-remember domain names to IP addresses, which are not that easy to remember.
[3] An API is the way third-party software can communicate with a service. API wrappers or API clients are applications written for a particular language that translate the API to objects native to that language.
[4] Credited to Dale Carnegie, but reportedly in use much earlier.