The #juliabook is dead, long live the #juliabook!

If you are reading this, chances are you have been following my work on Julia in Action, formerly known as Learn Julia and affectionately referred to by me as the #juliabook, developed with Manning and currently in pre-publication access (MEAP). You might have been a reader of Learn Julia the Hard Way, which is the Github repository from which the book has emerged. Or you may even have bought the MEAP.

In the latter case, you might also know that Manning has decided not to proceed with publishing Julia in Action. They reached this decision after assessing the Julia market, the language, the community and the books about Julia already on the market. I believe this was the wrong decision, just as wrong as it was when it was first raised at the end of last year. Back then, I fought for the book and managed to save it, or at least win it a stay of execution. We agreed that Julia in Action would target Julia v0.8 and be delivered at this slower pace. But when, a few weeks ago, the publisher informed me that they were once again considering cancelling the book, I could no longer save it.

Coming full circle

It is important to recall the origins of this project, and where it all began.

When, a few years ago, I put together some of my own notes about learning the fledgling Julia programming language on Github, I couldn't possibly have imagined the wild ride that would eventually take me full circle, back to the same Github repository where it all began.

In the two years since I started that repository, it's been starred over 330 times, making it, according to my brief calculation, the seventh most popular Julia-related repo. Not bad for something that began its life as notes I typed up as I was playing around with Julia between two jobs. Before I knew it, I had a book deal from Manning to develop Learn Julia the Hard Way into Julia in Action.

A lot has happened in those two years. As the Julia language kept finding its sea-legs among technical programming languages, some of the initial enthusiasm for it waned. There are still basically no exclusively Julia jobs. There isn't a single major application or system that runs Julia. It hasn't replaced or supplanted Python and R to the extent many had hoped. But in what matters much more to me, it flourished: it created a great community, full of kind and supportive people and enthusiastic developers.

In those two years, I also marked my official five-year survival from HLH, making me statistically a 'survivor'. I've gone from strength to strength, even though I had a less than 10% chance of making it to that point without a relapse. In the meantime, I have also been diagnosed with multiple sclerosis. What I thought I'd shake off (after all, have I not survived much worse before?) turned out to be a bigger physical and emotional struggle than I expected. I also changed jobs and moved to a foreign country. Amidst all this, working on the #juliabook was one of the few fixed points in my life, a source of consistency and a constant 'thing to do'.

At risk of sounding emotional, this book is my baby. And I am not going to let it disappear into nothingness.

Future Present

I have negotiated a settlement with Manning that allows me to retain a range of rights in the text, including copyright in the manuscript and the visuals. I could, at this point, look for another publisher or knock on the door of a large publisher I know who are struggling with their Julia book and their authors. But the fact is that this has been the Community's book all along. It would have been dedicated to the two greatest sources of impetus for me to write about Julia: the Julia community, and my wonderful wife Katie. I intend to take this dedication to the community seriously.

For this reason, I have resolved to gradually merge the contents of the #juliabook and LJtHW, creating a much more extensive, well-illustrated, colourful and comprehensive online textbook on Julia that will be free to everyone (under the Creative Commons BY-NC-SA 4.0 license).

I believe this is not only the right thing to do towards the community, whose help has meant so much for me throughout, but also towards those who invested in the book by buying the MEAP edition. While you should be receiving a refund from Manning, I would like to acknowledge your help in making this book happen. For this reason, if you would like to be publicly acknowledged, please send me a message with proof of purchase of the MEAP, and I will make sure you are acknowledged for your support of this book.

Roadmap

I have spent much of the last few days working out the logistical details of making this happen. I have devised a plan that involves integrating what is currently written in both sources, revising it and then supplementing it with actionable examples in Jupyter notebooks.

Completion schedule for the #juliabook, with an estimated completion date of 15 March 2018.


The Community’s book

I believe very strongly in the freedom of information and in access to information about the technologies that run our lives – to everyone. The more I think about it, the more I see what an opportunity Manning’s decision to abandon the #juliabook has given to the Julia community itself. What we need is not another $50 book reflecting one guy’s perspective on the language, but rather a way for the community at large to co-create a tool that will be out there and available for all who wish to get started with Julia.

Many have recently commented on Julia’s doldrums. Some even went so far as to give up on it. And it’s true. Dan Luu is spot on in his criticism of Julia, so spot on that even John Myles White, the guy whose writings on Julia got me really interested in the language, agrees with him:

A small team of highly talented developers who can basically hold all of the code in their collective heads can make great progress while eschewing anything that isn’t just straight coding at the cost of making it more difficult for other people to contribute. Is that worth it? It’s hard to say. If you have to slow down Jeff, Keno, and the other super productive core contributors and all you get out of it is a couple of bums like me, that’s probably not worth it. If you get a thousand people like me, that’s probably worth it. The reality is in the ambiguous region in the middle, where it might or might not be worth it.

What if we could make that a thousand people? What if we could get more people involved who would not have to piece together what’s what in Julia? What if we could dramatically reduce the barriers to entry to Julia, and do so without the $50 price tag? If Manning abandoning my book means that I can be just a tiny part of that, then that e-mail I got the other day from my editor might just have been the best news I’ve received in a long, long time.

Using screen to babysit long-running processes

In machine learning, especially in deep learning, long-running processes are quite common. Just yesterday, an optimisation process of mine finished after running for the best part of four days, and that's on a 4-core machine with an Nvidia GRID K2, letting me crunch my data on 3,072 GPU cores! Of course, I did not want to babysit the whole process, least of all from my laptop. There's a reason we have tools like Sentry, which can easily be adapted from webapp monitoring to letting you know how your model is doing.

One solution is to spin up another virtual machine, ssh into that machine, then ssh from there into the machine running the code, so that if you lose your connection to the first machine, its connection to the second one stays up. There is also nohup, which makes sure that the process is not killed when you 'hang up' the ssh connection. You will, however, not be able to get back into the process again. There are also reparenting tools like reptyr, which move an already running process to a new terminal, but the need they meet is somewhat different. Enter terminal multiplexers.
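To see why nohup keeps a process alive but leaves you no way back in, it helps to know what it does: it starts the command with the hang-up signal (SIGHUP) ignored, and if stdout is a terminal, it sends the output to nohup.out instead. When an ssh connection drops, the shell, and through it the jobs it started, typically receive SIGHUP, whose default action is to terminate the process. The sketch below does the equivalent from inside a Python script; the helper name ignore_hangups is made up for illustration, and it only works on POSIX systems.

```python
import signal


def ignore_hangups():
    """Ignore SIGHUP, roughly as nohup does for the command it launches.

    Hypothetical helper. The process survives the terminal going away, but
    nothing connects it to a terminal any more, so there is no way to
    'get back into' it afterwards. Returns the previous handler so the
    caller can restore it.
    """
    return signal.signal(signal.SIGHUP, signal.SIG_IGN)


if __name__ == "__main__":
    ignore_hangups()
    # ... the long-running work would go here ...
```

This is exactly the gap a terminal multiplexer fills: the process keeps running, and you can still reattach to the terminal it is writing to.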

Terminal multiplexers are old. They date from the era of things like time-sharing systems and other antiquities whose purpose was to allow a large number of users to get their time on a mainframe designed to serve hundreds, even thousands of users. With the advent of personal computers that had decent computational power on their own, terminal multiplexers remained the preserve of universities and other weirdos still using mainframe architectures. Fortunately for us, two great terminal multiplexers, screen (aka GNU Screen) and tmux, are still being actively developed, and are almost certainly available for your *nix of choice. This gives us a convenient tool to sneak a peek at what's going on with our long-suffering process. Here's how.

Step 1
ssh into your remote machine and launch screen by typing screen. If screen, instead of starting up a new shell, returns [screen is terminating] and quits, you may need to run it with sudo (see Bugs below). If screen has started up correctly, you should see a slightly different shell prompt (and if you started it with sudo, you will now be logged in as root).
In some scenarios, you may want to 'name' your screen session. Typically, this is the case when you want to share your screen with another user, e.g. for pair programming. To create a named screen, invoke screen with the session name parameter -S, e.g. screen -S my_shared_screen.
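If you would rather skip the interactive part, screen can also create a named session that starts out detached, using -d -m together with -S. The sketch below wraps that in Python. The helper names, train.py and the ~/environments/research path are all hypothetical; the point is that calling the virtualenv's own python binary directly means no activate step is needed inside the new session.

```python
import os
import subprocess


def detached_screen_argv(session_name, command):
    """Build argv for running `command` in a new, named, detached screen session.

    -d -m  create a new session without attaching to it
    -S     name the session, as in `screen -S my_shared_screen`
    """
    return ["screen", "-dmS", session_name, *command]


def start_detached(session_name, command):
    """Start the session; requires screen to be installed on this machine."""
    subprocess.run(detached_screen_argv(session_name, command), check=True)


if __name__ == "__main__":
    # Hypothetical paths: the virtualenv's python uses the virtualenv
    # without any `source .../bin/activate`.
    python = os.path.expanduser("~/environments/research/bin/python")
    start_detached("my_shared_screen", [python, "train.py"])
```

One caveat with this shortcut: when the command exits, the screen session closes with it, so any final output that was not also written to a log is gone.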
Step 2
In this step, we will launch the actual script. If your script is Python-based and you are using virtualenv (as you ought to!), activate the environment now using source <virtualenv folder>/<virtualenv name>/bin/activate, replacing <virtualenv folder> by the folder where your virtualenvs live (for me, that's the environments folder; often enough it's something like ~/.virtualenvs) and <virtualenv name> by the name of your virtualenv (in my case, research). You should activate your virtualenv even if you have already done so outside of screen (remember, screen starts an entirely new shell, so aliases, shell functions and other per-shell settings do not carry over)!
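Because forgetting to activate the virtualenv inside screen is an easy mistake, a script can check for it before committing to hours of work. A minimal sketch, assuming Python 3.3 or later (where sys.base_prefix exists); in_virtualenv is a hypothetical helper name.

```python
import sys


def in_virtualenv():
    """Return True if this interpreter is running inside a virtual environment.

    With venv (and virtualenv 20+), sys.prefix points into the environment
    while sys.base_prefix still points at the base installation. Older
    virtualenv releases set sys.real_prefix instead.
    """
    return hasattr(sys, "real_prefix") or sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if not in_virtualenv():
        sys.exit("Not in a virtualenv: did you forget to activate it inside screen?")
```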

With your virtualenv activated, launch your script as normal; there is no need to run it in the background. Indeed, one of the big advantages of screen is that you can watch verbose progress indicators as they are printed. If your script does not log its progress to stdout but to a logfile, you can start it using nohup, put it into the background (Ctrl-Z, then bg) and track progress using tail -f logfile.log (where logfile.log is, of course, to be replaced by the filename of your logfile).
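Both ways of watching progress described above (live on stdout inside screen, or via tail -f on a logfile) can be covered by one logging setup. In the sketch below, run_with_progress is a hypothetical stand-in for your long-running job, with time.sleep in place of real work.

```python
import logging
import sys
import time


def run_with_progress(n_steps, logfile=None, every=1):
    """Hypothetical long-running job that reports progress as it goes.

    With logfile=None, progress goes to stdout, to be watched inside screen.
    With a logfile, progress is appended to that file, to be followed with
    `tail -f logfile.log`.
    """
    logger = logging.getLogger("progress")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines via the root logger
    handler = (logging.FileHandler(logfile) if logfile
               else logging.StreamHandler(sys.stdout))
    handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
    logger.addHandler(handler)
    try:
        for step in range(1, n_steps + 1):
            time.sleep(0.01)  # placeholder for one unit of real work
            if step % every == 0 or step == n_steps:
                logger.info("step %d/%d done", step, n_steps)
    finally:
        logger.removeHandler(handler)
        handler.close()  # closes the logfile; leaves sys.stdout open


if __name__ == "__main__":
    run_with_progress(100, every=10)
```

logging's stream and file handlers flush after every record, so new lines show up in screen or under tail -f right away instead of sitting in an output buffer. If you report progress with print() instead, passing flush=True has a similar effect.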
Step 3
Press Ctrl-A followed by Ctrl-D to detach from the current screen. This will take you back to your original shell after printing the address of the screen you're detaching from. These addresses follow the format <identifier>.<session id>. Here, <identifier> is an autogenerated numeric socket identifier (the process ID of the screen session). <session id> is the name you gave your screen or, if you did not name it, a default of the form <tty>.<hostname>. In general, the identifier or the session name on its own will be sufficient. The full name is only necessary when the shorter form matches more than one session.

To see a list of all screens running under your current username, enter screen -list. Refer to that listing or the address echoed when you detached from the screen to reattach to the process using screen -r <socket identifier>[.<session identifier>.<hostname>]. This will return you to the script, which keeps executing in the background.
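If you have several sessions running, the screen -list output can also be read programmatically, for instance to find the ones that are detached. The parser below is a sketch written against the output format I would expect from GNU Screen 4.x: one tab-indented <identifier>.<session id> line per session, optionally followed by a timestamp, and ending in a state such as (Detached). Other versions may format it differently. The function names are hypothetical, and subprocess.run with capture_output assumes Python 3.7 or later.

```python
import re
import subprocess

# Matches session lines such as "\t12345.my_shared_screen\t(Detached)" or
# "\t12345.pts-0.myhost\t(12/01/2017 10:00:00 AM)\t(Attached)".
SESSION_LINE = re.compile(r"^\s+(\d+)\.(\S+)\s.*\(([^()]*)\)\s*$")


def parse_screen_list(output):
    """Turn `screen -list` output into (identifier, session id, state) tuples."""
    sessions = []
    for line in output.splitlines():
        match = SESSION_LINE.match(line)
        if match:
            pid, name, state = match.groups()
            sessions.append((int(pid), name, state))
    return sessions


def list_sessions():
    # screen -list may exit with a non-zero status even when it works,
    # so the return code is deliberately not checked here.
    result = subprocess.run(["screen", "-list"], capture_output=True, text=True)
    return parse_screen_list(result.stdout)


if __name__ == "__main__":
    for pid, name, state in list_sessions():
        if state == "Detached":
            print(f"screen -r {pid}.{name}")
```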
Result
Once you have reattached to the process running in the background, you can follow the progress of the script. Use the key combination from Step 3 to step out of the process at any time, and screen -r to return to it.

Bugs
There is a known issue, caused by strace, that leads to screen immediately closing, with the message [screen is terminating] upon invoking screen as a non-privileged user.

There are generally two ways to resolve this issue.

The overall effect of both solutions is the same. Notably, both may be undesirable from a security perspective. As always, weigh risks against utility.

Do you prefer screen to staying logged in? Do you have any other cool hacks for monitoring a machine learning process that takes a long time to run? Let me know in the comments!

Image credits: Zenith Z-19 by ajmexico on Flickr