On the challenge of building HTTP REST APIs that don’t suck

Here’s a harsh truth: most RESTful HTTP APIs (hereafter simply APIs) suck, to some degree or another. Including the ones I’ve written.

Now, to an extent, this is not my fault, your fault or indeed anyone’s fault. APIs occupy a strange no man’s land between stuff designed for machines and stuff designed for humans. On one hand, APIs are intended to allow applications and services to communicate with each other. If humans want to interact with some service, they will do so via some wrapper around an API, be it an iOS app, a web application or a desktop client. Indeed, the tool you need to interact with APIs – an HTTP client – is orders of magnitude less well known and less ubiquitous than the web browser. Everybody has a web browser and knows how to use one. Few people have a dedicated desktop HTTP browser like Paw or know how to use something like curl. Quick, how do you do a token auth header in curl?

At the same time, even if the end user of the API is the under-the-hood part of a client rather than a human end user, humans have to deal with the API at some point, when they’re building whatever connects to the API. Tools like Swagger/OpenAPI were intended to somewhat simplify this process, and the idea was good – let’s have APIs generate a schema that they also serve up from which a generic client can then build a specific client. Except that’s not how it ended up working in practice, and in the overwhelming majority of cases, the way an API handler is written involves Dexedrine, coffee and long hours spent poring over the API documentation.

Which is why your API can’t suck completely. There’s no reason why your API can’t be a jumbled mess of methods from the perspective of your end user, who will interact with your service without ever needing to know what an API even is. That’s the beauty of it all. But if you want people to use your service – which you should very much want! – you’ll have to have an API that people can get some use out of.

Now, the web is awash with stuff about best practices in developing REST APIs. And, quite frankly, most of these are chock-full of good advice. Yes, use plural nouns. Use HATEOAS. Use the right methods. Don’t create GET methods that can change state. And so on.

But the most important thing to know about rules, as a former CO of mine used to say, is to know when to break them, and what the consequences will be when you do. There’s a philosophy of RESTful API design called pragmatic REST that acknowledges this to an extent, and uses the ideas underlying REST as a guideline, rather than strict, immutable rules. So, the first step of building APIs that don’t suck is knowing the consequences of everything you do. The problem with adhering to doctrine or rules or best practices is that none of that tells you what the consequences of your actions are, whether you follow them or not. That’s especially important when considering the consequences of not following the rules: not pluralizing your nouns and using GET to alter state have vastly different consequences. The former will piss off your colleagues (rightly so), the latter will possibly endanger the safety of your API and lead to what is sometimes referred to in the industry as Some Time Spent Updating Your LinkedIn & Resume.
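
To make the GET example concrete, here’s a minimal Flask sketch (Flask and the articles resource are my choices for illustration, nothing the rules above prescribe) of the difference between the two kinds of rule-breaking:

from flask import Flask

app = Flask(__name__)
articles = {1: "First post", 2: "Second post"}

# Don't: a GET that changes state. Anything that follows links
# (crawlers, prefetching browsers, monitoring scripts) can now
# silently delete data.
@app.route("/articles/<int:article_id>/delete")  # implicitly GET
def delete_article_badly(article_id):
    articles.pop(article_id, None)
    return "", 204

# Do: destructive operations go behind the method meant for them.
@app.route("/articles/<int:article_id>", methods=["DELETE"])
def delete_article(article_id):
    articles.pop(article_id, None)
    return "", 204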

Secondly, and you can take this from me – no rules are self-explanatory. Even with all the guidance in the world in your hand, there’s a decent chance I’ll have no idea why most of your code is the way it is. The bottom line being: document everything. I’d much rather have an API that breaks fifteen rules and gives doctrinaire rule-followers an apoplectic fit but is well documented than a super-tidy bit of best practices incarnate (wouldn’t that be incodeate, given that code is not strictly made of meat?) that’s missing any useful documentation, any day of the week. There are several ways to document APIs, and no strictly right one – in fact, I would use several different methods within the same project for different endpoints. A totally run-of-the-mill DELETE endpoint that takes an object UUID as an argument requires much less documentation than, say, a complex filtering interface that takes fifty different arguments, some of which may be mandatory. A few general principles have served me well in the past when it comes to documenting APIs:

  • Keep as much of the documentation as you can out of inline code comments and in the parts that actually make it into the generated documentation. In Python, for instance, that’s the docstring.
  • If your documentation toolchain allows it, put examples into the docstring. An example can easily be drawn from the tests, by the way, which makes it a twofer.
  • Don’t document for documentation’s sake. Document to help people understand. Avoid tedious, wordy explanation for a method that’s blindingly obvious to everyone.
  • Eschew the concept of ‘required’ fields, values, query parameters, and so on. Nothing is ‘required’ – the world will not end if a query parameter is not provided, and you will, at the very least, still be able to make the request. Rather, make it clear what the consequences of not providing the information will be (see the docstring sketch after this list). What happens if you do not enter a ‘required’ parameter? Merely calling it ‘required’ does not really tell me whether the call will crash, yield a cryptic error message or simply fail silently (which is something you should also try to avoid).
  • Where something must have a particular type of value (e.g. an integer), where a value has to be provided in a particular way (e.g. a Boolean encoded as 0/1 or True/False) or where there is a limited set of possible values (e.g. in an application tracking high school students, the year query parameter may only take the values ['freshman', 'sophomore', 'junior', 'senior']), make sure this is clearly documented, and make it clear whether the values are case sensitive or not.
  • Put things into inline comments only if they matter solely to someone reading your code. Anything a user of your methods/endpoints ought to know about them should be in the docstring or otherwise end up in your documentation – the docstring, of course, has the added benefit of being visible both to people reading the documentation and to whoever is reading your code.
  • If you envisage even the most remote possibility that your API will have to handle Unicode, emojis or other fancy things (basically, anything beyond ASCII), make sure you explain how your API handles such values.
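
To put several of the points above into one place, here is what that might look like in practice – a docstring sketch for an invented endpoint (the student-tracking example from above; names and behaviour are made up for illustration):

def list_students(year=None, page=1):
    """Return a paginated list of students.

    Parameters:

    year (str, optional): one of 'freshman', 'sophomore', 'junior'
        or 'senior', case insensitive. If omitted, students from
        all years are returned. Any other value returns a 400 with
        an explanatory message, rather than failing silently.
    page (int, optional): 1-indexed page of 50 results; defaults
        to the first page. Values past the last page return an
        empty list, not an error.

    Example:

    >>> list_students(year='junior', page=1)
    [{'name': 'Ada Lovelace', 'year': 'junior'}, ...]
    """
    ...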

Finally, eat your own dog food. Writing a wrapper for your API is not only a commercially sound idea (it is much more fun for other developers to just grab an API wrapper for their language of choice than to homebrew one), it’s also a great way to gauge how painful it is to work with your API. As long as that pain stays below a 6 on the 1–10 Visual Equivalent Scale of Painful and Grumpy Faces, you’ll be fine. And if you need to make changes, make any breaking change part of a new version. An API version string doesn’t necessarily mean the API cannot change at all, but it does mean you may not make breaking changes – any method, any endpoint and any argument that worked on day 0 of releasing v1 will have to keep working on v1, forever.
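
In practice, freezing v1 while work continues is mostly a routing question. A minimal sketch, assuming Flask blueprints (my choice of framework, not a requirement):

from flask import Flask, Blueprint, jsonify

v1 = Blueprint("v1", __name__, url_prefix="/v1")
v2 = Blueprint("v2", __name__, url_prefix="/v2")

# v1 is frozen: whatever worked on day 0 keeps working, forever.
@v1.route("/articles")
def articles_v1():
    return jsonify([{"id": 1, "title": "First post"}])

# Breaking changes (here, renamed fields) live only in v2.
@v2.route("/articles")
def articles_v2():
    return jsonify([{"uuid": "0001", "headline": "First post"}])

app = Flask(__name__)
app.register_blueprint(v1)
app.register_blueprint(v2)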

Following these rules won’t ensure your API won’t suck. But they’ll make sucking much more difficult, which is half the victory. A great marksmanship instructor I used to know said that the essence of ‘technique’, be it in handling a weapon or writing an API, is to reduce the opportunity of making avoidable mistakes. Correct running technique will force you to run in a way that doesn’t even let you injure your ankle unless you deviate from the form. Correct shooting technique eliminates the risk of elevation divergences due to discrepancies in how much air remains in the lungs by simply making you squeeze the trigger at the very end of your expiration. Good API development technique keeps you from creating APIs that suck by restricting you to practices that won’t allow you to commit some of the more egregious sins of writing APIs. And the more you can see beyond the rules and synthesise them into a body of technique that keeps you from making mistakes, the better your code will be, without cramping your creativity.

Fixing the mysterious Jupyter Tensorflow import bug

There’s a weird bug afoot that you might encounter when setting up a ‘lily white’ (brand new) development environment to play around with Tensorflow. As it seems to have vexed quite a few people, I thought I’d put my solution here to help future tensorflowers find their way. The problem presents itself after you have set up your new virtualenv. You install Jupyter and Tensorflow, and when importing, you get this:

In [1]: import tensorflow as tf

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-1-...> in <module>()
----> 1 import tensorflow as tf

ModuleNotFoundError: No module named 'tensorflow'

Oh.

Added perplexity

Say you are a dogged pursuer of bugs, and wish to check whether you might have installed Tensorflow and Jupyter into different virtualenvs. One way to do that is to simply activate your virtualenv (source <env>/bin/activate, or workon <env> if you use virtualenvwrapper) and start a Python shell. Perplexingly, importing Tensorflow there will work just fine.
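
A quick way to see the mismatch for yourself (a diagnostic of my own, not part of the original recipe): run these lines both in the virtualenv’s Python shell and in a Jupyter cell, and compare the output.

import sys

# In the virtualenv's shell this points into the virtualenv; if the
# Jupyter kernel prints a system path instead, Jupyter is running on
# a different Python, which is why the import fails there.
print(sys.executable)
print(sys.prefix)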

The solution

Caution: at this time, this works only for CPython, aka ‘regular Python’ (if you don’t know what kind of Python you are running, it is in all likelihood CPython).

Note: in general, it is advisable to start fixing these issues by destroying your virtualenv and starting anew, although that’s not strictly necessary. Create a virtualenv, and note the base Python executable’s version (it has to be a version for which there is a Tensorflow wheel for your platform, i.e. 2.7 or 3.3–3.6).
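
If you are not sure which wheel to look for, a few lines of Python will tell you everything the wheel filename encodes (this is my own helper, not part of the original fix):

import platform
import struct

print(platform.python_version())   # e.g. '3.6.1' -> look for a cp36 wheel
print(struct.calcsize("P") * 8)    # 64 on a 64-bit build (x86_64 wheels)
print(platform.system())           # 'Darwin' on macOS, 'Linux' on Linux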

Step 1

Go to the PyPI website to find the Tensorflow installation appropriate to your system and your Python version (e.g. cp36 for Python 3.6). Copy the path of the correct version, then open up a terminal window and declare it as the environment variable TF_BINARY_URL. Use pip to install from the URL you set as the environment variable, then install Jupyter.

user@host ~ $ export TF_BINARY_URL=https://pypi.python.org/packages/b1/74/873a5fc04f1aa8d275ef1349b25c75dd87cbd7fb84fe41fc8c0a1d9afbe9/tensorflow-1.1.0rc2-cp36-cp36m-macosx_10_11_x86_64.whl#md5=c9b6f7741d955d1d3b4991a7942f48b9
user@host ~ $ pip install --upgrade $TF_BINARY_URL jupyter
Collecting tensorflow==1.1.0rc2 from https://pypi.python.org/packages/b1/74/873a5fc04f1aa8d275ef1349b25c75dd87cbd7fb84fe41fc8c0a1d9afbe9/tensorflow-1.1.0rc2-cp36-cp36m-macosx_10_11_x86_64.whl#md5=c9b6f7741d955d1d3b4991a7942f48b9
  Using cached tensorflow-1.1.0rc2-cp36-cp36m-macosx_10_11_x86_64.whl
Collecting jupyter
  Using cached jupyter-1.0.0-py2.py3-none-any.whl

(... lots more installation steps to follow ...)

Successfully installed ipykernel-4.6.1 ipython-6.0.0 jedi-0.10.2 jinja2-2.9.6 jupyter-1.0.0 jupyter-client-5.0.1 jupyter-console-5.1.0 notebook-5.0.0 prompt-toolkit-1.0.14 protobuf-3.2.0 qtconsole-4.3.0 setuptools-35.0.1 tensorflow-1.1.0rc2 tornado-4.5.1 webencodings-0.5.1 werkzeug-0.12.1
Step 2
Now for some magic. If you launch Jupyter now, there’s a good chance it won’t find Tensorflow. Why? Because you have only just installed Jupyter, your shell may still resolve the jupyter command to your system Python installation rather than to the copy in your virtualenv.

Enter which jupyter to find out where the jupyter command is pointing. If it is pointing to a path within your virtualenvs folder, you’re good to go. Otherwise, open a new terminal window and activate your virtualenv. Check where the jupyter command is pointing now – it should point into the virtualenv.
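
If you prefer to check from Python rather than from the shell, shutil.which (available from Python 3.3) answers the same question as which:

import shutil
import sys

# Both of these should point into your virtualenv's bin directory.
print(shutil.which("jupyter"))
print(sys.executable)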

Step 3
Fire up Jupyter, and import tensorflow. Voila – you have a fully working Tensorflow environment!

As always, let me know if it works for you in the comments, or if you’ve found some alternative ways to fix this issue. Hopefully, this helps you on your way to delve into Tensorflow and explore this fantastic deep learning framework!

Header image: courtesy of Jeff Dean, Large Scale Deep Learning for Intelligent Computer Systems, adapted from Untangling invariant object recognition by DiCarlo and Cox (2007).

Give your Twitter account a memory wipe… for free.

The other day, my wife decided to get rid of all the tweets on one of her Twitter accounts, while of course retaining all the followers. But bulk deleting tweets is far from easy. There are, fortunately, plenty of tools that offer to bulk delete your tweets… for a price, of course. One had a freemium model that allowed three free deletes per day. I quickly calculated that it would have taken my wife something on the order of twelve years to get rid of all her tweets. No, seriously. That’s silly. I can write some Python code to do that faster, can’t I?

Turns out you can. First, of course, you’ll need to create a Twitter app from the account you wish to wipe and generate an access token, since we’ll also be performing actions on behalf of the account.

import tweepy
import time

CONSUMER_KEY = "<your consumer key>"
CONSUMER_SECRET = "<your consumer secret>"
ACCESS_TOKEN = "<your access token>"
ACCESS_TOKEN_SECRET = "<your access token secret>"
SCREEN_NAME = "<your screen name, without the @>"

Time to use tweepy’s OAuth handler to connect to the Twitter API:

auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)

api = tweepy.API(auth)
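
Before unleashing anything destructive, a quick sanity check doesn’t hurt (my addition: tweepy’s verify_credentials() returns the authenticated user, or False if the credentials are bad):

me = api.verify_credentials()
if not me:
    raise SystemExit("Authentication failed - check your keys and tokens.")

# statuses_count doubles as a gauge of how much work is ahead.
print(u"Wiping @{0}: {1} tweets to go".format(me.screen_name, me.statuses_count))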

Now, we could technically write an extremely sophisticated script, which looks at the returned headers to determine when we will be cut off by the API throttle… but we’ll use the easy and brutish route of holding off for a whole hour if we get cut off. At 350 requests per hour, each capable of deleting 100 tweets, we can get rid of a 35,000 tweet account in a single hour with no waiting time, which is fairly decent.

The approach will be simple: we ask for batches of 100 tweets, then call the .destroy() method on each of them, which thanks to tweepy comes bound to the object representing every tweet we receive. If we encounter errors, we respond accordingly. If it’s a RateLimitError – an error object from tweepy that, as its name suggests, signals that the rate limit has been exceeded – we hold off for an hour (we could elicit the reset time from the response headers, but this is much simpler… and we’ve got time!). If the status can’t be found, we simply leap over it (that sometimes happens, especially when someone is doing some manual deleting at the same time). Anything else, and we break out of the loops.

def destroy():
    while True:
        # Fetch the most recent batch of (up to) 100 tweets.
        q = api.user_timeline(screen_name=SCREEN_NAME,
                              count=100)
        if not q:
            # Nothing left on the timeline - we're done.
            break
        for each in q:
            try:
                each.destroy()
            except tweepy.RateLimitError as e:
                # Hold off for an hour; the undeleted tweet will
                # reappear in the next batch, so nothing is lost.
                print(u"Rate limit exceeded: {0}".format(e))
                time.sleep(3600)
            except tweepy.TweepError as e:
                if "No status found" in str(e):
                    # Status already gone - leap over it.
                    continue
                print(u"Encountered Twitter API error: {0}".format(e))
                return
            except Exception as e:
                print(u"Encountered undefined error: {0}".format(e))
                return

Finally, we’ll make sure this is called as the module default:

if __name__ == '__main__':
    destroy()

Happy destruction!