How to write error messages that don’t suck

I spend most of my life interacting with software I wrote, or know intimately. Most of this code is what you would consider ‘safety critical’, and for that reason, it’s written in a particularly stringent way of defensive programming. It also has to adhere to incredibly strict API contracts, so that its output has to be predictable. Error messages must be meticulous and detailed. But elsewhere, in the real world, with tough deadlines to keep, the best one can do is writing error messages that don’t suck. By that, I mean error messages that may not be works of art, but which do the job. Just how important this is I learned thanks to a recent incident.

A few days ago, I logged into my bank account, and saw that it looked like it had been raided by the Visigoth hordes. It turns out that I had an interminably long series of transactions –– not particularly big ones, but they do add up ––, for the same sum, to the same American airline, around the same time. I will name and shame them, because I want them to fix this issue: Delta, we’re talking about you.

You see, my wife has been spending Christmastide in the US, visiting friends and family (and giving me some much-needed alone time to finish the first half of my upcoming book), and for the transatlantic long haul segment of her homebound flight, we decided it would be prudent to upgrade her to a comfier seat. Except every time we tried to do so, it returned a cryptic error message – “Upgrade failed”, and a code: 4477. So we tried again. Even when my wife used Delta’s phone booking service, the booking clerk couldn’t provide her with better information as to where the problem may lie. Finally, using someone else’s card, it worked.

The problem was that each time, we were charged. The error message failed to convey one of the most important things an error message should tell you: what was changed, and what was not? Our assumption was that payment and seat upgrade was an atomic transaction: either you were charged and the seat upgrade succeeded, or the seat upgrade failed and you would not be charged. The fact that this was not so, and was not disclosed, led to what could have been quite vexing (such as our account being blocked for presumed fraudulent use).

And to think all of this could have been avoided by error messages that don’t suck.

The five rules of writing error messages that don’t suck

There are error messages that are a pure work of art. They’re something to aspire to, but in the real world, you often won’t have time to create such works of creative beauty, and have to create something acceptable, which is going to be an error message that does not suck. Such an error message really only needs to explain four things.

  • Why am I seeing this crap?
  • What caused this crap?
  • What, of all the things I wanted to happen, have happened before this crap hit?
  • What, owing to this crap, has not happened, despite me wanting them to?

Bonus points are awarded for “how do I avoid this crap from happening ever again?”, but that’s more the domain of the documentation than of error messages. Let’s take these one by one.

This is NOT what an informative error message looks like. Image courtesy of redditor u/BenkoUK.

Why am I seeing this crap?

Programming languages written for adults, such as Python, allow you to directly instantiate even base-level exception classes, such as IndexError or KeyError. The drawback of this is that most of these often don’t explain what actually happened. I encountered this recently in some code that, it turned out, I myself have committed (sometimes git blame really lives up to its name!).

Consider the following function:

def mean_gb(image, x, y):
    """Returns the mean of the green and blue values of the pixel at (x, y)."""
    mean_gb = (image[x][y][0] + image[x][y][1])/2
    return math.floor(mean_gb)

To explain in brief: in Python, it is a convention to represent images as a tensor of rank 3 (a cube of numbers) using an n-dimensional array type (ndarray) from numpy, where one dimension each represents the x and y coordinates of each pixel and one dimension represents the channel’s sequential identifier.1 The function above retrieves the average of the green and blue channel values of an image represented in this way. To access a value m_{i, j, k} of a tensor m of rank 3, numpy uses the chained accessors syntax m[i][j][k]. And as long as we provide a tensor of rank 3, this function works just fine.

What, however, if we simply provide a tensor of rank 2 (a two-dimensional matrix)? We get… an IndexError:

In  [5]: mean_gb(a, 1, 1) 
Out [5]: 
    --------------------------------------------------------------------------- IndexError 
    <ipython-input-4-743d4fa2a9dd> in <module> 
    ----> 1 mean_gb(a, 1, 1) 

    <ipython-input-3-fe96408de555> in mean_gb(image, x, y) 
          2 """Returns the mean of the green and blue values of the pixel at (x, y).""" 
          3 
    ----> 4 mean_gb = (image[x][y][0] + image[x][y][1])/2
IndexError: invalid index to scalar variable.

From the perspective of numpy, this is completely true: we did request a value that didn’t exist. In fact, we tried to index a single scalar (the first two accessors already yielded a single scalar, and a scalar cannot normally be indexed), which we are indeed being told, albeit in a slightly strange language (it’s not that the index to the scalar is invalid, it’s that the idea of indexing a scalar is invalid).

However, for the user, this makes no sense. It does not point out to the user that they submitted a tensor of rank 2 for a function that only makes sense with tensors of rank 3. The proper way to deal with this, of course, is to write a custom error message:

def mean_gb(image, x, y):     
    """Returns the mean of the green and blue values of the pixel at (x, y)."""          

    if len(image.shape) == 3:         
         mean_gb = (image[x][y][0] + image[x][y][1])/2
         return math.floor(mean_gb)     
    else:         
         raise IndexError(f"The image argument has incorrect dimensionality. The image you submitted has {len(image.shape)} dimensions, whereas this function requires the image to have 3 dimensions to work.")

Best practice: smart error messages
A good error message is as long as it needs to be and as short as it can be while not omitting any critical details.

The .shape property of a numpy.ndarray object provides a tuple containing the shape of the array. For instance, a 4-element vector has the shape (4,), a 3-by-5 matrix has shape (3, 5) and a tensor of rank 3 representing a 100×200 pixel image with three channels has shape (100, 200, 3). len(image.shape) therefore is a convenient way to determine the tensor dimensionality of an n-dimensional numpy array. In the above instance, if the dimensionality was anything other than three, an error message was raised. Note that the error message explained

  • what’s wrong (incorrect dimensionality),
  • why that dimensionality is wrong (it should be 3, and it’s something else), and
  • what the value is and what it should be instead.

An equally correct, and shorter, error message would have been

IndexError(“The image parameter must be a three-dimensional ndarray.”)

Short but sweet. You could also subclass IndexError:

class Expected3DError(IndexError):
     def __init__(self, *args, **kwargs):
         super(IndexError, self).__init__("Expected ndarray to have rank 3.", *args, **kwargs)

Best practice: exception/error classes
If a particular kind of error is prone to occur, it makes sense to give it its own class. The name itself can already hint at the more specific problem (e.g. that it’s the number of dimensions that’s at issue, not just some random index, in the case outlined above).

This could then be simply raised with no arguments (or, of course, you can build in additional fanciness, such as displaying the actual dimensions in the error message if the image or its shape is passed into the error):

def mean_gb(image, x, y):
    """Returns the mean of the green and blue values of the pixel at (x, y)."""
    
    if len(image.shape) == 3:
        mean_gb = (image[x][y][0] + image[x][y][1])/2
        return math.floor(mean_gb)
    else:
        raise Expected3DError()

This would yield the following output:

---------------------------------------------------------------------------
Expected3DError                           Traceback (most recent call last)
<ipython-input-49-743d4fa2a9dd> in <module>
----> 1 mean_gb(a, 1, 1)

<ipython-input-48-11cc8210b9d7> in mean_gb(image, x, y)
      5         mean_gb = (image[x][y][0] + image[x][y][1])/2
      6     else:
----> 7         raise Expected3DError()

Expected3DError: Expected ndarray to have rank 3.

As you can see, there are many ways to write error messages that tell you why the error has occurred. They do not need to replace a traceback, but they need to explain the initial (root) cause, not the proximate cause, of the error.

What caused this crap?

Yes, this is a single Java stack trace. It is worse than uninformative, and superbly confusing, despite valiant efforts to the contrary. Image courtesy of Tomasz Nurkiewicz, who contributed the root-cause-first patch to Java.

Best practice: tracebacks/stack traces
Always ensure your stack traces are meaningful and root-cause-first (the first function to be mentioned is the innermost function).

Here, the root cause is explained… badly. 58,220 is a number in the human sense of the word but to the computer, it contains a non-numeric character (the ,).
Good solution: rephrase error message to explain numbers must only consist of digits.
Better solution: simply prune the comma.
Image courtesy of r/softwaregore.

Closely related to the previous, your error message should provide some explanation of root causes. For instance, in the above example, explaining that the function expected a 3-dimensional ndarray described not only the reason the error message appeared but also why the value provided was erroneous. How much detail this requires depends strongly on the overall context of the API. However, users will want to know what actually happened.There is a way to overdo this, of course, as anyone who has seen the horror that is a multi-page Java traceback. But in general, error messages that explain what is expected, and possibly also what was provided instead, spare long hours of debugging. In user-facing interfaces, this is often the only way for the user to understand what the issue is and try to fix it. For instance, a form checker that rejects an e-mail address should at least try to explain which of the relatively few categories (throwaway domain, barred e-mail address, missing @, domain name is missing at least one ., etc.) the issue falls under. Believe it or not, users do not like to play ‘let’s figure out what the issue is’! For instance, I often use e-mail subaddresses that help me filter stuff, so many of my subscriptions are under [email protected]. Not all e-mail field checkers allow the + symbol, even though it is part of a valid e-mail address2and so are some far more exotic things, such as escaped @-signs! To cut a long story short: tell your users where the issue is.

Best practice: error numbers
Error numbers are fine in larger projects, e.g. Django has a great system for it. However, if you do use error numbers, make sure you have 1) a consistent error number nomenclature, 2) a widely available explanation of what each error code means and 3) if possible, a link to the online error code resolver in the error body

There are multiple ways to accomplish this. Consider my failing channel averager function from the previous subsection. You could phrase the IndexError‘s message string in multiple ways. Here are two examples I considered:

# Version 1
IndexError("Dimension mismatch.")

# Version 2
IndexError("This function requires a 3-dimensional ndarray.")
It helps to anticipate what users think is illogical, including minor things such as negative balances. Especially where money is at issue, it would have been prudent to explain to the user how adding $15.00 of credit got them a $2.59 balance. Image courtesy of r/softwaregore.

If you have seen the source code, or spend some time thinking, both make sense. However, Version 2 is far superior: it explains what is expected, and while Dimension Mismatch sounds like a great name for an synthpop band, it isn’t exactly the most enlightening error message.

Keep in mind that users may insert a breakpoint and run a traceback on your code, user interfaces do not afford them this option. Therefore, errors on user-facing interfaces should make it very clear what was expected and what is being provided. While a technical user of your code, faced with the better Version 2 error message from above could simply examine the dimensionality of the ndarray you provided to the function, this is not an option for someone interacting with your code via, say, a web-based environment. Therefore, these environments should be very clear about why they rejected particular values, and try to identify what the user input should have looked like versus what it did look like.

What, of all the things I wanted to happen, have happened before this crap hit?

The assumption by most people is that transactions are atomic: either the entire transaction goes through, or the entire transaction is rolled back. Therefore, if a transaction is not atomic, and permanently changes state, it should make this clear. The general human reaction to an error is to try it again, so it is helpful (Delta web devs, please listen carefully) to point out which part of the transaction has occurred.

Best practice: non-atomic transactions
Non-atomic transactions can fake atomicity by built-in rollback provisions. Always perform the steps that are furthest from your system (e.g. payment processing) last, because those will be harder to roll back than any steps or changes in your own system, over which you have more total and more immediate control.

This is especially important for iterator processors. Iterator processors go through a list of things – numbers, files, tables, whatever – and perform a task on them. Often, these lists can be quite long, and it is helpful if the user knows which, if any, of the items have been successfully processed and which have not. One of the worst culprits in this field that I have ever come across was a software that took in BLAST queries, each in a separate file, and rendered visualisations. It would then proudly proclaim at the end that it has successfully rendered 995 of 1,000 files –– without, of course, telling you which five files did not render, why they did not render, and least of all why the whole application did not have a verbose mode or a logging facility. Don’t be the guy who develops eldritch horrors like this.

What, owing to this crap, has not happened, despite me wanting them to?

The reverse of the above is that it’s useful for users to know what has not been changed. This is crucial, since users need to have an awareness of what still needs to be done for a re-run. This occurs not only in the context of error messages. One of my persistent bugbears is the Unix adduser command which, after you have provided it with some details about the new user, asks you if that’s all ok, and notes the default answer is ‘yes’. However, you never get any confirmation as to whether the user has been created successfully. This is particularly annoying when batch creating users, as you cannot get a log of which user has been created successfully on stdout unless you write your own little hack that echoes the corresponding message if the exit code of adduser is 0.

Best practice: tracking sub-transaction status
Non-atomic transactions that consist of multiple steps should track the status of each of these, and be able to account for what steps have, and what steps have not, been performed.

Where a particular transaction is not atomic, it is not sufficient to simply point out its failure. Unless the transaction can ‘fake atomicity’ with rollbacks (see best practice note above), it must detail what has, and what has not, occurred. It is important for the code itself to support this, by keeping track of which steps have been successfully carried out, especially if the function consists of asynchronous calls.

Conclusion

Writing great error messages is an art for in and of itself, and it takes time to master. Unfortunately, it is a vanishing art, usually under the pressures of developing software as fast as possible with various agile methodologies that then end up pushing these into a relatively unobtrusive backlog. Just because error messages don’t make a smidgen of sense does not mean they don’t pass tests –– but they are the things that can turn users off your product for good.

The good news is that your error messages don’t have to be perfect. They don’t even have to be good. It’s enough if they don’t suck. And my hope is that with this short guide, I have helped point out some ways in which you, too, can write error messages that don’t suck. With that in mind, you will be able to build user interfaces and APIs that users will love to use because when they get something wrong (and they will get things wrong, because that’s what we humans do), they will have a less frustrating experience. We have been letting ourselves off the hook for horrid error messages and worst practices like error codes that are not adequately resolved. It is time for all of us to step up and write code that doesn’t add insult to injury by frustrating error messages after failure.

References   [ + ]

1.Typically ordered in reverse (blue, green, red) due to a quirk of OpenCV.
2.See RFC 822, Section 6.1.