How to write error messages that don’t suck

I spend most of my life interacting with software I wrote, or know intimately. Most of this code is what you would consider ‘safety critical’, and for that reason, it’s written in a particularly stringent way of defensive programming. It also has to adhere to incredibly strict API contracts, so that its output has to be predictable. Error messages must be meticulous and detailed. But elsewhere, in the real world, with tough deadlines to keep, the best one can do is writing error messages that don’t suck. By that, I mean error messages that may not be works of art, but which do the job. Just how important this is I learned thanks to a recent incident.

A few days ago, I logged into my bank account, and saw that it looked like it had been raided by the Visigoth hordes. It turns out that I had an interminably long series of transactions –– not particularly big ones, but they do add up ––, for the same sum, to the same American airline, around the same time. I will name and shame them, because I want them to fix this issue: Delta, we’re talking about you.

You see, my wife has been spending Christmastide in the US, visiting friends and family (and giving me some much-needed alone time to finish the first half of my upcoming book), and for the transatlantic long haul segment of her homebound flight, we decided it would be prudent to upgrade her to a comfier seat. Except every time we tried to do so, it returned a cryptic error message – “Upgrade failed”, and a code: 4477. So we tried again. Even when my wife used Delta’s phone booking service, the booking clerk couldn’t provide her with better information as to where the problem may lie. Finally, using someone else’s card, it worked.

The problem was that each time, we were charged. The error message failed to convey one of the most important things an error message should tell you: what was changed, and what was not? Our assumption was that payment and seat upgrade was an atomic transaction: either you were charged and the seat upgrade succeeded, or the seat upgrade failed and you would not be charged. The fact that this was not so, and was not disclosed, led to what could have been quite vexing (such as our account being blocked for presumed fraudulent use).

And to think all of this could have been avoided by error messages that don’t suck.

The five rules of writing error messages that don’t suck

There are error messages that are a pure work of art. They’re something to aspire to, but in the real world, you often won’t have time to create such works of creative beauty, and have to create something acceptable, which is going to be an error message that does not suck. Such an error message really only needs to explain four things.

  • Why am I seeing this crap?
  • What caused this crap?
  • What, of all the things I wanted to happen, have happened before this crap hit?
  • What, owing to this crap, has not happened, despite me wanting them to?

Bonus points are awarded for “how do I avoid this crap from happening ever again?”, but that’s more the domain of the documentation than of error messages. Let’s take these one by one.

This is NOT what an informative error message looks like. Image courtesy of redditor u/BenkoUK.

Why am I seeing this crap?

Programming languages written for adults, such as Python, allow you to directly instantiate even base-level exception classes, such as IndexError or KeyError. The drawback of this is that most of these often don’t explain what actually happened. I encountered this recently in some code that, it turned out, I myself have committed (sometimes git blame really lives up to its name!).

Consider the following function:

def mean_gb(image, x, y):
    """Returns the mean of the green and blue values of the pixel at (x, y)."""
    mean_gb = (image[x][y][0] + image[x][y][1])/2
    return math.floor(mean_gb)

To explain in brief: in Python, it is a convention to represent images as a tensor of rank 3 (a cube of numbers) using an n-dimensional array type (ndarray) from numpy, where one dimension each represents the x and y coordinates of each pixel and one dimension represents the channel’s sequential identifier.1 The function above retrieves the average of the green and blue channel values of an image represented in this way. To access a value m_{i, j, k} of a tensor m of rank 3, numpy uses the chained accessors syntax m[i][j][k]. And as long as we provide a tensor of rank 3, this function works just fine.

What, however, if we simply provide a tensor of rank 2 (a two-dimensional matrix)? We get… an IndexError:

In  [5]: mean_gb(a, 1, 1) 
Out [5]: 
    --------------------------------------------------------------------------- IndexError 
    <ipython-input-4-743d4fa2a9dd> in <module> 
    ----> 1 mean_gb(a, 1, 1) 

    <ipython-input-3-fe96408de555> in mean_gb(image, x, y) 
          2 """Returns the mean of the green and blue values of the pixel at (x, y).""" 
          3 
    ----> 4 mean_gb = (image[x][y][0] + image[x][y][1])/2
IndexError: invalid index to scalar variable.

From the perspective of numpy, this is completely true: we did request a value that didn’t exist. In fact, we tried to index a single scalar (the first two accessors already yielded a single scalar, and a scalar cannot normally be indexed), which we are indeed being told, albeit in a slightly strange language (it’s not that the index to the scalar is invalid, it’s that the idea of indexing a scalar is invalid).

However, for the user, this makes no sense. It does not point out to the user that they submitted a tensor of rank 2 for a function that only makes sense with tensors of rank 3. The proper way to deal with this, of course, is to write a custom error message:

def mean_gb(image, x, y):     
    """Returns the mean of the green and blue values of the pixel at (x, y)."""          

    if len(image.shape) == 3:         
         mean_gb = (image[x][y][0] + image[x][y][1])/2
         return math.floor(mean_gb)     
    else:         
         raise IndexError(f"The image argument has incorrect dimensionality. The image you submitted has {len(image.shape)} dimensions, whereas this function requires the image to have 3 dimensions to work.")

Best practice: smart error messages
A good error message is as long as it needs to be and as short as it can be while not omitting any critical details.

The .shape property of a numpy.ndarray object provides a tuple containing the shape of the array. For instance, a 4-element vector has the shape (4,), a 3-by-5 matrix has shape (3, 5) and a tensor of rank 3 representing a 100×200 pixel image with three channels has shape (100, 200, 3). len(image.shape) therefore is a convenient way to determine the tensor dimensionality of an n-dimensional numpy array. In the above instance, if the dimensionality was anything other than three, an error message was raised. Note that the error message explained

  • what’s wrong (incorrect dimensionality),
  • why that dimensionality is wrong (it should be 3, and it’s something else), and
  • what the value is and what it should be instead.

An equally correct, and shorter, error message would have been

IndexError(“The image parameter must be a three-dimensional ndarray.”)

Short but sweet. You could also subclass IndexError:

class Expected3DError(IndexError):
     def __init__(self, *args, **kwargs):
         super(IndexError, self).__init__("Expected ndarray to have rank 3.", *args, **kwargs)

Best practice: exception/error classes
If a particular kind of error is prone to occur, it makes sense to give it its own class. The name itself can already hint at the more specific problem (e.g. that it’s the number of dimensions that’s at issue, not just some random index, in the case outlined above).

This could then be simply raised with no arguments (or, of course, you can build in additional fanciness, such as displaying the actual dimensions in the error message if the image or its shape is passed into the error):

def mean_gb(image, x, y):
    """Returns the mean of the green and blue values of the pixel at (x, y)."""
    
    if len(image.shape) == 3:
        mean_gb = (image[x][y][0] + image[x][y][1])/2
        return math.floor(mean_gb)
    else:
        raise Expected3DError()

This would yield the following output:

---------------------------------------------------------------------------
Expected3DError                           Traceback (most recent call last)
<ipython-input-49-743d4fa2a9dd> in <module>
----> 1 mean_gb(a, 1, 1)

<ipython-input-48-11cc8210b9d7> in mean_gb(image, x, y)
      5         mean_gb = (image[x][y][0] + image[x][y][1])/2
      6     else:
----> 7         raise Expected3DError()

Expected3DError: Expected ndarray to have rank 3.

As you can see, there are many ways to write error messages that tell you why the error has occurred. They do not need to replace a traceback, but they need to explain the initial (root) cause, not the proximate cause, of the error.

What caused this crap?

Yes, this is a single Java stack trace. It is worse than uninformative, and superbly confusing, despite valiant efforts to the contrary. Image courtesy of Tomasz Nurkiewicz, who contributed the root-cause-first patch to Java.

Best practice: tracebacks/stack traces
Always ensure your stack traces are meaningful and root-cause-first (the first function to be mentioned is the innermost function).

Here, the root cause is explained… badly. 58,220 is a number in the human sense of the word but to the computer, it contains a non-numeric character (the ,).
Good solution: rephrase error message to explain numbers must only consist of digits.
Better solution: simply prune the comma.
Image courtesy of r/softwaregore.

Closely related to the previous, your error message should provide some explanation of root causes. For instance, in the above example, explaining that the function expected a 3-dimensional ndarray described not only the reason the error message appeared but also why the value provided was erroneous. How much detail this requires depends strongly on the overall context of the API. However, users will want to know what actually happened.There is a way to overdo this, of course, as anyone who has seen the horror that is a multi-page Java traceback. But in general, error messages that explain what is expected, and possibly also what was provided instead, spare long hours of debugging. In user-facing interfaces, this is often the only way for the user to understand what the issue is and try to fix it. For instance, a form checker that rejects an e-mail address should at least try to explain which of the relatively few categories (throwaway domain, barred e-mail address, missing @, domain name is missing at least one ., etc.) the issue falls under. Believe it or not, users do not like to play ‘let’s figure out what the issue is’! For instance, I often use e-mail subaddresses that help me filter stuff, so many of my subscriptions are under [email protected]. Not all e-mail field checkers allow the + symbol, even though it is part of a valid e-mail address2and so are some far more exotic things, such as escaped @-signs! To cut a long story short: tell your users where the issue is.

Best practice: error numbers
Error numbers are fine in larger projects, e.g. Django has a great system for it. However, if you do use error numbers, make sure you have 1) a consistent error number nomenclature, 2) a widely available explanation of what each error code means and 3) if possible, a link to the online error code resolver in the error body

There are multiple ways to accomplish this. Consider my failing channel averager function from the previous subsection. You could phrase the IndexError‘s message string in multiple ways. Here are two examples I considered:

# Version 1
IndexError("Dimension mismatch.")

# Version 2
IndexError("This function requires a 3-dimensional ndarray.")
It helps to anticipate what users think is illogical, including minor things such as negative balances. Especially where money is at issue, it would have been prudent to explain to the user how adding $15.00 of credit got them a $2.59 balance. Image courtesy of r/softwaregore.

If you have seen the source code, or spend some time thinking, both make sense. However, Version 2 is far superior: it explains what is expected, and while Dimension Mismatch sounds like a great name for an synthpop band, it isn’t exactly the most enlightening error message.

Keep in mind that users may insert a breakpoint and run a traceback on your code, user interfaces do not afford them this option. Therefore, errors on user-facing interfaces should make it very clear what was expected and what is being provided. While a technical user of your code, faced with the better Version 2 error message from above could simply examine the dimensionality of the ndarray you provided to the function, this is not an option for someone interacting with your code via, say, a web-based environment. Therefore, these environments should be very clear about why they rejected particular values, and try to identify what the user input should have looked like versus what it did look like.

What, of all the things I wanted to happen, have happened before this crap hit?

The assumption by most people is that transactions are atomic: either the entire transaction goes through, or the entire transaction is rolled back. Therefore, if a transaction is not atomic, and permanently changes state, it should make this clear. The general human reaction to an error is to try it again, so it is helpful (Delta web devs, please listen carefully) to point out which part of the transaction has occurred.

Best practice: non-atomic transactions
Non-atomic transactions can fake atomicity by built-in rollback provisions. Always perform the steps that are furthest from your system (e.g. payment processing) last, because those will be harder to roll back than any steps or changes in your own system, over which you have more total and more immediate control.

This is especially important for iterator processors. Iterator processors go through a list of things – numbers, files, tables, whatever – and perform a task on them. Often, these lists can be quite long, and it is helpful if the user knows which, if any, of the items have been successfully processed and which have not. One of the worst culprits in this field that I have ever come across was a software that took in BLAST queries, each in a separate file, and rendered visualisations. It would then proudly proclaim at the end that it has successfully rendered 995 of 1,000 files –– without, of course, telling you which five files did not render, why they did not render, and least of all why the whole application did not have a verbose mode or a logging facility. Don’t be the guy who develops eldritch horrors like this.

What, owing to this crap, has not happened, despite me wanting them to?

The reverse of the above is that it’s useful for users to know what has not been changed. This is crucial, since users need to have an awareness of what still needs to be done for a re-run. This occurs not only in the context of error messages. One of my persistent bugbears is the Unix adduser command which, after you have provided it with some details about the new user, asks you if that’s all ok, and notes the default answer is ‘yes’. However, you never get any confirmation as to whether the user has been created successfully. This is particularly annoying when batch creating users, as you cannot get a log of which user has been created successfully on stdout unless you write your own little hack that echoes the corresponding message if the exit code of adduser is 0.

Best practice: tracking sub-transaction status
Non-atomic transactions that consist of multiple steps should track the status of each of these, and be able to account for what steps have, and what steps have not, been performed.

Where a particular transaction is not atomic, it is not sufficient to simply point out its failure. Unless the transaction can ‘fake atomicity’ with rollbacks (see best practice note above), it must detail what has, and what has not, occurred. It is important for the code itself to support this, by keeping track of which steps have been successfully carried out, especially if the function consists of asynchronous calls.

Conclusion

Writing great error messages is an art for in and of itself, and it takes time to master. Unfortunately, it is a vanishing art, usually under the pressures of developing software as fast as possible with various agile methodologies that then end up pushing these into a relatively unobtrusive backlog. Just because error messages don’t make a smidgen of sense does not mean they don’t pass tests –– but they are the things that can turn users off your product for good.

The good news is that your error messages don’t have to be perfect. They don’t even have to be good. It’s enough if they don’t suck. And my hope is that with this short guide, I have helped point out some ways in which you, too, can write error messages that don’t suck. With that in mind, you will be able to build user interfaces and APIs that users will love to use because when they get something wrong (and they will get things wrong, because that’s what we humans do), they will have a less frustrating experience. We have been letting ourselves off the hook for horrid error messages and worst practices like error codes that are not adequately resolved. It is time for all of us to step up and write code that doesn’t add insult to injury by frustrating error messages after failure.

References   [ + ]

1.Typically ordered in reverse (blue, green, red) due to a quirk of OpenCV.
2.See RFC 822, Section 6.1.

The Ten Rules of Defensive Programming in R

[su_pullquote align=”right”]Where R code is integrated into a pipeline, runs autonomously or is embedded into a larger analytical solution, writing code that fails well is going to be crucial.[/su_pullquote]The topic of defensive programming in R is, admittedly, a little unusual. R, while fun and powerful, is not going to run defibrillators, nuclear power plants or spacecraft. In fact, much – if not most! – R code is actually executed interactively, where small glitches don’t really matter. But where R code is integrated into a pipeline, runs autonomously or is embedded into a larger analytical solution, writing code that fails well is going to be crucial. So below, I have collected my top ten principles of defensive programming in R. I have done so with an eye to users who do not come from the life critical systems community and might not have encountered defensive programming before, so some of these rules apply to all languages.

What is defensive programming?

The idea of defensive programming is not to write code that never fails. That’s an impossible aspiration. Rather, the fundamental idea is to write code that fails well. To me, ‘failing well’ means five things:

  1. Fail fast: your code should ensure all criteria are met before they embark upon operations, especially if those are computationally expensive or might irreversibly affect data.
  2. Fail safe: where there is a failure, your code should ensure that it relinquishes all locks and does not acquire any new ones, not write files, and so on.
  3. Fail conspicuously: when something is broken, it should return a very clear error message, and give as much information as possible to help unbreak it.
  4. Fail appropriately: failure should have appropriate effects. For every developer, it’s a judgment call to ensure whether a particular issue would be a a debug/info item, a warning or an error (which by definition means halting execution). Failures should be handled appropriately.
  5. Fail creatively: not everything needs to be a failure. It is perfectly legitimate to handle problems. One example is repeating a HTTP request that has timed out: there’s no need to immediately error out, because quite frankly, that sort of stuff happens. Equally, it’s legitimate to look for a parameter, then check for a configuration file if none was provided, and finally try checking the arguments with which the code was invoked before raising an error.1

And so, without further ado, here are the ten ways I implement these in my day-to-day R coding practice – and I encourage you to do so yourself. You will thank yourself for it.

The Ten Commandments of Defensive Programming in R

  1. Document your code.
  2. In God we trust, everyone else we verify.
  3. Keep functions short and sweet.
  4. Refer to external functions explicitly.
  5. Don’t use require() to import packages into the namespace.
  6. Aggressively manage package and version dependencies.
  7. Use a consistent style and automated code quality tools.
  8. Everything is a package.
  9. Power in names.
  10. Know the rules and their rationale, so that you know when to break them.

ONE: Document your code.

It’s a little surprising to even see this – I mean, shouldn’t you do this stuff anyway? Yes, you should, except some people think that because so much of R is done in the interpreter anyway, rules do not apply to them. Wrong! They very much do.

A few months ago, I saw some code written by a mentee of mine. It was infinitely long – over 250 standard lines! –, had half a dozen required arguments and did everything under the sun. This is, of course, as we’ll discuss later, a bad idea, but let’s put that aside. The problem is, I had no idea what the function was doing! After about half an hour of diligent row-by-row analysis, I figured it out, but that could have been half an hour spent doing something more enjoyable, such as a root canal without anaesthetic while listening to Nickelback. My friend could have saved me quite some hair-tearing by quite simply documenting his code. In R, the standard for documenting the code is called roxygen2, it’s got a great parser that outputs the beautiful LaTeX docs you probably (hopefully!) have encountered when looking up a package’s documentation, and it’s described in quite a bit of detail by Hadley Wickham. What more could you wish for?

Oh. An example. Yeah, that’d be useful. We’ll be documenting a fairly simple function, which calculates the Hamming distance between two strings of equal length, and throws something unpleasant in our face if they are not. Quick recap: the Hamming distance is the number of characters that do not match among two strings. Or mathematically put,

H(s, t) = \sum_{k=1}^{\mathcal{l}(s)} I(s_k, t_k) \mid \mathcal{l}(s) = \mathcal{l}(t)

where \mathcal{l}() is the length function and D(p, q) is the dissimilarity function, which returns 1 if two letters are not identical and 0 otherwise.

So, our function would look like this:

hamming <- function(s1, s2) {
  s1 <- strsplit(s1, "")[[1]]
  s2 <- strsplit(s2, "")[[1]]
  
  return(sum(s1 != s2))
}

Not bad, and pretty evident to a seasoned R user, but it would still be a good idea to point out a thing or two. One of these would be that the result of this code will be inaccurate (technically) if the two strings are of different lengths (we could, and will, test for that, but that’s for a later date). The Hamming distance is defined only for equal-length strings, and so it would be good if the user knew what they have to do – and what they’re going to get. Upon pressing Ctrl/Cmd+Shift+Alt+R, RStudio helpfully whips us up a nice little roxygen2 skeleton:

#' Title
#'
#' @param s1 
#' @param s2 
#'
#' @return
#' @export
#'
#' @examples
hamming <- function(s1, s2) {
  s1 <- strsplit(s1, "")[[1]]
  s2 <- strsplit(s2, "")[[1]]
  
  return(sum(s1 != s2))
}

So, let’s populate it! Most of the fields are fairly self-explanatory. roxygen2, unlike JavaDoc or RST-based Python documentation, does not require formal specification of types – it’s all free text. Also, since it will be parsed into LaTeX someday, you can go wild. A few things deserve mention.

  • You can document multiple parameters. Since s1 and s2 are both going to be strings, you can simply write @param s1,s2 The strings to be compared.
  • Use \code{...} to typeset something as fixed-width.
  • To create links in the documentation, you can use \url{https://chrisvoncsefalvay.com} to link to a URL, \code{\link{someotherfunction}} to refer to the function someotherfunction in the same package, and \code{\link[adifferentpackage]{someotherfunction}} to refer to the function someotherfunction in the adifferentpackage package. Where your function has necessary dependencies outside the current script or the base packages, it is prudent to note them here.
  • You can use \seealso{} to refer to other links or other functions, in this package or another, worth checking out.
  • Anything you put under the examples will be executed as part of testing and building the documentation. If your intention is to give an idea of what the code looks like in practice, and you don’t want the result or even the side effects, you can surround your example with a \dontrun{...} environment.
  • You can draw examples from a file. In this case, you use @example instead of @examples, and specify the path, relative to the file in which the documentation is, to the script: @example docs/examples/hamming.R would be such a directive.
  • What’s that @export thing at the end? Quite simply, it tells roxygen2 to export the function to the NAMESPACE file, making it accessible for reference by other documentation files (that’s how when you use \code{\link[somepackage]{thingamabob}}, roxygen2 knows which package to link to.

With that in mind, here’s what a decent documentation to our Hamming distance function would look like that would pass muster from a defensive programming perspective:

#' Hamming distance
#'
#' Calculates the Hamming distance between two strings of equal length.
#'
#' @param s1 
#' @param s2 
#'
#' @return The Hamming distance between the two strings \code{s1} and \code{s2}, provided as an integer.
#'
#' @section Warning:
#' 
#' For a Hamming distance calculation, the input strings must be of equal length. This code does NOT reject input strings of different lengths. 
#'
#' @examples
#' hamming("AAGAGTGTCGGCATACGTGTA", "AAGAGCGTCGGCATACGTGTA")  
#'  
#' @export
hamming <- function(s1, s2) {
  s1 <- strsplit(s1, "")[[1]]
  s2 <- strsplit(s2, "")[[1]]
  
  return(sum(s1 != s2))
}

The .Rd file generated from the hamming() function’s roxygen2 docstring: an intermediary format, resembling LaTeX, from which R can build a multitude of documentation outputsThis little example shows all that a good documentation does: it provides what to supply the function with and in what type, it provides what it will spit out and in what format, and adequately warns of what is not being checked. It’s always better to check input types, but warnings go a long way.2 From this file, R generates an .Rd file, which is basically a LaTeX file that it can parse into various forms of documentation (see left.) In the end, it yields the documentation below, with the adequate warning – a win for defensive programming!

The finished documentation, rendered as it would be in an online documentation pane in RStudio

TWO: In God we trust, everyone else we verify.

In the above example, we have taken the user at face value: we assumed that his inputs will be of equal length, and we assumed they will be strings. But because this is a post on defensive programming, we are going to be suspicious, and not trust our user. So let’s make sure we fail early and check what the user supplies us with. In many programming languages, you would be using various assertions (e.g. the assert keyword in Python), but all R has, as far as built-ins are concerned, is stopifnot(). stopifnot() does as the name suggests: if the condition is not met, execution stops with an error message. However, on the whole, it’s fairly clunky, and it most definitely should not be used to check for user inputs. For that, there are three tactics worth considering.

  1. assertthat is a package by Hadley Wickham (who else!), which implements a range of assert clauses. Most importantly, unlike stopifnot(), assertthat::assert_that() does a decent job at trying to interpret the error message. Consider our previous Hamming distance example: instead of gracelessly falling on its face, a test using
assert_that(length(s1) == length(s2))
  1. would politely inform us that s1 not equal to s2. That’s worth it for the borderline Canadian politeness alone.
  2. Consider the severity of the user input failure. Can it be worked around? For instance, a function requiring an integer may, if it is given a float, try to coerce it to a float. If you opt for this solution, make sure that you 1) always issue a warning, and 2) always allow the user to specify to run the function in ‘strict’ mode (typically by setting the strict parameter to TRUE), which will raise a fatal error rather than try to logic its way out of this pickle.
  3. Finally, make sure that it’s your code that fails, not the system code. Users should have relatively little insight into the internals of the system. And so, if at some point there’ll be a division by an argument foo, you should test whether foo == 0 at the outset and inform the user that foo cannot be zero. By the time the division operation is performed, the variable might have been renamed baz, and the user will not get much actionable intelligence out of the fact that ‘division by zero’ has occurred at some point, and baz was the culprit. Just test early for known incompatibilities, and stop further execution. The same goes, of course, for potentially malicious code.

In general, your code should be strict as to what it accepts, and you should not be afraid to reject anything that doesn’t look like what you’re looking for. Consider for this not only types but also values, e.g. if the value provided for a timeout in minutes is somewhere north of the lifetime of the universe, you should politely reject such an argument – with a good explanation, of course.

Update: After posting this on Reddit, u/BillWeld pointed out a great idiom for checking user inputs that’s most definitely worth reposting here:

f <- function(a, b, c)
{
    if (getOption("warn") > 0) {
        stopifnot(
            is.numeric(a),
            is.vector(a),
            length(a) == 1,
            is.finite(a),
            a > 0,
            is.character(b),
            # Other requirements go here
            )
    }

    # The main body of the function goes here
}

I find this a great and elegant idiom, although it is your call, as the programmer, to decide which deviations and what degree of incompatibility should cause the function to fail as opposed to merely emit a warning.

THREE: Keep functions short and sweet.

Rule #4 of the Power of Ten states

No function should be longer than what can be printed on a single sheet of paper in a standard reference format with one line per statement and one line per declaration. Typically, this means no more than about 60 lines of code per function.

Rationale: Each function should be a logical unit in the code that is understandable and verifiable as a unit. It is much harder to understand a logical unit that spans multiple screens on a computer display or multiple pages when printed. Excessively long functions are often a sign of poorly structured code.

In practice, with larger screen sizes and higher resolutions, much more than a measly hundred lines fit on a single screen. However, since many users view R code in a quarter-screen window in RStudio, an appropriate figure would be about 60-80 lines. Note that this does not include comments and whitespaces, nor does it penalise indentation styles (functions, conditionals, etc.).

Functions should represent a logical unity. Therefore, if a function needs to be split for compliance with this rule, you should do so in a manner that creates logical units. Typically, one good way is to split functions by the object they act on.

FOUR: Refer to external functions explicitly.

In R, there are two ways to invoke a function, yet most people don’t tend to be aware of this. Even in many textbooks, the library(package) function is treated as quintessentially analogous to, say, import in Python. This is a fundamental misunderstanding.

In R, packages do not need to be imported in order to be able to invoke their functions, and that’s not what the library() function does anyway. library() attaches a package to the current namespace.

What does this mean? Consider the following example. The foreign package is one of my favourite packages. In my day-to-day practice, I get data from all sorts of environments, and foreign helps me import them. One of its functions, read.epiinfo(), is particularly useful as it imports data from CDC’s free EpiInfo toolkit. Assuming that foreign is in a library accessible to my instance of R,4, I can invoke the read.epiinfo() function in two ways:

  • I can directly invoke the function using its canonical name, of the form package::function() – in this case, foreign::read.epiinfo().
  • Alternatively, I can attach the entire foreign package to the namespace of the current session using library(foreign). This has three effects, of which the first tends to be well-known, the second less so and the third altogether ignored.
    1. Functions in foreign will be directly available. Regardless of the fact that it came from a different package, you will be able to invoke it the same way you invoke, say, a function defined in the same script, by simply calling read.epiinfo().
    2. If there was a package of identical name to any function in foreign, that function will be ‘shadowed’, i.e. removed from the namespace. The namespace will always refer to the most recent function, and the older function will only be available by explicit invocation.
    3. When you invoke a function from the namespace, it will not be perfectly clear from a mere reading of the code what the function actually is, or where it comes from. Rather, the user or maintainer will have to guess what a given name will represent in the namespace at the time the code is running the particular line.

Controversially, my suggestion is

  • to eschew the use of library() altogether, and
  • write always explicitly invoke functions outside those functions that are in the namespace at startup.

This is not common advice, and many will disagree. That’s fine. Not all code needs to be safety-critical, and importing ggplot2 with library() for a simple plotting script is fine. But where you want code that’s easy to analyse, easy to read and can be reliably analysed as well, you want explicit invocations. Explicit invocations give you three main benefits:

  1. You will always know what code will be executed. filter may mean dplyr::filter, stats::filter, and so on, whereas specifically invoking dplyr::filter is unambiguous. You know what the code will be (simply invoking dplyr::filter without braces or arguments returns the source), and you know what that code is going to do.
  2. Your code will be more predictable. When someone – or something – analyses your code, they will not have to spend so much time trying to guess what at the time a particular identifier refers to within the namespace.
  3. There is no risk that as a ‘side effect’ various other functions you seek to rely on will be removed from the namespace. In interactive coding, R usually warns you and lists all shadowed functions upon importing functions with the same name into the namespace using library(), but for code intended to be bulk executed, this issue has caused a lot of headache.

Obviously, all of this applies to require() as well, although on the whole the latter should not be applied in general.

FIVE: Don’t use require() to import a package into the namespace.

Even seasoned R users sometimes forget the difference between library() and require(). The difference is quite simple: while both functions attempt to attach the package argument to the namespace,

  • require() returns FALSE if the import failed, while
  • library() simply loads the package and raises an error if the import failed.

Just about the only legitimate use for require() is writing an attach-or-install function. In any other case, as Yihui Xie points out, require() is almost definitely the wrong function to use.

SIX: Aggressively manage package and version dependencies.

Packrat is one of those packages that have changed what R is like – for the better. Packrat gives every project a package library, akin to a private /lib/ folder. This is not the place to document the sheer awesomeness of Packrat – you can do so yourself by doing the walkthrough of Packrat. But seriously, use it. Your coworkers will love you for it.

Equally important is to make sure that you specify the version of R that your code is written against. This is best accomplished on a higher level of configuration, however.

SEVEN: Use a consistent style and automated code quality tools.

This should be obvious – we’re programmers, which means we’re constitutionally lazy. If it can be solved by code faster than manually, then code it is! Two tools help you in this are lintr and styler.

  • lintr is an amazingly widely supported (from RStudio through vim to Sublime Text 3, I hear a version for microwave ovens is in the works!) linter for R code. Linters improve code quality primarily by enforcing good coding practices rather than good style. One big perk of lintr is that it can be injected rather easily into the Travis CI workflow, which is a big deal for those who maintain multi-contributor projects and use Travis to keep the cats appropriately herded.
  • styler was initially designed to help code adhere to the Tidyverse Style Guide, which in my humble opinion is one of the best style guides that have ever existed for R. It can now take any custom style files and reformat your code, either as a function or straight from an RStudio add-in.

So use them.

EIGHT: Everything is a package.

Whether you’re writing R code for fun, profit, research or the creation of shareholder value, your coworkers and your clients – rightly! – expect a coherent piece of work product that has everything in one neat package, preferably version controlled. Sending around single R scripts might have been appropriate at some point in the mid-1990s, but it isn’t anymore. And so, your work product should always be structured like a package. As a minimum, this should include:

  1. A DESCRIPTION and NAMESPACE file.
  2. The source code, including comments.
  3. Where appropriate, data mappings and other ancillary data to implement the code. These go normally into the data/ folder. Where these are large, such as massive shape files, you might consider using Git LFS.
  4. Dependencies, preferably in a packrat repo.
  5. The documentation, helping users to understand the code and in particular, if the code is to be part of a pipeline, explaining how to interact with the API it exposes.
  6. Where the work product is an analysis rather than a bit of code intended to carry out a task, the analysis as vignettes.

To understand the notion of analyses as packages, two outstanding posts by Robert M. Flight are worth reading: part 1 explains the ‘why’ and part 2 explains the ‘how’. Robert’s work is getting a little long in the tooth, and packages like knitr have taken the place of vignettes as analytical outputs, but the principles remain the same. Inasmuch as it is possible, an analysis in R should be a self-contained package, with all the dependencies and data either linked or included. From the perspective of the user, all that should be left for them to do is to execute the analysis.

NINE: Power in names.

There are only two hard things in Computer Science: cache invalidation and naming things.

Phil Karlton

In general, R has a fair few idiosyncrasies in naming things. For starters, dots/periods . are perfectly permitted in variable names (and thus function names), when in most languages, the dot operator is a binary operator retrieving the first operand object’s method called the second operand:

a.b(args) = dot(a, b, args) = a_{Method: b}(args)

For instance, in Python, wallet.pay(arg1, arg2) means ‘invoke the method pay of the object wallet with the arguments arg1 and arg2‘. In R, on the other hand, it’s a character like any other, and therefore there is no special meaning attached to it – you can even have a variable contaning multiple dots, or in fact a variable whose name consists entirely of dots5 – in R, .......... is a perfectly valid variable name. It is also a typcal example of the fact that justibecause you can do something doesn’t mean you also should do so.

A few straightforward rules for variable names in R are worth abiding by:

  1. Above all, be consistent. That’s more important than whatever you choose.
  2. Some style guides, including Google’s R style guide, treat variables, functions and constants as different entities in respect of naming. This is, in my not-so-humble opinion, a blatant misunderstanding of the fact that functions are variables of the type function, and not some distinct breed of animal. Therefore, I recommend using a unitary schema for all variables, callable or not.
  3. In order of my preference, the following are legitimate options for naming:
    • Underscore separated: average_speed
    • Dot separated: average.speed
    • JavaScript style lower-case CamelCase: averageSpeed
  4. Things that don’t belong into identifiers: hyphens, non-alphanumeric characters, emojis (🤦🏼‍♂️) and other horrors.
  5. Where identifiers are hierarchical, it is better to start representing them as hierarchical objects rather than assigning them to different variables. For example, instead of monthly_forecast_january, monthly_forecast_february and so on, it is better to have a list associative array called forecasts in which the forecasts are keyed by month name, and can then be retrieved using the $key or the [key] accessors. If your naming has half a dozen components, maybe it’s time to think about structuring your data better.

Finally, in some cases, the same data may be represented by multiple formats – for instance, data about productivity is first imported as a text file, and then converted into a data frame. In such cases, Hungarian notation may be legitimate, e.g. txt_productivity or productivity.txt vs df_productivity or productivity.df. This is more or less the only case in which Hungarian notation is appropriate.6

And while we’re at variables: never, ever, ever use = to assign. Every time you do it, a kitten dies of sadness.

For file naming, some sensible rules have served me well, and will be hopefully equally useful for you:

  1. File names should be descriptive, but no longer than 63 characters.
  2. File names should be all lower case, separated by underscores, and end in .R. That’s a capital R. Not a lower-case R. EveR.
  3. Where there is a ‘head’ script that sequentially invokes (sources) a number of subsidiary scripts, it is common for the head script to be called 00-.R, and the rest given a sequential corresponding to the order in which they are sourced and a descriptive name, e.g. 01-load_data_from_db.R, 02-transform_data_and_anonymise_records.R and so on.
  4. Where there is a core script, but it does not invoke other files sequentially, it is common for the core script to be called 00-base.R or main.R. As long as it’s somewhere made clear to the user which file to execute, all is fair.
  5. The injunction against emojis and other nonsense holds for file names, too.

TEN: Know the rules and their rationale, so that you know when to break them.

It’s important to understand why style rules and defensive programming principles exist. How else would we know which rules we can break, and when?

The reality is that there are no rules, in any field, that do not ever permit of exceptions. And defensive programming rules are no exception. Rules are tools that help us get our work done better and more reliably, not some abstract holy scripture. With that in mind, when can you ignore these rules?

  • You can, of course, always ignore these rules if you’re working on your own, and most of your work is interactive. You’re going to screw yourself over, but that’s your right and privilege.
  • Adhering to common standards is more important than doing what some dude on the internet (i.e. my good self) thinks is good R coding practice. Coherence and consistency are crucial, and you’ll have to stick to your team’s style over your own ideas. You can propose to change those rules, you can suggest that they be altogether redrafted, and link them this page. But don’t go out and start following a style of your own just because you think it’s better (even if you’re right).
  • It’s always a good idea to appoint a suitable individual – with lots of experience and little ego, ideally! – as code quality standards coordinator (CQSC). They will then centrally coordinate adherence to standards, train on defensive coding practices, review operational adherence, manage tools and configurations and onboard new people.
  • Equally, having an ‘editor settings repo’ is pretty useful. This should support, at the very least, RStudio lintr and styler.
  • Some prefer to have the style guide as a GitHub or Confluence wiki – I generally advise against that, as that cannot be tracked and versioned as well as, say, a bunch of RST files that are collated together using Sphinx, or some RMarkdown files that are rendered automatically upon update using a GitHub webhook.

Conclusion

[su_pullquote align=”right”]Always code as if your life, or that of your loved ones, depended on the code you write – because it very well may someday.[/su_pullquote]In the end, defensive programming may not be critical for you at all. You may never need to use it, and even if you do, the chances that as an R programmer your code will have to live up to defensive programming rules and requirements is much lower than, say, for an embedded programmer. But algorithms, including those written in R, create an increasing amount of data that is used to support major decisions. What your code spits out may decide someone’s career, whether someone can get a loan, whether they can be insured or whether their health insurance will be dropped. It may even decide a whole company’s fate. This is an inescapable consequence of the algorithmic world we now live in.

A few years ago, at a seminar on coding to FDA CDRH standards, the instructor finished by giving us his overriding rule: in the end, always code as if your life, or that of your loved ones, depended on the code you write (because it very well may someday!). This may sound dramatic in the context of R, which is primarily a statistical programming language, but the point stands: code has real-life consequences, and we owe it to those whose lives, careers or livelihoods depend on our code to give them the kind of code that we wish to rely on: well-tested, reliable and stable.

References   [ + ]

1.One crucial element here: keep in mind the order of parameters – because the explicit should override the implied, your code should look at the arguments first, then the environment variables, then the configuration file!
2.Eagle-eyed readers might spot the mutation in the example. This is indeed the T310C mutation on the TNFRSF13B gene (17p11.2), which causes autosomal dominant type 2 Common Variable Immune Deficiency, a condition I have, but manage through regular IVIG, prophylactic antibiotics and aggressive monitoring.
3.Holzmann, G.J. (2006). The Power of 10: Rules for developing safety-critical code. IEEE Computer 39(6):95-99.
4.R stores packages in libraries, which are basically folders where the packages reside. There are global packages as well as user-specific packages. For a useful overview of packages and package installation strategies, read this summary by Nathan Stephens.
5.The exception is ..., which is not a valid variable name as it collides with the splat operator.
6.In fact, the necessity of even this case is quite arguable. A much better practice would be simply importing the text as productivity, then transform it into a data frame, if there’s no need for both versions to continue coexisting.