Auto-DOI for Quarto posts via Rogue Scholar
Oh, that’s mint. We can finally use Rogue Scholar to mint DOIs for Quarto posts and append them automagically.
I love posts that allow me to merge some of my addictions. In this case, it’s my love for Quarto project scripts (which I’ve written about elsewhere), my fondness for Rogue Scholar and the overuse of the word ‘mint’ to mean ‘generally really quite rather nice’.
Rogue Scholar is a fantastic tool for science bloggers, and while it’s a little artisanal (i.e. hand-made much of the time) at this point, it’s got some really cool automated features. One is that it registers (mints, hence the abundance of lame peppermint puns across this post) DOIs for your posts.
I’ve been using Rogue Scholar to mint DOIs for my posts for a while now, but it’s always been a bit of a manual process. I’d have to wait for the post to show up on the Rogue Scholar feed, then copy the DOI and paste it into the YAML front matter. It’s not a lot of work, but it’s a bit of a pain. I’ve been meaning to automate it for a while, but I’ve been busy with other things.
Just after I posted about this solution, Martin Fenner, who runs Rogue Scholar, pointed out that there’s now an API. The API is great, and would have spared me having to scrape the HTML. I will, one of these days, switch over – if I had to build it today, I’d obviously use the API, and simply parse the JSON result. The rest, ceteris paribus, holds true.
This weekend, I was laid up with being on the receiving end (for once) of the bounties of a clinical trial, so I’ve decided to finally build it. It’s a bit of a hack, but it works.
First, we scrape Rogue Scholar for titles and DOIs. Rogue Scholar’s CSS isn’t really helpful here, as the link doesn’t have a class or id of its own as far as I could discern, so I just grabbed the link by the fact that only DOI links are formatted like DOI links. Not the most elegant way, but it does the job.
- Technically unnecessary, as Rogue Scholar currently only displays ten links, but hey.
- This is where we split the DOI link into the link prefix and the DOI. We don’t need the prefix, so we just grab the second part of the split.
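A minimal sketch of this scrape, using only the standard library’s html.parser. The page structure here is an assumption: the sketch pairs each DOI link with the nearest preceding h2/h3 heading, which may not match Rogue Scholar’s actual markup. DoiLinkParser and the sample HTML are made up for illustration, and in practice the HTML would come from fetching the blog’s Rogue Scholar page.

```python
from html.parser import HTMLParser

DOI_PREFIX = "https://doi.org/"


class DoiLinkParser(HTMLParser):
    """Collect {title: doi}, pairing each DOI link with the last heading seen.

    Hypothetical helper: the heading/link pairing is an assumption about the
    page layout, not something taken from Rogue Scholar's markup.
    """

    def __init__(self):
        super().__init__()
        self.dois = {}
        self._in_heading = False
        self._heading_parts = []
        self._last_title = None

    def handle_starttag(self, tag, attrs):
        if tag in ("h2", "h3"):
            self._in_heading = True
            self._heading_parts = []
        elif tag == "a":
            href = dict(attrs).get("href") or ""
            # Only DOI links look like DOI links, so the href is the filter.
            if href.startswith(DOI_PREFIX) and self._last_title:
                # Split off the resolver prefix and keep the second part.
                self.dois[self._last_title] = href.split(DOI_PREFIX)[1]

    def handle_data(self, data):
        if self._in_heading:
            self._heading_parts.append(data)

    def handle_endtag(self, tag):
        if tag in ("h2", "h3") and self._in_heading:
            self._in_heading = False
            self._last_title = "".join(self._heading_parts).strip()


# Made-up sample standing in for the fetched page.
sample = """
<h3><a href="/posts/1">Auto-DOI for Quarto posts via Rogue Scholar</a></h3>
<a href="https://doi.org/10.59350/5hxdg-fz574">https://doi.org/10.59350/5hxdg-fz574</a>
"""

parser = DoiLinkParser()
parser.feed(sample)
print(parser.dois)
```

The result is a plain dictionary keyed by post title, which is all the later steps need.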
Next, we iterate through each blog post. This is actually quite fast, since (1) we have relatively few of them, (2) they’re text documents. We parse the YAML preface at the beginning of each of them. This looks something like this:
What this tells us is that we do want a citation (someday), which is why we’re doing this in the first place. That, according to our beautiful flowchart in Figure 1, means this post is eligible to get a DOI appended. We also know there isn’t one – DOIs are appended as key-value pairs (with the key being, unsurprisingly, doi) to the citation object in the YAML preface. So, we’ll see if we can get one by looking in the dictionary we scraped from Rogue Scholar in Listing 1.
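The eligibility check and lookup can be sketched on an already-parsed front matter dictionary. This is one reading of the logic described above, not the original script: attach_doi is a hypothetical name, titles are assumed to match the scraped ones exactly, and cross-post handling is left out.

```python
def attach_doi(meta: dict, dois: dict) -> dict:
    """Return a copy of the front matter with a DOI attached where eligible."""
    meta = dict(meta)
    citation = meta.get("citation")

    if not citation:
        return meta  # no citation requested, so the post is not eligible
    if isinstance(citation, dict) and "doi" in citation:
        return meta  # already has a DOI, nothing to do

    # Assumption: the front matter title matches the scraped title exactly.
    doi = dois.get(meta.get("title"))
    if doi is None:
        return meta  # not on the feed (yet)

    # `citation: true` becomes an object so that it can carry the doi key.
    citation = dict(citation) if isinstance(citation, dict) else {}
    citation["doi"] = doi
    meta["citation"] = citation
    meta["google-scholar"] = True  # anything with a DOI gets Scholar metadata
    return meta


dois = {"Auto-DOI for Quarto posts via Rogue Scholar": "10.59350/5hxdg-fz574"}
post = {"title": "Auto-DOI for Quarto posts via Rogue Scholar", "citation": True}
print(attach_doi(post, dois))
```

Returning a copy rather than mutating the input makes it easy to compare before and after, and to skip the write when nothing changed.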
- We have to split the document in two because only the preamble is proper, parseable YAML. The rest of the document is just text, so we have to recombine it later.
- If it’s a cross-post, we don’t want it to have a Google Scholar link, and we’ll definitely not attach a DOI. In theory, we could have built this to be overridable in case I ever produce a cross-post I do want to have a DOI, but I don’t see that happening.
- While we’re at it, might as well prune the cross-posts.
- And anything with a DOI should also get Google Scholar metadata.
- The .rstrip() is pretty useful – otherwise, every time you run this, you’ll get another newline appended to the YAML preface.
- Don’t forget the \n before the YAML block’s end, otherwise you’ll end up with a YAML block that’s not properly separated from the rest of the document and won’t parse.
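The split-and-recombine step from the notes above can be sketched with the standard library alone. The function names are hypothetical, and the sketch assumes the document opens with a `---` block and that `---` never appears inside the front matter itself. Parsing and dumping the YAML source would be handed to a YAML library (for instance PyYAML’s yaml.safe_load and yaml.safe_dump), which is left out here.

```python
def split_front_matter(text: str) -> tuple[str, str]:
    """Return (yaml_source, body) for a document that opens with a '---' block."""
    # maxsplit=2 leaves any later '---' (e.g. horizontal rules) in the body.
    _, yaml_source, body = text.split("---", 2)
    return yaml_source, body


def join_front_matter(yaml_source: str, body: str) -> str:
    """Recombine the YAML preface and the body into one document."""
    # Stripping the YAML stops newlines from piling up on every run; the
    # explicit "\n" keeps the closing '---' on a line of its own.
    return "---\n" + yaml_source.strip() + "\n---" + body


doc = "---\ntitle: Example\ncitation: true\n---\n\nBody text.\n"
yaml_source, body = split_front_matter(doc)
print(join_front_matter(yaml_source, body) == doc)
```

Running the round trip repeatedly should leave the file byte-for-byte unchanged, which is what keeps the check-in step further down from producing spurious diffs.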
Finally, we write the YAML back to the file, and we’re done. We can now declare this as a project script, and we’re good:
One thing worth noting is that we’re not actually running this on the Quarto project itself, but on a copy of it. The consequence is that the changes are made ‘on the fly’ to the .qmd files and do not necessarily propagate into the repo. This is a pain, because recall that we’re only fetching the last ten posts’ DOIs so as to be kind on the server: as time goes on, that means older posts ‘lose’ their DOI.
To prevent this, we can simply check our changes back in:
- The diff-index --quiet HEAD checks if there have been changes to the working tree. git returns an error if you’re trying to commit on a clean working tree (that is, when there is nothing to commit), so we’re checking for that first.
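The same guard can be sketched in Python with subprocess, for anyone who would rather keep the check-in logic next to the script. This is not the original workflow step: commit_if_changed is a hypothetical helper, the run parameter exists only so the function can be exercised without a real repository, and pushing is left out.

```python
import subprocess


def commit_if_changed(message: str, run=subprocess.run) -> bool:
    """Commit tracked changes if there are any; return True if a commit was made."""
    # `git diff-index --quiet HEAD` exits with 0 when the tracked files match
    # HEAD and with 1 when they differ.
    clean = run(["git", "diff-index", "--quiet", "HEAD"]).returncode == 0
    if clean:
        return False  # nothing to commit, so skip the commit that would fail
    run(["git", "commit", "-am", message], check=True)
    return True
```

Called as commit_if_changed("Add DOIs") from inside the checked-out repository, it either makes one commit or does nothing.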
And that’s it. We can now run this as a GitHub Action, and it’ll automatically append DOIs to our posts.
As noted: Quarto project scripts are pretty awesome stuff. I’m thinking of setting up an awesome- for it on GitHub, because way too few of them are shared properly. I’m hoping this will change.
Citation
@misc{csefalvay2023,
author = {Chris von Csefalvay},
title = {Auto-DOI for {Quarto} Posts via {Rogue} {Scholar}},
date = {2023-11-13},
url = {https://chrisvoncsefalvay.com/posts/auto-doi},
doi = {10.59350/5hxdg-fz574},
langid = {en-GB}
}