A standardised way of organising and bundling up the code, text, data and outputs associated with a publication can make it much easier to distribute, check, version, build on or update your research (particularly useful during the review process!). It can also help meet the growing demands from funders, publishers and the wider academic community for greater transparency in research.
What I offer
Should you choose this route for your publications, I can set you up, or help you bundle the code, text and data associated with a publication into a research compendium.
Some potential features of a research compendium:
- Standardised compendium file structure to ensure ease of navigation within and between projects.
- Following R package conventions for organising and packaging code, enabling the use of tools and automations available for R code development, distribution, checking and testing.
- Use of literate programming (in R Markdown or Quarto) to combine code, text and outputs like tables and graphs in a single reproducible document.
- Packaging up associated code like functions or data processing scripts for ease of access.
- Use of a formal version control system (e.g. git) and remote sharing platforms like GitHub for better tracking of changes and ease of distribution.
- Use of a formal release system to improve reproducibility and provenance tracking of the exact state of the code used to produce results.
- Inclusion of tests for code validation and increased robustness.
- Appropriate dependency and computational environment management for increased reproducibility.
- Generation of metadata for associated data, to increase its potential for reuse and help ensure it is reused correctly.
Why bother?
Research Compendia for publication of research
While data science outputs might take a variety of forms, the academic standard for communicating research results remains the academic paper. However…
"An article about computational result is advertising, not scholarship. The actual scholarship is the full software environment, code and data, that produced the result." (Jon Claerbout, paraphrased in Buckheit and Donoho, 1995)
So how do we share and make better use of the code, data and computational environment that represent the legitimate scholarship underlying a research paper? Enter the research compendium: a container for the collection of materials associated with the results reported in a research paper. The compendium serves as a means of distributing, managing and updating that collection. Here are a couple of articles (among many others) that expand on the argument:
- The Scientific Paper is obsolete. Here's what's next. The Atlantic, 5 April 2018.
- Cut the tyranny of copy-and-paste with these coding tools. Nature, 28 February 2022.
Why me?
I've been a proponent of the research compendium for some time now and, as an RSE, have helped many researchers bundle up their projects for better reproducibility and reuse. I've taught workshops and courses on converting research code, text and data into a research compendium using the R package rrtools. More importantly, through the numerous ReproHack events I've facilitated, I've seen many approaches to publishing the code and data associated with published research. I therefore have a good idea of what works, and of what can leave others struggling to reproduce the research or reuse its outputs despite them being freely available.
Overall, I'm a true believer that building conventions around domain-specific research compendia for publishing associated materials is the way forward for more reproducible, transparent, robust and reusable scholarship.