More and more scientists are asked to conduct their research in a reproducible fashion. We are often asked to publish the code and data that accompany an analysis. Ostensibly, this is a good thing, as it helps others judge the accuracy and efficacy of our analyses. Another benefit is that reproducible workflows help your future self when you return to an analysis days, months, or even years later. However, incorporating a reproducible workflow is hard for a variety of reasons. One is that both the data collected during an experiment and the computer code written to analyze those data change over time, and keeping track of these changes as projects grow in duration and complexity can be difficult. Another reason is that most non-computer scientists don't get the training needed to help with this process. For instance, do you know what a Merkle tree is? I didn't, but most CS majors do. So when they encounter version control for the first time, the underlying concept is clear.
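To give a flavor of that underlying concept: a Merkle tree is a tree of hashes, where each parent node is the hash of its children, so a change to any piece of content ripples all the way up to a single root hash. The sketch below is a deliberately minimal, hypothetical illustration in Python, not git's actual object format (git stores blobs, trees, and commits with their own headers), but it shows why version control can detect a change anywhere in a project by comparing one hash.

```python
import hashlib

def h(data: bytes) -> str:
    """SHA-1 digest, the hash git has historically used for its objects."""
    return hashlib.sha1(data).hexdigest()

def merkle_root(leaves: list[bytes]) -> str:
    """Compute the root hash of a simple Merkle tree over the given leaves."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        # Pair up adjacent hashes; an odd one out is carried forward alone.
        pairs = [level[i:i + 2] for i in range(0, len(level), 2)]
        level = [h("".join(pair).encode()) for pair in pairs]
    return level[0]

# Pretend these are the files in a small project (names are made up).
files = [b"data.csv, version 1", b"analysis.R, version 1"]
root_before = merkle_root(files)

files[0] = b"data.csv, version 2"  # edit one "file"
root_after = merkle_root(files)

print(root_before != root_after)   # prints True: the root reflects the change
```

Because any edit changes the root hash, two copies of a project can be compared for equality by comparing a single short string; this is the idea that makes git fast at spotting what changed between commits.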
In this workshop we will review some foundational concepts for building and implementing basic reproducible workflows, using best practices in project architecture and version control with git. By the end of the workshop, you should have a good feel for what constitutes a reproducible workflow, as well as a handful of practical techniques for working with your data and your code.