Open research and reproducible science

Women in Sage, Montreal
https://doi.org/cn9t

Presented By
Tania Allard, PhD
https://trallard.github.io/Talks/WomenInSage

Introduction to reproducibility

Everything you always wanted to know but you were too afraid to ask

The father of reproducibility

~150 years ago, Pasteur demonstrated how experiments can be conducted reproducibly, and the value of doing so.


💉 Vaccines


🍻 Beers!!!


There *is* a chronic reproducibility problem

Rather than a reproducibility crisis

- Mike Konczal

http://www.bitss.org/2015/12/31/science-is-show-me-not-trust-me/

- Philip Stark, Science is 'show me' not 'trust me' (2015)

🤔 But what is reproducibility?

Glad you asked

Reproducible


Replicable


Robust


Convention

- Jenny Bryan on project-oriented workflows

Let's revisit a typical scenario

What you did...

Open package 'x'. Click, click, drag, click, click, right-click, save, 'results.csv'

Load into Excel. Click, drag, generate graph, right-click, save, 'graph1.png'

What you reported...

The data was analysed using package 'x' using the 'y' analysis. The results are shown in 'graph1.png'

A better technical scenario

Your objective is to have a complete chain of custody (provenance) from your raw data to your finished results and figures.

This way you can work out which code and data were used to generate each result.

If you use version control, you can also refer to specific versions of your study (e.g. manuscript, first quarter report, Nobel Prize committee version).
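One way to keep that chain of custody is to write a small provenance record next to each output. The sketch below is a minimal illustration, assuming your project lives in a git repository and your inputs are ordinary files: it stores a SHA-256 checksum of every input plus the current commit hash. The function names (`file_sha256`, `provenance_record`, `write_provenance`) and the `.provenance.json` sidecar naming are hypothetical choices, not a standard.

```python
import hashlib
import json
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def file_sha256(path):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def current_commit():
    """Return the current git commit hash, or None if git/a repository is unavailable."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()


def provenance_record(inputs, output):
    """Describe which input files (and which code version) produced `output`."""
    return {
        "output": str(output),
        "inputs": {str(p): file_sha256(p) for p in inputs},
        "code_version": current_commit(),
        "created_utc": datetime.now(timezone.utc).isoformat(),
    }


def write_provenance(inputs, output):
    """Write a hypothetical sidecar file, e.g. results.csv.provenance.json."""
    sidecar = Path(str(output) + ".provenance.json")
    sidecar.write_text(json.dumps(provenance_record(inputs, output), indent=2))
    return sidecar
```

For example, `write_provenance(["data/raw.csv"], "results/results.csv")` (paths illustrative) would create `results/results.csv.provenance.json`; if a checksum differs later, you know that result came from different data.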

Practical scenario

Imagine someone manages to sneak into your office at night AND deletes E-V-E-R-Y-T-H-I-N-G except for your code and data ('cause these are in safe repositories)
Imagine being able to run a single command to generate everything including results, tables, and figures in their final polished form.

Wouldn't it be great?
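A minimal sketch of that "single command", assuming a Python project whose raw data is a CSV with hypothetical `group` and `value` columns. One entry point regenerates every output from the raw data, so `python run_all.py data/raw.csv results/` (script name and paths are illustrative) rebuilds the results table from scratch. A real project would add steps for figures and the manuscript; tools such as Make or Snakemake serve the same purpose.

```python
import csv
import statistics
import sys
from pathlib import Path


def load_raw(path):
    """Read the raw CSV (assumed columns: group, value) without modifying it."""
    with open(path, newline="") as handle:
        return [(row["group"], float(row["value"])) for row in csv.DictReader(handle)]


def summarise(rows):
    """Compute the mean value per group, ordered by group name."""
    groups = {}
    for group, value in rows:
        groups.setdefault(group, []).append(value)
    return {g: statistics.mean(v) for g, v in sorted(groups.items())}


def write_table(summary, path):
    """Write the final results table."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["group", "mean_value"])
        for group, mean in summary.items():
            writer.writerow([group, f"{mean:.3f}"])


def run_pipeline(raw_path, out_dir):
    """Regenerate every output from the raw data in one call."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    table_path = out_dir / "results.csv"
    write_table(summarise(load_raw(raw_path)), table_path)
    return table_path


# Usage: python run_all.py data/raw.csv results/
if __name__ == "__main__" and len(sys.argv) == 3:
    run_pipeline(sys.argv[1], sys.argv[2])
```

Because nothing is edited by hand, deleting the `results/` folder costs nothing: running the command again reproduces it.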

Better yet, imagine someone completely unfamiliar with your project being able to look at these files and understand what you did and why (e.g. readers, collaborators, your replacement, you in six months' time).

From a sustainability point of view

Speed up scientific progress

Contribute to open source and our beloved community

Acquire more varied, highly valuable skills

Change the current academic culture ✨

And my all-time favourite... increase the bus factor (how many people would have to be hit by a bus before your project stalls)

Why is it so hard?

Barriers to reproducible science/research
  • Not considered for promotion*
  • Requires additional skills
  • Takes time
  • Publication bias towards *innovative* findings
  • Held to higher standards than other research

There is hope

So how do I get started?

What can I do to make my research more reproducible?

Start with small, practical steps...



  • Automate when possible: a.k.a. learn to code
  • Ask for help: need more training? need an extra pair of 👀 on your code?
  • Use version control
  • Help others
  • Adopt an open science approach
  • Choose the right tools... if uncertain, ask for advice

Treat your digital research assets with care

The results are important but the process you followed and the tools you used to get there are just as important.
Your scripts/code, null results, datasets, and iterations can make a positive difference in research

Share with others

  • Well-documented code... even if only for yourself in six months' time
  • Data used to produce the results
  • The details of the workflows used
  • Information on how to cite your work
  • Information on how to use your work: licenses
  • Deterministic execution environments*
*To ensure that if anyone else runs your analysis on a different machine, they get the same results
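A deterministic environment has several layers (pinned package versions, containers, and so on). The sketch below covers only two of them, assuming your analysis uses Python's standard `random` module: it seeds a private generator so repeated runs give identical numbers, and it records the interpreter and platform next to the result. Python only guarantees that the same seed gives the same `random()` sequence across versions, not helpers such as `choice()`, which is why the version is worth recording. `run_analysis`, its toy bootstrap, and the seed value 2019 are hypothetical.

```python
import json
import platform
import random


def run_analysis(seed=2019):
    """Toy analysis (hypothetical): bootstrap estimate of a mean."""
    rng = random.Random(seed)  # private, seeded generator instead of global state
    data = [2.1, 3.4, 1.9, 4.2, 3.3]
    boot_means = []
    for _ in range(1000):
        resample = [rng.choice(data) for _ in data]
        boot_means.append(sum(resample) / len(resample))
    return sum(boot_means) / len(boot_means)


def environment_snapshot():
    """Details someone needs to recreate the run on another machine."""
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
    }


def record_run(seed=2019):
    """Return the seed, the result, and the environment as a JSON string."""
    return json.dumps(
        {"seed": seed, "result": run_analysis(seed), "environment": environment_snapshot()},
        indent=2,
    )
```

Saving the output of `record_run()` alongside your results lets a reader check both that they used the same seed and that they ran a comparable environment.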

Is reproducible == open?

What if I cannot share my data/code?

That is fine...

Find what works for you

It is not only about disclosure

You can have FAIR (Findable, Accessible, Interoperable, Reusable) assets without them being open

Adopt an open science approach

What does this even mean?

It is not only about the science

It is also about the people, and empowering them to do better science

We are not the leaders of tomorrow, we are the leaders of today

Have you got a minute to talk about open source?

- Lorena Barba

Quick guide to licensing

Open data and content can be freely used, modified, and shared by anyone for any purpose (The Open Definition)
  • Simply making the source code public does not make your project open source
  • Code has copyright, and without a license others don't know if they can use it or not (always add a license)

  • Permissive licenses give users more freedom, mainly requiring that the authors be credited (MIT, BSD, Apache License)
  • Copyleft (share-alike) licenses restrict the use of software by requiring that any derivative works also be released under the license of the original (GPL).
The website http://choosealicense.com/ is a good starting point.
Also make sure to check its Appendix, which has a table comparing licenses and their features.
Not all licenses are compatible! Compatibility is also directional.
Morin et al. (2012), PLOS
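To make "compatibility is directional" concrete, here is a deliberately tiny lookup table, written in Python only for illustration. It asks whether code under one license can be included in a project distributed under another, using SPDX identifiers. The entries reflect widely cited cases (for example, the Free Software Foundation considers Apache-2.0 compatible with GPLv3 but not with GPLv2), but the table and function names are hypothetical and far from complete; check the license texts or choosealicense.com for your real situation.

```python
# Hypothetical, incomplete table keyed by
# (license of the code you reuse, license of your project).
# The key order matters: compatibility is not symmetric.
CAN_INCLUDE = {
    ("MIT", "MIT"): True,
    ("MIT", "GPL-3.0-only"): True,
    ("BSD-3-Clause", "GPL-3.0-only"): True,
    ("Apache-2.0", "GPL-3.0-only"): True,
    ("Apache-2.0", "GPL-2.0-only"): False,
    ("GPL-3.0-only", "MIT"): False,
    ("GPL-3.0-only", "GPL-3.0-only"): True,
}


def can_include(component_license, project_license):
    """Can code under `component_license` go into a `project_license` project?"""
    key = (component_license, project_license)
    if key not in CAN_INCLUDE:
        raise ValueError(f"No entry for {key}; check the license texts")
    return CAN_INCLUDE[key]


def is_one_way(a, b):
    """True if code under `a` can go into a `b` project, but not the reverse."""
    return can_include(a, b) and not can_include(b, a)
```

Here `is_one_way("MIT", "GPL-3.0-only")` is `True`: MIT code can be pulled into a GPL project, but GPL code cannot be pulled into a project you want to keep under MIT.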
Doing open science and reproducible science can often be hard and frustrating. But ...
"Unless someone like you cares a whole awful lot, nothing is going to get better. It's not."

Dr Seuss