Encourage better data management in research groups: a #Scidata16 approach

Last week I attended the conference “Better Science through better data”, or #Scidata16 for short, organised by Nature’s journal Scientific Data and the Wellcome Collection. The first Scidata conference back in 2014 was a highly enlightening event, and so were the subsequent ones in 2015 and this year (2016).


At the end of this post you will find a list of reports by other participants which give a very detailed account of the day. So instead of recounting the day in a linear fashion, I thought I would pick some questions from the audience and respond to them based on the proceedings.


The organisers this year used sli.do to capture audience engagement. It was nice for me to experience it from the participant’s side rather than the instructor’s, and to use it to quickly up-vote questions from others, especially early career researchers.

The honour of the first question goes to Ben Britton, one of my academic colleagues, and it comes from Twitter rather than sli.do.



How can we encourage better data management in our research groups?

This is a universal question, prominent throughout the conference, expressed in different ways such as:


  • How can we build a better culture of reproducibility? [Sli.do question from Erica Brockmeier]
  • How do you convince your PI to share data, particularly pre-publication? [Sli.do question from Anonymous]
  • Everyone here agrees with you – we need open data. The problem is that most people outside this conference can’t be bothered. How do we get THEM to care? [Sli.do question from Anonymous]
  • Have you had any push back from colleagues and peers for sharing all your data? [Sli.do question from Anonymous]
  • How much are problems with data management simply problems with a historical lack of knowledge on best practice? Can we teach old dogs new tricks? [Sli.do question from Anonymous]
  • How to convince your supervisor to make your data open source? What arguments and advantages? [Sli.do question from Anonymous]
  • Why should I make my data and work transparent when no one else does? [Sli.do question from Anonymous]


If we accept that one of the advantages of data management is to make your work reproducible, then the keynote by Dr Florian Markowetz at the start of the day explored a set of motivations and reasons why every researcher should care about reproducible research. It is idealistic to believe that the ability to reproduce experiments lies at the core of science; it is definitely the right and honourable thing to do, and of course the world would be a better place if everyone did it. In reality, however, the academic career is a very competitive and demanding path where a long CV and a list of publications matter a lot.

With this in mind Florian suggested 5 selfish reasons to work reproducibly:

  1. Avoid disaster (from small to large-scale disasters, good data management practice saves you time in the long run whenever you need to return to your data for further analysis or revalidation of results. See, for example, a big career disaster which happened while transferring data across spreadsheets (Kolata, 2011), or not being able to validate your own results (Markowetz, 2015)).
  2. Easier to write papers (good documentation of code and data enables us to look up numbers and transfer them into the manuscript and to be confident that figures and tables are up to date; and, of course, computational data handling results in automatic and flawless updates when the data change).
  3. Easier to talk to reviewers (This is especially important when you submit articles for peer review. The reviewers can go to your data, check your analysis and test for themselves any suggestion for improvement before returning to you with their feedback).
  4. Continuity of your work in the lab (People move their careers across institutions/labs, and good documentation can ensure the continuity of a research project, especially a long-term one, without needing to start all over again).
  5. Reputation (PhD students and Postdocs engaging with reproducible research and Data management have the opportunity to learn new tools and apply this knowledge in their daily routine. Automatically this contributes to a cutting-edge skillset and more and better career opportunities. The PIs -Principal Investigators- also create a culture of best practice and reproducibility in their own labs, resulting in better research and leading by example).


Dr Jenny Molloy emphasised the benefits of being an open researcher drawing from her own experience as an early career researcher.

Data Sharing Benefits


Open research:

  • accelerates career recognition;
  • leads to citations to your own research;
  • opens new possibilities for collaborations;
  • creates career opportunities in Open Science projects.

Dr Kevin Ashley reiterated some of the “selfish” reasons to engage with reproducible research, adding the current funding mandates. That said, not all data should be made open. There are always exceptions where data should be safeguarded; however, a statement on how the underlying data can be accessed (a “data access statement”) is essential and required by funders.

Royston Robertson from Ludic Creatives produced this fantastic Scrib


Many great questions were asked during the day. In subsequent posts I will try to address some of them, drawing on the other talks from the day.

Check back for more posts under the #scidata16 tag.





References mentioned above:

Kolata, G. (2011) How bright promise in Cancer testing fell apart. The New York Times. Available from: http://www.nytimes.com/2011/07/08/health/research/08genes.html?_r=0 [Accessed: 1 November 2016].

Markowetz, F. (2015) Five selfish reasons to work reproducibly. Genome Biology. 16 (1), 274. Available from: doi:10.1186/s13059-015-0850-7 [Accessed: 10 November 2016].

Reports and Recordings:

Follow the recordings and the presentations from the event at the Scientific Data blog

