Changing culture with the Dataset Maturity Model¶

Abstract¶

This recipe introduces the FAIRplus “Dataset Maturity Model” (DSM) and shows how to use it within a FAIRification process to decide how far to go on a FAIR journey. We also show how each FAIR Cookbook recipe has been anchored to the model. Recipes can therefore be assembled into a coherent path, so that datasets handled according to the recommendations should meet data management expectations in terms of FAIRness. The FAIRplus DSM thus gives data managers a handy tool for advising on changes to the culture of data management, and also for managing expectations (and costs) when devising a FAIRification program for specific domains and the digital objects living in that space.

Background¶

Maturity models are not new. Training programs specifically designed around them exist, such as the Capability Maturity Model Integration (CMMI) 1. These models originate from engineering and manufacturing, in particular the military and aerospace industries, as a means to rate the reliability and degree of development of a particular technology, skill or process, in other words, a capability. Technology Readiness Levels (TRLs), for example, define a nine-level scale that rates a process from basic idea to production-grade technology.

With the digitization of society and the pervasiveness of digital technology, the life sciences, like other fields, are wrestling with the challenges of data management as defined in The Fourth Paradigm: Data-Intensive Scientific Discovery 3. Organizations need to decide how to allocate resources to increase the impact of digital artefacts as they are created. A large body of literature details the key ideas for handling data. A resource such as the DAMA book covers in great depth the fundamental operations and challenges associated with data management activities 4.

More recently, the FAIR principles articulated the key requirements and properties that data should have 7. Following this important work, a number of initiatives have worked on producing domain-specific maturity indicators. Among these initiatives, the Research Data Alliance Maturity Indicators seem to have gained prominence 2.

Building on these efforts, the FAIRplus project has developed a more targeted approach by focusing on the notion of a dataset.

Presentation of the FAIRplus Dataset Maturity Model (DSM)¶

The FAIRplus Dataset Maturity Model proposes a framework that incorporates key concepts defined by the Capability and Maturity Models and applies them to define maturity levels which can be used to describe a dataset.

For a comprehensive overview of the FAIRplus Dataset Maturity Model, refer to the dedicated site, a screenshot of which is presented below.

Integrating the FAIRplus Dataset Maturity Model in the FAIRplus Cookbook¶

Each recipe in the FAIRplus Cookbook now incorporates one or more FAIRplus DSM indicators.

These can be found in the Recipe Card.

They give readers a pointer to the level of dataset maturity they can expect to reach if they apply and implement the recipe.

The FAIRplus DSM indicators can also be used to browse the recipes through the lens of the maturity level improvement that is of interest.
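As a rough illustration of this kind of browsing, the sketch below filters a small recipe catalogue by indicator. The recipe names, the indicator identifiers (`IND-1` and so on) and both helper functions are hypothetical placeholders. They are not the Cookbook's actual metadata or tooling.

```python
# Hypothetical catalogue: recipe name -> set of DSM indicator identifiers.
# Names and identifiers are made-up placeholders for illustration only.
RECIPE_INDICATORS = {
    "recipe-a": {"IND-1", "IND-2"},
    "recipe-b": {"IND-2", "IND-3"},
    "recipe-c": {"IND-4"},
}


def recipes_for_indicator(catalogue, indicator):
    """Return the sorted names of recipes tagged with the given indicator."""
    return sorted(name for name, tags in catalogue.items() if indicator in tags)


def recipes_covering(catalogue, targets):
    """Map each recipe to the target indicators it addresses (if any)."""
    wanted = set(targets)
    hits = {}
    for name, tags in catalogue.items():
        overlap = tags & wanted
        if overlap:
            hits[name] = sorted(overlap)
    return hits


# Which recipes help with one indicator, or with a set of target indicators?
by_indicator = recipes_for_indicator(RECIPE_INDICATORS, "IND-2")
by_targets = recipes_covering(RECIPE_INDICATORS, ["IND-3", "IND-4"])
```

A set per recipe keeps the lookup simple, since a recipe may carry one or more indicators.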

Finally, the FAIR Cookbook provides specific content, available as Jupyter notebooks, which uses the familiar Investigation Study Assay (ISA) model 5 and Research Objects 6 to showcase how users can move through maturity levels and decide for themselves how far they need to go along the scale.

Assessing FAIRplus intervention¶

The FAIRplus DSM has subsequently been used to assess the effectiveness of interventions on datasets presented to FAIRplus experts.

Each of the 20 projects that interacted with FAIRplus was subjected to a standard protocol assessing FAIR maturity before and after intervention, when performing retrospective processing of the data. In a few instances, the effect of prospective interventions could also be measured.

The FAIRplus DSM group developed a dedicated manual assessment template for this purpose.
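The before-and-after comparison described above can be sketched in a few lines. This is a minimal illustration that assumes each assessment is recorded as a mapping from indicator identifier to a satisfied or not satisfied flag. The identifiers and the `summarize_intervention` helper are hypothetical and do not reproduce the FAIRplus assessment template.

```python
def summarize_intervention(before, after):
    """Compare two assessments given as {indicator: bool} mappings.

    An indicator missing from one assessment is treated as not satisfied.
    """
    indicators = sorted(set(before) | set(after))
    gained = [i for i in indicators if after.get(i, False) and not before.get(i, False)]
    lost = [i for i in indicators if before.get(i, False) and not after.get(i, False)]
    return {
        "satisfied_before": sum(bool(before.get(i, False)) for i in indicators),
        "satisfied_after": sum(bool(after.get(i, False)) for i in indicators),
        "total": len(indicators),
        "gained": gained,
        "lost": lost,
    }


# Hypothetical assessment of one dataset, before and after FAIRification.
before = {"IND-1": True, "IND-2": False, "IND-3": False}
after = {"IND-1": True, "IND-2": True, "IND-3": False}
summary = summarize_intervention(before, after)
```

Reporting the gained and lost indicators alongside the counts shows which specific improvements an intervention delivered, not only how many.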

Training the assessor¶

As with any tool, familiarization and training are necessary to ensure that the personnel carrying out the evaluations can use the framework in a consistent fashion.

Performing the assessment¶

The FAIRplus DSM group therefore recruited FAIR experts and, over the course of a dedicated workshop, presented the DSM, proposed exercises, and then asked participants to rate several datasets independently.

The next step consisted in evaluating the inter-rater agreement when using the framework.
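The text does not say which agreement statistic was used. As one possible approach, the sketch below computes Fleiss' kappa, a common choice when more than two raters assign categories to the same items. The choice of statistic, the `fleiss_kappa` helper and the example ratings are all assumptions made for illustration.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of per-item rating lists.

    Each inner list holds the category that every rater assigned to one item.
    All items must be rated by the same number of raters (at least two).
    """
    if not ratings:
        raise ValueError("need at least one rated item")
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    counts = [Counter(item) for item in ratings]

    # Observed agreement: share of agreeing rater pairs, averaged over items.
    p_obs = sum(
        (sum(c * c for c in item.values()) - n_raters) / (n_raters * (n_raters - 1))
        for item in counts
    ) / len(ratings)

    # Chance agreement, from the overall proportion of each category.
    totals = Counter()
    for item in counts:
        totals.update(item)
    n_total = len(ratings) * n_raters
    p_exp = sum((t / n_total) ** 2 for t in totals.values())

    if p_exp == 1.0:
        # Kappa is undefined when only one category is ever used;
        # returning 1.0 is a convention chosen for this sketch.
        return 1.0
    return (p_obs - p_exp) / (1 - p_exp)


# Hypothetical example: four datasets, each rated by three assessors
# with a maturity level (the levels shown are placeholders).
example_ratings = [[2, 2, 2], [1, 2, 2], [3, 3, 3], [0, 1, 1]]
kappa = fleiss_kappa(example_ratings)
```

Values close to 1 indicate strong agreement, and values near or below 0 indicate agreement no better than chance. Low values can point to indicator definitions that need clarification.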

A debriefing of the ratings was then carried out. It was the ideal opportunity to clarify any misunderstandings about the indicator definitions and therefore to reconcile rating discrepancies between the participants. Differences in interpretation were identified, leading to a refinement of the definitions and an improvement of the documentation of the FAIRplus Dataset Maturity Model. It also resulted in streamlining both the training program and the evaluation program.

The following figure shows the effect of a FAIRification process on an IMI eTOX dataset.

Conclusions: It is about changing the data management culture!¶

The FAIRplus Dataset Maturity Model (DSM) developed by the consortium is proving to be a valuable tool for Data Managers, Decision Makers and Data Scientists to identify the weak points in their FAIRification strategies or, more simply, to define the level of maturity they are capable of delivering within the constraints of a project or research program. After a minimal familiarization and training period, the FAIRplus DSM provides the means to effectively quantify and articulate FAIRification strategies and choke points. By enabling a clearer way of communicating about the FAIRification process, the FAIRplus DSM therefore constitutes an excellent tool to plan and enable changes in the way datasets are managed.

What to read next?¶

Moving through maturity levels with ISA by running the following notebooks in the indicated order:

The Pistoia Alliance FAIRtoolkit Data Capability Maturity Model

FAIRsharing records appearing in this recipe:

Reference¶

1: Capability maturity model integration. URL: https://cmmiinstitute.com/cmmi.
2: FAIR Data Maturity Model Working Group. FAIR Data Maturity Model. Specification and Guidelines. June 2020. URL: https://doi.org/10.15497/rda00050, doi:10.15497/rda00050.
3: Tony Hey, Stewart Tansley, and Kristin Tolle, editors. The Fourth Paradigm: Data-Intensive Scientific Discovery. Microsoft Research, Redmond, Washington, 2009. URL: http://research.microsoft.com/en-us/collaboration/fourthparadigm/.
4: Dama International. DAMA-DMBOK: Data Management Body of Knowledge (2nd Edition). Technics Publications, LLC, Denville, NJ, USA, 2017. ISBN 1634622340. doi:10.5555/3165209.
5: P. Rocca-Serra, M. Brandizi, E. Maguire, N. Sklyar, C. Taylor, K. Begley, D. Field, S. Harris, W. Hide, O. Hofmann, S. Neumann, P. Sterk, W. Tong, and S. A. Sansone. ISA software suite: supporting standards-compliant experimental annotation and enabling curation at the community level. Bioinformatics, 26(18):2354–2356, Sep 2010.
6: Peter Sefton, Eoghan Ó Carragáin, Stian Soiland-Reyes, Oscar Corcho, Daniel Garijo, Raul Palma, Frederik Coppens, Carole Goble, José M. Fernández, Kyle Chard, Jose Manuel Gomez-Perez, Michael R. Crusoe, Ignacio Eguinoa, Nick Juty, Kristi Holmes, Jason A. Clark, Salvador Capella-Gutierrez, Alasdair J. G. Gray, Stuart Owen, Alan R. Williams, Giacomo Tartari, Finn Bacall, Thomas Thelen, Hervé Ménager, Laura Rodríguez-Navas, Paul Walk, brandon whitehead, Mark Wilkinson, Paul Groth, Erich Bremer, Leyla Jael Castro, Karl Sebby, Alexander Kanitz, Ana Trisovic, Gavin Kennedy, Mark Graves, Jasper Koehorst, Simone Leo, Marc Portier, Paul Brack, Milan Ojsteršek, Bert Droesbeke, Chenxu Niu, Kosuke Tanabe, Tomasz Miksa, Marco La Rosa, Cedric Decruw, Andreas Czerniak, Jeremy Jay, Sergio Serra, Ronald Siebes, Shaun de Witt, Shady El Damaty, Douglas Lowe, Xuanqi Li, Sveinung Gundersen, and Muhammad Radifar. RO-Crate Metadata Specification 1.1.2. January 2022. Recommendation published by researchobject.org - see https://w3id.org/ro/crate/1.1 for web version. URL: https://doi.org/10.5281/zenodo.5841615, doi:10.5281/zenodo.5841615.
7: M. D. Wilkinson, M. Dumontier, I. J. Aalbersberg, G. Appleton, M. Axton, A. Baak, N. Blomberg, J. W. Boiten, L. B. da Silva Santos, P. E. Bourne, J. Bouwman, A. J. Brookes, T. Clark, M. Crosas, I. Dillo, O. Dumon, S. Edmunds, C. T. Evelo, R. Finkers, A. Gonzalez-Beltran, A. J. Gray, P. Groth, C. Goble, J. S. Grethe, J. Heringa, P. A. ‘t Hoen, R. Hooft, T. Kuhn, R. Kok, J. Kok, S. J. Lusher, M. E. Martone, A. Mons, A. L. Packer, B. Persson, P. Rocca-Serra, M. Roos, R. van Schaik, S. A. Sansone, E. Schultes, T. Sengstag, T. Slater, G. Strawn, M. A. Swertz, M. Thompson, J. van der Lei, E. van Mulligen, J. Velterop, A. Waagmeester, P. Wittenburg, K. Wolstencroft, J. Zhao, and B. Mons. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data, 3:160018, Mar 2016.

Authors¶

Name | Affiliation | Contribution
--- | --- | ---
Ibrahim Emam | Imperial College London | Writing & Editing
Philippe Rocca-Serra | University of Oxford | Writing & Editing - Initial Draft