The FAIR Cookbook

7. Ontology-related tools and services

Recipe Overview
Reading Time
15 minutes
Executable Code
No
Difficulty
Introducing ontology-related tools and services
Recipe Type
Survey / Review
Audience
Data Curator, Data Manager, Data Scientist, Ontologist, Software Engineer, Terminology Manager
Maturity Level & Indicator
not applicable
Cite me with FCB022

7.1. Main Objectives

This recipe provides an overview of tools for performing key ontology operations relevant to FAIR processes, from ontology management to ontology-based annotation and ontology mapping.

It aims to serve as a starting point to identify tools for FAIRification tasks where ontologies and semantic frameworks are needed.

Disclaimer

It is not intended to provide a comprehensive list covering all possible tools.

The lists of tools are generated either automatically, by querying the bio.tools repository, or through manual curation. In the latter case, the list reflects what is being used in industry and is influenced by the FAIRplus project partners who were surveyed for this work.

Warning

The content in these tables was generated in March 2021. For updated content, please check the FAIR tooling repository.

7.2. Requirements

  • recipe dependency:

    • Selecting terminologies and ontologies

  • knowledge requirement:

    • be familiar with ontologies and semantic annotation.


7.3. FAIRification Objectives, Inputs and Outputs

Actions.Objectives.Tasks

Input

Output

ontology and terminology

text annotation


7.4. Overview

The figure below shows different ontology-related operations and their relationships, together with related tools and recipes.

Fig. 7.2 Overview of key aspects in ontology-associated processes

The table below gives an overview of the ontology-related tools identified. Details of each tool are provided in the following sections.

| Topic | Curated tools | Related tools in Bio.tools |
|---|---|---|
| Ontology annotation | ZOOMA, NCBO BioPortal Annotator, BioBert, Termite, PoolParty Semantic Suite, OntoMaton, Prodigy, OntoText | bioBERT, PPR-SSM, HPO2GO, Vapur, matscholar, CollaboNet, Calchas, QTL TableMiner++(QTM), thbp |
| Ontology mapping | OxO | meshr, locdb |
| Ontology management | AberOWL, BioPortal, Centree Ontology Manager, OLS, Ontobee, PoolParty | ngly1, Doc2Hpo, PlanGexQ, GOcats, RDFScape, OntoBrowser, QuickGO, Circular Gene Ontology (CirGO) |
| Ontology engineering | eNanoMapper Slimmer, OWLAPI, Protégé, ROBOT, TopBraid Composer, VocBench | |


7.5. Operations

7.5.1. Ontology annotation

Ontology annotation is the process of linking free text or data items to 'tokens' (defined terms from a lexicon) that provide semantic value. For example, "type 2 diabetes" can be annotated with the corresponding term in the MONDO disease ontology.
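To make the idea concrete, the following minimal sketch annotates free text by looking up labels in a small, hand-written lexicon. It is not how ZOOMA, BioPortal Annotator or any other tool listed below works internally. The `LEXICON` dictionary and the `annotate` function are hypothetical names introduced for illustration, and the MONDO identifier should be checked against the ontology before reuse.

```python
import re

# Hypothetical mini-lexicon: lower-cased label or synonym -> (term ID, preferred label).
# The identifier is given for illustration; verify it against MONDO before reuse.
LEXICON = {
    "type 2 diabetes": ("MONDO:0005148", "type 2 diabetes mellitus"),
    "type 2 diabetes mellitus": ("MONDO:0005148", "type 2 diabetes mellitus"),
    "t2dm": ("MONDO:0005148", "type 2 diabetes mellitus"),
}


def annotate(text, lexicon=LEXICON):
    """Return sorted (start, end, matched_text, term_id, preferred_label) tuples."""
    annotations = []
    taken = []  # character spans that already carry an annotation
    # Try longer labels first so that a short label cannot split a longer match.
    for label in sorted(lexicon, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(label) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            overlaps = any(match.start() < end and start < match.end()
                           for start, end in taken)
            if overlaps:
                continue
            taken.append((match.start(), match.end()))
            term_id, preferred = lexicon[label]
            annotations.append(
                (match.start(), match.end(), match.group(0), term_id, preferred)
            )
    return sorted(annotations)


hits = annotate("Patients with Type 2 diabetes were enrolled.")
for start, end, surface, term_id, preferred in hits:
    print(f"{surface!r} [{start}:{end}] -> {term_id} ({preferred})")
```

Real annotation services add curated evidence, synonym expansion, fuzzy matching or statistical models on top of this basic lookup, which is why the tools below are preferable to ad hoc string matching for production use.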

Curated tools

| Tool | Description | License | Topics | Resource Type | How to use |
|---|---|---|---|---|---|
| ZOOMA | A tool for mapping free text annotations to ontology terms based on a curated repository of annotation knowledge. | EMBL-EBI Terms of Use | Ontology and terminology, Systems biology, Data identity and mapping | Web application, API | ZOOMA-Getting started |
| NCBO BioPortal Annotator | Get annotations for biomedical text with classes from the ontologies. | BioPortal Terms of Use | Ontology and terminology, Systems biology, Data identity and mapping | Web application, API | BioPortal help |
| BioBert | A biomedical language representation model designed for biomedical text mining tasks such as biomedical named entity recognition, relation extraction and question answering. | Apache 2.0 | Text mining, Named-entity recognition, Natural language processing | Python | |
| Termite | Semantic enrichment to unlock the value of unstructured text and simplify the identification of new potential biomarker leads from scientific text. | Commercial license | Ontology and terminology | | |
| PoolParty Semantic Suite | Automate the handling of heterogeneous metadata systems and the creation of enterprise knowledge graphs. Design knowledge graphs at your own pace and with speed. Create your own ontologies and custom schemes by reusing already existing ontologies such as FOAF, FIBO, schema.org and CHEBI, among others. Apply them to your existing taxonomies with ease. | Commercial license | Content enrichment, Data integration | | |
| OntoMaton | A tool facilitating ontology search and tagging functionalities within Google Spreadsheets. | CPAL license | | Google Add-ons | |
| Prodigy | A modern annotation tool for creating training and evaluation data for machine learning models. You can also use Prodigy to help you inspect and clean your data, do error analysis and develop rule-based systems to use in combination with your statistical models. | Commercial license | Data annotation | Python, Web application, API | |
| OntoText | Connect and publish complex enterprise knowledge with a standard-compliant semantic graph database; customize and apply analytics to link documents to graphs, extract new facts, classify and recommend content. | Commercial license | | | |

Related tools in Bio.Tools

| Tool | Description | License | Topics | Resource Type |
|---|---|---|---|---|
| bioBERT | Pre-trained weights of BioBERT, a language representation model for the biomedical domain, designed especially for biomedical text mining tasks such as biomedical named entity recognition, relation extraction and question answering. | N/A | Medicine, Ontology and terminology, Natural language processing | Python |
| PPR-SSM | Personalized PageRank and semantic similarity measures for linking entities found in documents to concepts from domain-specific ontologies. | N/A | Imaging, Natural language processing, Data mining, Genotype and phenotype, Ontology and terminology | Java, Python |
| HPO2GO | Prediction of human phenotype ontology term associations using cross ontology annotation co-occurrences. Mapping between Human Phenotype Ontology (HPO) and Gene Ontology (GO) terms for the prediction of gene/protein - function - phenotype - disease associations. | GPL-3.0 | Pathology, Protein interactions, Genotype and phenotype, Ontology and terminology, Gene expression | Command-line tool |
| Vapur | A search engine to find related proteins. Vapur is an online entity-oriented search engine for the COVID-19 anthology. Vapur is empowered with a semantic inverted index that is created through named entity recognition and relation extraction on CORD-19 abstracts. | N/A | Pathology, Ontology and terminology, Natural language processing, Enzymes | Python |
| matscholar | A Python library for materials-focused natural language processing (NLP). Named Entity Recognition and Normalization Applied to Large-Scale Information Extraction from the Materials Science Literature. | MIT | Chemistry, Ontology and terminology, Natural language processing | Command-line tool |
| CollaboNet | Collaboration of deep neural networks for biomedical named entity recognition. | MIT | Ontology and terminology, Natural language processing, Machine learning | Command-line tool |
| Calchas | A web-based framework that takes advantage of domain-specific ontologies and Natural Language Processing, aiming to empower exploration of biomedical resources via semantic-based querying and search. The NLP engine analyzes the input free-text query and translates it into targeted queries with terms from the underlying ontology. | N/A | Medical informatics, Ontology and terminology, Natural language processing, Bioinformatics | Web application |
| QTL TableMiner++(QTM) | A command-line tool to retrieve and semantically annotate results obtained from QTL mapping experiments. It takes full-text articles from the Europe PMC repository as input and outputs the extracted QTLs into a relational database (SQLite) and text file (CSV). | Apache-2.0 | Ontology and terminology | Command-line tool |
| thbp | Mapping anatomy-related entities in discharge summaries to human body parts, based on Wikipedia. | N/A | Anatomy, Ontology and terminology, Natural language processing | |

7.5.2. Ontology mapping

Ontology mapping is the process of determining correspondences between equivalent concepts in alternative ontologies and other vocabularies. This may include mappings that convey different levels of granularity.
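As a rough illustration of what a mapping table looks like and how granularity can be expressed, the sketch below stores correspondences as (subject, predicate, object) triples that borrow the SKOS mapping properties `skos:exactMatch` and `skos:broadMatch`. It does not call OxO or any other service. The `MAPPINGS` data, `build_index` and `map_term` are hypothetical names, and the identifiers are illustrative and should be verified against the source ontologies.

```python
from collections import defaultdict

# Illustrative cross-references only; a real table would come from a mapping
# service or a curated mapping file. Verify identifiers before reuse.
MAPPINGS = [
    # Same concept in two ontologies (type 2 diabetes mellitus).
    ("MONDO:0005148", "skos:exactMatch", "DOID:9352"),
    # Coarser-grained target concept (diabetes mellitus).
    ("MONDO:0005148", "skos:broadMatch", "DOID:9351"),
]


def build_index(mappings):
    """Group (predicate, object) pairs by subject identifier."""
    index = defaultdict(list)
    for subject, predicate, obj in mappings:
        index[subject].append((predicate, obj))
    return index


def map_term(term_id, target_prefix, index, predicates=("skos:exactMatch",)):
    """Return target identifiers with the given prefix and an accepted predicate."""
    return [
        obj
        for predicate, obj in index.get(term_id, [])
        if predicate in predicates and obj.startswith(target_prefix + ":")
    ]


index = build_index(MAPPINGS)
exact = map_term("MONDO:0005148", "DOID", index)
relaxed = map_term(
    "MONDO:0005148", "DOID", index,
    predicates=("skos:exactMatch", "skos:broadMatch"),
)
print(exact)    # equivalent concept only
print(relaxed)  # equivalent concept plus the broader one
```

Keeping the predicate alongside each mapping lets a consumer decide whether only equivalent terms are acceptable or whether broader matches may also be used, for example when the target vocabulary is less detailed.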

Curated tools

| Tool | Description | License | Topics | Resource Type | How to use |
|---|---|---|---|---|---|
| OxO | A service for finding mappings (or cross-references) between terms from ontologies, vocabularies and coding standards. | EMBL-EBI Terms of Use | Ontology alignment | GUI and API | |

Related tools in Bio.Tools

| Tool | Description | License | Topics | Type |
|---|---|---|---|---|
| meshr | A set of annotation maps describing the entire MeSH assembled using data from MeSH. | Apache-2.0 | Medical informatics, Data quality management | Command-line tool, Library |
| locdb | Manually curated database with experimental annotations for the subcellular localizations of proteins in Homo sapiens (HS, human) and Arabidopsis thaliana (AT, thale cress). | N/A | Ontology and terminology, Data submission, annotation and curation, Proteins | Database portal |

7.5.3. Ontology management

Ontology management is the process of managing ontologies and other vocabularies in semantic web and linked data environments. This includes policies for the update and maintenance of existing and new terms.

Curated tools

| Tool | Description | License | Topics | Resource Type | How to use |
|---|---|---|---|---|---|
| OLS | A repository for biomedical ontologies that aims to provide a single point of access to the latest ontology versions. | EMBL-EBI Terms of Use | Ontology and terminology | Web application, API | |
| BioPortal | A repository of biomedical ontologies. | BioPortal Terms of Use | Ontology and terminology | Web application, API | |
| PoolParty | Knowledge Engineering & Knowledge Graph Management. Taxonomy, ontology and linked dataset management. | Commercial license | Ontology and terminology | | |
| Centree Ontology Manager | A centralised, enterprise-ready resource for ontology management that transforms the experience of maintaining and releasing ontologies for research-led businesses. | Commercial license | | Web application, API | |
| Ontobee | A linked data server designed for ontologies. Ontobee aims to facilitate ontology data sharing, visualization, query, integration, and analysis. | Apache 2.0 | | Web application | |
| AberOWL | A framework for ontology-based access to biological data. It consists of a repository of bio-ontologies, a set of web services that provide access to OWL(-EL) reasoning over the ontologies, and several frontends that utilise the ontology repository and reasoning services. | | | Web application, API | |

Related tools in Bio.Tools

| Tool | Description | License | Topics | Type |
|---|---|---|---|---|
| ngly1 | A repository for the NGLY1 Deficiency Knowledge Graph, the reasoning context to support hypothesis discovery for NGLY1 Deficiency-CDDG (DOID:0060728) research. The user can navigate the knowledge in the graph in the Neo4j Browser website. | N/A | Molecular interactions, pathways and networks, Ontology and terminology, Machine learning | Command-line tool |
| Doc2Hpo | Web application for efficient and accurate Human Phenotype Ontology (HPO) concept curation. | Unlicense | Genotype and phenotype, Ontology and terminology, Natural language processing | Web application |
| PlanGexQ | A user-friendly interactive tool for the curation and annotation of planarian morphologies and gene expression patterns in a centralized database. | N/A | Mathematics, Genotype and phenotype, Model organisms, Ontology and terminology, Gene expression | |
| GOcats | Advances in gene ontology utilization improve statistical power of annotation enrichment. | N/A | Mapping, Ontology and terminology, Microarray experiment | Command-line tool |
| RDFScape | A project that brings Semantic Web features to the popular Systems Biology software Cytoscape. It allows users to query, visualize and reason on ontologies represented in OWL or RDF within Cytoscape. | | Systems biology, Ontology and terminology, Biology | Desktop application |
| OntoBrowser | The tool was developed to manage ontologies (and controlled terminologies, e.g. CDISC SEND). The primary goal of the tool is to provide an online collaborative solution for expert curators to map code list terms (sourced from multiple systems/databases) to preferred ontology terms. | Apache 2.0 | Ontology and terminology, Data identity and mapping | Web API, Web application |
| QuickGO | A fast browser for Gene Ontology terms and annotations. | | Ontology and terminology | Web application |
| Circular Gene Ontology (CirGO) | Visualises non-redundant two-level hierarchically structured ontology terms from gene expression data in a 2D space. | GPL3.0 | Ontology and terminology, Data visualisation, Gene expression | Command-line tool, Desktop application |

7.5.4. Ontology engineering

Ontology engineering is the process of developing and maintaining ontologies during the ontology life cycle.

Curated tools

| Tool | Description | License | Topics | Resource Type |
|---|---|---|---|---|
| Protégé | A free, open source ontology editor and knowledge management system. | 2-Clause BSD | | Web application, Desktop application |
| ROBOT | An open source library and command-line tool for automating ontology development tasks. ROBOT provides ontology processing commands for a variety of tasks, including commands for converting formats, running a reasoner, creating import modules, running reports, and various other tasks. | BSD 3-Clause License | | Command-line tool |
| OWLAPI | A Java API and reference implementation for creating, manipulating and serialising OWL Ontologies. | LGPL and Apache | | API |
| eNanoMapper Slimmer | A small tool to slim ontologies as part of ontology integration. Users provide configuration files that specify which parts of an ontology should be kept and/or removed, so that only the required parts of the ontology are selected. | MIT license | | Java |
| TopBraid Composer | TopBraid Composer Maestro Edition is used to develop ontology models, configure data source integration, and create semantic services and user interfaces. | Commercial license | | |
| VocBench | A web-based, multilingual, collaborative development platform for managing OWL ontologies, SKOS(/XL) thesauri, Ontolex-lemon lexicons and generic RDF datasets. | License | | Desktop application |

7.6. Implementation examples

For real-life examples of how these tools can be used, see the related recipes:

  • Selecting terminologies and ontologies

  • Building an application ontology with ROBOT

7.7. References

7.7.1. What to read next?

FAIRsharing records appearing in this recipe:

  • BioPortal
  • CDISC
  • CDISC Standard for Exchange of Nonclinical Data (CDISC SEND)
  • Chemical Entities of Biological Interest (ChEBI)
  • Comma-separated Values (CSV)
  • Disease Ontology (DOID)
  • Europe PubMed Central (Europe PMC)
  • Gene Ontology (GO)
  • Human Phenotype Ontology (HP)
  • Monarch Disease Ontology (MONDO)
  • Ontobee
  • Ontology Cross Reference Service (OxO)
  • Ontology Lookup Service (OLS)
  • Resource Description Framework (RDF)
  • Schema.org
  • Simple Knowledge Organization System (SKOS)
  • The FAIR Principles (FAIR)
  • Web Ontology Language (OWL)
  • bio.tools

7.8. Authors

| Name | ORCID | Affiliation | Type | ELIXIR Node | Contribution |
|---|---|---|---|---|---|
| Fuqi Xu | | EMBL-EBI | | | Writing - Original Draft, Software |
| Eva Marin del Pico | | Barcelona Supercomputing Centre | | | Writing - Original Draft, Software |
| Sukhi Singh | | The Hyve | | | Data curation, Software |
| Philippe Rocca-Serra | | University of Oxford | | | Writing - Review & Editing |

7.9. License

The Creative Commons 4.0 BY license

Grant agreement 802750
Recommended in the IMI/IHI Project Guidelines and the Horizon Europe Work Programme for Health as the resource for FAIR Data Management guidelines and good practices for the Life Sciences.
The FAIR Cookbook is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.