Contents
  • 6.1. Main Objectives
  • 6.2. Requirements
  • 6.3. Capability & Maturity Table
  • 6.4. FAIRification Objectives, Inputs and Outputs
  • 6.5. Overview
  • 6.6. Operations
    • 6.6.1. Ontology annotation
    • 6.6.2. Ontology mapping
    • 6.6.3. Ontology management
    • 6.6.4. Ontology engineering
  • 6.7. Implementation examples
  • 6.8. References
  • 6.9. Authors
  • 6.10. License

6. Ontology-related tools and services




Recipe Overview

  • Reading Time: 15 minutes
  • Executable Code: No
  • Recipe Type: Survey / Review
  • Audience: Data Curator, Data Manager, Data Scientist, Ontologist, Software Engineer, Terminology Manager
  • Maturity Level & Indicator: [F+MM-1.1C] [F+MM-1.2C]
  • Cite me with FCB022

6.1. Main Objectives

This recipe provides an overview of tools for key ontology operations relevant to FAIR processes: ontology annotation, ontology mapping, ontology management, and ontology engineering.

It aims to serve as a starting point to identify tools for FAIRification tasks where ontologies and semantic frameworks are needed.

disclaimer

It is not intended to provide a comprehensive list covering all possible tools.

The lists of tools are generated either automatically, by querying the bio.tools repository, or through manual curation. In the latter case, the list reflects what is being used in industry and is influenced by the FAIRplus project partners surveyed for this work.

Warning

The content in these tables was generated in March 2021. For updated content, please check the FAIR tooling repository.

6.2. Requirements

  • recipe dependency:

    • Selecting terminologies and ontologies

  • knowledge requirement:

    • be familiar with ontologies and semantic annotation.


6.3. Capability & Maturity Table

| Capability | Initial Maturity Level | Final Maturity Level |
|---|---|---|
| Interoperability | minimal | automatable |


6.4. FAIRification Objectives, Inputs and Outputs

Actions.Objectives.Tasks

Input

Output

ontology and terminology

text annotation


6.5. Overview

The figure below shows different ontology-related operations and their relationships, together with related tools and recipes.

Fig. 6.2 Overview of key aspects in ontology associated processes

The table below summarises the ontology tools identified for each operation. Details of each tool are provided in the sections that follow.

| Topic | Curated tools | Related tools in Bio.tools |
|---|---|---|
| Ontology annotation | ZOOMA, NCBO BioPortal Annotator, BioBert, Termite, PoolParty Semantic Suite, OntoMaton, Prodigy, OntoText | bioBERT, PPR-SSM, HPO2GO, Vapur, matscholar, CollaboNet, Calchas, QTL TableMiner++ (QTM), thbp |
| Ontology mapping | OxO | meshr, locdb |
| Ontology management | AberOWL, BioPortal, Centree Ontology Manager, OLS, Ontobee, PoolParty | ngly1, Doc2Hpo, PlanGexQ, GOcats, RDFScape, OntoBrowser, QuickGO, Circular Gene Ontology (CirGO) |
| Ontology engineering | eNanoMapper Slimmer, OWLAPI, Protégé, ROBOT, TopBraid Composer, VocBench | |


6.6. Operations

6.6.1. Ontology annotation

Ontology annotation is the process of linking free text or data items to ‘tokens’ (defined terms from a lexicon) which provide semantic value. For example, “type 2 diabetes” can be annotated with the corresponding term in the MONDO disease ontology.
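To make the idea concrete, here is a minimal sketch of dictionary-based annotation. It is not how any of the tools listed below work internally. The lexicon, the `annotate` function and the sample sentence are hypothetical, and the MONDO identifier is quoted for illustration only and should be confirmed in an ontology look-up service before reuse.

```python
import re

# Toy lexicon (hypothetical): lower-cased label or synonym -> (term ID, preferred label).
# The MONDO identifier is quoted for illustration; confirm it in a look-up service.
LEXICON = {
    "type 2 diabetes": ("MONDO:0005148", "type 2 diabetes mellitus"),
    "type 2 diabetes mellitus": ("MONDO:0005148", "type 2 diabetes mellitus"),
}


def annotate(text, lexicon=LEXICON):
    """Return (start, end, matched text, term ID, preferred label) tuples."""
    hits = []
    lowered = text.lower()
    # Longest labels first, so a longer label wins over a shorter one it contains.
    for label in sorted(lexicon, key=len, reverse=True):
        for match in re.finditer(r"\b" + re.escape(label) + r"\b", lowered):
            start, end = match.span()
            overlaps = any(s < end and start < e for s, e, *_ in hits)
            if not overlaps:
                term_id, preferred = lexicon[label]
                hits.append((start, end, text[start:end], term_id, preferred))
    return sorted(hits)


if __name__ == "__main__":
    for hit in annotate("Patient diagnosed with Type 2 diabetes in 2019."):
        print(hit)
```

Real annotation services add much more than exact string matching, for example curated synonym sets, confidence scores and context handling, so this sketch only illustrates the input and output of the operation.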

Curated tools

| Tool | Description | License | Topics | Resource Type | How to use |
|---|---|---|---|---|---|
| ZOOMA | A tool for mapping free text annotations to ontology terms based on a curated repository of annotation knowledge. | EMBL-EBI Terms of Use | Ontology and terminology, Systems biology, Data identity and mapping | Web application, API | ZOOMA-Getting started |
| NCBO BioPortal Annotator | Get annotations for biomedical text with classes from the ontologies. | BioPortal Terms of Use | Ontology and terminology, Systems biology, Data identity and mapping | Web application, API | BioPortal help |
| BioBert | A biomedical language representation model designed for biomedical text mining tasks such as biomedical named entity recognition, relation extraction, question answering. | Apache 2.0 | text mining, named-entity recognition, natural language processing | Python | |
| Termite | Semantic enrichment to unlock the value of unstructured text and simplify the identification of new potential biomarker leads from scientific text. | Commercial license | Ontology and terminology | | |
| PoolParty Semantic Suite | Automate the handling of heterogeneous metadata systems and the creation of enterprise knowledge graphs. Design knowledge graphs at your own pace and with speed. Create your own ontologies and custom schemes by reusing already existing ontologies such as FOAF, FIBO, schema.org and CHEBI, among others. Apply them to your existing taxonomies with ease. | Commercial license | Content enrichment, Data integration | | |
| OntoMaton | A tool facilitating ontology search and tagging functionalities within Google Spreadsheets. | CPAL license | | Google Add-ons | |
| Prodigy | A modern annotation tool for creating training and evaluation data for machine learning models. You can also use Prodigy to help you inspect and clean your data, do error analysis and develop rule-based systems to use in combination with your statistical models. | Commercial license | Data annotation | Python, Web application, API | |
| OntoText | Connect and publish complex enterprise knowledge with standard-compliant semantic graph database; Customize and apply analytics to link documents to graphs, extract new facts, classify and recommend content. | Commercial license | | | |

Related tools in Bio.Tools

| Tool | Description | License | Topics | Resource Type |
|---|---|---|---|---|
| bioBERT | Pre-trained weights of BioBERT, a language representation model for the biomedical domain, especially designed for biomedical text mining tasks such as biomedical named entity recognition, relation extraction, question answering, etc. | N/A | Medicine, Ontology and terminology, Natural language processing | Python |
| PPR-SSM | Personalized PageRank and semantic similarity measures for linking entities found in documents to concepts from domain-specific ontologies. | N/A | Imaging, Natural language processing, Data mining, Genotype and phenotype, Ontology and terminology | Java, Python |
| HPO2GO | Prediction of human phenotype ontology term associations using cross ontology annotation co-occurrences. Mapping between Human Phenotype Ontology (HPO) and Gene Ontology (GO) terms for the prediction of gene/protein - function - phenotype - disease associations. | GPL-3.0 | Pathology, Protein interactions, Genotype and phenotype, Ontology and terminology, Gene expression | Command-line tool |
| Vapur | A Search Engine to Find Related Protein. Vapur is an online entity-oriented search engine for the COVID-19 anthology. Vapur is empowered with a semantic inverted index that is created through named entity recognition and relation extraction on CORD-19 abstracts. | N/A | Pathology, Ontology and terminology, Natural language processing, Enzymes | Python |
| matscholar | A Python library for materials-focused natural language processing (NLP). Named Entity Recognition and Normalization Applied to Large-Scale Information Extraction from the Materials Science Literature. | MIT | Chemistry, Ontology and terminology, Natural language processing | Command-line tool |
| CollaboNet | Collaboration of deep neural networks for biomedical named entity recognition. | MIT | Ontology and terminology, Natural language processing, Machine learning | Command-line tool |
| Calchas | A web-based framework that takes advantage of domain-specific ontologies and Natural Language Processing, aiming to empower exploration of biomedical resources via semantic-based querying and search. The NLP engine analyzes the input free-text query and translates it into targeted queries with terms from the underlying ontology. | N/A | Medical informatics, Ontology and terminology, Natural language processing, Bioinformatics | Web application |
| QTL TableMiner++ (QTM) | A command-line tool to retrieve and semantically annotate results obtained from QTL mapping experiments. It takes full-text articles from the Europe PMC repository as input and outputs the extracted QTLs into a relational database (SQLite) and text file (CSV). | Apache-2.0 | Ontology and terminology | Command-line tool |
| thbp | Mapping anatomical related entities to human body parts based on Wikipedia in discharge summaries. | N/A | Anatomy, Ontology and terminology, Natural language processing | |

6.6.2. Ontology mapping

Ontology mapping is the process of determining correspondences between equivalent concepts in different ontologies and other vocabularies. This may include mappings that convey different levels of granularity.
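As an illustration, the sketch below stores a few mappings as subject, predicate, object records and filters them by target vocabulary. Using SKOS mapping relations (`skos:exactMatch` for equivalence, `skos:broadMatch` for a coarser target) is one common convention and is an assumption here, not something this recipe prescribes. The `Mapping` class, the `find_mappings` function and the identifiers are given for illustration only and should be checked against a mapping service before reuse.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    subject_id: str  # term in the source ontology
    predicate: str   # SKOS mapping relation describing the correspondence
    object_id: str   # term in the target ontology or vocabulary


# Illustrative mappings for "type 2 diabetes mellitus" (assumed, not taken from the recipe).
MAPPINGS = [
    Mapping("MONDO:0005148", "skos:exactMatch", "DOID:9352"),
    Mapping("MONDO:0005148", "skos:exactMatch", "MESH:D003924"),
    # Coarser granularity: the target term is broader ("diabetes mellitus").
    Mapping("MONDO:0005148", "skos:broadMatch", "DOID:9351"),
]


def find_mappings(term_id, target_prefix, mappings=MAPPINGS,
                  predicates=("skos:exactMatch",)):
    """Return target IDs for term_id in one vocabulary, filtered by mapping relation."""
    return [
        m.object_id
        for m in mappings
        if m.subject_id == term_id
        and m.predicate in predicates
        and m.object_id.startswith(target_prefix + ":")
    ]


if __name__ == "__main__":
    print(find_mappings("MONDO:0005148", "DOID"))
    print(find_mappings("MONDO:0005148", "DOID",
                        predicates=("skos:exactMatch", "skos:broadMatch")))
```

Keeping the relation explicit lets a consumer decide whether only equivalent terms are acceptable or whether broader matches can be used as a fallback.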

Curated tools

| Tool | Description | License | Topics | Resource Type | How to use |
|---|---|---|---|---|---|
| OxO | A service for finding mappings (or cross-references) between terms from ontologies, vocabularies and coding standards. | EMBL-EBI Terms of Use | Ontology alignment | GUI and API | |

Related tools in Bio.Tools

| Tool | Description | License | Topics | Type |
|---|---|---|---|---|
| meshr | A set of annotation maps describing the entire MeSH assembled using data from MeSH. | Apache-2.0 | Medical informatics, Data quality management | Command-line tool, Library |
| locdb | Manually curated database with experimental annotations for the subcellular localizations of proteins in Homo sapiens (HS, human) and Arabidopsis thaliana (AT, thale cress). | N/A | Ontology and terminology, Data submission, annotation and curation, Proteins | Database portal |

6.6.3. Ontology management

Ontology management is the process of managing ontologies and other vocabularies in semantic web-linked data environments. This includes policies for the update and maintenance of constituent and new terms.

Curated tools

| Tool | Description | License | Topics | Resource Type | How to use |
|---|---|---|---|---|---|
| OLS | A repository for biomedical ontologies that aims to provide a single point of access to the latest ontology versions. | EMBL-EBI Terms of Use | Ontology and terminology | Web application, API | |
| BioPortal | A repository of biomedical ontologies. | BioPortal Terms of Use | Ontology and terminology | Web application, API | |
| PoolParty | Knowledge Engineering & Knowledge Graph Management. Taxonomy, ontology and linked dataset management. | Commercial license | Ontology and terminology | | |
| Centree Ontology Manager | A centralised, enterprise-ready resource for ontology management that transforms the experience of maintaining and releasing ontologies for research-led businesses. | Commercial license | | Web application, API | |
| Ontobee | A linked data server designed for ontologies. Ontobee aims to facilitate ontology data sharing, visualization, query, integration, and analysis. | Apache 2.0 | | Web application | |
| AberOWL | A framework for ontology-based access to biological data. It consists of a repository of bio-ontologies, a set of webservices which provide access to OWL(-EL) reasoning over the ontologies, and several frontends which utilise the ontology repository and reasoning services. | | | Web application, API | |

Related tools in Bio.Tools

| Tool | Description | License | Topics | Type |
|---|---|---|---|---|
| ngly1 | A repository for the NGLY1 Deficiency Knowledge Graph, the reasoning context to support hypothesis discovery for NGLY1 Deficiency-CDDG (DOID:0060728) research. The user can navigate the knowledge in the graph in the Neo4j Browser website. | N/A | Molecular interactions, pathways and networks, Ontology and terminology, Machine learning | Command-line tool |
| Doc2Hpo | Web application for efficient and accurate Human Phenotype Ontology (HPO) concept curation. | Unlicense | Genotype and phenotype, Ontology and terminology, Natural language processing | Web application |
| PlanGexQ | A user-friendly interactive tool for the curation and annotation of planarian morphologies and gene expression patterns in a centralized database. | N/A | Mathematics, Genotype and phenotype, Model organisms, Ontology and terminology, Gene expression | |
| GOcats | Advances in gene ontology utilization improve statistical power of annotation enrichment. | N/A | Mapping, Ontology and terminology, Microarray experiment | Command-line tool |
| RDFScape | A project that brings Semantic Web features to the popular Systems Biology software Cytoscape. It allows users to query, visualize and reason on ontologies represented in OWL or RDF within Cytoscape. | | Systems biology, Ontology and terminology, Biology | Desktop application |
| OntoBrowser | The tool was developed to manage ontologies (and controlled terminologies e.g. CDISC SEND). The primary goal of the tool is to provide an online collaborative solution for expert curators to map code list terms (sourced from multiple systems/databases) to preferred ontology terms. | Apache 2.0 | Ontology and terminology, Data identity and mapping | Web API, Web application |
| QuickGO | A fast browser for Gene Ontology terms and annotations. | | Ontology and terminology | Web application |
| Circular Gene Ontology (CirGO) | Visualises non-redundant two-level hierarchically structured ontology terms from gene expression data in a 2D space. | GPL-3.0 | Ontology and terminology, Data visualisation, Gene expression | Command-line tool, Desktop application |

6.6.4. Ontology engineering

Ontology engineering is the process of developing and maintaining ontologies during the ontology life cycle.

Curated tools

| Tool | Description | License | Topics | Resource Type |
|---|---|---|---|---|
| Protégé | A free, open source ontology editor and a knowledge management system. | 2-Clause BSD | | Web application, Desktop application |
| ROBOT | An open source library and command-line tool for automating ontology development tasks. ROBOT provides ontology processing commands for a variety of tasks, including commands for converting formats, running a reasoner, creating import modules, running reports, and various other tasks. | BSD 3-Clause License | | Command-line tool |
| OWLAPI | A Java API and reference implementation for creating, manipulating and serialising OWL Ontologies. | LGPL and Apache | | API |
| eNanoMapper Slimmer | A slim tool to slim ontologies as part of ontology integration. It allows users to provide configuration files that specify which parts of an ontology should be kept and/or removed, so that only the desired parts of the ontology are selected. | MIT license | | Java |
| TopBraid Composer | TopBraid Composer Maestro Edition is used to develop ontology models, configure data source integration, and create semantic services and user interfaces. | Commercial license | | |
| VocBench | A web-based, multilingual, collaborative development platform for managing OWL ontologies, SKOS(/XL) thesauri, Ontolex-lemon lexicons and generic RDF datasets. | License | | Desktop application |
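As a small illustration of automating engineering tasks, the sketch below builds argument lists for two of the ROBOT commands mentioned in the table above (running a reasoner and converting formats) and runs them only if a `robot` executable is found on the PATH. The helper names and the file names are hypothetical, and the choice of ELK as the reasoner is an assumption. The dedicated ROBOT recipe remains the reference for real workflows.

```python
import shutil
import subprocess


def reason_command(input_path, output_path, reasoner="ELK"):
    """Argument list for `robot reason`: classify the ontology with a reasoner."""
    return ["robot", "reason", "--reasoner", reasoner,
            "--input", input_path, "--output", output_path]


def convert_command(input_path, output_path):
    """Argument list for `robot convert`: the output format follows the file extension."""
    return ["robot", "convert", "--input", input_path, "--output", output_path]


def run(command):
    """Run the command if the executable is installed; return None otherwise."""
    if shutil.which(command[0]) is None:
        return None
    return subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for command in (reason_command("my-ontology.owl", "my-ontology-reasoned.owl"),
                    convert_command("my-ontology-reasoned.owl", "my-ontology.obo")):
        print(" ".join(command))
        run(command)
```

Building the argument list separately from running it makes the commands easy to log, review or reuse in a build script.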

6.7. Implementation examples

To see how these tools are used in real-life examples, please check the related recipes.

  • Selecting terminologies and ontologies

  • Building an application ontology with ROBOT

6.8. References

References

6.9. Authors

| Name | ORCID | Affiliation | Type | ELIXIR Node | Contribution |
|---|---|---|---|---|---|
| Fuqi Xu | | EMBL-EBI | | | Writing - Original Draft, Software |
| Eva Marin del Pico | | Barcelona Supercomputing Centre | | | Writing - Original Draft, Software |
| Sukhi Singh | | The Hyve | | | Data curation, Software |
| Philippe Rocca-Serra | | University of Oxford | | | Writing - Review & Editing |

6.10. License

The Creative Commons 4.0 BY license


© Copyright 2020.

Grant agreement 802750
The FAIR Cookbook is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.