11.5.3. Bioactivity data profile

Recipe Overview

Reading Time

30 minutes

Executable Code

No

Difficulty

Outlining a metadata profile for Bioactivity data

Recipe Type

Hands-on

Audience

Principal Investigator, Data Manager, Data Scientist

Maturity Level & Indicator

DSM-3-C1

DSM-3-C2

Cite me with FCB057

11.5.3.1. Main objective

This recipe shows how to prepare bioactivity data, defined as the measurable effects of a chemical compound on a biological system as monitored with a specific assay, so that they meet the ChEMBL submission criteria. It focuses on data formats, structures, and vocabularies, and it addresses the Findability and Interoperability of this type of data.

11.5.3.2. Graphical overview of the Recipe FAIRification Objectives

11.5.3.3. Introduction

Bioactivity data are stored in public archives such as the European repository ChEMBL or its US counterpart PubChem. Together with chemical data and omics data, they can be used to search for new hits (compounds with a desired property in drug screening), for example by using cell line information or compound IDs as input to queries over such resources.
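As a hedged illustration of such a query, the sketch below only builds a request URL for the public ChEMBL REST web services, filtering activities by compound ID and result type. The helper name `activity_query_url` and the compound ID `CHEMBL25` are illustrative choices, not part of this recipe, and the request itself is left to the reader.

```python
from urllib.parse import urlencode

# Base URL of the ChEMBL REST web services.
CHEMBL_API = "https://www.ebi.ac.uk/chembl/api/data"


def activity_query_url(molecule_chembl_id: str,
                       standard_type: str = "IC50",
                       limit: int = 20) -> str:
    """Build a URL listing activities for one compound (hypothetical helper)."""
    params = {
        "molecule_chembl_id": molecule_chembl_id,
        "standard_type": standard_type,
        "limit": limit,
    }
    return f"{CHEMBL_API}/activity.json?{urlencode(params)}"


# CHEMBL25 is used purely as an example compound identifier.
url = activity_query_url("CHEMBL25")
print(url)
```

The resulting URL can be opened in a browser or fetched with any HTTP client; the same pattern applies to other filters supported by the service.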

An early-stage bioactivity dataset includes the compound molecular structure, molecular production details, assay data, and pharmacokinetic study information.

The FAIR principles for data management can guide the improvement of the pharmacokinetic properties of compounds and the identification of drug targets by enhancing the reporting of bioactivity data.

Among the FAIR principles, the use of rich metadata (F2. data are described with rich metadata and R1. (meta)data are richly described with a plurality of accurate and relevant attributes) and the reliance on community standards (R1.3. (meta)data meet domain-relevant community standards) are essential.

In the context of bioactivity data, we have on the one hand the Minimum Information About a Bioactive Entity (MIABE) checklist, which recommends attributes, formats, and vocabularies for the reuse of such datasets.

On the other hand, public bioactivity data archives such as ChEMBL, PubChem, and ECBD also have their own requirements for data submission.

11.5.3.3.1. Data content

Content	Details	Data types
Chemistry (SDF)	Structure ID	SDF, SMILES, InChI, CID
Target	Protein/gene ID	PN_ or SwissProt ID
Assay	Typology	Binding, FRET, SPR, inhibition, phenotypic cellular
Result type	Potency/toxicity	CC50, IC50, EC50, %
Unit	Result unit	Concentration, ratio, SI
Image		OME-TIFF, matrix format (Zarr)
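As an illustration only, the rows of the table above could be captured as one record per measurement, with concentrations normalised to a single unit. The sketch below is a minimal Python example; the class `BioactivityRecord`, its field names, the helper `to_nanomolar`, and the example values are all hypothetical and do not represent a ChEMBL submission format or real measurements.

```python
from dataclasses import dataclass

# Conversion factors to nanomolar for common molar concentration units.
# Hypothetical helper table for this sketch; extend as needed.
_TO_NM = {"M": 1e9, "mM": 1e6, "uM": 1e3, "nM": 1.0, "pM": 1e-3}


def to_nanomolar(value: float, unit: str) -> float:
    """Convert a molar concentration to nanomolar."""
    try:
        return value * _TO_NM[unit]
    except KeyError:
        raise ValueError(f"Unsupported concentration unit: {unit!r}") from None


@dataclass(frozen=True)
class BioactivityRecord:
    """One measurement, mirroring the content table (illustrative field names)."""
    structure_id: str   # e.g. a SMILES string, InChI, or PubChem CID
    target_id: str      # protein or gene identifier
    assay_type: str     # e.g. "Binding", "Inhibition"
    result_type: str    # e.g. "IC50", "EC50", "CC50"
    value: float
    unit: str

    def value_nm(self) -> float:
        """Return the result value expressed in nanomolar."""
        return to_nanomolar(self.value, self.unit)


# Made-up example values, for illustration only.
record = BioactivityRecord(
    structure_id="CC(=O)Oc1ccccc1C(=O)O",  # SMILES string
    target_id="EXAMPLE_TARGET_ID",         # placeholder identifier
    assay_type="Inhibition",
    result_type="IC50",
    value=2.5,
    unit="uM",
)
print(record.value_nm())  # 2500.0
```

Storing the original value and unit alongside the normalised value keeps the submitted data traceable to what was actually measured.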

11.5.3.3.2. Minimum metadata

A minimum metadata set represents a collection of metadata items that should ideally be systematically supplied to support interpretation by humans or machines within a specific domain, for instance bioactivity experimental data. The minimum metadata set includes three parts:

- Assay and project bibliographic references (mainly links to literature and protocol or summary)
  - Project-level metadata
  - Common sample-level metadata, such as species, tissue, and cell type
- Chemical compound references, including chemical structures
- Assay results
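The three parts listed above can be turned into a simple completeness check before submission. The following is a minimal sketch, assuming a submission is held as a nested dictionary; the section names, field names, and the function `missing_metadata` are hypothetical and only mirror the list above.

```python
# Hypothetical section and field names, derived from the three-part list above.
REQUIRED_FIELDS = {
    "references": ["project", "sample"],
    "compounds": ["structure"],
    "results": ["values"],
}


def missing_metadata(submission: dict) -> list[str]:
    """Return dotted paths of required entries that are absent or empty."""
    missing = []
    for section, fields in REQUIRED_FIELDS.items():
        content = submission.get(section) or {}
        for field in fields:
            if not content.get(field):
                missing.append(f"{section}.{field}")
    return missing


# A draft with made-up content and no assay results yet.
draft = {
    "references": {
        "project": {"title": "Example project"},
        "sample": {"species": "Homo sapiens"},
    },
    "compounds": {"structure": "CC(=O)Oc1ccccc1C(=O)O"},
    "results": {},
}
print(missing_metadata(draft))  # ['results.values']
```

An empty list means every part of the minimum set is present; it says nothing about whether the values themselves are valid.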

For ChEMBL submission, molecular structures and assay descriptions, as depicted in the scheme above, are suggested as essential metadata. This is a subset of the following schema. If mutated cell lines and/or mutated target proteins were used in the assay, additional desirable metadata should be added to the appropriate group. MIABE also lists detailed bioassay description requirements.

Besides metadata, the diagram below also shows how to prepare numeric assay data.

11.5.3.3.3. Data vocabularies

A set of well-established standards and minimum metadata checklists exists for various aspects of ChEMBL formatting.

Chemical Information Ontology (CHEMINF)

http://semanticchemistry.github.io/semanticchemistry/ontology/cheminf.owl

CHEMINF covers information about chemical entities. It defines the descriptors commonly used in cheminformatics software applications and the algorithms used to generate those descriptors.

BioAssay Ontology (BAO)

http://www.bioassayontology.org/bao/bao_complete.owl

The BioAssay Ontology (BAO) describes biological screening assays and their results, including high-throughput screening (HTS) data, for the purpose of categorising assays and analysing data. BAO is an extensible, knowledge-based, highly expressive description of biological assays [1] that makes use of the description-logic-based features of the Web Ontology Language (OWL).

Ontology of units of Measure (OM)

http://www.ontology-of-units-of-measure.org/resource/om-2

The OM ontology provides classes, instances, and properties that represent the different concepts used for defining and using measures and units. It includes, for instance, common units such as the SI units metre and kilogram, and a wide range of units of significance for the field of chemistry and related domains. It can be easily mapped to other resources such as the Unit Ontology (UO), with tools such as OxO.

More information on annotating data with ontologies using tools like Zooma can be found in Section 7.7.3.3. of this recipe.
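To make the role of each vocabulary concrete, the sketch below pairs each metadata field of a record with the ontology that would be searched for a matching term. It is an assumption-laden illustration: the ontology URLs are those given in this section, but the field names, the field-to-ontology assignment, and the function `annotation_plan` are hypothetical, and no actual term lookup is performed.

```python
# Ontology source URLs as listed in this section.
VOCABULARIES = {
    "CHEMINF": "http://semanticchemistry.github.io/semanticchemistry/ontology/cheminf.owl",
    "BAO": "http://www.bioassayontology.org/bao/bao_complete.owl",
    "OM": "http://www.ontology-of-units-of-measure.org/resource/om-2",
}

# Hypothetical assignment of metadata fields to the vocabulary that
# would supply their terms.
FIELD_VOCABULARY = {
    "structure_descriptor": "CHEMINF",
    "assay_type": "BAO",
    "result_type": "BAO",
    "unit": "OM",
}


def annotation_plan(record: dict) -> dict:
    """Pair each free-text value with the ontology to search for a term.

    Fields without an assigned vocabulary get None for both entries.
    """
    plan = {}
    for field, value in record.items():
        vocab = FIELD_VOCABULARY.get(field)
        plan[field] = {
            "value": value,
            "vocabulary": vocab,
            "source": VOCABULARIES.get(vocab),
        }
    return plan


plan = annotation_plan({"assay_type": "Binding", "unit": "nM", "comment": "pilot run"})
for field, entry in plan.items():
    print(field, entry["vocabulary"], entry["source"])
```

Resolving each value to a specific ontology term would then be done with an annotation tool or by manual curation.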

11.5.3.3.4. Exemplar Bioactivity datasets

SARS-CoV-2 phenotypic assay from the Caco-2 cell line

The present dataset is a subset of the IMI CARE dataset, with compounds tested on the Caco-2 cell line. The dataset can be downloaded and, besides structural information, it contains activity readouts, for example either the percentage of cellular cytopathic inhibition at a given concentration or the corresponding IC50 (half-maximal inhibitory concentration) extracted from the dose-response data.

The recommendations above are based on ChEMBL ontology requirements. PubChem, the US counterpart to ChEMBL, has different ontology requirements for upload but provides a wizard-based upload process described in this blog.

11.5.3.4. Glossary

Term	Definition
Experiment	Biochemical assay, cellular activity assay, cellular toxicity assay
Readout	Quantitative measurement of a biophysical event followed by the assay (e.g. change in fluorescence)
EC50	Half-maximal effective concentration
IC50	Half-maximal inhibitory concentration
AC50	Half-maximal activation concentration
CC50	Half-maximal cytotoxic concentration