11.5.3. Bioactivity data profile¶
11.5.3.1. Main objective¶
This recipe shows how to prepare bioactivity data
, defined as the measurable effects of a chemical compound in a biological system monitored with a specific assay, to meet the ChEMBL submission criteria, focusing on data formats, structures, and vocabularies.
This recipe is meant to address the Findability and Interoperability of such type of data.
11.5.3.2. Graphical overview of the Recipe FAIRification Objectives¶
11.5.3.3. Introduction¶
Bioactivity data, as stored in public archives such as the European repository CHEMBL or its US counterpart PubChem in together with chemical data and omics data, can be used to search for new hits
(compounds with desired property in drug screening), for example by using cell line information, compound ID as input to queries over such resources.
Early-stage bioactivity dataset includes compound molecular structure, molecular production details, assay data and, pharmacokinetic study information.
The FAIR principles for data management can guide the improvements of pharmacokinetic properties of compounds and the identification of drug targets by enhancing the reporting of bioactivity data
.
Among the FAIR principles, the use of rich metadata
(F2. data are described with rich metadata and R1. meta(data) are richly described with a plurality of accurate and relevant attributes) and the reliance on community standards
(R1.3. (meta)data meet domain-relevant community standards) are essential.
In the context of bioactivity data
, we have on the one hand the Minimum information about a bioactive entity (MIABE) checklist recommend attributes, formats and vocabularies for the reuse of such datasets.
On the other hand, public bioactivity data archives, such as ChEMBL, PubChem, and ECBD also have their own requirements for data submission.
11.5.3.3.1. Data content¶
Content | Details | Data types |
Chemistry (SDF) | Structure ID |
|
Target | Protein/GENE ID | PN_ or SwissProt ID |
Assay | Typology | Binding, FRET, SPR, Inhibition, phenotypic cellular |
Result Type | Potency/Tox | CC50/IC50/EC50/% |
Unit | Result unit | Concentration/ratio/SI |
Image |
|
11.5.3.3.2. Minimum metadata¶
A minimum metadata set represents a collection of metadata items that should ideally be systematically supplied to support interpretation by humans or machines within a specific domain, for instance bioactivity experimental data. The minimum metadata set includes three parts:
Assay and project bibliographic references (mainly links to literature and protocol or summary)
Project level metadata
Common sample-level metadata, such as species, tissue, cell type and so on.
Chemical compounds reference, including chemical structures
Assay results
For ChEMBL submission, molecular structures and assay description as depicted in the scheme above are suggested as essential metadata. This is a subset of the following schema. In case mutated cell lines and/or mutated target proteins have been used in the assay, additional desirable metadata should be added in the proper group. MIABE also lists detailed bioassay description requirements.
Besides metadata, the diagram below also shows how to prepare numeric assay data.
11.5.3.3.3. Data vocabularies¶
A set of well-established standards and minimum metadata checklists exist for various aspects of ChEMBL formatting.
Chemical information ontology (CHEMINF) http://semanticchemistry.github.io/semanticchemistry/ontology/cheminf.owl
CHEMINF covers information about chemical entities and defines descriptors commonly used in cheminformatics software applications and to denote algorithms used to generate those chemicals.
BioAssay Ontology(BAO)
http://www.bioassayontology.org/bao/bao_complete.owl
The BioAssay Ontology (BAO) describes biological screening assays and their results, including high-throughput screening (HTS) data for the purpose of categorising assays and data analysis. BAO is an extensible, knowledge-based, highly expressive description of biological assays 1 making use of descriptive logic based features of the Web Ontology Language (OWL)
Ontology of units of Measure (OM) http://www.ontology-of-units-of-measure.org/resource/om-2 The OM ontology provides classes, instances, and properties that represent the different concepts used for defining and using measures and units. It includes, for instance, common units such as the SI units meter and kilogram, and a wide range of units of significance for the field of Chemistry and related information. It can be easily mapped to other resources such as Unit Ontology, with tools such as OXO
More information on annotating data with ontologies using tools like Zooma, can be found in Section 7.7.3.3. of this recipe
11.5.3.3.4. Exemplar Bioactivity datasets¶
SARS CoV2 phenotypic assay from Caco2 cell line
The present dataset is a subset of IMI CARE dataset with compounds tested on the Caco-2 cell line. The dataset can be downloaded and, besides structural information, it will contain readout numbers for activity (e.g. either percentage of cellular cytopathic inhibition at a given concentration
or corresponding extracted dose-response IC50
(Half-maximal inhibitory concentration)).
Recommendations above are based on ChEMBL ontology requirements. The US counterpart to ChEMBL, the PubChem data bank have different ontology requirements for upload but provide a wizard-based upload process described in this blog
11.5.3.4. Glossary¶
Term |
Definition |
---|---|
Experiment |
Biochamical Assay, Cellular Activity Assay, Cellular Toxicity Assay |
Readout |
Quantitive measurements of a biophysical event followed by assay (e.g. change in fluorescence) |
EC50 |
Half maximal Effective Concentration |
IC50 |
Half maximal Inhibition Concentration |
AC50 |
Half maximal Activation Concentration |
CC50 |
Half maximal Cytotoxic Concentration |
11.5.3.4.1. What to read next?¶
FAIRsharing records appearing in this recipe:
11.5.3.5. References¶
References
- 1
U. Visser, S. Abeyruwan, U. Vempati, R. P. Smith, V. Lemmon, and S. C. Schürer. BioAssay Ontology (BAO): a semantic description of bioassays and high-throughput screening results. BMC Bioinformatics, 12:257, Jun 2011.
11.5.3.6. Authors¶
Authors
Name |
ORCID |
Affiliation |
Type |
ELIXIR Node |
Contribution |
---|---|---|---|---|---|
Fraunhofer Institute |
Writing - Original Draft, Editing, Conceptualization |
||||
EMBL-EBI |
Writing - Writing - Original Draft, Editing, Conceptualization |
||||
University of Oxford |
Writing - Review & Editing |