10.4.3. Bioactivity data profile
10.4.3.1. Main objective
This recipe shows how to prepare bioactivity data, defined as the measurable effects of a chemical compound in a biological system as monitored with a specific assay, so that it meets the ChEMBL submission criteria. It focuses on data formats, structures, and vocabularies.
This recipe addresses the Findability and Interoperability of this type of data.
10.4.3.2. Graphical overview of the Recipe FAIRification Objectives
Bioactivity data, as stored in public archives such as the European repository ChEMBL or its US counterpart PubChem together with chemical data and omics data, can be used to search for new hits (compounds with a desired property in drug screening), for example by using cell line information or compound IDs as input to queries over such resources.
An early-stage bioactivity dataset includes the compound molecular structure, molecular production details, assay data, and pharmacokinetic study information.
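As an illustration of querying such a resource with a compound ID, the sketch below builds a request URL for the public ChEMBL web services. The endpoint path and the `molecule_chembl_id`, `standard_type` and `limit` parameters are assumptions based on the ChEMBL REST interface and should be checked against its current documentation; `build_activity_query` is a hypothetical helper name, and `CHEMBL25` is used only as an example identifier. No network call is made.

```python
from urllib.parse import urlencode

# Assumed base URL of the ChEMBL activity resource (JSON output).
CHEMBL_ACTIVITY_URL = "https://www.ebi.ac.uk/chembl/api/data/activity.json"


def build_activity_query(molecule_chembl_id, standard_type=None, limit=20):
    """Return a URL requesting activity records for one compound (hypothetical helper)."""
    params = {"molecule_chembl_id": molecule_chembl_id, "limit": limit}
    if standard_type is not None:
        # Optionally restrict the results to one readout type, e.g. "IC50".
        params["standard_type"] = standard_type
    return f"{CHEMBL_ACTIVITY_URL}?{urlencode(params)}"


url = build_activity_query("CHEMBL25", standard_type="IC50")
print(url)
```

The resulting URL can be opened in a browser or passed to any HTTP client; keeping URL construction separate from the request makes the query itself easy to record alongside the data.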
The FAIR principles for data management can guide the improvements of pharmacokinetic properties of compounds and the identification of drug targets by enhancing the reporting of
Among the FAIR principles, the use of rich metadata (F2. data are described with rich metadata; R1. (meta)data are richly described with a plurality of accurate and relevant attributes) and the reliance on community standards (R1.3. (meta)data meet domain-relevant community standards) are essential.
In the context of bioactivity data, the Minimum Information About a Bioactive Entity (MIABE) checklist recommends attributes, formats and vocabularies for the reuse of such datasets.
10.4.3.3.1. Data content
| Chemistry (SDF) | Structure ID | |
| Target | Protein/GENE ID | PN_ or SwissProt ID |
| Assay | Typology | Binding, FRET, SPR, Inhibition, phenotypic cellular |
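One way to hold the three content groups from the table in code is a small record type that can be written out as tab-separated text. This is a minimal sketch: the class name, field names and column order are illustrative assumptions, not a ChEMBL-defined format, and the identifiers in the example row are placeholders.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class BioactivityRecord:
    """Illustrative record; field names are assumptions, not a ChEMBL schema."""
    structure_id: str  # identifier of the structure in the SDF file
    target_id: str     # protein or gene identifier, e.g. a SwissProt accession
    assay_type: str    # assay typology, e.g. "Binding" or "FRET"


def to_tsv(records):
    """Serialise records to tab-separated text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=[f.name for f in fields(BioactivityRecord)],
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


# Placeholder values for illustration only.
records = [BioactivityRecord("CPD-0001", "P00533", "Binding")]
tsv_text = to_tsv(records)
print(tsv_text)
```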
10.4.3.3.2. Minimum metadata
A minimum metadata set is a collection of metadata items that should ideally be supplied systematically to support interpretation by humans or machines within a specific domain, for instance bioactivity experimental data. The minimum metadata set includes four parts:
Assay and project bibliographic references (mainly links to literature and to a protocol or summary)
Project-level metadata
Common sample-level metadata, such as species, tissue, cell type and so on
Chemical compound references, including chemical structures
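A completeness check over the parts listed above can be sketched as follows. The part names and the required keys inside each part are hypothetical choices made for this example; a real checklist would take them from MIABE and the ChEMBL submission guidelines. All values in the example record are placeholders.

```python
# Hypothetical required keys per metadata part (illustrative, not normative).
REQUIRED_METADATA = {
    "references": ["publication", "protocol"],
    "project": ["title", "description"],
    "sample": ["species", "tissue", "cell_type"],
    "compound": ["compound_id", "structure"],
}


def missing_metadata(metadata):
    """Return 'part.key' labels for every required item that is absent or empty."""
    missing = []
    for part, keys in REQUIRED_METADATA.items():
        section = metadata.get(part, {})
        for key in keys:
            if not section.get(key):
                missing.append(f"{part}.{key}")
    return missing


example = {
    "references": {"publication": "doi-placeholder", "protocol": "protocol summary"},
    "project": {"title": "Example screen"},
    "sample": {"species": "Homo sapiens", "tissue": "colon", "cell_type": "Caco-2"},
    "compound": {"compound_id": "CPD-0001", "structure": "CC(=O)Oc1ccccc1C(=O)O"},
}
print(missing_metadata(example))  # ['project.description']
```

Reporting the missing items by name, rather than returning a single pass/fail flag, lets a submitter fix all gaps in one pass.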
For ChEMBL submission, molecular structures and assay descriptions, as depicted in the scheme above, are suggested as essential metadata. This is a subset of the following schema. If mutated cell lines and/or mutated target proteins have been used in the assay, additional desirable metadata should be added to the appropriate group. MIABE also lists detailed bioassay description requirements.
Besides metadata, the diagram below also shows how to prepare numeric assay data.
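Numeric assay results are often recorded as free text such as `> 10 uM`. A common preparation step is to split each readout into a relation, a numeric value and a unit so that each can be stored in its own column. The sketch below assumes that three-way split; the output keys and the accepted relation symbols are illustrative choices and should be aligned with the column names required by the target archive.

```python
import re

# Optional relation, a decimal or scientific-notation number, optional unit.
_READOUT = re.compile(
    r"^\s*(?P<relation>[<>]=?|=|~)?\s*"
    r"(?P<value>\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)\s*"
    r"(?P<units>\S+)?\s*$"
)


def parse_readout(text):
    """Split a readout such as '> 10 uM' into relation, value and units."""
    match = _READOUT.match(text)
    if match is None:
        raise ValueError(f"cannot parse readout: {text!r}")
    return {
        "relation": match.group("relation") or "=",  # default to equality
        "value": float(match.group("value")),
        "units": match.group("units") or "",
    }


print(parse_readout("> 10 uM"))
print(parse_readout("45%"))
```

Readouts that do not match, such as `n.d.`, raise a `ValueError` so that they are reviewed by hand instead of being silently converted.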
10.4.3.3.3. Data vocabularies
A set of well-established standards and minimum metadata checklists exists for various aspects of ChEMBL formatting.
Chemical information ontology (CHEMINF) http://semanticchemistry.github.io/semanticchemistry/ontology/cheminf.owl
CHEMINF covers information about chemical entities and defines the descriptors commonly used in cheminformatics software applications, as well as the algorithms used to generate them.
The BioAssay Ontology (BAO) describes biological screening assays and their results, including high-throughput screening (HTS) data, for the purpose of categorising assays and data analysis. BAO is an extensible, knowledge-based, highly expressive description of biological assays [1] that makes use of the description-logic-based features of the Web Ontology Language (OWL).
Ontology of units of Measure (OM) http://www.ontology-of-units-of-measure.org/resource/om-2
The OM ontology provides classes, instances, and properties that represent the concepts used for defining and using measures and units. It includes, for instance, common units such as the SI units meter and kilogram, and a wide range of units relevant to chemistry and related fields. It can be mapped to other resources such as the Units Ontology (UO) with tools such as OxO.
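Using a controlled unit vocabulary is easier when all values of one readout type are expressed in a single unit. The sketch below converts molar concentrations to nanomolar with a small lookup table. The unit labels are plain strings chosen for this example; they are not OM identifiers, and mapping them to ontology terms is left to the annotation step.

```python
# Multiplication factors from each molar unit to nanomolar.
TO_NANOMOLAR = {"M": 1e9, "mM": 1e6, "uM": 1e3, "nM": 1.0, "pM": 1e-3}


def to_nanomolar(value, units):
    """Convert a concentration to nanomolar; reject units not in the table."""
    try:
        factor = TO_NANOMOLAR[units]
    except KeyError:
        raise ValueError(f"unsupported concentration unit: {units!r}") from None
    return value * factor


print(to_nanomolar(2.5, "uM"))
```

Rejecting unknown units, instead of passing values through unchanged, avoids mixing scales in the same column.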
10.4.3.3.4. Exemplar Bioactivity datasets
The present dataset is a subset of the IMI CARE dataset, with compounds tested on the Caco-2 cell line. The dataset can be downloaded and, besides structural information, it contains activity readouts: either the percentage of cellular cytopathic inhibition at a given concentration or the corresponding IC50 (half-maximal inhibitory concentration) extracted from the dose-response data.
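To make the relationship between the two readout types concrete, the sketch below estimates an IC50 from percentage-inhibition values measured at several concentrations, using log-linear interpolation between the two points that bracket 50 %. This is a deliberate simplification for illustration and is not the method used to produce the dataset; dose-response analyses usually fit a sigmoidal curve instead. The function name and the example numbers are hypothetical.

```python
import math


def estimate_ic50(concentrations, inhibitions):
    """Interpolate the concentration giving 50 % inhibition.

    Concentrations must be positive and share one unit; inhibitions are
    percentages. Returns None when no pair of points brackets 50 %.
    """
    points = sorted(zip(concentrations, inhibitions))
    for (c_low, i_low), (c_high, i_high) in zip(points, points[1:]):
        if i_low <= 50.0 <= i_high and i_low != i_high:
            # Interpolate linearly on the log10 concentration scale.
            fraction = (50.0 - i_low) / (i_high - i_low)
            log_ic50 = math.log10(c_low) + fraction * (
                math.log10(c_high) - math.log10(c_low)
            )
            return 10 ** log_ic50
    return None


# Made-up example values, concentrations in uM.
ic50 = estimate_ic50([0.1, 1.0, 10.0, 100.0], [5.0, 20.0, 80.0, 95.0])
print(ic50)  # about 3.16 (uM)
```

When the measured range never crosses 50 %, the function returns `None`, which corresponds to reporting the result with a relation such as `>` and the highest tested concentration.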
The recommendations above are based on ChEMBL ontology requirements. The US counterpart to ChEMBL, the PubChem data bank, has different ontology requirements for upload but provides a wizard-based upload process described in this blog
Biochemical Assay, Cellular Activity Assay, Cellular Toxicity Assay
Quantitative measurement of a biophysical event followed by the assay (e.g. change in fluorescence)
Half-maximal effective concentration (EC50)
Half-maximal inhibitory concentration (IC50)
Half-maximal activation concentration (AC50)
Half-maximal cytotoxic concentration (CC50)
10.4.3.5. What to read next?
U. Visser, S. Abeyruwan, U. Vempati, R. P. Smith, V. Lemmon, and S. C. Schürer. BioAssay Ontology (BAO): a semantic description of bioassays and high-throughput screening results. BMC Bioinformatics, 12:257, Jun 2011.