11.5.4. Creating a metadata profile for clinical trial protocols¶
11.5.4.1. Main Objectives¶
The purpose of this recipe is to describe the process to define and standardize study and protocol-level (meta)data commonly collected in paediatric clinical trials, with the aim of making trial data more Findable through a common Interoperable metadata profile. The recipe details how to:
Collect & refine a list of representative variables
Represent protocol-level additional (meta)data in a complementary data model
Define extraction processes for populating variables of interest
11.5.4.2. Graphical Overview¶
11.5.4.3. Requirements¶
Technical requirements: none
Knowledge requirement:
A basic understanding of clinical trial design and the types of data that are collected in clinical trials.
Understanding of what a metadata profile is.
11.5.4.4. Table of Data Standards¶
Data Formats |
Terminologies |
Models |
---|---|---|
11.5.4.5. Introduction¶
This recipe was created in collaboration with conect4children (c4c), a large collaborative European network that aims to facilitate the development of new drugs and other therapies for the entire paediatric population. This work was carried out as part of the WP5 data harmonization and standardization tasks within c4c.
The creation of a clinical trial protocol metadata profile allows historic clinical trial data to be discovered, and increases the potential for data to be shared and reused. This may ultimately decrease the number of patients needed for new clinical trials, and potentially reduce the cost and effort of conducting those trials. For paediatric trials, the disease being studied is often rare and the number of patients enrolled is small, making the data scarce and valuable.
Enabling FAIR data collection from the planning stages of a trial will improve the FAIRness of trial data and the potential for interoperable data sharing and (metadata-level) data querying from different studies.
11.5.4.6. Reviewing existing clinical trials registries¶
The first step in the process defined and then refined a list of variables to be collected. The (advanced) search features of the following repositories and registries were recorded and then mapped to create a list of common metadata items across all of the resources:
The first step created a list of metadata items from the Advanced Search screen on ClinicalTrials.gov. Metadata items from each subsequent repository were compared against this list and mapped across if there was a match. For example, ‘Age Group’ and ‘Age Range’ or ‘Trial Phase’ and ‘Study Phase’. New metadata items that couldn’t be mapped against existing entries were added to the bottom of the list. This gave a good visual of which metadata items occurred most frequently across all of the repositories. The results of the mapping exercise were captured in a Google Sheet. We started with ClinicalTrials.gov because it is the most comprehensive and used repository.
11.5.4.7. Refining the initial metadata list¶
The list of metadata items was reviewed by c4c partners, and those not considered cross cutting or common enough for paediatric clinical trials were removed from the list. The original list consisted of 36 items and this was reduced to 28. The following were identified for inclusion:
Considered cross cutting or common enough to be included in the metadata schema |
---|
Study IDs |
Title |
Acronym |
Condition or Disease |
Therapeutic Area |
Indication |
Study Type |
Phase |
Funder Type |
Study Start |
Sample Size |
Study Description |
Status: Recruitment |
Study Documents |
Study Results |
Country |
Age |
Age Group |
Sex |
Ethnicity |
Race |
Additional Inclusion Criteria |
Additional Exclusion Criteria |
Outcome Measures |
Intervention/Treatment |
IMP with Orphan Designation in the Indication |
Biospecimens Retained |
Product Class |
The following 8 terms were excluded after the initial review, either because they were unique to one particular registry’s model and therefore not considered cross cutting enough, or because the information they represented could be abstracted into one of the selected variables.
Not considered cross cutting or common enough to be included in the Metadata Schema |
Comment |
---|---|
Rare Disease (tick box yes/no) |
There is no fixed defintion of “rare disease”. The condition or disease studied in a trial is included in the final list of variables |
Intervention Model |
Covered by existing variables |
NCT Number |
Sub-type of study ID |
Consent |
Legal frameworks around consent vary widely and consent conditions are not captured consistently, which would make mapping historical clinical trials to the new model problematic |
Criterion |
Covered by existing variables |
Site Name |
Covered by other location metadata |
Formulation |
Covered by existing variables |
Route of Administration |
Covered by existing variables |
11.5.4.8. Testing the metadata profile with a representative clinical trial protocol¶
Each of the above variables were populated (where possible) with information from a clinical trial protocol. They were also mapped to the following ontologies/vocabularies:
Vocabulary |
Reason for selection |
---|---|
OMOP CDM is commonly used for structuring trial results and study participant information |
|
NCIt is aligned with the CDSIC vocabularies used to mark up data in CDISC SDTM format, mandated by the regulatory authorities in the USA for deposition |
|
Potential to provide a semantic bridge between CDISC and OMOP representations and the preclinical world where OBO Foundry resources are extensively used for semantic representation |
11.5.4.8.1. Example:¶
Term |
Protocol |
OMOP |
Clinical Trials Ontology |
NCI Thesaurus |
---|---|---|---|---|
Age Group |
Child up to 15 years inclusive |
4305451 Infant 37016983 Toddler 4285883 Child 4305318 Adolescent |
NCIT:C49643 Infant & toddler NCIT:C16423 Child NCIT:C89342 Toddler NCIT:C49683 Children 2-11 years NCIT:C85405 School age child NCIT:C27954 Adolescent |
C27956 Infant C89342 Toddler C16423 Child C27954 Adolescent |
11.5.4.9. The metadata profile in action¶
The metadata profile created using the steps described above was used to create a metadata schema in tabular format, as shown in the following table:
Variable/record_id |
Form Name |
Section Header |
Field Type |
Field Label |
---|---|---|---|---|
record_id |
C4C Study Metadata Collection |
autofill |
Record ID |
|
study_id |
C4C Study Metadata Collection |
Study Information |
short text |
Study ID |
study_id_1 |
C4C Study Metadata Collection |
Study Information |
short text |
Add Another Study ID |
study_id_text |
C4C Study Metadata Collection |
Study Information |
text box |
Add Additional Study IDs |
study_title |
C4C Study Metadata Collection |
Study Information |
text box |
Study Title |
study_acronym |
C4C Study Metadata Collection |
Study Information |
short text |
Study Acronym |
disease |
C4C Study Metadata Collection |
Study Information |
ontology field |
Condition or Disease |
therapeutic_area |
C4C Study Metadata Collection |
Study Information |
ontology field |
Therapeutic Area |
indication |
C4C Study Metadata Collection |
Study Information |
ontology field |
Indication |
study_type |
C4C Study Metadata Collection |
Study Information |
dropdown |
Study Type |
country |
C4C Study Metadata Collection |
Study Information |
multiple choice |
Country |
phase |
C4C Study Metadata Collection |
Study Information |
dropdown |
Phase of Trial |
funder_type |
C4C Study Metadata Collection |
Study Information |
dropdown |
Funder Type |
study_start |
C4C Study Metadata Collection |
Study Information |
date field |
Study Start |
sample_size |
C4C Study Metadata Collection |
Study Information |
short text |
Estimated Sample Size |
study_description |
C4C Study Metadata Collection |
Study Information |
text box |
Study Description |
status_recruitment |
C4C Study Metadata Collection |
Study Information |
dropdown |
Status: Recruitment |
study_documents |
C4C Study Metadata Collection |
Study Information |
multiple choice |
Study Documents Available |
study_results |
C4C Study Metadata Collection |
Study Information |
dropdown |
Study Results |
age |
C4C Study Metadata Collection |
Inclusion/Exclusion Criteria |
short text |
Age Range |
age_group |
C4C Study Metadata Collection |
Inclusion/Exclusion Criteria |
multiple choice |
Age Grou(p) |
sex |
C4C Study Metadata Collection |
Inclusion/Exclusion Criteria |
dropdown |
Sex |
race |
C4C Study Metadata Collection |
Inclusion/Exclusion Criteria |
multiple choice |
Race |
ethnicity |
C4C Study Metadata Collection |
Inclusion/Exclusion Criteria |
multiple choice |
Ethnicity |
inclusion_criteria |
C4C Study Metadata Collection |
Inclusion/Exclusion Criteria |
text box |
Additional Inclusion Criteria |
exclusion_criteria |
C4C Study Metadata Collection |
Inclusion/Exclusion Criteria |
text box |
Additional Exclusion Criteria |
outcome_measures |
C4C Study Metadata Collection |
Inclusion/Exclusion Criteria |
text box |
Outcome Measures |
intervention_treatment |
C4C Study Metadata Collection |
Treatment Information |
ontology field |
Intervention/Treatment |
orphan_designation |
C4C Study Metadata Collection |
Treatment Information |
dropdown |
IMP with orphan designation in the indication |
biospecimens_retained |
C4C Study Metadata Collection |
Treatment Information |
dropdown |
Biospecimens Retained |
biospecimens_text |
C4C Study Metadata Collection |
Treatment Information |
text box |
Type of Specimens Retained |
product_class |
C4C Study Metadata Collection |
Treatment Information |
ontology field |
Product Class |
This schema was used to create a survey in REDCap to allow for more stringent review and testing. The creation of the survey resulted in changes to the schema which may not have been apparent without this additional step. For example, Race was removed from the survey as it was difficult to standardize responses due to geographic variance and text boxes were added for additional inclusion/exclusion criteria. The revised metadata schema is shown below.
Variable/record_id |
Form Name |
Section Header |
Field Type |
Field Label |
---|---|---|---|---|
record_id |
C4C Study Metadata Collection |
autofill |
Record ID |
|
C4C Study Metadata Collection |
Study Information |
begin new section |
||
study_id_ct.gov |
C4C Study Metadata Collection |
Study Information |
short text |
|
study_id_eudract |
C4C Study Metadata Collection |
Study Information |
short text |
EudraCT/CTIS ID |
study_id_brand |
C4C Study Metadata Collection |
Study Information |
short text |
Study Brand Name ID (if applicable) |
study_id_text |
C4C Study Metadata Collection |
Study Information |
text box |
Add Additional Study IDs |
study_title |
C4C Study Metadata Collection |
Study Information |
text box |
Study Title |
study_acronym |
C4C Study Metadata Collection |
Study Information |
short text |
Study Acronym |
disease_snomed_1 |
C4C Study Metadata Collection |
Study Information |
ontology field |
First Condition or Disease - SNOMED CT |
disease_snomed_2 |
C4C Study Metadata Collection |
Study Information |
ontology field |
Second Condition or Disease (if applicable) - SNOMED CT |
disease_omim_1 |
C4C Study Metadata Collection |
Study Information |
ontology field |
First Condition or Disease - OMIM |
disease_omim_2 |
C4C Study Metadata Collection |
Study Information |
ontology field |
Second Condition or Disease (if applicable) - OMIM |
therapeutic_area |
C4C Study Metadata Collection |
Study Information |
ontology field |
Therapeutic Area |
indication |
C4C Study Metadata Collection |
Study Information |
text box |
Indication |
study_type |
C4C Study Metadata Collection |
Study Information |
dropdown |
Study Type |
study_type_other |
C4C Study Metadata Collection |
Study Information |
short text |
Add Other Study Types |
phase |
C4C Study Metadata Collection |
Study Information |
multiple choice |
Phase of Trial |
phase_other |
C4C Study Metadata Collection |
Study Information |
short text |
Add Additional Trial Phases |
funder_type |
C4C Study Metadata Collection |
Study Information |
dropdown |
Funder Type |
funder_type_other |
C4C Study Metadata Collection |
Study Information |
short text |
Provide Information about ‘Other’ Funder Types |
study_start |
C4C Study Metadata Collection |
Study Information |
date field |
Study Start Date |
sample_size |
C4C Study Metadata Collection |
Study Information |
short text |
Estimated Sample Size |
study_description |
C4C Study Metadata Collection |
Study Information |
text box |
Study Description |
status_recruitment |
C4C Study Metadata Collection |
Study Information |
dropdown |
Status: Recruitment |
study_documents |
C4C Study Metadata Collection |
Study Information |
multiple choice |
Study Documents Available |
study_documents_other |
C4C Study Metadata Collection |
Study Information |
short text |
Add Additional Types of Study Documents |
study_results |
C4C Study Metadata Collection |
Study Information |
dropdown |
Study Results |
study_continents |
C4C Study Metadata Collection |
Study Information |
multiple choice |
Please Select Study Site Locations |
european_sites |
C4C Study Metadata Collection |
Study Information |
multiple choice |
Please Select European Study Site Locations |
n_american_sites |
C4C Study Metadata Collection |
Study Information |
multiple choice |
Please Select North American Study Site Locations |
C4C Study Metadata Collection |
Inclusion/Exclusion Criteria |
begin new section |
||
age |
C4C Study Metadata Collection |
Inclusion/Exclusion Criteria |
short text |
Age Range |
age_group |
C4C Study Metadata Collection |
Inclusion/Exclusion Criteria |
multiple choice |
Age Group(s) |
sex |
C4C Study Metadata Collection |
Inclusion/Exclusion Criteria |
dropdown |
Sex |
ethnicity |
C4C Study Metadata Collection |
Inclusion/Exclusion Criteria |
multiple choice |
Ethnicity |
inclusion_criteria |
C4C Study Metadata Collection |
Inclusion/Exclusion Criteria |
text box |
Additional Inclusion Criteria |
exclusion_criteria |
C4C Study Metadata Collection |
Inclusion/Exclusion Criteria |
text box |
Additional Exclusion Criteria |
outcome_measures |
C4C Study Metadata Collection |
Inclusion/Exclusion Criteria |
text box |
Outcome Measures |
C4C Study Metadata Collection |
Treatment Information |
begin new section |
||
intervention_treatment |
C4C Study Metadata Collection |
Treatment Information |
ontology field |
First Intervention/Treatment |
product_class |
C4C Study Metadata Collection |
Treatment Information |
ontology field |
Product Class - First Intervention/Treatment |
intervention_treatment_2 |
C4C Study Metadata Collection |
Treatment Information |
ontology field |
Second Intervention/Treatment |
product_class_2 |
C4C Study Metadata Collection |
Treatment Information |
ontology field |
Product Class - Second Intervention/Treatment |
orphan_designation |
C4C Study Metadata Collection |
Treatment Information |
dropdown |
IMP with orphan designation in the indication |
biospecimens_retained |
C4C Study Metadata Collection |
Treatment Information |
dropdown |
Biospecimens Retained |
biospecimens_text |
C4C Study Metadata Collection |
Treatment Information |
text box |
Type of Biospecimens Retained |
C4C Study Metadata Collection |
Comments |
begin new section |
||
comments |
C4C Study Metadata Collection |
Comments |
text box |
Comments |
The REDCap survey will be sent to studies within the c4c consortium for additional testing. A representative of the study will be asked to complete the survey with metadata from their study and provide feedback. This feedback will be used to further refine the list of metadata items collected. A Shapes Constraint Language (ShaCL) representation of the final metadata schema will be used to create a FAIR Data Point for c4c studies. A FAIR Data Point is a REST API and web client for creating, storing, and serving metadata in compliance with the FAIR principles through the use of standardised exchange formats. This will allow researchers to find sources of paediatric data from clinical trials.
11.5.4.10. Conclusion¶
Paediatric data is often rare and scarce which contributes to the slow development of knowledge and treatments. Any activity that can improve the Findability (and potential Reusability) of the data is therefore valuable. Other researchers could benefit from this recipe by applying it to other sources or types of (meta)data to improve Findability.
The REDCap survey will be sent to c4c partners to allow for further testing of the (meta)data schema. The test results will be used to develop a FAIR data point for c4c studies.
11.5.4.10.1. What to read next?¶
FAIRsharing records appearing in this recipe:
- CDISC
- CDISC Study Data Tabulation Model (CDISC SDTM)
- Clinical Trials Information System (CTIS)
- Clinical Trials Ontology (CTO)
- ClinicalTrials.gov
- European Union Drug Regulating Authorities Clinical Trials (EudraCT)
- FAIR Data Point (FDP)
- NCI Thesaurus (NCIt)
- OBO Foundry (OBO)
- Online Mendelian Inheritance in Man (OMIM)
- Shapes Constraint Language (SHACL) (SHACL)
- The FAIR Principles (FAIR)
- The Observational Medical Outcomes Partnership Common Data Model (OMOP CDM)
11.5.4.11. Authors¶
Authors
Name |
ORCID |
Affiliation |
Type |
ELIXIR Node |
Contribution |
---|---|---|---|---|---|
Avril Palmeri |
Newcastle University |
Writing - Original Draft |
|||
Becca Leary |
Newcastle University |
Writing - Original Draft |
|||
Anando Sen |
Newcastle University |
Writing - Original Draft |
|||
Amsterdam UMC |
Writing - Original Draft |
||||
University of Luxembourg |
Writing - Review & Editing |
||||
University of Oxford |
Writing - Review & Editing |