11.5.4. Creating a metadata profile for clinical trial protocols¶

Recipe Overview

Reading Time

20 minutes

Executable Code

No

Difficulty

Creating a metadata profile for clinical trial protocols

Recipe Type

Experience Report / Applied Example

Audience

Principal Investigator, Data Manager, Data Scientist, Terminology Manager, Ontologist

Maturity Level & Indicator

DSM-2-C5

DSM-2-H1

Cite me with FCB084

11.5.4.1. Main Objectives¶

The purpose of this recipe is to describe the process to define and standardize study and protocol-level (meta)data commonly collected in paediatric clinical trials, with the aim of making trial data more Findable through a common Interoperable metadata profile. The recipe details how to:

Collect & refine a list of representative variables

Represent protocol-level additional (meta)data in a complementary data model
Define extraction processes for populating variables of interest

11.5.4.2. Graphical Overview¶

11.5.4.3. Requirements¶

Technical requirements: none
Knowledge requirement:
- A basic understanding of clinical trial design and the types of data that are collected in clinical trials.
- Understanding of what a metadata profile is.

11.5.4.4. Table of Data Standards¶

Data Formats	Terminologies	Models
	OMOP
	Clinical Trials Ontology
	NCI Thesaurus

11.5.4.5. Introduction¶

This recipe was created in collaboration with conect4children (c4c), a large collaborative European network that aims to facilitate the development of new drugs and other therapies for the entire paediatric population. This work was carried out as part of the WP5 data harmonization and standardization tasks within c4c.

The creation of a clinical trial protocol metadata profile allows historic clinical trial data to be discovered, and increases the potential for data to be shared and reused. This may ultimately decrease the number of patients needed for new clinical trials, and potentially reduce the cost and effort of conducting those trials. For paediatric trials, the disease being studied is often rare and the number of patients enrolled is small, making the data scarce and valuable.

Enabling FAIR data collection from the planning stages of a trial will improve the FAIRness of trial data and the potential for interoperable data sharing and (metadata-level) data querying from different studies.

11.5.4.6. Reviewing existing clinical trials registries¶

The first step in the process was to define and then refine a list of variables to be collected. The (advanced) search features of the following repositories and registries were recorded and then mapped to create a list of metadata items common to all of the resources:

The mapping began with a list of metadata items taken from the Advanced Search screen on ClinicalTrials.gov, chosen as the starting point because it is the most comprehensive and most widely used repository. Metadata items from each subsequent repository were compared against this list and mapped to an existing entry where there was a match, for example ‘Age Group’ and ‘Age Range’, or ‘Trial Phase’ and ‘Study Phase’. New metadata items that could not be mapped to existing entries were appended to the list. This gave a clear view of which metadata items occurred most frequently across the repositories. The results of the mapping exercise were captured in a Google Sheet.
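
The mapping exercise described above was done by hand in a spreadsheet, but the same logic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the registry names `registry_a` and `registry_b`, the field lists, and the `build_common_list` helper are hypothetical, and only the two synonym pairs (‘Age Range’/‘Age Group’ and ‘Study Phase’/‘Trial Phase’) come from the recipe.

```python
from collections import Counter

# Registry-specific label -> canonical item. Only these two pairs are
# named in the recipe; a real table would be built during the review.
SYNONYMS = {
    "Age Range": "Age Group",
    "Study Phase": "Trial Phase",
}


def build_common_list(seed_items, other_registries):
    """Return (ordered canonical items, number of registries using each)."""
    items = list(seed_items)      # seed list, e.g. from ClinicalTrials.gov
    counts = Counter(items)       # the seed registry counts once per item
    for fields in other_registries.values():
        seen = set()              # count each item once per registry
        for field in fields:
            canonical = SYNONYMS.get(field, field)
            if canonical in seen:
                continue
            seen.add(canonical)
            if canonical not in counts:
                items.append(canonical)   # unmatched items go to the end
            counts[canonical] += 1
    return items, counts


# Hypothetical input data, for illustration only.
seed = ["Age Group", "Trial Phase", "Condition or Disease"]
others = {
    "registry_a": ["Age Range", "Study Phase", "Country"],
    "registry_b": ["Age Group", "Country"],
}

items, counts = build_common_list(seed, others)
for item in items:
    print(f"{item}: {counts[item]} of {len(others) + 1} registries")
```

Items that appear in most registries are natural candidates for the common profile, while items with a count of one are candidates for exclusion or for merging into another variable.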

11.5.4.7. Refining the initial metadata list¶

The list of metadata items was reviewed by c4c partners, and those not considered cross cutting or common enough for paediatric clinical trials were removed from the list. The original list consisted of 36 items and this was reduced to 28. The following were identified for inclusion:

Considered cross cutting or common enough to be included in the metadata schema
Study IDs
Title
Acronym
Condition or Disease
Therapeutic Area
Indication
Study Type
Phase
Funder Type
Study Start
Sample Size
Study Description
Status: Recruitment
Study Documents
Study Results
Country
Age
Age Group
Sex
Ethnicity
Race
Additional Inclusion Criteria
Additional Exclusion Criteria
Outcome Measures
Intervention/Treatment
IMP with Orphan Designation in the Indication
Biospecimens Retained
Product Class

The following 8 terms were excluded after the initial review, either because they were unique to one particular registry’s model and therefore not considered cross cutting enough, or because the information they represented could be abstracted into one of the selected variables.

Not considered cross cutting or common enough to be included in the Metadata Schema	Comment
Rare Disease (tick box yes/no)	There is no fixed definition of “rare disease”. The condition or disease studied in a trial is included in the final list of variables
Intervention Model	Covered by existing variables
NCT Number	Sub-type of study ID
Consent	Legal frameworks around consent vary widely and consent conditions are not captured consistently, which would make mapping historical clinical trials to the new model problematic
Criterion	Covered by existing variables
Site Name	Covered by other location metadata
Formulation	Covered by existing variables
Route of Administration	Covered by existing variables

11.5.4.8. Testing the metadata profile with a representative clinical trial protocol¶

Each of the above variables was populated (where possible) with information from a clinical trial protocol. The variables were also mapped to the following ontologies/vocabularies:

Vocabulary	Reason for selection
OMOP vocabularies	OMOP CDM is commonly used for structuring trial results and study participant information
NCI Thesaurus	NCIt is aligned with the CDISC vocabularies used to mark up data in CDISC SDTM format, mandated by the regulatory authorities in the USA for deposition
Clinical Trials Ontology	Potential to provide a semantic bridge between CDISC and OMOP representations and the preclinical world where OBO Foundry resources are extensively used for semantic representation

11.5.4.8.1. Example:¶

Term	Protocol	OMOP	Clinical Trials Ontology	NCI Thesaurus
Age Group	Child up to 15 years inclusive	4305451 Infant 37016983 Toddler 4285883 Child 4305318 Adolescent	NCIT:C49643 Infant & toddler NCIT:C16423 Child NCIT:C89342 Toddler NCIT:C49683 Children 2-11 years NCIT:C85405 School age child NCIT:C27954 Adolescent	C27956 Infant C89342 Toddler C16423 Child C27954 Adolescent
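
One possible way to hold such a mapping in code is sketched below. The class names `Concept` and `VariableMapping` are hypothetical, and the codes and labels are copied as-is from the OMOP and NCI Thesaurus columns of the example row rather than re-checked against the vocabularies.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Concept:
    vocabulary: str   # e.g. "OMOP" or "NCIt"
    code: str         # identifier as written in the example row
    label: str


@dataclass
class VariableMapping:
    term: str                 # profile variable, e.g. "Age Group"
    protocol_text: str        # wording found in the protocol
    concepts: list = field(default_factory=list)

    def codes_for(self, vocabulary):
        """Return the codes recorded for one vocabulary, in table order."""
        return [c.code for c in self.concepts if c.vocabulary == vocabulary]


age_group = VariableMapping(
    term="Age Group",
    protocol_text="Child up to 15 years inclusive",
    concepts=[
        Concept("OMOP", "4305451", "Infant"),
        Concept("OMOP", "37016983", "Toddler"),
        Concept("OMOP", "4285883", "Child"),
        Concept("OMOP", "4305318", "Adolescent"),
        Concept("NCIt", "C27956", "Infant"),
        Concept("NCIt", "C89342", "Toddler"),
        Concept("NCIt", "C16423", "Child"),
        Concept("NCIt", "C27954", "Adolescent"),
    ],
)

print(age_group.codes_for("OMOP"))
print(age_group.codes_for("NCIt"))
```

Keeping the protocol wording next to the coded concepts makes it easier to review how a single free-text statement was expanded into several vocabulary terms.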

11.5.4.9. The metadata profile in action¶

The metadata profile created using the steps described above was used to create a metadata schema in tabular format, as shown in the following table:

Variable/record_id	Form Name	Section Header	Field Type	Field Label
record_id	C4C Study Metadata Collection		autofill	Record ID
study_id	C4C Study Metadata Collection	Study Information	short text	Study ID
study_id_1	C4C Study Metadata Collection	Study Information	short text	Add Another Study ID
study_id_text	C4C Study Metadata Collection	Study Information	text box	Add Additional Study IDs
study_title	C4C Study Metadata Collection	Study Information	text box	Study Title
study_acronym	C4C Study Metadata Collection	Study Information	short text	Study Acronym
disease	C4C Study Metadata Collection	Study Information	ontology field	Condition or Disease
therapeutic_area	C4C Study Metadata Collection	Study Information	ontology field	Therapeutic Area
indication	C4C Study Metadata Collection	Study Information	ontology field	Indication
study_type	C4C Study Metadata Collection	Study Information	dropdown	Study Type
country	C4C Study Metadata Collection	Study Information	multiple choice	Country
phase	C4C Study Metadata Collection	Study Information	dropdown	Phase of Trial
funder_type	C4C Study Metadata Collection	Study Information	dropdown	Funder Type
study_start	C4C Study Metadata Collection	Study Information	date field	Study Start
sample_size	C4C Study Metadata Collection	Study Information	short text	Estimated Sample Size
study_description	C4C Study Metadata Collection	Study Information	text box	Study Description
status_recruitment	C4C Study Metadata Collection	Study Information	dropdown	Status: Recruitment
study_documents	C4C Study Metadata Collection	Study Information	multiple choice	Study Documents Available
study_results	C4C Study Metadata Collection	Study Information	dropdown	Study Results
age	C4C Study Metadata Collection	Inclusion/Exclusion Criteria	short text	Age Range
age_group	C4C Study Metadata Collection	Inclusion/Exclusion Criteria	multiple choice	Age Group(s)
sex	C4C Study Metadata Collection	Inclusion/Exclusion Criteria	dropdown	Sex
race	C4C Study Metadata Collection	Inclusion/Exclusion Criteria	multiple choice	Race
ethnicity	C4C Study Metadata Collection	Inclusion/Exclusion Criteria	multiple choice	Ethnicity
inclusion_criteria	C4C Study Metadata Collection	Inclusion/Exclusion Criteria	text box	Additional Inclusion Criteria
exclusion_criteria	C4C Study Metadata Collection	Inclusion/Exclusion Criteria	text box	Additional Exclusion Criteria
outcome_measures	C4C Study Metadata Collection	Inclusion/Exclusion Criteria	text box	Outcome Measures
intervention_treatment	C4C Study Metadata Collection	Treatment Information	ontology field	Intervention/Treatment
orphan_designation	C4C Study Metadata Collection	Treatment Information	dropdown	IMP with orphan designation in the indication
biospecimens_retained	C4C Study Metadata Collection	Treatment Information	dropdown	Biospecimens Retained
biospecimens_text	C4C Study Metadata Collection	Treatment Information	text box	Type of Specimens Retained
product_class	C4C Study Metadata Collection	Treatment Information	ontology field	Product Class
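
A tabular schema like this can be checked mechanically before it is turned into a survey. The sketch below is an assumption-laden illustration, not part of the c4c workflow: the helpers `check_schema` and `by_section` are hypothetical, `ROWS` is a short excerpt of the table with the constant Form Name column omitted, and `FIELD_TYPES` is simply the set of field types that appear in the table.

```python
from collections import defaultdict

# Field types that occur in the schema table above.
FIELD_TYPES = {
    "autofill", "short text", "text box", "ontology field",
    "dropdown", "multiple choice", "date field",
}

# Excerpt of the schema: (variable, section header, field type, field label).
ROWS = [
    ("record_id", "", "autofill", "Record ID"),
    ("study_id", "Study Information", "short text", "Study ID"),
    ("disease", "Study Information", "ontology field", "Condition or Disease"),
    ("sex", "Inclusion/Exclusion Criteria", "dropdown", "Sex"),
    ("product_class", "Treatment Information", "ontology field", "Product Class"),
]


def check_schema(rows):
    """Return a list of problems: duplicate variables or unknown field types."""
    problems = []
    seen = set()
    for variable, _section, field_type, _label in rows:
        if variable in seen:
            problems.append(f"duplicate variable: {variable}")
        seen.add(variable)
        if field_type not in FIELD_TYPES:
            problems.append(f"unknown field type for {variable}: {field_type}")
    return problems


def by_section(rows):
    """Group variable names by section header, keeping table order."""
    sections = defaultdict(list)
    for variable, section, _field_type, _label in rows:
        sections[section or "(no section)"].append(variable)
    return dict(sections)


print(check_schema(ROWS))   # an empty list means no problems were found
print(by_section(ROWS))
```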

This schema was used to create a survey in REDCap to allow for more stringent review and testing. Building the survey led to changes to the schema that may not have become apparent without this additional step. For example, Race was removed from the survey because geographic variance made responses difficult to standardize, and text boxes were added for additional inclusion/exclusion criteria. The revised metadata schema is shown below.

Variable/record_id	Form Name	Section Header	Field Type	Field Label
record_id	C4C Study Metadata Collection		autofill	Record ID
	C4C Study Metadata Collection	Study Information	begin new section
study_id_ct.gov	C4C Study Metadata Collection	Study Information	short text	ClinicalTrials.gov ID
study_id_eudract	C4C Study Metadata Collection	Study Information	short text	EudraCT/CTIS ID
study_id_brand	C4C Study Metadata Collection	Study Information	short text	Study Brand Name ID (if applicable)
study_id_text	C4C Study Metadata Collection	Study Information	text box	Add Additional Study IDs
study_title	C4C Study Metadata Collection	Study Information	text box	Study Title
study_acronym	C4C Study Metadata Collection	Study Information	short text	Study Acronym
disease_snomed_1	C4C Study Metadata Collection	Study Information	ontology field	First Condition or Disease - SNOMED CT
disease_snomed_2	C4C Study Metadata Collection	Study Information	ontology field	Second Condition or Disease (if applicable) - SNOMED CT
disease_omim_1	C4C Study Metadata Collection	Study Information	ontology field	First Condition or Disease - OMIM
disease_omim_2	C4C Study Metadata Collection	Study Information	ontology field	Second Condition or Disease (if applicable) - OMIM
therapeutic_area	C4C Study Metadata Collection	Study Information	ontology field	Therapeutic Area
indication	C4C Study Metadata Collection	Study Information	text box	Indication
study_type	C4C Study Metadata Collection	Study Information	dropdown	Study Type
study_type_other	C4C Study Metadata Collection	Study Information	short text	Add Other Study Types
phase	C4C Study Metadata Collection	Study Information	multiple choice	Phase of Trial
phase_other	C4C Study Metadata Collection	Study Information	short text	Add Additional Trial Phases
funder_type	C4C Study Metadata Collection	Study Information	dropdown	Funder Type
funder_type_other	C4C Study Metadata Collection	Study Information	short text	Provide Information about ‘Other’ Funder Types
study_start	C4C Study Metadata Collection	Study Information	date field	Study Start Date
sample_size	C4C Study Metadata Collection	Study Information	short text	Estimated Sample Size
study_description	C4C Study Metadata Collection	Study Information	text box	Study Description
status_recruitment	C4C Study Metadata Collection	Study Information	dropdown	Status: Recruitment
study_documents	C4C Study Metadata Collection	Study Information	multiple choice	Study Documents Available
study_documents_other	C4C Study Metadata Collection	Study Information	short text	Add Additional Types of Study Documents
study_results	C4C Study Metadata Collection	Study Information	dropdown	Study Results
study_continents	C4C Study Metadata Collection	Study Information	multiple choice	Please Select Study Site Locations
european_sites	C4C Study Metadata Collection	Study Information	multiple choice	Please Select European Study Site Locations
n_american_sites	C4C Study Metadata Collection	Study Information	multiple choice	Please Select North American Study Site Locations
	C4C Study Metadata Collection	Inclusion/Exclusion Criteria	begin new section
age	C4C Study Metadata Collection	Inclusion/Exclusion Criteria	short text	Age Range
age_group	C4C Study Metadata Collection	Inclusion/Exclusion Criteria	multiple choice	Age Group(s)
sex	C4C Study Metadata Collection	Inclusion/Exclusion Criteria	dropdown	Sex
ethnicity	C4C Study Metadata Collection	Inclusion/Exclusion Criteria	multiple choice	Ethnicity
inclusion_criteria	C4C Study Metadata Collection	Inclusion/Exclusion Criteria	text box	Additional Inclusion Criteria
exclusion_criteria	C4C Study Metadata Collection	Inclusion/Exclusion Criteria	text box	Additional Exclusion Criteria
outcome_measures	C4C Study Metadata Collection	Inclusion/Exclusion Criteria	text box	Outcome Measures
	C4C Study Metadata Collection	Treatment Information	begin new section
intervention_treatment	C4C Study Metadata Collection	Treatment Information	ontology field	First Intervention/Treatment
product_class	C4C Study Metadata Collection	Treatment Information	ontology field	Product Class - First Intervention/Treatment
intervention_treatment_2	C4C Study Metadata Collection	Treatment Information	ontology field	Second Intervention/Treatment
product_class_2	C4C Study Metadata Collection	Treatment Information	ontology field	Product Class - Second Intervention/Treatment
orphan_designation	C4C Study Metadata Collection	Treatment Information	dropdown	IMP with orphan designation in the indication
biospecimens_retained	C4C Study Metadata Collection	Treatment Information	dropdown	Biospecimens Retained
biospecimens_text	C4C Study Metadata Collection	Treatment Information	text box	Type of Biospecimens Retained
	C4C Study Metadata Collection	Comments	begin new section
comments	C4C Study Metadata Collection	Comments	text box	Comments
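
The differences between the two schema versions can be listed with a simple comparison. The sketch below uses a hypothetical helper, `diff_schema`, and only a small excerpt of each table (variable name mapped to field type), so its output covers those excerpted rows only.

```python
# Excerpt of the initial schema: variable -> field type.
INITIAL = {
    "study_id": "short text",
    "disease": "ontology field",
    "indication": "ontology field",
    "phase": "dropdown",
    "race": "multiple choice",
    "ethnicity": "multiple choice",
}

# Excerpt of the revised schema: variable -> field type.
REVISED = {
    "study_id_ct.gov": "short text",
    "disease_snomed_1": "ontology field",
    "indication": "text box",
    "phase": "multiple choice",
    "ethnicity": "multiple choice",
    "comments": "text box",
}


def diff_schema(old, new):
    """Return (removed variables, added variables, changed field types)."""
    removed = sorted(old.keys() - new.keys())
    added = sorted(new.keys() - old.keys())
    retyped = {
        name: (old[name], new[name])
        for name in sorted(old.keys() & new.keys())
        if old[name] != new[name]
    }
    return removed, added, retyped


removed, added, retyped = diff_schema(INITIAL, REVISED)
print("removed:", removed)
print("added:", added)
print("field type changed:", retyped)
```

Recording such a diff alongside each revision gives reviewers a compact change log, for instance showing that `race` was dropped and that `indication` changed from an ontology field to a text box.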

The REDCap survey will be sent to studies within the c4c consortium for additional testing. A representative of each study will be asked to complete the survey with metadata from their study and to provide feedback. This feedback will be used to further refine the list of metadata items collected. A Shapes Constraint Language (SHACL) representation of the final metadata schema will be used to create a FAIR Data Point for c4c studies. A FAIR Data Point is a REST API and web client for creating, storing, and serving metadata in compliance with the FAIR principles through the use of standardised exchange formats. This will allow researchers to find sources of paediatric data from clinical trials.
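
To give a rough idea of what a SHACL representation of the schema could look like, the sketch below generates a Turtle node shape from a few schema rows. It is not the c4c shape: the namespace `https://example.org/c4c/`, the names `ex:StudyShape` and `ex:Study`, the `to_shacl` helper, and the mapping from field types to XSD datatypes are all assumptions made for illustration. Dropdown and multiple choice fields would normally be constrained with `sh:in`, which is left out here because the option lists are not given in this recipe.

```python
# Assumed mapping from survey field types to XSD datatypes.
DATATYPES = {
    "short text": "xsd:string",
    "text box": "xsd:string",
    "date field": "xsd:date",
}


def to_shacl(rows, base="https://example.org/c4c/"):
    """Build a Turtle string with one sh:property per (variable, type, label).

    Variable names must be valid Turtle local names and labels must not
    contain double quotes; this sketch does no escaping.
    """
    lines = [
        "@prefix sh: <http://www.w3.org/ns/shacl#> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        f"@prefix ex: <{base}> .",
        "",
        "ex:StudyShape a sh:NodeShape ;",
        "    sh:targetClass ex:Study ;",
    ]
    props = []
    for variable, field_type, label in rows:
        parts = [f"sh:path ex:{variable}", f'sh:name "{label}"']
        if field_type in DATATYPES:
            parts.append(f"sh:datatype {DATATYPES[field_type]}")
        props.append("    sh:property [ " + " ; ".join(parts) + " ]")
    lines.append(" ;\n".join(props) + " .")
    return "\n".join(lines)


# Three rows taken from the revised schema: (variable, field type, label).
shape = to_shacl([
    ("study_title", "text box", "Study Title"),
    ("study_start", "date field", "Study Start Date"),
    ("sex", "dropdown", "Sex"),
])
print(shape)
```

Generating the shape from the same table that drives the survey keeps the two representations in step as the schema is refined.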

11.5.4.10. Conclusion¶

Paediatric data is often scarce, which contributes to the slow development of knowledge and treatments. Any activity that can improve the Findability (and potential Reusability) of the data is therefore valuable. Other researchers could benefit from this recipe by applying it to other sources or types of (meta)data to improve Findability.

The REDCap survey will be sent to c4c partners to allow for further testing of the (meta)data schema. The test results will be used to develop a FAIR Data Point for c4c studies.

11.5.4.11. Authors¶

Authors

Name	Affiliation	Contribution
Avril Palmeri	Newcastle University	Writing - Original Draft
Becca Leary	Newcastle University	Writing - Original Draft
Anando Sen	Newcastle University	Writing - Original Draft
Ronald Cornet	Amsterdam UMC	Writing - Original Draft
Danielle Welter	University of Luxembourg	Writing - Review & Editing
Philippe Rocca-Serra	University of Oxford	Writing - Review & Editing