Creating a basic ISA descriptor: Dataset Maturity Level 1
Abstract:
In this recipe, we show how to programmatically create minimal metadata for a single-study ISA descriptor. The descriptor is minimal for the following reasons:
no community annotation requirements are met
very limited use of ontologies to refine annotations
serialization to ISA-Tab format only
The recipe then shows how to write (serialize) the ISA objects generated with the ISA-API to ISA-Tab format.
support: isatools@googlegroups.com
issue tracker: https://github.com/ISA-tools/isa-api/issues
If executing the notebooks on Google Colab, uncomment the following command and run it to install the required Python libraries. Also make sure the test datasets are available.
# !pip install -r requirements.txt
Let’s get ready and import all the necessary components
from isatools.model import (
    Investigation, Study, Protocol, Publication, Person, Source, Sample,
    OntologySource, OntologyAnnotation, Process, Assay, Material, DataFile,
    Characteristic, plink, batch_create_materials,
)
1. Declaring key ISA objects: Investigation, Study, Protocols, Ontologies, Contacts metadata
Creating the Investigation object:
investigation = Investigation()
investigation.identifier = "i1"
investigation.title = "My Simple ISA Investigation"
investigation.description = "We could alternatively use the class constructor's parameters to set some default " \
"values at the time of creation, however we want to demonstrate how to use the " \
"object's instance variables to set values."
investigation.submission_date = "2022-11-03"
investigation.public_release_date = "2022-11-03"
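The two date fields above are plain strings. As a minimal sketch, assuming you want to catch malformed dates before assigning them and that ISO 8601 calendar dates (as in the values above) are the intended format, the standard library can do the check. The helper name `check_iso_date` is hypothetical and is not part of the ISA-API.

```python
from datetime import date


def check_iso_date(value: str) -> str:
    """Return value unchanged if it parses as an ISO 8601 date, else raise ValueError."""
    date.fromisoformat(value)  # raises ValueError on malformed input
    return value


# The checked string can then be assigned to a date field.
submission_date = check_iso_date("2022-11-03")
print(submission_date)
```

A value such as "03/11/2022" would raise a ValueError instead of being stored silently.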
Next, we create a Study object and set some values.
The ISA Study object must have a filename.
We must also add the study to the Investigation by appending it to the investigation object's list of studies.
study = Study(filename="s_study.txt")
study.identifier = "s1"
study.title = "An exemplar ISA Study"
study.description = "Like with the Investigation, we could use the class constructor to set some default values, " \
"but have chosen to demonstrate in this example the use of instance variables to set initial " \
"values."
study.submission_date = "2022-11-03"
study.public_release_date = "2022-11-03"
investigation.studies.append(study)
1.1. Declaring and using Ontology Resources
Some instance variables are typed with different objects and lists of objects. For example, a Study can have a list of design descriptors. A design descriptor is an ISA Ontology Annotation object describing the kind of study at hand.
Ontology Annotations should typically reference an Ontology Source. Below, the OntologySource object ncbitaxon is declared with the class constructor. To ensure the OntologySource is encapsulated in the descriptor, it is added to the list of ontology_source_references in the Investigation object. In keeping with the minimal nature of this recipe, the OntologyAnnotation object intervention_design is declared with a term only and no term_source. It is then added to the list of design_descriptors held by the Study object. The ncbitaxon source is used later as the term_source of the organism annotation.
ncbitaxon = OntologySource(name='NCBITaxon', description="NCBI Taxonomy")
# remember to add the newly declared ontology source to the parent investigation
investigation.ontology_source_references.append(ncbitaxon)
intervention_design = OntologyAnnotation(term="intervention design")
study.design_descriptors.append(intervention_design)
1.2. Declaring Contacts and Publications
Other attributes of both Investigation and Study objects include ‘contacts’ and ‘publications’, each holding a list of the corresponding Person or Publication objects.
contact = Person(first_name="Alice",
                 last_name="Robertson",
                 affiliation="University of Life",
                 roles=[OntologyAnnotation(term='submitter')])
study.contacts.append(contact)
publication = Publication(title="Experiments with Elephants",
                          author_list="A. Robertson, B. Robertson",
                          pubmed_id="12345678",
                          status=OntologyAnnotation(term="published"))
study.publications.append(publication)
2. Building the ISA biomaterial creation graph
To create the ISA study graph that captures the biological materials used as study subjects, and which corresponds to
the contents of the study table file (the s_*.txt file), we need to create a process sequence.
To do this we use the Process class and attach it to the Study object’s process_sequence list instance variable.
Each process must be linked with a Protocol object that is attached to a Study object’s ‘protocols’ list instance variable. The sample collection Process object usually has a Source material as input and a Sample material as output.
Here, we create one Source material object and attach it to our study.
source = Source(name='source_material')
study.sources.append(source)
Then, we create three Sample objects, with Homo sapiens as the organism, and attach them to the study.
We use the utility function batch_create_materials() to clone a prototype material object.
The function automatically appends an index to the material name.
In this case, three samples will be created, with the names ‘sample_material-0’, ‘sample_material-1’ and ‘sample_material-2’.
prototype_sample = Sample(name='sample_material', derives_from=[source])
characteristic_organism = Characteristic(
    category=OntologyAnnotation(term="Organism"),
    value=OntologyAnnotation(term="Homo sapiens",
                             term_source=ncbitaxon,
                             term_accession="http://purl.bioontology.org/ontology/NCBITAXON/9606"))
prototype_sample.characteristics.append(characteristic_organism)
study.samples = batch_create_materials(prototype_sample, n=3) # creates a batch of 3 samples
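The naming behaviour described above can be illustrated with plain Python. This is only a sketch of the idea (deep-copy a prototype and suffix each copy's name with its index), not the ISA-API implementation; `StandInMaterial` and `clone_batch` are hypothetical names introduced for the illustration.

```python
from copy import deepcopy
from dataclasses import dataclass, field


@dataclass
class StandInMaterial:
    """Hypothetical stand-in for an ISA material; not the isatools class."""
    name: str
    characteristics: list = field(default_factory=list)


def clone_batch(prototype, n):
    """Deep-copy the prototype n times, suffixing each copy's name with -<index>."""
    batch = []
    for index in range(n):
        clone = deepcopy(prototype)
        clone.name = f"{prototype.name}-{index}"
        batch.append(clone)
    return batch


prototype = StandInMaterial(name="sample_material", characteristics=["Organism"])
samples = clone_batch(prototype, n=3)
print([s.name for s in samples])
```

Because each copy is deep, editing one clone's characteristics in this sketch does not affect the others or the prototype.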
Now, we create a single Protocol object that represents our sample collection protocol, and attach it to the study object.
Protocols must be declared before Processes, because every processing event executes some defined protocol and each Process object needs an existing Protocol to be linked to.
sample_collection_protocol = Protocol(name="sample collection",
                                      protocol_type=OntologyAnnotation(term="sample collection"))
study.protocols.append(sample_collection_protocol)
sample_collection_process = Process(executes_protocol=sample_collection_protocol)
Next, we link our materials to the Process. In this particular case, we are describing a sample collection process that takes one source material, and produces three different samples.
(source_material)->(sample collection)->[(sample_material-0), (sample_material-1), (sample_material-2)]
for src in study.sources:
    sample_collection_process.inputs.append(src)
for sam in study.samples:
    sample_collection_process.outputs.append(sam)
Finally, attach the finished Process object to the study process_sequence. This can be done many times to describe multiple sample collection events.
study.process_sequence.append(sample_collection_process)
# IMPORTANT: remember to register every Characteristic category used in the study object, as follows:
study.characteristic_categories.append(characteristic_organism.category)
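The one-to-many shape of the sample collection step can be sketched with plain Python objects. This is a stand-in illustration under stated assumptions, not ISA-API code: `StandInProcess` and `render` are hypothetical names, and materials are represented by their names only.

```python
from dataclasses import dataclass, field


@dataclass
class StandInProcess:
    """Hypothetical stand-in for a process node; not the isatools Process class."""
    protocol_name: str
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)


def render(process):
    """Format a process as (inputs)->(protocol)->[outputs] using material names."""
    ins = ", ".join(f"({name})" for name in process.inputs)
    outs = ", ".join(f"({name})" for name in process.outputs)
    return f"{ins}->({process.protocol_name})->[{outs}]"


collection = StandInProcess("sample collection")
collection.inputs.append("source_material")                           # one source in
collection.outputs.extend(f"sample_material-{i}" for i in range(3))   # three samples out
print(render(collection))
```

The printed line uses the same notation as the graph shown earlier for the sample collection process.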
3. Creating an ISA Assay with all associated objects
Next, we build an Assay object and attach two protocols, extraction and sequencing.
assay = Assay(filename="a_assay.txt")
extraction_protocol = Protocol(name='extraction', protocol_type=OntologyAnnotation(term="material extraction"))
study.protocols.append(extraction_protocol)
sequencing_protocol = Protocol(name='sequencing', protocol_type=OntologyAnnotation(term="nucleic acid sequencing"))
study.protocols.append(sequencing_protocol)
To build out assay graphs, we enumerate the samples from the study-level, and for each sample, we create an extraction process and a sequencing process.
The extraction process takes as input a sample material, and produces an extract material.
The sequencing process takes the extract material and produces a data file. This will produce three graphs, from sample material through to data, as follows:
(sample_material-0)->(extraction)->(extract-0)->(sequencing)->(sequenced-data-0)
(sample_material-1)->(extraction)->(extract-1)->(sequencing)->(sequenced-data-1)
(sample_material-2)->(extraction)->(extract-2)->(sequencing)->(sequenced-data-2)
Note: each graph has its own extraction and sequencing process instances, so the three graphs are NOT interconnected.
for i, sample in enumerate(study.samples):
    # create an extraction process that executes the extraction protocol
    extraction_process = Process(executes_protocol=extraction_protocol)
    # extraction process takes as input a sample, and produces an extract material as output
    extraction_process.inputs.append(sample)
    material = Material(name="extract-{}".format(i))
    material.type = "Extract Name"
    extraction_process.outputs.append(material)
    # create a sequencing process that executes the sequencing protocol
    sequencing_process = Process(executes_protocol=sequencing_protocol)
    sequencing_process.name = "assay-name-{}".format(i)
    sequencing_process.inputs.append(extraction_process.outputs[0])
    # Sequencing process usually has an output data file
    datafile = DataFile(filename="sequenced-data-{}".format(i),
                        label="Raw Data File")
    sequencing_process.outputs.append(datafile)
    # Ensure Processes are linked forward and backward. plink(from_process, to_process) is a function to set
    # these links for you. It is found in the isatools.model package
    plink(extraction_process, sequencing_process)
    # make sure the extract, data file, and the processes are attached to the assay
    assay.data_files.append(datafile)
    assay.samples.append(sample)
    assay.other_material.append(material)
    assay.process_sequence.append(extraction_process)
    assay.process_sequence.append(sequencing_process)
assay.measurement_type = OntologyAnnotation(term="gene sequencing")
assay.technology_type = OntologyAnnotation(term="nucleotide sequencing")
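To make "linked forward and backward" concrete, here is a minimal stand-in sketch of two-way linking between consecutive processes. It does not use the ISA-API: `StandInStep` and `link` are hypothetical, and the attribute names `prev_process` and `next_process` are assumptions made for this illustration.

```python
class StandInStep:
    """Hypothetical stand-in for a process node; not the isatools Process class."""

    def __init__(self, name):
        self.name = name
        self.prev_process = None
        self.next_process = None


def link(from_process, to_process):
    """Set the forward and backward references between two processes."""
    from_process.next_process = to_process
    to_process.prev_process = from_process


chains = []
for i in range(3):
    extraction = StandInStep(f"extraction-{i}")
    sequencing = StandInStep(f"sequencing-{i}")
    link(extraction, sequencing)
    chains.append((extraction, sequencing))

# Each chain is self-contained: no process points into another chain.
for extraction, sequencing in chains:
    assert extraction.next_process is sequencing
    assert sequencing.prev_process is extraction
    assert extraction.prev_process is None and sequencing.next_process is None
```

Creating fresh process instances inside the loop is what keeps the three chains independent; reusing one instance across iterations would merge them.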
Finally, we attach the ISA assay object to the ISA study object.
study.assays.append(assay)
4. Writing our objects as an ISA-Tab document
To do this, we simply use the isatab.dumps() function, as follows:
from isatools.isatab import dumps
print(dumps(investigation))
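The filenames set earlier in this recipe (s_study.txt and a_assay.txt) follow the ISA-Tab naming convention. As a hedged sketch, a small standard-library check can flag filenames that stray from it. The s_*.txt pattern is mentioned in this recipe; the i_*.txt and a_*.txt patterns are taken from the ISA-Tab convention rather than from this recipe, and the helper `follows_convention` is hypothetical.

```python
from fnmatch import fnmatchcase

# ISA-Tab file-name patterns, keyed by the kind of file.
PATTERNS = {
    "investigation": "i_*.txt",
    "study": "s_*.txt",
    "assay": "a_*.txt",
}


def follows_convention(kind, filename):
    """Return True if filename matches the pattern expected for this kind of file."""
    return fnmatchcase(filename, PATTERNS[kind])


for kind, filename in [("study", "s_study.txt"), ("assay", "a_assay.txt")]:
    print(kind, filename, follows_convention(kind, filename))
```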
Et voilà!
Authors
Name | ORCID | Affiliation | Type | ELIXIR Node | Contribution |
---|---|---|---|---|---|
 | | University of Oxford | | | Writing - Original Draft |