1. Identifier resolution services
1.1. Main Objective
Linked Data relies on URLs and the HTTP protocol to ensure linking.
Cool URIs don’t change.
1.3. Identifier Resolution - Enabling persistence through indirection
This relates to the following FAIR principle mentioned in the introduction:
A1. (Meta)data are retrievable by their identifier using a standardised communications protocol.
URI resolution is fundamentally about directing requests to the relevant identified entity.
The standard approach is to resolve the URI with an HTTP GET request, using content negotiation to choose between different representations of the resource.
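As a minimal sketch of what content negotiation looks like from the client side, the snippet below prepares two GET requests for the same identifier, each asking for a different representation through the `Accept` header. The helper name `build_negotiated_request` and the chosen media types are assumptions made for illustration. The requests are only built, not sent, and whether a given server honours the `Accept` header depends on the service.

```python
from urllib.request import Request


def build_negotiated_request(uri: str, media_type: str) -> Request:
    """Prepare an HTTP GET request asking for one representation of `uri`."""
    # The Accept header states which representation the client prefers.
    return Request(uri, headers={"Accept": media_type}, method="GET")


# Same identifier, two different representations requested (illustrative media types).
uri = "https://www.doi.org/10.1038/sdata.2016.18"
html_request = build_negotiated_request(uri, "text/html")
rdf_request = build_negotiated_request(uri, "application/rdf+xml")

print(html_request.get_method(), html_request.full_url, html_request.get_header("Accept"))
print(rdf_request.get_method(), rdf_request.full_url, rdf_request.get_header("Accept"))
```

Passing either request to `urllib.request.urlopen` would send it and follow any redirects the server returns.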
A PURL is a persistent URL, meaning that it provides a permanent address to access a resource on the web.
To understand the notion of PURL, one first needs to be familiar with the notion of URL indirection (also known as URL redirection or URL forwarding), which refers to the practice of providing a stable, fixed web address (URL) and setting it up so that it points to other content, which may be periodically modified.
When a user retrieves a PURL, they are redirected to the current location of the resource.
When an author needs to move a page, they can update the PURL to point to the new location.
Indirection comes in handy because it keeps the URL of a resource invariant even when the resource is known to change, for instance owing to new versions or a change in ownership.
We can see this practice in action in the use of PURLs to identify OBO Foundry resources. For instance, the URL http://purl.obolibrary.org/obo/stato.owl redirects to the latest release of the file, which is https://raw.githubusercontent.com/ISA-tools/stato/dev/releases/latest_release/stato.owl.
PURLs with a common prefix are grouped together into domains. Each domain has a single maintainer who can add new PURLs to the domain and make changes to existing PURLs within the domain.
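The indirection and the domain model described above can be sketched as a small in-memory lookup table. This is an illustration under stated assumptions, not the PURL service's implementation: the class `PurlDomain`, the maintainer name `obo-admin`, and the choice of a 302 status code are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class PurlDomain:
    """A group of PURLs sharing a common prefix, managed by one maintainer."""

    prefix: str
    maintainer: str
    targets: Dict[str, str] = field(default_factory=dict)

    def set_target(self, user: str, path: str, target: str) -> None:
        """Add a PURL to the domain, or repoint an existing one."""
        # Only the domain maintainer may add or change PURLs in this domain.
        if user != self.maintainer:
            raise PermissionError(f"{user} does not maintain {self.prefix}")
        self.targets[path] = target

    def resolve(self, purl: str) -> Tuple[int, str]:
        """Return an (HTTP status, Location) pair for a PURL in this domain."""
        if not purl.startswith(self.prefix):
            raise ValueError(f"{purl} is not in domain {self.prefix}")
        path = purl[len(self.prefix):]
        if path not in self.targets:
            raise KeyError(path)
        # 302 is an assumed redirect code; a real service may use another 3xx code.
        return 302, self.targets[path]


obo = PurlDomain(prefix="http://purl.obolibrary.org/obo/", maintainer="obo-admin")
obo.set_target(
    "obo-admin",
    "stato.owl",
    "https://raw.githubusercontent.com/ISA-tools/stato/dev/releases/latest_release/stato.owl",
)
status, location = obo.resolve("http://purl.obolibrary.org/obo/stato.owl")
print(status, location)
```

The address that users cite never changes. When the file moves, the maintainer calls `set_target` again and the same PURL starts redirecting to the new location.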
FAIR Principle A1 states that:
(meta)data should be retrievable by its identifier.
When the identifier is not a resolvable URL, Identifier Resolution Services are required that know how to map an IRI to a location for the data.
1.3.1. Introducing CURIEs or Compact URIs
CURIEs (short for compact URIs) are defined by a World Wide Web Consortium Working Group Note CURIE Syntax 1.0, and provide a human readable shortening of IRIs.
A CURIE consists of a namespace prefix followed by a local identifier, separated by a colon (prefix:identifier).
Some widely used identifier schemes, such as DOIs and ISBNs, have well-defined CURIEs. For example, the DOI [doi:10.1038/sdata.2016.18] refers to the FAIR Principles paper. The Digital Object Identifier System website (https://www.doi.org/) provides a resolution service for DOIs. The service is available as a web form on the site, or can be used by appending a DOI to the website URL. The client is then redirected to the URL where the resource is located. For example, for the FAIR Data Principles paper we can use the URL https://www.doi.org/10.1038/sdata.2016.18 to resolve the paper’s DOI, which takes the client to the page at https://www.nature.com/articles/sdata201618.
Namespaces can be defined by convention, as is the case with doi, and registered with services to allow for the resolution of CURIEs (see Identifier Resolution Services below). These registrations are used extensively to map CURIEs to URLs that can be resolved.
Going back to our Life Science context, we can use the CURIE [uniprot:P38398] to refer to the UniProt record for the BRCA1 protein.
This is very useful for including unambiguous, global identifiers in scientific articles.
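To make the mapping from CURIE to URL concrete, here is a minimal sketch that splits a CURIE into its prefix and local part and expands it using a prefix map. The function names and the contents of `PREFIX_MAP` are assumptions for illustration. The two URL prefixes are taken from the resolution examples in this recipe, not from an authoritative registry.

```python
from typing import Dict, Tuple

# Illustrative prefix map: namespace prefix -> URL prefix used for expansion.
PREFIX_MAP: Dict[str, str] = {
    "doi": "https://www.doi.org/",
    "uniprot": "https://www.uniprot.org/uniprot/",
}


def parse_curie(curie: str) -> Tuple[str, str]:
    """Split a CURIE such as 'doi:10.1038/sdata.2016.18' into (prefix, reference)."""
    text = curie.strip()
    # CURIEs are sometimes written in square brackets; strip them if present.
    if text.startswith("[") and text.endswith("]"):
        text = text[1:-1]
    # Split on the first colon only, since the reference may itself contain colons.
    prefix, separator, reference = text.partition(":")
    if not separator or not prefix or not reference:
        raise ValueError(f"not a CURIE: {curie!r}")
    return prefix, reference


def expand_curie(curie: str, prefix_map: Dict[str, str] = PREFIX_MAP) -> str:
    """Expand a CURIE into a full URL using the prefix map."""
    prefix, reference = parse_curie(curie)
    if prefix not in prefix_map:
        raise KeyError(f"unknown prefix: {prefix}")
    return prefix_map[prefix] + reference


print(expand_curie("[doi:10.1038/sdata.2016.18]"))
print(expand_curie("[uniprot:P38398]"))
```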
1.3.2. Identifier Resolution services
The PURL system is a service of the Internet Archive, which provides an interface to administer domains. For more information about the service, visit https://archive.org/services/purl/help
w3id.org: Permanent Identifiers for the Web. Secure, permanent URLs for your Web application that will stand the test of time.
Authority registration service: send a request to add a redirect to the email@example.com mailing list. Make sure to include the URL that you want on w3id.org, the URL that you want to redirect to, and the HTTP code that you want to use when redirecting. An administrator will then create the redirect for you.
Identifiers.org is a Resolution Service, hosted by the EBI, that provides consistent access to life science data using Compact Uniform Resource Identifiers. It can be used both as a web form and through a URL pattern [2].
Compact Identifiers consist of a namespace prefix and a local, provider-designated accession number (prefix:accession). The resolving location of Compact Identifiers is determined using information that is stored in the Identifiers.org Registry. Datasets can register their namespace prefix together with their identifier pattern. The service can then be used in the same way as the DOI resolution service. So for the UniProt page about BRCA1, we can resolve the CURIE [uniprot:P38398] using Identifiers.org. This means that the URL https://identifiers.org/uniprot:P38398 resolves to the UniProt page https://www.uniprot.org/uniprot/P38398.
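The role of the registered prefix and identifier pattern can be sketched as follows. The `REGISTRY` dictionary is a hypothetical, one-entry stand-in for the Identifiers.org Registry, and its regular expression is a simplified assumption, not the pattern actually registered for UniProt. The function names are likewise illustrative.

```python
import re
from typing import Dict, Tuple

# Hypothetical registry excerpt: prefix -> (identifier pattern, provider URL template).
# The pattern is deliberately simplified for illustration.
REGISTRY: Dict[str, Tuple[str, str]] = {
    "uniprot": (
        r"[A-Z][0-9][A-Z0-9]{3}[0-9]",
        "https://www.uniprot.org/uniprot/{accession}",
    ),
}


def resolver_url(compact_id: str) -> str:
    """Build the Identifiers.org URL for a compact identifier."""
    return f"https://identifiers.org/{compact_id}"


def resolve_compact_id(compact_id: str, registry: Dict[str, Tuple[str, str]] = REGISTRY) -> str:
    """Map prefix:accession to the provider URL, validating the accession first."""
    prefix, _, accession = compact_id.partition(":")
    if prefix not in registry:
        raise KeyError(f"unregistered prefix: {prefix}")
    pattern, template = registry[prefix]
    # Reject accessions that do not match the pattern registered for the prefix.
    if re.fullmatch(pattern, accession) is None:
        raise ValueError(f"{accession!r} does not match the pattern for {prefix}")
    return template.format(accession=accession)


print(resolver_url("uniprot:P38398"))
print(resolve_compact_id("uniprot:P38398"))
```

Validating against the registered pattern lets a resolver reject malformed accessions before redirecting the client to the provider.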
Name-to-Thing (N2T) is a Resolution Service, maintained at the California Digital Library (CDL) within the University of California (UC) Office of the President. CDL supports electronic library services for ten UC campuses and affiliated law schools, medical centers, and national laboratories, as well as hundreds of museums, herbaria, botanical gardens, etc. Similar to URL shorteners like bit.ly, N2T serves content indirectly. N2T can store more than one “target” (forwarding link) for an identifier, as well as any kind or amount of metadata (descriptive information). N2T.net is also a “meta-resolver”: in collaboration with Identifiers.org, it recognizes over 600 well-known identifier types and knows where their respective servers are. When it fails to find forwarding information for a specific individual identifier, it uses the identifier’s type to look for an overall target rule.
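The two-step lookup of a meta-resolver can be sketched as below: first look for targets stored for the individual identifier, then fall back to a rule for the identifier's type. Both tables are hypothetical stand-ins, including the example ARK and its target, and the function `meta_resolve` is an assumption for illustration, not N2T's actual logic or API.

```python
from typing import Dict, List

# Hypothetical per-identifier records: an identifier may have more than one target.
INDIVIDUAL_TARGETS: Dict[str, List[str]] = {
    "ark:/99999/fk4example": ["https://example.org/objects/fk4example"],
}

# Hypothetical type-level rules: identifier type -> URL template of its server.
TYPE_RULES: Dict[str, str] = {
    "uniprot": "https://www.uniprot.org/uniprot/{id}",
    "chebi": "https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:{id}",
}


def meta_resolve(identifier: str) -> str:
    """Resolve via an individual record if one exists, else via the type rule."""
    targets = INDIVIDUAL_TARGETS.get(identifier)
    if targets:
        # Use the first stored target; a real service could choose among several.
        return targets[0]
    # No individual record: fall back to the overall rule for the identifier type.
    id_type, separator, local_id = identifier.partition(":")
    if separator and id_type in TYPE_RULES:
        return TYPE_RULES[id_type].format(id=local_id)
    raise LookupError(f"no target or type rule for {identifier!r}")


print(meta_resolve("ark:/99999/fk4example"))
print(meta_resolve("chebi:138488"))
```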
The Bioregistry is a Resolution Service developed in a GitHub repository [1]. Like Identifiers.org, it has a registry, but it is also a registry of registries: it imports data from Identifiers.org, Name-to-Thing, and 20+ other registries, and its scope extends beyond identifiers for things to cover, for example, ontologies. As a community effort, new namespace prefixes and their identifier patterns can be registered via GitHub issues. Compact identifiers are supported, and the URL https://bioregistry.io/chebi:138488 resolves to the ChEBI page https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:138488. Bioregistry also provides an API to query the registry itself.
1.3.3. PURL stands for Persistent URL
As defined in https://archive.org/services/purl/help, PURLs are persistent URLs that provide a permanent HTTP address to access a resource on the web.
The PURL service is administered by the Internet Archive. Users can request domains from the service, under which they can administer and mint persistent URLs.
[1] Charles Tapley Hoyt, Meghan Balk, Tiffany J Callahan, Daniel Domingo-Fernández, Melissa A Haendel, Harshad B Hegde, Daniel S Himmelstein, Klas Karis, John Kunze, Tiago Lubiana, Nicolas Matentzoglu, Julie McMurry, Sierra Moxon, Christopher J Mungall, Adriano Rutz, Deepak R Unni, Egon Willighagen, Donald Winston, and Benjamin M Gyori. Unifying the identification of biomedical entities with the Bioregistry. Sci. Data, 9(1):714, 2022. URL: https://doi.org/10.1038/s41597-022-01807-3, doi:10.1038/s41597-022-01807-3.
[2] N. Juty, N. Le Novère, and C. Laibe. Identifiers.org and MIRIAM Registry: community resources to provide persistent identification. Nucleic Acids Res, 40(Database issue):D580–586, Jan 2012.
1.5.1. What to read next?
FAIRsharing records appearing in this recipe:
- Bioregistry (bioregistry)
- Chemical Entities of Biological Interest (ChEBI)
- Compact URI (CURIE)
- Digital Object Identifier (DOI)
- Identifiers.org Central Registry
- OBO Foundry (OBO)
- Persistent Uniform Resource Locator (PURL)
- The FAIR Principles (FAIR)
- UniProt Knowledgebase (UniProtKB)
- Uniform Resource Locator (URL)
- w3id.org (w3id)