2. Creating resolvable identifiers¶
2.1. Requirements¶
Technical requirements:
Hypertext Transfer Protocol (HTTP)
Uniform Resource Locator (URL)
Domain Name System (DNS)
Knowledge requirement:
understanding of identifiers.
2.2. Abstract¶
With this recipe, we will reinforce the notion and importance of Globally Unique, Persistent, Resolvable Identifier (GUPRI) by introducing several public services for minting such identifiers. For each service, we will highlight the strengths and weaknesses.
2.3. Background information¶
In the ideal linked data world, best practice is to provide a URL which can be accessed via HTTP, more specifically the HTTP GET command.
(Learn more on HTTP GET here: https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol#Request_methods)
To make this work, you have to guarantee that the HTTP GET command is answered.
2.3.1. How can you do that?¶
HTTP uses Domain Name System (DNS) to decide which computer (i.e. which IP address) to contact and ask for content regarding the URL you want to resolve.
DNS uses the concept of hierarchical domains to resolve URLs.
(Learn more about subdomains here: https://en.wikipedia.org/wiki/Subdomain)
For example, you might want to resolve the following URL:
http://en.wikipedia.org/wiki/Enzyme_Commission_number
This will make your computer first look up who is responsible for managing “org” (the top-level domain), and ask there who is responsible for managing the domain “wikipedia.org”, and finally who is responsible for managing “en.wikipedia.org”. All responses might yield different computer systems, i.e. IP addresses.
This principle can thus be used to delegate authority. In an enterprise context, this might be used to delegate authority within hierarchy:
http://my-department.my-enterprise.com/my-resource
can be understood in the following sense: “my-enterprise” allowed “my-department” to “mint”, i.e. generate / provide / resolve, URLs over the internet.
If even more granularity / hierarchy is desired, an URL may also look like this:
http://my-team.my-department.my-enterprise.com/my-resource
This can also be understood in the context of accountability / responsibility:
“my-team” is responsible for ensuring that “http://my-team.my-department.my-enterprise.com/my-resource” is accessible and resolvable via the internet.
As all of this resolution practice is organized by DNS, this allows system administrators to hook in and generate inaccessible subnets
.
Probably you know this phenomenon in the form of an “intranet”, which is only available via VPN if you’re at home, or only available on-site. Typical domain names look like this, then:
http://internal-system.intranet.cnb
Here, your computer is configured in a specific way to resolve “intranet.cnb”, which is not accessible from the public internet, to specialized computer systems.
2.4. How to realize a GUPRI?¶
A Globally Unique, Persistent, Resolvable Identifier (GUPRI) can be realized by implementing all the above. Some consider a GUPRI to be equivalent with PID (persistent identifier) or PURL (persistent URL). If intended for public use, your GUPRI must be accessible from the public internet; if intended for closed use, you would probably use specialized DNS rules and domain names to exclude the public from resolving your identifier.
2.5. Public services for realizing a GUPRI¶
Most likely, you don’t want to set up a specialized HTTP web server at home. This would in principle be an option, but introduces enormous debt on your side: you have to guarantee it is running all the time, and secure it against malicious intents (finally, you have to pay the bill for electricity and internet access as well). You have to rent a domain name. You have to be quite an expert to set the system up and keep it running.
So, what are your options? A public service for realizing GUPRIs sounds like it would be a good idea!
In the following recipe you will learn about w3id.org, purl.org, and the DOI / handle system, all of which are (or can be accessed as) free-of-cost services.
2.5.1. Purl.org¶
Although being the oldest and supposedly easiest-to-use service, purl.org is known by some veterans for having been offline for a long time. This was especially bad as a “persistent” identifier should be resolvable at all times. This service is run by the non-profit Internet Archive.
To access the graphical user interface / web frontend, you need to register with archive.org. The frontend is easy to work with, although it is known for sometimes throwing errors or reacting in unexpected ways.
Using purl.org, you can add custom redirects, with HTTP status codes 301, 302, 303, 307, or errors encoded as HTTP status codes 404 and 410. It also allows “partial redirects”, i.e. it would attach all the string behind a known part to redirect you
Note
example
deposited URL for the partial redirect is http://example.com/one
and intended to be routed
from http://purl.org/example
.
When you call http://purl.org/example/two
, it will route you to http://example.com/one/two
.
If there was no partial redirect in place, you would receive an error when calling http://purl.org/example/two
because this URL would not be known to the purl.org server).
A typical implementation route goes like this:
Register on
archive.org
Log-in to
purl.org
Create your domain, e.g.
purl.org/faircookbook
Create your individual redirect, e.g.
purl.org/faircookbook/recipe1
with status 302 redirecting tohttp://example.org/faircookbook/recipe/1
Note
Take home: Quick and easy solution; but not the most reliable. You have to register one purl “domain” to one user; if multiple users want to manage a common GUPRI space, they have to share the same credentials.
2.5.2. w3id.org¶
The service w3id.org
emerged when purl.org
was down 1.
It is run by multiple stakeholders (see self-description on w3id.org) bound by a “social contract”.
Management is more difficult than purl.org, because w3id.org does not have a graphical user interface / frontend,
but relies on “raw” .htaccess
files deposited on GitHub.com.
You need to have some knowledge about Apache .htaccess redirect rules though.
To learn more about Apache .htaccess redirect rules, refer to this documentation
A clear advantage is that it gives you more freedom in the implementation.
After making a proposal to change the redirection rules via a GitHub Pull Request, maintainers of the w3id.org team have to (manually!) accept your changes. This is (normally) happening very fast, though (turnaround time < 1 day). If you make changes very often though, you will annoy the maintainers, and you should think about using another service.
A typical implementation route goes like this:
Register to github.com
Fork the GitHub repository at https://github.com/perma-id/w3id.org
Copy e.g. the folder “faircookbook” to your desired path (if you name the folder “faircookbook”, you can resolve everything that goes like http://w3id.org/ faircookbook /*)
Change the README and .htaccess
Create a Pull Request
Wait for acceptance of the Pull Request, and/or answer questions from the maintainers
Note
Take home: Possibly quick and easy solution. If no maintainer reacts on your pull request, you can’t act yourself (but our experience is maintainers always act diligently). Bound by social contract, but no formal / legal one. You decide yourself how you share the maintenance load within your namespace, and can only hope that the maintainers of the w3id.org repo act in accordance with your wishes (e.g. don’t allow “outsiders” to change identifiers in your namespace).
2.5.3. Free services which provide access to the Handle system¶
The Handle System (also known as Handle.Net
or HDL.NET
) is a globally distributed system to resolve “handles” (which are local identifiers per se).
The Handle System forms also the basis of DOIs (Digital Object Identifiers).
DOIs add a layer of policies and specifications on top of The Handle System.
Unfortunately, it is quite expensive (approx. 5000 €/\( per year) to join the DOI system
(which would go via a Registration Agency, see here: https://www.doi.org/registration_agencies.html
-- well known Registration Agencies are DataCite and Crossref),
whereas The Handle System is quite inexpensive (approx. 50 €/\) per year).
There is also a way to get handles for free, which is described here:
You can reach out to http://grnet.gr
– e.g. through the subpage https://epic.grnet.gr/
where there is a Contact form.
An alternative is http://www.pidconsortium.net/ (of which grnet is a member).
The author of this recipe managed to get access to both services for free.
A typical implementation goes like this:
Request an account for hdl.grnet.gr
Make yourself familiar with the API (no easy-to-use graphical user interface available as of now) – see http://www.handle.net/tech_manual/HN_Tech_Manual_9.pdf
Make the API call to register a handle
2.6. Conclusion¶
This recipe provided several options for generated globally unique persistent resolvable identifiers. Furthermore, our audience can see how we used the w3id service to generate the identifiers for the recipes and how effective this is. One has to bear in mind that the service works well for projects like this one where the expected number of objects remains small. However, if scale-up is required and millions of GUPRIs are needed, a different service may be needed.
2.7. References¶
References
- 1
Website source code for w3id.org. 2015. URL: https://github.com/perma-id/w3id.org.
2.8. Authors¶
Authors
Name |
ORCID |
Affiliation |
Type |
ELIXIR Node |
Contribution |
---|---|---|---|---|---|
Bayer AG |
Writing - Original Draft |
||||
Maastricht University |
Reviewing & Editing |
||||
University of Oxford |
Reviewing & Editing |