3. Creating InChIKeys for IUPAC names
3.1. Main Objectives
The main purpose of this recipe is to take an IUPAC name and generate an InChIKey from it.
3.1.1. Using the OPSIN website
The OPSIN library is an open-source tool that parses IUPAC names into chemical graphs [2].
OPSIN also has a website (https://opsin.ch.cam.ac.uk/) that converts IUPAC names into other representations, including an InChIKey.
On the website, the InChIKey is generated with the official InChI library [1].
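An InChIKey has a fixed layout defined by the InChI standard: 27 characters in three hyphen-separated blocks. The first 14 letters are a hash of the molecular skeleton (connectivity), the next 8 letters hash the remaining layers, followed by a flag for standard (S) or non-standard (N) InChI and a version letter (A for version 1), and a final letter for the protonation state. The Python sketch below uses a hypothetical helper, split_inchikey, to check that shape and split a key into its parts; it only checks the format and cannot tell whether a key belongs to a real compound.

```python
import re

# InChIKey layout: 14 letters, hyphen, 8 letters + flag (S/N) + version letter,
# hyphen, 1 protonation letter. Only uppercase A-Z is allowed.
INCHIKEY_PATTERN = re.compile(r"([A-Z]{14})-([A-Z]{8})([SN])([A-Z])-([A-Z])")


def split_inchikey(key: str) -> dict:
    """Split an InChIKey into its blocks; raise ValueError if the shape is wrong.

    Hypothetical helper: this checks the format only, not whether the key
    corresponds to an actual structure.
    """
    match = INCHIKEY_PATTERN.fullmatch(key)
    if match is None:
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    skeleton, other_layers, flag, version, protonation = match.groups()
    return {
        "skeleton_hash": skeleton,          # first block: connectivity
        "other_layers_hash": other_layers,  # stereochemistry, isotopes, etc.
        "standard": flag == "S",            # S = standard, N = non-standard
        "version": version,                 # A = InChI version 1
        "protonation": protonation,         # N = neutral
    }


# Methane, the example molecule used in this recipe
parts = split_inchikey("VNWKTOKETHGBQD-UHFFFAOYSA-N")
print(parts["skeleton_hash"], parts["standard"], parts["version"])
```

Using fullmatch rather than anchors with match ensures that trailing characters, including a stray newline, make the check fail.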
3.1.2. Automating translations with Google Colab
Google Colaboratory (Colab for short) allows us to use Python to automate conversions of IUPAC names.
In Colab, we can use Bacting [3] to access the OPSIN and InChI libraries.
We first need to set up Colab with Java, Maven, and scyjava, then download the Bacting libraries and create the Bacting manager objects.
Java 17 and Maven are installed with the following commands, which finish by confirming which Java version is available:
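# Install a headless Java 17 runtime and Maven (quietly)
!apt-get install openjdk-17-jre-headless maven -qq > /dev/null
# Point JAVA_HOME at the Java 17 installation so scyjava can find it
import os
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-17-openjdk-amd64"
# Make Java 17 the default and confirm the version
!update-alternatives --set java /usr/lib/jvm/java-17-openjdk-amd64/bin/java
!java -version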
Scyjava is installed with the following command:
!pip install scyjava
We can then install Bacting and set up the two Bacting managers, inchi and opsin:
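from scyjava import config, jimport
# Declare the Bacting Maven artifacts; they are fetched when the JVM starts
config.endpoints.append('io.github.egonw.bacting:managers-inchi:0.4.1')
config.endpoints.append('io.github.egonw.bacting:managers-opsin:0.4.1')
# Import the manager classes (the first jimport starts the JVM) and create
# them with the current directory as the workspace root
inchi_cls = jimport("net.bioclipse.managers.InChIManager")
inchi = inchi_cls(".")
opsin_cls = jimport("net.bioclipse.managers.OpsinManager")
opsin = opsin_cls(".")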
After that, we use the manager API to parse the IUPAC name and generate an InChI and an InChIKey:
# Parse the IUPAC name into a molecule, then generate its InChI
anInChI = inchi.generate(opsin.parseIUPACName("methane"))
print(f"InChI: {anInChI.getValue()}")
print(f"InChIKey: {anInChI.getKey()}")
The full Jupyter notebook can be found here, including a button to open the notebook in Colab.
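To convert many names at once, the two manager calls can be wrapped in a loop. The sketch below is hypothetical: names_to_inchikeys and name_to_key are illustrative names, not part of Bacting. The conversion function is passed in as a parameter, so the loop can be tried without a JVM; in Colab you would pass a small function built from the opsin and inchi managers created above. It assumes that a name OPSIN cannot parse raises an exception, and collects such names instead of stopping the run.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def names_to_inchikeys(
    names: Iterable[str], name_to_key: Callable[[str], str]
) -> Tuple[Dict[str, str], List[str]]:
    """Convert IUPAC names to InChIKeys, collecting the names that fail.

    Hypothetical helper: `name_to_key` maps one name to one InChIKey,
    for example a function built from the Bacting managers.
    """
    keys: Dict[str, str] = {}
    failed: List[str] = []
    for name in names:
        try:
            # str() turns a Java string returned through scyjava into a Python str
            keys[name] = str(name_to_key(name))
        except Exception:
            # Java exceptions surfaced through scyjava/JPype are Python exceptions too
            failed.append(name)
    return keys, failed


# In Colab, with the inchi and opsin managers created above (assumed usage):
# def bacting_key(name):
#     return inchi.generate(opsin.parseIUPACName(name)).getKey()
# keys, failed = names_to_inchikeys(["methane", "ethanol"], bacting_key)
```

Keeping the Bacting calls outside the helper means the same loop works unchanged with any other converter, such as one based on a different toolkit.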
3.1.3. Automating translations with Apache Groovy
Because Bacting is written in Java and its libraries are available from Maven Central, it can also be used in Apache Groovy and other Java-based environments.
In Groovy, the above code looks like this:
// Fetch the Bacting InChI and OPSIN managers from Maven Central
@Grab(group='io.github.egonw.bacting', module='managers-inchi', version='0.4.1')
@Grab(group='io.github.egonw.bacting', module='managers-opsin', version='0.4.1')
def workspaceRoot = "."
def inchi = new net.bioclipse.managers.InChIManager(workspaceRoot)
def opsin = new net.bioclipse.managers.OpsinManager(workspaceRoot)
// Parse the IUPAC name, then generate the InChI and its InChIKey
def anInChI = inchi.generate(opsin.parseIUPACName("methane"))
println "InChI: ${anInChI.getValue()}"
println "InChIKey: ${anInChI.getKey()}"
3.2. Conclusion
Cheminformatics provides us with the tools to parse IUPAC names and convert them into identifiers derived from the chemical graph, such as the InChIKey.
The InChIKey identifier can be used to find more information about the chemicals represented by the original IUPAC names.
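As one example of such a lookup, PubChem's PUG REST service accepts an InChIKey in the URL path and returns the matching PubChem compound identifiers (CIDs). PubChem is only one possible resource here, and the helper name pubchem_cids_url is hypothetical; the sketch below only builds the URL, which can then be fetched with, for instance, urllib.request.urlopen.

```python
from urllib.parse import quote

PUBCHEM_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def pubchem_cids_url(inchikey: str, output: str = "TXT") -> str:
    """Build a PubChem PUG REST URL listing compound IDs for an InChIKey.

    Hypothetical helper; `output` selects the response format (e.g. TXT or JSON).
    """
    return f"{PUBCHEM_BASE}/compound/inchikey/{quote(inchikey)}/cids/{output}"


# Methane, the example molecule used in this recipe
print(pubchem_cids_url("VNWKTOKETHGBQD-UHFFFAOYSA-N"))
```

Building the URL separately from fetching it keeps the example free of network access and makes it easy to swap in another output format.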
3.3. References
[1] Jonathan M. Goodman, Igor Pletnev, Paul Thiessen, Evan Bolton, and Stephen R. Heller. InChI version 1.06: now more than 99.99% reliable. Journal of Cheminformatics, 24 May 2021.
[2] Daniel M. Lowe, Peter T. Corbett, Peter Murray-Rust, and Robert C. Glen. Chemical Name to Structure: OPSIN, an Open Source Solution. Journal of Chemical Information and Modeling, 51(3):739–753, 28 March 2011.
[3] Egon Willighagen. Bacting: a next generation, command line version of Bioclipse. Journal of Open Source Software, 6(62):2558, 23 June 2021.
3.4. Authors
Name | ORCID | Affiliation | Type | ELIXIR Node | Contribution |
---|---|---|---|---|---|
 | | Maastricht University | | | Writing - Original Draft |