Chemistry Toolkit Rosetta Wiki
Convert a SMILES string to canonical SMILES
A SMILES string is a way to represent a 2D molecular graph as a 1D string. In most cases there are many possible SMILES strings for the same structure. Canonicalization is a way to determine which of all possible SMILES will be used as the reference SMILES for a molecular graph.

Suppose you want to find out whether a structure already exists in a data set. In graph theory this is the graph isomorphism problem. Using the canonical SMILES instead of the graphs reduces the problem to a simple text matching problem. Keep track of the canonical SMILES for each compound in a database and convert the query structure to its canonical SMILES. If that SMILES doesn't already exist in the database then it is a new structure.

There is no universal canonical SMILES. Every toolkit uses a different algorithm, and sometimes the algorithm changes between versions of the same toolkit. There are even different forms of canonical SMILES, depending on whether atomic properties like isotope are important for the result.

Canonical SMILES is mostly important inside of software tools. It isn't meant as an exchange format and no one really types in canonical SMILES. That's why this task doesn't have any form of I/O. The point of this task is to see how to convert an in-memory SMILES string to a molecule and then generate the canonical SMILES for it.

==Implementation==

Parse two SMILES strings and convert them to canonical form. Check that the results give the same string.
The input SMILES structures are the two strings used in every implementation below. Both represent caffeine:

<pre>
CN2C(=O)N(C)C(=O)C1=C2N=CN1C
CN1C=NC2=C1C(=O)N(C)C(=O)N2C
</pre>

==CDK/Groovy==
<source lang="groovy">
import org.openscience.cdk.smiles.SmilesGenerator;
import org.openscience.cdk.smiles.SmilesParser;
import org.openscience.cdk.nonotify.NoNotificationChemObjectBuilder;

parser = new SmilesParser( NoNotificationChemObjectBuilder.getInstance() );
generator = new SmilesGenerator();
smi = [
  "CN2C(=O)N(C)C(=O)C1=C2N=CN1C",
  "CN1C=NC2=C1C(=O)N(C)C(=O)N2C"
]
can = [];
// Parse each input and collect its canonical SMILES
smi.each { smiles ->
  can.add( generator.createSMILES( parser.parseSmiles(smiles) ) )
}
assert can[0] == can[1]
</source>

==Indigo/C==
<source lang="c">
#include <stdio.h>
#include <stdlib.h>  /* free() */
#include <string.h>
#include "indigo.h"

int main (int argc, const char *argv[])
{
   int mol1 = indigoLoadMoleculeFromString("CN2C(=O)N(C)C(=O)C1=C2N=CN1C");
   int mol2 = indigoLoadMoleculeFromString("CN1C=NC2=C1C(=O)N(C)C(=O)N2C");
   char *smi1, *smi2;

   indigoAromatize(mol1);
   indigoAromatize(mol2);

   /* Copy each result before making the next Indigo call */
   smi1 = strdup(indigoCanonicalSmiles(mol1));
   smi2 = strdup(indigoCanonicalSmiles(mol2));

   if (strcmp(smi1, smi2) != 0)
      fprintf(stderr, "canonical SMILES strings do not match\n");

   free(smi1);
   free(smi2);
   indigoFree(mol1);
   indigoFree(mol2);
   return 0;
}
</source>

==Indigo/Java==
<source lang="java">
package test;

import com.gga.indigo.*;
import java.io.*;
import java.util.*;

public class Main
{
   public static void main (String[] args) throws java.io.IOException
   {
      Indigo indigo = new Indigo();
      IndigoObject mol1 = indigo.loadMolecule("CN2C(=O)N(C)C(=O)C1=C2N=CN1C");
      IndigoObject mol2 = indigo.loadMolecule("CN1C=NC2=C1C(=O)N(C)C(=O)N2C");

      mol1.aromatize();
      mol2.aromatize();

      // Java assertions are disabled by default; run with -ea to enable this check
      assert mol1.canonicalSmiles().equals(mol2.canonicalSmiles());
   }
}
</source>

==Indigo/Python==
<source lang="python">
from indigo import *

indigo = Indigo()
mol1 = indigo.loadMolecule("CN2C(=O)N(C)C(=O)C1=C2N=CN1C")
mol2 = indigo.loadMolecule("CN1C=NC2=C1C(=O)N(C)C(=O)N2C")
mol1.aromatize()
mol2.aromatize()
assert mol1.canonicalSmiles() == mol2.canonicalSmiles()
</source>

==OpenBabel/Pybel==
<source lang="python">
# With Open Babel 3.x the module is imported as: from openbabel import pybel
import pybel

smiles = ["CN2C(=O)N(C)C(=O)C1=C2N=CN1C", "CN1C=NC2=C1C(=O)N(C)C(=O)N2C"]
# "can" is Open Babel's canonical SMILES output format
cans = [pybel.readstring("smi", smile).write("can") for smile in smiles]
assert cans[0] == cans[1]
</source>

==OpenBabel/Rubabel==
<source lang="ruby">
require 'rubabel'

smiles = %w{CN2C(=O)N(C)C(=O)C1=C2N=CN1C CN1C=NC2=C1C(=O)N(C)C(=O)N2C}
cans = smiles.map {|smile| Rubabel[smile] }
fail unless cans.reduce(:==)
</source>

==OpenEye/Python==
<source lang="python">
from openeye.oechem import *

def canonicalize(smiles):
    mol = OEGraphMol()
    OEParseSmiles(mol, smiles)
    return OECreateCanSmiString(mol)

assert (canonicalize("CN2C(=O)N(C)C(=O)C1=C2N=CN1C") ==
        canonicalize("CN1C=NC2=C1C(=O)N(C)C(=O)N2C"))
</source>

==RDKit/Python==
<source lang="python">
from rdkit import Chem

smis = ["CN2C(=O)N(C)C(=O)C1=C2N=CN1C", "CN1C=NC2=C1C(=O)N(C)C(=O)N2C"]
# isomericSmiles=True keeps stereo and isotope information in the output
cans = [Chem.MolToSmiles(Chem.MolFromSmiles(smi), isomericSmiles=True) for smi in smis]
assert cans[0] == cans[1]
</source>

==Cactvs/Tcl==
<pre lang="tcl">
prop setparam E_SMILES unique 1
set s1 [ens new [ens create CN2C(=O)N(C)C(=O)C1=C2N=CN1C] E_SMILES]
set s2 [ens new [ens create CN1C=NC2=C1C(=O)N(C)C(=O)N2C] E_SMILES]
if {$s1 ne $s2} {error "SMILES not equal"}
</pre>

==Cactvs/Python==
<pre lang="python">
Prop.Setparam('E_SMILES',{'unique':True})
s1=Ens('CN2C(=O)N(C)C(=O)C1=C2N=CN1C').new('E_SMILES')
s2=Ens('CN1C=NC2=C1C(=O)N(C)C(=O)N2C').new('E_SMILES')
if (s1!=s2): raise RuntimeError('SMILES not equal')
</pre>

[[Category:OpenBabel/Pybel]]
[[Category:Canonical SMILES]]
[[Category:SMILES]]
[[Category:OpenEye/Python]]
[[Category:RDKit/Python]]
[[Category:CDK/Groovy]]
[[Category:Indigo/Python]]
[[Category:Indigo/C]]
[[Category:Indigo/Java]]
[[Category:Cactvs/Tcl]]
[[Category:Cactvs/Python]]
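
==Example: checking for an existing structure==

The introduction describes using canonical SMILES as a lookup key to decide whether a structure is already in a data set. A minimal sketch of that idea follows. It is not tied to any toolkit: the helper name <code>add_if_new</code>, the dictionary layout, and the <code>canonicalize</code> parameter are all hypothetical choices made for illustration, and the caller supplies whichever canonicalization routine their toolkit provides.

```python
def add_if_new(index, smiles, canonicalize):
    """Store a structure in `index` unless an equivalent one is already there.

    index        -- dict mapping canonical SMILES to the first input SMILES
                    seen for that structure (hypothetical layout)
    smiles       -- the query SMILES string
    canonicalize -- caller-supplied function, str -> canonical SMILES str
                    (an assumption: any toolkit's routine can be plugged in)

    Returns True if the structure was new, False if it was a duplicate.
    """
    key = canonicalize(smiles)   # graph comparison becomes text comparison
    if key in index:
        return False             # same canonical string: already known
    index[key] = smiles
    return True
```

With RDKit, for example, <code>canonicalize</code> could be <code>lambda s: Chem.MolToSmiles(Chem.MolFromSmiles(s))</code>. Because there is no universal canonical SMILES, every key in one index should come from the same toolkit, ideally the same version.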
Please note that all contributions to the Chemistry Toolkit Rosetta Wiki are considered to be released under the CC-BY-SA