Chemistry Toolkit Rosetta Wiki
Unique SMARTS matches against a SMILES string
A number of molecular descriptors are based on how many times a given SMARTS pattern is uniquely found in a structure. For example, the CACTVS substructure key 122 is set if there are ">= 2 any ring size 3". That query can be written in SMARTS as "*1**1", but each ring of size 3 will match 6 times because of symmetry (3 starting atoms times 2 directions around the ring). Most toolkits have a way to find all matches for a given SMARTS and a way to find all unique matches for a given SMARTS. "Unique" here means that no two different matches will have the same set of matched atoms. The point of this task is to show how that's done.

==Implementation==
Given the SMILES structure "C1CC12C3(C24CC4)CC3" (which is PubChem CID 141640), how many times does "*1**1" match the structure and how many times does the same SMARTS match the structure uniquely? The answers should be 24 and 4, respectively.

==OpenBabel/Pybel==
<source lang="python">
import pybel  # with Open Babel 3.x, use "from openbabel import pybel"

mol = pybel.readstring("smi", "C1CC12C3(C24CC4)CC3")
smarts = pybel.Smarts("*1**1")

# pybel doesn't have a direct way to get the non-unique matches, so
# use the lower-level OpenBabel API directly
smarts.obsmarts.Match(mol.OBMol)
num_matches = sum(1 for indices in smarts.obsmarts.GetMapList())

# findall() returns only the unique matches
num_unique_matches = len(smarts.findall(mol))

print("number of matches:", num_matches)
print("number of unique matches:", num_unique_matches)
</source>

==OpenBabel/Rubabel==
<source lang='ruby'>
require 'rubabel'

mol = Rubabel["C1CC12C3(C24CC4)CC3"]
# first all matches (uniq = false), then only the unique ones (uniq = true)
[false, true].each {|uniq| puts mol.matches("*1**1", uniq).size }
</source>

==OpenEye/Python==
<source lang="python">
from openeye.oechem import *

mol = OEGraphMol()
OEParseSmiles(mol, "C1CC12C3(C24CC4)CC3")
pat = OESubSearch("*1**1")

num_matches = sum(1 for match in pat.Match(mol))
# the second argument asks for unique matches only
num_unique_matches = sum(1 for match in pat.Match(mol, True))

print("number of matches:", num_matches)
print("number of unique matches:", num_unique_matches)
</source>

==RDKit/Python==
<source lang="python">
from rdkit import Chem

mol = Chem.MolFromSmiles('C1CC12C3(C24CC4)CC3')
patt = Chem.MolFromSmarts('*1**1')

# GetSubstructMatches() uniquifies by default; turn that off to get every match
print("number of matches:", len(mol.GetSubstructMatches(patt, uniquify=False)))
print("number of unique matches:", len(mol.GetSubstructMatches(patt)))
</source>

==Cactvs/Tcl==
<pre lang="tcl">
set eh [ens create C1CC12C3(C24CC4)CC3]
puts "Number of matches: [match ss -mode all *1**1 $eh]"
puts "Number of atom set unique matches: [match ss -mode distinct *1**1 $eh]"
puts "Number of topologically unique matches: [match ss -mode unique *1**1 $eh]"
</pre>

The results are 24, 4 and 2. In Cactvs match nomenclature, "unique" means not only that the matched atoms must form a different set, but also that the match must be topologically distinct from every other match set. The "distinct" match mode simply checks whether a different set of structure atoms was matched (the Rosetta problem), which is a far simpler task.

==Cactvs/Python==
The same in Python:

<pre lang="python">
e=Ens('C1CC12C3(C24CC4)CC3')
print('Number of matches:',match('ss','*1**1',e,mode='all'))
print('Number of atom set unique matches:',match('ss','*1**1',e,mode='distinct'))
print('Number of topologically unique matches:',match('ss','*1**1',e,mode='unique'))
</pre>

== CDK/Groovy ==
The substructure searching code in the CDK is based on an edge-matching algorithm, which means it cannot distinguish cyclopropane from isobutane. Hence, the workaround:

<source lang="Groovy">
import org.openscience.cdk.interfaces.*;
import org.openscience.cdk.smiles.*;
import org.openscience.cdk.smiles.smarts.*;
import org.openscience.cdk.silent.SilentChemObjectBuilder;

SmilesParser sp = new SmilesParser(SilentChemObjectBuilder.getInstance());
atomContainer = sp.parseSmiles("C1CC12C3(C24CC4)CC3");
querytool = new SMARTSQueryTool("*1**1");
found = querytool.matches(atomContainer);
if (found) {
  mappings = querytool.getMatchingAtoms()
  hits = 0
  for (int i = 0; i < mappings.size(); i++) {
    atomIndices = mappings.get(i);
    if (atomIndices.size() == 3) {
      // work around the cyclopropane / isobutane equivalence
      hits++
    }
  }
  println "hits: $hits"

  mappings = querytool.getUniqueMatchingAtoms()
  uniqueHits = 0
  for (int i = 0; i < mappings.size(); i++) {
    atomIndices = mappings.get(i);
    if (atomIndices.size() == 3) {
      // work around the cyclopropane / isobutane equivalence
      uniqueHits++
    }
  }
  println "unique hits: $uniqueHits"
}
</source>

[[Category:SMARTS]]
[[Category:feature counts]]
[[Category:OpenEye/Python]]
[[Category:Cactvs/Tcl]]
[[Category:CDK/Groovy]]
[[Category:Cactvs/Python]]
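
==Plain Python (illustration)==
The following is a toolkit-independent sketch, not taken from any of the toolkits on this page, of what "unique" means in this task: two matches count as the same when they cover the same set of atoms. It assumes a hand-transcribed bond list for "C1CC12C3(C24CC4)CC3" (atoms numbered 0 to 8 in SMILES order) and uses two hypothetical helpers, `three_ring_matches` and `unique_by_atom_set`. A real toolkit would do the SMARTS matching itself; here the "*1**1" query is emulated by enumerating ordered atom triples that are pairwise bonded.

```python
from itertools import permutations

# Bonds of C1CC12C3(C24CC4)CC3, transcribed by hand from the SMILES string
# (atoms numbered 0-8 in SMILES order). This is an assumption made for the
# illustration, not the output of any toolkit.
BONDS = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4),
         (4, 5), (5, 6), (4, 6), (3, 7), (7, 8), (3, 8)]


def three_ring_matches(bonds, num_atoms):
    """Return every ordered atom triple (a, b, c) that forms a 3-ring.

    This emulates all (non-unique) matches of the SMARTS "*1**1".
    """
    bonded = {frozenset(bond) for bond in bonds}
    return [
        (a, b, c)
        for a, b, c in permutations(range(num_atoms), 3)
        if {frozenset((a, b)), frozenset((b, c)), frozenset((c, a))} <= bonded
    ]


def unique_by_atom_set(matches):
    """Keep the first match seen for each distinct set of matched atoms."""
    seen = set()
    unique = []
    for match in matches:
        key = frozenset(match)
        if key not in seen:
            seen.add(key)
            unique.append(match)
    return unique


all_matches = three_ring_matches(BONDS, 9)
unique_matches = unique_by_atom_set(all_matches)
print("number of matches:", len(all_matches))
print("number of unique matches:", len(unique_matches))
```

With this bond list the sketch reports 24 matches and 4 unique matches, in agreement with the expected answers for the task. Keying on a `frozenset` of atom indices discards the ordering of the match, which is exactly the symmetry that makes each 3-ring match 6 times.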
Please note that all contributions to the Chemistry Toolkit Rosetta Wiki are considered to be released under the CC-BY-SA