Chemistry Toolkit Rosetta Wiki
(problem description)
 
(14 intermediate revisions by 7 users not shown)
Line 1: Line 1:
  +
This task is meant to give examples of how to add and delete atoms and bonds from the molecular graph, using the graph API.
Input is a somewhat complex SMILES and a SMARTS pattern which defines "rotatable bonds".
 
   
  +
One of the researchers I worked with asked me for a program which took an input molecule and broke it apart at the rotatable bonds. When a bond breaks, each fragment gets attached to an atom with atomic number 0, that is "*".
Find each rotatable bond and break it. For atom it was connected to, add a single bond connection to a new "*" atom. Canonicalize each fragment of the result and print out the results, one fragment per line. The line must contain the number of atoms in the fragment and the fragment's SMILES, including the "*" term.
 
   
  +
For a simple example, "c1ccccc1C=O" would become "c1ccccc1*" and "*C=O".
The goal here is to give an example of molecular editing.
 
  +
  +
"Rotatable bond" was defined as the bond in the following SMARTS
  +
  +
<pre>
  +
[!$([NH]!@C(=O))&!D1&!$(*#*)]-&!@[!$([NH]!@C(=O))&!D1&!$(*#*)]
  +
</pre>
  +
  +
but a toolkit may have a built-in definition of rotatable bond.
  +
  +
  +
==Implementation==
  +
  +
Write a function or method which takes a molecule as input and finds all of the rotatable bonds using either the toolkit's built-in definition or the above SMARTS definition. Delete each bond from the structure and add an atom with atomic number 0 to each of the atoms which was at the end of the bond. The function may modify the molecule in-place or return a new molecule.
  +
  +
Use the function to write a program which takes the structure "c1ccc2c(c1)C(=NC(C(=O)N2CC(=O)O)Cc3ccc(cc3)O)c4cccc(c4)O" (which is the perceived aromatic form of PubChem CID 391758), breaks the rotatable bonds, and prints the fragments out in SMILES form, one per line.
  +
  +
The output should be something like
  +
  +
<pre>
  +
*c1(ccc(cc1)O)*
  +
*c1(cccc(c1)O)*
  +
*[CH]1(C(=O)[N](c2ccccc2C(=N1)(*)*)(*)*)*
  +
*[CH2](*)(*)*
  +
*[CH2](*)(*)*
  +
*C(=O)(*)O
  +
</pre>
  +
  +
although the specific strings will depend on the toolkit's canonicalization algorithm.
  +
  +
If the toolkit can do this with SMIRKS then feel free to show how that works, in addition to a solution which manipulates the graph directly. (Frankly, I haven't figured out how to do it with SMIRKS in OEChem.)
  +
  +
==OpenBabel/Rubabel==
  +
<source lang='ruby'>
  +
require 'rubabel'
  +
smarts = "[!$([NH]!@C(=O))&!D1&!$(*#*)]-&!@[!$([NH]!@C(=O))&!D1&!$(*#*)]"
  +
mol = Rubabel["c1ccc2c(c1)C(=NC(C(=O)N2CC(=O)O)Cc3ccc(cc3)O)c4cccc(c4)O"]
  +
mol.matches(smarts).each do |atom1, atom2|
  +
mol.delete(atom1.get_bond(atom2))
  +
[atom1, atom2].each do |old_a|
  +
mol.add_bond!(old_a, mol.add_atom!(0))
  +
end
  +
end
  +
</source>
  +
  +
==OpenEye/Python==
  +
<source lang="python">
  +
from openeye.oechem import *
  +
  +
rotatable_pat = OESubSearch("[!$([NH]!@C(=O))&!D1&!$(*#*)]-&!@[!$([NH]!@C(=O))&!D1&!$(*#*)]")
  +
  +
def break_rotatable_bonds(mol):
  +
# Get the rotatable bonds
  +
rotatable_bonds = []
  +
for match in rotatable_pat.Match(mol):
  +
rotatable_bonds.extend(match.GetTargetBonds())
  +
  +
# Delete each bond
  +
for bond in rotatable_bonds:
  +
# Get the atoms it's bonded too then delete the atom
  +
bond_atoms = bond.GetBgn(), bond.GetEnd()
  +
mol.DeleteBond(bond)
  +
# Add the dummy atom to each of the ex-bond's atoms
  +
for atom in bond_atoms:
  +
new_atom = mol.NewAtom(0)
  +
mol.NewBond(atom, new_atom, 1)
  +
  +
  +
mol = OEGraphMol()
  +
OEParseSmiles(mol,
  +
"c1ccc2c(c1)C(=NC(C(=O)N2CC(=O)O)Cc3ccc(cc3)O)c4cccc(c4)O")
  +
OEAssignAromaticFlags(mol)
  +
  +
break_rotatable_bonds(mol)
  +
print OECreateCanSmiString(mol).replace(".", "\n")
  +
</source>
  +
  +
===SMIRKS version===
  +
<source lang='python'>
  +
from openeye.oechem import *
  +
  +
def main():
  +
'''
  +
'''
  +
SMIRKS = "[!$([NH]!@C(=O))&!D1&!$(*#*):1]-&!@[!$([NH]!@C(=O))&!D1&!$(*#*):2]>>[*:1]*.*[*:2]"
  +
  +
molecule = OEGraphMol()
  +
OEParseSmiles(molecule, 'c1ccc2c(c1)C(=NC(C(=O)N2CC(=O)O)Cc3ccc(cc3)O)c4cccc(c4)O')
  +
  +
libgen = OELibraryGen(SMIRKS)
  +
libgen.SetValenceCorrection(True)
  +
libgen.SetExplicitHydrogens(False)
  +
libgen.SetStartingMaterial(molecule, 0)
  +
  +
for product in libgen.GetProducts():
  +
partcount,partlist = OEDetermineComponents(product)
  +
pred = OEPartPredAtom(partlist)
  +
  +
for i in range(partcount):
  +
fragment = OEGraphMol()
  +
pred.SelectPart(i+1)
  +
OESubsetMol(fragment, product, pred)
  +
  +
print OECreateCanSmiString(fragment),
  +
  +
print
  +
main()
  +
  +
"*C1=NC(C(=O)N(c2c1cccc2)CC(=O)O)Cc3ccc(cc3)O" "*c1cccc(c1)O"
  +
"*C1C(=O)N(c2ccccc2C(=N1)c3cccc(c3)O)CC(=O)O" "*Cc1ccc(cc1)O"
  +
"*N1c2ccccc2C(=NC(C1=O)Cc3ccc(cc3)O)c4cccc(c4)O" "*CC(=O)O"
  +
"*CN1c2ccccc2C(=NC(C1=O)Cc3ccc(cc3)O)c4cccc(c4)O" "*C(=O)O"
  +
"*CC1C(=O)N(c2ccccc2C(=N1)c3cccc(c3)O)CC(=O)O" "*c1ccc(cc1)O"
  +
</source>
  +
  +
==RDKit/Python==
  +
<source lang="python">
  +
from rdkit import Chem
  +
  +
patt = Chem.MolFromSmarts('[!$([NH]!@C(=O))&!D1&!$(*#*)]-&!@[!$([NH]!@C(=O))&!D1&!$(*#*)]')
  +
mol = Chem.MolFromSmiles("c1ccc2c(c1)C(=NC(C(=O)N2CC(=O)O)Cc3ccc(cc3)O)c4cccc(c4)O")
  +
  +
# find the rotatable bonds
  +
bonds = mol.GetSubstructMatches(patt)
  +
  +
# create an editable molecule, break the bonds, and add dummies:
  +
em = Chem.EditableMol(mol)
  +
nAts = mol.GetNumAtoms()
  +
for a,b in bonds:
  +
em.RemoveBond(a,b)
  +
em.AddAtom(Chem.Atom(0))
  +
em.AddBond(a,nAts,Chem.BondType.SINGLE)
  +
em.AddAtom(Chem.Atom(0))
  +
em.AddBond(b,nAts+1,Chem.BondType.SINGLE)
  +
nAts+=2
  +
p = em.GetMol()
  +
Chem.SanitizeMol(p)
  +
  +
smis = [Chem.MolToSmiles(x,True) for x in Chem.GetMolFrags(p,asMols=True)]
  +
for smi in smis: print smi
  +
  +
# ----------------------------------
  +
# There's actually an easier way to do this since the BRICS decomposition code has
  +
# exactly the function we need:
  +
from rdkit.Chem import BRICS
  +
bonds = [((x,y),(0,0)) for x,y in bonds]
  +
p = BRICS.BreakBRICSBonds(mol,bonds=bonds)
  +
  +
smis = [Chem.MolToSmiles(x,True) for x in Chem.GetMolFrags(p,asMols=True)]
  +
for smi in smis: print smi
  +
</source>
  +
  +
  +
==Cactvs/Tcl==
  +
<pre lang="tcl">
  +
set eh [ens create "c1ccc2c(c1)C(=NC(C(=O)N2CC(=O)O)Cc3ccc(cc3)O)c4cccc(c4)O"]
  +
prop setparam E_SMILES useexplicith 1
  +
filter create rotbonds property B_ROTATABILITY operator = value1 yes
  +
foreach b [ens bonds $eh rotbonds] {
  +
lassign [bond atoms $eh $b] a1 a2
  +
bond delete $eh $b
  +
bond create $eh [list $a1 [atom add $eh *]]
  +
bond create $eh [list $a2 [atom add $eh *]]
  +
}
  +
foreach fragsmiles [ens get $eh M_SMILES] {
  +
puts $fragsmiles
  +
}
  +
  +
</pre>
  +
  +
Output:
  +
<pre>
  +
C1(=C(C2=C(C(=C1[H])[H])N(C(C(N=C2*)([H])*)=O)*)[H])[H]
  +
C([H])([H])(*)*
  +
C(=O)(O[H])*
  +
C([H])([H])(*)*
  +
C3(=C(C(=C(C(=C3[H])[H])O[H])[H])[H])*
  +
C4(=C(C(=C(C(=C4[H])[H])[H])O[H])[H])*
  +
</pre>
  +
  +
And here is the requested version using SMIRKS with the provided rotatable bond definition pattern:
  +
<pre lang="tcl">
  +
set eh [ens create "c1ccc2c(c1)C(=NC(C(=O)N2CC(=O)O)Cc3ccc(cc3)O)c4cccc(c4)O"]
  +
prop setparam E_SMILES useexplicith 1
  +
set transform {[!$([NH]!@C(=O))&!D1&!$(*#*):1]-&!@[!$([NH]!@C(=O))&!D1&!$(*#*):2]>>[*:1]*.[*:2]*}
  +
set eh [ens transform $eh [list $transform] forward exhaustive]
  +
foreach fragsmiles [ens get $eh M_SMILES] {
  +
puts $fragsmiles
  +
}
  +
</pre>
  +
  +
The output is the same. Using SMIRKS is slower than the first version, though, and it requires toolkit version 3.384 or later. Previous versions ignored non-standard atoms for degree computations, without the option to include query atoms, and since the D attribute is used in the SMIRKS transform, the outcome was not what is expected.
  +
  +
==Cactvs/Python==
  +
  +
The first solution scripted in Python looks like this:
  +
  +
<pre lang="python">
  +
e=Ens("c1ccc2c(c1)C(=NC(C(=O)N2CC(=O)O)Cc3ccc(cc3)O)c4cccc(c4)O")
  +
Prop.Setparam('E_SMILES','useexplicith',1)
  +
f=Filter('rotbonds','property','B_ROTATABILITY','operator','=','value1','yes')
  +
for b in e.bonds(f):
  +
(a1,a2)=b.atoms()
  +
b.delete()
  +
Bond(e,(a1,Atom(e,'*')))
  +
Bond(e,(a2,Atom(e,'*')))
  +
for fragsmiles in e.M_SMILES:
  +
print(fragsmiles)
  +
</pre>
  +
  +
The output is identical.
  +
[[Category:Cactvs/Tcl]]
  +
[[Category:editing]]
  +
[[Category:SMILES]]
  +
[[Category:SMARTS]]
  +
[[Category:OpenEye/Python]]
  +
[[Category:RDKit/Python]]

Latest revision as of 18:16, 11 October 2013

This task is meant to give examples of how to add and delete atoms and bonds from the molecular graph, using the graph API.

One of the researchers I worked with asked me for a program which took an input molecule and broke it apart at the rotatable bonds. When a bond breaks, each fragment gets attached to an atom with atomic number 0, that is "*".

For a simple example, "c1ccccc1C=O" would become "c1ccccc1*" and "*C=O".

"Rotatable bond" was defined as the bond in the following SMARTS

[!$([NH]!@C(=O))&!D1&!$(*#*)]-&!@[!$([NH]!@C(=O))&!D1&!$(*#*)]

but a toolkit may have a built-in definition of rotatable bond.


Implementation[]

Write a function or method which takes a molecule as input and finds all of the rotatable bonds using either the toolkit's built-in definition or the above SMARTS definition. Delete each bond from the structure and add an atom with atomic number 0 to each of the atoms which was at the end of the bond. The function may modify the molecule in-place or return a new molecule.

Use the function to write a program which takes the structure "c1ccc2c(c1)C(=NC(C(=O)N2CC(=O)O)Cc3ccc(cc3)O)c4cccc(c4)O" (which is the perceived aromatic form of PubChem CID 391758), breaks the rotatable bonds, and prints the fragments out in SMILES form, one per line.

The output should be something like

*c1(ccc(cc1)O)*
*c1(cccc(c1)O)*
*[CH]1(C(=O)[N](c2ccccc2C(=N1)(*)*)(*)*)*
*[CH2](*)(*)*
*[CH2](*)(*)*
*C(=O)(*)O

although the specific strings will depend on the toolkit's canonicalization algorithm.

If the toolkit can do this with SMIRKS then feel free to show how that works, in addition to a solution which manipulates the graph directly. (Frankly, I haven't figured out how to do it with SMIRKS in OEChem.)

OpenBabel/Rubabel[]

require 'rubabel'
smarts = "[!$([NH]!@C(=O))&!D1&!$(*#*)]-&!@[!$([NH]!@C(=O))&!D1&!$(*#*)]"
mol = Rubabel["c1ccc2c(c1)C(=NC(C(=O)N2CC(=O)O)Cc3ccc(cc3)O)c4cccc(c4)O"]
mol.matches(smarts).each do |atom1, atom2|
  mol.delete(atom1.get_bond(atom2))
  [atom1, atom2].each do |old_a|
    mol.add_bond!(old_a, mol.add_atom!(0))
  end
end

OpenEye/Python[]

from openeye.oechem import *

rotatable_pat = OESubSearch("[!$([NH]!@C(=O))&!D1&!$(*#*)]-&!@[!$([NH]!@C(=O))&!D1&!$(*#*)]")

def break_rotatable_bonds(mol):
    # Get the rotatable bonds
    rotatable_bonds = []
    for match in rotatable_pat.Match(mol):
        rotatable_bonds.extend(match.GetTargetBonds())

    # Delete each bond
    for bond in rotatable_bonds:
        # Get the atoms it's bonded too then delete the atom
        bond_atoms = bond.GetBgn(), bond.GetEnd()
        mol.DeleteBond(bond)
        # Add the dummy atom to each of the ex-bond's atoms
        for atom in bond_atoms:
            new_atom = mol.NewAtom(0)
            mol.NewBond(atom, new_atom, 1)


mol = OEGraphMol()
OEParseSmiles(mol,
              "c1ccc2c(c1)C(=NC(C(=O)N2CC(=O)O)Cc3ccc(cc3)O)c4cccc(c4)O")
OEAssignAromaticFlags(mol)

break_rotatable_bonds(mol)
print OECreateCanSmiString(mol).replace(".", "\n")

SMIRKS version[]

from openeye.oechem import *

def main():
    '''
    '''
    SMIRKS = "[!$([NH]!@C(=O))&!D1&!$(*#*):1]-&!@[!$([NH]!@C(=O))&!D1&!$(*#*):2]>>[*:1]*.*[*:2]"
    
    molecule = OEGraphMol()
    OEParseSmiles(molecule, 'c1ccc2c(c1)C(=NC(C(=O)N2CC(=O)O)Cc3ccc(cc3)O)c4cccc(c4)O')
    
    libgen = OELibraryGen(SMIRKS)
    libgen.SetValenceCorrection(True)
    libgen.SetExplicitHydrogens(False)
    libgen.SetStartingMaterial(molecule, 0)

    for product in libgen.GetProducts():
        partcount,partlist = OEDetermineComponents(product)
        pred = OEPartPredAtom(partlist)
        
        for i in range(partcount):
            fragment = OEGraphMol()
            pred.SelectPart(i+1)
            OESubsetMol(fragment, product, pred)
            
            print OECreateCanSmiString(fragment),
        
        print
main()

"*C1=NC(C(=O)N(c2c1cccc2)CC(=O)O)Cc3ccc(cc3)O" "*c1cccc(c1)O"
"*C1C(=O)N(c2ccccc2C(=N1)c3cccc(c3)O)CC(=O)O" "*Cc1ccc(cc1)O"
"*N1c2ccccc2C(=NC(C1=O)Cc3ccc(cc3)O)c4cccc(c4)O" "*CC(=O)O"
"*CN1c2ccccc2C(=NC(C1=O)Cc3ccc(cc3)O)c4cccc(c4)O" "*C(=O)O"
"*CC1C(=O)N(c2ccccc2C(=N1)c3cccc(c3)O)CC(=O)O" "*c1ccc(cc1)O"

RDKit/Python[]

from rdkit import Chem

patt = Chem.MolFromSmarts('[!$([NH]!@C(=O))&!D1&!$(*#*)]-&!@[!$([NH]!@C(=O))&!D1&!$(*#*)]')
mol = Chem.MolFromSmiles("c1ccc2c(c1)C(=NC(C(=O)N2CC(=O)O)Cc3ccc(cc3)O)c4cccc(c4)O")

# find the rotatable bonds
bonds = mol.GetSubstructMatches(patt)

# create an editable molecule, break the bonds, and add dummies:
em = Chem.EditableMol(mol)
nAts = mol.GetNumAtoms()
for a,b in bonds:
    em.RemoveBond(a,b)
    em.AddAtom(Chem.Atom(0))
    em.AddBond(a,nAts,Chem.BondType.SINGLE)
    em.AddAtom(Chem.Atom(0))
    em.AddBond(b,nAts+1,Chem.BondType.SINGLE)
    nAts+=2
p = em.GetMol()
Chem.SanitizeMol(p)

smis = [Chem.MolToSmiles(x,True) for x in Chem.GetMolFrags(p,asMols=True)]
for smi in smis: print smi

# ----------------------------------
# There's actually an easier way to do this since the BRICS decomposition code has
# exactly the function we need:
from rdkit.Chem import BRICS
bonds = [((x,y),(0,0)) for x,y in bonds]
p = BRICS.BreakBRICSBonds(mol,bonds=bonds)

smis = [Chem.MolToSmiles(x,True) for x in Chem.GetMolFrags(p,asMols=True)]
for smi in smis: print smi


Cactvs/Tcl[]

set eh [ens create "c1ccc2c(c1)C(=NC(C(=O)N2CC(=O)O)Cc3ccc(cc3)O)c4cccc(c4)O"]
prop setparam E_SMILES useexplicith 1
filter create rotbonds property B_ROTATABILITY operator = value1 yes
foreach b [ens bonds $eh rotbonds] {
        lassign [bond atoms $eh $b] a1 a2
        bond delete $eh $b
        bond create $eh [list $a1 [atom add $eh *]]
        bond create $eh [list $a2 [atom add $eh *]]
}
foreach fragsmiles [ens get $eh M_SMILES] {
        puts $fragsmiles
}

Output:

C1(=C(C2=C(C(=C1[H])[H])N(C(C(N=C2*)([H])*)=O)*)[H])[H]
C([H])([H])(*)*
C(=O)(O[H])*
C([H])([H])(*)*
C3(=C(C(=C(C(=C3[H])[H])O[H])[H])[H])*
C4(=C(C(=C(C(=C4[H])[H])[H])O[H])[H])*

And here is the requested version using SMIRKS with the provided rotatable bond definition pattern:

set eh [ens create "c1ccc2c(c1)C(=NC(C(=O)N2CC(=O)O)Cc3ccc(cc3)O)c4cccc(c4)O"]
prop setparam E_SMILES useexplicith 1
set transform {[!$([NH]!@C(=O))&!D1&!$(*#*):1]-&!@[!$([NH]!@C(=O))&!D1&!$(*#*):2]>>[*:1]*.[*:2]*}
set eh [ens transform $eh [list $transform] forward exhaustive]
foreach fragsmiles [ens get $eh M_SMILES] {
        puts $fragsmiles
}

The output is the same. Using SMIRKS is slower than the first version, though, and it requires toolkit version 3.384 or later. Previous versions ignored non-standard atoms for degree computations, without the option to include query atoms, and since the D attribute is used in the SMIRKS transform, the outcome was not what is expected.

Cactvs/Python[]

The first solution scripted in Python looks like this:

e=Ens("c1ccc2c(c1)C(=NC(C(=O)N2CC(=O)O)Cc3ccc(cc3)O)c4cccc(c4)O")
Prop.Setparam('E_SMILES','useexplicith',1)
f=Filter('rotbonds','property','B_ROTATABILITY','operator','=','value1','yes')
for b in e.bonds(f):
   (a1,a2)=b.atoms()
   b.delete()
   Bond(e,(a1,Atom(e,'*')))
   Bond(e,(a2,Atom(e,'*')))
for fragsmiles in e.M_SMILES:
   print(fragsmiles)

The output is identical.