Chemistry Toolkit Rosetta Wiki
No edit summary
(Undo spam 4440 by 117.59.217.236 (talk))
Line 277: Line 277:
 
</source>
 
</source>
   
  +
==Unix command-line==
Hi friends am 22 now .ma dad got a fiat 1968 model way back when i was in foturh grade..she looked beautiful as always..ma first drive with her was might be when i was in sixth grade ..today am a good d diver and can drive any dam car..we got a indigo marina back in 2005 and after that we din pay much attention towards ma fiat really feel sorry sometimes ..thats because she had a serious break problem ..and so obviously din drive her at all ma fiat had a good paint new interiors new tyres but the break thing was bad i love her a lot she looks dam good even today when i wash her during dasera as if she is sayin am still alive don forget me..the last time when i started her was in mid of 2007 and after that i just gave up din pay attentio n at all but i knew she was good enough but the excitment of the new car ws so much that i din think of correcting her brakes at all..but every year when i use to wash her for dasera i just use to sit an adore her she really looked beutiful,,some days back suddenly ma dad said lets give our fiat to someone who can use it ..i couldnt take at all as if someone is gonna take part of ma heart i then dicided that is somedays back i wont let that happen ..i wont let ma sweet heart some other hands..so now ma brain started workin..i got two bottles of oil poured into her and got a petrol of 200 rs and took ma indigo ka battery and gave her supply i thought she wont even reply at all thought she would be angry with me ..then with ma fingures crossed i twisted the key i heard ma sweet hearts voice after three years ..but now i was still not sure if she will start and will be alive ..i gave her ingnition for more seven times and beleave me at the eight time touch wood yes she started yes she was speaking so very smooth so very nice ..felt like never before .then i got a mechanic and plannin out for goa with her and ma friends friends just wanna tel you its a good car keep her if you have feelings attached to her she wont let you down speak wit her tell her how much you love her and how much you really care for her..love you all fiats and their oweners kapil kumar says:I have been a great fan of fiat .this car is very comfortable and economical to maintain.if u have a good knowledge about cars nothing better than this .these cars need a good knowledge other wise at times it creates a problem as we all know these cars are not new.send me a mail if u want to know how to take care of these cars.kapil kumar says:my email id is( }Glen says:Hi there,I am in the UK and I am wondering where I can get the spares for the Fiat 137D engine, and will they ship to the UK.Thanks in anicipation.Faleel Rahman says:hi all, i am also great fan for Fial Premier padmini. why dont we ask to premier ltd tostart to produce again premier padmini car? i think if we send request Premier auto ltd, they may be consider or will do the survey about that. think .my sugesstion is if premier auto ltd produce the same design but latest technology,Faleel Rahman says:and interior, with reasonable price, i am sure premier padmini will give tuff competition to other international brand cars. dear padmini fans consider to send request to PAL. thanksHemant Bhavsar says:Hi, i like Permier padmini & many people like this car. premier padmini is not failuer model, accualy i.e.some company problems about their personal issue, but people like that to restart premier padmini dissel version added with modify new technology of power stearing, power gear %
 
  +
It is possible to do this on the Unix command-line using a single (very long) line. To make it easier to understand I've written it using an awk script named 'heavy_atom_counts.awk'. Use it as:
  +
  +
<pre>gzcat benzodiazepine.sdf.gz | awk -f heavy_atom_counts.awk</pre>
  +
  +
<pre>
  +
# This file is named "heavy_atom_counts.awk"
  +
# Check track of the line number in the record
  +
{lineno += 1}
  +
  +
# Get the atom counts from the record
  +
lineno == 4 { num_atoms = int(substr($0, 1, 3)); num_hydrogens = 0}
  +
  +
# Count the number of hydrogens
  +
4 < lineno && lineno <= 4+num_atoms && substr($0, 32, 2) == "H " {num_hydrogens++}
  +
  +
# And the end of the record, print the number of heavy atoms and reset
  +
/^\$\$\$\$/ {print num_atoms - num_hydrogens; lineno = 0}
  +
</pre>
  +
   
 
==Cactvs/Tcl==
 
==Cactvs/Tcl==

Revision as of 19:11, 30 June 2014

For each record from the benzodiazepine file, print the total number of heavy atoms in each record (that is, exclude hydrogens). You must get the data from the connection table and not from the tag data. (In particular, don't use PUBCHEM_HEAVY_ATOM_COUNT .)

The output is one output line per record, containing the count as an integer. If at all possible, show how to read directly from the gzip'ed input SD file. The first 5 heavy atom counts are 21, 24, 22, 22 and 22. A copy of the expected output is available for reference.

Some of the hydrogens in the data set have atomic weights of 2 or 3, which may cause difficulties for toolkits which convert 1H hydrogens to implicit form while leaving 2H and 3H to explicit form. Please make sure your output agrees with the reference output.

Suggestion: Please add a couple of '$$$$' SD data lines to the file, to trip those naive implementations which think that a '$$$$' is automatically a record separator!

CDK/Java

Save the code to "countHeavyAtoms.java" then do

javac countHeavyAtoms.java
   java -cp ./:$HOME/cdk-1.2.4.1.jar countHeavyAtoms > countHeavyAtoms.out
import java.io.*;
import org.openscience.cdk.interfaces.IChemObjectBuilder;
import org.openscience.cdk.io.iterator.IteratingMDLReader;
import org.openscience.cdk.Molecule;
import org.openscience.cdk.DefaultChemObjectBuilder;
import org.openscience.cdk.interfaces.IAtom;

public class countHeavyAtoms {	
    public static void main(String[] args) {	
        FileReader sdfile = null;
        try {
            /* CDK does not automatically understand gzipped files */
            sdfile = new FileReader(new File("benzodiazepine.sdf"));
        } catch (FileNotFoundException e) {
            System.err.println("benzodiazepine.sdf not found");
            System.exit(1);
        }
        IteratingMDLReader mdliter = new IteratingMDLReader(sdfile,
                                            DefaultChemObjectBuilder.getInstance()
        while (mdliter.hasNext()) {
            Molecule mol = (Molecule) mdliter.next();
            int numHeavies = 0;
            for (IAtom atom : mol.atoms()) {
                if (atom.getAtomicNumber() > 1) {
                    numHeavies++;
                }
            }
            System.out.println(numHeavies);
        }
    }
}

CDK/Groovy

Save the code to "countHeavyAtoms.groovy" then do:

 groovy countHeavyAtoms.groovy > countHeavyAtoms.out
@GrabResolver(
  name='idea',
  root='http://ambit.uni-plovdiv.bg:8083/nexus/content/repositories/thirdparty/'
)
@Grapes([
  @Grab(
    group='org.openscience.cdk',
    module='cdk-io',
    version='1.4.11'
  ),
  @Grab(
    group='org.openscience.cdk',
    module='cdk-silent',
    version='1.4.11' 
  )
])

import org.openscience.cdk.interfaces.*;
import org.openscience.cdk.io.*;
import org.openscience.cdk.io.iterator.*;
import org.openscience.cdk.silent.*;
import org.openscience.cdk.tools.manipulator.*;
import java.util.zip.GZIPInputStream;

iterator = new IteratingMDLReader(
  new GZIPInputStream(
    new File("benzodiazepine.sdf.gz").newInputStream()
  ),
  SilentChemObjectBuilder.getInstance()
)
iterator.setReaderMode(IChemObjectReader.Mode.STRICT)
while (iterator.hasNext()) {
  mol = iterator.next()
  heavyAtomCount = 0
  for (atom in mol.atoms()) {
    if (1 == atom.atomicNumber || "H" == atom.symbol) {
      // do not count hydrogens
    } else {
      heavyAtomCount++
    }
  }
  println heavyAtomCount
}

CDK/Cinfony/Jython

from cinfony import cdk

manip = cdk.cdk.tools.manipulator.AtomContainerManipulator()
for mol in cdk.readfile("sdf", "benzodiazepine.sdf"):
    print len(manip.getHeavyAtoms(mol.Molecule))

Chemkit/C++

#include <iostream>

#include <chemkit/atom.h>
#include <chemkit/foreach.h>
#include <chemkit/molecule.h>
#include <chemkit/moleculefile.h>

using namespace chemkit;

int main(void)
{
    MoleculeFile file("benzodiazepine.sdf.gz");
    bool ok = file.read();
    if(!ok){
        std::cerr << "Error: " << file.errorString() << std::endl;
        return -1;
    }

    foreach(const Molecule *molecule, file.molecules()){
        int heavyAtomCount = 0;

        foreach(const Atom *atom, molecule->atoms()){
            if(!atom->isTerminalHydrogen()){
                heavyAtomCount++;
            }
        }

        std::cout << heavyAtomCount << std::endl;
    }

    return 0;
}

Indigo/C++

#include "base_cpp/scanner.h"
#include "molecule/molecule.h"
#include "molecule/sdf_loader.h"
#include "molecule/molfile_loader.h"

int main (void)
{
   const char *filename = "benzodiazepine.sdf"; // you can use ".sdf.gz" as well

   try
   {
      FileScanner scanner(filename);
      SdfLoader loader(scanner);

      while (!loader.isEOF())
      {
         loader.readNext();
         Molecule mol;
         BufferScanner molscan(loader.data);
         MolfileLoader molload(molscan);
         molload.loadMolecule(mol, false);
            
         int heavy_count = 0;
         for (int i = mol.vertexBegin(); i != mol.vertexEnd(); i = mol.vertexNext(i))
            if (mol.getAtom(i).label > ELEM_H)
               heavy_count++;

         cout << heavy_count << endl;
      }
   }
   catch (Exception &e)
   {
      cerr << "error: " << e.message() << endl;
      return -1;
   }

   return 0;
}

Instructions:

  1. Unpack 'graph' and 'molecule' projects into some folder
  2. Create 'utils' folder nearby
  3. Paste the above code into utils/count_heavy.cpp file
  4. Compile the file using the following commands:
    $ cd graph; make CONF=Release32; cd ..
    $ cd molecule; make CONF=Release32; cd ..
    $ cd utils
    $ gcc count_heavy.cpp -o count_heavy -O3 -m32 -I.. -I../common ../molecule/dist/Release32/GNU-Linux-x86/libmolecule.a ../graph/dist/Release32/GNU-Linux-x86/libgraph.a -lpthread -lstdc++
  5. Run the program like that:
    $ ./count_heavy

Indigo/Python

from indigo import Indigo
indigo = Indigo()

for mol in indigo.iterateSDFile("benzodiazepine.sdf.gz"):
    print mol.countHeavyAtoms()

OpenBabel/Python

import openbabel as ob

obconversion = ob.OBConversion()
obconversion.SetInFormat("sdf")

obmol = ob.OBMol()
notatend = obconversion.ReadFile(obmol, "benzodiazepine.sdf.gz")
while notatend:
    print obmol.NumHvyAtoms()
    # reuse the molecule for 0.3 seconds of savings vs. making an new one
    obmol.Clear()
    notatend = obconversion.Read(obmol)

OpenBabel/Pybel

This builds on and simplifies OpenBabel

import pybel

for mol in pybel.readfile("sdf", "benzodiazepine.sdf.gz"):
    print mol.OBMol.NumHvyAtoms()

OpenBabel/Python

import openbabel as ob

obconversion = ob.OBConversion()
obconversion.SetInFormat("sdf")

obmol = ob.OBMol()
notatend = obconversion.ReadFile(obmol, "benzodiazepine.sdf.gz")
while notatend:
    print obmol.NumHvyAtoms()
    # reuse the molecule for 0.3 seconds of savings vs. making an new one
    obmol.Clear()
    notatend = obconversion.Read(obmol)

OpenBabel/Rubabel

require 'rubabel'
Rubabel.foreach("benzodiazepine.sdf.gz") {|mol| puts mol.ob.num_hvy_atoms }

OpenEye/Python

from openeye.oechem import *

ifs = oemolistream()
ifs.open("benzodiazepine.sdf.gz")
for mol in ifs.GetOEGraphMols():
    print OECount(mol, OEIsHeavy())
    # Can't use
    #    print mol.NumAtoms()
    # because the input structure contains explicit hydrogens.


RDKit/Python

I could also have iterated over the atoms and only counted the non-hydrogen ones, but I doubt that would be any faster or elegant.

from rdkit import Chem

# RDKit does not seem to support gzip'ed SD files
suppl = Chem.SDMolSupplier("benzodiazepine.sdf")
heavy_patt = Chem.MolFromSmarts("[!#1]")

for mol in suppl:
    print len(mol.GetSubstructMatches(heavy_patt))

The method "mol.GetNumAtoms()" is supposed to report only the non-hydrogen counts, but actually reports the number of explicit atoms in the molecular graph.


RDKit/Cinfony/Python

from cinfony import rdk

for mol in rdk.readfile("sdf", "benzodiazepine.sdf"):
    print mol.calcdesc(['HeavyAtomCount'])['HeavyAtomCount']

Unix command-line

It is possible to do this on the Unix command-line using a single (very long) line. To make it easier to understand I've written it using an awk script named 'heavy_atom_counts.awk'. Use it as:

gzcat benzodiazepine.sdf.gz | awk -f heavy_atom_counts.awk
# This file is named "heavy_atom_counts.awk"
# Check track of the line number in the record
{lineno += 1}

# Get the atom counts from the record
lineno == 4 { num_atoms = int(substr($0, 1, 3)); num_hydrogens = 0}

# Count the number of hydrogens
4 < lineno && lineno <= 4+num_atoms && substr($0, 32, 2) == "H " {num_hydrogens++}

# And the end of the record, print the number of heavy atoms and reset
/^\$\$\$\$/ {print num_atoms - num_hydrogens; lineno = 0}


Cactvs/Tcl

molfile loop "benzodiazepine.sdf.gz" eh {
   puts [ens get $eh E_HEAVY_ATOM_COUNT]
}

The toolkit can read compressed files directly. The loop construct automatically discards the structure read for each record, so this works on arbitrarily big files. Also, the format of the input file can be anything which can be identified automatically, so this is not limited to SD files.

Cactvs/Python

for eh in Molfile('benzodiazepine.sdf.gz'):
    print(eh.E_HEAVY_ATOM_COUNT)

The Python language version has the same capabilities as the Tcl variant.