Chemistry Toolkit Rosetta Wiki

Read the benzodiazepine file and report the number of records whose molecular weight is between 300 and 400. The range is inclusive (300 ≤ weight ≤ 400), but since none of the compounds has a mass exactly at either endpoint, this does not affect the result.

The output of the program should be a single line reporting "3916".
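The inclusive range test at the heart of the task can be sketched independently of any toolkit. This is only an illustration: `count_in_range` is a hypothetical helper, and the weights are made-up values, not taken from the benzodiazepine file.

```python
def count_in_range(weights, low=300.0, high=400.0):
    """Count weights with low <= w <= high (both endpoints included)."""
    return sum(1 for w in weights if low <= w <= high)


# Made-up example weights, not taken from the benzodiazepine file.
example_weights = [284.7, 300.0, 315.8, 400.0, 400.1]

# 300.0 and 400.0 are counted because the range is inclusive.
print(count_in_range(example_weights))
```

Each toolkit solution follows the same pattern, with the list replaced by an iterator over the records in the file and the weight supplied by the toolkit.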


Identical to OpenBabel/Pybel and RDKit/Cinfony solutions.

from cinfony import cdk

moliter = cdk.readfile("sdf", "benzodiazepine.sdf")
print sum(1 for mol in moliter if 300 <= mol.molwt <= 400)


Identical to RDKit/Cinfony and CDK/Cinfony solutions.

import pybel

moliter = pybel.readfile("sdf", "benzodiazepine.sdf.gz")
print sum(1 for mol in moliter if 300 <= mol.molwt <= 400)


require 'rubabel'
puts Rubabel.foreach("benzodiazepine.sdf.gz").count {|mol| (300..400) === mol.mol_wt }


from openeye.oechem import *

ifs = oemolistream("benzodiazepine.smi")
print sum(1 for mol in ifs.GetOEGraphMols() if 300 <= OECalculateMolecularWeight(mol) <= 400)


Identical to OpenBabel/Pybel and CDK/Cinfony solutions.

from cinfony import rdk

moliter = rdk.readfile("sdf", "benzodiazepine.sdf")
print sum(1 for mol in moliter if 300 <= mol.molwt <= 400)


set n 0
molfile loop "benzodiazepine.sdf.gz" eh {
  if {range([ens get $eh E_WEIGHT],300,400)} { incr n }
}
puts $n

Newer versions of the Cactvs toolkit have a parameter to control whether implicit hydrogens should also be used in the molecular weight determination, which is set by default. Adding hydrogens during the scan (as was done in earlier versions of the sample code) is thus no longer needed.
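To see why the hydrogen count matters for a weight filter, the arithmetic can be sketched in plain Python. This does not use the Cactvs API: `formula_weight` is a hypothetical helper, the atomic weights are rounded standard values, and diazepam (C16H13ClN2O) is chosen only as a familiar benzodiazepine.

```python
# Rounded standard atomic weights in g/mol.
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "Cl": 35.45}


def formula_weight(counts, include_hydrogens=True):
    """Sum atomic weights for an element -> atom count mapping."""
    return sum(
        ATOMIC_WEIGHT[element] * n
        for element, n in counts.items()
        if include_hydrogens or element != "H"
    )


# Diazepam, C16H13ClN2O (illustrative choice, not read from the input file).
diazepam = {"C": 16, "H": 13, "Cl": 1, "N": 2, "O": 1}

with_h = formula_weight(diazepam)
without_h = formula_weight(diazepam, include_hydrogens=False)
print(round(with_h, 2), round(without_h, 2))
```

Leaving out the 13 hydrogens lowers the weight by about 13 g/mol, enough to move a compound near 300 or 400 across the filter boundary and change the count.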

This can be further optimized by directly using a scan function on the input file. For SD files, this is not notably faster than explicit reading and computation, but for files which contain the molecular weight in direct-access and potentially indexed file formats (the Cactvs toolkit provides two such files for optimized access) this can be much faster.

puts [molfile scan benzodiazepine.sdf.gz "E_WEIGHT <-> {300 400}" count]


The Python versions look very similar:

n = 0
for e in Molfile('benzodiazepine.sdf.gz'):
    if 300 <= e.E_WEIGHT <= 400: n += 1
print(n)


print(Molfile.Scan('benzodiazepine.sdf.gz','E_WEIGHT <-> {300 400}','count'))