Report how many SD file records are within a certain molecular weight range

Read the benzodiazepine file and report the number of records whose molecular weight lies between 300 and 400. The range is inclusive, so 300 ≤ weight ≤ 400. None of the compounds has a weight exactly equal to either endpoint, so including or excluding the endpoints does not change the result.

The output of the program should be a single line reporting "3916".
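Independent of any toolkit, the counting step is just an inclusive range test applied to each record's weight. The following is a minimal sketch of that pattern; the function name `count_in_range` and the sample weights are hypothetical and are not taken from the benzodiazepine data set.

```python
def count_in_range(weights, low=300.0, high=400.0):
    """Count values w with low <= w <= high (both endpoints included)."""
    # The chained comparison tests both bounds in a single expression.
    return sum(1 for w in weights if low <= w <= high)


# Hypothetical molecular weights, for illustration only.
sample_weights = [284.7, 300.0, 315.2, 400.0, 400.1]
print(count_in_range(sample_weights))
```

Each toolkit solution below follows the same shape: iterate over the records, obtain a weight for each one, and count those that pass the test.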

CDK/Cinfony
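Identical to the OpenBabel/Pybel and RDKit/Cinfony solutions, apart from the import.

from cinfony import cdk

# Iterate over the SD records and count those in the inclusive weight range.
moliter = cdk.readfile("sdf", "benzodiazepine.sdf")
print(sum(1 for mol in moliter if 300 <= mol.molwt <= 400))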

OpenBabel/Pybel
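Identical to the RDKit/Cinfony and CDK/Cinfony solutions, apart from the import.

import pybel

# Iterate over the SD records and count those in the inclusive weight range.
moliter = pybel.readfile("sdf", "benzodiazepine.sdf.gz")
print(sum(1 for mol in moliter if 300 <= mol.molwt <= 400))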

OpenEye/Python
from openeye.oechem import *

# Create the input stream, open the file, then count matching molecules.
ifs = oemolistream()
ifs.open("benzodiazepine.smi")
print(sum(1 for mol in ifs.GetOEGraphMols() if 300 <= OECalculateMolecularWeight(mol) <= 400))

RDKit/Cinfony
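Identical to the OpenBabel/Pybel and CDK/Cinfony solutions, apart from the import.

from cinfony import rdk

# Iterate over the SD records and count those in the inclusive weight range.
moliter = rdk.readfile("sdf", "benzodiazepine.sdf")
print(sum(1 for mol in moliter if 300 <= mol.molwt <= 400))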

Cactvs/Tcl
set n 0
molfile loop "benzodiazepine.sdf.gz" eh {
    if {range([ens get $eh E_WEIGHT],300,400)} { incr n }
}
puts $n

Newer versions of the Cactvs toolkit have a parameter that controls whether implicit hydrogens are included in the molecular weight determination, and it is set by default. Adding hydrogens during the scan (as was done in earlier versions of the sample code) is thus no longer needed.
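To see why hydrogen handling matters for a weight filter, consider the arithmetic for a small molecule. The sketch below is plain Python rather than Cactvs code; the helper `molecular_weight` is hypothetical, benzene is used only as a convenient example, and the atomic weights are standard values rounded to three decimals.

```python
# Standard atomic weights, rounded to three decimals.
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008}


def molecular_weight(atom_counts):
    """Sum the atomic weights for a mapping of element symbol to atom count."""
    return sum(ATOMIC_WEIGHT[el] * n for el, n in atom_counts.items())


# Benzene, C6H6: once with its hydrogens, once counting heavy atoms only.
benzene_with_h = molecular_weight({"C": 6, "H": 6})
benzene_heavy_only = molecular_weight({"C": 6})
print(round(benzene_with_h, 3), round(benzene_heavy_only, 3))
```

Leaving out the hydrogens lowers the computed weight by about 6 units here, which would be enough to move a compound near a boundary into or out of the 300 to 400 range.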

The counting loop can be further optimized by applying a scan function directly to the input file. For SD files this is not notably faster than explicit reading and computation, but for direct-access and potentially indexed file formats that store the molecular weight (the Cactvs toolkit provides two such formats for optimized access) it can be much faster.

puts [molfile scan benzodiazepine.sdf.gz "E_WEIGHT <-> {300 400}" count]
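The benefit of an indexed weight field can be illustrated in plain Python. This is only a sketch of the general idea and makes no claim about how Cactvs implements its scan or its file formats: if the weights are already stored in sorted order, the inclusive range count needs two binary searches instead of a pass over every record. The function name and the sample values are hypothetical.

```python
from bisect import bisect_left, bisect_right


def count_in_sorted_range(sorted_weights, low, high):
    """Count values in [low, high] using two binary searches."""
    # bisect_left finds the first position with value >= low;
    # bisect_right finds the first position with value > high.
    return bisect_right(sorted_weights, high) - bisect_left(sorted_weights, low)


# Hypothetical precomputed weights, kept sorted as an index would be.
weights = sorted([284.7, 300.0, 315.2, 400.0, 400.1])
print(count_in_sorted_range(weights, 300, 400))
```

With a plain SD file no such index exists, so every record still has to be parsed and its weight computed, which is consistent with the scan being no faster in that case.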

Cactvs/Python
The Python versions look very similar:

n = 0
for e in Molfile('benzodiazepine.sdf.gz'):
    if e.E_WEIGHT >= 300 and e.E_WEIGHT <= 400:
        n += 1
print(n)

and

print(Molfile.Scan('benzodiazepine.sdf.gz','E_WEIGHT <-> {300 400}','count'))