Parse the strings "Q" and "C1C" as if they were SMILES and write any error or warning messages to a file. Neither string should be accepted as a valid SMILES.
There needs to be a similar test for SD files.
Hmm, even better would be to parse a SMILES file and SD file containing errors, to see if the respective readers can skip the errors.
The idea here is that an application (including a web app) may want to report that an input structure was incorrect, and give some information about what was wrong.
OpenBabel/Rubabel
require 'rubabel'
File.open("log.txt", 'w') do |out|
  %w(Q C1C).each do |smile|
    # log the string if parsing raises an error
    Rubabel[smile] rescue out.puts "bad smiles #{smile}"
  end
end
Cactvs/Tcl
In Tcl
set fh [open log.txt w]
foreach smiles [list "Q" "C1C"] {
    # catch returns true if the string cannot be decoded; msg holds the error text
    if {[catch {ens create $smiles} msg]} {
        puts $fh $msg
    }
}
close $fh
The message is "Error: ens create failed: Failed to decode structure data specification"
For file input, you can do something like this:
set fh [molfile open "dubious.smi"]
while 1 {
    if {[catch {molfile read $fh} eh]} {
        # an error at EOF ends the loop; any other error is reported and skipped
        if {[molfile get $fh eof]} break
        puts $eh
        continue
    }
    ens delete $eh
}
molfile close $fh
The logged messages about corrupted records are typically something like "Data file syntax error in line 99 record 6".
All I/O modules in the toolkit have the capability to re-sync the file (trivial for SMILES, not so simple for SDF). In the read loop, this happens automatically.
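To illustrate why re-syncing an SD file is possible at all, here is a toolkit-independent sketch in plain Python. It is not how Cactvs does it. It assumes V2000 records terminated by a "$$$$" line, and the helper names (iter_sd_records, check_counts_line, scan_sd) and the sample data are made up for this example. The check is deliberately crude: it only looks at the counts line and the record length.

```python
import io

def iter_sd_records(stream):
    """Yield (record_number, lines) for each '$$$$'-terminated record."""
    record, number = [], 1
    for line in stream:
        line = line.rstrip("\n")
        if line.strip() == "$$$$":
            # the delimiter is the re-sync point: whatever came before
            # belongs to the current record, good or bad
            yield number, record
            record, number = [], number + 1
        else:
            record.append(line)
    if any(l.strip() for l in record):
        yield number, record  # trailing record without a terminator

def check_counts_line(lines):
    """Rough sanity check of a V2000 molblock; raises ValueError if it fails."""
    if len(lines) < 4:
        raise ValueError("record too short to hold a counts line")
    counts = lines[3]
    try:
        # atom and bond counts occupy the first two 3-character fields
        n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])
    except ValueError:
        raise ValueError("unreadable counts line: %r" % counts)
    if len(lines) < 4 + n_atoms + n_bonds:
        raise ValueError("record truncated")
    return n_atoms, n_bonds

def scan_sd(stream):
    """Return (numbers of good records, error messages), skipping bad records."""
    good, errors = [], []
    for number, lines in iter_sd_records(stream):
        try:
            check_counts_line(lines)
            good.append(number)
        except ValueError as exc:
            errors.append("record %d: %s" % (number, exc))
    return good, errors

# made-up three-record file; the second record has a corrupted counts line
SAMPLE = (
    "methane\n\n\n"
    "  1  0  0  0  0  0  0  0  0  0999 V2000\n"
    "    0.0000    0.0000    0.0000 C   0  0\n"
    "M  END\n$$$$\n"
    "broken\n\n\n"
    "this is not a counts line\n"
    "M  END\n$$$$\n"
    "water\n\n\n"
    "  1  0  0  0  0  0  0  0  0  0999 V2000\n"
    "    0.0000    0.0000    0.0000 O   0  0\n"
    "M  END\n$$$$\n"
)

good, errors = scan_sd(io.StringIO(SAMPLE))
for message in errors:
    print(message)
```

A real reader has to cope with more than this, for example a "$$$$" line that is itself missing or damaged, which is part of why SD re-syncing is harder than skipping one line of a SMILES file.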
Cactvs/Python
Here is essentially the same code in Python:
f = open('log.txt', 'w')
for smiles in ['Q', 'C1C']:
    try:
        e = Ens(smiles)
    except Exception as x:
        # the first exception argument holds the error message
        f.write(x.args[0] + "\n")
f.close()
and, for file input:
f = Molfile('dubious.smi')
while True:
    try:
        e = f.read()
    except Exception as x:
        # report the bad record and go on to the next one
        print(x.args[0])
    else:
        # None signals end of file
        if e is None:
            break
        e.delete()
f.close()
Note that there is a subtle difference between the Tcl and Python implementations of the structure file input command: in Tcl, hitting EOF raises an error, while in Python, None is returned.
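The Python convention (None at EOF, an exception for a bad record) lends itself to a small generator that hides the loop. The sketch below is not part of the toolkit. The names read_records and make_fake_reader are made up for this example, and the fake reader only stands in for a real file object so the code can run without Cactvs.

```python
import io

def read_records(read, log):
    """Yield good records from `read`, logging and skipping bad ones.

    `read` is any zero-argument callable that returns the next record,
    returns None at end of file, and raises an exception for a bad record.
    """
    while True:
        try:
            record = read()
        except Exception as exc:
            log.write("%s\n" % exc)
            continue
        if record is None:
            return
        yield record

def make_fake_reader(items):
    """Stand-in for a file reader: exceptions in `items` are raised, not returned."""
    it = iter(items)
    def read():
        item = next(it, None)  # None once the items are exhausted
        if isinstance(item, Exception):
            raise item
        return item
    return read

log = io.StringIO()
reader = make_fake_reader([
    "CCO",
    ValueError("Data file syntax error in line 2 record 2"),
    "c1ccccc1",
])
records = list(read_records(reader, log))
print(records)
print(log.getvalue(), end="")
```

If the file object behaves as described above, passing its bound read method (f.read in the example) as the read argument should work the same way. A reader that raises at EOF, as in Tcl, would need an explicit EOF check in the except branch, otherwise the loop never ends.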