Revision as of 04:12, 2 February 2010
Ertl, Rohde, and Selzer (J. Med. Chem., 43:3714-3717, 2000) published an algorithm for fast calculation of molecular polar surface area (PSA). Part of it involves summing partial surface values based on fragment contributions. Each fragment corresponds to a SMARTS match.
The goal of this task is to get an idea of how to do a set of SMARTS matches when the data comes in from an external table. In this case it's a data table from TJ O'Donnell's CHORD chemistry extension for PostgreSQL, listed at http://www.gnova.com/book/tpsa.tab and available for use here with permission. Each line in the file contains three tab-separated fields. The first line is the header. Each of the other lines defines one fragment contribution: the first field is the partial surface area contributed by each match of the SMARTS pattern given in the second field, and the last field is a comment.
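As a toolkit-independent illustration of the file layout described above, here is a minimal sketch that reads such a table using only the Python standard library. The function name read_fragment_table and the two sample rows are made up for this example; they are not taken from the real tpsa.tab.

```python
import io

def read_fragment_table(handle):
    """Return a list of (contribution, smarts, comment) tuples, skipping the header line."""
    rows = []
    lines = handle.read().splitlines()
    for line in lines[1:]:          # lines[0] is the header
        if not line.strip():
            continue                # tolerate blank lines
        # Split into at most three fields so tabs in the comment are kept
        value, smarts, comment = line.split("\t", 2)
        rows.append((float(value), smarts, comment))
    return rows

# Hypothetical sample data in the same three-column layout (NOT the real table)
SAMPLE = (
    "psa\tsmarts\tcomment\n"
    "1.5\t[#7]\tmade-up nitrogen row\n"
    "2.0\t[#8]\tmade-up oxygen row\n"
)

rows = read_fragment_table(io.StringIO(SAMPLE))
for value, smarts, comment in rows:
    print(value, smarts, comment)
```

With the real file, the same function could be called as read_fragment_table(open("tpsa.tab")), assuming the file follows the layout described here.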
To compute the topological polar surface area (for purposes of this task) of a given structure, take the sum over all fragment contributions, weighted by the number of times that fragment matches.
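The weighted sum can be sketched without any chemistry toolkit. In this hypothetical example the match counts are supplied by hand; in a real implementation they would come from running each SMARTS pattern against the molecule. The contribution values and counts are invented for illustration.

```python
def weighted_sum(contributions, match_counts):
    """Sum each fragment's contribution times the number of times it matched.

    contributions: list of (value, smarts) pairs
    match_counts:  dict mapping smarts -> number of matches (missing means 0)
    """
    return sum(value * match_counts.get(smarts, 0)
               for value, smarts in contributions)

# Made-up contributions and match counts, for illustration only
contributions = [(1.5, "[#7]"), (2.0, "[#8]"), (3.0, "[#16]")]
match_counts = {"[#7]": 4, "[#8]": 2}   # the third pattern does not match

total = weighted_sum(contributions, match_counts)
print(total)   # 1.5*4 + 2.0*2 + 3.0*0 = 10.0
```

Patterns that do not match simply contribute zero, so there is no need to filter them out beforehand.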
Implementation
Write a function or method named "TPSA" which gets its data from the file "tpsa.tab". The function should take a molecule record as input, and return the TPSA value as a float. Use the function to calculate the TPSA of "CN2C(=O)N(C)C(=O)C1=C2N=CN1C". The answer should be 56.22, which agrees exactly with Ertl's online TPSA tool but not with PubChem's value of 58.4.
OpenEye/Python
from openeye.oechem import *
import collections

# Some place to store the pattern definitions
Pattern = collections.namedtuple("Pattern", ["value", "subsearch"])
patterns = []

# Get the patterns from the tpsa.tab file, ignoring the header line
with open("tpsa.tab") as infile:
    for line in infile.readlines()[1:]:
        # Extract the fields
        value, smarts, comment = line.split("\t")

        # Use the SMARTS to define a subsearch object
        subsearch = OESubSearch(smarts)

        # Store for later use
        patterns.append(Pattern(float(value), subsearch))

# Helper function to count how many times a substructure matches
def count_matches(subsearch, mol):
    return sum(1 for match in subsearch.Match(mol))

def TPSA(mol):
    "Compute the topological polar surface area of a molecule"
    return sum(count_matches(pattern.subsearch, mol) * pattern.value
               for pattern in patterns)

# Test it with the reference structure
mol = OEGraphMol()
OEParseSmiles(mol, "CN2C(=O)N(C)C(=O)C1=C2N=CN1C")
print(TPSA(mol))