© 1998 by British Computer Society
Similarity and Dissimilarity Methods for Processing Chemical Structure Databases
1 Krebs Institute for Biomolecular Research and Department of Information Studies, University of Sheffield, Sheffield S10 2TN, UK. Email: p.willett{at}sheffield.ac.uk
2 Glaxo Wellcome Research and Development Limited, Gunnels Wood Road, Stevenage SG1 2NY, UK
This paper reviews measures of similarity and dissimilarity between pairs of chemical molecules and the use of such measures for processing chemical databases. The applications discussed include similarity searching, database clustering and diversity analysis, with a focus on measures based on fragment occurrence data encoded as bit-strings. The paper then discusses recent work on the calculation of similarity by aligning molecular fields and on the selection of structurally diverse subsets of chemical databases.
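
As an illustration of a bit-string similarity measure of the kind reviewed here, the following is a minimal sketch and not the authors' implementation. It assumes the Tanimoto coefficient, which the abstract does not name but which is a common choice for fragment bit-strings, and it assumes each fingerprint is stored as a Python integer whose set bits mark the fragments present. The function names and the toy fingerprints are hypothetical.

```python
def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient: bits in common divided by bits set in either."""
    common = bin(a & b).count("1")
    union = bin(a | b).count("1")
    # Assumed convention: two empty fingerprints are treated as identical.
    return common / union if union else 1.0


def dissimilarity(a: int, b: int) -> float:
    """Complement of the Tanimoto coefficient, usable as a distance."""
    return 1.0 - tanimoto(a, b)


def similarity_search(query: int, database: dict, top_k: int = 3) -> list:
    """Rank database molecules by decreasing similarity to the query."""
    scored = [(name, tanimoto(query, fp)) for name, fp in database.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]


# Toy fingerprints (hypothetical data, 4 fragment positions each).
database = {"mol_a": 0b1011, "mol_b": 0b0100, "mol_c": 0b0011}
hits = similarity_search(0b0011, database, top_k=2)
```

The same dissimilarity function could serve as the distance in clustering or in selecting a diverse subset, under the same assumptions.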