Similarity Methods PDF
University of the Philippines Manila
Billones
Summary
These lecture notes discuss several methods for determining the similarity between molecules, including similarity based on 2D fingerprints, similarity coefficients, other 2D descriptor methods, and 3D similarity. They also cover the similar property principle that motivates similarity searching, and the evaluation of similarity methods.
Full Transcript
5 Similarity Methods
Similarity Based on 2D Fingerprints
Similarity Coefficients
Other 2D Descriptor Methods
3D Similarity
Billones Lecture Notes

Similarity Searching
• offers a complementary alternative to substructure searching and 3D pharmacophore searching
• a query compound is used to search a database to find those compounds that are most similar to it
• the database is then sorted in order of decreasing similarity to the query

Advantages of Similarity Searching
• no need to define a precise substructure or pharmacophore query
  o since a single active compound is sufficient to initiate a search
• the user has control over the size of the output
  o every compound in the database is given a numerical score that can be used to generate a complete ranking
• one can specify a particular level of similarity and retrieve just those compounds that exceed the threshold
• similarity searching facilitates an iterative approach to searching chemical databases
  o the top-scoring compounds resulting from one search can be used as queries in subsequent similarity searches

Rationale for Similarity Searching
Similar Property Principle [Johnson and Maggiora 1990]
“Structurally similar molecules tend to have similar properties.”
• Thus, given a molecule of known biological activity, compounds that are structurally similar to it are likely to exhibit the same activity.
  o this is referred to as neighborhood behavior [Patterson 1996]

Evidence for the Neighborhood Principle
Morphine, codeine and heroin are all active at opioid receptors.
• The structural similarities of these compounds are clear.
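The ranking, threshold retrieval and iterative querying described under Advantages of Similarity Searching can be sketched in a few lines of Python. This is only an illustration: the compound names and scores are hypothetical, and the scores are assumed to have been computed beforehand by some similarity measure.

```python
from typing import Dict, List, Tuple

def rank_by_similarity(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort database compounds in order of decreasing similarity to the query."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

def above_threshold(scores: Dict[str, float], threshold: float) -> List[Tuple[str, float]]:
    """Keep only the compounds whose score exceeds the chosen similarity level."""
    return [(name, s) for name, s in rank_by_similarity(scores) if s > threshold]

# Hypothetical, precomputed similarity scores of four database compounds to one query.
scores = {"cpd_A": 0.82, "cpd_B": 0.35, "cpd_C": 0.67, "cpd_D": 0.91}

ranking = rank_by_similarity(scores)             # complete ranking of the database
hits = above_threshold(scores, 0.6)              # output size controlled by the threshold
next_queries = [name for name, _ in ranking[:2]] # top scorers reused as new queries

print(ranking)
print(hits)
print(next_queries)
```

Raising or lowering the threshold changes how many compounds are returned, which is the control over output size that the notes refer to.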
Difficulty with Similarity Searching
• subjective assessment of the degree of similarity between two objects
  o no “hard and fast” rules
• to quantify the similarity, two things are needed: a set of numerical descriptors that can be used to compare molecules, and a similarity coefficient

5.1 Similarity Based on 2D Fingerprints
2D fingerprints
• most commonly used similarity method
• binary vectors where each bit indicates the presence (“1”) or absence (“0”) of a particular substructural fragment within a molecule
Tanimoto coefficient
• quantifies the similarity between two molecules
• gives a measure of the number of fragments in common between molecules
The Tanimoto similarity between molecules A and B, represented by binary vectors, is given by:
\[ S_{AB} = \frac{c}{a + b - c} \]
where
a = no. of bits set to “1” in molecule A
b = no. of bits set to “1” in molecule B
c = no. of “1” bits common to both A and B
• the value of the Tanimoto coefficient ranges from 0 to 1
• “1” indicates that the molecules have identical fingerprint representations
• “0” indicates that there is no similarity (no bits in common)
e.g. Calculating similarity using binary vector representations and the Tanimoto coefficient.

5.2 Similarity Coefficients
The Tanimoto coefficient is the most widely used similarity coefficient for binary fingerprints such as structural keys and hashed fingerprints.
• Tanimoto and Dice coefficients measure similarity directly.
• Hamming and Euclidean coefficients provide the distance (or dissimilarity) between pairs of molecules.
• Coefficients are monotonic with each other if they produce the same similarity rankings.
  e.g. The Hamming and Euclidean distances are monotonic, as are the Tanimoto and Dice coefficients.
• Tanimoto, Dice and Cosine coefficients are all directly dependent upon the number of bits in common.
• The presence of common molecular features will therefore tend to increase the values of these coefficients.
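The coefficients named so far can be written directly in terms of the bit counts a, b and c. The Python sketch below uses the standard binary-fingerprint forms of the Tanimoto, Dice and Cosine coefficients and of the Hamming and Euclidean distances. The two 10-bit fingerprints are hypothetical, and returning 0.0 when both fingerprints are empty is a convention assumed here, not something stated in the notes.

```python
import math
from typing import Sequence, Tuple

def bit_counts(fp_a: Sequence[int], fp_b: Sequence[int]) -> Tuple[int, int, int]:
    """Return (a, b, c): bits set in A, bits set in B, bits set in both."""
    a = sum(fp_a)
    b = sum(fp_b)
    c = sum(x & y for x, y in zip(fp_a, fp_b))
    return a, b, c

def tanimoto(fp_a, fp_b) -> float:
    a, b, c = bit_counts(fp_a, fp_b)
    denom = a + b - c
    return c / denom if denom else 0.0  # assumed convention for two empty fingerprints

def dice(fp_a, fp_b) -> float:
    a, b, c = bit_counts(fp_a, fp_b)
    return 2 * c / (a + b) if (a + b) else 0.0

def cosine(fp_a, fp_b) -> float:
    a, b, c = bit_counts(fp_a, fp_b)
    return c / math.sqrt(a * b) if a and b else 0.0

def hamming(fp_a, fp_b) -> int:
    a, b, c = bit_counts(fp_a, fp_b)
    return a + b - 2 * c  # number of bit positions that differ

def euclidean(fp_a, fp_b) -> float:
    return math.sqrt(hamming(fp_a, fp_b))

# Hypothetical 10-bit fingerprints: a = 6, b = 5, c = 4.
fp_A = [1, 1, 1, 0, 1, 0, 1, 0, 0, 1]
fp_B = [1, 0, 1, 0, 1, 1, 0, 0, 0, 1]

print("Tanimoto :", round(tanimoto(fp_A, fp_B), 3))   # 4 / (6 + 5 - 4)
print("Dice     :", round(dice(fp_A, fp_B), 3))       # 2 * 4 / (6 + 5)
print("Cosine   :", round(cosine(fp_A, fp_B), 3))     # 4 / sqrt(6 * 5)
print("Hamming  :", hamming(fp_A, fp_B))              # 6 + 5 - 2 * 4
print("Euclidean:", round(euclidean(fp_A, fp_B), 3))  # sqrt of the Hamming distance
```

For binary data the Dice value can be obtained from the Tanimoto value T as 2T / (1 + T), and the Euclidean distance is the square root of the Hamming distance. Both mappings are strictly increasing, which is why each pair produces the same rankings.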
• The Hamming and Euclidean distances regard a common absence of features as evidence of similarity.
• Small molecules tend to have lower similarity values when using the popular Tanimoto coefficient, since they naturally tend to have fewer bits set to “1” than large molecules.
• Smaller molecules can appear to be closer together when using the Hamming distance, which does take into account the common absence of features.
A comparison of the Soergel and Hamming distance values for two pairs of structures to illustrate the effect of molecular size. (For binary fingerprints the Soergel distance equals 1 minus the Tanimoto coefficient.)

5.3 Other 2D Descriptor Methods
• many of the 2D descriptors can be used to compute similarity values
• continuous whole-molecule properties such as the calculated log P, molar refractivity and topological indices can also be used
• prior to calculating similarities, the data should be scaled to ensure that each descriptor makes an equal contribution to the similarity score
• statistical methods such as PCA are often used to reduce a large set of descriptors to a smaller set of orthogonal descriptors
  o Basak [1988] reduced 90 topological indices to ten PCs
  o Xue [1999] used short binary bitstrings consisting of just 32 structural fragments combined with three numerical 2D descriptors (# of aromatic bonds, # of hydrogen bond acceptors and the fraction of rotatable bonds)

5.4 3D Similarity
2D structure-based methods tend to identify molecules with common substructures, whereas the aim is often to identify structurally different molecules that share the same activity. Molecular recognition depends on the 3D structure and properties (e.g. electrostatics and shape) of a molecule rather than the underlying substructure(s).
Similarities to morphine calculated using Daylight fingerprints and the Tanimoto coefficient.
• Methadone has a very low similarity score, but is active at opioid receptors.
• Methadone can be aligned with morphine to give a good overlap of the benzene rings and the nitrogen atoms.
3D overlay of morphine and methadone showing the superimposition of the basic nitrogens and the benzene rings.
• Thus, there is much interest in similarity measures based on 3D properties.

5.4.1 Gnomonic Projection Methods
Gnomonic projection
• the molecule is positioned at the center of a sphere and its properties are projected onto the surface of the sphere
• the similarity between two molecules is then determined by comparing the spheres
• typically the sphere is approximated by a tessellated icosahedron or dodecahedron
• in the SPERM program, molecular properties are mapped onto the vertices of an icosahedron
• a database structure can then be aligned to a target by rotating it about the center relative to the target until the differences between the vertices of the two molecules are minimized
Schematic illustration of the use of gnomonic projections to compare molecules.
• In practice each of the triangular faces of the icosahedron is divided (tessellated) into a series of smaller triangles.
• the similarity is based on calculating the root mean squared difference between the properties calculated at each of the n vertices, r:
\[ RMSD_{AB} = \sqrt{\frac{\sum_{r=1}^{n} \left( P_{A,r} - P_{B,r} \right)^{2}}{n}} \]
where P_{A,r} and P_{B,r} are the properties of molecules A and B at point r.

5.4.6 Comparison and Evaluation of Similarity Methods
• based on the calculation of enrichment factors and hit rates [Edgar 2000]
  o the enrichment factor is the ratio of the number of actives actually retrieved at a given rank to the number of actives expected to be retrieved by pure chance
• based on simulated screening experiments performed using databases of compounds of known activity
  o If the similarity method is perfect then all active compounds will appear at the top of the list; if it is ineffective then the active compounds will be evenly distributed throughout the list.
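The enrichment factor and hit rate defined above can be illustrated with a small Python sketch. The ranked list is hypothetical (20 compounds, 4 of them active), and the chance expectation is taken to be the number of compounds inspected multiplied by the overall fraction of actives, which is one common reading of “retrieved by pure chance”.

```python
from typing import Sequence

def hit_rate(ranked_is_active: Sequence[int], n_top: int) -> float:
    """Fraction of the top n_top ranked compounds that are active."""
    return sum(ranked_is_active[:n_top]) / n_top

def enrichment_factor(ranked_is_active: Sequence[int], n_top: int) -> float:
    """Actives found in the top n_top divided by the number expected by chance."""
    total = len(ranked_is_active)
    total_actives = sum(ranked_is_active)
    if total_actives == 0 or not 0 < n_top <= total:
        raise ValueError("need at least one active and 0 < n_top <= list length")
    found = sum(ranked_is_active[:n_top])
    expected_by_chance = n_top * total_actives / total
    return found / expected_by_chance

# Hypothetical similarity ranking: 1 = active, 0 = inactive, best-scoring compound first.
ranked = [1, 1, 0, 1, 0, 0, 0, 0, 0, 1] + [0] * 10

print("Hit rate in top 5   :", hit_rate(ranked, 5))            # 3 of 5 are active
print("Enrichment in top 5 :", enrichment_factor(ranked, 5))   # 3 found vs 1 expected
print("Enrichment, all 20  :", enrichment_factor(ranked, 20))  # whole list gives 1.0
```

An enrichment factor above 1 at a given rank means the method has placed more actives near the top of the list than random selection would; a value of 1 means it does no better than chance.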
Simulated screening experiment illustrating how a similarity method can retrieve active compounds more effectively than simple random selection.

Activity 5
Perform a similarity search using the SwissSimilarity tool (http://swisssimilarity.ch) with methamphetamine as the query molecule.
CNC(C)Cc1ccccc1
Explore 3 classes of compounds:
1) Bioactive: Compound Library: ChEMBL full database; Screening Method: 2D & 3D Combined
2) Commercial: Compound Library: ZINC (Drug-like); Screening Method: 3D Electroshape
3) Synthesizable: Compound Library: Enamine; Screening Method: FP2
Report the structure, similarity score, and SMILES of the top 10 hits for each method. (Exclude duplicates and hits that are identical to the query molecule.)