Document Details


Uploaded by DazzlingFreedom

University of the Philippines Manila

Billones

Tags

molecular descriptors chemical structure pharmacophore chemistry

Summary

This document provides lecture notes from the University of the Philippines Manila on molecular descriptors. It covers different types of descriptors, including constitutional, physicochemical, and topological descriptors, as well as related calculations and methods for data analysis. The text focuses on computational methods for understanding molecular properties.

Full Transcript


3 MOLECULAR DESCRIPTORS
Introduction
Descriptors Calculated from the 2D Structure
Descriptors Based on 3D Representations
Data Verification and Manipulation
Billones Lecture Notes

3.1 Introduction
Molecular descriptors
• allow the manipulation and analysis of chemical structural information
• are numerical values that characterize the properties of molecules
• may represent the physicochemical properties of a molecule, or may be values derived by applying algorithmic techniques to the molecular structure
• in general, the computational requirements increase with the level of discrimination achieved; e.g. molecular weight (MW) is fast to compute but conveys little about a molecule's properties
• some descriptors have an experimental counterpart (e.g. the octanol–water partition coefficient); others are purely algorithmic constructs (e.g. 2D fingerprints)

3.2 Descriptors Calculated from the 2D Structure
3.2.1 Simple Counts
• the simplest descriptors, based on simple counts of features:
  o number of hydrogen bond donors (HBD)
  o number of hydrogen bond acceptors (HBA)
  o number of ring systems (Nring)
  o number of rotatable bonds (Nrot)
  o molecular weight (MW)
• counts of substructures or molecular fragments, calculated from a 2D connection table
• low level of discrimination; often used in combination with other descriptors

Count of Molecular Fragments

Constitutional Descriptors

3.2.2 Physicochemical Properties
• Hydrophobicity is an important property in determining the activity and transport of drugs.
  o A molecule's hydrophobicity can affect how tightly it binds to a protein and its ability to pass through a cell membrane.
  o It is most commonly modelled using the logarithm of the partition coefficient between n-octanol and water (log P).
• Early estimation of log P was based on an additive scheme, in which the log P of a compound with a substituent X equals the log P of the parent compound plus the appropriate substituent constant πX [Fujita 1964]:
  $\log P_X = \log P_H + \pi_X$
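The additive substituent scheme is plain arithmetic, so a minimal sketch makes it concrete. The Python below assumes commonly tabulated values (benzene log P ≈ 2.13; aromatic π constants π_Cl ≈ 0.71, π_CH3 ≈ 0.56, π_OH ≈ −0.67); treat them as illustrative and check them against a reference table before relying on them. The function name `additive_log_p` is hypothetical, and summing several π values assumes the substituents do not interact, which goes beyond the single-substituent rule stated in the notes.

```python
# Additive substituent scheme: log P(parent + X) = log P(parent) + pi_X.
# Illustrative, commonly tabulated values; verify against a reference table.
PARENT_LOG_P = {"benzene": 2.13}
PI = {"Cl": 0.71, "CH3": 0.56, "OH": -0.67}


def additive_log_p(parent: str, substituents: list[str]) -> float:
    """Hypothetical helper: parent log P plus the pi constant of each substituent.

    Summing several pi values assumes the substituents do not interact.
    """
    return PARENT_LOG_P[parent] + sum(PI[x] for x in substituents)


if __name__ == "__main__":
    print(round(additive_log_p("benzene", ["Cl"]), 2))         # chlorobenzene estimate
    print(round(additive_log_p("benzene", ["Cl", "CH3"]), 2))  # chlorotoluene estimate
```

The fragment-based methods that follow generalize this idea by summing contributions of arbitrary fragments plus correction factors, instead of building on a parent compound.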
• Another method for estimating log P is based on breaking the molecule into fragments.
• The partition coefficient of the molecule then equals the sum of the fragment values plus a series of "correction factors" that account for interactions between the fragments, such as intramolecular hydrogen bonding [Rekker 1977, 1992]:
  $\log P = \sum_i a_i f_i + \sum_j b_j F_j$
  where there are ai fragments of type i, with fi the corresponding contribution, and bj occurrences of correction factor j, with Fj the value of that correction factor.
• The most widely used program of this type is ClogP, developed by Leo and Hansch [1993].

ClogP calculations for benzyl bromide and o-methylacetanilide
• An advantage of the fragment-based approach is that electronic interactions can be taken into account.
• A potential disadvantage is that it fails for molecules containing fragments for which no values have been provided.

Molecular Properties
1. Ghose-Crippen octanol-water partition coefficient (ALOGP)
2. Ghose-Crippen octanol-water partition coefficient squared (ALOGP2)
3. Ghose-Crippen molar refractivity (AMR)
4. Wang octanol-water partition coefficient (XLOGP)
5. Wang octanol-water partition coefficient squared (XLOGP2)
6. Hydrophilic index (Hy)
7. Unsaturation index (Ui)

$ALOGP = \sum_k a_k N_k$
where ak is the group contribution coefficient for the kth fragment type and Nk is the number of occurrences of the kth fragment type.

$ALOGP2 = ALOGP^2$ (simply the square of ALOGP)

$AMR = \sum_k a_k^{MR} N_k^{MR}$
where ak^MR is the molar refractivity group contribution coefficient for the kth fragment type and Nk^MR is the number of occurrences of the kth molar refractivity fragment type.

$XLOGP = \sum_i a_i A_i + \sum_j b_j B_j$
where Ai is the occurrence of the ith atom type, ai is the contribution of the ith atom type, Bj is the occurrence of the jth correction factor and bj is the contribution of the jth correction factor. The XLOGP descriptor is described in more detail by Wang and coworkers (Wang et al.).

$XLOGP2 = XLOGP^2$ (simply the square of XLOGP)

$Hy = \frac{(1 + N_{Hy}) \log_2 (1 + N_{Hy}) + N_C \left( \frac{1}{A} \log_2 \frac{1}{A} \right) + \sqrt{N_{Hy} / A^2}}{\log_2 (1 + A)}$
where NHy is the number of hydrophilic groups (the total number of hydrogens attached to oxygen, sulfur or nitrogen atoms), NC is the number of carbon atoms and A is the number of non-hydrogen atoms. The hydrophilicity index is described in more detail in the Handbook of Molecular Descriptors.

$Ui = \log_2 (1 + nDB + nTB + nAB)$
where nDB is the number of double bonds, nTB the number of triple bonds and nAB the number of aromatic bonds. The Ui descriptor is described in the user manual of Dragon (Talete…).

3.2.3 Molar Refractivity
The molar refractivity is given by:
  $MR = \frac{n^2 - 1}{n^2 + 2} \cdot \frac{MW}{d}$
where n = refractive index, d = density and MW = molecular weight.
• used as a measure of the steric bulk of a molecule
  o The refractive index term accounts for the polarizability of the molecule and does not vary much from one molecule to another.
• calculated using atomic contributions

3.2.4 Topological Indices
Topological indices
• are single-valued descriptors that can be calculated from the 2D graph representation of a molecule
• characterize structures according to their size, degree of branching and overall shape
e.g. the Wiener index
  o involves counting the number of bonds between each pair of atoms and summing these distances, Dij, over all such pairs:
  $W = \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} D_{ij}$

Topological Descriptors

MOLECULAR CONNECTIVITY INDICES
• Introduced by Randić [1975] and developed by Kier and Hall [1986].
Branching index
• calculated from the hydrogen-suppressed graph representation of a molecule
• based on the degree δi of each atom i.
  o the degree equals the number of adjacent non-hydrogen atoms
  o a bond connectivity value is calculated for each bond as the reciprocal of the square root of the product of the degrees of the two atoms in the bond
• the branching index equals the sum of the bond connectivities over all of the bonds in the molecule:
  $\chi = \sum_{bonds} \frac{1}{\sqrt{\delta_i \delta_j}}$

Chi molecular connectivity indices [Kier and Hall]
• the δi value is redefined in terms of the number of sigma electrons and the number of hydrogen atoms associated with an atom
• valence δi^v values are also introduced; these encode atomic and valence-state electronic information through counts of sigma, pi and lone pair electrons

The simple delta value, δi, is given by:
  $\delta_i = \sigma_i - h_i$
where σi = number of sigma electrons for atom i and hi = number of hydrogen atoms bonded to atom i.

The valence delta value for atom i, δi^v, is defined as:
  $\delta_i^v = Z_i^v - h_i$
where Zi^v = number of valence electrons (sigma, pi and lone pair electrons) for atom i.
For elements beyond fluorine in the periodic table, the valence delta expression is modified as follows:
  $\delta_i^v = \frac{Z_i^v - h_i}{Z_i - Z_i^v - 1}$
where Zi = atomic number.

Values of δi and δi^v for several common types of atom
• the simple delta value differentiates –CH3 from –CH2–
• –CH3 has the same simple delta value as –NH2, but a different valence delta value, so the two groups can be differentiated using δi^v

The chi molecular connectivity indices are sequential indices that sum the atomic delta values over bond paths of different lengths.
Zeroth-order chi index (0χ)
• summation over all atoms in a molecule (i.e. paths of length zero):
  $^0\chi = \sum_{atoms} \frac{1}{\sqrt{\delta_i}}$
First-order chi index (1χ)
• summation over bonds:
  $^1\chi = \sum_{bonds} \frac{1}{\sqrt{\delta_i \delta_j}}$
• the same as Randić's branching index when simple delta values are used
Higher-order chi indices
• summations over sequences of two, three, etc. bonds, e.g. $^2\chi = \sum_{paths} \frac{1}{\sqrt{\delta_i \delta_j \delta_k}}$

Chi indices for the various isomers of hexane.
e.g. n-hexane (simple delta values 1, 2, 2, 2, 2, 1 along the chain):
  $^2\chi = \frac{1}{\sqrt{1 \cdot 2 \cdot 2}} + \frac{1}{\sqrt{2 \cdot 2 \cdot 2}} + \frac{1}{\sqrt{2 \cdot 2 \cdot 2}} + \frac{1}{\sqrt{2 \cdot 2 \cdot 1}} = 1.707$

Chi Connectivity Indices

Assignment
A. Complete the table.
   Columns: Paths of Length 2 | Paths of Length 3 | Paths of Length 4 | 0χ | 1χ | 2χ | 3χ

3.2.5 Kappa Shape Indices
Kappa shape indices [Hall and Kier 1991]
• are designed to characterize aspects of molecular shape by comparing a molecule with the "extreme shapes" that are possible for that number of atoms
• there are shape indices of various orders (first, second, etc.)
  o The first-order shape index involves a count over single-bond fragments.
The first-order kappa index is defined as:
  $^1\kappa = \frac{2 \, {}^1P_{max} \, {}^1P_{min}}{({}^1P)^2}$
where 1P_max = number of edges (or paths of length one) in the completely connected graph, 1P_min = number of bonds in the linear molecule and 1P = number of bonds in the molecule for which the shape index is being calculated.

The two extreme shapes are the linear molecule and the completely connected graph, in which every atom is connected to every other atom.
Extreme shapes used in the first- and second-order kappa indices for graphs containing four, five and six atoms.
• The linear molecule corresponds to the minimum (middle column).
• The maximum for the first-order index corresponds to the completely connected graph (left-hand column), and for the second-order index to the star shape (right-hand column).

For a molecule containing A atoms, 1P_min = A − 1 and 1P_max = A(A − 1)/2. Thus 1κ becomes:
  $^1\kappa = \frac{A (A - 1)^2}{({}^1P)^2}$
The second-order kappa index is determined by the count of two-bond paths, 2P, with 2P_min = A − 2 and 2P_max = (A − 1)(A − 2)/2. Thus,
  $^2\kappa = \frac{2 \, {}^2P_{max} \, {}^2P_{min}}{({}^2P)^2} = \frac{(A - 1)(A - 2)^2}{({}^2P)^2}$
• The kappa indices themselves do not include any information about the identity of the atoms.
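The graph-based indices of 3.2.4 and 3.2.5 can be computed directly from a hydrogen-suppressed graph. The sketch below, in Python with only the standard library, computes the Wiener index (bond-count distances summed over all atom pairs), the 0χ, 1χ and 2χ indices (reciprocal square roots of delta products over atoms, bonds and two-bond paths) and the first- and second-order kappa indices, A(A − 1)²/(1P)² and (A − 1)(A − 2)²/(2P)². It assumes saturated hydrocarbons, where the simple delta value equals the number of heavy-atom neighbours, so it does not handle heteroatoms or valence deltas; it also assumes at least three heavy atoms so that 2P is non-zero. The graph dictionaries and function names are hypothetical.

```python
import math
from collections import deque
from itertools import combinations

# Hydrogen-suppressed graphs as adjacency lists (atom index -> neighbours).
# Saturated hydrocarbons only: simple delta = number of heavy-atom neighbours.
N_HEXANE = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
DIMETHYLBUTANE_22 = {0: [1], 1: [0, 2, 4, 5], 2: [1, 3], 3: [2], 4: [1], 5: [1]}


def distances_from(graph, start):
    """Breadth-first search: number of bonds from start to every atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for nb in graph[atom]:
            if nb not in dist:
                dist[nb] = dist[atom] + 1
                queue.append(nb)
    return dist


def wiener(graph):
    """W = 1/2 * sum_i sum_j D_ij, i.e. each unordered pair counted once."""
    return sum(sum(distances_from(graph, a).values()) for a in graph) // 2


def bonds(graph):
    return [(i, j) for i in graph for j in graph[i] if i < j]


def two_bond_paths(graph):
    """Paths i-j-k: every pair of neighbours around a central atom j."""
    return [(i, j, k) for j in graph for i, k in combinations(graph[j], 2)]


def chi_indices(graph):
    delta = {a: len(nbs) for a, nbs in graph.items()}
    chi0 = sum(1 / math.sqrt(delta[a]) for a in graph)
    chi1 = sum(1 / math.sqrt(delta[i] * delta[j]) for i, j in bonds(graph))
    chi2 = sum(1 / math.sqrt(delta[i] * delta[j] * delta[k])
               for i, j, k in two_bond_paths(graph))
    return chi0, chi1, chi2


def kappa_indices(graph):
    a = len(graph)
    p1, p2 = len(bonds(graph)), len(two_bond_paths(graph))
    kappa1 = a * (a - 1) ** 2 / p1 ** 2
    kappa2 = (a - 1) * (a - 2) ** 2 / p2 ** 2
    return kappa1, kappa2


for name, g in [("n-hexane", N_HEXANE), ("2,2-dimethylbutane", DIMETHYLBUTANE_22)]:
    chi = ", ".join(f"{c:.3f}" for c in chi_indices(g))
    kap = ", ".join(f"{k:.3f}" for k in kappa_indices(g))
    print(f"{name}: W = {wiener(g)}; chi0-2 = {chi}; kappa1-2 = {kap}")
```

For n-hexane the 2χ value reproduces the 1.707 worked out by hand above, and the linear chain gives 1κ = A and 2κ = A − 1, the values expected for the "minimum" extreme shape. Comparing the output across hexane isomers is one way to check answers to Assignment A.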
Kappa–alpha indices
• include atom identity
• the alpha value for an atom i is a measure of its size relative to a standard (the sp3-hybridized carbon), where ri is the covalent radius of atom i:
  $\alpha_i = \frac{r_i}{r_{C_{sp^3}}} - 1$
The kappa–alpha indices are:
  $^1\kappa_\alpha = \frac{(A + \alpha)(A + \alpha - 1)^2}{({}^1P + \alpha)^2} \qquad ^2\kappa_\alpha = \frac{(A + \alpha - 1)(A + \alpha - 2)^2}{({}^2P + \alpha)^2}$
where α is the sum of the αi values for all atoms in the molecule.

Kappa Shape Indices

The kappa flexibility index (phia) is given by:
  $phia = \frac{^1\kappa_\alpha \, ^2\kappa_\alpha}{A}$
(… on page 178 of th…)

3.2.6 Electrotopological State Indices
Electrotopological state indices [Hall 1991]
• are determined for each atom (including hydrogen atoms, if so desired) rather than for whole molecules
• depend on the intrinsic state of an atom, Ii, which for an atom i in the first row of the periodic table is given by:
  $I_i = \frac{\delta_i^v + 1}{\delta_i}$
• The intrinsic state encodes electronic and topological characteristics of atoms.
• The effects of interactions with the other atoms are incorporated by determining the number of bonds between atom i and each of the other atoms, j.

For path length rij, the perturbation ΔIij is defined as:
  $\Delta I_{ij} = \frac{I_i - I_j}{r_{ij}^2}$
The electrotopological state (E-state) of an atom is given by the sum of Ii and the ΔIij terms:
  $S_i = I_i + \sum_{j \neq i} \Delta I_{ij}$
Atomic E-states can be combined into a whole-molecule descriptor by calculating the mean-square value over all atoms. The Molconn-Z program provides access to several hundred different E-state descriptors.

Electrotopological State Indices

Information Indices

Molecular Distance-Edge Vector

Burden Eigenvalue Descriptors
24. Highest eigenvalue n. 8 of Burden matrix / weighted by atomic van der Waals volumes (BEHv8)
25. Lowest eigenvalue n. 1 of Burden matrix / weighted by atomic van der Waals volumes (BELv1)
26. Lowest eigenvalue n. 2 of Burden matrix / weighted by atomic van der Waals volumes (BELv2)
27. Lowest eigenvalue n. 3 of Burden matrix / weighted by atomic van der Waals volumes (BELv3)
28. Lowest eigenvalue n. 4 of Burden matrix / weighted by atomic van der Waals volumes (BELv4)
29. Lowest eigenvalue n. 5 of Burden matrix / weighted by atomic van der Waals volumes (BELv5)
30. Lowest eigenvalue n. 6 of Burden matrix / weighted by atomic van der Waals volumes (BELv6)
31. Lowest eigenvalue n. 7 of Burden matrix / weighted by atomic van der Waals volumes (BELv7)
32. Lowest eigenvalue n. 8 of Burden matrix / weighted by atomic van der Waals volumes (BELv8)
33. Highest eigenvalue n. 1 of Burden matrix / weighted by atomic Sanderson electronegativities (BEHe1)
34. Highest eigenvalue n. 2 of Burden matrix / weighted by atomic Sanderson electronegativities (BEHe2)
35. Highest eigenvalue n. 3 of Burden matrix / weighted by atomic Sanderson electronegativities (BEHe3)
36. Highest eigenvalue n. 4 of Burden matrix / weighted by atomic Sanderson electronegativities (BEHe4)
37. Highest eigenvalue n. 5 of Burden matrix / weighted by atomic Sanderson electronegativities (BEHe5)
38. Highest eigenvalue n. 6 of Burden matrix / weighted by atomic Sanderson electronegativities (BEHe6)
39. Highest eigenvalue n. 7 of Burden matrix / weighted by atomic Sanderson electronegativities (BEHe7)
40. Highest eigenvalue n. 8 of Burden matrix / weighted by atomic Sanderson electronegativities (BEHe8)
41. Lowest eigenvalue n. 1 of Burden matrix / weighted by atomic Sanderson electronegativities (BELe1)
42. Lowest eigenvalue n. 2 of Burden matrix / weighted by atomic Sanderson electronegativities (BELe2)
43. Lowest eigenvalue n. 3 of Burden matrix / weighted by atomic Sanderson electronegativities (BELe3)
44. Lowest eigenvalue n. 4 of Burden matrix / weighted by atomic Sanderson electronegativities (BELe4)
45. Lowest eigenvalue n. 5 of Burden matrix / weighted by atomic Sanderson electronegativities (BELe5)
46. Lowest eigenvalue n. 6 of Burden matrix / weighted by atomic Sanderson electronegativities (BELe6)
47. Lowest eigenvalue n. 7 of Burden matrix / weighted by atomic Sanderson electronegativities (BELe7)
48. Lowest eigenvalue n. 8 of Burden matrix / weighted by atomic Sanderson electronegativities (BELe8)
49. Highest eigenvalue n. 1 of Burden matrix / weighted by atomic polarizabilities (BEHp1)
50. Highest eigenvalue n. 2 of Burden matrix / weighted by atomic polarizabilities (BEHp2)
51. Highest eigenvalue n. 3 of Burden matrix / weighted by atomic polarizabilities (BEHp3)
52. Highest eigenvalue n. 4 of Burden matrix / weighted by atomic polarizabilities (BEHp4)
53. Highest eigenvalue n. 5 of Burden matrix / weighted by atomic polarizabilities (BEHp5)
54. Highest eigenvalue n. 6 of Burden matrix / weighted by atomic polarizabilities (BEHp6)
55. Highest eigenvalue n. 7 of Burden matrix / weighted by atomic polarizabilities (BEHp7)
56. Highest eigenvalue n. 8 of Burden matrix / weighted by atomic polarizabilities (BEHp8)
57. Lowest eigenvalue n. 1 of Burden matrix / weighted by atomic polarizabilities (BELp1)
58. Lowest eigenvalue n. 2 of Burden matrix / weighted by atomic polarizabilities (BELp2)
59. Lowest eigenvalue n. 3 of Burden matrix / weighted by atomic polarizabilities (BELp3)
60. Lowest eigenvalue n. 4 of Burden matrix / weighted by atomic polarizabilities (BELp4)
61. Lowest eigenvalue n. 5 of Burden matrix / weighted by atomic polarizabilities (BELp5)
62. Lowest eigenvalue n. 6 of Burden matrix / weighted by atomic polarizabilities (BELp6)
63. Lowest eigenvalue n. 7 of Burden matrix / weighted by atomic polarizabilities (BELp7)
64. Lowest eigenvalue n. 8 of Burden matrix / weighted by atomic polarizabilities (BELp8)

Walk and Path Counts
1. Molecular walk count of order 01 (MWC01)
2. Molecular walk count of order 02 (MWC02)
3. Molecular walk count of order 03 (MWC03)
4. Molecular walk count of order 04 (MWC04)
5. Molecular walk count of order 05 (MWC05)
6. Molecular walk count of order 06 (MWC06)
7. Molecular walk count of order 07 (MWC07)
8. Molecular walk count of order 08 (MWC08)
9. Molecular walk count of order 09 (MWC09)
10. Molecular walk count of order 10 (MWC10)
11. Total walk count (TWC)
12. Self-returning walk count of order 01 (SRW01)
13. Self-returning walk count of order 02 (SRW02)
14. Self-returning walk count of order 03 (SRW03)
15. Self-returning walk count of order 04 (SRW04)
16. Self-returning walk count of order 05 (SRW05)
17. Self-returning walk count of order 06 (SRW06)
18. Self-returning walk count of order 07 (SRW07)
19. Self-returning walk count of order 08 (SRW08)
20. Self-returning walk count of order 09 (SRW09)
21. Self-returning walk count of order 10 (SRW10)
22. Molecular path count of order 01 (MPC01)
23. Molecular path count of order 02 (MPC02)
24. Molecular path count of order 03 (MPC03)
25. Molecular path count of order 04 (MPC04)
26. Molecular path count of order 05 (MPC05)
27. Molecular path count of order 06 (MPC06)
28. Molecular path count of order 07 (MPC07)
29. Molecular path count of order 08 (MPC08)
30. Molecular path count of order 09 (MPC09)
31. Molecular path count of order 10 (MPC10)
32. Total path count (TPC)
33. Molecular multiple path count of order 01 (piPC01)
34. Molecular multiple path count of order 02 (piPC02)
35. Molecular multiple path count of order 03 (piPC03)
36. Molecular multiple path count of order 04 (piPC04)
37. Molecular multiple path count of order 05 (piPC05)
38. Molecular multiple path count of order 06 (piPC06)
39. Molecular multiple path count of order 07 (piPC07)
40. Molecular multiple path count of order 08 (piPC08)
41. Molecular multiple path count of order 09 (piPC09)
42. Molecular multiple path count of order 10 (piPC10)
43. Conventional bond-order ID number (piID)
44. Randić ID number (CID)

3.2.7 2D Fingerprints
• In dictionary-based fingerprints, each bit position often corresponds to a specific substructural fragment.
• Fragments that occur infrequently may be more likely to be useful than fragments that occur very frequently.
• Unfortunately, the optimum set of fragments is often data-set dependent.
• Hashed fingerprints do not depend on a predefined dictionary, so any fragment present in a molecule is encoded.
• It is not possible to map from a bit position back to a unique substructural fragment, so hashed fingerprints are not directly interpretable.
• The fact that 2D fingerprints "work" as descriptors is probably because a molecule's properties and biological activity often depend on features such as those encoded by 2D fingerprints.

3.3 Descriptors Based on 3D Representations
3.3.1 3D Fragment Screens
3D screens
• were originally designed for use in 3D substructure searching
• encode spatial relationships (e.g. distances and angles) between different features of a molecule, such as atoms, ring centroids and planes
• Distance ranges for each pair of features are divided into a series of bins by specifying a bin width.
• Valence angle descriptors consist of three atoms, ABC.
• Torsion angle descriptors consist of four atoms, ABCD.
• The different types of screens can be combined into a bitstring whose length equals the total number of bins over all feature types.

3.3.2 Pharmacophore Keys
Pharmacophore keys
• are based on pharmacophoric features, that is, atoms or substructures that are thought to be relevant for receptor binding
• pharmacophoric features typically include hydrogen bond donors, hydrogen bond acceptors, charged centers and aromatic ring centers
The generation of 3-point pharmacophore keys, illustrated using benperidol. Two different conformations are shown, together with two different combinations of three pharmacophore points.
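The idea behind 3-point pharmacophore keys (take every triplet of features, bin the three inter-feature distances, and record the combination) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the scheme used by any particular program: feature perception and conformer generation are assumed to have been done already, the 1 Å bin width and 15 Å cut-off are hypothetical choices, and the feature coordinates in the example are invented. Each distinct key in the returned set would correspond to one bit position in a key bitstring.

```python
import itertools
import math

BIN_WIDTH = 1.0   # angstrom per distance bin (hypothetical choice)
MAX_DIST = 15.0   # distances beyond this share the last bin (hypothetical)


def bin_distance(d):
    """Map a distance onto an integer bin index."""
    return min(int(d // BIN_WIDTH), int(MAX_DIST // BIN_WIDTH))


def triplet_key(features):
    """Canonical key for three (type, (x, y, z)) features.

    Every vertex ordering is tried and the smallest tuple is kept, so the
    same triangle always gives the same key regardless of input order.
    """
    best = None
    for a, b, c in itertools.permutations(features):
        key = (a[0], b[0], c[0],
               bin_distance(math.dist(a[1], b[1])),
               bin_distance(math.dist(a[1], c[1])),
               bin_distance(math.dist(b[1], c[1])))
        if best is None or key < best:
            best = key
    return best


def pharmacophore_keys(conformations):
    """Union of 3-point keys over all conformations of one molecule."""
    keys = set()
    for features in conformations:
        for triplet in itertools.combinations(features, 3):
            keys.add(triplet_key(triplet))
    return keys


# Invented feature positions for two conformations of one molecule.
conf_a = [("donor", (0.0, 0.0, 0.0)), ("acceptor", (3.5, 0.0, 0.0)),
          ("aromatic", (0.0, 4.5, 0.0))]
conf_b = [("donor", (0.0, 0.0, 0.0)), ("acceptor", (6.5, 0.0, 0.0)),
          ("aromatic", (0.0, 4.5, 0.0))]

for key in sorted(pharmacophore_keys([conf_a, conf_b])):
    print(key)
```

As in the benperidol illustration, different conformations place the same features at different distances, so one molecule typically switches on several keys; taking the union over conformations is one simple way to reflect that flexibility.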
3.3.3 Other 3D Descriptors
3D topographical indices
• can be calculated from the distance matrix of a molecule
• are analogous to the topological indices generated from a 2D connection table
Geometric atom pairs
• an extension of atom pair descriptors (which encode all pairs of atoms in a molecule together with the length of the shortest bond-by-bond path between them), in which the through-space distance between the atoms is used instead of the bond-path length
Others
• HOMO and LUMO energies
• molecular electrostatic potentials
• dipole moments
• etc.

3.4 Data Verification and Manipulation
• examine the characteristics of the descriptors before using them in an analysis
• evaluate the distribution of values for each descriptor
• check for correlations between descriptors, which could lead to over-representation of certain information
Manipulation of the data may be required. This
• could involve a simple technique such as scaling, to ensure that each descriptor contributes equally to the analysis
• may involve a more complex technique such as Principal Components Analysis (PCA), which produces a new set of descriptors with more desirable characteristics

3.4.1 Data Spread and Distribution
• examine the spread of values for the data set
• if the values show no variation, there is nothing to be gained from including the descriptor
• in some cases, the values of a descriptor should follow a particular distribution, often the normal distribution
The coefficient of variation can be used to assess the spread of a descriptor.
• it equals the standard deviation (σ) divided by the mean (x̄):
  $CV = \frac{\sigma}{\bar{x}} = \frac{1}{\bar{x}} \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2}$
  where N = number of data points and xi = value of descriptor x for data point i
• The larger the coefficient of variation, the better the spread of values.

3.4.2 Scaling
• If descriptors have different numerical ranges, they are scaled so that each one has an equal chance of contributing to the overall analysis.
• Scaling is also often referred to as standardization.
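Before any scaling is applied, the coefficient of variation from 3.4.1 gives a quick per-descriptor check of spread. The sketch below uses only Python's `statistics` module and the population standard deviation (dividing by N, as in the formula). The descriptor names and values are hypothetical. CV is undefined for a zero mean and only meaningful for descriptors measured on a ratio scale, so it should be computed on the raw, not mean-centred, values.

```python
from statistics import fmean, pstdev


def coefficient_of_variation(values):
    """CV = sigma / mean, with sigma the population standard deviation."""
    mean = fmean(values)
    if mean == 0:
        raise ValueError("CV is undefined for a descriptor with zero mean")
    return pstdev(values) / mean


# Hypothetical descriptor columns for five molecules.
descriptors = {
    "nRing": [1, 1, 1, 1, 1],                    # no variation: CV = 0
    "MW": [180.2, 206.3, 230.3, 296.1, 357.8],
}

for name, values in descriptors.items():
    print(f"{name}: CV = {coefficient_of_variation(values):.3f}")
```

A descriptor whose CV is zero, such as the constant ring count here, carries no information for the data set and is a candidate for removal.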
Ways to scale data:
Unit variance scaling (also known as auto-scaling)
• each descriptor value is divided by the standard deviation of that descriptor across all observations (molecules)
• each scaled descriptor then has a variance of one
Unit variance scaling is usually combined with mean centering, in which the average value of a descriptor is subtracted from each individual value. In this way all descriptors are centered on zero and have a standard deviation of one:
  $x_i' = \frac{x_i - \bar{x}}{\sigma}$
where xi' is the new, transformed value of the original xi.
Range scaling
• uses a related expression in which the denominator equals the difference between the maximum and minimum values, x_max − x_min

3.4.3 Correlations
• Correlations between descriptors should also be checked as a matter of routine, to avoid over-representation.
• Many correlations can be identified from simple scatterplots of pairs of descriptor values.
• Ideally, the points will be distributed with no discernible pattern. A pair of uncorrelated descriptors will have points in all four quadrants of the scatterplot, with no obvious pattern or correlation.
When many descriptors need to be considered, it is more convenient to compute a pairwise correlation matrix, which quantifies the degree of correlation between all pairs of descriptors. The correlation coefficient, r, between two descriptors x and y is given by:
  $r = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^2 \sum_{i=1}^{N} (y_i - \bar{y})^2}}$
Each entry (i, j) in the correlation matrix is the correlation coefficient between the descriptors xi and xj.
The values of the correlation coefficient range from −1.0 to +1.0.
• r = +1.0 means perfect positive correlation (points on a line with a positive slope)
• r = −1.0 means perfect negative correlation (points on a line with a negative slope)
• r = 0 means no linear relationship between the variables
e.g. Correlation matrix for amino acid data.
• There is a high degree of positive correlation between the two lipophilicity constants (LCE and LCF), and between the volume (VOL) and the solvent-accessible surface area (ASA).
• There is a strong negative correlation between FET and the LCE and LCF parameters.

3.4.4 Reducing the Dimensionality of a Data Set: PCA
The dimensionality of a data set is the number of variables used to describe each object.
Principal Components Analysis (PCA)
• a commonly used method for reducing the dimensionality of a data set when there are significant correlations between some or all of the descriptors
• provides a new set of variables that have some special properties
• it is often found that much of the variation in the data set can be explained by a small number of principal components
• principal components are also much more convenient for graphical display and analysis of the data
In the example shown:
• there is a high correlation between the x1 and x2 values
• most of this variation can be explained by introducing a single variable that is a linear combination of the two (i.e. z = x1 − x2)
• the new variable (z) is referred to as a principal component
In general, a principal component is a linear combination of the original variables or descriptors:
  $PC_i = c_{i,1} x_1 + c_{i,2} x_2 + \dots + c_{i,p} x_p = \sum_{j=1}^{p} c_{i,j} x_j$

Descriptors of amino acids and amino acid side chains
• The first two PCs taken together account for 95% of the variance.
LCE and LCF are two lipophilicity constants of amino acid side chains; FET is the free energy of transfer of amino acid side chains from organic solvent into water; POL is a polarity parameter; VOL is the molecular volume of the amino acid; and ASA is the solvent-accessible surface area of the amino acid.

The loadings plot shows the coefficients of each descriptor in the various principal components.
• This shows the relative contribution of each descriptor to the different principal components.
• PC1 has reasonably significant contributions from all six descriptors, whereas PC2 is mostly composed of the VOL, ASA and POL terms, with little contribution from LCE, LCF and FET.
• Moreover, the close proximity of the VOL and ASA terms, and of the LCF and LCE terms, reflects the high degree of correlation between these two pairs of descriptors.

The scores plot shows how the various amino acids relate to each other in the space of the first two principal components.
• This plot indicates the underlying structure of the data.
• Similar amino acids are grouped roughly together: the charged and polar amino acids in one region (e.g. Arg, Lys, Glu, Asp), the small hydrophobic amino acids in another (e.g. Ala, Pro, Val) and the aromatic amino acids in a third (e.g. Phe, Trp).

B.
1. Calculate the 3D descriptors for the following set of non-steroidal anti-inflammatory drugs (NSAIDs) using the PaDEL descriptor calculator in ChemDes (http://www.scbdd.com/blue_desc/index/). Make sure to convert the structures into 3D using the MMFF94 force field.
2. Combine the data in one spreadsheet (10 rows of compounds × 432 descriptors).
   Hint: Download the CSV file for each compound. Consolidate all files into one MS Excel spreadsheet like the one shown below.
4. Examine each column; identify and remove the descriptors whose values do not vary (i.e. values that are all or mostly 0).
5. Download and install RapidMiner Studio. Import your data set into RapidMiner Studio.
6. Apply scaling by normalization and perform Principal Component Analysis. (Hint: Use the corresponding operators and connect them as follows.)
7. Generate a scores plot (choose Scatter plot) in which the x-axis is PC1 and the y-axis is each of the remaining PCs. (It should look like the plot below.)
8. Which descriptor has the largest contribution to PC1?
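The manipulation steps of Section 3.4 (remove non-varying descriptors, mean-centre and scale, inspect the correlation matrix, then run PCA) can also be sketched with NumPy as an alternative to the RapidMiner workflow in Assignment B. This is an illustrative sketch, not a reproduction of RapidMiner's operators, whose defaults may differ. The data matrix and descriptor names are hypothetical, `range_scale` assumes the same mean-centred numerator as auto-scaling, and PCA is done by singular value decomposition of the auto-scaled matrix, which is one standard way to compute it.

```python
import numpy as np


def drop_constant_columns(X, names):
    """Remove descriptors whose values do not vary across the molecules."""
    keep = X.std(axis=0) > 0
    return X[:, keep], [n for n, k in zip(names, keep) if k]


def autoscale(X):
    """Mean-centre each descriptor, then divide by its standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def range_scale(X):
    """Assumed variant: mean-centre, then divide by (max - min) instead of sigma."""
    return (X - X.mean(axis=0)) / (X.max(axis=0) - X.min(axis=0))


def pca(X_scaled):
    """PCA via SVD of the (already mean-centred) data matrix.

    Returns scores (molecules x PCs), loadings (descriptors x PCs) and the
    fraction of the total variance explained by each PC.
    """
    U, s, Vt = np.linalg.svd(X_scaled, full_matrices=False)
    scores = U * s               # same as X_scaled @ Vt.T
    loadings = Vt.T
    explained = s**2 / np.sum(s**2)
    return scores, loadings, explained


# Hypothetical data: 6 molecules x 4 descriptors.
names = ["d1", "d2", "d3", "d4"]
X = np.column_stack([
    [1, 2, 3, 4, 5, 6],                  # d1
    [2.1, 3.9, 6.2, 7.8, 10.1, 12.0],    # d2, roughly 2 * d1
    [1, 1, 1, 1, 1, 1],                  # d3, constant
    [3, 1, 4, 1, 5, 9],                  # d4
]).astype(float)

X, names = drop_constant_columns(X, names)
Xs = autoscale(X)
corr = np.corrcoef(Xs, rowvar=False)     # columns are the variables
scores, loadings, explained = pca(Xs)
top = names[int(np.argmax(np.abs(loadings[:, 0])))]

print("kept:", names)
print("r(d1, d2) =", round(corr[0, 1], 3))
print("variance explained:", np.round(explained, 3))
print("largest |loading| on PC1:", top)
```

The sign of each principal component is arbitrary (SVD may return either direction), so loadings and scores should be compared by magnitude and relative position rather than by sign. The highly correlated pair d1 and d2 illustrates the over-representation problem from 3.4.3: PCA folds them largely into PC1.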
