
6_Data Pre-Processing-III.pdf



Data Pre-Processing-III (Data Reduction)
TIET, PATIALA

Dimensionality/Data Reduction
▪ The number of input variables or features for a dataset is referred to as its dimensionality.
▪ Dimensionality reduction refers to techniques that reduce the number of input variables in a dataset.
▪ More input features often make a predictive modeling task more challenging to model; this is generally referred to as the curse of dimensionality.
▪ There exists an optimal number of features in a feature set for the corresponding machine learning task.
▪ Adding more features than the optimal (strictly necessary) ones results in performance degradation because of the added noise.

Dimensionality/Data Reduction
"Challenging task"

Dimensionality/Data Reduction
Benefits of data reduction:
▪ Accuracy improvements.
▪ Over-fitting risk reduction.
▪ Speed-up in training.
▪ Improved data visualization.
▪ Increase in explainability of the ML model.
▪ Increase in storage efficiency.
▪ Reduced storage cost.

Data Reduction Techniques
▪ Feature Selection – finds the best subset of the existing features.
  - Filter methods
  - Wrapper methods
  - Embedded methods
▪ Feature Extraction – methods of constructing combinations of the variables to get around these problems while still describing the data.
  - Principal Component Analysis
  - Singular Value Decomposition
  - Linear Discriminant Analysis

Feature Selection
▪ Feature selection in machine learning aims to find the best set of features that allows one to build useful models of the studied phenomena.
▪ The two key drivers used in feature selection are:
  - Maximizing feature relevance
    - Feature contributing significant information for the machine learning model – strongly relevant.
    - Feature contributing little information for the machine learning model – weakly relevant.
    - Feature contributing no information for the machine learning model – irrelevant.
  - Minimizing feature redundancy
    - Information contributed by the feature is similar to the information contributed by one or more other features.

Feature Selection (Contd….)
▪ Let us consider a student database with attributes Roll Number, Age, Height and the target variable Weight. The objective is to predict the weight of each new test case.
▪ Roll Number is irrelevant, as it provides no information regarding the weight of students.
▪ Age and Height are redundant, as both provide the same information.

Feature Selection – Measuring Feature Redundancy
▪ Feature redundancy is measured in terms of the similarity of the information contributed by features.
▪ Similarity information is measured in terms of:
  - Correlation-based measures.
  - Distance-based measures.

Feature Selection – Measuring Feature Redundancy
▪ To deal with redundant features, correlation analysis is performed. The correlation coefficient is denoted by r, and a threshold on r is decided to find redundant features. r is positive when the two features increase or decrease together.

Feature Selection – Measuring Feature Redundancy
Distance-based:
▪ The most commonly used distance metrics are the various forms of the Minkowski distance:

  d(F_1, F_2) = \left( \sum_{i=1}^{n} |F_{1i} - F_{2i}|^{r} \right)^{1/r}

  It takes the form of the Euclidean distance when r = 2 (L2 norm) and the Manhattan distance when r = 1 (L1 norm).
▪ Cosine similarity is another important metric for computing similarity between features:

  \cos(F_1, F_2) = \frac{F_1 \cdot F_2}{|F_1| \, |F_2|}

  where F_1 and F_2 denote feature vectors.
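As a rough illustration of these redundancy measures, the sketch below computes Pearson's r, the Minkowski distance (Manhattan and Euclidean forms), and cosine similarity for two made-up feature vectors with NumPy. The sample values and the 0.9 correlation threshold are assumptions for the example, not part of the slides.

```python
import numpy as np

# Two hypothetical feature columns (e.g., Age and Height for a few students).
f1 = np.array([14.0, 15.0, 16.0, 17.0, 18.0])
f2 = np.array([150.0, 155.0, 161.0, 166.0, 172.0])

# Correlation-based redundancy: Pearson's r.
r = np.corrcoef(f1, f2)[0, 1]
print("Pearson r:", r)
if abs(r) > 0.9:  # assumed threshold; chosen per task
    print("Features look redundant (highly correlated).")

# Distance-based redundancy: Minkowski distance
#   d(F1, F2) = (sum_i |F1_i - F2_i|^r)^(1/r)
def minkowski(a, b, p):
    return np.sum(np.abs(a - b) ** p) ** (1.0 / p)

print("Manhattan distance (r=1):", minkowski(f1, f2, 1))
print("Euclidean distance (r=2):", minkowski(f1, f2, 2))

# Cosine similarity: cos(F1, F2) = F1 . F2 / (|F1| |F2|)
cos_sim = np.dot(f1, f2) / (np.linalg.norm(f1) * np.linalg.norm(f2))
print("Cosine similarity:", cos_sim)
```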
For binary features, the following metrics are useful:
1. Hamming distance: the number of positions at which the two feature vectors differ.
2. Jaccard distance: 1 - Jaccard similarity, where

   Jaccard Similarity = \frac{n_{11}}{n_{01} + n_{10} + n_{11}}

3. Simple Matching Coefficient (SMC):

   SMC = \frac{n_{11} + n_{00}}{n_{00} + n_{01} + n_{10} + n_{11}}

   where n_{11} and n_{00} represent the number of cases where both features have value 1 and value 0 respectively, n_{10} denotes cases where feature 1 has value 1 and feature 2 has value 0, and n_{01} denotes cases where feature 1 has value 0 and feature 2 has value 1. (Code sketches of these binary measures, the three feature selection approaches, and feature extraction appear at the end of this transcript.)

Overall Feature Selection Process

Feature Selection Approaches
Filter Approach:
▪ In this approach, the feature subset is selected based on statistical measures.
▪ No learning algorithm is employed to evaluate the goodness of the selected features.
▪ Commonly used metrics include correlation, chi-square, Fisher score, ANOVA, Information Gain, etc.

Feature Selection Approaches
Wrapper Approach:
▪ In this approach, for every candidate subset, the learning model is trained and the result is evaluated by running the learning algorithm.
▪ Computationally very expensive, but superior in performance.
▪ Requires some method to search the space of all possible subsets of features.

Feature Selection Approaches
Wrapper Approach – Searching Methods:
▪ Forward Feature Selection: This is an iterative method wherein we start with the best performing variable against the target. Next, we select another variable that gives the best performance in combination with the first selected variable. This process continues until the preset criterion is achieved.
▪ Backward Feature Elimination: Here, we start with all the features available and build a model. Next, we remove the variable whose elimination gives the best evaluation measure value.
▪ Exhaustive Feature Selection: It tries every possible combination of the variables and returns the best performing subset.

Feature Selection Approaches
Embedded Approach:
▪ These methods encompass the benefits of both the wrapper and filter methods.
▪ They take interactions of features into account while also maintaining a reasonable computational cost.
▪ Embedded methods are iterative in the sense that they take care of each iteration of the model training process and carefully extract those features which contribute the most to the training for a particular iteration.

Feature Extraction
▪ Feature extraction creates new features from combinations of the original features.
▪ For a given feature set Fi (F1, F2, F3, …, Fn), feature extraction finds a mapping function that maps it to a new feature set Fi' (F1', F2', F3', …, Fm') such that Fi' = f(Fi) and m < n.
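For the binary-feature measures described earlier (Hamming distance, Jaccard distance, SMC), here is a minimal sketch; the two 0/1 vectors are made-up example data.

```python
import numpy as np

# Two hypothetical binary feature vectors.
f1 = np.array([1, 0, 1, 1, 0, 1])
f2 = np.array([1, 1, 0, 1, 0, 1])

# Hamming distance: number of positions where the vectors differ.
hamming = np.sum(f1 != f2)

# Counts used by Jaccard similarity and SMC.
n11 = np.sum((f1 == 1) & (f2 == 1))
n00 = np.sum((f1 == 0) & (f2 == 0))
n10 = np.sum((f1 == 1) & (f2 == 0))
n01 = np.sum((f1 == 0) & (f2 == 1))

jaccard_similarity = n11 / (n01 + n10 + n11)
jaccard_distance = 1 - jaccard_similarity
smc = (n11 + n00) / (n00 + n01 + n10 + n11)

print("Hamming distance:", hamming)
print("Jaccard distance:", jaccard_distance)
print("SMC:", smc)
```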
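A minimal sketch of the filter approach described above: features are scored with a statistical test (here the ANOVA F-test via scikit-learn's SelectKBest) without training any learning model. The synthetic dataset and the choice of k = 5 are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic data: 20 features, only a few of which are informative.
X, y = make_classification(n_samples=200, n_features=20,
                           n_informative=5, random_state=0)

# Filter method: rank features by a statistical score, keep the top k.
selector = SelectKBest(score_func=f_classif, k=5)
X_reduced = selector.fit_transform(X, y)

print("Original shape:", X.shape)          # (200, 20)
print("Reduced shape:", X_reduced.shape)   # (200, 5)
print("Selected feature indices:", selector.get_support(indices=True))
```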
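For the wrapper approach and its forward/backward searching methods, one possible sketch uses scikit-learn's SequentialFeatureSelector, which trains and cross-validates the chosen estimator for each candidate subset. The dataset, the logistic-regression estimator, and the target of 4 features are assumptions for the example.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=4, random_state=0)

# Wrapper method: each candidate subset is judged by actually training
# and evaluating the learning model on it.
estimator = LogisticRegression(max_iter=1000)

forward = SequentialFeatureSelector(estimator, n_features_to_select=4,
                                    direction="forward", cv=5)
forward.fit(X, y)
print("Forward selection kept:", forward.get_support(indices=True))

backward = SequentialFeatureSelector(estimator, n_features_to_select=4,
                                     direction="backward", cv=5)
backward.fit(X, y)
print("Backward elimination kept:", backward.get_support(indices=True))
```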
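For the embedded approach, one common sketch is L1-regularized regression, where the selection happens during model training itself: the penalty drives the coefficients of unhelpful features to zero. The synthetic regression data and the Lasso/SelectFromModel choices are assumptions, not the slides' prescribed method.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=200, n_features=15,
                       n_informative=5, noise=5.0, random_state=0)

# Embedded method: feature selection is a by-product of fitting the
# penalized model, so it interacts with training at reasonable cost.
lasso = Lasso(alpha=1.0)
selector = SelectFromModel(lasso)
X_reduced = selector.fit_transform(X, y)

print("Original shape:", X.shape)          # (200, 15)
print("Reduced shape:", X_reduced.shape)
print("Kept feature indices:", selector.get_support(indices=True))
```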
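Finally, a sketch of feature extraction with Principal Component Analysis, one of the methods listed in the transcript: m new features (here m = 2) are constructed as combinations of the n original ones, with m < n. The iris dataset and the choice of two components are assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)          # n = 4 original features

# Feature extraction: build m (< n) new features, each a linear
# combination of the originals, retaining most of the variance.
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=2)                  # m = 2 extracted features
X_new = pca.fit_transform(X_scaled)

print("Original shape:", X.shape)          # (150, 4)
print("Extracted shape:", X_new.shape)     # (150, 2)
print("Explained variance ratio:", pca.explained_variance_ratio_)
```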
