Learning from Data - Lecture Notes PDF

Learning from Data
Elshimaa Elgendi, PhD
Operations Research and Decision Support
Faculty of Computers and Artificial Intelligence
Cairo University

Logistics
Course Google Drive link: https://drive.google.com/drive/folders/1TnPO8EcQhRNEKhxiBM3Kh4saq7jU0769?usp=sharing
Office hours: Saturday 1 - 3pm, or to be scheduled online
Email: [email protected]
TAs: Eng. Ahmed Fouad

Course Grade Distribution
4 Assignments (15%): both theory and programming
Course participation (5%)
Midterm (20%)
Final exam (60%)

Introduction

Big Data
Widespread use of personal computers and wireless communication leads to “big data”.
We are both producers and consumers of data.
Data is not random; it has structure, e.g., customer behavior.
We need “big theory” to extract that structure from data for
  (a) Understanding the process
  (b) Making predictions for the future

Why ‘Learn’?
Machine learning is programming computers to optimize a performance criterion using example data or past experience.
There is no need to “learn” to calculate payroll.
Learning is used when:
  Human expertise does not exist (navigating on Mars)
  Humans are unable to explain their expertise (speech recognition)
  The solution changes over time (routing on a computer network)
  The solution needs to be adapted to particular cases (user biometrics)

What We Talk About When We Talk About “Learning”
Learning general models from data of particular examples.
Data is cheap and abundant (data warehouses, data marts); knowledge is expensive and scarce.
Example in retail, from customer transactions to consumer behavior: people who bought “Blink” also bought “Outliers” (www.amazon.com).
Build a model that is a good and useful approximation to the data.
[Figure: training data (past) is used to build a model/predictor; testing data (future) is fed to the model/predictor.]

Data Mining
Retail: Market basket analysis, customer relationship management (CRM)
Finance: Credit scoring, fraud detection
Manufacturing: Control, robotics, troubleshooting
Telecommunications: Spam filters, intrusion detection
Medicine: Medical diagnosis
Bioinformatics: Motifs, alignment
Web mining: Search engines
...

Machine Learning is…
Machine learning is programming computers to optimize a performance criterion using example data or past experience. -- Ethem Alpaydin
The goal of machine learning is to develop methods that can automatically detect patterns in data, and then to use the uncovered patterns to predict future data or other outcomes of interest. -- Kevin P. Murphy

What is Machine Learning?
Optimize a performance criterion using example data or past experience.
Role of statistics: Inference from a sample
Role of computer science: Efficient algorithms to
  Solve the optimization problem
  Represent and evaluate the model for inference

Machine learning tasks
Supervised learning
  Regression: predict numerical values
  Classification: predict categorical values, i.e., labels
  Ranking
Unsupervised learning
  Clustering: group data according to "distance"
  Association: find frequent co-occurrences
  Link prediction: discover relationships in data
  Data reduction: project features to fewer features
Reinforcement learning
  The main concern is how the algorithm/software agent ought to take actions in an environment to maximize some notion of reward.
  Decision making (robot, chess machine)

Supervised Learning: Find f
Given: Training set $\{(x_i, y_i) \mid i = 1, \dots, n\}$
Find: A good approximation to $f : X \to Y$

Supervised Learning Uses
Prediction of future cases: Use the rule to predict the output for future inputs, e.g., spam detection
Knowledge extraction: The rule is easy to understand
Compression: The rule is simpler than the data it explains
Outlier detection: Exceptions that are not covered by the rule, e.g., fraud

Unsupervised Learning
Learning “what normally happens”
No output
Clustering: Grouping similar instances
Example applications
  Customer segmentation in CRM
  Image compression: Color quantization
  Bioinformatics: Learning motifs

Reinforcement Learning
Learning a policy: A sequence of outputs
No supervised output but delayed reward
Credit assignment problem
Game playing
Robot in a maze
Multiple agents, partial observability, ...

Supervised Learning: Classification
from data ---> discrete classes

Learning a Class from Examples
Class C of a “family car”
  Prediction: Is car x a family car?
  Knowledge extraction: What do people expect from a family car?
Output: Positive (+) and negative (-) examples
Input representation: $x_1$: price, $x_2$: engine power

Training set X
$\mathcal{X} = \{x^t, r^t\}_{t=1}^{N}$
$r = \begin{cases} 1 & \text{if } x \text{ is positive} \\ 0 & \text{if } x \text{ is negative} \end{cases}$
$x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$

Class C
$(p_1 \le \text{price} \le p_2) \text{ AND } (e_1 \le \text{engine power} \le e_2)$

Multiple Classes, $C_i$, $i = 1, \dots, K$
$\mathcal{X} = \{x^t, r^t\}_{t=1}^{N}$
$r_i^t = \begin{cases} 1 & \text{if } x^t \in C_i \\ 0 & \text{if } x^t \in C_j,\ j \ne i \end{cases}$
Train hypotheses $h_i(x)$, $i = 1, \dots, K$:
$h_i(x^t) = \begin{cases} 1 & \text{if } x^t \in C_i \\ 0 & \text{if } x^t \in C_j,\ j \ne i \end{cases}$

Classification: Applications
Also known as pattern recognition
Face recognition: Pose, lighting, occlusion (glasses, beard), make-up, hair style
Character recognition: Different handwriting styles
Speech recognition: Temporal dependency
Medical diagnosis: From symptoms to illnesses
Biometrics: Recognition/authentication using physical and/or behavioral characteristics: face, iris, signature, etc.
Outlier/novelty detection

Important Concepts
Data: labeled instances, e.g., emails marked spam/not spam
  Training set
  Held-out set (sometimes called validation set)
  Test set
Features: attribute-value pairs which characterize each x
Experimentation cycle
  Select a hypothesis f to best match the training set
  Tune hyperparameters on the held-out or validation set
  Compute accuracy on the test set
  Very important: never “peek” at the test set!
Evaluation
  e.g., Accuracy: fraction of instances predicted correctly
Overfitting and generalization
  Want a classifier which does well on test data
  Overfitting: fitting the training data very closely, but not generalizing well
  We’ll investigate overfitting and generalization formally in a few lectures

Supervised Learning: Regression
predicting a numeric value

Regression
$\mathcal{X} = \{x^t, r^t\}_{t=1}^{N}$, $r^t \in \mathbb{R}$
$r^t = f(x^t) + \varepsilon$
$E(g \mid \mathcal{X}) = \frac{1}{N} \sum_{t=1}^{N} \left[ r^t - g(x^t) \right]^2$
Linear model: $g(x) = w_1 x + w_0$
Quadratic model: $g(x) = w_2 x^2 + w_1 x + w_0$
$E(w_1, w_0 \mid \mathcal{X}) = \frac{1}{N} \sum_{t=1}^{N} \left[ r^t - (w_1 x^t + w_0) \right]^2$

Model Selection & Generalization
Learning is an ill-posed problem; data is not sufficient to find a unique solution
The need for inductive bias: assumptions about the hypothesis class H
Generalization: How well a model performs on new data
Overfitting: H more complex than f
Underfitting: H less complex than f

Triple Trade-Off
There is a trade-off between three factors (Dietterich, 2003):
1. Complexity of H, f,
2. Training set size, N,
3. Generalization error, E, on new data (Loss function)

Cross-Validation
To estimate generalization error, we need data unseen during training. We split the data as
  Training set (50%)
  Validation set (25%)
  Test (publication) set (25%)
Resampling when there is little data

Dimensions of a Supervised Learner
1. Model: $g(x \mid \theta)$
2. Loss function: $E(\theta \mid \mathcal{X}) = \sum_t L\left(r^t, g(x^t \mid \theta)\right)$
3. Optimization procedure: $\theta^* = \arg\min_\theta E(\theta \mid \mathcal{X})$

Bias and Variance
Unknown parameter $\theta$
Estimator $d_i = d(X_i)$ on sample $X_i$
Bias: $b_\theta(d) = E[d] - \theta$
Variance: $E[(d - E[d])^2]$
Mean square error: $r(d, \theta) = E[(d - \theta)^2] = (E[d] - \theta)^2 + E[(d - E[d])^2] = \text{Bias}^2 + \text{Variance}$

Bias/Variance Dilemma
Example: $g_i(x) = 2$ has no variance and high bias
$g_i(x) = \sum_t r_i^t / N$ has lower bias with variance
As we increase complexity, bias decreases (a better fit to data) and variance increases (fit varies more with data)
Bias/variance dilemma (Geman et al., 1992)

Supervised Learning: Ranking
Comparing items
Collaborative filtering

Unsupervised Learning: Clustering
Discovering structure in data

Python Programming Language
Extensive collection of libraries and packages
1. Working with textual data: use NLTK, SciKit, and NumPy
2. Working with images: use scikit-image and OpenCV
3. Working with audio: use Librosa
4. Implementing deep learning: use TensorFlow, Keras, PyTorch
5. Implementing basic machine learning algorithms: use scikit-learn
6. Scientific computing: use SciPy
7. Visualising the data clearly: use Matplotlib, SciKit, and Seaborn

R Programming Language
R has an extensive list of packages for machine learning
1. mice for dealing with missing values
2. caret for working with classification and regression problems
3. party and rpart for recursive partitioning (tree-based models)
4. randomForest for building random forests of decision trees
5. dplyr and tidyr for data manipulation
6. ggplot2 for creating visualizations
7. R Markdown and Shiny for communicating insights through reports

Other Languages…
Java and JavaScript
Julia
LISP

Pandas
Pandas is a Python library used for working with data sets (data preprocessing).
It has functions for analyzing, cleaning, exploring, and manipulating data.
The name "Pandas" refers to both "Panel Data" and "Python Data Analysis"; the library was created by Wes McKinney in 2008.
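The four kinds of operation named on the Pandas slide (analyzing, cleaning, exploring, manipulating) can be illustrated with a minimal sketch. The data set, column names, and the 0 to 120 validity range for age are hypothetical, made up for illustration only. The imputation choices (median for age, mean for spend) are one reasonable option, not a method prescribed by the lecture.

```python
import pandas as pd

# Hypothetical toy data set; names and values are invented for illustration.
df = pd.DataFrame(
    {
        "customer": ["a", "b", "c", "d", "e"],
        "segment": ["retail", "retail", "online", "online", "online"],
        "age": [34.0, None, 29.0, 41.0, -1.0],    # None = missing, -1 = invalid
        "spend": [120.0, 80.0, 200.0, None, 150.0],
    }
)

# Exploring: count missing values in each column.
missing_per_column = df.isna().sum()

# Cleaning, step 1: filter out rows with an invalid age
# (assumed valid range 0..120); rows with a missing age are kept for now.
valid_age = df["age"].isna() | df["age"].between(0, 120)
clean = df[valid_age].copy()

# Cleaning, step 2: impute the remaining missing values.
clean["age"] = clean["age"].fillna(clean["age"].median())
clean["spend"] = clean["spend"].fillna(clean["spend"].mean())

# Manipulating: min-max scale spend into the interval [0, 1].
spend_min = clean["spend"].min()
spend_range = clean["spend"].max() - spend_min
clean["spend_scaled"] = (clean["spend"] - spend_min) / spend_range

# Analyzing: average spend per customer segment.
mean_spend = clean.groupby("segment")["spend"].mean()

print(missing_per_column)
print(clean)
print(mean_spend)
```

Filtering before imputing keeps the invalid age of -1 from distorting the median that fills the gaps.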
scikit-learn
Cost/plan details: Free.
Features: It provides models and algorithms for classification, regression, clustering, dimensionality reduction, model selection, and pre-processing.

PyTorch
PyTorch, released in January 2017, is an open-source machine learning and deep learning library based on Torch, a scientific computing framework and scripting language that is in turn based on the Lua programming language.

TensorFlow
Cost/plan details: Free.
Features:
  Helps in training and building your models.
  You can run your existing models in the browser or in Node.js with TensorFlow.js, which includes a model converter.
  It supports building neural networks.

Keras
Keras is an API (Application Programming Interface) for neural networks.
Cost/plan details: Free.
Features:
  It can be used for easy and fast prototyping.
  It supports convolutional networks.
  It supports recurrent networks.
  It supports combinations of the two.
  It can be run on the CPU and GPU.

Data pipelines
Data ingestion
  CSV/JSON/XML/H5 files, RDBMS, NoSQL, HTTP, ...
Data cleaning
  Must be done systematically
  Outliers/invalid values? → filter
  Missing values? → impute
Data transformation
  Scaling/normalization

Questions
Thank you
