Machine Learning Lecture Notes PDF

Machine Learning
Dr. Amira Abdelatey

The Data Science Process (contd.)

Diagram: micro-level data strategy
1. Raw data is collected
2. Data is processed
3. Data is cleaned
4. Exploratory data analysis
5. Machine learning algorithms; statistical models
6. Communication: visualization, report findings
7. Build data products
Outcome: make decisions

(Slide footer: 12/8/2024, Rich's Big Data Analytics Training)

The Seven Steps
1. Collect data
2. Prepare data
3. Clean data
4. EDA
5. Visualize + understand
6. Fit models/apply algorithms + analyze + predict + understand (deeper) + visualize
7. Build data products

Classification algorithms
- Naive Bayes classifier
- Support vector machines
- k-nearest neighbor
- Random forest
- Decision tree
- Logistic regression

Clustering algorithms
1. K-Means clustering
2. Mean Shift clustering algorithm
3. Gaussian Mixture Model

Feature selection
Feature selection is the process of selecting a subset of relevant features (variables, predictors) for use in model construction.
Why? To avoid the curse of dimensionality.

Decision tree
Decision Trees as Rules
Training data example: the goal is to predict when this player will play tennis.

(Slide footer: Intro AI, Decision Trees)

Feature selection
Feature selection is a process that chooses a subset of features from the original features so that the feature space is optimally reduced according to a certain criterion.
Example: Temperature is irrelevant.

Homework
Which feature will be at the root node of the decision tree trained on the following data? In other words, which attribute makes a person most attractive?

Height  Hair    Eyes   Attractive?
small   blonde  brown  No
tall    dark    brown  No
tall    blonde  blue   Yes
tall    dark    blue   No
small   dark    blue   No
tall    red     blue   Yes
tall    blonde  brown  No
small   blonde  blue   Yes

Import required libraries

    # Load libraries
    import pandas as pd
    from sklearn.tree import DecisionTreeClassifier  # Import Decision Tree Classifier
    from sklearn.model_selection import train_test_split  # Import train_test_split function
    from sklearn import metrics  # Import scikit-learn metrics module for accuracy

Loading data
Load the required Pima Indian Diabetes dataset using pandas' read_csv function.

    col_names = ['pregnant', 'glucose', 'bp', 'skin', 'insulin', 'bmi', 'pedigree', 'age', 'label']
    # Load dataset
    pima = pd.read_csv("diabetes.csv", header=None, names=col_names)
    pima.head()

Feature Selection

Splitting Data

    # Split dataset into training set and test set
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)  # 70% training and 30% test

Building Decision Tree Model

Evaluating the Model
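The notes name the Feature Selection, Building Decision Tree Model and Evaluating the Model steps but carry no code for them, and X and y are never defined. The sketch below shows one way the steps could fit together. It makes several assumptions that are not in the notes: every column except 'label' is used as a feature, the tree is built with scikit-learn's default settings, accuracy is the evaluation metric, and a small synthetic DataFrame stands in for diabetes.csv so the example runs without the file. The helper name fit_and_score is hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

col_names = ['pregnant', 'glucose', 'bp', 'skin', 'insulin',
             'bmi', 'pedigree', 'age', 'label']


def fit_and_score(data, feature_cols, target_col='label'):
    """Split the data 70/30, fit a decision tree, return (model, test accuracy)."""
    # Feature selection: keep only the chosen columns as inputs
    X = data[feature_cols]
    y = data[target_col]

    # Splitting data: same split parameters as in the notes
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=1)

    # Building the model (default settings; random_state fixed for repeatability)
    clf = DecisionTreeClassifier(random_state=1)
    clf.fit(X_train, y_train)

    # Evaluating the model on the held-out 30%
    y_pred = clf.predict(X_test)
    return clf, metrics.accuracy_score(y_test, y_pred)


# Synthetic stand-in for the Pima data (an assumption, NOT the real dataset):
# eight random feature columns and a label that depends only on 'glucose'.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.random((200, 8)), columns=col_names[:-1])
demo['label'] = (demo['glucose'] > 0.5).astype(int)

# Assumption: use every column except the label as a feature
feature_cols = [c for c in col_names if c != 'label']

clf, accuracy = fit_and_score(demo, feature_cols)
print("Accuracy:", accuracy)
```

To run this on the real data, pass the pima DataFrame loaded with pd.read_csv in place of demo. Narrowing feature_cols to a subset is where a feature selection decision would go.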
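The homework question about the root node can be checked numerically. The sketch below assumes the tree is grown with the entropy-based information gain criterion (as in ID3); the notes do not say which criterion is intended, and a different one such as Gini impurity would need a different impurity function. The eye colours are written in lower case so that "Blue" and "blue" count as the same value.

```python
from collections import Counter
from math import log2

# Homework table: (Height, Hair, Eyes, Attractive?)
rows = [
    ("small", "blonde", "brown", "No"),
    ("tall",  "dark",   "brown", "No"),
    ("tall",  "blonde", "blue",  "Yes"),
    ("tall",  "dark",   "blue",  "No"),
    ("small", "dark",   "blue",  "No"),
    ("tall",  "red",    "blue",  "Yes"),
    ("tall",  "blonde", "brown", "No"),
    ("small", "blonde", "blue",  "Yes"),
]
features = ["Height", "Hair", "Eyes"]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, col):
    """Entropy of the labels minus the weighted entropy after splitting on col."""
    labels = [r[-1] for r in rows]
    groups = {}
    for r in rows:
        groups.setdefault(r[col], []).append(r[-1])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


gains = {name: information_gain(rows, i) for i, name in enumerate(features)}
best = max(gains, key=gains.get)

for name, gain in gains.items():
    print(f"{name}: {gain:.4f}")
print("Root node under information gain:", best)
```

Under this criterion Hair has the largest gain (about 0.45, against about 0.35 for Eyes and close to 0 for Height), because the dark and red groups are each pure and only the blonde group remains mixed.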
