Data Mining & Warehousing Practical 08: Logistic Regression PDF

Document Details

Uploaded by Deleted User

Smt. Chandibai Himathmal Mansukhani College

2024

Tags

logistic regression python programming data mining machine learning

Summary

This document contains a practical assignment for the Data Mining & Warehousing course, focusing on logistic regression using Python. Students are required to implement logistic regression for the Iris dataset and another dataset pertaining to breast cancer, showcasing the practical application of classification techniques. The practical covers steps from data loading to model evaluation, including aspects like data preprocessing and evaluation metrics like confusion matrices.

Full Transcript

Smt.Chandibai Himathmal Mansukhani College Data Mining & Warehousing (USCSP6041) Practical 08: Logistic Regression Learning Objectives Students will be able to: Content: Implementing a logistic regression method to make predictions base...

Smt.Chandibai Himathmal Mansukhani College Data Mining & Warehousing (USCSP6041) Practical 08: Logistic Regression Learning Objectives Students will be able to: Content: Implementing a logistic regression method to make predictions based on the sample data set using Python. Process: Find the best fitting model to describe the relationship between the variables. Estimating the parameters of the model and predicting probabilities Prior Knowledge: Basics of regression. Concept of Logistic regression AIM: Implement a logistic regression method to make predictions based on the sample data set using Python What is Logistic Regression? Logistic Regression is a statistical method used for binary classification problems, where the goal is to predict one of two possible outcomes. Despite the name, logistic regression is used for classification tasks rather than regression tasks. Binary Classification: Logistic regression is commonly used when you have two possible categories or classes. For example, in a breast cancer dataset, it could predict whether a tumor is benign (0) or malignant (1). Logistic Function (Sigmoid): Logistic regression applies the logistic (sigmoid) function to the output of a linear equation to map any real-valued number into a probability between 0 and 1. The sigmoid function is defined as: Output: The output of logistic regression is a probability, and based on a chosen threshold (commonly 0.5), the model predicts: If the probability is greater than or equal to 0.5, the outcome is classified as 1 (positive class, e.g., Batch No:___________/ Roll No:_____________ 63 Smt.Chandibai Himathmal Mansukhani College malignant). If the probability is less than 0.5, the outcome is classified as 0 (negative class, e.g., benign). Formula for Logistic Regression: Q1. Implement logistic regression in Python to classify iris flowers into three species (Setosa, Versicolor, and Virginica) based on their features using the iris dataset. Import the modules Load the dataset Smt. CHM College/CS Dept./Sem-VI/USCS601/USCSP6041/2024-2025 64 Smt.Chandibai Himathmal Mansukhani College Extract features and target Split the Data for Training and Testing Standardize feature value Train a multinomial logistic regression model Make predictions on the test set Batch No:___________/ Roll No:_____________ 65 Smt.Chandibai Himathmal Mansukhani College Print the accuracy of the Model Display classification report Compute confusion matrix Plot confusion matrix Smt. CHM College/CS Dept./Sem-VI/USCS601/USCSP6041/2024-2025 66 Smt.Chandibai Himathmal Mansukhani College Output of the plot: Q2. Implement logistic regression in Python to predict whether a breast tumor is malignant or benign based on several features using Breast Cancer Dataset. Import required libraries Load the dataset Features and target variable Batch No:___________/ Roll No:_____________ 67 Smt.Chandibai Himathmal Mansukhani College Split the dataset into training and testing sets Standardize the feature values Create and train the logistic regression model Make predictions on the test set Evaluate the model's performance Display the classification report Smt. CHM College/CS Dept./Sem-VI/USCS601/USCSP6041/2024-2025 68 Smt.Chandibai Himathmal Mansukhani College Compute the confusion matrix Plot the confusion matrix Output of the plot: Batch No:___________/ Roll No:_____________ 69 Smt.Chandibai Himathmal Mansukhani College Q3. Transform iris dataset into a binary classification problem (Setosa vs. Non-Setosa), and then use logistic regression to predict the species of the iris flower, where the outcome is binary (Setosa or not Setosa). Import necessary libraries Load the Iris dataset Convert the target variable to binary: Setosa = 1, Non-Setosa = 0 Split the dataset into training and testing Standardize the feature values Smt. CHM College/CS Dept./Sem-VI/USCS601/USCSP6041/2024-2025 70 Smt.Chandibai Himathmal Mansukhani College Create and train the logistic regression model Make predictions on the test set Evaluate the model's performance Display the classification report Compute the confusion matrix Batch No:___________/ Roll No:_____________ 71 Smt.Chandibai Himathmal Mansukhani College Plot the confusion matrix Output: Date: ________________________ Teacher’s Signature: _________________________ Smt. CHM College/CS Dept./Sem-VI/USCS601/USCSP6041/2024-2025 72

Use Quizgecko on...
Browser
Browser