Summary
This document provides an overview of classifying machine learning techniques, including supervised learning with linear regression, unsupervised learning with k-means clustering, classical machine learning with logistic regression, and deep learning models with neural networks. It details the process of using exploratory data analysis (EDA) and feature engineering to prepare data for a neural network model.
**Classifying the Machine Learning Techniques**

Bhagawati Prasad Chaudhary
Presidential Graduate School, Westcliff University
California, USA / New-Baneshwor, Kathmandu
TECH 405: Artificial Neural Network & Deep Learning
Professor Acharya
October 27, 2024

**Classifying the Machine Learning Techniques**

Introduction:

Machine learning is a field within artificial intelligence that builds algorithms from data so that decisions can be made through proper analysis. It is divided into three main categories, namely supervised, unsupervised, and reinforcement learning, according to how the algorithms learn from the data.

Types of Machine Learning Methods:

1. Supervised Learning (with Linear Regression):

Code and Output: (see the first sketch after this list)

Explanation:

2. Unsupervised Learning (with K-means Clustering):

Code and Output: (see the second sketch after this list)

[Figure: k-means clustering output]

Explanation:

3. Classical Machine Learning Model (with Logistic Regression):

Code and Output: (see the third sketch after this list)

4. Deep Learning Model (with Neural Network):

Code and Output: (see the fourth sketch after this list)

[Figure: neural network training output]
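The code figure for the linear regression model did not survive extraction. The following is a minimal sketch of the kind of supervised-learning workflow described above. The choice of the California housing dataset, the split ratio, and the random seed are assumptions; they are used here because this setup typically yields a test MSE close to the 0.5559 reported below, but the author's actual data source is not stated.

```python
# Hypothetical reconstruction: supervised learning with linear regression.
# Dataset choice (California housing) is an assumption, not confirmed by the source.
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Load features X and the continuous target y (median house value).
X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)     # learn weights from labeled data
y_pred = model.predict(X_test)  # predict on unseen examples

print("MSE:", mean_squared_error(y_test, y_pred))
```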
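The k-means code figure is likewise missing. This is a minimal sketch of the technique on synthetic two-dimensional data; the generated blobs and the choice of k = 3 are illustrative assumptions, since the original data are unknown.

```python
# Hypothetical reconstruction: unsupervised learning with k-means clustering.
# The synthetic blob data and k=3 are assumptions for illustration.
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Generate unlabeled 2-D points; k-means must find structure on its own.
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)  # assign each point to a cluster

plt.scatter(X[:, 0], X[:, 1], c=labels)
plt.scatter(*kmeans.cluster_centers_.T, marker="x", color="red")
plt.title("K-means clusters")
plt.show()
```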
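For the logistic regression model, this is a minimal sketch of a classical classification workflow. The iris dataset is an assumption; it is used here only because small, well-separated datasets like it routinely produce the perfect accuracy discussed below.

```python
# Hypothetical reconstruction: classical classification with logistic regression.
# The iris dataset is an assumption; the source does not name the data used.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```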
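Finally, for the deep learning model, this is a minimal Keras sketch of a small feed-forward network. The dataset (MNIST), the architecture, and all hyperparameters are assumptions rather than the author's actual configuration; the per-epoch output of `fit` shows the falling loss and rising accuracy described below.

```python
# Hypothetical reconstruction: a small feed-forward neural network in Keras.
# Dataset (MNIST), architecture, and hyperparameters are assumptions.
from tensorflow import keras

(X_train, y_train), (X_test, y_test) = keras.datasets.mnist.load_data()
X_train, X_test = X_train / 255.0, X_test / 255.0  # scale pixels to [0, 1]

model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Loss should fall and accuracy rise across epochs, as the report notes.
model.fit(X_train, y_train, epochs=5, validation_split=0.1)
print(model.evaluate(X_test, y_test))
```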
Summary of Model Performances:

For linear regression, an MSE of 0.5559 indicated a moderate level of prediction error. For k-means clustering, the clusters presented in the visualization suggest a meaningful grouping of similar data points within a limited region. For logistic regression, the model still needs to be validated on additional datasets to ensure that its perfect accuracy is not due to overfitting, since perfect accuracy is rare in practice. The neural network model performed well: the improvement in loss and accuracy between epochs shows that the model was effective at minimizing error.

Link to performed code and output:

**Exploratory Data Analysis and Feature Engineering for Neural Networks**

Bhagawati Prasad Chaudhary
Presidential Graduate School, Westcliff University
California, USA / New-Baneshwor, Kathmandu
TECH 405b: Artificial Neural Network & Deep Learning
Professor Aacharya
November 06, 2024

**Table of Contents**
=====================

Abstract
1. Introduction
  1.1 Objective
  1.2 Dataset Description
2. Understanding Key Concepts
  2.1 Exploratory Data Analysis (EDA)
  2.2 Feature Engineering
  2.3 Visualization
3. Dataset Structure
4. Exploratory Data Analysis
  4.1 Histogram
  4.2 Bar Charts
  4.3 Heatmap
5. Feature Engineering
  5.1 Encoding Categorical Variables
  5.2 Feature Scaling
6. Findings and Observations
7. Conclusion
8. Repository Links
References

**Table of Figures**

Figure 1: Code for inspecting the dataset structure
Figure 2: Output of the dataset structure code
Figure 3: Code and output of EDA
Figure 4: Code for histogram of calories
Figure 5: Output of histogram of calories
Figure 6: Code for bar chart of manufacturer distribution
Figure 7: Output of bar chart (cereal manufacturer distribution)
Figure 8: Code for correlation heatmap
Figure 9: Output of correlation heatmap ('cereal')
Figure 10: Code and output for encoding categorical variables
Figure 11: Code and output of feature scaling

**Abstract**
============

This report is based on our course's first-week assignment, which explores and prepares the 'cereal' dataset for use in a neural network model through a series of data preprocessing steps. The first step is EDA, where statistical summaries and visualizations, such as histograms and heatmaps, reveal patterns and relationships among the features. Following EDA, feature engineering techniques are applied to convert categorical data into a suitable format and to normalize numerical values, establishing a structured approach to data preparation that enhances model performance and interpretability.

*Keywords:* Visualization, EDA, Neural Network Model, Histograms.

**Exploratory Data Analysis and Feature Engineering for Neural Networks**

1. Introduction
===============

1.1 Objective
-------------

1.2 Dataset Description
-----------------------

2. Understanding Key Concepts
=============================

2.1 Exploratory Data Analysis (EDA)
-----------------------------------

2.2 Feature Engineering
-----------------------

2.3 Visualization
-----------------

Visualization of data is important for finding the significant patterns in a dataset. In the 'cereals' dataset, for example, it can show the calorie distribution, the fiber content, and how the nutritional factors affect the cereal rating. Popular examples of visualization are histograms, bar charts, and heatmaps.

3. Dataset Structure
====================

Figure 1: Code for inspecting the dataset structure
Figure 2: Output of the dataset structure code

(Minimal sketches of the code in Sections 3 and 4 appear after Section 4.2.)

4. Exploratory Data Analysis
============================

Figure 3: Code and output of EDA

The code was written with the help of Codecademy (n.d.) and Ngecha (2023).

4.1 Histogram
-------------

Figure 4: Code for histogram of calories
Figure 5: Output of histogram of calories

The output generated by the code is a histogram of the distribution of the numeric feature 'calories'.

4.2 Bar Charts
--------------

Figure 6: Code for bar chart of manufacturer distribution
Figure 7: Output of bar chart (cereal manufacturer distribution)

The output shows the number of cereals from each manufacturer in the dataset, revealing which brand has the most products.
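The code figures for Sections 3 and 4 (Figures 1 to 3) did not survive extraction. The following is a minimal sketch of the dataset-structure and summary-statistics steps; the file name `cereal.csv` is an assumption based on the Kaggle source named in the references.

```python
# Hypothetical reconstruction of Figures 1-3: loading and inspecting the dataset.
# The file name 'cereal.csv' is an assumption based on the Kaggle source.
import pandas as pd

df = pd.read_csv("cereal.csv")

print(df.shape)           # number of rows and columns
print(df.head())          # first few records
df.info()                 # column names, dtypes, non-null counts
print(df.describe())      # summary statistics for the numeric features
print(df.isnull().sum())  # missing values per column
```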
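For Section 4.1, this is a minimal sketch of the calories histogram (Figures 4 and 5); the bin count is an arbitrary illustrative choice.

```python
# Hypothetical reconstruction of Figures 4-5: histogram of the 'calories' column.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("cereal.csv")  # assumed local copy of the Kaggle dataset

plt.hist(df["calories"], bins=15, edgecolor="black")
plt.xlabel("Calories")
plt.ylabel("Number of cereals")
plt.title("Distribution of Calories")
plt.show()
```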
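For Section 4.2, this is a minimal sketch of the manufacturer bar chart (Figures 6 and 7); the column name `mfr` follows the Kaggle cereals dataset and is an assumption about the author's data.

```python
# Hypothetical reconstruction of Figures 6-7: cereals per manufacturer.
# 'mfr' is the manufacturer column in the Kaggle cereals dataset.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("cereal.csv")  # assumed local copy of the Kaggle dataset

df["mfr"].value_counts().plot(kind="bar")
plt.xlabel("Manufacturer")
plt.ylabel("Number of cereals")
plt.title("Cereal Manufacturer Distribution")
plt.show()
```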
4.3 Heatmap
-----------

Figure 8: Code for 'Correlation Heatmap'
Figure 9: Output of correlation heatmap ('cereal')

The result reveals the correlations between features of the 'cereals' dataset, for example fiber, rating, and sugars. Strong correlations indicate which features have the greatest influence on a cereal's rating, which is very useful when building future models. (A minimal sketch of this step appears after Section 5.2.)

5. Feature Engineering
======================

5.1 Encoding Categorical Variables
----------------------------------

Figure 10: Code and output for encoding categorical variables

5.2 Feature Scaling
-------------------

Figure 11: Code and output of feature scaling
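The heatmap code of Section 4.3 (Figures 8 and 9) is missing; this is a minimal sketch of a correlation heatmap with seaborn, again assuming a local `cereal.csv` copy of the Kaggle dataset.

```python
# Hypothetical reconstruction of Figures 8-9: correlation heatmap of numeric features.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("cereal.csv")  # assumed local copy of the Kaggle dataset

# Correlate only the numeric columns (calories, fiber, sugars, rating, ...).
corr = df.select_dtypes(include="number").corr()

plt.figure(figsize=(10, 8))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm")
plt.title("Correlation Heatmap ('cereal')")
plt.show()
```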
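For Sections 5.1 and 5.2 (Figures 10 and 11), this is a minimal sketch of one-hot encoding followed by standardization. The column names (`mfr`, `type`, and the numeric list) follow the Kaggle cereals dataset and are assumptions about the author's data.

```python
# Hypothetical reconstruction of Figures 10-11: one-hot encoding and scaling.
# The column names follow the Kaggle cereals dataset and are assumptions.
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("cereal.csv")  # assumed local copy of the Kaggle dataset

# Encode categorical columns ('mfr' = manufacturer, 'type' = hot/cold)
# as indicator variables so a neural network receives numeric input.
df_encoded = pd.get_dummies(df, columns=["mfr", "type"])

# Standardize the numeric features to zero mean and unit variance.
numeric_cols = ["calories", "protein", "fat", "sodium", "fiber",
                "carbo", "sugars", "potass", "vitamins"]
scaler = StandardScaler()
df_encoded[numeric_cols] = scaler.fit_transform(df_encoded[numeric_cols])

print(df_encoded.head())
```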
6. Findings and Observations
============================

7. Conclusion
=============

8. Repository Links
===================

**References**
==============

*Cereals dataset*. (2017, December 8). Kaggle.

Codecademy. (n.d.). *Exploratory data analysis: Data visualization*. Codecademy.

Ngecha, M. (2023, October 9). Exploratory data analysis (EDA) through data visualization. *Medium*.

**Week3_DQ [Deep Learning]**

At present, deep learning is considered very valuable in many sectors, such as healthcare, agriculture, cybersecurity, and even social media. The main challenge, however, is that it is hard for people to understand how a deep learning model makes its decisions. Because of this lack of clarity, deep learning is still viewed with concern in sensitive fields such as healthcare. To use and manage it safely and effectively in those fields, researchers are focusing on approaches that balance performance with interpretability.

Ways to Make Deep Learning More Understandable:

1. Explainable AI (XAI): Explainable AI techniques help people understand how a model reaches a particular decision. For example, tools like SHAP (SHapley Additive exPlanations) show which factors matter most for a model's prediction (Molnar, 2024). In healthcare, this helps doctors understand what led an AI to diagnose a specific condition by highlighting factors such as age, blood pressure, or other data.

2. Layer-Wise Relevance Propagation (LRP): LRP is another method for understanding a model's output; it reveals which parts of the input data contributed most to the model's decision (Guidotti et al., 2018). For example, in agriculture, farmers can use LRP to make smarter decisions because it clarifies which weather conditions or soil factors led to a prediction about crop health.

3. Hybrid Models: Combining a deep learning algorithm with a simpler kind of method, such as a decision tree, can improve a model's transparency. Simpler, inherently interpretable methods include the following:

   1. Expert Rule-Based Systems: These systems are built on predefined rules provided by experts. IBM's Watson combines such rules with other forms of AI to recommend treatments in healthcare, with explainability rooted in medical guidelines.

   2. Probabilistic Models: These are models, such as Bayesian networks, that calculate the probability of an event given known conditions (Holzinger et al., 2019). For instance, in agriculture a Bayesian model could estimate the risk of crop failure while indicating the particular weather and soil conditions responsible.

   3. Clear Machine Learning Models: Decision trees are simple and often do the job. As an illustration, decision trees in finance evaluate credit risk from income and credit history, linking these factors so that all users can clearly follow the decision made at each level. (A minimal sketch appears after the references below.)

**Real-World Example: Explainable Deep Learning in Healthcare**

One real-world application of explainable AI in healthcare is Google DeepMind's work on predicting kidney injury. DeepMind trained a deep learning model on patient data that can predict acute kidney injury up to 48 hours in advance. To tackle the interpretability problem, they apply SHAP values to explain the predictions in terms of patient features such as blood pressure or creatinine levels. Such transparency enables healthcare providers to understand and trust the model's forecasts, making it a more clinically deployable solution.

Hence, in sensitive fields it is important to provide proper explainability in AI, whether by using simpler models, combining AI with rule-based methods, or applying interpretability tools, so that people can genuinely understand the systems and AI can be safely trusted in critical applications.

References

Guidotti, R., Monreale, A., Ruggieri, S., Turini, F., Giannotti, F., & Pedreschi, D. (2018). A survey of methods for explaining black box models. *ACM Computing Surveys*, *51*(5), 1-42.

Holzinger, A., Langs, G., Denk, H., Zatloukal, K., & Müller, H. (2019). Causability and explainability of artificial intelligence in medicine. *Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery*, *9*(4).

Molnar, C. (2024, July 31). *Interpretable machine learning*.
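As a supplement to item 3 above, the following is a minimal sketch of an inherently interpretable model: a small decision tree whose decision rules can be printed and read directly, unlike the internals of a deep network. The synthetic credit data (income and years of credit history) and the approval rule are invented purely for illustration.

```python
# Hypothetical illustration: a small decision tree whose rules are readable.
# The synthetic credit data and the toy approval rule are invented for
# illustration only; they do not come from the assignment.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
income = rng.uniform(20_000, 120_000, 200)  # annual income
history = rng.uniform(0, 20, 200)           # years of credit history
X = np.column_stack([income, history])
# Toy rule: approve when income and credit history are both reasonably high.
y = ((income > 50_000) & (history > 3)).astype(int)

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Unlike a deep network, every decision step can be read as a rule.
print(export_text(tree, feature_names=["income", "credit_history_years"]))
```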