EE5252_Lecture01-Introduction to Machine Learning.pdf
University of Ruhuna
Lecture 01: Introduction to Machine Learning
EE5252: Machine Learning
Mr. M.W.G Charuka Kavinda Moremada
Lecturer, Department of Electrical and Information Engineering, Faculty of Engineering, University of Ruhuna.

Lecture Overview
1. What is Machine Learning (ML)
2. Conventional Programming vs ML
3. A Brief History
4. ML in Action
5. Types of ML
   i. Supervised Learning
   ii. Unsupervised Learning
   iii. Reinforcement Learning
   iv. Semi-Supervised Learning
6. Data
   i. Key Terminologies
   ii. Splits
7. Parameters vs Hyperparameters
8. General ML Application Development Workflow
9. Example: Classify iris plants into three species
10. How Supervised Learning Works?
11. Project
   i. YouTube Tutorials
6/28/2024 EE7209: Machine Learning 2

What is Machine Learning (ML)
Machine learning is a subfield of Artificial Intelligence (AI). The goal of AI is to create computer models that exhibit “intelligent behaviors” like humans. Machine learning is one way to achieve AI.
“The field of study that gives computers the ability to learn without explicitly being programmed.” - Arthur Samuel
ML takes the approach of letting computers learn to program themselves through experience.
Machine learning starts with data: it turns data into information. This data can be numbers, pictures, text, sound/voice recordings, time-series data from sensors, etc. The data is collected and transformed into forms that ML models can be trained on.

Conventional Programming vs ML
Conventional programming example for activity recognition. What is this?
ML example for activity recognition. ML builds a mathematical model of the problem from sample data (generally known as “training data”) without being explicitly programmed.
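To make the contrast concrete, here is a minimal sketch for a toy activity-recognition task. It assumes a single hypothetical feature (average acceleration magnitude in g) and made-up training samples; all function names and numbers are illustrative only. In the conventional version, the programmer hand-writes the threshold. In the ML version, the threshold is chosen automatically as the one that makes the fewest mistakes on labelled training data. This is a one-feature “decision stump”, used here only because it is the simplest possible learned model.

```python
# Toy activity recognition: label a short window of accelerometer data as
# "walking" or "running" from ONE hypothetical feature, the average
# acceleration magnitude (in g). All numbers below are made up for illustration.


def rule_based_classifier(magnitude):
    """Conventional programming: a human chooses the rule (threshold) by hand."""
    return "running" if magnitude > 1.5 else "walking"


def fit_threshold(samples):
    """Machine learning: pick the threshold that makes the fewest mistakes on
    labelled training data. `samples` is a list of (magnitude, label) pairs."""
    values = sorted(m for m, _ in samples)
    # Candidate thresholds: midpoints between consecutive sorted feature values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def training_errors(threshold):
        # Count samples whose predicted label disagrees with the true label.
        return sum((m > threshold) != (label == "running") for m, label in samples)

    return min(candidates, key=training_errors)


def make_classifier(threshold):
    """Wrap a learned threshold as a prediction function (the trained "model")."""
    def predict(magnitude):
        return "running" if magnitude > threshold else "walking"
    return predict


# Hypothetical labelled training data.
training_data = [
    (1.05, "walking"), (1.10, "walking"), (1.20, "walking"),
    (1.80, "running"), (2.10, "running"), (2.40, "running"),
]

learned_threshold = fit_threshold(training_data)
model = make_classifier(learned_threshold)

print("hand-written rule on 1.7 g:", rule_based_classifier(1.7))
print("learned threshold:", learned_threshold)
print("learned model on 1.7 g:", model(1.7))
```

Both classifiers have the same form. The difference is that the second threshold is derived from the labelled examples rather than written by hand, so changing the training data changes the learned rule without editing any code.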

A Brief History
1959: Arthur Samuel of IBM came up with the phrase “Machine Learning”.
1950-1980: The very first ML programs were written for simple applications (e.g., improving computer performance in the game of checkers, recognizing rough patterns).
1980-2010: The availability of digital data increased with the growth of the internet. The field shifted from knowledge-driven approaches to data-driven approaches.
2010-Today: Cheaper memory allows storage of massive amounts of data. Faster processing power became available (e.g., GPUs). Deep learning became a reality.

ML in Action
Image Recognition: ML is used to identify objects, persons, and places in digital images. For example, Facebook uses ML for automatic friend-tagging suggestions.
Translation: ML is used to translate text from one language to another.
Self-driving cars: Identifying objects and obstacles on the road to navigate safely.
Medical Diagnosis: Analyzing patient data for the early detection of illnesses, and analyzing medical images such as X-rays and MRIs to identify diseases and abnormalities.
Recommender Systems:
   Product recommendations: Suggesting products to users based on their browsing history and past purchases.
   Content recommendation: Recommending music, movies, and articles that users might enjoy.
   News feed personalization: Curating news feeds and social media content based on individual user preferences.
Some other common examples:
Fraud Detection:
   Identifying fraudulent transactions in credit card spending and online banking.
   Detecting fake news and misinformation online.
   Preventing cyberattacks and malware infections.
Predictive Analytics:
   Predicting maintenance needs: Foreseeing equipment failures to prevent downtime and costly repairs.
   Demand forecasting: Predicting customer demand for products and services to optimize inventory and production.
   Risk assessment: Evaluating the likelihood of events such as loan defaults or insurance claims.

Types of ML
   Supervised Learning
   Unsupervised Learning
   Reinforcement Learning
   Semi-Supervised Learning (falls between supervised and unsupervised learning)

Types of ML: Supervised Learning
In supervised learning, the model is trained on a labelled dataset. Labelled datasets contain both the inputs and their correct outputs (labels), and the model learns a mapping from the inputs to the correct outputs. There are two main categories of supervised learning:
Classification: Classification algorithms predict categorical target variables, which represent discrete classes or labels. Example uses: predicting whether a person is ill or not, detecting fraudulent transactions, face classification.
Regression: Regression algorithms predict continuous target variables, which represent numerical values. Example uses: estimating house prices, forecasting grocery store food demand, temperature forecasting.

Types of ML: Unsupervised Learning
Unsupervised learning involves training the model on datasets that contain only inputs (no labels). The model identifies patterns and relationships in the data.
Some common types of unsupervised learning:
Clustering: This is the task of dividing the population or data points into several groups such that data points in the same group are more similar to each other than to those in other groups.
   It is essentially the grouping of objects based on their similarity and dissimilarity.
Association: An association rule learning problem is one where you want to discover rules that describe large portions of your data, such as “people who buy X also tend to buy Y”.
Dimensionality Reduction: This method is used when the number of features (or dimensions) in the dataset is too high. It reduces the number of observed variables to a smaller number of principal variables.

Types of ML: Reinforcement Learning
In reinforcement learning, an agent learns to behave in an environment by performing actions and observing the rewards/results it gets from those actions.

Types of ML: Semi-Supervised Learning
Semi-supervised learning is a combination of supervised and unsupervised learning: the model is trained on a partially labelled dataset.

Data
The available data can be divided into two main categories:
Structured Data: Data that fits neatly into a relational database. It is highly organized and easily analyzed.
Unstructured Data: Data that does not fit neatly into a spreadsheet or database. It can be textual or non-textual, and human- or machine-generated.

Data: Key Terminologies
An example with a structured dataset for a classification problem. Task: Bird species classification based on four features. Example from: P. Harrington, Machine Learning in Action. Shelter Island, NY: Manning Publications, 2012. ISBN-13: 978-1617290183
The target variable is what we’ll be trying to predict with our machine learning algorithms.
In classification, the target variable takes on a nominal value; in regression, its value can be continuous.
Features: Features or attributes are the individual measurements that, when combined with other features, make up a training example. They are usually the columns in a training or test set.
Note: Supervised vs Unsupervised Cases

Data: Splits
We need to split our data into three sets to make sure that our model is not overfitting to the training data. Overfitting occurs when the model learns the training data too well and becomes unable to generalize to new data. (More information on overfitting will follow in upcoming lectures.)
Training: Used to train the model.
Validation: Used to tune the hyperparameters.
Testing: Used to evaluate the model’s performance on new data.

Parameters vs Hyperparameters
Parameters                                          | Hyperparameters
Estimated during training with historical data.     | Values are set beforehand.
Part of the model.                                  | External to the model.
Estimated values are saved with the trained model.  | Not part of the trained model; therefore, values are not saved.
Depend on the dataset the system is trained on.     | Independent of the dataset.
This will become clearer when we start developing models.

General ML Application Development Workflow

Example: Classify iris plants into three species
Number of classes: 3
Samples per class: 50
Samples in total: 150
Dimensionality: 4
Ref: “Classification — Python Numerical Methods,” pythonnumericalmethods.berkeley.edu.
https://pythonnumericalmethods.berkeley.edu/notebooks/chapter25.02-Classification.html (accessed Jan. 15, 2024).
Note: Only 2 features have been used for visualization purposes.

How Supervised Learning Works?

Project
Each project group will consist of 2 members. Together, you need to solve a classification or regression (supervised learning) problem, or a clustering (unsupervised learning) problem, using two machine learning models, and compare the models’ performance on the selected problem.
You need to select a dataset for your problem. Some popular data repositories:
   Kaggle: https://www.kaggle.com/
   UC Irvine: https://archive.ics.uci.edu/
Recommended language of implementation: Python
Platforms for implementation:
   Google Colab: https://colab.google/
   Jupyter Notebook hosted locally with the Anaconda platform: https://www.anaconda.com/download#

Project: YouTube Tutorials
   Python: https://youtu.be/rfscVS0vtbw
   NumPy: https://youtu.be/QUT1VHiLmmI
   Pandas: https://youtu.be/vmEHCJofslg
   Matplotlib: https://youtu.be/3Xc3CA655Y4
   Scikit-Learn: https://youtu.be/pqNCD_5r0IU
Feel free to explore more tutorials and resources on your own.
Upcoming important days:
   5th of June 2024: The LMS assignment for proposal submission opens.
   12th of July 2024: Proposal submissions close.
It is better to start thinking about what you are going to do!

Thank You