Document Details

Uploaded by SportyDeciduousForest4462

Dr Marcos Oliveira

Tags

data integration, data science, supervised learning, linear regression

Summary

This lecture covers data integration strategies (common user interface, middleware, application-based integration, uniform data access, and common data storage) and the trade-offs of combining data from disparate sources. It then introduces learning from data, including supervised and unsupervised learning.

Full Transcript

Learning from Data
Lecture 2
Dr Marcos Oliveira

Lecture 2
1. Data Integration
2. Learning from Data
3. Supervised Learning
   a. Linear Regression

Data Integration

Here’s where the popular view of data scientists diverges pretty significantly from reality. Generally, we think of data scientists building algorithms, exploring data, and doing predictive analysis. That’s actually not what they spend most of their time doing, however.

What do Data Scientists do?

What data scientists spend the most time doing:
Building training sets: 3%
Cleaning and organizing data: 60%
Collecting data sets: 19%
Mining data for patterns: 9%
Refining algorithms: 4%
Other: 5%
https://www.ipa.go.jp/digital/chousa/trend/datademocra/ug65p90000001hna-att/CrowdFlower_DataScienceReport_2016.pdf

They had this to say:

What’s the least enjoyable part of data science?
Building training sets: 10%
Cleaning and organizing data: 57%
Collecting data sets: 21%
Mining data for patterns: 3%
Refining algorithms: 4%
Other: 5%
https://www.ipa.go.jp/digital/chousa/trend/datademocra/ug65p90000001hna-att/CrowdFlower_DataScienceReport_2016.pdf

Data Integration

Data integration is the practice of combining data from heterogeneous sources into a single coherent data store. The goal is to provide users with consistent access and delivery of data across a spectrum of subjects and data structure types.

[Diagram: Data Source 1, Data Source 2, and Data Source 3 feed into Data Integration.]

Why?
Data sources are often disparate and siloed.
Access data across different sub-systems (e.g., hardware devices, software applications, operating systems).
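As a rough illustration of combining heterogeneous sources into a single coherent store, the sketch below merges a CSV export and a JSON feed that describe the same customers under different field names. The two sources, their field names, and the `integrate` helper are all made up for this example; real integrations also need to handle conflicts, missing keys, and type clean-up.

```python
import csv
import io
import json

# Hypothetical source 1: a CSV export from one system.
CSV_SOURCE = """customer_id,name,city
1,Ana,Exeter
2,Ben,Bristol
"""

# Hypothetical source 2: JSON records from another system,
# which uses a different field name ("id") for the same key.
JSON_SOURCE = (
    '[{"id": 2, "email": "ben@example.com"},'
    ' {"id": 3, "email": "cy@example.com"}]'
)


def integrate(csv_text, json_text):
    """Merge both sources into one store keyed by customer id."""
    store = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        cid = int(row["customer_id"])
        record = store.setdefault(cid, {"customer_id": cid})
        record.update(name=row["name"], city=row["city"])
    for rec in json.loads(json_text):
        cid = int(rec["id"])
        record = store.setdefault(cid, {"customer_id": cid})
        record.update(email=rec["email"])
    return store


integrated = integrate(CSV_SOURCE, JSON_SOURCE)
for cid in sorted(integrated):
    print(integrated[cid])
```

Customer 2 appears in both sources and ends up as one record with all four fields, while customers 1 and 3 keep only the fields their single source provides.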
Data Integration: Strategies

Common user interface (also known as manual data integration): A hands-on approach where data managers manually handle every step of the integration, from retrieval to presentation.

Middleware data integration: Uses middleware software to bridge and facilitate communication between different systems, especially between legacy and newer systems.

Application-based integration: Software applications locate, retrieve and integrate data by making data from different sources and systems compatible with one another.

Uniform data access: Provides a consistent view of data from diverse sources without moving or altering it, keeping the data in its original location.

Common data storage (also known as data warehouse): Retrieves and presents data uniformly while creating and storing a duplicate copy, often in a central repository.

https://www.talend.com/resources/data-integration-methods/
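The difference between uniform data access (data stays in its original location) and common data storage (a duplicate copy is kept centrally) can be sketched in a few lines. This is only a toy model: the `crm` and `billing` dictionaries stand in for source systems, and the `UniformView` and `Warehouse` classes are hypothetical names, not part of any real product.

```python
import copy

# Two hypothetical source systems, each holding its own records in place.
crm = {"alice": {"city": "Leeds"}}
billing = {"alice": {"balance": 40}}


class UniformView:
    """Uniform data access: reads the sources on demand, copies nothing."""

    def __init__(self, *sources):
        self.sources = sources  # references to the live sources

    def lookup(self, key):
        merged = {}
        for source in self.sources:
            merged.update(source.get(key, {}))
        return merged


class Warehouse:
    """Common data storage: keeps a duplicate copy in a central store."""

    def __init__(self, *sources):
        self.sources = sources
        self.load()

    def load(self):
        # Rebuild the central copy from deep copies of every source.
        self.store = {}
        for source in self.sources:
            for key, record in copy.deepcopy(source).items():
                self.store.setdefault(key, {}).update(record)

    def lookup(self, key):
        return self.store.get(key, {})


view = UniformView(crm, billing)
warehouse = Warehouse(crm, billing)

billing["alice"]["balance"] = 55  # the source system changes

print(view.lookup("alice"))       # reads the live sources
print(warehouse.lookup("alice"))  # stored copy, not yet refreshed
warehouse.load()
print(warehouse.lookup("alice"))  # copy after a reload
```

The view always hits the source systems, which keeps storage low but puts every query on the hosts. The warehouse answers from its own copy, which spares the hosts but has to be stored and refreshed.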
Data Integration: Strategies summary
Pros and cons

Technique: Common user interface
Advantage: Reduced cost, requires little maintenance, integrates a small number of data sources, user has total control.
Disadvantage: Data must be handled at each stage, scaling for projects requires changing code, manual orchestration.

Technique: Middleware data integration
Advantage: Middleware software conducts the integration automatically, and the same way each time.
Disadvantage: Middleware needs to be deployed and maintained.

Technique: Application-based integration
Advantage: Simplified process, application allows systems to transfer information seamlessly, much of the process is automated.
Disadvantage: Requires specialist technical knowledge and maintenance, complicated setup.

Technique: Uniform access integration
Advantage: Lower storage requirements, provides a simplified view of the data to the end user, easier data access.
Disadvantage: Can compromise data integrity, data host systems are not designed to handle the amount and frequency of data requests.

Technique: Common data storage (data warehouse)
Advantage: Reduced burden on the host system, increased data version management control, can run sophisticated queries on a stored copy of the data without compromising data integrity.
Disadvantage: Need to find a place to store a copy of the data, increases storage cost, requires technical experts to set up the integration and to oversee and maintain the data warehouse.

Learning from Data

Learning from Data: Module Overview

Supervised learning: Linear regression, polynomial regression, logistic regression. Measures of error, model complexity and model selection. Multilayer Perceptron, Convolutional Neural Networks. K-Nearest Neighbors, Support Vector Machines, Linear Discriminant Analysis. Decision trees.
Unsupervised learning: Centroid-based clustering, hierarchical clustering, density-based clustering. Gaussian Mixture Model, Clustering validation. Dimensionality reduction: PCA, t-SNE, UMAP.

Introduction to Computer Vision.

Natural language processing: TF-IDF, topic modeling, embeddings.

Types of Learning

Supervised learning algorithms use data with labelled outcomes. Supervised: regression, classification.

Unsupervised learning algorithms use data without labelled outcomes. Unsupervised: clustering, dimensionality reduction.

Semi-supervised learning algorithms use both data with labelled outcomes and without labelled outcomes.

Supervised learning

Supervised learning is the most common form of machine learning. The task is to learn a mapping function $f$ from inputs $x \in X$ to output $y \in Y$.

Inputs $x$: features, covariates, or predictors.
Output $y$: label, target, or response.

Examples?

[Figure: examples labelled cat or dog.]
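To make "learning a mapping function from inputs to output" concrete, here is a minimal sketch that uses a 1-nearest-neighbour rule, chosen only because it is short (K-Nearest Neighbors appears in the module overview). The features and their values are invented for illustration and do not come from the lecture.

```python
import math

# Hypothetical labelled training set: (features x, label y).
# The features are made-up measurements (weight in kg, height in cm).
training_data = [
    ((4.0, 25.0), "cat"),
    ((3.5, 23.0), "cat"),
    ((5.0, 26.0), "cat"),
    ((20.0, 55.0), "dog"),
    ((30.0, 60.0), "dog"),
    ((12.0, 40.0), "dog"),
]


def fit_nearest_neighbour(examples):
    """Return a function f that maps features x to a predicted label y."""

    def f(x):
        # Predict the label of the closest training example.
        nearest = min(examples, key=lambda ex: math.dist(ex[0], x))
        return nearest[1]

    return f


f = fit_nearest_neighbour(training_data)
print(f((4.2, 24.0)))   # a small animal, close to the cat examples
print(f((25.0, 58.0)))  # a large animal, close to the dog examples
```

Here the learned mapping is nothing more than the stored examples plus a distance rule. Other supervised methods differ in how the mapping is represented, not in the input-to-label setup.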
Unsupervised learning

In unsupervised learning, our task is to try to “make sense of” data, as opposed to learning a mapping. We have inputs $x \in X$ but no associated responses. What can we do?
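One thing that can be done with inputs alone is to group similar ones together. The sketch below runs a plain k-means loop on invented per-house usage figures. The data, the choice of k = 2, and the "first k points" initialisation are assumptions made for this illustration only.

```python
import math

# Hypothetical inputs x: average (daytime, evening) electricity usage
# in kWh for six houses. There are no labels.
houses = [
    (1.0, 6.0), (1.2, 5.5), (0.8, 6.4),  # mostly evening usage
    (5.0, 1.0), (5.5, 1.3), (4.8, 0.9),  # mostly daytime usage
]


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def nearest_index(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def k_means(points, k, iterations=10):
    """Plain k-means using the first k points as initial centroids."""
    centroids = list(points[:k])
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest_index(p, centroids)].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    assignments = [nearest_index(p, centroids) for p in points]
    return assignments, centroids


assignments, centroids = k_means(houses, k=2)
print(assignments)
print(centroids)
```

Houses that share a cluster index have similar usage profiles. Nothing told the algorithm which group is which, so the indices themselves carry no meaning beyond "same group".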
Example: electricity usage pattern at houses.

[Figure panels: electricity usage per house; overall electricity usage pattern; clustering houses with similar electricity usage patterns.]

Supervised Learning

Supervised learning

In supervised learning, we learn a mapping function $y_p = f(\Omega, x)$, where:
$x$: input
$y_p$: output (values predicted by the model)
$\Omega$: parameters of the model

[Figure: Customer 1 to Customer 5, with an axis numbered 1 to 10.]
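The roles of the input x, the parameters Ω, and the predicted output y_p in y_p = f(Ω, x) can be shown with a one-feature linear model, the topic the lecture outline lists next. This is a sketch under assumptions: the data points are invented, Ω is taken to be an (intercept, slope) pair, and the closed-form least-squares fit is one common way to choose the parameters, not necessarily the one used in the lecture.

```python
def f(omega, x):
    """Model: y_p = f(omega, x), with omega = (intercept, slope)."""
    intercept, slope = omega
    return intercept + slope * x


def fit_least_squares(xs, ys):
    """Choose omega by ordinary least squares for a single feature."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return (intercept, slope)


# Hypothetical training data: inputs x with observed outputs y.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]

omega = fit_least_squares(xs, ys)        # parameters of the model
y_p = [f(omega, x) for x in xs]          # values predicted by the model

print("omega:", omega)
print("y_p:", y_p)
```

Training only changes `omega`. The function `f` stays fixed, and a prediction for a new input is simply `f(omega, x_new)`.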
