Fundamentals of Data Science DS302 Lecture Notes PDF
Dr. Islam Saeed
These lecture notes for Fundamentals of Data Science cover what data science is, how it is used, and the relationship between artificial intelligence, machine learning, and data science. They also include the course grading scheme and the exam schedule.
Reference Books

- Data Science: Concepts and Practice, Vijay Kotu and Bala Deshpande, 2019.
- Data Science: Foundation & Fundamentals, B. S. V. Vatika, L. C. Dabra, Gwalior, 2023.

Course Grading

Component                              Marks
Mid-Term Exam                          20
Lecture Quizzes (Average)              10
Assignments                            5
Class work (Lectures + Labs)           5
Project Discussion                     10
Practical Exam                         10
Final Exam                             40
Bonus (for Project and class work)     1-5

Exams

Week     Action
3rd      Quiz 1
5th      Quiz 2
7th      Mid-Term
10th     Quiz 3
14th     Final Exam

What is Data Science

Data science is a compilation of techniques that extract value from data. Some of the techniques used in data science have a long history and trace their roots to applied statistics, machine learning, visualization, logic, and computer science. Data science techniques rely on finding useful patterns, connections, and relationships within data.

Data science is also commonly referred to as knowledge discovery, machine learning, predictive analytics, and data mining. In spite of the present growth and popularity, the underlying methods of data science are decades if not centuries old. The use of the term science in data science indicates that the methods are evidence based and are built on empirical knowledge, more specifically historical observations.

This in turn allows companies to increase efficiencies, manage costs, identify new market opportunities, and boost their market advantage. Data science is the practice of mining large sets of raw data, both structured and unstructured, to identify patterns and extract actionable insight from them. It’s an interdisciplinary field, and the foundations of data science include statistics, computer science, predictive analytics, machine learning algorithm development, and new technologies to gain insights from big data.

AI, MACHINE LEARNING, AND DATA SCIENCE

Artificial intelligence, machine learning, and data science are all related to each other.
They are often used interchangeably and conflated with each other in popular media and business communication.

Artificial intelligence is about giving machines the capability of mimicking human behavior, particularly cognitive functions. Examples include:
- Facial recognition.
- Automated driving.
- Sorting mail based on postal code.

In some cases, machines have far exceeded human capabilities, and in other cases we have barely scratched the surface (search “artificial stupidity”). Such failures can occur when an AI system is not properly programmed or trained, or when it is given incomplete or inaccurate data. For example, a self-driving car may make a wrong turn because it was not programmed to recognize a detour or road closure.

Machine learning, which can be considered either a sub-field or one of the tools of artificial intelligence, provides machines with the capability of learning from experience. Experience for machines comes in the form of data. Data that is used to teach machines is called training data.

A program, a set of instructions to a computer, transforms input signals into output signals using predetermined rules and relationships. Machine learning algorithms, also called “learners”, take both the known input and the known output (training data) to figure out a model for the program that converts input to output.

For example, many organizations such as social media platforms, review sites, and forums are required to remove abusive content. How can machines be taught to automate the removal of abusive content? The machines need to be shown examples of both abusive and non-abusive posts, with a clear indication of which ones are abusive.

Data science is the business application of machine learning, artificial intelligence, and other quantitative fields like statistics, visualization, and mathematics.
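The abusive-content example above can be illustrated with a minimal sketch of a learner. The notes do not name an algorithm, so this sketch assumes a simple word-count naive Bayes classifier; the training posts, the labels `abusive` and `ok`, and the function names `train` and `predict` are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Hypothetical, hand-made training data: (post text, label) pairs.
# Each post is a known input; each label is the known output.
TRAINING_DATA = [
    ("you are an idiot and a loser", "abusive"),
    ("shut up you stupid idiot", "abusive"),
    ("nobody likes you loser", "abusive"),
    ("thanks for the helpful review", "ok"),
    ("great product and fast delivery", "ok"),
    ("i like this helpful forum", "ok"),
]


def train(examples):
    """Count words per label; these counts are the learned model."""
    word_counts = {}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocabulary


def predict(model, text):
    """Return the label with the highest naive Bayes log-score (add-one smoothing)."""
    word_counts, label_counts, vocabulary = model
    total_posts = sum(label_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(label_counts[label] / total_posts)
        for word in text.lower().split():
            score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TRAINING_DATA)
print(predict(model, "you stupid loser"))
print(predict(model, "helpful and fast review"))
```

With only six training posts the model is a toy; a realistic system would need far more labeled examples and careful evaluation before it could be trusted to remove content.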
Data science is an interdisciplinary field that extracts value from data. Typical applications include:
- Recommendation engines that recommend movies for a particular user.
- A fraud alert model that detects fraudulent credit card transactions.
- Finding the customers who will most likely churn next month.
- Predicting revenue for the next quarter.

Data Science Life Cycle

Data science encompasses the following phases:
▪ Capture
▪ Prepare and Maintain
▪ Preprocess or Process
▪ Analyze
▪ Communicate

[Figure: life cycle flow from Capture through Prepare and Maintain, Preprocess or Process, and Analyze to Communicate]

Capture: This is the gathering of raw structured and unstructured data from all relevant sources by any suitable method, from manual entry and web scraping to capturing data from systems and devices in real time.

Prepare and maintain: This involves putting the raw data into a consistent format for analytics, machine learning, or deep learning models. This can include everything from cleansing, deduplicating, and reformatting the data to using ETL (extract, transform, load) or other data integration technologies to combine the data into a data warehouse, data lake, or other unified store for analysis.

Preprocess or process: Here, data scientists examine biases, patterns, ranges, and distributions of values within the data to determine the data’s suitability for use with predictive analytics, machine learning, and/or deep learning algorithms (or other analytical methods).

Analyze: This is where the discovery happens. Data scientists apply statistical analysis, predictive analytics, regression, machine learning and deep learning algorithms, and more to extract insights from the prepared data.
Communicate: Finally, the insights are presented as reports, charts, and other data visualizations that make the insights, and their impact on the business, easier for decision makers to understand.

Data scientist

A data scientist analyzes business data to extract meaningful insights.

Business Acumen Skills

A data scientist should have business acumen skills to counter the pressure of business:
- Understanding of the domain
- Business strategy
- Problem solving
- Communication
- Presentation
- Inquisitiveness

Technology Expertise Skills

A data scientist should be a technology expert in order to convert business requirements into business logic:
- Good knowledge of relational databases (RDBMS).
- Good knowledge of NoSQL databases such as MongoDB, Cassandra, HBase, etc.
- Programming languages such as Java, Python, etc.
- Open-source tools such as Hadoop and R.
- Data warehousing and data mining.
- Visualization tools such as Tableau, Flare, Google visualization APIs, etc.

Mathematics Expertise Skills

A data scientist should be a mathematics expert in order to formulate and analyze data:
- Mathematics
- Statistics
- Artificial Intelligence (AI)
- Machine learning
- Pattern recognition
- Natural Language Processing

What Does a Data Scientist Do?

A data scientist solves business problems through a series of steps, including:
o Before tackling the data collection and analysis, the data scientist determines the problem by asking the right questions and gaining understanding.
o The data scientist then determines the correct set of variables and data sets.
o The data scientist gathers structured and unstructured data from many disparate sources, such as enterprise data and public data.
o Once the data is collected, the data scientist processes the raw data and converts it into a format suitable for analysis.
o This involves cleaning and validating the data to guarantee uniformity, completeness, and accuracy.
o After the data has been rendered into a usable form, it is fed into the analytic system, such as an ML algorithm or a statistical model.
o This is where the data scientist analyzes the data and identifies patterns and trends.
o When the data has been completely rendered, the data scientist interprets the data to find opportunities and solutions.
o The data scientist finishes the task by preparing the results and insights and communicating them to the appropriate stakeholders.
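
The steps above (gather, clean and validate, analyze, communicate) can be sketched as a small pipeline. This is only an illustration under stated assumptions: the records, the field names `customer`, `region`, and `spend`, and the functions `clean`, `analyze`, and `communicate` are hypothetical, and a simple per-region average stands in for the ML algorithm or statistical model.

```python
from statistics import mean

# Hypothetical raw records, as they might arrive from disparate sources.
RAW_RECORDS = [
    {"customer": "A01", "region": "north", "spend": "120.5"},
    {"customer": "A02", "region": "South ", "spend": "80"},
    {"customer": "A02", "region": "South ", "spend": "80"},  # duplicate row
    {"customer": "A03", "region": "north", "spend": ""},     # missing value
    {"customer": "A04", "region": "south", "spend": "100"},
]


def clean(records):
    """Validate, normalize, and deduplicate the raw records."""
    seen = set()
    cleaned = []
    for rec in records:
        if not rec["spend"]:  # completeness: drop rows with no spend value
            continue
        # uniformity: trim and lowercase the region, convert spend to a number
        row = (rec["customer"], rec["region"].strip().lower(), float(rec["spend"]))
        if row not in seen:   # drop exact duplicates
            seen.add(row)
            cleaned.append(row)
    return cleaned


def analyze(rows):
    """Average spend per region (a stand-in for a statistical model)."""
    by_region = {}
    for _customer, region, spend in rows:
        by_region.setdefault(region, []).append(spend)
    return {region: mean(values) for region, values in by_region.items()}


def communicate(summary):
    """Render the result as a short plain-text report for stakeholders."""
    lines = [f"{region}: average spend {avg:.2f}" for region, avg in sorted(summary.items())]
    return "\n".join(lines)


rows = clean(RAW_RECORDS)
summary = analyze(rows)
print(communicate(summary))
```

Keeping each step in its own function mirrors the sequence in the notes: the cleaning rules can be tightened, or the averaging replaced by a real model, without touching the reporting step.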