Introduction to Data Science PDF

Summary

This document provides an introduction to data science, outlining its core concepts, applications in various fields, and the importance of skilled practitioners in the current data-driven world. It explains data science's role in decision-making processes and emphasizes the growing demand for qualified data scientists and engineers. The document includes a 3V model of data, exploring the aspects of velocity, volume, and variety, which are critical to the field. A brief overview of topics like machine learning, computer vision, and other areas of artificial intelligence is also discussed.

Full Transcript

Introduction to Data Science
Prepared by: Dr. Mais Haj Qasem

Introduction

What is Data Science?

Data science is concerned with collecting and analyzing data and using the results to make decisions. Its goal is to identify patterns in data through analysis and to predict future events.

By employing data science, businesses can achieve:
− Better decisions (should A or B be selected?)
− Predictive analysis (what is going to occur next?)
− Pattern discovery (finding patterns in the data or even identifying hidden information)

Why is Data Science so important now?

We have a lot of data, and we continue to generate a staggering amount of it at an unprecedented and ever-increasing speed. Analyzing this data wisely requires competent, well-trained practitioners, and doing so can provide actionable insights.

Data science is used in many industries today, e.g. banking, consultancy, healthcare, and manufacturing.

Demand for data scientists and data engineers more than tripled over the past five years, rising 231%. That is much faster than job postings overall in the UK, which rose 36%, according to the report "Dynamics of data science skills."

Data is everywhere! Humans and machines are constantly creating new data!

3V model for Data
1. Velocity: The speed at which data is accumulated.
2. Volume: The size and scope of the data.
3. Variety: The wide array of data types (structured and unstructured).

The volume of data has grown 50-fold compared with what was available at the beginning of 2010.

Artificial Intelligence

Computer Vision: A branch of artificial intelligence (AI) that uses digital images, videos, and other visual inputs to allow computers and systems to extract useful information.
Example: Azure AI Video Indexer (https://vi.microsoft.com/en-us)

Voice Recognition: The ability of a machine or program to receive and interpret dictation or to understand and carry out spoken commands. With the development of artificial intelligence, voice recognition has become more popular and useful.
Example: Amazon's Alexa and Apple's Siri

Natural Language Processing: An area of artificial intelligence that focuses on helping computers understand how people write and speak.
Example: ticket classification and spell check

Robotics: Robotics aims to build intelligent robots that can help people in many ways, while AI in robotics seeks to provide an intelligent environment for more automation in the robotics industry in general.
Example: cleaning robots, Sophia

Neural Network: A technique in artificial intelligence that trains machines to process data in a way inspired by the human brain. Neural networks with many layers are the basis of deep learning.
Example: image captioning, facial recognition

Data Science vs Artificial Intelligence

Data science uses AI because the amount of data is too large to handle manually; in practice it mainly uses the machine learning part of AI.

Data Science Output:
− Analysis (past or present: something that has already happened)
− Analytics (what may happen in the future: something predicted)

Analysis

Observation   Height (inches)   Weight (lbs)
1             58                115
2             59                117
3             60                120
4             61                123
5             62                126
6             63                129
7             64                132
8             65                135
9             66                139
10            67                142
11            68                146
12            69                150
13            70                154
14            71                159
15            72                164

Questions to ask:
− What is present in the dataset?
− What are the benefits of this dataset?
− What are your questions?

Observations:
− The data is sorted.
− Height ranges from 58 to 72.
− Weight ranges from 115 to 164.
− Average height: 65
− Average weight: about 136.7
− What else? Weight increases as height increases; the two are positively correlated.
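
The Analysis observations can be reproduced with a few lines of Python. This is only a sketch using the standard library and the 15 height and weight values from the slide; the function name `pearson` and the `summary` dictionary are names chosen here for illustration, not something the slides define.

```python
from statistics import mean

# Height (inches) and weight (lbs) for the 15 observations on the Analysis slide.
heights = list(range(58, 73))
weights = [115, 117, 120, 123, 126, 129, 132, 135,
           139, 142, 146, 150, 154, 159, 164]


def pearson(xs, ys):
    """Pearson correlation coefficient, written out from its definition."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Descriptive statistics: these describe what has already been observed.
summary = {
    "height_range": (min(heights), max(heights)),
    "weight_range": (min(weights), max(weights)),
    "mean_height": mean(heights),
    "mean_weight": round(mean(weights), 1),
    "correlation": round(pearson(heights, weights), 3),
}
print(summary)
```

A correlation close to 1 is the numeric form of the last observation: taller women in this dataset tend to weigh more.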
Analytics

The Analytics slide repeats the same 15 observations of height (inches) and weight (lbs) used for Analysis, and asks a question the data cannot answer directly:

What is the weight of a woman who is 73 inches tall?

Here, we must seek the help of one of the machine learning algorithms.

A data scientist should have at least three basic skills:
1. A strong knowledge of basic statistics and machine learning.
2. The computer science skills to take an unruly dataset and use a programming language (like R or Python) to make it easy to analyze.
3. The ability to visualize and express their data and analysis in a way that is meaningful to somebody less conversant in data.

Issues of Ethics, Bias, and Privacy in Data Science

Many of the issues related to privacy, bias, and ethics can be traced back to the origin of the data. Ask: how, where, and why was the data collected? Who collected it? What did they intend to use it for?

What do you need to be a Data Scientist?
1. Programming Skills: Python, R
2. Data Skills: Databases, SQL, and Hadoop or Spark (Programming skills and data skills are equally important; you can learn them in either order.)
3. Data Pre-processing: NumPy and Pandas
4. Data Visualization: Matplotlib (You need it because the data is sometimes easier to understand when you can see it.)
5. Machine Learning (ML): Supervised and Unsupervised
6. Deep Learning (DL): Neural Networks (FNN, CNN, RNN, and GAN) (DL is difficult to learn if you do not already know ML. Sometimes ML cannot produce the best result, and in that case you need DL; but if ML already gives the best result, there is no need to move on to DL.)

At this point you can work as a data scientist, but to be more professional you also need to master items 7 and 8.

7. Cloud Computing: AWS, Azure, IBM, and Google (You need it when the data you work on is massive or your laptop or PC cannot handle it.)
8. Model Deployment: Web API or Lite (The results can be put to work in a real system, such as a website or application; this is where the data scientist and the web developer need a common language.)

Typical Data Science Process

[Figure: diagram linking Data Gathering, Data Analysis, Data Preprocessing, Predictive Analytics, Knowledge Extraction, Data Visualization, and Business Application]

1. Data Gathering: The process of collecting, evaluating, and analyzing data from multiple sources in order to find insights. Data may be gathered from a variety of sources, including social media monitoring, web tracking, surveys, feedback, and so on.
2. Data Analysis: The process of working with data to extract meaningful information that is then used to make sound decisions.
3. Data Preprocessing: The most important step in the data mining process. It refers to cleaning, converting, and combining data in order to prepare it for analysis. Its purpose is to improve data quality and make the data better suited to the specific data mining task at hand.
4. Predictive Analytics: The process of predicting future events using data. The technique employs data analysis, machine learning, artificial intelligence, and statistical models to identify patterns that may predict future behavior.
5. Knowledge Extraction: The extraction of knowledge from data. This may be accomplished using a variety of techniques, such as machine learning, natural language processing, and data mining.
6. Data Visualization: The representation of data using standard images such as charts, plots, infographics, and even animations. These visualizations explain complicated data relationships and data-driven insights in an easy-to-understand manner.
7. Business Application: Data science helps businesses obtain comprehensive market, competitive, and consumer information. Furthermore, by using technologies like artificial intelligence and machine learning, the application of data science in business helps automate several complex processes.

Machine Learning

Machine learning is an application of artificial intelligence that uses statistical methods to allow computers to learn and make decisions without being explicitly programmed. It is based on the idea that computers can learn from data, recognize patterns, and make decisions with little help from humans.

Learning is used when:
❖ Human expertise does not exist
❖ Humans are unable to explain their expertise
❖ The solution changes over time
❖ The solution needs to be adapted to a particular case

Types of Machine Learning
❖ Supervised Learning
❖ Unsupervised Learning

Supervised Learning

Supervised learning is a type of machine learning in which machines learn from well-labelled training data and then predict the output for new inputs based on that data. Labelled data means that each input has already been marked with the appropriate output.

There are two types of supervised learning:
❖ Classification
❖ Regression

Classification

Classification is used when the output variable is categorical, with two or more classes. For instance: yes or no, male or female, true or false, and so on.

Regression

Regression is used when the output variable is a real or continuous value. It models the relationship between two or more variables, where a change in one variable is related to a change in the other. For example, salary depends on job experience, and weight depends on height.
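
The regression idea can be sketched on the height and weight observations from the Analysis and Analytics slides, answering the question about a woman who is 73 inches tall. The slides do not say which algorithm to use, so this assumes simple least-squares linear regression; `fit_line` is a helper name made up for this example.

```python
from statistics import mean

# Labelled training data from the slides: height is the input, weight the output.
heights = list(range(58, 73))
weights = [115, 117, 120, 123, 126, 129, 132, 135,
           139, 142, 146, 150, 154, 159, 164]


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept (assumed model)."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return slope, intercept


slope, intercept = fit_line(heights, weights)

# Predict the weight for a height that is not in the dataset.
predicted = slope * 73 + intercept
print(f"weight = {slope:.2f} * height + ({intercept:.2f})")
print(f"Predicted weight at 73 inches: {predicted:.1f} lbs")
```

On this data the fitted slope is 3.45 lbs per inch, which gives roughly 164.3 lbs at 73 inches. Because 73 inches lies just outside the observed range of 58 to 72, the prediction is an extrapolation and should be treated with some caution.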
Unsupervised Learning

In unsupervised learning, the machine learns for itself from unlabelled data: it looks for patterns in the unlabelled data and responds to them. Clustering is the most common example.

Data Scientist vs Machine Learning Engineer

A data scientist should also be a machine learning engineer, but a machine learning engineer may or may not be a data scientist. Data scientists should have extensive domain knowledge in the fields in which they work, as well as a deeper statistical understanding than machine learning engineers. For that reason, machine learning engineers work under the direction of data scientists.
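
To make the clustering idea from the Unsupervised Learning slide concrete, here is a minimal sketch of k-means in one dimension. The slides do not name a clustering algorithm, so k-means is an assumption, and the data points and starting centers are made up purely for illustration.

```python
def k_means_1d(points, centers, iterations=10):
    """Lloyd's k-means algorithm in one dimension with fixed starting centers."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters


# Made-up, unlabelled measurements (illustrative only, not from the slides).
points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
centers, clusters = k_means_1d(points, centers=[0.0, 6.0])
print(centers)
print(clusters)
```

No labels are supplied anywhere: the algorithm separates the low values from the high values using only the distances between points, which is what distinguishes it from the supervised examples.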
