WEEK 1
The World of Data, Data Science and Artificial Intelligence
INFO8066 - DATA ANALYTICS - SEC XX

DATA
Data has been a buzzword for ages now. Whether the data is generated by large-scale enterprises or by an individual, every aspect of it needs to be analyzed to benefit from it.
But how do we do it? Well, that's where the term 'Data Analytics' comes in.

WHAT IS DATA?
“Data is raw, unorganized facts that need to be processed” (Diffen.com)
"Data" comes from a singular Latin word, datum, which originally meant "something given." Its early usage dates back to the 1600s.
Data are simply facts or figures: bits of information, but not information itself.
When data are processed, interpreted, organized, structured or presented to make them meaningful or useful, they are called information.
Information provides a much richer context into the phenomenon and a clear, actionable insight (if applicable).
Exploring your data at the early onset of a project will help you communicate this need to data scientists and your team alike, thus easily narrowing the project's focus and streamlining the path toward results.
It is always important to understand how the data was generated before starting to analyze it.

DATA AND INFORMATION
"Data" and "information" are intricately tied together, whether one recognizes them as two separate words or uses them interchangeably, as is common today.

EXAMPLES OF DATA AND EXAMPLES OF INFORMATION
Data: The number of visitors to a website in one month
Information: Understanding that changes to a website have led to an increase or decrease in monthly site visitors
Data: Inventory levels in a warehouse on a specific date
Information: Identifying supply chain issues based on trends in warehouse inventory levels over time
Data: Individual satisfaction scores on a customer service survey
Information: Finding areas for improvement with customer service based on a collection of survey responses
Data: The price of a competitor's product
Information: Determining if a competitor is charging more or less for a similar product

DATA CAN BE MISLEADING (SURVIVORSHIP BIAS)
Survivorship bias is our tendency to study the people or companies who “survived” or were victorious in a certain situation while ignoring those that failed (“LinkedIn”).
This can lead researchers to form incorrect conclusions due to only studying a subset of the population (Kassiani Nikolopoulou).
Examples to get you thinking:
A real estate investing course features testimonials from a handful of happy customers who “got rich” following the investing approach being sold by the company. They don't mention the thousands of others who failed using the same method.
Mark Zuckerberg, Steve Jobs, and Bill Gates dropped out of college and became billionaires. Does this mean that if you follow their example and drop out of college you are more likely to become a billionaire too?
“LinkedIn.” Linkedin.com, 2023, www.linkedin.com/pulse/survivorship-bias-avoid-mistake-stephen-lynch/. Accessed 4 Sept. 2023.
Kassiani Nikolopoulou. “What Is Survivorship Bias? | Definition & Examples.” Scribbr, 4 Oct. 2022, www.scribbr.com/research-bias/survivorship-bias/. Accessed 4 Sept. 2023.

WHAT IS DATA ANALYTICS?
Data analytics is the process of looking at raw data to find patterns and insights.
It helps businesses make better decisions, improve efficiency, and increase revenue by turning data into useful information.

DATA ANALYTICS TOOLS
To analyze data, you need tools that can help you process and manipulate it.
Some of the commonly used data analytics tools include Excel, R, Python, SQL, Tableau, and Power BI.
Excel is a widely used tool for data analytics, and it is easy to use, especially for beginners.
R and Python are programming languages that are commonly used for data analytics.
SQL is a language used to query databases.
Tableau and Power BI are tools used for data visualization.

WHY IS DATA ANALYTICS IMPORTANT?
Data analytics has a key role in improving your business, as it is used to gather hidden insights, generate reports, perform market analysis, and improve business requirements.
COLLECT DATA
ANALYZE DATA
GENERATE REPORTS

THE POWER OF DATA ANALYTICS - NETFLIX
Founded in 1997 as a subscription mail-order DVD company
Current valuation of over $282 billion (Yahoo Finance)
Current user base: 151 million worldwide
Retention rate: 93%
Jangam, R. (2023, March 23). The power of Data Analytics: A case study of netflix. Medium. https://medium.com/@raj.w.2336/the-power-of-data-analytics-a-case-study-of-netflix-555ae819b0d7

NETFLIX - KEY STRATEGIES
DEVELOPING CUSTOMER PERSONA
USER INTERACTION DATA
ROBUST FEEDBACK SYSTEM

UBER USES DATA TO REINVENT TRANSPORTATION
8 million users
160k drivers
449 cities in 66 countries
1 million rides/day
2 billion rides recorded
https://www.projectpro.io/article/how-uber-uses-data-science-to-reinvent-transportation/290
(Kivestu, 2020)

Uber Knows You Well
Tracking supply and demand allows Uber to implement "surge pricing," boosting fares at peak times to draw more drivers out.
Uber's secret lies in its sophisticated supply chain management system, which uses data analytics to optimize every aspect of the ride-sharing experience.
Uber relies heavily on real-time data to monitor supply and demand patterns, adjust operations, and optimize its driver allocation.
Using data analytics, Uber analyzes user behavior, location, and other data points to predict demand patterns, identify potential bottlenecks, and adjust operations to ensure maximum efficiency.

One way Uber uses data analytics is through its "heatmap" tool, which provides real-time insights into where and when riders are requesting rides.
This tool allows Uber to adjust its pricing and driver allocation to meet demand, resulting in a better user experience.
Codebasics. (n.d.). How uber uses data analytics to increase supply efficiency? https://codebasics.io/blog/how-uber-uses-data-analytics-to-increase-supply-efficiency

Uber's predictive supply management system uses historical and real-time data to predict rider demand and driver supply in a given geographical area.
By analyzing past demand patterns, Uber determines the likelihood of future demand in a particular location at a specific time.
The system also considers factors such as weather, events, and traffic to make more accurate predictions.
Codebasics. (n.d.). How uber uses data analytics to increase supply efficiency? https://codebasics.io/blog/how-uber-uses-data-analytics-to-increase-supply-efficiency
Uber's real-time data intelligence platform at scale: Improving Gairos scalability/reliability | Uber Blog. (n.d.). https://www.uber.com/blog/gairos-scalability/

USES OF DATA ANALYTICS IN VARIOUS INDUSTRIES
Applications of data analytics are all around us, whether in entertainment, manufacturing, e-commerce, health or marketing; you name it.
The rise of AI is primarily due to the billions of information points we as humans generate every day, combined with rising computational power.

DATA IN HEALTHCARE
Among the many use cases, the healthcare sector continues to use chatbots for medical scheduling and X-ray computer vision to detect early signs of lethal diseases, saving the lives of thousands of patients.
Nalini. (2023, December 20). AI in Healthcare: Benefits, Applications, and Cases. Apptunix Blog. https://www.apptunix.com/blog/ai-in-healthcare-benefits-applications-and-cases/

VIDEO: HIGH-TECH HOSPITAL USES ARTIFICIAL INTELLIGENCE IN PATIENT CARE

DATA IN E-COMMERCE
Recommender system algorithms continue to generate profit for companies by showing targeted and useful product recommendations to customers.

DATA IN ENTERTAINMENT
Entertainment giants like Netflix and YouTube also use your past viewing data to recommend movies and videos you are likely to enjoy.
Medium.com reports that over 80% of Netflix viewing comes from its recommendation system.

DATA IN TRANSPORTATION
Fully self-driving cars are now on the roads, making every decision a human makes while driving, including stopping, steering, and braking for pedestrians or emergency stops.
This kind of AI-enabled use case is only made possible by highly advanced computer vision algorithms and blazing-fast computational speeds.

VIDEO: TRANSFORMING TRANSPORTATION WITH AI | I AM AI

ARTIFICIAL INTELLIGENCE

INTELLIGENCE
All living organisms are intelligent.
They interact with their environment and survive.
Examples:
Crossing a road
Discovering alternate paths
Writing a poem, drawing a picture, creating a new recipe

ARTIFICIAL INTELLIGENCE
Living things are intelligent; but are man-made, non-living things also intelligent?
Can a machine
Make discoveries?
Pass a ruling order in a court?
Compose a symphony?
Go for a PLAN B?
Decide to wait or let go?
Traditional computers are powerful but not intelligent.
They can compile MBs and GBs of code but may get stuck at a minor logical error.
AI is a field of Computer Science which aims to make computer systems that can mimic human intelligence, just as we humans act when we don't have exact information about a situation but still go ahead and choose one of the many possible moves.

ARTIFICIAL INTELLIGENCE IN ACTION

ARTIFICIAL INTELLIGENCE
Subsets of AI - Javatpoint. www.javatpoint.com. (n.d.). https://www.javatpoint.com/subsets-of-ai

Artificial Intelligence: Overarching term which refers to the ability of a machine to perform tasks with minimal interference
Machine Learning: A subset of Artificial Intelligence (supervised learning, unsupervised learning, reinforcement learning)
Neural Network: A subset of Machine Learning. Neural Networks/Deep Learning can work with supervised, unsupervised and reinforcement learning.
Deep Learning: Like the human brain, this kind of Machine Learning synthesizes relevant data from experience

What is Machine Learning?
Machine Learning
1 Supervised learning
    Regression: Stock price prediction, Sales prediction
    Classification: Stock price prediction, Sales prediction
2 Unsupervised learning
    Clustering: Customer segmentation, Anomaly detection
    Association: Recommender systems, Market basket analysis
    Dimensionality reduction: Principal component analysis, Autoencoders
3 Reinforcement learning
    Real time decisions, Recommender systems, Skill Acquisition, Game AI, Robot Navigation

AI -> MACHINE LEARNING
“Learning is any process by which a system improves performance from experience.” - Herbert Simon
ML is used when:
Human expertise does not exist (navigating on Mars)
Humans can't explain their expertise (speech recognition)
Models must be customized (personalized medicine)
Models are based on huge amounts of data (genomics)
Based on slide by E. Alpaydin

TYPES OF MACHINE LEARNING

APPLICATIONS OF MACHINE LEARNING
Image Recognition
Virtual Personal Assistant
What other applications of Machine Learning can you think of?

PROS & CONS OF MACHINE LEARNING
Advantages:
No Human Intervention
Easily identify trends and patterns
Wide Applications
Handling multi-variety data
Continuous Improvement
Disadvantages:
Data Acquisition
Results Interpretations
Time & Resources
Elimination of Human Interface
High Error Chances

DEEP LEARNING

What are Neural Networks?
Figure labels: Input Layer, Hidden Layer, Output Layer
Inspired by the human brain.
Neurons receive inputs from their neighbors and, after some threshold, they activate and send signals further along.
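The threshold behaviour described above can be sketched in a few lines of Python. This is only an illustration, not how any particular neural network library works: the function name `neuron_fires`, the input signals, the weights, and the thresholds are all made-up values chosen for the example.

```python
def neuron_fires(inputs, weights, threshold):
    """Return True when the weighted sum of the inputs reaches the threshold."""
    # Each neighbor's signal is scaled by the strength of its connection.
    total = sum(x * w for x, w in zip(inputs, weights))
    return total >= threshold


# Hypothetical signals from three neighboring neurons (1.0 = active, 0.0 = silent)
signals = [1.0, 0.0, 1.0]
# Hypothetical connection strengths for those three neighbors
weights = [0.5, 0.25, 0.5]

# The weighted sum is 0.5 + 0.0 + 0.5 = 1.0
low_threshold_result = neuron_fires(signals, weights, threshold=0.75)
high_threshold_result = neuron_fires(signals, weights, threshold=1.5)

print("fires with threshold 0.75:", low_threshold_result)
print("fires with threshold 1.5:", high_threshold_result)
```

With the same inputs, the neuron activates under the lower threshold and stays silent under the higher one. Real networks usually replace this hard cut-off with smoother activation functions and learn the weights from data.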
Neural Networks learn by processing examples, each of which contains a known input and result, forming probability-weighted associations between the two.
Very effective across a range of applications (vision, text, speech, medicine, robotics, etc.)

DEEP LEARNING
DL tasks can be expensive, depending on significant computing resources, and require massive structured or unstructured data sets to train ML models on.
For Deep Learning, a huge number of parameters need to be understood by a learning algorithm, which can initially produce many false positives.
Barn owl or apple? This example indicates how challenging learning from samples is, even for machine learning. Source: @teenybiscuit

APPLICATIONS OF DEEP LEARNING
Medical Image Analysis
Self Driving Car

NATURAL LANGUAGE PROCESSING (NLP)
“Natural language processing is the set of methods for making human language accessible to computers” - Jacob Eisenstein (Google Scientist)
“Natural language processing is the field at the intersection of Computer science (Artificial intelligence) and linguistics” - Christopher Manning (Professor, Stanford Uni)

NLP IN ACTION

APPLICATIONS OF NLP
Sentiment Analysis
Email Filtering

DATA GENERATION PROCESS (DGP)

DATA GENERATION
Data Generating Process (DGP) describes the rules with which the data has been generated (“Data Generating Process”).
Extensive surveys are never easy to analyze. There are so many factors going into the design of a survey, most notably the selection probabilities.
Consider the Australian Bureau of Statistics (ABS) national surveys. The ABS data protocols for the DGP of every survey are extensive and warrant careful examination before engaging with the data.
By reading ABS's data protocols you learn about how the data was collected, the scale of each measure, the coverage of the survey, precision and estimation error, the sample employed and more.
https://www.globalpatron.com/images/10-best-data-collection-forms-1024x576.png
It is always important to understand how the data has been captured, stored and presented to you before starting to analyze it.

DATA GENERATION
Some of the key points to check are:
What does each row (observation) in the data represent?
What does each column (variable) in the data represent?
Are there any missing values in the dataset? If yes, what could have caused them?
Has this data been processed or altered before?
Will the next cycle of data be in the same format and follow the same set of rules?

DATA GENERATION CHALLENGES
Data Quality Issues - Poor data quality can lead to incorrect insights and wasted resources
Data Security - Protecting data at every stage of its lifecycle, from collection to storage to disposal
Data Privacy - With the rise of data breaches and cyber attacks, customers are increasingly concerned about how their data is being used and who has access to it
Data Volume - The sheer volume of data generated by businesses can be overwhelming, making it difficult to extract meaningful insights
Siloed Data Sources - Combining data from multiple sources to create a complete picture of a business's operations
The top challenges of data collection and how to overcome them. (2023, April 5). https://aspenasolutions.com/challenges-of-data-collection-and-how-to-overcome-them

The End
What do you call data that floats?
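As a follow-up to the key points to check under DATA GENERATION, here is a minimal sketch of how the first few checks (what the rows are, what the columns are, whether values are missing) might be run in Python. The survey extract, its column names, and the helper `profile` are hypothetical and exist only for this illustration; the questions about why values are missing or whether the data was altered still have to be answered from the data protocols, not from code.

```python
import csv
import io

# Hypothetical survey extract; empty cells stand for missing values.
RAW = """respondent_id,age,satisfaction
1,34,4
2,,5
3,51,
4,29,3
"""


def profile(csv_text):
    """Summarize the rows, columns, and missing values of a CSV extract."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = list(rows[0].keys()) if rows else []
    # Count empty cells per column as missing values.
    missing = {c: sum(1 for r in rows if r[c] in ("", None)) for c in columns}
    return {"n_rows": len(rows), "columns": columns, "missing": missing}


summary = profile(RAW)
print(summary["n_rows"], "observations (one row per respondent)")
print("variables:", summary["columns"])
print("missing values per variable:", summary["missing"])
```

A summary like this only shows where values are missing; it says nothing about the cause, which is why the documentation of the data generating process still needs to be read.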