IE MBD Data Viz - Session 1.pdf
IE University
2024
Data Visualization MBD
Session 1

“Data visualization is the language of decision-making. Good charts effectively convey information. Great charts enable, inform and improve decision-making.”
- Dante Vitagliano

Professor Christina Stathopoulos, February 2024

Course Introduction
Data Visualization

Christina Stathopoulos
Founder, Data Evangelist
Dare to Data
Ex-Google, 5+ years…
- Analytical Lead, Waze @Google
- Data Specialist, Google Spain
Adjunct Professor of Analytics, IE Business School | IE University
Podcast Host @EM360
Instructor @LinkedIn Learning
Ambassador @Aporia
Master in Business Analytics & Big Data, IE School of Science & Technology
[email protected]
www.linkedin.com/in/christinastathopoulos

Course Breakdown
- Data Viz Fundamentals: Sessions 1-4
- Hands-On Practice: Sessions 5-12
- Data Storytelling: Session 13
- Final Project & Closing: Sessions 14-15

Grading Breakdown
Individual (Total: 60%)
- Attendance & Participation: 20%
- Final Exam (closed book): 20%
- Individual Work: 20%
Group (Total: 40%)
- Group Work: 15%
- Group Presentation: 25%
Note that for double sessions, each session is counted separately for attendance, so you must attend both for full credit.

Grading Breakdown
Following IE policy, we apply the curve grading methodology:
- HONORS (4.0): 15%
- EXCELLENCE (3.66): 35%
- PROFICIENCY (3.33): 35%
- PASS (3.0): 15%

Final Group Project
In groups, find a dataset and an objective or problem you would like to solve using that dataset. Prepare data visualizations and a data story to deliver in a live presentation during our last session together. Full details can be found in the Blackboard assignment block.
NOTE: You have full flexibility in your choice of dataset and of data viz tool (Looker Studio or Tableau). The presentation should be ~8 minutes. All materials you develop for the project should be uploaded to the designated assignment block the evening before the last session (by March 21st, 23:59h).

Learning Objectives
By the end of this course, you should…
- Understand why data visualization is so important today
- Be familiar with the history and origins of the field
- Grasp design principles and the correct taxonomy for visual interpretations of data
- Recognize what to avoid when visualizing data
- Know the right sources to find inspiration for visualizations
- Discover the power of proper data storytelling
- Use Looker Studio and Tableau at an intermediate level to get started with your own reports and dashboards

My Approach
- I share all slides before class
- Lecturing (sorry! but it’s necessary)
- Using examples to connect theory to practice
- Class discussion, real-world examples, videos, demos, hands-on practice
- And… try to have some fun :)

Class Logistics
- Fully Here: Be present and pay attention.
- Try: We will debate and do exercises. Don’t be afraid to try new things and fail.
- Curious: Be curious about our course content and don’t be afraid to ask questions. You can raise your hand at any time.
- On Time: We start all classes on time, so be punctual.

Book of the Day
Learning to See Data by Ben Jones

Session 1 Agenda
- Setting the Stage
- Definition & Purpose
- Historical Foundation & Leaders
- Data Viz for the 21st Century

Intro to Data Visualization
Setting the Stage

Why so much data?
Today… It would take a person approximately 181 million years to download all the data from the Internet.
Source: Unicorn Insights

“Where for most of history we have suffered from a shortage of information, tomorrow we will struggle with a surfeit. Already today, the chief difficulty is not so much obtaining information but finding the relevant data.”
Source: The Economist (2012), Megachange: the World in 2050

90% of all information transmitted to our brains is visual.
People remember 80% of what they see and only 20% of what they hear.

“The brain processes images 60,000 times faster than it does text.”
~Ritu Pant

How many F’s are there?
HGJALQMDIFGKTPSNAKPENFVLA
PQOEPJGAMAFYPQOEYCNMLCAK
FHGURNQLFUNVMALQPTUEMSIFH
GNCKAPKFNSLEITYGMSLAOQECM
ZXLSPFIGMGBECBAKPQMDKELTUI

The Human Mind is Easily Fooled
First Exam: Average Score 72 out of 100
Second Exam: Average Score 96 out of 137
The second average looks higher, but as a share of the maximum it is lower: 72% versus about 70%.
Source: Cairo, Alberto (2016). The Truthful Art: Data, Charts, and Maps for Communication

Anscombe’s Quartet

       I              II             III             IV
   x      y       x      y       x      y       x      y
 10.0   8.04    10.0   9.14    10.0   7.46     8.0   6.58
  8.0   6.95     8.0   8.14     8.0   6.77     8.0   5.76
 13.0   7.58    13.0   8.74    13.0  12.74     8.0   7.71
  9.0   8.81     9.0   8.77     9.0   7.11     8.0   8.84
 11.0   8.33    11.0   9.26    11.0   7.81     8.0   8.47
 14.0   9.96    14.0   8.10    14.0   8.84     8.0   7.04
  6.0   7.24     6.0   6.13     6.0   6.08     8.0   5.25
  4.0   4.26     4.0   3.10     4.0   5.39    19.0  12.5
 12.0  10.84    12.0   9.13    12.0   8.15     8.0   5.56
  7.0   4.82     7.0   7.26     7.0   6.42     8.0   7.91
  5.0   5.68     5.0   4.74     5.0   5.73     8.0   6.89

All four sets are nearly identical when we consider simple summary statistics!

Summary Statistics
- Mean of x
- Sample variance of x
- Mean of y
- Sample variance of y
- Correlation between x and y
- Linear regression line
- Coefficient of determination of the linear regression
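
The claim that the four Anscombe sets share the same simple summary statistics can be checked directly. The following is a minimal sketch in Python using only the standard library; the data is transcribed from the Anscombe’s Quartet table in these slides, and the names QUARTET, summarize and SUMMARIES are illustrative choices, not part of the course material.

```python
from statistics import mean, variance

# Anscombe's quartet, transcribed from the table in these slides.
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # x values used by sets I-III
QUARTET = {
    "I": (X_SHARED, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return the summary statistics listed on the slide for one (x, y) set."""
    n = len(x)
    mean_x, mean_y = mean(x), mean(y)
    var_x, var_y = variance(x), variance(y)  # sample variances (divide by n - 1)
    # Sample covariance, computed by hand so the sketch runs on older Python 3 versions.
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    r = cov_xy / (var_x * var_y) ** 0.5       # Pearson correlation
    slope = cov_xy / var_x                    # least-squares line: y = intercept + slope * x
    intercept = mean_y - slope * mean_x
    return {
        "mean_x": mean_x,
        "var_x": var_x,
        "mean_y": mean_y,
        "var_y": var_y,
        "r": r,
        "slope": slope,
        "intercept": intercept,
        "r_squared": r ** 2,                  # coefficient of determination of the simple fit
    }


SUMMARIES = {name: summarize(x, y) for name, (x, y) in QUARTET.items()}

for name, stats in SUMMARIES.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

For every set, the printed values round to a mean of x of 9, a sample variance of x of 11, a mean of y of about 7.5, a correlation of about 0.82, and a fitted line of roughly y = 3 + 0.5x. The numbers agree to about two decimal places rather than exactly, which is why only a plot reveals how different the four sets really are.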

Anscombe’s Quartet Visualized
(panels I, II, III, IV)

Datasaurus Dozen
Never trust summary statistics alone; always visualize your data.

Definition & Purpose
What & Why

Graphical representation of information.
HUGE amounts of data become easily digestible when represented visually.
- STATIC: Tells a specific story & focuses on a specific data relationship (e.g. infographic).
- INTERACTIVE: Allows users to select specific data points & alter the findings (e.g. dashboard).

Data Visualization Tools
- Looker Studio: lookerstudio.google.com
- Tableau Public: public.tableau.com
- Power BI: powerbi.microsoft.com
- Carto: carto.com
- Qlik: qlik.com
- D3: d3js.org
- RStudio & ggplot2 package: rstudio.com
- Python & packages like Bokeh: continuum.io/downloads
- Google Sheets
- Excel

Historical Foundation & Leaders
Data Viz in Yesteryear

- Maps. Source: Ptolemy’s World Map
- Timeline. Source: Joseph Priestley’s “A Chart of Biography”
- Bar Chart. Source: William Playfair’s “Commercial and Political Atlas”
- Time Series. Source: William Playfair’s “Commercial and Political Atlas”
- Pie Chart. Source: William Playfair’s “Statistical Breviary”
- Sankey Diagram. Source: Charles Minard’s map of Napoleon’s Russian campaign of 1812

“Incredible to think that hundreds of years later, the ideas of ONE man, William Playfair, still make up the bulk of the chart options that we use today.”

Pioneers of data visualization

► Florence Nightingale
Pioneer in modern nursing, statistics & data visualization. Her work as a nurse in the 1850s, combined with a love of statistics, led her to build data visualizations that showed the impact of hygiene on health, particularly during wartime.

► Edward Tufte
American statistician & professor at Yale University. Famous for his work in information design & data visualization, having written one of the core books in the field: The Visual Display of Quantitative Information.

► Hans Rosling
Swedish physician, statistician & public speaker. Became a YouTube star for his captivating ways of telling stories with data, revealing trends & illuminating facts in ways never seen before.
His most famous talk: The Best Stats You’ve Ever Seen!

“THE IDEA IS TO GO FROM NUMBERS TO INFORMATION TO UNDERSTANDING.”
-- Hans Rosling

Group Assignment
Create ONE slide about a data visualization leader, past or present. Be prepared to explain the work that person has done in the field & highlight one interesting thing you learned from them or a powerful data visualization piece of theirs.
Delivery: Very quick presentation, ~3 minutes.
DUE: To be presented in Session 4. It must be uploaded to the designated assignment block by midnight on the day before Session 4. Only one person from your group needs to upload it.

Data Viz for the 21st Century
Covid-19 & Beyond

- Inform
- Educate
- Persuade

Top Visualizations in Covid-19

The Rise of Data Journalism (and in parallel, a more data-literate public)
- 1900-1980: Academic use, very limited public use
- 1980-2015: Popularity in public media increases, readers can handle more complexity

Data Literacy Test
A. In recent years, the rate of cavities has increased in many countries.
B. In some countries, people brush their teeth more frequently than in other countries.
C. The more sugar people eat, the more likely they are to get cavities.
D. In recent years, the consumption of sugar has increased in many countries.
Source: American Trends Panel (wave 6). Survey of U.S. adults conducted Aug 11-Sep 3, 2014. PEW RESEARCH CENTER

The general public is becoming more comfortable with complex data visualizations used to convey messages or stories.
Example: America’s Most Expensive Housing Markets

The world is one big data problem. What are you going to do about it?

Any questions?
Feel free to stay in touch!