Topic2_Part2_Visual_tools_2023_2.pptx
Document Details

Uploaded by OticNirvana
Full Transcript
Topic 2: Visual Presentation of Data GEN 4191 Data Analytics for Business Optimisation 2023.2 – BBA6 Dr. Krisztina Soreg Let’s get started! After this session you should be able to: • Utilize a number of visual tools which will help to improve decision-making in the strategic planning process....
Topic 2: Visual Presentation of Data GEN 4191 Data Analytics for Business Optimisation 2023.2 – BBA6 Dr. Krisztina Soreg Let’s get started! After this session you should be able to: • Utilize a number of visual tools which will help to improve decision-making in the strategic planning process. • Recognize the most typical mistakes when developing visual tools. • Offer a short analytical report based on the created visualization. Visual Presentation of Data Visual Presentation of Data Question 1: What is the difference between a chart and a graph? Visual Presentation of Data Question 1: What is the difference between a chart and a graph? Chart: visual representation of information or data. The purpose is to help viewers understand and analyze information easily Graph: using mathematical connections (equations) to visualize data and analyze relationships and trends Visual Presentation of Data Question 2: Which are the most frequently used tools to visualize data? Main types of visual tools Line graph Main types of visual tools Simple line graph • Focus: raw data visualization as the relationship between 2 variables • Goal: depict the trend, pattern, development or change overtime-related to such data • Time horizon: extended • All graphs are charts but not all charts are graphs! • Recommendation: using not more than 3-4 different components + distinguishing by line colour, pattern, etc. Main types of visual tools Don’t do this with the line graphs… Main types of visual tools Don’t do this with the line graphs… Main types of visual tools Bar chart Main types of visual tools Bar chart • Focus: simplify data and categorize them by sections, intervals, groups or individuals • Goal: depict the structure, organization, size, etc. of data comparison • Time horizon: short, mid-term or none • Remember: the length of each bar is proportional to the value they represent • Types: horizontal or vertical Main types of visual tools Don’t do this with the bar charts… Main types of visual tools Don’t do this with the bar charts… Too much “chart junk”: unnecessary noise and formatting (non-data ink) making the chart unreadable Main types of visual tools How to improve the visualization? Mirroring the scale under the horizontal axis, removing data labels, more structured timeline Main types of visual tools Pie chart Main types of visual tools Pie chart • Focus: displaying data in a circular-shaped graph • Goal: depict proportions, ratios, shares, percentages (slices of a “pie) • Time horizon: none static chart • Remember: using not more than 5-6 components (slices) and adding the values in separate labels • Types: donut plot, 3D pie chart or exploded pie chart Main types of visual tools Pie chart – main types Donut plot “Exploded” pie chart Main types of visual tools Don’t do this with the pie charts… Main types of visual tools Time Magazine, March 2020 https://time.com/5798168/coronavirus-mo rtality-rate/ Problem: the pandemic was still unfolding in 2020… it cannot be compared with longexisting diseases! Seasonal flu: annual average of several years! Context is everything!!! Main types of visual tools Scatter plot Main types of visual tools Scatter plot • Focus: representing values for two different numeric variables • Goal: depict a relationship and its strength between two variables observing the pattern of the dots (correlation) • Time horizon: long or none • Remember: insert a trendline and check the outlier data whether they are correct • Types: liner or non-linear (logarithmic, etc.) Main types of visual tools When the y variable tends to increase as the x variable increases there is a positive correlation When the y variable tends to decrease as the x variable increases there is a negative correlation When there is no clear relationship between the two variables there is no correlation Main types of visual tools Example: negative linear correlation: the y variable (accidents per 100 drivers) tends to decrease as the x variable (age) increases Best platforms for visual tools Microsoft Excel https://www.canva.com/graphs/ https://www.visme.co/chart-maker/ Famous people to follow https://es.linkedin.com/in/christinastathopoulos https://thehiddenspeaker.com/ Misleading data: what is wrong with this chart? “In pink, that’s the reduction in the breast exams, and the red is the increase in the abortions. That’s what’s going on in your organization.” - Cecile Richards, the president of Planned Parenthood (2015) https://www.datapine.com/blog/misleading-statistics-and-data/ True: that the number of abortions since 2006 experienced substantial growth, while the number of cancer screenings substantially decreased Misleading: the chart points appear to indicate that 327,000 abortions are greater in inherent value than 935,573 cancer screenings… Problem: the chart has no defined y-axis! Misleading data: what is wrong with this graph? Why did they manipulate the data? Answer: $500 million in annual federal funding… Possible “authors” of the original graph: Americans United for Life, (anti-abortion group) Correct presentation of data in percentages and million USD Misleading data: what is wrong with this chart? In September 2021, during one of Fow New’s broadcasts, anchor Tucker Carlson displayed a graph saying that the number of Americans identifying as Christians had collapsed over the last decade. https://www.datapine.com/blog/misleading-statistics-and-data/ True: we can see a graph showing 77% of Christian Americans in 2009, a number that decreased to 65% in 2019 Misleading: we see the 12% drop from 2009 to 2019 way more significant than it actually is Problem: the Y-axis in this chart starts from 58% and ends at 78%! Misleading data: what is wrong with this chart? The distribution of the 2012 presidential run in the USA by the Republican candidates https://www.datapine.com/blog/misleading-statistics-and-data/ Misleading: the news network showed the graph below where we see a pie chart displaying a total of 193% which is clearly wrong and misleading as the total should be 100%! Pie charts: must always be 100% in total! Misleading data: what is wrong with this advertisement? “More than 80% of Dentists recommend Colgate.” https://www.datapine.com/blog/misleading-statistics-and-data/ How we understand it: 80 percent of dentists recommend Colgate over and above other brands, and the remaining 20 percent would recommend different brands. Truth: it allowed the participants to select one or more toothpaste brands… Problem: the research was not carried out by an independent research company but by the manufacturer of the product! Secondary data: example Step 1: visit the following platform of the World Bank https://data.worldbank.org/ Step 2: choose one indicator for detailed viewing e.g.: GDP growth (annual, %) Step 3: select 2 countries to compare by adding them one by one in the upper search bar e.g.: Spain and Italy Step 4: adjust the timeline of the indicator e.g.: from 2005 to 2021 Step 5: insert the graph into your research or download the data in Excel and customize your graph https://data.worldbank.org/indicator/NY.GDP.MKTP.KD .ZG?end=2021&locations=ES-IT&start=2005 Visual tools From time to time, we need to provide the visual proof for our findings that might include such tools as diagrams, charts, graphs, tables, photos or other elements. If inserting any figure (either from another source or our own diagram developed in Excel): • we must always provide the title of the diagram/chart/table/photo • the original source • we must make sure that both the X and Y-axis is labeled Visual tools Figure 1: Inflation rate of the European Union (1997-2022) Source: https://www.statista.com/statistics/685943/cpi-inflation-rate-europe/ Microsoft Excel: First steps Microsoft Excel: First steps Technical aspects: In order to easily follow the course please set your Excel to use U.S. English version in Excel. We want to achieve the following results: • arguments in a formula are separated by a comma, as in the example: =VLOOKUP(what you want to look up, from where, column number, FALSE) • decimal numbers are written with a period, as in 3.25 Language setup in Windows: • • • • Open your Microsoft Excel program Go to File > Options > Language Choose English as preferred in each case Video tutorial: https://www.youtube.com/watch?v=FzZWDblcEjY Language setup in Mac: • • • • Open your Tools Go to Set Proofing Language > OK Choose English as preferred in each case Video tutorial: https://www.youtube.com/watch?v=Geb5V6KPUn4 Microsoft Excel: First steps STEP 1: • Open your Microsoft Excel program • Go to File > Options > Language • Choose English as preferred in each case Microsoft Excel: First steps Decimal setup in Windows: STEP 2: MENU (computer menu) > CONTROL PANEL or SETTINGS STEP 3: Locate and select Change date, time, or number formats within the Clock, Language, and Region category. STEP 4: The Region dialog box will appear. Click Additional settings. https://www.youtube.com/watch?v=8nMd9c0mte8 Decimal setup in Mac: STEP 2: Open your System Preferences STEP 3: Open Language & Region > Choose Advanced STEP 4: Change the number separators (Decimals) > OK > Reboot the computer https://www.youtube.com/watch?v=oMEerRSdEHU Microsoft Excel: Data Analysis Toolpak Microsoft Excel: Data Analysis Toolpak What is it and why is it useful? • The Data Analysis Toolpak is included with every copy of Excel. It gives you access to a wide variety of statistical functions including histograms, correlation, a range of ztest and t-test functions and a random number generator • The program provides a set of data analysis tools called the Analysis ToolPak which you can use to save steps when you develop complex statistical analyses. • You provide the data and parameters for each analysis and the tool uses the appropriate statistical macro functions and then displays the results in an output table. Microsoft Excel: Data Analysis Toolpak Analysis ToolPak in Windows • Step 1: on the File tab, click Options • Step 2: under Add-ins, select Analysis ToolPak and click on the Go button Microsoft Excel: Data Analysis Toolpak Analysis ToolPak in Windows • Step 3: Check Analysis ToolPak and click on OK Microsoft Excel: Data Analysis Toolpak Analysis ToolPak in Windows • Step 4: on the Data tab, in the Analysis group, you can now click on Data Analysis • Step 5: for example, select Histogram and click OK to create a Histogram in Excel Microsoft Excel: Data Analysis Toolpak For a short tutorial, watch this 2-minute-long video on Youtube and make sure that the ToolPak is properly installed on your computer https://www.youtube.com/watch?v=_yNxLFagKgw Microsoft Excel: Data Analysis Toolpak Analysis ToolPak in Mac • Click the Tools menu > Excel Add-ins. • In the Add-Ins available box, select the Analysis ToolPak check box > OK. • If Analysis ToolPak is not listed in the Add-Ins available box, click Browse to locate it. • If you get a prompt that the Analysis ToolPak is not currently installed on your computer, click Yes to install it. • Quit and restart Excel. Data Analysis Toolpak: Example 1 – Descriptive Statistics Let’s see an example! Data Analysis Toolpak: Example 1 – Descriptive Statistics Step 1: open a new Excel sheet and type in the following data about the available prices of a hotel Step 2: click on the Data tab in the upper toolbar and then on the installed Data Analysis button Data Analysis Toolpak: Example 1 – Descriptive Statistics Step 3: select the Descriptive Statistics option and click on OK Data Analysis Toolpak: Example 1 – Descriptive Statistics Step 4: set up the following options • Within the Input Range, select the entire area of the data in your sheet including the title • Activate the Labels in First Row box • Select New Worksheet Ply under the Output Options • Choose Summary statistics and click on OK • Wait a few seconds… Data Analysis Toolpak: Example 1 – Descriptive Statistics Step 5: check your Sheet 2 with the detailed results of your analysis You must see 13 different outputs in a separate table What happened? In one minute, you carried out 13 independent operations with your ToolPak saving an immense amount of time! Data Analysis Toolpak: Example 2 – Sampling Under Chapter 2 of Moodle, you will find an Excel file named Generating sample with the following original data. Your task is to carry out two different approaches of sampling by using the Data Analysis ToolPak. What is sampling? Creating a representative group of individuals or cases from a particular population, e.g.: in a survey. Potential problem: If a sample isn't randomly selected, it will be probably biased in some way and the data may not be representative for the entire population we cannot generalize correctly. Data Analysis Toolpak: Example 2 – Sampling Random sampling Types of Sampling Systematic (periodic) random sample Every individual observation has equal probability to be selected into a sample. In random sampling there should be no pattern! Members of the population are put in some order. A starting point is selected at random, and every nth member is selected to be in the sample e.g.: A principal takes an alphabetized list of student names and picks a random starting point. Every 20th is selected to take a survey. Data Analysis Toolpak: Example 2 – Sampling Step 1: create two new columns named Random sample and Systematic random sample Step 2: go to Data Data Analysis ToolPak select the Sampling tool Data Analysis Toolpak: Example 2 – Sampling Step 3: select the area of the original data without the title. Step 4: select the Random sampling option and type in 10 (number of samples to be exported as results). Step 5: choose your output range (where your results will be exported automatically, it should be under your Random sample cell. Data Analysis Toolpak: Example 2 – Sampling Step 6: go to Data Data Analysis ToolPak select the Sampling tool Step 7: select the Periodic sampling option and type in 4 (every 4th data will be randomly selected) Step 8: choose your output range (where your results will be exported automatically, it should be under your Periodic sample cell Data Analysis Toolpak: Example 2 – Sampling For the full tutorial, watch this 7-minutelong video on Youtube for more detailed explanations: https://www.youtube.com/watch?v=bEvWsb_018o How to create random numbers? Formula: =RAND() The Random or RAND Excel function generates random values greater than 0 but less than 1. RAND will calculate a new value each time the worksheet is calculated. =RAND()*100: numbers between 0 and 100; =a+ (b-a)*RAND(): random number between two numbers. Thank you for your attention! [email protected]