Data Visualization PDF
Document Details
Uploaded by Deleted User
Tags
Summary
This document discusses data visualization methods, specifically focusing on their application in business analytics. It provides examples of different chart types, including bar charts, line charts, and pie charts, along with explanations of how to create these charts in Microsoft Excel. It also explores the use of dashboards and conditional formatting for effective data visualization.
Full Transcript
# Data Visualization The old adage "A picture is worth 1000 words" is probably truer in today's information-rich environment than ever before. In Chapter 1 we stated that data visualization is at the core of modern business analytics. Data visualization is the process of displaying data (often in l...
# Data Visualization The old adage "A picture is worth 1000 words" is probably truer in today's information-rich environment than ever before. In Chapter 1 we stated that data visualization is at the core of modern business analytics. Data visualization is the process of displaying data (often in large quantities) in a meaningful fashion to provide insights that will support better decisions. Making sense of large quantities of disparate data is necessary not only for gaining competitive advantage in today's business environment but also for surviving in it. Researchers have observed that data visualization improves decision-making, provides managers with better analysis capabilities that reduce reliance on IT professionals, and improves collaboration and information sharing. Raw data are important, particularly when one needs to identify accurate values or compare individual numbers. However, it is quite difficult to identify trends and patterns, find exceptions, or compare groups of data in tabular form. The human brain does a surprisingly good job processing visual information-if presented in an effective way. Visualizing data provides a way of communicating data at all levels of a business and can reveal surprising patterns and relationships. For many unique and intriguing examples of data visualization, visit the Data Visualization Gallery at the U.S. Census Bureau Web site, www.census.gov/dataviz/. ## **EXAMPLE 3.1 Tabular versus Visual Data Analysis** Figure 3.1 shows the data in the Excel file Monthly Product Sales. We can use the data to determine exactly how many units of a certain product were sold in a particular month, or to compare one month to another. For example, we see that sales of product A dropped in February, specifically by 6.7% (computed by the Excel formula = 1-B3/B2). Beyond such calculations, however, it is difficult to draw big picture conclusions. Figure 3.2 displays a chart of monthly sales for each product. We can easily compare overall sales of different products (Product C sells the least, for example), and identify trends (sales of Product D are increasing), other patterns (sales of Product C is relatively stable while sales of Product B fluctuates more over time), and exceptions (Product E's sales fell considerably in September). Data visualization is also important both for building decision models and for interpreting their results. For example, recall the demand-prediction models in Chapter 1 (Examples 1.9 and 1.10). To identify the appropriate model to use, we would normally have to collect and analyze data on sales demand and prices to determine the type of relationship (linear or nonlinear, for example) and estimate the values of the parameters in the model. Visualizing the data will help to identify the proper relationship and use the appropriate data analysis tool. Furthermore, complex analytical models often yield complex results. Visualizing the results often helps in understanding and gaining insight about model output and solutions. ## Dashboards Making data visible and accessible to employees at all levels is a hallmark of effective modern organizations. A dashboard is a visual representation of a set of key business measures. It is derived from the analogy of an automobile's control panel, which displays speed, gasoline level, temperature, and so on. Dashboards provide important summaries of key business information to help manage a business process or function. Dashboards might include tabular as well as visual data to allow managers to quickly locate key data. Figure 3.3 shows a simple dashboard for the product sales data in Figure 3.1 showing monthly sales for each product individually, sales of all products combined, total annual sales by product, a comparison of the last two months, and monthly percent changes by product. ## Tools and Software for Data Visualization Data visualization ranges from simple Excel charts to more advanced interactive tools and software that allow users to easily view and manipulate data with a few clicks, not only on computers, but on iPads and other devices as well. In this chapter we discuss basic tools available in Excel. In Chapter 10, we will see several other tools used in data mining applications that are available with the Excel add-in, XLMiner, that is used in this book. ## Creating Charts in Microsoft Excel Microsoft Excel provides a comprehensive charting capability with many features. With a little experimentation, you can create very professional charts for business analyses and presentations. These include vertical and horizontal bar charts, line charts, pie charts, area charts, scatter plots, and many other special types of charts. We generally do not guide you through every application but do provide some guidance for new procedures as appropriate. Certain charts work better for certain types of data, and using the wrong chart can make it difficult for the user to interpret and understand. While Excel offers many ways to make charts unique and fancy, naive users often focus more on the attention-grabbing aspects of charts rather than their effectiveness of displaying information. So we recommend that you keep charts simple, and avoid such bells and whistles as 3-D bars, cylinders, cones, and so on. We highly recommend books written by Stephen Few, such as *Show Me the Numbers* (Oakland, CA: Analytics Press, 2004) for additional guidance in developing effective data visualizations. To create a chart in Excel, it is best to first highlight the range of the data you wish to chart. The Excel Help files provide guidance on formatting your data for a particular type of chart. Click the Insert tab in the Excel ribbon (Figure 3.4). From the Charts group, click the chart type, and then click a chart subtype that you want to use. Once a basic chart is created, you may use the options in the Design and Format tabs within the Chart Tools tabs to customize your chart (Figure 3.5). In the Design tab, you can change the type of chart, data included in the chart, chart layout, and styles. The Format tab provides various formatting options. You may also customize charts easily by right-clicking on elements of the chart or by using the Quick Layout options in the Chart Layout group within the Chart Tools Design tab. You should realize that up to 10% of the male population are affected by color blindness, making it difficult to distinguish between different color variations. Although we generally display charts using Excel's default colors, which often, unfortunately, use red, experts suggest using blue-orange palettes. We suggest that you be aware of this for professional and commercial applications. ## Column and Bar Charts Excel distinguishes between vertical and horizontal bar charts, calling the former column charts and the latter bar charts. A clustered column chart compares values across categories using vertical rectangles; a stacked column chart displays the contribution of each value to the total by stacking the rectangles; and a 100% stacked column chart compares the percentage that each value contributes to a total. Column and bar charts are useful for comparing categorical or ordinal data, for illustrating differences between sets of values, and for showing proportions or percentages of a whole. ## **EXAMPLE 3.2 Creating Column Charts** The Excel file EEO Employment Report provides data on the number of employees in different categories broken down by racial/ethnic group and gender (Figure 3.6). We will construct a simple column chart for the various employment categories for all employees. First, highlight the range C3:K6, which includes the headings and data for each category. Click on the Column Chart button and then on the first chart type in the list (a clustered column chart). To add a title, click on the Add Chart Elements button in the Design tab ribbon. Click on "Chart Title" in the chart and change it to "EEO Employment Report-Alabama." The names of the data series can be changed by clicking on the Select Data button in the Data group of the Design tab. In the Select Data Source dialog (see Figure 3.7), click on "Series1" and then the Edit button. Enter the name of the data series, in this case "All Employees." Change the names of the other data series to "Men" and "Women" in a similar fashion. You can also change the order in which the data series are displayed on the chart using the up and down buttons. The final chart is shown in Figure 3.8. Be cautious when changing the scale of the numerical axis. The heights or lengths of the bars only accurately reflect the data values if the axis starts at zero. If not, the relative sizes can paint a misleading picture of the relative values of the data. ## Data Labels and Data Tables Chart Options Excel provides options for including the numerical data on which charts are based within the charts. Data labels can be added to chart elements to show the actual value of bars, for example. Data tables can also be added; these are usually better than data labels, which can get quite messy. Both can be added from the Add Chart Element Button in the Chart Tools Design tab, or also from the Quick Layout button, which provides standard design options. Figure 3.9 shows a data table added to the Alabama Employment chart. You can see that the data table provides useful additional information to improve the visualization. ## Line Charts Line charts provide a useful means for displaying data over time, as Example 3.3 illustrates. You may plot multiple data series in line charts; however, they can be difficult to interpret if the magnitude of the data values differs greatly. In that case, it would be advisable to create separate charts for each data series. ## **EXAMPLE 3.3 A Line Chart for China Export Data** Figure 3.10 shows a line chart giving the amount of U.S. exports to China in billions of dollars from the Excel file China Trade Data. The chart clearly shows a significant rise in exports starting in the year 2000, which began to level off around 2008. ## Pie Charts For many types of data, we are interested in understanding the relative proportion of each data source to the total. A pie chart displays this by partitioning a circle into pie-shaped areas showing the relative proportion. Example 3.4 provides one application. ## **EXAMPLE 3.4 A Pie Chart for Census Data** Consider the marital status of individuals in the U.S. population in the Excel file Census Education Data, a portion of which is shown in Figure 3.11. To show the relative proportion in each category, we can use a pie chart, as shown in Figure 3.12. This chart uses a layout option that shows the labels associated with the data as well as the actual proportions as percentages. A different layout that shows both the values and/or proportions can also be chosen. Data visualization professionals don't recommend using pie charts. For example, contrast the pie chart in Figure 3.12 with the column chart in Figure 3.13 for the same data. In the pie chart, it is difficult to compare the relative sizes of areas; however, the bars in the column chart can easily be compared to determine relative ratios of the data. If you do use pie charts, restrict them to small numbers of categories, always ensure that the numbers add to 100%, and use labels to display the group names and actual percentages. Avoid three-dimensional (3-D) pie charts-especially those that are rotated-and keep them simple. ## Area Charts An area chart combines the features of a pie chart with those of line charts. Area charts present more information than pie or line charts alone but may clutter the observer's mind with too many details if too many data series are used; thus, they should be used with care. ## **EXAMPLE 3.5 An Area Chart for Energy Consumption** Figure 3.14 displays total energy consumption (billion Btu) and consumption of fossil fuels from the Excel file Energy Production & Consumption. This chart shows that although total energy consumption has grown since 1949, the relative proportion of fossil fuel consumption has remained generally consistent at about half of the total, indicating that alternative energy sources have not replaced a significant portion of fossil-fuel consumption. ## Scatter Chart Scatter charts show the relationship between two variables. To construct a scatter chart, we need observations that consist of pairs of variables. For example, students in a class might have grades for both a midterm and a final exam. A scatter chart would show whether high or low grades on the midterm correspond strongly to high or low grades on the final exam or whether the relationship is weak or nonexistent. ## **EXAMPLE 3.6 A Scatter Chart for Real Estate Data** Figure 3.15 shows a scatter chart of house size (in square feet) versus the home market value from the Excel file Home Market Value. The data clearly suggest that higher market values are associated with larger homes. ## Bubble Charts A bubble chart is a type of scatter chart in which the size of the data marker corresponds to the value of a third variable; consequently, it is a way to plot three variables in two dimensions. ## **EXAMPLE 3.7 A Bubble Chart for Comparing Stock Characteristics** Figure 3.16 shows a bubble chart for displaying price, P/E (price/earnings) ratio, and market capitalization for five different stocks on one particular day in the Excel file Stock Comparisons. The position on the chart shows the price and P/E; the size of the bubble represents the market cap in billions of dollars. ## Miscellaneous Excel Charts Excel provides several additional charts for special applications. These additional types of charts (including bubble charts) can be selected and created from the Other Charts button in the Excel ribbon. These include the following: * A stock chart allows you to plot stock prices, such as the daily high, low, and close. It may also be used for scientific data such as temperature changes. * A surface chart shows 3-D data. * A doughnut chart is similar to a pie chart but can contain more than one data series. * A radar chart allows you to plot multiple dimensions of several data series. ## Geographic Data Many applications of business analytics involve geographic data. For example, problems such as finding the best location for production and distribution facilities, analyzing regional sales performance, transporting raw materials and finished goods, and routing vehicles such as delivery trucks involve geographic data. In such problems, data mapping can help in a variety of ways. Visualizing geographic data can highlight key data relationships, identify trends, and uncover business opportunities. In addition, it can often help to spot data errors and help end users understand solutions, thus increasing the likelihood of acceptance of decision models. Companies like Nike use geographic data and information systems for visualizing where products are being distributed and how that relates to demographic and sales information. This information is vital to marketing strategies. The use of prescriptive analytic models in combination with data mapping was instrumental in the success of Procter & Gamble Company's North American Supply Chain study, which saved the company in excess of $200 million dollars per year. We discuss this application in Chapter 15. Geographic mapping capabilities were introduced in Excel 2000 but were not available in Excel 2002 and later versions. These capabilities are now available through Microsoft MapPoint 2010, which must be purchased separately. MapPoint is a geographic data-mapping tool that allows you to visualize data imported from Excel and other database sources and integrate them into other Microsoft Office applications. For further information, see http://www.microsoft.com/mappoint/en-us/home.aspx. ## Other Excel Data Visualization Tools Microsoft Excel offers numerous other tools to help visualize data. These include data bars, color scales, and icon sets; sparklines, and the camera tool. We will describe each of these in the following sections. ## **Data Bars, Color Scales, and Icon Sets** These options are part of Excel's Conditional Formatting rules, which allow you to visualize different numerical values through the use of colors and symbols. Excel has a variety of standard templates to use, but you may also customize the rules to meet your own conditions and styles. We encourage you to experiment with these tools. ## **EXAMPLE 3.8 Data Visualization through Conditional Formatting** Data bars display colored bars that are scaled to the magnitude of the data values (similar to a bar chart) but placed directly within the cells of a range. Figure 3.17 shows data bars applied to the data in the Monthly Product Sales worksheet. Highlight the data in each column, click the Conditional Formatting button in the Styles group within the Home tab, select Data Bars, and choose the fill option and color. Color scales shade cells based on their numerical value using a color palette. This is another option in the Conditional Formatting menu. For example, in Figure 3.18 we use a green-yellow-red color scale, which highlights cells containing large values in green, small values in red, and middle values in yellow. The darker the green, the larger the value; the darker the red, the smaller the value. For intermediate values, you can see that the colors blend together. This provides a quick way of identifying the largest and smallest product-month sales values. Color-coding of quantitative data is commonly called a heatmap. We will see another application of a heatmap in Chapter 14. Finally, Icon Sets provide similar information using various symbols such as arrows or stoplight colors. Figure 3.19 shows an example. ## Sparklines Sparklines are graphics that summarize a row or column of data in a single cell. Sparklines were introduced by Edward Tufte, a famous expert on visual presentation of data. He described sparklines as "data-intense, design-simple, word-sized graphics." Excel has three types of sparklines: line, column, and win/loss. Line sparklines are clearly useful for time-series data, while column sparklines are more appropriate for categorical data. Win-loss sparklines are useful for data that move up or down over time. They are found in the Sparklines group within the Insert menu on the ribbon. ## **EXAMPLE 3.9 Examples of Sparklines** We will again use the Monthly Product Sales data. Figure 3.20 shows line sparklines in row 14 for each product. In column G, we display column sparklines, which are essentially small column charts. Generally you need to expand the row or column widths to display them effectively. Notice, however, that the lengths of the bars are not scaled properly to the data; for example, in the first one, products D and E are roughly one-third the value of Product E yet the bars are not scaled correctly. So be careful when using them. Figure 3.21 shows a modified worksheet in which we computed the percentage change from 1 month to the next for products A and B. The win-loss sparklines in row 14 show the patterns of sales increases and decreases, suggesting that product A has a cyclical pattern while product B changed in a more random fashion. If you click on any cell containing a sparkline, the Sparkline Tools Design tab appears, allowing you to customize colors and other options. ## Excel Camera Tool A little-known feature of Excel is the camera tool. This allows you to create live pictures of various ranges from different worksheets that you can place on a single page, size them, and arrange them easily. They are simply linked pictures of the original ranges, and the advantage is that as any data are changed or updated, the camera shots are also. This is particularly valuable for printing summaries when you need to extract data from multiple worksheets, consolidating PivotTables (introduced later in this chapter) onto one page, or for creating dashboards when the tables and charts are scattered across multiple worksheets. To use the camera too, first add it to the Quick Access Toolbar (the set of buttons above the ribbon). From the File menu, choose Options and then Quick Access Toolbar. Choose Commands, and then Commands Not in the Ribbon. Select Camera and add it. It will then appear as shown in Figure 3.22. To use it, simply highlight a range of cells (if you want to capture a chart, highlight a range of cells surrounding it), click the camera tool button and then click the location where you want to place the picture. You may size the picture just like any other Microsoft Excel object. We will illustrate this tool later in the chapter when we discuss PivotTables. ## Data Queries: Tables, Sorting, and Filtering Managers make numerous queries about data. For example, in the Purchase Orders database (Figure 1.3), they might be interested in finding all orders from a certain supplier, all orders for a particular item, or tracing orders by order data. To address these queries, we need to sort the data in some way. In other cases, managers might be interested in extracting a set of records having certain characteristics. This is termed filtering the data. For example, in the Purchase Orders database, a manager might be interested in extracting all records corresponding to a certain item. Excel provides a convenient way of formatting databases to facilitate analysis, called Tables. ## **EXAMPLE 3.10 Creating an Excel Table** We will use the Credit Risk Data file to illustrate an Excel table. First, select the range of the data, including headers (a useful shortcut is to select the first cell in the upper left corner, then click Ctrl+Shift+down arrow, and then Ctrl+Shift+right arrow). Next, click Table from the Tables group on the Insert tab and make sure that the box for My Table Has Headers is checked. (You may also just select a cell within the table and then click on Table from the Insert menu. Excel will choose the table range for you to verify.) The table range will now be formatted and will continue automatically when new data are entered. Figure 3.23 shows a portion of the result. Note that the rows are shaded and that each column header has a drop-down arrow to filter the data (we'll discuss this shortly). If you click within a table, the Table Tools Design tab will appear in the ribbon, allowing you to do a variety of things, such as change the color scheme, remove duplicates, change the formatting, and so on. ## Table-Based Calculations Suppose that in the Credit Risk Data table, we wish to calculate the total amount of savings in column C. We could, of course, simply use the function SUM(C4:C428). However, with a table, we could use the formula =SUM(Table1 [Savings]). The table name, Table1, can be found (and changed) in the Properties group of the Table Tools Design tab. Note that Savings is the name of the header in column C. One of the advantages of doing this is that if we add new records to the table, the calculation will be updated automatically, and we don't have to change the range in the formula or get a wrong result if we forget to. As another example, we could find the number of home owners using the function =COUNTIF(Table1 [Housing], "Own"). If you add additional records at the end of the table, they will automatically be included and formatted, and if you create a chart based on the data, the chart will automatically be updated if you add new records. ## Sorting Data in Excel Excel provides many ways to sort lists by rows or column or in ascending or descending order and using custom sorting schemes. The sort buttons in Excel can be found under the Data tab in the Sort & Filter group (see Figure 3.24). Select a single cell in the column you want to sort on and click the "AZ down arrow" button to sort from smallest to largest or the "AZ up arrow" button to sort from largest to smallest. You may also click the Sort button to specify criteria for more advanced sorting capabilities. ## **EXAMPLE 3.12 Sorting Data in the Purchase Orders Database** In Chapter 1 (Figure 1.3), we introduced a data set for purchase orders for an aircraft-component manufacturer. Suppose we wish to sort the data by supplier. Click on any cell in column A of the data (but not the header cell A3) and then the "AZ down" button in the Data tab. Excel will select the entire range of the data and sort by name of supplier in column A, a portion of which is shown in Figure 3.25. This allows you to easily identify the records that correspond to all orders from a particular supplier. ## Pareto Analysis Pareto analysis is a term named after an Italian economist, Vilfredo Pareto, who, in 1906, observed that a large proportion of the wealth in Italy was owned by a relatively small proportion of the people. The Pareto principle is often seen in many business situations. For example, a large percentage of sales usually comes from a small percentage of customers, a large percentage of quality defects stems from just a couple of sources, or a large percentage of inventory value corresponds to a small percentage of items. As a result, the Pareto principle is also often called the "80-20 rule," referring to the generic situation in which 80% of some output comes from 20% of some input. A Pareto analysis relies on sorting data and calculating the cumulative percentage of the characteristic of interest. ## **EXAMPLE 3.13 Applying the Pareto Principle** The Excel file Bicycle Inventory lists the inventory of bicycle models in a sporting goods store (see columns A through F in Figure 3.26).2 To conduct a Pareto analysis, we first compute the inventory value of each product by multiplying the quantity on hand by the purchase cost; this is the amount invested in the items that are currently in stock. Then we sort the data in decreasing order of inventory value and compute the percentage of the total inventory value for each product and the cumulative percentage. See columns G through I in Figure 3.26. We see that about 75% of the inventory value is accounted for by less than 40% (9 of 24) of the items. If these high-value inventories aren't selling well, the store manager may wish to keep fewer in stock. ## Filtering Data For large data files, finding a particular subset of records that meet certain characteristic by sorting can be tedious. Excel provides two filtering tools: AutoFilter for simple criteria and Advanced Filter for more complex criteria. These tools are best understood by working through some examples. ## **EXAMPLE 3.14 Filtering Records by Item Description** In the Purchase Orders database, suppose we are interested in extracting all records corresponding to the item Bolt-nut package. First, select any cell within the database. Then, from the Excel Data tab, click on Filter in the Sort & Filter group. A dropdown arrow will then be displayed on the right side of each header column. Clicking on one of these will display a drop-down box. These are the options for filtering on that column of data. Click the one next to the Item Description header. Uncheck the box for Select All and then check the box corresponding to the Bolt-nut package, as shown in Figure 3.27. Click the OK button, and the Filter tool will display only those orders for this item (Figure 3.28). Actually, the filter tool does not extract the records; it simply hides the records that don't match the criteria. However, you can copy and paste the data to another Excel worksheet, Microsoft Word document, or a PowerPoint presentation, for instance. To restore the original data file, click on the drop-down arrow again and then click Clear filter from "Item Description."