Introduction To Visualization PDF
Document Details
Uploaded by ysabriena
KPM Beranang
Summary
This document provides an introduction to data visualization, explaining concepts like exploration and explanation and discussing different visualization types, including scatterplots, bar charts, and pie charts. It touches upon tools like Plotly, D3.js, and Tableau, and emphasizes best practices for effective visualization.
Full Transcript
SESI 3 / 2024-2025
INTRODUCTION TO VISUALIZATION
KPM Indera Mahkota

INTRODUCTION TO VISUALIZATION
1. What is data visualization?
2. Why is data visualization important?
3. When should you visualize your data?
4. Different types of data visualization and when to use them
5. Top data visualization tools
6. Best practices and principles for effective data visualization
7. Getting started with data visualization

WHAT IS DATA VISUALIZATION?
Data visualization is the graphical or visual representation of data. It helps to highlight the most useful insights from a dataset, making it easier to spot trends, patterns, outliers, and correlations. There are two broad categories of data visualization: exploration and explanation.

EXPLORATION VS. EXPLANATION
Exploration
When faced with a new dataset, one of the first things you'll do is carry out an exploratory data analysis. This is where you investigate the dataset and identify some of its main features, laying the foundation for more thorough analysis. At this stage, visualizations can make it easier to get a sense of what's in your dataset and to spot any noteworthy trends or anomalies. Ultimately, you're getting an initial lay of the land and finding clues as to what the data might be trying to tell you.

Explanation
Once you've conducted your analysis and have figured out what the data is telling you, you'll want to share these insights with others: key business stakeholders who can take action based on the data, for example, or public audiences who have an interest in your topic area. Explanatory data visualizations help you tell this story, and it's up to you to determine which visualizations will help you to do so most effectively.

Comparison
Exploratory visualizations: help you figure out and understand what's in your data. Exploration takes place while you're still analyzing the data.
Explanatory visualizations: help you communicate what you've found. Explanation comes towards the end of the process, when you're ready to share your findings.

WHY IS DATA VISUALIZATION IMPORTANT?
Data visualization is important because it helps make data analytics useful and effective. Good data visualization presents the findings in a simple and clear way, making it easier for people to understand the meaning behind the data.
We're living in an increasingly data-rich world; at the start of 2020, the digital universe comprised approximately 44 zettabytes of data. For perspective, one zettabyte is roughly equal to a trillion gigabytes. By 2025, it's estimated that around 463 exabytes of data will be created every 24 hours across the globe. An exabyte is equivalent to one billion gigabytes. In short, we're producing enormous amounts of data all the time.
Data analytics allows us to make sense of (at least some of) that data. From a business perspective, it enables companies to learn from the past and plan ahead for the future. In fields like healthcare, it can help to improve patient care and treatment. In finance and insurance, it can help to assess risk and combat fraudulent activity. Essentially, we need data analytics in order to make smart decisions, and data visualization is a crucial part of that.

WHY DO WE USE DATA VISUALIZATION?
1. Meaningful storytelling
2. Better decision making
3. Data literacy

ADVANTAGES AND BENEFITS OF EFFECTIVE DATA VISUALIZATION
Get an initial understanding of your data by making trends, patterns, and outliers easily visible to the naked eye.
Understand large volumes of data quickly and efficiently.
Communicate insights and findings to non-data experts, making your data accessible and actionable.
Tell a meaningful and impactful story, highlighting only the most relevant information for a given context.

WHEN SHOULD YOU VISUALIZE YOUR DATA?
Aside from exploratory data visualization, which takes place in the early stages, data visualization usually comprises the final step in the data analysis process. To recap, the data analysis process can be set out as follows:

DATA ANALYSIS PROCESS
1. Define the question: What problem are you trying to solve?
2. Collect the data: Determine what kind of data you need and where you'll find it.
3. Clean the data: Remove errors, duplicates, outliers, and unwanted data points, that is, anything that might skew how your data is interpreted.
4. Analyze the data: Determine the type of data analysis you need to carry out in order to find the insights you're looking for.
5. Visualize the data and share your findings: Translate your key insights into visual format (e.g. graphs, charts, or heatmaps) and present them to the relevant audience(s).

WHAT IS DATA VISUALIZATION USED FOR?
1. Convey changes over time: For example, a line graph could be used to present how the value of Bitcoin changed over a certain time period.
2. Determine the frequency of events: You could use a histogram to visualize the frequency distribution of a single event over a certain time period (e.g. number of internet users per year from 2007 to 2021).
3. Highlight interesting relationships or correlations between variables: If you wanted to highlight the relationship between two variables (e.g. marketing spend and revenue, or hours of weekly exercise vs. cardiovascular fitness), you could use a scatter plot to see, at a glance, whether the two rise together or one falls as the other rises.
4. Examine a network: If you want to understand how people or things are connected in a group, such as your customers, a network visualization can help.
5. Analyze value and risk: If you want to weigh up value versus risk in order to figure out which opportunities or strategies are worth pursuing, data visualizations (such as a color-coded system) could help you to categorize and identify, at a glance, which items are feasible.

DATA VISUALIZATION CATEGORIES
Temporal data visualizations are linear and one-dimensional. Examples include scatterplots, timelines, and line graphs.
Hierarchical visualizations organize groups within larger groups, and are often used to display clusters of information. Examples include tree diagrams, ring charts, and sunburst diagrams.
Network visualizations show the relationships and connections between multiple datasets. Examples include matrix charts, word clouds, and node-link diagrams.
Multidimensional or 3D visualizations are used to depict two or more variables. Examples include pie charts, Venn diagrams, stacked bar graphs, and histograms.
Geospatial visualizations convey various data points in relation to physical, real-world locations (for example, voting patterns across a certain country). Examples include heat maps, cartograms, and density maps.

TASK 1
1. Define correlation.
2. There are two types of correlation: positive and negative. Explain both types.
3. Give an example of each type.
4. Post your answers on the Padlet dashboard.

TASK 2
1. Find an example diagram for each data visualization category.
2. Use the given template to prepare your answer.
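
As a starting point for thinking about Task 1 and the scatter plot use case above, here is a minimal Python sketch that measures how strongly two variables move together using Pearson's correlation coefficient. The function name pearson_r and all of the numbers are made up for illustration; they do not come from the slides. A value near +1 suggests the variables rise together, and a value near -1 suggests one falls as the other rises.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient for two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical data, invented purely for illustration.
marketing_spend = [10, 20, 30, 40, 50]
revenue = [25, 45, 50, 85, 95]       # tends to rise as spend rises
price = [1, 2, 3, 4, 5]
units_sold = [90, 85, 60, 50, 30]    # tends to fall as price rises

r_positive = pearson_r(marketing_spend, revenue)
r_negative = pearson_r(price, units_sold)

print(f"spend vs. revenue:    r = {r_positive:.3f}")
print(f"price vs. units sold: r = {r_negative:.3f}")
```

The same pairs of lists could be handed to a plotting library to draw the corresponding scatter plots; the coefficient only summarizes the direction and strength of a linear relationship and says nothing about cause and effect.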

COMMON TYPES OF DATA VISUALIZATION
Five common types of data visualization:
1. Scatterplots
2. Bar charts
3. Pie charts
4. Network graphs
5. Geographical maps

SCATTERPLOTS
Scatterplots (or scatter graphs) visualize the relationship between two variables. One variable is shown on the x-axis, and the other on the y-axis, with each data point depicted as a single "dot" or item on the graph. This creates a "scatter" effect, hence the name.
Scatterplots are best used for large datasets when there's no temporal element, that is, when the data does not involve time as a factor and doesn't include time-based variables like dates, timestamps, or sequences. For example, if you wanted to visualize the relationship between a person's height and weight, or between how many carats a diamond measures and its monetary value, you could easily visualize this using a scatterplot.
It's important to bear in mind that scatterplots simply describe the correlation between two variables; they don't imply any kind of cause-and-effect relationship.

BAR CHARTS
Bar charts are used to plot categorical data against discrete values. Categorical data refers to data that is not numeric, and it's often used to describe certain traits or characteristics. Examples include education level (e.g. high school, undergrad, or post-grad) and age group (e.g. under 30, under 40, under 50, or 50 and over).
Discrete values are those which can only take on certain values; there are no "half measures" or "gray areas." For example, the number of people attending an event would be a discrete variable.
So, with a bar chart, you have your categorical data on the x-axis plotted against your discrete values on the y-axis. The height of the bars is directly proportional to the values they represent, making it easy to compare your data at a glance.

PIE CHARTS
Just like bar charts, pie charts are used to visualize categorical data.
However, while bar charts represent multiple categories of data, pie charts are used to visualize just one single variable broken down into percentages or proportions. A pie chart is essentially a circle divided into different "slices," with each slice representing the percentage it contributes to the whole. Thus, the size of each pie slice is proportional to how much it contributes to the whole "pie."
Imagine you have a class of twenty students and you want to divide them up based on what color t-shirt they're wearing on a given day. The possible "slices" are red, green, blue, and yellow, with each color representing 40%, 30%, 25%, and 5% of the class total respectively (8, 6, 5, and 1 students). You could easily visualize this using a pie chart, and the yellow slice (5%) would be considerably thinner than the red slice (40%).
Pie charts are best suited for data that can be split into a maximum of five or six categories.

NETWORK GRAPHS
Network graphs show how different elements or entities within a network relate to one another, with each element represented by an individual node. These nodes are connected to other, related nodes via lines.
Network graphs are great for spotting and representing clusters within a large network of data. Let's imagine you have a huge database filled with customers, and you want to segment them into meaningful clusters for marketing purposes. You could use a network graph to draw connections and parallels between all your customers or customer groups. With any luck, certain clusters and patterns would emerge, giving you a logical means by which to group your audience.

GEOGRAPHICAL MAPS
Geo maps are used to visualize the distribution of data in relation to a physical, geographical area. For example, you could use a color-coded map to see how natural oil reserves are distributed across the world, or to visualize how different states voted in a political election.
Maps are an extremely versatile form of data visualization, and are an excellent way of communicating all kinds of location-related data. Some other types of maps used in data visualization include dot distribution maps (think scatterplots combined with a map) and cartograms, which distort the size of geographical areas to proportionally represent a given variable (population density, for example).

TOP DATA VISUALIZATION TOOLS
1. Plotly: An open-source graphing library that is widely used from Python. Plotly is ideal if you've got some coding knowledge and want to create highly customizable visualizations.
2. D3.js: A free, open-source data viz library built using JavaScript. As with Plotly, you'll need some programming knowledge in order to use this data viz tool.
3. Tableau: Perhaps one of the most popular data analytics tools, Tableau is known for its user-friendliness; you don't need any coding knowledge to create beautiful visualizations in Tableau. And, unlike some other BI tools, it's good at handling large volumes of data.

DATA VISUALIZATION BEST PRACTICES
Data visualization truly is an art form, but the goal is always, first and foremost, to provide valuable information and insights. If you can do this by way of beautiful visualizations, you're onto a winner. So, when creating data visualizations, it's important to adhere to certain best practices. These will help you strike the right balance, keeping your audience engaged and informed.
1. Define a clear purpose
2. Know your audience
3. Keep it simple
4. Avoid distorting the data
5. Ensure your visualizations are inclusive

DEFINE A CLEAR PURPOSE
Like any data analytics project, it's important to define a clear purpose for your data visualizations. What are the priorities in terms of what you want to convey and communicate? What should your audience take away from your visualization?
It's essential to have this defined from the outset; that way, you can ensure that you're only presenting the most valuable information and giving your audience something they can use and act upon.

KNOW YOUR AUDIENCE
The purpose of data visualization is to communicate insights to a specific audience, so you'll want to give some thought to who your audience is and how familiar they are with the information you're presenting. What kind of context can you provide around your visualizations in order to help your audience understand them? What types of visualization are likely to be most accessible to this particular group of people? Keep your audience in mind at all times.

KEEP IT SIMPLE
When creating visualizations, it's often the case that less is more. Ultimately, you want your visualizations to be as digestible as possible, and that means trimming away any unnecessary information while presenting key insights clearly and succinctly. The goal is to keep cognitive load (the amount of "brainpower" or mental effort it takes to process information) to a minimum. Even if the data is complex, your visualizations don't have to be, so strive for simplicity at all times.

AVOID DISTORTING THE DATA
You should strive to present your findings as accurately as possible, so avoid any kind of visual "tricks" that could bias how your data is perceived and interpreted. Think about the labels you use, as well as how you scale your visualizations. For example, things like "blowing up" certain data segments to make them appear more significant, or starting your graph axis on a number other than zero, are both bad practices which could mislead your audience. Prioritize integrity and accuracy!

ENSURE YOUR VISUALIZATIONS ARE INCLUSIVE
Last but by no means least, make sure that your visualizations are accessible and inclusive.
Think about how colors, contrasts, font sizes, and the use of white space affect the readability of your visualization. Is it easy for your users to distinguish between the data and see what's going on, regardless of whether they have twenty-twenty vision or a visual impairment? Inclusivity and accessibility are central to good data visualization, so don't overlook this step.

REFERENCES
1. Stevens, E. (2023, August 31). What is data visualization? A complete introductory guide. CareerFoundry. https://careerfoundry.com/en/blog/data-analytics/what-is-data-visualization/
2. IBM. (2024, November 1). Data Visualization. IBM. https://www.ibm.com/topics/data-visualization
3. GeeksforGeeks. (2024, June 11). What is Data Visualization and Why is It Important? GeeksforGeeks. https://www.geeksforgeeks.org/data-visualization-and-its-importance/
4. Tableau. (2024). What Is Data Visualization? Definition, Examples, And Learning Resources. Tableau. https://www.tableau.com/visualization/what-is-data-visualization#:~:text=Data%20visualization%20is%20the%20graphical,outliers%2C%20and%20patterns%20in%20data.