Data Visualization IT1164 SEM 2 PDF
Document Details
Isabel
Summary
This document details learning outcomes for data visualization. It goes over defining the process, the cycle of visualization, and determining the goal. It also introduces the concept of why data visualization is important.
Full Transcript
Data Visualisation IT1164 SEM 2
Isabel – 2024 – physki on tele

W1: Intro to data viz

Learning outcomes
A. Define data vis and process
B. Describe the seven visual variables used in mapping data
C. Identify good visual rep

The cycle of visualisation
Determine what you're trying to measure and why - ask questions, figure out motives
Get the data that answers those questions - gather insights
Develop your data visualisation design - choose visual structure and view your data
Further develop insights about your business - do analysis and conclude what you have found
Publish the results for others to view and conduct investigation to find the underlying reasons for the data trends you've identified

Determining goal
Identifying the key focus or objective for analysis is the first step before any visualisation can be done
An analysis can be used to understand the problem, to propose a solution, or both

3 steps in identifying key focus:
1. Identify and select data fields that are ‘usable’
2. Formulate possible ways to analyse the data using the selected data fields
3. Identify one or two key focuses of the analysis - key focus should be concise, specific and measurable

What is visualisation
DEF. Graphical display of abstract information for data analysis and communication
Purpose is to discover and understand patterns in our data and present them visually to others

Defining data visualisation
Understand the data that you want to visualise - data size and the data preparation effort that will be required
Determine what you are trying to visualise and what information you want to communicate
Know your audience and understand how it processes visual information
Use a visual that conveys the information in the best and simplest form for your audience

Why data visualisation

Importance
The way the human brain processes information - visual vs text
An easier and quicker way to convey abstract concepts

Through data visualisation you can easily
- Visualise data
- Classify and categorise
- Find relationships
- Understand composition, distribution and overlapping
- Determine patterns and trends
- Detect outliers and anomalies
- Predict trends
- Make it engaging and meaningful for decision makers

Steps and process

Visualisation process
Extract raw data directly from the system without processing it - e.g. the sales department gets sales data
Raw data is converted to data tables - Excel format etc. allows you to analyse the data
Identify which visual structure is most useful in presenting your data - pie charts, bar graphs
Create the visualisation so that it has multiple views
Before you create, you must think of the tasks/stories that shape your data - ask questions to find a solution
- Data preprocessing
- Visual mapping
- View transformation
[these involve user interaction, collecting feedback]

Data preprocessing
Data is mapped to the fundamental data types
Specific application data issues - missing values, errors in input, large data
- Removal of missing data, interpolation [filling in with the average]
- Using different methods to extract relevant data - CSV, JSON, XML
- Large data may require sampling, filtering, aggregation
- Objective - clean data will lead to meaningful visualisation

Visual mapping
Determining which visual representation to use
A pie chart doesn't work as well as a bar graph because we need more effort to decode it, whereas with the bar graph the comparison is easier

View transformation
Mapping of the visual to the final representation [dashboard, report]
Measure by expressiveness and effectiveness
Expressiveness - an expressive visualisation presents all the information, and only the information
Effectiveness - a visualisation is effective when it can be interpreted accurately and quickly

Seven key visual variables

Good visual representation

Understanding good visual rep
A successful visualisation is one that efficiently and accurately conveys the desired information to the target audience
Suitable mapping from data to visualisation
Ability to select and modify view
Sufficient information density - not too much or too little
Importance of keys, labels and legends
Using colour with care
Importance of aesthetics

Using colour to distinguish data
Colour used well can enhance and clarify a presentation
Used poorly, it will obscure, muddle and confuse

Colours for data types
- Form: grey scale
- Qualitative: full spectral scale
- Quantitative and ordinal: single hue sequence, single hue scale
- Quantitative [diverging]: double-ended multiple hue scale
Hue circle - full spectral can use opposing
Don't use more than 8 colours

Suitable mapping data to visualisation
- Mapping spatial data [longitude and latitude] to position on a map
- Mapping based on context - temperature to colour, blood pressure to height
Important consideration:
- Compatibility between scale of data field and attribute
- EX. ordered data attributes [age] should not be represented by unordered attributes like shape

W2: Intro to data viz (II)

Learning outcomes
A. Explain the importance of human perception
B. Discuss how human perception influences visualisation design

How to make data visualisation effective

Pictures for the eyes and mind
Visualisation is only successful when it encodes information in a manner that our eyes can discern and our brains can understand
Getting this right is much more a science than an art, which we can only achieve by studying human perception
The goal is to translate abstract information into visual representations that can be easily, effectively, accurately and meaningfully decoded

Human perception
Perception is the process of recognising, organising and interpreting sensory information
It deals with the human senses that generate signals from the environment through sight, hearing, touch, smell and taste
We only have about 2 degrees of sharp focus to give us details

3 phases in perception process
Sensing - deciding what stimuli to pay attention to [subject to selective perception]
- Parallel
- Shapes, colour, spatial, movement
- Pre-attentive
Organising - how to arrange information in our minds
- Serial
- Pattern recognition
- Gestalt principle
Reacting - responding to stimuli; experiences will then feed back and influence future sensing
- Goal-directed processing
- Objects held in visual memory
- Attention-driven

What is pre-attentive processing?
Our low-level visual system can detect a limited set of visual properties very rapidly and accurately
These properties are called pre-attentive attributes; we can process and understand them almost unconsciously, before the information is sent to the attention-processing parts of our brain
These are generally the best ways to present data, because we can see these patterns without thinking too hard
It takes less than 500 milliseconds for the eye and the brain to process a pre-attentive property of any image

Pre-attentive visual attributes
Colour is pre-attentively processed
Shape is pre-attentively processed
Conjunction of 2 properties is usually not pre-attentive

Gestalt Principle
It describes how people tend to organise visual elements into groups or unified wholes when certain principles are applied

Law of proximity
We perceive objects that are located near one another as belonging to the same group

Law of similarity
We tend to group together objects that are similar in colour, shape and orientation

Law of continuation
We perceive objects as belonging together, as part of a single whole, if they are aligned with one another or appear to form a continuation of one another

Law of closure
We perceive open structures as closed, complete and regular whenever there is a way that we can reasonably do so

W3: data viz

Learning outcomes
A. Illustrate the use of various charts for visualising different types of data

Types of data
Qualitative:
- Nominal
- Ordinal
Quantitative:
- Discrete
- Continuous

Nominal
Nominal scales are used for labelling variables, without any quantitative value; also known as “labels”
The scales are mutually exclusive (no overlap) and none of them have any numerical significance
A way to remember is that nominal sounds like name, and nominal scales are like names or labels

Ordinal data
The order of the values is important and significant, but the difference between each one is not really known
Ordinal scales are typically measures of non-numeric concepts like satisfaction, happiness, discomfort, etc
Ordinal is easy to remember as it sounds like order

Discrete data
It means distinct or separate, i.e. data that relies on counts
It contains only finite values, i.e. values that can only be counted in whole numbers or integers and cannot be broken down into fractions or decimals
Example: number of students in the school, number of cars in the parking lot, number of computers in a com lab

Continuous data
It is an unbroken set of observations that can be measured on a scale
It can take any numeric value within a finite or infinite range of possible values; continuous data can be broken down into fractions and decimals, i.e. it can be meaningfully subdivided into smaller parts according to the measurement precision
Example: age, height or weight of a person, temperature, time, money

Commonly-used chart types
Vertical bar chart
Horizontal bar chart
Histogram
Line chart
Pie chart
Scatter plot
Bubble plot
Word cloud
Area chart

Types of charts
1. Vertical bar charts
Best for comparing averages or percentages between 2 to 7 different groups
X-axis - mainly for mutually exclusive categories (multiple choice, or check box questions)
2. Horizontal bar charts
Best for comparing averages or percentages of 8 or more different groups
X-axis mainly for mutually exclusive categories
3. Histogram
To illustrate sample distribution on dimensions measured with discrete intervals
X-axis categories that are based on a continuous scale
4. Pie charts
Best used to illustrate the proportion within groups based on one variable
Pie charts should only be used with a group of categories that combine to make up a whole
5. Line charts
Illustrate trends over time
To compare two variables over time
6. Scatter plots
2D plot showing the joint variation of two data items
Scatter plots are useful for examining the relationship, or correlations, between x and y variables
7. Bubble plots
A bubble plot displays the relationship amongst at least three measures
Two measures are represented by the plot axes. The third measure is represented by the size of the bubble
8. Word clouds
Used on unstructured data as a way to display high- or low-frequency words
9. Area chart
To show the magnitude of change between two or more data points
E.g. a visual feel for the degree of variance between the high and low price of each month

W4: data pre-processing

Learning outcomes
A. Identify and classify the different types of data

More on types of data
Most people prefer ordinal as it has the best position, length, size, shape and colour
Nominal is second best with its position, shape and colour
Quantitative is third with position and length
Orientate data so people can read it easily

What is good data
Cleaned and well formatted data are rare
If you happen to have them without much work, yay!

Data cleaning/ pre-processing
Before we attempt to visualise the data, data cleaning and pre-processing have to be done
This is often the most time-consuming part of making sense of data and could take up 40 to 80% of the effort and time

Why data handling?
Without any interesting data the visualisation is useless
Data can be stored in subsets or various formats, so conversion or extraction is needed before visualisation can take place
Data may need to be cleaned
- Any missing values?
- Should we remove outliers?

What is dirty data
Dirty data is data that contains some kind of error, or is in a format that is unfriendly or unusable
You will need to spend time cleaning dirty data to make sure you get a correct answer during your analysis

Examples of dirty data
Data that is not parsed correctly
Extra characters in data fields that need to be removed before any analytics can be run
Duplicate data records, which can happen because of multiple data entries for the same information

Ensuring data is parsed correctly
Parsing means dividing data into parts based on some kind of delimiter/separator
E.g. in a comma separated values (csv) file, the data is delimited/separated by commas
Regardless of what delimiter is used, it needs to be a character that's not usually found in the data, so that when it's used to parse fields it will not pull fields of data apart unexpectedly

Fixing data with extra characters
Extra characters in fields could be currency symbols, number signs etc
You need to remove these extra characters before analytics can be run
- E.g. you might need to convert strings to number types so you can do arithmetic

What can cause duplicate data
Manual mistake where a record may have been entered twice
Program error where some kind of data was submitted twice
Removing duplicate data is sometimes called de-duping
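
The cleaning steps covered in these notes (parsing on a delimiter, removing extra characters, de-duping) could be sketched in Python roughly as below. This is only a minimal sketch under assumptions: the sample records, the field names (order_id, customer, amount) and the helper names parse_rows, to_number and dedupe are all made up for illustration and are not from the module. A semicolon is assumed as the delimiter because the amounts themselves contain commas.

```python
import csv
import io

# Hypothetical raw export; field names and values are invented for illustration.
# A semicolon is used as the delimiter because the amounts contain commas,
# so a comma delimiter would pull the amount field apart.
RAW = """order_id;customer;amount
1001;Alice;$1,200.50
1002;Bob;$75.00
1001;Alice;$1,200.50
1003;Chen;#310
"""


def parse_rows(text, delimiter=";"):
    """Split the text into records (one dict per row) on the given delimiter."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def to_number(value):
    """Drop currency symbols, number signs and thousands separators, then convert."""
    cleaned = "".join(ch for ch in value if ch.isdigit() or ch in ".-")
    return float(cleaned)


def dedupe(rows):
    """Keep the first copy of each record, preserving the original order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


rows = parse_rows(RAW)
for row in rows:
    # Strings become numbers so that arithmetic is possible.
    row["amount"] = to_number(row["amount"])

clean_rows = dedupe(rows)
total = sum(row["amount"] for row in clean_rows)
print(len(rows), len(clean_rows), total)  # prints: 4 3 1585.5
```

In this made-up sample the repeated record for order 1001 is dropped before totalling, so it is not counted twice.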