Visualization Techniques PDF
Document Details
Uploaded by SuitableInterstellar
Eindhoven University of Technology
Summary
This document provides an overview of visualization techniques. It covers the types of visualization goals and the nested model, and touches upon data abstraction and visual encoding/interaction idioms. It explains how to choose appropriate visual representations and discusses considerations related to visual communication. It also discusses data representations such as trees and their visualization. Finally, it touches upon colour and other visual channels for data presentation.
Full Transcript
Lecture 1
Visualization is typically used for data exploration and for making the unseen visible.

Visualization pipeline
Three types of goals:
1. To Explore = nothing is known; visualization is used for data exploration.
2. To Analyse = there are hypotheses; visualization is used for verification or falsification.
3. To Present = everything is known about the data; visualization is used to communicate results.

Nested model
A way to go through the design process. It is an iterative refinement process.
1. Domain situation
   a. Understand the users (needs, wants, limitations, skills), the data and the tasks. How to provide actionable knowledge. How to satisfy the users.
   b. Domain-specific vocabulary.
   c. Produce a set of tasks/questions that the target users have about the data.
2. Data/task abstraction
   a. Data described in generic terms: table, hierarchy, sets.
   b. Tasks described in generic terms: search, compare, see trend.
3. Visual encoding/interaction idiom
   a. Design space: select visual encodings, define interactions.
4. Algorithm
   a. Layout algorithm, ordering and rendering.

Dangers at each level
Sketching is fast, imprecise and conceptual, whereas computer tools are slow and precise.

Data abstraction
What
Data types: 1. Items, 2. Attributes, 3. Links, 4. Positions, 5. Grids.
Tables = 1, 2. Networks = 1, 3 and 2. Trees are a specific type of network. Geometry (spatial) = 1, 2, 4. Fields = 1, 2, 5, 4.
A table has items and attributes. A field is an image, i.e. a sampling from a continuous space; the position of each cell has a specific meaning. Tabular data = data that is displayed in columns or tables.

Why
Categorical data has no implicit order. We have:
Quantitative data = describes a measurable physical dimension.
Ordinal data = categorical variables with an implied order.
Nominal data = describes categories without ordering.
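The three attribute types above can be told apart with a simple rule of thumb. The sketch below is only an illustration: the names `AttributeType` and `classify_attribute` and the `ordered_levels` parameter are assumptions made for this example and do not come from the lecture.

```python
from enum import Enum


class AttributeType(Enum):
    NOMINAL = "nominal"            # categories without ordering
    ORDINAL = "ordinal"            # categories with an implied order
    QUANTITATIVE = "quantitative"  # measurable values


def classify_attribute(values, ordered_levels=None):
    """Guess the attribute type of a column (hypothetical helper).

    Numbers are treated as quantitative. Non-numeric values are ordinal
    only if the caller supplies the order of the levels; otherwise they
    are treated as nominal.
    """
    def is_number(v):
        # bool is a subclass of int in Python, so exclude it explicitly
        return isinstance(v, (int, float)) and not isinstance(v, bool)

    if all(is_number(v) for v in values):
        return AttributeType.QUANTITATIVE
    if ordered_levels is not None and set(values) <= set(ordered_levels):
        return AttributeType.ORDINAL
    return AttributeType.NOMINAL


heights = classify_attribute([1.62, 1.80, 1.75])
sizes = classify_attribute(["S", "L", "M"], ordered_levels=["S", "M", "L"])
fruits = classify_attribute(["apple", "pear", "apple"])
```

The order of an ordinal attribute cannot be read from the values themselves, which is why the sketch asks the caller for it.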
The Present goal focuses on conveying specific, predefined insights or conclusions to the audience, whereas the Analyse and Explore goals emphasize discovering new patterns or insights in the data.

Lecture 2
Marks
Points, lines (1D), areas (2D) and 3D marks. Links: containment and connection.
Visual channels: control the appearance of the mark.
Expressiveness principle: show all of, but only, what is in the data, and match the channel/mark to the data characteristics.
Effectiveness principle: encode the most important attributes with the highest-ranked channels.

Ranking
The ranking is based on accuracy, discriminability, separability and popout. Accuracy is how accurately the representation conveys the value of the data. We are worst at judging values encoded by angle or area, and better at values encoded by length and position. Accuracy: power law.
Colour hue or shape alone: pre-attentive. The attentional system is not invoked and search speed is independent of distractor count (parallel processing).
Combined hue and shape: not pre-attentive. Requires attention, and search time grows linearly with distractor count (serial search).

Colour
Light is the range of electromagnetic waves we are able to see. Each colour has a different wavelength. Rods are achromatic: they do not register colour, and they allow night vision. Cones give colour to our images: chromatic perception (S, M, L). There are 6-7 million in our retina. Cone spectral sensitivity: short, medium and long wavelength responses, which allow us to see colours. We can see 10 million colours. Colour blindness is a deficiency in the functioning of the cones. Red-green colour blindness: 5-8% of men and 0.5% of women. Blue-yellow colour blindness: 1% of males and females. Colour has 3 dimensions since we have 3 types of cones. We show colour on our computers through a screen in RGB space (3 lamps); for printers it is CMY.
Hue: the colour wheel. Saturation: how much grey. Lightness/Value: how much light/brightness.
CIE 1931 XYZ colour space: represents all colours visible to humans.
It is built through experimentation and is device independent. Green is slightly better distinguishable; blue is very weak. Humans are very sensitive to luminance. Colourmaps for ordered attributes are often hue-based rainbows. If detail is important, the rainbow is a poor choice and luminance works better, because the rainbow can lead to the wrong conclusion. The higher the luminance, the greater the brightness. It is a non-linear perception. Humans can differentiate 100 grey levels.

Weber's law
Perceptual systems mostly operate with relative judgements, not absolute ones. The ratio of the smallest perceivable change in stimulus ($\Delta S$) to the background stimulus ($S$) is constant ($K$):
$$\frac{\Delta S}{S} = K$$

Lecture 3
Gestalt principles
How human perception groups elements, sees patterns and simplifies information. How humans create a whole from parts. Commonly used in design.
1. Proximity: we group elements that are close to each other.
2. Similarity: we group elements that are similar to each other.
3. Common region: we group elements that are in the same enclosed region.
4. Good figure: objects grouped together tend to be perceived as a single figure.
5. Closure (rectification): we complete missing parts. Example: a walking dog drawn with missing parts.
6. Continuity: we tend to form and group continuous lines from pieces.
7. Figure vs ground: what we see depends on whether we perceive something as the figure or as the background.

Tufte's principles
Graphical integrity
  o Missing scales
  o Scale distortion
Maximize the data-ink ratio.
  o $\text{data-ink ratio} = \dfrac{\text{data ink}}{\text{total ink used in the graphic}}$
  o Avoid chart junk

Dangers of depth
We do not really see 3D; we see 2.05D.
We see 2D projections, which the brain combines into depth. Motion parallax: depth from viewpoint movement. Occlusion is not resolved.

Difficulties of 3D
Occlusion: the challenge of detecting and tracking objects when they are hidden or obstructed by other objects in images.
Interaction complexity
Difficult text legibility
Perspective distortion
  o Interferes with all size channel encodings.
  o When something is further away, we do not know whether it looks smaller because of its value or because of its distance.

Justification
3D is a legitimate choice for spatial data, to understand shape. 3D needs very careful justification for non-spatial data.

Resolution beats immersion
Resolution is very important. Virtual reality (immersion) for non-spatial data is very difficult to justify.

Eyes beat memory
Long-term memory: unlimited.
Short-term memory
  o Working memory: it has a limit, and reaching the limit causes cognitive load. Most people can remember five to nine words.
Attention: very limited for conscious visual search tasks.

Animation vs side by side
Side-by-side views are easy to compare by moving the eyes. With animation it is hard to compare visible items with the memory of what you saw, because you need to remember what you saw. Animation is good for telling a story about the data by following a choreography. It is also good for transitioning between two data sets. If animation is used, the user must have control over it. Also, change blindness becomes a problem if there are many changes at the same time, since it is difficult to pay attention to everything.

Lecture 4
Visualization idioms
Focus on visualization of tabular data. Idioms put restrictions on tasks. Idioms are described with:
Data -> number of categorical attributes, number of quantitative attributes, semantics of keys and values.
Mark -> which visual elements are used? Points, lines.
Channels -> how is the data encoded? Arrangement and mapping.
Tasks -> what are the supported tasks? Discover trends, outliers and distribution.
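The four-part description above (data, mark, channels, tasks) maps naturally onto a small record type. The sketch below is one possible way to write it down; the `Idiom` class and the `supports` helper are hypothetical names for this illustration, and the example values are the bar chart entries from these notes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Idiom:
    """Describes a visualization idiom by data, mark, channels and tasks."""
    name: str
    data: str        # attribute counts and key/value semantics
    mark: str        # visual element used
    channels: tuple  # how the data is encoded
    tasks: tuple     # tasks the idiom supports


BAR_CHART = Idiom(
    name="bar chart",
    data="1 categorical attribute (key), 1 quantitative attribute (value)",
    mark="line",
    channels=("length", "spatial region per mark"),
    tasks=("compare", "look up values"),
)


def supports(idiom, task):
    """Return True if the idiom lists the task among its supported tasks."""
    return task in idiom.tasks


can_compare = supports(BAR_CHART, "compare")
can_show_trend = supports(BAR_CHART, "find trends")
```

Writing idioms down this way makes the statement that idioms put restrictions on tasks concrete: a task that is not listed is simply not supported by that idiom.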
Bar chart
Data: 1 categorical attribute (key), 1 quantitative attribute (value)
Mark: lines
Channels: length to convey the quantitative value. Spatial regions: one per mark.
Tasks: compare, look up values
Scalability: hundreds of levels for key attributes.

Stacked bar chart
Data: 2 categorical attributes (keys), 1 quantitative attribute (value)
Mark: stack of line marks
Channels: length and colour hue. Spatial regions: one per mark. Aligned: first bar. Unaligned: other bars (this is the downside).
Tasks: compare, look up values, part-to-whole relationship
Scalability: hundreds of levels for key attributes.

Line chart
Data: 2 quantitative attributes: one key, one value
Mark: points, with line connection marks between them
Channels: aligned lengths to express the quantitative value. Separated and ordered by key attribute into horizontal regions.
Tasks: find trends. Connection marks emphasize the ordering of items along the key axis to show the relationship.
Scalability: hundreds of thousands.

Use bar charts for categorical key attributes and line charts if the key attribute is ordered. Do not use line charts for categorical key attributes: it violates the expressiveness principle.

Streamgraph
Data: 1 categorical attribute (names), 1 ordered key attribute (time), 1 quantitative value attribute (counts)
Marks & channels: derived geometry: layers, height encodes counts.
Tasks: find trends, part-to-whole relationship
Scalability: hundreds of time keys.
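The "derived geometry" of a streamgraph can be sketched as follows. The notes do not say which baseline is used, so this example assumes a symmetric baseline (the stack is centred on zero at every time key); the function name `streamgraph_layers` and the input layout are also assumptions made for this illustration.

```python
def streamgraph_layers(counts):
    """Compute lower/upper boundaries for each layer of a streamgraph.

    counts: one list per category, each holding a non-negative count
            per time key (all lists have the same length).
    Returns one (lower, upper) pair of boundary lists per category.
    The height upper[t] - lower[t] encodes the count at time key t.
    """
    n_times = len(counts[0])
    totals = [sum(layer[t] for layer in counts) for t in range(n_times)]
    # Assumed symmetric baseline: centre the whole stack around zero.
    lower = [-total / 2 for total in totals]
    layers = []
    for layer in counts:
        upper = [lo + value for lo, value in zip(lower, layer)]
        layers.append((lower, upper))
        lower = upper  # the next layer sits on top of this one
    return layers


# Two categories observed at two time keys.
layers = streamgraph_layers([[2, 4], [2, 0]])
```

Because only the first layer touches the baseline, every other layer is unaligned, which is the same downside noted for stacked bar charts.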
Heatmap
Data: 1 quantitative attribute; two keys, one value
Mark: separate and align in a 2D matrix, indexed by 2 attributes
Channels: colour by quantitative attribute
Tasks: find clusters, outliers, patterns

Pie chart and polar area chart
Data: 1 categorical attribute, 1 quantitative attribute; one key, one value
Mark: separate coloured areas
Channels: colour by categorical attribute, angle for quantitative attribute
Tasks: part-to-whole judgment
Scalability: one dozen

Histogram
Data: 1 quantitative attribute
Derived data: keys are bins, values are counts. Bin size is crucial.
Mark: line
Channels: length encodes frequency
Tasks: understand distribution

Boxplot
Data: 1 quantitative attribute
Derived data: 5 quantitative attributes (median, min, max, lower and upper quartile). Outliers are shown explicitly.
Mark: lines (+ box)
Channels: length encodes derived values
Tasks: understand distribution

Violin plot
Data: 1 quantitative attribute
Derived data: 5 quantitative attributes (as in the boxplot), plus the density at each point
Mark: lines (+ box)
Channels: length encodes derived values, width encodes frequency
Tasks: understand distribution

Boxplot vs violin plot
The violin plot shows the density at each value and therefore much more detail, which helps to better understand the distribution.

We need interaction because too much data is shown in one view, different audiences have different questions, and it increases engagement. There are many modalities of interaction. Provide low-latency visual feedback. Use a progress bar if there is no response within 10 seconds.

Interaction techniques
Change over time: an advantage of digital over paper. Within the encoding many things can be changed, such as the sorting order. Consider animation.
Select: the basic operation for all interaction. Design choice: click vs hover. Change the visual encoding of the items of interest, such as the colour or border.
Focus+Context visualization
Different levels of detail integrated in the same view. Show the area/items of interest (focus) in detail and the surroundings (context) in less detail. Distortion techniques.

Lecture 5
Multivariate idioms
Scatter plots
Data: 2 quantitative attributes
Mark: points
Channels: horizontal + vertical position
Tasks: find trends, outliers, distribution and correlation
Scalability: hundreds of items

To extend this to multiple variables you can use the scatterplot matrix. Brushing should be used. Colour the points to differentiate the categories, and reduce the point size to be able to see them better.
Another technique is the parallel coordinate plot (PCP). Add as many parallel axes as there are variables. The axes are scaled in the same way, based on the maximum and minimum of the variables.
The radar plot is a variant of the PCP in which the axes are arranged like a polygon. The advantage is that every data point becomes a shape.

Data reduction
If the size does not stop us, the complexity will. Therefore we need to reduce the data.
Filter - items: we eliminate items based on their values with respect to specific attributes. The number of attributes does not change. There is a tight loop between the encoding and the interaction: we need to show the result immediately.
Filter - attributes: we eliminate attributes; the number of items does not change.
Aggregation - items: merge groups of similar items. Represent many data items with a single mark. This can be done through clustering, which means defining groups of similar items.
Aggregation - attributes: summarize attributes; the number of items does not change. You can establish a similarity measure or use dimensionality reduction.

Lecture 6
Maps: used for understanding spatial relationships. The advantage is that they are familiar to people, so they know where something is. Maps act as an index from spatial (map) to semantic (data) information and vice versa.
Choropleth map
Data: 1 quantitative attribute (table with 1 quantitative attribute per region), geographic geometry
Mark: geometric area
Channels: colour
Tasks: understand spatial relationships
The area of a region on the map does not reflect its value, so be careful when colouring regions.

Cartogram: size represents quantity, at the cost of familiarity due to distortion.
Dot map: represents quantitative data with dots instead of regions.
Density map: needs a data transformation to turn discrete data into continuous data, using density estimation. Binning can be used to represent density.
Topographic map: field data with a grid and quantitative points.
Data: scalar spatial field (1 quantitative attribute per grid cell), geographic geometry
Derived data: isoline geometry
Mark: lines
Channels: shape, position, colour
Tasks: understand spatial relationships

Caution with maps
Absolute vs relative. When populations are low, the variation tends to be high.

Network
Graph = network; vertices = nodes; edges = links. It can be directed or undirected.
Tree: a graph without cycles and with one root. Every other node has exactly one parent. $E = V - 1$.
In a node-link diagram the nodes can be positioned randomly or on a grid.
Radial: nodes on a circle with links between them. Hides a lot of information.
It is easier to put the nodes on a line and use arcs; cycles are clearly represented. This is easy to understand for non-experts.
The positioning of the nodes is called the layout or embedding. Computing layouts is graph drawing.
Force-directed algorithms: based on mechanical laws. Model edges as springs; nodes also repel each other. Numerically simulate until a stable state is reached. Termination: a fixed number of iterations, total energy below a threshold, a local minimum, or user input.

Large networks
Network + hierarchy gives a compound network.
The adjacency matrix is more scalable. You can use colour to represent weights. It maximizes edge visibility: there are no crossing edges. It is more difficult to follow a path.
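To make the adjacency matrix concrete, the sketch below builds one from a weighted edge list. The function name `adjacency_matrix` and the `(source, target, weight)` edge format are assumptions for this illustration; the cell values are what a matrix view would map to colour.

```python
def adjacency_matrix(nodes, edges, directed=False):
    """Build a weighted adjacency matrix as a list of rows.

    nodes: node names; their order fixes the row and column order.
    edges: (source, target, weight) tuples.
    A cell holds the edge weight, or 0 if there is no edge.
    """
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for source, target, weight in edges:
        i, j = index[source], index[target]
        matrix[i][j] = weight
        if not directed:
            matrix[j][i] = weight  # undirected edges are symmetric
    return matrix


edges = [("a", "b", 1), ("b", "c", 2)]
m_abc = adjacency_matrix(["a", "b", "c"], edges)
m_bac = adjacency_matrix(["b", "a", "c"], edges)
```

Passing the same edges with a different node order yields a different-looking matrix for the same network, so the row and column order strongly affects which patterns are visible.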
The sorting of the nodes is very important. We can compute

Trees
Static trees use a node-link diagram.
Hyperbolic layouts encode the tree within a circle using hyperbolic geometry.
Enclosure: one big square whose sub-squares are the children.
Space-filling technique: recursively go through the tree and divide the squares into smaller squares. The problem is that long, thin rectangles can occur; the squarified layout generates nicer rectangles, but we lose some of the order of the children. A problem also arises if the tree changes over time.
Icicle plot and sunburst diagram: no overlap between parent and child, attributes are easier to display, and they are as dense as treemaps.

Treemap
Enclosure technique
Scalable: very good usage of available space; shows attributes
Difficulty in distinguishing the hierarchy (implicit)

Node-link diagram
Intuitive
Good at exposing the structure of information
A lot of empty space

Small multiples: with small multiples you have one visualization for each of the time steps. The pros are that it is independent of the visualization method used and that eyes beat memory. The cons are that you need to decide the number of multiples, there is a limit on the number of multiples, and multiples may be far apart, which makes it difficult to spot patterns.

Massive sequence view: a visualization approach designed to help analyse and interpret large amounts of temporal data representing changes in the network's structure over time. This view is particularly useful for understanding evolutionary patterns or events in dynamic systems.

Connected scatterplot: a normal scatterplot with line connection marks; popular in journalism.
Horizontal + vertical axes: value attributes
Line connection marks: temporal order
It is more engaging, but the correlation is not clear.

Gantt chart
Used to see whether there are any overlaps in time or any dependencies between items.
Data: 1 categorical attribute, 2 quantitative attributes; one key, two (related) values
Mark: line; length encodes duration
Channels: horizontal position: start/end times; horizontal length: duration
Tasks: emphasize temporal overlaps, start/end dependencies between items
Scalability: dozens of key levels, hundreds of value levels

Lecture 7
Validation
Algorithm validation: Downstream: analyse computational complexity. Upstream: experimental results.
Visual encodings: Downstream: justification of choices, an important validation step that derives directly from the task abstraction. Upstream: the ICE-T method = evaluation of the visualization. The value equation consists of insight, time, essence and confidence.
Lab study: we move from a formal to an informal setting. There is a hypothesis.
Data/task abstraction validation: Downstream: different strategies.
Domain validation: Downstream: observe target users. Upstream: measure adoption.