Visualization Analysis and Design (2014), Tamara Munzner

An A K Peters Book
A K Peters Visualization Series

“A must read for researchers, sophisticated practitioners, and graduate students.”
—Jim Foley, College of Computing, Georgia Institute of Technology; Author of Computer Graphics: Principles and Practice

“Munzner elegantly synthesizes an astounding amount of cutting-edge work on visualization into a clear, engaging, and comprehensive textbook that will prove indispensable to students, designers, and researchers.”
—Steven Franconeri, Department of Psychology, Northwestern University

“Munzner’s new book is thorough and beautiful. It belongs on the shelf of anyone touched and enriched by visualization.”
—Chris Johnson, Scientific Computing and Imaging Institute, University of Utah

“Munzner shares her deep insights in visualization with us in this excellent textbook, equally useful for students and experts in the field.”
—Jarke van Wijk, Department of Mathematics and Computer Science, Eindhoven University of Technology

“This is the visualization textbook I have long awaited. It emphasizes abstraction, design principles, and the importance of evaluation and interactivity.”
—Jim Hollan, Department of Cognitive Science, University of California, San Diego

“The book shapes the field of visualization in an unprecedented way.”
—Wolfgang Aigner, Institute for Creative Media Technologies, St. Pölten University of Applied Sciences

“Munzner is one of the world’s very top researchers in information visualization, and this meticulously crafted volume is probably the most thoughtful and deep synthesis the field has yet seen.”
—Michael McGuffin, Department of Software and IT Engineering, École de Technologie Supérieure

“This book provides the most comprehensive coverage of the fundamentals of visualization design that I have found. It is a much-needed and long-awaited resource for both teachers and practitioners of visualization.”
—Kwan-Liu Ma, Department of Computer Science, University of California, Davis

This book’s unified approach encompasses information visualization techniques for abstract data, scientific visualization techniques for spatial data, and visual analytics techniques for interweaving data transformation and analysis with interactive visual exploration. Suitable for both beginners and more experienced designers, the book does not assume any experience with programming, mathematics, human–computer interaction, or graphic design.

Visualization Analysis & Design
Tamara Munzner
Department of Computer Science, University of British Columbia
Illustrations by Eamonn Maguire

A K Peters Visualization Series
Series Editor: Tamara Munzner

CRC Press, Boca Raton, London, New York
CRC Press is an imprint of the Taylor & Francis Group, an Informa business.

Cover art: Genesis 6-3-00, by Aribert Munzner. Casein on paperboard, 26” × 20”, 2000. http://www.aribertmunzner.com

For reuse of the diagram figures released under the CC-BY-4.0 license, written permission from the publishers is not required.
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2015 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works
Version Date: 20140909
International Standard Book Number-13: 978-1-4665-0893-4 (eBook - PDF)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com

Contents

Preface
  Why a New Book?
  Existing Books
  Audience
  Who’s Who
  Structure: What’s in This Book
  What’s Not in This Book
  Acknowledgments
1 What’s Vis, and Why Do It?
  1.1 The Big Picture
  1.2 Why Have a Human in the Loop?
  1.3 Why Have a Computer in the Loop?
  1.4 Why Use an External Representation?
  1.5 Why Depend on Vision?
  1.6 Why Show the Data in Detail?
  1.7 Why Use Interactivity?
  1.8 Why Is the Vis Idiom Design Space Huge?
  1.9 Why Focus on Tasks?
  1.10 Why Focus on Effectiveness?
  1.11 Why Are Most Designs Ineffective?
  1.12 Why Is Validation Difficult?
  1.13 Why Are There Resource Limitations?
  1.14 Why Analyze?
  1.15 Further Reading
2 What: Data Abstraction
  2.1 The Big Picture
  2.2 Why Do Data Semantics and Types Matter?
  2.3 Data Types
  2.4 Dataset Types
    2.4.1 Tables
    2.4.2 Networks and Trees
      2.4.2.1 Trees
    2.4.3 Fields
      2.4.3.1 Spatial Fields
      2.4.3.2 Grid Types
    2.4.4 Geometry
    2.4.5 Other Combinations
    2.4.6 Dataset Availability
  2.5 Attribute Types
    2.5.1 Categorical
    2.5.2 Ordered: Ordinal and Quantitative
      2.5.2.1 Sequential versus Diverging
      2.5.2.2 Cyclic
    2.5.3 Hierarchical Attributes
  2.6 Semantics
    2.6.1 Key versus Value Semantics
      2.6.1.1 Flat Tables
      2.6.1.2 Multidimensional Tables
      2.6.1.3 Fields
      2.6.1.4 Scalar Fields
      2.6.1.5 Vector Fields
      2.6.1.6 Tensor Fields
      2.6.1.7 Field Semantics
    2.6.2 Temporal Semantics
      2.6.2.1 Time-Varying Data
  2.7 Further Reading
3 Why: Task Abstraction
  3.1 The Big Picture
  3.2 Why Analyze Tasks Abstractly?
  3.3 Who: Designer or User
  3.4 Actions
    3.4.1 Analyze
      3.4.1.1 Discover
      3.4.1.2 Present
      3.4.1.3 Enjoy
    3.4.2 Produce
      3.4.2.1 Annotate
      3.4.2.2 Record
      3.4.2.3 Derive
    3.4.3 Search
      3.4.3.1 Lookup
      3.4.3.2 Locate
      3.4.3.3 Browse
      3.4.3.4 Explore
    3.4.4 Query
      3.4.4.1 Identify
      3.4.4.2 Compare
      3.4.4.3 Summarize
  3.5 Targets
  3.6 How: A Preview
  3.7 Analyzing and Deriving: Examples
    3.7.1 Comparing Two Idioms
    3.7.2 Deriving One Attribute
    3.7.3 Deriving Many New Attributes
  3.8 Further Reading
4 Analysis: Four Levels for Validation
  4.1 The Big Picture
  4.2 Why Validate?
  4.3 Four Levels of Design
    4.3.1 Domain Situation
    4.3.2 Task and Data Abstraction
    4.3.3 Visual Encoding and Interaction Idiom
    4.3.4 Algorithm
  4.4 Angles of Attack
  4.5 Threats to Validity
  4.6 Validation Approaches
    4.6.1 Domain Validation
    4.6.2 Abstraction Validation
    4.6.3 Idiom Validation
    4.6.4 Algorithm Validation
    4.6.5 Mismatches
  4.7 Validation Examples
    4.7.1 Genealogical Graphs
    4.7.2 MatrixExplorer
    4.7.3 Flow Maps
    4.7.4 LiveRAC
    4.7.5 LinLog
    4.7.6 Sizing the Horizon
  4.8 Further Reading
5 Marks and Channels
  5.1 The Big Picture
  5.2 Why Marks and Channels?
  5.3 Defining Marks and Channels
    5.3.1 Channel Types
    5.3.2 Mark Types
  5.4 Using Marks and Channels
    5.4.1 Expressiveness and Effectiveness
    5.4.2 Channel Rankings
  5.5 Channel Effectiveness
    5.5.1 Accuracy
    5.5.2 Discriminability
    5.5.3 Separability
    5.5.4 Popout
    5.5.5 Grouping
  5.6 Relative versus Absolute Judgements
  5.7 Further Reading
6 Rules of Thumb
  6.1 The Big Picture
  6.2 Why and When to Follow Rules of Thumb?
  6.3 No Unjustified 3D
    6.3.1 The Power of the Plane
    6.3.2 The Disparity of Depth
    6.3.3 Occlusion Hides Information
    6.3.4 Perspective Distortion Dangers
    6.3.5 Other Depth Cues
    6.3.6 Tilted Text Isn’t Legible
    6.3.7 Benefits of 3D: Shape Perception
    6.3.8 Justification and Alternatives
      Example: Cluster–Calendar Time-Series Vis
      Example: Layer-Oriented Time-Series Vis
    6.3.9 Empirical Evidence
  6.4 No Unjustified 2D
  6.5 Eyes Beat Memory
    6.5.1 Memory and Attention
    6.5.2 Animation versus Side-by-Side Views
    6.5.3 Change Blindness
  6.6 Resolution over Immersion
  6.7 Overview First, Zoom and Filter, Details on Demand
  6.8 Responsiveness Is Required
    6.8.1 Visual Feedback
    6.8.2 Latency and Interaction Design
    6.8.3 Interactivity Costs
  6.9 Get It Right in Black and White
  6.10 Function First, Form Next
  6.11 Further Reading
7 Arrange Tables
  7.1 The Big Picture
  7.2 Why Arrange?
  7.3 Arrange by Keys and Values
  7.4 Express: Quantitative Values
    Example: Scatterplots
  7.5 Separate, Order, and Align: Categorical Regions
    7.5.1 List Alignment: One Key
      Example: Bar Charts
      Example: Stacked Bar Charts
      Example: Streamgraphs
      Example: Dot and Line Charts
    7.5.2 Matrix Alignment: Two Keys
      Example: Cluster Heatmaps
      Example: Scatterplot Matrix
    7.5.3 Volumetric Grid: Three Keys
    7.5.4 Recursive Subdivision: Multiple Keys
  7.6 Spatial Axis Orientation
    7.6.1 Rectilinear Layouts
    7.6.2 Parallel Layouts
      Example: Parallel Coordinates
    7.6.3 Radial Layouts
      Example: Radial Bar Charts
      Example: Pie Charts
  7.7 Spatial Layout Density
    7.7.1 Dense
      Example: Dense Software Overviews
    7.7.2 Space-Filling
  7.8 Further Reading
8 Arrange Spatial Data
  8.1 The Big Picture
  8.2 Why Use Given?
  8.3 Geometry
    8.3.1 Geographic Data
      Example: Choropleth Maps
    8.3.2 Other Derived Geometry
  8.4 Scalar Fields: One Value
    8.4.1 Isocontours
      Example: Topographic Terrain Maps
      Example: Flexible Isosurfaces
    8.4.2 Direct Volume Rendering
      Example: Multidimensional Transfer Functions
  8.5 Vector Fields: Multiple Values
    8.5.1 Flow Glyphs
    8.5.2 Geometric Flow
      Example: Similarity-Clustered Streamlines
    8.5.3 Texture Flow
    8.5.4 Feature Flow
  8.6 Tensor Fields: Many Values
    Example: Ellipsoid Tensor Glyphs
  8.7 Further Reading
9 Arrange Networks and Trees
  9.1 The Big Picture
  9.2 Connection: Link Marks
    Example: Force-Directed Placement
    Example: sfdp
  9.3 Matrix Views
    Example: Adjacency Matrix View
  9.4 Costs and Benefits: Connection versus Matrix
  9.5 Containment: Hierarchy Marks
    Example: Treemaps
    Example: GrouseFlocks
  9.6 Further Reading
10 Map Color and Other Channels
  10.1 The Big Picture
  10.2 Color Theory
    10.2.1 Color Vision
    10.2.2 Color Spaces
    10.2.3 Luminance, Saturation, and Hue
    10.2.4 Transparency
  10.3 Colormaps
    10.3.1 Categorical Colormaps
    10.3.2 Ordered Colormaps
    10.3.3 Bivariate Colormaps
    10.3.4 Colorblind-Safe Colormap Design
  10.4 Other Channels
    10.4.1 Size Channels
    10.4.2 Angle Channel
    10.4.3 Curvature Channel
    10.4.4 Shape Channel
    10.4.5 Motion Channels
    10.4.6 Texture and Stippling
  10.5 Further Reading
11 Manipulate View
  11.1 The Big Picture
  11.2 Why Change?
  11.3 Change View over Time
    Example: LineUp
    Example: Animated Transitions
  11.4 Select Elements
    11.4.1 Selection Design Choices
    11.4.2 Highlighting
      Example: Context-Preserving Visual Links
    11.4.3 Selection Outcomes
  11.5 Navigate: Changing Viewpoint
    11.5.1 Geometric Zooming
    11.5.2 Semantic Zooming
    11.5.3 Constrained Navigation
  11.6 Navigate: Reducing Attributes
    11.6.1 Slice
      Example: HyperSlice
    11.6.2 Cut
    11.6.3 Project
  11.7 Further Reading
12 Facet into Multiple Views
  12.1 The Big Picture
  12.2 Why Facet?
  12.3 Juxtapose and Coordinate Views
    12.3.1 Share Encoding: Same/Different
      Example: Exploratory Data Visualizer (EDV)
    12.3.2 Share Data: All, Subset, None
      Example: Bird’s-Eye Maps
      Example: Multiform Overview–Detail Microarrays
      Example: Cerebral
    12.3.3 Share Navigation: Synchronize
    12.3.4 Combinations
      Example: Improvise
    12.3.5 Juxtapose Views
  12.4 Partition into Views
    12.4.1 Regions, Glyphs, and Views
    12.4.2 List Alignments
    12.4.3 Matrix Alignments
      Example: Trellis
    12.4.4 Recursive Subdivision
  12.5 Superimpose Layers
    12.5.1 Visually Distinguishable Layers
    12.5.2 Static Layers
      Example: Cartographic Layering
      Example: Superimposed Line Charts
      Example: Hierarchical Edge Bundles
    12.5.3 Dynamic Layers
  12.6 Further Reading
13 Reduce Items and Attributes
  13.1 The Big Picture
  13.2 Why Reduce?
  13.3 Filter
    13.3.1 Item Filtering
      Example: FilmFinder
    13.3.2 Attribute Filtering
      Example: DOSFA
  13.4 Aggregate
    13.4.1 Item Aggregation
      Example: Histograms
      Example: Continuous Scatterplots
      Example: Boxplot Charts
      Example: SolarPlot
      Example: Hierarchical Parallel Coordinates
    13.4.2 Spatial Aggregation
      Example: Geographically Weighted Boxplots
    13.4.3 Attribute Aggregation: Dimensionality Reduction
      13.4.3.1 Why and When to Use DR?
        Example: Dimensionality Reduction for Document Collections
      13.4.3.2 How to Show DR Data?
  13.5 Further Reading
14 Embed: Focus+Context
  14.1 The Big Picture
  14.2 Why Embed?
  14.3 Elide
    Example: DOITrees Revisited
  14.4 Superimpose
    Example: Toolglass and Magic Lenses
  14.5 Distort
    Example: 3D Perspective
    Example: Fisheye Lens
    Example: Hyperbolic Geometry
    Example: Stretch and Squish Navigation
    Example: Nonlinear Magnification Fields
  14.6 Costs and Benefits: Distortion
  14.7 Further Reading
15 Analysis Case Studies
  15.1 The Big Picture
  15.2 Why Analyze Case Studies?
  15.3 Graph-Theoretic Scagnostics
  15.4 VisDB
  15.5 Hierarchical Clustering Explorer
  15.6 PivotGraph
  15.7 InterRing
  15.8 Constellation
  15.9 Further Reading
Figure Credits
Bibliography
Idiom and System Examples Index
Concept Index

Preface

Why a New Book?

I wrote this book to scratch my own itch: the book I wanted to teach out of for my graduate visualization (vis) course did not exist.
The itch grew through the years of teaching my own course at the University of British Columbia eight times, co-teaching a course at Stanford in 2001, and helping with the design of an early vis course at Stanford in 1996 as a teaching assistant.

I was dissatisfied with teaching primarily from original research papers. While it is very useful for graduate students to learn to read papers, what was missing was a synthesis view and a framework to guide thinking. The principles and design choices that I intended a particular paper to illustrate were often only indirectly alluded to in the paper itself. Even after assigning many papers or book chapters as preparatory reading before each lecture, I was frustrated by the many major gaps in the ideas discussed. Moreover, the reading load was so heavy that it was impossible to fit in any design exercises along the way, so the students only gained direct experience as designers in a single monolithic final project.

I was also dissatisfied with the lecture structure of my own course because of a problem shared by nearly every other course in the field: an incoherent approach to crosscutting the subject matter. Courses that lurch from one set of crosscuts to another are intellectually unsatisfying in that they make vis seem like a grab-bag of assorted topics rather than a field with a unifying theoretical framework. There are several major ways to crosscut vis material. One is by the field from which we draw techniques: cognitive science for perception and color, human–computer interaction for user studies and user-centered design, computer graphics for rendering, and so on. Another is by the problem domain addressed: for example, biology, software engineering, computer networking, medicine, casual use, and so on. Yet another is by the families of techniques: focus+context, overview/detail, volume rendering, and statistical graphics.
Finally, evaluation is an important and central topic that should be interwoven throughout, but it did not fit into the standard pipelines and models. It was typically relegated to a single lecture, usually near the end, so that it felt like an afterthought.

Existing Books

Vis is a young field, and there are not many books that provide a synthesis view of the field. I saw a need for a next step on this front.

Tufte is a curator of glorious examples [Tufte 83, Tufte 91, Tufte 97], but he focuses on what can be done on the static printed page for purposes of exposition. The hallmarks of the last 20 years of computer-based vis are interactivity rather than simply static presentation and the use of vis for exploration of the unknown in addition to exposition of the known. Tufte’s books do not address these topics, so while I use them as supplementary material, I find they cannot serve as the backbone for my own vis course. However, any or all of them would work well as supplementary reading for a course structured around this book; my own favorite for this role is Envisioning Information [Tufte 91].

Some instructors use Readings in Information Visualization [Card et al. 99]. The first chapter provides a useful synthesis view of the field, but it is only one chapter. The rest of the book is a collection of seminal papers, and thus it shares the same problem as directly reading original papers. Here I provide a book-length synthesis, and one that is informed by the wealth of progress in our field in the past 15 years.

Ware’s book Information Visualization: Perception for Design [Ware 13] is a thorough book on vis design as seen through the lens of perception, and I have used it as the backbone for my own course for many years. While it discusses many issues on how one could design a vis, it does not cover what has been done in this field for the past 14 years from a synthesis point of view.
I wanted a book that allows a beginning student to learn from this collective experience rather than starting from scratch. This book does not attempt to teach the very useful topic of perception per se; it covers only the aspects directly needed to get started with vis and leaves the rest as further reading. Ware’s shorter book, Visual Thinking for Design [Ware 08], would be excellent supplemental reading for a course structured around this book.

This book offers a considerably more extensive model and framework than Spence’s Information Visualization [Spence 07]. Wilkinson’s The Grammar of Graphics [Wilkinson 05] is a deep and thoughtful work, but it is dense enough that it is more suitable for vis insiders than for beginners. Conversely, Few’s Show Me The Numbers [Few 12] is extremely approachable and has been used at the undergraduate level, but the scope is much more limited than the coverage of this book. The recent book Interactive Data Visualization [Ward et al. 10] works from the bottom up with algorithms as the base, whereas I work from the top down and stop one level above algorithmic considerations; our approaches are complementary. Like this book, it covers both nonspatial and spatial data. Similarly, the Data Visualization [Telea 07] book focuses on the algorithm level. The book on The Visualization Toolkit [Schroeder et al. 06] has a scope far beyond the vtk software, with considerable synthesis coverage of the concerns of visualizing spatial data. It has been used in many scientific visualization courses, but it does not cover nonspatial data. The voluminous Visualization Handbook [Hansen and Johnson 05] is an edited collection that contains a mix of synthesis material and research specifics; I refer to some specific chapters as good resources in my Further Reading sections at the end of each chapter in this book.
Audience

The primary audience of this book is students in a first vis course, particularly at the graduate level but also at the advanced undergraduate level. While admittedly written from a computer scientist’s point of view, the book aims to be accessible to a broad audience including students in geography, library science, and design. It does not assume any experience with programming, mathematics, human–computer interaction, cartography, or graphic design; for those who do have such a background, some of the terms that I define in this book are connected with the specialized vocabulary from these areas through notes in the margins. Other audiences are people from other fields with an interest in vis, who would like to understand the principles and design choices of this field, and practitioners in the field who might use it as a reference for a more formal analysis and improvements of production vis applications.

I wrote this book for people with an interest in the design and analysis of vis idioms and systems. That is, this book is aimed at vis designers, both nascent and experienced. This book is not directly aimed at vis end users, although they may well find some of this material informative. The book is aimed at both those who take a problem-driven approach and those who take a technique-driven approach. Its focus is on broad synthesis of the general underpinnings of vis in terms of principles and design choices to provide a framework for the design and analysis of techniques, rather than the algorithms to instantiate those techniques.

The book features a unified approach encompassing information visualization techniques for abstract data, scientific visualization techniques for spatial data, and visual analytics techniques for interleaving data transformation and analysis with interactive visual exploration.

Who’s Who

I use pronouns in a deliberate way in this book, to indicate roles. I am the author of this book.
I cover many ideas that have a long and rich history in the field, but I also advocate opinions that are not necessarily shared by all visualization researchers and practitioners. The pronoun you means the reader of this book; I address you as if you’re designing or analyzing a visualization system. The pronoun they refers to the intended users, the target audience for whom a visualization system is designed. The pronoun we refers to all humans, especially in terms of our shared perceptual and cognitive responses.

I’ll also use the abbreviation vis throughout this book, since visualization is quite a mouthful!

Structure: What’s in This Book

The book begins with a definition of vis and walks through its many implications in Chapter 1, which ends with a high-level introduction to an analysis framework of breaking down vis design according to what–why–how questions that have data–task–idiom answers. Chapter 2 addresses the what question with answers about data abstractions, and Chapter 3 addresses the why question with task abstractions, including an extensive discussion of deriving new data, a preview of the framework of design choices for how idioms can be designed, and several examples of analysis through this framework.

Chapter 4 extends the analysis framework to two additional levels: the domain situation level on top and the algorithm level on the bottom, with the what/why level of data and task abstraction and the how level of visual encoding and interaction idiom design in between the two. This chapter encourages using methods to validate your design in a way that matches up with these four levels.

Chapter 5 covers the principles of marks and channels for encoding information. Chapter 6 presents eight rules of thumb for design.

The core of the book is the framework for analyzing how vis idioms can be constructed out of design choices.
Three chapters cover choices of how to visually encode data by arranging space: Chapter 7 for tables, Chapter 8 for spatial data, and Chapter 9 for networks. Chapter 10 continues with the choices for mapping color and other channels in visual encoding. Chapter 11 discusses ways to manipulate and change a view. Chapter 12 covers ways to facet data between multiple views. Choices for how to reduce the amount of data shown in each view are covered in Chapter 13, and Chapter 14 covers embedding information about a focus set within the context of overview data. Chapter 15 wraps up the book with six case studies that are analyzed in detail with the full framework.

Each design choice is illustrated with concrete examples of specific idioms that use it. Each example is analyzed by decomposing its design with respect to the design choices that have been presented so far, so these analyses become more extensive as the chapters progress; each ends with a table summarizing the analysis. The book’s intent is to get you familiar with analyzing existing idioms as a springboard for designing new ones.

I chose the particular set of concrete examples in this book as evocative illustrations of the space of vis idioms and my way to approach vis analysis. Although this set of examples does cover many of the more popular idioms, it is certainly not intended to be a complete enumeration of all useful idioms; there are many more that have been proposed that aren’t in here. These examples also aren’t intended to be a historical record of who first proposed which ideas: I often pick more recent examples rather than the very first use of a particular idiom.

All of the chapters start with a short section called The Big Picture that summarizes their contents, to help you quickly determine whether a chapter covers material that you care about. They all end with a Further Reading section that points you to more information about their topics.
Throughout the book are boxes in the margins: vocabulary notes in purple starting with a star, and cross-reference notes in blue starting with a triangle. Terms are highlighted in purple where they are defined for the first time.

The book has an accompanying web page at http://www.cs.ubc.ca/~tmm/vadbook with errata, pointers to courses that use the book in different ways, example lecture slides covering the material, and downloadable versions of the diagram figures.

What’s Not in This Book

This book focuses on the abstraction and idiom levels of design and doesn’t cover the domain situation level or the algorithm level.

I have left out algorithms for reasons of space and time, not of interest. The book would need to be much longer if it covered algorithms at any reasonable depth; the middle two levels provide more than enough material for a single volume of readable size. Also, many good resources already exist to learn about algorithms, including original papers and some of the previous books discussed above. Some points of entry for this level are covered in Further Reading sections at the end of each chapter. Moreover, this book is intended to be accessible to people without a computer science background, a decision that precludes algorithmic detail. A final consideration is that the state of the art in algorithms changes quickly; this book aims to provide a framework for thinking about design that will age more gracefully. The book includes many concrete examples of previous vis tools to illustrate points in the design space of possible idioms, not as the final answer for the very latest and greatest way to solve a particular design problem.

The domain situation level is not as well studied in the vis literature as the algorithm level, but there are many relevant resources from other literatures including human–computer interaction. Some points of entry for this level are also covered in Further Reading.
Acknowledgments

My thoughts on visualization in general have been influenced by many people, but especially Pat Hanrahan and the students in the vis group while I was at Stanford: Robert Bosch, Chris Stolte, Diane Tang, and especially François Guimbretiére.

This book has benefited from the comments and thoughts of many readers at different stages. I thank the recent members of my research group for their incisive comments on chapter drafts and their patience with my sometimes-obsessive focus on this book over the past six years: Matt Brehmer, Jessica Dawson, Joel Ferstay, Stephen Ingram, Miriah Meyer, and especially Michael Sedlmair. I also thank the previous members of my group for their collaboration and discussions that have helped shape my thinking: Daniel Archambault, Aaron Barsky, Adam Bodnar, Kristian Hildebrand, Qiang Kong, Heidi Lam, Peter McLachlan, Dmitry Nekrasovski, James Slack, Melanie Tory, and Matt Williams.

I thank several people who gave me useful feedback on my Visualization book chapter [Munzner 09b] in the Fundamentals of Computer Graphics textbook [Shirley and Marschner 09]: TJ Jankun-Kelly, Robert Kincaid, Hanspeter Pfister, Chris North, Stephen North, John Stasko, Frank van Ham, Jarke van Wijk, and Martin Wattenberg. I used that chapter as a test run of my initial structure for this book, so their feedback has carried forward into this book as well.

I also thank early readers Jan Hardenburgh, Jon Steinhart, and Maureen Stone. Later reader Michael McGuffin contributed many thoughtful comments in addition to several great illustrations.

Many thanks to the instructors who have test-taught out of draft versions of this book, including Enrico Bertini, Remco Chang, Heike Jänicke Leitte, Raghu Machiragu, and Melanie Tory. I especially thank Michael Laszlo, Chris North, Hanspeter Pfister, Miriah Meyer, and Torsten Möller for detailed and thoughtful feedback.
I also thank all of the students who have used draft versions of this book in a course. Some of these courses were structured to provide me with a great deal of commentary from the students on the drafts, and I particularly thank these students for their contributions.

From my own 2011 course: Anna Flagg, Niels Hanson, Jingxian Li, Louise Oram, Shama Rashid, Junhao (Ellsworth) Shi, Jillian Slind, Mashid ZeinalyBaraghoush, Anton Zoubarev, and Chuan Zhu.

From North’s 2011 course: Ankit Ahuja, S.M. (Arif) Arifuzzaman, Sharon Lynn Chu, Andre Esakia, Anurodh Joshi, Chiranjeeb Kataki, Jacob Moore, Ann Paul, Xiaohui Shu, Ankit Singh, Hamilton Turner, Ji Wang, Sharon Chu Yew Yee, Jessica Zeitz, and especially Lauren Bradel.

From Pfister’s 2012 course: Pankaj Ahire, Rabeea Ahmed, Salen Almansoori, Ayindri Banerjee, Varun Bansal, Antony Bett, Madelaine Boyd, Katryna Cadle, Caitline Carey, Cecelia Wenting Cao, Zamyla Chan, Gillian Chang, Tommy Chen, Michael Cherkassky, Kevin Chin, Patrick Coats, Christopher Coey, John Connolly, Daniel Crookston, Charles Deck, Luis Duarte, Michael Edenfield, Jeffrey Ericson, Eileen Evans, Daniel Feusse, Gabriela Fitz, Dave Fobert, James Garfield, Shana Golden, Anna Gommerstadt, Bo Han, William Herbert, Robert Hero, Louise Hindal, Kenneth Ho, Ran Hou, Sowmyan Jegatheesan, Todd Kawakita, Rick Lee, Natalya Levitan, Angela Li, Eric Liao, Oscar Liu, Milady Jiminez Lopez, Valeria Espinosa Mateos, Alex Mazure, Ben Metcalf, Sarah Ngo, Pat Njolstad, Dimitris Papnikolaou, Roshni Patel, Sachin Patel, Yogesh Rana, Anuv Ratan, Pamela Reid, Phoebe Robinson, Joseph Rose, Kishleen Saini, Ed Santora, Konlin Shen, Austin Silva, Samuel Q. Singer, Syed Sobhan, Jonathan Sogg, Paul Stravropoulos, Lila Bjorg Strominger, Young Sul, Will Sun, Michael Daniel Tam, Man Yee Tang, Mark Theilmann, Gabriel Trevino, Blake Thomas Walsh, Patrick Walsh, Nancy Wei, Karisma Williams, Chelsea Yah, Amy Yin, and Chi Zeng.
From Möller’s 2014 course: Tamás Birkner, Nikola Dichev, Eike Jens Gnadt, Michael Gruber, Martina Kapf, Manfred Klaffenböck, Sümeyye Kocaman, Lea Maria Joseffa Koinig, Jasmin Kuric, Mladen Magic, Dana Markovic, Christine Mayer, Anita Moser, Magdalena Pöhl, Michael Prater, Johannes Preisinger, Stefan Rammer, Philipp Sturmlechner, Himzo Tahic, Michael Tögel, and Kyriakoula Tsafou.

I thank all of the people connected with A K Peters who contributed to this book. Alice Peters and Klaus Peters steadfastly kept asking me if I was ready to write a book yet for well over a decade and helped me get it off the ground. Sarah Chow, Charlotte Byrnes, Randi Cohen, and Sunil Nair helped me get it out the door with patience and care.

I am delighted with and thankful for the graphic design talents of Eamonn Maguire of Antarctic Design, an accomplished vis researcher in his own right, who tirelessly worked with me to turn my hand-drawn Sharpie drafts into polished and expressive diagrams.

I am grateful for the friends who saw me through the days, through the nights, and through the years: Jen Archer, Kirsten Cameron, Jenny Gregg, Bridget Hardy, Jane Henderson, Yuri Hoffman, Eric Hughes, Kevin Leyton-Brown, Max Read, Shevek, Anila Srivastava, Aimée Sturley, Jude Walker, Dave Whalen, and Betsy Zeller.

I thank my family for their decades of love and support: Naomi Munzner, Sheila Oehrlein, Joan Munzner, and Ari Munzner. I also thank Ari for the painting featured on the cover and for the way that his artwork has shaped me over my lifetime; see http://www.aribertmunzner.com.

Chapter 1
What’s Vis, and Why Do It?

1.1 The Big Picture

This book is built around the following definition of visualization, vis for short:

Computer-based visualization systems provide visual representations of datasets designed to help people carry out tasks more effectively.
Visualization is suitable when there is a need to augment human capabilities rather than replace people with computational decision-making methods. The design space of possible vis idioms is huge, and includes the considerations of both how to create and how to interact with visual representations. Vis design is full of trade-offs, and most possibilities in the design space are ineffective for a particular task, so validating the effectiveness of a design is both necessary and difficult. Vis designers must take into account three very different kinds of resource limitations: those of computers, of humans, and of displays. Vis usage can be analyzed in terms of why the user needs it, what data is shown, and how the idiom is designed.

I’ll discuss the rationale behind many aspects of this definition as a way of getting you to think about the scope of this book, and about visualization itself:

Why have a human in the decision-making loop?
Why have a computer in the loop?
Why use an external representation?
Why depend on vision?
Why show the data in detail?
Why use interactivity?
Why is the vis idiom design space huge?
Why focus on tasks?
Why are most designs ineffective?
Why care about effectiveness?
Why is validation difficult?
Why are there resource limitations?
Why analyze vis?

1.2 Why Have a Human in the Loop?

Vis allows people to analyze data when they don’t know exactly what questions they need to ask in advance.

The modern era is characterized by the promise of better decision making through access to more data than ever before. When people have well-defined questions to ask about data, they can use purely computational techniques from fields such as statistics and machine learning. Some jobs that were once done by humans can now be completely automated with a computer-based solution. If a fully automatic solution has been deemed to be acceptable, then there is no need for human judgement, and thus no need for you to design a vis tool. For example, consider the domain of stock market trading. Currently, there are many deployed systems for high-frequency trading that make decisions about buying and selling stocks when certain market conditions hold (for example, when a specific price is reached), with no need at all for a time-consuming check from a human in the loop. You would not want to design a vis tool to help a person make that check faster, because even an augmented human will not be able to reason about millions of stocks every second.

★ The field of machine learning is a branch of artificial intelligence where computers can handle a wide variety of new situations in response to data-driven training, rather than by being programmed with explicit instructions in advance.

However, many analysis problems are ill specified: people don’t know how to approach the problem. There are many possible questions to ask, anywhere from dozens to thousands or more, and people don’t know which of these many questions are the right ones in advance. In such cases, the best path forward is an analysis process with a human in the loop, where you can exploit the powerful pattern detection properties of the human visual system in your design.

Vis systems are appropriate for use when your goal is to augment human capabilities, rather than completely replace the human in the loop.

You can design vis tools for many kinds of uses. You can make a tool intended for transitional use where the goal is to “work itself out of a job”, by helping the designers of future solutions that are purely computational. You can also make a tool intended for long-term use, in a situation where there is no intention of replacing the human any time soon.
For example, you can create a vis tool that’s a stepping stone to gaining a clearer understanding of analysis requirements before developing formal mathematical or computational models. This kind of tool would be used very early in the transition process in a highly exploratory way, before even starting to develop any kind of automatic solution. The outcome of designing vis tools targeted at specific real-world domain problems is often a much crisper understanding of the user’s task, in addition to the tool itself.

In the middle stages of a transition, you can build a vis tool aimed at the designers of a purely computational solution, to help them refine, debug, or extend that system’s algorithms or understand how the algorithms are affected by changes of parameters. In this case, your tool is aimed at a very different audience than the end users of that eventual system; if the end users need visualization at all, it might be with a very different interface. Returning to the stock market example, a higher-level system that determines which of multiple trading algorithms to use in varying circumstances might require careful tuning. A vis tool to help the algorithm developers analyze its performance might be useful to these developers, but not to people who eventually buy the software.

You can also design a vis tool for end users in conjunction with other computational decision making to illuminate whether the automatic system is doing the right thing according to human judgement. The tool might be intended for interim use when making deployment decisions in the late stages of a transition, for example, to see if the result of a machine learning system seems to be trustworthy before entrusting it to spend millions of dollars trading stocks.
In some cases vis tools are abandoned after that decision is made; in other cases vis tools continue to be in play with long-term use to monitor a system, so that people can take action if they spot unreasonable behavior.

Figure 1.1. The Variant View vis tool supports biologists in assessing the impact of genetic variants by speeding up the exploratory analysis process. From [Ferstay et al. 13, Figure 1].

In contrast to these transitional uses, you can also design vis tools for long-term use, where a person will stay in the loop indefinitely. A common case is exploratory analysis for scientific discovery, where the goal is to speed up and improve a user’s ability to generate and check hypotheses. Figure 1.1 shows a vis tool designed to help biologists studying the genetic basis of disease through analyzing DNA sequence variation. Although these scientists make heavy use of computation as part of their larger workflow, there’s no hope of completely automating the process of cancer research any time soon.

You can also design vis tools for presentation. In this case, you’re supporting people who want to explain something that they already know to others, rather than to explore and analyze the unknown. For example, The New York Times has deployed sophisticated interactive visualizations in conjunction with news stories.

1.3 Why Have a Computer in the Loop?

By enlisting computation, you can build tools that allow people to explore or present large datasets that would be completely infeasible to draw by hand, thus opening up the possibility of seeing how datasets change over time.

Figure 1.2. The Cerebral vis tool captures the style of hand-drawn diagrams in biology textbooks with vertical layers that correspond to places within a cell where interactions between genes occur. (a) A small network of 57 nodes and 74 edges might be possible to lay out by hand with enough patience. (b) Automatic layout handles this large network of 760 nodes and 1269 edges and provides a substrate for interactive exploration: the user has moved the mouse over the MSK1 gene, so all of its immediate neighbors in the network are highlighted in red. From [Barsky et al. 07, Figures 1 and 2].

People could create visual representations of datasets manually, either completely by hand with pencil and paper, or with computerized drawing tools where they individually arrange and color each item. The scope of what people are willing and able to do manually is strongly limited by their attention span; they are unlikely to move beyond tiny static datasets. Arranging even small datasets of hundreds of items might take hours or days. Most real-world datasets are much larger, ranging from thousands to millions to even more. Moreover, many datasets change dynamically over time. Having a computer-based tool generate the visual representation automatically obviously saves human effort compared to manual creation.

As a designer, you can think about what aspects of hand-drawn diagrams are important in order to automatically create drawings that retain the hand-drawn spirit. For example, Figure 1.2 shows a vis tool designed to show interactions between genes in a way similar to stylized drawings that appear in biology textbooks, with vertical layers that correspond to the location within the cell where the interaction occurs [Barsky et al. 07]. Figure 1.2(a) could be done by hand, while Figure 1.2(b) could not.

1.4 Why Use an External Representation?

External representations augment human capacity by allowing us to surpass the limitations of our own internal cognition and memory.

Vis allows people to offload internal cognition and memory usage to the perceptual system, using carefully designed images as a form of external representations, sometimes also called external memory.
External representations can take many forms, including touchable physical objects like an abacus or a knotted string, but in this book I focus on what can be shown on the two-dimensional display surface of a computer screen.

Diagrams can be designed to support perceptual inferences, which are very easy for humans to make. The advantage of diagrams as external memory is that information can be organized by spatial location, offering the possibility of accelerating both search and recognition. Search can be sped up by grouping all the items needed for a specific problem-solving inference together at the same location. Recognition can also be facilitated by grouping all the relevant information about one item in the same location, avoiding the need for matching remembered symbolic labels. However, a nonoptimal diagram may group irrelevant information together, or support perceptual inferences that aren’t useful for the intended problem-solving process.

1.5 Why Depend on Vision?

Visualization, as the name implies, is based on exploiting the human visual system as a means of communication. I focus exclusively on the visual system rather than other sensory modalities because it is both well characterized and suitable for transmitting information.

The visual system provides a very high-bandwidth channel to our brains. A significant amount of visual information processing occurs in parallel at the preconscious level. One example is visual popout, such as when one red item is immediately noticed from a sea of gray ones. The popout occurs whether the field of other objects is large or small because of processing done in parallel across the entire field of vision. Of course, our visual systems also feed into higher-level processes that involve the conscious control of attention.

Sound is poorly suited for providing overviews of large information spaces compared with vision.
An enormous amount of background visual information processing in our brains underlies our ability to think and act as if we see a huge amount of information at once, even though technically we see only a tiny part of our visual field in high resolution at any given instant. In contrast, we experience the perceptual channel of sound as a sequential stream, rather than as a simultaneous experience where what we hear over a long period of time is automatically merged together. This crucial difference may explain why sonification has never taken off despite many independent attempts at experimentation.

The other senses can be immediately ruled out as communication channels because of technological limitations. The perceptual channels of taste and smell don’t yet have viable recording and reproduction technology at all. Haptic input and feedback devices exist to exploit the touch and kinesthetic perceptual channels, but they cover only a very limited part of the dynamic range of what we can sense. Exploration of their effectiveness for communicating abstract information is still at a very early stage.

▶ Chapter 5 covers implications of visual perception that are relevant for vis design.

1.6 Why Show the Data in Detail?

Vis tools help people in situations where seeing the dataset structure in detail is better than seeing only a brief summary of it. One of these situations occurs when exploring the data to find patterns, both to confirm expected ones and find unexpected ones. Another occurs when assessing the validity of a statistical model, to judge whether the model in fact fits the data.

Statistical characterization of datasets is a very powerful approach, but it has the intrinsic limitation of losing information through summarization.
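As a minimal sketch of that limitation, the Python below computes the summary statistics for the four datasets of Anscombe’s Quartet, discussed next, using the values as they are commonly reproduced from [Anscombe 73]. The names QUARTET, summarize, and STATS are illustrative only and are not from this book; correlation and the least-squares regression line are computed by hand with the standard formulas, so only the standard-library statistics module is assumed.

```python
from statistics import mean, variance

# Anscombe's Quartet [Anscombe 73]; datasets I-III share the same x values.
X_COMMON = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_COMMON, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_COMMON, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_COMMON, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return the descriptive statistics that the four datasets share."""
    mx, my = mean(xs), mean(ys)
    # Sample covariance (n - 1 denominator), matching statistics.variance.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    vx, vy = variance(xs), variance(ys)
    slope = cov / vx  # least-squares slope of the regression line
    return {
        "mean_x": mx,
        "mean_y": my,
        "var_x": vx,
        "var_y": vy,
        "corr": cov / (vx * vy) ** 0.5,  # Pearson correlation coefficient
        "slope": slope,
        "intercept": my - slope * mx,
    }


STATS = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}

if __name__ == "__main__":
    for name, s in STATS.items():
        print(f"{name:>3}: " + ", ".join(f"{k}={v:.3f}" for k, v in s.items()))
```

Every dataset yields a mean of x of 9, a sample variance of x of 11, a mean of y of about 7.50, a sample variance of y of about 4.12 to 4.13, a correlation of about 0.816, and a regression line close to y = 3.00 + 0.500x. Nothing in those numbers tells the four datasets apart; only plotting them does.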
Figure 1.3 shows Anscombe’s Quartet, a suite of four small datasets designed by a statistician to illustrate how datasets that have identical descriptive statistics can have very different structures that are immediately obvious when the dataset is shown graphically [Anscombe 73]. All four have identical mean, variance, correlation, and linear regression lines. If you are familiar with these statistical measures, then the scatterplot of the first dataset probably isn’t surprising, and matches your intuition. The second scatterplot shows a clear nonlinear pattern in the data, showing that summarizing with linear regression doesn’t adequately capture what’s really happening. The third dataset shows how a single outlier can lead to a regression line that’s misleading in a different way because its slope doesn’t quite match the line that our eyes pick up clearly from the rest of the data. Finally, the fourth dataset shows a truly pernicious case where these measures dramatically mislead, with a regression line that’s almost perpendicular to the true pattern we immediately see in the data.

Figure 1.3. Anscombe’s Quartet is four datasets with identical simple statistical properties: mean, variance, correlation, and linear regression line. However, visual inspection immediately shows how their structures are quite different. After [Anscombe 73, Figures 1–4]. (The figure pairs a table titled “Anscombe’s Quartet: Raw Data,” with X and Y columns for datasets 1–4 and rows for mean, variance, and correlation, with a scatterplot of each dataset.)

The basic principle illustrated by Anscombe’s Quartet, that a single summary is often an oversimplification that hides the true structure of the dataset, applies even more to large and complex datasets.

1.7 Why Use Interactivity?

Interactivity is crucial for building vis tools that handle complexity.
When datasets are large enough, the limitations of both people and displays preclude just showing everything at once; interaction where user actions cause the view to change is the way forward. Moreover, a single static view can show only one aspect of a dataset. For some combinations of simple datasets and tasks, the user may only need to see a single visual encoding. In contrast, an interactively changing display supports many possible queries.

In all of these cases, interaction is crucial. For example, an interactive vis tool can support investigation at multiple levels of detail, ranging from a very high-level overview down through multiple levels of summarization to a fully detailed view of a small part of it. It can also present different ways of representing and summarizing the data in a way that supports understanding the connections between these alternatives.

Before the widespread deployment of fast computer graphics, visualization was limited to the use of static images on paper. With computer-based vis, interactivity becomes possible, vastly increasing the scope and capabilities of vis tools. Although static representations are indeed within the scope of this book, interaction is an intrinsic part of many idioms.

1.8 Why Is the Vis Idiom Design Space Huge?

A vis idiom is a distinct approach to creating and manipulating visual representations. There are many ways to create a visual encoding of data as a single picture. The design space of possibilities gets even bigger when you consider how to manipulate one or more of these pictures with interaction.

Many vis idioms have been proposed. Simple static idioms include many chart types that have deep historical roots, such as scatterplots, bar charts, and line charts. A more complicated idiom can link together multiple simple charts through interaction.
For example, selecting one bar in a bar chart could also result in highlighting associated items in a scatterplot that shows a different view of the same data. Figure 1.4 shows an even more complex idiom that supports incremental layout of a multilevel network through interactive navigation. The network shows data from the Internet Movie Database about all movies connected to Sharon Stone, where actors are represented as grey square nodes and links between them mean appearance in the same movie. The user has navigated by opening up several metanodes, shown as discs, to see structure at many levels of the hierarchy simultaneously; metanode color encodes the topological structure of the network features it contains, and hexagons indicate metanodes that are still closed. The inset shows the details of the opened-up clique of actors who all appear in the movie Anything but Here, with name labels turned on.

Figure 1.4. The Grouse vis tool features a complex idiom that combines visual encoding and interaction, supporting incremental layout of a network through interactive navigation. From [Archambault et al. 07a, Figure 5].

▶ Compound networks are discussed further in Section 9.5.

This book provides a framework for thinking about the space of vis design idioms systematically by considering a set of design choices, including how to encode information with spatial position, how to facet data between multiple views, and how to reduce the amount of data shown by filtering and aggregation.

1.9 Why Focus on Tasks?

A tool that serves well for one task can be poorly suited for another, for exactly the same dataset.

The task of the users is an equally important constraint for a vis designer as the kind of data that the users have. Reframing the users’ task from domain-specific form into abstract form allows you to consider the similarities and differences between what people need across many real-world usage contexts.
For example, a vis tool can support presentation, or discovery, or enjoyment of information; it can also support producing more information for subsequent use. For discovery, vis can be used to generate new hypotheses, as when exploring a completely unfamiliar dataset, or to confirm existing hypotheses about some dataset that is already partially understood.

▶ The space of task abstractions is discussed in detail in Chapter 3.

1.10 Why Focus on Effectiveness?

The focus on effectiveness is a corollary of defining vis to have the goal of supporting user tasks. This goal leads to concerns about correctness, accuracy, and truth playing a very central role in vis. The emphasis in vis is different from other fields that also involve making images: for example, art emphasizes conveying emotion, achieving beauty, or provoking thought; movies and comics emphasize telling a narrative story; advertising emphasizes setting a mood or selling. For the goals of emotional engagement, storytelling, or allurement, the deliberate distortion and even fabrication of facts is often entirely appropriate, and of course fiction is as respectable as nonfiction. In contrast, a vis designer does not typically have artistic license. Moreover, the phrase “it’s not just about making pretty pictures” is a common and vehement assertion in vis, meaning that the goals of the designer are not met if the result is beautiful but not effective.

However, no picture can communicate the truth, the whole truth, and nothing but the truth. The correctness concerns of a vis designer are complicated by the fact that any depiction of data is an abstraction where choices are made about which aspects to emphasize. Cartographers have thousands of years of experience with articulating the difference between the abstraction of a map and the terrain that it represents.

▶ Abstraction is discussed in more detail in Chapters 3 and 4.
Even photographing a real-world scene involves choices of abstraction and emphasis; for example, the photographer chooses what to include in the frame.

1.11 Why Are Most Designs Ineffective?

The most fundamental reason that vis design is a difficult enterprise is that the vast majority of the possibilities in the design space will be ineffective for any specific usage context. In some cases, a possible design is a poor match with the properties of the human perceptual and cognitive systems. In other cases, the design would be comprehensible by a human in some other setting, but it's a bad match with the intended task. Only a very small number of possibilities are in the set of reasonable choices, and of those only an even smaller fraction are excellent choices. Randomly choosing possibilities is a bad idea because the odds of finding a very good solution are very low.

Figure 1.5 contrasts two ways to think about design in terms of traversing a search space. In addressing design problems, it's not a very useful goal to optimize; that is, to find the very best choice. A more appropriate goal when you design is to satisfy; that is, to find one of the many possible good solutions rather than one of the even larger number of bad ones. The diagram shows five spaces, each of which is progressively smaller than the previous. First, there is the space of all possible solutions, including potential solutions that nobody has ever thought of before. Next, there is the set of possibilities that are known to you, the vis designer. Of course, this set might be small if you are a novice designer who is not aware of the full array of methods that have been proposed in the past. If you're in that situation, one of the goals of this book is to enlarge the set of methods that you know about. The next set is the consideration space, which contains the solutions that you actively consider. This set is necessarily smaller than the known space, because you can't consider what you don't know. An even smaller set is the proposal space of possibilities that you investigate in detail. Finally, one of these becomes the selected solution.

Figure 1.5. A search space metaphor for vis design. [Diagram: two panels, labeled Good! and Bad!, each showing nested regions for the space of possible solutions, the known space, the consideration space, the proposal space, and the selected solution, with marks for good, OK, and poor solutions.]

Figure 1.5 contrasts a good strategy on the left, where the known and consideration spaces are large, with a bad strategy on the right, where these spaces are small. The problem with a small consideration space is the higher probability of considering only OK or poor solutions and missing a good one. A fundamental principle of design is to consider multiple alternatives and then choose the best, rather than to immediately fixate on one solution without considering any alternatives. One way to ensure that more than one possibility is considered is to explicitly generate multiple ideas in parallel. This book is intended to help you, the designer, entertain a broad consideration space by systematically considering many alternatives and to help you rule out some parts of the space by noting when there are mismatches of possibilities with human capabilities or the intended task.

As with all design problems, vis design cannot be easily handled as a simple process of optimization because trade-offs abound. A design that does well by one measure will rate poorly on another. The characterization of trade-offs in the vis design space is a very open problem at the frontier of vis research. This book provides several guidelines and suggested processes, based on my synthesis of what is currently known, but it contains few absolute truths. (Chapter 4 introduces a model for thinking about the design process at four different levels; the model is intended to guide your thinking through these trade-offs in a systematic way.)

1.12 Why Is Validation Difficult?

The problem of validation for a vis design is difficult because there are so many questions that you could ask when considering whether a vis tool has met your design goals.

How do you know if it works? How do you argue that one design is better or worse than another for the intended users? For one thing, what does better mean? Do users get something done faster? Do they have more fun doing it? Can they work more effectively? What does effectively mean? How do you measure insight or engagement? What is the design better than? Is it better than another vis system? Is it better than doing the same things manually, without visual support? Is it better than doing the same things completely automatically? And what sort of thing does it do better? That is, how do you decide what sort of task the users should do when testing the system? And who is this user? An expert who has done this task for decades, or a novice who needs the task to be explained before they begin? Are they familiar with how the system works from using it for a long time, or are they seeing it for the first time? A concept like faster might seem straightforward, but tricky questions still remain. Are the users limited by the speed of their own thought process, or their ability to move the mouse, or simply the speed of the computer in drawing each picture? How do you decide what sort of benchmark data you should use when testing the system? Can you characterize what classes of data the system is suitable for? How might you measure the quality of an image generated by a vis tool? How well do any of the automatically computed quantitative metrics of quality match up with human judgements? Even once you limit your considerations to purely computational issues, questions remain. Does the complexity of the algorithm depend on the number of data items to show or the number of pixels to draw? Is there a trade-off between computer speed and computer memory usage? (Chapter 4 answers these questions by providing a framework that addresses when to use what methods for validating vis designs.)

1.13 Why Are There Resource Limitations?

When designing or analyzing a vis system, you must consider at least three different kinds of limitations: computational capacity, human perceptual and cognitive capacity, and display capacity.

Vis systems are inevitably used for larger datasets than those they were designed for. Thus, scalability is a central concern: designing systems to handle large amounts of data gracefully. The continuing increase in dataset size is driven by many factors: improvements in data acquisition and sensor technology, bringing real-world data into a computational context; improvements in computer capacity, leading to ever more data being generated within computational environments, including simulation and logging; and the increasing reach of computational infrastructure into every aspect of life.

As with any application of computer science, computer time and memory are limited resources, and there are often soft and hard constraints on the availability of these resources. For instance, if your vis system needs to interactively deliver a response to user input, then when drawing each frame you must use algorithms that can run in a fraction of a second rather than minutes or hours. In some scenarios, users are unwilling or unable to wait a long time for the system to preprocess the data before they can interact with it. A soft constraint is that the vis system should be parsimonious in its use of computer memory because the user needs to run other programs simultaneously.
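The interactivity constraint can be made concrete with a back-of-the-envelope calculation. The Python sketch below is a hypothetical illustration rather than anything from the book: it assumes a 60 frames-per-second target and a cost of 10 ns per elementary operation, both round numbers chosen only for the example, and estimates how many data items an algorithm of a given complexity class could process within a single frame. The names `FRAME_BUDGET_S`, `SECONDS_PER_OP`, `COST_MODELS`, and `max_items_per_frame` are made up for this sketch.

```python
import math

# Illustrative assumptions (not from the text): a 60 frames-per-second target
# and a fixed cost of 10 nanoseconds per elementary operation.
FRAME_BUDGET_S = 1.0 / 60
SECONDS_PER_OP = 1e-8

# Hypothetical operation counts for three common complexity classes.
COST_MODELS = {
    "O(n)": lambda n: n,
    "O(n log n)": lambda n: n * math.log2(n),
    "O(n^2)": lambda n: n * n,
}


def max_items_per_frame(cost, budget_s=FRAME_BUDGET_S, sec_per_op=SECONDS_PER_OP):
    """Return the largest n whose estimated run time fits in one frame.

    Assumes cost(n) is nondecreasing and that cost(1) fits in the budget.
    """
    max_ops = budget_s / sec_per_op
    lo, hi = 1, 2
    # Exponential search: double hi until its cost exceeds the budget.
    while cost(hi) <= max_ops:
        lo, hi = hi, hi * 2
    # Binary search: cost(lo) fits and cost(hi) does not; narrow the gap.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if cost(mid) <= max_ops:
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    for name, cost in COST_MODELS.items():
        print(f"{name:>10}: about {max_items_per_frame(cost):,} items per frame")
```

Under these made-up constants, a linear pass fits well over a million items into one frame, whereas a quadratic algorithm fits only about 1,300. Changing the constants shifts every number, but for any realistic budget the quadratic limit stays far below the linear one.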
A hard constraint is that even if the vis system can use nearly all available memory in the computer, dataset size can easily outstrip that finite capacity. Designing systems that gracefully handle larger datasets that do not fit into core memory requires significantly more complex algorithms. Thus, the computational complexity of algorithms for dataset preprocessing, transformation, layout, and rendering is a major concern. However, computational issues are by no means the only concern!

On the human side, memory and attention are finite resources. Chapter 5 will discuss some of the power and limitations of the low-level visual preattentive mechanisms that carry out massively parallel processing of our current visual field. However, human memory for things that are not directly visible is notoriously limited. These limits come into play not only for long-term recall but also for shorter-term working memory, both visual and nonvisual. We store surprisingly little information internally in visual working memory, leaving us vulnerable to change blindness: the phenomenon where even very large changes are not noticed if we are attending to something else in our view [Simons 00]. (More aspects of memory and attention are covered in Section 6.5.)

Display capacity is a third kind of limitation to consider. Vis designers often run out of pixels; that is, the resolution of the screen is not enough to show all desired information simultaneously. The information density of a single image is a measure of the amount of information encoded versus the amount of unused space. (Synonyms for information density include graphic density and data–ink ratio.) Figure 1.6 shows the same tree dataset visually encoded three different ways. The layout in Figure 1.6(a) encodes the depth from root to leaves in the tree with vertical spatial position. However, the information density is low. In contrast, the layout in Figure 1.6(b) uses nodes of the same size but is drawn more compactly, so it has higher information density; that is, the ratio between the size of each node and the area required to display the entire tree is larger. However, the depth cannot be easily read off from spatial position. Figure 1.6(c) shows a very good alternative that combines the benefits of both previous approaches, with both high information density from a compact view and position coding for depth.

Figure 1.6. Low and high information density visual encodings of the same small tree dataset; nodes are the same size in each. (a) Low information density. (b) Higher information density, but depth in tree cannot be read from spatial position. (c) High information density, while maintaining property that depth is encoded with position. From [McGuffin and Robert 10, Figure 3].

There is a trade-off between the benefits of showing as much as possible at once, to minimize the need for navigation and exploration, and the costs of showing too much at once, where the user is overwhelmed by visual clutter. The goal of idiom design choices is to find an appropriate balance between these two ends of the information density continuum.

1.14 Why Analyze?

This book is built around the premise that analyzing existing systems is a good stepping stone to designing new ones. When you're confronted with a vis problem as a designer, it can be hard to decide what to do. Many computer-based vis idioms and tools have been created in the past several decades, and considering them one by one leaves you faced with a big collection of different possibilities.

Figure 1.7. Three-part analysis framework for a vis instance: why is the task being performed, what data is shown in the views, and how is the vis idiom constructed in terms of design choices. [Diagram labels: What? Why? How?]
There are so many possible combinations of data, tasks, and idioms that it's unlikely that you'll find exactly what you need to know just by reading papers about previous vis tools. Moreover, even if you find a likely candidate, you might need to dig even deeper into the literature to understand whether there's any evidence that the tool was a success.

This book features an analysis framework that imposes a structure on this enormous design space, intended as a scaffold to help you think about design choices systematically. It's offered as a guide to get you started, not as a straitjacket: there are certainly many other possible ways to think about these problems!

Figure 1.7 shows the high-level framework for analyzing vis use according to three questions: what data the user sees, why the user intends to use a vis tool, and how the visual encoding and interaction idioms are constructed in terms of design choices. Each three-fold what–why–how question has a corresponding data–task–idiom answer trio. One of these analysis trios is called an instance. (Chapter 2 discusses data and the question of what. Chapter 3 covers tasks and the question of why. Chapters 7 through 14 answer the question of how idioms can be designed in detail.)

Simple vis tools can be fully described as an isolated analysis instance, but complex vis tool usage often requires analysis in terms of a sequence of instances that are chained together. In these cases, the chained sequences are a way to express dependencies. All analy
