Questions and Answers
According to the provided text, what is the primary advantage of using hexagons in 2D hexbin plots compared to square bins?
- Hexagons have a more symmetrical distribution of neighboring points, leading to a less biased representation of data. (correct)
- Hexagons provide a more accurate representation of the density of data points.
- Hexagons allow for larger data sets to be visualized efficiently.
- Hexagons offer a more visually appealing representation of the data.
- Hexagons are simpler to implement computationally and are less resource-intensive.
The text highlights that hexagon binning is particularly effective for visualizing datasets with a large number of data points. What is the minimum number of data points mentioned as suitable for efficient visualization using hexagon binning?
- 100
- 10^4
- 10^6 (correct)
- 10^3
What is the primary purpose of a bagplot, as described in the text?
- To visualize the distribution of data points in a two-dimensional space.
- To determine the optimal binning parameters for hexagon binning.
- To analyze the correlation between two variables in a two-dimensional dataset.
- To effectively represent the location, spread, and skewness of a two-dimensional dataset. (correct)
- To identify outliers in a two-dimensional dataset.
Which of the following is NOT a benefit of using hexagons in 2D hexbin plots?
According to the provided information, what is the key factor determining the color of a tile in a 2D hexbin plot?
What property of hexagons contributes to their ability to effectively depict the density of data points in 2D hexbin plots?
The text refers to 'tessellation' as a key concept in 2D hexbin plots. What does tessellation refer to in this context?
What is the main difference between a boxplot and a bagplot?
What is the key advantage of using a scatterplot matrix when comparing multiple numerical variables?
Which type of correlation is considered more robust in the presence of outliers?
What does a high positive correlation between two variables imply?
What is the key limitation of Bravais-Pearson correlation?
Why is it important to remember that "correlation is not causation"?
Which of the following is NOT a characteristic of the visual variable 'Shape'?
What is the main reason why using size to represent a numerical variable should be done with caution?
What is the key advantage of using shapes in visualization?
Why is it important for the link between a shape and its intended meaning to be explicit?
Why are spreadsheets generally not suitable for identifying outliers, clusters, or trends?
What is the main difference between visual marks and text in terms of processing?
Which of the following visual variables can be considered quantitative?
Which of the following is NOT a reason why distant objects appear less vibrant in color?
How can we perceive depth information?
What is a key issue with using gratuitous 3D visualizations in charts?
Which of the following is NOT mentioned as a common example of unnecessarily using 3D in visualizations?
When is using a 3D plot potentially appropriate?
Why are camera lenses considered to provide a larger depth of field compared to our eyes?
Which of the following is NOT a visual cue that provides depth information?
What is the primary reason why 3D plots are often considered inappropriate?
What is the purpose of using paste0 in the provided code snippet?
What does the geom_text() function do in the context of the provided code?
What is the intended effect of using coord_polar(theta="y") in the pie chart code?
What does the code snippet labs(fill = "# of front gears") achieve?
Which of the following statements accurately describes the purpose of using stat="identity" in both the bar chart and pie chart code snippets?
Why is the ggplot() function used twice, separately for the bar chart and the pie chart?
What is the primary goal of using the theme_void() function in the pie chart code?
What would likely be the result for the pie chart if the line coord_polar(theta="y") was removed?
What is the meaning of the 210 in the music data table under 'Listen=Yes'?
What does the mosaic plot of the 'music' data show?
What does the dimension 'music' represent in the context of the given data structure?
What is the R function used for creating mosaic plots?
What is the 'Titanic' data set in R used for?
What are the categories in the 'Titanic' data set?
How are the columns in the 'music' data represented in the 'music' data structure?
What is the purpose of the following R code: mosaicplot(music, col = hcl(240), main = "Classical Music Listening")?
Flashcards
Visual Variable Characteristics
Five key properties that influence visualization: Selective, Associative, Quantitative, Order, and Length.
Selective Variable
A visual variable that draws attention to specific data elements, helping to distinguish them.
Associative Variable
A visual variable that connects symbols or shapes to meanings or categories.
Quantitative Variable
Order in Visuals
Size as a Variable
Shape in Visualization
Value in Visualization
Variable in R
paste0 function
ggplot2
geom_bar function
stat="identity"
coord_polar
position_stack function
theme_void
Hexagon Binning
Density Calculation
Tessellation
Bivariate Histogram
Color Ramp in Hexbin
Packing Efficiency
Bagplot
Symmetry of Nearest Neighbors
Similarity between Variables
Correlation
Linear Correlation (Pearson)
Rank Correlation (Spearman)
Correlation vs. Causation
Proximity and Detail
Color and Distance
3D Plot Issues
Visual Distortion in 3D
Atmospheric Scattering
Depth Information
Gratuitous 3D
Eye vs Camera Focus
Mosaic Plot
mosaicplot function
Titanic Data Set
Variables in Titanic Data
Class in Titanic Data
Sex in Titanic Data
Age in Titanic Data
Survived Status
Study Notes
Background
- Information being visualized may not have an obvious visual manifestation.
- The process of creating a mapping from information to the visual representation is complex.
- Choosing the best visual variables for a specific set of information is challenging.
Visualization Pipeline
- The visualization pipeline converts raw information into an interactive visual representation.
- The pipeline consists of Raw Information, Data Transform, Dataset, Visual Mapping, Visual Form, View Transform, Visual Perception, and User Interaction.
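As a rough illustration of the data-side stages of the pipeline (Data Transform, Visual Mapping, View Transform), the following Python sketch turns a few raw text records into bar marks in pixel coordinates. Python is used only as runnable pseudocode; the records, function names, and pixel sizes are hypothetical, and the Visual Perception and User Interaction stages are not modelled.

```python
# A toy visualization pipeline: each stage is a plain function.
# Stage names follow the notes; data, names and sizes are hypothetical.

RAW = "Rome,5.3\nMilan,,\nNaples,4.1\nTurin,3.2"  # raw information (one record is incomplete)

def data_transform(raw):
    """Raw information -> dataset: parse the lines and drop incomplete records."""
    rows = []
    for line in raw.splitlines():
        name, *rest = line.split(",")
        if rest and rest[0]:
            rows.append({"name": name, "value": float(rest[0])})
    return rows

def visual_mapping(dataset):
    """Dataset -> visual form: one bar mark per row (x = position, height = value)."""
    return [{"x": i, "height": row["value"], "label": row["name"]}
            for i, row in enumerate(dataset)]

def view_transform(marks, width=300, height=100):
    """Visual form -> view: scale the marks into pixel coordinates."""
    max_h = max(m["height"] for m in marks)
    bar_w = width / len(marks)
    return [{"px": m["x"] * bar_w, "ph": m["height"] / max_h * height, "label": m["label"]}
            for m in marks]

view = view_transform(visual_mapping(data_transform(RAW)))
for bar in view:
    print(f'{bar["label"]:>7}: x={bar["px"]:.0f}px, h={bar["ph"]:.1f}px')
```

Keeping each stage a separate function mirrors the idea that the mapping can be changed (for example, swapping bars for dots) without touching the data transform.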
Basic Visual Units (Marks)
- Points: Location, size, shape, color.
- Lines: Length, location. Changes in thickness, texture, or color don't change the line's meaning. Changing location does change its meaning.
- Areas: Length, width. Changing length and width changes the meaning. Changes in position, colour, value, or texture don't change the meaning.
- Surfaces: Similar to areas but exist in 3D. Changes in colour, texture don't alter meaning; changes in position, size, shape, or orientation change the meaning.
- Volumes: Length, width, and height. Their size is their meaning. Changes in position, colour, or texture don't change meaning; changes in size, shape, or orientation alter meaning.
Visual Units
- Each visual unit may have multiple visual variables.
Visual Variables
- Position: Changes in X, Y location.
- Size: Change in length, area, or repetition.
- Shape: Infinite number of shapes.
- Value: Changes from light to dark.
- Colour: Changes in hue at a given value.
- Orientation: Changes in alignment.
- Texture: Variation in 'grain'.
Characters of Visual Variables
- Selective: Is change in this visual variable enough to allow selection from a group? How easy is it to spot an outlier?
- Associative: Is a change in this visual variable enough to allow us to perceive them as a group? How easy is it to see a cluster?
- Quantitative: Is there a numerical reading obtainable from changes in this visual variable?
- Order: Are changes in this visual variable perceived as ordered? How easy is it to spot a trend? How easy is it to rank things numerically?
- Length: How many changes in value can still be recognized with confidence as separate? How big a range of data can this visual variable encode?
- Interpretation of symbolic meanings: How easy is it to interpret the symbolic (not numeric) meaning of a visual variable? This greatly influences the experience of reading a visualization.
Characteristics of Each Visual Variable
- We will discuss the five characteristics of each visual variable: Selective, Associative, Quantitative, Order, and Length.
Position
- selective: YES
- associative: YES
- quantitative: YES
- order: YES
- length: YES
Example
- Example 1: Dot Plot of State Education Spending, 2000 (in dollars per capita). Shows education expenditure per capita for different states.
- Example 2: Gapminder World. Visualizes global data across different countries and years; demonstrates effective encoding of multiple variables.
- Example 3: Vehicle Miles, Year-over-year Change in Rolling 12 Months Miles Driven. Represents the proportion/change of vehicle miles over time.
- Example 4: Human Poverty Index. Represents geographical distribution of the human poverty index.
- Example 5: Pie Chart. Represents the distribution of a value.
- Example 6: Population Distribution, 2000. This shows population density across US states in 2000 using a square-based visualization.
- Example 7: Population-Age by Sex 2007, Logan County. Column chart that visually displays demographic information.
- Example 8: Song Title vs. Internet Marketer. This represents the popularity/performance of songs across different online vendors.
- Example 9: Line Thickness (Monsieur Minard's visualization of Napoleon's 1812-1813 invasion of Russia). This is a map showing troop movements and fatalities.
- Example 10: Chernoff Face. Shows how facial features can be used to display multiple data points through visualization.
Words and Text
- Words can be seen as a special case of shape.
- They are generally not considered selective, or quantitative, or ordered.
- Text often requires serial processing while visual marks can typically be processed in parallel.
Numbers
- Numbers are a special case of shape.
- They are selective, associative, and quantitative; whether they are perceived as ordered depends on the context, although they are generally considered ordered.
Value
- Changing a mark's value is achieved by changes in darkness/lightness.
- Color is divided into hue, saturation, and value.
- Color in later slides refers to hue.
- Changes in saturation are not discussed.
- Changes in value do not provide numerical readings.
- Value (grey scale) is not quantitative.
Size
- selective: YES
- associative: YES
- quantitative: YES (but with limitations)
- order: YES
- length: YES (theoretically infinite but practically limited; approximately 5 types and 20 distinctions)
- Numerical readings interpreted from size changes are usually approximate and less accurate when compared to other methods.
- Using size to represent numerical data should be done with caution.
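One reason for caution is that readers tend to judge a symbol by its area, not by its radius. The Python sketch below compares a naive mapping (radius proportional to the value) with an area-proportional one; with the naive mapping, a value half the maximum is drawn with only a quarter of the area. The values and the maximum radius of 50 units are arbitrary assumptions.

```python
import math

def radius_for_value(value, max_value, max_radius=50.0):
    """Scale circle AREA (not radius) proportionally to the value."""
    return max_radius * math.sqrt(value / max_value)

def area(r):
    return math.pi * r ** 2

values = [10, 20, 40]  # hypothetical data
max_v = max(values)

for v in values:
    naive_r = 50.0 * v / max_v           # radius proportional to value: area grows quadratically
    fair_r = radius_for_value(v, max_v)  # area proportional to value
    print(v,
          "naive area share:", round(area(naive_r) / area(50.0), 3),
          "fair area share:", round(area(fair_r) / area(50.0), 3))
```

Even with the area-proportional mapping, area judgements remain approximate, which is why position is usually preferred for precise numerical readings.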
Shape
- selective: YES
- associative: YES
- quantitative: NO
- order: NO
- length: Theoretically infinite, but practically limited (association and selection ~5, distinctions ~20).
Colour
- selective: YES
- associative: YES
- quantitative: NO
- order: NO
- length: Theoretically infinite (association and selection < 7, distinctions < 10)
Order
- selective: YES
- associative: YES
- quantitative: NO
- order: YES
- length: Possibly, depends on context.
Orientation
- selective: YES
- associative: YES
- quantitative: NO
- order: NO
- length: Theoretically infinite, practically limited to 4 variants (vertical, horizontal, and two opposing diagonals).
Grain, Pattern, and Texture
- Pattern: repetitive use of shape variations.
- Grain: varying granularity.
- Texture: a characteristic of the material.
Grain
- selective: YES
- associative: YES
- quantitative: NO
- order: NO
- length: Theoretically infinite, but practically limited (association and selection: fewer than about 5)
Pattern
- The characters of pattern are basically the same as shape.
Texture
- selective: YES
- associative: YES
- quantitative: NO
- order: NO
- length: Theoretically infinite
Motion
- Selective: Probably Yes
- Associative: Yes
- Quantitative: No
- Order: Probably Yes
- Length: Considerable variations
Other Visual Variables
- Bertin's book did not include depth, occlusion, and transparency, which should be addressed.
Example: Chernoff Face
- Demonstrates representing multiple variables using facial features.
Summary
- All visual variables are selective.
- Generally, most variables are described as associative, except for shape.
- Position and size are usually considered quantitative; others are not.
- Order is usually considered related to position, but in other cases might depend on context.
- Length usually involves quantitative (but limited) measurement.
Summary of Order
- Order - Position, size, and value.
- Length - All variables have a theoretically infinite length, but in practice it is limited by display resolution.
Interpretation of Symbolic Meanings
- All variables require some mental effort for their symbolic interpretation.
- Developers often pay less attention to this aspect, but it significantly affects the complexity of reading a visualization.
Readings
- M.S.T. Carpendale "Considering Visual Variables as a Basis for Information Visualization", Technical Report, Dept. of Computer Science, University of Calgary 2001.
- Interview with Jacques Bertin.
Reference and Material to Study for These Slides
- Jacques Bertin. Semiology of Graphics: Diagrams, Networks, Maps. University of Wisconsin Press.
- Related URLs to study materials.
Data Visualization & Reporting
- An introduction to plots for continuous numerical variables.
- Prof. Antonio Irpino, Data Analytics BC.
Basic Plots
- Plot types from a geometrical point of view: Cartesian coordinates, polar coordinates (pie charts), and topological spaces (maps).
(Vertical) Bar Chart
- A bar chart is a plot based on Cartesian coordinates.
- Objects are listed on the x-axis, and their values on the y-axis.
- A vertical bar is associated with each object, whose height corresponds to its value.
- The y-axis is always numerical. A horizontal bar chart is just a vertical one with the x and y axes swapped.
Bar Chart: Input Data
- Case 1: Quantitative variable, series, Individuals (x axis), and Intensity (y axis).
- Case 2: Quantitative, Geo series, Places (x axis), and Intensity (y axis).
- Case 3: Quantitative, Time series, Time (x axis), and Intensity (y axis).
- Case 4: Quantitative discrete, Frequency table, Single values (x axis), and Frequency (y axis).
- Case 5: Qualitative, Frequency table, Categories (x axis), and Frequency (y axis)
Bar Chart: Cases 1 and 2 (Series=Individuals)
- Example of bar charts using individuals as categories (e.g., precipitation in different cities).
Bar Chart: Case 3 (Time Series)
- Time series, a sequence of observations over time.
- Example of plotting iPhone prices over time; the x-axis represents time (periods, or year/month).
Bar Chart: Case 4 (Frequency Table from a Numeric Discrete Variable)
- Frequency table, numerical, discrete variables; a way to display the frequency of each value.
- Example of housing distribution by size (a numeric discrete variable).
Bar Chart: Case 5 (Frequency Table of a Categorical Variable)
- Categorical data, frequencies; a way to display the frequency of each category (e.g., employee occupation).
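Both Case 4 and Case 5 start from a frequency table: the distinct values (or categories) become the bars and their counts become the bar heights. A minimal Python sketch with hypothetical data; listing every integer in the observed range for the discrete variable (so that a value with zero count still gets a place on the axis) is an assumption made to preserve the numeric spacing of the x axis.

```python
from collections import Counter

# Case 4: numeric discrete variable, e.g. rooms per dwelling (hypothetical data, no 4-room dwellings).
rooms = [3, 2, 3, 3, 1, 2, 5, 3]
# Case 5: categorical variable, e.g. employee occupation (hypothetical data).
jobs = ["clerical", "manager", "clerical", "custodial", "clerical", "manager"]

def frequency_table(values, order=None):
    """Return (value, count) pairs: bar positions and bar heights."""
    counts = Counter(values)
    keys = order if order is not None else sorted(counts)
    return [(k, counts.get(k, 0)) for k in keys]

# Discrete numeric variable: keep every value in the range, including zero-count ones.
print(frequency_table(rooms, order=range(min(rooms), max(rooms) + 1)))
# Categorical variable: one bar per observed category.
print(frequency_table(jobs))
```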
Sorting Bars
- The ordering of the categories matters and should be chosen according to the context.
Allowed/Not Allowed Sorting
- When the series refer to objects with no inherent order (e.g., cities), sorting the bars by value can make the chart easier to read. When the categories have an inherent order (e.g., time periods or ordinal categories), re-sorting them is not appropriate.
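The sorting rule can be written as a small helper: sort nominal series by value, but leave inherently ordered series (such as time) untouched. This Python sketch uses made-up precipitation and monthly figures; the helper name `bar_order` and breaking ties alphabetically are arbitrary choices.

```python
# Hypothetical precipitation (mm) by city: nominal objects with no inherent order.
rain = {"Naples": 1000, "Rome": 800, "Milan": 1000, "Turin": 900, "Bari": 600}
# Hypothetical monthly values: time has an inherent order and must not be re-sorted.
monthly = [("Jan", 80), ("Feb", 70), ("Mar", 65)]

def bar_order(series, inherent_order=False):
    """Return bars sorted by decreasing value, unless the categories have an inherent order."""
    items = list(series.items()) if isinstance(series, dict) else list(series)
    if inherent_order:
        return items  # keep time / ordinal order exactly as given
    return sorted(items, key=lambda kv: (-kv[1], kv[0]))  # ties broken alphabetically

print(bar_order(rain))
print(bar_order(monthly, inherent_order=True))
```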
Bar Chart: Grouping Bars
- Multiple series, grouped by one additional variable.
- Bars in the same group are placed side by side, with a larger gap separating them from the bars of the next group.
Stacked Bar Chart
- Series of positive values to be stacked on the same bar.
- Example: household data, split into male and female populations.
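Stacking means each segment starts where the previous one ends, so the segment boundaries are cumulative sums (roughly what ggplot2's `position_stack()` does for stacked bars). A hedged Python sketch with hypothetical household counts; the function name `stack_segments` is illustrative only.

```python
# Hypothetical household counts per region, split by sex.
data = {
    "North": {"Male": 120, "Female": 80},
    "South": {"Male": 90, "Female": 110},
}

def stack_segments(groups, order):
    """For each bar, return (segment, bottom, top) so segments sit on top of each other."""
    out = {}
    for bar, values in groups.items():
        bottom, segs = 0, []
        for key in order:
            v = values[key]
            if v < 0:
                raise ValueError("stacked bars expect non-negative values")
            segs.append((key, bottom, bottom + v))  # segment spans [bottom, bottom + v]
            bottom += v
        out[bar] = segs
    return out

for bar, segs in stack_segments(data, ["Male", "Female"]).items():
    print(bar, segs)
```

The check for negative values reflects the requirement in the notes that stacked series consist of positive values.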
Stacked Percentage Bar Charts
- Values as percentages, not raw data values; useful for showing composition of the whole.
- Example of percentage breakdown of how bills are made by each person.
Stacked Percentage BC: Frequency Tables (Categories Unordered/Ordered)
- Represent data as percentages showing subgroups divided by a variable/category.
- Example of occupation/gender distribution/frequencies of employees.
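For a percentage stacked bar chart, each bar's counts are divided by that bar's total so that every bar reaches 100%. A minimal Python sketch; the occupation-by-gender counts and the helper name `row_percentages` are invented for illustration.

```python
# Hypothetical occupation-by-gender counts (one bar per occupation).
table = {
    "clerical": {"F": 60, "M": 40},
    "custodial": {"F": 0, "M": 20},
    "manager": {"F": 5, "M": 15},
}

def row_percentages(tab):
    """Convert each bar's counts to percentages of that bar's total (each bar sums to 100)."""
    result = {}
    for row, counts in tab.items():
        total = sum(counts.values())
        if total == 0:
            continue  # an empty bar has no composition to show
        result[row] = {k: 100 * v / total for k, v in counts.items()}
    return result

for row, pct in row_percentages(table).items():
    print(row, {k: round(v, 1) for k, v in pct.items()})
```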
Some Notes and Tips When Using Bar Charts
- The y-axis should always start at zero: bar heights are read as values, so a truncated axis exaggerates the differences between bars. Not doing this is a very common way of misleading the public.
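A quick calculation shows the size of the distortion. Bars are compared by their drawn length, and a truncated axis changes the ratio of those lengths. In this hypothetical Python example, a value 10% larger than another is drawn twice as tall once the axis starts at 45 instead of 0.

```python
def apparent_ratio(a, b, baseline=0.0):
    """Ratio of drawn bar heights for values a and b when the y axis starts at `baseline`."""
    return (b - baseline) / (a - baseline)

a, b = 50, 55  # hypothetical values: b is 10% larger than a
print(apparent_ratio(a, b))               # 1.1  (axis starts at zero)
print(apparent_ratio(a, b, baseline=45))  # 2.0  (truncated axis)
```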
Graphical Representations of Functions with Two Variables
- Functions often involve more than one variable; visualizing them becomes more complex.
- Example: a 2D plane or a 2D Gaussian density function.
Level Curves
- Level curves (or contour lines) are curves on a 2D plot along which a function of two variables takes a constant value.
- They are useful in certain types of maps, for instance topographic maps (lines of constant elevation) or weather maps (e.g., lines of constant pressure).
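For the 2D Gaussian density used as an example above, level curves can be computed exactly. Assuming an isotropic Gaussian centred at the origin with standard deviation σ, the density is f(x, y) = exp(-(x² + y²) / (2σ²)) / (2πσ²), so the level curve f = c is a circle of radius r = sqrt(-2σ² · ln(2πσ² · c)), defined for 0 < c < 1/(2πσ²). The Python sketch below samples points on such circles and checks that the density is the same at each of them; the chosen levels and function names are arbitrary.

```python
import math

def gaussian2d(x, y, sigma=1.0):
    """Isotropic 2D Gaussian density centred at the origin."""
    return math.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * math.pi * sigma**2)

def level_radius(c, sigma=1.0):
    """Radius of the level curve f(x, y) = c (a circle in the isotropic case)."""
    peak = 1 / (2 * math.pi * sigma**2)
    if not 0 < c < peak:
        raise ValueError("level must lie strictly between 0 and the peak density")
    return math.sqrt(-2 * sigma**2 * math.log(c / peak))

def level_curve(c, sigma=1.0, n=8):
    """Return n points lying on the contour line f(x, y) = c."""
    r = level_radius(c, sigma)
    return [(r * math.cos(2 * math.pi * k / n), r * math.sin(2 * math.pi * k / n))
            for k in range(n)]

for c in (0.10, 0.05, 0.01):
    pts = level_curve(c)
    print(c, "radius:", round(level_radius(c), 3),
          "density at 3 points:", [round(gaussian2d(x, y), 4) for x, y in pts[:3]])
```

Lower levels give larger circles, which is why contour plots of a density show nested rings around the peak.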
3D Plots
- 3D plots (e.g., 3D histograms, 3D scatterplots) can be helpful when you need to show how values vary across three variables at once, since the third dimension provides an extra axis.
- The main problem with 3D plots is that they are often hard to read: values and relationships are difficult to judge reliably from a projected, perspective view.
Other Visualization Techniques for High Dimensional Data
- Parallel coordinates: each individual (row) is represented as a line crossing a set of parallel axes, one per variable. On each axis, the crossing point marks the row's value for that variable. Useful for exploring the relationships among several variables.
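The parallel-coordinates mapping can be sketched in a few lines: every row becomes a polyline with one vertex per axis. Rescaling each variable to [0, 1] on its own axis is a common convention but an assumption here, as are the variable names and values.

```python
# Hypothetical rows with three numeric variables.
rows = [
    {"v1": 20.0, "v2": 100, "v3": 3.0},
    {"v1": 30.0, "v2": 60, "v3": 2.0},
    {"v1": 10.0, "v2": 200, "v3": 5.0},
]
axes = ["v1", "v2", "v3"]

def parallel_coordinates(rows, axes):
    """Map each row to a polyline: one vertex per axis, y rescaled to [0, 1] per variable."""
    lo = {a: min(r[a] for r in rows) for a in axes}
    hi = {a: max(r[a] for r in rows) for a in axes}
    lines = []
    for r in rows:
        line = []
        for i, a in enumerate(axes):
            span = hi[a] - lo[a]
            y = (r[a] - lo[a]) / span if span else 0.5  # constant variable -> middle of axis
            line.append((i, y))                          # x = axis index, y = crossing point
        lines.append(line)
    return lines

for line in parallel_coordinates(rows, axes):
    print([(x, round(y, 2)) for x, y in line])
```

Lines that cross between two adjacent axes suggest a negative relationship between those two variables; roughly parallel segments suggest a positive one.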
How Data Are Organized in a Table (Adjacency Matrix)
- A matrix whose rows and columns both correspond to nodes; each cell records the connection between two nodes (e.g., relational data, networks).
Another Compact Way (Origin-Destination List)
- For each node, a list of the nodes it is connected to. This is more compact than an adjacency matrix when most pairs of nodes are not connected.
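The two representations can be converted into each other. A minimal Python sketch for an unweighted, directed network with four hypothetical nodes; for a weighted graph the matrix cells would hold edge weights instead of 1.

```python
# Hypothetical directed network with 4 nodes.
nodes = ["A", "B", "C", "D"]
adj = [
    [0, 1, 1, 0],  # A -> B, A -> C
    [0, 0, 1, 0],  # B -> C
    [0, 0, 0, 0],  # C has no outgoing links
    [1, 0, 0, 0],  # D -> A
]

def matrix_to_list(nodes, adj):
    """Adjacency matrix -> origin-destination list (one list of targets per node)."""
    return {src: [nodes[j] for j, w in enumerate(row) if w] for src, row in zip(nodes, adj)}

def list_to_matrix(nodes, od):
    """Origin-destination list -> adjacency matrix (1 = link present)."""
    index = {n: i for i, n in enumerate(nodes)}
    m = [[0] * len(nodes) for _ in nodes]
    for src, targets in od.items():
        for dst in targets:
            m[index[src]][index[dst]] = 1
    return m

od = matrix_to_list(nodes, adj)
print(od)
assert list_to_matrix(nodes, od) == adj  # the round trip recovers the matrix
```

Here the matrix stores 16 cells for only 4 links, while the list stores just the 4 links.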
More
- Networks: a network is a graph composed of nodes and connections (edges); it can be used to visualize complex relational data.
- A graph alone might be insufficient; often useful to use variables (e.g., gender, type of relationships, etc.) to enrich it.
- Many variations: directed graphs (with arrows, signifying direction), and weighted graphs (with values on edges).
Visualizations for Very Large Data Tables
- Methods such as dimensionality reduction, biplots, and parallel coordinates can be useful for handling large amounts of data.
Dimensionality Reduction Techniques/Example
- Principal Component Analysis (PCA) is a technique to reduce the dimensionality of data, keeping the essential information, and simplifying the problem.
- PCA works by identifying a new set of orthogonal variables, called principal components, that capture the directions of greatest variation in the data; keeping only the first few components preserves a substantial amount of the information.
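For two variables, the eigen-decomposition behind PCA has a closed form, which makes the idea easy to see without a linear-algebra library. The Python sketch below (hypothetical data, sample covariance with n - 1 in the denominator) finds the first principal component, the share of variance it explains, and the scores (projections) on it. Real analyses would use a library routine such as R's `prcomp()`.

```python
import math

# Hypothetical data that vary mostly along the diagonal.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 1.9, 3.2, 3.9, 5.1]

def pca_2d(xs, ys):
    """PCA of 2D data via the closed-form eigen-decomposition of the 2x2 covariance matrix."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) ** 2 for x in xs) / (n - 1)                    # var(x)
    c = sum((y - my) ** 2 for y in ys) / (n - 1)                    # var(y)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)  # cov(x, y)
    mid = (a + c) / 2
    rad = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    l1, l2 = mid + rad, mid - rad                # eigenvalues, l1 >= l2
    if b != 0:
        vx, vy = l1 - c, b                       # eigenvector for l1
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    pc1 = (vx / norm, vy / norm)
    pc2 = (-pc1[1], pc1[0])                      # orthogonal to pc1
    scores = [(x - mx) * pc1[0] + (y - my) * pc1[1] for x, y in zip(xs, ys)]
    return pc1, pc2, (l1, l2), scores

pc1, pc2, (l1, l2), scores = pca_2d(xs, ys)
print("PC1 direction:", [round(v, 3) for v in pc1])
print("share of variance explained by PC1:", round(l1 / (l1 + l2), 3))
```

Keeping only the PC1 scores reduces the data from two variables to one while retaining almost all of the variance in this example.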
Some Examples of Visualizations
Bivariate Data Analysis, Correlation coefficient
- A visualization of the relationship between two numerical variables, such as a scatter plot, can show whether the values increase or decrease together, or whether there is no relationship.
- The correlation coefficient measures the strength and direction of this relationship.
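Two common coefficients are the Bravais-Pearson coefficient, which measures linear association, and Spearman's rank coefficient, which is Pearson's coefficient computed on ranks and is therefore less sensitive to outliers. The Python sketch below implements both from scratch on hypothetical data containing one extreme value; giving tied values the average of their ranks is the usual convention. In R, `cor(x, y, method = "spearman")` computes the rank version.

```python
import math

def pearson(x, y):
    """Bravais-Pearson linear correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(v):
    """Ranks 1..n; tied values receive the average of their positions."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i..j, converted to 1-based ranks
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 5, 7, 9, 100]  # monotone, but the last value is an outlier (hypothetical data)
print("Pearson:", round(pearson(x, y), 3), "Spearman:", round(spearman(x, y), 3))
```

The single outlier pulls Pearson's coefficient well below 1, while Spearman's coefficient still reports a perfectly monotone relationship.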
- Methods like correlation plots (correlograms for example), or heatmaps can be useful to represent the relationship among multiple variables.
Correlograms
- A correlogram shows the correlation coefficient among all pairs of quantitative variables.
- It is a matrix of tiles, the color/intensity of the tile corresponds to the correlation coefficient.
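A correlogram is the matrix of pairwise coefficients plus a colour rule for the tiles. The following Python sketch computes the matrix for a small hypothetical data set and maps each coefficient to an RGB tile colour; the white-to-red (positive) and white-to-blue (negative) scale is an assumed palette, not one prescribed by the notes. In R, `cor()` applied to a data frame of numeric columns returns the same kind of matrix.

```python
import math
from itertools import combinations

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data set with three quantitative variables.
data = {
    "height": [160, 170, 175, 180, 185],
    "weight": [55, 65, 70, 80, 85],
    "age":    [40, 25, 35, 30, 20],
}

def correlogram(data):
    """Correlation matrix over all pairs of variables: the values behind the tiles."""
    names = list(data)
    m = {a: {b: 1.0 for b in names} for a in names}  # diagonal: each variable with itself
    for a, b in combinations(names, 2):
        m[a][b] = m[b][a] = pearson(data[a], data[b])
    return m

def tile_colour(r):
    """Map r in [-1, 1] to RGB: white at 0, red at +1, blue at -1 (assumed palette)."""
    shade = round(255 * (1 - abs(r)))
    return (255, shade, shade) if r >= 0 else (shade, shade, 255)

m = correlogram(data)
for a in m:
    print(a.ljust(7), " ".join(f"{m[a][b]:+.2f}" for b in m))
```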
Heatmaps
- A heatmap represents data in a table by using colors to represent different values. It's a matrix of colored tiles, where the intensity/color of the tile is proportional to the value in the cell. The color palette can provide a visual summary of the relations among variables in a data set.
- Useful in cases like microarray data (the intensities of the expression levels of thousands of genes).
Other Types of Data, Variable Plots and Considerations
- Visualizations such as scatterplots that show how different factors (and their combination) are related.
- Mosaic plots, chord diagrams, parallel sets.