Graph Selection and Types

Questions and Answers

A company wants to compare the sales performance of four different product lines over the last quarter. Which chart type is most suitable for this?

  • Pie Chart
  • Word Cloud
  • Bar Chart (correct)
  • Scatter Plot

A marketing team needs to display customer satisfaction scores before and after implementing a new service design. Which of the following chart types would be most effective?

  • Proportional Shape Chart
  • Clustered Bar Chart
  • Connected Dot Plot (Dumbbell Chart) (correct)
  • Bubble Chart

Which chart type is most appropriate for visualizing the share of different departments that make up a company's total expenses?

  • Scatter Plot
  • Clustered Bar Chart
  • Stacked Bar Chart (correct)
  • Line Chart

You want to compare the distribution of income across different regions, highlighting median values, quartiles, and potential outliers. Which chart type is best suited for this purpose?

  • Box and Whisker Plot (correct)

Which of the following chart types is LEAST suitable when precise comparison between data points is critical?

  • Proportional Shape Chart (correct)

A human resources manager wants to visualize the frequency of different skills mentioned in employee feedback. Which of the following chart types is most appropriate?

  • Word Cloud (correct)

An analyst needs to display the flow of funds within a company, indicating the magnitude of transfers between different departments. Which of the following chart types would be most effective?

  • Sankey Diagram (correct)

What type of chart would be used to showcase the min-max ranges across several categories?

  • Range Chart (correct)

An environmental scientist wants to represent cyclic data, for example the fluctuation of pollution levels over a year's time. Which chart adapts best to the representation of this data?

  • Polar Chart (correct)

What is the main purpose of 'imputation' in data cleaning?

  • Filling missing data using various methods (correct)

A project manager needs a visualization to track the timeline of different tasks in a construction project. Which chart type would be most appropriate?

  • Gantt Chart (correct)

Which of the following charts is most suitable when you have many categories and want to display hundreds of survey results?

  • Piled Bars (correct)

An economist wants to compare demographic splits such as age and gender in two opposing groups to efficiently display the differences. The best chart type would be:

  • Back-to-Back Bar Chart (correct)

Which chart is effective at comparing multiple variables radiating from a center, highlighting the outliers well?

  • Radar Chart (correct)

What term does Excel use for the dependent variable in regression analysis, and what are the independent variables called?

  • The dependent variable is the Y variable (Input Y Range), and the independent variables are the X variables (Input X Range) (correct)

What does a low p-value (typically < 0.05) indicate when checking if a regression model is good?

  • The model is statistically significant (correct)

In the context of regression analysis, what does the R-squared value represent?

  • The proportion of the variance in the dependent variable that is predictable from the independent variable(s) (correct)

When would it be inappropriate to use linear regression?

  • Predicting pass/fail outcomes (correct)

In analyzing internal company data, you find several entries with similar data but slightly different formatting. In the context of data analytics, what is the most suitable action to take?

  • Normalise the data so there is one value per cell, and identify a primary key for uniqueness (correct)

What is a key characteristic of a treemap chart that differentiates it from other chart types?

  • It is space-efficient and highlights large items of data (correct)

Flashcards

Bar Chart

Graph that compares values across different categories.

Clustered Bar Chart

Graph that compares values from multiple groups.

Connected Dot Plot

Graph that shows change between two points.

Proportional Shape Chart

Chart with sized shapes representing parts of a whole. Eye-catching and good for visual storytelling.

Bubble Chart

Graph that plots 3+ variables (X, Y, size). Shows clusters visually.

Radar Chart

Graph with multivariable comparisons radiating from the center; good for highlighting outliers.

Polar Chart

Radial graph for cyclic data.

Range Chart

Graph that shows min-max values for categories.

Box and Whisker Plot

Graph that displays distribution with median, quartiles, and outliers.

Histogram

Graph with a binned frequency distribution.

Word Cloud

Graph where the size of the words shows frequency. Good for qualitative data.

Pie Chart

Circle divided into slices representing parts of a whole.

Stacked Bar Chart

Bars divided into sections.

Back-to-Back Bar Chart

Two bar charts facing opposite directions.

Treemap

Rectangles sized by value; good for big datasets.

Sunburst Chart

Circular treemap showing hierarchy.

Line Chart

Connects points to show trend over time.

Regression

Used to model and predict the relationships between variables.

Simple Linear Regression

Models a relationship between one independent variable and one dependent variable.

Multiple Linear Regression

Models a relationship between several independent variables and one dependent variable.

Study Notes

Graph Selection

  • Consider perception, comparison, interpretation, and summarization when choosing a graph
  • Perception involves how the graph is perceived
  • Comparison involves making comparisons using the graph
  • Interpretation considers what "larger is better" means in the graph
  • Summarization (comprehension) involves summarizing the information the graph presents

Bar Chart

  • Compares a set of values across different categories
  • Easy to understand, good for comparison, and flexible
  • Can be crowded if there are too many bars
  • Not suitable for showing trends
  • Needs careful ordering
  • Used for comparing values across categories
  • Should not be used for showing parts of a whole
  • Example: Comparing sales across different product types

Clustered Bar Chart

  • Compares values from multiple groups
  • Provides clear group comparison and a familiar layout
  • Effective for a small number of groups
  • Can be confusing with too many groups
  • Requires a legend
  • Takes up a lot of space
  • Example: Showing male vs. female performance across departments

Connected Dot Plot (Dumbbell/DNA Chart)

  • Shows the change between two points
  • Highlights change neatly, is compact, and is less cluttered
  • Can be tricky for many points and requires labels
  • Not intuitive
  • Used for showing before/after comparisons
  • Should not be used for very large datasets
  • Example: Change in customer satisfaction scores before and after a service redesign

Proportional Shape Chart

  • Uses sized shapes to represent parts of a whole
  • Eye-catching, good for visual storytelling, and shows proportions
  • Hard to compare accurately and needs annotation
  • Area can mislead
  • Used for showing parts of a whole where precision is not critical
  • Should not be used when precise comparison is needed

Bubble Chart

  • Plots three or more variables (X, Y, size)
  • Provides visual richness, shows clusters, and is dynamic
  • Can be hard to read sizes, crowded, and color/size can mislead
  • Used for showing relationships among multiple variables
  • Should not be used for two-variable simple comparisons
  • Example: Showing GDP, population, and CO2 emissions across countries

Radar Chart (Spidergram)

  • Compares multiple variables radiating from the center
  • Great for skill/attribute comparison and highlights outliers
  • Overlapping lines can be hard to read and the scale can be confusing
  • Used for profile comparison
  • Should not be used for linear data
  • Example: Comparing player performance across multiple skills

Polar Chart

  • Radial graph for cyclic data
  • Good for circular patterns and engaging
  • Poor comparison across distance and nonlinear scaling
  • Used for showing seasonal/cyclic data
  • Should not be used for non-cyclic relationships
  • Example: Monthly sales trends shown in a circular format

Range Chart

  • Shows the minimum and maximum values for categories
  • Highlights variability, simple, and easy
  • Does not show distribution and can be confusing with many items
  • Used for showing value ranges
  • Should not be used for showing whole distribution
  • Example: Showing temperature ranges across cities
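
As a rough illustration of what a range chart plots, the sketch below reduces each category to its minimum and maximum. The city names and temperatures are made-up sample values, and ranges is a hypothetical helper name, not something from the notes.

```python
# Hypothetical temperature readings (degrees C) per city; values are illustrative only.
temps = {
    "Oslo": [-3, 1, 4, 9],
    "Cairo": [14, 22, 30, 35],
    "Lima": [16, 18, 21, 24],
}

def ranges(data):
    """Return {category: (minimum, maximum)} for each category."""
    return {name: (min(vals), max(vals)) for name, vals in data.items()}

city_ranges = ranges(temps)
for city, (low, high) in city_ranges.items():
    # A range chart would draw one bar per city spanning low to high.
    print(f"{city}: {low} to {high} (spread {high - low})")
```

Only the two extremes survive this reduction, which is why the notes warn that a range chart does not show the whole distribution.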

Box and Whisker Plot

  • Displays distribution with median, quartiles, and outliers
  • Spots outliers quickly, summarizes distributions, and is compact
  • Hard for beginners, no exact numbers, and less intuitive
  • Used for showing variation across groups
  • Should not be used for audiences unfamiliar with boxplots
  • Example: Income distribution across regions
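
The numbers behind a box and whisker plot can be sketched with Python's standard statistics module. The incomes below are invented sample values, and flagging outliers beyond 1.5 times the interquartile range is a common convention assumed here, not a rule stated in the notes.

```python
import statistics

# Hypothetical regional incomes (in thousands); illustrative values only.
incomes = [22, 25, 27, 30, 31, 33, 35, 38, 40, 95]

# Quartiles: the three cut points that split the sorted data into four parts.
q1, q2, q3 = statistics.quantiles(incomes, n=4, method="inclusive")
median = statistics.median(incomes)

# Assumed convention: points beyond 1.5 * IQR from the box are outliers.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in incomes if v < lower_fence or v > upper_fence]

print("Q1:", q1, "median:", median, "Q3:", q3)
print("Outliers:", outliers)
```

Different quartile methods give slightly different cut points, so a charting tool may draw the box edges a little differently from this sketch.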

Histogram

  • Binned frequency distribution
  • Shows distribution shape, highlights skewness, and is easy to spot peaks
  • Sensitive to bin width and poor category comparison
  • Used for showing spread of continuous data
  • Should not be used for category comparisons
  • Example: Distribution of students' test scores
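
To see why a histogram is sensitive to bin width, the sketch below bins the same scores twice. The test scores are made-up sample values and histogram is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical test scores; illustrative values only.
scores = [52, 55, 61, 64, 67, 70, 71, 73, 78, 85, 88, 93]

def histogram(values, bin_width):
    """Count values per bin, keyed by each bin's lower edge."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

narrow = histogram(scores, 10)  # bins 50-59, 60-69, ...
wide = histogram(scores, 20)    # bins 40-59, 60-79, ...

print("width 10:", narrow)
print("width 20:", wide)
```

With a width of 10 the peak sits in the 70s; with a width of 20 most of the detail collapses into a single 60 to 79 bin, so the apparent shape depends on the bin choice.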

Word Cloud

  • Size of words indicates frequency
  • Good for qualitative data, has visual impact, and is engaging
  • Poor for detailed analysis and hard to compare exactly
  • Used for highlighting the most common words
  • Should not be used for detailed text analytics
  • Example: Most common keywords in customer feedback
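
A word cloud is driven by word counts. A minimal sketch of that counting step follows; the feedback sentences and the stopword list are invented for illustration, and word_frequencies is a hypothetical helper name.

```python
import re
from collections import Counter

# Hypothetical feedback snippets; illustrative text only.
feedback = [
    "Great teamwork and communication",
    "Communication could improve",
    "Strong teamwork, strong leadership",
]

def word_frequencies(texts, stopwords):
    """Count lowercase words across all texts, skipping stopwords."""
    words = []
    for text in texts:
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in stopwords:
                words.append(word)
    return Counter(words)

freq = word_frequencies(feedback, stopwords={"and"})
# In a word cloud, a higher count would translate into a larger font size.
print(freq.most_common(3))
```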

Pie Chart

  • Circle divided into slices representing parts
  • Familiar, good with few categories, and easy to grasp
  • Bad for close comparisons and easily misinterpreted
  • Used for parts of a whole with 2-5 slices
  • Should not be used for many small categories
  • Example: Share of different expense categories in a budget

Stacked Bar Chart

  • Bars divided into sections
  • Shows composition, totals visible, and is compact
  • Hard to compare internal parts and relies on color
  • Used for showing overall totals and compositions
  • Should not be used for too many small components
  • Example: Market share by company and region

Back-to-Back Bar Chart

  • Two bar charts facing opposite directions
  • Shows demographic splits, efficient, and neat
  • Can be tricky if not balanced and labels are critical
  • Used for comparing two opposing groups
  • Should not be used for multi-group comparisons
  • Example: Male vs. female age distributions

Treemap

  • Rectangles sized by value
  • Space-efficient, highlights large items, and great for big datasets
  • Tiny blocks unreadable and positioning is meaningless
  • Used for visualizing parts of a large whole
  • Should not be used for precise comparisons
  • Example: Market share by company in a tech sector

Sunburst Chart

  • Circular treemap showing hierarchy
  • Clear hierarchy, attractive design, and engages viewers
  • Tiny slices at edges and tricky comparisons
  • Used for visualizing category breakdowns
  • Should not be used for non-hierarchical data
  • Example: Website navigation paths

Bubble Chart (Rich Version)

  • X, Y, size, and color plotted simultaneously
  • Multi-dimensional, cluster finding, and has visual impact
  • Size is hard to judge and overlapping can occur
  • Used for displaying complex multi-variable data
  • Should not be used for simple comparisons
  • Example: Global inequality vs. GDP vs. population

Chord Chart

  • Circular flow between categories
  • Many-to-many relations and elegant
  • Hard for beginners and crowded easily
  • Used for showing flows between categories
  • Should not be used for simple one-way relationships
  • Example: Migration flows between countries

Sankey Diagram

  • Flow diagrams showing magnitude
  • Great for proportional flows and storytelling power
  • Hard to design and risk of crowding
  • Used for showing energy use and money flows
  • Should not be used for basic single-step processes
  • Example: Energy input/output in a power plant

Line Chart

  • Connects points to show trend over time
  • Clear for trends, simple, and clean
  • Risk of overlap
  • Used for showing trend over time
  • Should not be used for unordered categories
  • Example: Revenue growth over 10 years

Gantt Chart

  • A project schedule visualization
  • Tracks project timelines and clear duration view
  • Messy for large projects and constant updating required
  • Used for managing projects
  • Should not be used for small unrelated tasks
  • Example: Construction project timelines

Piled Bars

  • Wrapping bars into multiple columns
  • Handles large datasets
  • Organized visually
  • Hard cross-comparison and breaks flow
  • Used for displaying hundreds of categories
  • Avoid use for small data
  • Example: Survey results with 300+ options

Zvinca Plot

  • Advanced scatter for distributions
  • Large dataset summarization and unique
  • Hard to interpret and nonstandard
  • Used for very large distribution datasets
  • Avoid use for simple distributions
  • Example: Distribution of 1 million customer ratings

Linear Regression

  • Used to estimate and model the relationships between variables
  • Dependent Variable (Y) is the outcome to predict
  • Independent Variables (X1, X2, ..., Xp) are the inputs
  • Understand the relationship between variables (descriptive analytics)
  • Predict future values (predictive analytics)

Simple Linear Regression

  • Models the relationship between an independent variable (X) and a dependent variable (Y)
  • Goal is to find a straight line that best fits the data points
  • Equation of the line: Y = mX + c
  • Example: Data points (X: 1, 2, 5, 3, 4; Y: 2, 4, 10, 6, 8), Formula learned: Y=2X

Least Squares Method

  • Finds the best-fitting line by minimizing the sum of squared errors
  • Tool in Excel: Data → Data Analysis → Regression
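
Outside Excel, the least squares line for one predictor can be sketched in a few lines of Python. The code below uses the standard closed-form formulas for slope and intercept and the data points from the simple linear regression example in these notes; fit_line is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the line minimizing the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [1, 2, 5, 3, 4]
ys = [2, 4, 10, 6, 8]
m, c = fit_line(xs, ys)
print(f"Y = {m}X + {c}")  # prints "Y = 2.0X + 0.0"
```

The result matches the formula learned in the notes, Y = 2X, with an intercept of zero.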

Multiple Linear Regression

  • Models the relationship between several independent variables and one dependent variable
  • Equation: Y = b0 + b1X1 + b2X2 + ... + bpXp
  • Example output: Q = 205 - 3.9P + 0.6I + 1.33T, where:
    • Q = Quantity (dependent variable)
    • P = Price, I = Income, T = Temperature (independent variables)
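
Once the coefficients are known, using the model is just arithmetic. The sketch below plugs values into the example equation above; the input values (price 10, income 100, temperature 20) are arbitrary illustrations, and predict_quantity is a hypothetical function name.

```python
def predict_quantity(price, income, temperature):
    """Evaluate Q = 205 - 3.9P + 0.6I + 1.33T from the example output."""
    return 205 - 3.9 * price + 0.6 * income + 1.33 * temperature

# Arbitrary illustrative inputs, not taken from the notes.
q = predict_quantity(price=10, income=100, temperature=20)
print(round(q, 2))

# Each coefficient is the change in Q for a one-unit change in that input,
# holding the others fixed: raising price by 1 changes Q by about -3.9.
delta = predict_quantity(11, 100, 20) - predict_quantity(10, 100, 20)
print(round(delta, 2))
```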

Regression Model Check

  • Look at the overall p-value (Significance F in Excel): if it is < 0.05, the model is statistically significant
  • Check individual variables: if a predictor's p-value > 0.05, it may not be useful

Linear vs Logistic Regression

  • Linear Regression:
    • Output is continuous (e.g., price, temperature)
    • Predicts a real number
    • Example: Predict a salary
  • Logistic Regression:
    • Output is categorical (e.g., pass/fail, yes/no)
    • Predicts a probability (0 to 1)
    • Example: Predict if a customer buys or not
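
The difference can be sketched in code: logistic regression takes a linear score and passes it through the logistic (sigmoid) function to get a probability. The coefficients below are invented for illustration and are not fitted to any data; pass_probability is a hypothetical function name.

```python
import math

def logistic(z):
    """Map any real-valued score z to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients for a pass/fail model driven by hours studied.
B0, B1 = -4.0, 0.8

def pass_probability(hours):
    # A linear score, the same form linear regression would output ...
    score = B0 + B1 * hours
    # ... squashed into the 0 to 1 range by the logistic function.
    return logistic(score)

for hours in (2, 5, 8):
    p = pass_probability(hours)
    label = "pass" if p >= 0.5 else "fail"
    print(f"{hours} h -> p(pass) = {p:.2f} ({label})")
```

A plain linear model would return the raw score, which can fall below 0 or above 1 and so cannot be read as a probability.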

Linear Regression Use Cases

  • Simple, easy to understand and apply
  • Works well when the relationship between the variables is linear and strong
  • Provides a clear mathematical formula
  • Forecasting sales based on advertising spend
  • Predict house prices based on size, location, and number of bedrooms
  • Estimating students' final grades
  • Modeling relationship between height and weight

Linear Regression Bad Use Cases

  • Predicting with categorical variables only
  • Predicting pass/fail outcomes
  • Modeling non-linear relationships
  • When data has many outliers or is highly skewed

Linear Regression in Excel

  • Go to Data → Data Analysis → Regression
  • Set Input Y Range (dependent variable) and Input X Range (independent variable[s])
  • Check p-values for significance and coefficients to build your formula

Regression Statistics Metrics

Multiple R

  • Correlation coefficient between predicted and actual values
  • Ranges from 0 to 1; values closer to 1 indicate a stronger linear relationship

R Square

  • Coefficient of determination
  • Shows the proportion of the variance in Y that is explained by the X variables

Adjusted R Square

  • Adjusts R² for the number of predictors
  • Prevents overestimating model quality
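
Both metrics can be computed by hand from actual and predicted values. The sketch below uses the standard definitions, R² = 1 - SS_residual / SS_total and adjusted R² = 1 - (1 - R²)(n - 1) / (n - p - 1); the sample values are invented and the function names are hypothetical.

```python
def r_squared(actual, predicted):
    """Share of the variance in the actual values explained by the predictions."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_y) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

def adjusted_r_squared(r2, n, p):
    """Penalize R squared for the number of predictors p, given n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical observed values and model predictions; illustrative only.
actual = [1, 2, 3, 4, 5]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]

r2 = r_squared(actual, predicted)
adj = adjusted_r_squared(r2, n=len(actual), p=1)
print(round(r2, 4), round(adj, 4))
```

Adjusted R² is never larger than R², and the gap widens as predictors are added without a matching gain in fit.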

Standard Error

  • Average distance that the observed values fall from the regression line
  • Lower is better

Observations

  • The number of data points used in the analysis

ANOVA Table Components

  • df (degrees of freedom): Number of values that can vary; Regression df = number of predictors
  • SS (Sum of Squares): Measures variation; the total is divided into Regression SS (explained) and Residual SS (unexplained)
  • MS (Mean Square): SS divided by df; used to calculate the F-statistic
  • F: F-statistic for model fit; higher F = more likely at least one predictor is useful
  • Significance F: p-value for the F-test; if Significance F < 0.05, the model is statistically significant
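
How the ANOVA columns relate to each other can be sketched for a one-predictor model. The data below are invented sample values; Significance F is left out because it needs the F distribution, which Python's standard library does not provide.

```python
# Hypothetical data for a one-predictor regression; illustrative values only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least squares fit (slope and intercept).
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
intercept = mean_y - slope * mean_x
predicted = [intercept + slope * x for x in xs]

# SS: total variation split into explained and unexplained parts.
ss_total = sum((y - mean_y) ** 2 for y in ys)
ss_residual = sum((y - p) ** 2 for y, p in zip(ys, predicted))
ss_regression = ss_total - ss_residual

# df: regression df = number of predictors; residual df = n - predictors - 1.
df_regression = 1
df_residual = n - df_regression - 1

# MS = SS / df, and F = MS regression / MS residual.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual

print(f"SS reg={ss_regression:.2f} SS res={ss_residual:.2f} F={f_stat:.2f}")
```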

Coefficients Table

  • Coefficients: Numbers you plug into the regression equation
  • Standard Error: Precision of each coefficient estimate; smaller = more precise estimate
  • t Stat: Coefficient divided by its standard error; tests whether the coefficient is significantly different from zero, and a higher absolute value = more significant
  • P-value: Probability of seeing a t Stat at least this extreme if the true coefficient were zero (no effect); if p-value < 0.05, that variable is statistically significant
  • Lower 95% and Upper 95%: 95% confidence interval for each coefficient; intervals built this way are expected to contain the true coefficient about 95% of the time
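
The columns of the coefficients table are linked by simple arithmetic, sketched below. The coefficient -3.9 is the Price coefficient from the example equation in these notes; the standard error of 1.2 and the 8 residual degrees of freedom are assumptions made up for illustration. The critical value 2.306 is the two-sided 95% t value for 8 degrees of freedom.

```python
def t_stat(coefficient, standard_error):
    """t Stat = coefficient divided by its standard error."""
    return coefficient / standard_error

def confidence_interval(coefficient, standard_error, t_critical):
    """Return (lower, upper) bounds: coefficient +/- t_critical * standard error."""
    margin = t_critical * standard_error
    return coefficient - margin, coefficient + margin

coef = -3.9      # Price coefficient from the example equation
se = 1.2         # hypothetical standard error (assumption)
t_crit = 2.306   # two-sided 95% critical t value for 8 residual df (assumed df)

t = t_stat(coef, se)
lower, upper = confidence_interval(coef, se, t_crit)
print(f"t Stat = {t:.2f}, 95% CI = ({lower:.4f}, {upper:.4f})")

# If the interval excludes zero, |t| exceeds the critical value,
# which corresponds to a p-value below 0.05.
significant = abs(t) > t_crit
```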

Data Analytics Pipeline

  • Acquire Data: Collect data from sources such as CSV files, web scraping, internal/external systems, and open data platforms; always check data formats and licensing
  • Clean Data, Normalise Data, Impute if Necessary
    • Clean removes duplicates, fixes incorrect entries, standardizes formats
    • Normalise ensures one value per cell, creates primary keys, and enforces data types
    • Imputation fills missing values

Statistical Understanding

  • Familiarise yourself with the data by running descriptive statistics and looking at distributions
  • Develop instinct or intuition by asking: "What is unusual here?" and "What factors might influence this outcome?"
  • Formulate a hypothesis, e.g., higher seat rows correlate with lower Year 1 scores
  • Apply reasoning through correlations, regressions, t-tests, and chi-square tests to see if these assumptions hold true
  • Visualise data using bar charts (categories), line charts (trends), scatter plots (relationships), and histograms (distribution shapes); always pick the simplest chart
  • Write the narrative as a clear story
  • Conclude by prioritising the most important findings to guide future decisions
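
As one way to check a hypothesis like the seat row example, a Pearson correlation can be computed by hand. The seat rows and scores below are invented purely to show the calculation and say nothing about real students; pearson_r is a hypothetical helper name.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: seat row number and Year 1 score for five students.
seat_rows = [1, 2, 3, 4, 5]
year1_scores = [80, 78, 75, 70, 68]

r = pearson_r(seat_rows, year1_scores)
print(f"r = {r:.3f}")
```

A strongly negative r would be consistent with the hypothesis, but with so few points a t-test or regression p-value would still be needed before drawing a conclusion, and correlation alone does not show that seating causes the scores.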
