Chapter 3: Numerical Data Description

Questions and Answers

Which measure of central tendency is most affected by extreme values (outliers)?

  • Interquartile Range
  • Mode
  • Mean (correct)
  • Median

The median is always equal to the mean in a symmetric distribution.

True (A)

What is the primary use of the mode as a measure of central tendency?

Nominal data

Data are considered to be right-skewed if the mean is ______ than the median.

Larger

Which of the following is LEAST affected by extreme values?

Interquartile range (D)

The range provides information about the spread of a dataset and is calculated by subtracting the smallest value from the largest value.

True (A)

What is the interquartile range (IQR) used for in statistical analysis?

Measuring the spread of the middle 50% of the data

The interquartile range is the difference between the ______ quartile and the first quartile.

Third

Which of the following describes the relationship between variance and standard deviation?

Standard deviation is the square root of the variance. (D)

The variance can be a negative value.

False (B)

Why is standard deviation a commonly used measure of variation?

Shows variation about the mean

Standard deviation is the square root of the ______.

Variance

What does the coefficient of variation (CV) measure?

Relative variability (C)

A higher coefficient of variation indicates lower variability in the data.

False (B)

What is the purpose of calculating the coefficient of variation?

To compare the variability of datasets with different units or means

The coefficient of variation is calculated by dividing the standard deviation by the ______ and multiplying by 100.

Mean

Match the measure to its characteristic:

  • Mean = Affected by extreme values
  • Median = Center value when data is ordered
  • Mode = Most frequent value
  • Range = Difference between max and min values

In a dataset of professor salaries, the mean is $170,571, and the median is $155,000. What does this suggest about the distribution?

The distribution is skewed to the right. (C)

When the mean is less than the median, the data is left-skewed.

True (A)

How is data skewed if the mean is smaller than the median?

Left-skewed

If data is skewed towards the left, the ______ is typically less than the median.

Mean

Which type of data level is the mode most useful for?

Nominal (B)

The median can only be used with interval and ratio data.

False (B)

For what types of data are mean, median, and mode appropriate to use?

Mean: ratio/interval, Median: interval/ratio/ordinal, Mode: nominal/ordinal/interval/ratio

The ______ only uses the center values in its calculation.

Median

Match each measure with its corresponding formula.

  • Sample Mean = $\frac{\sum_{i=1}^{n} x_i}{n}$
  • Population Mean = $\frac{\sum_{i=1}^{N} x_i}{N}$
  • Sample Variance = $\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$
  • Population Variance = $\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$

The following data represent a sample: 22, 13, 10, 16, 23, 13, 11, 13. What is the mean?

15.125 (D)

The 1st quartile is the same as the 50th percentile.

False (B)

What percentage of data is greater than the 1st quartile?

75

The second quartile is also known as the ______.

Median

Given a data set of 19 values, at what position would the 60th percentile be found?

12th (B)

If the given position for the 75th percentile is 7, the 75th percentile is the 7th element in the dataset.

False (B)

Why does the interquartile range ignore outliers?

Removes the highest and lowest value observations

The measure that eliminates some outlier problems is the ______.

Interquartile Range

Match the type of variability with each condition.

  • Small Variability = Coefficient of variation is less than 25%
  • Moderate Variability = Coefficient of variation is around 25%
  • Large Variability = Coefficient of variation is more than 25%

A new advertising campaign has a follow-up survey, and from 150 individuals contacted, 45 of them could recognize the new advertising slogan. Which of the following is the proportion of the recognition?

0.3 (A)

The coefficient of variation can be negative.

False (B)

How should numerical descriptive measures be presented ethically?

Numerical descriptive measures should be presented in a fair, objective, and neutral manner. Inappropriate summary measures should not be used to distort the facts.

A measure of central location that is computed from a sample rather than a population is known as a ______.

Statistic

Flashcards

What is the 'Mean'?

The arithmetic average of a data set.

What is the 'Median'?

The center value that divides data into two halves when arranged numerically.

What is the 'Mode'?

The value in a data set that occurs most frequently.

What is right-skewed data?

Data are right skewed if the mean is larger than the median.

What is left-skewed data?

Data are left skewed if the mean is smaller than the median.

What are 'Quartiles'?

Values that divide a data array into four equal-sized groups.

What is a percentile?

The pth percentile is the value in an ordered array of n values located at the index point i = (p/100)(n).

What is the 'Range'?

The simplest measure of variation: the difference between the largest and smallest observations.

What is Interquartile Range?

A measure of variation that eliminates some outlier problems by using only the middle 50% of the data: the difference between the third quartile and the first quartile.

What is the 'Variance'?

The average of the squared distances of the data values from the mean.

What is the 'Standard Deviation'?

The positive square root of the variance; has the same units as original data.

What is the Coefficient of Variation?

A ratio of standard deviation to the mean, expressed as a percentage.

What is a parameter?

A summary measure computed to describe a characteristic of the population.

What is a statistic?

A summary measure computed to describe a characteristic of the sample.

Study Notes

  • Chapter 3 focuses on describing data using numerical measures.
  • The focus is on computing measures of center and variability, and on using numerical measures to describe data effectively.

Summary Measures

  • Data can be described numerically through measures of center and location, other measures of location, and measures of variation.

Measures of Center and Location

  • Measures of center and location include the mean, median, and mode, providing an overview of where the data is centered.

Mean (Arithmetic Average)

  • The mean can be thought of as the balance point or center of mass of the data.
  • To calculate the mean, sum the values and divide by the number of values.
  • The population mean (μ) involves summing all values in the population (Σxᵢ) and dividing by the total number of data values (N).
  • The sample mean (x̄) sums all values in the sample (Σxᵢ) and divides by the number of values in the sample (n).
  • Extreme values (outliers) affect the mean.
  • The mean is generally used for interval/ratio data.
  • The mean is the most common measure of central tendency.
  • The formula for the population mean is μ = (Σx) / N, where N is the number of data values.
  • The average occupancy rate is found to be 15.125 rooms per week.
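
As a minimal sketch of the calculation described above, the following Python snippet computes a sample mean by summing the values and dividing by their count. It uses the eight sample values from the quiz question earlier in this lesson; the variable names are illustrative and not from the chapter.

```python
from statistics import mean

# Sample values from the quiz question in this lesson (n = 8).
sample = [22, 13, 10, 16, 23, 13, 11, 13]

# Sample mean: sum of the values divided by the number of values.
x_bar = sum(sample) / len(sample)

print(x_bar)         # 15.125
print(mean(sample))  # the standard library gives the same result
```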

Sample Mean Calculation Steps on a Sharp Calculator

  • Clear the memory before each question by pressing [2nd F], [ALPHA], [0], then [0].
  • Set the calculator to 4 decimal places by pressing [SET UP], [0], [0], then [4].
  • Use [MODE] [1] [0] to display "STAT 0."
  • Enter data using the [DATA] key {ENT}.
  • To get the mean, press [ALPHA] [x-bar] {on the 4 key}, then the = key.
  • For a set of professor salaries, the mean salary is $170,571.
  • In solved problem 3.3, the mean is found to be 14.63.

Median

  • The median is used for ordinal and interval/ratio data.
  • The median is the center value that divides sorted data into two halves.
  • If the number of data points is odd, the median is the middle number; if even, it is the average of the two middle numbers.
  • The median is not affected by extreme values.
  • For a skewed distribution, the median is a better measure of center than the mean.
  • 50% of professors earn less than $155,000 (median salary).
  • When an extreme value is included in the ordered array of professor salaries, the median remains at $155,000.

Alternative Calculation Methods

  • Method 1: Calculate i = (n+1)/2 to find the median index point.
  • If i is an integer, it corresponds to the median's position.
  • If i is not an integer, the median is the average of the values in the integer positions below and above i.
  • Method 2: Calculate i = (1/2)n as the Median Index Point.
  • If i is an integer, the median is the average of the values in positions i and i+1.
  • If i is not an integer, round up to the next integer to find the median position.
  • Example: for the ordered array 4, 4, 5, 5, 9, 11, 12, 14, 16, 19, 22, 23 with n = 12, Method 1 gives i = (12+1)/2 = 6.5 and Method 2 gives i = 12/2 = 6.
  • Either way, the median is the average of the 6th and 7th values: (11+12)/2 = 11.5.
  • In problem 3.3, the median is found to be 14.35.
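
The two index-point methods above can be sketched in Python as follows. This is an illustrative sketch, not the textbook's own code: the function names are made up, positions are 1-based as in the notes, and the input is assumed to be a non-empty list that is already sorted.

```python
import math


def median_method_1(ordered):
    """Median via the index point i = (n + 1) / 2 (1-based positions)."""
    n = len(ordered)
    i = (n + 1) / 2
    if i.is_integer():
        # i is the median's position.
        return ordered[int(i) - 1]
    # Average the values in the integer positions below and above i.
    below = math.floor(i)
    return (ordered[below - 1] + ordered[below]) / 2


def median_method_2(ordered):
    """Median via the index point i = n / 2 (1-based positions)."""
    n = len(ordered)
    i = n / 2
    if i.is_integer():
        # Average the values in positions i and i + 1.
        k = int(i)
        return (ordered[k - 1] + ordered[k]) / 2
    # Round up to the next integer to get the median position.
    return ordered[math.ceil(i) - 1]


ordered = [4, 4, 5, 5, 9, 11, 12, 14, 16, 19, 22, 23]
print(median_method_1(ordered))  # 11.5
print(median_method_2(ordered))  # 11.5
```

Both methods give the same answer for any sorted list; they differ only in how the position is located.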

Mode

  • The mode as a measure of central tendency is the value in a data set that occurs most frequently.
  • The mode is used for either numerical or nominal (categorical) data, making it useful for nominal data.
  • Datasets may lack a mode, or exhibit multiple modes if several values share the highest frequency.
  • Extreme values do not affect the mode.
  • In Example 3-6 a mode group found to be 2 groups of size 4.

Skewed Data

  • In right-skewed data, the mean is larger than the median.
  • In left-skewed data, the mean is smaller than the median.

Shape of a Distribution

  • The shape of a distribution describes how the data are distributed.
  • Distributions can be symmetric or skewed.
  • In a left-skewed distribution, the mean is less than the median.
  • In a symmetric distribution, the mean equals the median.
  • In a right-skewed distribution, the median is less than the mean.
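
The comparison of mean and median above can be expressed as a small helper. This is only a sketch of the rule of thumb stated in these notes; the function name is hypothetical, and equal mean and median are reported as "symmetric" only in the sense used by the notes.

```python
def describe_shape(mean, median):
    """Rule of thumb from the notes: compare the mean with the median."""
    if mean > median:
        return "right-skewed"
    if mean < median:
        return "left-skewed"
    return "symmetric"


# Professor salaries from this lesson: mean $170,571, median $155,000.
print(describe_shape(170_571, 155_000))  # right-skewed
```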

Descriptive Measures

  • Descriptive measures are summarized in Figure 3-6.
  • The mean is used for ratio/interval data and is sensitive to extreme values.
  • The median is used for ratio/interval and ordinal data; it does not use all of the data values in its calculation.
  • The mode is used for ratio/interval, ordinal, and nominal data; however, it may not reflect the center.

Other Measures of Location

  • Percentiles and quartiles are the two other measures of data location.

Percentiles

  • To locate the pth percentile in an ordered array of n values, calculate the index point i = (p/100)(n).
  • If i is not an integer, round up to the next integer; the value in that position is the pth percentile.
  • If i is an integer, the pth percentile is the average of the values in positions i and i+1.
  • Example: for the 60th percentile of 19 ordered values, i = (60/100)(19) = 11.4, which rounds up to position 12.
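
A minimal Python sketch of this percentile rule follows. The function name and the placeholder data (the integers 1 to 19 and 1 to 20) are assumptions made for illustration; the sketch also assumes an integer p with 0 < p < 100 and an already sorted list. Integer arithmetic is used so that the "is i an integer?" check is not affected by floating-point rounding.

```python
def percentile(ordered, p):
    """pth percentile of a sorted list using the index point i = (p/100) * n."""
    n = len(ordered)
    # i = whole + remainder/100, computed exactly with integers.
    whole, remainder = divmod(p * n, 100)
    if remainder == 0:
        # i is an integer: average the values in positions i and i + 1.
        return (ordered[whole - 1] + ordered[whole]) / 2
    # i is not an integer: round up to position whole + 1 (0-based index whole).
    return ordered[whole]


# Placeholder data: 19 ordered values, so the value equals its position.
nineteen = list(range(1, 20))
print(percentile(nineteen, 60))  # 12, the value in the 12th position

# Placeholder data: 20 ordered values, where i = (25/100)(20) = 5 is an integer.
twenty = list(range(1, 21))
print(percentile(twenty, 25))    # 5.5, the average of the 5th and 6th values
```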

Quartiles

  • Quartiles split the ranked data into 4 equal groups.
  • The second quartile is the 50th percentile, which is the median.
  • The first quartile is the 25th percentile, and the third quartile is the 75th percentile.
  • A tutorial question requires arranging values into an ordered array to find the mean (280.54), median (293), and mode (325).
  • It also requires identifying a measure of central tendency and calculating the 1st and 3rd quartiles (271 and 317).
  • Problem 3.7 asks for the following:
    • Determine the median score.
    • Determine the 25th and 75th percentiles.
    • Determine the 60th percentile.

Answers Given

  • Median: i = (20+1)/2 = 10.5 is not an integer, so the median is the average of the 10th and 11th values: (70+73)/2 = 71.5.
  • 25th percentile: i = (25/100)(20) = 5. Because i is an integer, the 25th percentile is the average of the 5th and 6th values: (59+65)/2 = 62.
  • 75th percentile: i = (75/100)(20) = 15. Because i is an integer, the 75th percentile is the average of the 15th and 16th values: (81+82)/2 = 81.5.
  • 60th percentile: i = (60/100)(20) = 12. Because i is an integer, the 60th percentile is the average of the 12th and 13th values: (73+78)/2 = 75.5.

Proportions

  • π = the proportion of population having some characteristic
  • Formula for population proportion π = (number of occurrences in the population) / (population size)
  • Sample proportion (p) provides an estimate of π:
    • Formula for sample proportion p = (number of occurrences in the sample) / (sample size)
  • A proportion is a special form of the arithmetic average: score each occurrence as 1 and each non-occurrence as 0, and the proportion p is the arithmetic average of these scores.

Question

  • In a telephone follow-up survey of a new advertising campaign, 45 out of 150 individuals contacted could recall the new advertising slogan associated with the product.
  • Compute the proportion of people who could recall the new advertising slogan.
  • Is this value the population parameter π or the sample statistic p?
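
As a sketch of how the question above can be worked, the snippet below scores each person who recalled the slogan as 1 and everyone else as 0, then averages the scores. The variable names are illustrative. Because only the 150 contacted individuals are measured, the result is presumably the sample statistic p rather than the population parameter π.

```python
recalled = 45     # individuals who could recall the slogan
contacted = 150   # individuals contacted in the follow-up survey

# Score occurrences as 1 and non-occurrences as 0.
scores = [1] * recalled + [0] * (contacted - recalled)

# The proportion is the arithmetic average of the 0/1 scores.
p = sum(scores) / len(scores)

print(p)  # 0.3
```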

Variation

  • A set of data exhibits variation if the data values are not all the same.
  • Measures of variation are measures of spread or variability of the data values.

Measures of Variation

  • Measures of variation include:
    • Range
    • Interquartile Range
    • Variance
    • Standard Deviation
    • Coefficient of Variation

Range

  • The range is the simplest measure of variation.
  • It is the difference between the largest and smallest observations.
  • Formula: Range = $x_{\text{maximum}} - x_{\text{minimum}}$. Example: 14 - 1 = 13.
  • The range ignores the way in which the data are distributed.
  • The range is very sensitive to outliers.
  • It is a weak measure of variation because it uses only two values to indicate the variation.

Interquartile Range

  • The interquartile range:
    • Eliminates some outlier problems.
    • Eliminates the high- and low-valued observations.
    • Interquartile range = 3rd quartile - 1st quartile, or equivalently the 75th percentile minus the 25th percentile.
    • Example: interquartile range = 57 - 30 = 27.
    • Problem 3-1 requests an IQR only.
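
The range and interquartile range can be sketched together in Python. For illustration only, this reuses the 12-value ordered array from the median section (the chapter does not give the data behind the 57 - 30 = 27 example), and it locates the quartiles with the index-point rule from the percentile section. The helper name is hypothetical.

```python
def percentile(ordered, p):
    """pth percentile of a sorted list using the index point i = (p/100) * n."""
    n = len(ordered)
    whole, remainder = divmod(p * n, 100)
    if remainder == 0:
        # Integer index point: average positions i and i + 1.
        return (ordered[whole - 1] + ordered[whole]) / 2
    # Non-integer index point: round up to the next position.
    return ordered[whole]


ordered = [4, 4, 5, 5, 9, 11, 12, 14, 16, 19, 22, 23]

data_range = ordered[-1] - ordered[0]  # largest minus smallest
q1 = percentile(ordered, 25)           # first quartile
q3 = percentile(ordered, 75)           # third quartile
iqr = q3 - q1                          # spread of the middle 50%

print(data_range)   # 19
print(q1, q3, iqr)  # 5.0 17.5 12.5
```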

Variance

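  • The variance is the average of the squared distances of the data values from the mean.
  • Sample variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$, where n = sample size and $s^2$ = sample variance.
  • Population variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$, where N = population size and $\sigma^2$ (sigma squared) = population variance.
  • For the data (Fleetwood Mobile Home), with a mean of 25, the deviations from the mean are:
    • x = 15: 15 - 25 = -10
    • x = 25: 25 - 25 = 0
    • x = 35: 35 - 25 = 10
    • x = 20: 20 - 25 = -5
    • x = 30: 30 - 25 = 5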

Sharp calculator steps to find the mean and standard deviation:

  • In STAT mode (Mode = 0), enter the data: 15 [Data] 25 [Data] 35 [Data] 20 [Data] 30 [Data]
  • Press [Alpha] [4] =: the mean is 25.
  • Press [Alpha] [6] =: the standard deviation is 7.07.
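
The calculator result can be cross-checked with Python's standard `statistics` module. This is a sketch rather than part of the chapter; the variable names are illustrative. Note that the 7.07 reported above matches the population standard deviation (divide by N), while treating the same five values as a sample (divide by n - 1) gives a larger value.

```python
from statistics import mean, pstdev, pvariance, stdev, variance

data = [15, 25, 35, 20, 30]  # Fleetwood Mobile Home data from the notes

mu = mean(data)             # 25
pop_var = pvariance(data)   # sum of squared deviations / N       = 250 / 5
pop_sd = pstdev(data)       # square root of population variance
samp_var = variance(data)   # sum of squared deviations / (n - 1) = 250 / 4
samp_sd = stdev(data)       # square root of sample variance

print(mu)                   # 25
print(pop_var, round(pop_sd, 2))     # 50 7.07
print(samp_var, round(samp_sd, 2))   # 62.5 7.91
```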

Problems

  • Data are given for problems 3-25 part (c), 3-26 parts (b) and (c), and 3-29.
  • For the number of typographical errors found on pages of a book, compute the standard deviation for these sample data.
  • For the number of times a population of business executives has been to the previous month: compute the variance and standard deviation; recompute them assuming that the data represent a sample instead of a population; and discuss the difference between the values computed.

Answer

  • 3-25(c): 3.1251
  • 3-26(b): variance = 3.1389, standard deviation = 1.7717
  • 3-26(c): variance = 3.7667, standard deviation = 1.9408
  • 3-29: range = 28, variance = 92.75, standard deviation = 9.6307, Q1 = 23.5, Q3 = 40.5, IQR = 17

Comparing Standard Deviations

  • Data sets can have the same mean but different standard deviations.

Coefficient of Variation

  • The coefficient of variation is the ratio of the standard deviation to the mean expressed as a percentage.
  • The coefficient of variation is used to measure the relative variation in the data.
  • In relative terms, there is quite a large amount of variability in the Lumber data.

Formula

  • Population CV = $\frac{\sigma}{\mu} \times 100\%$
  • Sample CV = $\frac{s}{\bar{x}} \times 100\%$
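
As a hedged illustration of the ratio of standard deviation to mean, the sketch below computes a coefficient of variation for the Fleetwood Mobile Home values used in the variance section. Applying the CV to that data set is an assumption made here for illustration, and the function name is hypothetical. The data are treated as a population; for sample data, `statistics.stdev` would replace `statistics.pstdev`.

```python
from statistics import mean, pstdev


def population_cv(values):
    """Coefficient of variation: standard deviation / mean, as a percentage."""
    return pstdev(values) / mean(values) * 100


data = [15, 25, 35, 20, 30]
cv = population_cv(data)

print(round(cv, 2))  # 28.28
```

By the thresholds given in this lesson, a value above 25% would count as large variability.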
