Descriptive Statistics and Frequency Distribution

Questions and Answers

Which of the following is NOT a quantitative measure used in descriptive statistics?

  • Mean
  • Color (correct)
  • Median
  • Mode

The Excel Analysis Toolpak add-in can only perform simple statistical computations.

False (B)

What type of summary is a frequency distribution?

tabular

The relative frequency is the ______ or proportion of observations that fall within a cell.

fraction

A graphical representation of a frequency distribution is known as what?

Histogram (B)

When creating a histogram in Excel for numerical data with few discrete values, it's necessary to define a bin range in the Excel dialog box.

False (B)

Cell intervals in a histogram should be of what width?

equal

The formula to calculate the cell width in a histogram is (largest value – smallest value)/number of ______.

cells

What does the cumulative relative frequency represent?

The proportion of observations that fall below the upper limit of a cell (D)

The Excel function to create a histogram is AVERAGE.

False (B)

What is the term for dividing a data set into four equal parts?

quartiles

A division of a data set into 100 equal parts, showing the points below which k percent of the observations lie, is called ______.

percentiles

Which of the following is NOT a measure of descriptive statistics for numerical data?

Measures of taste (D)

The arithmetic mean is suitable for nominal data.

False (B)

Which measure of central tendency is most affected by outliers?

mean

The middle value when data are ordered from smallest to largest is the ______.

median

Which of the following measures of central tendency is NOT affected by extremes?

Median (B)

The Excel function for finding the median of a dataset is AVERAGE(data range).

False (B)

What is the observation that occurs most frequently in a dataset called?

mode

The average of the largest and smallest observations in a dataset is the ______.

midrange

Which measure describes the degree of variation in the data?

Dispersion (C)

The interquartile range is calculated as $Q_1 - Q_3$.

False (B)

What is the difference between the maximum and minimum observations in a dataset called?

range

The Excel functions VAR.P and VAR.S calculate the ______ of a dataset.

variance

Which of the following is TRUE about standard deviation?

It is the square root of the variance. (D)

Chebyshev's Theorem is only applicable to normally distributed data.

False (B)

According to the Empirical Rule, approximately what percentage of observations will fall within one standard deviation of the mean in a normal distribution?

68%

The Coefficient of Variation (CV) is calculated as Standard Deviation / ______.

mean

What does a coefficient of skewness (CS) value between -0.5 and 0.5 indicate?

Relative symmetry (A)

A higher kurtosis indicates a flatter distribution with a wide degree of dispersion.

False (B)

What is the name of the Excel tool that can be used to quickly calculate various descriptive statistics such as mean, median, and standard deviation?

Descriptive Statistics

The Excel function SKEW calculates the ______ of a dataset.

skewness

A correlation coefficient of 0 indicates what type of relationship between two variables?

No linear relationship (C)

A negative correlation coefficient indicates that as one variable increases, the other variable also increases.

False (B)

What Excel function would you utilize to find correlation between two datasets?

CORREL

______ is useful for counting observations that meet a criterion in order to compute proportions.

COUNTIF

Proportions of students are best organized inside of which type of table?

Contingency Table (C)

Excel PivotTables cannot be customized after the table creation is complete.

False (B)

In a PivotTable, what area do you drag fields with categories to?

Row Labels

The estimated mean may be used in the calculation of the ______ for grouped frequency distributions.

variance

Flashcards

Descriptive Statistics

Quantitative measures and ways of describing data, including measures of central tendency, dispersion and frequency distributions.

Frequency Distribution

A tabular summary showing the frequency of observations in each of several non-overlapping classes or cells.

Relative Frequency

Fraction or proportion of observations that fall within a cell.

Histogram

A graphical representation of a frequency distribution.

Quartiles

A division of a data set into four equal parts.

Deciles

A division of a data set into 10 equal parts.

Percentiles

A division of a data set into 100 equal parts.

Measures of Location

Measures the central value.

Measures of Dispersion

Measures the spread of data.

Measures of Shape

Measures the symmetry or asymmetry.

Measures of Association

Describes the statistical relationship between two or more variables.

Arithmetic Mean

Sum of values divided by count.

Median

Middle value when data is ordered.

Mode

Observation that occurs most frequently.

Midrange

Average of largest and smallest values.

Dispersion

Degree of variation in a data set.

Range

Difference between max and min.

Interquartile Range

Q3 minus Q1.

Variance

Average squared deviation from mean.

Standard Deviation

Square root of variance.

Chebyshev's Theorem

Lower bound (1 – 1/k²) on the proportion of values within k standard deviations of the mean.

Empirical Rules

Approximate percentages of normally distributed data within one, two, and three standard deviations of the mean.

Coefficient of Variation

Standard deviation divided by mean.

Skewness

Measure of the asymmetry of a distribution.

Kurtosis

Measure of distribution peakedness.

Correlation

Measure of linear relationship strength.

Sample Proportion (p)

Fraction of observations in a sample that have a given characteristic.

Cross-Tabulation

Table of categorical variable counts.

Box Plots

Graphical display of the minimum, first quartile, median, third quartile, and maximum.

Outliers

Unusually large or small values that can significantly affect the results of an analysis.

Study Notes

  • Descriptive statistics involves quantitative measures for describing data.

Categories of Descriptive Statistics

  • Measures of central tendency include mean, median, mode and proportion.
  • Measures of dispersion include range, variance, and standard deviation.
  • Frequency distributions and histograms also fall under descriptive statistics.

Statistical Support in Excel

  • Statistical functions can be directly entered into worksheet cells or embedded in formulas.
  • The Excel Analysis Toolpak add-in facilitates more complex statistical computations.
  • The Prentice-Hall statistics add-in, PHStat, performs analyses not designed into Excel.

Frequency Distribution

  • It is a tabular summary that shows the frequency of observations in non-overlapping classes or cells.

Relative Frequency Distribution

  • Relative frequency represents the fraction or proportion of observations falling within a cell.

Histogram

  • It is a graphical representation of a frequency distribution.

Excel Histogram Tool

  • Access via Excel Menu > Tools > Data Analysis > Histogram.
  • Specify the range of data to be analyzed.
  • It is recommended to define and specify a bin range.
  • Chart Output should always be checked in the output options.

Histograms for Numerical Data

  • For data with few discrete values, leave the Bin Range blank in Excel.
  • For data with many discrete or continuous values, define a Bin Range in the spreadsheet.

Guidelines for Good Practice

  • Cell intervals should maintain equal width.
  • Choose the width using the formula: (largest value – smallest value)/number of cells, rounded to reasonable numbers.
  • Aim for between 5 and 15 cells to give a useful picture of the data.

Cumulative Relative Frequency

  • Cumulative relative frequency shows the proportion or percentage of observations falling below the upper limit of a cell.
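The cell-width guideline and the three kinds of frequency can be sketched in a few lines of Python. This is a minimal illustration, not a reproduction of the Excel Histogram tool: the data values and the choice of five cells are hypothetical, and the cells here include their lower limit, whereas Excel bins include their upper limit.

```python
# Hypothetical sample data; any list of numbers works.
data = [12, 15, 17, 21, 22, 25, 28, 30, 34, 41, 45, 47]
num_cells = 5

# Cell width: (largest value - smallest value) / number of cells.
low, high = min(data), max(data)
width = (high - low) / num_cells

# Count observations per cell; the largest value is placed in the last cell.
freq = [0] * num_cells
for x in data:
    i = min(int((x - low) / width), num_cells - 1)
    freq[i] += 1

# Relative frequency: fraction of observations in each cell.
rel_freq = [f / len(data) for f in freq]

# Cumulative relative frequency: running total of the relative frequencies.
cum_rel_freq = []
total = 0.0
for r in rel_freq:
    total += r
    cum_rel_freq.append(total)

for i in range(num_cells):
    lower = low + i * width
    print(f"{lower:5.1f} to {lower + width:5.1f}: "
          f"{freq[i]:2d}  {rel_freq[i]:.3f}  {cum_rel_freq[i]:.3f}")
```

The last cumulative value should come out at 1, since every observation lies at or below the upper limit of the final cell.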

Excel Frequency Function

  • Define bins in Excel.
  • Select a range of cells adjacent to the bin range.
    • For continuous data, adding an empty cell below this range serves as an overflow cell.
  • Enter the formula =FREQUENCY(range of data, range of bins) and press Ctrl-Shift-Enter simultaneously.
  • Construct a histogram with the Chart Wizard using a column chart.
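For readers working outside Excel, the counting behaviour of FREQUENCY can be approximated in Python. The sketch below assumes that each bin value is an inclusive upper limit and that one extra overflow cell collects values above the last bin; the function name `frequency` and the sample scores are illustrative, not part of the lesson.

```python
from bisect import bisect_left

def frequency(data, bins):
    """Count values per bin; each bin value is an inclusive upper limit."""
    counts = [0] * (len(bins) + 1)  # the final slot is the overflow cell
    for x in data:
        # bisect_left finds the first bin whose upper limit is >= x.
        counts[bisect_left(bins, x)] += 1
    return counts

scores = [55, 62, 70, 70, 81, 88, 93, 99]  # hypothetical data
bins = [60, 70, 80, 90]                     # upper limits, ascending
print(frequency(scores, bins))
```

The result has one more entry than the bin list, which mirrors the empty overflow cell added below the bin range.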

Data Profiles (Fractiles)

  • Data profiles describe the location and spread of data.
  • Quartiles divide a data set into four equal parts, showing points below which 25%, 50%, 75%, and 100% of observations lie.
    • The 25% point is the first quartile (Q1) and the 75% point is the third quartile (Q3).
  • Deciles divide a data set into 10 equal parts, showing points below which 10%, 20%, etc., of observations lie.
  • Percentiles divide a data set into 100 equal parts, showing points below which “k” percent of observations lie.
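Quartiles, deciles, and percentiles can be computed with Python's standard `statistics.quantiles` function, as sketched below. Several interpolation conventions exist, so results may differ slightly between tools; the "inclusive" method and the sample data are assumptions made for this example.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical data

# n=4 returns the three cut points Q1, Q2 (the median), and Q3.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

# n=10 and n=100 return the 9 decile and 99 percentile cut points.
deciles = statistics.quantiles(data, n=10, method="inclusive")
percentiles = statistics.quantiles(data, n=100, method="inclusive")

print(q1, q2, q3)
print(len(deciles), len(percentiles))
```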

Descriptive Statistics for Numerical Data

  • Measures of location indicate central values.
  • Measures of dispersion quantify the spread of data.
  • Measures of shape describe the distribution's form.
  • Measures of association assess relationships between variables.

Arithmetic Mean

  • Population mean is calculated using the formula µ = Σxi / N, where N is the population size.
  • Sample mean is calculated using the formula x̄ = Σxi / n, where n is the sample size.
  • The Excel function AVERAGE(data range) calculates the arithmetic mean.

Properties of the Mean

  • It is meaningful for interval and ratio data.
  • All data points are used in the calculation.
  • Each data set has a unique mean.
  • Unusually large or small observations (outliers) affect the mean.
  • The sum of deviations of each value from the mean is zero: Σ(xi – x̄) = 0.
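Two of these properties are easy to check numerically. The short Python sketch below uses a small symmetric data set and then appends a hypothetical outlier of 500 to show how strongly a single extreme value moves the mean.

```python
data = [48, 49, 50, 51, 52]

mean = sum(data) / len(data)
deviations = [x - mean for x in data]

# The deviations from the mean cancel out.
print(mean, sum(deviations))

# A single extreme value (hypothetical) pulls the mean far from the bulk of the data.
with_outlier = data + [500]
mean_with_outlier = sum(with_outlier) / len(with_outlier)
print(mean_with_outlier)
```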

Median

  • It is the middle value when data are ordered from smallest to largest, resulting in an equal number of observations above and below it.
  • Each data set has a unique median.
  • The median is not affected by extreme values.
  • It is meaningful for ratio, interval, and ordinal data.
  • The Excel function MEDIAN(data range) calculates the median.

Mode

  • It is the observation occurring most frequently.
    • For grouped data, it is the midpoint of the cell with the largest frequency, providing an approximate value.
  • It is most useful when data consists of a small number of unique values.
  • The Excel functions MODE.SNGL(data range) and MODE.MULT(data range) determine the mode(s).

Midrange

  • It is the average of the largest and smallest observations.
  • It is useful for very small samples, but is susceptible to distortion by extreme data values.
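The four measures of location can be compared side by side with Python's `statistics` module. In this sketch the data set is hypothetical and deliberately contains one extreme value, so the mean and midrange are pulled upward while the median and mode are not.

```python
import statistics

data = [2, 3, 3, 5, 7, 8, 100]  # hypothetical data with one extreme value

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)            # single most frequent value
modes = statistics.multimode(data)      # all values tied for most frequent
midrange = (max(data) + min(data)) / 2  # average of largest and smallest

print(mean, median, mode, modes, midrange)
```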

Measures of Dispersion

  • Dispersion refers to the degree of variation in data.
  • For example, {48, 49, 50, 51, 52} has less dispersion than {10, 30, 50, 70, 90}, although both have a mean of 50.

Range Measures

  • The range is the difference between the maximum and minimum observations.
    • It is useful for small samples but can be distorted by extreme values.
  • The interquartile range is Q3 – Q1.
    • It avoids problems with outliers.
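As a rough illustration, the range and interquartile range of two small data sets with the same mean can be computed as follows. The helper name `spread` is hypothetical, and the quartiles use Python's "inclusive" interpolation, which is only one of several conventions.

```python
import statistics

def spread(data):
    """Return (range, interquartile range) for a list of numbers."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return max(data) - min(data), q3 - q1

tight = [48, 49, 50, 51, 52]
wide = [10, 30, 50, 70, 90]

print(spread(tight))
print(spread(wide))
```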

Variance

  • Population variance is calculated using the formula σ² = Σ(xi - µ)² / N.
  • Sample variance is calculated using the formula s² = Σ(xi - x̄)² / (n - 1).
  • Excel functions VAR.P(data range) and VAR.S(data range) compute variance.

Standard Deviation

  • Population standard deviation is σ = √[Σ(xi - µ)² / N].
  • Sample standard deviation is s = √[Σ(xi - x̄)² / (n - 1)].
  • Unlike the variance, the standard deviation is expressed in the same units as the original data.
  • Excel functions STDEV.P(data range) and STDEV.S(data range) derive standard deviation.
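The population and sample formulas differ only in the divisor, which the following Python sketch makes explicit. The data set is a small illustrative one, and the `statistics` functions are printed alongside as a cross-check on the hand-written formulas.

```python
import statistics

data = [10, 30, 50, 70, 90]
n = len(data)
mean = sum(data) / n

# Sum of squared deviations from the mean.
ss = sum((x - mean) ** 2 for x in data)

pop_var = ss / n           # divide by N for a population
sample_var = ss / (n - 1)  # divide by n - 1 for a sample
pop_sd = pop_var ** 0.5
sample_sd = sample_var ** 0.5

print(pop_var, sample_var, pop_sd, sample_sd)
print(statistics.pvariance(data), statistics.variance(data))
```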

Chebyshev’s Theorem

  • For any data set, the proportion of values within k standard deviations of the mean is at least 1 – 1/k², for any k > 1.
    • For k = 2, at least ¾ of data lie within 2 standard deviations of the mean.
    • For k = 3, at least 8/9 (approximately 89%) of data lie within 3 standard deviations.
    • For k = 10, at least 99/100 (99%) of data lie within 10 standard deviations of the mean.
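The bound itself is a one-line calculation, and it can be checked against any data set. The Python sketch below uses a hypothetical, strongly right-skewed sample to show that the observed proportion within two standard deviations still meets the guaranteed minimum.

```python
import statistics

def chebyshev_bound(k):
    """Minimum proportion of values within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

data = [1, 1, 1, 2, 2, 3, 4, 5, 8, 40]  # hypothetical skewed data
mu = statistics.mean(data)
sigma = statistics.pstdev(data)

k = 2
within = sum(1 for x in data if abs(x - mu) <= k * sigma) / len(data)
print(chebyshev_bound(k), within)
```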

Empirical Rules

  • For normally distributed data, approximately 68% of observations fall within one standard deviation of the mean.
  • Approximately 95% of observations fall within two standard deviations of the mean.
  • Approximately 99.7% of observations fall within three standard deviations of the mean.
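These percentages come from the normal distribution and can be reproduced with Python's `statistics.NormalDist`. The sketch below uses the standard normal curve (mean 0, standard deviation 1) purely for convenience; the dictionary name `shares` is illustrative.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Share of the distribution within k standard deviations of the mean.
shares = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(k, round(share, 4))
```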

Coefficient of Variation (CV)

  • CV is defined as (Standard Deviation / Mean).
  • It is dimensionless and useful when comparing differently scaled data sets.
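Because the CV is a ratio, changing the unit of measurement leaves it unchanged. The sketch below illustrates this with hypothetical heights recorded in centimetres and then in metres; it uses the sample standard deviation, which is an assumption, since the notes do not say which version to use.

```python
import statistics

def coefficient_of_variation(data):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(data) / statistics.mean(data)

heights_cm = [150, 160, 170, 180, 190]     # hypothetical data
heights_m = [h / 100 for h in heights_cm]  # same data in metres

cv_cm = coefficient_of_variation(heights_cm)
cv_m = coefficient_of_variation(heights_m)
print(round(cv_cm, 4), round(cv_m, 4))
```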

Skewness

  • The coefficient of skewness (CS) measures the asymmetry of a distribution.
    • -0.5 < CS < 0.5 indicates relative symmetry.
    • CS > 1 or CS < -1 indicates a high degree of skewness.
  • The Excel function SKEW(data range) computes skewness.

Kurtosis

  • It refers to the peakedness or flatness of a distribution.
  • The coefficient of kurtosis (CK) indicates distribution characteristics.
    • CK < 3 means the distribution is flatter with a wide degree of dispersion.
    • CK > 3 means the distribution is more peaked with less dispersion.
  • A higher kurtosis value means more area in the tails of the distribution.
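  • The Excel function KURT(data range) determines kurtosis. Note that KURT reports excess kurtosis, for which a normal distribution scores near 0 rather than 3.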
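The notes do not give formulas for CS and CK, so the sketch below assumes the common population moment definitions: the average cubed (or fourth-power) deviation divided by σ³ (or σ⁴). Under that assumption a normal distribution has CK of about 3, which matches the thresholds above. Excel's SKEW and KURT apply sample adjustments, so their output will not match these values exactly. The function name and data sets are illustrative.

```python
def shape_coefficients(data):
    """Return (CS, CK) using population moment formulas (an assumption)."""
    n = len(data)
    mu = sum(data) / n
    sigma = (sum((x - mu) ** 2 for x in data) / n) ** 0.5
    cs = sum((x - mu) ** 3 for x in data) / (n * sigma ** 3)
    ck = sum((x - mu) ** 4 for x in data) / (n * sigma ** 4)
    return cs, ck

symmetric = [10, 30, 50, 70, 90]
right_skewed = [1, 1, 1, 2, 2, 3, 4, 5, 8, 40]  # hypothetical data

print(shape_coefficients(symmetric))
print(shape_coefficients(right_skewed))
```

The symmetric set gives a CS of zero and a CK below 3 (a flat profile), while the skewed set gives a CS well above 1.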

Excel Descriptive Statistics Tool

  • The Descriptive Statistics tool in Data Analysis quickly calculates summary measures such as the mean, median, and standard deviation.

Measures of Association

  • Correlation measures the strength of the linear relationship between two variables.
  • The correlation coefficient ranges between -1 and 1.
    • A correlation of 0 suggests no linear association.
    • A positive correlation indicates that one variable increases as the other increases.
    • A negative correlation indicates that one variable increases as the other decreases.
  • The Excel function CORREL or the Data Analysis Correlation tool can be utilized.
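A minimal Python sketch of the Pearson correlation coefficient follows, written out by hand so that the calculation is visible. The function name `correlation` and the data are illustrative; the three y-series are chosen to show a perfect positive, a perfect negative, and a zero linear relationship.

```python
def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]    # rises with x
y_down = [10, 8, 6, 4, 2]  # falls as x rises
y_flat = [2, 1, 3, 1, 2]   # no linear trend

print(correlation(x, y_up), correlation(x, y_down), correlation(x, y_flat))
```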

Excel Tool - Correlation

  • Access via Excel menu > Tools > Data Analysis > Correlation.

Descriptive Statistics for Categorical Data

  • Sample proportion is the fraction of data with a certain characteristic.
  • Use the Excel function COUNTIF(data range, criteria) to count observations meeting a criterion, in order to compute proportions.
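The same count-then-divide step looks like this in Python. The survey responses are hypothetical, and the generator expression plays the role of COUNTIF.

```python
responses = ["yes", "no", "yes", "yes", "no"]  # hypothetical categorical data

# Count the observations that meet the criterion, then divide by the total.
count = sum(1 for r in responses if r == "yes")
proportion = count / len(responses)

print(count, proportion)
```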

Cross-Tabulation (Contingency Table)

  • It is a table that displays the number of observations in a data set for different subcategories of two categorical variables.
  • Subcategories must be mutually exclusive and exhaustive, meaning each observation fits into only one subcategory, and all subcategories constitute the complete data set.
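A cross-tabulation can be built in Python by counting pairs of category values. The records below are hypothetical (gender paired with whether the person holds a graduate degree), and `collections.Counter` is one simple way to tally them.

```python
from collections import Counter

# Hypothetical records: (gender, has graduate degree).
records = [
    ("Female", "Yes"), ("Female", "No"), ("Male", "Yes"),
    ("Male", "No"), ("Male", "No"), ("Female", "Yes"),
]

table = Counter(records)  # counts each (row, column) combination
rows = sorted({g for g, _ in records})
cols = sorted({d for _, d in records})

print("       ", cols)
for g in rows:
    print(f"{g:7s}", [table[(g, d)] for d in cols])
```

Each record lands in exactly one cell and the cell counts add up to the number of records, which is what mutually exclusive and exhaustive subcategories guarantee.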

Box Plots

  • Box plots display the minimum, first quartile (Q1), median, third quartile (Q3), and maximum values graphically.
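The five values behind a box plot can be computed directly, and the same quartiles support a common convention for flagging possible outliers, assumed here: values more than 1.5*IQR beyond Q1 or Q3 are treated as mild outliers, and values more than 3*IQR beyond them as extreme. The function names and data are illustrative, and the quartiles use Python's "inclusive" interpolation.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, med, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, med, q3, max(data)

def classify_outliers(data):
    """Split values into mild and extreme outliers using IQR fences."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    mild, extreme = [], []
    for x in data:
        dist = max(q1 - x, x - q3)  # distance outside the box, if any
        if dist > 3 * iqr:
            extreme.append(x)
        elif dist > 1.5 * iqr:
            mild.append(x)
    return mild, extreme

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 17, 40]  # hypothetical data
print(five_number_summary(data))
print(classify_outliers(data))
```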

Dot Scale Diagram

  • PHStat menu > Descriptive Statistics > Dot Scale Diagram.

Outliers

  • Outliers significantly affect the results of statistical analyses.
  • Box plots and dot-scale diagrams can help identify possible outliers visually.

Other approaches to identifying outliers

  • Use the empirical rule; an outlier is a value more than three standard deviations from the mean.
  • “Mild” outliers are often defined as values between 1.5*IQR and 3*IQR to the left of Q1 or to the right of Q3.
  • “Extreme” outliers are values more than 3*IQR away from these quartiles.

PivotTables

  • Create custom summaries and charts from data, requiring a data set with column labels.
  • Select any cell and choose PivotTable Report from the Data menu, following the wizard steps.

Blank PivotTable

  • Drag the desired data to Row Labels, Column Labels or Values.

Example PivotTable Usage

  • Gender from the PivotTable Field List is dragged to the Row Labels area.
  • Graduate Degree? is dragged to the Column Labels area.
  • Years of Service is dragged to the Values area.

Value Field Settings

  • In the Options tab (under PivotTable Tools), click on the Active Field group and choose Value Field Settings to change the summary type.

Changing PivotTable Views

  • Uncheck the boxes in the PivotTable Field List.
  • Drag the variable names to different field areas.

PivotTables for Cross Tabulation

  • PivotTables can be used for cross-tabulation.

Calculation of Mean in Grouped Data

  • Sample mean x̄ = Σ(fi · xi) / n, where fi is the frequency of cell i and xi is its midpoint.
  • Population mean µ = Σ(fi · xi) / N.

Grouped Frequency Distribution

  • Estimate the mean using the midpoint of each cell and the number of observations in that cell.

Calculation of Variance in Grouped Data

  • Sample variance s² = Σfi(xi - x̄)² / (n - 1).
  • Population variance σ² = Σfi(xi - µ)² / N.
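The grouped-data formulas can be applied as in the Python sketch below. The cell midpoints and frequencies are hypothetical; each midpoint stands in for every observation in its cell, so the results are estimates rather than exact values.

```python
# Hypothetical grouped frequency distribution.
midpoints = [5, 15, 25, 35]  # xi: midpoint of each cell
freqs = [2, 5, 8, 5]         # fi: number of observations in each cell

n = sum(freqs)

# Estimated mean: sum of fi * xi divided by the number of observations.
mean = sum(f * x for f, x in zip(freqs, midpoints)) / n

# Frequency-weighted sum of squared deviations from the estimated mean.
ss = sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints))

sample_var = ss / (n - 1)
pop_var = ss / n

print(mean, sample_var, pop_var)
```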

Description

Explore descriptive statistics, including measures of central tendency and dispersion. Learn about frequency distributions, relative frequency, and how to represent data graphically using histograms. Also, discover how to use Microsoft Excel for statistical analysis.