Types of Data in Statistics

CureAllHammeredDulcimer

10 Questions

What general rule states that, for any k > 1, at least 1 - 1/k^2 of the data lies within the interval (x̄ - ks, x̄ + ks)?

Chebyshev's Rule

What is the main purpose of discussing the 'center' of the data in descriptive statistics?

To describe a typical value of the data, generally based on the mean

How is the shape of a distribution typically described?

As symmetric or skewed

What is the primary difference between Model A and Model B in terms of their EPA mileage ratings?

Model B has a lower mean EPA mileage rating

What is the purpose of discussing variability in descriptive statistics?

To understand the spread of the data

What is the result of Chebyshev's Rule when k = 2?

At least 3/4 of the data lies within the interval (x̄ - 2s, x̄ + 2s), since 1 - 1/2^2 = 3/4

What is the purpose of including a summary evaluation in descriptive statistics?

To draw an overall conclusion about the data, which may include a subjective component

What is the main difference between the standard deviation of Model A and Model B?

Model A has a higher standard deviation than Model B

What is the sequence of a descriptive statistics analysis?

Begin with a discussion of the center of the data, followed by a discussion of variability, and end with a summary evaluation

What is the purpose of using numbers in a descriptive statistics analysis?

To make the analysis more concise and clear

Study Notes

Types of Data

  • Qualitative (categorical) data:
    • Measured by classification
    • Non-numerical in nature
    • Categories with a meaningful order are ordinal data
    • Categories without a meaningful order are nominal data
    • Typically, categories are mutually exclusive and collectively exhaustive
  • Quantitative (numerical) data:
    • Measured on a naturally occurring numerical scale
    • Allow for meaningful mathematical calculations

Cross-Sectional and Time Series Data

  • Cross-sectional data:
    • Collected at the same, or approximately the same, point in time
    • Example: average age of people in each state in 2013
  • Time series data:
    • Collected over several consecutive time periods for the same unit
    • Example: a company's daily stock price in April 2014
  • Panel data:
    • Combines cross-sectional and time series data

Key Terms

  • Population (universe): data on all units/items of interest
  • Sample: a portion of the population
  • Parameter: a summary measure about the population
  • Statistic: a summary measure about the sample

Samples

  • Samples need to be:
    • Representative, so that they reflect the population of interest
    • Random, so that each subset of a fixed size is equally likely to be selected
    • Large enough: more data generally gives more reliable results
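A small sketch can make the parameter/statistic distinction concrete. It assumes a made-up population of 1,000 ages (hypothetical data, generated with a fixed seed) and uses Python's `random.Random.sample`, which draws without replacement so that every subset of size k is equally likely, i.e. a simple random sample.

```python
import random
import statistics

# Hypothetical population: 1,000 ages (made-up values, fixed seed for repeatability).
rng = random.Random(42)
population = [rng.randint(18, 80) for _ in range(1000)]

# Parameter: a summary measure computed from the whole population.
population_mean = statistics.mean(population)

# Simple random sample of size 50: every subset of 50 items is equally likely.
sample = rng.sample(population, k=50)

# Statistic: the same summary measure computed from the sample only.
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

The sample mean will usually be close to, but not exactly equal to, the population mean; that gap is why a statistic is treated as an estimate of a parameter.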

Presenting Qualitative Data

  • Frequency table:
    • A summary of data showing the frequency (or count) of items in each of several non-overlapping classes
    • The relative frequency of a class is the fraction or proportion of the total number of data items belonging to that class
    • The percent frequency of a class is the relative frequency multiplied by 100
  • Bar graph:
    • A bar (or column) for each category, with the length of each bar showing that category's frequency, relative frequency, or percent frequency
  • Pie chart:
    • A circular graph divided into sectors that show the proportion of each category
  • Pareto diagram:
    • A bar graph with the bars (or columns) arranged in descending order of frequency, from top to bottom (or from left to right)
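The frequency-table definitions can be sketched in a few lines of Python. The survey responses below are hypothetical, and `frequency_table` is an illustrative helper, not a library function. `collections.Counter.most_common()` returns classes in descending order of frequency, which is also the ordering a Pareto diagram uses.

```python
from collections import Counter

# Hypothetical nominal data: favourite colour of 10 respondents.
responses = ["Blue", "Red", "Blue", "Green", "Blue",
             "Red", "Green", "Blue", "Yellow", "Red"]

def frequency_table(data):
    """Return (class, frequency, relative frequency, percent frequency) rows,
    sorted by descending frequency (hypothetical helper)."""
    counts = Counter(data)
    total = len(data)
    rows = []
    for category, freq in counts.most_common():
        rel = freq / total                   # proportion of all items in this class
        rows.append((category, freq, rel, rel * 100))
    return rows

print(f"{'Class':<8}{'Freq':>6}{'Rel':>8}{'Pct':>8}")
for category, freq, rel, pct in frequency_table(responses):
    print(f"{category:<8}{freq:>6}{rel:>8.2f}{pct:>7.1f}%")
```

The relative frequencies always sum to 1 (and the percent frequencies to 100), which is a quick check on a hand-built table.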

Presenting Quantitative Data

  • Frequency distribution:
    • Determine the range
    • Select the number of classes (usually between 5 and 20)
    • Compute the class interval (width)
    • Determine the class boundaries (limits)
    • Count the observations in each class
  • Histogram:
    • A common graphical presentation of quantitative data
    • The variable of interest is placed on the horizontal axis, and the frequency, relative frequency, or percent frequency is placed on the vertical axis
    • A rectangle is drawn above each class interval, with its height corresponding to that interval's frequency
  • Dot plot:
    • A simple graphical summary of data
    • A horizontal axis shows the range of data values
    • Each data value is represented by a dot placed above the axis
  • Stem-and-leaf display:
    • Shows both the order of the data and the shape of its distribution
    • Similar to a histogram, but with the advantage of showing the actual data values
    • Each observation is divided into a stem value and a leaf value
    • The stem value (leading digit or digits) defines the class
    • The leaf value is the trailing digit; the number of leaves on a stem gives that class's frequency (count)
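The frequency-distribution steps and the stem-and-leaf idea can be sketched together. This is a minimal illustration under stated assumptions: the exam scores are hypothetical, the class width is rounded up to a whole number (one common convention), classes are half-open [lower, upper) except the last, which also includes its upper limit, and `stem_and_leaf` assumes non-negative two-digit integers (tens digit as stem, units digit as leaf). Both helper names are hypothetical.

```python
import math

# Hypothetical quantitative data: 15 exam scores.
data = [52, 67, 71, 73, 75, 78, 81, 82, 84, 85, 88, 90, 91, 95, 99]

def frequency_distribution(values, num_classes):
    lo, hi = min(values), max(values)
    data_range = hi - lo                           # Step 1: range
    width = math.ceil(data_range / num_classes)    # Step 3: class width, rounded up
    # Step 4: class boundaries [lower, lower + width)
    bounds = [(lo + i * width, lo + (i + 1) * width) for i in range(num_classes)]
    # Step 5: count observations; the last class also includes its upper limit
    table = []
    for i, (a, b) in enumerate(bounds):
        if i == num_classes - 1:
            count = sum(a <= v <= b for v in values)
        else:
            count = sum(a <= v < b for v in values)
        table.append(((a, b), count))
    return table

def stem_and_leaf(values):
    stems = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)   # tens digit is the stem, units digit is the leaf
        stems.setdefault(stem, []).append(leaf)
    return stems

# Step 2: choose the number of classes (here 5).
for (a, b), count in frequency_distribution(data, 5):
    print(f"{a:>3} to <{b:>3}: {count}")

for stem, leaves in stem_and_leaf(data).items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Counting the leaves on each stem reproduces a frequency distribution with classes of width 10, while still showing every original score.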

Graphing Bivariate Relationships

  • Describes a relationship between two variables
  • Plotted as a scatterplot/scattergram

Measures of Variability

  • Variability:
    • The spread of the data across possible values
    • Commonly used measures of variability: range, interquartile range, variance, and standard deviation
  • Range:
    • Difference between the largest and the smallest values
    • Ignores how the data are distributed
    • Sensitive to outliers
  • Quartiles:
    • Split the ordered data into 4 segments (quarters) with an equal number of values in each segment
    • Position of the i-th quartile in the ordered data: i(n+1)/4
  • Interquartile range:
    • Also called the midspread
    • Difference between the third and first quartiles (the spread of the middle 50%)
    • Not affected by extreme values
  • Sample variance:
    • The sum of the squared deviations from the mean divided by (n-1)
    • Expressed in squared units of the data
  • Sample standard deviation:
    • The positive square root of the sample variance
    • The most commonly used measure of variation
    • Shows variation about the mean
    • Expressed in the original units of the data
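The measures of variability above can be checked with Python's standard `statistics` module. The data set is hypothetical and contains one deliberate outlier (30). `statistics.quantiles(..., n=4, method="exclusive")` places the i-th quartile at position i(n+1)/4 of the sorted data, interpolating when that position is fractional. Replacing the outlier with a moderate value illustrates that the range changes while the IQR does not.

```python
import statistics

data = [2, 4, 4, 5, 7, 9, 10, 12, 30]    # hypothetical; 30 is an outlier

data_range = max(data) - min(data)       # largest minus smallest value

# Quartiles at positions i(n+1)/4, with linear interpolation.
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
iqr = q3 - q1                            # spread of the middle 50%

# Sample variance by the definition: squared deviations / (n - 1).
mean = statistics.mean(data)
manual_var = sum((x - mean) ** 2 for x in data) / (len(data) - 1)
var = statistics.variance(data)          # library version, also divides by n - 1
sd = statistics.stdev(data)              # square root of the sample variance

print(f"range={data_range}, Q1={q1}, Q3={q3}, IQR={iqr}")
print(f"variance={var:.3f} (manual {manual_var:.3f}), sd={sd:.3f}")

# Replace the outlier with 13: the range shrinks, the IQR is unchanged.
no_outlier = [2, 4, 4, 5, 7, 9, 10, 12, 13]
a1, _, a3 = statistics.quantiles(no_outlier, n=4, method="exclusive")
print(f"range={max(no_outlier) - min(no_outlier)}, IQR={a3 - a1}")
```

The variance is reported in squared units, which is why the standard deviation (in the data's own units) is usually the figure quoted.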

Interpreting the Standard Deviation

  • Chebyshev's rule:
    • A rule of thumb that applies to any set of data, regardless of the shape of the distribution
    • In general, for k > 1, at least 1 - 1/k^2 of the data lies within the interval (x̄ - ks, x̄ + ks)
  • Empirical rule:
    • A rule of thumb that applies only to mound-shaped, symmetric distributions
    • Approximately 68% of the data lies within 1 standard deviation of the mean
    • Approximately 95% of the data lies within 2 standard deviations of the mean
    • Approximately 99.7% of the data lies within 3 standard deviations of the mean
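The two rules can be compared against a data set directly. This sketch uses hypothetical, right-skewed data, so only Chebyshev's guarantee is expected to hold; the empirical-rule percentages are printed for comparison only. Raising an error for k ≤ 1 is a design choice here, since the bound 1 - 1/k^2 is zero or negative there and says nothing useful. The helper names are hypothetical.

```python
import statistics

EMPIRICAL_RULE = {1: 0.68, 2: 0.95, 3: 0.997}   # for mound-shaped, symmetric data only

def chebyshev_bound(k):
    """Minimum fraction of any data set within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's rule is informative only for k > 1")
    return 1 - 1 / k**2

def fraction_within(data, k):
    """Observed fraction of values strictly inside (x̄ - ks, x̄ + ks)."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)                  # sample standard deviation
    inside = sum(mean - k * s < x < mean + k * s for x in data)
    return inside / len(data)

data = [2, 4, 4, 5, 7, 9, 10, 12, 30]          # hypothetical, skewed by one large value

for k in (2, 3):
    print(f"k={k}: Chebyshev >= {chebyshev_bound(k):.3f}, "
          f"empirical ~ {EMPIRICAL_RULE[k]:.3f}, observed = {fraction_within(data, k):.3f}")
```

For k = 2 the Chebyshev bound is 3/4, noticeably weaker than the empirical rule's 95%; that is the price of a guarantee that holds for any distribution shape.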

Using Descriptive Statistics

  • Begin with a discussion of the "center" of the data, generally based on the mean
  • Follow with a discussion of variability (and skew, if appropriate)
  • End with a summary evaluation, which may have a subjective component
  • Use numbers in a description wisely: neither too few nor too many

This quiz covers the basics of data types in statistics, including qualitative and quantitative data. Learn about categorization, ordinal and nominal data, and more.
