Questions and Answers
Which of the following is NOT a quantitative measure used in descriptive statistics?
- Mean
- Color (correct)
- Median
- Mode
The Excel Analysis Toolpak add-in can only perform simple statistical computations.
False (B)
What type of summary is a frequency distribution?
tabular
The relative frequency is the ______ or proportion of observations that fall within a cell.
A graphical representation of a frequency distribution is known as what?
When creating a histogram in Excel for numerical data with few discrete values, it's necessary to define a bin range in the Excel dialog box.
Cell intervals in a histogram should be of what width?
The formula to calculate the cell width in a histogram is (largest value – smallest value)/number of ______.
What does the cumulative relative frequency represent?
The Excel function to create a histogram is AVERAGE.
What is the term for dividing a data set into four equal parts?
A division of a data set into 100 equal parts, showing the points below which k percent of the observations lie, is called ______.
Which of the following is NOT a measure of descriptive statistics for numerical data?
The arithmetic mean is suitable for nominal data.
Which measure of central tendency is most affected by outliers?
The middle value when data are ordered from smallest to largest is the ______.
Which of the following measures of central tendency is NOT affected by extremes?
The Excel function for finding the median of a dataset is AVERAGE(data range).
What is the observation that occurs most frequently in a dataset called?
The average of the largest and smallest observations in a dataset is the ______.
Which measure describes the degree of variation in the data?
The interquartile range is calculated as $Q_1 - Q_3$.
What is the difference between the maximum and minimum observations in a dataset called?
The Excel functions VAR.P and VAR.S calculate the ______ of a dataset.
Which of the following is TRUE about standard deviation?
Chebyshev's Theorem is only applicable to normally distributed data.
According to the Empirical Rule, approximately what percentage of observations will fall within one standard deviation of the mean in a normal distribution?
The Coefficient of Variation (CV) is calculated as Standard Deviation / ______.
What does a coefficient of skewness (CS) value between -0.5 and 0.5 indicate?
A higher kurtosis indicates a flatter distribution with a wide degree of dispersion.
What is the name of the Excel tool that can be used to quickly calculate various descriptive statistics such as mean, median, and standard deviation?
The Excel function SKEW calculates the ______ of a dataset.
A correlation coefficient of 0 indicates what type of relationship between two variables?
A negative correlation coefficient indicates that as one variable increases, the other variable also increases.
What Excel function would you utilize to find correlation between two datasets?
A ______ is useful for counting observations that meet a criterion in order to compute proportions.
Proportions of students are best organized inside of which type of table?
Excel PivotTables cannot be customized after the table creation is complete.
In a PivotTable, what area do you drag fields with categories to?
An estimated mean may be used in the calculation of the ______ for grouped frequency distributions.
Flashcards
Descriptive Statistics
Quantitative measures and ways of describing data, including measures of central tendency, dispersion and frequency distributions.
Frequency Distribution
A tabular summary showing the frequency of observations in each of several non-overlapping classes or cells.
Relative Frequency
Fraction or proportion of observations that fall within a cell.
Histogram
Quartiles
Deciles
Percentiles
Measures of Location
Measures of Dispersion
Measures of Shape
Measures of Association
Arithmetic Mean
Median
Mode
Midrange
Dispersion
Range
Interquartile Range
Variance
Standard Deviation
Chebyshev's Theorem
Empirical Rules
Coefficient of Variation
Skewness
Kurtosis
Correlation
Sample Proportion (p)
Cross-Tabulation
Box Plots
Outliers
Study Notes
- Descriptive statistics involves quantitative measures for describing data.
Categories of Descriptive Statistics
- Measures of central tendency include mean, median, mode and proportion.
- Measures of dispersion include range, variance, and standard deviation.
- Frequency distributions and histograms also fall under descriptive statistics.
Statistical Support in Excel
- Statistical functions can be directly entered into worksheet cells or embedded in formulas.
- The Excel Analysis Toolpak add-in facilitates more complex statistical computations.
- The Prentice-Hall statistics add-in, PHStat, performs analyses not designed into Excel.
Frequency Distribution
- It is a tabular summary that shows the frequency of observations in non-overlapping classes or cells.
Relative Frequency Distribution
- Relative frequency represents the fraction or proportion of observations falling within a cell.
Histogram
- It is a graphical representation of a frequency distribution.
Excel Histogram Tool
- Access via Excel menu > Tools > Data Analysis > Histogram (in Excel 2007 and later, Data tab > Data Analysis > Histogram).
- Specify the range of data to be analyzed.
- It is recommended to define and specify a bin range.
- Chart Output should always be checked in the output options.
Histograms for Numerical Data
- For data with few discrete values, leave the Bin Range blank in Excel.
- For data with many discrete or continuous values, define a Bin Range in the spreadsheet.
Guidelines for Good Practice
- Cell intervals should maintain equal width.
- Choose the width using the formula: (largest value – smallest value)/number of cells, rounded to reasonable numbers.
- Aim for between 5 and 15 cells to give a useful picture of the data.
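The width formula above can be sketched in Python as follows. This is only an illustration: the function name `cell_width`, the sample values, and the choice of 5 cells are assumptions made for the example, and rounding up to a whole number is just one way to reach a "reasonable" width.

```python
import math

def cell_width(values, num_cells):
    """(largest value - smallest value) / number of cells."""
    return (max(values) - min(values)) / num_cells

# Hypothetical data, invented for illustration only.
data = [12, 15, 21, 22, 27, 30, 34, 38, 41, 48]
num_cells = 5

raw_width = cell_width(data, num_cells)  # (48 - 12) / 5
width = math.ceil(raw_width)             # round up to a convenient whole number

# Upper limit of each equal-width cell, counted from the smallest value.
upper_limits = [min(data) + width * i for i in range(1, num_cells + 1)]
print(raw_width, width, upper_limits)
```

Rounding the width up rather than down keeps the largest observation inside the last cell.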
Cumulative Relative Frequency
- Cumulative relative frequency shows the proportion or percentage of observations falling below the upper limit of a cell.
Excel Frequency Function
- Define bins in Excel.
- Select a range of cells adjacent to the bin range.
- For continuous data, adding an empty cell below this range serves as an overflow cell.
- Enter the formula =FREQUENCY(range of data, range of bins) and press Ctrl-Shift-Enter simultaneously.
- Construct a histogram with the Chart Wizard using a column chart.
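For readers working outside Excel, the counting step can be approximated in Python. The sketch below assumes the same convention as the steps above (each bin value is the upper limit of a cell, with one extra overflow cell at the end); the function name `frequency` and the data are hypothetical, and the relative and cumulative relative frequencies are derived from the counts.

```python
from bisect import bisect_left
from itertools import accumulate

def frequency(data, bins):
    """Count values per cell; bins holds each cell's upper limit, sorted ascending.

    A value lands in the first cell whose upper limit is greater than or
    equal to it. The extra last count is the overflow cell for values
    above the final bin.
    """
    counts = [0] * (len(bins) + 1)
    for value in data:
        counts[bisect_left(bins, value)] += 1
    return counts

# Hypothetical data and bins, invented for illustration only.
data = [12, 15, 21, 22, 27, 30, 34, 38, 41, 48]
bins = [20, 30, 40]

counts = frequency(data, bins)
relative = [c / len(data) for c in counts]   # fraction of observations per cell
cumulative = list(accumulate(relative))      # running total of the fractions
print(counts, relative, cumulative)
```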
Data Profiles (Fractiles)
- Data profiles describe the location and spread of data.
- Quartiles divide a data set into four equal parts, showing points below which 25%, 50%, 75%, and 100% of observations lie.
- The 25% point is the first quartile (Q1) and the 75% point is the third quartile (Q3).
- Deciles divide a data set into 10 equal parts, showing points below which 10%, 20%, etc., of observations lie.
- Percentiles divide a data set into 100 equal parts, showing points below which “k” percent of observations lie.
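A minimal sketch of these fractiles using Python's standard `statistics.quantiles` function is shown below. The data are hypothetical, and the function's default "exclusive" interpolation method is assumed; other conventions, including those available in Excel, can give slightly different cut points for the same data.

```python
import statistics

data = list(range(1, 10))  # hypothetical values 1 through 9

quartiles = statistics.quantiles(data, n=4)      # three cut points: Q1, Q2, Q3
deciles = statistics.quantiles(data, n=10)       # nine cut points
percentiles = statistics.quantiles(data, n=100)  # ninety-nine cut points

print(quartiles)
```

Note that the function returns only the interior cut points, so four equal parts are described by three values.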
Descriptive Statistics for Numerical Data
- Measures of location indicate central values.
- Measures of dispersion quantify the spread of data.
- Measures of shape describe the distribution's form.
- Measures of association assess relationships between variables.
Arithmetic Mean
- Population mean is calculated using the formula µ = Σxi / N, where N is the population size.
- Sample mean is calculated using the formula x̄ = Σxi / n, where n is the sample size.
- The Excel function AVERAGE(data range) calculates the arithmetic mean.
Properties of the Mean
- It is meaningful for interval and ratio data.
- All data points are used in the calculation.
- Each data set has a unique mean.
- Unusually large or small observations (outliers) affect the mean.
- The sum of deviations of each value from the mean is zero: Σ(xi – x̄) = 0.
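Two of these properties, the zero sum of deviations and the sensitivity to outliers, can be checked with a short Python sketch. The sample values and the added outlier of 420 are invented for illustration.

```python
# Hypothetical sample, invented for illustration only.
data = [4, 8, 15, 16, 23, 42]

mean = sum(data) / len(data)            # sample mean: sum of values / n
deviations = [x - mean for x in data]   # each value minus the mean

# Adding one unusually large observation pulls the mean upward.
with_outlier = data + [420]
mean_with_outlier = sum(with_outlier) / len(with_outlier)

print(mean, sum(deviations), mean_with_outlier)
```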
Median
- It is the middle value when data are ordered from smallest to largest, resulting in an equal number of observations above and below it.
- Each data set has a unique median.
- The median is not affected by extreme values.
- It is meaningful for ratio, interval, and ordinal data.
- The Excel function MEDIAN(data range) calculates the median.
Mode
- It is the observation occurring most frequently.
- For grouped data, it is the midpoint of the cell with the largest frequency, providing an approximate value.
- It is most useful when data consists of a small number of unique values.
- The Excel functions MODE.SNGL(data range) and MODE.MULT(data range) determine the mode(s).
Midrange
- It is the average of the largest and smallest observations.
- It is useful for very small samples, but is susceptible to distortion by extreme data values.
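The median, mode, and midrange can be compared on one small data set. The sketch below uses Python's `statistics` module (with `multimode`, available from Python 3.8) as a rough analogue of MEDIAN, MODE.SNGL, and MODE.MULT; the data are hypothetical and deliberately include one extreme value.

```python
import statistics

# Hypothetical data; 100 is a deliberately extreme value.
data = [2, 3, 3, 5, 7, 8, 8, 9, 100]

median = statistics.median(data)        # middle value of the ordered data
modes = statistics.multimode(data)      # every most-frequent value
single_mode = statistics.mode(data)     # one most-frequent value
midrange = (min(data) + max(data)) / 2  # average of smallest and largest
mean = statistics.mean(data)

print(median, modes, single_mode, midrange, mean)
```

Here the extreme value drags the midrange and the mean far above the median, which stays in the middle of the bulk of the data.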
Measures of Dispersion
- Dispersion refers to the degree of variation in data.
- For example, {48, 49, 50, 51, 52} has less dispersion than {10, 30, 50, 70, 90}, although both have a mean of 50.
Range Measures
- The range is the difference between the maximum and minimum observations.
- It is useful for small samples but can be distorted by extreme values.
- The interquartile range is Q3 – Q1.
- It avoids problems with outliers.
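The contrast between the two measures can be sketched in Python. The helper names and data below are assumptions for illustration, and the quartiles come from `statistics.quantiles` with its default method, so they may differ slightly from quartiles computed under another convention.

```python
import statistics

def data_range(values):
    """Maximum observation minus minimum observation."""
    return max(values) - min(values)

def interquartile_range(values):
    """Q3 minus Q1, using the default quartile method of statistics.quantiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

# Hypothetical data: the second set replaces 9 with an extreme value of 90.
typical = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_extreme = [1, 2, 3, 4, 5, 6, 7, 8, 90]

print(data_range(typical), interquartile_range(typical))
print(data_range(with_extreme), interquartile_range(with_extreme))
```

In this example the extreme value changes the range substantially while leaving the interquartile range unchanged.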
Variance
- Population variance is calculated using the formula σ² = Σ(xi - µ)² / N.
- Sample variance is calculated using the formula s² = Σ(xi - x̄)² / (n - 1).
- Excel functions VAR.P(data range) and VAR.S(data range) compute variance.
Standard Deviation
- Population standard deviation is σ = √[Σ(xi - µ)² / N].
- Sample standard deviation is s = √[Σ(xi - x̄)² / (n - 1)].
- Standard deviation shares original data units, unlike variance.
- Excel functions STDEV.P(data range) and STDEV.S(data range) derive standard deviation.
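As a quick check of the population and sample formulas, the sketch below applies Python's `statistics` functions to the two data sets from the dispersion example. Treating `pvariance`/`pstdev` as the counterparts of VAR.P/STDEV.P and `variance`/`stdev` as the counterparts of VAR.S/STDEV.S is an assumption based on their divisors (N versus n - 1).

```python
import statistics

narrow = [48, 49, 50, 51, 52]
wide = [10, 30, 50, 70, 90]

for name, values in (("narrow", narrow), ("wide", wide)):
    pop_var = statistics.pvariance(values)   # divides by N
    samp_var = statistics.variance(values)   # divides by n - 1
    pop_sd = statistics.pstdev(values)       # square root of population variance
    samp_sd = statistics.stdev(values)       # square root of sample variance
    print(name, pop_var, samp_var, round(pop_sd, 4), round(samp_sd, 4))
```

Both sets have a mean of 50, but the variance and standard deviation of the second set are far larger.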
Chebyshev’s Theorem
- For any data set, the proportion of values within k standard deviations of the mean is at least 1 – 1/k², for any k > 1.
- For k = 2, at least ¾ of data lie within 2 standard deviations of the mean.
- For k = 3, at least 8/9 (approximately 89%) of data lie within 3 standard deviations.
- For k = 10, at least 99/100 (99%) of data lie within 10 standard deviations of the mean.
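The bound can be evaluated for any k with a one-line function. The name `chebyshev_lower_bound` is hypothetical; the formula is the one stated above.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if not k > 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k**2

for k in (2, 3, 10):
    print(k, chebyshev_lower_bound(k))
```

Because the theorem holds for any data set, these are guaranteed minimums and the actual proportion is often higher.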
Empirical Rules
- For data that are approximately normally distributed, about 68% of observations fall within one standard deviation of the mean.
- Approximately 95% of observations fall within two standard deviations of the mean.
- Approximately 99.7% of observations fall within three standard deviations of the mean.
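These percentages can be reproduced from the normal distribution itself. The sketch below assumes a standard normal distribution and uses `statistics.NormalDist` (Python 3.8 or later); the helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
```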
Coefficient of Variation (CV)
- CV is defined as (Standard Deviation / Mean).
- It is dimensionless and useful when comparing differently scaled data sets.
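To see why a dimensionless measure helps with differently scaled data, the sketch below computes the CV for the same hypothetical heights expressed in centimetres and in metres. Using the sample standard deviation is an assumption; the notes do not say which form to use.

```python
import statistics

def coefficient_of_variation(data):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(data) / statistics.mean(data)

# Hypothetical heights, invented for illustration only.
heights_cm = [150, 160, 170, 180, 190]
heights_m = [x / 100 for x in heights_cm]

cv_cm = coefficient_of_variation(heights_cm)
cv_m = coefficient_of_variation(heights_m)
print(round(cv_cm, 4), round(cv_m, 4))
```

The standard deviations differ by a factor of 100, but the two CV values agree.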
Skewness
- The coefficient of skewness (CS) measures the asymmetry of a distribution.
- -0.5 < CS < 0.5 indicates relative symmetry.
- CS > 1 or CS < -1 indicates a high degree of skewness.
- The Excel function SKEW(data range) computes skewness.
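The notes do not give a formula for CS. As an assumption, the sketch below uses the common population form, the average cubed deviation from the mean divided by the cube of the population standard deviation. Excel's SKEW applies a sample adjustment, so its output will generally differ from this sketch. The data are hypothetical.

```python
import statistics

def coefficient_of_skewness(data):
    """Population form (assumed): mean cubed deviation / sigma cubed."""
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data)
    mean_cubed_deviation = sum((x - mu) ** 3 for x in data) / len(data)
    return mean_cubed_deviation / sigma ** 3

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]   # one large value stretches the right tail

cs_symmetric = coefficient_of_skewness(symmetric)
cs_right = coefficient_of_skewness(right_skewed)
print(cs_symmetric, round(cs_right, 2))
```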
Kurtosis
- It refers to the peakedness or flatness of a distribution.
- The coefficient of kurtosis (CK) indicates distribution characteristics.
- CK < 3 means the distribution is flatter with a wide degree of dispersion.
- CK > 3 means the distribution is more peaked with less dispersion.
- A higher kurtosis value means more area in the tails of the distribution.
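- The Excel function KURT(data range) determines kurtosis. Note that KURT reports excess kurtosis (a normal distribution scores about 0), so its output should be compared with 0 rather than 3.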
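The notes do not state how CK is computed. The sketch below assumes the common population form, the average fourth-power deviation from the mean divided by the squared population variance, which is the scale on which a normal distribution scores 3. Excel's KURT uses a different, sample-adjusted formula, so its output will not match this sketch. The data are hypothetical.

```python
import statistics

def coefficient_of_kurtosis(data):
    """Population form (assumed): mean fourth-power deviation / variance squared."""
    mu = statistics.fmean(data)
    variance = statistics.pvariance(data)
    mean_fourth_deviation = sum((x - mu) ** 4 for x in data) / len(data)
    return mean_fourth_deviation / variance ** 2

flat = [1, 2, 3, 4, 5]                        # evenly spread values
peaked = [0, 5, 5, 5, 5, 5, 5, 5, 5, 10]      # concentrated centre, two far values

ck_flat = coefficient_of_kurtosis(flat)
ck_peaked = coefficient_of_kurtosis(peaked)
print(round(ck_flat, 2), round(ck_peaked, 2))
```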
Excel Descriptive Statistics Tool
- Excel includes a tool for generating descriptive statistics.
Measures of Association
- Correlation measures the strength of the linear relationship between two variables.
- The correlation coefficient ranges between -1 and 1.
- A correlation of 0 suggests no linear association.
- A positive correlation indicates that one variable increases as the other increases.
- A negative correlation indicates that one variable increases as the other decreases.
- The Excel function CORREL or the Data Analysis Correlation tool can be utilized.
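A minimal sketch of the correlation coefficient is given below, assuming the standard Pearson formula (the notes do not spell it out). The function name and data are hypothetical. The third example shows that a correlation of 0 rules out only a linear association: y is completely determined by x there, yet the coefficient is 0.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_deviation / math.sqrt(spread_x * spread_y)

xs = [1, 2, 3, 4, 5]
r_positive = correlation(xs, [2, 4, 6, 8, 10])   # y rises with x
r_negative = correlation(xs, [10, 8, 6, 4, 2])   # y falls as x rises
r_zero = correlation([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])  # y = x squared
print(r_positive, r_negative, r_zero)
```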
Excel Tool - Correlation
- Access via Excel menu > Tools > Data Analysis > Correlation (in Excel 2007 and later, Data tab > Data Analysis > Correlation).
Descriptive Statistics for Categorical Data
- Sample proportion is the fraction of data with a certain characteristic.
- Use the Excel function COUNTIF(data range, criteria) to count observations meeting a criterion, in order to compute proportions.
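The count-then-divide step can be mirrored in Python. The survey responses below are invented for illustration; the counting expression plays the role that COUNTIF plays in Excel.

```python
# Hypothetical categorical data, invented for illustration only.
responses = ["yes", "no", "yes", "yes", "no", "yes", "no", "yes"]

count_yes = sum(1 for r in responses if r == "yes")  # observations meeting the criterion
proportion_yes = count_yes / len(responses)          # sample proportion p

print(count_yes, proportion_yes)
```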
Cross-Tabulation (Contingency Table)
- It is a table that displays the number of observations in a data set for different subcategories of two categorical variables.
- Subcategories must be mutually exclusive and exhaustive, meaning each observation fits into only one subcategory, and all subcategories constitute the complete data set.
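A cross-tabulation can be sketched with a counter keyed on pairs of categories. The records below are hypothetical (a gender and a yes/no graduate-degree flag), and the layout is only one possible way to print the table.

```python
from collections import Counter

# Hypothetical (gender, graduate degree?) records, invented for illustration.
records = [
    ("Female", "Yes"), ("Female", "No"), ("Male", "Yes"),
    ("Male", "No"), ("Female", "Yes"), ("Male", "No"),
]

cross_tab = Counter(records)  # one count per (row, column) combination

rows = sorted({gender for gender, _ in records})
columns = sorted({degree for _, degree in records})
print("columns:", columns)
for gender in rows:
    print(gender, [cross_tab[(gender, degree)] for degree in columns])
```

Because every record falls into exactly one cell, the cell counts add up to the total number of observations.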
Box Plots
- Box plots display the minimum, first quartile (Q1), median, third quartile (Q3), and maximum values graphically.
Dot Scale Diagram
- PHStat menu > Descriptive Statistics > Dot Scale Diagram.
Outliers
- Outliers significantly affect the results of statistical analyses.
- Box plots and dot-scale diagrams can help identify possible outliers visually.
Other approaches to identifying outliers
- Use the empirical rule; an outlier is a value more than three standard deviations from the mean.
- "Mild" outliers are often defined as values between 1.5*IQR and 3*IQR to the left of Q1 or to the right of Q3.
- "Extreme" outliers are values more than 3*IQR away from these quartiles.
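The IQR-based definitions can be sketched as a small classifier. The function name and data are hypothetical, and the quartiles come from `statistics.quantiles` with its default method, so borderline values could be classified differently under another quartile convention.

```python
import statistics

def classify_outliers(data):
    """Return (mild, extreme) outliers based on distance beyond Q1 or Q3."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    mild, extreme = [], []
    for x in data:
        # Distance below Q1 or above Q3; zero or negative inside the box.
        distance = max(q1 - x, x - q3)
        if distance > 3 * iqr:
            extreme.append(x)
        elif distance > 1.5 * iqr:
            mild.append(x)
    return mild, extreme

# Hypothetical data, invented for illustration only.
data = [10, 12, 13, 14, 15, 16, 17, 18, 19, 30, 45]
mild, extreme = classify_outliers(data)
print(mild, extreme)
```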
PivotTables
- Create custom summaries and charts from data, requiring a data set with column labels.
- Select any cell and choose PivotTable Report from the Data menu, following the wizard steps.
Blank PivotTable
- Drag the desired data to Row Labels, Column Labels or Values.
Example PivotTable Usage
- Gender from the PivotTable Field List is dragged to the Row Labels area.
- Graduate Degree? is dragged to the Column Labels area.
- Years of Service is dragged to the Values area.
Value Field Settings
- In the Options tab (under PivotTable Tools), click on the Active Field group and choose Value Field Settings to change the summary type.
Changing PivotTable Views
- Uncheck the boxes in the PivotTable Field List.
- Drag the variable names to different field areas.
PivotTables for Cross Tabulation
- PivotTables can be used for cross-tabulation.
Calculation of Mean in Grouped Data
- Sample mean x̄ = Σ(fi · xi) / n.
- Population mean µ = Σ(fi · xi) / N.
Grouped Frequency Distribution
- Estimate the mean using the midpoint of each cell (xi) and the number of observations in that cell (fi).
Calculation of Variance in Grouped Data
- Sample variance s² = Σfi(xi - x̄)² / (n - 1).
- Population variance σ² = Σfi(xi - µ)² / N.
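The grouped-data formulas can be worked through in Python. The cell midpoints and frequencies below are hypothetical, and the results are estimates because every observation in a cell is treated as if it sat at the cell's midpoint.

```python
# Hypothetical grouped frequency distribution, invented for illustration.
midpoints = [5, 15, 25, 35]     # xi: midpoint of each cell
frequencies = [2, 5, 2, 1]      # fi: number of observations in each cell

n = sum(frequencies)

# Estimated mean: sum of fi * xi divided by n.
mean = sum(f * x for f, x in zip(frequencies, midpoints)) / n

# Sum of fi * (xi - mean) squared, shared by both variance formulas.
weighted_squares = sum(f * (x - mean) ** 2 for f, x in zip(frequencies, midpoints))

sample_variance = weighted_squares / (n - 1)
population_variance = weighted_squares / n

print(mean, round(sample_variance, 2), population_variance)
```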
Description
Explore descriptive statistics, including measures of central tendency and dispersion. Learn about frequency distributions, relative frequency, and how to represent data graphically using histograms. Also, discover how to use Microsoft Excel for statistical analysis.