Statistic Reviewer PDF
Document Details
Uploaded by FoolproofLasVegas
Summary
This document reviews fundamental concepts in statistics, covering descriptive and inferential statistics, different types of variables (qualitative and quantitative—discrete and continuous), and levels of measurement (nominal, ordinal, interval, and ratio).
Full Transcript
STATISTIC REVIEWER (INTRODUCTORY CONCEPTS)

I. MEANING OF STATISTICS
STATISTICS is the science of conducting studies to collect, organize, summarize, analyze, and draw conclusions from data.
"STATISTICS are like bikinis. What they reveal is suggestive, but what they conceal is vital."

II. BRANCHES OF STATISTICS
DESCRIPTIVE STATISTICS: used to describe, organize and summarize information about an entire population (e.g., 90% satisfaction of all customers).
INFERENTIAL STATISTICS: used to generalize about a population based on a sample of data (e.g., 90% satisfaction of all customers, estimated from a sample).
REMEMBER: Descriptive statistics summarize your current dataset, and inferential statistics aim to draw conclusions about an additional population outside of your dataset.

III. POPULATION and SAMPLE
POPULATION
o The measurable quality is called a parameter.
o The population is a complete set.
o Reports are a true representation of opinion.
o It contains all members of a specified group.
SAMPLE
o The measurable quality is called a statistic.
o The sample is a subset of the population.
o Reports have a margin of error and confidence interval.
o It is a subset that represents the entire population.

IV. CONSTANTS and VARIABLE
CONSTANT is a characteristic or property of a population or sample which makes the members similar to each other.
VARIABLE is a characteristic of interest, measurable on each and every individual in the universe, which assumes different values or labels. It is denoted by a capital letter of the English alphabet.

V. QUALITATIVE VARIABLES and QUANTITATIVE VARIABLES
QUALITATIVE VARIABLES are variables that have distinct categories according to some characteristic or attribute.
QUANTITATIVE VARIABLES are variables that can be counted or measured.

VI. TYPES OF QUANTITATIVE VARIABLE
DISCRETE VARIABLES assume values that can be counted.
Examples:
o The number of children in a family
o The number of students in a classroom
CONTINUOUS VARIABLES can assume an infinite number of values between any two specific values. They are obtained by measuring. They often include fractions and decimals.
Examples:
o Temperature
o Height
o Weight

VII. LEVELS or SCALES OF MEASUREMENT
The NOMINAL LEVEL OF MEASUREMENT classifies data into mutually exclusive (non-overlapping) categories in which no order or ranking can be imposed on the data.
Examples:
o Gender
o Eye color
The ORDINAL LEVEL OF MEASUREMENT classifies data into categories that can be ranked; however, precise differences between the ranks do not exist.
Examples:
o Student letter grades
o Ranking of players
The INTERVAL LEVEL OF MEASUREMENT ranks data, and precise differences between units of measure do exist; however, there is no meaningful zero.
Examples:
o Temperature (Celsius or Fahrenheit)
o Standardized exam score
The RATIO LEVEL OF MEASUREMENT possesses all the characteristics of interval measurement, and there exists a true zero. In addition, true ratios exist when the same variable is measured on two different members of the population.
Examples:
o Height
o Age

THE FREQUENCY DISTRIBUTION
A frequency distribution is the organization of raw data in table form, using classes and frequencies.
Types of frequency distributions:
o Categorical frequency distribution
o Ungrouped frequency distribution
o Grouped frequency distribution
The frequency or the frequency count for a data value is the number of times the value occurs in the data set.

Categorical frequency distribution represents data that can be placed in specific categories.
Example:
Twenty-five incoming freshmen were given a blood test to determine their blood type. The data set is
A B B AB O
O O B AB B
B B O A O
A O O O AB
AB A O B A
Construct a frequency distribution for the data.

Ungrouped frequency distribution lists the data values with the corresponding number of times (frequency count) with which each value occurs.
Example:
The following data represent the number of defective bulbs observed each day over a 25-day period for a manufacturing process. Summarize the information with a frequency distribution.
[Data table not reproduced.]

Grouped frequency distribution is obtained by constructing classes (or intervals) for the data, and then listing the corresponding number of values (frequency count) in each interval.
To construct a frequency distribution, follow these rules:
1. There should be between 5 and 20 classes.
2. It is preferable but not absolutely necessary that the class width be an odd number.
3. The classes must be mutually exclusive.
4. The classes must be continuous.
5. The classes must be exhaustive.
6. The classes must be equal in width.

Constructing a Grouped Frequency Distribution
To construct a frequency distribution, follow these steps:
1. Determine the classes.
o Find the highest and lowest values.
o Find the range.
o Select the number of classes desired.
o Find the width by dividing the range by the number of classes and rounding up.
o Select a starting point (usually the lowest value or any convenient number less than the lowest value); add the width to get the lower limits.
o Find the upper class limits.
o Find the boundaries.
2. Tally the data.
3. Find the numerical frequencies from the tallies, and find the cumulative frequencies.

Sometimes it is necessary to use a cumulative frequency distribution. A cumulative frequency distribution is a distribution that shows the number of data values less than or equal to a specific value (usually an upper boundary).

Number of classes
o Some statisticians use the "2^k" rule: choose the smallest number of classes k such that 2^k ≥ n, where n is the number of observations.
o The 2^k rule is just a guide.
o If the 2^k rule suggests you need 6 classes, also consider using 5 or 7 classes … but certainly not 3 or 9.

NOTE
The class limits should have the same decimal place value as the data, but the class boundaries should have one additional place value and end in a 5.

Example:
The data below represent the record high temperatures in degrees Fahrenheit (F) for each of the 50 cities in the Philippines this April. Construct a grouped frequency distribution for the data, using 7 classes.
[Data table not reproduced.]
o Find the highest value and lowest value: H = 134 and L = 100.
o Find the range: R = highest value - lowest value = H - L, so R = 134 - 100 = 34.
o In this case, we will use 7 classes to construct the frequency distribution.
o Find the class width by dividing the range by the number of classes: 34 / 7 ≈ 4.86, which rounds up to 5.

THE FREQUENCY DISTRIBUTION USING MS EXCEL
The "frequency function" can be found in the Formulas menu under the Statistical category by following the steps below:
✔ Go to Formula menu.
✔ Click on More Function.
✔ Under Statistical category choose Frequency Function.
✔ We will get the Frequency Function Dialogue box.
[Figure: Excel screenshot]

THE MEASURES OF CENTRAL TENDENCY
Mean (Arithmetic Mean) of Data Values
o Sample mean: x̄ = Σx / n
o Population mean: μ = Σx / N
Mean (Arithmetic Mean)
o The most common measure of central tendency
o Affected by extreme values (outliers)
Median
o Robust measure of central tendency
o Not affected by extreme values
o In an ordered array, the median is the "middle" number.
o If n or N is odd, the median is the middle number.
o If n or N is even, the median is the average of the 2 middle numbers.
Mode
o A measure of central tendency
o Value that occurs most often
o Not affected by extreme values
o There may not be a mode.
o There may be several modes.
o Used for either numerical or categorical data

THE MEASURES OF CENTRAL TENDENCY USING EXCEL
[Figure: Excel screenshot]

THE MEASURES OF LOCATION
Location or Position
Used to describe the position of a data value in relation to the rest of the data.
Types:
1. Quartiles
Q1 (Lower Quartile): At most, 25% of data is smaller than Q1. It divides the lower half of a data set in half.
Q2 (Median): The median divides the data set in half. 50% of the data values fall below the median and 50% fall above.
Q3 (Upper Quartile): At most, 25% of data is larger than Q3.
It divides the upper half of the data set in half.

Interquartile Range
o The interquartile range is Q3 - Q1.
o 50% of the observations in the distribution are in the interquartile range.
The following figure shows the relationship between the quartiles, the median and the interquartile range.
[Figure: quartiles, median, and interquartile range]

2. Deciles
3. Percentiles

THE MEASURES OF LOCATION USING EXCEL
[Figure: Excel screenshot]

MEASURES OF VARIATION
Measuring Variability
Variability can be measured with
o the range
o the interquartile range
o the standard deviation/variance
o the coefficient of variation.
In each case, variability is determined by measuring distance.

The Range
The range is the total distance covered by the distribution, from the highest score to the lowest score.

Limitations of the Range
o It is based only on two values and does not cover all the data values in a data set.
o It is subject to wide fluctuations from sample to sample based on the same population.
o It fails to give any idea about the pattern of distribution.
o It is not possible to compute the range when the distribution is open-ended.

The Standard Deviation
Standard deviation measures the standard distance between a score and the mean.

The Variance
The population variance is the average of the squares of the distance each value is from the mean: σ² = Σ(x - μ)² / N. The population standard deviation σ is the square root of the variance.

Properties of the Standard Deviation
If a constant is added to every score in a distribution, the standard deviation will not be changed.
If you visualize the scores in a frequency distribution histogram, then adding a constant will move each score so that the entire distribution is shifted to a new location. The center of the distribution (the mean) changes, but the standard deviation remains the same.

Properties of the Standard Deviation (cont.)
If each score is multiplied by a constant, the standard deviation will be multiplied by the same constant (by its absolute value if the constant is negative).
Multiplying by a constant will multiply the distance between scores, and because the standard deviation is a measure of distance, it will also be multiplied.
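
The two properties of the standard deviation described above can be checked numerically. The sketch below is not part of the original reviewer: it assumes Python with the standard-library statistics module, treats a small made-up list of scores as a whole population (so it uses pvariance and pstdev, the population versions), and the names scores, shifted and scaled are illustrative only.

```python
from statistics import mean, pstdev, pvariance

# Hypothetical scores, treated as a complete population (an assumption).
scores = [2, 4, 4, 4, 5, 5, 7, 9]

def data_range(values):
    """Range: distance from the highest score to the lowest score."""
    return max(values) - min(values)

# Property 1: adding a constant shifts every score by the same amount.
shifted = [x + 10 for x in scores]

# Property 2: multiplying by a constant stretches the distances between scores.
scaled = [x * 3 for x in scores]
flipped = [x * -3 for x in scores]  # a negative constant, to show |c|

print("range:", data_range(scores))
print("mean:", mean(scores), "variance:", pvariance(scores), "sd:", pstdev(scores))
print("after +10 -> mean:", mean(shifted), "sd:", pstdev(shifted))
print("after x3  -> mean:", mean(scaled), "sd:", pstdev(scaled))
print("after x-3 -> sd:", pstdev(flipped))
```

With these scores the mean moves when a constant is added while the standard deviation stays put, and tripling the scores triples the standard deviation. For sample data, statistics.variance and statistics.stdev (which divide by n - 1) would be the counterparts.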
The Coefficient of Variation
The coefficient of variation, denoted by CVar, is the standard deviation divided by the mean. The result is expressed as a percentage.

Chebyshev's Theorem
This theorem states that, for any distribution:
✔ At least three-fourths or 75% of all data values will fall within 2 standard deviations of the mean.
✔ At least eight-ninths or about 89% of all data values will fall within 3 standard deviations of the mean.

The Empirical Rule
When a distribution is bell-shaped (approximately normal):
✔ Approximately 68% of the data values will fall within 1 standard deviation of the mean.
✔ Approximately 95% of the data values will fall within 2 standard deviations of the mean.
✔ Approximately 99.7% of the data values will fall within 3 standard deviations of the mean.

MEASURES OF VARIATION using EXCEL
[Figure: Excel screenshot]
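
The coefficient of variation and Chebyshev's theorem from this section can also be sketched outside Excel. The Python example below is an illustration rather than the reviewer's own method. It assumes the population standard deviation for CVar (the text does not say whether the sample or population version is meant), uses made-up scores, and relies on the general form of Chebyshev's bound, 1 - 1/k² for k > 1, of which the 75% and 89% figures are the k = 2 and k = 3 cases. All function names are hypothetical.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """Standard deviation divided by the mean, expressed as a percentage.
    Assumption: population standard deviation (pstdev)."""
    return pstdev(values) / mean(values) * 100

def chebyshev_lower_bound(k):
    """Minimum fraction of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k**2

def fraction_within(values, k):
    """Observed fraction of values within k standard deviations of the mean."""
    m, s = mean(values), pstdev(values)
    inside = [x for x in values if abs(x - m) <= k * s]
    return len(inside) / len(values)

# Hypothetical scores, treated as a complete population.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

print("CVar (%):", coefficient_of_variation(scores))
for k in (2, 3):
    print(k, "sd -> guaranteed:", chebyshev_lower_bound(k),
          "observed:", fraction_within(scores, k))
```

The observed fraction can never fall below Chebyshev's guarantee, whatever the shape of the data. The empirical rule percentages, by contrast, are only approximations that hold for bell-shaped data, so a small or skewed data set like this one need not match them.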