MMW Midterm PDF
Summary
This document provides an overview of basic statistical concepts. It covers key terms such as variables, data, and scales of measurement, and introduces descriptive and inferential statistics along with sampling techniques. The document is likely a midterm exam study guide or set of notes.
Full Transcript
Basic Statistical Concepts

Statistics is the science of dealing with numbers. Data management matters in statistics because every discipline treats data as a valuable resource. It is a process that includes acquiring, validating, organizing, processing, analyzing, and presenting data so that the results carry meaning and support statistically sound conclusions in any research activity.

Key Statistical Terms

1. Variable refers to any characteristic, number, or quantity that can be counted or measured.

Qualitative variables, also called categorical variables, describe characteristics that cannot be measured numerically.
Example: gender (male or female), eye color (blue, green, brown, hazel)

Quantitative variables are measured on a numerical or quantitative scale. Ordinal, interval, and ratio scales are quantitative.
Example: a car's speed, shoe size, test scores, and total number of hours of sleep

Discrete variables assume a finite number of isolated values within a specified range. The values are obtained through counting and cannot be divided into fractions, as with gender, blood group, and number of children in a family. This type is also known as a categorical or classificatory variable; no values can exist between two categories.

Continuous variables assume an infinite number of different values. Values are obtained by measuring and can be divided into fractions. Examples include age, height, and temperature.

2. Data is a set of values of subjects with respect to quantitative or qualitative variables, used as a basis for reasoning, discussion, or calculation.

3. Scale of measurement

Nominal data are categories that cannot be ordered in any particular way, such as female and male, yes and no responses, political affiliations (LP, Lakas, LDP), religious groupings (Christian and non-Christian), and other organizations.
Ordinal data are data such as Strongly agree, Agree, No opinion, Disagree, and Strongly disagree, as well as other data that employ rankings.

Interval data provide numbers that reflect differences among items. On interval scales, the measurement units are equal.

Ratio data: the ratio scale is the highest type of scale. The basic difference between the interval and ratio scales is that the interval scale has no true zero value while the ratio scale has an absolute zero value.

4. Descriptive statistics refers to techniques for gathering data and presenting a summary result from the analysis of a set of data or information. These include frequency distributions, measures of central tendency, measures of dispersion, measures of relative position, tests of normality, and graphs.

5. Inferential statistics refers to practices for making decisions and drawing conclusions about a population using representative samples. Major types of inference include regression, confidence intervals, and hypothesis tests.

6. Parameter is an important component of any statistical analysis. It is a numerical characteristic of the entire population.
Example: 33% of the 516 students who took the entrance exam at a particular college scored below the passing mark. (Every student's test score is recorded.)

7. Statistic is a numerical value computed from data on a portion of a population.
Example: 78% of Filipinos were against the legalization of same-sex marriage, based on an online survey by the Philippine House of Representatives.

8. Analysis involves gathering and examining simple or raw data. It is the process of breaking a complex topic or substance into smaller parts in order to give it meaning and understand it better.

Univariate analysis is the simplest form of data analysis, involving only one variable at a time.
Example: Height of college students

Bivariate analysis is the examination of two variables simultaneously.
Example: Relationship between the study habits and test anxiety of students

Multivariate analysis is an investigation of more than two variables at once.
Example: The relationship among level of self-discipline, academic performance, and logical skills

LESSON 2 Sampling Techniques

Identifying the people and places is the most important step in the process of accumulating quantitative data. It includes identifying which group of people will be the participants and how many persons should be involved. If the population is too large to study in full, a representative sample may be used instead.

Representation refers to the selection of individuals as a sample of a population so that conclusions about the population as a whole can be drawn from the sample.

Population is a group of people or individuals that share common connections. It is the totality of the objects or persons under investigation.

Sample is a subgroup of the population that represents the characteristics and attributes of the target population.

Sample Sizes

The easiest and most common way to determine the sample size (n) needed to represent a finite population of N individuals is Slovin's formula. It is attributed to Robert Slovin and is used to determine the appropriate number of participants in a survey. The computation depends on knowing the population size, so the formula cannot be used without the actual total population.

Slovin's Formula
$n = \frac{N}{1 + N e^{2}}$
Where: n = number of samples (sample size); N = population size; e = margin of error

Example 1: Find the sample size if the population size is 3215 at 95% accuracy.
Solution: At 95% accuracy, the corresponding margin of error is 5%.
$n = \frac{N}{1 + N e^{2}} = \frac{3215}{1 + 3215(0.05)^{2}} = \frac{3215}{9.0375} \approx 355.74$, or 356

Example 2:
A group of senior high school students aims to describe the interpersonal skills of the STEM students but does not have the resources to survey the entire population of 1,556. Help them determine the number of respondents that would represent all STEM students with a 3% margin of error. What should their sample size be?
Solution:
$n = \frac{N}{1 + N e^{2}} = \frac{1556}{1 + 1556(0.03)^{2}} = \frac{1556}{2.4004} \approx 648.22$, or 648

Non-probability Sampling Techniques

The samples in non-probability sampling techniques are selected based on subjective judgment rather than random selection. The different methods are convenience sampling, quota sampling, purposive sampling, and snowball sampling.

Convenience sampling is a technique in which individuals are selected because they are suitable and conveniently available. It is also called accidental sampling.

Quota sampling sets the number of samples in advance, and respondents are then selected based on their availability.

Purposive sampling happens when the selection of the sample is based on the characteristics of a population and on the objective of the study. It is also known as judgment, selective, or subjective sampling.

Snowball sampling is employed when the needed sample is difficult to find, because one sample member may lead to more of the same kind.

Probability Sampling

Probability sampling is a method of sampling that employs random selection. This process gives all individuals in the population an equal chance of being selected.

Simple random sampling. The sample is drawn randomly from a list of the population, so every item in the population has an equal chance and likelihood of being selected.
Example: Put 50 names into a bowl. Select 15 names from the bowl without looking, to eliminate bias in identifying the samples.

Systematic sampling.
Representatives of the population are selected from a random starting point at a fixed, periodic interval.
Example: For a sample size of 3 from a population of 12, select every 12/3 = 4th member of the sampling frame. The figure below shows possible selections of the sample using systematic sampling.

Stratified random sampling. This method partitions the required number of samples equally or proportionally according to the population of each subgroup or stratum. In stratified random sampling, or stratification, the strata are formed based on members' shared attributes or characteristics.
Example: Suppose that four samples should represent three groups (A, B, and C). Based on the subgroup populations or strata, the stratified random sample is obtained using this formula: (sample size / population size) × stratum size.

LESSON 3 Data Presentation

Acquired data are in most cases raw and disorganized. It is important to manage the collected raw data because organized data are a very valuable resource. When data are well organized, they are easy to interpret and give meaning to. Appropriate tables and graphs are used to arrange and systematize data.

Frequency Distribution Table

A frequency distribution is a table that shows the occurrence of various outcomes in a sample within a particular group or interval. It tells how frequencies are distributed over values. In this way, it helps identify noticeable trends within a data set and can be used as a basis for comparing data sets of the same type. It is mostly used for summarizing categorical variables, although both categorical and numerical data can be presented in a frequency table.

Categorical data such as color, gender, and school level or type are also called qualitative. Table 3.1 presents an example of a frequency table for categorical data. Each category and its frequency are shown so that the gender of most students is easy to determine.
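A frequency table for categorical data can be sketched in a few lines of Python. This is only an illustration: the `genders` list is hypothetical data (it is not the data behind Table 3.1), and `frequency_table` is a helper name introduced here.

```python
from collections import Counter

# Hypothetical responses for illustration only; NOT the data in Table 3.1.
genders = ["Female", "Male", "Female", "Female", "Male", "Female", "Male", "Female"]


def frequency_table(values):
    """Return (category, frequency, percentage) rows, most frequent first."""
    counts = Counter(values)
    total = len(values)
    return [
        (category, freq, round(100 * freq / total, 2))
        for category, freq in counts.most_common()
    ]


table = frequency_table(genders)
for category, freq, pct in table:
    # Left-align the category, right-align the count and percentage.
    print(f"{category:<8}{freq:>4}{pct:>8.2f}%")
```

With this hypothetical list, the table has two rows (Female: 5, 62.5%; Male: 3, 37.5%), so the most common category can be read off the first row.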
Numerical or interval data such as age, price, height, or counts are also called quantitative. For instance, a trigonometry class of 35 students took a summative test, and their unorganized raw scores are presented in Table 3.2. The summary in Table 3.3 shows the scores, tally, frequency, and percentage.

Graphical Presentation

Pie Chart
Figure 3.1. Female and Male Enrollees
A pie chart is a statistical device that gives an easy presentation of nominal or any categorical data by showing each part or division in relation to the whole. Figure 3.1 displays the distribution of female and male enrollees in the College of Secondary Education.

Bar Graph
A bar graph consists of rectangular bars of equal width aligned horizontally or vertically. It is used to compare data sizes and frequencies. When the data are ordinal or interval, the bars are connected, while bars for nominal data are drawn apart. Figure 3.2 shows the marital status of the respondents in the survey on the product testing of kamias (Averrhoa bilimbi) jam made by Home Economics students. Likewise, their opinion about recommending the product is shown in Figure 3.3.

Line Graph
Also known as a frequency polygon, this graph is prepared by plotting the points of paired values and connecting the points with straight lines. A frequency polygon is used to visualize changes over time. Figure 3.4 illustrates a line graph showing the results of the students' summative test in trigonometry.

LESSON 4 Measures of Central Tendency

Measuring central tendency means obtaining a single value that attempts to describe and represent the whole set of data. Its purpose is to identify the central position or typical value of a dataset. These measures indicate where most values in a distribution are located.
Central tendency belongs to descriptive statistics; its three most common measures are the mean, median, and mode.

Median
The median is the middle value of a set of data. It locates the point that divides the data in half, with an equal number of observations on the upper and lower sides.

Ungrouped data
Step 1: Arrange the values from lowest to highest.
Step 2: Locate the middle value.

Example 1: The sample is composed of nine middle children whose ages are 14, 19, 13, 14, 12, 15, 18, 17, and 16. Find the median.
Arranged: 12, 13, 14, 14, 15, 16, 17, 18, 19
The median is 15.

Example 2: 9, 4, 3, 2, 1, 1, 8, 7, 6, 5
Arranged: 1, 1, 2, 3, 4, 5, 6, 7, 8, 9
Since there are two middle values, a and b, take their average $(a + b)/2$: $(4 + 5)/2 = 4.5$. So, the median is 4.5.

Grouped data
Formula: $Md = l + \left(\frac{\frac{n}{2} - cf}{f}\right) i$
Where: Md = median; l = lower boundary of the median class; n = total frequency; cf = cumulative frequency of the class just below the median class; f = frequency of the median class; i = class interval

Mode
The mode is the value with the greatest frequency, that is, the value that occurs most often. It is not affected by extreme values. When a set of values has one mode, it is called unimodal. A set with exactly two modes is bimodal, and a set with more than one mode is multimodal.

Ungrouped data
Example 1: The following are the sizes of shoes sold. Find the most common shoe size sold (mode).
5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 8, 8, 8
So, the bestselling (unimodal) shoe size is 6.
Example 2: 9, 4, 3, 2, 2, 1, 1, 8, 7, 6, 5
Multimodal: the modes are 2 and 1.

Grouped data
Formula: $Mo = l + \left(\frac{\Delta_{1}}{\Delta_{1} + \Delta_{2}}\right) i$
Where: Mo = mode; l = lower boundary of the modal class; $\Delta_{1}$ = difference between the frequencies of the modal class and the previous class; $\Delta_{2}$ = difference between the frequencies of the modal class and the next class; i = class interval

Mean
The mean is the sum of the values in the data set divided by the number of values in the set.
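The ungrouped examples above can be checked with Python's standard `statistics` module. This is a minimal sketch, assuming Python 3.8 or later (for `statistics.multimode`); the variable names are introduced here for illustration, and the data are the values quoted in the examples.

```python
import statistics

# Data copied from the ungrouped examples in this lesson.
ages = [14, 19, 13, 14, 12, 15, 18, 17, 16]          # median Example 1
scores = [9, 4, 3, 2, 1, 1, 8, 7, 6, 5]              # median Example 2
shoe_sizes = [5, 5, 5, 6, 6, 6, 6, 6, 6, 6,
              7, 7, 7, 7, 7, 8, 8, 8]                # mode Example 1
values = [9, 4, 3, 2, 2, 1, 1, 8, 7, 6, 5]           # mode Example 2

# Median: the middle value, or the average of the two middle values.
median_ages = statistics.median(ages)
median_scores = statistics.median(scores)

# Mode: the most frequent value; multimode lists every tied mode
# in order of first appearance.
mode_shoes = statistics.mode(shoe_sizes)
modes_values = statistics.multimode(values)

# Mean: the sum of the values divided by how many values there are.
mean_ages = sum(ages) / len(ages)

print(median_ages, median_scores, mode_shoes, modes_values, round(mean_ages, 2))
```

The medians and modes agree with the worked answers (15, 4.5, 6, and the two modes 2 and 1), and the mean of the nine ages is 138/9, about 15.33.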
Thus, the mean refers to the average score or value of a group of data. It is the most reliable measure of central tendency but also the most sensitive to extreme values.

Ungrouped data
The sample is composed of nine middle children whose ages are 14, 19, 13, 14, 12, 15, 18, 17, and 16. Find the mean.
$\bar{x} = \frac{\sum x}{n} = \frac{14 + 19 + 13 + 14 + 12 + 15 + 18 + 17 + 16}{9} = \frac{138}{9} = 15.33$

Grouped data
Formula: $\bar{x} = \frac{\sum fx}{n}$
where: $\bar{x}$ = mean; $\sum fx$ = sum of the products of the frequencies and the class midpoints; n = sample size

Weighted Mean
The weighted mean is a type of mean calculated when the data points do not contribute equally to the final mean. It is used when some data points carry more "weight" than others. It is obtained by multiplying each weight by its number of occurrences, summing the products, and dividing by the total number of occurrences. It is commonly used with survey instruments that include a Likert scale, a type of rating scale used to measure attitudes or opinions.

Example
Item: Is the physical appearance of the product recommendable?
Strongly Disagree (1): 2
Disagree (2): 5
Agree (3): 14
Strongly Agree (4): 10
N: 31
Solution: $WM = \frac{2(1) + 5(2) + 14(3) + 10(4)}{31} = \frac{94}{31} = 3.03$

Steps in finding the median of grouped data:
1. Construct the less-than cumulative frequency (cf) by copying the frequency of the lowest class, then adding the frequency of each succeeding class until the cumulative frequency of the last class equals the total frequency.
2. Determine the median class.
a. Get half of the total frequency, n/2.
b. Locate the median class: the class whose cumulative frequency is greater than or equal to n/2.
3. Identify the lower boundary of the median class by subtracting 0.5 from its lower limit.

Solution:
$\frac{n}{2} = \frac{35}{2} = 17.5$
$Md = 28.5 + \left(\frac{17.5 - 9}{9}\right) 7$
$Md = 28.5 + \left(\frac{8.5}{9}\right) 7$
$Md = 28.5 + (0.944)(7)$
$Md = 28.5 + 6.61$
$Md = 35.11$

LESSON 5 Measures of Dispersion

The measures of dispersion are often called measures of variability. A measure of dispersion describes the spread of the data, or its variation around a central value.
It provides information about the complete series by describing how the values relate to one another in terms of their homogeneity.

Range
The range is the simplest measure of variability or dispersion. It is the difference between the highest value and the lowest value in the given distribution. For grouped data, it is the upper boundary of the highest class minus the lower boundary of the lowest class.

Mean Deviation
The mean deviation is a measure of variation or dispersion that takes into consideration the differences of the individual scores from the mean.
Ungrouped data: $MD = \frac{\sum |x - \bar{x}|}{n}$
Grouped data: $MD = \frac{\sum f|x - \bar{x}|}{n}$, where x is the class midpoint

Variance and Standard Deviation
Variance and standard deviation measure how concentrated the data are around the mean. Variance is the average of the squared differences from the mean. Standard deviation (sd) describes how spread out the numbers are. It is the positive square root of the variance and the most important measure of dispersion. An advantage of the variance and sd is that they have several applications in inferential statistics. A low sd signifies that the data points tend to be very close to the mean, while a high sd suggests that the data points are spread out over a large range of values.

Ungrouped data
Variance

LESSON 6 Measures of Relative Position

Relative position refers to the location of a value relative to the other values in a group of data. The most common measures of relative position are quartiles, deciles, and percentiles.

Quartile
Quartiles are score values that divide the distribution into four equal parts.
$Q_{1} = 25\%$, $Q_{2} = 50\%$, $Q_{3} = 75\%$, $Q_{4} = 100\%$

Ungrouped data (position of the quartile in the ordered data):
$Q_{1} = \frac{n}{4}$, $Q_{2} = \frac{2n}{4}$ or $\frac{n}{2}$, $Q_{3} = \frac{3n}{4}$

Grouped data:
$Q_{k} = L + \left(\frac{\frac{kn}{4} - cf}{f}\right) i$
Locate the $Q_{k}$ class, the class that contains the computed $\frac{kn}{4}$, where k is 1, 2, or 3.
where: Q = quartile; k = 1, 2, or 3; L = lower limit; cf = cumulative frequency before the $Q_{k}$ class; f = frequency of the class where the lower limit is located

Decile
Ungrouped data: $D_{k} = \frac{kn}{10}$
where: D = decile; k = 1, 2, ..., 9; and n = sample size

Grouped data:
$D_{k} = L + \left(\frac{\frac{kn}{10} - cf}{f}\right) i$
Locate the $D_{k}$ class, the class that contains the computed $\frac{kn}{10}$, where k is 1, 2, 3, 4, ..., 9.
where: D = decile; k = 1, 2, 3, 4, ..., 9; L = lower limit; cf = cumulative frequency before the $D_{k}$ class; f = frequency of the class where the lower limit is located

Percentile
Percentiles are rank-ordered values that divide the distribution into a hundred equal parts. $Q_{1}$ corresponds to $P_{25}$, $Q_{2}$ corresponds to $P_{50}$, and $Q_{3}$ corresponds to $P_{75}$.

Ungrouped data: $P_{k} = \frac{kn}{100}$
where: P = percentile; k = 1, 2, ..., 99; and n = sample size

Grouped data:
$P_{k} = L + \left(\frac{\frac{kn}{100} - cf}{f}\right) i$
Locate the $P_{k}$ class, the class that contains the computed $\frac{kn}{100}$, where k is 1, 2, 3, 4, ..., 99.
where: P = percentile; k = 1, 2, 3, 4, ..., 99; L = lower limit; cf = cumulative frequency before the $P_{k}$ class; f = frequency of the class where the lower limit is located

Example: Below are the performance ratings of 40 professors scored by their students. Data are arranged in ascending order. Find $Q_{3}$, $D_{8}$, $P_{30}$, and $P_{66}$.

$Q_{3} = \frac{3n}{4} = \frac{3(40)}{4} = \frac{120}{4} = 30$, so $Q_{3}$ is the 30th item, which is 89.
The value of the 30th item is 89. Thus, 75% of the data are below 89.

$D_{8} = \frac{kn}{10} = \frac{8(40)}{10} = \frac{320}{10} = 32$, so $D_{8}$ is the 32nd item, which is 91.
The value of the 32nd item is 91. Thus, 80% of the data are below 91.

$P_{30} = \frac{kn}{100} = \frac{30(40)}{100} = \frac{1200}{100} = 12$, so $P_{30}$ is the 12th item, which is 54.
The value of the 12th item is 54. Thus, 30% of the data are below 54.

$P_{66} = \frac{kn}{100} = \frac{66(40)}{100} = \frac{2640}{100} = 26.4$
The value of the 26th item is 77 and that of the 27th item is 78. Thus, the 66th percentile ($P_{66}$) lies 0.4 of the way from 77 to 78. Since the difference between 77 and 78 is 1, $P_{66} = 77 + 1(0.4) = 77.4$.
Hence, 66% of the data are below 77.4 ($P_{66} = 77.4$).

Z-score or Standard Score

A z-score is the number of standard deviations a data point is from the mean. It measures how many standard deviations below or above the population mean a raw score is. It is also referred to as a standard score, and it aims to:
calculate the probability of a score occurring within the normal distribution; and
provide a way to compare two scores that come from different normal distributions.

When the z-score is:
0: the data point's score is identical to the mean score.
1.0: the value is one standard deviation above the mean.
Positive: the score is above the mean.
Negative: the score is below the mean.

Formula:
Population: $z = \frac{x - \mu}{\sigma}$
where: x = observed value; $\mu$ = population mean; $\sigma$ = population standard deviation
Sample: $z = \frac{x - \bar{x}}{s}$
where: x = observed value; $\bar{x}$ = sample mean; s = sample standard deviation

Example: Marry is a consistent honor student. She scored 47 on the earth science exam, for which the class average was 35 with a standard deviation of 4. On the basic calculus exam, she scored 48, for which the class average was 31 with a standard deviation of 8. Relative to the other students who took the exams, which subject should Marry focus on more: earth science or basic calculus?
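One way to work the example is to compute both z-scores and compare them. The sketch below is illustrative only: `z_score` is a helper name introduced here, the class averages and standard deviations are taken as given in the problem, and it assumes that "focus more" means the subject in which her relative standing is lower.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


# Values taken from the example: (score, class average, standard deviation).
z_earth_science = z_score(47, 35, 4)
z_basic_calculus = z_score(48, 31, 8)

# Assumption: the subject with the lower z-score is the one needing more focus.
if z_basic_calculus < z_earth_science:
    weaker_subject = "basic calculus"
else:
    weaker_subject = "earth science"

print(z_earth_science, z_basic_calculus, weaker_subject)
```

Her earth science score is 3 standard deviations above the class mean, while her basic calculus score is 2.125 standard deviations above it. Although the raw calculus score is higher, her standing relative to classmates is lower there, so under the stated assumption she should focus more on basic calculus.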