Statistics & Probability PDF

UNIT I
DESCRIPTIVE STATISTICS

In this unit, you are expected to:

• Substantially relate population and sample to scientific inquiry based on their role in solving problems.
• Compute at least 50% of the problems concerning sample size from given population sizes.
• Substantially differentiate sampling techniques from each other in terms of their processes and properties.
• Identify the sampling technique described or used in at least 50% of given statements of properties or situations.
• Substantially differentiate data collection techniques in terms of procedure, data reliability, ethical concerns, and resources needed.
• Identify the data collection technique described in at least 50% of given statements of properties.
• Substantially describe each graph as to when it can be used, its parts, its form, and how it can be constructed.
• Construct 3-5 examples of each graph in soft copy based on MS Excel procedures.
• Provide 2 examples of at least 3 out of 5 measurement scales which relate to your field of specialization.
• Classify at least 50% of the given data based on their scale of measurement.
• Solve at least 50% of the problems involving measures of central tendency, measures of variation, and measures of location of ungrouped data using a scientific calculator.
• Provide values for at least 50% of the empty cells in a frequency distribution table with the aid of a scientific calculator.
• Finely and neatly construct a frequency histogram, frequency polygon, and ogives of a given data set on graphing paper.
• Solve at least 50% of the problems involving measures of central tendency, measures of variation, and measures of location of grouped data based on a given frequency distribution table.
• Correctly decide the degree of relationship between two variables using an appropriate correlation technique.

Lesson 1
Definition and Importance of Statistics

Consider the following information:

1. The number of students enrolled in each of the programs or courses Bachelor of Mechanical Engineering Technology, Bachelor of Architectural Engineering Technology, Bachelor of Electrical Engineering Technology, Bachelor of Electronics Engineering Technology, Bachelor of Automotive Engineering Technology, and Associate of Industrial Technology
2. The scores of teams in basketball games during a season
3. The number of trees planted by an organization in a month
4. The mean salary of technicians in different cities of a country
5. The number of electronic gadgets repaired by a technician in a week
6. The number of typhoons passing over the Philippine Area of Responsibility (PAR) in a year
7. The number of computer viruses detected by an anti-virus software in a certain period of time
8. The number of customers or clients asking for technical assistance in a week
9. The grades of students in a semester
10. The number of students who pass Probability and Statistics in a particular semester

The word “statistics” has two definitions. The first refers to the data themselves, or numerical facts: the actual data behind the information described above are called “statistics”. The second refers to a scientific investigation which follows a sequence of processes, as defined below.

Statistics is a branch of mathematics that deals with the science of conducting investigations involving the collection, organization, presentation, analysis, and interpretation of data.

In conducting any scientific investigation, these five processes are followed in the order presented in the definition above. A person needs to collect data, organize them, summarize or present them in an understandable form, analyze them using formulas and techniques, and then interpret the meaning of the results of the computations as a solution to a problem.
An investigation is conducted because a problem exists, and performing these five processes unlocks the solution. Statistics therefore helps mankind ease the discomfort caused by a problem. A quotation from Florence Nightingale says that “Statistics is the most important science in the world: For upon it depends the practical application of every other science and of every art. The one science essential to all political and social administration, all education, and all organization based on experience, for it only gives results of our experience.”

Statistics is applicable in all fields or areas, from information and communications technology to industrial technology. It is also useful in education, government, business and economics, medicine, sociology, sports, etc.

The applications of statistics vary from field to field. A computer programmer may be asked by his/her superior to monitor the success of newly developed software. An electronics technician may need to ask the client for information on usage, problems, and other matters concerning the gadgets to be repaired. A mechanic needs to have records of the conditions and specifications (SPECS) of a machine. A technical support representative in a call center needs to gather information from his/her clients concerning the trouble with a cellular phone, a computer, a television, or any other product.

Professionals need statistical skills in order to be more competent in doing their jobs. They need the specific skills for performing statistical processes in a scientific inquiry. The rest of the lessons explain each of the processes of conducting a scientific investigation. Calculators will be needed in acquiring statistical skills in the next lessons.

Lesson 2
Population and Sample

When conducting scientific investigations, it is important to identify the sources of information for the solution of a problem. Who or what will qualify as data sources? How many are they? How many should be included?
These concerns can be addressed with the ideas of “population” and “sample”.

Population is the complete set of subjects (persons, items, or objects) whose characteristics are under investigation.

Sample is a well-defined representative of a population.

A subject qualifies as a member of a population if he/she/it has the characteristics which are under inquiry. There are instances when the population is so large that including all of it would require a lot of resources such as time, money, and/or effort. A deadline might not be met if the investigation takes too long. There may be financial constraints or a limited budget, so that gathering data from the entire population would cost too much. There may also be too few persons to perform the investigation, so that the efforts of a limited workforce would not be enough to finish the tasks. For these reasons, the sources of information need to be statistically reduced using the concept of a “sample”. A sub-group of elements of a population which bears the same characteristics as the population is a well-defined representative.

If a sample is used instead of the population, it is necessary to determine how many elements of the population shall compose the sample. Skills in determining the population size and computing the sample size are needed.

Population size (N) is the total number of subjects comprising the population.

Sample size (n) is the total number of subjects comprising the sample.

The sample size can be computed using Slovin’s Formula provided that the population size is known or given. Please prepare your calculator and refer to the formula below.

Yamane’s Formula (Slovin’s Formula) for the computation of n

$n = \frac{N}{1 + Ne^2}$

where n is the sample size, N is the population size, and e is the margin of error (0.01, 0.05, or 0.10).

Note: Slovin’s formula applies only to population sizes which are at least 500, or $N \ge 500$, when the margin of error is either 0.05 or 0.10.
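As a rough cross-check of calculator work, the formula above can be sketched in Python. The function name `slovin_sample_size` is an illustrative assumption and not part of the lesson; the sketch rounds the result up to the next whole number, since a sample size counts whole subjects, and it does not enforce the minimum population sizes stated in the note.

```python
import math

def slovin_sample_size(population_size, margin_of_error):
    """Return n = N / (1 + N * e^2), rounded up to the next whole number."""
    raw = population_size / (1 + population_size * margin_of_error ** 2)
    return math.ceil(raw)

# N = 500 at a 5% margin of error: 500 / 2.25 = 222.22..., rounded up
print(slovin_sample_size(500, 0.05))   # 223
```

Rounding up with `math.ceil` rather than rounding to the nearest whole number keeps the answer from falling below the computed minimum.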
For a margin of error of 0.01, the minimum population size is 10000 subjects, or $N \ge 10000$. If the condition for a given margin of error is not met, the sample size is equal to the population size, or $n = N$.

When conducting statistical investigations, the result of Slovin’s formula serves as the minimum sample size. Investigating more subjects than the result of the formula is better. Investigating fewer subjects than the result makes the investigation unreliable.

Example 1
Given: N = 500 and e = 0.05
$n = \frac{N}{1 + Ne^2} = \frac{500}{1 + (500)(0.05)^2}$
In your calculator press: 500 ÷ ( 1 + 500 × 0.05 x² ) =
n = 222.22 ≈ 223

This means that the minimum sample size for N = 500 at a 0.05 or 5% margin of error is 222.22. However, it does not make sense to count a fraction of a subject. The sample size should be expressed as a positive whole number, so round up your answer to the next whole number. Do not round off, because you might get a value less than the minimum: rounding off 222.22 gives 222, which is less than the minimum, while rounding up 222.22 gives 223, which is greater than the minimum.

Example 2
Given: N = 1500 and e = 0.05
$n = \frac{N}{1 + Ne^2} = \frac{\_\_\_\_}{1 + (\_\_\_\_)(0.05)^2}$
In your calculator press: ____ ÷ ( 1 + ____ × 0.05 x² ) =
n = ______ ≈ 316

Example 3
Given: N = 1500 and e = 0.10
$n = \frac{N}{1 + Ne^2} = \frac{\_\_\_\_}{1 + (\_\_\_\_)(\_\_\_\_)^2}$
In your calculator press: ____ ÷ ( 1 + ____ × ____ x² ) =
n = ______ ≈ 94

Example 4
Given: N = 50000 and e = 0.01
$n = \frac{N}{1 + Ne^2} = \frac{\_\_\_\_}{1 + (\_\_\_\_)(\_\_\_\_)^2}$
In your calculator press: ____ ÷ ( 1 + ____ × ____ x² ) =
n = ______ ≈ ______

Practice Tasks

Name : _________________          Date : ____________
Course & Section : _________________          Score : ____________

Solve for the sample size given the population size and margin of error in each item.

1. N = 12564, e = 10%
2. N = 12564, e = 5%
3. N = 12564, e = 1%
4. N = 4658, e = 10%
5. N = 4658, e = 5%
6. N = 4658, e = 1%
7. N = 854, e = 1%
8. N = 548, e = 5%
9. N = 458, e = 10%

Supplemental Tasks

Name : _________________          Date : ____________
Course & Section : _________________          Score : ____________

1. A researcher in Surigao State College of Technology wants to conduct a survey in the school, which has a population of 8000 students, at a 5% margin of error. At least how many students must he take as a sample?
2. An electronics technician wants to determine the level of satisfaction of his customers regarding his services in the previous year. Based on his log book, he had 2619 customers last year. How many customers must he include in his investigation if he uses a 10% margin of error?
3. A computer programmer was asked by his supervisor to assess whether or not users of a newly developed anti-virus software like the program. The software was sold to 2000000 users over the last three months. How many users must he consider in his study if he can tolerate only 1% error?
4. A professor in a state university is interested in determining how frequently students have their USB flash disks scanned in a month. He found that 8925 students own at least one flash disk. At a 5% margin of error, how many students must he consider in his investigation?
5. A study was conducted to ascertain the mean salary of computer practitioners in the Philippines. If there are 15375908 practitioners, how many of them will be included in the study at a 1% margin of error?

Lesson 3
Sampling Techniques

Once the sample size is determined, it is time to identify the members of the population who will compose the sample. The process of identifying elements of the population as elements of the sample is called sampling. The diagram below displays the types of sampling techniques.
Probability Sampling: Simple Random, Systematic, Cluster, Stratified, Multi-stage
Non-Probability Sampling: Purposive, Quota, Convenience

Definition of Terms

TECHNIQUE - DEFINITION

Probability Sampling - This is a technique which gives equal chances to each element of the population to be part of the sample.

Non-Probability Sampling - This is a technique which gives unequal chances to each element of the population to be part of the sample.

Simple Random Sampling - This is a technique which is performed without definite order. The one selecting the sample elements does not have control over who will be chosen; the selection depends on chance alone. Specifically, this can be done using the lottery or the electronic method.

Systematic Sampling - This is a technique which utilizes a system of time, location, and/or interval.

Cluster Sampling - This is a technique which divides the population into homogeneous groups, with each group having heterogeneous elements. Either one or more clusters are randomly chosen to compose the sample, or an equal number of elements is randomly chosen from each cluster to compose the sample.

Stratified Sampling - This is a technique which divides the population into heterogeneous groups, with each group having homogeneous elements. Proportionate numbers of elements from each stratum are randomly chosen to compose the sample.

Multi-stage Sampling - This is a technique which involves different stages of choosing sample elements from the population.

Purposive Sampling - This is a non-random way of choosing samples which is based on certain criteria and rules laid down by the researcher.

Quota Sampling - This is a non-random sampling in which the investigator limits the number of his samples based on the required number of subjects under investigation.

Convenience Sampling - This is a non-random sampling in which the investigator identifies the samples at his convenient time and preferred place or venue.
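To make two of these definitions concrete, here is a minimal Python sketch of simple random sampling and of proportionate allocation for stratified sampling. The function names and the fixed seed are hypothetical choices made for this sketch; the population size of 3756, the sample size of 362, and the five stratum sizes are the figures used in the sampling procedures of this lesson.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Pick sample_size distinct list numbers from 1..population_size by chance alone."""
    rng = random.Random(seed)  # the seed is only an assumption to make reruns repeatable
    return rng.sample(range(1, population_size + 1), sample_size)

def proportionate_allocation(stratum_sizes, sample_size):
    """Give each stratum a share of the sample proportional to its size."""
    total = sum(stratum_sizes.values())
    return {name: round(size / total * sample_size)
            for name, size in stratum_sizes.items()}

chosen = simple_random_sample(3756, 362, seed=1)
print(len(chosen))  # 362 distinct list numbers between 1 and 3756

strata = {"S1": 650, "S2": 810, "S3": 750, "S4": 881, "S5": 665}
allocation = proportionate_allocation(strata, 362)
print(allocation)   # {'S1': 63, 'S2': 78, 'S3': 72, 'S4': 85, 'S5': 64}
```

With these figures the rounded shares happen to add up to 362. For other data, rounding each stratum separately may leave the total one or two off, in which case the allocation would need a manual adjustment.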
Sampling Procedure

This section provides a detailed explanation of the process, including the computations, of identifying the sample elements from the population using different methods. The data used in the discussion are based on a situation in which the population size (N) is 3756 and the sample size (n) is 362 at a 5% margin of error.

Simple Random Sampling

Lottery Method

Step 1: Have the list of the population and number them from 1 up to the population size. In the given example, number the list of the population from 1-3756.

1 Juan de la Cruz
2 Karl Bautista
3 Jessa Mendez
4 Bricks Oyales
...
3755 Dianne Dee
3756 Jessie Urbano

Step 2: Cut N pieces of paper and number them from 1 to N. In our example, cut 3756 pieces of paper and number them from 1-3756.

1 2 3 4 5 ... 3755 3756

Step 3: Put the papers in a receptacle (box, bowl, urn) and mix them.

Step 4: Draw n papers from the receptacle. In the example, draw 362 papers.

Step 5: Identify the names bearing the numbers drawn from the receptacle. For example, if paper No. 4 is drawn, then Bricks Oyales is one of the samples or sample elements. He is a sample because his number on the list in Step 1 is 4.

Electronic Method (Calculator)

Step 1: Have the list of the population and number them from 1 up to the population size. In the given example, number the list of the population from 1-3756.

1 Juan de la Cruz
2 Karl Bautista
3 Jessa Mendez
4 Bricks Oyales
...
3755 Dianne Dee
3756 Jessie Urbano

Step 2: Prepare a scientific calculator.

Step 3: Press the following keys on the calculator: Shift = or 2nd F =

After pressing the keys above, a decimal number will appear on the screen. Different calculators will show different results because the numbers are randomly generated.

Step 4: Decide on what number will be considered as the reference to the sample element. Disregard the decimal point. Refer to the illustration below.
0.3125    Resulting Random Number
3125      Sample Reference Number

The sample reference number means that the element on the list in Step 1 with number 3125 is a sample or a sample element.

Step 5: Press the equal key on the calculator to get another random number and repeat Step 4 to identify the next sample.

Step 6: Repeat Step 5 until the desired number of samples or sample elements is identified.

Systematic Sampling

Method 1 (Location and Time Interval)

Step 1: Choose a location where the population elements or subjects would possibly pass by. For example, position yourself at the Security Guard’s Post (Entrance or Exit) if the population subjects are students.

Step 2: Decide on the time interval for sample identification (e.g., every 10 minutes).

Step 3: Determine the number of samples or sample elements to be chosen per time interval (e.g., 5 subjects per 10 minutes).

Step 4: Position yourself at the chosen location and list down the names of the first group of subjects who pass by the location within the specified time interval. If the desired number of subjects per interval is reached before the time interval is over, wait until the time interval is over before you proceed with identifying the next subjects. On the other hand, if the time interval is over but you have identified fewer subjects than the desired number, then only those subjects are considered as samples for that time interval.

For example, list down the names of the first 5 subjects who pass by the location within 10 minutes. If 5 subjects are identified in less than 10 minutes, wait until the 10 minutes are over before you continue identifying subjects. If only 3 subjects pass by the area within 10 minutes, then only they are the samples for that time interval. Start again for another time interval.

Step 5: Repeat Step 4 until the desired number of samples is identified.

Method 2 (Interval)

Step 1: Have the list of the population and number them from 1 up to the population size.
In the given example, number the list of the population from 1-3756.

1 Juan de la Cruz
2 Karl Bautista
3 Jessa Mendez
4 Bricks Oyales
...
3755 Dianne Dee
3756 Jessie Urbano

Step 2: Prepare a scientific calculator.

Step 3: Press the following keys on the calculator: Shift = or 2nd F =

After pressing the keys above, a decimal number will appear on the screen. Different calculators will show different results because the numbers are randomly generated.

Step 4: Decide on what number will be considered as the reference to the sample element. Disregard the decimal point. Refer to the illustration below.

0.3125    Resulting Random Number
3125      Sample Reference Number

The sample reference number means that the element on the list in Step 1 with number 3125 is the first sample or the first sample element.

Step 5: Determine the interval of sample selection using the formula $k = \frac{N}{n} \times 3$. In the given example,

$k = \frac{3756}{362} \times 3$
$k = 31.13$
$k = 31$ (rounded off)

Step 6: Add the result in Step 5 (k) to the result in Step 4 (sample reference number) to identify the next sample. Continue adding the value of k to the previous result to identify the next samples. Refer to the illustration below.

n1 = 3125
n2 = 3125 + 31 = 3156
n3 = 3156 + 31 = 3187
n4 = 3187 + 31 = 3218
n5 = 3218 + 31 = 3249
n6 = 3249 + 31 = 3280
n7 = 3280 + 31 = 3311
n8 = 3311 + 31 = 3342
n9 = 3342 + 31 = 3373
n10 = 3373 + 31 = 3404

Step 7: If the result/sum exceeds the population size (N), subtract the population size from the sum to identify the next sample.

n10 = 3404
n11 = 3404 + 31 = 3435
n12 = 3435 + 31 = 3466
n13 = 3466 + 31 = 3497
n14 = 3497 + 31 = 3528
n15 = 3528 + 31 = 3559
n16 = 3559 + 31 = 3590
n17 = 3590 + 31 = 3621
n18 = 3621 + 31 = 3652
n19 = 3652 + 31 = 3683
n20 = 3683 + 31 = 3714
n21 = 3714 + 31 = 3745
3745 + 31 = 3776

The result/sum after adding 31 to 3745 is 3776, which is greater than the population size of 3756.
Subtract the population size from the last result/sum:

3776 - 3756 = 20    (n22)

This means that the student on the list whose number is 20 is the 22nd sample.

Step 8: Continue adding the value of k to the previous sum until the desired number of samples is met.

Cluster Sampling

Method 1 (Equal)

Step 1: Prepare a table showing the number of subjects per cluster. Refer to the illustration below.

Cluster    N
C1         650
C2         810
C3         750
C4         881
C5         665
Total      3756

Step 2: Solve for the value of k using the formula $k = \frac{n}{m}$, where n is the sample size and m is the number of clusters.

$k = \frac{n}{m}$
$k = \frac{362}{5}$
$k = 72.4$
$k = 72$ (rounded off)

This means that 72 subjects will be chosen from each cluster.

Step 3: Use simple random sampling (lottery or electronic) or systematic sampling to select k subjects from each cluster.

Step 4: Multiply the value of k by m. If the result is less than n, then select the lacking samples from the cluster(s) with the greatest number of elements. In the example, 72 × 5 = 360, which is less than 362. You can select two more subjects from C4 since it has the greatest number of elements among the clusters.

Method 2 (Cluster Selection)

Step 1: Prepare a table showing the number of subjects per cluster. Refer to the illustration below.

Cluster    N
C1         650
C2         810
C3         750
C4         881
C5         665
Total      3756

Step 2: Prepare a scientific calculator.

Step 3: Press the following keys on the calculator: Shift = or 2nd F =

After pressing the keys above, a decimal number will appear on the screen. Different calculators will show different results because the numbers are randomly generated.

Step 4: Determine the cluster from which samples shall be chosen using the result in Step 3. Disregard the decimal point. If there are more than 9 clusters, use two digits from the result starting from the digit having the value “1”. However, if there are fewer than 10 clusters, use only one digit. Refer to the illustration below.
If there are more than 9 clusters:
0.2137    Resulting Random Number
13        Cluster Reference Number (C13)

If there are fewer than 10 clusters:
0.2137    Resulting Random Number
2         Cluster Reference Number (C2)

Step 5: Use simple random sampling (lottery or electronic) or systematic sampling to select n subjects from the cluster identified in Step 4.

Stratified Sampling

Step 1: Prepare a table showing the number of subjects per stratum. Refer to the illustration below.

Stratum    N
S1         650
S2         810
S3         750
S4         881
S5         665
Total      3756

Step 2: Compute the relative frequency (rf) of each stratum with respect to the population size (N). This is obtained by dividing the number of subjects in each stratum by N. In the example, S1 has 650 elements and the population size (N) is 3756. So, 650/3756 = 0.173056. Complete the table below.

Stratum    N       rf
S1         650     0.173056
S2         810     ____
S3         750     ____
S4         881     ____
S5         665     ____
Total      3756    ____

Step 3: Compute the sample size per stratum by multiplying each relative frequency by the sample size (n). In the example, the sample size is 362 while S1 has a relative frequency or rf of 0.173056. So, 0.173056 × 362 = 62.65, or 63 when rounded off. Complete the table below.

Stratum    N       rf          n
S1         650     0.173056    63
S2         810     ____        ____
S3         750     ____        ____
S4         881     ____        ____
S5         665     ____        ____
Total      3756    ____        ____

Step 4: Use simple random sampling (lottery or electronic) or systematic sampling to select the computed number of subjects from each stratum. In the example, select 63 of the 650 subjects in S1.

Multi-stage Sampling

This sampling technique is performed using Method 2 of Cluster Sampling done in several stages. It is useful for studies with wide coverage (e.g., provincial, regional, national, or international studies). For example, if the coverage of the scientific investigation is national, the investigator can classify the population by region. One or two regions are then selected. For the region(s) selected, the population elements can be classified again by province, and one or a few provinces can be selected.
This process can be repeated even down to the barangay level.

Practice Tasks

Name : _________________          Date : ____________
Course & Section : _________________          Score : ____________

1. Compute the sample size for the population size indicated in the table. Complete the table below using Cluster Sampling Method 1.

1.1. 5% Margin of Error

Clusters   N        n
C1         2500     ____
C2         2564     ____
C3         3550     ____
C4         4520     ____
C5         3251     ____
C6         4337     ____
C7         3458     ____
Total      24180    ____

1.2. 1% Margin of Error

Clusters   N        n
C1         2500     ____
C2         2564     ____
C3         3550     ____
C4         4520     ____
C5         3251     ____
C6         4337     ____
C7         3458     ____
Total      24180    ____

2. Compute the sample size for the population size indicated in the table. Complete the table below using Stratified Sampling.

2.1. 5% Margin of Error

Strata     N        rf      n
S1         2500     ____    ____
S2         2564     ____    ____
S3         3550     ____    ____
S4         4520     ____    ____
S5         3251     ____    ____
S6         4337     ____    ____
S7         3458     ____    ____
Total      24180    ____    ____

2.2. 1% Margin of Error

Strata     N        rf      n
S1         2500     ____    ____
S2         2564     ____    ____
S3         3550     ____    ____
S4         4520     ____    ____
S5         3251     ____    ____
S6         4337     ____    ____
S7         3458     ____    ____
Total      24180    ____    ____

Lesson 4
Data Collection Techniques

When the samples or sample elements have been identified, the data gathering procedure follows. This section presents different data collection techniques.

Data can be collected directly or indirectly depending on the type of investigation and the nature of the data. Direct methods are techniques in which the data are personally gathered by the investigator, while indirect methods use another medium such as paper, records, a video camera, etc. The diagram below displays data collection techniques classified as direct or indirect.

[Diagram: data collection techniques classified as Direct or Indirect: Interview, Observation, Questionnaire, Experiment, Registration]

Definition of Terms

TECHNIQUE | DEFINITION | MEDIUM
Questionnaire | Gathers data through a set of questions written on paper. | Questionnaire and Respondent
Interview | Gathers data by asking questions face-to-face or over a telephone. | Interviewer and Interviewee
Registration | Gathers data by accessing information readily available in an office. | Letter and Record
Observation | Gathers data by recording behaviors of subjects in their natural habitat as sensed by the observer. | Observer or Video Camera
Experiment | Gathers data by observing the reaction of the subject to an intervention. | Experimenter or Other Media

Practice Tasks

Name : _________________          Date : ____________
Course & Section : _________________          Score : ____________

Determine which data collection technique is most appropriate or reasonable for the data described in each item.

__________1. Grades of ICT students
__________2. Performance of OJT students
__________3. The average rainfall in Surigao City
__________4. The efficiency of a new transistor brand
__________5. The economic conditions of the country
__________6. The average monthly temperature in Surigao City
__________7. The defects of an electronic device under repair
__________8. Perceptions of programmers on the new Operating System
__________9. Average number of cars repaired by a technician per month
__________10. Average number of tricycles that would pass by Narciso St. along Surigao State College of Technology

Supplemental Tasks

Name : _________________          Date : ____________
Course & Section : _________________          Score : ____________

Proceed to the library and choose two undergraduate theses. Copy their titles and describe the research instrument used for data collection.

Title 1:
Description of Research Instrument:

Title 2:
Description of Research Instrument:

Lesson 5
Types of Variables and Measurement Scales

Quantitative – Qualitative Classification

Variables: Qualitative, Quantitative
Quantitative: Discrete, Continuous

Level of Measurement Classification

Ratio, Interval, Ordinal, Nominal

Definition of Terms

MEASUREMENT SCALES - DEFINITION

Nominal - Classifies data into mutually exclusive (non-overlapping), exhaustive categories in which no order or ranking can be imposed on the data.

Ordinal - Classifies data into categories that can be ranked; however, precise differences between the ranks do not exist.

Interval - Ranks data, and precise differences between units of measure exist; however, there is no meaningful zero.

Ratio - Possesses all the characteristics of interval measurement, and there exists a true zero. In addition, true ratios exist when the same variable is measured on two different members of the population.

Examples

Ratio: File Size, Height, Mass, Age
Interval: Temperature, Grade, IQ
Ordinal: Grade Level, Versions of Operating System
Nominal: Species, Computer Brand

Practice Tasks

Name : _________________          Date : ____________
Course & Section : _________________          Score : ____________

A. Classify each variable as qualitative or quantitative.
___________1. Number of desks in a classroom
___________2. Mass of fish caught in Lake Emilie
___________3. Number of pages in statistics textbooks
___________4. Colors of automobiles in the parking lot
___________5. Capacity (in gallons) of water in selected dams

B. Classify each variable as discrete or continuous.
___________6. Lifetimes of batteries in a tape recorder
___________7. Number of pizzas sold last year in Surigao City
___________8. Capacity (in gallons) of water in swimming pools
___________9. Weights of newborn infants at a certain hospital
___________10. Water temperature of the saunas at a given health spa

C. Classify each as nominal, ordinal, interval, or ratio.
___________11. Salaries of technicians
___________12. Horsepower of motorcycle engines
___________13. Temperature of automatic popcorn poppers
___________14. Time required by drivers to complete a course
___________15. Ratings of newscasts in the Philippines (poor, fair, good, excellent)

Supplemental Task

Name : _________________          Date : ____________
Course & Section : _________________          Score : ____________

Provide five (5) examples for each type of measurement scale related to your area of specialization.

Nominal
1.
2.
3.
4.
5.

Ordinal
1.
2.
3.
4.
5.

Interval
1.
2.
3.
4.
5.

Ratio
1.
2.
3.
4.
5.

Lesson 6
Data Organization and Presentation Techniques

When you are already certain of the type of data you have gathered, organization through a “tally” follows.
Tallying is a convenient way of organizing data by which each case is assigned a frequency. After the tally is made, the data can be presented.

DATA PRESENTATION TECHNIQUES

Tabular-Textual - Data are presented in a table followed by an explanation.

Graphical-Textual - Data are presented through a figure followed by an explanation.

Types of Graphs: Line, Map, Bar, Picto, Pie

Group Activity

1. Find your group mates.
2. Select a facilitator and a recorder.
3. Brainstorm on the type of graph assigned to your group.
4. Answer the following:

Course & Section : ____________          Members : ____________
Group # : ____________                            : ____________
Facilitator : ____________                        : ____________
Recorder : ____________                           : ____________

• When can it be used?
• What are its parts?
• What does it look like?
• How can it be manually constructed?

Practice Tasks

1. Write your group’s output of today’s activity on “Manila Paper”.
2. Study your output as a group.
3. Prepare for the group presentation.

Supplemental Task

Burn the following on a CD-RW:
a. MS Excel 2007 syntax for constructing the graph assigned to your group (Sheet 1)
b. Five (5) different outputs – Tables and Graphs (Sheets 2-6)

CD Label and Filename: Course_Section_Group No.
Example: BET_J_Group 3

Lesson 7
Analysis of Ungrouped Data

Measures of Central Tendency

Median- the middle score

How? Arrange the scores from lowest to highest. The middle score is the median. If there are two middle scores, then the average of the two scores is the median.
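The rule for the median can also be sketched in Python as a cross-check of manual work. The function name `median` is an illustrative choice for this sketch; the two data sets are those of Illustrations 1 and 2 of this lesson.

```python
def median(scores):
    """Sort the scores, then return the middle one (or the average of the two middle ones)."""
    ordered = sorted(scores)
    middle = len(ordered) // 2
    if len(ordered) % 2 == 1:
        # odd count: exactly one middle score
        return ordered[middle]
    # even count: average the two middle scores
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median([16, 25, 5, 7, 9, 30, 11, 12, 15]))              # 12
print(median([32, 16, 25, 5, 7, 9, 30, 11, 12, 15, 35, 41]))  # 15.5
```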
Illustration 1
Data: 16 25 5 7 9 30 11 12 15
Array: 5 7 9 11 12 15 16 25 30
Median = 12

Illustration 2
Data: 32 16 25 5 7 9 30 11 12 15 35 41
Array: 5 7 9 11 12 15 16 25 30 32 35 41
Median = (15+16)/2 = 15.5

Illustration 3
Data: 32 16 25 5 22 7 9 30 11 15 35 41 10
Array: 5 7 9 10 11 15 16 22 25 30 32 35 41
Median = 16

Mode- the most frequent score(s)

Unimodal – only a single score appears most frequently

Illustration 4
Data: 15 17 6 17 17 20 11 9 10 11
Array: 6 9 10 11 11 15 17 17 17 20
Mode = 17

Bimodal – two scores appear most frequently with equal count

Illustration 5
Data: 15 17 20 11 6 9 10 11 20
Array: 6 9 10 11 11 15 17 20 20
Mode = 11 and 20

Trimodal- three scores appear most frequently with equal count

Illustration 6
Data: 20 28 28 20 25 26 28 28 6 9 11 15 17 20 20 9 10 11 9 9
Array: 6 9 9 9 9 10 11 11 15 17 20 20 20 20 25 26 28 28 28 28
Mode = 9, 20, and 28

Multimodal- four or more scores appear most frequently with equal count

Illustration 6
Data: 20 25 26 28 28 6 11 15 17 20 10 11 9 9
Array: 6 9 9 10 11 11 15 17 20 20 25 26 28 28
Mode = 9, 11, 20, and 28

Mean- the average of the scores

How? This can be computed using the statistical functions of a scientific calculator.

Data: 20 28 28 20 25 26 28 28 6 9 11 15 17 20 20 9 10 11 9 9

Syntax:

Measures of Variation

Standard Deviation- a measure of how far the scores are spread out from the mean, expressed in the same unit as the scores

How? This can be computed using the statistical functions of a scientific calculator.

Data: 20 28 28 20 25 26 28 28 6 9 11 15 17 20 20 9 10 11 9 9

Syntax:

Variance- the square of the standard deviation, expressed in squared units

How? This can be computed using the statistical functions of a scientific calculator.
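For readers who want to check calculator results on a computer, the measures above can be sketched with Python's standard `statistics` module, using the 20-score data set of this lesson. This is only a cross-check and not the calculator procedure the lesson asks for. The lesson does not say whether the sample or the population form of the standard deviation and variance is intended, so the sketch computes both as an assumption.

```python
import statistics

data = [20, 28, 28, 20, 25, 26, 28, 28, 6, 9, 11, 15, 17, 20, 20, 9, 10, 11, 9, 9]

mean = statistics.mean(data)                 # sum of the scores divided by their count
modes = sorted(statistics.multimode(data))   # every score tied for the highest frequency

# Sample form (divides by n - 1) and population form (divides by n)
sample_sd = statistics.stdev(data)
sample_var = statistics.variance(data)
pop_sd = statistics.pstdev(data)
pop_var = statistics.pvariance(data)

print(mean)    # 17.45
print(modes)   # [9, 20, 28]
print(sample_sd, sample_var)
print(pop_sd, pop_var)
```

In either form, the variance is the square of the corresponding standard deviation.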
Syntax:

Quantiles

Quartiles - scores which divide the distribution into 4 equal parts (Qk where k = 1, 2, 3)
Deciles - scores which divide the distribution into 10 equal parts (Dk where k = 1, 2, 3, ..., 9)
Percentiles - scores which divide the distribution into 100 equal parts (Pk where k = 1, 2, 3, ..., 99)

How? Arrange the scores from lowest to highest and solve for the value of "c", where n is the number of scores:
c = kn/4 for quartiles
c = kn/10 for deciles
c = kn/100 for percentiles

Rule 1:
a. If the value of "c" is a decimal number, round it up to the next whole number. This is the final value of "c".
b. Counting the lowest score as the first, count up to the score in position "c". That score is the quantile.

Rule 2:
a. If the value of "c" is a whole number, then it is the final value of "c".
b. Counting the lowest score as the first, count up to the score in position "c".
c. In the same way, count up to the score in position "c+1".
d. The quantile is the average of the scores in positions c and c+1.

Illustration 7
Data: 10 11 15 17 20 28 25 26 28 9 28 29 30 6 9 20
Array: 6 9 9 10 11 15 17 20 20 25 26 28 28 28 29 30
Solve for the 1st Quartile or Q1.
k = 1
n = 16
c = kn/4 (the divisor is 4 because this is a quartile)
  = 1(16)/4
c = 4
This is a whole number, so Rule 2 is used.
The score corresponding to c = 4 is 10, while the score corresponding to c+1 = 5 is 11.
So, Q1 = (10+11)/2 = 10.5

Illustration 8
Data: 10 11 15 17 20 28 25 26 28 9 28 29 30 6 9 20
Array: 6 9 9 10 11 15 17 20 20 25 26 28 28 28 29 30
Solve for the 6th Decile or D6.
k = 6
n = 16
c = kn/10 (the divisor is 10 because this is a decile)
  = 6(16)/10
c = 9.6
This is a decimal number, so Rule 1 is used.
c = 10 (this is the final value of c)
The score corresponding to c = 10 is 25.
So, D6 = 25

Practice Tasks
Name : _________________    Date : ____________
Course & Section : _________________    Score : ____________

Data: 10 11 15 17 20 28 25 26 28 9 28 29 30 6 9 20
Solve for:
1. 3rd Quartile or Q3
2. 4th Decile or D4
3. 37th Percentile or P37
4. 50th Percentile or P50
5. 64th Percentile or P64

Lesson 8
Analysis of Grouped Data

How to Construct a Frequency Distribution Table

Illustration
36 43 48 35 40 25 30
36 45 58 33 40 26 30
36 46 50 38 41 29 31
37 46 51 39 28 25 32
38 41 53 41 50 34 33

Step 1. Identify the Highest Score (HS) and the Lowest Score (LS).
HS = 58
LS = 25

Step 2. Solve for the Range (R).
R = HS - LS
  = 58 - 25
R = 33

Step 3. Solve for the Class Width/Size (i).
i = R/m, where m is the number of groups or classes desired
i = 33/6
i = 5.5
i = 6 (rounded up)
The number of classes, m, can be computed using Sturges' formula:
m = 1 + 3.32 log n
For the n = 35 scores above, m = 1 + 3.32 log 35 ≈ 6.13, or about 6 classes.

Step 4. Identify the Lowest Limit.
First check whether the lowest score is divisible by the class width, i. If it is, then it is the lowest limit. If not, look for the number less than the lowest score but closest to it that is divisible by the class width. In the example, the lowest score is 25, which is not divisible by i = 6. The number less than 25 but closest to it that is divisible by i = 6 is 24.

Step 5. Construct the frequency distribution table. Study the table below.
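Steps 1 to 5 can be sketched in Python to check the class limits and frequencies by machine. This is a minimal sketch, not the lesson's prescribed method: the function names `sturges_classes` and `frequency_distribution` are assumptions, the scores are assumed to be whole numbers, and the class width is rounded up as in Step 3.

```python
import math


def sturges_classes(n):
    """Number of classes from Sturges' formula, m = 1 + 3.32 log n (rounded)."""
    return round(1 + 3.32 * math.log10(n))


def frequency_distribution(scores, num_classes):
    """Return a list of (lower limit, upper limit, frequency) tuples."""
    hs, ls = max(scores), min(scores)           # Step 1: highest and lowest score
    score_range = hs - ls                       # Step 2: range
    width = math.ceil(score_range / num_classes)  # Step 3: class width, rounded up
    lowest = (ls // width) * width              # Step 4: multiple of width at or below LS
    table = []
    lower = lowest
    while lower <= hs:                          # Step 5: build classes and count scores
        upper = lower + width - 1
        f = sum(1 for s in scores if lower <= s <= upper)
        table.append((lower, upper, f))
        lower += width
    return table


scores = [36, 43, 48, 35, 40, 25, 30, 36, 45, 58, 33, 40, 26, 30,
          36, 46, 50, 38, 41, 29, 31, 37, 46, 51, 39, 28, 25, 32,
          38, 41, 53, 41, 50, 34, 33]
table = frequency_distribution(scores, sturges_classes(len(scores)))
for lower, upper, f in table:
    print(lower, "-", upper, ":", f)
```

The frequencies printed by this sketch can be compared against the f column of the table in Step 5.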
Classes | Class Boundaries | xm | f | cf (<) | cf (>) | f·xm | xm - x̄ | (xm - x̄)² | f(xm - x̄)²
24 - 29 | 23.5 - 29.5 | 26.5 | 5 | 5 | 35 | 132.50 | -11.83 | 139.95 | 699.74
30 - 35 | 29.5 - 35.5 | 32.5 | 8 | 13 | ____ | 260.00 | -5.83 | 33.99 | 271.91
36 - 41 | 35.5 - 41.5 | 38.5 | 12 | 25 | 22 | 462.00 | 0.17 | 0.03 | 0.35
42 - 47 | 41.5 - 47.5 | 44.5 | 4 | 29 | 10 | ____ | 6.17 | 38.07 | 152.28
48 - 53 | 47.5 - 53.5 | ____ | 5 | 34 | 6 | 252.50 | 12.17 | 148.11 | 740.54
54 - 59 | 53.5 - 59.5 | 56.5 | 1 | ____ | 1 | 56.50 | 18.17 | 330.15 | ____
n = 35 | ∑ f·xm = 1341.5 | ∑ f(xm - x̄)² = 2194.97

Measures of Central Tendency

Mean
\[ \bar{x} = \frac{\sum f x_m}{n} \]

Median
\[ Me = LL + \left( \frac{\frac{n}{2} - \sum f_c}{f} \right) i \]

Mode
\[ Mo = LL + \left( \frac{\Delta_1}{\Delta_1 + \Delta_2} \right) i \]

Measures of Variation

Variance
\[ s^2 = \frac{\sum f (x_m - \bar{x})^2}{n - 1} \]

Standard Deviation
\[ s = \sqrt{s^2} \]

Quantiles

Quartiles
\[ Q_k = LL + \left( \frac{\frac{kn}{4} - \sum f_c}{f} \right) i \]
Q3 =

Deciles
\[ D_k = LL + \left( \frac{\frac{kn}{10} - \sum f_c}{f} \right) i \]
D6 =

Percentiles
\[ P_k = LL + \left( \frac{\frac{kn}{100} - \sum f_c}{f} \right) i \]
P45 =

Practice Tasks
Name : _________________    Date : ____________
Course & Section : _________________    Score : ____________

Using the frequency distribution in Step 5, solve for:
1. 3rd Decile or D3
2. 4th Decile or D4
3. 37th Percentile or P37
4. 50th Percentile or P50
5. 64th Percentile or P64

Supplemental Tasks
1. Research the definitions of the following:
   1.1. Frequency histogram
   1.2. Frequency polygon
   1.3. Frequency ogives
2. Using the frequency distribution in Step 5, construct the following on graphing paper:
   2.1. Frequency histogram
   2.2. Frequency polygon
   2.3. Frequency ogives

Lesson 9
Measures of Correlation

A measure of correlation determines the association or degree of relationship between two variables.

Ranges of r:
r | Strength of Correlation
0.00 | Not Correlated
0.01 - 0.20 | Negligibly Correlated
0.21 - 0.40 | Low/Slightly Correlated
0.41 - 0.70 | Moderately Correlated
0.71 - 0.90 | Highly Correlated
0.91 - 0.99 | Very Highly Correlated
1.00 | Perfectly Correlated

Review on Measurement Scales
In computing the correlation between two variables, it is necessary to know the measurement scale of each variable.
MEASUREMENT SCALES | DESCRIPTION
Nominal | Categorical
Ordinal | Ranking; distances between values are not known; age group or income or grouped interval/ratio data
Interval | Distances between values are equal and known; no absolute zero (addition or subtraction)
Ratio | With absolute zero (addition, subtraction, multiplication, division)

Summary of Correlation Techniques

MEASUREMENT SCALE (first variable) | MEASUREMENT SCALE (second variable) | CORRELATION STATISTICS
Quantitative | Quantitative | Pearson r
Dichotomous ordinal with continuity | Quantitative | Biserial
Quantitative | Dichotomous nominal | Point Biserial
Nominal, ordinal, grouped interval | Quantitative | Eta squared
Dichotomous nominal | Dichotomous nominal | Phi
Dichotomous nominal | Ordinal | Rank-Biserial
Ordinal | Ordinal | Spearman Rho
Nominal (artificially dichotomous) | Nominal (artificially dichotomous) | Tetrachoric
Nominal | Ordinal | Gamma
Nominal | Nominal | Pearson's Contingency Coefficient or C
Nominal | Nominal | Cramer's V coefficient
Nominal | Nominal | Goodman and Kruskal Lambda coefficient

* If no specific statistic can be used for a particular pair of variables, the higher measure must be treated or transformed into the lower one. The variables must then be treated statistically based on the lower measure.
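The summary of correlation techniques can also be kept as a small lookup structure. The sketch below is only an illustration of that idea: the list name `SUMMARY`, the function name `techniques_for`, and the lowercase scale labels are assumptions, and the rows simply restate the summary above.

```python
# Each row: (scale of one variable, scale of the other variable, statistic).
# The rows restate the summary of correlation techniques; labels are assumed.
SUMMARY = [
    ("quantitative", "quantitative", "Pearson r"),
    ("dichotomous ordinal with continuity", "quantitative", "Biserial"),
    ("quantitative", "dichotomous nominal", "Point biserial"),
    ("nominal, ordinal, or grouped interval", "quantitative", "Eta squared"),
    ("dichotomous nominal", "dichotomous nominal", "Phi"),
    ("dichotomous nominal", "ordinal", "Rank-biserial"),
    ("ordinal", "ordinal", "Spearman rho"),
    ("artificially dichotomous nominal", "artificially dichotomous nominal",
     "Tetrachoric"),
    ("nominal", "ordinal", "Gamma"),
    ("nominal", "nominal", "Pearson's contingency coefficient (C)"),
    ("nominal", "nominal", "Cramer's V"),
    ("nominal", "nominal", "Goodman and Kruskal lambda"),
]


def techniques_for(scale_a, scale_b):
    """Return the statistics listed for a pair of scales, in either order."""
    wanted = sorted([scale_a.lower(), scale_b.lower()])
    return [stat for a, b, stat in SUMMARY if sorted([a, b]) == wanted]


print(techniques_for("ordinal", "ordinal"))
print(techniques_for("dichotomous nominal", "quantitative"))
print(techniques_for("nominal", "nominal"))
```

An empty list signals that the summary has no entry for the pair, which is the case covered by the footnote about transforming the higher measure into the lower one.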
Pearson Product-Moment Correlation Coefficient

\[ r = \frac{n \sum XY - (\sum X)(\sum Y)}{\sqrt{n \sum X^2 - (\sum X)^2}\,\sqrt{n \sum Y^2 - (\sum Y)^2}} \]

X - Data Set 1 (First Variable)
Y - Data Set 2 (Second Variable)

Subject | X | Y | XY | X² | Y²
1 | 1 | 92 | 92 | 1 | 8464
2 | 1 | 76 | 76 | 1 | 5776
3 | 1 | 82 | 82 | 1 | 6724
4 | 2 | 90 | 180 | 4 | 8100
5 | 2 | 88 | 176 | 4 | 7744
6 | 2 | 75 | 150 | 4 | 5625
7 | 3 | 83 | 249 | 9 | 6889
8 | 4 | 92 | 368 | 16 | 8464
9 | 5 | 94 | 470 | 25 | 8836
10 | 5 | 78 | 390 | 25 | 6084
11 | 6 | 76 | 456 | 36 | 5776
12 | 7 | 82 | 574 | 49 | 6724
13 | 7 | 81 | 567 | 49 | 6561
14 | 8 | 80 | 640 | 64 | 6400
15 | 8 | 75 | 600 | 64 | 5625
TOTALS | 62 | 1244 | 5070 | 352 | 103792

Solve for r using the formula above:
r =
Syntax :
Interpretation:

Practice Tasks
Name : _________________    Date : ____________
Course & Section : _________________    Score : ____________

Complete the Table:
X - Data Set 1 (First Variable)
Y - Data Set 2 (Second Variable)

Subject | X | Y | XY | X² | Y²
1 | 25 | 3 | 75 | 625 | 9
2 | 38 | 4 | ____ | 1444 | ____
3 | 45 | 5 | ____ | 2025 | 25
4 | 65 | 6 | 390 | ____ | 36
5 | 48 | 4 | 192 | ____ | 16
6 | 24 | 3 | 72 | 576 | 9
7 | 15 | 2 | 30 | 225 | ____
8 | 15 | 3 | 45 | 225 | ____
9 | 24 | 3 | 72 | 576 | 9
10 | 32 | 4 | ____ | 1024 | 16
11 | 15 | 3 | ____ | 225 | ____
12 | 14 | 3 | 42 | 196 | 9
13 | 15 | 2 | 30 | ____ | 4
14 | 35 | 4 | 140 | ____ | 16
15 | 78 | 5 | 390 | 6084 | 25
16 | 15 | 3 | 45 | 225 | 9
17 | 54 | 4 | ____ | ____ | 16
18 | 24 | 3 | 72 | 576 | ____
19 | 84 | 5 | 420 | 7056 | ____
20 | 35 | 4 | 140 | 1225 | 16
TOTALS | 700 | 73 | ____ | ____ | 287

\[ r = \frac{n \sum XY - (\sum X)(\sum Y)}{\sqrt{n \sum X^2 - (\sum X)^2}\,\sqrt{n \sum Y^2 - (\sum Y)^2}} \]

Supplemental Tasks
Name : _________________    Date : ____________
Course & Section : _________________    Score : ____________

Provide an example for each of the given correlation techniques.

1. Biserial
Another measure of association, the biserial correlation coefficient, termed rb, is similar to the point biserial. It relates quantitative data to ordinal data that has an underlying continuity but is measured discretely as two values (dichotomous). An example might be test performance vs. anxiety, where anxiety is designated as either high or low. Presumably, anxiety can take on any value in between, perhaps beyond, but it may be difficult to measure. We further assume that anxiety is normally distributed.

\[ r_b = \frac{(M_1 - M_2)\, p_1 p_2}{Z\, S_y} \]

where M1 and M2 are the means of the two groups, p1 and p2 are the proportions that the two groups form of the total, Sy is the standard deviation of the continuous variable as a whole, and Z is the ordinate (height) of the standard normal curve at the z-score that cuts off the smaller of p1 and p2.

2. Point-Biserial
The point-biserial coefficient is mathematically equivalent to the traditional (Pearson) correlation formula, and its interpretation is similar. The point-biserial correlation is positive when large values of X are associated with Y = 1 and small values of X are associated with Y = 0.

3. Phi

Y\X | 0 | 1 | Totals
1 | A | B | A+B
0 | C | D | C+D
Totals | A+C | B+D | N

With this coding:
\[ \phi = \frac{BC - AD}{\sqrt{(A+B)(C+D)(A+C)(B+D)}} \]

4. Rank-Biserial
The rank-biserial correlation coefficient, rrb, is used for dichotomous nominal data vs. rankings (ordinal). The formula is usually expressed as

\[ r_{rb} = \frac{2 (Y_1 - Y_0)}{n} \]

where n is the number of data pairs, and Y0 and Y1 are the Y score means for data pairs with an X score of 0 and 1, respectively. These Y scores are ranks. This formula assumes no tied ranks are present.

5. Spearman Rho

\[ \rho = 1 - \frac{6 \sum d^2}{n (n^2 - 1)} \]

where n is the number of paired ranks and d is the difference between the paired ranks.

UNIT II
INFERENTIAL STATISTICS

In this unit, you are expected to:
Symbolically and substantially formulate hypotheses of at least 50% of the given real-life problems.
Test hypotheses of at least 50% of the given real-life situations using appropriate statistical tools.

Lesson 1
Statistical Hypothesis

A statistical hypothesis is an assertion or conjecture concerning one or more populations. It is a logical, tentative solution to a problem concerning population(s). There are two types of hypothesis: null (Ho) and alternative (Ha). A null hypothesis is an assertion that the characteristics of the groups under investigation DO NOT significantly differ based on data. An alternative hypothesis, on the other hand, is an assertion that the characteristics of the groups under investigation significantly differ based on data.

It is the null hypothesis that is tested for rejection, not for acceptance. If the null hypothesis is false, then we reject it and accept the alternative hypothesis. However, if the statistical result cannot establish that the null hypothesis is false, it does not necessarily mean that it is true; thus, we do not accept the null hypothesis but instead reserve our judgment for future investigation.

Moreover, when testing a hypothesis it is necessary to determine whether the test is directional (one-tailed) or non-directional (two-tailed). To do so, we look at the nature of the alternative hypothesis. Consider the table of hypothesis indicators below.

Ho (Word) | Ho (Symbol) | Ha (Word) | Ha (Symbol) | Type of Test
Equal, do not differ, no difference | = | Not equal, differ, there is a difference | ≠ | Non-directional
At least | ≥ | Less than | < | Directional left
At most | ≤ | Greater than, more than | > | Directional right

Examples:
1. To test if the average temperature in a given sampling site is 28°C.
   Ho:
   Ha:
2. To test if the average temperature in a given sampling site is not 28°C.
   Ho:
   Ha:
3. To test if the average temperature in a given sampling site is less than 28°C.
   Ho:
   Ha:
4. To test if the average temperature in a given sampling site is at least 28°C.
   Ho:
   Ha:
5. To test if the average temperature in a given sampling site is at most 28°C.
   Ho:
   Ha:
6. To test if the coral reefs in station A are equally diverse as the coral reefs in station B.
   Ho:
   Ha:
7. To test if the coral reefs in station A are more diverse than the coral reefs in station B.
   Ho:
   Ha:
8. To test if the coral reefs in station A are less diverse than the coral reefs in station B.
   Ho:
   Ha:
9. To test if there is a significant difference in the salinity of water among the three sampling sites.
   Ho:
   Ha:
10. To test if there is a significant relationship between the highest educational attainment of the respondents' parents and the respondents' level of awareness of global warming.
   Ho:
   Ha:

Lesson 2
Steps in Hypothesis Testing

Testing a hypothesis requires a step-by-step process, from the formulation of the hypothesis to making a conclusion. Consider the steps below:
1. State the null and alternative hypotheses.
2. Choose a fixed significance level, α.
3. Determine the appropriate test statistic for the problem and the critical value(s) from the Table.
4. Compute using the formula of the test statistic.
5. Decide whether or not to reject the null hypothesis by comparing the computed value and the critical value.
6. Draw a scientific conclusion with interpretation.

The null and alternative hypotheses must be stated both substantially and symbolically. The substantial form states the hypothesis in words that describe the problem, while the symbolic form uses only symbols and values.

The selection of the significance level depends on the researcher. However, in solving problems on hypothesis testing, the significance level is already set.

In determining the appropriate test statistic, you must carefully understand the problem, examine the given quantities, and cross-check these against the list of test statistics. Once the test statistic is determined, look up the critical value(s) in the table for that test statistic based on the significance level.

Careful computation follows. Pay attention to the substitution of values and the pressing of keys on the calculator.
It would be a waste of time if you make a mistake in this step, and the steps that follow would be affected by it.

After computation, decide whether or not to reject the null hypothesis by comparing the critical value from Step 3 with the computed value from Step 4. Different test statistics have different rules for rejecting the null hypothesis. If the null hypothesis is rejected, accept the alternative hypothesis and draw a conclusion based on it, together with an interpretation. On the other hand, if the null hypothesis is not rejected, do not accept it. The conclusion must speak only of the lack of evidence to reject the null hypothesis and of the need for future investigation.

Lesson 3
Test Statistics

There are many test statistics. The commonly used ones are the z-test, t-test, F-test (used in ANOVA), and chi-square test. The z-test and t-test have several types depending on the case, and different cases require different formulas. Different types of t-test also have different formulas for df, or degrees of freedom. The chi-square test also has several types; they all use the same formula, but the computation of the "expected value" differs. The F-test is a widely used test statistic. It requires a post-hoc test when the results show that the null hypothesis is rejected. The most common post-hoc test for the F-test is Scheffé's test.

Consider the following formulas, with the rules for using them and for making decisions concerning the null hypothesis:

Single Sample: Test Concerning a Single Mean (Variance Known)

\[ z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} \]

Note: The Table provided in this guide gives only positive values. If the test is left-tailed, the value from the Table should be given a negative sign when assigning the critical value. If the test is non-directional, the critical values are both the positive and the negative of the value from the Table.
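The single-mean z formula and the sign convention in the Note can be sketched in Python. This is a minimal sketch under stated assumptions: the function names, the test-type labels ("left", "right", "non-directional"), and the sample numbers are all hypothetical and chosen only for illustration.

```python
import math


def z_single_mean(sample_mean, mu0, sigma, n):
    """z = (sample mean - hypothesized mean) / (sigma / sqrt(n))."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))


def critical_values(table_value, test_type):
    """Attach the sign(s) to the positive value read from the Table."""
    if test_type == "left":
        return (-table_value,)
    if test_type == "right":
        return (table_value,)
    if test_type == "non-directional":
        return (-table_value, table_value)
    raise ValueError("test_type must be 'left', 'right', or 'non-directional'")


# Hypothetical values: sample mean 52, hypothesized mean 50, sigma 4, n 16
print(z_single_mean(52, 50, 4, 16))
print(critical_values(1.96, "non-directional"))
```

Keeping the computation of z separate from the signing of the critical value follows the order of Steps 3 and 4 in the previous lesson.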
If Ha is left-tailed (<), reject Ho if the computed z-value is less than the critical z-value (negative).
If Ha is right-tailed (>), reject Ho if the computed z-value is greater than the critical z-value (positive).
If Ha is non-directional, reject Ho if the computed z-value is less than the critical z-value (negative) OR if the computed z-value is greater than the critical z-value (positive).

Single Sample: Test Concerning a Single Mean (Variance Unknown)

\[ t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \qquad \nu = n - 1 \]

Note: The Table provided in this guide gives only positive values. If the test is left-tailed, the value from the Table should be given a negative sign when assigning the critical value. If the test is non-directional, the critical values are both the positive and the negative of the value from the Table.
If Ha is left-tailed (<), reject Ho if the computed t-value is less than the critical t-value (negative).
If Ha is right-tailed (>), reject Ho if the computed t-value is greater than the critical t-value (positive).
If Ha is non-directional, reject Ho if the computed t-value is less than the critical t-value (negative) OR if the computed t-value is greater than the critical t-value (positive).

Two Samples: Test Concerning Two Means (Variance Known)

\[ z = \frac{(\bar{x}_1 - \bar{x}_2) - d_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \]

Note: The same rules apply in determining the critical value(s) as for the single mean.
If Ha is left-tailed (<), reject Ho if the computed z-value is less than the critical z-value (negative).
If Ha is right-tailed (>), reject Ho if the computed z-value is greater than the critical z-value (positive).
If Ha is non-directional, reject Ho if the computed z-value is less than the critical z-value (negative) OR if the computed z-value is greater than the critical z-value (positive).

Two Samples: Test Concerning Two Means (Equal and Unknown Variances)

\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - d_0}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \]
\[ s_p^2 = \frac{s_1^2 (n_1 - 1) + s_2^2 (n_2 - 1)}{n_1 + n_2 - 2} \]
\[ \nu = n_1 + n_2 - 2 \]

Note: The same rules apply in determining the critical value(s) as for the single mean.
If Ha is left-tailed (<), reject Ho if the computed t-value is less than the critical t-value (negative).
If Ha is right-tailed (>), reject Ho if the computed t-value is greater than the critical t-value (positive).
If Ha is non-directional, reject Ho if the computed t-value is less than the critical t-value (negative) OR if the computed t-value is greater than the critical t-value (positive).

Two Samples: Test Concerning Two Means (Unequal and Unknown Variances)

\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - d_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \]
\[ \nu = \frac{\left( \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} \right)^2}{\frac{(s_1^2 / n_1)^2}{n_1 - 1} + \frac{(s_2^2 / n_2)^2}{n_2 - 1}} \]

Note: The same rules apply in determining the critical value(s) as for the single mean.
If Ha is left-tailed (<), reject Ho if the computed t-value is less than the critical t-value (negative).
If Ha is right-tailed (>), reject Ho if the computed t-value is greater than the critical t-value (positive).
If Ha is non-directional, reject Ho if the computed t-value is less than the critical t-value (negative) OR if the computed t-value is greater than the critical t-value (positive).

Paired Observations

\[ t = \frac{\bar{d} - d_0}{s_d / \sqrt{n}}, \qquad \nu = n - 1 \]

Note: The same rules apply in determining the critical value(s) as for the single mean.
If Ha is left-tailed (<), reject Ho if the computed t-value is less than the critical t-value (negative).
If Ha is right-tailed (>), reject Ho if the computed t-value is greater than the critical t-value (positive).
If Ha is non-directional, reject Ho if the computed t-value is less than the critical t-value (negative) OR if the computed t-value is greater than the critical t-value (positive).

Test on Single Proportion

\[ z = \frac{x - n p_0}{\sqrt{n p_0 q_0}} \]

Note: The same rules apply in determining the critical value(s) as for the single mean.
If Ha is left-tailed (<), reject Ho if the computed z-value is less than the critical z-value (negative).
If Ha is right-tailed (>), reject Ho if the computed z-value is greater than the critical z-value (positive).
If Ha is non-directional, reject Ho if the computed z-value is less than the critical z-value (negative) OR if the computed z-value is greater than the critical z-value (positive).

Test on Two Proportions

\[ z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\hat{q} \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} \]

Note: The same rules apply in determining the critical value(s) as for the single mean.
If Ha is left-tailed (<), reject Ho if the computed z-value is less than the critical z-value (negative).
If Ha is right-tailed (>), reject Ho if the computed z-value is greater than the critical z-value (positive).
If Ha is non-directional, reject Ho if the computed z-value is less than the critical z-value (negative) OR if the computed z-value is greater than the critical z-value (positive).

Chi-Square: Goodness-of-fit

\[ \chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}, \qquad \nu = k - 1 \]

Note: This is used to determine whether observed frequencies fit a predetermined distribution. The directionality of the test does not matter. The Table provides a critical value at the particular degrees of freedom. The expected values depend on the predetermined distribution.
Reject Ho if the computed χ²-value is greater than the critical χ²-value.

Chi-Square: Test of Homogeneity

\[ \chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}, \qquad \nu = (r - 1)(c - 1) \]

Note: This is used to determine if there is a significant difference in the observed frequencies of different groups. The directionality of the test does not matter. The Table provides a critical value at the particular degrees of freedom. The expected values can be obtained by the formula
row total x column total / grand total
Reject Ho if the computed χ²-value is greater than the critical χ²-value.

Chi-Square: Test of Independence

\[ \chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}, \qquad \nu = (r - 1)(c - 1) \]

Note: This is used to determine if there is a significant relationship between the observed frequencies of two groups; that is, whether one group depends on the other or not. The directionality of the test does not matter. The Table provides a critical value at the particular degrees of freedom. The expected values can be obtained by the formula
row total x column total / grand total
Reject Ho if the computed χ²-value is greater than the critical χ²-value.
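The chi-square computation for a table of observed frequencies (homogeneity or independence) can be sketched as follows. The function name `chi_square_table` and the 2 x 2 observed counts are assumptions made for illustration; the expected values use the row total x column total / grand total rule given above.

```python
def chi_square_table(observed):
    """Return (chi-square value, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # expected value = row total x column total / grand total
            e = row_totals[i] * col_totals[j] / grand_total
            chi2 += (o - e) ** 2 / e

    df = (len(observed) - 1) * (len(observed[0]) - 1)   # (r - 1)(c - 1)
    return chi2, df


# Hypothetical observed frequencies (2 rows x 2 columns)
chi2, df = chi_square_table([[10, 20], [20, 10]])
print(chi2, df)
```

The computed value is then compared with the critical value from a chi-square table at the returned degrees of freedom; that lookup is not included in this sketch.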
Repeated t-test: One-way Analysis of Variance (ANOVA)

\[ df_1 = k - 1 \]
\[ df_2 = (n - 1) - (k - 1) \]
\[ CF = \frac{\left( \sum x_1 + \sum x_2 + \sum x_3 + \cdots + \sum x_i \right)^2}{n_1 + n_2 + n_3 + \cdots + n_i} \]
\[ TSS = \sum x_1^2 + \sum x_2^2 + \sum x_3^2 + \cdots + \sum x_i^2 - CF \]
\[ BSS = \frac{(\sum x_1)^2}{n_1} + \frac{(\sum x_2)^2}{n_2} + \frac{(\sum x_3)^2}{n_3} + \cdots + \frac{(\sum x_i)^2}{n_i} - CF \]
\[ WSS = TSS - BSS \]
\[ MSB = \frac{BSS}{df_1} \]
\[ MSW = \frac{WSS}{df_2} \]
\[ F = \frac{MSB}{MSW} \]

Note: This is used to determine if there is a significant difference among the values of three or more groups. The directionality of the test does not matter. The Table provides a critical F-value for each pair of degrees of freedom at a particular alpha.
Reject Ho if the computed F-value is greater than the critical F-value.

Post hoc Test for ANOVA: Scheffé's Test

\[ F' = \frac{(\bar{x}_1 - \bar{x}_2)^2}{\frac{MSW (n_1 + n_2)}{n_1 n_2}} \]

Note: This is used in support of ANOVA when Ho is rejected. Specifically, it is used to identify which pairs of the groups tested by ANOVA differ significantly. The critical value is computed by the formula
critical value of ANOVA x (k - 1)
Reject Ho if the computed F'-value is greater than the critical F'-value.

Significance of Correlation Coefficient

\[ t = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}}, \qquad \nu = n - 2 \]

Note: This is used to determine if the obtained r-value for correlation is significant, that is, if there is a significant relationship between two variables. The same rules apply in determining the critical value(s) as for the single mean.
If Ha is left-tailed (<), reject Ho if the computed t-value is less than the critical t-value (negative).
If Ha is right-tailed (>), reject Ho if the computed t-value is greater than the critical t-value (positive).
If Ha is non-directional, reject Ho if the computed t-value is less than the critical t-value (negative) OR if the computed t-value is greater than the critical t-value (positive).
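The test for the significance of a correlation coefficient can be sketched in Python together with the Pearson r sums formula from Lesson 9 of Unit I. This is a minimal sketch: the function names `pearson_r` and `t_for_r` and the five data pairs are assumptions made for illustration, and the critical t-value must still be read from a t-table, which is not included here.

```python
import math


def pearson_r(xs, ys):
    """Pearson r from the sums formula: n, sum X, sum Y, sum XY, sum X^2, sum Y^2."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator


def t_for_r(r, n):
    """Return (t, degrees of freedom) with t = r*sqrt(n - 2)/sqrt(1 - r^2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2), n - 2


# Hypothetical paired data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
t, df = t_for_r(r, len(xs))
print(r, t, df)
```

Note that the t formula is undefined when r is exactly 1 or -1, because the denominator becomes zero.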
TABLES

z-distribution Table

z 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0 0.5 0.49601 0.49202 0.48803 0.48405 0.48006 0.47608 0.4721 0.46812 0.46414
0.1 0.46017 0.45621 0.45224 0.44828 0.44433 0.44038 0.43644 0.43251 0.42858 0.42466
0.2 0.42074 0.41683 0.41294 0.40905 0.40517 0.40129 0.39743 0.39358 0.38974 0.38591
0.3 0.38209 0.37828 0.37448 0.3707 0.36693 0.36317 0.35942 0.35569 0.35197 0.34827
0.4 0.34458 0.3409 0.33724 0.3336 0.32997 0.32636 0.32276 0.31918 0.31561 0.31207
0.5 0.30854 0.30503 0.30153 0.29806 0.2946 0.29116 0.28774 0.28434 0.28096 0.2776
0.6 0.27425 0.27093 0.26763 0.26435 0.26109 0.25785 0.25463 0.25143 0.24825 0.2451
0.7 0.24196 0.23885 0.23576 0.2327 0.22965 0.22663 0.22363 0.22065 0.2177 0.21476
0.8 0.21186 0.20897 0.20611 0.20327 0.20045 0.19766 0.19489 0.19215 0.18943 0.18673
0.9 0.18406 0.18141 0.17879 0.17619 0.17361 0.17106 0.16853 0.16602 0.16354 0.16109
1 0.15866 0.15625 0.15386 0.15151 0.14917 0.14686 0.14457 0.14231 0.14007 0.13786
1.1 0.13567 0.1335 0.13136 0.12924 0.12714 0.12507 0.12302 0.121 0.119 0.11702
1.2 0.11507 0.11314 0.11123 0.10935 0.10749 0.10565 0.10384 0.10204 0.10027 0.09853
1.3 0.0968 0.0951 0.09342 0.09176 0.09012 0.08851 0.08692 0.08534 0.08379 0.08226
1.4 0.08076 0.07927 0.0778 0.07636 0.07493 0.07353 0.07215 0.07078 0.06944 0.06811
1.5 0.06681 0.06552 0.06426 0.06301 0.06178 0.06057 0.05938 0.05821 0.05705 0.05592
1.6 0.0548 0.0537 0.05262 0.05155 0.0505 0.04947 0.04846 0.04746 0.04648 0.04551
1.7 0.04457 0.04363 0.04272 0.04182 0.04093 0.04006 0.0392 0.03836 0.03754 0.03673
1.8 0.03593 0.03515 0.03438 0.03363 0.03288 0.03216 0.03144 0.03074 0.03005 0.02938
1.9 0.02872 0.02807 0.02743 0.0268 0.02619 0.02559 0.025 0.02442 0.02385 0.0233
2 0.02275 0.02222 0.02169 0.02118 0.02068 0.02018 0.0197 0.01923 0.01876 0.01831
2.1 0.01786 0.01743 0.017 0.01659 0.01618 0.01578 0.01539 0.015 0.01463 0.01426
2.2 0.0139 0.01355 0.01321 0.01287 0.01255 0.01222 0.01191 0.0116 0.0113 0.01101
2.3 0.01072 0.01044 0.01017 0.0099 0.00964 0.00939 0.00914 0.00889 0.00866 0.00842
2.4 0.0082 0.00798 0.00776 0.00755 0.00734 0.00714 0.00695 0.00676 0.00657 0.00639
2.5 0.00621 0.00604 0.00587 0.0057 0.00554 0.00539 0.00523 0.00509 0.00494 0.0048
2.6 0.00466 0.00453 0.0044 0.00427 0.00415 0.00403 0.00391 0.00379 0.00368 0.00357
2.7 0.00347 0.00336 0.00326 0.00317 0.00307 0.00298 0.00289 0.0028 0.00272 0.00264
2.8 0.00256 0.00248 0.0024 0.00233 0.00226 0.00219 0.00212 0.00205 0.00199 0.00193
2.9 0.00187 0.00181 0.00175 0.0017 0.00164 0.00159 0.00154 0.00149 0.00144 0.0014
3 0.00135 0.00131 0.00126 0.00122 0.00118 0.00114 0.00111 0.00107 0.00104 0.001
3.1 0.00097 0.00094 0.0009 0.00087 0.00085 0.00082 0.00079 0.00076 0.00074 0.00071
3.2 0.00069 0.00066 0.00064 0.00062 0.0006 0.00058 0.00056 0.00054 0.00052 0.0005
3.3 0.00048 0.00047 0.00045 0.
