Data Management
STI College Caloocan
This document provides an overview of data management, including different methods of data collection, such as census, sample surveys, experiments, and observational studies. It also details the key characteristics of a well-designed and well-conducted survey, along with various sampling methods and their applications.
GE1707 Data Management

What is Data?
Data are raw facts that become useful information when organized in a meaningful way. Data can be qualitative or quantitative in nature.

What is Data Management?
Data management is concerned with "looking after" and processing data. It involves the following:
- Looking after field data sheets
- Checking and correcting the raw data
- Preparing data for analysis
- Documenting and archiving the data and metadata

Importance of Data Management
- It ensures that the data for analysis are of high quality, so that conclusions are correct.
- Good data management allows further use of the data in the future and enables efficient integration of results with other studies.
- Good data management leads to improved processing efficiency, improved data quality, and improved meaningfulness of the data.

Planning and Conducting an Experiment or Study

A. Methods of data collection
1. Census – the procedure of systematically acquiring and recording information about all members of a given population. Researchers rarely survey the entire population for two (2) reasons: the cost is too high, and the population is dynamic in that the individuals making up the population may change over time.
2. Sample survey – sampling is the selection of a subset within a population to yield some knowledge about the population of concern. The three main advantages of sampling are that (i) the cost is lower, (ii) data collection is faster, and (iii) since the data set is smaller, it is possible to improve the accuracy and quality of the data.
3. Experiment – performed when there are controlled variables (like a certain treatment in medicine) and the intention is to study their effect on other observed variables (like the health of patients). One of the main requirements for an experiment is the possibility of replication.
4. Observational study – appropriate when there are no controlled variables and replication is impossible.
This type of study typically uses a survey. An example is one that explores the correlation between smoking and lung cancer: the researchers would collect observations of both smokers and non-smokers and then compare the number of cases of lung cancer in each group.

B. Planning and Conducting Surveys
1. Characteristics of a well-designed and well-conducted survey
a. A good survey must be representative of the population.
b. To use probabilistic results, it always incorporates chance, such as a random number generator. Often we do not have a complete listing of the population, so we have to be careful about exactly how we are applying "chance". Even when the frame is correctly specified, the subjects may choose not to respond or may not be able to respond.
c. The wording of the questions must be neutral; subjects give different answers depending on the phrasing.
d. Possible sources of errors and biases should be controlled. The population of concern as a whole may not be available for a survey. Its subset of items that are possible to measure is called the sampling frame (from which the sample will be selected). The plan of the survey should specify the sampling method, the sample size, and the steps for implementing the sampling plan and collecting the data.

2. Sampling Methods
a. Nonprobability sampling – any sampling method where some elements of the population have no chance of selection, or where the probability of selection cannot be accurately determined. The selection of elements is based on some criterion other than randomness. These conditions give rise to exclusion bias, caused by the fact that some elements of the population are excluded. Nonprobability sampling does not allow the estimation of sampling errors. Information about the relationship between sample and population is limited, making it difficult to extrapolate from the sample to the population.
Example: We visit every household on a given street and interview the first person to answer the door. In any household with more than one occupant, this is a nonprobability sample, because some people are more likely to answer the door (e.g., an unemployed person who spends most of their time at home is more likely to answer than an employed housemate who might be at work when the interviewer calls), and it is not practical to calculate these probabilities.

One example of nonprobability sampling is convenience sampling (e.g., customers in a supermarket are asked questions). Another is quota sampling, where judgment is used to select the subjects based on specified proportions; for example, an interviewer may be told to sample 200 females and 300 males between the ages of 45 and 60. In addition, nonresponse effects may turn any probability design into a nonprobability design if the characteristics of nonresponse are not well understood, since nonresponse effectively modifies each element's probability of being sampled.

b. Probability sampling – it is possible to determine both which sampling units belong to which sample and the probability that each sample will be selected. The following sampling methods are examples of probability sampling:

i. Simple random sampling (SRS) – all samples of a given size have an equal probability of being selected, and selections are independent. The frame is not subdivided or partitioned. The sample variance is a good indicator of the population variance, which makes it relatively easy to estimate the accuracy of results. However, SRS can be vulnerable to sampling error because the randomness of the selection may result in a sample that does not reflect the makeup of the population. For instance, a simple random sample of ten people from a given country will on average produce five men and five women, but any given trial is likely to overrepresent one sex and underrepresent the other.
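The SRS idea above can be sketched in a few lines with Python's standard library; the sampling frame of names is hypothetical:

```python
import random

# Hypothetical sampling frame of ten people.
frame = ["Ana", "Ben", "Carla", "Dan", "Ella",
         "Fe", "Gino", "Hana", "Ivan", "Jo"]

random.seed(42)  # fixed seed so the draw is reproducible

# random.sample draws without replacement; every size-4 subset of the
# frame is equally likely, which is exactly the SRS property.
sample = random.sample(frame, k=4)

print(sample)
```

In practice the frame would be a list of unit identifiers (household IDs, phone numbers), but the mechanics of the draw are the same.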
Systematic and stratified techniques, discussed below, attempt to overcome this problem by using information about the population to choose a more representative sample. In some cases, investigators are interested in research questions specific to subgroups of the population. For example, researchers might be interested in examining whether cognitive ability as a predictor of job performance is equally applicable across racial groups. SRS cannot accommodate the needs of researchers in this situation because it does not provide subsamples of the population. Stratified sampling, which is discussed below, addresses this weakness of SRS.

ii. Systematic sampling – relies on dividing the target population into strata (subpopulations) of equal size and then selecting randomly one element from the first stratum and the corresponding elements from all other strata. A simple example would be to select every 10th name from the telephone directory, with the first selection being random. SRS may select a sample from the beginning of the list; systematic sampling helps to spread the sample over the list. As long as the starting point is randomized, systematic sampling is a type of probability sampling. Every-10th sampling is especially useful for efficient sampling from databases. However, systematic sampling is especially vulnerable to periodicities in the list. Consider a street where the odd-numbered houses are all on one side of the road and the even-numbered houses are all on the other side: under systematic sampling with a step of 10, the houses sampled will all be either odd-numbered or even-numbered. Another drawback of systematic sampling is that even in scenarios where it is more accurate than SRS, its theoretical properties make it difficult to quantify that accuracy. Systematic sampling is not SRS because different samples of the same size have different selection probabilities; e.g., with a step of 10, the set (4, 14, 24, 34, …) has a one-in-ten probability of selection, but the set (4, 13, 24, 34, …) has zero probability of selection.

iii. Stratified sampling – when the population embraces a number of distinct categories, the frame can be organized by these categories into separate "strata". Each stratum is then sampled as an independent subpopulation. Dividing the population into strata can enable researchers to draw inferences about specific subgroups that may be lost in a more generalized random sample. Since each stratum is treated as an independent population, different sampling approaches can be applied to different strata. However, implementing such an approach can increase the cost and complexity of sample selection. Example: determining the proportions of defective products being assembled in a factory.

A stratified sampling approach is most effective when three conditions are met:
a. Variability within strata is minimized.
b. Variability between strata is maximized.
c. The variables upon which the population is stratified are strongly correlated with the desired dependent variable (e.g., beer consumption is strongly correlated with gender).

iv. Cluster sampling – sometimes it is cheaper to "cluster" the sample in some way (e.g., by selecting respondents from certain areas only, or certain time periods only). Cluster sampling is an example of two-stage random sampling: in the first stage, a random sample of areas is chosen; in the second stage, a random sample of respondents within those areas is selected. This works best when each cluster is a small copy of the population. Clustering can reduce travel and other administrative costs, but it generally increases the variability of sample estimates above that of simple random sampling, depending on how much the clusters differ between themselves as compared with the within-cluster variation. If the clusters chosen are biased in a certain way, inferences drawn about population parameters will be inaccurate.

v.
Matched random sampling – in this method, there are two (2) samples in which the members are clearly paired, or are matched explicitly by the researcher (for example, IQ measurements or pairs of identical twins). Alternatively, the same attribute, or variable, may be measured twice on each subject, under different circumstances (e.g., the milk yields of cows before and after being fed a particular diet).

C. Planning and conducting experiments
1. Characteristics of a well-designed and well-conducted experiment
A good statistical experiment includes:
a. Stating the purpose of the research, including estimates of the size of treatment effects, alternative hypotheses, and the estimated experimental variability. Experiments must compare the new treatment with at least one (1) standard treatment, to allow an unbiased estimate of the difference in treatment effects.
b. Designing the experiment, using blocking (to reduce the influence of confounding variables) and randomized assignment of treatments to subjects.
c. Examining the data set in secondary analyses, to suggest new hypotheses for future study.
d. Documenting and presenting the results of the study.

Example: Experiments on humans can change their behavior. The famous Hawthorne study examined changes to the working environment at the Hawthorne plant of the Western Electric Company. The researchers first measured the productivity in the plant, then modified the illumination in an area of the plant and found that productivity improved. However, the study is criticized today for its lack of a control group and blinding: those in the Hawthorne study may have become more productive not because the lighting was changed, but because they were being observed.

2. Treatment, control groups, experimental units, random assignment, and replication
a.
Control groups and experimental units
To be able to compare effects and make inferences about associations or predictions, one typically has to subject different groups to different conditions. Usually, an experimental unit is subjected to a treatment and a control group is not.
b. Random assignment
The second fundamental design principle is the randomization of the allocation of treatments (controlled variables) to units. The treatment effects, if present, will then be similar within each group.
c. Replication
All measurements, observations, or data collected are subject to variation, as there are no completely deterministic processes. To reduce variability, the measurements in the experiment must be repeated. The experiment itself should also allow for replication, so that it can be checked by other researchers.

3. Sources of bias and confounding, including placebo effect and blinding
Sources of bias specific to medicine include confounding variables and placebo effects, among others.
a. Confounding – a confounding variable is an extraneous variable in a statistical model that correlates (positively or negatively) with both the dependent variable and the independent variable. The methodologies of scientific studies therefore need to control for these factors to avoid a false positive (Type I) error (an erroneous conclusion that the dependent variable is in a causal relationship with the independent variable).
Example: Consider the statistical relationship between ice cream sales and drowning deaths. These two (2) variables have a positive correlation because both occur more often during summer. However, it would be wrong to conclude that there is a cause-and-effect relation between them; the season is the confounding variable.
b. Placebo and blinding – a placebo is an imitation pill identical to the actual treatment pill, but without the treatment ingredients.
A placebo effect is a sham (or simulated) effect in which a medical intervention has no direct health impact but results in actual improvement of a medical condition because the patients believe they are being treated. Typically, all patients are informed that some will be treated with the drug and some will receive the inert pill; however, the patients are blinded as to whether they actually received the drug or the placebo. Blinding is a technique used to make the subjects "blind" to which treatment is being given.
c. Blocking – the arranging of experimental units in groups (blocks) that are similar to one another. Typically, a blocking factor is a source of variability that is not of primary interest to the experimenter. An example of a blocking factor might be the sex of a patient; by blocking on sex (that is, comparing men to men and women to women), this source of variability is controlled for, leading to greater precision.

4. Completely randomized design, randomized block design, and matched pairs
a. Completely randomized designs – for studying the effects of one primary factor without the need to take other nuisance variables into account. The experiment compares the values of a response variable (like health improvement) across the different levels of that primary factor (e.g., different amounts of medication). In a completely randomized design, the levels of the primary factor are randomly assigned to the experimental units (for example, using a random number generator).
b. Randomized block design – a collection of completely randomized experiments, each run within one of the blocks of the total experiment. A matched-pairs design is a special case in which the blocks consist of just two (2) elements (measurements on the same patient before and after the treatment, or measurements on two (2) different but in some way similar patients).
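The randomized assignment of units to a treatment group and a control group, as in the completely randomized design described above, can be sketched with Python's standard library; the subject IDs are hypothetical:

```python
import random

# Hypothetical experimental units (e.g., patient IDs).
subjects = ["P01", "P02", "P03", "P04", "P05", "P06", "P07", "P08"]

random.seed(7)          # fixed seed so the assignment is reproducible
shuffled = subjects[:]  # copy so the original list is left untouched
random.shuffle(shuffled)

# Split the shuffled list in half: first half gets the treatment,
# second half serves as the control group.
half = len(shuffled) // 2
treatment, control = shuffled[:half], shuffled[half:]

print("treatment:", treatment)
print("control:  ", control)
```

Because the split happens after a uniform random shuffle, every subject has the same chance of landing in either group, which is what makes the design completely randomized.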
Chi-Square
The chi-square test is used to determine whether there is a significant difference between the expected frequencies and the observed frequencies in one or more categories. There are two (2) types of chi-square tests. Both use the chi-square statistic and distribution, but for different purposes:
- A chi-square goodness-of-fit test determines whether sample data match a hypothesized population distribution.
- A chi-square test for independence compares two (2) variables in a contingency table to see if they are related. It tests whether the distributions of the categorical variables differ from each other.
A very small chi-square test statistic means that the observed data fit the expected data well, so there is no evidence against the hypothesized model. A very large chi-square test statistic means that the data do not fit well, which is evidence against the null hypothesis.

Assumptions of the Chi-Square Test
The assumptions of the chi-square test are the same whether we are using the goodness-of-fit test or the test of independence. The standard assumptions are:
- Random sample
- Independent observations in the sample (one observation per subject)
- No expected counts less than five (5)
Notice that the last assumption is concerned with the expected counts, not the raw observed counts.

To calculate the chi-square statistic, χ², use the following formula:

χ² = Σ (O − E)² / E

where:
χ² is the chi-square test statistic,
O is the observed frequency for each category, and
E is the expected frequency for each category.

We compare the value of the test statistic to a tabled chi-square value to determine the probability that a sample fits an expected pattern.

Goodness-of-Fit Test
A chi-square goodness-of-fit test is used to test whether a frequency distribution obtained experimentally fits an "expected" frequency distribution that is based on the theoretical or previously known probability of each outcome.
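As a numeric sketch of the formula χ² = Σ (O − E)² / E, the statistic can be computed directly in Python; the observed counts for 60 rolls of a six-sided die are hypothetical:

```python
# Hypothetical observed counts for 60 rolls of a six-sided die.
observed = [8, 12, 11, 9, 6, 14]
# A fair die is expected to show each face 60/6 = 10 times.
expected = [10] * 6

# chi-square statistic: sum (O - E)^2 / E over all k categories
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print(round(chi_sq, 2))  # 4.2, to be compared with a table value at k - 1 = 5 df
```

The resulting value would then be compared with the tabled critical value for k − 1 degrees of freedom at the chosen significance level, exactly as in the worked examples below.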
An experiment is conducted in which a simple random sample is taken from a population, and each member of the population is grouped into exactly one of k categories.
Step 1: The observed frequencies are calculated for the sample.
Step 2: The expected frequencies are obtained from previous knowledge (or belief) or from probability theory. In order to proceed to the next step, it is necessary that each expected frequency is at least 5.
Step 3: A hypothesis test is performed:
a. The null hypothesis H0: the population frequencies are equal to the expected frequencies.
b. The alternative hypothesis Ha: the null hypothesis is false.
c. α is the level of significance.
d. The degrees of freedom: k − 1.
e. The test statistic is calculated:

χ² = Σ (observed − expected)² / expected = Σ (O − E)² / E

f. From α and k − 1, a critical value is determined from the chi-square table.
g. Reject H0 if χ² is larger than the critical value (right-tailed test).

Example: Researchers have surveyed 1600 coffee drinkers, asking how much coffee they drink, in order to confirm previous studies. Previous studies have indicated that 72% of Americans drink coffee. Below are the results of the previous studies (percentages) and of the survey (frequencies). At α = 0.05, is there enough evidence to conclude that the distributions are the same?

Response          % of Coffee Drinkers    Frequency
2 cups per week   15%                     206
1 cup per week    13%                     193
1 cup per day     27%                     462
2+ cups per day   45%                     739

a. The null hypothesis H0: the population frequencies are equal to the expected frequencies.
b. The alternative hypothesis Ha: the null hypothesis is false.
c. α = 0.05
d. The degrees of freedom: k − 1 = 4 − 1 = 3
e.
The test statistic can be calculated using the table below:

Response          %      E                     O     O − E   (O − E)²   (O − E)²/E
2 cups per week   15%    0.15 × 1600 = 240     206   −34     1156       4.817
1 cup per week    13%    0.13 × 1600 = 208     193   −15     225        1.082
1 cup per day     27%    0.27 × 1600 = 432     462   30      900        2.083
2+ cups per day   45%    0.45 × 1600 = 720     739   19      361        0.501

χ² = Σ (O − E)² / E = 8.483

f. From α = 0.05 and k − 1 = 3, the critical value is 7.815.
g. Is there enough evidence to reject H0? Since χ² ≈ 8.483 > 7.815, there is enough statistical evidence to reject the null hypothesis and to conclude that the old percentages no longer hold.

Test of Independence
The chi-square test of independence is used to assess whether two (2) factors are related. This test is often used in social science research to determine if factors are independent of each other. For example, we would use this test to determine relationships between voting patterns and race, income and gender, or behavior and education. In general, when running the test of independence, we ask, "Is Variable X independent of Variable Y?" It is important to note that this test does not tell us how the variables are related, only whether or not they are independent of one another. For example, while the test of independence can help us determine whether income and gender are independent, it cannot help us assess how one might affect the other. Just as with a goodness-of-fit test, we calculate the expected values, calculate the chi-square statistic, and compare it to the appropriate chi-square value from a reference table to see if we should reject H0, which is that the variables are not related.
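For a contingency table, each expected count is (row total × column total) / grand total, and the statistic is summed over every cell. A minimal sketch in Python, using a hypothetical 2×3 table of observed counts:

```python
# Hypothetical 2x3 contingency table (rows: outcome, columns: group).
observed = [
    [30, 20, 10],
    [20, 30, 40],
]

row_totals = [sum(row) for row in observed]        # totals per row
col_totals = [sum(col) for col in zip(*observed)]  # totals per column
grand_total = sum(row_totals)

# Expected count for each cell: (row total * column total) / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# chi-square statistic summed over every cell of the table
chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# degrees of freedom = (rows - 1) * (columns - 1)
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(round(chi_sq, 3), df)  # 16.667 2
```

The same recipe (expected counts, cell-by-cell sum, df from the table's shape) is what the worked example below carries out by hand.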
Formally, the hypothesis statements for the chi-square test of independence are:
H0: There is no association between the two (2) categorical variables.
H1: There is an association (the two (2) variables are not independent).
An experiment is conducted in which the frequencies for two (2) variables are determined. To use the test, the same assumptions must be satisfied: the observed frequencies are obtained through a simple random sample, and each expected frequency is at least 5. The frequencies are written down in a table: the columns contain outcomes for one (1) variable, and the rows contain outcomes for the other variable.
The procedure for the hypothesis test is essentially the same. The differences are that:
a. H0 is that the two (2) variables are independent.
b. Ha is that the two (2) variables are not independent (they are dependent).
c. The expected frequency E(r, c) for the entry in row r, column c is calculated using:

E(r, c) = (sum of row r) × (sum of column c) / sample size

d. The degrees of freedom: (number of rows − 1) × (number of columns − 1)

Example: The results of a random sample of children with pain from musculoskeletal injuries, treated with acetaminophen, ibuprofen, or codeine, are shown in the table. At α = 0.10, test whether the treatment and the result are independent.

                         Acetaminophen   Ibuprofen   Codeine    Total
Significant Improvement  58 (66.7)       81 (66.7)   61 (66.7)  200
Slight Improvement       42 (33.3)       19 (33.3)   39 (33.3)  100
Total                    100             100         100        300

First, calculate the column and row totals. Then, find the expected frequency for each entry and write it in parentheses next to the observed frequency. Now, perform the hypothesis test.
a. The null hypothesis H0: the treatment and the response are independent.
b. The alternative hypothesis Ha: the treatment and the response are dependent.
c. α = 0.10.
d.
The degrees of freedom: (number of rows − 1) × (number of columns − 1) = (2 − 1) × (3 − 1) = 1 × 2 = 2
e. The test statistic can be calculated using the table below:

Row, Column   E                       O     O − E    (O − E)²   (O − E)²/E
1,1           200 × 100/300 = 66.7    58    −8.7     75.69      1.135
1,2           200 × 100/300 = 66.7    81    14.3     204.49     3.067
1,3           200 × 100/300 = 66.7    61    −5.7     32.49      0.487
2,1           100 × 100/300 = 33.3    42    8.7      75.69      2.271
2,2           100 × 100/300 = 33.3    19    −14.3    204.49     6.135
2,3           100 × 100/300 = 33.3    39    5.7      32.49      0.975

χ² = Σ (O − E)² / E = 14.07

f. From α = 0.10 and d.f. = 2, the critical value is 4.605.
g. Is there enough evidence to reject H0? Since χ² ≈ 14.07 > 4.605, there is enough statistical evidence to reject the null hypothesis and to conclude that there is a relationship between the treatment and the response.

Example: A doctor believes that the proportions of births in this country on each day of the week are equal. A simple random sample of 700 births from a recent year is selected, and the results are shown below. At a significance level of 0.01, test the doctor's claim.

Day        Sunday   Monday   Tuesday   Wednesday   Thursday   Friday   Saturday
Frequency  65       103      114       116         115        112      75

a. The null hypothesis H0: the population frequencies are equal to the expected frequencies (the proportions are equal).
b. The alternative hypothesis Ha: the null hypothesis is false.
c. α = 0.01
d. The degrees of freedom: k − 1 = 7 − 1 = 6
e. The test statistic can be calculated using a table:

Day         E             O     O − E   (O − E)²   (O − E)²/E
Sunday      700/7 = 100   65    −35     1225       12.25
Monday      700/7 = 100   103   3       9          0.09
Tuesday     700/7 = 100   114   14      196        1.96
Wednesday   700/7 = 100   116   16      256        2.56
Thursday    700/7 = 100   115   15      225        2.25
Friday      700/7 = 100   112   12      144        1.44
Saturday    700/7 = 100   75    −25     625        6.25

χ² = Σ (O − E)² / E = 26.8

f.
From α = 0.01 and k − 1 = 6, the critical value is 16.812.
g. Is there enough evidence to reject H0? Since χ² ≈ 26.8 > 16.812, there is enough statistical evidence to reject the null hypothesis and to conclude that the proportion of births is not the same for each day of the week.

REFERENCES:
Almukkahal, R., Ottman, L., DeLancey, D., Evans, A., Lawsky, E., & Meery, B. (2016). CK-12 advanced probability and statistics. FlexBook Next Generation Textbooks.
Introduction to data management. (n.d.). Retrieved from http://www.deniva.or.ug/docs/Reports/statistics/IntroductiontoDataManagement.pdf
Sampling and experimentation: Planning and conducting a study. (n.d.). Retrieved from https://www.scribd.com/document/51105391/Planning-and-Conducting-a-Study-for-AP-Statistics

04 Handout 1 *Property of STI