Summary

This document provides an overview of data collection methods. It covers censuses, sample surveys, experiments, and observational studies, and explains the concepts of sampling, generalizability, and causal inference in research.

Full Transcript

Data Collection Introduction

The purpose of most simulation models is to collect data and to analyze the data in order to gain insights into the system being simulated. Thus we work with data when running what-if scenarios so that conclusions can be drawn and decisions made. To derive good conclusions from data, we do not use just any data; care and effort are usually taken to collect usable data from selected, and at times restricted, sources. Therefore, before decisions are made based on statistics computed from data, we need to know how the data were collected; that is, we need to know the method(s) of data collection and the screening that the data have undergone.

Intended Learning Outcomes (ILOs)

By the end of this unit, the reader should be able to:
- Discuss the different methods of data collection: census, sample survey, experiments and observations
- Describe the various methods of determining sample size
- Describe data coding with respect to what codes are, why they are used, their uses, and how they are determined

Methods of Data Collection

There are four main methods of data collection:
1. Census: A census is a study that obtains data from every member of a population. In most studies, a census is not practical because of the cost and/or time required.
2. Sample survey: A sample survey is a study that obtains data from a subset of a population in order to estimate population attributes.
3. Experiment: An experiment is a controlled study in which the researcher attempts to understand cause-and-effect relationships. The study is "controlled" in the sense that the researcher controls (1) how subjects are assigned to groups and (2) which treatments each group receives. In the analysis phase, the researcher compares group scores on some dependent variable. Based on the analysis, the researcher draws a conclusion about whether the treatment (independent variable) had a causal effect on the dependent variable.
4. Observational study: Like experiments, observational studies attempt to understand cause-and-effect relationships. However, unlike experiments, the researcher is not able to control (1) how subjects are assigned to groups and/or (2) which treatments each group receives.

Pros and Cons of Data Collection Methods

Each method of data collection has advantages and disadvantages.

Resources. When the population is large, a sample survey has a big resource advantage over a census. A well-designed sample survey can provide very precise estimates of population parameters - quicker, cheaper, and with less manpower than a census.

Generalizability. Generalizability refers to the appropriateness of applying findings from a study to a larger population. Generalizability requires random selection. If participants in a study are randomly selected from a larger population, it is appropriate to generalize study results to the larger population; if not, it is not appropriate to generalize. Observational studies typically do not feature random selection, so it is not appropriate to generalize from the results of an observational study to a larger population.

Causal inference. Cause-and-effect relationships can be teased out when subjects are randomly assigned to groups. Therefore experiments, which allow the researcher to control the assignment of subjects to treatment groups, are the best method for investigating causal relationships.

Sampling Survey Methods

Sampling method refers to the way that observations are selected from a population to be included in the sample for a sample survey.
The reason for conducting a sample survey is to estimate the value of some attribute of a population.

Population parameter. A population parameter is the true value of a population attribute.

Sample statistic. A sample statistic is an estimate, based on sample data, of a population parameter.

Consider this example. A public opinion pollster wants to know the percentage of voters that favor a flat-rate income tax. The actual percentage across all the voters is a population parameter. The estimate of that percentage, based on sample data, is a sample statistic. The quality of a sample statistic (i.e., its accuracy, precision, and representativeness) is strongly affected by the way the sample observations are chosen; that is, by the sampling method.

Categorization of Sampling Methods

As a group, sampling methods fall into one of two categories:
1. Probability samples. With probability sampling methods, each population element has a known (non-zero) chance of being chosen for the sample.
2. Non-probability samples. With non-probability sampling methods, we do not know the probability that each population element will be chosen, and/or we cannot be sure that each population element has a non-zero chance of being chosen.

Non-probability sampling methods offer two potential advantages - convenience and cost. The main disadvantage is that non-probability sampling methods do not allow you to estimate the extent to which sample statistics are likely to differ from population parameters. Only probability sampling methods permit that kind of analysis.

Non-Probability Sampling Methods

Two of the main types of non-probability sampling methods are voluntary samples and convenience samples.
1. Voluntary sample. A voluntary sample is made up of people who self-select into the survey. Often, these respondents have a strong interest in the main topic of the survey. For example, consider a news show that asks viewers to participate in an on-line poll. This is a voluntary sample because the sample is chosen by the viewers, not by the survey administrator.
2. Convenience sample. A convenience sample is made up of people who are easy to reach. Consider the following example. A pollster interviews shoppers at a local mall. If the mall was chosen because it was a convenient site from which to solicit survey participants and/or because it was close to the pollster's home or business, this would be a convenience sample.

Probability Sampling Methods

The main types of probability sampling methods are simple random sampling, stratified sampling, cluster sampling, multistage sampling, and systematic random sampling. The key benefit of probability sampling methods is that they help ensure the sample chosen is representative of the population, so that the statistical conclusions will be valid.

Simple random sampling. Simple random sampling refers to any sampling method with the following properties: the population consists of N objects, the sample consists of n objects, and all possible samples of n objects are equally likely to occur. There are many ways to obtain a simple random sample. One is the lottery method: each of the N population members is assigned a unique number, the numbers are placed in a bowl and thoroughly mixed, and a blind-folded researcher then selects n numbers. Population members having the selected numbers are included in the sample.
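As a rough illustration of simple random sampling (a computational stand-in for the lottery method), the Python sketch below draws a simple random sample from a hypothetical voter population and compares the sample statistic to the population parameter. The population values, seed, and sample size are invented purely for illustration.

```python
import random

# Hypothetical population: 10,000 voters, each marked True if they favor
# a flat-rate income tax (values invented purely for illustration).
random.seed(42)
population = [random.random() < 0.47 for _ in range(10_000)]

# Population parameter: the true proportion of voters favoring the tax.
population_parameter = sum(population) / len(population)

# Lottery method, computationally: random.sample() gives every possible
# sample of n elements the same chance of being selected.
n = 500
sample = random.sample(population, n)

# Sample statistic: the proportion in the sample, used to estimate the parameter.
sample_statistic = sum(sample) / n

print(f"Population parameter: {population_parameter:.3f}")
print(f"Sample statistic:     {sample_statistic:.3f}")
```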
a. Stratified Sampling Method (SSM). With stratified sampling, the population is divided into groups, based on some characteristic. Then, within each group, a probability sample (often a simple random sample) is selected. In stratified sampling, the groups are called strata. As an example, suppose we conduct a national survey. We might divide the population into groups or strata based on geography - north, east, south, and west. Then, within each stratum, we might randomly select survey respondents.

b. Cluster Sampling. With cluster sampling, every member of the population is assigned to one, and only one, group. Each group is called a cluster. A sample of clusters is chosen using a probability method (often simple random sampling). Only individuals within sampled clusters are surveyed. Note the difference between cluster sampling and stratified sampling: with stratified sampling, the sample includes elements from each stratum; with cluster sampling, in contrast, the sample includes elements only from the sampled clusters.

c. Multistage Sampling. With multistage sampling, we select a sample by using combinations of different sampling methods. For example, in Stage 1 we might use cluster sampling to choose clusters from a population. Then, in Stage 2, we might use simple random sampling to select a subset of elements from each chosen cluster for the final sample.

d. Systematic Random Sampling. With systematic random sampling, we create a list of every member of the population. From the list, we randomly select the first sample element from among the first k elements on the population list. Thereafter, we select every kth element on the list. This method differs from simple random sampling because not every possible sample of n elements is equally likely.

Experiments Method in Data Collection

In an experiment, a researcher manipulates one or more variables while holding all other variables constant. By noting how the manipulated variables affect a response variable, the researcher can test whether a causal relationship exists between the manipulated variables and the response variable.

Parts of an Experiment

All experiments have independent variables, dependent variables, and experimental units.

a. Independent variable. An independent variable (also called a factor) is an explanatory variable manipulated by the experimenter. Each factor has two or more levels, i.e., different values of the factor. Combinations of factor levels are called treatments. The table below shows the factors, levels, and treatments for a hypothetical experiment in which the researcher studies the possible effects of Vitamin C and Vitamin E on health. There are two factors - dosage of Vitamin C and dosage of Vitamin E. The Vitamin C factor has 3 levels - 0 mg per day, 250 mg per day, and 500 mg per day. The Vitamin E factor has 2 levels - 0 mg per day and 400 mg per day. The experiment therefore has six treatments: Treatment 1 is 0 mg of E and 0 mg of C, Treatment 2 is 0 mg of E and 250 mg of C, and so on.

                    Vitamin C: 0 mg   Vitamin C: 250 mg   Vitamin C: 500 mg
Vitamin E: 0 mg     Treatment 1       Treatment 2         Treatment 3
Vitamin E: 400 mg   Treatment 4       Treatment 5         Treatment 6

b. Dependent variable. In the hypothetical experiment above, the researcher is looking at the effect of vitamins on health. The dependent variable in this experiment would be some measure of health (annual doctor bills, number of colds caught in a year, number of days hospitalized, etc.).
c. Experimental units. The recipients of experimental treatments are called experimental units. The experimental units in an experiment could be anything - people, plants, animals, or even inanimate objects. In the hypothetical experiment above, the experimental units would probably be people (or lab animals), but in an experiment to measure the tensile strength of string, the experimental units might be pieces of string. When the experimental units are people, they are often called participants; when the experimental units are animals, they are often called subjects.

Characteristics of a Well-Designed Experiment

A well-designed experiment includes design features that allow researchers to eliminate extraneous variables as an explanation for the observed relationship between the independent variable(s) and the dependent variable. Some of these features are listed below.

1. Control. Control refers to steps taken to reduce the effects of extraneous variables (i.e., variables other than the independent variable and the dependent variable). These extraneous variables are called lurking variables. Control involves making the experiment as similar as possible for experimental units in each treatment condition. Three control strategies are control groups, placebos, and blinding.

Control group. A control group is a baseline group that receives no treatment or a neutral treatment. To assess treatment effects, the experimenter compares results in the treatment group to results in the control group.

Placebo. Often, participants in an experiment respond differently after they receive a treatment, even if the treatment is neutral. A neutral treatment that has no "real" effect on the dependent variable is called a placebo, and a participant's positive response to a placebo is called the placebo effect. To control for the placebo effect, researchers often administer a neutral treatment (i.e., a placebo) to the control group. The classic example is using a sugar pill in drug research. The drug is judged effective only if participants who receive the drug have better outcomes than participants who receive the sugar pill.

Blinding. Of course, if participants in the control group know that they are receiving a placebo, the placebo effect will be reduced or eliminated, and the placebo will not serve its intended control purpose. Blinding is the practice of not telling participants whether they are receiving a placebo. In this way, participants in the control and treatment groups experience the placebo effect equally. Often, knowledge of which groups receive placebos is also kept from the people who administer or evaluate the experiment. This practice is called double blinding. It prevents the experimenter from "spilling the beans" to participants through subtle cues, and it assures that the analyst's evaluation is not tainted by awareness of the actual treatment conditions.

2. Randomization. Randomization refers to the practice of using chance methods (random number tables, flipping a coin, etc.) to assign experimental units to treatments. In this way, the potential effects of lurking variables are distributed at chance levels (hopefully roughly evenly) across treatment conditions. A minimal sketch of random assignment appears after this list.

3. Replication. Replication refers to the practice of assigning each treatment to many experimental units. In general, the more experimental units in each treatment condition, the lower the variability of the dependent measures.
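As a minimal sketch of randomization (and of how factor levels combine into treatments), the Python code below builds the six vitamin treatments from the two factors and randomly assigns a made-up pool of participants to them; the participant names, pool size, and seed are assumptions made purely for illustration.

```python
import itertools
import random

# The two factors and their levels, as in the hypothetical vitamin experiment.
vitamin_c_levels = ["0 mg", "250 mg", "500 mg"]
vitamin_e_levels = ["0 mg", "400 mg"]

# Treatments are combinations of factor levels: 2 x 3 = 6 treatments.
treatments = list(itertools.product(vitamin_e_levels, vitamin_c_levels))

# Hypothetical pool of experimental units (participants), invented for illustration.
participants = [f"participant_{i:03d}" for i in range(60)]

# Randomization: shuffle the participants, then deal them out to treatments
# in round-robin fashion so each treatment gets many units (replication).
random.seed(1)
random.shuffle(participants)
assignment = {t: [] for t in treatments}
for i, person in enumerate(participants):
    assignment[treatments[i % len(treatments)]].append(person)

for (e_dose, c_dose), group in assignment.items():
    print(f"Vitamin E {e_dose}, Vitamin C {c_dose}: {len(group)} participants")
```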
Confounding

Confounding occurs when the experimental controls do not allow the experimenter to reasonably eliminate plausible alternative explanations for an observed relationship between the independent and dependent variables.

Consider this example. A drug manufacturer tests a new cold medicine with 200 participants - 100 men and 100 women. The men receive the drug, and the women do not. At the end of the test period, the men report fewer colds. This experiment implements no controls at all! As a result, many variables are confounded, and it is impossible to say whether the drug was effective. For example, gender is confounded with drug use. Perhaps men are less vulnerable to the particular cold virus circulating during the experiment, and the new medicine had no effect at all. Or perhaps the men experienced a placebo effect.

This experiment could be strengthened with a few controls. Women and men could be randomly assigned to treatments. One treatment could receive a placebo, with blinding. Then, if the treatment group (i.e., the group getting the medicine) had sufficiently fewer colds than the control group, it would be reasonable to conclude that the medicine was effective in preventing colds.
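As a hedged sketch of the strengthened design described above, the Python code below randomly assigns the 200 participants (men and women mixed) to a drug group and a placebo group, so gender is no longer confounded with treatment, and then compares the groups' mean cold counts. The outcome values are simulated placeholders, not real data, and no formal significance test is attempted.

```python
import random

random.seed(7)

# Hypothetical pool: 100 men and 100 women, as in the example above.
participants = [("man", i) for i in range(100)] + [("woman", i) for i in range(100)]

# Randomization: shuffle everyone, then split into drug and placebo groups,
# so gender is no longer confounded with treatment.
random.shuffle(participants)
drug_group = participants[:100]
placebo_group = participants[100:]

# Simulated outcomes (number of colds), purely invented for illustration;
# in a real (blinded) experiment these would be the measured dependent variable.
colds = {p: random.randint(0, 4) for p in participants}

mean_drug = sum(colds[p] for p in drug_group) / len(drug_group)
mean_placebo = sum(colds[p] for p in placebo_group) / len(placebo_group)

print(f"Mean colds, drug group:    {mean_drug:.2f}")
print(f"Mean colds, placebo group: {mean_placebo:.2f}")
# A conclusion of effectiveness would require the drug group's mean to be
# sufficiently lower than the placebo group's (a formal test is beyond this sketch).
```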
