Soc 2206 Sampling Strategies Week 5 PDF
Western Social Science
2024
Jasmine Ha
Summary
These lecture notes cover sampling strategies for social science research. Different types of sampling methods, such as probability and non-probability sampling, are presented. The importance of representative samples and margin of error estimation is also explained.
Full Transcript
SOC 2206 Sampling Strategies
Week 5
Prof. Jasmine Ha

Agenda
1. Population & Sample
   Random Errors
   Systematic Errors
2. Probability Sampling
3. Non-probability Sampling
4. Q&A for the midterm exam

[Diagram]
Researchers: conduct the research; write the research proposal/report
Target population: "This study is about [a group of people]"
Sample (participants): we collect data from these people
"Consumers" of knowledge: people who read your research findings
   Scientific community (other researchers)
   Practitioners (policy makers, etc.)
   The general public

Key concepts
Target population
Population
Parameters
Census
Sampling frame
Sample: a subset of the population selected for a study
Sampling: the process of deciding what or whom to include in the sample

Target population
A group about which social scientists attempt to make generalizations
Can be very specific: Western undergraduate students
Can be quite abstract: people, youths in general
Non-human populations: corporations, legal documents
Unit of analysis: "I will need to collect data from ____ to answer my research question."
Find the target population

Population
A census is a study that includes data on every member of a population
Population parameter: represents the "true value" of the population
Censuses are often not feasible in social research. Why?
   Time, resources, frequency (every 5, 6, 10 or 15 years)
   Limited number of questions

Sample
A subset of a population:
   saves time and resources
   lets you ask questions relevant to your research interests
Sampling: the action of drawing a sample

Why does it matter?
Textbook: 1936 US election (Landon vs. Roosevelt)
Literary Digest survey: 2 million completed responses
Survey (poll) result:
   Landon (31 states) vs. Roosevelt (19 states)
   Landon (57% of voters) vs. Roosevelt (43%)
Actual result:
   Landon (2 states) vs. Roosevelt (46 states)
   Landon (about 37%) vs. Roosevelt (about 61%)
What's wrong? Poor sampling strategy: Sample ≠ Population

Sampling strategy
Observed value = True value + Systematic error + Random error

Types of errors
1. Systematic errors: cannot be estimated; we can only discuss the direction of bias
2. Random errors: unbiased; can be estimated using statistics
Representativeness: the sample mirrors the population

Probability sample
Random selection: everyone has an equal probability of being selected into the sample
Removes most systematic errors
Lets us estimate random errors

Probability sample
Example: Population = everyone attending SOC 2206 today
Would I have a random sample if I randomly call 8 names from the class list?
Would I have a random sample if I use a random number generator to pick 8 students from the class list?

Probability sampling
Identify:
   The target population
   The desired sample size
   The sampling frame
Select a sampling process:
1. Simple random or systematic
2. Cluster
3. Stratified

1a – Simple Random Sampling
Each individual has the same probability of being selected, and each pair of individuals has the same probability of being selected.
Sampling process:
1. Obtain the sampling frame
2. Generate a set of random numbers and select the individuals corresponding to the selected numbers
Why do this? Easy, but the sample may be nonrepresentative due to pure chance!

1b – Systematic Sampling
Each individual has the same probability of being selected.
Sampling process:
1. Obtain the sampling frame
2. Decide on the sample size and compute the sampling interval: n = population size / sample size
3. Randomly select the first case, then select every nth case in the list
Why do this? Easy, but consecutive cases will never be selected together, and the order of the sampling frame may create bias.
Math notes: a percentage is a proportion multiplied by 100, e.g., 25% = 25/100 = 0.25

2 – Cluster sampling
Used when no sampling frame is available for the whole target population
Sampling process:
1. Divide the target population into clusters (e.g., cities within Canada, classrooms in a high school)
2. Select clusters randomly
3. Get the sampling frame for all selected clusters
4. Select individuals randomly from the selected clusters
Why do this? Improved feasibility, lower cost

3 – Stratified sampling
Sampling process:
1. Obtain the sampling frame
2. Divide the target population into strata (e.g., by gender, social class)
3. Select individuals randomly from all strata
4. The number of individuals selected from each stratum reflects that stratum's proportion of the population
Why do this?
   Prevents samples from becoming nonrepresentative due to pure chance
   Allows oversampling of small groups

Weighting
How much sample members "count" when producing estimates.
A group that is oversampled should receive less weight than other members of the sample.
Example: We use simple random sampling to create a sample of 20 students from a total of 200 students in our class. How much weight should we assign to each student in the sample?
Sampling method: simple random, no oversampling
Weight = population size / sample size = 200 / 20 = 10
Each respondent "speaks for" 10 people in the population

Oversample
In a class of 100 students, we want a stratified sample of 10.
Stratified by international student status (90% domestic, 10% international) and by gender (male 45%, female 55%)
The problem:
Stratification by international status: How many domestic students? How many international students?
Stratification by gender: How many respondents in each group?
   Male domestic
   Female domestic
   Male international
   Female international

Oversample
In a class of 100 students, we want a stratified sample of 10.
Stratified by international student status (90% domestic, 10% international) and by gender (male 45%, female 55%)
The solution:
Sample 2 international students instead of 1
International students are oversampled by a factor of 2
Sample size = 11 (9 domestic + 2 international)
Each international student should have a smaller weight, specifically 1/2 = 0.5 times the weight of a domestic student in the sample

Oversample
In a class of 100 students, we want a stratified sample of 10.
Base weight: n = population size / sample size = 100 / 10 = 10
International students are oversampled by a factor of 2
Weight
   Domestic student: Weight = n = 10
   International student: Weight = n x 0.5 = 5

Postsurvey weighting
Nonresponse rate = 100% - Response rate
Nonresponse may create systematic errors
Example: access to personal computers affects nonresponse to an online survey
Use postsurvey weighting: increase or reduce the weight of specific respondent groups
Example: Based on the Census, we know that 20% of the population are older adults. However, due to nonresponse, only 10% of the respondents in our sample are older adults.
The older adults in our sample should have more weight than younger adults.
How much more: the weight of each older adult is multiplied by 20% / 10% = 2, while the weight of each younger adult is multiplied by 80% / 90% (about 0.89), so an older adult counts 2.25 times as much as a younger adult

Why probability sampling?
Unbiased: equally likely to underestimate or overestimate the population parameter
The difference between sample estimates and population parameters is due to random chance
Free from most systematic errors
Still has random sampling errors, but we can estimate how much

Random sampling error
Statistics: Say we draw 1000 random samples. We will get 1000 different results.
Let's try: go to https://www.random.org/integers/
Reality: we can only draw one sample in our research project
Use statistical distributions to estimate errors

Margin of error
The amount of uncertainty in an estimate
Equals the distance between the estimate and the boundary of the confidence interval
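The thought experiment of drawing 1000 random samples can be tried out in a short simulation. The sketch below is illustrative only: the population share of 0.40, the sample size of 1000, and the 1.96 multiplier (the usual normal approximation for 95% confidence) are assumptions and do not come from the lecture. Each respondent is simulated independently, which approximates simple random sampling from a very large population.

```python
import random
import statistics

TRUE_SHARE = 0.40    # hypothetical population parameter (assumed for illustration)
SAMPLE_SIZE = 1000   # respondents per sample (assumed)
N_SAMPLES = 1000     # number of repeated samples, as in the thought experiment

rng = random.Random(2206)  # fixed seed so the run is repeatable


def draw_estimate(rng, true_share, sample_size):
    """Draw one random sample and return the sample proportion."""
    hits = sum(rng.random() < true_share for _ in range(sample_size))
    return hits / sample_size


estimates = [draw_estimate(rng, TRUE_SHARE, SAMPLE_SIZE) for _ in range(N_SAMPLES)]

mean_estimate = statistics.mean(estimates)
spread = statistics.stdev(estimates)

# Normal approximation (an assumption, not from the lecture):
# about 95% of estimates lie within 1.96 standard deviations of the mean.
margin_of_error = 1.96 * spread
share_inside = sum(
    abs(e - TRUE_SHARE) <= margin_of_error for e in estimates
) / N_SAMPLES

print(f"average of the estimates: {mean_estimate:.3f}")
print(f"approximate 95% margin of error: {margin_of_error:.3f}")
print(f"share of samples within the margin: {share_inside:.3f}")
```

The estimates vary from sample to sample, but they cluster around the true value, and the spread of that cluster is what the margin of error summarizes. In a real project only one sample is available, so the spread is estimated from a statistical distribution instead of from repeated draws.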
Levels of confidence: 95% and 99%

Margin of error
According to a Gallup poll, 43 percent of Americans approve of the job the president is doing. This estimate has a margin of error of 3 percentage points at the 95% confidence level.
Translation: We can be 95 percent confident that the true level of presidential approval is between 40 percent and 46 percent
Calculating the confidence interval:
   Lower bound = estimate - margin of error = 43 - 3 = 40
   Upper bound = estimate + margin of error = 43 + 3 = 46

Margin of error
A poll estimates the 95% confidence interval of support for a senatorial candidate to be between 39% and 43%. Based on this, what is the margin of error?
A. 1%
B. 2%
C. 4%
D. 6%
Answer: B. The margin of error is half the width of the confidence interval: (43 - 39) / 2 = 2 percentage points.

Margin of error & Sample size
Larger sample, smaller margin of error (the margin of error shrinks with the square root of the sample size)
Study A has a sample of 100 people and a margin of error of 3%. If we want to reduce the margin of error to 1%, how many people do we need to include in the sample?
Reduction in margin of error = 3% / 1% = 3 times
Increase in sample size = 3^2 = 9 times
So, in Study A, we need 9 x 100 = 900 respondents in the sample

Nonprobability samples
Individuals in the sample are not randomly selected
Systematic errors: impossible to quantify; only possible to predict the direction of bias
Not representative of the population
Low generalizability
Can a probability sample be nonrepresentative? Yes, by chance. This is why we have stratified sampling.

Why nonprobability samples?
The diversity of representative samples makes detecting cause-and-effect relationships more difficult.
Example: How would people respond to cash incentives? Many experiments use samples of college students.
We can often gather more or better information on nonrepresentative samples
Qualitative research, in-depth examinations of subgroups

Non-probability sampling
Convenience
Purposive
Sequential
Snowball

Convenience sampling
Select any subjects who are willing to participate
Cheapest and easiest method
Prone to systematic errors

Purposive sampling
Selecting cases based on key features
   Access & quality
   Typicality
   Extremity
   Importance
   Deviant cases
   Contrasting outcomes
   Key differences
   Past experiences and intuition

Sequential sampling
Researchers collect additional data based on findings from data they have already collected
   Key informants
   Sampling for range
   Saturation

Snowball sampling
Starts with one respondent who meets the requirements for inclusion
The researcher asks him or her to recommend other people to contact
Useful for studying "hidden populations"
Hidden populations examples?

Recap
Observed value = True value + Systematic error + Random error
How to minimize systematic error?
How to minimize random error?
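As a companion to the recap, the probability sampling processes from earlier in the lecture (simple random, systematic, and stratified with oversampling) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the course's own procedure: the class list of 100 students, the student IDs, the function names, and the `oversample` parameter are all hypothetical.

```python
import random

# Hypothetical sampling frame: a class list of 100 students,
# 10 international and 90 domestic (IDs and labels are made up).
frame = [(i, "international" if i < 10 else "domestic") for i in range(100)]
rng = random.Random(5)  # fixed seed so the sketch is repeatable


def simple_random_sample(frame, sample_size, rng):
    """1a: every unit (and every pair of units) is equally likely to be drawn."""
    return rng.sample(frame, sample_size)


def systematic_sample(frame, sample_size, rng):
    """1b: random start, then every nth case (n = population size / sample size)."""
    interval = len(frame) // sample_size
    start = rng.randrange(interval)
    return frame[start::interval][:sample_size]


def stratified_sample(frame, sample_size, rng, oversample=None):
    """3: draw randomly within each stratum, in proportion to stratum size.

    `oversample` optionally maps a stratum name to a multiplier for its quota.
    Returns the sample and a weight per stratum, computed as
    stratum population / number sampled from that stratum.
    """
    oversample = oversample or {}
    strata = {}
    for unit in frame:
        strata.setdefault(unit[1], []).append(unit)
    sample, weights = [], {}
    for name, members in strata.items():
        quota = round(sample_size * len(members) / len(frame))
        quota *= oversample.get(name, 1)
        sample.extend(rng.sample(members, quota))
        weights[name] = len(members) / quota
    return sample, weights


srs = simple_random_sample(frame, 10, rng)
systematic = systematic_sample(frame, 10, rng)
stratified, weights = stratified_sample(
    frame, 10, rng, oversample={"international": 2}
)

print(len(srs), len(systematic), len(stratified))
print(weights)
```

With international students oversampled by a factor of 2, the stratified sample has 11 members (9 domestic and 2 international), and the computed weights are 10 for domestic students and 5 for international students, matching the oversampling example in the lecture.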