MED106_7a_Selection Bias & Sampling Methods (Student) 2024 PDF

Systematic error in research I: Selection bias and different sampling methods
Constantinos Koshiaris, Assistant Professor in Medical Statistics and Epidemiology

Session LOBs
– LOB31: Differentiate between random error and systematic error.
– LOB32: Outline the different sampling methods and describe how each of these can give rise to selection bias.
– LOB33: Describe selection bias and how it affects the validity of research studies.

Systematic error vs. random error
– Random error is error introduced solely by chance and is inherent in the sampling process.
– Systematic error (also called bias) is introduced by human actions relating to the conduct of a study.

Sample vs. true population
– What we measure is usually based on data collected from a sample of the total population.
– We do not measure the true population measure (mean, %, etc.) but an estimate of it, based on representative sampling.
– There is a level of uncertainty around the measure/estimate (the precision of the measure).

Example (random error): height in a college class with 10 students
Values (cm): 165, 160, 170, 175, 182, 169, 190, 183, 163, 155
– Population (N=10): mean 171.2 cm
– Sample (n=9) excluding the shortest student: 173 cm
– Sample (n=9) excluding the tallest student: 169.1 cm
– The remaining 8 samples of n=9: between 169 and 173 cm

Example (random error): the same class, samples of n=4
– Population (N=10): mean 171.2 cm
– Three different samples (n=4): 175 cm, 169.3 cm, 163 cm
– The estimates varied from one another by quite a bit, and some were very inaccurate, i.e. far from the true mean for the class.
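The height example above can be reproduced by enumerating every possible sample. This is a minimal sketch using only the ten heights from the slide; the variable names are illustrative. It computes the true class mean, the mean of each of the 10 possible samples of n=9, and the mean of all possible samples of n=4.

```python
from itertools import combinations
from statistics import mean

# Heights (cm) of the 10 students in the class example
heights = [165, 160, 170, 175, 182, 169, 190, 183, 163, 155]

population_mean = mean(heights)  # N=10, the true class mean

# All 10 possible samples of n=9 (leave one student out each time)
n9_means = [mean(heights[:i] + heights[i + 1:]) for i in range(len(heights))]

# All possible samples of n=4 (there are 210 of them)
n4_means = [mean(sample) for sample in combinations(heights, 4)]

print(f"Population mean: {population_mean:.1f} cm")
print(f"n=9 sample means range from {min(n9_means):.1f} to {max(n9_means):.1f} cm")
print(f"n=4 sample means range from {min(n4_means):.2f} to {max(n4_means):.2f} cm")
```

The n=4 sample means spread from about 161 cm to 182.5 cm, much wider than the 169 to 173 cm range of the n=9 means: smaller samples carry more random error.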
Errors in epidemiological studies
– Random error (chance) decreases as the sample size increases, and goes down to zero if the total population is included.
– The role of chance (reflected in the p-value) therefore becomes smaller in larger samples.
– Systematic error (bias) is not influenced by sample size: a larger sample does not remove it.

Confidence interval of sample estimates
– A confidence interval indicates the level of uncertainty around the estimated measure.
– Most studies report the 95% confidence interval (95% CI).
– The 95% CI indicates a range within which we can be 95% confident that the true population measure lies; the larger the sample size, the narrower the 95% CI.

Selection bias
– Selection bias is systematic error resulting from the fact that the participants included in the study are not representative of the population from which they were selected (the source population).
– Selection bias leads to a biased sample, which will almost always give rise to biased estimates!
– The choice of sampling method plays a major role in the representativeness of the sample.

[Figure: representative sample (population 1:1 males/females, sample 1:1) vs. non-representative sample (population 1:1 males/females, sample 1:2)]

Sampling methods
– Probability (random) sampling: the sample is selected by probabilistic methods; this involves random selection, allowing you to make strong statistical inferences about the whole group.
– Systematic sampling: the sample is selected according to some simple, systematic rule.
– Non-probability sampling: the sample is selected by easily employed (convenient) methods; this involves non-random selection based on convenience or other criteria, allowing you to collect data easily.
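The contrast drawn above between random error and selection bias can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: the disease prevalences (10% in males, 20% in females), the function names and the seed are hypothetical, and the 95% CI uses the simple normal approximation (p ± 1.96 × SE) rather than an exact method. With a 1:1 source population the true prevalence is 0.15; a 1:2 male/female sample, like the non-representative sample in the figure, targets about 0.167 instead.

```python
import math
import random

def prevalence_with_ci(sample):
    """Return the proportion of 1s in `sample` and a normal-approximation 95% CI."""
    n = len(sample)
    p = sum(sample) / n
    half_width = 1.96 * math.sqrt(p * (1 - p) / n)  # 1.96 = z value for 95% confidence
    return p, p - half_width, p + half_width

def draw_sample(n, female_share, rng):
    """Draw n people; hypothetical prevalence is 10% in males and 20% in females."""
    sample = []
    for _ in range(n):
        is_female = rng.random() < female_share
        risk = 0.20 if is_female else 0.10
        sample.append(1 if rng.random() < risk else 0)
    return sample

# Source population is 1:1 males/females: true prevalence = 0.5*0.10 + 0.5*0.20 = 0.15.
# A 1:2 male/female sample instead targets (1/3)*0.10 + (2/3)*0.20, about 0.167.
rng = random.Random(42)
for n in (100, 1_000, 10_000, 100_000):
    p1, lo1, hi1 = prevalence_with_ci(draw_sample(n, 1 / 2, rng))  # representative (1:1)
    p2, lo2, hi2 = prevalence_with_ci(draw_sample(n, 2 / 3, rng))  # non-representative (1:2)
    print(f"n={n:>6}  1:1 sample {p1:.3f} ({lo1:.3f}-{hi1:.3f})  "
          f"1:2 sample {p2:.3f} ({lo2:.3f}-{hi2:.3f})")
```

As n grows, both confidence intervals narrow, but the 1:2 interval narrows around the wrong value: a larger sample reduces random error, not selection bias.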
Sampling methods (overview)
– Probability sampling: simple random sampling, stratified random sampling, cluster sampling, multistage sampling
– Systematic sampling: simple systematic sampling, proportional quota sampling
– Non-probability sampling (e.g. convenience sampling)

Probabilistic sampling methods

Simple Random Sampling
– Often referred to simply as 'random sampling'.
– The most straightforward of all random sampling methods.
– All individuals in the sampling frame have the same probability of being selected, independently of all others.
– It is mainly used in quantitative research.
– Given a large sample size, random sampling ensures that the chosen individuals are representative of the source population in terms of:
  – demography (e.g. age, sex, ethnicity)
  – other important factors (e.g. clinical history, current disease status, lifestyle factors)

Simple Random Sampling (procedure)
– Use tools such as random number generators or other techniques that are based entirely on chance.
– Example: you want to select a simple random sample of 100 cancer patients from a registry of 1000. You assign a number from 1 to 1000 to every patient in the registry database, and use a random number generator to select 100 numbers.
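The registry example above can be sketched with Python's standard `random` module, which plays the role of the random number generator. This is a minimal sketch: the function name and the seed are hypothetical, and `random.Random.sample` draws without replacement, so no patient can be selected twice.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n members from the sampling frame without replacement; every member has the same chance."""
    rng = random.Random(seed)
    return sorted(rng.sample(frame, n))

# Registry example: every patient is assigned a number from 1 to 1000
registry_numbers = list(range(1, 1001))
selected = simple_random_sample(registry_numbers, 100, seed=2024)
print(len(selected), selected[:10])
```

Fixing the seed makes the selection reproducible, which is useful for documenting how a sample was drawn; omit it to get a fresh random selection each time.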
[Figure: example of a random number generator]

Simple Random Sampling (advantages and disadvantages)
Advantages
– Ensures a representative sample from the source population, provided that the sample size is large enough
– Less costly and less time consuming than other, more sophisticated sampling methods
– Ideal for quantitative studies and hypothesis testing
Disadvantages
– If the sampling frame is too large and/or the population is geographically diverse, it may be impractical to perform (it can be difficult to access lists of the full population)
– If a large sample is required, simple random sampling may be time consuming and costly

Stratified Random Sampling
– Same principles as simple random sampling, but applied within strata (subgroups) of the population defined by key demographic characteristics.
– The size of the random sample taken from each stratum should be proportional to that stratum's size in the population.

Example of stratified random sampling
A company has 800 female and 200 male employees, and you need a sample of 100. You sort the population into two strata based on gender. To ensure that the sample reflects the gender balance of the company, you use random sampling within each group, selecting 80 women and 20 men, which gives you a representative sample of 100 people.

[Figure: stratified random sampling (procedure)]

Stratified Random Sampling (individual strata examples)
– Stratum 1, district: Nicosia (39%), Limassol (28%), Larnaca (17%), Paphos (11%), Famagusta (5%)
– Stratum 2, gender: men (48.6%), women (51.4%)
– Stratum 3, age group: 0-14 (16.1%), 15-64 (70.6%), ≥65 (13.3%)

Stratified Random Sampling (combined strata example)
Combined strata: district-gender-age
– Nicosia males aged 0-14 (6%)
– Nicosia males aged 15-64 (16%)
– Nicosia males aged ≥65 (4%)
– Nicosia females aged 0-14 (6%)
– Nicosia females aged 15-64 (17%)
– Nicosia females aged ≥65 (6%)
– Limassol males aged 0-14 (4%)
– Limassol males aged 15-64 (12%)
– etc.
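The company example above can be sketched as proportional allocation followed by a simple random sample within each stratum. This is a minimal sketch: the function names and employee ID labels are hypothetical, and allocation uses plain rounding, so with less tidy stratum sizes the stratum totals may differ from n by one and would need adjusting.

```python
import random

def proportional_allocation(stratum_sizes, n):
    """Share a total sample size n across strata in proportion to stratum size."""
    total = sum(stratum_sizes.values())
    return {name: round(n * size / total) for name, size in stratum_sizes.items()}

def stratified_random_sample(strata, n, seed=None):
    """Simple random sample within each stratum, sized by proportional allocation."""
    rng = random.Random(seed)
    sizes = {name: len(members) for name, members in strata.items()}
    allocation = proportional_allocation(sizes, n)
    return {name: rng.sample(strata[name], k) for name, k in allocation.items()}

# Company example: 800 female and 200 male employees (hypothetical ID labels)
strata = {
    "female": [f"F{i}" for i in range(1, 801)],
    "male": [f"M{i}" for i in range(1, 201)],
}
sample = stratified_random_sample(strata, 100, seed=1)
print({name: len(people) for name, people in sample.items()})  # {'female': 80, 'male': 20}
```

The same allocation step applies to the district stratum above: with Nicosia 39%, Limassol 28%, Larnaca 17%, Paphos 11% and Famagusta 5%, a sample of 1000 would be split into 390, 280, 170, 110 and 50 people.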
Stratified Random Sampling (advantages and disadvantages)
Advantages
– It allows you to draw more precise conclusions by ensuring that every subgroup is properly represented in the sample
– It enables the comparison of population subgroups
Disadvantages
– More time consuming than simple random sampling
– Can't be used when researchers can't confidently classify every member of the population into a subgroup
– Higher complexity might give rise to errors (e.g. stratification not conducted properly)

Cluster Sampling
– Based on the hierarchical structure of natural clusters (groups) of individuals within the population; natural clusters may be hospitals, schools, streets, city districts, etc.
– Involves taking a random sample of these natural clusters, and then selecting all individuals in the selected clusters.
– The sampling frame is a list of all clusters.
– If it is practically possible, you might include every individual from each sampled cluster. If the clusters themselves are large, you can also sample individuals from within each cluster using one of the techniques above.

[Figure: cluster sampling (procedure)]

Cluster Sampling (advantages and disadvantages)
Advantages
– Can reduce the cost and time of data collection, especially when the population is spread over a large area
Disadvantages
– Substantial differences between clusters can cause errors
– It's difficult to guarantee that the sampled clusters are really representative of the whole population
– Representativeness may be compromised if:
  – too few clusters are selected, and/or
  – clusters are too specific, and/or
  – clusters contain too few individuals

Multi-stage Sampling
– Utilizes the hierarchical structure of natural clusters (groups) of individuals within the population, similarly to cluster sampling.
– After clusters are randomly selected, individuals are randomly selected within each cluster.
– May involve several random sampling stages:
  – Stage 1: random selection of large clusters, e.g. schools
  – Stage 2: random selection of smaller clusters within the large clusters, e.g. classes
  – Stage 3: random selection of individuals within the smaller clusters

Multi-stage Sampling (advantages and disadvantages)
Advantages
– May improve sample representativeness compared to simple random sampling, especially if the population is geographically diverse and/or the sample is too small
– Less costly and less time consuming (depending, however, on the number of stages)
Disadvantages
– The representativeness of the sample may be compromised if:
  – too few clusters are selected, and/or
  – clusters are too specific, and/or
  – clusters contain too few individuals

Systematic sampling methods

(Simple) Systematic Sampling
– The sample is selected according to some simple, systematic rule, but not randomly.
– The sample may end up being equivalent to a simple random sample, provided there was no biasing pattern in the system of selection.
– Examples:
  – selecting people from the sampling frame whose name starts with a certain letter (e.g. 'A')
  – selecting people from the sampling frame who were born in a selected month (e.g. January)
  – selecting every 2nd/5th/10th person from the sampling frame

[Figure: (simple) systematic sampling (procedure)]

Example of systematic sampling
All asthma patients of a hospital are listed in alphabetical order. From the first 10 numbers on the list, you randomly select a starting point: number 6. From number 6 onwards, every 10th person on the list is selected (6, 16, 26, 36, and so on), and you end up with a sample of 100 asthmatics.
If you use this technique, it is important to make sure that there is no hidden pattern in the list that might skew the sample. For example, if the asthma database groups patients by doctor and the patients of each doctor are listed by severity (mild to severe), there is a risk that your interval might skip over a group of asthmatics, resulting in a sample that is skewed towards mild or severe cases.
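The cluster, multi-stage and systematic procedures described above can be sketched side by side. This is a minimal sketch on a hypothetical sampling frame: the 20 schools with 5 classes of 25 pupils, all names and the seed are assumptions, and the list of 1000 asthma patients is only the size implied by a sample of 100 taken at an interval of 10.

```python
import random

def cluster_sample(clusters, n_clusters, rng):
    """Randomly select whole clusters, then include every individual in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [person for name in chosen for person in clusters[name]]

def multistage_sample(schools, n_schools, n_classes, n_pupils, rng):
    """Stage 1: schools; stage 2: classes within each school; stage 3: pupils within each class."""
    sample = []
    for school in rng.sample(sorted(schools), n_schools):
        classes = schools[school]
        for cls in rng.sample(sorted(classes), n_classes):
            sample.extend(rng.sample(classes[cls], n_pupils))
    return sample

def systematic_sample(frame, interval, rng):
    """Random start among the first `interval` entries, then every `interval`-th entry after it."""
    start = rng.randrange(interval)  # 0-based index; start=5 is 'number 6' in the example
    return frame[start::interval]

rng = random.Random(6)

# Hypothetical frame: 20 schools, each with 5 classes of 25 pupils
schools = {
    f"school{s:02d}": {f"class{c}": [f"s{s}c{c}p{p}" for p in range(1, 26)] for c in range(1, 6)}
    for s in range(1, 21)
}
# For cluster sampling, each school is one cluster containing all of its pupils
clusters = {name: [p for pupils in classes.values() for p in pupils] for name, classes in schools.items()}

print(len(cluster_sample(clusters, 4, rng)))           # 4 schools x 125 pupils = 500
print(len(multistage_sample(schools, 4, 2, 10, rng)))  # 4 x 2 x 10 = 80

# Hypothetical alphabetical list of 1000 asthma patients
patients = [f"patient{i:04d}" for i in range(1, 1001)]
print(len(systematic_sample(patients, 10, rng)))       # 100
```

Note that only the starting point of the systematic sample is random; every later selection is fixed by the interval, which is why a hidden pattern in the list (such as ordering by severity within each doctor) can skew the sample.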
(Simple) Systematic Sampling (advantages and disadvantages)
Advantages
– An acceptable, more convenient alternative approach if for some reason random sampling is not possible
– Faster, and possibly also cheaper
Disadvantages
– The representativeness of the sample may be compromised if the system of choice selects individuals in a non-random fashion

Proportional Quota Sampling
– Same principle as stratified random sampling: the sample is selected in a weighted manner based on predefined strata (distinct population subgroups).
– Instead of being filled by random sampling, the strata are filled by non-random sampling (systematic or other).
– For example, if a total sample size of 1000 is required and the population consists of 40% women and 60% men, then (non-random) sampling continues until these percentages are obtained and the overall sample quota is met.

[Figure: proportional quota sampling (procedure)]

Proportional Quota Sampling (advantages and disadvantages)
Advantages
– An acceptable, more convenient alternative approach if for some reason stratified random sampling is not possible
– Compared to simple systematic sampling, it can preserve the original population structure because it uses predefined population strata
Disadvantages
– The representativeness of the sample may be compromised because individuals are selected in a non-random fashion

Non-probabilistic sampling methods

Convenience Sampling
– Convenience sampling is the most frequent example of non-probability sampling.
– Individuals are selected in a non-random fashion, based solely on convenience (i.e. they are easy to access).
– Example: you are researching anxiety among university students, so after a class of an elective course you ask your fellow students to complete a survey on the topic. This is a convenient way to gather data, but because you only surveyed students in the same year and elective course as you, the sample is not representative of all the students in your class, nor of the university.
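Proportional quota sampling and convenience sampling, both described above, take people in the order they are encountered rather than at random. This minimal sketch contrasts them on a hypothetical stream of people (every 3rd person approached is a woman); the stream, function names and stratum labels are assumptions, while the quotas come from the 1000-person, 40% women / 60% men example.

```python
def convenience_sample(arrivals, n):
    """Take the first n people encountered, whoever they are (non-random)."""
    return arrivals[:n]

def quota_sample(arrivals, quotas):
    """Take people in the order encountered until every stratum's quota is full (non-random)."""
    filled = {stratum: [] for stratum in quotas}
    for person_id, stratum in arrivals:
        if stratum in filled and len(filled[stratum]) < quotas[stratum]:
            filled[stratum].append(person_id)
        if all(len(filled[s]) == quotas[s] for s in quotas):
            break  # overall sample quota met
    return filled

# Hypothetical stream of people in the order they are approached:
# every 3rd person is a woman, so women are under-represented at the front of the queue
arrivals = [(i, "woman" if i % 3 == 0 else "man") for i in range(1, 3001)]

# Quotas from the example: total of 1000, population 40% women and 60% men
quotas = {"woman": round(1000 * 0.40), "man": round(1000 * 0.60)}

conv = convenience_sample(arrivals, 1000)
quota = quota_sample(arrivals, quotas)
print(sum(1 for _, s in conv if s == "woman"))   # 333 women, i.e. 33.3% instead of 40%
print({s: len(ids) for s, ids in quota.items()})  # {'woman': 400, 'man': 600}
```

The quotas restore the 40/60 sex structure that the convenience sample misses, but within each stratum people are still included only because they came first, so selection bias on any other characteristic remains possible.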
[Figure: convenience sampling (procedure)]

[Figure: other non-random sampling methods]

Convenience Sampling (advantages and disadvantages)
Advantages
– Cheap, fast and convenient
Disadvantages
– The representativeness of the sample will definitely be compromised because individuals are selected in a non-random fashion

Which sampling method should be chosen to minimize selection bias?

Which sampling method to choose?
It depends on:
– the aim of the study
– the nature of the source population
– the sample size
– other practical issues (e.g. financial resources, time availability)
When there are no financial or time constraints:
– It is always strongly advised to use probability (random) sampling techniques in order to minimize selection bias
– Stratified random sampling is the ideal method if the sample is small
When non-random sampling techniques have been used:
– The representativeness of the sample is always questionable
– Assume that selection bias is operating to some extent

Which sampling method to choose? (study design)
In descriptive research (e.g. investigating the prevalence of a disease in a population):
– It is extremely important to have a perfectly representative sample, as selection bias will greatly influence the findings
In analytic research (e.g. investigating exposure-outcome associations):
– Minor deviations from a perfectly representative sample may be acceptable
– Minor selection bias may not affect the findings to a large extent
Convenience sampling should always be avoided, whatever the study design!

Further reading (optional)
– Petrie A. & Sabin C. Medical Statistics at a Glance, 3rd Edition, Chapters 10, 34 [ISBN: 978-1-4051-8051-1]
– Hennekens CH & Buring JE. Epidemiology in Medicine, Chapter 11 [ISBN: 978-0316356367]
– http://www.bmj.com/content/342/bmj.d1249
– http://www.bmj.com/content/342/bmj.d1387
– http://www.bmj.com/content/342/bmj.d1537
– http://www.bmj.com/content/339/bmj.b5512
– http://www.bmj.com/content/340/bmj.b5677
– http://ebmh.bmj.com/content/10/3/67.extract
– http://www.bmj.com/about-bmj/resourcesreaders/publications/epidemiology-uninitiated/5-planning-and-conductingsurvey
