Biostatistics Quiz Material PDF
University of Sharjah
Dr. Khalid Al Kubaisi
Summary
This document provides an overview of sampling methods, variability, and sample size calculations in biostatistics. Key topics covered include sampling distributions, the central limit theorem, and confidence intervals.
Full Transcript
Understanding Sampling in Research
Dr. Khalid Al Kubaisi

Sampling Distribution of the Mean
Definition: The distribution of sample means over repeated sampling from the same population.
Purpose: To understand the variability of the sample mean.

Standard Error
Suppose that we measure the height of all 10,000 students in a population and find that the mean height of this whole population is 170 centimeters. Each time we take a sample and examine it, the sample mean will not be exactly the population mean; it varies around it. This variability is called sampling error, and it can be quantified by the standard error.
The standard error is used to build a range of values (an interval) that is likely to include the unknown population mean. An interval constructed to contain the mean with 95% confidence is called a 95 percent confidence interval.

Standard Error Formula
Definition: The standard deviation of the sampling distribution of the mean.
$$\mathrm{SE} = \frac{s}{\sqrt{n}}$$
where s is the sample standard deviation (σ when the population value is known) and n is the sample size.

Practical Example - Standard Error in Drug Use
A study is conducted to estimate the average reduction in cholesterol levels after using a new drug. The study involves 50 patients.
Sample Mean (x̄): 25 mg/dL reduction
Sample Standard Deviation (s): 5 mg/dL
Calculate the standard error of the mean cholesterol reduction.
Answer:
$$\mathrm{SE} = \frac{5}{\sqrt{50}} \approx \frac{5}{7.07} \approx 0.71\ \text{mg/dL}$$
Explanation: The standard error of 0.71 mg/dL indicates the precision of the sample mean (25 mg/dL) as an estimate of the population mean cholesterol reduction. In other words, if we were to repeat the study many times with different samples of 50 patients each, the sample means would typically vary by about 0.71 mg/dL around the true population mean.

Understanding Standard Error
Definition: The standard error (SE) quantifies the variability of the sample mean around the true population mean.
Purpose: To provide a measure of how accurately the sample mean estimates the population mean.
a) Lower SE: Indicates higher precision and less variability in the sample mean estimates.
b) Higher SE: Indicates lower precision and more variability in the sample mean estimates.

Why Standard Error Matters
a) Precision of Estimates: SE shows how much the sample mean is expected to vary around the true population mean.
b) Confidence Intervals: SE is used to calculate confidence intervals, which provide a range of plausible values for the population mean.

Practical Implications of SE in Drug Use
a) Drug Efficacy Study: SE helps determine how reliably the sample mean reduction in cholesterol levels represents the true effect of the drug in the population.
b) Decision Making: A lower SE provides more confidence in the results, supporting regulatory approval and clinical recommendations.

Central Limit Theorem (CLT)
Definition: The Central Limit Theorem states that the distribution of the sample mean approaches a normal distribution as the sample size increases, regardless of the shape of the population distribution (provided the population has a finite variance).
Importance: This theorem allows researchers to make inferences about population parameters using sample data.

Practical Benefits of CLT in Drug Use Research
a) Improved Accuracy: Allows accurate estimation of population parameters even when the population distribution is not normal, as long as the sample is reasonably large (n ≥ 30 is a common rule of thumb).
b) Informed Decision-Making: Facilitates hypothesis testing and confidence interval estimation, leading to better-informed decisions about drug efficacy and safety.
c) Cost-Effectiveness: Because normal-based methods apply to moderately sized samples, researchers do not need to study the entire population, saving time and resources.

Sampling
Definition: Sampling is the process of selecting a subset of individuals from a population to estimate characteristics of the whole population.
Purpose: To make inferences about the population without studying the entire population.
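The standard error formula and the Central Limit Theorem can both be checked numerically. The minimal Python sketch below (standard library only; the helper names `standard_error` and `simulate_sample_means` are illustrative, not from the slides) first reproduces the 0.71 mg/dL standard error of the drug example. It then repeatedly draws simple random samples from a made-up, right-skewed population of 10,000 exponential values. That population is an assumption chosen only because it is clearly non-normal. As n grows, the standard deviation of the sample means should track σ/√n, and the share of sample means within ±1.96 SE of the population mean should approach the 95% predicted by a normal distribution.

```python
import math
import random
import statistics


def standard_error(s: float, n: int) -> float:
    """Standard error of the mean, SE = s / sqrt(n)."""
    return s / math.sqrt(n)


def simulate_sample_means(population, n, repeats, seed=0):
    """Draw `repeats` simple random samples of size n (without replacement)
    from `population` and return the mean of each sample."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.sample(population, n)) for _ in range(repeats)]


# Drug example from the slides: s = 5 mg/dL, n = 50 patients.
print(f"SE = {standard_error(5, 50):.2f} mg/dL")  # SE = 0.71 mg/dL

# Hypothetical right-skewed population (exponential, mean about 10).
# These values are simulated for illustration; they are not data from the slides.
pop_rng = random.Random(42)
population = [pop_rng.expovariate(1 / 10) for _ in range(10_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

for n in (5, 30, 100):
    means = simulate_sample_means(population, n, repeats=2000)
    se = sigma / math.sqrt(n)
    # Under a normal approximation, about 95% of sample means fall within mu ± 1.96 SE.
    within = sum(abs(m - mu) <= 1.96 * se for m in means) / len(means)
    print(f"n={n:>3}: SD of sample means={statistics.stdev(means):.2f}, "
          f"sigma/sqrt(n)={se:.2f}, share within ±1.96 SE={within:.3f}")
```

The exact simulated numbers depend on the random seeds; what matters is the trend as n increases, not the specific values.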
Understanding Sample Size and Proportions

Proportion Consistency
Key Point: A sample of 100 individuals with p = 0.14 gives the same estimate of the proportion as a sample of 1000 individuals with p = 0.14, although the larger sample estimates p more precisely.
Explanation: The proportion (p) remains the same, indicating that the characteristic of interest is consistent across different sample sizes.

Practical Example – Small & Large Sample
Small sample
Scenario: Testing a new drug on 100 patients.
A. Proportion with Side Effects (p): 0.14 (14 out of 100)
Interpretation: The sample proportion (0.14) indicates that 14% of patients experience the side effect.
Large sample
Scenario: Testing the same drug on 1000 patients.
A. Proportion with Side Effects (p): 0.14 (140 out of 1000)
Interpretation: The larger sample also indicates that 14% of patients experience the side effect.

Criteria for Good Sampling
a) Representativeness: The sample should accurately reflect the characteristics of the population.
b) Randomness: Each member of the population should have an equal chance of being selected.
c) Adequate Size: The sample size should be large enough to provide reliable and valid results.

Sampling Theory
Definition: The study of relationships between populations and samples, and of the techniques for drawing representative samples.
Objective: To make valid generalizations from a sample to the population.

Probability Sampling
Definition: Sampling methods in which each member of the population has a known, non-zero probability of being selected.
Importance: Ensures representativeness and allows for generalization to the population. Probability sampling is often used in scientific research to minimize selection bias and increase the representativeness of the sample.

Types of Probability Sampling
Probability sampling: Simple Random Sampling
Definition: Every member of the population has an equal chance of being selected.
Method: Use random number generator software or draw lots.

Probability sampling: Systematic Sampling
Definition: Selects every kth (e.g., every 10th) individual from the population list.
Method: Determine the sampling interval (k) and randomly select a starting point. When you know your target sample size, you can calculate the interval k by dividing the total (estimated) population size by the sample size; this can be a rough estimate rather than an exact calculation.

Probability sampling: Stratified Sampling
Definition: The population is divided into strata (subgroups), and random samples are taken from each stratum.
Purpose: Ensures representation of all subgroups within the population.

Probability sampling: Cluster Sampling
Definition: The population is divided into clusters, and entire clusters are randomly selected.
Purpose: Useful when the population is large and geographically dispersed.

Non-Probability Sampling
Definition: A sampling technique in which members of the population are not randomly selected.
Importance: Useful for exploratory research and when probability sampling is impractical.

Types of non-probability sampling:
A. Convenience Sampling
Definition: Selecting individuals who are readily available or convenient to sample.
Example: Surveying people at a shopping mall.
This method is quick and easy, but it has several limitations: potential for bias, lack of representativeness, and reduced generalizability of the results to the broader population.

Non-probability sampling: Quota Sampling
Definition: Quota sampling involves selecting participants based on predefined quotas (a specific number or proportion) for certain characteristics, such as age, gender, or income.
Method: Divide the population into categories and select a predetermined number from each category.

Non-probability sampling: Purposive Sampling
Definition: Selecting individuals based on specific characteristics or criteria.
Example: Choosing experts in a particular field for a study.

Sample Size Calculation
Definition: Sample size calculation is the process of determining the number of observations or replicates to include in a statistical sample.
Purpose: To ensure that the sample accurately represents the population and provides reliable and valid results.

Importance of Sample Size Calculation
A. Accuracy: Ensures reliable and valid results.
B. Cost-Effectiveness: Avoids unnecessary expenditure of resources.

Factors Influencing Sample Size
a) Population Size: The total number of individuals in the population.
b) Margin of Error: The maximum expected distance between the sample estimate and the true population parameter (half the width of the confidence interval).
c) Confidence Level: The probability, over repeated sampling, that the interval "estimate ± margin of error" contains the true population parameter.
d) Variability: The degree of variation in the population.

Sample Size Formula for Proportions
$$n = \frac{Z^2\, p\,(1-p)}{E^2}$$

Example - Estimating Proportion
A researcher wants to estimate the prevalence of hyperlipidemia (high cholesterol) in a large population (>10,000 individuals) with a 95% confidence level and a 5% margin of error.
Given:
Confidence Level: 95% (Z = 1.96)
Margin of Error (E): 5% (0.05)
Estimated Proportion (p): 0.14 (14%)
Answer:
$$n = \frac{1.96^2 \times 0.14 \times 0.86}{0.05^2} = \frac{3.8416 \times 0.1204}{0.0025} \approx 185.01$$
Interpretation: The required sample size is approximately 185 individuals (fractional results are conventionally rounded up, giving 186) to estimate the prevalence of hyperlipidemia with a 95% confidence level and a 5% margin of error.

Sample Size Formula for Means
$$n = \left(\frac{Z\,\sigma}{E}\right)^2$$

Example - Estimating Mean
A researcher wants to estimate the average serum cholesterol level in a large population (>10,000 individuals) with a 95% confidence level and a margin of error of 5 mg/dL.
Given:
Confidence Level: 95% (Z = 1.96)
Margin of Error (E): 5 mg/dL
Population Standard Deviation (σ): 20 mg/dL
Answer:
$$n = \left(\frac{1.96 \times 20}{5}\right)^2 = 7.84^2 \approx 61.47$$
Interpretation: The required sample size is approximately 61 individuals (rounded up, 62) to estimate the average serum cholesterol level with a 95% confidence level and a margin of error of 5 mg/dL.

Introduction to Statistical Inference: Confidence Intervals

Definition of Statistical Inference
Definition: Statistical inference is the process of drawing conclusions about a population based on information from a sample.
Purpose: To make predictions or decisions about population parameters.

Importance of Statistical Inference
Applications: Widely used in research, medicine, economics, and other fields.
Example: Inferring the average blood pressure of a population based on a sample of patients.

Inferring the Population Mean from the Sample Mean
The Sample Mean
Definition: The sample mean (x̄) is the average value of a sample.
Formula:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
Example: Calculating the average blood pressure from a sample of patients.

Case Study
A medical researcher wants to estimate the average blood pressure of adults in a city. Due to time and cost constraints, only a sample of patients can be measured. The following data have been collected from a sample of 30 randomly selected adults:
Sample Size (n): 30
Sample Mean (x̄): 122 mmHg
Sample Standard Deviation (s): 12 mmHg
Using a 95% confidence level, calculate the confidence interval for the population mean blood pressure (μ). Use the Z-score for a 95% confidence level (1.96).

Definition of Confidence Intervals
Definition: A confidence interval is a range of values within which the population parameter is expected to lie with a certain level of confidence.
Purpose: To provide an estimate of the population parameter along with a measure of uncertainty.

Confidence Level
Common Levels: 90%, 95%, 99%
Example: A 95% confidence interval means we are 95% confident that the interval contains the population mean.
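The two sample-size formulas from the Sample Size Calculation slides can be evaluated with a few lines of Python. This is a sketch assuming z = 1.96 (95% confidence), as in the slides; the function names are hypothetical. The functions round up, the usual convention for sample sizes, so they return 186 and 62, whereas the slides quote approximately 185 and 61 (the unrounded results are about 185.01 and 61.47). The last loop revisits the Proportion Consistency example: the estimated proportion is 0.14 at both n = 100 and n = 1000, but its standard error, √(p(1 - p)/n), is roughly three times smaller in the larger sample.

```python
import math


def sample_size_proportion(p, margin, z=1.96):
    """n = z^2 * p * (1 - p) / E^2, rounded up to the next whole participant."""
    return math.ceil(z**2 * p * (1 - p) / margin**2)


def sample_size_mean(sigma, margin, z=1.96):
    """n = (z * sigma / E)^2, rounded up to the next whole participant."""
    return math.ceil((z * sigma / margin) ** 2)


def se_proportion(p, n):
    """Standard error of a sample proportion, sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


# Hyperlipidemia prevalence example: p = 0.14, E = 0.05, 95% confidence.
n_prop = sample_size_proportion(0.14, 0.05)
# Serum cholesterol example: sigma = 20 mg/dL, E = 5 mg/dL, 95% confidence.
n_mean = sample_size_mean(20, 5)
print(n_prop, n_mean)  # 186 62

# Same proportion (0.14), different precision: the SE shrinks as n grows.
for n in (100, 1000):
    print(n, round(se_proportion(0.14, n), 4))
```

Rounding up matters here: with n = 185 the achieved margin of error, 1.96 × √(0.14 × 0.86 / 185), is very slightly above 0.05, while n = 186 keeps it below 0.05.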
Steps to Construct a 95% Confidence Interval
1. Calculate the sample mean (x̄).
2. Determine the sample standard deviation (s).
3. Find the Z-value for the 95% confidence level (1.96).
4. Calculate the standard error, $\mathrm{SE} = s/\sqrt{n}$.
5. Apply the confidence interval formula, $\bar{x} \pm Z \times \mathrm{SE}$.

Calculating Confidence Intervals
Example Scenario: Estimating the average blood pressure with a 95% confidence interval.
Sample mean (x̄) = 120 mmHg
Sample standard deviation (s) = 10 mmHg
Sample size (n) = 25
Z-score for 95% confidence level = 1.96
Calculation:
$$\mathrm{SE} = \frac{10}{\sqrt{25}} = 2, \qquad 120 \pm 1.96 \times 2 = 120 \pm 3.92 \;\Rightarrow\; (116.08,\ 123.92)\ \text{mmHg}$$
We are 95% confident that the population mean blood pressure lies between 116.08 mmHg and 123.92 mmHg.
Applying the same steps to the case study above (x̄ = 122 mmHg, s = 12 mmHg, n = 30):
$$\mathrm{SE} = \frac{12}{\sqrt{30}} \approx 2.19, \qquad 122 \pm 1.96 \times 2.19 \approx 122 \pm 4.29 \;\Rightarrow\; (117.71,\ 126.29)\ \text{mmHg}$$
We are 95% confident that the population mean blood pressure of adults in the city is between 117.71 mmHg and 126.29 mmHg.

Exercise
A pharmaceutical company wants to estimate the average effectiveness of a new drug in reducing cholesterol levels. They conduct a study with a sample of 40 patients. The results show:
Sample Mean (x̄) = 18 mg/dL reduction in cholesterol
Sample Standard Deviation (s) = 5 mg/dL
Calculate the 95% confidence interval for the population mean reduction in cholesterol levels due to the new drug. Use the Z-score for a 95% confidence level (1.96).
Answer:
$$\mathrm{SE} = \frac{5}{\sqrt{40}} \approx 0.79, \qquad 18 \pm 1.96 \times 0.79 \approx 18 \pm 1.55 \;\Rightarrow\; (16.45,\ 19.55)\ \text{mg/dL}$$
Conclusion: We are 95% confident that the population mean reduction in cholesterol levels due to the new drug is between 16.45 mg/dL and 19.55 mg/dL. If we were to repeat the sampling process 100 times, about 95 of the resulting 100 confidence intervals would be expected to contain the true population mean.

One-Sided Confidence Intervals
Definition: A one-sided confidence interval provides either an upper or a lower bound for a population parameter, based on a directional hypothesis.
Purpose: Used when the researcher is only interested in whether the parameter is greater than, or less than, a certain value.

Types of One-Sided Confidence Intervals
Lower Confidence Bound: Used to determine whether a parameter is greater than a certain value.
Upper Confidence Bound: Used to determine whether a parameter is less than a certain value.
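The steps for a 95% confidence interval reduce to one small function. The following Python sketch (the function name is hypothetical; z = 1.96 as in the slides) computes the two-sided interval x̄ ± z × s/√n for the blood-pressure case study, the worked example and the cholesterol exercise. It prints 117.71 to 126.29 for the case study (x̄ = 122, s = 12, n = 30), 116.08 to 123.92 for the worked example's data (x̄ = 120, s = 10, n = 25), and 16.45 to 19.55 for the exercise (x̄ = 18, s = 5, n = 40).

```python
import math


def z_confidence_interval(mean, sd, n, z=1.96):
    """Two-sided z-interval for a population mean: mean ± z * sd / sqrt(n)."""
    se = sd / math.sqrt(n)   # standard error of the mean
    margin = z * se          # margin of error
    return mean - margin, mean + margin


examples = {
    "Case study (blood pressure, mmHg)": (122, 12, 30),
    "Worked example (blood pressure, mmHg)": (120, 10, 25),
    "Exercise (cholesterol reduction, mg/dL)": (18, 5, 40),
}
for label, (mean, sd, n) in examples.items():
    low, high = z_confidence_interval(mean, sd, n)
    print(f"{label}: {low:.2f} to {high:.2f}")
```

The slides use the normal z-value throughout. When the standard deviation is estimated from a small sample, many texts use a t critical value with n - 1 degrees of freedom instead, which gives slightly wider intervals.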
When to Use One-Sided Confidence Intervals
Directional Hypotheses: When the research question is specifically about one direction (greater than or less than).
Examples: Testing whether a new drug's effectiveness is greater than that of a standard drug. Determining whether the mean pollution level is below a regulatory threshold.

Calculating One-Sided Confidence Intervals
$$\text{Upper bound} = \bar{x} + z \times \mathrm{SE}, \qquad \text{Lower bound} = \bar{x} - z \times \mathrm{SE}, \qquad \mathrm{SE} = \frac{s}{\sqrt{n}}$$

Choosing the Z-Value
Common Confidence Levels (one-sided):
90% confidence level: z = 1.28
95% confidence level: z = 1.645
99% confidence level: z = 2.33

Practical Example - Upper Confidence Bound
Scenario: Testing whether a new drug reduces cholesterol levels to less than 200 mg/dL.
Sample Data:
Sample mean (x̄) = 195 mg/dL
Sample standard deviation (s) = 5 mg/dL
Sample size (n) = 40
Confidence Level: 95%
Z-Value: 1.645
Calculation:
$$\text{Upper bound} = 195 + 1.645 \times \frac{5}{\sqrt{40}} \approx 195 + 1.645 \times 0.79 \approx 196.30\ \text{mg/dL}$$
Interpretation: We are 95% confident that the mean cholesterol level of patients treated with the drug is less than 196.30 mg/dL, which is below the 200 mg/dL threshold.

Practical Example - Lower Confidence Bound
Scenario: Testing whether the average test score is greater than 70.
Sample Data:
Sample mean (x̄) = 75
Sample standard deviation (s) = 10
Sample size (n) = 25
Confidence Level: 95%
Z-Value: 1.645
Calculation:
$$\text{Lower bound} = 75 - 1.645 \times \frac{10}{\sqrt{25}} = 75 - 3.29 = 71.71$$
Interpretation: We are 95% confident that the average test score is greater than 71.71, and therefore greater than 70.

Advantages and Limitations
Advantages:
a) Gives a more powerful test when the hypothesis is directional.
b) Provides a clear boundary for decision-making.
Limitations:
a) Only applicable when the research question is directional.
b) Cannot be used to test two-tailed hypotheses.

Understanding Two-Sided Confidence Intervals
Definition: A two-sided confidence interval provides a range within which the population parameter is expected to lie, considering both directions from the sample mean.
Purpose: Used when the research hypothesis is non-directional, meaning the researcher is interested in testing whether the parameter is different from a certain value, without specifying the direction.
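A one-sided bound uses the same standard error as a two-sided interval but puts the whole 5% error probability in one tail, hence z = 1.645 at 95% confidence. The minimal Python sketch below (function names are illustrative, not from the slides) reproduces both one-sided examples, an upper bound of about 196.30 mg/dL for the cholesterol study and a lower bound of 71.71 for the test scores, and checks each bound against its threshold (200 mg/dL and 70).

```python
import math


def upper_confidence_bound(mean, sd, n, z=1.645):
    """One-sided upper bound: mean + z * sd / sqrt(n)."""
    return mean + z * sd / math.sqrt(n)


def lower_confidence_bound(mean, sd, n, z=1.645):
    """One-sided lower bound: mean - z * sd / sqrt(n)."""
    return mean - z * sd / math.sqrt(n)


# Cholesterol example: is the treated mean below 200 mg/dL?
upper = upper_confidence_bound(195, 5, 40)
print(f"Upper bound: {upper:.2f} mg/dL, below 200: {upper < 200}")

# Test-score example: is the average above 70?
lower = lower_confidence_bound(75, 10, 25)
print(f"Lower bound: {lower:.2f}, above 70: {lower > 70}")
```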
When to Use Two-Sided Confidence Intervals
Non-Directional Hypotheses: When the research question is whether a parameter simply differs from a certain value, not specifically whether it is higher or lower.
Examples: Testing whether a new drug's effectiveness differs from that of a standard drug. Determining whether the mean weight of a product batch differs from the target weight.

Calculating Two-Sided Confidence Intervals
Formula:
$$\text{Confidence Interval} = \bar{x} \pm (z \times \mathrm{SE})$$
Components:
x̄: Sample mean
z: Z-value corresponding to the desired confidence level
SE: Standard error of the mean

Choosing the Z-Value
Common Confidence Levels (two-sided):
90% confidence level: z = 1.645
95% confidence level: z = 1.96
99% confidence level: z = 2.576
Two-Tailed Focus: The Z-value for a two-sided interval splits the remaining probability (1 - confidence level) equally between both tails of the distribution; for 95% confidence, 2.5% lies in each tail.

Standard Error (SE) Calculation
$$\mathrm{SE} = \frac{s}{\sqrt{n}}$$
Components:
s: Sample standard deviation
n: Sample size

Practical Example
A pharmaceutical company is testing a new drug intended to reduce blood pressure. They conduct a study with a sample of 36 patients. The results show:
Sample Mean (x̄) = 8 mmHg reduction in blood pressure
Sample Standard Deviation (s) = 3 mmHg
Question: Calculate the 95% confidence interval for the population mean reduction in blood pressure due to the new drug.
Answer:
$$\mathrm{SE} = \frac{3}{\sqrt{36}} = 0.5, \qquad 8 \pm 1.96 \times 0.5 = 8 \pm 0.98 \;\Rightarrow\; (7.02,\ 8.98)\ \text{mmHg}$$
We are 95% confident that the population mean reduction in blood pressure due to the new drug is between 7.02 mmHg and 8.98 mmHg.

Advantages and Limitations
Advantages:
Applicable for non-directional hypotheses.
Provides a comprehensive range for the parameter estimate.
Limitations:
Requires a larger sample size than a one-sided interval to achieve the same margin of error at the same confidence level.
Does not specify the direction of the difference.
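The Z-values in the one-sided and two-sided tables are quantiles of the standard normal distribution. A two-sided interval at confidence level 1 - α uses the 1 - α/2 quantile, and a one-sided bound uses the 1 - α quantile. The sketch below relies on Python's standard-library `statistics.NormalDist.inv_cdf` to recover those values (1.645, 1.96 and 2.576 versus 1.28, 1.645 and 2.33 after rounding). It also answers the blood-pressure practical example (7.02 to 8.98 mmHg) and illustrates the stated limitation. For the same margin of error, the required n scales with z², so a two-sided 95% interval needs about 1.42 times the sample size of a one-sided 95% bound. The helper `critical_z` is a hypothetical name.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1


def critical_z(level, two_sided=True):
    """z critical value; a two-sided interval splits alpha = 1 - level over both tails."""
    alpha = 1 - level
    tail = alpha / 2 if two_sided else alpha
    return std_normal.inv_cdf(1 - tail)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: two-sided z = {critical_z(level):.3f}, "
          f"one-sided z = {critical_z(level, two_sided=False):.3f}")

# Two-sided practical example: mean = 8 mmHg, s = 3 mmHg, n = 36.
z = critical_z(0.95)
margin = z * 3 / math.sqrt(36)
print(f"95% CI: {8 - margin:.2f} to {8 + margin:.2f} mmHg")

# For the same margin of error, required n scales with z^2.
ratio = (critical_z(0.95) / critical_z(0.95, two_sided=False)) ** 2
print(f"Two-sided needs about {ratio:.2f} times the one-sided sample size")
```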