Biostatistics Lecture Notes PDF


Biostatistics
Dr. Youssif Alemad, Lecturer of Community Medicine, Faculty of Medicine
(With the compliments of the Arabian Gulf Library)

Lect. 8

Measures of central tendency: Median, Mode
Measures of dispersion: Range, The average deviation, The variance, The standard deviation, Coefficient of variation
Measures of position: Z and T score

III - Measures of position

Z score
▪ It represents the distance of a data point from the mean of all observations, expressed in terms of the normal distribution.
▪ It is the number of standard deviations (SD) that a given value x lies above or below the mean of the sample or reference population.
▪ Z-scores typically range from -3 SD (far left of the normal distribution curve) to +3 SD (far right of the normal distribution curve).

Example: knowing that someone's weight is 80 kg may be useful, but it says little until you compare it with the average weight of the population.
▪ The Z score tells you where that person's weight lies relative to the population mean weight.
▪ A z-score of 0 tells you the value is exactly the mean.
▪ A z-score of 1 tells you the value is 1 SD above the mean.
▪ A z-score of 2 tells you the value is 2 SD above the mean.
▪ A z-score of -1.8 tells you the value is 1.8 SD below the mean.
▪ A z-score of +3 tells you the value is much higher than the mean.
▪ Z-scores to the right of the mean are positive and Z-scores to the left of the mean are negative.

There are two z-score formulas:
For a population: $Z = \frac{x - \mu}{\sigma}$
For a sample: $Z = \frac{x - \bar{x}}{s}$

Calculating z-scores
$z = \frac{x - \bar{x}}{s}$
Here x is the data value, $\bar{x}$ is the mean, $x - \bar{x}$ is the deviation, s is the standard deviation, and z is the number of standard deviations x is away from the mean. The z-score is the deviation divided by the standard deviation.

Example 1:
▪ Suppose you have a test score of 190. The test has a mean of 150 and a standard deviation of 25. Assume a normal distribution.
Answer: $Z = \frac{x - \mu}{\sigma} = \frac{190 - 150}{25} = 1.6$
- The z score tells you how many standard deviations from the mean your score is.
- In this example your score is 1.6 standard deviations above the mean.
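The arithmetic in Example 1 can be checked with a few lines of Python. This is only a minimal sketch: the function name `z_score` is illustrative and not part of the lecture, and it applies the population formula $Z = (x - \mu)/\sigma$ given above.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd


# Example 1 from the notes: score 190, mean 150, standard deviation 25
z = z_score(190, 150, 25)
print(round(z, 2))  # 1.6
```

A positive result places the value to the right of the mean and a negative result to the left, as described in the bullets above.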
III - Measures of position

Example 2:
▪ Suppose Mohammed scored 110 marks and Ali scored 103 marks. The exam mean is 130 and the standard deviation is 25. Students whose marks are more than 1 SD below the mean (Z < -1) are considered to have failed.
Answer:
Z score of Mohammed: $Z = \frac{x - \mu}{\sigma} = \frac{110 - 130}{25} = -0.8$ (passed)
Z score of Ali: $Z = \frac{x - \mu}{\sigma} = \frac{103 - 130}{25} = -1.08$ (failed)

What are Z-scores for bone density?
A bone density scan gives a person a Z-score and a T-score. Bone density scores can tell a doctor whether a person has osteopenia or osteoporosis or is at risk of developing either condition.
T-scores compare bone density with that of a healthy person, whereas Z-scores use the average bone density of people of the same age, sex, and size as a comparator.
Although both scores can be useful, most experts prefer using Z-scores for children, teenagers, premenopausal females, and younger males. These scores are helpful for diagnosing secondary osteoporosis, which stems from underlying medical conditions, rather than primary osteoporosis, which usually results from aging.

DEXA scans
Dual-energy X-ray absorptiometry (DEXA) scans use a low dose of ionizing radiation to measure bone density. If doctors need to measure a person's bone density, they will likely use a DEXA scan. These scans measure bone mass, and doctors compare the results with established norms to provide a score.

T-scores versus Z-scores
T-scores reflect how bone density compares with that of a typical, young, healthy person, whereas Z-scores use the bone density of those with similar characteristics for comparison. Healthcare professionals may provide DEXA scan results via T-scores and Z-scores.
T-scores reveal how high or low a person's bone density is compared with that of a typical healthy 30-year-old. The lower the score, the lower the bone density.

Calculate the strength of the relation between two categorical variables:
◼ Relative risk
◼ Odds ratio

Sampling & Sample Size Estimation
By: Dr. Youssif Al Emad

What is sampling
◼ A sample is some part of a larger body specially selected to represent the whole
◼ Sampling is taking any portion of a population or universe as representative of that population or universe
◼ Sampling is the process by which this part is chosen

Reasons for Drawing a Sample
◼ Less time consuming than a census
◼ Less costly to administer than a census
◼ Less cumbersome and more practical to administer than a census of the targeted population

[Figure: Population and sample]

Key Definitions
◼ A population (universe) is the collection of things under consideration
◼ A sample is a portion of the population selected for analysis
◼ A parameter is a summary measure computed to describe a characteristic of the population
◼ A statistic is a summary measure computed to describe a characteristic of the sample

A Census
◼ A survey in which information is gathered about all members of a population
◼ The Gallup poll is able to develop representative samples of any adult population with interviews of approximately 1500 respondents
◼ That sample size allows them to be 95% confident that the results they obtain are accurate within plus or minus 3 percentage points
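The "1500 respondents, 95% confident, within 3 points" statement can be sanity-checked with the usual margin-of-error formula for a proportion, $z\sqrt{p(1-p)/n}$. The sketch below rests on assumptions that the notes do not state: simple random sampling, the worst-case proportion p = 0.5, and z = 1.96 for 95% confidence. The function name is illustrative.

```python
import math


def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate half-width of the confidence interval for a proportion.

    Assumes simple random sampling; p = 0.5 is the most conservative choice.
    """
    return z * math.sqrt(p * (1 - p) / n)


moe = margin_of_error(1500)
print(f"{moe:.3f}")  # 0.025
```

Under these assumptions the margin is about 2.5 percentage points, which is consistent with the "within 3 points" figure quoted above.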
Sampling concepts and terminologies
◼ Population/Target population
◼ Sampling unit
◼ Sampling frame

Population/Target Population
◼ The collection of all individuals, families, groups, organizations or events that we are interested in finding out about.
◼ The population to which the researcher would like to generalize the results. For example, the whole adult population of Myanmar aged 65 or older.

Sampling unit/Element/Unit of analysis
◼ The sampling unit is the unit about which information is collected.
◼ The unit of analysis is the unit that provides the basis of analysis.
◼ Each member of a population is an element (e.g. a child under 5).
◼ Sometimes the unit is the household, e.g. any injury in the household in the last three months.

Sampling Frame
◼ The actual list of sampling units from which the sample, or some stage of the sample, is collected
◼ It is simply a list of the study population

Sample Design
◼ A set of rules or procedures that specify how a sample is to be selected
◼ This can be either probability or non-probability
◼ Sample size: the number of elements in the obtained sample

Types of sampling
◼ There are two types of sampling techniques:
❑ Probability (random) sampling: simple random, stratified, cluster, systematic
❑ Non-probability (non-random) sampling: convenience, snowball, quota, purposive

Probability Sampling
◼ This is sampling in which each person in the population has a known chance/probability of being selected
◼ Types of probability sampling:
❑ Simple random sampling
❑ Systematic sampling
❑ Stratified random sampling
❑ Cluster sampling
❑ Multi-stage sampling

Simple Random Samples
◼ Every individual or item from the frame has an equal chance of being selected
◼ Selection may be with replacement or without replacement
◼ Samples are obtained from a table of random numbers or computer random number generators
◼ Random samples are unbiased and, on average, representative of the population

Systematic sample
◼ This method is referred to as a systematic sample with a random start.
◼ It is done by picking every 5th or 10th unit at regular intervals.
◼ For example, to carry out a filarial survey in a town, we take a 10% sample. If the total population of the town is about 5000, the sample comes to 500.

Systematic Samples
◼ Randomly select one individual from the 1st group
◼ Select every k-th individual thereafter
◼ We number the houses first. Then a number is taken at random, say 3. Then every 10th number is selected from that point onward: 3, 13, 23, 33, etc.
[Figure labels: N = 500, n = 3, First Group, k = 10]

Stratified Random sample
◼ This involves dividing the population into distinct subgroups according to some important characteristic, such as age, socioeconomic status or religion, and selecting a random sample from each subgroup. (e.g. African voodoo healers)
◼ Especially important when one group is so small (say, 3% of the population) that a random sample might miss it entirely.
◼ The population is divided into two or more groups according to some common characteristic
◼ A simple random sample is selected from each group
◼ The two or more samples are combined into one

Stratified Samples
◼ Procedure: divide the population into strata (mutually exclusive classes), such as men and women. Then randomly sample within strata.
◼ Suppose a population is 30% male and 70% female. To get a sample of 100 people, we randomly choose 30 males (from the population of all males) and, separately, 70 females. Our sample is then guaranteed to have exactly the correct proportion of sexes.

Cluster sample
◼ A sampling method in which each unit selected is a group of persons (all persons in a city block, a family, etc.) rather than an individual.
◼ Used when (a) a sampling frame is not available or is too expensive, and (b) the cost of reaching an individual element is too high
❑ E.g., there is no list of automobile mechanics in Myanmar. Even if you could construct one, it would cost too much money to reach randomly selected mechanics across the entire country: you would need an unbelievable travel budget.
◼ In cluster sampling, first define large clusters of people, each fairly similar to the other clusters. For example, cities make good clusters.
◼ Once you have chosen the cities, you might be able to get a reasonably accurate list of all the mechanics in each of those cities. It is also much less expensive to fly to just 10 cities instead of 200 cities.
◼ Cluster sampling is less expensive than other methods, but less accurate.

Cluster Samples
◼ The population is divided into several "clusters," each representative of the population
◼ A simple random sample of clusters is selected, and the units within the chosen clusters are studied
◼ The samples are combined into one
[Figure: population divided into 4 clusters]

Non-Probability Sampling (Non-Random)
◼ This is sampling in which the probability of inclusion in the sample is unknown.
❑ Convenience sampling
❑ Purposive sampling
❑ Quota sampling
❑ Snowball sampling

Convenience Sample
◼ Man-in-the-street surveys, and a survey of blood pressure among volunteers who drop in at an examination booth in a public place, are in this category.
◼ It is improper to generalize from the results of a survey based upon such a sample, for there is no way of knowing what sorts of biases may have been operating.
◼ Whoever happens to walk by your office; whoever is on the street when the camera crews come out
◼ If you have a choice, don't use this method. It often produces really wrong answers, because certain attributes tend to cluster with certain geographic and temporal variables.
❑ For example, at 8am in Tokyo, most of the people on the street are workers heading for their jobs.
❑ At 10am, there are many more people who don't work, and the proportion of women is much higher.
❑ At midnight, there are young people and muggers.

Quota
◼ Haphazard sampling within categories
◼ It is an improvement on convenience sampling, but still has problems.
◼ How do you know which categories are key?
◼ How many do you get of each category?

Purposive/Judgment
◼ Selecting the sample on the basis of knowledge of the research problem, to allow selection of appropriate persons for inclusion in the sample
◼ Expert judgment picks useful cases for study
◼ Good for exploratory, qualitative work, and for pre-testing a questionnaire.

Snowball
◼ Recruiting people based on the recommendation of people you have just interviewed
◼ Useful for studying invisible/illegal populations, such as drug addicts
[Figure: a main person linked to a growing network of friends]

Errors in statistical Study
A sample is expected to mirror the population from which it comes; however, there is no guarantee that any sample will be precisely representative of the population. No sample is the exact mirror image of the population. There are two kinds of error:
◼ Sampling or random errors
◼ Non-sampling or systematic errors

1. Sampling error
◼ Random error: the sample selected is not representative of the population due to chance.
◼ The uncertainty associated with an estimate that is based on data gathered from a sample of the population rather than the full population is known as sampling error.
◼ Sampling errors are the random variations in the sample estimates around the true population parameters.

Sampling error cont'd…
◼ Its level is controlled by the sample size: a larger sample size leads to a smaller sampling error.
◼ It decreases as the sample size increases, and it is smaller when the population is homogeneous.
◼ When n = N ⇒ sampling error = 0
◼ In a sample (n < N) it cannot be avoided or totally eliminated

Sampling error cont'd…
Why do sample estimates have uncertainty associated with them? There are two reasons.
◼ Estimates of characteristics from the sample data can differ from those that would be obtained if the entire population were surveyed.
◼ Estimates from one subset or sample of the population can differ from those based on a different sample from the same population (sample-to-sample variation).

The cause of sampling error
◼ Chance: the main cause of sampling error; it is the error that occurs just because of bad luck.
◼ Sampling bias: a tendency to favor the selection of participants that have particular characteristics.
The chance component (sometimes called random error) exists no matter how carefully the selection procedures are implemented, and the only way to minimize chance sampling errors is to select a sufficiently large sample.

2. Non-Sampling Error
It is a type of systematic error in the design or conduct of a sampling procedure which results in distortion of the sample, so that it is no longer representative of the reference population.
We can eliminate or reduce the non-sampling error (bias) by careful design of the sampling procedure, not by increasing the sample size.
It can occur whether the total study population or a sample is being used.
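The two claims made earlier about sampling error (it shrinks as the sample size grows, and it is zero when n = N) can be illustrated with a small simulation. Everything in this sketch is hypothetical: the population is 2000 simulated weights drawn from a normal distribution with mean 70 and SD 10, and the function name `mean_abs_error` is chosen only for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population: 2000 simulated body weights (kg)
population = [random.gauss(70, 10) for _ in range(2000)]
true_mean = statistics.mean(population)


def mean_abs_error(n: int, repeats: int = 100) -> float:
    """Average |sample mean - population mean| over repeated simple random samples."""
    errors = []
    for _ in range(repeats):
        sample = random.sample(population, n)  # sampling without replacement
        errors.append(abs(statistics.mean(sample) - true_mean))
    return statistics.mean(errors)


for n in (10, 100, 1000, len(population)):
    print(n, round(mean_abs_error(n), 3))
```

The printed error falls as n rises and reaches zero when the "sample" is the whole population. The simulation only covers random error: a systematic bias in how units are selected would not shrink in this way, which is the point made above about non-sampling error.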
The basic types of non-sampling error
❑ Non-response error
❑ Response or data error

A non-response error occurs when units selected as part of the sampling procedure do not respond in whole or in part
❑ If non-respondents are not different from those that did respond, there is no non-response error
❑ When non-respondents constitute a significant proportion of the sample (about 15% or more)

A response or data error is any systematic bias that occurs during data collection, analysis or interpretation
❑ Respondent error (e.g., lying, forgetting, etc.)
❑ Interviewer bias
❑ Recording errors
❑ Poorly designed questionnaires

Non-Sampling Error cont'd …
Systematic error makes survey results unrepresentative of the target population by distorting the survey estimates in one direction.
Random error can distort the results in any given direction but tends to balance out on average.
Thus, total survey error = sampling error + non-sampling error

Non-sampling Errors
◼ An inadequate sampling frame (non-coverage)
◼ Non-response from participants
◼ Response errors
◼ Coding and data entry errors

Evaluating Survey Worthiness
◼ What is the purpose of the survey?
◼ Is the survey based on a probability sample?
◼ Coverage error: appropriate frame
◼ Non-response error: follow up
◼ Measurement error: good questions elicit good responses
◼ Sampling error: always exists

Sample size estimation

Sample Size
❖ Sample size relates to how many people to pick for the study
❖ The question often asked is: how big a sample is necessary for a good survey?
❖ The main objective is to obtain both a desirable accuracy and a desirable confidence level with minimum cost.
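To make the "how big a sample" question concrete, here is a minimal sketch based on the widely used formula for estimating a single proportion, $n = z^2 p (1 - p) / d^2$. The function name and default values are illustrative assumptions: z = 1.96 for a 95% confidence level, p = 0.5 when no prior estimate of the proportion is available, and d = 0.05 for the absolute precision. Other study designs need other formulas.

```python
import math


def sample_size_for_proportion(p: float = 0.5, d: float = 0.05, z: float = 1.96) -> float:
    """Unrounded sample size for estimating a proportion p to within +/- d."""
    if not 0 < p < 1 or d <= 0:
        raise ValueError("p must be between 0 and 1 and d must be positive")
    return (z ** 2) * p * (1 - p) / (d ** 2)


n = sample_size_for_proportion()
print(round(n, 2))   # 384.16
print(math.ceil(n))  # 385
```

The raw result of 384.16 is often quoted as 384. Rounding up to 385 is the more conservative choice, because it guarantees the requested precision.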
Determination of Sample Size
◼ Type of analysis to be employed
◼ The level of precision needed
◼ Population homogeneity/heterogeneity
◼ Available resources
◼ Sampling technique used

Sample Size Calculation
$n = \frac{z^2 \, p \, q}{d^2}$
◼ n = the desired sample size
◼ z = the standard normal deviate, usually set at 1.96 (which corresponds to the 95% confidence level)
◼ p = the proportion in the target population estimated to have a specific characteristic. If no estimate is available, set it at 50% (or 0.50)
◼ q = 1 - p
◼ d = absolute precision or accuracy, normally set at 0.05

Sample Size Calculation
$n = \frac{(1.96)^2 (0.5)(0.5)}{(0.05)^2} \approx 384$

Before you go to the field…
◼ Work plan
❑ Time lines
❑ Field work logistics
◼ Financing and budget
◼ Develop instruments
◼ Drawing a sample of households
◼ Training manual
◼ Pilot test

Sample Size Formula

Hypothesis Testing
Dr. Youssif Al Emad

Hypothesis Testing
◼ Hypothesis testing is the key to our scientific inquiry.
◼ In addition to research hypotheses, we need statistical hypotheses.
◼ It involves the statement of a null hypothesis, an alternative hypothesis, and the selection of a level of significance.

Statistical Hypotheses
◼ Statements of circumstances in the population whose likely truth or validity the statistical process will examine and decide
◼ Statistical hypotheses are discussed in terms of the population, not the sample, yet they are tested on samples
◼ Based on the mathematical concept of probability
◼ Two kinds: null hypothesis and alternative hypothesis

Null Hypothesis
What is the Null Hypothesis?
◼ The case when the two groups are equal; population means are the same
◼ Null hypothesis = H0
◼ This is the hypothesis actually being tested
◼ H0 is assumed to be true

Alternative Hypothesis
What is the Alternative Hypothesis?
Alternative Hypothesis
◼ The case when the two groups are not equal; when there is some treatment difference; when other possibilities exist
◼ Alternative hypothesis = H1 or Ha
◼ H1 is taken to be true when H0 is false.

Statistical Hypotheses
◼ H0 and H1 must be mutually exclusive
◼ H0 and H1 must be exhaustive; that is, no other possibilities can exist
◼ H1 contains our research hypotheses

Statistical Hypotheses
Can you give an example of a null and an alternative hypothesis?
◼ H0: There is no treatment effect. The drug has no effect.
◼ H1: There is a treatment effect. The drug had an (some, any) effect.

Evaluation of the Null
◼ In order to gain support for our research hypothesis, we must reject the null hypothesis
◼ We thereby conclude that the alternative hypothesis (likely) reflects what is going on in the population.
◼ You can never "prove" the alternative hypothesis!

Significance Level
◼ We need to decide on a significance level: the probability that the test statistic will reject the null hypothesis when the null hypothesis is true
◼ Significance is a property of the distribution of a test statistic, not of any particular draw of the statistic
◼ It determines the region of rejection
◼ Generally 5% or 1%

Alpha Level
◼ The value of alpha (α), the significance level, is tied to the confidence level of our test.
◼ For results with a 90% level of confidence, the value of α is 1 - 0.90 = 0.10.
◼ For results with a 95% level of confidence, the value of α is 1 - 0.95 = 0.05.
◼ Typically set at 5% (.05) or 1% (.01)

p-value
◼ The p-value, or calculated probability, is the estimated probability of rejecting the null hypothesis (H0) of a study question when that hypothesis is true.
◼ Put more precisely, it is the probability of obtaining a statistic at least as extreme as the one observed by chance alone, assuming H0 is true.

Obtaining Significance
Compare the values of alpha and the p-value.
There are two possibilities that emerge:
◼ The p-value is less than or equal to alpha (e.g., p ≤ .05). In this case we reject the null hypothesis. When this happens we say that the result is statistically significant. In other words, we are reasonably sure that there is something besides chance alone that gave us the observed sample.
◼ The p-value is greater than alpha (e.g., p > .05). In this case we fail to reject H0. The result is therefore not statistically significant: the observed data are consistent with chance alone.

[FIGURE 27-2: Examples of Hypotheses for Statistical Tests]

1. State H0
2. State H1
3. Choose α (α = 0.10, 0.05 or 0.01)
4. Choose n = sample size
5. Choose the suitable statistical test
6. Set up the critical value(s)
7. Compute the test statistic and the p-value
8. Make the statistical decision
9. Interpret the result

1. The null hypothesis (denoted by H0)
◼ Hypothesis of no association (no statistical significance) or no difference
◼ e.g.: Mean = 30 or Mean ≥ 30
◼ e.g.: There is no statistically significant association between the sex of the patients and hypertension

2. The alternative hypothesis (denoted by H1 or Ha)
◼ Hypothesis of association (statistical significance) or difference
◼ e.g.: Mean ≠ 30 or Mean < 30
◼ e.g.: There is a statistically significant association between the sex of the patients and hypertension

1. One-tailed hypothesis: testing for the possibility of a relationship in one direction and completely disregarding the possibility of a relationship in the other direction, e.g. the independent variable causes only an increase or only a decrease in the dependent variable, or the mean is only greater than or only less than a given value.
2. Two-tailed hypothesis: testing for the possibility of a relationship in both directions, e.g. the independent variable causes an increase or a decrease in the dependent variable, or the mean is greater than or less than a given value.
[Figure: two-tailed hypothesis and one-tailed hypothesis]

Level of significance (α)
◼ The level of chance of error that the researcher sets as acceptable
◼ It is also called the level of significance
◼ It is selected by the researcher at the start
◼ It is set at one of:
❑ 0.05 (5%): confidence level (C.I.) is 95%
❑ 0.01 (1%): confidence level is 99%
❑ 0.10 (10%): confidence level is 90%
◼ α = 1 - confidence level, so confidence level = 1 - α

Sample size calculation
Each study design uses a different method of sample size calculation, and one formula cannot be used in all designs. The calculation depends on:
1. Population size
2. Expected prevalence of the outcome or event of interest
3. Sampling error for the estimate
4. Significance level
5. Design effect

Examples of Sample Size Calculation

Tests of significance
◼ These tests are mathematical methods by which the probability (P) or relative frequency of an observed difference occurring by chance is found
◼ There are two types of tests for quantitative variables:
1- Parametric tests: if the data are normally distributed
2- Non-parametric tests: if the data are not normally distributed
◼ The types of statistical tests are covered in detail in the next lecture

P-value
◼ It is the probability of error which may occur due to chance
◼ It is calculated by the test of significance
◼ It is compared to alpha to make the statistical decision

Statistical decision
◼ If the P-value is less than the level of significance (α), the null hypothesis is rejected and H1 is accepted.
◼ If the P-value is more than the level of significance (α), the null hypothesis is not rejected.

Interpretation
◼ When the null hypothesis is rejected and H1 is accepted, the interpretation is that there is a significant association.
◼ When the null hypothesis fails to be rejected, the interpretation is that there is no significant association.

Exercise
An investigator wants to study the association between smoking (yes/no) and lung cancer.
He collects relevant data from 100 patients with lung cancer and 100 persons without lung cancer. The data were normally distributed. Answer the following, knowing that:
✓ The confidence interval was calculated at the 90% confidence level
✓ The P value was 0.06
1. Write down the null and alternative hypotheses for this study
2. What is the level of significance used?
3. What was the sample size?
4. What statistical decision has the investigator made?
5. What interpretation has the investigator made?

9- HYPOTHESIS TESTING (2) (TESTS OF SIGNIFICANCE)

These tests are mathematical methods by which the probability (P) or relative frequency of an observed difference occurring by chance is found. There are two types of tests for quantitative variables:
1- Parametric tests: if the data are normally distributed
2- Non-parametric tests: if the data are not normally distributed

Type of test | Variable (1) | Variable (2) | Criteria
1- Chi Square Test (X²) | Qualitative | Qualitative | Sample size ≥ 20 and no expected value < 5
2- X² Test with Yates Correction | Qualitative, dichotomous | Qualitative, dichotomous | Sample size > 40 but with at least one expected value < 5
3- Fisher test | Qualitative, dichotomous | Qualitative, dichotomous | Sample size < 20, or < 40 but with at least one expected value < 5

Type of test | Variable (1) | Variable (2) | Criteria
1- Student's t Test | Qualitative, dichotomous | Quantitative | Normally distributed data
2- ANOVA | Qualitative, polytomous (three or more categories) | Quantitative | Normally distributed data
3- Paired t Test | Quantitative | Quantitative | Repeated measurement of the same individual & item (e.g. Hb level before & after treatment); normally distributed data
4- Pearson Correlation & Linear Regression | Quantitative, continuous | Quantitative, continuous | Normally distributed data

Data normally distributed | Data not normally distributed | Variable (1) | Variable (2) | Criteria
Two-sample t-test | Wilcoxon rank-sum test | Qualitative, dichotomous | Quantitative | Compare means between two distinct/independent groups
Paired t Test | Wilcoxon signed-rank test | Quantitative | Quantitative | Compare two quantitative measurements taken from the same individual
Analysis of variance (ANOVA) | Kruskal-Wallis test | Qualitative, polytomous | Quantitative | Compare means between three or more distinct/independent groups
Pearson Correlation & Linear Regression | Spearman's rank correlation | Quantitative, continuous | Quantitative, continuous | Estimate the degree of association between two quantitative variables

Chi Square Test (X²)
❖ Formulate the H0 and H1 hypotheses
❖ What statistical test of hypothesis would you advise for the investigator in this situation?

X² Test with Yates Correction
❖ Formulate the H0 and H1 hypotheses
❖ What statistical test of hypothesis would you advise for the investigator in this situation?

Fisher test
❖ Formulate the H0 and H1 hypotheses
❖ What statistical test of hypothesis would you advise for the investigator in this situation?

Student's t Test
❖ Formulate the H0 and H1 hypotheses
❖ What are the types of variables in this study?
❖ What statistical test would you advise for the investigator in this situation?

Wilcoxon rank-sum test
❖ Formulate the H0 and H1 hypotheses
❖ What are the types of variables in this study?
❖ What statistical test would you advise for the investigator in this situation?
ANOVA test
An investigator wants to study the association between maternal intake of calcium supplements (sufficient, partially sufficient, insufficient) and the birth weights (in g) of newborn babies. He collects relevant data from 100 pregnant women and their newborns. The data were normally distributed.
❖ What statistical test of hypothesis would you advise for the investigator in this situation?

Kruskal-Wallis test
An investigator wants to study the association between maternal intake of calcium supplements (sufficient, partially sufficient, insufficient) and the birth weights (in g) of newborn babies. He collects relevant data from 100 pregnant women and their newborns. The data were not normally distributed.
❖ What statistical test of hypothesis would you advise for the investigator in this situation?

Paired t Test
The investigator wants to determine the efficacy of a new drug for hypercholesterolemia. He recorded the cholesterol level of 250 patients, gave them the new drug, then recorded their cholesterol level after administration of the new drug. The data were normally distributed.
❖ What is the suitable statistical test to be used in this study?

Wilcoxon signed-rank test
The investigator wants to determine the efficacy of a new drug for hypercholesterolemia. He recorded the cholesterol level of 250 patients, gave them the new drug, then recorded their cholesterol level after administration of the new drug. The data were not normally distributed.
❖ What is the suitable statistical test to be used in this study?

Pearson Correlation & Linear Regression
The investigator wants to determine the association between the age of patients and their blood creatinine level. He chose 40 persons, recorded their ages and measured their serum creatinine level. The data were normally distributed.
❖ What is the suitable statistical test?
Spearman's rank correlation
The investigator wants to determine the association between the age of patients and their blood creatinine level. He chose 40 persons, recorded their ages and measured their serum creatinine level. The data were not normally distributed.
❖ What is the suitable statistical test?

Exercise
The investigator wants to determine the effect of a new oral hypoglycemic drug. He chose 30 patients and gave them the new drug, and 50 patients taking Metformin. Then he compared the mean decrease in their FBS (fasting blood sugar).
Alpha was 0.05
P-value was 0.08
Data were normally distributed
1. Write down H0 and H1.
2. What is the sample size (n)?
3. What is the suitable statistical test to be used here?
4. What is the interpretation made after this study?

Sensitivity = Of those who have the disease, what % test positive?
Specificity = Of those who do not have the disease, what % test negative?
Positive predictive value = Of those who test positive, what % actually have the disease?
Negative predictive value = Of those who test negative, what % actually do not have the disease?
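The four screening-test measures defined above can be computed directly from the cells of a 2×2 table. The sketch below is illustrative only: the function name is an assumption, and the counts (90 true positives, 30 false positives, 10 false negatives, 170 true negatives) are hypothetical numbers, not data from the lecture.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Screening-test measures from the four cells of a 2x2 table.

    tp: diseased, test positive      fn: diseased, test negative
    fp: not diseased, test positive  tn: not diseased, test negative
    """
    return {
        "sensitivity": tp / (tp + fn),  # of those with the disease, share testing positive
        "specificity": tn / (tn + fp),  # of those without the disease, share testing negative
        "ppv": tp / (tp + fp),          # of those testing positive, share with the disease
        "npv": tn / (tn + fn),          # of those testing negative, share without the disease
    }


# Hypothetical counts, chosen only to illustrate the calculation
metrics = diagnostic_metrics(tp=90, fp=30, fn=10, tn=170)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Sensitivity and specificity are computed within the diseased and non-diseased groups, while the predictive values are computed within the test-positive and test-negative groups, so the predictive values change with how common the disease is in the group tested.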
