Biostatistics 2017/18 Seminar Notes PDF
University of Debrecen Faculty of Medicine
Summary
These notes provide an overview of introductory biostatistics concepts, specifically focusing on set theory, probability calculations, conditional probability, and independent events. The document also contains working examples that illustrate different concepts.
Full Transcript
2nd seminar
▪ Set theory. Set operations.
▪ Probability. Frequency and relative frequency. The axioms of Kolmogorov
▪ Conditional probability
▪ Principle of marginalization, Bayes' theorem
▪ Independent events
▪ Examples
Week 3

Probability
❖ Classical, theoretical approach. If an experiment has N mutually exclusive and equally likely outcomes, and k of these possess a trait E, the probability of E is equal to k/N:
$P(E) = \frac{k}{N}$
❖ Experimental, practical approach. If a certain process is repeated a large number of times, n, and some resulting event with the characteristic E occurs k times, the relative frequency of E is k/n (k is the absolute frequency of E, n is the total number of experiments). If the number of experiments is very large (n→∞), the variation of the relative frequency becomes negligible. The number it settles at is called the probability of the event:
$\lim_{n \to \infty} \frac{k}{n} = P(E)$

The axioms of Kolmogorov
▪ The probability of an event is a number between 0 and 1: $0 \le P(A) \le 1$
▪ The probability of an impossible event is 0, that of the certain event is 1: $P(S) = 1$, $P(\emptyset) = 0$
▪ The probability of the complement of event A: $P(\bar{A}) = 1 - P(A)$ (the complement of A is the event which occurs when A doesn't occur)
▪ The probability of the sum of events, $P(A + B)$ (the sum of A and B is the event which occurs when either A or B or both of them occur):
   if A and B are mutually exclusive events: $P(A + B) = P(A) + P(B)$
   if A and B are not mutually exclusive events: $P(A + B) = P(A) + P(B) - P(AB)$
▪ The probabilities of mutually exclusive outcomes that together make up the whole sample space sum to 1: $P(A_1) + P(A_2) + \dots + P(A_n) = 1$

Conditional probability
❖ Definition. Let A and B be two events with P(B) > 0. The conditional probability of A given B is the probability that A will occur under the condition that B does occur:
$P(A \mid B) = \frac{P(AB)}{P(B)}$
Notation: P(A|B), read: probability of A given B.
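The experimental approach to probability described in this section can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the die-rolling experiment, the function name `relative_frequency` and the fixed random seed are not part of the seminar notes. It compares the relative frequency k/n of rolling a six with the classical value k/N = 1/6.

```python
import random


def relative_frequency(n_trials, seed=0):
    """Relative frequency k/n of rolling a six in n_trials simulated die rolls.

    The die experiment and the seed are assumptions made for this illustration.
    """
    rng = random.Random(seed)  # fixed seed so that a run can be reproduced
    k = sum(1 for _ in range(n_trials) if rng.randint(1, 6) == 6)
    return k / n_trials


# Classical approach: 1 favourable outcome out of N = 6 equally likely ones.
classical = 1 / 6

# Experimental approach: the relative frequency for a growing number of trials.
for n in (10, 100, 10_000, 200_000):
    print(n, relative_frequency(n), classical)
```

For small n the relative frequency can be far from 1/6; as n grows, its deviation from 1/6 typically shrinks, which is the behaviour the limit definition describes.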
Conditional probability
❖ Example.
A = {1st year students at the MHSC}, P(A) = 0.24
B = {Norwegian students at the MHSC}, P(B) = 0.05
P(AB) = 0.015
1) What fraction of 1st year students is Norwegian?
$P(B \mid A) = \frac{P(AB)}{P(A)} = \frac{0.015}{0.24} = 0.0625 = 6.25\%$
2) What fraction of Norwegian students is in 1st year?
$P(A \mid B) = \frac{P(AB)}{P(B)} = \frac{0.015}{0.05} = 0.3 = 30\%$

Independent events
❖ Definition. If A and B are independent, then P(A|B) = P(A), that is, the probability of the occurrence of A does not depend on whether B has occurred or not. In this case, since $P(A \mid B) = \frac{P(AB)}{P(B)} = P(A)$:
$P(AB) = P(A)\,P(B)$
This equation should be used to check the independence of two events A and B. Independence is symmetric: if event A does not depend on event B, then event B does not depend on event A either.
✓ Are two mutually exclusive events (two sets A and B with no overlap) independent? NO! If one of them occurs, the other will definitely NOT occur because they have no intersection. So the occurrence of one DOES have an effect on the occurrence of the other. Mutually exclusive events with nonzero probabilities are NOT independent!

Marginalization; Bayes' theorem
❖ Definition. If the events B1, B2, …, Bn are pairwise mutually exclusive, the sum of their probabilities is 1, and P(Bi) ≠ 0, then the marginal probability P(A) is:
$P(A) = \sum_{i=1}^{n} P(AB_i) = \sum_{i=1}^{n} P(A \mid B_i)\,P(B_i)$
Bayes' theorem can be used to calculate a conditional probability from the reverse conditional probabilities and the marginal probabilities. For example, knowing the conditional probabilities P(A|Bi) and the marginal probabilities P(Bi), the conditional probability P(Bk|A) can be calculated as:
$P(B_k \mid A) = \frac{P(A \mid B_k)\,P(B_k)}{\sum_{i=1}^{n} P(A \mid B_i)\,P(B_i)}$
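The definition of conditional probability and the independence check can be tried out on the numbers of the MHSC student example. This is a minimal sketch: the helper names `conditional` and `independent` are hypothetical, and the independence check on the student data is an extra step that the notes themselves do not ask for.

```python
import math


def conditional(p_joint, p_condition):
    """P(A|B) = P(AB) / P(B); the conditioning event needs P(B) > 0."""
    if p_condition <= 0:
        raise ValueError("the conditioning event must have positive probability")
    return p_joint / p_condition


def independent(p_a, p_b, p_ab):
    """True if P(AB) equals P(A) * P(B) up to floating-point rounding."""
    return math.isclose(p_ab, p_a * p_b, abs_tol=1e-12)


# Values from the student example: A = 1st year, B = Norwegian.
p_a, p_b, p_ab = 0.24, 0.05, 0.015

p_b_given_a = conditional(p_ab, p_a)  # fraction of 1st year students who are Norwegian
p_a_given_b = conditional(p_ab, p_b)  # fraction of Norwegian students in 1st year
a_and_b_independent = independent(p_a, p_b, p_ab)

print(p_b_given_a, p_a_given_b, a_and_b_independent)
```

Here P(A)P(B) = 0.24 × 0.05 = 0.012 differs from P(AB) = 0.015, so under these numbers the two events would not be independent.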
Example 1. Twenty percent of a particular age group has hypertension. Five percent of this age group has hypertension and diabetes. Given that an individual from this age group has hypertension, what is the probability that the individual also has diabetes?
Solution 1. Let us designate the events of having hypertension and diabetes by H and D, respectively:
$P(H) = 0.2$, $P(HD) = P(DH) = 0.05$
We would like to find the conditional probability $P(D \mid H)$. Using the definition of conditional probability:
$P(D \mid H) = \frac{P(DH)}{P(H)} = \frac{0.05}{0.2} = 0.25$
Solution 2. In a Venn diagram, set H is drawn such that its area is 20% of the area of the sample space, and set DH is drawn such that its area is 5% of the area of the sample space. The area of set D is not given in the problem. We would like to find the area of DH compared to that of H, which is 0.05/0.2 = 0.25.
Solution 3. Displaying the problem using a contingency table:
                     | hypertension: have | hypertension: don't have | SUM
diabetes: have       | 0.05               |                          |
diabetes: don't have |                    |                          |
SUM                  | 0.2                |                          |
Among the people having hypertension (0.2), 0.05/0.2 = 0.25 = 25% also have diabetes, which is the conditional probability of diabetes given the presence of hypertension.

Example 2. In a town 48% of teenagers have a bike and 39% of teenagers have both a bike and a skateboard. What is the probability that a randomly selected teenager has a skateboard given that he/she has a bike?
Solution 1.
B and S designate the events that a teenager has a bike and a skateboard, respectively:
$P(B) = 0.48$, $P(BS) = 0.39$
We would like to find the conditional probability $P(S \mid B)$. Using the definition of conditional probability:
$P(S \mid B) = \frac{P(SB)}{P(B)} = \frac{0.39}{0.48} = 0.8125$
Solution 2. Displaying and solving the problem using a contingency table:
                       | bike: have | bike: don't have | SUM
skateboard: have       | 39%        |                  |
skateboard: don't have |            |                  |
SUM                    | 48%        |                  |
Among the teenagers having a bike (48%), 39%/48% = 0.8125 = 81.25% also have a skateboard, which is the answer to the question.

Example 3. In a Drosophila (fruit fly) population 25% of the animals have an eye mutation, 50% have a wing mutation, and 40% of the animals affected by the eye mutation also have the wing mutation. What is the probability that a randomly selected animal will have at least one of the mutations?
Solution 1.
Probability of an eye mutation: $P(E) = 0.25$
Probability of a wing mutation: $P(W) = 0.5$
Conditional probability of a wing mutation given an eye mutation: $P(W \mid E) = 0.4$
We are looking for the probability $P(W + E)$. Using the addition rule:
$P(W + E) = P(W) + P(E) - P(WE)$
Using the definition of conditional probability:
$P(W \mid E) = \frac{P(WE)}{P(E)} \Rightarrow P(WE) = P(W \mid E)\,P(E)$
Substituting this last result into the addition rule:
$P(W + E) = P(W) + P(E) - P(W \mid E)\,P(E) = 0.5 + 0.25 - 0.4 \times 0.25 = 0.65$
Solution 2.
Solution of the problem using a contingency table:
                       | eye mutation: present                 | eye mutation: absent                 | SUM
wing mutation: present | a. 0.25×0.4 = 0.1 = $P(WE)$           | c. 0.5 - 0.1 = 0.4 = $P(W\bar{E})$   | 0.5 = $P(W)$
wing mutation: absent  | b. 0.25 - 0.1 = 0.15 = $P(\bar{W}E)$  |                                      |
SUM                    | 0.25 = $P(E)$                         |                                      |
We have to find the total probability of the events in cells a, b and c (shaded in the original table).
a. According to the problem, 40% of the animals having an eye mutation also have a wing mutation. Therefore, the probability of the joint occurrence of the two events is 0.25×0.4 = 0.1.
b. Since the fraction of animals with an eye mutation is 0.25, the fraction of animals which have an eye mutation but don't have a wing mutation is 0.25 - 0.1 = 0.15.
c. Since the fraction of animals with a wing mutation is 0.5, the fraction of animals which have a wing mutation but don't have an eye mutation is 0.5 - 0.1 = 0.4.
d. The probability we have to find is: 0.15 + 0.1 + 0.4 = 0.65
Remark: this probability can be read from the table in other ways as well:
$P(W) + P(\bar{W}E) = 0.5 + 0.15 = 0.65$
$P(E) + P(\bar{E}W) = 0.25 + 0.4 = 0.65$
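Both solutions of Example 3 (the Drosophila problem) can be cross-checked numerically. The sketch below uses hypothetical variable names; it computes the probability of at least one mutation once with the addition rule and once by summing the three contingency-table cells, then compares the two.

```python
import math

p_e = 0.25          # P(E): eye mutation
p_w = 0.5           # P(W): wing mutation
p_w_given_e = 0.4   # P(W|E): wing mutation given an eye mutation

# Cells of the contingency table.
p_we = p_w_given_e * p_e    # cell a: both mutations
p_e_only = p_e - p_we       # cell b: eye mutation without wing mutation
p_w_only = p_w - p_we       # cell c: wing mutation without eye mutation

# Solution 1: addition rule. Solution 2: sum of the three cells.
via_addition_rule = p_w + p_e - p_we
via_table_cells = p_we + p_e_only + p_w_only

# The remaining cell of the table: animals with neither mutation.
p_neither = 1 - via_addition_rule

print(via_addition_rule, via_table_cells, p_neither)
print(math.isclose(via_addition_rule, via_table_cells))
```

The last cell, the probability of having neither mutation, is not needed for the question but completes the table and offers one more check: all four cells should sum to 1.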
Example 4. A computer disk manufacturer has three locations to produce computer disks. Plant A produces 30% of the disks, of which 0.5% are defective. Plant B produces 50% of the disks, of which 0.75% are defective. Plant C produces the remaining 20%, of which 0.25% are defective.
a. What is the fraction of defective disks produced by the manufacturer?
b. If a disk is purchased and found to be defective, what is the probability that it was manufactured by plant B?
Solution 1.
a. Let us designate defective disks by d, and disks produced by plants A, B and C by the letter of the plant:
$P(A) = 0.3$, $P(d \mid A) = 0.005$
$P(B) = 0.5$, $P(d \mid B) = 0.0075$
$P(C) = 0.2$, $P(d \mid C) = 0.0025$
In the Venn-diagram representation of the problem (d, represented by the shaded areas, is not drawn to scale), we have to determine the total area of the blue-shaded part. Using the principle of marginalization and the definition of conditional probability:
$P(d) = P(Ad) + P(Bd) + P(Cd) = P(d \mid A)\,P(A) + P(d \mid B)\,P(B) + P(d \mid C)\,P(C) = 0.005 \times 0.3 + 0.0075 \times 0.5 + 0.0025 \times 0.2 = 0.00575$
b. We have to determine the ratio of the hatched blue area (defective disks produced in plant B) to the total area of the blue-shaded part (all defective disks). Using Bayes' theorem:
$P(B \mid d) = \frac{P(Bd)}{P(d)} = \frac{P(d \mid B)\,P(B)}{P(d)} = \frac{0.0075 \times 0.5}{0.00575} \approx 0.6522$
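The two steps of Example 4 (the disk manufacturer problem), marginalization followed by Bayes' theorem, can be sketched in code. The function names `total_probability` and `posterior` and the dictionary layout are assumptions made for this illustration.

```python
import math


def total_probability(priors, likelihoods):
    """Marginalization: P(A) = sum of P(A|B_i) * P(B_i) over all B_i.

    `priors` maps each B_i to P(B_i); the B_i must be mutually exclusive
    and their probabilities must sum to 1.
    """
    if not math.isclose(sum(priors.values()), 1.0):
        raise ValueError("the prior probabilities must sum to 1")
    return sum(likelihoods[name] * priors[name] for name in priors)


def posterior(priors, likelihoods, k):
    """Bayes' theorem: P(B_k|A) = P(A|B_k) * P(B_k) / P(A)."""
    return likelihoods[k] * priors[k] / total_probability(priors, likelihoods)


plant_share = {"A": 0.3, "B": 0.5, "C": 0.2}           # P(A), P(B), P(C)
defect_rate = {"A": 0.005, "B": 0.0075, "C": 0.0025}   # P(d|A), P(d|B), P(d|C)

p_defective = total_probability(plant_share, defect_rate)       # question a
p_b_given_defective = posterior(plant_share, defect_rate, "B")  # question b

print(p_defective, p_b_given_defective)
```

The posterior probabilities of the three plants sum to 1, which is a quick sanity check on any Bayes calculation of this kind.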
Solution 2. Solution of the problem using a contingency table:
              | Plant A                | Plant B                  | Plant C                | SUM
Defective     | 0.3×0.005 = 0.0015     | 0.5×0.0075 = 0.00375     | 0.2×0.0025 = 0.0005    | 0.00575
Not defective | 0.3 - 0.0015 = 0.2985  | 0.5 - 0.00375 = 0.49625  | 0.2 - 0.0005 = 0.1995  | 0.99425
SUM           | 0.3                    | 0.5                      | 0.2                    | 1
$P(B \mid d) = \frac{P(Bd)}{P(d)} = \frac{0.00375}{0.00575} \approx 0.6522$

Example 5. In a population 32% of the animals have an eye mutation, 56% of them have an ear mutation and 15% of them have both mutations. Are the two mutations independent?
Solution.
$P(\text{eye}) = 0.32$, $P(\text{ear}) = 0.56$, $P(\text{eye} \cap \text{ear}) = 0.15$
The events are independent if $P(\text{eye} \cap \text{ear}) = P(\text{eye})\,P(\text{ear})$.
Since $0.15 \ne 0.32 \times 0.56 = 0.1792$, the mutations are not independent.

Example 6. Are events A and B independent of each other if
a. P(A) = 0.3; P(B) = 0.4; P(AB) = 0
b. P(A) = 0.3; P(B) = 0.5; P(AB) = 0.15
Solution. This equation should be used to check the independence of two events A and B: $P(AB) = P(A)\,P(B)$
a.) Since P(AB) = 0, while P(A) ≠ 0 and P(B) ≠ 0, the events are not independent of each other: $0 \ne 0.3 \times 0.4 = 0.12$. In general, mutually exclusive events (P(AB) = 0) with nonzero probabilities are never independent of each other.
b.) Since P(A)×P(B) = 0.3×0.5 = 0.15 = P(AB), these events are independent of each other: $0.15 = 0.3 \times 0.5$.
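The independence checks of Examples 5 and 6 all follow the same pattern, so they can be run side by side. In this sketch the `cases` dictionary and the `check` helper are hypothetical names; besides comparing P(AB) with P(A)P(B), the helper also reports P(A|B), which equals P(A) exactly when the events are independent.

```python
import math

# (P(A), P(B), P(AB)) for each case; in Example 5, A = eye and B = ear mutation.
cases = {
    "Example 5": (0.32, 0.56, 0.15),
    "Example 6a": (0.3, 0.4, 0.0),
    "Example 6b": (0.3, 0.5, 0.15),
}


def check(p_a, p_b, p_ab):
    """Return (P(A)P(B), P(A|B), True if A and B are independent)."""
    product = p_a * p_b
    p_a_given_b = p_ab / p_b
    return product, p_a_given_b, math.isclose(p_ab, product, abs_tol=1e-12)


results = {name: check(*values) for name, values in cases.items()}

for name, (product, p_a_given_b, is_independent) in results.items():
    print(name, product, p_a_given_b, is_independent)
```

Comparing with a tolerance (`math.isclose`) rather than `==` is a design choice that guards against floating-point rounding when the product is not exactly representable.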
Example 7. 80 patients took part in a clinical trial. 50 of them received an experimental drug, while 30 of them were treated with placebo. During the trial the condition of 30 patients treated with the experimental drug and 5 of those treated with placebo improved. If we randomly select a person whose condition didn't improve during the trial, what is the probability that he/she received placebo?
Solution using a contingency table:
                     | improved    | did not improve | SUM
treated with drug    | 30          | 50 - 30 = 20    | 50
treated with placebo | 5           | 30 - 5 = 25     | 30
SUM                  | 30 + 5 = 35 | 20 + 25 = 45    | 80
$P(\text{didn't improve}) = \frac{45}{80} = 0.5625$
$P(\text{placebo} \mid \text{didn't improve}) = \frac{P(\text{didn't improve} \cap \text{placebo})}{P(\text{didn't improve})} = \frac{25/80}{45/80} = \frac{25}{45} \approx 0.556$
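Example 7 (the clinical trial) starts from counts rather than probabilities, so the conditional probability can be computed exactly with fractions. This is a sketch under the assumption that the four cells of the contingency table are stored in a dictionary named `counts`; all names are hypothetical.

```python
from fractions import Fraction

# Cell counts of the contingency table: (treatment, outcome) -> number of patients.
counts = {
    ("drug", "improved"): 30,
    ("drug", "not improved"): 20,
    ("placebo", "improved"): 5,
    ("placebo", "not improved"): 25,
}

total = sum(counts.values())
not_improved = sum(n for (_, outcome), n in counts.items() if outcome == "not improved")

# P(didn't improve) and P(didn't improve and placebo), both relative to all patients.
p_not_improved = Fraction(not_improved, total)
p_placebo_and_not_improved = Fraction(counts[("placebo", "not improved")], total)

# Definition of conditional probability: P(placebo | didn't improve).
p_placebo_given_not_improved = p_placebo_and_not_improved / p_not_improved

print(p_not_improved, p_placebo_given_not_improved, float(p_placebo_given_not_improved))
```

Because both probabilities share the denominator 80, it cancels, and the result is simply the placebo count divided by the column total, 25/45.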
Example 8. In a town 1/1000 of people suffer from a disease. A medical examination identifies with 99% certainty the condition of the patient, i.e. if he/she is sick, it shows the disease condition in 99% of the cases, and if he/she is healthy, it shows the absence of disease in 99% of the cases. (This is equivalent to saying that the sensitivity and specificity of the diagnostic test are both 99%.)
a.) What is the probability that the medical examination of a randomly selected person will identify the individual as sick?
b.) Let us assume that someone has been found sick by the test. What is the probability that he/she is indeed sick?
Solution a.) We have to use the principle of marginalization and conditional probability. Let us first examine the fraction of the population which has the disease. 99% of these people will be found sick by the test:
$P(\text{pos} \cap D^+) = P(\text{pos} \mid D^+)\,P(D^+) = 0.99 \times \frac{1}{1000}$
(where pos means a positive test result suggesting the disease condition and D+ stands for the presence of the disease).
Then, let us concentrate on the healthy part of the population (999/1000 of the people). 99% of them will be identified as healthy by the test, but 1% will be found to be sick by the test:
$P(\text{pos} \cap D^-) = P(\text{pos} \mid D^-)\,P(D^-) = 0.01 \times \frac{999}{1000}$
In summary:
$P(\text{pos}) = P(\text{pos} \mid D^+)\,P(D^+) + P(\text{pos} \mid D^-)\,P(D^-) = 0.99 \times \frac{1}{1000} + 0.01 \times \frac{999}{1000} = 0.01098$
Solution b.) Using Bayes' theorem we have to find the fraction of true positive test results (i.e. the probability of positive test results in sick persons divided by the probability of all positive test results):
$P(D^+ \mid \text{pos}) = \frac{P(\text{pos} \mid D^+)\,P(D^+)}{P(\text{pos})} = \frac{0.99 \times \frac{1}{1000}}{0.01098} = 0.09016$
Solution 2. Solving both problems using a contingency table:
                      | Disease: present                                     | Disease: absent                  | SUM
Test result: positive | I. 0.99×0.001 = 0.00099 = 9.9×10^-4 = P(TP)          | III. 0.999 - 0.98901 = 0.00999   | a. 0.00099 + 0.00999 = 0.01098 = P(pos)
Test result: negative |                                                      | II. 0.99×0.999 = 0.98901         |
SUM                   | 0.001                                                | 0.999                            |
(P(TP) is the probability of a true positive result, i.e. a positive test in a sick person.)
You have to find the probabilities in the order shown by the Roman numerals. The answer to question a. is the probability in the cell marked a. (shaded in the original table), P(pos) = 0.01098. The answer to question b. is:
$\frac{P(\text{TP})}{P(\text{pos})} = \frac{0.00099}{0.01098} = 0.09016$
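The calculation in Example 8 (the diagnostic test) can be wrapped in a small function to see how strongly the answer to question b. depends on how common the disease is. This is a sketch, not part of the seminar material: the function name is hypothetical, and "positive predictive value" is the usual name for P(D+|pos), a term the notes themselves do not use. The extra prevalence values in the loop are illustrative assumptions.

```python
import math


def positive_test_summary(prevalence, sensitivity, specificity):
    """Return (P(pos), P(D+|pos)) for a diagnostic test.

    P(pos) comes from marginalization over sick and healthy people;
    P(D+|pos), the positive predictive value, comes from Bayes' theorem.
    """
    true_positive = sensitivity * prevalence                # P(pos and D+)
    false_positive = (1 - specificity) * (1 - prevalence)   # P(pos and D-)
    p_pos = true_positive + false_positive
    return p_pos, true_positive / p_pos


# Values of Example 8: prevalence 1/1000, sensitivity and specificity 99%.
p_pos, ppv = positive_test_summary(prevalence=0.001, sensitivity=0.99, specificity=0.99)
print(p_pos, ppv)

# Illustrative assumption: the same test applied at other prevalence values.
for prevalence in (0.001, 0.01, 0.1):
    print(prevalence, positive_test_summary(prevalence, 0.99, 0.99)[1])
```

With a prevalence of 1/1000, healthy people outnumber sick people so heavily that the 1% of false positives among them exceeds the true positives, which is why only about 9% of positive results belong to sick persons.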