Probability in Clinical Assessment, Diagnosis, and Sample Size PDF

Summary

This document provides details on probability in clinical assessment, diagnosis and sample size. It covers topics such as Bayes' Rule, sensitivity, specificity, and the calculation of sample size. The content is likely suitable for undergraduate medical students.

Full Transcript


Unit 3: Probability in clinical assessment, diagnosis and sample size

After receiving a diagnostic test result, the physician may ask, "In view of this test result, how uncertain should I be about this patient?" Fortunately, there is a method for answering this question: the theory of probability. This unit is a primer on applying probability theory to the interpretation of test results and to deciding when to do a test rather than treat or do nothing.

1. Objectives:
At the end of the unit the student is expected to be able to:
A. Be familiar with terms such as sensitivity, specificity, PPV, NPV and Bayes' equation.
B. Calculate the rate of false positives or false negatives given the sensitivity, specificity and prevalence.
C. Calculate the probability that the patient has the disease given the sensitivity, specificity and prior probability.
D. Conduct ROC analysis using the software SPSS.
E. Use nomograms and tables to determine the sample size in the case of diagnostic tests.

2. Topics:
Probability in clinical assessment
A. Bayes' rule
B. Sensitivity and specificity of a test
C. Positive and negative predictive values
D. Area under the ROC curve
E. Best threshold of a continuous test: ROC curve
Sample size determination in some cases
F. Formula in simple situations, nomograms and tables of sample size

3. Presentation:
A. Bayes' rule
Suppose E1, E2, E3 are mutually exclusive and exhaustive events (i.e. E1 ∩ E2 = ∅, E1 ∩ E3 = ∅, E2 ∩ E3 = ∅ and E1 ∪ E2 ∪ E3 = universe), and suppose further that A is an event such that P(A) ≠ 0. Then, for i = 1, 2, 3:

$$P(E_i \mid A) = \frac{P(E_i \cap A)}{P(A)} = \frac{P(A \mid E_i)\,P(E_i)}{P(A \mid E_1)\,P(E_1) + P(A \mid E_2)\,P(E_2) + P(A \mid E_3)\,P(E_3)}$$

Example from the literature: diagnosis of heart disease
Consider the results of an assay of N-terminal pro-brain natriuretic peptide (NT-proBNP) for the diagnosis of heart failure, obtained by Hobbs et al. (2002) in a general population survey of people over 45 years of age and in patients with an existing diagnosis of heart failure, and summarised in the table below.
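As a rough sketch in Python, the rule above can be written for any number of mutually exclusive, exhaustive events. The function name `bayes_posterior` is our own and does not come from the source. It forms the total probability P(A) as the denominator and divides each term P(A | Ei)·P(Ei) by it. The usage lines apply it to two events, D+ and D−, with the counts from the NT-proBNP example worked through next: 35 of 103 patients with heart failure and 7 of 307 without heart failure tested positive.

```python
def bayes_posterior(priors, likelihoods):
    """Return [P(E_i | A) for each i], given P(E_i) and P(A | E_i).

    Assumes the events E_i are mutually exclusive and exhaustive
    (so the priors sum to 1) and that P(A) is non-zero.
    """
    if len(priors) != len(likelihoods):
        raise ValueError("need exactly one likelihood per event")
    if abs(sum(priors) - 1.0) > 1e-9:
        raise ValueError("priors of exhaustive events must sum to 1")
    # Denominator: total probability P(A) = sum_j P(A | E_j) * P(E_j)
    p_a = sum(p * lik for p, lik in zip(priors, likelihoods))
    if p_a == 0:
        raise ValueError("P(A) must be non-zero")
    # Each posterior is its own numerator P(A | E_i) * P(E_i) divided by P(A)
    return [p * lik / p_a for p, lik in zip(priors, likelihoods)]


# Two exhaustive events: D+ (heart failure) and D- (no heart failure); A is T+.
n_total = 410
priors = [103 / n_total, 307 / n_total]  # P(D+), P(D-)
likelihoods = [35 / 103, 7 / 307]        # P(T+ | D+), P(T+ | D-)
posterior = bayes_posterior(priors, likelihoods)
print(f"P(D+ | T+) = {posterior[0]:.3f}")
```

The same function accepts three or more events unchanged, which is the three-event form stated above.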
Heart failure was identified when NT-proBNP > 36 pmol/l.

Table: Results of NT-proBNP assay in the general population over 45 and those with a previous diagnosis of heart failure (after Hobbs et al., 2002)

                          Heart failure (D+)   No heart failure (D−)   Total
NT-proBNP > 36 (T+)              35                      7               42
NT-proBNP ≤ 36 (T−)              68                    300              368
Total                           103                    307              410

Calculate P(D+ | T+).

Solution: Let
T+ = the test result is positive; T− = the test result is negative;
D+ = the individual has the disease; D− = the individual is healthy (D+ ∪ D− = universe).
P(D+) is the a priori (pre-test) probability of disease in the population, i.e. the prevalence.
P(D+ | T+) is the a posteriori (post-test) probability; it reflects the process of making a clinical judgement.

$$P(D^{+} \mid T^{+}) = \frac{P(D^{+} \cap T^{+})}{P(T^{+})} = \frac{P(T^{+} \mid D^{+})\,P(D^{+})}{P(T^{+} \mid D^{+})\,P(D^{+}) + P(T^{+} \mid D^{-})\,P(D^{-})}$$

$$P(D^{+} \mid T^{+}) = \frac{(35/103)(103/410)}{(35/103)(103/410) + (7/307)(307/410)} = \frac{0.085}{0.085 + 0.017} = 0.83$$

B. Sensitivity and specificity of a diagnostic test
The sensitivity (SEN) of a diagnostic test is the probability that the test will give a positive result for a person who has the disease, i.e. the accuracy of detecting the disease when the disease is present. (Sensitivity is expressed as a percentage.)
The specificity (SPE) of a diagnostic test is the probability that the test will give a negative result for a person who is healthy, i.e. the accuracy of identifying the healthy condition when the disease is absent. (Specificity is expressed as a percentage.)

$$\text{sensitivity} = P(T^{+} \mid D^{+}) = \frac{TP}{TP + FN} = \frac{35}{35 + 68} = 0.340 \text{ or } 34\%$$

$$\text{specificity} = P(T^{-} \mid D^{-}) = \frac{TN}{FP + TN} = \frac{300}{7 + 300} = 0.977 \text{ or } 98\%$$

where TP, FN, FP and TN are the numbers of true positives, false negatives, false positives and true negatives in the table.

Sensitivity and specificity are useful statistics because they yield consistent results for the diagnostic test across patient groups with different disease prevalence. This is an important point: sensitivity and specificity are characteristics of the test, not of the population to which the test is applied.

C. Positive and negative predictive values
Positive predictive value (PPV)
The PPV is the test's ability to identify true positives, i.e.
individuals who actually have the disease; equivalently, it is the probability that an individual has the disease when the test result is positive. (Positive predictive value is expressed as a percentage.)

$$PPV = P(D^{+} \mid T^{+}) = \frac{P(D^{+} \cap T^{+})}{P(T^{+})}$$

Negative predictive value (NPV)
The NPV is the ability of the test to identify true negatives, i.e. individuals who do not actually have the disease; equivalently, it is the probability that an individual is healthy when the test result is negative. (Negative predictive value is expressed as a percentage.)

$$NPV = P(D^{-} \mid T^{-}) = \frac{P(D^{-} \cap T^{-})}{P(T^{-})}$$

Example: positive predictive value
The prevalence of a disease is 1 in 1000, and there is a test that can detect it with a sensitivity of 100% and a specificity of 95%. What is the probability that a person has the disease, given a positive test result?

Solution:
Method 1: Bayes' rule

$$P(D^{+} \mid T^{+}) = \frac{P(T^{+} \mid D^{+})\,P(D^{+})}{P(T^{+} \mid D^{+})\,P(D^{+}) + P(T^{+} \mid D^{-})\,P(D^{-})} = \frac{sen \times prev}{sen \times prev + (1 - spec) \times (1 - prev)}$$

$$= \frac{1 \times 0.001}{1 \times 0.001 + (1 - 0.95) \times (1 - 0.001)} = \frac{0.001}{0.001 + 0.04995} = \frac{0.001}{0.05095} = \frac{1}{50.95} = 0.0196 \approx 0.02$$

Method 2: table of expected counts in 1000 people

                     Disease
Test result      D+        D−        Total
T+               1         49.95     50.95
T−               0         949.05    949.05
Total            1         999       1000

P(D+ | T+) = 1 / 50.95 = 0.0196 ≈ 0.02

Method 3: tree diagram (sen = 100%, spec = 95%, prev = 0.001)
1000 people
   D+: 1     ->  T+: 1        T−: 0
   D−: 999   ->  T+: 49.95    T−: 949.05
All T+ = 1 + 49.95 = 50.95, so P(D+ | T+) = 1 / 50.95 = 0.02

Notes:
If the disease studied is rare (that is, its incidence is between 0.001 and 0.002), the sensitivity must be high so that these few cases are detected.
If the disease studied is fatal and its prognosis improves when it is detected early, the sensitivity of the test must be very high.
Sensitivity and specificity are constant properties of the test, whereas the positive predictive value increases with increasing prevalence and, conversely, the negative predictive value decreases.
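The quantities in sections B and C can be checked with a small Python sketch. The class `TwoByTwo` and the function `expected_table` are hypothetical helpers for illustration. The first computes sensitivity, specificity, PPV and NPV from 2×2 counts, here the NT-proBNP counts. The second builds the expected counts of Method 2 from sensitivity, specificity and prevalence. The final loop uses illustrative values that are not from the source (sensitivity 90%, specificity 95%) to show the prevalence effect described in the notes. Sensitivity is set below 100% there because a perfectly sensitive test has no false negatives, so its NPV would stay at 1 at every prevalence.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts from a 2x2 table of test result (T+/T-) against disease (D+/D-)."""
    tp: float  # T+ and D+ (true positives)
    fp: float  # T+ and D- (false positives)
    fn: float  # T- and D+ (false negatives)
    tn: float  # T- and D- (true negatives)

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)  # P(T+ | D+)

    @property
    def specificity(self) -> float:
        return self.tn / (self.fp + self.tn)  # P(T- | D-)

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)  # P(D+ | T+)

    @property
    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)  # P(D- | T-)


def expected_table(sen: float, spec: float, prev: float, n: float = 1000) -> TwoByTwo:
    """Expected counts among n people for a given sensitivity, specificity and prevalence."""
    diseased = n * prev
    healthy = n - diseased
    return TwoByTwo(
        tp=sen * diseased,
        fp=(1 - spec) * healthy,
        fn=(1 - sen) * diseased,
        tn=spec * healthy,
    )


# NT-proBNP counts as used in the worked example
nt = TwoByTwo(tp=35, fp=7, fn=68, tn=300)
print(f"sensitivity={nt.sensitivity:.3f} specificity={nt.specificity:.3f} PPV={nt.ppv:.2f}")

# Rare-disease example: sen = 100%, spec = 95%, prevalence = 1/1000
rare = expected_table(sen=1.0, spec=0.95, prev=0.001)
print(f"rare disease: PPV={rare.ppv:.4f}")

# Illustrative (assumed) test: sen = 90%, spec = 95%; only prevalence varies
for prev in (0.001, 0.01, 0.1, 0.5):
    t = expected_table(sen=0.90, spec=0.95, prev=prev)
    print(f"prevalence={prev:<6} PPV={t.ppv:.3f} NPV={t.npv:.4f}")
```

Run as written, the first two lines should match the text (sensitivity 0.340, specificity 0.977 and PPV 0.83 for NT-proBNP; PPV 0.0196 for the rare disease). The loop then shows the PPV rising and the NPV falling as prevalence increases, while sensitivity and specificity stay fixed.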
Therefore, high-risk communities are considered the best target for screening programmes. The positive predictive value also increases with increasing sensitivity and specificity of the test.

D. Area under the ROC curve
One way to visualise sensitivity and specificity together is to plot the ROC (receiver operating characteristic) curve. In this plot the y-axis represents the sensitivity and the x-axis represents 1 − specificity. The ROC curve is useful for evaluating and comparing the performance of classification models whose response variable is binary, such as the presence or absence of a disease.

Using the ROC curve: since many test results are reported as continuous or ordinal variables, a reference value (cut-off value) for diagnosis must be set. Whether a disease is present is then determined from that cut-off value.

Example using IBM SPSS: a cancer marker is measured in a total of 20 patients to determine the presence of cancer. If the measured value of the cancer marker is equal to or greater than the cut-off value (reference value), the patient is classified as having cancer. If the measured value is less than the reference value, the patient is classified as normal. Suppose three patients had biopsy-confirmed cancer diagnoses (the gold standard method). The data are entered into IBM SPSS.

Conduct the ROC analysis: Analyze > ROC Curve

Results: the table shows that the area under the curve (AUC) is significantly different from 0.5 (p = 0.016). This means that tumormarker classifies the patients successfully: area = .867, p < .05.

Area Under the Curve
Test Result Variable(s): tumormarker
Area    Std. Error    Asymptotic Sig.    Asymptotic 95% Confidence Interval
                                         Lower Bound    Upper Bound
.867    .096          .016               .000           1.000

The closer the ROC curve is to the upper-left corner, the higher the overall accuracy of the diagnostic test (Zweig & Campbell, 1993).
E.
Best threshold of a continuous test: ROC curve
The cut score that yields 0.80 sensitivity and 0.80 specificity is 46.3. Based on the figure, the maximal value attainable on both metrics at the same time is about 0.80, associated with a score of 46.3.

Sample size determination in some cases
F. Formula in simple situations, nomograms and tables of sample size
This nomogram is for the 95% confidence level and consists of five parallel lines. The first line depicts the anticipated sensitivity or specificity of the diagnostic test, which can vary from 0.70 to 0.97. A test with anticipated sensitivity or specificity below 0.70 may not be worth investigating. The minimum value of L (the absolute precision on either side of the anticipated sensitivity or specificity) is taken as 0.03. The second line depicts the number of subjects required at 0.03 and 0.05 absolute precision, and the third line depicts the number of subjects for 0.07 and 0.10 absolute precision. The fourth and fifth lines are prevalence lines and represent the expected prevalence of disease. The fourth line is used for L = 0.03 or 0.05 and the fifth for L = 0.07 or 0.10.

Suppose the researcher selects an anticipated sensitivity Sen = 0.80 and a precision of 0.03 with a 95% confidence level (two-tailed), i.e. Sen can range from 0.77 to 0.83, and an expected prevalence of 0.20. Place a ruler joining the point 0.80 on the anticipated sensitivity/specificity line to the point 0.20 on the prevalence line for 0.03 absolute precision. Then read the required sample size from the number-of-subjects line for 0.03 absolute precision. In our example, the number of subjects required is nearly 3450, as shown in the figure. By formula, the exact value is 3415, a difference of nearly 1%.
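The exact value quoted above can be reproduced with a short Python sketch. It assumes the standard sample-size formula for estimating a proportion, inflated by prevalence: n = Z²·Sen(1 − Sen) / (L²·Prevalence). For specificity, Prevalence is replaced by (1 − Prevalence), in line with the nomogram instruction to use 1 − prevalence. The function names are hypothetical, and results are rounded up to the next whole subject. Z = 1.96 is used as in the text.

```python
import math


def n_for_sensitivity(sen: float, precision: float, prevalence: float, z: float = 1.96) -> int:
    """Total subjects needed to estimate sensitivity to within +/- precision.

    First find the number of diseased subjects needed to estimate a
    proportion `sen`, then divide by prevalence to get the total sample.
    """
    n_diseased = z**2 * sen * (1 - sen) / precision**2
    return math.ceil(n_diseased / prevalence)


def n_for_specificity(spe: float, precision: float, prevalence: float, z: float = 1.96) -> int:
    """Total subjects needed to estimate specificity to within +/- precision.

    Same idea, but the healthy fraction of the sample is (1 - prevalence).
    """
    n_healthy = z**2 * spe * (1 - spe) / precision**2
    return math.ceil(n_healthy / (1 - prevalence))


# Worked examples from the text (95% confidence, so z = 1.96)
print(n_for_sensitivity(sen=0.80, precision=0.03, prevalence=0.20))
print(n_for_specificity(spe=0.80, precision=0.05, prevalence=0.20))
```

For the two worked examples in this section this gives 3415 and 308, the exact values quoted in the text. The nomogram readings (about 3450 and 300) are approximations of these.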
Verifying by formula: for sensitivity at the 95% confidence level (α = 0.05),

$$n = \frac{Z_{1-\alpha/2}^{2} \times Sen \times (1 - Sen)}{L^{2} \times Prevalence}$$

where Z_{1−α/2} = 1.96, Sen = 0.80, L² = 0.03 × 0.03 = 0.0009 and prevalence = 0.20. Substituting all values into the formula:

$$n = \frac{1.96^{2} \times 0.80 \times (1 - 0.80)}{0.0009 \times 0.20} = \frac{0.6147}{0.00018} = 3414.8 \approx 3415$$

To find the required sample size for estimating specificity, first subtract the expected prevalence from 1. Then place the ruler joining the anticipated specificity to the (1 − prevalence) value on the prevalence line of the required precision. For example, suppose Spe = 0.80, precision = 0.05 with a 95% confidence level, and prevalence is 0.20. Join the point Spe = 0.80 with the point (1 − 0.20) = 0.80 on the prevalence line for 0.05 absolute precision, and read the sample size from the number-of-subjects line for 0.05 absolute precision. This is nearly 300. By calculation, the exact value is 308, so the difference is now about 3%.

The final sample size depends on the interest of the researcher. If sensitivity and specificity are equally important for the study, determine the sample size for sensitivity and for specificity separately. The final sample size of the study is then the larger of the two. Sometimes, however, the researcher is more interested in sensitivity than in specificity. In that case, the final sample size is based on sensitivity only.

END

603 CLS Practical Activity No. 3
Probability in clinical assessment, diagnosis and sample size
……………………………………………………………………………………………………Mark
Name: _________________________________________________________________________
Student Number: ________________________________________Date: __________________

Sample size calculation based on the area under the ROC curve
Q1 Example: a study to evaluate the accuracy of CSF lactate in discriminating bacterial meningitis from enteroviral meningitis.
Assume that we will conduct a study to estimate the accuracy of CSF lactate in discriminating bacterial meningitis from enteroviral meningitis.
Therefore, we will enrol a group of patients with acute meningitis, including those with bacterial meningitis and those with enteroviral meningitis. For each CSF specimen, bacterioscopy, a bacterial antigen latex agglutination test and CSF bacterial culture will be performed as the standard (reference) test. CSF lactate will then be measured (new test). A previous study by Manomaivat et al. showed that the AUC of CSF lactate was 94% for discriminating bacterial meningitis from enteroviral meningitis. The ratio between negative and positive cases was 525/662. Calculate the sample size required for our new study using, for example, the MedCalc software.
Solution:

Q2
Suppose 10% of the children in a country are malnourished and 5% are anaemic, and we know that 50% of the anaemic children are also malnourished.
1. Are malnutrition and anaemia independent?
2. What is the probability of anaemia in malnourished children?
3. What is the probability of anaemia in non-malnourished children?
4. What is the probability of having both conditions together?
5. What is the overall probability of having neither condition?
Solve using the table.

Solution:
                     Anaemia
Malnutrition         Yes (D+)    No (D−)    Total
Yes (T+)
No (T−)
Total                                       10,000

1. Anaemia and malnutrition are not independent. 50% of children with anaemia are malnourished, whereas only 10% of all children are malnourished, so P(malnutrition | anaemia) ≠ P(malnutrition).
2. The probability of anaemia among malnourished children = ………….. = ………….%
3. The probability of anaemia among non-malnourished children = ……………….. = ………….%
4. The overall probability of being both anaemic and malnourished = …………… = …………%
5. The overall probability of having neither condition = ………………… = …………….%
