Evidence-Based Medicine: Critical Appraisal of Articles on Diagnosis
2007
Damon A. Schranz, DO
Summary
This article provides an introduction to the critical appraisal of articles on diagnosis. The authors propose a systematic process for evaluating the validity and importance of studies of diagnostic tests. These methods can help physicians practice evidence-based medicine and improve the quality of their patient care.
Full Transcript
SPECIAL COMMUNICATION

Evidence-Based Medicine, Part 3. An Introduction to Critical Appraisal of Articles on Diagnosis

Damon A. Schranz, DO
Michael A. Dunn, OMS III, MBA

This article provides an introductory step-by-step process to appraise an article on diagnosis. The authors introduce these principles using a systematic approach and case-based format. The process of assessing the validity of an article on diagnosis, determining its importance, and applying it to an individual patient is reviewed. The concepts of study population homogeneity, reference and criterion standards, and completeness are discussed to help physicians determine an article’s validity. Instruction on calculating prevalence, sensitivity, specificity, and positive and negative predictive values and likelihood ratios is provided and applied to a hypothetical clinical scenario. Study generalizability and the role of patient values, expectations, and concerns are also addressed. The skills learned from appraising an article on diagnosis in the manner outlined provide a solid basis for life-long learning and improved patient care.

[Editor’s note: This article is part 3 of a six-article series intended to introduce the principles of evidence-based medicine (EBM) to busy clinicians, physician residents, and medical students. Because the application of EBM is a career-long process, further training is needed beyond the information provided within this article and series. A foundation of knowledge about research methods is critical in understanding EBM; however, such details, though introduced, are beyond the scope of this series.]

J Am Osteopath Assoc. 2007;107:304-309

JAOA Vol 107 No 8 August 2007. Schranz and Dunn, Special Communication. Downloaded From: http://jaoa.org/ on 11/27/2016

From the Department of Family Medicine at the University of North Texas Health Science Center, Texas College of Osteopathic Medicine, in Fort Worth.
Address correspondence to Damon A. Schranz, DO, Department of Family Medicine, Texas College of Osteopathic Medicine, University of North Texas Health Science Center, 855 Montgomery St, Patient Care Center, 2nd Fl, Fort Worth, TX 76107-2553.
E-mail: [email protected]
Submitted February 14, 2007; revision received June 14, 2007; accepted June 18, 2007.

Every medical school graduate is taught how to assess and diagnose a patient’s condition. A diagnostic test and its results are important tools that help guide physicians to the appropriate diagnosis by revealing the likelihood of whether or not a patient has a specific condition.[1] Results of the best diagnostic tests remove all doubt that a patient has (or does not have) an identifiable disease or disorder. However, not all diagnostic tests are equal in their ability to differentiate the presence, absence, or severity of a particular disease or condition present in a patient. Therefore, clinicians need a method for selecting the best test to meet a particular patient’s needs.[2] Evidence-based medicine (EBM), the practice of appraising the literature in a time-efficient manner to answer a clinical question about, and for, the patient,[3] is such a method.

In this article, we present a strategy for busy clinicians, physician residents, and medical students to critically assess the medical literature on diagnosis. In-depth details of research methods are beyond the scope of this introductory series on EBM. Readers are encouraged to seek further training on these topics with supplemental learning opportunities and continuing medical education. Finally, the clinical scenario described has been simplified to provide readers with an illustrative example for the general concepts introduced.

Searching the Evidence

To find an article that is appropriate to review for the purpose of better establishing a patient’s diagnosis, physicians can approach searching the evidence in two ways. In general, physicians who practice EBM search the evidence for an article that contains the information sought. However, physicians in the habit of summarizing articles relevant to their practice can first refer to their critically appraised topics (CATs) when faced with a clinical question.

Critically Appraised Topics

Similar to the index card method of recording researched information, CATs are a personal method of documenting the results of any article in medical literature for a specific clinical problem.[3] These records are simply summaries of a study and its results that a physician can create for later retrieval, review, and reuse (Figure 1). The most thorough CATs consist of the article title, the clinical “bottom line,” the clinical question, a summary of the results, comments, the date the study was published, and any relevant citations.[3] A more detailed description of these components is available in Figure 2.[4] Physicians may choose to share their CATs with colleagues, in which case physicians should also include their name or initials as the CAT appraiser.

A CAT is not a systematic review and should not be considered a practice guideline because the information found in it may not be authoritative.[3] However, physicians will begin to refine and improve their EBM skills after summarizing varying clinical issues in this fashion.[3]

Clinical Scenario

A 58-year-old man visits your primary care clinic for the first time in 17 years. He states that, although he feels fine and has no complaints, he is concerned that his blood pressure may be elevated. He relays to you that, while shopping at a supermarket last week, he had his blood pressure measured by the store’s blood pressure machine. He was alarmed to see a reading of 182/104 mm Hg and wanted to verify that his blood pressure was indeed that high.

During a routine physical examination you confirm that he is hypertensive but in otherwise good physical health. Naturally, your first reaction is to help the patient relieve his hypertension, but you are unsure about the appropriate goal.

You remember generating a critically appraised topic (CAT) from a study that demonstrated a marked reduction in major cardiovascular events when diastolic targets of 80 mm Hg were achieved. After reviewing this CAT, you realize that the benefits of the lower diastolic pressure applied only to patients with diabetes. However, the CAT also states that the US Preventive Services Task Force recommends screening hypertensive patients for type 2 diabetes mellitus.

You discuss the finding of the study with your patient along with the pros and cons of screening for diabetes. He agrees to think it over and return in a week for a follow-up visit. After writing a prescription to initiate treatment of his hypertension, you make a note to review the literature regarding diagnostic tests for type 2 diabetes mellitus in asymptomatic patients.

(continued)

Figure 1. Clinical scenario.

Title
Description: Topic of clinical interest in reviewed article
Sample CAT: Tight diastolic blood pressure (BP) control reduces risk of cardiovascular disease (CVD) in type 2 diabetes mellitus

Clinical Bottom Line
Description: Major findings of the study as they relate to the topic of interest
Sample CAT: Hypertensive patients with diabetes who maintained a target diastolic BP of 80 mm Hg had a reduction in CVD and all-cause mortality compared with those with a target of 90 mm Hg. There were no CVD or mortality differences in patients without diabetes.

Clinical Question
Description: Information sought regarding topic of interest
Sample CAT: In a patient with type 2 diabetes mellitus and hypertension, would reduction in BP reduce the risk of CVD?

Study Design
Description: Pertinent information regarding the study design
Sample CAT: Double-blind, randomized controlled trial. Target disorder and criterion standard: diastolic BP, echocardiogram/Independent Clinical Event Committee. Patients: 18,790 patients from 26 countries aged 50 to 80 years with diastolic BP between 100 and 115 mm Hg.

Evidence Summary
Description: A summary of the article’s major findings
Sample CAT: Target BP: <80 mm Hg. Events per year: 11.9. Comparison (mm Hg): 90 vs 80. Relative risk: 2.06. P value: <.001.

Comments
Description: Suggestions for using the study’s findings
Sample CAT: Valid article to use in clinical practice

Publication Date
Sample CAT: 1998

Citations
Description: Citation to referenced article and other studies, if applicable
Sample CAT: Hansson L, Zanchetti A, Carruthers SG, Dahlof B, Elmfeldt D, Julius S, et al, for the HOT Study Group. Effects of intensive blood-pressure lowering and low-dose aspirin in patients with hypertension: principal results of the Hypertension Optimal Treatment (HOT) randomised trial. Lancet. 1998;351:1755-62.[4] US Preventive Services Task Force, Screening for Type 2 Diabetes Mellitus in Adults: Recommendation and Rationale, 2003

Figure 2. Example of the information that should be included in a critically appraised topic (CAT).

Systematic Reviews vs Individual Articles

When searching the evidence for a clinically relevant article on diagnosis, systematic reviews and meta-analyses are the most authoritative types of reports.[3] These studies, which critically appraise and summarize multiple similar studies concerning a common medical problem, are not as numerous as individual articles. However, such reviews are only as good as the individual studies they include. A physician must be vigilant in critically assessing a systematic review or meta-analysis before putting its recommendations into practice. For guidelines on how to appraise such review articles, a handbook is available on The Cochrane Collaboration Web site (http://www.cochrane.org/resources/handbook/Handbook4.2.6Sep2006.pdf).

In the absence of a systematic review or meta-analysis, individual articles are often the only source of new information available to clinicians. Assessing these individual articles (Figure 3) is the focus of this paper.

Clinical Scenario (continued)

A PubMed search for the terms “performance of screening tests for undiagnosed diabetes” returns 10 items. You select the article with the title that best reflects your inquiry.

(continued)

Figure 3. Clinical scenario (continued).

Validity of Articles on Diagnosis

To ascertain the validity of an individual article, physicians need to determine not only if the study’s results and conclusions were accurately deduced but also if the methods used to arrive at the conclusions were free of error and bias. This is the most crucial step in evaluating an article. If its validity is questionable, the article’s results cannot be confidently interpreted.[2,5,6] Physicians may use the following questions[3] to help them determine an article’s validity:

■ Was there an independent and blind comparison to a reference standard?

A reference standard is a method of defining the presence or absence of the disease or condition in question.[7] To determine whether a diagnostic test is effective, a reference standard is needed for comparison.[8] If a reference standard is not used in the study, the benefit of the diagnostic test cannot be ascertained. In addition, not all reference standards are equal or objective.[9] For example, reference standards for psychiatric disorders may not be clear-cut and objective, and other standards, such as biopsies, rely on expert interpretation. The best reference standard to evaluate the effectiveness of a diagnostic test is the criterion standard, which is considered the diagnostic model for identifying a specific disease or condition.[3]

The study’s data collection and analysis must be carefully planned and executed to ensure that unconscious (or conscious) biases are maximally reduced.[3] In other words, in clinical investigations, those who perform tests and those who interpret the results should be independent of one another. Both groups of researchers should be blinded to the diagnostic and reference standard test results.

■ Was the diagnostic test evaluated in subjects similar to patients seen in practice?

Because physicians practice in a wide range of geographic areas and within various medical specialties, the patients they treat have distinct characteristics. For a study to be applicable to a physician’s patient, the study’s subjects need to have similar baseline characteristics. A physician who evaluates the applicability of an article in this way maximizes the likelihood that a study’s results can be generalized to his or her patient.

■ Was the reference standard obtained regardless of the diagnostic test’s result?

Assessment of a diagnostic test against a reference standard (preferably the criterion standard) requires that both tests are performed and their effectiveness compared, which should not be an issue if the comparison study is truly independent and blinded. One exception to the rule is a negative noninvasive diagnostic test result coupled with an invasive or risky reference standard.[9] In this situation, the investigators would be hesitant to perform the invasive reference standard if the noninvasive diagnostic test results were negative. Studies can be designed to reduce this risk by creating, for example, a method to screen persons who do not have the target disorder, thus eliminating the need to verify the noninvasive negative result with an invasive test. However, a study should be viewed with suspicion if it does not independently perform the reference standard test and diagnostic test on every participant, even if the reference standard was considered invasive or risky.[9]

Study Results

Now that a diagnostic article of interest is found and is deemed to have merit, one can evaluate its results to determine its general usefulness (Figure 4). Although this step of the appraisal process for articles on diagnosis appears intimidating, it only requires basic mathematical and statistical skills. With practice, these invaluable calculations will become second nature.

Clinical Scenario (continued)

You review the article and find it to be a well-designed study. However, as the authors indicate, the study does have limitations, which must be taken into consideration. Although the criterion standard was used as the reference standard, a positive result requires duplication in order to make the diagnosis of diabetes.

In addition, the study population consisted of a convenience sample of volunteers and therefore it did not constitute a randomized controlled trial. Although this population appears to be congruent with your clinic’s demographics, it may be skewed by undefined characteristics, such as a person’s willingness to participate in a medical trial. Nonetheless, a diagnostic study for diabetes is quite rare, so you decide that it is in your patient’s best interest to consider it further.

(continued)

Figure 4. Clinical scenario (continued).

■ Does the diagnostic test help determine who has the target disorder?

Research articles present information to emphasize the authors’ point of interest. Although this focus may be different from the reader’s particular interest, the information sought can usually be found within the article. To determine the diagnostic discrimination of a test, or the statistical assessment of how a diagnostic test compares with a reference standard, critical readers must calculate the predictive values and rates, the sensitivity, and the specificity (Table).[10]

Table. Diagnostic Test Results of Type 2 Diabetes Mellitus Compared With the Criterion Standard (N=1471) and Statistical Assessment of the Data

Test Results by No. of Patients According to Criterion Standard

Diagnostic Test Result (Blood Glucose, mg/dL) | Label | With Type 2 Diabetes Mellitus | Without Type 2 Diabetes Mellitus
True positive (≥120) | a | 118 |
False positive (≥120) | b | | 158
False negative (<120) | c | 39 |
True negative (<120) | d | | 1156
Totals | | 157 | 1314

Statistical Assessment | Equation* | Equation With Data | Result
Prevalence | (a+c)/(a+b+c+d) | 157/1471 | 0.11 or 11%
Positive predictive value | a/(a+b) | 118/276 | 0.43 or 43%
Negative predictive value | d/(c+d) | 1156/1195 | 0.97 or 97%
Sensitivity | a/(a+c) | 118/157 | 0.75 or 75%
Specificity | d/(b+d) | 1156/1314 | 0.88 or 88%
Positive likelihood ratio | sensitivity/(1-specificity) | 0.75/0.12 | 6.25
Negative likelihood ratio | (1-sensitivity)/specificity | 0.25/0.88 | 0.28

*a=118; b=158; c=39; d=1156
Source: Rolka DB, et al. Diabetes Care. 2001;24:1899-1903.[10]

Based on the example in the Table,[10] the prevalence of type 2 diabetes mellitus in the study population is 11%.[10] If the characteristics of the physician’s patient are similar to those of the study’s population, then an estimate of the patient’s pretest probability (the probability that a patient has the disease before the diagnostic test is performed) for having undiagnosed diabetes may be close to 11%. The positive predictive value, which is the probability that a study participant has the disease if the diagnostic test result is positive, was 43%. The probability of a patient not having type 2 diabetes mellitus after a negative test result, or the negative predictive value, was 97%. Therefore, within the study’s population,[10] a positive diagnostic test result shifted the probability of having type 2 diabetes mellitus from 11% (pretest) to 43% (posttest), which is clinically significant.

Sensitivity, specificity, and positive (LR+) and negative (LR-) likelihood ratios are additional parameters to help physicians determine the usefulness of a test’s diagnostic abilities. Sensitivity is the proportion of study participants with the disease, as determined by the criterion or reference standard, whose diagnostic test result is positive (ie, the true positives divided by all participants with the disease). Specificity is the proportion of study participants without the disease, as determined by the criterion or reference standard, whose diagnostic test result is negative (ie, the true negatives divided by all participants without the disease). These parameters can be used to calculate the diagnostic test’s LR+ and LR-, which compare the probability of getting a positive or negative test result if the patient has the condition with the probability of getting that result if the patient does not have the condition.

According to the Table,[10] the LR+, the ratio of the true positive rate to the false positive rate, means that a positive test result would be 6.25 times as likely in someone with type 2 diabetes mellitus as in someone without type 2 diabetes mellitus. Likewise, in the referenced study,[10] the LR-, the ratio of the false negative rate to the true negative rate, means that a negative test result would be 0.28 times as likely in someone with type 2 diabetes mellitus as in someone without type 2 diabetes mellitus.

■ How can a diagnosis be determined?

An interesting and useful feature of high sensitivity and specificity values is that they can help rule out or rule in a diagnosis, respectively. Mnemonic devices can be used to help one remember how to use specificity and sensitivity to make a clinical decision.

▫ With a high sensitivity (Sn), a negative (N) result effectively rules out the diagnosis (SnNout)[3]
▫ With a high specificity (Sp), a positive (P) result effectively rules in the diagnosis (SpPin)[3]

For example, a positive result on a rapid streptococcal antigen test rules in (SpPin) the diagnosis of streptococcal pharyngitis, and a negative D-dimer test result effectively rules out (SnNout) the diagnosis of deep venous thrombosis (Figure 5).

Clinical Scenario (continued)

You create a table to consolidate the article’s findings that are related to your clinical question. This table assists your calculation of prevalence, pre- and posttest probabilities, sensitivity, specificity, and positive and negative likelihood ratios. In reviewing these statistics, you determine that the diagnostic test is clinically important.

(continued)

Figure 5. Clinical scenario (continued).

Practical Use

Now that the article has been reviewed for its validity and relevance to the physician’s patient and it is determined to have significant clinical applicability, one still needs to answer a fundamental question: Can these results benefit the patient?[3] If a physician cannot confidently answer “yes,” the article must be placed aside and a new search started. The potential for “wasted time” is the main reason physicians often do not apply this step. However, the real waste of time (not to mention a potential for harm) would result from implementing results that cannot be expected to help the patient or that are unrealistic to apply in the clinical setting.

■ Is the diagnostic test available and affordable in the physician’s clinical setting?

The diagnostic test must be available to a physician before he or she can order it. In addition, the diagnostic test must be affordable to patients or covered by their health insurance. Applying the right diagnostic tool at the appropriate time assists one’s efforts in reducing healthcare costs by reducing the number of unnecessary tests.

■ How can the physician determine a specific patient’s pretest probability of having the target disorder?

One method for determining a patient’s pretest probability of having the target disorder has already been discussed: using the study’s inherent disease prevalence. This inherent prevalence, however, is appropriate only if the physician’s patient is similar to those in the study’s population. Other means of determining a patient’s pretest probability include the physician’s clinical experience, regional and national statistics, and studies specifically developed to determine pretest probabilities for the target disorder. All of these methods have merit and should be considered. The one that is chosen should be based on available data and their applicability to the particular patient.

■ Is the pre- to posttest probability shift valuable to the specific patient?

The purpose of performing a diagnostic test is to confirm or rule out a diagnosis. Therefore, the shift from pre- to posttest probability of the diagnostic test must be clinically useful; if it is not, the test result will not be valuable to the patient or the decision-making process.[11]

The shift in pretest probability to the positive predictive value (or posttest probability) for a given diagnostic test is an effective discriminator for choosing between competing tests. Large LR+ values and small LR- values are indicative of significant shifts. For example, a diagnostic test that provides an LR+ or LR- of 1.0 will not shift the pretest probability at all.[1,3] Therefore, it would be wasteful to perform the test because its results would not benefit the patient or the clinical decision-making process. On the other hand, a test with an LR+ of 10.0 would shift a pretest probability of 50% to a positive predictive value of about 91%, which would be clinically useful.[1,3]

In addition to the test’s pre- to posttest shift, one needs to consider the cost and invasiveness of the tests when choosing between competing diagnostic tests. When these competing elements are considered and balanced with the patient’s needs and informed consent, physicians can be confident that the best evidence is being applied in the most efficient and effective manner (Figure 6).

Clinical Scenario (continued)

After critically assessing the article, you determine that it would be premature to use the recommended diagnostic test on your patient. However, the study shows great promise for this method and you decide to create a critically appraised topic to document its findings. You store it for future reference and remind yourself to continue to look for developments concerning this potential diagnostic tool.

Figure 6. Clinical scenario (continued).

Conclusion

Although most clinicians are already incorporating EBM principles in their practices, often instinctively, some physicians may require a more organized approach to integrating this relatively new model of self-education. Improved comfort levels and true expertise in the practice of EBM are the result of additional education, repetition, and self-assessment. The principles of EBM allow physicians to stay informed while also improving the quality of the information communicated to patients during patient encounters. The systematic approach that is used to appraise an article on diagnosis is but one step in practicing EBM. Remember, the goal is always to provide the best care possible to patients, using one’s clinical expertise to address patient values and expectations for treatment.

References

1. Jaeschke R, Guyatt GH, Sackett DL, for the Evidence-Based Medicine Working Group. Users’ guide to the medical literature. III. How to use an article about a diagnostic test. B. What are the results and will they help me in caring for my patients? JAMA. 1994;271:703-707.
2. Jaeschke R, Guyatt GH, Sackett DL, for the Evidence-Based Medicine Working Group. Users’ guide to the medical literature. III. How to use an article about a diagnostic test. A. Are the results of the study valid? JAMA. 1994;271:389-391.
3. Straus SE, Richardson WS, Glasziou P, Haynes RB. Evidence-Based Medicine: How to Practice and Teach EBM. 3rd ed. St Louis, Mo: Churchill Livingstone; 2005.
4. Hansson L, Zanchetti A, Carruthers SG, Dahlof B, Elmfeldt D, Julius S, et al, for the HOT Study Group. Effects of intensive blood-pressure lowering and low-dose aspirin in patients with hypertension: principal results of the Hypertension Optimal Treatment (HOT) randomised trial. Lancet. 1998;351:1755-1762.
5. Lijmer J, Mol BW, Heisterkamp S, Bonsel GJ, Prins MH, van der Meulen JHP, et al. Empirical evidence of design-related bias in studies of diagnostic tests. JAMA. 1999;282:1061-1066.
6. Bossuyt PMM. The quality of reporting in diagnostic test research: getting better, still not optimal [editorial]. Clin Chem. 2004;50:465-466. Available at: http://www.clinchem.org/cgi/content/full/50/3/465. Accessed July 9, 2007.
7. Mayer D. Essential Evidence-Based Medicine. Cambridge, UK: Cambridge University Press; 2004.
8. Whiting P, Rutjes AWS, Reitsma JB, Glas AS, Bossuyt PMM, Kleijnen J. Sources of variation and bias in studies of diagnostic accuracy: a systematic review. Ann Intern Med. 2004;140:189-202. Available at: http://www.annals.org/cgi/content/full/140/3/189. Accessed July 9, 2007.
9. Knottnerus JA, van Weel C, Muris JWM. Evidence base of clinical diagnosis: evaluation of diagnostic procedures [published correction appears in BMJ. 2002;324:1391]. BMJ. 2002;324:477-480. Available at: http://www.bmj.com/cgi/content/full/324/7335/477. Accessed July 9, 2007.
10. Rolka DB, Venkat Narayan KM, Thompson TJ, Goldman D, Lindenmayer J, Alich K, et al. Performance of recommended screening tests for undiagnosed diabetes and dysglycemia. Diabetes Care. 2001;24:1899-1903. Available at: http://care.diabetesjournals.org/cgi/content/full/24/11/1899. Accessed July 31, 2007.
11. Bossuyt PMM, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LW, et al, for the STARD group. Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD initiative. Fam Pract. 2004;21:4-10. Available at: http://fampra.oxfordjournals.org/cgi/content/full/21/1/4. Accessed July 9, 2007.