Lecture 2: Healthcare Data, Information and Knowledge
Elmer Bernstam MD, Todd Johnson PhD, Trevor Cohen MD PhD
Summary
This lecture covers healthcare data, information, and knowledge. It discusses the difference between data and information, and how data warehouses can be used to gain knowledge in healthcare. The lecture also introduces the i2b2 platform as an example of a biomedical informatics model.
Full Transcript
Presented by Dr. Musa Faneer

Introduction
∗ Data are symbols or observations reflecting differences in the world. Example = 250.00 (Note: data is the plural of datum)
∗ Information is data with meaning. Example = ICD-9 code of 250.00 means type 2 diabetes
∗ Knowledge is information that is justifiably believed to be true. Example = obese patients are more likely to develop type 2 diabetes
∗ Computers generate and analyze binary data: zero (off) and one (on). Each zero or one is a bit; a series of 8 bits is a byte. Note that these bits and bytes have no meaning per se
∗ Bits can represent various data types:
  ∗ Integers such as 345 or 669988
  ∗ Floating point numbers such as 14.1 or -1.23
  ∗ Characters such as a or z
  ∗ Character strings such as “hello” or “goodbye”
∗ Data can be aggregated into a variety of formats such as image files (JPG, GIF, PNG), text files, sound files (WAV, MP3) or video files (WMV, MP4)
∗ Recognize that these formats do not determine what information a file contains, only the category of format
∗ Data are the domain of computer scientists, but information is the domain of informatics and informaticians
∗ Information retrieval involves both computer science (data) and informatics (information)
[Figure: information retrieval in relation to computer science (data) and informatics (information)]

Data and Information
∗ Computer data not only lack meaning, but must also include dates and other qualifiers to gain significance. For example, blood glucose = 127. Was that mg/dl, was the sample drawn fasting, etc.
∗ Everything must be standardized, otherwise computer B will not understand data transmitted from computer A (i.e. the data won’t be interoperable)

Information to Knowledge
∗ A modern way to convert medical information to knowledge is to use a clinical data warehouse (CDW)
∗ EHRs are now a huge source of healthcare data and information. They contain both structured data (coded, e.g. ICD-9 codes) and unstructured data (free text or natural language)
∗ Interpreting free text requires natural language processing (NLP)

Clinical Data Warehouse
∗ Data from EHRs, Radiology, Pathology, etc. are copied into a staging database, where they are cleaned, then loaded into another common database and associated with metadata (data that describes data). ICD-type data is an example of metadata
∗ Tools can be applied to the data in the CDW, such as simple descriptive analytics that report the number of patients with breast cancer, their age, menopausal status, etc. More about this in chapter 3
∗ CDWs do a better job of analyzing and reporting aggregate healthcare data than the average EHR, which tends to focus on the individual
∗ CDWs can be used to evaluate critical clinical processes and cost estimates, and to analyze potential solutions
∗ CDWs are highly valuable for informatics and evidence-based medical research
∗ CDWs can help track infections and report trends to public health
∗ The figure below shows a typical CDW schema
[Figure: typical CDW schema. ETL = extract, transform and load]

i2b2 platform https://www.i2b2.org
∗ Informatics for Integrating Biology and the Bedside (i2b2) is a Harvard project used by many other academic institutions in the US
∗ The program is open source and modular and incorporates genomic and clinical information for research purposes
∗ The database consists of facts (diagnoses, lab results, etc.) that users query and dimensions that describe the facts
∗ With this model, data can be aggregated from multiple hospitals
[Figure: i2b2 star schema]

Concept Extraction
∗ Several systems have been developed to extract concepts from free text in EHRs or CDWs. The table below compares three of them

Concept Extractor   Gold Standard       Precision   Recall   F-score (F1)
cTAKES [17]         Mayo Clinic         0.80        0.65     0.72
MetaMap [20]        NLM 500 articles    0.32        0.53     0.40
MedLEE [21]         Proprietary         0.86        0.77     0.81

What Makes Informatics Difficult?
∗ In other industries such as banking, data and information are much closer together (smaller semantic gap)
∗ For example, banking data such as $100.50 is close to an account balance of $100.50. It leaves little leeway for a different interpretation
∗ In healthcare, there are subjective factors (“I feel sick”) that are difficult to measure and vary from patient to patient and physician to physician. Lab results are more objective and easier to interpret
∗ It is difficult to model all of healthcare. See the HL7 RIM model in the figure below
∗ Biomedical information is difficult to work with because it is often incomplete, imprecise, vague, inconsistent and uncertain
∗ Humans can adapt to this dynamic and vague information but computers cannot. Clinical decision support in EHRs is precise, when in reality it might need to be flexible over time
[Figure: HL7 version 3 RIM model]

Conclusions
∗ Computer scientists focus on data, while informaticists focus on information
∗ There is a gap between healthcare data and information
∗ The transformation of information into knowledge is a primary goal of informaticists
∗ Clinical data warehouses are increasingly used to research clinical questions and generate knowledge from information
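
The gap between data and information described under "Data and Information" can be sketched in a few lines of Python. This is only an illustration: the Observation class, its field names, the one-entry ICD9_DESCRIPTIONS table and the sample date are hypothetical and not part of the lecture. A bare value of 127 is data; it only becomes interpretable once the unit, fasting status and draw time are attached, and a code such as 250.00 only gains meaning through a lookup.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Hypothetical one-entry code table, for illustration only.
# A real system would load a complete, standardized terminology.
ICD9_DESCRIPTIONS = {"250.00": "type 2 diabetes"}


@dataclass
class Observation:
    """A lab value plus the qualifiers that give it meaning."""
    name: str
    value: float
    unit: Optional[str] = None
    fasting: Optional[bool] = None
    drawn_at: Optional[datetime] = None

    def is_interpretable(self) -> bool:
        # The value is only usable as information when every qualifier is known.
        qualifiers = (self.unit, self.fasting, self.drawn_at)
        return all(q is not None for q in qualifiers)


def interpret_code(code: str) -> Optional[str]:
    """Turn a bare code (data) into its meaning (information), if known."""
    return ICD9_DESCRIPTIONS.get(code)


# Data only: the number 127 with no context.
bare = Observation("blood glucose", 127)

# The same number with qualifiers attached (the date is made up).
qualified = Observation(
    "blood glucose", 127,
    unit="mg/dl", fasting=True, drawn_at=datetime(2020, 1, 15, 7, 30),
)

print(bare.is_interpretable())       # False
print(qualified.is_interpretable())  # True
print(interpret_code("250.00"))      # type 2 diabetes
```

Two systems that exchange such records would also need to agree on the field names and the code table, which is the standardization point made above.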
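
The facts-and-dimensions idea from the i2b2 slides can be illustrated with a tiny star schema in an in-memory SQLite database. This is a minimal sketch under stated assumptions: the table and column names are simplified for readability and are not the actual i2b2 schema, and the three patients and two hospitals are invented sample rows.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# One fact table in the middle, with dimension tables that describe the facts.
# Names and rows are illustrative assumptions, not the real i2b2 schema.
conn.executescript("""
CREATE TABLE patient_dimension (
    patient_id INTEGER PRIMARY KEY,
    hospital   TEXT
);
CREATE TABLE concept_dimension (
    concept_code TEXT PRIMARY KEY,
    description  TEXT
);
CREATE TABLE observation_fact (
    patient_id   INTEGER REFERENCES patient_dimension(patient_id),
    concept_code TEXT REFERENCES concept_dimension(concept_code)
);
INSERT INTO patient_dimension VALUES
    (1, 'Hospital A'), (2, 'Hospital A'), (3, 'Hospital B');
INSERT INTO concept_dimension VALUES
    ('ICD9:250.00', 'type 2 diabetes');
INSERT INTO observation_fact VALUES
    (1, 'ICD9:250.00'), (3, 'ICD9:250.00');
""")


def count_patients_by_hospital(connection, description):
    """Count distinct patients per hospital who have a fact for the concept."""
    query = """
        SELECT p.hospital, COUNT(DISTINCT f.patient_id)
        FROM observation_fact AS f
        JOIN patient_dimension AS p ON p.patient_id = f.patient_id
        JOIN concept_dimension AS c ON c.concept_code = f.concept_code
        WHERE c.description = ?
        GROUP BY p.hospital
        ORDER BY p.hospital
    """
    return connection.execute(query, (description,)).fetchall()


print(count_patients_by_hospital(conn, "type 2 diabetes"))
# [('Hospital A', 1), ('Hospital B', 1)]
```

Because every fact points at shared dimension tables, rows from several hospitals can sit in the same fact table and be aggregated with one query, which is the property the slides attribute to this model.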
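
The F-score (F1) column in the concept extraction table is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short check below recomputes it from the precision and recall values in the table; the function name f1_score is just a label chosen for this sketch.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) taken from the concept extraction table.
reported = {
    "cTAKES": (0.80, 0.65, 0.72),
    "MetaMap": (0.32, 0.53, 0.40),
    "MedLEE": (0.86, 0.77, 0.81),
}

for name, (p, r, f_reported) in reported.items():
    f_computed = f1_score(p, r)
    print(f"{name}: computed F1 = {f_computed:.2f}, reported = {f_reported:.2f}")
```

Rounded to two decimals, the computed values match the reported column for all three systems. A high precision with low recall (or the reverse) pulls the harmonic mean down, which is why MetaMap's F1 sits closer to its lower precision.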