UNIVERSITY OF HAIL
COLLEGE OF PUBLIC HEALTH AND HEALTH INFORMATICS
DEPT. OF HEALTH INFORMATICS
DATA SCIENCE IN HEALTH CARE HIIM 233
Chapter 1 Data Sources
Instructors: Dr. Muteb Alshammri, MSc. Ibrahim A. Ibrahim

ELECTRONIC MEDICAL RECORDS
▪ Electronic medical records (EMRs), often also referred to as electronic health records (EHRs), are a major source of clinical data, although EMR and EHR have subtle differences (“EHR (electronic health record) vs. EMR (electronic medical record)”).
▪ EMRs are computerized medical information systems that collect, store and display patient information. They provide a means to create legible, organized records and to access clinical information about individual patients.
▪ EMRs have been described as an important tool to reduce medical errors and improve information sharing among physicians.
▪ EMRs contain different sources of data that are relevant for data science. The most obvious are data directly linked to personal health status, such as laboratory values (tabular data), medical imaging (audiovisual data) or physicians’ written notes (semi-structured or free text).

OTHER MEDICAL INFORMATION SYSTEMS
▪ A laboratory information (management) system (LI(M)S) is a software system that records, manages, and stores data for clinical laboratories. A LIS has traditionally been most adept at sending laboratory test orders to lab instruments, tracking those orders, and then recording the results, typically in a searchable database.
▪ The standard LIS has supported the operations of public health institutions (like hospitals and clinics) and their associated labs by managing and reporting critical data concerning “the status of infection, immunology, and care and treatment status of patients”.
▪ Radiology information systems (RIS) were introduced much earlier than EMRs for efficient ordering and scheduling, and were later integrated with the Picture Archiving and Communication System (PACS) for increased workflow efficiency in radiology departments.

MOBILE APPS
▪ For many telemonitoring (telemedicine, telehealth) applications, mobile apps are a very important tool to measure health-related data independent of time and location.
▪ Modern smartphones can capture various sorts of data and send them directly to a remote server using built-in wireless communication channels.

INTERNET OF THINGS AND BIG DATA
▪ Internet of Things (IoT) refers to the networked interconnection of everyday objects, which are often equipped with omnipresent intelligence.
▪ Such objects could be wearables (like smartwatches), but also shoe insoles or home automation (domotics) devices.
▪ IoT will increase the ubiquity of the Internet by integrating every object for interaction via embedded systems, which leads to a highly distributed network of devices communicating with human beings as well as with other devices. Thanks to rapid advances in the underlying technologies, IoT is opening tremendous opportunities for a large number of novel applications that promise to improve the quality of our lives.
▪ In healthcare, Big Data are increasingly referred to as the solution for all sorts of problems. Although they are of fundamental importance, what matters is what we do with these data. That is covered later in this book in the sections on modelling.

SOCIAL MEDIA
▪ Social media such as Twitter, Facebook and blogs can also be an important source of data.
▪ Publicly available data (e.g. Twitter) can be used for several sorts of analysis, such as sentiment analysis or graph (network) analysis.
▪ They are also relevant media for recruiting participants for studies that can take place completely online using frameworks such as Apple ResearchKit or Google Study.

GDPR
▪ The General Data Protection Regulation (GDPR) is a European Union regulation on data protection and privacy that has applied since May 2018.
▪ All European organizations that process privacy-sensitive data have to comply with the GDPR. Therefore, the GDPR applies to all data sources mentioned above.
▪ Moreover, for scientific research most medical-ethical research committees now also require explicit attention to the GDPR when filing a new research protocol.
▪ A detailed description of the GDPR is provided in Chap. 5.

DATA TYPES
Tabular Data:
▪ Tabular data are the most common and best-known data for research and data science.
▪ They are represented in a column-row format in which, most commonly, rows represent individual records and columns represent the relevant variables.
▪ For machine learning applications in which you try to predict one variable based on the others (supervised learning), the variable you try to predict is called the dependent or class variable, and the others are the feature or predictor (independent) variables.

Time Series:
▪ A time series is an ordered sequence of values of a variable at equally spaced time intervals.
▪ Time series are a particular sort of tabular data in which (mostly) columns represent different time stamps in chronological order.
▪ In data science applications the goal is mostly to predict future events. Time series require specific sorts of preprocessing, as properties of the values (e.g. the mean) can, by definition, change over time.
▪ Processes are a particularly relevant sort of time series.

Natural Language:
▪ In many medical applications free text is still frequently used by physicians (physician notes, radiology reports), and surveys or daily logs kept by patients can also contain free text.
▪ In addition, social media contain free text as their data source.
▪ Techniques are available for text mining, also called “natural language processing”, to extract meaning from free text input in an automated fashion.
▪ These techniques in particular fall outside the scope of this book, but the general principles for modelling still apply.

Images and Videos:
▪ Images are another important source of data for data science, and they also require specific processing techniques for feature extraction before modelling can take place. Here too, deep neural networks can now perform automated feature extraction.
▪ A famous example is a Google deep learning project in which a computer model was fed unlabeled frames taken from videos. The model came up with a detector for cat images, despite never being told what a cat is. Deep learning was later used by Google's DeepMind to defeat the world champion in the game of Go, and an improved version learned to play the game from scratch and defeated the previous (world-champion-beating) algorithm 100-0.

DATA STANDARDS
▪ Standardizing health care data involves the following:
  ▪ Definition of data elements: determination of the data content to be collected and exchanged.
  ▪ Data interchange formats: standard formats for electronically encoding the data elements (including sequencing and error handling). Interchange standards can also include document architectures for structuring data elements as they are exchanged, and information models that define the relationships among data elements in a message.
  ▪ Terminologies: the medical terms and concepts used to describe, classify, and code the data elements, and the data expression languages and syntax that describe the relationships among the terms and concepts.
  ▪ Knowledge representation: standard methods for electronically representing medical literature, clinical guidelines, and the like for decision support.

CONCLUSION
▪ A variety of data sources and data types are relevant for clinical data science. A general overview of such data sources has been provided, and the concepts of different data types were introduced.
The next chapters will dive deeper into data and standards, and a toolkit for natural data stewardship will be provided.
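
As a supplementary illustration of two of the data types described under DATA TYPES, the sketch below shows (1) tabular data split into feature (predictor) variables and the variable to be predicted, and (2) a time series whose mean changes over time, together with differencing as one possible preprocessing step. It is a minimal sketch, not part of the original slides: the patient records, the heart-rate values, the column names and the helper functions `split_features_target`, `rolling_mean` and `first_difference` are all hypothetical, and differencing is only one of several ways to handle a changing mean.

```python
from statistics import mean

# Hypothetical tabular data: each row is one patient record,
# each key is a variable (column). Values are made up for illustration.
records = [
    {"age": 54, "systolic_bp": 138, "hba1c": 6.9, "diabetes": 1},
    {"age": 37, "systolic_bp": 121, "hba1c": 5.2, "diabetes": 0},
    {"age": 66, "systolic_bp": 145, "hba1c": 7.4, "diabetes": 1},
    {"age": 45, "systolic_bp": 118, "hba1c": 5.5, "diabetes": 0},
]

# The variable we try to predict (assumed name, chosen for this example).
TARGET = "diabetes"


def split_features_target(rows, target):
    """Separate the feature (predictor) columns from the target column."""
    features = [{k: v for k, v in row.items() if k != target} for row in rows]
    labels = [row[target] for row in rows]
    return features, labels


# Hypothetical time series: heart rate at equally spaced time intervals.
heart_rate = [62, 64, 63, 70, 75, 81, 88, 90]


def rolling_mean(values, window):
    """Mean of each consecutive window; reveals a mean that drifts over time."""
    return [mean(values[i:i + window]) for i in range(len(values) - window + 1)]


def first_difference(values):
    """Change between consecutive values; one way to remove a trend."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


X, y = split_features_target(records, TARGET)
print("Features of first record:", X[0])
print("Target of first record:", y[0])
print("Rolling mean (window of 4):", rolling_mean(heart_rate, 4))
print("First differences:", first_difference(heart_rate))
```

In this sketch the rolling mean rises steadily across the windows, which is the kind of change over time that makes dedicated preprocessing necessary before modelling a time series.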