Bioinformatics and Public Health Informatics
Document Details
The Hong Kong Polytechnic University
Dr. WK Chan
Summary
These lecture notes cover bioinformatics and public health informatics, including an introduction, required skills, computational thinking and translational bioinformatics. They also discuss the interdisciplinary nature of the field and define related terms. The notes cover a variety of topics that fall under bioinformatics and computational biology.
Full Transcript
SN6006 Information Technology in Healthcare
Bioinformatics and Public Health Informatics
Lecturer: Dr. WK Chan

BIOINFORMATICS
Outline: Introduction; Required Skills; Computational Thinking; Translational Bioinformatics.

Introduction
Definition: Bioinformatics is the biological application of information technology, with a focus on data storage and analytics. It involves the research, development or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including their use to acquire, store, organize, archive, analyze or visualize such data.
Computational biology: The application of information technology to understanding biology, with emphasis on analytical algorithms. It is a computational approach to analyzing and understanding biological processes, and a field of science in which biology, computer science and information technology merge to form a single discipline.
Examples: Finding the genes of various organisms. Predicting the structure or function of newly developed proteins. Developing protein models and examining evolutionary relationships.

Interdisciplinary
The development of bioinformatics involves various scientific disciplines such as biology, computer science, mathematics and statistics, physics and chemistry.
Areas in biology: Biophysics, biochemistry, cell and molecular biology, genomics and evolutionary biology.
Areas in computer science: Programming, databases, data structures, machine learning and artificial intelligence.
Areas in mathematics and statistics: Biostatistics, probability theory, linear algebra, discrete mathematics, differential equations, Bayesian statistics and calculus.
Bioinformatics also draws on fundamental concepts in physics and chemistry for finding solutions to a computational problem.

Required Skills
Expertise in mathematical and statistical modelling: Probability theory, graph theory, descriptive and inferential statistics and differential equations.
Computational skills: Ability to manage, store and analyze large biological datasets using available algorithms and software.
Statistical programming: Knowledge of mainstream statistical programming languages such as R and Python.
Understanding of core biological subjects: Genetics, genomics, biochemistry, molecular biology and evolution.
Knowledge and application of state-of-the-art technologies: E.g., next-generation sequencing and mass spectrometry.
Computational thinking: Different from computer programming. Computational thinking is a logical thought process encompassing the formulation of a complex problem and its possible computational solutions. Bioinformatics focuses on the ability to ask a biological question and find its solution through algorithmic thinking.

Computational thinking
The 4 steps of computational thinking:
Step 1, Decomposition: A complex problem is broken down into smaller parts.
Step 2, Pattern recognition: Finding similarities among the smaller parts.
Step 3, Abstraction: Focus on key information only (signals), avoiding concomitant unnecessary noise.
Step 4, Algorithms: Development of a solution to the problem in a stepwise manner.

Translational Bioinformatics
Introduction: Involves the development and use of computational methods that can handle the data generated by biotechnology, which are then accumulated, assimilated, and analyzed to create new tools for medicine. The aim is to optimize the transformation of increasingly voluminous biomedical data into proactive, predictive, preventive and participatory (P4) medicine.
This purpose feeds into the development of precision medicine, which is underpinned by the genomic, environmental, and clinical profiles of individuals and allows genomic data to be translated into personalized medicine.
Difference from bioinformatics: Translational bioinformatics is the specialization of bioinformatics for human health. Its biological discoveries are directly translated to existing or future medicine and are related to human health and disease.
Precision medicine: Precision medicine has implications in both clinical medicine and therapeutic development, including the discovery of new drugs. The process of implementing new drugs involves crucial rounds of clinical trials and multiplex coordination between professionals of different fields, including clinical staff, clinicians, laboratory staff, biostatisticians, and bioinformaticians.

Areas of translational bioinformatics
Clinical genomics: The use of information from an individual patient's genome to inform clinical decision-making. It aids in the development of new molecular biomarkers that are related to a disease condition or state and are validated by clinically relevant genetic tests.
Genomics: The study of genetic materials from a species.
Genomic medicine: A medical discipline that involves using a patient's genomic information as part of their clinical care. It links knowledge across biological and clinical realms and supports the development of personalized medicine.
Pharmacogenomics: The study of the relationships between genomic/clinical phenotypes and pharmacologically active substances, and of genetic material in relation to drug targets. In short, how an individual patient's genes affect his/her response to drugs.
Phenotype: All the observable characteristics of an organism that result from the interaction of its genotype with the environment, for example its size, shape, colors and behavior.
Genotype: The complete set of genetic material of an organism.
Genetic epidemiology: The aggregation of genome-based data in comparison to public health and environmental registries. It is the study of how genetic factors influence human traits, such as human health and disease, and it measures the interaction of genes with the environment.

BIOTECHNOLOGY
Outline: Protein Sequencing; Evolutionary Biology; Molecular Techniques; Bioinformatic Algorithms; DNA Sequencing; Biological Databases.

Protein Sequencing
Deciphering the primary structure of a protein was the first biological problem solved by the bioinformatics approach.
Edman degradation: A method of peptide sequencing (determining the order of the amino acids in a peptide) that sequentially removes one residue at a time from the peptide. It was developed by Pehr Edman and published in 1950.
[Figure: Edman degradation. Labels: PITC; each residue is cleaved by lowering the pH.]
COMPROTEIN: A computer program coded in FORTRAN on punch cards for the IBM 7090 by Margaret Dayhoff between 1958 and 1962. It was designed to determine primary protein structure using Edman peptide sequencing data. The input and output amino acid sequences were represented in three-letter abbreviations. To simplify the handling of protein sequence data, Dayhoff further developed the one-letter amino acid code that is still in use today.
Atlas of Protein Sequence and Structure: The first ever biological sequence database, issued in 1965 by Dayhoff and her colleagues. The one-letter codes for amino acids were used. It contained 65 protein sequences, most of which were interspecific variants of a handful of proteins.

Evolutionary Biology
Molecular clock: Emile Zuckerkandl and Linus Pauling performed evolutionary analysis of several protein biomolecules, including hemoglobin, in 1963.
Observation: Orthologous proteins from vertebrate organisms, such as hemoglobin, showed a degree of similarity too high over long evolutionary time to be the result of either chance or convergent evolution. The number of differences in orthologous proteins from different species seemed proportional to the evolutionary divergence between those species. E.g., human hemoglobin showed higher conservation with chimpanzee hemoglobin than with mouse hemoglobin, which correlated with divergence estimates derived from the fossil record.
Hypothesis: All the orthologous sequences evolved from a single common ancestor. The sequences reflect the evolutionary history of species (the "molecular evolutionary clock"). The findings paved the way for the prediction and reconstruction of an ancestral sequence from the available sequences of surviving species.
Orthologous genes: Genes in different species that originated from a common ancestor.

Molecular Techniques
Revolutionary molecular methods: New molecular techniques were developed to target and amplify specific genes, namely gene cloning and the polymerase chain reaction (PCR).
Gene cloning: In 1972, Jackson, Symons and Berg developed the technique of DNA cloning.
Step 1: Enzymes are used to cut and insert a DNA fragment into the circular SV40 viral DNA.
Step 2: The resulting recombinant DNA is introduced into Escherichia coli (E. coli) bacterial cells using a process called transformation.
Step 3: As the E. coli host cell cultures grow, the inserted DNA fragment is replicated and amplified, yielding large numbers of copies of the single DNA molecule insert.
This technique pioneered both the isolation and amplification of genes independently from their source organism.
Source: https://www.khanacademy.org/science/ap-biology/gene-expression-and-regulation/biotechnology/a/overview-dna-cloning

Molecular Techniques
Polymerase chain reaction (PCR): A laboratory technique invented in 1983 by Kary Mullis for rapidly producing (amplifying) large numbers of copies of a specific DNA segment.
Two main reagents are used.
Primers: Short, single-stranded DNA fragments that are complementary to the target DNA region, selecting the DNA segment to be amplified.
DNA polymerase: A thermostable enzyme, such as Taq polymerase, that catalyzes the synthesis of DNA molecules from dNTPs (the molecular building blocks of DNA).
PCR cycle: Multiple cycles of DNA synthesis are performed to amplify the DNA segment. Each cycle consists of 3 steps.
Step 1, Denaturation: The double-stranded helix of the DNA template is separated at a high temperature (94–98 °C), breaking the hydrogen bonds between complementary bases and yielding two single-stranded DNA molecules.
Step 2, Annealing: The temperature is lowered, allowing the primers to anneal to each of the single-stranded DNA templates at the site of complementary sequences. Stable hydrogen bonds between complementary bases are formed only when the primer sequence very closely matches the template sequence.
Step 3, Extension: The temperature is raised to the optimum activity temperature for the DNA polymerase (the enzyme catalyst). The two DNA strands become templates for the DNA polymerase to assemble a new DNA strand enzymatically from the free dNTPs. A free dNTP that is complementary to the template sequence is added to the end of the elongating DNA strand.
Under optimal conditions, the number of DNA target sequences doubles after each cycle.
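
The doubling per cycle can be made concrete with a small calculation. The sketch below is illustrative only: the function name and the `efficiency` parameter are assumptions that do not appear in the notes. With `efficiency=1.0` it reproduces the ideal case described above (copies = initial copies × 2^cycles); lower values are one simple way to model cycles that fall short of perfect doubling.

```python
def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Expected number of target copies after a number of PCR cycles.

    efficiency=1.0 is the ideal case (perfect doubling each cycle).
    Values below 1.0 are a hypothetical model of imperfect cycles,
    where each cycle multiplies the copy number by (1 + efficiency).
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    # Ideal amplification starting from a single template molecule.
    for n in (1, 10, 20, 30):
        print(f"{n:2d} cycles: {pcr_copies(1, n):,.0f} copies")
```

Under the ideal assumption, 30 cycles turn one template molecule into roughly a billion copies, which is why PCR can work from very small samples.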

[Figure: Polymerase chain reaction (PCR). Credit: Enzoklop, CC BY-SA 4.0, via Wikimedia Commons.]

Bioinformatics algorithms
Needleman–Wunsch algorithm: In 1970, Needleman and Wunsch developed an algorithm for aligning 2 protein sequences (pairwise alignment). The algorithm was one of the first applications of dynamic programming to compare biological sequences. It divides a large problem (e.g., aligning the full protein sequences) into a series of smaller problems. The solution of each smaller problem is then used to find an optimal solution to the larger problem.
Multiple sequence alignment (MSA): The natural extension of pairwise alignment is multiple sequence alignment, which aligns multiple related sequences to achieve their optimal matching. The process generates multiple matching sequence pairs, which are then turned into a single alignment that arranges all sequences so that evolutionarily equivalent positions across all the sequences are matched. It is theoretically possible to use dynamic programming to align any number of sequences in the same way as for pairwise alignment, but the computing time and memory required increase exponentially as more sequences are compared. Until the 1980s, MSA algorithms were impractical to apply because of the computational resources they required.

Bioinformatics algorithms
Feng-Doolittle progressive sequence alignment: In 1987, Feng and Doolittle developed the first truly practical approach to MSA, using a "progressive sequence alignment" approach. Progressive alignment builds up a final MSA by combining pairwise alignments, starting with the most similar pair and progressing to the most distantly related.
Stage 1: The Needleman-Wunsch dynamic programming algorithm is used to calculate a set of global pairwise alignments.
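
The global pairwise alignment computed in Stage 1 can be sketched as follows. This is a minimal illustration of Needleman-Wunsch dynamic programming, not the original implementation: the function name and the simple match/mismatch/gap scores are assumptions chosen for readability, whereas practical protein alignment would score pairs with a substitution matrix.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Globally align sequences a and b; return (score, aligned_a, aligned_b).

    The scoring values are illustrative assumptions, not taken from the notes.
    """
    n, m = len(a), len(b)

    def pair_score(i: int, j: int) -> int:
        # Score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j] (the smaller problems).
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(i, j),  # align the two residues
                score[i - 1][j] + gap,                   # gap in b
                score[i][j - 1] + gap,                   # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    s, x, y = needleman_wunsch("ACGT", "AGT")
    print(s)
    print(x)
    print(y)
```

The table holds one cell per pair of prefixes, so time and memory grow with the product of the two sequence lengths. Extending the same table to many sequences at once adds a dimension per sequence, which is the exponential growth mentioned in the MSA discussion.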
Stage 2: The relationships between the sequences are represented as a phylogenetic tree, called a guide tree. The alignment scores from Stage 1 are used to approximate the similarity of the pairs. Pairs are added to the guide tree from the most similar pairs to the most distantly related pairs.
Stage 3: The MSA is built by adding the sequences sequentially to the growing MSA according to the guide tree constructed in Stage 2.
Issues: Progressive alignments are not guaranteed to be globally optimal. Errors made at any stage in growing the MSA are propagated through to the final result. Performance is worse when all of the sequences are distantly related.
Clustal: A widely used computer program for MSA in bioinformatics. It uses a modified version of the Feng-Doolittle progressive sequence alignment. The most recent version is ClustalΩ (Omega), released in 2011.

Bioinformatics algorithms
Amino acid substitutions: In 1978, Dayhoff, Schwartz and Orcutt developed the first probabilistic model of amino acid substitutions. The model was based on the observation of 1572 point accepted mutations (PAMs) in the phylogenetic trees of 71 families of proteins sharing above 85% identity. The PAM matrix is a 20 x 20 substitution matrix that contains the probability that each amino acid will change in a given small evolutionary interval, based on the observed mutations of each amino acid. Amino acid substitutions became a popular metric for measuring evolutionary changes in a sequence.
Point accepted mutation (PAM): The replacement of a single amino acid in the primary structure of a protein with another single amino acid, which is accepted by the processes of natural selection.
[Figure: PAM250 matrix.]

DNA Sequencing
"Central Dogma": During 1970–1980, there was a paradigm shift from studying proteins to DNA analysis.
Researchers believed that the specifications for any living being (more precisely, its proteins) are encoded in the specific nucleotide arrangements of the DNA molecule. This view was formalized in Francis Crick's sequence hypothesis, the "Central Dogma". It was postulated that RNA sequences, transcribed from DNA, determine the amino acid sequence of the proteins they encode. The amino acid sequence, in turn, determines the three-dimensional structure of the protein. DNA is considered the primary source of biological information. It became clear that DNA would provide unprecedented amounts of biological information: if one can figure out how the cell translates the "DNA language" into polypeptide sequences, one can predict the primary structure of any protein produced by an organism by "reading its DNA".

DNA Sequencing
"Plus and minus" method: In 1975, a team led by Frederick Sanger developed the "plus and minus" DNA sequencing method, the first to rely on primed synthesis with DNA polymerase. Technical modifications to "plus and minus" DNA sequencing led to the common Sanger chain termination method (published in 1977), which is still in use today.

DNA Sequencing
Advantages: Being able to obtain DNA sequences from an organism holds many advantages in terms of information throughput. Proteins must be individually purified to be sequenced, whereas the whole genome of an organism can theoretically be derived from a single genomic DNA extract. From this whole-genome DNA sequence, one can predict the primary structure of all proteins expressed by an organism through translation of the genes present in the sequence.
Phylogenetics: The first phylogenetic tree was reconstructed using protein sequences, based on the least number of amino acid changes. But DNA sequences carry more information than protein sequences, because synonymous mutations do not manifest in a protein sequence.
A statistically more robust method, known as the maximum likelihood method, was developed by Felsenstein for inferring phylogenetic trees using DNA sequences. This method finds the tree that has the maximum probability of producing the observed data. The Bayesian approach to molecular phylogeny gained momentum in the 1990s and is now a very popular and statistically robust method of phylogenetics.
Phylogenetics: The study of evolutionary relationships among biological entities.

DNA Sequencing
Principle: Extracting information manually from DNA sequences involves the following:
1. Comparisons (e.g., finding homology between sequences from different organisms);
2. Calculations (e.g., building a phylogenetic tree of multiple protein orthologs using the PAM1 matrix);
3. Pattern matching (e.g., finding open reading frames in a DNA sequence).
These tasks are performed much more efficiently and rapidly by computers than by humans.
Computer-assisted analysis: As demonstrated in protein sequencing, computer-assisted analysis can yield more information than mechanistic modeling alone. A similar approach to that of protein sequencing is used in DNA analysis, owing to the sequence nature of DNA and its remarkable understandability. In 1979, Roger Staden developed a collection of software for analyzing Sanger sequencing reads. The software is used to search for overlaps between Sanger gel readings; to verify, edit and join sequence reads into DNA segments; and to annotate and manipulate sequence files.

Human Genome Project
Human Genome Project (HGP): The development of DNA sequencing technologies paved the way for whole-genome sequencing of an organism. The project started in 1990 and finished in 2003. It was sponsored by the US Department of Energy and the National Institutes of Health. It aimed at acquiring the human genome: a complete set of DNA sequences, encompassing all 23 pairs of chromosomes. It also aimed at acquiring genome sequences for a number of other key organisms.
After completion of the Human Genome Project, massive worldwide efforts started to sequence various animal and plant genomes.
Recent developments: Since the completion of the HGP, attention has focused on the development of approaches to analyze and learn from volumes of data representing increasing numbers of individuals. These include the annotation of disease-associated information onto chromosomes; DNA arrays (gene chips), which speed the analysis and comparison of DNA fragments; and general alignment, database searching and pattern searching. DNA sequencing of genomes has become much cheaper and faster with time, and vast amounts of genome sequences are created as big data for computational biologists. The cost of the HGP was close to $3 billion. By 2010, a single gene chip could detect over a million variations in the base pairs of a genome, costing only several hundred dollars and taking only a few hours.

Next-Generation Sequencing (NGS)
Next-generation sequencing: The Sanger sequencing technique was instrumental in completing the first human genome sequence in 2004. In the same year, a worldwide effort to develop a high-throughput, cheaper and faster next-generation sequencing (NGS) technology was started. This initiative has led to the development of novel next-generation technologies, which can generate an enormous number of short reads at an unprecedented speed.
Second-generation sequencing technology: The first NGS technology, developed in 2005, was based on the pyrosequencing method and is popularly known as the 454 genome sequencer. This technology was soon followed by two new variants of NGS, the Solexa/Illumina sequencer and the SOLiD sequencer. In 2010, the Ion Torrent technology was added to the series of NGS technologies. It is based on semiconductor technology, with a small instrument size and a lower cost. Overall, the second-generation technologies have high throughput with a lower error rate and cost per base.

Next-Generation Sequencing (NGS)
Third-generation sequencing technology: All the second-generation technologies require prior amplification of the template DNA. The necessity of prior DNA amplification was circumvented in the third-generation sequencing technology known as the Pacific Biosciences (PacBio) platform.
Fourth-generation sequencing technology: Oxford Nanopore, the latest addition to sequencing technology, is treated as the fourth-generation sequencing technology. It is capable of producing ultra-long reads at a lower cost.
The rapid development of next-generation technology was complemented by ever-increasing computing power and the development of efficient algorithms for the assembly of short reads. The third- and fourth-generation technologies offer long read lengths with short running times. The advantages of the different generations of technologies are complementary, so a hybrid sequencing approach that mixes different generations of sequencing offers a better solution to whole-genome sequencing.

BIOLOGICAL DATABASES
Outline: Sequence and Structure Databases; Protein Databases; Other Biological Databases.

Biological Databases
Biological data have accumulated at a faster pace in the recent past due to the advent of high-throughput and cheaper next-generation sequencing technologies. New databases are being developed in order to manage this ever-increasing biological data. Biological databases are an important component of bioinformatics research. They are usually well-annotated and cross-referenced to other databases. The primary objective of a database is to organize the data in a structured and searchable form, allowing easy retrieval of useful data. Currently, there are more than a thousand biological databases providing biologists with access to multifarious omics data.
Classifications: Biological databases can be classified based on levels of data coverage and data curation.
Based on the extent of data coverage, biological databases fall into two main categories.
Comprehensive databases: A comprehensive database such as GenBank includes a variety of data collected from numerous species.
Specialized databases: Specialized databases contain data from one particular species; for example, WormBase contains data on nematode worms.
Biological databases can also be classified based on the level of data curation.
Primary databases: Primary databases such as GenBank are created from experimentally derived raw data generated and submitted by experimental biologists.
Secondary databases: Secondary databases are highly curated and are usually created from analysis of various sources of primary data. Examples are Ensembl (maintained at the EMBL-EBI, UK), the UCSC Genome Browser (maintained at the University of California, Santa Cruz, USA) and TIGR (maintained at The Institute for Genomic Research, Maryland, USA).

Biological Databases
Hybrid databases: Some databases have characteristics of both primary and secondary databases. The UniProt database stores peptide sequences generated from sequencing experiments as well as sequences computationally inferred from genomic data.
Integrated database retrieval system: The majority of biological databases do not contain complete information. An integrated database retrieval system provides integrated access to multiple databases. E.g., Entrez, maintained by NCBI, provides integrated access to 35 distinct databases.
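
As an illustration of integrated access, Entrez can also be queried programmatically through NCBI's E-utilities web interface, which is not covered in these notes. The sketch below only builds a search URL and sends no request. The helper name `esearch_url` is hypothetical; the `esearch.fcgi` endpoint and the `db`, `term` and `retmax` parameters belong to E-utilities, and NCBI's usage guidelines should be consulted before making real requests.

```python
from urllib.parse import urlencode

# Base address of NCBI E-utilities (the programmatic interface to Entrez).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db: str, term: str, retmax: int = 5) -> str:
    """Build an ESearch URL for one Entrez database (hypothetical helper).

    db     -- Entrez database name, e.g. "nucleotide" or "protein"
    term   -- search query text
    retmax -- maximum number of record identifiers to return
    """
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    # The same query form works across databases; only `db` changes.
    for database in ("nucleotide", "protein"):
        print(esearch_url(database, "hemoglobin"))
```

Keeping the database name as a parameter mirrors the idea of an integrated retrieval system: one query interface, many underlying databases.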

Sequence and Structure Databases
Nucleic acid databases: All published DNA and RNA sequences are usually deposited in three parallel public databases.
GenBank (National Center for Biotechnology Information): Maintained in the USA. GenBank has grown exponentially since its inception in 1982 and currently contains more than 2.1 billion nucleotide sequences.
EMBL (European Molecular Biology Laboratory): Maintained in the UK. The European Nucleotide Archive (ENA) is maintained by EMBL and provides open access to a wide range of nucleotide sequences, from raw reads to finished genome sequences.
DDBJ (DNA Data Bank of Japan, at the National Institute of Genetics): Maintained in Japan. The DDBJ Sequence Read Archive (DRA) provides free access to raw read data and assembled genomic data from next-generation sequencing platforms.
These three public databases exchange their data under the International Nucleotide Sequence Database Collaboration (INSDC) framework. Researchers submit DNA and RNA sequences directly to these databases, and a sequence submitted to one database is also shared with the other two. GenBank is searched online using Entrez, whereas both the EMBL and DDBJ databases are searched through the integrated retrieval system known as the Sequence Retrieval System (SRS).
Contents: Each sequence in the database has a unique accession number and a version number, which are common to all three databases. Expressed Sequence Tags (ESTs), small fragments of mRNA with a high error rate, are also available in nucleic acid databases. Genome Survey Sequences (GSS), single-pass fragments of genomic sequences with a high error rate, are also an integral part of these databases. Whole-genome sequences of various species are also deposited in the nucleic acid databases and are regularly updated with the release of new genomic sequences.
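
Accession and version numbers are commonly written together as "accession.version", for example U49845.1. The sketch below splits such an identifier into its two parts. The function and class names are hypothetical, and the regular expression is a deliberately loose simplification (letters, an optional underscore, digits, then a dot and the version) rather than the full set of accession formats.

```python
import re
from typing import NamedTuple


class SequenceId(NamedTuple):
    accession: str  # stable identifier of the record
    version: int    # incremented when the sequence itself changes


# Simplified pattern (an assumption for illustration): letters, optional
# underscore, digits, a dot, then the version number.
_ID_PATTERN = re.compile(r"^([A-Z]+_?\d+)\.(\d+)$")


def parse_sequence_id(text: str) -> SequenceId:
    """Split an 'accession.version' string into its two components."""
    found = _ID_PATTERN.match(text.strip().upper())
    if found is None:
        raise ValueError(f"not an accession.version identifier: {text!r}")
    return SequenceId(found.group(1), int(found.group(2)))


if __name__ == "__main__":
    print(parse_sequence_id("U49845.1"))
    print(parse_sequence_id("NM_000546.6"))
```

Because the three databases share accession and version numbers, the same parsed identifier can be used to look a record up in any of them.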

Sequence and Structure Databases
Other examples: Nucleic acid databases are primarily customized for human DNA and RNA sequences useful in biomedical research. For example, a reference human genome is provided in the form of the NCBI RefSeq database, and human genetic variation is profiled in a database known as dbSNP.
Ensembl is an integrated platform for genome annotation and distribution of genomic data, with comprehensive annotation of genomic variants, transcript structures and regulatory regions. It provides a valuable resource for evolutionary studies using large-scale comparative genomics data of 227 vertebrate and model species. It also provides a genome browser showing genomic variations that emerge frequently in the SARS-CoV-2 virus.
The University of California Santa Cruz (UCSC) Genome Browser currently provides a web-based view of 211 genome assemblies of more than a hundred species. The SARS-CoV-2 Genome Browser was added to the UCSC Genome Browser, including datasets from major annotation databases.
Non-coding RNA databases: There are also many databases dedicated to various non-coding RNAs, such as micro-RNA (miRNA) and long non-coding RNA (lncRNA). RNAcentral is a popular database for unified access to non-coding RNA sequences. DIANA-TarBase is a reference database of numerous experimentally tested miRNA targets, providing cell-specific miRNA-gene interactions.

Protein Databases
The protein sequences available in the protein databases are obtained from protein sequencing methods such as Edman degradation and peptide mass spectrometry. They are also inferred from three-dimensional structures obtained through X-ray crystallography and nuclear magnetic resonance (NMR). A significant amount of protein sequence data is also obtained from translating DNA and RNA sequences.
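
Translating a coding DNA or RNA sequence into a protein sequence can be sketched as below. This is a minimal illustration, not how any particular database does it: it assumes the standard genetic code, reads only the first reading frame from the first base, and stops at the first stop codon. The names `CODON_TABLE` and `translate` are introduced here for the example.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(sequence: str) -> str:
    """Translate a coding sequence into one-letter amino acid codes.

    Assumptions: reading starts at the first base, RNA input is accepted
    (U is treated as T), translation stops at the first stop codon, and a
    trailing incomplete codon is ignored.
    """
    dna = sequence.upper().replace("U", "T")
    protein = []
    for start in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[start:start + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGGCCTAA"))  # ATG, GCC, then a stop codon
    print(translate("ATGGCTTAA"))  # GCT instead of GCC: a synonymous change
    print(translate("ATGGACTAA"))  # GAC instead of GCC: the amino acid changes
```

The first two inputs differ by one base yet give the same protein, which illustrates the synonymous mutations mentioned in the phylogenetics discussion: the DNA records a change that the protein sequence does not show.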
Universal protein resource (UniProt): UniProt provides comprehensive access to high-quality protein sequences.
UniProt Knowledge Base (UniProtKB): The primary source of universal protein sequence information, containing more than 189 million sequences obtained from experimental sequencing as well as translated open reading frame (ORF) sequences from EMBL. It contains well-annotated protein sequences and a preliminary assignment of the motifs present in the sequences. It is also cross-referenced with other useful databases.
The UniProt database consists of two divisions.
SwissProt: A well-curated database with manual entry of useful information from the available literature.
TrEMBL: An automated database requiring minimal human intervention.

Protein Databases (no need)
Other important protein databases:
CATH-Gene3D: The CATH-Gene3D database classifies about 151 million protein domains into 54,881 superfamilies and predicts structural domains on publicly available protein sequences.
GenPept: The GenBank Gene Products databank (GenPept) contains translated coding sequences from GenBank and DDBJ.
The Database of Interacting Proteins (DIP): Has detailed documentation of experimentally determined protein–protein interactions involved in a biological process.
The Human Protein Reference Database (HPRD): An integrated platform covering the domain architecture and post-translational modifications of proteins, along with their interactions and association with diseases.
Pfam: A protein database dedicated to protein families and domains.
The Protein Data Bank (PDB): A global archive for structures of proteins and other macromolecules determined using X-ray crystallography and NMR.
It is managed by an international consortium consisting of four partners, namely the Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB-PDB), the Protein Data Bank in Europe (PDBe), the Protein Data Bank Japan (PDBj) and the Biological Magnetic Resonance Bank (BMRB).
ProteomicsDB: A protein-centric database with large quantities of human proteomics data generated using mass spectrometry. It provides real-time exploration of protein abundance in different tissues, body fluids and cell lines. In addition, it offers a visual representation of diverse drug-target interaction data.
STRING database: Provides integrated access to all known and predicted protein–protein interactions, both physical interactions and functional associations, in more than 14,000 organisms.

Other Biological Databases
Other biological databases include expression databases, pathway databases, disease databases, and organism-specific and virus databases.

Importance of Bioinformatics
Besides diagnosing the 3000 to 4000 hereditary diseases that are currently known, bioinformatics may help to discover future drug targets, develop personalized drugs based on genetic profiles, and develop gene therapies to treat diseases with a strong genomic component, such as cancer. Manipulation of genomes in other organisms, such as microbes, has shown promise for energy production, environmental cleanup, industrial processing, and waste reduction. Genetically engineered plants could also be made drought or disease resistant.
Translational bioinformatics: Enables bi-directional crossing of the translational barrier between the research bench and the bedside in medical clinics. It helps create tailor-made drugs with higher efficacy. Pharmacogenomics is an excellent example.
Integrating genomic, clinical, and environmental data may offer valuable insights into the understanding and eventual treatment of disease. The patient’s genetic profile may be an additional data field within the Electronic Health Record. Recently, gene variants have been identified for diabetes, Crohn’s disease, rheumatoid arthritis, bipolar disorder, coronary artery disease, and multiple other diseases.
Public Health Public Health Informatics PUBLIC HEALTH INFORMATICS
Public Health Introduction The science and practice of Biomedical Informatics supports public health in its efforts to Promote the health of populations. Prevent disease and unhealthy exposures and behaviors. Protect populations exposed to human-caused or natural disasters. To optimize population health, one must address factors beyond the genetic and biologic makeup of individuals, such as The environment, Behaviors, Socio-economic status, Occupation, and Access to care. Examples Public health measures leading to improved access to safe water and sanitation, nutrition, immunizations, and preventive care (particularly for pregnant women and children) have been responsible for 25 of the 30 years gained in life expectancy. Effective improvement of the health status of populations requires the application of informatics strategies beyond the clinical care setting.
Public Health Definition Public health is a complex discipline focused on promoting and protecting the health of people and communities where they live, learn, work, and play. Public Health Practice Is guided by social justice and the needs of all persons within a population, not simply those accessing healthcare delivery systems. Involves a broad array of disciplines and diverse activities With an overarching emphasis on primary prevention.
Intervening at the earliest possible place in the causal chain leading to disease or disability. Prevention activities Span improved access to safe food, clean water, air, and sanitation, vaccines, safe roadways and workplaces, etc. In an effort to improve the health of communities, however defined. In comparison, medical care focuses only on the detection, treatment, and management of injury and disease. Achievements and Challenges Public health achievements have been associated with major gains in life expectancy (CDC 1999). Investments in disease prevention yield significant cost savings and a healthier and less costly life (Trust for America’s Health 2020). Despite these achievements, global public health is challenged by Increasing mobility of populations. Ongoing threats to security and safe environments. These result in regional outbreaks becoming pandemics (e.g., COVID-19).
Public Health Functions Public health can be conceptualized in terms of three core functions: Assessment, Policy Development, and Assurance. Assessment Assessment involves monitoring and tracking the health status of populations, including identifying and controlling disease outbreaks. By relating health status to a variety of demographic, geographic, environmental, and other factors, it is possible to Develop and test hypotheses about the etiology, transmission, and risk factors that contribute to health problems in a population. Develop and implement control strategies that contribute to improvements in population health. Policy Development Uses the results of assessment activities and etiologic research in concert with local resources, values, and culture to recommend public policies and interventions that improve health status. Assurance Refers to the duty of public health agencies to assure their constituents that services necessary to achieve agreed-upon goals are available.
SN6006 - Information Technology in Healthcare 36 Public Health Policy Development Given that public health is primarily a governmental activity, it depends on and is informed by the consent of those governed. Advances in information technology and widespread use of the internet, including social media sites and online discussion forums, as well as the use of mobile apps, provide new opportunities for public health policy development. Policy development in public health should be Based on science Guided by the values, beliefs, and opinions of each society it serves. Therefore, public health officials who wish to promote certain healthy behaviors or to promulgate regulations (E.g., concerning fluoridated water, e-cigarettes, bicycle helmets, social distancing, etc.), should tap into the online marketplace of ideas to Understand the opinions and beliefs of their society Inform and influence the society to engage in those healthy behaviors. Example Observing the relationship between fatalities in automobile crashes and ejection of passengers from vehicles Results in recommendations, and eventually laws, mandating seat belt use. This contributed to a subsequent decrease in morbidity and mortality from automobile crashes. SN6006 - Information Technology in Healthcare 37 Public Health Assurance The duty of public health agencies is to assure their constituents that services necessary to achieve agreed-upon goals are available. Health services and medical care may be Provided directly by the public health agency or By encouraging or requiring other public or private entities to deliver the services. The fundamental of assurance function is To assure that all members of the community have adequate access to needed services, especially preventive care services and testing and diagnostic services in the context of an outbreak such as COVID-19. 
The function is frequently associated with clinical care but also refers to assurance of the conditions that allow people to be healthy and free from avoidable threats to health. E.g., access to clean water, a safe food supply, responsive and effective public safety entities, etc. Example In some communities, local public health agencies may Provide direct clinical care to underserved or at-risk populations, Offering healthcare services in multiple primary care clinics, schools, community sites, and in people’s homes. In other communities, local public health agencies may Seek to minimize or eliminate direct clinical care services and Instead work with and rely on community partners to provide such care.
Public Health Informatics Definition The systematic application of informatics methods and tools to support public health goals and outcomes, regardless of the setting. Public Health Informatics is distinguished by its Focus on populations (versus the individual). Orientation to prevention (rather than diagnosis and treatment). Governmental context (because it almost always involves government agencies). Characteristics The difference between public health informatics and other informatics specialty areas is Comparable to the contrast between public health and medical care itself. Medical care Individuals with specific diseases or conditions are the primary concern. Public health Focuses on the health of the community as opposed to that of the individual patient. The information and unit of analysis often relate to the community. Sharing information, such as disclosure of the disease status of an individual, to prevent further spread of illness. Isolating individuals to protect others. Information about environmental and other factors is also part of the public health domain. E.g., air and water quality, animal health, etc.
Focuses on prevention and on assessing health status across a population, rather than responding to diagnosis and treatment of individuals. These necessitate the use of standards for health information exchange and large-scale analysis of data across multiple health systems.
Public Health Systems Public health agencies Public health agencies are typically organized both by disease and by function, Which has contributed to both siloed activities and siloed funding. As information technology has become more widely used in public health, information systems have typically been implemented program area by program area, as resources became available. In some health departments, clinical information systems have been separated by disease or clinic. Each disease-specific program usually does not have its own laboratory. A single public health clinical facility and its staff may provide varied services, Such as immunizations for well children, treatment of people with tuberculosis (TB) and their contacts, and Pap smear services. Departments may even combine activities in a single patient encounter: Testing women for gonorrhea and chlamydia infections at the same visit where they get a Pap smear. Offering hepatitis B vaccination during a visit for sexually transmitted disease (STD) treatment. These led to the creation of information silos. Information Silos Laboratory information systems were often developed in isolation from systems to support clinical care or public health surveillance. Cases may be present within separate and unlinked program surveillance systems (e.g., hepatitis C and HIV/AIDS), Limiting understanding of shared risk factors that may be useful in prevention and control efforts.
Public Health Systems Implemented across a wide range of settings (large and small, urban and rural) With variable infrastructure and capabilities, For a workforce with a wide range of informatics experience and skills and access to technical resources and support. SN6006 - Information Technology in Healthcare 40 Public Health Systems Efficiency There are potential savings and efficiencies from Identifying the ways that the components of one system depend upon information from another system. How a Public Health system can serve multiple programs. Potential efficiencies can be viewed from two perspectives: Shared services Information systems can provide the same services for multiple disease programs. Electronic reporting of selected laboratory results for surveillance purposes can be implemented once for any given public health agency. The same reporting system can receive reportable results related to numerous purposes. Different infectious diseases, acute poisonings, screening tests like Pap smears, abnormal pathology reports for cancer surveillance, etc. Unified systems Information systems supporting different program components can be unified, often using a Master Person Index (MPI). A unified system can allow clinicians treating people with tuberculosis (TB) to have ready access to any HIV testing results on their patients Allow HIV/AIDS clinicians similar access to information about results of tests indicating TB infection. SN6006 - Information Technology in Healthcare 41 Public Health Systems Master Person Index (MPI) Master Person Index (MPI) is frequently developed using three methods Deterministic matching (exact matching) Employs highly discriminatory factors (e.g., ID Card number, social security number, etc.) Probabilistic matching Bases matching on the probability of two records being the same person given a set of matching factors. 
A combination of both deterministic and probabilistic techniques (also known as fuzzy matching) Alternatively, a common data repository can be designed in which all information about each person is permanently linked. System Integrations A key challenge for the public health informatician is to help their agency make decisions about where information system integration will yield substantial benefits and where it will not. E.g., Unnecessary, impractical, or not worth the cost. There are numerous factors involved in making decisions about system integration. Isolated request An isolated request to determine how many people in a population have been reported with both syphilis and hepatitis B during a particular time interval Analysts can do an ad hoc match of information from two independent surveillance information systems. This task may take an analyst a few days or weeks to accomplish, but it is almost certainly inexpensive compared to the cost of building a new information system that could do this task almost immediately. Ongoing request If the request is to be ongoing, a more efficient solution should be reached. It may be useful and sufficient to offer the functionality to display multiple streams of related surveillance or programmatic data in the same environment, on the same screen, or even in the same chart. SN6006 - Information Technology in Healthcare 42 Public Health Systems Typical Public Health Workflows The US Public Health Informatics Institute published in 2012 a detailed analysis on Typical workflow involved in surveillance, investigation, and intervention for reportable diseases, as well as the corresponding information system requirement. 
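The Master Person Index matching methods and the ad hoc cross-system match described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production record-linkage method: the field names (`id_number`, `name`, `dob`, `postcode`), the weights, the 0.85 threshold, and the sample records are all hypothetical, and the weighted string-similarity score is only a simplified stand-in for probabilistic matching.

```python
from difflib import SequenceMatcher

# Hypothetical field weights for the simplified probabilistic score.
WEIGHTS = {"name": 0.5, "dob": 0.3, "postcode": 0.2}


def deterministic_match(a, b):
    """Exact match on a highly discriminatory identifier (e.g., an ID card number)."""
    return bool(a.get("id_number")) and a.get("id_number") == b.get("id_number")


def field_similarity(x, y):
    """String similarity in [0, 1]; a missing value contributes nothing."""
    if not x or not y:
        return 0.0
    return SequenceMatcher(None, x.lower(), y.lower()).ratio()


def match_score(a, b):
    """Weighted agreement across the matching factors (assumed weights)."""
    return sum(w * field_similarity(a.get(f), b.get(f)) for f, w in WEIGHTS.items())


def same_person(a, b, threshold=0.85):
    """Combined approach: exact identifier match first, then the weighted score."""
    return deterministic_match(a, b) or match_score(a, b) >= threshold


def co_reported(list_a, list_b):
    """Ad hoc match: records in list_a that also appear in list_b."""
    return [a for a in list_a if any(same_person(a, b) for b in list_b)]


# Made-up line lists from two independent surveillance systems.
syphilis_cases = [
    {"id_number": "A123", "name": "Chan Tai Man", "dob": "1980-01-02", "postcode": "KLN"},
    {"id_number": "", "name": "Wong Siu Ming", "dob": "1975-05-06", "postcode": "NT"},
    {"id_number": "C555", "name": "Lau Mei Ling", "dob": "1988-03-14", "postcode": "HK"},
]
hepatitis_b_cases = [
    {"id_number": "A123", "name": "Chan Tai-Man", "dob": "1980-01-02", "postcode": "KLN"},
    {"id_number": "", "name": "Wong Siu-Ming", "dob": "1975-05-06", "postcode": "NT"},
    {"id_number": "B999", "name": "Lee Ka Yan", "dob": "1990-09-09", "postcode": "HK"},
]

both = co_reported(syphilis_cases, hepatitis_b_cases)
print(len(both), "people reported with both conditions")
```

In this sketch the first pair is linked by the identifier alone, while the second pair has no identifier and is linked only because name, date of birth, and postcode agree closely. A real system would estimate match probabilities from data rather than use fixed weights.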
The workgroup was able to identify a large number of processes that were common, which includes Case finding Case investigation Data analysis and visualization Monitoring and reporting Case/Contact specific intervention These common processes can then serve as a basis for designing public health information systems to support case reporting, surveillance, and case-based intervention work. There are also components common to disease control and prevention programs Policy and guidance development Public education SN6006 - Information Technology in Healthcare 43 Public Health Surveillance Disease Surveillance Syndromic Surveillance PUBLIC HEALTH SURVEILLANCE SN6006 - Information Technology in Healthcare 44 Public Health Surveillance Definition The US Centers for Disease Control and Prevention (CDC) defines public health surveillance as “The ongoing, systematic collection, analysis, and interpretation of health data essential to the planning, implementation and evaluation of public health practice, closely integrated with the dissemination of these data to those who need to know and linked to prevention and control”. What is NOT surveillance A one-time data collection activity is not surveillance. Data collection for research purposes is not surveillance. Public Health Surveillance System Purposes Facilitate standardization of data Improve timeliness of reporting In turn, support rapid investigations and implementation of control and prevention activities. Public health surveillance systems can be based on data captured from a variety of sources. Case reports Population-based surveys Sentinel providers Electronic health records Administrative data Various Registries (E.g., a registry of records for a particular disease or immunization records) Registries are usually established by specific legislation, and typically relate to a single topic, and may be restricted to a geographic region. 
(Sentinel: an indicator of the presence of disease.)
Public Health Surveillance Characteristics Public health surveillance is the foundation of public health practice. The ability of public health to respond to disease outbreaks and other health-related events in the population relies on timely and valid surveillance data. Recent advances in technology allowing electronic transfer of disease case reports have enabled more complete and rapid reporting and public health responses compared to historical methods of paper-based and fax reporting. Technology and science continue to rapidly evolve and change, as do public health information systems. However, the underlying principles of public health informatics and information systems are persistent. Disease prevention for the community Methods Syndromic surveillance Registries for reporting and follow-up of cases E.g., cancer, birth defects, lead poisoning, hepatitis B, etc. Population-based surveys Outbreak tracking Information systems must collect data and manage summary information about outbreaks, investigations, and responses. Investigation of outbreaks and clusters may be supported by generic tools or specialized toolkits. Surveillance systems are often integrated with systems to support case management, contact tracing, and case-based disease control interventions.
Public Health Surveillance Disease Prevention Programs Historically, disease prevention programs have been designed and implemented one disease at a time. Each disease has its own patterns of distribution in populations and risk factors. Each disease may also have different optimal and practical intervention strategies that will be effective in controlling, preventing, or eliminating cases of the disease.
Examples of prevention strategies include Vaccination (Measles, HPV vaccinations) Antibiotic treatment of case contacts before they become ill themselves (Gonorrhea) Screening (E.g., Pap smears for Cervical cancer) Treatment of preclinical disease (Cervical cancer) Supplementation of selected foods (E.g., folic acid supplement for Neural tube defects) Components of Public Health Prevention Programs Despite the variety of prevention strategies, each disease prevention program’s components are drawn from a relatively short list. Ideally, program managers choose the most effective combination of these strategies to prevent or control the diseases they are addressing. This must be done within the constraints imposed by Available resources Cost-effectiveness Staffing SN6006 - Information Technology in Healthcare 47 Public Health Surveillance Evidence-based Public Health Public Health Surveillance Data are often used to Define priorities for public health actions To guide a public health response or policy development. Public Health Surveillance informs practice through analysis, interpretation, and dissemination of these data for program planning and evaluation. The analysis of surveillance activities results in descriptive studies examining outcomes or risk factors by Characteristics of person (e.g., age, race, sex, occupation, lifestyle, genetics) Place (e.g., state, county, city, rurality, event, nearby industry) Time (e.g., day, week, year, seasonal) Evidence-based Surveillance data are collected to support Public Health action Analytic studies often seek to compare the effectiveness of public health programs to generate evidence for best practices (known as evidence-based public health). Results of analyses and interpretations should be disseminated to those providing surveillance data, as well as to public health practitioners and community stakeholders to inform current and future programs. 
Analyses and recommendations based on the surveillance data must be shared with those who provided the data and with others who need to know.
Public Health Surveillance Public Health Surveillance Data Surveillance data may serve Short-term needs (e.g., to respond to an acute infectious disease outbreak or pandemic such as COVID-19) Longer-term needs (e.g., to determine leading causes of premature death, injury, or disability) Increasingly available for querying and visualization through public health websites (e.g., data.gov). Used by epidemiologists and researchers. Can impact public understanding of health threats. E.g., data used to manage the COVID-19 pandemic, data used to visualize the increasing prevalence of obesity. Over time, these data contributed to the tremendous public focus brought to bear on these problems. Mortality data have been critical for understanding the drug overdose problem. Challenges Often, no single data system provides all the information required to appropriately tailor the public health response, particularly at a local level. For example, In addition to mortality data, more timely and comprehensive non-fatal and fatal overdose data are needed. Other systems (such as bio-surveillance, syndromic surveillance systems, or an unintentional drug overdose reporting system) can be used to identify overdoses and emerging threats in local communities.
Public Health Surveillance Recent Developments Rapid advances in technology and sources of data are changing the practice of Public Health.
New data sources and methods to assess and understand Prevalence of disease in communities, Impact of public health response actions (e.g., contact tracing or stay-at-home orders associated with COVID-19), Health status and determinants of disease in populations, Improved analytical and visualization software (e.g., geographic information systems (GIS)), and Improved ability to integrate and share health data across systems. Informatics is, therefore, a foundational science for public health practice.
DISEASE SURVEILLANCE
Disease Surveillance Reportable and Notifiable Diseases An important distinction among diseases is that they may be reportable or notifiable. Reportable diseases Must be reported to the appropriate state or territory by regulation. These reports typically include personal identifying data. Notifiable diseases Voluntarily reported to state agencies and do not include personal identifiers. Passive and Active Disease Surveillance Disease Surveillance can be considered passive or active. Passive surveillance system Utilizes regular, ongoing reporting based on specific criteria, such as reporting by health-care providers and electronic laboratory reporting (ELR). Reporting entities initiate reports as needed, following a protocol, without the health department actively collecting or soliciting them. Such systems may require considerable effort to design and implement. While passive surveillance limits the resource expenditure of the health departments, the burden is shifted to reporters. As a result, case report data may be incomplete or delayed, and may not represent the true disease incidence in the population. Can be enhanced by periodic review of reports received to identify reporting entities that appear to be delinquent in making required reports. Active surveillance system Requires the health department to actively collect data.
Can be used to evaluate passive reporting mechanisms. Can supplement case reports obtained through passive surveillance when more detailed information is required (e.g., during outbreak investigations). Surveillance of chronic diseases and their risk factors based on surveys also does not depend on providers to make case reports.
Disease Surveillance Case Reports Information to support reportable disease surveillance contains records representing case reports. Traditionally entered manually into a database or information system by public health staff. Based on information received from doctors, infection control practitioners, hospitals, and laboratories. These records contain a combination of clinical, laboratory, and epidemiologic information about each case. Increasing proportions of these case reports can be entered electronically by the practitioner creating the case report. Advances in technology have allowed case reports to move almost instantaneously from electronic health record (EHR) systems maintained by doctors, hospitals, and laboratories to public health authorities. Allows more rapid awareness by public health officials at all levels of individual cases of high-priority diseases. Results in more rapid detection and characterization of likely outbreaks. Electronic Laboratory Reporting The laboratory information in the case reports is increasingly coming from electronic records Transmitted by the public health laboratory, hospital laboratories, and commercial laboratories. Electronic laboratory reporting (ELR) is an enhanced passive reporting system. A formatted message is triggered to transmit when a laboratory result matches specific reporting criteria (e.g., a positive antibody test for hepatitis A). Regardless of the specific underlying technology, ELR systems have clear value to Public Health.
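The ELR trigger described above, in which a message is sent only when a laboratory result matches a reporting criterion, can be sketched as a simple rule lookup. This is an illustration only: the criteria table, the test names, and the record fields are hypothetical, and a plain dictionary stands in for the formatted message that a real ELR system would transmit.

```python
# Hypothetical reporting criteria: (test name, result value) -> reportable condition.
REPORTING_CRITERIA = {
    ("hepatitis a antibody", "positive"): "Hepatitis A",
    ("chlamydia test", "positive"): "Chlamydia",
}


def build_elr_message(lab_result):
    """Return a report for the health department if the result meets a
    reporting criterion; return None when nothing should be transmitted."""
    key = (lab_result["test"].strip().lower(), lab_result["result"].strip().lower())
    condition = REPORTING_CRITERIA.get(key)
    if condition is None:
        return None
    # A plain dict stands in for the formatted message (assumed structure).
    return {
        "condition": condition,
        "test": lab_result["test"],
        "result": lab_result["result"],
        "specimen_date": lab_result.get("specimen_date"),
        "reporting_lab": lab_result.get("lab"),
    }


positive = {"test": "Hepatitis A antibody", "result": "Positive",
            "specimen_date": "2024-03-01", "lab": "Hospital Lab 1"}
negative = {"test": "Hepatitis A antibody", "result": "Negative",
            "specimen_date": "2024-03-01", "lab": "Hospital Lab 1"}

print(build_elr_message(positive))  # a report is generated
print(build_elr_message(negative))  # None: no report is triggered
```

Because the laboratory system applies the rule automatically, the health department receives reports without soliciting them, which is why ELR counts as an enhanced form of passive reporting.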
ELR has demonstrated improvements in both completeness and timeliness of disease reporting, providing significant benefits to Public Health surveillance and activities.
Disease Surveillance Challenges Case reports that are rapidly transmitted electronically typically have not benefitted from cleaning, error checking, or collection of initially missing data by local staff. The simultaneous availability of raw data to multiple agencies at different levels of government also presents certain challenges. A user at the local level can have ready access to information from many sources about local conditions and events, which can be used to interpret local observations. They are in a position to understand when an apparent anomaly in their surveillance data is due to an artifact or to local conditions that are not a cause for alarm. They will also know whether a problem is already under investigation. However, they may not realize that what they are seeing is part of a larger phenomenon. A user at a higher level can see patterns over a larger area. They are able to identify multi-jurisdictional outbreaks, patterns, or trends that are not evident at a local level. However, they may prematurely disseminate or act on information that, while based on facts, is incomplete or misleading. Users at different levels examining the same raw data at the same time Need to be in frequent communication about what they are seeing in their data. Need to know which apparent anomalies are already explained and which need further investigation. Considerations There is a need to balance the speed of information flow against its quality and completeness. It is technically possible for likely cases of reportable diseases to be recognized automatically in healthcare electronic record systems. Some information can be passed on to public health authorities without human review.
There will also be a need for case reports to be reviewed and validated at the lower level before they are passed on to higher-level users.
Disease Surveillance Evaluating Public Health Surveillance Systems Key attributes against which a disease surveillance system can be evaluated include Simplicity, Flexibility, Data quality, Acceptability, Sensitivity, Positive predictive value, Representativeness, Timeliness, and Stability. The relative importance of these attributes varies depending on the conditions under surveillance and the main purposes of surveillance. Timeliness, positive predictive value (PPV), and sensitivity of a surveillance system are always in tension with each other: Increasing two of these always compromises the third. Example A surveillance system to detect cases of food poisoning for immediate public health response Puts a high premium on timeliness. Users are likely to be willing to accept a modest number of false-positive reports (i.e., a lower positive predictive value) to ensure that reports are received very quickly. A surveillance system to support planning of cancer prevention programs and treatment services Is less time-sensitive, given the long incubation periods for most cancers. The surveillance is more concerned with diagnostic accuracy of every case report than with speed of reporting.
Disease Surveillance Evaluation When evaluating positive and negative predictive value metrics, it is important to consider the prevalence of the disease in question. As prevalence increases, it is more likely that individuals in the population will truly have the disease. As prevalence decreases, it is more likely that individuals in the population will be free of that disease.
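The dependence of predictive values on prevalence noted above follows from the standard definitions of positive and negative predictive value. The short sketch below works through the arithmetic; the 90% sensitivity and specificity and the three prevalence levels are illustrative values chosen for this example, not figures from the lecture.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a case definition applied to a population.

    All quantities are expressed as fractions of the whole population.
    """
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Same case definition (illustrative 90% sensitivity and specificity),
# applied at three assumed prevalence levels.
for prevalence in (0.01, 0.10, 0.50):
    ppv, npv = predictive_values(0.90, 0.90, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.2f}  NPV={npv:.2f}")
```

With these illustrative numbers, the positive predictive value rises from about 8% at 1% prevalence to 50% at 10% prevalence and 90% at 50% prevalence, while the negative predictive value falls as prevalence rises.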
[Figure: two overlapping curves showing the number of healthy individuals in the population (left) and the number of cases of disease (right), with a vertical line representing the sensitivity and specificity of a given surveillance case definition.] The area where the curves overlap represents potential false negative and false positive case reports. Shifting the vertical line to the left Used in circumstances requiring immediate public health response, to ensure that all true positive cases are detected. Increased sensitivity is accompanied by a decrease in the likelihood that case reports truly meet the surveillance case definition, Resulting in a decrease in positive predictive value. Shifting the vertical line to the right Increases the likelihood that case reports meet the surveillance case definition (increasing positive predictive value), But at the expense of an increase in undetected cases (poorer sensitivity).
SYNDROMIC SURVEILLANCE
Syndromic Surveillance Definition Syndromic surveillance uses near real-time electronic pre-diagnostic and diagnostic health-related data to detect, characterize, and monitor events of potential public health importance. The term “syndrome” implies the use of signs and symptoms to identify disease and other events of public health significance. Syndromic surveillance Relies on the intersecting disciplines of epidemiologists, computer scientists, statisticians, standards experts, and academic researchers. Creates a near real-time understanding of the community’s health, without requiring special reporting or additional burden from clinical providers. Skims portions of the electronic health record systems and analyzes these data to provide public health insights. Syndromic data Generated from Electronic Health Records (EHRs) and transmitted to a public health authority via standardized messaging. The data are then processed, analyzed, and visualized for epidemiologists.
The most common data source for syndromic surveillance is emergency department (ED) data. Primary data elements of interest include Chief complaints (brief reason for visiting the emergency department, ideally in the patient’s own words) Diagnosis codes that classify the emergency department visit into a standardized vocabulary. Additional data sources are also brought into the analytic environment, including Death certificates, poison center consultations, over-the-counter medication sales, and various measures of social media attention/concern about syndromes or diseases of interest. The data elements are used to group ED visits into classifiers or “syndromes” for a given time period (daily, weekly, monthly, quarterly). To detect statistically significant clusters in a given area within a given time frame and to monitor trends. SN6006 - Information Technology in Healthcare 58 Syndromic Surveillance Syndromic Surveillance Systems Syndromic Surveillance Systems are based on rapid acquisition of unfiltered, real-time electronic records, usually without individual identifiers. Primary purpose To support detection and characterization of community disease outbreaks, as they are reflected in care received at emergency departments, physicians’ offices, or hospitals. The systems obtain from Hospital emergency rooms and urgent care centers Each visit to an emergency department is assigned to a category or syndrome based on words and strings contained in the patient’s chief complaint and the nurse’s notes. Outpatient physicians’ offices and hospital admissions. Adding outpatient visits and hospital admissions to the scope of syndromic surveillance is opening up additional uses for this technology, especially in the areas of real-time non-infectious disease surveillance. Usage of standardized codes The usage of standardized diagnosis codes (such as ICD-10-CM) can increase both the sensitivity and specificity of a syndrome. 
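The assignment of emergency department visits to syndromes from words and strings in the chief complaint, and the grouping of visits by time period, can be sketched as follows. The syndrome names, keyword patterns, and sample visits are hypothetical and chosen only for illustration. Each jurisdiction defines its own syndromes, and real systems also use diagnosis codes.

```python
import re
from collections import Counter

# Hypothetical keyword patterns per syndrome (free-text matching only).
SYNDROME_PATTERNS = {
    "fever": [r"\bfever\b", r"\bfebrile\b", r"high temp", r"\bfuo\b"],
    "respiratory": [r"\bcough", r"sore throat", r"shortness of breath"],
    "gastrointestinal": [r"vomit", r"diarrh", r"nausea"],
}


def classify(chief_complaint):
    """Return the sorted list of syndromes whose patterns appear in the text."""
    text = chief_complaint.lower()
    return sorted(
        syndrome
        for syndrome, patterns in SYNDROME_PATTERNS.items()
        if any(re.search(p, text) for p in patterns)
    )


def syndrome_counts(visits):
    """Count visits per (date, syndrome); a visit can count toward several syndromes."""
    counts = Counter()
    for visit in visits:
        for syndrome in classify(visit["chief_complaint"]):
            counts[(visit["date"], syndrome)] += 1
    return counts


# Made-up emergency department visits.
visits = [
    {"date": "2024-01-05", "chief_complaint": "Fever and cough x 3 days"},
    {"date": "2024-01-05", "chief_complaint": "High temperature, vomiting"},
    {"date": "2024-01-05", "chief_complaint": "Twisted ankle"},
    {"date": "2024-01-06", "chief_complaint": "Sore throat"},
]

for (date, syndrome), n in sorted(syndrome_counts(visits).items()):
    print(date, syndrome, n)
```

The resulting daily counts are the kind of series that would then be examined for clusters and trends. Keyword matching of this sort is quick but imprecise, which is one reason standardized diagnosis codes are used alongside it.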
Example A classic syndrome is influenza-like illness (ILI). ILI is used as a proxy for influenza because not every case of influenza is tested for the presence of the virus. Different jurisdictions may have different keywords that are included in their ILI syndrome A common approach is to look for the concepts E.g., fever, cough, sore throat, etc. A chief complaint can express concepts like fever in different ways E.g., high temperature, fever of unknown origin (FUO), etc. SN6006 - Information Technology in Healthcare 59 Syndromic Surveillance Primary Goal Traditional public health surveillance is based on a limited list of specific conditions Relies on clinicians or laboratories to report any occurrence of those conditions to the health department. Health departments then work with those initial reports, sometimes ask for additional information, and finally determine whether or not the report meets very specific case definitions. This process can take time for the information to be collated and for case status to be determined. Syndromic surveillance Provides a flexible, near real-time, and complementary approach to traditional public health surveillance. The reporting of data is automated, making syndromic surveillance a form of passive surveillance. The primary goal is to detect, characterize, and monitor potential public health threats in as timely a manner as possible. As a result, syndromic surveillance must rely on data sources and data elements that may lack specificity SN6006 - Information Technology in Healthcare 60 Syndromic Surveillance vs Disease Surveillance Comparison For Disease Surveillance Reportable diseases are defined by law, and healthcare providers transmit information specific to those diseases only. Enables the monitoring of a limited number of diseases with a high degree of specificity Typically relies on laboratory data and clinical diagnoses that can take days or weeks to be reported. 
Adding conditions or emerging public health threats to a physician’s or laboratory’s routine requires additional work on the part of healthcare providers.
Reporting may be incomplete.
For syndromic surveillance:
All patient encounters are transmitted to public health automatically.
The healthcare provider does not need to filter the data based on specific predetermined case definitions.
This facilitates rapid characterization and monitoring of a variety of potential diseases and public health threats beyond infectious disease, e.g., injury and mental health.
Although not as specific or definitive, it allows public health to search the data for any condition or event that may be occurring.
It allows suspected public health threats to be identified or ruled out in near real-time.
Public health professionals can:
Identify population groups that may be at greatest risk.
Assess the severity and magnitude of possible threats and the effectiveness of control measures.
Develop timely health-related communications for the public and other stakeholders.
Continuously evaluate and develop new and improved surveillance methods.

Privacy and Confidentiality; Law and Ethics
PRIVACY AND CONFIDENTIALITY

Privacy and Confidentiality
Introduction
Health information is created in many different healthcare contexts.
It is used, released, and exchanged for many different purposes, both within the healthcare field and outside of it, e.g., public health, research, and emergency response.
Therefore, the landscape of health information is complex and involves a variety of entities.
Each use, release, disclosure, and exchange of health information must be governed by laws.
The concepts of privacy, confidentiality, and authorization are related but separate legal issues. All three relate to the creation, use, release, and exchange of health information.
Concepts
Privacy
Privacy can be defined as a set of protections for health information that are held by the individual whose information it is.
Confidentiality
Confidentiality is a duty held by the person who receives a patient’s health information.
A duty of confidentiality can apply to various entities that collect health information, including doctors, nurses, and health department officials.
Unlike privacy protections, a duty of confidentiality might not be expressly written in law. It may instead be implied when a law deems a particular kind of information confidential or requires that the information be held confidentially.
For example, patient-provider confidentiality requires that a healthcare provider not disclose information gained from an interaction with a patient while engaged in the patient-provider relationship.
Authorization
Authorization is a tool that enables use and disclosure of protected health information.
Laws that require health information to be kept private or confidential often have exceptions for situations in which the individual whose information is at stake has affirmatively consented to the release of the information.
Other exceptions exist for public health activities, enabling public health professionals to access health information that is otherwise confidential.

Privacy and Confidentiality
Considerations
The first consideration involves whose data are being held and requested at an information point.
The rules covering direct patient data might differ from those covering data created by a health department during a disease investigation.
A second consideration involves the identity of the information point: where were the data created, and where are they being held?
For example, a hospital might be governed by rules different from those of a health department or an academic research university.
Another consideration involves the content of the data.
General health data may be treated differently from data about alcohol or substance abuse.
Data that identify an individual are treated very differently from de-identified or aggregate data.
The reason for the data collection can affect the data life cycle.
Health data obtained as part of a lawsuit could have a much shorter life cycle than health data collected and stored by a government agency.
The identity of the requestor is another significant consideration.
A request from a health department to a hospital will be covered by different laws than a request from a law enforcement agency to that same hospital.
The purpose of a request will also affect what rules apply as data travel across an information pathway.
The release of health data held by a health department to a law enforcement agency might be treated very differently if the purpose is child abuse and neglect reporting rather than a general criminal investigation.

Privacy and Confidentiality of Public Health Information
International Laws
Various data protection laws apply internationally:
The General Data Protection Regulation (GDPR) of the European Union (EU)
The International Health Regulations (IHR) by the WHO
International laws such as the GDPR regulate the processing of data relating to individuals inside the EU, regardless of where the processing occurs.
Non-compliance with their privacy rules may result in stiff fines.
General Data Protection Regulation (GDPR)
GDPR applies a single set of rules (and exceptions) to all data processing.
The GDPR defines processing broadly: it covers operations on personal data, including data collection, storage, use, disclosure, and destruction.
The GDPR rules are rooted in six data processing principles.
1. Lawfulness, fairness, and transparency: requiring lawful, fair, and transparent data processing
2. Purpose limitation: requiring data be processed consistently with the purpose for which it was collected
3. Data minimization: limiting processing to what is necessary for a given purpose
4. Accuracy: requiring “every reasonable step” to ensure that data are accurate
5. Storage limitation: limiting the storage of identifiable data
6. Integrity and confidentiality: requiring appropriate security for processing personal data
The GDPR also contains a number of exceptions that permit data processing for various purposes.
For example, the GDPR purpose limitation expressly permits additional processing for public interest (including public health), research, and statistical purposes.
Despite the broad applicability of the rules, the GDPR permits some flexibility for EU member countries to adapt the rules to their laws and their public interests.

Privacy and Confidentiality of Public Health Information
The International Health Regulations (2005)
The International Health Regulations (IHR) provide a framework for 196 participating countries to cooperate on global health issues.
The IHR contain provisions that relate to protections for “personal data” (i.e., identifiable data).
They deal with the issues of data sharing between countries.
For example, if there is a public health emergency of international concern (e.g., the 2014 Ebola outbreak), there will likely be a need to share data between countries to enable an adequate international response.
The IHR create confidentiality obligations for countries that receive personal data from another country.
Despite these confidentiality obligations, the IHR:
Contain strong language permitting the use of personal data “for the purposes of assessing and managing a public health risk.”
Give countries considerable latitude as to how they implement these confidentiality obligations.
For example, the IHR state that “Health information… which refers to an identified or identifiable person shall be kept confidential and processed anonymously as required by national law” (i.e., local laws).
The IHR aim to facilitate cross-border data sharing for public health while simultaneously protecting personal data and respecting the autonomy and sovereignty of nations.

Public Health Ethics
Health Data Use and Public Health Ethics
Ensuring the ethical use of data is an issue of paramount importance in public health.
Ethics provides a framework for evaluating whether certain actions or activities are good or bad, right or wrong.
Principle-based ethical frameworks have dominated the healthcare and public health sectors.
Healthcare and public health take slightly different ethical approaches based on the concerns of each field.
In healthcare and research, the principal concern is the well-being of the individual patient (or research subject).
In public health, the principal concern is the well-being of populations.
Consequently, bioethics and public health ethics have subtle but significant differences in how they approach ethical problems.
Public health ethics is rooted in principles that are more macroscopic than those of bioethics.
WHO Guidelines on Ethical Issues in Public Health Surveillance
The World Health Organization (WHO) Guidelines on Ethical Issues in Public Health Surveillance apply a public health ethics framework to surveillance activities.
Principle-based ethics is used.
The WHO principles reflect the values of public health, centered around communities and populations.
The WHO guidelines cite four ethical principles:
1) Common good
2) Equity
3) Respect for persons
4) Good governance

Public Health Ethics
Ethical Principles
The WHO guidelines cite four ethical principles:
1) Common good: concerned with promoting and preserving those things that benefit the community.
2) Equity: relates to the conditions for humans to flourish. It is concerned with the context and conditions for different people, and acknowledges that differential distribution of resources, benefits, or risks might be required to balance the scales.
3) Respect for persons: focuses on the rights, liberty, and interests of individuals.
4) Good governance: concerned with accountability, transparency, and community engagement.
In public health informatics, it is common for the principles of common good and respect for persons to come into conflict.
Since the needs of the community will sometimes conflict with individual interests, accountability, transparency, and community engagement are critical to provide legitimacy to any actions of the collective that conflict with an individual’s interests.
When data are collected and used to promote the needs of the community, mechanisms that ensure transparency and accountability promote trust between the collective and the public health community.
These mechanisms allow the community to review whether the informatics activity appropriately balances the costs and benefits to the community and to individuals.

Law vs Ethics
Law and Ethics
Law and ethics are two independent concepts.
Law relates to what can be done.
Ethics relates to what should be done.
While ethics provides a framework for evaluating whether actions are right or wrong, it is law that permits or forbids conduct.
What is legal is not always ethical, and what is ethical is not always legal.
Ideally, laws should reflect ethical values.