Data Models and Data Management PDF

Document Details

EntrancingDragon

Uploaded by EntrancingDragon

National University of Singapore

Dr. Judice Koh / Dr. Kenneth Ban

Tags

data models database management clinical data biomedicine

Summary

This document discusses data models and data management, focusing on biomedical data. It covers topics such as data types, data models, relational databases and Structured Query Language (SQL), along with the data lifecycle, security, and privacy.

Full Transcript

Data Models and Data Management Dr. Judice Koh / Dr. Kenneth Ban NUS Medicine [email protected] Discussion points for Lecture 3 Why do we need to model and manage data? What are the most critical aspects of data manag...

Data Models and Data Management Dr. Judice Koh / Dr. Kenneth Ban NUS Medicine [email protected] Discussion points for Lecture 3 Why do we need to model and manage data? What are the most critical aspects of data management in biomedicine? (keywords) Learning objectives Explain the differences between structured, unstructured, semi- structured and meta data Explain the differences between different data models (relational/document/graph) Explain how data is organized in relational databases Describe how to use SQL in relational data querying Explain the lifecycle of data management Discuss other aspects of clinical data management including data privacy, security and de-identification. Overview Data Management Life Cycle Data types and Data models Relational database and RDBMS Structured Query Language Practical session on SQL 1. Data Management Life Cycle Data-driven healthcare Biological Biomedical/Clinical Research data Biological models Patient Health Record Data Communication Analytics Patient & Development of Information Healthcare prevention, diagnosis and Decision support Provider treatment guidelines Sources of data: healthcare/biomedical Healthcare Institution (HCI) Administrative Billing Public Health Personal Research Data Tracking Data Admissions/Clinic (Surveillance) Data Patient Care National EHR EHR Prescription Centralized Data Repository Lab Procedures Imaging Linkage of data allows for integrative analysis of Big Data different data sources Time/Event associated with outcomes Data management: Lifecycle The approach for managing data throughout its lifecycle so as to ensure optimization from creations to deprecation Create and identify relevant data while filtering out unnecessary information. Store and maintain data securely in a stable environment for future use. Use data to drive actionable outcomes while managing security Share and manage access to data for collaborators and audience Archive valuable, rarely used data for long- term compliance or historical purposes. Destroy unnecessary data to reduce management costs and improve efficiency. Data management : Data repository An example of how clinical data is handled from source to final form in data repository Data Repository Source Pseudo-ID Cleaning/ Source Data De-identification Linkage Transformation Ingest Pseudonymization Validation Source Combining Catalog Remove identifiers Processing Data management: Security Data needs to be secured with proper access controls and audit logs Source Administrators Users Is access logged? Data Repository Is the data encrypted? Cleaning/ Data De-identification Linkage Transformation Who has access to the datasets at each zone? Data management: Catalog Metadata collection allows for building a data catalog Ingestion Data Source Catalog Repository Metadata Example Tags Descriptive Keywords Schema Structural Format Timestamp Provenance of data origin Administrative Origin Discovery of datasets by users Data management: Privacy The right of an individual to protect their personal data Personal data is data about an individual who can be identified from that data, or in conjunction with other information Direct Identifiers Indirect Identifiers Can explicitly identify particular Can be used together or in conjunction individuals and therefore should be with other information to identify removed particular individuals Name Gender Mailing address Ethnicity Phone number Birth year or age Email address Place of birth NRIC Rare disease or treatment Passport No Occupation Biometric data Annual income Vehicle identifiers Postal code Data management: Privacy De-identification removes identifying information such that the remaining data cannot be used to identify any particular individual Removal/masking of information needs to be balanced against the usefulness of the data after de- identification El Emam & Arbuckle, 2014 2. Data types and Data models Types of data Variety of Big Data Structured Unstructured Semi-structured Metadata Unstructured data accounts for 80% of the world’s data Berisha, Blend & Mëziu, Endrit. (2021). Big Data Analytics in Cloud Computing: An overview. 10.13140/RG.2.2.26606.95048. Structured data type Schema with relationships Pre-defined data model or schema Usually tabular format with relationships Tabular data Unstructured data type No pre-defined data model in non-tabular native format History 52 yr Chinese male Ex-smoker, HPT started on captopril Cough x 2/52 dry no SOB no chest pain P/E BP 150/90 HR 82 RR 20 Pharynx NAD Lungs clear, no creps Heart S1S2 displaced apex No pedal edema Free-text clinical Images (e.g. notes CXR, CT) Semi-structured data type Flexible data model Usually text with markers to indicate semantic elements Can include hierarchies for records and fields Example of JSON Metadata data type E.g JPG image Data that describes other data Metadata Example Tags Descriptive Keywords Purpose EXIF metadata Schema Structural Camera: Canon PowerShot A520 Format Lens: Auto exposure, 1/501 sec, f/4, ISO 50 Exposure: Timestamp Flash: Off, Did not fire Administrative Origin Focus: Single, Auto AF point selection, with a depth of field of from 65.53 m to infinity. Date: November 25, 2006 11:06:59AM Data Models - How can data be organized? Document Relational Graph Green B ID ID A 1 1 Pink C 2 2 ID 3 3 D 4 Yellow 4 F 5 ID E 5 Data stored in Data stored in Data stored in documents tables linked by nodes linked by identified by ID relationships relationships Document data model Data is represented as Key : Value documents without need for { a fixed schema of _id: "10025AE336", Personal_details:{ tables/relationships First_Name: "John", Last_Name: "Doe", Data stored as key : value Date_Of_Birth: "1995-09-26" }, pairs in semi-structured Contact: { format (e.g. JSON) e-mail: "[email protected]", phone: "65-555-5555" Data can be nested to }, Address: { represent hierarchical City: "Singapore", relationships Country: "Singapore" } NoSQL } Document data model: example document id { "_id": { "$oid": "63272de0e544c500070a12f7" }, "name": "John Doe", key: value pair "age": 42, "date_encounter": { "$date": "2022-07-12T00:00:00Z" }, "diagnosis": { "snomed": "281794004", "description": "URTI" } } hierarchical nesting Document data model: example Flexible model (adding icd-10 field) { "_id": { "$oid": "63272de0e544c500070a12f7" }, "name": "John Doe", "age": 42, "date_encounter": { "$date": "2022-07-12T00:00:00Z" }, "diagnosis": { Addition of new - "icd-10": "J06.9", "snomed": "281794004", data (key: value) "description": "URTI" } } Document data model: example Querying (age >40 and (pl:Place) 0) Alter structure of a table  Syntax: ALTER TABLE table_name ADD/DROP column_name or ALTER TABLE table_name MODIFY column_name  Example: (Lab 1 Part 2 Step 5) ALTER TABLE patient MODIFY name varchar(100) not null Insert data into table Syntax: INSERT INTO table_name(column_name1, column_name2,...) VALUES(value1, values 2,...) Example: (Lab 1 Part 2 Step 6) INSERT INTO patient(patient_id, name, gender, dob) VALUES(1, 'Ben Mark', 'M', '1980-01-08'); INSERT INTO patient(patient_id, name, gender, dob) VALUES(2, 'Petter Nielsen', 'M', '1970-12-12'); INSERT INTO patient(patient_id, name, gender, dob) VALUES(3, 'Sahay Sundeep', 'M', '1960-02-19'); INSERT INTO patient(patient_id, name, gender, dob) VALUES(4, 'Jorn Braa', 'F', '1976-08-27'); INSERT INTO patient(patient_id, name, gender, dob) VALUES(5, 'Ole Hanseth', 'F', '1956-11-07'); Update data in table Syntax: UPDATE table_name SET column_name1 = new_value1, column_name2 = new_value2... WHERE conditions Example: (Lab 1 Part 2 Step 8) UPDATE patient SET name='Jorn Anders Braa' WHERE patient_id=4 Query data in table  Syntax: SELECT * FROM table_name SELECT column_name1, column_name2,.... FROM table_name  Example: (Lab 1 Part 3 Step 1 and 2) SELECT * FROM patient; SELECT patient_id, name FROM patient LIMIT 2; Filter records using WHERE clause  To search rows that meet specific conditions, use WHERE clause  Operators:=,>=, '1965-01-01’; Returning unique values  DISTINCT returns only unique values  Multiple columns can be selected  Syntax: SELECT DISTINCT column_name1, column_name2,.... FROM table_name  Example: (Lab 1 Part 3 Step 7) SELECT gender FROM patient What is the result from the SELECT DISTINCT gender FROM patient second SQL command? Sorting query results  Query results can be sorted in ascending [ASC] or descending [DESC] order  The result is sorted by ascending order by default.  Syntax: SELECT * FROM table_name ORDER BY column1 [DESC], column2 [DESC]  Example: (Lab 1 Part 3 Step 9) SELECT * FROM patient ORDER BY gender ASC, dob ASC LIKE with wildcards  LIKE with wildcards can be used to search for a specific pattern  For string columns  %: zero or more character  Syntax: SELECT * FROM table_name WHERE column1 LIKE patterns  Example: (Lab 1 Part 3 Step 10) SELECT * FROM patient WHERE name LIKE 'B%'; What is the result from the SELECT * FROM patient SQL commands? WHERE YEAR(dob) LIKE '197%'; Arithmetic operations  For numerical columns, some arithmetic operations can be applied  Syntax: SELECT MIN | MAX | COUNT | AVG | SUM (column1) FROM table_name  Example: (Lab 1 Part 3 Step 12) SELECT COUNT(*) AS total_number_patient FROM patient; SELECT MIN(dob) AS DOB_oldest, MAX(dob) AS DOB_youngest FROM patient; What is the result from the SQL commands? Grouping query results  The GROUP BY statement groups rows that have the same values into summary rows, like "find the number of customers in each country".  Often used with aggregate functions (COUNT(), MAX(), MIN(), SUM(), AVG()) to group the result-set by one or more columns.  Can be used with single column or multiple columns  Syntax: SELECT column1, column2, aggregate_function(column3) FROM table_name GROUP BY column1, column2  Example: (Lab 1 Part 3 Step 13) What are the DOB of the oldest and youngest male and female patients? SELECT gender, MIN(dob) AS DOB_oldest, MAX(dob) AS DOB_youngest FROM patient GROUP BY gender; Combine multiple tables Two tables (A & B) can be combined in several ways Set operations Relational operations UNION INNER JOIN INTERSECT LEFT OUTER JOIN EXCEPT RIGHT OUTER JOIN FULL OUTER JOIN SQL multiple tables: set operations Only for tables that have same columns 1 a 2 b Table A UNION INTERSECT EXCEPT 3 c 4 d 1 a 3 c 1 a 2 b 4 d 2 b 3 c 3 c 5 e 4 d 4 d 6 f Table B 5 e 5 e 6 f 6 f Set Operations  Set operations of 2 tables with exact same columns  Syntax: SELECT * FROM table_name A UNION/INTERSECT/EXCEPT SELECT * FROM table_name B  Example: Create a new database NUSmed2 and patient table. Integrate the patient tables in the two databases. (Lab 1 Part 4 Step 1 and 2) SELECT * FROM nusmed.patient INTERSECT SELECT * FROM nusmed2.patient SQL multiple tables: relational operations For tables that have Table A 1 a Table B a 100 matching keys 2 b b 200 3 c e 300 LEFT RIGHT OUTER FULL INNER JOIN OUTER JOIN JOIN OUTER JOIN 1 a 100 1 a 100 1 a 100 1 a 100 2 b 200 2 b 200 2 b 200 2 b 200 3 c NULL NULL e 300 3 c NULL NULL e 300 Join Operations Combine rows in tables where values of common fields are the same Can contain multiple join tables Syntax: SELECT columns FROM table1 INNER JOIN/LEFT JOIN/RIGHT JOIN/FULL JOIN table2 ON table1.column1 = table2.column2 Example: (Lab 1 Part 4 Step 3 and 4) SELECT * FROM patient AS p _(A)_ prescription AS s ON ________(B)________; What are the commands in (A) and (B)? Review  4 classes of SQL (DDL, DML, TCL, DCL)  Data Definition Language  CREATE, DROP table | database  ALTER table | database  TRUNCATE table  Data Manipulation Language  INSERT INTO table  UPDATE table  SELECT … FROM … WHERE… [IN/NOT IN] [BETWEEN.. AND] [LIKE]… LIMIT, ORDER BY  MIN | MAX | AVG | COUNT | SUM, GROUP BY, HAVING  INNER | LEFT | RIGHT | OUTER JOIN Thank you!

Use Quizgecko on...
Browser
Browser