INF2003 Database Systems Transaction AY24/25 PDF
Document Details
Uploaded by ConscientiousDeciduousForest
Singapore Institute of Technology (SIT)
2024
null
Zhang Wei
Tags
Summary
These lecture notes for INF2003 Database Systems cover transactions, concurrency, and ACID properties. They provide examples and explanations related to database concepts.
Full Transcript
INF2003: Database Systems Transaction AY24/25 Trimester 1 Zhang Wei SIT Oct. 29, 2024 Survey for SIT ICT Cluster Students and Alumni Dear ICT Students and Alumni, We request your participation in an ICT Degree Survey of undergraduate students and Alumni of the ICT Cluste...
INF2003: Database Systems Transaction AY24/25 Trimester 1 Zhang Wei SIT Oct. 29, 2024 Survey for SIT ICT Cluster Students and Alumni Dear ICT Students and Alumni, We request your participation in an ICT Degree Survey of undergraduate students and Alumni of the ICT Cluster, SIT. The purpose of conducting this survey is to gather the information regarding your considerations and priorities while deciding on the undergraduate programme. The information collected in this survey will be used by the ICT Cluster's Curriculum Management Committee while reviewing the requirement of refinement and fine-tuning of the programme structure. Your participation in this survey is completely voluntary and all of your responses are anonymous. None of the responses will be connected to identifying information. The survey will take about 5 minutes to complete. https://forms.office.com/r/D9iBXCAZep 00:50:41 2nd Half Topics Indexing (mainly centralized) W8 Transactions and Concurrency W9 NoSQL W10 Blockchain W11 Data Warehouse W12 Transaction Definition: a program with some reads and writes (Abbreviation: TXN). In the world of SQL, a TXN is a series of SELECT, UPDATE, and INSERT. Transaction example: transfer 100 dollars from me to Batman. Read balance of my account: bal_wei. Write new balance to my account: bal_wei := bal_wei – 100. Read balance of Batman’s account : bal_wayne. Write new balance to Batman: bal_wayne := bal_wayne + 100. Transaction so simple? What would be the problem? Key issue: concurrency. One database used by several users or threads. Each user can start a transaction freely. Different transactions access the same data. Results depend on the execution order or the operations. Even system throughput varies. Concurrency Example TXN-1: transfer 100 dollars from me to Batman. TXN-2: pay 3% interest for accounts. Read balance of my account: bal_wei. Write to my account: bal_wei := bal_wei * 1.03. Read balance of Batman’s account: bal_wayne. Write to Batman’s account: bal_wayne := bal_wayne * 1.03. Suppose TXN-1 was executed first and then TXN-2. My account, e.g., initially with 200: 200 -> 100 -> 103. Batman, e.g., also with 200: 200 -> 300 -> 309. Reverse the order: TXN-2 -> TXN-1. My account: 200 -> 206 -> 106. (3 dollars more) Batman: 200 -> 206 -> 306. (3 dollars less) Not perfect, but quite lucky, as the total amounts are the same, 412. With interleaving, could be worse. Interleaving When we allow the operations to interleave arbitrarily. Interleaving 1: (suppose again me 200 and Batman 200 initially) TNX-1: 1) W –= 100; 3) B += 100. TNX-2: 2) W *= 1.03; 4) B *= 1.03. Results: me 103; Batman 309; sum 412. Interleaving 2: TNX-1: 1) W –= 100; 4) B += 100. TNX-2: 2) W *= 1.03; 3) B *= 1.03. Results: me 103; Batman 306; sum 409. Interleaving can be bad, or wrong. Cache Matching Algorithm But also a key tool to improve the system performance. Group the same read from different threads together. The same read loads data once from disk to memory and reuses a lot. Scheduling Transactions Serial schedule: does not interleave the actions of transactions. Fact: highest level of safety, but sometimes poor efficiency. Equivalent schedule: two schedules with the same output. Serializable schedule: equivalent to some serial ones’ outputs. (Serial) 200 200 (Serializable) 200 200 Read(W, t) Read(W, t) t := t+100 t := t+100 Write(W, t) 300 Write(W, t) 300 Read(B, t) Read(W, s) t := t–100 s:= s*1.03 Write(B, t) 100 Write(W, s) 309 Read(W, s) Read(B, t) s:= s*1.03 t := t-100 Write(W, s) 309 Write(B, t) 100 Read(B, s) Read(B, s) s := s*1.03 s := s*1.03 Write(B, s) 103 Write(B, s) 103 Transaction: ACID Database becomes inconsistent when transactions are wrongly scheduled. 4 transaction properties - ACID. A: atomicity. C: consistency. I: isolation. D: durability. Atomicity: all updates must happen or not at all. Conceptually, one transaction, one unit. e.g., money transfer as 1 transaction. Op-1: withdrawal from the first account. Op-2: deposit on the second. Op-2 failed, don’t commit Op-1. Otherwise, that money would disappear! Do all or do nothing. ACID: Consistency and Isolation Consistency before transaction -> consistency after transaction. Or, each transaction must preserve the consistency of the database. Programmer’s job to preserve. DBMS’s job to execute atomically. Possible to be inconsistent in the middle of one transaction. Consistency guarantees the correctness of transaction? No. Responsibility of application-level code. You may mistakenly write 3% interest rate as 30%. Nothing to do with C. Isolation: concurrently executed as if isolated. No result changes due to concurrency. e.g., A teller looking up a balance. Must be isolated from a concurrent withdrawal trans. Only when withdrawal commits, -> the teller can see the new balance. ACID: Durability Durability: once done, holds true even system crashes. Record where we are in the beginning. In case cannot compete, go back to the original state. Stay down. Final warning. Transaction log: records the transaction details. Include any changes to the database. Include when transactions complete and how. Write before commit. If crashes, no opportunity to record. I could do this all day! Chained together by trans id. Easy to restore to any point. Log stored on disk, not memory. Why? ACID in money transfer example. Atomicity: cannot deduct money from my account without giving it to Batman. Consistency: money comes from somewhere, isn’t lost or gained. Isolation: invisible to the public until fully complete. Durability: the money does not flow back to my account. Transactions in SQL COMMIT: end the current trans and make pending changes permanently. ROLLBACK: discard all pending changes. SAVEPOINT: a marker, which divides a transaction into pieces. ROLLBACK TO SAVEPOINT: any errors, rollback to the specified SAVEPOINT. Keep the changes until the SAVEPOINT and discard the changes after it. We code line by line and find an error after the 2nd update. We execute rollback immediately. Then commit. What is the grade of student 200123? Do we have student 200999? UPDATE Students SET grade=“B” WHERE sid=200123; Trans ends where? SAVEPOINT sp1234; Either COMMIT or ROLLBACK. INSERT INTO Students VALUES(200999, “Jon”, “A”); UPDATE Students SET grade=“A+”; Or DDL, or GRANT, or REVOKE. ROLLBACK TO SAVEPOINT sp1234; Naturally exit. COMMIT; Recovery Transaction offers one layer of protection but cannot protect all. Potential failures: system crash; power failure; disk crash; user mistake, cyber-attack, earthquake, etc. Singapore becomes the datacenter hub in ASEAN w/ system, infra, and talent. Good practice: prevention is the best medicine. Robust (Linux for server), security (local network), UPS, etc. Relatively easy case: physical media is still working. Create some checkpoints to restart, redo, etc. Details beyond the scope of this module. Physical media damage: more serious. Data damaged. Transaction log damaged. In practice, backup the database in a secondary device, e.g., a tape. Try to backup the transaction log as well, in a secondary device. e.g., 1% failure rate for 1 device; 0.01% for 2 devices. Concurrency Control: Lock For atomicity, what are we afraid of? Not serial. Interleaving. A brute-force solution: lock. When I am using the data, you cannot use, and vice verse. Prevent another transaction from reading inconsistent data. Force the system to be non-concurrent. Lock manager: responsible for assigning and policing the locks. Lock granularity. Database-level lock: lock the entire database. Table-level lock: lock the entire table. Page-level lock: lock the entire disk page. The first two unsuitable for multi-user. Page-level much more popular. Row-level lock: each row accessed by one trans and locked to the others. Field-level lock: can access the same row, as long as for different columns. The last two, often impractical with a lot of locks. Lock and Optimistic Methods Deadlock: you wait for me to release, and I wait for you. Forever. Prevent (double-check the code) and detect (timeout). Do we really need locks? Underdog for speed? All read operations without any writes -> no problem. If there are writes, many applications do not write the same data. Even write the same data, often not at the same time. Go ahead to execute, assuming/expecting bad things won’t happen. Anything bad happens, restore to the original state. A popular optimistic method: 3-phase method (no lock). Read: trans keeps a private data copy -> update in a private and temporary file. Validate: don’t affect integrity and consistency -> if pass, write; cannot pass, restart. Write: everything okay, write (commit the changes) permanently. Summary Transaction is a core concept in database. More important for large-scale and multi-user database. Transactions improperly executed -> errors and data inconsistency. Understand the examples. Interleaving can be troublesome. 4 transaction properties: ACID. Atomicity, consistency, isolation, and durability. Simple way to avoid troubles: lock. Drawbacks: slow and overhead -> try first w/t lock (3-phase method). Transaction control language (TCL) in SQL: COMMIT, ROLLBACK, etc. Database can crash. Prevent (the best defense is a good offense). Database issues is not limited to SQL. Be aware of the real-world. May have to scale up the database. Remain centralized? Go distributed?