BCA DBMS (BCA37107) Study Material PDF
Brainware University, Kolkata
2024
Uttaran Chowdhury
Summary
This document is study material for a BCA course in DBMS. It covers database normalization (1NF through 4NF) with its advantages and drawbacks, transactions and the ACID properties, concurrency problems and serializability, locking protocols, SQL joins, and lossless decomposition.
Programme Name and Semester: BCA
Course Name: DBMS (BCA37107)
Academic Session: 2024-2025

Study Material: NORMALIZATION in DBMS
Course Name: Database Management System
Course Code: BCA37107
Name of the Faculty: Uttaran Chowdhury
Designation: Assistant Professor
Department: CSS, Brainware University, Kolkata

Table of Contents:
Concept of Normalization
Need of Normalization
Types of Normal Forms
Advantages of Normalization
Disadvantages of Normalization
1NF
2NF
3NF
BCNF
4NF
Model Questions

❖ Normalization
Normalization is the process of organizing the data in a database. It is used to reduce repetition in a relation or a group of relations and to remove unwanted features such as insertion, update, and deletion anomalies. Normalization divides a larger table into smaller tables and links them using relationships, thereby decreasing redundancy in the database.

❖ Need of Normalization
The primary goal of normalization is to eliminate data modification anomalies. As the database grows, failure to remove these anomalies can lead to data redundancy, data integrity problems, and other issues. Normalization is a set of rules that guides you toward a sound database structure. Data modification anomalies can be categorized into three types:
1) Insertion Anomaly: An insertion anomaly occurs when a new tuple cannot be inserted into a relation because some required data is missing.
2) Deletion Anomaly: A deletion anomaly occurs when deleting some data results in the unintended loss of other important data.
3) Update Anomaly: An update anomaly occurs when changing a single data value requires multiple rows of data to be updated.

❖ Types of Normal Forms
✓ Normalization works through a series of stages called normal forms.
✓ The normal forms apply to individual relations.
✓ A relation is said to be in a particular normal form if it satisfies the constraints of that form.
The various types of normal forms are:

Normal Form   Description
1NF           A relation is in 1NF if every attribute contains only atomic values.
2NF           A relation is in 2NF if it is in 1NF and every non-key attribute is fully functionally dependent on the primary key.
3NF           A relation is in 3NF if it is in 2NF and no transitive dependency exists.
BCNF          Boyce-Codd Normal Form is a stronger definition of 3NF.
4NF           A relation is in 4NF if it is in BCNF and has no multi-valued dependency.

❖ Advantages of Normalization
Reduces Redundancy: Minimizes duplicate data.
Improves Integrity: Ensures data consistency and accuracy.
Enhances Security: Better control over sensitive data.
Simplifies Maintenance: Easier updates and changes.
Optimizes Queries: Efficient data retrieval.
Supports Scalability: Easier to expand and adapt.
Boosts Quality: Maintains data validity and consistency.

❖ Disadvantages of Normalization
Complex Queries: Joins required to retrieve data from multiple tables can make queries more complex (a small sketch follows this list).
Performance Overhead: Frequent joins can lead to slower query performance, especially with large datasets.
Increased Maintenance: Maintaining a highly normalized database can be more labor-intensive, requiring more careful schema design and updates.
Scalability Challenges: While normalization supports logical scalability, physical scaling can be more challenging with highly normalized schemas.
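To make the "complex queries" point concrete, the sketch below contrasts a query against a single unnormalized table with the equivalent query after the data has been split into several tables. The table and column names (StudentCourseWide, Students, Enrollments, Courses) are illustrative assumptions, not tables defined in this material.

-- Hypothetical unnormalized table: a single SELECT, no joins needed
SELECT Name, CourseName, Instructor
FROM StudentCourseWide
WHERE StudentID = 1;

-- After normalization, the same information requires joins
SELECT s.Name, c.CourseName, c.Instructor
FROM Students s
JOIN Enrollments e ON e.StudentID = s.StudentID
JOIN Courses c ON c.CourseID = e.CourseID
WHERE s.StudentID = 1;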
❖ First Normal Form (1NF)
First Normal Form (1NF) is a property of a relational database table that ensures it is structured properly. To be in 1NF, a table must adhere to the following rules:
1. Atomicity: Each column must contain only atomic (indivisible) values, i.e., each field holds a single value.
2. No repeating groups: There must be no repeating groups or arrays within a single column.
3. Consistency: All entries in a column must be of the same data type.

Example
Consider the following table that lists students and their subjects, which is not in 1NF:

StudentID   Name    Subjects
1           Arpan   Math, Science
2           Boby    English, History

This table violates 1NF because the Subjects column contains multiple values (Math, Science for Arpan and English, History for Boby). To convert this table into 1NF, we split the Subjects column so that each subject is in its own row:

StudentID   Name    Subject
1           Arpan   Math
1           Arpan   Science
2           Boby    English
2           Boby    History

Now the table adheres to 1NF: each column contains only atomic values, there are no repeating groups, and all entries in the Subject column are of the same type. This structure allows better organization and querying of data.

❖ Second Normal Form (2NF)
Second Normal Form (2NF) builds on the requirements of First Normal Form (1NF) and addresses the issue of partial dependencies. A table is in 2NF if it is in 1NF and every non-key attribute is fully functionally dependent on the entire primary key.

Example
Consider a table of student enrollments that is in 1NF but not in 2NF:

StudentID   CourseID   CourseName   Instructor
1           101        Math         Dr. Saha
1           102        English      Prof. Dutta
2           101        Math         Dr. Saha
2           103        History      Dr. Roy

Primary Key: composite key (StudentID, CourseID)

This table is in 1NF because all columns contain atomic values. However, it is not in 2NF because the non-key attributes CourseName and Instructor are partially dependent on CourseID alone, not on the entire primary key (StudentID, CourseID). To normalize this table to 2NF, we decompose it into two tables to remove the partial dependencies:

Table 1: Enrollments
StudentID   CourseID
1           101
1           102
2           101
2           103

Table 2: Courses
CourseID   CourseName   Instructor
101        Math         Dr. Saha
102        English      Prof. Dutta
103        History      Dr. Roy

Now the Enrollments table contains only the relationships between students and courses, with no partial dependencies. The Courses table holds the details about each course, and every non-key attribute is fully functionally dependent on the primary key (CourseID). This decomposition eliminates partial dependencies and ensures that the tables are in 2NF. This approach also makes the data more manageable and reduces redundancy.
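The following is a minimal SQL sketch of the 2NF decomposition above; the column data types and the foreign-key constraint are assumptions added for illustration.

CREATE TABLE Courses (
    CourseID   INT PRIMARY KEY,
    CourseName VARCHAR(50),
    Instructor VARCHAR(50)
);

CREATE TABLE Enrollments (
    StudentID INT,
    CourseID  INT,
    PRIMARY KEY (StudentID, CourseID),
    FOREIGN KEY (CourseID) REFERENCES Courses(CourseID)
);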
Question: Given the relation R(StudentID, CourseID, CourseName, Instructor) with the following functional dependencies:
StudentID, CourseID → CourseName
CourseID → Instructor
a) Identify the candidate key(s) for the relation R.
b) Explain why the relation R is not in 2NF.
c) Decompose the relation R into 2NF.

❖ Third Normal Form (3NF)
Third Normal Form (3NF) is an important step in database normalization that further refines the structure of a relational database to reduce redundancy and improve data integrity.

Definition of 3NF
A relation is in Third Normal Form (3NF) if it satisfies the following conditions:
1. It is in Second Normal Form (2NF).
2. It has no transitive dependencies: a non-key attribute must not depend on another non-key attribute.
Formally, a relation R is in 3NF if, for every non-trivial functional dependency X → Y, either X is a superkey, or Y is a prime attribute (i.e., an attribute that is part of a candidate key).

Understanding Transitive Dependencies
A transitive dependency occurs when a non-key attribute depends on another non-key attribute. For example, if A → B and B → C, then C depends on A transitively (A → C is a transitive dependency).

Example
Consider the following table that is in 2NF but not in 3NF:

StudentID   CourseID   CourseName   Instructor   InstructorEmail
1           101        Math         Dr. Smith    [email protected]
1           102        English      Prof. Lee    [email protected]
2           101        Math         Dr. Smith    [email protected]
2           103        History      Dr. Brown    [email protected]

Functional Dependencies:
StudentID, CourseID → CourseName
CourseID → Instructor
Instructor → InstructorEmail

Steps to Convert to 3NF
1. Identify the transitive dependencies.
2. Decompose the table to eliminate the transitive dependencies.

Decomposition
To convert the table to 3NF, we decompose it into three relations:

Relation 1: Enrollments(StudentID, CourseID, CourseName)
Functional Dependency: StudentID, CourseID → CourseName

Relation 2: Courses(CourseID, Instructor)
Functional Dependency: CourseID → Instructor

Relation 3: Instructors(Instructor, InstructorEmail)
Functional Dependency: Instructor → InstructorEmail

Resulting Tables

Enrollments Table:
StudentID   CourseID   CourseName
1           101        Math
1           102        English
2           101        Math
2           103        History

Courses Table:
CourseID   Instructor
101        Dr. Smith
102        Prof. Lee
103        Dr. Brown

Instructors Table:
Instructor   InstructorEmail
Dr. Smith    [email protected]
Prof. Lee    [email protected]
Dr. Brown    [email protected]

Benefits of 3NF
Reduced Redundancy: By removing transitive dependencies, 3NF minimizes data duplication.
Improved Data Integrity: Ensures that updates and deletions are easier to manage without introducing anomalies.
Enhanced Clarity: Simplifies the database structure, making it more logical and easier to understand.
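Assuming the three 3NF tables above exist with these column names, the original wide table can be reassembled with joins, which illustrates that the decomposition loses no information:

SELECT e.StudentID,
       e.CourseID,
       e.CourseName,
       c.Instructor,
       i.InstructorEmail
FROM Enrollments e
JOIN Courses c ON c.CourseID = e.CourseID
JOIN Instructors i ON i.Instructor = c.Instructor;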
❖ Boyce-Codd Normal Form (BCNF)
Boyce-Codd Normal Form (BCNF) is an enhanced version of Third Normal Form (3NF). BCNF handles certain types of anomalies that are not handled by 3NF.

Definition of BCNF
A relation is in Boyce-Codd Normal Form (BCNF) if, for every one of its non-trivial functional dependencies X → Y, X is a superkey. Formally, a relation R is in BCNF if and only if, for every functional dependency X → Y in R, X is a superkey of R.

Understanding Superkeys
A superkey is a set of one or more columns (attributes) that can uniquely identify a row in a table. A candidate key is a minimal superkey, meaning that no proper subset of it is also a superkey.

Example
Consider the following table that is in 3NF but not in BCNF:

StudentID   CourseID   Instructor
1           101        Dr. Smith
1           102        Prof. Lee
2           101        Dr. Smith
2           103        Dr. Brown

Functional Dependencies:
StudentID, CourseID → Instructor
Instructor → CourseID

Decomposition into BCNF
To decompose the table into BCNF, we remove the BCNF violation by ensuring that the left-hand side of every dependency is a superkey.
1. Identify the violating dependency: Instructor → CourseID (Instructor is not a superkey of the original relation).
2. Decompose the original table to satisfy BCNF:

Relation 1: Instructors(Instructor, CourseID)
Functional Dependency: Instructor → CourseID. Here, Instructor is the superkey.

Relation 2: Enrollments(StudentID, CourseID)
The combined key (StudentID, CourseID) is the superkey.

Resulting Tables

Instructors Table:
Instructor   CourseID
Dr. Smith    101
Prof. Lee    102
Dr. Brown    103

Enrollments Table:
StudentID   CourseID
1           101
1           102
2           101
2           103

Benefits of BCNF
Eliminates Redundancy: By ensuring that all dependencies have superkeys on the left-hand side, BCNF minimizes redundancy.
Prevents Anomalies: Reduces update, insertion, and deletion anomalies that could occur due to indirect dependencies.
Enhances Data Integrity: By adhering strictly to the superkey rule, BCNF ensures better data consistency and integrity.
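A minimal SQL sketch of this BCNF decomposition (data types are assumed): declaring Instructor as the primary key of the Instructors table is what enforces the dependency Instructor → CourseID.

CREATE TABLE Instructors (
    Instructor VARCHAR(50) PRIMARY KEY,
    CourseID   INT
);

CREATE TABLE Enrollments (
    StudentID INT,
    CourseID  INT,
    PRIMARY KEY (StudentID, CourseID)
);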
❖ Fourth Normal Form (4NF)
Fourth Normal Form (4NF) is an advanced level of database normalization that goes beyond BCNF (Boyce-Codd Normal Form). It addresses multi-valued dependencies, which can cause redundancy and anomalies even in BCNF-compliant tables.

Definition of 4NF
A relation is in Fourth Normal Form (4NF) if:
1. It is in Boyce-Codd Normal Form (BCNF).
2. It has no multi-valued dependencies: there should be no non-trivial multi-valued dependency whose left-hand side is not a superkey (candidate key).

Understanding Multi-Valued Dependencies
A multi-valued dependency (MVD) exists in a relation when one attribute determines a set of values for another attribute, independently of the values of the other attributes. In other words, an MVD occurs when, for a single value of attribute A, there are multiple values of attribute B that do not depend on attribute C.

Example: STUDENT

STU_ID   COURSE      HOBBY
21       Computer    Dancing
21       Math        Singing
34       Chemistry   Dancing
74       Biology     Cricket
59       Physics     Hockey

The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent entities; there is no relationship between COURSE and HOBBY. In the STUDENT relation, the student with STU_ID 21 has two courses (Computer and Math) and two hobbies (Dancing and Singing), so there is a multi-valued dependency on STU_ID, which leads to unnecessary repetition of data. To bring the table into 4NF, we decompose it into two tables:

STUDENT_COURSE
STU_ID   COURSE
21       Computer
21       Math
34       Chemistry
74       Biology
59       Physics

STUDENT_HOBBY
STU_ID   HOBBY
21       Dancing
21       Singing
34       Dancing
74       Cricket
59       Hockey

Benefits of 4NF
Eliminates Redundancy: By removing multi-valued dependencies, 4NF reduces data redundancy.
Prevents Anomalies: It avoids update, insertion, and deletion anomalies related to multi-valued dependencies.
Enhances Data Integrity: Ensures that the database maintains accurate and consistent data by properly organizing independent sets of values.

❖ Model Questions
1. Given a relation R(A, B, C, D) with the functional dependencies {A → B, B → C, A → D}, normalize R into 3NF.
2. "Every relation in BCNF is also in 3NF, but a relation in 3NF is not necessarily in BCNF." Explain.
3. Compare 3NF and BCNF with a suitable example.

Study Material: Transaction in DBMS
Course Name: Database Management System
Course Code: BCA37107
Module: V

Table of Contents:
Basic Concept of Transaction
Concurrency Problems in DBMS
Serializability in DBMS
Conflict Serializability
View Serializability
Recoverability in DBMS
Two Phase Locking Protocol

Transaction:
A transaction is a set of operations that are all logically related; it is a single logical unit of work formed by a set of operations.

Operations in Transaction:
The main operations in a transaction are:
Read operation
Write operation

Example of Transaction:
A transaction to transfer 500 from account A to account B:
read(A)
A := A – 500
write(A)
read(B)
B := B + 500
write(B)
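The same transfer can be sketched in SQL as a single transaction; the account table and balance column names are assumptions for illustration, and the exact transaction syntax varies slightly between database systems.

START TRANSACTION;

UPDATE account SET balance = balance - 500 WHERE account_id = 'A';
UPDATE account SET balance = balance + 500 WHERE account_id = 'B';

COMMIT;  -- both updates become permanent together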
Transaction States:
A transaction goes through many different states throughout its life cycle. These states are called transaction states. The transaction states are:
Active state
Partially committed state
Committed state
Failed state
Aborted state
Terminated state

Active State
This is the first state in the life cycle of a transaction. A transaction is said to be in the active state as long as its instructions are being executed. All the changes made by the transaction so far are stored in the buffer in main memory.

Partially Committed State
After the last instruction of the transaction has been executed, the transaction enters the partially committed state. It is not considered fully committed because all the changes made by the transaction are still stored in the buffer in main memory.

Committed State
After all the changes made by the transaction have been successfully stored in the database, it enters the committed state. Now the transaction is considered fully committed. Once a transaction has entered the committed state, it is not possible to roll it back.

Failed State
When a transaction is executing in the active state or the partially committed state and some failure occurs that makes it impossible to continue execution, the transaction enters the failed state.

Aborted State
After a transaction has failed and entered the failed state, all the changes made by it have to be undone. To undo these changes, the transaction must be rolled back. After the transaction has been rolled back completely, it enters the aborted state.

Terminated State
This is the last state in the life cycle of a transaction. After entering the committed state or the aborted state, the transaction finally enters the terminated state, where its life cycle comes to an end.

ACID Properties
It is important to ensure that the database remains consistent before and after a transaction. To ensure the consistency of the database, certain properties are followed by all the transactions occurring in the system. These properties are called the ACID properties of a transaction.

Atomicity
This property ensures that either the transaction occurs completely or it does not occur at all; no transaction occurs partially. This is why it is also referred to as the "all or nothing" rule.

Consistency
This property ensures that integrity constraints are maintained, i.e., that the database remains consistent before and after the transaction. It is the responsibility of the DBMS and the application programmer to ensure the consistency of the database.

Isolation
This property ensures that multiple transactions can occur simultaneously without causing any inconsistency. During execution, each transaction behaves as if it were executing alone in the system; it is unaware of the other transactions executing in parallel. Changes made by a transaction become visible to other transactions only after they have been written.

Durability
This property ensures that all the changes made by a transaction after its successful execution are written to disk, and that these changes persist permanently and are never lost even if a failure of any kind occurs. It is the responsibility of the recovery manager to ensure durability in the database.
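To connect atomicity and the aborted state to SQL, the rough sketch below abandons a partial transfer with ROLLBACK; the table name and the failure scenario are assumed for illustration.

START TRANSACTION;

UPDATE account SET balance = balance - 50 WHERE account_id = 'A';
-- Suppose a failure is detected here, before account B can be credited.
-- Rolling back undoes the debit as well, so the transaction is all-or-nothing.
ROLLBACK;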
Example of Atomicity
Transaction to transfer 50 from account A to account B:
1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
Atomicity requirement: if the transaction fails after step 3 and before step 6, money will be "lost", leaving the database in an inconsistent state.

Example of Durability
Once the user has been notified that the transaction has completed (i.e., the transfer of the 50 has taken place), the updates made to the database by the transaction must persist even if there are software or hardware failures.

Example of Consistency
The sum of A and B is unchanged by the execution of the transfer transaction (steps 1-6 above).

Example of Isolation
Isolation requirement: if, between steps 3 and 6, another transaction T2 is allowed to access the partially updated database, it will see an inconsistent database (the sum A + B will be less than it should be).

T1                          T2
1. read(A)
2. A := A – 50
3. write(A)
                            read(A), read(B), print(A+B)
4. read(B)
5. B := B + 50
6. write(B)

Concurrency Problems in DBMS
When multiple transactions execute concurrently in an uncontrolled or unrestricted manner, several problems may arise. Such problems are called concurrency problems.

Types of concurrency problems

Dirty Read Problem:
Reading data written by an uncommitted transaction is called a dirty read. The read is called dirty because there is always a chance that the uncommitted transaction will roll back later; thus, an uncommitted transaction may cause other transactions to read a value that never really existed, which leads to inconsistency of the database. A dirty read does not always lead to inconsistency; it becomes problematic only when the uncommitted transaction fails and rolls back later for some reason. (A two-session SQL sketch of a dirty read follows this list of problems.)

Unrepeatable Read Problem:
This problem occurs when a transaction reads different values of the same variable in its different read operations, even though it has not updated the value itself. The unrepeatable read problem is also known as inconsistent retrievals.

Lost Update Problem:
This problem occurs when multiple transactions execute concurrently and updates from one or more transactions get lost: two transactions that access the same database items interleave their operations in a way that makes the value of some database item incorrect.

Phantom Read Problem:
This problem occurs when a transaction reads some variable from the buffer and, when it reads the same variable later, finds that the variable no longer exists.
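The dirty read problem above can be sketched as two SQL sessions; this assumes an account table and a database system that actually permits the READ UNCOMMITTED isolation level (syntax and behaviour vary between systems).

-- Session 1 (transaction T1): updates a row but does not commit yet
START TRANSACTION;
UPDATE account SET balance = balance - 50 WHERE account_id = 'A';

-- Session 2 (transaction T2): running at READ UNCOMMITTED
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
START TRANSACTION;
SELECT balance FROM account WHERE account_id = 'A';  -- sees T1's uncommitted value: a dirty read
COMMIT;

-- Session 1: T1 now fails and rolls back, so the value T2 read never really existed
ROLLBACK;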
Serializability in DBMS

Schedules in DBMS
The order in which the operations of multiple transactions appear for execution is called a schedule.

Types of Schedules

Serial Schedules:
In a serial schedule, all the transactions execute serially, one after the other. While one transaction executes, no other transaction is allowed to execute. For example, consider a schedule with two transactions T1 and T2 executing serially one after the other: transaction T1 executes first, and after T1 completes its execution, transaction T2 executes. Such a schedule is a serial schedule.

Non-Serial Schedules:
In a non-serial schedule, multiple transactions execute concurrently, and the operations of all the transactions are interleaved (mixed) with each other. For example, a schedule in which two transactions T1 and T2 execute concurrently, with their operations interleaved, is a non-serial schedule.

Serializability in DBMS:
We have discussed that a schedule is the order in which the operations of multiple transactions appear for execution, that serial schedules are always consistent, and that non-serial schedules are not always consistent.

Serializability
When multiple transactions run concurrently, there is a possibility that the database may be left in an inconsistent state. Serializability is the concept that helps us check which schedules are serializable. A serializable schedule is one that always leaves the database in a consistent state.

Serializable Schedules
If a given non-serial schedule of n transactions is equivalent to some serial schedule of the same n transactions, it is called a serializable schedule.

Types of Serializability: Serializability is mainly of two types: conflict serializability and view serializability.

Conflict Serializable Schedule
A schedule is conflict serializable if it can be transformed into a serial schedule by swapping non-conflicting operations, i.e., if it is conflict equivalent to a serial schedule.
Conflicting Operations: Two operations conflict if all of the following conditions hold:
They belong to different transactions.
They operate on the same data item.
At least one of them is a write operation.
Conclusion: if a non-serial schedule can be converted into a serial schedule by swapping non-conflicting instructions, the schedule is conflict serializable.

View Serializability
A schedule is view serializable if it is view equivalent to a serial schedule. If a schedule is conflict serializable, then it is also view serializable. A schedule that is view serializable but not conflict serializable contains blind writes.

View Equivalent: Two schedules S1 and S2 are said to be view equivalent if they satisfy the following conditions: Initial Read, Updated Read, and Final Write.
Initial Read: The initial read of both schedules must be the same. Suppose there are two schedules S1 and S2. If in schedule S1 a transaction T1 reads the data item A, then in S2 transaction T1 should also read A.
Updated Read: If in schedule S1 transaction T1 reads a data item updated by T2, then in schedule S2, T1 should read the value written by T2 on the same data item.
Final Write: The final write must be the same in both schedules. If in schedule S1 a transaction T1 performs the final write on A, then in S2 the final write on A should also be done by T1.

What is Concurrency Control?
Concurrency control is the procedure in a DBMS for managing simultaneous operations without them conflicting with one another. Concurrent access is quite easy if all users are only reading data; in that case there is no way they can interfere with one another. However, any practical database has a mix of READ and WRITE operations, and hence concurrency is a challenge.

Concurrency Control Protocols
Different concurrency control protocols offer different trade-offs between the amount of concurrency they allow and the amount of overhead they impose:
Lock-Based Protocols
Two-Phase Locking
Timestamp-Based Protocols

Lock-based Protocols:
A lock is a data variable that is associated with a data item. Isolation is the most important idea for achieving consistency, and locking is the simplest way to achieve isolation.

Lock-based Protocols (Cont.):
First obtain a lock on the data item, then perform the desired operation, and then unlock the data item. To provide better consistency along with isolation, different lock modes are used.
1. Shared lock: It is also known as a read-only lock. With a shared lock, the data item can only be read by the transaction. It can be shared between transactions, because while a transaction holds a shared lock it cannot update the data item.
2. Exclusive lock: With an exclusive lock, the data item can be both read and written by the transaction. This lock is exclusive, so multiple transactions cannot modify the same data item simultaneously.

Locks:
Lock-based protocols require transactions to obtain a lock on every object before a write operation is performed. Transactions may unlock the data item after completing the write operation.

Two-phase locking (2PL)
The two-phase locking protocol divides the execution of the transaction into three parts. In the first part, when the execution of the transaction starts, it seeks permission for the locks it requires. In the second part, the transaction acquires all the locks. The third part starts as soon as the transaction releases its first lock; from this point on, the transaction cannot demand any new locks, it only releases the locks it has acquired.

Growing phase and shrinking phase
Growing phase: In the growing phase, the transaction may acquire new locks on data items, but none can be released.
Shrinking phase: In the shrinking phase, existing locks held by the transaction may be released, but no new locks can be acquired.
The transaction can perform read/write operations in both the growing and the shrinking phase.
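In SQL terms, explicit row locks can be acquired during the growing phase with SELECT ... FOR UPDATE (available in systems such as MySQL and PostgreSQL); in practice these locks are usually held until COMMIT, which corresponds to strict two-phase locking. A rough sketch with assumed table and column names:

START TRANSACTION;

-- Growing phase: acquire locks on the rows for accounts A and B
SELECT balance FROM account WHERE account_id = 'A' FOR UPDATE;
SELECT balance FROM account WHERE account_id = 'B' FOR UPDATE;

-- Perform the reads/writes under the locks
UPDATE account SET balance = balance - 50 WHERE account_id = 'A';
UPDATE account SET balance = balance + 50 WHERE account_id = 'B';

-- Shrinking phase: COMMIT releases all the locks at once (strict 2PL)
COMMIT;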
2PL Example:
The following shows how unlocking and locking work with 2PL.
Transaction T1: Growing phase: steps 0-2; Shrinking phase: steps 4-6; Lock point: at step 3.
Transaction T2: Growing phase: steps 1-5; Shrinking phase: steps 7-8; Lock point: at step 6.

Ensuring Conflict/View Serializability:
Under 2PL, the serializability order of the transactions is the order in which they reach their lock points. For a schedule S involving transactions T1, T2, T3 and T4, the conflict-equivalent serial order (by lock points) is: T2 > T1 > T3 > T4.

Example of an Unrecoverable Schedule
Consider two transactions T1 and T2 operating on data item A:
1. T1 reads A.
2. T1 updates A (write operation).
3. T2 reads A.
4. T2 commits.
5. T1 fails and is rolled back.
Issue in this schedule: T2 has read the value of A written by the uncommitted transaction T1 and has then committed. When T1 later fails and rolls back, the value that T2 read never really existed, which causes inconsistency. Since T2 has already committed, its changes cannot be undone, making the schedule unrecoverable.

Sample Questions:
1. Let three transactions T1, T2 and T3 have the schedule: R1(A), W2(A), W1(A), W3(A). Check whether the schedule is view serializable or not.
Schedule given: R1(A), W2(A), W1(A), W3(A), where:
R1(A): transaction T1 reads A.
W2(A): transaction T2 writes A.
W1(A): transaction T1 writes A.
W3(A): transaction T3 writes A.
Steps to verify view serializability:
1. Determine the final write: the final write on A is W3(A), performed by T3. In any view-equivalent serial schedule, T3 must write A last.
2. Determine the initial read: R1(A) occurs before any write on A, so T1 reads the initial value of A. In any view-equivalent serial schedule, T1 must read A before any other transaction writes it, so T1 must come first.
3. These two constraints give the serial order T1 → T2 → T3. Check the view-equivalence conditions against this order: the initial read of A is performed by T1 in both schedules, the only read (R1(A)) reads the initial value of A in both schedules, and the final write on A is W3(A) in both schedules.
All three conditions (initial read, updated read, final write) are satisfied, so the given schedule is view serializable, and the equivalent serial schedule is T1 → T2 → T3. (W2(A) and W1(A) are blind writes; the schedule is view serializable even though it is not conflict serializable.)

2. Consider the two transactions T1 and T2 such that
T1: R1(A) W1(A) R1(B) W1(B)
T2: R2(A) W2(A) R2(C) W2(C)
Let a schedule S be: R1(A) W1(A) R2(A) W2(A) R1(B) W1(B) R2(C) W2(C). Find out whether the given schedule is conflict serializable or not.

JOIN in SQL
Definition: JOIN is an SQL clause that combines columns from two or more tables based on a related column between them.
Different types of JOIN operations:
Inner join
✓ Equi Join
✓ Natural Join
Outer join
✓ Left Outer Join
✓ Right Outer Join
✓ Full Outer Join

Inner Join: The inner join keeps only the information from the two joined tables that is related.

EQUI JOIN: An EQUI JOIN creates a join based on equality (matching values) between columns of the related tables. An EQUI JOIN can also be written using JOIN with ON, providing the column names with their respective tables and checking equality with the equals sign (=).
Example:
SELECT student.name, student.id, record.class, record.city
FROM student, record
WHERE student.city = record.city;
OR,
SELECT student.name, student.id, record.class, record.city
FROM student JOIN record ON student.city = record.city;

NATURAL JOIN: A natural join is an SQL join operation that creates a join on the basis of the common columns in the tables. To perform a natural join there must be at least one common attribute (column) with the same name in the two tables.
Example: Assume employee and department have a common attribute with the same name.
SELECT * FROM employee NATURAL JOIN department;

Difference between Natural Join and Inner Join
A natural join joins two tables based on attributes that have the same name and data type; the resulting table contains all the attributes of both tables but keeps only one copy of each common column. An inner join joins two tables on the basis of the column that is explicitly specified in the ON clause; the resulting table contains all the attributes from both tables, including the common column.

OUTER JOIN: An outer join is a type of SQL JOIN operation that retrieves matching records and, depending on the type of outer join, also the non-matching records from one or both tables. Let two tables Table1 and Table2 be as follows:

Table1
A   B
1   B1
2   B2
3   B3
4   B4

Table2
A   C
1   C1
3   C2

1. LEFT OUTER JOIN: Returns all records from the left table, and the matched records from the right table.
Example:
SELECT Table1.A, Table1.B, Table2.C
FROM Table1 LEFT OUTER JOIN Table2 ON Table1.A = Table2.A;
OUTPUT
A   B    C
1   B1   C1
3   B3   C2
2   B2   NULL
4   B4   NULL

2. RIGHT OUTER JOIN: Returns all records from the right table, and the matched records from the left table.
Example:
SELECT Table1.A, Table1.B, Table2.C
FROM Table1 RIGHT OUTER JOIN Table2 ON Table1.A = Table2.A;

3. FULL OUTER JOIN: Returns all records when there is a match in either the left or the right table.
Example:
SELECT Table1.A, Table1.B, Table2.C
FROM Table1 FULL OUTER JOIN Table2 ON Table1.A = Table2.A;

What is Lossless Decomposition?
A lossless join decomposition is a decomposition of a relation R into relations R1 and R2 such that if we perform a natural join of R1 and R2, it returns the original relation R. This is effective for removing redundancy from databases while preserving the original data. For a lossless join decomposition we always have
R1 ⋈ R2 ⋈ R3 ⋈ … ⋈ Rn = R
where ⋈ is the natural join operator.

Determining Whether a Decomposition Is Lossless or Lossy
Consider a relation R decomposed into two sub-relations R1 and R2.
✓ If all the following conditions are satisfied, the decomposition is lossless.
✓ If any of these conditions fails, the decomposition is lossy.
Condition-01:
❖ The union of the two sub-relations must contain all the attributes present in the original relation R. Thus, R1 ∪ R2 = R.
Condition-02:
❖ The intersection of the two sub-relations must not be empty.
❖ In other words, there must be some common attribute present in both sub-relations. Thus, R1 ∩ R2 ≠ ∅.
Condition-03:
❖ The intersection of the two sub-relations must be a superkey of R1, of R2, or of both. That is, R1 ∩ R2 → R1 or R1 ∩ R2 → R2.

Problem 1: Let a relation R with FDs = {A → B, B → C} be decomposed into two sub-relations R1(A, B) and R2(B, C). Check whether the decomposition is lossless or lossy.
Solution:
Condition-01: R1 ∪ R2 = R.
Condition-02: R1 ∩ R2 = {B} ≠ ∅.
Condition-03: the closure of B is B+ = {B, C}, which covers all attributes of R2, so R1 ∩ R2 → R2.
Hence the decomposition is lossless.
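A small SQL sketch of Problem 1 (table and column names are illustrative, and NATURAL JOIN support varies by system): projecting R(A, B, C) onto R1(A, B) and R2(B, C) and then natural-joining the two pieces returns the original rows, which is what R1 ⋈ R2 = R expresses.

CREATE TABLE R  (A INT, B INT, C INT);
CREATE TABLE R1 (A INT, B INT);
CREATE TABLE R2 (B INT, C INT);

-- Project R onto the two sub-relations
INSERT INTO R1 SELECT DISTINCT A, B FROM R;
INSERT INTO R2 SELECT DISTINCT B, C FROM R;

-- Natural join on the common column B; because B -> C holds in R,
-- this returns exactly the rows of R (the decomposition is lossless)
SELECT * FROM R1 NATURAL JOIN R2;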
Another way to check whether a decomposition is lossless or lossy is the following tabular (chase) method.

Lossless Join Example discussed in class
Let R = ABCDE, R1 = AD, R2 = AB, R3 = BE, R4 = CDE, and R5 = AE. Let the functional dependencies be:
A -> C, B -> C, C -> D, DE -> C, CE -> A
Apply Algorithm 7.2 from the class handout to test whether the decomposition of R into {R1, ..., R5} is a lossless join decomposition. The initial table looks as follows:

          A     B     C     D     E
R1(AD)    a1    b12   b13   a4    b15
R2(AB)    a1    a2    b23   b24   b25
R3(BE)    b31   a2    b33   b34   a5
R4(CDE)   b41   b42   a3    a4    a5
R5(AE)    a1    b52   b53   b54   a5

Apply FD A -> C to the initial table to repair the violated dependencies. Rows 1, 2 and 5 agree on A (a1), so their values for the right-hand-side attribute C must be equated: equate b13, b23 and b53 to b13 (you might just as well have picked b23 or b53 as the common symbol). We do not change rows 3 and 4 yet, because their A symbols b31 and b41 differ from a1.

          A     B     C     D     E
R1(AD)    a1    b12   b13   a4    b15
R2(AB)    a1    a2    b13   b24   b25
R3(BE)    b31   a2    b33   b34   a5
R4(CDE)   b41   b42   a3    a4    a5
R5(AE)    a1    b52   b13   b54   a5

Apply B -> C next: rows 2 and 3 agree on B (a2), so equate b33 with b13.

          A     B     C     D     E
R1(AD)    a1    b12   b13   a4    b15
R2(AB)    a1    a2    b13   b24   b25
R3(BE)    b31   a2    b13   b34   a5
R4(CDE)   b41   b42   a3    a4    a5
R5(AE)    a1    b52   b13   b54   a5

Next, use C -> D: rows 1, 2, 3 and 5 now agree on C (b13), so equate their D symbols a4, b24, b34 and b54; since one of them is the distinguished symbol a4, they all become a4.

          A     B     C     D     E
R1(AD)    a1    b12   b13   a4    b15
R2(AB)    a1    a2    b13   a4    b25
R3(BE)    b31   a2    b13   a4    a5
R4(CDE)   b41   b42   a3    a4    a5
R5(AE)    a1    b52   b13   a4    a5

DE -> C lets us equate b13 (all of its occurrences) with a3, because rows 3, 4 and 5 now agree on D and E. The following table shows the corresponding changes.

          A     B     C     D     E
R1(AD)    a1    b12   a3    a4    b15
R2(AB)    a1    a2    a3    a4    b25
R3(BE)    b31   a2    a3    a4    a5
R4(CDE)   b41   b42   a3    a4    a5
R5(AE)    a1    b52   a3    a4    a5

Apply CE -> A to the table above: rows 3, 4 and 5 agree on C and E, so equate b31, b41 and a1.

          A     B     C     D     E
R1(AD)    a1    b12   a3    a4    b15
R2(AB)    a1    a2    a3    a4    b25
R3(BE)    a1    a2    a3    a4    a5
R4(CDE)   a1    b42   a3    a4    a5
R5(AE)    a1    b52   a3    a4    a5

The middle row (for R3) in the table above is all a's, so the decomposition has a lossless join.

What happens if we apply the FDs in a different order? Start with any FD whose left-hand-side attribute has the same symbol (a or b) in at least two rows. We have two choices: A -> C or B -> C. We tried A -> C in the previous test; try B -> C here (rows 2 and 3 agree on B, so equate b33 with b23).

          A     B     C     D     E
R1(AD)    a1    b12   b13   a4    b15
R2(AB)    a1    a2    b23   b24   b25
R3(BE)    b31   a2    b23   b34   a5
R4(CDE)   b41   b42   a3    a4    a5
R5(AE)    a1    b52   b53   b54   a5

Apply A -> C after that to get the following table. Notice that b23 was chosen as the common symbol when equating b13, b23 and b53.

          A     B     C     D     E
R1(AD)    a1    b12   b23   a4    b15
R2(AB)    a1    a2    b23   b24   b25
R3(BE)    b31   a2    b23   b34   a5
R4(CDE)   b41   b42   a3    a4    a5
R5(AE)    a1    b52   b23   b54   a5

The table shown above is similar to the one we obtained before after applying A -> C and B -> C to the initial table. To the current table we apply C -> D as before, because we have the same symbol b23 in rows 1, 2, 3 and 5.