Object-Oriented Database Concepts PDF


Summary

This document provides an overview of object-oriented database concepts. It explains the core principles of object-oriented databases, including object structure, object classes, polymorphism, inheritance, and abstraction. These concepts are fundamental to understanding how complex data can be represented and managed efficiently in database systems.

Full Transcript


CHAPTER ONE

1.1. Overview of Object-Oriented Concepts

Introduction

Database systems based on the object data model were originally known as object-oriented databases (OODBs) but are now referred to as object databases (ODBs). Traditional data models and systems, such as the network, hierarchical, and relational models, have been quite successful in developing the database technologies required for many traditional business database applications. However, they have certain shortcomings when more complex database applications must be designed and implemented, such as databases for engineering design and manufacturing, biological and other sciences, telecommunications, geographic information systems, and multimedia. ODBs were developed for applications whose requirements call for more complex structures for stored objects. A key feature of object databases is the power they give the designer to specify both the structure of complex objects and the operations that can be applied to those objects.

Another reason for the creation of object-oriented databases is the vast increase in the use of object-oriented programming languages for developing software applications. Databases are fundamental components in many software systems, and traditional databases are sometimes difficult to use with software applications that are developed in an object-oriented programming language such as C++ or Java. Object databases are designed so they can be directly and seamlessly integrated with software developed using object-oriented programming languages.

An object-oriented database is based on the principles of object-oriented programming (OOP). In an object-oriented database, data is organized and stored as objects. Objects are self-contained units that contain both data and the operations or methods that can be performed on that data. This supports efficient representation and management of complex data structures and relationships. Object-oriented databases are often used in applications that require the efficient management of complex data structures and relationships, such as CAD/CAM systems, geographic information systems, and document management systems. They are also well suited for applications that require the integration of different data types and sources, such as multimedia data or data from multiple sources. However, object-oriented databases can be more difficult to learn and use than other database models, and they may require specialized expertise to set up and manage.

 Objects are real-world entities, such as a specific task in a to-do list: "take the garbage out". Every object is assigned a class within the data structure, for both hierarchy and functional purposes. So, when you hear the phrase "instances of a class," this is simply a reference to objects created from particular classes.

 Attributes and Methods: An object has state and behaviors. Objects have properties (attributes) such as name, status, and createdate; the set of properties, taken together, represents the object's state. Objects also have behaviors (known as methods, actions, or functions) that modify or operate on their properties. Examples include updatetask() or gettaskhistory(). Methods are also the primary path of object-to-object communication.

Here are the key concepts for object-oriented databases:

1. Object Structure: Object-oriented databases store data in the form of objects, which have attributes (variables) and methods (behaviors).
The object structure encapsulates data and code into a single unit, providing data abstraction. 2. Object Classes: Objects are instances of classes, which define the structure and behavior of the objects. Classes can inherit properties and methods from parent classes, allowing for code reuse and modeling of real-world relationships. 3. Polymorphism: Objects can take on multiple forms, allowing the same program code to work with different data types. This enables flexibility and extensibility in the database. Polymorphism is a fundamental concept in object-oriented programming (OOP) that allows objects of different classes to be treated as objects of a common superclass. Here's a more detailed explanation: Polymorphism refers to the ability of objects of different classes to respond to the same method call. It allows a single interface to be used with objects of different types. 4. Inheritance: New classes can inherit properties and methods from parent classes, creating a hierarchical relationship between related classes. This promotes code reuse and modeling of real-world entities.. 5. Abstraction: Only the essential data features needed for the required functionality are represented, reducing complexity and enabling reusability. Abstraction is the process of hiding unnecessary information from a user while displaying essential information that is valuable to them. Abstraction is a process of hiding the implementation details of a system from the user, and only the functional details will be available to the user end. On the other hand, Encapsulation is a method of wrapping up the data and code acting on the data into a single unit. 2 It does not display trivial or non-essential units to the user. By using interfaces and abstract classes, you can accomplish this. A user can only access the methods of an interface. A TV remote, for example, only shows the keys to the user, not the codes on which it operates since those are of no use to them. 6. Transparent Persistence: Object-oriented databases allow the use of an object-oriented programming language for data manipulation, seamlessly integrating the database with the application. 7. ACID Transactions: Object-oriented databases support ACID (Atomicity, Consistency, Isolation, Durability) transactions, ensuring data integrity even in the face of concurrent access and system failures. 8. Database Caching: Partial replicas of the database are stored in memory, allowing faster access to data compared to disk-based storage. 9. Recovery: Object-oriented databases provide mechanisms for disaster recovery in case of application or system failures. 1.2. Object Identity, Object Structure, and Type Constructors Object Identity In the context of databases, particularly Object-Oriented Database Management Systems (OODBMS), the object structure refers to the organization of data within an object. This structure encapsulates both the data (attributes) and the behavior (methods) associated with that data. Here’s a breakdown of the components that make up an object structure: Components of Object Structure Attributes: These are the properties or characteristics of the object. Each attribute holds a specific piece of data that defines the object. For example, in a Car object, attributes might include make, model, year, and color. Methods: These are functions or procedures that define the behavior of the object. Methods can manipulate the object's attributes or perform operations related to the object. 
For instance, a Car object might have methods like start(), stop(), and accelerate(). Messages: In object-oriented programming, messages are used to communicate between objects. When a method is invoked, a message is sent to the object to execute that method. Messages can be categorized as: Read-only messages: These do not alter the object's state. Read-only messages are operations that retrieve data from the database without modifying it. These messages are used to query the database and obtain information. Examples of read-only operations include: SELECT Queries: These are used to fetch data from one or more tables. For example: SELECT * FROM Students WHERE age > 18; This query retrieves all records of students older than 18 without altering any data in the database. 3 Functions that return specific information, such as counting records or finding averages, also fall under read- only messages. For instance: SELECT COUNT(*) FROM Students; This counts the total number of students in the database. Update messages: These change the state of the object by modifying its attributes. In the context of databases, read-only messages and update messages refer to the types of operations that can be performed on data stored in a database. Understanding the distinction between these two types of messages is crucial for managing data effectively. Update messages are operations that modify the data stored in the database. These messages can change existing records, insert new records, or delete records. Examples of update operations include: INSERT Statements: These are used to add new records to a table. For example: INSERT INTO Students (studentID, name, age, major) VALUES (1, 'Alice', 20, 'Computer Science'); This command adds a new student record to the Students table. UPDATE Statements: These modify existing records. For example: UPDATE Students SET age = 21 WHERE studentID = 1; This updates the age of the student with studentID 1 to 21. DELETE Statements: These remove records from a table. For example: DELETE FROM Students WHERE studentID = 1; This command deletes the student record with studentID 1 from the database.. Example of Object Structure Consider a simple example of a Student object: Object Name: Student Attributes: studentID: Integer name: String age: Integer major: String Methods: getDetails(): Returns the student's details. enroll(course): Enrolls the student in a specified course. In this example, the Student object encapsulates both its data (attributes) and its behavior (methods), providing a clear structure for managing student information in a database. The object structure in databases is crucial for organizing data in a way that reflects real-world entities and their interactions. By encapsulating attributes and methods, object-oriented databases facilitate a more intuitive and efficient way to manage complex data relationships. 4 Type Constructors Type constructors in databases are essential for defining complex data types and structures. They allow the creation of new types based on existing ones, enabling better organization and representation of data. Here’s an overview of type constructors along with examples. Type constructors are mechanisms that allow the definition of new data types by combining existing types. They are commonly used in object-oriented databases and object-relational databases. In database systems, particularly in object-oriented databases, the three most basic constructors are atom, tuple, and set. 
These constructors are fundamental for defining and organizing data structures. Here’s a brief overview of each: 1. Atom The atom constructor represents basic atomic values. These values can include integers, real numbers, character strings, Booleans, and any other basic data types supported by the system. Atoms are the simplest form of data and cannot be decomposed further. Example: An atomic value could be an integer like 42 or a string like "Hello, World!". 2. Tuple The tuple constructor is used to group multiple values into a single composite structure. A tuple can contain various types of data, including atoms and other tuples. It is similar to a record in relational databases. Example: A tuple representing a person might look like this: (Name: "Alice", Age: 30, Height: 5.5) Here, the tuple contains an atom (string for Name), an atom (integer for Age), and another atom (float for Height). An object type constructor allows you to create a new object type that encapsulates attributes and methods. For example, consider a Person object type: In this example, the Person type has three attributes: ID, Name, and Age. You can create instances of this type using the constructor: 5 3. Set The set constructor is used to represent a collection of unique values. Sets do not allow duplicate entries and are useful for representing relationships where the order of elements does not matter. Collection types allow you to define a type that can hold multiple values. For instance, you can create a nested table type to store a list of addresses: You can then use this collection type to store multiple addresses: Type constructors are powerful tools in databases that facilitate the creation of complex data types, enhancing the ability to model real-world entities. By using object types and collection types, developers can create structured and organized data representations that are easy to manage and manipulate. Object Identity Object identity in databases, particularly in Object-Oriented Database Management Systems (OODBMS), refers to the unique identification of objects that persists regardless of changes to their attributes. This concept is essential for maintaining data integrity and consistency. Here’s a detailed explanation along with an example: 6 Key Aspects of Object Identity 1. Unique Object Identifiers (OIDs): Each object in an OODBMS is assigned a unique identifier known as an Object Identifier (OID). This OID is generated by the system and ensures that each object can be distinctly identified, even if its attributes change over time. This is different from relational databases, where identity is often based on primary key values. 2. Immutability: OIDs are immutable, meaning that once an OID is assigned to an object, it cannot be changed or reassigned. This characteristic is crucial for preserving the identity of the object throughout its lifecycle, even if the object is modified or deleted. 3. Internal Management: OIDs are managed internally by the database system and are not visible to external users. This allows the system to maintain references between objects efficiently without exposing the complexity of these identifiers to users. 4. Challenges in Relational Models: In relational databases, if the value of a primary key changes, the identity of the tuple is considered to change as well. This can complicate the determination of whether different records represent the same real-world entity. 
In contrast, OIDs provide a consistent way to reference objects without being affected by changes in their attributes.

5. Efficient Retrieval: Modern OODBMS typically use long integers or similar identifiers as OIDs, which decouples the OID from the physical storage location. This allows for efficient object retrieval even after physical reorganizations of the database.

Example
For instance, consider a department object in a database. It might be represented as a tuple with attributes like department number, name, and a set of locations. The structure could look like this:
Department Object:
 OID: i1
 Type: Tuple
 Attributes: {DNO: i2, DNAME: i3, LOCATIONS: {i4, i5}}
Here, i2, i3, i4, and i5 would be OIDs representing the department number, name, and locations, respectively.

Object identity is a fundamental concept in OODBMS that ensures each object is uniquely identifiable and maintains its identity over time, independent of changes to its attributes. This is achieved through the use of immutable OIDs, which facilitate efficient data management and retrieval.

1.3. Encapsulation of Operations, Methods, and Persistence

Encapsulation is a fundamental concept in object-oriented programming (OOP) that involves bundling the data (attributes) and the methods (operations) that operate on that data into a single unit, or class. This principle helps to protect the internal state of an object from unintended interference and misuse.

Encapsulation operators:
 Constructor operation: used to create a new object
 Destructor operation: used to destroy (delete) an object
 Modifier operations: modify the state of an object

Key Concepts: Operations and Methods
 Operations refer to the actions that can be performed on an object.
 Methods are the implementations of those operations defined within a class. They dictate how the object's data can be manipulated.

Data Hiding
 By restricting access to certain components of an object, encapsulation helps maintain the integrity of the data. This is typically achieved using access modifiers (e.g., private, protected, public).
 Only the methods of the object are allowed to interact with its internal data, reducing the risk of unintended modifications.

Persistence
Persistence refers to the ability of an object to maintain its state between different sessions or instances. This is often achieved through serialization, databases, or file systems. Encapsulating persistence logic within the class ensures that developers can manage how data is saved and retrieved without exposing the underlying implementation details. In databases, persistence is the capability of data or objects to continue existing even after the system is shut down; it ensures that data remains safe and accessible to users in the future. Persistent data is crucial because it provides durability, allowing data to survive system shutdowns and updates. Persistent databases enable data manipulation across multiple transactions, providing consistency and reliability; they allow storage and retrieval of complex objects and maintain integrity by preserving their relationships.

1.4. Type hierarchies and inheritance

Type hierarchies and inheritance are essential features of Object-Oriented Databases (OODB), enabling the organization of data in a structured and efficient manner. These concepts allow for the creation of complex data models that reflect real-world relationships and behaviors.
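To make the idea concrete before the details are developed, the sketch below shows how the PERSON/EMPLOYEE/STUDENT hierarchy discussed in this section might be declared in an object-relational system. The Oracle-style CREATE TYPE ... UNDER syntax and all identifiers are assumptions used only for illustration; the chapter itself does not prescribe a particular system.

-- Supertype: PERSON, declared NOT FINAL so that subtypes may be created UNDER it.
CREATE TYPE person_t AS OBJECT (
  name       VARCHAR2(100),
  address    VARCHAR2(200),
  birth_date DATE,
  ssn        VARCHAR2(11),
  MEMBER FUNCTION age RETURN NUMBER      -- computed from birth_date, like the Age function described below
) NOT FINAL;

-- Subtype EMPLOYEE inherits all attributes and functions of PERSON and adds its own.
CREATE TYPE employee_t UNDER person_t (
  salary     NUMBER,
  hire_date  DATE,
  MEMBER FUNCTION seniority RETURN NUMBER   -- computed from hire_date
);

-- Subtype STUDENT likewise inherits from PERSON and adds local functions.
CREATE TYPE student_t UNDER person_t (
  major      VARCHAR2(50),
  MEMBER FUNCTION gpa RETURN NUMBER         -- computed from internally stored grades
);

The text that follows develops the same hierarchy in a system-independent notation, explaining which functions are stored attributes and which are computed operations.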
A type hierarchy in OODB allows for the definition of new types based on existing ones, creating a structured relationship among types. This is akin to class hierarchies in object-oriented programming, where subclasses inherit attributes and methods from their superclasses. Structure: Each type is defined by its name and a set of attributes (instance variables) and operations (methods). For example, a PERSON type might include attributes like Name, Address, and Birth_date. Inheritance Inheritance allows subtypes to inherit functions from their supertype, promoting code reuse and reducing redundancy. For instance, if EMPLOYEE and STUDENT are defined as subtypes of PERSON, they inherit all functions of PERSON while adding their specific attributes. Inheritance facilitates the creation of a type hierarchy, where each subtype can have additional functions that are not present in the supertype. This structure enhances data organization and retrieval, making it easier to manage complex data relationships Type hierarchies and inheritance in OODB provide a robust framework for organizing data, promoting reusability, and enhancing the flexibility of database design. By leveraging these concepts, developers can create efficient and maintainable data models that accurately reflect complex relationships in real-world applications. Inheritance allows the definition of new types based on other predefined types, leading to a type (class) hierarchy. A type is defined by assigning it a type name and then defining a number of attributes (instance variables) and operations (methods) for the class. In the simplified model we use in this section, the attributes and operations are together called functions, since attributes resemble functions with zero arguments. A function name can be used to refer to the value of an attribute or to refer to the resulting value of an operation (method). We use the term function to refer to both attributes and operations, since they are treated similarly in a basic introduction to inheritance. A class in its simplest form has a class name and a list of visible (public) functions. When specifying a class in this section, we use the following format, which does not specify arguments of functions, to simplify the discussion: CLASS_NAME: function, function, … , function 9 Example: a class that describes characteristics of a PERSON may be defined as follows: PERSON: Name, Address, Birth_date, Age, Ssn In the PERSON type, the Name, Address, Ssn, and Birth_date functions can be implemented as stored attributes, whereas the Age function can be implemented as an operation that calculates the Age from the value of the Birth_date attribute and the current date. The concept of subtype is useful when the designer or user must create a new type that is similar but not identical to an already defined type. The subtype then inherits all the functions of the predefined type, which is referred to as the super type. Example: suppose that we want to define two new types EMPLOYEE and STUDENT as follows: EMPLOYEE: Name, Address, Birth_date, Age, Ssn, Salary, Hire_date, Seniority STUDENT: Name, Address, Birth_date, Age, Ssn, Major, Gpa Since both STUDENT and EMPLOYEE include all the functions defined for PERSON plus some additional functions of their own, we can declare them to be subtypes of PERSON. Each will inherit the previously defined functions of PERSON—namely, Name, Address, Birth_date, Age, and Ssn. 
For STUDENT, it is only necessary to define the new (local) functions Major and Gpa, which are not inherited. Presumably, Major can be defined as a stored attribute, whereas Gpa may be implemented as an operation that calculates the student’s grade point average by accessing the Grade values that are internally stored (hidden) within each STUDENT object as hidden attributes. For EMPLOYEE, the Salary and Hire_date functions may be stored attributes, whereas Seniority may be an operation that calculates Seniority from the value of Hire_date. Therefore, we can declare EMPLOYEE and STUDENT as follows: EMPLOYEE subtype-of PERSON: Salary, Hire_date, Seniority STUDENT subtype-of PERSON: Major, Gpa In general, a subtype includes all of the functions that are defined for its super type plus some additional functions that are specific only to the subtype. Hence, it is possible to generate a type hierarchy to show the supertype/subtype relationships among all the types declared in the system. 10 Chapter 2: Query processing and Optimization 2.1. Translating SQL Queries into Relational Algebra 2.2. Basic Algorithms for Executing Query Operations 2.3. Using Heuristic in Query Optimization 2.4. Using Selectivity and Cost Estimates in Query Optimization 2.5. Semantic Query Optimization 1 Query Processing in DBMS Query processing in a relational database involves several steps to translate and execute a high- level query, such as SQL, into a low-level query that can be executed by the database system. Query Processing is the activity performed in extracting data from the database. In query processing, it takes various steps for fetching the data from the database. The steps involved are 1. Parsing and translation 2. Optimization 3. Evaluation Parsing and Translation: The user query, written in a high-level language like SQL, is translated into an internal representation known as a parse tree. The parser checks the syntax of the query, verifies the names of relations in the database, and ensures the required attribute values are present. The parse tree is then translated into a form of relational algebra, which is well-suited for the internal representation of the query. 2 Most queries submitted to a DBMS are in a high-level language such as SQL. During the parsing and translation stage, the human readable form of the query is translated into forms usable by the DBMS. These can be in the forms of a relational algebra expression, query tree and query graph. After parsing and translation into a relational algebra expression, the query is then transformed into a form, usually a query tree or graph that can be handled by the optimization engine. The optimization engine then performs various analyses on the query data, generating a number of valid evaluation plans. From there, it determines the most appropriate evaluation plan to execute Optimization: The database system generates an efficient query evaluation plan that minimizes the cost of executing the query. The query optimizer analyzes the estimated cost of each operation in the query and selects the most efficient execution plan. The optimization process considers factors such as memory allocations, execution costs, 3 and available indexes Evaluation: The translated query, along with the query evaluation plan, is passed to the query execution engine. The query execution engine executes the query evaluation plan and produces the output of the query. The execution engine may use various algorithms and techniques to evaluate each operation in the query. 
In summary, query processing in a relational database involves parsing and translating the user query, optimizing the query evaluation plan, and executing the plan to produce the desired output. The original document illustrates this workflow with a diagram; the following example walks through the same steps.

Suppose a user executes a query. As we have learned, there are various methods of extracting data from the database. In SQL, suppose a user wants to fetch the records of the employees whose salary is greater than 10000. For doing this, the following query is issued:

select emp_name from Employee where salary>10000;

To make the system understand the user query, it needs to be translated into relational algebra. This query can be expressed in relational algebra as:

 πemp_name (σsalary>10000 (Employee))

or, equivalently, with the projection of the needed attributes pushed below the selection:

 πemp_name (σsalary>10000 (πemp_name, salary (Employee)))

After translating the given query, each relational algebra operation can be executed using different algorithms. In this way, query processing begins its work. A query execution engine is responsible for generating the output of the given query: it takes the query execution plan, executes it, and finally produces the output for the user query.

Optimization

Query optimization is the process of choosing a suitable execution strategy for processing a query. A query typically has many possible execution strategies, and the process of choosing a suitable one for processing a query is known as query optimization. It is an activity conducted by a query optimizer in a DBMS to select the best available strategy for executing the query. High-level query languages like SQL (for RDBMS) are declarative in nature because they specify what the intended results of the query are, rather than identifying the details of how the result should be obtained. SQL queries are translated into corresponding relational algebra for optimization. The cost of query evaluation can vary for different types of queries. Although the system is responsible for constructing the evaluation plan, the user need not write the query in a particularly efficient form: usually, the database system generates an efficient query evaluation plan that minimizes its cost, and this task, performed by the database system, is known as query optimization. For optimizing a query, the query optimizer should have an estimated cost analysis of each operation, because the overall operation cost depends on memory allocations to the various operations, execution costs, and so on. Finally, after selecting an evaluation plan, the system evaluates the query and produces the output of the query.

2.1. Translating SQL Queries into Relational Algebra

Relational algebra is a procedural query language used in database theory. It provides a theoretical foundation for relational databases and SQL. Relational algebra uses algebraic structures to model data and define queries with a well-founded semantics.
Fundamental Operators: Selection (σ): This operator selects tuples from a relation based on a given condition Projection (π): It projects specific columns from a relation Union (U): This operator combines two relations and returns a relation that contains all the tuples from both relations Set Difference (-): It returns the tuples that are present in one relation but not in the other Set Intersection (∩): It returns the tuples that are common to both relations Rename (ρ): This operator is used to rename attributes of a relation Cartesian Product (X): It combines every tuple from one relation with every tuple from another relation Derived Operators Natural Join (⋈): It combines tuples from two or more relations based on the equality of their common attributes. Conditional Join: Similar to natural join, but allows for specifying conditions other than equality. Example : consider the following three table 6 7 2.2. Basic Algorithms for Executing Query Operations Basic algorithms for executing query operations in Object-Oriented Databases (OODB) are essential for efficiently retrieving and manipulating data. These algorithms are designed to handle various types of queries while considering the unique features of OODBs, such as object structures and relationships. Below are some fundamental algorithms commonly used in OODB query execution, along with examples to illustrate their application. 1. Traversal Algorithms Traversal algorithms are used to navigate through the object graph, which represents the relationships between objects. Depth-First Search (DFS): This algorithm explores as far down a branch of the graph as possible before backtracking. For example, if you have a hierarchy of employees where each employee can have subordinates, DFS can be used to retrieve all subordinates of a given employee. Example Consider a graph represented as follows: 8 In this graph: A is connected to B and C. B is connected to D and E. C is connected to F. DFS Algorithm Steps 1. Start at the Root: Begin at node A. 2. Visit Node: Mark A as visited. 3. Explore Neighbors: Move to the first unvisited neighbor, which is B. 4. Continue Down the Path: o Mark B as visited. o Move to D (the first unvisited neighbor of B). o Mark D as visited. Since D has no unvisited neighbors, backtrack to B. 5. Backtrack and Explore: o From B, move to E (the next unvisited neighbor). o Mark E as visited. Since E has no unvisited neighbors, backtrack to B, then backtrack to A. 6. Explore Remaining Neighbors: o From A, move to C (the next unvisited neighbor). o Mark C as visited. o Move to F (the only unvisited neighbor of C). o Mark F as visited. Since F has no unvisited neighbors, backtrack to C, then backtrack to A. Result of DFS Traversal The order of nodes visited in this DFS traversal would be: A, B, D, E, C, F. Breadth-First Search (BFS): This algorithm explores all neighbors at the present depth prior to moving on to nodes at the next depth level. For instance, if you want to find all employees at the same level in a hierarchy, BFS would be effective. 9 In this graph:  A is connected to B and C.  B is connected to D and E.  C is connected to F. BFS Algorithm Steps 1. Initialization: o Start with a queue and enqueue the starting node (A). o Mark A as visited. 2. Process the Queue: o Dequeue A and visit it. o Enqueue its unvisited neighbors (B and C) and mark them as visited. 3. Continue the Process: o Dequeue B and visit it. o Enqueue its unvisited neighbors (D and E) and mark them as visited. o Dequeue C and visit it. 
o Enqueue its unvisited neighbor (F) and mark it as visited. 4. Final Steps: o Dequeue D and visit it (no unvisited neighbors). o Dequeue E and visit it (no unvisited neighbors). o Dequeue F and visit it (no unvisited neighbors). Result of BFS Traversal The order of nodes visited in this BFS traversal would be: A, B, C, D, E, F. 2. Join Algorithms Join operations are crucial for combining data from multiple objects. Nested Loop Join: This straightforward method involves iterating through each object in one set and comparing it with every object in another set. For example, if you have two classes, Employee and Department, and you want to find all employees in a specific department, you would compare each employee with the department records. 10 Hash Join: This algorithm uses a hash table to store one of the input sets, allowing for faster lookups when matching tuples from the other set. For example, if you have a large dataset of Orders and Customers, you can hash the Customers dataset and quickly find matching orders. Merge Join: This method requires both input sets to be sorted on the join key. For example, if you have sorted lists of Products and Categories, you can efficiently merge them based on the category ID. Example1: Consider the same "employees" and "departments" tables. If both tables are sorted on the "dept_id", the sort-merge join algorithm will merge the sorted data based on the "dept_id" to perform the join operation. Example 2 11 After Sort-merge join using sid EXERSCISE :-join the following table using sort merge join 3. Selection and Projection Algorithms These algorithms focus on filtering and retrieving specific attributes from objects.  Selection: This operation filters objects based on specified criteria. For example, if you want to retrieve all employees with a salary greater than $50,000, the selection algorithm will filter out those who do not meet this criterion.  Projection: This operation retrieves specific attributes from objects. For instance, if you only need the names and salaries of employees, the projection algorithm will eliminate all other attributes from the result set. 4. Aggregation Algorithms Aggregation operations, such as COUNT, SUM, AVG, etc., are used to compute summary statistics over a set of objects. Group By: This algorithm groups objects based on specified attributes and applies aggregate functions to each group. For example, if you want to calculate the average 12 salary of employees in each department, you would group the employees by department and then compute the average salary for each group. Streaming Aggregation: This method processes data in a single pass, maintaining a running total or count. For instance, if you want to count the number of orders processed in a day, you can maintain a running total as each order is processed. 5. Optimization Techniques To enhance the performance of query execution, various optimization techniques can be applied: Indexing: Creating indexes on frequently queried attributes can significantly speed up data retrieval. For example, indexing the EmployeeID in the Employee class allows for faster lookups when querying by employee ID. Query Rewriting: Transforming the query into a more efficient form can reduce the complexity of execution. For instance, rewriting a query to filter data before performing joins can minimize the amount of data processed. Heuristic Optimization: Applying heuristic rules to determine the best execution plan based on empirical data can improve performance. 
For example, using heuristics to decide the order of joins based on the size of the datasets can lead to more efficient execution. These algorithms and techniques collectively contribute to the efficient execution of query operations in OODBs, ensuring that data retrieval and manipulation are performed effectively. 2.3. Using Heuristic in Query Optimization Heuristic optimization is a crucial technique used in database management systems (DBMS) to improve the efficiency of query execution. Unlike exhaustive optimization methods that evaluate all possible execution plans, heuristic optimization applies a set of predefined rules or guidelines to quickly derive a good execution plan. This approach is particularly beneficial in scenarios where time and computational resources are limited. 13 Key Components of Heuristic Optimization 1. Cost-Based Heuristics: o Heuristic strategies often involve estimating the cost of different execution plans based on statistical data about the database. This includes factors like data distribution and system parameters, which help in selecting plans with lower estimated overheads. 2. Join Order Heuristics: o When dealing with queries that involve multiple tables, determining the optimal order of joins is essential. Heuristic methods can utilize greedy algorithms or dynamic programming to explore different join orders and select the most efficient one. 3. Index Selection Heuristics: o Indexes can significantly speed up query processing. Heuristic optimization can help in selecting the most effective indexes based on query predicates and their selectivity, thereby improving performance. 4. Query Rewriting Heuristics: o This involves transforming the original query into a more efficient form. Techniques such as operation shifting and query splitting can help streamline the execution process by reducing the amount of data processed at each step. Heuristic Optimization rules Here are some common heuristic optimization rules: Selection Pushdown: Apply selection operations as early as possible in the query execution process. This reduces the number of rows that need to be processed in subsequent operations. Join Order Optimization: Choose an optimal order for joining tables. Generally, it is more efficient to join smaller tables first or those with more selective conditions to minimize the intermediate result size. Index Usage: Utilize available indexes effectively. Queries that can benefit from indexes should be rewritten to take advantage of them, as this can significantly reduce access time. Projection Pushdown: Apply projection operations (selecting specific columns) as early as possible to reduce the amount of data processed in later stages of the query. Elimination of Redundant Joins: 14 Remove any joins that do not contribute to the final result. If a join condition does not affect the output, it can be omitted to streamline the query. Query Rewriting: Transform complex queries into simpler forms that are easier to optimize. This can include breaking down subqueries or using equivalent expressions that are more efficient to execute. Use of Aggregation Early: If the query involves aggregation functions, perform these operations as early as possible to reduce the dataset size before further processing. 
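As a hypothetical illustration of the selection-pushdown and projection-pushdown rules above, the pair of queries below shows an original form and a heuristically rewritten form; the EMPLOYEE/DEPARTMENT tables and their columns are assumptions chosen only for this sketch.

-- Original form: the join is computed first, and filtering happens afterwards.
SELECT E.Lname
FROM EMPLOYEE E
JOIN DEPARTMENT D ON E.Dno = D.Dnumber
WHERE D.Dname = 'Research';

-- Rewritten form: the selective condition on DEPARTMENT is applied first, and only the
-- column needed for the join is carried forward (selection and projection pushdown).
SELECT E.Lname
FROM EMPLOYEE E
JOIN (SELECT Dnumber FROM DEPARTMENT WHERE Dname = 'Research') D
  ON E.Dno = D.Dnumber;

Modern optimizers usually perform this kind of rewriting internally, so both forms often end up with the same plan; writing it out simply makes the heuristic explicit.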
By applying these heuristic rules, a DBMS can significantly enhance the performance of query execution, leading to faster response times and more efficient resource utilization Example of Heuristic Optimization Consider the following SQL query: SELECT PNUMBER, DNUM, LNAME FROM PROJECT, DEPARTMENT, EMPLOYEE WHERE DNUM = DNUMBER AND MGREMPID = EMPID AND PLOCATION = 'Stafford'; In this scenario, a heuristic optimizer might apply the following steps: 1. Cascading Selections: The optimizer can break down the WHERE clause into individual selection conditions. For example, it can first filter rows based on PLOCATION = 'Stafford' before performing joins, reducing the number of rows processed in subsequent operations. 2. Join Order Optimization: The optimizer might determine that joining the DEPARTMENT table with the EMPLOYEE table first is more efficient than joining PROJECT first, based on the selectivity of the join conditions. 3. Using Indexes: If there are indexes on DNUM and MGREMPID, the optimizer can choose to use these indexes to speed up the retrieval of relevant rows, further enhancing performance. By applying these heuristic strategies, the optimizer can significantly reduce the execution time of the query compared to a naive approach that processes all tables and conditions without optimization. 15 2.4. Using Selectivity and Cost Estimates in Query Optimization Query optimization is a vital process in database management systems that aims to determine the most efficient way to execute a given SQL query. Two key concepts in this process are selectivity and cost estimates. Selectivity Selectivity refers to the fraction of rows that a query condition will return from a table. It is crucial for estimating how many rows will be processed at each step of the query execution. The selectivity of a predicate can significantly influence the choice of execution plans. For example, if a condition is highly selective (returning only a few rows), it is generally more efficient to apply that condition early in the query execution process. This reduces the amount of data that subsequent operations need to handle, thereby improving performance. Cost Estimates Cost estimates are numerical values assigned to different execution plans based on various factors, including: Access Cost: The cost associated with retrieving data from storage. Memory Usage Cost: The cost related to the amount of memory required during execution. Storage Cost: The cost of storing intermediate results generated during query processing. Computational Cost: The cost of CPU operations needed to process the query. The optimizer evaluates multiple execution plans by calculating the total cost for each plan and selecting the one with the lowest estimated cost. This process involves analyzing the query tree, which represents the operations needed to execute the query, and estimating the cost based on the expected cardinality of the output at each step. Example Consider a database with two tables: Employees and Departments. Employees Table: 16 o EmployeeID (Primary Key) o Name o DepartmentID (Foreign Key) Departments Table: o DepartmentID (Primary Key) o DepartmentName Suppose we want to execute the following query: SELECT Name FROM Employees WHERE DepartmentID = 5; By using selectivity and cost estimates, the query optimizer can efficiently determine that using the index on DepartmentID is the best execution strategy for this query, leading to faster performance. 
In summary, selectivity and cost estimates are fundamental components of query optimization that enable database systems to execute queries efficiently by selecting the most appropriate execution plans based on expected data characteristics and resource usage. 2.5. Semantic Query Optimization 17 Semantic Query Optimization (SQO) is an advanced technique in database management that enhances query execution efficiency by leveraging the semantic knowledge of the data. Unlike traditional optimization methods that focus primarily on physical aspects like index usage and join strategies, SQO utilizes the inherent meaning and constraints of the data to reformulate queries for better performance. Key Concepts of Semantic Query Optimization 1. Use of Constraints: o SQO employs constraints defined in the database schema, such as unique attributes and referential integrity constraints, to determine if a query can be simplified or modified. For example, if a query violates a known constraint, the optimizer can conclude that the result will be empty and avoid executing the query altogether. 2. Query Reformulation: o The optimizer can transform a query into a more efficient form based on the semantic understanding of the data. This may involve eliminating unnecessary joins or substituting complex subqueries with simpler ones, thereby reducing the computational load. 3. Active Rules and Metadata: o The incorporation of active rules and additional metadata in the database system can enhance the effectiveness of SQO. These rules can provide insights into how data is related and how queries can be optimized based on that relationship. Example of Semantic Query Optimization Consider the following SQL query: SELECT E.Lname, M.Lname FROM EMPLOYEE AS E, EMPLOYEE AS M WHERE E.Super_ssn = M.Ssn AND E.Salary > M.Salary; This query aims to retrieve the names of employees who earn more than their supervisors. However, if there is a constraint in the database schema stating that no employee can earn more than their direct supervisor, the semantic query optimizer can recognize that the result of this query will always be empty. Consequently, it can avoid executing the query, saving considerable time and resources. 18 19 Chapter 3: Transaction Processing Concepts 3.1. Introduction  The transaction is a set of logically related operation. It contains a group of tasks.  A transaction is an action or series of actions. It is performed by a single user to perform operations for accessing the contents of the database.  Transaction processing in a database refers to the sequence of operations performed as a single logical unit of work Example: Suppose an employee of bank transfers Birr 800 from X's account to Y's account. This small transaction contains several low-level tasks: X's Account 1. Open_Account(X) 2. Old_Balance = X.balance 3. New_Balance = Old_Balance - 800 4. X.balance = New_Balance 5. Close_Account(X) Y's Account 1. Open_Account(Y) 2. Old_Balance = Y.balance 3. New_Balance = Old_Balance + 800 4. Y.balance = New_Balance 5. Close_Account(Y) Following are the main operations of transaction: Read(X): Read operation is used to read the value of X from the database and stores it in a buffer in main memory. Write(X): Write operation is used to write the value back to the database from the buffer. 1 Let's take an example to debit transaction from an account which consists of following operations: 1. R(X); 2. X = X - 500; 3. W(X); Let's assume the value of X before starting of the transaction is 4000. 
 The first operation reads X's value from database and stores it in a buffer.  The second operation will decrease the value of X by 500. So buffer will contain 3500.  The third operation will write the buffer's value to the database. So X's final value will be 3500. But it may be possible that because of the failure of hardware, software or power, etc. that transaction may fail before finished all the operations in the set. For example: If in the above transaction, the debit transaction fails after executing operation 2 then X's value will remain 4000 in the database which is not acceptable by the bank. To solve this problem, we have two important operations: Commit: It is used to save the work done permanently. Rollback: It is used to undo the work done. 3.2. Properties of Transaction The transaction has the four properties. These are used to maintain consistency in a database, before and after the transaction. Property of Transaction 1. Atomicity 2. Consistency 3. Isolation 4. Durability 2 Atomicity  It states that all operations of the transaction take place at once if not, the transaction is aborted.  Ensures that a transaction is treated as a single unit. If any part of the transaction fails, the entire transaction is rolled back, leaving the database unchanged.  There is no midway, i.e., the transaction cannot occur partially. Each transaction is treated as one unit and either run to completion or is not executed at all. Atomicity involves the following two operations: Abort: If a transaction aborts then all the changes made are not visible. Commit: If a transaction commits then all the changes made are visible. 3 Example: Let's assume that following transaction T consisting of T1 and T2. A consists of Rs 600 and B consists of Rs 300. Transfer Rs 100 from account A to account B. T1 T2 Read(A) Read(B) A:= A-100 Y:= Y+100 Write(A) Write(B) After completion of the transaction, A consists of Birr 500 and B consists of Birr 400. If the transaction T fails after the completion of transaction T1 but before completion of transaction T2, then the amount will be deducted from A but not added to B. This shows the inconsistent database state. In order to ensure correctness of database state, the transaction must be executed in entirety. Consistency  The integrity constraints are maintained so that the database is consistent before and after the transaction.  Guarantees that a transaction will bring the database from one valid state to another, maintaining all predefined rules, including integrity constraints.  The execution of a transaction will leave a database in either its prior stable state or a new stable state.  The consistent property of database states that every transaction sees a consistent database instance.  The transaction is used to transform the database from one consistent state to another consistent state. For example: The total amount must be maintained before or after the transaction. 1. Total before T occurs = 600+300=900 2. Total after T occurs= 500+400=900 4 Therefore, the database is consistent. In the case when T1 is completed but T2 fails, then inconsistency will occur. Consistency in the context of database transactions ensures that a transaction brings the database from one valid state to another, maintaining all defined rules and constraints. Example Scenario: Library Book Borrowing System Consider a library database where patrons can borrow books. 
The system must ensure that the borrowing process maintains the integrity of the data regarding book availability. Transaction: Borrowing a Book Goal: Alice wants to borrow "Database 101". Steps in the Transaction Check Book Availability: The system checks if "Database 101" has available copies. Update Book Records: If available, reduce the count of available copies by one. Update Borrowers Record: Add a record indicating that Alice has borrowed the book. 5 Explanation of Consistency in This Example - Before the Transaction: The library has 3 available copies of "Database 101". - During the Transaction: - The system checks the availability and finds that there are 3 copies. - The transaction updates the `AvailableCopies` to 2. - It also adds an entry to the `BorrowersBooks` table indicating that Alice has borrowed the book. In this transaction example, the **consistency** property ensures that the database transitions from a valid state (3 copies available) to another valid state (2 copies available) without violating any rules. If the process is interrupted or an error occurs (e.g., trying to borrow a book that is not available), the transaction will not complete, and the database will remain consistent. This illustrates how consistency maintains the integrity of data in a transactional system. 6 Isolation  It shows that the data which is used at the time of execution of a transaction cannot be used by the second transaction until the first one is completed.  In isolation, if the transaction T1 is being executed and using the data item X, then that data item can't be accessed by any other transaction T2 until the transaction T1 ends.  The concurrency control subsystem of the DBMS enforced the isolation property. Durability  The durability property is used to indicate the performance of the database's consistent state. It states that the transaction made the permanent changes.  They cannot be lost by the erroneous operation of a faulty transaction or by the system failure. When a transaction is completed, then the database reaches a state known as the consistent state. That consistent state cannot be lost, even in the event of a system's failure.  The recovery subsystem of the DBMS has the responsibility of Durability property. 7 Operations of Transaction A user can make different types of requests to access and modify the contents of a database. So, we have different types of operations relating to a transaction. They are discussed as follows: Read(X) A read operation is used to read the value of X from the database and store it in a buffer in the main memory for further actions such as displaying that value. Such an operation is performed when a user wishes just to see any content of the database and not make any changes to it. For example, when a user wants to check his/her account’s balance, a read operation would be performed on user’s account balance from the database. 8 Write(X) A write operation is used to write the value to the database from the buffer in the main memory. For a write operation to be performed, first a read operation is performed to bring its value in buffer, and then some changes are made to it, e.g. some set of arithmetic operations are performed on it according to the user’s request, then to store the modified value back in the database, a write operation is performed. 
For example, when a user requests to withdraw some money from his account, his account balance is fetched from the database using a read operation, then the amount to be deducted from the account is subtracted from this value, and then the obtained value is stored back in the database using a write operation. Commit This operation in transactions is used to maintain integrity in the database. Due to some failure of power, hardware, or software, etc., a transaction might get interrupted before all its operations are completed. This may cause ambiguity in the database, i.e. it might get inconsistent before and after the transaction. To ensure that further operations of any other transaction are performed only after work of the current transaction is done, a commit operation is performed to the changes made by a transaction permanently to the database. Rollback This operation is performed to bring the database to the last saved state when any transaction is interrupted in between due to any power, hardware, or software failure. In simple words, it can be said that a rollback operation does undo the operations of transactions that were performed before its interruption to achieve a safe state of the database and avoid any kind of ambiguity or inconsistency. 3.3. Schedules and Recoverability A series of operation from one transaction to another transaction is known as schedule. It is used to preserve the order of the operation in each of the individual transaction. It seems like you're looking to create or manage a schedule in a database. Here’s a general approach to how you can handle scheduling in a database 9 Steps to Create a Schedule in a Database 1. Define Your Requirements: - What kind of schedule are you creating? (e.g., events, appointments, tasks) - What attributes do you need? (e.g., title, description, start time, end time, participant) 2. Design the Database Schema: Here’s a simple example of a table structure for a schedule: CREATE TABLE Schedule ( id INT PRIMARY KEY AUTO_INCREMENT, title VARCHAR(100) NOT NULL, description TEXT, start_time DATETIME NOT NULL, end_time DATETIME NOT NULL, participant VARCHAR(100) ); 3. Insert Data: Use SQL commands to insert your schedule items. For example: INSERT INTO Schedule (title, description, start_time, end_time, participant) VALUES ('Meeting with Team', 'Discuss project updates', '2024-11-15 10:00:00', '2024-11-15 11:00:00', 'John Doe'); 4. Query the Schedule: Retrieve schedule items with a query: SELECT * FROM Schedule WHERE start_time >= NOW() ORDER BY start_time; 5. Update or Delete Entries: You can update or delete entries as needed: UPDATE Schedule SET title = 'Updated Meeting' WHERE id = 1; DELETE FROM Schedule WHERE id = 1; 6. Considerations for Time Zones: If your application involves users in different time zones, be sure to store times in UTC and convert them as necessary when displaying to users. 10 3.4. Serializability of Schedules A schedule is a sequence of operations (such as read and write) from multiple transactions that are executed in a database. The order of these operations can significantly affect the final state of the database. Schedules can be categorized into different types: 1. Serial Schedule: Transactions are executed one after the other without overlapping. This guarantees consistency but may not utilize system resources efficiently. 2. Concurrent Schedule: Transactions are executed in overlapping time periods. 
This can improve performance but may lead to issues like lost updates or inconsistent reads if not managed properly. 1. Serial Schedule The serial schedule is a type of schedule where one transaction is executed completely before starting another transaction. In the serial schedule, when the first transaction completes its cycle, then the next transaction is executed. For example: Suppose there are two transactions T1 and T2 which have some operations. If it has no interleaving of operations, then there are the following two possible outcomes: 1. Execute all the operations of T1 which was followed by all the operations of T2. 2. Execute all the operations of T1 which was followed by all the operations of T2. o In the given (a) figure, Schedule A shows the serial schedule where T1 followed by T2. 11 o In the given (b) figure, Schedule B shows the serial schedule where T2 followed by T1. 12 2. Non-serial Schedule  If interleaving of operations is allowed, then there will be non-serial schedule.  It contains many possible orders in which the system can execute the individual operations of the transactions.  In the given figure (c) and (d), Schedule C and Schedule D are the non-serial schedules. It has interleaving of operations. 13 3. Serializable schedule The serializability of schedules is used to find non-serial schedules that allow the transaction to execute concurrently without interfering with one another. It identifies which schedules are correct when executions of the transaction have interleaving of their operations. A non-serial schedule will be serializable if its result is equal to the result of its transactions executed serially. A serializable schedule in database management refers to a schedule of transactions that ensures consistency and isolation. It guarantees that the outcome of executing a set of transactions concurrently is the same as if they had been executed serially, one after the other, without overlap. This is crucial for maintaining data integrity in multi-user environments. Recoverability Recoverability is a property of database systems that ensures that, in the event of a failure or error, the system can recover the database to a consistent state. Recoverability guarantees that all committed transactions are durable and that their effects are permanently stored in the database, while the effects of uncommitted transactions are undone to maintain data consistency. The recoverability property is enforced through the use of transaction logs, which record all changes made to the database during transaction processing. When a failure occurs, the system uses the log to 14 recover the database to a consistent state, which involves either undoing the effects of uncommitted transactions or redoing the effects of committed transactions. There are several levels of recoverability that can be supported by a database system:  No-undo logging: This level of recoverability only guarantees that committed transactions are durable, but does not provide the ability to undo the effects of uncommitted transactions.  Undo logging: This level of recoverability provides the ability to undo the effects of uncommitted transactions but may result in the loss of updates made by committed transactions that occur after the failed transaction.  Redo logging: This level of recoverability provides the ability to redo the effects of committed transactions, ensuring that all committed updates are durable and can be recovered in the event of failure. 
Recoverability
Recoverability is a property of database systems that ensures that, in the event of a failure or error, the system can recover the database to a consistent state. It guarantees that all committed transactions are durable and that their effects are permanently stored in the database, while the effects of uncommitted transactions are undone to maintain data consistency.
The recoverability property is enforced through the use of transaction logs, which record all changes made to the database during transaction processing. When a failure occurs, the system uses the log to recover the database to a consistent state, either undoing the effects of uncommitted transactions or redoing the effects of committed transactions.
There are several levels of recoverability that a database system can support:
- No-undo logging: guarantees only that committed transactions are durable; it does not provide the ability to undo the effects of uncommitted transactions.
- Undo logging: provides the ability to undo the effects of uncommitted transactions, but may result in the loss of updates made by committed transactions that occur after the failed transaction.
- Redo logging: provides the ability to redo the effects of committed transactions, ensuring that all committed updates are durable and can be recovered in the event of a failure.
- Undo-redo logging: provides both undo and redo capabilities, ensuring that the system can recover to a consistent state regardless of whether a transaction has committed or not.
In addition to these levels of recoverability, database systems may also use techniques such as checkpointing and shadow paging to improve recovery performance and reduce the overhead associated with logging.
Overall, recoverability is a crucial property of database systems, as it ensures that data is consistent and durable even in the event of failures or errors. It is important for database administrators to understand the level of recoverability provided by their system and to configure it appropriately to meet their application's requirements.

Recoverable Schedules
Schedules in which transactions commit only after all transactions whose changes they read have committed are called recoverable schedules. In other words, if a transaction Tj reads a value updated or written by another transaction Ti, then the commit of Tj must occur after the commit of Ti.
The schedule shown above follows the order Ti -> Tj, i.e., C1 -> C2. Transaction T1 is executed before T2, so there is no chance of a conflict: R1(x) appears before W1(x), and T1 commits before T2, meaning the transaction that performed the first update on data item x completes first. The schedule is therefore recoverable; because T1 commits before T2, the value read by T2 is guaranteed to be correct.

Irrecoverable Schedule
The table below shows a schedule with two transactions: T1 reads and writes A, that value is read and written by T2, and T2 commits. Later, T1 fails, so T1 must be rolled back. Since T2 has read the value written by T1, T2 should also be rolled back, but it has already committed; therefore the schedule is irrecoverable. In general, when Tj reads a value updated by Ti and Tj commits before Ti commits, the schedule is irrecoverable.

Recoverable Schedule with Cascading Rollback
The table below shows a schedule with two transactions: T1 reads and writes A, and that value is read and written by T2. Later, T1 fails, so T1 must be rolled back. Since T2 has read the value written by T1, T2 must also be rolled back; because T2 has not yet committed, it can be rolled back. This is a recoverable schedule with cascading rollback. In general, if Tj reads a value updated by Ti and the commit of Tj is delayed until after the commit of Ti, the schedule is recoverable with cascading rollback.

Cascadeless Recoverable Schedule
The table below shows a schedule with two transactions: T1 reads and writes A and commits, and only then is the value read by T2. If T1 fails before committing, no other transaction has read its value, so no other transaction needs to be rolled back. This is a cascadeless recoverable schedule. In general, if Tj reads a value updated by Ti only after Ti has committed, the schedule is cascadeless recoverable.

3.5. Transaction Support in SQL
A SQL transaction is a sequence of database operations that behaves as a single unit of work. It ensures that multiple operations are executed in an atomic and consistent manner, which is crucial for maintaining database integrity. The following sections give an overview of SQL transactions, their principles, and their various types and properties.
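A minimal sketch of such a unit of work is shown below, again using the hypothetical Account table from the earlier examples: a transfer either commits as a whole or is rolled back as a whole. The exact syntax for starting a transaction (BEGIN, START TRANSACTION, BEGIN TRANSACTION) and for handling errors varies between database systems.

   BEGIN;                                                     -- start the transaction
   UPDATE Account SET balance = balance - 50 WHERE id = 1;    -- debit the source account
   UPDATE Account SET balance = balance + 50 WHERE id = 2;    -- credit the target account
   COMMIT;                                                    -- make both changes permanent

   -- If anything goes wrong before COMMIT (an error, a constraint violation, a failed
   -- application-level check), the application issues:
   ROLLBACK;                                                  -- undo all changes made since BEGIN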
Transactions have several benefits, such as:
- Controlling the concurrent execution of operations and preventing conflicts among them
- Ensuring data integrity even when an operation fails or a system crash occurs
- Allowing easy recovery from errors and maintaining a consistent state of the database

Types and Properties of SQL Transactions
There are various types of SQL transactions based on their properties and usage. Some common types are:
- Read-only transaction: a transaction that only reads data but does not modify it.
- Write transaction: a transaction that modifies data, i.e., inserts, updates, or deletes records in the database.
- Distributed transaction: a transaction that spans multiple databases or systems.

Chapter 4: Concurrency Control Techniques
Concurrency control in databases is essential for ensuring that multiple transactions can be executed simultaneously without leading to data inconsistency. It addresses the challenges that arise when transactions interact with shared data, aiming to maintain the integrity and consistency of the database. The key concepts and techniques involved in concurrency control are described below.

Integrity and Consistency
Integrity and consistency are fundamental concepts in database management that ensure data remains accurate, reliable, and valid throughout its lifecycle.

Data Integrity
Data integrity refers to the accuracy and consistency of data stored in a database. It ensures that the data is correct, reliable, and maintained according to defined rules and constraints (for example, entity, referential, and domain integrity constraints).

Data Consistency
Data consistency refers to the state of data in which all copies or instances are the same across all systems and databases. It ensures that data remains accurate and coherent across different database systems, applications, and platforms; for example, a customer's address should be identical in every system that stores it.

1.1. Locking Techniques for Concurrency Control
Locking techniques are fundamental methods used in concurrency control to ensure that multiple transactions can operate safely in a shared database environment. The main types of locking techniques are described below.

1. Shared and Exclusive Locks
Shared locks (S) allow multiple transactions to read a resource concurrently but prevent any transaction from writing to it while the lock is held; they are useful for read-heavy operations.
Example: a transaction viewing a bank account balance must acquire a shared lock.
Exclusive locks (X) allow a single transaction to read and modify a resource and prevent other transactions from accessing it until the lock is released, ensuring data integrity during modifications.
Example: a transaction updating a bank account balance must acquire an exclusive lock.
A typical interaction:
- Transaction A acquires a shared lock to read a record.
- Transaction B also acquires a shared lock on the same record to read it.
- Transaction C tries to acquire an exclusive lock to modify the record. It must wait until both A and B release their shared locks.
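Many SQL dialects expose these two lock modes directly on individual rows. The sketch below uses PostgreSQL-style syntax (SELECT ... FOR SHARE / FOR UPDATE); other systems use slightly different keywords, and the Account table is again the hypothetical example from earlier.

   -- Reader: take a shared lock on the row so nobody can change it mid-read.
   BEGIN;
   SELECT balance FROM Account WHERE id = 1 FOR SHARE;
   -- ... use the value ...
   COMMIT;   -- the shared lock is released at commit

   -- Writer: take an exclusive lock on the row before modifying it.
   BEGIN;
   SELECT balance FROM Account WHERE id = 1 FOR UPDATE;   -- blocks while shared locks are held
   UPDATE Account SET balance = balance - 30 WHERE id = 1;
   COMMIT;   -- the exclusive lock is released at commit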
2. Two-Phase Locking (2PL)
This technique involves two phases: the growing phase, in which a transaction can acquire locks but cannot release them, and the shrinking phase, in which it can release locks but cannot acquire new ones. For example, if transaction T1 locks a record to update it, it must complete its updates before releasing the lock, preventing other transactions from accessing the same record until T1 is finished. This helps avoid conflicts and ensures that transactions are executed in a controlled manner.
- Description: transactions grow and shrink their lock sets in two phases:
  - Growing phase: locks are acquired but not released.
  - Shrinking phase: locks are released and no new locks can be acquired.
- The locking point is the moment at which a transaction acquires its final lock (the end of its growing phase). Locking points are crucial for managing concurrency, especially in environments where multiple transactions may attempt to read or write the same data simultaneously.
- Advantages: guarantees serializability.
- Disadvantages: can lead to deadlocks and requires deadlock detection or prevention mechanisms.

3. Timestamp Ordering
In this method, each transaction is assigned a unique timestamp, and the system uses these timestamps to determine the order of operations. For instance, if transaction T1 has an earlier timestamp than T2, T1's operations are given priority: if T1 reads a data item and T2 attempts to write to it, T2 is delayed until T1 completes, ensuring that the database remains consistent.
- Description: each transaction is assigned a unique timestamp, and transactions are executed in the order of their timestamps.
- Advantages: eliminates the need for locks, reducing contention.
- Disadvantages: can lead to wasted work if transactions are aborted due to conflicts.

4. Optimistic Locking / Validation-Based Concurrency Control
This optimistic approach allows transactions to execute without restrictions until they reach a validation phase. For example, a transaction may read and modify data freely, but before committing it checks whether any other transaction has modified the data it read. If a conflict is detected, the transaction is rolled back. This method is effective in environments where conflicts are rare.
- Description: transactions proceed without locking resources but check for conflicts before committing.
- Usage: suitable for environments with low contention.
- Advantages: reduces lock overhead and increases concurrency.
- Disadvantages: may lead to higher abort rates under contention.

5. Multi-Version Concurrency Control (MVCC)
MVCC allows multiple versions of a data item to exist simultaneously. When a transaction reads a data item, it accesses the version that was current at the time the transaction started. For example, if transaction T1 updates a record, T2 can still read the old version of that record without being blocked. This increases concurrency and reduces waiting times for read operations.
- Description: allows multiple versions of a data item, enabling readers to proceed without blocking writers.
- Advantages: high concurrency and no read locks, suitable for read-heavy applications.
- Disadvantages: increased storage requirements and complexity in managing versions.

Choosing the appropriate locking technique depends on the specific application requirements, transaction patterns, and workload characteristics. Balancing concurrency, performance, and data integrity is key to effective concurrency control in database systems.
1.2. Granularity of Data Items and Multiple Granularity Locking
The granularity of data items in a database management system (DBMS) refers to the size or level of detail at which data can be locked. This concept is crucial for managing concurrency and ensuring that multiple transactions can operate on the database without interfering with each other. Multiple Granularity Locking (MGL) is a technique that allows locking at various levels of granularity, enhancing both efficiency and concurrency.

Granularity Levels
Granularity can be understood in terms of the different levels at which locks can be applied:
- Database level: the entire database can be locked.
- Area level: specific sections or areas within the database can be locked.
- File level: individual files within an area can be locked.
- Record level: specific records within a file can be locked.
This hierarchical structure allows transactions to lock data at the most appropriate level, depending on their needs.

Multiple Granularity Locking (MGL)
MGL organizes the database into a tree structure in which each node represents a different level of granularity. When a transaction locks a node, it implicitly locks all of that node's descendants in the same mode. For example, if a transaction locks a file in exclusive mode, it automatically locks all records within that file in exclusive mode as well.

Lock Modes in MGL
MGL uses several lock modes to manage access:
- Shared lock (S): allows multiple transactions to read the same data simultaneously.
- Exclusive lock (X): prevents any other transaction from accessing the data while it is being modified.
- Intention locks:
  - Intention-Shared (IS): indicates that a transaction intends to acquire shared locks on lower-level nodes.
  - Intention-Exclusive (IX): indicates that a transaction intends to acquire exclusive or shared locks on lower-level nodes.
  - Shared and Intention-Exclusive (SIX): indicates that a transaction holds a shared lock at the current level and intends to acquire exclusive locks on lower levels.

Example of Multiple Granularity Locking
Consider a database structured as follows:
Database
  Area A
    File F1
      Record R1
      Record R2
    File F2
      Record R3
  Area B
    File F3
      Record R4

Transaction Scenarios
Transaction T1 wants to read Record R1 in File F1:
- T1 locks the Database in IS mode.
- T1 locks Area A in IS mode.
- T1 locks File F1 in IS mode.
- Finally, T1 locks Record R1 in S mode.
Transaction T2 wants to modify Record R3 in File F2:
- T2 locks the Database in IX mode.
- T2 locks Area A in IX mode.
- T2 locks File F2 in IX mode.
- Finally, T2 locks Record R3 in X mode.
Transaction T3 wants to read all records in File F1:
- T3 locks the Database in IS mode.
- T3 locks Area A in IS mode.
- T3 locks File F1 in S mode.
Transaction T4 wants to read the entire database:
- T4 locks the Database in S mode.
In this scenario, T1, T2, and T3 can operate concurrently: their intention locks on the Database and Area A are compatible, and their S and X locks fall on different parts of the hierarchy. T4, however, cannot run concurrently with T2, because T4's S lock on the Database is incompatible with T2's IX lock; T4 can still run concurrently with T1 and T3. This demonstrates how MGL enhances concurrency while managing locks effectively.
Multiple Granularity Locking allows for efficient management of locks at various levels of granularity, improving concurrency and reducing locking overhead. By structuring the database hierarchically and using intention locks, MGL provides a robust framework for handling multiple transactions simultaneously.
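Most SQL databases do not expose intention locks directly, but the same idea of coarse versus fine granularity is visible in the contrast between statement-level row locks and explicit table locks. The sketch below uses PostgreSQL-style syntax and the hypothetical Account table; the table-level lock modes taken implicitly on behalf of row-level operations differ between systems.

   -- Fine granularity: lock a single row. The DBMS also records an intention-style
   -- lock on the Account table so that a conflicting table-level lock cannot be
   -- granted to another transaction at the same time.
   BEGIN;
   SELECT balance FROM Account WHERE id = 1 FOR UPDATE;
   UPDATE Account SET balance = balance - 30 WHERE id = 1;
   COMMIT;

   -- Coarse granularity: lock the whole table explicitly, e.g. for a bulk maintenance
   -- job. This conflicts with the intention-style locks above, so it waits until the
   -- row-locking transactions on Account have finished.
   BEGIN;
   LOCK TABLE Account IN EXCLUSIVE MODE;
   UPDATE Account SET balance = 0;   -- illustrative bulk change
   COMMIT;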
Example 2
Consider a banking database where transactions might involve multiple accounts and their balances. Here is how Multiple Granularity Locking might work.

Transaction A
Goal: transfer money from Account A to Account B.
Steps:
- Acquire an Intention-Exclusive (IX) lock on the accounts table (signalling the intent to take exclusive locks on individual rows).
- Acquire an Exclusive lock on row R1 (Account A) to read and modify its balance.
- Acquire an Exclusive lock on row R2 (Account B) to read and modify its balance.

Transaction B
Goal: check the balances of Account C and Account D.
Steps:
- Acquire an Intention-Shared (IS) lock on the accounts table (signalling the intent to take shared locks on individual rows).
- Acquire a Shared lock on row R3 (Account C).
- Acquire a Shared lock on row R4 (Account D).
Because the IX and IS locks on the table are compatible and the two transactions lock different rows, A and B can run concurrently.

1.3. Using Locks for Concurrency Control in Indexes
Using locks for concurrency control in indexes is crucial for ensuring data integrity and allowing multiple transactions to operate simultaneously without conflicts. The following is an overview of how locks are applied in index structures, focusing in particular on two-phase locking (2PL).

Two-Phase Locking (2PL) in Indexes
1. Locking mechanism: In a two-phase locking protocol, locks acquired during the growing phase are held until the transaction enters the shrinking phase. If locks are held on index pages for too long, this can lead to significant transaction blocking. For instance, when a write operation occurs, the root node of the index is typically locked in exclusive mode, preventing other transactions from accessing the index until the lock is released.
2. Read operations: During read operations, a transaction traverses the index tree from the root to a leaf node. Once a lower-level node has been accessed and locked, the lock on the parent node can be released (a technique often called lock coupling or crabbing), which allows other transactions to proceed without waiting.
3. Insertions: For insert operations, a conservative approach locks the root node exclusively while traversing down to the appropriate child node. If the child node is not full, the lock on the root can be released, minimizing the duration for which it is held. Alternatively, a more optimistic approach holds shared locks on the nodes leading to the leaf node, upgrading to an exclusive lock only when necessary.
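Index-page locking happens inside the storage engine and cannot be steered directly from SQL, but some systems let you observe the locks currently held, including those on index relations. The sketch below is PostgreSQL-specific and assumes access to the built-in pg_locks view; other DBMSs expose similar information through their own catalog or monitoring views.

   -- List the locks currently held or awaited in this database, including locks on
   -- index relations, with the lock mode and whether the lock has been granted.
   SELECT pid,
          locktype,
          relation::regclass AS locked_relation,
          mode,
          granted
   FROM   pg_locks
   WHERE  relation IS NOT NULL
   ORDER  BY pid;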
