Document Details

AchievableSuprematism

Uploaded by AchievableSuprematism

Tags

database management data modeling relational databases computer science

Full Transcript

Video chapters Chapter-1 (Basics): Data & information, Database System vs File System, Views of Data Base, Data Independence, Instances & Schema, OLAP Vs OLTP, Types of Data Base, DBA, Architecture. Chapter-2 (ER Diagram): Entity, Attributes, Relat...

Video chapters Chapter-1 (Basics): Data & information, Database System vs File System, Views of Data Base, Data Independence, Instances & Schema, OLAP Vs OLTP, Types of Data Base, DBA, Architecture. Chapter-2 (ER Diagram): Entity, Attributes, Relationship, Degree of a Relationship, Mapping, Weak Entity set, Conversion from ER Diagram to Relational Model, Generalization, Specification, Aggregation. Chapter-3 (RDBMS & Functional Dependency): Basics & Properties, Update Anomalies, Purpose of Normalization, Functional Dependency, Closure Set of Attributes, Armstrong’s axioms, Equivalence of two FD, Canonical cover, Keys. Chapter-4 (Normalization): 1NF, 2NF, 3NF, BCNF, Multivalued Dependency, 4NF, Lossy-Lossless Decomposition, 5NF, Dependency Preserving Decomposition. Chapter-5 (Indexing): Overview of indexing, Primary indexing, Clustered indexing and Secondary Indexing, B-Tree. Chapter-6 (Relational Algebra) : Query Language, Select, Project, Union, Set Difference, Cross Product, Rename Operator, Additional or Derived Operators. Chapter-7(SQL) : Introduction to SQL, Classification, DDL Commands, Select, Where, Set Operations, Cartesian Product, Natural Join, Outer Join, Rename, Aggregate Functions, Ordering, String, Group, having, Trigger, embedded, dynamic SQL. Chapter-8 (Relational Calculus) : Overview, Tuple Relation Calculus, Domain Relation Calculus. Chapter-9 (Transaction) : What is Transaction, ACID Properties, Transaction Sates, Schedule, Conflict Serializability, View Serializability, Recoverability, Cascade lessness, Strict Schedule. Chapter-10 (Recovery & Concurrency Control) : Log Based Recovery, Shadow Paging, Data Fragmentation, TIME STAMP ORDERING PROTOCOL, THOMAS WRITE RULE, 2 phase locking, Basic 2pl, Conservative 2pl, Rigorous 2pl, Strict 2pl, Validation based protocol Multiple Granularity. Knowledge Gate Website Syllabus UNIT-1 : INTRODUCTION Overview, Database System vs File System, Database System Concept and Architecture, Data Model Schema and Instances, Data Independence and Database Language and Interfaces, Data Definitions Language, DML, Overall Database Structure. Data Modeling Using the Entity Relationship Model: ER Model Concepts, Notation for ER Diagram, Mapping Constraints, Keys, Concepts of Super Key, Candidate Key, Primary Key, Generalization, Aggregation, Reduction of an ER Diagrams to Tables, Extended ER Model, Relationship of Higher Degree. UNIT-2 : RELATIONAL DATA MODEL Relational Data Model Concepts, Integrity Constraints, Entity Integrity, Referential Integrity, Keys Constraints, Domain Constraints, Relational Algebra, Relational Calculus, Tuple and Domain Calculus. Introduction on SQL: Characteristics of SQL, Advantage of SQL. SQl Data Type and Literals. Types of SQL Commands. SQL Operators and Their Procedure. Tables, Views and Indexes. Queries and Sub Queries. Aggregate Functions. Insert, Update and Delete Operations, Joins, Unions, Intersection, Minus, Cursors, Triggers, Procedures in SQL/PL SQL. UNIT-3 : DATA BASE DESIGN & NORMALIZATION Functional dependencies, normal forms, first, second, 3 third normal forms, BCNF, inclusion dependence, loss less join decompositions, normalization using FD, MVD, and JDs, alternative approaches to database design. UNIT-4 : TRANSACTION PROCESSING CONCEPT Transaction System, Testing of Serializability, Serializability of Schedules, Conflict & View Serializable Schedule, Recoverability, Recovery from Transaction Failures, Log Based Recovery, Checkpoints, Deadlock Handling. Distributed Database: Distributed Data Storage, Concurrency Control, Directory System. UNIT-5 : CONCURRENCY CONTROL TECHNIQUES Concurrency Control, Locking Techniques for Concurrency Control, Time Stamping Protocols for Concurrency Control, Validation Based Protocol, Multiple Granularity, Multi Version Schemes, Recovery with Concurrent Transaction, Case Study of Oracle. Knowledge Gate Website What is Data ? Data are characteristics or attributes, often numerical, collected through observation and can be qualitative(descriptive) or quantitative(numerical). Any facts and figures about an entity is called as Data. Data serves a crucial role in various sectors by facilitating analysis, supporting decision-making processes, and being fundamental to research activities. Knowledge Gate Website What is Information ? Data becomes information when analyzed and placed in context, providing a basis for understanding, decision-making, and further analysis. Processed Data is called information. Knowledge Gate Website What is Data Base? A database is a structured collection of data, facilitating easy access, management, and updates., generally stored and accessed electronically from a computer system. Knowledge Gate Website What is Data Base Management System? A DBMS is software facilitating efficient data storage, retrieval, and management in databases. Ensures data safety and integrity, while offering accessibility and concurrency control. Supports functions like data querying, reporting, and analytics for informed decision- making. Knowledge Gate Website Aspect File System Database Management System Slower data retrieval due to unstructured Structured querying capabilities allow for Data Access querying capabilities. quicker data access. Challenges in correlating data across Facilitates data integration, reducing data Data Isolation separate files leading to data isolation. isolation issues. Risk of inadvertent data alterations or Features to prevent unauthorized data Data Integrity deletions creating integrity problems. alterations, maintaining integrity. Potential for data inconsistency due to Supports transaction properties like Atomicity Problem incomplete operations, leading to atomicity, ensuring operations are either atomicity problems. completed fully or not at all. Conflicts and inconsistencies from Advanced concurrency controls to manage Concurrent Access simultaneous data access/modifications, multiple users accessing the database Anomalies causing concurrent access anomalies. simultaneously, reducing anomalies. Knowledge Gate Website View of Data Base (Data Abstraction) Physical Level Logical Level/ Conceptual Level View Level Knowledge Gate Website View of Data Base (Data Abstraction) Physical Level: The internal schema details data storage and access on hardware, featuring the lowest level of data abstraction with complex structures, predominantly managed by the database administrator. Knowledge Gate Website View of Data Base Logical Level/ Conceptual Level: Above the physical level, this level showcases data as entity sets and their relationships, detailing the types and connections between stored data in the database. Knowledge Gate Website View of Data Base View Level: This is the pinnacle of data abstraction, displaying only a portion of the entire database focusing on user-interest areas. It can represent multiple views of the same data, allowing users to access information through various applications from the database. Knowledge Gate Website Data independence Data independence is defined as the capacity to change the schema at one level of a database system without having to change the schema at the next higher level. Types of data independence : Physical data independence: a. Physical data independence is the ability to modify internal schema without changing the conceptual schema. b. Modification at the physical level is occasionally necessary in order to improve performance. c. It refers to the immunity of the conceptual schema to change in the internal schema. d. Examples of physical data independence are reorganizations of files, adding a new access path or modifying indexes, etc. Logical data independence: a. Logical data independence is the ability to modify the conceptual schema without having to change the external schemas or application programs. b. It refers to the immunity of the external model to changes in the conceptual model. c. Examples of logical data independence are addition/removal of entities. Knowledge Gate Website Instance and Schemas Instance of the Database: The collection of information stored in the database at a specific moment is known as an instance of the database. It is a snapshot of the database that contains live data at that moment, showing the current state of all records and transactions. Database Schema: The database schema refers to the overall design of the database, illustrating the logical structure and organization of data. It defines how data is organized and how relationships between data are handled, essentially serving as the blueprint for how the database is constructed. Knowledge Gate Website OLAP (Online Analytical OLTP (Online Transaction Aspect Processing) Processing) Designed for complex data analysis Handles daily transactional data Primary Function and reporting. processing. Star or snowflake schema, Normalized schema, optimizing for Database Design optimizing for read operations. write operations. Complex queries involving Simple and standard queries Query Complexity aggregations and computations focusing on CRUD operations across multiple dimensions. (Create, Read, Update, Delete). Deals with large volumes of data Processes a high number of small Data Volume for historical analysis. transactions. Slower response time due to Fast response time to support high Response Time complex queries. Knowledge Gate Websitetransaction rates. Types of Data Base Commercial Database: Predominantly used in the business sector to handle large volumes of transactions and customer data. A CRM system like Salesforce which handles large volumes of customer data and transactions. Multimedia Database: Stores data types such as images, audio, and video files, facilitating the management and retrieval of multimedia content. A digital asset management system like Adobe Experience Manager that facilitates the storage and retrieval of multimedia content. Deductive Database: Utilizes logic programming to derive information from data stored in a database, allowing for more complex and analytical queries. A database using Datalog (a query language) which allows for complex logical queries and information derivation. Knowledge Gate Website Temporal Database: Keeps track of changing data over time, allowing for queries concerning time-based data. A historical trading database in the financial sector which keeps track of stock prices over time. Geological Information System (GIS): Stores, organizes, and analyzes geographical data, aiding in spatial analysis and mapping projects. A system like ArcGIS which enables the storage, analysis, and visualization of geographical data. Knowledge Gate Website DBA(Database Administrator) Database administrators hold authority over data and the programs facilitating data access. Their roles/functions are: Schema Definition: DBA outlines the original database schema. Achieved through writing definitions translated to permanent labels in the data dictionary by the DDL compiler. Storage Structure and Access Method Definition: Responsible for forming appropriate storage structures and access methods. Achieved through writing definitions translated by the data storage and definition language compiler. Schema and Physical Organization Modification: Involves altering the database schema or physical storage organization. Changes are implemented by writing definitions that modify the relevant internal system tables. Granting of Authorization for Data Access: DBA grants varied types of data access authorization to different database users. Integrity Constraint Specification: DBA implements and maintains integrity constraints to ensure data accuracy and consistency. Knowledge Gate Website DBMS Architecture Knowledge Gate Website 1.Query Processor: This is the component of a DBMS that interprets and executes user queries. It comprises several sub-components including: 1. DML Compiler: Processes Data Manipulation Language (DML) statements into low-level instructions that can be executed. 2. DDL Interpreter: Processes Data Definition Language (DDL) statements into metadata tables. 3. Embedded DML Pre-compiler: Translates DML statements embedded in application programs into procedural calls. 4. Query Optimizer: Determines the most efficient way to execute a query by evaluating different query plans. 2.Storage Manager: Also known as the Database Control System, it is responsible for managing the data stored in the database, ensuring its consistency and integrity. It includes the following sub-components: 1. Authorization Manager: Manages access controls and privileges. 2. Integrity Manager: Ensures that data modifications adhere to integrity constraints. 3. Transaction Manager: Manages concurrent access to the database and maintains database consistency during transactions. 4. File Manager: Manages file space and data structures representing information in the database. 5. Buffer Manager: Manages data cache and data transfer between main memory and secondary storage. 3.Disk Storage: Represents the storage aspect of a DBMS, encompassing the following components: 1. Data Files: Files where the actual data is stored. 2. Data Dictionary: Repository containing information about the structure and characteristics of database objects. 3. Indices: Data structures that facilitate faster data retrieval. Knowledge Gate Website A Database Management System (DBMS) consists of three primary components: 1. Internal Level: Concerns the physical storage of data in databases, overseeing data storage on hardware devices, and managing low-level aspects like data compression and indexing. 2. Conceptual Level: Represents the logical layout of the database, detailing the schema with tables and attributes and their interrelations. It's independent of specific DBMS implementations, focusing on organizing and connecting data elements. 3. External Level: Embodies the user interface of the database, facilitating data access and interaction through user-friendly views and interfaces tailored to various user groups. Knowledge Gate Website ER Diagram Developed by Dr. Peter Chen in 1976, this conceptual level method, grounded in real-world perceptions, facilitates diagrammatic data representation, simplifying comprehension for non-technical users. The E-R data model, central to database design, encapsulates entities and their attributes within an enterprise schema, serving as a clear, standardized tool for translating real-world enterprise interactions into a conceptual schema. Knowledge Gate Website ER diagram for bank Knowledge Gate Website ER diagram for University System Knowledge Gate Website ER diagram for University System Knowledge Gate Website ER diagram for Marketing Company Knowledge Gate Website ENTITY An entity is a thing or an object in the real world that is distinguishable from other object based on the values of the attributes it possesses. An entity may be concrete, such as a person or a book, or it may be abstract, such as a course, a course offering, or a flight reservation. Knowledge Gate Website Types of Entity Tangible - Entities which physically exist in real world. E.g. - Car, Pen, locker Intangible - Entities which exist logically. E.g. – Account, video. Knowledge Gate Website In ER diagram we cannot represent an entity, as entity is an instant not schema, and ER diagram is designed to understand schema In a relational model entity is represented by a row or a tuple or a record in a table. Knowledge Gate Website ENTIY SET- Collection of same type of entities that share the same properties or attributes. In an ER diagram an entity set is represented by a rectangle In a relational model it is represented by a separate table Knowledge Gate Website ATTRIBUTES Attributes are the units defines and describe properties and characteristics of entities. Attributes are the descriptive properties possessed by each member of an entity set. for each attribute there is a set of permitted values called domain. Knowledge Gate Website In an ER diagram attributes are represented by ellipse or oval connected to rectangle. While in a relational model they are represented by independent column. e.g. Instructor (ID, name, salary, dept_name) Knowledge Gate Website Types of Attributes Single valued- Attributes having single value at any instance of time for an entity. E.g. – Aadhar no, dob. Multivalued - Attributes which can have more than one value for an entity at same time. E.g. - Phone no, email, address. A multivalued attribute is represented by a double ellipse in an ER diagram and by an independent table in a relational model. Separate table for each multivalued attribute, by taking mva and pk of main table as fk in new table Knowledge Gate Website Customer Customer ID First Name Surname Telephone Number 123 Rabri Devi Singh 555-861-2025, 192-122-1111 (555) 403-1659 Ext. 53; 182-929- 456 Imarti Devi Zhang 2929 789 Barfi Devi Doe 555-808-9633 Knowledge Gate Website Customer Customer ID First Name Surname Telephone Number 123 Pooja Singh 555-861-2025, 192-122-1111 456 San Zhang (555) 403-1659 Ext. 53; 182-929-2929 789 John Doe 555-808-9633 जुगाड़ technology Customer Customer ID First Name Surname Telephone Number1 Telephone Number2 123 Pooja Singh 555-861-2025 192-122-1111 (555) 403-1659 Ext. 456 San Zhang 182-929-2929 53 789 John Doe 555-808-9633 Knowledge Gate Website Customer Customer ID First Name Surname Telephone Number 123 Rabri Devi Singh 555-861-2025, 192-122-1111 456 Imarti Devi Zhang (555) 403-1659 Ext. 53; 182-929-2929 789 Barfi Devi Doe 555-808-9633 Customer Name Customer Phone Number Customer ID Telephone Number Customer ID First Name Surname 123 555-861-2025 123 Pooja Singh 123 192-122-1111 456 (555) 403-1659 Ext. 53 456 San Zhang 456 182-929-2929 789 John Doe 789 555-808-9633 Knowledge Gate Website Simple - Attributes which cannot be divided further into sub parts. E.g. Age Composite - Attributes which can be further divided into sub parts, as simple attributes. A composite attribute is represented by an ellipse connected to an ellipse and in a relational model by a separate column. Knowledge Gate Website Stored - Main attributes whose value is permanently stored in database. E.g. date_of_birth Derived -The value of these types of attributes can be derived from values of other Attributes. E.g. - Age attribute can be derived from date_of_birth and Date attribute. Knowledge Gate Website Descriptive attribute - Attribute of relationship is called descriptive attribute. An attribute takes a null value when an entity does not have a value for it. The null value may indicate “not applicable”— that is, that the value does not exist for the entity. Knowledge Gate Website Relationship / Association Knowledge Gate Website Relationship / Association Is an association between two or more entities of same or different entity set. In ER diagram we cannot represent individual relationship as it is an instance or data. Knowledge Gate Website In an ER diagram it is represented by a diamond, while in relational model sometimes through foreign key and other time by a separate table. Knowledge Gate Website Every relationship type has three components. Name Degree Structural constraints (cardinalities ratios, participation) Knowledge Gate Website NAME- every relation must have a unique name. Knowledge Gate Website Degree of a relationship/relationship set Means number of entities set(relations/tables) associated(participate) in the relationship set. Most of the relationship sets in a data base system are binary. Occasionally however relationship sets involve more than two entity sets. Logically, we can associate any number of entity set in a relationship called N-ary Relationship. Knowledge Gate Website Unary Relationship - One single entity set participate in a Relationship, means two entities of the same entity set are related to each other. These are also called as self -referential Relationship set. E.g.- A member in a team maybe supervisor of another member in team. Knowledge Gate Website Binary Relationship - Two entity sets participate in a Relationship. It is most common Relationship. Knowledge Gate Website Ternary Relationship - When three entities participate in a Relationship. E.g. The University might need to record which teachers taught which subjects in which courses. Knowledge Gate Website Quaternary Relationship - When four entities participate in a Relationship. Knowledge Gate Website N-ary relationship – where n number of entity set are associated But the most common relationships in ER models are Binary. Knowledge Gate Website Structural constraints (Cardinalities Ratios, Participation) An E-R enterprise schema may define certain constraints to which the contents of a database must conform. Knowledge Gate Website MAPPING CARDINALITIES / CARNINALITY RATIOS Express the number of entities to which another entity can be associated via a relationship set. Four possible categories are- One to One (1:1) Relationship. One to Many (1: M) Relationship. Many to One (M: 1) Relationship. Many to Many (M: N) Relationship. Knowledge Gate Website One to One (1:1) Relationship - An entity in A is associated with at most one entity in B, and an entity in B is associated with at most one entity in A. E.g.- The directed line from relationship set advisor to both entities set indicates that ‘an instructor may advise at most one student, and a student may have at most one advisor’. Knowledge Gate Website One to Many (1: M) Relationship - An entity in A is associated with any number (zero or more) of entities in B. An entity in B, however, can be associated with at most one entity in A. E.g.- This indicates that an instructor may advise many students, but a student may have at most one advisor. Knowledge Gate Website Many to One (M: 1) Relationship - An entity in A is associated with at most one entity in B. An entity in B, however, can be associated with any number (zero or more) of entities in A. E.g.- This indicates that student may have many instructors but an instructor can advise at most one student. Knowledge Gate Website Many to Many(M:N) Relationship - An entity in A is associated with any number (zero or more) of entities in B, and an entity in B is associated with any number (zero or more) of entities in A. E.g.- This indicates a student may have many advisors and an instructor may advise many students. Knowledge Gate Website PARTICIPATION CONSTRAINTS- it defines participations of entities of an entity type in a relationship. Partial participation Total Participation Knowledge Gate Website STRONG AND WEAK ENTITY SET An entity set is called strong entity set, if it has a primary key, all the tuples in the set are distinguishable by that key. An entity set that does not process sufficient attributes to form a primary key is called a weak entity set. It contains discriminator attributes (partial key) which contain partial information about the entity set, but it is not sufficient enough to identify each tuple uniquely. Represented by double rectangle. Knowledge Gate Website For a weak entity set to be meaningful and converted into strong entity set, it must be associated with another strong entity set called the identifying or owner entity set i.e. weak entity set is said to be existence dependent on the identity set. The identifying entity set is said to own weak entity set that it identifies. A weak entity set may participate as owner in an identifying relationship with another weak entity set. Knowledge Gate Website The relationship associating the weak entity set with the identifying entity set is called the identifying relationship (double diamonds). The identifying relationship is many to one from the weak entity set to identifying entity set, and the participation of the weak entity set in relationship is always total. The primary key of weak entity set will be the union of primary key and discriminator attributes. Knowledge Gate Website REASONS TO HAVE WEAK ENTITY SET Weak entities reflect the logical structure of an entity being dependent on another. Weak entity can be deleted automatically when their strong entity is deleted. Without weak entity set it will lead to duplication and consequent possible inconsistencies. Knowledge Gate Website Conversion From ER Diagram To Relational Model Entity Set Convert every strong, weak entity set into a separate table. In weak entity set we make it dependent onto one strong entity set (identifying or owner entity set). Relationship If Unary: No separate table is required, add a new column as fk which refer the pk of the same table. if 1:1 No separate table is required, take pk of one side and put it as fk on other side, priority must be given to the side having total participation. if 1:n or n:1 No separate table is required, modify n side by taking pk of 1 side a foreign key on n side. If m-n Separate table is required take pk of both table and declare their combination as a pk of new table (3 or More) Take the pk of all participating entity sets as fk and declare their combinations as pk in the new table. Attributes Multivalued-A separate table must be taken for all multivalued attributes, where we take pk of the main table as fk and declare combination of fk and multivalued attribute are pk in the new table. Composite Attributes-A separate column must be taken for all simple attributes of the composite Knowledge Gate Website Generalization Involves merging two lower-level entities to create a higher-level entity. A bottom-up approach that builds complexity from simpler components. Highlights similarities among lower-level entity sets while hiding differences. Leads to a simplified, structured data representation, aiding in database design and querying processes. Knowledge Gate Website Specialization A process where a higher-level entity is broken down into more specific, lower- level entities. This top-down approach delineates complexity into simpler components. Acts as the converse of the generalization process, focusing on differentiating properties rather than similarities. Knowledge Gate Website Aggregation A concept wherein relationships are abstracted to form higher-level entities, enabling a more organized representation of complex relationships. Knowledge Gate Website ADVANTAGES OF E-R DIGRAM Constructs used in the ER diagram can easily be transformed into relational tables. It is simple and easy to understand with minimum training. DISADVANTAGE OF E-R DIGRAM Loss of information content Limited constraints representation It is overly complex for small projects Knowledge Gate Website RELATIONAL DATABASE MANAGEMENT SYSTEM A relational database management system (RDBMS), conceptualized by Edgar F. Codd in 1970, serves as the foundation for most contemporary commercial and open-source database applications. Central to its design is the utilization of tables for data storage, where it maintains and enforces specific data relationships, marking a significant evolution in database design. Knowledge Gate Website BASICS OF RDBMS Domain (set of permissible value in particular column) is a set of atomic values. By atomic we mean that each value in the domain is indivisible as far as the formal relational model is concerned. A common method of specifying a domain is to specify a data type from which the data values forming the domain are drawn. E.g. Names: The set of character strings that represent names of persons. Domain/ NAME ID CITY COUNTRY HOBBY NISHA 1 AGRA INDIA PLAYING Field/ NIKITA 2 DELHI INDIA DANCING Column/ AJAY 3 AGRA INDIA CHESS ARPIT 4 PATNA INDIA READING Arity/ Degree Knowledge Gate Website Table (Relation) - A Relation is a set of tuples/rows/entities/records. Tuple - Each row of a Relation/Table is called Tuple. Arity/Degree - No. of columns/attributes of a Relation. E.g. - Arity is 5 in Table Student. Cardinality - No of rows/tuples/record of a Relational instance. E.g. - Cardinality is 4 in table Student. NAME ID CITY COUNTRY HOBBY NISHA 1 AGRA INDIA PLAYING Rows/Tuples/Record/ NIKITA 2 DELHI INDIA DANCING Cardinality AJAY 3 AGRA INDIA CHESS ARPIT 4 PATNA INDIA READING Knowledge Gate Website Properties of Relational tables 1. Cells contains atomic values 2. Values in a column are of the same kind 3. Each row is unique 4. No two tables can have the same name in a relational schema. 5. Each column has a unique name 6. The sequence of rows is insignificant 7. The sequence of columns is insignificant. Knowledge Gate Website Problems in relational database Update Anomalies- Anomalies that cause redundant work to be done during insertion into and Modification of a relation and that may cause accidental loss of information during a deletion from a relation Insertion Anomalies Modification Anomalies Deletion Anomalies Knowledge Gate Website Insertion anomalies: An independent piece of information cannot be recorded into a relation unless an irrelevant information must be inserted together at the same time. Roll no name Age Br_code Br_name Br_hod_name 1 A 19 101 Cs Abc 2 B 18 101 Cs Abc 3 C 20 101 Cs Abc 4 D 20 102 Ec Pqr Knowledge Gate Website Modification anomalies: The update of a piece of information must occur at multiple locations. Roll no name Age Br_code Br_name Br_hod_name 1 A 19 101 Cs Abc 2 B 18 101 Cs Abc 3 C 20 101 Cs Abc 4 D 20 102 Ec Pqr Knowledge Gate Website Deletion Anomalies: The deletion of a piece of information unintentionally removes other information. Roll no name Age Br_code Br_name Br_hod_name 1 A 19 101 Cs Abc 2 B 18 101 Cs Abc 3 C 20 101 Cs Abc 4 D 20 102 Ec Pqr Knowledge Gate Website Roll no name Age Br_code Br_name Br_hod_name 1 A 19 101 Cs Abc 2 B 18 101 Cs Abc 3 C 20 101 Cs Abc 4 D 20 102 Ec Pqr PK FK PK Roll no name Age Br_code Br_code Br_name Br_hod_name 1 A 19 101 101 Cs Abc 2 B 18 101 102 ec Pqr 3 C 20 101 4 D 20 102 Knowledge Gate Website Purpose of Normalization Normalization may be simply defined as refinement process. Which includes creating tables and establishing relationships between those tables according to rules designed both to protect data and make the database more flexible by eliminating two factors; Redundancy Inconsistent Dependency With out normalization data base system may be inaccurate, slow and inefficient and they might not produce the data we expect. A series of normal form tests that can be carried out on individual relation schemas so that the relational database can be normalized to any desired degree. 1NF>>>2NF>>3NF>>BCNF Knowledge Gate Website Conclusion 1. Like every paragraph must have a single idea similarly every table must have a single idea and if a table contains more than one idea then that table must be decomposed until each table contains only one idea. 2. We need some tools to approach this decomposition or normalization on large database which contains a number of table, and that tool is functional dependencies. Knowledge Gate Website Roll no name Age Br_code Br_name Br_hod_name 1 A 19 101 Cs Abc 2 B 18 101 Cs Abc 3 C 20 101 Cs Abc 4 D 20 102 Ec Pqr Roll no name Age Br_code Br_code Br_name Br_hod_name 1 A 19 101 101 Cs Abc 2 B 18 101 102 ec Pqr 3 C 20 101 4 D 20 102 Knowledge Gate Website Functional Dependency Knowledge Gate Website Knowledge Gate Website Br_code Br_hod_name Br_code Br_hod_name Knowledge Gate Website Functional Dependency कोई बताता नह ,ीं इसक feel आ जात है Knowledge Gate Website FUNCTIONAL DEPENDENCY A formal tool for analysis of relational schemas. X Y Z In a Relation R, if ‘α’ ⊑ R AND ‘β’ ⊑ R, then 1 4 2 attribute or a Set of attribute ‘α’ Functionally 1 4 3 derives an attribute or set of attributes ‘β’, iff 2 6 3 each ‘α’ value is associated with precisely one ‘β’ value. 3 2 2 For all pairs of tuples t1 and t2 in R such that If T1[α] = T2[α] Then, T1[β] = T2[β] Knowledge Gate Website Q Consider the following relation instance, which of the following dependency doesn’t hold A) A → b A B C 1 2 3 B) BC → A 4 2 3 C) B → C 5 3 3 D) AC → B Knowledge Gate Website Trivial Functional dependency - If β is a subset of α, then the functional dependency α → β will always hold. X Y Z जजसका होना न होना बराबर हो 1 4 2 1 4 3 2 6 3 3 2 2 Knowledge Gate Website ATTRIBUTES CLOSURE/CLOSURE ON ATTRIBUTE SET/ CLOSURE SET OF ATTRIBUTES Attribute closure of an attribute set can be defined as set of attributes which can be functionally determined from F. DENOTED BY F+ Q R(ABCDE) R (A, B, C, D) R(ABCDEFG) A → BC A→B A→B CD → E B→C BC → DE B→D AB → D AEG → G E→A A+ = (AC)+ =? (B)+ = Knowledge Gate Website ARMSTRONG’S AXIOMS 1. An axiom or postulate is a statement that is taken to be true, to serve as a premise or starting point for further reasoning and arguments. 2. Armstrong's axioms are a set of axioms (or, more precisely, inference rules) used to infer all the functional dependencies on a relational database. They were developed by William W. Armstrong in his 1974 paper. 3. The axioms are sound in generating only functional dependencies in the closure of a set of functional dependencies (denoted as F+) when applied to that set (denoted as F). Knowledge Gate Website Armstrong Axioms Reflexivity: If Y is a subset of X, then X → Y Augmentation: If X → Y, then XZ → YZ Transitivity: If X → Y and Y → Z, then X → Z Knowledge Gate Website From these rules, we can derive these secondary rules- Union: If X → Y and X → Z, then X → YZ Decomposition: If X → YZ, then X → Y and X → Z Pseudo transitivity: If X → Y and WY → Z, then WX → Z Composition: If X → Y and Z → W, then XZ → YW Knowledge Gate Website Why Armstrong axioms refers to the Sound and Complete By sound, we mean that given a set of functional dependencies F specified on a relation schema R, any dependency that we can infer from F by using the primary rules of Armstrong axioms holds in every relation state r of R that satisfies the dependencies in F. By complete, we mean that using primary rules of Armstrong axioms repeatedly to infer dependencies until no more dependencies can be inferred results in the complete set of all possible dependencies that can be inferred from F. Knowledge Gate Website Equivalence of Two FD sets- Two FD sets F1 and F2 are equivalent if – F1+ = F2+ Or F1 ⊑ F2+ and F2 ⊑ F1+ Knowledge Gate Website Q Consider the following set of fd R(ACDEH) F G A→C A → CD AC → D E → AH E → AD E→H Knowledge Gate Website To find the MINIMAL COVER / CANONICAL COVER / IRREDUCIBLE SET A canonical cover (also known as a minimal cover) for a set of functional dependencies in a database is a minimal set of functional dependencies that is equivalent to the original set, but with redundant dependencies and extraneous attributes removed. It is used in the normalization process of database design to simplify the set of functional dependencies and to find a good set of relations. There may be any following type of redundancy in the set of functional dependencies: - Complete production may be Redundant. One or more than one attributes may be redundant on right hand side of a production. One or more than one attributes may be redundant on Left hand side of a production. Knowledge Gate Website Q R(ABCD) A→B C→B D → ABC AC → D Knowledge Gate Website R(A,B,C) A→B B→C A→C AB → B AB → C AC → B Knowledge Gate Website Key Knowledge Gate Website Super key Set of attributes using which we can identify each tuple uniquely is called Super key. Let X be a set of attributes in a Relation R , if X+(Closure of X) determines all attributes of R then X is said to be Super key of R. There should be at least one Super key in every relation. Knowledge Gate Website Candidate key Minimal set of attributes using which we can identify each tuple uniquely is called Candidate key. A super key is called candidate key if it’s No proper subset is a super key. Also called as MINIMAL SUPER KEY. There should be at least one candidate key. Knowledge Gate Website Prime attribute - Attributes that are member of at least one candidate Keys are called Prime attributes. Knowledge Gate Website Primary key One of the candidate keys is selected by database administrator as a Primary means to identify tuple is called primary Key. Primary Key attribute are not allowed to have Null values. Exactly one Primary Key per table in RDMS. Candidate key which are not chosen as primary key is alternate key. Knowledge Gate Website Foreign Keys A foreign key is a column or group of columns in a relational database table that refers the primary key of the same table or some other table to represent relationship. The concept of referential integrity is derived from foreign key theory. Knowledge Gate Website Roll no name Age Br_code Br_name Br_hod_name 1 A 19 101 Cs Abc 2 B 18 101 Cs Abc 3 C 20 101 Cs Abc 4 D 20 102 Ec Pqr PK FK PK Roll no name Age Br_code Br_code Br_name Br_hod_name 1 A 19 101 101 Cs Abc 2 B 18 101 102 ec Pqr 3 C 20 101 4 D 20 102 Knowledge Gate Website PK FK Roll no CR 1 1 2 2 3 1 4 2 5 1 6 2 7 1 8 2 9 1 10 2 11 1 12 2 13 1 14 2 15 null Knowledge Gate Website Composite key – Composite key is a key composed of more than one column sometimes it is also known as concatenated key. Secondary key – Secondary key is a key used to speed up the search and retrieval contrary to primary key, a secondary key does not necessary contain unique values. Knowledge Gate Website NORMAL FORM Knowledge Gate Website FIRST NORMAL FORM 1NF is the initial step of database normalization. Implications of first normal form Atomic Values: Each cell in a table contains indivisible, atomic values. Means a Relation should not contain any multivalued or composite attributes. Unique Columns: Each column must have a distinct name to identify the data it contains. Primary Key: A table in 1NF should have a primary key that uniquely identifies each record. Eliminating Duplicates: Duplicate rows are removed to prevent data redundancy. Knowledge Gate Website Prime attribute: - A attribute is said to be prime if it is part of any of the candidate key Non-Prime attribute: - A attribute is said to be non-prime if it is not part of any of the candidate key Eg R(ABCD) AB→CD Here candidate key is AB so, A and B are prime attribute, C and D are non-prime attributes. Knowledge Gate Website PARTIAL DEPENDENCY- When a non – prime attribute is dependent only on a part (Proper subset) of candidate key then it is called partial dependency. (PRIME > NON-PRIME) Full DEPENDENCY- When a non – prime attribute is dependent on the entire candidate key then it is called Full dependency. e.g. R(ABCD) AB→D A→C Knowledge Gate Website SECOND NORMAL FORM Relation R is in 2NF if, R should be in 1 NF. R should not contain any Partial dependency. (that is every non-prime attribute should be fully dependent upon candidate key) Knowledge Gate Website Q R(A, B, C) B→C A B C A B B C a 1 X A 1 1 X b 2 Y B 2 2 Y a 3 Z A 3 3 z C 3 Z C 3 D 3 Z D 3 E 3 Z E 3 Knowledge Gate Website Knowledge Gate Website TRANSITIVE DEPENDENCY – A functional dependency from non-Prime attribute to non-Prime attribute is called transitive E.g.- R(A, B, C, D) with A as a candidate key A→B B→C [ transitive dependency] C→D [transitive dependency] Knowledge Gate Website THIRD NORMAL FORM Let R be the relational schema, it is said to be in 3 NF R should be in 2NF It must not contain any transitive dependency Knowledge Gate Website THIRD NORMAL FORM DIRECT DEFINATION A relational schema R is said to be 3 NF if every functional dependency in R from α-->β, either α is super key or β is the prime attribute Knowledge Gate Website R(A, B, C) A B C A B B C A 1 P A 1 1 P A→B B 2 Q B 2 2 Q C 2 Q C 2 3 R B→C D 2 Q D 2 4 S E 3 R E 3 F 3 R F 3 G 4 S G 4 Knowledge Gate Website BCNF (BOYCE CODD NORMAL FORM) A relational schema R is said to be BCNF if every functional dependency in R from α→β α must be a super key Knowledge Gate Website R(A, B, C) A B C A B C B AB → C A C B A B B C B B C B B C B C→B B A D B A D A A A E A A E A C C B C C D C B D C E C B e C F C B f c Knowledge Gate Website Some important note points on Normalization: A Relation with two attributes is always in BCNF. A Relation schema R consist of only prime attributes then R is always in 3NF, but may or may not be in BCNF. Knowledge Gate Website Q Consider the universal relational schema R (A, B, C, D, E, F, G, H, I, J) and a set of following functional dependencies. F = {AB → C, A → DE, B → F, F → GH, D → IJ} determine the keys for R ? Decompose R into 2nd normal form. Knowledge Gate Website Q Find the normal form of relation R(A, B, C, D, E) having FD set F= {A → B, BC → E, ED → A}. Knowledge Gate Website Multivalued Dependency Denoted by, A →→ B, Means, for every value of A, there may exist more than one value of B. E.g. S_name → → Club_name S_Name Club_name Kamesh Dance Kamesh Guitar Knowledge Gate Website A trivial multivalued dependency X→→Y is one where either Y is a subset of X, or X and Y together form the whole set of attributes of the relation. S_name → → Club_name S_name → → Club_name S_name → → P_no S_Name Club_name S_Name Club_name P_no Kamesh Dance Kamesh Dance 123 Kamesh Guitar Kamesh Guitar 123 Kamesh Dance 789 Kamesh Guitar 789 Knowledge Gate Website E.g. let the constraint specified by MVD in relation Student as S_name → → Club_name S_name → → P_no S_Name Club_name S_Name P_no Kamesh Dance Kamesh 123 Kamesh Guitar Kamesh 789 S_Name Club_name P_no Kamesh Dance 123 Kamesh Guitar 123 Kamesh Dance 789 Kamesh Guitar 789 NOTE: The above Student schema is in BCNF as no functional dependency holds on EMP, but still redundancy due to MVD. Knowledge Gate Website Each row indicates that a given restaurant can Restaurant Restaurant Delivery Permutations Variety Delivery Area deliver a given variety. The table has no non-key Chatora Sweets Samosa Hatibagan Market Chatora Sweets Samosa Chandni Chowk attributes because its only key is {Restaurant, Chatora Sweets Samosa Koramangala Variety, Delivery Area}. Therefore, it meets all Chatora Sweets Dosa Hatibagan Market Chatora Sweets Dosa Chandni Chowk normal forms up to BCNF. Chatora Sweets Dosa Koramangala If we assume, however, that Variety offered by a Moolchand Ladoo Koramangala Moolchand Dosa Koramangala restaurant are not affected by delivery area (i.e. a Thaggu Samosa Hatibagan Market restaurant offers all Variety it makes to all areas it Thaggu Samosa Chandni Chowk Thaggu Ladoo Hatibagan Market supplies), then it does not meet 4NF. The Thaggu Ladoo Chandni Chowk problem is that the table features two non-trivial multivalued dependencies on the {Restaurant} attribute (which is not a super key). The dependencies are: o {Restaurant} →→ {Variety} o {Restaurant} →→ {Delivery Area} Knowledge Gate Website If we have two or more multivalued independent attributes in the same relation schema, we get into a problem of having to repeat every value of one of the attributes with every value of the other attribute to keep the relation state consistent and to maintain the independence among the attributes involved. This constraint is specified by a multivalued dependency. Delivery Areas By Restaurant Varieties By Restaurant Restaurant Delivery Area Restaurant Pizza Variety Chatora Sweets Hatibagan Market Chatora Sweets Samosa Chatora Sweets Chandni Chowk Chatora Sweets Dosa Chatora Sweets Koramangala Moolchand Ladoo Moolchand Koramangala Moolchand Dosa Thaggu Hatibagan Market Thaggu Samosa Thaggu Chandni Chowk Thaggu Ladoo Knowledge Gate Website A relation is in 4NF iff It is in BCNF There must not exist any non-trivial multivalued dependency. Each MVD is decomposed in separate table, where it becomes trivial MVD. Knowledge Gate Website Lossy/Lossless-Dependency Preserving Decomposition Because of a normalization a table is Decomposed into two or more tables, but during this decomposition we must ensure satisfaction of some properties out of which the most important is lossless join property / decomposition. Knowledge Gate Website if we decompose a table r into two tables r1 and r2 because of normalization then at some later stage if we want to join(combine) (natural join) these tables r1 and r2, then we must get back the original table r, without any extra or less tuple. But some information may be lost during retrieval of original relation or table. For e.g. r A B C 1 a p 2 b q 3 a r r1 r2 A B B C Knowledge Gate Website r A B C 1 a p 2 b q 3 a r r1 r2 A B B C 1 a a p 2 b b q 3 a A B C a r Knowledge Gate Website Decomposition is lossy if R1 ⋈ R2 ⊃ R Decomposition is lossy if R ⊃ R1 ⋈ R2 Knowledge Gate Website Decomposition is lossless if R1 ⋈ R2 = R "The decomposition of relation R into R1 and R2 is lossless when the join of R1 and R2 yield the same relation as in R." which guarantees that the spurious (extra or less) tuple generation problem does not occur with respect to the relation schemas created after decomposition. This property is extremely critical and must be achieved at any cost. lossless Decomposition / NonAdditive Join Decomposition Knowledge Gate Website A B C D E A 122 1 W A E 236 4 X B A 199 1 Y C B 213 2 Z D Knowledge Gate Website How to check for lossless join decomposition using FD set, following conditions must hold: Union of Attributes of R1 and R2 must be equal to attribute of R. Each attribute of R must be either in R1 or in R2. Att(R1) U Att(R2) = Att(R) Intersection of Attributes of R1 and R2 must not be NULL. Att(R1) ∩ Att(R2) ≠ Φ Common attribute must be a key for at least one relation (R1 or R2) Att(R1) ∩ Att(R2) → (R1) or Att(R1) ∩ Att(R2) → (R2) Knowledge Gate Website Q R (A, B, C, D) A→ B, B→C, C→D, D→A R1(A, B), R2(B, C) AND R3(C, D) Knowledge Gate Website Q R(ABCDE)(NF) R1(AB) R2(BC) R3(ABCD) R4(EG) A→BC C→DE D→E Knowledge Gate Website 5 NF/Project-Join Normal Form A Relational table R is said to be in 5th normal form if it is in 4 NF it cannot be further non-loss decomposed Knowledge Gate Website Dependency Preserving Decomposition Let relation R be decomposed into Relations R1, R2, R3…………. RN with their respective functional Dependencies set as F1, F2, F3…………. FN, then the Decomposition is Dependency Preserving iff {F1 ∪ F2 ∪ F3 ∪ F4………. ∪ FN }+ = F+ Dependency preservation property, although Knowledge desirable, is sometimes sacrificed. Gate Website X = (PQRS) F = {QR → S, R → P, S → Q} Y = (PR) and Z = (QRS) Knowledge Gate Website Indexing Relational databases are based on set theory. In set theory, the order of elements is unimportant, similarly in database tables. However, in practical implementation, element order in tables is often specified. Various operations such as search, insertion, and deletion are influenced by the order of elements in the tables. Elements in a table can be stored in two ways: sorted (ordered) or unsorted (unordered). Knowledge Gate Website File organization/ organization of records in a file Ordered file organization All the records in the file are ordered on some search key field. Here binary search is possible. (give example of book page searching) Maintenance (insertion & deletion) is costly, as it requires reorganization of entire file. Notes that we will get binary search only if we are using that key for searching on which indexing is done, otherwise it will behave as unsorted file. if file is unordered then no of block assesses required to reach correct block which contain the desired record is O(log2n), where n is the number of blocks. Knowledge Gate Website Unordered file organization Records are typically added at the end of the file, without following any specific order. This insertion method allows only linear search, resulting in slower search times. Despite slow searches, maintenance including insertion and deletion is simpler. No reorganization of the entire file is needed, making maintenance easier. If file is unordered then no of block assesses required to reach correct block which contain the desired record is O(n), where n is the number of blocks. Knowledge Gate Website Indexes are supplementary structures in databases, aiding in swift record retrieval. They enable quick data access based on particular attributes identified for indexing. This technique is similar to the index sections seen in books. Indexes provide secondary pathways to access records without changing their physical position in the main file. Knowledge Gate Website The size of index file is way smaller than that of the main file, as index file record contain only two columns key (attribute in which searching is done) and block pointer (base address of the block of main file which contains the record holding the key), while main file contains all the columns. Knowledge Gate Website Q Suppose we have ordered file with records stored r = 30,000 on a disk with Block Size B = 1024 B. File records are of fixed size and are unspanned with record length R = 100 B. Suppose that ordering key field of file is 9 B long and a block pointer is 6 B long, Implement primary indexing? Knowledge Gate Website 1. Indexes can be established on any relation field, be it primary key or non-key. 2. Each attribute can have a dedicated index file, meaning multiple index files may exist for one main file. 3. Index files are always organized, allowing for the utilization of binary search advantages, irrespective of the main file's order. 4. Indexing accelerates data retrieval time but also introduces space overhead for storing the index file. 5. The correct block in the main file can be located with log2(number of blocks in index file) + 1 accesses. Knowledge Gate Website TYPES OF INDEXING Index Single Multilevel level Primary Clustering Secondary Simple B tree B+ tree Index Index index multilevel In single-level indexing, an index file is created for the main file, marking the end of the indexing process. Multiple-level indexing, on the other hand, involves creating an index for the index file and continually repeating this procedure until only a single block remains. Knowledge Gate Website PRIMARY INDEXING Main file is always sorted according to primary key. Indexing is done on Primary Key, therefore called as primary indexing Index file have two columns, first primary key and second anchor pointer (base address of block) It is an example of Sparse Indexing. Here first record (anchor record) of every block gets an entry in the index file No. of entries in the index file = No of blocks acquired by the main file. Knowledge Gate Website CLUSTERED INDEXING Main file will be ordered on some non-key attributes No of entries in the index file = no of unique values of the attribute on which indexing is done. It is the example of Sparse as well as dense indexing Knowledge Gate Website Q Suppose we have ordered file with records stored r = 30,000 on a disk with Block Size B = 1024 B. File records are of fixed size and are unspanned with record length R = 100 B. Suppose that ordering key field of file is 9 B long and a block pointer is 6 B long, Implement Secondary indexing? Knowledge Gate Website SECONDARY INDEXING Most common scenarios, suppose that we already have a primary indexing on primary key, but there is frequent query on some other attributes, so we may decide to have one more index file with some other attribute. Main file is ordered according to the attribute on which indexing is done(unordered). Secondary indexing can be done on key or non-key attribute. No of entries in the index file is same as the number of entries in the main file. It is an example of dense indexing. Knowledge Gate Website Dense Vs Sparse Dense Index In dense index, there is an entry in the index file for every search key value in the main file. This makes searching faster but requires more space to store index records itself. Note that it is not for every record, it is for every search key value. Sometime number of records in the main file > number of search keys in the main file, for example if search key is repeated. Sparse Index-If an index entry is created only for some records of the main file, then it is called sparse index. No. of index entries in the index file < No. of records in the main file. Note: - dense and sparse are not complementary to each other, sometimes it is possible that a record is both dense and sparse. Knowledge Gate Website B tree A B-tree of order m if non-empty is an m-way search tree in which. The root has at least zero child nodes and at most m child nodes. The internal nodes except the root have at least celling(m/2) child nodes and at most m child nodes. The number of keys in each internal node is one less than the number of child nodes and these keys partition the subtrees of the nodes in a manner similar to that of m-way search tree. All leaf nodes are on the same level(perfectly balanced). Root Internal except Root Leaf Rules MAX MIN Rules MAX MIN Rules MAX MIN CHILD m 0 CHILD m ⌈m/2⌉ CHILD 0 0 DATA m-1 1 DATA m-1 ⌈m/2⌉ - 1 DATA m-1 ⌈m/2⌉ - 1 Knowledge Gate Website Insertion in B-TREE A B-tree starts with a single root node (which is also a leaf node) at level 0 (zero). Once the root node is full with m – 1 search key values and we attempt to insert another entry in the tree, the root node splits into two nodes at level 1. Only the middle value is kept in the root node, and the rest of the values are split evenly between the other two nodes. When a non-roof node is full and a new entry is inserted into it, that node is split into two nodes at the same level, and the middle entry is moved to the parent node along with two pointers to the new split nodes. If the parent node is full, it is also split. Splitting can propagate all the way to the root node, creating a new level if the root is split. Knowledge Gate Website Q Consider the following elements 5, 10, 12, 13, 14, 1, 2, 3, 4 insert them into an empty b-tree of order = 3. Knowledge Gate Website Query Language After designing a data base, that is ER diagram followed by conversion in relational model followed by normalization and indexing, now next task is how to store, retrieve and modify data in the data base. So here we will be concentrating more on the retrieval part. Query languages are used for this purpose. Query languages, data query languages or database query languages (DQLs) are computer languages using which user request some information from the database. A well known example is the Structured Query Language (SQL). Knowledge Gate Website Procedural Query Language Here users instruct the system to performs a sequence of operations on the data base in order to compute the desired result. Means user provides both what data to be retrieved and how data to be retrieved. e.g. Relational Algebra. Knowledge Gate Website Non-Procedural Query Language In nonprocedural language, the user describes the desired information without giving a specific procedure for obtaining that information. What data to be retrieved e.g. Relational Calculus. Tuple relational calculus, Domain relational calculus are declarative query languages based on mathematical logic Knowledge Gate Website Relational Algebra (Procedural) and Relational Calculus (non-procedural) are mathematical system/ query languages which are used for query on relational model. RA and RC are not executed in any computer they provide the fundamental mathematics on which SQL is based. SQL (structured query language) works on RDBMS, and it includes elements of both procedural or non-procedural query language. Relational model RDBMS RA, RC SQL Algo Code Conceptual Reality Theoretical Practical Chess Battle Field Knowledge Gate Website RELATIONAL ALGEBRA RA like any other mathematical system provides a number of operators and use relations (tables) as operands and produce a new relation as their result. 3+4 Operand Operator Operand A⊕B Knowledge Gate Website Every operator in the RA accepts (one or two) relation/table as input arguments and returns always a single relation instance as the result without a name. * Knowledge Gate Website It also does not consider duplicity by default as it is based on set theory. Same query is written in RA and SQL the result may be different as SQL considers duplication. As it is pure mathematics no use of English keywords. Operators are represented using symbols. Knowledge Gate Website BASIC / FUNDAMENTAL OPERATORS The fundamental operations in the relational algebra are select, project, union, set difference, Cartesian product, and Rename. Name Symbol Select (σ) Project (∏) Union (∪) Set difference (−) Cross product (Χ) Rename (ρ) Knowledge Gate Website DERIVED OPERATORS There are several other operations namely: set intersection, natural join, and assignment. Name Symbol DERIVED FROM Join (⋈) (Χ) (−) Intersection (∩) A∩B=A-(A-B) Division (÷) (X,-, ∏) Assignment (=)Knowledge Gate Website The select, project, and rename operations are called unary operations, because they operate on one relation. Union, Cartesian product and set difference operate on pairs of relations and are, therefore, called binary operations. Relational algebra also provides the framework for query optimization. Knowledge Gate Website Relational schema - A relation schema R, denoted by R (A1, A2,..., An), is made up of a relation name R and a list of attributes, A1, A2,..., An. Each attribute Ai is the name of a role played by some domain D in the relation schema R. It is use to describe a Relation. E.g. Schema representation of Table Student is as – STUDENT (NAME, ID, CITY, COUNTRY, HOBBY). Relational Instance - Relations with its data at particular instant of time. Knowledge Gate Website The Project Operation (Vertical Selection) Main idea behind project operator is to select desired columns. The project operation is a unary operation that returns its argument relation, with certain attributes left out. Projection is denoted by the uppercase Greek letter pi (Π). Minimum number of columns selected can be 1, Maximum selected Columns can be n - 1. Πcolumn_name (table_name) Knowledge Gate Website Q Write a RELATIONAL ALGEBRA query to find the name of all customer without duplication having bank account? Q Write a RELATIONAL ALGEBRA query to find all the details of bank branches? Knowledge Gate Website Q Write a RELATIONAL ALGEBRA query to find the name of all customer without duplication having bank account? Πcustomer_name (depositor) Q Write a RELATIONAL ALGEBRA query to find all the details of bank branches? (branch) Knowledge Gate Website The Select Operation (Horizontal Selection) The select operation selects tuples that satisfy a given predicate/Condition p. Lowercase Greek letter sigma (σ) is used to denote selection. It is a unary operator. Eliminates only tuples/rows. σ condition (table_name) Knowledge Gate Website Q Write a RELATIONAL ALGEBRA query to find all account_no where balance is less the 1000? Q Write a RELATIONAL ALGEBRA query to find branch name which is situated in Delhi and having assets less than 1,00,000? Knowledge Gate Website Q Write a RELATIONAL ALGEBRA query to find all account_no where balance is less the 1000? Πaccount_number(σ balance 1200] } Knowledge Gate Website Q Find the loan number for each loan of amount over 1200? { < l > | ∃b, a < l, b, a > ∈ loan ∧ a > 1200] } Knowledge Gate Website Q Find the name of all the customers who have a loan from Noida branch with loan amount? { < c, a > | ∃l (< c, l > ∈ borrower ∧ ∃b (< l, b, a > ∈ loan ∧ b = ‘Noida’ ))} Knowledge Gate Website Q Find the name of all the customers who have a loan or account or both at the bank? { < c > | ∃l (< c, l > ∈ borrower ∧ ∃b, a (< l, b, a > ∈ loan ∧ b = ‘Noida’)) V ∃a (< c, a > ∈ depositor ∧ ∃b, a (< a, bn, bal > ∈ account ∧ bn = ‘Noida’))} Knowledge Gate Website Safety of Expressions Similar to TRC we can have an unsafe expression that may generate infinite relations. An expression such as {< i, n, d,s > | ¬ (< i, n, d,s > ∈ instructor )} is unsafe, because it allows values in the result that are not in the domain of the expression. Knowledge Gate Website Expressive Power of Languages When the domain relational calculus is restricted to safe expressions, it is equivalent in expressive power to the tuple relational calculus restricted to safe expressions. All three of the following are equivalent: o The basic relational algebra (without the extended relational-algebra operations) o The tuple relational calculus restricted to safe expressions o The domain relational calculus restricted to safe expressions Domain relational calculus also does not have any equivalent of the aggregate operation, but it can be extended to support aggregation, and extending it to handle arithmetic expressions is straightforward. Knowledge Gate Website TRANSACTION T1 Why we study transaction? Read(A) According to general computation principle (operating system) we may have partially executed program, as the level of A = A-100 atomicity is instruction i.e. either an instruction is executed Write(A) completely or not Read(B) But in DBMS view, user perform a logical work(operation) which is always atomic in nature i.e. either operation is B = B+100 execute or not executed, there is no concept like partial Write(B) execution. For example, Transaction T1 which transfer 100 units from account A to B. In this transaction if a failure occurs after Read(B) then the final statue of the system will be inconsistent as 100 units are debited from account A but not credited in account B, this will generate inconsistency. Here for ‘consistency’ before (A + B) == after (A + B)”. Knowledge Gate Website What is transaction To remove this partial execution problem, we increase the level of atomicity and bundle all the instruction of a logical operation into a unit called transaction. So formally ‘A transaction is a Set of logically related instructions to perform a logical unit of work’. T1 Read(A) A = A-100 Write(A) Read(B) B = B+100 Write(B) Knowledge Gate Website As here we are only concerned with DBMS so we well only two basic operation on database. READ (X) - Accessing the database item x from disk (where database stored data) to memory variable also name as X. WRITE (X) - Writing the data item from memory variable X to disk. Knowledge Gate Website Desirable properties of transaction Now as the smallest unit which have atomicity in DBMS view is transaction, so if want that our data should be consistent then instead of concentrating on data base, we must concentrate on the transaction for our data to be consistent. T1 Read(A) A = A-100 Write(A) Read(B) B = B+100 Write(B) Knowledge Gate Website Transactions should possess several properties, often called the ACID properties; to provide integrity and consistency of the data in the database. The following are the ACID properties: Knowledge Gate Website Atomicity - A transaction is an atomic unit of processing; it should either be performed in its entirety or not performed at all. T1 Read(A) A = A-100 Write(A) Read(B) B = B+100 Write(B) Knowledge Gate Website Consistency - A transaction should be consistency preserving, meaning that if it is completely executed from beginning to end without interference from other transactions, it should take the database from one consistent state to another. Knowledge Gate Website Isolation - A transaction should appear as though it is being executed in isolation from other transactions, even though many transactions are executing concurrently. That is, the execution of a transaction should not be interfered with by any other transactions executing concurrently. Knowledge Gate Website Durability - The changes applied to the database by a committed transaction must persist in the database. Knowledge Gate Website Transaction states ACTIVE - It is the initial state. Transaction remains in this state while it is executing operations. Knowledge Gate Website Transaction states PARTIALLY COMMITTED - After the final statement of a transaction has been executed, the state of transaction is partially committed as it is still possible that it may have to be aborted (due to any failure) since the actual output may still be temporarily residing in main memory and not to disk. Knowledge Gate Website Transaction states FAILED - After the discovery that the transaction can no longer proceed (because of hardware /logical errors). Such a transaction must be rolled back. Knowledge Gate Website Transaction states ABORTED - A transaction is said to be in aborted state when the when the transaction has been rolled back and the database has been restored to its state prior to the start of execution. Knowledge Gate Website Transaction states COMMITTED - A transaction enters committed state after successful completion of a transaction and final updation in the database. Knowledge Gate Website Why we need concurrent execution Concurrent execution is necessary because- It leads to good database performance , less weighting time. Overlapping I/O activity with CPU increases throughput and response time. Knowledge Gate Website PROBLEMS DUE TO CONCURRENT EXECUTION OF TRANSACTION But interleaving of instructions between transactions may also lead to many problems that can lead to inconsistent database. Sometimes it is possible that even though individual transaction are satisfying the acid properties even though the final statues of the system will be inconsistent. Knowledge Gate Website Solution is Schedule When two or more transaction executed together or one after another then they can be bundled up into a higher unit of execution called schedule. Knowledge Gate Website Serial schedule - A serial schedule consists of sequence of instruction belonging to different transactions, where instructions belonging to one single transaction appear together. Before complete execution of one transaction another transaction cannot be started. Every serial schedule lead database into consistent state. Knowledge Gate Website Non-serial schedule - A schedule in which sequence of instructions of a transaction appear in the same order as they appear in individual transaction but the instructions may be interleaved with the instructions of different transactions i.e. concurrent execution of transactions takes place. Knowledge Gate Website Conclusion of schedules We do not have any method to proof that a schedule is consistent, but from the above discussion we understand that a serial schedule will always be consistent. So if somehow we proof that a non-serial schedule will also have same effects as of a serial schedule than we can proof that, this particular non-serial schedule will also be consistent. Knowledge Gate Website On the basis of SERIALIZABILITY Conflict serializable View serializable Knowledge Gate Website T1 T2 T1 T2 T1 T2 T1 T2 R(A) R(B) R(A) W(A) R(B) R(A) W(A) R(A) T1 T2 T1 T2 T1 T2 T1 T2 R(A) W(B) R(A) W(A) W(B) R(A) W(A) R(A) T1 T2 T1 T2 T1 T2 T1 T2 R(A) R(A) W(A) W(A) R(A) R(A) W(A) W(A) Knowledge Gate Website Conflict equivalent – if one schedule can be converted to another schedule by swapping of non- conflicting instruction then they are called conflict equivalent schedule. T1 T2 T1 T2 R(A) R(B) A=A-50 B=B+50 R(B) R(A) B=B+50 A=A-50 R(B) R(B) B=B+50 B=B+50 R(A) R(A) A=A+10 A=A+10 Knowledge Gate Website SERIALIZABILITY Conflicting instructions - Let I and J be two consecutive instructions belonging to two different transactions Ti and Tj in a schedule S, the possible I and J instruction can be as- I= READ(Q), J=READ(Q) ->Non-conflicting I= READ(Q), J=WRITE(Q) ->Conflicting I= WRITE(Q), J=READ(Q) ->Conflicting I= WRITE(Q), J=WRITE(Q) ->Conflicting So, the instructions I and J are said to be conflicting, if they are operations by different transactions on the same data item, and at least one of these instructions is a write operation. Knowledge Gate Website CONFLICT SERIALIZABLE The schedules which are conflict equivalent to a serial schedule are called conflict serializable schedule. If a schedule S can be transformed into a schedule S’ by a series of swaps of non- conflicting instructions, we say that S and S’ are conflict equivalent. A schedule S is conflict serializable, if it is conflict equivalent to a serial schedule. Knowledge Gate Website Q Consider the following schedule for transactions T1, T2 and T3: what is the correct serialization of the above? Knowledge Gate Website Procedure for determining conflict serializability of a schedule It can be determined using PRECEDENCE GRAPH method: A precedence graph consists of a pair G (V, E) V= set of vertices consisting of all the transactions participating in the schedule. E= set of edges consists of all edges Ti → Tj, for which one of the following conditions holds: Ti executes write(Q) before Tj executes read(Q) Ti executes read(Q) before Tj executes write(Q) Ti executes write(Q) before Tj executes write(Q) If an edge Ti → Tj exists in the precedence graph, then in any serial schedule S’ equivalent to S, Ti must appear before Tj. If the precedence graph for S has no cycle, then schedule S is conflict serializable, else it is not. Knowledge Gate Website VIEW SERIALIZABLE If a schedule is not conflict serializable, still it can be consistent, so let us study a weaker form of serializability called View serializability, and even if a schedule is view serializable still it can be consistent. If a schedule is conflict serializable then it will also be view serializable, so we must check view serializability only if a schedule is not conflict serializable. Knowledge Gate Website Two schedules S and S’ are view equivalent, if they satisfy following conditions – For each data item Q, if the transaction Ti reads the initial value of Q in schedule S , then then the transaction Ti must, in schedule S’ ,also read the initial value of Q. If a transaction Ti in schedule S reads any data item Q, which is updated by transaction Tj, then a transaction Ti must in schedule S’ also read data item Q updated by transaction Tj in schedule S’. For each data item Q, the transaction (if any) that performs the final write(Q) operation in schedule S, then the same transaction must also perform final write(Q) in schedule S’. Knowledge Gate Website View Serializable A schedule S is view serializable, if it is view equivalent to a serial schedule. Knowledge Gate Website On the basis of On the basis of SERIALIZABILITY RECOVERABILITY Conflict Recoverable serializable View Cascadeless serializable Strict Knowledge Gate Website NON- RECOVERABLE Vs RECOVERABLE SCHEDULE A schedule in which for each pair of transaction Ti and Tj , such that if Tj reads a data item previously written by Ti, then the commit or abort operation of Ti appears before Tj. Such a schedule is called Non- Recoverable schedule. A schedule in which for each pair of transaction Ti and Tj , such that if Tj reads a data item previously written by Ti, then the commit or abort of Ti must appear before Tj. Such a schedule is called Recoverable schedule. S S T1 T2 T1 T2 R(X) R(X) W(X) W(X) R(X) R(X) C C C C Knowledge Gate Website CASCADING ROLLBACK Vs CASCADELESS SCHEDULE It is a phenomenon, in which even if the schedule is recoverable, the a single transaction failure leads to a series of transaction rollbacks, is called cascading rollback. Cascading rollback is undesirable, since it leads to undoing of a significant amount of work. Uncommitted reads are not allowed in cascade less schedule. A schedule in which for each pair of transactions Ti and Tj, such that if Tj reads a data item previously written by Ti then the commit or abort of Ti must appear before read operation of Tj. Such a schedule is called cascade less schedule. S S T1 T2 T3 T1 T2 T3 R(X) R(X) W(X) W(X) R(X) C W(X) R(X) R(X) W(X) C C C R(X) C C Knowledge Gate Website Strict Schedule A schedule in which for each pair of transactions Ti and Tj, such that if Tj reads a data item previously written by Ti then the commit or abort of Ti must appear before read and write operation of Tj. S1 S2 S3 T1 T2 T1 T2 T1 T2 R(a) R(a) R(a) W(a) W(a) R(b) W(a) C W(a) C W(a) W(b) R(a) R(a) C C C R(a) C Knowledge Gate Website Knowledge Gate Website Log based Recovery The system log, or transaction log, records all the changes or activities happening in the database, ensuring that transactions are durable and can be recovered in the event of a failure. Different types of log records represent various stages or actions in a transaction. : This log entry indicates that the transaction Ti has started its execution. < Ti, Xj, V1, V2>: This log entry documents a write operation. It states that transaction Ti has changed the value of data item Xj from V1 to V2. < Ti commit>: This log entry marks the successful completion of transaction Ti. < Ti abort>: This log entry denotes that the transaction Ti has been aborted, either due to an error or a rollback operation. Before any write operation modifies the database, a log record of that operation needs to be created. This is to ensure that in case of a failure, the system can restore the database to a consistent state using the log records. Knowledge Gate Website Aspect Deferred Database Modification Immediate Database Modification Write Operation Timing Only at the commit point.

Use Quizgecko on...
Browser
Browser