Advanced Database Management Systems PDF

Advanced Database Management Systems Programs Offered n e i Post Graduate Programmes (PG) l Master of Business Administration Master of Computer Applications Advanced Database n Master of Commerce (Financial Management / Financial Technology) Management OSystems Master of Arts (Journalism and Mass Communication) Master of Arts (Economics) Master of Arts (Public Policy and Governance) Master of Social Work Master of Arts (English) Master of Science (Information Technology) (ODL) Master of Science (Environmental Science) (ODL) i t y Diploma Programmes Post Graduate Diploma (Management) rs e Post Graduate Diploma (Logistics) Post Graduate Diploma (Machine Learning and Artificial Intelligence) Post Graduate Diploma (Data Science) i v Undergraduate Programmes (UG) Bachelor of Business Administration Bachelor of Computer Applications Bachelor of Commerce Bachelor of Arts (Journalism and Mass Communication) U n English / Sociology) Bachelor of Social Work y Bachelor of Arts (General / Political Science / Economics / it Bachelor of Science (Information Technology) (ODL) A m c )DIRECTORATE OF Product code ( DISTANCE & ONLINE EDUCATION Amity Helpline: 1800-102-3434 (Toll-free), 0120-4614200 AMITY For Distance Learning Programmes: [email protected] | www.amity.edu/addoe DIRECTORATE OF For Online Learning programmes: [email protected] | www.amityonline.com DISTANCE & ONLINE EDUCATION e in nl O ty Advanced Database Management Systems r si ve ni U ity m )A (c e in © Amity University Press All Rights Reserved nl No parts of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise without the prior permission of the publisher. O SLM & Learning Resources Committee ty Chairman : Prof. Abhinash Kumar si Members : Dr. Divya Bansal Dr. Coral J Barboza Dr. Apurva Chauhan Dr. Monica Rose r ve Dr. Winnie Sharma Member Secretary : Ms. Rita Naskar ni U ity m )A (c Published by Amity University Press for exclusive use of Amity Directorate of Distance and Online Education, Amity University, Noida-201313 Contents e Page No. in Module I: Query Processing 01 1.1 Basic concepts of queryprocessing 1.2 Converting SQL queries into Relational Algebra nl 1.3 Basic Algorithms for executing query operations 1.4 Query tree and querygraph O 1.5 Using Heuristics in Query Optimization 1.6 Fnctional Dependency 1.7 Normaliozation ty Module II: Object Oriented and Extended Relational Database Technologies 38 2.1 Over viewObject-Oriented Database si 2.2 OOConcepts 2.3 Architecture of OODBMS 2.4 OODBMS Modeling 2.5 ORDBMS Modeling r ve 2.6 Oodb Query Language Module III: Parallel and Distributed Database 54 3.1 Introduction ni 3.2 Design of Parallel databases 3.3 Distributed databasesprinciples U 3.4 Architectures DDBMS 3.5 Fragmentation 3.6 Transparency ity 3.7 Transaction control in Distributed Database 3.8 Distributed Query Processing mainly works with Module IV: Databases on the Web and Semi Structured Data 80 m 4.1 Web interfaces 4.2 Overview of XML 4.3 XML Schema )A 4.4 QueryingXML 4.5 Storage of XML data 4.6 The semi structured datamodel (c 4.7 Issues on implementation 4.8 Indexes for textdata e Module V: Advance Transactions and Emerging trends 99 in 5.1 Multi Level Transaction 5.2 Long-livedtransactions(Saga) nl 5.3 Data Warehouse And Data Mining 5.4 Activedatabase 5.5 Spatial Database O 5.6 Deductive database 5.7 Multi Media Data Base ty si r ve ni U ity m )A (c Advanced Database Management Systems 1 Module - I: Query Processing Notes e Key Learning Objectives : in At the end of this module, you will lean: Basic Concept about Query Processing. nl How to Convert SQL Queries into Relational Algebra. Basic Algo Used for Query Processing O ty r si ve ni U ity m )A (c Amity Directorate of Distance & Online Education 2 Advanced Database Management Systems Unit - 1.1: Basic concepts of Query Processing Notes e Main aim of query processing is to transform the query written in high –level language (Structured Query Language) into low –level language (implementing in relational algebra) with efficient , correct and sequential manner for retrieving the data from the database as per the requirement. One of the most important aspect of query processing is query optimisation (select an efficient execution strategy for processing a nl query). Query processing mainly divided into four phase. Decomposition O Optimisation Code generation Execution. ty Query in high-level language si Query Decomposition System catalog r Relational Algebra ve Expression Compile time Query Optimization Database Statistics ni Execution plan Code U Generation Generated ity Code Query Run time Database execution on runtime m Query Output )A Fig 1.1.1 Query Processing Procedure Decomposition: That is the first phase of query processing, here the high level language into a relational algebra query and check whether the query syntactically and semantically correct or not. The main phase of the decomposition are (c Analysis ( Programming Compiler analysed the lexical and syntactical correctness of query) Amity Directorate of Distance & Online Education Advanced Database Management Systems 3 Normalisation (Convert query into normalised form for easy manipulation) Notes e Two Types ○○ Conjunctive Normal Form (connected with AND operator) in ○○ Disjunctive Normal form (Connected with OR operator) Semantic analysis (Reject incorrectly normalised query) nl Mainly by using two graphical techniques ○○ Relation connected graph (showing the incorrect query). ○○ Normalised attribute connection graph (showing query is contradictory) O Simplification ○○ Detect redundant qualifications ty ○○ Eliminate common sub expressions ○○ Transfer the query into semantically equivalent ○○ Access restriction si ○○ View Definitions ○○ Integrity Constraints Query Restructuring (Query is restructured for providing efficient implemantaion). r ve ni U ity m )A (c Amity Directorate of Distance & Online Education 4 Advanced Database Management Systems Unit - 1.2: Converting SQL Queries into Relational Notes e Algebra in Basic SQL Relational Algebra Operations nl Relational Algebra devided in various groups Unary Relational Operations ○○ SELECT (symbol: σ) O ○○ PROJECT (symbol: π) ○○ RENAME (symbol: ρ) Relational Algebra Operations From Set Theory ty ○○ UNION (υ) ○○ INTERSECTION (∩ ), si ○○ DIFFERENCE (-) ○○ CARTESIAN PRODUCT ( x ) Binary Relational Operations ○○ JOIN r ve ○○ DIVISION Let’s study them in detail with solutions: SELECT (σ) ni The SELECT operation is used for selecting a subset of the tuples according to a given selection condition. Sigma(σ)Symbol denotes it. It is used as an expression to U choose tuples which meet the selection condition. Select operator selects tuples that satisfy a given predicate. σp(r) ity σ is the predicate. r stands for relation which is the name of the table. p is prepositional logic. m Basic Relational Algebra Operations (Unary): Selection, Projection. Picks tuples from the relation )A Selection: s () Picks columns from the relation Projection: π () (c Relational Algebra Operations (Set): Union, Set Difference Union: () U () Amity Directorate of Distance & Online Education Advanced Database Management Systems 5 New relation contains all tuples from both relations, duplicate tuples eliminated. Notes e Set Difference: R – S in Produces a relation with tuples that are in R but NOT in S. Relational Algebra Operations (Set): Cartesian Product, Intersect nl Cartesian Product: R x S Produces a relation that is concatenation of every tuple of R with every tuple of S O The Above operations are the 5 fundamental operations of relational algebra. Intersection: R ∩ S All tuples that are in both R and S ty Relational Algebra Operations (Join): Theta Join, Natural Join Theta Join: R F S = s F (R x S) si Select all tuples from the Cartesian product of the two relations, matching condition F r When F contains only equality “=“, it is called Equijoin ve Natural Join: R S Equijoin with common attributes eliminated Relational Algebra Operations: ni Division Standard language for querying and manipulating data U Structured Query Language Relational algebra queries to SQL queries ity FROM clause produces Cartesian product (x) of listed tables WHERE clause assigns rows to C in sequence and produces table containing only rows satisfying condition ( sort of like s ) SELECT clause retains listed columns (P ) m Example Relational Algebra Translation to SQL Though it is tough to convert a high level language (SQL) into a low level relational )A algebra but I try here to define with logic and example of few. To overcome from ambiguities, for relation symbols in a SQL statement assigned a distinct name through the alias (SQL aliases are used to give a table, or a column in a table, a temporary name. Aliases are often used to make column names more readable. An alias only exists for the duration of the query.) mechanism of SQL. In other. (c The SQL statements in where a relation symbol occurs multiple times, for example, Amity Directorate of Distance & Online Education 6 Advanced Database Management Systems SELECT * FROM R, R is rewritten into a SQL statement of the form Notes e SELECT * FROM R, R R1 in Here every occurrence is given a distinct (alias) name. 1. Select-from-where statements without sub queries nl Consider a general SELECT-FROM-WHERE statement of the form SELECT Select-list O FROM R1,... , R2 T1,... WHERE (Where-condition) Science here query does not use sub queries in where-condition then it can be ty translate, into the relational algebra as follows: π Select-list σWhere-condition(R1 X ……..X ρT1(R2) _ _ _ _ ) si Note: Here an alias R2 T1 in the FROM-clause corresponds to a renaming ρT1(R2). If there is no WHERE clause then there no need to include the selection σ in the expression. r For omitting the projection (π) we obtain the translation of the following ve special case: SELECT * ni FROM R1,... , R2 T1,... WHERE Where-condition U E.g.: SQL SELECT-FROM-WHERE statement is SELECTCinema_Name ity FROMActors_In, Cinema_Actor WHEREActor_Name = name AND date of birth = 1980 Translating relational algebra like m πCinema_NameσActor_Name=name (Actors_In X Cinema_Actor): )A ^ date of birth =1980 Example with Sub queries : The SQL query is (c SELECT LastName, FirstName FROM EMPLOYEE Amity Directorate of Distance & Online Education Advanced Database Management Systems 7 WHERE Salary >( SELECTMAX (Salary) Notes e FROM EMPLOYEE in WHERE IdNo = 2); It can be split into two following Sub queries like nl SELECT LastName, FirstName FROM EMPLOYEE O WHERE Salary > 10000 Respective R.A expression like that πLastName, FirstName(σ Salary>10000(EMPLOYEE)) ty SELECT MAX (Salary) FROM EMPLOYEE si WHERE IdNo = 2 Respective R.A expression like that ℱMAX Salary (σIdNo = 2 (EMPLOYEE)) r ve Translating Joins (SELECT * FROM R R1) JOIN (SELECT * FROM R R1) ON R1.A = R2.B ni Ans: ρR1(R) R1.A= R2.B R2(R). Group and Having U SELECT Select-list FROM From-list ity WHERE Where-condition GROUP BY Group-list HAVINGHaving-condition m Ans: πA1;:::;An;Select-listσ Having-condition ϒ A1;:::;An;Group-list;Agg-list(E): )A E.g.: SELECT name, SUM(length) SQL FROM Cinema Exce, Cinema (c WHERE cert = SeniorProducer GROUP BY name Amity Directorate of Distance & Online Education 8 Advanced Database Management Systems HAVING MIN(year) 30000 AND SEX=F(EMPLOYEE) U (EX5): σESSN=123456789 AND PNO=12(WORKS_ON) ity For selection, use the following search methods S1.Brute force Search: Fetch each and every record and check whether it is reach the selection condition or not. S2 Binary search: If a key attribute involves an equality comparison on m the selection condition then used more efficient search binary search rather than linear(compare EX1). )A S3:Using hash key: nTo retrieve the record, If a key attribute involves an equality comparison on the selection condition(compare EX1). SP4: To fetch multiple records used a primary index: If the comparison condition are like>;_; ;_;(R) If < list of attribute > has a key in relation R and fetch all tuples from R for the si values attributes in < list of attribute >. If < list of attribute>have NOT include with a key of relation R and duplicated tuples must be delete from the results. Methods used to remove duplicate are r ve Sorting Hashing Algorithm for SET operations ni Set operations: UNION, U INTERSECTION, SET DIFFERENCE ity CARTESIAN PRODUCT CARTESIAN PRODUCT for relations R and S with allpossible combinations of records from R and S where The result’s attributes included all attributes of R and S. Cost analysis of CARTESIAN PRODUCT m If relation R with n records and j attributes and relation S with m records and k attributes, the result like n*m records and j+k attributes. )A UNION Sort the two relations depending on the same attributes. Sorted files are concurrently Scan and merge both. (c If the same tuple exists in both relations then only one is placed in the merged results. Amity Directorate of Distance & Online Education 16 Advanced Database Management Systems Intersection Notes e Sort the two relations depending on the same attributes. Sorted files are concurrently Scan and merge both. in If the same tuple exists in both relations then only both is placed in the merged results nl SET DIFFERENCE R-S Tuples which are appear inrelation R but not in relation S Kept in the merged results. O Implementing Aggregate Operationsand Outer Joins Implementing Aggregate Operations: ty Aggregate operators: MIN, MAX, SUM, COUNT and AVG Options to implement aggregate operators: si Table Scan Index Example r ve SELECT MAX (SALARY) FROM EMPLOYEE; ni If there have an ascending order index of SALARY for the employee relation, then optimizer can traversing the index for the largest value from the root to a leaf with right most pointer in each index node. U Implementing Aggregate Operations (contd.): SUM, COUNT and AVG For a dense index (every record has has single index entry): ity In the index Apply the associated computation to the values For a non-dense index. Total number of records associated with each index entry must be accounted. m With GROUP BY: Aggregate operator must be worked individually on each group of tuples.Useing sorting or hashing on the group attributespartitioned the file )A into the appropriate groups and computes the aggregate function for the tuples for each group. Outer Join Operators: LEFT OUTER JOIN (c RIGHT OUTER JOIN Amity Directorate of Distance & Online Education Advanced Database Management Systems 17 FULL OUTER JOIN. Notes e The full outer join’s result is equivalent to result of the union left and right outer joins. in Example: SELECT F_NAME, D_NAME nl FROM (EMPLOYEE LEFT OUTER JOIN DEPARTMENT ON D_NO = D_NUMBER); O Note: The result of this query is a table of employee names and their associated departments. It is similar to a regular join result, only exception is though any employee doesn’t have an associated department then the employee’s name still appear in the result but department name would be null. ty Modifying Join Algorithms: Nested Loop or Sort-Merge joins can be modified toimplement outer join. In left outer join, use the left relation (outer relation) and construct result and if it match then si concatenated tuple is saved in to the result. And if not then the tuple isstill included in the result with a null value. r Executing a combination of relational algebra operators. ve Implement the previous left outer join example (Compute the JOIN of the EMPLOYEE and DEPARTMENTtables) TEMP1←πF_NAME,D_NAME(EMPLOYEED_NO=D_NUMBER DEPARTMENT) ni (Find the EMPLOYEEs that do not appear in the JOIN) TEMP2 ←πF _NAME (EMPLOYEE) - πF_NAME (Temp1) U {Pad each tuple in TEMP2 with a null D_NAME field} TEMP2 ←TEMP2 x ‘null’ ity {UNION the temporary tables to produce the LEFT OUTER JOIN} RESULT ← TEMP1 υ TEMP2 The cost of the outer join, as computed above, would include the cost of the associated steps like join, projections and union. m )A (c Amity Directorate of Distance & Online Education 18 Advanced Database Management Systems Unit - 1.4: Query Tree and Querygraph: Notes e Query tree: in Query tree is a relational algebra expression based tree data structure where the input relations of the query as leaf nodes and the relational algebra operations as internal nodes. nl Execution of the query tree consists of executing an internal node operation whenever its operands are available and then replacing that internal node by the relation that results from executing the operation. O Query graph: A query graph is a relational calculus expression based data structure. Science here a single graph dedicated for each query so here no need to maintain any operation ty order. Relation nodes: Represent by single circles. si Constant nodes: Represent by double circles. Graph edges: Represent the selection and join conditions. Note: The retrieved attributes are displayed by square brackets above each relation. r ve Heuristic Optimisation of Query Trees: The query parser generate a standard query tree depend on the perticular SQL query without any optimization. ni The canonical query tree is generated by the following sequence. Firstly the CARTESIAN PRODUCT of particular relations defined by the FROM U clause. The selection and join conditions of the WHERE clause are applied. The projection are applied on the attributes of SELECT clause. ity The initial inefficient query tree convert in to a final executable efficient query tree by the heuristic query optimizer. Example: For every project located in ‘Stafford’, retrieve the project number, the controlling m department number and the department manager’s last name, address and birthdate. Relation algebra: )A pPNUMBER, DNUM, LNAME, ADDRESS, BDATE (((sPLOCATION=‘STAFFORD’( PROJECT)) DNUM=DNUMBER (DEPARTMENT)) MGRSSN=SSN (EMPLOYEE)) SQL query: (c Q2: SELECT P.NUMBER,P.DNUM,E.LNAME, E.ADDRESS, E.BDATE FROM PROJECT AS P,DEPARTMENT AS D, EMPLOYEE AS E Amity Directorate of Distance & Online Education Advanced Database Management Systems 19 WHERE P.DNUM=D.DNUMBER AND D.MGRSSN=E.SSN AND Notes e P.PLOCATION=‘STAFFORD’; The same query could correspond to many different relational algebra in expressions — and hence many different query trees. nl O ty Ex:Consider the SQL query below. si SELECT L_NAME FROM EMPLOYEE, WORKS ON, PROJECT r WHERE P_NAME=’Aquarius’ and P_NUMBER=P-NO and ESSN=SSN ve and B_DATE > ‘1984-12-31’; General transformation rules for relational Algebra operations are ni Cascadeσ: σ c1 and c2 and …and cn(R) =σc1(σc2(….. (σcn(R)) ….)) Commutativity ofσ: σc1(σc2(R)) =σc2(σc1(R)) Cascade of π: πList1(_List2(…. (πListn(R))….)) =πList1(R) U Commuting σ with π: If the selection condition c involves only those attributes A1;A2; : : : ;An in the projection list. πA1;A2…..An(σc(R)) =σc(πA1;A2…..An(R)) ity Commutativity of (and X): R c S = S.c R m R XS = S X R )A (c Amity Directorate of Distance & Online Education 20 Advanced Database Management Systems Unit -1.5: Using Heuristics in Query Optimisation Notes e Process for heuristics optimization in The parser of a high-level query generates an initial internal representation; Apply heuristics rules to optimise the internal representation. nl A query execution plan is generated to execute groups of operations based on the access paths available on the files involved in the query. The main heuristic is to apply first the operations that O reduce the size of intermediate results. E.g., Apply SELECT and PROJECT operations before ty applying the JOIN or other binary operations. Heuristic Optimisation of Query Trees: si The same query could correspond to many different relational algebra expressions — and hence many different query trees. The task of heuristic Optimisation of query trees is to find a final query tree that is efficient to execute. r ve Example: Q: SELECT LNAME FROM EMPLOYEE, WORKS_ON, PROJECT ni WHERE PNAME = ‘AQUARIUS’ AND PNMUBER=PNO AND ESSN=SSN U AND BDATE > ‘1957-12-31’; ity m )A (c Amity Directorate of Distance & Online Education Advanced Database Management Systems 21 Notes e in nl O ty r si ve ni U ity m )A (c Amity Directorate of Distance & Online Education 22 Advanced Database Management Systems Notes e in nl O ty All the above figure shows the above examples (Ref : fundamental of database system Elmsri&Navathe) r si ve ni U ity m )A (c Amity Directorate of Distance & Online Education Advanced Database Management Systems 23 Unit - 1.6 : Functional Dependency Notes e Functional dependencies (FDs) are used to specify formal measures of the “goodness” of relational designs. in FDs and keys are used to define normal forms for relations. FDs are constraints that are derived from the meaning and interrelationships of nl the data attributes. A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y. O X → Y holds if whenever two tuples have the same value for X, they must have the same value for Y If t1[X]=t2[X], then t1[Y]=t2[Y] in any relation instance r(R). ty X → Y in R specifies a constraint on all relation instances r(R). FDs are derived from the real-world constraints on the attributes. si Social Security Number determines employee name SSN → ENAME. r Project Number determines project name and location ve PNUMBER → {PNAME, PLOCATION}. Employee SSN and project number determines the hours per week that the employee works on the project {SSN, PNUMBER} → HOURS. ni An FD is a property of the attributes in the schema R. The constraint must hold on every relation instance r(R). U If K is a key of R, then K functionally determines all attributes in R (since we never have two distinct tuples with t1[K]=t2[K]). Inference Rules for FDs ity Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold Armstrong’s inference rules m ○○ A1. (Reflexive) If Y subset-of X, then X à Y ○○ A2. (Augmentation) If X à Y, then XZ à YZ (Notation: XZ stands for X U Z) )A ○○ A3. (Transitive) If X à Y and Y à Z, then X à Z A1, A2, A3 form a sound and complete set of inference rules Additional Useful Inference Rules (c Decomposition If X → YZ, then X → Y and X → Z Amity Directorate of Distance & Online Education 24 Advanced Database Management Systems Union Notes e If X → Y and X → Z, then X → YZ in Psuedotransitivity If X → Y and WY → Z, then WX v Z nl Closure of a set F of FDs is the set F+ of all FDs that can be inferred from F If we take an example like that O Student(sid,sname,address) Course(cid,cname) Enrolled(sid,cid,grade) ty Then the several functional dependency work like that We can say that sid determines address – We’ll write this si sid → address This is called a functional dependency (FD) r (Note: An FD is just another integrity constraint) ve We’d expect the following functional dependencies to hold in our Student database – sid → sname,address ni – cid → cname – sid, cid → grade U A functional dependency X → Y is simply a pair of sets (of field names) – Note: the sloppy notation A,B → C,D rather than {A,B} → {C,D} ity Formalities Given a relation R=R(A1 :τ1 , …, An :τ n ), m and X, Y (⊆{A1 , …, An }), an instance r of R satisfies X→Y, if – For any two tuples t1 , t2 in R, if t1.X=t2.X then t1.Y=t2.Y Note: This is a semantic assertion. We can not look at an instance to determine which )A FDs hold (although we can tell if the instance does not satisfy an FD!) Assume that X → Y and Y → Z are known to hold in R. It’s clear that X → Z holds too. We shall say that an FD set F logically implies X → Y, and write F [X → Y – e.g. {X (c → Y, Y → Z} [ X → Z The closure of F is the set of all FDs logically implied by F, i.e. F + @ {X→Y | F [ Amity Directorate of Distance & Online Education Advanced Database Management Systems 25 X→Y} Notes e The set F+ can be big, even if F is small. in Candidate keys and FDs If R=R(A1 :τ1 , …, An :τ n ) with FDs F and X⊆ {A1 , …, An }, then X is a candidate key for R if – X → A1 , …,An ∈ F+ – For no proper subset Y⊆X is Y → A1 , …,An nl ∈ F+. Armstrong’s axioms O Reflexivity: If Y⊆X then F \ X→Y – (This is called a trivial dependency) – Example: sname,address→address Augmentation: If F \ X→Y then F \ X,W→Y,W – Example: As cid→cname then cid,sid→cname,sid ty Transitivity: If F \ X→Y and F \ Y→Z then F \ X→Z – Example: As sid,cid→cid and cid→cname, then sid,cid→cname si Consequences of Armstrong’s axioms: Union: If F \ X→Y and F \ X→Z then F \ X→Y,Z Pseudo-transitivity: If F \ X→Y and F \ r W,Y→Z then F \ X,W→Z Decomposition: If F \ X→Y and Z⊆Y then F \ X→Z ve Equivalence: Two sets of FDs, F and G, are said to be equivalent if F+ =G+ For example: ni {(A,B→C), (A→B)} and {(A→C), (A→B)} are equivalent U F + can be huge – we’d prefer to look for small equivalent FD sets. Minimal cover: ity An FD set, F, is said to be minimal if Every FD in F is of the form X→A, where A is a single attribute For no X→A in F is F-{X→A} equivalent to F m For no X→A in F and Z⊂X is (F-{X→A})∪{Z→A} equivalent to F For example, {(A→C), (A→B)} is a minimal cover for {(A,B→C), (A→B)} )A FACT: If F is an FD set, and X→Y∉F + then there exists an attribute A∈Y such that X→A∉F + Why Armstrong’s axioms? (c Soundness – If F \ X→Y is deduced using the rules, then X→Y is true in any relation in which the dependencies of F are true Amity Directorate of Distance & Online Education 26 Advanced Database Management Systems Completeness – If X→Y is is true in any relation in which the dependencies of F are true, Notes e then F \ X→Y can be deduced using the rules. Soundness Consider the Augmentation rule: – We have X→Y, i.e. if t1.X=t2.X then in t1.Y=t2.Y – If in addition t1.W=t2.W then it is clear that t1.(Y,W)=t2.(Y,W) Consider the Transitivity rule: – We have X→Y, i.e. if t1.X=t2.X then t1.Y=t2.Y (*) – We have Y→Z, i.e. if t1.Y=t2.Y then t1.Z=t2.Z (**) – Take two tuples s1 and s2 such that s1 nl.X=s2.X then from (*) s1.Y=s2.Y and then from (**) s1.Z=s2.Z O ty r si ve ni U ity m )A (c Amity Directorate of Distance & Online Education Advanced Database Management Systems 27 Unit - 1.7: Normalisation Notes e Normalisation: Process of decomposing unsatisfactory “bad” relations by breaking up their attributes into smaller relations in Normal form: Condition using keys and FDs of a relation to certify whether a relation schema is in a particular normal form nl 2NF, 3NF, BCNF based on keys and FDs of a relation schema 4NF based on keys, multi-valued dependencies O First Normal Form: Disallows composite attributes, multivalued attributes, and nested relations; attributes whose values for an individual tuple are non-atomic. ty Considered to be part of the definition of relation r si ve ni U ity m )A Second Normal Form: Uses the concepts of FDs, primary key Definitions: Prime attribute - attribute that is member of the primary key K (c Full functional dependency - a FD Y → Z where removal of any attribute from Y means the FD does not hold any more. Amity Directorate of Distance & Online Education 28 Advanced Database Management Systems Examples Notes e Second Normal Form: {SSN, PNUMBER} → HOURS is a full FD since neither SSN → HOURS nor in PNUMBER → HOURS hold {SSN, PNUMBER} → ENAME is not a full FD (it is called a partial dependency ) nl since SSN → ENAME also holds A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key O R can be decomposed into 2NF relations via the process of 2NF Normalisation ty r si ve ni U ity m Third Normal Form: )A Definition Transitive functional dependency – if there a set of attributes Z that are neither a primary or candidate key and both X à Z and Y à Z holds. (c Examples: SSN → DMGRSSN is a transitive FD since Amity Directorate of Distance & Online Education Advanced Database Management Systems 29 SSN v DNUMBER and DNUMBER → DMGRSSN hold Notes e SSN → ENAME is non-transitive since there is no set of attributes X where SSN → X and X → ENAME in A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key. nl BCNF (Boyce-Codd Normal Form): A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever an FD X à A holds in R, then X is a superkey of R O ○○ Each normal form is strictly stronger than the previous one: Every 2NF relation is in 1NF Every 3NF relation is in 2NF ty Every BCNF relation is in 3NF ○○ There exist relations that are in 3NF but not in BCNF si ○○ The goal is to have each relation in BCNF (or 3NF) r ve ni U ity m )A {Student,course} à Instructor Instructor à Course (c Decomposing into 2 schemas Amity Directorate of Distance & Online Education 30 Advanced Database Management Systems ○○ {Student,Instructor} {Student,Course} Notes e ○○ {Course,Instructor} {Student,Course} {Course,Instructor} {Instructor,Student in Example: Given the relation nl Book(Book_title, Authorname, Book_type, Listprice, Author_affil, Publisher) The FDs are O Book_title → Publisher, Book_type Book_type → Listprice ty Authorname → Author_affil Fourth normal form (4NF): si Fourth normal form (4NF) is a level of database Normalisation where there are no non-trivial multivalued dependencies other than a candidate key. It builds on the first three normal forms (1NF, 2NF and 3NF) and the Boyce-Codd Normal Form (BCNF). r It states that, in addition to a database meeting the requirements of BCNF, it must not contain more than one multivalued dependency. ve Properties – A relation R is in 4NF if and only if the following conditions are satisfied: It should be in the Boyce-Codd Normal Form (BCNF). ni the table should not have any Multi-valued Dependency. A table with a multivalued dependency violates the Normalisation standard of U Fourth Normal Form (4NK) because it creates unnecessary redundancies and can contribute to inconsistent data. To bring this up to 4NF, it is necessary to break this information into two tables. Example – Consider the database table of a class whaich has two relations R1 ity contains student ID(SID) and student name (SNAME) and R2 contains course id(CID) and course name (CNAME). SID SNAME m S1 A S2 B )A CID CNAME C1 C C2 D (c When there cross product is done it resulted in multivalued dependencies: Amity Directorate of Distance & Online Education Advanced Database Management Systems 31 SID SNAME CID CNAME Notes e S1 A C1 C S1 A C2 D in S2 B C1 C S2 B C2 D nl Multivalued dependencies (MVD) are: O SID->->CID; SID->->CNAME; SNAME->->CNAME Joint dependency – Join decomposition is a further generalisation of Multivalued dependencies. If the join of R1 and R2 over C is equal to relation R. then we can say ty that a join dependency (JD) exists, where R1 and R2 are the decomposition R1(A, B, C) and R2(C, D) of a given relations R (A, B, C, D). Alternatively, R1 and R2 are a lossless decomposition of R. A JD {R1, R2, …, Rn} is said to hold over a relation R if si R1, R2, ….., Rn is a lossless-join decomposition. The *(A, B, C, D), (C, D) will be a JD of R if the join of join’s attribute is equal to the relation R. Here, *(R1, R2, R3) is used to indicate that relation R1, R2, R3 and so on are a JD of R. R1 r ve COMPANY PRODUCT C1 CD C1 DVD ni C2 HD C2 HD U R2 AGENT COMPANY ity ARKA C1 ARKA C2 RAJA C1 m R3 AGENT PRODUCT )A ARKA CD ARKA DVD ARKA HD RAJA HD (c Table – R1 R2 R3 Amity Directorate of Distance & Online Education 32 Advanced Database Management Systems COMPANY PRODUCT AGENT Notes e C1 CD ARKA C1 DVD ARKA in C2 HD HD C1 HD ARKA nl Fifth Normal Form / Projected Normal Form (5NF): A relation R is in 5NF if and only if every join dependency in R is implied by the O candidate keys of R. A relation decomposed into two relations must have loss-less join Property, which ensures that no spurious or extra tuples are generated, when relations are reunited through a natural join. Properties – A relation R is in 5NF if and only if it satisfies following conditions: ty R should be already in 4NF. It cannot be further non loss decomposed (join dependency) si Example – Consider the above schema, with a case as “if a company makes a product and an agent is an agent for that company, then he always sells that product for the company”. Under these circumstances, the ACP table is shown as: r ve AGENT COMPANY PRODUCT A1 PQR Nut A1 PQR Bolt ni A1 XYZ Nut A1 XYZ Bolt A2 PQR Nut U The relation ACP is again decompose into 3 relations. Now, the natural Join of all the three relations will be shown as: ity R1 AGENT COMPANY A1 PQR m A1 XYZ A2 PQR )A R2 AGENT PRODUCT A1 Nut A1 Bolt (c A2 Nut Amity Directorate of Distance & Online Education Advanced Database Management Systems 33 R3 Notes e COMPANY PRODUCT in PQR Nut PQR Bolt XYZ Nut nl XYZ Bolt Result of Natural Join of R1 and R3 over ‘Company’ and then Natural Join of R13 O and R2 over ‘Agent’ and ‘Product’ will be table ACP. Hence, in this example, all the redundancies are eliminated, and the decomposition of ACP is a lossless join decomposition. Therefore, the relation is in 5NF as it does not violate the property of lossless join. ty Module Out comes: si From the above, we learn Query processing technique Heuristic operation of Query processing Several Algorithm Used in Query processing r ve Self Assessment ni Try to draw the quer graph of the given 5 questions above. What are the main algorithm used in query processing. U Exercise Fill in the blanks through the right answers. 1. Query tree is a relational algebra expression based ---------- data structure. ity a) Tree b) Graph c) Heuristic m d) Hierarchical 2. Nested Loop or Sort-Merge joins can be modified to implement ………….. join. )A a) Outer b) Inner c) Theta (c d) Gama 3. ……………..operator must be worked individually on each group of tuples. Amity Directorate of Distance & Online Education 34 Advanced Database Management Systems a) Aggregate Notes e b) Join c) Union in d) Add 4. External sorting algorithms are applicable for …………….records files nl a) large b) small O c) text d) medium 5. External sorting algorithms is not stored in…………………. ty a) main memory b) external memory si c) data structure d) flash disk r Identify the following statement are true or false ve 1. High –level language is structured query Language. 2. Query processing mainly divided into 3 phase 3. Conjunctive Normal Form connected with OR operator ni 4. SQL statement assigned a distinct name through the alias 5. DBMS import an execution strategy for retrieving the data from the database files. U Answer Keys Question Answer Question Answer Question Answer 1 a 2 a 3 a ity 4 a 5 a Question Answer Question Answer Question Answer m 1 T 2 F 3 F 4 T 5 T )A Case study of Query Processing: Let us discuss the whole process with an example. Let us consider the following two relations as the example tables for our discussion; Employee(Eno, Ename, Phone) (c Proj_Assigned(Eno, Proj_No, Role, DOP) Amity Directorate of Distance & Online Education Advanced Database Management Systems 35 where, Notes e Eno is Employee number, Ename is Employee name, in Proj_No is Project Number in which an employee is assigned, Role is the role of an employee in a project, nl DOP is duration of the project in months. With this information, let us write a query to find the list of all employees who are O working in a project which is more than 10 months old. SELECT Ename FROM Employee, Proj_Assigned ty WHERE Employee.Eno = Proj_Assigned.Eno AND DOP >10; Input: si A query written in SQL is given as input to the query processor. For our case, let us consider the SQL query written above. Step 1: Parsing r ve In this step, the parser of the query processor module checks the syntax of the query, the user’s privileges to execute the query, the table names and attribute names, etc. The correct table names, attribute names and the privilege of the users can be taken from the system catalog (data dictionary). ni Step 2: Translation If we have written a valid query, then it is converted from high level language SQL to low level instruction in Relational Algebra. U For example, our SQL query can be converted into a Relational Algebra equivalent as follows; ity πEname(σDOP>10 Λ Employee.Eno=Proj_Assigned.Eno(Employee X Prof_ Assigned)) Step 3: Optimiser Optimiser uses the statistical data stored as part of data dictionary. The statistical m data are information about the size of the table, the length of records, the indexes created on the table, etc. Optimiser also checks for the conditions and conditional attributes which are parts of the query. )A Step 4: Execution Plan A query can be expressed in many ways. The query processor module, at this stage, using the information collected in step 3 to find different relational algebra expressions that are equivalent and return the result of the one which we have written (c already. Amity Directorate of Distance & Online Education 36 Advanced Database Management Systems For our example, the query written in Relational algebra can also be written as the Notes e one given below; πEname(Employee Eno (σDOP>10 (Prof_Assigned))) in So far, we have got two execution plans. Only condition is that both plans should give the same result. nl Step 5: Evaluation Though we got many execution plans constructed through statistical data, though they return same result (obvious), they differ in terms of Time consumption to execute O the query, or the Space required executing the query. Hence, it is mandatory to choose one plan which obviously consumes less cost. At this stage, we choose one execution plan of the several we have developed. ty This Execution plan accesses data from the database to give the final result. In our example, the second plan may be good. In the first plan, we join two relations (costly operation) then apply the condition (conditions are considered as si filters) on the joined relation. This consumes more time as well as space. In the second plan, we filter one of the tables (Proj_Assigned) and the result is joined with the Employee table. This join may need to compare less number of records. r Hence, the second plan is the best (with the information known, not always). ve Output: The final result is shown to the user. The overall information discussed above are depicted in Figure 2 below; ni U ity m )A (c Amity Directorate of Distance & Online Education Advanced Database Management Systems 37 Notes e in nl O ty r si ve ni U ity m )A (c Amity Directorate of Distance & Online Education 38 Advanced Database Management Systems Module - II - Object Oriented and Extended Relational Notes e Database Technologies in Key Learning Objectives: At the end of this module, you will learn: nl Architecture of OODBMS Architecture and modelling about ORDBMS O What are Specialization and Generalisation What are Aggregation and Associations What are the ObjectQueryLanguage ty Basic Concept of oops: Object oriented programming is a type of programming which uses objects and classes its functioning. The object oriented programming is based on real world entities si like inheritance, polymorphism, data hiding, etc. It aims at binding together data and function work on these data sets into a single entity to restrict their usage. r Some basic concepts of object oriented programming are − ve CLASS OBJECTS ENCAPSULATION ni POLYMORPHISM INHERITANCE ABSTRACTION U Class : A class is a data-type that has its own members i.e. data members and member functions. It is the blueprint for an object in object oriented programming language. It is the basic building block of object oriented programming in c++. The ity members of a class are accessed in programming language by creating an instance of the class. Some important properties of class are − Class is a user-defined data-type. m A class contains members like data members and member functions. Data members are variables of the class. )A Member functions are the methods that are used to manipulate data members. Data members define the properties of the class whereas the member functions define the behaviour of the class. A class can have multiple objects which have properties and behaviour that in (c common for all of them. Amity Directorate of Distance & Online Education Advanced Database Management Systems 39 Object : An object is an instance of a class. It is an entity with characteristics and Notes e behaviour that are used in the object oriented programming. An object is the entity that is created to allocate memory. A class when defined does not have memory chunk itself which will be allocated as soon as objects are created. in Syntax class_name object_name; nl Encapsulation: In object oriented programming, encapsulation is the concept of wrapping together of data and information in a single unit. A formal definition of encapsulation would be: encapsulation is binding togather the data and related function O that can manipulate the data. Let’s understand the topic with an easy real life example, ty In our colleges, we have departments for each course like computer science, information tech. , electronics, etc. each of these departments have their own students and subjects that are kept track of and being taught. Let’s think of each department as a class that encapsulates the data about students of that department and the subjects si that are to be taught. Also a department has some fixed rules and guidelines that are to be followed by the students that course like timings, methods used while learning, etc. this is encapsulation in real life, there are data and there are ways to manipulate data. r Due to the concept of encapsulation in object oriented programming another ve very important concept is possible, it is data abstraction or Data Hiding. It is possible as encapsulating hides the data at show only the information that is required to be displayed. ni Polymorphism: The name defines polymorphism is multiple forms. which means polymorphism is the ability of object oriented programming to do some work using multiple forms. The behaviour of the method is dependent on the type or the situation in which the method is called. U Let’s take a real life example, A person can have more than one behaviour depending upon the situation. like a woman a mother, manager and a daughter. And, this defines her behaviour. This is from where the concept of polymorphism came from. ity In c++ programming language, polymorphism is achieved using two ways. They are operator overloading and function overloading. Operator overloading: In operator overloading and operator can have multiple m behaviour in different instances of usage. Function overloading: Functions with the same name that can do multiple types based on some condition. )A Inheritance: It is the capability of a class to inherit or derive properties or characteristics other class. It is very important and object oriented program as it allows reusability i.e. using a method defined in another class by using inheritance. The class that derives properties from other class is known as child class or subclass and the class from which the properties are inherited is base class or parent class. (c Several types of inheritances are Amity Directorate of Distance & Online Education 40 Advanced Database Management Systems Single inheritance Notes e Multiple inheritance Multi level inheritance in Hierarchical inheritance Hybrid inheritance nl Abstraction: Data abstraction or Data Hiding is the concept of hiding data and showing only relevant data to the final user. It is also an important part object oriented programing. O Let’s take real life example to understand concept better, when we ride a bike we only know that pressing the brake will stop the bike and rotating the throttle will accelerate but you don’t know how it works and it is also not think we should know that’s why this is done from the same as a concept data abstraction. ty In C plus plus programming language, write two ways using which we can accomplish data abstraction − using class si using header file Object-Oriented DBMS: r Object-Oriented Database Management Systems (OODBMS) are an extension of ve OO programming language techniques into the field of persiste

Advanced Database Management Systems PDF

Document Details

Tags

Related

Summary

Full Transcript