Summary

These notes provide an overview of data models, including hierarchical, network, entity-relationship, object-based, and relational models. They discuss the concept of data models and their importance in management information systems. The notes also explain logical and physical database design, along with data independence and schema architecture.

Full Transcript

Data Models Hierarchical Network E-R Relational Object-based Data Data Data Data Data Model Model Model Model Model Imp Points The concept of Data Modeling started becoming impor...

Data Models Hierarchical Network E-R Relational Object-based Data Data Data Data Data Model Model Model Model Model Imp Points The concept of Data Modeling started becoming important in the 1960s, as management information systems (MIS) became popular. The first two are “the hierarchical data model” and “the network data model.” which came in 1960s The first true commercial database system came available in 1964, was called the Integrated Data Store (IDS), and was developed by Charles Bachman, with General Electric based upon Network Model. IBM chose to focus on hierarchical models, designed for their Information Management System (IMS). In 1970, Edgar F. Codd’s suggested that all data within a database could be displayed as tables using columns and rows, which would be called “relations.” These “relations” would be accessible using a non-procedural, or declarative, language. This idea led to much higher productivity. It was faster, more efficient, and prompted IBM to create SQL. (Originally called SEQUEL or Structured English Query Language). Hierarchical database model In this model the data are organized into a tree-like structure. The data are stored as records which are connected to one another through links. A record is a collection of fields, with each field containing only one value. It mandates that each child record has only one parent, whereas each parent record can have one or more child records. In order to retrieve data from a hierarchical database, the whole tree needs to be traversed starting from the root node. This model is recognized as the first database model created by IBM in the 1960s Examples of hierarchical data represented as relational tables Employee Table Computer Table Emp First Last Name Dept. Serial Num Type User EmpNo No Name Num 100 Almukhtar Khan 10-L 3009734-4 Computer 100 101 Gaurav Soni 10-L 3-23-283742 Monitor 100 102 Siddhartha Soni 20-B 2-22-723423 Monitor 100 103 Siddhant Soni 20-B 232342 Printer 100 Network model While the hierarchical database model structures data as a tree of records, with each record having one parent record and many children, the network model allows each record to have multiple parent and child records, forming a generalized graph structure. The network model was adopted by the CODASYL Data Base Task Group in 1969. It is sometimes known as the CODASYL model for this reason. Entity-Relationship Data Model An ER model is the logical representation of data as objects and relationships among them. These objects are known as entities, and relationship is an association among these entities. This model was designed by Peter Chen and published in 1976 papers. Object-based Data Model: An extension of the ER model with notions of methods, functions, class, encapsulation, and object identity, as well. This model supports a rich type system that includes structured and collection types. Relational Data Model This type of model designs the data in the form of rows and columns within a table. Thus, a relational model uses tables for representing data and in-between relationships. Tables are also called relations. This model was initially described by Edgar F. Codd, in 1969. The relational data model is the widely used model which is primarily used by commercial data processing applications. C_Id Name Address Salary Department 1 Varun Rajpura 80000 CSE 2 Amrit Jaipur 50000 Academic 3 Ravi Nagpur 70000 CSE 4 Ranjeet Raipur 65000 Academic 5 Aman Puri 55000 Support 6 Jatin Delhi 60000 IT Database design It is the organization of data according to a database model. The designer determines what data must be stored and how the data elements interrelate. With this information, they can begin to fit the data to the database model. Logical Designing It deals with relationships and dependencies amongst the various pieces of information have been determined. A logical structure then mapped into the storage objects supported by the DBMS. In the case of relational databases the storage objects are tables which store data in rows and columns. Physical design It specifies the physical configuration of the database on the storage media. Here we deal with many aspects like: Security, Replication, High-availability, Partitioning, Backup & restore schemes. Data Abstraction → Data Independence Database Abstraction /Database Abstraction Layers It is an application programming interface which unifies the communication between a computer application and databases such as SQL Server, IBM Db2, MySQL, PostgreSQL, Oracle or SQLite. Data Abstraction is a process of hiding unwanted or irrelevant details from the end user. It provides a different view and helps in achieving data independence which is used to enhance the security of data. Schema Architecture External Level Logical Data Independence Conceptual Level Physical Data Independence Physical Level Data Store Physical or Internal Level -> It is the lowest level of abstraction for DBMS which defines how the data is actually stored, it defines data-structures to store data and access methods used by the database. Logical or Conceptual Level -> It is the intermediate level or next higher level. It describes what data is stored in the database and what relationship exists among those data. View or External Level -> It is the highest level. In view level, there are different levels of views and every view only defines a part of the entire data. It also simplifies interaction with the user and it provides many views or multiple views of the same database. Data independence It is the ability to modify the scheme without affecting the programs and the application to be rewritten. Data is separated from the programs, so that the changes made to the data will not affect the program execution and the application. There are two levels of data independence based on three levels of abstraction. These are as follows − Physical Data Independence Logical Data Independence Schema Schema can be defined as the design of a database. The overall description of the database is called the database schema. You can relate it as something like Functions, Comments, Preprocessor, statements, types and variables in programming languages. Subschema is subset of schema which allows the user to view only their authorized part. Types 1. Physical Schema: The design of a database at physical level is called physical schema, how the data stored in blocks of storage is described at this level. 2. Logical schema: Logical schema can be defined as the design of database at logical level. In this level, the programmers as well as the database administrator (DBA) work. 3. View Schema: View schema can be defined as the design of database at view level which generally describes end-user interaction with database systems. What is an Instance? Databases change over time as information is inserted and deleted. The collection of information stored in the database at a particular moment is called an instance. Data Independence/Transparency There are two types of data independence: physical and logical data independence. Physical data independence is the ability to modify the physical schema without causing application programs to be rewritten. Logical data independence is the ability to modify the logical schema without causing application programs to be rewritten. Logical data independence is more difficult to achieve than physical data independence. Relationship 1 to 1 1 to many Many to 1 Many to many (M-N) Total participation vs Partial Participation L_no. P_Name Type P_Id 1 1 Person Has License DOB Address Expiry_date F_Id M_Name F_Name P_Id 1 1 Male Friendship Female DOB Address DOB Address C_Name O_Id Quantity C_Id 1 N Customer Places Order Price Mobile_no. Address S_Name C_Id S_Id C_Name M N Student Enrolls Course Mobile_no. Address Batch Time Weak Entity Set As the weak entities do not have any primary key, They cannot be identified on their own, so they depend on some other entity (known as owner entity). The weak entities have total participation constraint (existence dependency) in its identifying relationship with owner identity. Weak entity types have partial keys. Partial Keys are set of attributes with the help of which the tuples of the weak entities can be distinguished and identified. Weak Entity Set Weak entities are represented with double rectangular box in the ER Diagram the identifying relationships are represented with double diamond. Partial Key attributes are represented with dotted lines. E_Id Name Amount Name Age Employee Has Dependent Address Q1. The ability to change the conceptual schema without affecting the external schemas or application programs is known a) Program Data Independence b) Physical Data Independence c) Logical Data Independence d) Data Abstraction Q2. The information related to data in a database is refer as a) Knowledge b) Wisdom c) Big Data d) Metadata Q3. In a relational model, relations are termed as____________ a) Tuples b) Attributes c) Table d) Rows Q4. Consider the following ER diagram The minimum number of tables needed to represent M, N, P, R1, R2 is a) 2 b)3 c) 4 d)5 Q5. What is the minimum number of tables needed for the above ER diagram? a) 3 b) 4 c) 5 d) 6 Q6. After Minimizing this ER diagram we will convert it into relational model. As we know ‘Places’ & ‘Order’ will be combined so what will be the correct degree of ‘PlacesOrder’ Table. a) 2 b) 3 c) 4 d) 5 Q7. After Minimizing this ER diagram we will convert it into relational model. How many attributes will be created in ‘Includes’ Table? a) 1 b) 2 c) 3 d) 4 Q8. Which symbol denote derived attributes in ER Model? (a) Double ellipse (b) Dashed ellipse (c) Squared ellipse (d) Ellipse with attribute name underlined Q9. Consider the following Relationship Entity Diagram (ERD) Qualification Date M N Person Qualification Exam Name NID ExamID ExamName Which of the following possible relations will not hold if the above ERD is maped into a relation model? (a) Person (NID, Name) (b) Qualification (NID, ExamID, QualifiedDate) (c) Exam (ExamID, NID, ExamName) (d) Exam (ExamID, ExamName) Q10. For a weak entity set to be meaningful, it must be associated with another entity set in combination with some of their attribute values, is called as: (a)Neighbour Set (b)Strong Entity Set (c)Owner Entity Set (d)Weak Set Q11. Which one of the following pairs is correctly matched in the context of database design? List — I List — II I. Specialization A. Result of taking the union of two more disjoint (lower level) entity sets to produce a higher-level entity set. II. Generalization B. Express the number of entities to which another entity can be associated via a relationship set III. Aggregation C. Result of taking a subset of a higher-level entity set to form a lower-level entity set. IV. Mapping cardinalities D. An abstraction in which relationship sets (along with their associated entity sets) are treated as higher-level entity sets and can participate in relationships. (a) I-D, II-A , III-B, IV-C (b) I-D, II-C, III-B, IV-A (c) I-C, II-D, III-A, IV-B (d) I-C, II-A, III-D, IV-B Answers 1) C 2) D 3) C 4) B 5) B 6) B 7) C 8) B 9) C 10) C 11) D SQL Introduction SQL stands for Structured Query Language Declarative or Informal Not case sensitive Types of Commands DDL CREATE TABLE EMPLOYEE(Name VARCHAR2(20), Email VARCHAR2(100), DOB DATE); DROP TABLE EMPLOYEE; Alter for Add column, remove column, change datatype, column name, change datatype length ALTER TABLE table_name ADD column_name COLUMN-definition; TRUNCATE TABLE table_name; Rename old to new name DML INSERT INTO TABLE_NAME (value1, value2, value3,.... valueN); UPDATE table_name SET [column_name1= value1,...column_nameN = valueN] [WHERE CONDITION] DELETE FROM table_name [WHERE condition]; DCL GRANT SELECT, UPDATE ON MY_TABLE TO SOME_USER, ANOTHER_US ER; REVOKE SELECT, UPDATE ON MY_TABLE FROM USER1, USER2; Transaction Control Language COMMIT; ROLLBACK; SAVEPOINT SAVEPOINT_NAME; DQL SELECT expressions FROM TABLES WHERE conditions; SELECT Distinct Order by The ORDER BY keyword is used to sort the result-set in ascending or descending order. By default ascending SELECT * FROM employee ORDER BY Salary DESC; Examples Employee table ID NAME SALARY DEPARTMENT 1 A 10000 IT 2 B 20000 HR 3 C 30000 IT 4 A 40000 SALES 5 D 50000 IT Aggregate Functions Sum Avg Count Max Min Group by & Having The GROUP BY statement groups rows that have the same values into summary rows, like "find the number of customers in each country". The GROUP BY statement is often used with aggregate functions (COUNT, MAX, MIN, SUM, AVG) to group the result-set by one or more columns. The HAVING clause was added to SQL because the WHERE keyword could not be used with aggregate functions. Operators AND, OR, NOT SELECT * FROM EMPLOYEE WHERE DEPARTMENT=‘IT’ OR DEPARTMENT=‘HR’; SELECT * FROM EMPLOYEE WHERE DEPARTMENT=‘IT AND NAME=‘A’; SELECT * FROM EMPLOYEE WHERE NOT DEPARTMENT=‘IT’; >, Y is a trivial functional dependency if Y is a subset of X For example X ->X, XY->X, XY ->Y Qus L.H.S intersect R.H.S = ?? Non-trivial functional dependency** X→ Y has a non-trivial functional dependency if B is not a subset of A. Rules of Functional Dependencies X->Y Reflexive rule – If X is a set of attributes and Y is subset of X, then X holds a value of Y. Augmentation rule: When X -> Y holds, and c is attribute set, then Xc -> Yc also holds. That is adding attributes which do not change the basic dependencies. Transitivity rule: This rule is very much similar to the transitive rule in algebra if x -> y holds and y -> z holds, then x -> z also holds. X -> y is called as functionally that determines y. Rules of Functional Dependencies X->Y Union: If X → Y and X → Z then X → YZ Decomposition: If X → YZ then X → Y and X → Z Qus. If XY->Z can I write X->Z & Y->Z Normalization Normalization is the process of organizing the data in the database. Normalization is used to minimize the redundancy from a relation or set of relations. It is also used to eliminate the undesirable characteristics like Insertion, Update and Deletion Anomalies. Normalization divides the larger table into the smaller table and links them using relationship. The normal form is used to reduce redundancy from the database table. First Normal Form (1NF) Each attribute of a relation contains only an atomic value. No Multivalued attribute only single valued Student_id Name Subject 101 AK Computer Network, JAVA 102 VK DBMS, C++, JAVA Software Engineering, 103 Amrita Compiler Design Second Normal Form (2NF) A relation must be in 1NF and No partial dependency should exist in the relation. Partial Dependency: If a non-prime attribute can be determined by the part of the candidate key in a relation, it is known as a partial dependency. if L.H.S is the proper subset of a candidate key and R.H.S is the non- prime attribute, then it shows a partial dependency. Partial Dependency & Prime, Non-Prime attributes Partial Dependency: If a non-prime attribute can be determined by the part of the candidate key in a relation, it is known as a partial dependency. if L.H.S is the proper subset of a candidate key and R.H.S is the non- prime attribute, then it shows a partial dependency. R(ABCD) && FD= AB->C, C->D, B->D && Candidate key=AB Third Normal Form (3NF) A relation must be in second normal form (2NF) And there should be no transitive functional dependency exists for non-prime attributes in a relation. Rollno State City 1 Punjab Chandigarh 2 Haryana Ambala 3 Punjab Chandigarh 4 Haryana Ambala 5 Uttar Pradesh Ghaziabad Check whether a table in 3rd NF or Not In all FDs X->Y X is a super key or candidate key And Y is a prime attribute, i.e., Y is a part of candidate key. Boyce-Codd Normal Form (BCNF) A relation is in 3NF And for every functional dependency, X → Y, L.H.S of the every functional dependency (X) be the super key of the table. R(ABCD) && FD: A->B, B->C, C->D, D->A Fourth Normal Form A relation is in BCNF. And, there is no multivalued dependency exists in the relation. Multivalued dependency: For a dependency X → Y, if for a single value of X, multiple values of Y exists, then the relation may have a multi-valued dependency. It is represented by the double arrow sign (→→). Fourth Normal Form STU_ID COURSE HOBBY 21 Computer Dancing 21 Math Singing 34 Chemistry Dancing 74 Biology Cricket 59 Physics Hockey Fifth normal form (5NF) A relation is in 5NF if it is in 4NF and not contains any join dependency and joining should be lossless. No spurious tuples 5NF is also known as Project-join normal form (PJ/NF). Closure Property The set of all those attributes which can be functionally determined from an attribute set is called as a closure of that attribute set. Closure of attribute set {X} is denoted as {X}+. Consider a relation R(A , B , C , D , E , F , G ) with the functional dependencies A → BC, BC → DE, D → F, CF → G Find Candidate Key from Closure Minimal set of attribute whose closure contains all the attributes of the relation, then that attribute set is called as a candidate key of that relation. Minimal Cover/Canonical Cover/Irreducible R ( W , X , Y , Z ) – : X → W, WZ → XY, Y → WXZ Questions on Normalization Q1. Let R=(A,B,C,D,E,F) be a relation scheme with the following dependencies: C→F, E→A, EC→D, A→B. Which of the following is a key for R? (a) CD (b) EC (c) AE (d) AC Q2. Normalization from which is based on transitive dependency is classified as: (a) First normal form. (b) Second normal form. (c) Fourth normal form. (d) Third normal form. Q3. Which of the following is NOT a superkey in a relational schema with attributes V, W, X, Y, Z and primary key VY? (a) VXYZ (b) VWXZ (c) VWXY (d) VWXYZ Q4. Relation R has eight attributes ABCDEFGH. Fields of R contain only atomic values. F = {CH -> G, A -> BC, B -> CFH, E -> A, F -> EG} is a set of functional dependencies (FDs) so that F+ is exactly the set of FDs that hold for R. How many candidate keys does the relation R have? (a) 3 (b) 4 (c) 5 (d) 6 Qus.5 Relations produced from E - R Model will always be in. (a) 1NF (b) 2NF (c) 3NF (d) 4NF Q6. Which of the following is TRUE? (a) Every relation in 3NF is also in BCNF (b) A relation R is in 3NF if every non-prime attribute of R is fully functionally dependent on every key of R (c) Every relation in BCNF is also in 3NF (d) No relation can be in both BCNF and 3NF Q7. Which normal form is based on the concept of 'full functional dependency'? (a) First Normal Form (b) Second Normal Form (c) Third Normal Form (d) Fourth Normal Form Q8. Which of the following is false? (a) Every binary relation is never be in BCNF (b) Every BCNF relation is in 3NF (c) 1 NF, 2 NF, 3 NF and BCNF are based on functional dependencies (d) Multivalued Dependency (MVD) is a special case of Join Dependency (JD) Q9. If every non-key attribute is functionally dependent on the primary key, then the relation is in. (a) First Normal Form (b) Second Normal Form (c) Third Normal Form (d) Fourth Normal Form Q10. Let R = (A, B, C, D, E, F) be a relation schema with the following dependenciesC->F, E->A, EC->D, A->B. Which of the following is a key of R? (a) CD (b) EC (c) AE (d) AC Q11. Which normal form is considered adequate for normal relational database design? (a) 2NF (b) 5NF (c) 4NF (d) 3NF Q12. Which of the following FD can’t be implied from FD set: {A->B, A->BC, C->D} ? (a) A->C (b) B->D (c) BC->D (d) All of the above Solutions: 1) B 2) D 3) B 4) B 5) A 6) C 7) B 8) A 9) B 10) B 11) D 12) B Questions on Normalization Q1. Consider the following set of functional dependencies on the scheme R(ABCDEF) F= {AB→ C, C→D, B→E, B→F} 1) Find the candidate keys in the R. 2) Check the highest normal form in R. 3) Decompose the table if required. 4) Check whether the decomposition is lossless or lossy. 5) Check whether the decomposition is dependency preserving or not. Q2. Consider the following set of functional dependencies on the scheme R(ABC) F= {AB→ C, C→B} 1) Find the candidate keys in the R. 2) Check the highest normal form in R. 3) Decompose the table if required. 4) Check whether the decomposition is lossless or lossy. 5) Check whether the decomposition is dependency preserving or not. Q3. Consider the following set of functional dependencies on the scheme R(S, T, U, V) F = {S→T, T→U, U→ V, V→ S} 1) Find the candidate keys in the R. 2) Check the highest normal form in R. 3) Decompose the table if required. 4) Check whether the decomposition is lossless or lossy. 5) Check whether the decomposition is dependency preserving or not. Q4. Consider the following set of functional dependencies on the scheme (A, B, C) F = {A→BC, B→C, A→B, AB→C} The canonical cover for this set is: (a) A→BC and B→ C (b) A→BC and AB→ C (c) A→ BC and A→B (d) A→ B and B→ C Q5. Consider the 2-relation schema: R1= (A, B, C, D, E) and R2= (A, B, C, D, E). Statement 1 is the FD of R1 and statement 2 is the FD of R2. (i) A→ B, AB→ C, D→ AC, D→ E (ii) A→ BC, D→ AE Which of the following statements is true? (a) FD of R1 is equivalent to FD of R2 (b) FD of R1 and R2 not equivalent (c) We cannot compare FD of R1 and R2 (d) None of the above Solutions 1. – 2. – 3. – 4. D 5. A Definition The transaction is a set of logically related operation. It contains a group of tasks. Read(X): Read operation is used to read the value of X from the database and stores it in a buffer in main memory. Write(X): Write operation is used to write the value back to the database from the buffer. Commit: It is used to save the work done permanently. Rollback: It is used to undo the work done. ACID Properties Atomicity (either all or none) Consistency (The total amount must be maintained before or after the transaction) Isolation (In isolation, if the transaction T1 is being executed and using the data item X, then that data item can't be accessed by any other transaction T2 until the transaction T1 ends) Durability( It states that the transaction made the permanent changes) Types of problems in concurrency Dirty read Incorrect summary Lost update Unrepeatable read Phantom read Dirty Read or Uncommited Read or RAW Incorrect summary Lost Update Unrepeatable read Phantom read Irrecoverable Recoverable Conflict Serializable Conflict Serializable: A schedule is called conflict serializable if it can be transformed into a serial schedule by swapping non-conflicting operations. Locks Shared Lock Exclusive lock Irrecoverable Schedule T1 T2 R(A) W(A) R(A) W(A) COMMIT ROLLBACK Irrecoverable vs Recoverable schedule T1 T2 T1 T2 W(A) W(A) R(A) R(A) COMMIT ROLLBACK ROLL COMMIT How to ensure recoverability If T2 depends on T1 then T2 must commit only after T1 commit/rollback. Check cascade less or not? T1 T2 R(A) R(A) W(A) R(B) W(B) C/R R(B) C/R T1 T2 T3 R(A) R(C) R(C) R(A) R(B) W(A) W(B) R(B) W(C) W(B) C1 C2 C3 Shared & Exclusive Lock Shared Lock/Read lock: Only Read operation Exclusive lock/Write lock: Both Read/Write Problems with S/X locks: Serializability may not achieved, Irrecoverability also occur, Deadlock and starvation possible. T1 T2 R(A) W(A) R(A) R(B) R(B) W(B) 2 Phase locking protocol Growing Phase Shrinking Phase Transaction can not acquire a lock has been released. On commit/rollback all the locks will be released. It guarantee conflict serializable schedule but not recoverable May occur deadlock and starvaton T1 T2 R(A) W(A) R(A) R(B) R(B) W(B) T1 T2 T1 T2 R(A) X(A) W(A) R(A) R(A) W(A) C U(A) S(A) C R(A) U(A) C C T1 T2 T1 T2 T3 T4 R(A) S(A) W(B) R(B) X(A) W(A) S(A) S(A) Strict 2PL vs Rigorous 2PL Basic 2PL with all exclusive locks should be hold until commit/rollback. Basic 2PL with all exclusive locks should be hold until commit/rollback. T1 T2 X(A) R(A) W(A) U(A) S(A) R(A) U(A) C C Introduction Indexing is used to optimize the performance of a database by minimizing the number of disk accesses required when a query is processed. Index structure Key and data pointer Example In a hard disk every block is of size 1000 bytes, Record size=100 bytes. Key field=12 bytes, pointer takes 8 bytes. There is a file of 10,000 records. What will be I/O with and without indexing. Dense & sparse Types Primary Clustered Secondary Multilevel( B & B+ Tree) Key Non Key Ordered Primary (Sparse) Clustered(Sparse) Non-Ordered Secondary(dense) Secondary(dense) Primary Index If the index is created on the basis of the primary key of the table, then it is known as primary indexing. These primary keys are unique to each record and contain 1:1 relation between the records. As primary keys are stored in sorted order, the performance of the searching operation is quite efficient. The primary index can be classified into two types: Dense index and Sparse index. Clustering Index A clustered index can be defined as an ordered data file. Sometimes the index is created on non-primary key columns which may not be unique for each record. In this case, to identify the record faster, we will group two or more columns to get the unique value and create index out of them. This method is called a clustering index. The records which have similar characteristics are grouped, and indexes are created for these group. Secondary Index B Tree Index will be stored in tree structure It is balanced always Terminologies ( Block pointer, Keys, Data Pointer, Order) Order of a B tree is maximum no of Block pointers(Children) Imp Points Keys are distributed all over the tree. Left node will contain smaller values then parent and right contain larger than parent. Order of leaf and non leaf is same B+ tree Data pointers are only present in the leaf node Searching is faster and deletion is easy Leaf nodes are linked together like a linked list Order of leaf and non leaf node is different Leaf node order is maximum key and data pointer pair Non leaf order is maximum children possible Q1. Which one of the following statements is NOT correct about the B+ tree data structure used for creating an index of a relational database table? a) B+ tree is a height-balanced tree b) Non-leaf nodes have pointers to data records c) Key values in each node are kept in sorted order d) Each leaf node has a pointer to the next leaf node Q2. In a B+ tree, if the search-key value is 8 bytes long, the block size is 512 bytes and the block pointer is 2 bytes, then the maximum order of the B+ tree is________ a) 52 b) 54 c) 56 d) 57 Q3. B+ trees are considered BALANCED because a) the length of the paths from the root to all leaf nodes are all equal. b) the length of the paths from the root to all leaf nodes differ from each other by at most 1. c) the number of children of any two non-leaf sibling nodes differ by at most 1. d) the number of records in any two leaf nodes differ by at most 1. Q4. With reference to the B+ tree index of order 1 shown below, the minimum number of nodes(including the root node) that must be fetched in order to satisfy the following query: “Get all records with a search key greater than or equal to 7 and less than 15” is 9 a) 4 5 13 17 b) 5 c) 6 d) 7 1 3 5 7 9 11 13 15 17 Q5. Consider a B+ tree in which the maximum number of keys in a node is 5. What is the minimum number of keys in any non-root node? a) 1 b) 2 c) 3 d) 4 Q6. The order of a leaf node in a tree B+ is the maximum number of (value, data record pointer) pairs it can hold. Given that the block size is 1K bytes, the data record pointer is 7 bytes long, the value field is 9 bytes long and a block pointer is 6 bytes long. What is the order of the leaf node? a) 63 b) 64 c) 67 d) 68 Ans: 1. B 2. A 3. A 4. B 5. B 6. A Advance Topics DBMS Topics to be covered BIG Data(Structured vs Unstructured data, 3V’s, Hadoop, Map Reduce) NoSQL Datawarehouse, Data Mining Definition HBase is a data model that is similar to Google’s big table designed to provide quick random access to huge amounts of structured data. NoSQL databases (aka "not only SQL") are non tabular, and store data differently than relational tables. NoSQL databases come in a variety of types based on their data model. The main types are document, key-value, wide-column, and graph. They provide flexible schemas and scale easily with large amounts of data and high user loads. Definition A data warehouse is a large collection of business data used to help an organization make decisions. core component of business intelligence Extract, transform, load (ETL) and extract, load, transform (ELT) are the two main approaches used to build a data warehouse system. A data mart is a simple form of a data warehouse that is focused on a single subject (or functional area) OLAP vs OLTP Online analytical processing (OLAP) is characterized by a relatively low volume of transactions. OLAP applications are widely used by Data Mining techniques The three basic operations in OLAP are: Roll-up (Consolidation), Drill-down and Slicing & Dicing. Online transaction processing (OLTP) is characterized by a large number of short on-line transactions (INSERT, UPDATE, DELETE). Star Schema Snowflake schema A snowflake schema is a logical arrangement of tables in a multidimensional database such that the entity relationship diagram resembles a snowflake shape. DBMS Q.1 Let Ri(z) and Wi(z) denote read and write operations on a data element z by a transaction Ti, respectively. Consider the schedule S with four transactions. S : R4(x), R2(x), R3(x), R1(y), W1(y), W2(x), W3(y), R4(x) Which one of the following serial schedules is conflict equivalent to S? a) T1 T3 T4 T2 b) T1 T4 T3 T2 c) T4 T1 T3 T2 d) T3 T1 T4 T2 Q2. Consider the following schedule S of transactions T1, T2, T3, T4: T1 T2 T3 T4 Reads (X) Writes (X) Commit Writes (X) Commit Writes (Y) Reads (Z) Commit Reads (X) Reads (Y) Commit Which one of the following statements is CORRECT? (a) S is conflict-serializable but not recoverable (b) S is not conflict-serializable but is recoverable (c) S is both conflict-serializable and recoverable (d) S is neither conflict-serializable nor is it recoverable Q3. Which one of the following is NOT a part of the ACID properties of database transactions? (a) Atomicity (b) Consistency (c) Isolation (d) Deadlock-freedom Q.4 Let S be the following schedule of operations of three transactions T1, T2 and T3 in a relational database system: R2(Y),R1(X),R3(Z),R1(Y)W1(X),R2(Z),W2(Y),R3(X),W3(Z) Consider the statements P and Q below:  P: S is conflict-serializable.  Q: If T3 commits before T1 finishes, then S is recoverable. Which one of the following choices is correct? a) Both P and Q are true b) P is true and Q is false c) P is false and Q is true d) Both P and Q are false Q5. Consider a simple checkpointing protocol and the following set of operations in the log. (start, T4); (write, T4, y, 2, 3); (start, T1); (commit, T4); (write, T1, z, 5, 7); (checkpoint); (start, T2); (write, T2, x, 1, 9); (commit, T2); (start, T3), (write, T3, z, 7, 2); If a crash happens now the system tries to recover using both undo and redo operations, what are the contents of the undo list and the redo list? (a) Undo: T3, T1; Redo: T2 (b) Undo: T3, T1; Redo: T2, T4 (c) Undo: none; Redo; T2, T4, T3, T1 (d) Undo: T3, T1, T4; Redo: T2 Q.6 Consider the following transaction involving two bank accounts x and y. read(x); x : = x–50; write(x); read(y); y:=y+50; write(y) The constraint that the sum of the accounts x and y should remain constant is that of (a) Atomicity (b) Consistency (c) Isolation (d) Durability Q7. Let ri(z) and wi(z) denote read and write operations respectively on a data item z by a transaction Ti. Consider the following two schedules.  S1:r1(x)r1(y)r2(x)r2(y)w2(y)w1(x)  S2:r1(x)r2(x)r2(y)w2(y)r1(y)w1(x) Which one of the following options is correct? a) S1 is conflict serializable, and S2 is not conflict serializable b) S1 is not conflict serializable, and S2 is conflict serializable c) Both S1 and S2 are conflict serializable d) Niether S1 nor S2 is conflict serializable Q8. Suppose a database system crashes again while recovering from a previous crash. Assume checkpointing is not done by the database either during the transactions or during recovery. Which of the following statements is/are correct? a) The same undo and redo list will be used while recovering again b) The system cannot recover any further c) All the transactions that are already undone and redone will not be recovered again d) The database will become inconsistent. Solutions: 1) A 2) C 3) D 4) B 5) A 6) B 7) B 8) A RAID DBMS RAID Redundant Array of Independent Disks Redundant Array of Inexpensive Disks Minimum Level Description Space efficiency Fault tolerance no. of drives RAID 0 Block-level striping without parity or mirroring 2 1 None 1 RAID 1 Mirroring without parity or striping 2 n − 1 drive failures 𝑛 1 RAID 3 Byte-level striping with dedicated parity 3 1− One drive failure 𝑛 1 RAID 4 Block-level striping with dedicated parity 3 1− One drive failure 𝑛 1 RAID 5 Block-level striping with distributed parity 3 1− One drive failure 𝑛 2 RAID 6 Block-level striping with double distributed parity 4 1− Two drive failure 𝑛 Ques No 1. The basic component of a file in a file system is a ___. (A) Data item (B) Field (C) Record (D) Tuple Ques No 2. Database Management System is a ___ system to facilitate the creation and maintenance of a computerized database. (A) Physical (B) Software (C) Hardware (D) Knowledge Ques No 3. UoD stands for ___. (A) Unordered Discourse (B) Union of Discourse (C) Universe of Discourse (D) None of the above Ques No 4. The E-R data model is based on a perception of the real world that consists of a set of basic objects called ___. (A) Classes (B) Entities (C) Objects (D) Attributes Ques No 5. In MySQL architecture, the function of the ___ is to keep copies of data for retrieval later, in case of a loss of data. (A) Transaction Manager (B) Buffer Manager (C) Recovery Manager (D) Query Engine Ques No 6. ___ is a family of products that meet the data storage requirements of the largest data processing systems and commercial websites. (A) Microsoft Windows XP (B) Microsoft Windows ME (C) Microsoft Windows Vista (D) Microsoft SQL Server 2000 Ques No 7. SQL is the ___ standard language for interacting with an RDBMS. (A) OSI (B) ISI (C) ISO (D) IEEE Ques No 8. ___ function is used to find the number of values in a column. (A) SUM (B) TOTAL (C) ADD (D) COUNT Ques No 9. The ___ operation removes common tuples from the first relation. (A) Union (B) Difference (C) Cartesian Product (D) Projection Ques No 10. A distributed database system consists of a collection of sites, each of which maintains a ___ database system. (A) Local (B) Remote (C) Administrator (D) None of the above Ques No 11. CAD is an example for ___. (A) Design databases (B) Multimedia databases (C) Knowledge databases (D) None of the above Ques No 12. ___ were among the first to use databases in a geographically distributed manner. (A) Railways (B) Waterways (C) Airlines (D) Roadways Ques No 13. In MySQL architecture ___ interfaces with the operating system to write data to the disk efficiently. (A) Query Engine (B) Buffer Manager (C) Transaction Manager (D) Storage Manager Ques No 14. ___ model was formally defined by Conference on Data Systems Languages (CODASYL) in 1971. (A) Hierarchical (B) Object/Relational (C) Object-Oriented (D) Network Answers 1) A 2) B 3) C 4) B 5) C 6) D 7) C 8) D 9) B 10)A 11)A 12)C 13)D 14)D

Use Quizgecko on...
Browser
Browser