pdfcoffee.com_introduction-to-database-management-system-second-editionpdf-pdf-free.pdf
Document Details
2017
Tags
Full Transcript
Contents i Introduction to Database Management System Introduction to Database Management System By Dr. Satinder Bal Gupta...
Contents i Introduction to Database Management System Introduction to Database Management System By Dr. Satinder Bal Gupta Aditya Mittal B.Tech., (CSE), MCA, UGC-NET, Ph.D (CS) B.E. (Computer Science and Engg.), Prof., Deptt. of Computer Sci. & App. Software Engineer (SAP–ABAP) Vaish College of Engineering, Rohtak, IBM India Pvt. Ltd. Haryana Noida, (U.P.) UNIVERSITY SCIENCE PRESS (An Imprint of Laxmi Publications Pvt. Ltd.) An ISO 9001:2008 Company BENGALURU CHENNAI COCHIN GUWAHATI HYDERABAD JALANDHAR KOLKATA LUCKNOW MUMBAI RANCHI NEW DELHI BOSTON (USA) ACCRA (GHANA) NAIROBI (KENYA) INTRODUCTION TO DATABASE MANAGEMENT SYSTEM © by Laxmi Publications (P) Ltd. All rights reserved including those of translation into other languages. In accordance with the Copyright (Amendment) Act, 2012, no part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise. Any such act or scanning, uploading, and or electronic sharing of any part of this book without the permission of the publisher constitutes unlawful piracy and theft of the copyright holder’s intellectual property. If you would like to use material from the book (other than for review purposes), prior written permission must be obtained from the publishers. Printed and bound in India Typeset at Sukuvisa Enterprises, Delhi First Edition : 2009; Second Edition : 2017 ISBN 978-93-81159-31-6 Limits of Liability/Disclaimer of Warranty: The publisher and the author make no representation or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties. The advice, strategies, and activities contained herein may not be suitable for every situation. In performing activities adult supervision must be sought. Likewise, common sense and care are essential to the conduct of any and all activities, whether described in this book or otherwise. Neither the publisher nor the author shall be liable or assumes any responsibility for any injuries or damages arising here from. The fact that an organization or Website if referred to in this work as a citation and/or a potential source of further information does not mean that the author or the publisher endorses the information the organization or Website may provide or recommendations it may make. Further, readers must be aware that the Internet Websites listed in this work may have changed or disappeared between when this work was written and when it is read. All trademarks, logos or any other mark such as Vibgyor, USP, Amanda, Golden Bells, Firewall Media, Mercury, Trinity, Laxmi appearing in this work are trademarks and intellectual property owned by or licensed to Laxmi Publications, its subsidiaries or affiliates. Notwithstanding this disclaimer, all other names and marks mentioned in this work are the trade names, trademarks or service marks of their respective owners. & Bengaluru 080-26 75 69 30 & Chennai 044-24 34 47 26, 24 35 95 07 & Cochin 0484-237 70 04, 405 13 03 & Guwahati 0361-254 36 69, 251 38 81 Branches & Hyderabad 040-27 55 53 83, 27 55 53 93 & Jalandhar 0181-222 12 72 & Kolkata 033-22 27 43 84 & Lucknow 0522-220 99 16 & Mumbai 022-24 91 54 15, 24 92 78 69 Published in India by & Ranchi 0651-220 44 64 UNIVERSITY SCIENCE PRESS (An Imprint of Laxmi Publications Pvt.Ltd.) An ISO 9001:2008 Company 113, GOLDEN HOUSE, DARYAGANJ, NEW DELHI - 110002, INDIA Telephone : 91-11-4353 2500, 4353 2501 Fax : 91-11-2325 2572, 4353 2528 C— www.laxmipublications.com [email protected] Printed at: Dedicat ed To My Little Son Aditya Bal Gupta —Satinder Bal Gupta Dedicat ed To My Mother Saroj Bala and My GOD — Aditya Mittal Contents Preface (xi) 1. Introduction to Database Systems 1–43 1.1 Introduction 1 1.2 Basic Definitions and Concepts 1 1.2.1 Data 1 1.2.2 Information 2 1.2.3 Meta data 2 1.2.4 Data dictionary 3 1.2.5 Database 3 1.2.6 Components of a database 4 1.2.7 Database management system (DBMS) 4 1.2.8 Components of DBMS 5 1.3 Traditional File System Versus Database Systems 6 1.3.1 Disadvantages of traditional file system 7 1.3.2 Database systems or database system environment 8 1.3.3 Advantages of database systems (DBMS’s) 9 1.3.4 Disadvantages of database systems 11 1.4 DBMS Users 11 1.4.1 End users or naive users 12 1.4.2 Online users 12 1.4.3 Application programmers 12 1.4.4 Database administrator 12 1.5 Database or DBMS Languages 13 1.5.1 Data definition language (DDL) 13 1.5.2 Storage definition language (SDL) 13 1.5.3 View definition language (VDL) 13 1.5.4 Data manipulation language (DML) 13 1.5.5 Fourth-generation language (4-GL) 14 1.6 Schemas, Subschema and Instances 14 vii viii Contents 1.6.1 Schema 14 1.6.2 Subschema 15 1.6.3 Instances 15 1.7 Three Level Architecture of Database Systems (DBMS) 16 1.7.1 Levels or views 16 1.7.2 Different mappings in three level architecture of DBMS 17 1.7.3 Advantages of three-level architecture 18 1.7.4 Data independence 18 1.8 Data Models 18 1.8.1 Types of data models 19 1.8.2 Comparison of various data models 24 1.8.3 Which data models to use? 25 1.9 Types of Database Systems 25 1.9.1 According to the number of users 26 1.9.2 According to the type of use 27 1.9.3 According to database site locations 27 1.10 Comparison between Client/Server and Distributed Database System 32 Test Your Knowledge 32 Exercises 41 2. E-R and EER Models 44–91 2.1 Introduction 44 2.2 Basic Concepts 45 2.3 Types of Attributes 46 2.3.1 Simple and composite attributes 46 2.3.2 Single valued and multi-valued attributes 46 2.3.3 Derived attributes and stored attributes 46 2.4 Relationship Sets 47 2.4.1 Degree of relationship sets 47 2.4.2 Role and recursive relationship set 48 2.5 Mapping Constraints 48 2.5.1 Mapping cardinalities (cardinality ratios) 48 2.5.2 Participation constraints 50 2.6 Keys 50 2.6.1 Types of keys 50 2.7 Entity—Relationship Diagram 51 2.7.1 Advantages of E-R model 53 2.7.2 Limitation of E-R model 54 2.8 Types of Entity Sets 54 2.8.1 Strong entity sets 54 Contents ix 2.8.2 Weak entity sets 54 2.9 Enhanced Entity-Relationship (EER) Model 55 2.9.1 Superclass and subclass entity types 55 2.9.2 Specialization 57 2.9.3 Generalization 57 2.9.4 Attribute inheritance 57 2.9.5 Aggregation 57 2.9.6 Specialization and generalization constraints 58 2.9.7 Categorization 59 2.10 Reduction of an E-R and EER Diagram into Tables 60 Test Your Knowledge 78 Exercises 87 3. File Organization 92–130 3.1 Introduction 92 3.2 Basic Concepts of Files 92 3.2.1 Records and record types 92 3.2.2 Types of files 94 3.3 File Organization Techniques 94 3.3.1 Heap file organization 95 3.3.2 Sequential file organization 95 3.3.3 Index sequential file organization 99 3.3.4 Hashing 101 3.3.5 Direct file organization 105 3.4 Indexing 106 3.4.1 Ordered indexing 106 3.4.2 Hashed indexing 110 3.5 B-tree Index Files 110 3.6 B+–Tree Index Files 111 3.7 Comparison of Different File Organizations 113 3.8 Factors Affecting Choice of File Organization 113 Test Your Knowledge 117 Exercises 126 4. Data Models 131–161 4.1 Introduction 131 4.2 Hierarchical Model 132 4.2.1 Tree structure diagrams 132 4.2.2 Operations on hierarchical data model 135 4.2.3 Query language for hierarchical databases 135 4.2.4 Virtual records 137 x Contents 4.2.5 Implementation of tree structure diagram 138 4.3 Network Model 139 4.3.1 Graph structure diagrams 139 4.3.2 Operations on network data model 143 4.3.3 Query language for network databases 143 4.3.4 Implementation of graph structure diagram 144 4.4 Relational Model 146 4.4.1 Definition of relation 146 4.4.2 Data structure of relational database 146 4.4.3 Integrity constraints 147 4.5 CODD’s Rules 149 4.5.1 Operations on relational model 151 4.6 Comparison of DBMS and RDBMS 153 4.7 Comparison of Data Models 153 Test Your Knowledge 155 Exercises 160 5. Relational Algebra and Calculus 162–192 5.1 Introduction 162 5.2 Relational algebra 162 5.2.1 Operations in relational algebra 163 5.3 Relational Calculus 172 5.3.1 Tuple relational calculus 173 5.3.2 Domain relational calculus 175 5.4 Comparison of Domain Relational Calculus and Tuple Relational Calculus 177 5.5 Comparison of Relational Calculus and Relational Algebra 177 Test Your Knowledge 183 Exercises 190 6. Functional Dependency and Normalisation 193–272 6.1 Introduction 193 6.2 Informal Design Guidelines for Relation Schemas 193 6.3 Functional Dependencies 195 6.3.1 Functional dependency chart/diagram 195 6.3.2 Types of functional dependencies 196 6.4 Anomalies in Relational Database 199 6.5 Dependencies and Logical Implications 200 6.5.1 Armstrong’s axioms 200 6.6 Closure of a Set of Functional Dependencies 202 6.6.1 Closure of an attribute w.r.t. the set of FD’s F 203 6.7 Covers 204 Contents xi 6.7.1 Types of cover 205 6.7.2 Identifying redundant functional dependencies 206 6.7.3 Covers and equivalent sets of FDs 208 6.7.4 Minimal set of functional dependencies 209 6.7.5 Finding a minimal cover for a set of FDs 209 6.8 Keys and Functional Dependencies 211 6.9 Decompositions 214 6.9.1 Properties of a good decomposition 214 6.10 Normalisation 215 6.10.1 Benefits of normalisation 215 6.10.2 Various normal forms 216 6.11 Denormalisation 228 6.11.1 The need of denormalization 229 6.11.2 Issues to be considered when deciding denormalization 229 6.11.3 Advantages of denormalization 230 6.11.4 Disadvantages of denormalization 230 Test Your Knowledge 250 Exercises 262 7. Query Languages 273–347 7.1 Introduction 273 7.2 Structured Query Language (SQL) 273 7.2.1 Characteristics of SQL 273 7.2.2 Advantages of SQL 274 7.2.3 Parts (Components) of SQL language 274 7.2.4 Basic data types 276 7.2.5 Data manipulation in SQL 276 7.2.6 Data definition language (DDL) 303 7.2.7 Data control language (DCL) 308 7.3 Query By Example (QBE) 311 7.3.1 QBE dictionary 311 7.3.2 Data types 312 7.3.3 Data definition functions 312 7.3.4 Update operations 314 7.3.5 Data retrieval operations 316 7.3.6 Built-in functions 320 7.3.7 Features of QBE 321 7.3.8 Advantages of QBE 321 7.3.9 Limitations of QBE 321 7.3.10 Commercial database management systems providing QBE feature 321 xii Contents 7.4 Comparison of SQL and QBE 321 Test Your Knowledge 334 Exercises 343 8. Transactions and Concurrency Control 348–393 8.1 Introduction 348 8.2 Transaction 348 8.2.1 ACID properties of transaction 348 8.2.2 Transaction states 349 8.3 Some Definitions 349 8.4 Why Concurrency Control is Needed? 350 8.5 Concurrency Control Techniques 352 8.5.1 Lock-based protocols 352 8.5.2 Graph based protocols 358 8.5.3 Timestamp-based protocols 360 8.5.4 Validation based protocols 361 8.5.5 Multiversion concurrency control protocols 363 8.6 Deadlocks 364 8.6.1 Necessary conditions for deadlock 364 8.6.2 Methods for handling deadlocks 364 Test Your Knowledge 384 Exercises 389 9. Database Security and Authorization 394–403 9.1 Introduction 394 9.2 Security Violations 394 9.3 Authorization (Access Rights) 395 9.4 Views 396 9.5 Granting of Privileges 396 9.6 Notion of Roles (Grouping of Users) 397 9.7 Audit Trails 398 9.7.1 Advantages of audit trails 398 9.7.2 Audit trails and logs 399 9.7.3 Review of audit trails 399 Test Your Knowledge 400 Exercises 402 10. Database Recovery System 404–418 10.1 Introduction 404 10.2 Classification of Failures 404 10.3 Recovery Concept 405 Contents xiii 10.3.1 Log based recovery 405 10.4 Shadow Paging 408 Test Your Knowledge 412 Exercises 416 11. Query Processing and Optimization 419–447 11.1 Introduction 419 11.2 Basics of Query Processing 419 11.2.1 General strategy for query processing 420 11.2.2 Steps in query processing 420 11.3 Query Optimization 425 11.3.1 Transformation rules for relational algebra 426 11.3.2 Heuristic query optimization 428 11.3.3 Cost based query optimization 432 Test Your Knowledge 439 Exercises 444 12. Parallel and Distributed Databases 448–479 12.1 Introduction 448 12.2 Parallel Databases 448 12.2.1 Parallel database architecture 449 12.2.2 The key elements of parallel database processing 451 12.2.3 Query parallelism 452 12.2.4 Advantages of parallel databases 455 12.2.5 Disadvantages of parallel databases 455 12.3 Distributed Databases 455 12.3.1 Basic concepts of distributed databases 456 12.3.2 Distributed database management system (DDBMS) 457 12.3.3 Advantages of distributed databases 457 12.3.4 Disadvantages of distributed databases 458 12.3.5 Data distribution 459 12.3.6 Data replication and allocation 460 12.3.7 Distributed DBMS architectures 463 12.3.8 Comparison of DBMS and DDBMS 464 12.3.9 Query processing in distributed databases 464 12.4 Comparison of Parallel and Distributed Databases (DDB’s) 466 Test Your Knowledge 470 Exercises 476 13. Data Warehouses and Data Mining 480–510 13.1 Introduction 480 xiv Contents 13.2 Data Warehouse 481 13.2.1 Distinctive characteristics of data warehouses 481 13.2.2 Difference between database and data warehouse 482 13.2.3 Data warehouse architecture 483 13.2.4 Data warehouse components 484 13.2.5 Advantages of data warehouse 485 13.2.6 Disadvantages/limitations of data warehouse 485 13.3 Data Mart 486 13.3.1 Benefits of a data mart 486 13.3.2 Types of data marts 486 13.3.3 Steps to implement a data mart 488 13.3.4 How data mart is different from a data warehouse? 489 13.4 OLTP (On-line Transaction Processing) 490 13.4.1 Limitations of OLTP 490 13.5 OLAP (On-Line Analytical Processing) 491 13.5.1 Codd’s OLAP characteristics 491 13.5.2 Difference between OLTP and OLAP 492 13.5.3 OLAP operations 493 13.5.4 Types of OLAP systems 494 13.6 Data Mining 497 13.6.1 Data mining process as a part of knowledge discovery process 497 13.6.2 Goals of data mining 498 13.6.3 Elements of data mining 499 13.6.4 Types of knowledge discovered during data mining 499 13.6.5 Models of data mining 499 13.6.6 Techniques used in data mining 500 13.6.7 Data mining tools 500 13.6.8 Data mining applications 501 13.6.9 Advantages of data mining 502 13.6.10 Disadvantages of data mining 502 13.6.11 Scope of improvement in data mining 502 13.7 Comparison of Data Mining and Structured Query Language (SQL) 502 13.8 Comparison of Data Mining and Data Warehousing 503 Test Your Knowledge 504 Exercises 509 14. Database Design Project 511–519 14.1 Introduction 511 14.2 Database Design Cycle 511 Contents xv 14.2.1 Phase 1—Definition of the problem 511 14.2.2 Phase 2—Analyses of existing system and procedure 513 14.2.3 Phase 3—Preliminary design 514 14.2.4 Phase 4—Computing system requirements 515 14.2.5 Phase 5—Final design 515 14.2.6 Phase 6—Implementation and testing 516 14.2.7 Phase 7—Operation and tuning 517 14.3 Advantages of Database System Design Cycle 517 14.4 Limitations 517 Test Your Knowledge 517 Exercises 519 15. Additional Topics 520–524 15.1 Database Tuning 520 15.1.1 Why need database tuning 520 15.1.2 Types of tuning 520 15.2 Data Migration 522 15.2.1 Basic concepts of data migration 522 15.2.2 Why we need data migration 523 15.2.3 Key factors in data migration 523 15.2.4 Stages of data migration process 524 Index 525–532 Preface to the Second Edition The knowledge of database systems is an important part of education in Computer Science. Without database, an organisation is blind. The book ‘Introduction to Database Management System’ introduces concepts, designs, applications and technologies of database in a very comprehensive manner. It assumes no previous knowledge of databases and database technology. The book is self-contained. The chapters are written in a very simple language with a lot of examples and exercises. The book also contains a large number of figures to explain the concepts in a better way. It will serve as a textbook in databases for both undergraduate (B.E., B.Tech, B.C.A., B.Sc.) and postgraduate (M.Sc., ME, M.Tech, M.C.A.) students. The book fills in the gap between theoretical learning and practical implementation of concepts of databases for IT professionals and engineers. We hope that it will be useful to the readers in all ways without any difficulty and welcome suggestions from the students, faculty members and general readers for further improvement of the book. — AUTHORS xvi Preface to the First Edition The knowledge of database systems are an important part of an education in Computer Science. Without database, an organisation is blind. The book ‘Introduction to Database Management System’ introduces concepts, design, applications and technologies of database in a very comprehensive manner. It assumes no previous knowledge of databases and database technology. The book is self-contained. The chapters are written in a very simple language with a lot of examples and exercises. The book also contains a large number of figures to explain the concepts in a better way. The book will serve as a textbook in databases for both undergraduate (B.E., B.Tech, B.C.A., B.Sc.) and postgraduate (M.Sc., ME, M.Tech, M.C.A.). The book fills the gap between theoretical learning and practical implementation of concepts of databases for IT professionals and engineers. We hope that this book will be useful to the readers in all ways without any difficulty. We would welcome the suggestions for further improvement of the book given by our readers. — AUTHORS xvii Acknowledgement We would like to express our gratitude to a large number of persons from whom we sought constant help, encouragement and co-operation. We are very thankful to Mrs. Monika Gupta, Lecturer, Vaish College, Rohtak for giving us valuable suggestions for improving the book. We are very thankful to Mr. Deepak Gupta, Assistant Professor, Satpriya Institute of Management Studies and Research, Rohtak for motivating us at initial stages of the book. Finally, we dedicate this book to our family members for their patience, encouragement, motivation, understanding and sacrifices. — AUTHORS xviii Introduction to Database Systems Chapter 1 1.1 Introduction An organization must have accurate and reliable data (information) for effective decision making. Data (information) is the backbone and most critical resource of an organization that enables managers and organizations to gain a competitive edge. In this age of information explosion, where people are bombarded with data, getting the right information, in the right amount, at the right time is not an easy task. So, only those organizations will survive that successfully manage information. A database system simplifies the tasks of managing the data and extracting useful information in a timely fashion. A database system is an integrated collection of related files, along with the details of the interpretation of the data. A Data Base Management System is a software system or program that allows access to data contained in a database. The objective of the DBMS is to provide a convenient and effective method of defining, storing, and retrieving the information stored in the database. The database and database management systems have become essential for managing business, governments, schools, universities, banks etc. 1.2 Basic Definitions and Concepts In an organization, the data is the most basic resource. To run the organization efficiently, the proper organization and management of data is essential. The formal definition of the major terms used in databases and database systems is defined in this section. 1.2.1 Data The term data may be defined as known facts that could be recorded and stored on Computer Media. It is also defined as raw facts from which the required information is produced. 1 2 Introduc tion to Database Management System 1.2.2 Information Data and information are closely related and are often used interchangeably. Information is nothing but refined data. In other way, we can say, information is processed, organized or summarized data. According to Burch et. al., “Information is data that have been put into a meaningful and useful content and communicated to a recipient who uses it to made decisions”. Information consists of data, images, text, documents and voice, but always in a meaningful content. So we can say, that information is something more than mere data. Data are processed to create information. The recipient receives the information and then makes a decision and takes an action, which may triggers other actions Input Processing Output User Data Information Decision In these days, there is no lack of data, but there is lack of quality information. The quality information means information that is accurate, timely and relevant, which are the three major key attributes of information. 1. Accuracy : It means that the information is free from errors, and it clearly and accurately reflects the meaning of data on which it is based. It also means it is free from bias and conveys an accurate picture to the recipient. 2. Timeliness : It means that the recipients receive the information when they need it and within the required time frame. 3. Relevancy : It means the usefulness of the piece of information for the corresponding persons. It is a very subjective matter. Some information that is relevant for one person might not be relevant for another and vice versa e.g., the price of printer is irrelevant for a person who wants to purchase computer. So, organization that have good information system, which produce information that is accurate, timely and relevant will survive and those that do not realize the importance of information will soon be out of business. 1.2.3 Meta Data A meta data is the data about the data. The meta data describe objects in the database and makes easier for those objects to be accessed or manipulated. The meta data describes the database structure, sizes of data types, constraints, applications, autorisation etc., that are used as an integral tool for information resource management. There are three main types of meta data : 1. Descriptive meta data : It describes a resource for purpose such as discovery and identification. In a traditional library cataloging that is form of meta data, title, abstract, author and keywords are examples of meta data. 2. Structural meta data : It describes how compound objects are put together. The example is how pages are ordered to form chapters. 3. Administrative meta data : It provides information to help manage a resource, such as when and how it was created, file type and other technical information, and who can access it. There are several subsets of data. Introduc tion to Database Systems 3 1.2.4 Data Dictionary The data dictionary contains information of the data stored in the database and is consulted by the DBMS before any manipulation operation on the database. It is an integral part of the database management systems and store meta data i.e., information about the database, attribute names and definitions for each table in the database. It helps the DBA in the management of the database, user view definitions as well as their use. Data dictionary is generated for each database and generally stores and manages the following types of information : 1. The complete information about physical database design e.g. storage structures, access paths and file sizes etc. 2. The information about the database users, their responsibilities and access rights of each user. 3. The complete information about the schema of the database. 4. The high level descriptions of the database transactions, applications and the infor- mation about the relationships of users to the transactions. 5. The information about the relationship between the data items referenced by the database transactions. This information is helpful in determining which transactions are affected when some data definitions are modified. The data dictionaries are of two types : Active data dictionary and passive data dictionary. 1. Active Data Dictionary : It is managed automatically by the database management system (DBMS) and are always consistent with the current structure and definition of the database. Most of the RDBMS’s maintain active data dictionaries. 2. Passive Data Dictionary : It is used only for documentation purposes and the data about fields, files and people are maintained into the dictionary for cross references. It is generally managed by the users of the system and is modified whenever the structure of the database is changed. The passive dictionary may not be consistent with the structure of the database, since modifications are performed manually by the user. It is possible that passive dictionaries may contain information about organisational data that is not computerized as these are maintained by the users. 1.2.5 Database A database is a collection of interrelated data stored together with controlled redundancy to serve one or more applications in an optimal way. The data are stored in such a way that they are independent of the programs used by the people for accessing the data. The approach used in adding the new data, modifying and retrieving the existing data from the database is common and controlled one. It is also defined as a collection of logically related data stored together that is designed to meet information requirements of an organization. We can also define it as an electronic filling system. The example of a database is a telephone directory that contains names, addresses and telephone numbers of the people stored in the computer storage. 4 Introduc tion to Database Management System Databases are organized by fields, records and files. These are described briefly as follows : 1.2.5.1 Fields It is the smallest unit of the data that has meaning to its users and is also called data item or data element. Name, Address and Telephone number are examples of fields. These are represented in the database by a value. 1.2.5.2 Records A record is a collection of logically related fields and each field is possessing a fixed number of bytes and is of fixed data type. Alternatively, we can say a record is one complete set of fields and each field have some value. The complete information about a particular phone number in the database represents a record. Records are of two types fixed length records and variable length records. 1.2.5.3 Files A file is a collection of related records. Generally, all the records in a file are of same size and record type but it is not always true. The records in a file may be of fixed length or variable length depending upon the size of the records contained in a file. The telephone directory containing records about the different telephone holders is an example of file. More detail is available in chapter 3. 1.2.6 Components of a Database A Database consists of four components as shown in Figure 1.1. Data Items Relationships Physical Database Constraints Schema Figure 1.1. Components of Database. 1. Data item : It is defined as a distinct piece of information and is explained in the previous section. 2. Relationships : It represents a correspondence between various data elements. 3. Constraints : These are the predicates that define correct database states. 4. Schema : It describes the organization of data and relationships within the database. The schema consists of definitions of the various types of record in the database, the data-items they contain and the sets into which they are grouped. The storage structure of the database is described by the storage schema. The conceptual schema defines the stored data structure. The external schema defines a view of the database for particular users. 1.2.7 Database Management System (DBMS) DBMS is a program or group of programs that work in conjunction with the operating system to create, process, store, retrieve, control and manage the data. It acts as an interface between the application program and the data stored in the database. Introduc tion to Database Systems 5 Alternatively, it can be defined as a computerized record-keeping system that stores information and allows the users to add, delete, modify, retrieve and update that information. The DBMS performs the following five primary functions : 1. Define, create and organise a database : The DBMS establishes the logical relationships among different data elements in a database and also defines schemas and subschemas using the DDL. 2. Input data : It performs the function of entering the data into the database through an input device (like data screen, or voice activated system) with the help of the user. 3. Process data : It performs the function of manipulation and processing of the data stored in the database using the DML. 4. Maintain data integrity and security : It allows limited access of the database to authorised users to maintain data integrity and security. 5. Query database : It provides information to the decision makers that they need to make important decisions. This information is provided by querying the database using SQL. 1.2.8 Components of DBMS A DBMS has three main components. These are Data Definition Language (DDL), Data Manipulation Language and Query Facilities (DML/SQL) and software for controlled access of Database as shown in Figure 1.2 and are defined as follows : USERS Database system Application Programs Data Definition Language DBMS (DDL) Components Software to process queries and programs (DML/SQL) Software for controlled access of stored data Physical Meta Data Database Figure 1.2. Components of DBMS. 6 Introduc tion to Database Management System 1.2.8.1 Data Definition Language (DDL) It allows the users to define the database, specify the data types, data structures and the constraints on the data to be stored in the database. More about DDL in section 1.5. 1.2.8.2 Data Manipulation Language (DML) and Query Language DML allows users to insert, update, delete and retrieve data from the database. SQL provides general query facility. More about DML and SQL in section 1.5. 1.2.8.3 Software for Controlled Access of Database This software provides the facility of controlled access of the database by the users, concurrency control to allow shared access of the database and a recovery control system to restore the database in case of ardware or software failure. The DBMS software together with the database is called a Database System. 1.3 Traditional File System Versus DataBase Systems Conventionally, the data were stored and processed using traditional file processing systems. In these traditional file systems, each file is independent of other file, and data in different files can be integrated only by writing individual program for each application. The data and the application programs that uses the data are so arranged that any change to the data requires modifying all the programs that uses the data. This is because each file is hard-coded with specific information like data type, data size etc. Some time it is even not possible to identify all the programs using that data and is identified on a trial-and-error basis. A file processing system of an organization is shown in Figure 1.3. All functional areas in the organization creates, processes and disseminates its own files. The files such as inventory and payroll generate separate files and do not communicate with each other. Marketing Manufacturing Inventory Payroll Application Application Application Application program program program program File 1 File 2 File 3 File 1 File 2 File 3 File 1 File 2 File 3 File 1 File 2 FIGURE 1.3. Traditional file system. No doubt such an organization was simple to operate and had better local control but the data of the organization is dispersed throughout the functional sub-systems. These days, databases are preferred because of many disadvantages of traditional file systems. Introduc tion to Database Systems 7 1.3.1 Disadvantages of Traditional File System A traditional file system has the following disadvantages: 1. Data Redundancy : Since each application has its own data file, the same data may have to be recorded and stored in many files. For example, personal file and payroll file, both contain data on employee name, designation etc. The result is unnecessary duplicate or redundant data items. This redundancy requires additional or higher storage space, costs extra time and money, and requires additional efforts to keep all files upto-date. 2. Data Inconsistency : Data redundancy leads to data inconsistency especially when data is to be updated. Data inconsistency occurs due to the same data items that appear in more than one file do not get updated simultaneously in each and every file. For example, an employee is promoted from Clerk to Superintendent and the same is immediately updated in the payroll file may not necessarily be updated in provident fund file. This results in two different designations of an employee at the same time. Over the period of time, such discrepencis degrade the quality of information contain in the data file that affects the accuracy of reports. 3. Lack of Data Integration : Since independent data file exists, users face difficulty in getting information on any ad hoc query that requires accessing the data stored in many files. In such a case complicated programs have to be developed to retrieve data from every file or the users have to manually collect the required information. 4. Program Dependence : The reports produced by the file processing system are program dependent, which means if any change in the format or structure of data and records in the file is to be made, the programs have to modified correspondingly. Also, a new program will have to be developed to produce a new report. 5. Data Dependence : The Applications/programs in file processing system are data dependent i.e., the file organization, its physical location and retrieval from the storage media are dictated by the requirements of the particular application. For example, in payroll application, the file may be organised on employee records sorted on their last name, which implies that accessing of any employee’s record has to be through the last name only. 6. Limited Data Sharing : There is limited data sharing possibilities with the traditional file system. Each application has its own private files and users have little choice to share the data outside their own applications. Complex programs required to be written to obtain data from several incompatible files. 7. Poor Data Control : There was no centralised control at the data element level, hence a traditional file system is decentralised in nature. It could be possible that the data field may have multiple names defined by the different departments of an organization and depending on the file it was in. This situation leads to different meaning of a data field in different context or same meaning for different fields. This causes poor data control. 8. Problem of Security : It is very difficult to enforce security checks and access rights in a traditional file system, since application programs are added in an adhoc manner. 8 Introduc tion to Database Management System 9. Data Manipulation Capability is Inadequate : The data manipulation capability is very limited in traditional file systems since they do not provide strong relationships between data in different files. 10. Needs Excessive Programming : An excessive programming effort was needed to develop a new application program due to very high interdependence between program and data in a file system. Each new application requires that the developers start from the scratch by designing new file formats and descriptions and then write the file access logic for each new file. 1.3.2 Database Systems or Database System Environment The DBMS software together with the Database is called a database system. In other words, it can be defined as an organization of components that define and regulate the collection, storage, management and use of data in a database. Furthermore, it is a system whose overall purpose is to record and maintain information. A database system consists of four major components as shown in Figure 1.4. 1. Data 2. Hardware 3. Software 4. Users DBMS Database (Hardware) User (Data) Application programs User (Software) (Data) User (Data) User (Users) FIGURE 1.4. Database system. 1. Data : The whole data in the system is stored in a single database. This data in the database are both shared and integrated. Sharing of data means individual pieces of data in the database is shared among different users and every user can access the same piece of data but may be for different purposes. Integration of data means the database can be function of several distinct files with redundancy controlled among the files. Introduc tion to Database Systems 9 2. Hardware : The hardware consists of the secondary storage devices like disks, drums and so on, where the database resides together with other devices. There is two types of hardware. The first one, i.e., processor and main memory that supports in running the DBMS. The second one is the secondary storage devices, i.e., hard disk, magnetic disk etc., that are used to hold the stored data. 3. Software : A layer or interface of software exists between the physical database and the users. This layer is called the DBMS. All requests from the users to access the database are handled by the DBMS. Thus, the DBMS shields the database users from hardware details. Furthermore, the DBMS provides the other facilities like accessing and updating the data in the files and adding and deleting files itself. 4. Users : The users are the people interacting with the database system in any way. There are four types of users interacting with the database systems. These are Application Programmers, online users, end users or naive users and finally the Database Administrator (DBA). More about users in section 1.4. 1.3.3 Advantages of Database Systems (DBMS’s) The database systems provide the following advantages over the traditional file system: 1. Controlled redundancy : In a traditional file system, each application program has its own data, which causes duplication of common data items in more than one file. This duplication/redundancy requires multiple updations for a single transaction and wastes a lot of storage space. We cannot eliminate all redundancy due to technical reasons. But in a database, this duplication can be carefully controlled, that means the database system is aware of the redundancy and it assumes the responsibility for propagating updates. 2. Data consistency : The problem of updating multiple files in traditional file system leads to inaccurate data as different files may contain different information of the same data item at a given point of time. This causes incorrect or contradictory information to its users. In database systems, this problem of inconsistent data is automatically solved by controlling the redundancy. 3. Program data independence : The traditional file systems are generally data dependent, which implies that the data organization and access strategies are dictated by the needs of the specific application and the application programs are developed accordingly. However, the database systems provide an independence between the file system and application program, that allows for changes at one level of the data without affecting others. This property of database systems allow to change data without changing the application programs that process the data. 4. Sharing of data : In database systems, the data is centrally controlled and can be shared by all authorized users. The sharing of data means not only the existing applications programs can also share the data in the database but new application programs can be developed to operate on the existing data. Furthermore, the requirements of the new application programs may be satisfied without creating any new file. 5. Enforcement of standards : In database systems, data being stored at one central place, standards can easily be enforced by the DBA. This ensures standardised data formats 10 Introduc tion to Database Management System to facilitate data transfers between systems. Applicable standards might include any or all of the following—departmental, installation, organizational, industry, corporate, national or international. 6. Improved data integrity : Data integrity means that the data contained in the database is both accurate and consistent. The centralized control property allow adequate checks can be incorporated to provide data integrity. One integrity check that should be incorporated in the database is to ensure that if there is a reference to certain object, that object must exist. 7. Improved security : Database security means protecting the data contained in the database from unauthorised users. The DBA ensures that proper access procedures are followed, including proper authentical schemes for access to the DBMS and additional checks before permitting access to sensitive data. The level of security could be different for various types of data and operations. 8. Data access is efficient : The database system utilizes different sophisticated techniques to access the stored data very efficiently. 9. Conflicting requirements can be balanced : The DBA resolves the conflicting requirements of various users and applications by knowing the overall requirements of the organization. The DBA can structure the system to provide an overall service that is best for the organization. 10. Improved backup and recovery facility : Through its backup and recovery subsystem, the database system provides the facilities for recovering from hardware or software failures. The recovery subsystem of the database system ensures that the database is restored to the state it was in before the program started executing, in case of system crash. 11. Minimal program maintenance : In a traditional file system, the application programs with the description of data and the logic for accessing the data are built individually. Thus, changes to the data formats or access methods results in the need to modify the application programs. Therefore, high maintenance effort are required. These are reduced to minimal in database systems due to independence of data and application programs. 12. Data quality is high : The quality of data in database systems are very high as compared to traditional file systems. This is possible due to the presence of tools and processes in the database system. 13. Good data accessibility and responsiveness : The database systems provide query languages or report writers that allow the users to ask ad hoc queries to obtain the needed information immediately, without the requirement to write application programs (as in case of file system), that access the information from the database. This is possible due to integration in database systems. 14. Concurrency control : The database systems are designed to manage simultaneous (concurrent) access of the database by many users. They also prevents any loss of information or loss of integrity due to these concurrent accesses. 15. Economical to scale : In database systems, the operational data of an organization is stored in a central database. The application programs that work on this data can be Introduc tion to Database Systems 11 built with very less cost as compared to traditional file system. This reduces overall costs of operation and management of the database that leads to an economical scaling. 16. Increased programmer productivity : The database system provides many standard functions that the programmer would generally have to write in file system. The availability of these functions allow the programmers to concentrate on the specific functionality required by the users without worrying about the implementation details. This increases the overall productivity of the programmer and also reduces the development time and cost. 1.3.4 Disadvantages of Database Systems In contrast to many advantages of the database systems, there are some disadvantages as well. The disadvantages of a database system are as follows : 1. Complexity increases : The data structure may become more complex because of the centralised database supporting many applications in an organization. This may lead to difficulties in its management and may require professionals for management. 2. Requirement of more disk space : The wide functionality and more complexity increase the size of DBMS. Thus, it requires much more space to store and run than the traditional file system. 3. Additional cost of hardware : The cost of database system’s installation is much more. It depends on environment and functionality, size of the hardware and maintenance costs of hardware. 4. Cost of conversion : The cost of conversion from old file-system to new database system is very high. In some cases the cost of conversion is so high that the cost of DBMS and extra hardware becomes insignificant. It also includes the cost of training manpower and hiring the specialized manpower to convert and run the system. 5. Need of additional and specialized manpower : Any organization having database systems, need to be hire and train its manpower on regular basis to design and implement databases and to provide database administration services. 6. Need for backup and recovery : For a database system to be accurate and available all times, a procedure is required to be developed and used for providing backup copies to all its users when damage occurs. 7. Organizational conflict : A centralised and shared database system requires a consensus on data definitions and ownership as well as responsibilities for accurate data maintenance. 8. More installational and management cost : The big and complete database systems are more costly. They require trained manpower to operate the system and has additional annual maintenance and support costs. 1.4 DBMS Users The users of a database system can be classified into various categories depending upon their interaction and degree of expertise of the DBMS. 12 Introduc tion to Database Management System 1.4.1 End Users or Naive Users The end users or naive users use the database system through a menu-oriented application program, where the type and range of response is always displayed on the screen. The user need not be aware of the presence of the database system and is instructed through each step. A user of an ATM falls in this category. 1.4.2 Online Users These type of users communicate with the database directly through an online terminal or indirectly through an application program and user interface. They know about the existence of the database system and may have some knowledge about the limited interaction they are permitted. 1.4.3 Application Programmers These are the professional programmers or software developers who develop the application programs or user interfaces for the naive and online users. These programmers must have the knowledge of programming languages such as Assembly, C, C++, Java, or SQL, etc., since the application programs are written in these languages. 1.4.4 Database Administrator Database Administrator (DBA) is a person who have complete control over database of any enterprise. DBA is responsible for overall performance of database. He is free to take decisions for database and provides technical support. He is concerned with the Back-End of any project. Some of the main responsibilities of DBA are as follows : 1. Deciding the conceptual schema or contents of database : DBA decides the data fields, tables, queries, data types, attributes, relations, entities or you can say that he is responsible for overall logical design of database. 2. Deciding the internal schema of structure of physical storage : DBA decides how the data is actually stored at physical storage, how data is represented at physical storage. 3. Deciding users : DBA gives permission to users to use database. Without having proper permission, no one can access data from database. 4. Deciding user view : DBA decides different views for different users. 5. Granting of authorities : DBA decides which user can use which portion of database. DBA gives authorities or rights to data access. User can use only that data on which access right is granted to him. 6. Deciding constraints : DBA decides various constraints over database for maintaining consistency and validity in database. 7. Security : Security is the major concern in database. DBA takes various steps to make data more secure against various disasters and unauthorized access of data. 8. Monitoring the performance : DBA is responsible for overall performance of database. DBA regularly monitors the database to maintain its performance and try to improve it. Introduc tion to Database Systems 13 9. Backup : DBA takes regular backup of database, so that it can be used during system failure. Backup is also used for checking data for consistency. 10. Removal of dump and maintain free space : DBA is responsible for removing unnecessary data from storage and maintain enough free space for daily operations. He can also increase storage capacity when necessary. 11. Checks : DBA also decides various security and validation checks over database to ensure consistency. 12. Liaisioning with users : Another task of the DBA is to liaisioning with users and ensure the availability of the data they require and write the necessary external schemas. 1.5 DataBase or DBMS Languages The DBMS provides different languages and interfaces for each category of users to express database queries and updations. When the design of the database is complete and the DBMS is chosen to implement it, the first thing to be done is to specify the conceptual and internal schemas for the database and the corresponding mappings. The following five languages are available to specify different schemas. 1. Data Definition Language (DDL) 2. Storage Definition Language (SDL) 3. View Definition Language (VDL) 4. Data Manipulation Language (DML) 5. Fourth-Generation Language (4-GL) 1.5.1 Data Definition Language (DDL) It is used to specify a database conceptual schema using set of definitions. It supports the definition or declaration of database objects. Many techniques are available for writing DDL. One widely used technique is writing DDL into a text file. More about DDL in chapter 7. 1.5.2 Storage Definition Language (SDL) It is used to specify the internal schema in the database. The storage structure and access methods used by the database system is specified by the specified set of SDL statements. The implementation details of the database schemas are implemented by the specified SDL statements and are usually hidden from the users. 1.5.3 View Definition Language (VDL) It is used to specify user’s views and their mappings to the conceptual schema. But generally, DDL is used to specify both conceptual and external schemas in many DBMS’s. There are two views of data the logical view—that is perceived by the programmer and physical view—data stored on storage devices. 1.5.4 Data Manipulation Language (DML) It provides a set of operations to support the basic data manipulation operations on the data held in the database. It is used to query, update or retrieve data stored in a database. The part of DML that provide data retrieval is called query language. 14 Introduc tion to Database Management System The DML is of the two types : (i) Procedural DML : It allows the user to tell the system what data is needed and how to retrieve it. (ii) Non-procedural DML : It allows the user to state what data are needed, rather than how it is to be retrieved. More about DML in chapter 7. 1.5.5 Fourth-Generation Language (4-GL) It is a compact, efficient and non-procedural programming language used to improve the efficiency and productivity of the DBMS. In this, the user defines what is to be done and not how it is to be done. The 4-GL has the following components in it. These are : (a) Query languages (b) Report (c) Spread sheets (d) Database languages (e) Application generators (f ) High level languages to generate application program. System Query Language (SQL) is an example of 4-GL. More about SQL in Chapter 7. 1.6 Schemas, Subschema and Instances The plans of the database and data stored in the database are most important for an organization, since database is designed to provide information to the organization. The data stored in the database changes regularly but the plans remain static for longer periods of time. 1.6.1 Schema A schema is plan of the database that give the names of the entities and attributes and the relationship among them. A schema includes the definition of the database name, the record type and the components that make up the records. Alternatively, it is defined as a frame-work into which the values of the data items are fitted. The values fitted into the frame-work changes regularly but the format of schema remains the same e.g., consider the database consisting of three files ITEM, CUSTOMER and SALES. The data structure diagram for this schema is shown in Figure 1.5. The schema is shown in database language. Generally, a schema can be partitioned into two categories, i.e., (i) Logical schema and (ii) Physical schema. (i) The logical schema is concerned with exploiting the data structures offered by the DBMS so that the schema becomes understandable to the computer. It is important as programs use it to construct applications. (ii) The physical schema is concerned with the manner in which the conceptual database get represented in the computer as a stored database. It is hidden behind the logical schema and can usually be modified without affecting the application programs. The DBMS’s provide DDL and DSDL to specify both the logical and physical schema. Introduc tion to Database Systems 15 Schema name is ITEM_SALES_REC type ITEM = record ITEM_ID: string; ITEM_DESC: string; Attributes/data items ITEM_COST: integer; end type CUSTOMER = record CUSTOMER_ID = integer; CUSTOMER_NAME = string; CUSTOMER_ADD = string; CUSTOMER_CITY = string; CUSTOMER_BAL = integer; end type SALES = RECORD CUSTOMER_ID = integer; ITEM_ID = string; ITEM_QTY = integer; ITEM_PRICE = integer; end FIGURE 1.5. Data structure diagram for the item sales record. 1.6.2 Subschema A subschema is a subset of the schema having the same properties that a schema has. It identifies a subset of areas, sets, records, and data names defined in the database schema available to user sessions. The subschema allows the user to view only that part of the database that is of interest to him. The subschema defines the portion of the database as seen by the application programs and the application programs can have different view of data stored in the database. The different application programs can change their respective subschema without affecting other’s subschema or view. The Subschema Definition Language (SDL) is used to specify a subschema in the DBMS. 1.6.3 Instances The data in the database at a particular moment of time is called an instance or a database state. In a given instance, each schema construct has its own current set of instances. Many instances or database states can be constructed to correspond to a particular database schema. Everytime we update (i.e., insert, delete or modify) the value of a data item in a record, one state of the database changes into another state. The Figure 1.6 shows an instance of the ITEM relation in a database schema. 16 Introduc tion to Database Management System ITEM ITEM-ID ITEM_DESC ITEM_COST 1111A Nutt 3 1112A Bolt 5 1113A Belt 100 1144B Screw 2 FIGURE 1.6. An instance/database state of the ITEM relation. 1.7 Three Level Architecture of Database Systems (DBMS) The architecture is a framework for describing database concepts and specifying the structure of database system. The three level architecture was suggested by Ansi/Sparc. Here database is divided into three levels external level, conceptual level and internal level as shown in Figure 1.7. Views and External External External External mappings view 1 view 2 view N level are maintained by DBA External conceptual External/conceptual External/conceptual mapping 1 mapping 2 mapping N Conceptual Conceptual view Database level Management System Conceptual/Internal mapping Internal Internal view (Physical level storage of data) FIGURE 1.7. Three level architecture of DBMS. 1.7.1 Levels or Views The three levels or views are discussed below: (i) Internal Level : Internal level describes the actual physical storage of data or the way in which the data is actually stored in memory. This level is not relational because data is stored according to various coding schemes instead of tabular form (in tables). This is the low level representation of entire database. The internal view is described by means of an internal schema. Introduc tion to Database Systems 17 The internal level is concerned with the following aspects: – Storage space allocation – Access paths – Data compression and encryption techniques – Record placement etc. The internal level provides coverage to the data structures and file organizations used to store data on storage devices. (ii) Conceptual Level : The conceptual level is also known as logical level which describes the overall logical structure of whole database for a community of users. This level is relational because data visible at this level will be relational tables and operators will be relational operators. This level represents entire contents of the database in an abstract form in comparison with physical level. Here conceptual schema is defined which hides the actual physical storage and concentrate on relational model of database. (iii) External Level : The external level is concerned with individual users. This level describes the actual view of data seen by individual users. The external schema is defined by the DBA for every user. The remaining part of database is hidden from that user. This means user can only access data of its own interest. In other words, user can access only that part of database for which he is authorized by DBA. This level is also relational or very close to it. 1.7.2 Different Mappings in Three Level Architecture of DBMS The process of transforming requests and results between the three levels are called mappings. The database management system is responsible for this mapping between internal, external and conceptual schemas. There are two types of mappings: 1. Conceptual/Internal mapping. 2. The External/Conceptual mapping. 1. The Conceptual/Internal Mapping : This mapping defines the correspondence or operations between the conceptual view and the physical view. It specifies how the data is retrieved from physical storage and shown at conceptual level and vice-versa. It specifies how conceptual records and fields are represented at the internal level. It also allows any differences in entity names, attribute names and their orders, data types etc., to be resolved. 2. The External/Conceptual Mapping : This mapping defines the correspondence between the conceptual view and the physical view. It specifies how the data is retrieved from conceptual level and shown at external level because at external level some part of database is hidden from a particular user and even names of data fields are changed etc. There could be one mapping between conceptual and internal level and several mappings between external and conceptual level. The physical data independence is achieved through conceptual/internal mapping while the logical data independence is achieved through external/ conceptual mapping. The information about the mapping requests among various schema levels are included in the system catalog of DBMS. When schema is changed at some level, the schema at the next higher level remains unchanged, only the mapping between the two levels is changed. 18 Introduc tion to Database Management System 1.7.3 Advantages of Three-level Architecture The motive behind the three-level architecture is to isolate each user’s view of the database from the way the database is physically stored or represented. The advantages of the three- level architecture are as follows : 1. Each user is able to access the same data but have a different customized view of the data as per the requirement. 2. The changes to physical storage organization does not affect the internal structure of the database. e.g., moving the database to a new storage device. 3. To use the database, the user is no need to concern about the physical data storage details. 4. The conceptual structure of the database can be changed by the DBA without affecting any user. 5. The database storage structure can be changed by the DBA without affecting the user’s view. 1.7.4 Data Independence It is defined as the characteristics of a database system to change the schema at one level without having to change the schema at the next higher level. It can also be defined as the immunity of the application programs to change in the physical representation and access techniques of the database. The above definition says that the application programs do not depend on any particular physical representation or access technique of the database. The DBMS achieved the data independence by the use of three-level architecture. The data independence is of Two types: 1. Physical Data Independence : It indicates that the physical storage structures or devices used for storing the data could be changed without changing the conceptual view or any of the external views. Only the mapping between the conceptual and internal level is changed. Thus, in physical data independence, the conceptual schema insulates the users from changes in the physical storage of the data. 2. Logical Data Independence : It indicates that the conceptual schema can be changed without changing the existing external schemas. Only the mapping between the external and conceptual level is changed and absorbed all the changes of the conceptual schema. DBMS that supports logical data independence, changes to the conceptual schema is possible without making any change in the existing external schemas or rewriting the application programs. Logical data independence also insulates application programs from operations like combining of two records into one or splitting an existing record into more than one records. 1.8 Data Models A data model is a collection of concepts that can be used to describe the structure of the database including data types, relationships and the constraints that apply on the data. A data model helps in understanding the meaning of the data and ensures that, we understand. Introduc tion to Database Systems 19 – The data requirements of each user. – The use of data across various applications. – The nature of data independent of its physical representations. A data model supports communication between the users and database designers. The major use of data model is to understand the meaning of the data and to facilitate communication about the user requirements. Characteristics of Data Models A data model must posses the following characteristics so that the best possible data representation can be obtained. (i) Diagrammatic representation of the data model. (ii) Simplicity in designing i.e., Data and their relationships can be expressed and distinguished easily. (iii) Application independent, so that different applications can share it. (iv) Data representation must be without duplication. (v) Bottom-up approach must be followed. (vi) Consistency and structure validation must be maintained. 1.8.1 Types of Data Models The various data models can be divided into three categories, such as (i) Record Based Data Models. (ii) Object Based Data Models. (iii) Physical Data Models. (i) Record Based Data Models : These models represent data by using the record structures. These models lie between the object based data models and the physical data models. These data models can be further categorised into three types: (a) Hierarchical Data Model (b) Network Data Model (c) Relational Data Model. (ii) Object Based Data Models : These models are used in describing the data at the logical and user view levels. These models allow the users to implicity specify the constraints in the data. These data models can be further categorised into four types: (a) Entity Relationship Model (ER-Model) (b) Object Oriented Model (c) Semantic Data Model (d) Functional Data Model. The models are discussed in the coming sections. (iii) Physical Data Models : These models provide the concepts that describes the details of how the data is stored in the computer along with their record structures, access paths and ordering. Only specialized or professional users can use these models. These data models can be divided into two types: 20 Introduc tion to Database Management System (a) Unifying Model. (b) Frame Memory Model. 1.8.1.1 Record based Data Models Record based data models represent data by using the record structures. These are used to describe data at the conceptual view level. These are named because the database is structured in a fixed format records of several types. The use of fixed length records simplify the physical level implementation of the database. These models lie between the object based data models and the physical data models. These models provide the concepts that may be understood by the end users. These data models do not implement the full detail of the data storage on a computer system. Thus, these models are used to specify overall logical structure of the database and to provide high level description of implementation. These are generally used in traditional DBMS’s and are also known as ‘Representational Data Models’. The various categories of record based data models are as follows: (i) Hierarchical Data Model (ii) Network Data Model (iii) Relational Data Model. (i) Hierarchical Data Model : Hierarchical Data Model is one of the oldest database models. The hierarchical model became popular with the introduction of IBM’s Information Management System (IMS). The hierarchical data model organizes records in a tree structure i.e., hierarchy of parent and child records relationships. This model employs two main concepts : Record and Parent Child Relationship. A record is a collection of field values that provide information of an entity. A Parent Child Relationship type is a 1 : N relationship between two record types. The record type of one side is called the parent record type and the one on the N side is called the child record type. In terms of tree data structure, a record type corresponds to node of a tree and relationship type corresponds to edge of the tree. The model requires that each child record can be linked to only one parent and child can only be reached through its parent. WORLD Continent ASIA EUROPE AUSTRALIA Etc. Country INDIA CHINA PAKISTAN Etc. (Grand Parent) State PUNJAB HARYANA RAJASTHAN Etc. (Parent) District ROHTAK SIRSA HISSAR Etc. (Child) FIGURE 1.8. Hierarchical Model. Introduc tion to Database Systems 21 In the Figure 1.8, the ‘World’ acts as a root of the tree structure which has many children’s like Asia, Europe, Australia etc. These children can act as a parent for different countries such as ASIA continents acts as a parent for countries like India, China, Pakistan etc. Similarly these children can act as a parent for different states such as INDIA country acts as a parent for states Punjab, Haryana, Rajasthan etc. Further the same follows. Consider child ‘Rohtak’ which has a parent ‘Haryana’ which further has a parent ‘India’ and so on. Now ‘India’ will acts a grandparent for the child ‘Rohtak’. The major advantages of Hierarchical Model are that it is simple, efficient, maintains data integrity and is the first model that provides the concept of data security. The major disadvantages of Hierarchical model are that it is complex to implement, Lacking of structural independence, operational anomalies and data management problem. (ii) Network Data Model : As a result of limitations in the hierarchical model, designers developed the Network Model. The ability of this model to handle many to many (N : N) relations between its records is the main distinguishing feature from the hierarchical model. Thus, this model permits a child record to have more than one parent. In this model, directed graphs are used instead of tree structure in which a node can have more than one parent. This model was basically designed to handle non-hierarchical relationships. The relationships between specific records of 1 : 1 (one to one), 1 : N (one to many) or N : N (many to many) are explicitly defined in database definition of this model. The Network Model was standardized as the Codasyl Dbtg (Conference of Data System Languages, Database Task Group) model. There are two basic data structures in this model—Records and Sets. The record contains the detailed information regarding the data which are classified into record types. A set type represents relationship between record types and this model use linked lists to represent these relationships. Each set type definition consists of three basic elements : a name for set type an owner record type (like parent) and a member record type (like child). To represent many to many relationship in this model, the relationship is decomposed into two one to many (1 : N) relationships by introducing an additional record type called an Intersection Record or Connection Record. The major advantages of Network Model are that it is conceptually simple, Handles more relationship types, promotes database integrity, data access flexibility and conformance to the standards. The major disadvantages of Network Model are that it is complex and lack of structural independence. (iii) Relational data Model : The Relational Model was first introduced by Dr. Edgar Frank, an Oxford-trained Mathematician, while working in IBM Research Centre in 1970’s. The Relational Model is considered one of the most popular developments in the database technology because it can be used for representing most of the real world objects and the relationships between them. The main significance of the model is the absolute separation of the logical view and the physical view of the data. The physical view in relational model is implementation dependent and not further defined. The logical view of data in relational model is set oriented. A relational set is an unordered group of items. The field in the items are the columns. The column in a table have names. 22 Introduc tion to Database Management System The rows are unordered and unnamed. A database consists of one or more tables plus a catalogue describing the database. The relational model consists of three components: 1. A structural component—A set of tables (also called relations) and set of domains that defines the way data can be represented. 2. A set of rules for maintaining the integrity of the database. 3. A manipulative component consisting of a set of high-level operations which act upon and produce whole tables. In the relational model the data is represented in the form of tables which is used interchangeably with the word Relation. Each table consists of rows also knowns as tuples (A tuple represents a collection of information about an item, e.g., student record) and column also known as attributes. (An attribute represents the characteristics of an item, e.g., Student’s Name and Phone No.). There are relationships existing between different tables. This model doesn’t require any information that specifies how the data should be stored physically. The major advantages of Relational Model are that it is structurally independent, improved conceptual simplicity adhoc query capability and powerful DBMS. The major disadvantages of relational model are substantial hardware and software overhead and facilitates poor design and implementation. 1.8.1.2 Object Based Data Models Object Based Data Models are also known as conceptual models used for defining concepts including entries, attributes and relationships between them. These models are used in describing data at the logical and user view levels. These models allow the constraints to be specified on the data explicitly by the users. An entity is a distinct object which has existence in real world. It will be implemented as a table in a database. An attribute is the property of an entity, in other words, attribute is a single atomic unit of information that describes something about its entity. It will be implemented as a column or field in the database. The associations or links between the various entities is known as relationships. There are 4 types of object based data models. These are: (a) Entity-relationship (E-R) Model (b) Object-Oriented Model (c) Semantic Data Model (d) Functional Data Model These are discussed as follows: (a) Entity-Relationship (E-R) Model : The E-R model is a high level conceptual data model developed by Chen in 1976 to facilitate database design. The E-R model is the generalization of earlier available commercial model like the hierarchical and network model. It also allows the representation of the various constraints as well as their relationships. The relationship between entity sets is represented by a name. E-R relationship is of 1 : 1