DBMS U-1 Slides.pdf
Document Details
Uploaded by Deleted User
Tags
Full Transcript
Database Management Systems Introduction Database Management Systems Course Outline Unit 1: Introduction to Database Management Database System Applications, Purpose, View of data, Database Languages, Database design- Introduction to databases, Database application architecture, Users and...
Database Management Systems Introduction Database Management Systems Course Outline Unit 1: Introduction to Database Management Database System Applications, Purpose, View of data, Database Languages, Database design- Introduction to databases, Database application architecture, Users and Administrators, E-R Model, reducing ER to a relational schema. Structure of relational databases, Database schema, and its constraints, Keys 14 Hours Unit 2: Relational Model and Database Design Relational operations (Algebra), Unary Operations - Unity, Binary, Aggregate Functions, Grouping, SQL overview, Data definition, Structure of SQL queries, Additional Basic Operations, Set Operations, Null Values, Aggregate Functions, Nested Subqueries, Database Modification, Join expressions, Views, Triggers, Functions, and Procedures 14 Hours Database Management Systems Course Outline Unit 3: Advanced Design Concepts and Implementation Functional Dependencies, Inference Rules, Closure, Equivalence, Minimal Cover Normal Forms Based on Primary Keys (1NF, 2NF, and 3NF), General Definitions of Second and Third Normal Forms Boyce-Codd Normal Form, Properties of Relational Decompositions, Overview of Higher Normal Forms. Database transactions, Concurrency control, Locking, Recovery, Database Security 14 Hours Unit 4: Advanced Databases Query Processing and Optimization, Accessing SQL from a Programming Language, Structured, Semi structured, Unstructured data, Introduction to NoSQL databases, CAP theorem, Document database (MongoDB), Key-Value database (DynamoDB), Graph databases (Neo4j) 14 Hours Database Management Systems Course Outline Lab/Hands-on sessions Draw an ER diagram for a given problem statement Conversion of an ER diagram into Relational schema DDL – create, constraints, alter, rename, drop, truncate table, Views. 4. DML – Insert, Update, Delete, Transactions - commit, rollback, savepoint SQL - Set operators: union, intersect, minus. SQL – Aggregate functions. SQL – Joins: inner, outer; Sub queries: correlated and uncorrelated 8. SQL – Creating Functions and Procedures SQL – Creating Triggers and Cursors XML- Database access 11. NoSQL database queries High-level programming language accessing a database using an API. Tools/ Languages: MySQL Workbench, Python, ERwin, Any other tool for ER modeling Database Management Systems Textbooks Textbook 1 (TB1) Textbook 2 (TB2) Database Management Systems Course Evaluation Policy : Tentative DBMS- 5 credits course Theory Lab Components Marks Scaled Marks Marks Scaled Marks Components Lab exercises 70 10 ISA1-CBT 40 20 Mini project 10 05 ISA2-CBT 40 20 Lab- ESA 10 05 Experiential 10 10 learning Total 20 ESA- Descriptive 100 50 Total 100 Database Management Systems Unit 1: Introduction to Database Management Slides adapted from Author Slides of “Database System Concepts”, Silberschatz, H Korth and S Sudarshan, McGrawHill, 7th Edition, 2019. And Author Slides of Fundamentals of Database Systems”, Ramez Elamsri, Shamkant B Navathe, Pearson, 7th Edition, 2017. Database Management Systems Introduction Have you ever wondered on how a large e-commerce website like Amazon is able to retrieve the products that you would want to purchase? Nowadays social media is one of the most active platforms where people post their day to day activities etc. how and where is all these information stored? By typing a persons name we can get to see their entire profile, posts that person has posted how is this possible? In a bank, multiple transactions take place everyday. Even if 2 transactions are happening simultaneously from account, still the money In the accounts is properly deducted and credited how does this happen? Database Management Systems Introduction Suppose you visit your doctor for a general checkup. Now by typing in your name or your ID, all your medical history details would be visible to the doctor. How does this happen? When you search for a specific topic on Google, how does the search engine quickly retrieve relevant web pages from millions of websites and present them to you? Database Management Systems Introduction All the above questions have a single answer and that is DATABASE Before going ahead and understanding what database exactly and how is designed and used for all the other application, let us try answering a few questions: What do you mean by Data? How is it different from Information? In current world, there is a lot of data that is available, how are all these stored or organised so that we can get the required information? Database Management Systems What Do you mean by Data & Information? Data Data represents the raw material, like individual puzzle pieces, that serves as the foundation for knowledge. Data is a raw and unorganized fact that is required to be processed to make it meaningful. It is raw and unorganized Information When data is processed, organized, structured, or presented in a given context, so as to make it useful is called information Data is a collection of facts, while information puts those facts into context It is organized and is utilized by humans in some significant Data doesn’t depend on information, Information depends way to make decisions and draw some conclusions on data Database Management Systems What is a Database? Database A database is a collection of related data representing some aspect of the real world, also called the mini-world or the universe of discourse (UoD). It is a logically coherent collection, meaning it is not just a random assortment of data but is organized with inherent meaning and structure. Databases are designed, built, and populated for a specific purpose, catering to the needs of applications or systems that interact with them. They can vary in size and complexity, ranging from small databases used by individual applications to large-scale enterprise databases handling vast amounts of data. Could you think of what do these individual terms mean w.r.t student information Database? Database Management Systems What is a Database Database Management Systems Database Management System Database Management System A modern database system is a complex software system whose task is to manage a large, complex collection of data. DBMS contains information about a particular enterprise ○ Collection of interrelated data ○ Set of programs to access the data ○ An environment that is both convenient and efficient to use Complex Database systems are used to manage collections of data that are: ○ Highly valuable ○ Relatively large ○ Accessed by multiple users and applications, often at the same time. Let us break these statements down further and understand them in the context of the student information database Database Management Systems Student Information Database 15 Database Management Systems Database Management System Database Management System A general-purpose software system that facilitates the processes of defining, constructing, manipulating, and sharing databases among various users and applications ○ Defining a database involves specifying the data types, structures, and constraints of the data to be stored in the database. By defining the structure and data types, the DBMS ensures the consistency and integrity of the stored data. The "Students" table may include fields such as "Student ID" (numeric), "Name" (text), "Email" (text), "Date of Birth" (date), and "Major" (text). The "Courses" table may have fields like "Course ID" (numeric), "Course Name" (text), "Credits" (numeric), and "Instructor" (text). Database Management Systems Database Management System Database Management System A general-purpose software system that facilitates the processes of defining, constructing, manipulating, and sharing databases among various users and applications ○ Constructing the database is the process of storing the data on some storage medium that is controlled by the DBMS. It organizes the data in a way that allows for efficient retrieval and manipulation. The DBMS ensures data is securely stored and can be accessed by authorized users or applications. The DBMS may store the Student Information Database on a server with appropriate data storage capacity and performance capabilities. Database Management Systems Database Management System Database Management System A general-purpose software system that facilitates the processes of defining, constructing, manipulating, and sharing databases among various users and applications ○ Manipulating a database includes functions such as querying the database to retrieve specific data, updating the database to reflect changes in the mini world, and generating reports from the data. It ensures that any changes made to the database follow predefined constraints and maintain data consistency. A university administrator may use SQL queries to retrieve specific information, such as "Retrieve all students enrolled in Computer Science courses." The DBMS processes the query, retrieves the relevant data from the database, and presents it to the user. Allows authorized users to update the database, such as adding new students or updating grades. Database Management Systems Database Management System Database Management System A general-purpose software system that facilitates the processes of defining, constructing, manipulating, and sharing databases among various users and applications ○ Sharing a database allows multiple users and programs to access the database simultaneously, provided they have the necessary permissions Faculty members, administrative staff, and students can access relevant information concurrently, as long as they have the necessary permissions. For example, faculty members can access student details for their courses, while administrators can update and maintain the overall database. Database Management Systems Why study databases Databases help us to: Store large amounts of data (file structure, disk management) Understand the data (data models) Keeps data secure (security, recovery) Find required data and use/manipulate it (query languages, concurrency control, and data analysis tools) Get accurate information (as databases have built-in constraints and checks help in this) Maintain Data integrity (ensures data is accurate and consistent) Database Management Systems Database Applications Enterprise Information : ○ Sales: Customer, product, and purchase information. ○ Accounting: Payments, receipts, account balances, assets, and other accounting data. ○ Human Resources: Employee information, salaries, payroll taxes, benefits, and paycheck generation. Manufacturing: ○ Supply chain management, production tracking, inventory, and orders. Banking and Finance: ○ Customer data, accounts, loans, banking transactions, credit card transactions, and finance-related data. Database Management Systems Database Applications Universities: ○ Student information, course registrations, and grades. Airlines: ○ Reservations and schedule information. Telecommunication: ○ Call, text, and data usage records, billing, prepaid card balances, and communication network data. Web-based Services: ○ Social media records, online retailers' sales data, and orders, click history for targeted advertisements, etc. Database Management Systems Database Applications Document Databases: ○ Collections of articles, patents, research papers, etc. Navigation Systems: ○ Locations of places of interest, routes of roads, trains, buses, Database Management Systems How are databases used? There are two modes in which databases are used today Online Transaction Processing (OLTP): ○ Used by a large number of users for small data retrieval and updates ○ common in most database applications like banking, universities, and airlines. Data Analytics: ○ Involves processing data to draw conclusions and create predictive models for business decisions. ○ Examples include loan approval, targeted advertisements, and manufacturing decisions. ○ Data mining combines AI and statistical techniques for efficient analysis of large databases. THANK YOU S Nagasundari Department of Computer Science and Engineering [email protected] Database Management Systems Purpose of Database System Database Management Systems Unit 1: Introduction to Database Management Slides adapted from Author Slides of “Database System Concepts”, Silberschatz, H Korth and S Sudarshan, McGrawHill, 7th Edition, 2019. And Author Slides of Fundamentals of Database Systems”, Ramez Elamsri, Shamkant B Navathe, Pearson, 7th Edition, 2017. Database Management Systems File-processing Systems For understanding the purpose let us take an example of a University Organization system For a university Organization system we need to store information about the instructors, students, departments, and the courses offered. One of the ways to keep this information is to store it in operating file systems To allow the users to manipulate the information, the system would have a number of application programs that manipulate the files including: ○ Addition of new students, instructors, and courses ○ Register students for different courses ○ Assign grades to students, compute GPA, and generate transcripts. Database Management Systems File-processing Systems Now let us consider the case that the university has decided to add a new major. As a result of which the university creates a new department and creates new permanent files to record information related to this. Similarly the university would have to write new application programs to deal with/manage rules with respect to the new major. This way whenever a new change or addition is made, the university would have to write a new application program to meet the needs of the University This typical file-processing system is supported by a conventional operating system. The system stores permanent records in various files, and it needs different application programs to extract records from and add records to, the appropriate files. Database Management Systems File-processing Systems Then what is the need for a Database Management System when we can do the university organization using a simple file-processing system? Consider the following situations: What if there is a student who has enrolled for a double major (Physics and Computer Science) how many times do we store all his details? What happens if a student changes their address or phone number? Is it guaranteed that the update reflects in all relevant files across departments? Suppose the university transport department wants to find out all students who live within a particular postal code area. Can he quickly retrieve the data or does he have to build another application for it? What if files that store these data are written in different formats is it easy to access the data in this form? Suppose a new constraint is to be added, can it be done efficiently without having to make extensive changes to existing application programs? Database Management Systems Purpose of Database Systems If we try answering the questions on the previous slide we can understand that there are a lot of drawbacks to the file-processing system : Data Redundancy and Inconsistency Difficulty in accessing the data Data isolation Integrity problems Atomicity problems Concurrent access anomalies Security problems Let us understand what are these problems Database Management Systems Purpose of Database Systems 1. Data Redundancy & Inconsistency Considering the question of the student has enrolled for a double major let’s say in Physics and Computer Science. What would happen in this situation in case of file Solution to this problem: system? - Store information in centralized Details of the students would be duplicated for each dept location thus Minimizes 🡪Data Redundancy redundancy - Changes can be made at a single This would lead to the following problems: spot, thus ensuring consistency - This can be achieved through a Takes up more storage space database system May also lead to Data Inconsistency in case of changes Database Management Systems Purpose of Database Systems 2. Difficulty in accessing the data Considering that the university wants to find all the students living in a specific postal code area. How will we The solution to this problem: execute this in traditional file system? - Central information store for fast Either we need to get a list of all students and retrieval manually sift through it - Use a powerful query language for answering new queries Or create another special program for this task - Efficient ways to search and filter data for speeding up data Both the above-mentioned are time-consuming and retrieval inconvenient - This can be achieved through a Thus we can see that Conventional file-processing database system systems are not designed for quick and efficient data retrieval, especially when new types of queries arise. Database Management Systems Purpose of Database Systems 3. Data Isolation In a conventional file-based system, data is scattered across different files in various formats. Due to this, writing a new application program to retrieve The solution to this problem: the required data is quite challenging. - Centralized system It's like trying to find puzzle pieces from different sets and - Uniform and standardized way to putting them together, which can be a complicated and access the data through queries time-consuming task. so as to Simplifies complex programs - These benefits are provided by DBMS Database Management Systems Purpose of Database Systems 4. Integrity problems Let us consider example, assume that there is an account for every department that contains the funds for carrying The solution to this problem: out some research work, etc. and let us say that account - Specify integrity constraints at balance should always be above 0. how can we the database level to specify implement this? rules directly in the schema - System should enforce For this we require to careful handle each application constraints automatically program for maintaining accounts that have been written - For a new constraint, Updation of for each department the schema should be sufficient And also if a new constraint has to be added it makes the and the rest of the things would job a very tough task for us to edit all the application be taken care of automatically programs - This would be easily solved by using DBMS which provides all these facilities Database Management Systems Purpose of Database Systems 5. Atomicity problems In a computer system failure can happen unexpectedly Now let us consider a banking system, suppose a sum of 5000 rupees is getting transferred from account A to B. so internally : An amount of 5000 Rs. gets deducted from account A The solution to this problem: The amount (5000 Rs.) gets credited to account B - Atomicity should be ensured - Automatic rollback in case of Now either all of these steps should be performed or non failure should be executed. - DBMS would ensure this and In a file system, The program might successfully deduct makes it reliable and consistent $500 from account A but fail to credit it to account B. even if failure occurs This results in an inconsistent database state, with money missing from one account and not added to the other. Database Management Systems Purpose of Database Systems 6. Concurrent Access Anomalies Let us again consider a bank transaction example. Now in this situation Let's say there's a bank account (Account A) with a balance of $10,000. The solution to this problem: Two bank clerks attempt to deduct money from the - Transaction and Locking to be account at almost the same time - one deducts $500 implemented and the other $100. - Transactions ensure atomicity Since they work concurrently, they both read the - Locking prevents conflicting initial balance of $10,000, subtract their respective changes. amounts, and write back the results. - The DBMS would provide these Depending on which one writes the new balance last, and maintains data consistency the account could end up with $9,500 or $9,900 and integrity instead of the correct value of $9,400. This would pose serious problems to the bank Database Management Systems Purpose of Database Systems 7. Security problems Imagine a university using a file-based system. Over time, various application programs have been added without a The solution to this problem: centralized approach to security. - Create provision for Due to the above reason Payroll personnel might administrator to define access unintentionally view academic records and even can controls and permissions for modify it, compromising data privacy and confidentiality. each user or group. Because all people would have same permissions. - These permissions should enforced at the database level, regardless of the application programs used. - DBMS supports various methods for introducing security like User Authentication, Data Encryption, Auditing and Logging, Centralized security management Database Management Systems Characteristics of Database Systems Self-describing nature of a database system The database system contains not only the database itself but also a complete definition or description of the database structure and constraints (Metadata). Insulation between programs and data, and data abstraction The structure of data files is stored in the DBMS catalog separately from the access programs (program-data independence) Support multiple views of the data DBMSs support different types of users, each of whom may require a different perspective or view of the database. A view may be a subset of the database or it may contain virtual data that is derived from the database files but is not explicitly stored. Sharing of data and multiuser transaction processing A multiuser DBMS, allows multiple users to access the database at the same time. It ensures that data changes are effected in a controlled manner so that the result of the changes are correct Database Management Systems Advantages of Database Systems Controlling redundancy in data storage. Sharing of data among multiple users. Restricting unauthorized access to data Providing persistent storage for program Objects ○ E.g., Object-oriented DBMSs make program objects persistent Providing Storage Structures and search techniques for efficient Query Processing Providing backup and recovery Providing multiple interfaces to different classes of users. Database Management Systems Advantages of Database Systems Representing complex relationships among data. Enforcing integrity constraints on the database. Drawing inferences and actions from the stored data using deductive and active rules and triggers Potential for enforcing standards Reduced application development time Flexibility to change data structures Availability of current information Economies of scale THANK YOU S Nagasundari Department of Computer Science and Engineering [email protected] Database Management Systems View of Data Database Management Systems Unit 1: Introduction to Database Management Slides adapted from Author Slides of “Database System Concepts”, Silberschatz, H Korth and S Sudarshan, McGrawHill, 7th Edition, 2019. And Author Slides of Fundamentals of Database Systems”, Ramez Elamsri, Shamkant B Navathe, Pearson, 7th Edition, 2017. Database Management Systems View of Data A database system is a collection of interrelated data and a set of programs that allow users to access and modify these data. A major purpose of a database system is to provide users with an abstract view of the data. ○ Data models A collection of conceptual tools for describing data, data relationships, data semantics, and consistency constraints. ○ Data abstraction Hide the complexity of data structures to represent data in the database from users through several levels of data abstraction. The structure of a database is defined by its data model, which includes tools to describe data, relationships, semantics, and consistency constraints. Database Management Systems Data Models RELATIONAL MODEL Uses a collection of tables to represent both data and the relationships among those data. Each table has multiple columns, and each column has a unique name. Tables are also known as relations. The relational model is an example of a record-based model. Record-based models are so named because the database is structured in fixed-format records of several types. Each table contains records of a particular type. Each record type defines a fixed number of fields or attributes. Each row of the table represents one piece of information The columns of the table correspond to the attributes of the record type. The relational data model is the most widely used data model, and a vast majority of current database systems are based on the relational model. Database Management Systems Data Models Entity-Relationship Model The entity-relationship (E-R) data model uses a collection of basic objects, called entities, and relationships among these objects. An entity is a “thing” or “object” in the real world that is distinguishable from other objects. The entity-relationship model is widely used in database design. Database Management Systems Data Models Semi Structured Data Model The semi-structured data model permits the specification of data where individual data items of the same type may have different sets of attributes. This is in contrast to the data models mentioned earlier, where every data item of a particular type must have the same set of attributes. JSON and Extensible Markup Language (XML) are widely used semi-structured data representations A database model where there is no separation between the data and the schema, and the amount of structure used depends on the purpose. The advantages of this model are the following: It can represent the information of some data sources that cannot be constrained by schema. Example of XML data Database Management Systems Data Models Object-Based Data Model. Object-oriented programming (especially in Java, C++, or C#) has become the dominant software-development methodology. This led initially to the development of a distinct object-oriented data model, but today the concept of objects is well integrated into relational databases. Standards exist to store objects in relational tables. Database systems allow procedures to be stored in the database system and executed by the database system. This can be seen as extending the relational model with notions of encapsulation, methods, and object identity. THANK YOU S Nagasundari Department of Computer Science and Engineering [email protected] Database Management Systems Data Abstraction & Instances, Schema Database Management Systems Unit 1: Introduction to Database Management Slides adapted from Author Slides of “Database System Concepts”, Silberschatz, H Korth and S Sudarshan, McGrawHill, 7th Edition, 2019. And Author Slides of Fundamentals of Database Systems”, Ramez Elamsri, Shamkant B Navathe, Pearson, 7th Edition, 2017. Database Management Systems Data Abstraction As discussed earlier DBMS plays an important role in many areas. And one of the main requirements of DBMS is that it must retrieve the data efficiently. This need for efficiency has led the database developers to use complex data structures to represent data in a database Many database-system users are not computer trained and do not know how to write complex queries for getting information they are looking for Hence developer hide all these complexity from the users through several layers of data abstraction, in order to simplify the users’ interaction with the system The different levels of data abstraction: ○ Physical Level ○ Logical level ○ View level Database Management Systems Data Abstraction Let us now have a look at what happens at each level: Physical Level : It is the lowest level of data abstraction It describes how the data is actually stored in the computer Logical Level : This describes what kind of data is stored in the database and how different pieces of information are related to each other The logical level thus describes the entire database in terms of a small number of relatively simple structures. This level provides a more organized and simpler view of the data Database Management Systems Data Abstraction Let us now have a look at what happens at each level: View Level : This is the highest level of abstraction that individual users interact with and that only displays a part of the entire database Even though the logical level uses simpler structures, complexity remains because of the variety of information stored in a large database. Many users of the database system do not need all this information; instead, they need to access only a part of the database. At this level, different users might see different parts of the database, depending on their needs and permissions Database Management Systems Data Abstraction Now we can notice that: Although the implementation of simple structures at the logical level may involve complex physical-level structures, the user at the logical level does not need to be aware of this Similarly the user who views the database for various purposes need not be aware of the Logical level implementation, for them it would be like an interface and need not worry about the logical view Database Management Systems Data Abstraction PHYSICAL DATA INDEPENDENCE: It refers to the characteristic of being able to modify the physical schema without any alterations to the conceptual or logical schema, done for optimization purposes Example : the Conceptual structure of the database would not be affected by any change in the storage size of the database system server. Changing from sequential to random access files is one such example. These alterations or modifications to the physical structure may include: ○ Utilizing new storage devices. ○ Modifying data structures used for storage. ○ Altering indexes or using alternative file organization techniques etc. Database Management Systems Data Abstraction LOGICAL DATA INDEPENDENCE It refers characteristic of being able to modify the logical schema without affecting the external schema or application program. The user view of the data would not be affected by any changes to the conceptual view of the data. Example: These changes may include the insertion or deletion of attributes, altering table structures entities or relationships to the logical schema, etc. Database Management Systems Data Abstraction Now let us understand this with the help of an example of a University database Let the university have the given Record types/ tables as shown Now for the given database, Physical level: ○ specify how student records, course information, and department data are stored on the hard drive or in memory. Logical Level: ○ It would define entities like "Department," "Student," and "Course" and their attributes (e.g., department name, student ID, course code). ○ It also outlines the relationships between these entities, such as "Students belong to Departments" and "Courses are taken by Students.“ Database Management Systems Data Abstraction Now let us understand this with the help of an example of a University database Let the university have the given Record types/ tables as shown Now for the given database, View Level: ○ A faculty member may have access to a view showing only the courses they are teaching and the students enrolled in those courses. ○ A student, on the other hand, may have a view displaying only the courses they are registered for and their respective grades. ○ These views provide a simplified and customized perspective of the database for different users. Now in a similar way could you think of how physical and logical data independence could be of an advantage in the case of the university database? Database Management Systems Data Abstraction Let us consider the situation where the University decides to migrate its database from one storage system to another or optimize the storage layout to improve performance. What would be the consequence of this? Without Physical data independence: Minor changes to the storage system or hardware upgrades could potentially break existing queries, applications, and interfaces. As a result, any update to the storage system would require extensive modifications to the application layer, leading to higher maintenance costs and longer development cycles. Database Management Systems Data Abstraction Let us consider the situation where the University decides to migrate its database from one storage system to another or optimize the storage layout to improve performance. What would be the consequence of this? With physical data independence: the database administrators can make these changes without altering the logical schema, which means users' queries and applications will continue to work seamlessly. It allows for easier maintenance, upgrades, and performance enhancements without disrupting the application layer. Database Management Systems Data Abstraction Let's say the University decides to reorganize its department structure and add new attributes to the "Department" entity. Then what would be the effect? Without Logical data independence: Any alteration to the logical schema, such as adding or removing attributes, or changing relationships between entities, would result in changes propagating throughout the applications that rely on the database. This would force developers to modify all affected application code to accommodate the changes in the schema. Consequently, even minor changes to the data model could lead to widespread disruption and downtime for the applications, increasing the risk of introducing errors and inconsistencies. Database Management Systems Data Abstraction Let's say the University decides to reorganize its department structure and add new attributes to the "Department" entity. Then what would be the effect? With Logical data independence: The database designers can make these modifications without impacting the existing application programs that access the database. This flexibility allows for easier adaptability to changes in the University's requirements and business rules without causing application disruptions. Database Management Systems Instances and Schema Databases change over time as information is inserted and deleted or modified. The collection of information stored in the database at a particular moment is called an instance of the database. The overall design of the database is called the database schema. Let us try to understand what that exactly means with an analogy to a program written in any programming language Database Schema => variable declarations along with the associated type definitions Instances => Value of the variable at a particular point in time in the program Database Management Systems Instances and Schema The databases would have different type of schemas, partitioned according to the different levels of abstraction PHYSICAL SCHEMA : describes the database design at the physical level LOGICAL SCHEMA : describes database design at the logical level SUBSCHEMAS : there could be several schemas at the view level called subschemas, that describes different views Considering the university Database, what would these schemas represent? Database Management Systems Instances and Schema Let's consider three levels of schema and understand them w.r.t University database: Physical Schema: ○ This describes how the data is stored in the database physically. It includes details like data storage format, file organization, and indexing methods. Changes in the physical schema should not affect the applications using the database. ○ Example: How data is stored on the hard disk, like using B-trees or hash indexes for efficient retrieval. Logical Schema: ○ This defines the database design at a higher level, focusing on the structure and relationships between the tables. It is essential for application developers as they build software based on logical schema. ○ Example: Defining the relationships between tables, such as the one-to-many relationship between students and courses, where one student can be enrolled in many courses, but each course can have multiple students. Database Management Systems Instances and Schema Let's consider three levels of schema and understand them w.r.t University database: View Schema (Subschema): ○ This describes different views of the database that are tailored to specific user needs or applications. Views allow users to see only relevant information and protect sensitive data. ○ Example: Creating a view that shows only the names of enrolled students and their corresponding course names, hiding other details like grades and personal information. Database Management Systems Instances and Schema Out of all the schema types, the logical schema holds the most significance when it comes to impacting application programs. This is because programmers utilize the logical schema to develop applications. The physical schema is hidden beneath the logical schema and can usually be changed easily without affecting application programs. Application programs are said to exhibit physical data independence if they do not depend on the physical schema and thus need not be rewritten if the physical schema changes. We also note that it is possible to create schemas that have problems, such as unnecessarily duplicated information. ○ Per suppose if we store the details of the department chairperson in the faculty table, i.e. every faculty would have the attribute of “chairperson”, then there is a lot of space that gets wasted as the chairperson would be the same for the entire department and we are storing it for every faculty. Rather the better idea would be to store it in the department’s database. THANK YOU S Nagasundari Department of Computer Science and Engineering [email protected] Database Management Systems Database Languages Database Management Systems Unit 1: Introduction to Database Management Slides adapted from Author Slides of “Database System Concepts”, Silberschatz, H Korth and S Sudarshan, McGrawHill, 7th Edition, 2019. And Author Slides of Fundamentals of Database Systems”, Ramez Elamsri, Shamkant B Navathe, Pearson, 7th Edition, 2017. Database Management Systems Database Languages Database Management Systems Data Definition Language (DDL) The specification of the storage structure and access methods used by a database system are specified by a set of special statements in a special type of DDL called a data storage and definition language The processing of DDL statements generates some output. This output is placed In a DATA DICTIONARY ( i.e. DDL compiler generates a set of table templates stored in a data dictionary) Database Management Systems Data Definition Language (DDL) So what exactly does this Data dictionary contain? Data dictionary contains metadata (i.e., data about data) ○ Database schema ○ Integrity constraints Primary key (ID uniquely identifies instructors) ○ Authorization Who can access what The data dictionary is considered to be a special type of table that can be accessed and updated only by the database system itself (not a regular user). The database system consults the data dictionary before reading or modifying actual data. Database Management Systems Data Definition Language (DDL) The data values that we store in the database must satisfy certain consistency constraints ○ Ex: For a university database, every department has some funding to support their research work. The amount of funds remaining is shown as the account balance of a department ○ Now we know that the account balance should never be negative, these sorts of constraints should be taken care of. The DDL provides the facility to specify such constraints. The database system checks these constraints every time the database is updated. In general, a constraint can be an arbitrary predicate pertaining to the database. However, arbitrary predicates may be costly to test. Thus, database systems implement only those integrity constraints that can be tested with minimal overhead ○ DOMAIN CONSTRAINTS ○ REFERENTIAL INTEGRITY ○ AUTHORIZATION Database Management Systems Data Definition Language (DDL) DOMAIN CONSTRAINT: A domain of possible values must be associated with every attribute (for example, integer types, character types, date/time types). Declaring an attribute to be of a particular domain acts as a constraint on the values that it can take. Domain constraints are the most elementary form of integrity constraint. They are tested easily by the system whenever a new data item is entered into the database Database Management Systems Data Definition Language (DDL) REFERENTIAL INTEGRITY: There are cases where we wish to ensure that a value that appears in one relation for a given set of attributes also appears in a certain set of attributes in another relation (referential integrity). For example, the department listed for each course must be one that actually exists in the university. More precisely, the dept name value in a course record must appear in the dept name attribute of some record of the department relation. Database modifications can cause violations of referential integrity. When a referential integrity constraint is violated, the normal procedure is to reject the action that caused the violation. Database Management Systems Data Definition Language (DDL) AUTHORIZATION Generally a database would be used by multiple users and we may want to differentiate among what operations each user can perform so as to make sure that there is no unauthorized changes that would be made to the Database These differentiations are expressed in terms of authorization, the most common being: ○ read authorization: which allows reading, but not modification, of data; ○ insert authorization: which allows insertion of new data, but not modification of existing data; ○ update authorization: which allows modification, but not deletion, of data; ○ delete authorization: which allows deletion of data. We may assign the user all, none, or a combination of these types of authorization. Database Management Systems Data Manipulation Language (DML) A data manipulation language (DML) is a language that enables users to access or manipulate data as organized by the appropriate data model. The types of access are: ○ Retrieval of information stored in the database. ○ Insertion of new information into the database. ○ Deletion of information from the database. ○ Modification of information stored in the database. There are basically two types of data-manipulation language: ○ Procedural DMLs - require a user to specify what data are needed and how to get those data. ○ Declarative DMLs - (also referred to as nonprocedural DMLs) require a user to specify what data are needed without specifying how to get those data. Database Management Systems Data Manipulation Language (DML) Declarative DMLs are usually easier to learn and use than procedural DMLs. However, since a user does not have to specify how to get the data, the database system has to figure out an efficient means of accessing data. A query is a statement requesting the retrieval of information. The portion of a DML that involves information retrieval is called a query language. There are a number of database query languages in use, either commercially or experimentally. The levels of abstraction that we discussed earlier apply not only to defining or structuring data but also to manipulating data. ○ At the physical level, we must define algorithms that allow efficient access to data. ○ At higher levels of abstraction, we emphasize ease of use. The goal is to allow humans to interact efficiently with the system. The query processor component of the database system translates DML queries into sequences of actions at the physical level of the database system. Database Management Systems SQL Query Language SQL query language is nonprocedural. A query takes as input several tables (possibly only one) and always returns a single table. Example to find all instructors in Comp. Sci. dept select name from instructor where dept_name = 'Comp. Sci.' SQL is NOT a Turing machine equivalent language To be able to compute complex functions SQL is usually embedded in some higher-level language Application programs generally access databases through one of Language extensions to allow embedded SQL Application program interface (e.g., ODBC/JDBC) which allows SQL queries to be sent to a database Database Management Systems SQL Data Definition Language SQL provides a rich DDL that allows one to define tables with data types and integrity constraints. For instance, the following SQL DDL statement defines the department table: CREATE TABLE department ( dept name CHAR(20), building CHAR(15), budget NUMERIC(12,2) ); Execution of the above DDL statement creates the department table with three columns: dept name, building, and budget, each of which has a specific data type associated with it. The SQL DDL also supports a number of types of integrity constraints. EX: one can specify that the dept name attribute value is a primary key, ensuring that no two departments can have the same department name. EX2: one can specify that the dept name attribute value appearing in any instructor record must also appear in the dept name attribute of some record of the department table. Database Management Systems SQL Data Manipulation Language SQL also provides a large variety of DML statements that would help us to query the information we want to retrieve from the database Here is an example of an SQL query that finds the names of all instructors in the History department: SELECT instructor.name FROM instructor WHERE instructor.dept_name = 'History’; The query specifies that those rows from the table instructor where the dept name is History must be retrieved, and the name attribute of these rows must be displayed. The result of executing this query is a table with a single column labeled name and a set of rows, each of which contains the name of an instructor whose dept name is History. Database Management Systems Database Access from Application Programs Non-procedural query languages such as SQL are not as powerful as a universal Turing machine; that is, there are some computations that are possible using a general-purpose programming language but are not possible using SQL. SQL does not support actions such as input from users, output to displays, or communication over the network. Such computations and actions must be written in a host language, such as C/C++, Java, or Python, with embedded SQL queries that access the data in the database. Database Management Systems Database Access from Application Programs Application programs are programs that are used to interact with the database in this fashion. ○ Examples in a university system are programs that allow students to register for courses, generate class rosters, calculate student GPA, generate payroll checks, and perform other tasks. ○ To access the database, DML statements need to be sent from the host to the database where they will be executed. ○ This is most commonly done by using an application program interface (set of procedures) that can be used to send DML and DDL statements to the database and retrieve the results. The Open Database Connectivity (ODBC) standard defines application program interfaces for use with C and several other languages. The Java Database Connectivity (JDBC) standard defines a corresponding interface for the Java language. THANK YOU S Nagasundari Department of Computer Science and Engineering [email protected] Database Management Systems Database Design Database Management Systems Unit 1: Introduction to Database Management Slides adapted from Author Slides of “Database System Concepts”, Silberschatz, H Korth and S Sudarshan, McGrawHill, 7th Edition, 2019. And Author Slides of Fundamentals of Database Systems”, Ramez Elamsri, Shamkant B Navathe, Pearson, 7th Edition, 2017. Database Management Systems Database Design Database design mainly involves the design of the database schema. The design of a complete database application environment that meets the needs of the enterprise being modeled requires attention to a broader set of issues A high-level data model provides the database designer with a conceptual framework in which to specify the data requirements of the database users and how the database will be structured to fulfill these requirements the various steps involved in the design process are : Database Management Systems Database Design Let us now try and understand what happens at each step: Database Management Systems Database Design Understanding Different steps in detail: Database Management Systems Database Design Understanding Different steps in detail: Database Management Systems Database Design Understanding Different steps in detail: Database Management Systems Database Engine Database Management Systems Database Engine A database system is partitioned into modules that deal with each of the responsibilities of the overall system. The functional components of a database system can be divided into The storage manager, The query processor component, The transaction management component. Database Management Systems Storage Manager A program module that provides the interface between the low-level data stored in the database and the application programs and queries submitted to the system. The storage manager is responsible for the following tasks: Interaction with the OS file manager Efficient storing, retrieving and updating of data The storage manager implements several data structures as part of the physical system implementation: Data files -- store the database itself Data dictionary -- stores metadata about the structure of the database, in particular the schema of the database. Indices -- can provide fast access to data items. A database index provides pointers to those data items that hold a particular value. Database Management Systems Storage Manager The storage manager components include: Authorization and integrity manager Transaction manager File manager Buffer manager Database Management Systems Query Processor The query processor is important because it helps the database system to simplify and facilitate access to data. The query processor allows database users to obtain good performance while being able to work at the view level and not be burdened with understanding the physical-level details of the implementation of the system. The query processor components include: DDL interpreter DML compiler ○ performs query optimization, i.e. it picks the lowest cost evaluation plan from among the various alternatives. Query evaluation engine Database Management Systems Query Processor The internal structure of query optimizer: Database Management Systems Query Processing 1. Parsing and translation 2. Optimization 3. Evaluation Database Management Systems Transaction Management A transaction is a collection of operations that performs a single logical function in a database application Transaction-management component ensures that the database remains in a consistent (correct) state despite system failures (e.g., power failures and operating system crashes) and transaction failures. Transaction properties: A transaction is a logical unit of work that must maintain atomicity (all-or-none execution), consistency (database remains valid throughout), and durability (changes are permanent even after system failure). Consistency and atomicity: During the execution of a transaction, temporary inconsistency may be allowed, but the database must return to a consistent state after the transaction completes successfully. Database Management Systems Transaction Management The transaction Manager Application of transactions: Transactions are widely used in various applications, including financial, telecommunication, and long-duration activities like product design or administrative workflows. THANK YOU S Nagasundari Department of Computer Science and Engineering [email protected] Database Management Systems Database and Application Architecture Database Management Systems Unit 1: Introduction to Database Management Slides adapted from Author Slides of “Database System Concepts”, Silberschatz, H Korth and S Sudarshan, McGrawHill, 7th Edition, 2019. And Author Slides of Fundamentals of Database Systems”, Ramez Elamsri, Shamkant B Navathe, Pearson, 7th Edition, 2017. Database Management Systems Database Architecture Centralized databases (Centralized architecture) ○ Suitable for shared-memory server architectures with multiple CPUs accessing a common shared memory. ○ Limited scalability for larger data volumes and higher processing speeds. ○ One to a few cores, shared memory Client-server ○ One server machine executes work on behalf of multiple client machines. Client-Server DB Centralized/Shared Memory DB Database Management Systems Database Architecture Parallel databases ○ Designed to run on a cluster of multiple machines, ○ enables better scalability and higher processing capabilities. ○ Many cores shared memory ○ Shared disk Distributed databases ○ Allow data storage and query processing across geographically separated machines, facilitating large-scale data management. ○ Schema/data heterogeneity Parallel DB Distributed DB Database Management Systems Application Architectures We now consider the architecture of applications that use databases as their backend. Database applications can be partitioned into two or three parts. Two Tier Architecture Earlier-generation database applications used a two-tier architecture application resides at the client machine and invokes database system functionality at the server machine through query language statements. Two Tier Architecture Database Management Systems Application Architectures Three Tier Architecture Modern database applications use a three-tier architecture Client machine acts as merely a front end and does not contain any direct database calls; web browsers and mobile applications are the most commonly used application clients today. The front end communicates with an application server. The application server, in turn, communicates with a database system to access data. The business logic of the application, which says what actions to carry out under what conditions, is embedded in the application server, instead of being distributed across multiple clients. Three Tier Architecture Three-tier applications provide better security as well as better performance than two-tier applications. THANK YOU S Nagasundari Department of Computer Science and Engineering [email protected] Database Management Systems Database Users & Administrators Database Management Systems Unit 1: Introduction to Database Management Slides adapted from Author Slides of “Database System Concepts”, Silberschatz, H Korth and S Sudarshan, McGrawHill, 7th Edition, 2019. And Author Slides of Fundamentals of Database Systems”, Ramez Elamsri, Shamkant B Navathe, Pearson, 7th Edition, 2017. Database Management Systems Database Users People who work with a database can be categorized as: 1. Database Users 2. Database Administrators. Database Users There are four different types of database-system users, differentiated by the way they expect to interact with the system. Different types of user interfaces have been designed for different types of users. 1. Naïve Users 2. Application Programmers 3. Sophisticated Users 4. Database Administrators Database Management Systems Database Users Naïve users : They are inexperienced users who interact with a system using predefined interfaces like web or mobile applications, often through filling out forms. For instance, a student registering for a class through a web application where they enter their desired information into a form. Application programmers : They are skilled computer professionals who write application programs. They have various tools at their disposal to develop user interfaces. For example, a programmer creating a new mobile app with a custom interface for users to interact with. Sophisticated users : These users interact with the system without writing code. They can use database query languages or data analysis software to form their requests and explore data in the database. An example would be an analyst querying a database to retrieve specific information for a report without needing to write any code. Database Management Systems Database Administrator One of the main reasons for using DBMSs is to have central control of both the data and the programs that access those data. A person who has such central control over the system is called a database administrator (DBA) Functions of a Database Administrator (DBA): 1. Schema definition: DBA creates the original database schema using data definition statements (DDL). 2. Storage structure and access-method definition: DBA specifies parameters for data physical organization and index creation. 3. Schema and physical-organization modification: DBA makes changes to reflect organization's needs or enhance performance. 4. Granting authorization for data access: DBA regulates user access by granting different types of authorization. 5. Routine maintenance: DBA performs periodic backups, ensures disk space availability, and monitors database performance. THANK YOU S Nagasundari Department of Computer Science and Engineering [email protected] Database Management Systems Structure of Relational Schema S Nagasundari Department of Computer Science Engineering Database Management Systems Unit 1: Introduction to Database Management Slides adapted from Author Slides of “Database System Concepts”, Silberschatz, H Korth and S Sudarshan, McGrawHill, 7th Edition, 2019. And Author Slides of Fundamentals of Database Systems”, Ramez Elamsri, Shamkant B Navathe, Pearson, 7th Edition, 2017. Database Management Systems Revisiting Relational Model The relational model represents how data is stored in Relational Databases. The relational data model uses tables (relations) to store data and represent relationships between them. Each table is assigned a unique name. Each table has multiple columns with unique names, representing attributes of the data. Tables in the relational model are also known as relations and store records of specific types with fixed attributes. Information is represented by rows in the tables, where each row represents one piece of data. The relational data model is widely used and forms the basis of the majority of modern database systems. It belongs to the category of record-based models, organizing data in fixed-format records of various types. Database Management Systems Structure of Relational Model Let us consider University Database Management System that is designed to effectively manage vital information concerning university instructors and courses. The university shall store essential details about instructors. Each instructor would have a unique identification called ID, along with this ID we would also have to store information like their name, department affiliation, and salary. Furthermore, the database also shall maintain crucial course-related data. Each course would be uniquely identified by its course ID and also should contain information about the corresponding title, the department offering the course, and the credits assigned to each course. For a few courses there would be certain pre-requisite courses that would have to be completed for a better understanding of the course. If we were to design a relational database what would its structure be like? Database Management Systems Structure of Relational Model Let us consider the information we are supposed to store for an instructor, in this, each instructor would have: o a unique identifier called ID, o his/her name, o the department he is associated with o salary. All these above-specified are called attributes and if we are storing it in a relational database, we would be storing it in a table called the Instructor table with all these attributes as column headers Similarly the information regarding the courses would be stored in the course table that would have attributes course_ID, title, department, and credits as the columns for the table Likewise the prereq table would store the details regarding the prerequisite courses for each course. The table has two columns, course id, and prereq id. Each row consists of a pair of course identifiers such that the second course is a prerequisite for the first course. Database Management Systems Structure of Relational Model To summarize the structure of the relational model would be something like this: There would be three tables as shown Each table would have a unique column name also called attributes Database Management Systems Structure of Relational Model On populating these tables with values it would look something like this: Instructor table Course table Prereq table Database Management Systems Structure of Relational Model Let us consider the highlighted row in the table Prereq table: What does this row indicate? It indicates that a course with course_ID “BIO-301” needs a course “BIO-101” to be completed before the student enrolls for it. Thus, a row in the prereq table indicates that two courses are related in the sense that one course is a prerequisite for the other. Similarly, when we consider the table instructor, a row in the table can be thought of as representing the relationship between a specified ID and the corresponding values for name, dept name, and salary values Database Management Systems Structure of Relational Model So, in general, a row in a table represents a relationship among a set of values. Since a table is a collection of such relationships, there is a close correspondence between the concept of the table and the mathematical concept of relation, from which the relational data model takes its name. In mathematical terminology, a tuple is simply a sequence (or list) of values. A Attribute or Column relationship between n values is represented mathematically by an n-tuple of values, that is, a tuple with n values corresponds to a row in a table. Thus, in the relational model: o the term relation is used to refer to a table o The term tuple is used to refer to a row. Record or Tuple o The term attribute refers to a column of a table. Instructor table or Relation Database Management Systems Structure of Relational Model – Relational Instance Consider the situation in which the university has employed another instructor in a particular department, then we must modify the instructors’ table with the new instructor’s table So the Instructor’s table entries would keep changing as and when there are new instructors or if the current instructor leaves A specific instance of a relation, which includes a particular set of rows, is referred to as a Relational Instance. The instance of instructor shown in Figure has 12 tuples, corresponding to 12 instructors. Instructor table Database Management Systems Structure of Relational Model Consider the two tables given below. Are both of them the same or different? The relation is a set of tuples. So like in any other set, the Order of arrangement of tuples doesn’t matter. Therefore, whether the tuples of a relation are listed in sorted order, as in Instructor table 1, or are unsorted, as in Instructor table 2, does not matter; the relations in the two figures are the same, since both contain the same set of tuples. Instructor table 1 Instructor table 2 Database Management Systems Structure of Relational Model – Domain In the instructors table the column “Salary” could contain only the value of possible salary values, similarly the column name would contain only a set of all possible instructor names Therefore, For each attribute of a relation, there is a set of permitted values, called the Domain of that attribute. Thus the domain of “Salary” column in Instructors table is the set of all possible salary values Similary, the set of all possible instructors’ names would be the domain of “Name” column in Instructor table Database Management Systems Structure of Relational Model – Atomic values To maintain simplicity and consistency in the database, we require that all domains for attributes in any relation be Atomic. An atomic domain means that each element within the domain is considered one indivisible unit. For instance, let's consider if there was a "phone number" attribute in the "instructor" table. If the attribute can store multiple phone numbers for an instructor, it becomes non-atomic because it contains subparts (individual phone numbers within the set). However, even if we store only one phone number, if we further break it down into parts like country code, area code, and local number, it remains non-atomic. To ensure atomicity, we should treat each phone number as a single, indivisible unit. This way, the "phone number" attribute would have an atomic domain, making it simpler to manage and maintain data integrity in the database. Database Management Systems Structure of Relational Model – Null values Let's consider the "phone number" attribute in the "instructor" relation. Some instructors may not have a phone number at all, or their phone numbers might be unlisted. In such cases, we use the Null value to represent that there is no available phone number for the instructor. Null values can cause challenges when accessing or updating the database. They may lead to issues in calculations, comparisons, and querying operations. It’s thus best to eliminate null values whenever possible to maintain data consistency and integrity. Proper data validation and default values can be used to avoid nulls when entering data. Database Management Systems Database Schema S Nagasundari Department of Computer Science Engineering Database Management Systems Relational Schema What do we mean by a database schema and how is it different from a database instance? The schema is a logical design of the database In general, a relation schema consists of a list of attributes and their corresponding domains. Whereas the database instance Is the snapshot of the data in the database at a given instance of time Basically if we consider these w.r.t. programming languages, Relation corresponds to a variable, relation schema corresponds to type definition relation instance corresponds to value of a variable The value of a given variable may change with time; similarly, the contents of a relation instance may change with time as the relation is updated. In contrast, the schema of a relation does not generally change. Database Management Systems Relational Schema Let us consider the instructor table and department tables in the university database Their respective schemas are: Instructor (ID, name, dept_name, salary) Department (dept_name, building, budget) If we notice, the attribute dept name appears in both the instructor schema and the department schema. Do you think this is a coincidence? If not what is the significance of this Instructor table Department table Database Management Systems Relational Schema The duplication of dept_name is not a coincidence. Using common attributes in relation schemas is one way of relating tuples of distinct relations. For example, suppose we wish to find information about all the instructors who work in the Watson building. We look first at the department relation to find the dept name of all the departments housed in Watson. Then, for each such department, we look in the instructor relation to find the information about the instructor associated with the corresponding dept name This way we can make use of the common attributes to get the information we need from different relations Instructor table Department table Database Management Systems Relational Schema Similarly, each course in a university may be offered multiple times, across different semesters, or even within a semester. Therefore we need a relation to describe each individual offering, or section, of the class. For this the schema is: section (course id, sec id, semester, year, building, room number, time slot id) We would also need a relation to describe the association between instructors and the class sections that they teach. The relation schema to describe this association is: teaches (ID, course id, sec id, semester, year) Database Management Systems Relational Schema Like the above could you come up with the schemas for the following relations that are to be stored in a the database for example: The university database would have to maintain the details of the student like their ID, name, the department they are in, and also the total credits scored respective student Every student in the university would have a faculty advisor to mentor the student, that information also shall be stored in the database Every student would enroll for different courses during a particular semester and would secure a grade in that course after completing the course Database Management Systems Relational Schema The relational schemas for the above statements would be: The university database would have to maintain the details of the student like their ID, name, the department they are in, and also the total credits scored respective student student (ID, name, dept name, tot cred) Every student in the university would have a faculty advisor to mentor the student, that information also shall be stored in the database advisor (s_id, i_id) Every student would enroll for different courses during a particular semester and would secure a grade in that course after completing the course takes (ID, course id, sec id, semester, year, grade) Database Management Systems Relational Schema In general, A1, A2, …, An are attributes R = (A1, A2, …, An ) is a relation schema Example: instructor = (ID, name, dept_name, salary) A relation instance r defined over schema R is denoted by r(R). The current values of a relation are specified by a table An element t of relation r is called a tuple and is represented by a row in a table THANK YOU S Nagasundari Department of Computer Science and Engineering [email protected] Database Management Systems Keys, Constraints and Schema Diagrams S Nagasundari Department of Computer Science Engineering Database Management Systems Unit 1: Introduction to Database Management Slides adapted from Author Slides of “Database System Concepts”, Silberschatz, H Korth and S Sudarshan, McGrawHill, 7th Edition, 2019. And Author Slides of Fundamentals of Database Systems”, Ramez Elamsri, Shamkant B Navathe, Pearson, 7th Edition, 2017. Database Management Systems Constraints Constraints determine which values are permissible and which are not in the database. They are of three main types: Inherent or Implicit Constraints: Based on the data model itself. (E.g., relational model does not allow a list as a value for any attribute) Schema-based or Explicit Constraints: Expressed in the schema by using the facilities provided by the model. (E.g., max. cardinality ratio constraint in the ER model) Application based or semantic constraints: Beyond the expressive power of the model and must be specified and enforced by the application programs. Database Management Systems Constraints Constraints are conditions that must hold on all valid relation states. The Main types of (explicit/ schema-based) constraints that can be expressed in the relational model: Domain constraint. Key constraints and Constraints on NULL values Integrity Constraints Entity integrity constraints Referential integrity constraints Database Management Systems Domain Constraints Domain Constraint specifies that Within each tuple, the value of each attribute A must be an atomic value from the domain dom(A) Every value in a tuple must be from the domain of its attribute (or it could be null, if allowed for that attribute) Could you think of what all different datatypes associated with the domain? Database Management Systems Domain Constraints The data types associated with domains include Standard numeric data types for integers (such as short integer, integer, and long integer) Real numbers (float and double-precision float). Characters, Booleans, Fixed-length strings, Variable-length strings, as are date, time, timestamp, and other special data types. Domains can also be described by a subrange of values from a data type or as an enumerated data type in which all possible values are explicitly listed Database Management Systems Keys As discussed before, the relation consists of set of tuples or rows. As the term set suggests, all the tuples must be different or unique from each other Thus we must have a way to specify how tuples within a given relation are distinguished. This is expressed in terms of their attributes. That is, the values of the attribute values of a tuple must be such that they can uniquely identify the tuple. In other words, no two tuples in a relation are allowed to have exactly the same value for all attributes Database Management Systems Superkey A superkey is a set of one or more attributes that, taken collectively, allow us to identify uniquely a tuple in the relation. Let us consider the example of University DB, the ID attribute of the relation instructor is sufficient to distinguish one instructor tuple from another. Thus, ID is a superkey. The name attribute of instructor, on the other hand, is not a superkey, because several instructors might have the same name. Database Management Systems Superkey Formally defining; If we say that a subset SK of R is a superkey for r: we are restricting consideration to instances of relations r in which no two distinct tuples have the same values on all attributes in K. That is, if t1 and t2 are in r and t1 ≠ t2, then t1.SK ≠ t2.SK. Superkey of R: Is a set of attributes SK of R with the following condition: No two tuples in any valid relation state r(R) will have the same value for SK That is, for any distinct tuples t1 and t2 in r(R), t1[SK] ≠ t2[SK] This condition must hold in any valid state r(R) Database Management Systems Superkey Let us take this situation into consideration: Instructor (ID, name, dept_name, salary) Now as seen before {ID} is a superkey as it uniquely identifies each tuple in the relation What about the set {ID, name} is this also a superkey? The answer is yes A superkey may contain extraneous attributes like shown above If SK is a superkey, then so is any superset of SK. We are often interested in superkeys for which no proper subset is a superkey. Such minimal superkeys are called candidate keys. Database Management Systems Superkey Let us now consider a Car relational schema as shown below Car (State, Reg#, SerialNo, Make, Model, Year) If we notice, the SerialNo is an candidate key as it is unique for the Car But we also know that {State, Reg#} together are also unique for a given car So in the above case {SerialNo} and {State, Reg#} both are minimal superkeys and also called candidate keys Thereby we can conclude that A relation schema may have more than one minimal superkey. In this case, each of them is called a candidate key. Database Management Systems Key Constraints If a relation has several candidate keys, one is chosen arbitrarily to be the primary key. The primary key attributes are underlined. Example: Consider the CAR relation schema: CAR(State, Reg#, SerialNo, Make, Model, Year) We chose SerialNo as the primary key The primary key value is used to uniquely identify each tuple in a relation Provides the tuple identity Also used to reference the tuple from another tuple General rule: Choose as primary key the smallest of the candidate keys (in terms of size) Not always applicable – choice is sometimes subjective It is customary to list the primary key attributes of a relation schema before the other attributes; Database Management Systems Key Constraints Let us consider the classroom relation in the university database Schema : classroom (building, room_number, capacity) Here the primary key consists of two attributes, building and room number, which are underlined to indicate they are part of the primary key. Neither attribute by itself can uniquely identify a classroom, although together they uniquely identify a classroom. Database Management Systems Key Constraints Primary keys must be chosen with care. As we noted, the name of a person is insufficient, because there may be many people with the same name. In the our country, the aadhar number attribute of a person would be a candidate key. Since non-Indian residents usually do not have aadhar numbers, international enterprises must generate their own unique identifiers. An alternative is to use some unique combination of other attributes as a key. The primary key should be chosen such that its attribute values are never, or are very rarely, changed. For instance, the address field of a person should not be part of the primary key, since it is likely to change. Aadhar numbers, on the other hand, are guaranteed never to change. Unique identifiers generated by enterprises generally do not change, except if two enterprises merge; in such a case the same identifier may have been issued by both enterprises, and a reallocation of identifiers may be required to make sure they are unique Database Management Systems Key Constraints The schema along with the primary keys for the university database is as shown: Database Management Systems Relational Database State A relational database state DB of S is a set of relation states DB = {r1 , r2 ,..., rm} such that each ri is a state of Ri and such that the ri relation states satisfy the integrity constraints specified in IC. A relational database state is sometimes called a relational database snapshot or instance. A database state that does not obey all the integrity constraints is called not valid A state that satisfies all the constraints in the defined set of integrity constraints IC is called a valid state Database Management Systems Entity Integrity Constraint Entity integrity constraint states that The primary key attributes PK of each relation schema R in S cannot have null values in any tuple of r(R). This is because primary key values are used to identify the individual tuples. t[PK] ≠ null for any tuple t in r(R) If PK has several attributes, null is not allowed in any of these attributes Note: Other attributes of R may be constrained to disallow null values, even though they are not members of the primary key. Database Management Systems Foreign key Constraint A foreign-key constraint from attribute(s) A of relation r1 to the primary-key B of relation r2 states that on any database instance, the value of A for each tuple in r1 must also be the value of B for some tuple in r2. Attribute set A is called a foreign key from r1, referencing r2. The relation r1 is also called the referencing relation of the foreign-key constraint, and r2 is called the referenced relation. Database Management Systems Foreign key Constraint For example, let us consider the following schema: Instructor (ID, name, Dept_name, Salary) Department (Dept_name, building, budget) attribute “dept_name” in instructor is a foreign key from the instructor table, referencing the department table; Note that dept_name is the primary key of the department. So here, Instructor 🡪 referencing relation Department 🡪 referenced relation ** Note that in a foreign-key constraint, the referenced attribute(s) must be the primary key of the referenced relation ** Database Management Systems Referential Integrity Constraint This is a more general case of the foreign key constraint A referential-integrity constraint, relaxes the requirement that the referenced attributes form the primary key of the referenced relation. As an example, consider the values in the time slot id attribute of the section relation. section(course id, sec id, semester, year, building, room number, time slot id) time slot(time slot id, day, start time, end time) We require that these values must exist in the time slot id attribute of the time slot relation. Such a requirement is an example of a referential integrity constraint. In general, a referential integrity constraint requires that the values appearing in specified attributes of any tuple in the referencing relation also appear in specified attributes of at least one tuple in the referenced relation. Database Management Systems Referential Integrity Constraint Note that in the above example, the time slot does not form a primary key of the time slot relation, although it is a part of the primary key; thus, we cannot use a foreign-key constraint to enforce the above constraint. In fact, foreign-key constraints are a special case of referential integrity constraints, where the referenced attributes form the primary key of the referenced relation. Database systems today typically support foreign-key constraints, but they do not support referential integrity constraints where the referenced attribute is not a primary key Database Management Systems Referential Integrity Constraint Statement of the constraint: The value in the foreign key column (or columns) FK of the referencing relation R1 can be either: A value of an existing primary key value of a corresponding primary key PK in the referenced relation R2, or A null. In case (2), the FK in R1 should not be a part of its own primary key. Database Management Systems Schema Diagrams A database schema, along with primary key and foreign-key constraints, can be depicted by schema diagrams. Let us consider the relational schema for the university database. This would be as shown below: Database Management Systems Schema Diagrams Each relation appears as a box, with the relation name at the top in blue and the attributes listed inside the box. Primary-key attributes are shown underlined. Foreign-key constraints appear as arrows from the foreign-key attributes of the referencing relation to the primary key of the referenced relation. We use a two-headed arrow, instead of a single-headed arrow, to indicate a referential integrity constraint that is not a foreign-key constraint. Schema diagram for university Database THANK YOU S Nagasundari Department of Computer Science and Engineering [email protected] Database Management Systems Database Design - Overview S Nagasundari Department of Computer Science Engineering Database Management Systems Unit 1: Introduction to Database Management Slides adapted from Author Slides of “Database System Concepts”, Silberschatz, H Korth and S Sudarshan, McGrawHill, 7th Edition, 2019. And Author Slides of Fundamentals of Database Systems”, Ramez Elamsri, Shamkant B Navathe, Pearson, 7th Edition, 2017. Database Management Systems Recap – Design Phases As we have seen before the design phases would involve the following steps: ○ Characterizing User Data Needs ○ Choosing a Data Model and Translating Requirements ○ Review and Refinement of Schema ○ Functional Requirements Specification ○ Logical-Design Phase ○ Physical-Design Phase Database Management Systems Recap – Design Phases Let us just have a look at the design phases again: Characterizing User Data Needs: Interaction with domain experts and users. Specification of user requirements. Choosing a Data Model and Translating Requirements: Selection of a data model. Translation of requirements into the conceptual schema. Using entity-relationship model for conceptual design. Review and Refinement of Schema: Confirming data requirement satisfaction. Eliminating redundant features. Database Management Systems Recap – Design Phases Let us just have a look at the design phases again: Functional Requirements Specification: Users describe operations (transactions) on data. Ensure schema meets functional requirements. Logical-Design Phase: Mapping conceptual schema to implementation data model. Using entity-relationship model for relational design Physical-Design Phase: Using system-specific database schema. Specifying physical features (file organization, indexes). Database Management Systems Why do we need a design phase? The physical schema of a database can be changed relatively easily after an application has been built. However, changes to the logical schema are usually harder to carry out, since they may affect a number of queries and updates scattered across application code. It is therefore important to carry out the database design phase with care, before building the rest of the database application. Database Management Systems Design Alternatives A major part of the database design process is deciding how to represent in the design the various types of “things” such as people, places, products, and the like. We use the term entity to refer to any such distinctly identifiable item. ○ Considering a university database, the entities would be: instructors, students, departments, courses, etc. The various entities are related to each other in a variety of ways, all of which need to be captured in the database design. ○ For example: A student takes a course offering An instructor teaches a course offering Database Management Systems Design Alternatives While designing a database we must try to avoid two major pitfalls: ○ Redundancy ○ Incompleteness Let us try and understand what these are with respect to the university database system, and what are their drawbacks if not handled Database Management Systems Redundancy A bad design leads to repeating the same information unnecessarily which would lead to redundancy For example in the university database, let us that we are storing the department_ID along with the department name for every course in the course entity Course(Department_ID, department_name, course_ID, course_name, credits) In this situation, a department would offer many courses and unnecessarily we would store the department name along with the ID those many times. This would lead to redundancy. It would suffice to store only the department_ID with each course and to associate the department name with the Department_ID only once, in a department entity Redundancy can also occur in a relational schema. Imagine a single table with all course information repeated for each section. This creates unnecessary redundancy, as each course's details are duplicated. Database Management Systems Redundancy Problem with Redundancy: ○ Redundant copies of data can become inconsistent if updated without care. ○ Different sections of a course could end up having different titles if not properly managed. Example: Inconsistent Course Titles: ○ Multiple offerings of a course might have the same identifier but different titles. ○ This inconsistency makes it unclear which title is correct. Optimal Data Management: ○ Ideally, information should be stored in one place to avoid redundancy. ○ A course's title should be stored only in a single entity (like the course entity) and linked to offerings as needed. Database Management Systems Incompleteness For instance, if we only have entities for course offerings(Sections) and not for the actual courses, it creates difficulties in modeling. ○ Imagine if we only track when courses are offered but don't have a separate entity for courses. ○ This design limitation would make it impossible to represent new courses that aren't currently offered. Similarly in Relations: ○ In terms of relational databases, having a single table with repeated course information for each section causes problems. ○ This approach prevents us from representing new courses that don't yet have offerings. Database Management Systems Incompleteness Challenge in Representing New Information: ○ With this flawed design, adding details about a new course becomes tricky, as it must have an associated section to fit the structure. ○ Attempting to use null values for section information is a less desirable