DBMS u1 PDF - Database Management Systems
Document Details
Uploaded by JudiciousOklahomaCity
Tags
Summary
This document is a course outline for a Database Management Systems (DBMS) course. It covers various aspects, including introduction to DBMS, database design, relational model, advanced design concepts, and database transaction management. In addition, it details lab sessions and required tools/languages.
Full Transcript
Database Management Systems Introduction Database Management Systems Course Outline Unit 1: Introduction to Database Management Database System Applications, Purpose, View of data, Database Languages, Database design, Introduction to databases, Database application architecture, Users and...
Database Management Systems Introduction Database Management Systems Course Outline Unit 1: Introduction to Database Management Database System Applications, Purpose, View of data, Database Languages, Database design, Introduction to databases, Database application architecture, Users and Administrators, E-R Model, reducing ER to a relational schema. Structure of relational databases, Database schema, and its constraints, Keys 14 Hours Unit 2: Relational Model and Database Design Relational operations (Algebra), Unary Operations - Unity, Binary, Aggregate Functions, Grouping, SQL ,overview, Data definition, Structure of SQL queries, Additional Basic Operations, Set Operations, Null Values, Aggregate Functions, Nested Subqueries, Database Modification, Join expressions, Views, Triggers, Functions, and Procedures, Introduction to strategies of Query processing and Query optimization. 14 Hours 2 Database Management Systems Course Outline Unit 3: Advanced Design Concepts and Implementation Functional Dependencies, Inference Rules, Closure, Equivalence, Minimal Cover Normal Forms Based on Primary Keys (1NF, 2NF, and 3NF), General Definitions of Second and Third Normal Forms Boyce-Codd Normal Form, Properties of Relational Decompositions, Overview of Higher Normal Forms. 14 Hours Unit 4: Advanced Databases Database transactions, Concurrency control, Locking, Recovery, Database Security, Introduction to NoSQL databases, Document database (MongoDB), Key-Value database (DynamoDB), Graph databases (Neo4j), Wide-column store database (HBase) 14 Hours 3 Database Management Systems Course Outline Lab/Hands-on sessions Draw an ER diagram for a given problem statement Conversion of an ER diagram into Relational schema DDL – create, constraints, alter, rename, drop, truncate table, Views. DML – Insert, Update, Delete, Transactions - commit, rollback, savepoint SQL - Set operators: union, intersect, minus. SQL – Aggregate functions. SQL – Joins: inner, outer; Sub queries: correlated and uncorrelated. SQL – Creating Functions and Procedures SQL – Creating Triggers and Cursors XML- Database access. NoSQL database queries High-level programming language accessing a database using an API. Tools/ Languages: MySQL Workbench, Python, ERwin, Any other tool for ER modeling 4 Database Management Systems Textbooks Textbook 1 (TB1) Textbook 2 (TB2) 5 Database Management Systems Course Evaluation Policy : Tentative DBMS- 5 credits course 6 Database Management Systems Unit 1: Introduction to Database Management Slides adapted from Author Slides of “Database System Concepts”, Silberschatz, H Korth and S Sudarshan, McGrawHill, 7th Edition, 2019. And Author Slides of Fundamentals of Database Systems”, Ramez Elamsri, Shamkant B Navathe, Pearson, 7th Edition, 2017. 7 Database Management Systems Introduction ➔ Have you ever wondered on how a large e-commerce website like Amazon is able to retrieve the products that you would want to purchase? ➔ Nowadays social media (eg: Instagram) one of the most active platforms where people post their day to day activities etc. how and where is all these information stored? By typing a person's name we can get to see their entire profile, posts that person has posted how is this possible? ➔ In a bank, multiple transactions take place everyday. Even if 2 transactions are happening simultaneously from account, still the money in the accounts is properly deducted and credited how does this happen? 8 Database Management Systems Introduction ➔ Suppose you visit your doctor for a general checkup. Now by typing in your name or your ID, all your medical history details would be visible to the doctor. How does this happen? ➔ When you search for a specific topic on Google, how does the search engine quickly retrieve relevant web pages from millions of websites and present them to you? 9 Database Management Systems Introduction All the above questions have a single answer and that is DATABASE Before going ahead and understanding what database exactly and how is designed and used for all the other application, let us try answering a few more questions: What do you mean by Data? How is it different from Information? In current world, there is a lot of data that is available, how are all these stored or organised so that we can get the required information? 10 Database Management Systems What Do you mean by Data & Information? Data Data represents the raw material, like individual puzzle pieces, that serves as the foundation for knowledge. Data is a raw and unorganized fact that is required to be processed to make it meaningful. Information When data is processed, organized, structured, or presented in a given context to make it useful is called information. Data is a collection of facts, while information puts those facts into context It is organized and is utilized by humans in some significant way to make decisions and draw some conclusions Data doesn’t depend on information, Information depends on data 11 Database Management Systems What is a Database? Database A database is a collection of related data representing some aspect of the real world, also called the mini-world or the universe of discourse (UoD). It is a logically coherent collection, meaning it is not just a random assortment of data but is organized with inherent meaning and structure. Databases are designed, built, and populated for a specific purpose, catering to the needs of applications or systems that interact with them. They can vary in size and complexity, ranging from small databases used by individual applications to large-scale enterprise databases handling vast amounts of data. Could you think of what do these individual terms mean w.r.t student information Database? 12 Database Management Systems What is a Database 13 Database Management Systems Database Management System Database Management System A modern database system is a complex software system whose task is to manage a large, complex collection of data. DBMS contains information about a particular enterprise ○ Collection of interrelated data ○ Set of programs to access the data ○ An environment that is both convenient and efficient to use Database systems are used to manage collections of data that are: ○ Highly valuable ○ Relatively large ○ Accessed by multiple users and applications, often at the same time. Let us break these statements down further and understand them in the context of the student information database 14 Database Management Systems Student Information Database 15 Database Management Systems Database Management System Database Management System A general-purpose software system that facilitates the processes of defining, constructing, manipulating, and sharing databases among various users and applications ○ Defining a database involves specifying the data types, structures, and constraints of the data to be stored in the database. By defining the structure and data types, the DBMS ensures the consistency and integrity of the stored data. The "Students" table may include fields such as "Student ID" (numeric), "Name" (text), "Email" (text), "Date of Birth" (date), and "Major" (text). The "Courses" table may have fields like "Course ID" (numeric), "Course Name" (text), "Credits" (numeric), and "Instructor" (text). 16 Database Management Systems Database Management System Database Management System A general-purpose software system that facilitates the processes of defining, constructing, manipulating, and sharing databases among various users and applications ○ Constructing the database is the process of storing the data on some storage medium that is controlled by the DBMS. It organizes the data in a way that allows for efficient retrieval and manipulation. The DBMS ensures data is securely stored and can be accessed by authorized users or applications. The DBMS may store the Student Information Database on a server with appropriate data storage capacity and performance capabilities. 17 Database Management Systems Database Management System Database Management System A general-purpose software system that facilitates the processes of defining, constructing, manipulating, and sharing databases among various users and applications ○ Manipulating a database includes functions such as querying the database to retrieve specific data, updating the database to reflect changes in the mini world, and generating reports from the data. It ensures that any changes made to the database follow predefined constraints and maintain data consistency. A university administrator may use SQL queries to retrieve specific information, such as "Retrieve all students enrolled in Computer Science courses." The DBMS processes the query, retrieves the relevant data from the database, and presents it to the user. Allows authorized users to update the database, such as adding new students or updating grades. 18 Database Management Systems Database Management System Database Management System A general-purpose software system that facilitates the processes of defining, constructing, manipulating, and sharing databases among various users and applications ○ Sharing a database allows multiple users and programs to access the database simultaneously, provided they have the necessary permissions Faculty members, administrative staff, and students can access relevant information concurrently, as long as they have the necessary permissions. For example, faculty members can access student details for their courses, while administrators can update and maintain the overall database. 19 Database Management Systems Questions Some questions to revise 1) How do you define data, and how does it differ from information in the context of database management systems? 2) List out the data types you would take to define a database for an online bookstore. 3) Explain the role of a DBMS in managing highly valuable and relatively large collections of data. Why is this important for organizations? 4) What are some of the common operations performed on a database? How do these operations interact with the data stored in the database? 20 Database Management Systems How are databases used? Some online resources - https://dev.mysql.com/doc/ https://www.oracle.com/in/database/what-is-database/ https://docs.oracle.com/cd/E17952_01/index.html https://downloads.mysql.com/docs/mysql-tutorial-excerpt-8.0-en.a4.pdf 21 THANK YOU S Nagasundari Department of Computer Science and Engineering [email protected] 22 Database Management Systems Introduction 1 Database Management Systems Unit 1: Introduction to Database Management Slides adapted from Author Slides of “Database System Concepts”, Silberschatz, H Korth and S Sudarshan, McGrawHill, 7th Edition, 2019. And Author Slides of Fundamentals of Database Systems”, Ramez Elamsri, Shamkant B Navathe, Pearson, 7th Edition, 2017. 2 Database Management Systems Why study databases Databases help us to: Store large amounts of data (file structure, disk management) Understand the data (data models) Keeps data secure (security, recovery) Find required data and use/manipulate it (query languages, concurrency control, and data analysis tools) Get accurate information (as databases have built-in constraints and checks help in this) Maintain Data integrity (ensures data is accurate and consistent) 3 Database Management Systems Database Applications 1) Enterprises Customer Relationship Management (CRM): Databases store customer interactions, preferences, and sales data. Supply Chain Management: Databases track inventory levels, supplier information, and logistics. Human Resources: Manage employee data, payroll, and benefits using databases. Financial Systems: Handle accounting, budgeting, and financial reporting through database solutions. Example: Salesforce CRM is widely used by enterprises to manage customer relationships and sales data within a database. 4 Database Management Systems Database Applications 2) Manufacturing Production Tracking: Databases monitor production schedules, inventory levels, and quality control. Supply Chain Management: Manage supplier information, procurement, and logistics using databases. Product Lifecycle Management: Track product design, development, and updates in a database. Example: Toyota uses databases to manage their Just-In-Time (JIT) manufacturing processes, ensuring efficient production and inventory management. 5 Database Management Systems Database Applications 3) Banking and Finance Account Management: Databases store customer accounts, transactions, and balance information. Loan Processing: Manage loan applications, approvals, and repayments through a database system. Fraud Detection: Monitor transactions for unusual activity and potential fraud using database systems. Example: JPMorgan Chase uses advanced database systems to manage customer accounts, process transactions, and detect fraud. 6 Database Management Systems Database Applications 4) University Student Records: Databases store student information such as personal details, course registrations, grades, and attendance. Course Management: Databases manage course offerings, schedules, and faculty assignments. Library Systems: Track book inventory, borrowing history, and reservations using databases. Alumni Relations: Maintain contact information and other records in a centralized database. Example: University of California, Berkeley uses a comprehensive database system to manage student records, academic schedules, and library resources. 7 Database Management Systems Database Applications 5) Airlines Reservation Systems: Databases manage flight bookings, seat allocations, and passenger information. Flight Operations: Track flight schedules, crew assignments, and aircraft maintenance through databases. Customer Service: Store frequent flyer data, customer preferences, and service history in a database. Example: Amadeus is a global reservation database system used by many airlines for booking and flight management. 8 Database Management Systems Database Applications 6) Telecommunication Customer Data: Databases store customer information, billing details, and service preferences. Network Management: Monitor network usage, service quality, and fault management using databases. Service Provisioning: Manage the activation and maintenance of customer services through a database system. Example: AT&T utilizes databases to manage customer data, monitor network performance, and provide reliable telecommunication services. 9 Database Management Systems Database Applications 7) Web-based Services User Data Management: Databases store user profiles, preferences, and activity logs. Content Management: Manage website content, including articles, images, and videos, using a database system. Analytics: Track user interactions, traffic patterns, and conversion rates through databases. Example: Netflix uses a sophisticated database system to manage user preferences, content recommendations, and streaming data. 10 Database Management Systems Database Applications 8) Document Databases Data Storage: Document databases store semi-structured data in formats like JSON, BSON, or XML, allowing for flexible schema design and easy data retrieval. Content Management: Manage collections of documents such as articles, patents, research papers, and other unstructured content. Scalability: Document databases are designed to handle large volumes of data and can scale horizontally across multiple servers. Use Cases: Often used for content management systems, catalogues, and data lakes where data doesn't fit neatly into relational models. Example: MongoDB is a popular document database used by companies like The New York Times to manage and deliver vast amounts of news articles and media content. 11 Database Management Systems Database Applications 9) Navigation Systems Location Data: Databases store geographic information including locations of places of interest, routes, and geographic features. Real-Time Updates: Manage real-time traffic data, route optimization, and point-of-interest information. User Interaction: Provide features like route planning, navigation guidance, and local searches. Integration: Often integrated with other systems such as weather services, local businesses, and emergency services to enhance user experience. Example: Google Maps uses a complex database system to manage and update location data, provide navigation services, and offer real-time traffic updates to millions of users worldwide. 12 Database Management Systems How are databases used? There are two modes in which databases are used today Online Transaction Processing (OLTP): ○ Used by a large number of users for small data retrieval and updates ○ common in most database applications like banking, universities, and airlines. Data Analytics/Online Analytical Processing(OLAP): ○ Involves processing data to draw conclusions and create predictive models for business decisions. ○ Examples include loan approval, targeted advertisements, and manufacturing decisions. ○ Data mining combines AI and statistical techniques for efficient analysis of large databases. 13 Database Management Systems View of Data A database system is a collection of interrelated data and a set of programs that allow users to access and modify these data. A major purpose of a database system is to provide users with an abstract view of the data. ○ Data models A collection of conceptual tools for describing data, data relationships, data semantics and constraints. ○ Data abstraction Hide the complexity of data structures to represent data in the database from users through several levels of data abstraction. The structure of a database is defined by its data model, which includes tools to describe data, relationships, semantics, and consistency constraints. 14 Database Management Systems Data Models RELATIONAL MODEL Uses a collection of tables to represent both data and the relationships among those data. Each table has multiple columns, and each column has a unique name. Tables are also known as relations. The relational model is an example of a record-based model. Record-based models are so named because the database is structured in fixed-format records of several types. Each table contains records of a particular type. Each record type defines a fixed number of fields or attributes. Each row of the table represents one piece of information The columns of the table correspond to the attributes of the record type. The relational data model is the most widely used data model, and a vast majority of current database systems are based on the relational model. 15 Database Management Systems Data Models Entity-Relationship Model The entity-relationship (E-R) data model uses a collection of basic objects, called entities, and relationships among these objects. An entity is a “thing” or “object” in the real world that is distinguishable from other objects. The entity-relationship model is widely used in database design. 16 Database Management Systems Data Models Semi Structured Data Model The semi-structured data model permits the specification of data where individual data items of the same type may have different sets of attributes. This is in contrast to the data models mentioned earlier, where every data item of a particular type must have the same set of attributes. JSON and Extensible Markup Language (XML) are widely used semi-structured data representations A database model where there is no separation between the data and the schema, and the amount of structure used depends on the purpose. The advantages of this model are the following: It can represent the information of some data sources that cannot be constrained by schema. Example of XML 17 data Database Management Systems Data Models Object-Based Data Model. Object-oriented programming (especially in Java, C++, or C#) has become the dominant software-development methodology. This led initially to the development of a distinct object-oriented data model, but today the concept of objects is well integrated into relational databases. Standards exist to store objects in relational tables. Database systems allow procedures to be stored in the database system and executed by the database system. This can be seen as extending the relational model with notions of encapsulation, methods, and object identity. 18 Database Management Systems Questions Some questions to revise 1) List and describe three real-world applications of databases in different industries. How do these applications benefit from using databases? 2) What are the two modes in which databases are used today? 3) Discuss the various data models. How does the relational model differ from the semi-structured and object-based models? 4) Explain the significance of document databases in storing and managing semi-structured data. How does MongoDB exemplify the use of document databases in real-world applications? 19 THANK YOU S Nagasundari Department of Computer Science and Engineering [email protected] 20 Database Management Systems Purpose of Database System Database Management Systems Unit 1: Introduction to Database Management Slides adapted from Author Slides of “Database System Concepts”, Silberschatz, H Korth and S Sudarshan, McGrawHill, 7th Edition, 2019. And Author Slides of Fundamentals of Database Systems”, Ramez Elamsri, Shamkant B Navathe, Pearson, 7th Edition, 2017. 2 Database Management Systems File-processing Systems File-based System One way to keep information on a computer is to store it in permanent files. A company system has a number of application programs; each of them is designed to manipulate data files. These application programs have been written at the request of the users in the organization. New applications are added to the system as the need arises. 3 Database Management Systems Challenges of File-processing Systems Let us take an example of a University Organization system. We need to store information about the following entities: Instructors, Students, Departments, Courses offered. One of the ways to keep this information is to store it in operating file systems To allow the users to manipulate the information, the system would have a number of application programs that manipulate the files including: ○ Addition of new students, instructors, and courses ○ Register students for different courses ○ Assign grades to students, compute GPA, and generate transcripts. 4 Database Management Systems Challenges of File-processing Systems Scenario: The university decides to add a new major. What specific steps must be taken to integrate a new major into the university's existing system? 1. Create a New Department: ○ Set up a new department for the major. ○ Create new permanent files for department information. 2. Develop New Application Programs: ○ Write programs to manage rules for the new major. ○ Ensure programs handle course registration, grading, and transcripts for the new major. This typical file-processing system is supported by a conventional operating system. The system stores permanent records in various files, and it needs different application programs to extract records from and add records to, the appropriate files. 5 Database Management Systems Challenges of File-processing Systems What is the need for a Database Management System when we can do the university organization using a simple file-processing system? Consider the following situations: Double Major Students : What if there is a student who has enrolled for a double major (Physics and Computer Science) how many times do we store all his details? Updating information : What happens if a student changes their address or phone number? Is it guaranteed that the update reflects in all relevant files across departments? 6 Database Management Systems Challenges of File-processing Systems Efficient Data Retrieval: The university transport department wants to find out all students who live within a particular postal code area. Can he quickly retrieve the data or does he have to build another application for it? Data Format Consistency: What if files that store these data are written in different formats is it easy to access the data in this form? Adding New Constraints: Suppose a new constraint is to be added, can it be done efficiently without having to make extensive changes to existing application programs? 7 Database Management Systems Challenges of File-processing Systems and the Purpose of Database Systems There are a lot of drawbacks to the file-processing system : Data Redundancy and Inconsistency Difficulty in accessing the data Data isolation Integrity problems Atomicity problems Concurrent access anomalies Security problems 8 Database Management Systems Purpose of Database Systems 1. Data Redundancy & Inconsistency Consider the question of the student enrolled for a double major. What would happen in this situation in case of file system? Solution to this problem: Details of the students would be duplicated for each - Store information in centralized department - Data Redundancy location thus Minimizes redundancy This would lead to the following problems: - Changes can be made at a single spot, thus ensuring consistency Takes up more storage space - This can be achieved through a May also lead to Data Inconsistency in case of database system changes 9 Database Management Systems Purpose of Database Systems 2. Difficulty in accessing the data Consider that the university wants to find all the students living in a specific postal code area. How will we execute The solution to this problem: this in traditional file system? - Central information store for fast Either we need to get a list of all students and retrieval manually sift through it - Use a powerful query language for answering new queries Or create another special program for this task - Efficient ways to search and filter Both the above-mentioned are time-consuming and data for speeding up data inconvenient retrieval - This can be achieved through a Thus we can see that Conventional file-processing database system systems are not designed for quick and efficient data retrieval, especially when new types of queries arise. 10 Database Management Systems Purpose of Database Systems 3. Data Isolation Consider a concurrency situation, The solution to this problem: where identifying when and how changes made by one operation become visible to other concurrent users. - Ability to allow multiple transactions to access same This is a challenge because New applications has to data without interfering with retrieve the appropriate data, which might be stored in each other various files. - Consistent view of the data A file-based system must manage, or prevent, concurrency by the application programs. When an application opens a file, it will lock the file. This means that no one else has access to the file at the same time. 11 Database Management Systems Purpose of Database Systems 4. Integrity problems Scenario: Each department has an account for research The solution to this problem: funds, and the account balance must always be above 0. - Specify integrity constraints at the How can we implement this? database level to specify rules directly in the schema Each application program is created for maintaining - System should enforce constraints departmental accounts but adding new constraints automatically becomes challenging as it requires editing all the existing - To add a new constraint, updating the application programs. schema alone will be sufficient, and the system will handle the rest automatically. - This would be easily solved by using DBMS which provides all these facilities 12 Database Management Systems Purpose of Database Systems 5. Atomicity problems: In a computer system failure can happen unexpectedly Consider a banking system, a sum of 5000 rupees is getting transferred from account A to B. What happens internally? The solution to this problem: An amount of 5000 Rs. gets deducted from account A - Atomicity should be ensured The amount (5000 Rs.) gets credited to account B - Automatic rollback in case of Now either both the steps should be performed or none failure should be executed. - DBMS would ensure this and In a file system, The program might successfully deduct 5000 makes it reliable and consistent from account A but fail to credit it to account B. even if failure occurs This results in an inconsistent database state, with money missing from one account and not added to the other. 13 Database Management Systems Purpose of Database Systems 6. Concurrent Access Anomalies Consider a bank transaction example. There is a bank account (Account A) with a balance of $10,000. The solution to this problem: Two bank clerks attempt to deduct money from - Transaction and Locking to be Account A at almost the same time - one deducts implemented $500 and the other $100. - Transactions ensure atomicity Since they work concurrently, they both read the - Locking prevents conflicting initial balance of $10,000, subtract their respective changes. amounts, and write back the results. - The DBMS would provide these Depending on which one writes the new balance and maintains data consistency last, the account could end up with either $9,500 or and integrity $9,900 instead of the correct value of $9,400. 14 Database Management Systems Purpose of Database Systems 7. Security problems Imagine a university using a file-based system. Over time, they've added various application The solution to this problem: programs without a central security system. - Create a provision for administrator to define access controls and permissions for As a result, payroll personnel might each user or group. accidentally see and even change academic - These permissions should enforced at the records, compromising data privacy and database level, regardless of the confidentiality because everyone has the same application programs used. permissions. - DBMS supports various methods for introducing security like User Authentication, Data Encryption, Auditing and Logging, Centralized security management 15 Database Management Systems Characteristics of Database Systems Self-describing nature of a database system The database system contains not only the database itself but also a complete definition or description of the database structure and constraints (Metadata). Insulation between programs and data, and data abstraction The structure of data files is stored in the DBMS catalog separately from the access programs (program-data independence) Support multiple views of the data DBMSs support different types of users, each of whom may require a different perspective or view of the database. A view may be a subset of the database or it may contain virtual data that is derived from the database files but is not explicitly stored. Sharing of data and multi user transaction processing A multi user DBMS, allows multiple users to access the database at the same time. It ensures that data changes are effected in a controlled manner so that the result of the changes are correct 16 Database Management Systems Advantages of Database Systems Controlling redundancy in data storage. Sharing of data among multiple users. Restricting unauthorized access to data Providing persistent storage for program Objects ○ E.g., Object-oriented DBMSs make program objects persistent Providing Storage Structures and search techniques for efficient Query Processing Providing backup and recovery Providing multiple interfaces to different classes of users. 17 Database Management Systems Advantages of Database Systems Representing complex relationships among data. Enforcing integrity constraints on the database. Drawing inferences and actions from the stored data using deductive and active rules and triggers Potential for enforcing standards Reduced application development time Flexibility to change data structures Availability of current information Economies of scale 18 Database Management Systems Q&A Identify the Problem: Case Study Applications 1. Scenario 1: Banking Transactions: At a bank, two customers attempt to withdraw money from the same account simultaneously. Due to a lack of concurrency control, the system fails to manage the simultaneous transactions correctly, resulting in the account being overdrawn. Problem Type: Concurrent Access Anomalies 19 Database Management Systems Q&A Scenario 2: Customer Records: A company stores customer information in multiple spreadsheets, with each department maintaining its own copy. Occasionally, customer addresses and contact details differ between spreadsheets, leading to confusion and errors in communication. Problem Type: Data Redundancy and Inconsistency 20 Database Management Systems Q&A Scenario 3: Inventory and Orders: A small business uses separate systems to manage inventory and orders. The inventory system is not updated in real-time with sales data, leading to situations where the business oversells products, unaware that the stock has been depleted. Problem Type: Data Isolation 21 Database Management Systems Q&A Scenario 4: Student Grades: A university records student grades manually, and sometimes errors are made in entering the data, such as typing grades incorrectly. There is no system in place to check for or correct these errors automatically. But, if grade is entered as ‘x’ which is not a valid grade then constraints can be added to handle. Problem Type: Integrity Problems 22 THANK YOU S Nagasundari Department of Computer Science and Engineering [email protected] 23 Database Management Systems Data Abstraction & Instances, Schema 1 Database Management Systems Unit 1: Introduction to Database Management Slides adapted from Author Slides of “Database System Concepts”, Silberschatz, H Korth and S Sudarshan, McGrawHill, 7th Edition, 2019. And Author Slides of Fundamentals of Database Systems”, Ramez Elamsri, Shamkant B Navathe, Pearson, 7th Edition, 2017. 2 Database Management Systems Data Abstraction DBMS plays an important role in many areas. One of the main requirements is that it must retrieve the data efficiently. This need for efficiency has led the database developers to use complex data structures to represent data in a database Many database-system users are not computer trained and do not know how to write complex queries for getting information they are looking for. Hence developer hide all these complexity from the users through several layers of data abstraction, in order to simplify the users’ interaction with the system The different levels of data abstraction: ○ Physical Level ○ Logical level ○ View level Database Management Systems Data Abstraction Let us now have a look at what happens at each level: Physical Level : It is the lowest level of data abstraction It describes how the data is actually stored in the computer Logical Level : This describes what kind of data is stored in the database and how different pieces of information are related to each other The logical level thus describes the entire database in terms of a small number of relatively simple structures. This level provides a more organized and simpler view of the data 4 Database Management Systems Data Abstraction Let us now have a look at what happens at each level: View Level : This is the highest level of abstraction that individual users interact with and that only displays a part of the entire database Even though the logical level uses simpler structures, complexity remains because of the variety of information stored in a large database. Many users of the database system do not need all this information; instead, they need to access only a part of the database. At this level, different users might see different parts of the database, depending on their needs and permissions 5 Database Management Systems Data Abstraction Now we can notice that: Although the implementation of simple structures at the logical level may involve complex physical-level structures, the user at the logical level does not need to be aware of this. Similarly the user who views the database for various purposes need not be aware of the Logical level implementation, for them it would be like an interface and need not worry about the logical view 6 Database Management Systems Data Abstraction PHYSICAL DATA INDEPENDENCE: It refers to the characteristic of being able to modify the physical schema without any alterations to the conceptual or logical schema, done for optimization purposes. Example : the Conceptual structure of the database would not be affected by any change in the storage size of the database system server. Changing from sequential to random access files is one such example. These alterations or modifications to the physical structure may include: ○ Utilizing new storage devices. ○ Modifying data structures used for storage. ○ Altering indexes or using alternative file organization techniques etc. 7 Database Management Systems Data Abstraction LOGICAL DATA INDEPENDENCE It refers characteristic of being able to modify the logical schema without affecting the external schema or application program. The user view of the data would not be affected by any changes to the conceptual view of the data. Example: These changes may include the insertion or deletion of attributes, altering table structures entities or relationships to the logical schema, etc. 8 Database Management Systems Data Abstraction Now let us understand this with the help of an example of a University database Let the university have the given Record types/ tables as shown Now for the given database, Physical level: ○ Specify how student records, course information, and department data are stored on the hard drive or in memory. Logical Level: ○ It would define entities like "Department," "Student," and "Course" and their attributes (e.g., department name, student ID, course code). ○ It also outlines the relationships between these entities, such as "Students belong to Departments" and "Courses are taken by Students.“ 9 Database Management Systems Data Abstraction Now let us understand this with the help of an example of a University database Let the university have the given Record types/ tables as shown Now for the given database, View Level: ○ A faculty member may have access to a view showing only the courses they are teaching and the students enrolled in those courses. ○ A student, on the other hand, may have a view displaying only the courses they are registered for and their respective grades. ○ These views provide a simplified and customized perspective of the database for different users. Similarly , how could physical and logical data independence be of an advantage in the case of the university database? 10 Database Management Systems Data Abstraction Let us consider the situation where the University decides to migrate its database from one storage system to another or optimize the storage layout to improve performance. What would be the consequence of this? Without Physical data independence: Minor changes to the storage system or hardware upgrades could potentially break existing queries, applications, and interfaces. As a result, any update to the storage system would require extensive modifications to the application layer, leading to higher maintenance costs and longer development cycles. 11 Database Management Systems Data Abstraction Let us consider the situation where the University decides to migrate its database from one storage system to another or optimize the storage layout to improve performance. What would be the consequence of this? With physical data independence: the database administrators can make these changes without altering the logical schema, which means users' queries and applications will continue to work seamlessly. It allows for easier maintenance, upgrades, and performance enhancements without disrupting the application layer. 12 Database Management Systems Data Abstraction Let's say the University decides to reorganize its department structure and add new attributes to the "Department" entity. Then what would be the effect? Without Logical data independence: Any alteration to the logical schema, such as adding or removing attributes, or changing relationships between entities, would result in changes propagating throughout the applications that rely on the database. This would force developers to modify all affected application code to accommodate the changes in the schema. Consequently, even minor changes to the data model could lead to widespread disruption and downtime for the applications, increasing the risk of introducing errors and inconsistencies. 13 Database Management Systems Data Abstraction Let's say the University decides to reorganize its department structure and add new attributes to the "Department" entity. Then what would be the effect? With Logical data independence: The database designers can make these modifications without impacting the existing application programs that access the database. This flexibility allows for easier adaptability to changes in the University's requirements and business rules without causing application disruptions. 14 Database Management Systems Instances and Schema Databases change over time as information is inserted and deleted or modified. The collection of information stored in the database at a particular moment is called an instance of the database. The overall design of the database is called the database schema. Let us try to understand what that exactly means with an analogy to a program written in any programming language Database Schema => variable declarations along with the associated type definitions Instances => Value of the variable at a particular point in time in the program 15 Database Management Systems Three Schema Architecture 16 Database Management Systems Instances and Schema The databases would have different type of schemas, partitioned according to the different levels of abstraction Physical Schema/Internal Schema : describes the database design at the physical level Logical Schema/Conceptual Schema : describes database design at the logical level Subschemas/External View Schema : there could be several schemas at the view level called subschemas, that describes different views Considering the university Database, what would these schemas represent? 17 Database Management Systems Instances and Schema Let's consider three levels of schema and understand them w.r.t University database: Physical Schema: ○ This describes how the data is stored in the database physically. It includes details like data storage format, file organization, and indexing methods. Changes in the physical schema should not affect the applications using the database. ○ Example: How data is stored on the hard disk, like using B-trees or hash indexes for efficient retrieval. Logical Schema: ○ This defines the database design at a higher level, focusing on the structure and relationships between the tables. It is essential for application developers as they build software based on logical schema. ○ Example: Defining the relationships between tables, such as the one-to-many relationship between students and courses, where one student can be enrolled in many courses, but each course can have multiple students. 18 Database Management Systems Instances and Schema Let's consider three levels of schema and understand them w.r.t University database: View Schema (Subschema): ○ This describes different views of the database that are tailored to specific user needs or applications. Views allow users to see only relevant information and protect sensitive data. ○ Example: Creating a view that shows only the names of enrolled students and their corresponding course names, hiding other details like grades and personal information. 19 Database Management Systems Instances and Schema Out of all the schema types, the logical schema holds the most significance when it comes to impacting application programs. This is because programmers utilize the logical schema to develop applications. The physical schema is hidden beneath the logical schema and can usually be changed easily without affecting application programs. Application programs are said to exhibit physical data independence if they do not depend on the physical schema and thus need not be rewritten if the physical schema changes. It is possible to create schemas that have problems, such as unnecessarily duplicated information. ○ If we store the details of the department chairperson in the faculty table, i.e. every faculty would have the attribute of “chairperson”, then there is a lot of space that gets wasted as the chairperson would be the same for the entire department and we are storing it for every faculty. Rather the better idea would be to store it in the department’s database. 20 Database Management Systems Database State The actual data in a database may change quite frequently. The data in the database at a particular moment in time is called a database state or snapshot. It is also called the current set of occurrences or instances in the database. In a given database state, each schema construct has its own current set of instances. Many database states can be constructed to correspond to a particular database schema. 21 Database Management Systems Database State V/S Schema The distinction between database schema and database state is very important. When we define a new database, we specify its database schema only to the DBMS. At this point, the corresponding database state is the empty state with no data. We get the initial state of the database when the database is first populated or loaded with the initial data. From then on, every time an update operation is applied to the database, we get another database state. At any point in time, the database has a current state. 22 Database Management Systems Database State V/S Schema A simplified COMPANY relational database schema Sample database state for the relational database schema 23 Database Management Systems Instances and Schema Case Study: Marvel Cinematic Universe (MCU) Character Management Database: Scenario: The Marvel Studios team manages a database for tracking MCU characters, their movies, and key events. The database includes tables for characters, movies, and appearances. The team has decided to make several updates to optimize the database and enhance data access for the creative team working on new projects and storylines. Details: Implemented a B-Tree index on the ‘character_id’ field in the ‘characters’ table to speed up lookups for characters across different movies. Created a view named ‘CharacterAppearances’ that shows each character’s name, the movies they appeared in, and key events related to their appearances. Updated the ‘movies’ table to include a new column for ‘release_date’ to better track the release schedule of MCU films. Question: Identify the type of Schema (View, Logical, Physical) for each of the scenarios mentioned above. 24 Database Management Systems Instances and Schema Case Study: Marvel Cinematic Universe (MCU) Character Management Database: Details: Implemented a B-Tree index on the ‘character_id’ field in the ‘characters’ table to speed up lookups for characters across different movies. Created a view named ‘CharacterAppearances’ that shows each character’s name, the movies they appeared in, and key events related to their appearances. Updated the ‘movies’ table to include a new column for ‘release_date’ to better track the release schedule of MCU films. Solution: Physical Schema: The B-Tree index on the ‘character_id’ field affects the physical schema. B-Trees improve query performance by allowing efficient retrieval of character information. View Schema: Creating the ‘CharacterAppearances’ views involve the view schema. Logical Schema: Adding the ‘release_date’ column in the ‘movies’ table modifies the logical schema. This affects how the movie data is structured. 25 THANK YOU S Nagasundari Department of Computer Science and Engineering [email protected] 26 Database Management Systems Database Languages 1 Database Management Systems Unit 1: Introduction to Database Management Slides adapted from Author Slides of “Database System Concepts”, Silberschatz, H Korth and S Sudarshan, McGrawHill, 7th Edition, 2019. And Author Slides of Fundamentals of Database Systems”, Ramez Elamsri, Shamkant B Navathe, Pearson, 7th Edition, 2017. 2 Database Management Systems Database Languages 3 Database Management Systems Introduction to DBMS Languages DBMS Languages: Database Management Systems (DBMS) use specialized languages to define, manage, and manipulate databases. Key types of DBMS languages include Data Definition Language (DDL), Storage Definition Language (SDL), View Definition Language (VDL), and Data Manipulation Language (DML). Modern DBMS often integrate these languages into a comprehensive database language like SQL. 4 Database Management Systems Data Definition Language (DDL) Data Definition Language (DDL) DDL is used to define the database schema, including conceptual and internal schemas. In systems with no strict separation between schema levels, DDL defines both. DDL compiler processes DDL statements, storing schema descriptions in the DBMS catalog. 5 Database Management Systems Storage Definition Language (SDL) & View Definition Language (VDL) SDL & VDL Storage Definition Language (SDL): ○ Used for specifying internal schema and storage-related details like indexing and data mapping. ○ Often combined with other functions in modern relational DBMS. View Definition Language (VDL): ○ Specifies user views and their mappings to the conceptual schema. ○ SQL often fulfills the role of VDL by defining application views as results of queries. 6 Database Management Systems Data Manipulation Language (DML) Data Manipulation Language (DML) DML is used to manipulate data with in the database, including retrieval, insertion, deletion, and modification. Modern DBMS integrate DML with other languages, such as SQL, for seamless database management. Types of DML: ○ High-Level/Declarative(Nonprocedural): Set-oriented, declarative, used for specifying complex operations. Focuses on declaring what needs to be achieved without specifying the exact steps to get there SQL queries are a prime example of non-procedural DML Example : You're specifying what data you want but not how the database should find and return that data. The database's query optimizer decides the best execution plan. 7 Database Management Systems Data Manipulation Language (DML) ○ Low-Level (Procedural): Record-at-a-time, requires programming constructs like loops. The developer explicitly writes out the steps needed to perform a task. This means specifying how to retrieve, insert, update, or delete data in a database. Example : The steps are explicitly defined: insert the data, capture the ID, and then commit 8 the transaction. Database Management Systems Integration of DBMS Languages Integration in Modern DBMS Modern DBMS often use a comprehensive language like SQL, integrating DDL, DML, and VDL. Storage definitions (SDL) are managed separately to optimize database performance. SQL includes constructs for schema definition, data manipulation, and constraint specification. 9 Database Management Systems User Interaction with DML User Interaction with DML High-Level DMLs can be used interactively or embedded in general-purpose programming languages. Host Language: When DML is embedded, the primary language is called the host language, and the DML is a data sublanguage. Query Language: High-level DML used standalone is called a query language, commonly used by casual end users. User-friendly interfaces are available for naive users to interact with databases without in-depth knowledge of DML. 10 Database Management Systems Database Access from Application Programs Questions: 1. The __________ is used by database designers to define both conceptual and internal schemas in DBMSs where no strict separation of levels is maintained. 1. In most modern relational DBMSs, __________ performs the role of View Definition Language (VDL) to define user or application views. 1. In modern DBMSs, the Storage Definition Language (SDL) is typically used as a standalone language to define the internal schema and storage structures. 1. A high-level DML is often referred to as __________ because it specifies which data to retrieve rather than how to retrieve it. 11 Database Management Systems Database Access from Application Programs Questions: 1. The Data Definition Language (DDL) is used by database designers to define both conceptual and internal schemas in DBMSs where no strict separation of levels is maintained. 2. In most modern relational DBMSs, SQL performs the role of View Definition Language (VDL) to define user or application views. 3. In modern DBMSs, the Storage Definition Language (SDL) is typically used as a standalone language to define the internal schema and storage structures. False 4. A high-level DML is often referred to as declarative/Non-Procedural because it specifies which data to retrieve rather than how to retrieve it. 12 THANK YOU S Nagasundari Department of Computer Science and Engineering [email protected] 13 Database Management Systems Database Design 1 Database Management Systems Unit 1: Introduction to Database Management Slides adapted from Author Slides of “Database System Concepts”, Silberschatz, H Korth and S Sudarshan, McGrawHill, 7th Edition, 2019. And Author Slides of Fundamentals of Database Systems”, Ramez Elamsri, Shamkant B Navathe, Pearson, 7th Edition, 2017. 2 Database Management Systems Why Database Design ? Scenario: Imagine you are tasked with designing a database for a new online bookstore. The bookstore needs to manage a large catalog of books, customer information, orders, and reviews. Customers should be able to search for books, place orders, leave reviews, and track their order history. Question What factors would you need to consider while designing the database system to ensure it meets all the needs of the online bookstore? (Hint : Types of data Eg: books, customers, orders, reviews? How will you ensure data integrity and security?) 3 Database Management Systems Database Design Database design mainly involves the design of the database schema. The design of a complete database application environment that meets the needs of the enterprise being modeled requires attention to a broader set of issues. A high-level data model gives designers a conceptual framework to specify user data requirements and structure the database accordingly. The various steps involved in the design process are : 4 Database Management Systems Database Design Let us now try and understand what happens at each step: 1. USER REQUIREMENT SPECIFICATION Interact with domain experts and users to fully understand the data needs and functional requirements of the database. Identify the operations or transactions that users will perform on the data. Document and review the user requirements to ensure clarity and accuracy. 2. CONCEPTUAL DESIGN PHASE Choose a high-level data model (e.g., entity-relationship model) to conceptualize the data requirements and relationships. Develop a conceptual schema that provides a detailed overview of the enterprise's data structure. Review the schema to ensure all data requirements are met and remove any 5 redundant features. Database Management Systems Database Design 3. LOGICAL DESIGN PHASE Map the high-level conceptual schema to the implementation data model of the database system that will be used (e.g., relational model). Use normalization algorithms to generate a set of tables that properly group attributes and reduce data redundancy. Confirm that the logical design meets the functional requirements specified in the user requirements phase. 4. PHYSICAL DESIGN PHASE Specify the physical features of the database, including file organization and internal storage structures. Consider performance optimization and choose appropriate indexing techniques to enhance data retrieval speed. Review and refine the physical design to ensure it aligns with the requirements and constraints of the database system and hardware. 6 Database Management Systems Database Engine 7 Database Management Systems Database Engine A database system is partitioned into modules that deal with each of the responsibilities of the overall system. The functional components of a database system can be divided into The storage manager, The query processor component, The transaction management component. 8 Database Management Systems Storage Manager A program module that provides the interface between the low-level data stored in the database and the application programs and queries submitted to the system. The storage manager is responsible for the following tasks: Interaction with the OS file manager Efficient storing, retrieving and updating of data The storage manager implements several data structures as part of the physical system implementation: Data files -- store the database itself Data dictionary -- stores metadata about the structure of the database, in particular the schema of the database. Indices -- can provide fast access to data items. A database index provides pointers to those data items that hold a particular value. 9 Database Management Systems Storage Manager The storage manager components include: Authorization and integrity manager Transaction manager File manager Buffer manager 10 Database Management Systems Query Processor The query processor is important because it helps the database system to simplify and facilitate access to data. The query processor allows database users to obtain good performance while being able to work at the view level and not be burdened with understanding the physical-level details of the implementation of the system. The query processor components include: DDL interpreter DML compiler ○ performs query optimization, i.e. it picks the lowest cost evaluation plan from among the various alternatives. Query evaluation engine 11 Database Management Systems Query Processor The internal structure of query optimizer: 12 Database Management Systems Query Processing 1. Parsing and translation 2. Optimization 3. Evaluation 13 Database Management Systems Transaction Management A transaction is a collection of operations that performs a single logical function in a database application Transaction-management component ensures that the database remains in a consistent (correct) state despite system failures (e.g., power failures and operating system crashes) and transaction failures. Transaction properties: A transaction is a logical unit of work that must maintain atomicity (all-or-none execution), consistency (database remains valid throughout), and durability (changes are permanent even after system failure). Consistency and atomicity: During the execution of a transaction, temporary inconsistency may be allowed, but the database must return to a consistent state after the transaction completes successfully. 14 Database Management Systems Transaction Management The transaction Manager Application of transactions: Transactions are widely used in various applications, including financial, telecommunication, and long-duration activities like product design or administrative workflows. 15 Database Management Systems Database System Structure 16 THANK YOU S Nagasundari Department of Computer Science and Engineering [email protected] 17 Database Management Systems Database and Application Architecture 1 Database Management Systems Unit 1: Introduction to Database Management Slides adapted from Author Slides of “Database System Concepts”, Silberschatz, H Korth and S Sudarshan, McGrawHill, 7th Edition, 2019. And Author Slides of Fundamentals of Database Systems”, Ramez Elamsri, Shamkant B Navathe, Pearson, 7th Edition, 2017. 2 Database Management Systems Database Architecture Scenario: Imagine you are responsible for the database system of a popular online multiplayer game. The game has thousands of players online simultaneously, from different parts of the world. Each player needs real-time updates for their game stats, friends list, and in-game purchases. Question: How would you design the database system to ensure efficient, secure, and real-time data access for all players? 3 Database Management Systems Database Architecture Centralized databases (Centralized architecture) Characteristics: Single Location: All data is stored and managed on a central server or database. Shared Memory: Suitable for systems with multiple CPUs that access a common shared memory space. Resource Allocation: Resources like CPU, memory, and storage are centralized, making it easier to manage and maintain. Use Cases: Small to Medium-Sized Applications: Suitable for applications with moderate data needs and user loads. Legacy Systems: Often used in older systems where scaling is not as critical or feasible. Centralized/Shared Memory DB 4 Database Management Systems Database Architecture Client Server Architecture: Client-server architecture is a model where a client (the user or application) interacts with a server (the database system) to access and manage data. The client is the user-facing component that interacts with the server to request data or perform operations. (web browser, mobile apps, desktop applications) The server is the backend component that manages the database, processes requests from clients, and handles data storage and retrieval. (MySQL, Oracle, Postgres) 5 Database Management Systems Database Architecture Parallel databases: Designed to run on a cluster of multiple machines Enables better scalability and higher processing capabilities. Many cores shared memory and Shared Disk Parallel databases are ideal for high-performance applications requiring extensive data processing, such as large-scale data warehousing, real-time analytics, and high-throughput transactional systems. Example: How do you think NASA manages and processes the vast amounts of satellite data it collects? Answer: NASA uses parallel databases and high-performance computing (HPC) systems. NASA can efficiently handle and analyze massive datasets from satellite missions, enabling quick and insightful analysis for complex tasks like climate modeling and NASA’s HPC Cluster environmental monitoring. 6 Database Management Systems Application Architectures Distributed Database: A distributed database is a database system in which data is stored across multiple physical locations. These locations can be spread across various machines within a single data center or across different geographic locations. Advantages: Scalability: Can handle large amounts of data and high traffic by distributing the load across multiple nodes. Fault Tolerance: Provides high availability and fault tolerance through data replication and distribution. Performance: Can improve performance by distributing queries and data access across multiple servers. 7 Database Management Systems Two-Tier Client Server Architectures Client: This is the user interface where users interact with the application. It can be a desktop application or a web-based interface. Server: This is where the database is stored and managed. The server handles data storage, retrieval, and management tasks. Client programs send query and transaction requests via ODBC or JDBC API, the server In a two-tier database architecture, there are two processes these requests and sends the results main components: the client (user interface) and back to the client, which then processes and the server (database). displays the results. 8 Database Management Systems Two-Tier Client Server Architectures Clients Provide appropriate interfaces through a client software module to access and utilize the various server resources. Clients may be diskless machines or PCs or Workstations with disks with only the client software installed. Connected to the servers via some form of a network. (LAN: local area network, wireless network, etc.) Servers Provides database query and transaction services to the clients Relational DBMS servers are often called SQL servers, query servers, or transaction servers Applications running on clients utilize an Application Program Interface (API) to access server databases via standard interface such as: ODBC: Open Database Connectivity standard JDBC: for Java programming access Client and server must install appropriate client module and server module software for ODBC or JDBC 9 Database Management Systems Three-Tier Client Server Architectures Common for Web applications Intermediate Layer called Application Server or Web Server: ○ Stores the web connectivity software and the business logic part of the application used to access the corresponding data from the database server ○ Acts like a conduit for sending partially processed data between the database server and the client. Three-tier Architecture Can Enhance Security: ○ Database server only accessible via middle tier ○ Clients cannot directly access database server 10 Database Management Systems Questions Scenarios for Architecture Selection: 1. Weather Forecasting System: A weather forecasting system collects data from various sensors and satellites, processes the data to generate forecasts, and distributes the information to users through a web interface and mobile application. 2. E-Learning Platform: An e-learning platform offers online courses to students around the world. The platform needs a database to manage course content, track student progress, and support interactive features such as quizzes and discussion forums. 3. Social Media Platform: A social media platform with millions of users worldwide needs a database system that can handle high read and write loads, store large amounts of user-generated content, and provide fast access to data. 11 Database Management Systems Questions Scenarios for Architecture Selection: Solutions - Weather Forecasting System: Parallel database architecture. This architecture can handle large volumes of data from sensors and satellites, process it efficiently, and distribute forecasts quickly through web and mobile interfaces. E-Learning Platform: 3-tier architecture. This architecture supports the management of course content, tracking of student progress, and interactive features like quizzes and discussion forums, ensuring scalability and efficient data access. Social Media Platform: Distributed database architecture. This architecture supports high read and write loads and large amounts of user-generated content, providing fast access to data for users worldwide. 12 Database Management Systems Database Users & Administrators 13 Database Management Systems Database Users People who work with a database can be categorized as: 1. Database Users 2. Database Administrators. Database Users There are four different types of database-system users, differentiated by the way they expect to interact with the system. Different types of user interfaces have been designed for different types of users. 1. Naïve Users 2. Application Programmers 3. Sophisticated Users 4. Database Administrators 14 Database Management Systems Database Users Naïve Users: Naive users also known as Parametric End users, don't have any knowledge of databases but still frequently use the database applications to get the desired results. With the help of the interface provided by the DBMS applications. For Example: Have you ever tried to book a ticket for your favorite movie? It's straightforward and easy. But, do you have any idea about the complex database operations happening behind the scenes to make this process smooth and efficient? In database terminology, you are a naive user or a parametric end user. This means you interact with the database through a predefined interface without needing to know anything about how the database works. 15 Database Management Systems Database Users Application programmers : Application Programmers also known as Back-End Developers, are computer professional users who are responsible for developing the application programs. Example: For an e-commerce site, application programmers connect the app to the database, write queries to display products, manage user registrations and orders, handle database errors, and secure user data. Sophisticated Users: Sophisticated users are individuals who possess a deep understanding of database management systems (DBMS) and are proficient in using Data Definition Language (DDL) and Data Manipulation Language (DML) commands. Example: Data engineers and developers often access databases directly using SQL queries to fetch, delete, update, or insert data, making them sophisticated users. 16 Database Management Systems Database Administrator One of the main reasons for using DBMSs is to have central control of both the data and the programs that access those data. Database Administrator is an individual or a team of users who defines the database schema and takes charge of controlling various levels of the database within the organization. Functions of a Database Administrator (DBA): 1. Schema definition: DBA creates the original database schema using data definition statements (DDL). 2. Storage structure and access-method definition: DBA specifies parameters for data physical organization and index creation. 3. Schema and physical-organization modification: DBA makes changes to reflect organization's needs or enhance performance. 4. Granting authorization for data access: DBA regulates user access by granting different types of authorization. 5. Routine maintenance: DBA performs periodic backups, ensures disk space availability, and monitors database performance. 17 Database Management Systems Relational Schema Questions: 1. "Application programmers are responsible for defining storage structures and access methods in the database.“ – True/False 1. "A __________ user interacts with a system using predefined interfaces and does not need to write code.“ 1. "The role of a __________ includes creating the original database schema and managing user access.“ 1. "What type of user would most likely use a query language to retrieve specific data from a database?“ 18 Database Management Systems Relational Schema Questions: 1. "Application programmers are responsible for defining storage structures and access methods in the database.“ – False (Database Administrators handle this) 2. "A Naïve user interacts with a system using predefined interfaces and does not need to write code.“ 3. "The role of Database Administrator a includes creating the original database schema and managing user access.“ 4. "What type of user would most likely use a query language to retrieve specific data from a database?“ - Sophisticated User 19 THANK YOU S Nagasundari Department of Computer Science and Engineering [email protected] 20 Database Management Systems Database Design - Overview S Nagasundari Department of Computer Science Engineering 1 Database Management Systems Unit 1: Introduction to Database Management Slides adapted from Author Slides of “Database System Concepts”, Silberschatz, H Korth and S Sudarshan, McGrawHill, 7th Edition, 2019. And Author Slides of Fundamentals of Database Systems”, Ramez Elamsri, Shamkant B Navathe, Pearson, 7th Edition, 2017. 2 Database Management Systems Recap – Design Phases As we have seen before the design phases would involve the following steps: ○ Characterizing User Data Needs ○ Choosing a Data Model and Translating Requirements ○ Review and Refinement of Schema ○ Functional Requirements Specification ○ Logical-Design Phase ○ Physical-Design Phase Fig : A simplified diagram to illustrate the main phases of database design 3 Database Management Systems Recap – Design Phases Let us just have a look at the design phases again: Characterizing User Data Needs: Interaction with domain experts and users. Specification of user requirements. Choosing a Data Model and Translating Requirements: Selection of a data model. Translation of requirements into the conceptual schema. Using entity-relationship model for conceptual design. Review and Refinement of Schema: Confirming data requirement satisfaction. Eliminating redundant features. 4 Database Management Systems Recap – Design Phases Let us just have a look at the design phases again: Functional Requirements Specification: Users describe operations (transactions) on data. Ensure schema meets functional requirements. Logical-Design Phase: Mapping conceptual schema to implementation data model. Using entity-relationship model for relational design Physical-Design Phase: Using system-specific database schema. Specifying physical features (file organization, indexes). 5 Database Management Systems Why do we need a design phase? You are designing a database for an e-commerce platform. The platform is running smoothly, but after launch, you realize the database schema didn’t account for tracking customer reviews. Problem: Adding the reviews feature now requires significant changes to the database schema, which impacts numerous queries and updates scattered across the application code. This leads to extended downtime and increased costs. Why is it crucial to thoroughly plan and carry out the database design phase before building the rest of the application? 6 Database Management Systems Why do we need a design phase? The physical schema of a database can be changed relatively easily after an application has been built. However, changes to the logical schema are usually harder to carry out, since they may affect a number of queries and updates scattered across application code. It is therefore important to carry out the database design phase with care, before building the rest of the database application. 7 Database Management Systems Design Alternatives A major part of the database design process is deciding how to represent in the design the various types of “things” such as people, places, products, and the like. We use the term entity to refer to any such distinctly identifiable item. ○ Considering a university database, the entities would be: instructors students departments courses, etc. The various entities are related to each other in a variety of ways, all of which need to be captured in the database design. ○ For example: A student takes a course offering An instructor teaches a course offering 8 Database Management Systems Design Alternatives While designing a database we must try to avoid two major pitfalls: ○ Redundancy ○ Incompleteness Let us try and understand what these are with respect to the university database system, and what are their drawbacks if not handled 9 Database Management Systems Redundancy A bad design leads to repeating the same information unnecessarily which would lead to redundancy. For example in the university database, let us that we are storing the department_ID along with the department name for every course in the course entity Course(Department_ID, department_name, course_ID, course_name, credits) In this situation, a department (Computer Science) offers many courses, leading to unnecessary repetition of storing the department name along with the ID for each course (redundancy). 10 Database Management Systems Redundancy It would suffice to store only the department_ID with each course and to associate the department name with the Department_ID only once, in a department entity Department entity: Course entity: Redundancy can also occur in a relational schema. Imagine a single table with all course information repeated for each section. This creates unnecessary redundancy, as each course details are duplicated. 11 Database Management Systems Redundancy Problem with Redundancy: ○ Redundant copies of data can become inconsistent if updated without care. ○ Different sections of a course could end up having different titles if not properly managed. Example: Inconsistent Course Titles: ○ Multiple offerings of a course might have the same identifier but different titles. ○ This inconsistency makes it unclear which title is correct. Optimal Data Management: ○ Ideally, information should be stored in one place to avoid redundancy. ○ A course's title should be stored only in a single entity (like the course entity) and linked to offerings as needed. 12 Database Management Systems Incompleteness For instance, if we only have entities for course offerings(Sections) and not for the actual courses, it creates difficulties in modeling. ○ Imagine if we only track when courses are offered but don't have a separate entity for courses. ○ This design limitation would make it impossible to represent new courses that aren't currently offered. Similarly in Relations: ○ In terms of relational databases, having a single table with repeated course information for each section causes problems. ○ This approach prevents us from representing new courses that don't yet have offerings. 13 Database Management Systems Incompleteness Challenge in Representing New Information: ○ With this flawed design, adding details about a new course becomes tricky, as it must have an associated section to fit the structure. ○ Attempting to use null values for section information is a less desirable workaround. Drawbacks of Workaround: ○ Using null values to represent missing information is not an elegant solution. ○ It might run into