Copy of Unit 01 - Introduction to Database Management Systems .pdf
Document Details
Uploaded by LovedConnemara6528
Tags
Full Transcript
Unit 01 Introduction to Database Management Systems DBMS Team Outlines 1. Database vs File System 2. Database system concept and Architecture 3. Data Model Schema and Instances 4. Data Independence, Databa...
Unit 01 Introduction to Database Management Systems DBMS Team Outlines 1. Database vs File System 2. Database system concept and Architecture 3. Data Model Schema and Instances 4. Data Independence, Database Language and Interfaces 5. DDL and DML 6. Overall Database Structure 7. ER Models and Mapping constraints 8. DBMS Keys constraints 9. Generalization , Aggregation and Reduction of ER Diagrams to Tables 10. Extended ER Models and Relationship Higher Degrees Basic Definitions Database: A collection of related data. Data: Known facts that can be recorded and have an implicit meaning. Database Management System (DBMS): A software package/ system to facilitate the creation and maintenance of a computerized database. Database System: The DBMS software together with the data itself. Sometimes, the applications are also included. Reference: https://db-book.com/slides-dir/PDF-dir/ch.pdf In database terminology, what is an attribute? A) A unique identifier for a record in a table B) A field or column in a table that holds data C) A table that stores multiple records D) A row in a table that contains data values In database terminology, what is a relation? A) A unique identifier for a record in a table B) A row in a table that contains data values C) A table in a database that consists of rows and columns D) A field or column in a table that holds data In database terminology, what is a tuple? A) A unique identifier for a record in a table B) A table that stores multiple records C) A row in a table that contains a set of values for each attribute D) A field or column in a table that holds data Typical DBMS Functionality Define a particular database in terms of its data types, structures, and constraints Construct or Load the initial database contents on a secondary storage medium Manipulating the database: - Retrieval: Querying, generating reports - Modification: Insertions, deletions and updates to its content - Accessing the database through Web applications Processing and Sharing by a set of concurrent users and application programs – yet, keeping all data valid and consistent Reference: https://db-book.com/slides-dir/PDF-dir/ch.pdf #Example - Define a particular database in terms of its data types, structures, and constraints CREATE TABLE Customers ( customer_id INTEGER PRIMARY KEY, customer_name VARCHAR(100) NOT NULL, date_of_birth DATE NOT NULL, account_balance DECIMAL(10, 2) CHECK (account_balance >= 0) ); #Example - Construct or Load Initial Database Contents A. Creating the Database Create the Database Schema CREATE DATABASE CustomerDB; USE CustomerDB; CREATE TABLE Customers ( customer_id INTEGER PRIMARY KEY, customer_name VARCHAR(100) NOT NULL, date_of_birth DATE NOT NULL, account_balance DECIMAL(10, 2) CHECK (account_balance >= 0) ); Load Initial Data INSERT INTO Customers (customer_id, customer_name, date_of_birth, account_balance) VALUES (1, 'John Doe', '1985-05-15', 1000.00), (2, 'Jane Smith', '1990-07-22', 1500.50), (3, 'Alice Johnson', '1988-11-03', 1200.75); B. Using a Secondary Storage Medium - Export the Database - Backup to External Storage C. Manipulating the Database Example of Database Reference: https://db-book.com/slides-dir/PDF-dir/ch.pdf File Systems A File System is a tool which facilitates storing, accessing and modifying data from numerous files in a computer system. All data is stored in the form of files. All files are categorized and sorted accordingly. The file names are closely related to one another and are organised in such a way that they are easily accessible. Files like.txt,.jpg, and.docx, as well as structured data types like.xml and.json, may be used by a file system. For organizing or more complex administration needs, files can be placed in different system directories. Examples of file systems include storage media like hard disks, pen drives, etc, that may contain data such as music files, videos, documents and e.t.c. DBMS VS File Systems Simplified Database Environment systems Reference: https://db-book.com/slides-dir/PDF-dir/ch.pdf Data Model / Database Schema and Database Instances Database Schema: Logical Design of the Database Defines how data in a database is structured Defines how elements like foreign and primary keys, data types, and fields within a database relate to each other Accomplishes the above through the implementation of coded rules Reference: https://db-book.com/slides-dir/PDF-dir/ch.pdf Data Model / Database Schema and Database Instances Database Instance: Snapshot of the data in the database at a given instance in time. Reference: https://db-book.com/slides-dir/PDF-dir/ch.pdf Data Model Schema VS Database Instances Data Independence Data Independence in DBMS is of two types: 1. Physical Data Independence 2. Logical Data Independence Data Independence Physical Data Independence: This is defined as the ability to modify the physical schema of the database without the modification causing any changes in the logical/conceptual or view/external level. Why Physical Data Independence is important ? Physical data independence allows you to distinguish between conceptual and internal/physical levels. It allows you to describe the database logically without needing to identify physical structures. Physical data independence allows you to modify physical storage structures or devices without affecting the conceptual model of the database. Figure: Three level architecture of DBMS Data Independence Logical Data Independence Logical data independence is the ability to modify logical schema without causing any unwanted modifications to the external schema or the application programs to be rewritten. Simple Example : Data Independence Imagine you have a library system where you keep track of books and their details. The system is organized into a catalog where each book has information like the title, author, and publication year. Logical Schema Think of the logical schema as the structure of this catalog. It’s how the information is organized internally. For example: Each book entry in the catalog has fields for title, author, and publication year. This catalog is stored in a database table named Books. External Schema The external schema is how the catalog is presented to users and how they interact with it. For example: Library staff might use a software program to add new books or check out books. Readers might use a website to search for books. Logical Data Independence in Action Let's say you want to improve the catalog by adding a new field for "Genre" to better categorize books. You make this change to the logical schema so that each book entry now also includes the genre. Logical Data Independence means that you can make this change (add a "Genre" field) without affecting how library staff use their software or how readers search for books on the website. The software and website continue to work as they did before; they just see the additional "Genre" information when they query the catalog. Why Logical Data Independence is important ? Logical data is database data, which means it stores information about how data is managed within the database. Logical data independence is a method that makes sure that if we make modifications to the table format, the data should not be affected. In other words, to distinguish the external level from the conceptual view, logical data independence is used. Any modifications to the conceptual representation of the data will not affect the user's view of the data. Overview of DDL [ Data Definition Language ] and DML [ Data Manipulation Language ] DDL [ Data Definition Language ] Data Definition Language (DDL) commands allow me to define and manage a schema in SQL. In a nutshell, a schema in SQL is a blueprint that defines how data is organized in a database and how the relationships among different tables within the database are managed. DDL [ Data Definition Language ] Data Definition Language (DDL) commands allow me to define and manage a schema in SQL. In a nutshell, a schema in SQL is a blueprint that defines how data is organized in a database and how the relationships among different tables within the database are managed. Data Manipulation Language Data Manipulation Language, also known as DML, is a set of SQL commands that are used to manipulate data within database tables or query views. Data analysts, scientists, engineers, and anyone using SQL rely on DML in order to access, transform, and analyze data. Database Languages ER Model and Interfaces An Entity Relationship Diagram is a diagram that represents relationships among entities in a database. It is commonly known as an ER Diagram Entity: An Entity in a Database is a real-world object which can be described in terms of some features. For example, a box is a real-world object which can be described in terms of shape, size, and color. Attributes: The attributes are the characteristics that describe the entity of the database. For example, shape, size, and color in the above case. Relationship: Relationship is the logical binding between the different entities which exist in the Database. Suppose, two entities are Parent and Child. Then, the relationship would be ‘gave birth to.’ It means the parent ‘gave birth to the child, so the parent and child are related in this way. Reference: https://blog.knoldus.com/introduction-er-diagram/ Database Languages Entitiesand Interfaces Reference - https://www.youtube.com/@nptel-nociitm Database Languages Attributes and Interfaces Reference - https://www.youtube.com/@nptel-nociitm Database Languages Attributes and Interfaces Reference - https://www.youtube.com/@nptel-nociitm Database Languages Attributes and Interfaces Reference - https://www.youtube.com/@nptel-nociitm Database Languages ER Model and Interfaces Reference: https://blog.knoldus.com/introduction-er-diagram/ Notion of ER Diagrams Database Notation Languages for Entities and Interfaces Reference - https://www.youtube.com/@nptel-nociitm Database Languages Attributes and Interfaces Reference - https://www.youtube.com/@nptel-nociitm Notion of ER Diagrams Reference - https://www.youtube.com/@nptel-nociitm Mapping Cardinality Constraints Express the number of entities to which another entity can be associated via a relationship set. ▪ Most useful in describing binary relationship sets. ▪ For a binary relationship set the mapping cardinality must be one of the following types: - One to one - One to many - Many to one - Many to many References - https://courses.cs.washington.edu/courses/cse344/12wi/lectures/lecture15-constraints.pdf Mapping Cardinality Note: Some elements in A and B may not be mapped to any elements in the other set Reference: https://db-book.com/slides-dir/PDF-dir/ch.pdf Mapping Cardinality Note: Some elements in A and B may not be mapped to any elements in the other set Reference: https://db-book.com/slides-dir/PDF-dir/ch.pdf Mapping Cardinality When a single instance of an entity is associated with a single instance of another entity, then it is called as one to one cardinality Here each entity of the entity set participate only once in the relationship Reference - https://prepinsta.com/dbms/mapping-constraints/ Mapping Cardinality When is a single instance of an entity is associated with more than one instance of another entity then this type of relationship is called one to many relationships. Here entities in one entity set can take participation in any number of times in relationships set and entities in another entity set can take participation only once in a relationship set. Reference - https://prepinsta.com/dbms/mapping-constraints/ Mapping Cardinality When entities in one entity set can participate only once in a relationship set and entities in another entity can participate more than once in the relationship set, then such type of cardinality is called many-to-one Reference - https://prepinsta.com/dbms/mapping-constraints/ Mapping Cardinality Here, more than one instance of an entity is associated with more than one instance of another entity then it is called many to many relationships. In this cardinality, entities in all entity sets can take participate any number of times in the relationship cardinality is many to many. Reference - https://prepinsta.com/dbms/mapping-constraints/ ER Model - Example Reference - https://www.youtube.com/@nptel-nociitm ER Model - Examples Reference - https://www.youtube.com/@nptel-nociitm ER Model Reference - https://www.youtube.com/@nptel-nociitm Constraints Integrity Constraints Motivation An integrity constraint is a condition specified on a database schema that restricts the data that can be stored in an instance of the database. References - https://courses.cs.washington.edu/courses/cse344/12wi/lectures/lecture15-constraints.pdf WHY AND HOW ? Why: Integrity Constraints [ICs] help prevent entry of incorrect information How: DBMS enforces integrity constraints – Allows only legal database instances (i.e., those that satisfy all constraints) to exist – Ensures that all necessary checks are always performed and avoids duplicating the verification logic in each application References - https://courses.cs.washington.edu/courses/cse344/12wi/lectures/lecture15-constraints.pdf Constraints in ER diagrams Finding constraints is part of the DBMS modeling process. Commonly used constraints: Keys: social security number uniquely identifies a person. Single-value constraints: a person can have only one father. Referential integrity constraints: if you work for a company, it must exist in the database. Other constraints: peoples’ ages are between 0 and 150. References - https://courses.cs.washington.edu/courses/cse344/12wi/lectures/lecture15-constraints.pdf Keys in ER Diagrams References - https://courses.cs.washington.edu/courses/cse344/12wi/lectures/lecture15-constraints.pdf Single Values Constraints References - https://courses.cs.washington.edu/courses/cse344/12wi/lectures/lecture15-constraints.pdf Referential Integrity Constraints Each product made by at most one company. Some products made by no company References - https://courses.cs.washington.edu/courses/cse344/12wi/lectures/lecture15-constraints.pdf Referential Integrity Constraints Each product made by exactly one company References - https://courses.cs.washington.edu/courses/cse344/12wi/lectures/lecture15-constraints.pdf Other Constraints References - https://courses.cs.washington.edu/courses/cse344/12wi/lectures/lecture15-constraints.pdf Types of Constraints in SQL Focus References - https://courses.cs.washington.edu/courses/cse344/12wi/lectures/lecture15-constraints.pdf Key Constraints Reference: https://db-book.com/slides-dir/PDF-dir/ch.pdf Keys with multiple attributes Reference: https://db-book.com/slides-dir/PDF-dir/ch.pdf Keys Overview Reference: https://www.geeksforgeeks.org/types-of-keys-in-relational-model-candidate-super-primary-alternate-and-foreign/ Candidate Key A candidate key in a database management system (DBMS) is a unique identifier for a record within a table that can be chosen as the primary key. It possesses the essential characteristics required for a primary key: uniqueness and minimal redundancy. Reference: https://db-book.com/slides-dir/PDF-dir/ch.pdf Candidate Key Reference: https://db-book.com/slides-dir/PDF-dir/ch.pdf Super Key Super keys are collections of one or more properties (columns) in database management systems that allow a tuple (row) in a relation (table) to be distinctly identified Role of Super Key : Originality: A super key ensures each value combination is unique, enabling accurate data retrieval and processing. Selection: Super keys help locate specific tuples in a dataset by utilizing their unique properties. Security of Data: They maintain data consistency by preventing duplicate and inconsistent entries. Validity of Reference: Super keys can serve as foreign keys, ensuring reliable references between related tables. Indexing: Super keys enhance database performance by enabling efficient indexing and faster data access. Reference: https://db-book.com/slides-dir/PDF-dir/ch.pdf Super Key Reference: https://db-book.com/slides-dir/PDF-dir/ch.pdf Super Key Identify the Super Key from the above table. Reference: https://db-book.com/slides-dir/PDF-dir/ch.pdf Primary Key A table's primary key is a column that identifies each tuple (row). The table's primary key enforces integrity restrictions. A table may only contain one main key. Duplicate and NULL values cannot be used in the primary key. Note: Choosing the primary key value in a table carefully is important because it can change quite infrequently Reference: https://db-book.com/slides-dir/PDF-dir/ch.pdf Primary Key The following are the main fundamental characteristics: Duplicate values are not permitted in the primary key column. It implements the entity integrity of the table. A table may only have one primary key column. From one or more table fields, we can create the primary key. NOT NULL constraints should be present in the primary key column. Reference: https://db-book.com/slides-dir/PDF-dir/ch.pdf Primary Key Unique Key Unique key constraints can uniquely identify an individual tuple in a relation or table. Unlike the main key, a table may have several unique keys. Note: Only one NULL value per column is permitted for unique key restrictions Reference: https://db-book.com/slides-dir/PDF-dir/ch.pdf Unique Key Unique Key The following are the fundamental distinctive characteristics: One or more table fields can be used to create the unique key. There may be several distinct key columns defined in a table. Unclustered unique indexes are where a unique key is, by default, found. The column with the unique constraint may store a NULL value, but only one NULL is permitted per column. The unique constraint can be referred to as the foreign key to maintaining a table's uniqueness. Reference: https://db-book.com/slides-dir/PDF-dir/ch.pdf Foreign Key Constraint A foreign key (FK) is a column or combination of columns that is used to establish and enforce a link between the data in two tables to control the data that can be stored in the foreign key table Reference: https://db-book.com/slides-dir/PDF-dir/ch.pdf Foreign Key Constraint CREATE TABLE Purchase ( prodName CHAR(30) REFERENCES Product(name), date DATETIME ); - prodName is a foreign key to Product(name) - name must be a key in Product Reference: https://db-book.com/slides-dir/PDF-dir/ch.pdf Foreign Key Constraint Reference: https://db-book.com/slides-dir/PDF-dir/ch.pdf Foreign Key Constraint Reference: https://db-book.com/slides-dir/PDF-dir/ch.pdf Additional - Participation Constraint Reference - https://www.youtube.com/@nptel-nociitm Example - Participation Constraint Reference - https://www.youtube.com/@nptel-nociitm Entity-Relationship (ER) modeling streamlines database design by using three key concepts: 1. Generalization 2. Specialization 3. Aggregation Generalization When someone says, "Social media", what comes to mind? Facebook, Twitter, Instagram, or maybe LinkedIn In short, if we want to talk about Social media in general, we generalize and refer to them as "Social Media". This method of combining multiple lower-level entities of a similar kind and referring to them as a single higher-level entity is what we know as generalization. Reference: https://db-book.com/slides-dir/PDF-dir/ch6.pdf Generalization Generalization in a Database Management System is a bottom-up approach where entities with common attributes are combined to form a higher-level entity. This process simplifies ER diagrams by merging lower-level entities. Reference: https://db-book.com/slides-dir/PDF-dir/ch6.pdf Generalization Design steps for Generalization Define some entities with attributes. Identify common attributes between these entities and create a super-entity, so that all the remaining entities in some way related to the super-entity. Add Relationship between entities. Reference: https://db-book.com/slides-dir/PDF-dir/ch6.pdf Generalization - Example 01 Reference: https://www.codechef.com/practice/course/interview-dbms/CORECS03/problems/PPREPCS26 Generalization - Example For instance, in a university model, "Faculty," and "Admin.Staff" entities might share attributes like "E_Id" and ""E_Name." A generalized "Employee" entity would hold these shared traits. Reference: https://www.codechef.com/practice/course/interview-dbms/CORECS03/problems/PPREPCS26 Generalization - Example 02 Reference:https://www.upgrad.com/tutorials/software-engineering/dbms-tutorial/generalization-and-specialization-in-dbms/ Generalization Reference: https://db-book.com/slides-dir/PDF-dir/ch6.pdf Specialization A software developer is a person who can develop software. Now the software can be a web application, a mobile application, or, say, a desktop application. Hence, we can have developers specializing in a particular field, whether it's desktop applications, mobile applications, or desktop applications. This intuition of categorizing or dividing a higher-level entity into multiple lower-level entities of a similar kind is known as specialization [ Top Down Approach ] Specialization Reference: https://db-book.com/slides-dir/PDF-dir/ch6.pdf Reference: https://www.scaler.com/topics/specialization-and-generalization-in-dbms/ Conceptual Design Choices Should a concept be modeled as an entity or an attribute? Should a concept be modeled as an entity or a relationship? How should we identify relationships? Binary or ternary? Ternary or aggregation? Constraints in the ER Model: A lot of data semantics can (and should) be captured But some constraints cannot be captured in ER diagrams Reference: https://db-book.com/slides-dir/PDF-dir/ch6.pdf Conceptual Design Choices Should address be an attribute of Employees or an entity (connected to Employees by a relationship)? This depends upon the use we want to make of address information, and the semantics of the data If we have several addresses per an employee, address must be an entity (since attributes cannot be set-valued) If the structure (city, street, etc.) is important (e.g., we want to retrieve employees in a given city), address must be modeled as an entity Reference: https://db-book.com/slides-dir/PDF-dir/ch6.pdf Entity VS Attribute Limitation Concept: Attributes can be directly associated with a relationship in an ER diagram. These attributes describe the relationship itself rather than either of the entities involved. Example: In your original scenario, you could add a StartDate and EndDate attribute directly to the Works_In4 relationship between Employee and Department. Limitation: This approach works well if there is only a single instance of the relationship between an employee and a department. However, if an employee can work in the same department multiple times (i.e., multiple periods), this method might not be sufficient because the attributes wouldn't differentiate between those different periods. How it Works ? 1. The Works_In4 relationship will now link to the Duration entity, which will include attributes like StartDate and EndDate. 2. This allows the database to record multiple instances of an employee working in the same department, each with its own distinct duration. 3. This setup resolves the problem by enabling the recording of multiple work periods for the same employee within the same department. Reference: https://db-book.com/slides-dir/PDF-dir/ch6.pdf Aggregation Aggregation allows indicating that a relationship set participates in another relationship set Reference: https://db-book.com/slides-dir/PDF-dir/ch6.pdf Aggregation Consider the ternary relationship proj_guide: Reference: https://db-book.com/slides-dir/PDF-dir/ch6.pdf Aggregation Suppose we want to record evaluations of a student by a guide on a project Reference: https://db-book.com/slides-dir/PDF-dir/ch6.pdf Aggregation Relationship sets eval_for and proj_guide represent overlapping information - Every eval_for relationship corresponds to a proj_guide relationship - However, some proj_guide relationships may not correspond to any eval_for relationships ▪ So we can’t discard the proj_guide relationship Eliminate this redundancy via aggregation - Treat relationship as an abstract entity - Allows relationships between relationships - Abstraction of relationship into new entity Reference: https://db-book.com/slides-dir/PDF-dir/ch6.pdf Aggregation Eliminate this redundancy via aggregation without introducing redundancy, the following diagram represents: - A student is guided by a particular instructor on a particular project - A student, instructor, project combination may have an associated evaluation Reference: https://db-book.com/slides-dir/PDF-dir/ch6.pdf Aggregation Cont.. To represent aggregation, create a schema containing - Primary key of the aggregated relationship, - The primary key of the associated entity set - Any descriptive attributes In our example: - The schema eval_for is: eval_for (s_ID, project_id, i_ID, evaluation_id) - The schema proj_guide is redundant. Reference: https://db-book.com/slides-dir/PDF-dir/ch6.pdf Reduction of ER Diagrams to Table Reduction of ER Diagrams to Table Reduction of ER Diagrams to Table PROF (pid, hkid, dept, rank, salary), and pid is set as a primary key Reduction of ER Diagrams to Table M N STUDENT(sid, ….) CLASS(cid,..) TAKE (sid, cid, year, grade) where the Foreign Key is (sid, cid). Reduction of ER Diagrams to Table 1 N PROF(pid, ….) CLASS(cid, year, pid..) where the Foreign Key is (pid). Reduction of ER Diagrams to Table 1 1 MAN( mid, …) WOMAN(wid,.. ) MARRY (mid, wid, year) Where Primary Key either mid or wid PROF (pid, hkid, dept, rank, salary), primary key pid CLASS (cid, title, year, dept), primary key (cid ) STU (sid, dept, gpa), primary key sid TEACH (pid, cid, year), Foreign key (cid, year) TAKE (cid, year, sid, grade), Foreign key (cid, year, sid) Reduction of ER Diagrams to Table Reduction of ER Diagrams to Table cont.. Tables The tables are the strong entities that we have. Here we have the STUDENT, TEACHER, SUBJECT, and COURSE. Reduction of ER Diagrams to Table cont.. Columns in each Table The columns of a table are the attributes associated with that entity. Let's take a look at each table. We will first consider the Simple attributes that are not Composite or Multivalued. 1. Student Table: Std_Id(KEY), Std_Name, DOB 2. Teacher Table: Teach_Id(KEY), Teach_Name 3. Subject Table: Sub_Id(KEY), Sub_Name 4. Course Table: Crs_Id(KEY), Crs_Name Reduction of ER Diagrams to Table cont.. Now, let's consider the Multivalued attributes like HOBBY. We will be having a separate table for the Multivalued attribute like HOBBY. Here we will consider the columns as Stud_Hobby and Std_Id. The Std_Id will associate the hobby with that particular student. Composite Attributes For composite attributes, like ADDRESS, we will have each of the CITY and STREET as a column in the Student Table. Final Table Home Work Activity Extended ER Model EER is a high-level data model that incorporates the extensions to the original ER model. Enhanced ERD are high level models that represent the requirements and complexities of complex database. In addition to ER model concepts EE-R includes − 1. Sub classes and Super classes. 2. Specialization and Generalization. 3. Category or union type. 4. Aggregation.