Introduction to Databases Ch.1 PDF
Document Details
Uploaded by Deleted User
Mahmoud M. Ismail
Tags
Summary
This document is an introduction to database systems, covering key concepts in information management, data modelling, and different phases of database design. It explores the relational model, SQL, and advanced topics like normalization, query optimization, and transaction processing.
Full Transcript
Introduction to Databases COURSE SPECIFICATIONS Course Title: Introduction to Databases COURSE Description This course provides a comprehensive introduction to database systems, covering key concepts in information management, data modeling, and different phases of database desig...
Introduction to Databases COURSE SPECIFICATIONS Course Title: Introduction to Databases COURSE Description This course provides a comprehensive introduction to database systems, covering key concepts in information management, data modeling, and different phases of database design. Students will explore the relational model, SQL, and advanced topics such as normalization, query optimization, and transaction processing. Through hands-on projects, students will gain practical experience in designing and managing databases.. Overall aims and objectives of course Understand the fundamentals of information management and database systems concepts. Understand the role of a database management system in an organization. Modeling databases at conceptual, logical and physical levels of design. Describe the basic relational model and its integrity constraints. Normalize database in order to maintain data integrity and data redundancy. Using SQL to access and manipulate data in relational databases. Speed up the retrieval of records using indexes. Understand the importance of transaction management. Course Learning Outcomes (CLO) By the completion of the course the students should be able to: Identify the problems entailed in file-based information systems design and that motivates the use of a database system. Apply the 3-stage database design methodology. Draw the conceptual data model using ERD. Transform a conceptual data model into a relational database model (logical database designs). Apply normalization techniques to logical database designs. Build a relational database schema using the standard SQL. Construct and optimize SQL queries for effective data retrieval. Apply Transaction Properties (ACID properties) for transaction management Course Chapters Chapter 1 Information Management and Database System Concepts Chapter 2 Conceptual data modeling using ERD (Conceptual Database Design) Chapter 3 Relational Data Model and Constraints Chapter 4 Relational database design by ER to relational mapping (Logical Database Design) Chapter 5 Normalization Chapter 6 Query Languages Chapter 7 Physical Database Design (Storage structures and file organizations) Chapter 8 Indexing Structures Chapter 9 Transaction Processing Concepts. Teaching and learning methods Lectures Seminars Discussion Groups Student Assessment Methods Assignments to assess intellectual, professional & practical skills. Midterm exam to assess knowledge and understanding. Final exam to assess knowledge and understanding. Weighting of assessments Midterm Exam Assignments Final Exam Total 100% Textbook: 1- Elmasri, Ramez. "Fundamentals of database systems”, seventh edition.", Pearson, ISBN- 13: 978-0133970777, 2021. 2- Connolly, Thomas M and Carolyn E Begg, ”Database Systems: A Practical Approach To Design, Implementation, And Management”, 6th ed, Pearson Learning Solutions, ISBN 978-0-13-294326-0, 2015. References: 1-Carlos Coronel, and Steven Morris. Database systems: design, implementation, & management , 12th edition, Cengage Learning, 2019. 2-Silberschatz, A., Korth, H.F. and Sudarshan, S, “Database Systems Concepts”, 6th Edition, McGraw-Hill, New York, ISBN 978-0-07-352332-3, 2011. Database Systems Chapter I Information Management and Database System Concepts Contact: [email protected] [email protected] Basic Definitions Data Data is raw It does not have meaning of itself. Represents Facts or observations, which are unorganized and unprocessed and therefore have no meaning or value because of lack of context and interpretation E.g. names, phone numbers, addresses, … Information It is data processing. Data Processing Information E.g. Students data about marks can be processed to obtain information like max and average mark. An entity represents a real-world object or concept, such as an employee or a project. An attribute represents some property of interest that describes an entity, such as the employee’s name or salary. A relationship among two or more entities represents an association among the entities. -Example: a works-for relationship between an employee and department Overview of Information Management Definition Information management refers to the systematic collection, organization, storage, and dissemination/distribution of information within an organization. It encompasses the processes and technologies that enable effective handling of data, ensuring that the right information is available to the right people at the right time. Information, in this case, is any detail that allows a business to make a decision. For example, the number of sales that a business records over a week. An excellent management system allows information to be accessed in a timely and accurate fashion so that key stakeholders can use it to make informed decisions about the business. Key Components of Information Management 1) Data Collection: Gathering data from various sources, including internal systems, external databases, and user input. 2) Data Organization: Structuring data in a way that facilitates easy access and retrieval, often using databases and data warehouses. 3) Data Storage: Choosing appropriate storage solutions (e.g., relational databases, cloud storage) to ensure data is secure and retrievable. 4) Data Analysis: Using analytical tools to derive insights from data, supporting decision-making processes. 5) Information Dissemination: Distributing information to stakeholders through reports, dashboards, and other formats. Information Management Process 1) Identification: Determining what information is needed. 2) Acquisition: Collecting the necessary data. 3) Storage: Safely storing data for future access. 4) Processing: Converting raw data into meaningful information. 5) Distribution: Sharing information with relevant parties. 6) Archiving and Disposal: Managing data retention and secure deletion when no longer needed. Information Management Principles Principle #1- Accuracy Ensuring that data is free from errors, inconsistencies, or misleading elements. Achieving accuracy involves: 1) Attention to detail during data entry 2) Validation processes 3) Regular audits to identify and rectify discrepancies/misleading/conflict. There are several methods to ensure that information is accurate by: 1) Automating data entry to reduce human errors, 2) Investing in one of the best intelligent document processing systems 3) Validate different data sources. Inaccurate information can have cascading effects such as compromising decision-making processes and leading to suboptimal outcomes. Principle #2- Completeness Capturing all relevant information to avoid gaps or omissions that might impact accurate analysis and decision-making. Completeness ensures that the entirety of required data is collected, stored, and maintained throughout its lifecycle. Principle #3- Timeliness Delivering data immediately to meet the needs of decision-makers and organizational processes. Organizations need to make sure that all the collected types of information are up to date to have the ability to respond quickly to: 1) Market changes 2) Identify opportunities to maintain competitive edge. Failure to do so will result in outdated and irrelevant information which will have catastrophic effect on decisions. Principle #4- Relevance Obtaining the right information that is aligned to the organization’s goals and objectives. i.e. deliver the right information to the right person in the right place and time. Organizations must focus on information that directly contributes to their business objectives and decision making processes. Principle #5- Accessibility Information can be accessed from anywhere, whether it’s from home or the organization. This means that organizations must have secure, reliable systems to enable authorized personnel to access information remotely. The appropriate people have access to the right information whenever they need it, without compromising the security of the documents. Principle #6- Security Protect information from unauthorized access, modification or destruction. Due to Cyberthreats and data breaches, organizations must establish security policies and make use of different security options to safeguard sensitive information such as: 1) Encryption 2) User access control Principle #7 Integrity Integrity means that the data in the database is accurate/correct. Maintaining the consistency and trustworthiness of information to protect it against unauthorized or unintentional modifications. It involves implementing serious measures to prevent data corruption, unauthorized modifications to ensure that information are always kept in its original form. Drawbacks of using file systems to store data ▪ Duplication/Redundancy Same data may be stored in multiple files E.g. an address of someone in many files ▪ Inconsistency Same data may be stored by different names in different format E.g. an address of someone in many files and when we change it in only one file and in another file it may not be updated. ✓ Inconsistency is generally compounded by data redundancy. - Implications Waste of space Data inaccuracies High overhead of data manipulation and maintenance ▪ Program-Data Dependence In traditional file processing, the structure of data files is embedded in the application programs. Any change in in the characteristics of data, such as changing a field from integer to decimal, require changes in all the programs that access the file. - Implications Lengthy development time Excessive program maintenance ▪ Integrity problems Integrity constraints (e.g., account balance > 0) become “buried” in program code rather than being stated explicitly. Hard to add new constraints or change existing ones. ▪ Atomicity(indivisible units) of updates problem Failures may leave it in an inconsistent state with partial updates carried out. E.g.: Transfer of funds from one account to another should either complete or not happen at all. ▪ Concurrent access by multiple users problem Concurrent access needed for performance Uncontrolled concurrent accesses can lead to inconsistencies E.g.: More than one user is concurrently updating the same data ▪ Security problems Database systems offer solutions to all the above problems File systems Multiple files and formats File Systems Database A collection of logically related data stored in a single repository. Database Management System (DBMS) A software that enables easy creation, access and modification of databases for effective and efficient database management. DBMS manages interaction between end users and database Example: bank and its ATM machines DBMS manages interaction between end users and database Effectiveness is doing the right things to obtain high-quality results. Effectiveness is determined by comparing what a process or installation can produce with what they actually produce. Efficiency is doing things in a right way in the least amount of time with the least amount of resources. Efficiency is determined by the amount of time, money, and energy – i.e. resources that are necessary to obtain certain results. Efficiency is a measure of how well you do those things; If you are able to get more outputs from the same inputs, you are said to have increased efficiency. Productivity is doing right things in a right way. Productivity = Effectiveness + Efficiency Productivity is determined by looking at the production obtained (effectiveness) versus the invested effort in order to achieve the result (efficiency). in other words, if we can achieve more with less effort or expense, productivity increases. Companies often talk about employee effectiveness and efficiency. An effective employee produces at a high level, while an efficient employee produces quickly and intelligently. By combining effectiveness and efficiency, a company produces better products faster and with fewer resources. ▪ So, DBMS must be Effective and Efficient. Effective: gaining better access to data and better management and maintaining of the data helps the end users share the data fast and effectively across the organization. Efficient: using the appropriate amount of resources, such as time and space. Both of them Increase the productivity of the End User Productivity = Effectiveness + Efficiency Typical DBMS Functionality ▪ Define a database: In terms of data types, structures and constraints ▪ Construct or Load the Database: On a secondary storage medium ▪Manipulating the database : Querying, generating reports, insertions, deletions and modifications to its content. ▪Concurrent Processing and sharing: By a set of users and programs and keeping all data valid and consistent. Database System Is an integrated system of hardware, software, people, procedures, and data that define and regulate the collection, storage, management, and use of data within a database environment. ▪ Hardware ▪ Software - OS - DBMS - Applications ▪ People ▪ Procedures ▪ Data Database System Environment Procedures ▪ Procedures are the instructions and rules that govern the design and use of the database system, this may include: How to log on to the DBMS Start and stop DBMS How to handle hardware or software failures. Make a backup copies of database ▪ Other functions of System Admin: Responsible for the O/S, Network, server Hardware, Disk space monitoring, upgrading Memory, system backup and Tuning O/s. Standards ▪ Standards are common practices that ensure the consistency and effectiveness of the database environment, such as database naming conventions. ▪ Standards for SA may include: DBMS installation and testing procedures Upgrade policies and procedures Bug fix and maintenance practices Interface considerations DBMS storage, usage, and monitoring procedures Database System Software ▪Users interact with database systems through query languages ▪The query language of a DBMS has two tasks: Defines the data structures that serve as receptacles for the data of the database Allows the speedy retrieval and modification of data. ▪Query language has two components: Data definition for defining the database schema Data manipulation for data retrieval and update - Data retrieval entails obtaining data stored in the database that satisfies a certain specification formulated by the user in a query. - Data updates include data modification, deletion and insertion. Database vs. File Systems Simplified Database system environment Difference between Programming in query languages of DBMSs and in higher-level programming languages The typical program written in C, Pascal, or PL/1 directly implements an algorithm for solving a problem. A query written in a database query language merely states: - What the problem is and leaves the construction of the code that solves the problem to a special component of the DBMS software. - This approach to programming is called nonprocedural. Database Systems Utilities To perform certain functions such as: ▪Loading data stored in files into a database Including data conversion tools. ▪Backing up the database periodically on tape. ▪Performance monitoring utilities. ▪Report generation utilities. ▪Other functions, such as user monitoring and data compression. Database System Applications ▪ Company Databases: Employees, departments, projects … ▪ Airline Reservation Systems ▪ Universities: registration, grades ▪ Library Databases: Authors, titles, publishers, videos … ▪ Bank Databases transactions ▪ Sales: customers, products, purchases ▪ Manufacturing: production, inventory, orders, supply chain ▪ Human resources: employee records, salaries, tax deductions University Database Example Application program examples Add new students, instructors, and courses Register students for courses, and generate class rosters Assign grades to students, compute grade point averages (GPA) and generate transcripts Data Models A data model describes the way data is organized and how to represent relationships among database records. The relationships among the many individual records in databases are based on one of several models: 1) Hierarchical Model 2) Network Model Record-Based Data Models 3) Relational Model 4) Entity-Relationship Data Model (ER) 5) Object Oriented Data Model (OODM) Object-Based Data Models 6) Object-Relational Model Extended Relational Data Model (ERDM) Root Hierarchical Model Children one-to-many The database records are arranged in the form of a tree with roots and several branches/children. The root represents the primary key and thus the paths of the other branches can be accessed. Each hierarchy represents a number of related records. Records are dependent and arranged in multilevel structures, consisting of one root record and any number of subordinate levels. Relationships among the records are one-to-many between a parent and its children. – Each parent can have many children/branches – Each child/branch has only one parent Example of this type is the client file in the database. The main key is the client code or name, which is the root or parent of the branch records (children) represented by the billing records, which in turn is the root or parent of the product data fields. Client 1 Bill 1 Bill2 Product 1 Product 2 Product 1 Product 2 Product 3 1 2 3 Network Model Many-to-Many Represents the entity as record types. A file of records represents the entity instances. Records linked by pointers A relationship is composed of sets - Each set has parent (owner) and member record (child) Many-to-Many (M:N) relationships representation - Each owner can have multiple members - A member may have several owners Course 1 Course 2 Course 3 items Student 1 Student 2 Student 3 Student 4 Student 5 Relational Model: Most popular Relational model stores the data in the form of a table (Relation). Each Table has rows (tuples) and columns (attributes). Each table has one or more columns that contain the key of the table. A single database can be spread across several tables. Each table corresponds to an entity and each row represents an instance of that entity. Relationship representation through common attribute links. Relationship representation through common attribute links. ▪ Advantages Data and Structural independence – Separation of database design and physical data storage/access – Relational table is purely logical structure – How data are physically stored in the database is of no concern to the user or the designer. – Changes in table structure do not affect on data access or application programs. Easier database design, implementation, management and use Ad hoc query capability with Structured Query Language (SQL) – Structured Query Language (SQL) allows the user to specify what must be done without specifying how it must be done. Improved conceptual simplicity and Powerful database management system. ▪ Disadvantages Substantial hardware and system software overhead Relational database systems hides the implementation complexities and the physical data storage details from the users. Poor design and implementation is made easy Ease of use allows careless use of RDBMS Main Characteristics of the Database Approach 1) Self-describing nature of a database system: Data Catalog or Data Dictionary is a system database that stores the description of the database (meta-data). - It contains information about data , relations , constraints and the entire schema that organize these features in to a unified database. This allows the DBMS software to work with different databases. Example of a Simplified Relational Database Catalog 2) Insulation between programs and data/program-data independence In traditional file processing, the structure of data files is embedded in the application programs. - So, any changes to the structure of a file may require changing all programs that access that file. By contrast, DBMS access programs do not require such changes. - The structure of data files is stored in the DBMS catalog separately from the access programs. We call this property program-data independence. 3) Data Abstraction: A data model is used to hide storage details and present the users with a conceptual view of the database. 4) Less redundancy: DBMS follows the rules of normalization that reduces data redundancy. ▪ Consistency: Consistency is a state where every relation in a database remains consistent(Correct). A DBMS can provide greater consistency as compared to earlier forms of data storing applications like file-processing systems. 5) Query Language: DBMS is equipped with query language, which makes it more efficient to retrieve and manipulate data. A user can apply different filtering options to retrieve a set of data. 6) ACID Properties: DBMS follows the concepts of Atomicity, Consistency, Isolation, and Durability (ACID). ACID properties are applied on transactions and help the database stay healthy in and in case of failure. 7) Multiuser and Concurrent Access: DBMS supports multi-user environment and allows them to access and manipulate data in parallel. Though there are restrictions on transactions when users attempt to handle the same data item, but users are always unaware of them. 8) Sharing of data and multiuser transaction processing : Allowing a set of concurrent users to retrieve and to update the database. - Advantage: Reduction of redundancy and the consequent possibility of inconsistency. ❖ Sharing requires that multiple accesses to data are suitably organized: concurrency control techniques are used. 9) Support of multiple views (subset of the database) of the data: Many users of the database system do not need all information; instead, they need to access only a part of the database. Each user may see a different view of the database, which describes only the data of interest to that user. E.g. A doctor performing drug tests should be able to access the patients' medical data but not their hospital bills. However, a billing clerk should have a very different view of the database. A user who is in the Sales department will have a different view of database than a user working in the Production department. 10) Security: Data should be protected from unauthorized access. - Not all users of a database will have the same accessing privileges. For example, one user might have read-only access but not make changes, while another might have read and write. DBMS offers many different levels of security features, which enables multiple users to have different views with different features. - A user in the Sales department cannot see the data that belongs to the Purchase department. Advantages of DB Approach 1) Redundancy can be reduced: In non-database systems, each application or department has its own private file resulting in considerable amount of redundancy of the stored data. Thus storage space is wasted. By having centralized database most of this can be avoided. 2) Inconsistency can be avoided: When the same data is duplicated and changes are made at one site, which is not propagated to the other sites, it gives rise to inconsistency. If the redundancy is removed, chances of having inconsistent data is removed. 3) Integrity can be maintained: Integrity means that the data in the database is accurate. Centralized control of the data helps in permitting to define integrity constraints to the data in the database. ❖Key or uniqueness constraint - Every employee record must have a unique value for ESSN ❖Referential integrity constraint - Every employee record must be related to a department record DBMS offers methods to impose constraints while entering data into the database and retrieving the same at a later stage. 4) Providing Storage Structures (e.g. indexes) for efficient query processing 5) Providing backup and recovery services ❖If a hard drive fails and the database stored on the hard drive is not accessible, the only way to recover the database is from a backup. ❖If a computer system fails, the recovery subsystem is responsible for making sure that the database is restored to its original state. 6) Providing multiple interfaces to different classes of users 7) Reduced application development time 8) Flexibility to change data structures Schema ▪ Schema is the description of a database during database design. Schema defines the entities, the relationship among them and the constraints that should hold on the database. In the case of relational databases it is divided into database tables. Schema can be depicted by means of schema diagrams. ▪ Database designers design the schema to help programmers understand the database and make it useful. Schema Diagram: A diagrammatic display for the schema of a database that shows the relations in the database, their attributes, and primary keys and foreign keys. Schema diagram for the Company database Schema diagram for the university database Three-Schema Architecture ▪ Three important characteristics of the database approach are: 1) Insulation of programs and data (program-data and program- operation independence) 2) Support of multiple user views 3) Store the database description (schema). ▪ Three-schema architecture was proposed to achieve and visualize these characteristics. External schema The external level includes a number of external schemas. Each external schema describes the part of the database that a particular user group is interested in and hides the rest of the database from that user group. Example: A doctor should be able to access the patients' medical data but not their hospital bills. However, a billing clerk should have a very different view of the database. Conceptual schema The conceptual schema specifies the entities, the attributes of the entities, the relationships among the entities, and constraints on the entities. Example: a set of customers and Order and the relationship between them (a Customer can place multiple Orders.) The conceptual schema is represented by the Entity-relationship Model ( ER Model ), which uses symbols to represent the data elements and relationships (Entity Relationship Diagram (ERD)) Database designers are responsible for creating the conceptual schema. Physical/internal schema Describes the physical structure of the database Details how data is stored physically on storage 1) Heap (unsorted) 2) Sequential (sorted on the values of one or more of the fields) 3) Indexed sequential access method (ISAM) sorted with a primary index of key field 4) Hash (a hash function used to store a record based on one or more fields in the record- called random, or direct, files.) 5) Clustered tables (groups of one or more tables physically stored together) 6) B+-trees Access paths: is a search structure that makes search for database records efficient, such as indexing or hashing. User view 1 user view 2 User view 3 user view 4 User view 5 External Schema External Schema External Schema Conceptual Schema Physical or DB internal Schema Three-Schema Architecture Data Independence ▪ The ability to modify a schema definition in one level without affecting a schema definition in a higher level. ▪ When a schema at a lower level is changed, only the mappings between this schema and higher-level schemas need to be changed. ▪ The higher-level schemas are unchanged. Hence, the application programs need not be changed since they refer to the higher- level schema. Types of Data Independence Logical Data Independence:(provided by external/conceptual mapping) Ability to modify conceptual schema without changing: - External views - Application programs Changes to conceptual schema may be necessary whenever the logical structure of the database changes: - Adding a data item to schema - Removing a data item - Adding table to the database In this case the external schemas that refer only to the remaining data should not be affected. - Only the view definitions and the mapping need to be changed. Physical Data Independence: (provided by conceptual/internal mapping) Ability to modify or physical/internal schema without changing: - Conceptual or view level schema - Application programs Changes to physical schema may be necessary to improve performance of retrieval or update. Changes in the physical schema may include: - Using different data structures - Switching from one access method to another (indexing or hashing) - Using different file organizations or storage structures (indexed sequential or random). - Using new storage devices Three-Schema Architecture User view 1 user view 2 User view 3 user view 4 User view 5 External Schema External Schema External Schema External/Conceptual Mapping DBMS transforms a request on an external Logical Data Independence schema against the conceptual schema Conceptual Schema DBMS transforms the request from the Conceptual/Internal Mapping conceptual to internal levels Physical Data Independence Physical or DB internal Schema Database Users Database Administrator (DBA) ▪Coordinates all the activities of the database system - Access authorization, monitoring database usage, problem determination, performance tuning. ▪Maintains the DBMS and are responsible for administrating the database. - Monitor its usage and by whom it should be used. - Create access profiles for users and force security. ▪Administrators also monitor DBMS resources like system license, required tools, and other software and hardware related maintenance. Database administrator functions Schema definition Storage structure (sequential file, or a hash file, or a sequential file with indices) and access method definition (indexing or hashing) Schema and physical organization modification Granting user authority to access the database (which parts of the database various users can access). Specifying integrity constraints (number of hours an employee may work in 1week may not exceed a specified limit (say, 80 hours)). Acting as liaison with users Monitoring performance and responding to changes in requirements Database Designers Designers are the group of people who actually work on the designing part of the database. Responsible to define the content, the structure, the constraints, and functions or transactions against the database. Identify and design the whole set of entities, relations, constraints, and views. Application programmers Are computer professionals who write application programs. Users ▪ End users/ Naïve, who use one of the application programs(such as a flight reservation or a bank operation)- people accessing database over the web, bank tellers, clerical staff. ▪ Casual/ Sophisticated user, interact with the system without writing programs. They form their requests either using a database query language or by using tools such as data analysis software. Database Users and Administrators Database