COM106 - 2 - Database Basics & Data Models (1).pptx

Full Transcript

COM106: Introductio n to Databases Database Basics (continued) & Data Models Database Basics (continued) & Data Models Some Examples Given the data table below, identify any data dependencies by determining all of the statements that are TRUE A – Knowing EName, uniquely identifies Manager...

COM106: Introductio n to Databases Database Basics (continued) & Data Models Database Basics (continued) & Data Models Some Examples Given the data table below, identify any data dependencies by determining all of the statements that are TRUE A – Knowing EName, uniquely identifies Manager B – Knowing Manager, uniquely identifies Dept# Is the reverse True? C – Knowing Salary, uniquely identifies DNameDoes knowing Manager D – Knowing DName, uniquely identifies Manager uniquely identify DName ? E – Knowing EName, uniquely identifies Dept# No F – None of the above Emp# EName Salary Dept# DName Manager Does the table contain any more 1 Smith 15000 A Sales Kelly data dependencies? 2 Jones 21000 B Admin Whyte 3 Kane 18000 D Finance Coyle Yes:- 4 Dwyer 20000 B Admin Whyte Dept# gives DName gives Manager 5 Cook 16000 C Personnel Woods 6 Healy 15000 A Sales Kelly 7 Conroy 19000 B Admin Whyte Notice the data duplication in the table 8 Price 22000 D Finance Coyle Poor database design 9 Murphy 23000 C Personnel Woods 10 Jones 16000 A Sales Kelly Database Basics (continued) & Data Models An example of some of the problems of the file-based approach: Consider a University, where each department maintains and processes its Student Accommodation Accommodat own data. FNam SNam Records Acc Addre Grade ion e e Type ss s Staff Records Name Positio Joe Blogs Halls Room CCD n D1 Jennif Inspect Mary Smith Flat 10 AAB er or Jane Jones Halls Room A*CB Pat Inspect A3 or David Banda Flat 23 210 Student Registration Payroll FNam SNam Records Studt Addre Grade Records Name Positio e e No ss s n Joe Blogs B0012 Room CCD Jennif Inspect 3 D1 er or Mary Smith B0045 Room AAB Pat Cleaner Database Basics (continued) & Data Models Student Accommodation Accommodat FNam SNam Records Acc Addre Grade ion e e Type ss s Staff Records Name Positio Joe Blogs Halls Room CCD n D1 Jennif Inspect Mary Smith Flat 10 AAB er or Jane Jones Halls Room A*CB Securi Pat Inspect A3 ty or Data David Banda Data Flat 23 210 Correctne Data Data Consisten ss No Data Consisten Duplicati cy Centralisation cy on Possible incompatible Student Registration file formats Payroll FNam SNam Records Studt Addre Grade Records Name Positio e e No ss s n Joe Blogs B0012 Room CCD Jennif Inspect 3 D1 er or Mary Smith B0045 Room AAB Pat Cleaner Database Basics (continued) & Data Models The Evolution of Databases – Database Approaches The database approach overcomes many of the problems with the file- based approach (From earlier) a database is a shared collection of logically related data stored in an organised structure which provides the ability to interact with data and to extract salient information and knowledge. A database provides a data-centred approach, designed to meet the needs of an organisation, where data can be shared among different applications. A database management system (DBMS) is a software system for managing a database(s) …. which allows the creation, manipulation (queries) and maintenance of data …. and provides controlled access to the data (security, integrity, recovery etc). Examples of common DBMS include: Microsoft Access - Single user DBMS, supports limited database size Database Basics (continued) & Data Models Database Advantages (over File Handling Systems) 1. Data Independence - application programs insulated from data 2. Centralised Data - security and integrity maintained We’ll look at each of these in 3. Access Flexibility - query language and storage mechanisms more detail 4. Data Model - reflects real world (network, hierarchical, relational etc.) Other DBMS Advantages …. Enforcement of Standards Data Sharing (different applications built on same data) Economy of Scale (all applications built on common source data) Increased Productivity (DBMS provides many standard functions) …. and Disadvantages Complexity (complex system requires level of understanding) Cost (in large organisation with large number of records, cost of DBMS plus conversion and maintenance costs are high) Higher impact of failure (downside of centralisation) Database Basics (continued) & Data Models DBMS Facilities Application Programming Language – e.g. Visual Basic in MS Access Report Generator (data analysis for management) Data Dictionary (stores data definitions etc.) independent of the stored data (supporting data independence) Query Language - e.g. SQL in relational database DBMSs Screen/Form Editor - allows creation of data entry screens (views) Functions provided by DBMS Data storage, retrieval & update System Catalog/Data Dictionary - definition of all data objects Transaction support - ensures correct update completion - transaction concurrency control for multiuser access - transaction recovery from failure of database User authorisation Provide access across a network Provides data integrity and promote data independence And a range of utilities (e.g. Indexing, Performance monitoring, file import/export) Database Basics (continued) & Data Models Database Advantages - 1. Data Independence Applications separated from data using a 3-layered model (ANSI SPARC) of a database: External Layer (Views): programs, queries, reports Logical Data Independen ce Conceptual Layer (Database Schema): single representation of logically stored data (e.g. set of linked relational tables) Internal Layer (Physical): definition of Physical how data is stored and access paths to Data that data Independen ce Each layer is defined by a schema (set of definitions) Three levels of abstraction Database Basics (continued) & Data Models The description of the database structure is the database schema – like a blueprint of how the database is constructed Changes to the applications (queries/programs etc) do not necessarily require changes to the tables or stored records (logical independence) Changes to the record storage or table design do not necessarily require changes to the applications (physical independence) Database Advantages - 2. Centralised Data DBMS is managed by a Database Administrator Data is stored centrally, thereby allowing/requiring: Security to be applied to users and database objects as appropriate (passwords, privileges, views) Integrity of the data to be maintained through: Integrity Constraints (e.g., input validation, referential integrity, etc) transaction management We’ll study these topics later in the concurrency control module Database Basics (continued) & Data Models Brief Aside – Transaction Processing A transaction (key concept) is one or more atomic sequential operations that make up a single task Operations - one of four categories (CRUD) Create Read (Query) Update Delete An Example of a Transaction – ATM Withdrawal £20 Account Table Checks the Account Balance (Read)Account Balan # ce 1234567 £300 Reduces balance by withdrawal (Update) 89 Update Account – Set Balance - £20 Log Table Transacti Account Actio Date Logs the transaction (Create) on# # n TR_89 1234567 -£20 28/11/1 Database Basics (continued) & Data Models Database Advantages - 3. Access Flexibility Powerful query language to allow ad hoc (immediate) querying of the database Query By Example - Easy to create visual queries SQL - powerful query language (starting in Week 3)We’ll study these topics later in the Storage structures and access paths module e.g. Indexed Sequential, Hashing, B-Trees Physical storage reorganisation (for large databases) in response to query performance (query optimisation) We won’t be might involve a change of storage structure studying these topics Database Advantages - 4. Data Model Data Model – How the database is organised to reflect the real world Important real world objects (entities), and the associations and relationships between them, are modelled. e.g.Employee works_in Department We’ll study ER Student enrolls_in Module Modelling later in the module Database Basics (continued) & Data Models Different data models used: Look at these data Hierarchical models in your own Network independent study Relational – uses the Relational Model to give flexible navigation through data relationships between tables The vast majority of Object oriented databases in use today are Relational Databases NoSQL are a new class of database system that is growing in popularity for particular tasks For NoSQL --- read --- ‘Not only SQL’ Lots of different variations, and lots of different products (e.g., MongoDB, CouchDB, Neo4J, etc) We focus on the Relational Document Store Model in this module. We Key Value Store won’t look at NoSQL in Graph Databases much detail. Designed to address problems which relational database are not good at tackling Need for a flexible schema Database Basics (continued) & Data Models The Relational Model The most widely used database type in the world today Proposed by Codd (1970) - see paper in Additional Module Resources (seminal work) Based on mathematical set theory which: Presents the user with a simple view of data as two-dimensional tables (aka - relations) Defines a set of relational algebra operations to manipulate relations (tables) In relational databases, data is manipulated using Structured Query Language (SQL) SQL was formalised Relational ModelbyTerminology ANSI (American National Standards Institute) in 1986 and has a direct relationship to relational algebra. Relation - a named table We won’twith study relational algebra FNam in detail, SNam but later we willStudt Grade look at its attributes (columns) relationship e e No s andtotuples SQL. (rows) Joe Blogs B0012 CCD Attribute- a named column 3 of a relation (defined on a domain) Mary Smith B0045 AAB 6 Tuple- a row of a relation (with Database Basics (continued) & Data Models Domain - the set of allowable values for one or more attributes Represents all the values that can ever be used (not just the values present at any time) and specifies whether duplicate values are allowed Domains cannot be precisely defined in database systems - data types are used instead when creating the tables (generalisation) Integrity Constraints can be used to further refine legitimate values of a data type. E.g., an attribute holding age might be of data type tinyint, but additional constraints can be used to ensure that values lie only between 0 and 99. Domains need to be known in order to relate data from different relational tables i.e. to relate data across two relations (join) requires that both attributes are defined Relatio Comm on the same domain (or File- have the same nal dataon type) Based Relation Some database terminology Table is used File interchangeably. These terms are approximately equivalent: Attribut Column Field e Database Basics (continued) & Data Models Properties of Relations Each relation has a distinct name Each cell has a single (atomic) value Each attribute has a distinct name Attribute values are all from the same domain There are no duplicate tuples Attribute order is insignificant Relation Schema Tuple order (Table Plan) has no significance A relation schema is the relation name followed by its attribute names in brackets e.g. PATIENT (patient_no, pat_name, condition, doctor) An instance of relation PATIENT (i.e. sample data): patient_ pat_na patient_no is defined on no me condition doctor positive integers (>= 32947 Adams Halitosis Jekyll 10000) (no duplicates) 59421 Brown Influenza Johnson pat_name is defined on all 37983 Clark Lurgi Jekyll valid surnames (with Haemorrhoi duplicates) 10442 Doyle d Johnson condition is defined on all 72511 Evans Amnesia Who valid medical conditions 26743 Brown Lurgi Johnson (with duplicates) Database Basics (continued) & Data Models Keys Keys are fundamental to the structure of a relation (table) A super key is any combination of fields within a table that uniquely identifies each record within that table. A candidate key of a relation is a subset of its attributes which have the time-independent rules: 1.unique identification: the key value uniquely identifies each tuple in the relation 2.minimality (non-redundancy): no attribute in the key can be discarded without destroying rule 1. i.e., a candidate key (CK) is the minimum number of attributes which together uniquely identify each tuple A candidate key of a relation is a subset of a super key. Since each tuple in a relation is distinct a key always exists (even if it is a combination of all attributes) A primary key (PK) is the candidate key chosen as the unique identifier in a relational table i.e., the smallest possible combination of attributes that uniquely identifies every tuple (record) in a relational table (i.e. unique & minimal) Database Basics (continued) & Data Models A Foreign Key (FK) is an attribute in one relation matching the primary key in some other relation (used to express a relationship) Consider the following relational schema: EMP (emp_no, nat_ins_no, ename) Employee Number (emp_no) and national Insurance Number (nat_ins_no) are normally unique - so they are CKs One is selected as the primary key (e.g. emp_no) Any others are alternate keys (e.g. nat_ins_no) A candidate/primary key which contains two or more attributes is known as a composite key To decide on a candidate/primary key you need to decide: which attribute(s) have unique values OR, if no single attribute has unique values, which combination of attributes has unique values This requires knowledge of (or assumptions about) the ‘real world’ meaning (domains) of attributes (whether attributes have unique values) Decisions on PKs may be made from relational schema alone or from tables with sample data Database Basics (continued) & Data Models What is the Primary Key (PK) of the relation: EMPLOYEE (emp_no, ename, salary) First, identify the candidate keys - do any attributes contain unique values? emp_no - unless there is any evidence to the contrary we may assume that emp_no is created to identify employees uniquely ename - There could be more than one employee with the same name (non unique values) salary - More than one employee could have the same salary (non unique values) MPLOYEE Relation So, one (Table) candidate key (emp_no), which is also the primary key (PK) emp_n salar By inspection emp_no can be seen to o ename y have unique values 1 J Smith 20000 2 B Foster 18000 (and we assume that all future 3 P Allen 23000 values will be unique – based on an assumption that the domain of 4 D Kelly 20000 emp_no contains unique values only) 5 M Corry 18000 6 J Smith 27000 Database Basics (continued) & Data Models Another Example: What is the primary key of the relation: TAKES (student_no, module_code, exam_result) Identify the candidate keys - do any attributes contain unique values? NO - student_no - If a student takes several modules then the same student# will occur in more than one record module_code - If a module has many students then the same module code will appear in many records exam_result - Several students could have the same examTAKES result Do any combination of two attributes contain student_ module_co exam_res unique values? no de ult YES – student_no & module_code 90923204 – combinedCOM140J4 43 contain unique values (composite key)90923204 COM147J4 56 and are the Primary Key 90923204 COM158J1 38 90924104 COM140J4 67 student_no & module_code are the 90924104 COM147J4 70 Composite Primary Key 90924104 COM158J1 70 (assuming that this represents 90927504 COM140J4 56 modules taken by students in a 90927504 COM147J4 67 single academic year) Database Basics (continued) & Data Models Foreign Keys Relationships between relations are represented by foreign keys (FK), normally copies of primary keys Consider two relations, EMP and DEPT EMP (emp_no, ename, dept_no*) DEPT (dept_no, dname) Notice dept_no is in both tables - dept_no in DEPT is a primary key dept_no in EMP is a foreign key Primary keys are underlined in a relation Foreign keys usually, but not necessarily, have the same attribute name as the corresponding primary key and are not underlined. In this module we will adopt the convention of indicating a FK with an * To link relations foreign keys must contain only valid values of the corresponding primary key Keys are essential in enforcing and maintaining relational integrity – c.f. data integrity There are two important relational integrity rules 1. Entity Integrity: No attributes participating in the primary key of a relation are allowed to accept null values. Database Basics (continued) & Data Models 2. Referential Integrity: If a relational table includes a foreign key (FK) matching the primary key (PK) of another relational table, then every value of the FK must: either be equal to a value of the PK in some tuple (row) or be wholly null Tables are linked An example - Consider by a FKtwo relations, EMP and DEPT EMP (emp_no, EMP dept_no*) ename, DEPT emp_no ename dept_no dept_n dname dname) DEPT (dept_no, 1 Smith A o 2 Jones A A Sales 3 White B B Admin 4 Brown B A foreign key value (e.g. C ) cannot be entered in dept_no from the EMP table unless it already exists in dept_no in the DEPT table. i.e. an employee cannot be assigned to a department unless that department exists The FK can be NULL – employee not yet assigned to a department Database Basics (continued) & Data Models What happens if a row referenced by a FK in another table is deleted? Consider the EMP and DEPT tables above, linked by an FK/PK match on dept_no. What happens if the row B, Admin is deleted from DEPT? By default, in SQL Server, this is Not Permitted – an error is raised. Referential integrity would be breached (FK values in EMP can’t reference PK values in DEPT that don’t exist!) What happens ON DELETE can be specified when the FK constraint is created (or altered) NO ACTION – Default. An error is raised and no deletion occurs. CASCADE - Corresponding rows are deleted from the referencing table (EMP) if that row is deleted from the parent table (DEPT). SET NULL - All the values that make up the FK are set to NULL when the corresponding row in the parent table is deleted. SET DEFAULT - All the values that comprise the FK are set to their default values when the corresponding row in the parent table is deleted.

Use Quizgecko on...
Browser
Browser