Summary

This document provides an introduction to relational databases, covering key concepts like databases, relational database management systems (RDBMS), and SQL. It also includes a table example and introduces operators used in relational algebra.

Full Transcript

Introduction to Relational Databases Introduction Database – collection of persistent data Database Management System (DBMS) – software system that supports creation, population, and querying of a database A database is called "self-describing" because...

Introduction to Relational Databases Introduction Database – collection of persistent data Database Management System (DBMS) – software system that supports creation, population, and querying of a database A database is called "self-describing" because it contains a description of itself. OODBMSs not popular - business information systems is that OOP is obsolete. Can be used by concurrent users SQL all purpose use Relational Database Relational Database Management System (RDBMS) – Consists of a number of tables and single schema (definition of tables and attributes) – Students (sid, name, login, age, gpa) Students identifies the table sid, name, login, age, gpa identify attributes sid is primary key An Example Table Students (sid: string, name: string, login: string, age: integer, gpa: real) Tuple (row) –Any order Collection of tables sid name login age gpa 50000 Dave dave@cs 19 3.3 53666 Jones jones@cs 18 3.4 53688 Smith smith@ee 18 3.2 53650 Smith smith@math 19 3.8 53831 Madayan madayan@music 11 1.8 53832 Guldu guldu@music 12 2.0 Anatomy of a Relation StudentId Name CourseId S1 Anne C1 attribute name attribute values n-tuple, or tuple. This is a 3-tuple. The tuples constitute the body of the Heading (a set of attributes) relation. The degree of this heading is 3, The number of which is also the degree of the relation. tuples in the body is the cardinality of the relation. 5 Key-terms For each attribute A Row in a table of a relation, there represents a is a set of relationship among permitted values, a set of values. called the -domain The term attribute A domain refers to a Column is atomic if elements of a table. of the domain are Relation means a considered to be- table indivisible Another example: Courses Courses (cid, instructor, quarter, dept) cid instructor quarter dept Carnatic101 Jane Fall 06 Music Reggae203 Bob Summer 06 Music Topology101 Mary Spring 06 Math History105 Alice Fall 06 History Keys Primary key – minimal subset of fields that is unique identifier for a tuple – sid is primary key for Students – cid is primary key for Courses Foreign key –connections between tables – Courses (cid, instructor, quarter, dept) – Students (sid, name, login, age, gpa) – How do we express which students take each course? relationships In general, need a new table Enrolled(cid, grade, studid) Studid is foreign key that references sid in Student table Student Foreign Enrolled key sid name login cid grade studid 50000 Dave dave@cs 53666 Jones jones@cs Carnatic101 C 53831 53688 Smith smith@ee Reggae203 B 53832 53650 Smith smith@math 53831 Madayan madayan@mus Topology112 A 53650 53832 Guldu guldu@music History 105 B 53666 Relational Algebra Collection of operators for specifying queries Query describes step-by-step procedure for computing answer (i.e., operational) Each operator accepts one or two relations as input and returns a relation as output Relational algebra expression composed of multiple operators Basic operators Selection – return rows that meet some condition Projection – return column values Union Cross product Difference Other operators can be defined in terms of basic operators Example Schema (simplified) Courses (cid, instructor, quarter, dept) Students (sid, name, gpa) Enrolled (cid, grade, studid) Selection Select students with gpa higher than 3.3 from S1: σgpa>3.3(S1) S1 sid name gpa 50000 Dave 3.3 sid name gpa 53666 Jones 3.4 53666 Jones 3.4 53688 Smith 3.2 53650 Smith 3.8 53650 Smith 3.8 53831 Madayan 1.8 53832 Guldu 2.0 Projection Project name and gpa of students in S1: name, gpa(S1) S1 Sid name gpa name gpa 50000 Dave 3.3 Dave 3.3 53666 Jones 3.4 Jones 3.4 53688 Smith 3.2 Smith 3.2 53650 Smith 3.8 Smith 3.8 53831 Madayan 1.8 Madayan 1.8 53832 Guldu 2.0 Guldu 2.0 Combine Selection and Projection Project name and gpa of students in S1 with gpa higher than 3.3: name,gpa(σgpa>3.3(S1)) Sid name gpa 50000 Dave 3.3 name gpa 53666 Jones 3.4 Jones 3.4 53688 Smith 3.2 Smith 3.8 53650 Smith 3.8 53831 Madayan 1.8 53832 Guldu 2.0 Set Operations Union (R U S) – All tuples in R or S (or both) – R and S must have same number of fields – Corresponding fields must have same domains Intersection (R ∩ S) – All tuples in both R and S Set difference (R – S) – Tuples in R and not S Set Operations (continued) Cross product or Cartesian product (R x S) – All fields in R followed by all fields in S – One tuple (r,s) for each pair of tuples r  R, s S Example: Intersection S1 S2 sid name gpa sid name gpa 50000 Dave 3.3 53666 Jones 3.4 53666 Jones 3.4 53688 Smith 3.2 53688 Smith 3.2 53700 Tom 3.5 53650 Smith 3.8 53777 Jerry 2.8 53831 Madayan 1.8 53832 Guldu 2.0 53832 Guldu 2.0 sid name gpa 53666 Jones 3.4 S1  S2 = 53688 Smith 3.2 53832 Guldu 2.0 Relational Algebra Summary Algebras are useful to manipulate data types (relations in this case) Set-oriented Brings some clarity to what needs to be done Opportunities for optimization – May have different expressions that do same thing We will see examples of algebras for other types of data in this course Create Table CREATE TABLE Enrolled (studid CHAR(20), cid CHAR(20), grade CHAR(20), PRIMARY KEY (cid), FOREIGN KEY (studid) references Students) Select-From-Where query “Find all students who are under 18” SELECT * FROM Students S WHERE S.age < 18 Queries across multiple tables (joins) “Print the student name and course ID where the student received an ‘A’ in the course” SELECT S.name, E.cid FROM Students S, Enrolled E WHERE S.sid = E.studid AND E.grade = ‘A’ Other SQL features MIN, MAX, AVG – Find highest grade in fall database course COUNT, DISTINCT – How many students enrolled in CS courses in the fall? ORDER BY, GROUP BY – Rank students by their grade in fall database course Views Virtual table defined on base tables defined by a query – Single or multiple tables Security – “hide” certain attributes from users – Show students in each course but hide their grades Ease of use – expression that is more intuitively obvious to user Views can be materialized to improve query performance Views Suppose we often need names of students who got a ‘B’ in some course: CREATE VIEW B_Students(name, sid, course) AS SELECT S.sname, S.sid, E.cid FROM Students S, Enrolled E WHERE S.sid=E.studid and E.grade = ‘B’ name sid course Jones 53666 History105 Guldu 53832 Reggae203 Summary: Why are RDBMS useful? Data independence – provides abstract view of the data, without details of storage Efficient data access – uses techniques to store and retrieve data efficiently Reduced application development time – many important functions already supported Centralized data administration Data Integrity and Security Concurrency control and recovery Some other limitations of RDBMS Arrays Hierarchical data Example: Taxonomy of Organisms Hierarchy of categories: – Kingdom - phylum – class – order – family – genus - species Animals Chordates Arthropods Vertebrates insects spiders crustaceans birds reptiles mammals – How would you design a relational schema for this?

Use Quizgecko on...
Browser
Browser