CENG315 Information Management Course Introduction PDF
Document Details
İzmir Yüksek Teknoloji Enstitüsü
Tags
Summary
This document is a course introduction for the course: CENG315 Information Management. The course introduces relational database management systems; teaching includes database fundamentals, relational models, SQL, transaction processing, concurrency, recovery, security, privacy in RDBMS, query optimization, and hands-on experience.
Full Transcript
Course Introduction CENG315 INFORMATION MANAGEMENT CENG315 Information Management ▪ Learn database fundamentals and get familiar with relational data model and SQL ▪ Comprehend relational algebra, queries, joins, and normalization ▪ Get acquainted with transaction processing, concurrency and recove...
Course Introduction CENG315 INFORMATION MANAGEMENT CENG315 Information Management ▪ Learn database fundamentals and get familiar with relational data model and SQL ▪ Comprehend relational algebra, queries, joins, and normalization ▪ Get acquainted with transaction processing, concurrency and recovery in databases ▪ Understand the basics of security, privacy in RDBMS ▪ Learn the methods of query optimization ▪ Gain hands-on experience in design and implementation of RDB 2 General Information ▪ Weekly Course Hours: Thursdays 9:45 ▪ Lecturer: Belgin Ergenç Bostanoğlu [email protected] ▪ Research Assistants: Leyla Tekin [email protected] & Büşra Çalmaz [email protected] ▪ Teams Code: ne6b4m0 3 General Information: Textbooks ▪ A. Silberschatz, HF. Korth, S. Sudarshan, Database System Concepts, 7th Ed., McGraw-Hill, 2019. ▪ Edward Sciore, Database Design and Implementation, Wiley, 2nd Ed. 2020. ▪ Jeffrey D. Ullman and Jennifer Widom, A First Course in Database Systems, 3rd Ed., 2007. ▪ C.J. Date, An Introduction to Database Systems, 8th Ed., 2003. 4 General Information: Grading Policy ▪ Midterm: 35% ▪ Final: 40% ▪ Project: 25% ▪ There should be 5 students in each group. ▪ There are 6 different project titles. Requirements of each title will be announced next week on Teams. Groups will be assigned to projects by the staff. ▪ Project will be processed in 5 steps; ▪ Step 1: Project Group & Title ▪ Step 2: Project Design Report ▪ Step 3: Final Project Design Report ▪ Step 4: Implementation of the Project ▪ Step 5: Presentation 5 Weekly Schedule Week Who? Date Subject Book Chapter Term Project 1 B.E.B.-1 3/10 Course Introduction Book 2, Ch. 1, 2 Introduction, Relational Model, Querying 2 B.E.B.-2 10/10 Relational Algebra Book 2, Ch. 4 3 B.E.B.-3 17/10 Relational Algebra Book 2, Ch. 4 Step 1: Project Group & Title Relational Design 4 B.E.B.-4 24/10 Relational Design Book 3, Ch. 6 5 Ass. 31/10 SQL Lecture Notes 6 Ass. 7/11 SQL Lecture Notes Step 2: Project Design Report 7 B.E.B.-5 14/11 Integrity, Security Book 2, Ch. 5 8 B.E.B.-6 21/11 Midterm Feedback on Project Design Reports 9 B.E.B.-7 28/11 Functional Dependencies Book 4, Ch. 3 10 B.E.B.-8 5/12 Views, Indexes Book 2, Ch. 6 11 B.E.B.-9 12/12 Transactions, Recovery Book 1, Ch. 15 12 B.E.B-10 19/12 Concurrency Control Book 1, Ch. 16 Step 3 & 4: Final Project Design Report & Implementation of the Project Book 3, Ch. 18 13 B.E.B.& 26/12 Project Presentations Step 5: Presentation Assts 14 B.E.B. & Assts 9/01 Final Books: Book 1: C.J. Date, An Introduction to Database Systems, 8th Ed., 2003. Book 2: Edward Sciore, Database Design and Implementation, Wiley, 2009. Book 3: A. Silberschatz, HF. Korth, S. Sudarshan, Database System Concepts, 7th Ed., McGraw-Hill, 2019. Book 4: Jeffrey D. Ullman and Jennifer Widom, A First Course in Database Systems, 3rd Ed., 2008. Grading: Midterm: 35%, Project: 25%, Final 40% Course Assistants: Büşra Güvenoğlu, Leyla Tekin 6 Introduction to Database Systems & Relational Databases Databases and Database Systems ▪ Database ▪ A collection of data stored in a computer ▪ The data in a database is typically organized into records. ▪ Employee records ▪ Medical records ▪ Sales records ▪ Figure 1-1 depicts a database that holds information about students in a university and the courses they have taken. 8 University Database ▪ Five types of records ▪ A conceptual picture of some records ▪ Does not indicate: ▪ How the records are stored ▪ How they are accessed ▪ Database system ▪ Software that manages the records in a database 9 Requirements of a Database System ▪ Must be persistent ▪ Can be very large ▪ Get shared ▪ Must be kept accurate ▪ Must be usable 10 Record Storage ▪ Databases must be persistent. ▪ Storing database records in text files: ▪ The simplest and most straightforward approach ▪ One file per record type ▪ Each record could be a line of text, with its values separated by tabs 11 Record Storage: Text Files ▪ Advantages: ▪ The database system has to do very little ▪ A user could examine and modify the files with a text editor ▪ Disadvantages: ▪ Updating the file takes to much time ▪ Searching the file takes too much time 12 Data Models and Schemas ▪ Two different ways of expressing the university database: ▪ As several collections of records, as in Figure 1-1 ▪ As several files where each record is stored in a representation as in Figure 1-2 ▪ Each of these ways can be specified as a schema in a data model. ▪ A data model is a framework for describing the structure of databases. ▪ A schema is the structure of a particular database. 13 Data Models and Schemas (Cont.) ▪ Data models: ▪ Relational data model ▪ File-system data model ▪ Schemas: ▪ Expressed in terms of tables of records ▪ Expressed in terms of files of records 14 Data Models and Schemas (Cont.) File-system data model Relational data model ▪ Student records are stored in the text ▪ The records are stored in a table file “student.txt”. named STUDENT. ▪ There is one record per line. ▪ Each record contains four fields: an integer SId, a string SName, and ▪ Each record contains four values, integers GradYear and MajorId. separated by tabs, denoting the student ID, name, graduation year, ▪ Users access a table in terms of its and major ID. records and fields: They can insert new records into a table, and ▪ Programs that read (and write to) the retrieve, delete, or modify all records file are responsible for understanding and decoding this representation. satisfying a specified predicate. 15 Data Models and Schemas (Cont.) ▪ Most of the Java code deals with decoding the file: ▪ Reading each record from the file ▪ Splitting it into an array of values to be examined (Figure 1-3 (a)) ▪ The SQL code only specifies the values to be extracted from the table ▪ It says nothing about how to retrieve them (Figure 1-3 (b)) 16 Data Models and Schemas (Cont.) ▪ These two models are clearly at different levels of abstraction. ▪ The relational model is called a conceptual model ▪ Its schemas are specified and manipulated without any knowledge of how it is to be implemented ▪ The file-system is called a physical model ▪ Its schemas are specified and manipulated in terms of a specific implementation 17 Data Models and Schemas (Cont.) ▪ A conceptual schema describes what the data “is”. ▪ A physical schema describes how the database is implemented. ▪ Conceptual schemas are much easier to understand and manipulate than physical schemas ▪ They omit all the implementation details 18 Physical Data Independence ▪ A conceptual schema is certainly nicer to use than a physical schema. ▪ Operations on a conceptual schema get implemented by the database system. ▪ The database catalog contains descriptions of the physical and conceptual schemas. ▪ Given an SQL query, the database system uses its catalog to generate equivalent file-based code. This translation process enables physical data independence. ▪ A database system supports physical data independence if users do not need to interact with the database system at the physical level. 19 Benefits of Physical Data Independence ▪ Ease of use ▪ Result from not needing to be concerned with implementation details ▪ Query optimization ▪ Automatic optimization ▪ Isolation from changes to the physical schema ▪ Physical implementation does not affect the user 20 Logical Data Independence ▪ Suppose that the dean’s office constantly deals with student transcripts. ▪ They would really appreciate being able to query the following two tables: ▪ STUDENT_INFO (SId, SName, GPA, NumCoursesPassed, NumCoursesFailed) ▪ STUDENT_COURSES (SId, YearOffered, CourseTitle, Prof, Grade) ▪ The set of tables personalized for a particular user is called the user’s external schema. ▪ A database system supports logical data independence if users can be given their own external schema. 21 The Three Schema Levels 22 Benefits of Logical Data Independence ▪ Each user gets a customized external schema. ▪ Users see the information they need in the form that they need it, and they don’t see the information they don’t need. ▪ The user is isolated from changes to the conceptual schema. ▪ Users receive privileges on their schema only, which provides better security. ▪ External schemas can be used to hide sensitive data from unauthorized users. 23 Relational Databases Tables ▪ The data in a relational database system is organized into tables. ▪ Each table contains zero or more records (the rows of the table) and one or more fields (the columns of the table). ▪ Each record has a value for each field. ▪ Each field has a specific type. ▪ Commercial database systems support many types, including various numeric, string, date/time types. 25 Tables (Cont.) ▪ Often, when discussing a database, it is convenient to ignore the type information; in such cases, we can write the schema by simply listing the field names for each table. 26 Null Values ▪ A null value denotes a value that does not exist or is unknown. ▪ Null values occur for two reasons: ▪ Data collection may be incomplete. ▪ Data may arrive late. 27 Superkeys and Keys ▪ A user must reference a record by specifying field values. ▪ Example: “I want the record for student named Joe who graduated in 1977”. ▪ However, not all field values are guaranteed to uniquely identify a record. ▪ A unique identifier is called a superkey. ▪ A superkey of a table is a field (or fields) whose values uniquely identify the table’s records. ▪ Note that adding a field to a superkey always produces another superkey. ▪ A key is a superkey having property that no subset of its fields is a superkey. 28 Superkeys and Keys (Cont.) ▪ STUDENT (SId, SName, GradYear, MajorId) ▪ Every student has a different ID so SId is a superkey of STUDENT. ▪ SId is a key. ▪ {SId, GradYear} is a superkey of STUDENT. ▪ SECTION (SectId, CourseId, Prof, YearOffered) ▪ If a professor teaches at most one section a year, then {Prof, YearOffered} is a key. ▪ If a course can have at most one section per year, then {CourseId, YearOffered} is a key. 29 Primary Key ▪ Although a table can have several keys, one key is chosen to be the primary key. ▪ Records are referenced by their primary key. ▪ Each primary key should be as natural and easy to understand. ▪ ID numbers are often used as primary keys because they are simple and intuitive. ▪ Primary key fields must never be null. 30 Foreign Key ▪ The information in a database is split among its tables. ▪ However, these tables are not isolated from each other. ▪ A foreign key is a field (or fields) of one table which corresponds to the primary key of another table. 31 Foreign Keys and Referential Integrity ▪ The specification of a foreign key asserts referential integrity, which requires each non-null foreign key value to be the key value of some record. 32 Constraints ▪ A constraint describes the allowable states that the tables in the database may be in. ▪ There are four important kinds of a constraint: ▪ Null value constraints specify that a particular field must not contain nulls. ▪ Key constraints specify that two records cannot have the same values for the key’s fields. ▪ Referential integrity constraints specify that a foreign key value of one record must be the key value of another record. ▪ Integrity constraints 33 Integrity Constraints ▪ An integrity constraint encodes “business rules” about the organization. ▪ Integrity constraints have two purposes: ▪ They can detect bad data entry. ▪ They can enforce the “rules” of the organization. ▪ An integrity constraint may apply to ▪ an individual record, “a student’s graduation year is at least 1863” ▪ a table, “a professor teaches at most two sections per year” ▪ or database, “a student cannot take a course more than once” 34 Specifying Tables in SQL 35 Specifying Tables in SQL (Cont.) ▪ The action specified with the on delete and on update keywords can be one of the following: ▪ Cascade: causes the same query (delete or update) to apply to each foreign key record ▪ Set null: causes the foreign key values to be set null ▪ Set default: causes the foreign key values to be set to their default value ▪ No action: causes the query to be rejected if there exists an affected foreign key record 36 References ▪ Edward Sciore, Database Design and Implementation, Wiley, 2020. ▪ Chapter 1 & 2 37