Database System Concepts 1.1-1.3 PDF
Document Details
Uploaded by Deleted User
Tags
Summary
This document provides an overview of database systems, including their applications, data models, and advantages.
Full Transcript
1.1 Database-System Applications Should a data scientist know data base management systems? Find companies that requires DBMS for data science position. Database-management system (DBMS) o DBMS, a collection of interrelated data and a set of programs to access those data. o The coll...
1.1 Database-System Applications Should a data scientist know data base management systems? Find companies that requires DBMS for data science position. Database-management system (DBMS) o DBMS, a collection of interrelated data and a set of programs to access those data. o The collection of data is usually referred to as the database. o Primary goal of a DBMS is to provide a way to store and retrieve database information that is both convenient and efficient. o Management of data involves both defining structures for storage of information and providing mechanisms for the manipulation of information. o The database system must ensure the safety of the information stored, despite system crashes or attempts at unauthorized access. Database-System Applications Database systems are used to manage collections of data that: o are highly valuable, o are relatively large, and o are accessed by multiple users and applications, often at the same time. Some representative applications are: Database-System Applications Some representative applications are: o Enterprise Information Sales Accounting Human resources o Manufacturing o Banking and Finance Banking: customer information, accounts, loans, and banking transactions. Credit card transactions: purchases on credit cards and generation of monthly statements. Finance: information about holdings, sales, and purchases of financial instruments such as stocks and bonds. o Universities: student information, course registrations, and grades. o Airlines: reservations and schedule information. o Telecommunication: keeping records of calls, texts, and data usage, generating monthly bills, maintaining balances on prepaid calling cards, and storing information about the communication networks. Web-based services Social-media: Online retailers Online advertisements Document databases: maintaining collections of new articles, patents, published research papers, etc. Navigation systems: maintaining the locations along with the exact routes of roads, train systems, buses, etc. Broadly, there are two modes in which databases are used: Online transaction processing: large number of users use the database, with each user retrieving relatively small amounts of data, and performing small updates. Data analytics: processing of data to draw conclusions, and infer rules or decision procedures, which are then used to drive business decisions. 1.2 Purpose of Database Systems Purpose of Database Systems To allow users to manipulate the information: Add new students, instructors, and courses. Register students for courses and generate class rosters. Assign grades to students, compute grade point averages (GPA), and generate transcripts. This typical file-processing system is supported by a conventional operating system. The system stores permanent records in various files, and it needs different application programs to extract records from, and add records to, the appropriate files. Disadvantages of file-processing systems File-processing system has a number of major disadvantages: o Data redundancy and inconsistency: Example: If a student has a double major (say, music and math), the address and telephone number of that student may appear in a file that consists of student records of students in the Music department and in a file that consists of student records of students in the Mathematics department. This redundancy leads to higher storage and access cost. Data inconsistency; various copies of the same data may no longer agree. Example: a changed student address may be reflected in the one department but not elsewhere in the system. o Difficulty in accessing data. Suppose one needs to find out the names of all students who live within a particular postal-code area. The data-processing department needs to generate such a list (no application program). There is an application program to generate the list of all students. We now has two unsatisfactory choices: either obtain the list of all students and extract the needed information manually or ask a programmer to write the necessary application program. Assume a program is written and several days later, we need to trim that list to include only those students who have taken at least 60 credit hours (no application program). The conventional file-processing environments do not allow needed data to be retrieved in a convenient and efficient manner. Data isolation: Writing new application programs to retrieve the appropriate data is difficult because data are scattered in various files and in different formats. Integrity problems: The data values stored in the database must satisfy certain types of consistency constraints. Example: Suppose University maintains an account for each department and records the balance amount in each account and also requires that the account balance of a department may never fall below zero. When new constraints are added, it is difficult to change the programs to enforce them. The problem is compounded when constraints involve several data items from different files. o Atomicity problems. A computer system is subject to failure. It is crucial that, if a failure occurs, the data be restored to the consistent state that existed prior to the failure. Example: A banking system with a program to transfer $500 from account A to account B. If a system failure occurs during the execution of the program, it is possible that the $500 was removed from the balance of account A but was not credited to the balance of account B, resulting in an inconsistent database state. It is essential to database consistency that either both the credit and debit occur, or that neither occur. That is, the funds transfer must be atomic - it must happen in its entirety or not at all. o Concurrent-access anomalies: Example: Consider account A, with a balance of $10,000. If two bank clerks debit the account balance (by say $500 and $100, respectively) of account A at almost exactly the same time, the re- sult of the concurrent executions may leave the account balance in an incorrect (or inconsistent) state. Suppose that the programs executing on behalf of each withdrawal read the old balance, reduce that value by the amount being withdrawn, and write the result back. If the two programs run concurrently, they may both read the value $10,000, and write back $9500 and $9900, respectively. Depending on which one writes the value last, the balance of account A may contain either $9500 or $9900, rather than the correct value of $9400. To guard against this possibility, the system must maintain some form of supervision. But supervision is difficult to provide because data may be accessed by many different application programs that have not been coordinated previously. o Security problems: Not every user of the database system should be able to access all the data. Disadvantages of database systems Disadvantages of database systems Setup requires more knowledge, money, skills, and time. Complexity may result in poor performance 1.3 View of Data View of Data A database system is a collection of interrelated data and a set of programs that allow users to access and modify these data. A major purpose of a database system is to provide users with an abstract view of the data. That is, the system hides certain details of how the data are stored and maintained. o Data model: a collection of conceptual tools for describing data, data relationships, data semantics, and consistency constraints. o The data models can be classified into four different categories: Relational, Entity-Relationship, Semi-structured Data and Object-Based Data Models Relational Model o This model uses a collection of tables to represent both data and the relationships among those data. o Each table (relations) has multiple columns, and each column has a unique name. o Each table contains records of a particular type. o Each record type defines a fixed number of fields, or attributes. o The columns of the table correspond to the attributes of the record type. o This is the most widely used data model, and a vast majority of current database systems are based on the it. Entity-Relationship Model o The entity-relationship (E-R) data model uses a collection of basic objects, called entities, and relationships among these objects. o An entity is a “thing” or “object” in the real world that is distinguishable from other objects. o The E-R model is widely used in database design. Semi-structured Data Model o The semi-structured data model permits the specification of data where individual data items of the same type may have different sets of attributes. o This is in contrast to the data models mentioned earlier, where every data item of a particular type must have the same set of attributes. o JSON and Extensible Markup Language (XML) are widely used semi- structured data representations. Object-Based Data Model Object-Based Data Model o Object-oriented programming (in Java, C++, or C#) has become the dominant software-development methodology. o This led initially to the development of a distinct object-oriented data model, but today the concept of objects is well integrated into relational databases. o Database systems allow procedures to be stored in the database system and executed by the database system. o This can be seen as extending the relational model with notions of encapsulation, methods, and object identity. A sample relational database ID name dept name salary 22222 Einstein Physics 95000 dept name building budget 12121 Wu Finance 90000 Comp. Sci. Taylor 100000 32343 El Said History 60000 Biology Watson 90000 45565 Katz Comp. Sci. 75000 Elec. Eng. Taylor 85000 98345 Kim Elec. Eng. 80000 Music Packard 80000 76766 Crick Biology 72000 Finance Painter 120000 10101 Srinivasa Comp. Sci. 65000 n History Painter 50000 58583 Califieri History 62000 83821 Brandt Comp. Sci. 92000 Physics Watson 70000 15151 Mozart Music 40000 33456 Gold Physics 87000 (b) The department 76543 Singh Finance 80000 table (a) The instructor table Levels of Abstraction o Physical level: describes how a record (e.g., instructor, department, or student) is stored. o Logical level: describes data stored in database, and the relationships among the data. type instructor = record ID : string; name : string; dept_name : string; salary : integer; end; o View level: application programs hide details of data types. Views can also hide information (such as an employee’s salary) for security purposes. View of Data An architecture for a database system Which one is the most important level for database administrators? Instances and Schemas o The collection of information stored in the database at a particular moment is called an instance of the database. o The overall design of the database is called the database schema. o Similar to types and variables in programming languages. o A database schema corresponds to the variable declarations (along with associated type definitions) in a program. o The values of the variables in a program at a point in time correspond to an instance of a database schema. Instances and Schemas Database systems have several schemas, partitioned according to the levels of abstraction. o Logical Schema: the overall logical structure of the database Example: The database consists of information about a set of customers and accounts in a bank and the relationship between them - Analogous to type information of a variable in a program o Physical schema: the overall physical structure of the database o Instance: the actual content of the database at a particular point in time - Analogous to the value of a variable Physical Data Independence It is the ability to modify the physical schema without changing the logical schema o Applications depend on the logical schema o In general, the interfaces between the various levels and components should be well defined so that changes in some parts do not seriously influence others. Exercises List four applications you have used that most likely employed a database system to store persistent data. Assume that two students are trying to register for a course in which there is only one open seat. What component of a database system prevents both students from being given that last seat? Answers Banking: For account information, transfer of funds, banking transactions. Universities: For student information, online assignment submissions, course registrations, and grades. Airlines: For reservation of tickets and schedule information. Online news sites: For updating news and maintaining archives. Online-trade: For product data, availability and pricing information, ordertracking facilities, and generating recommendation lists. The concurrency-control manager, which is part of the transaction manager, ensures that at most one student will register successfully.