1. Basic Concepts of Database.pdf
Document Details
Uploaded by ConsiderateActionPainting
Tags
Full Transcript
TOPIC 1 Basic Concepts Contents Data vs. Information 2 Data 2 Information...
TOPIC 1 Basic Concepts Contents Data vs. Information 2 Data 2 Information 2 Database 3 Database Management System (DBMS) 3 Acid Properties of DBMS 3 Advantages of DBMS 7 Disadvantages of DBMS 8 Components of DBMS 8 Types of Databases 9 Traditional File System versus Database System 10 Disadvantages of Traditional File Systems 10 1 Learning Objectives 1. Explain the basic concepts of Database Systems. a. Describe key terminologies: data, information, database DBMS, database system. b. Discuss the database approach along with its evolution. c. Identify some traditional and contemporary use cases of the database approach. d. Explain the components, functions, and categories of DBMS. e. State some challenges of traditional database systems. Data vs. Information Data is the most basic resource in any organization. It is critical to properly organize and handle data in order to run a business efficiently. To really understand the concepts of databases, you must understand the difference between the data and information. Data Data are the raw facts that could be recorded and stored on Computer Media. The term raw refers to the facts that have not been processed in order to disclose their meaning. Let’s say you want to know what the computer lab’s users have to say about its services. In most cases, you would start by surveying users to assess the computer lab’s performance. The data gathered from users is raw data that has no meaning unless it is processed. Information Information is the result of processing raw data to reveal its meaning. Data and information are closely related and are often used interchangeably. Data processing can be as simple as organizing data to reveal patterns or as complex as making forecasts using statistical modeling. To reveal the meaning of the data, information require context. For instance, a temperature reading of 105 degrees means nothing unless you know what it means in context: Is this in Fahrenheit or Celsius degrees? Is this a machine temperature, a body temperature, or the temperature of the outside air? The data can be utilized as a starting point for making decisions. To have efficient decisions, the data must be properly formatted for storage, processing, and presentation. These days, there is no shortage of data, but there is a lack of quality information. The following are some of the most important characteristics of information: 1. Accuracy: The information must be free of errors, and reflect the meaning of the data clearly. It is also free of bias and provides the recipient with correct information. 2 2. Timeliness: Here, timely means the recipient receives the information when they need it and within the required time frame. 3. Relevancy: It refers to how beneficial a piece of information is to the individual to whom it is addressed. It is a very subjective matter. Some information that is useful to one person may not be useful to another, and vice versa. Therefore, the organization must have a good information system, which produces information that is accurate, timely, and relevant for survival in the market. Database For efficient data management, a computer database is required. A database is a shared, integrated computer structure that stores a collection of: End-user data: Raw fact of interest to the end-user. Metadata: Metadata is also referred to as data about data. The metadata describes objects in the database ad makes it easier for those objects to be accessed or manipulated. It is also defined as a collection of logically connected data that is stored together to satisfy an organization’s information needs. It is also called an electronic filing system. An example of a database is a telephone directory that contains the names, addresses, and telephone numbers of the people stored in the computer storage. The database is organized by fields, records, and files. The description is as follows: Fields: It is the smallest unit of the data that has meaning to its users. Name, Address, and Telephone number are examples of the fields. Records: A record is a collection of logically related fields. A record is one complete set of fields and each field have some value. Files: A file is a collection of related records. A telephone directory containing records about the different telephone holders is an example of a file. Database Management System (DBMS) Definition: A Database Management System (DBMS) is a collection of programs that manages the database structure and control access to the data stored in the database. Acid Properties of DBMS Atomicity A single database transaction often contains multiple statements to be executed on the database. In Relational Databases, these are usually multiple SQL statements, while in the case of non-Relational Databases, these could be multiple database commands. 3 Atomicity in ACID mandates that each transaction should be treated as a single unit of execution, which means either all the statements/commands of that transaction are executed, or none of them are. At the end of the successful transaction or after a failure while applying the transaction, the database should never be in a state where only a subset of statements/commands is applied. An atomic system thus guarantees atomicity in every situation, including successful completion of transactions or after power failures, errors, and crashes. Figure 1: Atomacity Consistency In the context of databases, Consistency is Correctness, which means that under no circumstance will the data lose its correctness. Database systems allow us to define rules that the data residing in our database are mandated to adhere to. Few handy rules could be balance of an account should never be negative no orphan mapping: there should not be any mapping of a person whose entry from the database is deleted. no orphan comment: there should not be any comment in the database that does not belong to an existing blog. 4 These rules can be defined on a database using Constraints, Cascades, and Triggers; for example, Foreign Key constraints, Check constraints, On Delete Cascades, On Update Cascades, etc. Figure 2: Consistency Isolation Isolation is the ability of the database to concurrently process multiple transactions in a way that changes made in one does not affect the other. A simple analogy is how we have to make our data structures and variables thread-safe in a multi-threaded (concurrent) environment. And similar to how we use Mutex and Semaphores to protect variables, the database uses locks (shared and exclusive) to protect transactions from one another. Isolation is one of the most important properties of any database engine, the absence of which directly impacts the integrity of the data. 5 Figure 3: Isolation Durability In the context of Database, Durability ensures that once the transactions commit, the changes survive any outages, crashes, and failures, which means any writes that have gone through as part of the successful transaction should never abruptly vanish. This is exactly why Durability is one of the essential qualities of any database, as it ensures zero data loss of any transactional data under any circumstance. A typical example of this is your purchase order placed on Amazon, which should continue to exist and remain unaffected even after their database faced an outage. So, to ensure something outlives a crash, it has to be stored in non-volatile storage like a Disk; and this forms the core idea of durability. The most fundamental way to achieve durability is by using a fast transactional log. The changes to be made on the actual data are first flushed on a separate transactional log, and then the actual update is made. This flushed transactional log enables us to reprocess and replay the transaction during database reboot and reconstruct the system’s state to the one that it was in right before the failure occurred - typically the last consistent state of the database. The write to a transaction log is made fast by keeping the file append-only and thus minimizing the disk seeks. 6 Figure 4: Durability Advantages of DBMS The DBMS serves as the intermediary between the user and the database. Data are the crucial raw material from which information is derived. For that, you must have a good method to manage such data. Some of the advantages provided by the DBMS are: Improved data sharing: The DBMS helps in the creation of an environment in which end users have better access to more and better-managed data. Such access makes it possible for end-users to respond quickly to changes in their environment. Improved data security: The more users access the data, the greater the chance of a data security breach. Corporations spend a lot of time, effort, and money to make sure that their data is used correctly. A DBMS provides a framework for enforcing data privacy and security regulations more effectively. Better data integration: Access to well-managed data allows for a more holistic view of the organization’s operations and a clearer understanding of the big picture. It’s a lot easier to understand how activities in one part of the company affect the rest of the company. Minimized data inconsistency: Data inconsistency exists when different versions of the same data appear in different places. For instance, data inconsistency exists when the company’s regional sales office shows the price of a product as Nu.1500 and its national sales office shows the same product’s price as Nu. 1400. In a well-designed database, the chances of data inconsistency are considerably decreased. 7 Improved data access: The DBMS helps to produce a quick answer to ad hoc queries thus making data access faster and more accurate. When working with a large amount of data, you might want a quick answer to questions such as: o What was the dollar volume of sales by-products during the past six months? o What is the sales bonus figure of each of our salespeople during the past three months? Improved decision making: Better-managed data and improved data access make it possible to generate better-quality information, which may then be used to make a better decision. The quality of the underlying data determines the quality of the information generated. Data quality refers to a holistic strategy for ensuring data accuracy, validity, and timeliness. Increased end-user productivity: The availability of data, combined with the tools that transform data into usable information, empowers end users to make quick, informed decisions that can make the difference between success and failure in the global economy. Disadvantages of DBMS In contrast to the many advantages of the database systems, there are some disadvantages as well. The disadvantages of the database are as follows: o Complexity increases: The data structure may become more complex because of the centralized database supporting many applications in an organization. This may lead to difficulties in its administration and may require professionals for management. o Requirement for more disk space: The size of the DBMS increases with wide functionality and more complexity. As a result, it takes up a lot more space to store and run than a traditional file system. o An additional cost of hardware: The expense of setting up a database system is substantially higher. It is dependent on the environment and functionality, the size of the hardware, and the system’s maintenance expenses. o Need for additional and specialized manpower: Any organization that uses database systems must regularly hire and train staff to develop and deploy databases as well as provide database administration services. o Organizational conflict: A centralized and shared database system requires agreement on data definitions and ownership, as well as responsibilities for maintaining accurate data. Components of DBMS A DBMS has three main components: The Data Definition Language (DLL), Data Manipulation Language and Query Facilities (DML/SQL), and software for controlled access to the database as shown in Figure 1. 8 Figure 5. Components of DBMS 1. Data Definition Language (DDL): DDL allows the users to define the database, and specify the data types, data structures and the constraints on the data to be stored in the databases. 2. Data Manipulation Language (DML) and Query Language: DML allows the users to insert, update, delete and retrieve data from the database. The types of access are: o Retrieval of information stored in the database. o Insertion of new information into the database. o Deletion of information from the database. o Modification of information stored in the database. 3. Software for controlled access of database: This software provides the facility of controlled access (user accounts) of the database by the users, concurrency control (one after another) to allow shared access of the database and a recovery control system to restore the database in case of hardware or software failure. Types of Databases A DBMS can support many different types of databases. Databases can be classified according to the number of users, the database location(s), and the expected type and extent of use. o Single-user database: A single-user database supports only one user at a time. If user A is using the database, users B and C must wait for user A to finish the task. A single-user database that runs on a personal computer is called a desktop database. 9 o Multi-user database: Multi-user databases support multiple users at the same time. When the multiuser database supports a relatively small number (less than 50) of users or a specific department within an organization, it is called a workgroup database. o Enterprise database: When the database is used by the entire organization and supports many users (more than 50) across many departments, the database is known as an enterprise database. o Centralized database: Sometimes, the locations are used to classify the database. A database that supports data located at a single site is called a centralized database and data that is distributed across several different sites is called distributed database. Traditional File System versus Database System Understanding what a database is, what it does, and how to use it properly can be clarified by considering what a database is not. Conventionally, the data were stored and processed using a traditional file processing system. Each file in a traditional file system is independent of other files, and the data from the different files may be integrated only by writing an individual program for each application. The data and the application programs that use the data are so arranged that any change to the data requires modifying all the programs that use the data. This is due to the fact that each file has hard-coded information such as data type, data size, etc. Sometimes it is even not possible to identify all the programs using that data and is identified on a trial-and-error basis. As shown in Figure 2, each file in the system used its own application program to store, retrieve, and modify data. Figure 6. A simple file system Disadvantages of Traditional File Systems A traditional file system has the following disadvantages: 10 o Data Redundancy: Since each application has its own copy of the data file, the same data may need to be recorded and kept in multiple locations. For instance, a personal file and payroll file, both contain information about the employee’s name, designation, etc. As a result, duplicate or redundant data items are created. This redundancy requires more or larger storage space, costs extra time and money, and requires additional effort to keep all data up-to-date. o Data Inconsistency: Data redundancy leads to data inconsistency especially when data is to be updated. The same data items that appear in multiple files do not get updated at the same time in each file, resulting in data inconsistency. For example, for an employee promoted from Lieutenant to Major, the bio-data file is updated instantly, however, the payroll file may not always be updated. As a result, an employee can have two different job titles at the same time. Such inconsistencies decrease the quality of data in the data file over time, affecting the accuracy of reports. o Lack of Data Integration: Due to the existence of independent data files, users have trouble obtaining information for any ad hoc query that needs accessing data from multiple files. In such a case complicated programs have to be developed to retrieve data from every file or the users have to manually collect the required information. o Limited Data Sharing: The typical file system has limited data sharing capabilities. Users have little choice to exchange data outside of their own programs because each application has its own private files. To get data from numerous incompatible files, complex programs have to be built. o Poor Data Control: Traditional file system is decentralized in nature. It could be possible that the data field has numerous names defined by various departments within the same organization, depending on the file in which it was found. As a result, a data field may have distinct meanings in different contexts or the same meaning for different fields. This leads to a lack of data control. o The problem of Security: In a traditional file system, it is very difficult to enforce security checks and access rights, since application programs are added in an ad-hoc manner. 11