Database Storage and Query Processing PDF

Summary

This document provides an introduction to database storage and query processing, covering different types of storage, memory hierarchies, and query processing steps. The document also touches on SQL and its components. It's designed for computer science students or those interested in the topic.

Full Transcript

INTRODUCTION TO DATABASE STORAGE AND QUERY PROCESSING Learning Outcome: ▪ Structured Query Language (SQL) ▪ Characteristics of SQL ▪ Advantages of SQL ▪ Components of SQL ▪ Basic Data Types 1. Data Storage Manager Data Storage Manager also known as “Database Control System”, is generally a progr...

INTRODUCTION TO DATABASE STORAGE AND QUERY PROCESSING Learning Outcome: ▪ Structured Query Language (SQL) ▪ Characteristics of SQL ▪ Advantages of SQL ▪ Components of SQL ▪ Basic Data Types 1. Data Storage Manager Data Storage Manager also known as “Database Control System”, is generally a program that provides an interface between the data/information stored and the queries received. It helps us to maintain the integrity and consistency of the database by applying the constraints. It is a highly flexible and scalable product that provides us with the capability of fully managed storage. 1. Data Storage Manager Storage Manager is generally in charge of the interactions with the File Manager, where raw data is stored on the data with the help of the file system. It translates various DML statements into low-level commands. ▪ Authorization and Integrity Manager: The main purpose of the Authorization and Integrity Manager is to ensure the satisfaction of the integrity constraints and checks the authority of users to access information. 1. Data Storage Manager ▪ Transaction Manager: The main purpose of Transaction Manager is to ensure that even after the system failures, the database should remain in a uniform state. ▪ File Manager: The main purpose of File Manager is to manage the allocation of space on the disk storage. ▪ Buffer Manager: The main purpose of Buffer Manager is to fetch the data from disk storage into the main memory. 2. Storage The collection of data that makes up a computerized database must be stored physically on some computer storage medium. The DBMS software can then retrieve, update, and process this data as needed. Can differentiate storage into: ▪ Volatile storage: loses contents when power is switched off ▪ Non-volatile storage: contents persist even when power is switched off. Includes secondary and tertiary storage, as well as battery-backed up main-memory. 2. Storage Computer storage media form a storage hierarchy that includes two main categories: 1. Primary storage. 2. Secondary storage. 3. Tertiary storage. 2.1 Primary Storage This category includes storage media that can be operated on directly by the computer’s central processing unit (CPU), such as the computer’s main memory and smaller but faster cache memories. Primary storage usually provides fast access to data but is of limited storage capacity. Although main memory capacities have been growing rapidly in recent years, they are still more expensive and have less storage capacity than demanded by typical enterprise-level databases. The contents of main memory are lost in case of power failure or a system crash. 2.2 Secondary Storage The primary choice of storage medium for online storage of enterprise databases has been magnetic disks. However, flash memories are becoming a common medium of choice for storing moderateamounts of permanent data. When used as a substitute for a disk drive, such memory is called a solid-state drive (SSD). 2.3 Tertiary Storage Optical disks (CD-ROMs, DVDs, and other similar storage media) and tapes are removable media used in today’s systems as offline storage for archiving databases and hence come under the category called tertiary storage. These devices usually have a larger capacity, cost less, and provide slower access to data than do primary storage devices. Data in secondary or tertiary storage cannot be processed directly by the CPU; first it must be copied into primary storage and then processed by the CPU. 3. Memory Hierarchies and Storage Devices In a modern computer system, data resides and is transported throughout a hierarchy of storage media. The highest-speed memory is the most expensive and is therefore available with the least capacity. The lowest-speed memory is offline tape storage, which is essentially available in indefinite storage capacity. 3.1 Primary Storage Level At the primary storage level, the memory hierarchy includes, at the most expensive end, cache memory, which is a static RAM (random access memory). Cache memory is typically used by the CPU to speed up execution of program instructions using techniques such as prefetching and pipelining. The next level of primary storage is DRAM (dynamic RAM), which provides the main work area for the CPU for keeping program instructions and data. It is popularly called main memory. The advantage of DRAM is its low cost, which continues to decrease; the drawback is its volatility and lower speed compared with static RAM. 3.2 Secondary and Tertiary Storage Level At the secondary and tertiary storage level, the hierarchy includes magnetic disks; mass storage in the form of CD-ROM (compact disk–read-only memory) and DVD (digital video disk or digital versatile disk) devices; and finally tapes at the least expensive end of the hierarchy. The storage capacity is measured in kilobytes (Kbyte or 1,000 bytes), megabytes (MB or 1 million bytes), gigabytes (GB or 1 billion bytes), and even terabytes (1,000 GB). The word petabyte (1,000 terabytes) is now becoming relevant in the context of very large repositories of data in physics, astronomy, earth sciences, and other scientific applications. 4. Memory Hierarchy Memory hierarchy is arranging different kinds of storage present on a computing device based on speed of access. Figure 1, shows the six hierarchies in the memory. 4.1 Cache Memory Cache Memory is a special very high-speed memory. The cache is a smaller and faster memory that stores copies of the data from frequently used main memory locations. There are various different independent caches in a CPU, which store instructions and data. The most important use of cache memory is that it is used to reduce the average time to access data from the main memory. 4.1 Cache Memory 4.2 Main Memory Main memory is the primary, internal workspace in the computer, commonly known as RAM (random access memory). Specifications such as 4GB, 8GB, 12GB and 16GB almost always refer to the capacity of RAM. The main memory is the fundamental storage unit in a computer system. It is associatively large and quick memory and saves programs and information during computer operations. The technology that makes the main memory work is based on semiconductor integrated circuits. 4.2 Main Memory 4.3 Flash Memory Flash memory, also known as flash storage, is widely used for storage and data transfer in consumer devices, enterprise systems and industrial applications. Flash memories are high-density, high-performance memories using EEPROM (electrically erasable programmable read-only memory) technology. The advantage of flash memory is the fast access speed; the disadvantage is that an entire block must be erased and written over simultaneously. 4.3 Flash Memory 4.3 Flash Memory Between DRAM and magnetic disk storage, another form of memory, flash memory, is becoming common, particularly because it is nonvolatile. Flash memories are high-density, high-performance memories using EEPROM (electrically erasable programmable read-only memory) technology. The advantage of flash memory is the fast access speed; the disadvantage is that an entire block must be erased and written over simultaneously. 4.4 Magnetic Disk Magnetic disks are flat circular plates of metal or plastic, coated on both sides with iron oxide. Input signals, which may be audio, video, or data, are recorded on the surface of a disk as magnetic patterns or spots in spiral tracks by a recording head while the disk is rotated by a drive unit. 4.4 Magnetic Disk 4.4 Magnetic Disk An optical disc is an electronic data storage medium that is also referred to as an optical disk, optical storage, optical media, Optical disc drive, disc drive, which reads and writes data by using optical storage techniques and technology. An optical disc, which may be used as a portable and secondary storage device, was first developed in the late 1960s. James T.Russell invented the first optical disc, which could store data. 4.5 Optical Disk The most popular form of optical removable storage is CDs (compact disks) and DVDs. CDs have a 700-MB capacity whereas DVDs have capacities ranging from 4.5 to 15 GB. CD-ROM(compact disk – read only memory) disks store data optically and are read by a laser. CD-ROMs contain prerecorded data that cannot be overwritten. The version of compact and digital video disks called CD-R (compact disk recordable) and DVD- R or DVD+R, which are also known as WORM (write-once-read-many) disks, are a form of optical storage used for archiving data; they allow data to be written once and read any number of times without the possibility of erasing. 4.6 Magnetic Tapes Magnetic tape is a type of physical storage media for different kinds of data. It is considered an analog solution, in contrast to more recent types of storage media, such as solid-state disk (SSD) drives. Magnetic tape has been a major vehicle for audio and binary data storage for several decades and is still part of data storage for some systems. 4.6 Magnetic Tapes Magnetic tapes are used for archiving and backup storage of data. Tape jukeboxes— which contain a bank of tapes that are catalogued and can be automatically loaded onto tape drives—are becoming popular as tertiary storage to hold terabytes of data. For example, NASA’s EOS (Earth Observation Satellite) system stores archived databases in this fashion. Many large organizations are using terabyte-sized databases. The term very large database can no longer be precisely defined because disk storage capacities are on the rise and costs are declining. Soon the term very large database may be reserved for databases containing hundreds of terabytes or petabytes. 5. Query Processing Query processing is a process of translating a user query into an executable form. It helps to retrieve the results from a database. In query processing, it converts the high-level query into a low-level query for the database. Query processing is a very important component of DBMS. It is critical to the performance of applications that rely on database operations. 5. Query Processing The flow of query processing in DBMS is mentioned below: 5. Query Processing Query processing in DBMS involves several steps. These steps are mentioned below: 6.1 Parsing Query parsing is the first step in query processing. In this step, a query is checked for syntax errors. Then it converts it into the parse tree. So, a parse tree represents the query in a format that is easy to understand for DBMS. A parse tree is used in other steps of query processing in DBMS. 6.2 Optimization After doing query parsing, the DBMS starts finding the most efficient way to execute the given query. The optimization process follows some factors for the query. These factors are indexing, joins, and other optimization mechanisms. These help in determining the most efficient query execution plan. So, query optimization tells the DBMS what the best execution plan is for it. The main goal of this step is to retrieve the required data with minimal cost in terms of resources and time. 6.3 Evaluation After finding the best execution plan, the DBMS starts the execution of the optimized query. And it gives the results from the database. In this step, DBMS can perform operations on the data. These operations are selecting the data, inserting something, updating the data, and so on. Once everything is completed, DBMS returns the result after the evaluation step. 7. Query Processing Example Let us consider an example to show you the query processing steps. Suppose we have a database with a table “person”. It contains the person_id, first_name, last_name, and salary. The following SQL query is used to retrieve the names of all ninjas whose salary is greater than 5000: SELECT first_name, last_name FROM person WHERE salary > 5000; 7. Query Processing Example Now let us understand how this query will give us results by following the query processing steps. Parsing: Firstly, the query will be parsed. This will also check whether the syntax is correct or not. Then this query will be converted into a parse tree. This tree will look like this 7. Query Processing Example This tree will look like this: 7. Query Processing Example Optimization: The DBMS determines the most efficient way to execute the query by considering factors such as whether an index exists on the salary field. In this case, the DBMS might use an index on the salary field to efficiently retrieve the matching rows. 7. Query Processing Example Evaluation: The DBMS executes the optimized query. It retrieves the results from the database. Then it returns the first_name and last_name of all ninjas whose salary is greater than 5000.

Use Quizgecko on...
Browser
Browser