Query Processing PDF
Document Details
Uploaded by UseableJoy
Western Mindanao State University
Tags
Summary
This document explains the query processing steps in a database system. It covers parsing, translation, optimization, and evaluation of SQL queries. The explanation includes relational algebra, parse tree representation, different types of optimization, and the evaluation process. Specific examples with SQL commands are discussed, providing a comprehensive overview of query processing.
Full Transcript
QUERY PROCESSING WHAT IS QUERY PROCESSING Query processing is the process of translating a high-level query, such as SQL, into a low-level query that can be executed by the database system. This involves parsing, validating, and optimizing the query, as well as generating a query execution plan. P...
QUERY PROCESSING WHAT IS QUERY PROCESSING Query processing is the process of translating a high-level query, such as SQL, into a low-level query that can be executed by the database system. This involves parsing, validating, and optimizing the query, as well as generating a query execution plan. PARSING During the parse call, the database performs the following checks: Syntax check, Semantic check, and Shared pool check. Semantic check: determines whether the statement is meaningful or not. This includes ensuring: The tables, columns, and data types referenced in the query exist. The operations are appropriate for the data types involved The query does not violate integrity constraints like primary key, foreign key, or unique constraints. Correctness in terms of joins, aggregations, and subqueries PARSING During the parse call, the database performs the following checks: Syntax check, Semantic check, and Shared pool check. Shared Pool check: Every query possesses a hash code during its execution. So, this check determines the existence of written hash code in the shared pool if the code exists in the shared pool then the database will not take additional steps for optimization and execution. PARSE TREE A parse tree (or syntax tree) is a tree-like representation of the grammatical structure of a SQL query. It visually represents how the SQL statement is structured according to the rules of SQL syntax. Features of a parse tree includes: Hierarchical Structure -A hierarchy of nodes representing the entire SQL query Grammar Representation -Each node in the tree corresponds to a grammatical construct defined by SQL syntax rules. Purpose of the parse tree: Syntax Validation: It helps in verifying that the SQL query adheres to the correct syntax. Structure Visualization: It provides a clear visual representation of the query’s structure, which aids in further processing stages. PARSE TREE SELECT name, age FROM users WHERE age > 18; Parse tree representation TRANSLATION Translation is the process of converting the parse tree into an internal representation, typically using relational algebra. This involves transforming high-level SQL constructs into operations that can be executed by the DBMS. TRANSLATION Key steps in translation Relational Algebra Representation: The parse tree is converted into a relational algebra expression, which outlines the operations needed to retrieve the data. Common operations in relational algebra include: Selection (σ): Filters rows based on a condition (similar to the WHERE clause). Projection (π): Chooses specific columns to return (similar to the SELECT clause). Join (⨝): Combines rows from two or more tables based on a related column. TRANSLATION Sample π_name, age (σ_age > 18 (users)) π_name, age: This denotes the projection of the name and age columns. σ_age > 18: This denotes the selection of rows where age is greater than 18. (users): This indicates the table from which the data is being retrieved. Purpose: Prepare for Optimization Internal Consistency OPTIMIZATION Optimization aims to find the most efficient way to execute the parsed query by evaluating multiple possible execution plans and selecting the one with the lowest cost in terms of resources like CPU, memory, and disk I/O. TYPES OF OPTIMIZATION Heuristic Optimization Cost-Based Optimization The DBMS evaluates various execution plans by A rule-based approach that applies predefined estimating the cost of resources (like CPU usage, rules (heuristics) to rewrite the query for better disk I/O, and memory) needed for each possible performance without calculating resource costs. plan. Predicate Pushdown: Moving the WHERE Generate Possible Plans: The optimizer clause conditions as early as possible to filter generates different ways to execute the query. rows sooner and reduce the amount of data Estimate Cost: For each plan, the system processed in later stages. estimates the cost based on factors such as: Reordering Joins: Joining smaller tables first I/O Cost: How many pages need to be read to minimize the size of intermediate results. from the disk. Eliminating Redundant Operations: CPU Cost Removing unnecessary operations such as Memory duplicate scans of the same table or Select the Best Plan: The plan with the lowest redundant projections. estimated cost is chosen. EVALUATION refers to the process of executing a SQL query and obtaining results based on the operations defined in the query. EVALUATION PROCESS Execution Plan Selection: -This plan outlines how the query will be executed, including the sequence of operations and the data access methods to be used. Data Access: -The database accesses the tables needed for the query. It can read data directly from the table or use indexes if available. Data Filtering and Manipulation: 1. Selection (WHERE clause): The DBMS applies any filtering conditions specified in the WHERE clause to determine which rows meet the criteria. 2. Join Operations: If the query involves multiple tables, the DBMS performs join operations to combine rows from these tables based on specified relationships. 3. Aggregation (GROUP BY): If there are aggregation functions (like COUNT, SUM, AVG), the DBMS groups the data accordingly and computes the results. EVALUATION PROCESS Result Construction 1. Projection (SELECT clause): The DBMS selects only the columns specified in the SELECT clause. This reduces the data returned to the user, providing only relevant information. 2. Sorting (ORDER BY): If an ORDER BY clause is present, the DBMS sorts the result set based on the specified columns and order (ascending or descending). Final Result Set: 3. Output Formatting: The DBMS formats the final result set for presentation. This might include converting data types or applying formatting rules. 4. Return to Client: The results are sent back to the client application or user interface, ready for display or further processing. QUERY PROCESSING BREAKDOWN QUERY PARSING When you run the query, the MySQL engine first checks whether the query is written correctly and follows SQL syntax rules. It checks if all keywords (like SELECT, FROM, WHERE) are used properly It verifies whether the table books exists in the database. It ensures the column title exists in the books table and the LIKE operator is used correctly. The parsing step ensures that MySQL understands what you are asking before it proceeds. The parsing step ensures that MySQL understands what you are asking before it proceeds. PARSE TREE The query is broken down into a parse tree, which is a hierarchical structure that represents the query components. It creates the hierarchical representation of the query. The parse tree illustrates the relationships between different parts of the query: The root represents the entire query (e.g., SELECT). Branches represent clauses (FROM, WHERE). Leaves contain specific elements like column names, table names, and conditions. TRANSLATION After parse tree, the query is converted into a machine-friendly format, typically an internal data structure like a parse tree or execution plan. This is where the query is broken down into smaller steps for the database engine to process. OPTIMIZATION “Finding the Best Way to Execute the Query” The MySQL query optimizer looks for the most efficient way to execute the query. It may consider different query execution plans and choose the one that costs the least in terms of time and resources. The optimizer evaluates different execution plans based on the parsed query and statistics about the data. It considers factors like: Whether indexes exist on columns used in the WHERE clause. The number of rows to be scanned and filtered. The best order of operations for joins (if applicable). The optimizer selects the plan with the lowest estimated cost in terms of time and resources. EVALUATION After optimization, MySQL runs the query and retrieves the data from the books table where the title matches the pattern '%Dog%'. The final result is sent back to the user. In this phase, the database engine carries out the operations defined in the execution plan generated during the optimization step. It fetches the required data from the storage (disk or memory) based on the optimized plan. During the evaluation: Data Access: The engine checks if there’s an index on the title column. If there is, it uses that index to find relevant rows quickly. If not, it scans the entire books table. Filter Application: Each row is checked against the condition in the WHERE clause. Only rows where title contains "Dog" are kept. Column Retrieval: Since we used *, all columns from the matching rows are included in the final output. Result Output: The resulting rows are returned to the user. OUTPUT THANK YOU