Query Processing - SMU Classification: Restricted (PDF)
Document Details
Uploaded by ArticulatePiccoloTrumpet7945
Singapore Management University
Tags
Summary
These notes detail query processing techniques, focusing on MySQL and its internal mechanisms. Topics covered include database internals, table storage, and primary keys. The document also explains the different modes of query execution (buffered and unbuffered), along with optimization strategies.
Full Transcript
SMU Classification: Restricted MITB [ISSS625] Query Processing and Optimisation Topic 4 Query processing SMU Classification: Restricted Agenda Part 1: Database internals Part 2: Table storage Part 3: Primary keys S...
SMU Classification: Restricted MITB [ISSS625] Query Processing and Optimisation Topic 4 Query processing SMU Classification: Restricted Agenda Part 1: Database internals Part 2: Table storage Part 3: Primary keys SMU Classification: Restricted Part 1 Database internals SMU Classification: Restricted External functionality Client RDBMS Server (CLI/GUI) (Relational Database Management System) SMU Classification: Restricted How clients use MySQL? A client application opens a connection to an RDBMS server process. (for MySQL typically via a network socket on a TCP port 3306) The client can exchange data with the server using a commonly understood protocol. The server supports a specific built-in protocol. The client must know this protocol; usually it uses: A specialized and feature-rich connector library that implements this protocol, or A generalized connectivity library supporting an open standard such as JDBC, ODBC. The client sends authentication details to create a database session. Once the session is open, the client can send via the connection SQL statements and receive response data and status information. SMU Classification: Restricted How clients use MySQL? A client can fetch data in two modes: Buffered (default in most connectors) A client sends a query and continues execution. A server prepares the query for processing, locks all database objects to isolate a transaction, processes the query, sends the complete result to the client, and immediately unlocks all database objects and performs a clean-up. The client receives the complete result, caches it in own memory. Unbuffered, nonbuffered (default in Python MySQL connector) A client sends a query and suspends execution. A server prepares the query for processing, locks all database objects to isolate a transaction, initiates processing and sends to the client a "cursor". A client receives the cursor and can iterate through it to prompt the server to progress execution of the query step-by-step and return the next part of a resulting data set. Once there is no more data to fetch, the client can close the cursor, prompting the server to unlock all database objects and clean-up. https://dev.mysql.com/doc/connector-python/en/connector-python-api-mysqlcursorbuffered.html SMU Classification: Restricted How clients use MySQL? Buffered vs. unbuffered Buffered Easy to use Some programming languages do not require exception handling to ensure resources are always released in this mode. Some libraries require fewer function calls to implement this mode. Less likely to keep database objects locked for long. Results can be iterated over multiple times without engaging a database. The count of rows in known without iterating (explicitly) over all the results. The connection can be used while the server is processing results of a buffered query. Unbuffered, nonbuffered The only option for accessing result tables that exceed memory available to a client process. https://dev.mysql.com/doc/connector-python/en/connector-python-api-mysqlcursorbuffered.html SMU Classification: Restricted Optimization opportunity When writing and executing SQL statements, minimize impact they have on execution of other queries. Reduce duration of locking of database objects (use buffered queries where feasible, reduce correlated subqueries). Reduce the number of database objects required to generate a result (limit JOINs, subqueries, consider CTEs). Reduce the number of columns that are utilized and returned (consider reuse, don't return redundant columns). SMU Classification: Restricted How MySQL fulfils requests? Utility Layer SQL Layer Storage Engine Connection Pool Executor Client Parser Optimizer Permission Control Rewriter https://www.youtube.com/watch?v=aKOaQfpW7to https://distributedsystemsauthority.com/mysql-performance-tuning-part-1/ SMU Classification: Restricted How MySQL fulfils requests? Utility layer Offers general purpose functionality that is standard for most RDBMS software. Consists of the following key components: Connection Pool – offers a pre-allocated pool of worker threads that are engaged without a delay when a new connection gets open for an incoming client. Parser – performs syntactical analysis of statements' grammar (to identify tokens: literals, keywords, expressions, object names, etc.) and produces an abstract syntax tree corresponding to a text string of each statement. Permission Control – checks if the currently authenticated user has been authorized (by granting necessary permissions) to perform operations on the objects manipulated by each statement. SMU Classification: Restricted How MySQL fulfils requests? SQL layer Offers MySQL specific statement execution pipeline. Consists of the following key components: Rewriter –resolves statements' abstractions (views, CTEs, aliases, etc.) into their target objects (tables, columns, etc.). It can also process administrator-defined rules for altering text of queries (e.g. adding the LIMIT clause if not specified). Optimizer – creates an execution plain from each given statement, and generates candidate alternative execution plans that would return the same results. Then picks the best one (often referred as the champion), based on the estimated cost of execution using sampling statistics that were aggregated for queried objects. Executor – follows the champion execution plan and executes its steps by accessing storage engine plugins that handle persistence of individual database objects. While processing an execution plan, the executor generates results that get passed back through all components back to a client. SMU Classification: Restricted How MySQL fulfils requests? Utility Layer SQL Layer Storage Engine Connection Pool Executor InnoDB (default) Client Parser Optimizer Memory Permission Control Rewriter … https://www.youtube.com/watch?v=aKOaQfpW7to https://distributedsystemsauthority.com/mysql-performance-tuning-part-1/ SMU Classification: Restricted How MySQL fulfils requests? Storage engine To list enabled engines type: Reports its capabilities to the executor. SHOW ENGINES; Indirectly assists the optimizer. Persists data for associated individual objects. The suffix \G used by the MySQL command-line interface utility instead of Implements various persistence features: a semicolon transposes the result to On-disk/in-memory persistence. a vertical format to improve readability. Various types of indexing. Transaction and locking support. Other utilities and GUI tools may offer a similar functionality, e.g.: Compression, encryption, backup. MySQL Workbench: Alt+Ctrl+Enter https://dev.mysql.com/doc/connector-python/en/connector-python-api-mysqlcursorbuffered.html SMU Classification: Restricted Optimization opportunity Storage drives and I/O access are slow. Maximize presence of data in primary memory (RAM): consider Memory storage engine, reduce data early, use CTEs. Minimize operations in secondary memory (storage): use indexes, filter by indexed attributes, avoid full table scans, insert data by appending at the end to avoid reshuffling existing data (these techniques will be explained soon). Avoid queries that lock tables for long, further delaying slow operations on secondary memory. SMU Classification: Restricted How MySQL fulfils requests? Storage engine: InnoDB Used as a default if another engine is not specified. Offers on-disk terabyte-scale persistence with good support for transactions, indexing, foreign keys. Storage is organized based on a primary key implemented as a clustered index (discussed layer). No horizontal scaling as a cluster database. Other noteworthy storage engines Memory – High-performance in-memory volatile persistence (limited to the size of RAM); limited support for indexing, no foreign key support, no transaction support. NDB Cluster – High-availability, high-redundancy cluster distributed storage with exabyte-scale persistence, row-locking granularity, but limited support for consistency and indexing. https://dev.mysql.com/doc/refman/8.0/en/storage-engines.html SMU Classification: Restricted Part 2 Table storage SMU Classification: Restricted Table organization Based on meta-data (schema-on-write), an RDBMS knows: A list of tables in a database, A list of columns in a table, including: Their sequence, Column names, Data types for each column: Their size and binary representation (e.g. SMALLINT requires 2 bytes, FLOAT requires 4 bytes). Their interpretation (e.g. the same 4 bytes as INT, FLOAT and CHAR(4) have a different meaning and are processed differently). Their allowed values (e.g. INT is unable to store fractions and the value of negative zero, but FLOAT is able to do it). A list of constraints and other additional objects associated with a table (primary keys, foreign key references, views, etc.). SMU Classification: Restricted Table organization From the size, a storage engine can derive a layout of each record, for example: INT DOUBLE FLOAT CHAR(8) 32 bytes total 4 bytes 8 bytes 4 bytes 16 bytes CPUs are unable to process data in the secondary memory (storage drives). Data must be copied to the primary memory (RAM) first. Tables may be too large to bring all their records into the primary memory at once. Tables therefore must be split into blocks ("pages") that can be loaded/unloaded on demand. A page can contain zero or more records; some storage engines allow for records to overflow and occupy more than one page ("off-page spill"). SMU Classification: Restricted Table organization A storage engine reserves memory slots ("frames") Data pages with records of a single table for storing pages in memory. Any page must fit any frame. INT DOUBLE FLOAT CHAR(8) Data page 1 Pages must have a fixed length INT DOUBLE FLOAT CHAR(8) (16KB by default in InnoDB). INT DOUBLE FLOAT CHAR(8) INT DOUBLE FLOAT CHAR(8) … INT DOUBLE FLOAT CHAR(8) Data page 2 INT DOUBLE FLOAT CHAR(8) INT DOUBLE FLOAT CHAR(8) INT DOUBLE FLOAT CHAR(8) … INT DOUBLE FLOAT CHAR(8) 3 https://dev.mysql.com/doc/refman/8.0/en/innodb-physical-structure.html SMU Classification: Restricted Table organization A storage engine reserves memory slots ("frames") Memory frames Data pages for storing pages in memory. Any page must fit any frame. Data page 1 Frame 1 Pages must have a fixed length Data page 2 (16KB by default in InnoDB). Frame 2 Only some parts of a table (if any) Data page 3 Frame 3 could be available in frames at the same time. Data page 4 Frame 4 Data page 5 ata page 6 https://dev.mysql.com/doc/refman/8.0/en/innodb-physical-structure.html SMU Classification: Restricted Part 3 Primary keys SMU Classification: Restricted Definition Primary key (PK) A column or a combination of columns that uniquely identifies each row in a table. Ensures uniqueness – no two rows have the same combination of values in PK columns. Requires non-nullability – rows must have valid and unique PK values: NULL is considered a missing value, thus not valid. Rows with NULL behave as unique (NULL=NULL is not TRUE), but aren't uniquely distinguishable. SMU Classification: Restricted Benefits Primary key Referential integrity. INT DOUBLE FLOAT CHAR(8) Structural integrity. Data page 1 INT DOUBLE FLOAT CHAR(8) Improved access time INT DOUBLE FLOAT CHAR(8) and search performance. INT DOUBLE FLOAT CHAR(8) … Precise pinpointing INT DOUBLE FLOAT CHAR(8) Data page 2 of storage location. INT DOUBLE FLOAT CHAR(8) INT DOUBLE FLOAT CHAR(8) INT DOUBLE FLOAT CHAR(8) … INT DOUBLE FLOAT CHAR(8) 3 SMU Classification: Restricted Referential integrity Other attributes can unambiguously reference a record with a PK. Such references can exist in another table, or in the same table. They are called foreign keys (FK). Departments id name phone dept_email_address id name phone dept_email_address id name phone dept_email_address id name phone dept_email_address Employees id name dept direct_email_address SMU Classification: Restricted Structural integrity PK gives a table schema a structure. Example: in a relational model, in a normalized table a record should not contain attributes not directly and solely dependent on attributes selected as a primary key. Often, such attributes should be broken down into more atomic concepts or moved to a different table. Departments id name phone dept_email_address id name phone dept_email_address id name phone dept_email_address id name phone dept_email_address Employees id name dept direct_email_address SMU Classification: Restricted Improved access time and performance PK in most RDBMS is implemented with a primary index. Enables indexed search, removing constant need for linear scans of a table. Organizes records in a sorted fashion, enabling quick access. Ensures efficient insertion by appending if PK is incremental (consider: timestamp or AUTO_INCREMENT). Departments id name phone dept_email_address id name phone dept_email_address INDEX id name phone dept_email_address id name phone dept_email_address … SMU Classification: Restricted Precise pinpointing of storage location. PK in many RDBMS is implemented as a clustered index. Index pages may: Point to an exact data page in storage, or Contain a data records directly instead of a pointer. Departments id name phone dept_email_address id name phone dept_email_address INDEX id name phone dept_email_address id name phone dept_email_address … SMU Classification: Restricted Table organization In MySQL InnoDB storage engine uses a primary key to establish a clustered index for storage. Thus, every InnoDB table must have a primary key: 1. If a table has a user-defined PK on one or more columns, it becomes a PK. 2. Otherwise, the first non-nullable, unique index (if exists) becomes clustered and used a PK. 3. Otherwise, a storage engine creates an implicit invisible attribute using AUTO_INCREMENT (making rows sequenced by order of insertions); it becomes a PK. https://dev.mysql.com/doc/refman/5.7/en/innodb-index-types.html https://dev.mysql.com/doc/refman/8.0/en/create-table-gipks.html SMU Classification: Restricted Selection of PK For simplicity and better distribution of indexes, avoid PK with attributes having a business meaning (a natural key); prefer arbitrary attributes (a synthetic key). To avoid shifting data already stored in a clustered primary index, prefer attributes with incrementing values such as a time-stamps, or AUTO_INCREMENT. An exception: storing data for write-once/read-many access when the cost of more expensive inserts is compensated with more efficient accessing or querying of deliberately grouped data. https://dev.mysql.com/doc/refman/8.0/en/glossary.html#glos_primary_key https://dev.mysql.com/doc/refman/5.7/en/innodb-index-types.html