5.1 Database.pdf
Document Details
Uploaded by CapableAmethyst
Tags
Full Transcript
12/31/23, 4:54 PM zyBooks Boyce-Codd normal form Progression simple or composite. A table is in Boyce-Codd normal form if, whenever column A depends on column B, then B is unique. Columns A and B may be simple or composite. 4.11.1 Normal form. 4.12 Applying normal form Normalization Normalizat...
12/31/23, 4:54 PM zyBooks Boyce-Codd normal form Progression simple or composite. A table is in Boyce-Codd normal form if, whenever column A depends on column B, then B is unique. Columns A and B may be simple or composite. 4.11.1 Normal form. 4.12 Applying normal form Normalization Normalization eliminates redundancy by decomposing a table into two or more tables in higher normal form. depends on Column A depends on column B when each B value is related to at most one A value. A and B may be simple or composite columns. Boyce-Codd normal form In a Boyce-Codd normal form table, if column A depends on column B, then B must be unique. Denormalization Denormalization means intentionally introducing redundancy by merging tables. 5. Data Storage 5.1 Storage media Access time Access time is the time required to access the first byte in a read or write operation. Transfer rate Transfer rate is the speed at which data is read or written, following initial access. Volatile memory Volatile memory is memory that is lost when disconnected from power. Non-volatile memory Non-volatile memory is retained without power. Main memory / random-access memory (RAM) Main memory, also called random-access memory (RAM), is the primary memory used when computer programs execute. Flash memory / solidstate drive (SSD) Flash memory, also called solid-state drive (SSD), is less expensive and higher capacity than main memory. https://learn.zybooks.com/zybook/WGUD426v2/content-explorer/print 24/39 12/31/23, 4:54 PM zyBooks Magnetic disk / harddisk drive (HDD) Magnetic disk, also called hard-disk drive (HDD), is used to store large amounts of data. sectors Magnetic disk groups data in sectors, traditionally 512 bytes per sector but 4 kilobytes with newer disk formats. pages Flash memory groups data in pages, usually between 2 kilobytes and 16 kilobytes per page. block Databases and file systems use a uniform size, called a block, when transferring data between main memory and storage media. row-oriented storage To minimize block transfers, relational databases usually store an entire row within one block, which is called row-oriented storage. column-oriented / columnar storage In column-oriented storage, also called columnar storage, each block stores values for a single column only. 5.2 Table structures table structure A table structure is a scheme for organizing rows in blocks on storage media. heap table In a heap table, no order is imposed on rows. sorted table / sort column In a sorted table, the database designer identifies a sort column that determines physical row order. hash table In a hash table, rows are assigned to buckets. bucket A bucket is a block or group of blocks containing rows. hash key The hash key is a column or group of columns, usually the primary key. hash function The hash function computes the bucket containing the row from the hash key. modulo function The modulo function is a simple hash function with four steps. dynamic hash function A dynamic hash function automatically allocates more blocks to the table, creates additional buckets, and distributes rows across https://learn.zybooks.com/zybook/WGUD426v2/content-explorer/print 25/39 12/31/23, 4:54 PM zyBooks Table clusters / multitables cluster key Progression all buckets. With more buckets, fewer rows are assigned to each bucket and, on average, buckets contain fewer linked blocks. Table clusters, also called multi-tables, interleave rows of two or more tables in the same storage area. Table clusters have a cluster key, a column that is available in all interleaved tables. 5.2.1 Table structures. 5.3 Single-level indexes single-level index A single-level index is a file containing column values, along with pointers to rows containing the column value. multi-column index In a multi-column index, each index entry is a composite of values from all indexed columns. In all other respects, multi-column indexes behave exactly like indexes on a single column. table scan A table scan is a database operation that reads table blocks directly, without accessing an index. index scan An index scan is a database operation that reads index blocks sequentially, in order to locate the needed table blocks. Hit ratio / filter factor / selectivity Hit ratio, also called filter factor or selectivity, is the percentage of table rows selected by a query. binary search In a binary search, the database repeatedly splits the index in two until it finds the entry containing the search value: . primary index / clustering index A primary index, also called a clustering index, is an index on a sort column. secondary index / nonclustering index A secondary index, also called a nonclustering index, is an index that is not on the sort column. dense index A dense index contains an entry for every table row. sparse index A sparse index contains an entry for every table block. Progression 5.3.1 Single-level indexes. https://learn.zybooks.com/zybook/WGUD426v2/content-explorer/print 26/39 12/31/23, 4:54 PM zyBooks 5.4 Multi-level indexes multi-level index A multi-level index stores column values and row pointers in a hierarchy. fan-out The number of index entries per block is called the fan-out of a multi-level index. branch Each path from the top-level block to a bottom-level block is called a branch. balanced / imbalanced Multi-level indexes are called balanced when all branches are the same length and imbalanced when branches are different lengths. B+tree B+tree. All indexed values appear in the bottom level. Pointers to table blocks appear only in the bottom level. Since some indexed values also appear in higher levels, values are occasionally repeated in the index. B-tree B-tree. If an indexed value appears in a higher level, the value is not repeated at lower levels. Instead, a pointer to the corresponding table block appears in the higher level along with the value. Progression 5.4.1 Multi-level indexes. 5.5 Other indexes hash index In a hash index, index entries are assigned to buckets. bucket A bucket is a block or group of blocks containing index entries. hash function The bucket containing each index entry is determined by a hash function, which computes a bucket number from the value of the indexed column. bitmap index A bitmap index is a grid of bits: . physical index A single- or multi-level index normally contains pointers to table blocks and is called a physical index. logical index A logical index is a single- or multi-level index in which pointers to table blocks are replaced with primary key values. https://learn.zybooks.com/zybook/WGUD426v2/content-explorer/print 27/39 12/31/23, 4:54 PM zyBooks function index Progression In a function index, the database designer specifies a function on the column value. Index entries contain the result of the function applied to column values, rather than the column values. 5.5.1 Other indexes. 5.6 Tablespaces and partitions tablespace A tablespace is a database object that maps one or more tables to a single file. fragmented As files are updated, blocks become scattered, or fragmented, across many tracks. partition A partition is a subset of table data. One table has many partitions that do not overlap and, together, contain all table data. horizontal partition A horizontal partition is a subset of table rows. vertical partition A vertical partition is a subset of table columns. shard Like a partition, a shard is a subset of table data, usually a subset of rows rather than columns. Unlike partitions, which are stored on different storage devices of a single computer, shards are stored on different computers of a distributed database. partition expression / partition columns To partition a table, the database administrator specifies a partition expression based on one or more partition columns. The partition expression may be simple, such as the value of a single partition column, or a complex expression based on several partition columns. Rows are assigned to partitions in one of the following ways: . range partition A range partition associates each partition with a range of partition expression values. The VALUES LESS THAN keywords specify the upper bound of each range. The MAXVALUE keyword represents the highest column value, and VALUES LESS THAN MAXVALUE specifies the highest range. Each partition is explicitly named by the database administrator. list partition A list partition associates each partition with an explicit list of partition expression values using the VALUES IN keywords. Like a range partition, each partition is explicitly named. https://learn.zybooks.com/zybook/WGUD426v2/content-explorer/print 28/39 12/31/23, 4:54 PM zyBooks hash partition A hash partition requires a partition expression with positive integer values. The database administrator specifies the number of partitions, N, and partitions are automatically named p0 through p(N-1). The partition number for each row is computed as: (partition expression value) modulo N. key partition A key partition is similar to a hash partition, except the partition expression is determined automatically by the database. Progression 5.6.1 Tablespaces and partitions. 5.7 Physical design Logical design Logical design specifies tables, columns, and keys. The logical design process is described elsewhere in this material. Physical design Physical design specifies indexes, table structures, and partitions. Physical design affects query performance but never affects query results. storage engine / storage manager A storage engine or storage manager translates instructions generated by a query processor into low-level commands that access data on storage media. Storage engines support different index and table structures, so physical design is dependent on a specific storage engine. CREATE INDEX The CREATE INDEX statement creates an index by specifying the index name and table columns that compose the index. DROP INDEX The DROP INDEX statement deletes a table's index. SHOW INDEX The SHOW INDEX statement displays a table's index. EXPLAIN The EXPLAIN statement generates a result table that describes how a statement is executed by the storage engine. slow query log The MySQL slow query log is a file that records all long-running queries submitted to the database. 6. Database Architecture 6.1 MySQL architecture https://learn.zybooks.com/zybook/WGUD426v2/content-explorer/print 29/39