Database Storage I PDF
Document Details
Uploaded by Deleted User
Tags
Summary
This document covers database storage, including discussions of database architecture, disk-based architecture, storage hierarchy, and symmetric multiprocessing. Diagrams and illustrations are included. Topics include registers, caches, DRAM, SSD, HDD, and network storage.
Full Transcript
Database storage I The slides are adapted with minor changes from the open materials of the CMU 15-445/645 course, with credit to Andy Pavlo and gratitude to their outstanding teaching team. General DB architecture Application SQL Query Planning Operator Execu...
Database storage I The slides are adapted with minor changes from the open materials of the CMU 15-445/645 course, with credit to Andy Pavlo and gratitude to their outstanding teaching team. General DB architecture Application SQL Query Planning Operator Execution Access Methods Buffer Pool Manager Disk Manager DISK-BASED ARCHITECTURE The DBMS assumes that the primary storage location of the database is on non-volatile disk. The DBMS's components manage the movement of data between non-volatile and volatile storage. STORAGE HIERARCHY Faster Registers files Smaller Expensive Caches Volatile Random Access Byte-Addressable DRAM Non-Volatile SSD Sequential Access Block-Addressable HDD Slower Network Storage Larger Cheaper STORAGE HIERARCHY Faster 28 CPU Registers files Smaller On-chip memory Expensive Caches off-chip memory DRAM SSD Disk HDD Slower Network Storage Larger Cheaper Symmetric Multi-Processor CPU chip layout Core Core Register Register les les BT BP INT BP INT B INT + FP INT + FP L1 TLB L1 TLB L2 L2 L3 Mem Ctl. DIMM busses DRAM fi fi STORAGE HIERARCHY Faster 28 CPU Registers files Smaller On-chip memory Expensive Caches High-bandwidth memory (HBM) off-chip memory DRAM SSD Disk HDD Slower Network Storage Larger Cheaper STORAGE HIERARCHY Faster On-package 28 CPU Registers files Smaller memory Expensive Caches High-bandwidth memory (HBM) off-package DRAM memory SSD Disk HDD Slower Network Storage Larger Cheaper Photo taken at SC24 STORAGE HIERARCHY Faster On-package 28 CPU Registers files Smaller memory Expensive Caches High-bandwidth memory (HBM) off-package DRAM memory CXL 4.0 SSD Disk HDD Slower Network Storage Larger Cheaper STORAGE HIERARCHY Faster On-package 28 CPU Registers files Smaller memory Expensive Caches High-bandwidth memory (HBM) off-package DRAM memory CXL 3.0 SSD Disk HDD Slower Network Storage Larger Cheaper Photo taken at SC24 CXL.io ( ) CXL.mem ( ) Photo taken at SC24 SEQUENTIAL VS. RANDOM ACCESS Random access on non-volatile storage is almost always much slower than sequential access. DBMS will want to maximize sequential access. → Algorithms try to reduce number of writes to random pages so that data is stored in contiguous blocks. → Allocating multiple pages at the same time is called an extent. SYSTEM DESIGN GOALS Allow the DBMS to manage databases that exceed the amount of memory available. Reading/writing to disk is expensive, so it must be managed carefully to avoid large stalls and performance degradation. Random access on disk is usually much slower than sequential access, so the DBMS will want to maximize sequential access. DISK-ORIENTED DBMS Get Page #2 Execution Engine Pointer to Page #2 Buffer Pool Directory Header Interpret Page #2 layout 2 Update Page #2 Memory Database File Directory Header Header Header Header Header 1 2 3 4 5 … Pages Disk FILE STORAGE The DBMS stores a database as one or more files on disk typically in a proprietary format. → The OS does not know anything about the contents of these files. → We will discuss portable file formats next week… Early systems in the 1980s used custom filesystems on raw block storage. → Some "enterprise" DBMSs still support this. → Most newer DBMSs do not do this. STORAGE MANAGER The storage manager is responsible for maintaining a database's files. → Some do their own scheduling for reads and writes to improve spatial and temporal locality of pages. It organizes the files as a collection of pages. → Tracks data read/written to pages. → Tracks the available space. A DBMS typically does not maintain multiple copies of a page on disk. → Assume this happens above/below storage manager. DATABASE PAGES A page is a fixed-size block of data. → It can contain tuples, meta-data, indexes, log records… → Most systems do not mix page types. → Some systems require a page to be self-contained. Each page is given a unique identifier (page ID). → A page ID could be unique per DBMS instance, per database, or per table. → The DBMS uses an indirection layer to map page IDs to physical locations. DATABASE PAGES There are three different notions of Default DB Page Sizes "pages" in a DBMS: → Hardware Page (usually 4KB) 4KB → OS Page (usually 4KB, x64 2MB/1GB) → Database Page (512B-32KB) A hardware page is the largest block of data that the storage device can 8KB guarantee failsafe writes. DBMSs that specialize in read-only workloads have larger page sizes. 16KB PAGE STORAGE ARCHITECTURE Different DBMSs manage pages in files on disk in different ways. → Heap File Organization → Tree File Organization → Sequential / Sorted File Organization (ISAM) → Hashing File Organization At this point in the hierarchy, we do not need to know anything about what is inside of the pages. HEAP FILE A heap file is an unordered collection of pages with tuples that are stored in random order. → Create / Get / Write / Delete Page → Must also support iterating over all pages. Need additional meta-data to track location of files and free space availability. Offset = Page# × PageSize Database File Page0 Page1 Page2 Page3 Page4 Get Page #2 … HEAP FILE A heap file is an unordered collection of pages with tuples that are stored in random order. → Create / Get / Write / Delete Page → Must also support iterating over all pages. Need additional meta-data to track location of files and free space availability. File Location Page# × PageSize Page Get Page #23 Directory HEAP FILE: PAGE DIRECTORY File 1 File 2 The DBMS maintains special pages that tracks the location of data pages Page0 Page0 in the database files. Directory Data Data → One entry per database object. Table X → Must make sure that the directory pages Index Y are in sync with the data pages. Table Z Page1 Page1 DBMS also keeps meta-data about ⋮ Data Data pages' contents: → Amount of free space per page. → List of free / empty pages. ⋮ ⋮ → Page type (data vs. meta-data). PAGE HEADER Every page contains a header of meta- Page data about the page's contents. → Page Size Header → Checksum → DBMS Version → Transaction Visibility Data → Compression / Encoding Meta-data → Schema Information → Data Summary / Sketches Some systems require pages to be self- contained (e.g., Oracle). PAGE LAYOUT For any page storage architecture, we now need to decide how to organize the data inside of the page. Approach #1: Tuple-oriented Storage Approach #2: Log-structured Storage Approach #3: Index-organized Storage TUPLE-ORIENTED STORAGE How to store tuples in a page? Page Strawman Idea: Keep track of the Num Tuples = 0 number of tuples in a page and then just append a new tuple to the end. TUPLE-ORIENTED STORAGE How to store tuples in a page? Page Strawman Idea: Keep track of the Num Tuples = 30 number of tuples in a page and then Tuple #1 just append a new tuple to the end. Tuple #2 → What happens if we delete a tuple? Tuple #3 TUPLE-ORIENTED STORAGE How to store tuples in a page? Page Strawman Idea: Keep track of the Num Tuples = 230 number of tuples in a page and then Tuple #1 just append a new tuple to the end. → What happens if we delete a tuple? Tuple #3 TUPLE-ORIENTED STORAGE How to store tuples in a page? Page Strawman Idea: Keep track of the Num Tuples = 30 number of tuples in a page and then Tuple #1 just append a new tuple to the end. Tuple #4 → What happens if we delete a tuple? Tuple #3 TUPLE-ORIENTED STORAGE How to store tuples in a page? Page Strawman Idea: Keep track of the Num Tuples = 30 number of tuples in a page and then Tuple #1 just append a new tuple to the end. Tuple #4 → What happens if we delete a tuple? → What happens if we have a variable- Tuple #3 length attribute? SLOTTED PAGES The most common layout scheme is Slot Array called slotted pages. 1 2 3 4 5 6 7 Header The slot array maps "slots" to the tuples' starting position offsets. The header keeps track of: Tuple #4 Tuple #3 → The # of used slots → The offset of the starting location of the Tuple #2 Tuple #1 last slot used. Fixed- and Var-length Tuple Data SLOTTED PAGES The most common layout scheme is Slot Array called slotted pages. 1 2 3 4 5 6 7 Header The slot array maps "slots" to the tuples' starting position offsets. The header keeps track of: Tuple #4 Tuple #3 → The # of used slots → The offset of the starting location of the Tuple #2 Tuple #1 last slot used. Fixed- and Var-length Tuple Data SLOTTED PAGES The most common layout scheme is Slot Array called slotted pages. 1 2 3 4 5 6 7 Header The slot array maps "slots" to the tuples' starting position offsets. The header keeps track of: Tuple #4 Tuple #3 → The # of used slots → The offset of the starting location of the Tuple #2 Tuple #1 last slot used. Fixed- and Var-length Tuple Data SLOTTED PAGES The most common layout scheme is Slot Array called slotted pages. 1 2 3 4 5 6 7 Header The slot array maps "slots" to the tuples' starting position offsets. The header keeps track of: Tuple #4 Tuple #3 → The # of used slots → The offset of the starting location of the Tuple #2 Tuple #1 last slot used. Fixed- and Var-length Tuple Data SLOTTED PAGES The most common layout scheme is Slot Array called slotted pages. 1 2 3 4 5 6 7 Header The slot array maps "slots" to the tuples' starting position offsets. The header keeps track of: Tuple #4 → The # of used slots → The offset of the starting location of the Tuple #2 Tuple #1 last slot used. Fixed- and Var-length Tuple Data SLOTTED PAGES The most common layout scheme is Slot Array called slotted pages. 1 2 3 4 5 6 7 Header The slot array maps "slots" to the tuples' starting position offsets. The header keeps track of: Tuple #4 → The # of used slots → The offset of the starting location of the Tuple #2 Tuple #1 last slot used. Fixed- and Var-length Tuple Data RECORD IDS The DBMS assigns each logical tuple a unique record identifier that CTID (6-bytes) represents its physical location in the database. → File Id, Page Id, Slot # → Most DBMSs do not store ids in tuple. → SQLite uses ROWID as the true primary ROWID (8-bytes) ROWID key and stores them as a hidden attribute. Applications should never rely on %%physloc%% (8-bytes) these IDs to mean anything. ROWID (10-bytes) TUPLE HEADER Each tuple is prefixed with a header Tuple that contains meta-data about it. Header Attribute Data → Visibility info (concurrency control) → Bit Map for NULL values. We do not need to store meta-data about the schema. TUPLE DATA Attributes are typically stored in the Tuple order that you specify them when you Header a b c d e create the table. This is done for software engineering CREATE TABLE foo ( reasons (i.e., simplicity). a INT PRIMARY KEY, b INT NOT NULL, c INT, However, it might be more efficient d DOUBLE, to lay them out differently. e FLOAT ); DENORMALIZED TUPLE DATA DBMS can physically denormalize (e.g., "pre-join") related tuples and CREATE TABLE foo ( store them together in the same page. a INT PRIMARY KEY, → Potentially reduces the amount of I/O for b INT NOT NULL, common workload patterns. ); CREATE TABLE bar ( → Can make updates more expensive. c INT PRIMARY KEY, a INT ⮱REFERENCES foo (a), ); DENORMALIZED TUPLE DATA DBMS can physically denormalize foo (e.g., "pre-join") related tuples and Header a b store them together in the same page. → Potentially reduces the amount of I/O for common workload patterns. → Can make updates more expensive. bar Header c a SELECT * FROM foo JOIN bar ON foo.a = bar.a; Header c a Header c a DENORMALIZED TUPLE DATA DBMS can physically denormalize foo (e.g., "pre-join") related tuples and Header a b c c c … store them together in the same page. → Potentially reduces the amount of I/O for common workload patterns. foo bar → Can make updates more expensive. SELECT * FROM foo JOIN bar ON foo.a = bar.a; DENORMALIZED TUPLE DATA DBMS can physically denormalize foo (e.g., "pre-join") related tuples and Header a b c c c … store them together in the same page. → Potentially reduces the amount of I/O for common workload patterns. foo bar → Can make updates more expensive. Not a new idea. → IBM System R did this in the 1970s. → Several NoSQL DBMSs do this without calling it physical denormalization. LOG-STRUCTURED STORAGE Instead of storing tuples in pages and updating the in-place, the DBMS maintains a log that records changes to tuples. → Each log entry represents a tuple PUT/DELETE operation. → Originally proposed as log-structure merge trees (LSM Trees) in 1996. The DBMS applies changes to an in-memory data structure (MemTable) and then writes out the changes sequentially to disk (SSTable). LOG-STRUCTURED STORAGE MemTable Memory Disk LOG-STRUCTURED STORAGE PUT (key101,a1) MemTable Memory Disk LOG-STRUCTURED STORAGE PUT (key102,b1) MemTable Memory Disk LOG-STRUCTURED STORAGE PUT (key101,a2) MemTable Memory Disk LOG-STRUCTURED STORAGE PUT (key103,c1) MemTable Memory Disk LOG-STRUCTURED STORAGE MemTable SSTable PUT (key101,a2) PUT (key102,b1) PUT (key103,c1) Memory Disk LOG-STRUCTURED STORAGE MemTable SSTable Key Low→High PUT (key101,a2) PUT (key102,b1) PUT (key103,c1) Memory Disk LOG-STRUCTURED STORAGE MemTable SSTable Key Low→High PUT (key101,a2) PUT (key102,b1) PUT (key103,c1) Memory Level #0 SSTable Disk LOG-STRUCTURED STORAGE MemTable SSTable Key Low→High PUT (key101,a2) PUT (key102,b1) PUT (key103,c1) Memory Level #0 SSTable SSTable Newest→Oldest Disk LOG-STRUCTURED STORAGE MemTable SSTable Key Low→High PUT (key101,a2) PUT (key102,b1) PUT (key103,c1) Memory Level #0 SSTable SSTable Newest→Oldest Level #1 SSTable Disk LOG-STRUCTURED STORAGE MemTable SSTable Key Low→High PUT (key101,a2) PUT (key102,b1) PUT (key103,c1) Memory Level #0 Newest→Oldest Level #1 SSTable Disk LOG-STRUCTURED STORAGE MemTable SSTable Key Low→High PUT (key101,a2) PUT (key102,b1) PUT (key103,c1) Memory Level #0 SSTable SSTable Newest→Oldest Level #1 SSTable Disk LOG-STRUCTURED STORAGE MemTable SSTable Key Low→High PUT (key101,a2) PUT (key102,b1) PUT (key103,c1) Memory Level #0 SSTable SSTable Newest→Oldest Level #1 SSTable SSTable Disk LOG-STRUCTURED STORAGE MemTable SSTable Key Low→High PUT (key101,a2) PUT (key102,b1) PUT (key103,c1) Memory Level #0 Newest→Oldest Level #1 SSTable SSTable Disk Level #2 SSTable LOG-STRUCTURED STORAGE MemTable SSTable Key Low→High PUT (key101,a2) PUT (key102,b1) PUT (key103,c1) Memory Level #0 Newest→Oldest Level #1 Disk Level #2 SSTable LOG-STRUCTURED STORAGE MemTable Memory Level #0 SSTable Level #1 SSTable Disk Level #2 SSTable LOG-STRUCTURED STORAGE GET (key101) MemTable SummaryTable Min/Max Key Per SSTable Key Filter Per Level Memory Level #0 SSTable Level #1 SSTable Disk Level #2 SSTable LOG-STRUCTURED STORAGE Key-value storage that appends log SSTable records on disk to represent changes Key Low→High DEL (key100) to tuples (PUT, DELETE). PUT (key101,a3) → Each log record must contain the tuple's unique identifier. PUT (key102,b2) → Put records contain the tuple contents. PUT (key103,c1) → Deletes marks the tuple as deleted. As the application makes changes to the database, the DBMS appends log records to the end of the file without checking previous log records. LOG-STRUCTURED COMPACTION Periodically compact SSTAbles to reduce wasted space and speed up reads. → Only keep the "latest" values for each key using a sort- merge algorithm. SSTable SSTable DEL (key100) PUT (key101,a2) + PUT (key101,a3) PUT (key102,b1) PUT (key102,b2) DEL (key103) PUT (key103,c1) PUT (key104,d2) Newest→Oldest LOG-STRUCTURED COMPACTION Periodically compact SSTAbles to reduce wasted space and speed up reads. → Only keep the "latest" values for each key using a sort- merge algorithm. SSTable SSTable SSTable DEL (key100) PUT (key101,a2) DEL (key100) + PUT (key101,a3) PUT (key102,b1) PUT (key101,a3) PUT (key102,b2) DEL (key103) PUT (key102,b2) PUT (key103,c1) PUT (key104,d2) PUT (key103,c1) PUT (key104,d2) Newest→Oldest OBSERVATION The two table storage approaches we've discussed so far rely on indexes to find individual tuples. → Such indexes are necessary because the tables are inherently unsorted. But what if the DBMS could keep tuples sorted automatically using an index? INDEX-ORGANIZED STORAGE DBMS stores a table's tuples as the value of an index data structure. → Still use a page layout that looks like a slotted page. → Tuples are typically sorted in page based on key. B+Tree pays maintenance costs upfront, whereas LSMs pay for it later. Inner key→ key→ key→ Nodes Header offset offset offset Leaf Nodes Tuple #3 Tuple #2 Tuple #6 Database storage II (Data Layout) The slides are adapted with minor changes from the open materials of the CMU 15-445/645 course, with credit to Andy Pavlo and gratitude to their outstanding teaching team. DATA LAYOUT unsigned char[] CREATE TABLE foo ( id INT PRIMARY KEY, header id value value BIGINT ); DATA LAYOUT unsigned char[] CREATE TABLE foo ( id INT PRIMARY KEY, header id value value BIGINT ); reinterpret_cast(address) WORD-ALIGNED TUPLES All attributes in a tuple must be word aligned to enable the CPU to access it without any unexpected behavior or additional work. CREATE TABLE foo ( id INT PRIMARY KEY, unsigned char[] cdate TIMESTAMP, color CHAR(2), zipcode INT 64-bit Word 64-bit Word 64-bit Word 64-bit Word ); WORD-ALIGNED TUPLES All attributes in a tuple must be word aligned to enable the CPU to access it without any unexpected behavior or additional work. CREATE TABLE foo ( 32-bits id INT PRIMARY KEY, unsigned char[] cdate TIMESTAMP, id color CHAR(2), zipcode INT 64-bit Word 64-bit Word 64-bit Word 64-bit Word ); WORD-ALIGNED TUPLES All attributes in a tuple must be word aligned to enable the CPU to access it without any unexpected behavior or additional work. CREATE TABLE foo ( 32-bits id INT PRIMARY KEY, unsigned char[] 64-bits cdate TIMESTAMP, id cdate color CHAR(2), zipcode INT 64-bit Word 64-bit Word 64-bit Word 64-bit Word ); WORD-ALIGNED TUPLES All attributes in a tuple must be word aligned to enable the CPU to access it without any unexpected behavior or additional work. CREATE TABLE foo ( 32-bits id INT PRIMARY KEY, unsigned char[] 64-bits cdate TIMESTAMP, id cdate c 16-bits color CHAR(2), zipcode INT 64-bit Word 64-bit Word 64-bit Word 64-bit Word ); WORD-ALIGNED TUPLES All attributes in a tuple must be word aligned to enable the CPU to access it without any unexpected behavior or additional work. CREATE TABLE foo ( 32-bits id INT PRIMARY KEY, unsigned char[] 64-bits cdate TIMESTAMP, id cdate c zipc 16-bits color CHAR(2), 32-bits zipcode INT 64-bit Word 64-bit Word 64-bit Word 64-bit Word ); WORD-ALIGNMENT: PADDING Add empty bits after attributes to ensure that tuple is word aligned. Essentially round up the storage size of types to the next largest word size. CREATE TABLE foo ( 32-bits id INT PRIMARY KEY, 00000000 00000 64-bits cdate TIMESTAMP, id 00000000 00000000 00000000 cdate c zipc 000 00000 000 16-bits color CHAR(2), 32-bits zipcode INT 64-bit Word 64-bit Word 64-bit Word 64-bit Word ); WORD-ALIGNMENT: REORDERING Switch the order of attributes in the tuples' physical layout to make sure they are aligned. → May still have to use padding to fill remaining space. CREATE TABLE foo ( 32-bits id INT PRIMARY KEY, 64-bits cdate TIMESTAMP, id cdate c zipc 16-bits color CHAR(2), 32-bits zipcode INT 64-bit Word 64-bit Word 64-bit Word 64-bit Word ); WORD-ALIGNMENT: REORDERING Switch the order of attributes in the tuples' physical layout to make sure they are aligned. → May still have to use padding to fill remaining space. CREATE TABLE foo ( 32-bits id INT PRIMARY KEY, 000000000000 64-bits cdate TIMESTAMP, id zipc cdate c 000000000000 000000000000 000000000000 16-bits color CHAR(2), 32-bits zipcode INT 64-bit Word 64-bit Word 64-bit Word 64-bit Word ); VARIABLE PRECISION NUMBERS Inexact, variable-precision numeric type that uses the "native" C/C++ types. Store directly as specified by IEEE-754. → Example: FLOAT, REAL/DOUBLE These types are typically faster than fixed precision numbers because CPU ISA's (Xeon, Arm) have instructions / registers to support them. But they do not guarantee exact values… VARIABLE PRECISION NUMBERS Rounding Example Output #include x+y = 0.300000 0.3 = 0.300000 int main(int argc, char* argv[]) { float x = 0.1; float y = 0.2; printf("x+y = %f\n", x+y); printf("0.3 = %f\n", 0.3); } VARIABLE PRECISION NUMBERS Rounding Example Output #include x+y = 0.300000 0.3 = 0.300000 int#include main(int argc, char* argv[]) { float x = 0.1; x+y = 0.30000001192092895508 int main(int float argc, char* argv[]) { y = 0.2; 0.3 = 0.29999999999999998890 float x = =0.1; printf("x+y %f\n", x+y); float y = 0.2; printf("0.3 = %f\n", 0.3); } printf("x+y = %.20f\n", x+y); printf("0.3 = %.20f\n", 0.3); } FIXED PRECISION NUMBERS Numeric data types with (potentially) arbitrary precision and scale. Used when rounding errors are unacceptable. → Example: NUMERIC, DECIMAL Many different implementations. → Example: Store in an exact, variable-length binary representation with additional meta-data. → Can be less expensive if the DBMS does not provide arbitrary precision (e.g., decimal point can be in a different position per value). POSTGRES: NUMERIC # of Digits typedef unsigned char NumericDigit; Weight of 1st Digit typedef struct { int ndigits; Scale Factor int weight; int scale; Positive/Negative/NaN int sign; NumericDigit *digits; Digit Storage } numeric; NULL DATA TYPES Choice #1: Null Column Bitmap Header → Store a bitmap in a centralized header that specifies what attributes are null. → This is the most common approach in row-stores. Choice #2: Special Values → Designate a placeholder value to represent NULL for a data type (e.g., INT32_MIN). More common in column-stores. Choice #3: Per Attribute Null Flag → Store a flag that marks that a value is null. → Must use more space than just a single bit because this messes up with word alignment. LARGE VALUES CREATE TABLE foo ( Most DBMSs do not allow a tuple to id INT PRIMARY KEY, exceed the size of a single page. data INT, Tuple contents TEXT ); To store values that are larger than a page, the DBMS uses separate Header INT INT TEXT overflow storage pages. → Postgres: TOAST (>2KB) → MySQL: Overflow (>½ size of page) → SQL Server: Overflow (>size of page) Lots of potential optimizations: → Overflow Compression, German Strings Overflow Compression German Strings LARGE VALUES CREATE TABLE foo ( Most DBMSs do not allow a tuple to id INT PRIMARY KEY, exceed the size of a single page. data INT, Tuple contents TEXT ); To store values that are larger than a page, the DBMS uses separate Header INT INT TEXT overflow storage pages. → Postgres: TOAST (>2KB) → MySQL: Overflow (>½ size of page) Overflow Page → SQL Server: Overflow (>size of page) VARCHAR DATA Lots of potential optimizations: → Overflow Compression, German Strings Overflow Compression German Strings LARGE VALUES CREATE TABLE foo ( Most DBMSs do not allow a tuple to id INT PRIMARY KEY, exceed the size of a single page. data INT, Tuple contents TEXT ); To store values that are larger than a page, the DBMS uses separate Header INT INT size TEXT location overflow storage pages. → Postgres: TOAST (>2KB) → MySQL: Overflow (>½ size of page) Overflow Page → SQL Server: Overflow (>size of page) VARCHAR DATA Lots of potential optimizations: → Overflow Compression, German Strings Overflow Compression German Strings EXTERNAL VALUE STORAGE Some systems allow you to store a Tuple large value in an external file. Header a b c d e Treated as a BLOB type. → Oracle: BFILE data type → Microsoft: FILESTREAM data type External File The DBMS cannot manipulate the contents of an external file. → No durability protections. Data → No transaction protections. SYSTEM CATALOGS A DBMS stores meta-data about databases in its internal catalogs. → Tables, columns, indexes, views → Users, permissions → Internal statistics Almost every DBMS stores the database's catalog inside itself (i.e., as tables). → Wrap object abstraction around tuples. → Specialized code for "bootstrapping" catalog tables. SYSTEM CATALOGS You can query the DBMS’s internal INFORMATION_SCHEMA catalog to get info about the database. → ANSI standard set of read-only views that provide info about all the tables, views, columns, and procedures in a database DBMSs also have non-standard shortcuts to retrieve this information. SCHEMA CHANGES ADD COLUMN: → NSM: Copy tuples into new region in memory. → DSM: Just create the new column segment on disk. DROP COLUMN: → NSM #1: Copy tuples into new region of memory. → NSM #2: Mark column as "deprecated", clean up later. → DSM: Just drop the column and free memory. CHANGE COLUMN: → Check whether the conversion is allowed to happen. Depends on default values. INDEXES CREATE INDEX: → Scan the entire table and populate the index. → Have to record changes made by txns that modified the table while another txn was building the index. → When the scan completes, lock the table and resolve changes that were missed after the scan started. DROP INDEX: → Just drop the index logically from the catalog. → It only becomes "invisible" when the txn that dropped it commits. All existing txns will still have to update it.