Data_management2.pdf

2.1 RELATIONAL MODEL Database models A database model is a conceptual framework for database systems, with three parts: ● ● ● Data structures that prescribe how data is organized. Operations that manipulate data structures. Rules that govern valid data. Primary data structure Hierarchical Network Relational Object Graph Document Initial product releases Example database system Strengths Tree 1960s IMS Fast queries Efficient storage Linked list 1970s IDMS Fast queries Efficient storage Oracle Database Productivity and simplicity Transactional applications ObjectStore Integration with object-oriented programming languages Neo4j Flexible schema Evolving business requirements MongoDB Flexible schema Unstructured and semi-structured data Table Class Vertex and edge XML JSON 1980s 1990s 2000s 2010s Relational data structure The relational data structure is based on set theory. A set is an unordered collection of elements enclosed in braces. Ex: {a, b, c} and {c, b, a} are the same, since sets are not ordered. A tuple is an ordered collection of elements enclosed in parentheses. Ex: (a, b, c) and (c, b, a) are different, since tuples are ordered. The data structure organizes data in tables: ● ● A table has a name, a fixed tuple of columns, and a varying set of rows. A column has a name and a data type. A row is an unnamed tuple of values. Each value corresponds to a column and belongs to the column's data type. ● A data type is a named set of values, from which column values are drawn. Since a table is a set of rows, the rows have no inherent order. ● Synonyms: Table, File, Relation Row, Record, Tuple Column, Field, Attribute Relational operations Like the relational data structure, relational operations are based on set theory. Each operation generates a result table from one or two input tables: ● ● ● ● ● ● ● ● ● Select selects a subset of rows of a table. Project eliminates one or more columns of a table. Product lists all combinations of rows of two tables. Join combines two tables by comparing related columns. Union selects all rows of two tables. Intersect selects rows common to two tables. Difference selects rows that appear in one table but not another. Rename changes a table name. Aggregate computes functions over multiple table rows, such as sum and count. Relational rules Rules are logical constraints that ensure data is valid. Relational rules are part of the relational model and govern data in every relational database. Ex: Unique primary key. All tables have a primary key column, or group of columns, in which values may not repeat. ● Unique column names. Different columns of the same table have different names. ● No duplicate rows. No two rows of the same table have identical values in all columns. Business rules are based on business policy and specific to a particular database. Ex: All rows of the Employee table must have a valid entry in the DepartCode column. Ex: PassportNumber values may not repeat in different Employee rows. ● Relational rules are implemented as SQL constraints and enforced by the database system. Business rules are discovered during database design and, like relational rules, often implemented as SQL constraints. However, some complex business rules must be enforced by applications running on the database. 2.2 STRUCTURED QUERY LANGUAGE Structured Query Language (SQL) is a high-level computer language for storing, manipulating, and retrieving data. SQL is the standard language for relational databases, and is commonly supported in non-relational databases. SQL is pronounced either 'S-Q-L' or 'seekwəl', but the preferred pronunciation is 'S-Q-L'. SQL syntax An SQL statement is a complete command composed of one or more clauses. A clause groups SQL keywords like SELECT, FROM, and WHERE with table names like City, column names like Name, and conditions like Population > 100000. SELECT Name FROM City WHERE Population > 1000 Type Description Examples Literals Explicit values that are string, numeric, or binary. Strings must be surrounded by single quotes or double quotes. Binary values are represented with x'0' where the 0 is any hex value. Keywords Words with special meaning. SELECT, FROM, WHERE Identifiers Objects from the database like tables, columns, etc. City, Name, Population Comments Statement intended only for humans and ignored by the database when parsing an SQL statement. -- single line comment /* multi-line Comment */ 'String' "String" 123 x'0fa2' SQL sublanguages The SQL language is divided into five sublanguages: ● ● ● ● ● Data Definition Language (DDL) defines the structure of the database. Data Query Language (DQL) retrieves data from the database. Data Manipulation Language (DML) manipulates data stored in a database. Data Control Language (DCL) controls database user access. Data Transaction Language (DTL) manages database transactions. 2.3 MANAGING DATABASES CREATE DATABASE and DROP DATABASE statements A database system instance is a single executing copy of a database system. Personal computers usually run just one instance of a database system. Shared computers, such as computers used for cloud services, usually run multiple instances of a database system. Each instance usually contains multiple system and user databases. Several SQL statements help database administrators, designers, and users manage the databases on an instance. CREATE DATABASE DatabaseName creates a new database. DROP DATABASE DatabaseName deletes a database, including all tables in the database. USE and SHOW statements USE DatabaseName selects a default database for use in subsequent SQL statements. Several SHOW statements provide information about databases, tables, and columns: ● ● ● ● SHOW DATABASES lists all databases in the database system instance. SHOW TABLES lists all tables in the default database. SHOW COLUMNS FROM TableName lists all columns in the TableName table of the default database. SHOW CREATE TABLE TableName shows the CREATE TABLE statement for the TableName table of the default database. 2.4 TABLES Tables, columns, and rows All data in a relational database is structured in tables: A table has a name, a fixed sequence of columns, and a varying set of rows. A column has a name and a data type. A row is an unnamed sequence of values. Each value corresponds to a column and belongs to the column's data type. ● A cell is a single column of a single row. A table must have at least one column but any number of rows. A table without rows is called an empty table. ● ● ● Rules governing tables Tables must obey relational rules, including: 1. Exactly one value per cell. A cell may not contain multiple values. Unknown data is represented with a special NULL value. 2. No duplicate column names. Duplicate column names are allowed in different tables, but not in the same table. 3. No duplicate rows. No two rows may have identical values in all columns. 4. No row order. Rows are not ordered. The organization of rows on a storage device, such as a disk drive, never affects query results. Rules 6 and 7 follow directly from the definition of a table. A table is a set of rows. Since a set's elements may not repeat and are not ordered, the same is true of a table's rows. Rule 7 is called data independence. Data independence allows database administrators to improve query performance by changing the organization of data on storage devices, without affecting query results. CREATE TABLE and DROP TABLE statements The CREATE TABLE statement creates a new table by specifying the table name, column names, and column data types. Example data types are: INT or INTEGER — integer values VARCHAR(N) — values with 0 to N characters DATE — date values DECIMAL(M, D) — numeric values with M digits, of which D digits follow the decimal point ● ● ● ● The DROP TABLE statement deletes a table, along with all the table's rows, from a database. ALTER TABLE statement The ALTER TABLE statement adds, deletes, or modifies columns on an existing table. The ALTER TABLE statement specifies the table name followed by a clause that indicates what should be altered. The table below summarizes the three ALTER TABLE clauses. ALTER TABLE clause ADD Description Adds a column CHANGE Modifies a column DROP Deletes a column Syntax ALTER TABLE TableName ADD ColumnName DataType; ALTER TABLE TableName CHANGE CurrentColumnName NewColumnName NewDataType; ALTER TABLE TableName DROP ColumnName; 2.5 DATA TYPES Data type categories A data type is a named set of values from which column values are drawn. In relational databases, most data types fall into one of the following categories: ● ● Integer data types represent positive and negative integers. Several integer data types exist, varying by the number of bytes allocated for each value. Common integer data types include INT, implemented as 4 bytes of storage, and SMALLINT, implemented as 2 bytes. Decimal data types represent numbers with fractional values. Decimal data types vary by number of digits after the decimal point and maximum size. Common decimal data types include FLOAT and DECIMAL. ● ● ● ● ● Character data types represent textual characters. Common character data types include CHAR, a fixed string of characters, and VARCHAR, a string of variable length up to a specified maximum size. Date and time data types represent date, time, or both. Some date and time data types include a time zone or specify a time interval. Some date and time data types represent an interval rather than a point in time. Common date and time data types include DATE, TIME, DATETIME, and TIMESTAMP. Binary data types store data exactly as the data appears in memory or computer files, bit for bit. The database manages binary data as a series of zeros and ones. Common binary data types include BLOB, BINARY, VARBINARY, and IMAGE. Spatial data types store geometric information, such as lines, polygons, and map coordinates. Examples include POLYGON, POINT, and GEOMETRY. Spatial data types are relatively new and consequently vary greatly across database systems. Document data types contain textual data in a structured format such as XML or JSON. Category Data type Value Integer INT -9281344 Decimal FLOAT 3.1415 Character VARCHAR Chicago Date and time DATETIME 12/25/2020 10:35:00 Binary BLOB 1001011101 . . . Spatial POINT (2.5, 33.44) <menu> <selection> <name>Greek salad</name> Document XML <price>$13.90</price> <text>Cucumbers, tomatoes, onions, and feta cheese</text> </selection> <selection> <name>Turkey sandwich</name> <price>$9.00</price> <text>Turkey, lettuce, tomato on choice of bread</text> </selection> </menu> MySQL data types All relational databases support integer, decimal, date and time, and character data types. Most databases allow integer and decimal numbers to be signed or unsigned. A signed number may be negative. An unsigned number cannot be negative. Data types vary in storage requirements. Ex: ● ● ● Character data types use one or two bytes per character. Integer data types use a fixed number of bytes per number. Unsigned data types can store larger numbers than the signed version of the same data type. Category Example Data type TINYINT SMALLINT Integer 34 and -739448 MEDIUMINT INTEGER or INT BIGINT Decimal 123.4 and -54.29685 DECIMAL(M,D) Storage Notes 1 byte Signed range: -128 to 127 Unsigned range: 0 to 255 2 bytes Signed range: -32,768 to 32,767 Unsigned range: 0 to 65,535 3 bytes Signed range: -8,388,608 to 8,388,607 Unsigned range: 0 to 16,777,215 4 bytes Signed range: -2,147,483,648 to 2,147,483,647 Unsigned range: 0 to 4,294,967,295 8 bytes Signed range: -263 to 263 -1 Unsigned range: 0 to 264 -1 Varies depending on M and D Exact decimal number where M = number of significant digits, D = number of digits after decimal point 4 bytes Approximate decimal numbers with range: -3.4E+38 to 3.4E+38 8 bytes Approximate decimal numbers with range: -1.8E+308 to 1.8E+308 DATE 3 bytes Format: YYYY-MM-DD. Range: '1000-01-01' to '9999-12-31' TIME 3 bytes Format: hh:mm:ss 5 bytes Format: YYYY-MM-DD hh:mm:ss. Range: '1000-01-01 00:00:00' to '9999-12-31 23:59:59'. N bytes Fixed-length string of length N; 0 ≤ N ≤ 255 VARCHAR(N) Length of characters + 1 bytes Variable-length string with maximum N characters; 0 ≤ N ≤ 65,535 TEXT Length of characters + 2 bytes Variable-length string with maximum FLOAT DOUBLE Date and time '1776-07-04 13:45:22' DATETIME CHAR(N) Character 'string' 65,535 characters 2.6 SELECTING ROWS Operators An operator is a symbol that computes a value from one or more other values, called operands: Arithmetic operators compute numeric values from numeric operands. Comparison operators compute logical values TRUE or FALSE. Operands may be numeric, character, and other data types. ● Logical operators compute logical values from logical operands. A unary operator has one operand. A binary operator has two operands. Most operators are binary. The logical operator NOT is unary. The arithmetic operator - is either unary or binary. ● ● Type Operator Description Example Value + Adds two numeric values 4 + 3 7 - (unary) Reverses the sign of one numeric value -(-2) 2 - (binary) Subtracts one numeric value from another 11 - 5 6 * Multiplies two numeric values 3 * 5 15 / Divides one numeric value by another 4 / 2 2 % (modulo) Divides one numeric value by another and returns the integer remainder 5 % 2 1 ^ Raises one numeric value to the power of another 5^2 25 Arithmetic Comparison Logical = Compares two values for equality 1 = 2 FALSE != Compares two values for inequality 1 != 2 TRUE < Compares two values with < 2 < 2 FALSE <= Compares two values with ≤ 2 <= 2 TRUE > Compares two values with > '2019-08-13' > '2021-08-13' FALSE >= Compares two values with ≥ 'apple' >= 'banana' FALSE AND Returns TRUE only when both values are TRUE TRUE AND FALSE FALSE OR Returns FALSE only when both values are FALSE TRUE OR FALSE TRUE NOT Reverses a logical value NOT FALSE TRUE Expressions An expression is a string of operators, operands, and parentheses that evaluates to a single value. Operands may be column names or fixed values. The value of an expression may be any data type. Ex: Salary > 34000 AND Department = 'Marketing' is an expression with a logical value. A simple expression may consist of a single column name or a fixed value. Ex: The column EmployeeName and the fixed value 'Maria' are expressions with a character data type. When an expression is evaluated, column names are replaced with column values for a specific row. Consequently, an expression containing column names may have different values for different rows. The order of operator evaluation may affect the value of an expression. Operators in an expression are evaluated in the order of operator precedence, shown in the table below. Operators of the same precedence are evaluated from left to right. Regardless of operator precedence, expressions enclosed in parentheses are evaluated before any operators outside the parentheses are applied. SELECT statement The SELECT statement selects rows from a table. The statement has a SELECT clause and a FROM clause. The FROM clause specifies the table from which rows are selected. The SELECT clause specifies one or more expressions, separated by commas, that determine what values are returned for each row. The SELECT statement returns a set of rows, called the result table. WHERE clause An expression may return a value of any data type. A condition is an expression that evaluates to a logical value. A SELECT statement has an optional WHERE clause that specifies a condition for selecting rows. A row is selected when the condition is TRUE for the row values. A row is omitted when the condition is either FALSE or NULL. The WHERE clause follows the FROM clause. When a SELECT statement has no WHERE clause, all rows are selected. 2.7 NULL VALUES NULL NULL is a special value that represents either unknown or inapplicable data. NULL is not the same as zero for numeric data types or blanks for character data types. Ex: A zero bonus indicates an employee can, but has not, earned a bonus. A zero bonus is known and applicable, and should not be represented as NULL. NOT NULL constraint The NOT NULL constraint prevents a column from having a NULL value. Statements that insert NULL, or update a value to NULL, are automatically rejected. NOT NULL follows the column name and data type in a CREATE TABLE statement. 2.8 INSERTING, UPDATING, AND DELETING ROWS INSERT statement The INSERT statement adds rows to a table. The INSERT statement has two clauses: The INSERT INTO clause names the table and columns where data is to be added. The keyword INTO is optional. ● The VALUES clause specifies the column values to be added. The VALUES clause may list any number of rows in parentheses to insert multiple rows. ● INSERT [INTO] TableName (Column1, Column2, ...) VALUES (Value1, Value2, ...); DEFAULT values The optional DEFAULT keyword and default value follow the column name and data type in a CREATE TABLE statement. The column is assigned the default value, rather than NULL, when omitted from an INSERT statement. UPDATE statement The UPDATE statement modifies existing rows in a table. The UPDATE statement uses the SET clause to specify the new column values. An optional WHERE clause specifies which rows are updated. Omitting the WHERE clause results in all rows being updated. DELETE statement The DELETE statement deletes existing rows in a table. The FROM keyword is followed by the table name whose rows are to be deleted. An optional WHERE clause specifies which rows should be deleted. Omitting the WHERE clause results in all rows in the table being deleted. The TRUNCATE statement deletes all rows from a table. TRUNCATE is nearly identical to a DELETE statement with no WHERE clause except for minor differences that depend on the database system. The MERGE statement selects data from one table, called the source, and inserts the data to another table, called the target. 2.9 PRIMARY KEYS Primary keys A primary key is a column, or group of columns, used to identify a row. The primary key is usually the table's first column and appears on the left of table diagrams, but the position is not significant to the database. Often, primary key values are used in the WHERE clause to select a specific row. The primary key is specified in SQL when the table is created. If a table contains several unique columns, any unique column, or group of columns, may be specified. Primary keys must be: ● Unique. This rule ensures that each value identifies at most one row. ● Not NULL. This rule ensures that each value identifies at least one row. Together, the two rules ensure that each primary key value identifies exactly one row. Composite primary keys Sometimes multiple columns are necessary to identify a row. A simple primary key consists of a single column. A composite primary key consists of multiple columns. Composite primary keys are denoted with parentheses. Ex: (ColumnA, ColumnB). Composite primary keys must be: ● ● ● Unique. Values of primary key columns, when grouped together, must be unique. No group of values may repeat in multiple rows. Not NULL. No column of a composite primary key may contain a NULL value. Minimal. All primary key columns are necessary for uniqueness. When any column is removed, the resulting simple or composite column is no longer unique. PRIMARY KEY constraint The PRIMARY KEY constraint in a CREATE TABLE statement names the table's primary key. The PRIMARY KEY constraint ensures that a column or group of columns is always unique and non-null. Auto-increment columns An auto-increment column is a numeric column that is assigned an automatically incrementing value when a new row is inserted. The AUTO_INCREMENT keyword defines an auto-increment column. AUTO_INCREMENT follows the column's data type in a CREATE TABLE statement. Database users occasionally make the following errors when inserting primary keys: ● Inserting values for auto-increment primary keys. ● Omitting values for primary keys that are not auto-increment columns. MySQL allows insertion of a specific value to an auto-increment column. However, overriding auto-increment for a primary key is usually a mistake. 2.10 FOREIGN KEYS Foreign keys A foreign key is a column, or group of columns, that refer to a primary key. The data types of the foreign and primary keys must be the same, but the names may be different. Foreign keys do not obey the same rules as primary keys. Foreign key values may be repeated and may be NULL. Foreign keys obey a relational rule called referential integrity. Referential integrity requires foreign key values must either be NULL or match some value of the referenced primary key. Special cases Multiple foreign keys may refer to the same primary key. A foreign key may refer to a primary key in the same table. FOREIGN KEY constraint A foreign key constraint is added to a CREATE TABLE statement with the FOREIGN KEY and REFERENCES keywords. When a foreign key constraint is specified, the database rejects insert, update, and delete statements that violate referential integrity. 2.11 REFERENTIAL INTEGRITY Referential integrity rule A fully NULL foreign key is a simple or composite foreign key in which all columns are NULL. Referential integrity is a relational rule that requires foreign key values are either fully NULL or match some primary key value. In a relational database, foreign keys must obey referential integrity at all times. Occasionally, data entry errors or incomplete data result in referential integrity violations. Violations must be corrected before data is stored in the database. Referential integrity violations Referential integrity can be violated in four ways: 5. A primary key is updated. 6. A foreign key is updated. 7. A row containing a primary key is deleted. 8. A row containing a foreign key is inserted. Referential integrity actions An insert, update, or delete that violates referential integrity can be corrected manually. However, manual corrections are time-consuming and error-prone. Instead, databases automatically correct referential integrity violations with any of four actions, specified as SQL constraints: ● RESTRICT rejects an insert, update, or delete that violates referential integrity. ● SET NULL sets invalid foreign keys to NULL. ● SET DEFAULT sets invalid foreign keys to the foreign key default value. ● CASCADE propagates primary key changes to foreign keys. CASCADE behaves differently for primary key updates and deletes. If a primary key is deleted, rows containing matching foreign keys are deleted. If a primary key is updated, matching foreign keys are updated to the same value. RESTRICT, SET NULL, and SET DEFAULT apply to primary key update and delete, and foreign key insert and update. CASCADE applies to primary key update and delete only. ON UPDATE and ON DELETE clauses For foreign key inserts and updates, MySQL supports only RESTRICT. Foreign key inserts and updates that violate referential integrity are automatically rejected. For primary key updates and deletes, MySQL supports all four actions. Actions are specified in the optional ON UPDATE and ON DELETE clauses of the FOREIGN KEY constraint. ON UPDATE and ON DELETE are followed by either RESTRICT, SET NULL, SET DEFAULT, or CASCADE. ON UPDATE and ON DELETE determine what happens to the foreign key when the referenced primary key is updated or deleted. When several foreign keys refer to the same primary key, different actions can be specified for each foreign key. MySQL has several limitations on primary key updates and deletes: ● ● ● RESTRICT is applied when the ON UPDATE or ON DELETE clause is omitted. SET NULL cannot be used when a foreign key is not allowed NULL values. SET DEFAULT is not supported in some MySQL configurations. 2.12 CONSTRAINTS Column and table constraints A constraint is a rule that governs allowable values in a database. Constraints are based on relational and business rules, and implemented with special keywords in a CREATE TABLE statement. The database automatically rejects insert, update, and delete statements that violate a constraint. The following constraints are described elsewhere in this material: ● ● ● ● NOT NULL DEFAULT PRIMARY KEY FOREIGN KEY A column constraint appears after the column name and data type in a CREATE TABLE statement. Column constraints govern values in a single column. Ex: NOT NULL is a column constraint. A table constraint appears in a separate clause of a CREATE TABLE statement and governs values in one or more columns. Ex: FOREIGN KEY is a table constraint. Some constraint types can be defined as either column or table constraints. UNIQUE constraint The UNIQUE constraint ensures that values in a column, or group of columns, are unique. When applied to a single column, UNIQUE may appear either in the column declaration or a separate clause. When applied to a group of columns, UNIQUE is a table constraint and must appear in a separate clause. CHECK constraint The CHECK constraint specifies an expression on one or more columns of a table. The constraint is violated when the expression is FALSE and satisfied when the expression is either TRUE or NULL. Adding and dropping constraints Constraints are added and dropped with the ALTER TABLE TableName followed by an ADD, DROP, or CHANGE clause. Unnamed constraints such as NOT NULL and DEFAULT are added or dropped with a CHANGE clause: ● CHANGE CurrentColumnName NewColumnName NewDataType [ConstraintDeclaration] Named constraints are added with an ADD clause: ● ● ADD [CONSTRAINT ConstraintName] PRIMARY KEY (Column1, Column2 ...) ADD [CONSTRAINT ConstraintName] FOREIGN KEY (Column1, Column2 ...) REFERENCES TableName (Column) ● ADD [CONSTRAINT ConstraintName] UNIQUE (Column1, Column2 ...) ● ADD [CONSTRAINT ConstraintName] CHECK (expression) Adding a constraint fails when the table contains data that violates the constraint. Named constraints are dropped with a DROP clause: ● DROP PRIMARY KEY ● DROP FOREIGN KEY ConstraintName ● DROP INDEX ConstraintName (drops UNIQUE constraints) ● DROP CHECK ConstraintName ● DROP CONSTRAINT ConstraintName (drops any named constraint) Dropping a table fails when a foreign key constraint refers to the table's primary key. Before dropping the table, either the foreign key constraint or the foreign key table must be dropped.

Data_management2.pdf

Document Details

Tags

Related

Full Transcript