Database Management System Module 3 PDF
Document Details
Uploaded by ChasteMimosa
Department of ISE, RNSIT
Tags
Summary
This document covers the basics of functional dependencies and normalization for relational databases. The content provides informal guidelines, explains clear semantics, and discusses redundant information within tuples and update anomalies.
Full Transcript
Database Management System MODULE - 3 Basics of Functional Dependencies and Normalization for Relational Databases. 1. Informal Design Guidelines for Relation Schemas. Four informal guidelines that may be used as measures to determine t...
Database Management System MODULE - 3 Basics of Functional Dependencies and Normalization for Relational Databases. 1. Informal Design Guidelines for Relation Schemas. Four informal guidelines that may be used as measures to determine the quality of relation schema design: Making sure that the semantics of the attributes is clear in the schema Reducing the redundant information in tuples Reducing the NULL values in tuples Disallowing the possibility of generating spurious tuples 1.1 Imparting Clear Semantics to Attributes in Relations. The group of attributes belonging to one relation have certain real-world meaning and a proper interpretation associated with them. The semantics of a relation refers to its meaning resulting from the interpretation of attribute values in a tuple. If the conceptual design done carefully and the mapping procedure is followed systematically, the relational schema design should have a clear meaning. The meaning of the EMPLOYEE relation schema is quite simple: Each tuple represents an employee, with values for the employee‘s name (Ename), Social Security number (Ssn), birth date (Bdate), and address (Address), and the number of the department that the employee works for (Dnumber). Guideline 1 Design a relation schema so that it is easy to explain its meaning. Do not combine attributes from multiple entity types and relationship types into a single relation. If a relation schema corresponds to one entity type or one relationship type, it is straightforward to interpret and to explain its meaning. Otherwise, if the relation Dept. of ISE, RNSIT Page 1 Database Management System corresponds to a mixture of multiple entities and relationships, semantic ambiguities will result and the relation cannot be easily explained. Examples of Violating Guideline 1. The following relation schema EMP_DEPT and EMP_PROJ have clear semantics but they violate Guideline 1 by mixing attributes from distinct real-world entities: EMP_DEPT mixes attributes of employees and departments, and EMP_PROJ mixes attributes of employees and projects and the WORKS_ON relationship. Hence, they fare poorly against the above measure of design quality. 1.2 Redundant Information in Tuples and Update Anomalies. One goal of schema design is to minimize the storage space used by the base relations. Grouping attributes into relation schemas has a significant effect on storage space. For example, The space used by the two base relations EMPLOYEE and DEPARTMENT is less compared to EMP_DEPT. Dept. of ISE, RNSIT Page 2 Database Management System 1. In EMP_DEPT, the attribute values pertaining to a particular department (Dnumber, Dname, Dmgr_ssn) are repeated for every employee who works for that department. In contrast, each department‘s information appears only once in the DEPARTMENT relation. 2. EMP_DEPT base relation is the result of applying the NATURAL JOIN operation to EMPLOYEE and DEPARTMENT. Storing natural joins of base relations leads to an additional problem referred to as update anomalies. Update anomalies can be classified into insertion anomalies, deletion anomalies, and modification anomalies. Insertion Anomalies. Insertion anomalies can be differentiated into two types, based on the EMP_DEPT relation: 1. To insert a new employee tuple into EMP_DEPT, we must include either the attribute values for the department that the employee works for, or NULLs (if the employee does not work for a department as yet). 2. It is difficult to insert a new department that has no employees as yet in the EMP_DEPT relation. The only way to do this is to place NULL values in the attributes for employee. This violates the entity integrity for EMP_DEPT because Ssn is its primary key. Deletion Anomalies. The problem of deletion anomalies is related to the second insertion anomaly situation. 1. If we delete from EMP_DEPT an employee tuple that happens to represent the last employee working for a particular department, the information concerning that department is lost from the database. 2. This problem does not occur in DEPARTMENT relation since tuples are stored separately. Dept. of ISE, RNSIT Page 3 Database Management System Modification Anomalies. 1. In EMP_DEPT, if we change the value of one of the attributes of a particular department say, the manager of department 5 we must update the tuples of all employees who work in that department; otherwise, the database will become inconsistent. 2. If we fail to update some tuples, the same department will be shown to have two different values for manager in different employee tuples, which would be wrong. Guideline 2 Design the base relation schemas so that no insertion, deletion, or modification anomalies are present in the relations. If any anomalies are present, note them clearly and make sure that the programs that update the database will operate correctly. 1.3 NULL Values in Tuples If many of the attributes do not apply to all tuples in the relation, we end up with many NULLs in those tuples. This can waste space at the storage level and may also lead to problems with understanding the meaning of the attributes. SELECT and JOIN operations involve comparisons; if NULL values are present, the results may become unpredictable. NULLs can have multiple interpretations, such as the following: 1. The attribute does not apply to this tuple. For example, Visa_status may not apply to U.S. students. 2. The attribute value for this tuple is unknown. For example, the Date_of_birth may be unknown for an employee. 3. The value is known but absent; For example, the Home_Phone_Number for an employee may exist, but may not be available and recorded yet. Guideline 3 Avoid placing attributes in a base relation whose values may frequently be NULL. If NULLs are unavoidable, make sure that they apply in exceptional cases only and do not apply to a majority of tuples in the relation. 1.4 Generation of Spurious Tuples Consider the two relation schemas EMP_LOCS and EMP_PROJ1. Suppose if we perform NATURAL JOIN operation on EMP_PROJ1 and EMP_LOCS, the result produces many more tuples than the original set of tuples. These additional tuples are called spurious tuples because they represent spurious information that is not valid. The spurious tuples are marked by asterisks (*) in Figure 15.6. Dept. of ISE, RNSIT Page 4 Database Management System Decomposing EMP_PROJ into EMP_LOCS and EMP_PROJ1 is undesirable because when we JOIN them back using NATURAL JOIN, we do not get the correct original information. This is because in this case Plocation is the attribute that relates EMP_LOCS and EMP_PROJ1, and Plocation is neither a primary key nor a foreign key in either EMP_LOCS or EMP_PROJ1. Guideline 4 Design relation schemas so that they can be joined with equality conditions on attributes that are appropriately related (primary key, foreign key) pairs in a way that guarantees that no spurious tuples are generated. Avoid relations that contain matching attributes that are not (foreign key, primary key) combinations because joining on such attributes may produce spurious tuples. 2 Functional Dependencies Definition :A functional dependency, denoted by X → Y, between two sets of attributes X and Y that are subsets of R specifies a constraint on the tuples in a relation state r of R. The constraint is that, for any two tuples t1 and t2 in r that have t1[X] = t2[X], they must also have t1[Y] = t2[Y]. A functional dependency, denoted by X → Y means that the values of the Y are determined by the values of X. A functional dependency is a property of the semantics or meaning of the attributes. The database designers will use their understanding of the semantics of the attributes of R to specify the functional dependencies in a relation. Consider the relation schema EMP_PROJ from the semantics of the attributes and the relation, the following functional dependencies should hold: Dept. of ISE, RNSIT Page 5 Database Management System a. Ssn→Ename b. Pnumber →{Pname, Plocation} c. {Ssn, Pnumber}→Hours These functional dependencies specifies that (a) the value of an employee‘s Social Security number (Ssn) uniquely determines the employee name (Ename), (b) the value of a project‘s number (Pnumber) uniquely determines the project name (Pname) and location (Plocation), and (c) Combination of Ssn and Pnumber values uniquely determines the number of hours the employee currently works on the project per week (Hours). Types of functional dependency : 1. A functional dependency X → Y is a full functional dependency if removal of any attribute A from X means that the dependency does not hold any more; Ex: {Ssn, Pnumber} → Hours is a full dependency (neither Ssn → Hours nor Pnumber→Hours holds). 2. A functional dependency X → Y is a partial functional dependency if removal of any attribute A from X and the dependency still holds; Ex: {Ssn, Pnumber}→Ename is partial because Ssn→Ename holds. 3. A functional dependency X→Y in a relation schema R is a transitive dependency if there exists a set of attributes Z in R such that X→Z and Z→Y hold. Ex: The dependency Ssn→Dmgr_ssn is transitive in EMP_DEPT, because of the dependencies Ssn → Dnumber and Dnumber → Dmgr_ssn. 4. Trivial Functional Dependency. If a functional dependency (FD) X → Y holds, where Y is a subset of X, then it is called a trivial FD. Non-trivial − If an FD X → Y holds, where Y is not a subset of X, then it is called a non-trivial FD. 3 Normal Forms Based on Primary Keys 3.1 Normalization of Relations The normalization process, as first proposed by Codd (1972). Codd proposed three normal forms, which he called first, second, third normal form and Boyce-Codd normal form (BCNF). All these normal forms are based on functional dependencies among the attributes of a relation. Later, a fourth normal form (4NF) and a fifth normal form (5NF) were proposed, based on the concepts of multivalued dependencies and join dependencies, respectively; Normalization of data can be considered a process of analyzing the given relation schemas based on their FDs and primary keys to achieve the desirable properties of (1) minimizing redundancy and (2) minimizing the insertion, deletion, and update anomalies. It can be considered as a ―filtering‖ or ―purification‖ process to make the design have successively better quality. Definition. The normal form of a relation refers to the highest normal form condition that it meets, and hence indicates the degree to which it has been normalized. Dept. of ISE, RNSIT Page 6 Database Management System 3.2 Definitions of Keys and Attributes Participating in Keys. Definition. A superkey of a relation schema R is a set of attributes S ⊆ R with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S]. A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more. The difference between a key and a superkey is that a key has to be minimal; that is, if we have a key K = {A1, A2,..., Ak} of R, then K – {Ai} is not a key of R. Ex: {Ssn} is a key for EMPLOYEE, whereas {Ssn}, {Ssn, Ename}, {Ssn, Ename, Bdate}, and any set of attributes that includes Ssn are all superkeys. If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys. {Ssn} is the only candidate key for EMPLOYEE, so it is also the primary key. Definition. An attribute of relation schema R is called a prime attribute of R if it is a member of some candidate key of R. An attribute is called nonprime if it is not a prime attribute—that is, if it is not a member of any candidate key. Ex: Ssn and Pnumber are prime attributes of WORKS_ON, whereas other attributes of WORKS_ON are nonprime. 3.3 First Normal Form First normal form (1NF) states that the domain of an attribute must include only atomic (simple, indivisible) values and that the value of any attribute in a tuple must be a single value from the domain of that attribute. Consider the following DEPARTMENT relation, It is not in 1NF. Because the domain of Dlocations contains sets of values and hence is nonatomic. There are three main techniques to achieve first normal form for such a relation: 1. Remove the attribute Dlocations that violates 1NF and place it in a separate relation DEPT_LOCATIONS along with the primary key Dnumber of DEPARTMENT. The primary key of this relation is the combination {Dnumber, Dlocation}. Dept. of ISE, RNSIT Page 7 Database Management System 2. Expand the key so that there will be a separate tuple in the original DEPARTMENT relation for each location of a DEPARTMENT. In this case, the primary key becomes the combination {Dnumber, Dlocation}. This solution has the disadvantage of introducing redundancy in the relation. 3. If a maximum number of values is known for the attribute for example, if it is known that at most three locations can exist for a department—replace the Dlocations attribute by three atomic attributes: Dlocation1, Dlocation2, and Dlocation3. This solution has the disadvantage of introducing NULL values if most departments have fewer than three locations. First normal form also disallows multivalued attributes that are themselves composite. These are called nested relations because each tuple can have a relation within it. Figure 15.10 2 the EMP_PROJ relation represents an employee entity, and a relation PROJS(Pnumber, Hours) within each tuple. The schema of this EMP_PROJ relation can be represented as follows: EMP_PROJ(Ssn, Ename, {PROJS(Pnumber, Hours)}). The set braces { } identify the attribute PROJS as multivalued, and we list the component attributes that form PROJS between parentheses ( ). To normalize this into 1NF, we remove the nested relation attributes into a new relation and propagate the primary key into it; Decomposition and primary key propagation yield the schemas EMP_PROJ1 and EMP_PROJ2, as shown in Figure 15.10(c). 2 Dept. of ISE, RNSIT Page 8 Database Management System 2 2 Normalizing nested relations into 1NF. Figure 15.10 (a) Schema of the EMP_PROJ relation with a nested relation attribute PROJS. (b) Sample extension of the EMP_PROJ relation showing nested relations within each tuple. c) Decomposing EMP_PROJ into EMP_PROJ1 and EMP_PROJ2 by propagating the primary key. 3.4 Second normal form (2NF) Definition. A relation schema R is in 2NF if every nonprime attribute A in R is fully functionally dependent on the primary key of R. Second normal form (2NF) is based on the concept of full functional dependency. A functional dependency X → Y is a full functional dependency if removal of any attribute A from X means that the dependency does not hold any more. Example1: {Ssn, Pnumber} → Hours is a full dependency (neither Ssn → Hours nor Pnumber→Hours holds). A functional dependency X → Y is a partial functional dependency if removal of any attribute A from X and the dependency still holds; Example2: The dependency {Ssn, Pnumber}→Ename is partial because Ssn→Ename holds. The EMP_PROJ relation is in 1NF but is not in 2NF. The nonprime attribute Ename violates 2NF because of FD2 , Pname and Plocation violates 2NF because of FD3. The functional dependencies FD2 and FD3 make Ename, Pname, and Plocation partially dependent on the primary key {Ssn, Pnumber} of EMP_PROJ, thus violating the 2NF test. Dept. of ISE, RNSIT Page 9 Database Management System If a relation schema is not in 2NF, it can be second normalized by decomposing EMP_PROJ into the three relation schemas EP1, EP2, and EP3 shown in Figure 15.11(a), 3 each of which is in 2NF. Figure 15.11- 3 Normalizing into 2NF and 3NF. (a) Normalizing EMP_PROJ into 2NF relations. Example 2: Consider the relation schema LOTS shown in Figure 15.12(a). 4 There are two candidate keys: Property_id# and {County_name, Lot#}; lot numbers are unique only within each county, but Property_id# numbers are unique across counties for the entire state. There are two candidate keys Property_id# and {County_name, Lot#}. We choose Property_id# as the primary key, so it is underlined in Figure 15.12(a). 4 The LOTS relation schema violates the general definition of 2NF because Tax_rate is partially dependent on the candidate key {County_name, Lot#}, due to FD3. To normalize LOTS into 2NF, we decompose it into the two relations LOTS1 and LOTS2, shown in Figure 15.12(b). 4 We construct LOTS1 by removing the attribute Tax_rate that violates 2NF from LOTS and placing it with County_name into another relation LOTS2. Both LOTS1 and LOTS2 are in 2NF. Dept. of ISE, RNSIT Page 10 Database Management System Figure 15.12 4 (a) The LOTS relation with its functional dependencies FD1 through FD4. (b) Decomposing into the 2NF relations LOTS1 and LOTS2. 3.5 Third Normal Form (3NF) Definition: A relation schema R is in 3NF if it satisfies 2NF and no nonprime attribute of R is transitively dependent on the primary key. Third normal form (3NF) is based on the concept of transitive dependency. A functional dependency X→Y in a relation schema R is a transitive dependency if there exists a set of attributes Z in R such that X→Z and Z→Y hold. Ex: The dependency Ssn→Dmgr_ssn is transitive in EMP_DEPT, because of the dependencies Ssn → Dnumber and Dnumber → Dmgr_ssn. Normalizing EMP_DEPT into 3NF relations. The relation schema EMP_DEPT is not in 3NF because of the transitive dependency of Dmgr_ssn and Dname on Ssn via Dnumber. We can normalize EMP_DEPT by decomposing it into the two 3NF relation schemas ED1 and ED2 shown in the above figure. Dept. of ISE, RNSIT Page 11 Database Management System Example 2: FD4 in LOTS1 violates 3NF because Area is not a superkey and Price is not a prime attribute in LOTS1. To normalize LOTS1 into 3NF, we decompose it into the relation schemas LOTS1A and LOTS1B shown in Figure 15.12(c).We 4 construct LOTS1A by removing the attribute Price that violates 3NF from LOTS1 and placing it with Area (the lefthand side of FD4 that causes the transitive dependency) into another relation LOTS1B. Both LOTS1A and LOTS1B are in 3NF. 4 Figure 15.12(c). Decomposing LOTS1 into the 3NF relations LOTS1A and LOTS1B. 4. Boyce-Codd Normal Form. Definition. A relation schema R is in BCNF if whenever a nontrivial functional dependency X→A holds in R, then X is a superkey of R. The BCNF is based on the concept non trivial dependency. If an FD X → Y holds, where Y is not a subset of X, then it is called a non-trivial FD. FD5 violates BCNF in LOTS1A because AREA is not a superkey of LOTS1A. decompose LOTS1A into two BCNF relations LOTS1AX and LOTS1AY, shown in Figure 15.13(a). 5 Figure 15.13 5 Boyce-Codd normal form. (a) BCNF normalization of LOTS1A with the functional dependency FD2 being lost in the decomposition. Dept. of ISE, RNSIT Page 12 Database Management System Consider Figure 15.14, 6 which shows a relation TEACH with the following dependencies: FD1: {Student, Course} → Instructor FD2: Instructor → Course Figure 15.14 6 A relation TEACH that is in 3NF but not BCNF. {Student, Course} is a candidate key for this relation and that the dependencies shown follow the pattern in Figure 15.13(b), with Student as A, Course as B, and Instructor as C. Hence this relation is in 3NF but not BCNF. The relation can be decomposed into one of the three following possible pairs: 1. {Student, Instructor} and {Student, Course}. 2. {Course, Instructor} and {Course, Student}. 3. {Instructor, Course} and {Instructor, Student}. All three decompositions lose the functional dependency FD1. The desirable decomposition is (Instructor, Course) and (Instructor, Student), because it is nonadditive join decomposition The relation schemas R1 and R2 form a nonadditive join decomposition of R with respect to a set F of functional dependencies if and only if (R1 ∩ R2) → (R1 – R2) or, (R1 ∩ R2) → (R2 – R1). 5. Multivalued Dependency and Fourth Normal Form Definition: A multivalued dependency X→→Y specified on relation schema R, where X and Y are both subsets of R, specifies the following constraint on any relation state r of R: If two tuples t1 and t2 exist in r such that t1[X] = t2[X]. Then two tuples t3 and t4 should also exist in r with the following properties, where we use Z to denote (R – (X ∪ Y)) t3[X] = t4[X] = t1[X] = t2[X]. t3[Y] = t1[Y] and t4[Y] = t2[Y]. t3[Z] = t2[Z] and t4[Z] = t1[Z]. Dept. of ISE, RNSIT Page 13 Database Management System An MVD X →→ Y in R is called a trivial MVD if (a) Y is a subset of X, or (b) X ∪ Y = R. An MVD that satisfies neither (a) nor (b) is called a nontrivial MVD. For example, the relation EMP_PROJECTS in Figure 15.15(b) 6 has the trivial MVD Ename →→ Pname. 15.15(b) 6 Definition. A relation schema R is in 4NF with respect to a set of dependencies F (that includes functional dependencies and multivalued dependencies) if, for every nontrivial multivalued dependency X →→ Y in F, X is a superkey for R. An all-key relation is always in BCNF since it has no FDs. An all-key relation such as the EMP relation in Figure 15.15(a), 7 which has no FDs but has the MVD Ename→→ Pname | Dname, is not in 4NF. A relation that is not in 4NF due to a nontrivial MVD must be decomposed To convert it into a set of relations in 4NF. The decomposition removes the redundancy caused by the MVD. Consider the EMP relation in Figure 15.15(a). 7 EMP is not in 4NF because of the nontrivial MVDs Ename→→ Pname and Ename →→ Dname, and Ename is not a superkey of EMP. We decompose EMP into EMP_PROJECTS and EMP_DEPENDENTS, shown in Figure 15.15(b). 7 Both EMP_PROJECTS and EMP_DEPENDENTS are in 4NF, because the MVDs Ename →→ Pname in EMP_PROJECTS and Ename →→ Dname in EMP_DEPENDENTS are trivial MVDs. Figure 15.15 7 (a) The EMP relation with two MVDs: Ename →→ Pname and Ename →→ Dname. (b) Decomposing the EMP relation into two 4NF relations EMP_PROJECTS and EMP_DEPENDENTS. Dept. of ISE, RNSIT Page 14 Database Management System Whenever we decompose a relation schema R into R1 = (X ∪ Y) and R2 = (R – Y) based on an MVD X →→ Y that holds in R, the decomposition has the nonadditive join property. The following algorithm shows Relational Decomposition into 4NF Relations with Nonadditive Join Property. Input: A universal relation R and a set of functional and multivalued dependencies F. 1. Set D:= { R }; 2. While there is a relation schema Q in D that is not in 4NF, do { choose a relation schema Q in D that is not in 4NF; find a nontrivial MVD X→→Y in Q that violates 4NF; replace Q in D by two relation schemas (Q – Y) and (X U Y); }; 6. Join Dependencies and Fifth Normal Form A relation schema R when divided in to R1 and R2 has the lossless property and if the natural join is applied (R1 * R2) we will get the original relation R. In some cases there may be no lossless join decomposition of R if the number of decomposition is equal to two. But if the same relation is decomposed in to more than two relations we have a lossless decomposition. This dependency depends on the number of decomposition and hence referred as join dependency Definition. A join dependency (JD), denoted by JD(R1, R2,..., Rn), specified on relation schema R, specifies a constraint on the states r of R. The constraint states that every legal state r of R should have a nonadditive join decomposition into R1, R2,..., Rn. Hence, for every such r we have ∗(πR1 (r), πR2 (r),..., πRn (r)) = r. A join dependency JD(R1, R2,..., Rn), specified on relation schema R, is a trivial JD if one of the relation schemas Ri in JD(R1, R2,..., Rn) is equal to R. Such a dependency is called trivial because it has the nonadditive join property for any relation state r of R and thus does not specify any constraint on R. Definition. A relation schema R is in fifth normal form (5NF) (or project-join normal form (PJNF)) with respect to a set F of functional, multivalued, and join dependencies if, for every nontrivial join dependency JD(R1, R2,..., Rn) in F+, every Ri is a superkey of R. Figure 15.157 (c)The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3). Dept. of ISE, RNSIT Page 15 Database Management System Figure 15.157 (d) Decomposing the relation SUPPLY into the 5NF relations R1, R2, R3. For example of a JD, consider once again the SUPPLY all-key relation in Figure 15.15(c).7 Suppose that the following additional constraint always holds: Whenever a supplier s supplies part p, and a project j uses part p, and the supplier s supplies at least one part to project j, then supplier s will also be supplying part p to project j. This constraint can be restated in other ways and specifies a join dependency JD(R1, R2, R3) among the three projections R1(Sname, Part_name), R2(Sname, Proj_name), and R3(Part_name, Proj_name) of SUPPLY. Figure 15.15(d) 7 shows how the SUPPLY relation with the join dependency is decomposed into three relations R1, R2, and R3 that are each in 5NF. Notice that applying a natural join to any two of these relations produces spurious tuples, but applying a natural join to all three together does not. Dept. of ISE, RNSIT Page 16 Database Management System CHAPTER-2 Basic SQL 1.1 SQL Data Definition and Data Types 1 Schema in SQL Schema elements include tables, constraints, views, domains, and other constructs (such as authorization grants) that describe the schema. A schema is created via the CREATE SCHEMA statement, which can include all the schema elements‘ definitions. For example, the following statement creates a schema called COMPANY,owned by the user with authorization identifier ‘Jsmith’. CREATE SCHEMA COMPANY AUTHORIZATION ‗Jsmith‘; 2 The CREATE TABLE Command in SQL The CREATE TABLE command is used to specify a new relation by giving it a name and specifying its attributes and initial constraints. The attributes are specified first, and each attribute is given a name, a data type to specify its domain of values, and any attribute constraints, such as NOT NULL. The key, entity integrity, and referential integrity constraints can be specified within the CREATE TABLE statement after the attributes are declared, or they can be added later using the ALTER TABLE command. CREATE TABLE COMPANY.EMPLOYEE rather than CREATE TABLE EMPLOYEE 3 Attribute Data Types and Domains in SQL 1. Numeric : The data types include integer numbers of various sizes INTEGER or INT, and SMALLINT Floating-point numbers of various precision FLOAT or REAL. Formatted numbers can be declared by using DECIMAL(i,j)—or DEC(i,j) or NUMERIC(i,j)—where i, the precision, is the total number of decimal digits Dept. of ISE, RNSIT Page 17 Database Management System and j, the scale, is the number of digits after the decimal point. 2. Character-string The data types with fixed length characters—CHAR(n) or CHARACTER(n), where n is the number of characters The data types with varying length characters VARCHAR(n) or CHAR VARYING(n) or CHARACTER VARYING(n), where n is the maximum number of characters. For example, if the value ‗Smith‘ is for an attribute of type CHAR(10), it is padded with five blank characters to become ‗Smith ‘ if needed. Padded blanks are generally ignored when strings are compared. Another variable-length string data type called CHARACTER LARGE OBJECT or CLOB is also available to specify columns that have large text values, such as documents. The CLOB maximum length can be specified in kilobytes (K), megabytes (M), or gigabytes (G). For example, CLOB(20M) specifies a maximum length of 20 megabytes 3. Bit-string The data types of fixed length n is BIT(n). The varying length is BIT VARYING(n), where n is the maximum number of bits. Literal bit strings are placed between single quotes but preceded by a B to distinguish them from character strings; for example, B‗10101‘. Another variable-length bitstring data type called BINARY LARGE OBJECT or BLOB is also available to specify columns that have large binary values, such as images 2. Boolean The data type has the traditional values of TRUE or FALSE. In SQL, because of the presence of NULL values, a three-valued logic is used, so a third possible value for a Boolean data type is UNKNOWN. 3. DATE The data type has ten positions, and its components are YEAR, MONTH, and DAY in the form YYYY-MM-DD. The TIME data type has at least eight positions, with the components HOUR, MINUTE, and SECOND in the form HH:MM:SS. for example, DATE ‗2008-09- 27‘ or TIME ‗09:12:47‘ Dept. of ISE, RNSIT Page 18 Database Management System 4. Timestamp The data type (TIMESTAMP) includes the DATE and TIME fields, Example, TIMESTAMP ‗2008-09-27 09:12:47.648302‘. 3.2 Specifying Constraints in SQL 1 Specifying Attribute Constraints and Attribute Defaults SQL allows NULLs as attribute values, a constraint NOT NULL may be specified if NULL is not permitted for a particular attribute. It is also possible to define a default value for an attribute by appending the clause DEFAULT to an attribute definition. Example: Mgr_ssn CHAR (9) NOT NULL DEFAULT ‘888665555’, The default value is included in any new tuple if an explicit value is not provided for that attribute. If no default clause is specified, the default default value is NULL for attributes that do not have the NOT NULL constraint. Another type of constraint that can restrict attribute or domain values is by using the CHECK clause following an attribute or domain definition. For example, suppose that department numbers are restricted to integer numbers between 1 and 20; then, we can change the attribute declaration of Dnumber in the DEPARTMENT table to the following: Dnumber INT NOT NULL CHECK (Dnumber > 0 AND Dnumber < 21); The CHECK clause can also be used in conjunction with the CREATEDOMAIN statement.For example, CREATE DOMAIN D_NUM AS INTEGER CHECK (D_NUM > 0 AND D_NUM < 21); 2 Specifying Key and Referential Integrity Constraints The PRIMARY KEY constraint uniquely identifies each record in a database table. Primary keys must contain unique values, and cannot contain NULL values. A table can have only one primary key, which may consist of single or multiple fields. For example, the primary key of DEPARTMENT can be specified as Dnumber INT PRIMARY KEY; Dept. of ISE, RNSIT Page 19 Database Management System The UNIQUE clause specifies alternate (secondary) keys, The UNIQUE constraint ensures that all values in a column are different. Both the UNIQUE and PRIMARY KEY constraints are similar.A table can have many UNIQUE constraints , but only one PRIMARY KEY constraint per table. The UNIQUE clause can also be specified directly for a secondary key if the secondary key is a single attribute, as in the following example: Dname VARCHAR(15) UNIQUE; Example: CREATE TABLE DEPARTMENT ( Dname VARCHAR(15) NOT NULL, Dnumber INT NOT NULL, Mgr_ssn CHAR(9) NOT NULL, Mgr_start_date DATE, PRIMARY KEY (Dnumber), UNIQUE (Dname), FOREIGN KEY (Mgr_ssn) REFERENCES EMPLOYEE(Ssn) ); Referential integrity is specified via the FOREIGN KEY. A FOREIGN KEY is a key used to link two tables together. A FOREIGN KEY is a field (or collection of fields) in one table that refers to the PRIMARY KEY in another table. The table containing the foreign key is called the child table, and the table containing the candidate key is called the referenced or parent table. A referential integrity constraint can be violated when tuples are inserted or deleted, or when a foreign key or primary key attribute value is modified. The default action that SQL takes for an integrity violation is to reject the update operation that will cause a violation, which is known as the RESTRICT option. The schema designer can specify an alternative action to be taken by attaching a referential triggered action clause to any foreign key constraint. The options include SET NULL, CASCADE, and SET DEFAULT. An option must be qualified with either ON DELETE or ON UPDATE. Dept. of ISE, RNSIT Page 20 Database Management System FOREIGN KEY (Super_ssn) REFERENCES EMPLOYEE(Ssn) ON DELETE SET NULL ON UPDATE CASCADE. The database designer chooses ON DELETE SET NULL and ON UPDATE CASCADE for the foreign key Super_ssn of EMPLOYEE. This means that if the tuple for a supervising employee is deleted, the value of Super_ssn is automatically set to NULL for all employee tuples that were referencing the deleted employee tuple. On the other hand, if the Ssn value for a supervising employee is updated the new value is cascaded to Super_ssn for all employee tuples referencing the updated employee tuple. In general, the action taken by the DBMS for SET NULL or SET DEFAULT is the same for both ON DELETE and ON UPDATE: The value of the affected referencing attributes is changed to NULL for SET NULL and to the specified default value of the referencing attribute for SET DEFAULT. The action for CASCADE ON DELETE is to delete all the referencing tuples, whereas the action for CASCADE ON UPDATE is to change the value of the referencing foreign key attribute(s) to the updated (new) primary key value for all the referencing tuples. 3 Giving Names to Constraints Constraint may be given a constraint name, following the keyword CONSTRAINT. The names of all constraints within a particular schema must be unique. A constraint name is used to identify a particular constraint in case the constraint must be dropped later and replaced with another constraint, Example: CONSTRAINT DEPTMGRFK FOREIGN KEY (Mgr_ssn)REFERENCES EMPLOYEE (Ssn) DEPTMGRFK is a constraint name 4 Specifying Constraints on Tuples Using CHECK Constraints can be specified through additional CHECK clauses at the end of a CREATE TABLE statement. These can be called tuple-based constraints because they apply to each tuple individually and are checked whenever a tuple is inserted or modified. Dept. of ISE, RNSIT Page 21 Database Management System For example, suppose that the DEPARTMENT table has additional attribute Dept_create_date, which stores the date when the department was created. Then we could add the following CHECK clause at the end of the CREATE TABLE statement for the DEPARTMENT table to make sure that a manager’s start date is later than the department creation date. CHECK (Dept_create_date