chapter 6-design db with ERD.pdf
Document Details
Uploaded by EnergeticLearning
Tags
Full Transcript
CHAPTER 6 Database Design Using the E-R Model Up to this point in the text, we have assumed a given database schema and studied how queries and updates are expressed. We now consider how to design a database schema in the first place. In this chapter, we focus on the entity-relationship data model...
CHAPTER 6 Database Design Using the E-R Model Up to this point in the text, we have assumed a given database schema and studied how queries and updates are expressed. We now consider how to design a database schema in the first place. In this chapter, we focus on the entity-relationship data model (ER), which provides a means of identifying entities to be represented in the database and how those entities are related. Ultimately, the database design will be expressed in terms of a relational database design and an associated set of constraints. We show in this chapter how an E-R design can be transformed into a set of relation schemas and how some of the constraints can be captured in that design. Then, in Chapter 7, we consider in detail whether a set of relation schemas is a good or bad database design and study the process of creating good designs using a broader set of constraints. These two chapters cover the fundamental concepts of database design. 6.1 Overview of the Design Process The task of creating a database application is a complex one, involving design of the database schema, design of the programs that access and update the data, and design of a security scheme to control access to data. The needs of the users play a central role in the design process. In this chapter, we focus on the design of the database schema, although we briefly outline some of the other design tasks later in the chapter. 6.1.1 Design Phases For small applications, it may be feasible for a database designer who understands the application requirements to decide directly on the relations to be created, their attributes, and constraints on the relations. However, such a direct design process is difficult for real-world applications, since they are often highly complex. Often no one person understands the complete data needs of an application. The database designer must interact with users of the application to understand the needs of the application, represent them in a high-level fashion that can be understood by the users, and 241 242 Chapter 6 Database Design Using the E-R Model then translate the requirements into lower levels of the design. A high-level data model serves the database designer by providing a conceptual framework in which to specify, in a systematic fashion, the data requirements of the database users, and a database structure that fulfills these requirements. • The initial phase of database design is to characterize fully the data needs of the prospective database users. The database designer needs to interact extensively with domain experts and users to carry out this task. The outcome of this phase is a specification of user requirements. While there are techniques for diagrammatically representing user requirements, in this chapter we restrict ourselves to textual descriptions of user requirements. • Next, the designer chooses a data model and, by applying the concepts of the chosen data model, translates these requirements into a conceptual schema of the database. The schema developed at this conceptual-design phase provides a detailed overview of the enterprise. The entity-relationship model, which we study in the rest of this chapter, is typically used to represent the conceptual design. Stated in terms of the entity-relationship model, the conceptual schema specifies the entities that are represented in the database, the attributes of the entities, the relationships among the entities, and constraints on the entities and relationships. Typically, the conceptual-design phase results in the creation of an entity-relationship diagram that provides a graphic representation of the schema. The designer reviews the schema to confirm that all data requirements are indeed satisfied and are not in conflict with one another. She can also examine the design to remove any redundant features. Her focus at this point is on describing the data and their relationships, rather than on specifying physical storage details. • A fully developed conceptual schema also indicates the functional requirements of the enterprise. In a specification of functional requirements, users describe the kinds of operations (or transactions) that will be performed on the data. Example operations include modifying or updating data, searching for and retrieving specific data, and deleting data. At this stage of conceptual design, the designer can review the schema to ensure that it meets functional requirements. • The process of moving from an abstract data model to the implementation of the database proceeds in two final design phases. ° In the logical-design phase, the designer maps the high-level conceptual schema onto the implementation data model of the database system that will be used. The implementation data model is typically the relational data model, and this step typically consists of mapping the conceptual schema defined using the entity-relationship model into a relation schema. ° Finally, the designer uses the resulting system-specific database schema in the subsequent physical-design phase, in which the physical features of the database 6.1 Overview of the Design Process 243 are specified. These features include the form of file organization and choice of index structures, discussed in Chapter 13 and Chapter 14. The physical schema of a database can be changed relatively easily after an application has been built. However, changes to the logical schema are usually harder to carry out, since they may affect a number of queries and updates scattered across application code. It is therefore important to carry out the database design phase with care, before building the rest of the database application. 6.1.2 Design Alternatives A major part of the database design process is deciding how to represent in the design the various types of “things” such as people, places, products, and the like. We use the term entity to refer to any such distinctly identifiable item. In a university database, examples of entities would include instructors, students, departments, courses, and course offerings. We assume that a course may have run in multiple semesters, as well as multiple times in a semester; we refer to each such offering of a course as a section. The various entities are related to each other in a variety of ways, all of which need to be captured in the database design. For example, a student takes a course offering, while an instructor teaches a course offering; teaches and takes are examples of relationships between entities. In designing a database schema, we must ensure that we avoid two major pitfalls: 1. Redundancy: A bad design may repeat information. For example, if we store the course identifier and title of a course with each course offering, the title would be stored redundantly (i.e., multiple times, unnecessarily) with each course offering. It would suffice to store only the course identifier with each course offering, and to associate the title with the course identifier only once, in a course entity. Redundancy can also occur in a relational schema. In the university example we have used so far, we have a relation with section information and a separate relation with course information. Suppose that instead we have a single relation where we repeat all of the course information (course id, title, dept name, credits) once for each section (offering) of the course. Information about courses would then be stored redundantly. The biggest problem with such redundant representation of information is that the copies of a piece of information can become inconsistent if the information is updated without taking precautions to update all copies of the information. For example, different offerings of a course may have the same course identifier, but may have different titles. It would then become unclear what the correct title of the course is. Ideally, information should appear in exactly one place. 2. Incompleteness: A bad design may make certain aspects of the enterprise difficult or impossible to model. For example, suppose that, as in case (1) above, we only had entities corresponding to course offering, without having an entity 244 Chapter 6 Database Design Using the E-R Model corresponding to courses. Equivalently, in terms of relations, suppose we have a single relation where we repeat all of the course information once for each section that the course is offered. It would then be impossible to represent information about a new course, unless that course is offered. We might try to make do with the problematic design by storing null values for the section information. Such a work-around is not only unattractive but may be prevented by primary-key constraints. Avoiding bad designs is not enough. There may be a large number of good designs from which we must choose. As a simple example, consider a customer who buys a product. Is the sale of this product a relationship between the customer and the product? Alternatively, is the sale itself an entity that is related both to the customer and to the product? This choice, though simple, may make an important difference in what aspects of the enterprise can be modeled well. Considering the need to make choices such as this for the large number of entities and relationships in a real-world enterprise, it is not hard to see that database design can be a challenging problem. Indeed we shall see that it requires a combination of both science and “good taste.” 6.2 The Entity-Relationship Model The entity-relationship (E-R) data model was developed to facilitate database design by allowing specification of an enterprise schema that represents the overall logical structure of a database. The E-R model is very useful in mapping the meanings and interactions of realworld enterprises onto a conceptual schema. Because of this usefulness, many databasedesign tools draw on concepts from the E-R model. The E-R data model employs three basic concepts: entity sets, relationship sets, and attributes. The E-R model also has an associated diagrammatic representation, the E-R diagram. As we saw briefly in Section 1.3.1, an E-R diagram can express the overall logical structure of a database graphically. E-R diagrams are simple and clear— qualities that may well account in large part for the widespread use of the E-R model. The Tools section at the end of the chapter provides information about several diagram editors that you can use to create E-R diagrams. 6.2.1 Entity Sets An entity is a “thing” or “object” in the real world that is distinguishable from all other objects. For example, each person in a university is an entity. An entity has a set of properties, and the values for some set of properties must uniquely identify an entity. For instance, a person may have a person id property whose value uniquely identifies that person. Thus, the value 677-89-9011 for person id would uniquely identify one particular person in the university. Similarly, courses can be thought of as entities, and course id uniquely identifies a course entity in the university. An entity may be concrete, such 6.2 The Entity-Relationship Model 245 as a person or a book, or it may be abstract, such as a course, a course offering, or a flight reservation. An entity set is a set of entities of the same type that share the same properties, or attributes. The set of all people who are instructors at a given university, for example, can be defined as the entity set instructor. Similarly, the entity set student might represent the set of all students in the university. In the process of modeling, we often use the term entity set in the abstract, without referring to a particular set of individual entities. We use the term extension of the entity set to refer to the actual collection of entities belonging to the entity set. Thus, the set of actual instructors in the university forms the extension of the entity set instructor. This distinction is similar to the difference between a relation and a relation instance, which we saw in Chapter 2. Entity sets do not need to be disjoint. For example, it is possible to define the entity set person consisting of all people in a university. A person entity may be an instructor entity, a student entity, both, or neither. An entity is represented by a set of attributes. Attributes are descriptive properties possessed by each member of an entity set. The designation of an attribute for an entity set expresses that the database stores similar information concerning each entity in the entity set; however, each entity may have its own value for each attribute. Possible attributes of the instructor entity set are ID, name, dept name, and salary. In real life, there would be further attributes, such as street number, apartment number, state, postal code, and country, but we generally omit them to keep our examples simple. Possible attributes of the course entity set are course id, title, dept name, and credits. In this section we consider only attributes that are simple— those not divided into subparts. In Section 6.3, we discuss more complex situations where attributes can be composite and multivalued. Each entity has a value for each of its attributes. For instance, a particular instructor entity may have the value 12121 for ID, the value Wu for name, the value Finance for dept name, and the value 90000 for salary. The ID attribute is used to identify instructors uniquely, since there may be more than one instructor with the same name. Historically, many enterprises found it convenient to use a government-issued identification number as an attribute whose value uniquely identifies the person. However, that is considered bad practice for reasons of security and privacy. In general, the enterprise would have to create and assign its own unique identifier for each instructor. A database thus includes a collection of entity sets, each of which contains any number of entities of the same type. A database for a university may include a number of other entity sets. For example, in addition to keeping track of instructors and students, the university also has information about courses, which are represented by the entity set course with attributes course id, title, dept name and credits. In a real setting, a university database may keep dozens of entity sets. An entity set is represented in an E-R diagram by a rectangle, which is divided into two parts. The first part, which in this text is shaded blue, contains the name of 246 Chapter 6 Database Design Using the E-R Model instructor student ID name salary ID name tot_cred Figure 6.1 E-R diagram showing entity sets instructor and student. the entity set. The second part contains the names of all the attributes of the entity set. The E-R diagram in Figure 6.1 shows two entity sets instructor and student. The attributes associated with instructor are ID, name, and salary. The attributes associated with student are ID, name, and tot cred. Attributes that are part of the primary key are underlined (see Section 6.5). 6.2.2 Relationship Sets A relationship is an association among several entities. For example, we can define a relationship advisor that associates instructor Katz with student Shankar. This relationship specifies that Katz is an advisor to student Shankar. A relationship set is a set of relationships of the same type. Consider two entity sets instructor and student. We define the relationship set advisor to denote the associations between students and the instructors who act as their advisors. Figure 6.2 depicts this association. To keep the figure simple, only some of the attributes of the two entity sets are shown. A relationship instance in an E-R schema represents an association between the named entities in the real-world enterprise that is being modeled. As an illustration, the individual instructor entity Katz, who has instructor ID 45565, and the student entity Shankar, who has student ID 12345, participate in a relationship instance of advi- 76766 Crick 98988 Tanaka 45565 Katz 12345 Shankar 10101 Srinivasan 00128 Zhang 98345 Kim 76543 Brown 76543 Singh 76653 Aoi 22222 Einstein 23121 Chavez instructor 44553 Peltier student Figure 6.2 Relationship set advisor (only some attributes of instructor and student are shown). 6.2 The Entity-Relationship Model student instructor ID name salary 247 advisor ID name tot_cred Figure 6.3 E-R diagram showing relationship set advisor. sor. This relationship instance represents that in the university, the instructor Katz is advising student Shankar. A relationship set is represented in an E-R diagram by a diamond, which is linked via lines to a number of different entity sets (rectangles). The E-R diagram in Figure 6.3 shows the two entity sets instructor and student, related through a binary relationship set advisor. As another example, consider the two entity sets student and section, where section denotes an offering of a course. We can define the relationship set takes to denote the association between a student and a section in which that student is enrolled. Although in the preceding examples each relationship set was an association between two entity sets, in general a relationship set may denote the association of more than two entity sets. Formally, a relationship set is a mathematical relation on n ≥ 2 (possibly nondistinct) entity sets. If E1 , E2 , … , En are entity sets, then a relationship set R is a subset of {(e1 , e2 , … , en ) | e1 ∈ E1 , e2 ∈ E2 , … , en ∈ En } where (e1 , e2 , … , en ) is a relationship instance. The association between entity sets is referred to as participation; i.e., the entity sets E1 , E2 , … , En participate in relationship set R. The function that an entity plays in a relationship is called that entity’s role. Since entity sets participating in a relationship set are generally distinct, roles are implicit and are not usually specified. However, they are useful when the meaning of a relationship needs clarification. Such is the case when the entity sets of a relationship set are not distinct; that is, the same entity set participates in a relationship set more than once, in different roles. In this type of relationship set, sometimes called a recursive relationship set, explicit role names are necessary to specify how an entity participates in a relationship instance. For example, consider the entity set course that records information about all the courses offered in the university. To depict the situation where one course (C2) is a prerequisite for another course (C1) we have relationship set prereq that is modeled by ordered pairs of course entities. The first course of a pair takes the role of course C1, whereas the second takes the role of prerequisite course C2. In this way, all relationships of prereq are characterized by (C1, C2) pairs; (C2, C1) pairs are excluded. We indicate roles in E-R diagrams by labeling the lines that connect diamonds 248 Chapter 6 Database Design Using the E-R Model course course_id title credits course_id prereq_id prereq Figure 6.4 E-R diagram with role indicators. to rectangles. Figure 6.4 shows the role indicators course id and prereq id between the course entity set and the prereq relationship set. A relationship may also have attributes called descriptive attributes. As an example of descriptive attributes for relationships, consider the relationship set takes which relates entity sets student and section. We may wish to store a descriptive attribute grade with the relationship to record the grade that a student received in a course offering. An attribute of a relationship set is represented in an E-R diagram by an undivided rectangle. We link the rectangle with a dashed line to the diamond representing that relationship set. For example, Figure 6.5 shows the relationship set takes between the entity sets section and student. We have the descriptive attribute grade attached to the relationship set takes. A relationship set may have multiple descriptive attributes; for example, we may also store a descriptive attribute for credit with the takes relationship set to record whether a student has taken the section for credit, or is auditing (or sitting in on) the course. Observe that the attributes of the two entity sets have been omitted from the E-R diagram in Figure 6.5, with the understanding that they are specified elsewhere in the complete E-R diagram for the university; we have already seen the attributes for student, and we will see the attributes of section later in this chapter. Complex E-R designs may need to be split into multiple diagrams that may be located in different pages. Relationship sets should be shown in only one location, but entity sets may be repeated in more than one location. The attributes of an entity set should be shown in the first occurrence. Subsequent occurrences of the entity set should be shown without attributes, to avoid repetition of information and the resultant possibility of inconsistency in the attributes shown in different occurrences. grade student takes section Figure 6.5 E-R diagram with an attribute attached to a relationship set. 6.3 Complex Attributes 249 It is possible to have more than one relationship set involving the same entity sets. For example, suppose that students may be teaching assistants for a course. Then, the entity sets section and student may participate in a relationship set teaching assistant, in addition to participating in the takes relationship set. The formal definition of a relationship set, which we saw earlier, defines a relationship set as a set of relationship instances. Consider the takes relationship between student and section. Since a set cannot have duplicates, it follows that a particular student can have only one association with a particular section in the takes relationship. Thus, a student can have only one grade associated with a section, which makes sense in this case. However, if we wish to allow a student to have more than one grade for the same section, we need to have an attribute grades which stores a set of grades; such attributes are called multivalued attributes, and we shall see them later in Section 6.3. The relationship sets advisor and takes provide examples of a binary relationship set — that is, one that involves two entity sets. Most of the relationship sets in a database system are binary. Occasionally, however, relationship sets involve more than two entity sets. The number of entity sets that participate in a relationship set is the degree of the relationship set. A binary relationship set is of degree 2; a ternary relationship set is of degree 3. As an example, suppose that we have an entity set project that represents all the research projects carried out in the university. Consider the entity sets instructor, student, and project. Each project can have multiple associated students and multiple associated instructors. Furthermore, each student working on a project must have an associated instructor who guides the student on the project. For now, we ignore the first two relationships, between project and instructor, and between project and student. Instead, we focus on the information about which instructor is guiding which student on a particular project. To represent this information, we relate the three entity sets through a ternary relationship set proj guide, which relates entity sets instructor, student, and project. An instance of proj guide indicates that a particular student is guided by a particular instructor on a particular project. Note that a student could have different instructors as guides for different projects, which cannot be captured by a binary relationship between students and instructors. Nonbinary relationship sets can be specified easily in an E-R diagram. Figure 6.6 shows the E-R diagram representation of the ternary relationship set proj guide. 6.3 Complex Attributes For each attribute, there is a set of permitted values, called the domain, or value set, of that attribute. The domain of attribute course id might be the set of all text strings of a certain length. Similarly, the domain of attribute semester might be strings from the set {Fall, Winter, Spring, Summer}. 250 Chapter 6 Database Design Using the E-R Model project ... student instructor ID name tot_cred proj_guide ID name salary Figure 6.6 E-R diagram with a ternary relationship proj guide. composite attributes first_name name middle_initial address last_name street city state postal_code component attributes street_number street_name apartment_number Figure 6.7 Composite attributes instructor name and address. An attribute, as used in the E-R model, can be characterized by the following attribute types. • Simple and composite attributes. In our examples thus far, the attributes have been simple; that is, they have not been divided into subparts. Composite attributes, on the other hand, can be divided into subparts (i.e., other attributes). For example, an attribute name could be structured as a composite attribute consisting of first name, middle initial, and last name. Using composite attributes in a design schema is a good choice if a user will wish to refer to an entire attribute on some occasions, and to only a component of the attribute on other occasions. Suppose we were to add an address to the student entity-set. The address can be defined as the composite attribute address with the attributes street, city, state, and postal code.1 Composite attributes help us to group together related attributes, making the modeling cleaner. Note also that a composite attribute may appear as a hierarchy. In the composite attribute address, its component attribute street can be further divided into street number, street name, and apartment number. Figure 6.7 depicts these examples of composite attributes for the instructor entity set. 1 We assume the address format used in the United States, which includes a numeric postal code called a zip code. 6.3 Complex Attributes 251 • Single-valued and multivalued attributes. The attributes in our examples all have a single value for a particular entity. For instance, the student ID attribute for a specific student entity refers to only one student ID. Such attributes are said to be single valued. There may be instances where an attribute has a set of values for a specific entity. Suppose we add to the instructor entity set a phone number attribute. An instructor may have zero, one, or several phone numbers, and different instructors may have different numbers of phones. This type of attribute is said to be multivalued. As another example, we could add to the instructor entity set an attribute dependent name listing all the dependents. This attribute would be multivalued, since any particular instructor may have zero, one, or more dependents. • Derived attributes. The value for this type of attribute can be derived from the val- ues of other related attributes or entities. For instance, let us say that the instructor entity set has an attribute students advised, which represents how many students an instructor advises. We can derive the value for this attribute by counting the number of student entities associated with that instructor. As another example, suppose that the instructor entity set has an attribute age that indicates the instructor’s age. If the instructor entity set also has an attribute date of birth, we can calculate age from date of birth and the current date. Thus, age is a derived attribute. In this case, date of birth may be referred to as a base attribute, or a stored attribute. The value of a derived attribute is not stored but is computed when required. Figure 6.8 shows how composite attributes can be represented in the E-R notation. Here, a composite attribute name with component attributes first name, middle initial, and last name replaces the simple attribute name of instructor. As another example, suppose we were to add an address to the instructor entity set. The address can be defined as the composite attribute address with the attributes street, city, state, and postal code. The attribute street is itself a composite attribute whose component attributes are street number, street name, and apartment number. The figure also illustrates a multivalued attribute phone number, denoted by “{phone number}”, and a derived attribute age, depicted by “age ( )”. An attribute takes a null value when an entity does not have a value for it. The null value may indicate “not applicable” — that is, the value does not exist for the entity. For example, a person who has no middle name may have the middle initial attribute set to null. Null can also designate that an attribute value is unknown. An unknown value may be either missing (the value does exist, but we do not have that information) or not known (we do not know whether or not the value actually exists). For instance, if the name value for a particular instructor is null, we assume that the value is missing, since every instructor must have a name. A null value for the apartment number attribute could mean that the address does not include an apartment number (not applicable), that an apartment number exists but we do not know what 252 Chapter 6 Database Design Using the E-R Model instructor ID name first_name middle_initial last_name address street street_number street_name apt_number city state zip { phone_number } date_of_birth age ( ) Figure 6.8 E-R diagram with composite, multivalued, and derived attributes. it is (missing), or that we do not know whether or not an apartment number is part of the instructor’s address (unknown). 6.4 Mapping Cardinalities Mapping cardinalities, or cardinality ratios, express the number of entities to which another entity can be associated via a relationship set. Mapping cardinalities are most useful in describing binary relationship sets, although they can contribute to the description of relationship sets that involve more than two entity sets. For a binary relationship set R between entity sets A and B, the mapping cardinality must be one of the following: • One-to-one. An entity in A is associated with at most one entity in B, and an entity in B is associated with at most one entity in A. (See Figure 6.9a.) • One-to-many. An entity in A is associated with any number (zero or more) of entities in B. An entity in B, however, can be associated with at most one entity in A. (See Figure 6.9b.) • Many-to-one. An entity in A is associated with at most one entity in B. An entity in B, however, can be associated with any number (zero or more) of entities in A. (See Figure 6.10a.) 6.4 Mapping Cardinalities A B A 253 B b1 a1 a2 a3 a4 b1 a1 b2 b2 a2 b3 b3 a3 b4 b5 (a) (b) Figure 6.9 Mapping cardinalities. (a) One-to-one. (b) One-to-many. • Many-to-many. An entity in A is associated with any number (zero or more) of entities in B, and an entity in B is associated with any number (zero or more) of entities in A. (See Figure 6.10b.) The appropriate mapping cardinality for a particular relationship set obviously depends on the real-world situation that the relationship set is modeling. As an illustration, consider the advisor relationship set. If a student can be advised by several instructors (as in the case of students advised jointly), the relationship set is many-to-many. In contrast, if a particular university imposes a constraint that a student can be advised by only one instructor, and an instructor can advise several students, then the relationship set from instructor to student must be one-to-many. Thus, mapping A B a1 a aa2 2 b1 a3 b2 a4 b3 a5 (a) A B a1 b1 a2 b2 a3 b3 a4 b4 (b) Figure 6.10 Mapping cardinalities. (a) Many-to-one. (b) Many-to-many. 254 Chapter 6 Database Design Using the E-R Model cardinalities can be used to specify constraints on what relationships are permitted in the real world. In the E-R diagram notation, we indicate cardinality constraints on a relationship by drawing either a directed line (→) or an undirected line (— ) between the relationship set and the entity set in question. Specifically, for the university example: • One-to-one. We draw a directed line from the relationship set to both entity sets. For example, in Figure 6.11a, the directed lines to instructor and student indicate that an instructor may advise at most one student, and a student may have at most one advisor. instructor student advisor ID name salary ID name tot_cred (a) One-to-one student instructor advisor ID name salary ID name tot_cred (b) One-to-many student instructor advisor ID name salary ID name tot_cred (c) Many-to-one student instructor ID name salary advisor ID name tot_cred (d) Many-to-many Figure 6.11 Relationship cardinalities. 6.4 Mapping Cardinalities 255 • One-to-many. We draw a directed line from the relationship set to the “one” side of the relationship. Thus, in Figure 6.11b, there is a directed line from relationship set advisor to the entity set instructor, and an undirected line to the entity set student. This indicates that an instructor may advise many students, but a student may have at most one advisor. • Many-to-one. We draw a directed line from the relationship set to the “one” side of the relationship. Thus, in Figure 6.11c, there is an undirected line from the relationship set advisor to the entity set instructor and a directed line to the entity set student. This indicates that an instructor may advise at most one student, but a student may have many advisors. • Many-to-many. We draw an undirected line from the relationship set to both entity sets. Thus, in Figure 6.11d, there are undirected lines from the relationship set advisor to both entity sets instructor and student. This indicates that an instructor may advise many students, and a student may have many advisors. The participation of an entity set E in a relationship set R is said to be total if every entity in E must participate in at least one relationship in R. If it is possible that some entities in E do not participate in relationships in R, the participation of entity set E in relationship R is said to be partial. For example, a university may require every student to have at least one advisor; in the E-R model, this corresponds to requiring each entity to be related to at least one instructor through the advisor relationship. Therefore, the participation of student in the relationship set advisor is total. In contrast, an instructor need not advise any students. Hence, it is possible that only some of the instructor entities are related to the student entity set through the advisor relationship, and the participation of instructor in the advisor relationship set is therefore partial. We indicate total participation of an entity in a relationship set using double lines. Figure 6.12 shows an example of the advisor relationship set where the double line indicates that a student must have an advisor. E-R diagrams also provide a way to indicate more complex constraints on the number of times each entity participates in relationships in a relationship set. A line may have an associated minimum and maximum cardinality, shown in the form l..h, where l instructor ID name salary student advisor ID name tot_cred Figure 6.12 E-R diagram showing total participation. 256 Chapter 6 Database Design Using the E-R Model instructor ID name salary student 0.. * advisor 1..1 ID name tot_cred Figure 6.13 Cardinality limits on relationship sets. is the minimum and h the maximum cardinality. A minimum value of 1 indicates total participation of the entity set in the relationship set; that is, each entity in the entity set occurs in at least one relationship in that relationship set. A maximum value of 1 indicates that the entity participates in at most one relationship, while a maximum value ∗ indicates no limit. For example, consider Figure 6.13. The line between advisor and student has a cardinality constraint of 1..1, meaning the minimum and the maximum cardinality are both 1. That is, each student must have exactly one advisor. The limit 0.. ∗ on the line between advisor and instructor indicates that an instructor can have zero or more students. Thus, the relationship advisor is one-to-many from instructor to student, and further the participation of student in advisor is total, implying that a student must have an advisor. It is easy to misinterpret the 0.. ∗ on the left edge and think that the relationship advisor is many-to-one from instructor to student — this is exactly the reverse of the correct interpretation. If both edges have a maximum value of 1, the relationship is one-to-one. If we had specified a cardinality limit of 1.. ∗ on the left edge, we would be saying that each instructor must advise at least one student. The E-R diagram in Figure 6.13 could alternatively have been drawn with a double line from student to advisor, and an arrow on the line from advisor to instructor, in place of the cardinality constraints shown. This alternative diagram would enforce exactly the same constraints as the constraints shown in the figure. In the case of nonbinary relationship sets, we can specify some types of many-toone relationships. Suppose a student can have at most one instructor as a guide on a project. This constraint can be specified by an arrow pointing to instructor on the edge from proj guide. We permit at most one arrow out of a nonbinary relationship set, since an E-R diagram with two or more arrows out of a nonbinary relationship set can be interpreted in two ways. We elaborate on this issue in Section 6.5.2. 6.5 Primary Key We must have a way to specify how entities within a given entity set and relationships within a given relationship set are distinguished. 6.5 Primary Key 6.5.1 257 Entity Sets Conceptually, individual entities are distinct; from a database perspective, however, the differences among them must be expressed in terms of their attributes. Therefore, the values of the attribute values of an entity must be such that they can uniquely identify the entity. In other words, no two entities in an entity set are allowed to have exactly the same value for all attributes. The notion of a key for a relation schema, as defined in Section 2.3, applies directly to entity sets. That is, a key for an entity is a set of attributes that suffice to distinguish entities from each other. The concepts of superkey, candidate key, and primary key are applicable to entity sets just as they are applicable to relation schemas. Keys also help to identify relationships uniquely, and thus distinguish relationships from each other. Next, we define the corresponding notions of keys for relationship sets. 6.5.2 Relationship Sets We need a mechanism to distinguish among the various relationships of a relationship set. Let R be a relationship set involving entity sets E1 , E2 , … , En . Let primary-key(Ei ) denote the set of attributes that forms the primary key for entity set Ei . Assume for now that the attribute names of all primary keys are unique. The composition of the primary key for a relationship set depends on the set of attributes associated with the relationship set R. If the relationship set R has no attributes associated with it, then the set of attributes primary-key(E1 ) ∪ primary-key(E2 ) ∪ ⋯ ∪ primary-key(En ) describes an individual relationship in set R. If the relationship set R has attributes a1 , a2 , … , am associated with it, then the set of attributes primary-key(E1 ) ∪ primary-key(E2 ) ∪ ⋯ ∪ primary-key(En ) ∪ {a1 , a2 , … , am } describes an individual relationship in set R. If the attribute names of primary keys are not unique across entity sets, the attributes are renamed to distinguish them; the name of the entity set combined with the name of the attribute would form a unique name. If an entity set participates more than once in a relationship set (as in the prereq relationship in Section 6.2.2), the role name is used instead of the name of the entity set, to form a unique attribute name. Recall that a relationship set is a set of relationship instances, and each instance is uniquely identified by the entities that participate in it. Thus, in both of the preceding cases, the set of attributes primary-key(E1 ) ∪ primary-key(E2 ) ∪ ⋯ ∪ primary-key(En ) forms a superkey for the relationship set. 258 Chapter 6 Database Design Using the E-R Model The choice of the primary key for a binary relationship set depends on the mapping cardinality of the relationship set. For many-to-many relationships, the preceding union of the primary keys is a minimal superkey and is chosen as the primary key. As an illustration, consider the entity sets instructor and student, and the relationship set advisor, in Section 6.2.2. Suppose that the relationship set is many-to-many. Then the primary key of advisor consists of the union of the primary keys of instructor and student. For one-to-many and many-to-one relationships, the primary key of the “many” side is a minimal superkey and is used as the primary key. For example, if the relationship is many-to-one from student to instructor — that is, each student can have at most one advisor— then the primary key of advisor is simply the primary key of student. However, if an instructor can advise only one student— that is, if the advisor relationship is manyto-one from instructor to student — then the primary key of advisor is simply the primary key of instructor. For one-to-one relationships, the primary key of either one of the participating entity sets forms a minimal superkey, and either one can be chosen as the primary key of the relationship set. However, if an instructor can advise only one student, and each student can be advised by only one instructor— that is, if the advisor relationship is one-to-one— then the primary key of either student or instructor can be chosen as the primary key for advisor. For nonbinary relationships, if no cardinality constraints are present, then the superkey formed as described earlier in this section is the only candidate key, and it is chosen as the primary key. The choice of the primary key is more complicated if cardinality constraints are present. As we noted in Section 6.4, we permit at most one arrow out of a relationship set. We do so because an E-R diagram with two or more arrows out of a nonbinary relationship set can be interpreted in the two ways we describe below. Suppose there is a relationship set R between entity sets E1 , E2 , E3 , E4 , and the only arrows are on the edges to entity sets E3 and E4 . Then, the two possible interpretations are: 1. A particular combination of entities from E1 , E2 can be associated with at most one combination of entities from E3 , E4 . Thus, the primary key for the relationship R can be constructed by the union of the primary keys of E1 and E2 . 2. A particular combination of entities from E1 , E2 , E3 can be associated with at most one combination of entities from E4 , and further a particular combination of entities from E1 , E2 , E4 can be associated with at most one combination of entities from E3 , Then the union of the primary keys of E1 , E2 , and E3 forms a candidate key, as does the union of the primary keys of E1 , E2 , and E4 . Each of these interpretations has been used in practice and both are correct for particular enterprises being modeled. Thus, to avoid confusion, we permit only one arrow out of a nonbinary relationship set, in which case the two interpretations are equivalent. 6.5 Primary Key 259 In order to represent a situation where one of the multiple-arrow situations holds, the E-R design can be modified by replacing the non-binary relationship set with an entity set. That is, we treat each instance of the non-binary relationship set as an entity. Then we can relate each of those entities to corresponding instances of E1 , E2 , E4 via separate relationship sets. A simpler approach is to use functional dependencies, which we study in Chapter 7 (Section 7.4). Functional dependencies which allow either of these interpretations to be specified simply in an unambiguous manner. The primary key for the relationship set R is then the union of the primary keys of those participating entity sets Ei that do not have an incoming arrow from the relationship set R. 6.5.3 Weak Entity Sets Consider a section entity, which is uniquely identified by a course identifier, semester, year, and section identifier. Section entities are related to course entities. Suppose we create a relationship set sec course between entity sets section and course. Now, observe that the information in sec course is redundant, since section already has an attribute course id, which identifies the course with which the section is related. One option to deal with this redundancy is to get rid of the relationship sec course; however, by doing so the relationship between section and course becomes implicit in an attribute, which is not desirable. An alternative way to deal with this redundancy is to not store the attribute course id in the section entity and to only store the remaining attributes sec id, year, and semester.2 However, the entity set section then does not have enough attributes to identify a particular section entity uniquely; although each section entity is distinct, sections for different courses may share the same sec id, year, and semester. To deal with this problem, we treat the relationship sec course as a special relationship that provides extra information, in this case the course id, required to identify section entities uniquely. The notion of weak entity set formalizes the above intuition. A weak entity set is one whose existence is dependent on another entity set, called its identifying entity set; instead of associating a primary key with a weak entity, we use the primary key of the identifying entity, along with extra attributes, called discriminator attributes to uniquely identify a weak entity. An entity set that is not a weak entity set is termed a strong entity set. Every weak entity must be associated with an identifying entity; that is, the weak entity set is said to be existence dependent on the identifying entity set. The identifying entity set is said to own the weak entity set that it identifies. The relationship associating the weak entity set with the identifying entity set is called the identifying relationship. The identifying relationship is many-to-one from the weak entity set to the identifying entity set, and the participation of the weak entity set in the relationship is total. 2 Note that the relational schema we eventually create from the entity set section does have the attribute course id, for reasons that will become clear later, even though we have dropped the attribute course id from the entity set section. 260 Chapter 6 Database Design Using the E-R Model The identifying relationship set should not have any descriptive attributes, since any such attributes can instead be associated with the weak entity set. In our example, the identifying entity set for section is course, and the relationship sec course, which associates section entities with their corresponding course entities, is the identifying relationship. The primary key of section is formed by the primary key of the identifying entity set (that is, course), plus the discriminator of the weak entity set (that is, section). Thus, the primary key is {course id, sec id, year, semester}. Note that we could have chosen to make sec id globally unique across all courses offered in the university, in which case the section entity set would have had a primary key. However, conceptually, a section is still dependent on a course for its existence, which is made explicit by making it a weak entity set. In E-R diagrams, a weak entity set is depicted via a double rectangle with the discriminator being underlined with a dashed line. The relationship set connecting the weak entity set to the identifying strong entity set is depicted by a double diamond. In Figure 6.14, the weak entity set section depends on the strong entity set course via the relationship set sec course. The figure also illustrates the use of double lines to indicate that the participation of the (weak) entity set section in the relationship sec course is total, meaning that every section must be related via sec course to some course. Finally, the arrow from sec course to course indicates that each section is related to a single course. In general, a weak entity set must have a total participation in its identifying relationship set, and the relationship is many-to-one toward the identifying entity set. A weak entity set can participate in relationships other than the identifying relationship. For instance, the section entity could participate in a relationship with the time slot entity set, identifying the time when a particular class section meets. A weak entity set may participate as owner in an identifying relationship with another weak entity set. It is also possible to have a weak entity set with more than one identifying entity set. A particular weak entity would then be identified by a combination of entities, one from each identifying entity set. The primary key of the weak entity set would consist of the union of the primary keys of the identifying entity sets, plus the discriminator of the weak entity set. course course_id title credits sec_course section sec_id semester year Figure 6.14 E-R diagram with a weak entity set. 6.6 Removing Redundant Attributes in Entity Sets 6.6 261 Removing Redundant Attributes in Entity Sets When we design a database using the E-R model, we usually start by identifying those entity sets that should be included. For example, in the university organization we have discussed thus far, we decided to include such entity sets as student and instructor. Once the entity sets are decided upon, we must choose the appropriate attributes. These attributes are supposed to represent the various values we want to capture in the database. In the university organization, we decided that for the instructor entity set, we will include the attributes ID, name, dept name, and salary. We could have added the attributes phone number, office number, home page, and others. The choice of what attributes to include is up to the designer, who has a good understanding of the structure of the enterprise. Once the entities and their corresponding attributes are chosen, the relationship sets among the various entities are formed. These relationship sets may result in a situation where attributes in the various entity sets are redundant and need to be removed from the original entity sets. To illustrate, consider the entity sets instructor and department: • The entity set instructor includes the attributes ID, name, dept name, and salary, with ID forming the primary key. • The entity set department includes the attributes dept name, building, and budget, with dept name forming the primary key. We model the fact that each instructor has an associated department using a relationship set inst dept relating instructor and department. The attribute dept name appears in both entity sets. Since it is the primary key for the entity set department, it is redundant in the entity set instructor and needs to be removed. Removing the attribute dept name from the instructor entity set may appear rather unintuitive, since the relation instructor that we used in the earlier chapters had an attribute dept name. As we shall see later, when we create a relational schema from the E-R diagram, the attribute dept name in fact gets added to the relation instructor, but only if each instructor has at most one associated department. If an instructor has more than one associated department, the relationship between instructors and departments is recorded in a separate relation inst dept. Treating the connection between instructors and departments uniformly as a relationship, rather than as an attribute of instructor, makes the logical relationship explicit, and it helps avoid a premature assumption that each instructor is associated with only one department. Similarly, the student entity set is related to the department entity set through the relationship set student dept and thus there is no need for a dept name attribute in student. 262 Chapter 6 Database Design Using the E-R Model As another example, consider course offerings (sections) along with the time slots of the offerings. Each time slot is identified by a time slot id, and has associated with it a set of weekly meetings, each identified by a day of the week, start time, and end time. We decide to model the set of weekly meeting times as a multivalued composite attribute. Suppose we model entity sets section and time slot as follows: • The entity set section includes the attributes course id, sec id, semester, year, build- ing, room number, and time slot id, with (course id, sec id, year, semester) forming the primary key. • The entity set time slot includes the attributes time slot id, which is the primary key,3 and a multivalued composite attribute {(day, start time, end time)}.4 These entities are related through the relationship set sec time slot. The attribute time slot id appears in both entity sets. Since it is the primary key for the entity set time slot, it is redundant in the entity set section and needs to be removed. As a final example, suppose we have an entity set classroom, with attributes building, room number, and capacity, with building and room number forming the primary key. Suppose also that we have a relationship set sec class that relates section to classroom. Then the attributes {building, room number} are redundant in the entity set section. A good entity-relationship design does not contain redundant attributes. For our university example, we list the entity sets and their attributes below, with primary keys underlined: • • • • • • • classroom: with attributes (building, room number, capacity). department: with attributes (dept name, building, budget). course: with attributes (course id, title, credits). instructor: with attributes (ID, name, salary). section: with attributes (course id, sec id, semester, year). student: with attributes (ID, name, tot cred). time slot: with attributes (time slot id, {(day, start time, end time) }). The relationship sets in our design are listed below: • inst dept: relating instructors with departments. • stud dept: relating students with departments. 3 We shall see later on that the primary key for the relation created from the entity set time slot includes day and start time; however, day and start time do not form part of the primary key of the entity set time slot. 4 We could optionally give a name, such as meeting, for the composite attribute containing day, start time, and end time. 6.6 Removing Redundant Attributes in Entity Sets • • • • • • • • 263 teaches: relating instructors with sections. takes: relating students with sections, with a descriptive attribute grade. course dept: relating courses with departments. sec course: relating sections with courses. sec class: relating sections with classrooms. sec time slot: relating sections with time slots. advisor: relating students with instructors. prereq: relating courses with prerequisite courses. You can verify that none of the entity sets has any attribute that is made redundant by one of the relationship sets. Further, you can verify that all the information (other than constraints) in the relational schema for our university database, which we saw earlier in Figure 2.9, has been captured by the above design, but with several attributes in the relational design replaced by relationships in the E-R design. We are finally in a position to show (Figure 6.15) the E-R diagram that corresponds to the university enterprise that we have been using thus far in the text. This E-R diagram is equivalent to the textual description of the university E-R model, but with several additional constraints. In our university database, we have a constraint that each instructor must have exactly one associated department. As a result, there is a double line in Figure 6.15 between instructor and inst dept, indicating total participation of instructor in inst dept; that is, each instructor must be associated with a department. Further, there is an arrow from inst dept to department, indicating that each instructor can have at most one associated department. Similarly, entity set course has a double line to relationship set course dept, indicating that every course must be in some department, and entity set student has a double line to relationship set stud dept, indicating that every student must be majoring in some department. In each case, an arrow points to the entity set department to show that a course (and, respectively, a student) can be related to only one department, not several. Similarly, entity set course has a double line to relationship set course dept, indicating that every course must be in some department, and entity set student has a double line to relationship set stud dept, indicating that every student must be majoring in some department. In each case, an arrow points to the entity set department to show that a course (and, respectively, a student) can be related to only one department, not several. Further, Figure 6.15 shows that the relationship set takes has a descriptive attribute grade, and that each student has at most one advisor. The figure also shows that section is a weak entity set, with attributes sec id, semester, and year forming the discriminator; sec course is the identifying relationship set relating weak entity set section to the strong entity set course. 264 Chapter 6 Database Design Using the E-R Model department course_dept dept_name building budget stud_dept inst_dept instructor student ID name salary ID name tot_cred advisor takes teaches section course course_id title credits course_id prereq sec_course prereq_id sec_id semester year grade time_slot sec_time_slot time_slot_id { day start_time end_time } sec_class classroom building room_number capacity Figure 6.15 E-R diagram for a university enterprise. In Section 6.7, we show how this E-R diagram can be used to derive the various relation schemas we use. 6.7 Reducing E-R Diagrams to Relational Schemas Both the E-R model and the relational database model are abstract, logical representations of real-world enterprises. Because the two models employ similar design principles, we can convert an E-R design into a relational design. For each entity set and for each relationship set in the database design, there is a unique relation schema to which we assign the name of the corresponding entity set or relationship set. 6.7 Reducing E-R Diagrams to Relational Schemas 265 In this section, we describe how an E-R schema can be represented by relation schemas and how constraints arising from the E-R design can be mapped to constraints on relation schemas. 6.7.1 Representation of Strong Entity Sets Let E be a strong entity set with only simple descriptive attributes a1 , a2 , … , an . We represent this entity with a schema called E with n distinct attributes. Each tuple in a relation on this schema corresponds to one entity of the entity set E. For schemas derived from strong entity sets, the primary key of the entity set serves as t