T02 Relational Data Modelling and Normalization PDF
Document Details
Uploaded by Deleted User
Instituto Superior de Engenharia do Porto
null
null
Tags
Summary
This document covers relational database design concepts, focusing on normalization techniques. It details the different normal forms and their application in relational database management. The document is presented as a lecture or presentation on data modeling for databases.
Full Transcript
INSTITUTO SUPERIOR DE ENGENHARIA DO PORTO DATABASES T02 – RELATIONAL DATA MODELLING AND NORMALIZATION INSTITUTO SUPERIOR DE ENGENHARIA DO PORTO NORMALIZATION Based on functional, multi-value and join dependencies, the data normalization process is applied to the relational model. The hierar...
INSTITUTO SUPERIOR DE ENGENHARIA DO PORTO DATABASES T02 – RELATIONAL DATA MODELLING AND NORMALIZATION INSTITUTO SUPERIOR DE ENGENHARIA DO PORTO NORMALIZATION Based on functional, multi-value and join dependencies, the data normalization process is applied to the relational model. The hierarchy is made up of five normal forms (1st, 2nd, 3rd, 4th and 5th Normal Form) and an intermediate one (Boyce-Codd Normal Form, between the 3rd and 4th). In practice, normalization should not be taken to extremes, as the proliferation of relationships can lead to a deterioration in the performance of the database. In most cases, a compromise is attained by adopting something between the 3rd Normal Form and the Boyce Codd Normal Form. 2 INSTITUTO SUPERIOR DE ENGENHARIA DO PORTO NORMAL FORMS 1FN, 2FN, 3FN, FNBC (Boyce-Cood) – Functional dependencies 4NF – Multivalued dependencies 5FN – Join dependencies 3 INSTITUTO SUPERIOR DE ENGENHARIA DO PORTO 1ST NORMAL FORM 1NF A relationship is in 1NF if: – Key attributes are defined – All attributes are defined in domains that contain only atomic values, that is, each attribute can only accept elementary values and not a set of values (i.e. no table column can have tables as values) – All attributes functionally depend on the primary key Aims to eliminate the existence of groups of repeated values – An occurrence of the key can only correspond to one occurrence of the other non-key attributes 4 INSTITUTO SUPERIOR DE ENGENHARIA DO PORTO 1NF EXAMPLE Relation/table – Student(idStudent, name, address, (idCourse, courseName)) – This relation does not fulfil 1FN, since the 4th column accept a set of values idStudent name Address idCourse courseName A1 João St. A D1, D2, D3 Calculus, OOP, Data Structures A2 Ana St. B D1, D4 Calculus, Physics A3 Pedro St. C D1, D2 Calculus, OOP A4 Filipa St. D D1 Calculus 5 INSTITUTO SUPERIOR DE ENGENHARIA DO PORTO NORMALIZED TO 1NF idStudent name address idStudent idCourse courseName A1 João St. A A1 D1 Calculus A2 Ana St. B A1 D2 OOP A3 Pedro St. C A1 D3 Data Structures A4 Filipa St. D A2 D1 Calculus Student (idStudent, name, address) A2 D4 Physics A3 D1 Calculus A3 D2 OOP A4 D1 Calculus Enrollement (idStudent (FK), idCourse, courseName) 6 INSTITUTO SUPERIOR DE ENGENHARIA DO PORTO 2ND NORMAL FORM 2NF A relationship is in 2NF if: – If it is in 1NF – Each non-key attribute functionally depends on the entire key There are no partial dependencies All attributes that do not belong to the key are functionally dependent on the key as a whole and Do not depend only on any of its elements or subsets 7 INSTITUTO SUPERIOR DE ENGENHARIA DO PORTO CONVERSION TO MEET 2NF If the relationship only has one attribute as the primary key and if this relationship is already in 1NF, then the relationship is also in 2NF If the primary key is composed of more than one attribute and if any non-key attribute depends on only a part of the primary key, then the relationship must be decomposed, so that each attribute depends on the entire primary key. idStudent name address A1 João St. A The Student table is already in the 1st NF and as the primary key contains only one attribute it is A2 Ana St. B also in the 2nd NF A3 Pedro St. C A4 Filipa St. D 8 INSTITUTO SUPERIOR DE ENGENHARIA DO PORTO CONVERSION TO MEET 2NF The Enrolled table is in the 1st NF but its primary key is composed It is necessary to decompose the Enrolled table because there is a functional dependency between the non-key attribute courseName and only part of the primary key, with the attribute idCourse idStudent idCourse courseName A1 D1 Calculus A1 D2 OOP ….. …... ….. idCourse→ courseName 9 INSTITUTO SUPERIOR DE ENGENHARIA DO PORTO CONVERSION TO MEET 2NF idStudent name address idStudent idCourse A1 João St. A A1 D1 A2 Ana St. B A1 D2 A3 Pedro St. C A1 D3 A4 Filipa St. D A2 D1 Student(idStudent, name, address) A2 D4 idCourse courseName A3 D1 D1 Calculus A3 D2 D2 OOP A4 D1 D3 Data Structures Enrollement(idStudent(FK), idCourse(FK)) D4 Physics Course(idCourse, courseName) 10 INSTITUTO SUPERIOR DE ENGENHARIA DO PORTO 3RD NORMAL FORM 3NF If it is in 2FN If none of its attributes are transitively dependent on non-key attributes – If a transitive dependency exists, we remove the transitively dependent attribute(s) from the relation by placing the attribute(s) in a new relation along with a copy of the determinant. 11 INSTITUTO SUPERIOR DE ENGENHARIA DO PORTO CONVERSION TO MEET 3NF Search for functional dependencies between the non-key attributes of the relationship If the relation that is already in 2NF and has only one non-key attribute, then the relation is also already in 3NF If there is some set of non-key attributes in the relation that has functional dependence on another set of non-key attributes of the same relation, then the relation must be decomposed in such a way that any non-key attribute of the relation only depends on the primary key of the relationship 12 INSTITUTO SUPERIOR DE ENGENHARIA DO PORTO FINAL THOUGHTS Normalization is about eliminating redundancy – Redundancy leads to data inconsistence, which is a very serious problem/threat to the system Other problems that redundancy can bring – Cost of storage space - redundancy involves taking up additional space with something that adds nothing to what is already stored – Maintenance - a simple change or removal can involve accessing several tables, making it difficult to maintain the coherence of the stored data – Performance - if redundancy is significant, this will imply more disk accesses to bring the same data 13 INSTITUTO SUPERIOR DE ENGENHARIA DO PORTO FURTHER READING Begg C., Connolly T., Database Systems: A Practical Approach to Design, Implementation, and Management, Pearson, 6th ed. 2015 – Chapter 14 - Normalization 14