BUSI 2053 Data & Information Management Recap PDF
Document Details
![PleasingAzalea](https://quizgecko.com/images/avatars/avatar-20.webp)
Uploaded by PleasingAzalea
Lakehead University
2025
Tags
Summary
This document is a recap of Chapters 1-4 for BUSI 2053, a Data & Information Management course, likely from Lakehead University, dated February 5 or 6, 2025. It summarizes key concepts including database fundamentals, ER diagrams, and relational database modelling.
Full Transcript
Class 9 BUSI 2053 - Data & Information Management Chapters 1-4 Recap Chapter 1 Introduction What is a Database? Data: Stored facts in formats like text, numbers, images, etc. Information: Useful data accessed for specific tasks. Metadata: Describes data’s structure (e.g.,...
Class 9 BUSI 2053 - Data & Information Management Chapters 1-4 Recap Chapter 1 Introduction What is a Database? Data: Stored facts in formats like text, numbers, images, etc. Information: Useful data accessed for specific tasks. Metadata: Describes data’s structure (e.g., names, types) to make it usable. Database: Organized collection of data for quick access. DBMS: Software to create, manage, and use databases. How Databases Work Direct Interaction: Users work directly with the DBMS. Indirect Interaction: Users access the database via apps or interfaces. Steps to Build a Database 1. Plan: Identify what data is needed and how it will be used. 2. Model: Design a blueprint for the database (e.g., diagrams). 3. Build: Use DBMS to create the actual database. 4. Add Features: Develop user-friendly interfaces (e.g., forms, menus). 5. Deploy: Make the database available for use. 6. Maintain: Handle security, backups, and updates. Who Works with Databases? Analysts: Define what the database should do. Designers: Create the structure of the database. Developers: Build the database and interfaces. DBAs: Maintain and secure the database. Users: Access and use data for tasks. Types of Databases Operational: Support daily tasks (e.g., transactions). Analytical: Support decision-making by analyzing data. Chapter 2 Database Requirements & ER Diagrams Chapter Overview Purpose: Learn ER Modeling as a database design tool. Key Concepts: Entities, Attributes, Relationships, and ER Diagrams (ERDs). Importance: Blueprint for database creation. Chapter 2 Breakdown Chapter 2 introduces Entity-Relationship (ER) Modeling, a technique for designing databases. Here's the breakdown: 1. ER Modeling Basics: ○ Helps organize and visualize database requirements. ○ Produces ER diagrams (ERDs), which are blueprints for databases. 2. Entities and Attributes: ○ Entities: Represent things like people, places, or objects. ○ Attributes: Describe characteristics of entities (e.g., name, ID). 3. Relationships: ○ Define how entities connect to each other (e.g., one-to-many or many-to-many). ○ Include rules about how many connections each entity can have. 4. Special Cases: ○ Weak Entities: Depend on other entities for identification. ○ Associative Entities: Simplify complex relationships, especially M:N connections. ○ Ternary Relationships: Involve three entities. 5. ER Diagram Best Practices: ○ Use clear naming conventions. ○ Avoid over-complicating diagrams. ○ Focus on capturing database-relevant information, not all organizational processes. 6. Common Mistakes: ○ Confusing entities with ER diagrams. ○ Including unnecessary details. 1. What is an ERD? An ERD is a visual representation of a database's structure. It maps out entities (e.g., tables), their attributes (e.g., columns), and the relationships between entities. Relational Model Diagram (Video Tutorial): Focuses on the implementation of the database, detailing the tables, primary keys (PK), and foreign keys (FK). Represents tables explicitly with fields (columns) listed, including designation of primary and foreign keys. Used in the logical and physical design phases, focusing on how the database will be structured. Uses tabular representation with lines or keys to indicate relationships. 2. Core Components of an ERD Entities Represented as rectangles. Entities are objects or things in the database, such as Customer, Order, or Product. Two types: ○ Strong Entities: Can exist independently (e.g., Employee). ○ Weak Entities: Depend on a strong entity and cannot exist alone (e.g., OrderItem). i. Represented as a double rectangle Attributes Represented as ovals connected to entities. Describe the properties or details of an entity. ○ Simple Attribute: Cannot be divided further (e.g., FirstName). ○ Composite Attribute: Can be broken down into smaller parts (e.g., FullName → FirstName + LastName). ○ Derived Attribute: Calculated from other attributes (e.g., Age derived from DateOfBirth) and uses a dashed oval. ○ Multi-Valued Attribute: Can have multiple values and uses a double oval (e.g., PhoneNumbers) 2. Core Components of an ERD Primary Key (PK) An attribute (or combination of attributes) that uniquely identifies each record in an entity. Represented as underlined text. Relationships Represented as diamonds between entities. Define how entities are related. ○ Examples: Customer Places Order, Employee AssignedTo Project. 3. Types of Relationships Cardinality Specifies how many instances of one entity relate to instances of another: 1. Mandatory Many / One or Many / 1:M: An instance of Entity A must be related to at least one or more instances of the other entity. 2. Optional Many / Zero or Many / 0:M: An instance of Entity A can be related to zero, one, or many instances of the other entity. 3. Mandatory One / One and Only One / 1:1: An instance of Entity A must be related to exactly one instance of the other entity. 4. Optional One / Zero or One / 0:1: An instance of Entity A can be related to zero or one instance of the other entity. Advantages of Including Numbers Provides clarity in database design. Ensures the model adheres to specific business constraints. Avoids ambiguity when interpreting relationships during database implementation. 4. Design Process for ERD 1) Understand the Requirements Talk to stakeholders to understand the system’s purpose and what data it needs to store. 2) Identify Entities Determine the main objects (e.g., Customer, Order) in your system. 3) Define Relationships Determine how entities interact with one another. 4) Add Attributes For each entity, define the attributes it needs. Identify the primary key for each entity. 5) Determine Cardinality Define the number of instances in each relationship (1:1, 1:N, M:N). 6) Draw the ERD Use symbols for entities, attributes, and relationships. Add labels for clarity (e.g., "Places" for the Customer-Order relationship). 5. Best Practices 1. Simplify the Design: ○ Avoid overly complex ERDs. Group similar data logically. 2. Use Consistent Naming: ○ Use descriptive names for entities and attributes (e.g., Customer_ID, not C_ID). 3. Avoid Redundancy: ○ Ensure attributes aren't duplicated across entities unless necessary. 4. Verify with Stakeholders: ○ Ensure the diagram reflects real-world processes and requirements. 5. Test with Sample Scenarios: ○ Validate your design by using sample data to check for completeness. 6. Tools for Creating ERDs Some popular tools include: Free: ERDPlus, Lucidchart, Draw.io a. Lucidchart is the most popular option b. ERDPlus is what the textbook c. Draw.io is convenient for working directly in your Google Drive Paid: Microsoft Visio, ER/Studio, Oracle SQL Developer Data Modeler 7. Common Mistakes 1. Not defining a primary key for every entity. 2. Failing to identify proper relationships (e.g., misidentifying cardinality). 3. Ignoring derived attributes when needed for reporting or analysis. 4. Adding redundant entities or attributes. Chapter 3 Relational Database Modelling Chapter 3 Breakdown This chapter introduces the relational database model, a foundational concept in database management that organizes data into structured tables called relations. It explains how to design databases by mapping real-world data and relationships into a logical framework, focusing on primary keys, foreign keys, and constraints to ensure data integrity. Students learn to translate Entity-Relationship (ER) diagrams into relational schemas, handle various attribute types (e.g., composite, optional, and multivalued), and represent different types of relationships (1:M, M:N, and 1:1). The chapter emphasizes best practices in database design, showing how constraints and modeling techniques work together to create efficient, consistent, and scalable databases. 10 Key Takeaways 1. Relational databases organize data into tables. 2. Tables must follow specific rules to qualify as relations. 3. Primary keys make each row in a table unique. 4. Composite primary keys use multiple columns to identify rows. 5. ER diagrams help design tables and show relationships. 6. Foreign keys connect related tables. 7. Different types of relationships are handled differently in tables. 8. Constraints ensure data accuracy and consistency. 9. Special attributes like multivalued and derived are handled separately. 10. ER diagrams make it easier to understand and communicate database designs. What is a Relational Database? Definition: A relational database organizes data into structured tables (called relations). Components: Tables: Represent entities (e.g., Customers, Orders). Rows: Represent individual records. Columns: Represent attributes of the entity. Rules for Relations: Each column must have a unique name. Each row must be unique. Data in each column must follow a predefined format (e.g., numbers, text). Each cell must have a single value. Interesting Fact: The relational model was invented by Edgar F. Codd in 1970, revolutionizing data storage. Primary Keys Definition: A primary key is a column (or set of columns) that uniquely identifies each row in a table. Importance: Ensures no duplicate rows exist. Acts as a unique identifier for each record. Example: A CustomerID column in the Customers table uniquely identifies each customer. Tip: Always choose a column that cannot have duplicate values for the primary key. Foreign Keys Definition: A foreign key is a column in one table that references the primary key in another table. Purpose: Establishes relationships between tables and ensures data consistency. Example: In an Orders table, the CustomerID column references the CustomerID in the Customers table. Visual Example: Customers Table: Orders Table: Interesting Facts: Foreign keys maintain referential integrity, ensuring no orders can exist for non-existent customers. Translating Entities to Tables Process: 1. Each entity becomes a table. 2. Attributes of the entity become columns. 3. Unique attributes are mapped as primary keys. Composite Attributes: Break down into individual columns. Example: A "Full Name" attribute splits into "First Name" and "Last Name." Example Entity to Table Conversion: Entity: Customer (Attributes: CustomerID, Name, Email) Customer Table: Mapping Relationships 1:M Relationship (One-to-Many): ○ Add a foreign key to the "many" side. ○ Example: A Customer can place multiple Orders. M:N Relationship (Many-to-Many): ○ Create a new table with foreign keys from both entities. ○ Example: Students and Courses. 1:1 Relationship (One-to-One): ○ Add a foreign key to one table, ensuring each row corresponds to exactly one row in the other table. Data Integrity Rules Primary Key Constraint: Ensures no duplicate rows. Entity Integrity Constraint: Primary key columns cannot have null values. Referential Integrity Constraint: Foreign key values must match a valid primary key in the referenced table or remain null. Example: An order cannot reference a non-existent customer. Example Violation: Orders Table: This number has to be present in the other table Special Attributes Multivalued Attributes: Store in a separate table with a foreign key. Example: A Customer with multiple phone numbers. Customers Table: Phone Numbers Table: Derived Attributes: Calculated in the application, not stored in the database. Example: Age derived from birth date. Ternary and Unary Relationships Ternary Relationships: Involve three entities, e.g., Doctors, Patients, and Appointments. Create a table with foreign keys for all three entities. Unary Relationships: A relationship within the same entity, e.g., employees managing employees. Add a foreign key referencing the same table. Indicates she has no manager Mapping Unary Relationships 1:1 M:N 1:M Mapping Multiple Relationships between same entities Note that the relation PACKAGE has two foreign keys, both of which refer to the primary key in the relation EMPLOYEE. Both of the foreign keys are renamed in order to illustrate their roles in the relation EMPLOYEE. Summary and Applications Key Takeaways: Relational databases organize data into structured, related tables. Primary keys uniquely identify rows, while foreign keys link tables. Constraints ensure data accuracy and integrity. Real-World Examples: E-commerce: Customers and Orders. Education: Students and Courses. Streaming Platforms: Users and Watchlists. Interactive Activity: Ask students to think of a real-world system and design its tables and relationships. ERD → Relational Schema 1. Updated Instructions for Converting the ERD to a Relational Schema Step 1: Map Entities to Tables Each entity in the ERD becomes a table in the Relational Schema. The attributes of each entity become columns in the table. 1. EMPLOYEE: ○ Attributes: EID (Primary Key), EName ○ Relational Schema Table: EMPLOYEE(EID, EName) 2. DEVICE: ○ Attributes: DeviceID (Primary Key), DeviceType ○ Relational Schema Table: DEVICE(DeviceID, DeviceType, EID) (where EID is a Foreign Key referencing EMPLOYEE). 3. FACILITY: ○ Attributes: FacilityID (Primary Key), FacilityType ○ Relational Schema Table: FACILITY(FacilityID, FacilityType) ERD → Relational Schema 2. Map Relationships Relationship 1: EMPLOYEE and DEVICE (1:N) Description: One employee can have zero or many devices. Each device is assigned to exactly one employee. Mapping: ○ Add a foreign key (EID) in the DEVICE table to reference the EMPLOYEE table. Relational Schema: ○ DEVICE(DeviceID, DeviceType, EID (FK to EMPLOYEE.EID)) Relationship 2: EMPLOYEE and FACILITY (M:N) Description: Many employees can access many facilities. Mapping: ○ Create a new table, HasAccessTo, to represent this many-to-many relationship. ○ Include: EID as a foreign key referencing EMPLOYEE. FacilityID as a foreign key referencing FACILITY. ○ Use the combination of EID and FacilityID as the primary key. Relational Schema: ○ HasAccessTo(EID (FK to EMPLOYEE.EID), FacilityID (FK to FACILITY.FacilityID)) ERD → Relational Schema 3. Enforce Constraints 1. Primary Keys (PK): ○ EMPLOYEE(EID) ○ DEVICE(DeviceID) ○ FACILITY(FacilityID) ○ HasAccessTo(EID, FacilityID) (Composite PK) 2. Foreign Keys (FK): ○ DEVICE.EID → EMPLOYEE.EID ○ HasAccessTo.EID → EMPLOYEE.EID ○ HasAccessTo.FacilityID → FACILITY.FacilityID ERD → Relational Schema 4. Updated Relational Schema Table Name Attributes Employee EID (PK), EName Device DeviceID (PK), DeviceType, EID (FK to EMPLOYEE.EID) Facility FacilityID (PK), FacilityType HasAccessTo EID (FK to EMPLOYEE.EID), FacilityID (FK to FACILITY.FacilityID) ERD → Relational Schema Why Is IssuedTo Not Included as a Separate Table? Cardinality: The relationship is one-to-many (1:N). ○ Each employee can have multiple devices, but each device is assigned to exactly one employee. Representation: In a one-to-many relationship, the relationship can be captured by adding the foreign key of the "one" side (EID from EMPLOYEE) into the table of the "many" side (DEVICE). Efficiency: Adding EID as a column in the DEVICE table avoids redundancy and keeps the schema simple. Why Is HasAccessTo Included as a Separate Table? Cardinality: The relationship is many-to-many (M:N). ○ Each employee can access multiple facilities, and each facility can be accessed by multiple employees. Requirement: In a many-to-many relationship, there is no single table where both foreign keys (EID and FacilityID) can naturally exist without duplication or redundancy. Solution: A bridge table (HasAccessTo) is required to store the foreign keys (EID and FacilityID) and ensure that the many-to-many relationship is represented correctly. ERD → Relational Schema Explanation of Changes 1. Why Is DEVICE Modeled with a Foreign Key to EMPLOYEE? The relationship is one-to-many (1:N): ○ Each employee can have multiple devices. ○ Each device belongs to exactly one employee. This can be captured using a foreign key (EID) in the DEVICE table without needing a separate relationship table. How to Deal with Each Type Relationship Type Implementation Key Points One-to-One (1:1) Add a foreign key with a unique Use when entities are tightly constraint or merge the tables coupled or dependant on each other One-to-Many (1:N) Add a foreign key in the ‘many’ The most common relationship table type. Efficiently represented without additional tables Many-to-Many (M:N) Create a junction table with Required for all M:N foreign keys from both entities relationships. Allows additional attributes for the relationship Chapter 4 Update Operations, Update Anomalies, and Normalization Chapter 4 Simplified Imagine you have a toy collection, and you keep track of it in a notebook. When you add a new toy, you write it down (Insert), when you lose one, you erase it (Delete), and if you rename a toy, you update it (Modify). But if your notebook is messy, you might have to write the same toy many times, leading to Update Anomalies like extra, missing, or repeated information. To fix this, you organize your notebook using Normalization, creating separate lists for toys, owners, and who owns what, so updates happen in one place instead of everywhere. However, sometimes, combining lists (Denormalization) makes it easier to search quickly. A well-organized database works the same way—it prevents mistakes and keeps information easy to manage! 🎉 Chapter 4 Breakdown Key Topics Covered: ✅ Update operations: Insert, Delete, Modify ✅ Update anomalies: Insertion, Deletion, Modification ✅ Functional dependencies ✅ Normalization (1NF, 2NF, 3NF) ✅ When (and when not) to normalize 🔹 Fun Fact: Normalization helps prevent "database nightmares" where incorrect updates lead to messy and inconsistent data! Update Operations – Keeping Data Fresh! 🔄 Operations that keep a database running smoothly: Insert: Add new records (e.g., new customer sign-up) Delete: Remove outdated/unnecessary records (e.g., deleting an old product listing) Modify: Update existing records (e.g., changing a customer's address) 💡 Quick Thought: What happens if you delete a record that other data depends on? Stay tuned! Update Anomalies – The Data Troublemakers! 🚨 Anomalies happen when a database isn't structured properly: Insertion Anomaly: Need extra, unrelated info just to add new data Deletion Anomaly: Removing one record accidentally deletes other important info Modification Anomaly: Same data needs to be updated in multiple places (error-prone!) 🔍 Example: If a marketing database stores "Campaign Manager Name" multiple times, changing it in one place doesn’t update it everywhere! Functional Dependencies – The Rules of Data Relationships 🧩 A column's value uniquely determines another column’s value: 📌 Example: ClientID → ClientName (Every ClientID has exactly one matching ClientName) 🛑 Avoid these pitfalls: Trivial Dependencies (A → A) – Obvious & unnecessary Augmented Dependencies (Adding extra columns without adding value) Equivalent Dependencies (One value determines another, and vice versa) 🤔 Challenge: If "EmployeeID → EmployeeEmail", what happens when two employees share an email? The Power of Normalization – Why Do We Do It? 🔧 Normalization reduces redundancy and prevents update anomalies by breaking down tables into efficient structures. Key Normal Forms: 1⃣ First Normal Form (1NF): No duplicate rows, no multiple values in a single column 2⃣ Second Normal Form (2NF): No partial dependencies on a composite key 3⃣ Third Normal Form (3NF): No transitive dependencies (a non-key column should not depend on another non-key column) 💡 Fun Fact: The concept of normalization was first introduced in 1970 by Edgar F. Codd, the “father of relational databases”! First Normal Form (1NF) – Keep it Clean! ✨ 🚫 NO: Multi-valued columns or duplicate rows ✅ YES: Each field contains only one value, and every row is unique Example: StudentID Name Courses 101 Alex Math, CS 101 Alex Math 101 Alex CS Second Normal Form (2NF) – No Partial Dependencies! 🔍 🛑 Issue: If a table has a composite key, every column should depend on the full key, not just a part of it. 📌 Example – NOT 2NF: OrderID ProductID ProductName OrderDate 1 A123 Laptop 2025-01-01 1 B456 Mouse 2024-01-01 🚀 Fix: Separate tables for Orders and Products so "ProductName" doesn’t depend only on "ProductID"! Third Normal Form (3NF) – No Transitive Dependencies! 🚫➡ 🛑 Problem: A non-key column should not determine another non-key column. Example – NOT 3NF: EmployeeID EmployeeName DepartmentID DepartmentName 123 Alice D01 Sales 124 Bob D02 HR 💡 Why is this bad? If "DepartmentName" changes, we must update multiple rows! ✅ Solution: Split into Employee Table & Department Table! When NOT to Normalize – The Case for Denormalization 🔄 Normalization reduces redundancy but sometimes we denormalize for: ✅ Faster queries (Fewer joins = Faster retrieval) ✅ Simpler reports (Easier for business users to read) 📌 Example: A Sales Report might store Customer Name directly instead of joining multiple tables every time. ⚖ Balance is key! Normalize for accuracy, denormalize for speed! Understanding Dependencies Full Functional Dependency Partial Functional Dependency Transitive Functional Dependency Normalization Perspective A full functional dependency means that an A partial dependency occurs when an A transitive dependency occurs when an To avoid redundancy and update attribute is functionally dependent on the entire attribute depends only on a part of the attribute depends on another non-key attribute anomalies, this table should be primary key and not just a part of it. primary key rather than the whole key. rather than directly on the primary key. decomposed into 3NF (Third Normal In this case: Here, ActorsAssistantName depends on Form): ActorsAssistantID, which in turn depends on In the given table, the primary key is (MovieID, ActorID. MovieName depends only on ActorID) because each movie has multiple actors, A Movies table with MovieID, MovieID (not ActorID). and salaries are assigned based on the That means ActorsAssistantName MovieName. ActorName depends only on actor-movie combination. is transitively dependent on An Actors table with ActorID, ActorID (not MovieID). ActorID (through ActorName. These dependencies are ActorsMovieSalary is fully dependent on ActorsAssistantID). A MovieCast table with partial because they rely on (MovieID, ActorID) because the salary of an This is problematic because it MovieID, ActorID, and only one part of the composite actor is based on the specific movie they introduces redundancy—if the same ActorsMovieSalary. key instead of both MovieID worked in. ActorsAssistantID appears multiple An ActorAssistants table and ActorID together. ○ Example: If we remove either times, we store the assistant's name with ActorsAssistantID, MovieID or ActorID, we won’t be multiple times. ActorsAssistantName, and able to determine the salary ActorID. correctly. Wrap-Up – What Did We Learn? 🎯 ✅ Update Operations: Insert, Delete, Modify ✅ Update Anomalies: Insertion, Deletion, Modification issues ✅ Functional Dependencies: How data is related ✅ Normalization: 1NF, 2NF, 3NF – Removing redundancy ✅ Denormalization: Trading structure for speed 📢 Final Thought: A well-structured database is like a well-organized library – easy to navigate, quick to find what you need, and free from unnecessary clutter! 🎉 Question for You: Can you think of a real-world example where normalization helps keep data clean? 🤔