Option A - Databases
Summary
This document provides course notes on databases for a computer science course. It covers topics such as data vs information, information systems vs databases, the need for databases, and database transactions.
📒 Course Notes — COMPUTER SCIENCE IBDP: OPTION A - Databases

I - BASIC CONCEPTS

A.1.1 - Outline the differences between data and information.
Note: Data is meaningless. To be useful, data must be interpreted to produce information.
Data is a collection of facts that are meaningless on their own, whereas information puts data into a clear, understandable context.
Data tends to be raw and is classified as primary or secondary depending on whether the source is a first-hand account or not.
By processing data to form judgements and conclusions, the data is turned into information.

A.1.2 - Outline the differences between an information system and a database.
Note: Students must be aware that these terms are not synonymous. Databases are a component within an information system.
Information System: A formal, sociotechnical, organizational system designed to collect, process, store, and distribute information. From a sociotechnical perspective, information systems are composed of four components: task, people, structure (or roles), and technology.
Database: A collection of data related in a logical manner, designed to provide all the necessary information for an organisation.
The difference between an information system and a database: A database contains the data used by an information system, whereas the information system comprises the complete system and may present the data from the database in a way that turns it into information.
The difference between a database and a spreadsheet:
○ Spreadsheet: Primarily used to list and display data in a table-like layout, typically relying on functions to manipulate data, and usually set up to model the data into some other representation.
○ Database: Primarily used to store and retrieve data, not for modelling.

A.1.3 - Discuss the need for databases.
Note: This should address topics such as the benefits of data sharing. For example, correct information relating to customers and/or clients.
Data organisation: Databases provide a way to organise data in a structured manner, making it easier to store, retrieve, and manipulate. Without a database, data would be stored in individual files, which would make it difficult to manage and access.
Data integrity: Databases provide mechanisms to ensure that data is accurate and consistent. This is important when multiple users or applications need to access the same data. Without a database, it would be difficult to maintain data consistency and accuracy.
Data security: Databases provide a secure way to store data by allowing administrators to control access to it. This helps protect sensitive data from unauthorised access, ensuring that only authorised users can access the data.
Scalability: Databases are designed to handle large amounts of data, making them a scalable solution for organisations that need to store and manage large volumes of data.
Performance: Databases are optimised for performance, allowing users to access and manipulate data quickly and efficiently. This is especially important for applications that need to process large amounts of data quickly.
Data sharing: Databases enable data sharing among different applications and users, making it easier for teams to collaborate and share data across different systems and applications.
Relating data: Certain databases can link data entries together, which allows for easier retrieval and querying; real-life relationships can be expressed directly within the database (see the sketch below).
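As a minimal illustration of how related data can be stored and then queried, here is a hypothetical sketch in standard SQL. The Department and Employee tables and their columns are invented for this example and are not part of the course material.

```sql
-- Two hypothetical tables linked by a foreign key.
CREATE TABLE Department (
    dept_id   INTEGER PRIMARY KEY,
    dept_name VARCHAR(50) NOT NULL
);

CREATE TABLE Employee (
    emp_id   INTEGER PRIMARY KEY,
    emp_name VARCHAR(100) NOT NULL,
    dept_id  INTEGER REFERENCES Department(dept_id)  -- the real-life link expressed in the schema
);

-- Because the link is stored explicitly, retrieval across the relationship is a single query:
SELECT e.emp_name, d.dept_name
FROM Employee e
JOIN Department d ON d.dept_id = e.dept_id;
```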
A.1.5 - Define the term database transaction.
A transaction is a logical unit of work that consists of one or more database operations that must be executed together as a single, atomic unit.
Operations typically follow the CRUD commands:
○ Create: Creates a new data entry.
○ Read: Retrieves information from a pre-existing data entry.
○ Update: Updates a pre-existing data entry, overwriting the previous data with new data.
○ Delete: Removes a data entry from the database.
Transactions ensure that either all of the operations are completed successfully or none of them are completed at all. This helps maintain data consistency by ensuring that the database remains in a consistent state, even in the event of errors or system failures.
Definition: A transaction is a single unit of work which can contain one or more database operations, ensuring that either all of the operations are carried out or none of them are, in order to preserve the consistency of data within the database.

A.1.4 - Describe the use of transactions, states and updates to maintain data consistency (and integrity).
States: States refer to the condition of the database at any given time. Database management systems use states to keep track of changes to the database, including additions, updates, and deletions. The current state is the present condition of the database and its contents.
Updates: Updates refer to changes made to the database, including inserting data, changing pre-existing data, and deletions. By using a locking mechanism, the database ensures only one user can make changes to a specific record at a time, avoiding conflicts and inconsistencies.
By using transactions, states, and updates, database management systems can ensure data consistency and integrity by making changes to the database in a controlled and consistent manner. This helps prevent errors, conflicts, and inconsistencies in the data, which can lead to problems with data quality and reliability.

A.1.6 - Explain concurrency in a data sharing situation.
Concurrency: Concurrency in a data sharing situation refers to the ability of multiple users or applications to access and manipulate the same data simultaneously. In a shared data environment, concurrency can lead to conflicts, inconsistencies, and other issues if not managed properly.
Concurrency control: The process of managing concurrent access to data in order to maintain data consistency and integrity. This involves implementing mechanisms to prevent conflicts and inconsistencies, such as:
○ Locking: Uses locks to control access to data. When a user or application accesses a particular record, a lock is placed on that record, preventing other users or applications from accessing or modifying it until the lock is released.
○ Time-stamping: Assigns a unique timestamp to each transaction that accesses the database. If two transactions attempt to modify the same data, the system can use the timestamps to determine which transaction should be given priority.
○ Multi-version Concurrency Control (MVCC): Creates multiple versions of a data record so that multiple users or applications can access and modify the same data simultaneously. Each user or application sees a version of the data that reflects the state of the database at the time its transaction began.
Most database management systems support concurrent operations, with the common principle of keeping unsaved data in temporary log files to maintain data integrity and consistency (see the sketch below).
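To make the locking mechanism concrete, here is a minimal sketch of pessimistic locking inside a transaction. It assumes a dialect such as PostgreSQL or MySQL that supports SELECT ... FOR UPDATE, and a hypothetical accounts table that is not part of the course material.

```sql
BEGIN;  -- start a transaction

-- Lock the row for account 42: other transactions that try to lock or
-- modify this row will wait until this transaction finishes.
SELECT balance
FROM accounts
WHERE account_id = 42
FOR UPDATE;

-- Make the change while the lock is held.
UPDATE accounts
SET balance = balance - 100
WHERE account_id = 42;

COMMIT;  -- release the lock and make the change visible to other transactions
```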
Common problems with concurrency:
Dirty read: A dirty read occurs when a transaction accesses data written or updated by another, uncommitted transaction. For instance, if transaction T1 updates a stock brokerage account's stocks_tally from 5 to 8 and transaction T2 reads this value before T1 commits, T2 may commit with the incorrect value if T1 later fails and rolls back. This inconsistency arises from uncommitted data being accessed.
Lost data updates: Occur when two transactions operate on the same data item at the same time, so that the update made by one transaction is lost; the second transaction generally nullifies the update of the first. For example, in the transaction log below (consider B = 300 as the starting value), Transaction 1 subtracts 100 from B while Transaction 2 adds 300 to B:
○ At Tw, Transaction 1 computes 300 − 100; the result 200 is held temporarily but not yet written to the database.
○ At Tx, the database has not yet been updated with Tw's output, so Transaction 2 still reads 300; adding 300 gives Tx an output of 600.
○ At Ty, Transaction 1's output is written, so 200 is stored in the database.
○ At Tz, Transaction 2's output overrides the value written by Transaction 1, so Transaction 1's update is lost.
Unrepeatable read: This issue happens when a transaction reads a particular database variable two or more times but obtains a different value on each read.
○ Initially, the value of B is 50.
○ At Tx, transaction T1 reads the value of B.
○ At Ty, another transaction updates the value of B and commits (writes) it to the database.
○ As a result, transaction T1 reads a different value of B at Tz, producing a non-repeatable read.

A.1.7 - Explain the importance of the ACID properties of a database transaction.
Atomicity
Atomicity: The property of database transactions that ensures the entire transaction takes place at once or does not happen at all; there is no midway, i.e. transactions do not occur partially.
It involves the following operations:
○ Abort: If a transaction aborts, changes made to the database are not visible.
○ Commit: If a transaction commits, changes made are visible.
Also known as the "all or nothing" rule.
Consider a transaction consisting of two operations, T1 and T2, where 100 is taken from X and given to Y. If the transaction fails after the completion of T1, 100 has been taken from X but not given to Y, leaving the database in an inconsistent state. Atomicity ensures that in this event the transaction is aborted and no changes are reflected in the database.
Consistency
Consistency: Ensures that a database remains in a valid state before and after a transaction. It guarantees that any transaction will take the database from one consistent state to another, maintaining the rules and constraints defined for the data.
Example:
○ In the example above, the total amount before and after the transaction must be maintained.
○ Total before T occurs = 500 + 200 = 700.
○ Total after T occurs = 400 + 300 = 700.
○ Therefore, the database is consistent if T occurs in full.
○ Inconsistency occurs if T1 completes but T2 fails.
Isolation
Isolation: Ensures that multiple transactions can occur concurrently without leading to inconsistency of the database state. This is achieved by ensuring that each transaction is "isolated": the changes made by one transaction are not visible to any other transaction until the operation has been committed.
This property ensures that when multiple transactions run at the same time, the result will be the same as if they were run one after another in a specific order.
In the following example, let X = 500 and Y = 500, and suppose transaction T multiplies X by 100 (writing X = 50,000) and then subtracts 50 from Y, while transaction T'' computes the sum X + Y:
○ Suppose T has been executed up to Read(Y) and then T'' starts.
○ As a result, interleaving of operations takes place, due to which T'' reads the correct value of X but the incorrect (old) value of Y.
○ T'': X + Y = 50,000 + 500 = 50,500.
○ This is not consistent with the sum at the end of the transaction.
○ T: X + Y = 50,000 + 450 = 50,450.
○ This results in database inconsistency and an apparent loss of 50 units.
○ Transactions must therefore take place in isolation, and changes should only become visible after they have been committed.
Durability
Durability: Ensures that once a transaction has completed execution, its updates and modifications are written to disk so that the data persists. Even in the event of system failure, these updates remain permanent because they are stored in non-volatile memory; the effects of the transaction are never lost.
Advantages of ACID
Data consistency: ACID properties ensure that the data remains consistent and accurate after any transaction execution.
Data integrity: ACID properties maintain the integrity of the data by ensuring that any changes to the database are permanent and cannot be lost.
Concurrency control: ACID properties help to manage multiple transactions occurring concurrently by preventing interference between them.
Recovery: ACID properties ensure that in case of any failure or crash, the system can recover the data up to the point of failure or crash.
Disadvantages of ACID
Performance: The ACID properties can cause a performance overhead in the system, as they require additional processing to ensure data consistency and integrity.
Scalability: The ACID properties may cause scalability issues in large distributed systems where many transactions occur concurrently.
Complexity: Implementing the ACID properties can increase the complexity of the system and require significant expertise and resources.

A.1.8 - Describe the two functions databases require to be performed on them.
Note: Query functions and update functions.
Databases require two fundamental functions to be performed on them: query functions and update functions.
Query functions: Used to retrieve data from the database.
○ These functions allow users or applications to search for specific data or to retrieve a subset of data that meets certain criteria.
○ Common query functions include SELECT statements in SQL and find() functions in NoSQL databases.
○ Query functions allow users to perform various types of data analysis and reporting, such as sorting, grouping, filtering, and aggregating data.
○ They are essential for retrieving data from the database and for generating reports and insights.
Update functions: Used to modify the data in the database. These functions allow users or applications to add, update, or delete data.
○ Common update functions include INSERT, UPDATE, and DELETE statements in SQL, and save() and remove() functions in NoSQL databases.
○ Update functions are essential for maintaining the accuracy and integrity of the data in the database.
○ They allow users to make changes to the data, such as correcting errors, updating records, or deleting obsolete data.
○ Update functions must be used carefully to ensure that data consistency and integrity are maintained (a short sketch follows below).
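As a small sketch of the two kinds of function in SQL; the Students table and its columns are hypothetical examples, not from the syllabus.

```sql
-- Query function: retrieve a filtered, sorted subset of the data.
SELECT student_id, first_name, last_name
FROM Students
WHERE year_group = 12
ORDER BY last_name;

-- Update functions: add, change, and remove data.
INSERT INTO Students (student_id, first_name, last_name, year_group)
VALUES (1001, 'Ada', 'Lovelace', 12);

UPDATE Students
SET year_group = 13
WHERE student_id = 1001;

DELETE FROM Students
WHERE student_id = 1001;
```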
Databases require both query functions and update functions to be performed on them. Query functions are used to retrieve data from the database and allow for data analysis and reporting. Update functions are used to modify data in the database and ensure data accuracy and integrity. These two functions are essential for managing data effectively in a database system.

A.1.9 - Explain the role of data validation and data verification.
Note: Data validation and data verification are two important processes used to ensure the accuracy, completeness and consistency of data in a database system.
Data validation: The process of checking whether the data entered into a system is accurate, complete, and consistent with predefined rules and constraints. The purpose of data validation is to ensure that the data entered into the system is correct and can be used reliably.
○ Checking the data against a predefined set of values or rules.
○ For example, if recording a list of heights and someone's height is entered as 5 metres, data validation flags the value as outside the acceptable range.
○ Verifying that data is correctly formatted.
Data verification: The process of checking whether the data in the database is accurate, complete, and consistent with the original source. The purpose of data verification is to ensure that the data stored in the database is a true representation of the original data source.
○ Checking that the data matches the data from the original source.
○ Checking that data matches the expected values from a previous source.

II - RELATIONAL DATABASE MODEL

A.2.1 - Define the terms: database management system (DBMS) and relational database management system (RDBMS).
Database Management System: A Database Management System (DBMS) is a software system that allows users to create, define, manipulate and maintain one or more databases. It provides a way for organizations to store, organize and retrieve large amounts of data quickly, efficiently and in an organized manner.
Relational Database Management System: Software that organizes, stores, and manages data in structured tables with rows and columns, allowing relationships between data to be defined and manipulated using SQL, while ensuring data integrity and consistency.

A.2.2 - Outline the functions and tools of a DBMS.
Data organisation and management: A DBMS helps organisations to store and manage large amounts of data in a structured and organised manner, making it easier to find and retrieve the data as needed.
Data security and privacy: A DBMS provides a controlled environment for managing data, enabling organisations to enforce data security and privacy policies and ensure that sensitive data is protected.
Data consistency and integrity: A DBMS helps to ensure that the data stored in the database is accurate, consistent, and up-to-date, improving the quality of the data and supporting better decision making.
Data sharing and collaboration: A DBMS enables multiple users and applications to access and use the same data, improving collaboration and data sharing across the organisation.
Data analysis and reporting: A DBMS provides tools and functions for data analysis and reporting, enabling organisations to gain insights into their data and make informed decisions based on that data.

A.2.3 - Describe how a DBMS can be used to promote data security.
Encryption: A DBMS can support encryption mechanisms to protect data in transit and at rest.
Encryption can be used to ensure that data is transmitted securely over networks and stored securely on disk or in memory. This helps to prevent data theft and unauthorised access to data.
Audit trail: A DBMS can maintain an audit trail of all activities that occur in the system. The audit trail can record all changes to data, all login attempts, and other security-related events. This can help to detect and investigate security breaches or other incidents.
Backup and recovery: A DBMS can support backup and recovery mechanisms to protect against data loss or corruption. Backup mechanisms can be used to create copies of the database at regular intervals, while recovery mechanisms can be used to restore the database to a previous state in the event of a system failure, data loss, or other problems.
Data masking: A DBMS can support data masking techniques that protect sensitive data by replacing it with fictitious data. This can be useful in situations where sensitive data is being used for testing, training, or other purposes where the original data is not required.

A.2.4 - Define the term schema.
Schema: A schema refers to the logical structure of a database, which defines the organization and relationships among the data elements or objects within the database.
It can be thought of as a blueprint or plan for the database, which specifies the types of data that can be stored, the relationships between different types of data, and the constraints or rules that govern the data.
A database schema typically consists of a set of tables, which represent the different entities or objects within the database, along with their attributes or fields.
Example: A database schema for a customer database
○ includes tables for customers, orders, and products, along with fields for each table such as customer name, order date, and product price.
○ The schema would define the relationships between these tables, such as the fact that each order is associated with a particular customer and product.
Key points of a database schema:
○ It defines how data is logically organized, including tables, fields, and relationships.
○ It outlines the relationships between entities, such as primary and foreign keys.
○ It helps resolve issues with unstructured data by organizing it in a clear, structured way.
○ Database schemas guide how data is accessed, modified, and maintained.

A.2.5 - Identify the characteristics of the three levels of the schema: conceptual, logical, physical.
Conceptual Schema
Conceptual schema: A high-level representation of the database that defines the structure and organization of the data stored in the database. It provides a consolidated view of the data across the organization, abstracting away the details of the physical storage and processing of the data.
Example relationships:
○ Each PERSON may be the vendor in one or more ORDERS.
○ Each ORDER must be from one and only one PERSON.
○ PERSON is a sub-type of PARTY (meaning that every instance of PERSON is also an instance of PARTY).
○ Each EMPLOYEE may have a supervisor who is also an EMPLOYEE.
The conceptual schema defines the entities, attributes, and relationships between the entities, providing a semantic model of the data. It is typically used as a bridge between the business requirements and the physical implementation of the database, providing a common understanding of the data for both the business and technical stakeholders.
The conceptual schema serves as the foundation for the logical schema.
Logical Schema
Logical schema: A logical database schema defines the logical structure of the data, including tables, views, relationships, and integrity constraints. It describes how data is organized in tables and how the attributes of these tables are connected. The logical schema ensures that the data is stored in an organized manner while maintaining data integrity.
Using Entity-Relationship (ER) modeling, the logical schema outlines the relationships between different data components. It also defines integrity constraints to ensure the quality of data during insertion and updates.
This schema represents a higher level of abstraction compared to the physical schema, focusing on logical constraints and how the data is structured, without dealing with the physical storage details.
The logical schema provides a bridge between the business requirements and the physical implementation of the database, enabling organizations to maintain a consistent and well-organized view of their data even as the physical implementation evolves over time.
By defining the logical structure of the database, the logical schema helps organizations ensure that the data stored in their databases is accurate and up-to-date, supporting better decision making and improving the overall quality of the data.
Physical Schema
Physical schema: A physical schema defines how data is stored in the storage system, including the arrangement of files, indices and other storage structures. It specifies the actual code and syntax needed to create the database structure. Essentially, it determines where and how the data is stored in the physical storage medium.
The database administrator decides the storage locations and organization of data within the storage blocks. This schema represents the lowest level of abstraction.

A.2.6 - Outline the nature of the data dictionary.
Definition: A data dictionary is a centralised repository that stores metadata, which is information about the structure, organization, and attributes of the data stored in a database. It serves as a reference for database administrators, developers, and end-users to understand the data's format, relationships, constraints, and usage.
The nature of the data dictionary can vary depending on the specific database management system being used, but it typically includes the following types of information:
○ Data element descriptions: A description of each data element or attribute used in the database, along with information such as the data type, length, and format of the element.
○ Table and relationship descriptions: Descriptions of the tables in the database, as well as the relationships between the tables.
○ Constraints: Information such as data validation rules, default values, and other constraints.
○ Data access permissions: The access permissions required to view or modify data in the database.
○ Database management information: The version of the software being used, the server configuration, and other technical details.
Integrated data dictionary
○ An integrated data dictionary acts as a system directory that is accessed and updated by the relational database.
○ Active: When any changes are made to the database, the active data dictionary is automatically updated by the DBMS. It is also known as a self-updating dictionary because it continuously updates its data.
○ Passive: Unlike active dictionaries, passive dictionaries must be updated manually when there are changes in the database. This type of data dictionary is harder to manage because it requires deliberate upkeep; otherwise, the database and the data dictionary will fall out of sync.
Stand-alone data dictionary
○ This type of data dictionary is very adaptable because it gives the administrator in charge of the confidential information complete autonomy to define and manage all crucial data.
There is no standard format for data dictionaries. Here are some common elements:
○ Data elements: The data dictionary describes each data element by specifying its name, data type, storage format and validation rules.
○ Tables: All information about a table, such as the user who created it, the number of rows and columns, and the date the table was created and populated.
○ Indexes: Indexes for database tables are stored in the data dictionary. For each index, the DBMS stores the index name, attributes, location, properties and creation date.
○ Programs: Applications defined for database access, reports, application formats and screens, SQL queries, etc. are also stored in the data dictionary.
○ Relationships between data elements: A data dictionary stores relationship types; for example, whether a relationship is mandatory or optional, and the nature of the relationship and connection.
○ Administration and end users: The data dictionary stores all administrative and end-user data.

A.2.7 - Explain the importance of a data definition language in implementing a data model.
Definition of DDL: A data definition language (DDL) is a set of commands or statements used to define and manipulate the structure of a database. A DDL is used to create and modify tables, indexes, constraints, and other database objects, and to specify the relationships between these objects.
Examples of DDL commands (in SQL) include CREATE, ALTER and DROP (and, in many systems, TRUNCATE).
Use cases:
○ Creating tables and relationships: To create the tables and relationships that make up a database. The DDL specifies the structure and attributes of each table.
○ Enforcing data integrity: A DDL can also be used to specify constraints that ensure the integrity of the data in the database.
○ Facilitating database management: A DDL can be used to modify the structure of a database as needed. For example, a DDL can be used to add new tables or fields to a database, or to modify existing fields or relationships.
○ Supporting data security: A DDL can also be used to specify access permissions for different users or groups of users.

A.2.8 - Explain the importance of data modelling in the design of a database.
Data modeling: The process of creating a structured representation of the data to be stored, including its organization, relationships, and constraints, typically using diagrams such as Entity-Relationship (ER) diagrams.
Data modeling is a critical step in the design of a database because it allows developers to create a blueprint of the database structure and the relationships between the data elements. The importance of data modeling in the design of a database can be explained as follows:
○ Data consistency and accuracy: By ensuring that data is organised consistently and accurately, a data model reduces the risk of data inconsistencies and errors.
○ Efficiency: Improves the efficiency of a database by reducing data redundancy and improving data retrieval speed.
A data model helps to identify and eliminate data redundancy, ensuring that data is stored only once in the database.
○ Flexibility: Improves flexibility and the ability to adapt to changing business needs. A data model can be updated and modified easily to accommodate new requirements.
○ Collaboration: Helps to facilitate collaboration between developers, database administrators, and other stakeholders involved in the design of the database.
○ Maintainability: Improves the maintainability of a database. A data model provides a clear understanding of the database structure, which helps ensure that changes to the database can be made easily and without impacting other areas of the database.

A.2.9 - Define the following database terms: table, record, field, primary key, secondary key, foreign key, candidate key, composite primary key, and join.
Key database terms
Basic components
Table: The basic structure in a relational database that organises data into rows and columns.
Record (row): A single entry in a table that contains a complete set of information.
Field (column): A single piece of information within a record.
Key types
Primary key: A unique identifier for each record in a table.
Secondary key: An alternate field that can be used to identify records, but is not the primary key.
Foreign key: A field that creates a relationship between two tables by referencing the primary key of another table.
Candidate key: A field or combination of fields that could potentially serve as a primary key.
Composite primary key: A primary key that consists of two or more fields combined.
Joins
A join is an operation that combines rows from two or more tables based on a related column between them. This allows data to be retrieved from multiple tables in a single query.

A.2.10 - Identify the different types of relationships within databases: one-to-one, one-to-many, and many-to-many.
One-to-One (1:1)
Each record in Table A is related to exactly one record in Table B, and vice versa.
Example: Each person has one passport number, and each passport number belongs to one person.
One-to-Many (1:N)
Each record in Table A can be related to multiple records in Table B, but each record in Table B is related to only one record in Table A.
Example: One customer can place many orders, but each order belongs to only one customer.
Many-to-Many (M:N)
Records in both tables can be related to multiple records in the other table.
Requires an intermediate junction table to implement.
Example: Students can take multiple courses, and courses can have multiple students.

A.2.11 - Outline the issues caused by redundant data.
Data redundancy occurs when the same information is stored in multiple places within a database. While some redundancy might be intentional for performance optimisation, uncontrolled redundancy can cause several significant issues:
Storage inefficiency is the most immediate impact of data redundancy. When the same information is stored multiple times, it unnecessarily consumes storage space and increases the costs associated with data storage and backup operations.
Data inconsistency becomes a major risk when redundant data exists. When information needs to be updated, there is a risk that it might be changed in some locations but not others, leading to contradictory information within the database. For example, if a customer's address is stored in multiple tables and only gets updated in some of them, the database will contain conflicting information about where that customer lives (see the sketch below).
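To make the address example concrete, here is a minimal hypothetical sketch in SQL (the table and column names are invented): a denormalized table repeats the customer's address on every order row, so an update that touches only some rows leaves the data contradictory, whereas storing the address once avoids this.

```sql
-- Redundant design: the address is duplicated on every order a customer places.
CREATE TABLE OrdersDenormalized (
    order_id         INTEGER PRIMARY KEY,
    customer_name    VARCHAR(100),
    customer_address VARCHAR(200),   -- same value repeated across many rows
    order_date       DATE
);

-- Updating the address on only one of the customer's rows creates inconsistency:
UPDATE OrdersDenormalized
SET customer_address = '12 New Road'
WHERE order_id = 1;                  -- the customer's other orders still hold the old address

-- Normalized alternative: the address is stored exactly once.
CREATE TABLE Customers (
    customer_id      INTEGER PRIMARY KEY,
    customer_name    VARCHAR(100),
    customer_address VARCHAR(200)
);

CREATE TABLE Orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES Customers(customer_id),
    order_date  DATE
);
```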
Update anomalies are another serious consequence of data redundancy. These come in three main forms:
1. Insertion anomalies: New information cannot be added to the database without including complete data for all redundant copies.
2. Deletion anomalies: Removing one piece of information might inadvertently delete other important data that was stored alongside it.
3. Modification anomalies: Changes to data must be made consistently across all copies, or the database becomes inconsistent.

A.2.12 - Outline the importance of referential integrity in a normalized database.
Referential integrity is a fundamental database concept that ensures the relationships between tables remain consistent. It is essentially a set of rules that maintain the validity of relationships between tables in a relational database.
At its core, referential integrity ensures that if one table references another table through a foreign key, that reference must always be valid. This means:
A foreign key must always match an existing primary key in the referenced table.
You cannot delete a record that has related records in other tables (unless you specify cascading deletes).
You cannot modify a primary key value if that record is referenced by other tables.
The importance of referential integrity in a normalized database cannot be overstated. It:
Prevents orphaned records (records that reference non-existent data).
Maintains data consistency across related tables.
Enforces business rules and data accuracy.
Helps maintain data quality and reliability.
Ensures that relationships between tables remain valid throughout database operations.

A.2.13 - Describe the differences between 1st Normal Form (1NF), 2nd Normal Form (2NF) and 3rd Normal Form (3NF).
Database normalization involves organizing data to minimize redundancy and dependency by dividing larger tables into smaller ones and defining relationships between them. The three main normal forms are:
First Normal Form (1NF): A table is in First Normal Form when:
Each table cell contains a single value (atomic value).
There are no repeating groups.
All entries in a column are of the same type.
Each record has a unique identifier (primary key).
For example, a customer table where each cell contains only one piece of information (not a list or set of values) would be in 1NF.
Second Normal Form (2NF): A table is in Second Normal Form when:
It is already in 1NF.
All non-key attributes are fully functionally dependent on the primary key.
No partial dependencies exist (where a field depends on only part of a composite key).
For example, in an order details table with {OrderID, ProductID} as a composite key, a product's price depends only on ProductID, i.e. on part of the key. This partial dependency violates 2NF, so the price should be moved to a separate products table.
Third Normal Form (3NF): A table is in Third Normal Form when:
It is already in 2NF.
There are no transitive dependencies.
All fields depend directly on the primary key.
No field depends on another non-key field.
For example, in an employee table, the employee's department's budget should not be stored in the employee table because it depends on the department, not directly on the employee.

A.2.14 - Describe the characteristics of a normalized database.
A properly normalized database exhibits several important characteristics that contribute to its efficiency and reliability:
Data independence: Each piece of data is stored in exactly one place (except for foreign keys). This means that when you need to update a piece of information, you only need to do it in one place.
Minimal redundancy: While some controlled redundancy exists through foreign keys, a normalized database eliminates unnecessary data duplication. This makes the database more efficient in terms of storage and easier to maintain.
Logical data organization: Data is organized in a way that reflects the natural relationships between entities in the real world. This makes the database structure more intuitive and easier to understand.
Data integrity: The normalized structure helps maintain data integrity by:
Eliminating update anomalies.
Ensuring consistency across related data.
Reducing the risk of data corruption.
Making it easier to enforce business rules.
Flexible query support: Despite being broken down into smaller tables, a normalized database supports complex queries through table joins, allowing related data to be retrieved efficiently.

A.2.15 - Evaluate the appropriateness of the different data types.
The selection of appropriate data types is crucial for database efficiency and data integrity. Here is an evaluation of common data types and when to use them:
Numeric data types:
Integer types are ideal for whole numbers and should be used for quantities, counts, and identification numbers. They come in various sizes:
TINYINT: For very small numbers (-128 to 127).
INT: For standard whole numbers.
BIGINT: For very large whole numbers.
Decimal types (DECIMAL/NUMERIC) should be used for exact decimal numbers, especially in financial calculations where precision is crucial. They store exact decimal values and are well suited to:
Currency amounts.
Precise measurements.
Calculations requiring exact precision.
Floating-point types (FLOAT/DOUBLE) are appropriate for approximate decimal numbers, especially in scientific calculations where exact precision isn't required but the ability to handle very large or very small numbers is important.
Text data types:
CHAR should be used for fixed-length strings where the length is consistent, such as:
State abbreviations.
Postal codes.
Fixed-length codes.
VARCHAR is more appropriate for variable-length strings, offering better storage efficiency for:
Names.
Addresses.
Descriptions.
Email addresses.
Temporal data types:
DATE should be used for calendar dates without time components.
TIME is appropriate for time-of-day values without dates.
DATETIME or TIMESTAMP should be used when both date and time components are needed.
Special data types:
BOOLEAN should be used for true/false values.
BLOB or similar binary types should be used for storing files or images.
ENUM should be used when there is a fixed list of possible values.

A.2.16 - Construct an entity-relationship diagram (ERD) for a given scenario.
An Entity-Relationship Diagram (ERD) is a visual representation of the relationships between entities in a database. Here is how to construct one effectively:
Step 1: Identify entities
First, identify all the major entities in your system. These are typically nouns in your system requirements. For example, in a school database:
Student
Teacher
Course
Class
Step 2: Define attributes
For each entity, identify its attributes. These are the properties that describe the entity.
For example, a Student entity might have:
StudentID (Primary Key)
FirstName
LastName
DateOfBirth
EmailAddress
Step 3: Establish relationships
Determine how entities relate to each other and mark the relationships with appropriate symbols:
One-to-One: Use a single line with marks on both ends.
One-to-Many: Use a line with a crow's foot on the "many" end.
Many-to-Many: Use crow's feet on both ends.
Step 4: Add cardinality
Specify the exact nature of relationships:
Mandatory vs. optional relationships.
Minimum and maximum numbers of related entities.

A.2.17 - Construct a relational database to 3NF using objects such as tables, queries, forms, reports and macros.
Creating a database in Third Normal Form (3NF) involves several key steps and components:
Tables:
Create separate tables for each entity.
Define primary keys for each table.
Establish foreign key relationships between tables.
Ensure all columns depend directly on the primary key.
Queries:
Create SELECT queries to retrieve data.
Implement INSERT, UPDATE, and DELETE queries for data manipulation.
Design complex queries for data analysis.
Forms:
Design input forms for data entry.
Create search forms for data retrieval.
Implement validation rules in forms.
Reports:
Design formatted reports for data presentation.
Include calculations and summaries.
Add grouping and sorting options.
Macros:
Automate common tasks.
Create custom commands.
Implement data validation rules.

STUDENT_COURSES (not normalized):

| StudentID | StudentName | StudentEmail | Course | CourseRoom | TeacherName | TeacherPhone | Grade |
|-----------|-------------|--------------|---------|------------|-------------|--------------|-------|
| 1 | John Smith | [email protected] | Math | Room 101 | Ms. Jones | 555-0101 | A |
| 1 | John Smith | [email protected] | Science | Room 102 | Mr. Davis | 555-0102 | B |
| 2 | Mary Brown | [email protected] | Math | Room 101 | Ms. Jones | 555-0101 | B+ |
| 3 | Tom Wilson | [email protected] | English | Room 103 | Mrs. White | 555-0103 | A- |

To get this to 3NF, we break it into these tables:

STUDENTS:

| StudentID | StudentName | StudentEmail |
|-----------|-------------|--------------|
| 1 | John Smith | [email protected] |
| 2 | Mary Brown | [email protected] |
| 3 | Tom Wilson | [email protected] |

TEACHERS:

| TeacherID | TeacherName | TeacherPhone |
|-----------|-------------|--------------|
| T1 | Ms. Jones | 555-0101 |
| T2 | Mr. Davis | 555-0102 |
| T3 | Mrs. White | 555-0103 |

COURSES:

| CourseID | CourseName | CourseRoom | TeacherID |
|----------|------------|------------|-----------|
| C1 | Math | Room 101 | T1 |
| C2 | Science | Room 102 | T2 |
| C3 | English | Room 103 | T3 |

ENROLLMENTS:

| StudentID | CourseID | Grade |
|-----------|----------|-------|
| 1 | C1 | A |
| 1 | C2 | B |
| 2 | C1 | B+ |
| 3 | C3 | A- |

Now each table serves a specific purpose:
1. STUDENTS holds student information.
2. TEACHERS holds teacher information.
3. COURSES holds course information.
4. ENROLLMENTS tracks which students are in which courses and their grades.
This structure ensures:
No redundant data (each piece of information is stored once).
Data integrity (relationships are maintained through IDs).
Easy updates (changing a teacher's phone number only needs to be done in one place).

A.2.18 - Explain how a query can provide a view of a database.
A query creates a virtual table or "view" by selecting specific data from the database.
It can combine information from multiple tables into a single view.
The view shows only the data you need, hiding unnecessary complexity.
Views can be used to:
Show summary information (like total sales per region).
Combine related data from different tables (like student grades across all courses).
Filter data based on specific conditions (like showing only active accounts).
Present data in a more understandable format for end users.

A.2.19 - Describe the difference between a simple and complex query.
Simple queries:
Usually involve only one table.
Use basic operations like SELECT, INSERT, UPDATE, DELETE.
Have straightforward WHERE conditions.
Don't use joins or subqueries.
Example: "Show all students in grade 10."
Typically used for basic data retrieval or updates.
Complex queries:
Involve multiple tables using JOINs.
Use aggregate functions (COUNT, SUM, AVG).
Include subqueries or nested queries.
Have multiple conditions and groupings.
Use advanced features like HAVING clauses.
Perform calculations or data transformations.
Example: "Show average grades per course for students who have above 80% attendance."

A.2.20 - Outline the different methods that can be used to construct a query.
A query can be constructed through several different methods, each suited to different users and situations:
1. Visual query builders
These are graphical interfaces where users can create queries by dragging and dropping elements. They are ideal for beginners and provide a visual way to understand table relationships. Common examples include Microsoft Access Query Designer and phpMyAdmin.
2. Direct SQL coding
This involves writing SQL commands directly in the SQL language. It offers the most flexibility and control, making it the preferred choice for experienced developers and complex queries. While it requires knowledge of SQL syntax, it allows for the most precise and powerful query construction.
3. Query By Example (QBE)
This method uses a grid-based interface where users fill in example values and conditions. It is a middle ground between visual builders and direct SQL coding, making it suitable for users with some database knowledge who are not SQL experts.
4. Report generators
These tools focus on creating queries specifically for reporting purposes. They often include templates and formatting options, making them ideal for business users who need to create regular reports from database data.
The choice of method depends on factors such as:
The user's technical expertise.
The complexity of the required query.
Time constraints.
Performance requirements.
The specific database system being used.

III - FURTHER ASPECTS OF DATABASE MANAGEMENT

A.3.1 - Explain the role of a database administrator.
The main role of a database administrator (DBA) is to design, implement, maintain and manage the organisation's databases. These responsibilities may vary depending on the size and complexity of the organisation.
- Design and implementation: DBAs are responsible for the design and implementation of the database architecture (hardware and software), the logical relationships between the data entities (ERD), and security and access control.
- Maintenance and performance tuning: DBAs are responsible for maintaining the integrity of the database and ensuring performance and availability (monitoring performance metrics, tuning the database for optimal performance, and performing regular backups and disaster recovery operations).

A.3.2 - Explain how end-users can interact with a database.
Database administrators (DBAs):
Use specialized tools (command-line, GUI, web interfaces).
Manage backups, user permissions, performance monitoring.
Focus on database maintenance and security.
Internal employees:
Access through role-specific interfaces.
Use customized forms, dashboards, reports.
Example: sales reps accessing customer data and orders.
External customers:
Interact via web/mobile applications.
Access personal accounts, orders, tracking.
Use simplified, user-friendly interfaces.
Each group gets tailored interfaces matching their needs and permissions.

A.3.3 - Describe different methods of database recovery.
1. System log (transaction log)
Records all database changes with timestamps and user information.
Enables undoing/redoing changes.
Used for error recovery and restoration.
2. Deferred update
Changes are held in memory until the commit point.
Batch writes to disk improve performance.
May cause temporary data availability delays.
3. Mirroring
Creates a duplicate database on a separate server.
Enables quick recovery after failures.
Complex to maintain but provides high availability.

A.3.4 - Outline how integrated database systems function.
1. Data collection
Gathers data from multiple sources (databases, apps, files).
Consolidates it into a single database/warehouse.
2. Data integration
Transforms diverse data into a unified format.
Enables consistent access and analysis.
3. Data cleansing
Fixes errors and duplicates.
Ensures accuracy and consistency.
4. Data storage
Centralizes data in an optimized database/warehouse.
Uses specialized storage technologies.
5. Data access
Provides tools for queries, visualization, reporting.
Offers APIs/web services for programmatic use.
This creates a single, reliable data source that users can access through various interfaces.

A.3.5 - Outline the use of databases in areas such as stock control, police records, health records, employee data.
Stock control: Databases are used in stock control systems to manage inventory levels and track sales. The database contains information about each product, including its SKU, description, price, and quantity on hand. When a sale is made, the database is updated to reflect the change in inventory levels. Reports can be generated from the database to help with forecasting, ordering, and reordering.
Police records: Databases are used in police records systems to store and manage information about crimes, suspects, and victims. The database contains details such as the location and date of the crime, the type of crime, and any evidence or witness statements. This information can be used to identify patterns, track suspects, and solve crimes.
Health records: Databases are used in healthcare systems to store and manage patient health records. The database contains information such as the patient's name, age, medical history, diagnoses, medications, and test results. This information can be used by healthcare providers to make informed decisions about treatment, monitor patient progress, and provide better care.
Employee data: Databases are used in human resources systems to store and manage employee data. The database contains information such as the employee's name, address, contact information, job title, salary, and benefits. This information can be used to manage payroll, track performance, and provide benefits to employees.

A.3.6 - Suggest methods to ensure the privacy of the personal data and the responsibility of those holding personal data not to sell or divulge it in any way.
To ensure the privacy of personal data and the responsibility of those holding it, technological and human methods are used in unison.
Technological methods:
Data encryption: Data encryption is the process of converting data into a coded form to protect its confidentiality. Encryption can be used to protect data in transit or data at rest.
Access controls: Access controls are security measures that restrict access to data based on the user's identity, role, or permissions. This can include password protection, multi-factor authentication, and role-based access control.
Secure data storage: Data should be stored in a secure location, such as a server room with access control and surveillance cameras. In addition, data backups should be stored offsite in a secure location.
Regular security audits: Regular security audits should be conducted to ensure that data is protected from unauthorised access and that all security controls are functioning as intended.
Human methods:
Employee training: Employees should be trained on the importance of data privacy, as well as the policies and procedures for protecting personal data. Training should include how to handle personal data securely and how to detect and report any privacy breaches.
Access controls: Access controls should also be enforced for human users, including restricting access to personal data on a need-to-know basis and providing training on how to handle sensitive data.
Background checks: Employees who have access to personal data should undergo background checks to ensure that they have a trustworthy background.
Privacy policies and notices: Privacy policies and notices should be created and made available to all employees, customers, and partners. These should clearly state the organization's privacy practices and procedures.

A.3.7 - Discuss the need for some databases to be open to interrogation by other parties (police, government, etc).
Third parties such as the police or medical services may need to interrogate database systems for a variety of reasons, including:
Criminal investigations: Law enforcement agencies may need to interrogate database systems to gather evidence in criminal investigations. This may involve accessing records of suspects, victims, or witnesses.
Medical emergencies: Medical services may need to interrogate database systems to access medical records in cases of emergency. This can help medical professionals make informed decisions about treatment and care.
Compliance and regulations: Some industries, such as finance and healthcare, are subject to strict regulations and compliance requirements. Third parties may need to interrogate database systems to ensure that organisations are complying with these regulations.

A.3.8 - Explain the difference between data matching and data mining.
Data matching:
Compares multiple datasets to find matches/duplicates.
Used for data integration, fraud detection, identity verification.
Example: a bank matching customer records with government data.
Data mining:
Analyzes a single large dataset to find patterns/relationships.
Uses statistical/machine-learning algorithms.
Used for business intelligence, marketing, research.
Example: a retailer analyzing purchase patterns.
Key distinction: Matching focuses on finding duplicates across datasets, while mining uncovers patterns within a single dataset.

IV - HL EXTENSION

A.4.1 - Describe the characteristics of different database models.
A.4.2 - Evaluate the use of object-oriented databases as opposed to relational databases.
A.4.3 - Define the term data warehouse.
A.4.5 - Describe a range of situations suitable for data warehousing.
A.4.6 - Explain why data warehousing is time dependent.
A.4.7 - Describe how data in a warehouse is updated in real time.
A.4.8 - Describe the advantages of using data warehousing.
A.4.9 - Explain the need for ETL processes in data warehousing.
A.4.10 - Describe how ETL processes can be used to clean up data for a data warehouse.
A.4.11 - Compare the different forms of discovering patterns using data mining.
A.4.12 - Describe situations that benefit from data mining.
A.4.13 - Describe how predictive modelling is used.
A.4.14 - Explain the nature of database segmentation.
A.4.15 - Explain the nature and purpose of link analysis.
A.4.16 - Describe the process of deviation detection.

🌠 Key Vocabulary