Document Details

PreciseJuxtaposition4490

Uploaded by PreciseJuxtaposition4490

Dr. Ohene-Bonsu Simmons

Tags

database types database design relational databases data management

Summary

This document provides an introduction to various database types, focusing on the characteristics of relational databases. It also discusses the advantages and disadvantages of relational databases and the role they play in data management. The document includes a wide range of relevant topics within database design and database management systems.

Full Transcript

Topic 2: VARIOUS TYPES OF DATABASES AND PHYSICAL DATABASE DESIGN Learning outcomes By the end of this topic, you should be able to: 1. identify the various categories of database - electronic 2. physical database design - traditional 3. unde...

Topic 2: VARIOUS TYPES OF DATABASES AND PHYSICAL DATABASE DESIGN Learning outcomes By the end of this topic, you should be able to: 1. identify the various categories of database - electronic 2. physical database design - traditional 3. understand database role against the traditional approach LEARNING 4.OUTCOMES file systems vs. database 5. understand some database terms 2. VARIOUS TYPES OF DATABASES AND PHYSICAL DATABASE DESIGN 2.1 identify the various categories of databases Database consists of various types and each is required according to a user case or corporate structure for handling data and information. More specifically, the various types reflect the varied needs of applications and the type of data they process or handle. The various types are developed by vendors to suit different applications or cases. Thus, each database is developed to optimise performance of a particular case. The availability of different types of databases is in line with corporates’ needs for data structures, access patterns, scalability demands, consistency requirements, and ad hoc requirements. The figure below shows some types of databases including the four (4) major categories. The major types consist of four (4), namely; relational, hierarchical, network, and object-oriented systems. Below provides details of major types including other selected options. Database Design and Mgt. Topic 2 by Dr. Ohene-Bonsu Simmons, Ph.D. Page 1 2.1.1 Relational Databases Relational database is the popular type and stores data and information in tables structured into rows and columns. For each table (table datasheet), row represents a unique record, and column represents a specific attribute of that record. The tables below were extracted from a club database where multiple users can work with each table at a time. The table approach is similar to the normal spreadsheet for capturing and handling data. Each row in a table represents a distinct record, while column shows characteristic of an entity such as member #, first_name, and location. The main power of relational databases is the ability to link several tables together using relationships, such as registration (Table1) and payment (Query2). These relationships are established through foreign keys when related tables are connected for users to link data from various sources. This linkage creates a unified view of information to suit organisational needs. Most organisations use relational databases to store their operational data because of the technology associated with that system. Relational databases are relatively accessible and easily related to a wide range of data management applications. Programmers’ Tool in Relational Databases - Structured query language (SQL) Structured Query Language (SQL) is a backend language that supports programmers, database administrator or software engineers to manage relational database management systems (RDBMS). Thus, every computer programmer needs to have a basic knowledge of SQL. This SQL is a powerful language that enables experts in databases management systems to create, query, insert, update, and delete data or record(s). It also supports experts to perform complex operations such as joining data from multiple tables and queries. The structured nature of SQL ensures data integrity and consistency through atomicity, consistency, isolation, and durability (ACID) properties. The details are explained below. Atomicity: All operations within a particular transaction are treated as a single unit, which ensure that either all changes are committed or none. Consistency: Data in database remain in a valid state throughout any transaction, which adhere to predefined constraints and rules. Isolation: Every transaction is executed independently as if it is the only operation happening on the database. Durability: Once a transaction is performed or committed, its changes are permanent, even in the event of systems failure. Applications of relational databases Relational databases are recommended for the following actions: ✓ Strong consistency: Ensuring all users see the same data simultaneously. ✓ Complex queries: Joining data from multiple tables to gain insights. ✓ ACID compliance: Guaranteeing reliable transaction processing for critical applications. Database Design and Mgt. Topic 2 by Dr. Ohene-Bonsu Simmons, Ph.D. Page 2 Not suitable for the following approaches: ✓ Unstructured data: Handling or managing data that doesn't fit neatly into a tabular format (e.g., social media posts, sensor data). ✓ Massive scalability: When your application needs to scale horizontally across numerous servers. Popular relational databases Some of the popular relational databases that support organisation to store data and information are listed below: ✓ MySQL: Open-source and known for its ease of use, speed, and reliability, often used in web applications. ✓ PostgreSQL: Open-source and highly extensible, offering advanced features and strong compliance with SQL standards. ✓ Oracle Database: A comprehensive, enterprise-grade solution known for its performance, scalability, and security. ✓ Microsoft SQL Server: Tightly integrated with the Microsoft ecosystem, offering a wide range of tools for business intelligence and analytics. 2.1.2 NoSQL Databases Not Only (No) SQL (NoSQL) database is a powerful type of database like relational database. These databases are best used in scenarios where flexibility, scalability, and high performance are paramount. Unlike the relational databases, NoSQL systems can handle unstructured or semi-structured data without the constraints of a fixed schema. This means experts can design the table in various formats to capture or store data, such as documents, key-value pairs, and graph structures, without defining the structure for input record. These databases often provide features to scale out across multiple servers and clusters, which make them suitable for distributed data environments. Querying NoSQL databases Unlike relational databases, which use SQL, NoSQL databases don't have a universal query language. Instead, each type of NoSQL database typically has its unique query language or API (application platform interface) tailored to its specific data model and structure. Applications of NoSQL databases NoSQL databases are particularly well-suited for scenarios where: ✓ Agility is key: Rapid development cycles and evolving data models. ✓ Scale is a priority: Applications with exponential data growth or high traffic. ✓ Performance matters: Real-time applications requiring fast read/write operations. ✓ Variety is the norm: Diverse data types (e.g., social media posts, sensor data). Common use cases include the following: ✓ Big data analytics: Processing massive datasets. ✓ Real-time applications: Delivering up-to-the-minute information. ✓ Content management systems: Storing and managing diverse content. ✓ Internet of Things (IoT): Handling or managing continuous data streams. ✓ Personalisation engines: Tailoring user experience. While offering significant advantages, NoSQL databases might not be ideal for applications requiring strong transactional guarantees or complex relational queries. Many organisations adopt a hybrid approach, using both relational and NoSQL databases to leverage their respective strengths. Database Design and Mgt. Topic 2 by Dr. Ohene-Bonsu Simmons, Ph.D. Page 3 2.1.3 Cloud Databases Cloud databases have revolutionised data management by leveraging the vast resources and scalability of cloud computing platforms. These databases reside on remote servers and are accessed over the Internet; eliminating the need for organisations to invest in and maintain their own hardware and infrastructure. They operate on a pay-as-you-go model, where organisations only pay for the resources, which they actually use for their routing activities. This technique eliminates the upfront costs and ongoing maintenance expenses associated with traditional on-premises databases. Cloud providers handle the following elements of digital infrastructure: servers, storage, and networking. This strategy enables organisations (users) to focus on building and managing their applications. Querying cloud databases Querying cloud databases involve using the same tools and languages as done with the off-line systems. The powerful tool the experts use to interact with the data is SQL, which is the same as the relational databases. Unlike NoSQL databases, using such systems for data interactions have their own query languages or APIs. Cloud providers often offer additional tools and services to simplify database management and querying. These might include web-based consoles and command-line interfaces depend on the programming languages associated with those systems. Applications of cloud databases Cloud databases are an excellent choice for managing data when the systems are positioned for the following: ✓ Scalability is crucial: Easily adapt to changing demands. ✓ Flexibility is a priority: Wide range of database options available. ✓ Global accessibility is important: Low- latency access for users worldwide. ✓ Cost-effectiveness is a concern: Pay-as-you-go model and scalable resources. Popular cloud databases The leading cloud providers offer a range of database services, each with its own strengths and specialties: ✓ Amazon RDS: Supports multiple database engines such as MySQL, PostgreSQL, and Oracle, offering managed relational database services. ✓ Google Cloud SQL: A fully-managed service that allows running MySQL, PostgreSQL, and SQL Server databases in the cloud. ✓ Azure SQL Database: Provides scalable, intelligent, and fully-managed database services in the Microsoft Azure cloud. ✓ Oracle cloud: Example includes live score of football matches 2.1.4 Vector Database Vector databases deal with special features or tool for handling the unique demands of AI and machine learning applications. These systems are designed to store, index, and manage vector embeddings, which are high-dimensional data representations often used in machine learning models. This strategy enables efficient similarity search, where the database can quickly identify areas (vectors or paths) that are "close" to a given query. The features make the vector databases suitable for applications such as image recognition, recommendation systems, and natural language processing. Querying vector databases The application of these databases in the context of querying involves the following steps: 1. Embedding the query: The input query (e.g., an image, a piece of text) is converted into a vector embedding using an appropriate embedding model. 2. Similarity search: The vector database performs a similarity search to find the nearest neighbours of the query embedding in the vector space. This is often done using approximate nearest neighbour (ANN) algorithms to ensure efficiency at scale. Database Design and Mgt. Topic 2 by Dr. Ohene-Bonsu Simmons, Ph.D. Page 4 3. Returning results: The database returns the identified nearest neighbours along with their associated metadata or original data objects. Applications of vector databases Vector databases are particularly well-suited for scenarios concerning the following: ✓ Similarity search is critical: Applications like image recognition or recommendation systems. ✓ High-dimensional data is involved: Inefficient for traditional databases. ✓ Real-time performance is required: AI applications like recommender systems. Popular vector databases ✓ Faiss: Developed by Facebook AI Research, it provides efficient similarity search and clustering of dense vectors. ✓ Milvus: An open-source vector database that supports scalable similarity search and AI applications. ✓ Pinecone: A vector database service that simplifies the deployment and scaling of similarity search in production environments. 2.1.5 Object-oriented databases Object-oriented databases (OODBs) store data as objects, similar to object-oriented programming. This can simplify modeling complex data structures and relationships. However, OODBs have not gained widespread adoption due to challenges with standardisation and query optimisation. Popular options include ObjectDB and Versant Object Database. 2.1.6 Graph databases Graph databases excel at representing and querying relationships between entities. They store data as nodes (entities) and edges (relationships), making them well-suited for social networks, recommendation engines, fraud detection systems, and knowledge graphs. Popular options include Neo4j, Amazon Neptune, and JanusGraph. 2.1.7 Hierarchical databases Hierarchical databases organise data in a tree-like structure, with parent-child relationships between records. This structure is suitable for some specialised applications but can be inflexible for complex data models. These systems are less common in contemporary applications. 2.1.8 Network databases Network databases are similar to hierarchical databases but allow for more complex relationships between records. While they offer flexibility, they can also be more challenging to manage and query. These systems have largely been replaced by relational and graph databases in most applications. 2.1.9 Ranking of Databases Management Systems - Engines The ranking is based on the popularity of the product. The top four (4) databases as of 2024 are all relational architecture oriented and ranked as follow Oracle, MySQL, Microsoft SQL, and PostgreSQL. The figure below illustrates the evidence of the ranking. Database Design and Mgt. Topic 2 by Dr. Ohene-Bonsu Simmons, Ph.D. Page 5 2.2 physical database design – traditional The physical database is the manual approach of keeping related files in folders at one location or in a cabinet. In our own home or office, we probably have some sort of filing system or cabinet, which contains receipts, guarantees, invoices, bank statements, and other documents. We always use manual approach for searching through several stages when something is needed. Alternatively, indexing system may be used, which helps to locate what we want more quickly. In fact, the physical or traditional database is not appropriate to handle contemporary data and information where users may want to access their profiles or records in real-time across the globe. Thus, in this era of digitalization where users need smart activities. The physical database works well when the number of items to be stored is small. It even works quite adequately when there are large numbers of items and we have only to store and retrieve them. For example, a typical real estate agent’s office might have a separate file for each property for sale or rent, each potential buyer and renter, and each member of staff. Clearly physical database is inadequate for this’ type of work. Some experts also consider flat file systems as physical database due to its methodology for handle data. The key focus of physical database design is on performance in terms of efficiency and simplicity. The steps taken for this system is to ensure that all key functions perform well and simple to implement. Physical database design involves processes for producing a description of the implementation of a database using a defined manual approach. The process includes data and information capturing, storage structures, access methods, and security mechanism. 2.3 understand database role against the traditional approach A database approach refers to the use of a database management system (DBMS)/software to efficiently and systematically organize, store, retrieve, and manage data. This approach is in contrast to the traditional database where data is stored in separate locations or files without a centralized management system. The DBMS approach offers several advantages over the traditional approach. Below provides that roles of DBMS as advantages of the traditional approach: i. Data Integrity: In a database approach, data integrity is maintained through various constraints (such as primary keys, foreign keys, and check constraints) defined in the database schema. This ensures that the data is accurate and consistent. Database Design and Mgt. Topic 2 by Dr. Ohene-Bonsu Simmons, Ph.D. Page 6 ii. Data Independence: The database approach provides a higher level of abstraction, separating the physical storage details from the logical data structure. Thus, DBMS can provide an abstract view of the data to isolate application code from such details. This means that changes to the database schema do not affect the applications using the data, providing data independence. iii. Data Security: Databases offer security features, such as access controls and user privileges, to restrict unauthorized access to sensitive information. This enhances data security compared to a file- based system where securing individual files can be challenging. iv. Concurrent Access and Transactions: Databases support concurrent access to data by multiple users or applications. Transactions ensure that operations on the data are atomic, consistent, isolated, and durable (ACID properties), which is crucial for maintaining data integrity in a multi-user environment. v. Data Redundancy and Normalisation: - The database approach aims to minimize data redundancy through normalization, a process that organizes data to eliminate duplicate information. This reduces the chances of inconsistencies and anomalies in the data. vi. Data Retrieval and Query Language: Databases provide a powerful query language (e.g., SQL) that allows users to retrieve and manipulate data in a flexible and efficient manner. This is more convenient than the file-based approach, where custom code might be needed for each data retrieval operation. vii. Scalability: Databases are designed to handle large amounts of data and scale with growing data requirements. This scalability is achieved through various optimization techniques and indexing. viii. Data Relationships: In a database, relationships between different entities can be defined using foreign keys, enabling the establishment of connections between related pieces of information. This supports the representation of complex data relationships. ix. Backup and Recovery: Databases offer mechanisms for backup and recovery to safeguard against data loss. This is crucial for maintaining data consistency and availability. x. Centralized Management: A database provides a centralized management system, allowing for easier administration, monitoring, and maintenance of the data. When several users share the data, centralizing the administration of data can offer significant improvements. This is more efficient than managing multiple files scattered across different directories or locations. 2.4 File Systems vs. Database 2.4.1 File Systems The file system (flat file database) is basically a way of arranging the files in a storage medium such as hard disk and pen drive. Any software system that stores groups of records in a separate file. This approach is the oldest for capturing, storing, and managing files. A file actually consists of records. Thus, a file encompasses zero (0) or more records treated as a unity or any thing on the disk meaningful to the user. It does not organise the data but rather stores it as a “single table.” It is the simplest way to store and access data on a computer without the use of a software program to coordinate and manage the data It is a collection of application programs that perform services for the end-users such as students’ reports for the academic office and lecturers’ report for the dean’s office. Each program defines and manages its own data. Examples include Excel book and Word document. File systems also consist of different files which are grouped into folders or directories. The directory is an empty space created purposely to store files. The directories further contain other sub-folders and files. This system performs basic operations such as creating and saving. The figures below show file hierarch and approaches for managing files and directories. Database Design and Mgt. Topic 2 by Dr. Ohene-Bonsu Simmons, Ph.D. Page 7 File Systems Student registration file Payment processing Course processing processing (flat file) (flat file) (flat file / appli. prg.) File systems Student details Payment details Course registration File systems storage method for a university students management system Limitations of FPS ✓ Data is separated and Isolated ✓ Data is often duplicated ✓ Application programs are dependent on file format ✓ It is difficult to represent complex objects using file processing systems ✓ Limited data sharing Advantages of File Systems ✓ Enforcement of development and maintenance standards. Helps reduce redundancy ✓ Avoid file management inconsistencies and maintain data isolation integrity. A solid theoretical foundation (for relational models). ✓ More efficient and less costly than a DBMS in certain situations. ✓ File handling design is simpler than database design. ✓ Uniform data management procedure ✓ Application programmers are not exposed to the details of data representation and storage. A DBMS uses various powerful features to store and retrieve data efficiently. Provides data integrity and security. Disadvantages of File System ✓ Each application has its own data file, so you may need to record and save the same data multiple times. Database Design and Mgt. Topic 2 by Dr. Ohene-Bonsu Simmons, Ph.D. Page 8 ✓ It can keep records for large companies with a large number of items. A lot of work was required. ✓ It can keep records for large companies with a large number of items. ✓ Data dependencies in file handling systems are data dependent, but the problem is incompatibility with file formats. Time-consuming. ✓ Limited Data Sharing. ✓ Security issue. 2.4.2 DBMS (Database Management Systems) Database Management System is basically software that manages the collection of related files. It is used for storing large volume of data and retrieving them effectively when needed. It also provides End Users proper security measures for protecting the data from unauthorized access Database Applications Examples are Oracle, MySQL, and MS SQL Database Mgt. server. Systems Advantages of DBMS ✓ Provides data integrity and security ✓ Uniform data management procedure ✓ Application programmers are not exposed to the details of data representation and storage. ✓ DBMSs have different techniques for storing and retrieving data. ✓ A DBMS uses various powerful features to store and retrieve data efficiently. ✓ DBMS implies integrity constraints to maintain a high level of protection against unauthorized access to data. ✓ Reduce application development time ✓ Reduced redundancy. ✓ Provides data independence. Other advantages of DBMS are ✓ Efficient data access: A DBMS utilizes a variety of sophisticated techniques to store and retrieve data efficiently. This feature is especially important if the data is stored on external storage devices. ✓ Data Consistency and Integrity - by controlling access and minimizing data duplication ✓ Application program independence - by storing data in a uniform fashion ✓ Data Sharing - by controlling access to data items, many users can access data concurrently ✓ Backup and Recovery ✓ Security and Privacy ✓ Multiple views of data Disadvantages of DBMS ✓ Most database management systems are complex and require user training to use the DBMS. ✓ Some data may be lost when many users use the same program simultaneously. ✓ DBMS hardware and software costs are very high and add to an organisation’s budget. ✓ DBMS cannot perform advanced calculations. ✓ Datasets are larger as they provide more predictable query response times. ✓ A fast processor for data processing was required. ✓ Power failures can cause databases to fail and even bring down the entire system. Database Design and Mgt. Topic 2 by Dr. Ohene-Bonsu Simmons, Ph.D. Page 9 2.1.1 Difference between File System and DBMS A file system and a DBMS are two kinds of data management systems that are used in different capacities and possess different characteristics. The approach of for organizing and managing data is the main difference between the two systems. File systems are used to manage files and folders (directories), and provide basic operations for creating, deleting, renaming, and accessing files. They typically store data in a hierarchical structure, where files are organized in directories and subdirectories on a storage media. File systems are simple and efficient, but they lack the ability to manage complex data relationships and ensure data consistency. DBMS is a more elaborate software application that is solely charged with the responsibility of managing large amounts of structured data. DBMS provides a centralized and organized way of storing data, which can be accessed and modified by multiple users or applications. DBMS offers advanced features such as data validation, indexing, transactions, and backup and recovery mechanisms. DBMS ensures data consistency, accuracy, and integrity by enforcing data constraints, such as primary keys, foreign keys, and data types. It provides functionalities such as query, index, transaction, and data integrity. Although the file system serves well for the purpose of data storage for applications where data is to be stored simply and does not require any great organization. DBMS is more appropriate for applications where data need to be stored and optimized for organizational and structural needs. The table below shows more details of the two approaches for handle records or data and information. In terse, the key difference between File Systems and DBMS ✓ A file system is a collection of data, and you must write procedures to manage it, whereas a DBMS is a collection of data, and you do not have to write procedures for working with databases. ✓ A file system is a software that manages and organizes files on storage media, and a DBMS is a software application used to access, create, and manage databases. ✓ A file system does not have a crash recovery mechanism, but a DBMS does. ✓ File systems do not support complex transactions, but DBMS can quickly implement complex transactions using SQL. ✓ A DBMS provides concurrency, but a file system does not. ✓ Data inconsistency is high in file systems. In contrast, data inconsistencies in DBMS are occasional. Table for File System vs. DBMS Basics File System DBMS The file system is a way of arranging the DBMS is software for managing the Structure files in a storage medium within a database. computer. Redundant data can be present in a file Data Redundancy In DBMS there is no redundant data. system. Backup and It doesn’t provide Inbuilt mechanism for It provides in house tools for backup Recovery backup and recovery of data if it is lost. and recovery of data even if it is lost. There is no efficient query processing in Efficient query processing is there in Query processing the file system. DBMS. There is less data consistency in the file There is more data consistency because Consistency system. of the process of normalization. It is less complex as compared to It has more complexity in handling as Complexity DBMS. compared to the file system. Database Design and Mgt. Topic 2 by Dr. Ohene-Bonsu Simmons, Ph.D. Page 10 File systems provide less security in DBMS has more security mechanisms Security Constraints comparison to DBMS. as compared to file systems. It has a comparatively higher cost than Cost It is less expensive than DBMS. a file system. In DBMS exists, mainly of two types: Data Independence There is no data independence. 1) Logical Data Independence. 2)Physical Data Independence. Multiple users can access data at a User Access Only one user can access data at a time. time. The users are not required to write The user has to write procedures for Meaning procedures. managing databases Data is distributed in many files. So, it Due to centralized nature data sharing Sharing is not easy to share data. is easy It gives details of storage and Data Abstraction It hides the internal details of Database representation of data Integrity Constraints are difficult to Integrity constraints are easy to Integrity Constraints implement implement To access data in a file, user requires Attributes attributes such as file name and file No such attributes are required. location. Nb: - File systems are suitable for managing small amounts of unstructured data - DBMS is designed for managing large amounts of structured data -It offers more advanced features for ensuring data integrity, security, and performance. Mini case study The need for using DBMS is based on the routine activities of the organisation. ABC company has a large collection of data on employees, departments, products, sales, and other functions that require over 100 GB storage capacity. The employees are across the world and need to access some parts of the data currently for their routine activities, which can update the company’s database in real-time. 2.5 understand some database terms The database approach separates the structure of the data from the application programs and this approach is known as data abstraction. An entity(source document, set) is a specific object (for example a department, place, or event) in the organisation that is to be represented in the database An attribute is a property that explains some characteristics of the object that we wish to record. A relationship is an association between entities An entity is a specific object (for example a department, place, or event) in the organisation that is to be represented in the database. An attribute(field) is a property that explains some characteristics of the object that we wish to record. A relationship is an association between entities (e.g Name, ID, Address) A relationship is an association between entities(e.g one-to-one, Registration and Department files) Relation=table=file. Tuple=record=row The database approach separates the structure of the data from the application programs and this approach is known as data abstraction. Database Design and Mgt. Topic 2 by Dr. Ohene-Bonsu Simmons, Ph.D. Page 11 Superkey is a column, or combination of columns that uniquely identifies a row within a relation. Candidate key is described as a Superkey without redundancies. Primary key is the candidate key that is selected to uniquely identify rows within the relation. (A good candidate for a primary key has several characteristics. First, it uniquely identifies each row. Second, it is never empty or null — it always contains a value. Third, it rarely (ideally, never) changes. Access uses primary key fields to quickly bring together data from multiple tables.) Foreign key is an attribute or a set of attributes in one table whose values must match the candidate key of another relation. Or when a primary key valid in two or more tables Composite key when a table has two or more primary keys Database Design and Mgt. Topic 2 by Dr. Ohene-Bonsu Simmons, Ph.D. Page 12

Use Quizgecko on...
Browser
Browser