Database Systems Architecture PDF
Document Details
Uploaded by UnrealCarnelian9420
Tags
Summary
This document provides an overview of database systems architecture, covering various aspects from schemas and instances to three-tier architecture. It includes a detailed discussion of different database models such as hierarchical, network, and relational.
Full Transcript
Chapter Two Database Systems Architecture 1 Database Management Systems Database schema and Instance Databases change over time as information is inserted and deleted. The...
Chapter Two Database Systems Architecture 1 Database Management Systems Database schema and Instance Databases change over time as information is inserted and deleted. The collection of information stored in the database at a particular moment is called an instance of the database. The overall design of the database is called the database schema. Schemas are changed infrequently, if at all Database systems have several schemas, partitioned according to the levels of abstraction. The physical schema describes the database design at the physical level, while the logical schema describes the database design at the logical level. 2 Database Management Systems A database may also have several schemas at the view level, sometimes called subschemas, that describe different views of the database. The logical schema is by far the most important, in terms of its effect on application programs, since programmers construct applications by using the logical schema. The physical schema is hidden beneath the logical schema, and can usually be changed easily without affecting application programs. Note : Application programs are said to exhibit physical data independence if they do not depend on the physical schema, and thus need not be rewritten if the physical schema changes 3 Database Management Systems The Three Levels of DBS Architecture System architecture is the conceptual model that defines the structure, behavior, and more views of a system. The term "database architecture" refers to the structural design and methodology of a database system, which forms the core of a Database Management System (DBMS). This architecture dictates how data is stored, organized, and retrieved, playing a crucial role in the efficiency and effectiveness of data management. The Architecture of most of commercial DBMS available today is mostly based on ANSI-SPARC database architecture. ANSI SPARC THREE-TIER architecture has three main levels: Internal Level Conceptual Level External Level 4 Database Management Systems Using these three levels, it is possible to use complex structures at internal level for efficient operations and to provide simpler convenient interface at external level. Internal level: it is also known as physical level. It describes how the data are actually stored on storage devices. It provides internal view of physical storage of data. It deals with complex low level data structures, file structures and access methods in detail. It also deals with data compression and encryption techniques, if used. 5 Database Management Systems Conceptual level: It is also known as Logical level. This is the next higher level than internal level of data abstraction. It describes What data are stored in the database and What relationships exist among those data. It hides low level complexities of physical storage. Community view of the database. Application developers also work on this level. 6 Database Management Systems External Level: It is also known as a view level. This is the highest level of data abstraction. It describes only part of the entire database that concerns end users. End users need to access only part of the database rather than entire database. Different user need different views of database. And so, there can be many view level abstractions of the same database. 7 Database Management Systems ANSI-SPARC Three-Level Architecture 8 Database Management Systems Advantages of Three-tier Architecture The main objective of it is to provide data abstraction. Same data can be accessed by different users with different customized views. A user’s view is immune to changes made in other user’s views. The user is not concerned about the physical data storage details. Physical storage structure can be changed without requiring changes in conceptual structure of the database as well as users view. Conceptual structure of the database can be changed without affecting end users. 9 Database Management Systems Data Independence Main Concept is upper layers are immune to changes in the lower layers Logical Data Independence Refers to immunity of external schemas to changes in conceptual schema. Conceptual schema changes (e.g. addition/removal of entities) should not require changes to external schema or rewrites of application programs. Physical Data Independence Refers to immunity of conceptual schema to changes in the internal schema. Internal schema changes (e.g. using different file organizations, storage structures/devices) should not require change to conceptual or external schemas. 10 Database Management Systems Schema Mapping- Provision for Data Independence ANSI-SPARC has three Layers of Describing Organizational Data The DBMS is responsible for mapping between these three types of schema. External/Conceptual Mapping This enables the DBMS to map names in the user’s view on to the relevant part of the conceptual schema. Conceptual/Internal Mapping This enables the DBMS to find the actual record or combination of records in physical storage that constitute a logical record in the conceptual schema, together with any constraints to be enforced on the operations for that logical record. 11 Database Management Systems ANSI-SPARC Three-Level Architecture and Data Independence 12 Database Management Systems Client/Server Architecture Database system architecture refers to the way components of the database system relate to each other to meet desired functionality of the system A single-user DB system is designed to be installed on a single computer and can only be accessed by only one user who is currently logged in at a time. A multi-user DB system, on the other hand, can be accessed by multiple users simultaneously and can be scaled to handle large volumes of data and users with a higher performance and security. There are Common architectures that are used to implement multi-user database management systems like Teleprocessing, File-server, and Client–server 13 Database Management Systems Mainframe Architecture (Teleprocessing) is a traditional multi-user system architecture that have single mainframe and multiple (dumb)terminals. This system can enables terminals to access application programs ,DBMS and database directly. But this architecture places a tremendous burden on the central computer, which not only had to run the application programs and the DBMS, but also had to carry out a significant amount of work on behalf of the terminals (such as formatting data for display on the screen). 14 Database Management Systems File-Server Architecture A file-server is a computer that is connected to several workstations in a network and mainly serves as a shared storage. Database resides on file-server. DBMS and applications run on each workstation. The applications and the DBMS run on each workstation requesting files from the file- server when necessary. The processing is distributed over the network. Workstation (application and DBMS) must request data before they can get database. 15 Database Management Systems Some disadvantages Heavy network traffic High total costs of ownership: maintain a full instance of the DBMS on each client (workstation). Complex integrity, concurrency, and recovery control: Multiple DBMSs may concurrently access the same shared file. 16 Database Management Systems Client Server Architecture To overcome the disadvantages of the first two approaches and accommodate an increasingly decentralized business environment, the client–server architecture was developed. As the name suggests, there is a client process, which requires some resource, and a server, which provides the resource. There is no requirement that the client and server must reside on the same machine. Client : is any end user device, program or user that requests access to a service provided by a server in a client-server architecture. End-user devices typically include desktop computers, laptops and smartphones. A server is a computer program or device that provides and manages, access to a centralized resource or service to another device , program or end user in a network There are a two tier and three tier client server architectures. 17 Database Management Systems Two-Tier Architecture: In the most common implementation of a two tier client server architecture the three components of an application (presentation, processing, and data) are divided among two software tiers, client application code and database server. Presentation is handled exclusively by the client, processing is split between client and server, and data is stored on and accessed via the server. transmitting client requests to the server are essential for a two-tier implementation Disadvantages The performance degrades when the number of users increases. it is difficult to implement reliable security as users need to have login information for every database server. 18 Database Management Systems Two-Tier Architecture 19 Database Management Systems Three-Tier Architecture: The tree tier architecture attempts to overcome some of the limitations of the two-tier scheme by separating presentation, processing, and data into separate, distinct software tiers. In a three-tier architecture, the application or process logic is situated in the middle-tier, completely independent from the user interface(client environment ) and the data(the DBMS's server environment). These systems are generally more adaptable, secure, scalable and they can seamlessly integrate data from diverse sources When the presentation client requires calculations or data access, a call is made to a middle tier functionality server. This tier can perform calculations or can make requests as a client to data servers. Middle-tier functionality servers may be multi-threaded and can be accessed by multiple clients, even those from separate applications. 20 Database Management Systems Three-Tier Architecture 21 Database Management Systems Three Tier Client Server Architecture has three part: Presentation Tier (first tier) Presentation tier is the user interface and communication layer of the application, where the end user interacts with the application. Its main purpose is to display information to and collect information from the user. This top-level tier can run on a web browser or as desktop application as an interface with the following tasks o Presentation of data o Basic input validation o Send request to server and visualize results 22 Database Management Systems Application tier/ Logic tier/middle tier The application tier, also known as the logic tier or middle tier, is the heart of the application with the following tasks o Business logic o Data Processing Logic In this tier, information that is collected in the presentation tier is processed , sometimes against other information in the data tier ,using business logic, a specific set of business rules. The application tier can also add, delete, or modify data in the data tier. The application tier is typically developed by using Python, Java etc.. and communicates with the data tier by using API calls. 23 Database Management Systems Data tier The data tier, sometimes called database tier, data access tier or back-end, is where the information that is processed by the application is stored and managed. This tire has the following tasks o Basic Data Validation o Manage (concurrent) database access (data services) This can be a relational database management system such as Microsoft SQL Server , PostgreSQL, MySQL etc or in a NoSQL Database server such as MongoDB or Cassandra. In a three-tier application, all communication goes through the application tier. The presentation tier and the data tier cannot communicate directly with one another. 24 Database Management Systems Advantages of three tire architecture Reduced costs for thin clients due to lower resource requirements because application running in web browser Application logic is centralized in a single application server Increased modularity: easier to replace one tier without affecting the other tier Load Balancing become easier with a clear separation between the core business logic and the database functionality It is the most powerful Architecture than the others because it offers quick access to the data from its specialized function from every server and it is possible to replace one tier without affecting the other tier so it is so easy to make a maintenance. 25 Database Management Systems Difference between Two and Three Tier Database Architectures Parameters Two-Tier Database Architecture Three-Tier Database Architecture It primarily consists of three layers - the It primarily comprises two layers - the Number of Tiers Data Layer, the Business Layer, and the Data Tier and the Client Tier. Client Layer. A two-tier DB architecture embeds the A three-tier DB architecture places the application logic either within the server process or application logic in the middle- Placement of Application Logic database, the client (within the UI), or tier, making it distinct from the Client/User both. Interface and the Data Interface. A two-tier DB architecture is relatively A three-tier DB architecture is relatively Construction and Maintenance simpler to build and maintain. complex to construct and maintain. Operational Speed It operates at a relatively slower speed. It operates at a relatively faster speed. The three-tier DB architecture prevents The two-tier DB architecture allows direct direct communication between its clients Security communication between the client and the and database, making it more secure in the database, making it less secure. long run. The three-tier DB architecture experiences The two-tier DB architecture experiences a a performance drop when operated over Performance Degradation performance drop with an increase in user the Internet. However, it generally count. performs better than its two-tier 26 Database Management Systems counterpart. Reality, Data, and Metadata The real world will be referred to as reality. Data collected about people, places, or events in reality will eventually be stored in a file or database. To understand the form and structure of the data, information about the data itself is required. The information that describes data is referred to as metadata. Metadata are data about the data in the file or database. Metadata is data that describes other data.... For example, author, date created, date modified and file size are examples of very basic document metadata 27 Database Management Systems Reality, data, and metadata 28 Database Management Systems An entity is an object or concept in the real world, that is distinguishable from all other objects ,about which an organization needs to collect data. An entity may be a PERSON, STUDENT ,COURSE etc… An entity has a set of descriptive characteristic called attributes possessed by each member of an entity and the values for some set of properties may uniquely identify an entity occurrence. A record consists of data items /fields, each data item describing an attribute of the entity. Metadata describe the length and composition of each of the records and also a describe the name given and the length assigned to each data item. Every database stores every information about its objects about its structure, definition, purpose, storage, number of columns and records, dependencies, access rights, owner etc. 29 Database Management Systems Collection of these metadata is stored in the Data Dictionary or System Catalog. Usually, system catalogs are accessed by the DBMS to perform various transactions and data dictionary has the user accessible views that are accessed by the developers/ designers/ users. In general data dictionary /system catalog stores: names, types, and sizes of data items; Default field values if exist ; constraints on the data; names of authorized users; data items accessible by a user and the type of access; usage statistics. Details cannot be modified by the user. Database Management Systems 30 Data Models A collection of conceptual tools for describing data, data relationships, data semantics, and consistency constraints Database Model determines the logical structure of a database and fundamentally determines in which manner data can be stored, organized and manipulated. Used to show how data is stored, connected, accessed and updated in the database management system Data Model comprises: a structural part; a manipulative part; possibly a set of integrity rules. 31 Database Management Systems There are three types of Models ( In-line with ANSI-SPARC) 1. An external data model, to represent each user’s view of the organization, sometimes called the Universe of Discourse (UoD); 2. A conceptual data model a, to represent the conceptual (or community) view that is DBMS independent; 3. An internal data model, to represent the logical schema in such a way that it can be understood by the DBMS. External models Represent the user’s view of the database Contain multiple different external views Are closely related to the real world as perceived by each user 32 Database Management Systems Conceptual models Defines WHAT the system contains Provide flexible data-structuring capabilities Present a “community view” as the conceptual structure of the entire database Support all user views. Show relationships among data including: Constraints Semantic information (e.g., business rules) Security and integrity information Consider a database as a collection of entities of various kinds Are the basis for identification and high-level description of main data objects; they avoid details 33 Database Management Systems Offers Organization-wide coverage of the business concepts. This type of Data Models are designed and developed for a business audience. Note : Entities : object or any item about which the data should be captured and stored objects that exists separately from other objects and have a clear identity of their own. Conceptual modelling is process of developing a conceptual data model which is a complete and accurate representation of an organization’s data requirements The conceptual model is developed independently of hardware specifications like data storage capacity, location or software specifications like DBMS vendor and technology. The focus is to represent data as a user will see it in the "real world." 34 Database Management Systems The purpose of creating a conceptual data model is to establish entities, their attributes, and relationships. In this data modeling level, there is hardly any detail available on the actual database structure. Example : Entity-Relationship Model Note Conceptual models are also referred to as logical models in some literature. However there is a distinction between conceptual and logical data models. The conceptual model is independent of all implementation details, whereas the logical data model adds further information to the conceptual data model elements.. 35 Database Management Systems Internal models (logical Models ) Defines HOW the system should be implemented regardless of the DBMS. Consider a database as a collection of fixed-size records Are closer to the physical level or file structure Designed and developed independently from the DBMS. Data attributes will have datatypes with exact precisions and length. Normalization processes to the model is applied typically till 3NF. Are a representation of the database as seen by the DBMS. Require the designer to match the conceptual model’s characteristics and constraints to those of the selected implementation model The three best-known Internal models are the relational data model, the network data model and the hierarchical data model. 36 Database Management Systems Physical models Describes HOW the system will be implemented using a specific DBMS system Are the physical representation of the database Have the lowest level of abstractions Describes how the data is stored and deal with Run-time performance Storage utilization and compression File organization and access methods Data encryption Are the physical level – managed by the operating system (OS) Provide concepts that describe the details of how data are stored in the computer’s memory 37 Database Management Systems Data models can also be categorized as Object-based Models Record-based Models Physical Models Object-Based Data Models based on the concept of Entity (distinct object) Entity-Relationship- Considers only the data aspect Object-Oriented- considers both data and behaviour. 38 Database Management Systems Record-Based Data Models (Hierarchical, Network and Relational) based on fixed format records Each record has fixed number of fields Each field is of a fixed length (Number of Characters) Physical Data Models Models for describing physical storage characteristics 39 Database Management Systems Entity–Relationship model An Entity–relationship model (ER model) is a design or blueprint of a database that can later be implemented as a database. ER model is a high-level conceptual data model ER model helps to systematically analyze data requirements to produce a well-designed database. ER model describes the structure of a database with the help of a diagram, which is known as Entity Relationship Diagram (ER Diagram). ERD is a diagram that displays the relationship of entity sets stored in a database. ER diagrams are created based on three basic concepts: entities, attributes and relationships. 40 Database Management Systems Object-oriented Model Uses the E-R modeling as a basis but extended to include encapsulation and inheritance In this model, data is stored in the form of objects. Objects have both state and behavior State is defined by attributes Behavior is defined by methods (functions or procedures) Designer defines classes with attributes, methods, and relationships Classes are a grouping of all objects with the same properties and behaviors Pointers are addresses that facilitate both object access and establishing relationships between objects. Class constructor method creates object instances Each object has a unique object ID Classes related by class hierarchies Both conceptual-level and logical-level model 41 Database Management Systems Advantages It can easily support complex data structures, with relationships. It also supports features like Inheritance, Encapsulation, etc. Object-oriented databases provide us code reusability, real world modelling, and improved reliability and flexibility. Disadvantages There is no universally defined data model for an OODBMS, and most models lack a theoretical foundation. In comparison to RDBMSs the use of OODBMS is still relatively limited. The system is more complex than that of traditional DBMSs. 42 Database Management Systems Hierarchical Database Model In hierarchical model data is organized into a sort of upside down tree like structure with a single root(top node), to which all the other data is linked Each record other than the root is having one parent record and many children. A record is a collection of fields, with each field containing only one value. Record type is referred to as node or segment A parent node can have more than one child node A child node can only have one parent node The relationship between parent and child is one-to-many Relation is established by creating physical link between stored records (implemented as pointer) 43 Database Management Systems A hierarchical database consists of a collection of records which are connected to one another through links which is an association between precisely two records. The main drawback of this model is that, it can have only one to many relationships between nodes. To add new record type or relationship, the database must be redefined and then stored in a new form. And also it might be challenging to significantly accommodate changes in the data structure, changing the database design. The schema for a hierarchical database consists of boxes, which correspond to record types and lines, which correspond to links 44 Database Management Systems 45 Database Management Systems Advantage Promotes data sharing Parent / data relationship promotes conceptual simplicity and data integrity Database security Efficient with 1:many relationships Disadvantage Complex implementation Difficult to manage and lack of standards Changes in structure require changes in all application programs Implementation limitations Requires knowledge of physical data storage characteristics Navigation system requires knowledge of hierarchical path 46 Database Management Systems Network Data model The network database model was created to solve the shortcomings of the hierarchical database model. A child can be linked to multiple parents, a feature that was not supported by the hierarchical data model. The parent nodes are known as owners and the child nodes are called members. Is a database model that allows multiple records to be linked to the same owner file.. The multiple linkages allows the network database model to be very flexible. A network data models sees records as set members Each set has an owner and one or more members Allows many to many relationship between entities(Parent and Child) Like hierarchical model network model is a collection of physically linked records. Allow member records to have more than one owner 47 Database Management Systems Network Model 48 Database Management Systems Advantages of Network Model The network model can support many to many relationships Disadvantages of Network Model The network model is much more complicated than the Hierarchical model. As such, it is difficult to handle and maintain. Not all relations can be handled by assigning them in the form of owners and members. The structure of the Network Model is quite complicated and so the programmer has to understand it well in order to implement or modify it. In general Hierarchical and Network models are Navigational and procedural approach to data processing Need to know the physical Database to access the data Treat records as individual objects linked with pointers i.e. cannot process in sets 49 Database Management Systems Relational Data Model The relational model uses a collection of tables to represent both data and the relationships among those data. Each table has multiple columns, and each column has a unique name. Can define more flexible and complex relationship Viewed as a collection of tables called “Relations” equivalent to collection of record types Stores information or data in the form of tables rows and columns A row of the table is called tuple equivalent to record A column of a table is called attribute equivalent to fields Data value is the value of the Attribute Records are related by the data stored jointly in the fields of records in two tables or files. The related tables contain information that creates the relationship 50 Database Management Systems NoSQL Models Data Models can be also categorized as Relational(SQL) and non relational (NoSQL) NoSQL stands for "Not Only SQL," offers a more flexible, non-relational approach, ideal for handling unstructured or dynamic data Instead of storing data in fixed rows and columns, document databases use flexible documents. The NoSQL database model have dynamic schemas that supports an unstructured style of storing data. NoSQL database models include document oriented , graph based , column oriented , and key- value A document is a record in a document database. A document typically stores information about one object and any of its related metadata. 51 The documents look more like JSON(JavaScript Object Notation) strings or Key-value based object representations A collection is a group of documents. Collections typically store documents that have similar contents. Not all documents in a collection are required to have the same fields, because document databases have a flexible schema. Documents can be created without defining their structure at first. It does provide features like indexing, relationships between data, etc. This database model is well-suited for Big data applications, real-time analytics, CMS (Content Management systems), etc. NoSQL database examples include MongoDB and Cassandra 52 Database Management Systems Advantages of the NoSQL Model This database model functions with high performance. The NoSQL database model can handle large volumes of data. An intuitive data model that is fast and easy for developers to work with. Flexible schema: Document databases have a flexible schema, meaning that not all documents in a collection need to have the same fields A flexible schema that allows for the data model to evolve as application needs change. Distributed and resilient: Document databases are distributed, which allows for horizontal scaling (typically cheaper than vertical scaling) and data distribution Querying through an API or query language: Document databases have an API or query language that allows developers to execute the CRUD operations on the database. But NoSQL databases don't support ACID (atomicity, consistency, isolation, durability) transactions across multiple documents. 53 Database Management Systems Database Management System (DBMS) DBMS is a software consisting of multiple integrated components that deliver a consistent and managed environment that enables users to define, create, maintain, and control access to the database. Some of the components are Storage engine, Metadata catalog, Database language, Query processor, optimization engine, log managers Storage engine. This basic element of a DBMS is used to store data. The DBMS must interface with a file system at the operating system (OS) level to store data. Metadata catalog. Sometimes called a system catalog or database dictionary, a metadata catalog functions as a repository for all the database objects that have been created. he metadata catalog can include information about database objects, schemas, programs, security, performance, communication and other environmental details about the databases it manages. Database language. The DBMS also provide an API to access the data, typically in the form of a database language to access and modify data but may also be used to create database objects and secure and authorize access to the data. 54 Database Management Systems Database Languages A database system provides a data definition language to specify the database schema , data control language to defines privileges granted to database users and a data manipulation language to express database queries and updates. The data definition, data control and data manipulation languages are not separate languages; instead they simply form parts of a single database language, such as the widely used Structured Query Language( SQL language) Data-Definition Language (DDL) A database schema is specified by a set of definitions expressed by a special language called a data-definition language (DDL). 55 Database Management Systems CREATE TABLE Persons ( PersonID int, LastName varchar(30), FirstName varchar(30), City varchar(40) ); Execution of the above DDL statement creates the Persons table. In addition, it updates a special set of tables called the data dictionary or System catalog. 56 Database Management Systems Database Manipulation Languages(DML) Data manipulation is The insertion of new information into the database The retrieval of information stored in the database The modification of information stored in the database The deletion of information from the database A data-manipulation language (DML) is a language that enables users to access or manipulate data as organized by the appropriate data model. 57 Database Management Systems There are basically two types of DML Declarative DMLs (also referred to as nonprocedural DMLs) require a user to specify what data are needed without specifying how to get those data Procedural DMLs require a user to specify what data are needed and how to get those data. Declarative DMLs are usually easier to learn and use than are procedural DMLs. Since a user does not have to specify how to get the data, the database system has to figure out an efficient means of accessing data. SQL is a set-based, declarative programming language, T-SQL(Transact-SQL) is a procedural language used by Microsoft in SQL Server. It is the extension of SQL and adds declared variables, transaction control, error and exception handling, and row processing to SQL. Database Management Systems 58 The portion of a DML that involves information retrieval is called a query language. A query is a statement requesting the retrieval of information. Although technically incorrect, it is common practice to use the terms query language and data manipulation language synonymously SELECT Persons. FirstName, Persons. LastName FROM Persons WHERE Persons. City=‘Addis Ababa’; Execution of the above DML results a view of persons First and Last Name who lives is Addis Ababa 59 Database Management Systems DCL (Data Control Language): defines the privileges granted to database users. GRANT: to allow specified users to perform specified tasks/Grant user access privileges REVOKE: to cancel previously granted or denied permissions/Removes user access privileges Transaction control language (TCL) Transaction control language (TCL) manages the transactions within a database. Transactions group a set of related tasks into a single, executable task. All the tasks must succeed in order for the transaction to work. Here's a list of TCL statements:COMMIT: Carries out a transaction ROLLBACK: Restores a transaction if any tasks fail to execute 60 Database Management Systems