Introduction to Database Session 1.pptx
Document Details
Uploaded by JoyousForesight4592
Full Transcript
Introduction to Database P R O F K A L LO L D A S Session 1 OVERVIEW OF DATABASE SYSTEMS Content Data Fundamentals Developments in Database system (Historical Perspective) Information and Data Models Data Model and Categorization DB Schema versus DB Instance DBMS Language Component modules o...
Introduction to Database P R O F K A L LO L D A S Session 1 OVERVIEW OF DATABASE SYSTEMS Content Data Fundamentals Developments in Database system (Historical Perspective) Information and Data Models Data Model and Categorization DB Schema versus DB Instance DBMS Language Component modules of DBMS Centralized and Client Server Database Systems Classification of DBMS Database design process Data Fundamentals Data pervades every aspect of our surroundings in the rapidly evolving world of information and technology Data refers to unorganized information that undergoes processing to make it meaningful. It includes Facts, observations, perceptions Numbers, characters, symbols Images Or a combination of these elements. Data Fundamentals cont.… The structure of data plays an important role in determining its efficient management, storage, and analysis. Broadly, we can categorize data into three main types: Structured Unstructured Semi-structured. Data Fundamentals cont.… Structured Data: Highly organized Follows a predefined format, typically arranged in tables with rows and columns Adheres to a strict schema and a rigid structure ensuring consistency and easy retrieval Examples of structured data include Excel spreadsheets, arrange data into rows and columns, assigning each piece of data a specific cell address. SQL databases that allow data to be stored in predefined tables and columns, online forms collect customer information by storing each piece of data; name, address, credit card number in designated fields. Data Fundamentals cont.…. Unstructured Data: Lacks a specific format or organization It doesn't conform to any predefined rules or sequences, making it challenging to process and analyze using traditional methods. Examples of unstructured data include Text files contain free-form text documents without a predefined structure. Media files include images, audio, and video. Web pages, main content, such as text, images, and multimedia is often unstructured despite potentially structured elements like HTML tags. Social media content encompasses posts, tweets, and other updates with mixed text, images, and links. Data Fundamentals cont.…. Semi-structured Data: Such data possesses some organizational properties but does not adhere to a strict tabular structure. Employs hierarchical structure or tags to organize information, providing a balance between flexibility and structure. Examples of semi-structured data Include JSON files contain arrays and objects using specific tags or keys to mark different data elements. XML documents define data structure using tags, attributes, and schema. Emails have structured fields; to, from, subject but unstructured message bodies. Developments in Database system (Historical Perspective) Era Year Popular Remarks s Product Early Database 1960 Network model Computerized databases started in the 1960s, when the use of (CODASYL) & computers became a more cost-effective option for private hierarchical model (IMS) organizations. The Introduction of 1970 DB2, Oracle E.F. Codd's relational data model introduced a non-procedural querying the Relational method and separated schema from physical storage, setting the Database standard for database systems. Microcomputer 1980 DB2, R:base, Paradox, SQL was standardized by ANSI in 1986 and ISO in 1987. The surge in DBMS products Access computer sales drove the success of relational databases and led to a decline in network and hierarchical models. Web Database and 1990 Apache, ASP, Java In the mid-1990s, the Internet's rise spurred rapid growth in the Open Source DBMS Servelets, EJB, Oracle database industry. Desktop users began accessing legacy data through Developer 2000. client-server systems, increasing the need for improved databases. By MySQL and PostgresQL the late 1990s, investment in online businesses drove higher demand for Internet database connectors. XML and Web 1998+ XML, SOAP, WSDL, UDDI XML provided huge benefit to web-based database applications. Services Big Data and 2000+ Hadoop, Cassandra, Despite the early 2000s Internet decline, database applications grew, NoSQL MongoDB etc with new interactive uses for point-of-sale, and vendor consolidation. Microsoft, IBM, and Oracle lead the database industry in the West. Web Information and Data Models A database model/data model is an abstract model that describes how the data is represented and used. A data model consists of a set of data structures and conceptual tools that is used to describe the structure (data types, relationships, and constraints) of a database. A data model not only describes the structure of the data, it also defines a set of operations that can be performed on the data. A data model generally consists of data model theory , which is a formal description of how data may be structured and used. Data model instance is a practical data model designed for a particular application. The process of applying a data model theory to create a data model instance is known as data modeling. Data Model Categorization The data models are categorized into three types: o High-level or conceptual data models: Conceptual data model describes the information used by an organization in a way that is independent of any implementation-level issues and details. This can be understood even by the end users having no technical background. ER Diagram o Representational or Implementation data models: The representational or implementation data models hide some data storage details from the users. The various representational data models are: o Hierarchical Data Model: This data model organizes the data in a tree-like structure. o Network Data Model: In network data model, the data is organized in the form of graphs. The first specification of network data model was presented by Conference on Data Systems Languages (CODASYL) in 1969. o Relational Data Model: The relational data model was developed by E. F. Codd in 1970. All data is maintained in the form of tables (generally, known as relations) consisting of rows and columns. Each row (record) represents an entity and a column (fields) represents an attribute of the entity. The relationship between the two tables is implemented through a common attribute in the tables and not by physical links or pointers. o Object-Based Data Model: The object-oriented data model extends the concepts of object-oriented programming language with persistence, versioning, concurrency control, data recovery, security, and other database capabilities. The object-relational data model is an extension of relational data model. It combines the features of both the relational data model and object-oriented data model. o Semi-structured Data Model: Semi-structured data model facilitates data exchange among heterogeneous data sources. E.g. XML o Low-level or physical data models: Physical data model describes the data in terms of a collection of files, indices, and other storage structures such as record formats, record ordering, and access paths. DB Schema versus DB Instance The overall design or description of the database is known as database schema or simply schema. The data in the database at a particular point of time is known as database instance or database state or snapshot. The various states of the database are: o Empty State o Initial State o Current State The DBMS is responsible to check whether the database state is valid state. Thus, each time the database is updated, DBMS ensures that the database remains in the valid state. The DBMS refers to DBMS catalog where the metadata is stored in order to check whether the database state satisfies the structure and constraints specified in the schema. DBMS Language The main objective of a database management system is to allow its users to perform a number of operations on the database such as insert, delete, and retrieve data in abstract terms without knowing about the physical representations of data. DBMS mainly provides two database languages: Data Definition Language (DDL): Is used to define the conceptual and internal schemas for the database. It is to specify the integrity rules (constraints) in order to maintain the integrity of the database. The various integrity constraints are domain constraints, referential integrity, assertions and authorization. Data Manipulation Language (DML): DML enables users to retrieve and manipulate the data. Component Models of DBMS Data definition: Provides functions to define structure of the data. These functions include defining and modifying the record structure, the data type of fields, and the various constraints to be satisfied by the data in each field. Data manipulation: Once the data structure is defined, data needs to be manipulated. The manipulation of data includes insertion, deletion, and modification of records. Data security and integrity: This module handle the security and integrity of data stored in the database. Concurrency and data recovery: This module contains some functions that deal with the concurrent access of records by multiple users and the recovery of data after a system failure. Performance optimization: This module optimize the performance of the queries by evaluating the different execution plans of a query and choosing the best among them. Centralized and Client Server Database System Centralized database systems: The database system, application programs, and user-interface all are executed on a single system and dummy terminals are connected to it. The processing power of single system is utilized and dummy terminals are used only to display the information. Client/server architecture: The processing power of the computer system at the user end is utilized by processing the user-interface on that system. A client is a computer system that sends request to the server connected to the network, and a server is a computer system that receives the request, processes it, and returns the requested information back to the client. Two approaches to implement client/server architecture: Two-tier architecture: The user interface and application programs are placed on the client side and the database system on the server side. Three-tier architecture: This is primarily used for web-based applications. It adds intermediate layer known as application server (or web server ) between the client and the database server. The client communicates with the application server, which in turn communicates with the database server. Classification of DBMSs Based on Data Models: Hierarchical Network Relational, Object-oriented Object-relational Based on Number of Users: Single-user system Multi-user system Based on Number of Sites: Centralized database systems Distributed database systems: The database and DBMS are distributed over several computers located at different sites. The computers communicate with each other through various communication media such as high-speed networks or telephone lines.This can be further classified into Homogeneous : All sites have identical database management system software. Heterogeneous : Different sites use different database management system software Based on the Purpose: General purpose Specific purpose: Online transaction processing (OLTP), predominantly used in banking, airlines, supermarkets, manufacturing, etc. Database Design Process The systematic process of designing a database is known as design methodology. The overall database design and implementation process consists of several phases. Requirement collection and analysis: The process of knowing and analyzing the expectations of the users for the new database application in as much detail as possible. Conceptual database design: The database designer selects a suitable data model and translates the data requirements resulting from previous phase into a conceptual database schema. The entity-relationship (E-R) diagram is generally used to represent the conceptual database design. Choice of a DBMS: This depends on many factors such as cost, DBMS features and tools, underlying model, potability and hardware requirements. Logical database design: In this phase, the database designer moves from an abstract data model to the implementation of the database. In case of relational model, this phase generally consists of mapping the E-R model into a relational schema. Physical database design: In this phase, the physical features such as storage structures, file organization, and access paths for the database files are specified to achieve good performance. Database system implementation: Once the logical and physical database designs are completed, the database system can be implemented. DDL statements of the selected DBMS are used and compiled to create the database schema and database files, and finally the database is loaded with the data. Testing and evaluation: In this phase, the database is tested and fine-tuned for the performance, integrity, concurrent access, and security constraints. Note that once the application programs are developed, it is easier to change the physical database design. However, it is difficult to modify the logical database design as it may affect the queries (DML) embedded in the program code.