Introduction to DBMS PDF
Document Details
Uploaded by EnchantingErudition1660
Tags
Summary
This document provides a broad overview of database management systems (DBMS). It details the components, operations, benefits, drawbacks, and different types of databases. The document introduces concepts like data, records, tables, and relations, as well as discussing functionalities like retrieval, insertion, and deletion. It also touches upon various database types including centralized, distributed, and cloud-based databases.
Full Transcript
IN T RO D U C T IO N T O D A T A B A S E IN T RO D U C T IO N – DA TA B A SE Database is a collection of related data. Data is facts and figures that can be processed to produce information. o used in variety of forms like text, numbers, media, bytes, etc....
IN T RO D U C T IO N T O D A T A B A S E IN T RO D U C T IO N – DA TA B A SE Database is a collection of related data. Data is facts and figures that can be processed to produce information. o used in variety of forms like text, numbers, media, bytes, etc. o can be stored in pieces of paper or electronic memory, etc. Information is processed, organized, or summarize data. o Data are processed to create information which is meaningful to user. o Information is used for make decision or to interpret data to get the meaning. Example: College Database organizes the data about admin, staff, students, faculty, alumni etc. o Using this database, one can easily retrieve, insert, and delete required information. Database is a collection of related data. o Used to retrieve, insert and delete data efficiently. o Used to organize data in the form of a table, schema, views, and reports, etc. o Data aids in producing information, which is based on facts. o Example, marks obtained by all students, one can find about toppers and average marks. Database management system Software that is used to manage the database. stores data in such a way that it becomes easier to retrieve, manipulate, and produce information. Modern databases are managed by database management system (DBMS). Many databases available like MySQL, Sybase, Oracle, MongoDB, Informix, PostgreSQL, SQL Server, etc. DBMS provides interface to perform various operations like database creation, storing data , updating data, creating a table in database and a lot more. It provides protection and security to database. In case of multiple users, it also maintains data consistency. cylindrical structure is used to display image of a database. Data: Facts, figures, statistics etc. (may not have particular meaning in itself). (e.g. 1, ABC, 19). Record: Collection of related data items, that represent meaningful information. Table or Relation: Collection of related records. Database: structured Collection of related relations. Database consists of columns (attributes / fields / features / domains / variables) and rows (records / tuples). Year Name Age Roll Hostel Roll Roll 1 2nd 1 Rohan 21 1 Netaji 2 1st 2 Nishant 20 2 Nehru 3 3rd 3 Priya 22 3 Kasturba Characteristics of a Database Speed: machine can retrieve data faster then human. Accuracy: accurate and up to date information is available on demand. Protection : data can be better protected against unintentional loss and unlawful access. Compactness : no paper works ( no Bunch of files) Support for multiple views of data Sharing of data and multiuser system - concurrency control strategies. Control of data redundancy - theory of normalisation provides ways to reduce the redundancy. Enforcement of integrity constraints - ensures that the data stored in the database is accurate, consistent, and reliable Database Management System (DBMS): Technology of storing and retrieving data with utmost efficiency along with appropriate security measures. o add, remove, update records o retrieve data that match certain criteria o cross-reference data in different tables o perform complex aggregate calculation A DBMS is a collection of interrelated data and a set of program to access those data. Database Application: All possible applications built upon the data stored in databases. Web/Mobile/Desktop standalone application - e-commerce, online banking, online registration, etc. Major tasks supported by DBMS : Data Definition: used for creation, modification, and removal of definition that defines organization of data in database. Data Updation: used for insertion, modification, and deletion of actual data in database. Data Retrieval: used to retrieve data from database which can be used by applications for various purposes. User Administration: used for registering and monitoring users, maintain data integrity, enforcing data security, dealing with concurrency control, monitoring performance and recovering information corrupted by unexpected failure. Benefit of DBMS Data can be shared Redundancy can be reduced Inconsistency can be avoided Transaction support can be provided Integrity can be maintained Security can be enforced Standards can be enforced. Increased concurrency Improved backup and recovery services Improved data quality Disadvantages of DBMS Cost of Hardware and Software: requires high speed of data processor and large memory size to run DBMS software. Size: occupies a large space of disks and large memory to run them efficiently. Complexity: Database system creates additional complexity and requirements. Higher impact of failure: Failure is highly impacted database because in most organization, all data is stored in a single database and if database is damaged due to electric failure or database corruption then data may be lost forever. Evolution of Databases More than 50 years of journey for several generations Evolution from flat-file system to relational and objects relational systems. It has gone through. Centralized Database (CDB) A model where data is stored, managed, and maintained in a single central server The central repository is accessible by multiple users or applications through a network. These applications contain authentication process to let users access data securely. Example: Central Library, Healthcare System, Banking System …. Advantages of Centralized Database o Data consistency is maintained as it manages data in a central repository. o Provides better data quality, which enables organizations to establish data standards. o Easier to perform backups, updates, and security management because all tasks are centralized. o Less costly because fewer vendors are required to handle the data sets. o Central control allows for more effective implementation of security measures such as access controls and encryption. Disadvantages of Centralized Database o Size of centralized database is large, which increases response time for fetching the data. o Not easy to update such an extensive database system. o If any server failure occurs, entire data will be lost, which could be a huge loss. Distributed Database (DDB) Data is distributed among different database systems of an organization. The database is not stored at a single location, rather stored in multiple computers at the same place or geographically spread far away. Data is stored on multiple servers The distributed database appears as a single database to the user. Maintains consistency across distributed nodes through synchronization mechanisms. Ex: Financial Services SQ L BA SI C S SQ L _ I N T RO SQL - Structured Query Language (also called, See-Quell). Used to perform operations on records stored in database. Updating records, inserting records, deleting records, creating and modifying database tables, views, etc. SQL is not a database system, but it is a query language. SQL operates on database management system in PC. Example of DBMS; Oracle, MySQL, MongoDB, PostgreSQL, SQL Server, DB2, etc. SQL is a query programming language that manages RDBMS. MySQL is a relational database management May/Apr- 2022 system that uses SQL. MSc_BA, DoC Advantages of SQL No programming needed: No large numeric of coding lines, rather simple SQL syntactical rules (user- friendly). High-Speed Query Processing Standardized Language Portability: across difference devices and applications according to user's requirements. Interactive language: easy to learn and understand SQL language. More than one Data View Disadvantages of SQL Cost Interface is Complex May/Apr- 2022 MSc_BA, DoC D I FFER EN T T Y P ES O F CO M M A N D S ✓DDL commands: - To create and modify database objects - CREATE, ALTER, DROP ✓DML commands: - To manipulate data of a database objects- INSERT, DELETE, UPDATE ✓DQL command: - To retrieve the data from a database - SELECT ✓DCL commands: - To control the data of a database – GRANT, REVOKE ✓TCL commands:- To control and manage transactions – COMMIT, SAVEPOINT, ROLLBACK May/Apr- 2022 MSc_BA, DoC SQL commands help in creating and managing database Most common SQL commands which are highly used are: o CREATE command o UPDATE command o DELETE command o SELECT command o DROP command o INSERT command May/Apr- 2022 MSc_BA, DoC CREATE Command: helps in creating new database, new table, table view, and other objects of database. UPDATE Command: helps in updating or changing stored data in database. DELETE Command: helps in removing or erasing saved records from database tables. SELECT Command: helps in accessing single/multiple rows from one/multiple tables of database (also used with WHERE clause) DROP Command: helps in deleting entire table, table view, and other objects from database. INSERT Command: helps in inserting data or records into database tables. May/Apr- 2022 MSc_BA, DoC Show DATABASES; use databaseName; show TABLES; Describe/desc tableName; Show create table tablename; DROP DATABASE databasename; DROP TABLE tablename; May/Apr- 2022 MSc_BA, DoC DATA TYPES INTEGER STRING DATE TIME DATA TYPE - INTEGER SIGNED allows zero, positive, and negative numbers AND UNSIGNED allows zero and positive numbers INTEGER /INT is a numeric value without a decimal Type Min Value Max Value Min Value Max Value (Signed) (Signed) (UnSigned) (UnSigned) TINYINT -128 127 0 255 SMALLINT -32768 32767 0 65535 MEDIUM INT -8388608 8388607 0 16777215 INT -2147483648 2147483647 0 4294967295 63 63 64 BIGINT -2 2 -1 0 2 -1 Zerofill Attribute Used with numeric data types to pad the display width of a column with leading zeros. Ex: regno int(5) zerofill INTEGRITY CONSTRAINTS Rules applied to database tables to maintain the accuracy and consistency of data. The constraints ensure that the data entered into the database adheres to certain rules, preventing invalid or inconsistent data entries. Rule/Constraints can be imposed on single column or combination of columns. Column-level Constraints- Imposed on Single Column. Defined along with Column Table Level Constraint.- Defined at the end after defining all the columns. Types of Integrity Constraint: Primary Key - Is a column that uniquely identifies each row in a table. Foreign Key - Ensures referential integrity between the two tables. The foreign key column in the child table must match a primary key value in the parent table Unique - Ensures that all values in a column or a set of columns are unique Not Null - Ensures that a column cannot have NULL values. CHECK - Ensures that all values in a column satisfy a specific condition or set of conditions. DEFAULT Value - Sets a default value for a column when no value is specified during insertion. Types of Distributed Database: Homogeneous DDB: database systems which execute on same operating system and use same application process and carry same hardware devices. Ex: A network of MySQL databases where all nodes are running on MySQL DBMS. Heterogeneous DDB: database systems which execute on different operating systems under different application procedures, and carries different hardware devices. Ex: System where some nodes (individual server) use Oracle DB and others use PostgreSQL Example for Homogeneous DDB Data Warehousing: Organizations use a homogeneous setup with systems like Microsoft SQL Server or Oracle across multiple data centers for consistent reporting and analytics. Enterprise Resource Planning (ERP): Companies like SAP use homogeneous distributed databases to manage business processes across different departments (finance, HR, logistics) with the same database system. Example for Heterogeneous DDB E-Commerce Platforms: Companies like eBay and Amazon use a mix of relational databases (for transactions) and NoSQL databases (for user-generated content). Social Media Networks: Facebook utilizes various databases like MySQL for structured data, Cassandra for unstructured data, to manage vast amounts of user interactions and content. Advantages of Distributed Database o Can easily scale out by adding more nodes to handle increasing data volumes and user loads. o If one node fails, other nodes can continue to operate, improving system reliability. o Data can be stored closer to its source or users, reducing latency and improving access speed. Disadvantages of Distributed Database o Additional costs associated with maintaining multiple nodes, network infrastructure, and management software. o Relies on the network for communication between nodes, and network issues can impact performance and accessibility. o Ensuring consistent data across all nodes can be challenging, particularly in handling concurrent updates. Hierarchical Data Model 1970’s was era of Hierarchical Database. Prominent hierarchical database model was IBM's first DBMS - IMS (Information Management System). Organizes data in a tree-like structure Files are related in a parent/child manner. (Root,parent and child node) Small circle represents objects in Hierarchical Data Model. Not as flexible as other database models, and are not well suited for handling complex data relationships or changes in data structures. Advantages: Fast and efficient data retrieval Easy to add/delete information Efficient storage of data Disadvantages Limited Flexibility Limited Interoperability Lack of Standardization Network data model Represents data relationships in a graph- like structure Entities are connected through multiple types of relationships. Network model allows for more complex interconnections between entities, reflecting many-to-many relationships. Advantages Supports many-to-many relationships which allows more complex and flexible data structures. Allows for efficient and direct navigational access to data Data retrieval can be faster for certain types of queries compared to hierarchical models. Model reduces data redundancy by allowing entities to share relationships. Disadvantages Complex structure Difficult to understand Database Maintenance Modification of schema Scalability Issues RDBMS (Relational Database Management System) RDBMS is an extension of DBMS. Data is represented as a table that contains rows and column. RDBMS is based on Relational model. RDBMS is a tabular DBMS that maintains security, integrity, accuracy, and consistency of the data. Contains number of tables and each table has its own primary key. Relational database contains the following components: o Table; o Record/ Tuple o Field/Column name /Attribute o Schema o Keys Data Integrity Refers to the accuracy, consistency, and reliability of data Types of Data Integrity: Entity integrity: Ensures that each table has a primary key that uniquely identifies each record. It specifies that there should be no duplicate rows in a table. Domain integrity: Ensures that the values in a column fall within a specific range or set of valid values. Age – only positive value, char(3), fixed format etc Referential integrity: Ensures that relationships between tables remain consistent. It specifies that rows cannot be deleted, which are used by other records. User-defined integrity: It enforces some specific business rules that are defined by users. These rules are different from entity, domain or referential integrity. (Triggers/Constraints) NoSQL Database Not Only SQL store data differently than relational tables. A powerful storage system for Non-relational databases Designed to manage large volumes of unstructured data, return real-time web app analytics, and process big data across the internet of things (IoT). Types of NoSQL Databases are Document-oriented , Key-value, Tabular/Wide Column,Graph Database Types of NoSQL Database 1. Document-oriented databases Stores data in documents, typically in JSON, BSON, or XML format. Each document contains pairs of fields and values. a flexible data model, much suited for semi-structured and unstructured data sets. Examples of document databases are MongoDB and Couchbase. Key Value Database Store data as key-value pairs, where each key is unique and maps to a value. used for caching and session management and provide high performance in reads and writes Examples are Amazon DynamoDB and Redis. Tabular/Wide-column store Database/Column-Family Store Database Stores data in tables, rows, and dynamic columns. Each column family contains rows, but columns are grouped together for better performance on wide tables. Used in data warehousing, OLAP, Cloud based Analytics … Examples: Apache Cassandra, HBase. Graph Databases A type of NoSQL database. A graphical representation of data that contains nodes, edges and properties. Node represents an entity Each edge represents a relationship between two nodes. Properties are key-value pairs that provide additional information about nodes and edges. Every node in a graph database represents a unique identifier. Beneficial for searching relationship between data because they highlight relationship between relevant data. Useful when database contains complex relationship and dynamic schema. Widely used in applications like social networks, recommendation engines, and fraud detection. Example: Neo4j, ArangoDB, Amazon Neptune, and OrientDB. User User Friend Friend Post ID User ID User Name Content Date ID Name Name ID 1 ABC XYZ 2 1 ABC PQR 3 1 1 ABC Hello 20/9/2024 9.15 AM 2 XYZ PQR 3 2 2 XYZ Good Morning 20/9/2024 9.30 2 XYZ MNO 4 AM 3 PQR MNO 4 3 3 PQR Having 20/9/2024 10.15 3 PQR STR 5 Breakfast AM 4 MNO STR 5 4 4 MNO … --- 5 STR ABC 1 5 5 STR … --- 6 1 ABC … --- 7 2 XYZ … --- 8 3 PQR … --- 9 4 MNO … --- 10 5 STR … --- ABC XYZ PQR Advantage of NoSQL High Scalability: NoSQL can handle extensive amount of data because of scalability. High Availability: NoSQL supports auto replication. (makes data highly available because, in case of any failure, data replicates itself to previous consistent state). Disadvantage of NoSQL Open source: there is no reliable standard for NoSQL Management challenge: Data management in NoSQL is more complicated than relational databases. GUI is not available: GUI tools for NoSQL database are not easily available in market. Backup: Backup is a great weak point for NoSQL databases. Difference between Graph and Relational DB Graph Database Relational Database Organizes and stores data using a graph structure Organizes and stores data in tables with rows and of nodes and edges columns Entities are represented by nodes, while Data is structured into predefined schemas, and relationships between entities are depicted by relationships are established using foreign keys. edges. Focuses on graph structures, enabling complex Supports various data models (relational, relationships and connections. document, key-value, etc.) uses specialized languages like Cypher (Neo4j) or uses SQL for relational databases or specific Gremlin. APIs(MongoDB/Couchbase/Cassandra) for NoSQL databases. Ideal for applications like social networks, Suitable for a wide range of applications, from recommendation systems, and fraud detection transactional systems to data warehousing and where relationships are critical. analytics. Horizontal Sacling (Scale Out) Vertical Scaling (Scale Up) Cloud database Facilitates to store, manage, and retrieve structured, unstructured data via cloud platform. Data is accessible over Internet. Also called database as service (DBaaS); because they are offered as a managed service. Some best cloud options are: o AWS (Amazon Web Services); Oracle Cloud Services; Microsoft SQLserver; Google cloud spanner, Google Cloud Firestore Advantages of cloud database Lower costs: No need to invest in databases (hardware, software is facilitated by provider). Automated: Cloud databases are enriched with variety of automated processes (recovery, failover, and auto-scaling). Increased accessibility: User can access cloud-based database from any location, anytime (with internet connection). Types of Cloud Databases Relational Database Use structured data models with tables, rows, and columns Ex: Amazon RDS, Google Cloud SQL, Azure SQL Database NoSQL Databases: Designed for unstructured or semi-structured data, these databases offer flexibility in data models Ex: Amazon DynamoDB, Google Firestore, MongoDB Atlas Data Warehouses: Used for analytics and reporting, Ex: Amazon Redshift, Google BigQuery Object-Oriented Databases (OODB) Contain data in form of object and classes. Objects are real-world entity, and class are collection of objects. Object-oriented databases hold the rules of object-oriented programming. Object-oriented programming properties o Classes - Student (regno, name, course) o Objects - student1(101,’abc’,’M.Sc BA’), student2(102,’xyz’,’M.Sc BA’) o Inheritance – Graduates(student, grade) o Polymorphism – postgraduate(name,course,currentcourse) o Encapsulation – Student(regno,name,contactdetails(private))