Database Management System Lecture 1 PDF

Document Details

UndauntedHeliotrope3391

Uploaded by UndauntedHeliotrope3391

Ghana Communication Technology University

Tags

database management systems DBMS database design data management

Summary

This document is a lecture on database systems, providing fundamental concepts, definitions, and examples. It covers topics like data versus information, database management systems (DBMS), their functions, and their importance in managing data effectively.

Full Transcript

DATABASE MANAGEMENT SYSTEM Lecture 1 Database Systems DATA VS. INFORMATION Data: are raw facts. Information: is the result of processing raw data to reveal its meaning Data processing may be as simple as organizing data to reveal patterns or as complex as making forecasts or drawing i...

DATABASE MANAGEMENT SYSTEM Lecture 1 Database Systems DATA VS. INFORMATION Data: are raw facts. Information: is the result of processing raw data to reveal its meaning Data processing may be as simple as organizing data to reveal patterns or as complex as making forecasts or drawing inferences using statistical modelling. Such information can then be used as the foundation for decision making. Timely and useful information requires accurate data. Such data must be generated properly, and they must be stored in a format that is easy to access and process – this calls for the careful management of the data environment. Data management: is a discipline that focuses on the proper generation, storage and retrieval of data. Introduction to the Database and the DBMS Efficient data management typically requires the use of a computer database. Database: is a system which holds data as an integrated system of records/related data and metadata, or data about data/descriptions of the data held in the database. Metadata provides a description of the data characteristics and the set of relationships that link the data found within the database. In a sense, a database resembles a very well-organized electronic filing cabinet in which powerful software, known as a database management system, helps manage the cabinet’s contents. Database management system (DBMS): is a collection of programs that manages the database structure and controls access to the data stored in the database OR it is software that enables easy creation, access, and modification of DBs for efficient and effective DB management. Database Management System Manages interaction between end users and database Database System Environment Hardware Software: OS, DBMS, Applications People Procedures Data Database (DB): Why? Purpose of DB – Optimizes data management – Transforms data into information Importance of DB Design – Defines the database’s expected use Different approach needed for different types of DBs – Avoid data redundancy & ensure data integrity Data is accurate and verifiable – Poorly designed database generates errors Leads to bad decisions Can lead to failure of organization Database (DB): Why? Functions of DBMS/DB Systems – Stores data and related data entry forms, report definitions, etc. – Hides the complexities of relational DB model from the user Facilitates the construction/definition of data elements and their relationships Enables data transformation and presentation Enforces data integrity Implements data security management – Access, privacy, backup & restoration Database (DB): How? Planning & Analysis – Access Goal of the organization Database environment – Existing hardware, software, raw data, data processing procedures Identify – Database needs What DB can do to further the goal of the organization – User needs and characteristics Who the users are, what they want to do, how they envision doing it – DB system requirements What the DB system should do to satisfy the DB and user needs Database (DB): How? Design – From conceptual design to a detailed system specification Implementation – Create the DB Maintenance – Troubleshoot, update, streamline the DB Role of the DBMS The DBMS enables that data in the database to be shared among multiple applications or users The DBMS integrates the many different users’ views of the data into a single all- encompassing data repository DBMS: helps makes data management more efficient and effective. Advantages of the DBMS Improved data sharing: The DBMS helps create an environment in which end users have better access to more and better-managed data. Such access makes it possible for end users to respond quickly to changes in their environment. Better data integration: wider access to well-managed data promotes an integrated view of the organization’s operations and a clearer view of the big picture. It becomes much easier to see how actions in one segment of the company affect other segments. Minimized data inconsistency: Data Inconsistency exists when different versions of the data appear in different places. Example; exams and records may store a student’s name as Patty Dapps but Accounts stores the same students name as Patty C. Dapps. Another example; at Shoprite Accra mall the price a tin of ideal milk as GHS 13.20 and its Westhills mall branch price the same tin of ideal milk as GHS 14.50. The probability of data inconsistency is greatly reduced in a properly designed database. Advantages of the DBMS Improved data access: The DBMS makes it possible to produce quick answers to ad hoc queries. A query is a specific request for data manipulation (for example, to read or update the data) issued to the DBMS. The DBMS sends back an answer (called the query result set) to the application. Improved decision making: Better-managed data and improved data access make it possible to generate better quality information, on which better decisions are based. Increased end-user productivity: The availability of data, combined with the tools that transform data into usable information, empowers end users to make quick, informed decisions that can be the difference between success and failure in the global economy. Types of Databases A DBMS can support many different types of databases. Database can be classified according to the number of users, the database location (s), and the expected type and extent of use. Single – user database: supports only one user at a time. This means if user A is using the database, users B and C must wait until user A is done. A single – user database that runs on a personal computer is called a desktop database. Multiuser database: supports multiple users at the same time. When the multiuser database supports a relatively small number of users (usually fewer than 50) or a specific department within an organization, it is called a workgroup database. When the database is used by the entire organization and supports many users (more than 50, usually hundreds) across many department, the database is known as an enterprise database. Centralized database: a database that supports data located at a single site. Distributed database: a database that supports data distributed across several different sites. Types of Databases Operational database: a database that is designed primarily to support a company’s day – to – to day operations. This type of database is sometimes referred to as transactional or production database. These are databases that are most frequently encountered in common activities such as enrolling in a class, registering a car, making a bank deposit or withdrawal. Data warehouse database: focus primarily on storing data used to generate information required to make tactical or strategic decisions. Such decisions typically require extensive ‘data massaging’ (data manipulation) to extract information to formulate pricing decisions, sales forecasts, market positioning, etc. data warehouse databases derive most of their data from production databases, and if production databases are poorly designed, the data warehouse databases based on them will lose their reliability and value as well. Types of Databases Product Number of Users Data Location Data Usage Single user multiuser Centralized Distributed Operational Data Warehouse Workgroup Enterprise MS Access X X X X MS SQL X* X X X X X X Server IBM DB2 X* X X X X X X ORACLE X* X X X X X X RDBMS * Vendor offers single – user/personal DBMS Why Database Design is Important Database design: refers to the activities that focus on the design of the database structure that will be used to store and manage end – user data. Good database: a database that meets all user requirements – does not just happen; its structure must be designed carefully. Even a good DBMS will perform poorly with a badly designed database. Proper database design requires the database designer to identify precisely the database’s expected use. A well – designed database facilitates data management and generates accurate and valuable information. A poorly designed database is likely to become a breeding ground for difficult – to – trace errors that may lead to bad decision making – and bad decision can lead to the failure of an organization. Database: User-centered Perspective: – The user is always right. If there is a problem with the use of the system, the system is the problem, not the user. Compliance: – The user has the right to a system that performs exactly as promised Instruction: – The user has the right to easy-to-use instructions (user guides, online or contextual help, error messages) for understanding and utilizing a system to achieve desired goals and recover efficiently and gracefully from problem situations. Usability: – The user should be the master of software and hardware technology, not vice- versa. Products should be natural and intuitive to use. Files and File Systems The manual file system used to be what was used to manage data. Such a file system is composed of file folders, each properly tagged and kept in a filing cabinet. Organization of the data within the file folders was determined by the data’s expected use. The content of each file folder were logically related. For example; a file folder in a doctor’s office might contain patient data, one file folder for each patient. All of the data in that file folder would describe only that particular patient’s medical history. Similarly an HR manager might organize personnel data by category of employment (for example; technical, sales and administrative). A file folder labelled ‘Technical; would contain data pertaining to only those people whose duties were properly classified as technical As long as a data collection was relatively small and an organization’s managers had few reporting requirements, the manual system served its role well as a data repository. However, as organizations grew and as reporting requirements became more complex, keeping track of data in a manual and file system became more difficult. With all the associated challenges it became necessary for a computer based system to be designed to track data and produce required reports. The conversion from a manual file system to a matching computer file system was technically complex, hence a new kind of professional, known as a data processing (DP) specialist., had to be hired or ‘grown’ from the current staff. The DP created specialist created the necessary computer file structures, often wrote the software that managed the data within those structures, and designed the application programs that produced reports based on the file data. This resulted to numerous homegrown computerized file systems being born. Initially the computer files within the file system were similar to the manual files. A simple example of a customer data file for a small insurance company is shown in the next slide (you will discover later that the file structure shown in the figure in the next slide, although typically found in early file systems, is unsatisfactory for a database) Basic File Terminology or Vocabulary Data: raw facts, such as a telephone number, a birth date, a customer name, etc. data have little meaning unless they have been organized in some logical manner. The smallest piece of data that can be recognized by the computer is a single character, such as the letter A, the number 5 or a symbol such as %. A single character requires 1 byte of computer storage. Field: a character or group of characters (alphabetic or numeric) that has a specific meaning. A field is used to define and store data. Record: a logically connected set of one or more fields that describes a person, place or thing. File: a collection of related records Problems with File System Data Management Although the file system served a useful purpose in data management, nonetheless, it had its own problems and limitations. A critique of the file system method serves two major purposes: understanding the shortcomings of the file system enables you to understand the development of modern database, and many of the problems are not unique to file systems. Failure to understand such problems is likely to lead to their duplication in a database environment, even though database technology makes it easy to avoid them. Problems: Even the simplest data – retrieval task requires extensive programming in 3rd generation language (Common Business – Oriented Language – COBOL, Beginner’s All – purpose Symbolic Instruction Code – BASIC, C++ and FORmula TRANslation – FORTRAN. 3RD generation language requires the programmer to specify what must be done and how it is to be done. 4th generation language allows the user to specify what must be done without specifying how it must be done. Problem Cont.: When the number of files in the system expands, system administration becomes more difficult. Each file must have its own file management system composed of programs that allow the user to: add, delete and modify file data and list the file contents and generate reports. Making changes to an existing structure can be difficult in a file system environment. For example; changing the structure of just one field in the original CUSTOMER file requires a program that: opens the original file, reads a record from the original file, transforms the original data to conform to the new structure’s storage requirements, writes the transformed data into the new file structure, and deletes the original file. Security features such as effective password protection, the ability to lock out parts of files or parts of the system itself, and other measures designed to safeguard data confidentiality are difficult to program and are therefore often omitted in a file system environment. Summary of the limitations of file system data management It requires extensive programming There are no ad hoc query capabilities System administration can be complex and difficult It is difficult to make changes to existing structures Security features are likely to be inadequate. The limitations, lead to problems of structural and data dependency Structural and Data Dependence A file system exhibits structural dependence; that is access to a file is dependent on its structure. For example, adding a customer date – of – birth field to the CUSTOMER file would require the 5 steps specified earlier. Given this change, none of the previous programs will work with the new CUSTOMER file structure. All of the file system programs must therefore be modified to conform to the new file structure. Because the file system application programs are affected by change in the file structure, they exhibit structural dependence.. Data dependence; on the other hand occurs when file data characteristics such as changing a field from integer to decimal, requires changes in all programs that access the file. Because all data access programs are subject to change when any of the file’s data storage characteristics change (ie. Changing the data type). Field Definitions Considering the CUSTOMER file shown earlier, it appears to have served its purpose well: requested reports could usually be generated. But suppose you want to create a customer phone directory based on the data stored in the CUSTOMER file. Storing the customer name as a single field turns out to be a liability because the directory must break up the field contents to list the last names, first names and initials in alphabetical order. Or suppose you want to get a customer listing by area code. Including the area code in the phone number field is inefficient. Similarly, producing a listing of customers by city is a more difficult task than is necessary. From the user’s point of view, a much better (more flexible) record definition would be one that anticipates reporting requirements by breaking up fields into their components parts. Thus instead of the CUSTOMER file’s fields reading as; C_NAME, C_PHONE, C_ADDRESS, C_POSTCODE, A_NAME, A_PHONE, TP, AMT, REN rather it might be listed as; CUS_LNAME, CUS_FNAME, CUS_INITIAL, CUS_AREACODE, CUS_PHONE, CUS_ADDRESS, CUS_CITY, CUS_COUNTY, CUS_POSTCODE Field Naming Conventions Selecting proper field names is also important. For example, make sure that the field names are reasonably descriptive. From the CUSTOMER file, field name REN represents the customer’s insurance renewal date. Using the field name CUS_RENEW_DATE would be better for 2 reasons; the prefix CUS can be used as an indicator of the field’s origin, which is the CUSTOMER file. RENEW_DATE portion of the field name structure becomes self-documenting. That is, by simply looking as the field names, you can determine which files the fields belong to and what information the fields are likely to contain. Some software packages place restrictions on the length of field names, it is therefore wise to be as descriptive as possible within those restrictions. In addition, very long field names make it difficult to fit more than a few fields on a page, thus making output spacing a problem. Example; the field name CUSTOMER_INSURANCE_RENEW_DATE, while being self – documenting, is less desirable than CUS_RENEW_DATE Field Naming Conventions cont. Another problem with the CUSTOMER file is the difficulty of finding desired data efficiently. The CUSTOMER file currently does not have a unique record identifier. For example, it is possible to have several customers named Michael Afrane. Consequently, the addition of a CUS_ACCOUNT field that contains a unique customer account number would be appropriate. NOTE: The criticisms of field definitions and naming conventions shown in the file structure of the CUSTOMER file are not unique to file systems. Because such conventions will prove to be important later. It shall be revisited in our subsequent topics Regardless of the data environment, the design – whether it involves a file systems or a database – must always reflect the designer’s documentation needs and the end user’s reporting and processing requirements. Both types of needs are best served by adhering to proper field definitions and naming conventions. Data Redundancy The file system’s structure and lack of security make it difficult to source data. The organizational structure promotes the storage of the same basic data in different locations (islands of information for such scattered data locations). It is unlikely that data stored in different locations will always be updated consistently, the islands of information often contain different versions of the same data. Data redundancy exists when the same data are stored unnecessarily at different places. Uncontrolled data redundancy sets the stage for; Data inconsistency: this exists when different and conflicting versions of the same data appear in different places. Data that display data inconsistency are also referred to as data that lack data integrity. Data integrity: the condition in which all the data in the database are consistent with the real – world events and conditions. In other words data are accurate: there are not data inconsistencies. Data are verifiable: the data will always yield consistent results. Data entry errors are more likely to occur when complex entries are made in several different files and/or recur frequently in one or more files. Data entry error such as an incorrectly spelled name or an incorrect phone number yields the same kind of data integrity problems Data anomalies (abnormality – dictionary definition). Ideally, a field value change should be made in only a single place. Data redundancy, however fosters an abnormal condition by forcing field value changes in many different locations. Any change in any field value must be correctly made in many places to maintain data integrity. A data anomaly develops when all of the required changes in the redundant data are not made successfully. Common data anomalies caused; update, insertion and deletion anomalies. The Database System Environment Database system refers to an organization of components that define and regulate the collection, storage, management, and use of data within a database environment. Database system consist of five components; Hardware: refers to all the system’s physical devices. For example, computers (microcomputers, mainframes, workstations, and servers), storage devices, printers, network devices (hubs, switches, routers, and fiber optics) and other devices (automated teller machines, ID readers, etc.) Software: although the most readily identified software is the DBMS itself, to make the database function fully, three types of software are needed: operating system software, DBMS software, and application programs and utilities. The Database System Environment cont. Operating system software: manages all hardware components and makes it possible for all other software to run on the computer. Example of operating system software include Microsoft Windows, Linux, Mac OS, UNIX. DBMS software: manages the database within the database system. Example: Microsoft Access and SQL Server, Oracle Corporation’s Oracle and IBM’s DB2 Application programs and utility software: are used to access and manipulate data in the DBMS and to manage the computer environment in which data access and manipulation take place. Application programs are most commonly used to access data found within the database, to generate reports, tabulations and other information to facilitate decision making. Utilities are the software tools used to help manage the database system’s computer components. For example, all of the major DBMS vendors now provide graphical user interfaces (GUIs) to help create database structures, control database access and monitor database operations. The Database System Environment cont. People: includes all users of the database system. On the basics of primary job functions, 5 types of users can be identified in a database system; Systems Administrators, Database Administrators, Database Designers, Systems Analysts and Programmers, and end users. System Administrators: oversee the database system’s general operations Database Administrators (DBAs): manage the DBMS and ensure that the database is functioning properly. Database Designers: design the database structure. They are the database architects. Systems Analysts and Programmers: design and implement the application programs. They design and create the data entry screens, reports and procedures through which end users access and manipulate the database’s data End Users: are the people who use the application programs to run the organization’s daily operations. Example; salesclerks, supervisors, managers and directors etc. The Database System Environment cont. Procedures: are the instructions and rules that govern the design and use of the database system. Procedures play an important role in a company because they enforce the standards by which business is conducted within the organization and with customers. Procedures are also used to ensure that there is an organized way to monitor and audit both the data that enter the database and the information that is generated through the use of that data. Data: covers the collection of facts stored in the database. DBMS Functions Data Dictionary Management – the DBMS stores definitions of the data elements and their relationships (metadata) in a data dictionary. All programs that access the data in the database work through the DBMS. The DBMS uses the data dictionary to look up the required data component structures and relationships, thus relieving you from having to code such complex relationships in each program. Also any changes made in a database structure are automatically recorded in the data dictionary, this frees one from having to modify all of the programs that access the changed structure – this means the DBMS provides data abstraction and it removes structural and data dependency from the system Data Storage Management – the DBMS creates and manages the complex structures required for data storage, thus relieving you of the difficult task of defining and programming the physical data characteristics. A modern DBMS system provides storage for; the data and its related data entry forms or screen definitions, report definitions, data validation rules, structures to handle video and picture formats etc. DBMS Function Data Transformation and Presentation – the DBMS transforms data entered into required data structures. By using the data transformation and presentation function the DBMS can determine the difference between logical and physical data formats. That is, the DBMS formats the physically retrieved data to make it conform to the user’s logical expectations. For example, imagine an enterprise database used by a multinational company. An end user in England would expect to enter data such as 11 July, 2018 as ‘11/07/2018’. In contrast, the same date would be entered in the United States as ‘07/11/2018’. Regardless of the data presentation format, the DBMS must manage the date in the proper format for each country. Security Management – the DBMS creates a security system that enforces user security and data privacy. Security rules determine which users can access the database, which data items each user can access and which data operations (read, add, delete, or modify) the user can perform. This is especially important in multiuser database systems where many users access the database simultaneously. All database users may be authenticated to the DBMS through a username and password or through biometric authentication. The DBMS uses this information to assign access privileges to various database components, such as queries and reports. DBMS Function Multiuser Access Control: to provide data integrity and data consistency, the DBMS uses sophisticated algorithms to ensure that multiple users can access the database concurrently without compromising the integrity of the database. Backup and Recovery Management: the DBMS provides backup and data recovery to ensure data safety and integrity. Current DBMS systems provide special utilities that allow the DBA to perform routine and special backup and restore procedures. Recovery management deals with the recovery of the database after a failure, such as a bad sector in the disk or a power failure. Data Integrity and Management: the DBMS promotes and enforces integrity rules, thus minimizing data redundancy and maximizing data consistency. The data relationships stored in the data dictionary are used to enforce data integrity. DBMS Function Database Access Language and Application Programming Interfaces: the DBMS provides data access through a query language. A query language: is a non – procedural language – one that lets the user specify what must be done without having to specify how it is to be done. The DBMS may also provide data access to programmers via procedural (3GL) language such as COBOL, C, PASCAL, VISUAL BASIC and C++. The DBMS also provides administrative utilities used by the DBA and the database designer to create, implement, monitor and maintain the database. Structured Query Language (SQL): is the de facto query language and data access standard supported by the majority of DBMS vendors Database Communication Interfaces: current – generation DBMS’ accept end – user request via multiple, different network environment. For example, the DBMS might provide access to the database via the internet through the use of Web browsers; end users can request answers to queries by filling in screen forms through their preferred web browser, the DBMS can automatically publish predefined reports on a website, the DBMS can connect to 3rd party systems to distribute information via email or other productivity applications. Database System Overheads Increased cost: database systems require sophisticated hardware and software and highly skilled personnel. The cost of maintaining the hardware, software, and personnel required to operate and manage a database system can be substantial. Management Complexity: database systems interface with many different technologies and have a significant impact on a company’s resources and culture. The changes introduced by the adoption of a database system must be properly managed to ensure that they help advance the company’s objectives. Given the fact that database systems hold crucial company data that are accessed from multiple sources, security issues must be assessed constantly. System Maintenance: to maximize the efficiency of the database system, you must keep your system current. Frequent updates must be performed and apply the latest patches and security measures to all components. Since database technology advances rapidly, personnel training costs tend to be significant. Vendor Dependence: given the heavy investment in technology and personnel training, companies may be reluctant to change database vendors. As a consequence, vendors are less likely to offer pricing point advantages to existing customers and those customers may be limited in their choice of database system components. Thank You Any Questions?

Use Quizgecko on...
Browser
Browser