INF3703 Databases II Notes
UNISA, 2024
Summary
These notes cover INF3703 Databases II, including the database development process, conceptual, logical, and physical database design, transaction management, database performance tuning, distributed databases, and database connectivity with web technologies. The notes also discuss business intelligence and data warehousing.
INF3703 Databases II, October 15, 2024

Contents

10 Database Development Process
   10.1 The Information System
   10.2 The Systems Development Life Cycle
        10.2.1 Planning
        10.2.2 Analysis
        10.2.3 Detailed System Design
        10.2.4 Implementation
        10.2.5 Maintenance
   10.3 The Database Life Cycle
        10.3.1 The Database Initial Study
        10.3.2 Database Design
        10.3.3 Implementation and Loading
        10.3.4 Database Security
        10.3.5 Testing and Evaluation
        10.3.6 Operation
        10.3.7 Maintenance and Evolution
        10.3.8 Determine Performance Measures
   10.4 Database Design Strategies
   10.5 Centralised vs Decentralised Design
   10.6 Database Administration
        10.6.1 The Managerial Role of the DBA
        10.6.2 The Technical Role of the DBA
        10.6.3 Developing a Data Administration Strategy

11 Conceptual, Logical, and Physical Database Design
   11.1 Conceptual Design
        11.1.1 Data Analysis and Requirements
        11.1.2 Entity Relationship Modelling and Normalisation
        11.1.3 Data Model Verification
   11.2 Logical Database Design
        11.2.1 Creating the Logical Data Model
   11.3 Physical Database Design
        11.3.1 Analyse Data Volume and Database Usage
        11.3.2 Determine a Suitable File Organisation
        11.3.3 Define Indexes
        11.3.4 Database Security

12 Managing Transactions and Concurrency
   12.1 What is a Transaction?
        12.1.1 Evaluating Transaction Results
        12.1.2 Transaction Properties
        12.1.3 Transaction Management with SQL
        12.1.4 The Transaction Log
   12.2 Concurrency Control
        12.2.1 Lost Updates
        12.2.2 Uncommitted Data
        12.2.3 Inconsistent Retrievals
        12.2.4 The Scheduler
   12.3 Concurrency Control with Locking Methods
        12.3.1 Lock Granularity
        12.3.2 Lock Types
        12.3.3 Two-Phase Locking to Ensure Serialisability
        12.3.4 Deadlocks
   12.4 Concurrency Control with Time Stamping Methods
        12.4.1 Wait/Die and Wound/Wait Schemes
   12.5 Concurrency Control with Optimistic Methods
   12.6 ANSI Levels of Transaction Isolation
   12.7 Database Recovery Management
        12.7.1 Transaction Recovery

13 Managing Database and SQL Performance
   13.1 Database Performance-Tuning Concepts
        13.1.1 Performance Tuning: Client and Server
        13.1.2 DBMS Architecture
        13.1.3 Database Query Optimisation Modes
        13.1.4 Database Statistics
   13.2 Query Processing
        13.2.1 SQL Parsing Phase
        13.2.2 SQL Execution Phase
        13.2.3 SQL Fetching Phase
        13.2.4 Query Processing Bottlenecks
   13.3 Optimiser Choices
        13.3.1 Using Hints to Affect Optimiser Choices
   13.4 SQL Performance Tuning
        13.4.1 Index Selectivity
        13.4.2 Conditional Expressions
   13.5 DBMS Performance Tuning

14 Distributed Databases
   14.1 The Evolution of Distributed Database Management Systems
   14.2 DDBMS Advantages and Disadvantages
   14.3 Distributed Processing and Distributed Databases
   14.4 Characteristics of Distributed Database Management Systems
   14.5 DDBMS Components
   14.6 Levels of Data and Process Distribution
        14.6.1 Multiple-Site Processing, Multiple-Site Data (MPMD)
   14.7 Distributed Database Transparency Features
   14.8 Distribution Transparency
   14.9 Transaction Transparency
        14.9.1 Distributed Requests and Distributed Transactions
        14.9.2 Distributed Concurrency Control
        14.9.3 Two-Phase Commit Protocol
   14.10 Performance and Failure Transparency
   14.11 Distributed Database Design
        14.11.1 Data Fragmentation
        14.11.2 Data Replication
        14.11.3 Data Allocation
   14.12 The CAP Theorem
   14.13 Distributed Databases within the Cloud
   14.14 C.J. Date's 12 Commandments for Distributed Databases

15 Databases for Business Intelligence
   15.1 Business Intelligence
        15.1.1 Business Intelligence Architecture
        15.1.2 Business Intelligence Evolution
   15.2 Decision Support Data
        15.2.1 Operational Data vs Decision Support Data
        15.2.2 Decision Support Database Requirements
   15.3 The Data Warehouse
        15.3.1 Twelve Rules that Define a Data Warehouse
        15.3.2 Data Marts
        15.3.3 Designing and Implementing a Data Warehouse
        15.3.4 The Extraction, Transformation, Loading Process
   15.4 Star Schemas
        15.4.1 Star Schema Representation
        15.4.2 Star Schema Performance-Improving Techniques
   15.5 Data Analytics
        15.5.1 Data Mining
        15.5.2 Predictive Analytics
   15.6 Online Analytical Processing
        15.6.1 Multidimensional Data Analysis Techniques
        15.6.2 Advanced Database Support
        15.6.3 Easy-to-Use End-User Interface
        15.6.4 OLAP Architecture
        15.6.5 Relational OLAP
        15.6.6 Multidimensional OLAP
   15.7 SQL Analytic Functions
        15.7.1 The ROLLUP Extension
        15.7.2 The CUBE Extension
        15.7.3 Materialised Views
   15.8 Data Visualisation
        15.8.1 The Science of Data Visualisation
        15.8.2 Understanding the Data

17 Database Connectivity and Web Technologies
   17.1 Database Connectivity
        17.1.1 ODBC, DAO, RDO, and UDA
        17.1.2 OLE-DB
        17.1.3 ADO.NET
        17.1.4 Java Database Connectivity (JDBC)
        17.1.5 PHP
   17.2 Database Internet Connectivity
        17.2.1 Web-to-Database Middleware: Server-Side Extensions
        17.2.2 Web Server Interfaces
        17.2.3 The Web Browser
        17.2.4 Client Side Extensions
        17.2.5 Web Application Servers
        17.2.6 Web Database Development
   17.3 Extensible Markup Language (XML)
        17.3.1 Document Type Definitions (DTDs) and XML Schemas
        17.3.2 XML Presentation
        17.3.3 SQL/XML and XQuery
   17.4 Cloud Computing Services
        17.4.1 Characteristics of Cloud Services
        17.4.2 Types of Cloud Services
        17.4.3 Cloud Services: Advantages and Disadvantages
        17.4.4 SQL Data Services
   17.5 The Semantic Web
Chapter 10 Database Development Process

10.1 The Information System

A database is a carefully designed and constructed repository of facts. This fact repository is part of a larger whole, known as an information system. An information system provides for data collection, storage, and retrieval. It also facilitates the transformation of data into information, and the management of both data and information. A complete information system is composed of people, hardware, software, procedures, databases, and application programs.

Systems analysis is the process that establishes the need for, and the scope of, an information system. The process of creating an information system is known as systems development. Within the framework of systems development, applications transform data into the information that forms the basis of decision-making. Every application is composed of two parts: the data, and the code by which the data are transformed into information.

The performance of an information system depends on a triad of factors:

1. Database design and implementation
2. Application design and implementation
3. Administrative procedures

In a broad sense, the term database development describes the process of database design and implementation. The primary objective in database design is to create complete, normalised, non-redundant (to the extent possible), and fully integrated conceptual, logical, and physical database models. The implementation phase includes creating the database storage structure, loading data into the database, and providing for data management.
10.2 The Systems Development Life Cycle

Phases of the SDLC:

- Planning: initial assessment; feasibility study
- Analysis: user requirements; existing system evaluation; logical system design
- Detailed System Design: detailed system specification
- Implementation: coding, testing and debugging; installation, fine-tuning
- Maintenance: evaluation; maintenance; enhancement

The SDLC is an iterative rather than a sequential process.

10.2.1 Planning

The SDLC planning phase yields a general overview of the company and its objectives.

Initial Assessment

An initial assessment of the information-flow-and-extent requirements must be made during this discovery portion of the SDLC. This should answer the questions:

- Should the existing system be continued?
- Should the existing system be modified?
- Should the existing system be replaced?

Participants in the SDLC's initial assessment must begin to study and evaluate alternative solutions. If it is decided that a new system is necessary, the next question is whether it is feasible.

Feasibility Study

The feasibility study must address the following:

- The technical aspects of hardware and software requirements
- The system cost
- The operational cost

10.2.2 Analysis

Problems defined during the planning phase are examined in greater detail in the analysis phase. A macroanalysis must be made of both individual and organisational needs, addressing questions such as:

- What are the requirements of the current system's end users?
- Do those requirements fit into the overall information requirements?

The existing hardware and software are also studied during the analysis phase. The result of analysis should be a better understanding of the system's functional areas, actual and potential problems, and opportunities. End users and the system designer(s) must work together to identify processes and to uncover potential problem areas.
Along with a study of user requirements and the existing systems, the analysis phase also includes the creation of a logical systems design. The logical design must specify the appropriate conceptual data model, inputs, processes, and expected output requirements. The database design's data-modelling activities take place at this point, to discover and describe all entities and their attributes, and the relationships among the entities within the database. Defining the logical system also yields functional descriptions of the system's components (modules) for each process within the database environment. All data transformations are described and documented using systems analysis tools such as data flow diagrams (DFDs). The conceptual data model is validated against those processes.

10.2.3 Detailed System Design

The designer completes the design of the system's processes. The design includes all necessary technical specifications for the screens, menus, reports, and other devices that might be used to help make the system a more efficient information generator. The steps for conversion from the old system to the new system are laid out. Training principles and methodologies are also planned, and must be submitted for management's approval.

10.2.4 Implementation

During the implementation phase, the hardware, DBMS software, and application programs are installed, and the database design is implemented. During the initial stages of the implementation phase, the system enters into a cycle of coding, testing, and debugging, until it is ready to be delivered. The actual database is created, and the system is customised by the creation of tables and views, user authorisation, and so on.

The database contents may be loaded interactively or in batch mode, using a variety of methods and devices:

- Customised user programs
- Database interface programs
- Conversion programs that import the data from a different file structure, using batch programs, a database utility, or both
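A conversion program of the kind listed above can be sketched as a small batch loader. This is a minimal illustration only; the CSV layout, table, and column names are hypothetical, and SQLite stands in for whatever DBMS the organisation has chosen.

```python
import csv
import sqlite3

def load_customers(csv_path: str, db_path: str) -> int:
    """Batch-load customer rows from a legacy CSV export into a new table."""
    con = sqlite3.connect(db_path)
    con.execute(
        """CREATE TABLE IF NOT EXISTS customer (
               cus_id   INTEGER PRIMARY KEY,
               cus_name TEXT NOT NULL,
               cus_city TEXT
           )"""
    )
    # Reformat each legacy row to match the new schema; blank cities become NULL.
    with open(csv_path, newline="") as f:
        rows = [(r["id"], r["name"].strip(), r["city"].strip() or None)
                for r in csv.DictReader(f)]
    with con:  # one transaction: either the whole batch loads or none of it
        con.executemany(
            "INSERT INTO customer (cus_id, cus_name, cus_city) VALUES (?, ?, ?)",
            rows,
        )
    con.close()
    return len(rows)
```

In practice such a loader would also validate and log rejected rows, but the core pattern of read, reformat, and insert in bulk inside a transaction is the same.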
The system is subjected to exhaustive testing until it is ready for use. Traditionally, the implementation and testing of a new system took 50 to 60 percent of the total development time. However, the advent of sophisticated application generators and debugging tools has substantially decreased coding and testing time. After testing is concluded, the final documentation is reviewed and printed, and end users are trained. The system is in full operation at the end of this phase, but will be continuously evaluated and fine-tuned.

10.2.5 Maintenance

Almost as soon as the system is operational, end users begin to request changes in it. Those changes generate system maintenance activities, which can be grouped into three types:

- Corrective maintenance: in response to system errors
- Adaptive maintenance: due to changes in the business environment
- Perfective maintenance: to enhance the system

Because every request for structural change requires retracing the SDLC steps, the system is, in a sense, always at some stage of the SDLC. Every system has a predetermined operational lifespan, but its actual operational lifespan depends on its perceived utility. There are several reasons for reducing the operational life of certain systems, including rapid technological change (especially for systems based on processing speed and expandability) and the cost of maintaining a system. If the system's maintenance cost is high, its value becomes suspect. Computer-aided systems engineering (CASE) technology, such as System Architect or Visio Professional, helps make it possible to produce better systems within a reasonable amount of time and at a reasonable cost. In addition, the more structured, better-documented, and standardised implementation of CASE-produced applications tends to prolong the operational life of systems by making them easier and cheaper to update and maintain.
10.3 The Database Life Cycle

10.3.1 The Database Initial Study

If a designer has been called in, it is likely that the current system has failed to perform functions deemed vital by the company. So, in addition to examining the current system's operation within the company, the designer must determine how and why the current system fails. The overall purpose of the database initial study is to:

- Analyse the company situation
- Define problems and constraints
- Define objectives
- Define scope and boundaries

Phases of the DBLC:

- Database Initial Study: analyse the company situation; define problems and constraints; define objectives; define scope and boundaries
- Database Design: create the conceptual design; DBMS software selection; create the logical design; create the physical design
- Implementation and Loading: install the DBMS; create the databases; load or convert the data
- Testing and Evaluation: test the database; fine-tune the database; evaluate the database and its application programs
- Operation: produce the required information flow
- Maintenance and Evolution: introduce changes; make enhancements

Analyse the Company Situation

The company situation describes the general conditions in which a company operates, its organisational structure, and its mission. To analyse the company situation, the database designer must discover what the company's operational components are, how they function, and how they interact. What is the organisation's general operating environment, and what is its mission within that environment? What is the organisation's structure?

Define Problems and Constraints

The designer has both formal and informal sources of information. If the company has existed for any length of time, it already has some form of system in place. It can be useful to consider how the existing system functions, what its inputs and outputs are, the documents generated, how the documents are used, and who uses them.
Remember to consider the differences between the official version of a system's operation and the more informal, real version. The problem definition process might initially appear to be unstructured. Company end users are often unable to describe precisely the larger scope of company operations, or to identify the real problems encountered during company operations. Often the managerial view of a company's operation is different from that of the end users who perform the actual routine work. Finding precise answers is important, especially concerning the operational relationships among business units.

Even the most complete and accurate problem definition does not lead to the perfect solution. The real world usually intrudes to limit the design of even the most elegant database by imposing constraints. Such constraints include time, budget, personnel, and more. The designer must learn to distinguish between what's perfect and what's possible. Also, the designer must look for the source of the problem, rather than designing a system to treat the symptoms of the problem.

Define Objectives

A proposed database system must be designed to help solve at least the major problems identified during the problem discovery process. As the list of problems unfolds, several common sources are likely to be discovered. The initial study phase also yields proposed problem solutions. The designer's job is to ensure that the database system objectives, as seen by the designer, correspond to those envisioned by the end user(s). The database designer must begin to address the following questions:

- What is the proposed system's initial objective?
- Will the system interface with other existing or future systems in the company?
- Will the system share the data with other systems or users?

Define Scope and Boundaries

The system's scope defines the extent of the design according to operational requirements.
For example, will the database design encompass the entire organisation, one or more departments within the organisation, or one or more functions of a single department? Knowing the database design scope helps in defining the required data structures, the type and number of entities, the physical size of the database, and so on. The proposed system is also subject to limits known as boundaries, which are external to the system. Boundaries include time and budget limitations. They are also imposed by existing hardware and software.

10.3.2 Database Design

The second phase focuses on the design of the database model that will support company operations and objectives. This is arguably the most critical DBLC phase: making sure that the final product meets user and system requirements. In the process of database design, you must concentrate on the data characteristics required to build the database model. At this point, there are two views of the data within the system: the business view of data as a source of information, and the designer's view of the data structure, its access, and the activities required to transform the data into information.

The process of database design is loosely related to the analysis and design of a larger system. The data component is only one element of a larger information system. The systems analysts or systems programmers are in charge of designing the other system components. Their activities create the procedures that will help transform the data within the database into useful information. Database design does not constitute a sequential process. Rather, it is an iterative process that provides continuous feedback designed to trace previous steps.

[Figure 10.1: Two views of data: business manager and designer]

Conceptual Design

In the conceptual design stage, data modelling is used to create an abstract database structure that represents real-world objects in the most realistic way possible.
The conceptual model must embody a clear understanding of the business and its functional areas. At this level of abstraction, the type of hardware and/or database model to be used might not yet have been identified. Therefore, the design must be software and hardware independent, so the system can be set up within any hardware and software platform chosen later.

Minimal Data Rule: All that is needed is there, and all that is there is needed. All data elements required by the database transactions must be defined in the model, and all data elements defined in the model must be used by at least one database transaction.

As you apply the minimal data rule, avoid an excessive short-term bias. Focus not only on the immediate data needs of the business, but also on the future data needs. Thus, the database design must leave room for future modifications and additions.

Steps in conceptual design:

- Data analysis and requirements
- Entity relationship modelling and normalisation
- Data model verification
- Distributed database design

DBMS Software Selection

The selection of DBMS software is critical to the information system's smooth operation. Therefore, the advantages and disadvantages of the proposed DBMS software should be carefully studied. To avoid false expectations, the end user must be made aware of the limitations of both the DBMS and the database.

Factors affecting DBMS software selection:

- Costs: purchase, maintenance, operational, licence, installation, training, and conversion costs.
- DBMS features and tools: some database software includes a variety of tools that facilitate the application development task. For example, the availability of query by example, screen painters, report generators, application generators, data dictionaries, and so on, helps create a more pleasant work environment for both the end user and the application programmer.
  Database administrator facilities, query facilities, ease of use, performance, security, concurrency control, transaction processing, and third-party support also influence DBMS software selection.
- Underlying model: hierarchical, network, relational, object/relational, or object-oriented.
- Portability: across platforms, systems, and languages.
- DBMS hardware requirements: processor(s), RAM, disk space, and so on.

Logical Design

The second stage in the database design cycle is known as logical design. The aim of the logical design stage is to map the conceptual model into a logical model that can then be implemented on a relational DBMS.

Logical design steps:

1. Creating the logical data model
2. Validating the logical data model using normalisation
3. Assigning and validating integrity constraints
4. Managing logical models constructed for different parts of the database
5. Reviewing the logical model with the user

The right to use the database is also specified during the logical design phase. The logical design translates the software-independent conceptual model into a software-dependent model by defining the appropriate domain definitions, the required tables, and the necessary access restrictions.

Physical Design

Physical design is the process of selecting the data storage and data access characteristics of the database. The storage characteristics are a function of the types of devices supported by the hardware, the type of data access methods supported by the system, and the DBMS. Physical design affects not only the location of the data in the storage device(s), but also the performance of the system.

Stages of physical design:

1. Analyse data volume and database usage
2. Translate each relation identified in the logical data model into tables
3. Determine a suitable file organisation
4. Define indexes
5. Define user views
6. Estimate data storage requirements
7.
Determine database security for users

Physical design is a very technical job, more typical of the client/server and mainframe world than of the desktop world. Yet even in the more complex mid-range and mainframe environments, modern database software has assumed much of the burden of the physical portion of the design and its implementation. Although relational models tend to hide the complexities of the computer's physical characteristics, the performance of relational databases is affected by physical-level characteristics. Performance can be affected by the characteristics of the storage media, such as seek time, sector and block (page) size, buffer pool size, and the number of disk platters and read/write heads. In addition, factors such as the creation of an index can have a considerable effect on the relational database's performance, that is, on data access speed and efficiency.

Even the type of data request must be analysed carefully to determine the optimum access method for meeting the application requirements, establishing the data volume to be stored, and estimating the performance. Some DBMSs automatically reserve the space required to store the database definition and the user's data in permanent storage devices. This ensures that the data are stored in sequentially adjacent locations, thereby reducing data access time and increasing system performance.

Physical design becomes more complex when data are distributed at different locations, because the performance is affected by the communication media's throughput. Given such complexities, designers favour software that hides as many of the physical-level activities as possible. Logical and physical design can be carried out in parallel, on a table-by-table (or file-by-file) basis. Logical and physical design can also be carried out in parallel when the designer is working with hierarchical and network models.
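The earlier point that creating an index changes data access speed can be made concrete by asking the optimiser for its access plan before and after the index exists. This is a sketch only: SQLite is used as a convenient stand-in for the course's DBMS, and the table, column, and index names are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE invoice (inv_id INTEGER PRIMARY KEY, cus_id INTEGER, total REAL)")
con.executemany("INSERT INTO invoice VALUES (?, ?, ?)",
                [(i, i % 100, i * 1.5) for i in range(1000)])

query = "SELECT total FROM invoice WHERE cus_id = 42"

# Without an index on cus_id, the optimiser's only access method is a scan
# of the whole table. The fourth column of the plan row is its description.
before = con.execute("EXPLAIN QUERY PLAN " + query).fetchone()[3]

# Creating the index gives the optimiser a direct access path to the rows.
con.execute("CREATE INDEX idx_invoice_cus ON invoice (cus_id)")
after = con.execute("EXPLAIN QUERY PLAN " + query).fetchone()[3]

print(before)  # describes a table scan
print(after)   # describes a search using idx_invoice_cus
```

The same idea applies in any relational DBMS, although the plan-inspection command differs (for example, `EXPLAIN PLAN` in Oracle or `EXPLAIN` in MySQL and PostgreSQL).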
Such parallel activities require the designer to have a thorough understanding of both software and hardware characteristics.

10.3.3 Implementation and Loading

The output of the design phase is a series of instructions detailing the creation of tables, attributes, domains, views, indexes, security constraints, and storage and performance guidelines. In this phase, you actually implement all of these design specifications.

Install the DBMS

This step is required only when a new dedicated instance of the DBMS is necessary for the system. In many cases, the organisation will have made a particular DBMS the standard, to leverage investments in technology and the skills that employees have already developed. The DBMS may be installed on a new server or on existing servers. One current trend is called virtualisation.

Virtualisation: a technique that creates logical representations of computing resources that are independent of the underlying physical computing resources. The technique is used in many areas of computing, such as the creation of virtual servers, virtual storage, and virtual private networks.

In a database environment, database virtualisation refers to the installation of a new instance of the DBMS on a virtual server running on shared hardware. This is normally a task that involves system and network administrators, who create appropriate user groups and services in the server configuration and network routing.

Another common trend is the use of cloud database services such as Microsoft SQL Database Service or Amazon Relational Database Service. This new generation of services allows users to create databases that can be easily managed, tested, and scaled up as needed.

Create the Database(s)

In most modern relational DBMSs, a new database implementation requires the creation of special storage-related constructs to house the end-user tables.
The constructs usually include the storage group (or file groups), the table spaces, and the tables. A storage group can contain more than one table space, and a table space can contain more than one table.

Load or Convert the Data

After the database has been created, the data must be loaded into the database tables. Typically, the data will have to be migrated from the previous version of the system. Often, data to be included in the system must be aggregated from multiple sources. In the best-case scenario, all the data will be in a relational database so that it can be readily transferred to the new database. However, in some cases, data may have to be imported from other relational databases, non-relational databases, flat files, legacy systems, or even manual paper-and-pencil systems. If the data format does not support direct importing into the new database, conversion programs may have to be created to reformat the data for importing. In the worst-case scenario, much of the data will have to be manually entered into the database. Once the data has been loaded, the database administrator (DBA) works with the application developers to test and evaluate the database.

Loading existing data into a cloud-based database service can sometimes be expensive, because most cloud services are priced based not only on the volume of the data to be stored, but also on the amount of data that travels over the network.

Example: Implementation of the Logical Design in IBM's DB2

1. The system administrator creates the database storage group. This step is mandatory for such mainframes as DB2. Other DBMS software may create equivalent storage groups automatically when a database is created.
2. The system administrator creates the database within the storage group.
3. The system administrator assigns the rights to use the database to a database administrator.
4.
The database administrator creates the table space(s) within the database.
5. The database administrator creates the table(s) within the table space(s).
6. The database administrator assigns access rights to the table spaces and to the tables within specified table spaces. Access rights may be limited to views rather than to whole tables. The creation of views is not required for database access in the relational environment, but views are desirable from a security standpoint.

10.3.4 Database Security

Data stored in the company database must be protected from access by unauthorised users. Any misuse of or damage to the data may have a serious impact on the organisation. The most common security goals relate to the integrity, confidentiality, and availability of data. Within database design, it is essential that security measures are developed to meet the security goals and, in doing so, protect the data from any kind of threat. Threats are any set of circumstances that have the potential to cause loss, misuse, or harm to the system and/or its data. Threats can include:

The loss of integrity of data through unauthorised modification.
The loss of availability of the data.
The loss of confidentiality of the data.

Threats can occur internally and externally to an organisation, and are of various levels of severity.

Example: Types of Threats and their Effects

Some types of threats and their effects include:

Theft and Fraud of Data Activities such as these are likely to be perpetrated by humans, often by electronic means. Both theft and fraud can occur both inside and outside the organisation, and each has to be treated differently.

Human error that causes accidental loss of data This is often caused by humans not following policies and procedures such as user authorisation. However, it is important for an organisation to ensure that it has excellent security policies and procedures in place to begin with. Additionally, data can be lost through poor staff training.
If employees do not know the procedures surrounding data security, then it will be impossible for them to be followed.

Electronic infections There are four general categories of electronic infections.

Viruses A malicious piece of software that is capable of copying itself and spreading across a network. As viruses are usually attached to a program or application, they cannot be 'caught' without human intervention.

Email viruses This type of virus attaches itself to email messages and replicates itself by automatically mailing itself to all people in the receiver's email address book.

Worms Worms are also small pieces of software that replicate themselves using any form of telecommunications network or hole in security. They are different from viruses in that they travel between systems without any human intervention, and they can replicate themselves very quickly between networks.

Trojan horses A computer program that claims to perform one task or action. It remains dormant until run, and then begins to do damage such as erasing a hard disk.

The introduction of a virus to a computer network can result in both the loss of integrity of the data and the loss of availability of the system resources, resulting in serious consequences to the business.

Natural disasters such as storms, fires, or floods These are unpredictable, and not deliberate actions, but would still result in the loss of integrity and availability of data. In addition, data could be corrupted due to power surges, and hardware could become physically damaged.

Unauthorised access and modification of data The phrase often used for gaining unauthorised access is hacking. Hacking is usually defined as the act of illegally entering a computer system and making unauthorised changes to files and data contained within.
Obtaining unauthorised access to a database may involve a person browsing unauthorised data to gain information that could be used to that person's benefit, or against the organisation. Unauthorised modification could result in the data being changed, or even deleted.

Employee sabotage This is concerned with deliberate acts of malice against the organisation. This would include not only any computer system, but also the property, reputation, and safety of a business and its employees. Unauthorised access and modification of data, physically damaging hardware, and theft of data are also covered by this threat.

Poor database administration This could be caused by the database administrator not having enough knowledge through lack of training. One example is the DBA granting a user privileges that exceed the requirements of his or her job within the organisation; the user then goes on to abuse these privileges. Another example would be that the DBA has set up only weak authentication schemes, which allow attackers to steal or obtain login information and then assume the identity of genuine database users.

The above list of threats is not exhaustive. However, it does highlight the need for an organisation to have a comprehensive data security plan. The plan should contain a number of data security measures to protect both the data and the hardware. The DBMS is only part of the computer system infrastructure within an organisation, and will often rely on the security measures used in other parts of the system.

Data Security Measures

Physical security allows only authorised personnel access to specific areas. Depending on the type of database implementation, however, establishing physical security may not always be practical. Examples of impractical candidates for physical security include a university student research database and large multi-server microcomputer networks.
In terms of guarding against the loss of data and hardware due to a natural disaster, the placement of the hardware in a building should be carefully considered. For example, do not place the hardware in the basement, due to the possibility of floods. Physical access to rooms can be controlled by push-button security controls, swipe cards, or biometric systems.

User authentication is a way of identifying the user, and verifying that the user is allowed to access restricted data or applications. This can be achieved through the use of passwords and access rights.

Password security allows the assignment of access rights to specific authorised users. Password security is usually enforced at logon time at the operating system level.

Access rights can be established through the use of database software. The assignment of access rights may restrict operations (CREATE, UPDATE, DELETE) on predetermined objects such as databases, tables, views, queries, and reports. User authentication is a function of authorisation management, which is part of the DBA's managerial role.

Audit trails are usually provided by the DBMS to check for access violations. Although the audit trail is an after-the-fact device, its mere existence can discourage unauthorised use. Audit trails represent the last line of the database defence. Although it would be preferable for the security measures to work, and for an attacker not to gain access to the system, if all else fails, the audit data itself can identify the existence of a violation or unauthorised access after it has occurred. The audit data may then be used to link a violation to a particular user, and may be used to repair the system.

Data encryption can be used to render data useless to unauthorised users who might have violated some of the database security layers or security measures. Data encryption is carried out by an algorithm.

Example: Data Encryption

Suppose a bank wants to encrypt the account numbers of its customers.
The first stage would be to alter the code by a secret one-digit number, for example 5. If a person's account number is 32451, then the encrypted value would be 32456. The real value can then be decrypted from the encrypted value by subtracting 5. The logic of adding a specific number to the real data is known as the encryption algorithm. Here, the value 5, which is added by the algorithm, is known as the encryption key. Where only one key is used, the method is referred to as the one-key method, or the data encryption standard (DES). Both the sender and the receiver would need to know the key in order to decipher the stored data. With a one-digit key, an intruder would need up to ten guesses to find it, whereas a two-digit key would require up to 100 guesses. Therefore, the longer the key, the more difficult it is to decipher the data.

In the two-key method, all users who wish to send data have a public key. The encryption algorithm uses this public key to transform the data in the message into an encrypted message. The second key, known as the private key, is used by the encryption algorithm to convert the encrypted message back to the data in the message. The only person who may hold the private key is the one for whom the original message was destined.

Some DBMS products include encryption routines. For example, the Oracle DBMS has a feature known as Transparent Data Encryption (TDE), which allows columns in a database table to be easily encrypted without the need to write large amounts of complex code. When users insert data, the database transparently encrypts it and stores it in the column. Similarly, when users select the column, the database automatically decrypts it.

Encryption in the Real World

The most common example of encryption at work is Secure Electronic Transactions (SET).
This is an open protocol designed by a large consortium of companies interested in ensuring data privacy in all electronic commerce over the Internet. SET ensures the authenticity of electronic transactions and provides a guarantee that customers' transactions are protected.

A combination of private and public key encryption is used in Secure Sockets Layer (SSL) technology on the Internet. SSL creates a secure connection between a user and an external server, over which any amount of data can be sent securely. The use of SSL can be seen when a person purchases goods from an Internet-based store. This is normally indicated by the use of 'https' instead of 'http' before the web address.

User-defined policies and procedures should be put in place by the organisation to ensure that employees know how to implement the data security measures. Such policies and procedures can cover personnel controls such as training employees in security aspects and monitoring employees to ensure that they are actually following the procedures themselves. The establishment of policies and procedures is also a responsibility of the DBA.

Backup and recovery strategies should be in place in the event of a disaster occurring. The responsibility ultimately lies with the DBA to ensure that the data within the database can always be fully recovered.

Antivirus software is used by organisations to search system hard drives and media devices for any known or potential viruses. Each time a virus is discovered, antivirus software vendors record the virus's unique signature and then incorporate it into their software database. The antivirus software will check, in real time, all messages entering an organisation's network from any external source, to see if a known virus is trying to enter. This feature is only useful if kept up to date.

Firewalls are systems comprising hardware devices or software applications which act as gatekeepers to an organisation's network.
They are used to prevent unauthorised access by allowing you to establish a set of rules or filters to determine which messages should be allowed into or out of an organisation's network. They are most commonly used when an organisation's database can be accessed by Web applications. If a message is flagged as breaking the rules, it is not allowed through.

Packet filtering Each message or packet that contains data is checked against a set of filters. Packets that are accepted are allowed to be sent to the designated system, and all others are discarded.

Proxy server The proxy server manages all communication between the internal network of an organisation and external networks such as the Internet. There are further advantages to using a proxy server, other than security measures. It can also cache the Web pages that have been requested, so that network traffic is reduced if other users request the same page. This also improves response time. In addition, the proxy server can be used to limit the websites that users may view outside the organisation.

Circuit-level gateway This blocks all incoming messages to any host but itself. Within the organisation, all the client machines will run software to allow them to establish a connection with the circuit-level gateway machine. The gateway performs all communication with any external network, such as the Internet, so the internal client machines never actually have any contact with the 'outside world'.

Diskless workstations These allow end-users to access the database without being able to download the information to their workstations.

Desirable Attributes

Data are:
Protected
Reconstructable
Auditable
Tamper-proof

Users are:
Identifiable
Authorised
Monitored

10.3.5 Testing and Evaluation

In the design phase, decisions were made to ensure the integrity, security, performance, and recoverability of the database. During implementation and loading, these plans were put into place.
In testing and evaluation, the DBA tests and fine-tunes the database to ensure that it performs as expected. This phase occurs in conjunction with application programming. Programmers use database tools to prototype the applications during coding of the programs. Tools such as report generators, screen painters, and menu generators are especially useful to application programmers. Test the Database During this step, the DBA tests the database to ensure that it maintains the integrity and security of the data. Data integrity is enforced by the DBMS through the proper use of primary and foreign key rules. Many DBMSs also support the creation of domain constraints and database triggers. Testing will ensure that these constraints are properly designed and implemented. Data integrity is also the result of properly implemented data management policies, which are part of a comprehensive data administration framework. Evaluate the Database and its Application Programs As the database and application programs are created and tested, the system must also be evaluated using a more holistic approach. Testing and evaluation of the individual components should culminate in a variety of broader system tests to ensure that all the components interact properly to meet the needs of the users. Integration issues and deployment plans are refined, user training is conducted, and system documentation is finalised. Once the system receives final approval, it must be a sustainable resource for the organisation. To ensure that the data contained in the databases are protected against loss, backup and recovery plans are tested. Databases can lose data through unintentional deletions, power outages, and other causes. Data backup and recovery procedures create a safety valve, ensuring the availability of consistent data. 
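The primary- and foreign-key integrity checks described under "Test the Database" can be exercised directly against a DBMS. The following is a minimal sketch using Python's built-in sqlite3 module; the table names and values are illustrative only, not taken from the text:

```python
import sqlite3

conn = sqlite3.connect(":memory:")          # throwaway in-memory database
conn.execute("PRAGMA foreign_keys = ON")    # SQLite enforces FKs only when enabled

conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT NOT NULL)")
conn.execute("""CREATE TABLE employee (
                    emp_id   INTEGER PRIMARY KEY,
                    emp_name TEXT NOT NULL,
                    dept_id  INTEGER NOT NULL REFERENCES department (dept_id))""")

conn.execute("INSERT INTO department VALUES (10, 'Sales')")
conn.execute("INSERT INTO employee VALUES (1, 'Ann', 10)")   # parent row exists: accepted

rejected = False
try:
    conn.execute("INSERT INTO employee VALUES (2, 'Ben', 99)")  # no department 99
except sqlite3.IntegrityError:
    rejected = True                          # the DBMS enforced referential integrity

print("orphan row rejected:", rejected)      # orphan row rejected: True
```

A test plan for this phase would include such deliberately invalid transactions, confirming that the constraints designed earlier are actually implemented and enforced by the DBMS rather than left to application code.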
Database vendors encourage the use of fault-tolerant components such as uninterruptible power supply (UPS) units, RAID storage devices, clustered servers, and data replication technologies, to ensure the continuous operation of the database in case of a hardware failure.

Database Backup Levels

1. A full backup, or dump, of the entire database. All database objects are backed up in their entirety.
2. A differential backup of the database, in which only the objects that have been updated or modified since the last full backup are backed up.
3. A transaction log backup, which backs up only the transaction log operations that are not reflected in a previous backup copy of the database. In this case, no other database objects are backed up.

The database backup is stored in a secure place, usually in a different building from the database itself, and is protected against dangers such as fire, theft, flood, and other potential calamities. The main purpose of the backup is to guarantee database restoration following a hardware or software failure. Depending on the type and extent of the failure, the recovery process ranges from a minor short-term inconvenience to a major long-term rebuild. Regardless of the extent of the required recovery process, recovery is not possible without a usable backup.

Database recovery generally follows a predictable scenario. First, the type and extent of the required recovery are determined. If the entire database needs to be recovered to a consistent state, the recovery uses the most recent backup copy of the database in a known consistent state. The backup copy is then rolled forward to restore all subsequent transactions by using the transaction log information. If the database needs to be recovered, but the committed portion of the database is still usable, the recovery process uses the transaction log to 'undo' all the transactions that were not committed.
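The roll-forward scenario above can be illustrated with a toy transaction-log replayer. This is a simplified sketch of the general idea, not any vendor's actual recovery algorithm; the log format and account names are invented for the example:

```python
# Toy write-ahead log: each record is (txn_id, action, *details), where the
# action is 'begin', 'write' (with key, old value, new value), or 'commit'.
def recover(snapshot, log):
    """Rebuild a database dict from a backup snapshot plus a transaction log."""
    db = dict(snapshot)                       # start from the backup copy
    committed = {rec[0] for rec in log if rec[1] == 'commit'}
    # Roll forward: redo every write that belongs to a committed transaction.
    # Writes of uncommitted transactions are simply never redone.
    for txn, action, *details in log:
        if action == 'write' and txn in committed:
            key, old_value, new_value = details
            db[key] = new_value
    return db

backup = {'acct_A': 100, 'acct_B': 50}        # last consistent backup
log = [
    ('T1', 'begin'), ('T1', 'write', 'acct_A', 100, 80), ('T1', 'commit'),
    ('T2', 'begin'), ('T2', 'write', 'acct_B', 50, 999),   # T2 never committed
]
print(recover(backup, log))  # {'acct_A': 80, 'acct_B': 50}
```

T1's committed write is redone, while T2's uncommitted write is discarded, mirroring the roll-forward/undo behaviour the recovery scenario describes.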
At the end of this phase, the database completes an iterative process of testing, evaluation, and modification that continues until the system is certified as ready to enter the operational phase.

Common Sources of Database Failure

Software Software-induced failures may be due to the operating system, application programs, or viruses and other malware.

Hardware Hardware-induced failures may include memory chip errors, disk crashes, bad disk sectors, and disk-full errors.

Programming exceptions Application programs or end-users may roll back transactions when certain conditions are defined. Programming exceptions can also be caused by malicious or improperly tested code that can be exploited by hackers.

Transactions The system detects deadlocks and aborts one of the transactions.

External factors Backups are especially important when a system suffers complete destruction from a fire, earthquake, flood, or other natural disaster.

10.3.6 Operation

Once the database has passed the evaluation stage, it is considered operational. At this point, the database, its management, its users, and its application programs constitute a complete information system. The beginning of the operational phase starts the process of system evolution. As soon as all the targeted end users have entered the operations phase, problems that could not have been foreseen during the testing phase begin to surface. These problems range from serious to minor. Either way, they lead to a demand for change, leading to the next phase: maintenance and evolution.
10.3.7 Maintenance and Evolution

Routine Maintenance Activities

Preventive maintenance (backup)
Corrective maintenance (recovery)
Adaptive maintenance (enhancing performance, adding entities and attributes)
Assignment of access permissions and their maintenance for new and old users
Generation of database access statistics to improve the efficiency and usefulness of system audits and to monitor system performance
Periodic security audits based on the system-generated statistics
Periodic system-usage summaries for internal billing or budgeting purposes

The likelihood of new information requirements and the demand for additional reports and new query formats require application changes and possible minor changes in the database components and contents. Those changes can be easily implemented only when the database design is flexible and when all documentation is updated and online. Eventually, even the best-designed database environment will no longer be capable of incorporating such evolutionary changes; then the whole DBLC process begins anew.

10.3.8 Determine Performance Measures

Physical design becomes more complex when data is distributed at different locations, because performance is affected by the communication media's throughput. Designers favour database software that hides as many of the physical-level activities as possible. Although relational models tend to hide the complexities of the computer's physical characteristics, the performance of relational databases is affected by physical storage properties. Performance can be affected by characteristics of the storage media, such as seek time, sector and block (page) size, buffer pool size, and the number of disk platters and read/write heads. In addition, factors such as the creation of an index can have a considerable effect on the relational database's performance.
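The effect of an index on query performance can be observed even in a small experiment. The sketch below uses Python's built-in sqlite3 module; the table, column, and index names are invented for the illustration:

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
# 200,000 synthetic rows; customer_id cycles through 1,000 values.
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 ((i, i % 1000) for i in range(200_000)))

QUERY = "SELECT COUNT(*) FROM orders WHERE customer_id = 42"

def timed_lookup():
    start = time.perf_counter()
    count = conn.execute(QUERY).fetchone()[0]
    return count, time.perf_counter() - start

_, before = timed_lookup()                       # full table scan
conn.execute("CREATE INDEX idx_cust ON orders (customer_id)")
_, after = timed_lookup()                        # now an index lookup

plan = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
print(f"scan: {before:.5f}s, indexed: {after:.5f}s")
print(plan)                                      # the plan now mentions idx_cust
```

On most systems the indexed lookup is dramatically faster, and the EXPLAIN QUERY PLAN output confirms that the optimiser switched from scanning the whole table to searching the index.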
Physical design performance measurement deals with fine-tuning the DBMS and queries to ensure that they will meet end-user performance requirements.

10.4 Database Design Strategies

There are two classical approaches to database design:

Top-down design starts by identifying the data sets, then defines the data elements for each of those sets. This process involves the identification of different entity types and the definition of each entity's attributes.

Bottom-up design first identifies the data elements (items), then groups them together in data sets. In other words, it first defines attributes, then groups them to form entities.

The selection of a primary emphasis on top-down or bottom-up procedures often depends on the scope of the problem, or on personal preference. Although the two methodologies are complementary rather than mutually exclusive, a primary emphasis on a bottom-up approach may be more productive for small databases with few entities, attributes, relations, and transactions. For situations in which the number, variety, and complexity of entities, relations, and transactions is overwhelming, a primarily top-down approach may be more easily managed.

Normalisation and ER models

Even when a primarily top-down approach is selected, the normalisation process that revises existing tables is a bottom-up technique. ER models constitute a top-down process even when the selection of attributes and entities can be described as bottom-up. Both normalisation and the ER model form the basis for most designs.

10.5 Centralised vs Decentralised Design

The two approaches discussed above can be influenced by factors such as the scope and size of the system, the company's management style, and the company's structure. Depending on these factors, database design might focus on either a centralised or decentralised design philosophy.
Centralised Design

Centralised design is productive when the data component is composed of a relatively small number of objects and procedures. The design can be carried out and represented in a fairly simple database. Centralised design is typical of relatively simple and/or small databases, and can be successfully done by a single person or a small, informal design team. The company operations and the scope of the problem are sufficiently limited to allow even a single designer to define the problem(s), create the conceptual design, verify the conceptual design with the user views, define system processes and data constraints to ensure the efficacy of the design, and ensure that the design will comply with all the requirements. Although centralised design is typical of small companies, it is not limited to small companies. A single conceptual design is completed and then validated in the centralised design approach.

Decentralised Design

Decentralised design might be used when the data component of the system has a considerable number of entities and complex relations on which very complex operations are performed. Decentralised design is also likely to be employed when the problem itself is spread across several operational sites, and each element is a subset of the entire data set. In large and complex projects, the database design typically cannot be done by only one person. Instead, a carefully selected team of database designers is employed to tackle a complex database project. Within the decentralised design framework, the database design task is divided into several modules. Once the design criteria have been established, the lead designer assigns design subsets or modules to design groups within the team. As each design group focuses on modelling a subset of the system, the definition of boundaries and the interrelation among data subsets must be very precise.
Each design group creates a conceptual data model corresponding to the subset being modelled. Each conceptual model is then verified individually against the user views, processes, and constraints for each of the modules. After the verification process has been completed, all modules are integrated into one conceptual model. Because the data dictionary describes the characteristics of all objects within the conceptual data model, it plays a vital role in the integration process. After all the subsets have been aggregated into a larger conceptual model, the lead designer must verify that the combined conceptual model is still able to support all the required transactions.

Aggregation Problems

The aggregation process requires the designer to create a single model in which various aggregation problems must be addressed:

Synonyms and homonyms Different departments might know the same object by different names (synonyms), or they might use the same name to address different objects (homonyms). The object can be an entity, an attribute, or a relationship.

Entities and entity subtypes An entity subtype might be viewed as a separate entity by one or more departments. The designer must integrate such subtypes into a higher-level entity.

Conflicting object definitions Attributes can be recorded as different types, or different domains can be defined for the same attribute. Constraint definitions can also vary. The designer must remove such conflicts from the model.

10.6 Database Administration

The person responsible for the control of the centralised and shared database is the database administrator (DBA). The size and role of the DBA function varies from company to company, as does its placement within a company's organisational structure. On the organisation chart, the DBA function might be defined as either a staff or line position.
Placing the DBA function in a staff position often creates a consulting environment, in which the DBA is able to devise the data administration strategy but does not have the ability to enforce it or to resolve possible conflicts. The DBA function in a line position has both the responsibility and the authority to plan, define, implement, and enforce the policies, standards, and procedures used in the data administration activity.

There is no standard for how the DBA function fits in an organisation's structure. In part, that is because the DBA function itself is probably the most dynamic of any organisation's functions. The fast-paced changes in DBMS technology dictate changing organisational styles. For example:

The development of distributed databases can force an organisation to decentralise the data administration further. The distributed database requires the system DBA to define and delegate the responsibilities of each local DBA, thus imposing new and more complex coordinating activities on the system DBA.

The growing use of Internet-ready and object-oriented databases and the growing number of data warehouse applications are likely to add to the DBA's data modelling and design activities, thus expanding and diversifying the DBA's job.

The increasing sophistication and power of desktop-based DBMS packages provide an easy platform for the development of user-friendly, cost-effective, and efficient solutions to specific departmental information needs. But such an environment also invites data duplication, not to mention the problems created by people who lack the technical qualifications to produce good database designs. In short, the new desktop environment requires the DBA to develop a new set of technical and managerial skills.

Although no current standard exists, it is common practice to define the DBA function by dividing the DBA operations according to the DBLC phases.
If that approach is used, the DBA function requires personnel to cover the following activities:

Database planning, including the definition of standards, procedures, and enforcement
Database requirements gathering and conceptual design
Database logical design and transaction design
Database physical design and implementation
Database testing and debugging
Database operations and maintenance, including installation, conversion, and migration
Database training and support

Figure 10.2: A DBA Functional Organisation

A company might have several different and incompatible DBMSs installed to support different operations. For example, it is not uncommon to find corporations with a hierarchical DBMS to support the daily transactions at an operational level, and a relational database to support middle and top management's ad hoc information needs. There may also be a variety of desktop DBMSs installed in the different departments. In such an environment, the company might have one DBA assigned for each DBMS. The general coordinator of all DBAs is sometimes known as the systems administrator (SYSADM).

Figure 10.3: Multiple Database Administrators in an Organisation

There is a growing trend towards specialisation in the data management function. For example, the organisation charts used by some of the bigger corporations make a distinction between a DBA and a data administrator (DA). The DA, also known as the information resource manager (IRM), usually reports directly to top management and is given a higher degree of responsibility and authority than the DBA, although the two roles tend to overlap to some extent. The DA is responsible for controlling the overall corporate data resources, both computerised and manual. Thus, the DA's job description covers a larger area of operations than that of the DBA, because the DA is in charge of controlling not only the computerised data, but also the data outside the scope of the DBMS.
The placement of the DBA within the expanded organisational structure may vary from company to company. Depending on the structure's components, the DBA might report to the DA, the IRM, the IS manager, or directly to the company's CEO.

There are two distinct roles that the DBA must perform: the managerial role and the technical role. The DBA's managerial role is focused on personnel management and on interactions with the end-user community. The DBA's technical role involves the use of the DBMS – database design, development, and implementation – as well as the production, development, and use of application programs.

Desired DBA Skills

Managerial:
Broad business understanding
Coordination skills
Analytical skills
Communication skills (oral and written)
Negotiation skills

Technical:
Broad data-processing background
Systems development life cycle knowledge
Structured methodologies: data flow diagrams, structure charts, programming languages
Database modelling and design skills: conceptual, logical, physical
Operational skills: database implementation, data dictionary management, security, etc.

10.6.1 The Managerial Role of the DBA

As a manager, the DBA must concentrate on the control and planning dimensions of database administration. Therefore, the DBA is responsible for:

Coordinating, monitoring, and allocating database administration resources: people and data
Defining goals and formulating strategic plans for the database administration function

DBA Activities and Services

DBA activities: planning, organising, testing, monitoring, delivering
DBA services: end-user support; policies, procedures, and standards; data security, privacy, and integrity; data backup and recovery; data distribution and use

End-User Support

The DBA interacts with the end-user by providing data and information support services to the organisation's departments.
Because end-users usually have dissimilar backgrounds, end-user support services usually include:

Gathering user requirements
The DBA must work with the end-user community to help gather the data required to identify and describe the end-users' problems. The DBA's communication skills are very important at this stage, because the DBA works closely with people who tend to have different computer backgrounds and communication styles. Gathering user requirements requires the DBA to develop a precise understanding of the users' views and needs, and to identify present and future information needs.

Building end-user confidence
Finding adequate solutions to end-users' problems increases end-user trust and confidence in the DBA function.

Resolving conflicts and problems
Finding solutions to end-users' problems in one department might trigger conflicts with other departments. End-users are typically concerned with their own specific data needs rather than those of others, and they are not likely to consider how their data affect other departments within the organisation. When data or information conflicts arise, the DBA function has the responsibility and authority to resolve them.

Finding solutions to information needs
The ability and authority to resolve data conflicts enable the DBA to develop solutions that will properly fit within the existing data management framework. The DBA's primary objective is to provide solutions to the end-users' information needs. Given the growing importance of the Internet, those solutions are likely to require the development and management of Web browsers to interface with the databases. In fact, the explosive growth of e-commerce requires the use of dynamic interfaces to facilitate interactive product queries and product sales.

Ensuring quality and integrity of applications and data
Once the right solution has been found, it must be properly implemented and used.
Therefore, the DBA must work with both application programmers and end-users to teach them the database standards and procedures required for data access and manipulation. The DBA must also make sure that database transactions do not adversely affect the quality of the data in the database. Certifying the quality of the application programs that access the database is a crucial DBA function. Special attention must be given to the DBMS Internet interfaces, because those interfaces do not provide the transaction management features that are typically found in the DBMS-managed database environment.

Managing the training and support of DBMS users
One of the most time-consuming DBA activities is teaching end-users how to use the database properly. The DBA must ensure that all users accessing the database have a basic understanding of the functions and use of the DBMS software. The DBA coordinates and monitors all activities concerning end-user education.

Policies, Standards, and Procedures

A prime component of a successful data administration strategy is the continuous enforcement of the policies, procedures, and standards for correct data creation, usage, distribution, and deletion within the database. The DBA must define, document, and communicate the policies, procedures, and standards before they can be enforced.

Policies
General statements of direction or action that communicate and support DBA goals.

Example:
- All users must have passwords.
- Passwords must be changed every six months.

Standards
More detailed and specific than policies, standards describe the minimum requirements of a given DBA activity. In effect, standards are rules that are used to evaluate the quality of the activity. For example, standards define the structure of application programs and the naming conventions programmers must use.

Example:
- A password must have a minimum of five characters.
- A password may have a maximum of 12 characters.
- ID numbers, names, and birth dates cannot be used as passwords.

Procedures
Written instructions that describe a series of steps to be followed during the performance of a given activity. Procedures must be developed within existing working conditions, and they must support and enhance that environment.

Example: to create a password,
1. the end-user sends the DBA a written request for the creation of an account;
2. the DBA approves the request and forwards it to the computer operator;
3. the computer operator creates the account, assigns a temporary password, and sends the account information to the end-user;
4. a copy of the account information is sent to the DBA;
5. the user changes the temporary password to a permanent one.

Standards and procedures defined by the DBA are used by all end-users who want to benefit from the database. Standards and procedures must complement each other, and must constitute an extension of data administration policies. Procedures must facilitate the work of end-users and the DBA.

The DBA must define, communicate, and enforce procedures that cover areas such as:

End-user database requirements gathering
Which documentation is required? Which forms should be used?

Database design and modelling
Which database design methodology and which tools to use.

Documentation and naming conventions
Which documentation to use in the definition of all data elements, sets, and programs that access the database.

Design, coding, and testing of database application programs
The DBA must define the standards for application program coding, documentation, and testing. The DBA standards and procedures are given to the application programmers, and the DBA must enforce those standards.

Database software selection
The selection of the DBMS package and any other software related to the database must be properly managed.
For example, the DBA might require that the software be properly interfaced with existing software, that it has the features required by the organisation, and that it provides a positive return on investment. In today's Internet environment, the DBA must also work with Web administrators to find proper Web-to-database connectivity solutions.

Database security and integrity
The DBA must define the policies governing security and integrity. Database security is especially crucial. Security standards must be clearly defined and strictly enforced, and security procedures must be designed to handle a multitude of security scenarios so that security problems are minimised. Although no system can ever be completely secure, security procedures must be designed to meet critical standards. The growing use of Internet interfaces to databases opens the door to new security threats that are far more complex and difficult to manage than those encountered with more traditional, internally generated and controlled interfaces. Therefore, the DBA must work closely with Internet security specialists to ensure that the databases are properly protected from attacks, whether launched inadvertently or launched deliberately by unauthorised users.

Database backup and recovery
Database backup and recovery policies must include the information necessary to guarantee proper execution and management of the backups.

Database maintenance and operation
The DBMS's daily operations must be clearly documented. Operators must keep job logs, and they must write operator instructions and notes. Such notes are helpful in pinpointing the causes and solutions of problems. Operational procedures must also include precise information concerning backup and recovery procedures.

End-user training
A full-featured training program must be established within the organisation, and procedures governing the training must be clearly specified. The objective is to indicate clearly who does what, when, and how.
Each end-user must be aware of the type and extent of the available training methodology. Procedures and standards must be revised at least annually to keep them up to date and to ensure that the organisation can adapt quickly to changes in the work environment. The introduction of new DBMS software, the discovery of security or integrity violations, the reorganisation of the company, and similar changes require revision of the procedures and standards.

Data Security, Privacy, and Integrity

The security, privacy, and integrity of the data in the database are of great concern to DBAs who manage current DBMS installations. Technology has pointed the way to greater productivity through information management. Technology has also resulted in the distribution of data across multiple sites, thus making it more difficult to maintain data control, security, and integrity. The multiple-site data configuration has made it imperative that the DBA use the security and integrity mechanisms provided by the DBMS to enforce the database administration policies. In addition, DBAs must team up with security experts to build firewalls, proxy services, and other security mechanisms to safeguard data from possible attacks.

Protecting the security and privacy of the data in the database is a function of authorisation management. Authorisation management defines procedures to protect and guarantee database security and integrity. Those procedures include, but are not limited to, user access management, view definition, DBMS access control, and DBMS usage monitoring.

User access management
This function is designed to limit access to the database and likely includes at least the following procedures:

Define each user to the database
This is achieved at two levels: at the operating system level and at the DBMS level. At the OS level, the DBA can request the creation of a logon user ID that allows the end-user to log on to the computer system.
At the DBMS level, the DBA can either create a different user ID or employ the same user ID to authorise end-user access to the DBMS.

Assign passwords to each user
This can also be done at the OS and DBMS levels. Database passwords can be assigned with predetermined expiration dates. The use of expiration dates enables the DBA to screen end-users periodically and to remind users to change their passwords periodically, thus making unauthorised access less probable.

Define user groups
Classifying users into user groups according to common access needs facilitates the DBA's job of controlling and managing the access privileges of individual users.

Assign access privileges
The DBA assigns access privileges or access rights to specific users to access specified databases. An access privilege describes the type of authorised access. Access privileges in relational databases are assigned through the SQL GRANT and REVOKE commands.

Control physical access
Physical security can prevent unauthorised users from directly accessing the DBMS installation and facilities. Some common physical security practices found in large database installations include secured entrances, password-protected workstations, electronic personnel badges, closed-circuit video, voice recognition, and biometric technology.

View definition
The DBA must define data views to protect and control the scope of the data that are accessible by an authorised user. The DBMS must provide the tools that allow the definition of views composed of one or more tables, and the assignment of access rights to a user or a group of users. The SQL command CREATE VIEW is used in relational databases to define views.

DBMS access control
Database access can be controlled by placing limits on the use of the DBMS's query and reporting tools. The DBA must make sure that those tools are used properly, and only by authorised personnel.
DBMS usage monitoring
The DBA must also audit the use of the data in the database. Several DBMS packages contain features that allow the creation of an audit log, which automatically records a brief description of the database operations performed by all users. Such audit trails enable the DBA to pinpoint access violations. The audit trails can be tailored to record all database accesses, or just failed database accesses.

Security breaches can yield a database whose integrity is either preserved or corrupted:

Preserved
Action is required to avoid the repetition of similar security problems, but data recovery may not be necessary. In fact, most security violations are produced by unauthorised and unnoticed access for information purposes, but such snooping does not disrupt the database.

Corrupted
Action is required to avoid the repetition of similar security problems, and the database must be recovered to a consistent state. Corrupting security breaches include database access by computer viruses and by hackers whose actions are designed to alter or destroy data.

The integrity of a database might be lost because of external factors beyond the DBA's control. Whatever the reason, the possibility of data corruption or destruction makes backup and recovery procedures crucial to any DBA.

Data Backup and Recovery

When data are not readily available, companies face potentially ruinous losses. Therefore, data backup and recovery procedures are critical in all database installations. The DBA must also ensure that the data in the database can be fully recovered in case of physical data loss or loss of database integrity. Data loss can be partial or total. A partial loss occurs when part of the database has been physically lost, or when part of the database has lost integrity. A total loss might mean that the database continues to exist but its integrity is entirely lost, or that the entire database is physically lost.
Either way, backup and recovery procedures are the cheapest database insurance. The management of database security, integrity, backup, and recovery is so crucial that many DBA departments have created a position staffed by the database security officer (DSO). The DSO's sole job is to ensure database security and integrity. In large database shops, the DSO's activities are often classified as disaster management. Disaster management includes all the DBA activities designed to secure data availability following a physical disaster or a database integrity failure, and covers all planning, organising, and testing of database contingency plans and recovery procedures. The backup and recovery measures must include at least:

Periodic data and applications backups
Some DBMSs include tools to ensure backup and recovery of the data in the database. The DBA should use those tools to render the backup and recovery tasks automatic. Products such as IBM's DB2 allow the creation of different backup types: full, incremental, and concurrent. A full backup, also known as a database dump, produces a complete copy of the entire database. An incremental backup copies only the data that have changed since the last backup; a concurrent backup takes place while the user is working on the database.

Proper backup identification
Backups must be clearly identified through detailed descriptions and date information, enabling the DBA to ensure that the correct backups are used to recover the database. While cloud-based backups are fast replacing tape backups, many organisations still use tapes. As tapes require physical storage, it is vital that the storage and labelling of tapes be done diligently by the computer operators, and the DBA must keep track of tape currency and location. However, organisations that are large enough to hire a DBA do not typically use tapes for enterprise backup.
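The distinction between full and incremental backups, and the importance of date-stamped backup identification, can be illustrated with a minimal file-level sketch. This is not how a DBMS utility such as DB2 implements backups; the directory layout, label format, and function name here are assumptions chosen purely for illustration:

```python
import shutil
import time
from pathlib import Path

def backup(src_dir, backup_root, kind="full", since=None):
    """Copy files from src_dir into a dated, labelled backup directory.

    kind='full' copies every file (the analogue of a database dump);
    kind='incremental' copies only files modified after 'since'
    (a POSIX timestamp, normally the time of the previous backup).
    Returns the backup label and the sorted list of copied file names.
    """
    # Date-stamped label: this is the "proper backup identification"
    # that lets the DBA pick the correct backup during recovery.
    label = f"{time.strftime('%Y%m%d-%H%M%S')}-{kind}"
    dest = Path(backup_root) / label
    dest.mkdir(parents=True)
    copied = []
    for f in Path(src_dir).iterdir():
        if not f.is_file():
            continue
        if kind == "incremental" and since is not None and f.stat().st_mtime <= since:
            continue  # unchanged since the last backup: skip it
        shutil.copy2(f, dest / f.name)  # copy2 preserves timestamps
        copied.append(f.name)
    return label, sorted(copied)
```

A real DBMS backs up pages, logs, and catalogue data rather than loose files, but the selection logic, full versus changed-since-timestamp, is the same idea.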
Other emerging solutions include optical and disk-based backup devices. Such backup solutions use a layered backup approach in which the data are first backed up to fast disk media for intermediate storage and fast restoration. Later, the data are transferred to tape for archival storage.

Convenient and safe backup storage
There must be multiple backups of the same data, and each backup copy must be stored in a different location. The storage locations must include sites both inside and outside the organisation. The storage locations must be properly prepared, and may include fire-safe and earthquake-proof vaults, as well as humidity and temperature controls. The DBA must establish a policy to determine where backups should be stored, and for how long.

Physical protection of both hardware and software
Protection might include the use of closed installations with restricted access, as well as preparation of the computer sites to provide air conditioning, backup power, and fire protection. Physical protection also includes the provision of a backup computer and DBMS for use in case of emergency.

Personal access control to the software of a database installation
Multilevel passwords and privileges, and hardware and software challenge/response tokens, can properly identify authorised users of resources.

Insurance coverage for the data in the database
The DBA or security officer must secure an insurance policy to provide financial protection in the event of a database failure. The insurance may be expensive, but it is less expensive than the disaster created by massive data loss.

Two additional points:
- Data recovery and contingency plans must be thoroughly tested and evaluated, and they must be practised frequently. So-called fire drills are not to be disparaged, and they require top-level management's support and enforcement.
- A backup and recovery program is not likely to cover all components of an information system.
Therefore, it is appropriate to establish priorities concerning the nature and extent of the data recovery process.

Data Distribution and Use

Data are useful only when they reach the right users at the right time. The DBA is responsible for ensuring that the data are distributed to the right people, at the right time, and in the right format. The DBA's data distribution and use tasks can become very time-consuming, especially when the data delivery capacity is based on a typical applications programming environment, where users depend on programmers to deliver the programs to access the data in the database. Although the Internet and its intranet and extranet extensions have opened databases to corporate users, their use has also created a new set of challenges for the DBA.

Current data distribution philosophy makes it easy for authorised end-users to access the database. One way to accomplish this is to facilitate the use of a new generation of more sophisticated query tools and Internet Web front-ends. These enable the DBA to educate end-users to produce the required information without depending on application programmers. The DBA must ensure that appropriate standards and procedures are adhered to. This distribution philosophy is common today, and it is likely to become more common as database technology marches on. Such an environment is more flexible for the end-user. Enabling end-users to become relatively self-sufficient in the acquisition and use of data can lead to more efficient use of data in the decision process. Yet this 'data democracy' can also produce some troublesome side effects. Letting end-users micromanage their data subsets could inadvertently sever the connection between those users and the data administration function. The DBA's job under those circumstances might become sufficiently complicated to compromise the efficiency of the data administration function.
Data duplication might flourish again without checks at the organisational level to ensure the uniqueness of data elements. Thus, end-users who do not completely understand the nature and sources of data might make improper use of the data elements.

10.6.2 The Technical Role of the DBA

The DBA's technical role requires a broad understanding of DBMS functions, configuration, programming languages, data modelling, design methodologies, and other DBMS-related issues. The DBA's technical activities include the selection, installation, operation, maintenance, and upgrading of the DBMS and utility software, as well as the design, development, implementation, and maintenance of the application programs that interact with the database. Many of the DBA's technical activities are a logical extension of the DBA's managerial activities. The technical aspects of the DBA's job are rooted in the following areas of operation:

- Evaluating, selecting, and installing the DBMS and related utilities
- Designing and implementing databases and applications
- Testing and evaluating databases and applications
- Operating the DBMS, utilities, and applications
- Training and supporting users
- Maintaining the DBMS, utilities, and applications

Evaluating, Selecting, and Installing the DBMS and Utilities

One of the DBA's first and most important technical responsibilities is selecting the database management system, utility software, and supporting hardware for use in the organisation. Therefore, the DBA must develop and execute a plan for evaluating and selecting the DBMS, utilities, and hardware. The plan must be based primarily on the organisation's needs rather than on specific software and hardware features. The DBA must recognise that the search is for solutions to problems rather than for a computer or DBMS software. A DBMS is a management tool, not a technological toy. The first and most important step of the evaluation and acquisition plan is to determine company needs.
To establish a clear picture of those needs, the DBA must make sure that the entire end-user community, including top- and mid-level managers, is involved in the process. Once the needs are identified, the objectives of the data administration function can be established, and the DBMS features and selection criteria can be defined. The pros and cons of several alternative solutions must be evaluated during the selection process. Available alternatives are often restricted because the software must be compatible with the organisation's existing computer system. A DBMS is just part of the solution: it requires support from other hardware, application software, and utility programs.
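The evaluation of pros and cons across alternative solutions is often organised as a weighted scoring matrix: each selection criterion gets an importance weight, each candidate DBMS is rated against each criterion, and the weighted totals rank the alternatives. The sketch below is illustrative only; the criteria, weights, ratings, and candidate names are hypothetical, not drawn from the text:

```python
def score_alternatives(weights, ratings):
    """Weighted-sum scoring of candidate DBMS packages.

    weights: {criterion: importance weight}
    ratings: {candidate: {criterion: rating}}
    Returns (candidate, total) pairs sorted best-first.
    """
    totals = {
        name: sum(weights[c] * r.get(c, 0) for c in weights)
        for name, r in ratings.items()
    }
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical criteria (weights 1-5) and ratings (1-5), for illustration only.
weights = {"fits requirements": 5, "compatibility": 4, "cost/ROI": 3}
ratings = {
    "DBMS A": {"fits requirements": 4, "compatibility": 5, "cost/ROI": 2},
    "DBMS B": {"fits requirements": 5, "compatibility": 3, "cost/ROI": 4},
}
```

Basing the weights on organisational needs, rather than on vendor feature lists, is precisely the point the text makes: the matrix forces the "solutions to problems, not software features" discipline into the selection process.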