CIS9340Chapter 3-Database architectures and the web.pdf
CIS 9340 CHAPTER 3
DATABASE ARCHITECTURES AND THE WEB

MULTI-USER DBMS ARCHITECTURES

The traditional architecture for multi-user systems was teleprocessing, where there is one computer with a single central processing unit (CPU) and a number of terminals. All processing is performed within the boundaries of the same physical computer.

The terminals send messages via the communications control subsystem of the operating system to the user’s application program, which in turn uses the services of the DBMS. In the same way, messages are routed back to the user’s terminal. Unfortunately, this architecture placed a tremendous burden on the central computer, which had to not only run the application programs and the DBMS, but also carry out a significant amount of work on behalf of the terminals (such as formatting data for display on the screen).

File-Server Architecture

File server: A computer attached to a network with the primary purpose of providing shared storage for computer files such as documents, spreadsheets, images, and databases.

In a file-server environment, the processing is distributed about the network, typically a local area network (LAN). The file-server holds the files required by the applications and the DBMS. However, the applications and the DBMS run on each workstation, requesting files from the file-server when necessary. The DBMS on each workstation sends requests to the file-server for all data that the DBMS requires that is stored on disk. This approach can generate a significant amount of network traffic, which can lead to performance problems.

Consider a query over the Branch and Staff relations that returns staff names. As the file-server has no knowledge of SQL, the DBMS must request the files corresponding to the Branch and Staff relations from the file-server, rather than just the staff names that satisfy the query.
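The traffic difference can be made concrete with a minimal sketch. It assumes a hypothetical Branch/Staff schema and invented rows (the transcript does not give the actual query), and it uses an in-memory SQLite database as a stand-in for both the shared files and a database server. The first function mimics a workstation DBMS that pulls both relations in full and filters locally; the second mimics a server that evaluates the query itself and ships only the answer.

```python
import sqlite3

# Hypothetical schema and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Branch (branchNo TEXT PRIMARY KEY, street TEXT);
    CREATE TABLE Staff  (staffNo TEXT PRIMARY KEY, name TEXT, branchNo TEXT);
    INSERT INTO Branch VALUES ('B1', '163 Main St'), ('B2', '22 Deer Rd');
    INSERT INTO Staff VALUES
        ('S1', 'Ann', 'B1'), ('S2', 'David', 'B1'),
        ('S3', 'Mary', 'B2'), ('S4', 'John', 'B2'), ('S5', 'Susan', 'B2');
""")

def file_server_style(street):
    """Workstation DBMS fetches both relations in full, then filters locally."""
    branches = conn.execute("SELECT branchNo, street FROM Branch").fetchall()
    staff = conn.execute("SELECT staffNo, name, branchNo FROM Staff").fetchall()
    wanted = {branch_no for branch_no, s in branches if s == street}
    names = sorted(name for _, name, branch_no in staff if branch_no in wanted)
    # Every row of both relations had to cross the network.
    return names, len(branches) + len(staff)

def query_shipping_style(street):
    """A server evaluates the query; only the matching names cross the network."""
    rows = conn.execute(
        "SELECT s.name FROM Staff s JOIN Branch b ON s.branchNo = b.branchNo "
        "WHERE b.street = ? ORDER BY s.name",
        (street,),
    ).fetchall()
    return [r[0] for r in rows], len(rows)

fs_names, fs_rows = file_server_style("163 Main St")
qs_names, qs_rows = query_shipping_style("163 Main St")
print("file-server style:   ", fs_names, "rows transferred:", fs_rows)
print("query-shipping style:", qs_names, "rows transferred:", qs_rows)
```

Both functions return the same names, but the file-server style moves all seven rows while the query-shipping style moves two. Row counts are only a rough proxy for network traffic; real file servers transfer files or disk blocks rather than rows.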
The file-server architecture, therefore, has three main disadvantages:
(1) There is a large amount of network traffic.
(2) A full copy of the DBMS is required on each workstation.
(3) Concurrency, recovery, and integrity control are more complex, because there can be multiple DBMSs accessing the same files.

Traditional Two-Tier Client–Server Architecture

To overcome the disadvantages of the first two approaches and accommodate an increasingly decentralized business environment, the client–server architecture was developed. Client–server refers to the way in which software components interact to form a system. There is a client process, which requires some resource, and a server, which provides the resource. There is no requirement that the client and server must reside on the same machine. In practice, it is quite common to place a server at one site in a LAN and the clients at the other sites.

Data-intensive business applications consist of four major components: the database, the transaction logic, the business and data application logic, and the user interface. The traditional two-tier client–server architecture provides a very basic separation of these components. The client (tier 1) is primarily responsible for the presentation of data to the user, and the server (tier 2) is primarily responsible for supplying data services to the client. Presentation services handle user interface actions and the main business and data application logic.

A typical interaction between client and server is as follows. The client takes the user’s request, checks the syntax, and generates database requests in SQL or another database language appropriate to the application logic. It then transmits the message to the server, waits for a response, and formats the response for the end-user.
The server accepts and processes the database requests, then transmits the results back to the client. The processing involves checking authorization, ensuring integrity, maintaining the system catalog, and performing query and update processing. In addition, it also provides concurrency and recovery control.

There are many advantages to this type of architecture. For example:
• It enables wider access to existing databases.
• Increased performance: If the clients and server reside on different computers, then different CPUs can be processing applications in parallel. It should also be easier to tune the server machine if its only task is to perform database processing.
• Hardware costs may be reduced: It is only the server that requires storage and processing power sufficient to store and manage the database.
• Communication costs are reduced: Applications carry out part of the operations on the client and send only requests for database access across the network, resulting in less data being sent across the network.
• Increased consistency: The server can handle integrity checks, so that constraints need be defined and validated only in the one place, rather than having each application program perform its own checking.
• It maps on to open systems architecture quite naturally.

Three-Tier Client–Server Architecture

The need for enterprise scalability challenged the traditional two-tier client–server model. In the mid 1990s, as applications became more complex and could potentially be deployed to hundreds or thousands of end-users, the client side presented two problems that prevented true scalability:
• A “fat” client, requiring considerable resources on the client’s computer to run effectively. This includes disk space, RAM, and CPU power.
• A significant client-side administration overhead.
By 1995, a new variation of the traditional two-tier client–server model appeared to solve the problem of enterprise scalability. This new architecture proposed three layers, each potentially running on a different platform:
(1) The user interface layer, which runs on the end-user’s computer (the client).
(2) The business logic and data processing layer. This middle tier runs on a server and is often called the application server.
(3) A DBMS, which stores the data required by the middle tier. This tier may run on a separate server called the database server.

The three-tier design has many advantages over traditional two-tier or single-tier designs, which include:
• The need for less expensive hardware because the client is “thin.”
• Application maintenance is centralized with the transfer of the business logic for many end-users into a single application server. This eliminates the concerns of software distribution that are problematic in the traditional two-tier client–server model.
• The added modularity makes it easier to modify or replace one tier without affecting the other tiers.
• Load balancing is easier with the separation of the core business logic from the database functions.

An additional advantage is that the three-tier architecture maps quite naturally to the Web environment, with a Web browser acting as the “thin” client.

N-Tier Architectures

The three-tier architecture can be expanded to n tiers, with additional tiers providing more flexibility and scalability. For example, the middle tier of the architecture could be split into two, with one tier for the Web server and another tier for the application server. In environments with a high volume of throughput, the single Web server could be replaced by a set of Web servers (or a Web farm) to achieve efficient load balancing.
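As a rough illustration of how a Web farm spreads requests, the sketch below hands each incoming request to the next server in a fixed rotation (round-robin). The server names and the `RoundRobinBalancer` class are hypothetical, and round-robin is only one of several possible policies; the transcript does not say which policy a given Web farm uses.

```python
from collections import Counter
from itertools import cycle

class RoundRobinBalancer:
    """Hands each request to the next server in a fixed rotation."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._rotation = cycle(servers)

    def route(self, request):
        """Return the (server, request) pair chosen for this request."""
        return next(self._rotation), request

# Hypothetical Web farm of three servers.
farm = ["web1", "web2", "web3"]
balancer = RoundRobinBalancer(farm)

# Route nine simulated requests and record which server got each one.
assignments = [balancer.route(f"GET /page/{i}")[0] for i in range(9)]
load = Counter(assignments)
print(dict(load))
```

With nine requests and three servers, each server receives three. A production balancer would usually also track server health and current load, which this sketch leaves out.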
Application server: Hosts an application programming interface (API) to expose business logic and business processes for use by other applications.

An application server must handle a number of complex issues:
• concurrency;
• network connection management;
• providing access to all the database servers;
• database connection pooling;
• legacy database support;
• clustering support;
• load balancing;
• failover.

• Java Platform, Enterprise Edition (JEE), previously known as J2EE, is a specification for a platform for server programming in the Java programming language. As with other Java Community Process specifications, JEE is also considered informally to be a standard, as providers must agree to certain conformance requirements in order to declare their products to be “JEE-compliant.” A JEE application server can handle the transactions, security, scalability, concurrency, and management of the components that are deployed to it, meaning that the developers should be able to concentrate more on the business logic of the components rather than on infrastructure and integration tasks. Some well known JEE application servers are WebLogic Server and Oracle GlassFish Server from Oracle Corporation, JBoss from Red Hat, WebSphere Application Server from IBM, and the open source GlassFish Application Server. We discuss the JEE platform and the technologies associated with accessing databases in Section 29.7.
• .NET Framework is Microsoft’s offering for supporting the development of the middle tier. We discuss Microsoft .NET in Section 29.8.
• Oracle Application Server provides a set of services for assembling a scalable multitier infrastructure to support e-Business.

Middleware

Middleware: Computer software that connects software components or applications.
Middleware is a generic term used to describe software that mediates with other software and allows for communication between disparate applications in a heterogeneous system. The need for middleware arises when distributed systems become too complex to manage efficiently without a common interface. The need to make heterogeneous systems work efficiently across a network and be flexible enough to incorporate frequent modifications led to the development of middleware, which hides the underlying complexity of distributed systems.

Hurwitz (1998) defines six main types of middleware:
• Asynchronous Remote Procedure Call (RPC): An interprocess communication technology that allows a client to request a service in another address space (typically on another computer across a network) without waiting for a response. An RPC is initiated by the client sending a request message to a known remote server in order to execute a specified procedure using supplied parameters. This type of middleware tends to be highly scalable, as very little information about the connection and the session is maintained by either the client or the server. On the other hand, if the connection is broken, the client has to start over again from the beginning, so the protocol has low recoverability. Asynchronous RPC is most appropriate when transaction integrity is not required.
• Synchronous RPC: Similar to asynchronous RPC; however, while the server is processing the call, the client is blocked (it has to wait until the server has finished processing before resuming execution). This type of middleware is the least scalable but has the best recoverability. There are a number of analogous protocols to RPC, such as:
– Java’s Remote Method Invocation (Java RMI) API provides similar functionality to standard UNIX RPC methods;
– XML-RPC is an RPC protocol that uses XML to encode its calls and HTTP as a transport mechanism.
– Microsoft .NET Remoting offers RPC facilities for distributed systems implemented on the Windows platform.
– CORBA provides remote procedure invocation through an intermediate layer called the “Object Request Broker.”
– The Thrift protocol and framework, developed for the social networking Web site Facebook.
• Publish/subscribe: An asynchronous messaging protocol where subscribers subscribe to messages produced by publishers. Messages can be categorized into classes and subscribers express interest in one or more classes, and receive only messages that are of interest, without knowledge of what (if any) publishers there are. This decoupling of publishers and subscribers allows for greater scalability and a more dynamic network topology. Examples of publish/subscribe middleware include TIBCO Rendezvous from TIBCO Software Inc. and Ice (Internet Communications Engine) from ZeroC Inc.
• Message-oriented middleware (MOM): Software that resides on both the client and server and typically supports asynchronous calls between the client and server applications. Message queues provide temporary storage when the destination application is busy or not connected. There are many MOM products on the market, including WebSphere MQ from IBM; MSMQ (Microsoft Message Queuing); JMS (Java Message Service), which is part of JEE and enables the development of portable, message-based applications in Java; Sun Java System Message Queue (SJSMQ), which implements JMS; and MessageQ from Oracle Corporation.
• Object-request broker (ORB): Manages communication and data exchange between objects. ORBs promote interoperability of distributed object systems by allowing developers to build systems by integrating together objects, possibly from different vendors, that communicate with each other via the ORB.
The Common Object Request Broker Architecture (CORBA) is a standard defined by the Object Management Group (OMG) that enables software components written in multiple computer languages and running on multiple computers to work together. An example of a commercial ORB middleware product is Orbix from Progress Software.
• SQL-oriented data access: Connects applications with databases across the network and translates SQL requests into the database’s native SQL or other database language. SQL-oriented middleware eliminates the need to code SQL-specific calls for each database and to code the underlying communications. More generally, database-oriented middleware connects applications to any type of database (not necessarily a relational DBMS through SQL). Examples include Microsoft’s ODBC (Open Database Connectivity) API, which exposes a single interface to facilitate access to a database and then uses drivers to accommodate differences between databases, and the JDBC API, which uses a single set of Java methods to facilitate access to multiple databases. Within this category we would also include gateways, which act as mediators in distributed DBMSs to translate one database language or dialect of a language into another language or dialect (for example, Oracle SQL into IBM’s DB2 SQL or Microsoft SQL Server SQL into Object Query Language, or OQL).

Transaction Processing Monitors

TP Monitor: A program that controls data transfer between clients and servers in order to provide a consistent environment, particularly for online transaction processing (OLTP).

Complex applications are often built on top of several resource managers (such as DBMSs, operating systems, user interfaces, and messaging software).
A Transaction Processing Monitor, or TP Monitor, is a middleware component that provides access to the services of a number of resource managers and provides a uniform interface for programmers who are developing transactional software. A TP Monitor forms the middle tier of a three-tier architecture.

TP Monitors provide significant advantages, including:
• Transaction routing: The TP Monitor can increase scalability by directing transactions to specific DBMSs.
• Managing distributed transactions: The TP Monitor can manage transactions that require access to data held in multiple, possibly heterogeneous, DBMSs. For example, a transaction may need to update data items held in an Oracle DBMS at site 1, an Informix DBMS at site 2, and an IMS DBMS at site 3. TP Monitors normally control transactions using the X/Open Distributed Transaction Processing (DTP) standard. A DBMS that supports this standard can function as a resource manager under the control of a TP Monitor acting as a transaction manager.
• Load balancing: The TP Monitor can balance client requests across multiple DBMSs on one or more computers by directing client service calls to the least loaded server. In addition, it can dynamically bring in additional DBMSs as required to provide the necessary performance.
• Funneling: In environments with a large number of users, it may sometimes be difficult for all users to be logged on simultaneously to the DBMS. In many cases, we would find that users generally do not need continuous access to the DBMS. Instead of each user connecting to the DBMS, the TP Monitor can establish connections with the DBMSs as and when required, and can funnel user requests through these connections. This allows a larger number of users to access the available DBMSs with a potentially much smaller number of connections, which in turn would mean less resource usage.
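Funneling can be sketched in a few lines. The example below is an illustration under stated assumptions, not how any particular TP Monitor is implemented: the `Funnel` class, the pool size, and the number of simulated users are all hypothetical, and in-memory SQLite connections stand in for connections to a DBMS. Ten user threads share two connections, with each request borrowing a connection and returning it when done.

```python
import queue
import sqlite3
import threading

POOL_SIZE = 2    # connections held open by the funnel (assumed value)
NUM_USERS = 10   # simulated concurrent users (assumed value)

class Funnel:
    """Routes many user requests through a small, fixed set of connections."""

    def __init__(self, pool_size):
        self._pool = queue.Queue()
        for _ in range(pool_size):
            # check_same_thread=False lets a connection be handed between
            # threads; the queue ensures only one thread uses it at a time.
            self._pool.put(sqlite3.connect(":memory:", check_same_thread=False))

    def run(self, sql, params=()):
        conn = self._pool.get()          # blocks until a connection is free
        try:
            return conn.execute(sql, params).fetchall()
        finally:
            self._pool.put(conn)         # always hand the connection back

funnel = Funnel(POOL_SIZE)
results = {}

def user(uid):
    # Each simulated user issues one trivial request through the funnel.
    results[uid] = funnel.run("SELECT ? * 2", (uid,))[0][0]

threads = [threading.Thread(target=user, args=(u,)) for u in range(NUM_USERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(results), "users served over", POOL_SIZE, "connections")
```

The blocking queue is the design choice that matters here: a user who arrives while both connections are busy simply waits, so the DBMS never sees more than `POOL_SIZE` sessions regardless of how many users there are.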
The TP Monitor acts as a transaction manager, performing the necessary actions to maintain the consistency of the database, with the DBMS acting as a resource manager. If the DBMS fails, the TP Monitor may be able to resubmit the transaction to another DBMS or can hold the transaction until the DBMS becomes available again. TP Monitors are typically used in environments with a very high volume of transactions, where the TP Monitor can be used to offload processes from the DBMS server. Prominent examples of TP Monitors include CICS (which is used primarily on IBM mainframes under z/OS and z/VSE) and Tuxedo from Oracle Corporation. In addition, the Java Transaction API (JTA), one of the Java Enterprise Edition (JEE) APIs, enables distributed transactions to be performed across multiple X/Open XA resources in a Java environment. Open-source implementations of JTA include JBossTS, formerly known as Arjuna Transaction Service, from Red Hat and Bitronix Transaction Manager from Bitronix.

Web Services and Service-Oriented Architectures

Web Services

Web service: A software system designed to support interoperable machine-to-machine interaction over a network.

Although it has been only about 20 years since the conception of the Internet, in this relatively short period of time it has profoundly changed many aspects of society, including business, government, broadcasting, shopping, leisure, communication, education, and training. Though the Internet has allowed companies to provide a wide range of services to users, sometimes called B2C (Business to Consumer), Web services allow applications to integrate with other applications across the Internet and may be a key technology that supports B2B (Business to Business) interaction. Unlike other Web-based applications, Web services have no user interface and are not aimed at Web browsers.
Web services instead share business logic, data, and processes through a programmatic interface across a network. In this way, it is the applications that interface and not the users. Developers can then add the Web service to a Web page (or an executable program) to offer specific functionality to users.

Examples of Web services include:
• Microsoft Bing Maps and Google Maps Web services provide access to location-based services, such as maps, driving directions, proximity searches, and geocoding (that is, converting addresses into geographic coordinates) and reverse geocoding.
• Amazon Simple Storage Service (Amazon S3) is a simple Web services interface that can be used to store and retrieve large amounts of data, at any time, from anywhere on the Web. It gives any developer access to the same highly scalable, reliable, fast, inexpensive data storage infrastructure that Amazon uses to run its own global network of Web sites. Charges are based on the “pay-as-you-go” policy, currently $0.125 per GB for the first 50TB/month of storage used.
• Geonames provides a number of location-related Web services; for example, to return a set of Wikipedia entries as XML documents for a given place name or to return the time zone for a given latitude/longitude.
• DOTS Web services from Service Objects Inc., an early adopter of Web services, provide a range of services such as company information, reverse telephone number lookup, email address validation, weather information, and IP address-to-location determination.
• Xignite is a B2B Web service that allows companies to incorporate financial information into their applications. Services include US equities information, real-time securities quotes, US equities pricing, and financial news.

Key to the Web services approach is the use of widely accepted technologies and standards, such as:
• XML (eXtensible Markup Language).
• SOAP (Simple Object Access Protocol) is a communication protocol for exchanging structured information over the Internet and uses a message format based on XML. It is both platform- and language-independent.
• WSDL (Web Services Description Language) protocol, again based on XML, is used to describe and locate a Web service.
• UDDI (Universal Description, Discovery, and Integration) protocol is a platform-independent, XML-based registry for businesses to list themselves on the Internet. It was designed to be interrogated by SOAP messages and to provide access to WSDL documents describing the protocol bindings and message formats required to interact with the Web services listed in its directory.

Connolly, Thomas. Database Systems (p. 70). Pearson Education. Kindle Edition.

RESTful Web services

Web API is a development in Web services where emphasis has been moving away from SOAP-based services towards Representational State Transfer (REST) based communications. REST services do not require XML, SOAP, WSDL, or UDDI definitions. REST is an architectural style that specifies constraints, such as a uniform interface, that if applied to a Web service create desirable properties, such as performance, scalability, and modifiability, that enable services to work best on the Web. In the REST architectural style, data and functionality are considered resources and are accessed using Uniform Resource Identifiers (URIs), generally links on the Web. The resources are acted upon by using a set of simple, well-defined HTTP operations for create, read, update, and delete: PUT, GET, POST, and DELETE. PUT creates a new resource, which can be then deleted by using DELETE. GET retrieves the current state of a resource in some representation. POST transfers a new state into a resource. REST adopts a client–server architecture and is designed to use a stateless communication protocol, typically HTTP.
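The mapping of the four operations named above onto resources can be shown with a small in-memory sketch. Nothing here touches a network: the `ResourceStore` class, the URI `/staff/SG37`, and the sample data are hypothetical, and the status codes are the standard HTTP ones. The method-to-action mapping follows the description given here (PUT creates, POST transfers a new state); many real-world APIs instead use POST to create and PUT to replace.

```python
class ResourceStore:
    """In-memory stand-in for a REST service; no real HTTP is involved."""

    def __init__(self):
        self._resources = {}   # URI -> current representation

    def handle(self, method, uri, body=None):
        """Return a (status_code, representation) pair for one request."""
        if method == "PUT":                    # create a new resource
            self._resources[uri] = body
            return 201, body
        if uri not in self._resources:         # everything else needs a target
            return 404, None
        if method == "GET":                    # read the current state
            return 200, self._resources[uri]
        if method == "POST":                   # transfer a new state
            self._resources[uri] = body
            return 200, body
        if method == "DELETE":                 # remove the resource
            del self._resources[uri]
            return 204, None
        return 405, None                       # method not supported

store = ResourceStore()
created = store.handle("PUT", "/staff/SG37", {"name": "Ann"})
fetched = store.handle("GET", "/staff/SG37")
updated = store.handle("POST", "/staff/SG37", {"name": "Ann", "position": "Manager"})
deleted = store.handle("DELETE", "/staff/SG37")
missing = store.handle("GET", "/staff/SG37")
print(created, fetched, updated, deleted, missing, sep="\n")
```

Each call carries everything the store needs (method, URI, body), so no session state is kept between requests, which is the stateless property the REST style relies on.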
In the REST architecture style, clients and servers exchange representations of resources by using a standardized interface and protocol.

Service-Oriented Architectures (SOA)

SOA: A business-centric software architecture for building applications that implement business processes as sets of services published at a granularity relevant to the service consumer. Services can be invoked, published, and discovered, and are abstracted away from the implementation using a single standards-based form of interface.

Flexibility is recognized as a key requirement for businesses in a time when IT is providing business opportunities that were never envisaged in the past while at the same time the underlying technologies are rapidly changing. Reusability has often been seen as a major goal of software development and underpins the object-oriented paradigm: object-oriented programming (OOP) may be viewed as a collection of cooperating objects, as opposed to a traditional view in which a program may be seen as a group of tasks to compute.

This architecture has three processes: Service Scheduling, Order Processing, and Account Management, each accessing a number of databases. Clearly there are common “services” in the activities to be performed by these processes. If the business requirements change or new opportunities present themselves, the lack of independence among these processes may lead to difficulties in quickly adapting these processes. The SOA approach attempts to overcome this difficulty by designing loosely coupled and autonomous services that can be combined to provide flexible composite business processes and applications.
The following are a set of common SOA principles that provide a unique design approach for building Web services for SOA:
• Loose coupling: Services must be designed to interact on a loosely coupled basis;
• Reusability: Logic that can potentially be reused is designed as a separate service;
• Contract: Services adhere to a communications contract that defines the information exchange and any additional service description information, specified by one or more service description documents;
• Abstraction: Beyond what is described in the service contract, services hide logic from the outside world;
• Composability: Services may compose other services, so that logic can be represented at different levels of granularity, thereby promoting reusability and the creation of abstraction layers;
• Autonomy: Services have control over the logic they encapsulate and are not dependent upon other services to execute this governance;
• Stateless: Services should not be required to manage state information, as this can affect their ability to remain loosely coupled;
• Discoverability: Services are designed to be outwardly descriptive so that they can be found and assessed via available discovery mechanisms.

Note that SOA is not restricted to Web services.

Distributed DBMSs

A major motivation behind the development of database systems is the desire to integrate the operational data of an organization and to provide controlled access to the data. Although we may think that integration and controlled access implies centralization, this is not the intention.
This decentralized approach mirrors the organizational structure of many companies, which are logically distributed into divisions, departments, projects, and so on, and physically distributed into offices, plants, or factories, where each unit maintains its own operational data. The development of a distributed DBMS that reflects this organizational structure, makes the data in all units accessible, and stores data proximate to the location where it is most frequently used, should improve the ability to share the data and should improve the efficiency with which we can access the data.

Distributed database: A logically interrelated collection of shared data (and a description of this data), physically distributed over a computer network.

Distributed DBMS: The software system that permits the management of the distributed database and makes the distribution transparent to users.

A distributed database management system (DDBMS) consists of a single logical database that is split into a number of fragments. Each fragment is stored on one or more computers (replicas) under the control of a separate DBMS, with the computers connected by a communications network. Each site is capable of independently processing user requests that require access to local data (that is, each site has some degree of local autonomy) and is also capable of processing data stored on other computers in the network. Users access the distributed database via applications. Applications are classified as those that do not require data from other sites (local applications) and those that do require data from other sites (global applications).
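The distinction between local and global applications follows directly from where fragments are allocated, as the minimal sketch below shows. The fragment names, site names, allocation table, and the `classify` helper are all hypothetical; a real DDBMS would consult its global system catalog rather than a dictionary.

```python
# Hypothetical allocation of fragments (and their replicas) to sites.
ALLOCATION = {
    "Staff_London":  {"site1"},
    "Staff_Glasgow": {"site2", "site3"},           # replicated fragment
    "Branch":        {"site1", "site2", "site3"},  # fully replicated
}

def classify(origin_site, fragments_needed):
    """Return 'local' if origin_site holds a copy of every needed fragment,
    otherwise 'global' (data from at least one other site is required)."""
    for fragment in fragments_needed:
        if origin_site not in ALLOCATION[fragment]:
            return "global"
    return "local"

local_case = classify("site1", ["Staff_London", "Branch"])
global_case = classify("site1", ["Staff_Glasgow"])
replica_case = classify("site3", ["Staff_Glasgow", "Branch"])
print(local_case, global_case, replica_case)
```

The third call shows why replication matters: site 3 can treat a request for `Staff_Glasgow` as local only because a replica of that fragment is stored there.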
A DDBMS therefore has the following characteristics:
• a collection of logically related shared data;
• data split into a number of fragments;
• fragments may be replicated;
• fragments/replicas are allocated to sites;
• sites are linked by a communications network;
• data at each site is under the control of a DBMS;
• DBMS at each site can handle local applications, autonomously;
• each DBMS participates in at least one global application.

It is not necessary for every site in the system to have its own local database.

Distributed processing

It is important to make a distinction between a distributed DBMS and distributed processing. The key point with the definition of a distributed DBMS is that the system consists of data that is physically distributed across a number of sites in the network. If the data is centralized, even though other users may be accessing the data over the network, we do not consider this to be a distributed DBMS, simply distributed processing. The topology of distributed processing has a central database at one site (site 2 in the figure), whereas a distributed DBMS has several sites, each with its own database.

Data Warehousing

Since the 1970s, organizations have largely focused their investment in new computer systems (called online transaction processing or OLTP systems) that automate business processes. In this way, organizations gained competitive advantage through systems that offered more efficient and cost-effective services to the customer. The concept of a data warehouse was deemed the solution to meet the requirements of a system capable of supporting decision making, receiving data from multiple operational data sources.
Data warehouse: A consolidated/integrated view of corporate data drawn from disparate operational data sources and a range of end-user access tools capable of supporting simple to highly complex queries to support decision making.

The data held in a data warehouse is described as being subject-oriented, integrated, time-variant, and nonvolatile (Inmon, 1993).
• Subject-oriented, as the warehouse is organized around the major subjects of the organization (such as customers, products, and sales) rather than the major application areas (such as customer invoicing, stock control, and product sales). This is reflected in the need to store decision-support data rather than application-oriented data.
• Integrated, because of the coming together of source data from different organization-wide applications systems. The source data is often inconsistent, using, for example, different data types and/or formats. The integrated data source must be made consistent to present a unified view of the data to the users.
• Time-variant, because data in the warehouse is accurate and valid only at some point in time or over some time interval.
• Nonvolatile, as the data is not updated in real time but is refreshed from operational systems on a regular basis. New data is always added as a supplement to the database, rather than a replacement.

The typical architecture of a data warehouse

The source of operational data for the data warehouse is supplied from mainframes, proprietary file systems, private workstations and servers, and external systems such as the Internet.

An operational data store (ODS) is a repository of current and integrated operational data used for analysis. It is often structured and supplied with data in the same way as the data warehouse, but may in fact act simply as a staging area for data to be moved into the warehouse.
The load manager performs all the operations associated with the extraction and loading of data into the warehouse. The warehouse manager performs all the operations associated with the management of the data, such as the transformation and merging of source data; creation of indexes and views on base tables; generation of aggregations; and backing up and archiving data. The query manager performs all the operations associated with the management of user queries. Detailed data is not stored online but is made available by summarizing the data to the next level of detail. However, on a regular basis, detailed data is added to the warehouse to supplement the summarized data. The warehouse stores all the predefined lightly and highly summarized data generated by the warehouse manager.
Detailed and summarized data is stored offline for the purposes of archiving and backup. Metadata (data about data) definitions are used by all the processes in the warehouse, including the extraction and loading processes, the warehouse management process, and the query management process. The principal purpose of data warehousing is to provide information to business users for strategic decision making. These users interact with the warehouse using end-user access tools. The data warehouse must efficiently support ad hoc and routine analysis as well as more complex data analysis. The types of end-user access tools typically include reporting and query tools, application development tools, executive information system (EIS) tools, online analytical processing (OLAP) tools, and data mining tools.
Cloud Computing
A model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction (NIST, 2011).
NIST, 2011. The NIST Definition of Cloud Computing, NIST Special Publication 800-145, National Institute of Standards and Technology, September 2011.
Cloud computing is the term given to the use of multiple servers over a digital network as if they were one computer. The ‘Cloud’ itself is a virtualization of resources (networks, servers, applications, data storage, and services) to which the end-user has on-demand access. Virtualization is the creation of a virtual version of something, such as a server, operating system, storage device, or network resource.
The essential characteristics are as follows:
• On-demand self-service. Consumers can obtain, configure, and deploy cloud services themselves using cloud service catalogues, without requiring the assistance of anyone from the cloud provider.
• Broad network access. This is the most vital characteristic of cloud computing: it is network based and accessible from anywhere, from any standardized platform (e.g., desktop computers, laptops, mobile devices).
• Resource pooling. The cloud provider’s computing resources are pooled to serve multiple consumers, with different physical and virtual resources dynamically assigned and reassigned according to consumer demand. Examples of resources include storage, processing, memory, and network bandwidth.
• Rapid elasticity. Resource pooling avoids the capital expenditure required for the establishment of network and computing infrastructure. By outsourcing to a cloud, consumers can cater for the spikes in demand for their services by using the cloud provider’s computing capacity, and the risk of outages and service interruptions is significantly reduced. Moreover, capabilities can be elastically provisioned and released, in some cases automatically, to scale rapidly based on demand.
To the consumer, the capabilities available for provisioning often appear to be unlimited and can be called on in any quantity at any time.
• Measured service. Cloud systems automatically control and optimize resource use by leveraging a metering capability appropriate to the type of service (e.g., storage, processing, bandwidth, and active user accounts). Resource usage can be monitored, controlled, and charged for.
The three service models defined by NIST are as follows:
• Software as a Service (SaaS). Software and associated data are centrally hosted on the cloud. SaaS is typically accessed from various client devices through a thin client interface, such as a Web browser. The consumer does not manage or control the underlying cloud infrastructure, with the possible exception of limited user-specific application configuration settings. Examples include Salesforce.com sales management applications and NetSuite’s integrated business management software.
• Platform as a Service (PaaS). PaaS is a computing platform that allows the creation of web applications quickly and easily, without the complexity of buying and maintaining the software and infrastructure underneath it. Sometimes, PaaS is used to extend the capabilities of applications developed as SaaS. While earlier application development required hardware, an operating system, a database, middleware, Web servers, and other software, with the PaaS model only the knowledge to integrate them is required. The rest is handled by the PaaS provider. Examples of PaaS include Salesforce.com’s Force.com, Google’s App Engine, and Microsoft’s Azure. The consumer does not manage or control the underlying cloud infrastructure, including network, servers, operating systems, or storage, but has control over the deployed applications and possibly configuration settings for the hosting environment.
• Infrastructure as a Service (IaaS).
IaaS delivers servers, storage, network, and operating systems (typically a platform virtualization environment) to consumers as an on-demand service, in a single bundle and billed according to usage. A popular use of IaaS is in hosting Web sites, where the in-house infrastructure is not burdened with this task but left free to manage the business. Amazon’s Elastic Compute Cloud (EC2), Rackspace, and GoGrid are examples of IaaS. The consumer does not manage or control the underlying cloud infrastructure but has control over operating systems, storage, and deployed applications, and possibly limited control of select networking components (e.g., firewalls).
The four main deployment models for the cloud are:
• Private cloud. Cloud infrastructure is operated solely for a single organization, whether managed internally by the organization, a third party, or some combination of them, and it may be hosted internally or externally.
• Community cloud. Cloud infrastructure is shared for exclusive use by a specific community of organizations that have common concerns (e.g., security requirements, compliance, jurisdiction). It may be owned and managed by one or more of the organizations in the community, a third party, or some combination of them, and it may be hosted internally or externally.
• Public cloud. Cloud infrastructure is made available to the general public by a service provider. These services are free or offered on a pay-per-use model. It may be owned and managed by a business, academic, or government organization, or some combination of these. It exists on the premises of the cloud provider.
• Hybrid cloud. Cloud infrastructure is a composition of two or more distinct cloud infrastructures (private, community, or public) that remain unique entities, but are bound together by standardized or proprietary technology, offering the benefits of multiple deployment models.
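The service-model descriptions above each state what the consumer does and does not control. A minimal Python sketch can tabulate those statements for comparison. The layer labels and the two helper functions are names chosen for this illustration only; they are not NIST terminology or part of any cloud provider's API.

```python
# What the consumer controls under each service model, as stated in the
# descriptions above. The layer labels are illustrative names for this sketch.
CONSUMER_CONTROLS = {
    "SaaS": {"user-specific application settings"},
    "PaaS": {"deployed applications", "hosting environment settings"},
    "IaaS": {
        "operating systems",
        "storage",
        "deployed applications",
        "select networking components",
    },
}


def consumer_controls(model, layer):
    """Return True if the consumer controls `layer` under `model`."""
    return layer in CONSUMER_CONTROLS[model]


def models_giving_control(layer):
    """List the service models under which the consumer controls `layer`."""
    return sorted(m for m, layers in CONSUMER_CONTROLS.items() if layer in layers)
```

For example, `models_giving_control("deployed applications")` returns `["IaaS", "PaaS"]`, while `consumer_controls("SaaS", "operating systems")` is `False`: moving from SaaS through PaaS to IaaS, the consumer takes control of progressively lower layers.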
Components of a DBMS
DBMSs are highly complex and sophisticated pieces of software that aim to provide the services described earlier. It is not possible to generalize the component structure of a DBMS, as it varies greatly from system to system. However, when trying to understand database systems, it is useful to view the components and the relationships between them. We examine the architecture of the Oracle DBMS in the next section. A DBMS is partitioned into several software components (or modules), each of which is assigned a specific operation. As stated previously, some of the functions of the DBMS are supported by the underlying operating system. However, the operating system provides only basic services and the DBMS must be built on top of it. Thus, the design of a DBMS must take into account the interface between the DBMS and the operating system.
DBMS components:
• Query processor. This is a major DBMS component that transforms queries into a series of low-level instructions directed to the database manager.
• Database manager (DM). The DM interfaces with user-submitted application programs and queries. The DM accepts queries and examines the external and conceptual schemas to determine what conceptual records are required to satisfy the request. The DM then places a call to the file manager to perform the request.
• File manager. The file manager manipulates the underlying storage files and manages the allocation of storage space on disk. It establishes and maintains the list of structures and indexes defined in the internal schema. If hashed files are used, it calls on the hashing functions to generate record addresses. However, the file manager does not directly manage the physical input and output of data. Rather, it passes the requests on to the appropriate access methods, which either read data from or write data into the system buffer (or cache).
• DML preprocessor. This module converts DML statements embedded in an application program into standard function calls in the host language. The DML preprocessor must interact with the query processor to generate the appropriate code.
• DDL compiler. The DDL compiler converts DDL statements into a set of tables containing metadata. These tables are then stored in the system catalog, while control information is stored in data file headers.
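The effect of the DDL compiler can be observed in a small experiment. The sketch below uses SQLite through Python's standard `sqlite3` module purely because it is readily available; it is not the Oracle architecture discussed in these notes, and the column definitions for `Staff` are assumed for illustration. After a `CREATE TABLE` statement runs, the resulting metadata can be read back from SQLite's own catalog table, `sqlite_master`.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# A DDL statement. The column names and types are illustrative assumptions.
conn.execute("CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, name TEXT NOT NULL)")

# sqlite_master is SQLite's catalog: one row of metadata per schema object.
catalog_rows = conn.execute(
    "SELECT type, name FROM sqlite_master WHERE type = 'table'"
).fetchall()

# PRAGMA table_info returns per-column metadata: (cid, name, type, ...).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(Staff)")]

conn.close()
```

Here `catalog_rows` holds the entry for the `Staff` table and `columns` holds its column names and declared types, which shows in miniature how DDL statements end up as queryable metadata rather than as user data.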