Database System Architectures

Questions and Answers

In a centralized system, how do general-purpose computers typically access shared memory?

  • Via a common bus connecting CPUs and device controllers. (correct)
  • By implementing a message-passing interface.
  • Using a distributed cache coherency protocol.
  • Through a dedicated high-speed network connection.

What is the primary role of the back-end in a client-server database system?

  • Providing a graphical user interface for users.
  • Handling communication between the client and the network.
  • Generating reports and forms for data presentation.
  • Managing access structures, query evaluation, and concurrency control. (correct)

Why is replacing mainframes with client-server architectures beneficial for organizations?

  • Mainframes provide better user interfaces.
  • Client-server architectures offer better functionality for the cost and easier maintenance. (correct)
  • Mainframes are easier to scale and maintain.
  • Client-server architectures centralize all processing tasks.

Which type of server system is commonly used in relational database systems?

  • Transaction servers (correct)

What is the function of the Log writer process in a transaction server?

  • To output log records to stable storage. (correct)

What is the purpose of implementing mutual exclusion in database systems?

  • To ensure that no two processes are accessing the same data structure at the same time. (correct)

What is the primary advantage of using data servers in high-speed LANs?

  • To distribute data processing to clients with comparable processing power. (correct)

What is 'cache coherency' in the context of data caching in data servers?

  • Ensuring that cached data is up-to-date before it is used. (correct)

What characterizes a 'coarse-grain parallel' machine?

  • Consisting of a small number of powerful processors. (correct)

How is 'speedup' measured in the context of parallel systems?

  • By the ratio of the small system elapsed time to the large system elapsed time. (correct)

What is 'transaction scaleup' designed to address in parallel database systems?

  • Numerous small queries submitted by independent users. (correct)

Which factor contributes to sublinear speedup and scaleup in parallel systems?

  • Interference from processes accessing shared resources. (correct)

What is a limitation of using a 'bus' interconnection network in parallel systems?

  • It does not scale well with increasing parallelism. (correct)

In a 'hypercube' interconnection network, how are components connected?

  • Components are connected if their binary representations differ by exactly one bit. (correct)

What is a characteristic of the 'shared memory' architecture in parallel database systems?

  • Processors and disks have access to a common memory. (correct)

Where does the bottleneck typically occur in a 'shared disk' parallel database system?

  • The interconnection to the disk subsystem. (correct)

What is the primary advantage of a 'shared nothing' architecture in parallel database systems?

  • Scalability to thousands of processors without interference. (correct)

What is a 'hierarchical' database architecture a combination of?

  • Shared-memory, shared-disk, and shared-nothing architectures. (correct)

What is a key characteristic of distributed systems concerning data?

  • Data is spread over multiple machines. (correct)

What is the primary goal of homogeneous distributed databases?

  • Providing a unified view of a single database, hiding distribution details. (correct)

How does a 'global transaction' differ from a 'local transaction' in a distributed database?

  • A global transaction accesses data in one or more sites different from where it was initiated. (correct)

What is a significant trade-off in distributed systems related to data management?

  • Increased autonomy for each site but added complexity for coordination. (correct)

What is the purpose of the two-phase commit protocol (2PC) in distributed databases?

  • To ensure atomicity for transactions updating data at multiple sites. (correct)

What is the primary difference between local-area networks (LANs) and wide-area networks (WANs)?

  • LANs are composed of processors distributed over small geographical areas, while WANs cover large areas. (correct)

What is a key characteristic of groupware applications working on WANs with discontinuous connections?

  • Data is replicated, and updates are propagated periodically. (correct)

In a client-server architecture, if the front-end requires data mining and analysis tools, where would these tools reside?

  • On the front end (correct)

To ensure the safe concurrent access of shared data, database systems use mutual exclusion, which is typically implemented using:

  • Operating system semaphores or atomic instructions. (correct)

Considering transaction server processes, what function does the checkpoint process serve?

  • It performs periodic checkpoints, creating consistent states for recovery. (correct)

What is the significance of message passing overhead in page-shipping versus item-shipping scenarios within data servers?

  • Item-shipping always requires higher message passing overhead due to each item needing individual requests. (correct)

In data server architectures, 'lock caching' is employed between transactions. Which statement explains its main benefit?

  • Reducing lock contention and latency, leading to performance gains. (correct)

How does 'skew' affect overall execution time in parallel systems?

  • Skew causes overall execution time to be determined by the slowest of the tasks executing in parallel. (correct)

How often does a server process add log records to the log record buffer?

  • Continually (correct)

Which of these choices is NOT part of the process structure of a transaction server?

  • End user process (correct)

To avoid the overhead of interprocess communication for lock request/grant, what is an alternative?

  • Each database process operates directly on the lock table (correct)

In a shared-nothing system, what does a node function as?

  • The server for the data on the disk or disks the node owns (correct)

What is the disadvantage of added complexity required to ensure proper coordination among sites?

  • Software development cost (correct)

Why are wide-area networks with continuous connection (e.g. the Internet) needed for implementing distributed database systems?

  • So systems may reliably communicate (correct)

What is a downside to shared memory systems?

  • The bus becomes a bottleneck (correct)

Giving a fixed-size problem that runs on a small system to a system that is N times larger is known as:

  • Speedup (correct)

How does a mesh network typically scale?

  • The number of communication links grows with the number of components, so it scales better than a bus (correct)

What is a characteristic of the server processes in a transaction server?

  • They may be multithreaded (correct)

Flashcards

Centralized Systems

Run on a single computer system and do not interact with other computer systems.

General-purpose computer system

One to a few CPUs and device controllers connected through a common bus, providing access to shared memory.

Single-user system

Typically has one CPU, one or two hard disks, and an OS that may support only one user

Client-Server Systems

Server systems satisfy requests generated at 'm' client systems.

Back-end (Client-Server)

Manages access structures, query evaluation/optimization, concurrency control, and recovery in a client-server system.

Front-end (Client-Server)

Consists of tools like forms, report-writers, and GUI facilities in a client-server system.

Transaction Servers

A category of server systems widely used in relational database systems.

Transaction Server Process

Clients send requests, transactions are executed, and results are sent back.

Server Processes

Processes that receive user queries, execute them, and send results back in transaction servers.

Multithreaded Processes

Allows a single process to execute several user queries concurrently.

Database writer process

Outputs modified buffer blocks to disks continually in transaction servers.

Log writer process

Outputs log records to stable storage in transaction servers.

Process monitor process

Monitors other processes and takes recovery actions if any fail in transaction servers.

Mutual Exclusion

Provides synchronization to ensure data integrity in concurrent access.

Data Servers

Category of server systems used often in object-oriented database systems.

Page/Item Shipping

Shipping smaller units (items rather than pages) requires more messages, so it is worth prefetching related items along with the requested one.

Cache Coherency

Ensuring cached data is up-to-date before use in data server systems

Parallel Systems

System consists of multiple processors and disks connected by a fast network.

Coarse-grain Parallel

Parallel machine with a small number of powerful processors.

Fine-grain Parallel

Parallel machine utilizing thousands of smaller processors.

Throughput

The number of tasks completed in a given time.

Response Time

Time it takes to complete a single task.

Speedup

Fixed-size problem given to a system which is N-times larger.

Scaleup

Increase the size of both the problem and the system.

Batch Scaleup

A single large job (decision support query and simulation).

Shared Memory Architecture

Parallel Architecture where processors share common memory

Transaction Scaleup

Small queries by independent users to a shared DB (transaction/timesharing systems).

Startup Costs

Cost of starting up multiple processes may dominate computation.

Interference (Parallel)

Processes compete for resources (e.g., bus, disks, locks).

Bus Network

System components send data on a single communication medium.

Mesh Network

Components in a grid, each connected to adjacent components.

Hypercube Network

Components numbered in binary; connected if binary representations differ by one bit.

Shared Memory

Processors share a common memory.

Shared Disk

Processors share a common disk.

Shared Nothing

Processors share neither a common memory nor common disk.

Hierarchical Architecture

Hybrid of shared-memory, shared-disk, and shared-nothing architectures.

Distributed Systems

Data spread over machines (sites/nodes) interconnected by a network.

Homogeneous Database

Same software/schema; provides view of a single database.

Local Transaction

Data is accessed only at the local site at which the transaction was initiated.

Global Transaction

Data accessed in a site different from the one at which transaction was initiated or in several sites.

Local-Area Networks (LANs)

Composed of processors that are distributed over small geographical areas.

Wide-Area Networks (WANs)

Composed of processors distributed over a large geographical area.

Study Notes

  • Covers database system architectures
  • Includes: centralized and client-server systems, server system architectures, parallel systems, distributed systems, and network types.

Centralized Systems

  • Run on a single computer and do not interact with other computer systems
  • A general-purpose computer system has one to a few CPUs and a number of device controllers connected through a common bus.
  • Access to shared memory is provided by the common bus.
  • A single-user system is typically a desktop unit with one CPU, one or two hard disks, and supports only one user.
  • A multi-user system has more disks, memory, CPUs, uses a multi-user OS, and serves many users via terminals.
  • Multi-user systems are often called server systems.

Client-Server Systems

  • Server systems satisfy requests from m client systems.
  • Database functionality in client-server systems can be divided into back-end and front-end components
  • Back-end manages access structures, query evaluation/optimization, concurrency control, and recovery.
  • Front-end consists of tools such as forms, report-writers, and graphical user interfaces.
  • SQL or an application program interface provides the interface between the front-end and back-end.
  • Replacing mainframes with networks of workstations or personal computers connected to back-end server machines provides better functionality for the cost.
  • Client-server systems offer flexibility in locating resources and expanding facilities, better user interfaces, and easier maintenance.

Server System Architecture

  • Server systems are broadly categorized into transaction servers and data servers
  • Transaction servers are widely used in relational database systems.
  • Data servers are used in object-oriented database systems.

Transaction Servers

  • Also called query server systems or SQL server systems.
  • Clients send requests to the server, which executes the transactions and ships the results back to the client.
  • Requests are specified in SQL and communicated via a remote procedure call (RPC) mechanism.
  • Transactional RPC allows many RPC calls to form a transaction.
  • Open Database Connectivity (ODBC) is a C language API from Microsoft used to connect to a server, send SQL requests, and receive results.
  • The JDBC standard for Java is similar to ODBC.

Transaction Server Process Structure

  • A transaction server consists of multiple processes accessing data in shared memory.
  • Server processes receive user queries (transactions), execute them, and send results back.
  • Server processes may be multithreaded, allowing a single process to execute several user queries concurrently
  • A lock manager process is used.
  • A database writer process outputs modified buffer blocks to disks continually

Transaction Server Processes

  • Server processes simply add log records to the log record buffer; the log writer process outputs them to stable storage.
  • Checkpoint processes perform periodic checkpoints
  • Process monitor processes monitor other processes
  • Process monitor processes take recovery actions if processes fail, such as aborting transactions and restarting processes.
  • Shared memory contains shared data in: the buffer pool, lock table, log buffer, and cached query plans
  • All database processes can access shared memory
  • Database systems implement mutual exclusion using operating system semaphores or atomic instructions to prevent two processes from accessing the same data structure simultaneously.
  • To avoid interprocess communication overhead for lock request/grant, each database process operates directly on the lock table.
  • The lock manager process is still used for deadlock detection
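
The mutual-exclusion point above can be illustrated with a minimal sketch. It is not how any particular database implements its lock table: Python threads stand in for database processes, `threading.Lock` stands in for an operating system semaphore, and the names `lock_table`, `try_lock`, `unlock`, and `contend` are hypothetical.

```python
import threading

# Hypothetical shared lock table: data item -> id of the transaction holding it.
lock_table = {}
# Mutex that guards every read and update of lock_table.
lock_table_mutex = threading.Lock()


def try_lock(item, txn_id):
    """Grant `item` to `txn_id` if it is free; return True on success."""
    with lock_table_mutex:  # only one thread touches the table at a time
        holder = lock_table.get(item)
        if holder is None or holder == txn_id:
            lock_table[item] = txn_id
            return True
        return False


def unlock(item, txn_id):
    """Release `item` if `txn_id` currently holds it."""
    with lock_table_mutex:
        if lock_table.get(item) == txn_id:
            del lock_table[item]


def contend(item, n_threads=8):
    """Start n_threads that all try to lock the same item; return the winners."""
    winners = []

    def worker(txn_id):
        if try_lock(item, txn_id):
            winners.append(txn_id)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return winners
```

Because each worker operates directly on the shared table under the mutex, no message to a separate lock manager is needed to request or grant a lock; calling `contend("page-7")` should return exactly one winning transaction id.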

Data Servers

  • Utilized in high-speed LANs when clients have comparable processing power to the server and the tasks are compute-intensive.
  • Data is shipped to clients for processing, and results are sent back to the server.
  • This architecture requires full back-end functionality at the clients.
  • Used in many object-oriented database systems. Issues include page-shipping vs. item-shipping, locking, data caching, and lock caching.

Data Servers - Page-Shipping vs Item-Shipping

  • Page-shipping involves a larger unit but fewer messages.
  • Item-shipping uses a smaller unit, thus involves more messages
  • Worth prefetching related items along with requested items.
  • Page shipping can be thought of as prefetching
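
The message-count trade-off can be made concrete with a small sketch. It assumes, purely for illustration, that items have integer ids stored consecutively on pages and that every fetch costs one request and one reply; the function names are hypothetical.

```python
def messages_item_shipping(items_needed):
    """One request and one reply per item."""
    return 2 * len(items_needed)


def messages_page_shipping(items_needed, items_per_page):
    """One request and one reply per distinct page that holds a needed item."""
    pages = {item // items_per_page for item in items_needed}
    return 2 * len(pages)


clustered = [0, 1, 2, 3, 10]   # mostly on the same page
scattered = [0, 4, 8]          # one item per page

print(messages_item_shipping(clustered))     # 10
print(messages_page_shipping(clustered, 4))  # 4
print(messages_item_shipping(scattered))     # 6
print(messages_page_shipping(scattered, 4))  # 6
```

When the needed items are clustered, page shipping acts as prefetching and saves messages; when they are scattered, it saves nothing and transfers more data per message.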

Data Servers - Locking

  • Overhead of requesting and getting locks from the server is high due to message delays.
  • Locks can be granted on requested and prefetched items
  • With page shipping, the transaction is granted a lock on the whole page.
  • Locks on prefetched items can be called back by the server and returned by the client if not used.
  • Locks on a page can be deescalated to locks on items when lock conflicts occur, and locks from unused items can then be returned to the server.

Data Servers - Data Caching

  • Data can be cached at the client, even in between transactions.
  • Check that data is up-to-date before being used (cache coherency).
  • Checking can be done when requesting a lock on a data item.
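
One way to picture the coherency check is with per-item version numbers. The notes do not say how the check is performed, so the version scheme below is an assumption, and the `Server` and `Client` classes are hypothetical; locking itself is elided.

```python
class Server:
    def __init__(self):
        self.data = {}  # item -> (version, value)

    def write(self, item, value):
        version = self.data.get(item, (0, None))[0] + 1
        self.data[item] = (version, value)

    def lock_and_validate(self, item, cached_version):
        """Called with a lock request (locking elided in this sketch).

        Returns None if the client's cached copy is current,
        otherwise the fresh (version, value) pair.
        """
        version, value = self.data[item]
        if version == cached_version:
            return None
        return version, value


class Client:
    def __init__(self, server):
        self.server = server
        self.cache = {}  # item -> (version, value), kept between transactions

    def read(self, item):
        cached_version = self.cache.get(item, (0, None))[0]
        fresh = self.server.lock_and_validate(item, cached_version)
        if fresh is not None:        # cached copy was missing or stale
            self.cache[item] = fresh
        return self.cache[item][1]
```

The validity check rides along with the lock request, so a client whose copy is still current pays no extra message and receives no data.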

Data Servers - Lock Caching

  • Locks can be retained by the client system, even between transactions.
  • Transactions can acquire cached locks locally, without contacting the server.
  • The server calls back locks from clients when it receives conflicting lock requests.
  • The client returns the lock once no local transaction is using it. This is similar to de-escalation, but across transactions.
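
A minimal sketch of lock caching with callbacks follows. It assumes exclusive locks only and rejects a conflicting request while the lock is in use (a real server would more likely queue it); `LockServer`, `LockClient`, and the `messages` counter are hypothetical names used for illustration.

```python
class LockServer:
    def __init__(self):
        self.holder = {}    # item -> client currently caching the lock
        self.messages = 0   # number of lock requests that reached the server

    def request(self, item, client):
        self.messages += 1
        current = self.holder.get(item)
        if current is not None and current is not client:
            # Conflicting request: call the lock back from its current holder.
            if not current.call_back(item):
                return False
        self.holder[item] = client
        return True


class LockClient:
    def __init__(self, server):
        self.server = server
        self.cached = set()   # locks retained between transactions
        self.in_use = set()   # locks held by the running local transaction

    def acquire(self, item):
        if item not in self.cached:
            if not self.server.request(item, self):
                return False
            self.cached.add(item)
        self.in_use.add(item)  # cached lock acquired locally
        return True

    def end_transaction(self):
        self.in_use.clear()    # locks stay cached for later transactions

    def call_back(self, item):
        if item in self.in_use:
            return False       # still needed by a local transaction
        self.cached.discard(item)
        return True
```

A second transaction at the same client reacquires the lock without contacting the server, so `messages` does not increase; the lock moves to another client only after a callback succeeds.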

Parallel Systems

  • Contain multiple processors and disks connected by a fast interconnection network.
  • A coarse-grain parallel machine has few powerful processors
  • A massively parallel or fine-grain parallel machine utilizes thousands of smaller processors.
  • The two main performance measures are throughput (the number of tasks completed in a given time interval) and response time (the time taken to complete a single task).

Speed-Up and Scale-Up

  • Speedup involves giving a fixed-size problem executing on a small system to a system that is N-times larger.
  • Speedup = (small-system elapsed time) / (large-system elapsed time); speedup is linear if this ratio equals N.
  • Scaleup involves increasing the size of both the problem and the system: an N-times larger system used to perform an N-times larger job.
  • Scaleup = (small-system, small-problem elapsed time) / (big-system, big-problem elapsed time); scaleup is linear if this ratio equals 1.
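
The two ratios can be checked with a few lines of code. The timings below are made-up numbers for illustration, and the function names are hypothetical.

```python
def speedup(small_system_time, large_system_time):
    """Same problem on both systems; linear when the result equals N."""
    return small_system_time / large_system_time


def scaleup(small_system_small_problem_time, big_system_big_problem_time):
    """N-times larger problem on an N-times larger system; linear when the result is 1."""
    return small_system_small_problem_time / big_system_big_problem_time


# A job takes 120 s on the small system and 30 s on a system 4 times larger.
print(speedup(120, 30))   # 4.0, linear for N = 4
# If the larger system needs 40 s instead, speedup is sublinear.
print(speedup(120, 40))   # 3.0
# A 4-times larger job on the 4-times larger system takes 125 s instead of 100 s.
print(scaleup(100, 125))  # 0.8, sublinear
```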

Batch and Transaction Scaleup

  • Batch scaleup applies to a single large job, which is typical of most decision-support queries and scientific simulations.
  • Batch scaleup uses an N-times larger computer on an N-times larger problem.
  • Transaction scaleup involves numerous small queries from independent users to a shared database.
  • Transaction scaleup involves N-times as many users submitting requests to an N-times larger database on an N-times larger computer
  • Transaction scaleup is well-suited to parallel execution.

Factors Limiting Speedup and Scaleup

  • Speedup and scaleup are often sublinear because of startup costs, interference, and skew.
  • Startup costs: the cost of starting up multiple processes may dominate computation time.
  • Interference: processes accessing shared resources (e.g., bus, disks, or locks) compete with each other and spend time waiting on other processes rather than performing useful work.
  • Skew: increased parallelism increases the variance in service times of parallel tasks, and execution time depends on the slowest task.
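
The effect of skew and startup cost on speedup can be sketched as follows. The model is a deliberate simplification (elapsed time is a fixed startup cost plus the slowest task), and the numbers are invented for illustration.

```python
def parallel_elapsed(task_times, startup_cost=0.0):
    """Elapsed time of tasks run in parallel: startup plus the slowest task."""
    return startup_cost + max(task_times)


serial_time = 100.0  # time for the whole job on one processor

even = [25.0, 25.0, 25.0, 25.0]    # perfectly balanced split over 4 processors
skewed = [40.0, 20.0, 20.0, 20.0]  # same total work, unevenly split

print(serial_time / parallel_elapsed(even))    # 4.0
print(serial_time / parallel_elapsed(skewed))  # 2.5
print(round(serial_time / parallel_elapsed(even, startup_cost=5.0), 2))  # 3.33
```

Both splits do the same total work, but the skewed split finishes only when its 40-second task does, so its speedup drops from 4 to 2.5.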

Interconnection Network Architectures

  • Bus: System components send data on and receive it from a single communication bus.
    • Does not scale well with increasing parallelism.
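  • Mesh: Components are arranged as nodes in a grid, and each component is connected to all adjacent components.
    • The number of communication links grows with the number of components, so a mesh scales better than a bus.
    • May require 2√n hops to send a message to a node (or √n with wraparound connections).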
  • Hypercube: Components are numbered in binary; components connect if binary representations differ by one bit.
    • Each of the n components is connected to log(n) other components, and any component can reach any other via at most log(n) links, which reduces communication delays.
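
The hypercube rule and the hop counts can be illustrated with a short sketch. Nodes are labelled with integers, the logarithm is base 2, and the function names are hypothetical.

```python
def hypercube_neighbors(node, dimensions):
    """Nodes whose binary label differs from `node` in exactly one bit."""
    return [node ^ (1 << bit) for bit in range(dimensions)]


def hypercube_hops(a, b):
    """Minimum number of links between a and b: the count of differing bits."""
    return bin(a ^ b).count("1")


def mesh_hops(a, b):
    """Minimum hops between grid positions (row, col) in a mesh without wraparound."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


print(hypercube_neighbors(0b000, 3))  # [1, 2, 4]
print(hypercube_hops(0b000, 0b111))   # 3, which is log2(8)
print(mesh_hops((0, 0), (3, 3)))      # 6 on a 4x4 mesh of 16 nodes
```

In the 8-node hypercube every node has 3 neighbors and the farthest node is 3 links away, whereas the corner-to-corner path in a 16-node mesh already needs 6 hops, within the 2√n bound given above.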

Parallel Database Architectures

  • Shared memory: processors share a common memory
  • Shared disk: processors share a common disk
  • Shared nothing: processors share neither memory nor disk
  • Hierarchical: hybrid of shared memory, disk, and nothing

Parallel Database Architectures - Shared Memory

  • Processors and disks access a common memory, typically via a bus or interconnection network.
  • There is efficient communication between processors
  • Shared memory can be accessed by any processor without having software move it
  • Architecture is not scalable past 32 or 64 processors because the bus becomes a bottleneck
  • Widely used for lower degrees of parallelism (4 to 8).

Parallel Database Architectures - Shared Disk

  • All processors can directly access all disks via an interconnection network, but processors have private memories.
  • The memory bus is not a bottleneck.
  • The architecture provides a degree of fault tolerance: if a processor fails, the other processors can take over its tasks, since the database resides on disks that are accessible from all processors.
  • Shared-disk systems can scale to a somewhat larger number of processors, but communication between processors is slower.
  • Examples: IBM Sysplex and DEC clusters (now part of Compaq) running Rdb (now Oracle Rdb) were early commercial users.
  • Downside: the bottleneck now occurs at the interconnection to the disk subsystem.

Parallel Database Architectures - Shared Nothing

  • A node consists of a processor, memory, and one or more disks. Nodes communicate with one another via the interconnection network.
  • The node functions as the server for the data on its disk or disks.
  • Data accessed from local disks does not pass through the interconnection network, which minimizes the interference of resource sharing.
  • Shared-nothing multiprocessors can be scaled up to thousands of processors without interference.
  • The main drawbacks are the cost of communication and of non-local disk access.
  • Examples include Teradata, Tandem, and Oracle on nCUBE.

Parallel Database Architectures - Hierarchical

  • Combines characteristics of shared-memory, shared-disk, and shared-nothing architectures
  • The top level is a shared-nothing architecture: nodes are connected by an interconnection network and do not share disks or memory with each other.
  • Each node of the system could be a shared-memory system with a few processors
  • Alternatively, each node could be a shared-disk system, and each of the systems sharing a set of disks could itself be a shared-memory system.
  • Distributed virtual-memory architectures, also called non-uniform memory architectures (NUMA), reduce the complexity of programming such systems.

Distributed Systems

  • Data spreads over multiple machines, also called sites or nodes, connected by a network.
  • Data is shared by users on multiple machines.

Distributed Databases

  • In homogeneous distributed databases, all sites have the same software and schema, and data is partitioned among sites.
  • The goal is to provide a single database view, hiding distribution details.
  • In heterogeneous distributed databases, different sites use different software/schema.
  • The goal is to integrate existing databases to provide useful functionality
  • A local transaction accesses data only at the site where it was initiated.
  • A global transaction accesses data at a site other than the one where it was initiated, or at several sites.

Trade-offs in Distributed Systems

  • Sharing data allows users at one site to access data at other sites.
  • Each site retains a degree of control over locally stored data, referred to as autonomy.
  • Replication gives higher system availability: data can be replicated at remote sites, and the system can keep functioning even if one site fails.
  • Coordination complexity, development cost, bug potential, and processing overhead are all disadvantages.

Implementation Issues for Distributed Databases

  • Atomicity is needed even for transactions that update data at multiple sites.
  • The two-phase commit protocol (2PC) ensures atomicity.
  • Each site executes the transaction until right before the commit and leaves the final decision to a coordinator.
  • Each site must follow the coordinator's verdict, even if failures occur while waiting.
  • 2PC is not always appropriate.
  • Other transaction models that can be used include persistent messaging and workflows.
  • Distributed concurrency control (and deadlock detection) is required
  • Data items may be replicated to improve data availability
  • Details are in Chapter 22
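
The two phases described above can be sketched in a few lines. This is a toy model only: it ignores failures, timeouts, and logging, which are the hard parts of a real 2PC implementation, and the `Participant` class and `two_phase_commit` function are hypothetical names.

```python
class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        """Phase 1: run up to the commit point, then vote."""
        self.state = "ready" if self.can_commit else "aborted"
        return self.can_commit

    def finish(self, decision):
        """Phase 2: follow the coordinator's verdict."""
        self.state = decision


def two_phase_commit(participants):
    # Phase 1: the coordinator collects a vote from every site.
    votes = [p.prepare() for p in participants]
    # Commit only if every site voted yes.
    decision = "committed" if all(votes) else "aborted"
    # Phase 2: every site applies the same decision.
    for p in participants:
        p.finish(decision)
    return decision


sites = [Participant("A"), Participant("B", can_commit=False)]
print(two_phase_commit(sites))    # aborted
print([p.state for p in sites])   # ['aborted', 'aborted']
```

A single "no" vote aborts the transaction at every site, which is what gives the all-or-nothing (atomic) outcome across sites.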

Network Types

  • Local-area networks (LANs) are composed of processors distributed over small areas like a single building, or a few buildings
  • Wide-area networks (WANs) are composed of processors distributed over a large geographical area.
  • WANs with a continuous connection (e.g., the Internet) are needed for implementing distributed database systems
  • Groupware applications such as Lotus Notes can work on WANs with discontinuous connections: data is replicated, and updates are propagated to the replicas periodically.
  • Copies of data may be updated independently
  • This can result in non-serializable executions, because operations may be applied in a different order at different copies.
  • Resolution is application dependent.
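
A small sketch shows why the order of independently applied updates matters. The two update functions and the starting value are invented for illustration.

```python
def deposit_100(balance):
    return balance + 100


def double(balance):
    return balance * 2


def apply_updates(initial, updates):
    """Apply the updates to a copy of the data in the given order."""
    value = initial
    for update in updates:
        value = update(value)
    return value


# Two replicas receive the same two updates in different orders.
replica_a = apply_updates(100, [deposit_100, double])
replica_b = apply_updates(100, [double, deposit_100])

print(replica_a)  # 400
print(replica_b)  # 300
```

The replicas end up with different values, so the combined execution is not equivalent to a single serial order; how the copies are reconciled depends on the application.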

Description

Explanation of database system architectures. Covers centralized, client-server, parallel, and distributed systems. Explains network types and how servers handle requests from multiple clients.