Distributed Database Systems

Questions and Answers

Which of the following best describes a Distributed Database System (DDBS)?

  • A collection of logically interrelated databases distributed over a computer network. (correct)
  • A centralized database system with remote access capabilities.
  • A single database that is physically stored on multiple servers.
  • A set of independent file systems managed by different applications.

What is the primary role of a Distributed Database Management System (D-DBMS)?

  • To optimize data storage for individual applications.
  • To provide a single point of access for all data within an organization.
  • To ensure data is stored redundantly across multiple locations.
  • To manage the complexity of a distributed database by making the distribution transparent to users. (correct)

Which characteristic distinguishes a Distributed Database System (DDBS) from a centralized database on a network?

  • That the DDBS is a collection of files individually stored at each node of a computer network
  • The presence of a Database Management System (DBMS).
  • The use of a computer network for data access.
  • The physical distribution of databases across multiple nodes. (correct)

In the context of Distributed Database Systems, what does the term 'data independence' refer to?

  • The immunity of user applications to changes in the definition and organization of data. (correct)

What is the role of 'computer network technology' in a distributed database environment?

  • To connect distributed operational tasks and enable communication between database sites. (correct)

Which of the following is NOT an implicit assumption in Distributed Database Systems?

  • The system is a loosely coupled multiprocessor system. (correct)

What are the three orthogonal dimensions that define data delivery alternatives (DDA) in a distributed database environment?

  • Delivery modes, delivery frequency, and communication methods. (correct)

In data delivery alternatives, which mode involves the server initiating the transfer of data to clients without a specific request?

  • Push-only (correct)

Which data delivery frequency involves data being sent from the server to clients at regular, pre-defined intervals?

  • Periodic (correct)

What is the primary characteristic of 'Unicast' as a communication method in data delivery?

  • Data is sent from a server to a specific client in a one-to-one fashion. (correct)

Which of the following is a primary promise of Distributed DBMS regarding data management?

  • Transparent management of distributed, fragmented, and replicated data. (correct)

What does 'transparency' in the context of a Distributed DBMS refer to?

  • The separation of higher-level semantics from lower-level implementation issues. (correct)

Which type of data independence refers to the immunity of user applications to changes in the logical structure of the database?

  • Logical data independence (correct)

How does 'location transparency' benefit users of a Distributed Database System (DDBS)?

  • By allowing users to access data without needing to know its physical location. (correct)

In Distributed Databases, what does 'naming transparency' ensure?

  • A unique name is provided for each object in the database. (correct)

In the context of data distribution, what is 'horizontal fragmentation'?

  • Partitioning a relation into subsets of tuples (rows) and assigning them to different locations. (correct)

How does 'vertical fragmentation' differ from 'horizontal fragmentation' in a distributed database?

  • Vertical fragmentation divides a table into columns, while horizontal fragmentation divides it into rows. (correct)

What is the main challenge when handling user queries on fragmented database objects in a distributed environment?

  • Finding a query processing strategy based on the fragments rather than the relations. (correct)

What is a major benefit of using distributed DBMS with replicated components?

  • Improved reliability by removing any single point of failure. (correct)

What is the purpose of Commit protocols in Distributed Transaction Management?

  • They perform commit operations and recover unfinished transactions. (correct)

What is a potential drawback of Data Replication in distributed databases?

  • Difficulty handling data consistency during updates. (correct)

Localization, as a result of fragmentation and replication in DDBS, has two main advantages. One of them is that contention for CPU and I/O services is not as severe as for centralized databases. What is the other advantage?

  • Reduced remote access delays that are usually involved in wide area networks. (correct)

What is the benefit of Intra-query parallelism in a distributed database system?

  • Speeds up query execution by breaking a query into subqueries executed at different sites. (correct)

Having as much of the data required by each application at the site where the application executes will lead to...

  • Full replication (correct)

What factor has contributed significantly to easier system expansion in modern database systems?

  • Rapid advancements in microprocessor and workstation technologies. (correct)

In distributed database design, what is a key consideration regarding query processing?

  • Minimizing the overall cost, which includes data transmission and local processing. (correct)

What does defining the architecture of a system involve?

  • Identifying the components, defining the function of each component, and defining the interrelationships and interactions between components. (correct)

What is needed to keep accesses consistent in a distributed database?

  • Synchronization of concurrent accesses. (correct)

What are essential features of the ANSI/SPARC architecture?

  • Conceptual, internal, and external schemas. (correct)

What is a key factor to consider regarding operating system support for effectively operating a Distributed DBMS?

  • Dichotomy between general-purpose processing requirements and database processing requirements. (correct)

Which of the following represents a challenge related to concurrency control in Distributed DBMS?

  • Synchronization of concurrent accesses. (correct)

How does the implementation of Distributed Concurrency Control protocols enhance database management systems?

  • By controlling simultaneous transactions in distributed databases. (correct)

A company decides to distribute its database across multiple sites to improve accessibility and fault tolerance. However, they still want users to interact with the database as if it were a single, centralized system. Which of the following transparency types is most critical for achieving this goal?

  • Network (distribution) transparency (correct)

A database system is designed to allow users to access data regardless of its storage location. However, the system requires users to include the physical site name in their queries to specify where the data is located. Which type of transparency is lacking in this system?

  • Location transparency (correct)

An international bank has multiple branches, each with its own database. The bank wants to create a distributed database system where each branch can independently manage its data and processes but still needs to interact with other branches. Which characteristic of a Distributed DBMS is most important in the scenario?

  • Autonomy (correct)

In a distributed database environment, replicated components and data should make distributed DBMS more __________

  • Reliable (correct)

What is required to handle user queries that are specified on the entire relations but must be executed on sub-relations?

  • Query optimization (correct)

Which of the following is a major impact of the emergence of Microprocessor and Workstation Technologies?

  • Easier system expansion (correct)

'Data independence' is a fundamental form of what?

  • Transparency (correct)

How is distributed query processing formulated as an optimization problem?

  • $\min\{\text{cost} = \text{data transmission} + \text{local processing}\}$ (correct)

Flashcards

What is a Distributed Database System (DDBS)?

A database system where data is spread across multiple computers or locations.

How does Database Management administer data?

Centralizing data definition/administration, offering data independence.

What technologies converge in DDBS?

The combination of database technology and computer networks for data handling.

What is Distributed Computing?

Autonomous elements interconnected to perform tasks collaboratively.

What composes a Distributed Database (DDB)?

Collection of logically related databases over a computer network.

What does a Distributed DBMS (D-DBMS) do?

Software managing the DDB, ensuring access transparency.

What is NOT a DDBS?

A collection of files individually stored at nodes on a network.

What is a key assumption of DDBS?

Data stored at multiple sites, logically single-processor per site.

What is 'Data Delivery' in DDBS?

How data is delivered from storage to query origin.

What are the Data Delivery Alternatives?

Pull, push and hybrid modes of delivery.

What is 'pull-only' data delivery?

Transfer initiated by client request.

What is 'push-only' data delivery?

Transfer initiated by the server.

What is 'hybrid' data delivery?

Combines client-pull and server-push methods.

What are the promises of Distributed DBMS?

Transparent management, improved reliability, performance, and easier scaling.

What is 'Transparency' in DDBS?

Separating semantics from implementation details, 'hides' complexity.

What is Data Independence?

Immunity of applications to data definition/organization changes.

What is Network Transparency?

Users protected from operational network details.

What is Location Transparency?

Command independent of data/system location.

What is Naming Transparency?

A unique name for each database object.

What is Replication Transparency?

Users unaware of data copies; system manages them.

What is Fragmentation Transparency?

Users query entire relations even though each relation is divided into smaller fragments; the system translates the query into queries on the fragments.

What is Horizontal Fragmentation?

Partitioning relation into subsets of tuples (rows).

What is Vertical Fragmentation?

Subsetting attributes (columns) of the original relation.

How do Distributed DBMSs improve reliability?

Improve reliability by removing single points of failure.

What is a Transaction?

A basic unit of consistent and reliable computing.

What is Concurrency Transparency?

A transaction transforms one consistent database state into another, even when several transactions execute concurrently.

What is Failure Atomicity?

Ensures that each transaction is either fully completed or not performed at all, even when failures occur.

What does Distributed Transaction Support need?

Support requires concurrency control and commit protocols.

How does Distribution optimize resource use?

Contention for CPU and I/O services is reduced.

How does Localization improve database access?

Reduces delays because of geographic distance.

What is Inter-query Parallelism?

Executing multiple queries at the same time.

What is Intra-query Parallelism?

Breaking a query into several pieces, executing each on different sites.

What is System Expansion about?

Scaling the database to accommodate growing workloads.

What is Distributed Database Design?

How to distribute the database over sites.

What are the main Concurrency Control challenges?

Synchronization, consistency, and deadlock management.

What does Reliability focus on?

Resilience to failures, atomicity and durability.

What are the key Related Issues?

Operating system support and support for data operation.

What comprises the 'Architecture' in DBMS?

The structure of the system: its components, their functions, and their interactions.

What are main merits of Client-Server Architecture?

More efficient division of labor, horizontal and vertical scaling.

What comprises Heterogeneity?

Hardware, communication, and OS differences.

Study Notes

Lecture Outline

  • Introduction to distributed DBMS.
  • Distributed DBMS architecture.
  • Background.
  • Distributed database design.
  • Database integration.
  • Semantic Data Control.
  • Distributed query processing.
  • Multidatabase query processing.
  • Distributed transaction management.
  • Data replication.
  • Parallel database systems.
  • Distributed object DBMS.
  • Peer-to-peer data management.
  • Web data management.
  • Current issues.

Introduction

  • A Distributed Database System (DDBS) results from combining "Database Systems" and "Computer Network" technologies.
  • Initially, each application defined and managed its own data via file systems.
  • Now, data is centrally defined and administered through a database management system (DBMS).
  • Data independence means application programs are not affected by changes to the logical or physical organization of the data.
  • The motivation for distributed database systems is to integrate operational data without centralizing it.
  • Computer network technology connects the distributed operational tasks.

File Systems & Database Management Visuals

  • File systems involve separate programs and data descriptions for each file, pointing to individual files.
  • Database Management involves applications interacting with a DBMS, which manages description, manipulation, and control of data in a centralized database.

Motivation

  • Database technology contributes integration and computer networks contribute distribution; combining the two leads to distributed database systems.
  • Integration does not equal centralization.

Distributed Computing

  • Distributed computing involves a number of autonomous processing elements, not necessarily homogeneous, that are interconnected by a computer network and cooperate in performing their assigned tasks.
  • What can be distributed:
    • Processing logic: the processing elements themselves are distributed.
    • Function: various functions of a computer system are delegated to various pieces of hardware or software.
    • Data: the data used by a number of applications may be distributed to a number of processing sites.
    • Control: control of the execution of various tasks may be distributed instead of being performed by one computer system.
  • Distributed processing aligns with today's organizational structures.
  • It suits web applications, e-commerce, multimedia, manufacturing control systems, and cloud computing.
  • It aids in handling large-scale data through divide and conquer.

Distributed Database System (DDBS)

  • A distributed database (DDB) is a collection of multiple, logically interrelated databases distributed over a computer network.
  • A distributed DBMS (D-DBMS) is the software that manages the DDB and makes the distribution transparent to users.
  • Distributed database system (DDBS) = DDB + D-DBMS

What a DDBS is NOT

  • Not a "collection of files" individually stored at each node of a computer network.
  • Physical distribution matters: a set of related databases that reside on the same computer is not a DDBS.
  • Nor a timesharing computer system.
  • Nor a loosely or tightly coupled multiprocessor system.
  • Not a database system residing at one network node as a centralized database.

Centralized vs. Distributed Databases

  • Centralized DBMS on a network = a single database residing at one node of the network.
  • Distributed DBMS environment = multiple databases residing at multiple nodes of the network.

Implicit Assumptions

  • Data is stored at a number of sites, and each site logically consists of a single processor.
  • Processors at different sites are interconnected by a computer network; they do not form a multiprocessor system.
  • The distributed database must be a database, not a collection of files.
  • The D-DBMS is a full-fledged DBMS, not a remote file system or a transaction processing (TP) system.

Data Delivery

  • Data delivery concerns how data moves from the sites where it is stored to the site where the query is posed.
  • Data delivery alternatives (DDA) are characterized along three orthogonal dimensions: delivery modes, frequency, and communication methods.
  • Combining these dimensions creates a rich design space.

Data Delivery Alternatives

  • Delivery modes: pull-only, push-only, and hybrid.

Pull-only

  • The transfer of data from servers to clients is initiated by a client pull.
  • The server responds by locating requested information.
  • New data/updates are carried out at the source without notifying clients unless explicitly polled.

Push-only

  • The transfer of data from servers to clients is initiated by a server push without a specific request from clients.
  • It is hard for the server to know which data is of common interest and when to send it to clients.
  • Pushes can be periodic, irregular, or conditional.
  • The usefulness of server push depends on how accurately the server predicts client needs; pushed data is typically delivered by broadcast or multicast.

Hybrid

  • Hybrid mode combines client-pull and server-push.
  • Information transfer from servers to clients is first initiated by a client pull, and the subsequent transfer of updated information to clients is initiated by a server push.
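The three delivery modes can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the `Server` and `Client` classes, the `hybrid_pull` method, and the in-memory dictionaries are hypothetical names invented for this sketch, and there is no real network involved.

```python
from dataclasses import dataclass, field


@dataclass
class Client:
    name: str
    inbox: list = field(default_factory=list)

    def receive(self, item, value):
        # Called by the server when it pushes an update to this client.
        self.inbox.append((item, value))


class Server:
    def __init__(self):
        self.data = {}
        self.subscribers = {}  # item -> clients that should receive pushes

    def pull(self, item):
        # Pull-only: the client asks and the server answers.
        return self.data.get(item)

    def hybrid_pull(self, client, item):
        # Hybrid: the first transfer is a client pull; the server then
        # remembers the client and pushes later updates to it.
        self.subscribers.setdefault(item, []).append(client)
        return self.data.get(item)

    def update(self, item, value):
        # Push: the server initiates the transfer whenever data changes.
        self.data[item] = value
        for client in self.subscribers.get(item, []):
            client.receive(item, value)


server = Server()
server.update("price", 10)

poller = Client("poller")    # only ever pulls
watcher = Client("watcher")  # pulls once, then receives pushes

first_seen = server.hybrid_pull(watcher, "price")
server.update("price", 12)   # pushed to watcher, not to poller
polled = server.pull("price")
```

The pull-only client sees the new value only because it asks again, which matches the note that updates happen at the source without notifying clients unless they poll.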

Frequency

  • Periodic delivery sends data from the server to clients at regular intervals (defined by system/clients).
  • Pull and push can be done periodically and in a scheduled way.
  • Conditional delivery sends data whenever conditions that clients have installed in their profiles are satisfied.
    • The conditions can be expressed as event-condition-action rules.
    • It is mostly used in hybrid or push-only systems.
  • Ad-hoc/irregular delivery is mostly pull-based: data is pulled when clients request it.
  • Periodic pull occurs when clients use polling on a regular schedule.
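Conditional delivery with event-condition-action rules might look like the sketch below. The class name `ConditionalPushServer`, its methods, and the stock-level example are assumptions made up for illustration; the notes do not prescribe any particular rule format.

```python
class ConditionalPushServer:
    """Hypothetical server that pushes only when a client rule fires."""

    def __init__(self):
        self.rules = []  # list of (condition, action) pairs

    def install_rule(self, condition, action):
        # A client installs a rule as part of its profile.
        self.rules.append((condition, action))

    def on_update(self, item, value):
        # Event: a data item changed.
        for condition, action in self.rules:
            # Condition: does this client care about the change?
            if condition(item, value):
                # Action: deliver the data to the client.
                action(item, value)


alerts = []
server = ConditionalPushServer()
server.install_rule(
    lambda item, value: item == "stock" and value < 5,
    lambda item, value: alerts.append((item, value)),
)

server.on_update("stock", 9)  # condition not met, nothing delivered
server.on_update("stock", 3)  # condition met, update delivered
```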

Communication Methods

  • Unicast: the server sends data to one client (one-to-one), using a particular delivery mode with some frequency.
  • One-to-many: the server sends data to many clients, potentially using multicast or broadcast protocols.
  • Not all combinations of the three dimensions make sense, but the classification is a good first-order characterization of the complexity of emerging distributed data management systems.

Distributed DBMS Promises

  • Transparent management of distributed, fragmented, and replicated data.
  • Improved reliability/availability through distributed transactions.
  • Improved performance.
  • Easier and more economical system expansion.

Transparency

  • Transparency in systems separates high-level semantics from lower-level implementation, hiding details from users.
  • The advantage is better support for developing complex applications.

Data Independence

  • Data independence in the distributed environment:
  • Network (distribution) transparency.
  • Replication transparency.
  • Fragmentation transparency:
    • Horizontal fragmentation: selection.
    • Vertical fragmentation: projection.

Types of Transparency

  • Data independence is a core transparency form, concerning an application's immunity to data changes.
  • Data definition occurs at two levels: the logical structure (schema definition) and the physical structure (physical data description).
  • Accordingly, there are two types of data independence:
    • Logical data independence: immunity of user applications to changes in the logical structure (schema) of the database.
    • Physical data independence: hiding the details of the storage structure from user applications.
  • Network transparency protects users from network details, potentially hiding its existence.
  • Location transparency ensures task commands are independent of data location & system running the task.
  • Naming transparency provides a unique name for each database object.
  • Replication transparency concerns management of replicated data copies.
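One way to picture location and naming transparency is a catalog that maps each globally unique object name to the site that stores it. The sketch below assumes such a catalog; `CATALOG`, `SITES`, `read`, and `move` are hypothetical names, and a real D-DBMS directory is far more involved.

```python
# Hypothetical catalog: unique global object name -> site that stores it.
CATALOG = {"EMP": "site_paris", "PROJ": "site_tokyo"}

# Hypothetical per-site storage.
SITES = {
    "site_paris": {"EMP": [("E1", "Ana"), ("E2", "Ben")]},
    "site_tokyo": {"PROJ": [("P1", "Billing")]},
}


def read(name):
    # The caller names the object only; the catalog resolves the site.
    site = CATALOG[name]
    return SITES[site][name]


def move(name, new_site):
    # Relocating data changes the catalog, not the user's command.
    old_site = CATALOG[name]
    SITES.setdefault(new_site, {})[name] = SITES[old_site].pop(name)
    CATALOG[name] = new_site


before = read("EMP")
move("EMP", "site_tokyo")
after = read("EMP")  # same command, same result, different site
```

A system that forced the caller to write the site name into `read` would lack location transparency, since moving the data would break existing commands.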

Fragmentation Transparency

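  • It is often desirable to divide each database relation into smaller fragments and treat each fragment as a separate database object, for reasons of performance, availability, and reliability.
  • Fragmentation can reduce the negative effects of replication: each replica is only a subset of the relation, so less space is needed and fewer data items have to be managed.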
  • Horizontal fragmentation partitions relations into subsets of tuples (rows).
  • Vertical fragmentation creates sub-relations defined by attribute subsets (columns).
  • If database objects are fragmented, it is necessary to handle user queries on entire relations via sub-relations.
  • A query processing strategy needs to be based on fragments rather than relations.
  • A global query must translate to fragment queries to maintain transparency.
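As a small illustration, horizontal fragmentation can be expressed as a selection and vertical fragmentation as a projection, with union and join rebuilding the original relation. The `EMP` relation, its attributes, and the helper functions below are made up for this sketch, and relations are modeled as plain lists of dictionaries.

```python
# Hypothetical global relation EMP(eno, name, city, salary).
EMP = [
    {"eno": "E1", "name": "Ana", "city": "Paris", "salary": 50},
    {"eno": "E2", "name": "Ben", "city": "Tokyo", "salary": 60},
    {"eno": "E3", "name": "Caz", "city": "Paris", "salary": 70},
]


def select(relation, predicate):
    # Selection: keep the rows that satisfy the predicate.
    return [row for row in relation if predicate(row)]


def project(relation, attrs):
    # Projection: keep only the named columns.
    return [{a: row[a] for a in attrs} for row in relation]


# Horizontal fragments: disjoint subsets of rows.
emp_paris = select(EMP, lambda r: r["city"] == "Paris")
emp_other = select(EMP, lambda r: r["city"] != "Paris")

# Vertical fragments: subsets of columns, each repeating the key "eno".
emp_public = project(EMP, ["eno", "name", "city"])
emp_pay = project(EMP, ["eno", "salary"])


def union(*fragments):
    # Rebuilds a relation from its horizontal fragments.
    return [row for frag in fragments for row in frag]


def join_on_key(left, right, key):
    # Rebuilds a relation from its vertical fragments.
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left]


rebuilt_h = sorted(union(emp_paris, emp_other), key=lambda r: r["eno"])
rebuilt_v = join_on_key(emp_public, emp_pay, "eno")

# A global query "names of employees in Paris" only needs one fragment.
paris_names = [r["name"] for r in emp_paris]
```

The last line hints at why a strategy based on fragments matters: the global query touches one horizontal fragment instead of the whole relation.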

Reliability

  • Distributed DBMSs improve reliability by replicating components and thereby eliminating single points of failure.
  • Replicated components and data make a distributed DBMS more reliable.
  • A transaction is a basic unit of consistent and reliable computing: a sequence of database operations executed as a single atomic action.
  • Distributed transactions provide:
    • Concurrency transparency: a transaction transforms a consistent database state into another consistent state, even when several transactions execute concurrently.
    • Failure atomicity: the same guarantee holds even when failures occur.

Distributed Transaction Support

  • Requires implementation of:
    • Distributed concurrency control protocols: protocols that control simultaneous transactions in distributed databases.
    • Commit protocols: protocols that perform commit operations and recover unfinished transactions.
  • Data replication suits read-intensive workloads but is problematic for updates.
  • Replication is a mechanism to improve reliability in large distributed databases (common in NoSQL databases).
  • Replication protocols perform replication tasks using distributed computing.
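The notes do not name a specific commit protocol. As an assumption, the sketch below uses two-phase commit, one well-known example, in a heavily simplified form: the `Participant` class and `two_phase_commit` function are hypothetical, and timeouts, logging, and recovery are left out entirely.

```python
class Participant:
    """Hypothetical site taking part in a distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: vote yes (ready to commit) or no (must abort).
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1: the coordinator collects a vote from every participant.
    votes = [p.prepare() for p in participants]
    decision = all(votes)
    # Phase 2: the same decision is sent to every participant.
    for p in participants:
        if decision:
            p.commit()
        else:
            p.abort()
    return "commit" if decision else "abort"


ok_sites = [Participant("A"), Participant("B")]
outcome_ok = two_phase_commit(ok_sites)

bad_sites = [Participant("A"), Participant("B", can_commit=False)]
outcome_bad = two_phase_commit(bad_sites)
```

A single "no" vote aborts the transaction at every site, which is how the protocol keeps the all-or-nothing property across sites.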

Potentially Improved Performance

  • Keeping data close to its points of use requires support for fragmentation and replication.
  • A distributed DBMS fragments the conceptual database, which allows data localization. This has two potential advantages:
    • Contention for CPU and I/O services is less severe than in centralized databases.
    • Localization reduces remote access delays in wide area networks.

Parallelism Requirements

  • Parallelism in execution takes two forms:
    • Inter-query parallelism: executes multiple queries at the same time.
    • Intra-query parallelism: divides a query into subqueries that execute at different sites, each on its own part of the database.
  • Having as much of the data required by each application at the site where the application executes leads to full replication.
  • Updates then require mutual consistency and freshness of copies.
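Intra-query parallelism can be illustrated by splitting one global aggregate into per-site subqueries and merging the partial results. In this sketch the sites are simulated with an in-memory dictionary and threads stand in for remote execution; `SITE_FRAGMENTS`, `local_subquery`, and `global_average` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical salary fragments stored at three sites.
SITE_FRAGMENTS = {
    "site_a": [50, 70],
    "site_b": [60],
    "site_c": [40, 90, 10],
}


def local_subquery(site):
    # Each site computes a partial result on its own fragment.
    values = SITE_FRAGMENTS[site]
    return sum(values), len(values)


def global_average():
    # The subqueries run concurrently, then the partial results are merged.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(local_subquery, SITE_FRAGMENTS))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


avg = global_average()
```

Each site returns only a sum and a count, so very little data crosses the network compared with shipping every row to one site.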

System Expansion

  • The issue is database scaling.
  • Expansion became easier with the emergence of microprocessor and workstation technologies and the demise of Grosch's law.
  • The client-server model of computing supports this kind of expansion.
  • Data communication cost vs telecommunication cost.

Distributed DBMS Issues

  • Distributed Database Design
    • How to distribute the database: replicated and non-replicated database distribution.
    • A related problem is directory management.
  • Query Processing
    • Converts user transactions into data manipulation instructions.
    • It is an optimization problem: min{cost = data transmission + local processing}.
    • The general formulation is NP-hard.
  • Concurrency Control
    • Synchronization of concurrent accesses.
    • Consistency and isolation of transactions' effects.
    • Deadlock management.
  • Reliability
    • How to make the system resilient to failures.
    • Atomicity and durability.
  • Operating System Support
    • Operating systems with proper support for database operations.
    • Dichotomy between general-purpose processing requirements and database processing requirements.
  • Open Systems and Interoperability
    • Distributed multidatabase systems.
    • Parallel issues.
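The query processing objective, min{cost = data transmission + local processing}, can be made concrete with a toy comparison of candidate execution strategies. The strategy names and all cost figures below are invented for illustration; a real optimizer estimates such costs from statistics and, because the general problem is NP-hard, typically searches only part of the strategy space.

```python
# Hypothetical execution strategies with made-up cost estimates.
STRATEGIES = {
    "ship_whole_relation": {"data_transmission": 1000, "local_processing": 50},
    "ship_matching_fragment": {"data_transmission": 200, "local_processing": 120},
    "process_at_remote_site": {"data_transmission": 20, "local_processing": 400},
}


def total_cost(costs):
    # cost = data transmission + local processing
    return costs["data_transmission"] + costs["local_processing"]


# Pick the strategy with the minimum total cost.
best = min(STRATEGIES, key=lambda name: total_cost(STRATEGIES[name]))
best_cost = total_cost(STRATEGIES[best])
```

Neither the cheapest transmission nor the cheapest local processing wins on its own here; the sum decides.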

Architecture

  • Defines the structure of the system.
  • Components are identified.
  • Functions are defined for each component.
  • Defines interrelationships between components.

Dimensions of the problem

  • Distribution is whether components are on the same machine or not.
  • Heterogeneity can arise in hardware, communications, or operating systems; the most important is DBMS heterogeneity, with different data models, query languages, or transaction management algorithms.
  • Autonomy takes three forms: design, communication, and execution autonomy.
    • Design autonomy: Ability of a component DBMS to decide on issues related to its own design.
    • Communication autonomy: Ability of a component DBMS to decide whether and how to communicate with other DBMSs.
    • Execution autonomy: Ability of a component DBMS to execute local operations in any manner it wants to.

Client/Server Architecture Advantages

  • More efficient division of labor
  • Horizontal and vertical scaling of resources
  • Better price/performance on client machines
  • The ability to use familiar tools on client machines
  • Client access to remote data (via standards)
  • Full DBMS functionality provided to client workstations
  • Overall better system price/performance
