Questions and Answers
In a distributed database system, what is the primary role of the coordinator in the Two-Phase Commit (2PC) protocol?
- To optimize query processing across different sites.
- To execute the transaction's operations at each participating site.
- To detect and resolve deadlocks spanning multiple sites.
- To manage the global state of the transaction and ensure all sites either commit or abort together. (correct)
Which of the following best describes the 'prepare' phase in the Two-Phase Commit (2PC) protocol?
- The coordinator pre-commits the operations locally before propagating the decision.
- Each participant informs the coordinator of its final commit or abort decision.
- The coordinator broadcasts the commit decision to all participants.
- The coordinator asks all participants to prepare to commit, and each participant signals its readiness or refusal. (correct)
What is a significant limitation of the Two-Phase Commit (2PC) protocol in distributed transaction management?
- It is unable to handle concurrent transactions effectively.
- It can suffer from blocking if the coordinator fails, potentially halting progress indefinitely. (correct)
- It does not guarantee atomicity in the presence of network partitions.
- It requires all participants to use the same database management system.
How does Three-Phase Commit (3PC) attempt to address the blocking problem inherent in Two-Phase Commit (2PC)?
What is the primary purpose of concurrency control mechanisms in a distributed database system?
Which of the following concurrency control protocols is commonly used to serialize access to data in distributed databases?
What does 'eventual consistency' imply in the context of distributed data management?
According to the CAP theorem, what are the three guarantees that a distributed system can only provide two of simultaneously?
What is the primary benefit of data replication in a distributed database system?
How does horizontal fragmentation distribute data across multiple sites in a distributed database?
Flashcards
Distributed Database
Data stored across multiple physical locations interconnected via a network, functioning as a single logical database.
Fragmentation
Splitting a relation into logical pieces. Can be horizontal (row-wise) or vertical (column-wise).
Replication
Creating multiple copies of data at different sites to improve availability and read performance.
Distributed Query Processing
Optimizing and executing queries across multiple sites, with the goal of minimizing data transfer and execution time.
Distributed Transaction Management
Ensuring the ACID properties (atomicity, consistency, isolation, durability) for transactions whose operations span multiple sites.
Two-Phase Commit (2PC)
An atomic commit protocol in which a coordinator first asks all participants to prepare, then instructs them all to commit or abort together.
Three-Phase Commit (3PC)
An extension of 2PC that adds a pre-commit phase to reduce blocking when the coordinator fails.
Concurrency Control
Managing simultaneous access to shared data so that concurrent transactions do not interfere with one another.
Strict Consistency
A consistency model requiring that all replicas reflect every update immediately.
Eventual Consistency
A consistency model that allows temporary inconsistencies; replicas converge over time.
Study Notes
- Data in distributed databases is stored across various physical locations interconnected by a network.
- The system functions as a single, logical database.
- Benefits include improved scalability, availability, and fault tolerance.
- Data partitioning and replication across sites enhance performance through local access.
- The systems are highly resilient because operation continues even if some sites fail.
- Managing these databases is more complex than centralized ones.
- Data consistency, concurrency control, and transaction management contribute to this complexity.
Architecture
- Homogeneous setups feature identical software and schema across all sites.
- Conversely, heterogeneous setups involve different software and schema at different sites.
- Federated database systems integrate multiple, autonomous databases.
- In federated systems, each database retains its autonomy while participating in a global schema.
Data Distribution
- Distribution can be achieved through fragmentation, replication, or both.
- Fragmentation divides data into logical pieces.
- Tables are split row-wise in horizontal fragmentation.
- Tables are split column-wise in vertical fragmentation.
- Replication involves creating multiple data copies at different sites.
- Replication improves availability and read performance.
- Data replication necessitates careful management to maintain consistency across all replicas.
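The fragmentation and replication schemes above can be sketched in a few lines of Python over an in-memory "relation" (a list of row dicts). All names here are illustrative, not from any library:

```python
# Sample relation: rows of a customer table.
rows = [
    {"id": 1, "region": "EU", "name": "Ada", "balance": 100},
    {"id": 2, "region": "US", "name": "Bob", "balance": 250},
    {"id": 3, "region": "EU", "name": "Cy",  "balance": 75},
]

# Horizontal fragmentation: split row-wise by a predicate (here, region).
def horizontal_fragment(relation, predicate):
    return [r for r in relation if predicate(r)]

eu_site = horizontal_fragment(rows, lambda r: r["region"] == "EU")
us_site = horizontal_fragment(rows, lambda r: r["region"] == "US")

# Vertical fragmentation: split column-wise; keep the key in every fragment
# so the original rows can be reconstructed by a join on "id".
def vertical_fragment(relation, columns, key="id"):
    return [{key: r[key], **{c: r[c] for c in columns}} for r in relation]

names_site = vertical_fragment(rows, ["name"])
balances_site = vertical_fragment(rows, ["balance"])

# Replication: each site simply keeps its own full copy of the relation.
replica_a = list(rows)
replica_b = list(rows)
```

Note that every vertical fragment carries the key column; without it, the fragments could not be joined back into the original relation.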
Distributed Query Processing
- This process optimizes and executes queries across multiple sites.
- The primary goal is to minimize data transfer and execution time.
- Query decomposition breaks down queries into subqueries executable at different sites.
- Data localization moves relevant data to the site of query processing.
- Optimization identifies the most efficient execution plan.
- Execution plan considers data location, network bandwidth, and processing costs.
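The optimizer's trade-off can be illustrated with a toy cost model that compares two strategies for joining relations stored at different sites; the relation sizes and bandwidth figures are made up for the example:

```python
# Hypothetical cost model: data-transfer time dominates, so the plan that
# ships less data wins. Sizes and bandwidth are illustrative only.

def transfer_cost(size_bytes, bandwidth_bytes_per_s):
    return size_bytes / bandwidth_bytes_per_s

bandwidth = 1_000_000  # 1 MB/s link between the two sites

# Plan A: ship relation R (10 MB) to S's site and join there.
plan_a = transfer_cost(10_000_000, bandwidth)
# Plan B: ship relation S (1 MB) to R's site and join there.
plan_b = transfer_cost(1_000_000, bandwidth)

best = min(("ship R", plan_a), ("ship S", plan_b), key=lambda p: p[1])
```

Real optimizers also weigh local processing costs and the sizes of intermediate results, but the principle is the same: pick the plan with the lowest estimated total cost.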
Distributed Transaction Management
- Transactions in these systems involve operations across multiple sites.
- Distributed transaction management ensures atomicity, consistency, isolation, and durability (ACID properties).
- Achieving ACID properties in a distributed environment is challenging due to network latency and potential site failures.
Two-Phase Commit (2PC)
- 2PC is a widely used protocol that ensures atomicity in distributed transactions.
- 2PC involves a coordinator and multiple participants.
- The coordinator manages the transaction’s global state.
- Participants are the sites involved in the transaction.
- Phase 1: Prepare Phase, the coordinator asks all participants to prepare to commit.
- Each participant performs its part of the transaction.
- Participants then signal their readiness (or refusal) to commit to the coordinator.
- Phase 2: Commit Phase, if all participants are ready, the coordinator sends a commit message to all participants.
- If any participant refuses, the coordinator sends an abort message.
- Participants then either commit or rollback their changes accordingly.
- 2PC ensures that all sites either commit or abort the transaction together.
- 2PC can suffer from blocking if the coordinator fails.
- If a participant is unsure of the coordinator’s decision, it may be blocked indefinitely.
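The two phases above can be sketched as a minimal in-process simulation; `Participant` and `two_phase_commit` are illustrative classes and functions, not a real RPC framework:

```python
# Minimal sketch of the 2PC message flow within one process.

class Participant:
    def __init__(self, name, will_succeed=True):
        self.name = name
        self.will_succeed = will_succeed
        self.state = "init"

    def prepare(self):
        # Do the local work and vote; a "no" vote aborts the whole transaction.
        self.state = "prepared" if self.will_succeed else "aborted"
        return self.will_succeed

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"

def two_phase_commit(participants):
    # Phase 1 (Prepare): the coordinator asks every participant to prepare.
    votes = [p.prepare() for p in participants]
    # Phase 2 (Commit): commit only if every vote was yes; otherwise abort all.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"
```

The blocking problem is visible here by omission: between voting yes and receiving the coordinator's decision, a real participant holds its locks and cannot safely proceed on its own.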
Three-Phase Commit (3PC)
- 3PC is designed to address the blocking problem of 2PC.
- 3PC adds an extra phase to provide fault tolerance.
- Phase 1: Prepare Phase, the coordinator sends a prepare message to all participants.
- Phase 2: Pre-Commit Phase, if all participants voted yes, the coordinator sends a pre-commit message and participants acknowledge it.
- Phase 3: Commit Phase, the coordinator sends a commit message, and participants commit the transaction.
- 3PC reduces the blocking window compared to 2PC, but is more complex to implement.
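A participant's view of the three phases can be sketched as a small state machine; the state names mirror the phases above, and the class is illustrative:

```python
# Sketch of a 3PC participant's state transitions.

class Participant3PC:
    def __init__(self):
        self.state = "init"

    def on_prepare(self, can_commit=True):
        # Phase 1: vote yes or no on the prepare message.
        self.state = "ready" if can_commit else "aborted"
        return can_commit

    def on_pre_commit(self):
        # Phase 2: after pre-commit, the participant knows everyone voted
        # yes, so on a coordinator failure it can commit instead of blocking.
        assert self.state == "ready"
        self.state = "pre-committed"

    def on_commit(self):
        # Phase 3: the actual commit.
        assert self.state == "pre-committed"
        self.state = "committed"
```

The pre-committed state is what shrinks the blocking window: a participant stuck there can infer the global decision, which a 2PC participant in the prepared state cannot.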
Concurrency Control
- Concurrency control manages simultaneous access to shared data to prevent interference between transactions.
- Locking protocols are used to serialize access to data.
- Two-phase locking (2PL) is a common concurrency control protocol.
- Timestamping assigns timestamps to transactions.
- Timestamp ordering ensures that transactions are executed in timestamp order.
- Distributed deadlock detection is required to handle deadlocks spanning multiple sites.
- Deadlock detection can be centralized or distributed.
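The timestamp-ordering rules can be sketched for a single data item; the rejection rules follow the usual TS-ordering protocol, while the code itself is illustrative:

```python
# Basic timestamp-ordering checks for one data item. A rejected operation
# means the transaction must restart with a fresh timestamp.

class Item:
    def __init__(self):
        self.read_ts = 0   # largest timestamp that has read this item
        self.write_ts = 0  # largest timestamp that has written this item

def read(item, ts):
    # Reject the read if a younger transaction has already written the item.
    if ts < item.write_ts:
        return False
    item.read_ts = max(item.read_ts, ts)
    return True

def write(item, ts):
    # Reject the write if a younger transaction has already read or written.
    if ts < item.read_ts or ts < item.write_ts:
        return False
    item.write_ts = ts
    return True
```

Because every site applies the same timestamp order, conflicting operations are serialized globally without locks, at the cost of restarting transactions that arrive "too late".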
Data Consistency
- Maintaining data consistency across multiple sites is crucial.
- Replication introduces challenges in keeping replicas synchronized.
- Consistency models define the level of consistency guaranteed.
- Strict consistency requires that all replicas are immediately updated.
- Eventual consistency allows temporary inconsistencies.
- Under eventual consistency, replicas converge over time.
- The CAP theorem states that a distributed system cannot simultaneously provide all three of consistency, availability, and partition tolerance; at most two can be guaranteed at once.
- Choosing the right consistency model depends on the application requirements.
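One common way replicas converge under eventual consistency is last-write-wins: each replica tags its value with the timestamp of the last write, and merging keeps the newer one. A minimal sketch, with illustrative names:

```python
# Last-write-wins convergence: each replica holds a (timestamp, value) pair,
# and merging two replicas keeps the write with the larger timestamp.

def merge(replica_a, replica_b):
    return max(replica_a, replica_b, key=lambda r: r[0])

a = (10, "v1")   # replica A last saw a write at t=10
b = (12, "v2")   # replica B saw a later write at t=12

converged = merge(a, b)  # both replicas adopt the newer write
```

Last-write-wins is simple but discards the losing write; applications that cannot tolerate that use richer merge strategies instead.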
Fault Tolerance
- Fault tolerance allows the system to continue operating despite site failures.
- Replication enhances fault tolerance by providing redundant copies of data.
- Failure detection mechanisms identify failed sites.
- Recovery protocols restore the system to a consistent state after a failure.
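A common failure-detection scheme is heartbeat monitoring, combined with failover to a replica on a live site. The sketch below uses hypothetical site names and an illustrative timeout:

```python
# Heartbeat-based failure detection with replica failover (illustrative).

TIMEOUT = 5.0  # seconds without a heartbeat before a site is suspected failed

def live_sites(last_heartbeats, now, timeout=TIMEOUT):
    # A site is considered alive if its last heartbeat is recent enough.
    return {site for site, ts in last_heartbeats.items() if now - ts <= timeout}

def route_read(item_replicas, alive):
    # Serve the read from the first replica that sits on a live site.
    for site in item_replicas:
        if site in alive:
            return site
    return None

heartbeats = {"site_a": 100.0, "site_b": 97.0, "site_c": 90.0}
alive = live_sites(heartbeats, now=101.0)  # site_c has missed its heartbeats
chosen = route_read(["site_c", "site_b"], alive)  # falls back to site_b
```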
Advantages of Distributed Databases
- Scalability is achieved by easily adding more nodes to increase capacity.
- Availability is maintained because the system remains operational even if some nodes fail.
- Performance is increased because data locality reduces network traffic.
- Autonomy allows local control over data.
Disadvantages of Distributed Databases
- Complexity is higher, making distributed databases more difficult to design and manage than centralized ones.
- Security risks increase due to the distributed nature of the data.
- Maintaining control and data consistency across sites is challenging.
- Overhead arises from the additional communication and coordination required between sites.
Use Cases
- Financial institutions use distributed databases for transaction processing.
- E-commerce platforms use them for managing product catalogs and customer data.
- Social media networks use distributed databases for storing user profiles and content.
- Cloud computing providers use distributed databases for scalable data storage.