Questions and Answers
Which of the following best describes a Distributed Database System (DDBS)?
- A collection of logically interrelated databases distributed over a computer network. (correct)
- A centralized database system with remote access capabilities.
- A single database that is physically stored on multiple servers.
- A set of independent file systems managed by different applications.
What is the primary role of a Distributed Database Management System (D-DBMS)?
- To optimize data storage for individual applications.
- To provide a single point of access for all data within an organization.
- To ensure data is stored redundantly across multiple locations.
- To manage the complexity of a distributed database by making the distribution transparent to users. (correct)
Which characteristic distinguishes a Distributed Database System (DDBS) from a centralized database on a network?
- That the DDBS is a collection of files individually stored at each node of a computer network
- The presence of a Database Management System (DBMS).
- The use of a computer network for data access.
- The physical distribution of databases across multiple nodes. (correct)
In the context of Distributed Database Systems, what does the term 'data independence' refer to?
What is the role of 'computer network technology' in a distributed database environment?
Which of the following is NOT an implicit assumption in Distributed Database Systems?
What are the three orthogonal dimensions that define data delivery alternatives (DDA) in a distributed database environment?
In data delivery alternatives, which mode involves the server initiating the transfer of data to clients without a specific request?
Which data delivery frequency involves data being sent from the server to clients at regular, pre-defined intervals?
What is the primary characteristic of 'Unicast' as a communication method in data delivery?
Which of the following is a primary promise of Distributed DBMS regarding data management?
What does 'transparency' in the context of a Distributed DBMS refer to?
Which type of data independence refers to the immunity of user applications to changes in the logical structure of the database?
How does 'location transparency' benefit users of a Distributed Database System (DDBS)?
In Distributed Databases, what does 'naming transparency' ensure?
In the context of data distribution, what is 'horizontal fragmentation'?
How does 'vertical fragmentation' differ from 'horizontal fragmentation' in a distributed database?
What is the main challenge when handling user queries on fragmented database objects in a distributed environment?
What is a major benefit of using distributed DBMS with replicated components?
What is the purpose of Commit protocols in Distributed Transaction Management?
What is a potential drawback of Data Replication in distributed databases?
Localization, as a result of fragmentation and replication in DDBS, has two main advantages. One of them is that contention for CPU and I/O services is not as severe as for centralized databases. What is the other advantage?
What is the benefit of Intra-query parallelism in a distributed database system?
Having as much of the data required by each application at the site where the application executes will lead to...
What factor has contributed significantly to easier system expansion in modern database systems?
In distributed database design, what is a key consideration regarding query processing?
What are primary components of the architecture of a query?
What is a system parameter to keep consistent accesses in a distributed DB?
What are essential features of the ANSI/SPARC architecture?
What is a key factor to consider regarding operating system support for effectively operating a Distributed DBMS?
Which of the following represents a challenge related to concurrency control in Distributed DBMS?
How does the implementation of Distributed Concurrency Control protocols enhance database management systems?
A company decides to distribute its database across multiple sites to improve accessibility and fault tolerance. However, they still want users to interact with the database as if it were a single, centralized system. Which of the following transparency types is most critical for achieving this goal?
A database system is designed to allow users to access data regardless of its storage location. However, the system requires users to include the physical site name in their queries to specify where the data is located. Which type of transparency is lacking in this system?
An international bank has multiple branches, each with its own database. The bank wants to create a distributed database system where each branch can independently manage its data and processes but still needs to interact with other branches. Which characteristic of a Distributed DBMS is most important in the scenario?
In a distributed database environment, replicated components and data should make distributed DBMS more __________
What is required to handle user queries that are specified on the entire relations but must be executed on sub-relations?
Which of the following is a major impact of the emergence of Microprocessor and Workstation Technologies?
What purpose does 'Data Independence' serve?
How is distributed query processing formulated?
Flashcards
What is a Distributed Database System (DDBS)?
A database system where data is spread across multiple computers or locations.
How does Database Management administrate data?
Centralizing data definition/administration, offering data independence.
What technologies converge in DDBS?
The combination of database technology and computer networks for data handling.
What is Distributed Computing?
What composes a Distributed Database (DDB)?
What does a Distributed DBMS (D-DBMS) do?
What is NOT a DDBS?
What is a key assumption of DDBS?
What is 'Data Delivery' in DDBS?
What are the Data Delivery Alternatives?
What is 'pull-only' data delivery?
What is 'push-only' data delivery?
What is 'hybrid' data delivery?
What are the promises of Distributed DBMS?
What is 'Transparency' in DDBS?
What is Data Independence?
What is Network Transparency?
What is Location Transparency?
What is Naming Transparency?
What is Replication Transparency?
What is Fragmentation Transparency?
What is Horizontal Fragmentation?
What is Vertical Fragmentation?
How do Distributed DBMSs improve reliability?
What is a Transaction?
What is Concurrency Transparency?
What is Failure Atomicity?
What does Distributed Transaction Support need?
How does Distribution optimize resource use?
How does Localization improve database access?
What is Inter-query Parallelism?
What is Intra-query Parallelism?
What is System Expansion about?
What is Distributed Database Design?
What are the main Concurrency Control challenges?
What does Reliability focus on?
What are the key Related Issues?
What comprises the 'Architecture' in DBMS?
What are main merits of Client-Server Architecture?
What comprises Heterogeneity?
Study Notes
- Luis Eduardo Bautista Villalpando, PhD is the instructor.
- Course page: https://rooster.uaa.mx/?page_id=132.
- Email: [email protected].
Lecture Outline
- A distributed DBMS intro.
- Distributed DBMS architecture.
- Background info.
- Distributed database design.
- Database integration.
- Semantic Data Control.
- Distributed query processing.
- Multidatabase query processing.
- Distributed transaction management.
- Data replication.
- Parallel database systems.
- Distributed object DBMS.
- Peer-to-peer data management.
- Web data management.
- Current issues.
Introduction
- A Distributed Database System (DDBS) results from combining "Database Systems" and "Computer Network" technologies.
- Initially, each application defined and managed its own data via file systems.
- Now, data is centrally defined and administered through Database Management.
- Data independence means application programs are not affected by changes to the logical or physical organization of the data.
- The motivation for distributed database systems is to integrate operational data without centralizing it.
- Computer network technology connects the distributed operational tasks.
File Systems & Database Management Visuals
- File systems involve separate programs and data descriptions for each file, pointing to individual files.
- Database Management involves applications interacting with a DBMS, which manages description, manipulation, and control of data in a centralized database.
Motivation
- Database technology provides integration and computer networks provide distribution; combining the two leads to distributed database systems.
- Integration does not equal centralization.
Distributed Computing
- Distributed computing involves a number of autonomous processing elements, potentially heterogeneous, that are interconnected by a computer network and cooperate in performing their assigned tasks.
- Several things can be distributed: processing logic (or processing elements), function, data, and control.
- Function: Delegating various functions of a computer system to various pieces of hardware or software.
- Data: The data used by a number of applications may be distributed to a number of processing sites.
- Control: Control of the execution of various tasks might be distributed instead of being performed by one computer system.
- Distributed processing aligns with today's organizational structures.
- It suits web applications, e-commerce, multimedia, manufacturing control systems, and cloud computing.
- It helps handle large-scale data through a divide-and-conquer approach.
Distributed Database System (DDBS)
- A distributed database (DDB) is a collection of multiple, logically interrelated databases distributed over a computer network.
- A distributed DBMS (D-DBMS) is the software that manages the DDB and makes the distribution transparent to users.
- Distributed database system (DDBS) = DDB + D-DBMS
What a DDBS is NOT
- Not a "collection of files" individually stored at each node of a computer network.
- Physical distribution alone does not make a DDBS, although it does matter whether the databases reside on the same computer or not.
- Nor a timesharing computer system.
- Nor a loosely or tightly coupled multiprocessor system.
- Nor a database system that resides at a single network node; that is a centralized database on a network node.
Centralized vs. Distributed Databases
- Centralized DBMS on a network: a single database residing at one network node, accessed remotely from the others.
- Distributed DBMS environment: multiple databases spread across multiple nodes of the network.
Implicit Assumptions
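- Data is stored at a number of sites, and each site is assumed to logically consist of a single processor.
- Processors at different sites are interconnected by a computer network; they do not form a multiprocessor system.
- The distributed database must be a database and not a collection of files.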
- D-DBMS is a full-fledged DBMS and not a remote file or TP system.
Data Delivery
- Data delivery concerns how data moves from the sites where it is stored to the site where the query is posed.
- Three dimensions exist in data delivery alternatives (DDA): delivery modes, frequency, and communication methods.
- Combining these dimensions creates a rich design space.
Data Delivery Alternatives
- Delivery modes
Pull-only
- The transfer of data from servers to clients is initiated by a client pull.
- The server responds by locating the requested information.
- New data and updates are applied at the server without notifying clients, unless clients explicitly poll the server.
Push-only
- The transfer of data from servers to clients is initiated by a server push without a specific request from clients.
- It is hard to know which data is of common interest and when to send it to clients.
- Push alternatives are periodic, irregular, or conditional.
- Server push is useful only if client needs are predicted accurately; it is typically delivered by broadcast or multicast.
Hybrid
- Hybrid mode combines client-pull and server-push.
- Information transfer from servers to clients is first initiated by a client pull, and the subsequent transfer of updated information to clients is initiated by a server push.
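The three delivery modes can be sketched in a few lines of Python. This is only an illustration: the `Server` class, its method names, and the `"price"` item are hypothetical and do not come from the course material.

```python
class Server:
    """Toy data source supporting pull, push, and hybrid delivery (illustrative only)."""

    def __init__(self):
        self.data = {}          # item key -> current value
        self.subscribers = {}   # item key -> list of client callbacks

    def pull(self, key):
        # Pull-only: the client initiates the transfer; the server locates the item.
        return self.data.get(key)

    def subscribe(self, key, callback):
        # Register interest so that later updates can be pushed to this client.
        self.subscribers.setdefault(key, []).append(callback)

    def pull_and_subscribe(self, key, callback):
        # Hybrid: the first transfer is a client pull, later ones are server pushes.
        self.subscribe(key, callback)
        return self.pull(key)

    def update(self, key, value):
        # Push: the server initiates the transfer to every subscribed client.
        self.data[key] = value
        for callback in self.subscribers.get(key, []):
            callback(key, value)


server = Server()
server.update("price", 10)

received = []
first = server.pull_and_subscribe("price", lambda k, v: received.append((k, v)))
server.update("price", 12)      # pushed to the hybrid client
polled = server.pull("price")   # a pull-only client must ask again to see the update
```

Here `first` is the value obtained by the initial pull, `received` collects what the server later pushes, and `polled` shows that a pull-only client sees the update only when it polls.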
Frequency
- Periodic delivery sends data from the server to clients at regular intervals (defined by system/clients).
- Pull and push can be done periodically and in a scheduled way.
- Conditional delivery sends data whenever conditions that clients have installed in their profiles are satisfied.
- These conditions can be expressed as event-condition-action (ECA) rules.
- Conditional delivery is used in hybrid or push-only systems.
- Ad-hoc/irregular delivery is mostly pull-based.
- Data is pulled when clients request it.
- Periodic pull occurs when clients use polling on a regular schedule.
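Conditional delivery with event-condition-action rules might look like the following minimal sketch. The `Rule` and `ConditionalServer` names, the event names, and the `qty < 5` condition are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    """One event-condition-action rule installed by a client (hypothetical shape)."""
    event: str                          # event the rule listens for
    condition: Callable[[dict], bool]   # must hold for the action to fire
    action: Callable[[dict], None]      # delivers the data to the client


class ConditionalServer:
    def __init__(self):
        self.rules = []

    def install(self, rule):
        # A client installs a rule as part of its profile.
        self.rules.append(rule)

    def emit(self, event, payload):
        # On each event, deliver the payload to every rule whose condition holds.
        for rule in self.rules:
            if rule.event == event and rule.condition(payload):
                rule.action(payload)


inbox = []
srv = ConditionalServer()
srv.install(Rule("stock_update", lambda p: p["qty"] < 5, inbox.append))

srv.emit("stock_update", {"item": "A", "qty": 9})   # condition false: nothing sent
srv.emit("stock_update", {"item": "B", "qty": 2})   # condition true: delivered
srv.emit("price_update", {"item": "B", "qty": 1})   # different event: ignored
```

Only the second event reaches `inbox`, because it is the only one that matches both the event name and the condition.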
Communication Methods
- Unicast: The server sends data to a single client, using a particular delivery mode and frequency.
- One-to-many: The server sends data to many clients, potentially using multicast or broadcast protocols.
- Not all combinations of the three dimensions make sense, but the classification gives a useful first-order characterization of the complexity of emerging distributed data management systems.
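To get a feel for the size of this design space, the three dimensions can be enumerated as a Cartesian product. The `plausible` filter below encodes just one reading of the notes (conditional delivery is used in hybrid or push-only systems); it is an assumption, not a complete rule set.

```python
from itertools import product

MODES = ["pull-only", "push-only", "hybrid"]
FREQUENCIES = ["periodic", "conditional", "ad-hoc"]
METHODS = ["unicast", "one-to-many"]

# Every point in the design space: 3 modes x 3 frequencies x 2 methods.
design_space = list(product(MODES, FREQUENCIES, METHODS))


def plausible(mode, frequency, method):
    # Assumption: conditional delivery needs a server push, so it is
    # excluded for pull-only systems. Other combinations are kept.
    return not (frequency == "conditional" and mode == "pull-only")


sensible = [combo for combo in design_space if plausible(*combo)]
```

With these lists, `design_space` has 18 entries and the single filter removes 2 of them.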
Distributed DBMS Promises
- Transparent management of distributed, fragmented, and replicated data.
- Improved reliability/availability through distributed transactions.
- Improved performance.
- Easier and more economical system expansion.
Transparency
- Transparency in systems separates high-level semantics from lower-level implementation, hiding details from users.
- The advantage is that it supports the development of complex applications.
Data Independence
- In the distributed environment, data independence is complemented by:
- Network (distribution) transparency.
- Replication transparency.
- Fragmentation transparency, which covers two kinds of fragmentation:
- Horizontal fragmentation, defined by selection.
- Vertical fragmentation, defined by projection.
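The link between fragmentation and the relational operators can be shown on a tiny in-memory relation. The `EMP` relation, its attributes, and the helper names below are hypothetical; the sketch only illustrates that horizontal fragments come from selection, vertical fragments from projection, and that the original relation can be rebuilt by union and by a join on the key.

```python
EMP = [
    {"eno": 1, "name": "Ana", "city": "Paris", "salary": 50},
    {"eno": 2, "name": "Bo", "city": "Tokyo", "salary": 60},
    {"eno": 3, "name": "Cy", "city": "Paris", "salary": 70},
]


def select(relation, predicate):
    # Selection keeps whole tuples (rows) that satisfy the predicate.
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    # Projection keeps only the listed attributes (columns) of every tuple.
    return [{a: row[a] for a in attributes} for row in relation]


# Horizontal fragmentation: disjoint subsets of rows.
emp_paris = select(EMP, lambda r: r["city"] == "Paris")
emp_other = select(EMP, lambda r: r["city"] != "Paris")

# Vertical fragmentation: subsets of columns, each repeating the key "eno".
emp_public = project(EMP, ["eno", "name", "city"])
emp_pay = project(EMP, ["eno", "salary"])

# Reconstruction: union of horizontal fragments, key join of vertical fragments.
rebuilt_h = sorted(emp_paris + emp_other, key=lambda r: r["eno"])
pay_by_key = {r["eno"]: r for r in emp_pay}
rebuilt_v = [{**r, **pay_by_key[r["eno"]]} for r in emp_public]
```

Repeating the key in every vertical fragment is what makes the join-based reconstruction possible.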
Types of Transparency
- Data independence is a core transparency form, concerning an application's immunity to data changes.
- Data definition occurs at two levels: the logical structure (schema definition) and the physical structure (physical data description).
- Accordingly there are two types: logical data independence, the immunity of user applications to changes in the logical structure (schema) of the database, and physical data independence, which hides the details of the storage structure from user applications.
- Network transparency protects users from the operational details of the network, possibly even hiding its existence.
- Location transparency means the command used to perform a task is independent of both the location of the data and the system on which the task is carried out.
- Naming transparency provides a unique name for each database object.
- Replication transparency means users need not be aware that copies exist; the system manages the replicated copies.
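A small sketch can show how naming, location, and replication transparency fit together: the user refers to an object only by its global name, and a catalog decides which sites are touched. The `DistributedStore` class, the site names, and the read-one/write-all policy are assumptions chosen for illustration, not the method described in the notes.

```python
class DistributedStore:
    """Toy store that hides sites and copies behind a global name (illustrative)."""

    def __init__(self, site_ids):
        self.sites = {s: {} for s in site_ids}   # site id -> local storage
        self.catalog = {}                        # global name -> sites holding a copy

    def create(self, name, value, site_ids):
        # The unique global name is recorded once; copies go to the chosen sites.
        self.catalog[name] = list(site_ids)
        for s in site_ids:
            self.sites[s][name] = value

    def read(self, name, down=()):
        # Read from the first reachable copy; the caller never names a site.
        for s in self.catalog[name]:
            if s not in down:
                return self.sites[s][name]
        raise RuntimeError("no copy of %r is reachable" % name)

    def write(self, name, value):
        # Write-all policy (an assumption): every copy is updated.
        for s in self.catalog[name]:
            self.sites[s][name] = value


store = DistributedStore(["a", "b", "c"])
store.create("EMP", 1, ["a", "b"])

normal_read = store.read("EMP")
failover_read = store.read("EMP", down={"a"})   # site "a" is unavailable
store.write("EMP", 2)
```

The `down` argument only simulates a site failure; it shows that a second copy keeps the object readable without any change to how the user names it.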
Fragmentation Transparency
- It is often desirable to divide each database relation into smaller fragments and treat each fragment as a separate database object, for reasons of performance, availability, and reliability.
- Fragmentation can reduce the negative effects of replication, because each replica is only a subset of the data.
- Horizontal fragmentation partitions relations into subsets of tuples (rows).
- Vertical fragmentation creates sub-relations defined by attribute subsets (columns).
- When database objects are fragmented, user queries specified on entire relations must be executed on sub-relations.
- The query processing strategy therefore needs to be based on fragments rather than relations.
- To maintain transparency, a global query must be translated into several fragment queries.
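Translating a global query into fragment queries can be sketched as follows. The fragment layout, site names, and the `global_query` function are hypothetical; the sketch assumes horizontal fragments defined by `city` and skips any fragment whose defining predicate contradicts the query.

```python
# Horizontal fragments of a hypothetical EMP relation, one per site.
# Each fragment records the predicate value (city) that defines it.
FRAGMENTS = {
    "site_paris": {
        "city": "Paris",
        "rows": [{"name": "Ana", "salary": 50}, {"name": "Cy", "salary": 70}],
    },
    "site_tokyo": {
        "city": "Tokyo",
        "rows": [{"name": "Bo", "salary": 60}],
    },
}


def global_query(min_salary, city=None):
    """Answer a query posed on the whole relation by querying fragments."""
    contacted = []
    result = []
    for site, fragment in FRAGMENTS.items():
        if city is not None and fragment["city"] != city:
            continue   # this fragment cannot contain matching rows
        contacted.append(site)
        # Fragment query: the same selection, applied to the local sub-relation.
        result.extend(r for r in fragment["rows"] if r["salary"] >= min_salary)
    return result, contacted


paris_rows, paris_sites = global_query(55, city="Paris")
all_rows, all_sites = global_query(55)
```

The user writes the query against the whole relation in both calls; only the set of contacted sites changes.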
Reliability
- Distributed DBMSs improve reliability by replicating components, thereby eliminating single points of failure.
- Replicated components and data should make a distributed DBMS more reliable.
- A transaction is a basic unit of consistent and reliable computing: a sequence of database operations executed as a single atomic action.
- Distributed transactions provide:
- Concurrency transparency: Each transaction transforms a consistent database state into another consistent state, even when many transactions execute concurrently.
- Failure atomicity: The same guarantee holds even when failures occur.
Distributed Transaction Support
- Distributed transaction support requires implementing:
- Distributed concurrency control protocols: Protocols that control simultaneous transactions in distributed databases.
- Commit protocols: Protocols capable of performing commit operations and recovering unfinished transactions.
- Data replication suits read-intensive workloads but is problematic for updates, because every copy must be kept consistent.
- Replication is a mechanism for improving reliability in large distributed databases and is common in NoSQL databases.
- Replication protocols perform replication tasks using distributed computing.
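The notes do not name a specific commit protocol. As one well-known example, the sketch below shows the voting logic of two-phase commit in a simplified form; the `Participant` class and state names are hypothetical, and logging, timeouts, and recovery are deliberately left out.

```python
class Participant:
    """One site taking part in a distributed transaction (illustrative)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: vote yes only if the local work can be made durable.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1 (voting): the coordinator collects a vote from every site.
    votes = [p.prepare() for p in participants]
    # Phase 2 (decision): commit only if all voted yes, otherwise abort everywhere.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"


ok_sites = [Participant("s1"), Participant("s2")]
ok_outcome = two_phase_commit(ok_sites)

mixed_sites = [Participant("s1"), Participant("s2", can_commit=False)]
mixed_outcome = two_phase_commit(mixed_sites)
```

The all-or-nothing decision is what gives failure atomicity across sites: a single "no" vote aborts the transaction at every participant.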
Potentially Improved Performance
- Keeping data close to its points of use requires support for fragmentation and replication.
- A distributed DBMS fragments the conceptual database so that data can be stored close to where it is used (data localization). This has two potential advantages:
- Contention for CPU and I/O services is less severe than in centralized databases.
- Localization reduces remote access delays in wide area networks.
Parallelism Requirements
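- Parallel execution takes two forms:
- Inter-query parallelism: Multiple queries execute at the same time.
- Intra-query parallelism: A single query is divided into subqueries, each executed at a different site on a different portion of the database.
- To exploit parallelism, have as much of the data required by each application as possible at the site where the application executes.
- Updates then require mutual consistency and freshness of copies.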
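Both kinds of parallelism can be imitated on one machine with a thread pool. The sketch below is only an analogy: threads stand in for sites, `SALARY_FRAGMENTS` is made-up data, and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Each inner list stands for the salary column of one fragment at one site.
SALARY_FRAGMENTS = [[50, 70], [60], [40, 80, 90]]


def local_sum(fragment):
    # Subquery executed "at a site" on its own portion of the data.
    return sum(fragment)


def total_salary(fragments):
    # Intra-query parallelism: one query split into per-fragment subqueries
    # that run concurrently, then combined into a single answer.
    with ThreadPoolExecutor(max_workers=len(fragments)) as pool:
        partials = list(pool.map(local_sum, fragments))
    return sum(partials)


def run_queries(queries):
    # Inter-query parallelism: independent queries run at the same time.
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(q) for q in queries]
        return [f.result() for f in futures]


total = total_salary(SALARY_FRAGMENTS)
answers = run_queries([
    lambda: total_salary(SALARY_FRAGMENTS),
    lambda: max(max(f) for f in SALARY_FRAGMENTS),
])
```

The sum decomposes cleanly into partial sums, which is why it parallelizes across fragments; not every query splits this easily.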
System Expansion
- The issue is scaling the database.
- Emergence of microprocessor and workstation technologies, and demise of Grosch's law.
- The client-server model of computing.
- Data communication cost vs telecommunication cost.
Distributed DBMS Issues:
- Distributed Database Design
- How to distribute the database: replicated and non-replicated database distribution.
- A related problem is directory management.
- Query Processing converts user transactions into data manipulation instructions and is an optimization problem: min{cost = data transmission + local processing}.
- The general formulation is NP-hard.
- Concurrency Control involves synchronization of concurrent accesses, consistency and isolation of transactions' effects, and deadlock management.
- Reliability: how to make the system resilient to failures while ensuring atomicity and durability.
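The query processing objective, min{cost = data transmission + local processing}, can be illustrated by comparing a few candidate strategies for a join between relations R and S stored at different sites. All numbers, weights, and strategy names below are invented for the example.

```python
def strategy_cost(bytes_shipped, tuples_processed, cost_per_byte=10, cost_per_tuple=1):
    # cost = data transmission + local processing, with assumed unit weights.
    transmission = bytes_shipped * cost_per_byte
    local_processing = tuples_processed * cost_per_tuple
    return transmission + local_processing


# Hypothetical strategies for joining R (large) with S (small).
strategies = {
    "ship R to the site of S": strategy_cost(bytes_shipped=1000, tuples_processed=500),
    "ship S to the site of R": strategy_cost(bytes_shipped=200, tuples_processed=500),
    "ship both to a third site": strategy_cost(bytes_shipped=1200, tuples_processed=500),
}

# Pick the strategy with the minimum total cost.
best = min(strategies, key=strategies.get)
```

Exhaustive comparison works here only because there are three candidates; since the general formulation is NP-hard, real optimizers rely on heuristics rather than full enumeration.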
Related Issues
- Operating system support: an operating system with proper support for database operations.
- Open systems and interoperability: distributed multidatabase systems, which raise parallel issues.
Architecture
- Defines the structure of the system.
- Components are identified.
- Functions are defined for each component.
- Defines interrelationships between components.
Dimensions of the problem
- Distribution: whether the components of the system are located on the same machine or not.
- Heterogeneity: can occur at several levels (hardware, communications, operating system); the DBMS level is the most important, with different data models, query languages, or transaction management algorithms.
- Autonomy: has three forms, namely design, communication, and execution autonomy.
- Design autonomy: Ability of a component DBMS to decide on issues related to its own design.
- Communication autonomy: Ability of a component DBMS to decide whether and how to communicate with other DBMSs.
- Execution autonomy: Ability of a component DBMS to execute local operations in any manner it wants to.
Client/Server Architecture Advantages
- More efficient division of labor
- Horizontal and vertical scaling of resources
- Better price/performance on client machines
- The ability to use familiar tools on client machines
- Client access to remote data (via standards)
- Full DBMS functionality provided to client workstations
- Overall better system price/performance