Distributed Database Systems Concepts

Questions and Answers

What is the term for a collection of databases scattered across multiple sites over a network?

  • Centralized
  • Parallel
  • Distributed (correct)
  • Cloud

Which type of database is spread over different sites and is not limited to one system?

  • Local database
  • None
  • Distributed database (correct)
  • Centralized database

What type of transparency allows users to access any table or fragment as if it is stored locally?

  • Concurrency
  • Fragmentation
  • Location (correct)
  • Replication

In data integration within a distributed database, what do you call the transformations between related objects?

  • Mapping (correct)

What is the schema process called that identifies when two objects are semantically related?

  • Matching (correct)

Which type of database system is located on various sites without sharing physical components?

  • Distributed (correct)

In which type of distributed database system do all sites store their databases identically?

  • Homogeneous (correct)

What kind of database system allows different sites to use varying schemas and software?

  • Heterogeneous (correct)

Which operator is effective in reducing total data transmission in distributed query processing?

  • Semi-join (correct)

What operation can be used to rebuild a table from vertical fragments?

  • Join (correct)

In horizontal fragmentation, which operation can be performed on the fragments to construct a table?

  • Union (correct)

What is true about the global conceptual schema in logical integration?

  • It is not materialized. (correct)

What is the process of assigning data fragments to specific sites in a distributed system called?

  • Allocation (correct)

What term describes the intelligent distribution of data fragments for improved performance?

  • Allocation (correct)

When a user's update is reflected in every copy of the data across multiple sites, what is this known as?

  • Replication (correct)

What process involves creating and maintaining multiple copies of data at different sites?

  • Replication (correct)

What is the first type of optimization performed in query optimization?

  • Local level (correct)

The second type of optimization in query optimization occurs at which level?

  • Global level (correct)

Which of the following is a primary driver for local database execution?

  • Local CPU and disk I/O time (correct)

What do nearly all global optimization alternatives overlook?

  • Commit times (correct)

In query optimization, which cost is considered dominant compared to local processing?

  • Network communication cost (correct)

Which type of cost is deemed important in query optimization?

  • Both local and global costs (correct)

What is a tree data structure that represents a relational algebra expression called?

  • Query tree (correct)

What does a query tree specifically represent?

  • Relational algebra expression (correct)

What is the process of dividing a database into various sub-tables for efficient storage called?

  • Fragmentation (correct)

Which of the following ensures that fragments can reconstruct the original relation?

  • Fragmentation (correct)

What type of integration occurs in a distributed database where various sub-tables of the database are utilized?

  • Fragmentation (correct)

What aspect of cost differentiates query processing in DDBMS from that in centralized DBMS?

  • Communication (correct)

In physical integration, what must the integrated database be to ensure proper configuration?

  • Materialized (correct)

Which tools aid in the integration of a distributed database system?

  • Extract-Transform-Load (ETL) (correct)
  • Enterprise Application Integration (EAI) (correct)
  • Enterprise Information Integration (EII) (correct)

In query trading algorithms for distributed database systems, what is the controlling site for a query referred to as?

  • Requester (correct)

What type of operations should be performed at the site where most fragmented data is present?

  • Join (correct)

Which term refers to the replicas converging to the same value?

  • Mutual consistency (correct)

Which method requires the availability of lock managers at each site?

  • Distributed 2PL (correct)

In which protocol are all lock requests issued to a single central lock manager?

  • Centralized 2PL (correct)

What is the transaction manager at the originating site called in a distributed database system?

  • Coordinator (correct)

Which approach guarantees that deadlocks cannot occur in the first place?

  • Deadlock prevention (correct)

What is defined as the probability that a system does not experience failures over a specific time frame?

  • Reliability (correct)

What describes the probability that a system operates according to its specifications at a given point in time?

  • Availability (correct)

Which approach to managing deadlocks is the most popular in a distributed environment?

  • Deadlock detection and resolution (correct)

What type of data requires a different algorithm and longer analysis time?

  • Unstructured data (correct)

How does big data contribute to cost savings for organizations?

  • Through enabling optimization of processes (correct)

What benefit does big data provide by enabling analysis of large datasets efficiently?

  • Streamlined operations (correct)

In what way does big data assist in marketing efforts?

  • By monitoring social media trends (correct)

How does big data help businesses tailor marketing strategies?

  • By analyzing customer behavior and preferences (correct)

What advantage does big data provide with regard to consumer behavior insights?

  • It enables personalized marketing campaigns (correct)

Which of the following benefits relates to the speed of customer engagement through big data?

  • Time saving (correct)

What is a significant feature of large datasets analyzed in big data?

  • They often include both structured and unstructured data (correct)

Flashcards

Distributed Database

A database system where data is stored and managed across multiple physical locations connected by a network.

Centralized Database

A database system where all data is stored and managed in a single physical location.

Transparency in Distributed Databases

The ability for users to access data from a distributed database as if it were stored locally.

Schema Mapping

The process of defining the transformations between semantically related objects in the different schemas of a distributed system.

Schema Matching

The process of identifying if two database objects (like tables) have similar semantics or meaning.
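
As a rough illustration of how matching and mapping fit together, the sketch below assumes the matching step has already been done by hand and records its outcome as a dictionary; the mapping step then transforms a local record into the global schema. All attribute names (`emp_no`, `fullname`, `wage`) and the `apply_mapping` helper are hypothetical.

```python
# Outcome of schema matching (assumed to be decided by a person):
# local attribute -> (global attribute, conversion function).
mapping = {
    "emp_no": ("emp_id", int),
    "fullname": ("name", str.strip),
    "wage": ("salary", float),
}


def apply_mapping(record, mapping):
    """Schema mapping: transform one local record into the global schema."""
    return {
        target: convert(record[source])
        for source, (target, convert) in mapping.items()
    }


local_record = {"emp_no": "7", "fullname": " Dee ", "wage": "650"}
global_record = apply_mapping(local_record, mapping)
```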

Homogeneous Distributed Database

A distributed database system where all sites use the same database software and schema.

Heterogeneous Distributed Database

A distributed database system where different sites can use different database software and schemas.

Replicated Database

A database system where data is replicated across multiple sites to ensure high availability and fault tolerance.

Leaf Node

In a tree-like data structure, nodes that have no children, representing the end of a branch.

Semi-Join Operator

In distributed query processing, a semi-join reduces data transmission: one site ships only its join-attribute values, and the other site sends back only the tuples that will actually take part in the join.
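
A minimal sketch of the semi-join idea, assuming two hypothetical relations held as lists of dictionaries at two sites. The relation names, the sample rows, and the `semi_join` helper are illustrative only.

```python
# Hypothetical data: EMPLOYEE lives at site 1, DEPARTMENT at site 2.
employee_site1 = [
    {"emp_id": 1, "name": "Ana", "dept_id": 10},
    {"emp_id": 2, "name": "Bo", "dept_id": 20},
    {"emp_id": 3, "name": "Cy", "dept_id": 10},
]
department_site2 = [
    {"dept_id": 10, "dept_name": "Sales"},
    {"dept_id": 30, "dept_name": "Legal"},
    {"dept_id": 40, "dept_name": "Audit"},
]


def semi_join(rows, key, key_values):
    """Return only the rows whose `key` value appears in `key_values`."""
    return [row for row in rows if row[key] in key_values]


# Step 1: site 1 ships only the distinct join-column values to site 2.
dept_ids_needed = {row["dept_id"] for row in employee_site1}

# Step 2: site 2 reduces DEPARTMENT to the rows that can match.
reduced_department = semi_join(department_site2, "dept_id", dept_ids_needed)

# Step 3: site 2 ships the reduced relation back; site 1 finishes the join.
joined = [
    {**emp, **dept}
    for emp in employee_site1
    for dept in reduced_department
    if emp["dept_id"] == dept["dept_id"]
]
```

In this toy case only one of the three DEPARTMENT rows crosses the network instead of all three.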

Rebuilding a Table from Vertical Fragments

Rebuilding a table from vertical fragments involves combining the fragments via a join operation.

Horizontal Fragmentation

Horizontal fragmentation splits a table into disjoint subsets of rows; the original table is rebuilt by taking the union of the fragments.
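
The two reconstruction rules (union for horizontal fragments, join for vertical fragments) can be sketched as follows. The `employee` relation, its attributes, and the choice of `emp_id` as the key repeated in every vertical fragment are assumptions made for illustration.

```python
# Hypothetical EMPLOYEE relation; emp_id is assumed to be the primary key.
employee = [
    {"emp_id": 1, "name": "Ana", "city": "Cairo", "salary": 900},
    {"emp_id": 2, "name": "Bo", "city": "Giza", "salary": 700},
    {"emp_id": 3, "name": "Cy", "city": "Cairo", "salary": 800},
]


def by_key(rows):
    """Sort rows by primary key so relations can be compared as sets."""
    return sorted(rows, key=lambda r: r["emp_id"])


# Horizontal fragmentation: each fragment holds whole tuples chosen by a predicate.
h_cairo = [r for r in employee if r["city"] == "Cairo"]
h_other = [r for r in employee if r["city"] != "Cairo"]
# The fragments are disjoint, so concatenation is their union.
rebuilt_from_horizontal = h_cairo + h_other

# Vertical fragmentation: each fragment holds some attributes plus the key.
v_public = [{"emp_id": r["emp_id"], "name": r["name"], "city": r["city"]} for r in employee]
v_payroll = [{"emp_id": r["emp_id"], "salary": r["salary"]} for r in employee]
# Reconstruction is a join of the fragments on the shared key.
payroll_by_id = {r["emp_id"]: r for r in v_payroll}
rebuilt_from_vertical = [{**r, **payroll_by_id[r["emp_id"]]} for r in v_public]
```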

Global Conceptual Schema

In logical integration, a global conceptual schema is virtual, not physically materialized.

Data Allocation

Data allocation is the process of assigning data fragments to specific sites in a distributed system.
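
One simple way to picture allocation is a greedy heuristic that places each fragment at the site that accesses it most often. This is only a sketch under that assumption; the access counts and site names are made up, and real allocation models also weigh storage, update, and replication costs.

```python
# Hypothetical access frequencies: fragment -> {site: accesses per day}.
access_counts = {
    "F1": {"site_A": 50, "site_B": 5},
    "F2": {"site_A": 3, "site_B": 40},
}

# Greedy heuristic: store each fragment where it is used the most.
allocation = {
    fragment: max(counts, key=counts.get)
    for fragment, counts in access_counts.items()
}
```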

Data Fragmentation

Dividing data into fragments intelligently so that, once the fragments are distributed, end users get better performance and availability.

Replication Schema

A replication schema describes how fragments are replicated in a distributed database.

Database Fragmentation

The process of dividing a database into smaller, independent sub-tables or relations, allowing data to be distributed and stored efficiently across different systems.

Reconstructable Fragments

When performing fragmentation, it's crucial to ensure that the sub-tables can be recombined to recover the original database structure.

Fragmentation

The process of dividing a complete database into smaller sub-tables or relations for efficient storage and management across multiple systems.

Query Processing in DDBMS

Query processing in distributed database systems is inherently different from centralized systems because the need to exchange data across physically separated locations adds communication overhead and complexity.

Physical Integration

In physical data integration, the source databases are combined into a single, integrated database that is physically stored. This integrated database is materialized, meaning it exists as a concrete data structure.

ETL Tools for Integration

Extract-Transform-Load (ETL) tools are crucial for integrating data in distributed database systems. They extract data from source systems, transform it into a consistent format, and load it into the target database.
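
The extract, transform, and load steps can be sketched with Python's standard `sqlite3` module standing in for the target database. The two source layouts, the `employee` table, and the function names are assumptions for illustration, not a description of any particular ETL product.

```python
import sqlite3

# Two hypothetical sources that describe employees in different formats.
source_a = [("1", "ana", "900")]                          # tuples of strings
source_b = [{"id": 2, "full_name": "Bo", "pay": 700.0}]   # dictionaries


def extract():
    """Extract: read the raw rows from each source system."""
    return source_a, source_b


def transform(rows_a, rows_b):
    """Transform: convert both layouts to (emp_id, name, salary)."""
    unified = [(int(i), name.title(), float(pay)) for i, name, pay in rows_a]
    unified += [(r["id"], r["full_name"], float(r["pay"])) for r in rows_b]
    return unified


def load(rows, conn):
    """Load: materialize the unified rows in the target database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS employee "
        "(emp_id INTEGER PRIMARY KEY, name TEXT, salary REAL)"
    )
    conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(*extract()), conn)
loaded = conn.execute(
    "SELECT emp_id, name, salary FROM employee ORDER BY emp_id"
).fetchall()
```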

Query Trading Algorithm: Requester and Seller

In the query trading algorithm, the site initiating a distributed query is referred to as the requester, while the sites processing the query locally are considered sellers.
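
A heavily simplified sketch of the requester and seller roles: the requester collects one cost offer per seller and accepts the cheapest. The offer values, the convention that `None` means a seller declines, and the `choose_seller` helper are all assumptions; an actual query trading algorithm may negotiate over several rounds and over parts of a query.

```python
# Hypothetical offers returned by seller sites for one query (cost units).
# None means the seller declines to process the query.
offers = {"site_A": 120.0, "site_B": 85.0, "site_C": None}


def choose_seller(offers):
    """Requester side: accept the cheapest valid offer, if there is one."""
    valid = {site: cost for site, cost in offers.items() if cost is not None}
    if not valid:
        return None
    return min(valid, key=valid.get)


winner = choose_seller(offers)
```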

Data Transfer for Fragment Operations

When performing operations like joins or unions on fragments located across multiple sites, it's more efficient to transfer the fragmented data to a single site where most of the data resides and execute the operation there.
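
The reasoning can be checked with a small calculation: running the operation at a site means shipping every fragment stored elsewhere to it, so the site that already holds the most data needs the least transfer. The fragment sizes and site names below are hypothetical.

```python
# Hypothetical number of bytes of the needed fragments stored at each site.
fragment_bytes_at_site = {"site_A": 4_000_000, "site_B": 250_000, "site_C": 90_000}


def bytes_shipped_if_run_at(site, sizes):
    """Data that must cross the network if the operation runs at `site`."""
    return sum(size for other, size in sizes.items() if other != site)


best_site = min(
    fragment_bytes_at_site,
    key=lambda site: bytes_shipped_if_run_at(site, fragment_bytes_at_site),
)
shipped = bytes_shipped_if_run_at(best_site, fragment_bytes_at_site)
```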

Local Level Optimization

Query optimization that takes into account the characteristics and resources of each individual database instance (DBE) in a distributed system.

Global Level Optimization

Query optimization that considers the overall distribution of data across all DBEs in a distributed network, optimizing data transfer and communication between sites.

Local Processing Time

The time it takes to process a query at a local database instance, including CPU and disk I/O operations.

Communication Cost

The cost associated with transmitting data between different database instances in a distributed system.

Query Tree

A tree data structure that represents a relational algebra expression: the leaf nodes are the input relations and the internal nodes are the relational algebra operations applied to them.
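
A query tree can be sketched as nested node objects with a recursive evaluator. The node classes (`Relation`, `Select`, `Project`), the `evaluate` function, and the sample data are hypothetical and cover only selection and projection.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Relation:
    """Leaf node: an input relation."""
    name: str
    rows: list


@dataclass
class Select:
    """Internal node: selection (sigma) on its child."""
    predicate: Callable[[dict], bool]
    child: "Node"


@dataclass
class Project:
    """Internal node: projection (pi) on its child."""
    attributes: list
    child: "Node"


Node = Union[Relation, Select, Project]


def evaluate(node):
    """Evaluate the tree bottom-up, from the leaves to the root."""
    if isinstance(node, Relation):
        return node.rows
    if isinstance(node, Select):
        return [r for r in evaluate(node.child) if node.predicate(r)]
    if isinstance(node, Project):
        return [{a: r[a] for a in node.attributes} for r in evaluate(node.child)]
    raise TypeError(f"unknown node type: {type(node).__name__}")


# pi_name(sigma_{salary > 750}(EMPLOYEE))
employee = Relation("EMPLOYEE", [
    {"name": "Ana", "salary": 900},
    {"name": "Bo", "salary": 700},
    {"name": "Cy", "salary": 800},
])
tree = Project(["name"], Select(lambda r: r["salary"] > 750, employee))
result = evaluate(tree)
```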

Distributed Query Optimization

The process of finding the most efficient way to execute a query in a distributed database environment, by considering both local and global optimization strategies.

Optimal Solution (Distributed Query Optimization)

The aim of distributed query optimization: finding a solution that is good enough, but not necessarily the absolute best, to optimize query execution.

Weak Consistency

Ensures all replicas in a distributed system eventually converge to the same value, but doesn't guarantee real-time consistency.
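
The difference between "eventually" and "at all times" can be seen in a toy lazy-replication sketch: a write is applied at one replica immediately and only queued at the others. The `Replica` class, the choice of the first replica as primary, and the `lazy_write` helper are assumptions made for illustration.

```python
class Replica:
    """One copy of a single data item, with a queue of updates not yet applied."""

    def __init__(self):
        self.value = None
        self.pending = []

    def receive(self, update):
        self.pending.append(update)

    def apply_pending(self):
        for update in self.pending:
            self.value = update
        self.pending.clear()


replicas = [Replica() for _ in range(3)]


def lazy_write(value):
    """Apply the write at the first replica now; propagate to the rest later."""
    replicas[0].value = value
    for replica in replicas[1:]:
        replica.receive(value)


lazy_write(5)
values_before = [r.value for r in replicas]   # other replicas are still stale

for replica in replicas:
    replica.apply_pending()
values_after = [r.value for r in replicas]    # replicas have converged
```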

Mutual Consistency

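The requirement that all replicas of a data item converge to the same value. Strong mutual consistency requires the copies to be identical when an update transaction completes; weak mutual consistency only requires them to converge eventually.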

DB Consistency

A type of consistency in distributed databases where data is guaranteed to be accurate and up-to-date across all nodes.

Transaction Consistency

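The property that a transaction, executed on its own, moves the database from one consistent state to another. (All-or-nothing execution, where changes either all succeed or all fail, is the separate property of atomicity.)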

Centralized 2PL

A technique used to manage locks in a distributed database system where a central lock manager is responsible for granting and releasing locks.
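
A minimal sketch of a central lock manager, assuming shared ("S") and exclusive ("X") lock modes and that a transaction releases all of its locks at once when it finishes. The `CentralLockManager` class and its methods are hypothetical; queuing of waiting requests and network messaging are left out.

```python
class CentralLockManager:
    """Single lock table kept at one site; every site sends its requests here."""

    def __init__(self):
        self.shared = {}     # item -> set of transactions holding a shared lock
        self.exclusive = {}  # item -> transaction holding the exclusive lock

    def request(self, txn, item, mode):
        """Return True if the lock is granted, False if `txn` must wait."""
        holder = self.exclusive.get(item)
        readers = self.shared.get(item, set())
        if mode == "S":
            if holder is None or holder == txn:
                self.shared.setdefault(item, set()).add(txn)
                return True
            return False
        if mode == "X":
            if (holder is None or holder == txn) and readers <= {txn}:
                self.exclusive[item] = txn
                return True
            return False
        raise ValueError(f"unknown lock mode: {mode}")

    def release_all(self, txn):
        """Release every lock held by `txn` (for example at commit or abort)."""
        for readers in self.shared.values():
            readers.discard(txn)
        for item in [i for i, t in self.exclusive.items() if t == txn]:
            del self.exclusive[item]


lock_manager = CentralLockManager()
granted_t1_read = lock_manager.request("T1", "x", "S")    # granted
granted_t2_read = lock_manager.request("T2", "x", "S")    # shared locks coexist
granted_t2_write = lock_manager.request("T2", "x", "X")   # blocked by T1's lock
lock_manager.release_all("T1")
granted_t2_retry = lock_manager.request("T2", "x", "X")   # now granted
```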

Deadlock Prevention using Timestamps

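A deadlock prevention method that uses timestamps to prioritize transactions. When a lock request conflicts, the timestamps decide whether the requester waits or a transaction is aborted, so a deadlock never forms (for example, the wait-die and wound-wait rules).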
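
The two classic timestamp rules, wait-die and wound-wait, can be written as small decision functions. A smaller timestamp means an older transaction; the function names and the returned strings are illustrative.

```python
def wait_die(requester_ts, holder_ts):
    """Wait-die: an older requester waits; a younger requester is aborted."""
    return "wait" if requester_ts < holder_ts else "abort requester"


def wound_wait(requester_ts, holder_ts):
    """Wound-wait: an older requester aborts the holder; a younger one waits."""
    return "abort holder" if requester_ts < holder_ts else "wait"


# T1 (timestamp 1) is older than T2 (timestamp 2).
decisions = {
    "wait-die, T1 requests lock held by T2": wait_die(1, 2),
    "wait-die, T2 requests lock held by T1": wait_die(2, 1),
    "wound-wait, T1 requests lock held by T2": wound_wait(1, 2),
    "wound-wait, T2 requests lock held by T1": wound_wait(2, 1),
}
```

Under both rules waiting happens in one direction of age only, which is why no cycle of waiting transactions can form.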

Deadlock Detection and Resolution

A mechanism used to recover from a deadlock situation in a distributed database system.
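
Detection is commonly described in terms of a wait-for graph, where a cycle means a deadlock. The sketch below assumes each site keeps a local graph as a dictionary and that the global graph is their union; `merge_graphs` and `find_cycle` are hypothetical helpers, and victim selection is not shown.

```python
def merge_graphs(*graphs):
    """Union of local wait-for graphs into one global wait-for graph."""
    merged = {}
    for graph in graphs:
        for txn, waits_for in graph.items():
            merged.setdefault(txn, set()).update(waits_for)
    return merged


def find_cycle(wait_for):
    """Return a list of transactions forming a cycle, or None if there is none."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {}
    path = []

    def visit(txn):
        color[txn] = GREY
        path.append(txn)
        for other in wait_for.get(txn, ()):
            state = color.get(other, WHITE)
            if state == GREY:                 # back edge: cycle found
                return path[path.index(other):]
            if state == WHITE:
                cycle = visit(other)
                if cycle:
                    return cycle
        path.pop()
        color[txn] = BLACK
        return None

    for txn in list(wait_for):
        if color.get(txn, WHITE) == WHITE:
            cycle = visit(txn)
            if cycle:
                return cycle
    return None


# Neither site sees a cycle locally, but the global graph contains one.
site1_graph = {"T1": {"T2"}}   # at site 1, T1 waits for T2
site2_graph = {"T2": {"T1"}}   # at site 2, T2 waits for T1
global_cycle = find_cycle(merge_graphs(site1_graph, site2_graph))
```

Resolution would then abort one transaction in the cycle so that the others can proceed.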

Availability

The probability that a system will be operational according to its specifications at a given point in time.
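
For comparison with reliability, the two measures are often written as below. The exponential form assumes a constant failure rate $\lambda$, and the availability formula is the usual steady-state approximation in terms of mean time to failure (MTTF) and mean time to repair (MTTR); neither assumption is stated in the lesson itself.

```latex
% Reliability: probability of no failure during the whole interval [0, t]
R(t) = \Pr\{\text{no failures in } [0, t]\}

% Under the assumption of a constant failure rate \lambda
R(t) = e^{-\lambda t}

% Steady-state availability: fraction of time the system is operational
A = \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}}
```

For instance, with an assumed MTTF of 990 hours and MTTR of 10 hours, $A = 990 / 1000 = 0.99$.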

Structured Data

Data that is organized in a structured format, often in rows and columns, making it easy to analyze and process. Think spreadsheets or relational databases.

Unstructured Data

Data that lacks a predefined structure or organization, making it challenging to analyze and process. Examples include text files, images, audio files, and videos.

Semi-structured Data

Data with some organizational structure, such as tags or key-value pairs, but without the rigid schema of structured data. Examples include XML and JSON documents.

Why Unstructured Data is Difficult to Analyze

Unstructured data is challenging to analyze and process because it requires different algorithms and can take longer to analyze.

Big Data and Cost Savings

Big data helps organizations optimize processes, reduce expenses, and make informed decisions, saving money in the process.

Big Data and Time Savings

Big data streamlines operations, automates tasks, and accelerates decision-making, saving time and resources.

Big Data and Social Media Listening

Big data helps businesses monitor social media trends, customer sentiment, and brand reputation, which supports targeted marketing and engagement and improves customer relationships.

Big Data and Customer Acquisition

Big data provides valuable insights into consumer behavior, enabling personalized marketing campaigns and better customer experiences, driving customer acquisition.

Study Notes

Distributed Database Systems - Oral Bank (Unsolved)

  • Oral bank questions cover distributed database system concepts, focusing on data integration, fragmentation, and query processing.
  • Multiple choice questions assess understanding of key terms, concepts, and techniques in the field.
  • Questions explore different fragmentation strategies (horizontal, vertical, hybrid) and how they relate to the distribution of data across multiple sites.
  • The exam tests understanding of how to integrate databases and their related transformations and query processes.
  • Concepts and algorithms relating to distributed database systems, such as fragmentation and allocation, are examined within the context of real-world applications and database design.
  • Deadlocks within distributed databases and solutions to prevent or resolve them are also covered.

Distributed Database Systems - Concepts

  • Distributed Database System: A collection of databases spread across multiple sites on a network.
  • Centralized Database System: A database system located on a single site.
  • Data Transparency: The ability of a user to access data located at any site as if it were local data.
  • Fragmentation: Dividing a relation into smaller relations (fragments).
  • Horizontal Fragmentation: Dividing a relation into disjoint groups of tuples based on a condition applied to tuples.
  • Vertical Fragmentation: Dividing a relation into disjoint groups of attributes.
  • Hybrid Fragmentation: Combining horizontal and vertical fragmentation strategies.
  • Mapping: Transforming objects between different schemas.
  • Matching: Identifying semantically equivalent objects.
  • Modelling: Defining objects in a way that enables appropriate relationships between semantically equivalent objects.
  • Data Replication: Creating copies of data at multiple sites.
  • Location Transparency: Users can access data without knowing its exact physical location.
  • Query Optimization: Techniques to improve query efficiency in distributed database systems.
  • Query Execution: Executing queries across multiple sites in a distributed database.
  • Data Allocation: Distributing data fragments across different sites.

Query Processing in Distributed DBMS

  • Query Tree: A tree-structured representation of a query, used in query processing in a distributed database system.
  • Data Transfer Costs: Costs involved in transferring data across a network.
  • Local Query Optimization: Optimizing queries at individual sites.
  • Global Query Optimization: Optimizing queries across multiple sites.
  • Query Trading: An algorithm in which a requester site negotiates with seller sites to decide where the parts of a query are executed.
  • Communication overhead: Overhead of data transfer and communication between different sites.
  • Local level optimization: Minimizing query processing time at each individual site.
  • Global level optimization: Optimizing queries across all sites involved.
