Physical DB Design: Relational Databases

Questions and Answers

Which of the following scenarios would warrant specifying more access paths for a file?

  • When the file contains attributes that are neither primary nor candidate keys.
  • When the file is small and rarely accessed.
  • When the file is infrequently updated.
  • When the file is updated frequently. (correct)

What is the primary purpose of specifying access paths on candidate key attributes?

  • To ensure uniqueness constraints are efficiently checked. (correct)
  • To optimize join operations with other relations.
  • To speed up range queries on the attribute.
  • To reduce storage space used by the attribute.

In the context of physical database design, what is a critical trade-off to consider when deciding whether to index an attribute?

  • The trade-off between security and accessibility.
  • The trade-off between data integrity and data redundancy.
  • The trade-off between query performance and update overhead. (correct)
  • The trade-off between storage space and memory usage.

Under which of the following conditions is creating a multiattribute index most beneficial?

  • When multiple attributes from one relation are frequently used together in queries. (correct)

Which statement accurately describes the implications of choosing an attribute for a clustered index?

  • The file is physically ordered on that attribute, and only one clustered index is allowed per table. (correct)

What type of queries benefit most from a clustering index?

  • Range queries, since matching records are stored contiguously. (correct)

In general, why are B+-trees the typical choice for indexing in RDBMSs compared to hash indexes?

  • B+-trees support both equality and range queries. (correct)

When is dynamic hashing a particularly suitable choice for file organization?

  • When the file is very volatile, growing and shrinking continuously. (correct)

What is denormalization, and why might a database designer choose to implement it?

  • The process of combining attributes from multiple tables into a single table, to speed up queries at the cost of redundancy. (correct)

What practical reasons typically drive the decision to denormalize a database?

  • To improve the execution speed of frequently occurring queries and reports. (correct)

What are the primary inputs to the database tuning process after the database is deployed?

  • Actual statistics about usage patterns and revised physical design parameters. (correct)

What is the main goal of database tuning?

  • To optimize database performance by making applications run faster, lowering response times, and improving overall throughput. (correct)

Which of the following is a key indicator that indexes may need to be revised?

  • Certain queries are running slowly due to lack of an index, or existing indexes are not utilized. (correct)

What action does rebuilding a clustered index typically involve?

  • Reorganizing the entire table ordered on that key. (correct)

Which of the following scenarios might justify dropping an existing index?

  • When the index is causing excessive overhead due to frequent changes to the indexed attribute. (correct)

Why is it important to avoid using DISTINCT unnecessarily in SQL queries?

  • It often causes a sort operation, which can be expensive. (correct)

After identifying a poorly performing query, which strategies can be applied?

  • Tuning indexes, rewriting the query, or both. (correct)

What does query tuning involve?

  • Analyzing and modifying SQL queries to improve performance. (correct)

Why might query optimizers not use indexes in the presence of arithmetic expressions or substring comparisons?

  • These operations transform the data in a way that prevents the optimizer from using the index effectively. (correct)

What is a common strategy for improving the performance of a query with multiple selection conditions connected by OR?

  • Splitting the query into a UNION of simpler queries, each with a condition that allows index usage. (correct)

How can using temporary tables assist in optimizing complex queries?

  • By storing intermediate results that can be reused, avoiding redundant computations. (correct)

In the context of transactions, what does atomicity ensure?

  • That the transaction completes in its entirety or not at all. (correct)

What is an interleaved execution of processes in a multi-programming system?

  • The OS executes some commands from one process, suspends it, and executes commands from the next process, and so on. (correct)

Why is concurrency control essential in a multiuser database system?

  • To manage multiple users accessing and updating the same data. (correct)

What is the 'lost update' problem in concurrent transactions?

  • One transaction overwrites changes made by another transaction, thus losing those updates. (correct)

What is a 'dirty read' in the context of database transactions?

  • Reading data that has been modified by another transaction that has not yet committed. (correct)

What is the approach taken by a DBMS when a transaction fails after executing some operations but before completing all of them?

  • The DBMS must ensure that the transaction has no effect whatsoever on the database, undoing any operations already performed. (correct)

Which type of failure is typically classified as a media failure?

  • Disk failure, such as a disk head crash. (correct)

What marks the beginning of a transaction's execution from the recovery manager's perspective?

  • BEGIN_TRANSACTION. (correct)

What signal is sent at the end of a transaction if any changes executed by the transaction can be safely committed to the database and will not be undone?

  • COMMIT_TRANSACTION. (correct)

What can be used when a system failure occurs to restore data?

  • The system log. (correct)

What is force-writing in the context of database transactions?

  • Writing the log file to disk before the transaction is allowed to commit. (correct)

What must the recovery technique do when a transaction fails to complete, leaving the database in a possibly inconsistent state?

  • Undo any effects of the transaction on the database. (correct)

Isolation is enforced by which part of the DBMS?

  • The concurrency control subsystem. (correct)

What condition must hold for a schedule S to be recoverable?

  • No transaction T in S commits until every transaction T' that wrote an item read by T has committed. (correct)

What is the name of an event where an uncommitted transaction has to be rolled back because it read an item from a failed transaction?

  • Cascading rollback. (correct)

What should occur to satisfy the criterion that prevents cascading rollbacks from occurring?

  • Every transaction in the schedule reads only items that were written by committed transactions. (correct)

What are schedules called in which transactions can neither read nor write an item X until the last transaction that wrote X has committed (or aborted)?

  • Strict schedules. (correct)

What condition must the operations of each participating transaction T satisfy within a schedule S?

  • The operations of T must appear in S in the same order in which they occur in T. (correct)

If a schedule S of n transactions is a complete schedule, what must the operations in S include?

  • A commit or abort operation as the last operation for each transaction in the schedule. (correct)

When are two operations in a schedule said to conflict?

  • The operations belong to different transactions, they access the same item X, and at least one of the operations is a write_item operation. (correct)

Under what conditions is a schedule S of n transactions serializable?

  • When it is equivalent to some serial schedule of the same n transactions. (correct)


Flashcards

Uniqueness Constraints on Attributes

Specifying access paths on candidate key attributes or sets of attributes that should be unique.

Design Decisions About Indexing

Attributes used in equality or range conditions, and attributes used as keys or in join conditions, require access paths for indexing.

Whether to Index an Attribute

The attribute must be a key, or used in a query for selection (equality/range) or join conditions.

What Attribute to Index On

An index can be formed on one or multiple attributes.

Denormalization

Storing the logical database design in a weaker normal form to speed up frequently occurring queries and transactions, at the cost of redundancy.

Materialized Join

A prejoined table (e.g., TEACH) that stores the join of two base tables; it represents extreme redundancy, since any update to either base table must also be applied to the prejoined table.

Vertical Partitioning

Splitting a table vertically into multiple tables, each holding a subset of the attributes plus the key.

When Disk Accesses Are Too High

A query taking too many disk accesses (e.g., a full scan) signals that indexes or the query itself should be revised.

Using DNO = DNUMBER

Rewriting a selection condition in terms of DNO (e.g., DNO = DNUMBER) makes an index on DNO available to the optimizer.

Redundant Distinct Operations

Unnecessary DISTINCT operations can be removed without changing the result, avoiding an expensive sort.

Collapsing Multiple Queries

Combining multiple smaller queries into a single larger query so that common computation is done once.

Indexing

Individual indexes may be dropped or created based on tuning analysis.

Tables May Be Joined

Tables may be prejoined when data from multiple tables is frequently needed and used together.

BCNF-based Table Storage

Each table groups attributes that are used together; for example, EMPLOYEE may be split into EMP1 and EMP2.

BCNF Relation Storage

A relation in BCNF can be vertically split and stored as multiple tables that are each also in BCNF.

Lost Update

Occurs when two transactions access the same data item and one transaction's update is overwritten by the other's, so the first update is lost.

Temporary Update Problem

One transaction updates an item and then fails; another transaction reads the updated ('dirty') value before it is rolled back.

Incorrect Summary Problem

Occurs when one transaction calculates an aggregate summary function over a number of records while other transactions are updating some of those records.

Unrepeatable Reads

A transaction reads the same item twice and gets different values because another transaction changed the item between the two reads.

Transaction Failures

Failures are classified into transaction, system, and media failures.

Computer Failure

A hardware, software, or network error occurs in the computer system during transaction execution.

Transaction/System Error

An operation within the transaction (such as integer overflow or division by zero) or a logical programming error may cause it to fail.

Database Transaction

An atomic unit of work that either completes in its entirety or does not occur at all.

BEGIN_TRANSACTION Operation

Marks the start of transaction execution.

ROLLBACK Operation

Signals that the transaction ended unsuccessfully; any changes it applied to the database must be undone.

COMMIT_TRANSACTION

Signals a successful end of the transaction.

System Log

The system maintains a log that keeps track of all transaction operations affecting database items; it is used for recovery.

Commit Point of a Transaction

The point at which all of a transaction's operations have executed successfully and been recorded in the log; its effects then become permanent.

The Log File

A file kept on disk to record all database changes; it must be force-written to disk before a transaction commits.

Atomicity

A transaction either is performed in its entirety or is not performed at all.

Isolation

A transaction should appear to execute in isolation, as if no other transactions were running concurrently.

Durability

Once a transaction commits, its changes to the database must persist and must not be lost by any subsequent failure.

The Durability Property

Durability (along with atomicity) is the responsibility of the recovery subsystem of the DBMS.

Schedules (Histories)

A schedule is an ordering of the operations of a set of transactions, with each transaction's operations appearing in the same order as in the transaction itself.

Conflict of Operations

Two operations conflict if they belong to different transactions, access the same item X, and at least one of them is a write.

Committed Projection C(S)

Includes only those operations in S that belong to committed transactions.

Recoverable Schedules

Schedules in which no committed transaction ever needs to be rolled back; a transaction T commits only after every transaction that wrote an item read by T has committed.

Cascadeless Schedules (Avoiding Cascading Rollback)

Schedules in which every transaction reads only items written by already-committed transactions, so a failure cannot cascade.

Two-Phase Locking

A concurrency control protocol in which a transaction acquires all of its locks before releasing any of them, guaranteeing serializability.

Study Notes

Physical Database Design in Relational Databases

  • Performance constraints on query invocation times (e.g., responses required in under 20 seconds) guide which attributes are given access paths
  • Time-constrained queries and transactions raise the priority of their selection attributes for primary access structures

Analyzing Update Operation Frequencies

  • A minimum number of access paths should be set for frequently updated files
  • Updating access paths can slow down update operations

Uniqueness Constraints on Attributes

  • Access paths should be specified for candidate key attributes or sets of attributes to guarantee uniqueness
  • Indexes can efficiently check uniqueness constraints, using only the index's leaf nodes
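The uniqueness check described above can be sketched with SQLite via Python's sqlite3 module; the employee/ssn schema and index name are illustrative, not from the lesson:

```python
import sqlite3

# Minimal sketch: a UNIQUE index lets the DBMS check a uniqueness
# constraint by probing the index instead of scanning the whole table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (ssn TEXT, name TEXT)")
conn.execute("CREATE UNIQUE INDEX emp_ssn_idx ON employee (ssn)")

conn.execute("INSERT INTO employee VALUES ('123-45-6789', 'Ann')")
try:
    # Duplicate candidate-key value: rejected via an index lookup.
    conn.execute("INSERT INTO employee VALUES ('123-45-6789', 'Bob')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```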

Physical Database Design Decisions

  • Relational systems represent each base relation as a physical database file
  • Access path options include specifying file types and defining attributes for indexes.
  • At most one index can be primary or clustered per file
  • Multiple secondary indexes are allowed

Indexing Design Decisions

  • Attributes used in equality or range conditions and those used in join conditions or keys require access paths
  • Query performance hinges on indexes or hashing schemes for selections and joins

Indexing Categories

  • Whether to index an attribute depends on how it is used: it should be a key, or appear in the selection (equality/range) or join conditions of important queries
  • Indexes can be used without retrieving full records by scanning indexes

Attributes to Index

  • Indexes can be made on one or many attributes.
  • Multiple attributes within a relation in several queries warrant a multiattribute index, e.g., (garment_style, color)
  • The ordering of attributes within the index must correspond to how the queries use them
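A small sqlite3 sketch of the (garment_style, color) example: the composite index serves queries on both attributes or on the leading attribute alone, but not on color alone. The schema and index name are made up for illustration:

```python
import sqlite3

# Composite index on (garment_style, color); attribute order matters.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE garment (garment_style INT, color TEXT, price REAL)")
conn.execute("CREATE INDEX g_idx ON garment (garment_style, color)")

def plan(sql):
    # The 'detail' column of EXPLAIN QUERY PLAN shows SEARCH ... USING INDEX
    # when an index is used, or SCAN when the table is scanned.
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

both = plan("SELECT * FROM garment WHERE garment_style = 1 AND color = 'red'")
leading = plan("SELECT * FROM garment WHERE garment_style = 1")
trailing = plan("SELECT * FROM garment WHERE color = 'red'")
print("USING INDEX" in both, "USING INDEX" in leading, "USING INDEX" in trailing)
# True True False
```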

Clustered Index Setup

  • One primary/clustering index is allowed per table, physically ordering the file by that attribute
  • CLUSTER keyword specifies this in most RDBMSs
  • Primary indexes are created for key attributes; clustering indexes for nonkey attributes
  • Which attribute benefits most from ordering depends on whether keeping the table physically ordered on that attribute speeds up the most important queries
  • Range queries benefit most from clustering
  • If queries are answered from the index alone (index-only searches), the index should not be clustered, since the main benefit of clustering comes from retrieving the actual records

Hash vs Tree Index

  • RDBMSs generally use B+-trees for indexing, although ISAM and hash indexes are also available in some systems
  • B+-trees support both equality/range queries
  • Hash indexes excel in equality conditions, especially during joins to find matches
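The contrast can be sketched in plain Python, using a dict as a stand-in for a hash index and a sorted list with binary search for the ordered leaf level of a B+-tree:

```python
import bisect

# Why ordered (B+-tree-like) structures support range queries while hash
# structures support only equality lookups.
keys = [5, 12, 17, 23, 31, 42, 58]
hash_index = {k: f"rec{k}" for k in keys}   # equality lookups only
sorted_index = sorted(keys)                 # supports ordered range scans

# Equality: both structures handle it.
assert hash_index[23] == "rec23"

# Range query 10 <= k <= 40: binary-search the ordered index, then scan.
lo = bisect.bisect_left(sorted_index, 10)
hi = bisect.bisect_right(sorted_index, 40)
print(sorted_index[lo:hi])  # [12, 17, 23, 31]
```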

Dynamic Hashing

  • Dynamic hashing is suitable for volatile files that grow and shrink continuously
  • Dynamic hashing schemes are not offered by most commercial RDBMSs

Speeding Up Queries

  • Normalization separates logically related attributes into tables to reduce redundancy; denormalization reverses this decomposition
  • Denormalization can improve query execution speed at the cost of redundancy

Tradeoffs for Execution Speed

  • Some normalization is traded away so that frequently occurring queries and transactions execute faster
  • Logical design is stored in a weaker normal form, e.g., 2NF or 1NF, instead of BCNF or 4NF
  • Attributes needed for answering queries/reports are added to a table, avoiding joins, but reintroducing functional/transitive dependencies and redundancy issues
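A sqlite3 sketch of this trade-off using the familiar EMPLOYEE/DEPARTMENT textbook schema (names are illustrative): a redundant DNAME copy in EMPLOYEE answers a frequent report without a join, but must be kept in sync whenever a department is renamed:

```python
import sqlite3

# Denormalization: copying dname into employee avoids a join at the cost
# of redundancy (the copy must be maintained on every department update).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dnumber INT PRIMARY KEY, dname TEXT);
    CREATE TABLE employee   (ssn TEXT PRIMARY KEY, name TEXT, dno INT,
                             dname TEXT);  -- denormalized copy
    INSERT INTO department VALUES (5, 'Research');
    INSERT INTO employee   VALUES ('123', 'Ann', 5, 'Research');
""")

# The normalized form needs a join; the denormalized table answers directly.
joined = conn.execute("""SELECT e.name, d.dname FROM employee e
                         JOIN department d ON e.dno = d.dnumber""").fetchall()
direct = conn.execute("SELECT name, dname FROM employee").fetchall()
print(joined == direct)  # True
```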

Denormalization

  • Storing extra tables maintains original functional dependencies lost during the decomposition of relations

Database Tuning

  • Post deployment, actual database use reveals problem areas unnoticed during initial design
  • The inputs to physical design (Section 16.1.1) are revised by gathering actual statistics about usage patterns and by monitoring internal DBMS processing, such as query optimization
  • Bottlenecks involving data or device contention are revealed
  • Volumes of activity and data sizes are estimated
  • Constant monitoring and revision of the physical design is maintained
  • Tuning attempts to reduce query/transaction response times and improve overall throughput

Factors used by the DBMS

  • Statistics related to factors are an input to the tuning process, including:
  • Sizes of individual tables
  • Number of distinct values in a column
  • Frequency of query/transaction submission in a time interval
  • Times for different query/transaction processing phases

Utilization

  • Other statistics, gathered by monitoring, build a profile of the data contents and database use, including:
  • Storage statistics: data about the allocation of storage into tablespaces, indexspaces, and buffer pools
  • Total read/write activity (paging) on disk extents and disk hot spots for I/O and device performance
  • Query/transaction processing statistics: Execution times of queries and transactions, optimization times during query optimization
  • Locking/logging related statistics: Rates of issuing different types of locks, transaction throughput rates, and log records activity
  • Index statistics: Number of levels in an index, number of noncontiguous leaf pages, etc

Tuning

  • Tuning transactions, concurrency control, and recovery involve:
    • Avoiding excessive lock contention to increase concurrency
    • Minimizing logging overhead and unnecessary data dumping
    • Optimizing buffering and scheduling of processes
    • Allocating resources for efficient utilization of disks, RAM, and processes
  • Solutions are tailored to specific systems by the trained DBAs who tune physical DBMS parameters and configurations of devices and operating systems

Tuning Indexes

  • Index choices can be revised if:
    • Queries take too long because of missing indexes
    • Indexes are not being utilized
    • Indexes cause excessive overhead because of frequent attribute changes
  • DBAs can analyze query execution plans using DBMS commands or trace facilities to understand the utilization of operations

Tuning Goals

  • Tuning dynamically evaluates changing requirements and reorganizes indexes accordingly
  • Dropping and building indexes can have heavy cost but improvements can be huge

Indexing Costs

  • Table updating is generally suspended while an index is dropped or created; this downtime must be accounted for
  • Besides dropping or creating indexes and changing a nonclustered index to a clustered one (or vice versa), rebuilding an index may improve performance
  • Many deletions on the index key leave pages with wasted space that can be reclaimed by a rebuild, while many insertions may cause excessive overflow pages
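In SQLite, for instance, the rebuild is a single REINDEX statement; this sketch churns an index with many deletions and then rebuilds it (the table and index names are made up):

```python
import sqlite3

# After heavy churn, rebuilding an index reclaims wasted space in its pages.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (k INT)")
conn.execute("CREATE INDEX t_k ON t (k)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1000)])
conn.execute("DELETE FROM t WHERE k % 2 = 0")   # many deletions on the key
conn.execute("REINDEX t_k")                      # rebuild the index from scratch
remaining = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(remaining)  # 500
```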

Clustered Tuning

  • Rebuilding a clustered index rearranges the entire table ordered on that key

Options

  • The available indexing options and reorganization facilities vary across systems
  • Sparse indexes hold one index pointer per page (disk block) of the data file, while dense indexes hold one pointer per record
  • In some systems the clustering index is sparse (B+-tree or ISAM based) while secondary indexes are dense B+-trees
  • In some versions, a clustering index must be a dense index; the DBA must work within this limitation

Database Designs

  • One possible denormalization is departing from keeping every table a BCNF relation; if a design performs badly, the logical database may need adjustment, with remapping to new tables and indexes

Data Requirement

  • Database design must be driven by processing requirements as well as data requirements
  • Dynamic processing requirements necessitate conceptual, logical, and physical schema adjustments

Physical design

  • Several processes may occur depending on processing requirements:
  • When attributes from two or more tables are frequently needed together, the existing tables may be denormalized, reducing the normalization level from BCNF to 3NF, 2NF, or 1NF
  • 3NF and 2NF address different kinds of problem dependencies, which is why normalization treats them independently
  • For a set of tables there may be many possible design choices, and one design may be replaced by another

Commonplace Adjustments

  • A BCNF relation R(K, A, B, C, D, ...) can be split into multiple tables that are also in BCNF
  • For example, R1(K, A, B), R2(K, C, D), R3(K, ...), with each table supporting a different access pattern
  • Tables with many columns may be vertically partitioned so that each partition matches how its attributes are accessed
  • Attributes may be repeated in other tables, even though this risks update anomalies
  • The main table is the one guaranteed to hold up-to-date data
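The vertical split above can be sketched with sqlite3; the table and column names mirror the illustrative R1/R2 decomposition:

```python
import sqlite3

# Vertical partitioning: R(K, A, B, C, D) split into R1(K, A, B) and
# R2(K, C, D), each still keyed on K; the original rows are recoverable
# by joining the partitions on the key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE r1 (k INT PRIMARY KEY, a TEXT, b TEXT);
    CREATE TABLE r2 (k INT PRIMARY KEY, c TEXT, d TEXT);
    INSERT INTO r1 VALUES (1, 'a1', 'b1'), (2, 'a2', 'b2');
    INSERT INTO r2 VALUES (1, 'c1', 'd1'), (2, 'c2', 'd2');
""")
rows = conn.execute("""SELECT r1.k, a, b, c, d
                       FROM r1 JOIN r2 ON r1.k = r2.k
                       ORDER BY r1.k""").fetchall()
print(rows)  # [(1, 'a1', 'b1', 'c1', 'd1'), (2, 'a2', 'b2', 'c2', 'd2')]
```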

Data Control

  • Horizontal partitioning slices a table by rows, in contrast to vertical partitioning by columns
  • Each resulting table contains the subset of tuples to which a specific query or set of transactions applies
  • These adjustments can be used to meet specific processing requirements

Bad Queries

  • Inappropriate index selection causes bad query performance, indicated by:
  1. Too many disk accesses (e.g., queries scanning the entire table)
  2. Execution plans showing that appropriate indexes are not being used

Optimizers

  • Common issues seen when the optimizer fails to use an index:
  1. The optimizer will not use an index when the indexed attribute is wrapped in an arithmetic expression
  2. Null comparisons are hard to optimize
  3. Unnecessary DISTINCT operations lead to sorts
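A sqlite3 sketch of the arithmetic-expression pitfall (illustrative emp/salary schema): the optimizer uses the index only when the indexed column stands alone on one side of the comparison:

```python
import sqlite3

# Arithmetic on the indexed column hides the index from the optimizer;
# moving the arithmetic to the constant side restores index use.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ssn TEXT, salary REAL)")
conn.execute("CREATE INDEX emp_sal ON emp (salary)")

def plan(sql):
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

bad  = plan("SELECT ssn FROM emp WHERE salary * 1.1 > 50000")
good = plan("SELECT ssn FROM emp WHERE salary > 50000 / 1.1")
print("USING INDEX" in bad, "USING INDEX" in good)  # False True
```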

Correlated queries

  • Correlated subqueries can be inefficient, since the inner query may be re-evaluated for each outer tuple
  • Temporary tables can store intermediate results and make such queries more efficient
  • In join conditions, prefer comparisons on clustered-index attributes over string comparisons

Clause Order

  • Some optimizers are strongly affected by clause and table order; arrange the query so the smaller result is used to probe the larger table
  • Nested subqueries fall into four cases: uncorrelated or correlated, each with or without aggregates
  • Uncorrelated subqueries with aggregates are generally handled well by optimizers; the other cases may perform badly and are candidates for rewriting

Additional Query Tuning Guidelines

  • The OR operator may prevent the optimizer from using indexes; such a query can be split into a UNION of simpler queries, each of which can use an index
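A sqlite3 sketch of the OR-to-UNION rewrite (illustrative schema): both forms return the same rows, but each UNION branch has a single condition that its own index can serve:

```python
import sqlite3

# OR across different columns often forces a scan; a UNION of
# single-condition queries lets each branch use its own index.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (ssn TEXT, dno INT, salary REAL);
    CREATE INDEX emp_dno ON emp (dno);
    CREATE INDEX emp_sal ON emp (salary);
    INSERT INTO emp VALUES ('1', 5, 30000), ('2', 4, 60000), ('3', 5, 70000);
""")
or_rows = conn.execute(
    "SELECT ssn FROM emp WHERE dno = 5 OR salary > 50000 ORDER BY ssn"
).fetchall()
union_rows = conn.execute(
    """SELECT ssn FROM emp WHERE dno = 5
       UNION
       SELECT ssn FROM emp WHERE salary > 50000 ORDER BY ssn"""
).fetchall()
print(or_rows == union_rows)  # True
```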

SQL commands

  • The following transformations can help:
  • NOT conditions can be transformed into equivalent positive conditions (e.g., NOT (A > B) becomes A <= B)
  • Embedded (nested) SELECTs can be unnested and rewritten as joins

Rewrites

  • Rewrite WHERE clauses so they can utilize indexes. For example:

    SELECT REGION#, PROD_TYPE, MONTH, SALES
    FROM SALES_STATISTICS
    WHERE REGION# = 3
      AND ((PRODUCT_TYPE BETWEEN 1 AND 3) OR (PRODUCT_TYPE BETWEEN 8 AND 10));

Improvements

  • Inefficient queries can be improved by using temporary tables and views, and by avoiding certain expensive operations
  • The appropriate technique depends on the optimizer

Introduction to Transaction Processing

  • Transaction processing covers basic concepts of transactions, concurrency control, and recovery
  • It addresses concurrent execution of transactions and recovery from transaction failures
  • Single-user and multiuser database systems are contrasted, demonstrating how concurrent execution of transactions can take place
  • The transaction concept is built on a basic model of read and write operations on database items

Multiple Users

  • Multiprogramming allows multiple users to access the computer system at the same time

Interleaving

  • While one process waits on an I/O operation, the CPU switches to another process, keeping the CPU busy and avoiding long idle periods
  • With multiple CPUs, some processes can also execute truly in parallel
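The lost-update problem that interleaving makes possible can be simulated deterministically in plain Python, with two hand-interleaved read-modify-write "transactions" (no real DBMS involved):

```python
# Simulating the lost-update problem on a shared item X with the
# interleaving: T1 reads, T2 reads, T1 writes, T2 writes.
X = 100  # shared data item

t1_local = X          # T1 reads X = 100
t2_local = X          # T2 reads X = 100, before T1 writes
X = t1_local + 10     # T1 writes 110
X = t2_local - 20     # T2 writes 80, overwriting T1's update

# Any serial order would give 90 (100 + 10 - 20); T1's update is lost.
print(X)  # 80
```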

Granularity

  • The granularity of a data item is its size (for example, a field, a record, or a disk block); the basic transaction model abstracts over these details
