Questions and Answers
Which of the following is NOT a characteristic of Key-Value Stores?
Key-Value Stores lack a static data schema.
True
What is the main operation used to retrieve a value from a Key-Value Store?
get(key)
In a Key-Value Store, the __________ operation is used to remove an entry from the store.
Match the following Key-Value Store characteristics with their descriptions:
Which part of the HBase client configuration is critical for connecting to the database?
In HBase, the columns defined in a table must be established at compile-time.
What does the 'Put' operation do in HBase?
In a key-value store, the key is unique and is used to retrieve the corresponding __________.
Match the key-value stores with their key-value structure:
Which of the following is not a common feature of KV-Stores?
All KV-Stores check for integrity constraints to ensure data validity.
What is the primary purpose of a write-ahead log (WAL) in KV-Stores?
A ________ index contains one entry for every record in the sequential file.
Which of the following data model storage methods is not common in KV-Stores?
Match the following indexing types with their descriptions:
Name one key property of B-Trees.
Versioning in KV-Stores allows the storage of multiple copies of the same data.
Study Notes
Big Data Systems - Key Value Stores I
- Big data systems use key-value stores for storing and retrieving data
- Key-value stores are containers for storing key-value pairs
- NoSQL semantics (non-relational) are used in key-value stores
- Key-value stores provide increased scalability, speed, availability, and flexibility
Timeline I
- A schedule of topics taught in the course
- Includes dates, topics covered on Tuesdays and Wednesdays
This Lecture
- KV Overview
- B-Trees
- Sources of information for the lecture: Hans-Arno Jacobsen (TUM) - Distributed Systems, Johannes Zschache (University of Leipzig) - NoSQL-Datenbanken, Volker Markl (TUB) - Database Technology
Where are we?
- Data Management
- Operational storage (small writes and reads)
- Visualisation, Analytics and Query Language
Serving Requests
- User access to inverted index.
- Input: search term.
- Output: relevant URLs.
- Requirements: Many search terms, many URLs, frequent updates, and interactive speed (<1s).
More on Interaction
- Today's internet applications have huge amounts of stored data (1PB=10^15 bytes).
- There are a vast number of Internet users (e.g., 3.4 billion).
- Applications require frequent updates, fast data retrieval, and rapidly changing data definitions.
Enter Key-Value Stores
- Key-value stores are scalable containers for key-value pairs
- They use NoSQL semantics (non-relational)
- Simplified semantics (and syntax) in exchange for increased scalability, speed, availability, and flexibility
- Key-value store operations: Write/update, Read, Delete.
This Lecture
- Reminders of two main types of queries:
- OLTP: Read and Write access to data, few rows/operation, high qps (queries per second).
- OLAP: Mostly read access to data, many rows/operation, low qps.
- Distinctions are also relevant at web scale.
- Discusses processing OLTP-like queries at web scale.
Web Scale
- More information on web scale data store vs. relational systems.
- "MonetDB is Web Scale" - YouTube link included.
DBMS vs KV-Store
- Comparison between DBMS (SQL) and KV-Store
- Relational data schema vs. No (static) data schema.
- Data types, Foreign keys and Full SQL support in DBMS.
- Raw byte access, no relations and single-row operations in KV-Store.
KV-Stores - Requirements
- Horizontal scalability, considering user growth, traffic patterns, and request/data size.
- High speed for single record read and write operations.
- Data flexibility and adapting to changing data definitions
- Reliability through commodity hardware, failure recovery, availability, geo-distribution, global user base, and fast access.
KV-Store Client Interface - Overview
- Main operations: Write/update (put(K,V)), Read (get(K)), Delete (delete(K))
- No aggregation, table joins or transactions in KV-store.
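The three operations above can be sketched with an in-memory dictionary. This is a minimal illustrative sketch of the client interface, not a real client library; the class and method bodies are assumptions:

```python
class KVStore:
    """Minimal in-memory sketch of the KV client interface (illustrative only)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        """Write or update the value stored under key."""
        self._data[key] = value

    def get(self, key):
        """Return the value for key, or None if the key is absent."""
        return self._data.get(key)

    def delete(self, key):
        """Remove the entry for key if it exists."""
        self._data.pop(key, None)

store = KVStore()
store.put("user:42", "Alice")
print(store.get("user:42"))   # Alice
store.delete("user:42")
print(store.get("user:42"))   # None
```

Note that, matching the bullet above, the interface deliberately offers no joins, aggregation, or multi-key transactions: every operation touches a single key.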
KV-Store Client Interface - Example HBase
- Initialization using ZooKeeper
- Column Family and Column are defined at runtime in HBase
KV-Stores in Practice
- Bigtable, Apache HBase, Apache Cassandra, Redis, Amazon Dynamo, Yahoo! PNUTS, LevelDB, RocksDB.
- Detailed examples of key-value pairs used by Amazon, Facebook, and Google.
Common Properties of KV-Stores
- Memory store and write-ahead log (WAL).
- Keeping data in memory for fast access and commit log.
- Versioning (different versions of data).
- Timestamping (tracking data changes).
- Replication (multiple copies of data).
- Failure detection and failure recovery.
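The memory-store-plus-WAL pattern above can be sketched as follows: every write is appended (and forced) to a log file before the in-memory table is updated, so the state can be rebuilt by replaying the log after a crash. The JSON-lines file format and all names are assumptions for illustration:

```python
import json
import os
import tempfile
import time

class WALStore:
    """Sketch of memory store + write-ahead log: log first, then apply in memory."""

    def __init__(self, path):
        self.path = path
        self.mem = {}
        if os.path.exists(path):            # recovery: replay the commit log
            with open(path) as f:
                for line in f:
                    rec = json.loads(line)
                    self.mem[rec["k"]] = rec["v"]

    def put(self, k, v):
        rec = {"k": k, "v": v, "ts": time.time()}   # timestamp each version
        with open(self.path, "a") as f:
            f.write(json.dumps(rec) + "\n")          # 1. write ahead to the log
            f.flush()
            os.fsync(f.fileno())                     # force the record to disk
        self.mem[k] = v                              # 2. apply in memory

path = os.path.join(tempfile.mkdtemp(), "wal.log")
s = WALStore(path)
s.put("k1", "v1")
s.put("k1", "v2")            # newer version, also logged
recovered = WALStore(path)   # simulate a restart: replay the log
print(recovered.mem["k1"])   # v2
```

The timestamp per record also illustrates how versioning falls out of the same mechanism: the log naturally retains older versions of each key.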
Common Non-Features of KV-Stores
- Integrity constraints.
- Transactional guarantees (ACID).
- Powerful query languages.
- Materialized views.
Simple Index
- Prerequisite: sorted data file that is sorted on the primary key of a relation
- Index file stores (key, pointer) pairs.
- Key value K is associated with a pointer to a record where that key, value pair is located in the sorted file.
- Index types: Dense (one entry per record in sequential file) and Sparse (one entry per block in the sequential file).
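The sparse variant can be illustrated with a toy sorted file of blocks and one index entry per block; the block layout and data are assumptions:

```python
from bisect import bisect_right

# Sorted data file, modeled as fixed-size blocks of (key, value) records.
blocks = [
    [(10, "a"), (20, "b")],
    [(30, "c"), (40, "d")],
    [(50, "e"), (60, "f")],
]
# Sparse index: one entry per block -> the first key of each block.
sparse = [blk[0][0] for blk in blocks]

def lookup(key):
    # Find the last block whose first key is <= key, then scan that block.
    i = bisect_right(sparse, key) - 1
    if i < 0:
        return None
    for k, v in blocks[i]:
        if k == key:
            return v
    return None

print(lookup(40))  # d
print(lookup(45))  # None
```

A dense index would instead hold one `(key, pointer)` entry per record, making the index larger but avoiding the scan inside the block.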
B-Trees
- Single-level index for simple access acceleration.
- In practice, multi-level B+-trees are used; the number of levels depends on the amount of data.
- Blocks are at least 50% full to ensure efficient space usage.
- No overflow blocks are needed due to the structure of the tree.
Structure (B-tree)
- Index blocks are organized in a tree structure
- Balanced: All paths from root to leaf have the same length.
- Parameter n: every block holds between n and 2n search keys, and between n+1 and 2n+1 pointers. This differs from a normal tree structure.
- Choice of n: n should be as large as possible with respect to block size.
- Alternative definition: Defines conditions for the minimum and maximum number of keys and pointers in each block (including leaves).
Definition of B+-Trees
- Keys in leaves are keys from the data and sorted from left to right.
- Root has at least one pointer.
- All pointers point to subsequent B-tree blocks on the next level below.
- Leaves have a pointer to the next leaf node on the right.
- Pointers in inner nodes point to B-tree blocks of the level below.
Example Configuration (B-tree)
- Example configuration of a B+-tree demonstrating storage of data in blocks. Shows inner nodes and leaf nodes structure.
Alternative Definition (B-tree)
- Parameter n definition conditions for the minimum and maximum numbers of keys and pointers in each block (including leaves).
Examples of Leaves (B-tree)
- Examples of full and partially filled leaf blocks in the B-tree structure
Examples of Inner Nodes (B-tree)
- Examples of full inner nodes and partially filled inner node blocks in the B-tree structure.
Example B+-Tree
- Example of a B+ tree with parameter n=2 showing the structure and organization of data in the tree.
Applications of B+-Trees
- B-trees can assume different index roles.
- Search key can be a primary key, order of data file is not important.
- Dense index: when order of data file is not important
- Sparse Index: when data file must be sorted by primary key.
- Search key is not necessarily the primary key
- Duplicate keys in leaves have to be handled
B-Tree Search
- Basic search operations in B-Trees.
Searching in B-Trees
- Search operations in a B-Tree
- Basic procedure for search based on keys and recursive search process to the leaf nodes
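The recursive descent can be sketched for a toy B+-tree node representation; the dict-based layout is an illustrative assumption, not a full implementation:

```python
from bisect import bisect_right

# Toy node layout: inner node = {"keys": [...], "children": [...]};
# leaf = {"keys": [...], "values": [...]}. Left subtree holds keys < separator,
# right subtree holds keys >= separator.

def search(node, key):
    if "children" in node:                       # inner node: descend
        i = bisect_right(node["keys"], key)      # child whose range covers key
        return search(node["children"][i], key)
    if key in node["keys"]:                      # leaf: look up the key
        return node["values"][node["keys"].index(key)]
    return None

leaf1 = {"keys": [10, 20], "values": ["a", "b"]}
leaf2 = {"keys": [30, 40], "values": ["c", "d"]}
root = {"keys": [30], "children": [leaf1, leaf2]}
print(search(root, 40))  # d
print(search(root, 25))  # None
```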
Example Searching in B-Trees
- Example of searching for a value K=60 in a sample B-tree.
- Demonstrates logical branching based on key match and comparison.
Example Searching in B-Trees (K=53)
- Example of searching for a value K=53 in a sample B-tree, shows how values are found in the tree.
Range Queries
- Queries with inequalities in a WHERE clause.
- Searches within a range [a,b]
Example Searching in B-Trees (10 ≤ K ≤ 28)
- Example of demonstrating range querying in a B-tree example
B-Tree Insertion
- Recursive algorithm for insertion.
- Search for corresponding leaf, insert key-pointer pair.
- Overflow occurs if no room in leaf node.
- Splits the leaf node, distributes keys, inserts new key-pointer pair in parent node.
- Recursively ascends if no space in root and create a new root.
Example of B-Tree Insertion (K=60 and K=61)
- Examples of inserting K=60 and K=61 into a B-tree, with step-by-step demonstrations of both operations.
Insertion Cost (B-tree)
- Cost of insertion, searches and block writing during the insertion process. Includes cases where split is needed, ascend to root or create new root.
Deleting from a B-Tree
- Find the corresponding node
- Delete the key
- If minimal number of keys remains in the node: nothing else is to be done.
- If too few keys, merge with sibling nodes, or steal from sibling
- Adjusting parent node keys and possibility for delete propagations
Example of B-Tree Deletion (K=12, K=15, K=5)
- Examples of deleting values K=12, 15, 5 from a B-Tree example. Demonstrates how deletion can lead to merge and propagate actions.
Deletion Cost (B-tree)
- Cost of deletion operations
Deleting from B-Trees – Variation
- Assumption: a B-tree data set usually grows, so underflow is rare; values are deleted by placing a tombstone in the tree
B-Tree Variations
- B*-tree
B-Trees for Non-Primary Keys
- Explanation of how the meaning of pointers in inner nodes changes for non-primary keys
B-Tree Variations: B*-Tree
- Overflow during insertion: distribution across all leaves.
- If not possible to distribute, create 3 new leaves from 2 leaves.
- Generate m+1 nodes from m nodes to improve memory utilization.
- Demonstrates a possible alternative to the plain B-tree; guarantees at least 66% space utilization.
Next Part (Key Value Stores and Distributed Architecture)
High Insert & Update in B-Tree
- Insertion cost in B-trees is proportional to log(N).
- Several disk accesses are made per write, since the blocks on the search path must be retrieved first.
Log Structured Merge - Tree
- Split index (in memory and disk).
- C0 resides in memory (recent data), C1 resides on disk (older data).
- Both C0 and C1 cover the full data domain.
- O'Neil: C0 = AVL-tree, C1 = B-tree (to allow for compaction and merging operations)
LSM Tree – Compaction / Rolling Merge
- Updates in memory in place are fast.
- Updates on disk are fast in bulk for large blocks.
- Multiple merging levels for better cost amortizations to optimize large datasets.
LSM-Tree – Other Operations
- Search: start with C0, then go to C1 if needed
- Delete: insert tombstone
- Actual delete during merge
- Nice side effects (multiple versions of data, usable for recovery)
LSM Tree -> Log Structured File
- Faster inserts (O(1) per insert).
- Writes go into the Memtable (+ log file).
- Memtable is flushed into an immutable Level-1 SSTable.
- Sequential I/O.
- SSTables are periodically compacted.
- Merged into the next level.
- Expensive for non-existent keys -> Bloom filter.
- O(log(n)) read performance when using a Bloom filter
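The write path above can be sketched as follows. The memtable limit is a toy size; a real system flushes by bytes, keeps a WAL alongside the memtable, and compacts runs in the background:

```python
class LSMSketch:
    """Toy LSM write path: writes land in a memtable; when it fills up it is
    flushed as an immutable sorted run (an SSTable). Compaction is omitted."""

    MEMTABLE_LIMIT = 2

    def __init__(self):
        self.memtable = {}
        self.sstables = []                  # list of sorted runs, newest first

    def put(self, k, v):
        self.memtable[k] = v
        if len(self.memtable) >= self.MEMTABLE_LIMIT:
            run = sorted(self.memtable.items())   # flush as immutable sorted run
            self.sstables.insert(0, run)
            self.memtable = {}

    def get(self, k):
        if k in self.memtable:              # check the memtable first...
            return self.memtable[k]
        for run in self.sstables:           # ...then the runs, newest first
            for key, v in run:
                if key == k:
                    return v
        return None                         # expensive case: key does not exist

db = LSMSketch()
db.put("a", 1)
db.put("b", 1)          # second put fills the memtable and triggers a flush
db.put("a", 2)          # newer version, still in the memtable
print(db.get("a"), db.get("b"), db.get("z"))  # 2 1 None
```

The miss for `"z"` scans every run, which is exactly the case a Bloom filter per SSTable is meant to short-circuit.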
Disk Layout – Immutable Data
- Optimization of writes by inserting instead of overwriting.
- Memtable (main memory buffer).
- Storing sorted data files (immutable data).
- Deletion using Tombstones (marking deleted data to allow compaction).
Compaction - Tiering vs Leveling
- Comparison of compaction methods
- Leveling: one run per level and sort-merge.
- Tiering: multiple runs per level, sort-merged into one run for the next level
- Hybrid/Partial: fixed size runs, increase number per level
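A leveling-style sort-merge of two runs can be sketched as follows (toy data; tombstone handling and multi-way merges are omitted):

```python
from heapq import merge
from itertools import groupby

# Two sorted runs: compaction sort-merges them into a single run, keeping only
# the newest version of each key.
older = [("a", 1), ("c", 1), ("d", 1)]
newer = [("b", 2), ("c", 2)]

def compact(newer_run, older_run):
    # Tag entries so that, for equal keys, the newer version sorts first.
    tagged = merge(((k, 0, v) for k, v in newer_run),
                   ((k, 1, v) for k, v in older_run))
    # Keep only the first (newest) version per key.
    return [(k, next(g)[2]) for k, g in groupby(tagged, key=lambda t: t[0])]

print(compact(newer, older))  # [('a', 1), ('b', 2), ('c', 2), ('d', 1)]
```

Because both inputs are already sorted, the merge is a single sequential pass, which is why compaction is cheap in bulk.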
Read (LSM-Tree)
- Reading requires combination of files.
- Read until first version is found or tombstone.
- Expensive if key does not exist.
Bloom Filter
- Fast determination if a key exists in the storage.
- No input/output (I/O) required if not found.
- Bit array for inserted keys (stores bits to determine presence of a key).
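A minimal Bloom filter can be sketched as below; deriving the k hash positions by salting SHA-256 is an illustrative choice, and the sizes are assumptions:

```python
import hashlib

class BloomFilter:
    """Sketch: lookups can return false positives, but never false negatives."""

    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = bytearray(m // 8)       # bit array of m bits

    def _positions(self, key):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        # True means "maybe present"; False means "definitely absent" -> no I/O.
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(key))

bf = BloomFilter()
bf.add("user:1")
bf.add("user:2")
print(bf.might_contain("user:1"))  # True
```

A definite "absent" answer lets the store skip reading any SSTable at all, which is the point of the bullet above.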
File Format (LSM-Tree Format)
- Variable block size.
- Blocks are indexed.
- Trailer contains metadata (start of index).
- Type= value/tombstone
Block Index - Hash Table
- In-memory hash map (dictionary).
- Byte-offsets are used as values
Sorted String Table (SSTable)
- Hash table with sorted keys.
- Advantages: faster compaction, range queries, reduced HDD I/O for lookups.
- Sparse in-memory index over the sorted segment file (SSTable) on disk.
Distributed Architecture – Motivation
- Scalability (Elasticity) to accommodate growing data volumes, processing power, and access frequency.
- Availability (Fault Tolerance): preserving data accessibility and availability even in case of failures/crashes.
- Latency reduction in distributed copies using local data access.
Replication and Partitioning
- Partitioning: dividing data across different nodes (load balancing & latency reduction).
- Replication: storing multiple copies of data items (redundancy improves scalability, availability, and latency, but introduces consistency challenges).
Partitioning (Key-Value Data)
- Partitioning of key-value data, partitioning by key range or hash of key.
- Partitioning Algorithm: assign each data item to a partition to balance load among partitions.
Partition Design: Prevention of Hotspots
- Hash functions or randomizers are used to assign keys to specific partitions to avoid hotspots.
Partitioning – Hash vs Range
- Range partitioning: splits data by key ranges; efficient for range scans, but generally harder to balance.
- Hash partitioning: splits data by ranges of the key hash; gives a uniform data distribution but no key locality.
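The two schemes can be contrasted in a short sketch; the node names, the choice of MD5, and the range split points are all illustrative assumptions:

```python
import hashlib

NODES = ["node0", "node1", "node2"]

def hash_partition(key):
    """Hash partitioning: uniform spread over nodes, adjacent keys lose locality."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return NODES[h % len(NODES)]

# Range partitioning: each node owns the keys from its split point up to the next.
RANGES = [("a", "node0"), ("i", "node1"), ("r", "node2")]

def range_partition(key):
    """Range partitioning: keeps adjacent keys together, supporting range scans."""
    owner = RANGES[0][1]
    for start, node in RANGES:
        if key >= start:
            owner = node
    return owner

print(range_partition("alpha"))  # node0
print(range_partition("kiwi"))   # node1
print(range_partition("zebra"))  # node2
print(hash_partition("kiwi"))    # some node, determined only by the hash
```

With range partitioning, a scan over `"kiwi"`–`"lemon"` hits one node; with hash partitioning those keys are scattered across all nodes.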
Distributed Hash Tables (DHT)
- Distributed key-value store.
- Hash table provides put() and get() operations.
- Partitioning is data-centered and uses routing by value to avoid global knowledge.
DHT: Design Goals
- Load balancing
- Decentralization
- Availability and robustness
- Flexible naming scheme
Consistent Hashing
- Traditional hash functions may require remapping all tuples when the number of nodes changes.
- A consistent hash function reduces remapping when a node is added or removed: on average only K/n keys move (K keys, n nodes).
DHT – High-Level Architecture
- Hash function for node and key identifiers.
- Identifier circle (hash ring) that maps values to positions.
- Assignment of keys to first node with equal or successor identifier.
- Virtual nodes (multiple per server) are a mechanism to improve load balancing.
- Gossiping is used for node and failure detection.
DHT with Consistent Hashing
- Random seed and key range assignment.
- Hashing of values to key ranges & local changes.
- Splitting of one key range when new nodes are added.
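The ring behavior can be sketched as follows; virtual nodes and replication are omitted for brevity, and all names are illustrative:

```python
import hashlib
from bisect import bisect

def h32(s):
    """Map a string onto a 32-bit hash ring position."""
    return int(hashlib.sha256(s.encode()).hexdigest(), 16) % (2 ** 32)

class ConsistentHashRing:
    """Sketch: nodes sit on a hash ring; a key belongs to the first node
    clockwise from the key's hash position."""

    def __init__(self, nodes):
        self.ring = sorted((h32(n), n) for n in nodes)

    def node_for(self, key):
        pos = h32(key)
        idx = bisect(self.ring, (pos,)) % len(self.ring)  # clockwise successor
        return self.ring[idx][1]

    def add(self, node):
        # Only keys in the segment now owned by the new node are remapped.
        self.ring = sorted(self.ring + [(h32(node), node)])

ring = ConsistentHashRing(["n0", "n1", "n2"])
keys = [f"key{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}
ring.add("n3")
moved = [k for k in keys if ring.node_for(k) != before[k]]
print(f"{len(moved)} of {len(keys)} keys remapped, all onto the new node")
```

This is exactly the local-change property from the bullets: adding `n3` splits one key range, and only keys in that segment move.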
DHT Replication
- Replication for high availability.
- Replication factor usually 3.
- Original node becomes coordinator.
- Replica lists act as preference list.
Summary (LSM-Tree, Distributed Architecture, Partitioning, Consistent Hashing)
- Summarizes the contents based on what was covered in the lectures.
Next Part (Key Value Stores III, Distributed Consistency)
- Indicates the next part will be focused on Key-Value Stores, distributed consistency.
Announcements
- Quiz deadline and lecture schedule change notices.
Timeline I (Revised)
- Includes the schedule for topics on Tuesdays and Wednesdays of the course.
This Lecture (Recaps, Distributed Consistency, 2PC, 3PC, Data Model)
- Recaps previous topics (distributed consistency, 2PC, 3PC, Data Model, examples)
First-in-First-Out Scheduling
- Description of FIFO Scheduling
- Shows the average completion and turn-around times for an example set of jobs.
Recaps – B+-trees and Duplicate Keys
- B+Tree is typically used for unique keys.
- Meaning of pointers change based on the presence of duplicate keys in the inner nodes.
- Keys with same value can span blocks in B+ tree.
Recaps – LSM Storage Design
- Explains different LSM design considerations in terms of storage design including leveling, tiering, and various hybrid models.
Recaps - Consistent Hashing
- Consistent-hashing functions, their mapping of keys to nodes, and the behavior in case of node failures. Demonstrates the example hash function hash(key) MOD n.
Where are we?
- Overview of data management tools.
- Focus is on operational storage
- Small writes and reads are emphasized
Consistency
- Refresher of ACID semantics and transaction processing protocols.
- How distributed commits are handled.
Distributed Commit Protocols
- Two-phase commit (2PC).
- Three-phase commit (3PC).
- Paxos.
Two-Phase Commit (2PC) - Overview
- Two-phase commit mechanism in distributed database environments.
- Roles of coordinator and worker nodes.
Two-Phase Commit (2PC) - Commit Request Phase
- Procedure of 2PC, how it receives, handles and sends messages between the coordinator and workers to achieve consensus for transaction processing.
Two-Phase Commit (2PC) - Commit Phase (Success)
- Sequence of operations when all workers successfully execute the prepare commit protocol in 2PC and then the commit phase
Two-Phase Commit (2PC) - Commit Phase (Abort)
- Procedure of 2PC, when at least one worker fails to properly execute the prepare commit protocol, the coordinator initiates a rollback action.
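The two phases described above can be sketched as coordinator logic over a set of workers. This is a toy model with synchronous calls and no failures or timeouts; all names are illustrative:

```python
class Worker:
    """Sketch of a 2PC participant; can_commit models local validation."""

    def __init__(self, can_commit=True):
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # Phase 1: vote yes/no and remember the decision.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"

def two_phase_commit(workers):
    """Coordinator logic: commit only if every worker votes yes in phase 1."""
    votes = [w.prepare() for w in workers]     # commit-request phase
    if all(votes):                             # commit phase (success)
        for w in workers:
            w.commit()
        return "committed"
    for w in workers:                          # commit phase (abort): roll back
        if w.state == "prepared":
            w.abort()
    return "aborted"

print(two_phase_commit([Worker(), Worker()]))                  # committed
print(two_phase_commit([Worker(), Worker(can_commit=False)]))  # aborted
```

The blocking problem discussed next is invisible in this sketch precisely because it assumes the coordinator never crashes between the two phases.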
Two-Phase Commit (2PC) - Problems
- 2PC blocking protocol.
- Deadlock possibility due to coordinator/worker failures, or network issues.
Three-Phase Commit (3PC) - Overview
- Non-blocking distributed commit protocol.
- Allows for node failures, does not block in network partition scenarios.
- Assumes one coordinator, n workers and messages exchanged between them.
Three-Phase Commit (3PC) – Advantages & Disadvantages
- Addresses the weaknesses of 2PC
- Network partition handling.
- Higher network overhead due to more messaging.
Paxos – Overview
- Distributed consensus protocol, suitable for large clusters.
- Ensures consistency despite network failures.
- Three main roles: proposer, acceptor, and learner.
Paxos vs. 2PC / 3PC
- Comparison of Paxos with both 2PC and 3PC in terms of handling network partitions.
CAP Theorem - Overview
- Summarizes the CAP theorem: a distributed system can guarantee at most two of the three properties (Consistency, Availability, Partition tolerance) at a time.
CAP Theorem – Implications I
- Implications of different consistency model, availability, and partition tolerance trade-offs in the CAP theorem, given different system constraints.
CAP Theorem – Implications II
- Implications of dropping consistency and availability for a given system, in relation to partition tolerance.
A Weaker Consistency Model
- Explains that at web scale availability often outweighs strict consistency, but some form of consistency guarantee is still needed in data systems.
Consistency Models - Overview
- Various types of consistency models in a given distributed storage system. Explaining how different models handle reads and writes to achieve different consistency results.
Variations of Eventual Consistency
- Different kinds of consistency models that handle data consistency in a distributed database system.
- Explains various forms of eventual consistency in a given system including Causal, Session and Monotonic consistency.
ACID vs. BASE
- Comparison of ACID (Atomicity, Consistency, Isolation, Durability) and BASE (Basically Available, Soft State, Eventually Consistent) consistency models
- Describes how these models trade off consistency against availability for a given use case in a distributed system.
Data Model Design Principles
- Principles to follow in the design of the data models
KV-Store++
- Wide column store with special key-value support, features, and requirements.
- Aka. tabular data stores
Data Model (1)
- Key components (namespaces, column families, example) within a data model designed to describe the overall structure and design features.
Data Model (2)
- Describes the key-value pair format, and structure format.
Example: Web-Table (1 & 2)
- Implementation examples of using key value pairs in a web storage environment to handle webpages, links, content. Demonstrates how data can be structured in a KV storage system with multiple columns and versions.
Design Principles
- Key elements for modeling data elements
Design: Denormalization (1 & 2)
- Denormalization design strategies for database design and how to structure related data in systems.
Summary
- Summary of previously discussed topics on distributed consistency and data models.
Next Part
- Indication of the next part in the lecture.
Thank you for your attention!
- Thank-you message for attention.
- Questions/In-Moodle/Email/Q&A session details.