Big Data Systems - Key Value Stores I PDF

Big Data Systems Martin Boissier Key Value Stores I Data Engineering Systems Hasso-Plattner-Institut Timeline I Date Tuesday Wednesday 15.10. /16.10 Intro / Organizational Use C...

Big Data Systems Martin Boissier Key Value Stores I Data Engineering Systems Hasso-Plattner-Institut Timeline I Date Tuesday Wednesday 15.10. /16.10 Intro / Organizational Use Case - Search Engines 22.10. / 23.10. Performance Management Intro to GitHub Classroom 29.10. / 30.10. Map Reduce I Map Reduce II 5.11. / 6.11. Map Reduce III Exercise 12.11. / 13.11. Data Centers Cloud 19.11 / 20.11. File Systems Exercise 26.11. / 27.11. Key Value Stores I Key Value Stores II 3.12 / 4.12. Key Value Stores III Exercise 10.12. / 11.12. Stream Processing I Stream Processing II 17.12. / 18.12. ML Systems I Exercise Christmas Break 3 This Lecture 1. KV Overview 2. B-Trees Sources Hans-Arno Jacobsen (TUM) – Distributed Systems Johannes Zschache (University of Leipzig) – NoSQL-Datenbanken Volker Markl (TUB) – Database Technology 4 Where are we? § Data Management Application / Query Language / Analytics / Visualization Application § Operational storage Data Processing Data Management Big Data Systems § Small writes and reads File System Virtualization / Container OS / Scheduling Infrastructure Hardware 5 Serving Requests § User access to inverted index § Input: search term User § Output: relevant URLs Interaction § Requirements § Many search terms Index § Many URLs § Frequent updates § Interactive speed ( 1 trillion (10^12) pages indexed ~ 10.000 queries per second Search must complete in ~200ms 7 Enter Key-Value Stores § Scalable container for key-value pairs § NoSQL semantics (non-relational) § KV-Stores offer simpler semantics (and syntax) in exchange for increased scalability, speed, availability, and flexibility § Small scale: Map / hash table -> index § Main operations § Write/update put(key, value) § Read get(key) § Delete delete(key) 8 This Lecture § Remember, two main types of queries: § OLAP queries: § OLTP queries: § Mostly read access to data § Read & Write access to data § Many rows per operation § Few rows per operation § Low qps querys per second § High qps § This distinction also holds at „web scale“ § Peta-/Exabytes (1015 – 1018) of data § 10.000+ nodes § This lecture: „How to process OLTP-like queries at web scale?“ 9 Web Scale § More information of web scale data stores vs. relational systems § “MonetDB is Web Scale” § https://www.youtube.com/watch?v=b2F-DItXtZs 10 DBMS vs KV-Store DBMS (SQL) Key-Value Store Key Value John Smith {Activity:Name= Swimming} Jane Bloggs {Activity:Cost=57} Mark Anthony {ID=219} Relational data schema No (static) data schema Data types Raw byte access Foreign keys No relations Full SQL support Single-row operations 11 KV-Stores – Requirements § Horizontal scalability § User growth, traffic patterns § Adapt to number of requests & data size § Performance § High speed for single-record read and write operations § Flexibility § Adapt to changing data definitions § Reliability § Uses commodity hardware: failure is the norm § Provide failure recovery § Availability and geo-distribution § Users are worldwide § Guarantee fast access 12 KV-Store Client Interface – Overview § Main operations § Write/update put(key, value) § Read get(key) § Delete delete(key) § Usually no aggregation, no table joins, no transactions! 13 KV-Store Client Interface – Example HBase Configuration conf = HBaseConfiguration.create(); Initialization conf.set("hbase.zookeeper.quorum", "192.168.127.129"); Using ZooKeeper HTable table = new HTable(conf, „MyBaseTable"); Column Family: “Schema” Put put = new Put(Bytes.toBytes("key1")); put.add(Bytes.toBytes("colfam1"), Bytes.toBytes(„value"), Bytes.toBytes(200)); table.put(put); Column: Defined at run-time Get get = new Get(Bytes.toBytes(“key1")); Result result = table.get(get); byte[] val = result.getValue(Bytes.toBytes("colfam1"), Bytes.toBytes(„value")); System.out.println("Value: " + Bytes.toInt(val)); 14 KV-Stores in Practice § Bigtable § Amazon § Apache HBase § Key: customerID § Value: customer profile (history, …) § Apache Cassandra § Redis § Facebook, Twitter § Amazon Dynamo § Key: UserID § Yahoo! PNUTS § Value: user profile (friends, photos) § LevelDB § RocksDB 15 Common Properties of KV-Stores Common Elements Common Features § Memory store & write ahead log § Flexible data model with column (WAL) families § Keep data in memory for fast access § Horizontal partitioning (key ranges) § Keep a commit log as ground truth § Store big chunks of data as raw § Versioning bytes § Store different versions of data § Fast column-oriented single-key § Timestamping access (read/write/update) § Replication § Store multiple copies of data § Failure detection & failure recovery 16 Common Non-Features of KV-Stores § Integrity constraints we do not chekc if data model is valid § Transactional guarantees, i.e., ACID § Powerful query languages § Materialized views 17 Index Simple Index § Prerequisite: Sorted data file Dense index sorted by Primary key § Called sequential file § In DB typically sorted on primary key of a relation § Index file stores (key, pointer) pairs § Key value K is associated with pointer § Pointer points to record in sequential file that contains key value K Sparse index § Index types § Dense Index § One entry in index file for every record in sequential file § Sparse Index § One entry in index file for every block in sequential file B-Trees § So-far: Single level index for access acceleration § In general: B-Trees (here B+-Trees) § As many levels as necessary § Blocks are at least 50% full § No overflow blocks needed 20 Structure § Index blocks organized in a tree § Balanced § Each path from root to leaf has the same length. § Parameter n § Every block holds at least n and up to 2n search keys § Every block holds at least n+1 up to 2n+1 pointers § Just like index block before, but one additional pointer § Choice of n § n as large as possible with respect to block size § 4096 Bytes per block; 4 Bytes per key; 8 Bytes per pointer § 4(2n) + 8(2n+1) ≤ 4096 => n = 170 § Alternative definition § Every block holds at least (𝑛 + 1)/2 − 1 search keys, (𝑛 + 1)/2 pointers § Every block holds at most n search keys, n+1 pointers 21 Definition of B+-Trees § Keys in the leaves are keys from the data § Sorted across all leaves (from left to right) § Root: at least one used pointer. § All pointers point to B-tree block one level below § Leaves: Last pointer points to next leaf (on the right), § From the other 2n pointers at least n are used. K1 RID1 K2 RID2 … Next § Point to data blocks § Inner Nodes: Pointers point to B-tree blocks of level below § At least n+1 are used § If j pointers are used, there are j-1 keys in the block § K1, … , Kj-1 P0 K1 P1 K2 P2 … Free § First pointer points to subtree with key values < K1. § Second pointer points to subtree with key values between K1 and K2. 22 Example Configuration § n=2 § All nodes: At most 4 keys and 5 pointers § Root: At least 1 key and 2 pointers § Inner Nodes: At least 2 key and 3 pointers § Leaves: At least 2 keys and 3 pointers 41 12 28 46 67 first pointer goes to block that is snaller than 46 (not equal) second ponter lerger than 46 and smaller tha 67 1 5 9 12 15 19 28 33 37 41 45 46 53 59 67 71 83 99 23 Alternative Definition § So far: Parameter n § Block contains at least n keys § Block contains at most 2n keys § Block contains at least n + 1 pointers (as before) § Alternative: § Node contains at most n keys, n+1 pointers § Inner node contains at least é(n+1)/2ù pointers § Leaf contains at least ë(n+1)/2û+1 pointers § Always: § Inner node contains always one pointer more than number of search keys § Leaf contains as many pointers as search keys § plus linked list 24 Examples of Leaves § n=2 § 4 keys and 5 pointers 67 71 83 99 § Full leaf To next leaf To block with To block with To block with To block with key 67 key 71 key 83 key 99 67 71 To next leaf § Partially filled leaf To block with To block with 67 Not key 67 key 71 allowed 25 Examples of Inner Nodes § Full inner node 67 71 83 99 To keys To keys K < 67 To keys To keys K ≥ 99 67 ≤ K < 71 To keys 83 ≤ K < 99 71 ≤ K < 83 67 71 § Partially filled inner node To keys Not K < 67 To keys To keys allowed 67 ≤ K < 71 K ≥ 71 26 Example B+-Tree §n=2 § ≥ 2 keys § ≥ 3 pointers 41 12 28 46 67 1 5 9 12 15 19 28 33 37 41 45 46 53 59 67 71 83 99 In the leaves, every key exists only once 27 Applications of B+-Trees § B-trees can assume different index roles: Dense index § Search key is primary key § Dense index § Order of data file is not important § Sparse index Sparse index § Data file must be sorted by primary key § Search key is NOT primary key § Search key is not unique § Duplicate keys in leaves have to be handled 29 B-Tree Search Searching in B-Trees § For now: Search key = primary key § Dense index § Operations for sparse indexes similar logarithmic search § Find K. § If we are at a leaf: § Search K in the leaf. § If we are at an inner node with keys K1, K2, … Kn: § If K < K1 go to first child § If K1 ≤ K < K2 go to second child § … § If Kn ≤ K go to last child 31 Example: Searching in B-Trees K = 60 41 41 ≤ 60 12 28 46 67 46 ≤ 60< 67 1 5 9 12 15 19 28 33 37 41 45 46 53 59 67 71 83 99 59 < 60 L now we have to search that block linearily 32 Example: Searching in B-Trees K = 53 41 41 ≤ 53 12 28 46 67 46 ≤ 53< 67 1 5 9 12 15 19 28 33 37 41 45 46 53 59 67 71 83 99 53 = 53 J 33 Range Queries § Queries with inequality in WHERE-clause § SELECT * FROM R WHERE R.k > 40 § SELECT * FROM R WHERE R.k >= 10 AND R.k b: § Follow pointer to next node. § Similar for open intervals [-∞ , b] respectively [a, ∞]. 34 Example: Searching in B-Trees 10 ≤ K ≤ 28 41 10 < 41 12 28 46 67 10 < 12 1 5 9 12 15 19 28 33 37 41 45 46 53 59 67 71 83 99 12, 15, 19, 28 35 B-Tree Insertion Inserting into a B-Tree Recursive Algorithm: § Search corresponding leaf. § If room, insert key and pointer. § If no room: Overflow § Split leaf in two parts and distribute keys equally § Split requires inserting a new key/pointer pair in parent node § Recursively ascend the tree § Exception: If no space in root § Split root § Create new root (with only one key) 37 Example of B-Tree Insertion K = 60 41 41 ≤ 60 12 28 46 67 46 ≤ 60< 67 1 5 9 12 15 19 28 33 37 41 45 46 53 59 60 67 71 83 99 Easy J 38 Example of B-Tree Insertion K = 61 41 41 ≤ 61 12 28 46 67 46 ≤ 61< 67 1 5 9 12 15 19 28 33 37 41 45 46 53 59 60 67 71 83 99 ? 39 Example of B-Tree Insertion K = 61 41 12 28 46 67 1 5 9 12 15 19 28 33 37 41 45 67 71 83 99 We need a key/pointer 46 53 59 60 61 pair Key: 59 40 Example of B-Tree Insertion K = 61 41 12 28 46 59 67 1 5 9 12 15 19 28 33 37 41 45 67 71 83 99 46 53 59 60 61 41 Insertion Cost § Let h be the height of the B-Tree § In practice often: h = 3 § Search for leaf node: h § If no split needed: Total cost h + 1 § h block reads, 1 block write § If split needed § Worst-case: ascends to root § Even caching useless: nodes need to be written to disk § Total cost: 3 h + 1 § On every level search and overflow handling § + writing new root 42 Deleting from a B-Tree Find corresponding node Delete key If still at least minimal number of keys in node § Nothing else to do If too few keys in node: Merge § If a sibling node (left or right) contains more than minimal number of keys, „steal“ a key. § Maybe need to adjust keys of parents § If not: There are two siblings in tree with minimal and sub-minimal number of keys § => These nodes can be merged. § Keys in parent have to be adjusted § Possibly delete propagates upward in the tree 43 Example of B-Tree Deletion K = 12 41 12 28 46 67 1 5 9 12 15 19 28 33 37 41 45 46 53 59 67 71 83 99 Just remove 12 44 Example of B-Tree Deletion K = 12 41 12 doesnt Need to be changed as a parent, split ist still valid 12 28 46 67 1 5 9 15 19 28 33 37 41 45 46 53 59 67 71 83 99 45 Example of B-Tree Deletion K = 15 41 12 28 46 67 1 5 9 15 19 28 33 37 41 45 46 53 59 67 71 83 99 46 Example of B-Tree Deletion K = 15 41 12 28 46 67 1 5 9 19 28 33 37 41 45 46 53 59 67 71 83 99 Steal from sibling 47 Example of B-Tree Deletion K = 15 41 9 28 46 67 1 5 9 19 28 33 37 41 45 46 53 59 67 71 83 99 Adjust parent 48 Example of B-Tree Deletion K=5 41 9 28 46 67 1 5 9 19 28 33 37 41 45 46 53 59 67 71 83 99 Cannot steal Delete propagates 49 Example of B-Tree Deletion K=5 41 28 46 67 1 9 19 28 33 37 41 45 46 53 59 67 71 83 99 And propagates 50 Example of B-Tree Deletion K=5 28 41 46 67 1 9 19 28 33 37 41 45 46 53 59 67 71 83 99 51 Deletion Cost § Search and local delete: h + 2 § Write leaf node § Write data file § When merging with siblings: h + 6 § Check left and right § Write block and move neighbors § Write parent nodes § Write data block § When merging up to root: 3h - 2 52 Deleting from B-Trees - Variation § Assumption: Normally data sets grow § Consequence: Never delete nodes in B-tree § Underflow nodes in B-Tree will eventually be filled again § We need to keep information of deletes and maintain structure § Improvement: Tombstone in B-tree instead of data file block 53 B-Tree Variations B-Trees for Non-Primary Keys § Meaning of pointers in the inner nodes changes § Why? § Non-primary keys are not unique § Keys with same value can span blocks § Given: Keys K1, … , Kj => Ki is smallest new key value reachable from (i+1)-st pointer. § I.e., there is no key value Ki in left subtree, but at least one occurrence of the key value in the subtree after the (i+1)-st pointer. § Problem: There is not always such a key 55 B-Trees for Non-Primary Keys No new values in second child 28 is not 41 new 12 28 ^ 67 1 5 9 12 15 19 28 28 28 28 41 46 41 53 41 59 67 71 There is no reason to start a search here. □Ki is smallest key value reachable from pointer (i+1) 56 B-Tree Variations: B*-Tree § B*-tree and B#-tree § Overflow during insertion: distribution across all leaves § If not possible: create 3 new leaves from 2 old leaves § In general, create m+1 nodes from m nodes § Better memory utilization m = 1 (B+-Tree) § At least 66% m = 2 (B*-Tree) m=3 57 Next Part § Key Value Stores II § LSM Tree Application / Query Language / Analytics / Application § Distributed Architecture Visualization Data Processing Data Management Big Data Systems File System Virtualization / Container OS / Scheduling Infrastructure Hardware 58 Thank you for your attention! § Questions? § In Moodle § Per email: [email protected] § In Q&A sessions 59

Big Data Systems - Key Value Stores I PDF

Document Details

Tags

Related

Summary

Full Transcript