BDS Exam: Distributed Systems & Matrix Multiplication

Podcast

Play an AI-generated podcast conversation about this lesson

Questions and Answers

When we use the term "system under test" in benchmarking, we are referring specifically to:

Deployment comprised of hardware, software and data (correct)

Deployment of software where users can issue requests

Software, workloads, and metrics

Software

Given the following cluster resources {CPUs: 32, RAM: 128 GB, GPUs: 8, SSD: 1 TB} and job {CPUs: 20, RAM: 30 GB, GPUs: 6, SSD: 200GB}. In Dominant Resource Fair Scheduling, which is the dominant resource of the job?

SSD (correct)

GPU

CPU

RAM

Given the following three equivalent matrix multiplications, which execution order is the fastest?

((A10*5 x B5*10) x (C10*1 x D1*7))

(((A10*5 x B5*10) x C10*1) x D1*7)

(A10*5 x (B5*10 x (C10*1 x D1*7))) (correct)

What are the main three challenges of distributed file systems

<p>Performance, scalability, and consistency</p> Signup and view all the answers

Which of the following statements is true about RDDs upon failures:

<p>The entire RDD is rebuilt using its lineage.</p> Signup and view all the answers

What statement is true in the context of stream processing

<p>Event time refers to the time when events occurred, while processing time refers to the time when events are processed by the stream processing system.</p> Signup and view all the answers

Assume you have a matrix that you represent as a two-dimensional int array of size rowscolumns. You decide to represent it as a large one-dimensional array in a row-major layout. In code you write: Int src = new int[rows*columns]; Assume the matrix is very large and has a quadratic shape. You want to calculate the sum over all values in the matrix, which code snippet is most likely the faster one?

<p>Snippet 1 for (c = 0; c < cols; c++) for (r = 0; r > rows; r++) sum += src[r * cols + c];</p> Signup and view all the answers

Which statement is correct?

<p>In general, a GPU is less efficient but more flexible than an ASIC.</p> Signup and view all the answers

Assume you have two datasets: a dataset containing basic information about 10 products that a company sells and a dataset that consists of a large number of historical transactions representing purchases of the products. Now assume you want to join the data by, for example, a product ID. Which join method is the most suitable?

<p>Broadcast join</p> Signup and view all the answers

Which of the following functions is not a distributive function?

<p>most frequent value</p> Signup and view all the answers

What are the windowed aggregate values and functions after processing the watermark at 22? Note that for count-based windows, "start" and "end" refer to indices instead of timestamps.

<p>Window start end aggregate (a) 0 4 null (a) 4 8 5.6 (a) 8 12 7.7 (a) 12 16 8.9 (a) 16 20 11.8 (a) 20 22 11.7 (b) 0 3 6 (b) 3 6 15 (b) 6 9 30 (b) 9 12 28 (b) 12 15 29 (b) 15 18 22 (b) 18 21 14 (b) 21 22 null</p> Signup and view all the answers

Given the initial B+ Tree in the figure below, Delete "27"

<p>4 12 17 26 33 12 4 8 12 13 17 19 24 26 33 35</p> Signup and view all the answers

Given the initial B+ Tree in the figure below, add "13"

<p>3 5 7 9 1 2 3 4 5 6 7 8 9 10 11 12 13 3 4 5 6 7 8 9 10 11 12 1 2 3 4</p> Signup and view all the answers

If you request the value for the key "eager", what is the expected result (if the value does not exist, return "null")?

<p>1</p> Signup and view all the answers

Can key 3 be in the DB? Explain in one sentence why or why not.

<p>False</p> Signup and view all the answers

Perform a BotEC to estimate the latency of a query over employee that fetches the record with PK=40. Assume that the system's query optimizer decides to use the B+ Tree. Please express the answer in millions of nanoseconds rounded to one decimal place, for example, 2.07 ms = 2.1M ns.

<p>2.1M ns</p> Signup and view all the answers

Perform a BotEC to estimate the latency of a query over employee that fetches the record with PK greater or equal to 64 and less than 96. Assume that the system's query optimizer decides to use the B+ Tree. Please express the answer in millions of nanoseconds rounded to one decimal place, for example, 2.07 ms = 2.1M ns.

<p>0.2M ns</p> Signup and view all the answers

Suppose a buffer manager in the system stores the two most recently accessed data pages in memory. Reading 1KiB of data from memory takes 250 nanoseconds. Please perform a BotEC to estimate the latency of a query that retrieves records with primary keys 90, 170, 57, and 15 from the "employee" table. Assume the data pages containing the data for the leaf nodes 21-40 and 51-63 are cached. Also, assume that the system's query optimizer selects the B+ Tree for this query. Please express the answer in millions of nanoseconds rounded to one decimal place, for example, 2.07 ms = 2.1M ns.

<p>0.7M ns</p> Signup and view all the answers

You're working with a dataset containing information about books. Analyze the following MapReduce code and provide a concise explanation (in one sentence) of what the code is doing. Assume the function CATEGORY(book) returns the genre of a book, and the functions MAX(ITERABLE) and MIN(ITERABLE) calculate the maximum and minimum values of an iterable, respectively.
def map(key, book): if book.language == 'English' and CATEGORY(book) == ‘Science Fiction': emit(book.author, book.num_pages)

def reduce(author, pages): emit (author, (MIN(pages), MAX(pages)))

<p>The code finds the minimum and maximum number of pages for each author who has written English-language science fiction books.</p> Signup and view all the answers

What are the two major drawbacks of the approach used in the previous question? Please answer in a maximum of one sentence per drawback, bullet points are allowed!

<ul> <li>It requires a rehashing step whenever a node is added.</li> <li>It can lead to an uneven distribution of keys across the nodes, especially if the hash function is not well chosen.</li> </ul> Signup and view all the answers

Now assume two more nodes Node 2 and Node 3 join the cluster, while keeping the number of virtual nodes on the ring constant.
Update the figures and tables from above that will have to change after adding nodes.

<p><strong>Key Distribution</strong></p> <table> <thead> <tr> <th>Virtual Node</th> <th>Key</th> </tr> </thead> <tbody> <tr> <td>V0</td> <td>1</td> </tr> <tr> <td>V1</td> <td>5</td> </tr> <tr> <td>V2</td> <td>11</td> </tr> <tr> <td>V3</td> <td>13</td> </tr> <tr> <td>V0</td> <td>17</td> </tr> <tr> <td>V1</td> <td>21</td> </tr> <tr> <td>V2</td> <td>26</td> </tr> <tr> <td>V3</td> <td>30</td> </tr> </tbody> </table> <p><strong>Hash Ring</strong> This would show the nodes V0 to V3 (clockwise) in the hash ring and the keys placed in the right node.</p> <p><strong>Mapping</strong></p> <table> <thead> <tr> <th>Physical Node</th> <th>Virtual Nodes</th> </tr> </thead> <tbody> <tr> <td>0</td> <td>V0, V1</td> </tr> <tr> <td>1</td> <td>V2, V3</td> </tr> <tr> <td>2</td> <td>V0, V1</td> </tr> <tr> <td>3</td> <td>V2, V3</td> </tr> </tbody> </table> Signup and view all the answers

Assume you have a main memory size of M, a tuple size of R, and a block size of B. Explain why the maximum number of tuples that can be sorted with TPMMS can be calculated as follows: (M/R) * ((M/ В) – 1)

<p>(M/R) calculates the number of tuples that can fit in main memory at any given time because M is the size of the main memory and R is the size of each tuple. (M/B) calculates the number of blocks that can fit in main memory. Subtracting 1 from this value accounts for the fact that the main memory needs to hold one block for the output of the sort step before it can be written to the disk.<br /> (M/R) * ((M/ B) – 1) represents the product of these two values, giving the maximum number of tuples that can be sorted in main memory during one phase of the TPMMS algorithm.</p> Signup and view all the answers

List each operation from the operation names set, C ={reduce, groupBy, union, join, map}, as either transformations or actions.

<p>Transformations: <code>map</code>, <code>union</code>, <code>join</code> Actions: <code>reduce</code>, <code>groupBy</code></p> Signup and view all the answers

List each transformation as either wide or narrow.

<p>Wide: <code>union</code>, <code>join</code> Narrow: <code>map</code></p> Signup and view all the answers

Draw the matrix in a modified compressed sparse rows (MCSR) representation.

<p>Values: [2, 8, 6, 8, 4, 5, 1, 6] Row_ptr: [0, 2, 4, 6, 8] Col_ind: [0, 1, 0, 2, 1, 0, 3, 2]</p> Signup and view all the answers

Assume you have a use case where you operate with very large but sparse matrices. Assume dimensions of 105 x 103 (rows x columns) and a filling degree of approx. 1%. Further, assume you currently use 32 bit for indexes, 32 bit for matrix values, and 64 bit for pointers. Currently, you are using the Coordinate Matrix Format (COO) format. You want to make your program memory efficient and seek a more space-efficient way to represent your matrices. A colleague suggests that using the Compressed Sparse Column (CSC) format would improve memory consumption by at least 30%. Perform a back-of-the-envelope calculation to assess your colleague's suggestion.

<p>COO format</p> <ul> <li>Values: 105 x 103 x 32 = 340.8 MB</li> <li>Indexes: 105 x 103 x 32 = 340.8 MB</li> <li>Pointers: 105 x 64 = 6.72 KB CSC format</li> <li>Values: 10^3 x 32 = 32 KB</li> <li>Indexes: 10^5 x 32 = 3.2 MB</li> <li>Pointers: 10^5 x 64 = 6.4 MB The storage requirement for CSC is approximately 3.3 MB + 6.4 MB + 32 KB = 9.73 MB. While the storage requirement for COO is approximately 340.8 MB + 340.8 MB + 6.72 KB = 681.6 MB. Therefore, the CSC format is more efficient and reduces storage requirements by about 98.6% in comparison to COO format. This back-of-the-envelope calculation supports your colleague's claim of a significant memory saving with the CSC format making it at least 30% better than the COO format.</li> </ul> Signup and view all the answers

Give the sequence of instructions that happen in phase 1 of TPMS and the state of the disk after phase 1.

<p>Sequence of instructions: <code>read(B0)</code>, <code>read(B1)</code>, <code>read(B2)</code>, <code>sort_mem()</code>, <code>write(B0)</code>, <code>write(B1)</code>, <code>write(B2)</code>. Disk after Phase 1: B0: 2, 5, 11, 12, 14, 21 B1: 1, 8, 15, 16, 20, 22 B2: 3, 9, 23, 24, 28, 30</p> Signup and view all the answers

Study Notes

Multiple Choice Questions (BDS Exam)

Benchmarking: The term "system under test" refers to the software, workloads, and metrics being evaluated. This includes the software deployment, user requests, and hardware/software/data deployments.
Dominant Resource Fair Scheduling: In the provided cluster resources and job descriptions, RAM is the dominant resource.
Matrix Multiplication: The fastest execution order for matrix multiplication is (((A x B) x C) x D).
Distributed File Systems Challenges: The main challenges are performance, scalability, and consistency. They must also support various file sizes, operating systems, and data types.
RDD Failures: Only the lost data partitions of an RDD need to be recomputed. The entire RDD is not reconstructed; the lineage is used for rebuilding.
Stream Processing: Event time is the moment an event occurs, while processing time is when the stream processing system actually handles the event.

Matrix Summation

Matrix Representation: Representing a matrix as a one-dimensional array (a row-major layout in this scenario) is possible and has implications on computational speed (snippet 2 is likely faster given quadratic matrix shape).

GPU/ASIC/FPGA Efficiency

General Comparison: A GPU is generally less efficient (than a CPU), but more flexible.

Data Analysis

Join Method: A broadcast join is suitable for joining a small dataset (e.g., product information) with a large dataset (e.g., historical transactions)

Stream Window Aggregation Notes

Aggregate Values: The sliding and tumbling window aggregations will provide specific outcomes after a watermark (e.g., 22) is processed based on the timestamps and data values within the windows.

B+ Tree Operations

Deletion: Deleting "27" from the provided B+ Tree involves evaluating multiple scenarios (steal from right/left sibling, merge with sibling) for adjustment to the tree structure.
Insertion: Inserting "13" in the given B+ Tree structure requires potential node splits.
Deletion: Similarly, deletion of "1" from the original tree also necessitates adjustments involving node splits or merges.

B+ Tree (Employee Relation)

Data Page Retrieval: All data manipulation and retrieval within this given context of the B+ Tree are done with memory and are very fast (100 ns for memory access, and sequential reads from disk 20,000 ns for 1 KiB).

BotEC Calculations

Query Latacies (employee table): These calculations, using a Bottleneck Estimate of Costs (BotEC), provide estimates for the query latencies related to queries that use the indexed employee relation data.

LSM-Trees

Key Value Retrieval (eager, eat, earth): Retrieve the associated values for the given keys, and return "null" if a key is not found. The example includes all the steps of processing.

Bloom Filter

Existence Check (Key 3, Key 5): Determine whether key 3 or 5 , exists in a set according to the bit configuration of the filter.

Bloom Filter Trends

False Positive Probability: The probability of a false positive in a Bloom filter typically increases with more items, a smaller filter size, or fewer hash functions.

Tree Structure Selection

Workloads (95% reads, 5% writes): B+ Trees are generally preferred for workloads with high read ratios and frequent or fast lookups, whereas LSM-Trees are more suitable for workloads with high write rates, though a detailed explanation about the tradeoffs would be needed.

Distributed Hash Tables (DHT)

Partitioning: Updates of Node 4's removal from the hash table structure require a recalculation of the hashing function (i.e., using mod(5) when 5 nodes are not available in a DHT structure) and remapping of keys to other nodes resulting in a revised DHT structure accordingly.

MapReduce Example

Processing Books: The code maps English language science fiction books to their page count. The reduce phase then calculates the maximum and minimum page counts for each author.

SQL Query (Sales Data)

Book Sales: This pseudocode translates the SQL query analyzing book sales in Germany during the current year using MapReduce, considering timestamps, the availability of functions like YEAR() and CURRENT_YEAR(), and basic aggregation functions.

TPMMS Algorithm Discussion

Sorting process: The phase 1 instructions for the given TPMMS sorting algorithm involve reading blocks from disk, performing sorting of memory values, and writing sorted values to disk blocks. The sequence of instructions is crucial for getting desired results.

Matrix Considerations

MCSR (Modified Compressed Sparse Rows): The compressed sparse-row representation reduces storage requirements by only storing the non-zero values and their corresponding row and column indices.

Large Sparse Matrix Efficient Storage

Alternative Format (Compressed Sparse Column Format): Using a compressed sparse column (CSC) format offers potentially significant memory benefits for large sparse matrices, potentially reducing memory space usage by 30% or more (compared to Coordinate Matrix Format (COO)).

Spark DAG Analysis

Stage Matching: The given directed acyclic graph (DAG) reflects the staged operations within a Spark task. The operations (e.g., reduce, groupBy, union, join, map) can be explicitly matched onto the respective stages (I, II, III, IV, V) in the Spark DAG in order.

(Transformation/Action) Types

Spark Operations: List each operation from the given set as either a transformation or an action and the type as either narrow or wide depending upon the functionality of the operations in Spark.

Studying That Suits You

Use AI to generate personalized quizzes and flashcards to suit your learning preferences.

Description

Test your knowledge on key concepts related to benchmarking, resource scheduling, and matrix multiplication within distributed systems. This quiz covers critical challenges and operational principles including RDD failures and stream processing. Perfect for BDS students preparing for exams in the field of computer science.