Directory Based Cache Coherence
35 Questions

Questions and Answers

Describe the differences between snoop-based and directory-based coherence protocols.

In snoop-based protocols, all processors monitor (or "snoop") a shared communication medium (typically a bus) to detect changes to cache lines. In directory-based protocols, a directory keeps track of the state of each cache line and which processors have copies of it; the directory can be centralized or distributed across multiple nodes. Snoop-based protocols work well for small systems, but as the number of processors grows, directory-based protocols perform better.

Directory-based coherence protocols use broadcasts.

False

What is the main reason to implement a directory-based coherence protocol over a snoop/broadcast-based protocol?

Scalability

Describe a point-to-point interconnection.

<p>A network that utilizes a set of routers and links, where a link connects only a pair of routers. The key characteristic of a point-to-point interconnect is that no link is shared by all processors or caches. There is therefore no shared medium that can easily be used for broadcasting messages or for ordering the requests made by various processors.</p>

Name the two requirements to maintain cache coherence and define them.

<p>Write Propagation: the mechanism by which updates to a cache line in one processor's cache are propagated to other caches in the system, ensuring that all copies of the cache line remain consistent. Transaction Serialization: the requirement that multiple operations (reads or writes) to a single memory location are seen in the same order by all processors.</p>

Match the terms to their definitions.

<p>Intervention = downgrade ending in Shared state; Invalidation = downgrade ending in Invalid state; Upgrade = moving from a more restrictive to a less restrictive state (S -> M); Downgrade = moving from a less restrictive to a more restrictive state (M -> S)</p>

Match each term to its definition.

<p>Arbitration = determines which processor/cache is allowed to use the bus for a bus transaction; Clean Sharing = if a block is shared amongst multiple caches, the value has to be clean; Dirty Sharing = if a block is shared amongst multiple caches, the value doesn’t necessarily need to be clean; Snoop = when a transaction is posted on the bus, each cache can recognize what type of transaction it is using a snooper.</p>

What two things does transaction serialization require?

<ol> <li>A way to determine a sequence of transactions that is consistently viewed by all processors</li> <li>A way to provide the illusion that each transaction proceeds atomically, i.e., without overlap with other transactions</li> </ol>

Well-tuned applications exhibit read/write data sharing between processors.

<p>False</p>

Picture a 4-processor directory-based system. Block B is stored in P3's cache in the Modified state. P0 wishes to read Block B. List the order of events.

<ol> <li>P0 suffers a cache miss, and queries the directory</li> <li>The directory finds Block B in the table, sees that it's in the Modified state in P3's cache, and forwards the read request to P3.</li> <li>P3 sends P0 the data directly.</li> </ol>
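The three-hop flow above can be sketched as a message trace. This is a hypothetical illustration, not a real API; the message names (Read, Int, Flush) follow the request types used later in this lesson.

```python
# Hypothetical sketch of the three-hop read-miss flow described above.
# Message names (Read, Int, Flush) follow this lesson's request types;
# the function itself is illustrative, not part of any real API.
def read_miss_to_modified(requestor, owner):
    msgs = []
    msgs.append(("Read", f"{requestor}->H"))         # 1. requestor queries the directory (home node H)
    msgs.append(("Int", f"H->{owner}"))              # 2. directory forwards an intervention to the owner
    msgs.append(("Flush", f"{owner}->{requestor}"))  # 3. owner supplies the data directly
    return msgs

trace = read_miss_to_modified("P0", "P3")
print(trace)
```

Each list entry is one network hop, so this trace also shows why the transaction counts as 3 hops in the table later in this lesson.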

Picture a 4-processor directory-based system. Block C is cached in the Shared state in P0, P1, and P2. P0 wishes to write to Block C. List the events that occur.

<ol> <li>P0 sends an upgrade request to the directory indicating it wants to write to the block</li> <li>The directory finds that P1 and P2 are holding the block in Shared state and sends them an invalidate message.</li> <li>P1 and P2 invalidate their blocks and send an ACK to P0</li> </ol>
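The upgrade flow above can be sketched the same way (illustrative names, not a real API): the requestor contacts the home node, the directory invalidates each sharer, and the sharers acknowledge the requestor.

```python
# Hypothetical sketch of the upgrade flow described above: P0 writes
# while P1 and P2 hold the block in the Shared state. Illustrative only.
def write_to_shared(requestor, sharers):
    msgs = [("Upgr", f"{requestor}->H")]              # 1. upgrade request to the directory
    for s in sharers:
        msgs.append(("Inv", f"H->{s}"))               # 2. directory invalidates each sharer
    for s in sharers:
        msgs.append(("InvAck", f"{s}->{requestor}"))  # 3. sharers acknowledge the requestor
    return msgs

trace = write_to_shared("P0", ["P1", "P2"])
print(trace)
```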

How many and what states does the directory hold if using the MESI protocol?

<p>The directory keeps 3 states: exclusive/modified (EM), shared (S), and invalid (I), instead of the 4 used by the caches in MESI. This is because the directory cannot observe a cache's silent transition from Exclusive to Modified.</p>

What formats can the directory keep cache states in? (Select all that apply)

<p>Full bit vector, Coarse bit vector, Limited pointer, Sparse directory</p>

Calculate the storage overhead of using a full bit vector directory format in a directory-based cache. Assume: 1024 caches, 64B block size, group size of 4

<p>200% (1024 presence bits per block; a 64 B block is 512 bits, and 1024/512 = 200%)</p>

Calculate the storage overhead of using a coarse bit vector directory format in a directory-based cache. Assume: 1024 caches, 64B block size, group size of 4

<p>50% (1024/4 = 256 bits per block; 256/512 = 50%)</p>

Calculate the storage overhead of using a limited pointer directory format in a directory-based cache. Assume: 1024 caches, 64B block size, group size of 4, we keep 8 pointers

<p>15.6% (8 × log2(1024) = 80 bits per block; 80/512 ≈ 15.6%)</p>
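The three overhead figures above can be checked with a short calculation, a sketch using the stated assumptions (1024 caches, 64 B blocks, groups of 4, 8 pointers):

```python
import math

# Directory storage overhead per memory block, under the assumptions
# stated in the questions above: 1024 caches, 64 B blocks, group size 4,
# and 8 pointers for the limited-pointer format.
NUM_CACHES = 1024
BLOCK_BITS = 64 * 8                  # a 64 B block holds 512 bits
GROUP_SIZE = 4
NUM_POINTERS = 8

full_bits = NUM_CACHES                                    # one presence bit per cache
coarse_bits = NUM_CACHES // GROUP_SIZE                    # one bit per group of caches
limited_bits = NUM_POINTERS * int(math.log2(NUM_CACHES))  # log2(N) bits per pointer

for name, bits in [("full bit vector", full_bits),
                   ("coarse bit vector", coarse_bits),
                   ("limited pointer", limited_bits)]:
    print(f"{name}: {bits} bits -> {100 * bits / BLOCK_BITS:.1f}% overhead")
```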

Explain how a full-bit vector format works for a directory-based cache coherence.

<p>A full-bit vector format keeps, for each memory block, a bit vector with one bit per processor (or node), indicating whether that processor's cache holds a copy of the block. When a cache acquires or gives up a copy, the corresponding bit is set or cleared, so the directory always knows exactly which caches to send interventions or invalidations to.</p>
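A minimal sketch of such a directory entry follows; the class and method names are hypothetical, and the directory state values (EM/S/I) are those discussed earlier in this lesson.

```python
# Hypothetical full-bit-vector directory entry: one presence bit per
# processor, plus the directory state (EM/S/I as discussed above).
class DirectoryEntry:
    def __init__(self, num_procs):
        self.state = "I"                 # "EM", "S", or "I"
        self.sharers = [0] * num_procs   # bit i set => processor i holds a copy

    def add_sharer(self, proc_id):
        self.sharers[proc_id] = 1

    def remove_sharer(self, proc_id):
        self.sharers[proc_id] = 0

    def sharer_ids(self):
        # The caches that would receive invalidations/interventions.
        return [i for i, bit in enumerate(self.sharers) if bit]

entry = DirectoryEntry(4)
entry.state = "S"
entry.add_sharer(0)
entry.add_sharer(2)
print(entry.state, entry.sharer_ids())
```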

Explain how a coarse bit vector directory format works for a directory-based cache.

<p>Coarse bit vector is like the full bit vector format, except that it groups the processors and keeps one bit per group rather than per processor. So if you have 16 processors in 4 groups, you'd have 4 bits.</p>
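The 16-processor example above can be sketched as follows (illustrative code, not a real API): each processor maps to one group bit, so a set bit only says that *some* processor in that group may hold the block.

```python
# Sketch of the coarse-vector example above: 16 processors in 4 groups
# of 4 gives 4 presence bits; a sharer sets the bit for its whole group.
NUM_PROCS = 16
GROUP_SIZE = 4
group_bits = [0] * (NUM_PROCS // GROUP_SIZE)

def mark_sharer(proc_id):
    group_bits[proc_id // GROUP_SIZE] = 1

mark_sharer(5)   # processor 5 is in group 1
mark_sharer(6)   # also group 1: same bit, no extra storage
print(group_bits)
```

The trade-off is visible here: storage shrinks by the group size, but an invalidation must be sent to every processor in a marked group, even those that never cached the block.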

Explain how the limited pointer format works for directory-based caches. Give an example of how this would work for a system with 1024 processors and n = 4 pointers.

<p>The directory keeps a limited number of pointers for each cache line. For example, if the system has 1024 processors and each directory entry keeps 4 pointers, each pointer needs log2(1024) = 10 bits, so we would need 4 × 10 = 40 bits.</p>

Explain what the sparse directory format is. What is one limitation of this format?

<p>Instead of keeping a directory entry for each block in memory, only keep a directory entry for those blocks that are cached. If a block is brought into a cache that has not been cached before, then a directory entry is created for it.</p> <p>One limitation of this format is that if a block is silently evicted (does not alert the directory that it was evicted because it was a clean block), then the directory may hold an entry for a block that is not cached.</p>

Explain the difference between a centralized and a distributed configuration of the directory in a directory-based cache system.

<p>In a centralized configuration, the directory is kept in one location. In a distributed configuration, the directory is split into parts across nodes, with each part responsible for a range of the addresses it keeps track of.</p>

Give two examples of directory formats other than the four discussed in class.

<p>Hybrid: limited pointer, converting to a full/coarse vector when the sharing limit is exceeded. Linked list: one pointer in the directory; each cache block also has a pointer to the next sharer.</p>

List all supported request types (There are 10):

<p>Read, ReadX, Upgrade, ReplyD, Reply, Inv, Int, Flush, InvAck, Ack</p>

Match each request type to its definition.

<p>ReplyD = reply from the home to the requestor containing the data value of a memory block; Reply = reply from the home to the requestor not containing the data value of a memory block; InvAck = acknowledgement of the receipt of an invalidation request; Ack = acknowledgement of the receipt of non-invalidation messages</p>

Match each group of messages to the best match (home node is where the directory information resides)

<p>Read, ReadX, Upgr = requestor to home node; ReplyD, Reply = the directory's response to the requestor node's request; Inv, Int = home node to all sharers (or the owner of the block); Flush, InvAck, Ack = responses to the Inv and Int messages, used to write back/flush a dirty cache block.</p>

Draw the cache coherence state machine for the directory-based MESI protocol. Enter 'done' when finished. The solution is Figure 10.3 in the textbook.

<p>done</p>

Fill in the following table for a 3-processor directory-based MESI cache coherence protocol. Requests: (Init, R1, W1, R3, W3, R1, R3, R2). The answer should be in the following format: (Request (from above row), P1 state, P2 state, P3 state, Directory <state, bit vector>, All Messages [<msg, src->dest>], # Hops). For example, if P2 requests a block Read, and all the states will be Shared, then the following format would suffice: R2, S, S, S, <S, 111>, [<Read, P2->H>, <ReplyD, H->P2>], 2. Put a dash ("-") if a cache block hasn't been brought in yet, or if no messages are sent.

Init, -, -, -, <U, 000>, [-], -
R1, E, -, -, <EM, 100>, [<Read, P1->H>, <ReplyD, H->P1>], 2
W1, M, -, -, <EM, 100>, [-], 0
R3, S, -, S, <S, 101>, [<Read, P3->H>, <Int, H->P1>, <Flush, P1->H>, <Flush, P1->P3>], 3
W3, I, -, M, <EM, 001>, [<Upgr, P3->H>, <Reply, H->P3>, <Inv, H->P1>, <InvAck, P1->P3>], 3
R1, S, -, S, <S, 101>, [<Read, P1->H>, <Int, H->P3>, <Flush, P3->P1>, <Flush, P3->H>], 3
R3, S, -, S, <S, 101>, [-], 0
R2, S, S, S, <S, 111>, [<Read, P2->H>, <ReplyD, H->P2>], 2

How is cache coherence maintained in a directory based coherence protocol? Explain how write propagation and transaction serialization are satisfied.

<p>Write Propagation is ensured by sending all requests to the home node H, which then sends invalidations to all sharers. On a miss, H provides the most recently written data, either by retrieving it from memory or by sending an intervention to the cache that owns the dirty block.</p> <p>Transaction Serialization holds because the home node performs no more than one operation at a time for a given block (in a home-centric system). Home-centric systems rely on ACKs to know when operations have completed.</p>

What are 3 valid cases of how a directory state can contain out-of-date information (e.g., as a result of a cache line being silently evicted)?

<p>1. A shared (clean) block is silently evicted. 2. An exclusive (clean) block is silently evicted. 3. A dirty block is written back while the directory still believes a cache holds it in the Modified state (a write-back race).</p>

Explain how the directory based coherence protocol handles containing out-of-date information for the case where a shared block is silently evicted.

<p>Shared blocks can be silently evicted; the directory doesn’t need to know about it. If a cache silently replaces a block and the directory later sends it an invalidation (because another cache wants to write to the block), the cache simply sends an acknowledgement that the block is invalidated.</p>

Explain how the directory based coherence protocol handles containing out-of-date information for the case where a shared block is silently evicted, and then tries to read/write to block.

<p>It suffers a miss and asks the directory, which treats the request as normal. The worst case is extraneous invalidate messages, since the directory cannot know whether silent evictions have occurred.</p>

Explain how the directory based coherence protocol handles containing out-of-date information for the case where an Exclusive block is silently evicted.

<p>If the directory gets a Read/ReadX request, the easiest thing to do is to treat it normally and send the requested data. The problem is that the directory doesn’t know whether a write-back has occurred or is in flight. The solution is to require the cache to track that it has performed a write-back, and not allow it to issue any Read/ReadX for that block until an ACK for the write-back is received.</p>

What is an Overlapping request?

<p>When the end time of the current request's processing is greater than the start time of the next request.</p>

Explain the difference between a home-centric and a requestor-assisted approach to handling overlapped requests.

<p>In the home-centric approach, the home node determines when the processing of the request is complete, based on notifications from all involved nodes. In contrast, the requestor-assisted approach requires the requesting node to maintain an outstanding transaction buffer, tracking the request until it receives a reply from the home node.</p>

Draw a diagram showing the request processing for the following scenarios, for both home-centric and requestor-assisted approaches: Read to clean block; Read to exclusive/modified block; ReadX to uncached block; ReadX to shared block; ReadX to exclusive/modified block (no WB race); ReadX to exclusive/modified block (WB race). (Check answers in Figures 10.6 and 10.7 of the textbook.) Enter 'done' when done.

<p>done</p>
