Document Details

Uploaded by MultiPurposeZinnia

Tags

computer memory, computer architecture, memory systems, computer science

Summary

This document provides an overview of computer memory systems, covering key characteristics, access methods, and cache organization. It helps readers understand the different types of memory and their performance parameters.

Full Transcript

Memory systems can be classified according to their key characteristics.

- Location refers to whether memory is internal or external to the computer. Examples of internal memory include processor registers, cache memory, and main memory. External memory includes optical disks and tapes.
- Capacity refers to the number of bytes or words the memory can store.
- Unit of transfer: for internal memory, the unit of transfer is the number of bits read out of or written into memory at a time. For external memory, data is transferred in larger units called blocks.
- Access method refers to how data is accessed, and there are 4 access methods:
  1. Sequential access: memory is organized into units of data called records. Access must be made in a specific linear sequence, and access time is variable. An example is tape.
  2. Direct access: involves a shared R/W mechanism where individual blocks or records have a unique address based on their physical location, and access time is variable. An example is disk.
  3. Random access: each addressable location in memory has a unique, physically wired-in addressing mechanism. The time to access a given location is independent of the sequence of prior accesses. Main memory and some cache systems are random access. An example is RAM.
  4. Associative access: a word is retrieved based on a portion of its contents rather than its address. Each location has its own addressing mechanism, and retrieval time is constant. An example is cache memory, which may employ associative access.

When designing memory, the designer must balance 3 things: capacity, speed, and cost. As you go down the memory hierarchy:

- The cost per bit decreases
- Capacity increases
- Access time increases
- The frequency of access of the memory by the processor decreases

![](media/image2.png)

For a user, the most important characteristics of memory are performance and capacity. Three performance parameters are used:

- Access time: for random-access memory, it is the time taken to perform a R/W operation. For non-random-access memory, it is the time taken to position the R/W mechanism at the desired location.
- Memory cycle time: this applies to random-access memory and concerns the system bus, not the processor. It is the access time plus any extra time required before a second access can begin.
- Transfer rate: the rate at which data can be transferred into or out of a memory unit.

Physical characteristics of data storage:

- Volatility: in volatile memory, information is lost when electrical power is switched off or decays naturally over time. In non-volatile memory, information remains without deterioration unless deliberately changed, and no information is lost when electrical power is switched off.
- Erasability: non-erasable memory, such as ROM, cannot be altered.

Cache memory contains a copy of portions of main memory and is faster than main memory. It sits between main memory and the CPU.

Cache read operation: when the processor tries to read a word, the cache is checked first, and if it contains the word, it is delivered to the processor. If the required word is not in the cache, the block of main memory that contains it is read into the cache and then sent to the processor. Below is a simple flowchart which explains the cache read operation.
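In addition to the flowchart, here is a minimal Python sketch of that read flow. It is only an illustration of the check-cache-then-fetch-block behaviour described above; the names (SimpleCache, read_word, main_memory) and the block size are assumptions, not part of the notes.

```python
# Minimal sketch of the cache read operation described above.
# Names and sizes are illustrative assumptions only.

BLOCK_SIZE = 4  # words per block (assumed for illustration)

main_memory = {addr: f"word@{addr}" for addr in range(64)}  # toy main memory

class SimpleCache:
    def __init__(self):
        self.blocks = {}  # block number -> list of words currently cached

    def read_word(self, address):
        block_no = address // BLOCK_SIZE
        if block_no in self.blocks:
            # Cache hit: deliver the word to the processor directly.
            return self.blocks[block_no][address % BLOCK_SIZE]
        # Cache miss: read the whole block containing the word from main
        # memory into the cache, then deliver the word.
        start = block_no * BLOCK_SIZE
        self.blocks[block_no] = [main_memory[a] for a in range(start, start + BLOCK_SIZE)]
        return self.blocks[block_no][address % BLOCK_SIZE]

cache = SimpleCache()
print(cache.read_word(10))  # miss: block 2 is fetched, then the word is returned
print(cache.read_word(11))  # hit: the same block is already in the cache
```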
Elements of cache design: ![](media/image4.png)

Cache address: virtual memory allows programs to address memory from a logical point of view, without regard to the amount of main memory physically available. A logical cache (virtual cache) stores data using virtual addresses; a physical cache stores data using main memory physical addresses.

Cache size: the desirable size is for the cache to be small enough that the overall average cost per bit is close to that of main memory alone, and large enough that the overall average access time is close to that of the cache alone. The larger the cache, the slower and more expensive it is.

Mapping function: there are fewer cache lines than main memory blocks, so an algorithm is needed to map main memory blocks into cache lines. There are 3 techniques that can be used:

- Direct: each block of main memory maps to only one cache line, so if a block is in the cache, it must be in one specific place. Its advantage is that it is inexpensive and simple to implement. Its disadvantage is that there is a fixed cache location for any given block, so if a program repeatedly references blocks that map to the same line, those blocks are continually swapped in and out of the cache.
- Associative: a main memory block can load into any line of the cache, and the memory address is interpreted as a tag and a word. The tag identifies the block of memory, and every line's tag is examined for a match. Its advantage is flexibility as to which block to replace when a new block is read into the cache. Its disadvantage is its complex circuitry. ![](media/image6.png)
- Set associative: the cache is divided into a number of sets, each containing a number of lines. A given block maps to any line in a given set.

Replacement algorithms: for direct mapping, there is only one possible line for any particular block, so no choice is possible. For the associative and set-associative techniques, a hardware-implemented algorithm is needed. These include:

- LRU (least recently used): replace the block in the set that has been in the cache longest with no reference to it.
- FIFO (first in first out): replace the block in the set that has been in the cache longest.
- LFU (least frequently used): replace the block in the set that has experienced the fewest references.
- Random: pick a line at random from among the candidate lines.

The write policy focuses on maintaining data consistency between the cache and main memory: a modified cache block must not be overwritten unless the corresponding data in main memory has been updated. Multiple CPUs may have individual caches, and I/O operations may bypass the cache and address main memory directly. The methods that can be used are listed below, followed by a short sketch:

- Write through: all data is written to the cache and is immediately written to main memory as well. Other CPUs monitor main memory traffic to keep their local caches synchronized with main memory. Its disadvantages are high traffic between the CPU and main memory, since every write operation is sent to both the cache and main memory, and slower write operations.
- Write back: updates are made to the cache only and not immediately to main memory. An update bit for the cache slot is set when an update occurs in the cache. If a block is to be replaced, it is written back to memory only if its update bit is set. I/O operations need to access main memory through the cache.
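The following Python sketch contrasts the two write policies: write-through pushes every write to main memory immediately, while write-back only sets an update (dirty) bit and writes the block back when it is replaced. All names are hypothetical, and a direct-mapped cache is assumed purely to keep the example short.

```python
# Hypothetical sketch contrasting write-through and write-back.
# Names, sizes, and the direct-mapped organization are assumptions.

NUM_LINES = 4
BLOCK_SIZE = 4  # words per block

class WritePolicyCache:
    def __init__(self, memory, policy="write-back"):
        self.memory = memory              # backing "main memory" (list of words)
        self.policy = policy
        self.lines = [None] * NUM_LINES   # each line: {"tag", "data", "dirty"}

    def _locate(self, address):
        block_no = address // BLOCK_SIZE
        return block_no % NUM_LINES, block_no // NUM_LINES, address % BLOCK_SIZE

    def _write_block_to_memory(self, line, line_no):
        block_no = line["tag"] * NUM_LINES + line_no
        start = block_no * BLOCK_SIZE
        self.memory[start:start + BLOCK_SIZE] = line["data"]

    def _fill(self, line_no, tag, block_no):
        # On a miss: write back a dirty victim first, then load the new block.
        victim = self.lines[line_no]
        if victim and victim["dirty"]:
            self._write_block_to_memory(victim, line_no)
        start = block_no * BLOCK_SIZE
        self.lines[line_no] = {"tag": tag,
                               "data": self.memory[start:start + BLOCK_SIZE],
                               "dirty": False}

    def write_word(self, address, value):
        line_no, tag, offset = self._locate(address)
        line = self.lines[line_no]
        if line is None or line["tag"] != tag:
            self._fill(line_no, tag, address // BLOCK_SIZE)
            line = self.lines[line_no]
        line["data"][offset] = value
        if self.policy == "write-through":
            self.memory[address] = value   # update main memory immediately
        else:
            line["dirty"] = True           # defer: write back only on replacement

memory = list(range(64))
cache = WritePolicyCache(memory, policy="write-back")
cache.write_word(5, 999)
print(memory[5])   # still 5: main memory is not yet updated under write-back
```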
Cache coherency approaches are used in a system where multiple processors or devices, each having its own cache, share the same main memory. The goal is to ensure consistency between the caches and main memory. The approaches are:

- Bus watching with write-through: this method is used when multiple devices or processors write to shared memory. Each cache controller monitors the address lines on the bus to detect write operations to the shared memory by other devices. This ensures that if one device writes to memory, other devices can update their caches accordingly.
- Hardware transparency: this involves adding extra hardware to automatically keep caches coherent. The hardware ensures that all updates made to main memory through one cache are reflected in the other caches.
- Non-cacheable memory: certain portions of memory are marked as non-cacheable, meaning these areas cannot be cached by any processor. This is useful for shared memory regions where coherency would be difficult to maintain or where real-time access is needed without cache delays.

Line size: when retrieving data from main memory, the cache retrieves adjacent words as well, which improves performance by exploiting the principle of locality. Increasing the block size increases the hit ratio at first, but as the block gets bigger, the hit ratio begins to decrease because the likelihood of needing the newly fetched data decreases. Larger blocks also reduce the number of blocks that fit in the cache, so data may be overwritten shortly after being fetched.

Multilevel caches: they improve the speed and efficiency of data access between the CPU and memory. High logic density has enabled caches to be placed on the same chip as the processor, which allows faster access and also frees up the bus for other data transfers. The L1 cache is on the chip; the L2 cache used to be off the chip in static RAM, but is now also on the chip. The L2 cache is much faster than main memory (but slower than L1) and often uses a separate data path to speed up access. If present, the L3 cache can be either on chip or off chip; it is slower and bigger than L1 and L2 but still faster than directly accessing main memory. These levels are used to reduce latency and avoid slow access times.

Unified cache: a single cache holds both instructions and data. Its advantages are a higher hit rate, because it automatically balances instruction and data fetches, and a simpler design. Split cache: there are separate caches for instructions and data. Its advantage is that it eliminates cache contention between the instruction fetch unit and the execution unit.

Below are some formulas needed for understanding, together with questions and answers related to them.

![](media/image8.png)

![](media/image10.png)
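The formulas themselves are only in the images above. For reference, the expression typically used with this material is the standard average access time for a two-level (cache plus main memory) system; it is an assumption that this is the formula the images contain.

```latex
% Average access time for a two-level (cache + main memory) system.
% H   = hit ratio (fraction of accesses found in the cache)
% T_1 = cache access time, T_2 = main memory access time
T_s = H \times T_1 + (1 - H) \times (T_1 + T_2)
```

As a purely illustrative check with hypothetical values H = 0.95, T_1 = 0.1 µs and T_2 = 1 µs: T_s = 0.95 × 0.1 + 0.05 × (0.1 + 1) = 0.15 µs.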
![](media/image12.png)

This is a block diagram of the organization of the Pentium 4. The processor contains 4 main components:

1. Fetch/decode unit:
   - Fetches instructions from the L2 cache
   - Decodes them into micro-ops
   - Stores the micro-ops in the L1 cache
2. Out-of-order execution logic:
   - Schedules execution of micro-ops based on data dependencies and resource availability
   - Also schedules execution of micro-ops that may be required in the future
3. Execution units:
   - Execute micro-ops
   - Fetch data from the L1 cache
   - Temporarily store results in registers
4. Memory subsystem: contains the L2 and L3 caches, along with the system bus

ARM cache organization: the small FIFO write buffer

- Enhances performance when the processor writes data to memory
- Is located between the cache and main memory
- Is small compared to the cache
- Operation: data is put into the write buffer at processor speed, and the processor continues execution while the data in the buffer is written to main memory in parallel. If the buffer is full, the processor stalls until space becomes available.
- Limitation: data stored in the write buffer is not available until it has been written to memory.

Below is a diagram illustrating the ARM cache and write buffer organization.
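As a rough illustration of the write buffer behaviour just described, here is a small Python sketch; the buffer size, the names (processor_write, drain_one), and the single-threaded stand-in for the parallel drain are all assumptions made for the example.

```python
# Hypothetical sketch of a small FIFO write buffer sitting between the
# processor and main memory. Capacity and names are assumptions.
from collections import deque

BUFFER_CAPACITY = 4
write_buffer = deque()   # FIFO buffer between processor and main memory
main_memory = {}         # toy main memory: address -> value

def drain_one():
    """Memory side: write the oldest buffered entry to main memory."""
    if write_buffer:
        address, value = write_buffer.popleft()
        main_memory[address] = value

def processor_write(address, value):
    """Processor side: enqueue the write and keep executing.
    If the buffer is full, the processor stalls (modelled here by
    draining) until space is available."""
    while len(write_buffer) >= BUFFER_CAPACITY:
        drain_one()
    write_buffer.append((address, value))

for addr in range(6):
    processor_write(addr, addr * 10)

# Some writes are still only in the buffer, not yet visible in main memory,
# which mirrors the limitation noted above.
print(len(write_buffer), main_memory)
```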
