MIDTERM REVIEWER PDF
Document Details
City College of Angeles - Institute of Computer Studies and Library Information Science
Summary
This document is a reviewer for a midterm exam on Computer Architecture and Organization, focusing on providing an overview of computer organization and architecture. It covers topics like the definition of computer architecture and organization, computer performance, response time, throughput, CPU execution time, and more. The document is from CITY COLLEGE OF ANGELES - ICSLIS.
Full Transcript
LESSON # 1 Overview of Computer Architecture and Organization

Outline:
- Definition of Computer Architecture and Organization
- Structure and Function
- Four main functions of a computer
- Structural components of a processor
- Computer Performance
  - Response time
  - Throughput
  - CPU execution time

Objectives:
1. To provide an overview of computer organization and architecture.
2. To differentiate computer structure and computer function.
3. To understand the four main functions of a computer.
4. To define the main structural components of a computer.
5. To list and briefly define the main structural components of a processor.

INTRODUCTION

Computer architecture refers to those attributes of a system visible to a programmer or, put another way, those attributes that have a direct impact on the logical execution of a program. Computer organization refers to the operational units and their interconnections that realize the architectural specification.

Examples of architecture attributes include the instruction set, the number of bits used to represent various data types (e.g., numbers and characters), I/O mechanisms, and techniques for addressing memory. Examples of organization attributes include those hardware details transparent to the programmer, such as control signals, interfaces between the computer and peripherals, and the memory technology used.

Difference of Computer Architecture and Computer Organization

As an example, it is an architectural design issue whether a computer will have a multiply instruction. It is an organizational issue whether that instruction will be implemented by a special multiply unit or by a mechanism that makes repeated use of the add unit of the system. The organization decision may be based on the anticipated frequency of use of the multiply instruction, the relative speed of the two approaches, and the cost and physical size of a special multiply unit.

Historically, and still today, the distinction between architecture and organization has been an important one. Many computer manufacturers offer a family of computer models, all with the same architecture but with differences in organization. Consequently, the different models in the family have different price and performance characteristics. Furthermore, an architecture may survive many years, but its organization changes with changing technology.

Structure and Function

A computer is a complex system; contemporary computers contain millions of elementary electronic components. How, then, can one clearly describe them? The key is to recognize the hierarchical nature of most complex systems. A hierarchical system is a set of interrelated subsystems, each of the latter, in turn, hierarchical in structure until we reach some lowest level of elementary subsystems.
The hierarchical nature of complex systems is essential to both their design and their description. The designer need only deal with a particular level of the system at a time. At each level, the system consists of a set of components and their interrelationships. The behavior at each level depends only on a simplified, abstracted characterization of the system at the next lower level. At each level, the designer is concerned with structure and function:
- Structure: The way in which the components are interrelated.
- Function: The operation of each individual component as part of the structure.

Function

In general terms, there are four main functions of a computer:
- Data processing
- Data storage
- Data movement
- Control

The functions performed by a computer are:
- Accepting information to be processed as input.
- Storing a list of instructions to process the information.
- Processing the information according to the list of instructions.
- Providing the results of the processing as output.

Figure 1.1 A functional view of the computer

The computer, of course, must be able to process data. The data may take a wide variety of forms, and the range of processing requirements is broad. However, we shall see that there are only a few fundamental methods or types of data processing.

It is also essential that a computer store data. Even if the computer is processing data on the fly (i.e., data come in and get processed, and the results go out immediately), the computer must temporarily store at least those pieces of data that are being worked on at any given moment. Thus, there is at least a short-term data storage function. Equally important, the computer performs a long-term data storage function: files of data are stored on the computer for subsequent retrieval and update.

The computer must be able to move data between itself and the outside world. The computer's operating environment consists of devices that serve as either sources or destinations of data. When data are received from or delivered to a device that is directly connected to the computer, the process is known as input-output (I/O), and the device is referred to as a peripheral. When data are moved over longer distances, to or from a remote device, the process is known as data communications.

Finally, there must be control of these three functions. Ultimately, this control is exercised by the individual(s) who provides the computer with instructions. Within the computer, a control unit manages the computer's resources and orchestrates the performance of its functional parts in response to those instructions.

At this general level of discussion, the number of possible operations that can be performed is few. Figure 1.2 depicts the four possible types of operations. The computer can function as a data movement device (Figure 1.2a), simply transferring data from one peripheral or communication line to another. It can also function as a data storage device (Figure 1.2b), with data transferred from the external environment to computer storage (read) and vice versa (write). Finally, it can process data, either on data in storage (Figure 1.2c) or on data in transit between storage and the external environment (Figure 1.2d).

Figure 1.2a Operation: Data movement
Figure 1.2b Operation: Storage
Figure 1.2c Operation: Processing from/to storage
Figure 1.2d Operation: Processing from storage to I/O

The preceding discussion may seem absurdly generalized.
It is certainly possible, even at a top level of computer structure, to differentiate a variety of functions.

STRUCTURE

Figure 1.3 is the simplest possible depiction of a computer. The computer interacts in some fashion with its external environment. In general, all of its linkages to the external environment can be classified as peripheral devices or communication lines.

Figure 1.3 Computer

A computer system consists of hardware and software:
❖ Hardware consists of all the electronic components and electromechanical devices, whereas computer software consists of the instructions and data that the computer manipulates to perform various data-processing tasks.

The hardware consists of three main parts (sometimes counted as five):
❖ Central Processing Unit (CPU): contains an arithmetic and logic unit for manipulating data, several registers for storing data, and control circuits for fetching and executing instructions.
❖ Memory: contains storage for instructions and data; it is called random-access memory (RAM) because the CPU can access any location in memory at random and retrieve the binary information within a fixed interval of time.
❖ Input and output units: contain electronic circuits for communicating and controlling the transfer of information between the computer and the outside world.

Figure 1.4a Structure of the CPU
Figure 1.4b Control Unit

Arithmetic and Logic Unit
The ALU is responsible for performing arithmetic and logical operations, such as adding and multiplying. Suppose two numbers located in memory are to be added: they are brought into the processor, the actual operation is carried out by the ALU, and the sum may be stored in memory or retained in the processor for immediate use.

The operands that are brought into the processor for an arithmetic or logical operation are stored in high-speed storage elements called registers. Access time to a register is somewhat faster than access time to the fastest cache unit.

Control Unit
The memory, the arithmetic and logic unit, and the input and output units store and process information and perform input and output operations. The operation of these units must be coordinated in some way; this is the task of the control unit. It is the nerve center that sends control signals to the other units and senses their states.

The operation of the computer can be summarized as follows:
1. The computer accepts information in the form of programs and data through an input unit and stores it in the memory.
2. Information stored in the memory is fetched, under program control, into an arithmetic and logic unit, where it is processed.
3. Processed information leaves the computer through an output unit.
4. All activities inside the machine are directed by the control unit.

Instructions specify commands to:
- Transfer information within a computer (e.g., from memory to ALU).
- Transfer information between the computer and I/O devices (e.g., from keyboard to computer, or computer to printer).
- Perform arithmetic and logic operations (e.g., add two numbers, perform a logical AND).

Computer performance is the amount of work accomplished by a computer system. The word performance in computer performance means "How well is the computer doing the work it is supposed to do?" It basically depends on the response time, throughput, and execution time of a computer system.

Response time is the time from the start to the completion of a task. It includes:
- Operating system overhead.
- Waiting for I/O and other processes.
- Accessing disk and memory.
- Time spent executing on the CPU, or execution time.

Throughput is the total amount of work done in a given time.

CPU execution time is the total time a CPU spends computing on a given task. It excludes time spent on I/O or running other programs. It is also referred to simply as CPU time.

Performance is determined by execution time, as performance is inversely proportional to execution time:

Performance = 1 / Execution time

and

(Performance of A / Performance of B) = (Execution Time of B / Execution Time of A)

If processor A is faster than processor B, the execution time of A is less than the execution time of B; therefore, the performance of A is greater than the performance of B.

Example:
Machine A runs a program in 100 seconds; Machine B runs the same program in 125 seconds.

(Performance of A / Performance of B) = (Execution Time of B / Execution Time of A) = 125 / 100 = 1.25

That means Machine A is 1.25 times faster than Machine B.
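This ratio is easy to check in code. A minimal sketch in Python, using the execution times from the example above:

    def relative_performance(exec_time_a, exec_time_b):
        """Return how many times faster machine A is than machine B."""
        # Performance is the reciprocal of execution time, so
        # perf_a / perf_b == exec_time_b / exec_time_a.
        return exec_time_b / exec_time_a

    print(relative_performance(100, 125))  # 1.25, i.e., A is 1.25 times faster than B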
The time to execute a given program can be computed as:

Execution time = CPU clock cycles x clock cycle time

Since clock cycle time and clock rate are reciprocals,

Execution time = CPU clock cycles / clock rate

The number of CPU clock cycles can be determined by:

CPU clock cycles = (Instructions / Program) x (Clock cycles / Instruction) = Instruction Count x CPI

which gives:

Execution time = Instruction Count x CPI x clock cycle time = Instruction Count x CPI / clock rate

How to Improve Performance?

To improve performance, you can:
- Decrease the CPI (clock cycles per instruction) by using new hardware.
- Decrease the clock cycle time (i.e., increase the clock rate) by reducing propagation delays or by using pipelining.
- Decrease the number of required cycles, or improve the ISA or the compiler.
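As a numeric check of the CPU performance equation above, here is a minimal sketch in Python; the instruction count, CPI, and clock rate are made-up illustrative values, not figures from the lesson:

    def execution_time(instruction_count, cpi, clock_rate_hz):
        """Execution time = Instruction Count x CPI / clock rate."""
        return instruction_count * cpi / clock_rate_hz

    # e.g., 2 billion instructions at an average CPI of 1.5 on a 2 GHz clock:
    print(execution_time(2_000_000_000, 1.5, 2_000_000_000))  # 1.5 (seconds)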
LESSON # 2 Instructions: Language of the Computer (Operations, Operands)

Outline:
- Instruction Representation
- Program Concept
- Data
- Operand and Opcode
- Instruction and Mnemonics
- Fetch and Execute Cycles
- Program Execution
- MIPS Instructions

Objectives:
1. To classify the instruction representation.
2. To understand the program concepts and the instruction sets.
3. To identify the operand and opcode.
4. To differentiate the fetch and execute cycles.
5. To understand program execution and MIPS instructions.

Instructions are operations performed by the CPU. Operands are entities operated upon by the instruction. Addresses are the locations in memory of specified data.

An instruction is a statement that is executed at runtime. An x86 instruction statement can consist of four parts:
- Label (optional)
- Instruction (required)
- Operands (instruction specific)
- Comment (optional)

The terms instruction and mnemonic are used interchangeably in this document to refer to the names of x86 instructions. Although the term opcode is sometimes used as a synonym for instruction, this document reserves the term opcode for the hexadecimal representation of the instruction value.

PROGRAM CONCEPT
- Hardwired systems are inflexible.
- General-purpose hardware can do different tasks, given the correct control signals.
- Instead of re-wiring, supply a new set of control signals.

Programs
A sequence of instructions to perform a task is called a program, which is stored in the memory. The processor fetches the instructions that make up the program from the memory and performs the operations stated in those instructions, in a sequence of steps:
◦ For each step, an arithmetic or logical operation is done.
◦ For each operation, a different set of control signals is needed.

Data
Data are the "operands" upon which instructions operate. Data could be:
◦ Numbers,
◦ Encoded characters.
Data, in a broad sense, means any digital information. Computers use data that are encoded as strings of binary digits called bits.

Function of Control Unit
- For each operation a unique code is provided, e.g., ADD, MOVE.
- A hardware segment accepts the code and issues the control signals.

Memory Address Register (MAR) - It holds the address of the main memory location from which, or to which, data is to be transferred.

Program Counter (PC) - It holds the address of the next instruction.

Instruction Register (IR) - It stores the instruction that is currently being executed. It gives the operation to the CU to generate the timing signals that control the execution.

Memory Data Register / Buffer Register (MDR) - It holds the data to be written into, or read out of, the addressed location.

Figure 2.1 Components of a Computer: Top-Level View

Processor registers: general-purpose registers R0 through Rn-1 are used for storing data.

Control Unit (CU) - The memory, arithmetic & logic, and input & output units store and process information and perform input/output operations. These units must be coordinated in some way; this is the task of the CU. It sends control signals to the other units and senses their states.

INSTRUCTION CYCLE

Two steps:
- Fetch
- Execute

Figure 2.2 Instruction Cycle

FETCH CYCLE
- The Program Counter (PC) holds the address of the next instruction to fetch.
- The processor fetches the instruction from the memory location pointed to by the PC.
- The PC is incremented, unless told otherwise.
- The instruction is loaded into the Instruction Register (IR).
- The processor interprets the instruction and performs the required actions.

EXECUTE CYCLE
- Processor-memory: data transfer between the CPU and main memory.
- Processor-I/O: data transfer between the CPU and an I/O module.
- Data processing: some arithmetic or logical operation on data.
- Control: alteration of the sequence of operations, e.g., jump.
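To make the fetch-execute cycle concrete, here is a toy sketch of the loop in Python. The three-instruction machine (its LOAD/ADD/HALT opcodes and memory layout) is invented for illustration and is not a real ISA:

    # Toy fetch-execute loop: memory holds (opcode, operand) pairs.
    memory = [("LOAD", 5), ("ADD", 7), ("HALT", None)]
    pc, acc = 0, 0                 # program counter and accumulator

    while True:
        ir = memory[pc]            # fetch: the instruction addressed by PC goes to IR
        pc += 1                    # increment PC, unless told otherwise
        opcode, operand = ir       # decode
        if opcode == "LOAD":       # execute
            acc = operand
        elif opcode == "ADD":
            acc += operand
        elif opcode == "HALT":
            break

    print(acc)  # 12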
EXAMPLE OF PROGRAM EXECUTION

REPRESENTING INSTRUCTIONS

Instruction Set
To command a computer's hardware, you must speak its language. The words of a computer's language are called instructions, and its vocabulary is called an instruction set. Different computers have different instruction sets. Instructions are encoded in binary, called machine code.

MIPS instructions are encoded as 32-bit instruction words, with a small number of formats encoding the operation code (opcode), register numbers, and so on. The fields are:
- op: operation code (opcode)
- rs: first source register number
- rt: second source register number
- rd: destination register number
- shamt: shift amount (00000 for now)
- funct: function code (extends the opcode)

For example:
00000010 00110010 01000000 00100000 (base 2) = 02324020 (base 16)

Register numbers:
- $t0 - $t7 are regs 8 - 15
- $t8 - $t9 are regs 24 - 25
- $s0 - $s7 are regs 16 - 23

Components of an ISA
- Organization of programmable storage: registers; memory (flat or segmented); modes of addressing and accessing data items and instructions.
- Data types and data structures: encoding and representation (next chapter).
- Instruction formats and instruction fields.
- Instruction set (or operation code): ALU, control transfer, exception handling.

The MIPS instruction-set architecture has characteristics based on conclusions from previous lectures. It is a load-store architecture that uses general-purpose registers. It has 32 floating-point registers, which can hold either single-precision (32-bit) or double-precision (64-bit) values.
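The 32-bit encoding in the worked example above can be reproduced by packing the six R-format fields (6, 5, 5, 5, 5, and 6 bits). A minimal sketch in Python; the field values match add $t0, $s1, $s2 (rs = $s1 = 17, rt = $s2 = 18, rd = $t0 = 8, funct = 0x20), which is the instruction the example appears to encode:

    def encode_r_format(op, rs, rt, rd, shamt, funct):
        """Pack the six MIPS R-format fields into one 32-bit word."""
        return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

    word = encode_r_format(op=0, rs=17, rt=18, rd=8, shamt=0, funct=0x20)
    print(f"{word:08x}")  # 02324020, matching the worked example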
LESSON # 3 Flynn's Taxonomy and Parallel Computing

Outline:
- Flynn's Classification
- Single Instruction, Single Data
- Single Instruction, Multiple Data
- Multiple Instruction, Single Data
- Multiple Instruction, Multiple Data

Objectives:
1. To understand Flynn's taxonomy.
2. To identify Flynn's classifications.
3. To understand parallel processor organizations.
4. To improve the performance of the CPU.

Flynn's taxonomy is a classification of computing systems proposed by Michael Flynn. The basic idea of the classification is that computer programs are composed of two streams: a data stream and a task (or instruction) stream. This gives the possibility of four combinations, depending on whether each stream is serial (single stream) or parallel (multiple streams).

Parallel computing is computing where jobs are broken into discrete parts that can be executed concurrently. Each part is further broken down into a series of instructions, and instructions from each part execute simultaneously on different CPUs. Parallel systems deal with the simultaneous use of multiple computer resources, which can include a single computer with multiple processors, several computers connected by a network to form a parallel processing cluster, or a combination of both. Parallel systems are more difficult to program than computers with a single processor, because the architecture of parallel computers varies accordingly and the processes of multiple CPUs must be coordinated and synchronized.

Based on the number of instruction and data streams that can be processed simultaneously, computing systems are classified into four major categories:

FLYNN'S CLASSIFICATION

1. Single-instruction, single-data (SISD) systems
An SISD computing system is a uniprocessor machine which can execute a single instruction operating on a single data stream. In SISD, machine instructions are processed in a sequential manner, and computers adopting this model are popularly called sequential computers. Most conventional computers have SISD architecture. All the instructions and data to be processed must be stored in primary memory. The speed of the processing element in the SISD model is limited by the rate at which the computer can transfer information internally. Dominant representative SISD systems are IBM PCs and workstations.

2. Single-instruction, multiple-data (SIMD) systems
A SIMD system is a multiprocessor machine capable of executing the same instruction on all the CPUs but operating on different data streams. Machines based on the SIMD model are well suited to scientific computing, since it involves lots of vector and matrix operations. The organized data elements of vectors can be divided into multiple sets (N sets for an N-PE system) so that the information can be passed to all the processing elements (PEs), and each PE can process one data set. A dominant representative SIMD system is Cray's vector processing machine.

3. Multiple-instruction, single-data (MISD) systems
An MISD computing system is a multiprocessor machine capable of executing different instructions on different PEs, with all of them operating on the same data set.
Example: Z = sin(x) + cos(x) + tan(x)
The system performs different operations on the same data set. Machines built using the MISD model are not useful in most applications; a few machines have been built, but none of them are available commercially.

4. Multiple-instruction, multiple-data (MIMD) systems
A MIMD system is a multiprocessor machine which can execute multiple instructions on multiple data sets. Each PE in the MIMD model has separate instruction and data streams; therefore, machines built using this model are suited to any kind of application. Unlike SIMD and MISD machines, the PEs in MIMD machines work asynchronously.

MIMD machines are broadly categorized into shared-memory MIMD and distributed-memory MIMD, based on the way the PEs are coupled to the main memory.

In the shared-memory MIMD model (tightly coupled multiprocessor systems), all the PEs are connected to a single global memory, and they all have access to it. Communication between PEs in this model takes place through the shared memory; modification of the data stored in the global memory by one PE is visible to all other PEs. Dominant representative shared-memory MIMD systems are Silicon Graphics machines and Sun/IBM's SMP (Symmetric Multi-Processing).

In distributed-memory MIMD machines (loosely coupled multiprocessor systems), all PEs have a local memory. Communication between PEs in this model takes place through the interconnection network (the inter-process communication channel, or IPC). The network connecting the PEs can be configured as a tree, a mesh, or in accordance with the requirement.

The shared-memory MIMD architecture is easier to program but is less tolerant of failures and harder to extend than the distributed-memory MIMD model. Failures in a shared-memory MIMD system affect the entire system, whereas this is not the case for the distributed model, in which each of the PEs can be easily isolated. Moreover, shared-memory MIMD architectures are less likely to scale, because the addition of more PEs leads to memory contention; this does not happen in the case of distributed memory, in which each PE has its own memory. As a result of practical outcomes and users' requirements, the distributed-memory MIMD architecture is superior to the other existing models.

TO IMPROVE THE PERFORMANCE OF A CPU WE HAVE TWO OPTIONS:
1) Improve the hardware by introducing faster circuits.
2) Arrange the hardware such that more than one operation can be performed at the same time.
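Flynn's categories describe hardware organizations, but the flavor of applying one operation across many data sets (the SIMD idea) can be sketched in software. A loose illustration in Python using a process pool; the square function and the input data are invented for the example:

    from multiprocessing import Pool

    def square(x):
        # one operation, applied element by element to many data items
        return x * x

    if __name__ == "__main__":
        data = list(range(16))          # the multiple data elements
        with Pool(4) as pool:           # four workers standing in for four PEs
            results = pool.map(square, data)
        print(results)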
LESSON # 4 Amdahl's Law, Microarchitecture, and Instruction Set Architecture

Outline:
- Amdahl's Law and Its Proof
- Classifying Instruction Set Architectures and Instruction Formats
- Microarchitecture and Instruction Set Architecture
- Arithmetic/Logic Instructions
- Data Transfer Instructions
- Branch and Jump Instructions

Objectives:
1. To understand Amdahl's law and compute multiprocessor performance.
2. To evaluate the speedup formulas and their applications.
3. To define microarchitecture and instruction set architecture.
4. To determine the three types of MIPS instructions.

Amdahl's Law states that in parallelization, if P is the proportion of a system or program that can be made parallel, and 1 - P is the proportion that remains serial, then the maximum speedup S(N) that can be achieved using N processors is:

S(N) = 1 / ((1 - P) + (P / N))

AMDAHL'S LAW AND ITS PROOF

The law is named after computer scientist Gene Amdahl (a computer architect from IBM and Amdahl Corporation) and was presented at the AFIPS Spring Joint Computer Conference in 1967. It is also known as Amdahl's argument. It is a formula which gives the theoretical speedup in latency of the execution of a task at a fixed workload that can be expected of a system whose resources are improved. In other words, it is a formula used to find the maximum improvement possible by improving just a particular part of a system. It is often used in parallel computing to predict the theoretical speedup when using multiple processors.
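As a quick sketch of the parallel form of the law in Python (the P and N values below are illustrative, not from the lesson):

    def amdahl_speedup(p, n):
        """Maximum speedup with n processors when fraction p is parallelizable."""
        return 1 / ((1 - p) + p / n)

    print(round(amdahl_speedup(0.9, 10), 2))     # 5.26 for 90% parallel code on 10 CPUs
    print(round(amdahl_speedup(0.9, 10**9), 2))  # ~10.0: the serial 10% caps the speedup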
SPEEDUP

Speedup is defined as the ratio of the performance for the entire task using the enhancement to the performance for the entire task without using the enhancement. Equivalently, speedup can be defined as the ratio of the execution time for the entire task without using the enhancement to the execution time for the entire task using the enhancement. If Pe is the performance for the entire task using the enhancement when possible, Pw is the performance for the entire task without using the enhancement, Ew is the execution time for the entire task without using the enhancement, and Ee is the execution time for the entire task using the enhancement, then:

Speedup = Pe / Pw
or
Speedup = Ew / Ee

Amdahl's law uses two factors to find the speedup from some enhancement:

Fraction enhanced – The fraction of the computation time in the original computer that can be converted to take advantage of the enhancement. For example, if 10 seconds of the execution time of a program that takes 40 seconds in total can use an enhancement, the fraction is 10/40. Fraction enhanced is always less than 1.

Speedup enhanced – The improvement gained by the enhanced execution mode; that is, how much faster the task would run if the enhanced mode were used for the entire program. For example, if the enhanced mode takes 3 seconds for a portion of the program that takes 6 seconds in the original mode, the improvement is 6/3. Speedup enhanced is always greater than 1.

The overall speedup is the ratio of the execution times:

Speedup = Ew / Ee = 1 / ((1 - Fraction enhanced) + (Fraction enhanced / Speedup enhanced))

Proof:
Let the speedup be S, the old execution time be T, the new execution time be T', the execution time taken by portion A (the part that will be enhanced) be t, the execution time taken by portion A after enhancing be t', the execution time taken by the portion that won't be enhanced be tn, the fraction enhanced be f', and the speedup enhanced be S'. From these definitions, t = f' x T, tn = (1 - f') x T, and t' = t / S'. The new execution time is then

T' = tn + t' = (1 - f') x T + (f' x T) / S'

and dividing gives the overall speedup:

S = T / T' = 1 / ((1 - f') + (f' / S'))
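The enhancement form can be checked numerically as well. A minimal sketch in Python, combining the two worked figures from above (fraction 10/40 and the 2x enhanced mode from the 6/3 example):

    def overall_speedup(fraction_enhanced, speedup_enhanced):
        """Amdahl's law for a single enhancement."""
        return 1 / ((1 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

    # 10 of 40 seconds can use an enhancement that runs 6/3 = 2 times faster:
    print(round(overall_speedup(10 / 40, 6 / 3), 3))  # 1.143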
MICROARCHITECTURE AND INSTRUCTION SET ARCHITECTURE

What is an Instruction Set Architecture (ISA), and what is the difference between an ISA and a microarchitecture? An ISA is defined as the design of a computer from the programmer's perspective. This basically means that an ISA describes the design of a computer in terms of the basic operations it must support. The ISA is not concerned with the implementation-specific details of a computer; it is only concerned with the set, or collection, of basic operations the computer must support. For example, the AMD Athlon and the Core 2 Duo processors have entirely different implementations, but they support more or less the same set of basic operations as defined in the x86 instruction set.

Let us try to understand the objectives of an ISA by taking the example of the MIPS ISA. MIPS is one of the most widely used ISAs in education due to its simplicity.

1. The ISA defines the types of instructions to be supported by the processor. Based on the type of operations they perform, MIPS instructions are classified into three types:
- Arithmetic/Logic Instructions: These instructions perform various arithmetic and logical operations on one or more operands.
- Data Transfer Instructions: These instructions are responsible for the transfer of data from memory to the processor registers and vice versa.
- Branch and Jump Instructions: These instructions are responsible for breaking the sequential flow of instructions and jumping to instructions at various other locations; this is necessary for the implementation of functions and conditional statements.

2. The ISA defines the maximum length of each type of instruction. Since MIPS is a 32-bit ISA, each instruction must be accommodated within 32 bits.

3. The ISA defines the instruction format of each type of instruction. The instruction format determines how the entire instruction is encoded within 32 bits. There are three types of instruction formats in the MIPS ISA:
- R-Instruction Format
- I-Instruction Format
- J-Instruction Format
Each of these instruction formats has a different instruction encoding scheme and hence needs to be interpreted differently by the processor.

Figure – The Abstraction Hierarchy

The microarchitectural level lies just below the ISA level and hence is concerned with the implementation of the basic operations to be supported by the computer as defined by the ISA. Therefore, we can say that the AMD Athlon and Core 2 Duo processors are based on the same ISA but have different microarchitectures, with different performance and efficiencies.

Why both a microarchitecture and an ISA? The answer lies in the need to standardize and maintain the compatibility of programs across different hardware implementations based on the same ISA. Making different machines compatible with the same set of basic instructions (the ISA) allows the same program to run smoothly on many different machines, thereby making it easier for programmers to document and maintain code for many different machines simultaneously and efficiently. This flexibility is the reason we first define an ISA and then design different microarchitectures complying with this ISA for implementing the machine.

MICROARCHITECTURE

The microarchitecture is more concerned with the lower-level implementation of how the instructions are going to be executed and deals with concepts like instruction pipelining, branch prediction, and out-of-order execution.

The x86 was developed by Intel, but we see that almost every year Intel comes up with a new generation of i-series processors. The x86 architecture, on which most of the Intel processors are based, essentially remains the same across all these generations; where they differ is in the underlying microarchitecture. They differ in their implementation and hence are claimed to have improved performance. Therefore, in conclusion, we can say that different machines may be based on the same ISA but have different microarchitectures.

INSTRUCTION SET ARCHITECTURE

The ISA is responsible for defining the set of instructions to be supported by the processor; the ARMv7 ISA, for example, defines its own set of such instructions.

The branch of computer architecture is more inclined towards the analysis and design of instruction set architectures. For example, Intel developed the x86 architecture, ARM developed the ARM architecture, and AMD developed the amd64 architecture. The RISC-V ISA, developed by UC Berkeley, is an example of an open-source ISA.
CLASSIFYING INSTRUCTION SET ARCHITECTURES

ISAs can be classified:
- By the type of internal storage in the CPU.
- By the number of explicit operands named per instruction.
- By operand location: can ALU operands be located in memory? A RISC architecture requires all operands to be in registers; a stack architecture requires all operands to be on the stack (the top portion of the stack is inside the CPU; the rest is in memory).
- By the operations provided in the ISA.
- By the types and sizes of operands.

METRICS FOR EVALUATING DIFFERENT ISAs

- Code size: how many bytes are in a program's executable code?
- Code density: how many instructions fit in x KBytes? This depends on instruction length. The stack architecture has the best code density (no operands in ALU ops).
- Code efficiency: are there limitations on operand access? A stack cannot be randomly accessed: Mtop-x with x >= 2 cannot be directly accessed, and Mtop - Mtop-1 must be translated into a subtract followed by a negate.
- Bottleneck in operand traffic: the stack will be the bottleneck, since both input operands come from the stack and the result goes back to the stack.
- Memory traffic: how many memory references (i + d) does a program make? In an accumulator architecture, each ALU operation involves one memory reference.
- Ease of writing compilers: in a general-purpose register architecture we have more registers to allocate, and more choices make compilers more difficult to write.
- Ease of writing assembly programs: with a stack architecture you need to use reverse Polish expressions.

LESSON # 5 Stack Organization and Instruction Format

Outline:
- Stack Organization
  - Push Operation
  - Pop Operation
- Stack Operations: Reverse Polish Notation (Postfix)
- Instruction Formats
  - Zero-Address Instruction Format
  - One-Address Instruction Format
  - Two-Address Instruction Format
  - Three-Address Instruction Format

Objectives:
1. To understand how data are stored and retrieved using a stack organization.
2. To write reverse Polish notation.
3. To identify the different instruction formats.
4. To evaluate an instruction using the four instruction formats.

STACK ORGANIZATION

Stack: a storage device that stores information in such a manner that the item stored last is the first item retrieved. Also called a last-in, first-out (LIFO) list. It is useful for compound arithmetic operations and nested subroutine calls.

Stack pointer (SP): a register that holds the address of the top item in the stack. SP always points at the top item in the stack.

Push: operation to insert an item into the stack.
Pop: operation to retrieve an item from the stack.

REGISTER STACK

A stack can be organized as a collection of a finite number of registers. In a 64-word stack, the stack pointer contains 6 bits. The one-bit register FULL is set to 1 when the stack is full; the EMPTY register is 1 when the stack is empty. The data register DR holds the data to be written into or read from the stack.

The following are the micro-operations associated with the stack.

Initialization:
SP ← 0, EMPTY ← 1, FULL ← 0

Push operation:
SP ← SP + 1
M[SP] ← DR
If (SP = 0) then (FULL ← 1)    (note that SP becomes 0 after 63)
EMPTY ← 0

Pop operation:
DR ← M[SP]
SP ← SP - 1
If (SP = 0) then (EMPTY ← 1)
FULL ← 0
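These micro-operations translate almost directly into code. A minimal sketch in Python of the 64-word register stack; the names sp, full, and empty mirror SP, FULL, and EMPTY above:

    class RegisterStack:
        """64-word register stack with FULL/EMPTY flags."""
        def __init__(self, size=64):
            self.m = [0] * size              # the stack registers
            self.size = size
            self.sp = 0                      # stack pointer
            self.empty, self.full = 1, 0

        def push(self, dr):                  # DR holds the word to write
            if self.full:
                raise OverflowError("stack full")
            self.sp = (self.sp + 1) % self.size   # SP <- SP + 1 (63 wraps to 0)
            self.m[self.sp] = dr                  # M[SP] <- DR
            if self.sp == 0:
                self.full = 1
            self.empty = 0

        def pop(self):
            if self.empty:
                raise IndexError("stack empty")
            dr = self.m[self.sp]                  # DR <- M[SP]
            self.sp = (self.sp - 1) % self.size   # SP <- SP - 1
            if self.sp == 0:
                self.empty = 1
            self.full = 0
            return dr

    s = RegisterStack()
    s.push(5); s.push(7)
    print(s.pop(), s.pop())  # 7 5: last in, first out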
STACK OPERATIONS: REVERSE POLISH NOTATION (POSTFIX)

Reverse Polish notation, also called postfix, is a notation that places operators after their operands and uses stacks to evaluate expressions.

Infix notation: A + B
Reverse Polish notation: AB+

A stack organization is very effective for evaluating arithmetic expressions:

A * B + C * D → (AB*) + (CD*) → AB* CD* +
(5 * 4) + (3 * 6) → 54* 36* +

Evaluation procedure:
1. Scan the expression from left to right.
2. When an operator is reached, perform the operation with the two operands found on the left side of the operator.
3. Replace the two operands and the operator by the result obtained from the operation.

Example: infix 3 * 4 + 5 * 6 = 42 becomes postfix 3 4 * 5 6 * +

The stack is the most efficient way of evaluating arithmetic expressions. Stack evaluation:
- Get a value.
- If the value is data: push it.
- Else, if the value is an operation: pop, pop, evaluate, and push.

Example: 3 4 * 5 6 * + → 12 5 6 * + → 12 30 + → 42
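The stack evaluation procedure above is short enough to code directly. A minimal sketch in Python; tokens are space-separated and the operators are limited to + - * /:

    def eval_rpn(expression):
        """Evaluate a space-separated reverse Polish (postfix) expression."""
        ops = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
               "*": lambda a, b: a * b, "/": lambda a, b: a / b}
        stack = []
        for token in expression.split():
            if token in ops:                 # operation: pop, pop, evaluate, push
                b, a = stack.pop(), stack.pop()
                stack.append(ops[token](a, b))
            else:                            # data: push
                stack.append(float(token))
        return stack.pop()

    print(eval_rpn("3 4 * 5 6 * +"))  # 42.0, matching the worked example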
INSTRUCTION FORMATS

The most common fields in instruction formats are:
1. Mode field: specifies the way the effective address is determined.
2. Operation code: specifies the operation to be performed.
3. Address field: designates a memory address or a processor register.

Mode | Opcode | Address

FOUR INSTRUCTION FORMATS

1. Zero-address instructions: a stack is used. An arithmetic operation pops two operands from the stack and pushes the result. A stack-organized computer does not use an address field for instructions such as ADD and MUL, but PUSH and POP operations need an address field to specify the operand involved in the data transfer.

Instruction: ADD
Instruction: POP X

Evaluate X = (A + B) * (C + D):
PUSH A
PUSH B
ADD
PUSH C
PUSH D
ADD
MUL
POP X

2. One-address instructions: single accumulator organization. Since the accumulator (AC) always provides one operand, only one memory address needs to be specified. The one address can be a register name or a memory address.

Instruction: ADD X
Micro-operation: AC ← AC + M[X]

3. Two-address instructions: two address registers or two memory locations are specified, one of which holds the final result. The destination address is assumed to be the same as that of the first operand; it can be a memory address or a register name. Each address field can specify either a processor register or a memory operand. This is the most common format in commercial computers.

Instruction: ADD R1, R2
Micro-operation: R1 ← R1 + R2

Advantages: no memory addresses are needed during a register-to-register operation, and it results in programs of medium size.
Disadvantages: it results in longer program codes than the three-address format, and more bits are needed to specify two addresses.

4. Three-address instructions: three address registers or memory locations are specified, one for the final result. It is also called general address organization. Memory addresses for the two operands and one destination need to be specified.

Instruction: ADD R1, R2, R3
Micro-operation: R1 ← R2 + R3

Advantages: results in writing short programs.

GENERAL REGISTER ORGANIZATION

CPU CACHE

A CPU cache is a cache used by the central processing unit of a computer to reduce the average time to access data from the main memory. The cache is a smaller, faster memory which stores copies of the data from frequently used main memory locations.

CACHE ENTRY

Data is transferred between memory and cache in blocks of fixed size, called cache lines. When a cache line is copied from memory into the cache, a cache entry is created. The cache entry includes the copied data as well as the requested memory location (now called a tag).

When the processor needs to read or write a location in main memory, it first checks for a corresponding entry in the cache: the cache checks for the contents of the requested memory location in any cache lines that might contain that address. If the processor finds that the memory location is in the cache, a cache hit has occurred; if it does not, a cache miss has occurred. In the case of a cache hit, the processor immediately reads or writes the data in the cache line. For a cache miss, the cache allocates a new entry and copies in data from main memory; then the request is fulfilled from the contents of the cache.

Why is cache memory fast?
- Faster electronics are used.
- A cache memory has fewer locations than a main memory, which reduces the access time.
- The cache is placed both physically closer and logically closer to the CPU than the main memory. A cacheless computer usually needs a few bus cycles to synchronize the CPU with the bus, whereas a cache memory can be positioned closer to the CPU.

Why is cache memory needed?
- When a program references a memory location, it is likely to reference that same memory location again soon.
- A memory location that is near a recently referenced location is more likely to be referenced than a memory location that is farther away.
- A small but fast cache memory, in which the contents of the most commonly accessed locations are maintained, can therefore be placed between the CPU and the main memory. When a program executes, the cache memory is searched first.

CACHE MAPPING

Commonly used methods:

1. Associative Mapped Cache
Any main memory block can be mapped into each cache slot. To keep track of which of the 2^27 possible blocks is in each slot, a 27-bit tag field is added to each slot. A valid bit is needed to indicate whether the slot holds a line that belongs to the program being executed, and a dirty bit keeps track of whether a line has been modified while it is in the cache. The mapping from main memory blocks to cache slots is performed by partitioning an address into fields; for each slot, if the valid bit is 1, the tag field of the referenced address is compared with the tag field of the slot.

Consider how an access to memory location (FFDAF014)16 is mapped to the cache: if the addressed word is in the cache, it will be found in word (14)16 of a slot that has a tag of (7FED780)16, which is made up of the 27 most significant bits of the address.

Advantages: any main memory block can be placed into any cache slot; regardless of how irregular the data and program references are, if a slot is available for the block, it can be stored in the cache.
Disadvantages: considerable hardware overhead is needed for cache bookkeeping, and there must be a mechanism for searching the tag memory in parallel.

2. Direct-Mapped Cache
Each cache slot corresponds to an explicit set of main memory blocks. In our example we have 2^27 memory blocks and 2^14 cache slots, so a total of 2^27 / 2^14 = 2^13 main memory blocks can be mapped onto each cache slot. The 32-bit main memory address is partitioned into a 13-bit tag field, followed by a 14-bit slot field, followed by a five-bit word field. When a reference is made to a main memory address, the slot field identifies in which of the 2^14 slots the block will be found; if the valid bit is 1, the tag field of the referenced address is compared with the tag field of the slot.

Advantages: the tag memory is much smaller than in an associative mapped cache, and there is no need for an associative search, since the slot field is used to direct the comparison to a single field.
Disadvantages: consider what happens when a program references locations that are 2^19 words apart, which is the size of the cache. Every memory reference will result in a miss, which will cause an entire block to be read into the cache even though only a single word is used.

3. Set-Associative Mapped Cache
This scheme combines the simplicity of direct mapping with the flexibility of associative mapping. For this example, two slots make up a set. Since there are 2^14 slots in the cache, there are 2^14 / 2 = 2^13 sets. When an address is mapped to a set, the direct mapping scheme is used, and then associative mapping is used within the set. The format for an address has 13 bits in the set field, which identifies the set in which the addressed word will be found, five bits for the word field, and a 14-bit tag field.
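The address partitioning used by these mapping schemes can be verified with a few lines of code. A minimal sketch in Python, applied to the example address (FFDAF014)16 with the field widths given above:

    def split_address(addr, tag_bits, index_bits, word_bits=5):
        """Split a 32-bit address into (tag, index, word) fields, tag in the MSBs."""
        word = addr & ((1 << word_bits) - 1)
        index = (addr >> word_bits) & ((1 << index_bits) - 1)
        tag = addr >> (word_bits + index_bits)
        return tag, index, word

    addr = 0xFFDAF014

    # Associative: 27-bit tag + 5-bit word (no index field)
    tag, _, word = split_address(addr, 27, 0)
    print(hex(tag), hex(word))  # 0x7fed780 0x14, matching the worked example

    # Direct-mapped: 13-bit tag + 14-bit slot + 5-bit word
    print([hex(f) for f in split_address(addr, 13, 14)])  # ['0x1ffb', '0x1780', '0x14']

    # Set-associative: 14-bit tag + 13-bit set + 5-bit word
    print([hex(f) for f in split_address(addr, 14, 13)])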