**A REVISION NOTE ON COMPUTER ARCHITECTURE**

**Introduction**

The characteristics of different computers vary considerably from category to category. Computers for data-processing activities have different system features from those built for scientific work, and even computers configured within the same application area vary in design. Computer architecture is the science of integrating components to achieve a desired level of functionality and performance; it is also the logical organization or design of the hardware that makes up the computer system.

**General Computer Architecture**

A typical computer system can be divided into three sections:

(a) Control section: executes instructions and processes data.
(b) Memory section: stores instructions and data.
(c) Input/output section: handles communication between the computer and the outside world.

These sections are interconnected by tiny wires that constitute conducting paths called buses. Buses are divided into three kinds:

- Control bus
- Data bus
- Address bus

**Control bus**: used to send control signals, generated in the timing and control circuits of the microprocessor, to the memory and the input/output (I/O) units; e.g. it carries read or write signals to memory and I/O ports.

**Data bus**: used to transfer machine instructions and data from the memory to the microprocessor, and data associated with I/O transfers. The data bus is bi-directional, as distinct from the control and address buses (one direction). It consists of tracks that carry, for example, an 8-bit (1-byte) word simultaneously.

**Address bus**: used to transfer the address of the memory location or I/O port involved in a data transfer.

**NB**: Some of these buses may be physically the same; a single bus may carry data in different directions, or addresses, at different times. A bus used for more than one purpose is said to be shared or multiplexed.

Shown below is the block diagram of a typical computer system.

**INPUT/OUTPUT SECTION**

It handles the transfer of data between the computer and external devices or peripherals. The transfer involves status and control signals as well as data. The I/O section must reconcile the timing differences between the computer and the peripherals, format the data properly, handle status and control signals, and supply the required voltage levels. Irregular transfers may be handled with interrupts: control signals that receive the immediate attention of the control section and cause it to suspend its normal activity.

**A typical input operation proceeds as follows:**

- The peripheral signals the control section that new data is available. The I/O section must format the signal properly and hold it until the control section accepts it.
- The peripheral sends data to the control section. The I/O section must hold the data until the control section is ready to read it.
- The control section reads the data. The I/O section must have a decoding mechanism that selects the particular port. After the data has been read, the control signal is deactivated and the peripheral is acknowledged so that more data can be sent.

**Output operations are much like input operations:**

- The peripheral indicates to the control section that it is ready to accept data.
- The control section then sends the data along with a signal (a strobe) that indicates to the peripheral that the data is available.
- The I/O section formats the data and control signal properly, and holds the data long enough for the peripheral to use it.
- Output data must be held longer than input data, since mechanical devices, lighted displays and human observers respond much more slowly than computers.
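To make the input handshake concrete, here is a minimal C sketch. The status and data registers, the READY bit, and the function names are assumptions invented for illustration (real hardware would map them to fixed I/O addresses); the logic simply mirrors the signal, hold, read, acknowledge sequence described above.

```c
/* A minimal sketch of a polled input handshake, using two
 * hypothetical memory-mapped registers: a status register whose
 * READY bit the peripheral sets when new data is available, and a
 * data register that latches the byte. Plain variables stand in for
 * the hardware ports so the sketch runs anywhere. */
#include <stdio.h>

#define READY 0x01               /* status bit: new data available */

static unsigned char status_reg; /* stands in for the I/O status port */
static unsigned char data_reg;   /* stands in for the I/O data port   */

/* Peripheral side: latch a byte and raise the READY flag. */
static void peripheral_send(unsigned char byte) {
    data_reg = byte;
    status_reg |= READY;
}

/* Control-section side: wait until data is ready, read it, then
 * clear the flag to acknowledge the peripheral. */
static unsigned char cpu_read(void) {
    while (!(status_reg & READY))
        ;                        /* busy-wait on the status bit */
    unsigned char byte = data_reg;
    status_reg &= ~READY;        /* acknowledge: ready for more data */
    return byte;
}

int main(void) {
    peripheral_send('A');
    printf("CPU read: %c\n", cpu_read());
    return 0;
}
```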
**Functions of the I/O Section**

1. The I/O section must perform many simple interfacing tasks. It must place signals in the proper form for both the control section and the peripherals to understand.
2. The I/O section may perform some functions that the control section could otherwise perform, such as:
   i. conversion of data between serial (one bit at a time) and parallel (more than one bit at a time) form;
   ii. the insertion or deletion of special patterns that mark the beginning or end of a data transfer;
   iii. the calculation and checking of error-detection codes, such as parity.
3. The I/O section of a computer may be programmable, or may even contain a processor so that it can handle transfers with little help from the control section.

**THE CONTROL SECTION**

In any typical computer system, the control section is vested with the overall responsibility and ability to coordinate the activity of the machine. It is the "brain" in which "every thought" is initiated and from which all other sections are synchronized. Its other functions are highlighted as follows:

- The control section (CPU or processor) processes the data.
- It fetches instructions from memory, decodes and executes them.
- It generates timing and control signals.
- It transfers data to and from the memory and I/O sections.
- It performs arithmetic and logical functions.
- It recognizes external signals.

Figure 2: A typical example of a control section

During each instruction cycle, the control section performs many tasks:

1. It places the address of the instruction on the memory address bus.
2. It takes the instruction word from the memory data bus.
3. It fetches the addresses and data required by the instruction; these may be in memory or in any of the registers.
4. It performs the operation specified by the instruction code. The operation may be an arithmetic or logical function, a data transfer, or a management function.
5. It looks for control signals, such as interrupts, and provides the appropriate responses.
6. It provides the status, control and timing signals that the memory and I/O sections use.

The control section, or CPU, is the centre of computer operations because of its ability both to process data and to direct the entire computer.

**MEMORY SECTION**

It consists of storage units made up of either magnetic cores or semiconductor cells. The units are binary elements that have two stable states, i.e. the values one and zero. Memory is organized sequentially into bytes and words, each of which has a unique address, and its contents are accessed by means of those unique addresses. Depending on its design, memory may be accessed randomly or sequentially: different locations in random-access memory are reached in virtually the same amount of time, while sequential-memory locations are reached at distinctly different times.

**Access time** is the time it takes to find the correct memory location and obtain its contents. It affects the speed of the computer, since the computer must obtain its instructions and most of its data from memory. Memory access time in most computers ranges from about 100 ns (nanoseconds) to several microseconds.

Memory sections are often subdivided into units called pages. The entire memory may involve billions of words, whereas a page contains between 256 and 4K words. Most memory accesses are accomplished by first accessing the required page and then accessing the required location on that page. **Advantage:** the computer can reach several locations on the same page without re-addressing the page.
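As a minimal sketch of this page-plus-offset scheme, assume a hypothetical machine with 256-word pages; the high bits of an address then select the page and the low bits select the word within it:

```c
/* Splitting an address into page number and offset, assuming
 * 256-word pages. Reaching another location on the same page only
 * changes the offset. */
#include <stdio.h>

#define PAGE_SIZE 256  /* words per page (assumed) */

int main(void) {
    unsigned address = 0x1234;
    unsigned page    = address / PAGE_SIZE;  /* high bits select the page */
    unsigned offset  = address % PAGE_SIZE;  /* low bits select the word  */
    printf("address 0x%04X -> page 0x%02X, offset 0x%02X\n",
           address, page, offset);
    return 0;
}
```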
The control section transfers data to and from the memory as follows:

1. The control section sends an address to the memory.
2. The control section sends a signal (read/write) to the memory to indicate the direction of the transfer.
3. The control section waits until the transfer has been completed; on a read, this delay precedes the arrival of the actual data.

**Registers**

Registers are small memories that reside in the control section alongside the arithmetic and logic unit, the address decoding unit, the instruction handling unit, and the timing and control unit. Like other forms of memory, registers consist of binary cells, and their capacity is measured in bytes. Registers under program control are advantageous because the CPU can obtain data from them without a memory access; those that are not under program control allow the control section to save values in them for later use. Registers are interconnected by internal buses. They are very expensive because of the complexity and difficulty involved in manufacturing them, which limits the number found in a computer.

There are different types of registers:

- Program counter
- Instruction register
- Memory address register
- Accumulators
- General-purpose registers
- Condition code registers
- Stack pointers

**Program Counter (PC):** contains the address of the memory location that holds the next instruction. The instruction cycle begins with the CPU placing the contents of the PC on the address bus; the CPU then fetches the first word of the instruction from memory. The CPU also increments the contents of the PC so that the next instruction cycle will fetch the next sequential address in memory. The CPU fetches instructions sequentially until a branch or jump instruction changes the contents of the PC.

**Instruction Register (IR):** holds an instruction until it is decoded. The bit length of the instruction register is the length of the basic instruction word that the computer system accommodates. The IR cannot be accessed by the programmer because it is entirely under the command of the control section. Some computers have two IRs so that they can fetch and hold one instruction while executing the previous one (pipelining).

**Memory Address Register (MAR):** holds the address of data in memory. The address may be part of the instruction or may be provided separately from the instruction or the program; e.g. LOAD ACCUMULATOR FROM MEMORY takes the contents of the desired memory location and places it in the accumulator.

**Accumulators:** temporary storage used during calculations. In an arithmetic operation, an accumulator always constitutes one of the operands. Accumulators may also be used in logical, shift and rotate operations, and they temporarily store intermediate results in both arithmetic and logical operations. Accumulators are the most widely used registers in computer systems. A computer with one accumulator is slower in operation than one with two or more accumulators, as the example A\*B + C\*D shows:

| Single Accumulator | Two Accumulators |
|--------------------|------------------|
| Load accumulator with A | Load accumulator 1 with A |
| Multiply by B | Multiply by B |
| Store in temp register 1 | Load accumulator 2 with C |
| Load accumulator with C | Multiply by D |
| Multiply by D | Add the accumulators |
| Add temp storage | |
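The same two sequences can be written as C-level register pseudo-code; the variable names (acc, temp, acc1, acc2) are purely illustrative. The single-accumulator version needs an extra store and load through a temporary, which is where the speed difference comes from:

```c
/* A*B + C*D computed two ways, mirroring the table above. */
#include <stdio.h>

int main(void) {
    int A = 2, B = 3, C = 4, D = 5;

    /* Single accumulator: one working register plus a temporary. */
    int acc, temp;
    acc = A;          /* load accumulator with A     */
    acc *= B;         /* multiply by B               */
    temp = acc;       /* store in temp register 1    */
    acc = C;          /* load accumulator with C     */
    acc *= D;         /* multiply by D               */
    acc += temp;      /* add temp storage            */

    /* Two accumulators: no temporary store/load needed. */
    int acc1 = A, acc2 = C;
    acc1 *= B;        /* accumulator 1 holds A*B     */
    acc2 *= D;        /* accumulator 2 holds C*D     */
    acc1 += acc2;     /* add the accumulators        */

    printf("single: %d, double: %d\n", acc, acc1);  /* both print 26 */
    return 0;
}
```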
**General-Purpose Registers:** perform a variety of functions. They can serve as temporary storage for data or addresses, or they can be made to perform the functions of program counters, accumulators, etc.

**Index Registers:** normally used for addressing. The contents of an index register are added to the memory address that an instruction would otherwise use; the sum is the actual address of the data, or effective address. The same instruction can therefore handle different addresses when the contents of the index register are changed. Index registers allow easy transfer of data from one place to another and easy access to data held in arrays in memory.

**Condition Code or Status Registers:** indicators (flags) that represent the state or present condition inside the computer system. The flags are the basis of decision-making. Different computers have different numbers and types of flags depending on their application area and technology. Among the common flags are Carry, Zero, Overflow, Sign, Parity, and Interrupt Enable. Newer computers have several flags. The common ones are set as follows:

CARRY = 1 if the last operation generated a carry from the most significant bit. The carry flag retains a single bit of information so that a carry can be handed on from one word to the next, as in multiple-precision arithmetic.

ZERO = 1 if the result of the last arithmetic operation was zero.

OVERFLOW = 1 if the last operation produced a two's-complement overflow. The overflow bit shows whether the result of an arithmetic operation has exceeded the capacity of a word.

SIGN = 1 if the most significant bit of the result of the last operation was 1 (i.e. the result was negative). It is useful in arithmetic and in examining the bits within a word.

PARITY = 1 if the number of one bits in the result of the last operation was even (even parity), and 0 otherwise (odd parity). It is useful in character manipulation and communications.

HALF CARRY = 1 if the last operation generated a carry from the lower half of the word.

INTERRUPT ENABLE = 1 if interrupts are allowed, 0 if not.

**Stack Pointer:** a register that contains the address of the top of a stack.

**Arithmetic Units**

The arithmetic part of a control section varies from a simple adder to a complex unit that performs many arithmetic and logical functions. Modern computers have a binary adder that produces a binary sum and a carry bit. Subtraction is performed using two's-complement form, while multiplication and division are performed by repeated additions and subtractions. Extra circuitry can form other status indicators, such as a zero indicator.

**Single-Chip Arithmetic Logic Units (ALUs)** consist of a binary adder and other logical circuitry. An ALU performs arithmetic operations such as:

- addition
- subtraction
- multiplication
- division

and logical operations such as:

- logical AND
- logical (inclusive) OR
- complement
- clear
- increment (add 1)
- decrement (subtract 1)
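A one-chip ALU can be pictured as a function from an operation code and two operands to a result plus a carry flag. The following C sketch is illustrative only: the opcode names and the 8-bit width are assumptions, not any particular chip's encoding.

```c
/* A toy 8-bit ALU: an opcode selects among a few arithmetic and
 * logical operations; a 9th bit captures the carry out. */
#include <stdint.h>
#include <stdio.h>

enum alu_op { ALU_ADD, ALU_SUB, ALU_AND, ALU_OR, ALU_NOT, ALU_INC, ALU_DEC };

static uint8_t alu(enum alu_op op, uint8_t a, uint8_t b, int *carry) {
    uint16_t wide = 0;
    switch (op) {
    case ALU_ADD: wide = (uint16_t)a + b;                break;
    case ALU_SUB: wide = (uint16_t)a + (uint8_t)~b + 1;  break; /* two's complement */
    case ALU_AND: wide = a & b;                          break;
    case ALU_OR:  wide = a | b;                          break;
    case ALU_NOT: wide = (uint8_t)~a;                    break; /* complement */
    case ALU_INC: wide = (uint16_t)a + 1;                break;
    case ALU_DEC: wide = (uint16_t)a + 0xFF;             break; /* a - 1 mod 256 */
    }
    *carry = (wide > 0xFF);       /* extra circuitry: the carry flag */
    return (uint8_t)wide;
}

int main(void) {
    int carry;
    uint8_t sum = alu(ALU_ADD, 200, 100, &carry);
    printf("200 + 100 = %u, carry = %d\n", sum, carry); /* 44, carry 1 */
    return 0;
}
```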
**Instruction Handling Areas**

The CPU must translate each instruction it obtains from memory into the control signals that produce the desired actions. Consider, for example, the instruction: add R1 and R2 and place the result in R3. If the computer is organized as shown in the figure, with temporary registers feeding the adder, then:

(1) the contents of R1 are placed in temporary register 1;
(2) the contents of R2 are placed in temporary register 2;
(3) the addition is performed and the result is placed in R3.

The CPU must also obtain the instruction from memory, place it in the instruction register and prepare to fetch the next instruction. The CPU hence performs an instruction cycle as follows:

- The CPU places the program counter (PC) on the memory address bus in order to fetch the instruction.
- The CPU increments the PC so as to be ready to fetch the next instruction.
- The CPU takes the instruction from the memory data bus and places it in the instruction register.
- The CPU decodes and executes the instruction.

The figures illustrate the execution of the instruction ADD R1, R2 with the result placed in R3.

**The Memory**

The central memory units store the programs currently being executed by the computer, and firmware. Firmware consists of permanently stored programs that are necessary for the user to operate the computer system. In microcomputers, two types of memory devices are employed.

**ROM (Read-Only Memory)** is permanent storage that can only be read, i.e. program instructions can only be copied from ROM; no information can be written into it. Program instructions are "burnt" into its store during manufacturing, or possibly by the user. It is employed to store permanently required programs, the firmware: for example, monitor programs, which control the operation of the microcomputer system and allow users to run applications, to input and output data, and to examine and monitor main memory; and dedicated programs such as industrial control programs. ROM is non-volatile, i.e. it does not lose information when the power supply is removed. Among the several uses of ROM are program storage, function tables, clock generation, code-conversion tables, character-generation tables, mathematical or unit conversion, linearization tables, random-function generators, division and multiplication tables, instruction or diagnostic decoders, pattern generators, diagnostic messages, and teaching aids.

**Random Access Memory (RAM)**, normally called read/write memory, is designed to have information written into it and read out of it. In a RAM, each word is accessed or retrieved in the same amount of time, i.e. the access time is independent of the position of the word in memory. It can be used for storing application programs and for storing intermediate results during program execution. Information stored in RAM can be read or modified. RAM is volatile: data stored in it is lost when the power supply is removed.

**Advantages of ROMs and RAMs**

For ROMs:

- ROMs are non-volatile
- ROMs are cheaper than RAMs
- The contents of ROM are always known and can be verified
- ROMs are easy to test
- ROMs are more reliable than RAMs because their circuitry is simpler
- ROMs are static and do not require refresh
- ROMs are easier to interface than RAMs
- ROMs cannot be accidentally changed

For RAMs:

- RAMs can be updated or corrected
- RAMs can serve as temporary data storage
- RAMs do not require lead time as ROMs do
- RAMs do not require programming equipment as PROMs do

**Programmable Read-Only Memory (PROM):** the only difference between a PROM and a ROM is that the user, rather than the manufacturer, determines its contents.

**Erasable PROM (EPROM):** its contents are erased by removing the chip from the circuit board and exposing it to high-intensity short-wave ultraviolet light for 10 to 20 minutes.

**Electrically Alterable ROM (EAROM):** unlike an EPROM, it can be reprogrammed electrically without removal from the circuit board, and its contents can be changed in a matter of milliseconds. It has the features of non-volatility and non-destructive read-out.

**Specific Features of Microprocessor Architecture**

This concerns the registers, arithmetic units and instruction-decoding mechanisms of microprocessors. Microprocessor registers differ from those of larger computers for the following reasons:

1. Limited chip size.
2. Use of read-only program memory: the processor cannot save addresses or data in program memory.
3. Limited read/write memory.
4. Short word length: a memory address may occupy several data words.
5. Interrupt-driven operation.

**The following are the architectural features of microprocessors:**

- They have several general-purpose registers. For example, the Fairchild F8 has 64 general-purpose registers and the Intel 8080 has 6; a few, like the Motorola 6800, have none.
- Almost every microprocessor has a single accumulator.
- Almost all microprocessors have a stack for saving subroutine return addresses.
- Some have special features that involve the use of a different set of registers during interrupt service, e.g. the Intel 4040 and Signetics 2650. Microprocessors with only a few registers (e.g. the Motorola 6800) can respond to interrupts quickly, since they have little to save.
- The short word length of microprocessors makes the handling of addresses difficult. Registers may alleviate this problem through varied register lengths: some registers are often longer than the normal word length of the processor, particularly program counters, memory address registers, stack pointers, index registers, and return-address stacks. Special instructions may be available to load and manipulate their contents.
- Most microprocessors have an arithmetic unit with a simple bus structure. Generally, one operand is an accumulator and the other operand is a temporary register; the result is sent to the accumulator.
- Many microprocessors have special read-only memory or other circuitry to perform binary-coded decimal (BCD) addition and BCD subtraction.

**INSTRUCTION FORMATS**

An instruction manipulates stored data, and a sequence of instructions constitutes a program. In general, an instruction has two components:

1. the operation-code (op-code) field;
2. the address field.

The op-code field specifies how data is to be manipulated. A data item may reside either in a microprocessor register or in memory; the purpose of the address field is to indicate the address of the data item. Some operations require data stored in two or more locations, i.e. an instruction may contain more than one address. For example, consider the instruction

ADD R1, R0

where ADD is the op-code field and R1 (the source register) and R0 (the destination register) form the address field. The instruction adds the contents of registers R1 and R0 and saves the sum in register R0.
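In the machine, the op-code and address fields occupy fixed bit positions of the instruction word. The following C sketch packs an assumed 8-bit op-code and two 4-bit register numbers into a 16-bit word; the field widths and the ADD encoding are invented for illustration, not taken from a real machine.

```c
/* Packing ADD R1, R0 into one hypothetical 16-bit instruction word:
 * 8-bit op-code | 4-bit source register | 4-bit destination register. */
#include <stdint.h>
#include <stdio.h>

#define OP_ADD 0x01   /* assumed op-code for ADD */

static uint16_t encode(uint8_t opcode, uint8_t src, uint8_t dst) {
    return (uint16_t)(opcode << 8 | (src & 0xF) << 4 | (dst & 0xF));
}

int main(void) {
    uint16_t word = encode(OP_ADD, 1, 0);                 /* ADD R1, R0 */
    printf("instruction word: 0x%04X\n", (unsigned)word); /* 0x0110 */
    printf("op-code: 0x%02X, src: R%u, dst: R%u\n",
           (unsigned)(word >> 8),
           (unsigned)((word >> 4) & 0xF),
           (unsigned)(word & 0xF));
    return 0;
}
```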
The number and types of instructions supported by a microprocessor vary from one machine to another, depending primarily on the architecture of the machine in use. There are different types of instruction format, depending on the number of addresses specified:

- three-address format
- two-address format
- one-address format
- zero-address format

**Three-Address Format:** takes the general form

(op-code) Addr1, Addr2, Addr3

Typical three-address instructions are specified as follows:

MUL A, B, C ; C ← A \* B
ADD A, B, C ; C ← A + B
SUB R1, R2, R3 ; R3 ← R1 - R2

The result of an operation is always assumed to be saved in the destination address. This format is found, for example, on 32-bit microprocessors such as the Intel 80386.

**Two-Address Format:** general form

(op-code) Addr1, Addr2

Typical two-address instructions are listed as follows:

MOV A, R1 ; R1 ← A
ADD C, R2 ; R2 ← R2 + C
SUB R1, R2 ; R2 ← R2 - R1

**One-Address Format:** general form

(op-code) Addr

Its typical instructions are as follows:

LDA B ; Acc ← B
ADD C ; Acc ← Acc + C
STA E ; E ← Acc

**Zero-Address Format:** general form

(op-code)

These are instructions that do not require any address. Examples are NOP, DEC and INC instructions.

**ADDRESSING MODES**

The sequence of operations that a microprocessor carries out while executing an instruction is called its instruction cycle. The most important activity in an instruction cycle is the determination of the operand and operation addresses; the manner in which the microprocessor accomplishes this task is called its addressing mode. The different modes are described below.

**Inherent or Implied Addressing:** the op-code itself indicates the address of the operand, which is usually a microprocessor register, e.g. the instruction CMA (complement accumulator register). The microprocessor does not have to compute the operand address. This mode is very common on 8-bit microprocessors such as the 8085 and Z80.

**Immediate Addressing:** occurs when an instruction contains the operand value itself, e.g. the instruction

ADD #16, R0 ; R0 ← R0 + 16

The symbol # indicates an immediate-mode instruction; the MC68000 is an example of a microprocessor employing it. The machine representation of this instruction occupies two consecutive memory words, the first holding the op-code and the next holding the value (i.e. 16), so the microprocessor accesses memory twice.

**Absolute or Direct Addressing:** occurs when an instruction contains the address of the operand, e.g.

MOVE 2000, R0 ; R0 ← M(2000)

This instruction copies the contents of memory location 2000 into the microprocessor register R0.

**Indirect Addressing:** exists when an instruction contains the address of an address of the operand, e.g.

MOVE (2000), R0 ; R0 ← M(M(2000))

**Register Addressing:** exists when an instruction contains a register address as opposed to a memory address, indicating that the operand values are held in microprocessor registers, e.g.

ADD R1, R0 ; R0 ← (R1) + (R0)

In this mode, the effective address (EA) of an operand is a microprocessor register; since there are only a few registers, only a few bits are needed to address them, which keeps such instructions short.
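A toy simulation makes the differences between these modes concrete. In the following C sketch, the memory contents, register values and function names are all invented for illustration; each function returns the operand that an instruction using that mode would fetch (the indexed mode shown last is discussed next).

```c
/* Operand fetch under five addressing modes, on a toy machine. */
#include <stdio.h>

static int mem[4096];   /* toy memory        */
static int reg[8];      /* toy register file */

static int immediate(int value)    { return value; }            /* operand in the instruction */
static int direct(int addr)        { return mem[addr]; }        /* instruction holds address  */
static int indirect(int addr)      { return mem[mem[addr]]; }   /* address of an address      */
static int register_mode(int r)    { return reg[r]; }           /* operand in a register      */
static int indexed(int ra, int x)  { return mem[ra + reg[x]]; } /* EA = RA + index register   */

int main(void) {
    mem[2000]  = 3000;   /* pointer for the indirect example */
    mem[3000]  = 77;
    mem[0x0102] = 55;    /* target of the indexed example    */
    reg[1] = 42;
    reg[2] = 0x0002;     /* index register X                 */

    printf("immediate #16   -> %d\n", immediate(16));       /* 16 */
    printf("direct 3000     -> %d\n", direct(3000));        /* 77 */
    printf("indirect (2000) -> %d\n", indirect(2000));      /* 77 */
    printf("register R1     -> %d\n", register_mode(1));    /* 42 */
    printf("indexed 0100(X) -> %d\n", indexed(0x0100, 2));  /* 55 */
    return 0;
}
```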
An important concept used in the context of addressing modes is the idea of address modification. In this approach, the EA of an operand is expressed as the sum of two parameters, a reference address (RA) and a modifier (M); more formally, EA = RA + M. The modifier M is also called the offset or displacement. This address-modification principle is the basic concept behind the following addressing modes:

- indexed mode
- base register mode
- relative mode

**Indexed Mode:** the value of RA is included in the instruction and a register contains the modifier M. This register, X, is called the index register, and the mode is useful in accessing arrays, e.g.

LDA 0100(X) ; EA = RA + (X) = 0100 + 0002 = 0102, where (X) = 0002

**Base Register Mode:** the parameter RA is held in a separate register called the base register, and the modifier is included in the instruction. This mode is very significant in systems that provide virtual-memory support, and it is applied in segmented memory systems, where the base register holds the base address of a segment. In this mode, the number of bits in the modifier field M may be less than the number of bits required for a direct memory reference.

**Relative Addressing Mode:** achieved when the program counter is used as the base register. It is highly useful for designing short branch instructions. By contrast, the Z80 absolute branch instruction JP 0248H (H denotes hexadecimal) requires 3 bytes in its machine representation: 1 byte for the op-code and 2 bytes for the branch address (0248); a PC-relative branch can encode its target in a single displacement byte.

**Instruction types** (assumed to have been taught in CSE 308, Assembly Programming Language).

**PIPELINED COMPUTERS**

Pipelining is one of the techniques employed to greatly improve CPU speed. It is the addition of parallel elements to the computer's arithmetic and control elements so that several instructions can be worked on in parallel, increasing the throughput of the computer. The basic idea is to begin carrying out a new instruction before execution of the old one is completed. The opportunity for pipelining shows up in the register-transfer charts for addition, subtraction and other operations, where each step involved is outlined, and in the microprogram listings, which show the microinstructions that must be performed to carry out an instruction. A possible list of the steps carried out for an instruction is as follows:

1. Fetch the instruction: read the instruction word from memory.
2. Decode the instruction: identify the instruction from its op-code and determine what control signals are required, and when.
3. Generate operand addresses: determine the addresses of the operands to be used.
4. Fetch operands: access the operands to be used.
5. Execute the instruction: perform the required operation (addition, subtraction, etc.).

The figure shows the block diagram of a pipeline for the operations outlined. A new instruction word is read into the fetch-instruction section each clock period. This instruction word is passed to the decode-instruction section during the next clock period, while a new instruction is taken in sequence from memory. Each block shown is implemented separately using gates and flip-flops, and results are passed from block to block over the connections between stages. Control signals are generated so that each instruction is properly executed as it passes through the pipeline, and all the details of the decoded instruction (op-code, operand values, etc.) must be passed along each stage of the pipeline.
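The overlap can be seen in a minimal C simulation of these five stages, assuming one clock period per stage and ignoring branches and operand dependencies (treated below). Instructions are just labels here; each clock, every partially processed instruction advances one stage while a new one is fetched, so once the pipeline is full, one instruction completes per clock even though each spends five clocks in flight.

```c
/* A toy five-stage pipeline trace. */
#include <stdio.h>

#define STAGES 5

static const char *stage_name[STAGES] = {
    "fetch", "decode", "gen-addr", "operands", "execute"
};

int main(void) {
    const char *program[] = { "I1", "I2", "I3", "I4", "I5", "I6" };
    const int n = 6;
    const char *in_stage[STAGES] = { NULL };  /* instruction held by each stage */
    int next = 0;

    for (int clock = 1; clock <= n + STAGES - 1; clock++) {
        for (int s = STAGES - 1; s > 0; s--)  /* advance the pipeline */
            in_stage[s] = in_stage[s - 1];
        in_stage[0] = (next < n) ? program[next++] : NULL;

        printf("clock %2d:", clock);
        for (int s = 0; s < STAGES; s++)
            printf("  %s=%s", stage_name[s], in_stage[s] ? in_stage[s] : "--");
        printf("\n");
    }
    return 0;
}
```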
However, instructions such as floating-point instructions, multiplication and division, which require more than the normal number of instruction steps, are accommodated by adding special sections for these operations; the control logic then sees that subsequent instructions are delayed until the operations are completed. Other problems encountered are:

(1) BRANCH instructions, which cause the order of the instructions taken from memory to vary from the "normal" sequence.
(2) Operand dependencies, which arise when an instruction modifies an operand to be used by a following instruction.

BRANCH instructions are accommodated by delaying the reading of new instructions until the BRANCH instruction has been executed, which normally slows the system down. Operand modification is dealt with by simply delaying the execution of the following instructions until the new values have been formed. Because of the enhanced speed and additional throughput, pipelines have become a component of almost every new machine, including most microprocessors.

**INTERRUPTS**

An interrupt is the temporary transfer of control from a currently running program to an interrupt service routine as the result of an externally or internally generated request; control returns to the program that was running after the service routine has been executed. (An interrupt procedure resembles a subroutine call.) As just mentioned, interrupts can be externally generated (by I/O devices, for example) or internally generated (by the CPU itself; an example of the latter occurs when a DIVIDE instruction attempts to divide a number by 0). Internally generated interrupts are sometimes called traps.

After an interrupt occurs, the CPU must be returned to its state as of the time the interrupt occurred. The state of the CPU includes the current contents of the program counter, the contents of all processor registers and the contents of all status registers. These must be stored before the interrupt is serviced, and restored by the interrupt service routine (or the user) before operation of the program can be resumed.
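Software signals give a rough, runnable analogy. In this minimal sketch, assuming a POSIX system, pressing Ctrl-C plays the role of an external interrupt request: the operating system saves the program's state, runs the service routine, and resumes the program where it left off, much as hardware and the service routine must do on a real CPU.

```c
/* An interrupt analogy using a POSIX signal handler. */
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static volatile sig_atomic_t interrupts = 0;

/* The "interrupt service routine": keep it short, set a flag, return. */
static void service_routine(int signo) {
    (void)signo;
    interrupts++;
}

int main(void) {
    signal(SIGINT, service_routine);  /* install the handler */
    for (int i = 0; i < 10; i++) {
        printf("working... (interrupts so far: %d)\n", (int)interrupts);
        sleep(1);                     /* press Ctrl-C to "interrupt" */
    }
    return 0;
}
```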
**REDUCED INSTRUCTION SET COMPUTER AND COMPLEX INSTRUCTION SET COMPUTER ARCHITECTURES**

Computer architectures can be divided into two basic types: the Reduced Instruction Set Computer (RISC) and the Complex Instruction Set Computer (CISC). The earliest computers had only a few simple instructions, because the early vacuum-tube and semiconductor technologies were too expensive and bulky, and generated too much heat, to allow complicated instruction repertoires. As semiconductor technology improved, computer designers began adding more instructions, and these instructions became more complex. The VAX and IBM 370 are good illustrations of such instruction sets and of CISC architecture. The result was that the number of different instructions implemented in computers became large, although many of the more complex instructions were rarely used in actual programs.

The basic idea behind CISCs is that when a single instruction does the work of several, fewer instruction words need be read from memory; this reduces trips to memory, saving memory and speeding up operation. There are, however, some negative side effects of large instruction repertoires:

(1) Each instruction added requires additional electronics, particularly when a pipeline is used.
(2) As a result of the added electronics, the chip area increases, the signal paths become longer, and more delay is encountered.
(3) The extra electronics also generate heat, which also slows down instruction operation.

As a result of the complications experienced with CISCs, RISC was designed. RISCs use only a small set of carefully chosen instructions, each of which is kept as simple as possible. In this way, the processor can be implemented with a small chip area and can execute these few instructions rapidly. RISC architectures run many programs faster than CISCs because of their noteworthy characteristics, which are outlined below:

1. The instruction set is simple, and instruction words are similar in construction.
2. There are few addressing modes.
3. There is a memory hierarchy that includes cache memory, and an attempt is made to execute an instruction word each clock cycle.
4. The processor is pipelined.
5. Control is implemented in gates and flip-flops; it is not microprogrammed.
6. The registers in which operands are stored tend to be numerous, and arithmetic instructions are register-to-register.

The compilers that translate RISC programs often optimize the machine code they generate, sometimes even adding instructions to prevent problems or slowdowns that might occur because of the limited instruction repertoire and the pipeline construction.

**Microprogramming** is the writing of the control instructions that decode and execute computer instructions. Microprogramming replaces hardware with software, providing flexibility at the cost of speed.

**PARALLEL COMPUTING**

**Von Neumann Architecture**

For over 40 years, virtually all computers have followed a common machine model known as the von Neumann computer, named after the Hungarian mathematician John von Neumann. A von Neumann computer uses the stored-program concept: the CPU executes a stored program that specifies a sequence of read and write operations on the memory.

**Von Neumann Computer Architecture**

Its basic design includes the following: memory is used to store both program and data instructions; program instructions are coded data which tell the computer what to do; data is simply information to be used by the program. The central processing unit (CPU) gets instructions and/or data from memory, decodes the instructions and then *sequentially* performs them.

**Flynn's Classical Taxonomy**

There are different ways to classify parallel computers. One of the more widely used classifications, in use since 1966, is Flynn's taxonomy. It distinguishes multi-processor computer architectures along the two independent dimensions of *Instruction* and *Data*, each of which can have only one of two possible states: *Single* or *Multiple*. The matrix below defines the four possible classifications according to Flynn.
**Flynn's Classical Taxonomy**

| **S I S D**: Single Instruction, Single Data | **S I M D**: Single Instruction, Multiple Data |
|---|---|
| **M I S D**: Multiple Instruction, Single Data | **M I M D**: Multiple Instruction, Multiple Data |

**Single Instruction, Single Data (SISD):** a serial (non-parallel) computer in which only one instruction stream is acted on by the CPU during any one clock cycle, and only one data stream is used as input during any one clock cycle. Execution is deterministic. This is the oldest and, until recently, the most prevalent form of computer. Examples: most PCs, single-CPU workstations and mainframes.

**Chart of Single Instruction, Single Data (SISD)**

**Single Instruction, Multiple Data (SIMD):** a type of parallel computer in which all processing units execute the same instruction at any given clock cycle, while each processing unit can operate on a different data element. This type of machine typically has an instruction dispatcher, a very high-bandwidth internal network, and a very large array of very small-capacity instruction units. It is best suited for specialized problems characterized by a high degree of regularity, such as image processing. It displays synchronous (lockstep) and deterministic execution. There are two varieties: processor arrays and vector pipelines. Examples: processor arrays: Connection Machine CM-2, MasPar MP-1, MP-2; vector pipelines: IBM 9000, Cray C90, Fujitsu VP, NEC SX-2, Hitachi S820.

**Schematic of Single Instruction, Multiple Data (SIMD)**

**Multiple Instruction, Single Data (MISD):** a single data stream is fed into multiple processing units, each of which operates on the data independently via an independent instruction stream. Few actual examples of this class of parallel computer have ever existed; one is the experimental Carnegie-Mellon C.mmp computer (1971). Some conceivable uses might be multiple frequency filters operating on a single signal stream, or multiple cryptography algorithms attempting to crack a single coded message.

**Figure 2.6: Schematic of Multiple Instruction, Single Data (MISD)**

**Multiple Instruction, Multiple Data (MIMD):** currently the most common type of parallel computer; most modern computers fall into this category. Every processor may be executing a different instruction stream, and every processor may be working with a different data stream. Execution can be synchronous or asynchronous, deterministic or non-deterministic. Examples: most current supercomputers, networked parallel computer "grids" and multi-processor SMP computers, including some types of PCs.

**Figure 2.7: Representation of Multiple Instruction, Multiple Data (MIMD)**

**Parallel Computer Memory Architectures**

**Shared Memory**

Shared-memory parallel computer architectures vary widely, but they generally have in common the ability for all processors to access all memory as a global address space. Multiple processors can operate independently while sharing the same memory resources, and changes in a memory location effected by one processor are visible to all other processors.
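Here is a minimal sketch of the shared-memory model using POSIX threads (compile with `cc -pthread`): two "processors" (threads) operate independently on one global address space, and an update made by either is visible to both. The mutex provides the synchronization construct that, as noted below, is the programmer's responsibility.

```c
/* Two threads incrementing one shared counter. */
#include <pthread.h>
#include <stdio.h>

static long counter = 0;  /* lives in the shared (global) address space */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);   /* "correct" access to global memory */
        counter++;                   /* visible to every other thread     */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld\n", counter);  /* 200000: both saw the same memory */
    return 0;
}
```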
**Chart of Shared Memory**

Shared-memory machines can be divided into two main classes based upon memory access times: *UMA* and *NUMA*.

**Uniform Memory Access (UMA):** most commonly represented today by symmetric multiprocessor (SMP) machines, with identical processors and equal access and access times to memory. Sometimes called CC-UMA (cache-coherent UMA): cache coherent means that if one processor updates a location in shared memory, all the other processors know about the update. Cache coherency is accomplished at the hardware level.

**Non-Uniform Memory Access (NUMA):** often made by physically linking two or more SMPs, so that one SMP can directly access the memory of another. Not all processors have equal access time to all memories; memory access across the link is slower. If cache coherency is maintained, it may also be called CC-NUMA (cache-coherent NUMA).

**Advantages:** the global address space provides a user-friendly programming perspective on memory, and data sharing between tasks is both fast and uniform owing to the proximity of memory to the CPUs.

**Disadvantages:** the primary disadvantage is the lack of scalability between memory and CPUs. Adding more CPUs can geometrically increase traffic on the shared memory-CPU path and, for cache-coherent systems, geometrically increase the traffic associated with cache/memory management. The programmer is responsible for the synchronization constructs that ensure "correct" access to global memory. It also becomes increasingly difficult and expensive to design and produce shared-memory machines with ever-increasing numbers of processors.

**Distributed Memory**

Like shared-memory systems, distributed-memory systems vary widely, but they share a common characteristic: distributed-memory systems require a communication network to connect inter-processor memory. Processors have their own local memory, and memory addresses in one processor do not map to another processor, so there is no concept of a global address space across all processors. Because each processor has its own local memory, it operates independently; changes it makes to its local memory have no effect on the memory of other processors, and hence the concept of cache coherency does not apply. When a processor needs access to data in another processor, it is usually the task of the programmer to explicitly define how and when the data is communicated. Synchronization between tasks is likewise the programmer's responsibility. The network "fabric" used for data transfer varies widely, though it can be as simple as Ethernet.

**Schematic of Distributed Memory**

**Advantages:** memory is scalable with the number of processors: increase the number of processors and the size of memory increases proportionately. Each processor can rapidly access its own memory without interference and without the overhead incurred in trying to maintain cache coherency. It is cost-effective: commodity, off-the-shelf processors and networking can be used.

**Disadvantages:** the programmer is responsible for many of the details associated with data communication between processors. It may be difficult to map existing data structures, based on global memory, to this memory organization. Memory access times are non-uniform (NUMA).
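A minimal message-passing sketch in MPI shows the contrast with the threads example above: each process has only its own local memory, so rank 0 must explicitly send data over the network fabric for rank 1 to see it. This assumes an MPI installation; compile with `mpicc` and run with, e.g., `mpirun -np 2 ./a.out`.

```c
/* Explicit communication between two distributed-memory processes. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;                       /* exists only in rank 0's memory */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);      /* data arrives only by message */
        printf("rank 1 received %d from rank 0\n", value);
    }

    MPI_Finalize();
    return 0;
}
```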
**Hybrid Distributed-Shared Memory**

Summarizing a few of the key characteristics of shared- and distributed-memory machines:

**Comparison of Shared and Distributed Memory Architectures**

| Architecture | CC-UMA | CC-NUMA | Distributed |
|---|---|---|---|
| Examples | SMPs, Sun Vexx, DEC/Compaq, SGI Challenge, IBM POWER3 | SGI Origin, Sequent, HP Exemplar, DEC/Compaq, IBM POWER4 (MCM) | Cray T3E, Maspar, IBM SP2 |
| Communications | MPI, Threads, OpenMP, shmem | MPI, Threads, OpenMP, shmem | MPI |
| Scalability | to 10s of processors | to 100s of processors | to 1000s of processors |
| Drawbacks | Memory-CPU bandwidth | Memory-CPU bandwidth, non-uniform access times | System administration; programming is hard to develop and maintain |
| Software Availability | many 1000s of ISVs | many 1000s of ISVs | 100s of ISVs |

The largest and fastest computers in the world today employ both shared and distributed memory architectures.

**Representation of Hybrid Distributed-Shared Memory**

The shared-memory component is usually a cache-coherent SMP machine: processors on a given SMP can address that machine's memory as global. The distributed-memory component is the networking of multiple SMPs: an SMP knows only about its own memory, not the memory on another SMP, so network communications are required to move data from one SMP to another. Current trends seem to indicate that this type of memory architecture will continue to prevail and to grow at the high end of computing for the foreseeable future.