Please read this disclaimer before proceeding: This document is confidential and intended solely for the educational purpose of RMK Group of Educational Institutions. If you have received this document through email in error, please notify the system manager. This document contains proprietary information and is intended only for the respective group / learning community as intended. If you are not the addressee you should not disseminate, distribute or copy through e-mail. Please notify the sender immediately by e-mail if you have received this document by mistake and delete this document from your system. If you are not the intended recipient you are notified that disclosing, copying, distributing or taking any action in reliance on the contents of this information is strictly prohibited.

22CS302 COMPUTER ORGANIZATION AND ARCHITECTURE

Department: CSE
Batch/Year: BATCH 2023-27/II
Created by:
Ms. K. Ramya Devi, Assistant Professor
Ms. S. Sajithra, Assistant Professor
Ms. J. Latha, Assistant Professor

Table of Contents
1. Contents
2. Course Objectives
3. Pre Requisites (Course Name with Code)
4. Syllabus (With Subject Code, Name, LTPC details)
5. Course Outcomes (6)
6. CO-PO/PSO Mapping
7. Lecture Plan (S.No., Topic, No. of Periods, Proposed Date, Actual Lecture Date, pertaining CO, Taxonomy Level, Mode of Delivery)
8. Activity Based Learning
9. Lecture Notes (with links to videos, e-book references, PPTs, quizzes and any other learning materials)
10. Assignments (for higher-level learning and evaluation; examples: case study, comprehensive design, etc.)
11. Part A Q & A (with K level and CO)
12. Part B Qs (with K level and CO)
13. Supportive Online Certification Courses (NPTEL, Swayam, Coursera, Udemy, etc.)
14. Real-time Applications in Day-to-day Life and to Industry
15. Contents Beyond the Syllabus (COE-related value-added courses)
16. Assessment Schedule (Proposed Date & Actual Date)
17. Prescribed Text Books & Reference Books
18. Mini Project

Course Objectives
- To describe the basic principles and operations of digital computers.
- To design arithmetic and logic units for various fixed and floating point operations.
- To construct pipeline architectures for RISC processors.
- To explain various memory systems & I/O interfacings.
- To discuss parallel processor and multi-processor architectures.

Prerequisites
- 22EC101 Digital Principles and Systems Design

Syllabus
22CS302 COMPUTER ORGANIZATION AND ARCHITECTURE (L T P C: 3 0 0 3)
OBJECTIVES:
- Describe the basic principles and operations of digital computers.
- Design arithmetic and logic units for various fixed and floating point operations.
- Construct pipeline architectures for RISC processors.
- Explain various memory systems & I/O interfacings.
- Discuss parallel processor and multi-processor architectures.
UNIT I COMPUTER FUNDAMENTALS (9 periods)
Computer Types - Functional Units - Basic Operational Concepts - Number Representation and Arithmetic Operations - Performance Measurements - Instruction Set Architecture: Memory Locations and Addresses - Instructions and Instruction Sequencing - Addressing Modes

UNIT II COMPUTER ARITHMETIC (9 periods)
Addition and Subtraction of Signed Numbers - Design of Fast Adders - Multiplication of Unsigned Numbers - Multiplication of Signed Numbers - Fast Multiplication - Integer Division - Floating-Point Numbers and Operations

UNIT III BASIC PROCESSING UNIT AND PIPELINING (9 periods)
Basic Processing Unit: Concepts - Instruction Execution - Hardware Components - Instruction Fetch and Execution Steps - Control Signals - Hardwired Control. Pipelining: Basic Concept - Pipeline Organization - Pipelining Issues - Data Dependencies - Memory Delays - Branch Delays - Resource Limitations - Performance Evaluation - Superscalar Operation.

UNIT IV I/O AND MEMORY (9 periods)
Input/Output Organization: Bus Structure - Bus Operation - Arbitration. The Memory System: Basic Concepts - Semiconductor RAM Memories - Read-only Memories - Direct Memory Access - Memory Hierarchy - Cache Memories - Performance Considerations - Virtual Memory - Memory Management Requirements - Secondary Storage.

UNIT V PARALLEL PROCESSING AND MULTICORE COMPUTERS (9 periods)
Parallel Processing: Use of Multiple Processors - Symmetric Multiprocessors - Multithreading and Chip Multiprocessors - Clusters - Nonuniform Memory Access Computers - Vector Computation - Multicore Organization.

TOTAL: 45 PERIODS

Course Outcomes
CO1: Explain the basic principles and operations of digital computers (K2)
CO2: Design arithmetic and logic units to perform fixed and floating-point operations (K3)
CO3: Develop pipeline architectures for RISC processors (K3)
CO4: Summarize various memory systems & I/O interfacings (K4)
CO5: Recognize parallel processor and multiprocessor architectures (K5)
CO6: Explore parallel processing and multicore computers (K4)

Knowledge Levels: K1 Knowledge, K2 Comprehension, K3 Application, K4 Analysis, K5 Synthesis, K6 Evaluation

CO - PO/PSO Mapping Matrix (columns PO1-PO12, PSO1-PSO3; mapped cells per CO as in the original)
CO1: 3 2 1 1 3
CO2: 3 3 2 2 3
CO3: 3 3 3 1 1 3
CO4: 3 3 1 1 3
CO5: 3 3 1 1 3
CO6: 2 2 1 1 3

UNIT I - COMPUTER FUNDAMENTALS

Lecture Plan - Unit 1 - COMPUTER FUNDAMENTALS
(S.No. | Topic | Periods | CO | Taxonomy Level | Mode of Delivery; proposed and actual dates to be filled in)
1 | Computer Types - Functional Units | 1 | CO1 | K2 | Blackboard / ICT Tools
2 | Basic Operational Concepts | 1 | CO1 | K2 | Blackboard / ICT Tools
3 | Number Representation and Arithmetic Operations | 1 | CO1 | K2 | Blackboard / ICT Tools
4 | Performance Measurements | 1 | CO1 | K2 | Blackboard / ICT Tools
5 | Instruction Set Architecture: Memory Locations and Addresses | 1 | CO1 | K2 | Blackboard / ICT Tools
6 | Instruction Set Architecture: Memory Locations and Addresses | 1 | CO1 | K2 | Blackboard / ICT Tools
7 | Instructions and Instruction Sequencing | 1 | CO1 | K4 | Blackboard / ICT Tools
8 | Instructions and Instruction Sequencing | 1 | CO1 | K4 | Blackboard / ICT Tools
9 | Addressing Modes | 1 | CO1 | K4 | Blackboard / ICT Tools

Activity Based Learning
1. Crossword Puzzle - Computer Fundamentals
2. Simulation of Instruction Execution using Little Man Computer

Lecture Notes - Unit 1: COMPUTER FUNDAMENTALS
1. Computer Types
2. Functional Units
3. Basic Operational Concepts
4. Number Representation and Arithmetic Operations
5. Performance Measurements
6. Instruction Set Architecture: Memory Locations and Addresses
7. Instructions and Instruction Sequencing
8. Addressing Modes
1. Computer Types

Modern computers can be divided roughly into four general categories:

Embedded computers are integrated into a larger device or system in order to automatically monitor and control a physical process or environment. They are used for a specific purpose rather than for general processing tasks. Typical applications include industrial and home automation, appliances, telecommunication products, and vehicles. Users may not even be aware of the role that computers play in such systems.

Personal computers have achieved widespread use in homes, educational institutions, and business and engineering office settings, primarily for dedicated individual use. They support a variety of applications such as general computation, document preparation, computer-aided design, audiovisual entertainment, interpersonal communication, and Internet browsing. A number of classifications are used for personal computers:
⮚ Desktop computers serve general needs and fit within a typical personal workspace.
⮚ Workstation computers provide higher computational capacity and more powerful graphical display capabilities for engineering and scientific work.
⮚ Portable and notebook computers provide the basic features of a personal computer in a smaller, lightweight package. They can operate on batteries to provide mobility.

Servers and enterprise systems are large computers that are meant to be shared by a potentially large number of users who access them from some form of personal computer over a public or private network. Such computers may host large databases and provide information processing for a government agency or a commercial organization.

Supercomputers and grid computers normally offer the highest performance. They are the most expensive and physically the largest category of computers. Supercomputers are used for the highly demanding computations needed in weather forecasting, engineering design and simulation, and scientific work.
Grid computers provide a more cost-effective alternative. They combine a large number of personal computers and disk storage units in a physically distributed high-speed network, called a grid, which is managed as a coordinated computing resource. By evenly distributing the computational workload across the grid, it is possible to achieve high performance on large applications ranging from numerical computation to information searching.

2. Functional Units

A computer consists of five functionally independent main parts:
- Input
- Memory
- Arithmetic and Logic Unit (ALU)
- Output
- Control Unit

Program - a list of instructions that performs a task.
Data - numbers and encoded characters that are used as operands by the instructions.

A number, character or instruction is represented as a string of binary digits called bits. Common encodings are binary-coded decimal (BCD) for numbers and, for alphanumeric characters, ASCII (American Standard Code for Information Interchange) and EBCDIC (Extended Binary-Coded Decimal Interchange Code).

I. Input
Reads the data. E.g. keyboard; joysticks, trackballs and mouse (graphical input devices); microphones (capture audio input).

II. Memory
Stores programs and data. There are two classes of storage: primary and secondary.

Primary storage
- Fast memory, but expensive.
- Memory contains a number of semiconductor storage cells, each of which can store one bit of information.
- Cells are handled in groups called words (n bits).
- Memory operations are organized so that the contents of one word can be stored or retrieved in one basic operation.
- Each memory word location has a distinct address. Addresses are numbers that identify successive locations.
- The number of bits in each word is called the word length.
- When the memory is accessed, usually only one word of data is read or written.
- Instructions and data can be written into the memory or read out under the control of the processor.

Random Access Memory (RAM): a memory in which any location can be reached in a short and fixed amount of time after specifying its address.
The time required to access one word is called the memory access time. This time is fixed, independent of the location of the word being accessed.

Computer memory is implemented as a memory hierarchy of three or four levels of RAM units with different speeds and sizes:
- Cache memory: a small, fast RAM unit tightly coupled with the processor, often on the same IC chip, for high performance.
- Main memory: the largest and slowest RAM unit.

Secondary storage is cheaper than primary storage and is used when large amounts of data and programs have to be stored. Examples are magnetic disks and tapes, and optical disks (CD-ROMs).

III. Arithmetic and Logic Unit (ALU)
Most computer operations are executed in the ALU. Arithmetic and logic operations, e.g. multiplication, division or comparison of numbers, are performed by the ALU. The control and arithmetic and logic units are many times faster than other devices. The operands for operations are stored in high-speed storage elements called registers. Each register can store one word of data. Access times to registers are somewhat faster than access times to the fastest cache unit in the memory hierarchy.

IV. Output
Its function is to send processed results to the outside world. E.g. printers, which may use mechanical impact heads, ink-jet streams, or photocopying techniques as in laser printers.

V. Control Unit
The memory, arithmetic and logic, and input and output units store and process information and perform input and output operations. The control unit coordinates these operations and is the nerve centre that sends control signals to other units and senses their states. The timing signals that govern the I/O transfers are generated by the control circuits. Timing signals also control data transfer between the processor and the memory; they are the signals that determine when a given action is to take place. The control unit is a physically separate unit that interacts with other parts of the machine. A large set of control lines (wires) carries the signals used for timing and synchronization of events in all units.
Figure 1: Basic Functional Units of a Computer

The operations of a computer:
1. The computer accepts information in the form of programs and data through an input unit and stores it in the memory.
2. Information stored in memory is fetched, under program control, into an ALU, where it is processed.
3. Processed information leaves the computer through an output unit.
4. All activities inside the machine are directed by the control unit.

3. Basic Operational Concepts

The activity in a computer is governed by instructions. To perform a given task, an appropriate program consisting of a list of instructions is stored in the memory. Individual instructions are brought from the memory into the processor, which executes the specified operations. Data to be used as instruction operands are also stored in the memory.

A typical instruction might be

Load R2, LOC

This instruction reads the contents of a memory location whose address is represented symbolically by the label LOC and loads them into processor register R2. The original contents of location LOC are preserved, whereas those of register R2 are overwritten.

Execution of this instruction requires several steps. First, the instruction is fetched from the memory into the processor. Next, the operation to be performed is determined by the control unit. The operand at LOC is then fetched from the memory into the processor. Finally, the operand is stored in register R2.

After operands have been loaded from memory into processor registers, arithmetic or logic operations can be performed on them. For example, the instruction

Add R4, R2, R3

adds the contents of registers R2 and R3, then places their sum into register R4. The operands in R2 and R3 are not altered, but the previous value in R4 is overwritten by the sum. After completing the desired operations, the results are in processor registers.
They can be transferred to the memory using instructions such as

Store R4, LOC

This instruction copies the operand in register R4 to memory location LOC. The original contents of location LOC are overwritten, but those of R4 are preserved.

For Load and Store instructions, transfers between the memory and the processor are initiated by sending the address of the desired memory location to the memory unit and asserting the appropriate control signals. The data are then transferred to or from the memory.

Figure 2 shows how the memory and the processor can be connected. It also shows some components of the processor that have not been discussed yet. The interconnections between these components are not shown.

In addition to the ALU and the control circuitry, the processor contains a number of registers used for several different purposes. The instruction register (IR) holds the instruction that is currently being executed. Its output is available to the control circuits, which generate the timing signals that control the various processing elements involved in executing the instruction.

The program counter (PC) is another specialized register. It contains the memory address of the next instruction to be fetched and executed. During the execution of an instruction, the contents of the PC are updated to correspond to the address of the next instruction to be executed. It is customary to say that the PC points to the next instruction that is to be fetched from the memory.

In addition to the IR and PC, Figure 2 shows general-purpose registers R0 through Rn−1, often called processor registers. They serve a variety of functions, including holding operands that have been loaded from the memory for processing.

Figure 2: Connection between the processor and the main memory

The processor-memory interface is a circuit which manages the transfer of data between the main memory and the processor.
If a word is to be read from the memory, the interface sends the address of that word to the memory along with a Read control signal. The interface waits for the word to be retrieved, then transfers it to the appropriate processor register. If a word is to be written into memory, the interface transfers both the address and the word to the memory along with a Write control signal.

A program must be in the main memory in order for it to be executed. It is often transferred there from secondary storage through the input unit. Execution of the program begins when the PC is set to point to the first instruction of the program. The contents of the PC are transferred to the memory along with a Read control signal. When the addressed word (in this case, the first instruction of the program) has been fetched from the memory, it is loaded into register IR. At this point, the instruction is ready to be interpreted and executed.

Instructions such as Load, Store, and Add perform data transfer and arithmetic operations. If an operand that resides in the memory is required for an instruction, it is fetched by sending its address to the memory and initiating a Read operation. When the operand has been fetched from the memory, it is transferred to a processor register. After operands have been fetched in this way, the ALU can perform a desired arithmetic operation, such as Add, on the values in processor registers. The result is sent to a processor register. If the result is to be written into the memory with a Store instruction, it is transferred from the processor register to the memory, along with the address of the location where the result is to be stored, then a Write operation is initiated.

During the execution of each instruction, the contents of the PC are incremented so that the PC points to the next instruction to be executed.
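The Load/Add/Store behaviour described above can be sketched as a toy simulation. This is not a real ISA; the memory contents, addresses, and helper names below are all illustrative assumptions:

```python
# Toy sketch of the register-transfer behaviour described above: Load sends
# an address and a Read signal and fills a register; Add operates on register
# operands in the ALU; Store sends an address, a word and a Write signal.
# All values and names here are illustrative, not from any real machine.

memory = {100: 25, 101: 17}            # LOC = 100 holds 25; address 101 holds 17
regs = {f"R{i}": 0 for i in range(8)}  # general-purpose registers R0..R7

def load(rd, addr):
    # Send address to memory, assert Read, transfer the word into register rd.
    regs[rd] = memory[addr]

def store(rs, addr):
    # Transfer the register word and address to memory, assert Write.
    memory[addr] = regs[rs]

def add(rd, rs1, rs2):
    # ALU adds two register operands; only rd is overwritten.
    regs[rd] = regs[rs1] + regs[rs2]

load("R2", 100)         # Load R2, LOC    -> R2 = 25, LOC preserved
load("R3", 101)         # Load R3, 101    -> R3 = 17
add("R4", "R2", "R3")   # Add R4, R2, R3  -> R4 = 42; R2, R3 unchanged
store("R4", 100)        # Store R4, LOC   -> memory[100] = 42, R4 preserved
print(regs["R4"], memory[100])  # 42 42
```

Note how the simulation preserves the source operands in each step, matching the text: Load and Add leave their sources intact, and only the destination register or memory word is overwritten.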
Thus, as soon as the execution of the current instruction is completed, the processor is ready to fetch a new instruction.

In addition to transferring data between the memory and the processor, the computer accepts data from input devices and sends data to output devices. Thus, some machine instructions are provided for the purpose of handling I/O transfers.

Normal execution of a program may be preempted if some device requires urgent service. The device raises an interrupt signal, which is a request for service by the processor. The processor provides the requested service by executing a program called an interrupt-service routine. Because such diversions may alter the internal state of the processor, its state must be saved in the memory before servicing the interrupt request. Normally, the information that is saved includes the contents of the PC, the contents of the general-purpose registers, and some control information. When the interrupt-service routine is completed, the state of the processor is restored from the memory so that the interrupted program may continue.

4. Number Representation and Arithmetic Operations

I. INTEGERS

Three systems are used for representing integer numbers:
- Sign-and-magnitude
- 1's-complement
- 2's-complement

In all three systems, the leftmost bit is 0 for positive numbers and 1 for negative numbers. Positive values have identical representations in all systems, but negative values have different representations. In the sign-and-magnitude system, negative values are represented by changing the sign bit (the most significant bit). In 1's-complement representation, negative values are obtained by complementing each bit of the corresponding positive number. The 2's-complement of a number is obtained by adding 1 to the 1's-complement of that number. There are distinct representations for +0 and −0 in both the sign-and-magnitude and 1's-complement systems, but the 2's-complement system has only one representation for 0.
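The three representations above can be compared in a short sketch. The function name and the 4-bit word size are illustrative assumptions:

```python
# Sketch of the three integer representations described above, applied to -n
# in a 4-bit word: sign-and-magnitude changes only the sign bit,
# 1's-complement complements every bit, and 2's-complement adds 1 to the
# 1's-complement.

def representations(n, bits=4):
    """Return the sign-and-magnitude, 1's-complement and 2's-complement
    encodings of -n, for 0 < n < 2**(bits-1)."""
    pos = format(n, f"0{bits}b")                          # +n, sign bit 0
    sm = "1" + pos[1:]                                    # change only the sign bit
    oc = "".join("1" if b == "0" else "0" for b in pos)   # complement every bit
    tc = format((int(oc, 2) + 1) % (1 << bits), f"0{bits}b")  # 1's-complement + 1
    return sm, oc, tc

print(representations(3))  # ('1011', '1100', '1101')  three encodings of -3
```

For example, −3 in four bits is 1011 in sign-and-magnitude, 1100 in 1's-complement, and 1101 in 2's-complement, while +3 is 0011 in all three systems.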
The 2's-complement system leads to the most efficient way to carry out addition and subtraction operations.

Addition of Unsigned Integers
We add bit pairs starting from the low-order (right) end of the bit vectors, propagating carries toward the high-order (left) end. The carry-out from a bit pair becomes the carry-in to the next bit pair to the left. The carry-in must be added to a bit pair in generating the sum and carry-out at that position.

Addition and Subtraction of Signed Integers
The 2's-complement system is the most efficient method for performing addition and subtraction operations. To add two numbers, add their n-bit representations, ignoring the carry-out bit from the most significant bit (MSB) position. The sum will be the algebraically correct value in 2's-complement representation if the actual result is in the range −2^(n−1) through +2^(n−1) − 1.

To subtract two numbers X and Y, that is, to perform X − Y, form the 2's-complement of Y, then add it to X using the add rule. Again, the result will be the algebraically correct value in 2's-complement representation if the actual result is in the range −2^(n−1) through +2^(n−1) − 1.

Sign Extension
For a negative number in 2's-complement representation, the leftmost bit, which indicates the sign of the number, is a 1. A longer number with the same value is obtained by replicating the sign bit to the left as many times as needed. To represent a signed number in 2's-complement form using a larger number of bits, repeat the sign bit as many times as needed to the left. This operation is called sign extension.

Overflow in Integer Arithmetic
When the actual result of an arithmetic operation is outside the representable range, an arithmetic overflow has occurred. When adding unsigned numbers, a carry-out of 1 from the most significant bit position indicates that an overflow has occurred.
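The rules above — bit-pair addition with carry propagation, subtraction as "add the 2's-complement of Y", and sign extension — can be sketched directly on bit strings. The helper names are illustrative assumptions:

```python
# Sketch of the 2's-complement rules above. Addition works on bit pairs from
# the low-order end, propagating the carry; the final carry-out of the MSB
# position is ignored, as the add rule specifies. Subtraction forms the
# 2's-complement of Y and adds. Sign extension replicates the sign bit.

def add_bits(x_bits, y_bits):
    """Add two equal-length bit strings; the MSB carry-out is discarded."""
    carry, out = 0, []
    for a, b in zip(reversed(x_bits), reversed(y_bits)):
        s = int(a) + int(b) + carry   # bit pair plus carry-in
        out.append(str(s % 2))        # sum bit at this position
        carry = s // 2                # carry-out to the next pair
    return "".join(reversed(out))

def complement(bits):
    """2's-complement: complement every bit (1's-complement), then add 1."""
    ones = "".join("1" if b == "0" else "0" for b in bits)
    return add_bits(ones, "0" * (len(bits) - 1) + "1")

def sign_extend(bits, n):
    """Replicate the sign bit to the left out to n bits."""
    return bits[0] * (n - len(bits)) + bits

# X - Y with X = 2 (0010) and Y = 5 (0101) in 4-bit 2's-complement:
diff = add_bits("0010", complement("0101"))
print(diff)                  # 1101, the 4-bit encoding of -3
print(sign_extend(diff, 8))  # 11111101, the same value -3 in 8 bits
```

The result 1101 is algebraically correct because −3 lies within the 4-bit range −2^3 through +2^3 − 1, and sign extension to eight bits preserves the value.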
For signed numbers, overflow may occur only if both summands have the same sign. The addition of numbers with different signs cannot cause overflow because the result is always within the representable range. To detect overflow, examine the signs of the two summands and the sign of the result: when both summands have the same sign, an overflow has occurred if the sign of the sum is not the same as the sign of the summands.

II. FLOATING-POINT NUMBERS

Fixed-point numbers are integers, having an implied binary point at the right end of the number. It is also possible to assume that the binary point is just to the right of the sign bit, thus representing a fraction. In the 2's-complement system, the signed value F represented by the n-bit binary fraction B = b0.b−1b−2...b−(n−1) is

F(B) = −b0 × 2^0 + b−1 × 2^(−1) + b−2 × 2^(−2) + ... + b−(n−1) × 2^(−(n−1))

Consider the range of values representable in a 32-bit, signed, fixed-point format. Interpreted as integers, the value range is approximately ±2.15 × 10^9. If we consider them to be fractions, the range is approximately ±4.55 × 10^(−10) to ±1.

To accommodate both very large integers and very small fractions, a computer must be able to represent numbers and operate on them in such a way that the position of the binary point is variable and is automatically adjusted as computation proceeds. In such a case, the binary point is said to float, and the numbers are called floating-point numbers. This distinguishes them from fixed-point numbers, whose binary point is always in the same position.

Because the position of the binary point in a floating-point number is variable, it must be given explicitly in the floating-point representation. For example, in the familiar decimal scientific notation, numbers may be written as 6.0247 × 10^23, 6.6254 × 10^(−27), −1.0341 × 10^2, −7.3000 × 10^(−14), and so on. These numbers are said to be given to five significant digits. The scale factors (10^23, 10^(−27), and so on) indicate the position of the decimal point with respect to the significant digits.
By convention, when the decimal point is placed to the right of the first (nonzero) significant digit, the number is said to be normalized. The base, 10, in the scale factor is fixed and does not need to appear explicitly in the machine representation of a floating-point number. A floating-point number representation is one in which a number is represented by its sign, a string of significant digits, commonly called the mantissa, and an exponent to an implied base for the scale factor.

IEEE STANDARD FOR FLOATING-POINT NUMBERS

A useful floating-point form is ±X1.X2X3X4X5X6X7 × 10^(±Y1Y2), where the Xi and Yj are decimal digits. Both the number of significant digits (7) and the exponent range (±99) are sufficient for a wide range of scientific calculations. A 24-bit mantissa can approximately represent a 7-digit decimal number, and an 8-bit exponent to an implied base of 2 provides a scale factor with a reasonable range. One bit is needed for the sign of the number. Since the leading nonzero bit of a normalized binary mantissa must be a 1, it does not have to be included explicitly in the representation. Therefore, a total of 32 bits is needed.

The 32-bit representation is given in Figure 6.24a. The sign of the number is given in the first bit, followed by a representation for the exponent (to the base 2) of the scale factor. Instead of the signed exponent E, the value actually stored in the exponent field is an unsigned integer E' = E + 127. This is called the excess-127 format. Thus, E' is in the range 0 ≤ E' ≤ 255. The end values of this range, 0 and 255, are used to represent special values, as described below. Therefore, the range of E' for normal values is 1 ≤ E' ≤ 254. This means that the actual exponent E is in the range −126 ≤ E ≤ 127. The last 23 bits represent the mantissa. Since binary normalization is used, the most significant bit of the mantissa is always equal to 1.
This bit is not explicitly represented; it is assumed to be to the immediate left of the binary point. Hence, the 23 bits stored in the M field actually represent the fractional part of the mantissa, that is, the bits to the right of the binary point. An example of a single-precision floating-point number is shown in Figure 6.24b.

The 32-bit standard representation in Figure 6.24a is called a single-precision representation because it occupies a single 32-bit word. The scale factor has a range of 2^(−126) to 2^(+127), which is approximately equal to 10^(±38). The 24-bit mantissa provides approximately the same precision as a 7-digit decimal value.

To provide more precision and range for floating-point numbers, the IEEE standard also specifies a double-precision format, as shown in Figure 6.24c. The double-precision format has increased exponent and mantissa ranges. The 11-bit excess-1023 exponent E' has the range 1 ≤ E' ≤ 2046 for normal values, with 0 and 2047 used to indicate special values. Thus, the actual exponent E is in the range −1022 ≤ E ≤ 1023, providing scale factors of 2^(−1022) to 2^(1023) (approximately 10^(±308)). The 53-bit mantissa provides a precision equivalent to about 16 decimal digits.
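The single-precision field layout above can be verified by unpacking the 32-bit pattern of a float. This sketch uses Python's standard `struct` module; the function name is an illustrative assumption:

```python
# Sketch decomposing a 32-bit IEEE single-precision value into the fields
# described above: a sign bit, an 8-bit excess-127 exponent E' = E + 127,
# and a 23-bit fraction field M with an implied leading 1.

import struct

def decode_float32(x):
    # Reinterpret the float's 32-bit pattern as an unsigned integer.
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    sign = word >> 31
    e_prime = (word >> 23) & 0xFF      # stored exponent E' (excess-127)
    m = word & 0x7FFFFF                # 23 stored fraction bits
    e = e_prime - 127                  # actual exponent E
    mantissa = 1 + m / (1 << 23)       # implied leading 1 restored
    return sign, e_prime, e, mantissa

# 6.5 = +1.625 x 2^2, so E = 2 and the stored exponent is E' = 129.
print(decode_float32(6.5))  # (0, 129, 2, 1.625)
```

The stored exponent 129 confirms the excess-127 rule (E' = 2 + 127), and the restored mantissa 1.625 shows the implied leading 1 that is not present in the 23-bit M field.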
Advantages: The extended versions are intended to provide increased precision and increased exponent range for the representation of intermediate values in a sequence of calculations. The inputs are given in a standard precision, either single or double, and the answer is truncated to the same precision. The use of extended formats helps to reduce the size of the accumulated round-off error in a sequence of calculations. Extended formats also enhance the accuracy of evaluation of elementary functions such as sine, cosine, and so on.

There are two basic aspects of operating with floating-point numbers. First, if a number is not normalized, it can always be put in normalized form by shifting the fraction and adjusting the exponent. Figure 6.25 shows an un-normalized value, 0.0010110... × 2^9, and its normalized version, 1.0110... × 2^6. Since the scale factor is in the form 2^i, shifting the mantissa right or left by one bit position is compensated by an increase or a decrease of 1 in the exponent, respectively.

Second, as computations proceed, a number that does not fall in the representable range of normal numbers might be generated. In single precision, this means that its normalized representation requires an exponent less than −126 or greater than +127. In the first case, we say that underflow has occurred, and in the second case, we say that overflow has occurred. Both underflow and overflow are arithmetic exceptions.

Special Values

The end values 0 and 255 of the excess-127 exponent E' are used to represent special values. When E' = 0 and the mantissa fraction M is zero, the value exact 0 is represented. When E' = 255 and M = 0, the value ∞ is represented, where ∞ is the result of dividing a normal number by zero. The sign bit is still part of these representations, so there are ±0 and ±∞ representations. When E' = 0 and M ≠ 0, denormal numbers are represented; their value is ±0.M × 2^(−126).
Therefore, they are smaller than the smallest normal number. There is no implied one to the left of the binary point, and M is any nonzero 23-bit fraction. The purpose of introducing denormal numbers is to allow for gradual underflow, providing an extension of the range of normal representable numbers that is useful in dealing with very small numbers in certain situations. When E' = 255 and M ≠ 0, the value represented is called Not a Number (NaN). A NaN is the result of performing an invalid operation such as 0/0 or √−1.

5. Performance Measurement

The most important measure of the performance of a computer is how quickly it can execute programs. The speed with which a computer executes programs is affected by the design of its instruction set, its hardware and its software, including the operating system, and the technology in which the hardware is implemented. Because programs are usually written in a high-level language, performance is also affected by the compiler that translates programs into machine language.

Technology

The technology of Very Large Scale Integration (VLSI) that is used to fabricate the electronic circuits for a processor on a single chip is a critical factor in the speed of execution of machine instructions. The speed of switching between the 0 and 1 states in logic circuits is largely determined by the size of the transistors that implement the circuits. Smaller transistors switch faster. Advances in fabrication technology over several decades have reduced transistor sizes dramatically. This has two advantages: instructions can be executed faster, and more transistors can be placed on a chip, leading to more logic functionality and more memory storage capacity.

Parallelism

Performance can be increased by performing a number of operations in parallel. Parallelism can be implemented on many different levels.
Instruction-level Parallelism

The simplest way to execute a sequence of instructions in a processor is to complete all steps of the current instruction before starting the steps of the next instruction. If we overlap the execution of the steps of successive instructions, total execution time will be reduced. For example, the next instruction could be fetched from memory at the same time that an arithmetic operation is being performed on the register operands of the current instruction. This form of parallelism is called pipelining.

Multicore Processors

Multiple processing units can be fabricated on a single chip. The term core is used for each of these processors. The term processor is then used for the complete chip. Hence, we have the terminology dual-core, quad-core, and octo-core processors for chips that have two, four, and eight cores, respectively.

Multiprocessors

Computer systems may contain many processors, each possibly containing multiple cores. Such systems are called multiprocessors. These systems either execute a number of different application tasks in parallel, or they execute subtasks of a single large task in parallel. All processors usually have access to all of the memory in such systems, and the term shared-memory multiprocessor is often used to make this clear. The high performance of these systems comes with much higher complexity and cost, arising from the use of multiple processors and memory units, along with more complex interconnection networks. In contrast to multiprocessor systems, it is also possible to use an interconnected group of complete computers to achieve high total computational power. The computers normally have access only to their own memory units. When the tasks they are executing need to share data, they do so by exchanging messages over a communication network. This property distinguishes them from shared-memory multiprocessors, leading to the name message-passing multicomputers.
5.1 PERFORMANCE AND METRICS

The most important measure of the performance of a computer is how quickly it can execute programs. The speed with which a computer executes programs is affected by the design of its hardware and its machine language instructions. Because programs are usually written in a high-level language, performance is also affected by the compiler that translates programs into machine language. For best performance, it is necessary to design the compiler, the machine instruction set, and the hardware in a coordinated way. The operating system overlaps processing, disk transfers, and printing for several programs to make the best possible use of the resources available. The total time required to execute the program in Figure 1.4 is t5 − t0. This elapsed time is a measure of the performance of the entire computer system. It is affected by the speed of the processor, the disk, and the printer. To calculate the performance of the processor, only the periods during which the processor is active should be considered. These are the periods labeled Program and OS routines in Figure 1.4. The sum of these periods is referred to as the processor time needed to execute the program.

Elapsed Time:- The total time required to execute the entire program. The elapsed time for the execution of a program depends on all units in a computer system.

Processor Time:- The periods during which the processor is active during the execution of the program are called the processor time. The processor time depends on the hardware involved in the execution of individual machine instructions. This hardware comprises the processor and the memory, which are usually connected by a bus, as shown in Figure 1.3. The pertinent parts of this figure are repeated in Figure 1.5, including the cache memory as part of the processor unit.
5.2 CACHE MEMORY

Use of cache memory for faster program execution:- At the start of execution, all program instructions and the required data are stored in the main memory. As execution proceeds, instructions are fetched one by one over the bus into the processor, and a copy is placed in the cache. When the execution of an instruction calls for data located in the main memory, the data are fetched and a copy is placed in the cache. Later, if the same instruction or data item is needed a second time, it is read directly from the cache. The processor and a relatively small cache memory can be fabricated on a single integrated circuit chip. The internal speed of performing the basic steps of instruction processing on such chips is very high and is considerably faster than the speed at which instructions and data can be fetched from the main memory. A program will be executed faster if the movement of instructions and data between the main memory and the processor is minimized, which is achieved by using the cache. For example, if a number of instructions are executed repeatedly over a short period of time, as happens in a program loop, these instructions are made available in the cache, and they can be fetched quickly during the period of repeated use. The same applies to data that are used repeatedly.

5.3 PROCESSOR CLOCK

Processor circuits are controlled by a timing signal called a clock. The clock defines regular time intervals, called clock cycles. To execute a machine instruction, the processor divides the action to be performed into a sequence of basic steps, such that each step can be completed in one clock cycle. The length P of one clock cycle is an important parameter that affects processor performance. Its inverse is the clock rate, R = 1/P, which is measured in cycles per second.
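The relation R = 1/P can be made concrete with a small Python sketch (ours, not from the notes); tolerances are used in the checks because the periods are not exact binary fractions:

```python
def clock_rate(period_seconds):
    """R = 1/P: clock rate in cycles per second (hertz) for clock period P."""
    return 1.0 / period_seconds

# A clock period of 2 ns corresponds to a 500 MHz clock rate,
# and a period of 0.8 ns corresponds to 1.25 GHz.
assert abs(clock_rate(2e-9) - 500e6) < 1.0
assert abs(clock_rate(0.8e-9) - 1.25e9) < 1.0
```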
Processors used in today's personal computers and workstations have clock rates that range from a few hundred million to over a billion cycles per second. The standard electrical engineering term for "cycles per second" is hertz (Hz). The term "million" is denoted by the prefix Mega (M) and "billion" is denoted by the prefix Giga (G). Hence, 500 million cycles per second is usually abbreviated to 500 Megahertz (MHz) and 1250 million cycles per second is abbreviated to 1.25 Gigahertz (GHz). The corresponding clock periods are 2 and 0.8 nanoseconds (ns), respectively.

5.4 BASIC PERFORMANCE EQUATION

Let T be the processor time required to execute a program that has been prepared in some high-level language. The compiler generates a machine language object program that corresponds to the source program. Assume that complete execution of the program requires the execution of N machine language instructions. The number N is the actual number of instruction executions, and is not necessarily equal to the number of machine instructions in the object program. Some instructions may be executed more than once, which is the case for instructions inside a program loop. Others may not be executed at all, depending on the input data used. Suppose that the average number of basic steps needed to execute one machine instruction is S, where each basic step is completed in one clock cycle. If the clock rate is R cycles per second, the program execution time is given by

T = (N × S) / R

This is often referred to as the basic performance equation. The performance parameter T for an application program is much more important to the user than the individual values of the parameters N, S, or R. To achieve high performance, the computer designer must seek ways to reduce the value of T, which means reducing N and S, and increasing R. The value of N is reduced if the source program is compiled into fewer machine instructions.
The value of S is reduced if instructions have a smaller number of basic steps to perform or if the execution of instructions is overlapped. Using a higher-frequency clock increases the value of R, which means that the time required to complete a basic execution step is reduced. We must emphasize that N, S, and R are not independent parameters: changing one may affect another. Introducing a new feature in the design of a processor will lead to improved performance only if the overall result is to reduce the value of T. A processor advertised as having a 900-MHz clock does not necessarily provide better performance than a 700-MHz processor, because it may have a different value of S.

5.5 PIPELINING AND SUPERSCALAR OPERATION

If instructions are executed one after another, the value of S is the total number of basic steps, or clock cycles, required to execute one instruction. A substantial improvement in performance can be achieved by overlapping the execution of successive instructions, using a technique called pipelining. Consider the instruction

Add R1, R2, R3

which adds the contents of registers R2 and R3, and places the sum into R1. The contents of R2 and R3 are first transferred to the inputs of the ALU. After the add operation is performed, the sum is transferred to R1. The processor can read the next instruction from the memory while the addition operation is being performed. Then, if that instruction also uses the ALU, its operands can be transferred to the ALU inputs at the same time that the result of the Add instruction is being transferred to R1. In the ideal case, if all instructions are overlapped to the maximum degree possible, execution proceeds at the rate of one instruction completed in each clock cycle. Individual instructions still require several clock cycles to complete. But, for the purpose of computing T, the effective value of S is 1.
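The effect of pipelining on T can be sketched with the basic performance equation T = (N × S)/R; the instruction count, step count, and clock rate below are hypothetical values chosen only for illustration:

```python
def execution_time(n, s, r):
    """Basic performance equation: T = (N x S) / R seconds."""
    return n * s / r

N = 100e6   # 100 million instruction executions (a hypothetical program)
R = 500e6   # 500 MHz clock rate

t_sequential = execution_time(N, 4, R)  # S = 4 basic steps per instruction
t_pipelined = execution_time(N, 1, R)   # ideal pipelining: effective S = 1

assert t_sequential == 0.8   # seconds
assert t_pipelined == 0.2    # seconds: same N and R, smaller effective S
```

With N and R unchanged, reducing the effective value of S from 4 to 1 cuts T by the same factor of 4, which is exactly the gain the text attributes to ideal pipelining.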
The ideal value S = 1 cannot be attained in practice for a variety of reasons. However, pipelining increases the rate of executing instructions significantly and causes the effective value of S to approach 1. A higher degree of concurrency can be achieved if multiple instruction pipelines are implemented in the processor. This means that multiple functional units are used, creating parallel paths through which different instructions can be executed in parallel. With such an arrangement, it becomes possible to start the execution of several instructions in every clock cycle. This mode of operation is called superscalar execution. If it can be sustained for a long time during program execution, the effective value of S can be reduced to less than one. Of course, parallel execution must preserve the logical correctness of programs; that is, the results produced must be the same as those produced by serial execution of program instructions. Many of today's high-performance processors are designed to operate in this manner.

5.6 CLOCK RATE

There are two possibilities for increasing the clock rate, R. First, improving the integrated-circuit (IC) technology makes logic circuits faster, which reduces the time needed to complete a basic step. This allows the clock period, P, to be reduced and the clock rate, R, to be increased. Second, reducing the amount of processing done in one basic step also makes it possible to reduce the clock period, P. However, if the actions that have to be performed by an instruction remain the same, the number of basic steps needed may increase. Increases in the value of R that are entirely caused by improvements in IC technology affect all aspects of the processor's operation equally, with the exception of the time it takes to access the main memory. In the presence of a cache, the percentage of accesses to the main memory is small.
Hence, much of the performance gain expected from the use of faster technology can be realized. The value of T will be reduced by the same factor as R is increased, because S and N are not affected. The impact on performance of changing the way in which instructions are divided into basic steps is more difficult to assess.

5.7 INSTRUCTION SET: CISC AND RISC

Simple instructions require a small number of basic steps to execute. Complex instructions involve a large number of steps. For a processor that has only simple instructions, a large number of instructions may be needed to perform a given programming task. This could lead to a large value for N and a small value for S. On the other hand, if individual instructions perform more complex operations, fewer instructions will be needed, leading to a lower value of N and a larger value of S. It is not obvious whether one choice is better than the other. A key consideration in comparing the two choices is the use of pipelining. The effective value of S in a pipelined processor is close to 1 even though the number of basic steps per instruction may be considerably larger. This seems to imply that complex instructions combined with pipelining would achieve the best performance. However, it is much easier to implement efficient pipelining in processors with simple instruction sets. The suitability of the instruction set for pipelined execution is an important and often deciding consideration. Processors with simple instructions are called Reduced Instruction Set Computers (RISC), and processors with more complex instructions are referred to as Complex Instruction Set Computers (CISC).

5.8 COMPILER

A compiler translates a high-level language program into a sequence of machine instructions. To reduce N, we need to have a suitable machine instruction set and a compiler that makes good use of it.
An optimizing compiler takes advantage of various features of the target processor to reduce the product N × S, which is the total number of clock cycles needed to execute a program. The number of cycles is dependent not only on the choice of instructions, but also on the order in which they appear in the program. The compiler may rearrange program instructions to achieve better performance. Of course, such changes must not affect the result of the computation. Superficially, a compiler appears as a separate entity from the processor with which it is used and may even be available from a different vendor. However, a high-quality compiler must be closely linked to the processor architecture. The compiler and the processor are often designed at the same time, with much interaction between the designers to achieve the best results. The ultimate objective is to reduce the total number of clock cycles needed to perform a required programming task.

5.9 PERFORMANCE MEASUREMENT

It is important to be able to assess the performance of a computer. Computer designers use performance estimates to evaluate the effectiveness of new features. Manufacturers use performance indicators in the marketing process. Buyers use such data to choose among many available computer models. The only parameter that properly describes the performance of a computer is the execution time, T, for the programs of interest. Despite the conceptual simplicity of the basic performance equation, computing the value of T is not simple. Moreover, parameters such as the clock speed and various architectural features are not reliable indicators of the expected performance. For these reasons, the computer community adopted the idea of measuring computer performance using benchmark programs. To make comparisons possible, standardized programs must be used. The performance measure is the time it takes a computer to execute a given benchmark.
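Benchmark ratings of the kind discussed next (the SPEC ratings) divide the running time on a reference computer by the running time on the computer under test, and combine the per-program ratings with a geometric mean. A minimal Python sketch with made-up benchmark times:

```python
def spec_rating(ref_times, test_times):
    """Overall rating: geometric mean of per-program ratings, where
    rating_i = (running time on reference) / (running time under test)."""
    ratings = [ref / test for ref, test in zip(ref_times, test_times)]
    product = 1.0
    for r in ratings:
        product *= r
    return product ** (1.0 / len(ratings))

# Three hypothetical benchmarks: the individual ratings are 2, 4, and 8,
# so the overall rating is the cube root of 64, i.e. 4.
assert abs(spec_rating([100, 100, 100], [50, 25, 12.5]) - 4.0) < 1e-9
```

The geometric mean (rather than the arithmetic mean) keeps one unusually favorable program from dominating the overall rating.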
Initially, some attempts were made to create artificial programs that could be used as standard benchmarks. But synthetic programs do not properly predict the performance obtained when real application programs are run. The accepted practice today is to use an agreed-upon selection of real application programs to evaluate performance. A nonprofit organization called the Standard Performance Evaluation Corporation (SPEC) selects and publishes representative application programs for different application domains, together with test results for many commercially available computers. For general-purpose computers, a suite of benchmark programs was selected in 1989. It was modified somewhat and published in 1995 and again in 2000. The programs selected range from game playing, compiler, and database applications to numerically intensive programs in astrophysics and quantum chemistry. In each case, the program is compiled for the computer under test, and the running time on a real computer is measured. The same program is also compiled and run on one computer selected as a reference. The SPEC rating is computed as follows:

SPEC rating = (Running time on the reference computer) / (Running time on the computer under test)

Thus, a SPEC rating of 50 means that the computer under test is 50 times as fast as the selected reference computer for this particular benchmark. The test is repeated for all the programs in the SPEC suite, and the geometric mean of the results is computed. Let SPECi be the rating for program i in the suite. The overall SPEC rating for the computer is given by

SPEC rating = (SPEC1 × SPEC2 × ... × SPECn)^(1/n)

where n is the number of programs in the suite. Because the actual execution time is measured, the SPEC rating is a measure of the combined effect of all factors affecting performance, including the compiler, the operating system, the processor, and the memory of the computer being tested.

6. Instruction Set Architecture: Memory Locations and Addresses
6.1 MEMORY LOCATIONS AND ADDRESSES

Number and character operands, as well as instructions, are stored in the memory of a computer. The memory consists of many millions of storage cells, each of which can store a bit of information having the value 0 or 1. Because a single bit represents a very small amount of information, bits are seldom handled individually. The usual approach is to deal with them in groups of fixed size. For this purpose, the memory is organized so that a group of n bits can be stored or retrieved in a single, basic operation. Each group of n bits is referred to as a word of information. The memory of a computer can be schematically represented as a collection of words, as shown in Figure 2.5. Modern computers have word lengths that typically range from 16 to 64 bits. If the word length of a computer is 32 bits, a single word can store a 32-bit 2's-complement number or four ASCII characters, each occupying 8 bits, as shown in Figure 2.6. A unit of 8 bits is called a byte. Machine instructions may require one or more words for their representation. Accessing the memory to store or retrieve a single item of information, either a word or a byte, requires a distinct name or address for each item location. It is customary to use numbers from 0 through 2^k − 1, for some suitable value of k, as the addresses of successive locations in the memory. The 2^k addresses constitute the address space of the computer, and the memory can have up to 2^k addressable locations. For example, a 24-bit address generates an address space of 2^24 (16,777,216) locations. This number is usually written as 16M (16 mega), where 1M is the number 2^20 (1,048,576). A 32-bit address creates an address space of 2^32 or 4G (4 giga) locations, where 1G is 2^30. Other notational conventions that are commonly used are K (kilo) for the number 2^10 (1,024), and T (tera) for the number 2^40.
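The size prefixes above follow directly from powers of two, as this small check (ours) shows:

```python
K = 2 ** 10   # kilo: 1,024
M = 2 ** 20   # mega: 1,048,576
G = 2 ** 30   # giga: 1,073,741,824
T = 2 ** 40   # tera

assert K == 1_024 and M == 1_048_576
assert 2 ** 24 == 16 * M     # a 24-bit address space has 16M locations
assert 2 ** 32 == 4 * G      # a 32-bit address space has 4G locations
```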
6.2 BYTE ADDRESSABILITY

There are three basic information quantities to deal with: the bit, the byte, and the word. A byte is always 8 bits, but the word length typically ranges from 16 to 64 bits. It is impractical to assign distinct addresses to individual bit locations in the memory. The most practical assignment is to have successive addresses refer to successive byte locations in the memory. This is the assignment used in most modern computers, and is the one we will normally use in this book. The term byte-addressable memory is used for this assignment. Byte locations have addresses 0, 1, 2, .... Thus, if the word length of the machine is 32 bits, successive words are located at addresses 0, 4, 8, ..., with each word consisting of four bytes.

6.3 BIG-ENDIAN AND LITTLE-ENDIAN ASSIGNMENTS

There are two ways that byte addresses can be assigned across words, as shown in Figure 2.7. The name big-endian is used when lower byte addresses are used for the more significant bytes (the leftmost bytes) of the word. The name little-endian is used for the opposite ordering, where the lower byte addresses are used for the less significant bytes (the rightmost bytes) of the word. The words "more significant" and "less significant" are used in relation to the weights (powers of 2) assigned to bits when the word represents a number. Both little-endian and big-endian assignments are used in commercial machines. In both cases, byte addresses 0, 4, 8, ..., are taken as the addresses of successive words in the memory and are the addresses used when specifying memory read and write operations for words. In addition to specifying the address ordering of bytes within a word, it is also necessary to specify the labeling of bits within a byte or a word. The most common convention is as shown in Figure 2.6a. It is the most natural ordering for the encoding of numerical data.
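The two byte orderings can be demonstrated with Python's int.to_bytes; the 32-bit word value 0x12345678 is an arbitrary example:

```python
word = 0x12345678  # a 32-bit word occupying byte addresses 0 through 3

big = word.to_bytes(4, "big")        # big-endian: most significant byte at the lowest address
little = word.to_bytes(4, "little")  # little-endian: least significant byte at the lowest address

assert list(big) == [0x12, 0x34, 0x56, 0x78]
assert list(little) == [0x78, 0x56, 0x34, 0x12]
```

Reading the four bytes in address order thus gives the same word two different printed appearances, which is exactly why the assignment must be fixed by the machine's architecture.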
The same ordering is also used for labeling bits within a byte, that is, b7, b6, ..., b0, from left to right. There are computers, however, that use the reverse ordering.

6.4 WORD ALIGNMENT

In the case of a 32-bit word length, natural word boundaries occur at addresses 0, 4, 8, ..., as shown in Figure 2.7. The word locations have aligned addresses. In general, words are said to be aligned in memory if they begin at a byte address that is a multiple of the number of bytes in a word. For practical reasons associated with manipulating binary-coded addresses, the number of bytes in a word is a power of 2. Hence, if the word length is 16 (2 bytes), aligned words begin at byte addresses 0, 2, 4, ..., and for a word length of 64 (2^3 bytes), aligned words begin at byte addresses 0, 8, 16, .... There is no fundamental reason why words cannot begin at an arbitrary byte address. In that case, words are said to have unaligned addresses. While the most common case is to use aligned addresses, some computers allow the use of unaligned word addresses.

7. INSTRUCTIONS AND INSTRUCTION SEQUENCING

The tasks carried out by a computer program consist of a sequence of small steps, such as adding two numbers, testing for a particular condition, reading a character from the keyboard, or sending a character to be displayed on a display screen. A computer must have instructions capable of performing four types of operations:

Data transfers between the memory and the processor registers
Arithmetic and logic operations on data
Program sequencing and control
I/O transfers

7.1 REGISTER TRANSFER NOTATION

To transfer information from one location in the computer to another, a data transfer instruction is used. Possible locations that are involved in such transfers are memory locations, processor registers, or registers in the I/O subsystem. To identify a location, a symbolic name is used instead of its hardware binary address.
For example, names for the addresses of memory locations may be LOC, PLACE, A, VAR2; processor register names may be R0, R5; and I/O register names may be DATAIN, OUTSTATUS, and so on. The contents of a location are denoted by placing square brackets around the name of the location. Thus, the expression

R1 ← [LOC]

means that the contents of memory location LOC are transferred into processor register R1. As another example, consider the operation that adds the contents of registers R1 and R2, and then places their sum into register R3. This action is indicated as

R3 ← [R1] + [R2]

This type of notation is known as Register Transfer Notation (RTN). Note that the right-hand side of an RTN expression always denotes a value, and the left-hand side is the name of a location where the value is to be placed, overwriting the old contents of that location.

7.2 ASSEMBLY LANGUAGE NOTATION

To represent machine instructions and programs, an assembly language format is used. For example, a generic instruction that causes the transfer described above, from memory location LOC to processor register R2, is specified by the statement

Load R2, LOC

The contents of LOC are unchanged by the execution of this instruction, but the old contents of register R2 are overwritten. The name Load is appropriate for this instruction, because the contents read from a memory location are loaded into a processor register. The second example, of adding two numbers contained in processor registers R2 and R3 and placing their sum in R4, can be specified by the assembly-language statement

Add R4, R2, R3

In this case, registers R2 and R3 hold the source operands, while R4 is the destination.

7.3 BASIC INSTRUCTION FORMATS

The operation of adding two numbers is a fundamental capability in any computer.
The statement C = A + B in a high-level language program is a command to the computer to add the current values of the two variables called A and B, and to assign the sum to a third variable, C. When the program containing this statement is compiled, the three variables, A, B, and C, are assigned to distinct locations in the memory. The variable names are used to refer to the corresponding memory location addresses. The contents of these locations represent the values of the three variables. Hence, the above high-level language statement requires the action

C ← [A] + [B]

to take place in the computer. To carry out this action, the contents of memory locations A and B are fetched from the memory and transferred into the processor, where their sum is computed. This result is then sent back to the memory and stored in location C. There are four types of instruction formats. They are:

∙ Three address Instructions
∙ Two address Instructions
∙ One address Instructions
∙ Zero address Instructions

(i) Three address Instructions

Assume that this instruction contains the memory addresses of the three operands, A, B, and C. This type of instruction can be represented symbolically as

Add A, B, C

Operands B and C are called the source operands, A is called the destination operand, and Add is the operation to be performed on the operands. A general instruction of this type has the format

Operation Destination, Source1, Source2

If k bits are needed to specify the memory address of each operand, the encoded form of the above instruction must contain 3k bits for addressing purposes, in addition to the bits needed to denote the Add operation. For a modern processor with a 32-bit address space, a three-address instruction is too large to fit in one word for a reasonable word length. Thus, a format that allows multiple words to be used for a single instruction would be needed to represent an instruction of this type.
(ii) Two address Instructions

An alternative approach is to use a sequence of simpler instructions to perform the same task, with each instruction having only one or two operands. Suppose that two-address instructions of the form

Operation Destination, Source

are available. An Add instruction of this type is

Add A, B

which performs the operation A ← [A] + [B]. When the sum is calculated, the result is sent to the memory and stored in location A, replacing the original contents of this location. This means that operand A is both a source and a destination. A single two-address instruction cannot be used to solve our original problem, which is to add the contents of locations A and B, without destroying either of them, and to place the sum in location C. The problem can be solved by using another two-address instruction that copies the contents of one memory location into another. Such an instruction is

Move C, B

which performs the operation C ← [B], leaving the contents of location B unchanged.

(iii) One address Instructions

Another possibility is to have machine instructions that specify only one memory operand. When a second operand is needed, as in the case of an Add instruction, it is understood implicitly to be in a unique location. A processor register, usually called the accumulator, is used for this purpose in one-address instructions. Thus,

Add A

means the following: add the contents of memory location A to the contents of the accumulator register and place the sum back into the accumulator. Other examples of one-address instructions are

Load A and Store A

The Load instruction copies the contents of memory location A into the accumulator, and the Store instruction copies the contents of the accumulator into memory location A. Using only one-address instructions, the operation C ← [A] + [B] can be performed by executing the sequence of instructions

Load A
Add B
Store C
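The one-address sequence Load A, Add B, Store C can be sketched as an accumulator machine in Python; the memory contents below are arbitrary example values:

```python
mem = {"A": 10, "B": 32, "C": 0}  # hypothetical memory contents
acc = 0                           # the accumulator register

acc = mem["A"]        # Load A : copy [A] into the accumulator
acc = acc + mem["B"]  # Add B  : add [B] to the accumulator contents
mem["C"] = acc        # Store C: copy the accumulator into location C

assert mem["C"] == 42  # C <- [A] + [B]
assert mem["A"] == 10 and mem["B"] == 32  # A and B are unchanged
```

Note that, exactly as the text describes, the accumulator is implied by every instruction; only one memory address ever appears explicitly.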
Note that the operand specified in the instruction may be a source or a destination, depending on the instruction. In the Load instruction, address A specifies the source operand, and the destination location, the accumulator, is implied. On the other hand, C denotes the destination location in the Store instruction, whereas the source, the accumulator, is implied. Some early computers were designed around a single accumulator structure. Most modern computers have a number of general-purpose processor registers, typically 8 to 32, and even considerably more in some cases. Access to data in these registers is much faster than to data stored in memory locations, because the registers are inside the processor. Because the number of registers is relatively small, only a few bits are needed to specify which register takes part in an operation. For example, for 32 registers, only 5 bits are needed. This is much less than the number of bits needed to give the address of a location in the memory. Because the use of registers allows faster processing and results in shorter instructions, registers are used to store data temporarily in the processor during processing.

(iv) Zero address Instructions

It is also possible to use instructions in which the locations of all operands are defined implicitly. Such instructions are found in machines that store operands in a structure called a pushdown stack. In this case, the instructions are called zero-address instructions. The sequence

Push A
Push B
Add
Pop C

performs the operation C ← [A] + [B]. Operand A is transferred from memory onto the pushdown stack, and then the second operand, B, is transferred from memory onto the stack. The zero-address instruction Add pops the top two elements of the stack, adds them, and pushes the result onto the stack. The result is then popped and stored into memory location C.

7.4 Types of Instructions:

1. Data Transfer Instructions
1. Register Transfer, e.g.
Move
   (b) Memory transfer, e.g. Load, Store
2. Arithmetic Instructions, e.g. Add, Sub
3. Logical Instructions, e.g. And, Or, Xor, Shift
4. Control Transfer Instructions, e.g. Branch LOOP, Branch>0 LOOP, Procedure Call, Return
5. Input-Output Instructions, e.g. Read, Write

7.4.4 Control Transfer Instructions

(a) INSTRUCTION EXECUTION AND STRAIGHT-LINE SEQUENCING

Figure 2.8 shows a program segment for the task C ← [A] + [B] as it appears in the memory of a computer. This code is written with the assumption that the computer allows only one memory operand per instruction and has a number of processor registers. It is also assumed that the word length is 32 bits and the memory is byte-addressable. The three instructions of the program are in successive word locations, starting at location i. Since each instruction is 4 bytes long, the second and third instructions start at addresses i + 4 and i + 8. For simplicity, it is further assumed that a full memory address can be directly specified in a single-word instruction, although this is not usually possible for the address space sizes and word lengths of current processors.

Steps of program execution:

∙ The processor contains a register called the program counter (PC), which holds the address of the next instruction to be executed. To begin executing a program, the address of its first instruction (i in our example) must be placed into the PC. Then, the processor control circuits use the information in the PC to fetch and execute instructions, one at a time, in the order of increasing addresses. This is called straight-line sequencing.

∙ During the execution of each instruction, the PC is incremented by 4 to point to the next instruction. Thus, after the Store instruction at location i + 12 is executed, the PC contains the value i + 16, which is the address of the first instruction of the next program segment.

Executing a given instruction is a two-phase procedure.
∙ In the first phase, called instruction fetch, the instruction is fetched from the memory location whose address is in the PC. This instruction is placed in the instruction register (IR) in the processor.

∙ At the start of the second phase, called instruction execute, the instruction in the IR is examined to determine which operation is to be performed. The specified operation is then performed by the processor. This involves a small number of steps such as fetching operands from the memory or from processor registers, performing an arithmetic or logic operation, and storing the result in the destination location. At some point during this two-phase procedure, the contents of the PC are advanced to point to the next instruction.

∙ When the execute phase of an instruction is completed, the PC contains the address of the next instruction, and a new instruction fetch phase can begin.

(b) BRANCHING

Consider the task of adding a list of n numbers. The program outlined in Figure 2.9 is a generalization of the program in Figure 2.8. The addresses of the memory locations containing the n numbers are symbolically given as NUM1, NUM2, ..., NUMn, and a separate Add instruction is used to add each number to the contents of register R0. After all the numbers have been added, the result is placed in memory location SUM.

Instead of using a long list of Add instructions, it is possible to place a single Add instruction in a program loop, as shown in Figure 2.10. The loop is a straight-line sequence of instructions executed as many times as needed. It starts at location LOOP and ends at the instruction Branch>0. During each pass through this loop, the address of the next list entry is determined, and that entry is fetched and added to R0. The address of an operand can be specified in various ways using the various addressing modes. Assume that the number of entries in the list, n, is stored in memory location N, as shown.
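The two-phase fetch-execute procedure and the counter-controlled loop of Figure 2.10 can be sketched as a small Python interpreter. The instruction encoding, mnemonic names, and data values below are illustrative assumptions; only the PC-driven fetch-execute cycle and the Branch>0 behaviour mirror the text:

```python
# Sketch of fetch-execute with a program counter, modeled on the
# list-summing loop of Figure 2.10 (encoding and values are assumed).
memory_data = {"N": 4, "NUM": [5, 10, 20, 7], "SUM": 0}
regs = {"R0": 0, "R1": 0}

# Each "instruction" is a (mnemonic, operands) pair; the program starts at PC = 0.
program = [
    ("Move", ("R1", "N")),        # R1 <- n, the loop counter
    ("Clear", ("R0",)),           # R0 <- 0, the running sum
    ("AddNext", ("R0",)),         # LOOP: add the next list entry to R0
    ("Decrement", ("R1",)),       # R1 <- [R1] - 1
    ("Branch>0", (2,)),           # branch back to LOOP while R1 > 0
    ("MoveToMem", ("SUM", "R0")), # SUM <- R0
]

pc = 0      # program counter
index = 0   # position in the list (stands in for an address register)
while pc < len(program):
    op, args = program[pc]        # phase 1: instruction fetch (into "IR")
    pc += 1                       # PC advances to the next instruction
    # phase 2: instruction execute
    if op == "Move":
        regs[args[0]] = memory_data[args[1]]
    elif op == "Clear":
        regs[args[0]] = 0
    elif op == "AddNext":
        regs[args[0]] += memory_data["NUM"][index]
        index += 1
    elif op == "Decrement":
        regs[args[0]] -= 1
    elif op == "Branch>0":
        if regs["R1"] > 0:
            pc = args[0]          # the branch target replaces the PC
    elif op == "MoveToMem":
        memory_data[args[0]] = regs[args[1]]

print(memory_data["SUM"])  # 42
```

When the decremented counter reaches zero the conditional branch falls through, exactly as the text describes, and the final Move stores the sum.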
Register R1 is used as a counter to determine the number of times the loop is executed. Hence, the contents of location N are loaded into register R1 at the beginning of the program. Then, within the body of the loop, the instruction Decrement R1 reduces the contents of R1 by 1 each time through the loop. (A similar type of operation is performed by an Increment instruction, which adds 1 to its operand.) Execution of the loop is repeated as long as the result of the decrement operation is greater than zero.

A branch instruction is used to implement the loop. This type of instruction loads a new value into the program counter. As a result, the processor fetches and executes the instruction at this new address, called the branch target, instead of the instruction at the location that follows the branch instruction in sequential address order.

A conditional branch instruction causes a branch only if a specified condition is satisfied. If the condition is not satisfied, the PC is incremented in the normal way, and the next instruction in sequential address order is fetched and executed. In the program in Figure 2.10, the instruction Branch>0 LOOP (branch if greater than 0) is a conditional branch instruction that causes a branch to location LOOP if the result of the immediately preceding instruction, which is the decremented value in register R1, is greater than zero. This means that the loop is repeated as long as there are entries in the list that are yet to be added to R0. At the end of the nth pass through the loop, the Decrement instruction produces a value of zero, and, hence, branching does not occur. Instead, the Move instruction is fetched and executed. It moves the final result from R0 into memory location SUM.

(c) CONDITION CODES

The processor keeps track of information about the results of various operations for use by subsequent conditional branch instructions.
This is accomplished by recording the required information in individual bits, often called condition code flags. These flags are usually grouped together in a special processor register called the condition code register or status register. Individual condition code flags are set to 1 or cleared to 0, depending on the outcome of the operation performed. Four commonly used flags are:

N (negative): set to 1 if the result is negative; otherwise, cleared to 0.
Z (zero): set to 1 if the result is 0; otherwise, cleared to 0.
V (overflow): set to 1 if arithmetic overflow occurs; otherwise, cleared to 0.
C (carry): set to 1 if a carry-out results from the operation; otherwise, cleared to 0.

∙ The N and Z flags indicate whether the result of an arithmetic or logic operation is negative or zero. The N and Z flags may also be affected by instructions that transfer data, such as Move, Load, or Store. This makes it possible for a later conditional branch instruction to cause a branch based on the sign and value of the operand that was moved. Some computers also provide a special Test instruction that examines a value in a register or in the memory and sets or clears the N and Z flags accordingly.

∙ The V flag indicates whether overflow has taken place. Overflow occurs when the result of an arithmetic operation is outside the range of values that can be represented by the number of bits available for the operands. The processor sets the V flag to allow the programmer to test whether overflow has occurred and branch to an appropriate routine that corrects the problem. Instructions such as BranchIfOverflow are provided for this purpose. A program interrupt may occur automatically as a result of the V bit being set, and the operating system will decide what to do.

∙ In the 2's-complement number representation system, n bits can represent values in the range −2^(n−1) to +2^(n−1) − 1.
For example, using four bits, the range of numbers that can be represented is −8 through +7. When the result of an arithmetic operation is outside the representable range, an arithmetic overflow has occurred. The addition of numbers with different signs cannot cause overflow. This leads to the following conclusions:

1. Overflow can occur only when adding two numbers that have the same sign.
2. The carry-out signal from the sign-bit position is not a sufficient indicator of overflow when adding signed numbers.

A simple way to detect overflow is to examine the signs of the two summands X and Y and the sign of the sum S. When both operands X and Y have the same sign, an overflow occurs when the sign of S is not the same as the signs of X and Y.

∙ The C flag is set to 1 if a carry occurs from the most significant bit position during an arithmetic operation. This flag makes it possible to perform arithmetic operations on operands that are longer than the word length of the processor. Such operations are used in multiple-precision arithmetic.

∙ The conditional branch instruction Branch>0 is an example of a branch instruction that tests one or more of the condition flags. It causes a branch if the value tested is neither negative nor equal to zero; that is, the branch is taken if neither N nor Z is 1. Many other conditional branch instructions, such as Branch=0, are provided to enable a variety of conditions to be tested. The conditions are given as logic expressions involving the condition code flags.

∙ In some computers, the condition code flags are affected automatically by instructions that perform arithmetic or logic operations. However, this is not always the case. A number of computers have two versions of an Add instruction, for example. One version, Add, does not affect the flags, but a second version, AddSetCC, does.
This provides the programmer, and the compiler, with more flexibility when preparing programs for pipelined execution (parallel execution of instructions).

8. Addressing Modes

10 ADDRESSING MODES

1. GENERATING MEMORY ADDRESSES

The purpose of the instruction block of Figure 2.10 at LOOP is to add a different number from the list during each pass through the loop. Hence, the Add instruction in that block must refer to a different address during each pass, so the memory operand address cannot be given directly in a single Add instruction in the loop; otherwise, it would need to be modified on each pass through the loop.

As one possibility, suppose that a processor register, Ri, is used to hold the memory address of an operand. If it is initially loaded with the address NUM1 before the loop is entered and is then incremented by 4 on each pass through the loop, it can provide the needed capability.

Situations like this give rise to the need for flexible ways to specify the address of an operand. The instruction set of a computer provides a number of such methods, called addressing modes.

In general, a program operates on data that reside in the computer's memory. These data can be organized in a variety of ways. To keep track of students' names, we can write them in a list. If we want to associate information with each name, for example to record telephone numbers or marks in various courses, we may organize this information in the form of a table. Programmers use organizations called data structures to represent the data used in computations. These include lists, linked lists, arrays, queues, and so on. Programs are normally written in a high-level language, which enables the programmer to use constants, local and global variables, pointers, and arrays.
When translating a high-level language program into assembly language, the compiler must be able to implement these constructs using the facilities provided in the instruction set of the computer on which the program will be run. The different ways in which the location of an operand is specified in an instruction are referred to as addressing modes.

10.3 IMPLEMENTATION OF VARIABLES AND CONSTANTS

Variables are found in almost every computer program. In assembly language, a variable is represented by allocating a register or a memory location to hold its value. This value can be changed as needed using appropriate instructions. The precise definitions of these two modes are:

Register mode: the operand is the contents of a processor register; the name of the register is given in the instruction.
Absolute mode: the operand is in a memory location; the address of this location is given explicitly in the instruction.

The instruction Add R4, R2, R3 uses the Register mode for all three operands. Registers R2 and R3 hold the two source operands, while R4 is the destination.

The Absolute mode can represent global variables in a program. A declaration such as

Integer NUM1, NUM2, SUM;

in a high-level language program will cause the compiler to allocate a memory location to each of the variables NUM1, NUM2, and SUM. The Absolute mode is used in the instruction

Load R2, NUM1

which loads the value in the memory location NUM1 into register R2.

Constants representing data or addresses are also found in almost every computer program. Such constants can be represented in assembly language using the Immediate addressing mode.

Immediate mode: the operand is given explicitly in the instruction.

For example, the instruction Add R4, R6, 200immediate, where the subscript denotes the Immediate mode, adds the value 200 to the contents of register R6 and places the result into register R4. However, using a subscript to denote the Immediate mode is not appropriate in assembly languages.
A common convention is to use the number sign (#) in front of the value to indicate that this value is to be used as an immediate operand. Hence, we write the instruction above in the form

Add R4, R6, #200

In the addressing modes that follow, the instruction does not give the operand or its address explicitly.

Indirection and Pointers:

We need a capability for modifying the address of the memory operand during each pass through the loop. A good way to provide this capability is to use a processor register to hold the address of the operand. The contents of the register are then changed (incremented) during each pass to provide the address of the next number in the list that has to be accessed. The register acts as a pointer to the list, and we say that an item in the list is accessed indirectly by using the address in the register. The desired capability is provided by the indirect addressing mode.

Indirect mode: the effective address of the operand is the contents of a register that is specified in the instruction.

Indirection and the use of pointers are important and powerful concepts in programming. They permit the same code to be used to operate on different data. For example, register R5 in Figure 2.7 serves as a pointer for the Load instruction to load an operand from the memory into register R2. At one time, R5 may point to location B in memory. Later, the program may change the contents of R5 to point to a different location, in which case the same Load instruction will load the value from that location into R2. Thus, a program segment that includes this Load instruction is conveniently reused with only a change in the pointer value.

Indirect addressing can be used to access successive numbers in the list, resulting in the program shown in Figure 2.8. Register R4 is used as a pointer to the numbers in the list, and the operands are accessed indirectly through R4.
The initialization section of the program loads the counter value n from memory location N into R2. Then, it uses the Clear instruction to clear R3 to 0. The next instruction uses the Immediate addressing mode to place the address value NUM1, which is the address of the first number in the list, into R4. Observe that we cannot use the Load instruction to load the desired immediate value, because the Load instruction can operate only on memory source operands.

The first time through the loop, the instruction Load R5, (R4) fetches the operand at location NUM1 and loads it into R5. The first Add instruction adds this number to the sum in register R3. The second Add instruction adds 4 to the contents of the pointer R4, so that it will contain the address value NUM2 when the Load instruction is executed in the second pass through the loop.

Indexing and Arrays:

Index mode: the effective address of the operand is generated by adding a constant value to the contents of a register.

For convenience, we will refer to the register used in this mode as the index register. Typically, this is just a general-purpose register. We indicate the Index mode symbolically as

X(Ri)

where X denotes a constant signed integer value contained in the instruction and Ri is the name of the register involved. The effective address of the operand is given by

EA = X + [Ri]

The contents of the register are not changed in the process of generating the effective address.

Figure 2.9 illustrates two ways of using the Index mode. In Figure 2.9a, the index register, R5, contains the address of a memory location, and the value X defines an offset (also called a displacement) from this address to the location where the operand is found. An alternative use is illustrated in Figure 2.9b. Here, the constant X corresponds to a memory address, and the contents of the index register define the offset to the operand.
In either case, the effective address is the sum of two values; one is given explicitly in the instruction, and the other is held in a register.

To see the usefulness of indexed addressing, consider a simple example involving a list of test scores for students taking a given course. Assume that the list of scores, beginning at location LIST, is structured as shown in Figure 2.10. A four-word memory block comprises a record that stores the relevant information for each student. Each record consists of the student's identification number (ID), followed by the scores the student earned on three tests. There are n students in the class, and the value n is stored in location N immediately in front of the list.

In the body of the loop, the program uses the Index addressing mode in the manner depicted in Figure 2.9a to access each of the three scores in a student's record. Register R2 is used as the index register. Before the loop is entered, R2 is set to point to the ID location of the first student record, which is the address LIST.

On the first pass through the loop, the test scores of the first student are added to the running sums held in registers R3, R4, and R5, which are initially cleared to 0. These scores are accessed using the Index addressing modes 4(R2), 8(R2), and 12(R2). The index register R2 is then incremented by 16 to point to the ID location of the second student. Register R6, initialized to contain the value n, is decremented by 1 at the end of each pass through the loop. When the contents of R6 reach 0, all student records have been accessed, and the loop terminates. Until then, the conditional branch instruction transfers control back to the start of the loop to process the next record. The last three instructions transfer the accumulated sums from registers R3, R4, and R5 into memory locations SUM1, SUM2, and SUM3, respectively.
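The indexed access pattern just described, EA = X + [Ri], can be sketched in Python. The base address LIST, the record contents, and the number of students are illustrative assumptions; the offsets 4(R2), 8(R2), 12(R2) and the stride of 16 follow the text:

```python
# Sketch of the Index addressing mode, EA = X + [Ri], applied to the
# student-record example: each four-word record holds an ID and three
# test scores, one word (4 bytes) apart. All data values are assumed.
memory = {}
LIST = 1000                        # assumed base address of the list
records = [(101, 60, 70, 80), (102, 50, 55, 65), (103, 90, 85, 95)]
for i, rec in enumerate(records):
    for j, word in enumerate(rec):
        memory[LIST + 16 * i + 4 * j] = word   # 16 bytes per record

R2 = LIST                          # index register points at a record's ID
R3 = R4 = R5 = 0                   # running sums for the three tests
R6 = len(records)                  # loop counter, as in register R6
while R6 > 0:
    R3 += memory[4 + R2]           # 4(R2):  EA = 4 + [R2], test 1 score
    R4 += memory[8 + R2]           # 8(R2):  EA = 8 + [R2], test 2 score
    R5 += memory[12 + R2]          # 12(R2): EA = 12 + [R2], test 3 score
    R2 += 16                       # advance to the next record's ID
    R6 -= 1                        # decrement; loop while R6 > 0

print(R3, R4, R5)  # 200 210 240
```

Note that the offsets in the instructions never change; only the contents of the index register R2 do, which is exactly what makes the same three Add instructions work for every record.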
The two modes described next are useful for accessing data items in successive locations in the memory.

Autoincrement mode: the effective address of the operand is the contents of a register specified in the instruction. After accessing the operand, the contents of this register are automatically incremented to point to the next item in a list. The Autoincrement mode is represented by putting the specified register in parentheses, to show that the contents of the register are used as the effective address, followed by a plus sign to indicate that these contents are to be incremented after the operand is accessed. Thus, the Autoincrement mode is written as

(Ri)+

Autodecrement mode: the contents of a register specified in the instruction are first automatically decremented and are then used as the effective address of the operand. The Autodecrement mode is denoted by putting the specified register in parentheses, preceded by a minus sign to indicate that the contents of the register are to be decremented before being used as the effective address. Thus, we write

−(Ri)

In this mode, operands are accessed in descending address order. The reader may wonder why the address is decremented before it is used in the Autodecrement mode and incremented after it is used in the Autoincrement mode.

Advantages: the main reason for this is that these two modes can be used together to implement an important data structure called a stack. The actions performed by the Autoincrement and Autodecrement addressing modes can obviously be achieved by using two instructions, one to access the operand and the other to increment or decrement the register that contains the operand address. Combining the two operations in one instruction reduces the number of instructions needed to perform the desired task.

Disadvantages: it is not always advantageous to combine two operations into a single instruction, especially in a pipelined processor.
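A minimal sketch of how −(Ri) and (Ri)+ together implement push and pop on a stack. The stack-pointer starting value and the 4-byte word size are assumptions for illustration; the decrement-before-use and increment-after-use ordering follows the text:

```python
# Sketch: Autodecrement -(SP) implements push, Autoincrement (SP)+
# implements pop, on a stack that grows toward lower addresses.
memory = {}
SP = 2000                # register used as the stack pointer (value assumed)

def push(value):
    """Move value, -(SP): decrement the register first, then use it as the EA."""
    global SP
    SP -= 4              # autodecrement: decrement before use
    memory[SP] = value

def pop():
    """Move (SP)+, dest: use the register as the EA, then increment it."""
    global SP
    value = memory[SP]
    SP += 4              # autoincrement: increment after use
    return value

push(10)
push(32)
print(pop() + pop())  # 42  (elements come off in last-in, first-out order)
```

Because push decrements before writing and pop increments after reading, the two modes are exact inverses, which is why this pairing of the ordering rules makes a stack work with single instructions.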
9. Reduced Instruction Set Computers (RISC)

A RISC architecture simplifies the instruction set:

i. Simple instructions.
ii. Each instruction requires a small number of basic steps to execute; hence, instructions execute faster.
iii. Relatively few instructions in a RISC processor's instruction set.
iv. Relatively few, simple addressing modes, e.g. register addressing, immediate operands, and relative mode.
v. Memory access is limited to Load and Store instructions.
vi. All operations are done within the registers of the CPU.
vii. Fixed-length, easily decoded instruction format.
viii. Hardwired rather than microprogrammed control unit, for faster operation.
ix. The ability to execute one instruction per clock cycle by overlapping execution in a pipeline.
x. For a processor that has only simple instructions, a large number of instructions may be needed to perform a given programming task, giving a large value of N and a small value of S.
xi. A relatively large number of registers in the processor.

10. Complex Instruction Set Computers (CISC)

i. A large number of instructions, typically from 100 to 250.
ii. Some instructions that perform specialized tasks and are used infrequently.
iii. A large variety of addressing modes, typically from 5 to 20 different modes.
iv. Variable-length instruction formats; fewer instructions are needed, leading to a lower value of N and a larger value of S.
v. Instructions that manipulate operands in memory.

EXAMPLE PROBLEMS

Problem 1: List the steps needed to execute the machine instruction Load R2, LOC in terms of transfers between the components and some simple control commands. Assume that the address of the memory location containing this instruction is initially in register PC.

Solution: The required steps are:

∙ Send the address of the instruction word from register PC to the memory and issue a Read control command.
∙ Wait until the requested word has been retrieved from the memory, then load it into register IR, where it is interpreted (decoded) by the control circuitry to determine the operation to be performed.
∙ Increment the contents of register PC to point to the next instruction in memory.
∙ Send the address value LOC from the instruction in register IR to the memory and issue a Read control command.
∙ Wait until the requested word has been retrieved from the memory, then load it into register R2.

Problem 2: Quantify the effect on performance that results from the use of a cache in the case of a program that has a total of 500 instructions, including a 100-instruction loop that is executed 25 times. Determine the ratio of execution time without the cache to execution time with the cache. This ratio is called the speedup. Assume that main memory accesses require 10 units of time and cache accesses require 1 unit of time. We also make the following further assumptions so that we can simplify the calculations in order to easily illustrate the advantage of using a cache:

∙ Program execution time is proportional to the total amount of time needed to fetch instructions from either the main memory or the cache, with operand data accesses being ignored.
∙ Initially, all instructions are stored in the main memory, and the cache is empty.
∙ The cache is large enough to contain all of the loop instructions.

Solution: The 400 instructions outside the loop are fetched once, and the 100 loop instructions are fetched 25 times, for a total of 400 + 25 × 100 = 2900 instruction fetches. Execution time without the cache is therefore 2900 × 10 = 29,000 time units. With the cache, the first pass fetches all 500 instructions from the main memory (500 × 10 = 5000 time units), after which the loop instructions reside in the cache, so the remaining 24 passes fetch them from the cache (24 × 100 × 1 = 2400 time units). Execution time with the cache is 5000 + 2400 = 7400 time units, and the speedup is 29,000 / 7400 ≈ 3.92.

Problem 3: Convert the following pairs of decimal numbers to 5-bit 2's-complement numbers, then perform addition and subtraction on each pair. Indicate whether or not overflow occurs for each case.
(a) 7 and 13
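Part (a) can be checked with a short Python sketch that applies the sign-based overflow rule from the condition-codes section: overflow occurs only when the operands have the same sign and the sum's sign differs from theirs. The helper names below are our own, not part of any standard library:

```python
# Sketch: 5-bit 2's-complement addition with overflow detection,
# using the rule: overflow occurs when X and Y have the same sign
# and the sign of the sum S differs from that sign.
BITS = 5
MASK = (1 << BITS) - 1            # 0b11111

def to_twos_complement(x):
    """Encode x as a 5-bit 2's-complement pattern (x must lie in [-16, +15])."""
    return x & MASK

def sign_bit(pattern):
    return (pattern >> (BITS - 1)) & 1

def add_check_overflow(a, b):
    """Add two 5-bit patterns; return (result pattern, overflow flag V)."""
    result = (a + b) & MASK
    v = sign_bit(a) == sign_bit(b) and sign_bit(result) != sign_bit(a)
    return result, v

# Part (a): 7 = 00111, 13 = 01101. Their sum, 20, lies outside [-16, +15].
a, b = to_twos_complement(7), to_twos_complement(13)
s, overflow = add_check_overflow(a, b)
print(format(s, "05b"), overflow)    # 10100 True  -> 7 + 13 overflows

# Subtraction 7 - 13 is addition of the 2's complement of 13; -6 fits.
s2, overflow2 = add_check_overflow(a, to_twos_complement(-13))
print(format(s2, "05b"), overflow2)  # 11010 False -> 7 - 13 = -6, no overflow
```

The result 10100 would be read as −12 in 5-bit 2's complement, which is why the V flag must be checked: the bit pattern alone does not reveal that 7 + 13 exceeded the representable range.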