IM Computer Systems Organization
Module 1: Introduction to Computer Systems

Overview
In this introduction to computer systems, we will discuss in detail what a computer system is, the technical features of a computer system, computer hardware, computer system architecture, and other important topics related to computer systems.

Lesson Outcomes
o To understand computer system concepts
o To understand computer evolution and performance
o To examine the bus interconnection structures available
o To understand the top-level view of a computer system
o To discuss the interconnection structures and the function of each computer component

Lesson 1: Computer Architecture
Computer Architecture is a vital course in Computer Science; thus, it is one of the required courses in the program. Knowledge of computer organization and the underlying hardware is relevant for anyone who intends to write software. Programmers need to understand how programs affect the underlying hardware, particularly the instructions used, memory accesses, I/O operations, and timing. Knowledge of computer architecture is also needed for later courses, helping one to understand topics like compilers and operating systems. Understanding computer architecture allows a programmer to write programs that are faster, smaller, and less prone to errors, especially when programming embedded systems.

Unfortunately, learning about computer architecture is no easy task, as one will encounter numerous hardware details that can sometimes be counter-intuitive. It cannot be mastered in one course and generally requires a background in electricity and electronics. The complexity is non-linear: small increases in size can produce large increases in hardware complexity. The good news is that it is possible to understand architectural components without knowing all the low-level technical details. Usually, programmers only need to know the essentials, such as the characteristics of the major components, their roles in the overall system, and the consequences for software and software development.

Computer architecture and computer organization are often confused with one another and used interchangeably. Stallings (2016) provides a clear distinction between the two: computer architecture refers to those attributes of the (computer) system visible to a programmer that have a direct impact on the logical execution of a program, while computer organization pertains to the operational units and their interconnections that realize the architectural specifications. Incidentally, computer architecture is used interchangeably with instruction set architecture (ISA), which defines instruction formats, instruction opcodes, registers, and instruction and data memory; the effect of executed instructions on the registers and memory; and an algorithm for controlling instruction execution. Examples of architectural attributes include the instruction set, the number of bits used to represent various data types (e.g., numbers, characters), I/O mechanisms, and techniques for addressing memory. Organizational attributes include those hardware details transparent to the programmer, such as control signals, interfaces between the computer and peripherals, and the memory technology used.

Lesson 2: Components of a Computer System (Processor, Memory, Input/Output)
The computer is a device that operates upon information or data.
It is an electronic device that accepts input data, stores the data, performs arithmetic and logic operations, and provides the outputs in the desired format.

Figure 1 Basic elements of every computer

There are five fundamental components of any computer machine design: input/output, storage, communication, control, and processing. Figure 1 shows the five basic units of the simplest computer. The devices on the left correspond to sources and destinations of data, the channel corresponds to the links between components necessary to pass data among them, and the units on the right correspond to the control mechanism of a Turing machine. Because of cost and/or performance, several of these basic units are elaborated in different ways in extant machines. Incidentally, Alan Turing was the first person to envisage, and prove the possibility of, machines that could read programmed instructions for any conceivable task and then execute those instructions.

Input-Output. The input unit is formed by the input devices attached to the computer. Input devices are used to interact with a computer system, or to enter data and instructions into the computer. These devices convert input data and instructions into binary form (such as ASCII) that can be accepted by the computer. The output unit, on the other hand, is formed by the output devices attached to the computer. They present the results produced by the computer, which arrive as electric signals, in a human-understandable form. Typical examples of output devices are the monitor, printer, and speaker.

Storage. The data and instructions entered through an input unit must be stored on the computer before the actual processing starts. The results produced by the computer after processing are also kept somewhere before being passed to the output units. If intermediate results are produced during processing, they too must be stored somewhere in memory. The storage unit of a computer serves all these needs and consists of two types of memory or storage: primary and secondary storage.

Communication. Communication channels here refer to the links between the different computer units that enable data and control signals to pass among the different components. This is often referred to as the bus, and there are different buses connecting the various components, depending on the type of signals being passed.

Control. The control unit provides the necessary timing and control signals to all the operations in the computer. It controls the flow of data between the CPU, memory, and peripherals, and it controls the entire operation of the computer. It obtains the instructions from the program stored in main memory, interprets them, and issues the signals that cause the other units of the system to execute them. It is thus considered the central nervous system of the computer, providing the status, control, and timing signals necessary for the operation of the other parts of the CPU, memory, and I/O devices.

Processing (ALU). This is the area of the CPU where various computing functions are performed on data. The ALU performs arithmetic operations such as addition, subtraction, multiplication, and division, and logical operations such as comparison, AND, OR, and exclusive OR. The result of an operation is stored in an accumulator or in some other register.
Lesson 2: The Von Neumann Model
Most of the computer concepts we know today were developed by John von Neumann, a Hungarian-born scientist and one of the major legendary figures of twentieth-century mathematics. He worked on the Manhattan Project and was a prolific contributor to fields including physics, mathematics, economics, and game theory.

Prior to the development of the model, all programming was done at the digital logic level, as in the case of the ENIAC. This means that programming the computer involved moving plugs and wires, and that a different hardware configuration was needed to solve every unique problem type. As an example, configuring the ENIAC to solve a "simple" problem required many days of labor by skilled technicians.

Central to the von Neumann machine is the concept of the stored program: the principle that instructions and data are to be stored together, intermixed, in a single, uniform storage medium rather than separately, as was previously the case.

Figure 2 (a) John von Neumann; (b) the Von Neumann Model

Another concept central to the von Neumann machine is the program counter, a register that is used to indicate the location of the next instruction to be executed and which is automatically incremented by each instruction fetch. Almost all computers use this technique, since it clearly reduces the storage space that would otherwise be necessary if each instruction contained a field to indicate the address of its successor.

Today's stored-program computers have three hardware systems: a central processing unit (CPU), a main memory system, and an I/O system. They carry out sequential instruction processing, and there exists a single path between the CPU and main memory; this single path is known as the von Neumann bottleneck. These computers employ a fetch-decode-execute cycle to run programs.

Figure 3 General depiction of a von Neumann System
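To make the fetch-decode-execute cycle concrete, here is a minimal Python sketch of a hypothetical stored-program machine. The three opcodes (LOAD, ADD, HALT), the single accumulator, and the memory layout are invented for illustration; they are not drawn from any real ISA.

```python
# A minimal, hypothetical illustration of the fetch-decode-execute cycle.
# The opcodes (LOAD, ADD, HALT) and the single accumulator are invented
# for this sketch; real ISAs are far richer.

memory = [
    ("LOAD", 5),   # acc <- 5
    ("ADD", 7),    # acc <- acc + 7
    ("HALT", 0),
]

pc = 0        # program counter: address of the next instruction
acc = 0       # accumulator register
running = True

while running:
    opcode, operand = memory[pc]   # fetch
    pc += 1                        # the PC is incremented by each fetch
    if opcode == "LOAD":           # decode + execute
        acc = operand
    elif opcode == "ADD":
        acc += operand
    elif opcode == "HALT":
        running = False

print(acc)  # 12
```

Note how the program counter is automatically incremented by each fetch, exactly as described above; instructions and data sit together in one memory, which is the stored-program principle.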
Lesson 3: The System Bus Model
The System Bus Model is a refinement of the von Neumann model that has a CPU (ALU and control), memory, and an input/output unit. What is special about this model is its emphasis on the system bus as a key component. Communication among components is handled by a shared pathway called the system bus, which is made up of the data bus, the address bus, and the control bus. There is also a power bus, and some architectures may also have a separate I/O bus. Because a multipoint bus is a shared resource, access to it is controlled through protocols, which are built into the hardware.

A system bus is a single computer bus that connects the major components of a computer system, combining the functions of a data bus to carry information, an address bus to determine where it should be sent, and a control bus to determine its operation. The technique was developed to reduce costs and improve modularity, and although it was popular in the 1970s and 1980s, more modern computers use a variety of separate buses adapted to more specific needs.

Figure 4 System Bus Model

Lesson 4: Levels of Abstraction (Levels of Machines)
There are a number of levels in a computer (the exact number is open to debate), from the user level down to the transistor level. Progressing from the top level downward, the levels become less abstract as more of the internal structure of the computer becomes visible.

Figure 5 Levels of Machines

Tanenbaum (2006) describes a six-level model that applies to most modern computers. Yet another level, the device level, exists below level 0; it is not shown because it falls within the realm of electrical engineering. At the device level, the designer sees individual transistors, which are the lowest-level primitives available to computer designers.

Level 0 – Digital Logic Level
The interesting objects at this level are gates. Each gate has one or more digital inputs (signals representing a 0 or 1) and computes as output some simple function of these inputs, such as AND or OR. Each gate is built from a handful of transistors. Gates can be combined to form a 1-bit memory, which can store a 0 or a 1, and 1-bit memories can be combined in groups (e.g., of 16, 32, or 64) to form registers, each of which can hold a single binary number up to some maximum value. Gates can also be combined to form the main computing engine itself.

Figure 6 A six-level computer

Level 1 – Microarchitecture Level
At the microarchitecture level, we see a collection of (typically) 8 to 32 registers that form a local memory, and a circuit called an ALU (Arithmetic Logic Unit) that is capable of performing simple arithmetic operations. The registers are connected to the ALU to form a data path, over which the data flows. The basic operation of the data path consists of selecting one or two registers and having the ALU operate on them, for example adding them together, with the result stored back in some register. On some machines the operation of the data path is controlled by a program called a microprogram; on other machines it is controlled directly by hardware.

Level 2 – Instruction Set Architecture Level
The Instruction Set Architecture (ISA) level is defined by the machine's instruction set. This is the set of instructions carried out interpretively by the microprogram or directly by the hardware execution units.

Level 3 – Operating System Machine Level
The Operating System Machine level provides a different memory organization, a new set of instructions, the ability to run one or more programs concurrently, and various other features. The new facilities added are carried out by an interpreter running at level 2, which has come to be called an operating system (OS). Level 3 instructions identical to those of level 2 are carried out directly by the microprogram (or hardwired control), not by the OS. In other words, some of the level 3 instructions are interpreted by the OS, and some are interpreted directly by the microprogram.

Between Levels 3 and 4
The lower three levels are not for the average programmer; instead, they are primarily for running the interpreters and translators needed to support the higher levels. These are written by system programmers who specialize in developing new virtual machines, while levels 4 and above are intended for the applications programmer. Levels 2 and 3 are always interpreted; levels 4 and above are usually, but not always, supported by translation. Languages at levels 1, 2, and 3 are generally numeric (long series of numbers); from level 4 up, the languages contain words and abbreviations meaningful to humans.

Level 4 – Assembly Language Level
This level is really a symbolic form for one of the underlying languages. It provides a method for people to write programs for levels 1, 2, and 3 in a form that is not as unpleasant as the virtual machine languages themselves.
Programs in assembly language are first translated to level 1, 2, or 3 language and then interpreted by the appropriate virtual or actual machine. The program that performs the translation is called an assembler.

Level 5 – Problem-oriented Language Level
This level usually consists of languages designed to be used by applications programmers. These languages are generally called higher-level languages; examples are Java, C, BASIC, LISP, and Prolog. Programs written in these languages are generally translated to level 3 or 4 by translators known as compilers, although occasionally they are interpreted instead. In some cases, level 5 consists of an interpreter for a specific application domain, such as symbolic mathematics.

Thus, computers are designed as a series of levels, each one built on its predecessors. Each level represents a distinct abstraction, with different operations and objects present, allowing us to handle the complexity of computer design. The set of data types, operations, and features of each level is called its architecture. Implementation aspects, such as the kind of chip technology used to implement the memory, are not part of the architecture. The study of how to design those parts of a computer system that are visible to programmers is called computer architecture.

Lesson 5: Evolution of Computers
The development of computers can be traced through the development of the different physical components that form part of the machine. We will not discuss the intricate details of this history but rather present some of the most important aspects of the evolution of computers. Refer to the links at the end of this unit if you would like to refresh yourselves on the evolution of computers.

Generation, in computer terminology, refers to a change in the technology with which computers are built. Previously, the term generation was used to distinguish between varying hardware technologies, but it now covers both hardware and software, which together make up an entire computer system. Beyond the third generation there is less general agreement on defining generations of computers, owing to the rapid advances in integrated circuit technology. With the rapid pace of technology, the high rate of introduction of new products, and the importance of software and communications as well as hardware, classification by generation becomes less clear and less meaningful.

1st Generation: 1945–55, vacuum tubes, relays, mercury delay lines
The first generation of computers used vacuum tubes for digital logic elements and memory. As a result, they were enormous, literally taking up entire rooms and costing a fortune to run. Vacuum tubes were inefficient, consuming huge amounts of electricity and generating a lot of heat, which caused ongoing breakdowns. All such tubes have, at minimum, an anode and a cathode in a vacuum. Their appearance varies from small ceramic packages the size of a corn seed to compact steel enclosures a meter or more tall; commonly available ones are cylindrical and made of glass or aluminum.

Figure 7 Sample of Vacuum Tubes

These first-generation computers relied on 'machine language' (the most basic programming language, understood directly by computers) and were limited to solving one problem at a time. Input was based on punched cards and paper tape. Output came out on printouts.
The two notable machines of this era were the UNIVAC and ENIAC machines; the UNIVAC was the first commercial computer, the first unit of which was delivered in 1951 to the US Census Bureau.

2nd Generation: 1955–65, discrete transistors, and magnetic cores
The transistor was invented at Bell Labs in 1947 but was not used significantly in computers until the end of the 1950s. Within ten years, however, the transistor revolutionized computers and made vacuum tube machines obsolete.

Figure 8 Sample of Transistors

The replacement of vacuum tubes by transistors marked the advent of the second generation of computing. Transistors were a big improvement over the vacuum tube, despite still subjecting computers to damaging levels of heat. They were hugely superior, making computers smaller, faster, cheaper, and less heavy on electricity use. Second-generation machines still relied on punched cards for input and printouts for output.

The language evolved from cryptic binary language to symbolic ('assembly') languages, which meant programmers could create instructions in words. At about the same time, high-level programming languages were being developed (early versions of COBOL and FORTRAN). Transistor-driven machines were the first computers to store instructions in their memories, moving from magnetic drum to magnetic core technology. The early versions of these machines were developed for the atomic energy industry.

3rd Generation: 1965–71, small- and medium-scale integrated circuits
Third-generation computers saw the introduction of integrated circuits (ICs), which are still in use today. The integrated circuit was first demonstrated by Jack Kilby in 1958, with Robert Noyce's silicon integrated circuit following in 1959; it allowed dozens of transistors to be put on a single chip. This packaging made it possible to build computers that were smaller, faster, and cheaper than their transistorized predecessors.

Figure 9 Sample of Integrated Circuit

By this phase, transistors were being miniaturized and put on silicon chips (called semiconductors), which led to a massive increase in the speed and efficiency of these machines. These were the first computers where users interacted using keyboards and monitors that interfaced with an operating system, a significant leap up from punched cards and printouts. This enabled machines to run several applications at once, using a central program that monitored memory. As a result of these advances, which again made machines cheaper and smaller, a new mass market of users emerged during the '60s.

An IC can be further classified as digital, analog, or a combination of both. The most common example of a modern IC is the computer processor, which consists of billions of fabricated transistors, logic gates, and other digital circuitry.

4th Generation: 1971–1980, single-chip microcomputer, VLSI microprocessor
In the fourth generation of computers, the invention of the microprocessor (commonly known as the CPU) helped shrink computers to the desktop and, later, laptop sizes that we still know and use today. This revolution can be summed up in one word: Intel. The chipmaker developed the Intel 4004 chip in 1971, which positioned all computer components (CPU, memory, input/output controls) onto a single chip. What filled a room in the 1940s now fit in the palm of the hand. The Intel chip housed thousands of integrated circuits. In 1981, we saw the first home computers, brought to us by IBM, and in 1984, the first Apple Macintosh was introduced.
Microprocessors have even moved beyond the realm of computers and into an increasing number of everyday products.

Figure 10 A VLSI chip (right) and the circuitry found inside (left)

The increased power of these small computers meant they could be linked, creating networks, which ultimately led to the development, birth, and rapid evolution of the Internet. Other major advances during this period were the graphical user interface (GUI), the mouse, and, more recently, the astounding advances in laptop capability and handheld devices.

5th Generation: 1980–present, ULSI microprocessor, low-power and invisible computers
Although we are still using technology from the fourth generation, we are now going into a new age: the fifth generation. This generation is characterized by the introduction of artificial intelligence. Computer devices with artificial intelligence are still in development, but some of these technologies are beginning to emerge and be used, such as voice recognition (e.g., Apple's Siri or Amazon's Alexa). AI is constantly adapting and, moving forward, is expected to become more tailored towards individual business needs. The hope, as this generation progresses, is that computers can begin to learn self-organization, which sounds pretty appealing if organization isn't something that comes naturally to you!

AI is a reality made possible by using parallel processing and superconductors. Looking to the future, computers will be radically transformed again by quantum computation and molecular and nanotechnology. The essence of the fifth generation is using these technologies to ultimately create machines that can process and respond to natural language and that have the capability to learn and organize themselves.

Figure 11 The different technologies used in the development of the computer that created the generations we know today

Scale of Integration
The number of components fitted into a standard-size IC represents its integration scale, or its density of components. The integration scale is classified as follows:
o Small scale integration (SSI) – tens to hundreds of active components (transistors) per chip
o Medium scale integration (MSI) – hundreds to thousands of transistors per chip
o Large scale integration (LSI) – thousands to several hundred thousand active components per chip
o Very large scale integration (VLSI) – up to 1 million transistors per chip
o Ultra large scale integration (ULSI) – a modern IC with millions to billions of transistors per chip
o Giga scale integration (GSI) – more than one billion transistor gates per chip

Table 1 Some milestones in the development of the modern digital computer

Lesson 6: Measuring Performance
Performance is a rather vague term when used in the context of computer systems. Generally, performance describes how quickly a given system can execute a program; systems that can execute programs in less time are said to have higher performance. The best measure of computer performance is the execution time of the program, but it is generally impractical to test all of the programs that will be run on a given system before deciding which computer to purchase or when making design decisions. Instead, a variety of metrics have been devised to describe computer performance, including metrics for the performance of individual computer subsystems. Note that many factors other than performance may influence design or purchase decisions.
Ease of programming is an important consideration, as the time and expense required to develop needed programs may be more significant than the difference in execution times of the programs once they have been developed. There is also the issue of compatibility, as most programs are sold as binary images that will run only on a particular family of processors. If the program you need will not run on a given system, it does not matter how quickly the system executes other programs.

MIPS (Millions of Instructions Per Second)
An early measure of computer performance was the rate at which a given machine executes instructions. This is computed by dividing the number of instructions executed in running a program by the time required to run the program, and it is expressed in millions of instructions per second, or MIPS. MIPS has fallen out of use as a measure of performance, however, since it does not account for the fact that different systems often require different numbers of instructions to implement a given program. A computer's MIPS rating does not tell one anything about how many instructions it requires to perform a given task, making it less useful than other metrics for comparing the performance of different systems.

CPI/IPC (Cycles per Instruction/Instructions per Cycle)
Another metric used to describe computer performance is the number of clock cycles required to execute each instruction, or cycles per instruction (CPI). CPI is calculated by dividing the number of clock cycles required to execute the program by the number of instructions executed in running the program. For systems that can execute more than one instruction per cycle, the number of instructions executed per cycle, or IPC, is often used instead of CPI. IPC is calculated by dividing the number of instructions executed in running a program by the number of clock cycles required to execute the program; it is the reciprocal of CPI. These two metrics give the same information, and the choice of which one to use is generally based on which of the values is greater than 1. When using CPI and IPC to compare systems, it is important to remember that high IPC values indicate that the reference program took fewer cycles to execute, while high CPI values indicate that more cycles were required. Simply put, a large IPC tends to indicate good performance, while a large CPI indicates poor performance.

To illustrate: a given program consists of a 100-instruction loop that is executed 42 times. If it takes 16,000 cycles to execute the program on a given system, what are the system's CPI and IPC values for the program?

SOLUTION: The 100-instruction loop is executed 42 times, so the total number of instructions executed is 100 x 42 = 4,200. It takes 16,000 cycles to execute the program, so the CPI is 16,000 / 4,200 = 3.81. To compute the IPC, we divide 4,200 instructions by 16,000 cycles, getting an IPC of 0.26.

CPI and IPC are even less useful measures of actual system performance than MIPS, as they do not contain any information about a system's clock rate or about how many instructions the system requires to perform a task. If you know a system's MIPS rating on a given program, you can divide the number of instructions executed in running the program by the rating (times one million) to determine how long the program took to complete. If you know the system's CPI on a given program, you can multiply it by the number of instructions in the program to get the number of cycles it took to complete the program, but you have to know the number of cycles per second (the system's clock rate) to convert that into the amount of time needed to execute the program.
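The relationships above are easy to express directly. The following Python sketch (the helper names and the sample clock rate are ours, for illustration) reproduces the worked example:

```python
# A small sketch of the metrics just defined; function names are ours.

def mips(instructions: int, time_s: float) -> float:
    """Millions of instructions executed per second."""
    return instructions / time_s / 1e6

def cpi(cycles: int, instructions: int) -> float:
    """Average clock cycles per instruction."""
    return cycles / instructions

def ipc(cycles: int, instructions: int) -> float:
    """Average instructions per clock cycle (the reciprocal of CPI)."""
    return instructions / cycles

# The worked example: a 100-instruction loop executed 42 times,
# taking 16,000 cycles in total.
insns = 100 * 42
print(round(cpi(16_000, insns), 2))      # 3.81
print(round(ipc(16_000, insns), 2))      # 0.26

# Converting cycles into time requires the clock rate:
clock_hz = 1e9                           # assume a 1 GHz clock, for illustration
print(16_000 / clock_hz)                 # 1.6e-05 seconds
print(mips(insns, 16_000 / clock_hz))    # 262.5 MIPS on this hypothetical system
```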
Thus, CPI and IPC are rarely used to compare actual computer systems. However, they are very common metrics in computer architecture research, as most research is done in simulation, using programs that simulate a particular architecture to estimate how many cycles a given program will take to execute on that architecture.

Benchmark Suites
Both MIPS and CPI/IPC have significant limitations as measures of computer performance. Benchmark suites, a third measure of computer performance, were developed to address those limitations. A benchmark suite is composed of a set of programs believed to be typical of the programs that will be run on the system. A system's score on the benchmark suite is based on how long it takes the system to execute all the programs in the suite. Many different benchmark suites exist that generate estimates of a system's performance on different types of applications.

Benchmark suites provide several advantages over MIPS and CPI/IPC. First, their performance results are based on total execution times, not the rate of instruction execution. Second, they average a system's performance across multiple programs to generate an estimate of its average speed, making a system's overall rating on a benchmark suite a better indicator of its overall performance than its MIPS rating on any one program. Lastly, many benchmarks require manufacturers to publish their systems' results on the individual programs within the benchmark as well as the system's overall score on the suite, which makes it possible to compare individual benchmark results directly.

Geometric versus Arithmetic Mean
Many benchmark suites use the geometric rather than the arithmetic mean to average the results of the programs contained in the suite, because a single extreme value has less of an impact on the geometric mean of a series than on the arithmetic mean. The geometric mean of n values is calculated by multiplying the n values together and taking the nth root of the product. The arithmetic mean, or average, of a set of values is calculated by adding all the values together and dividing by the number of values.

For example, what are the arithmetic and geometric means of the values 4, 2, 4, 82?

SOLUTION:
The arithmetic mean of this series is (4 + 2 + 4 + 82) / 4 = 23. The geometric mean is (4 x 2 x 4 x 82)^(1/4) ≈ 7.16. Note that the inclusion of one extreme value in the series had a much greater effect on the arithmetic mean than on the geometric mean.

Speedup
Computer architects often use the term speedup to describe how the performance of an architecture changes as different improvements are made to it. Speedup is simply the ratio of the execution times before and after a change is made:

Speedup = Execution Time_before / Execution Time_after

For example, if a program takes 25 seconds to run on one version of an architecture and 15 seconds to run on the new version, the overall speedup is 25 seconds / 15 seconds = 1.67.
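A short sketch makes the contrast between the two means, and the speedup ratio, concrete (function names are ours):

```python
# Contrasting the two means, and computing speedup, for the examples above.
import math

def arithmetic_mean(scores):
    return sum(scores) / len(scores)

def geometric_mean(scores):
    # nth root of the product (math.prod requires Python 3.8+)
    return math.prod(scores) ** (1 / len(scores))

scores = [4, 2, 4, 82]
print(arithmetic_mean(scores))           # 23.0
print(round(geometric_mean(scores), 2))  # 7.16 -- the outlier 82 matters less

def speedup(time_before: float, time_after: float) -> float:
    return time_before / time_after

print(round(speedup(25, 15), 2))         # 1.67
```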
Amdahl's Law
The most important rule for designing high-performance computer systems is to make the common case fast. By common we mean the most time consuming, not necessarily the most frequently executed, as the uncommon case does not make much difference. The overall performance improvement gained by optimizing a single part of a system is limited by the fraction of time that the improved part is used. This rule has been expressed as Amdahl's Law, which states

Execution Time_new = Execution Time_old x [Frac_unused + (Frac_used / Speedup_used)]

In this equation, Frac_unused is the fraction of time (not instructions) that the improvement is not in use, Frac_used is the fraction of time that the improvement is in use, and Speedup_used is the speedup that occurs when the improvement is used (this would be the overall speedup if the improvement were in use at all times). Note that Frac_used and Frac_unused are computed using the execution time before the modification is applied. Computing these values using the execution time after the modification is applied will give incorrect results.

Amdahl's Law can be rewritten using the definition of speedup:

Speedup = Execution Time_old / Execution Time_new = 1 / [Frac_unused + (Frac_used / Speedup_used)]

Example: Suppose that a given architecture does not have hardware support for multiplication, so multiplications have to be done through repeated addition, as was the case in some early microprocessors. If it takes 200 cycles to perform a multiplication in software and 4 cycles to perform a multiplication in hardware, what is the overall speedup from hardware support for multiplication if a program spends 10 percent of its time doing multiplications? What about a program that spends 40 percent of its time doing multiplications?

SOLUTION: In both cases, the speedup when the multiplication hardware is used is 200 / 4 = 50 (the ratio of the time to do a multiplication without the hardware to the time with the hardware). In the case where the program spends 10 percent of its time doing multiplications, Frac_unused = 0.9 and Frac_used = 0.1. Plugging these values into Amdahl's Law, we get Speedup = 1 / [0.9 + (0.1/50)] = 1.11. If the program spends 40 percent of its time doing multiplications before the addition of hardware multiplication, then Frac_unused = 0.6 and Frac_used = 0.4, and we get Speedup = 1 / [0.6 + (0.4/50)] = 1.64.

The example illustrates the impact that the fraction of time an improvement is used has on overall performance. As Speedup_used goes to infinity, the overall speedup converges to 1/Frac_unused, because the improvement cannot do anything about the execution time of the fraction of the program that does not use the improvement.
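Amdahl's Law translates directly into a one-line function; this sketch (names ours) reproduces the example and the limiting behavior:

```python
# A direct transcription of Amdahl's Law; function and argument names are ours.

def amdahl_speedup(frac_used: float, speedup_used: float) -> float:
    """Overall speedup when an improvement covers frac_used of the original
    execution time and speeds that part up by a factor of speedup_used."""
    frac_unused = 1.0 - frac_used
    return 1.0 / (frac_unused + frac_used / speedup_used)

# The multiplication-hardware example: 200 cycles in software vs. 4 in
# hardware gives speedup_used = 200 / 4 = 50.
print(round(amdahl_speedup(0.1, 50), 2))   # 1.11
print(round(amdahl_speedup(0.4, 50), 2))   # 1.64

# As speedup_used grows without bound, overall speedup converges to
# 1 / frac_unused:
print(round(amdahl_speedup(0.4, 1e9), 2))  # ~1.67, i.e., 1 / 0.6
```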
References
Watch: A Short History of the Computer (https://youtu.be/eKPEVNlmaUg)
Read: Section 1.3 A Brief History of Computers (pp. 11-27, Computer Organization and Architecture: Designing for Performance, 10th Edition, by William Stallings)
Section 1.2 Milestones in Computer Architecture (pp. 13-26, Structured Computer Organization, 5th Edition, by Andrew S. Tanenbaum)
Review: Key Events in the History of Computing (http://ei.cs.vt.edu/%7Ehistory/50th/30.minute.show.html)
Processor Basics - the various generations of processors over the past 20 years (http://www.iro.umontreal.ca/~pift1214/Transparents/micro-generations.html)

Assessment

Speedup
1. If the 1998 version of a computer executes a program in 200 s and the version of the computer made in the year 2000 executes the same program in 150 s, what is the speedup that the manufacturer has achieved over the two-year period?
2. To achieve a speedup of 3 on a program that originally took 78 s to execute, what must the execution time of the program be reduced to?

Measuring Performance
1. Why are benchmark programs and benchmark suites used to measure computer performance?
2. Why are there multiple benchmarks used by computer architects, instead of one "best" benchmark?
3. When running a particular program, computer A achieves 100 MIPS and computer B achieves 75 MIPS. However, computer A takes 60 s to execute the program, while computer B takes only 45 s. How is this possible?

CPI/IPC
1. When run on a given system, a program takes 1,000,000 cycles. If the system achieves a CPI of 40, how many instructions were executed in running the program?
2. What is the IPC of a program that executes 35,000 instructions and requires 17,000 cycles to complete?

Geometric versus Arithmetic Mean
1. Given the following set of individual benchmark scores for each of the programs in the integer portion of the SPEC2000 benchmark, compute the arithmetic and geometric means of each set.

Benchmark     Score Before Improvement   Score After Improvement
164.gzip      10                         12
175.vpr       14                         16
176.gcc       23                         28
181.mcf       36                         40
186.crafty    9                          12
197.parser    12                         120
252.eon       25                         28
253.perlbmk   18                         21
254.gap       30                         28
255.vortex    17                         21
256.bzip2     7                          10
300.twolf     38                         42

Module 2: Machine Representation of Numbers and Characters

Overview
Data and instructions cannot be entered and processed directly into computers using human language. Any type of data, be it numbers, letters, special symbols, sound, or pictures, must first be converted into machine-readable, i.e., binary, form. For this reason, it is important to understand how a computer, together with its peripheral devices, handles data in its electronic circuits, on magnetic media, and in optical devices.

Lesson Outcomes
After successful completion of this module, you should be able to:
o To understand basic concepts of computer internal data representation, focusing on numeric data and character codes
o To understand the concepts of number systems
o To convert between octal, hexadecimal, decimal, and binary systems
o To understand 2's complement representation of negative binary numbers
o To understand the Boolean functions: OR, AND, NOR, NAND, XOR

Lesson 1: Representation of Information in a Computer System
Data refers to the symbols that represent people, events, things, and ideas. Data can be a name, a number, the colors in a photograph, or the notes in a musical composition. Data representation refers to the form in which data is stored, processed, and transmitted. Devices such as smartphones, iPods, and computers store data in digital formats that can be handled by electronic circuitry.

Computers use binary digits to represent all types of information inside the computer. From this term we get the word bit, which stands for binary digit. A bit is a 0 or 1 used in the digital representation of data. The binary system is suitable for this purpose for the following reasons:
1. Electronic components in digital computers operate in binary mode. A switch is either on (1) or off (0); a transistor is either conducting (1) or non-conducting (0).
2. Computers must handle only two digits (bits) rather than ten, so a binary system simplifies design, reduces cost, and improves the reliability of the computer.
3. Everything that can be done with the decimal system can also be done using a binary system.

The terms bits, bytes, nibble, and word are used widely in reference to computer memory and data size.
o Bit: a binary digit, which can be 0 or 1. It is the basic unit of data or information in digital computers.
o Byte: a group of 8 bits used to represent a character. A byte is considered the basic unit for measuring memory size in a computer.
o Nibble: half a byte, i.e., a grouping of 4 bits.
o Word: two or more bytes make a word. The term word length is used as the measure of the number of bits in each word. For example, a word can have a length of 16 bits, 32 bits, 64 bits, etc.

Character Codes
Various schemes have been developed to represent data in the computer and to conveniently transmit it between systems. There is a need to represent both numeric and alphabetic information. There is also a need to represent special characters (e.g., ?, /, {, }) and control information (e.g., backspace (BS), delete (DEL), new line (NL)). We commonly refer to these data as alphanumeric data, and the representation of alphanumeric characters in bits 0 and 1 is done by character codes. There are three widely used character codes:
1. Binary Coded Decimal (BCD)
2. Extended Binary Coded Decimal Interchange Code (EBCDIC)
3. American Standard Code for Information Interchange (ASCII)

Binary Coded Decimal (BCD)
The first coding scheme that became widely accepted was called BCD, or Binary Coded Decimal. It was the dominant representation used on second-generation computing systems. Binary Coded Decimal is a 4-bit code used to represent numeric data only; for example, the number 9 can be represented using BCD as 1001₂. Binary Coded Decimal is mostly used in simple electronic devices like calculators and microwaves, because it makes it easier to process and display individual numbers on their Liquid Crystal Display (LCD) screens.

Standard Binary Coded Decimal, an enhanced format of Binary Coded Decimal, is a 6-bit representation scheme which can represent non-numeric characters. The four rightmost bits (low-order bits) are called the numeric bits, while the two leftmost bits (high-order bits) are called the zone bits. This allows 64 characters to be represented; for example, the letter A can be represented as 110001₂ in standard Binary Coded Decimal. When the zone bits are 00, the numeric bits are interpreted as the numeric digits 0 through 9. When the zone bits are any of the other three configurations (01, 10, 11), the interpretation is a letter or special symbol.

Figure 12 BCD representation of decimal numbers
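Because BCD encodes each decimal digit independently, it is a code rather than a positional numeral. A short Python sketch (function names are ours) makes the digit-by-digit nature visible:

```python
# A minimal sketch of 4-bit BCD: each decimal digit becomes its own
# 4-bit pattern (this is a code, not a positional binary number).

def to_bcd(n: int) -> str:
    """Encode a non-negative decimal integer digit by digit."""
    return " ".join(format(int(d), "04b") for d in str(n))

def from_bcd(bits: str) -> int:
    """Decode space-separated 4-bit groups back to a decimal integer."""
    return int("".join(str(int(group, 2)) for group in bits.split()))

print(to_bcd(9))              # 1001
print(to_bcd(35))             # 0011 0101  (not 100011, the binary for 35)
print(from_bcd("0011 0101"))  # 35
```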
Commonly Used Binary Codes
The following is a list of some of the commonly used binary codes. The first three, i.e., 8421, 2421, and 5211, are weighted binary codes, while the other two are non-weighted binary codes.
o 8421 Codes
o 2421 Codes
o 5211 Codes
o Excess-3 Codes
o Gray Codes

Weighted Binary Codes
The value of a digit in a number depends not only on its own intrinsic value but also on its location in the number. The 1st, 2nd, 3rd, and 4th digit positions from right to left represent the ones, tens, hundreds, and thousands. For example, the value 2363, using the positional values, is understood to mean 3 ones, 6 tens, 3 hundreds, and 2 thousands. The values assigned to consecutive places in the decimal system, which is a place value system, are 10⁴, 10³, 10², 10¹, 10⁰, 10⁻¹, 10⁻², 10⁻³ … from left to right. It can be understood that the weight of a digit in the decimal system is 10. For example,

(3546.25)₁₀ = 3 x 10³ + 5 x 10² + 4 x 10¹ + 6 x 10⁰ + 2 x 10⁻¹ + 5 x 10⁻²

In the same way, the values assigned to consecutive places in the binary system, which is also a place value system but called a weighted binary system, are 2⁴, 2³, 2², 2¹, 2⁰, 2⁻¹, 2⁻², 2⁻³ … from left to right. It can easily be understood that the weight of a digit in the binary system is 2. For example:

(1110110)₂ = 1 x 2⁶ + 1 x 2⁵ + 1 x 2⁴ + 0 x 2³ + 1 x 2² + 1 x 2¹ + 0 x 2⁰ = 64 + 32 + 16 + 0 + 4 + 2 + 0 = (118)₁₀

Binary Weights
Whenever any binary number appears, its decimal equivalent can be found easily as follows: when there is a 1 in a digit position, the weight of that position is added; when there is a 0, the weight of that position is disregarded. For example, the binary number 1100 has a decimal equivalent of 8 + 4 + 0 + 0 = 12.

o 8421 Code or BCD Code. The decimal numbers 0 through 9 can be expressed in binary as shown below. All these binary numbers are again expressed in the last column by expanding into 4 bits. As per the weighted binary digits, the 4-bit binary numbers can be expressed according to their place values from left to right as 8421 (2³ 2² 2¹ 2⁰ = 8421).

Table 2 Binary codes for decimal digits
Decimal Number   Binary Number   4-Bit Expression (8421)
0                0               0000
1                1               0001
2                10              0010
3                11              0011
4                100             0100
5                101             0101
6                110             0110
7                111             0111
8                1000            1000
9                1001            1001

As per the above, all the decimal numbers written in the 4-bit binary code are in the form 8421. As this is a straight code, any decimal number can be expressed easily, because the weights of the positions make conversion into 8421 code straightforward. Note, however, that four bits are required to code each decimal digit; thus 35₁₀ is represented as 0011 0101 using BCD, rather than 100011₂.

o 2421 Code. This code is also a 4-bit code, where the binary weights carry 2, 4, 2, 1 from left to right.
o 5211 Code. This is a 4-bit code where the binary weights carry 5, 2, 1, 1 from left to right.

Table 3 Binary codes for decimal digits using 2421 and 5211 code
Decimal Number   Binary Number   2421 Code   5211 Code
0                0               0000        0000
1                1               0001        0001
2                10              0010        0011
3                11              0011        0101
4                100             0100        0111
5                101             1011        1000
6                110             1100        1010
7                111             1101        1100
8                1000            1110        1110
9                1001            1111        1111
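In any weighted code, a digit's value is recovered by multiplying each bit by its position weight and summing. A tiny sketch (ours) checks the digit 5 against the three weighted codes tabulated above:

```python
# Decoding a weighted binary code: multiply each bit by its weight and sum.

def decode_weighted(bits: str, weights: list[int]) -> int:
    return sum(int(b) * w for b, w in zip(bits, weights))

# The digit 5 in each of the three weighted codes from the tables above:
print(decode_weighted("0101", [8, 4, 2, 1]))  # 5 (8421 / BCD code)
print(decode_weighted("1011", [2, 4, 2, 1]))  # 5 (2421 code)
print(decode_weighted("1000", [5, 2, 1, 1]))  # 5 (5211 code)
```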
Non-Weighted Codes
Some codes do not follow the weights of the sequence of binary numbers and are called non-weighted codes. The ASCII code and the Gray code are examples; they are designed for special-purpose applications and do not follow weighted binary number calculations.

o Excess-3 Code. The Excess-3 code is a non-weighted and self-complementary BCD code used to represent decimal numbers. This code has a biased representation and plays an important role in arithmetic operations, because it resolves deficiencies encountered when the 8421 BCD code is used to add two decimal digits whose sum is greater than 9. The Excess-3 code uses a special type of algorithm, which differs from the binary positional number system and from normal non-biased BCD. We can easily get the excess-3 code of a decimal number by simply adding 3 to each decimal digit and then writing the 4-bit binary number for each digit. We can find the excess-3 code of a given binary number using the following steps:
1. Find the decimal number of the given binary number.
2. Add 3 to each digit of the decimal number.
3. Find the binary code of each digit of the newly generated decimal number.
We can also add 0011 to each 4-bit BCD code of the decimal number to get the excess-3 code.

o Gray Code. Gray code belongs to a class of codes known as minimum-change codes, in which a number changes by only one bit as it proceeds from one number to the next. Hence this code is not useful for arithmetic operations, but it finds extensive use in shaft encoders, some types of analog-to-digital converters, etc. Gray code is a reflected code and is shown in Table 4. A Gray code may contain any number of bits; here we take the example of the 4-bit Gray code. The code shown in Table 4 is only one of many such possible codes. To obtain a different reflected code, one can start with any bit combination and proceed to obtain the next combination by changing only one bit from 0 to 1 or 1 to 0 in any desired random fashion, provided no two numbers have identical code assignments. The Gray code is not a weighted code.

Table 4 Binary and Gray codes
Decimal Number   Binary Number   Gray Code
0                0000            0000
1                0001            0001
2                0010            0011
3                0011            0010
4                0100            0110
5                0101            0111
6                0110            0101
7                0111            0100
8                1000            1100
9                1001            1101
10               1010            1111
11               1011            1110
12               1100            1010
13               1101            1011
14               1110            1001
15               1111            1000

Conversion of a Binary Number into Gray Code
Any binary number can be converted into its equivalent Gray code by the following steps:
1. the MSB of the Gray code is the same as the MSB of the binary number;
2. the second bit of the Gray code equals the exclusive-OR of the MSB and the second bit of the binary number; it will be 0 if the binary bits are the same and 1 if they differ;
3. the third bit of the Gray code equals the exclusive-OR of the second and third bits of the binary number, and all the lower order bits follow the same mechanism.
For example, the binary number 10110 converts to the Gray code 11101.

Conversion of Gray Code into a Binary Number
Any Gray code can be converted into an equivalent binary number by the following steps:
1. the MSB of the binary number is the same as the MSB of the Gray code;
2. the second bit of the binary number equals the exclusive-OR of the MSB of the binary number and the second bit of the Gray code; it will be 0 if the bits are the same and 1 if they differ;
3. the third bit of the binary number equals the exclusive-OR of the second bit of the binary number and the third bit of the Gray code, and all the lower order bits follow the same mechanism.
For example, the Gray code word 11011 converts to the binary number 10010.
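The XOR rules for both directions translate directly into code; here is a compact sketch (function names ours):

```python
# Both Gray-code conversions, using the XOR rules described above.

def binary_to_gray(b: str) -> str:
    g = b[0]                                 # MSB is copied unchanged
    for i in range(1, len(b)):
        g += str(int(b[i - 1]) ^ int(b[i]))  # XOR of adjacent binary bits
    return g

def gray_to_binary(g: str) -> str:
    b = g[0]                                 # MSB is copied unchanged
    for i in range(1, len(g)):
        b += str(int(b[i - 1]) ^ int(g[i]))  # XOR previous *binary* bit
    return b

print(binary_to_gray("10110"))  # 11101
print(gray_to_binary("11011"))  # 10010
```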
Extended Binary Coded Decimal Interchange Code (EBCDIC)
Extended Binary Coded Decimal Interchange Code, or EBCDIC (pronounced eb-see-dic), is an 8-bit character-coding scheme used primarily on IBM computers. A total of 256 (2⁸) characters can be coded using this scheme; for example, the symbolic representation of the letter A using EBCDIC is 11000001₂. The extra characters are used for lowercase letters, non-printing characters used to send control information, and other special printing characters, and there are still unused codes for future growth. EBCDIC also uses four numeric bits but adds two more zone bits. In an attempt to maintain compatibility with the older BCD code, the meaning of the numeric bits was maintained for the digits and alphabetic letters.

American Standard Code for Information Interchange (ASCII)
ASCII, or the American Standard Code for Information Interchange (pronounced as-kee), was developed through the cooperation of the computing and communications industries to simplify and standardize machine-to-machine and system-to-system communication. The set consists of letters, digits, special characters, control characters, and graphic characters. ASCII has many of the properties of the EBCDIC code discussed above, but there are several important differences. ASCII uses only seven bits, but when we add one bit to help ensure data integrity, all the ASCII characters conveniently fit into one byte; this fact has significant benefits for communication. Another important difference is the sequence of characters: there are no gaps in the numeric values of the codes, which permits easy checking for data validity. ASCII has become the de facto standard on micro and mini computing systems and the standard for communications. EBCDIC is still found extensively on large IBM and IBM-compatible computing systems. The majority of both large and small systems provide translation utility programs to convert from one system to the other.

The three codes discussed so far all have a fixed number of bits. It is also possible to have variable-length codes. The advantage of a variable-length code is that it permits a significant saving in space. Since the length varies, it is necessary to have some technique to indicate where one character stops and another starts. An additional saving in space may be realized by assigning the shortest codes to the most frequently occurring characters; one common method is called Huffman coding. It should be obvious that there are trade-offs in using variable-length codes in place of fixed-length methods: the saving in space comes at the cost of increased processing time to decode the variable-length characters.

Lesson 2: Introduction to Number Systems
It was Ada, Countess of Lovelace, who first suggested using the binary number system to represent data inside the computer. John von Neumann is credited with reintroducing the idea almost a hundred years later in the early computing machines. To understand the number schemes or bases important in computing, we will first review some fundamental concepts using the more familiar base 10, or decimal, system.

A number base is a specific collection of symbols on which a number system can be built. Number bases are also known as base systems or radix systems. In computer science we deal with four number systems, but there exist other number systems used in other fields. When different number bases are being discussed, it is common practice to write the number base as a subscript.

Table 5 Number systems used in computers
Number System   Characteristics
Base 10         has 10 symbols: 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9; the basis of the decimal number system
Base 2          has exactly 2 symbols: 0 and 1
Base 8          uses exactly 8 symbols: 0, 1, 2, 3, 4, 5, 6, and 7
Base 16         has 16 symbols: 0-9 and A, B, C, D, E, F, where A=10, B=11, C=12, D=13, E=14, and F=15

Positional Notation and Base 10
The representation in which we normally write numbers is nothing more than a shorthand symbolic representation of the precise mathematical equivalent. The number system we are all most familiar with is decimal, or base 10. It is called base 10 because each position in a number has a value ten times as much as the position to its right, and there are ten different digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.
For example, in the number 354, the 4 represents 4 "ones", the 5 represents 5 "tens", and the 3 represents 3 "hundreds". We can see that each digit has two meanings: one is its own numeric value, and the second is its position. As we count, when we reach 9, we know the next number starts with 1 in the next column; we say we have carried into the next position. If we are dealing with real numbers with a decimal point, the same principle applies, except that as you move to the right of the decimal point you divide by ten for each successive position.

To present the idea further, consider the value 2146 in base 10. Using positional notation, the number is understood to mean

2000 + 100 + 40 + 6

This number can also be represented as

2146 = (2 x 1000) + (1 x 100) + (4 x 10) + (6 x 1)

Using exponential notation, the same value can be represented as

2 x 10³ + 1 x 10² + 4 x 10¹ + 6 x 10⁰

Numeric data can be represented by its radix and "weight". Here, 10 is called the "radix", and the number at the upper right of the 10 (in this case, 3) is called the "exponent".

Conversion between Number Systems
To process numeric values in a computer, decimal numbers are converted into binary or hexadecimal numbers. However, since we ordinarily use decimal numbers, it would be difficult to understand the meaning of a result if it were represented only by binary or hexadecimal numbers. The process of converting between representations is called radix/base conversion.

Decimal-to-Base N (2, 8, 16) Conversion
Decimal-to-Base N conversion can be summed up in the following steps:
1. The decimal integer is divided by the base N.
2. The quotient and remainder are obtained.
3. The quotient is divided by the base again, until the quotient becomes 0.
4. The Base N value is obtained by placing the remainders in reverse order.

For example, to convert the decimal value 25₁₀ to binary, we divide 25₁₀ by the base (2) until we reach a zero quotient; the equivalent base 2 value is the sequence of remainders placed in reverse order, giving 25₁₀ = 11001₂. The same procedure works for any other number base; converting 25₁₀ to base 16, for example, gives 19₁₆.

Conversion of Decimal Fractions
For decimal fractions, the value is multiplied by the base, the integer and fraction portions of the product are separated, and the integer portion is extracted. Next, only the fraction portion is multiplied by the base. This operation is repeated until the fraction becomes 0, and the value is obtained by placing the integer portions extracted in the order they were extracted. Applying the same idea, decimal fractions can be converted to hexadecimal. Note, however, that most of the time the conversion of a decimal fraction never finishes: no matter how many times the fraction portion is multiplied, it will not become 0. Most decimal fractions become infinite fractions. In that case, when a pattern repeats, the process of multiplying the fractional part ends, and we take the extracted portions as the final value.
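Both procedures, repeated division for the integer part and repeated multiplication for the fraction, are a few lines of Python (function names are ours; the fraction routine caps the number of places, since most decimal fractions never terminate):

```python
# A sketch of the division/multiplication methods just described.

DIGITS = "0123456789ABCDEF"

def int_to_base(n: int, base: int) -> str:
    """Repeatedly divide by the base; the remainders, reversed, are the digits."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, r = divmod(n, base)
        digits.append(DIGITS[r])
    return "".join(reversed(digits))

def frac_to_base(frac: float, base: int, places: int = 8) -> str:
    """Repeatedly multiply by the base; the integer parts, in order, are the
    digits. This may never terminate, so we cap the number of places."""
    digits = []
    while frac > 0 and len(digits) < places:
        frac *= base
        d = int(frac)
        digits.append(DIGITS[d])
        frac -= d
    return "".join(digits)

print(int_to_base(25, 2))     # 11001
print(int_to_base(25, 16))    # 19
print(frac_to_base(0.25, 2))  # 01  (0.25 decimal = 0.01 binary)
```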
Base N to Decimal Conversion
Though the representations of numbers may vary, the true value of a number stays the same. Table 6 lists the representation of decimal numbers (leftmost column) in both the binary and hexadecimal number systems. Getting the true value of a number can easily be done using positional notation, applying the radix and weights. As an example, consider the table below: using the position of each digit and the weight of that position, we can derive the true value of the number.

DECIMAL number (Radix/Base = 10):  2      1      9      9      8
Weight:                            10⁴    10³    10²    10¹    10⁰
Value:                             2*10⁴  1*10³  9*10²  9*10¹  8*10⁰
Final (true) value: 20000 + 1000 + 900 + 90 + 8 = 21998₁₀

The same process can be applied in other number base systems.

BINARY number (Radix/Base = 2):  1     1     0     0     1
Weight:                          2⁴    2³    2²    2¹    2⁰
Value:                           1*2⁴  1*2³  0*2²  0*2¹  1*2⁰
Final (true) value: 16 + 8 + 0 + 0 + 1 = 25₁₀

Table 6 Representation of decimal numbers in other number bases
Decimal Numbers   Binary Numbers   Hexadecimal Numbers
0                 0                0
1                 1                1
2                 10               2
3                 11               3
4                 100              4
5                 101              5
6                 110              6
7                 111              7
8                 1000             8
9                 1001             9
10                1010             A
11                1011             B
12                1100             C
13                1101             D
14                1110             E
15                1111             F
16                10000            10
17                10001            11
18                10010            12
19                10011            13
20                10100            14

Try filling out the two tables below, following what we have learned from the previous examples.

OCTAL number (Radix/Base = 8):  2  1  7  7  2
Weight:
Value:
Final (true) value:

HEXA number (Radix/Base = 16):  A  2  5  7  C
Weight:
Value:
Final (true) value:

Conversion of Binary to Hexadecimal
Four binary digits are equivalent to one hexadecimal digit. Thus, for binary integers, the binary number is divided into groups of 4 digits starting from the least significant digit to represent a value in hexadecimal. For binary fractions, the binary number is divided into groups of 4 digits starting from the decimal point. The conversion is then performed by adding up the weights of the 1s in each group of 4 bits. If a bit string has fewer than 4 digits, the necessary number of 0s is added and the string is treated as a 4-bit string.

Conversion of Hexadecimal Numbers into Binary Numbers
Converting hexadecimal numbers into binary is performed by reversing the process of converting binary to hexadecimal; in other words, each digit of the hexadecimal number is represented with a 4-digit binary number. The same digit-by-digit rule applies to both hexadecimal integers and hexadecimal fractions.

Conversion of Binary Numbers into Octal Numbers and Vice Versa
In the octal number system, each digit is represented by three binary digits, covering the values 0 to 7. Though not commonly used today, the octal number system was used widely during the early development of computer systems. The process of converting a binary value into an octal representation and vice versa is the same as that of binary-hexadecimal conversion. Following the pattern of binary-hexadecimal conversion but using 3 bits to represent one octal digit, convert the following:
a) Convert 1000011₂ to octal: 1 000 011₂ → 1 0 3 → 103₈
b) Convert 103₈ to its binary form: 1 0 3₈ → 001 000 011 → 1000011₂
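The grouping rule is the same for both bases, differing only in group size: 4 bits per hexadecimal digit, 3 per octal digit. A sketch (function names are ours):

```python
# The grouping rule: 4 bits per hex digit, 3 bits per octal digit.

def binary_to_base(bits: str, group: int) -> str:
    """Group binary digits from the least significant end and convert each
    group to one digit of the target base (group=4 -> hex, group=3 -> octal)."""
    bits = bits.zfill((len(bits) + group - 1) // group * group)  # pad with 0s
    chunks = [bits[i:i + group] for i in range(0, len(bits), group)]
    return "".join("0123456789ABCDEF"[int(c, 2)] for c in chunks)

def base_to_binary(digits: str, group: int) -> str:
    """Each source digit expands to a fixed-width group of binary digits."""
    out = "".join(format(int(d, 16), f"0{group}b") for d in digits)
    return out.lstrip("0") or "0"

print(binary_to_base("1000011", 3))  # 103 (octal)
print(binary_to_base("1000011", 4))  # 43  (hexadecimal)
print(base_to_binary("103", 3))      # 1000011
```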
Digital and Numeric Representations

Earlier in this unit we presented how data is represented in the computer using codes. A computer number format is the internal representation of numeric values in a digital device, such as a programmable computer or calculator. The representation format chosen for each type of data determines the precision and the ease with which calculations can be performed.

Figure 13. Data representation format

Decimal digit representation

1. Binary-Coded Decimal (BCD) code

Recall that the binary-coded decimal code uses 4-bit binary patterns that correspond to the digits 0 to 9 of the decimal system. As binary-coded decimal is a code rather than a numeric value, there are only 10 patterns, and notation is performed by arranging the patterns for each digit, as shown in Table 7.

Table 7. Binary-coded decimal code

Decimal Number   Binary Number   Binary-Coded Decimal
0                0000            0000
1                0001            0001
2                0010            0010
3                0011            0011
4                0100            0100
5                0101            0101
6                0110            0110
7                0111            0111
8                1000            1000
9                1001            1001
10               1010            0001 0000
11               1011            0001 0001
12               1100            0001 0010
…                …               …

As an example, the decimal number 789 is represented in binary-coded decimal by coding each digit separately: 0111 1000 1001. As the number of digits of a decimal number increases, the length of the binary-coded decimal code also increases; this is called a variable-length format. Binary-coded decimal is mainly used for the numeric representation of decimal values, and according to the memory format of the computer, it is divided into the unpacked decimal format and the packed decimal format.

2. Unpacked decimal format

When representing signed decimal values, the unpacked decimal format uses 1 byte for each digit of the decimal number. The values from 0 to 9 are represented in the least significant 4 bits of each byte. The most significant 4 bits are called the zoned bits, and the value (1111)₂ is ordinarily stored there. In the case of signed values, the zoned bits of the least significant digit represent the sign: for 0 and positive numbers the value (1100)₂ is stored, while for negative numbers (1101)₂ is stored. The unpacked decimal format is also called the zoned decimal format. To explain further, the bit patterns for the decimal numbers +789 and -789 in the unpacked decimal format are shown in Figure 14. As one can notice, in the unpacked decimal format, except for the least significant byte, only half of each byte carries a digit, which was considered a waste of resources. This defect was eliminated by the packed decimal format.

3. Packed decimal format

In the packed decimal format, 1 byte represents a numeric value of 2 digits, and for signed values the least significant 4 bits of the least significant byte represent the sign. The bit pattern for the sign is the same as that of the unpacked decimal format: (1100)₂ for 0 and positive numbers, and (1101)₂ for negative numbers. Compared to the unpacked decimal format, the packed decimal format allows a numeric value to be represented with fewer bytes, and the conversion into the binary system is easy.
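As a sketch of how the packed decimal format could be produced in software (the helper name and the leading-zero padding choice are our own assumptions; real systems define further details), consider:

    def to_packed_decimal(value):
        """Encode a signed integer in packed decimal: two decimal
        digits per byte, with the sign in the least significant 4 bits
        of the last byte, (1100)2 = 0xC for 0/positive and
        (1101)2 = 0xD for negative, as described above."""
        sign = 0xD if value < 0 else 0xC
        nibbles = [int(d) for d in str(abs(value))] + [sign]
        if len(nibbles) % 2:
            nibbles.insert(0, 0)    # pad to a whole number of bytes
        return bytes(nibbles[i] << 4 | nibbles[i + 1]
                     for i in range(0, len(nibbles), 2))

    print(to_packed_decimal(789).hex())    # 789c
    print(to_packed_decimal(-789).hex())   # 789d

The output shows the space saving: +789 fits in 2 bytes packed, versus 3 bytes in the unpacked (zoned) format.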
Binary representation

1. Representation of negative numbers
   a. Absolute value representation of negative numbers
   b. Complement representation of negative numbers
      i. Decimal complement
      ii. Binary complement
      iii. 1's and 2's complement representation of negative integers
2. Fixed point
   a. Integer representation
   b. Fraction representation
3. Floating point
   a. Floating point representation format in mainframe computers
      i. Exponent portion
      ii. Mantissa portion
   b. IEEE floating point representation format
   c. Shift operation
      i. Arithmetic shift
      ii. Logical shift

Logical Operators

References

Watch: Binary Data (https://www.dropbox.com/s/fro6nemj39ilj25/03_BinaryData_sm.mp4?dl=0)
Read: Section 1.2 Milestones in Computer Architecture (pp. 13-26, Structured Computer Organization, 5th Edition, by Andrew S. Tanenbaum)
Read: Chapter 1 Data and Number Systems (pp. 1-30, Digital Principles and Logic Design by A. Saha and N. Manna)
Read: Chapter 2 Codes and Their Conversions (pp. 31-43, Digital Principles and Logic Design by A. Saha and N. Manna)

Assessment

This is the part where the learners get to apply the knowledge they have acquired from the module. These activities and assessments must be based on the learning objectives of the module. Moreover, it is best to include opportunities for students to interact with each other when doing the activities or some of the assignments.

Convert the following into binary, octal, and hexadecimal:

Value      Binary   Octal   Hexadecimal
27₁₀
15₁₀
50.22₁₀

Convert the following into decimal:

Value    Decimal
11011₂
33₈
1BF₁₆

Module 3: Instruction Set Architecture

Overview

The instruction set is the part of the computer that pertains to programming: it provides the set of commands that tell the processor what to do. This unit will focus on the instruction set and present the different components that form part of an instruction set architecture.

Lesson Outcomes

After successful completion of this module, you should be able to:
o Examine the general concepts of how a computer executes an instruction
o Describe how information of various data types is represented in a computer
o Describe and simulate the different addressing modes and instruction formats
o Understand the concept of the stack
o Understand what general registers and segment registers are
o Understand register-to-register transfers

Lesson 1: Overview of the ISA abstraction

Instruction Set Architecture as an Abstraction

An instruction set architecture specifies how programs are to be encoded for a family of computers sharing that architecture. Once coded in a specific ISA, a program can generally be run on various machines sharing that ISA, provided sufficient memory and I/O resources are available. In this respect, an ISA is an important engineering abstraction: it specifies an interface between hardware and software and -- like other engineering abstractions -- allows both technologies to evolve independently.

The design of an instruction set is subject to a number of pressures:

Cost/performance. It is clearly desirable that an ISA be cheap to implement and execute programs quickly.

Cost/performance scaling. A related, more subtle goal is that the ISA be consistent with implementations over a wide range of the cost/performance spectrum; that is, allowing specific programs to be run quickly on expensive computers or more slowly on cheaper ones.

Model of computation. During much of the history of general-purpose computing, an ISA design often reflected a particular language family or computation model: machines were designed around, say, ALGOL or LISP.

Backward compatibility. Common ISA desiderata include the requirement that a new architecture be able to run "dusty deck" software written for now-obsolescent machines.

Future compatibility. Forward-looking ISAs are designed to anticipate new trends in both hardware and software, often by providing an evolutionary path for the ISA itself.

Each of these goals represents an engineering challenge; satisfying each involves some cost, and different chapters of the evolving computer engineering technology have led to successive trends in ISA design. Well-designed ISAs, like the IBM 360, have been spectacularly successful as engineering abstractions, allowing the same software to be run on many computers spanning a wide range of cost/performance tradeoffs and decades of technological advances. The isolation of software compatibility from technology-dependent hardware details remains a major goal in the design of instruction sets. In recent decades, however, it has been increasingly popular to downplay ISA details as the key to this compatibility, viewing instead some relatively machine-independent compiled language (such as C) as the interface specification.
This view, popularized by Patterson and Hennessy as part of the Reduced Instruction Set Computer (RISC) movement, argues for starkly simplifying instruction sets by eliminating ISA "baggage" whose performance impact on the execution of real programs cannot be demonstrated by actual or simulated execution. The appeal of this RISC argument stems in part from the Turing universality of the general-purpose machines for which we are designing ISAs. Each general-purpose machine can perform the same computations, which we can specify in a reasonably machine-independent way using a language at a higher level than machine instructions. So why not use our body of existing software as benchmarks, and tune the ISA design to maximize performance? Such thinking has been a major driver of ISA evolution over recent decades, resulting in steady, incremental progress in the design and implementation of ISAs. One plausible danger of this quantitative hill-climbing approach to ISA evolution is that its reliance on existing benchmarks may miss architectural innovations that require a new, alternative programming discipline: existing programs are optimized, to varying extents, for existing ISAs. It is possible that certain radical ISA improvements will require radically different software structures to demonstrate their advantages.

Lesson 2: Data Types

Operand Types

An operand is a value that an instruction operates on. By specifying an instruction type and an addressing mode, we have already partly specified the operands of the instruction. Common operand types include:

Integers. Usually 8-bit (characters), 16-bit (words), 32-bit (doublewords), and 64-bit (quadwords). The terminology may differ from one ISA to another.

Floating point numbers. Single and double precision, usually 32-bit and 64-bit respectively.

Binary-coded decimal. A single decimal digit occupies one half of a byte. Sometimes called packed decimal, because decimal digits are packed together into bytes.

Strings. Some ISAs support variable-length strings of bytes as a primitive data type in memory.

Vectors of primitive types. Some (less common) ISAs support fixed- or variable-length vectors of primitive types. Examples: CRAY vector processors, MMX extensions to x86.
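The practical point is that the same stored bits mean different things depending on the operand type the instruction assumes. Python's struct module can stand in for a machine's view of memory to illustrate this (an illustration only, not tied to any particular ISA):

    import struct

    # One 64-bit bit pattern, reinterpreted as different operand types.
    raw = struct.pack("<q", -2)         # store -2 as a 64-bit signed integer
    print(struct.unpack("<q", raw)[0])  # -2  (signed quadword)
    print(struct.unpack("<Q", raw)[0])  # 18446744073709551614  (unsigned view)
    print(struct.unpack("<8b", raw))    # eight 8-bit integers: (-2, -1, ..., -1)
    print(struct.unpack("<4h", raw))    # four 16-bit words: (-2, -1, -1, -1)
    print(struct.unpack("<2i", raw))    # two 32-bit values: (-2, -1)
    print(struct.unpack("<d", raw)[0])  # the same bits read as a double (a NaN here)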
Lesson 3: Instruction Formats

A computer performs a task based on the instructions provided to it. An instruction comprises groups of bits called fields. Since everything in a computer is represented in 0s and 1s, each field carries a different piece of information, from which the CPU decides what to perform. The most common fields are:

Operation field, which specifies the operation to be performed, such as addition.
Address field, which contains the location of the operand, i.e., a register or a memory location.
Mode field, which specifies how the operand is to be found.

Instructions are of variable length, depending on the number of addresses they contain. Generally, CPU organization is of three types, based on the number of address fields:
1. Single accumulator organization
2. General register organization
3. Stack organization

Instructions can be classified based on the number of operands as: three-address, two-address, one-and-a-half-address, one-address, and zero-address. We explain these classes together with simple examples in the following paragraphs. It should be noted that in presenting these examples, we use the convention operation destination, source(s) to express an instruction, matching the tables below. Here, operation represents the operation to be performed, for example, add, subtract, write, or read. The source field represents the source operand(s); a source operand can be a constant, a value stored in a register, or a value stored in memory. The destination field represents the place where the result of the operation is to be stored, for example, a register or a memory location.

A three-address instruction takes the form operation add-1, add-2, add-3. In this form, each of add-1, add-2, and add-3 refers to a register or to a memory location.

Format: opcode | destination address | source address | source address

Expression: X = (A+B) * (C+D), where R1 and R2 are registers and M[ ] is a memory location.

ADD R1, A, B     R1 = M[A] + M[B]
ADD R2, C, D     R2 = M[C] + M[D]
MUL X, R1, R2    M[X] = R1 * R2

A two-address instruction takes the form operation add-1, add-2. In this form, each of add-1 and add-2 refers to a register or to a memory location. Consider, for example, the instruction ADD R1, R2. This instruction adds the contents of register R2 to the contents of register R1 and stores the result in register R1.

Format: opcode | destination address | source address

Expression: X = (A+B) * (C+D), where R1 and R2 are registers and M[ ] is a memory location.

MOV R1, A     R1 = M[A]
ADD R1, B     R1 = R1 + M[B]
MOV R2, C     R2 = M[C]
ADD R2, D     R2 = R2 + M[D]
MUL R1, R2    R1 = R1 * R2
MOV X, R1     M[X] = R1

A one-address instruction takes the form ADD R1. In this case the instruction implicitly refers to a register, called the accumulator Racc: the contents of the accumulator are added to the contents of register R1, and the result is stored back into the accumulator Racc.

Format: opcode | operand / address of operand

Expression: X = (A+B) * (C+D), where Racc is the accumulator, M[ ] is a memory location, and M[T] is a temporary location.

LOAD A     Racc = M[A]
ADD B      Racc = Racc + M[B]
STORE T    M[T] = Racc
LOAD C     Racc = M[C]
ADD D      Racc = Racc + M[D]
MUL T      Racc = Racc * M[T]
STORE X    M[X] = Racc

Between the two-address and the one-address instruction, there can be a one-and-a-half-address instruction. Consider, for example, the instruction ADD B, R1. In this case, the instruction adds the contents of register R1 to the contents of memory location B and stores the result in register R1. It is counted as "one and a half" addresses because the register operand needs far fewer address bits than the full memory address.
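To see the one-address model in action, here is a toy accumulator-machine interpreter in Python (our own sketch, not a real ISA) executing the instruction sequence from the table above:

    def run_accumulator(program, memory):
        """Execute one-address (accumulator) instructions. Every
        operand names a memory location; the accumulator Racc is the
        implicit second operand and the implicit destination."""
        racc = 0
        for op, addr in program:
            if op == "LOAD":
                racc = memory[addr]
            elif op == "ADD":
                racc = racc + memory[addr]
            elif op == "MUL":
                racc = racc * memory[addr]
            elif op == "STORE":
                memory[addr] = racc
        return memory

    # X = (A+B) * (C+D) with A=2, B=3, C=4, D=5
    mem = {"A": 2, "B": 3, "C": 4, "D": 5, "T": 0, "X": 0}
    prog = [("LOAD", "A"), ("ADD", "B"), ("STORE", "T"),
            ("LOAD", "C"), ("ADD", "D"), ("MUL", "T"), ("STORE", "X")]
    print(run_accumulator(prog, mem)["X"])  # 45, i.e., (2+3)*(4+5)

Notice that the temporary location T is unavoidable here: with only one accumulator, the partial sum A+B must be spilled to memory before C+D can be computed.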
It is interesting to note that there also exist zero-address instructions. These are instructions that use stack operations. A stack is a data organization mechanism in which the last data item stored is the first data item retrieved. Two specific operations can be performed on a stack: the push and the pop operations.

A specific register, called the stack pointer (SP), is used to indicate the stack location that can be addressed. In the stack push operation, the SP value is used to indicate the location (called the top of the stack) in which the value (5A) is to be stored (in this case, location 1023); after storing (pushing) this value, the SP is incremented to point to location 1024. In the stack pop operation, the SP is first decremented, to 1021; the value stored at this location (DD in this case) is retrieved (popped out) and stored in a register.

Figure 16. Addition using the stack

A stack-based computer does not use the address field in the instruction. To evaluate an expression, it is first converted to reverse Polish notation, i.e., postfix notation. We will use the expression X = (A+B) * (C+D) to showcase the procedure.

Expression: X = (A+B) * (C+D)
Postfix form: X = AB+CD+*
TOP means the top of the stack; M[X] is a memory location.

PUSH A    TOP = A
PUSH B    TOP = B
ADD       TOP = A + B
PUSH C    TOP = C
PUSH D    TOP = D
ADD       TOP = C + D
MUL       TOP = (A+B) * (C+D)
POP X     M[X] = TOP

Different operations can be performed using the stack structure. Consider, for example, an instruction such as ADD (SP)+, (SP). The instruction adds the contents of the stack location pointed to by the SP to those pointed to by SP + 1 and stores the result on the stack in the location pointed to by the current value of the SP.
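A zero-address machine is equally easy to model: ADD and MUL carry no addresses at all and work purely on the stack. Here is a minimal Python sketch running the postfix program traced above (the instruction names follow the text; the memory layout is assumed for illustration):

    def run_stack_machine(program, memory):
        """Zero-address (stack) machine: ADD and MUL take no operands;
        they pop the top two stack entries and push the result.
        PUSH and POP move values between memory and the stack."""
        stack = []
        for instr in program:
            op = instr[0]
            if op == "PUSH":
                stack.append(memory[instr[1]])
            elif op == "POP":
                memory[instr[1]] = stack.pop()
            elif op == "ADD":
                b, a = stack.pop(), stack.pop()
                stack.append(a + b)
            elif op == "MUL":
                b, a = stack.pop(), stack.pop()
                stack.append(a * b)
        return memory

    # X = (A+B)*(C+D) as postfix: A B + C D + *
    mem = {"A": 2, "B": 3, "C": 4, "D": 5, "X": 0}
    prog = [("PUSH", "A"), ("PUSH", "B"), ("ADD",),
            ("PUSH", "C"), ("PUSH", "D"), ("ADD",),
            ("MUL",), ("POP", "X")]
    print(run_stack_machine(prog, mem)["X"])  # 45

Unlike the accumulator version, no temporary memory location is needed: the partial result A+B simply waits on the stack while C+D is computed.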
Figure 17. Instruction Classification

Lesson 4: Addressing Modes

Addressing modes are an aspect of the instruction set architecture in most central processing unit (CPU) designs. The various addressing modes defined in an instruction set architecture determine how machine language instructions in that architecture identify the operand (or operands) of each instruction. An addressing mode specifies how to calculate the effective memory address of an operand by using information held in registers and/or constants contained within a machine instruction or elsewhere. In computer programming, addressing modes are primarily of interest to compiler writers and to those who write code directly in assembly language.

Types of Addressing Modes

Each instruction of a computer specifies an operation on certain data. There are various ways of specifying the address of the data to be operated on; these different ways are called the addressing modes. The most common addressing modes are:

Immediate addressing mode
Direct addressing mode
Indirect addressing mode
Register addressing mode
Register indirect addressing mode
Displacement addressing mode
Stack addressing mode

Several methods are used to specify the addressing mode of an instruction. The most common are:
a) Different operands use different addressing modes.
b) One or more bits in the instruction format are used as a mode field, whose value determines which addressing mode is to be used.

The effective address will be either a main memory address or a register.

Immediate Addressing: This is the simplest form of addressing. Here, the operand is given in the instruction itself:

instruction: | opcode | operand |

This mode is used to define constants or set the initial values of variables. Its advantage is that no memory reference other than the instruction fetch is required to obtain the operand. Its disadvantage is that the size of the number is limited to the size of the address field, which in most instruction sets is small compared to the word length.

Direct Addressing: In direct addressing mode, the effective address of the operand is given in the address field of the instruction. It requires one memory reference to read the operand from the given location and provides only a limited address space, since the length of the address field is usually less than the word length. Ex.: MOVE P, R0 and ADD Q, R0, where P and Q are the addresses of the operands.

Figure 18. Illustration of the direct addressing mode

Indirect Addressing: In indirect addressing mode, the address field of the instruction refers to the address of a word in memory, which in turn contains the full-length address of the operand. The advantage of this mode is that for a word length of N, an address space of 2^N can be addressed. The disadvantage is that instruction execution requires two memory references to fetch the operand. Multilevel or cascaded indirect addressing can also be used.

Figure 19. Illustration of the indirect addressing mode

Register Addressing: Register addressing mode is similar to direct addressing; the only difference is that the address field of the instruction refers to a register rather than a memory location. Only 3 or 4 bits are needed as the address field to reference 8 to 16 general-purpose registers. The advantage of register addressing is that only a small address field is needed in the instruction.

Register Indirect Addressing: This mode is similar to indirect addressing; the address field of the instruction refers to a register, and the register contains the effective address of the operand. This mode uses one memory reference to obtain the operand. The address space is limited to the width of the registers available to store the effective address.

Displacement Addressing: Displacement addressing combines direct addressing and register indirect addressing. The value contained in one address field, A, is used directly, and the other address field refers to a register whose contents are added to A to produce the effective address. There are three common varieties of displacement addressing: 1) relative addressing, 2) base-register addressing, and 3) indexed addressing.

Stack Addressing: A stack is a linear array of locations operated as a last-in, first-out structure. It is a reserved block of locations; items are appended to or deleted from only the top of the stack. The stack pointer is a register that stores the address of the top-of-stack location. This mode of addressing is also known as implicit addressing.
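The differences among these modes come down to how the operand is located. A compact Python sketch makes the distinctions explicit (the mode names follow the text; the register and memory layout is an assumption for illustration):

    def fetch_operand(mode, field, registers, memory):
        """Return the operand selected by an addressing mode.
        `field` is the instruction's address field; `registers` and
        `memory` model the machine state. A toy model, not a real ISA."""
        if mode == "immediate":          # operand is in the instruction
            return field
        if mode == "direct":             # field is the memory address
            return memory[field]
        if mode == "indirect":           # memory holds the operand's address
            return memory[memory[field]]
        if mode == "register":           # field names a register
            return registers[field]
        if mode == "register_indirect":  # register holds the address
            return memory[registers[field]]
        raise ValueError("unknown mode: " + mode)

    regs = {"R0": 100}
    mem = {100: 7, 200: 100, 300: 200}
    print(fetch_operand("immediate", 100, regs, mem))           # 100
    print(fetch_operand("direct", 100, regs, mem))              # 7
    print(fetch_operand("indirect", 300, regs, mem))            # mem[200] -> 100
    print(fetch_operand("register", "R0", regs, mem))           # 100
    print(fetch_operand("register_indirect", "R0", regs, mem))  # 7

Displacement addressing would simply add a register's contents to the field before the memory access; it is omitted above because it needs two fields rather than one.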
Lesson 5: Instruction Types

Examples of operations common to many instruction sets include the following.

Data Handling and Memory Operations

Data movement instructions are used to move data among the different units of the machine, most notably among the different registers in the CPU. A simple register-to-register movement of data can be made through the instruction

MOVE Ri, Rj

This instruction moves the content of register Ri to register Rj. The effect of the instruction is to overwrite the contents of the (destination) register Rj without changing the contents of the (source) register Ri.

Data movement operation   Meaning
MOVE                      Move data (a word or a block) from a given source (a register or a memory location) to a given destination
LOAD                      Load data from memory to a register
STORE                     Store data into memory from a register
PUSH                      Store data from a register onto the stack
POP                       Retrieve data from the stack into a register

Table 8. Some Common Data Movement Operations

Data movement instructions include those used to move data to (from) registers from (to) memory. These instructions are usually referred to as the load and store instructions, respectively. Examples of the two instructions are

LOAD 25838, Rj
STORE Ri, 1024

The first instruction loads the content of the memory location whose address is 25838 into destination register Rj; the content of the memory location is unchanged by executing the LOAD instruction. The STORE instruction stores the content of source register Ri into memory location 1024; the content of the source register is unchanged by executing the STORE instruction.

Arithmetic and Logic Operations

Arithmetic and logical instructions are those used to perform arithmetic and logical manipulation of registers and memory contents. Examples of arithmetic instructions include the ADD and SUBTRACT instructions:

ADD R1, R2, R0
SUBTRACT R1, R2, R0

The first instruction adds the contents of source registers R1 and R2 and stores the result in destination register R0. The second instruction subtracts the contents of the source registers R1 and R2 and stores the result in destination register R0. The contents of the source registers are unchanged by the ADD and SUBTRACT instructions. In addition to ADD and SUBTRACT, some machines have MULTIPLY and DIVIDE instructions. These two instructions are more expensive to implement, and their effect can be obtained through repeated addition or repeated subtraction; for this reason, some architectures omit them from the base instruction set and synthesize them in software. Table 9 shows some common arithmetic operations and their meanings.

Arithmetic operation   Meaning
ADD                    Perform the arithmetic sum of two operands
SUBTRACT               Perform the arithmetic difference of two operands
MULTIPLY               Perform the product of two operands
DIVIDE                 Perform the division of two operands
INCREMENT              Add one to the contents of a register
DECREMENT              Subtract one from the contents of a register

Table 9. Some Common Arithmetic Operations

Sequencing Instructions

Control (sequencing) instructions are used to change the sequence in which instructions are executed. They take the form of CONDITIONAL BRANCH (CONDITIONAL JUMP), UNCONDITIONAL BRANCH (JUMP), or CALL instructions. A common characteristic among these instructions is that their execution changes the progr