Introduction to Computer Systems PDF

CHAPTER 1 Introduction to Computer Systems

The technological advances witnessed in the computer industry are the result of a long chain of immense and successful efforts made by two major forces. These are the academia, represented by university research centers, and the industry, represented by computer companies. It is, however, fair to say that the current technological advances in the computer industry owe their inception to university research centers. In order to appreciate the current technological advances in the computer industry, one has to trace back through the history of computers and their development. The objective of such historical review is to understand the factors affecting computing as we know it today and hopefully to forecast the future of computation.

A great majority of the computers of our daily use are known as general purpose machines. These are machines that are built with no specific application in mind, but rather are capable of performing computation needed by a diversity of applications. These machines are to be distinguished from those built to serve (tailored to) specific applications. The latter are known as special purpose machines. A brief historical background is given in Section 1.1.

Computer systems have conventionally been defined through their interfaces at a number of layered abstraction levels, each providing functional support to its predecessor. Included among the levels are the application programs, the high-level languages, and the set of machine instructions. Based on the interface between different levels of the system, a number of computer architectures can be defined. The interface between the application programs and a high-level language is referred to as a language architecture. The instruction set architecture defines the interface between the basic machine instruction set and the runtime and I/O control.

A different definition of computer architecture is built on four basic viewpoints.
These are the structure, the organization, the implementation, and the performance. In this definition, the structure defines the interconnection of various hardware components, the organization defines the dynamic interplay and management of the various components, the implementation defines the detailed design of hardware components, and the performance specifies the behavior of the computer system. Architectural development and styles are covered in Section 1.2. A number of technological developments are presented in Section 1.3. Our discussion in this chapter concludes with a detailed coverage of CPU performance measures.

Fundamentals of Computer Organization and Architecture, by M. Abd-El-Barr and H. El-Rewini. ISBN 0-471-46741-3. Copyright © 2005 John Wiley & Sons, Inc.

1.1. HISTORICAL BACKGROUND

In this section, we would like to provide a historical background on the evolution of cornerstone ideas in the computing industry. We should emphasize at the outset that the effort to build computers did not originate at one single place. There is every reason for us to believe that attempts to build the first computer existed in different geographically distributed places. We also firmly believe that building a computer requires teamwork. Therefore, when some people attribute a machine to the name of a single researcher, what they actually mean is that the researcher may have led the team that introduced the machine. We, therefore, see it more appropriate to mention the machine and the place it was first introduced without linking that to a specific name. We believe that such an approach is fair and should eliminate any controversy about researchers and their names.

It is probably fair to say that the first program-controlled (mechanical) computer ever built was the Z1 (1938). This was followed in 1939 by the Z2 as the first operational program-controlled computer with fixed-point arithmetic.
However, the first recorded university-based attempt to build a computer originated on the Iowa State University campus in the early 1940s. Researchers on that campus were able to build a small-scale special-purpose electronic computer. However, that computer was never completely operational. At about the same time, a complete design of a fully functional programmable special-purpose machine, the Z3, was reported in Germany in 1941. It appears that the lack of funding prevented such design from being implemented. History recorded that while these two attempts were in progress, researchers from different parts of the world had opportunities to gain first-hand experience through their visits to the laboratories and institutes carrying out the work. It is assumed that such first-hand visits and interchange of ideas enabled the visitors to embark on similar projects in their own laboratories back home.

As far as general-purpose machines are concerned, the University of Pennsylvania is recorded to have hosted the building of the Electronic Numerical Integrator and Computer (ENIAC) machine in 1944. It was the first operational general-purpose machine built using vacuum tubes. The machine was primarily built to help compute artillery firing tables during World War II. It was programmable through manual setting of switches and plugging of cables. The machine was slow by today's standards, with a limited amount of storage and primitive programmability. An improved version of the ENIAC was proposed on the same campus. The improved version, called the Electronic Discrete Variable Automatic Computer (EDVAC), was an attempt to improve the way programs are entered and to explore the concept of stored programs. It was not until 1952 that the EDVAC project was completed. Inspired by the ideas implemented in the ENIAC, researchers at the Institute for Advanced Study (IAS) at Princeton built (in 1946) the IAS machine, which was about 10 times faster than the ENIAC.

In 1946, while the EDVAC project was in progress, a similar project was initiated at Cambridge University. The project was to build a stored-program computer, known as the Electronic Delay Storage Automatic Calculator (EDSAC). It was in 1949 that the EDSAC became the world's first full-scale, stored-program, fully operational computer. A spin-off of the EDSAC resulted in a series of machines introduced at Harvard. The series consisted of MARK I, II, III, and IV. The latter two machines introduced the concept of separate memories for instructions and data. The term Harvard Architecture was given to such machines to indicate the use of separate memories. It should be noted that the term Harvard Architecture is used today to describe machines with separate caches for instructions and data.

The first general-purpose commercial computer, the UNIVersal Automatic Computer (UNIVAC I), was on the market by the middle of 1951. It represented an improvement over the BINAC, which was built in 1949. IBM announced its first computer, the IBM 701, in 1952. The early 1950s witnessed a slowdown in the computer industry. In 1964 IBM announced a line of products under the name IBM 360 series. The series included a number of models that varied in price and performance. This led Digital Equipment Corporation (DEC) to introduce the first minicomputer, the PDP-8. It was considered a remarkably low-cost machine. Intel introduced the first microprocessor, the Intel 4004, in 1971. The world witnessed the birth of the first personal computer (PC) in 1977 when the Apple computer series was first introduced. In 1977 the world also witnessed the introduction of the VAX-11/780 by DEC. Intel followed suit by introducing the first of its most popular microprocessor series, the 80x86.
Personal computers, which were introduced in 1977 by Altair, Processor Technology, North Star, Tandy, Commodore, Apple, and many others, enhanced the productivity of end-users in numerous departments. Personal computers from Compaq, Apple, IBM, Dell, and many others soon became pervasive and changed the face of computing.

In parallel with small-scale machines, supercomputers were coming into play. The first such supercomputer, the CDC 6600, was introduced in 1961 by Control Data Corporation. Cray Research Corporation introduced the best cost/performance supercomputer, the Cray-1, in 1976.

The 1980s and 1990s witnessed the introduction of many commercial parallel computers with multiple processors. They can generally be classified into two main categories: (1) shared memory and (2) distributed memory systems. The number of processors in a single machine ranged from several in a shared memory computer to hundreds of thousands in a massively parallel system. Examples of parallel computers during this era include Sequent Symmetry, Intel iPSC, nCUBE, Intel Paragon, Thinking Machines (CM-2, CM-5), MasPar (MP), Fujitsu (VPP500), and others.

One of the clear trends in computing is the substitution of centralized servers by networks of computers. These networks connect inexpensive, powerful desktop machines to form unequaled computing power. Local area networks (LAN) of powerful personal computers and workstations began to replace mainframes and minis by 1990. These individual desktop computers were soon to be connected into larger complexes of computing by wide area networks (WAN).
The pervasiveness of the Internet created interest in network computing and more recently in grid computing. Grids are geographically distributed platforms of computation. They should provide dependable, consistent, pervasive, and inexpensive access to high-end computational facilities.

Table 1.1 is modified from a table proposed by Lawrence Tesler (1995). In this table, major characteristics of the different computing paradigms are associated with each decade of computing, starting from 1960.

TABLE 1.1 Four Decades of Computing

| Feature | Batch | Time-sharing | Desktop | Network |
|---|---|---|---|---|
| Decade | 1960s | 1970s | 1980s | 1990s |
| Location | Computer room | Terminal room | Desktop | Mobile |
| Users | Experts | Specialists | Individuals | Groups |
| Data | Alphanumeric | Text, numbers | Fonts, graphs | Multimedia |
| Objective | Calculate | Access | Present | Communicate |
| Interface | Punched card | Keyboard & CRT | See & point | Ask & tell |
| Operation | Process | Edit | Layout | Orchestrate |
| Connectivity | None | Peripheral cable | LAN | Internet |
| Owners | Corporate computer centers | Divisional IS shops | Departmental end-users | Everyone |

CRT, cathode ray tube; LAN, local area network.

1.2. ARCHITECTURAL DEVELOPMENT AND STYLES

Computer architects have always been striving to increase the performance of their architectures. This has taken a number of forms. Among these is the philosophy that by doing more in a single instruction, one can use a smaller number of instructions to perform the same job. The immediate consequence of this is the need for fewer memory read/write operations and an eventual speedup of operations. It was also argued that increasing the complexity of instructions and the number of addressing modes has the theoretical advantage of reducing the "semantic gap" between the instructions in a high-level language and those in the low-level (machine) language. A single (machine) instruction to convert several binary coded decimal (BCD) numbers to binary is an example of how complex some instructions were intended to be.
The huge number of addressing modes considered (more than 20 in the VAX machine) further adds to the complexity of instructions. Machines following this philosophy have been referred to as complex instruction set computers (CISCs). Examples of CISC machines include the Intel Pentium™, the Motorola MC68000™, and the IBM & Macintosh PowerPC™.

It should be noted that as more capabilities were added to their processors, manufacturers realized that it was increasingly difficult to support higher clock rates that would have been possible otherwise. This is because of the increased complexity of computations within a single clock period. A number of studies from the mid-1970s and early 1980s also identified that in typical programs more than 80% of the instructions executed are those using assignment statements, conditional branching, and procedure calls. It was also surprising to find out that simple assignment statements constitute almost 50% of those operations. These findings caused a different philosophy to emerge. This philosophy promotes the optimization of architectures by speeding up those operations that are most frequently used while reducing the instruction complexities and the number of addressing modes. Machines following this philosophy have been referred to as reduced instruction set computers (RISCs). Examples of RISCs include the Sun SPARC™ and MIPS™ machines.

The above two philosophies in architecture design have led to the unresolved controversy as to which architecture style is "best." It should, however, be mentioned that studies have indicated that RISC architectures would indeed lead to faster execution of programs. The majority of contemporary microprocessor chips seem to follow the RISC paradigm. In this book we will present the salient features and examples for both CISC and RISC machines.

1.3. TECHNOLOGICAL DEVELOPMENT

Computer technology has shown an unprecedented rate of improvement.
This includes the development of processors and memories. Indeed, it is the advances in technology that have fueled the computer industry. The number of transistors (a transistor is a controlled on/off switch) integrated into a single chip has increased from a few hundred to millions. This impressive increase has been made possible by the advances in the fabrication technology of transistors.

The scale of integration has grown from small-scale (SSI) to medium-scale (MSI) to large-scale (LSI) to very large-scale integration (VLSI), and currently to wafer-scale integration (WSI). Table 1.2 shows the typical numbers of devices per chip in each of these technologies.

TABLE 1.2 Numbers of Devices per Chip

| Integration | Technology | Typical number of devices | Typical functions |
|---|---|---|---|
| SSI | Bipolar | 10–20 | Gates and flip-flops |
| MSI | Bipolar & MOS | 50–100 | Adders & counters |
| LSI | Bipolar & MOS | 100–10,000 | ROM & RAM |
| VLSI | CMOS (mostly) | 10,000–5,000,000 | Processors |
| WSI | CMOS | >5,000,000 | DSP & special purposes |

SSI, small-scale integration; MSI, medium-scale integration; LSI, large-scale integration; VLSI, very large-scale integration; WSI, wafer-scale integration.

It should be mentioned that the continuous decrease in the minimum device feature size has led to a continuous increase in the number of devices per chip, which in turn has led to a number of developments. Among these is the increase in the number of devices in RAM memories, which in turn helps designers to trade off memory size for speed. The improvement in the feature size provides golden opportunities for introducing improved design styles.

1.4. PERFORMANCE MEASURES

In this section, we consider the important issue of assessing the performance of a computer. In particular, we focus our discussion on a number of performance measures that are used to assess computers. Let us admit at the outset that there are various facets to the performance of a computer.
For example, a user of a computer measures its performance based on the time taken to execute a given job (program). On the other hand, a laboratory engineer measures the performance of his system by the total amount of work done in a given time. While the user considers the program execution time a measure for performance, the laboratory engineer considers the throughput a more important measure for performance. A metric for assessing the performance of a computer helps in comparing alternative designs.

Performance analysis should help in answering questions such as: how fast can a program be executed using a given computer? In order to answer such a question, we need to determine the time taken by a computer to execute a given job. We define the clock cycle time as the time between two consecutive rising (trailing) edges of a periodic clock signal (Fig. 1.1). Clock cycles allow counting unit computations, because the storage of computation results is synchronized with rising (trailing) clock edges. The time required to execute a job by a computer is often expressed in terms of clock cycles. We denote the number of CPU clock cycles for executing a job to be the cycle count (CC), the cycle time by CT, and the clock frequency by f = 1/CT. The time taken by the CPU to execute a job can be expressed as

$$\text{CPU time} = CC \times CT = CC / f$$

Figure 1.1 Clock signal

It may be easier to count the number of instructions executed in a given program as compared to counting the number of CPU clock cycles needed for executing that program. Therefore, the average number of clock cycles per instruction (CPI) has been used as an alternate performance measure. The following equation shows how to compute the CPI.

$$\text{CPI} = \frac{\text{CPU clock cycles for the program}}{\text{Instruction count}}$$

$$\text{CPU time} = \text{Instruction count} \times \text{CPI} \times \text{Clock cycle time} = \frac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}}$$

It is known that the instruction set of a given machine consists of a number of instruction categories: ALU (simple assignment and arithmetic and logic instructions), load, store, branch, and so on. In the case that the CPI for each instruction category is known, the overall CPI can be computed as

$$\text{CPI} = \frac{\sum_{i=1}^{n} \text{CPI}_i \times I_i}{\text{Instruction count}}$$

where $I_i$ is the number of times an instruction of type $i$ is executed in the program and $\text{CPI}_i$ is the average number of clock cycles needed to execute such an instruction.

Example

Consider computing the overall CPI for a machine A for which the following performance measures were recorded when executing a set of benchmark programs. Assume that the clock rate of the CPU is 200 MHz.

| Instruction category | Percentage of occurrence | No. of cycles per instruction |
|---|---|---|
| ALU | 38 | 1 |
| Load & store | 15 | 3 |
| Branch | 42 | 4 |
| Others | 5 | 5 |

Assuming the execution of 100 instructions, the overall CPI can be computed as

$$\text{CPI}_a = \frac{\sum_{i=1}^{n} \text{CPI}_i \times I_i}{\text{Instruction count}} = \frac{38 \times 1 + 15 \times 3 + 42 \times 4 + 5 \times 5}{100} = 2.76$$

It should be noted that the CPI reflects the organization and the instruction set architecture of the processor while the instruction count reflects the instruction set architecture and compiler technology used. This shows the degree of interdependence between the two performance parameters. Therefore, it is imperative that both the CPI and the instruction count are considered in assessing the merits of a given computer or equivalently in comparing the performance of two machines.
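
The weighted-CPI calculation in the example above can be sketched in a few lines of Python. This is only an illustrative sketch and not part of the original text: the function names `overall_cpi` and `cpu_time` and the dictionary layout are assumptions made here, while the instruction mix and the 200 MHz clock rate are those given for machine A.

```python
def overall_cpi(mix):
    """Return the overall CPI for an instruction mix.

    `mix` maps a category name to (count, cycles_per_instruction).
    Counts may be absolute numbers or percentages of occurrence,
    because only their ratio to the total matters.
    """
    total_instructions = sum(count for count, _ in mix.values())
    total_cycles = sum(count * cycles for count, cycles in mix.values())
    return total_cycles / total_instructions


def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time in seconds = instruction count x CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


# Machine A from the example: (percentage of occurrence, cycles).
machine_a = {
    "ALU": (38, 1),
    "Load & store": (15, 3),
    "Branch": (42, 4),
    "Others": (5, 5),
}

cpi_a = overall_cpi(machine_a)
print(f"CPI_a = {cpi_a:.2f}")  # prints "CPI_a = 2.76"

# Time to run 100 such instructions at the assumed 200 MHz clock rate.
print(f"CPU time = {cpu_time(100, cpi_a, 200e6):.3e} s")
```

Keeping the mix as (count, cycles) pairs mirrors the table in the example, so a different machine can be evaluated by swapping in its own rows.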

A different performance measure that has been given a lot of attention in recent years is MIPS (million instructions per second), the rate of instruction execution per unit time, which is defined as

$$\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^6} = \frac{\text{Clock rate}}{\text{CPI} \times 10^6}$$

Example

Suppose that the same set of benchmark programs considered above were executed on another machine, call it machine B, for which the following measures were recorded.

| Instruction category | Percentage of occurrence | No. of cycles per instruction |
|---|---|---|
| ALU | 35 | 1 |
| Load & store | 30 | 2 |
| Branch | 15 | 3 |
| Others | 20 | 5 |

What is the MIPS rating for the machine considered in the previous example (machine A) and machine B assuming a clock rate of 200 MHz?

$$\text{CPI}_a = \frac{\sum_{i=1}^{n} \text{CPI}_i \times I_i}{\text{Instruction count}} = \frac{38 \times 1 + 15 \times 3 + 42 \times 4 + 5 \times 5}{100} = 2.76$$

$$\text{MIPS}_a = \frac{\text{Clock rate}}{\text{CPI}_a \times 10^6} = \frac{200 \times 10^6}{2.76 \times 10^6} \approx 72.46$$

$$\text{CPI}_b = \frac{\sum_{i=1}^{n} \text{CPI}_i \times I_i}{\text{Instruction count}} = \frac{35 \times 1 + 30 \times 2 + 15 \times 3 + 20 \times 5}{100} = 2.4$$

$$\text{MIPS}_b = \frac{\text{Clock rate}}{\text{CPI}_b \times 10^6} = \frac{200 \times 10^6}{2.4 \times 10^6} \approx 83.33$$

Thus $\text{MIPS}_b > \text{MIPS}_a$.

It is interesting to note here that although MIPS has been used as a performance measure for machines, one has to be careful in using it to compare machines having different instruction sets. This is because MIPS does not track execution time. Consider, for example, the following measurements made on two different machines running a given set of benchmark programs.

| Machine | Instruction category | No. of instructions (in millions) | No. of cycles per instruction |
|---|---|---|---|
| A | ALU | 8 | 1 |
| A | Load & store | 4 | 3 |
| A | Branch | 2 | 4 |
| A | Others | 4 | 3 |
| B | ALU | 10 | 1 |
| B | Load & store | 8 | 2 |
| B | Branch | 2 | 4 |
| B | Others | 4 | 3 |

$$\text{CPI}_a = \frac{\sum_{i=1}^{n} \text{CPI}_i \times I_i}{\text{Instruction count}} = \frac{(8 \times 1 + 4 \times 3 + 2 \times 4 + 4 \times 3) \times 10^6}{(8 + 4 + 2 + 4) \times 10^6} \cong 2.2$$

$$\text{MIPS}_a = \frac{\text{Clock rate}}{\text{CPI}_a \times 10^6} = \frac{200 \times 10^6}{2.2 \times 10^6} \cong 90.9$$

$$\text{CPU time}_a = \frac{\text{Instruction count} \times \text{CPI}_a}{\text{Clock rate}} = \frac{18 \times 10^6 \times 2.2}{200 \times 10^6} = 0.198\ \text{s}$$

$$\text{CPI}_b = \frac{\sum_{i=1}^{n} \text{CPI}_i \times I_i}{\text{Instruction count}} = \frac{(10 \times 1 + 8 \times 2 + 2 \times 4 + 4 \times 3) \times 10^6}{(10 + 8 + 2 + 4) \times 10^6} \cong 1.9$$

$$\text{MIPS}_b = \frac{\text{Clock rate}}{\text{CPI}_b \times 10^6} = \frac{200 \times 10^6}{1.9 \times 10^6} \cong 105.3$$

$$\text{CPU time}_b = \frac{\text{Instruction count} \times \text{CPI}_b}{\text{Clock rate}} = \frac{24 \times 10^6 \times 1.9}{200 \times 10^6} = 0.228\ \text{s}$$

$$\text{MIPS}_b > \text{MIPS}_a \quad \text{and} \quad \text{CPU time}_b > \text{CPU time}_a$$

The example shows that although machine B has a higher MIPS compared to machine A, it requires longer CPU time to execute the same set of benchmark programs.

Million floating-point instructions per second, MFLOPS (the rate of floating-point instruction execution per unit time), has also been used as a measure for machines' performance. It is defined as

$$\text{MFLOPS} = \frac{\text{Number of floating-point operations in a program}}{\text{Execution time} \times 10^6}$$

While MIPS measures the rate of average instructions, MFLOPS is only defined for the subset of floating-point instructions. An argument against MFLOPS is the fact that the set of floating-point operations may not be consistent across machines and therefore the actual floating-point operations will vary from machine to machine. Yet another argument is the fact that the performance of a machine for a given program as measured by MFLOPS cannot be generalized to provide a single performance metric for that machine.

The performance of a machine regarding one particular program might not be interesting to a broad audience. The arithmetic and geometric means are the most popular ways to summarize performance regarding larger sets of programs (e.g., benchmark suites). These are defined below.

$$\text{Arithmetic mean} = \frac{1}{n} \sum_{i=1}^{n} \text{Execution time}_i$$

$$\text{Geometric mean} = \sqrt[n]{\prod_{i=1}^{n} \text{Execution time}_i}$$

where $\text{Execution time}_i$ is the execution time for the $i$th program and $n$ is the total number of programs in the set of benchmarks. The following table shows an example for computing these metrics.

| Item | CPU time on computer A (s) | CPU time on computer B (s) |
|---|---|---|
| Program 1 | 50 | 10 |
| Program 2 | 500 | 100 |
| Program 3 | 5000 | 1000 |
| Arithmetic mean | 1850 | 370 |
| Geometric mean | 500 | 100 |

We conclude our coverage in this section with a discussion on what is known as Amdahl's law for speedup ($SU_o$) due to enhancement. In this case, we consider speedup as a measure of how a machine performs after some enhancement relative to its original performance. The following relationship formulates Amdahl's law.

$$SU_o = \frac{\text{Performance after enhancement}}{\text{Performance before enhancement}} = \frac{\text{Execution time before enhancement}}{\text{Execution time after enhancement}}$$

Consider, for example, a possible enhancement to a machine that will reduce the execution time for some benchmarks from 25 s to 15 s. We say that the speedup resulting from such reduction is $SU_o = 25/15 = 1.67$.

In its given form, Amdahl's law accounts for cases whereby improvement can be applied to the instruction execution time. However, sometimes it may be possible to achieve performance enhancement for only a fraction of time, $\Delta$. In this case a new formula has to be developed in order to relate the speedup, $SU_\Delta$, due to an enhancement for a fraction of time $\Delta$ to the speedup due to an overall enhancement, $SU_o$. This relationship can be expressed as

$$SU_o = \frac{1}{(1 - \Delta) + (\Delta / SU_\Delta)}$$

It should be noted that when $\Delta = 1$, that is, when enhancement is possible at all times, then $SU_o = SU_\Delta$, as expected.

Consider, for example, a machine for which a speedup of 30 is possible after applying an enhancement.
If under certain conditions the enhancement was only possible for 30% of the time, what is the speedup due to this partial application of the enhancement?

$$SU_o = \frac{1}{(1 - \Delta) + (\Delta / SU_\Delta)} = \frac{1}{(1 - 0.3) + \frac{0.3}{30}} = \frac{1}{0.7 + 0.01} \approx 1.4$$

It is interesting to note that the above formula can be generalized as shown below to account for the case whereby a number of different independent enhancements can be applied separately and for different fractions of the time, $\Delta_1, \Delta_2, \ldots, \Delta_n$, thus leading respectively to the speedup enhancements $SU_{\Delta_1}, SU_{\Delta_2}, \ldots, SU_{\Delta_n}$.

$$SU_o = \frac{1}{\left[1 - (\Delta_1 + \Delta_2 + \cdots + \Delta_n)\right] + \left(\frac{\Delta_1}{SU_{\Delta_1}} + \frac{\Delta_2}{SU_{\Delta_2}} + \cdots + \frac{\Delta_n}{SU_{\Delta_n}}\right)}$$

1.5. SUMMARY

In this chapter, we provided a brief historical background for the development of computer systems, starting from the first recorded attempt to build a computer, the Z1, in 1938, passing through the CDC 6600 and the Cray supercomputers, and ending up with today's modern high-performance machines. We then provided a discussion on the RISC versus CISC architectural styles and their impact on machine performance. This was followed by a brief discussion on the technological development and its impact on computing performance. Our coverage in this chapter was concluded with a detailed treatment of the issues involved in assessing the performance of computers. In particular, we have introduced a number of performance measures such as CPI, MIPS, MFLOPS, and arithmetic/geometric performance means, none of them defining the performance of a machine consistently. Possible
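
As a recap of the measures from Section 1.4, the following Python sketch gathers the MIPS rating, the arithmetic and geometric means, and Amdahl's law in one place. It is an illustrative sketch under stated assumptions rather than anything taken from the chapter: all function names are invented here, the CPI of 2.0 in the MIPS call is a hypothetical value, and the multi-enhancement case is computed by summing each fraction divided by its own speedup, which is this sketch's assumption about how independent enhancements on disjoint fractions of time combine.

```python
import math


def mips(clock_rate_hz, cpi):
    """MIPS rating = clock rate / (CPI x 10^6)."""
    return clock_rate_hz / (cpi * 1e6)


def arithmetic_mean(times):
    """Arithmetic mean of a list of execution times."""
    return sum(times) / len(times)


def geometric_mean(times):
    """Geometric mean: n-th root of the product of n execution times."""
    return math.prod(times) ** (1.0 / len(times))


def amdahl_speedup(enhancements):
    """Overall speedup for a list of (fraction, speedup) pairs.

    Each pair says that `fraction` of the original execution time can be
    sped up by a factor `speedup`. The fractions are assumed to cover
    disjoint parts of the execution time (an assumption of this sketch).
    """
    total_fraction = sum(fraction for fraction, _ in enhancements)
    if not 0.0 <= total_fraction <= 1.0:
        raise ValueError("fractions must sum to a value in [0, 1]")
    enhanced_time = sum(fraction / speedup for fraction, speedup in enhancements)
    return 1.0 / ((1.0 - total_fraction) + enhanced_time)


# Hypothetical machine: 200 MHz clock, CPI of 2.0 (not from the text).
print(round(mips(200e6, 2.0), 1))              # 100.0

# CPU times on computer B from the table of means.
times_b = [10, 100, 1000]
print(arithmetic_mean(times_b))                # 370.0
print(round(geometric_mean(times_b), 6))       # 100.0

# Speedup of 30 available for 30% of the time.
print(round(amdahl_speedup([(0.3, 30)]), 2))   # 1.41
```

Passing a single pair reproduces the one-enhancement formula, so the same function serves both the basic and the generalized form of the law.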
