Computer Components PDF

Summary

This document describes the naming of parts of a computer, such as the CPU, memory, and IO units. It discusses how digital computers operate and the functions of the core hardware components. It also includes technical details like different processor types and sockets.

Full Transcript


WillaCh03v3.qxd 8/30/06 5:58 PM Page 48

Chapter 3 ❚ Functional units and the fetch–execute cycle

3.1 The naming of parts – CPU, memory, IO units

Although digital computers can be described as having only three major hardware units (Fig. 3.1), when you open your PC the motherboard often appears much more complicated. This section aims to help you identify which part is which, and then to correctly associate the principal functions. The schematic plan of a modern PC motherboard, with the visible parts labelled, is shown in Fig. 3.2. When you take the lid off a PC and look inside, the three units that you must try to identify on the motherboard are: the central processing unit (CPU), main memory, and the input and output devices. These are the minimum set of components for a working digital computer; everything else could be classed as additional luxuries. Later, in Chapter 12, we will have a look at disk drives, which can be the most significant part of an online data storage facility, but for the moment we will ignore them.

The CPU will currently be a Pentium, with large metal fins attached to dissipate the excess power and keep the silicon from overheating. Different processors generate different amounts of waste heat: the Pentium can produce 65 watts, while the ARM produces only 1 watt. Where necessary, a miniature fan is fitted to assist with essential cooling. (Note that the human brain is rated at 12 watts.) Intel now fits temperature sensors to its processor cores which allow the system clock to be slowed down to reduce the production of heat and avoid a catastrophic melt-down. Figure 3.2 illustrates the CPU plugging directly into a ‘Socket-A’, for the AMD Athlon/Duron, on a motherboard. Inserting all 462 pins required by this type of processor into a push-fit socket would be tricky, and so a ZIF (Zero Insertion Force) unit is normally provided, allowing the CPU to be dropped (carefully) into place and firmly clamped with a side lever.
The newer Socket 939, which handles the increased number of pins needed for the Athlon 64, operates in the same manner, but the older Pentium II and III used a different arrangement involving a secondary, or daughter, board holding the CPU, which was pushed into a special slot on the motherboard. Table 3.1 lists the currently available processor sockets. In a welcome move, new motherboards now have fixing points to accommodate the weight of the CPU fan and cooling fins.

Motherboards come in several sizes with slightly different organization of the chips. They are referred to as AT, Baby-AT, ATX, ATX-mini and so on. Unfortunately, when attempting to exchange a motherboard, unexpected problems can occur, such as the CPU cooler fan and the floppy disk compartment requiring the same space. Sometimes the only solution is to obtain a new-style case along with the motherboard.

Fig. 3.1 The principal components of a computer.

Fig. 3.2 PC-AT motherboard, showing the locations of the CPU, memory and IO card sockets (labelled items include: mouse, USB, SVGA monitor, LAN, keyboard and printer sockets; sound IO connector; ATX power connector; slots for IO expansion cards; IDE and SATA hard disk connectors; floppy disk connector; PCI and AGP slots; DIMM memory slots; chipset; lithium battery; BIOS and crystal; front panel LED and switch connectors; CPU fan connector; Socket A, 462; Pentium CPU, heatsink and fan).

The most obvious memory will be in the form of small DIMM cards plugged into sockets. Sometimes these cards have memory chips mounted on both sides. Then there are a variety of IO devices which need connecting: keyboard, mouse, screen, printer and disk drives, not forgetting the front panel LEDs and reset switch! In Chapter 11 we will discuss how the expansion slots can be used for more IO connections.
Now, though, you must try to see the devices in terms of their generic role. Appreciating the basic functions necessary for a computer to operate is more important in the long term than being able to distinguish the deluxe from the GTi model!

Table 3.1 Changing Pentium socket sizes and pin types.

    Socket        Pin count   Max clock (MHz)   Processors
    486           168         33                486DX
    Socket 1      169         33                486SX/DX
    Socket 2      238         50                486SX
    Socket 3      237         50                486SX/DX
    Socket 4      273         66                Pentium 60
    Socket 5      320         66                Pentium 75
    Socket 7      321         124               AMD K6
    Slot 1        242         133               Celeron
    Slot 2        330         133               Pentium II
    Socket 370    370         133               Celeron (1.4 GHz)
    Socket A      462         200               Athlon XP (2 GHz)
    Socket 423    423         100               Pentium IV (2 GHz)
    Socket 478    478         200               Celeron (2.8 GHz)
    Socket 603/4  603/4       200               Xeon (3 GHz)
    PAC611        611         200               Itanium 2 (1 GHz)
    Socket 754    754         200               Athlon 64 (2.8 GHz)
    Socket 940    940         200               Opteron
    Socket 939    939         200               Athlon 64 (3.8 GHz)

Computers can be divided into three principal subsystems, as represented in Fig. 3.1: the CPU, the main memory and the IO units. Each of these subsystems is often made up of many components. On the motherboard, all the components are interconnected by signal highways, or buses. A bus is just a bundle of conductors: wires or tracks. The Intel 8086 had 20 shared address/data lines and a further 17 lines for control. With the Intel Pentium, the data bus was expanded to 64 lines and the address bus to 32 lines, while the next generation of Itanium processors will come with 64 address lines. Each hardware unit is connected to the address, data and control buses. This provides a simple way of building up a complex system in which each unit can communicate with all others. New units can be plugged in with little disruption and failed units can be swapped out for testing. When you open up a PC and look for the bus wires you may be disappointed.
The card, or motherboard, on which the chips are fixed can be multi-layer, and is generally coated with opaque green varnish to protect the fine detail of the bus tracking. The bus interconnection scheme is often represented in diagrams as a wide pathway, rather than showing the individual wires (as in Fig. 3.3). It is then assumed that the connected devices can have access to any signal line that they require.

Fig. 3.3 System interconnection schematic.

Buses are not the only possibility. An alternative scheme which used point-to-point interconnection was tried, but led to an impossible tangle of wires when more than three units were involved. The number of pathways needed to link every possible pair of n units can be expressed mathematically:

    number of different pathways = n(n − 1)/2

Remember that each pathway will still require a full-width data highway, which could be 32 lines, and several (approximately 6) control lines. However, there is a benefit. Because the pathways are communicating point-to-point there would be no need for full-width address buses. Addresses serve the double purpose of specifying the destination device on the broadcast highway, as well as selecting the position within the device. The former would no longer be required with point-to-point pathways. Taking an example where the number of units to be linked, n, is 30:

    number of different pathways = (30 × 29)/2 = 435

    number of wires = 435 × (32 + 6) = 16 530

The number of ‘wires’ demanded by this interconnection scheme is unmanageable. You may like to check the method of calculating the required number of interconnection pathways by counting the lines in the simpler examples laid out in Fig. 3.4. Also shown for comparison is the much simpler bus scheme with six units.

Fig. 3.4 Point-to-point escalation compared with simple bus interconnection.
Fig. 3.5 Further decomposition.

So, taking into account the large number of devices which need to communicate with each other in a computer, the bus interconnection scheme rapidly won favour. However, there is a serious disadvantage to bus architectures. Because an electronic bus, like a railway line, can only transfer one item of data at a time, it eventually imposes a limit on the performance that can be achieved, even when faster processors are available. This constraint is termed the bus bottleneck. So, very soon the transmission speed of signals on the bus becomes a critical parameter when considering how to increase system throughput. It is not possible simply to increase the clock rate, reducing pulse widths, to achieve faster transmission. The capacitance and inductance of the bus tracks place an upper limit on clock speeds. To get around this problem, wider data buses are being adopted for microprocessors. Since the early 1980s there has been an eight-fold change, from 8 to 64 lines. In addition, the more powerful computers employ several separate buses, thus allowing the computer to transfer many items of data simultaneously. Concerns about the limitations imposed by maximum rates of data transfer will reoccur in Chapter 14 when we deal with networked computers.

To further investigate the hardware, the CPU can usefully be divided into two functional sub-units: the control unit and the arithmetic and logic unit. By attaching Fig. 3.5 to Fig. 3.1, you can see the whole structure developing.

3.2 The CPU fetch–execute cycle – high-speed tedium

The computer has to read and obey every program, including the operating system itself, one instruction at a time. This initially appears to be an incredible handicap to performance when compared with the myriad of simultaneous activities that living organisms can carry out.
The basic operation, known as the fetch–execute cycle, is the sequence whereby each instruction within a program is read one at a time into the CPU from program memory and then decoded and performed. Only the phenomenal speed of electronic hardware makes this tedious cycle of repeated events of practical value. At any moment in the historic development of computers, one of the three units involved in the fetch–execute cycle – memory, bus and CPU – will be the limiting factor. This can affect both the design parameters which computer engineers have to work within and also the selection of algorithms for problem solving.

Table 3.2 Relative timings for various activities.

    ns (1/1 000 000 000 s)      μs (1/1 000 000 s)              ms (1/1000 s)
    Fetch–execute: 10 ns        Light: 300 m per μs             Human reaction: 300 ms
    Logic gate delay: 5 ns      TV line scan: 60 μs             TV frame: 20 ms
    SRAM access: 15 ns          Interrupt: 2–20 μs              Hard disk access: 10 ms
                                Engine spark: 10 μs             Car engine (3000 r.p.m.): 20 ms

For example, there are sometimes ‘memory-intensive’ methods available or ‘compute-intensive’ alternatives. If memory is fast and cheap then the former may be better; otherwise the latter will be chosen. Presently, DRAM chips used in main memory are not as fast as the CPU. Faster memory devices are available (SRAM) but at much higher cost, so they are only used in small, fast buffers, known as memory caches. These can help somewhat to reduce the main memory access delay by holding copies of current instructions and data. Chapter 12 will deal with this topic in more depth. To further reduce the negative effect of von Neumann’s ‘single stream’ bottleneck, the new RISC (Reduced Instruction Set Computer) generation speeds up the fetch–execute rate by working on several instructions simultaneously by means of a push-through pipeline. This will be explained more fully in Chapter 21.
However familiar we become with computers, it seems impossible for us to really understand the incredible speed at which they operate. To attempt to bridge this gap in our understanding, Table 3.2 gives the speeds of various activities for comparison. From Table 3.2 you can readily see the disparity in speeds between computer operations and the real world. Even something which appears to us to be very fast, like a single television scan line, is a thousand times slower than a CPU fetch–execute cycle.

As we have seen, computer programs are made up of binary-coded instructions such as:

    1011 1000 0000 0000 0000 0001

This is a Pentium instruction which would be expressed in assembler mnemonics as:

    MOV AX,0x100

Or more recognizably in the C programming language:

    ncount = 256;

This is a clear indication of the advantage gained from using programming languages (even the humble assembler!). This instruction will set the number 256 into the CPU accumulator register, AX. The fetch–execute cycle is the process by which the CPU retrieves the next instruction from program memory, decodes it, and carries out its requested action. Fully describing the fetch–execute cycle for the Pentium CPU would go far beyond this chapter, or even this book, but a summary can be given to help you appreciate that fetch–execute is actually made up of several separate phases of activity. In Fig. 3.6 can be seen the CPU with the Instruction Pointer (IP), Accumulator (AX) and Instruction Register (IR). Main memory holds the program, within which the single MOV AX,256 instruction can be seen in binary form. The Memory Address Register is also represented because it is mentioned in the following description. Figure 3.7 illustrates the fetch phase and Fig. 3.8 the execute phase for the MOV AX,256 instruction.

Fig. 3.6 Instruction Pointer (IP) register points to the next instruction in memory.
Now try tracing out the sequence of actions which make up the fetch–execute cycle. This schematic example is relatively straightforward when compared with the actual fetch–execute sequence of a modern processor such as the Pentium. Some instructions require an extra execute cycle to read in an address value from memory, which is then used to access the desired variable from memory to be operated on. It is not necessary to rehearse the fetch–execute cycle for every Pentium instruction. You are only expected to understand the principles, and apply them to the simpler instructions. Programmers will only come up against problems at this level if they are working with unproven prototype hardware – an experience best avoided!

Fig. 3.7 The fetch phase of the fetch–execute cycle.

Fig. 3.8 The execution phase of the fetch–execute cycle for MOV AX,256.

To summarize: the fetch–execute cycle is the sequence by which each instruction of the program is read from memory, decoded and performed. This may involve reading in further items of data from memory, so that an operation can be carried out, and writing result values back to memory.

Both Unix and Windows provide tools to watch the various activities as they happen. The Unix tool on Sun workstations is called perfmeter; on Linux it is xsysinfo; and on Windows it is the Performance Monitor. The Sun perfmeter display of the current CPU load is started with the command line:

    perfmeter -t cpu &

The peaks in activity in Fig. 3.9 were caused by capturing and processing the screen image for the diagram!

Fig. 3.9 Displaying the current CPU fetch–execute loading using Sun’s perfmeter.

3.3 System bus – synchronous or asynchronous?
The signals sent out by the CPU along the system bus are separated into three distinct groups:

    Data bus – typically 32 bits wide, but will be increased to 64 bits
    Address bus – 32 bits wide, but will require more very soon
    Control bus – about 15 lines responsible for starting and stopping activities

One of the control bus lines is the system clock. It is generated by a high-frequency crystal oscillator which can be identified on the motherboard as a small silver cylinder, often situated close to the CPU. In most circumstances it is the CPU that starts an operation by sending signals along the bus wires to other units. These then respond by signalling back along the bus lines. Sometimes an action is triggered by a non-CPU unit, which may take control of the bus for a while at the expense of the CPU. The sequence of bus signals must follow a very precise timing pattern. If the timing is completely locked into the system clock signal, the bus is termed ‘synchronous’.

Synchronous buses require all attached units, memory and IO chips, to operate at the same speed. This arrangement is quite simple to implement from the hardware viewpoint, but lacks the flexibility to easily accommodate a range of devices operating at a variety of speeds. The signal trace (Fig. 3.10) shows the timing relationships that operate on a synchronous bus. Note that 64 address lines and 32 data lines are collapsed onto single representative bus lines. The exact binary pattern carried is not relevant; you only need to know when there is a valid number on the bus. The figure represents an idealized screen from a four-trace oscilloscope. In real computers the actual voltage traces are much messier, and so quite difficult to disentangle.
For a motherboard running at 100 MHz, often referred to as the front side bus (FSB) speed, the clock period for a single cycle is 10 ns, while a 200 MHz clock has a period of 5 ns. Try to follow the fetch–execute cycle in Fig. 3.10. It starts with the address of the next instruction (Addr1) emerging onto the address bus from the IP register. The memory is told to carry out a ‘Read’ from this location by the R/W control line. The instruction code emerges from memory (Instr) onto the data bus and travels back into the CPU, where it is saved in the IR register. The fetch cycle is now complete. The execute cycle requires two further bus cycles: the first to read in an item of data for processing, and the second to write out the result.

Fig. 3.10 Synchronous bus fetch–execute cycle timing diagram (100 MHz).

Asynchronous buses are more complicated, but also more flexible. They allow the CPU to adapt its cycle period to the speed of the unit with which it is actively communicating. The important difference between the systems is the removal of the centralized system clock and the introduction of ALE and DTA control lines (see Fig. 3.11). ALE (Address Latch Enable) is controlled by the CPU and is used to indicate to units on the bus when the address signal is valid. It actually fulfils part of the role of the system clock. The DTA (Data Transfer Acknowledge) signal does not emerge from the CPU, but is generated by the bus unit. It tells the CPU when a data write has been completed successfully, or when the data is valid for a read. If the memory needs more time it can delay activating the DTA signal and so delay the completion of the bus cycle. The ALE and DTA together are an example of a handshake system for data transfer.

Looking at Fig. 3.11, we can see that the sequence starts with the CPU putting an address value onto the address bus and then activating the ALE control line. The unit addressed responds by copying the addressed location back onto the data bus and activating the DTA line in response. When the CPU recognizes the DTA signal it will read the data bus and then deactivate ALE, removing the address too. The sequence follows the dashed lines A, B and C.

Fig. 3.11 Asynchronous bus cycle timing diagram.

The distinction between synchronous and asynchronous working is relevant for all communication channels, including networks, so it is worth considering more fully here. The synchronous system distributes a clock signal from the CPU to all the components attached to the bus, which are then controlled by the downward-falling edges, as indicated by the dashed lines in Fig. 3.10. Edge A tells the memory to copy the instruction code, from the addressed location, onto the data bus, because the R/W signal is set for Read. Edge B then gets some data from memory, while edge C writes the result of the instruction execution back into another memory location. In this way, the memory chip is simply responding to the CPU control lines, but it has to act fast enough every time, because the CPU has no way of telling whether the signals on the data bus represent valid data or are just random oscillations. If the memory cannot get an item of data out quickly enough, the CPU will carry on regardless until the erroneous data makes it crash! Now you see why the CPU can only run as fast as the slowest unit on the bus. This is a serious constraint when, as often happens, the system needs to use a few chips from a previous generation (legacy chips).
3.4 System clock – instruction cycle timing

It should now be clear, following Sections 3.2 and 3.3, that the fetch–execute cycle is not a single event but comprises a number of distinct phases, or microcycles. These are generated from the system clock, and for a 200 MHz clock the microcycle period would be 5 nanoseconds:

    period = 1/frequency,  t_micro = 1/(200 × 10⁶) s = 1000/200 ns = 5 ns

The example shown in Fig. 3.12 has five microcycles within the full fetch–execute instruction cycle. Each microcycle is responsible for the completion of a particular operation. There is now a major difference between CISC and RISC computers in the organization of how the microcycles are arranged and implemented. We will be investigating this further during the following chapters.

Fig. 3.12 Multi-phase instruction cycle.

As clock speeds rose to increase processor throughput, they were becoming too fast for memories to respond to within a single cycle. Yet more problems arose which involved the broadcast band for VHF FM radio, which rests at 90–100 MHz (Fig. 3.13). This was exactly the speed intended for the next generation of processor clocks.

Fig. 3.13 Computers can emit strong radio interference in the FM band.

To make matters worse, the aerial length for FM broadcasts is of the same scale as the motherboard tracking, making the tracks excellent emitters and receivers:

    λ_fm = c/f = (3 × 10⁸)/(100 × 10⁶) = 3 m

Thus, a quarter of the full wavelength is 3/4 m, or 75 cm. The microcomputer pioneers exploited this effect when testing out the original MITS 8080 microcomputer. It had been supplied with little in the way of input–output facilities, offering only a single row of eight LEDs and eight toggle switches. This demanded some ingenuity if anything approaching an impressive demonstration was required.
Then it was noticed that a nearby transistor radio was inadvertently serving as an output device, so programs were quickly written to play tunes through the radio’s loudspeaker! For a time, the radio emission problem was reduced by the use of clock doubling. This allowed the CPU to accept a lower frequency clock from the motherboard and then to synthesize a higher rate for use on-chip. For example, the 66.6 MHz system clock was doubled by the 150 MHz Pentiums. The much smaller dimensions of the CPU package reduced the quantity of VHF radio emission relative to the larger motherboard tracking. The long-term solution was to house computers in metal cases, which needed to be well earthed to screen away any radio emissions from inside.

Current PCs have motherboard clocks running at 200 MHz, which for example are multiplied by 14 so that Pentiums can operate at 2.8 GHz. Sometimes the motherboard clock is referred to as the FSB (Front Side Bus) clock because of the construction of the old Slot 1 and Slot 2 CPU cards. The ‘front side’ connects to the main memory, on the motherboard, while the ‘back side’ connects to the cache memory held on the daughterboard itself. As the CPU is running many times faster than the main system bus and memory, the need for faster on-chip cache memory suddenly became a keen requirement, and the move from CISC to RISC architecture became more urgent. Chapter 21 discusses the philosophy and advantages of RISC in more detail.

The maximum rate of signals in a computer is limited by fundamental electrical effects: resistance and capacitance. Although we have drawn the signals as sharp pulses with vertical edges, the physical resistance and capacitance of the wires and tracking have the effect of ‘rounding the edges’. Problems start when a pulse ‘tail’ starts to overlap the rising edge of the next: see Fig. 3.14.

Fig. 3.14 Clock frequency limiting.
To increase the rate of pulses, which means the frequency of the clock, the computer hardware engineer needs to reduce the resistance and capacitance of the interconnecting bus wiring. This has been successfully achieved by miniaturizing more and more of the circuits onto silicon chips. If the signal track or wire can be made with lower capacitance, it will need fewer electrons to charge it up to the required voltage (logic 1). Similarly, it will be able to discharge to the lower level (logic 0) much more quickly.

For many years, the voltage levels representing logic 1 and 0 have generally followed the standard set in 1970 by Texas Instruments for its TTL components. Logic 0 was indicated by a voltage less than 0.8 V and logic 1 by a voltage greater than 2.5 V. This scheme works well with a 5 V power supply, but now, in order to increase clock speeds, a lower power supply of 3.5 V has been adopted by several chip manufacturers. This successfully reduces the transmission line charging and discharging delays, but demands tighter signal margins and less power supply noise. The latest voltage levels for fast chips are even lower (1.8 V).

Another important factor constraining the CPU clock speed is the amount of heat being generated in the chip. Silicon transistors do not work so well when they are hot, and the faster the clock the more heat is generated. Pentiums can get so hot that they stop working, or require extra cooling equipment, such as fans, to keep them running. There are also some intriguing experiments reported on using ice cubes to provide the extra cooling for CPUs. There is even a commercial refrigerator unit available to reduce chip temperatures to well below 0 °C, which is sometimes used during CPU development work or overclocking experiments!

3.5 Pre-fetching – early efforts to speed things up

The fetch–execute sequence of microcycles (Fig. 3.12) can be made to run faster, but only within the limits of current technology.
It was realized that the CPU was being held up at a number of moments during the fetch–execute cycle, principally because
