Computer Architecture & OS (PDF)
Document Details
Uploaded by ExuberantUnicorn
Summary
This document provides a basic overview of computer architecture and operating systems, including historical context, from tally sticks to modern computer systems. It details concepts such as the decimal and binary systems, conversions, and logic gates. It is intended for an undergraduate audience.
Full Transcript
COMPUTER ARCHITECTURE & OS
Unit 1- BASIC CONCEPTS OF COMPUTER ARCHITECTURE
1.1 Historical Overview
Computer= originally, a human who is good at executing mathematical operations (definition by Braithwaite). More broadly, any device that assists people in performing calculations can be considered a computer. This includes both prehistoric counting tools and modern computers.

Tally stick
A device used between 35,000 BCE and 20,000 BCE to record numbers and quantities (it could not do calculations). One of the oldest discovered tally sticks was found in the Belgian Congo. Values were recorded on the bone with marks carved into three columns.

Abacus
A calculation tool that dates to ca. 1700 BCE. Also known as a counting frame, the abacus is still used in some countries. It consists of a wooden frame and multiple rows of movable beads, which represent digits. The abacus can be used to perform basic arithmetic calculations and to find square and cubic roots.

Pascal calculator
This calculator was used to add and subtract two numbers directly, and to multiply and divide two numbers by repeatedly performing addition or subtraction operations. It was the size of a shoebox and could execute calculations involving numbers of up to six digits. The metal dials looked like spoked wheels, with the digits 0 through 9 displayed around each wheel.

Stepped reckoner
A mechanical calculator that could multiply two eight-digit numbers, divide a 16-digit number by an eight-digit number, and add or subtract an eight-digit number from a 16-digit number. It multiplied numbers by repeatedly performing addition operations. Similarly, division was performed through repeated subtraction operations.

Jacquard loom
A mechanical device that used a chain of punched cards. It was the first machine that could be programmed. While other machines of the time could only perform fixed operations, the Jacquard loom could be programmed using punched cards.
For this reason, the Jacquard loom is considered the grandfather of punched card computers, which were used to plan missions to space.

Babbage machine
A numerical table is a tool used to save computing time in fields like mathematics, astronomy, navigation, physics, statistics, and engineering. The oldest discovered tables were compiled in Babylon between 1800 and 1500 BCE. Preparing such tables required a lot of calculations, and the result was full of errors.
BABBAGE MACHINE: In 1820, the Astronomical Society asked Babbage to improve the tables of a navigational book. Babbage and his team constructed the formulas and distributed the arithmetic to clerks. To decrease errors, they had the calculations performed twice by two different clerks, and then compared the tables to check for inconsistencies. There were many errors. This was the starting point of the Babbage machine.
Babbage invented the first model of the difference engine. This engine is based on Newton's method of divided differences: if the values of a polynomial $P(x)$ (e.g., $P(x) = x^2 + 3$) are given at a few points, then the value of the polynomial at any nearby point can be calculated using additions only. For example, when $P(1)$, $P(2)$, and $P(3)$ are known, the value $P(4)$ can be calculated from the known values.
The Babbage machine consists of N columns, each storing one number, and tabulates polynomials of degree $n$ (highest term $x^n$); a polynomial of degree $n$ needs $N = n + 1$ columns. In each step, the machine adds the value of column $k+1$ to the value of column $k$ to obtain the new value of column $k$. Column 1 holds the result of the calculation, while column N only stores a constant.
TABULATING SYSTEM: Hollerith invented an electric tabulating system to process statistical data. The system operated using punched cards. He founded the Tabulating Machine Company, which later merged into the Computing-Tabulating-Recording Company; this company was renamed International Business Machines (IBM).

Turing Machine
Turing introduced the model of a universal machine that became the foundation of modern computers.
Turing's machine is a general model of a CPU, the part that enables the computer to manipulate data. A Turing machine is a hypothetical abstract computing device consisting of a read/write head, a tape passing through this head, and a state table that records the state of the machine. The tape is divided into cells, where each cell holds a single symbol, such as zero, one, or "blank". This tape represents the machine's storage medium. The read/write head can move one cell to the left or right, depending on the instruction given to the machine. For example, an instruction could be "If the cell value is zero, then move one cell to the right." The head can read the cell's value, erase the symbol, and write a new value (for example, if the cell value is one, it can erase it and write zero). The machine is programmed with a chain of such instructions.

ENIAC
Mauchly and Eckert built the grandfather of digital computers, the Electronic Numerical Integrator and Computer (ENIAC). This machine consumed about 150 kW of power and contained vacuum tubes, resistors, capacitors, and relays. The vacuum tubes acted as ON/OFF switches and made the calculations possible. The ENIAC could multiply two ten-digit numbers.
Early computers also popularized the term "bug" (i.e., error) in the programming vocabulary. The story is usually associated with Hopper, whose team found that the cause of a calculation error was a moth trapped in one of the relays of a Harvard Mark II computer.

Modern computer systems
In 1947, the transistor was invented. Transistors replaced vacuum tubes for electrical switching. The size and energy consumption of computers decreased, while their computing power drastically increased.
Hopper contributed to the development of the common business-oriented language (COBOL), one of the first high-level programming languages.
Engelbart introduced the prototype of a computer with a graphical user interface (GUI) and a mouse.
Bell Laboratories developed the UNIX operating system.
IBM invented the floppy disk.
Between 1974 and 1977, commercial personal computers (PCs), including the Apple I, first appeared on the market.

1.2 Digital Logic & Binary Arithmetic
Mechanical calculators like Babbage's use the digits 0 to 9 for calculations. Electronic calculators, however, operate on electrical signals, and regulating and distinguishing ten different voltages is not an easy task. Therefore, the number of counting options was reduced to only two, represented by two voltages: a "high" voltage for 1 (ON or true state) and a "low" voltage for 0 (OFF or false state).

Decimal system
The decimal numbering system is a base 10 numbering system that uses ten symbols, 0 through 9, to represent all possible numbers. Human beings use the decimal numbering system because they have ten fingers. Each digit position represents a power of ten. The place values start from the rightmost digit and increase by a factor of 10 as you move leftward. A number is written as the sum of the decimal symbols (0-9), each multiplied by the corresponding power of the system base. Therefore, any number $WXYZ_{10}$ can be written as:
$WXYZ_{10} = W \times 10^3 + X \times 10^2 + Y \times 10^1 + Z \times 10^0$
(ex. $235_{10} = 2 \times 10^2 + 3 \times 10^1 + 5 \times 10^0$)

Binary System
A computer can be pictured as a lamp that can show only two different symbols (states): ON and OFF (or 1 and 0). This requires computers to use the binary system, the base 2 system, where only two symbols are used. To write a number in the binary system, we write it as a sum of the binary symbols (i.e., 1 and 0), each multiplied by the corresponding power of the system base (i.e., 2).
Ex. $100010_2 = 1 \times 2^5 + 0 \times 2^4 + 0 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 0 \times 2^0$

Decimal to binary
To convert a decimal number to binary, divide it by 2 repeatedly, recording the remainder of each division, until the quotient is 0. Each remainder determines whether the corresponding digit in the binary representation is 1 or 0. The remainders have to be read in reverse order (last remainder first) to get the binary number.
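The two directions of conversion described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `decimal_to_binary` and `binary_to_decimal` are made up for this example, and only non-negative whole numbers are handled.

```python
def decimal_to_binary(n):
    """Convert a non-negative integer to a binary string by repeated division by 2."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        n, remainder = divmod(n, 2)  # quotient is divided again, remainder is the next bit
        remainders.append(str(remainder))
    # The remainders come out least-significant bit first, so read them in reverse.
    return "".join(reversed(remainders))


def binary_to_decimal(bits):
    """Sum every bit multiplied by its power of two (rightmost digit has exponent 0)."""
    total = 0
    for exponent, bit in enumerate(reversed(bits)):
        total += int(bit) * 2 ** exponent
    return total


print(decimal_to_binary(34))        # 100010
print(binary_to_decimal("100010"))  # 34
```

The example value 34 is the same number used in the notes: its remainders, read in reverse, give 100010.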
Binary to decimal
Assign a place value to each digit of the binary number, starting from the rightmost digit with exponent 0 and working leftward (in this case up to exponent 5). Multiply each digit of the binary number by its corresponding place value and add the results:
$100010_2 = 1 \times 2^5 + 0 \times 2^4 + 0 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 0 \times 2^0 = 32 + 0 + 0 + 0 + 2 + 0 = 34$
34 is the decimal number.

Addition
The numbers are written on top of each other and then added up place by place from right to left, using four rules: 0 + 0 = 0, 0 + 1 = 1, 1 + 0 = 1, 1 + 1 = 10. The fourth rule means that we have a carry of 1 at the corresponding position, which is added to the next position.

Hexadecimal Representation
Hexadecimal is a base 16 numbering system used in computing and digital communication. It uses the symbols 0 to 9 and then A to F ("A" represents decimal 10, "B" represents 11, and so on up to "F", which represents 15).

Hex → Dec
Assign a place value to each digit of the hex number, starting from the rightmost digit with exponent 0 and working leftward (in this case up to exponent 2). Multiply each digit of the hex number by its corresponding place value and add the results:
$31A_{16} = 3 \times 16^2 + 1 \times 16^1 + 10 \times 16^0 = 768 + 16 + 10 = 794_{10}$

Dec → Hex
Divide the decimal number by 16 repeatedly, recording the remainder of each division, until the quotient is 0. Each remainder determines the corresponding digit in the hex representation. The remainders have to be read in reverse order to get the hex number.
Division      Quotient   Remainder
412 : 16      25         12 (C)
25 : 16       1          9
1 : 16        0          1
Result: $412_{10} = 19C_{16}$

Logic Gates
In digital electronics, the basic binary operators, known as bitwise operators, are implemented as logic gates. These logic gates operate on one or two inputs (bits) and produce an output.
The logic gates are the building blocks of all electronic devices, including computers. In electronics, the six bitwise operators AND, NAND, OR, NOR, XOR, and NOT are called basic logic gates.
Bitwise operators= Each 1 or 0 in a binary number is called a bit. Bitwise operators perform functions bit by bit on one or two binary numbers.
Digital electronics= Instead of analog/continuous signals (a signal varying between two limits), digital electronics uses digital/discrete signals (presence/on or absence/off of an electrical signal).

NOT
The complement, or NOT operator, is the only bitwise operator that operates on a single binary number. It turns all ones into zeros and all zeros into ones. $\overline{A}$ corresponds to NOT(A).

OR
This operator generates the union of two bits and is mathematically represented by ∨. First, it lines up the two binary numbers to match the bits. If they do not have the same length, zeros are added to the left side of the shorter number to match the length of the longer one. The operator then compares each bit of the first number with the corresponding bit of the second number. If either of them (or both) is 1, then the output of the OR operator is 1. If both are 0, then the output is also 0.

NOR
This operation is represented by the following equation: $\overline{A \lor B}$
If both inputs are 0, the output is 1; otherwise, the output is 0.

XOR
This operation is represented mathematically by the symbol ⊕. Same inputs= 0; different inputs= 1.

AND
AND is mathematically represented by ∧. If either or both bits are 0, the result is 0. If both bits are 1, the result is 1.

NAND
NAND is represented by the following equation: $\overline{A \land B}$
If at least one of the bits is 0, the output is 1; otherwise, the output is 0.

Digital Circuits
A digital electronic device such as a microprocessor is a highly complex electronic circuit that processes discrete-valued variables (like voltage levels). A digital circuit can be conceptualized as a black box composed of four elements:
1. One or more input terminals that accept discrete values
2. One or more output terminals that produce discrete values
3. A functional specification that describes the relationship between input and output terminals
4. A timing specification that describes the time delay between any change at the input terminals and the corresponding response at the output terminals

Types of digital circuits
There are two types of digital circuits:
Combinational → the outputs of the circuit depend only on the current values of the inputs, meaning that the current input values are combined to produce the output values. An example of a combinational circuit is the NOR circuit, which has two inputs and one output and is made of an OR and a NOT logic gate.
Sequential → the outputs of the circuit depend on the current and previous values of the inputs, which means that the output depends on the sequence of the inputs. The previous inputs are memorized by distilling them into a smaller chunk of information. This is referred to as the state of the system, and it is stored in a set of bits called state variables.

Digital Adder
We can use basic logic gates to perform basic arithmetic operations, such as addition. For example, binary addition can be implemented with a full adder logic circuit (see page 29).

1.3 Semiconductor Technology
Early electrical computers, such as the ENIAC, contained vacuum tubes, which acted as switches to perform binary calculations. The problem with vacuum tubes was their size and high energy consumption, so in modern computers they have been replaced by transistors.

Semiconductors
Materials that have an electrical conductivity between that of a conductor (can conduct electricity, e.g., copper) and that of an insulator (cannot conduct electricity, e.g., wood or rubber).
Silicon and germanium are the most commercially used semiconductors because they have four valence electrons in their outermost layers, which makes them susceptible to gaining or losing electrons (see pages 31-33). Valence electrons are the electrons in the outermost shell, i.e., the outermost energy level of an atom; whether an atom can connect to another atom depends on how many electrons are located in this outermost orbit. Silicon atoms form covalent bonds (chemical bonds that involve sharing electrons between atoms, see page 31) with each other, which results in a silicon crystal.
Semiconductors are special elements of the periodic table that conduct electricity under certain conditions and do not under others, so they can act either as conductors or as insulators depending on the conditions.

Transistors
A transistor is an electrical switch that can be controlled and put in the ON (1) or OFF (0) state by applying different voltages to the input terminals. Transistors are made from semiconductor materials. A transistor has three terminals: gate (g), source (s), and drain (d). The current flows between the source and the drain; the gate is the push button that controls this flow. By applying a voltage to the gate, we control whether there is a connection between source and drain or not.
The most commonly used transistors in computer chips are metal-oxide-semiconductor field-effect transistors (MOSFET or MOS). There are two types of MOSFETs: nMOS and pMOS.
Gate (G)   nMOS                  pMOS
0          NO CONNECTION (OFF)   CONNECTION (ON)
1          CONNECTION (ON)       NO CONNECTION (OFF)

nMOS transistor
An nMOS transistor is made of n-type semiconductor drain and source terminals installed on top of a p-type semiconductor substrate, which is always connected to the ground or the lowest voltage of the system.
The gate is made of polysilicon and is separated from the p-substrate and from the source and drain terminals by a layer of SiO2 insulator (silicon dioxide). This is why the term "metal-oxide-semiconductor" (MOS) appears in the MOSFET name. (See the highlighted notes on page 35.)
nMOS: G=0 → There is NO CONNECTION (OFF)
nMOS: G=1 → There is A CONNECTION (ON)

pMOS transistor
A pMOS transistor is made of p-type drain and source terminals installed on top of an n-type semiconductor substrate, with a gate made of polysilicon. The operation mechanism of a pMOS transistor is opposite to that of an nMOS transistor: a pMOS transistor is OFF (i.e., no current flows between "s" and "d") when the voltage at the gate is one (high), and it is ON when the voltage at the gate is zero (low).
pMOS: G=0 → There is A CONNECTION (ON)
pMOS: G=1 → There is NO CONNECTION (OFF)

Transistors to form a logic gate
nMOS transistors are connected in series → the output of one is connected to the terminal of the other.
pMOS transistors are connected in parallel → their outputs are connected together and they are both under VDD.
nMOS transistors are always connected to the ground, while pMOS transistors are always connected to VDD.
○ nMOS = on ground = 0 → so: 0+0= 0 & 0+1= 1
○ pMOS = on VDD = 1 → so: 1+0= 1 & 1+1= 0

NOT gate
To build a NOT logic gate (circuit), we connect one pMOS and one nMOS transistor. The drain of the pMOS is connected to the drain of the nMOS and, together, they form the output (O). The source terminal of the nMOS is connected to the ground, whereas the source terminal of the pMOS is connected to the supply voltage VDD. The gates of both transistors are connected to the input (I).

NAND gate and AND gate: see the circuit diagrams on page 37.

1.4 Hardware Design & HDL
The process of finding an optimized set of logic gates to perform a complex function, such as a CPU, is very complicated and prone to errors.
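To make "a set of logic gates that performs a function" concrete, and to show why such designs need checking, here is a small Python sketch. It models the six basic gates from Section 1.2 as functions on single bits, wires them into a full adder, and verifies the result exhaustively against ordinary integer addition. The wiring used here (two XOR, two AND, one OR) is the common textbook full adder and is an assumption; it is not necessarily the exact circuit shown on page 29. All function names are made up for this example.

```python
from itertools import product

# The six basic gates, modelled on single bits (0 or 1).
def NOT(a): return 1 - a
def AND(a, b): return a & b
def OR(a, b): return a | b
def XOR(a, b): return a ^ b
def NAND(a, b): return NOT(AND(a, b))
def NOR(a, b): return NOT(OR(a, b))


def full_adder(a, b, carry_in):
    """Add three bits; return (sum_bit, carry_out)."""
    partial = XOR(a, b)
    sum_bit = XOR(partial, carry_in)
    carry_out = OR(AND(a, b), AND(partial, carry_in))
    return sum_bit, carry_out


def add_4bit(x, y):
    """Chain four full adders (a ripple-carry adder) to add two 4-bit numbers."""
    carry = 0
    result = 0
    for i in range(4):  # from the rightmost bit to the leftmost bit
        sum_bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= sum_bit << i
    return result, carry


# Exhaustive check: every input combination must agree with integer addition.
for a, b, c in product((0, 1), repeat=3):
    s, carry_out = full_adder(a, b, c)
    assert 2 * carry_out + s == a + b + c
for x, y in product(range(16), repeat=2):
    result, carry = add_4bit(x, y)
    assert result + 16 * carry == x + y

print(add_4bit(0b0110, 0b0111))  # (13, 0), i.e. 6 + 7 = 13 with no carry out
```

With only four input bits per operand an exhaustive check is cheap; for a real CPU the input space is far too large, which is one reason the design flow relies on simulation and formal tools.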
An error in hardware design could be very expensive.
Hardware design flow: a procedure, i.e., a fixed set of steps we must follow to achieve the goal of designing specific electrical circuits on a chip.

Integrated Circuit (IC)
A set of electronic circuits on a small flat piece of semiconductor is called an integrated circuit (IC), also known as a microchip. There are two types of integrated circuits, depending on their application:
Field-programmable gate array (FPGA) chips → FPGA chips can be programmed or configured in the field by the user, who is normally a manufacturer of electronic devices (such as sound system equipment), to act in a certain way. This type of chip is made of thousands of logic blocks linked via programmable interconnections.
Application-specific integrated circuit (ASIC) chips → ASIC chips cannot be programmed or modified. They are meant for only one purpose. In this context, a manufacturer, e.g., a cell phone manufacturer, is the creator of the product. The CPU of your cell phone is an example of an ASIC.
Both FPGA and ASIC chips are designed using a hardware description language (HDL).

Flow of IC design
1. The IC development team collects the requirements from the customer, who will use the IC in their products.
2. The product specifications abstractly define the overall architecture of the chip and its functionality, such as computational power. They are set according to the collected requirements.
3. The architecture of the chip is designed. This specifies the chip's components, their interconnections, and the data flow inside the chip. → where we put what
4. The architecture is translated into code using HDL.
5. The HDL code is verified by feeding inputs into the simulated IC and checking the outputs. This step is called pre-silicon verification. The output of this step is a verified HDL code of the IC. → simulate the microchip
6. The verified HDL code is used by a computer-aided design (CAD) tool to generate a detailed design including all components, such as logic gates, transistors, and their interconnections. This process is called logic synthesis.
7. The synthesised design is translated into a real chip layout with physical components and routing between these components. This step is not just a schematic design; it is the physical design, which should consider factors such as environmental conditions.
8. Finally, in the post-silicon validation step, a physical sample IC is produced, tested, and verified. After the IC has been verified, mass production starts.

Logic Design using HDL in the ASIC design flow
In the 1990s, hardware designers found that the hardware design process was much faster and more reliable if they specified the logical function of the hardware at a very high abstraction level using HDL, and then used a CAD tool (software that creates computer models using geometrical parameters) to generate optimized gates.
A block of hardware with inputs and outputs is called a module. For example, AND gates and full adders are both hardware modules. Two models can be used to describe the functionality of a module: the behavioural model, which describes what a module does, and the structural model, which describes how a module is constructed from building blocks. HDL focuses on behavioural models.

HDL
HDL is not a programming language like C, C++, or Python, but a declarative language without any execution. It is a static description of the gate diagrams, which is later used by a CAD tool to generate the optimized gates for the defined functionality. (You describe the hardware you want in HDL → the CAD tool translates the HDL into gates and transistors.)
There are two main purposes for using HDLs:
1. Logic simulation: The expected inputs are fed into the module written in HDL, and the outputs are checked to verify that the module operates according to the customer requirements and product specifications.
2. Logic synthesis: This is the process of transforming the descriptive HDL code into a hardware netlist, i.e., the logic nodes, the transistors, and the connections between them.
Therefore, a general HDL code structure is composed of two main subsets:
1. Testbench, which is code that applies inputs and generates and verifies the outputs
2. Synthesizable modules, which describe the real hardware
So far, two main HDLs have dominated the market:
1. VHDL → developed in 1981 by the US Department of Defense; it became an IEEE standard in 1987
2. Verilog → developed in 1984 by Gateway Design Automation; it became an IEEE standard in 1995
VHDL and Verilog are based on similar principles but have different syntaxes.

Verilog Code
    module [module_name]([port_list]);
        input  [list_of_input_ports];
        output [list_of_output_ports];
        [declaration_of_other_signals];
        [behavioral_code];
        // [comments]
    endmodule
module [module_name]([port_list]); → the behaviour code must start with the keyword module. The port_list (the parameters) is written in round brackets (..); it can contain any number of inputs and any number of outputs, separated by commas, in any order.
[list_of_input_ports]; → the inputs are separated by commas; each can be a scalar (no range specified, ex. input logic a) or a vector (range specified, ex. input logic [3:0] a).
[list_of_output_ports]; → the outputs are separated by commas; each can be a scalar or a vector.
[declaration_of_other_signals]; → placeholder for additional signals you may need in the module (ex. internal registers, wires, temporary storage elements, etc.). They follow the same syntax as the port declarations.
[behavioral_code]; → the operations that should be performed based on the inputs and internal signals; the outputs are assigned accordingly.
// [comments] → comments are not interpreted by the CAD tool.
endmodule → the behaviour code must end with this keyword. It is not a statement and takes no semicolon, but it is important to include it.
Each line, other than endmodule, ends with a semicolon.

NOT gate (inverter) in Verilog
    module inv(a, y);
        input  logic [3:0] a;
        output logic [3:0] y;
        assign y = ~a;
    endmodule
logic → the data type of the port: a single bit that can take the value 0, 1, x (unknown), or z (high impedance).
[3:0] → a and y are 4-bit vectors (a[3:0]), where a[3] is the most significant bit and a[0] is the least significant bit (ex. a can assume the value "0100").
assign → a continuous assignment: in this case, y is assigned the value of NOT a.
(Figure: synthesised circuit of the NOT gate's HDL code.)

Main basic logic gates module
    module gates(input  logic [3:0] a, b,
                 output logic [3:0] y1, y2, y3, y4, y5);
        assign y1 = a & b;     // AND
        assign y2 = a | b;     // OR
        assign y3 = a ^ b;     // XOR
        assign y4 = ~(a & b);  // NAND
        assign y5 = ~(a | b);  // NOR
    endmodule
Declaring the inputs and outputs directly in the port list, as done here, is another way of writing the ports.

Unit 2- Computer Architecture
2.1 Computer Architecture Design Goals
A computer system is composed of different parts, which are mainly categorized into two groups:
1. Hardware, or the physical elements, such as memory or the central processing unit (CPU)
2. Software, i.e., the programs that control the hardware, operation, and functionality of the computer
Two terms should be clarified when talking about computer system hardware:
1. Computer system architecture → an abstract description of the system as the programmer sees it: what the computer can do (e.g., its instruction set) and how its parts communicate with each other.
2. Computer system organization → how the architecture is physically realized: the arrangement of the components inside the computer and how they work together to execute tasks, like fetching data from memory or performing calculations. It is mainly concerned with circuit design and memory types, among other things.
You may order two laptops with two different CPUs manufactured by two different companies, such as Intel and Advanced Micro Devices (AMD). If these two CPUs have the same architecture, they will look identical to a programmer because they have the same instruction set, even though they most likely have different organizations (e.g., they have different circuits designed by the engineers at the two manufacturers).
To give another example, assume you want to add two 32-bit numbers and have the following two options: choose high-performance 32-bit hardware that can operate directly on a 32-bit number in a single operation, or choose low-performance (and low-cost) 16-bit hardware that first splits each 32-bit number into two 16-bit numbers and then performs the operation. The result of both types of hardware is the same. In this case, we have computers with the same architecture but different organizations.
The main goal of computer system architecture is balancing the performance, reliability, efficiency, and cost of a computer system.

von Neumann Architecture
System architecture can be explained using one of the simplest and earliest computer system architectures, the von Neumann architecture. Von Neumann aimed to design a simple fixed-structure computer that could perform any complex computation without hardware modification, as long as it is provided with the proper program and instructions. Basically, a computer is a system composed of four main components:
1. A processor/microprocessor as the computing part of the computer
2. A memory to store both data and instructions → data and instructions share the same memory unit, e.g., random access memory (RAM).
Therefore, the instructions can be modified in the same way as the data. The length of the word specifies the internal structure of the memory.
3. One or more input/output (I/O) devices for transferring data to and from the outside (ex. keyboard, mouse, monitors, printers)
4. A bus system as the means (i.e., parallel wires consisting of 8, 16, 32, or 64 lines) to transfer data and instructions between the processor components and the memory

CPU
Responsible for all the processing, which includes doing calculations and sending commands to other hardware components, using data stored in memory. It takes its input from memory and creates output, the processing results.

ALU
Arithmetic & Logical Unit. Performs basic arithmetic calculations (adding, subtracting) and logic operations (AND, OR, ...).

Control Unit
It controls the operation of the ALU and the communication with the input and output devices; it interprets the instructions and carries them out. It decides what to do next (read data, perform logic operations, ...). Typical tasks of the CU are controlling and orchestrating CPU operations, managing the data flow between memory and CPU, recognizing and accepting the subsequent instructions, and decoding said instructions.

Registers
High-speed memory blocks inside the CPU, used to store data fetched from the main memory before they are processed. The registers have to cooperate to execute an operation: many steps are involved, and because there is only one processor, only one instruction register, only one PC, and so on, the steps have to be done in sequence for the instructions to be executed correctly. There are 5 registers:
1. Program Counter (PC): contains the memory address of the next instruction that should be executed → keeps track of the current position in the program. After an instruction has been fetched and decoded, the PC points to the address of the next instruction that should be fetched.
2. Memory Address Register (MAR): contains the address of the current instruction in memory, or of the next data to be transferred
3. Memory Data Register (MDR): contains the contents of the memory address that the MAR is pointing to, i.e., the data to be transferred
   a. MAR and MDR are responsible for accessing data in memory, both for reading and for writing data
4. Accumulator (AC): stores the intermediate results of arithmetic or logic operations
5. Instruction Register (IR): contains the current binary instruction that is being executed (the instruction itself, not its address)

BUS
The interconnections between the elements inside the processor, as well as between the processor and the memory, are realized using buses (i.e., parallel wires consisting of 8, 16, 32, or 64 lines).
Control bus: bidirectional bus that transmits control signals (write-memory, read-memory, nothing) between CPU elements and between CPU, memory, and I/O devices to coordinate the computer operations.
Address bus: used to transfer the memory addresses of the data and instructions that may be read from or written to → the CPU places an address on the address bus, which then sends it to either the memory or an I/O device.
Data bus: bidirectional bus used to send and receive data and instructions.
As each bus is a shared transmission medium, only one device at a time can send signals or data along a bus. Together, all three buses form a bus system.

Fetch-Decode-Execute Process
Instructions are fetched one by one from the memory. The processor decodes and executes one instruction at a time and, upon completion, looks for the next instruction to fetch. This process continues until there are no more instructions to execute. This is known as the "fetch-decode-execute" process (see pages 50-51).

Harvard Architecture
A central characteristic of the von Neumann architecture is that the data and the instructions are treated in the same way.
Storing data and instructions in one shared memory creates an issue known as the von Neumann bottleneck: both can only be accessed through the same data bus. Therefore, while data is being transmitted via the data bus, the CPU cannot do anything (i.e., it is in the idle state). However fast the processor is, it must wait for the data to be fetched. → memory and I/O are slower than the CPU
One approach to resolving this problem is to implement what is called the Harvard architecture, which was first developed by IBM in 1944 for the Harvard Mark I relay-based computer. At its core, the Harvard architecture has the same components as the von Neumann architecture. However, it has two separate memories for data and instructions, which can be accessed independently through two separate buses: the data bus and the instruction bus. Because the data and instructions are accessed through two separate buses, they can be fetched simultaneously, which makes the process faster and more efficient.

2.2 Instruction Set Architecture
Program= a sequence of instructions for a specific task. An instruction tells the processor what to do, such as move data from register A to register B, or perform an arithmetic or logical operation on data stored in register A.
Reprogramming is only possible when we can modify the instructions as we can modify the data. In the von Neumann architecture, data and instructions are treated the same, which means that we can modify the instructions as we would the data.
In computer science, the instruction set architecture (ISA) provides a logical view of the computer system's capabilities. The ISA is the interface between hardware and software; it is the only way in which the system designer can interact with the hardware, and it can be viewed as a programmer's manual.
The ISA defines the instruction set a microprocessor can execute, the rules for using the instructions (such as addressing mode or mnemonics (the abbreviations of the operators in the assembly language), and the method used to encode instructions into the machine language. Processor design details are closely related to the type of ISA. In more recent ISAs, instructions can accept more arguments (operands). The human representation of the computer’s programming language (known as machine language) is called assembly language, used to write a code in terms of processor capabilities. An assembler translates the assembly language into machine language. Look in the ISA→ find needed commands→ write assembly code→ translate it into machine language. —------------------------------------------------------------------------------------------------------------ It is important to note that in high-level programming languages such as C, Python, and Java, the programmer is not always aware of the processor architecture, and the same source code can run on the same family of processors. Regardless of this, it is crucial to have a deep knowledge of assembly languages in order to write a compiler program. In fact, when programmers know how the processor ingests the instructions, they can better manage and control the system and its resources. This results in program optimization and better code performance. Types of Basic Assembly language→ first level of abstraction. assembly This assembly corresponds exactly to the underlying architecture. The machine itself (The CPU, registers, etc.) are languages controlled by 0s and 1s. This is the so-called machine code. The very first level above the machine code is the basic assembly language: one instruction in machine code corresponds to one instruction of basic assembly. One instruction in basic assembly corresponds to one instruction in machine code. Hierarchy of abstraction: “machine code→ basic assembly”. 
Extended assembly language → second level of abstraction.
This assembly language does not correspond one-to-one to the underlying architecture: one instruction of the extended assembly language may correspond to several instructions in machine code. It provides more functionality than the basic assembly language. The architecture is not changed; you just have the "illusion" of having more complex instructions. In reality, the machine code still has only a limited number of instructions.
One of these extended assembly languages is the MIPS assembly language, developed for the MIPS architecture, which is one of the RISC architectures suitable for educational purposes. With extended assembly language for MIPS, you have the "illusion" of having a bigger set of instructions available, but this is just a result of abstraction. The underlying architecture is still RISC, and each extended assembly instruction must be translated into machine code. This machine code can (and, for the more complex instructions, will) contain more machine code instructions than you wrote in your assembly code.

RISC
Based on the complexity of the instructions, there are two types of instruction set architecture (computers):
Reduced instruction set computer (RISC): a minimal set of simple instructions.
Number of clock cycles per instruction (CPI) is small
Number of instructions per program is big
Many lines of assembly code are needed for a task.
An example of this is the Advanced RISC Machines (ARM) processor.
LOAD A,a → the processor loads the content of memory location a into register A
LOAD B,b → the processor loads the content of memory location b into register B
PROD A,B → PROD multiplies the two numbers to get the result
STORE a,A → stores the result of the multiplication, held in register A, in memory location a

CISC
Complex instruction set computer (CISC): a set of many instructions that aim to execute a task in as few lines of a program as possible (e.g., the Intel Pentium processor). The instruction set is bigger, and the instructions are more complex and can take more than one clock cycle.
Number of clock cycles per instruction is big
Number of instructions per program is small
MULT a,b → the MULT instruction operates directly on the machine's memory and does not need to load the data from memory into the registers first or store the results in another register.

At first glance, the RISC operation seems more time-consuming and less efficient because the processor must process four lines of program instead of one. However, the time required to perform this operation in a CISC architecture is almost the same: each line takes only one clock cycle in the RISC architecture, while the MULT instruction in the CISC architecture takes multiple clock cycles because it is a multi-cycle instruction. A RISC architecture also needs fewer transistors, meaning that there is more space on the CPU hardware for extra registers.

MIPS Instruction Set
A RISC instruction set. Microprocessors with a MIPS instruction set have been used widely for educational purposes due to their simplicity. MIPS processors are used in many embedded computers, such as internet routers, digital cameras, printers, gaming consoles, and Internet of Things (IoT) devices. MIPS is based on four design principles:
1. Simplicity favours regularity: the hardware can more easily handle instructions with a consistent number of operands.
#add values b and c, subtract value d, and store the result in variable a
sub s,c,d → s = c – d, where s is a temporary variable that stores the intermediate value
add a,b,s → a = b + s
In MIPS, operators take three arguments. All arithmetic operations in the MIPS architecture have the same structure because regularity facilitates implementation, and this simplicity enables higher performance at a lower cost:
OPERATION destination, source/s

2. Make the common case fast: MIPS is a RISC, i.e., it uses a minimal, simple, and commonly used set of instructions. Therefore, smaller and simpler hardware can decode the instruction and its operands.

The next two principles concern where the data is stored. An instruction operates on operands. However, computers can only operate on 0s and 1s, not on variable names. Furthermore, instructions need the physical address of the binary data, and operands can be stored in registers, in the main memory, or in the instruction itself (as constants).
The original MIPS architecture is a 32-bit architecture, i.e., it operates on 32-bit data (= 4 bytes), so in MIPS we work with 8 hex digits. The address range is 0x0000 0000 to 0xFFFF FFFF (2^32 = 4'294'967'296 addresses; because we start counting from 0, the last cell that can be addressed is 4'294'967'296 – 1 = 4'294'967'295).
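The arithmetic behind the 32-bit address range can be checked with a few lines of Python. The variable names are illustrative only and do not come from the notes.

```python
WORD_BITS = 32

address_count = 2 ** WORD_BITS      # number of distinct 32-bit addresses
last_address = address_count - 1    # counting starts at 0
hex_digits = WORD_BITS // 4         # each hex digit encodes 4 bits
word_bytes = WORD_BITS // 8         # each byte holds 8 bits

print(address_count)
print(f"0x{last_address:08X}")
print(hex_digits, word_bytes)
```

The last address printed in hexadecimal is the upper end of the range, and the final line shows that one 32-bit word takes 8 hex digits, or 4 bytes.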
Of this range, only the first 2^31 bytes are available for user data, while the rest (0x80000000 to 0xFFFFFFFF) is reserved for the operating system and ROM, so the programmer does not have access to those bytes. The RAM is strictly divided into segments, so you cannot use any memory address you want. The usable part is called the "User data Segment". For example, in I-type instructions the offset is 16 bits, BUT it is counted from the first address of the "User data Segment". Be careful how big the offset is, so that you don't trigger an exception. We can use the MIPS instructions to transfer (load or store) data between the memory and the registers.

MIPS also has another register for saving the address of the current instruction. This register is known as the program counter (PC) register and is separate from the 32 main registers → the PC holds the address of the next instruction and is driven by a clock. When the clock allows, that address is placed on the output of the PC, and the address of the following instruction is loaded into the PC. The collection of all registers (also known as the register file) together with the PC register is called the architectural state of the MIPS processor. The computer architecture is described by the instruction set and the architectural state.

3. Smaller is faster: for higher processing performance, MIPS uses a small number of small-sized registers, i.e., 32 32-bit registers. The variables in the MIPS assembly language should be chosen from these 32 registers. To store variables in MIPS we use 18 registers, written with a preceding $ sign: $16–$23 ($s0 - $s7) (saved registers, for variables) and $8–$15 and $24–$25 ($t0 - $t9) (temporary registers, for temporary variables).
#do a = b+c-d in MIPS using its registers
sub $8,$17,$18 → take the value inside reg. 17, subtract the value inside reg. 18, and put the result inside register 8. The instruction operates on the values held in the registers.
add $19,$20,$8 → $19 contains variable a and $20 contains variable b.

4. Good design demands good compromises: inside the processor, registers are a memory with extremely fast access; however, 32 registers are not enough to perform complex operations, which is why we use the main memory to store more data, at the cost of slower access. Memory has more data locations than the registers, and MIPS instructions access the memory locations using a byte-addressable approach, i.e., each byte has a unique address in memory. The main memory in a MIPS architecture is therefore made of 32-bit words, each containing four bytes, and each byte can be addressed individually.
We transfer things from registers to RAM and back from RAM to registers.
0x0000 8000 → register 8 will look like: 0000 0000 0000 0000 1000 0000 0000 0000
Word= a 32-bit-long value that can be stored in memory
MIPS also has lb (load a byte) and sb (store a byte) for byte-sized accesses to the main memory.

ALU
Another component of a processor is the arithmetic logic unit (ALU), which performs arithmetic and logic operations, such as the addition of two 32-bit integers. The integer used as an input for an operation is called an operand, and the ALU operands are stored in the registers. The result of the ALU operation is stored in another register. For example, the ALU adds the two operands in registers $9 and $10 and stores the result in register $11.

Machine Language
Assembly languages are a human-readable version of the code that hardware (processors) can understand, which is called machine language (containing only zeros and ones). A program written in assembly language must be translated into machine language. Encoding the instructions into machine language is part of the ISA.
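As an illustration of this translation step, the ALU example above (adding $9 and $10 into $11) can be encoded by hand. This is a minimal sketch assuming the standard MIPS32 R-type field layout (op: 6 bits, rs: 5, rt: 5, rd: 5, shamt: 5, funct: 6) and funct = 0x20 for add. The helper name encode_r_type is hypothetical.

```python
def encode_r_type(rs, rt, rd, funct, shamt=0, op=0):
    """Pack the six R-type fields into one 32-bit machine word."""
    fields = (("op", op, 6), ("rs", rs, 5), ("rt", rt, 5),
              ("rd", rd, 5), ("shamt", shamt, 5), ("funct", funct, 6))
    for name, value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name}={value} does not fit in {bits} bits")
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


FUNCT_ADD = 0x20  # funct code of add (assumed from the MIPS32 instruction set)

# add $11, $9, $10  ->  rd = $11, rs = $9, rt = $10
word = encode_r_type(rs=9, rt=10, rd=11, funct=FUNCT_ADD)
print(f"{word:032b}")
print(f"0x{word:08X}")
```

The assembler performs this packing for every line of assembly code, so the processor only ever sees the resulting 32-bit patterns.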
MIPS has three general types of machine language instruction formats (or encodings), based on types of mapping of memory: R-type, I-type, and J-type.

R-type
R-type instructions (or register-type instructions) need three registers as operands: two registers are used as sources and one as the destination. (Example: page 58)
The operation performed by an instruction is determined by two fields, op and funct, where op=0 for all R-type instructions. There are two source registers (rs and rt) and one destination register (rd). shamt indicates the amount to shift in shift instructions; it is 0 for all other R-type instructions.

I-type
I-type instructions (or immediate instructions) have two register operands, rs and rt, and one 16-bit immediate operand, imm. The operation is determined only by the op field. For example, addi has op=8, lw has op=35, and sw has op=43. (Page 59)

J-type
An example of a J-type instruction (or jump instruction) is j firstInstruction, where firstInstruction holds the address (addr) of the first instruction in the program.

Code
#A program to add 2 and 3 → comments are written with # and are ignored by the processor
.text → a directive, i.e., a statement that tells the assembler about the program but is not translated into machine language. For example, the .text directive tells the assembler that the following lines are the source code of the program.
main: → specifies a symbolic address, which is an identifier for a location in memory, such as the address of the first machine instruction (here, 0x00400000). The programmer refers to a memory location by name (not by a numeric address), and the assembler figures out the numerical address.
ori $8, $0, 0x2 → or immediate: bitwise OR of register $0 with an immediate value (a number); puts 0x2 into register $8
ori $9, $0, 0x3 → puts 0x3 into register $9
addu $10, $8, $9 → add unsigned: the values are treated as unsigned integers
We save the code with the .asm extension. This stands for an assembly source file (in programming languages, a source file contains the text of the program statements and is written by a programmer).
8($0) points to the address of the word: $0 is the base address and 8 is the offset. Therefore, the address in memory is $0+8=8. In the figure, each blue rectangle corresponds to one data item.
Addresses go in steps of 4 bytes because of the MIPS architecture: one instruction = 32 bits = 4 bytes, so one instruction is saved in 4 memory cells. If you try to access an address in the middle, you'll get an exception: you count 0, 4, 8, C, 10, 14, … You must move in steps of 4 bytes.
[Figures: "From assembly code to mapping" and "Opposite direction"]

Immediates (page 110)
Some machine instructions use 16 of their 32 bits to hold one of the two operands. These instructions are known as immediate instructions ("immediate" meaning a constant number), and the operands built into the instruction itself are called immediate operands. (Example: page 109)
ori d, s, const → bitwise OR of the content of register s and a 16-bit constant; the result is put in register d: s | const (example: page 110)
andi d, s, const → bitwise AND of register s and const: s & const
xori d, s, const → bitwise exclusive OR (XOR) of register s and const: s ⊕ const (examples: page 112)

Logic Instructions
Many MIPS assembly instructions use two registers as input operands and place the result in another register (or in one of the input registers).
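The immediate instructions ori, andi, and xori described above can be mimicked in Python, together with the I-type encoding of the first line of the sample program. This is a sketch under stated assumptions: the 16-bit constant is zero-extended for these three logical instructions, the opcode of ori is 0x0D (both taken from the MIPS32 instruction set, not from the notes), and all function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def zero_extend16(const):
    """Keep the low 16 bits; the upper 16 bits of the operand become 0."""
    return const & 0xFFFF


def ori(s, const):
    return (s | zero_extend16(const)) & MASK32


def andi(s, const):
    return (s & zero_extend16(const)) & MASK32


def xori(s, const):
    return (s ^ zero_extend16(const)) & MASK32


def encode_i_type(op, rs, rt, imm):
    """Pack op (6 bits), rs (5), rt (5), and imm (16) into a 32-bit word."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


OP_ORI = 0x0D  # opcode of ori (assumed from the MIPS32 instruction set)

reg8 = ori(0, 0x2)                      # ori $8, $0, 0x2 with $0 = 0
print(hex(reg8))
print(hex(andi(0xFFFF00FF, 0x0F0F)))
print(hex(xori(0x0000FFFF, 0x00FF)))
print(f"0x{encode_i_type(OP_ORI, rs=0, rt=8, imm=0x2):08X}")
```

Since register $0 holds 0, OR-ing it with a constant simply copies the constant into the destination register, which is how the sample program loads 2 and 3.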
or d, s, t → OR function between registers s and t: s | t
and d, s, t → AND function between registers s and t: s & t
xor d, s, t → XOR function between registers s and t: s ⊕ t
nor d, s, t → NOR function between registers s and t: s ↓ t
(OR, AND, XOR, and NOR are covered in section 1.2.)

Arithmetic Instructions
MIPS has several instructions to perform integer arithmetic operations. The size of an integer in MIPS is 32 bits, which means that arithmetic on integers that are slightly longer or slightly shorter is also performed using the standard 32-bit arithmetic instructions.

ADDITION
The first row is called the carry out (C-out); where the leftmost C-out is 1, we call it an addition with overflow. If you are working on an 8-bit architecture, this result means that you have an overflow. If you are working on a 32-bit architecture, this result means that you don't have an overflow.
In MIPS, we have two addition instructions (add and addu), which differ in how they treat overflow. The addu instruction loads the sum of the contents of registers s and t into register d and ignores the overflow.
Overflow trap: raised when you try to fit into 32 bits a number that needs more than 32 bits in binary:
Using addu will NOT produce an exception if your program generates an overflow;
Using add will produce an exception if your program generates an overflow.

addu
addu d, s, t → ADD Unsigned function between registers s and t: s + t
The values are treated as unsigned integers, so the range is bigger. addu will add the two numbers; there will be an overflow, but NO EXCEPTION!! (Example: p. 116)
ori $8, $8, 0x7 FFFF FFFF → put the constant into register $8
ori $9, $9, 0x1 → or immediate: bitwise OR with an immediate value (a number)
addu $10, $8, $9 → add unsigned: load the sum of the contents of registers 8 and 9 into register 10; the result will be 0 because the sum wraps around and no exception is raised.

add
add d, s, t → ADD loads the sum of the contents of registers s and t into register d and, when this creates an overflow, sends an interrupt control signal called the overflow trap.

subu
subu d, s, t → SUB unsigned: no overflow trap. d = s+(-t)

sub
sub d, s, t → SUB: overflow is trapped! d = s+(-t)

Signed numbers representations
When performing arithmetic calculations, we should use signed binary numbers. There are three different representations for these numbers.
1. Sign-magnitude representation: we use the most significant bit (MSB) as the sign of the binary number (1 for negative and 0 for positive). In this approach, we have 31 bits available to represent the magnitude of the binary number.
2. One's complement: here, the range 0 … 2^31 – 1 is used for non-negative numbers. Negative numbers are obtained by inverting (or reflecting) all bits. (Page 117: table; example: page 118)
3. Two's complement: the negative of a binary number is obtained by inverting all bits and adding one to the result. The range 0 … 2^31 – 1 is used for non-negative numbers.
   1. Write the number without the sign.
   2. Invert all bits: 0 → 1 and 1 → 0
      ○ In MIPS assembly language, the bits can be inverted using the nor instruction
   3. Add 1 to the LSB (least significant bit)
   4. Add leading 0s or 1s until we reach the 32-bit representation used in MIPS
      ○ Leading 0s if it's a positive number (MSB is 0)
      ○ Leading 1s if it's a negative number (MSB is 1)

Data Transfer Instructions
MIPS also has instructions for loading and storing data.
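The two's complement steps and the add/addu overflow behaviour from the arithmetic part above can be checked with a short Python sketch. The function names mirror the MIPS mnemonics but are only a model: a Python exception stands in for the overflow trap, and the 32-bit masking imitates the fixed register width.

```python
BITS = 32
MASK = (1 << BITS) - 1


def twos_complement(value):
    """Return the 32-bit two's complement pattern of a signed integer."""
    if value >= 0:
        return value & MASK              # non-negative: leading 0s
    magnitude = -value                   # step 1: the number without the sign
    inverted = ~magnitude & MASK         # step 2: invert all bits
    return (inverted + 1) & MASK         # step 3: add 1 to the LSB


def to_signed(pattern):
    """Interpret a 32-bit pattern as a signed two's complement integer."""
    return pattern - (1 << BITS) if pattern & (1 << (BITS - 1)) else pattern


def addu(s, t):
    """Model of addu: the sum wraps around to 32 bits, no overflow trap."""
    return (s + t) & MASK


def add(s, t):
    """Model of add: raise an error where the hardware would trap."""
    result = to_signed(s) + to_signed(t)
    if not -(1 << (BITS - 1)) <= result < (1 << (BITS - 1)):
        raise OverflowError("overflow trap")
    return result & MASK


print(f"0x{twos_complement(-3):08X}")
print(f"0x{addu(0xFFFFFFFF, 0x1):08X}")
```

With these definitions, addu(0xFFFFFFFF, 0x1) wraps around silently, while add(0x7FFFFFFF, 0x1) raises the error that models the trap.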
A load operation transfers data from the main memory into a register, while a store operation transfers data from a register into the main memory.
The lw instruction loads a word of data (4 bytes) from the main memory into a register. (Exercise: p. 120)
○ lw d, off(b) → command for reading a value from a specific address in RAM. It copies the word of data stored at address b+off in the main memory into register d.
The sw instruction copies a word of data from a register into the main memory.
○ sw t, off(b) → command for storing something at a specific address in RAM. It copies the word of data stored in register t into the main memory at address b+off.
off(b) = offset + base

Store Word - sw
1. Add the immediate value to the content of register A
2. The result is an address in RAM; go to that address
3. Store the content of register B at that address
li $8, 0x11223344 → put 0x11223344 into register $8; this is the value that will be stored
li $7, 0x4 → put 0x4 into register $7; this will be the base address
sw $8, 0x10030000($7) →
1. Adds the immediate value (0x10030000, the offset) to the content of reg. 7
2. The result from step 1 is an address in RAM; go to that address
3. Store the content of reg. 8 at the RAM address obtained in step 1
Even if you type li, ori is executed internally because it is easier for the hardware (basic assembly): extended assembly → basic assembly → machine code

Load Word - lw
1. Add the immediate value to the content of register A
2. The result is an address in RAM; go to that address
3. Read the content of that RAM address and store it in register B
li $12, 0x4 → put 0x4 into register $12; this will be the base address
lw $13, 0x10030000($12) → load the word at address 0x10030000($12) into register 13
1. Adds the immediate value (0x10030000, the offset) to the content of reg. 12
2. The result from step 1 is an address in RAM; go to that address
3. Read the content of that RAM address and put it inside reg. 13

sw + lw worked example (screenshot: page 123)
sw part:
R8 = the value 0x1122 3344 is saved
R7 = the value 0x4 is saved
1. 4+10030000 = 10030004 → the RAM address (which is held in Reg 1)
2. Go to 10030004
3. We store 11223344 (the value of register 8) at address 10030004
lw part:
R12 = the value 0x4 is saved
1. We do 4+1003 0000 = 1003 0004
2. Go to 1003 0004
3. We read the content saved at 1003 0004, which is 11223344

2.3 Microarchitecture
The specific arrangement of components such as registers, ALU, and memories that implements the instruction set architecture is called the microarchitecture (the hardware circuits that implement a particular ISA). The ISA describes the computer design from the perspective of the programmer in terms of basic operations, and is unconcerned with how those operations are implemented. This means that two processors can run the same programs even though their internal structure is different.
The challenge of microarchitecture is to design a processor with minimised execution time while taking other factors into account, such as technology and cost. Some non-processor factors also affect the performance of the system, such as the hard disk, memory, or network connection.
The microarchitecture is the next level of abstraction below the ISA. The level of abstraction increases as we move from the bottom layers to the top layers.
Abstraction= in computer science, presenting information while hiding the unnecessary details is called abstraction → if a task is difficult, you divide it into simpler sub-tasks. One way to do this is abstraction: you split the task into different levels of complexity, the levels of abstraction.

Benchmark programs
Because a particular architecture (such as MIPS) can have many different implementations (microarchitectures) with different characteristics, one of the most reliable ways to measure the performance of a processor is by using benchmark programs.
This is a collection of programs, similar to what you are going to run, with commonly published execution times for a specific processor.
Benchmark= 1 program
Look at:
Number of instructions: depends mostly on the architecture and the programmer's skill. Executing known MIPS instructions means that the number of instructions in each program is known and constant.
Tc: the parameter "seconds per cycle" is called the clock period Tc. It is the time needed to synchronise the circuits in the processor, i.e., the time between two rising edges of the rectangular pulse (the inverse of the clock frequency). For example, a 1 GHz processor has a clock period of 1 nanosecond (ns), and a 4 GHz processor has a clock period of 0.25 ns.
Tc = 1/f → Tc is measured in seconds
Clock frequency: depends on the circuit technology. It states how many times per second the rectangular pulse changes between 0 and 1.
f = 1/Tc → f is measured in Hz
For instance, a signal does not propagate through a gate instantly; the gate has a propagation delay, which depends on the number of inputs and outputs. The clock period is set to the worst-case total propagation time through the gates that generate a signal needed in the subsequent cycle. The worst-case total propagation time occurs on one or more signal paths within the chip. These paths are called critical paths.

CPI
The number of processor clock cycles needed to execute an instruction is called the number of cycles per instruction (CPI), and it is affected by the complexity of the instructions. It is calculated as a weighted sum over all types of instructions:
\[ \text{CPI} = \sum_{i} \frac{\text{instructions of type } i}{\text{total number of instructions}} \times (\text{cycles needed to execute one instruction of type } i) \]
Number of instructions: how many instructions does your program have?
Cycles/instruction: how many CPU cycles does it take to execute one instruction (1 CPU clock cycle is from one rising edge to the next rising edge)?
Seconds/cycle: how many seconds does one cycle take (from one rising edge to the next)? This is the period Tc of the CPU clock.

Execution time (page 64 notes)
The execution time of a program (a set of instructions) in seconds is given by
\[ \text{execution time (seconds)} = (\text{number of instructions}) \times \frac{\text{cycles}}{\text{instruction}} \times \frac{\text{seconds}}{\text{cycle}} = \text{number of instructions} \times \text{CPI} \times T_c \]

2.4 System Design
How to design a MIPS processor given the MIPS ISA (which describes the MIPS processor from the programmer's perspective in terms of registers, instructions, and memory). The MIPS instructions considered are:
R-type arithmetic instruction: add
I-type instructions: lw, sw
Single-cycle MIPS CPU: in a single-cycle CPU, each instruction is completed within exactly one CPU clock cycle.
A microarchitecture can be divided into two separate parts:
datapath → operates on words of data and contains structures such as memories, registers, ALUs, and multiplexers
control unit → receives the current instruction from the datapath and instructs the datapath on how to execute it
State elements: the 32 registers, the program counter (PC) register, and the memory of a MIPS architecture.

Multiplexer
A device that chooses between several inputs and forwards the selected input to an output port is called a multiplexer. The address control determines which input is forwarded to the output. Ex., if the address control is:
00 → the input at address 0 is propagated
01 → the input at address 1 is propagated
10 → the input at address 2 is propagated
11 → the input at address 3 is propagated
How do you determine how many bits you need to address a certain address space? Use powers of 2 and count the combinations of all inputs/variables. Ex. with 3 address bits, 2^3 = 8 → 3 bits can select among 8 possible addresses.

PC
The PC register contains the memory address of the next instruction that should be executed.
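The performance relations given earlier in this section (Tc = 1/f, the weighted CPI, and execution time = number of instructions × CPI × Tc) can be combined in a small Python sketch. The instruction mix below is hypothetical: the counts and cycle costs are made up for illustration and do not describe any real benchmark or processor.

```python
def clock_period(frequency_hz):
    """Tc = 1 / f, in seconds."""
    return 1.0 / frequency_hz


def average_cpi(instruction_mix):
    """Weighted CPI for a mix given as {type: (count, cycles_per_instruction)}."""
    total = sum(count for count, _ in instruction_mix.values())
    return sum(count / total * cycles for count, cycles in instruction_mix.values())


def execution_time(instruction_count, cpi, tc):
    """Execution time (s) = number of instructions * CPI * Tc."""
    return instruction_count * cpi * tc


# Hypothetical program: 1000 instructions of three types.
mix = {"arithmetic": (600, 1), "load/store": (300, 2), "branch": (100, 3)}
total_instructions = sum(count for count, _ in mix.values())
cpi = average_cpi(mix)
tc = clock_period(1e9)              # 1 GHz clock
print(cpi)
print(execution_time(total_instructions, cpi, tc))
```

For this mix the CPI comes out at about 1.5, so at 1 GHz the 1000 instructions take roughly 1.5 microseconds. Raising the clock frequency or lowering the CPI shortens the execution time in direct proportion.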
It is a 32-bit register whose output PC is a 32-bit value pointing to the current instruction and whose input PC' holds the address of the next instruction. CLK denotes the CPU clock. Once you take the CPU clock (CLK) into account, it becomes clear that the PC register holds the address of the NEXT instruction to be executed.

Instruction Memory
The instruction memory has a 32-bit read input port and a 32-bit read output port. It takes the memory address on input A, goes to that memory cell, and places the 32-bit value stored there (the instruction) on the read output port (RD).

Register File
The register file (the collection of 32 registers, each with 32 bits) has two read ports and one write port. A1, A2, and A3 are 5-bit ports because this is enough to point to any of the 32 registers (2^5 = 32). RD1, RD2, and WD3 are 32 bits wide because MIPS is a 32-bit architecture. The clock and WE3 are controlled by the control unit.
When the write enable WE3=0: the read ports take the addresses A1 and A2 of two registers in the register file. The 32-bit values of the addressed registers then appear on the read ports RD1 (the data read from the register pointed to by A1) and RD2 (the data read from the register pointed to by A2).
When the write enable WE3=1: A3 specifies the address of a register, and the 32-bit value on WD3 is written into the register specified by A3.

Data Memory
The data memory has one read port and one write port. If the write enable WE=0, the data at the memory address indicated by A is placed on RD; if the write enable WE=1, the data on WD is written to the memory address indicated by A.

Simple Single-Cycle MIPS Processor (step by step, how the different parts interact)
What happens when the processor receives the code lw $13, 0x10030000($12)? Design a processor that can execute an lw instruction.
1. Fetch the instruction from memory: the PC register contains the address (PC) of the instruction (Instr) that should be fetched by the instruction memory and put on the RD port. The instruction memory goes to the memory address specified by A, gets the instruction, and outputs it.
2. Decoding: the processor acts depending on the type of Instr. We consider an lw instruction, which must read the source register that holds the base address. This register is specified in the rs field of Instr, i.e., Instr25:21. It is important to remember the form of an I-type instruction (like lw). The binary number now needs to be split into its fields.
The binary value of op goes to the CU. Based on this value, the CU issues the control signals:
- For the register file: WE3
- For the data memory: WE
In 0x10030000, we don't deal with the first 4 hex digits but only with the last 4. So the binary conversion is done on 0000 and not on the entire number.
The rs field is connected to one of the read address ports of the register file, A1, and the register value is read onto RD1.
1. So, in assembly language you write lw $13, 0x10030000($12)
2. The assembler takes this code and converts it to: 1000 1101 1000 1101 0000 0000 0000 0000 (op = 100011, rs = 01100, rt = 01101, imm = 0000 0000 0000 0000)
Input A1 is based on the rs field of the lw instruction: 5 bits that determine the register. The register content is placed on RD1.
Complementary binary: the offset is 16 bits long, but MIPS is 32-bit, as is the ALU. Therefore, we must extend the offset so that the ALU can execute this addition. The result goes to ALU SrcB (32 bits).
As the 16-bit immediate offset imm might be positive or negative, it should be sign-extended. This means that extra bits (also called sign bits) are added to the beginning of the binary number (i.e., the left side).
To build the sign-extended version, we look at the most significant bit of the immediate, i.e., bit 15 of the instruction:
If bit 15 is 0, we fill the leading bits with zeros, which indicates a positive binary number;
If bit 15 is 1, we fill the leading bits with ones, which indicates a negative binary number.
For instance, 3 (decimal) = 11 (binary); with a sign bit, +3 = 011, and in two's complement -3 = 101. Extended to 8 bits, these become 0000 0011 and 1111 1101.
Here, we make a 32-bit sign-extended version of the offset imm, called SignImm. In this case, SignImm15:0 = Instr15:0 and every bit of SignImm31:16 equals Instr15. The result goes to SrcB.
3. Load ALU SrcA with RD1 and load SrcB with the result from SignExtend. The processor should add the base address (RD1, on SrcA) and the offset (on SrcB) to find the address of the target data in memory, which should be read. For this purpose, we need an ALU.
ALU execution:
From the control unit, we get the type of operation: "010" means "addition", which is the code used by the lw instruction to add the offset to the base address
Execute the "addition" of SrcA and SrcB
The output of the ALU is a 32-bit ALUResult, which represents an address in the data memory, and a zero flag, which indicates whether ALUResult==0.
4. Data is read from the data memory onto the ReadData port. This data is then written into the destination register specified by the rt field of the lw instruction, Instr20:16. This field is connected to the A3 port of the register file. The ReadData port is connected to the WD3 port of the register file. The RegWrite control signal on the WE3 port of the register file permits writing into the register (RegWrite=1). The write happens at the end of the cycle.
ALUResult (the sum of SrcA and SrcB) goes to input A of the data memory → the data memory retrieves the content of that memory cell and places it on output RD → the output from RD goes back to WD3 of the register file → input A3 of the register file is connected to the rt field (5 bits) of the lw instruction → the CU issues WE3=1 (enable writing).
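The lw steps described so far (decoding the fields, sign extension, the ALU addition, the memory read, and the write-back) can be traced in a short Python model. This is a sketch under assumptions: the register file is a plain list, the data memory is a dictionary keyed by word address, and the example uses a base of 0x10030000 in $12 with an offset of 4 so that the offset fits in the 16-bit immediate field. The function names are hypothetical.

```python
def sign_extend16(imm):
    """Extend a 16-bit immediate to 32 bits by copying bit 15 into bits 31:16."""
    imm &= 0xFFFF
    return imm | 0xFFFF0000 if imm & 0x8000 else imm


def execute_lw(instr, registers, data_memory):
    """Trace one lw through the datapath and return ALUResult."""
    rs = (instr >> 21) & 0x1F                     # Instr25:21 -> A1
    rt = (instr >> 16) & 0x1F                     # Instr20:16 -> A3
    imm = instr & 0xFFFF                          # Instr15:0
    src_a = registers[rs]                         # RD1, the base address
    src_b = sign_extend16(imm)                    # SignImm, the offset
    alu_result = (src_a + src_b) & 0xFFFFFFFF     # ALU control "010": addition
    read_data = data_memory[alu_result]           # RD of the data memory
    registers[rt] = read_data                     # WD3 is written because WE3=1
    return alu_result


registers = [0] * 32
registers[12] = 0x10030000                        # base address in $12
data_memory = {0x10030004: 0x11223344}            # one word stored in RAM

# lw $13, 4($12): op=35, rs=12, rt=13, imm=4
instr = (35 << 26) | (12 << 21) | (13 << 16) | 4
address = execute_lw(instr, registers, data_memory)
print(f"0x{address:08X}", f"0x{registers[13]:08X}")
```

The 32-bit mask on the ALU result lets a negative offset such as 0xFFFC (-4) wrap around correctly, just as it does in the 32-bit adder.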
Values in WD3 are written into the register determined by A3. (Example: page 69.)

5. During the execution of the lw instruction, the processor should compute the address of the next instruction, PC’. As the instructions are 32 bits, or 4 bytes, long, PC’ = PC+4. This calculation is done by an adder. The new address is written into the program counter at the beginning of the next cycle, which means that one cycle of the lw instruction datapath has now been completed.

Unit 3- Computers Hardware
3.1 Personal Computers
Personal computer (PC)= a digital computer that is used by a single person at a time. A conventional PC consists of:
- a central processing unit (CPU): This includes the computer’s ALU, control unit, and a couple of registers, all assembled into an integrated circuit (IC).
- two types of computer memories: main memory, like the random access memory (RAM), and at least one auxiliary memory, such as magnetic hard disks. A PC also includes other complementary types of memories, such as optical discs (CD-ROMs and DVD-ROMs), a read-only memory (ROM), USB sticks, and memory cards.
- multiple input/output (I/O) devices: monitors, keyboards, mice, and printers.

Before the invention of personal computers, the smallest computing machines available on the market were minicomputers. Developed in the 1960s, minicomputers, or “minis”, were the size of a refrigerator and were based on two types of technology: transistors and magnetic-core memory. They were BIG.

Transistors: Originally, the official name for these devices was transistorized computers, while the previous generation was called vacuum tube computers. The transistors used in the minicomputers were known as discrete transistors. Unlike modern computer processors, where millions of transistors are integrated into a single chip, a discrete transistor is a single component in its own semiconductor package. (Pages 73-74)
Magnetic Core Memory: Magnetic core memory was the predominant type of RAM from the 1960s to the 1980s. Core memory was built from rings of hard magnetic material, which formed the transformer core. Then, three to four electrical wires went through each core to form a transformer winding. Magnetic hysteresis enabled each core to remember, or store, information. (Page 75)

The advancement in semiconductor and transistor technologies facilitated the production of PCs. Before the 1970s, CPUs were massive, expensive, and made of discrete transistors. However, the invention of the metal-oxide-semiconductor field-effect transistor (MOSFET) facilitated the production of microprocessors (small-sized CPUs) on a single integrated circuit (IC). In 1971, Intel built the Intel 4004, the first single-chip microprocessor.

The first commercial, preassembled personal computers appeared in 1977. Manufactured by three different companies, these PCs were the Apple II, the PET 2001, and the TRS-80; they were also known as the “1977 trinity”.

In 1976, Steve Wozniak and Steve Jobs established Apple Computer and presold Apple II machines, which were based on their single-board computer named Apple I. The Apple II was an 8-bit computer (data units are 8 bits wide) with colour graphics, a keyboard, and external slots, all mounted in a plastic case. The storage medium of the Apple II was an audio cassette interface for loading programs and storing data. (Page 78)

The Commodore PET 2001, developed by Chuck Peddle, was an 8-bit single-board computer with a monitor, a keyboard, and a cassette deck as the storage medium. (Page 79)

The TRS-80, developed by the Tandy Corporation, was an 8-bit computer where the motherboard (i.e., the mainboard including the memory, CPU, and all other elements) was integrated into a single unit, while the monitor and power supply were in separate units. (Page 80)

In 1981, IBM introduced the IBM PC, an open, card-based architecture that enabled third-party companies to develop extra products for it.
It utilized an Intel 8088 microprocessor. The IBM PC shipped with a floppy disk drive. Microsoft provided the computer’s operating system, the Microsoft disk operating system (MS-DOS). The design of the IBM PC became one of the most popular computer design standards in the world. In fact, most contemporary personal computers are descendants of the IBM PC. (Page 81)

In 1983, Apple introduced Lisa, one of the first commercial personal computers with a graphical user interface (GUI). A GUI is a collection of visual interactive elements that enables the user to start and stop programs, select commands, and perform other routine tasks using a mouse. Despite its high price, Lisa had very low performance, which ultimately led to its commercial failure. In 1984, Apple introduced the Macintosh, which became the first commercially successful computer with a GUI.

Single-board computer: a computer where all necessary functional components, such as the CPU and memory, are mounted on a single board.

3.2 Mainframes
A mainframe is a high-performance computer system equipped with a very large memory that is designed to perform billions of simple tasks in real time (real time means that the result of a computation is delivered within a specified period of time). Mainframes, also known as big irons, are larger than a normal PC and are mounted in a big cabinet. Mainframes are mostly used by organizations like banks, insurance companies, and financial institutions for bulk data processing, e.g., processing monetary transactions at the end of each working day. The most important characteristics of mainframes are described using a term introduced by IBM to describe their mainframe computers: RAS, which stands for:
reliability: A reliable system produces the correct output within the expected time.
availability: the ratio of the real operational time to the expected operational time, generally expressed as a percentage → it rarely breaks.
serviceability: the simplicity and speed with which system issues can be resolved, based on the idea that better serviceability results in higher availability → if something breaks, it can be quickly repaired.

The first mainframe was developed by IBM in 1964 and was called the IBM System/360. The IBM System/360 could perform numerical and scientific calculations as well as massive input/output commercial computations. It was a series of upwardly compatible machines that could be upgraded to a more powerful version without rewriting their programs (the new model of mainframe can still run your old programs: you need to pay for the software only once). This model of mainframe used special computers that managed the input/output tasks, so that the CPU could apply its power to the application. The first models of the IBM System/360 had only 32K of memory. The operating system of the initial System/360 was OS/360.

The IBM System/360 was replaced by the System/370. The last model of this series was the System/390, released in 1990, which was used until 1998, when it evolved into the modern mainframe product line of IBM, known as zSeries. The zSeries is based on the z/Architecture, which is IBM’s 64-bit complex instruction set computer (CISC) architecture. The latest z/Architecture mainframe of IBM, released in 2019, is called z15.

3.3 Servers
A server is a combination of hardware and software that is used to serve, i.e., to provide required data or resources to other computers, known as clients. Together, they form the client-server system, in which the server works as a hub that connects different clients. A server can support multiple clients, and a client can connect to multiple servers simultaneously. The connection between server and clients can be realized through a local area network (LAN) or a wide area network (WAN), such as the internet.
One of the most common implementations of the client-server model is based on the request-response model. The client requests a service from the server, and the server sends a response with the result. There are different types of servers, which are divided into categories based on the services they provide:
- Application servers provide an environment in which applications can run, so that the client can use them without installing them. In this case, the client could be a web browser that uses an online calculator application installed on an application server.
- Computing servers provide computing resources (e.g., CPUs or memory) to the clients.
- Database servers host data and provide them to the authorized clients. Usually, a database management system (DBMS) like MySQL or MongoDB should be installed on this type of server.
- File servers host files and folders and allow authorized clients to access them. They are normally used within an organization over a local area network (LAN).
- Web servers host the content of websites (including images and text) and respond to the Hypertext Transfer Protocol (HTTP) requests of the client. Examples of web server software are Apache HTTP Server and nginx; web browsers like Google Chrome or Firefox are the clients that send these requests.
- Mail servers, such as Microsoft Exchange Server, send and receive emails on behalf of clients (mail programs such as Microsoft Outlook). To this end, mail servers use standard email protocols, such as the simple mail transfer protocol (SMTP) for sending messages, and the internet message access protocol (IMAP) or the messaging application programming interface (MAPI) for receiving messages.

In principle, even a simple PC can be a server for other computers (clients) if its OS is set up accordingly; however, its limited resources, such as its CPU power and memory, reduce its serving performance and functionality. Server hardware also has to meet stricter requirements: for example, the hard disk of a server should be resistant to wear, vibration, environmental changes, and other factors for many years.
A server normally has multiple hard drives to avoid data loss and to achieve higher availability. To this end, servers use the redundant array of inexpensive disks (RAID) data storage virtualization technology, where a server distributes all data, together with redundant information, among the provided hard disks. In the case of a drive failure, the server rebuilds the inaccessible data of the failed drive on a new drive. A server’s motherboard should also support multiple CPU configurations, a vast memory, and the networking requirements. Servers have redundant power supplies for the utmost availability in the case of a power failure.

Servers vs Mainframes: see page 87.

3.4 Supercomputers
Supercomputer= highest possible available computing power. This is an evolving technology. A few decades ago, what we now consider a normal laptop would have been a supercomputer in its own right. The concept of the supercomputer was introduced by Seymour Cray. The first supercomputer, developed at Control Data Corporation (CDC), was called the CDC 6600. In those days, conventional computers used a single CPU; this had to be intricate to be able to carry out complex instructions, so it had more components and wiring and was physically larger. More wiring in CPUs resulted in more signaling delays and lower performance. To address this issue and make better use of otherwise idle memory, Cray decided to increase the number of processors. He also replaced the complex processors with simpler ones to reduce the signaling delay. In the CDC 6600, ten such simple peripheral processors had access to a shared main memory.

In 1975, Cray and his team introduced the Cray-1, a successful supercomputer that used multiple processing units in parallel. This technique, known as parallelism, performs several computations simultaneously and is still the basis of today’s supercomputers. Parallel processing splits a task into several subtasks that do not depend on each other and distributes them across multiple CPUs.
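As a rough illustration of the idea of splitting one task into independent subtasks, the following Python sketch divides a list into chunks, computes a partial result for each chunk separately, and then combines the partial results. The function names and the chosen task (a sum of squares) are assumptions made for this example. A thread pool from the standard library is used only to keep the sketch short; in CPython, threads do not actually run such calculations on several CPUs at once, so a real parallel program would use separate processes or dedicated hardware.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares(chunk):
    """Subtask: works on its own chunk and needs no data from other chunks."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(values, workers=4):
    """Split the task into independent chunks and combine the partial sums."""
    # Ceiling division so that at most `workers` chunks are created.
    chunk_size = max(1, -(-len(values) // workers))
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum_of_squares, chunks))
    return sum(partial_sums)


total = parallel_sum_of_squares(list(range(10)))
print(total)
```

The decomposition works here because no chunk needs the result of another chunk; tasks whose steps depend on each other cannot be split this simply.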
The performance of a supercomputer is measured in a unit called floating-point operations per second, or FLOPS, which is the number of floating-point computations that a processor can perform per second. A floating-point number is a real number with a fractional part, such as 3.14, and it is very important for scientific calculations. To represent these numbers in computer science, we use scientific notation: 0.0034 = 34.0 · 10^-4 = 3.4 · 10^-3. The number’s radix point (the dot character that indicates the fraction) can float. The numbers above can also be written as real number = significand · base^exponent. We also have integer computations, which are instructions that move data from memory into a register or compare two bits. Generally, floating-point computations are much more complex than integer computations.

3.5 Mobile Systems
Mobile computer systems are designed for mobility; to this end, all required resources for the computer system, such as a battery or a wireless network connection, are integrated, e.g., laptops, tablet computers, or wearable compu