
LU6: EXTERNAL MEMORY

TYPES OF EXTERNAL MEMORY

1. Magnetic disk - RAID
- fundamental principle: multiple hard disk drives in an array behave, in most respects, like a single large, fast drive
- configuration depends on the needs of the application
- allows the storage subsystem to exceed the capacity, data security, and performance of the drives that make up the system (can recover data)
- CONS: cost and complexity

RAID BENEFITS
- higher data security: redundancy provides protection for the data stored on the array
  ○ the array can withstand complete failure of one hard disk without any data loss
  ○ so if a failure occurs, the company still has its data
- fault tolerance: redundancy gives a more reliable overall storage subsystem than can be achieved by a single disk
- improved availability: availability means access to data
  ○ fault tolerance
  ○ special features allow recovery from hardware faults without disruption
- improved performance: allows the controller to exploit the capabilities of multiple hard disks
  ○ avoids the performance-limiting mechanical issues of individual hard disks

SUMMARY
- Redundant Array of Independent Disks
- 7 levels (0-6); not a hierarchy (no need to implement every level; any level can be chosen)
- a set of physical disks seen as a single logical drive by the O/S
- data distributed across the physical drives
- redundant capacity stores parity information
  ○ error correction
  ○ used to recover data

RAID CONCEPTS

mirroring (a redundancy technique)
- all data in the system is written simultaneously to two hard disks instead of one: the "mirror" concept
- principle
  ○ 100% data redundancy: full protection against failure of either disk (each contains duplicated data)
- mirroring setup
  ○ requires an even number of drives
  ○ used in RAID 1
- ADVANTAGES
  ○ complete redundancy of data
  ○ fast recovery from a disk failure, since all data is already on the second drive
  ○ good read performance (but it hurts write performance)
- DISADVANTAGES
  ○ RAID 1 is expensive!
  ○ data duplication: half the space is wasted
  ○ you must buy twice the capacity you want

striping
- the main performance-limiting issue with disk storage?
  ○ the slow mechanical components used for positioning and transferring data
  ○ access time = seek time + latency (both high, so slow)
- how does a RAID array address this?
  ○ it has many drives, so performance improves by using the hardware in all of them in parallel
  ○ e.g. disk 1 -> file 1, disk 2 -> file 2
- striping technique: files are "chopped up into pieces" spread across the various drives (picture each file's pieces in a different colour)
- can be done at the byte or block level
  ○ a byte is the smaller unit; a block is larger
  ○ byte level: the file is broken into byte-sized pieces; the first byte goes to the first drive, the second byte to the second drive, and so on
  ○ block level: each file is split into blocks of a certain size, distributed to the various drives
    - the size of the blocks = the stripe size; it is user-configurable (any size)
    - it determines how many pieces files are split into
- use of striping in the RAID levels
  ○ RAID 0 = block-level striping, NO parity
  ○ RAID 3 & 7 = byte-level striping WITH parity
  ○ RAID 4, 5, 6 = block-level striping WITH parity
- striping with and without parity?
  ○ striping by itself (no parity) = no redundancy, so no data protection!

parity
- mirroring (the data redundancy technique used in RAID 1 and some other levels) gives data protection on a RAID array, but:

LIMITATIONS OF MIRRORING
- high cost: 50% of the drives in the array are reserved for duplicated data
- does not improve performance as much as data striping
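The round-robin, block-level striping described above can be sketched in a few lines. This is an illustrative model only (list-of-lists standing in for disks; names invented), not a real RAID implementation:

```python
# Sketch of block-level striping (as in RAID 0): file blocks are dealt
# round-robin across the drives in the array. Purely illustrative.

def stripe(blocks, num_disks):
    """Distribute data blocks across disks round-robin."""
    disks = [[] for _ in range(num_disks)]
    for i, block in enumerate(blocks):
        disks[i % num_disks].append(block)
    return disks

# A "file" chopped into 6 blocks, striped across 3 disks:
layout = stripe(["B0", "B1", "B2", "B3", "B4", "B5"], 3)
print(layout)  # [['B0', 'B3'], ['B1', 'B4'], ['B2', 'B5']]
```

Because consecutive blocks land on different disks, a multi-block read can be serviced by several drives seeking in parallel, which is exactly the speed benefit the notes describe.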
PARITY
- solves mirroring's limitations: use parity information, i.e. redundancy information calculated from the actual data values
- principle of parity
  ○ take "N" pieces of data, compute one extra piece of data, and store the "N+1" pieces on "N+1" drives
  ○ if any one of the "N+1" pieces is lost, it can be recreated from the "N" that remain
- used with striping: the "N" pieces of data are the blocks/bytes distributed across the drives in the array
- parity info
  ○ can be stored on a separate, dedicated drive, or
  ○ mixed with the data across all the drives
- RAID 3 to 7 use parity; RAID 5 is the most popular

RAID LEVELS

RAID 0
- technique: striping (without parity)
- description
  ○ no redundancy; the benefit is the best overall performance for the cost
  ○ data striped across all disks: files -> stripes of user-defined stripe size -> each disk in the array
  ○ round-robin striping
  ○ increases speed: multiple data requests are probably not on the same disk, so the disks seek in parallel, and a single set of data is likely to be striped across multiple disks

RAID 1
- technique: mirroring/duplexing
- description
  ○ mirroring (mirrored disks): a drive has its data duplicated on two different drives (via a hardware RAID controller, or in software via the OS)
  ○ benefit: if either drive fails, the other continues to function as a single drive until the failed drive is replaced
  ○ data striped across disks; 2 copies of each stripe on separate disks
  ○ read from either, write to both
  ○ recovery is simple: swap the faulty disk and re-mirror, with no down time
  ○ EXPENSIVE (cons)

RAID 2
- technique: bit-level striping with Hamming-code ECC
- description
  ○ data is split at the bit level and spread over a number of data disks and redundancy disks
  ○ the redundant bits are calculated using Hamming codes (ECC), computed across corresponding bits on the disks
  ○ when data is written to the array, the codes are calculated and written alongside the data to dedicated ECC disks
  ○ when data is read, the ECC codes are read as well, to confirm that no error has occurred since the data was written; a single-bit error can be corrected "on the fly" (without re-retrieving the original file)
  ○ disks are synchronized
  ○ very small stripes (byte/word)
  ○ multiple parity disks store the Hamming-code error correction in corresponding positions
  ○ lots of redundancy (CONS), but the highest level of protection (PROS)
  ○ EXPENSIVE; NOT USED in practice

RAID 3
- technique: byte-level striping with dedicated parity
- description
  ○ data striped across multiple disks at the byte level (an exact number of bytes sent in each stripe)
  ○ parity info goes to a dedicated parity disk
  ○ failure of any one disk can be tolerated
  ○ similar to RAID 2, but with only one redundant disk no matter how large the array: a simple parity bit for each set of corresponding bits
  ○ data on a failed drive is reconstructed from the surviving data and the parity info
  ○ very high transfer rates

RAID 4
- technique: block-level striping with dedicated parity
- description
  ○ improves performance by striping data across many disks in blocks, and provides fault tolerance through a dedicated parity disk
  ○ like RAID 3, except it uses BLOCKS for striping; like RAID 5, but with dedicated instead of distributed parity
  ○ moving from bytes to blocks improves random access performance compared to RAID 3
  ○ the dedicated parity disk remains a bottleneck for random write performance
  ○ each disk operates independently; good for a high I/O request rate
  ○ large stripes
  ○ bit-by-bit parity is calculated across the stripes on each disk and stored on the parity disk

RAID 5
- technique: block-level striping with distributed parity
- description
  ○ stripes both data and parity information across three or more drives
  ○ similar to RAID 4, except dedicated parity is replaced by a distributed-parity algorithm for writing the data and parity blocks
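The parity principle used by RAID 3-6 ("N" pieces of data plus one computed piece) boils down to a bitwise XOR. A minimal sketch, not a real array controller; the block values are made up:

```python
# Sketch of XOR parity: from N data blocks compute one parity block;
# any single lost block can be rebuilt by XOR-ing the survivors.

from functools import reduce

def parity(blocks):
    """Bitwise XOR of equal-length byte blocks."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

data = [b"\x0f\x0f", b"\xf0\xf0", b"\x55\x55"]   # N = 3 data blocks
p = parity(data)                                  # the "N+1"th piece

# Drive holding data[1] fails: rebuild its block from the rest plus parity.
rebuilt = parity([data[0], data[2], p])
assert rebuilt == data[1]
print("recovered:", rebuilt.hex())  # recovered: f0f0
```

This is why a RAID 3-6 array survives exactly one failed drive (or two for RAID 6, which keeps two independent parity calculations).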
BENEFITS OF DISTRIBUTED PARITY
- removes the "bottleneck" that a dedicated parity drive represents
- fault tolerance is maintained by ensuring that the parity info for a block is placed on a drive separate from those used to store the data itself
- parity is striped across all disks, with round-robin allocation of the parity strip
- used in network servers

RAID 6
- technique: block-level striping with dual distributed parity
- description
  ○ stripes blocks of data and parity across an array of drives (like RAID 5), except that it calculates two sets of parity information (two parity calculations)
  ○ the goal of this duplication is to improve fault tolerance: RAID 6 can handle the failure of any two drives in the array, while the other single RAID levels can handle at most one fault (only one drive may fail)
  ○ the two parity sets are stored in separate blocks on different disks
  ○ a user requirement of N disks needs N+2 disks
  ○ high data availability: three disks must fail before data is lost
  ○ significant write penalty

REMOVABLE DISK
- speed
  ○ seek time: moving the head to the correct track (time to move the head to the correct position)
  ○ rotational latency: waiting for the data to rotate under the head (time taken for the platter to move into position)
  ○ access time = seek time + latency
  ○ transfer rate: how much data is transferred per second
- PROS
  ○ distributing commercial software
  ○ making back-up copies of important info
  ○ transporting data between computers
  ○ storing software and info that need not be accessed constantly
  ○ copying info to give to someone else
  ○ securing info that no one else should access
- ZIP disks
  ○ use a magnetic coating of higher quality
  ○ read/write head significantly smaller than a floppy disk's
  ○ the smaller head, in conjunction with a head-positioning mechanism similar to a hard disk's, can pack thousands of tracks per inch on the disk surface (USEFUL)
- cartridges (e.g. JAZ)
  ○ magnetic technology as well, with a self-contained case
  ○ essentially a hard disk:
  ○ several platters in a hard, plastic case
  ○ the cartridge has no heads or motor for spinning the disk; the heads and motor are in the drive unit
- portable drives
  ○ completely external (USB technology)
  ○ drive mechanism and media in one sealed case
  ○ the drive connects to the PC via a USB cable
  ○ once the driver software is installed, an available drive appears, so data can simply be read from it

2. Optical - CD-ROM
- STORAGE
  ○ originally for audio
  ○ polycarbonate coated with a highly reflective coat, usually aluminium
  ○ data stored as pits, read by a reflected laser
  ○ constant packing density; constant linear velocity
- SPEED
  ○ audio is single speed; other speeds are quoted as multiples of it, the maximum the drive can achieve
- RANDOM ACCESS
  ○ difficult: move the head to a rough position, set the correct speed, read the address, then adjust to the required location
- ADVANTAGES: large capacity, easy to mass produce, removable, robust
- DISADVANTAGES: expensive for small runs, slow, read only
- CD-Recordable (CD-R): write once, read many; now affordable; compatible with CD-ROM drives
- CD-R/W: erasable; getting cheaper; uses phase-change technology
- DVD
  ○ larger data capacity; multi-layer technology gives very high capacity (a full-length movie on a single disk)
  ○ finally standardized; movies carry regional coding (players only play films for the correct region, though this can be fixed)
  ○ writable, but with lots of trouble over standards

3. Magnetic Tape

WINCHESTER HARD DISK VS FLOPPY DISK
- Winchester hard disk
  ○ also called a hard disk; a sealed unit
  ○ one or more platters (disks)
  ○ heads fly on a boundary layer of air as the disk spins; very small head-to-disk gap
  ○ getting more robust
  ○ PROS: universal, cheap, the fastest external storage, getting larger all the time
- Floppy disk
  ○ called floppy because the packaging was a very flexible plastic envelope, not a rigid case (easy to break)
  ○ written on both sides
  ○ small capacity, slow, cheap, universal; now obsolete

LU7: INPUT OUTPUT

A. main functional blocks in a computer
- CPU
- main memory
- I/O
  ○ the I/O devices themselves (peripherals)
  ○ I/O modules, which handle communication between the peripherals and the CPU

B. I/O DEVICES
1. storage devices
- magnetic and optical disks (storage), used to store data for later retrieval
2. source/sink devices
- allow communication between the computer and its environment
- keyboards (input), mouse (input), printers (output), etc.

C. I/O MODULES
- a key element of a computer system, together with the processor
- interfaces to the system bus/central switch and controls one or more peripheral devices
- a set of mechanical connectors that wire devices into the system bus
- contains logic to perform functions between the peripheral and the bus
- PARTS: a connection to the system bus, some control logic, a data buffer, and an interface to the peripheral(s)
- two main design areas: the strategy by which I/O modules communicate with the CPU, and the interface between the I/O module and the device(s) connected to it

FUNCTIONS
1. control & timing - coordinate the flow of traffic between internal resources and external devices
2. CPU communication - command decoding, data, status reporting, address recognition
3. device communication - involves commands, exchanges of status information, and data
4. data buffering
5. error detection - Hamming code, parity, etc.

STEPS
1. the CPU checks the I/O module for the device's status (available or not); the CPU has to initiate everything
2. the I/O module returns the status
3. if ready, the CPU requests a data transfer
4. the I/O module gets the data from the device
5. the I/O module transfers the data to the CPU

DECISIONS
- hide or reveal device properties to the CPU
- support multiple devices or a single device
- control device functions, or leave them to the CPU
- O/S decisions

STRATEGIES/TECHNIQUES

programmed I/O
- the simplest way to handle communication between the CPU and the I/O module
- the CPU is responsible for all communication, by executing instructions which control the attached devices and transfer the data
- example: for the CPU to send data to a device, it first issues an instruction to the appropriate I/O module telling it to expect data; the CPU must then wait until the module responds before sending the data, and if the module is slower than the CPU, the CPU has to wait until the transfer finishes
- problem 1: very inefficient (slow)
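The programmed-I/O steps above amount to a busy-wait polling loop. A toy sketch, assuming a made-up `IOModule` class with a status register (everything here is illustrative, not a real device interface):

```python
# Sketch of programmed I/O: the CPU drives every step itself and
# busy-waits on the module's status. Names and timings are invented.

class IOModule:
    def __init__(self, data):
        self._data = list(data)
        self.busy_cycles = 3          # pretend the device is slow
    def status(self):                 # the register the CPU polls
        if self.busy_cycles > 0:
            self.busy_cycles -= 1
            return "BUSY"
        return "READY"
    def read(self):
        return self._data.pop(0)

mod = IOModule("hi")
wasted = 0
while mod.status() != "READY":        # CPU does nothing useful here
    wasted += 1
word = mod.read()
print(word, "after", wasted, "wasted polls")  # h after 3 wasted polls
```

Every iteration of that `while` loop is a CPU cycle spent checking status instead of doing work, which is exactly the inefficiency the notes go on to describe.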
- while waiting, the CPU is not doing anything useful, so the whole computer system is slow; the same applies to a keyboard: the I/O module has to wait for an instruction from the CPU before it can report whether any keys have been pressed; extremely inefficient
- used only in very small microprocessor-controlled devices

SUMMARY
- the CPU has direct control: sensing status, issuing read/write commands, transferring data
- the CPU waits for the I/O module to complete the operation, which wastes CPU time

details
- the CPU requests an I/O operation
- the I/O module performs the operation and sets status bits
- the CPU checks the status bits periodically
- the I/O module does not inform the CPU directly and does not interrupt the CPU
- the CPU may wait, or come back later

commands
- the CPU issues an address that identifies the module and device; data transfer is like memory access; each device has a unique identifier, and CPU commands contain that identifier (address)
- the CPU issues a command:
  ○ control - tell the module what to do
  ○ test - check status
  ○ read/write - the module transfers data via its buffer from/to the device

mapping
- memory-mapped I/O
  ○ devices and memory share one address space
  ○ I/O looks just like memory reads/writes; no special commands for I/O
  ○ the large selection of memory-access commands is available
- isolated I/O
  ○ separate address space; needs I/O or memory select lines
  ○ special commands for I/O - a limited set

interrupt-driven I/O
- overcomes the CPU waiting problem (with programmed I/O the waiting time is long)
- no repeated CPU checking of the device (the CPU need not poll frequently); the I/O module interrupts the CPU when it is ready
- why this strategy? it allows the CPU to carry on with other operations until the module is ready to transfer data. Why does this help?
- because with programmed I/O the CPU must wait for the I/O to be ready, and while waiting it does nothing, which is a waste
- it removes the CPU's need to continually poll input devices to see if it must read any data
  ○ if an input device has data, the I/O module interrupts the CPU to request a data transfer
- when the CPU wants to communicate with a device, it issues an instruction to the module and then continues with other operations; when the device is ready, it interrupts the CPU, which then carries out the data transfer as before

basic operation
- how does the I/O module interrupt the CPU? by activating a control line in the control bus
- sequence
  1. the I/O module interrupts the CPU
  2. the CPU finishes executing the current instruction
  3. the CPU acknowledges the interrupt
  4. the CPU saves its current state
  5. the CPU jumps to a sequence of instructions that handle the interrupt
- from the CPU's point of view
  1. issue the read command
  2. do other work
  3. check for an interrupt at the end of each instruction cycle
  4. if interrupted:
     a. save context (the registers)
     b. process the interrupt (fetch the data and store it)

design issues
1. how to identify the module that issued the interrupt?
2. how to deal with multiple interrupts (an interrupt handler itself being interrupted)?
- each interrupt line has a priority; higher-priority lines can interrupt lower-priority lines; with bus mastering, only the current bus master can interrupt
- identifying the interrupting module
  ○ a different line for each module; cons: limits the number of devices
  ○ software poll: the CPU asks each module in turn; cons: slow
  ○ daisy chain (hardware poll): an interrupt acknowledge is sent down a chain; the responsible module places a vector on the bus, and the CPU uses the vector to identify the handler routine
  ○ bus mastering: the module must claim the bus before it can raise an interrupt, e.g. PCI (Peripheral Component Interconnect) & SCSI

direct memory access (DMA)
- interrupt-driven I/O is efficient, but all data is still transferred through the CPU (the CPU still has to do a lot of work); so what's the problem?
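The CPU's point of view above (issue the read, do other work, check for an interrupt at the end of each instruction cycle) can be sketched as a tiny simulation; the names and the five-cycle loop are invented for illustration:

```python
# Sketch of the interrupt-driven cycle from the CPU's point of view.
# Globals stand in for the interrupt line and the CPU's data buffer.

interrupt_pending = False
buffer = []

def device_finishes():            # the I/O module raising its interrupt line
    global interrupt_pending
    interrupt_pending = True

def interrupt_handler():          # save context, fetch and store the data
    global interrupt_pending
    buffer.append("data-word")
    interrupt_pending = False

work_done = 0
for cycle in range(5):
    work_done += 1                # CPU does useful work instead of polling
    if cycle == 2:
        device_finishes()         # module signals readiness mid-stream
    if interrupt_pending:         # checked at the end of each instruction cycle
        interrupt_handler()

print(work_done, buffer)          # 5 ['data-word']
```

Unlike the programmed-I/O loop, all five cycles here do useful work; the transfer costs only the one handler invocation.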
- it is inefficient if large quantities of data are being transferred between the peripheral and memory: the transfer is slower than necessary, and the CPU is unable to perform any other actions meanwhile
- solution: an additional strategy, DMA, using an additional piece of hardware, the DMA controller
  ○ the DMA controller takes over the system bus and transfers data between the I/O module and main memory without the intervention of the CPU
  ○ how? when the CPU wants to transfer data, it tells the DMA controller the direction of the transfer, the I/O module involved, the location of the data in memory, and the size of the block of data to be transferred; the CPU then continues with other instructions, and the DMA controller interrupts the CPU when the transfer is complete

DMA operation
- the CPU tells the DMA controller: read or write, the device address, the starting address of the memory block for the data, and the amount of data to be transferred
- the CPU does other work
- the DMA controller deals with the transfer, and sends an interrupt to the CPU when it has finished

DMA transfer
- the CPU and the DMA controller cannot use the system bus at the same time, so how do they share the bus?
1. cycle stealing
  - the DMA controller takes over the bus for a cycle and transfers one word of data
  - this is not an interrupt: the CPU does not switch context
  - the CPU is suspended just before it accesses the bus
  - it slows the CPU down, but not as much as the CPU doing the transfer itself (which is why the strategy exists)
2. burst mode
  - the DMA controller transfers a whole block of data, halting the CPU and controlling the system bus for the duration of the transfer
  - the transfer is as quick as the weakest link in the I/O module/bus/memory chain allows
  - the CPU remains halted while the transfer (in the DMA controller) takes place

DMA CONFIGURATIONS
(1) single bus, detached DMA controller
- each transfer uses the bus twice: I/O -> DMA, then DMA -> memory
- the CPU is suspended twice
(2) single bus, integrated DMA controller (the DMA controller is inside the I/O module)
- there may be multiple DMA controllers, and each may support more than one device
- each transfer uses the bus once: DMA -> memory
- the CPU is suspended once
(3) separate I/O bus
- one bus supports all DMA-enabled devices
- each transfer uses the bus once: DMA -> memory
- the CPU is suspended once

I/O CHANNELS
- the I/O module takes on most of the detailed processing burden, presenting a high-level interface to the processor
- I/O devices are getting more sophisticated (e.g. GPUs)
- the CPU instructs the I/O controller to do a transfer, and the I/O controller does the entire transfer
- improves speed: it takes load off the CPU, and the dedicated processor is faster

I/O INTERFACES
- handle synchronization and control of the peripheral, and the actual transfer of data
- sequence to send data to a peripheral:
  1. the I/O module sends a control signal to the peripheral, requesting permission to send data
  2. the peripheral acknowledges the request
  3. the I/O module sends the data
  4. the peripheral acknowledges receipt of the data
- this is known as handshaking
- an internal buffer allows the I/O module to compensate for some of the difference between the speed at which the interface can communicate with the peripheral and the speed of the bus

parallel interface
- multiple wires connect the I/O module to the peripheral (the ports at the side of the computer where you plug in wires, e.g. a wired mouse)
- bits of data are transferred simultaneously, as they are over the data bus
- for high-speed peripherals (disk drives)

serial interface
- a single wire connects the I/O module to the peripheral
- data is transferred one bit at a time
- for slower peripherals, such as printers and keyboards

which has higher performance in general, and what are their problems?
- parallel interfaces have higher speed, using multiple wires for data transfer
  ○ problem: if one wire has a fault, the entire group of transfers must be retransmitted, and the chance of picking up an error across multiple wires is much higher
- serial interfaces suit slower-speed devices, but their performance can be upgraded by implementing a higher clock speed in the controller that coordinates the data transfer (upgrading to a higher-speed controller)

D. I/O PROBLEMS (why I/O devices cannot simply be connected directly to the system bus)
1. there are many different types of device, with different methods of operation
2. the data transfer rate of peripherals is much slower than the CPU's; the CPU cannot communicate directly with such devices without slowing the whole system down
3. peripherals often use different data word sizes and formats than the CPU

E. EXTERNAL DEVICES
- human readable: communicate with computer users - printers, keyboard, mouse
- machine readable: communicate with equipment - monitoring & controlling
- communication: communicate with remote devices - modem & NIC
- external device block diagram

LU8: OPERATING SYSTEM SUPPORT
1. functions
2. hardware and software structure
3. operating system services
4. OS as resource manager
5. types of operating system
   a. interactive
   b. batch
   c. single program (uni-programming)
   d. multi-programming (multi-tasking)
      i. single program
      ii. two programs
      iii. three programs
      iv. time sharing systems
6. scheduling
   a. long term scheduling
   b. medium term scheduling
   c. five state process model
7. memory management
   a. uni-program
   b. multi-program
      i. swapping (what it is, and its uses)
      ii. partitioning (fixed and variable-sized)

OBJECTIVES/FUNCTIONS OF AN OS
- convenience: makes the computer easy to use
- efficiency: allows better use of computer resources

OPERATING SYSTEM SERVICES
1. program creation (e.g. compiling)
2. program execution (running a program)
3. access to I/O devices (I/O modules & peripherals)
4. controlled access to files
5. system access
6. error detection and response
7. accounting
- the OS manages resources
  ○ it controls the computer's basic functions
  ○ control is passed to a user program during the execution of that user program
  ○ upon completion of execution, control passes back to the OS
    - e.g. if we run a program we wrote, control is given to that program; after the program finishes running, control passes back to the OS
  ○ the OS is the master program: it has control over the smaller user programs and shares the computer's resources (CPU time, memory, I/O) among all programs

problems of early computer systems
1. scheduling
- users had to sign up for how much time they wanted to use the computer, dependent on a human manager
- if the signed-up time was not enough, a new time slot had to be booked
- if the task finished early, the remaining signed-up time was wasted
2. set-up time
- the machine had to be set up before a program could run, based on the requirements of each program

TYPES OF OPERATING SYSTEM
- interactive: the user can interact directly using a keyboard/display terminal
- batch: all user programs are batched together and submitted by a computer operator
- single program (uni-programming): runs just a single program at a time
- multi-programming (multi-tasking): runs multiple programs at the same time, trying to keep the processor as busy as possible (maximising use of the resource); e.g. two programs, three programs

simple batch systems: the Resident Monitor program
- users submit jobs to an operator, who batches the jobs
- the monitor controls the sequence of events to process the batch
- when one job is finished, control returns to the monitor, which reads in the next job
- the monitor handles scheduling, so the monitor program solves the scheduling problem

memory layout for the resident monitor
1. Control Language Interpreter: reads and carries out instructions from one level to the next
2. Loader: the main part of the resident monitor; it loads all the necessary system and application programs into main memory
3. Device Driver: manages the input-output devices connected to the system; essentially the interface between the user and the system, between request and response: for each request the user makes, the device driver produces the response the system needs to fulfil it
4. Interrupt Processing: as the name suggests, it processes all interrupts that occur in the system
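The resident monitor's job-sequencing role described above can be sketched as a simple loop over a job queue. Job names and log strings are invented for illustration; a real monitor would of course load and jump to machine code:

```python
# Sketch of a resident monitor processing a batch: control alternates
# between the monitor (which loads jobs) and each running job.

from collections import deque

def run_batch(jobs):
    queue = deque(jobs)
    log = []
    while queue:                                  # monitor controls the sequence
        job = queue.popleft()
        log.append(f"monitor loads {job}")
        log.append(f"{job} runs to completion")   # control passes to the job
        # ...job ends: control returns to the monitor automatically
    log.append("batch done, monitor idle")
    return log

for line in run_batch(["job1", "job2"]):
    print(line)
```

The key point the notes make is visible in the loop structure: when one job finishes, control comes back to the monitor, which immediately reads the next job, so no operator time is wasted between jobs.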
other desirable hardware features
- memory protection, to protect the monitor from user programs
- a timer, to prevent a job monopolizing the system
- privileged instructions, executed only by the monitor (e.g. I/O)
- interrupts, which allow for relinquishing and regaining control

multi-programmed batch systems
- I/O devices are very slow; while one program is waiting for I/O, another program can use the processor
  ○ multi-programmed = multitasking
- main memory holds the resident monitor program and the user program(s)
  ○ if program A is waiting for I/O, the processor can switch to program B
  ○ main memory must be large enough to hold all the user programs
- this needs memory management and a scheduling algorithm

time sharing systems
- multi-programming allows the processor to handle multiple interactive jobs
- allows the user to interact directly with the computer (interactive, minimizing interaction time)
- allows a number of users to interact with the computer at once

BATCH MULTIPROGRAMMING VS TIME SHARING

SCHEDULING
- long term scheduling: determines which programs are submitted for processing
  ○ controls the degree of multiprogramming (the number of processes in memory)
  ○ once submitted, the user program becomes a process and is added to the queue of the short-term scheduler (or, if swapped out, of the medium-term scheduler)
- medium term scheduling: part of the swapping function
  ○ moves a program between main memory and virtual memory
  ○ swap-in and swap-out decisions are based on the need to manage multi-programming
  ○ if there is no virtual memory, memory management is also an issue
- short term scheduling: the dispatcher
  ○ delivers a process to the processor immediately
  ○ decides which job to execute next, i.e. which job gets the processor in the next time slot
- I/O scheduling: deciding which process's pending I/O request shall be handled by an available I/O device

FIVE STATE PROCESS MODEL (the life cycle of a process)

PROCESS CONTROL BLOCK (PCB)
- the data structure used by the OS to manage and control the execution of processes
1. identifier - the process's unique identifier
2. state - the current process state (new, ready, ...)
3. priority - the priority level of the process (decided by the OS)
4. program counter - the address of the next instruction to be executed
5. memory pointers - the starting and ending locations of the program
6. context data - the data in the registers
7. I/O status - outstanding I/O requests and the I/O devices assigned to the process
8. accounting information - the amount of resources used

MEMORY MANAGEMENT
1. uni-program
- memory is split into two parts: one for the OS (monitor), one for the currently executing (user) program
2. multi-program
- the "user" part is sub-divided and shared among the active processes
- time spent by programs/processes waiting for I/O leaves the processor idle, so we need to pack more programs into memory efficiently (memory management)

swapping
- problem: I/O is so slow compared with the CPU that, even in a multi-programming system, the CPU can be idle most of the time
- solution 1: increase main memory, to accommodate more processes (keeping the processor busy)
  ○ cons: expensive, and programs keep getting larger (so main memory can hold fewer of them)
- solution 2: swapping
  ○ move programs out of main memory to an intermediate queue on the disk; the long-term queue of processes is kept on disk
  ○ processes are "swapped" in as space becomes available; as a process completes, it is moved out of main memory
  ○ if no process in memory is ready: swap out a blocked process to the intermediate queue, and swap in a ready process or a new process
  ○ note: swapping is itself an I/O process

PARTITIONING
- splitting memory into sections to allocate to processes (including the OS)
- fixed-size partitions
  ○ may not all be of equal size
  ○ a process is fitted into the smallest hole that will take it (best fit)
  ○ some memory is wasted, because a process does not fully utilise its allocated partition
  ○ this leads to variable-sized partitions (to overcome the waste)
- variable-sized partitions
  ○ allocate exactly the required memory to a process
  ○ problem: leads to holes at the end of memory, too small to use (though one small hole means less waste)
  ○ when all processes are blocked, swap out a process and bring in another
  ○ problem: the new process may be smaller than the swapped-out process, leaving yet another hole

SOLUTIONS TO THESE HOLES (fragmentation)
- compaction: from time to time, go through memory and merge all the holes into one free block; this is the effect of dynamic partitioning

relocation
- there is no guarantee that a process will load into the same place in memory each time
- instructions contain addresses: locations of data, and addresses of instructions (branching)
- logical address: relative to the beginning of the program
- physical address: the actual location in memory
- automatic conversion using a base address

paging
- split memory into equal-sized, small chunks: page frames
- split programs into equal-sized small chunks: pages
- allocate the required number of page frames to a process
- the OS maintains a list of free frames
- a process does not require contiguous page frames
- a page table keeps track of the mapping between logical and physical addresses

virtual memory
- allows very effective multiprogramming and relieves users of the unnecessarily tight constraints of main memory
- demand paging
  ○ does not require all pages of a process to be in memory; pages are brought in as required
- page fault (page not found in main memory)
  ○ the required page is not in memory, so the OS swaps in the required page (page replacement)
  ○ it may need to swap a page out to make space, selecting the page to throw out based on recent history

thrashing
- too many processes in too little memory: the OS spends its time swapping, and little or no real work is done (the disk activity indicator light is on all the time)
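The page-table lookup and the page-fault case described above can be sketched in a few lines. The page size and table contents are invented for illustration; real hardware does this translation in the MMU:

```python
# Sketch of paging: a logical address splits into (page, offset), and the
# page table maps pages to whatever free frames the OS handed out.

PAGE_SIZE = 1024
page_table = {0: 5, 1: 2, 2: 7}   # page number -> frame number (illustrative)

def translate(logical_addr):
    page, offset = divmod(logical_addr, PAGE_SIZE)
    if page not in page_table:
        # page fault: the OS must swap the required page in
        raise LookupError("page fault")
    return page_table[page] * PAGE_SIZE + offset

print(translate(2 * PAGE_SIZE + 12))   # page 2 -> frame 7: prints 7180
```

Note that frames 5, 2 and 7 are not contiguous, which is exactly why paging frees the OS from finding one contiguous block per process.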
solutions to thrashing
- good page replacement algorithms
- reduce the number of processes running
- fit more memory

COMPUTERIZED MULTIPLICATION
- the multiplier is loaded into register Q, the multiplicand into register M
- register A is initialized to 0
- register C (1 bit) is initialized to 0; it holds a potential carry
- control logic reads the bits of the multiplier one at a time:
  ○ if Q0 = 1, the multiplicand is added to A and the result is stored in A; C is used for the overflow
  ○ C, A and Q are then shifted right one bit (Q0 is lost)
  ○ if Q0 = 0, just shift
- the result is contained in A and Q
- [the slides show an example of execution: multiplier x multiplicand -> result, step by step]
- this is how unsigned multiplication is implemented in a computer

BOOTH'S ALGORITHM
- the multiplier is loaded into register Q and the multiplicand into register M
- register A is initialized to 0; register Q-1 (1 bit) is initialized to 0
- control logic scans the bits of the multiplier one at a time:
  ○ if Q0,Q-1 = 1,1 or 0,0: all bits of the A, Q, and Q-1 registers are shifted right 1 bit
  ○ if Q0,Q-1 = 0,1: the multiplicand is added to A (A + M), then shift right
  ○ if Q0,Q-1 = 1,0: the multiplicand is subtracted from A (A - M), then shift right
- sign preservation on the right shift: An-1 is shifted but also remains in An-1 (An-1 is copied to An-2, not moved)
- [the slides show an example of Booth's algorithm: multiplier x multiplicand -> result]
- this is how signed multiplication is implemented in a computer

COMPUTERIZED DIVISION
- divisor -> M; dividend -> A, Q
- for each bit:
  1. shift A, Q left 1 bit
  2. if M and A have the same sign, perform A <- A - M; otherwise A <- A + M
  3. if the sign of A is the same before and after the operation, or A = 0, then Q0 = 1
  4. if the sign of A is not the same before and after the operation and A is not 0, then Q0 = 0 and A is restored to its previous value

EXAMPLE OF EXECUTION: (7)/(3), divisor M = 0011, dividend in Q
A    Q
0000 0111   initial value
0000 1110   shift
1101        subtract (negative)
0000 1110   restore, Q0 = 0
0001 1100   shift
1110        subtract (negative)
0001 1100   restore, Q0 = 0
0011 1000   shift
Computerized Division (Cont…)
A    Q
0000       Subtract (A = 0000)
0000 1001  Set Q0 = 1
0001 0010  Shift
1110       Subtract (negative)
0001 0010  Restore
Quotient = 0010 (2) in Q, remainder = 0001 (1) in A
The remainder is in A. If the signs of the divisor and dividend were the same, then the quotient is in Q; otherwise, the correct quotient is the twos complement of Q.
This is how division is implemented in a computer
LU10: INSTRUCTION SETS: CPU STRUCTURE AND FUNCTION
1. CPU structure
2. CPU with system bus
3. CPU internal structure
4. Registers
a. user-visible registers
i. general-purpose registers
ii. data registers
iii. address registers
iv. condition code registers (flags)
5. about registers
i. number of registers
ii. register length
b. control and status registers
i. program status word (PSW)
c. other registers
6. instruction cycle
a. indirect cycle
7. data flow
a. instruction fetch
b. data fetch
c. execute
d. interrupt
8. prefetch
9. improved performance
10. pipelining
a. fetch instruction FI
b. decode instruction DI
c. calculate operands CO
d. fetch operands FO
e. execute instructions EI
f. write operand WO
11. conditional branch
a. dealing with branches
i. multiple streams
ii. prefetch branch target
iii. loop buffer
iv. branch prediction
1. static approaches
a. predict never taken
b. predict always taken
c. predict by opcode
2. dynamic approach
a. taken/not taken switch
b. branch history
v. delayed branching
CPU STRUCTURE
instructions
1. fetch instruction - reads an instruction from memory
2. interpret instruction - decodes an instruction to determine the required action
3. fetch data - reads the required data from memory or an I/O module
4. process data - performs arithmetic/logical operations on data
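The restoring-division walkthrough above (7 ÷ 3 on 4-bit registers) can be sketched in code. This is a minimal sketch for unsigned operands only; the register names A, Q, M follow the notes, while the function name and bit width parameter are my own illustration.

```python
# Minimal sketch of the shift/subtract/restore division loop, assuming
# unsigned operands held in registers named as in the notes (A, Q, M).

def restoring_divide(dividend: int, divisor: int, bits: int = 4):
    """Return (quotient, remainder); quotient ends in Q, remainder in A."""
    a, q, m = 0, dividend, divisor
    for _ in range(bits):
        # Shift A,Q left 1 bit: the top bit of Q moves into A.
        a = (a << 1) | (q >> (bits - 1))
        q = (q << 1) & ((1 << bits) - 1)
        if a >= m:       # trial subtraction does not go negative
            a -= m
            q |= 1       # set Q0 = 1
        # otherwise: "restore" A (we never changed it) and Q0 stays 0
    return q, a          # quotient in Q, remainder in A

print(restoring_divide(7, 3))  # (2, 1): 7 / 3 = 2 remainder 1
```

Each loop iteration corresponds to one Shift / Subtract / (Set Q0 or Restore) group in the trace above.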
5. write data (write back the output) - writes the results of an execution to memory or an I/O module
REGISTERS
working space = temporary storage = registers
number of registers and their functions vary between processor designs
○ registers match CPU speed
top level of the memory hierarchy: faster, smaller
ROLES
(1) USER-VISIBLE
enable the machine language / assembly language programmer to minimize main memory references by optimizing use of registers
4 CATEGORIES
general purpose
may be true general purpose (GP)
○ can contain an operand for any opcode
may be restricted
○ dedicated registers for stack operations etc
may be used for data/addressing
○ if there are not enough address registers, GP registers can serve as a backup
IF THE GP:
make them general purpose
○ increases flexibility and programmer options (pros)
○ increases instruction size & complexity (cons)
make them specialized
○ smaller (faster instructions) (pros)
○ less flexibility (cons)
in a different task/scenario, the purpose of a specialized register cannot change / stays the same
data registers
used only to hold data
cannot be employed in the calculation of an operand address
address registers
hold addresses
may support more than one addressing mode (like general-purpose registers)
may be devoted to a particular addressing mode
○ segment pointer: a segment register holds the address of the base of the segment
○ stack pointer: a dedicated register that points to the top of the stack
condition code registers (flags)
bits set by CPU hardware as the result of operations
○ e.g. result of the last operation was zero
○ sign: 0 -> positive, 1 -> negative
○ they act as markers for later instructions
can be read (implicitly) by programs
○ jump if zero
cannot (usually) be set by programs
about registers
number of registers
○ between 8-32
○ fewer = more memory references (data must be fetched from memory more often)
○ more does not necessarily reduce memory references, and takes up processor space
○ trade-off: performance vs cost
register length
○ address registers: large enough to hold the full/largest address
○ data registers large
enough to hold a full word
possible to combine two contiguous data registers to be used as one for holding a double-length value
(2) CONTROL & STATUS REGISTERS
control the operation of the CPU
invisible to the user
some are visible to machine instructions executed in a control/OS mode
(4) ESSENTIAL REGISTERS
1. program counter (PC) - contains the address of the next instruction to be fetched
2. instruction register (IR) - contains the instruction most recently fetched
3. memory address register (MAR) - contains the address of a location in memory
4. memory buffer register (MBR) - contains a word of data to be written to memory, or the word most recently read
Program Status Word (PSW)
a set of bits that contains status information
includes condition codes:
- sign (contains the sign bit of the result of the last arithmetic operation)
- zero (set when the result is 0)
- carry (set if an operation resulted in a carry (addition) into or a borrow (subtraction) out of the high-order bit)
- equal (set if a logical compare result is equality)
- overflow (used to indicate arithmetic overflow)
- interrupt enable/disable (enables/disables interrupts)
- supervisor (indicates whether the processor is executing in supervisor or user mode)
other registers
may have registers for
○ process control blocks (OS)
○ interrupt vectors (OS)
○ subroutine call
○ control of I/O operations
instruction cycle
may require more memory accesses to fetch operands
indirect addressing -> more memory accesses
- indirect cycle: an additional instruction subcycle
(indirect instruction cycle and instruction cycle state diagrams)
DATA FLOW
instruction fetch
depends on CPU design
during fetch:
○ PC (program counter) contains the address of the next instruction to be fetched
○ the address is moved to the MAR (memory address register)
○ the address is placed on the address bus
○ the control unit requests a memory read
○ the result is placed on the data bus -> copied to the MBR (memory buffer register) -> then to the IR (instruction register, which holds the most recently fetched instruction)
○ PC is incremented by 1, why?
preparatory for the next fetch
data fetch
IR is examined (decoding)
if indirect addressing -> an indirect cycle is performed; steps to fetch the operand address in the indirect cycle:
○ the rightmost N bits of the MBR (which contain the address reference) are transferred to the MAR
○ the control unit requests a memory read
○ the result (the address of the operand) is moved to the MBR
(fetch diagram)
execute
may take many forms, depending on the instruction being executed
may include
○ memory read/write
○ input/output
○ register transfers
○ ALU operations
interrupt
current PC saved
○ to allow resumption after the interrupt
○ after the interrupt is handled, execution resumes where it left off
PC copied to MBR
special memory location loaded to MAR
MBR written to memory
PC loaded with the address of the interrupt handling routine
next instruction (the first of the interrupt handler) can be fetched
prefetch
fetch accesses main memory (fetch the instruction ahead of time; apply this when we already know what to do next)
execution usually does not access main memory, why?
○ data/instructions are fetched from registers
can fetch the next instruction during execution of the current instruction
○ instruction prefetch
improved performance!
○ but performance is not doubled
fetch is usually shorter than execution
can we prefetch more than one instruction?
any jump/branch means that the prefetched instructions are not the required instructions
○ add more stages to improve performance
pipelining
like laying the production process out in an assembly line
products at various stages can be worked on simultaneously
the instruction cycle's various stages:
○ fetch instruction FI
○ decode instruction DI
○ calculate operands CO
○ fetch operands FO
○ execute instructions EI
○ write operand WO
conditional branch
branch penalty
○ no instruction completes during this time
APPROACHES TO DEAL WITH CONDITIONAL BRANCHES (4)
conditional branch instruction pipelining strategy: until the instruction is actually executed, it is impossible to determine whether the branch will be taken or not
multiple streams
have two pipelines
prefetch each branch into a separate pipeline
use the appropriate pipeline
problems
○ leads to bus & register contention (conflict when two or more instruction streams want to use the same resource)
○ multiple branches -> further pipelines being needed
prefetch branch target
the target of the branch is prefetched in addition to the instructions following the branch
keep the target until the branch is executed
the target is known in advance, but cannot be committed to until the branch is resolved
prefetch -> the instruction is fetched early
loop buffer
a small, very fast memory
○ maintained by the instruction fetch stage of the pipeline
○ contains the n most recently fetched instructions, in sequence
check the buffer before fetching from memory
if the branch is to be taken and the branch target is within the buffer
○ the next instruction is fetched from the buffer
very good for small loops or jumps
○ similar to a cache
branch prediction
(1) static approaches
predict never taken
always assume that the branch will not be taken
always fetch the next instruction in sequence
predict always taken
assume that the branch will be taken
always fetch the instruction from the branch target
predict by opcode
makes the decision based on the opcode of the branch instruction
the processor assumes that the branch will be taken for certain branch opcodes and not for others
success rate > 75%
(2) dynamic approach
- attempts to improve the accuracy of prediction
- by recording the history of conditional branch instructions in a program
taken/not taken switch
based on previous history
one or more bits associated with each conditional branch instruction
○ reflect the recent history of the instruction
good for loops (pattern)
delayed branching
automatically rearrange instructions within a program so that branch instructions occur later than actually desired
○ the instruction after the branch (the delay slot) is executed before the branch takes effect
LU11: RISC VS CISC
A. major advances in computers
B. RISC
C. CISC
D. instruction execution characteristics
a. operations performed
b. operands used
c. execution sequencing
E. large register file
F. registers for local variables
G. register windows
H. circular buffer
I. global variables
J. registers vs cache
K. why cisc
L. risc characteristics
M. risc vs cisc
N. risc pipelining
a. effects
b. optimization
O. normal and delayed branch
A. major advances in computers
microprogrammed control unit
○ microprogramming eases the design and implementation task of the control unit
cache memory
○ improves performance
pipelining
○ introduces parallelism into the fetch-execute cycle
○ executes several instructions at the same time
multiple processors
○ with a number of different organizations and objectives
B. Reduced Instruction Set Computer (RISC)
a processor architecture
key features
○ large number of GP registers and/or use of compiler technology to optimize register usage
○ limited and simple instruction set
○ emphasis on optimizing the instruction pipeline
RISC CHARACTERISTICS
○ one instruction per cycle
○ register-to-register operations
○ few, simple addressing modes
○ few, simple instruction formats
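The taken/not taken switch described under the dynamic approach above can be sketched as a two-bit saturating counter per branch — a common realization of the "one or more history bits" idea. The class name, initial state, and the loop workload below are illustrative assumptions, not from the notes.

```python
# Sketch of a dynamic "taken / not taken switch" with two history bits:
# states 0-1 predict not taken, states 2-3 predict taken. Two bits mean a
# single mispredict (e.g. a loop exit) does not immediately flip the prediction.

class TwoBitPredictor:
    def __init__(self):
        self.state = 2  # start weakly taken (an assumed design choice)

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # Move toward the observed outcome, saturating at 0 and 3.
        self.state = min(3, self.state + 1) if taken else max(0, self.state - 1)

# A loop branch taken 9 times then not taken once, repeated twice:
p = TwoBitPredictor()
hits = 0
for taken in ([True] * 9 + [False]) * 2:
    hits += (p.predict() == taken)
    p.update(taken)
print(hits, "of 20 predicted correctly")  # 18 of 20
```

Only the two loop exits are mispredicted, which is why the notes say this scheme is "good for loops (pattern)".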
C. Complex Instruction Set Computer (CISC)
richer instruction sets
○ a larger number of instructions, and more complex instructions
○ use more space in memory
○ resembling high-level language
software costs far exceed hardware costs
increasingly complex high-level languages
semantic gap
○ the difference between the operations provided in high-level languages (HLLs) and those provided by the computer architecture
○ aim: reduce the gap between the compiler and machine language
leads to:
○ large instruction sets
○ more addressing modes
○ hardware implementations of HLL statements
intention of CISC
○ ease compiler writing
○ improve execution efficiency
complex operations in microcode
○ support more complex HLLs
WHY CISC?
1. compiler simplification
- but complex machine instructions are harder to exploit
- and optimization is more difficult
2. smaller programs
- a program takes up less memory
- but may not occupy fewer bits, merely look shorter in symbolic form
- more instructions require longer opcodes
- register references require fewer bits
3. faster programs
- but there is a bias towards the use of simpler instructions
- a more complex control unit is needed
- the microprogram control store is larger
- so simple instructions take longer to execute
D. INSTRUCTION EXECUTION CHARACTERISTICS
operations performed
○ determine the functions to be performed by the processor
○ and the processor's interaction with memory
operands used
○ types of operands
○ usage frequency
○ memory organization
○ addressing modes
execution sequencing
○ control and pipeline organization
E. LARGE REGISTER FILE
software solution (CISC)
○ requires the compiler to allocate registers
○ allocated based on the most used variables in a given time
○ requires a sophisticated program-analysis algorithm
hardware solution (RISC)
○ have more registers
○ so more variables will be held in registers
F. REGISTERS FOR LOCAL VARIABLES
store local scalar variables (integers, pointers, etc) in registers
○ a few registers reserved for global variables
reduces memory access
every procedure (function) call changes locality
local variables must be saved from registers -> memory
parameters must be passed
variables from the calling program must be restored
results must be returned
G. REGISTER WINDOWS
only a few parameters are passed and a few local variables are used per call
multiple small sets of registers
○ each assigned to a different procedure
a call switches to a fixed-size window of registers rather than saving registers to memory
windows for adjacent procedures are overlapped to allow parameter passing
three fixed-size areas within a register window
○ parameter registers
hold parameters passed down from the procedure that called the current procedure
hold results to be passed back up
○ local registers
local variables, assigned by the compiler
○ temporary registers
used to exchange parameters and results with the next lower level
the temporary registers at one level are physically the parameter registers at the next lower level
overlap
○ permits parameter passing without the actual movement of data
(overlapping register windows diagram)
H. CIRCULAR BUFFER
operation
○ when a call is made, the current window pointer is moved to show the currently active register window
○ if all windows are in use -> an interrupt is generated, and the oldest window (the one furthest back in the call nesting) is saved to memory
○ a saved window pointer indicates where the next saved window should be restored to
I. GLOBAL VARIABLES
allocated by the compiler -> memory
○ inefficient for frequently accessed variables, because they must be fetched from main memory
alternative: have a set of registers for global variables
○ increases the hardware burden to accommodate the split in register addressing
○ the compiler must decide which global variables should be assigned to registers
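The circular-buffer operation described in section H above can be sketched as follows. The window count, class name, and spill policy (keeping one window's worth of depth free) are illustrative assumptions; a real machine would save actual register contents, not just window indices.

```python
# Sketch of a circular buffer of register windows: the current window
# pointer (CWP) advances on each call; when all windows are in use, the
# oldest window is "saved to memory" (here, appended to a Python list).

N_WINDOWS = 4  # assumption: a small machine with 4 register windows

class WindowFile:
    def __init__(self):
        self.cwp = 0       # current window pointer
        self.depth = 0     # call nesting currently held in registers
        self.spilled = []  # stand-in for memory holding saved windows

    def call(self):
        self.cwp = (self.cwp + 1) % N_WINDOWS
        if self.depth == N_WINDOWS - 1:
            # all windows in use: spill the oldest (furthest-back) window
            self.spilled.append((self.cwp - self.depth) % N_WINDOWS)
        else:
            self.depth += 1

    def ret(self):
        if self.spilled:
            self.spilled.pop()  # restore a saved window from memory
        else:
            self.depth -= 1
        self.cwp = (self.cwp - 1) % N_WINDOWS

wf = WindowFile()
for _ in range(6):  # nest 6 calls into 4 windows
    wf.call()
print(len(wf.spilled), "windows spilled to memory")
```

With 6 nested calls and 4 windows, the last calls each force the oldest window out, matching the interrupt-driven spill described in the notes.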
J. REGISTERS VS CACHE
REGISTERS | CACHE
all local variables (scalars) | recently used local variables
individual variables | blocks of memory
compiler-assigned global variables | recently used global variables
save/restore based on procedure nesting depth | save/restore based on cache replacement algorithm
register addressing | memory addressing
K. RISC PIPELINING
most instructions are register to register
two phases of execution
○ I: instruction fetch
○ E: execute (ALU operation with register input and output)
for load and store, three phases
○ I: instruction fetch
○ E: execute (calculate the memory address)
○ D: memory (register-to-memory or memory-to-register operation)
L. EFFECTS OF PIPELINING
M. OPTIMIZATION OF PIPELINING
delayed branch
○ the branch does not take effect until execution of the following instruction
○ the position after the branch = delay slot
delayed load
○ the register that is the target of the load is locked by the processor
○ execution of the instruction stream continues until the register is required
○ then the processor idles until the load is complete
○ rearranging instructions can allow useful work to be done while loading
loop unrolling
○ replicate the body of the loop a number of times
○ iterate the loop fewer times
○ reduces loop overhead
○ increases instruction parallelism
○ improves register, data cache, and TLB locality
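The loop-unrolling transformation above can be sketched at the source level. This is a minimal illustration (function names are my own): the loop body is replicated four times so the loop iterates a quarter as often, with a small cleanup loop for leftover elements.

```python
# Sketch of loop unrolling: replicate the loop body (here 4x) to reduce
# loop overhead (fewer increments and branch tests) and expose parallelism.

def sum_rolled(xs):
    total = 0
    for x in xs:          # one loop-control step per element
        total += x
    return total

def sum_unrolled4(xs):
    total = 0
    n = len(xs) - len(xs) % 4
    for i in range(0, n, 4):       # one quarter as many iterations
        total += xs[i] + xs[i + 1] + xs[i + 2] + xs[i + 3]
    for i in range(n, len(xs)):    # cleanup for leftover elements
        total += xs[i]
    return total

data = list(range(10))
print(sum_rolled(data), sum_unrolled4(data))  # both print 45
```

In a compiler this is done on the machine-level loop, where fewer branches also mean fewer opportunities for branch misprediction.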
