Lec 4_8.pdf

EMBEDDED SYSTEM DESIGN (EEE G512)
Class Code: i27rlxu

Processors
Processors are broadly classified into 3 major categories:
– General purpose, high performance
   – Pentiums, Alphas, SPARC
   – Used for general-purpose software
   – Heavyweight OS: UNIX, NT
   – Workstations, PCs
– Embedded processors and processor cores
   – ARM, 486SX, Hitachi SH7000, NEC V800
   – Single program
   – Lightweight, real-time OS
   – DSP support
   – Cellular phones, consumer electronics (e.g., CD players)
– Microcontrollers
   – Extremely cost-sensitive
   – Small word size; 8-bit is common
   – Highest-volume processors by far
   – Automobiles, toasters, thermostats

ELECTRICAL ELECTRONICS COMMUNICATION INSTRUMENTATION

Processors
The performance of a processor is measured using the following metrics:
– MIPS: the processing speed of a processor in millions of instructions per second.
– MFLOPS: the processing speed of a processor or DSP in millions of floating-point operations per second.
A general-purpose processor consists of a data path and a control unit, tightly linked with the memory.
– Instruction Register: a register inside the CPU that temporarily holds the instruction code before it is sent to the decoding unit.
– Program Counter: a register inside the CPU that holds the address of the next instruction in the program. It is updated automatically by the address generation unit.
– Instruction Queue: a set of memory locations inside the CPU that hold instructions in a pipeline before they are sent to the instruction decoding unit.
– Control Unit: responsible for generating timing and control signals for the various operations inside the CPU. It is very closely associated with the instruction decoding unit.

Processors
Pipelining: overlapping execution increases throughput.
– Parallelism improves performance
– Analogy: laundry
Superscalar Processing: A superscalar processor can fetch (from memory), decode, and execute more than one instruction in parallel at any instant.
– Superscaling allows two or more instructions to be processed in parallel (full overlapping).
– Multiple units are provided for instruction processing.
– Supports pipelining.
The Pentium has two 5-stage pipelines and can execute two instructions per clock cycle, whereas the Pentium II has a single pipeline with multiple functional units.
The PowerPC MPC 601 (RISC, first PowerPC, 66 MHz, 132 MIPS) has 3 execution units (1 branch unit, 1 integer unit, 1 floating-point unit) and can dispatch up to 2 instructions and process 3 every clock cycle.

Processors
Fig: The architecture of a typical microcontroller, the C500 from Infineon Technologies, Germany
The various units of the microcontroller are:
– The C500 core contains the CPU, which consists of the instruction decoder, the arithmetic logic unit (ALU), and the program control section.
– The housekeeper unit generates internal signals for controlling the functions of the individual internal units within the microcontroller.
– Port 0 and Port 2 are required for accessing external code and data memory and for emulation purposes.
– The external control block handles the external control signals and the clock generation.
– The access control unit is responsible for the selection of the on-chip memory resources.
– The IRAM provides the internal RAM, which includes the general-purpose registers.
– The XRAM is an additional internal RAM that is sometimes provided.
– The interrupt requests from the peripheral units are handled by an interrupt controller unit.
– Serial interfaces, timers, capture/compare units, A/D converters, watchdog units (WDU), or a multiply/divide unit (MDU) are typical examples of on-chip peripheral units.
– The external signals of these peripheral units are available at multifunctional parallel I/O ports or at dedicated pins.

Digital Signal Processors (DSP)
Digital signal processors are designed around the modified Harvard architecture to handle real-time signals. These DSP units generally use multiple-access and multi-port memory units.
– Multiple-access memory allows more than one access in one clock period.
– Multi-ported memory provides multiple address ports as well as multiple data ports. This also increases the number of accesses per clock cycle.
– The Very Long Instruction Word (VLIW) architecture is also suitable for signal processing applications.
Fig: Modified Harvard architecture
Fig: Block diagram of VLIW architecture

Microprocessors vs Microcontrollers
Fig: Microprocessor-based system
Fig: A microcontroller

Microprocessors vs DSP
Microprocessors:
1. CPU for PCs and workstations, e.g., Intel Pentium IV
2. Von Neumann architecture
3. Typically 1 access per cycle
4. Most operations take more than 1 cycle
5. General-purpose instructions: typically only one operation per instruction
6. Often, no separate address generation units
7. General-purpose addressing modes
8. Software loops only
9. Interrupts rarely disabled
10. Register shadowing common
11. Dynamic caches are common
12. Wide range of on-chip and off-chip peripherals and I/O interfaces
13. Asynchronous serial port
DSPs:
1. Microprocessors specialized for signal processing applications
2. Harvard architecture
3. Two to four memory accesses per cycle
4. Dedicated hardware performs all key arithmetic operations in 1 cycle
5. Very limited SIMD (Single Instruction Multiple Data) features and specialized, complex instructions; multiple operations per instruction
6. Dedicated address generation units
7. Specialized addressing (auto-increment, modulo (circular), bit-reversed)
8. Hardware looping
9. Interrupts disabled during certain operations
10. Limited or no register shadowing
11. Rarely have dynamic features
12. Relatively narrow range of DSP-oriented on-chip peripherals and I/O interfaces
13. Synchronous serial port

Multi-Processor System
GPPs + Application Specific System Processors (ASSPs)
– e.g., real-time video processing and multimedia applications require multiple processing units

Pipelining
Hazards: scenarios that prevent the next instruction in the instruction stream from executing during its designated clock cycle.
– Structural Hazards: arise from resource conflicts when the hardware cannot support all possible combinations of instructions in simultaneous overlapped execution.
– Control Hazards: caused by the delay between the fetching of instructions and decisions about changing the control flow.
– Data Hazards: arise when an instruction depends on the result of a previous instruction.
Fig: Scalar pipelined architectures
Performance improvement:
– Increase clock frequency: super-pipelining
– Increase the number of instructions executed in the same cycle: superscalar

Pipelining
Superscalar Architecture:
– Involves adding multiple execution units (pipelines) to the CPU
– Instruction-level parallelism (ILP): a measure of how many instructions can be executed simultaneously
– Complex scheduling

Pipelining
Data hazards occur when there is a conflict in the access or use of operand data. They can be categorized into three types: read-after-write (RAW), write-after-read (WAR), and write-after-write (WAW).
– A RAW hazard, also known as a true dependency, occurs when an instruction depends on the result of a previous instruction.
– A WAR hazard (anti-dependency) occurs when a later instruction writes a location that an earlier instruction still has to read; the earlier read must happen before the value is overwritten.
– A WAW hazard (output dependency) occurs when a later instruction writes a value before an earlier instruction writes that same value, so the wrong write ends up last.
Note: In an optimizing compiler, the occurrence of WAR and WAW hazards is reduced by register renaming, where the compiler uses different registers for different uses of the same value. This is usually done at compile time. RAW hazards, on the other hand, are true dependencies: renaming cannot remove them, so they must be dealt with at run time.
Example of Data Hazard: Consider two instructions executing concurrently in a pipeline. The first instruction writes a value to a register, and the second instruction reads from the same register:
   ADD R1, R2, R3 // Instruction 1
   SUB R4, R1, R5 // Instruction 2
The SUB instruction cannot be executed in the cycle right after the ADD instruction because it requires the result of the ADD, stored in R1, as an operand. This delay is a classic example of a data hazard.

Pipelining
Example of Control Hazard: A classic example of a control hazard is the if-then-else construct in programming.
   if (a < b)
      x = a; // Instruction 1
   else
      x = b; // Instruction 2
   y = x; // Instruction 3
If the condition is not yet resolved, or is mispredicted, either Instruction 1 or Instruction 2 will have to be discarded, introducing a control hazard.
Example of Structural Hazard: Consider the execution of two different instructions simultaneously:
   LOAD R1, 7(R1) // Instruction 1
   MULT R2, R3, R4 // Instruction 2
If the system has only one memory unit that must serve both the LOAD and the MULT in the same cycle (for example, the LOAD's data access and the MULT's instruction fetch), one of them is delayed because the resource is unavailable, leading to a structural hazard.
These examples give a clearer picture of the nature of CPU pipeline hazards and how they can influence overall computing performance.
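The RAW/WAR/WAW definitions above can be sketched as a small register-dependency check. This is a minimal illustration, not part of the lecture: the `Instr` record and the `classify_hazards` helper are hypothetical names, and the check only reports dependencies between two adjacent instructions. Whether a dependency actually stalls the pipeline depends on the pipeline design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical three-operand instruction: op dest, src1, src2."""
    op: str
    dest: str
    srcs: tuple


def classify_hazards(first, second):
    """Return the dependency types between an earlier and a later instruction."""
    hazards = []
    if first.dest in second.srcs:
        hazards.append("RAW")  # later instruction reads what the earlier one writes
    if second.dest in first.srcs:
        hazards.append("WAR")  # later instruction overwrites what the earlier one reads
    if first.dest == second.dest:
        hazards.append("WAW")  # both instructions write the same register
    return hazards


# The ADD/SUB pair from the data hazard example.
add = Instr("ADD", "R1", ("R2", "R3"))
sub = Instr("SUB", "R4", ("R1", "R5"))
print(classify_hazards(add, sub))  # ['RAW']
```

Swapping in a second instruction that writes R2 or R1 would report a WAR or WAW dependency instead, which are the cases register renaming is meant to remove.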
Self Study: Techniques to Handle Pipeline Hazards

RISC vs CISC
Reduced Instruction Set Computers (RISC) have a small number of simple, frequently used instructions. Complex Instruction Set Computers (CISC) include as many instructions as users might need to write efficient programs.
RISC:
– Only Load/Store instructions can access main memory; other instructions operate register to register.
– Simpler instructions, thus simpler instruction decoding.
– Instructions take one clock cycle to execute.
– Instructions are a single word in size.
– Large code size, but RAM is cheap.
– More general-purpose registers.
– Low cost; slower, but the problem is overcome using more registers and pipelining.
– Examples: MIPS, SPARC, PowerPC, etc.
CISC:
– Any instruction can access main memory (register to memory).
– Complex instructions, thus complex decoding.
– Instructions require more than one clock cycle.
– Instructions are more than one word in size.
– Small code size.
– Fewer general-purpose registers.
– High cost; fast only if the compiler generates appropriate code.
– Examples: x86, VAX, MC68000
Fig labels: Optimize for RISC; Optimize for CISC

System On Chip
– Apple's custom-designed M1 chip replaces Intel processors in Macs.
– The M1's 8-core CPU and GPU combination boosts performance across tasks.
– The 5 nm process ensures the M1's efficiency, extending battery life.
– The unified memory architecture enhances data access for faster performance.
– The integrated Neural Engine gives the M1 advanced machine learning capabilities.

Memory
Memory serves the processor's short-term and long-term information storage requirements.
Memory may be Read-Only Memory (ROM) or Random Access Memory (RAM).
– Memory can also be divided into volatile (RAM) and non-volatile (ROM, hard disks, CD) memory.
– It may exist on the same chip as the processor (on-chip) or outside the chip (off-chip).
To reduce the access (read/write) time, a local copy of a portion of memory can be kept in a small but fast memory called the cache memory.
Memory can also be categorized as dynamic or static.
– Dynamic memory dissipates less power and hence can be more compact and cheaper, but its access time is longer than that of its static counterpart.
– In dynamic RAM (DRAM), the data is retained by a periodic refresh operation.
– In static RAM (SRAM), the data is retained continuously.
– SRAMs are much faster than DRAMs but consume more power.
– The intermediate cache memory is an SRAM.

Memory
Memory can be classified in various ways, i.e., based on location, power consumption, way of data storage, etc.
At the basic level, memory can be classified as:
1. Processor Memory (Register Array)
2. Internal on-chip Memory
3. Primary Memory
4. Cache Memory
5. Secondary Memory
Memory specifications:
– Storage capacity
– Memory access time (read access and write access)
– Bandwidth
There are two important specifications for memory as far as real-time embedded systems are concerned:
– Write Ability
– Storage Performance
Self Study: HM6264 & 27C256 RAM/ROM devices

Memory Hierarchy: How Does it Work?
– Temporal Locality (Locality in Time): keep the most recently accessed data items closer to the processor.
– Spatial Locality (Locality in Space): move blocks consisting of contiguous words to the upper levels.
The objective is to use inexpensive, fast memory.
– Main memory: large, inexpensive, slow memory that stores the entire program and data.
– Cache: small, expensive, fast memory that stores copies of the parts of the larger memory that are likely to be accessed.
   – There can be multiple levels of cache.

Cache Mapping
– Mapping is necessary because there are far fewer cache addresses available than main memory addresses.
– Cache mapping is used to assign a main memory address to a cache address and to determine a hit or a miss.
– Three basic techniques (Self Study):
   – Direct mapping
   – Fully associative mapping
   – Set-associative mapping
– Caches are partitioned into indivisible blocks, or lines, of adjacent memory addresses.
   – Usually 4 or 8 addresses per line
Fig: Cache/main memory structure
Cache design:
– Size
– Mapping function
– Replacement algorithm
– Write policy
– Block size
– Number of caches

Cache-Replacement Policy/Cache Write Techniques
Cache-Replacement Policy:
– Technique for choosing which block to replace
   – when the fully associative cache is full
   – when the set-associative cache's line is full
   – A direct-mapped cache has no choice.
– Random
   – Replace a block chosen at random
– LRU: least-recently used
   – Replace the block not accessed for the longest time
– FIFO: first-in-first-out
   – Push the block onto a queue when accessed
   – Choose a block to replace by popping the queue
Cache Write Techniques:
– When the data cache is written, main memory must be updated.
– Write-through
   – Write to main memory whenever the cache is written to
   – Easiest to implement
   – The processor must wait for the slower main memory write
   – Potential for unnecessary writes
– Write-back
   – Main memory is written only when a "dirty" block is replaced
   – An extra dirty bit for each block is set when the cache block is written to
   – Reduces the number of slow main memory writes

Cache Impact on System Performance
Most important parameters in terms of performance:
– Total size of cache
   – The total number of data bytes the cache can hold
   – Tag, valid, and other housekeeping bits are not included in the total
– Degree of associativity
– Data block size
Larger caches achieve lower miss rates but higher access costs.
Example:
– 2 Kbyte cache: miss rate = 15%, hit cost = 2 cycles, miss cost = 20 cycles
   – Avg. cost of memory access = (0.85 * 2) + (0.15 * 20) = 4.7 cycles
– 4 Kbyte cache: miss rate = 6.5%, hit cost = 3 cycles, miss cost unchanged
   – Avg. cost of memory access = (0.935 * 3) + (0.065 * 20) = 4.105 cycles (improvement)
– 8 Kbyte cache: miss rate = 5.565%, hit cost = 4 cycles, miss cost unchanged
   – Avg. cost of memory access = (0.94435 * 4) + (0.05565 * 20) = 4.8904 cycles (worse)

Input-Output Devices
In the traditional definition, input-output devices provide a medium of interaction with human users. They fall into categories such as:
– Printers
– Visual display units
– Keyboards
– Cameras
– Scanners, etc.
In real-time embedded systems, however, the definition of I/O devices is very different. An embedded controller needs to communicate with a wide range of devices, namely:
– Analog-to-Digital (A-D) and Digital-to-Analog (D-A) converters
– Small-screen displays such as TFT, LCD, etc.
– Antennas
– Cameras
– Microphones
– Touch screens, etc.
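Returning to the cache-size example in "Cache Impact on System Performance" above, the average access cost can be recomputed with a few lines of code. This is only a sketch of the arithmetic: the helper name `average_access_cost` and the dictionary layout are illustrative choices, while the miss rates and cycle counts are the ones quoted in the example.

```python
def average_access_cost(miss_rate, hit_cost, miss_cost):
    """Average cycles per memory access: hits and misses weighted by their rates."""
    return (1.0 - miss_rate) * hit_cost + miss_rate * miss_cost


MISS_COST = 20  # cycles, the same for every cache size in the example

# cache size -> (miss rate, hit cost in cycles), taken from the example
configs = {
    "2 KB": (0.15, 2),
    "4 KB": (0.065, 3),
    "8 KB": (0.05565, 4),
}

costs = {
    size: average_access_cost(miss_rate, hit_cost, MISS_COST)
    for size, (miss_rate, hit_cost) in configs.items()
}
best = min(costs, key=costs.get)  # cache size with the lowest average cost

for size, cost in costs.items():
    print(f"{size}: {cost:.4f} cycles")
print("lowest average cost:", best)
```

The 8 KB cache has the lowest miss rate but not the lowest average cost, because its higher hit cost is paid on every access.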
Example: Digital Camera

Interfacing
The functionality of an embedded system can be broadly classified as:
– Processing
   – Transformation of data
   – Implemented using processors
– Storage
   – Retention of data
   – Implemented using memory
– Communication (also called interfacing)
   – Transfer of data between processors and memories
   – Implemented using buses
Interfacing is a way to communicate and transfer information in either direction without ending in deadlock. In our context, it means effective communication in real time. This involves:
– Addressing
– Arbitration
– Protocols

Timer
A timer is used to generate events at specific times or to measure the duration of specific events that are external to the processor.
For example, a typical embedded processor (8051 architecture) has timers and interrupts, as shown in the figure.
Timers can be part of the microcontroller or can reside outside the chip, in which case they should be properly interfaced with the processor.
Designed to achieve the following objectives:
– Timers, counters, watchdog timers
– Serial transmission
– Analog/digital conversions

Timer
Fig: Timer count and output (count-down mode)

Direct Memory Access (DMA)
In a programmed input-output (PIO) system, each data transfer involves one read and one write under processor supervision. However, each read/write operation takes several clock cycles to complete.
– This is very inefficient when a huge amount of data has to be transferred between memory and peripherals.
– For example, suppose a video frame stored in memory must be transferred to the display controller. The PIO method won't meet the performance requirement (30 fps).
– Another drawback of PIO is that the processor spends its time on data transfer instead of doing useful processing.
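A rough budget shows why PIO struggles with the video example. Every number below is an illustrative assumption rather than a figure from the lecture: the frame size, the bytes moved per read/write pair, the cycles each pair costs, and the clock rate. The function name `pio_frames_per_second` is likewise hypothetical.

```python
import math


def pio_frames_per_second(width, height, bytes_per_pixel,
                          bytes_per_transfer, cycles_per_transfer, clock_hz):
    """Upper bound on frame rate if the CPU does nothing but copy the frame."""
    frame_bytes = width * height * bytes_per_pixel
    transfers = math.ceil(frame_bytes / bytes_per_transfer)  # read+write pairs per frame
    cycles_per_frame = transfers * cycles_per_transfer
    return clock_hz / cycles_per_frame


# Assumed system: 640x480 frame, 2 bytes per pixel, 4 bytes moved per
# read/write pair, 20 cycles per pair, 50 MHz processor clock.
fps = pio_frames_per_second(640, 480, 2, 4, 20, 50_000_000)
print(f"{fps:.2f} frames per second")  # 16.28 frames per second
```

Under these assumptions the processor cannot reach 30 fps even when it spends every cycle on the copy, which is the motivation for handing the transfer to a DMA controller.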
DMA allows devices to transfer data without subjecting the processor to a heavy overhead.
A DMA controller is a device, usually peripheral to a CPU, that is programmed to perform a sequence of data transfers on behalf of the CPU. It is a master of all other peripherals, but it is still a slave to the processor.

Fig: DMA controller architecture

DMA Transfer
– A DMA controller can directly access memory and is used to transfer data from one memory location to another, or from an I/O device to memory and vice versa.
– A DMA controller manages several DMA channels, each of which can be programmed to perform a sequence of these DMA transfers.
– A DMA request signal (DRQ) for each channel is routed to the DMA controller.
– When the DMA controller sees a DMA request, it responds by performing one or many data transfers from that I/O device into system memory or vice versa.
– Channels must be enabled by the processor for the DMA controller to respond to DMA requests.
– The number of transfers performed, the transfer modes used, and the memory locations accessed depend on how the DMA channel is programmed.
– The processor configures the DMA controller (internal registers) with the following information:
   – Starting address from where data has to be transferred
   – Starting address where the data has to be stored
   – Total length of the data transfer
– The internal registers consist of source and destination address registers and transfer count registers for each DMA channel, as well as control and status registers for initiating, monitoring, and sustaining the operation of the DMA controller.
– In bus master mode, the DMA controller acquires the system bus (address, data, and control lines) from the CPU to perform the DMA transfers.
– In bus slave mode, the DMA controller is accessed by the CPU, which programs the DMA controller's internal registers to set up DMA transfers.
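The channel programming described above can be mimicked with a small software model. This is a sketch under stated assumptions, not a real controller: the class and method names (`DmaChannel`, `DmaController`, `program`, `request`) are hypothetical, memory is a plain `bytearray`, and a request copies the whole block in one burst without modelling bus arbitration.

```python
class DmaChannel:
    """Per-channel registers: source address, destination address, transfer count."""

    def __init__(self):
        self.source = 0
        self.dest = 0
        self.count = 0
        self.enabled = False
        self.done = False  # status flag set when the programmed transfer completes


class DmaController:
    def __init__(self, memory, num_channels=4):
        self.memory = memory
        self.channels = [DmaChannel() for _ in range(num_channels)]

    def program(self, channel, source, dest, count):
        """Bus slave mode: the CPU writes the channel registers and enables it."""
        ch = self.channels[channel]
        ch.source, ch.dest, ch.count = source, dest, count
        ch.done = False
        ch.enabled = True

    def request(self, channel):
        """DRQ on a channel; in bus master mode the controller moves the data."""
        ch = self.channels[channel]
        if not ch.enabled:
            return 0  # requests on disabled channels are ignored
        moved = 0
        while ch.count > 0:
            self.memory[ch.dest] = self.memory[ch.source]
            ch.source += 1
            ch.dest += 1
            ch.count -= 1
            moved += 1
        ch.done = True
        ch.enabled = False
        return moved


# Usage: copy 8 bytes from address 0 to address 16 on channel 0.
memory = bytearray(range(16)) + bytearray(16)
dma = DmaController(memory)
dma.program(0, source=0, dest=16, count=8)
moved = dma.request(0)
print(moved, list(memory[16:24]))  # 8 [0, 1, 2, 3, 4, 5, 6, 7]
```

The processor's only work here is the `program` call. The copy loop stands in for bus cycles the controller would perform on its own.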

Steps in a Typical DMA cycle
1. The processor completes the current bus cycle and then asserts the bus grant signal to the device.
2. The device then asserts the bus grant ack signal.
3. The processor senses the change in the state of the bus grant ack signal and starts listening to the data and address bus for DMA activity.
4. The DMA device performs the transfer from the source to the destination address.
5. During these transfers, the processor monitors the addresses on the bus and checks whether any location modified during DMA operations is cached in the processor. If the processor detects a cached address on the bus, it can take one of two actions:
   – The processor invalidates the internal cache entry for the address involved in the DMA write operation.
   – The processor updates the internal cache when a DMA write is detected.
6. Once the DMA operations have been completed, the device releases the bus by asserting the bus release signal.
7. The processor acknowledges the bus release and resumes its bus cycles from the point where it left off.

AD and DA Converters
Real signals (e.g., a voltage measured with a thermocouple or a speech signal recorded with a microphone) are analog in nature, varying continuously with time.
Digital format offers several advantages: manipulation, storage, use of computers, robust transmission, etc.
An ADC (Analog-to-Digital Converter) converts an analog signal to a digital format, and a DAC (Digital-to-Analog Converter) does the reverse.
Fig: Functional layout of the ADC and DAC
Step size or resolution: $\Delta = V_{ref} / (2^N - 1)$, for an N-bit DAC

DAC
Types of DAC
– DAC #1: Voltage Divider (the digital input controls the switches, and the amplifier provides the analog output)
– DAC #2: R/2R Ladder
– The first type is easier to analyze, while the second one is more practical from the implementation point of view.

ADC
The ADC consists of a sampler with a hold circuit (sample & hold), a quantizer, and a signal coder.
– The sampler, in its simplest form, is a semiconductor switch followed by a hold circuit, which is a capacitor with a very low leakage path.
– The hold circuit tries to maintain a constant voltage until the next switching. The quantizer is responsible for converting this voltage to a binary number.
– The coder is an optional device that is used after the conversion is complete.
Types of ADC
– Flash
– Single-Slope Integration
– Successive Approximation (SAR)
– Counter-type ADC
– Tracking-type ADC
Fig: Hold circuit output and quantized output

ADC
Flash ADC: also known as parallel ADC
– For an n-bit converter, we require $2^n - 1$ comparators, $2^n$ resistors, and an encoder logic.
– Advantage: the fastest type of ADC.
– Disadvantages: expensive; large power consumption.
– Applications: data acquisition, satellite communication, radar processing, sampling oscilloscopes, and high-density disk drives.

Interrupt
An interrupt is a signal informing a program or a device connected to the processor that an event has occurred.
– When a processor receives an interrupt signal, it takes a specified action depending on the priority and importance of the entity generating the signal.
– Interrupt signals can cause a program to suspend itself temporarily to service the interrupt by branching into another program, called the Interrupt Service Routine (ISR), for the specific device that caused the interrupt.
Types of interrupts: Interrupts can be broadly classified as
– Hardware Interrupts: interrupts caused by the connected devices.
– Software Interrupts: interrupts deliberately introduced by software instructions to generate user-defined exceptions.
– Trap: interrupts used by the processor alone to detect an exception such as divide by zero.
Depending on the service, interrupts can also be classified as
– Fixed interrupt
   – The address of the ISR is built into the microprocessor and cannot be changed.
   – Either the ISR is stored at that address, or a jump to the actual ISR is stored there if not enough bytes are available.
– Vectored interrupt
   – The peripheral must provide the address of the ISR.
   – Common when a microprocessor has multiple peripherals connected by a system bus.
– Maskable and Non-Maskable

Fig: Interrupt Driven Data Transfer (Vectored Interrupt)

Interrupt
Interrupt Latency: the time between an interrupt request being raised and the execution of the first instruction of the interrupt service routine.
Latency: Constant + Variable
– Variable part: architecture dependent, application dependent
Factors affecting latency:
– Masking and priority logic
– Delay to recognize the interrupt
– Time taken to complete the current instruction
– Saving the current state (return address, flags, registers)
– Locating the first instruction of the ISR
– Interrupts temporarily disabled
– Higher-priority interrupts have to be completed first
Interrupt Jitter: the variation in interrupt latency
Functions: CALL & RET
– CALL
   – Save the current state (context)
   – Pass the parameters needed for function execution
– Branch to the function body
– Execute the function body
– RET
   – Restore the context
   – Pass the result to the main/caller program
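The "constant + variable" view of interrupt latency can be sketched as a simple sum of the factors listed above. The cycle counts are made-up illustrative values, the function name `interrupt_latency` is hypothetical, and jitter is taken here as the difference between the worst-case and best-case latency, which is one common way to quantify it.

```python
def interrupt_latency(recognition, finish_instruction, save_context,
                      locate_isr, blocked=0):
    """Cycles from the interrupt request to the first ISR instruction.

    blocked covers time spent with interrupts masked or while a
    higher-priority ISR runs (the application-dependent part).
    """
    return recognition + finish_instruction + save_context + locate_isr + blocked


# Assumed cycle counts: the request arrives just as a short instruction
# finishes (best case) or during a long instruction while a higher-priority
# ISR is still running (worst case).
best_case = interrupt_latency(recognition=2, finish_instruction=1,
                              save_context=6, locate_isr=3)
worst_case = interrupt_latency(recognition=2, finish_instruction=4,
                               save_context=6, locate_isr=3, blocked=25)
jitter = worst_case - best_case

print(best_case, worst_case, jitter)  # 12 40 28
```

In this sketch the recognition, context-save, and ISR-lookup terms form the constant part, while the time to finish the current instruction and the blocked time account for all of the jitter.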
