Introduction to ARM Cortex-M Processors PDF
C-DAC, Pune
Summary
This document provides an introduction to ARM Cortex-M processors, their architecture, advantages, and potential use cases. It includes information about core concepts and basic development aspects of ARM Cortex-M microcontrollers.
Full Transcript
Introduction to ARM Cortex-M Processors
CDAC ACTS, Pune

My promise!
  I am confident that this course will save you many, many hours of studying, experimenting, and googling time to learn about this processor. I stand behind this course 100% and am committed to helping you.
  A promise that I need from you: 30 minutes of daily revision of what is taught in the class.

What am I going to get from this course?
  Programming microcontrollers using 'C'
  Learn about embedded software development and debugging using STM32CubeIDE
  Learn about mixed 'C' and assembly coding
  Demystifying memory, bus interfaces, NVIC, and exception handling with lots of animation
  Low-level register programming for interrupts, system exceptions, setting priorities, preemption, etc.
  Learn writing IRQ handlers, IRQ numbers, NVIC, and many more
  Learn about OS-related features like SVC, SysTick, PendSV, and many more

Cortex-M3 from a Software Developer's Point of View
  Programming model
  How exceptions are handled
  The memory map
  Peripheral interfacing
  How to use software driver libraries from the microcontroller vendor
    CMSIS-Core APIs
    STM32Cube libraries

Agenda
  What are the ARM Cortex-M processors?
    The Cortex-M3 and M4 processors
    The Cortex-M processor family
  Advantages of the Cortex-M processors
    Low power
    Performance
    Energy efficiency
    Code density
    Interrupts
    Ease of use, C friendly
    Scalability
  Applications of the ARM Cortex-M processors
  Background and history
    ARM processor evolution
    Architecture versions and Thumb ISA

What's Happening in Microcontrollers?
  Microcontrollers are getting cheap
    32-bit ARM Cortex-M3 microcontrollers at INR 1200
    Some microcontrollers sell for as little as INR 400
  Microcontrollers are getting powerful
    Lots of processing, memory, and I/O in one package
    Floating-point is even available in some!
  Microcontrollers are getting interactive
    Internet connectivity, new sensors and actuators
    LCD and display controllers are common
  This creates new opportunities for microcontrollers

Why ARM here?
  ARM is one of the most licensed and thus most widespread processor cores in the world
  Used especially in portable devices due to low power consumption and reasonable performance
  Several interesting extensions are available, such as the Thumb instruction set and Jazelle (Java)

ARM History
  ARM: Acorn RISC Machine (1983-1985), Acorn Computers Limited, Cambridge, England
  ARM: Advanced RISC Machine (1990), ARM Limited, 1990
  ARM has been licensed to many semiconductor manufacturers

ARM History
  Key component of many 32-bit embedded systems
  Portable consumer devices
  ARM1 prototype in 1985
  One of ARM's most successful cores is the ARM7TDMI, which provides high code density and low power consumption

Advanced RISC Machines
  ARM Core uses a ________ architecture
  ARM is a physical hardware design company.
  ARM licenses its cores out, and other companies make processors based on its cores.

RISC vs. CISC Architecture
  RISC                            CISC
  Fixed width instructions        Variable length instructions
  Few formats of instructions     Several formats of instructions
  Load/Store architecture         Memory values can be used as operands in instructions
  Large register bank             Small register bank
  Instructions are pipelinable    Pipelining is complex

RISC
  Advantages
    A smaller die size
    A shorter development time
    Higher performance (a bit tricky)
  Disadvantage
    Generally poor code density (fixed-length instructions)

CISC vs. RISC
  CISC: compiler, code generation, greater processor complexity
  RISC: greater compiler complexity, code generation, processor

Features used from RISC
  A Load/Store architecture
  Fixed-length 32-bit instructions
  3-address instruction formats

Load Store Architecture
  Memory can be accessed only through two dedicated instructions
    LDR ; move word from memory to register
    STR ; move word from register to memory
  All other instructions have to work on registers only.

3 Address Instruction Format
  f bits      n bits        n bits        n bits
  Function    op 1 addr.    op 2 addr.    dest. addr.
  Example: ADD d, s1, s2 ; d = s1 + s2

Von Neumann vs Harvard Architecture

What are ARM Cortex-M processors?
  The Cortex-M3 and Cortex-M4 are processors designed by ARM.
  The Cortex-M3 processor was the first of the Cortex generation of processors, released by ARM in 2005 (silicon products released in 2006).
  The Cortex-M4 processor was released in 2010 (released products also in 2010).
  The Cortex-M3 and Cortex-M4 processors use a 32-bit architecture. Internal registers in the register bank, the data path, and the bus interfaces are all 32 bits wide.
  The Instruction Set Architecture (ISA) in the Cortex-M processors is called the Thumb ISA and is based on Thumb-2 Technology, which supports a mixture of 16-bit and 32-bit instructions.

ARM Cortex-M3 Microcontroller
  18 x 32-bit registers
  Excellent compiler target
  Reduced pin count requirements
  Efficient interrupt handling
  Power management
  Efficient debug and development support features
    Breakpoints, watchpoints, Flash Patch support, instruction trace
  Strong OS support
    User/Supervisor model
    OS support features
  Designed to be fully programmed in C (even reset, interrupts, and exceptions)

ARM Cortex-M3 Microcontroller
  ARMv7-M architecture
    No cache, no MMU
    Debug is optimized for microcontroller applications
    Vector table contains addresses, not instructions
    DIV instruction
    Interrupts automatically save/restore state
    Exceptions programmed in C
    No Coprocessor 15: all registers are memory-mapped
    Interrupt controller is part of the Cortex-M3 macrocell
    Fixed memory map
    Bit-banding
    Non-Maskable Interrupt (NMI)
    Only one processor status register
  Thumb-2 processing core
    Mix of 16-bit and 32-bit instructions for very high code density
    Gives complete Thumb compatibility

The Cortex-M Processor Family

Advantages of Cortex-M Processors
  Low Power
  Currently, many Cortex-M microcontrollers have power consumption of less than 200 µA/MHz, with some of them well under 100 µA/MHz.
  In addition, the Cortex-M processors also include support for sleep mode features and can be used with various advanced ultra-low-power design technologies.
  Performance
  The Cortex-M3 and Cortex-M4 processors can deliver over 3 CoreMark/MHz and 1.25 DMIPS/MHz.
  Energy efficiency
  Code density
  Interrupts
  Ease of use, C friendly

Features Cont.
  Scalability
  Debug friendly
  OS support
  Versatile system features
    Bit banding
    MPU
  Software portability and reusability
    CMSIS

Applications of Cortex-M Processors
  Microcontrollers
  Automotive
  Industrial control
  Consumer products
  DSP

Architecture Version and Thumb ISA
  The successful ARM7TDMI is based on the architecture version ARMv4T (the "T" means Thumb instruction support).
  Note that architecture version numbers are independent of processor names.
  Architecture version 7 is divided into three profiles:
    Cortex-A processors: ARMv7-A architecture
    Cortex-R processors: ARMv7-R architecture
    Cortex-M processors: ARMv7-M, ARMv7E-M (for M4), and ARMv6-M architectures

ISA Enhancement and Evolution

Introduction to Embedded Software Development

Agenda
  What is inside typical ARM microcontrollers?
  What you need to start
    Development suites
    Development boards
    Debug adaptor
    Documentation and other resources
  Software development flow
    Compiling your application
    Software flow
      Polling
      Interrupt driven
      Multi-tasking system
  Input, output, and peripheral accesses
  Microcontroller interfaces
  Cortex Microcontroller Software Interface Standard (CMSIS)

What is inside typical ARM microcontrollers
  In many microcontrollers, the processor takes less than 10% of the silicon area, and the rest of the silicon die is occupied by other components such as:
    Program memory (e.g., flash memory)
    SRAM
    Peripherals
    Internal bus infrastructure
    Clock generator (including Phase Locked Loop), reset generator, and distribution network for these signals
    Voltage regulator and power control circuits
    Other analog components (e.g., ADC, DAC, voltage reference circuits)
    I/O pads

What you need to start
  Development suite: STM32CubeIDE
  Development board: STM32F4 Discovery Board
  Debug adaptor: ST-LINK/V2
  Documents and other resources: Link to Resources

Software Development Flow

Software Compiling Flow

Polling Flow

Interrupt Driven

Managing the ISR
  The code inside an ISR is generally kept as short as possible, in order to minimize the amount of time spent in the interrupt. This is important for a few reasons:
    If the interrupt occurs very often and the ISR contains a lot of instructions, there is a chance that the ISR won't return before being called again. For communication peripherals such as UART or SPI, this will mean dropped data (which obviously isn't desirable).
    Another reason to keep the code short is that other interrupts also need to be serviced.
  One way of keeping the instructions and responsibility in the ISR to a minimum is to do the smallest amount of work possible inside the ISR and then set a flag that is checked by code running in the super loop.

Multi Tasking System
  In these applications, a Real-Time Operating System (RTOS) can be used to handle the task scheduling.
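The flag-based approach from the Managing the ISR slide (do the minimum in the ISR, set a flag, and let the super loop do the rest) can be sketched in C roughly as follows. This is a host-compilable illustration under stated assumptions, not vendor code: the names `uart_rx_isr`, `super_loop_iteration`, `rx_flag`, and `rx_byte` are hypothetical, and on a real STM32 target the ISR would be a handler whose name comes from the startup vector table and would read the byte from a peripheral register.

```c
#include <stdbool.h>
#include <stdint.h>

/* Shared between the ISR and the super loop. 'volatile' forces the
 * compiler to re-read them on every access. All names are hypothetical. */
static volatile bool rx_flag = false;
static volatile uint8_t rx_byte = 0;
static uint32_t bytes_processed = 0;

/* ISR: capture the data, set a flag, and return as quickly as possible. */
void uart_rx_isr(uint8_t data_from_peripheral)
{
    rx_byte = data_from_peripheral;
    rx_flag = true;
}

/* One pass of the super loop: the slow work happens here, not in the ISR.
 * Returns true if a byte was processed in this pass.
 * On real hardware the read-and-clear below would also need protection
 * against the ISR firing in between (for example by briefly masking it). */
bool super_loop_iteration(void)
{
    if (!rx_flag) {
        return false;
    }
    uint8_t byte = rx_byte;
    rx_flag = false;
    (void)byte; /* lengthy processing of 'byte' would go here */
    bytes_processed++;
    return true;
}

uint32_t get_bytes_processed(void)
{
    return bytes_processed;
}
```

On a target, `super_loop_iteration()` would simply be called forever from `main()`, so the time spent in interrupt context stays at a couple of stores.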
  An RTOS allows multiple processes to be executed concurrently, by dividing the processor's time into time slots and allocating the time slots to the processes that need them.

RTOS to Handle Multiple Tasks

RTOS vs Superloop

ARM Data Size Definition & Interfaces

Technical Overview

Agenda
  General information about the Cortex-M3 and M4 processors
    Processor type and architecture
    Instruction set
    Block diagram
    Memory system
    Interrupt and exception support
  Features of the Cortex-M3 and Cortex-M4 processors
    Performance
    Code density
    Low power
    Memory system and protection unit
    Interrupt handling
    OS support
    Debug support
    Scalability and compatibility

Processor type and Architecture
  All the ARM Cortex-M processors are 32-bit RISC (Reduced Instruction Set Computing) processors. They have:
    32-bit registers
    32-bit internal data path
    32-bit bus interface
    3-stage pipeline (fetch, decode, execute)
    Harvard bus architecture (simultaneous instruction fetch and data access)
    32-bit addressing, which allows a 4 GB address space
    Load and store architecture
    Processor architecture: ARMv7-M

Instruction Set - Thumb2
  High performance and high code density

The Thumb-2 instruction set
  Variable-length instructions
    ARM instructions are a fixed length of 32 bits
    Thumb instructions are a fixed length of 16 bits
    Thumb-2 instructions can be either 16-bit or 32-bit
  Thumb-2 gives approximately 26% improvement in code density over ARM
  Thumb-2 gives approximately 25% improvement in performance over Thumb

ARM and Thumb Mode Switching

Thumb2 No Switching Req.
Block Diagram

Various Bus Interfaces on Cortex-M3

AMBA System
  [Figure: AMBA system. A high-performance ARM processor, a high-bandwidth external memory interface, high-bandwidth on-chip RAM, and a DMA bus master sit on the AHB; a bridge connects the AHB to the APB, which serves the UART, timer, keypad, and PIO.]
  AHB: high performance, pipelined, burst support, multiple bus masters
  APB: low power, non-pipelined, simple interface

Memory Map
  Very simple linear 4 GB memory map
  The Bus Matrix partitions memory access via the AHB and PPB buses

Memory System & Interrupt support
  Typically, the microcontroller vendor will need to add the following items to the memory system:
    Program memory, typically flash
    Data memory, typically SRAM
    Peripherals
  The Cortex-M3 and Cortex-M4 processors include an interrupt controller called the Nested Vectored Interrupt Controller (NVIC). It is programmable and its registers are memory mapped.
  The address location of the NVIC is fixed and the programmer's model of the NVIC is consistent across all Cortex-M processors.
  The NVIC supports a number of system exceptions.

Features of Cortex-M3 - Performance
  The three-stage pipeline allows most instructions, including multiply, to execute in a single cycle, and at the same time allows high clock frequencies for microcontroller devices: typically over 100 MHz, and up to approximately 200 MHz in modern semiconductor manufacturing processes.
  Multiple bus interfaces allow simultaneous instruction and data accesses to be performed.
  The pipelined bus interface allows a higher clock frequency in the memory system.
  The highly efficient instruction set allows complex operations to be carried out in a small number of instructions.
  Each instruction fetch is 32 bits wide, and most instructions are 16-bit. Therefore, up to two instructions can be fetched at a time.

Code Density
  Thumb-2 technology allows 16-bit instructions and 32-bit instructions to work together without any state switching overhead. Most simple operations can be carried out with a 16-bit instruction.
  Various memory addressing modes for efficient data accesses
  Multiple memory accesses can be carried out in a single instruction
  Support for hardware divide instructions and Multiply-and-Accumulate (MAC) instructions exists in both Cortex-M3 and Cortex-M4
  Instructions for bit field processing in Cortex-M3/M4
  Single Instruction, Multiple Data (SIMD) instruction support exists in Cortex-M4

Low Power
  The Cortex-M processors provide a number of low power features. These include:
    Multiple sleep modes defined in the architecture
    Integrated architectural clock gating support, which allows clock circuits for parts of the processor to be deactivated when the section is not in use
  The processors also have additional optional hardware support:
    Wakeup Interrupt Controller (WIC) to enable advanced low power technologies such as State Retention Power Gating (SRPG)

Memory Protection Unit
  If an MPU is included, applications can divide the memory space into a number of regions and define the access permissions for each of them.
  When an access rule is violated, a fault exception is generated and the fault exception handler will be able to analyze the problem and, if possible, correct it.

Interrupt Handling
  Supports up to 240 interrupt inputs, a Non-Maskable Interrupt (NMI) input, and a number of system exceptions.
  Each interrupt (except NMI) can be individually enabled or disabled.
  Programmable priority levels for interrupts. The priority levels can be changed dynamically at run time.
  Automatic handling of interrupt/exception prioritization and nested interrupt/exception handling.
  Vector table can be relocated to various areas in the memory.
  Low interrupt latency: with a zero-wait-state memory system, the interrupt latency is only 12 cycles.
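As a rough illustration of how each interrupt can be individually enabled: the NVIC keeps one enable bit per interrupt, packed 32 to a register, starting at the Interrupt Set-Enable Register base address 0xE000E100, so 240 interrupts need eight such registers. The sketch below only computes the register address and bit mask for a given IRQ number, so it can be compiled and run on a host. The helper names are hypothetical; on a target, the CMSIS-Core function `NVIC_EnableIRQ()` would normally be used instead of hand-written address arithmetic.

```c
#include <assert.h>
#include <stdint.h>

#define NVIC_ISER_BASE 0xE000E100u /* NVIC Interrupt Set-Enable Register 0 */
#define NVIC_MAX_IRQS  240u        /* upper limit quoted for the Cortex-M3 */

/* Address of the set-enable register that holds the bit for this IRQ. */
uint32_t nvic_iser_address(uint32_t irq)
{
    assert(irq < NVIC_MAX_IRQS);
    return NVIC_ISER_BASE + 4u * (irq / 32u);
}

/* Mask of the enable bit for this IRQ inside that register. */
uint32_t nvic_iser_mask(uint32_t irq)
{
    assert(irq < NVIC_MAX_IRQS);
    return (uint32_t)1u << (irq % 32u);
}
```

On a target, writing the mask to the computed address enables that one interrupt; writing zeros to the other bit positions of a set-enable register has no effect, which is why no read-modify-write is needed.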
STM32F4x Block Diagram

Architecture

Agenda
  Introduction to the architecture
  Programmer's model
    Operating modes and states
    Registers
    Special registers
  Behavior of the Application Program Status Register (APSR)
    Integer status flags
    Q status flag
    GE bits
  Memory system
    Memory system features and memory map
    Stack memory and MPU
  Exceptions and interrupts
    What are exceptions?
    Nested Vectored Interrupt Controller (NVIC)
    Vector table and fault handling
  System Control Block
  Debug and reset sequence

Programmer's Model
  Operation states and modes

CPU Operating Modes - Cont.
  Operation modes
    Handler mode: when executing an exception handler such as an Interrupt Service Routine (ISR). When in Handler mode, the processor always has privileged access level.
    Thread mode: when executing normal application code, the processor can be either in privileged access level or unprivileged access level. This is controlled by the CONTROL register.

Register Set
  R0-R12: general purpose registers
    R0-R7: low registers. Due to the limited space available in the instruction set, many 16-bit instructions can only access the low registers.
    R8-R12: high registers, which can be used by 32-bit instructions
    The initial values of R0-R12 are undefined.
  R13: Stack Pointer
  R14: Link Register
  R15: Program Counter

Stack Pointer - R13
  Physically there are two different Stack Pointers:
    The Main Stack Pointer (MSP) is the default Stack Pointer. It is selected after reset, or when the processor is in Handler mode.
    The other Stack Pointer is called the Process Stack Pointer (PSP). The PSP can only be used in Thread mode.
  The selection of Stack Pointer is determined by a special register called CONTROL.
  The PSP is normally used when an embedded OS is involved, where the stacks for the OS kernel and the application tasks are separated.

Link Register - R14
  This is used for holding the return address when calling a function or subroutine or ISR.
  At the end of the function or subroutine, the program control can return to the calling program and resume by loading the value of LR into the Program Counter (PC).
  When a function or subroutine call is made, the value of LR is updated automatically.
  If a function needs to call another function or subroutine, it needs to save the value of LR on the stack first. Otherwise, the current value in LR will be lost when the function call is made.

Link Register Flow diagram

Program Counter - R15
  It is readable and writeable: a read returns the current instruction address plus 4 (this is due to the pipelined nature of the design, and the compatibility requirement with the ARM7TDMI processor).
  Writing to the PC (e.g., using data transfer/processing instructions) causes a branch operation.

Special Registers
  They are needed for development of an embedded OS, or when advanced interrupt masking features are needed.
  Special registers are not memory mapped, and can be accessed using the special register access instructions MSR (move to special register from general-purpose register) and MRS (move to general-purpose register from special register).

Program Status Register
  Bit:     31   30   29   28   27   26:25   24   15:10    8:0
  Field:   N    Z    C    V    Q    IT      T    IT/ICI   ISR Number
  One status register consisting of:
    APSR - Application Program Status Register: ALU flags
    IPSR - Interrupt Program Status Register: interrupt/exception number
    EPSR - Execution Program Status Register
      IT field: If/Then block information
      ICI field: Interruptible-Continuable Instruction information
  Please note:
    The EPSR cannot be accessed by software code directly using MRS (reads as zero) or MSR.
    The IPSR is read only and can be read from the combined PSR (xPSR).

Program Status Register: xPSR

APSR
  Bits 28 to 31 are the ALU condition code flags.
  Q bit: sticky overflow flag, used by saturating instructions.
  These ALU flags are the only ones you can modify in Thread mode.

Q Flag - Signed & Unsigned Saturation

IPSR

EPSR
  Note: Since the Cortex-M3 supports only the Thumb state, bit 24, i.e.
  the T bit, will always be high and you should never attempt to change it.
  The IT (IF-THEN) instruction statement contains the IT instruction opcode with up to three additional optional suffixes of "T" (then) and "E" (else), followed by the condition to check against, which is the same as the condition symbol for conditional branches. The "T"/"E" suffixes indicate how many subsequent instructions are inside the IT instruction block, and whether they should or should not be executed if the condition is met.

Mask registers
  The PRIMASK register prevents activation of all exceptions with configurable priority.
    Write 1 into PRIMASK to disable all interrupts with configurable priority:
      MOV R0, #1
      MSR PRIMASK, R0
    Write 0 into PRIMASK to enable all interrupts.
  The FAULTMASK register prevents activation of all exceptions except for the Non-Maskable Interrupt (NMI).
  The BASEPRI register defines the minimum priority for exception processing. When BASEPRI is set to a nonzero value, it prevents the activation of all exceptions with the same or lower priority level as the BASEPRI value.
    To disable all interrupts at or below a certain priority level, write that priority into BASEPRI:
      MOV R0, #0x60
      MSR BASEPRI, R0

CONTROL register
  The CONTROL register controls the stack used and the privilege level for software execution when the processor is in Thread mode.

Stacks

Both Thread and Handler using MSP

Thread uses PSP & Handler uses MSP

Switching Privileged & Unprivileged

Privilege, Modes and Stacks

Memory System
  Memory system features
    4 GB linear address space
    Architecturally defined memory map
    Support for little endian and big endian memory systems
    Bit band accesses (optional)
    Write buffer: when a write transfer to a bufferable memory region will take multiple cycles, the transfer can be buffered by the internal write buffer in the Cortex-M3 or Cortex-M4 processor so that the processor can continue to execute the next instruction, if possible.
    This allows higher program execution speed.
    Memory Protection Unit (optional)
    Unaligned transfer support

Memory Map
  Copy from FLASH to SRAM

Stack Memory

Exceptions and Interrupts
  What are exceptions?
    Exceptions are events that cause changes to program flow. When one happens, the processor suspends the currently executing task and executes a part of the program called the exception handler.
    After the execution of the exception handler is completed, the processor then resumes normal program execution.
    In the ARM architecture, interrupts are one type of exception. Interrupts are usually generated from peripherals or external inputs, and in some cases they can be triggered by software. The exception handlers for interrupts are also referred to as Interrupt Service Routines (ISRs).

Exception sources

Exception Types

Nested Vectored Interrupt Controller Features
  Flexible exception and interrupt management
    Each interrupt (apart from the NMI) can be enabled or disabled and can have its pending status set or cleared by software.
  Nested exception/interrupt support
    Each exception has a priority level. Some exceptions, such as interrupts, have programmable priority levels and some others (e.g., NMI) have a fixed priority level. When an exception occurs, the NVIC will compare the priority level of this exception to the current level. If the new exception has a higher priority, the currently running task will be suspended.
  Vectored exception/interrupt entry
    The Cortex-M processors automatically locate the starting point of the exception handler from a vector table in the memory. As a result, the delays from the start of the exception to the execution of the exception handlers are reduced.
  Interrupt masking

Starting Address of Exception Handler
  To determine the starting address of the exception handler, a vector table mechanism is used.
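A minimal sketch of the lookup arithmetic behind the vector table mechanism: every vector is one 32-bit word, so the vector for exception number n sits at the table base plus n times 4, and external interrupt IRQn is exception number 16 + n. The function names are hypothetical, and the code is written to run on a host rather than on the target.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_EXCEPTIONS 256u /* 16 system exceptions + up to 240 interrupts */

/* Address of the vector (handler start address) for an exception number.
 * table_base is 0x00000000 after reset, or the relocated table address. */
uint32_t vector_address(uint32_t table_base, uint32_t exception_number)
{
    assert(exception_number < MAX_EXCEPTIONS);
    return table_base + 4u * exception_number;
}

/* External interrupt IRQn is exception number 16 + n. */
uint32_t irq_vector_address(uint32_t table_base, uint32_t irq)
{
    return vector_address(table_base, 16u + irq);
}
```

With the table at address 0, this gives 0x00000004 for the reset vector (exception 1), 0x00000008 for the NMI vector (exception 2), and 0x00000040 for IRQ0.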
  The vector table is an array of word data inside the system memory, each word representing the starting address of one exception type.
  Example: The vector table is located at address 0x0 after reset. Since reset is exception type 1, the address of the reset vector is 1 times 4 (each word is 4 bytes), which equals 0x00000004, and the NMI vector (type 2) is located at n x 4 = 2 x 4 = 0x00000008.
  The address 0x00000000 is used to store the starting value of the MSP.
  The LSB of each exception vector indicates whether the exception is to be executed in the Thumb state.

Vector Table
  The LSB of exception vectors should be set to 1 to indicate Thumb state.

Fault Handling

System Control Block (SCB)
  One part of the processor that is merged into the NVIC unit is the SCB. The SCB contains various registers for:
    Controlling processor configurations (e.g., low power modes)
    Providing fault status information (fault status registers)
    Vector table relocation (VTOR)
  The SCB is memory-mapped. Similar to the NVIC registers, the SCB registers are accessible from the System Control Space (SCS).

CoreSight Debug and Trace Technology

CoreSight features
  CoreSight features can be accessed through a JTAG or Serial Wire interface. Debugging in JTAG and Serial Wire mode at the same time is not possible.
  Cortex-M processor-based devices can include a:
  Debug Interface
    The debug interface offers two modes:
      JTAG Debug is the industry-standard interface that allows device chaining.
      Serial Wire Debug is a 2-pin interface with an optional Serial Wire Trace Output. In contrast to JTAG, devices cannot be chained.
    The Debug Interface communicates with the following units:
      Run Control: allows the user to start, stop, and single-step through the source code.
      Breakpoint Unit: allows the user to set breakpoints even while the processor is running.
      Memory Access Unit: allows the user to read or write to memory and peripheral registers even while the program is running.
      (Variable viewer: live expressions, registers)

Debug connections

Trace Port Interface
  The Trace Port Interface encodes and provides trace information via two possible interfaces:
    The Serial Wire Trace Output pin (SWO) can be used in Serial Wire Debug mode only.
    The 4-Pin Trace Output has a greater bandwidth than Serial Wire Trace Output and uses 5 functional pins (4 data pins plus a clock). It is the only way to output ETM trace data.
  The Trace Port Interface communicates with the following units:
    Embedded Trace Macrocell (ETM): can be used for instruction tracing to debug historical sequences, for software profiling, and for code coverage analysis. ETM data are output through an extra 4-bit interface.
    Instrumentation Trace Macrocell (ITM): provides application information like debug printf(), RTOS information, unit test, or UML annotation.
    Data Watchpoint & Trace Unit (DWT): provides PC sampling, event counters, timing, and interrupt execution information. In addition, it allows Access Breakpoints for up to four memory addresses.

Cortex Reset Sequence & Startup

Vector Table at Startup

Reset Sequence

Reset Behavior

Status of SP and PC during reset

Task to be Performed

Initialization Summary

Memory System

Agenda
  Memory map
  Connecting the processor to memory and peripherals
  Memory requirements
  Memory endianness
  Data alignment and unaligned data access
  Bit-band operations
  Default memory access permissions
  Exclusive access
  Memory barrier
  Memory system in a microcontroller

Memory Map

Memory Endianness
  [Figure: the Cortex-M3 (byte-invariant big-endian, BE-8) - data on the AHB bus]
  [Figure: little endian - data on the AHB bus]

Data alignment and unaligned data access

Bit Banding
  Normally, bit manipulation requires a read-modify-write operation, which is expensive in terms of the number of CPU cycles taken.
  To overcome this limitation, a technique called bit banding allows direct bit manipulation on sections of the peripheral and SRAM memory, without the need for special instructions.
  While performing bit-banding operations we need to consider two memory regions:
    Bit-band region [1 MB]
    Bit-band alias region [32 MB]
  Bit banding works by mapping each bit in the bit-band region to a word in the alias region.

Bit Access to Bit band region

Bit Banding Mapping

Cortex-M3 Bit Banding
  Calculate the bit-band alias address for a given bit-band memory address and bit position.
  Exercise: access the 5th bit position of the memory location 0x20000200 using its alias address.
  Formula:
    alias_address = alias_base + (32 * (bitband_memory_addr - bitband_base_address)) + (bit * 4)

Cortex-M3 Bit Banding
  Writes to a word address in the bit-band alias affect a single bit in the bit-band region
  The write is translated to an atomic read-modify-write by the Cortex-M3 bus matrix
  Bit 0 of the stored register is written to the appropriate bit
  [Figure: each word in the 32 MB bit-band alias maps to one physical bit in the 1 MB bit-band region]

Advantages of Bit-Band
  Bit-Band vs. Bit-Bang
    In the Cortex-M3, we use the term bit-band to indicate that the feature is a special memory band (region) that provides bit accesses. Bit-bang commonly refers to driving I/O pins under software control to provide serial communication functions. The bit-band feature in the Cortex-M3 can be used for bit-banging implementations, but the definitions of these two terms are different.
  Without bit-banding, testing a status bit means: reading the whole register, masking the unwanted bits, then comparing and branching.
  With bit-banding, you can simplify the operations to: reading the status bit via the bit-band alias (get 0 or 1), then comparing and branching.

Memory Access attributes
  Bufferable: A write to memory can be carried out by a write buffer while the processor continues on to the next instruction.
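Returning to the bit-band alias formula given on the Cortex-M3 Bit Banding slide, the calculation can be sketched in C as below. This is a host-compilable sketch that only does the address arithmetic; the function and macro names are hypothetical, and the base addresses assume the standard Cortex-M3 memory map (SRAM bit-band region at 0x20000000 with its alias at 0x22000000).

```c
#include <assert.h>
#include <stdint.h>

#define SRAM_BITBAND_BASE  0x20000000u /* start of the 1 MB SRAM bit-band region */
#define SRAM_BITBAND_ALIAS 0x22000000u /* start of the 32 MB SRAM alias region   */
#define BITBAND_REGION_LEN 0x00100000u /* 1 MB */

/* alias = alias_base + 32 * (addr - region_base) + 4 * bit */
uint32_t bitband_alias(uint32_t region_base, uint32_t alias_base,
                       uint32_t byte_addr, uint32_t bit)
{
    assert(bit < 8u); /* bit position within the addressed byte */
    assert(byte_addr >= region_base &&
           byte_addr - region_base < BITBAND_REGION_LEN);
    return alias_base + 32u * (byte_addr - region_base) + 4u * bit;
}

/* Convenience wrapper for the SRAM bit-band region. */
uint32_t sram_bitband_alias(uint32_t byte_addr, uint32_t bit)
{
    return bitband_alias(SRAM_BITBAND_BASE, SRAM_BITBAND_ALIAS, byte_addr, bit);
}
```

For the exercise on the slide, bit 5 of 0x20000200 works out to 0x22000000 + 32 * 0x200 + 5 * 4 = 0x22004014. On a target, storing 1 or 0 to that word through a `volatile uint32_t *` sets or clears just that one bit.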
  Cacheable: Data obtained from a memory read can be copied to a memory cache so that the next time it is accessed the value can be obtained from the cache to speed up program execution.
  Executable: The processor can fetch and execute program code from this memory region.
  Shareable: Data in this memory region could be shared by multiple bus masters. The memory system needs to ensure coherency of data between different bus masters in the shareable memory region.

Exclusive Access via MUTEX semaphore

Memory system in a Microcontroller
  In many microcontroller devices, the designs integrate additional memory system features such as:
    Boot loader
    Memory remapping
    Memory alias
  There are many different reasons why chip designers put a boot loader into the system. For example, to:
    Provide a flash programming utility, so that you can program the flash using a simple UART interface, or even program some parts of the flash memory dynamically.
    Provide Built-In Self Test (BIST) for the chip.

Bootloaders and Memory Remapping
  For chips with a boot loader ROM, the boot loader is executed when the system is started, so it has to be located at address 0 when the system starts at power up.
  However, the next time the system starts, it might not need to execute the boot loader again and can run the application in the flash directly, so the memory map needs to be changed.
  In order to do this, the address decoder needs to be programmable. A hardware register (e.g., a peripheral register in a system control unit) can be used.
  The operation to switch the memory map is called "Memory Remap." This operation is done by the boot loader.

Memory remap implementation with boot loader

Possibilities of Re-mapping
  Normally the boot loader is accessible from address 0 using a memory address alias, and the alias can be turned off.
  There are many possible memory system configurations. What is shown in the previous slide is just one of the possibilities.
Other Possibilities
  In some microcontrollers, memory remapping is not needed, as the boot loader present at address 0 is executed every time the system starts up.
  The vector table is then relocated using the vector table relocation feature provided by the processor, so there is no need to use any remap to handle vector fetches.

Exceptions and Interrupts

Agenda
  Exception types
  Overview of interrupt management
  Definitions of priority
  Vector table and vector table relocation
  Interrupt inputs and pending behaviors
  Details of NVIC registers for interrupt control
    Summary
    Interrupt enable registers
    Interrupt set pending and clear pending
    Active status
    Priority level
    Software trigger interrupt register
    Interrupt controller type register

Agenda
  Details of SCB registers for exception and interrupt control
    Summary of the SCB registers
    Interrupt control and state register (ICSR)
    Vector table offset register (VTOR)
    Application interrupt and reset control register
    System handler priority registers
    System handler control and state register

Exception Type

Exception types

List of Interrupts

CMSIS-Core Exception Definitions

Commonly Used CMSIS-Core Functions

Definition of Priority
  [Figure: a priority-level register with 3 bits implemented (8 programmable priority levels)]
  [Figure: a priority-level register with 4 bits implemented (16 programmable priority levels)]

Group Priority
  If the priority-level configuration registers are 8 bits wide, why are there only 128 pre-emption levels?
  This is because the 8-bit register is further divided into two parts: group priority and sub-priority.
  Using a configuration register in the System Control Block (SCB) called Priority Group, the priority-level configuration register for each exception with programmable priority levels is divided into two parts.
  The group priority level defines whether an interrupt can take place when the processor is already running another interrupt handler.
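The split into group priority and sub-priority can be sketched as follows. This is a host-side model, not CMSIS code: the function names are hypothetical, and it assumes all 8 priority bits are implemented, as in the question on the slide. For a Priority Group setting p (0 to 7), bits [7:p+1] of the priority byte form the group (pre-emption) priority and bits [p:0] form the sub-priority, so even with p = 0 only 7 bits are left for pre-emption, which gives 128 levels.

```c
#include <assert.h>
#include <stdint.h>

/* Group (pre-emption) priority: bits [7:prigroup+1] of the priority byte. */
uint8_t group_priority(uint8_t priority, uint8_t prigroup)
{
    assert(prigroup <= 7u);
    return (uint8_t)(priority >> (prigroup + 1u));
}

/* Sub-priority: bits [prigroup:0] of the priority byte. */
uint8_t sub_priority(uint8_t priority, uint8_t prigroup)
{
    assert(prigroup <= 7u);
    uint8_t mask = (uint8_t)((1u << (prigroup + 1u)) - 1u);
    return (uint8_t)(priority & mask);
}

/* Number of distinct pre-emption levels for a Priority Group setting. */
uint32_t preemption_levels(uint8_t prigroup)
{
    assert(prigroup <= 7u);
    return (uint32_t)1u << (7u - prigroup);
}
```

For example, with a Priority Group setting of 4, a priority byte of 0xE0 has group priority 7 and sub-priority 0, and there are 8 pre-emption levels in total.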
The sub-priority level is used only when two exceptions with the same group-priority level occur at the same time.

Group Priority

Vector Table and its Relocation
When the Cortex-M processor accepts an exception request, the processor needs to determine the starting address of the exception handler (or ISR if the exception is an interrupt). This information is stored in the vector table in the memory. By default, the vector table starts at memory address 0. The vector table is normally defined in the startup code provided by the microcontroller vendors. Usually, the starting address (0x00000000) should be boot memory, and it will usually be either flash memory or ROM devices. The Vector Table Relocation feature provides a programmable register called the Vector Table Offset Register (VTOR).

Vector Table Offset Register

Vector Table Relocation for Boot ROM

Booting from Pen Drive or SD Card

Interrupt Inputs and Pending Behavior
There are various status attributes applicable to each interrupt:
Each interrupt can either be disabled (default) or enabled
Each interrupt can either be pending (a request is waiting to be served) or not pending
Each interrupt can either be in an active (being served) or inactive state
An interrupt request can be accepted by the processor if:
The pending status is set,
The interrupt is enabled, and
The priority of the interrupt is higher than the current level

Interrupt pending and activation behavior

Register In NVIC for Interrupt Control

Exceptions Handling In Detail
CDAC ACTS, Pune

Agenda
Exception sequences
Exception Entrance and stacking
Exception return and unstacking
Interrupt latency and exception handling optimization

Exception entrance and stacking

Nested Interrupt Stacking

Interrupt Handling
One Non-Maskable Interrupt (INTNMI) supported
1-240 prioritizable interrupts supported
Interrupts can be masked
Implementation option selects number of interrupts supported
Nested Vectored Interrupt Controller (NVIC) is tightly
coupled with the processor core
Interrupt inputs are active HIGH
[Figure: Cortex-M3 interrupts. INTNMI and INTISR[239:0] (1-240 interrupts) feed the NVIC, which connects to the Cortex-M3 processor core]

NVIC Operations
Exception Entry/Exit

Interrupt Preemption

Low Power and System Control Features
CDAC ACTS, Pune

Power Management
Multiple sleep modes supported
Controlled by NVIC
Sleep Now - Wait for Interrupt/Event instructions
Sleep On Exit - Sleep immediately on return from last ISR
Deep Sleep
Long duration sleep, so PLL can be stopped
Exports additional output signal SLEEPDEEP
Cortex-M3 system is clock gated in all sleep modes
Sleep signal is exported allowing external system to be clock gated also
NVIC interrupt interface stays awake
Wake-Up Interrupt Controller (WIC)
External wake-up detector allows Cortex-M3 to be fully powered down
Effective with State-Retention / Power Gating (SRPG) methodology

Clock Gating
Clock gating is a popular technique used in many synchronous circuits for reducing dynamic power dissipation. Clock gating saves power by adding more logic to a circuit to prune the clock tree. Pruning the clock (selectively disabling the clock signal) disables portions of the circuitry so that the flip-flops in them do not have to switch states, because switching states consumes power. When the flip-flops are not being switched, the switching power consumption goes to zero, and only leakage currents are incurred. Leakage currents are small, unwanted currents that flow through a semiconductor device even when it is supposed to be in the off state.

Activities in an Interrupt-Driven Application

Various Power Modes

State Retention Power Gating
In some designs, an advanced power-saving technique called State Retention Power Gating (SRPG) can be used to reduce the leakage current of the chip by a wide margin. In SRPG designs, the registers (often called flip-flops in IC design terminology) have a separate power supply for state retention elements inside the registers.
When the system is in Deep Sleep mode, the normal power supply can be turned off, leaving only the power to the state retention elements on. The leakage in this type of design is greatly reduced because the combinational logic, clock buffers, and most parts of the registers are powered down.

SRPG

System Timer (SysTick)

Cortex-M3 Pipeline
Cortex-M3 has a 3-stage fetch-decode-execute pipeline
Similar to ARM7
Cortex-M3 does more in each stage to increase overall performance
[Figure: Cortex-M3 pipeline. 1st stage, Fetch: instruction fetch (prefetch). 2nd stage, Decode: instruction decode and register read, AGU, branch forwarding and speculation. 3rd stage, Execute: load/store address phase and data phase, multiply and divide, shift, ALU and branch, write back; execute-stage branch (ALU branch and load/store branch)]

The Cortex microcontroller software interface standard (CMSIS)
CMSIS was developed by ARM to allow microcontroller and software vendors to use a consistent software infrastructure to develop software solutions for Cortex-M microcontrollers. Many software products for Cortex-M microcontrollers are CMSIS-compliant. Currently the Cortex-M microcontroller market comprises:
More than 15 microcontroller vendors shipping Cortex-M microcontroller products, with some other silicon vendors providing Cortex-M based FPGA and ASICs
More than 10 toolchain vendors
More than 30 embedded operating systems
Additional Cortex-M middleware software providers for codecs, communication protocol stacks, etc.
With such a large ecosystem, some form of standardization of the way the software infrastructure works becomes necessary to ensure software compatibility.

CMSIS
To increase the interoperability of various software components, ARM worked with various microcontroller vendors, tools vendors, and software solution providers to develop CMSIS, a software framework covering most Cortex-M processors and Cortex-M microcontroller products.
The aims of CMSIS include:
Enhanced software reusability - makes it easier to reuse software code in different Cortex-M projects, reducing time to market and verification efforts.
Enhanced software compatibility - by having a consistent software infrastructure (e.g., API for processor core access functions, system initialization method, common style for defining peripherals), software from various sources can work together, reducing the risk in integration.
Easy to learn - the CMSIS allows easy access to processor core features from the C language. In addition, once you learn to use one Cortex-M microcontroller product, starting to use another Cortex-M product is much easier because of the consistency in software setup.
Toolchain independent - CMSIS-compliant device drivers can be used with various compilation tools, providing much greater freedom.
Openness - the source code for CMSIS core files can be downloaded and accessed by everyone, and everyone can develop software products with CMSIS.

CMSIS Projects
CMSIS-Core (Cortex-M processor support)
CMSIS-DSP library
CMSIS-SVD - the CMSIS System View Description
CMSIS-RTOS - the CMSIS-RTOS is an API specification for embedded OS
CMSIS-DAP - the CMSIS-DAP (Debug Access Port)

Standardization in CMSIS-Core
Standardized access functions to access processor's features - These include various functions for interrupt control using NVIC, and functions for accessing special registers in the processors.
Standardized functions for system initialization - Most modern feature-rich microcontroller products require some configuration of clock circuitry and power management registers before the application starts.
In CMSIS-compliant device-driver libraries, these configuration steps are placed in a function called "SystemInit()." The implementation of this function is device specific. However, having a standardized function name and a standardized location where this function can be found makes it much easier for a designer to pick up and start using a new Cortex-M microcontroller device.
Standardized software variables for clock speed information - This might not be obvious, but often our application code does need to know what clock frequency the system is running at. For example, such information might be needed for setting up the baud rate divider in a UART, or to initialize the SysTick timer for an embedded OS.

Organization of CMSIS-Core
In a general sense, we can divide CMSIS into multiple layers:
Core Peripheral Access Layer - Name definitions, address definitions, and helper functions to access core registers and core peripherals. This is processor specific and is provided by ARM.
Device Peripheral Access Layer - Name definitions, address definitions of peripheral registers, as well as system implementations including interrupt assignments, exception vector definitions, etc. This is device specific (note: multiple devices from the same vendor might use the same file set).
Access Functions for Peripherals - The driver code for peripheral accesses. This is vendor specific and is optional. You can choose to develop your application using the peripheral driver code provided by the microcontroller vendor, or you can program the peripherals directly if you prefer.
There is also a proposed additional layer for peripheral accesses:
Middleware Access Layer - This layer does not exist in the current version of CMSIS. The idea is to develop a set of APIs for interfacing common peripherals such as UART, SPI, and Ethernet. If this layer exists, developers of middleware can develop their applications based on this layer to allow software to be ported between devices easily.

CMSIS-Core structure

Using CMSIS in Project
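
As a small illustration of why application code needs the clock speed variable, the host-side sketch below derives a SysTick reload value and a UART divider from it. Several things here are assumptions: the `SystemCoreClock` variable is only a stand-in defined locally with an assumed 72 MHz value (on a real device it comes from the CMSIS device files), and `systick_reload_for_hz()` and `uart_divider()` are hypothetical helpers. The simple clock / baud formula is generic, and the real divider format depends on the UART in your device.

```c
#include <stdint.h>

/* Host-side stand-in (assumption): on a real device this variable is
   provided by the CMSIS device files and updated by the system code. */
uint32_t SystemCoreClock = 72000000u;   /* assumed 72 MHz core clock */

#define SYSTICK_MAX_RELOAD 0x00FFFFFFu  /* SysTick reload value is 24 bits wide */

/* Compute the SysTick reload value for a given tick rate.
   Returns 0 on success, 1 if the request cannot be met. */
static int systick_reload_for_hz(uint32_t core_hz, uint32_t tick_hz,
                                 uint32_t *reload)
{
    uint32_t ticks;

    if (tick_hz == 0u || core_hz < tick_hz) {
        return 1;                        /* invalid request */
    }
    ticks = core_hz / tick_hz;           /* core clock cycles per tick */
    if ((ticks - 1u) > SYSTICK_MAX_RELOAD) {
        return 1;                        /* does not fit in 24 bits */
    }
    *reload = ticks - 1u;                /* counter counts reload..0 */
    return 0;
}

/* Generic baud rate divider, rounded to nearest (hypothetical helper;
   baud must be non-zero). */
static uint32_t uart_divider(uint32_t clk_hz, uint32_t baud)
{
    return (clk_hz + (baud / 2u)) / baud;
}
```

On the target, the same idea is usually expressed through the CMSIS-Core function `SysTick_Config()`, for example `SysTick_Config(SystemCoreClock / 1000)` for a 1 ms tick.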