merged_presentation_choladeck.pptx


Full Transcript


COMPX349 Embedded Systems
Memory in Embedded Systems

Introduction
- Physical memory: what types of memory are used in embedded systems?
- Memory layout: how is the memory address space laid out?
- Memory allocation: how is memory allocated?

© THE UNIVERSITY OF WAIKATO TE WHARE WANANGA O WAIKATO

Types of Memory - Intro
- Static RAM: low power, low density, high speed
- Dynamic RAM: high power, high density, medium speed
- EEPROM: stores without power. Modern EEPROM is Flash memory
- Flash: extremely high density and low cost, zero power. Limited and slow writes

Static RAM
- A specialised latch with 4 to 6 transistors per cell
- CMOS transistors consume very little power except when changing state
- High power availability means changes (writes) and reads can be very fast
- Low density, due to requiring so many transistors for each memory cell
- Great for embedded systems, but limited in size

Dynamic RAM
- One transistor and one capacitor per cell
- High density
- Must be refreshed frequently: rows are read and latched, then columns are addressed, then the row is re-written
- Very high power
- Complex controller
- Generally not used for low-power designs

Flash
- The modern version of Electrically Erasable Programmable Read Only Memory (EEPROM)
- Modern Flash is the highest density storage available
- Much cheaper than DRAM or SRAM to manufacture
- Retains data when switched off; zero power except when being read or written
- Widely used: memory cards, USB drives, SSDs
- Ideal storage for embedded systems
- Most embedded systems have an extra trick to make it more useful

Flash cell
- The floating gate is completely insulated
- If it has no charge, the control gate can open the transistor
- If it is charged, the floating gate masks the control gate so it has no effect

Reading and writing
- High voltages are used to tunnel electrons onto or off the floating gate
- Opposite voltages for read and write, normally generated on chip
- These voltages can damage the insulation around the gate: early flash could only be erased and re-written a few hundred times; modern flash manages tens to hundreds of thousands of cycles
- SSD controllers use "wear levelling" to spread erases around and make the whole SSD last longer
- Writing involves erasing then writing values, which is quite slow
- Erased flash is normally all ones
- Erasing is done a block at a time

NAND and NOR flash memory
- There are differences in how the chips are wired
- NAND memory: fewer wires, so more density; reads are done a block at a time
- NOR: lower density; can read individual words

NOR cells
- Cells are connected in parallel: any one cell can bring a word line low, hence "NOR"
- Individual cells are reasonably fast to read
- Cells are a bit larger, and there is more wiring to include

NAND cells
- Cells are connected in series, like a NAND circuit
- Smaller cells and less wiring
- To save more space, the ability to address individual cells is often removed: block reads and block writes only

NAND vs NOR

                     NAND                          NOR
    Erase time       ~1 ms                         ~1 s
    Read time        ~20 us                        >80 ns
    Write time       ~100 us/block                 ~10 us/word
    Bit reliability  Low, bad bits possible        Good, mostly reliable
    Erase cycles     100k-1M                       10k-100k
    Access           Sequential                    Random
    Interface        Block controller (complex)    Memory interface
    XIP              No                            Yes

eXecute In Place (XIP)
- Code can be run directly from storage
- Requires random access
- Requires special mapping in the compiler/linker
- Code that can be modified needs to be moved to RAM
- Normally NOR flash is used to meet these requirements.
- Used for first-stage boot loaders and BIOS/UEFI
- Commonly used with embedded systems: cheaper, lower power, and higher density than SRAM, and the performance is sufficient

Memory use in embedded processors
- The memory address range is used for:
  - Flash memory for code
  - SRAM for data (stack/heap)
  - Addressing peripherals
  - System configuration data (registers)
  - Other specialised uses
- Generally these are simply mapped to different parts of the address space
- A 32-bit address space is 4 GB; an embedded micro has kB to MB of Flash/RAM

Cortex M Memory Map

    0x00000000  Code               0.5 GB  (1 MB bit-band region at 0x20000000 in SRAM)
    0x20000000  SRAM               0.5 GB  (1 MB bit-band region at 0x20000000,
                                            32 MB bit-band alias at 0x22000000-0x23FFFFFF)
    0x40000000  Peripheral         0.5 GB  (1 MB bit-band region at 0x40000000,
                                            32 MB bit-band alias at 0x42000000-0x43FFFFFF)
    0x60000000  External RAM       1.0 GB
    0xA0000000  External device    1.0 GB
    0xE0000000  Private peripheral bus - Internal  (ITM 0xE0000000, DWT 0xE0001000,
                                                    FPB 0xE0002000, SCS 0xE000E000)
    0xE0040000  Private peripheral bus - External  (TPIU 0xE0040000, ETM 0xE0041000,
                                                    ROM Table 0xE00FF000)
    0xE0100000  System (vendor-specific), up to 0xFFFFFFFF

Memory Protection
- ARM and other embedded processors often provide memory protection
- Permissions include: execute (fetch) instructions, read and write data, unprivileged-mode access, buffering on peripheral accesses
- Some access levels are configurable and some are fixed
- Different areas have different access permissions
- The ARM MPU allows up to eight different areas

Busses
- ARM High Performance Bus (AHB)
- ARM Peripheral Bus (APB)
- Busses provide standardised interfaces for integrating the core with memory and other components on a SoC
- ARM is now moving to more standardised busses
Caches
- Caches are good when there is a mismatch between CPU speed and main memory latency
- General-purpose CPUs are >1 GHz with multiple execution units: 2-10 instructions/ns
- DRAM may have ~50 ns latency
- Embedded devices have much less mismatch: the CPU may be 10-100 MHz with single execution, so 10-100 ns per instruction, while NOR flash and SRAM are typically 10-80 ns latency
- Caches have a significant management cost and also cause unpredictable memory access times
- Embedded systems rarely use caches (high-performance ones do); they may have very small caches, e.g. a single entry, to help with bus contention

Specialised Memory - Bit Banding
- Embedded applications often need to address bits: GPIO pins, flags, network data, etc.
- Many processors (including ARM) do not have instructions for reading and writing individual bits of memory
- The conventional solution requires reading, setting bits in registers, and writing again; this is susceptible to interrupts
- Bit banding allows individual bits to be read or written by reading and writing words that are mapped to the bit-band region
- This feature is only on ARM Cortex-M processors

Bit Band
- SRAM: 1 MB bit-band region at 0x20000000-0x200FFFFF, aliased as 32 MB at 0x22000000-0x23FFFFFF
- Peripheral: 1 MB bit-band region at 0x40000000-0x400FFFFF, aliased as 32 MB at 0x42000000-0x43FFFFFF

Specialised Memory - Tightly Coupled Memory
- Most memory accesses (Flash/SRAM) go through a bus, with possible contention
- Tightly Coupled Memory is SRAM that has a direct memory interface to the CPU
- Low latency and constant access times
- Some ARM processors (Cortex-R) have a few kB of TCM
- Used for important code/data: interrupt vectors and service routines
- Makes response times predictable (and smaller)

nRF52 memory layout
- UICR: User Information Configuration Registers
- FICR: Factory Information Configuration Registers

Microbit Memory Map (/build/MICROBIT.map)

Memory Configuration

    Name        Origin              Length              Attributes
    MBR         0x0000000000000000  0x0000000000001000  xr
    SD          0x0000000000001000  0x000000000001b000  xr
    FLASH       0x000000000001c000  0x000000000005b000  xr
    BOOTLOADER  0x0000000000077000  0x0000000000007000  xr
    SETTINGS    0x000000000007e000  0x0000000000002000  xr
    UICR        0x0000000010001014  0x0000000000000008  xr
    RAM         0x0000000020002030  0x000000000001dfd0  xrw
    *default*   0x0000000000000000  0xffffffffffffffff

Memory Map - boot loader and interrupt service routines

    .mbr            0x0000000000000000   0xb00
     *(SORT_BY_ALIGNMENT(.mbr))
     .mbr           0x0000000000000000   0xb00  ../libraries/codal-microbit-v2/lib/mbr.o
                    0x0000000000000000   _binary_mbr_bin_start
                    0x0000000000000b00   _binary_mbr_bin_end
                    0x0000000000000b00   . = ALIGN (0x4)
    .isr_vector     0x000000000001c000   0x200  libcodal-nrf52.a(gcc_startup_nrf52833.S.o)
                    0x000000000001c000   __isr_vector

Memory Map - stack and heap

    .stack          0x0000000000000000   0x2000
     .stack         0x0000000000000000   0x2000 libcodal-nrf52.a(gcc_startup_nrf52833.S.o)
                    0x0000000000000000   __StackLimit
    .heap           0x0000000020004138   0x1b6c8 load address 0x0000000000034538
                    0x0000000020004138   __end__ = .
                    0x0000000020004138   end = __end__
     *(SORT_BY_ALIGNMENT(.heap*))
     .heap          0x0000000020004138   0x2000 libcodal-nrf52.a(gcc_startup_nrf52833.S.o)
                    0x0000000020004138   __HeapBase
                    0x0000000020006138   __HeapLimit
                    0x0000000000000001   ASSERT ((. ...

Malloc() - merging adjacent free blocks

        ... heap.heap_end) break;

        // We can merge!
        blockSize += (*next & ~DEVICE_HEAP_BLOCK_FREE);
        *block = blockSize | DEVICE_HEAP_BLOCK_FREE;
        next = block + blockSize;
    }

Malloc() - check block size

        // We have a free block. Let's see if it's big enough.
        // If so, we have a winner.
        if (blockSize >= blocksNeeded)
            break;

        // Otherwise, keep looking...
        block += blockSize;
    }

    // We're full!
    if (block >= heap.heap_end)
    {
        target_enable_irq();
        return NULL;
    }

Malloc() - return correct size block

    // If we're at the end of memory or have a very near match
    // then mark the whole segment as in use.
    if (blockSize <= blocksNeeded + 1 || block + blockSize >= heap.heap_end)
    {
        // Just mark the whole block as used.
        *block &= ~DEVICE_HEAP_BLOCK_FREE;
    }
    else
    {
        // We need to split the block.
        PROCESSOR_WORD_TYPE *splitBlock = block + blocksNeeded;
        *splitBlock = blockSize - blocksNeeded;
        *splitBlock |= DEVICE_HEAP_BLOCK_FREE;
        *block = blocksNeeded;      // Mark index block as allocated
    }

    // Enable interrupts
    target_enable_irq();

    return block + 1;               // Don't return the index block
}

Free() - setup

    void device_free(void *mem)
    {
        PROCESSOR_WORD_TYPE *memory = (PROCESSOR_WORD_TYPE *)mem;
        PROCESSOR_WORD_TYPE *cb = memory - 1;
        int i = 0;

        // Sanity check.
        if (memory == NULL)
            return;

        // If this memory was created from a heap registered with us, free it.
    #if (DEVICE_MAXIMUM_HEAPS > 1)
        for (i = 0; i < heap_count; i++)
    #endif

Free()

        {
            if (memory > heap[i].heap_start && memory < heap[i].heap_end)
            {
                // The memory block given is part of this heap, so we can simply
                // flag that this memory area is now free, and we're done.
                if (*cb == 0 || *cb & DEVICE_HEAP_BLOCK_FREE)
                    target_panic(DEVICE_HEAP_ERROR);

                *cb |= DEVICE_HEAP_BLOCK_FREE;
                return;
            }
        }

        // If we reach here, then the memory is not part of any registered heap.
        target_panic(DEVICE_HEAP_ERROR);
    }

Note: Slabs
- Bigger bits of memory are sometimes allocated as slabs, e.g.
Linux/Unix uses a slab allocator:
- Slabs are internal to the allocator; each slab is divided into fixed-size blocks
- Used to quickly allocate memory for common kernel data structures: inodes, network connection info, various caches
- Some embedded systems can allocate memory as slabs: larger chunks of memory for data transfer, or special-purpose uses

COMPX349 Embedded Systems
12 - Real Time

Overview
- What does Real Time mean?
- Real Time tasks
- Scheduling
- Real Time Operating Systems (RTOS)

What does "Real Time" mean?
- In computing broadly, real time = the time measured by a physical clock
- Real-time operation generally means the execution keeps up with the pace of events: a speech or video sample of 1 second is processed in 1 second or less
- In real-time systems, task execution meets specific deadlines

Real Time Tasks
- Process control in industrial plants
- Robotics
- Air traffic control
- Telecommunications
- Weapon guidance systems, e.g. guided missiles
- Medical diagnostic and life support systems
- Automatic engine control systems
- Real-time databases
- Mars Rovers, e.g. Curiosity: OS is VxWorks, processor is BAE's RAD750

Terms and definitions
- Release time (or ready time): the time instant at which a task (process) is ready or eligible for execution
- Schedule time: the time instant when a task gets its chance to execute
- Completion time: the time instant when a task completes its execution
- Deadline: the instant of time by which the execution of the task should be completed
- Runtime: the time taken, without interruption, to complete the task after it is released

Soft and Hard Real Time tasks
- Hard real-time task: the task must complete before or at its deadline.
  The value of completing the task after the deadline is zero.
- Soft real-time task: missing the deadline incurs a penalty, and the penalty increases as tardiness increases

Examples - Types of Real Time Tasks
- Hard real-time tasks: air traffic control, vehicle subsystem control, nuclear power plant control
- Soft real-time tasks: multimedia transmission and reception, networking, telecom (cellular) networks, web sites and services, computer games
- Firm real-time tasks
- Periodic tasks, aperiodic tasks (soft), sporadic tasks (hard)
- Preemptible / non-preemptible tasks

Real Time Design Patterns
- Loop / round robin
- Interrupts
- Threaded/concurrent
- State machines
- Others

(Super) Loop
- Everything runs in one main loop: setup, then Task -> Task -> Task, round robin
- Common with "bare metal" development
- No scheduler required
- Good for simple task sets

Interrupts
- Interrupt service routines can be added to provide a quick response
- ISRs can be nested; an embedded MCU may support 100s of interrupts
- Eliminates polling
- An ISR shouldn't be complex

Threaded
- Requires a scheduler
- Allows "concurrent" operation
- Tasks may be linked to interrupts or events
- For longer or larger tasks the scheduler behaviour is important

State machine
- Many control systems can be well modelled by state machines
- Simple, predictable logic
- State machines can be encoded in C with a switch statement:

    switch (state_value) {
        case state_1:
            if (input == in1)
                Block_1;
            break;
        case state_2:
            Block_2;
            break;
        ...
        case state_n:
            Block_n;
            break;
        default:
            Catch_error;
            break;
    }

State machine (input-driven)
- Normally state machines are coded on state values: inputs are assessed within the state logic, and the state loop needs to run continuously
- Alternatively, state logic can be coded in terms of inputs, and the state loop runs when an input is detected:

    switch (input) {
        case input_1:
            if (state == state_1)
                Block_1;
            break;
        case input_2:
            Block_2;
            break;
        ...
        case input_n:
            Block_n;
            break;
        default:
            Ignore_input;
            break;
    }

State machines
- Considerable complexity can be added to the basic state machine pattern
- Sub-states can run when inside a super-state
- Some states can execute in parallel
- The state machine design can be used as part of a superloop, an ISR, or with parallel tasks

Other design patterns
- Other patterns cover areas such as: message queueing, resource contention, hierarchy, memory use/access, safety and reliability

Scheduling concerns
- Scheduling in real-time operating systems considers:
  - Number of tasks
  - Resource requirements
  - Release times
  - Execution times
  - Deadlines

Scheduling Classification
- Real-time scheduling is either offline or online
- Online scheduling is either static priority or dynamic priority
  - Static priority: preemptive or non-preemptive
  - Dynamic priority: planning based or best effort

Real-Time Scheduling
- Real-time tasks execute repeatedly (usually periodically) under some time constraint
- E.g. a task is released to execute every 5 ms, and each invocation has a deadline of 5 ms

Options for real time scheduling
- Traditional kernel scheduling: deadlines vs.
fairness
- For example, if a user asked the kernel "can you guarantee my task will run for 1 second at every 5 second interval?"
- Conventional OSes use proportional sharing, so the answer is highly dependent on other system activity
- Co-operative scheduling can't provide guarantees either: another thread may lock up the kernel, or poll/busy-wait

Real-Time OS Support
- The goal is to achieve predictable execution; in the real world, preemptions and migrations disturb the ideal schedule
- Other sources of uncertainty (and solutions):
  - Interrupts (can mask some interrupts)
  - Migrations (can pin tasks to cores)
  - OS latency, jitter, and kernel non-preemptibility (real-time scheduling)

Offline Scheduling (pre-runtime scheduling)
- Scheduling information is generated prior to system execution (deterministic system model)
- The schedule is based on: release times, deadlines, execution times
- Disadvantage: inflexible. If any parameter changes, the policy has to be recomputed.

Online Scheduling
- The number and types of tasks, and their associated parameters, are not known in advance.
- Scheduling must accommodate dynamic changes
- Online scheduling is of two types: static priority and dynamic priority

Real-time scheduling
- Common scheduling classes:
  - First-in, first-out (FIFO) scheduling
  - Round-robin scheduling
  - Most frequent first
  - Earliest deadline first
  - Preemptive fixed-priority scheduling

FIFO Scheduling
- The first enqueued task of highest priority executes to completion
- A task will only relinquish a processor when it completes, yields, or blocks (co-operative scheduling)
- Only a higher priority SCHED_FIFO or SCHED_RR task can preempt a SCHED_FIFO task; all others will be starved as it runs

Round Robin Scheduling
- Same as SCHED_FIFO but with timeslices
- Among tasks of equal priority: rotate through all tasks, each getting a fixed time slice
- Only a higher priority SCHED_FIFO or SCHED_RR task can preempt a SCHED_RR task
- Tasks of equal priority preempt each other after timeslice expiration

Most Frequent First Scheduling
- A priority is assigned based on the inverse of a task's period
- Shorter periods = higher priority; longer periods = lower priority
- If P1 has a shorter period than P2, P1 is assigned a higher priority than P2.
EDF Scheduling
- Earliest Deadline First (EDF) scheduling: simple, yet effective
- Whichever task has the next deadline gets to run
- E.g. Task 1 (deadline 5, execution time 4), Task 2 (deadline 8, execution time 3), Task 3 (deadline 12, execution time 2): Task 1 runs first, then Task 2, then Task 3
- Theory exists to analyse such systems
- It is difficult to know deadlines in some cases

Preemptive Fixed Priority Scheduling
- When high and low priority jobs arrive together, the high priority job is executed first; the low priority job is executed later
- When a high priority job arrives while a low priority job is running, it preempts the low priority job; the lower priority job resumes later

Priority Scheduling issue: Priority Inversion
- Priority inversion: a high-level task is stalled because a low-level task is using a shared resource, and a medium-level task then holds up the low-level one
- Solution: priority inheritance, i.e. give the low-level task the priority of any task waiting on a mutex it holds
- Mars Pathfinder had this problem

Priority Scheduling
- Normally tasks waiting to run are added to a run queue, and the scheduler selects the next task to run from the run queue
- If the queue is short then the selection algorithm doesn't matter much
- A priority scheduler must sort the queue based on priority; if sorting takes too long it interferes with real-time responsiveness
- We want a queue with (near) constant time insert/delete/lookup

Red-Black Tree
- Used as a priority queue
- A self-balancing form of binary search tree
- Each node has a colour, red or black:
  - Leaf nodes (NIL) are always black
  - Red nodes always have black children
  - From any node, the paths to all of its descendant leaf (NIL) nodes pass through the same number of black nodes
- All operations on the tree (insert, delete, lookup) are O(log n) in the worst case
- Makes a (near) constant time queue

What is an RTOS?
- A real-time operating system
- Less focus on human use; designed for embedded use
- Features for scheduling to meet real-time requirements

Types of RTOS
- Embedded systems are on a spectrum:
  - "Bare metal" programming
  - Device support libraries
  - Low-power IoT environments
  - Real-time application platforms
  - POSIX-compliant systems (Portable Operating System Interface)
  - Next-generation RTOS
- Examples follow; except where noted, all are developed in C/C++ and use it as the primary development environment
- Normally these are cross-compiled to provide loadable firmware

Bare Metal Programming
- Once, you might have had to write raw C (or assembly) code for the microprocessor, with I/O through accessing specific memory locations, similar to the code written for WRAMP
- Now embedded processor vendors provide an SDK with libraries of helper functions to access features and peripherals
- ARM processor vendors normally support the CMSIS standard (Common Microcontroller Software Interface Standard)
- There may not be any memory allocation or scheduling

Low Power (IoT) Platforms
- Designed for battery use
- Very constrained CPUs (e.g. Cortex-M0) and very low memory footprints
- Simple scheduler
- Generally little memory protection or few safety features
- Often quite good networking
- Contiki OS: started with the uIP small TCP/IP stack; a complete system runs in 10 kB RAM and 30 kB ROM; currently being re-developed as Contiki-NG
- RIOT OS: started from a University of Berlin project; very strong online community; well-tested codebase; similar or smaller memory requirements than Contiki, depending on the options required

POSIX Systems
- Many embedded OSes provide some level of POSIX API; e.g. RIOT uses POSIX networking calls
- Makes the C environment more familiar for developers
- The most difficult parts of POSIX for embedded systems: privilege separation between kernel and userspace, and the ability to fork new processes
- Some systems are approaching these capabilities
- NuttX: an Apache Foundation project aiming for strict POSIX compliance; can be built flat or with a separate kernel.
- Frosted is a system with strict kernel separation, but not a very active project
- RTLinux is a realtime kernel for Linux

Real Time Application Platforms
- This is the most popular part of the RTOS space
- Systems provide mechanisms to enable real-time response: hard real-time scheduling, memory protection, mutexes/semaphores, timers
- Generally strong hardware and feature support: networking, filesystems, etc.
- Safety features vary a bit

Commercial RTOS
- VxWorks
  - Probably still the most widely used RTOS; developed since the late 1980s
  - Has been redeveloped extensively, up to version 7; now sold by Wind River
  - Widely used in the automotive and aviation markets
  - Used in consumer devices such as home routers
  - Used in many well-known space missions, including the Mars Rovers
- Other commercial RTOSes include QNX (a commercial micro-kernel OS) and Windows Embedded CE

Open Source RTOS
- FreeRTOS: well-known project, established 2003; now maintained by AWS; very simple (the kernel is three C files); concentrates on threading; flexible heap management options
- ChibiOS: full-featured RTOS developed since 2007; fast context switching; encourages static memory allocation for safety
- ARM MBED: specifically to support ARM microcontrollers
- Many others...

Future RTOS
- Fuchsia
  - Google project, now appearing in Nest devices
  - Kernel built on the Little Kernel embedded OS
  - Strong emphasis on security
  - Allows development in Dart, Rust and Go
- Tock
  - Trusted kernel written in Rust
  - Uses Rust memory-safety features to improve OS safety
  - Co-operative scheduling using Rust inside the kernel
  - Processes are pre-emptive and rely on hardware protection

An example: Project Zephyr
- Linux Foundation project since 2016
- Based on a kernel (Rocket) contributed by Wind River
- Claims to be the most contributed-to open source RTOS
- Very wide hardware support, including the micro:bit
- Developed in C, with a CMake compile system
- Two configuration systems (from Linux): Devicetree for hardware description, and Kconfig to select options for the software

Zephyr features (from Wikipedia)
- A small kernel
- A flexible configuration and build system for compile-time definition of required resources and modules
- A set of protocol stacks (IPv4 and IPv6, CoAP, LwM2M, MQTT, 802.15.4, Thread, Bluetooth Low Energy, CAN)
- A virtual file system interface with several flash file systems for non-volatile storage (FATFS, LittleFS, NVS)
- Management and device firmware update mechanisms

Zephyr Scheduling
- Threads have priorities, which can be positive or negative
- Negative-priority threads are cooperatively scheduled
- Positive-priority threads are pre-emptively scheduled
- The number of priority levels is configurable
- Multiple scheduler options

COMPX349 Embedded Systems
13 - Power

Overview
- Reducing power consumption: slow down, lower the voltage, stop the CPU, stop peripherals, optimise code, compiler optimisations

Power
- Power consumption on CPUs is important: heat, battery life
- Generally we want to reduce it: turn things off when not needed, slow clocks down, optimise software
- But we want the full compute power available when needed: response time, throughput

Power Reduction Features
- Embedded processors are normally designed to be low power:
  - Static RAM / Flash
  - Medium to low clock rates
  - Short pipelines
  - High levels of integration
  - (Reasonably) modern IC technologies.
- In addition, embedded microprocessors have features to enable power to be reduced while running, under software control
- Optimising software can also significantly reduce power consumption

Reducing CPU Power Consumption - 1: slow the clock down
- CMOS transistors are very high impedance when "off" or "on"; most current (power) is used when transitioning
- The clock network itself transitions every cycle and uses a lot of power
- It is common to quote current consumption as a linear function of clock speed (the ARM Cortex-M4 documentation gives an example; it depends on implementation, compiler and flags)

Programmatic Clock Control
- Setting the clock is often quite easy: change a register value (clock frequency or multiplier)
- Unfortunately, when the clock changes, other values have to change as well: memory wait states, processor voltage, bus/peripheral clock multipliers
- There is no standard for these; they are SoC dependent, so the device documentation must be consulted
- RTOS support is poor

Reducing CPU Power Consumption - 2: voltage control
- Modern microcontrollers will run on a wide range of voltages; the nRF52833 runs on 1.7 to 5.5 volts
- Lowering the voltage lowers power consumption
- It may limit the clock rate and memory performance
- In some systems this may be programmable

Reducing CPU Power Consumption - 3: sleep modes (stopping the CPU clock)
- ARM Cortex processors have three sleep modes:
  - Sleep: peripherals maintain power/clock; the CPU wakes on any peripheral interrupt
  - Deep sleep: RAM values are maintained; peripherals are turned off; requires an external interrupt (timer) to wake up
  - Standby: RAM/register values are lost; a backup circuit reboots the processor on defined events

Entering sleep mode
- ARM has two instructions for entering sleep mode:
  - WFE (Wait For Event): checks the event register and enters deep sleep depending on its value; exits sleep for a set of maskable events
  - WFI (Wait For Interrupt): enters sleep mode; exits on any interrupt
- Most ARM software uses the WFI instruction
- ARM C compilers have defined intrinsics (built-in functions) to allow calling of WFE or WFI: __WFE(); __WFI();
- Which sleep mode is used depends on the system control register

Micro:bit sleep
- The standard sleep() returns control to the scheduler; if nothing is executing it performs a "power efficient concurrent sleep operation"
- Under some circumstances it ends up at target_wait_for_event():

    void target_wait_for_event()
    {
        __WFE();
    }

- Micro:bit CODAL has a deepsleep() function
  - This calls simpleDeepSleep() in MicroBitPowerManager.cpp
  - That calls callback functions on all micro:bit peripheral drivers to turn them off
  - Then it calls the __WFI() intrinsic

Project Zephyr Power Management
- Project Zephyr provides a configurable, generalised power management model, aiming to be applicable across a wide range of devices
- Includes system (CPU) management and device (peripheral) management
- States go from active (on) through idle, standby and suspend-to-RAM/disk, to off
- Devices can be managed by the application or the RTOS
- Lower power states take longer to resume (this includes restoring state)
- Zephyr aims to choose the lowest power state that will allow resuming in the expected wait time.
Reducing CPU Power Consumption - 3
Turning other stuff off
Peripherals consume power even when idling
Generally they can be turned off
Many have an on/off input that can be connected to a GPIO pin; use or configure the pin to be pulled low (off)
Some require a serial command or other value to be sent
The driver should have sleep and wake-up functions, called as part of the CPU sleep and wake-up procedure

Power consumption examples
See https://infocenter.nordicsemi.com/index.jsp?topic=%2Fstruct_nrf52%2Fstruct%2Fnrf52833.html

Reducing CPU Power Consumption - 4
Write correct code, then optimize
Know your microprocessor's architecture, compiler, and programming language
Use the right tool for the problem
Let the compiler optimize for you; modern compilers are very good

Optimizing For Power vs Optimizing For Energy?
They are not always the same. Optimizing for power aims to reduce the rate at which energy is consumed at any point in time, whereas optimizing for energy aims to minimize the total energy consumed. Remember: power is linked to heat; energy is the total you pay for in the end.
Assume the power consumption at any moment is determined by the number of instructions executing at that moment. By rescheduling instructions, on a VLIW processor for instance, or on any parallel computer system, we can obtain different power and energy consumption profiles.
[Figure: three power-vs-time profiles, Scheduling (a), (b) and (c); the area under each curve (the sum of power over time) is proportional to energy]

Optimizing For Power vs Optimizing For Energy?
Suppose we can execute certain instructions faster (e.g. in half the time) by using certain hardware features, e.g. exploiting special SIMD hardware.
[Figure: power-vs-time profiles for Scheduling (a), (d) and (e); the area under each curve (the sum of power over time) is proportional to energy]
Scheduling (d) exploits certain hardware features (using the same power budget as (a), for half the time)
Scheduling (e) exploits certain hardware features (at a higher power budget this time) - still less energy consumed overall compared to (a)

Optimizing For Power/Energy vs Optimizing For Speed?
Not necessarily - often higher performance leads to more energy consumption, e.g. through increasing the clock frequency.
However, eliminating redundant instructions or memory hierarchy optimizations, for instance, reduce the overall execution time (i.e. increase execution speed) and consequently often result in energy reductions. But even this is not always guaranteed.
Consider the following code: for (i=0; i ...

Example Program: "Nearby Points of Interest"
float Calc_Distance( PT_T * p1, const PT_T * p2) {
  // calculates distance in km between locations (Lat/Lon in degrees)
  return acos(sin(p1->Lat*PI/180) * sin(p2->Lat*PI/180) +
              cos(p1->Lat*PI/180) * cos(p2->Lat*PI/180) *
              cos(p2->Lon*PI/180 - p1->Lon*PI/180)) * 6371;
}

void Find_Nearest_Point(... ) {
  ...
  while (strcmp(points[i].Name, "END")) {
    d = Calc_Distance(&ref, &(points[i]));
    if (d < closest_d) {
      closest_d = d;
      closest_i = i;
    }
    i++;
  }
  ...
}

Precompute the sine and cosine of each latitude once and store them in the point structure (SinLat, CosLat), with longitudes converted to radians, so Calc_Distance becomes:
float Calc_Distance( PT_T * p1, const PT_T * p2) {
  return acos(p1->SinLat * p2->SinLat +
              p1->CosLat * p2->CosLat * cos(p2->Lon - p1->Lon)) * 6371;
}
Distance is acos(big_expression)*6371
What is 6371? A scaling factor to convert an angle in radians to a distance in km: 6371 km = Earth's circumference / 2π radians
Can compare the angle (returned by acos) rather than the distance; the angle is still proportional to the distance between points

Optimized Code
float Calc_Distance_in_Radians( PT_T * p1, const PT_T * p2) {
  // calculates distance in radians between locations
  return acos(p1->SinLat * p2->SinLat +
              p1->CosLat * p2->CosLat * cos(p2->Lon - p1->Lon)); // no *6371 here
}

void Find_Nearest_Point(... ) {
  while (strcmp(points[i].Name, "END")) {
    d = Calc_Distance_in_Radians(&ref, &(points[i]));
    ...
  }
  *distance = d*6371;
  ...
}
Eliminates NPoints-1 floating point multiplies

Taking it Further
float Calc_Distance_in_Radians( PT_T * p1, const PT_T * p2) {
  // calculates distance in radians between locations
  return acos(p1->SinLat * p2->SinLat +
              p1->CosLat * p2->CosLat * cos(p2->Lon - p1->Lon));
}
Can we make distance comparisons without using acos?
How is acos related to its argument X?
Can we call acos just once - to compute the distance to the closest point?

Taking it Further
[Figure: plot of acos(X) for X in [-1, 1]; acos falls monotonically from about 3.14 at X = -1 to 0 at X = 1]
float Calc_acos_arg ( PT_T * p1, const PT_T * p2) {
  return (p1->SinLat * p2->SinLat +
          p1->CosLat * p2->CosLat * cos(p2->Lon - p1->Lon));
}
acos always decreases when its input X increases
The nearest point will have the minimum distance and the maximum X
So, search for the point with the maximum argument to the acos function
After finding the nearest point (max X), compute distance_km = acos(X) * 6371

Even More Optimized Code
float Calc_Distance_inverse ( PT_T * p1, const PT_T * p2) {
  // returns the argument to acos: larger means closer
  return (p1->SinLat * p2->SinLat +
          p1->CosLat * p2->CosLat * cos(p2->Lon - p1->Lon)); // no acos here
}

void Find_Nearest_Point(... ) {
  while (strcmp(points[i].Name, "END")) {
    d = Calc_Distance_inverse(&ref, &(points[i]));
    if (d > closest_d)
      closest_d = d;
    i++;
  }
  *distance = acos(closest_d)*6371;
  ...
}
Eliminates NPoints-1 floating point acos calculations.

Optimizing Searches
Improve the data organization, possibly also enabling a better algorithm
Example data structure: List
Each node holds a data element. May be connected to one or two other nodes with pointers:
One successor (next)
Optional: one predecessor (prev)
Sequential access to data. Must start at the current node (or start node) and traverse the list by visiting nodes via next or prev pointers.
Examples: linked list, queue, circular queue, double-ended queue.

More Data Structures
Tree - hierarchical
Each node holds a data element. May be connected to other nodes with pointers:
Up to one parent
Down to N children
Sequential access to data. Must traverse by visiting nodes, but the additional connections reduce the number of intermediate nodes.
The hierarchical structure may be represented explicitly with pointers or implicitly with the index location of each element.
Array - random access

Optimization e.g.: Binary Search
Divide and conquer approach
[Figure: binary search halving the search range over steps 1-4]
Requirements:
Regions in the table must be sorted by increasing starting address
For each entry, start address
