Summary
This document presents lecture notes on embedded system optimization, covering topics like loop transformations, concurrency in processes, threads, and power management. The document details different approaches for improving efficiency and handling concurrency challenges in embedded systems.
IT2102 Embedded System Optimization

Optimization plays a central role in the design of embedded systems. Optimization is the process of improving the efficiency of a system in time (speed), space (size), or resource usage. The first step in optimization is to determine which problems exist. Most optimizations performed on code involve a trade-off between execution speed and code size.

High-Level Optimizations
Software optimization is the process of modifying a software system so that some aspect of it works more efficiently or uses fewer resources.

Simple Loop Transformations
o Loop Permutation – swaps the order of two nested loops to increase parallelism, improve spatial locality, or enable other transformations.
  ▪ It can have a positive effect on the reuse of array elements in the cache, since the next iteration of the innermost loop will access a location adjacent in memory.
  ▪ Caches are normally organized such that adjacent locations can be accessed significantly faster than locations further away from the previously accessed one; loop permutation exploits this spatial locality.
o Loop Unrolling – a standard transformation that creates several instances of the loop body.
  ▪ The number of copies of the loop body is called the unrolling factor, and factors larger than two are possible. Unrolling reduces loop overhead (fewer branches per execution) and therefore typically improves speed.
o Loop Fusion and Fission – the combining or splitting of loop nests.
  ▪ There may be cases in which two separate loops can be merged, and cases in which a single loop should be split into two.
  ▪ Either version may lead to improved cache behavior and may increase the potential for parallel computation within the loop body.
Loop Tiling/Blocking – the chunking of nested loops into blocks. This optimization exploits memory hierarchies, including caches and scratchpad memories.
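The loop permutation and unrolling transformations described above can be sketched in C, where two-dimensional arrays are stored in row-major order (the array size, unrolling factor, and function names below are illustrative; all three functions compute the same result):

```c
#define N 64

/* Column-major traversal of a row-major C array: consecutive inner-loop
 * iterations touch a[0][j], a[1][j], ..., which lie N ints apart in
 * memory, so cache lines are used poorly. */
long sum_before_permutation(int a[N][N]) {
    long sum = 0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            sum += a[i][j];
    return sum;
}

/* After loop permutation: the inner loop now walks adjacent memory
 * locations, exploiting spatial locality in the cache. */
long sum_after_permutation(int a[N][N]) {
    long sum = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += a[i][j];
    return sum;
}

/* Loop unrolling with unrolling factor 4: four copies of the loop body,
 * so fewer branch/compare operations per element. N is assumed to be
 * divisible by the unrolling factor. */
long sum_unrolled(int a[N][N]) {
    long sum = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j += 4)
            sum += a[i][j] + a[i][j + 1] + a[i][j + 2] + a[i][j + 3];
    return sum;
}
```

Whether unrolling pays off depends on the target: it enlarges the code, which matters on memory-constrained embedded processors, illustrating the speed/size trade-off mentioned above.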
A significant reuse factor for the information in those memories is required; otherwise, the memory hierarchy cannot be exploited.
Loop Splitting – Performing loop splitting manually is very difficult and error-prone. Published algorithms are based on a sophisticated analysis of accesses to array elements in loops, and optimized solutions are generated using genetic algorithms. Run-times can be reduced by loop splitting for various applications and architectures.
Array Folding – combines different array elements into one memory location to avoid wasting space. At any time, only a subset of the array elements is needed.
o The maximum number of elements needed at the same time is called the address reference window. Without folding, each array is allocated the maximum space it requires during the entire execution time.

Embedded System Concurrency (Lee, 2017)
The physical world is concurrent, with many things happening at once. Bridging this mismatch in semantics is one of the major challenges that an embedded system designer faces.
Threads are imperative programs that run concurrently and share a memory space in which each program can access the others' variables.
o Creating Threads – The mechanism is provided as a collection of procedures that a programmer can use, such as a standardized API (application program interface).
o Implementing Threads – The core of a thread implementation is a scheduler that decides which thread to execute next when a processor becomes available.
o Mutual Exclusion (mutex) – A thread may be suspended between any two atomic operations so that another thread and/or an interrupt service routine can execute; shared data must therefore be protected by mutex locks.
o Deadlock – As mutex locks proliferate in programs, the risk of deadlock increases. A deadlock occurs when some threads become permanently blocked trying to acquire locks.

06 Handout 1 *Property of STI Page 1 of 5

Processes are imperative programs with their own memory spaces.
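The thread-creation and mutual-exclusion mechanisms described above can be sketched with the POSIX threads API (a minimal sketch; the counter and function names are illustrative):

```c
#include <pthread.h>
#include <stddef.h>

/* Shared state: two threads increment this counter concurrently. */
static long counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Each increment is guarded by the mutex, so a thread cannot be
 * suspended between reading and writing `counter` while the other
 * thread modifies it. */
static void *worker(void *arg) {
    long iterations = *(long *)arg;
    for (long i = 0; i < iterations; i++) {
        pthread_mutex_lock(&counter_lock);
        counter++;                      /* critical section */
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Creating threads through the standardized pthreads API, then
 * waiting for both to finish. Returns the final counter value. */
long run_workers(long iterations) {
    pthread_t t1, t2;
    counter = 0;
    pthread_create(&t1, NULL, worker, &iterations);
    pthread_create(&t2, NULL, worker, &iterations);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return counter;
}
```

Without the lock/unlock pair, the two threads could interleave between reading and writing `counter` and lose increments. Acquiring multiple locks always in the same global order is a common discipline for avoiding the deadlocks mentioned above.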
These programs cannot refer to each other's variables and therefore do not exhibit the same difficulties as threads.
o To achieve concurrency, processes need to be able to communicate. Operating systems provide a variety of mechanisms for this:
  ▪ Shared memory spaces can be created, which reintroduces the potential difficulties of multithreaded programming.
  ▪ A file system is simply a way to create a body of data that is persistent, in the sense that it outlives the process that creates it. One process can create data and write it to a file, and another process can read data from the same file.
  ▪ Message Passing – One process creates a chunk of data, deposits it in a carefully controlled section of memory that is shared, and then notifies other processes that the message is ready. Those other processes can block while waiting for the data to become ready.
Semaphores are named after the mechanical signals traditionally used on railroad tracks to signal that a section of track has a train on it.
o A single section of track can be used by trains traveling in both directions; the semaphore implements mutual exclusion, preventing two trains from simultaneously being on the same section of track.

Basics of Scheduling (Lee & Seshia, 2017)
Multitasking is when multiple imperative tasks execute concurrently, either interleaved on a single processor or in parallel on multiple processors.
o A scheduler decides what to do next at certain points in time, such as when a processor becomes available.
Real-time systems are collections of tasks in which, in addition to any ordering constraints imposed by precedences between the tasks, timing constraints are imposed as well.
o These constraints relate the execution of a task to real time, that is, physical time in the environment of the computer executing the task.
o Tasks have deadlines, which are values of physical time by which the task must be completed.
  ▪ A hard real-time deadline is a real physical constraint imposed by the application, where missing the deadline is considered an error.
  ▪ A soft real-time deadline reflects a design decision that need not be enforced strictly. It is better to meet the deadline, but missing it is not an error.
Scheduling Decisions – A scheduler decides what task to execute next when faced with a choice in the execution of a concurrent program or set of programs. A scheduling decision consists of three parts:
o Assignment: which processor should execute the task (the choice of processor is called processor assignment);
o Ordering: in what order each processor should execute its tasks; and
o Timing: the time at which each task executes.

Types of Schedulers (Lee & Ha, 1989)
A fully-static scheduler makes all decisions at design time; it needs no semaphores or locks and can use timing instead to enforce mutual exclusion and precedence constraints.
A static order scheduler performs the task assignment and ordering at design time but defers until run time the decision of when in physical time to execute a task.
A static assignment scheduler performs the assignment at design time and everything else at run time. Each processor is given a set of tasks to execute, and a run-time scheduler decides during execution what task to execute next.
A fully-dynamic scheduler performs all decisions at run time. When a processor becomes available (e.g., it finishes executing a task, or a task blocks acquiring a mutex), the scheduler decides at that point what task to execute next on that processor.
A preemptive scheduler may make a scheduling decision during the execution of a task, assigning a new task to the same processor. That is, a task may be in the middle of executing when the scheduler decides to stop that execution and begin executing another task.
o The interruption of the first task is called preemption.
A non-preemptive scheduler always lets tasks run to completion before assigning another task to execute on the same processor.
A priority-based scheduler assumes each task is assigned a number called a priority, and the scheduler always chooses to execute the task with the highest priority.
  ▪ A fixed priority remains constant over all executions of a task.
  ▪ A dynamic priority is allowed to change during execution.
o A preemptive priority-based scheduler supports the arrival of tasks and at all times executes the enabled task with the highest priority.
o A non-preemptive priority-based scheduler uses priorities to determine which task to execute next after the current task execution completes, but never interrupts a task during execution to schedule another task.

Scheduler Implementation (Lee & Seshia, 2017)
A scheduler may be part of a compiler or code generator, part of an operating system or microkernel, or both. A run-time scheduler will typically implement tasks as threads (or as processes, but the distinction is not important here). A scheduling decision is typically triggered when one of the following occurs:
o A timer interrupt occurs.
o A task attempts to acquire a mutex.
o An I/O interrupt occurs.
o A task tests a semaphore.
o An operating system service is invoked.
Priority inversion is a scheduling anomaly in which a high-priority task is blocked while unrelated lower-priority tasks are executing.
o Priority inheritance – when a task blocks while attempting to acquire a lock, the task that holds the lock inherits the priority of the blocked task. Thus, the task that holds the lock cannot be preempted by any task with a lower priority than the one attempting to acquire the lock.

Power Management (Peckol, 2019)
Embedded systems are now integrated into smartphones, smartwatches, wireless data modems, video cameras, web browsers, wearable systems, and so on. All these devices use batteries as their source of energy.
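The non-preemptive priority-based scheduling decision described earlier can be sketched as a simple selection over ready tasks (the struct layout and field names are illustrative, not taken from any particular RTOS):

```c
#include <stddef.h>

/* A task with a fixed priority; a larger number means higher priority. */
struct task {
    int priority;
    int ready;   /* nonzero if the task is enabled and ready to run */
};

/* Non-preemptive priority-based decision: after the current task
 * completes, choose the ready task with the highest priority.
 * Returns the index of the chosen task, or -1 if none is ready. */
int pick_next(const struct task *tasks, size_t n) {
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (tasks[i].ready &&
            (best < 0 || tasks[i].priority > tasks[best].priority))
            best = (int)i;
    }
    return best;
}
```

A preemptive variant would re-run this selection on every task arrival or interrupt, rather than only when the current task completes.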
Categories of Power Consumption (CMOS-based systems)
o Dynamic power consumption relates to the charging and discharging of load capacitances and to short-circuit currents.
o Static power (or leakage power) relates to leakage currents that flow even when a transistor is turned off.

Importance of Power Management
1. Power consumption determines how long an embedded system can operate. It also determines the size of the power supply, the heat-dissipation overhead, and the cost, weight, and area of the system.
2. Power dissipation leads to heating, and as temperature increases, device failure rates increase too.
3. Embedded systems have hard time constraints. Designers use pessimistic estimates of worst-case execution time, which leads to redundancy in system resources (wastage) in order to meet worst-case performance requirements.
4. Embedded systems often execute resource-intensive applications (for example, multimedia processing), which pushes their optimization in the performance direction.
5. The development of CMOS technology keeps increasing both on-chip transistor density and speed. This results in the chip utilization wall, which limits the fraction of the chip that can be used simultaneously at full speed within the power budget.
6. More and more embedded systems are used in mobile convergence applications; for example, they are key platforms for web browsing, video streaming, and others. The sheer number of such devices makes their total power consumption very high.
7. Data processing consumes power. Many corporate IT departments now follow the trend of green computing, meaning that they try to reduce the environmental effect of their activity. Efficient power management is very important for achieving these goals.
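The dynamic power component described above (under Categories of Power Consumption) is commonly modeled as P = a·C·V²·f, with switching activity a, load capacitance C, supply voltage V, and clock frequency f. A small numeric sketch (all constants are illustrative, not measurements of any real chip):

```c
/* Dynamic power of a CMOS circuit, using the common model
 * P = alpha * C * V^2 * f. */
double dynamic_power(double alpha, double cap_farads,
                     double volts, double freq_hz) {
    return alpha * cap_farads * volts * volts * freq_hz;
}
```

Because power grows with the square of the supply voltage, halving V quarters the dynamic power at a given frequency, which is what makes the voltage-scaling techniques below attractive.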
Energy Saving Approaches (Marwedel, 2018)
Dynamic Voltage and Frequency Scaling (DVFS) – a technique that reduces dynamic power consumption by dynamically adjusting the voltage and frequency of a CPU. Typically, such an optimization step follows code generation by the compiler. When a task can finish before its deadline, several options exist:
o The processor can be shut down during the slack time (e.g., 5 seconds of idle time before the deadline).
o The processor can initially run at full speed and then reduce the voltage once the remaining cycles can be completed at the lowest voltage.
o The processor can run at a clock rate just high enough to complete the required cycles within the available time.
o If a variable-voltage processor completes a task before its deadline, the energy consumption can be reduced by lowering the voltage.
o If a processor uses a single supply voltage and completes a task exactly at its deadline, then that supply voltage is the unique supply voltage that minimizes the energy consumption of the task.
o If a processor can only use certain discrete voltage levels, then a voltage schedule using the two voltages that are the immediate neighbors of the ideal voltage can be chosen. These two voltages lead to the minimum energy consumption, except that the need to use an integer number of cycles may cause a small deviation from the minimum.
Dynamic Power Management (DPM) – allows a system, or individual blocks of a system, to be placed in low-power sleep modes while inactive. Normally, not all blocks of a system participate in performing a given function, and it is useful to shut down inactive blocks to reduce power consumption.
o Straightforward approaches just use a simple timer to transition into a power-saving state.
o More sophisticated approaches model the idle times by stochastic processes and use these models to predict upcoming idle periods.
Thermal Management – requires run-time monitoring of temperatures. This means that thermal sensors must be available in systems that could potentially get too hot.
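The "run just fast enough" DVFS option described above can be illustrated numerically. The sketch assumes, for simplicity, that the clock frequency is proportional to the supply voltage (f = c·V) and that the energy per cycle scales with V²; all constants and function names are illustrative:

```c
/* Energy (joules) to execute `cycles` cycles at supply voltage `volts`,
 * using the simplification E = k * cycles * V^2, where per-cycle energy
 * scales with V^2 and k is an illustrative technology constant. */
double energy_at(double k, double cycles, double volts) {
    return k * cycles * volts * volts;
}

/* Lowest voltage that still meets the deadline, assuming f = c * V:
 * cycles / (c * V) <= deadline  implies  V >= cycles / (c * deadline). */
double min_voltage(double c, double cycles, double deadline) {
    return cycles / (c * deadline);
}
```

For example, with c = 1e9 Hz/V, 5e8 cycles, and a 1-second deadline, the lowest feasible voltage is 0.5 V, and finishing exactly at the deadline at that voltage uses a quarter of the energy of running the same cycles at 1.0 V.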
This temperature information is then used to control the generation of additional heat and may influence cooling mechanisms as well.
o Controlling fans (when available) can be considered.
o Systems may be shut down completely if temperatures exceed maximum thresholds.
o The configuration can be adjusted so that the performance of the system is reduced, leaving more hardware resources idle.
o Issuing fewer instructions per clock cycle lessens the usage of some processor pipelines.

References:
Barkalov, A., Titarenko, L., & Mazurkiewicz, M. (2019). Foundations of embedded systems. Springer International.
Barr, M., & Massa, A. (2006). Programming embedded systems (2nd ed.), Chap. 14: Optimization techniques. O'Reilly.
Cardoso, J., Coutinho, J., & Diniz, P. (2017). Embedded computing for high performance: Efficient mapping of computations using customization, code transformations and compilation. Elsevier.
Colorado State University – Department of Computer Science. (n.d.). Legality of loop interchange [PDF]. Retrieved July 9, 2021, from https://www.cs.colostate.edu/~mstrout/CS553Fall07/Slides/lecture23-looptransform.pdf
Darwish, T., & Bayoumi, M. (2005). The electrical engineering handbook (1st ed.). Academic Press.
Emertxe Information Technologies Pvt Ltd. (2014). Embedded C: Optimization techniques [Slides]. SlideShare. Retrieved August 11, 2021, from https://www.slideshare.net/EmertxeSlides/embedded-c-optimization-techniques
Lee, E., & Seshia, S. (2017). Introduction to embedded systems: A cyber-physical systems approach (2nd ed.). MIT Press.
Lee, E., & Ha, S. (1989). Scheduling strategies for multiprocessor real-time DSP. In Global Telecommunications Conference (GLOBECOM). doi:10.1109/GLOCOM.1989.64160
Marwedel, P. (2018). Embedded system design: Embedded systems, foundations of cyber-physical systems, and the internet of things (3rd ed.). Springer International.
Pan, T., & Zhu, Y. (2018). Designing embedded systems with Arduino: A fundamental technology for makers. Springer.
Peckol, J. (2019). Embedded systems: A contemporary design tool (2nd ed.). Wiley.
Technische Universität Hamburg. (n.d.). Optimization in embedded systems [Article]. Retrieved August 11, 2021, from https://www.tuhh.de/es/embedded-systems-design/teaching/seminars/optimization-in-embedded-systems.html