Cortex M Programming: See a Program Running - University of North Carolina Wilmington PDF
Document Details
University of North Carolina at Wilmington
Summary
This document explores key concepts in computer architecture and programming, specifically focusing on Cortex M processors. Topics include assembly code, machine code, program execution, memory mapping, and the function of registers within the CPU. The examples provided help illustrate how programs run.
Full Transcript
Ch I – See a Program Running

Contents
Translating C to Machine Code
Loading a Program to Memory
Registers
Executing a Machine Program

Levels of Program Code
C Program --(compile)--> Assembly Program --(assemble)--> Machine Program

C Program (high-level language):
    int main(void){
        int i;
        int total = 0;
        for (i = 0; i < 10; i++) {
            total += i;
        }
        while(1); // Dead loop
    }
Level of abstraction closer to the problem domain. Provides for productivity and portability.

Assembly Program (assembly language):
Textual representation of instructions. Human-readable format of instructions.

Machine Program (hardware representation):
    0010000100000000
    0010000000000000
    1110000000000001
    0100010000000001
    0001110001000000
    0010100000001010
    1101110011111011
    1011111100000000
    1110011111111110
Binary digits (bits). Encoded instructions and data. Computer-readable format of instructions.

Translating C to Machine Code
Goal: to create an executable file, also known as a machine program.
What creates an executable file? The compiler.
Executables are platform-dependent and non-portable. An executable compiled for one processor cannot run directly on a different microprocessor. Executables may need to be modified or rebuilt when switching platforms.

Compiler
Two steps: Analysis and Synthesis.
Analysis:
Goal: create an Intermediate Representation (IR) that represents the original program in a simplified way.
How: by analyzing the symbols and syntax.
IR: also known as a low-level representation. Example: for C, assembly language.
Synthesis:
Goal: generate the executable from the IR.
How: perform transformations and optimizations.
Why: to improve program execution speed or reduce program size.
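To make the "assemble" arrow in the levels-of-code picture concrete, here is a minimal sketch in C of how two 16-bit Thumb instruction forms could be packed into bits. The function names (encode_movs_imm8, encode_adds_reg, to_binary16) are hypothetical helpers made up for this illustration. The sketch assumes the low registers r0 to r7 only and is not how a real assembler is structured.

```c
#include <stdint.h>

/* MOVS Rd, #imm8 (16-bit Thumb form): 00100 | Rd (3 bits) | imm8 (8 bits). */
uint16_t encode_movs_imm8(unsigned rd, unsigned imm8)
{
    return (uint16_t)(0x2000u | ((rd & 7u) << 8) | (imm8 & 0xFFu));
}

/* ADDS Rd, Rn, Rm (16-bit Thumb form): 0001100 | Rm | Rn | Rd (3 bits each). */
uint16_t encode_adds_reg(unsigned rd, unsigned rn, unsigned rm)
{
    return (uint16_t)(0x1800u | ((rm & 7u) << 6) | ((rn & 7u) << 3) | (rd & 7u));
}

/* Write the 16 bits of a halfword as '0'/'1' characters, most significant
   bit first. The caller supplies a buffer of at least 17 bytes. */
void to_binary16(uint16_t halfword, char out[17])
{
    for (int bit = 15; bit >= 0; bit--) {
        out[15 - bit] = ((halfword >> bit) & 1u) ? '1' : '0';
    }
    out[16] = '\0';
}
```

With these helpers, encode_movs_imm8(1, 0x00) gives 0x2100, encode_movs_imm8(2, 0x01) gives 0x2201, and encode_adds_reg(3, 1, 2) gives 0x188B. to_binary16 turns such a value into a bit string like the ones in the machine program column.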
See a Program Run

C Code:
    int main(void){
        int a = 0;
        int b = 1;
        int c;
        c = a + b;
        return 0;
    }

Assembly Code (produced by the compiler):
    MOVS r1, #0x00     ; int a = 0
    MOVS r2, #0x01     ; int b = 1
    ADDS r3, r1, r2    ; c = a + b
    MOVS r0, #0x00     ; set return value
    BX   lr            ; return

Machine Code (produced by the assembler):
    In Binary           In Hex
    0010000100000000    2100    ; MOVS r1, #0x00
    0010001000000001    2201    ; MOVS r2, #0x01
    0001100010001011    188B    ; ADDS r3, r1, r2
    0010000000000000    2000    ; MOVS r0, #0x00
    0100011101110000    4770    ; BX lr

Next step: Load and run!

The Executable
Machine programs follow a standard binary file specification known as the Executable and Linkable Format (ELF). ARM processors support ELF.
ELF has two views: the linking view and the execution view.
Linking view: used at static link time to merge multiple files during compilation.
Execution view: used at runtime to create a process image in memory when a program is loaded and executed.

ELF
    Linking View              Execution View
    ELF Header                ELF Header
    Program Header Table      Program Header Table
    Section 1                 Zero Initialized Data segment (Z)
    Section 2                 Read-Write Data segment (RW)
    ...                       Read-Only Data segment (RO)
    Section n                 Text segment
    ...
    Section Header Table      Section Header Table

When a compiler constructs an executable file or a library, it can locate the content of a specific section in the ELF file by referencing the section header table.
Text segment: contains the binary machine code.
RO: contains the values of variables that cannot be altered at runtime.
RW: contains the initial values of statically allocated, modifiable variables.
Z: holds all uninitialized variables.

Binary Program View (figure not reproduced)

Load a Program in Memory
The executable is stored in flash memory (several MB), which allows the processor to restart the program.
When the program runs, it is copied to primary memory, also called static memory or SRAM (hundreds of KB).
Memory is arranged as a series of "locations".
Each location has a unique "address".
Each location holds a byte (byte-addressable). For example,
the memory location at address 0x080001B0 contains the byte value 0x70, i.e., 112.

Memory
For a 32-bit processor, each memory address has 32 bits. The total addressable memory space is therefore 2^32 bytes (i.e., 4 GB).
Memory is measured in KB (kilobytes), MB (megabytes), GB (gigabytes), and TB (terabytes).
    Kilo  K = 1,024
    Mega  M = 1,048,576
    Giga  G = 1,073,741,824
    Tera  T = 1,099,511,627,776

Memory
    Address (32 bits)    Data (8 bits)
    0xFFFFFFFF
    ...
    0x080001B0           70
    0x080001AF           BC
    0x080001AE           18
    0x080001AD           01
    0x080001AC           A0
    ...
    0x00000000
The number of locations in memory is limited, e.g. 4 GB of RAM.
1 gigabyte (GB) = 2^30 bytes, so 4 GB = 2^32 locations = 4,294,967,296 locations!
Values stored at each location can represent either program data or program instructions, e.g. the value 0x70 might be the code used to tell the processor to add two values together.

Computer Architecture
Von Neumann: instructions and data are stored in the same memory.
Harvard: data and instructions are stored in separate memories.

Von Neumann
The processor cannot fetch an instruction and access data simultaneously.
One bus.
All segments are loaded into the main memory.
Memory bandwidth is shared.
Inexpensive and straightforward.

Harvard
Two sets of buses.
Simultaneous instruction fetch and data access.
Z and RW segments are copied to data memory.
Higher memory bandwidth, hence higher performance.
More efficient.

Harvard
Instruction and data memory are small enough to fit within the same address space, so they share the same memory address bus.
32-bit memory address space: 4 GB
4KB for instruction memory (flash)
256 KB for data memory (SRAM)

ARM Cortex-M Series Family
Von Neumann: instructions and data are stored in the same memory.
Harvard: data and instructions are stored in separate memories.
    Core              Architecture    Memory organization
    ARM Cortex-M0     ARMv6-M         Von Neumann
    ARM Cortex-M0+    ARMv6-M         Von Neumann
    ARM Cortex-M1     ARMv6-M         Von Neumann
    ARM Cortex-M23    ARMv8-M         Von Neumann
    ARM Cortex-M3     ARMv7-M         Harvard
    ARM Cortex-M4     ARMv7E-M        Harvard
    ARM Cortex-M7     ARMv7E-M        Harvard
    ARM Cortex-M33    ARMv8-M         Harvard

Cortex-M + Harvard starting addresses:
    flash: 0x0800_0000
    SRAM:  0x2000_0000

Load the code and data in memory (figures not reproduced)

Loading the program in memory
When the processor loads a program, all initialized and uninitialized global and local variables are placed in data memory.
When the processor boots, it loads the first instruction of the program from the instruction memory, and the program starts to run.
At runtime the data memory holds 4 segments: initialized data, uninitialized or zero-initialized data (also known as block starting symbol, BSS), heap, and stack.
Initialized and uninitialized data segments: size and location remain unchanged.
Heap and stack: size changes dynamically.

Load and execute
The stack is mandatory. It is first-in, last-out (FILO).
The heap is used only if dynamic allocation (e.g. malloc, calloc) is used.

Cortex-M Memory Map
The regions of the memory map do not overlap, which gives a convenient peripheral interface.
A peripheral device is a computer device that is not part of the computer's core architecture.
What are peripheral memory addresses?

Peripheral Memory
A set of registers:
data registers for data exchange between the peripheral and the processor,
control registers to configure or control the peripheral, and
status registers to indicate the operation state of the peripheral.
May also contain a small memory.
Each peripheral is accessed with ordinary memory access instructions at predefined addresses.

Registers
Also known as processor registers.
Fastest data storage; hold digital values.
Common sizes: 16, 32, and 64 bits.
Cortex-M registers: 32 bits.
The processor reads or writes all the bits in a register together.
2 types:
general purpose (GP): temporary storage that any instruction can use
special purpose (SP): predetermined usage, access restricted

Registers
Made up of a series of flip-flops that operate in parallel to store binary bits. For example,
a 32-bit register consists of 32 flip-flops connected in parallel.
http://hyperphysics.phy-astr.gsu.edu/hbase/Electronic/nandlatch.html#c1

Registers: Advantages
Accessing the processor's registers is significantly faster than accessing data memory, which gives good performance.
Temporal locality: compilers store the values of frequently accessed data variables and memory addresses in registers. E.g. the tendency to read the same book again and again.
Spatial locality: when a processor accesses data at a certain memory address, it is likely that data stored in nearby locations will be accessed soon. Both processor architecture design (such as caching and prefetching) and software development (such as reorganizing data access sequences) exploit spatial locality to speed up overall performance. E.g. the tendency to read books from the same shelf.

But why so few registers?
Thermal issues: registers exhibit the highest temperature.
Fewer available registers = fewer bits to encode a register in a machine instruction = reduced code size = less memory bandwidth needed.

Rules to follow in assembly for register allocation:
1. Inspect the live range of a variable.
2. If the live ranges of 2 variables overlap, then 2 different registers should be allocated.
3. The most frequently used variables should be mapped to registers and the rest to data memory.

Registers
13 GP registers: data operations.
1 stack pointer (SP): r13 holds the memory address of the top of the stack.
Link register (LR), r14: holds the memory address of the instruction to execute after a subroutine returns.
Program counter (PC): holds the memory address of the next instruction.
Program status register (xPSR): status bit flags (negative, zero, carry, overflow).
Base priority mask register (BASEPRI): the priority threshold; the lower the value, the higher the priority.
Control register (CONTROL): choice of MSP or PSP.
Priority mask register (PRIMASK): can disable all interrupts, excluding hard faults and non-maskable interrupts (NMI).
Fault mask register (FAULTMASK): can disable all interrupts, excluding non-maskable interrupts (NMI).

Registers
ARM Cortex-M has:
R0-R12: 13 general-purpose registers
R13: 1 stack pointer (2 stacks: 1 Main SP and 1 Process SP)
R14: link register (LR)
R15: program counter (PC)
Special registers (xPSR, BASEPRI, PRIMASK, etc.)

    General-purpose registers (32 bits each)
        Low registers:   R0, R1, R2, R3, R4, R5, R6, R7
        High registers:  R8, R9, R10, R11, R12
        R13 (SP):        R13 (MSP), R13 (PSP)
        R14 (LR)
        R15 (PC)
    Special-purpose registers (32 bits each)
        xPSR, BASEPRI, PRIMASK, FAULTMASK, CONTROL

Program Counter
Plays a major role in program execution.
Holds the memory address of the next instruction.
The processor fetches instructions consecutively from the instruction memory, automatically incrementing the program counter to point to the next instruction to be fetched.

Program Execution
The program counter (PC) is a register that holds the memory address of the next instruction to be fetched from memory.
1. Fetch the instruction at the PC address.
2. Decode the instruction.
3. Execute the instruction.

    Memory Address    Data
    0x080001B4        4770
    0x080001B2        2000
    0x080001B0        188B    <- PC
    0x080001AE        2201
    0x080001AC        2100

With PC = 0x080001B0, the fetched instruction is 188B (16-bit fetch) or 2000188B (32-bit fetch).

Three-stage pipeline: Fetch, Decode, Execution.
Pipelining allows hardware resources to be fully utilized.
One 32-bit instruction or two 16-bit instructions can be fetched.
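The fetch, decode, execute cycle described above can be sketched as a tiny simulator in C. This is a simplified illustration under several assumptions: it only understands the three instruction forms used in the five-halfword example (MOVS Rd, #imm8; ADDS Rd, Rn, Rm; BX lr), it ignores the status flags and the pipeline, and it treats the PC as the address of the instruction being fetched. The names Cpu, step, and run are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

#define CODE_BASE 0x080001ACu   /* address of the first halfword in the example */

typedef struct {
    uint32_t r[16];             /* r0-r12, r13 = sp, r14 = lr, r15 = pc */
} Cpu;

/* The five 16-bit machine instructions from the example, in address order. */
static const uint16_t code[] = { 0x2100, 0x2201, 0x188B, 0x2000, 0x4770 };

/* One fetch-decode-execute cycle.
   Returns 1 to continue, 0 after executing BX lr, -1 if nothing was executed. */
int step(Cpu *cpu)
{
    uint32_t pc = cpu->r[15];
    if (pc < CODE_BASE || (pc & 1u)) return -1;
    size_t index = (pc - CODE_BASE) / 2;
    if (index >= sizeof code / sizeof code[0]) return -1;

    uint16_t instr = code[index];                  /* fetch */

    if ((instr & 0xF800u) == 0x2000u) {            /* decode: MOVS Rd, #imm8 */
        cpu->r[(instr >> 8) & 7u] = instr & 0xFFu; /* execute */
        cpu->r[15] = pc + 2;                       /* 16-bit instruction: pc + 2 */
        return 1;
    }
    if ((instr & 0xFE00u) == 0x1800u) {            /* decode: ADDS Rd, Rn, Rm */
        unsigned rd = instr & 7u;
        unsigned rn = (instr >> 3) & 7u;
        unsigned rm = (instr >> 6) & 7u;
        cpu->r[rd] = cpu->r[rn] + cpu->r[rm];      /* execute (flags not modeled) */
        cpu->r[15] = pc + 2;
        return 1;
    }
    if (instr == 0x4770u) {                        /* decode: BX lr */
        cpu->r[15] = cpu->r[14] & ~1u;             /* branch to lr, clear Thumb bit */
        return 0;
    }
    return -1;                                     /* not handled by this sketch */
}

/* Run until BX lr or an unhandled fetch; returns the number of instructions executed. */
int run(Cpu *cpu)
{
    int executed = 0;
    for (;;) {
        int status = step(cpu);
        if (status < 0) break;
        executed++;
        if (status == 0) break;
    }
    return executed;
}
```

Starting with r15 set to 0x080001AC, run executes five instructions and leaves r1 = 0, r2 = 1, r3 = 1, and r0 = 0, the same register values shown in the step-by-step slides.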
Pipeline of 32-bit instructions
(Figure: pipeline timing diagram against the clock. Instructions i, i + 1, i + 2 each pass through Fetch, Decode, and Execution, each starting one clock cycle after the previous instruction.)

Pipeline of 16-bit instructions (figure not reproduced)

Three-stage pipeline: working
Machine codes are stored in memory. The CPU contains the registers r0 to r12, r13 (sp), r14 (lr), r15 (pc), and the ALU.

    Address       Data
    0xFFFFFFFF
    ...
    0x080001B4    4770
    0x080001B2    2000
    0x080001B0    188B
    0x080001AE    2201
    0x080001AC    2100
    ...
    0x00000000

Fetch instruction: pc = 0x080001AC
Decode instruction: 2100 = MOVS r1, #0x00
    pc = 0x080001AC

Execute instruction: MOVS r1, #0x00
    pc = 0x080001AC, r1 = 0x00000000

Fetch next instruction: pc = pc + 2
    pc = 0x080001AE, r1 = 0x00000000
Thumb-2 consists of a mix of 16-bit and 32-bit instructions. 2 bytes from the instruction memory are fetched in this example.
Fetch next instruction: pc = pc + 2
Decode & execute: 2201 = MOVS r2, #0x01
    pc = 0x080001AE, r2 = 0x00000001, r1 = 0x00000000

Fetch next instruction: pc = pc + 2
Decode & execute: 188B = ADDS r3, r1, r2
    pc = 0x080001B0, r3 = 0x00000001, r2 = 0x00000001, r1 = 0x00000000

Fetch next instruction: pc = pc + 2
Decode & execute: 2000 = MOVS r0, #0x00
    pc = 0x080001B2, r3 = 0x00000001, r2 = 0x00000001, r1 = 0x00000000, r0 = 0x00000000

Fetch next instruction: pc = pc + 2
Decode & execute: 4770 = BX lr
    pc = 0x080001B4, r3 = 0x00000001, r2 = 0x00000001, r1 = 0x00000000, r0 = 0x00000000

Byte-based addressing
Each memory address size = 1 byte.
The program counter is incremented by 1 to move to the next instruction: PC = PC + 1
Example: Intel 8051, Microchip PIC series (8 bits), Atmel AVR (8 bits), Motorola 6800 series.

Word-based addressing
Each memory address size = 2 bytes = 1 Word.
The program counter is incremented by 2 to move to the next instruction: PC = PC + 2
Example: Intel x86 (16-bit mode), Motorola 68000.
DWord-based addressing
Each memory address size = 4 bytes = 32 bits = 1 DWord.
The program counter is incremented by 4 to move to the next instruction: PC = PC + 4
Example: Cortex-M, Intel x86 (32-bit mode), ARM (in 32-bit mode), MIPS (Microprocessor without Interlocked Pipelined Stages).

QWord-based addressing
Each memory address size = 8 bytes = 64 bits = 1 QWord.
The program counter is incremented by 8 to move to the next instruction: PC = PC + 8
Fetch size can be 8 or 4, depending upon the processor design.
Example: Intel x86-64, AMD64, ARM (in 64-bit mode).

Cortex-M: 32-bit or 16-bit
The PC is incremented by 4 per fetch, since each fetch brings in one 32-bit instruction or two 16-bit instructions.
If bits [15:11] of a half-word are 11101, 11110, or 11111, it is the first half-word of a 32-bit instruction. Otherwise, it is a 16-bit instruction.

Executing a Machine Program: Cortex-M
Example 1: Calculate the Sum of an Array
    int a[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    int total;
    int main(void){
        int i;
        total = 0;
        for (i = 0; i < 10; i++) {
            total += a[i];
        }
        while(1);
    }

Execution: Memory Mapping
The CPU is connected to the instruction memory, the data memory, and the I/O devices.
Instruction Memory (Flash), starting memory address 0x08000000: holds the code of main.
Data Memory (RAM), starting memory address 0x20000000: holds the array a and the variable total.

Execution: Comparison
Instruction Memory (Flash), starting memory address 0x08000000. The machine code and assembly below correspond to the main function of Example 1.

    Machine code                                       Assembly
    0010 0001 0000 0000                                MOVS r1, #0x00
    0100 1010 0000 1000                                LDR r2, =total_addr
    0110 0000 0001 0001                                STR r1, [r2, #0x00]
    0010 0000 0000 0000                                MOVS r0, #0x00
    1110 0000 0000 1000                                B Check
    0100 1001 0000 0111                        Loop:   LDR r1, =a_addr
    1111 1000 0101 0001 0001 0000 0010 0000            LDR r1, [r1, r0, LSL #2]
    0100 1010 0000 0100                                LDR r2, =total_addr
    0110 1000 0001 0010                                LDR r2, [r2, #0x00]
    0100 0100 0001 0001                                ADD r1, r1, r2
    0100 1010 0000 ...                                 LDR r2, =total_addr
                                                       STR r1, [r2, #0x00]
                                                       ADDS r0, r0, #1
                                               Check:  CMP r0, #0x0A

Execution: Memory
    Address       Content       Variable
    0x20000054    0x00000000
    0x20000050    0x00000000
    0x2000004C    0x00000000
    0x20000048    0x00000000
    0x20000044    0x00000000
    0x20000040    0x00000000
    0x2000003C    0x00000000
    0x20000038    0x00000000
    0x20000034    0x00000000
    0x20000030    0x00000000
    0x2000002C    0x00000000
    0x20000028    0x00000000    total = 0x00000000
    0x20000024    0x0000000A    a[9] = 0x0000000A
    0x20000020    0x00000009    a[8] = 0x00000009
    0x2000001C    0x00000008    a[7] = 0x00000008
    0x20000018    0x00000007    a[6] = 0x00000007
    0x20000014    0x00000006    a[5] = 0x00000006
    0x20000010    0x00000005    a[4] = 0x00000005
    0x2000000C    0x00000004    a[3] = 0x00000004
    0x20000008    0x00000003    a[2] = 0x00000003
    0x20000004    0x00000002    a[1] = 0x00000002
    0x20000000    0x00000001    a[0] = 0x00000001
Memory content of Data Memory (RAM) for
    int a[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    int total;
Assume the starting memory address of the data memory is 0x20000000.

Execution: Example 2: calculates the sum of two global integer variables
Comparison of a C program and its equivalent assembly and machine program.
"LDR r1, =a" is a pseudo-instruction that sets register r1 to the memory address of the variable a.
The assembler translates it to "LDR r1, [pc, #12]". The memory address of the variable a is stored at the memory location "[pc, #12]".

Execution: Loading a Program
Example 2: calculates the sum of two global integer variables
1. Fetch the instruction from the instruction memory,
2. Decode the instruction, and
3. Execute the arithmetic or logic operation, update the program counter for a branch instruction, or access the data memory for a load or store instruction.

Execution: Starting the execution
Each processor has a startup program called a bootloader, which sets up the runtime environment after completing self-testing.
For an assembly program, the PC points to the first instruction of the _main function.
When the program starts, PC is 0x0800_0160.
Because each instruction takes 16 bits in this example, the memory address of the next instruction is 0x0800_0162.
An integer takes 4 bytes in memory.

Execution: Explanation (figures not reproduced)

Execution: Program Completion
The program counter (PC) is kept at 0x0800_016E, repeatedly pointing to the instruction 0xE7FE.
A dead loop at the end of the main function is where the program ends.
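Why does 0xE7FE keep the PC at the same address? A short sketch in C can decode it. It assumes the 16-bit Thumb unconditional branch format (bits [15:11] = 11100 followed by an 11-bit offset counted in half-words) and the rule that a branch is taken relative to the instruction's own address plus 4. The function names are hypothetical. The first helper applies the bits [15:11] rule quoted earlier for telling 16-bit and 32-bit instructions apart.

```c
#include <stdint.h>

/* Returns 1 if this half-word is the first half of a 32-bit instruction,
   i.e. bits [15:11] are 11101, 11110, or 11111. */
int is_32bit_first_halfword(uint16_t halfword)
{
    unsigned top5 = halfword >> 11;
    return top5 == 0x1Du || top5 == 0x1Eu || top5 == 0x1Fu;
}

/* Decodes a 16-bit unconditional branch (11100 imm11) located at instr_addr.
   On success stores the branch target and returns 1; returns 0 if the
   half-word is not that instruction form. */
int branch_target(uint32_t instr_addr, uint16_t halfword, uint32_t *target)
{
    if ((halfword >> 11) != 0x1Cu) return 0;

    /* imm11 counts half-words: shift left by 1 for a 12-bit byte offset. */
    int32_t offset = (int32_t)(halfword & 0x7FFu) << 1;
    if (offset & 0x800) offset -= 0x1000;          /* sign-extend from 12 bits */

    /* The PC value used by a branch is the instruction address plus 4. */
    *target = instr_addr + 4u + (uint32_t)offset;
    return 1;
}
```

For 0xE7FE at address 0x0800016E, imm11 is 0x7FE, the byte offset is -4, and the target is 0x0800016E + 4 - 4 = 0x0800016E. The instruction branches to itself, which is the dead loop while(1).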