Chapter3_1-3-v2.pdf

Chapter 3 ARM Assembly Language

3.1 ARM Assembly Language and programming environment

ARM assembly language

ARM instructions are written in the form:

    Label   Op-code   operand1, operand2, operand3   //comment

Consider the following example implementing a loop:

    Test_5  ADD   r0,r1,r2    //TotalTime = Time + NewTime
            SUBS  r7,#1       //Decrement loop counter
            BEQ   Test_5      //IF zero THEN goto Test_5

The Label field is a user-defined label that other instructions can use to refer to that line.
Any text following a double slash is treated as a comment and is ignored by the assembler.

ARM assembly language and CPUlator

We use the CPUlator simulator as the programming environment in this course. More details about this simulator will be provided in Lab 1.
CPUlator is web-based and needs no installation: https://cpulator.01xz.net/?sys=arm-de1soc

Consider this program, which you will see in Lab 1:

    .org 0x1000        // Start at memory location 1000
    .text              // Code section
    .global _start
    _start:
    mov r0, #9         // Store decimal 9 in register r0
    mov r1, #0xE       // Store hex E (decimal 14) in register r1
    add r2, r1, r0     // Add the contents of r0 and r1 and put result in r2
    // End program
    _stop: b _stop
    .end

Two types of statements:

1. Assembler directives
– Used to control the assembler.
– Do not generate machine code.
– Highlighted in red. All directives in CPUlator start with a period.

2. Executable instructions
– Generate machine code to perform various operations.
– Pseudoinstruction: an instruction available to the programmer but not part of the processor's instruction set. It is a shorthand expression that the assembler converts to appropriate machine code (e.g., ldr r1, =0xFF896788; discussed in more detail after covering the fundamentals).
Directives used in the sample program

– .org (format: .org memory_address) tells the assembler where in memory to place the program.
– .text identifies the code section of the program.
– .global (format: .global label) identifies globally visible labels.
– .end identifies the end of the source code and data.

Instructions used in the sample program

mov (Move) puts a value into a register. Two formats:
– mov r0, #9 puts the (immediate) decimal value 9 into register r0.
– mov r1, r0 copies the value in register r0 into register r1.

add (Add) adds two values and places the result into a register. Two formats:
– add r2, r1, r0 adds the contents of r0 and r1 and puts the result in register r2.
– add r3, r1, #7 adds the contents of r1 and the decimal value 7 and puts the result in register r3.

b (Branch) branches execution to a memory location. The instruction

    _stop: b _stop

ends a program by branching to its own location indefinitely.

CPUlator interface

[Figure: CPUlator screenshots, with the Registers and Memory panes labelled]

– Address: the address of the instruction. The program starts at address 0x1000, as specified by the .org directive.
– Opcode: the 32-bit instruction encoded in binary format. It specifies the operation (e.g., ADD) and the operands.

You will learn to use the debugging tools in CPUlator in Labs 2 and 3. The following statement, which you will encounter in Lab 2, is a pseudoinstruction:
– ldr r0, =0x12345678
– This pseudoinstruction puts a number into a register.
– You will learn later why we need this statement and how this pseudoinstruction works.

3.2 ARM Data-processing instructions

ARM Data Processing Instructions

Consist of:
– Arithmetic:    ADD ADC SUB SBC RSB RSC
– Logical:       AND ORR EOR BIC
– Comparisons:   CMP CMN TST TEQ
– Data movement: MOV MVN

These instructions only work on registers, NOT memory.

Syntax: <Operation>{<cond>}{S} Rd, Rn, Operand2
– Comparisons set flags only; they do not specify Rd.
– Data movement does not specify Rn.

The second operand is sent to the ALU via the barrel shifter.
Updating ARM condition codes

ARM does not automatically update its status flags after an operation.
Update-on-demand: the status flags are updated only when the programmer appends an "S" suffix to the mnemonic, i.e.,
– ADD r1, r2, r3 does not update the status flags
– ADDS r1, r2, r3 updates the status flags

This allows the programmer to perform a test and then carry out other instructions without changing the flags:

    SUBS r1, r1, #1   ; subtract 1 from r1 and set status bits
    ADD  r2, r2, #4   ; increment r2 (don't update status bits)
    BEQ  Error        ; if r1 is zero then deal with the problem

Addition

Suppose you have a computer capable of adding 8-bit values, but you want to add two 16-bit values. You can split the 16-bit addition into two 8-bit additions, but the answer is wrong if the carry out of the least significant byte is ignored:

    Two 8-bit additions, carry ignored      Two 8-bit additions, carry propagated
         34 32                                   34 32
       + 57 DF                                 + 57 DF
       -------                                 -------
         8B 11                                   8C 11   (carry C = 1 out of the low byte)

Multi-Word Addition

The following shows how a 64-bit addition is performed (r0 and r2 hold the low words, r1 and r3 the high words):

    ADDS r4, r0, r2
    ADC  r5, r1, r3

The S suffix on ADD is required, because the C flag in the status register (CPSR) must be updated.
ADC: add with carry.

Example

      35F62562FA
    + 21F412963B

Subtraction

– Subtraction: SUB r1, r2, r3: [r1]←[r2]-[r3]
– SBC: subtract with borrow
– Reverse subtraction: RSB r1, r2, r3: [r1]←[r3]-[r2]

As the literal must be the second operand, RSB allows you to subtract a register's contents from a literal, e.g.,
– RSB r1, r2, #10: [r1]←10-[r2]

ARM has no negation instruction. RSB also allows you to implement negation:
– RSB r1, r1, #0: [r1]←0-[r1]

Examples: SUBS

[Figure: worked SUBS examples; the flag results shown are C = 1, Z = 0, N = 0 and V = 0]
Note: if C = 1, there is no borrow; if C = 0, there is a borrow.

Example: Multi-Word Subtraction

      35F412963B
    - 21F62562FA

Split into 32-bit words:

    high words: 00000035 - 00000021
    low words:  F412963B - F62562FA

Bitwise logical operations

ARM does not have a NOT instruction of this form, but NOT can be implemented by one of the following two approaches:
1.
mvn r2, r1: mvn copies the logical complement of r1 into r2
2. Apply EOR with the second operand equal to 0xFFFFFFFF. Note that for each bit x, x EOR 1 = NOT x (i.e., 0 EOR 1 = 1 and 1 EOR 1 = 0).

Shift operations

In eight bits, if the carry C = 1 and the word to be shifted is 01101110, a rotate left through carry gives 11011101 and carry = 0.

The Barrel Shifter in ARM

Using the barrel shifter: the second operand

ARM combines shifting with other data-processing operations, because the second operand can be shifted before it is used. Consider:

    ADD r0,r1,r2, LSL #1

A logical shift left is applied to the contents of r2 before they are added to the contents of r1. This operation is equivalent to [r0] ← [r1] + [r2] x 2.

A logical shift without any other operation can be achieved by, e.g.:

    MOV r0, r0, LSL #1

[Figure: Operand 1 feeds the ALU directly; Operand 2 feeds the ALU through the barrel shifter; the ALU produces the Result]

Static vs. Dynamic Shifts

Static shift
– The number of shifts is fixed in the code, e.g., ADD r0,r1,r2, LSL #1 implements [r0]←[r1]+[r2]x2

Dynamic shift
– Consider ADD r0,r1,r2, LSL r3, which implements [r0]←[r1]+[r2]x2^[r3]
– This allows you to change the number of shifts at runtime.

Instruction Encoding

[Figure: data-processing instruction format, showing the 4-bit, 4-bit and 12-bit fields for the static-shift and dynamic-shift forms]

– Bit 25: the second operand is a register (0) or a literal (1).
– Bits 21-24: Opcode.
– 4 bits for each of the destination and first source operand.
– 12 bits for the second operand, consisting of:
  – shift type
  – static shift: a 5-bit unsigned integer in Bits 7 to 11 specifies the number of shifts. Range is 0 to 31.
  – dynamic shift: a 4-bit register number in Bits 8 to 11 specifies the register containing the number of shifts.

Literal Encoding

– No ARM instruction can contain a 32-bit immediate constant.
– The data-processing instruction format has 12 bits available for operand2.
– The literal processed by the instruction is the 8-bit value stored in Bits 0-7, rotated right (ROR) by 2 times the value stored in Bits 8-11.
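The literal-encoding rule above (an 8-bit value in Bits 0-7, rotated right by twice the value in Bits 8-11) can be sketched in Python. This is only an illustrative model, not assembler source: the function names ror32, decode_literal and encode_literal are hypothetical, and encode_literal assumes the encoding with the smallest rotate field is the one wanted.

```python
def ror32(value: int, amount: int) -> int:
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    value &= 0xFFFFFFFF
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def decode_literal(machine_code: int) -> int:
    """Recover the literal from the low 12 bits of a data-processing instruction."""
    imm8 = machine_code & 0xFF        # Bits 0-7: the 8-bit value
    rot = (machine_code >> 8) & 0xF   # Bits 8-11: half the rotate-right amount
    return ror32(imm8, 2 * rot)


def encode_literal(value: int):
    """Return the 12-bit operand2 field for `value`, or None if it cannot be encoded.

    Assumption: the first (smallest) rotate field that works is returned.
    """
    for rot in range(16):
        # A rotate left by 2*rot undoes a rotate right by 2*rot.
        imm8 = ror32(value, (32 - 2 * rot) % 32)
        if imm8 <= 0xFF:
            return (rot << 8) | imm8
    return None


if __name__ == "__main__":
    print(hex(decode_literal(0xE3A00A01)))   # low 12 bits A01
    print(hex(encode_literal(0x00FF0000)))   # rotate field 8, value 0xFF
    print(encode_literal(0x12345678))        # no 8-bit rotated form exists
```

A value for which encode_literal returns None is the kind of constant that has to be loaded some other way, for example with the ldr rX, =value pseudoinstruction mentioned in Section 3.1.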
Example of literal encoding

The 4-bit field encodes the rotate-right amount divided by 2, where (ror) = 32 - (rol):

    Rotate left (rol)   Rotate right (ror)   Encode
    0                   0                    0
    4                   28 = 2 x 14          14 decimal or E in hex
    10                  22 = 2 x 11          11 decimal or B in hex
    24                  8 = 2 x 4            4 decimal or 4 in hex

The assembler converts immediate values to the rotate form:
– MOV r0,#4096 ; uses 0x01, left shift 12, ror 20, encode 10 or A
  machine code: E3A00A01
– ADD r1,r2,#0xFF0000 ; uses 0xFF, left shift 16, ror 16, encode 8
  machine code: E28218FF

You should be able to convert the least significant 12 bits of the machine code to the literal being encoded.

The MVN (move NOT) instruction MVN r0, #XX can specify an unshifted constant in the range 0xFFFFFF00 to 0xFFFFFFFF, e.g.,
– MOV r0, #0xFFFFFFFF ; assembles to MVN r0,#0
Values that cannot be generated in this way will cause an error.

More examples on literal encoding

Determine the literal operand encoded by the following 32-bit machine code. Express your answer in hexadecimal representation.

(a) E3A00C67
    Encoded C → ror 12 × 2 = 24 → left shift = 32-24 = 8.
    Literal = 67 shifted by 8 bits to the left = 6700
(b) E2810E89
    Encoded E → ror 14 × 2 = 28 → left shift = 32-28 = 4.
    Literal = 89 shifted by 4 bits to the left = 890
(c) E283260F
    Encoded 6 → ror 6 × 2 = 12 → left shift = 32-12 = 20.
    Literal = 0F shifted by 20 bits to the left = F00000
(d) E3A04807
    Encoded 8 → ror 8 × 2 = 16 → left shift = 32-16 = 16.
    Literal = 07 shifted by 16 bits to the left = 70000

Multiplication

Multiplication takes two m-bit operands and outputs a 2m-bit result.
A multiplication that treats its operands as unsigned does not give the correct 2m-bit result for 2's complement values, so the same multiplication operation cannot be used for both signed and unsigned values.
We focus on unsigned multiplication here.
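The point that signed and unsigned values need different multiplication operations can be checked with a small Python model. It is a sketch with m = 8 to keep the numbers short; the helper names to_signed, mul_unsigned and mul_signed are hypothetical and do not correspond to ARM instructions.

```python
def to_signed(x: int, m: int) -> int:
    """Interpret the low m bits of x as a 2's complement number."""
    x &= (1 << m) - 1
    return x - (1 << m) if x & (1 << (m - 1)) else x


def mul_unsigned(a: int, b: int, m: int) -> int:
    """2m-bit product of two m-bit unsigned operands."""
    mask = (1 << m) - 1
    return ((a & mask) * (b & mask)) & ((1 << (2 * m)) - 1)


def mul_signed(a: int, b: int, m: int) -> int:
    """2m-bit product of two m-bit 2's complement operands."""
    return (to_signed(a, m) * to_signed(b, m)) & ((1 << (2 * m)) - 1)


if __name__ == "__main__":
    a, b, m = 0xFF, 0x02, 8                  # 0xFF is 255 unsigned, -1 signed
    print(hex(mul_unsigned(a, b, m)))        # 255 * 2
    print(hex(mul_signed(a, b, m)))          # -1 * 2, as a 16-bit pattern
```

In this model the two products agree in their low m bits and differ only in the high m bits, which is why the distinction matters once the full 2m-bit result is kept.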
For more information: https://bohr.wlu.ca/cp216/docs/QRC0001_UAL.pdf

Unsigned multiplications

[Table with columns Operation, Assembler, Action; contents not recovered]

– Cannot use the same register to specify both Rd and Rm.
– MUL, MLA and MLS truncate the multiplication result to the lower-order 32 bits.
– UMULL and UMLAL store the full 64-bit result in two registers, RdLo and RdHi.

3.3 ARM Flow Control Instructions

Unconditional branch

Format: B target, where target denotes the branch target address (the address of the next instruction to be executed).

Example:

          ..   do this       ; Some code
          ..   then that     ; Some other code
          B    Next          ; Now skip past next instructions
          ..   …             ; the code being skipped past
          ..   …             ; the code being skipped past
    Next  ..                 ; Target address for the branch

Conditional branch

    IF (X == Y) THEN Y = Y + 1;
    ELSE Y = Y + 2

A test is performed and one of two courses of action is carried out depending on the outcome. We can translate this as:

           CMP  r1,r2      ;r1 contains y and r2 contains x: compare them
           BNE  Plus2      ;if not equal, then branch to the else part
           ADD  r1,r1,#1   ;if equal, fall through to here and + 1 to y
           B    leave      ;now skip past the else part
    Plus2  ADD  r1,r1,#2   ;ELSE part add 2 to y
    leave  …               ;continue from here

The conditional branch instruction tests flag bits in the processor's CPSR, then takes the branch if the tested condition is true.
Here, CMP computes [r1]-[r2] and updates the CPSR; the conditional branch then tests the condition NE (not equal), which corresponds to !Z, i.e., the condition is true when Z = 0.
CMP always updates the CPSR flags (i.e., no S suffix is needed).

Condition Codes

The possible condition codes are tabulated here:

[Table: condition codes; contents not recovered]

Branching for signed and unsigned data

Branch instructions can refer to signed or unsigned data. Consider the example:
– [r1] = 0x90000075
– [r2] = 0x50000075
– Which value is larger? The answer depends on whether you interpret the numbers as signed or unsigned.
– if interpreted as unsigned numbers, [r1] > [r2].
– if interpreted as signed numbers, [r1] is a negative number and [r2] is a positive number. Thus [r2] > [r1].

Branching for signed and unsigned data

The computer does not know whether you want the comparison interpreted as signed or unsigned. When executing CMP r1, r2, the computer just performs [r1] - [r2] and updates the CPSR:

      90000075              90000075
    - 50000075      =     + AFFFFF8B   (2's complement)
                          ----------
                          1 40000000

CPSR: C = 1, N = 0, V = 1 (-ve + -ve = +ve), Z = 0

If the programmer wants to test [r1]>[r2] and treats the operands as unsigned numbers → use BHI (unsigned higher)
– BHI takes the branch, as its condition (C and !Z) is satisfied.
If the programmer wants to test [r1]>[r2] and treats the operands as signed numbers → use BGT (signed greater than)
– BGT does not take the branch, as N != V (BGT requires Z = 0 and N = V).

Test instruction: TST

Two test instructions explicitly update the CPSR: TST and TEQ.
TST Rn, Operand2 updates the CPSR on Rn AND Operand2.
It is useful for testing whether an individual bit of a word is 1.
E.g., Bit 5 of a lower-case ASCII character is set to 1. To test whether an ASCII character in r0 is lower-case:

    TST r0, #2_00100000   // test Bit 5 (mask 0x20)
    BNE LowerCase         // jump to code handling lower case

– If lower-case, the result is non-zero (i.e., Z = 0) → branch taken
– If capital, the result is zero (i.e., Z = 1) → branch not taken

Test instruction: TEQ

TEQ Rn, Operand2 updates the CPSR on Rn EOR Operand2.
The Z bit is set if [Rn] == [Operand2]:
– [Rn] == [Operand2]: all bits in the result are 0 (i.e., Z = 1)
– [Rn] != [Operand2]: at least one bit in the result is 1 (i.e., Z = 0)
Similar to CMP, but TEQ does not update the overflow flag (there is no concept of V in logical operations). It is only used to test equivalence, not less-than or greater-than conditions.

Conditional Execution

In ARM, each instruction is conditionally executed (i.e., an instruction is executed only if the condition given by the condition code appended to its mnemonic is satisfied).
So far, we have only dealt with the default case, AL (always):
– ADD r0, r1, r2 means ADDAL r0, r1, r2
Bits 28-31 in Fig.
3.26 are used to encode the condition, e.g., if Z = 1 then [r0] ←[r1]+[r2] can be implemented as:

    ADDEQ r0, r1, r2

Example 1

Conditional execution improves code density and performance by reducing the number of forward branch instructions. E.g., for

    if (a == 0) b = c + d;

With a branch:

          CMP  r3,#0
          BNE  skip
          ADD  r0,r1,r2
    skip  …

With conditional execution:

          CMP   r3,#0
          ADDEQ r0,r1,r2

Example 2

[Figure not recovered; surviving labels: a==b, ab]

Pipelining

ARM7 has a three-stage pipeline:
– fetch
– decode
– execute

When ADD is being executed (Cycle 3), the pc is pointing to CMP, which is two instructions (8 bytes) below ADD:

    pc = executed address + 8

    Address   Instruction
    100C      ADD
    1010      SUB
    1014      CMP    ← pc

For our example, when ADD is executed:
– pc = 0x100C + 8 = 0x1014

No Pipelining

[Figure: timing diagram for tasks 1 to 4 without pipelining, steps labelled 30]

With Pipelining

[Figure: timing diagram for tasks 1 to 4 with pipelining, steps labelled 30]

Pipelining

Because PC = executed address + 8 while an instruction executes, pipelining has an effect on address encoding, as described below.

Branch instruction encoding

How does the machine code encode the destination address of the branching operation?

The 24-bit signed offset stores the distance, in instructions, from the current PC to the destination instruction.
Note that each instruction is 4 bytes long. Thus, the 24-bit offset is shifted left by two bits to convert the instruction offset into a byte offset → a 26-bit signed offset.
Range of offset: -32 Mbytes to +32 Mbytes, or -8 M instructions to +8 M instructions
– 26-bit signed: -2^25 to 2^25 - 1 bytes
– 24-bit signed: -2^23 to 2^23 - 1 instructions
– 1M = 2^20

Branch offset

[Figure: (a) backward branch (-ve offset) and (b) forward branch (+ve offset), each showing Instruction 1, Instruction 2 and the current PC]

Note that, due to pipelining, when a branch instruction is executed the PC points two instructions below the branch instruction.
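The offset calculation described above can be sketched in Python: take the destination minus the current PC (the branch address plus 8), divide by 4, and keep the result as a 24-bit 2's complement field. The function names and the sample addresses are illustrative assumptions, not part of any ARM tool.

```python
def encode_branch_offset(branch_addr: int, dest_addr: int) -> int:
    """Return the 24-bit offset field for a branch at branch_addr to dest_addr."""
    current_pc = branch_addr + 8              # PC is two instructions ahead
    byte_offset = dest_addr - current_pc
    if byte_offset % 4 != 0:
        raise ValueError("destination is not word-aligned relative to the PC")
    word_offset = byte_offset // 4            # offset counted in instructions
    if not -(1 << 23) <= word_offset < (1 << 23):
        raise ValueError("destination is out of range for a 24-bit offset")
    return word_offset & 0xFFFFFF             # 24-bit 2's complement pattern


def decode_branch_target(branch_addr: int, field: int) -> int:
    """Recover the destination address from a 24-bit offset field."""
    field &= 0xFFFFFF
    if field & 0x800000:                      # sign bit set: negative offset
        field -= 1 << 24
    return branch_addr + 8 + (field << 2)     # shift left by 2: words to bytes


if __name__ == "__main__":
    # Backward branch: instruction at 0x10 branching to 0x08.
    print(hex(encode_branch_offset(0x10, 0x08)))
    # Forward branch: instruction at 0x1000 branching to 0x1010.
    print(hex(encode_branch_offset(0x1000, 0x1010)))
```

Keeping the field in instructions rather than bytes is what extends the reach of a 24-bit field to the 26-bit byte range quoted above.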
Example 1

[Listing with Address and Machine code columns not recovered]

    Encoded address = (Dest - Current PC) / 4
    Dest = 0x08
    Current PC = 0x18 (two instructions beyond bne)
    Encoded address = (0x08 - 0x18) / 4 = -4 = 0xFFFFFC

    2's complement of 000004:
        FFFFFB
      +      1
        ------
        FFFFFC

Example 2

Determine the encoded destination addresses in the covered machine code:

[Listing not recovered]

Branch penalty

        MOV R1, #1
    L1: ADD R2, R2, #1
        B   L1
        MOV R3, #3
        MOV R4, #4

– The first instruction takes 3 cycles to pass through the stages of the pipeline.
– Thereafter, instructions take one cycle to execute.
– The B L1 instruction changes the sequential order of program execution. The instructions in the fetch and decode stages must be dumped (or flushed).
– Effectively, the B L1 instruction takes 3 instruction cycles to execute.

Example

           mov  r0, #255
    Again: subs r0, r0, #1   //1 cycle
           bne  Again        //1 cycle if not branch, 3 if branch

How many instruction cycles are generated by the Again loop?

    Repetition          [r0]          Instruction cycles
    1st repetition      255 → 254     4
    2nd repetition      254 → 253     4
    ……                  ……            ……
    255th repetition    1 → 0         2 (bne branch not taken)

    Repetitions 1 to 254: 254 x 4 cycles
    Total: 254 x 4 + 2 = 1018
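The cycle count for the Again loop can be reproduced with a short Python simulation. This is a sketch under the cost model stated in the example (subs costs 1 cycle, bne costs 3 when taken and 1 when not taken); the function name loop_cycles and its parameters are hypothetical, and the initial mov is not counted, matching the table.

```python
def loop_cycles(initial: int, subs_cost: int = 1,
                taken_cost: int = 3, not_taken_cost: int = 1) -> int:
    """Count instruction cycles for: Again: subs r0, r0, #1 / bne Again."""
    if initial < 1:
        raise ValueError("this sketch assumes the counter starts at 1 or more")
    r0 = initial
    cycles = 0
    while True:
        r0 -= 1                       # subs r0, r0, #1 (sets Z when r0 reaches 0)
        cycles += subs_cost
        if r0 != 0:
            cycles += taken_cost      # bne taken: pipeline is flushed
        else:
            cycles += not_taken_cost  # bne falls through
            break
    return cycles


if __name__ == "__main__":
    print(loop_cycles(255))   # 254 repetitions of 4 cycles, then one of 2
```

Changing taken_cost shows how the branch penalty dominates the total for a loop body this short.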
