Assembly Language Programming I: Introduction

Homework
• Reading
  – Professional Assembly Language, pp 73-106
  – Also, study website references:
    • Using gas: http://www.sourceware.org/binutils/docs-2.12/as.info/index.html
    • i386 Assembly Instruction Set: http://www.cs.umb.edu/~cheungr/cs341/Instructions.pdf
• Labs
  – Continue with labs in your assigned section

C versus Assembly Language
• C is called a “portable assembly language”
  – Allows low level operations on bits and bytes
  – Allows access to memory via use of pointers
  – Integrates well with assembly language functions
• Advantages over assembly code
  – Easier to read and understand source code
  – Requires fewer lines of code for same function
  – Does not require knowledge of the hardware

C versus Assembly Language
• Good reasons for learning assembly language
  – It is a good way to learn how a processor works
  – In time-critical sections of code, it is possible to improve performance with assembly language
  – In writing a new operating system or in porting an existing system to a new machine, there are sections of code which must be written in assembly language, such as the cpuid example in this lecture

Best of Both Worlds
• Integrating C and assembly code
• Convenient to let C do most of the work and integrate with assembly code where needed
• Make our gas routines callable from C
  – Use C compiler conventions for function calls
  – Preserve registers that C compiler expects saved

Instruction’s Four Field Format
• Instruction example:
    here:   movl $0x20, %eax    # move a number 0x20 to eax register
• Label:
  – Can be referred to as a representation of the address
  – Usual practice is to place these on a line by themselves
• Mnemonic to specify the instruction and size
  – Makes it unnecessary to remember instruction code values
  – The size suffix must match the operand: movl for the 32-bit %eax, movb for the 8-bit %al
• Operand(s) on which the instruction operates (if any)
  – Zero, one, or two operands depending on the instruction
• Comment contains documentation
  – It begins with a # anywhere and goes to the end of the line
  – It is very important to comment assembly code well!!

Assembly Framework for a Function
• General form for a function in assembly is:
            .globl mycode
            .text
    mycode:
            . . .
            ret
            .data
    mydata:
            .long 17
            .end
• The lines that begin with a period (.globl, .text, .data, .long, .end) are assembler directives

Assembler Directives
• Defining a label for external reference (call)
            .globl mycode
• Defining the code section of program (ROM)
            .text
• Defining the static data section of program (RAM)
            .data
• End of the assembly language source
            .end

Assembler Directives for Sections
• These directives designate sections where we want our assembler output placed into memory
  – .text places the assembler output into program memory space (e.g., where ROM will be located)
  – .data places the assembler output into a static initialized memory space (e.g., where RAM will be located)
  – .bss allows assembler to set labels for uninitialized memory space (we won’t be using this section)
  – .section: ignore/omit this directive with our assembler
• In builds, ld is given addresses for the sections
*Note: bss stands for “block started by symbol” (area initialized by OS)

Assembler Directives
• Defining / initializing static storage locations:
    label1: .long 0x12345678    # 32 bits
    label2: .word 0x1234        # 16 bits
    label3: .byte 0x12          # 8 bits

Assembler Directives
• Defining / initializing a string
    label1: .ascii "Hello World\n\0"
    label2: .asciz "Hello World\n"
  – .asciz appends the terminating zero byte itself, so the two definitions produce the same bytes

Defining Constant Values
• Constant definitions follow C conventions:
            $123        # decimal constant
            $0x123      # hex constant
            $'a'        # character constant
            $'\n'       # character escape sequence
• With the following exception:
            $'\0'       # this results in '0' instead of 0
                        # to get around problem, use $0

Symbolic Constant Names
• Allow use of symbols for numeric values
  – Perform same function as C preprocessor #define
  – Unfortunately, not the same format as used in C preprocessor, so can’t just include .h files to define symbols across combination of C/assembly code
  – Format: SYMBOL = value
  – Example:
            NCASES = 8
            movl $NCASES, %eax

Addressing Memory
• Direct addressing for memory
  – gas allows use of hard coded memory addresses
  – Not recommended except for hardware-based addresses
  – Examples:
            .text
            movb %al, 0x1234
            movb 0x1234, %dl
            . . .

Addressing Memory
• Direct addressing for memory
  – gas allows use of a label for memory address
  – Examples:
            .text
            movb %al, total
            movb total, %dl
            . . .
            .data
    total:  .byte 0

Addressing Memory
• Indirect - like *pointer in C
  – Defined as using a register as the address of the memory location to access in an instruction
            movl $0x1234, %ebp
            movb (%ebp), %al
  [Figure: %ebp holds the address 0x00001234; one byte at that memory address is loaded into %al]

Addressing Memory
• Indirect with Offset - like *(pointer+4) in C
  – May also be done with a fixed offset, e.g., 4
            movl $0x1234, %ebp
            movb 4(%ebp), %al
  [Figure: %ebp holds the address 0x00001234; memory is drawn from low address to high address, and the one byte at address +4 is loaded into %al]

Addressing Memory
• Memory-memory addressing restrictions
  – Why can’t we write instructions such as these?
            movl first, second      # direct
            movl (%eax), (%ebx)     # indirect
  – The Intel mov instruction cannot take two memory operands, so it cannot move a value from memory to memory!
• Must always use a register as an intermediate location for the value being moved, e.g.
            movl first, %eax        # direct from mem
            movl %eax, second       # direct to mem

Integrating C and Assembly example
• Pick up the makefile from mp2
• Always read the makefile for a program first!
• The makefile in mp2 expects a “matched pair”
  – C driver filename is mycodec.c
  – Assembly filename is mycode.s
• The makefile uses macro substitutions for input:
  – The format of the make command is: make A=mycode

Example: Function cpuid
• C “driver” in file cpuidc.c to execute code in cpuid.s
    /* cpuidc.c - C driver to test cpuid function
     * bob wilson - 1/15/2012
     */
    #include <stdio.h>
    extern char *cpuid();   /* our .s file is external */
    int main(int argc, char **argv)
    {
        printf("The cpu ID is: %s\n", cpuid());
        return 0;
    }

Example: Function cpuid
• Assembly code for function in file cpuid.s
    # cpuid.s  C callable function to get cpu ID value
            .data
    buffer: .asciz "Overwritten!"   # overwritten later
            .text
            .globl cpuid
    cpuid:
            pushl %ebx              # cpuid clobbers %ebx, which C expects preserved
            movl $0,%eax            # zero to get Vendor ID
            cpuid                   # get it
            movl $buffer, %eax      # point to string buffer
            movl %ebx, (%eax)       # move four chars
            movl %edx, 4(%eax)      # move four chars
            movl %ecx, 8(%eax)      # move four chars
            popl %ebx               # restore the caller's %ebx
            ret                     # string pointer is in %eax
            .end

Self Modifying Code
• Our assembler does not actually support the cpuid instruction, so I made the code self-modifying:
            . . .
    cpuid:
            movb $0x0f, cpuid1      # patch in the cpuid first byte
            movb $0xa2, cpuid2      # patch in the cpuid second byte
            movl $0,%eax            # input to cpuid for ID value
    cpuid1:                         # hex for cpuid instruction here
            nop                     # 0x0f replaces 0x90
    cpuid2:
            nop                     # 0xa2 replaces 0x90
            . . .

Self Modifying Code
• Obviously, the self modifying code I used for this demonstration would not work if:
  – The code is physically located in PROM/ROM
  – There is an O/S like UNIX/Linux that protects the code space from being modified (a problem that we avoid using Tutor on our SAPCs)
• Try justifying this “kludge” to the maintenance programmer!!
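
For readers who want to see why the cpuid.s example yields a printable string, here is a minimal C sketch of what its three movl stores do to the buffer. It is an illustration only, not part of the course code: the helper names store_le32 and vendor_string are hypothetical, and the register values used in the usage note are the ones commonly reported by processors that identify as "GenuineIntel", assumed here for the sake of the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value low byte first, the way movl writes it on the x86. */
void store_le32(char *dst, uint32_t value)
{
    for (int i = 0; i < 4; i++)
        dst[i] = (char)((value >> (8 * i)) & 0xff);
}

/* Hypothetical helper: mirror the stores at offsets 0, 4 and 8 in cpuid.s.
 * The register order is %ebx, %edx, %ecx, as in the assembly code. */
void vendor_string(uint32_t ebx, uint32_t edx, uint32_t ecx, char out[13])
{
    assert(out != NULL);
    store_le32(out, ebx);       /* movl %ebx, (%eax)  */
    store_le32(out + 4, edx);   /* movl %edx, 4(%eax) */
    store_le32(out + 8, ecx);   /* movl %ecx, 8(%eax) */
    out[12] = '\0';             /* the .asciz terminator that is left in place */
}
```

Calling vendor_string(0x756e6547, 0x49656e69, 0x6c65746e, id) fills id with "GenuineIntel". The twelve characters exactly cover the twelve characters of "Overwritten!", which is why the original terminating zero of the .asciz buffer still ends the string.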
Self Modifying Code
• Here is self-modifying code in C:
    int main(int argc, char **args)
    {
        // array to hold the machine code bytes of the function
        static char function [100];

        // I used static memory for the array so I could find its address in syms file
        // now put some machine code instructions byte by byte into the function array
        // must put the address of a return string in the %eax before returning
        // must put the machine code for an assembly ret instruction (0xc3) at the end
        function[0] = 0xb8;     // move address of the string to %eax
        . . .
        function[5] = 0xc3;     // and return
        . . .
        // execute the function whose address is the array
        printf("%s\n", (* (char * (*)()) function) ());
        return 0;
    }
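
The six-byte layout implied by function[0] = 0xb8 and function[5] = 0xc3 can be sketched as follows. This is an assumption-laden illustration rather than the lecture's code: 0xb8 is the opcode for movl with a 32-bit immediate into %eax and 0xc3 is ret, so the four bytes in between would hold the string's address, low byte first. The helper name encode_return_string and the address in the usage note are hypothetical, and the sketch only builds the bytes; it does not execute them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write the machine code for
 *     movl $string_addr, %eax
 *     ret
 * into buf and return the number of bytes written. */
size_t encode_return_string(unsigned char *buf, uint32_t string_addr)
{
    assert(buf != NULL);
    buf[0] = 0xb8;                      /* opcode: movl $imm32, %eax */
    for (int i = 0; i < 4; i++)         /* imm32 operand, low byte first */
        buf[1 + i] = (unsigned char)(string_addr >> (8 * i));
    buf[5] = 0xc3;                      /* opcode: ret */
    return 6;
}
```

For a made-up address such as 0x00101234, the buffer becomes b8 34 12 10 00 c3. Running bytes like these from a data array works only where data memory is executable, which is the same caveat the previous slide raises for protected operating systems.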
