Assembly Language Programming I: Introduction

Homework
• Reading
  – Professional Assembly Language, pp 73-106
  – Also, study website references:
    • Using gas: http://www.sourceware.org/binutils/docs-2.12/as.info/index.html
    • i386 Assembly Instruction Set: http://www.cs.umb.edu/~cheungr/cs341/Instructions.pdf
• Labs
  – Continue with labs in your assigned section

C versus Assembly Language
• C is called a “portable assembly language”
  – Allows low level operations on bits and bytes
  – Allows access to memory via use of pointers
  – Integrates well with assembly language functions
• Advantages over assembly code
  – Easier to read and understand source code
  – Requires fewer lines of code for same function
  – Does not require knowledge of the hardware

C versus Assembly Language
• Good reasons for learning assembly language
  – It is a good way to learn how a processor works
  – In time-critical sections of code, it is possible to improve performance with assembly language
  – In writing a new operating system or in porting an existing system to a new machine, there are sections of code which must be written in assembly language, such as the cpuid example in this lecture

Best of Both Worlds
• Integrating C and assembly code
• Convenient to let C do most of the work and integrate with assembly code where needed
• Make our gas routines callable from C
  – Use C compiler conventions for function calls
  – Preserve registers that C compiler expects saved

Instruction’s Four Field Format
• Instruction example:
    here: movl $0x20, %eax    # move a number 0x20 to eax register
• Label:
  – Can be referred to as a representation of the address
  – Usual practice is to place these on a line by themselves
• Mnemonic to specify the instruction and size
  – Makes it unnecessary to remember instruction code values
• Operand(s) on which the instruction operates (if any)
  – Zero, one, or two operands depending on the instruction
• Comment contains documentation
  – It begins with a # anywhere and goes to the end
    of the line
  – It is very important to comment assembly code well!!

Assembly Framework for a Function
• General form for a function in assembly is:
            .globl mycode
            .text
    mycode: . . .
            ret
            .data
    mydata: .long 17
            .end
  (The lines that begin with a dot are assembler directives.)

Assembler Directives
• Defining a label for external reference (call)
    .globl mycode
• Defining the code section of program (ROM)
    .text
• Defining the static data section of program (RAM)
    .data
• End of the assembly language source
    .end

Assembler Directives for Sections
• These directives designate sections where we want our assembler output placed into memory
  – .text places the assembler output into program memory space (e.g., where ROM will be located)
  – .data places the assembler output into a static initialized memory space (e.g., where RAM will be located)
  – .bss allows assembler to set labels for uninitialized memory space (we won’t be using this section)
  – .section ignore/omit this directive with our assembler
• In builds, ld is given addresses for the sections
*Note: bss – block started by symbol (area initialized by OS)

Assembler Directives
• Defining / initializing static storage locations:
    label1: .long 0x12345678    # 32 bits
    label2: .word 0x1234        # 16 bits
    label3: .byte 0x12          # 8 bits

Assembler Directives
• Defining / initializing a string
    label1: .ascii "Hello World\n\0"
    label2: .asciz "Hello World\n"

Defining Constant Values
• Constant definitions follow C conventions:
    $123     # decimal constant
    $0x123   # hex constant
    $'a'     # character constant
    $'\n'    # character escape sequence
• With the following exception:
    $'\0'    # this results in '0' instead of 0
             # to get around problem, use $0

Symbolic Constant Names
• Allow use of symbols for numeric values
  – Perform same function as C preprocessor #define
  – Unfortunately, not the same format as used in C preprocessor, so can’t just include .h files to define symbols across combination of C/assembly code
  – Format: SYMBOL = value
  – Example:
        NCASES = 8
        movl $NCASES, %eax

Addressing Memory
• Direct addressing for memory
  – gas allows use of hard coded memory addresses
  – Not recommended except for HW based addresses
  – Examples:
            .text
            movb %al, 0x1234
            movb 0x1234, %dl
            . . .

Addressing Memory
• Direct addressing for memory
  – gas allows use of a label for memory address
  – Examples:
            .text
            movb %al, total
            movb total, %dl
            . . .
            .data
    total:  .byte 0

Addressing Memory
• Indirect - like *pointer in C
  – Defined as using a register as the address of the memory location to access in an instruction
        movl $0x1234, %ebp
        movb (%ebp), %al
  [Figure: %ebp holds the address 0x00001234; the one byte of memory at that address is loaded into %al]

Addressing Memory
• Indirect with Offset - like *(pointer+4) in C
  – May also be done with a fixed offset, e.g., 4
        movl $0x1234, %ebp
        movb 4(%ebp), %al
  [Figure: %ebp holds the address 0x00001234; the one byte of memory at that address +4, toward the high addresses, is loaded into %al]

Addressing Memory
• Memory-memory addressing restrictions
  – Why can’t we write instructions such as these?
        movl first, second      # direct
        movl (%eax), (%ebx)     # indirect
  – The Intel mov instruction does not support moving a value from memory to memory!
• Must always use a register as an intermediate location for the value being moved, e.g.
        movl first, %eax        # direct from mem
        movl %eax, second       # direct to mem

Integrating C and Assembly example
• Pick up the makefile from mp2
• Always read the makefile for a program first!
• The makefile in mp2 expects a “matched pair”
  – C driver filename is mycodec.c
  – Assembly filename is mycode.s
• The makefile uses macro substitutions for input:
  – The format of the make command is:
        make A=mycode

Example: Function cpuid
• C “driver” in file cpuidc.c to execute code in cpuid.s
    /* cpuidc.c - C driver to test cpuid function
     * bob wilson - 1/15/2012
     */
    #include <stdio.h>
    extern char *cpuid();    /* our .s file is external */
    int main(int argc, char **argv)
    {
        printf("The cpu ID is: %s\n", cpuid());
        return 0;
    }

Example: Function cpuid
• Assembly code for function in file cpuid.s
    # cpuid.s C callable function to get cpu ID value
            .data
    buffer: .asciz "Overwritten!"   # overwritten later
            .text
            .globl cpuid
    cpuid:  movl $0,%eax            # zero to get Vendor ID
            cpuid                   # get it
            movl $buffer, %eax      # point to string buffer
            movl %ebx, (%eax)       # move four chars
            movl %edx, 4(%eax)      # move four chars
            movl %ecx, 8(%eax)      # move four chars
            ret                     # string pointer is in %eax
            .end

Self Modifying Code
• Our assembler does not actually support cpuid instruction, so I made the code self-modifying:
            . . .
    cpuid:  movb $0x0f, cpuid1      # patch in the cpuid first byte
            movb $0xa2, cpuid2      # patch in the cpuid second byte
            movl $0,%eax            # input to cpuid for ID value
                                    # hex for cpuid instruction here
    cpuid1: nop                     # 0x0f replaces 0x90
    cpuid2: nop                     # 0xa2 replaces 0x90
            . . .

Self Modifying Code
• Obviously, the self modifying code I used for this demonstration would not work if:
  – The code is physically located in PROM/ROM
  – There is an O/S like UNIX/Linux that protects the code space from being modified (A problem that we avoid using Tutor on our SAPC’s)
• Try justifying this “kludge” to the maintenance programmer!!
Self Modifying Code
• Here is self-modifying code in C:
    int main(int argc, char **args)
    {
        // array to hold the machine code bytes of the function
        static char function [100];

        // I used static memory for the array so I could find its address in syms file
        // now put some machine code instructions byte by byte into the function array
        // must put the address of a return string in the %eax before returning
        // must put the machine code for an assembly ret instruction (0xc3) at the end
        function[0] = 0xb8;    // move address of the string to %eax
        . . .
        function[5] = 0xc3;    // and return
        . . .
        // execute the function whose address is the array
        printf("%s\n", (* (char * (*)()) function) ());
        return 0;
    }