Questions and Answers
Which of the following is NOT a typical use case for understanding assembly language?
- Cybersecurity analysis
- Writing portable applications (correct)
- Reverse engineering executable files
- Debugging optimized code
Machine code is designed to be easily understood and modified by humans.
False (B)
What is the primary role of a disassembler in the context of program encoding?
Translating machine code into assembly language
In instruction encoding, the part that specifies the action to be performed is called the ______.
Match the following x86-64 registers with their common uses:
Which of the following is NOT a valid data size suffix in x86-64 assembly?
In AT&T assembly syntax, the destination operand comes before the source operand.
What is the purpose of the leaq instruction?
An ______ is a word or name that serves as an alias for a memory address, resolved by the assembler or linker.
Match the data transfer types with their corresponding examples:
Which of the following registers is commonly used to store the top of the stack?
The x86-64 architecture is an example of RISC (Reduced Instruction Set Computing).
In the context of assembly, what is an operand?
In indirect memory access, the effective address is calculated using the formula: Imm + Rb + Ri * ______.
Match the following assembly concepts with their descriptions:
Which of the following best describes the role of inline assembly?
When using inline assembly in GCC, registers that are modified by the assembly code but not listed as outputs do not need to be 'clobbered'.
What does the + constraint signify in the output section of GCC inline assembly?
In GCC inline assembly, the ______ section is used to list registers modified by the assembly code that aren't outputs.
Match the following indirect memory access components with their descriptions:
Which of the following is the main advantage of using registers for data access?
Data can be directly moved from one memory location to another in a single x86-64 instruction.
Explain the difference between immediate and register addressing modes.
In assembly syntax, a memory location is indicated by enclosing the register in ______.
Match the x86-64 register with its size.
In inline assembly syntax, what is the purpose of specifying 'inputs'?
Assembly language code directly translates to instructions executed by the operating system.
Why is it important to 'clobber' registers in inline assembly?
The x86-64 architecture provides ______ 64-bit general-purpose registers.
Match the correct instruction suffix with the register used.
Which of the following is NOT a typical component of an Assembly instruction?
The MOV instruction copies data, but does not modify the original data.
What is the difference between assembly code and machine code?
In indirect memory access, the default scale (s) is ______ when no scale is given.
Match the AT&T syntax and what the registers mean.
Which of the following best describes the role of the instruction?
LEA expressions affect the flags.
Why do we need to know the difference between an immediate, a register, and a memory access?
The Intel microprocessors, beginning with the 80386, use ______ addressing.
Match the type with form of memory.
Which of the following is a valid reason to use assembly instead of a high-level language?
Machine code is portable across different CPUs.
In instruction encoding, what are the two primary components?
In x86-64 assembly, which of the following registers is commonly used for storing function arguments?
In AT&T assembly syntax, the data flows from _____ to _____. Fill in the blanks.
Which of the following is NOT a valid addressing mode in x86-64 assembly?
It is possible to directly move data from one memory location to another with a single instruction.
What is the purpose of 'clobbering' in inline assembly?
What does the acronym CISC stand for?
Given the instruction leaq (%rax,%rbx), %rdx, if %rax holds 5 and %rbx holds 3, after execution, what value will %rdx hold? Enter the numerical value only without '0x'.
Flashcards
Machine Code
What a C program is translated into so that it can run; specific to the CPU, and easier to read as assembly than as raw machine instructions
Operation
Part of the machine language which tells the CPU what to do
Operands
The registers and data that the operation works on, as specified in the instruction encoding
Assembly Syntax
MOV instruction
Code
Registers
Memory
Width Suffix in MOV
Immediate Access
Register Access
Absolute Memory References
Indirect Memory Access
Inline Assembly
asm() statement
LEA
Definition of the x86-64 Architecture
Study Notes
Program Encoding and Data Access
- To run, C programs must be translated into machine code.
- Machine code is CPU-specific.
- Machine code encodes instructions so that they are easy for the CPU to decode.
- Machine code has an essentially one-to-one correspondence with assembly instructions.
Why Assembly?
- Debugging: The flow of optimized code differs from the source program; statements may execute out of order.
- Code Optimization: To find areas suitable for hand optimization and write code that's easy to optimize.
- Hardware Manipulation: Manipulation of special registers and memory locations.
- Reverse Engineering: To analyze executables when source code isn't available.
- Cybersecurity: Analyzing computer viruses and fixing exploitable vulnerabilities.
What to Understand
- The basic architecture of the CPU, including RISC vs CISC, register sizes and purposes, and types of instructions
- Addressing modes such as Immediate, Register, and Memory
- Assembly instruction format (e.g., AT&T vs Intel)
- How to mix assembly with high-level languages like C using standalone or inline assembly
Instruction Encoding
- Instruction encoding comprises operation (what to do) and operands (what data to use).
- Operation is typically encoded as leading bytes.
- The operation dictates the operands needed.
- Instructions can have 0 to 4 operands.
- Operands can have three forms: Immediate (constant), Register (direct), or Memory (indirect).
Register Encodings
- Common registers include rax, rbx, rcx, rdx, rsp, rbp, rsi, and rdi, with variations for 32-bit (e.g., eax), 16-bit, and 8-bit operations.
- Each register has a specific name and a common use, such as return values and system calls (rax), loop counters (rcx), data (rdx), and base pointer for memory access (rbx).
- Instructions such as mov use register encodings to specify source and destination.
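To make the operation/operand split concrete, here is a minimal C sketch that decodes one well-known encoding: the five-byte form of movl $imm32, %reg32, whose opcode byte is 0xB8 plus the register number, followed by a 32-bit immediate stored little-endian. The names mov_imm32, decode_mov_imm32, and reg32_name are illustrative and do not come from the notes; a real decoder also has to handle prefixes and many other opcodes.

```c
#include <stddef.h>
#include <stdint.h>

/* Decoded form of "movl $imm32, %reg32" (opcode bytes 0xB8..0xBF). */
struct mov_imm32 {
    int reg;       /* register number 0..7, taken from the opcode byte */
    uint32_t imm;  /* immediate operand */
};

/* Name of a 32-bit register given its 3-bit encoding. */
static const char *reg32_name(int reg)
{
    static const char *const names[8] = {
        "eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"
    };
    return names[reg & 7];
}

/* Returns 1 and fills *out if bytes start with this mov form, else 0. */
static int decode_mov_imm32(const uint8_t *bytes, size_t len,
                            struct mov_imm32 *out)
{
    if (len < 5 || (bytes[0] & 0xF8) != 0xB8)
        return 0;                      /* not the operation we model */
    out->reg = bytes[0] & 0x07;        /* operand 1: destination register */
    out->imm = (uint32_t)bytes[1]      /* operand 2: little-endian constant */
             | (uint32_t)bytes[2] << 8
             | (uint32_t)bytes[3] << 16
             | (uint32_t)bytes[4] << 24;
    return 1;
}
```

For example, the bytes B8 2A 00 00 00 decode to register 0 (eax) with immediate 42, that is, movl $42, %eax.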
Looking up Information
- An AT&T-style assembly reference is available for Assembly and Machine Code, with a free ebook and the author's website.
- For Intel style, a list of opcodes and the Software Developer Manuals are available for the Intel 64 and IA-32 architectures.
- Online translators are available to translate assembly into machine code.
Disassembler
- Use a disassembler to inspect assembly code.
- Write the C program and compile it with debugging information: gcc -g -o main main.c
- Debug with GDB via gdb ./main.
- Disassemble the main function using (gdb) disassemble main, or use objdump -d main.
x86-64 Architecture
- To use a CPU, understanding available resources is crucial.
- Key aspects include register architecture, general assumptions about the instruction set, and basic assembly syntax.
Registers
- The x86-64 architecture provides 16 64-bit general-purpose registers: %rax, %rcx, %rdx, %rbx, %rsi, %rdi, %rsp, %rbp, and %r8-%r15.
- %rsp and %rbp are reserved for the stack pointer and base pointer, respectively.
- Registers %r8-%r15 were added in the 64-bit architecture; all registers have 32-bit, 16-bit, and 8-bit variations.
- %rsp and %rbp can be read and written like other registers, but by convention they are manipulated mainly around function calls (at function entry and exit, and implicitly by push, pop, call, and ret).
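The narrower register variations are views of the same 64-bit register. The following C sketch models one register as a uint64_t to show how the views relate. The function names are hypothetical. The write rules it models are an x86-64 fact not spelled out in the notes: a 32-bit write zero-extends into the full register, while 16-bit and 8-bit writes leave the upper bits unchanged.

```c
#include <stdint.h>

/* Reading a narrower view returns the low bits of the 64-bit register. */
static uint32_t read32(uint64_t reg) { return (uint32_t)reg; }  /* e.g. %eax */
static uint16_t read16(uint64_t reg) { return (uint16_t)reg; }  /* e.g. %ax  */
static uint8_t  read8(uint64_t reg)  { return (uint8_t)reg; }   /* e.g. %al  */

/* A 32-bit write clears the upper 32 bits of the 64-bit register. */
static uint64_t write32(uint64_t reg, uint32_t v)
{
    (void)reg;                 /* old contents are discarded */
    return (uint64_t)v;
}

/* 16-bit and 8-bit writes replace only the low bits. */
static uint64_t write16(uint64_t reg, uint16_t v)
{
    return (reg & ~0xFFFFULL) | v;
}

static uint64_t write8(uint64_t reg, uint8_t v)
{
    return (reg & ~0xFFULL) | v;
}
```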
Instruction Set for x86-64
- x86-64 is a CISC architecture with a wide variety of available instructions.
- Instructions include data transfer, arithmetic operations, bitwise manipulation, and branch instructions.
- The data transfer and manipulation instructions take operands of immediate, register, or memory type.
- An add instruction can modify memory directly.
- Instructions typically have only one memory (indirect) operand.
- A mov instruction cannot copy data from one memory location to another in a single step; the data has to pass through a register.
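Because each mov has at most one memory operand, copying between two memory locations takes a load followed by a store. Below is a minimal sketch using GCC extended inline assembly; copy_qword is a hypothetical name, and the plain C fallback is an assumption added so the file still builds on non-x86-64 targets. The spelling __asm__ is used because it also works under strict -std= modes.

```c
#include <stdint.h>

/* Copy one 64-bit value from *src to *dst through a scratch register. */
static void copy_qword(const uint64_t *src, uint64_t *dst)
{
#if defined(__x86_64__) && defined(__GNUC__)
    uint64_t tmp;
    __asm__("movq %2, %0;"     /* memory -> register */
            "movq %0, %1;"     /* register -> memory */
            : "=&r"(tmp),      /* scratch register, written early */
              "=m"(*dst)       /* destination memory operand */
            : "m"(*src));      /* source memory operand */
#else
    *dst = *src;               /* fallback for other targets */
#endif
}
```

The "=&r" early-clobber marker keeps the compiler from placing tmp in a register that is also used to address *src or *dst.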
Basic Assembly Syntax
- Assembly code consists of instructions, typically one per line.
- Instructions start with an operation mnemonic like "mov", "add", or "call".
- Operands can be immediate constants (e.g., $42), immediate labels, registers (e.g., %rax), absolute memory, or indirect memory (enclosed in parentheses).
- Lines starting with . are assembler directives.
AT&T Assembly Syntax
- The AT&T assembly syntax (from Bell Labs) flows data from left to right, with the source preceding the destination operand.
Data Access in x86-64
- Most instructions transfer data, involving loading, modifying, or storing data.
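- Data can be stored as Code (immediate values, constants), Registers (on the CPU), or Memory (RAM).
- The CPU can perform five kinds of data transfers between code, registers, and memory: immediate to register, immediate to memory, register to register, register to memory, and memory to register.
- Code cannot change, so an immediate value can only be a source, never a destination.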
Data Access - q, l, w, b
- Instruction suffixes like q, l, w, and b specify the size of the data being moved (64-bit, 32-bit, 16-bit, and 8-bit, respectively).
- The assembler can infer the width from the operands if not specified.
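As a rough model of what the suffix controls, the C sketch below maps each suffix to its width in bytes and simulates a mov of that width into memory, writing the low bytes in x86's little-endian order. Both function names are hypothetical, and the model covers only the store side of a mov.

```c
#include <stddef.h>
#include <stdint.h>

/* Width in bytes selected by an instruction suffix; 0 if unknown. */
static size_t suffix_width(char suffix)
{
    switch (suffix) {
    case 'q': return 8;   /* quad word, 64 bits */
    case 'l': return 4;   /* long, 32 bits */
    case 'w': return 2;   /* word, 16 bits */
    case 'b': return 1;   /* byte, 8 bits */
    default:  return 0;
    }
}

/* Model of "mov<suffix> value, (mem)": stores only the low `width` bytes,
 * least significant byte first. Returns the number of bytes written. */
static size_t store_with_suffix(uint8_t *mem, uint64_t value, char suffix)
{
    size_t width = suffix_width(suffix);
    for (size_t i = 0; i < width; i++)
        mem[i] = (uint8_t)(value >> (8 * i));
    return width;
}
```

Bytes beyond the selected width are left untouched, which is why choosing the wrong suffix can silently truncate a value.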
Data Access - Fast vs. Easy
- Immediate values are constants encoded as part of the instruction and begin with a $ sign.
- Registers are on the CPU, indicated by %, allowing for fast data access.
- Absolute memory references use an immediate value as a direct reference to a memory location.
Indirect memory access in x86-64
- Indirect memory access in x86 is designed with arrays in mind; it uses one or more registers as pointers to access a location in memory.
- The general format for indirect access is Imm(Rb,Ri,s), where:
- Imm is an immediate constant, such as 42.
- Rb is the base register (e.g., %rsi).
- Ri is the index register (e.g., %rax).
- s is a scale factor (1, 2, 4, or 8).
- The address is calculated as Imm + Rb + Ri*s.
- If Imm or a register is not present, treat it as 0 in the general formula. If the scale (s) is not present, the default is 1.
- Common simple forms include plain indirect, (Rb), and indirect with displacement, Imm(Rb).
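The address formula can be sketched in C as follows. The function names are hypothetical, and passing s = 0 to mean "scale omitted" is a convention of this sketch only, not assembler syntax. The second function shows the array connection: element i of an int32_t array sits at base + i*4.

```c
#include <stdint.h>

/* Effective address for Imm(Rb,Ri,s): Imm + Rb + Ri*s.
 * Pass 0 for a missing Imm, Rb, or Ri; pass s = 0 for a missing scale. */
static uint64_t effective_address(int64_t imm, uint64_t rb,
                                  uint64_t ri, unsigned s)
{
    if (s == 0)
        s = 1;                       /* default scale when omitted */
    return (uint64_t)imm + rb + ri * s;
}

/* Address of base[i] for 4-byte elements, i.e. the form (Rb,Ri,4). */
static uint64_t element_address(const int32_t *base, uint64_t i)
{
    return effective_address(0, (uint64_t)(uintptr_t)base, i, 4);
}
```

For instance, with Rb = 0x1000 and Ri = 3, the operand 8(Rb,Ri,4) refers to address 0x1000 + 8 + 12 = 0x1014.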
Inline Assembly in GCC
- Inline assembly is assembly code written within C code.
- Inline assembly avoids having to write a separate, standalone assembly file.
- An asm(...) statement includes assembly code, inputs, outputs, and registers clobbered by the assembly code.
- The compiler inserts the assembly code into its compilation output and provides the inputs and the outputs.
Inline Assembly Details
- Output variables hold the results after the assembly runs, input variables are read-only, and for clobbered registers the compiler saves and restores any values it was keeping there.
- Observations
- % is the escape character, so registers in inline assembly are prefixed with %% instead of just %.
- Each assembly line is its own string and ends with a ';'.
- The %0, %1, and %2 are placeholders for the outputs and inputs.
- Assembly code is written in AT&T syntax with each line terminated by a semicolon.
- Listing "rax" and "rdx" as clobbers tells the compiler to reload anything it kept in those registers after the assembly.
- A register only needs to be clobbered if the ASM code uses it by name and modifies its contents; otherwise the compiler may assume the register is unchanged.
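The pieces described above (template strings ending in ';', %% before register names, %0/%1/%2 placeholders, and the "rax" and "rdx" clobbers) can be put together in a small sketch. The functions mul_hi and add_in_place are hypothetical examples, not the code the notes were written around, and the fallbacks are assumptions for non-x86-64 builds. The spelling __asm__ is used because it also works under strict -std= modes.

```c
#include <stdint.h>

/* High 64 bits of the 128-bit product a*b, using mulq. */
static uint64_t mul_hi(uint64_t a, uint64_t b)
{
#if defined(__x86_64__) && defined(__GNUC__)
    uint64_t hi;
    __asm__("movq %1, %%rax;"      /* %1 = first input */
            "mulq %2;"             /* rdx:rax = rax * %2 */
            "movq %%rdx, %0;"      /* %0 = output */
            : "=r"(hi)             /* outputs */
            : "r"(a), "r"(b)       /* inputs (read-only) */
            : "rax", "rdx", "cc"); /* clobbers: named and modified */
    return hi;
#else
    return (uint64_t)(((unsigned __int128)a * b) >> 64);
#endif
}

/* "+r" marks x as both read and written by the assembly. */
static uint64_t add_in_place(uint64_t x, uint64_t y)
{
#if defined(__x86_64__) && defined(__GNUC__)
    __asm__("addq %1, %0;" : "+r"(x) : "r"(y) : "cc");
#else
    x += y;
#endif
    return x;
}
```

Without the "rax" and "rdx" clobbers, the compiler would be free to keep a live value in either register across the asm statement, and mulq would silently overwrite it.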
Mathematical Expressions
- The LEA (load effective address) instruction computes address-style arithmetic (additions and multiplications by 1, 2, 4, or 8) without accessing memory and without affecting flags, unlike ordinary arithmetic instructions.
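A short sketch of LEA used as arithmetic, written as GCC inline assembly with a plain C fallback (an assumption added for non-x86-64 targets). The function names lea_sum and lea_times5 are hypothetical.

```c
#include <stdint.h>

/* a + b via leaq (Rb,Ri), dst: computes the address, reads no memory. */
static uint64_t lea_sum(uint64_t a, uint64_t b)
{
    uint64_t out;
#if defined(__x86_64__) && defined(__GNUC__)
    __asm__("leaq (%1,%2), %0" : "=r"(out) : "r"(a), "r"(b));
#else
    out = a + b;
#endif
    return out;
}

/* x * 5 via leaq (Rb,Rb,4), dst: x + x*4 in a single instruction. */
static uint64_t lea_times5(uint64_t x)
{
    uint64_t out;
#if defined(__x86_64__) && defined(__GNUC__)
    __asm__("leaq (%1,%1,4), %0" : "=r"(out) : "r"(x));
#else
    out = x + x * 4;
#endif
    return out;
}
```

No "cc" clobber is listed because LEA leaves the flags untouched. With 5 and 3 as inputs, lea_sum mirrors leaq (%rax,%rbx), %rdx and yields 8.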