
IT2104
02 Handout 1 *Property of STI

Assembly Language Basics

Computer Architecture

Computer architecture defines the details of how a set of software and hardware technology standards interact to form a computer system, also known as a platform. Computer architecture generally pertains to the design of a computer system, together with its technological compatibilities. It requires robust collaboration between computer scientists and computer engineers.

Computer architecture encompasses the following (Britannica.com, n.d.):
• Data storage devices and memory organization
• Networking components that store and run programs
• Data transmissions
• Interactions between computers across networks and/or with users

Three categories of computer architecture (Techopedia, n.d.):
1. System Design – This type of computer architecture involves all the hardware components of a system, such as the central processing unit (CPU) and memory controllers.
2. Instruction Set Architecture – This type of computer architecture involves the embedded programming language of a CPU, which defines its functions and capabilities.
3. Microarchitecture – This is also known as computer organization. This architectural type covers the data paths, data processing and storage, as well as the data implementation in an instruction set architecture.

The Von Neumann architecture is a good example of computer architecture. Numerous general-purpose computers, up until today, utilize this architectural design. It was proposed by John von Neumann in 1945, describing the design of an electronic computer. Computers with this architecture can be termed control-flow computers, which follow a step-by-step program that governs all operations. The following are some of the fundamental processes in a Von Neumann architecture (BBC Bitesize, n.d.):
• Data and instructions are both stored as binary in the primary storage.
• Instructions are fetched from memory one at a time (serially).
• The processor decodes and executes an instruction before cycling around to fetch the next instruction.
• The cycle continues until no more instructions are available.

Figure 1 illustrates the basic Von Neumann architecture. Note that most microprocessors available in the industry encompass the standard Von Neumann architecture, such as the Intel 8086 (Intel x86) microprocessor with an integrated 8086 assembler. The distinctive feature of this computer architecture is its use of a single memory area for program instructions and related data, which makes it conceptually straightforward for programmers.

Figure 1. The Von Neumann Architecture (illustration may vary based on the author's perspective).
Source: Modern Computer Architecture and Organization, 2020 p. 172

Familiarity with the fundamentals of computer architecture is required in learning assembly language. The development of assembly language was a major milestone in the evolution of computer technology since it is one step away from machine language. In assembly language, the term x86 architecture pertains to a microprocessor that has the capability to execute a 32-bit instruction set.

Generally, each assembly language instruction is translated into one (1) machine instruction by the assembler. Assembly language is hardware-dependent. Instructions can refer to specific registers in the processor, including all the operational codes (opcodes) of the processor, and reflect the bit length of various registers and operands. Therefore, to be able to utilize assembly language, one must understand the fundamentals of computer architecture (Stallings, 2019).

Figure 2 illustrates a simple computer architectural model based on Emu8086. It depicts a system bus that connects the central processing unit (CPU) and the random access memory (RAM) to different devices. Most computations occur inside the CPU, while the RAM is the location where programs are loaded in order to be executed.

Figure 2. A simple computer architecture model.
Source: https://www.philadelphia.edu.jo/academics/qhamarsheh/uploads/emu8086.pdf

Emu8086 is a widely used offline emulator of the Intel 8086 microprocessor, also compatible with Advanced Micro Devices (AMD) processors. It bundles an 8086 assembler, a disassembler, a source editor, and an emulator with a debugger. This software helps beginners to study assembly language. Emu8086 assembles source code and executes it in the emulator step by step. One can watch registers, flags, and memory while the program executes.

The emulator runs programs on a virtual PC. This completely blocks a program from accessing real hardware since the assembly code runs on a virtual machine, which makes debugging easier (Philadelphia.edu.jo, n.d.).

Below are some advantages and disadvantages of using assembly language programming (Stallings, 2019):

Advantages
• Looking at compiler-generated assembly code in a debugger is useful for finding errors and for checking how well a compiler optimizes a particular piece of code.
• It is necessary to understand assembly coding techniques in order to create compilers, debuggers, and other development tools.
• It is possible to access instructions that are not accessible from a high-level language.
• It is possible to make library functions compatible with multiple compilers and operating systems (OS).

Disadvantages
• Writing code in assembly language takes more time than writing it in a high-level language.
• It is easy to commit errors when using assembly code.
• Assembly code is more difficult to debug and verify.
• Assembly code is platform-specific; thus, porting it to a different platform is difficult.

Assembly language utilizes a great number of symbolic names, which include the names assigned to specific main memory and instruction locations. It also includes statements that are not directly executable but are considered instructions to the assembler.

Statements are the core of any assembly language program. A typical assembly language statement consists of the following elements (Stallings, 2019):
• Label – This may subsequently be used as an address or as data in another instruction's address. The assembler defines the label as equivalent to the address into which the first byte of the object code for that instruction will be loaded. The assembler also replaces the label with the assigned value when creating an object file.
• Mnemonic – This is the name of the operation or function of the assembly language statement, which may correspond to any of the following:
  o Machine instructions: These are executable instructions that instruct the processor what to do. Note that each executable instruction generates one machine language instruction. These are also known as operational codes or opcodes.
  o Assembler directives: These non-executable commands direct the assembler about the various aspects of the assembly process. These do not generate machine language instructions.
  o Macro definition: It is a section of code that programmers write once and can use many times. It encompasses a text substitution mechanism that is handled by the assembler at assembly time.
• Operand – This identifies an immediate value, a register value, or a memory location. Generally, assembly language provides conventions to distinguish the type of operand, as well as conventions for indicating the addressing mode.
• Comment – This begins with a semicolon ( ; ), which signals the assembler that the rest of the line is a comment and must be ignored by the assembler. Comments can occur at the right-hand side of an assembly statement or occupy an entire text line.
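The four statement elements described above can be illustrated with a small sketch. It is written in Python rather than assembly so that it can be run anywhere, and it is only an illustration, not how Emu8086 or any real assembler parses source code. The name parse_statement is hypothetical, and the sketch assumes that a label is an identifier followed by a colon, that operands are separated by commas, and that no semicolon appears inside a quoted string.

```python
import re

# Hypothetical helper for illustration; it is not part of Emu8086 or any assembler.
# Assumption: a label is an identifier at the start of the line followed by a colon.
LABEL_PATTERN = re.compile(r"\s*([A-Za-z_]\w*)\s*:")


def parse_statement(line):
    """Split one assembly statement into label, mnemonic, operands, and comment."""
    # Everything after the first semicolon is treated as a comment.
    code, _, comment = line.partition(";")
    code = code.strip()

    # Peel off the optional label.
    label = None
    match = LABEL_PATTERN.match(code)
    if match:
        label = match.group(1)
        code = code[match.end():].strip()

    # The first remaining word is the mnemonic; the rest are comma-separated operands.
    parts = code.split(None, 1)
    mnemonic = parts[0].upper() if parts else None
    operands = [op.strip() for op in parts[1].split(",")] if len(parts) > 1 else []

    return {
        "label": label,
        "mnemonic": mnemonic,
        "operands": operands,
        "comment": comment.strip() or None,
    }


print(parse_statement("start: MOV AX, 5 ; load 5 into AX"))
```

A line that holds only a comment yields no label, mnemonic, or operands, which matches the note that a comment may occupy an entire text line.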
Figure 3. Assembly language statement structure.

Figure 4. Some examples of assembly language statements.
Source: https://www.tutorialspoint.com/assembly_programming/assembly_basic_syntax.htm

Registers, Segments, and Indexes

A processor's operation greatly involves data processing. Reading and storing data in the memory slows down the processor. In order to speed up its operation, the processor includes some internal memory storage locations known as registers, which do not require memory access. Note that a number of registers are already built into a processor chip (Tutorialspoint, n.d.).

General-purpose registers may contain the operand for any opcode and can be assigned to a variety of functions. In some cases, their use within the instruction set is orthogonal to the operation. However, there are restrictions, such as the dedicated registers for floating point and the stack operations (Stallings, 2019). The general-purpose data registers are 16-bit registers, but each can also be used as two separate 8-bit registers: the high byte (H) and the low byte (L). For example, the high byte of AX is denoted as AH while the low byte is denoted as AL. Note that the H and L notation also applies to BX, CX, and DX.
• Data registers
  o AX: Primary accumulator – used for input/output (I/O) operations and in most arithmetic instructions.
  o BX: Base register – used in indexed addressing.
  o CX: Counter register – stores the loop count in iterative operations.
  o DX: Data register – used in I/O operations and for multiplying and dividing large values.
• Pointer registers
  o IP: Instruction pointer – stores the offset address of the next instruction to be executed.
  o SP: Stack pointer – provides the offset value within a program stack.
  o BP: Base pointer – used in referencing parameter variables passed to a subroutine.
• Index registers
  o SI: Source index – used as source index for string operations.
  o DI: Destination index – used as destination index for string operations.

Control registers contain bits that manage the operation of the floating-point unit, including the type of rounding control, the precision (single, double, or extended), and bits to enable or disable exception conditions (Stallings, 2019).
• CR0 – This contains the control flags that manage the operating mode and status of the processor.
• CR1 – This is a reserved control register.
• CR2 – This involves the page fault linear address.
• CR3 – This enables the processor to translate a linear address into a physical address by locating the page directory and page tables for the current operation.

Below are some of the most common flag bits (Tutorialspoint, n.d.). Note that on x86 processors these bits are held in the flags register (FLAGS on the 8086), not in CR0:
  o OF: Overflow Flag – indicates the overflow of a high-order bit of data after a signed arithmetic operation.
  o DF: Direction Flag – determines the direction (left or right) for moving or comparing string data.
    ▪ DF is 0 if the string operation direction is left to right.
    ▪ DF is 1 if the string operation direction is right to left.
  o IF: Interrupt Flag – enables or disables external interrupts.
    ▪ When IF is set to 0, the external interrupt is disabled.
    ▪ When IF is set to 1, the external interrupt is enabled.
  o TF: Trap Flag – sets the operation of the processor to single-step mode.
  o SF: Sign Flag – shows the sign of the result of an arithmetic operation, indicated by the leftmost bit.
    ▪ A positive (+) result is denoted by 0.
    ▪ A negative (-) result is denoted by 1.
  o ZF: Zero Flag – indicates the result of an arithmetic or comparison operation.
    ▪ A non-zero result is denoted by 0.
    ▪ A zero result is denoted by 1.
  o AF: Auxiliary Carry Flag – contains the carry of an arithmetic operation from bit 3 to bit 4.
  o PF: Parity Flag – reflects the total number of 1-bits in the low-order byte of the result of an arithmetic operation.
    ▪ If the total number of 1-bits is even, PF is 1.
    ▪ If the total number of 1-bits is odd, PF is 0.
  o CF: Carry Flag – contains the carry of 0 or 1 from a high-order bit after an arithmetic operation.

Segmentation is used to define logical memory partitions subject to access controls. This process is performed to enhance the speed of fetching data from the memory and to easily execute instructions in a computer system (Stallings, 2019).

Segments, also known as memory segments, are specifically defined areas in a program for containing instruction codes, data elements, or program stacks. A segmented memory model divides the system memory into independent groups, called sections, referenced by pointers within segment registers.
• Code memory segment – The size of this memory section is determined before a program runs. It contains all the instructions to be executed. In an assembly program, this segment can be identified through the .text section.
• Data memory segment – This contains the data elements that can be used in the program. In an assembly program, this segment can be identified through the following sections:
  o .data – This section is commonly used to declare data memory segments that remain static throughout the program.
  o .bss – This section contains buffers for data to be declared later during the program execution.
• Stack memory segment – This holds the data and return addresses of procedures or subroutines in a program.

Segment registers work together with general-purpose registers to access any memory value (Tutorialspoint, n.d.).
• CS: Code segment register – denotes the segment that holds the instruction being executed. A 16-bit CS register stores the starting address of the code memory segment.
• DS: Data segment register – specifies the segment containing the data, the constant, or the work area for a program. A 16-bit DS register stores the starting address of the data memory segment.
• SS: Stack segment register – refers to the segment that contains a user-visible stack. A 16-bit SS register stores the starting address of the stack.
• ES: Extra segment register – is a spare segment register that may be utilized for storing data.

Note that in assembly language, an offset is stored as part of the instruction. It usually represents the number of address locations added to a base address in order to obtain a specific address.

Data Types and Addressing Modes

The following are the general data types used in assembly language, which are characterized by size:
• Nibble: 4-bit
• Byte: 8-bit
• Word: 16-bit
• Doubleword: 32-bit
• Quadword: 64-bit
• Double Quadword: 128-bit

In most cases, the x86 architecture does not mandate the storage of the data types on natural boundaries, that is, on any address evenly divisible by the size of the data type in bytes. Nowadays, the 16-bit mode of the x86 architecture mainly serves bootloaders, which run before an operating system switches to protected mode.

The x86 architecture supports a wide variety of data types that are recognized and operated on by particular instructions (Stallings, 2019).
• Integer – A signed binary value contained in a byte, word, or doubleword using 2's complement representation.
• Ordinal – An unsigned integer contained in a byte, word, or doubleword.
• Packed Binary Coded Decimal (BCD) – A representation of BCD digits wherein two (2) digits are stored per byte (8-bit). The value of the byte is within the range of 0 to 99.
• Unpacked Binary Coded Decimal (BCD) – A representation of a BCD digit wherein one (1) digit is stored per byte (8-bit). The value of the decimal digit is within the range of 0 to 9.
• Near Pointer – This contains 16-bit offset addressing information from the current segment.
• Far Pointer – This contains 32 bits of addressing information, composed of a 16-bit segment register value and a 16-bit offset value.
• Bit Field – It is a contiguous sequence of bits in which the position of each bit is considered as an independent unit. It can begin at any bit position of any byte and can contain up to 32 bits.
• Bit String – It is a contiguous sequence of bits containing zero (0) up to the number of bits in the string less one. It is commonly used to represent sets or to manipulate binary data.
• Floating Point – It is a data type that encompasses decimal points that are not considered as fixed point numerals. The significant digits are stored as a unit called the mantissa, while the position of the decimal point is stored in a unit called the exponent. In assembly language, floating point generally pertains to a set of data types that are used by the floating point unit and operated on by specific floating point instructions.
• Packed Single Instruction-Multiple Data (SIMD) – This is a set of data types that were introduced to the x86 architecture as part of the extensions of the Intel 8086 instruction set to optimize the performance of multimedia applications. The basic concept of packed SIMD is that multiple operands are packed into a single referenced memory item and that the multiple operands are operated on in parallel.

Addressing Modes – This pertains to the way in which the operand of an instruction is specified. The information contained in the instruction can either be the value of the operand or the address of the operand or result.
  o Mode field – This determines which addressing mode is to be used. Note that one (1) or more bits in the instruction format can be used as a mode field.
  o Effective address – In a system without any virtual memory (VM), this can either be the main memory address or a register. For computers with VM, the effective address is a virtual address or a register.
  o Memory management unit – This performs the actual mapping to a physical address, which is not visible to programmers.

Common Addressing Modes (Stallings, 2019)
• Immediate addressing – This is the simplest addressing mode, wherein the operand is an immediate value stored explicitly in the instruction. This mode can be utilized to define and use constants or set initial values of variables. In general, the number will be stored in 2's complement form, wherein the leftmost bit of the operand field is used as a sign bit. Its advantage is that no memory reference other than the instruction fetch is required to obtain the operand, thus saving one memory or cache cycle in the instruction cycle. On the other hand, the size of the number is restricted to the size of the address field, which is commonly smaller than the word length.
• Direct memory addressing – This requires only one (1) memory reference and no special calculations. The address field contains the effective address of the operand. Earlier generations of computers commonly utilized this mode, but modern computer architectures rarely apply it. In addition, the length of the address field is usually less than the word length, thus limiting the address range.
• Indirect memory addressing – This mode encompasses an address field referring to the address of a word in memory, which in turn contains a full-length address of the operand. This utilizes the computer's Segment:Offset addressing capability. The instruction execution requires two (2) memory references to fetch the operand: the first is to get the address and the second is to get the value. A rarely used variant of indirect memory addressing is Cascaded Indirect Addressing, where one (1) bit of a full-word address is an indirect flag.
• Register addressing – This mode involves an address field referring to a register containing the operand. An address field that references registers has 3 to 5 bits, thus a total of eight (8) to 32 general-purpose registers can be referenced. In this addressing mode, only a small address field is needed in the instruction and there are no time-consuming memory references required. If this mode is utilized to a great extent in an instruction set, then it implies the heavy utilization of processor registers.
• Register indirect addressing – In this mode, the address of the operand is placed in a register. The address space limitation of the address field is overcome by having a field referring to a word-length location containing an address. The advantages and limitations of this addressing mode are basically the same as those of the indirect memory addressing mode.
• Displacement addressing – This is considered a very powerful mode of addressing since it has the combined capabilities of direct memory addressing and register indirect addressing. It requires two address fields, at least one of which is explicit. The value contained in the first address field is used directly, while the other refers to a register whose contents are added to the first address field to produce the effective address. This addressing mode is more appropriate for accessing structured records. The three (3) most common uses of displacement addressing are relative addressing, base-register addressing, and indexing.
• Stack addressing – This mode is a form of implied addressing. The machine instructions need not include a memory reference, but implicitly operate on a stack that is a linear array of locations. The stack is a reserved block of locations, wherein items are appended to the top of the stack so that the block is partially filled at any given time. A pointer is associated with the stack whose value is the address of the top of the stack. The stack pointer is maintained in a register. Thus, references to stack locations in memory are in fact register indirect addresses.

Instruction Format

An instruction format defines the layout of the bits of an instruction, in terms of its constituent fields. It is composed of an opcode, zero or more operands (implicit or explicit), and the mechanism indicating the addressing mode used for each operand. The format must implicitly or explicitly indicate the addressing mode for each operand. Note that most instruction sets involve more than one instruction format.

The instruction length, or instruction format length, is the most basic design-related matter that can be encountered by programmers. Deciding the value for the instruction length affects, and is affected by, the memory size, memory organization, bus structure, processor complexity, and processor speed. With this, the allocation of bits becomes a major challenge in terms of designing instructions. Generally, more opcodes mean more bits in the opcode field, thus reducing the number of bits available for addressing. The following interrelated factors are involved in determining the appropriate allocation of bits (Stallings, 2019):
  o Number of addressing modes
  o Number of register sets
  o Number of operands
  o The addressing range
  o The address granularity

Critical components and various factors must simultaneously be considered and balanced in designing processors. The struggle of programmers between instruction set complexity and the number of registers is addressed and expressed in terms of an architecture categorization, which includes the following (Ledin, 2020):
• Complex Instruction Set Computer (CISC): Variable length instructions – A processor characterized by a rich instruction set providing a variety of features, such as the ability to load operands from memory, perform an operation, and store the result to memory, all in one instruction. Each instruction in a CISC can perform several low-level operations. The instructions are variable in length and use several addressing modes requiring complex circuitry. An instruction may take several clock cycles to execute, as the processor performs all of the required subtasks. CISC processors tend to have a smaller number of registers due to the greater space required for the logical instruction set. The x86 is a classic example of CISC architecture.

Figure 5. The x86 instruction format.
Source: Computer Organization and Architecture: Designing for Performance (11th ed.), 2019 p. 604

Figure 5 shows the general instruction format of an x86 architecture, which includes:
  o a prefix field that can contain an instruction prefix, segment override, operand size, or an address size;
  o an opcode field, which can be 1, 2, or 3 bytes in length;
  o the ModR/M byte, which specifies if an operand is in a register or in memory;
  o the SIB (scale, index, and base) field, which provides addressing information to fully specify the addressing mode;
  o the displacement byte(s), which, when used, add an 8, 16, or 32-bit signed integer displacement field; and
  o the immediate byte(s), which provide the value of an 8, 16, or 32-bit operand.

• Reduced Instruction Set Computers (RISC): Fixed length instructions – A processor having a smaller number of instructions, each of which performs a simpler task. The reduction of instruction set complexity leaves more space for registers. This architecture is optimized to execute individual instructions at a very high speed. Although reading memory, performing operations, and writing the result require numerous instructions in the processor, the turnaround time is still comparable to or even faster than that of a CISC. Note that the larger number of registers in an architecture reduces the need to access system memory, since more registers are available for the storage of intermediate results. An Advanced RISC Machine (ARM) processor, containing 13 general-purpose registers, is a good example of this architecture.

S (for data processing instructions): signifies that the instruction updates the condition codes
S (for load/store multiple instructions): signifies whether instruction execution is restricted to supervisor mode
P, U, W: bits that differentiate the types of addressing mode
B: distinguishes between an unsigned byte (B==1) and a word (B==0) access
L (for load/store instructions): indicates whether the instruction is a Load (L==1) or a Store (L==0)
L (for branch instructions): determines whether a return address is stored in the link register

Figure 6. The ARM instruction formats.
Source: Computer Organization and Architecture: Designing for Performance (11th ed.), 2019 p. 606

Figure 6 shows how ARM instruction formats follow a regular 32-bit structure, which eases the task of the instruction decode units. Below is the corresponding bit allocation for an ARM instruction:
  o The first 4 bits are the condition code.
  o The next 3 bits specify the general type of instruction.
  o The following 5 bits constitute an opcode and/or modifier bits for the operation.
  o The remaining 20 bits are for operand addressing.

References:
Britannica.com. (n.d.). Architecture and organization. Retrieved on July 23, 2021 from https://www.britannica.com/science/computer-science/Architecture-and-organization
BBC Bitesize. (n.d.). Von Neumann architecture. Retrieved on July 26, 2021 from https://www.bbc.co.uk/bitesize/guides/zhppfcw/revision/3
Stallings, W. (2019). Computer organization and architecture: Designing for performance (11th ed.). Pearson Education, Inc.
Ledin, J. (2020). Modern computer architecture and organization. Packt Publishing.
Philadelphia.edu.jo. (n.d.). Help for Emu8086. Retrieved on August 2, 2021 from https://www.philadelphia.edu.jo/academics/qhamarsheh/uploads/emu8086.pdf
Techopedia. (n.d.). Computer architecture. Retrieved on July 23, 2021 from https://www.techopedia.com/definition/26757/computer-architecture
Tutorialspoint. (n.d.). Assembly – Registers. Retrieved on August 2, 2021 from https://www.tutorialspoint.com/assembly_programming/assembly_registers.htm
Tutorialspoint. (n.d.). Assembly – Memory Segments. Retrieved on August 2, 2021 from https://www.tutorialspoint.com/assembly_programming/assembly_memory_segments.htm
