C Programming and Assembly Language Lectures

Document Details


Indian Institute of Technology, Madras

Janakiraman Viraraghavan


Summary

This document contains lecture notes for a course on C Programming and Assembly Language offered at the Indian Institute of Technology, Madras. The course aims to bridge the gap between high-level and low-level programming languages by demonstrating the execution of C programs in microprocessors through animations and examples. The course covers topics such as function calls, parameter passing, local variables, and inline assembly.

Full Transcript


INDEX

Week 1: Lecture 1, Lecture 2, Lecture 3, Lecture 4A, Lecture 4B, Lecture 5
Week 2: Lecture 6, Lecture 7, Lecture 8, Lecture 9, Lecture 10
Week 3: Lecture 11, Lecture 12, Lecture 13, Lecture 14, Lecture 15
Week 4: Lecture 16, Lecture 17A, Lecture 17B, Lecture 18, Lecture 19, Lecture 20

C Programming and Assembly Language
Prof. Janakiraman Viraraghavan
Electrical Engineering Department
Indian Institute of Technology, Madras

Lecture - 01

So, welcome to this course on C Programming and Assembly Language. At the outset, let me state that many students do plenty of courses in C programming and other high-level languages, and they also do many courses on assembly language programming for various microprocessors and microcontrollers. What is really missing in the current curriculum is a bridge between these two languages. That is to say, students do not have a physical understanding of how a C program, or a high-level language program, is actually executed in a microprocessor, and they do not understand what exactly happens in the various segments of memory when a program is being executed. The idea of this course is to bridge this very gap. For example, when students are asked where local variables are stored in C, the answer comes very quickly: they are stored on the stack. When asked what kind of scope variables in a function have, they call it local scope or global scope. Unfortunately, there is no understanding of what these things physically mean. In this course, we hope to address these issues.

(Refer Slide Time: 01:27)

So, the course objective really is to show students, by way of animations and some simple examples, what exactly happens when a C program is executed in the microprocessor.

(Refer Slide Time: 01:47)

Moving on to the learning objectives of this course, which, by the way, are what a student should be able to do on completing this course. You should be able to explain how function calls are translated to assembly language, explain how parameters are passed to a function, explain what it means to say that local variables are stored on the stack, and demonstrate how local variable space is allocated by a compiler; that is, you should at least be able to show one way in which this can be done. You should be able to explain what it means to say local variables go out of scope after a function call, and list the instructions that need to be executed before entering and before exiting a particular function in C. Then we will move on to what the various calling conventions for C functions are; for example, you have functions with a variable argument list as opposed to a fixed argument list. How do these functions differ in assembly language? That is where calling conventions come in. You should also be able to explain the simple differences between C and C++ at the assembly level. C++ is a very powerful programming language that offers object oriented programming, but the key point to note is that at the assembly level it is not very different from C. There is very little change that you need to incorporate when you move from compiling a C program to compiling a C++ program, and we will demonstrate this by way of an example.
(Refer Slide Time: 03:23)

You should also be able to exploit certain hardware instructions to speed up C functions, and you should be able to explain why recursion is not a great idea for performance. Typically you will see that recursion, though it is a very powerful way of programming and conceptualizing an idea, is not necessarily the most efficient way of coding it on a microprocessor.

(Refer Slide Time: 03:53)

We will look at the reasons for that in this course. Now, the references and prerequisites. Like I said, there are plenty of courses that students do in C programming and in microprocessor assembly language programming. For C programming you can refer to almost any book, but of course there is nothing better than the bible, "The C Programming Language" by Kernighan and Ritchie, second edition. As far as microprocessor assembly language goes, what I am focusing on in this course can be explained with any microprocessor, and it may differ slightly from microprocessor to microprocessor depending on the assembly instructions that are available. But unless we actually freeze on a particular architecture and a particular assembly language, it is not going to be possible to get a physical understanding of what actually happens at the lowest level. For this reason, I decided to adopt the Intel microprocessor architecture and programming for this course. The reason is that when you compile a program on a normal desktop, which is usually based on an Intel microprocessor, you will see that the assembly instructions are exactly what I am referring to in this course. For the Intel microprocessor architecture and programming there is again no better book than Barry Brey's book on the Intel microprocessors. I referred to the second edition, but there is nothing different even if you refer to the 8th edition, which is the latest edition available.

As far as prerequisites go, I expect that students are already comfortable with C programming, so I will not be going into any details of C programming here. For example, I am not going to go into what a function is, what an array is, or what variables are. I am assuming that you already know how to write simple programs in C. However, I will focus on something called inline assembly, which can be used in any high-level language, but I will look at it in the context of C programming. For assembly language programming, I am assuming that students have worked on the assembly language of some microprocessor, be it ARM or any other microprocessor; I just expect familiarity with assembly-type instructions. I am not assuming that you know the Intel architecture or the Intel instruction set already. I will spend some time describing these instructions so that you get familiar with them, but I am not going to go into the gory details of how these instructions are executed or why they exist; I am only going to give you a working familiarity with the assembly language of the Intel architecture.

(Refer Slide Time: 06:49)

So, the agenda for this course: I have broken this four-week course into four modules. Module 1, which will be covered in the first week, is a brief introduction to the 8086 processor architecture, followed by a description of the commonly used assembly instructions.
Here, when I say commonly used assembly instructions, I mean the instructions that you will typically encounter when you translate a C program into assembly language. There are numerous other instructions in Intel's x86 architecture which can be used for hardware programming and many other purposes, but that is not the focus of this course; I will only be dealing with the subset of the x86 instructions that applies to C programming. Then we will look at the use of the stack and the related instructions, and finally I will look at the call and return instructions in some detail. All of these instructions are typically available in any microprocessor; the idea is just to get you into the Intel style of coding them.

In the second module, which is in week 2, I will give an introduction to C programming and inline assembly. Inline assembly is a way in which we can intersperse assembly instructions within a C program. You have the syntax of a C program, and there is a certain way in which I can switch from the high-level programming language into a low-level language like assembly, execute certain instructions, and then come back to the high-level language again. Then I look at the data types and their sizes which are typically used in C and relate them to what we do in a microprocessor. Then I look at some specific examples of inline assembly, and these examples have been chosen specifically to drive home certain advantages which we will see in later modules. I look at ALU operations, the string length operation, multiplication using repeated addition, and swapping two variables in C. Swapping is a very interesting exercise that all students do in C programming: with a temporary variable, without a temporary variable, and so on. Then we will look at swapping two variables using inline assembly. Later we will move on to a function to swap two variables in C, and we will also look at swapping two variables using inline assembly inside a C function. So, there are various flavours of these swapping functions that I would like to look at, and each example will drive home a certain advantage and a certain concept which we will come to in week 3.

(Refer Slide Time: 09:45)

Module 3, which we will cover in week 3, is essentially the main focus of this course. The idea is to take a given C program and compile it into low-level assembly language. The point of introducing inline assembly first is that at least the basic instructions, the arithmetic and logical ones, can be translated pretty trivially, as we will see. However, translating an entire function needs certain knowledge of what exactly happens: there is something called a prologue and an epilogue that has to be executed for each function. A prologue is a set of instructions that is executed before you enter a function, and an epilogue is a set of instructions that is executed before you exit that function. The idea is to drive home the need for a prologue and an epilogue, and even to derive what instructions need to be there, by way of an example and an animation. In the animation, I will give you an exact idea of what happens at the assembly language level in the code segment, in the data segment and in the stack segment, all simultaneously; a typical shape of such a prologue and epilogue is sketched below.
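As a preview, here is a minimal sketch of what a compiler-generated prologue and epilogue usually look like for a 32-bit x86 function. The exact instructions, and the reasons for them, are derived in module 3; the function name and the 16 bytes of local space are arbitrary choices for illustration, and the NASM-style syntax is my assumption, not something the lecture fixes.

    ; my_func: sketch of a typical 32-bit x86 function (NASM syntax)
    my_func:
        push ebp            ; prologue: save the caller's base pointer
        mov  ebp, esp       ; establish a fixed frame pointer for this call
        sub  esp, 16        ; reserve 16 bytes of space for local variables

        ; ... the body of the function uses [ebp-4], [ebp-8], ... for locals ...

        mov  esp, ebp       ; epilogue: release the local variable space
        pop  ebp            ; restore the caller's base pointer
        ret                 ; return to the caller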
With those examples, we will see what happens if the prologue and epilogue were not there, and what the exact need for these instructions is. Then I will move on to the calling conventions, as I briefly mentioned earlier, and we will look at how variables are passed and accessed. These are the key points that we will touch upon in week 3.

Module 4, which is the final module of this course, is where we are actually going to compare C and C++ at the assembly language level. We will take an example of a structure in C and a class in C++, and we will find that there is actually not much of a difference, other than the fact that in C++ you have certain restricted access while in C you do not. What these things actually mean at the assembly language level is going to be the focus of the first discussion in module 4. Then we will look at optimizing certain C functions by exploiting hardware loops; for example, memcpy, string length and maybe a few other functions. The idea of discussing this is to show you that it is not a good idea to code certain basic functions like string length using loops and the regular style of coding that we do. You need to exploit certain instructions that the hardware gives you in order to speed up the process, and that is what you will see in any library that is provided to you. If you actually disassemble that code and look at the assembly implementation of a string length function, the heart of the function will be a hardware instruction which runs the loop at the hardware level and not at the software level. We will also discuss why it is a good idea to convert recursion into software loops; people do this on a regular basis, and we will look at why that is the case.

(Refer Slide Time: 13:23)

So, with that, let us move into module 1. The idea here is to first give you a brief introduction to the 8086 processor architecture, describe the commonly used assembly instructions and the use of the stack and related instructions, and then focus on the call and return instructions. With that, let us enter the world of microprocessor programming.

(Refer Slide Time: 13:49)

The microprocessor is actually a very sophisticated chip that can do many things for us. I am going to abstract out only the necessary definitions and details that we need in order to proceed with the learning objectives of this course. A microprocessor can simply be defined as a black box that can do certain computations; we will call it the µP. This microprocessor talks to what is known as a random access memory, a RAM. The microprocessor has what is known as a data bus, and the line here indicates that this could be more than one bit: it could typically be 8 bits, 16 bits, 32 or 64. This data bus is bidirectional in nature, which means that the microprocessor can send data to the memory or receive data from the memory. The microprocessor also has what is known as an address bus. This again is a multi-bit bus; its width depends on the size of the memory and how much memory the processor can access. If N is the number of address bits, then the number of locations that it can logically address is 2 to the power N.
The memory also has two control signals called read and write. As far as we are concerned, a memory is again a black box which takes data and address inputs and two control signals called read (RD) and write (WR). If you issue a read command, which means that you make the read signal go high for a short while, and you present an address, then that particular location will be read out and placed on the data bus. Similarly, if you assert the write command and present an address to the memory, it will simply write the data that is available on the data bus into that particular location. That is all a memory is, and that is all we care about in this course. A memory could be an SRAM or a DRAM, and there are various other kinds of memories, volatile and non-volatile; I am not going into all that. I am just going to treat the memory as a random access memory with the features that I just described. The other thing that we need to abstract out is what is known as a logical memory map.

(Refer Slide Time: 18:03)

So, what is a logical memory map? I have an address of N bits, a0 to a(N-1), which can refer to 2 to the power N locations, and in each location I could store K bits of data, d0 to d(K-1). The idea here is to separate the physical memory implementation from the logical addressing. By logical addressing, what I mean is that there are 2 to the power N locations, and in each location I store, say, 8 bits of information; that is, K could be 8. So there are 2 to the power N locations, each 8 bits long: location 0 through location 2 to the power N minus 1 is each 8 bits in length. This is called the logical memory map. It does not mean that the physical memory has an 8-bit data bus connecting it to the microprocessor; that bus could be 16 or 32 bits wide. For example, the physical memory could have only half as many locations, with 16 bits stored in each. As far as the programmer is concerned, you do not have to worry about the physical memory arrangement; all you have to worry about is the logical memory map, which says that given an address of N bits you can read or write 8 bits at that location.

(Refer Slide Time: 21:49)

C Programming and Assembly Language
Prof. Janakiraman Viraraghavan
Electrical Engineering Department
Indian Institute of Technology, Madras

Lecture - 02

So, welcome back to this course on C Programming and Assembly Language. We are in module 1, and in the last lecture we abstracted out the model of the memory as follows.

(Refer Slide Time: 00:23)

We said that a memory is a simple black box: a random access memory that has a bidirectional data bus and a unidirectional address bus. Here I have the microprocessor, which controls the memory by issuing various commands.
So, this is the µP, and the memory also has the read and write commands; these are not controlled from elsewhere. The microprocessor issues these commands at the appropriate times in order to read from or write to the memory. In this lecture, let us look at the necessary abstraction of the microprocessor that we need for this course.

Let us start with the basic utility of a microprocessor. A microprocessor is meant to execute a program that is stored in memory. So, let us assume that this program is stored from location X to location Y: there is a location X in the memory where the program starts and a location Y where the program ends. The job of the microprocessor is to fetch instruction after instruction, starting from location X, and execute them one by one until it reaches memory location Y.

Traditionally, a microprocessor does the following tasks: fetch, decode and execute. The name fetch suggests what it should do: the idea is to fetch a particular instruction from the memory, then decode it, and then execute the instruction accordingly. If I denote fetch by F, decode by D and execute by E, and assume that there are M instructions between location X and location Y, then the microprocessor fetches the first instruction, decodes it and executes it (F1 D1 E1), then fetches, decodes and executes the next one (F2 D2 E2), and so on: F3 D3 E3, all the way to FM DM EM. At this point I have to mention that microprocessors have come a long way: they do not execute this fetch-decode-execute cycle serially, but in what is known as a pipelined manner. While the first instruction is being executed in the microprocessor, the data bus happens to be idle, and therefore you can prefetch the next instruction in that period. So F2 D2 E2 overlaps with the first instruction, F3 D3 E3 overlaps with the second, and so on. You can clearly see that in modern microprocessors, because of this concept of pipelining, instructions are prefetched and you end up getting a much better throughput from the microprocessor. So, in order to perform this fetch-decode-execute process in a loop, we need to see what a microprocessor has to have, and that is the exact abstraction that we are going to deal with in this particular lecture.

(Refer Slide Time: 05:39)

So, F, D and E: in order to perform this, what do we need? Let us start with fetch. You have to go and fetch a particular instruction from the memory, which means that I need some place in my microprocessor to store the address of where the code exists in memory. The first requirement, then, is something to store the code address, and this is precisely satisfied by what is known as the instruction pointer register, IP. In the 8086 it is 16 bits in length and the register was simply called IP; as you move forward through the other x86 architectures, the 286, the 386 and on to the Pentium 4, it becomes a 32-bit register called EIP, where the E stands for extended. This is true for every other register in the microprocessor as well, as you will see a little later.
The way we start is that we load location X into the instruction pointer and tell the microprocessor to start executing instructions from that location. So, you load location X into EIP and start the execution. As the microprocessor fetches the instruction at location X and decodes it, it knows exactly where the next instruction is in memory; therefore it can automatically calculate what the next address should be. EIP is automatically incremented by N, where N could be the next byte, the next word or the next dword; it depends on the instruction size, that is, on the opcode size. The microprocessor, during the decode phase, knows exactly what this number N should be, automatically increments the instruction pointer to the new address, and continues the fetch-decode-execute process. That is with regard to the code.

Now let us come to the execute part of the instruction. The decode, by the way, is pretty straightforward: it is just combinational logic that decodes what the instruction is and activates the appropriate registers and so on, so nothing specific is needed for decode as far as our abstraction goes. The execution typically refers to some sort of arithmetic or logical instruction, or it could be a data movement from one location to another. So, what do we need in order to implement this execute part in a microprocessor? Typically, if I take my ALU, it has two operands, operand 1 and operand 2, and at least one of these will end up being a general purpose register; the other operand could be something more generic, and we will come to what those operands could be. So, in order to perform any ALU operation, we need to see what kind of general purpose registers are available in the microprocessor.

(Refer Slide Time: 11:15)

Traditionally, the 8085 microprocessor implicitly required one register to be hard coded as one of the operands to the ALU, and this happened to be the accumulator; A stands for accumulator. In the 8086 this register, AX, is 16 bits in length: you have AL, which is the lower 8 bits, and AH, which is the higher 8 bits, and together they form AX, the combined 16-bit register. Just like the instruction pointer, which was extended to 32 bits, there is a 32-bit version of this register called EAX. Apart from EAX and AX you can also access the 8-bit halves AL and AH, but note that you cannot directly access the higher-order 16 bits of EAX as a register. Similarly, I have the B register with BL, BH, BX and the extended EBX; the C register with CL, CH, CX and ECX; and the D register with DL, DH, DX and EDX. A small sketch of this register aliasing is shown below.
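To make the aliasing concrete, here is a small fragment in NASM-style Intel syntax (32-bit mode assumed, which is my assumption rather than the lecture's); the immediate values are arbitrary and chosen only to show which part of EAX each name touches.

    ; register aliasing in the A register family
    mov eax, 0x11223344   ; EAX = 0x11223344
    mov ax,  0xABCD       ; writes the low 16 bits:  EAX = 0x1122ABCD
    mov al,  0xEF         ; writes the low 8 bits:   EAX = 0x1122ABEF
    mov ah,  0x99         ; writes bits 8 to 15:     EAX = 0x112299EF
                          ; the upper 16 bits of EAX have no name of their own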
You will find that, typically, for addition or subtraction you can use a combination of any two of these registers; you can also use these registers as pointers to something in memory, which we will come to a little later when we look at the MOV and ALU instructions in some detail. Apart from this, we also need something known as a stack. A stack is a last-in-first-out kind of memory, and in order to keep track of the top of the stack we need a specific register, known as the stack pointer SP, whose extended 32-bit version is ESP. It is not sufficient, though, to do only last-in-first-out operations on the stack; we also need to be able to do some amount of random access, and therefore we have another register known as the base pointer, BP, with its extended version EBP. These are my stack registers. It also turns out that we can do some fairly complicated array or string manipulation at the hardware level, and for that we have two other registers known as the source index and the destination index, SI and DI, again with 32-bit versions ESI and EDI in the modern microprocessors. We will look at detailed examples of how all of these registers can be used in specific instructions, and how they can be exploited to speed up certain functions in C, later in the course.

(Refer Slide Time: 18:11)

Till now we have been referring to certain address pointers in memory: the instruction pointer accessing the code, the stack pointer accessing the stack, and so on. So the question is, how do we partition the memory into different segments? I have a large memory, and what I want to ensure is that the segment where the code is stored is different from where my data is stored, where my stack data is stored, and so on. This is enabled by another set of registers known as the segment registers, which simply demarcate the different kinds of memory for the microprocessor: this, for example, is my code; this could be my data; this could be my stack; and this could be my extra segment. Now, how do I decide that location A to location B is meant for code, location B plus 1 to C is meant for data, and so on? For the code memory, that is achieved by having a pointer to the start of my code segment in the CS register. Similarly, the pointer to my data segment is the DS register, the pointer to my stack segment is the SS register, and the pointer to my extra segment is the ES register. (Unlike the general purpose registers, the segment registers remain 16 bits wide even on the 32-bit processors.) So, if I now have a particular instruction pointer value available, say 0x0010, what is the complete address of that location in the external memory? The complete address is constructed using a combination of the segment register and the instruction pointer: it is CS:EIP. Similarly, if I am dealing with the stack segment, then the complete address would be SS:ESP, or SS:EBP.
By default, the stack pointer and the base pointer are associated with the stack segment, the instruction pointer EIP is associated with the code segment, the data registers A, B, C and D are associated with the data segment, and ESI and EDI are associated with the data segment and the extra segment respectively. So, with this combination of general purpose registers and segment registers, we are able to address the entire memory to access code and data.

The last register that is needed in order to complete this execution process, and is almost mandatory in order to do branching and looping, is the flag register. The flag register is simply an indication of the result of an ALU operation, arithmetic or logical, in the microprocessor. For example, if I subtract two registers and the result happens to be 0, then the zero flag will be set; if I subtract two numbers and the result is negative, then a particular sign bit is set. You can check the value of these bits in the flag register and make decisions in order to branch and loop. So, in summary, we have now discussed the set of registers that make up a microprocessor and allow it to execute fetch, decode and execute in a loop again and again.

(Refer Slide Time: 24:05)

C Programming and Assembly Language
Prof. Janakiraman Viraraghavan
Department of Electrical Engineering
Indian Institute of Technology, Madras

Lecture - 03

So, welcome back to this course on C Programming and Assembly Language. We have now introduced all the registers necessary for the microprocessor to go ahead and do the fetch, decode and execute operation again and again. With that, let us move on to the typical instructions and how certain instructions are executed in an x86 microprocessor. The instruction set is what we are going to look at in this lecture.

(Refer Slide Time: 00:42)

There are different kinds of instructions that one would want to execute in the final execute step of every fetch-decode-execute cycle. One could be that you just want to move data, so you have data transfer instructions. Then you could do an ALU operation, an arithmetic or logical operation. Then you could do, for example, a stack operation, which effectively is still a sort of data transfer. You could also call a subroutine and return from it; the word "function" as used in C is not quite the same thing in assembly language, so it is better to call it a subroutine there. There are many more instructions which we will not go into; we will only look at the operations that are useful for us in C programming.

So, let us start with data transfer. Going back to our fetch, decode and execute loop, the first step might be to fetch some data. It could be from memory; for the instruction pointer it clearly has to be from memory, and that is a data transfer from the external memory into the microprocessor. But you could also do a data transfer inside the microprocessor. In general, data transfer instructions are written as MOV destination, source.
This format is very specific to the Intel architecture; it can be different in other microprocessors, but that does not matter. Data from the source is moved to the destination, and the source is unaltered; it does not change. You may want, for example, to move data from one register to another: MOV AX, BX means that I am moving the data from register BX into AX, which is 16 bits long, and BX is unaltered in the process. So, the direction of data movement is right to left in the Intel architecture. Before we go forward, note that I have already started using keywords like MOV; these keywords are known as mnemonics in assembly language. Of course, ultimately everything in the microprocessor is stored as bytes and bits, and information is decoded out of that; mnemonics are only a slightly higher-level notation for us to understand and process easily. What is the difference between a mnemonic and a high-level language statement? The fundamental difference is that a high-level language statement can be far more sophisticated (you can do more complex things) and it can translate to multiple machine-level instructions. A mnemonic in assembly language, on the other hand, translates to exactly one machine instruction in the microprocessor. So, though it is still an abstraction, a slightly higher-level language for us, the difference is that it has a one-to-one mapping with the machine code that is eventually executed, unlike a high-level language statement.

Moving data from one register to another, as in MOV AX, BX or MOV CX, DX, is called register direct addressing. I could also move a constant into a register, as in MOV AX, 0x40, which means the hexadecimal value 0x40 is loaded into my AX register. This is called immediate addressing, which means that the actual data I want to load into the register is available immediately after the opcode; it is part of the machine instruction that is loaded into memory. Then I could, for example, store contents to memory: I could say that I want to move the contents of AX into location 0x1320, written MOV [0x1320], AX. This is called direct addressing: the contents of AX are loaded into the memory location given by the hexadecimal address 0x1320, that is, the 16 bits starting at 0x1320 are loaded with the contents of AX. And I could do register indirect addressing, which basically means that the address I want to deal with in memory is sitting in a register. For example, I can say MOV AX, [BX]. What does this do? It treats BX as an address, and whatever data is present at that address is loaded into my AX register.

Now, this brings us to an interesting point. Like I said previously, in the logical memory map we always deal in chunks of 8 bits. But do I always want to deal with 8 bits?
Well, no: I might want to deal with 8 bits at one time, 16 bits at another time, and maybe 32 bits at other times. So we need a way to tell the microprocessor that I need it to fetch not just 8 bits but 16 or even 32 bits starting at that location. That is given by a size specification: instead of just [BX] I would say DWORD PTR [BX]. What does this mean? Whatever is in BX will be treated as an address, and the next 4 locations (DWORD stands for double word, which is 32 bits) will be fetched. Of course, the destination then cannot be AX; it should be EAX, because I need 32 bits. So the 4 bytes starting at the location pointed to by BX will be loaded into my EAX register in a particular order; let us not go into the details of that order, it does not matter to us. The main thing I want to convey is that we may want to deal with a byte, a word or a double word (DWORD).

(Refer Slide Time: 11:35)

If I want to load only 16 bits, for example from an address held in CX into register BX, I would say MOV BX, WORD PTR [CX]. What does this mean? Let us take a concrete example: the CX register holds the value 0x1000, and I want to load 16 bits of information from the memory. In the memory, this is location 0x1000, byte 1, and the next one, byte 2, is location 0x1001. So when I say MOV BX, WORD PTR [CX], the location whose address is in CX is accessed, and the two bytes there are moved into my BX register. Also note that when I say the contents pointed to by CX, I am referring to memory, which means I also need a segment register to tell me which segment to access this data from; CX only gives me the offset. By default, all these general purpose registers refer to the data segment, so the full address is actually DS:CX, an offset given by the value in CX from the data segment value in the DS register. You can look up the book for more details, but the general idea is that you can move data between registers, move immediate data into a register, directly access the memory, or indirectly access the memory through a register; the last is register indirect addressing.

I can actually do more than this; I can do some computation in the address as well. For example, I could do MOV BX, WORD PTR [CX+4]; I can specify an offset now. You should not think that CX+4 is evaluated using a separate set of assembly instructions; it is not that I load some value into CX, do an arithmetic addition of 4 using the ADD instruction, calculate the new address and then do the move. All of this is enabled in hardware. So this is register indirect with offset. You can do even more than this: you can do certain complicated things like a scaled address, 4*CX+4, and so on. I urge you to refer to the book, but as far as this course is concerned it is sufficient to know these four addressing schemes: register direct, immediate, direct addressing, and register indirect (with offset). These four modes of addressing will cover the needs of this course, and they are collected in the short sketch below. With that, we can move on to the next class of instructions, the ALU instructions.
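Here is a small complete program putting the four addressing modes together. NASM syntax and 32-bit Linux are my assumptions (so the pointer register inside the brackets is the extended ECX), and the label buffer is a made-up data label used only for illustration.

    ; addressing modes sketch: nasm -f elf32 modes.asm && ld -m elf_i386 -o modes modes.o
    section .data
    buffer: dw 0x1111, 0x2222, 0x3333   ; three 16-bit words in the data segment

    section .text
    global _start
    _start:
        mov bx, 0x1234              ; immediate: load the constant 0x1234 into BX
        mov ax, bx                  ; register direct: copy BX into AX
        mov [buffer], ax            ; direct: store AX at the address of 'buffer'
        mov ecx, buffer             ; put an address into a register ...
        mov ax, word [ecx]          ; register indirect: load 16 bits from [ECX]
        mov ax, word [ecx+4]        ; register indirect with offset: 16 bits from [ECX+4]

        mov eax, 1                  ; Linux exit system call
        xor ebx, ebx
        int 0x80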
You can actually do more than this you can even do certain complicated things like you know do a scaled value 4 times Cx+4 and so on. I urge you to refer the book, but as far as we are concerned for this course it is sufficient to know these 4 addressing schemes which is basically register direct, immediate, direct addressing, and register indirect with offset. So, these 4 modes of addressing will cover the needs for this course. So, with that we can move on to the next class of instructions which are basically the ALU instructions, right. 25 (Refer Slide Time: 16:45) So, as the name suggest here the idea is to do arithmetic and logical instructions. So, what are the kind of instructions that you do? You do arithmetic or logical instructions. The key thing is that with every instruction that is executed here the flag register will be affected. So, for example I might want to do addition of two simple numbers,. So, I can load for example, MOV my Bx register with a number 0x40 and then I would like to ADD this to my Ax register, so Ax, Bx. So, what is this do? As a name suggests, it is going to do an addition it is going to replace Ax register with Ax+ Bx,. So, it is not necessary that Ax always has to be an operant to the ADD instruction, you could have any two register registers doing this addition for you,. So, for example, I could even do ADD Cx, Bx which implies Cx will get Cx+Bx,. And remember here I could extend this addressing scheme that I spoke about previously, even to this addition operation as well. For example, I could do ADD Ax comma contents pointed to by Bx, right which means that whatever data sits in Bx, will be brought on into the microprocessor and added with Ax,. So, what does this mean? This is Ax gets replaced with Ax plus contents pointed to by Bx,. So, you could do this various addressing schemes here as well and get similar results. So, I am not going into the gory details the by way of some examples and assignments we will cover the necessary details that we need out of these instructions. 26 So, let us look at for example, you know you could also do subtraction, Ax comma Bx, which is basically Ax is replaced by Ax- Bx,. You could do logical bitwise XOR, right or an AND operation let us say, AND Ax comma Bx. What does this mean? It means that Ax is simply replaced with a bitwise and BIT WISE AND with Bx,. So, let us look at what happens to the flag register for example in this process. If Bx happens to be 0, or Ax happens to be 0 in this AND instruction then the result of this AND Ax Bx is going to be a 0,. Let us assume that Ax happen to have this number 4, 16 bit hexadecimal all 0s. That means, that the AND operation Ax comma Bx, if you execute this irrespective of what is there in Bx, Ax will now have 0. So, when this instruction is executed and the result happens to be 0 the flag register will actually set the 0 bit to high which means in the flag register, there are various features that you can you used to indicate which like I mentioned the 0, the carry and so on. Let us assume that this actually refers to the 0 bit this is the carry and so on. Then this particular bit will be set to 1; that is what it means, that is what it means to say that the flag is affected with very single arithmetic or logical operation. So, what it means is that I can now make some sort of a decision based on this flag register and we will come to that later which is basically known as branching and looping instructions. So, we will come to that. 
(Refer Slide Time: 22:42)

Similarly, if I wanted to clear any register, I would simply do XOR AX, AX. This is the simplest and fastest way to clear a register, because irrespective of what AX held before this instruction, XORing it with itself bitwise gives 0, which means that after this instruction AX will necessarily be 0, and of course the zero flag is set. There are other logical operations you can do, like OR, which again is a bitwise operation, and NOT, which simply inverts a register. On the arithmetic side you can do addition, subtraction and also multiplication. Unlike addition or subtraction, where you are free to give any two registers, multiplication works only with AX as one of the operands. So, if you want to multiply two numbers, you have to move one of the operands into AX, the other operand into any other register, and then perform the multiplication. If I want to multiply AX and BX, I would do MOV AX, 0x4500, loading a 16-bit number, and then MUL BX; the second operand here is implicitly assumed to be the A register. When you multiply two 16-bit numbers you are obviously not going to get just another 16-bit number; you need more than 16 bits, and the higher 16 bits of the result are made available in the DX register. So what MUL BX basically does is compute AX * BX, with AX storing the low 16 bits and DX storing the high 16 bits.

(Refer Slide Time: 25:53)

C Programming and Assembly Language
Prof. Janakiraman Viraraghavan
Department of Electrical Engineering
Indian Institute of Technology, Madras

Lecture - 04

Welcome back to the course on C Programming and Assembly Language. In the last couple of lectures, we discussed two kinds of 8086 instructions.

(Refer Slide Time: 00:21)

One is data transfer. The key summary for data transfer is MOV destination, source: the source is unchanged in the process, data moves from the source to the destination, and none of the flags is affected. We then discussed a few ALU instructions like ADD, subtract and XOR, and the logical operations were bitwise operations. We will continue with the ALU instructions in this discussion, and we will introduce two very simple but very important instructions: increment and decrement. For example, INC AL means increment the value of AL; it just replaces AL by AL + 1. Alternately, you could do it on AX or on EAX. This is a way of implicitly adding one to a register without calling the ADD instruction. Similarly, you could do a decrement, DEC AL (or AX, or EAX), which simply replaces AL by AL minus 1. This is an ALU operation, so implicitly the flags will be affected. If AL, for example, happened to be 0x01 (it is 8 bits wide), and you decremented AL, then after this instruction AL would become 0x00 and hence the zero flag would be set to 1.
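A short NASM-syntax fragment tying these together (the operand values are arbitrary, and 32-bit mode is assumed as in the earlier sketches):

        xor ax, ax          ; fastest way to clear AX; also sets the zero flag
        mov ax, 0x4500      ; first operand of the multiply goes in AX
        mov bx, 0x0123      ; second operand in any other register
        mul bx              ; unsigned multiply: DX:AX = AX * BX
                            ; 0x4500 * 0x0123 = 0x004E6F00, so DX = 0x004E, AX = 0x6F00
        mov al, 0x01
        dec al              ; AL becomes 0x00, so the zero flag is set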
(Refer Slide Time: 03:23)

Moving on with a few more ALU operations, the next important operation is the compare operation. The mnemonic for this is CMP; I could do CMP AX, BX. What does this do? It performs the operation AX minus BX but does not put the result into any register, which means that both AX and BX are unchanged in the process. So if none of these registers is affected, what actually gets affected in the microprocessor? Only the flag register. What is the use of this instruction? It is used when you want to compare two numbers and then make a branching or looping decision in the assembly language. So it affects the flags, and based on the flags I could branch or loop, which I will cover later in another discussion. Similarly, you could do CMP BX, CX (16-bit), CMP BL, CL (8-bit), or CMP ECX, EBX (32-bit). And, as with any ALU operation, you can use any kind of addressing, which means that you could do CMP BX, [ECX-4]. This is register indirect with offset addressing: we are comparing the 16 bits of BX with the 16 bits pointed to by ECX-4.

There are a few more interesting things that you can do with the x86 architecture, and one of them is a string operation, a string scan. Let us assume that you have an array in memory and you want to search it for a particular pattern. Say that in 256 consecutive locations in memory I have an array of bytes, and I want to search for a particular byte pattern in those 256 locations: I want to see if some location has the contents 0x21 (an arbitrary value chosen just to illustrate this operation). Let us assume that the array starts at an address called data_address; starting from data_address, I need to scan through 256 locations to see if any one of them has the pattern 0x21. For this there is an instruction called scan string, SCAS, and since we are looking for a byte of information I am going to call it SCASB; remember that you can also scan for a word (SCASW) or a dword (SCASD), which as always are 8, 16 and 32 bits respectively. What does it do? SCASB compares AL, SCASW compares AX, and SCASD compares EAX, with the byte, word or dword pointed to by the destination index register. Whatever value the destination index holds, the instruction compares one of these registers with the data at that address. And after the operation, the destination index register is automatically incremented or decremented: by 1 if you are dealing with a byte, by 2 for a word, and by 4 for a dword.
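Both CMP and SCASB only set flags; SCASB is essentially CMP AL with the byte at [EDI] followed by an automatic update of EDI. Here is a tiny NASM-syntax fragment showing the effect of CMP (the conditional jump is included only to make the zero flag visible; jumps come later in the course, and the label is made up).

        mov ax, 5
        mov bx, 5
        cmp ax, bx          ; computes AX - BX only to set the flags; AX and BX keep their values
        je  values_equal    ; the zero flag is set, so this jump is taken
    values_equal:
        cmp bx, 7           ; 5 - 7 is negative: zero flag clear, sign and carry flags set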
(Refer Slide Time: 09:57)

Now let us see how we can use this instruction to perform the task given to us, namely to search an array that starts at data_address and find whether some location holds the particular value 0x21. Since AL has to be the register that is compared against, we load this pattern into the AL register. Next, we need to search for the pattern in 256 locations, so we need to load this count into some register. It turns out that there is a designated register for this purpose, and that is ECX; C stands for counter, and by default any count has to be loaded into the ECX register. So we do MOV ECX, 0x0100, loading the count value, which is 256 in decimal. I also have to initialize my destination index to point to the starting address: MOV DI, data_address. Then we add the instruction we just discussed, SCASB. What is this instruction going to do? It compares AL with the contents pointed to by DI, and then it either increments or decrements DI. How do you determine whether the address is incremented or decremented? That is determined by the direction flag, which can be set or reset. Therefore, before we start the program, we execute the operation CLD, which is nothing but clear direction flag, so that the string operations work in auto-increment mode. So, after comparing AL with the contents pointed to by DI, DI is incremented by 1; why 1? Because we are doing SCASB, B for byte, and therefore the address advances by one.

Now, this unfortunately does the comparison only for one location. Let me illustrate with an example. Say data_address points to a region of memory whose successive bytes are 0x45, 0x46, 0x47 and 0x21. Remember that we are looking for the pattern 0x21, and that is what has been loaded into the AL register. If I execute SCASB once, it only compares the first value, 0x45, with my AL register, which holds 0x21, and of course the two numbers are not equal; but we have still not compared the entire array. How is that achieved? With something known as a prefix that we add to this instruction: REPNE. What does REPNE do? It repeats while not equal, that is, until a match is found or ECX becomes 0; REPNE is a prefix to the string instructions. What we are saying here is: as long as none of the locations has 0x21, you continue this SCASB operation again and again. And what does SCASB do? It compares, then increments the destination pointer. So after the first comparison my DI, which was pointing at 0x45, points at 0x46; after the second comparison it points at 0x47; after the third it points at 0x21; and on the next comparison it finds that 0x21 matches my AL register, and hence the comparison stops. A short sketch of this search is given below.
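Here is a complete little program that does exactly this search. NASM syntax and 32-bit Linux are my assumptions (so the index and count registers are EDI and ECX), and the buffer contents simply reproduce the four bytes used in the lecture's example.

    ; search 256 bytes for the value 0x21 using REPNE SCASB
    ; assemble: nasm -f elf32 scan.asm && ld -m elf_i386 -o scan scan.o
    section .data
    data_address: db 0x45, 0x46, 0x47, 0x21
                  times 252 db 0          ; pad the array out to 256 bytes

    section .text
    global _start
    _start:
        cld                     ; clear direction flag: auto-increment EDI
        mov al, 0x21            ; the pattern we are searching for
        mov ecx, 0x0100         ; count = 256 bytes
        mov edi, data_address   ; EDI points at the start of the array
        repne scasb             ; compare AL with [EDI], advance EDI, decrement ECX,
                                ; until a match or ECX = 0; here it stops at the 4th byte
        mov ebx, ecx            ; remaining count (252 here) returned as the exit status
        mov eax, 1              ; Linux exit system call
        int 0x80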
What does this mean? When the comparison stops, ECX will have been decremented once for every byte examined, including the one that matched; here four bytes were examined, so ECX will be 256 minus 4, which is 252. The idea is to use a counter: if you do not use the counter, the operation can go into an infinite loop, so you have to tell it how many bytes to check, which is the length of the array or string being searched, and that is loaded into the ECX register. So you compare, you increment the destination index (because the direction flag has been cleared), and you keep doing this operation until you have encountered the particular pattern or the ECX register has gone down to 0. Here I showed an example where 0x21 occurred; what if it did not? If none of the 256 locations had the pattern 0x21, the comparison would fail every time and ECX would eventually come down to 0. So that is the second case: ECX is simply 0 because no location equals 0x21. Also remember that this is not a software loop but a hardware loop, which means the instruction is issued once, the whole operation finishes in one shot, and only then does the processor move on to the next instruction; therefore it is extremely fast.

(Refer Slide Time: 19:26)

C Programming and Assembly Language
Prof. Janakiraman Viraraghavan
Department of Electrical Engineering
Indian Institute of Technology, Madras

Lecture - 4B

(Refer Slide Time: 00:11)

Welcome back to this course on C Programming and Assembly Language. We are in module 1.

(Refer Slide Time: 00:23)

In the last lecture, we concluded by discussing a particular instruction, the scan string instruction SCASB (or SCASW, or SCASD). What this basically did was to compare AL, AX or EAX with the contents pointed to by DI, and we also looked at a prefix to this instruction called REPNE, repeat as long as it is not equal. We will continue this lecture with some more similar instructions.

In this lecture, we will look at a compare operation on a string. Let us assume that in my memory I have two different arrays, one sitting in my extra segment and one in my data segment. There are a bunch of bytes here, N bytes, and N bytes there as well. Now, my job is to write a program to find out whether these two arrays have the same data, which means that all the locations have to have exactly the same values. If I had to do this in regular assembly programming, I would have to loop through each location, find out whether it matches or mismatches, then increment the address, and so on. It is possible to do this kind of comparison in a single shot, because the x86 architecture supports string instructions. There is an instruction called CMPS, for byte, word or dword: CMPSB compares a byte, CMPSW compares a word, and CMPSD compares a dword. What does this do? It compares the contents pointed to by the source index with the contents pointed to by the destination index. And because it is a compare instruction, it only affects the flags; it will not modify either the SI location or the DI location. Just like SCASB, the scan byte instruction, after executing this instruction the source index and the destination index will be auto-incremented or auto-decremented: SI becomes SI plus or minus 1, 2 or 4, and similarly DI becomes DI plus or minus 1, 2 or 4, depending on whether we are accessing a byte, a word or a dword, with the plus or minus determined by the direction flag. Let us now proceed, and let us assume that in this example N is about 100 bytes.
So, just like the S C A S B the scan byte, after executing this instruction, the source index and the destination index will either be auto incremented or auto decremented. So, it is SI plus minus 1 2 or 4. And this plus or minus is determined by the direction flag. So, similarly DI will become DI plus minus 1/2/ 4 depending on whether we are accessing a word, a byte, a word, or a Dword in memory. So, let us now proceed and you know let us assume that you know in this example N is you know let us say this is about 100 bytes. 37 (Refer Slide Time: 05:39) So, what I want to do is to now write a program where I can compare location my location and say if the entire array matches or stop at the first mismatch. So, just like the S C A S B, I am going to go ahead and first clear the direction flag. So, what does this do, it basically clears direction flag implies auto increment, auto increment of SI /DI. (Refer Slide Time: 06:34) So, then lets also assume in the previous page that the first location of both arrays is given by ADDR_2, and this is ADDR_ 1. So, starting at ADDR 1 I am now comparing 100 locations with another array starting at ADDR 2 and going through another 100 38 locations in the extra segment ok. So, obviously, I need to move these addresses into my source index and destination index appropriately MOV SI,ADDR_ 1, and MOV DI,ADDR_2. So, just like we have been mentioning that the segment address and the offset of the register SI or DI gives the full address. The source index is always controlled by the data segment register, this is the full address; and the destination index is controlled by the extra segment ok. Now, let me MOV my count into the register which is 100. Note that this is in decimal not in hexadecimal now ok. And, I will add this instruction which is CMPSB right. This is basically you know load the address, this is load another address. So, this is INIT counter. So, when I do a CMPSB it is going to compare contents of SI with contents of DI and then auto increment SI+1and DI will be DI+1. Just like we had in the SCASB instruction, this does it only for one location. And if I want to repeat this till all the 100 location have been scanned, I need to add a prefix to it, and that prefix now is as long as the two locations are equal, you keep going. So, you keep REPE right. So, you keep going either until as long as both the compare you know compare operation results in equal values or the ECX eventually goes down to 0. So now, because of my prefix ECX will become ECX minus 1 after each of these operations. So by the end, when we finish this instruction REPE, CMPSB, and then proceed to the next instruction it would have compared all the locations that are the same. And at the first mismatch, it would have stopped, or it would have or when you are done with all the 100 locations it means that all the locations match exactly. So, if instead of a byte, if I wanted to compare it with a word, then I would have to do CMPSW in which case my SI and DI would auto increment by value of 2. And if I wanted to compare it with the D word, then I would do CMPSD and the SI and DI would auto increment by a value of 4 each time. So, similar to this there is an other instruction which is known as move instruction , data movement instruction , but string or an array. So, in this case, the example we are dealing with is you have a memory block which is sitting in my data segment data segment, this is my extra segment. And I have N locations which are starting at ADDR_1. 
Returning to the copy example: this time, I could call the starting location a source address, SRC_ADDR, in my data segment, and let us assume that this is some N bytes. And I have another location in my extra segment which is called DEST_ADDR. And what I want to do is to copy these N bytes from the one to the other, which implies that from my data segment I want to copy all the N bytes – it could be 100 bytes as an example, or 256 bytes – from the data segment SRC_ADDR offset to the extra segment DEST_ADDR offset, 100 locations there. So, how would you do this? It turns out that there is an instruction called MOVS byte and, as with every string instruction, there are MOVS word and MOVS Dword variants. So let us go ahead and try to write the simple program for this. If I want to execute this, I would do CLD, which basically sets up auto increment of DI and SI. Then I would move my source index with SRC_ADDR. Then I would move my destination index with DEST_ADDR. And then of course, I have to load my count, so I would MOV ECX, N, whatever the value of N is. And then I will do MOVS byte. So, what does this MOVS byte do? It will simply copy to the destination index the contents of the source index. And of course, this is going to be ES:destination index, and this is data segment:source index. And because this is a move operation, and not an ALU operation, obviously no flag will be set. So, therefore, you cannot use any kind of flag condition to decide if you want to repeat or stop this operation. Remember that REPE would go on as long as they are equal, REPNE would go on as long as they are not equal, and so on; but here, because no flag is going to be set, I cannot now use any of those conditions. And therefore, I am simply going to put an unconditional prefix, just REP, which means repeat this MOVS byte operation until ECX has reached 0. And by the way, after doing the MOVS byte, the destination index will get incremented or decremented by 1/2/4, and so will the source index. So, here when I do this REP MOVS byte it means that the destination index will get the contents of the source index, and then DI will be DI + 1, SI will also be SI + 1, and ECX will be auto decremented. So, in summary, we have studied three different string instructions: SCASB, which scans a string for a particular pattern which can be loaded in AL, AX or EAX; CMPSB (or CMPSW, CMPSD), which can be used to compare the contents of SI and DI, where you can use a prefix of REPE to repeat this operation as long as they are equal – in the SCASB case, we repeated as long as they were not equal; and the third operation is a data movement operation from the source index in the data segment to the destination index in the extra segment, as long as my ECX is not 0. (Refer Slide Time: 18:01)

C Programming and Assembly language Prof. Janakiraman Viraraghavan Department of Electrical Engineering Indian Institute of Technology, Madras Lecture - 05

So, welcome back to this course on C Programming and Assembly language. In this lecture we will discuss one of the most important operations that are exploited in a microprocessor, both from the assembly language point of view and a C programming point of view, and these are the stack related instructions. (Refer Slide Time: 00:30) So, the stack, as we all know and as the name suggests, is in fact just a pile of books: instead of accessing a random book somewhere in between, you always pick the book that is available on top.
So, which means that the book that goes in last is actually accessible first; so, this is a last in first out memory. So, let me remind you that in our abstraction of the memory, we said that there is an address which is some n bits, and depending on the bit combination one of the 2 power n locations will be accessed and that many bits of data will be made available on the data bus. This is the abstraction of the memory that we discussed, and really that does not change even when we are dealing with a stack. It is only how the microprocessor actually deals with the memory that differentiates whether it is a random access memory or a stack memory. So, in our picture we had a memory which was random access and a microprocessor which communicated with it through an address bus, a data bus, and read and write signals, and really none of this part actually changes when we are talking about a stack. So, what is it that enables the microprocessor to treat a random access memory as a stack? This is enabled particularly by a register called the stack pointer and this, as we have alluded to earlier, is called ESP – the Extended Stack Pointer, or just the stack pointer. And again, what we do here is the memory is divided into various segments; the ESP is only the offset that is specified, the base of that address comes from the stack segment register, so ESS (Extended Stack Segment register) colon ESP is the stack address. So, what does this memory do? If you want to put something onto the stack, all you have to do is exercise this instruction called a PUSH. And you can specify, let us say, a register AL. What is AL? It is 8 bits; 8 bits of that have to be pushed on to the memory, on to the stack segment. Note that here I am not specifying any address of where this data has to go, as opposed to a MOV instruction where I would have said MOV AL to, maybe, some contents pointed to by BX. There I am explicitly specifying where this data has to go and sit in memory; I am not doing that in the case of a PUSH instruction. So, what happens is the PUSH instruction simply looks at the top of stack (the top, given by ESP). And I am not going to mention every time that it is the stack segment register which completes the address; we will only talk about the ESP as the address. So, the contents pointed to by ESP are going to get loaded with the contents of the AL register. So, what happens here? The contents pointed to by ESP will get loaded with the AL register; similarly, I could also PUSH AX or EAX. So now, what is the difference between PUSH AL, PUSH AX and PUSH EAX? This is 8 bits, this is 16 bits and this is 32 bits. So now, let us go back to our logical memory map that we discussed in the initial lectures, where we said that data is always accessed in chunks of 8 bits: a particular address can store 8 bits, the next location will store the next 8 bits. So, if I PUSH AL onto the stack, then my stack pointer has to be decremented by 1. So, after performing the PUSH AL my stack pointer has to get decremented by 1. Why? Because I am now dealing with only 8 bits of information and the logical memory has to change only by 1. So, this is how the location of the top of stack is eventually decided: every time you PUSH something on the stack the stack pointer gets decremented by 1, 2 or 4. So, if you instead PUSH AX onto the stack, then the ESP would get decremented by 2 because this is now 2 bytes of information, and if you PUSH EAX onto the stack, ESP would get decremented by 4 because there are 4 bytes in the EAX register.
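As a quick illustration of the book-keeping on ESP, here is a minimal sketch of my own, sticking to the 16-bit and 32-bit register forms; the starting value of ESP is assumed purely for this sketch.

        ; suppose ESP = 00001000h at this point (value assumed only for illustration)
        PUSH    AX              ; 2 bytes of AX go onto the stack; ESP becomes 00000FFEh
        PUSH    EAX             ; 4 bytes of EAX go onto the stack; ESP becomes 00000FFAh

The stack pointer moves down by exactly the size of what was pushed, which is the book-keeping described above.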
So, if you now look at a combination of two such instructions, PUSH ECX, PUSH EBX: what would happen is the contents of ECX are actually pushed first and the contents of EBX are pushed next. (Refer Slide Time: 08:15) So, if you look at the particular stack segment here, this is my entire memory, and let us assume that the stack segment starts here and finishes here. So, my ESS is going to determine how much memory is available in the stack segment, or at least where it starts. So, if I now execute the instruction PUSH EBX, then 4 bytes of data of EBX will go and sit in 4 locations pointed to by the stack pointer. So, let us assume that my stack pointer is pointing somewhere here in memory, ESP. So, when I PUSH EBX, 4 locations will get written with the contents of EBX, and in the process my stack pointer would have now been decremented by 4 and it goes to a new location, as shown here in red. Now, when I execute the next instruction which is PUSH ECX – or maybe, to consider a different size here, let us just PUSH CX, 16 bits of information – then what happens is you are going to decrement the stack pointer only by 2, whereas this earlier distance was 4. So, I am going to decrement my stack pointer only by 2 and CX will now come and sit here, and this is 2 bytes. Now let us say I want to access the data from the stack and bring it into a register; I want to do the opposite of a PUSH. That is called a POP operation. So, I could POP into AL, or I could POP into BL, or I could POP into BX, or I could POP into EBX. So, I could do the exact same operations with AL, AX or EAX. So, remember here I can also PUSH or POP data to or from another location using a different addressing mode. For example, I could do a POP of the contents pointed to by EBX, where EBX is now a register which is going to point to an address. And which segment does this point to? EBX is always associated with the data segment register by default. So, therefore, if this happens to be my data segment, then EBX would be pointing to some location here, and if I say POP into the contents pointed to by EBX, then from wherever my stack pointer is, a certain number of bytes would have to be popped into that location pointed to by EBX. Therefore, it is now important for us to specify how many such bytes we want to POP from the memory. Remember, when we said PUSH AX or EAX, it was implicit from the op code or the mnemonic that we were pushing 16 bits or 32 bits or 8 bits. Now, when we are popping into memory this information is not obvious and needs to be specified explicitly, and therefore – we have discussed this earlier – we need to specify if this is a WORD pointer or a DWORD pointer or a BYTE; the default is a byte pointer. So, let us look at an example for this: if after this PUSH EBX and PUSH CX we did a POP – let me just write the instruction here – POP WORD pointer of the contents pointed to by ECX. Then what would happen is: the top of the stack is pointing to this green location here. So, 2 bytes from this location will be popped and written into the location that is pointed to by ECX. So, ECX, for example, may actually be pointing to this location in memory, the magenta location here. So, the 2 bytes would go and get written into this location, assuming that ECX is pointing to this particular location.
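Collecting the three instructions just discussed, a minimal MASM-style sketch of my own looks like this; it assumes ECX already holds a data-segment address where the popped word should land, exactly as in the example above.

        PUSH    EBX                ; ESP goes down by 4; the four bytes of EBX sit at the top of stack
        PUSH    CX                 ; ESP goes down by 2; the two bytes of CX now sit on top of those
        POP     WORD PTR [ECX]     ; the two bytes at the top of stack are written to DS:[ECX]; ESP goes up by 2

After the POP, the four bytes of EBX are back at the top of the stack.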
So, if instead of doing this POP WORD pointer I did POP DWORD pointer, then it would load 4 bytes from the top of stack into that location pointed to by ECX; so 4 bytes there would get written. So, it is important to note that on implementing the PUSH the stack pointer is auto decremented: when we do a PUSH, ESP becomes ESP minus 1, 2 or 4, which is determined by whether we want to PUSH 8-bit data, 16-bit data or 32-bit data onto the stack. When we do a POP, the exact opposite happens: ESP will get added with 1, 2 or 4, which is determined by whether we actually say BYTE, WORD or DWORD. So, in this operation of PUSH and POP there is really no particular address that has to be explicitly mentioned; it is implicitly available from the top of the stack, which is given by the extended stack pointer register. Data from there is either read out or written to, and the stack pointer register is altered by the amount that we are actually writing to or reading from the memory. So, this is the last in first out operation of the stack; however, it turns out that this is not sufficient for us – we also need to be able to do random access even from the stack – and therefore we have another register which is known as the base pointer, EBP. What does this do? It is nothing but a register that allows random access from the stack. So, for example, I could do MOV AX, contents pointed to by EBP minus 4. You remember this: it is register indirect addressing with an offset. So, that is what we are doing here: we are calling the MOV instruction, we are going to load the data pointed to by EBP minus 4, and of course I have to mention WORD pointer here. So, 16 bits of data pointed to by EBP minus 4 will get loaded into my AX register on executing this particular instruction. Again, the EBP is associated with the extended stack segment register. (Refer Slide Time: 18:52) So, EBP is associated with the ESS, which means that the address is always given by ESS colon EBP. So, in summary, ESP enables last in first out operation from the stack, and EBP allows random access from the stack. So, with that we move into another class of instructions – probably the most important class of instructions – which are again a branching set of instructions, but slightly different from what we studied a little earlier. That takes us to the call and return instructions. So, the idea here is that if you want to do a particular operation repeatedly, then you do not have to rewrite the same code again and again and again. You can actually invoke what is known as a subroutine in assembly language, or a function in C programming. So, a good programming practice is: if you find that you are reusing a particular segment of code again and again, then please do not copy-paste it; convert it to a function and call that function again and again. So, we have the concept of a subroutine in assembly language or a function in C programming. The idea here is that there is a particular chunk of code – a CODE CHUNK – which is going to be called again and again in order to perform a certain operation. The simplest example could be, for instance, that I am just going to increment my EBX register by 2. Let us say that I want to do this operation again and again: EBX should be EBX plus 2, EAX should simply be EAX plus 3 and ECX should be ECX minus 4. Let us just assume that this set of instructions has to be done again and again in a microprocessor. So, what do we do here? If you wanted to write these instructions, they would be the following: ADD EBX, 0x0002; ADD EAX, 0x0003; SUB ECX, 0x0004. Though I have written only 16 bits here, let us assume the zeros are extended all the way to 32 bits; these are all 32-bit numbers. So, let us assume that this sequence of instructions needs to be executed again and again in my program. So, what do I do? I load this into memory. (Refer Slide Time: 23:50) So, I have a location – this is going to be some address in the code segment – where I have these instructions ADD EBX, 0x0002; ADD EAX, 0x0003; and SUB ECX, 0x0004. So, this is sitting at my label LOC_FUN, and then there is my main code. These two labels, LOC_FUN and main, sitting before the colon, are actually labels for addresses in the code segment. So, let us say I have some instructions and I want to be able to call this function multiple times in the process. So, I want to do a CALL of LOC_FUN, then I execute some more instructions and I want to call it again. Maybe we can take a more concrete example here: let us clear EAX, clear EBX and clear ECX (using, say, XOR EAX, EAX and so on – this simply clears the registers), and then I am going to call the function LOC_FUN as described above. So, the idea is: you clear the registers, you execute this function, and then you want to come back and continue with the execution of the following instructions. So, let us maybe fill in some more concrete instructions here, where I will clear the registers again – I am going to just put this code again here. So, what should happen in this sequence of instructions is: you clear the registers, call the function, and when the function is invoked you are just going to increment EBX by 2, increment EAX by 3 and decrement ECX by 4. At the end of this first call EBX should be 0x0002, EAX should be 0x0003 and ECX should be 0xFFFFFFFC, that is, 0 minus 4. The same clearing happens again and again; I should be able to call this function and the same result should repeat – that is the intended idea behind this subroutine call. So, how is that implemented in assembly language? When you call a particular address location like LOC_FUN, the first thing that has to be done is that the EIP has to be loaded with LOC_FUN. So, this instruction should translate to this particular operation; this is very clear and is no different from the unconditional or the conditional branch that we studied earlier. In fact, it is just an unconditional branch that we are doing here. So, what is the difference between a call and an unconditional branch that we studied earlier? The idea behind the call is that two-way branching is possible, which means that you branch out into that subroutine call, finish the execution, and then you should be able to come back to where you left off. So, how does that get executed in the microprocessor is the question. Let us go back to the loop of operations that happens in a microprocessor: we said it does fetch, decode and execute in a loop. So, when it fetches a particular instruction and decodes that instruction, what happens is it actually knows where the next instruction is going to sit in memory. So, at this point, the EIP is automatically incremented to the next instruction. So, for example, if you do an ADD, then you know that the ADD instruction takes maybe 16 bits of information in the code segment; that means the next instruction is going to be 2 locations away. If another instruction needs 4 bytes of data in the code segment, then at decode time the microprocessor will know that the next instruction is going to be four locations away; therefore the increment, N, will be 4. So, what happens is, when you execute a particular instruction, the instruction pointer is auto incremented to point to the next instruction at that particular location. So, for example, when you are executing this particular call, in the process of decoding that instruction after fetching, the instruction pointer has already been incremented to point to the next instruction. So, when the call instruction is going to get executed in the fetch–decode–execute process, the EIP has already been incremented to point to the next instruction, or the location where we have to return after finishing the subroutine. Therefore, we implicitly have this information with us before we branch out. So, all we have to do is to put this information somewhere in memory, so that we can just access it when we want to come back. It turns out that this return address is pushed onto the stack. So, when the call happens, the other thing that implicitly happens is PUSH EIP onto the stack. Here, remember, this is another advantage of a stack that we discussed earlier: I do not have to worry about an exact location, I just PUSH it onto the stack, and at an appropriate time I POP it out so that I can get that instruction pointer value back. So, when I do a call, the instruction pointer is loaded with a new address from which I have to go and execute my subroutine, and it also implicitly pushes my instruction pointer onto the stack. So, which registers actually change in this process? By the way, just a correction: the order is quite the opposite. The PUSH happens first, because you have to PUSH the next instruction location onto the stack first and then load the instruction pointer with the new location that you want to go and execute the subroutine from. So, implicitly, remember that the stack pointer ESP is altered and the EIP is also altered, because you are loading a new address into that register. So, you call this function: it will now go into that location LOC_FUN after pushing the EIP onto the stack, and it executes that particular instruction set, which is these three instructions shown here. Now, I want to come back from this subroutine to where I left off; that is enabled by what is known as the RET instruction – R E T, the return instruction. What does the return instruction do? It simply pops the top of stack data into the instruction pointer, for it to continue from where it left off. So, it simply does the following: it pops the top of stack into my instruction pointer, so that it will now simply continue from where it left off. So, in this particular sequence of instructions, we clear the registers one by one, then we do a call; it goes to that particular location LOC_FUN, executes the three instructions, and when it does a RET – remember that the EIP got pushed onto the stack when we did the call – so if you have done no other stack operation in between, the top of stack will now contain the instruction pointer which is pointing to this particular location.
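Putting the pieces of this walkthrough together, here is a minimal MASM-style listing of my own for the whole sequence. The label names LOC_FUN and MAIN are the ones used above; the value 0xFFFFFFFC in the comments is just the arithmetic result of clearing ECX and then subtracting 4.

LOC_FUN:
        ADD     EBX, 2             ; EBX <- EBX + 2 (the 0x0002 of the example)
        ADD     EAX, 3             ; EAX <- EAX + 3
        SUB     ECX, 4             ; ECX <- ECX - 4
        RET                        ; pop the return address off the stack into EIP

MAIN:
        XOR     EAX, EAX           ; clear the three registers
        XOR     EBX, EBX
        XOR     ECX, ECX
        CALL    LOC_FUN            ; push the address of the next instruction, then jump to LOC_FUN
        ; at this point EAX = 0x00000003, EBX = 0x00000002, ECX = 0xFFFFFFFC

        XOR     EAX, EAX           ; clear again and call a second time
        XOR     EBX, EBX
        XOR     ECX, ECX
        CALL    LOC_FUN            ; the same result repeats after this call

The CALL pushes the return address and the RET pops it back into the instruction pointer, which is exactly the two-way branching described above.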
So, when you do a return, which simply pops the data at the top of the stack into the instruction pointer, execution will come and resume from this particular instruction onwards. You can also do not just a plain return, but what is known as a RET N. So, what does the RET N do? Not only will it POP the top of stack into the instruction pointer, it will also add N to the stack pointer: ESP will get ESP plus N. It will just add this number N that is given to the stack pointer. I would like to reiterate that although I am breaking down every assembly instruction here into multiple operations, it does not mean that each of these operations is executed as a separate assembly instruction. For example, the RET N is not broken down into these two assembly instructions. It is done in hardware, and there is hardware support for it; that is why it is that fast – if you do it in software it becomes very, very slow. This is just a functional description of what happens; all of these things that we discussed here happen in a single shot in hardware. We will come to why this particular addition is useful when we do a C program. So, with that we complete our discussion on the stack operations and the CALL and RET, which are most critical for subroutine execution and returning in assembly language. (Refer Slide Time: 36:46)

C Programming and Assembly language Prof. Janakiraman Viraraghavan Department of Electrical Engineering Indian Institute of Technology, Madras Lecture - 06

So, welcome back to this C Programming and Assembly language course. In the last module we discussed in reasonable detail the various instructions from the x86 architecture that are relevant to C programming; we discussed a few examples, and hopefully the assignments would have reinforced many of those concepts. So now, we move on to module 2, which essentially deals with C programming and inline assembly. (Refer Slide Time: 00:43) So, primarily we are going to deal with C programs with inline assembly. So, what is inline assembly? Inline assembly is nothing but a simple way of moving from a C program to assembly language and then coming back to the C program. Essentially, it is a way in which you can intersperse assembly instructions in between a high level C-like program. By the way, before I proceed here I have to mention, like I said in my introductory class, that I assume that the viewers and the students of this course are already familiar with C programming. I am not going to go into any detail – not even as much detail as I did for the assembly language of the x86 – of the C programming syntax or functionality or any such thing. I am going to primarily deal only with inline assembly language examples and thereby reinforce some of the C programming concepts. So, you can always go back and refer to Kernighan and Ritchie in case you are in doubt about any of the C syntax or the C functionalities. (Refer Slide Time: 02:19) So, what are we going to do in module 2? We will primarily deal with C programming with inline assembly; then we will talk about some of the data types and their sizes, just to make sure that we tie it well with our discussion on the microprocessor that we had in module 1. Then we will look at some very specific examples. So, this module will primarily be run based on some examples, but the examples have been carefully chosen to drive home a certain concept, which can be modified later or will become much clearer as we go on with the course. We will look at some ALU operations, a string length function, multiplication; then we look at swapping of two variables in a function in some detail – there are various ways in which you can do it in C. We will look at how you can actually do this better with assembly language, and how you would do this if you were to do it using a function instead of swapping the variables within that scope. So, these are some nitty-gritties and some details that get driven home when we discuss these particular examples; we will come to it a little later. So, first of all, before I proceed into concrete examples, let me also state that there are different kinds of compilers: there is a GCC compiler, there is a Turbo C compiler, there is a Microsoft Visual C compiler and so on – there are many compilers for C and C++. So, which one do I pick? Technically it should not matter which one you pick, because ultimately they all implement the same functionality, but I have picked a compiler which essentially allows this inline assembly coding most easily. To do inline assembly programming you need to follow a certain syntax; for certain compilers like GCC the syntax is pretty complex, and it is not worth it for us because you do not want to get stuck in the syntax of these operations rather than actually understanding the concept. So therefore, I have picked the MSVC compiler – the Microsoft Visual C compiler; C or C++ does not matter. So, let us quickly get into what inline assembly is all about. Let us assume that we have this function: void main, int x equals 2, and I am going to do x equal to x plus 2, then printf of slash n percent d comma x. So, let us assume that I want to translate this particular instruction alone into assembly – I want to do only this instruction in assembly. I want to leave the rest of the syntax as it is, which means that the rest of my function has to be interpreted as a C program, which means I can follow the syntax of C programming there; only certain key instructions which I want to speed up do I want to move to assembly language. It turns out that this is very much possible, and in Microsoft Visual C or Visual C++ that is achieved simply by putting this directive called underscore underscore asm, and in flower brackets you go ahead and write any assembly instructions. So, you can put in any assembly instructions here. Now, the beauty is that the variable names – x, y and z, whatever we have as variables – can continue to be used inside the flower braces of this assembly directive that we have put. And we can also use all the registers and other assembly instructions as they are, from the assembly instruction set that we have. (Refer Slide Time: 07:23) So, let us look at a very concrete example first. Let us assume that I want to do this x equal to x plus 2 operation in assembly. So, what do you do? I will write underscore underscore asm, bracket open, and the instruction I want to do I am going to put here in comments: x equal to x plus 2. So, what do you do? You just say MOV EAX comma x. Now, what this x is in terms of microprocessor data and microprocessor registers we will look at later; let us not worry about it now – for now assume that x is a variable and is accessible inside this inline assembly portion as well. So, then what do I do? I do ADD EAX comma 0x0002. So, what has this done? It has simply moved the value of x into the EAX register, and it has done EAX equals EAX plus 2; this value has not yet been put back into x. Therefore, I need to execute another instruction, which is MOV x comma EAX – that is, small x. So, this executes the instruction where x will eventually get the value of EAX. So now, if I go ahead and do my printf of percent d comma x, what gets printed here of course depends on what x was initialized to. x was initialized to 2 here; so therefore, this will simply print 4 for you here. So, notice how we simply translated only this instruction, x equal to x plus 2, into a particular assembly block; this is known as inline assembly. So, what we are going to do in this module is to reinforce the assembly language instructions that we learnt in module 1 through inline assembly C programming. That way we sort of cover some concepts of C as well as reinforce the instruction set that we learnt in module 1. With regard to that, let us look at our first example: write an assembly program to evaluate the following expressions. Let us assume all variables are 32-bit integers – and that brings us to an interesting discussion, namely, what are the typical data types that we will use in C? Remember that the logical memory map allows us to deal with multiples of bytes. So, obviously, the smallest unit that we can deal with is going to be a byte of data; this is what is known as a character, CHAR, in C. Now, you could also deal with a word of data; depending on whether you are dealing with an older processor, a newer processor, or what kind of compiler, this could either be a SHORT integer or an integer itself. We could also deal with a DWORD of data; again, depending on the compiler, this could be a LONG INT or it could be an INTEGER. So, the main point is that because the registers and the logical address mapping are going to deal with multiples of bytes or words or Dwords, the data types in C are also mapped to similar sizes, and that is what we see in all our compilers across various kinds of processors. (Refer Slide Time: 12:47) So, here in this example let us assume that all variables are 32-bit integers, and we want to perform the operation x into y plus a minus b and load it into the EAX register. We also want to do (x XOR y) OR (a AND b) and load it into the EBX register. So, let us start off by writing our inline assembly. (Refer Slide Time: 13:17) Let us write void main – going forward I may not always write this void main. Let us assume that int x equals 2, y equals 3, a equals 4, b equals 5, and what is the operation that we want to perform? EAX should get loaded with x into y plus a minus b. So, let us go ahead and do this thing in assembly language: underscore underscore asm. So, what do we do? We first load x into EAX; remember that when we want to do multiplication EAX is an implicit register, and because this is a 32-bit integer that we are talking about, we have to deal with EAX and not just AX. So, therefore, MOV EAX comma x; then I am going to do a MUL y. So, what has the first instruction done for us? It has simply loaded EAX with the value x. What has the MUL operation done? It has loaded EDX and EAX together, as a 64-bit number, with x into y. Now I go ahead and add – remember that EAX now has my answer there – so I go ahead and ADD EAX comma a, and finally I SUB EAX comma b. Of course, this particular program will work only if the higher part, EDX, happens to be 0. It is an interesting exercise – I leave it to you to figure out how to modify this if EDX also happens to be a non-zero number because of the multiplication. So, what are we doing here? EAX is now simply x into y plus a, and EAX eventually is x into y plus a minus b. So, I can close this bracket, and this concludes the arithmetic operation that we wanted to do, x into y plus a minus b; we are loading the result eventually just in EAX, we are not interested in getting this into any other variable. Now let us look at the logical operation we wanted to implement: EBX equals (x XOR y) OR (a AND b); remember, all of these are bitwise operations. So, therefore, I again go ahead and open my underscore underscore asm block. I load EBX with x, then XOR EBX with y; then, because there is a bracket, I have to do the a AND b very carefully: I MOV ECX with a, then I do an AND of ECX with b, and then I do an OR of EBX comma ECX. So, this operation is EBX gets x; this operation is EBX is x XOR y; this operation is ECX gets a; this is ECX is a AND b, bitwise; and the final operation is EBX equals (x XOR y) OR (a AND b). So, I can close this, and of course eventually I can close my C function as well out here. So, there are two blocks that we have introduced into the C programming syntax to illustrate the concept of inline assembly: INLINE CODE 1 and INLINE CODE 2. With that, now let us move on to another interesting example, where we are going to reinforce the concept of jump instructions and loops in assembly language. (Refer Slide Time: 20:01)
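Before moving on, here is the ALU and logical example above collected into one complete MSVC program. This is a sketch under the lecture's assumptions (32-bit ints, x = 2, y = 3, a = 4, b = 5); the variables r1 and r2 and the final printf are my own additions so that the register results can actually be seen, and the expected output (5 and 5) is just the arithmetic worked out by hand.

#include <stdio.h>

void main()
{
    int x = 2, y = 3, a = 4, b = 5;
    int r1 = 0, r2 = 0;            /* my additions: to hold the EAX and EBX results */

    /* INLINE CODE 1: EAX = x*y + a - b */
    __asm
    {
        MOV  EAX, x                // EAX <- x
        MUL  y                     // EDX:EAX <- x * y (EDX assumed 0 for these small values)
        ADD  EAX, a                // EAX <- x*y + a
        SUB  EAX, b                // EAX <- x*y + a - b
        MOV  r1, EAX               // store the result so the C code can see it (my addition)
    }

    /* INLINE CODE 2: EBX = (x XOR y) OR (a AND b) */
    __asm
    {
        MOV  EBX, x                // EBX <- x
        XOR  EBX, y                // EBX <- x XOR y
        MOV  ECX, a                // ECX <- a
        AND  ECX, b                // ECX <- a AND b
        OR   EBX, ECX              // EBX <- (x XOR y) OR (a AND b)
        MOV  r2, EBX               // store the result (my addition)
    }

    printf("\n%d %d", r1, r2);     /* prints 5 5 for these initial values */
}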
So, therefore, here you have to do a jump on no 0 to this address here there is let us call this maybe I will write it in a different color to indicate that this is a label or an address. So, I am going to jump to this label called MULT. So, here what are we doing, we are simply loading the COUNTER ECX is which is nothing, but y. Then add EAX with x I am adding x equals or plus x and this is ECX becomes ECX minus 1 and as long as it is not 0 you keep adding this x to itself. So, when the for example, y is 3 here so, after 3 counts y will come down to 0 and that is the condition when the instruction instead of looping all the way back to MULT will actually proceed which means, when it hits 0 the 0 flag will be set. And therefore, jump or no 0 will not be satisfied and the instruction will proceed to the next step and where I am ready to load my final answer into z. So, therefore, here MOV z comma EAX so, when you come here z will be x into y. So, again I finish my inline assembly block and if you want you can do a printf here of percent d which is z value and you will see that the answer is 6. So, here we have illustrated apart from all the ALU operations we have also shown how you can exploit the Jump on NO Zero operation to loop back to a particular address depending on a particular condition. So, in the next lecture we will look at some more examples of a string length and so on to reinforce certain other assembly instructions that we studied in module 1.
