(THEMO~3.PDF
Document Details
Uploaded by MagnanimousCloisonnism
Vilnius University
Full Transcript
2.12 Translating and Starting a Program 131 When do you use primitives like load-reserved and store-conditional? Check 1. When cooperating threads of a parallel program need to synchronize to get...
2.12 Translating and Starting a Program 131 When do you use primitives like load-reserved and store-conditional? Check 1. When cooperating threads of a parallel program need to synchronize to get Yourself proper behavior for reading and writing shared data. 2. When cooperating processes on a uniprocessor need to synchronize for reading and writing shared data. 2.12 Translating and Starting a Program This section describes the four steps in transforming a C program into a file in nonvolatile storage (disk or flash memory) into a program running on a computer. Figure 2.20 shows the translation hierarchy. Some systems combine these steps to reduce translation time, but programs go through these four logical phases. This section follows this translation hierarchy. C program Compiler Assembly language program Assembler Object: Machine language module Object: Library routine (machine language) Linker Executable: Machine language program Loader Memory FIGURE 2.20 A translation hierarchy for C. A high-level language program is first compiled into an assembly language program and then assembled into an object module in machine language. The linker combines multiple modules with library routines to resolve all references. The loader then places the machine code into the proper memory locations for execution by the processor. To speed up the translation process, some steps are skipped or combined. Some compilers produce object modules directly, and some systems use linking loaders that perform the last two steps. To identify the type of file, UNIX follows a suffix convention for files: C source files are named x.c, assembly files are x.s, object files are named x.o, statically linked library routines are x.a, dynamically linked library routes are x.so, and executable files by default are called a.out. MS-DOS uses the suffixes.C,.ASM,.OBJ,.LIB,.DLL, and.EXE to the same effect. 132 Chapter 2 Instructions: Language of the Computer Compiler The compiler transforms the C program into an assembly language program, a symbolic form of what the machine understands. High-level language programs take many fewer lines of code than assembly language, so programmer productivity is much higher. assembly language A In 1975, many operating systems and assemblers were written in assembly symbolic language that language because memories were small and compilers were inefficient. The can be translated into million-fold increase in memory capacity per single DRAM chip has reduced binary machine language. program size concerns, and optimizing compilers today can produce assembly language programs nearly as well as an assembly language expert, and sometimes even better for large programs. Assembler Since assembly language is an interface to higher-level software, the assembler can also treat common variations of machine language instructions as if they were instructions in their own right. The hardware need not implement these instructions; however, their appearance in assembly language simplifies translation pseudoinstruction A and programming. Such instructions are called pseudoinstructions. common variation As mentioned above, the RISC-V hardware makes sure that register x0 always of assembly language has the value 0. That is, whenever register x0 is used, it supplies a 0, and if the instructions often treated programmer attempts to change the value in x0, the new value is simply discarded. as if it were an instruction in its own right. Register x0 is used to create the assembly language instruction that copies the contents of one register to another. Thus, the RISC-V assembler accepts the following instruction even though it is not found in the RISC-V machine language: li x9, 123 // load immediate value 123 into register x9 The assembler converts this assembly language instruction into the machine language equivalent of the following instruction: addi x9, x0, 123 // register x9 gets register x0 + 123 The RISC-V assembler also converts mv (move) into an addi instruction. Thus mv x10, x11 // register x10 gets register x11 becomes addi x10, x11, 0 // register x10 gets register x11 + 0 The assembler also accepts j Label to unconditionally branch to a label, as a stand-in for jal x0, Label. It also converts branches to faraway locations into a branch and a jump. As mentioned above, the RISC-V assembler allows large constants to be loaded into a register despite the limited size of the immediate instructions. Thus, the load immediate (li) pseudoinstruction introduced above 2.12 Translating and Starting a Program 133 can create constants larger than addi’s immediate field can contain; the load address (la) macro works similarly for symbolic addresses. Finally, it can simplify the instruction set by determining which variation of an instruction the programmer wants. For example, the RISC-V assembler does not require the programmer to specify the immediate version of the instruction when using a constant for arithmetic and logical instructions; it just generates the proper opcode. Thus and x9, x10, 15 // register x9 gets x10 AND 15 becomes andi x9, x10, 15 // register x9 gets x10 AND 15 We include the “i” on the instructions to remind the reader that andi produces a different opcode in a different instruction format than the and instruction with no immediate operands. In summary, pseudoinstructions give RISC-V a richer set of assembly language instructions than those implemented by the hardware. If you are going to write assembly programs, use pseudoinstructions to simplify your task. To understand the RISC-V architecture and be sure to get best performance, however, study the real RISC-V instructions found in Figures 2.1 and 2.18. To reduce confusion about real instructions versus pseudoinstructions, Chapters 2 and 3 will only use real instructions even where an experienced assembly language programmer would use pseudoinstructions. Assemblers will also accept numbers in a variety of bases. In addition to binary and decimal, they usually accept a base that is more succinct than binary yet converts easily to a bit pattern. RISC-V assemblers use hexadecimal and octal. Such features are convenient, but the primary task of an assembler is assembly into machine code. The assembler turns the assembly language program into an object file, which is a combination of machine language instructions, data, and information needed to place instructions properly in memory. To produce the binary version of each instruction in the assembly language program, the assembler must determine the addresses corresponding to all labels. Assemblers keep track of labels used in branches and data transfer instructions in a symbol table A table symbol table. As you might expect, the table contains pairs of symbols and addresses. that matches names of The object file for UNIX systems typically contains six distinct pieces: labels to the addresses of the memory words that instructions occupy. n The object file header describes the size and position of the other pieces of the object file. n The text segment contains the machine language code. n The static data segment contains data allocated for the life of the program. (UNIX allows programs to use both static data, which is allocated throughout the program, and dynamic data, which can grow or shrink as needed by the program. See Figure 2.13.) n The relocation information identifies instructions and data words that depend on absolute addresses when the program is loaded into memory. 134 Chapter 2 Instructions: Language of the Computer n The symbol table contains the remaining labels that are not defined, such as external references. n The debugging information contains a concise description of how the modules were compiled so that a debugger can associate machine instructions with C source files and make data structures readable. The next subsection shows how to attach such routines that have already been assembled, such as library routines. Linker What we have presented so far suggests that a single change to one line of one procedure requires compiling and assembling the whole program. Complete retranslation is a terrible waste of computing resources. This repetition is particularly wasteful for standard library routines, because programmers would be compiling and assembling routines that by definition almost never change. An alternative is to compile and assemble each procedure independently, so that a change to one line would require compiling and assembling only one procedure. This alternative requires a new systems linker Also called program, called a link editor or linker, which takes all the independently assembled link editor. A systems machine language programs and “stitches” them together. The reason a linker is useful program that combines is that it is much faster to patch code than it is to recompile and reassemble. independently assembled There are three steps for the linker: machine language programs and resolves all undefined labels into an 1. Place code and data modules symbolically in memory. executable file. 2. Determine the addresses of data and instruction labels. 3. Patch both the internal and external references. The linker uses the relocation information and symbol table in each object module to resolve all undefined labels. Such references occur in branch instructions and data addresses, so the job of this program is much like that of an editor: it finds the old addresses and replaces them with the new addresses. Editing is the origin of the name “link editor,” or linker for short. If all external references are resolved, the linker next determines the memory executable file A functional program in locations each module will occupy. Recall that Figure 2.13 on page 112 shows the format of an object the RISC-V convention for allocation of program and data to memory. Since file that contains no the files were assembled in isolation, the assembler could not know where a unresolved references. module’s instructions and data would be placed relative to other modules. It can contain symbol When the linker places a module in memory, all absolute references, that is, tables and debugging memory addresses that are not relative to a register, must be relocated to reflect information. A “stripped its true location. executable” does not contain that information. The linker produces an executable file that can be run on a computer. Typically, Relocation information this file has the same format as an object file, except that it contains no unresolved may be included for the references. It is possible to have partially linked files, such as library routines, that loader. still have unresolved addresses and hence result in object files. 2.12 Translating and Starting a Program 135 Linking Object Files EXAMPLE Link the two object files below. Show updated addresses of the first few instructions of the completed executable file. We show the instructions in assembly language just to make the example understandable; in reality, the instructions would be numbers. Note that in the object files we have highlighted the addresses and symbols that must be updated in the link process: the instructions that refer to the addresses of procedures A and B and the instructions that refer to the addresses of data words X and Y. Object file header Name Procedure A Text size 100hex Data size 20hex Text segment Address Instruction 0 lw x10, 0(x3) 4 jal x1, 0...... Data segment 0 (X)...... Relocation information Address Instruction type Dependency 0 lw X 4 jal B Symbol table Label Address X – B – Name Procedure B Text size 200hex Data size 30hex Text segment Address Instruction 0 sw x11, 0(x3) 4 jal x1, 0...... Data segment 0 (Y)...... Relocation information Address Instruction type Dependency 0 sw Y 4 jal A Symbol table Label Address Y – A – Procedure A needs to find the address for the variable labeled X to put in the load instruction and to find the address of procedure B to place in the jal 136 Chapter 2 Instructions: Language of the Computer instruction. Procedure B needs the address of the variable labeled Y for the ANSWER store instruction and the address of procedure A for its jal instruction. From Figure 2.14 on page 113, we know that the text segment starts at address 0000 0000 0040 0000hex and the data segment at 0000 0000 1000 0000hex. The text of procedure A is placed at the first address and its data at the second. The object file header for procedure A says that its text is 100hex bytes and its data is 20hex bytes, so the starting address for procedure B text is 40 0100hex, and its data starts at 1000 0020hex. Executable file header Text size 300hex Data size 50hex Text segment Address Instruction 0000 0000 0040 0000hex lw x10, 0(x3) 0000 0000 0040 0004hex jal x1, 252ten...... 0000 0000 0040 0100hex sw x11, 32(x3) 0000 0000 0040 0104hex jal x1, -260ten...... Data segment Address 0000 0000 1000 0000hex (X)...... 0000 0000 1000 0020hex (Y)...... Now the linker updates the address fields of the instructions. It uses the instruction type field to know the format of the address to be edited. We have three types here: 1. The jump and link instructions use PC-relative addressing. Thus, for the jal at address 40 0004hex to go to 40 0100hex (the address of procedure B), it must put (40 0100hex – 40 0004hex) or 252ten in its address field. Similarly, since 40 0000hex is the address of procedure A, the jal at 40 0104hex gets the negative number -260ten (40 0000hex – 40 0104hex) in its address field. 2. The load addresses are harder because they are relative to a base register. This example uses x3 as the base register, assuming it is initialized to 0000 0000 1000 0000hex. To get the address 0000 0000 1000 0000hex (the address of word X), we place 0ten in the address field of lw at address 40 0000hex. Similarly, we place 20hex in the address field of sw at address 40 0100hex to get the address 0000 0000 1000 0020hex (the address of doubleword Y). 3. Store addresses are handled just like load addresses, except that their S-type instruction format represents immediates differently than loads’ I-type format. We place 32ten in the address field of sw at address 40 0100hex to get the address 0000 0000 1000 0020hex (the address of word Y). 2.12 Translating and Starting a Program 137 Loader Now that the executable file is on disk, the operating system reads it to memory and starts it. The loader follows these steps in UNIX systems: loader A systems program that places an 1. Reads the executable file header to determine size of the text and data object program in main segments. memory so that it is ready to execute. 2. Creates an address space large enough for the text and data. 3. Copies the instructions and data from the executable file into memory. 4. Copies the parameters (if any) to the main program onto the stack. 5. Initializes the processor registers and sets the stack pointer to the first free location. 6. Branches to a start-up routine that copies the parameters into the argument registers and calls the main routine of the program. When the main routine returns, the start-up routine terminates the program with an exit system call. Dynamically Linked Libraries The first part of this section describes the traditional approach to linking libraries Virtually every before the program is run. Although this static approach is the fastest way to call problem in computer library routines, it has a few disadvantages: science can be solved n The library routines become part of the executable code. If a new version of by another level of the library is released that fixes bugs or supports new hardware devices, the indirection. statically linked program keeps using the old version. David Wheeler n It loads all routines in the library that are called anywhere in the executable, even if those calls are not executed. The library can be large relative to the program; for example, the standard C library on a RISC-V system running the Linux operating system is 1.5 MiB. These disadvantages lead to dynamically linked libraries (DLLs), where the dynamically linked library routines are not linked and loaded until the program is run. Both the libraries (DLLs) Library program and library routines keep extra information on the location of nonlocal routines that are linked procedures and their names. In the original version of DLLs, the loader ran to a program during execution. a dynamic linker, using the extra information in the file to find the appropriate libraries and to update all external references. The downside of the initial version of DLLs was that it still linked all routines of the library that might be called, versus just those that are called during the running of the program. This observation led to the lazy procedure linkage version of DLLs, where each routine is linked only after it is called. Like many innovations in our field, this trick relies on a level of indirection. Figure 2.21 shows the technique. It starts with the nonlocal routines calling a set of dummy routines at the end of the program, with one entry per nonlocal routine. These dummy entries each contain an indirect branch. 138 Chapter 2 Instructions: Language of the Computer Text Text JAL JAL...... LW LW JALR JALR...... Data Data Text... ADDI x0, ID JAL... Text Dynamic linker/loader Remap DLL routine JAL... Data/Text Text DLL routine DLL routine...... JALR JALR (a) First call to DLL routine (b) Subsequent calls to DLL routine FIGURE 2.21 Dynamically linked library via lazy procedure linkage. (a) Steps for the first time a call is made to the DLL routine. (b) The steps to find the routine, remap it, and link it are skipped on subsequent calls. As we will see in Chapter 5, the operating system may avoid copying the desired routine by remapping it using virtual memory management. The first time the library routine is called, the program calls the dummy entry and follows the indirect branch. It points to code that puts a number in a register to identify the desired library routine and then branches to the dynamic linker/ loader. The linker/loader finds the wanted routine, remaps it, and changes the address in the indirect branch location to point to that routine. It then branches to it. When the routine completes, it returns to the original calling site. Thereafter, the call to the library routine branches indirectly to the routine without the extra hops. In summary, DLLs require additional space for the information needed for dynamic linking, but do not require that whole libraries be copied or linked. They pay a good deal of overhead the first time a routine is called, but only a single indirect branch thereafter. Note that the return from the library pays no extra overhead. Microsoft’s Windows relies extensively on dynamically linked libraries, and it is also the default when executing programs on UNIX systems today. 2.12 Translating and Starting a Program 139 Starting a Java Program The discussion above captures the traditional model of executing a program, where the emphasis is on fast execution time for a program targeted to a specific instruction set architecture, or even a particular implementation of that architecture. Indeed, it is possible to execute Java programs just like C. Java was invented with a different set of goals, however. One was to run safely on any computer, even if it might slow execution time. Figure 2.22 shows the typical translation and execution steps for Java. Rather than compile to the assembly language of a target computer, Java is compiled first to instructions that are easy to interpret: the Java bytecode instruction set (see Java bytecode Section 2.15). This instruction set is designed to be close to the Java language so that Instruction from an this compilation step is trivial. Virtually no optimizations are performed. Like the C instruction set designed compiler, the Java compiler checks the types of data and produces the proper operation to interpret Java for each type. Java programs are distributed in the binary version of these bytecodes. programs. A software interpreter, called a Java Virtual Machine (JVM), can execute Java bytecodes. An interpreter is a program that simulates an instruction set architecture. Java Virtual Machine For example, the RISC-V simulator used with this book is an interpreter. There is (JVM) The program that no need for a separate assembly step, since either the translation is so simple that interprets Java bytecodes. the compiler fills in the addresses or JVM finds them at runtime. The upside of interpretation is portability. The availability of software Java virtual machines meant that most people could write and run Java programs shortly after Java program Compiler Class files (Java bytecodes) Java library routines (machine language) Just In Time Java Virtual Machine compiler Compiled Java methods (machine language) FIGURE 2.22 A translation hierarchy for Java. A Java program is first compiled into a binary version of Java bytecodes, with all addresses defined by the compiler. The Java program is now ready to run on the interpreter, called the Java Virtual Machine (JVM). The JVM links to desired methods in the Java library while the program is running. To achieve greater performance, the JVM can invoke the JIT compiler, which selectively compiles methods into the native machine language of the machine on which it is running. 140 Chapter 2 Instructions: Language of the Computer Java was announced. Today, Java virtual machines are found in billions of devices, in everything from cell phones to Internet browsers. The downside of interpretation is lower performance. The incredible advances in performance of the 1980s and 1990s made interpretation viable for many important applications, but the factor of 10 slowdown when compared to traditionally compiled C programs made Java unattractive for some applications. To preserve portability and improve execution speed, the next phase of Java’s Just In Time compiler development was compilers that translated while the program was running. Such (JIT) The name Just In Time compilers (JIT) typically profile the running program to find where commonly given to a compiler that operates at the “hot” methods are and then compile them into the native instruction set on runtime, translating the which the virtual machine is running. The compiled portion is saved for the next interpreted code segments time the program is run, so that it can run faster each time it is run. This balance into the native code of the of interpretation and compilation evolves over time, so that frequently run Java computer. programs suffer little of the overhead of interpretation. As computers get faster so that compilers can do more, and as researchers invent betters ways to compile Java on the fly, the performance gap between Java and C or C++ is closing. Section 2.15 goes into much greater depth on the implementation of Java, Java bytecodes, JVM, and JIT compilers. Check Which of the advantages of an interpreter over a translator was the most important Yourself for the designers of Java? 1. Ease of writing an interpreter 2. Better error messages 3. Smaller object code 4. Machine independence 2.13 A C Sort Example to Put it All Together One danger of showing assembly language code in snippets is that you will have no idea what a full assembly language program looks like. In this section, we derive the RISC-V code from two procedures written in C: one to swap array elements and one to sort them.