Unit-04-final PDF
MIT-WPU
Summary
This document discusses various aspects of compiler design, including semantic analysis, intermediate code generation, and different types of three-address statements. It also covers different types of SDTs, such as S-attributed and L-attributed definitions, and how to implement them with the help of examples.
Full Transcript
UNIT – IV: Semantic Analysis and Intermediate Code Generation
Final Year BTECH
Subject: System Software and Compiler Design

Unit IV: Contents
Semantic Analysis:
◻ Need, Syntax Directed Translation
◻ Syntax Directed Definitions
◻ Translation of Assignment Statements, Iterative Statements, Boolean Expressions, Conditional Statements
◻ Type Checking and Type Conversion
Intermediate Code Formats:
◻ Postfix notation, Parse and syntax trees
◻ Three-address code, quadruples and triples

Intermediate Code Generator
▪ The front end translates a source program into an intermediate representation from which the back end generates target code.
▪ Benefits of using a machine-independent intermediate form:
1. Retargeting is facilitated. That is, a compiler for a different machine can be created by attaching a back end for the new machine to an existing front end.
2. A machine-independent code optimizer can be applied to the intermediate representation.

Intermediate Languages
▪ Three ways of intermediate representation: syntax tree, postfix notation, three-address code.
▪ The semantic rules for generating three-address code from common programming language constructs are similar to those for constructing syntax trees or for generating postfix notation.

Syntax Tree
▪ A syntax tree depicts the natural hierarchical structure of a source program.
▪ A DAG (Directed Acyclic Graph) gives the same information but in a more compact way because common subexpressions are identified.
▪ A syntax tree and DAG for the assignment statement a := b * -c + b * -c are as follows:

Postfix Notation
▪ Postfix notation is a linearized representation of a syntax tree; it is a list of the nodes of the tree in which a node appears immediately after its children. The postfix notation for the syntax tree given above is: a b c uminus * b c uminus * + assign

Three-Address Code
▪ Three-address code is a sequence of statements of the general form x := y op z
▪ where x, y and z are names, constants, or compiler-generated temporaries; op stands for any operator, such as a fixed- or floating-point arithmetic operator, or a logical operator on boolean-valued data. Thus a source language expression like x + y * z might be translated into the sequence
t1 := y * z
t2 := x + t1
▪ where t1 and t2 are compiler-generated temporary names.

Advantages of Three-Address Code
▪ Advantages of three-address code:
1. The unraveling of complicated arithmetic expressions and of nested flow-of-control statements makes three-address code desirable for target code generation and optimization.
2. The use of names for the intermediate values computed by a program allows three-address code to be easily rearranged, unlike postfix notation.
▪ Three-address code is a linearized representation of a syntax tree or a DAG in which explicit names correspond to the interior nodes of the graph. The syntax tree and DAG above are represented by the three-address code sequences below. Variable names can appear directly in three-address statements.

Three-Address Code Example
▪ Three-address code corresponding to the syntax tree and DAG given above:
(a) Code for the syntax tree:
t1 := -c
t2 := b * t1
t3 := -c
t4 := b * t3
t5 := t2 + t4
a := t5
(b) Code for the DAG:
t1 := -c
t2 := b * t1
t5 := t2 + t2
a := t5
▪ The reason for the term "three-address code" is that each statement usually contains three addresses, two for the operands and one for the result.
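To make these representations concrete, here is a minimal Python sketch (not from the slides; the Node class and the helper names gen_postfix, gen_three_address and new_temp are illustrative) that builds the syntax tree for a := b * -c + b * -c and emits both its postfix notation and its three-address code:

class Node:
    def __init__(self, op, *children):
        self.op = op                 # operator or leaf name, e.g. '+', 'uminus', 'b'
        self.children = children

def gen_postfix(n):
    """Postfix notation: a node appears immediately after its children."""
    return [tok for c in n.children for tok in gen_postfix(c)] + [n.op]

temp_count = 0
def new_temp():
    global temp_count
    temp_count += 1
    return "t" + str(temp_count)

def gen_three_address(n, code):
    """Return the name holding n's value; append instructions to code."""
    if not n.children:                          # leaf: an identifier
        return n.op
    args = [gen_three_address(c, code) for c in n.children]
    if n.op == "assign":                        # a := <expr>
        code.append(f"{args[0]} := {args[1]}")
        return args[0]
    t = new_temp()                              # each interior node gets a temporary
    if len(args) == 1:                          # unary operator (uminus)
        code.append(f"{t} := - {args[0]}")
    else:                                       # binary operator
        code.append(f"{t} := {args[0]} {n.op} {args[1]}")
    return t

# Syntax tree for  a := b * -c + b * -c
tree = Node("assign", Node("a"),
            Node("+", Node("*", Node("b"), Node("uminus", Node("c"))),
                      Node("*", Node("b"), Node("uminus", Node("c")))))

print(" ".join(gen_postfix(tree)))   # a b c uminus * b c uminus * + assign
code = []
gen_three_address(tree, code)
print("\n".join(code))               # the six statements of column (a) above

Emitting from a DAG instead of a tree would reuse the shared node for -c and for b * -c, giving the shorter sequence of column (b).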
Types of Three-Address Statements
▪ The common three-address statements are:
1. Assignment statements of the form x := y op z, where op is a binary arithmetic or logical operation, and assignment instructions of the form x := op y, where op is a unary operation. Essential unary operations include unary minus, logical negation, shift operators, and conversion operators that, for example, convert a fixed-point number to a floating-point number.
2. Copy statements of the form x := y, where the value of y is assigned to x.
3. The unconditional jump goto L. The three-address statement with label L is the next to be executed.
4. Conditional jumps such as if x relop y goto L. This instruction applies a relational operator (<, =, >=, etc.) to x and y, and executes the statement with label L next if x stands in relation relop to y. If not, the three-address statement following if x relop y goto L is executed next, as in the usual sequence.

Types of Three-Address Statements (contd.)
5. param x and call p, n for procedure calls, and return y, where y representing a returned value is optional. For example,
param x1
param x2
...
param xn
call p, n
is generated as part of a call of the procedure p(x1, x2, ..., xn).
6. Indexed assignments of the form x := y[i] and x[i] := y.
7. Address and pointer assignments of the form x := &y, x := *y, and *x := y.

Three-Address Statements: Example
◻ Consider the statement do { i = i + 1; } while (a[i] < v);
The multiplication i * 8 in its three-address code is appropriate for an array of elements that each take 8 units of space.

Implementation of Three-Address Statements
Three-address code instructions can be implemented as objects or as records with fields for the operator and the operands. Three such representations are:
▪ Quadruples: a quadruple (or just "quad") has four fields, which we call op, arg1, arg2, and result.
▪ Triples: a triple has only three fields, which we call op, arg1, and arg2. The DAG and triple representations of expressions are equivalent.
▪ Indirect Triples: a listing of pointers to triples, rather than a listing of the triples themselves.
The benefit of quadruples over triples can be seen in an optimizing compiler, where instructions are often moved around. With quadruples, if we move an instruction that computes a temporary t, then the instructions that use t require no change. With triples, the result of an operation is referred to by its position, so moving an instruction may require changing all references to that result. This problem does not occur with indirect triples.

Quadruples
▪ A quadruple is a record structure with four fields: op, arg1, arg2 and result.
▪ The op field contains an internal code for the operator. The three-address statement x := y op z is represented by placing y in arg1, z in arg2 and x in result.
▪ The contents of fields arg1, arg2 and result are normally pointers to the symbol-table entries for the names represented by these fields. If so, temporary names must be entered into the symbol table as they are created.

Triples
▪ To avoid entering temporary names into the symbol table, we might refer to a temporary value by the position of the statement that computes it.
▪ If we do so, three-address statements can be represented by records with only three fields: op, arg1 and arg2.
▪ The fields arg1 and arg2, for the arguments of op, are either pointers to the symbol table or pointers into the triple structure (for temporary values).
▪ Since three fields are used, this intermediate code format is known as triples.
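As a rough illustration of these record layouts (the field names op, arg1, arg2 and result are from the slides; the Python tuples and the integer indices standing in for pointers are only a sketch), here is the DAG code for a := b * -c + b * -c in each form:

# Quadruples: (op, arg1, arg2, result). Temporaries have explicit names,
# so the list can be reordered without touching other instructions.
quads = [
    ("uminus", "c",  None, "t1"),
    ("*",      "b",  "t1", "t2"),
    ("+",      "t2", "t2", "t5"),
    (":=",     "t5", None, "a"),
]

# Triples: (op, arg1, arg2). A temporary is referred to by the position of
# the triple that computes it (an integer index here, a pointer in practice).
triples = [
    ("uminus", "c", None),   # (0)
    ("*",      "b", 0),      # (1)  b * (0)
    ("+",      1,   1),      # (2)  (1) + (1)
    (":=",     "a", 2),      # (3)  a := (2)
]

# Indirect triples: the triples stay where they are; a separate list of
# pointers (indices) gives the execution order, so an optimizer reorders
# only this list.
instruction_list = [0, 1, 2, 3]

Moving the quad that computes t2 leaves every later use of t2 valid, whereas moving a plain triple would force all positional references (the integers above) to be renumbered; indirect triples avoid this by reordering only the pointer list, as the Indirect Triples slide below explains.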
Triples (contd.)
▪ A ternary operation like x[i] := y requires two entries in the triple structure, as shown below, while x := y[i] is naturally represented as two operations.

Indirect Triples
▪ Another implementation of three-address code is to list pointers to triples, rather than listing the triples themselves. This implementation is called indirect triples.
▪ For example, let us use an array statement to list pointers to triples in the desired order. Then the triples shown above might be represented as follows:

NEED FOR SEMANTIC ANALYSIS
▪ Semantic analysis is a phase of a compiler that adds semantic information to the parse tree and performs certain checks based on this information.
▪ It logically follows the parsing phase, in which the parse tree is generated, and logically precedes the code generation phase, in which (intermediate/target) code is generated. (In a compiler implementation, it may be possible to fold different phases into one pass.)
▪ Typical examples of semantic information that is added and checked are typing information (type checking) and the binding of variables and function names to their definitions (object binding).
▪ Sometimes some early code optimization is also done in this phase. For this phase the compiler usually maintains symbol tables in which it stores what each symbol (variable names, function names, etc.) refers to.

Semantic Errors
We have mentioned some of the semantic errors that the semantic analyzer is expected to recognize (a short sketch of such checks appears after the Attribute Grammar slide below):
▪ Type mismatch
▪ Undeclared variable
▪ Reserved identifier misuse
▪ Multiple declaration of a variable in a scope
▪ Accessing an out-of-scope variable
▪ Actual and formal parameter mismatch

Syntax Directed Definitions
▪ Syntax Directed Definitions are a generalization of context-free grammars in which:
1. Grammar symbols have an associated set of attributes;
2. Productions are associated with semantic rules for computing the values of attributes.
▪ Such a formalism generates annotated parse trees, where each node of the tree is a record with a field for each attribute (e.g., X.a indicates the attribute a of the grammar symbol X).
▪ The value of an attribute of a grammar symbol at a given parse-tree node is defined by a semantic rule associated with the production used at that node.

ATTRIBUTE GRAMMAR
▪ Attributes are properties associated with grammar symbols. Attributes can be numbers, strings, memory locations, data types, etc.
▪ An attribute grammar is a special form of context-free grammar where some additional information (attributes) is appended to one or more of its non-terminals in order to provide context-sensitive information.
▪ An attribute grammar is a medium to provide semantics to the context-free grammar, and it can help specify the syntax and semantics of a programming language. An attribute grammar (when viewed as a parse tree) can pass values or information among the nodes of a tree.
▪ The right part of the CFG contains the semantic rules that specify how the grammar should be interpreted. For example, in a production such as E → E + T { E.val = E.val + T.val }, the values of the non-terminals E and T are added together and the result is copied to the non-terminal E.
▪ Semantic attributes may be assigned their values from their domain at the time of parsing and evaluated at the time of assignment or conditions.
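Returning to the checks listed under Semantic Errors above, here is a toy sketch of how a semantic analyzer might use its symbol table to detect some of them (the table layout, helper names and error messages are illustrative, not from the slides):

symbol_table = {}                      # name -> declared type

def declare(name, typ):
    if name in symbol_table:
        print(f"semantic error: multiple declaration of '{name}' in this scope")
    else:
        symbol_table[name] = typ

def check_assignment(name, value_type):
    if name not in symbol_table:
        print(f"semantic error: undeclared variable '{name}'")
    elif symbol_table[name] != value_type:
        print(f"semantic error: type mismatch, '{name}' is {symbol_table[name]} "
              f"but the assigned value is {value_type}")

declare("i", "int")
declare("i", "int")                    # multiple declaration in a scope
check_assignment("i", "float")         # type mismatch
check_assignment("j", "int")           # undeclared variable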
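The value-passing described on the Attribute Grammar slide can be sketched directly in code. The following minimal Python fragment (the function names are illustrative) evaluates the synthesized attribute val for the expression grammar used in the next slides, on the input 4 * 5 + 6:

def F_digit(lexval):            # F -> digit    { F.val = digit.lexval }
    return lexval

def T_mul(T1_val, F_val):       # T -> T1 * F   { T.val = T1.val * F.val }
    return T1_val * F_val

def E_add(E1_val, T_val):       # E -> E1 + T   { E.val = E1.val + T.val }
    return E1_val + T_val

# Bottom-up evaluation of the annotated parse tree for 4 * 5 + 6:
T_val = T_mul(F_digit(4), F_digit(5))   # T.val = 20
E_val = E_add(T_val, F_digit(6))        # E.val = 26
print(E_val)                            # 26 is the value at the root S -> E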
ATTRIBUTE TYPES
▪ Based on the way attributes get their values, they can be broadly divided into two categories: synthesized attributes and inherited attributes.

Synthesized Attributes
▪ These are the attributes which get their values from their children nodes, i.e., the value of a synthesized attribute at a node is computed from the values of attributes at its children in the parse tree.
▪ To illustrate, assume the following production and rule: S → ABC { S.a = A.a, B.a, C.a }
▪ If S is taking values from its child nodes (A, B, C), then it is said to be a synthesized attribute, as the values of A, B and C are synthesized into S.
▪ Computation of synthesized attributes:
1. Write the SDD using appropriate semantic rules for each production in the given grammar.
2. The annotated parse tree is generated and attribute values are computed in a bottom-up manner.
3. The value obtained at the root node is the final output.

Synthesized Attributes: Example
▪ Consider the following grammar:
S → E
E → E1 + T
E → T
T → T1 * F
T → F
F → digit
▪ The SDD for the above grammar can be written as follows:

Synthesized Attributes: Example
▪ Let us assume the input string 4 * 5 + 6 for computing synthesized attributes. The annotated parse tree for the input string is:

ANNOTATED PARSE TREE
▪ The parse tree containing the values of attributes at each node for a given input string is called an annotated or decorated parse tree.

Inherited Attributes
▪ These are the attributes which inherit their values from their parent or sibling nodes, i.e., the values of inherited attributes are computed from the values of parent or sibling nodes.
▪ Example: A → BCD { C.in = A.in, C.type = B.type }
▪ B can get values from A, C and D. C can take values from A, B, and D. Likewise, D can take values from A, B, and C.
▪ Computation of inherited attributes:
1. Construct the SDD using semantic actions.
2. The annotated parse tree is generated and attribute values are computed in a top-down manner.

Inherited Attributes: Example
▪ Consider the following grammar:
D → T L
T → int
T → float
T → double
L → L1, id
L → id
▪ The SDD for the above grammar can be written as follows:

Inherited Attributes: Example
▪ Let us assume the input string int a, c for computing inherited attributes. The annotated parse tree for the input string is:
▪ The value of the L nodes is obtained from T.type (a sibling), which is basically the lexical value obtained as int, float or double.
▪ The L node then gives the type of the identifiers a and c. The computation of the type is done in a top-down manner, i.e. by a preorder traversal (a short sketch of this computation appears after the dependency-graph discussion below).
▪ Using the function Enter_type, the type of the identifiers a and c is inserted in the symbol table at the corresponding id.entry.

Implementing Syntax Directed Definitions
1. Dependency Graphs
▪ Implementing a Syntax Directed Definition consists primarily in finding an order for the evaluation of attributes.
▪ Each attribute value must be available when a computation is performed.
▪ Dependency graphs are the most general technique used to evaluate syntax directed definitions with both synthesized and inherited attributes.
▪ The annotated parse tree shows the values of attributes; the dependency graph helps to determine how those values are computed.

Implementing Syntax Directed Definitions
▪ The interdependencies among the attributes of the various nodes of a parse tree can be depicted by a directed graph called a dependency graph.
▪ There is a node for each attribute;
▪ If attribute b depends on an attribute c, there is a link from the node for c to the node for b (b ← c).
▪ Dependency rule: if an attribute b depends on an attribute c, then we need to evaluate the semantic rule for c first and then the semantic rule for b.
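The top-down type propagation for the declaration int a, c can be sketched as follows (the symbol table and the exact signature of Enter_type are assumptions; the slides only name the function):

symbol_table = {}

def Enter_type(id_entry, typ):            # insert the type at the id's entry
    symbol_table[id_entry] = typ

def L(inh_type, ids):
    # L -> L1 , id   { L1.in = L.in ; Enter_type(id.entry, L.in) }
    # L -> id        { Enter_type(id.entry, L.in) }
    *rest, last = ids
    if rest:
        L(inh_type, rest)                 # inherited attribute passed down the list
    Enter_type(last, inh_type)

def D(T_type, ids):                       # D -> T L   { L.in = T.type }
    L(T_type, ids)

D("int", ["a", "c"])                      # input: int a, c
print(symbol_table)                       # {'a': 'int', 'c': 'int'}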
Implementing Syntax Directed Definitions
Dependency graph for declaration statements.

Evaluation Order
▪ A dependency graph characterizes the possible orders in which we can evaluate the attributes at the various nodes of a parse tree.
▪ If there is an edge from node M to node N, then the attribute corresponding to M must be evaluated before the attribute corresponding to N.
▪ Thus the allowable orders of evaluation are N1, N2, ..., Nk such that if there is an edge from Ni to Nj, then i < j; that is, any topological order of the dependency graph.
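Here is a small sketch of how a dependency graph constrains the evaluation order, using attribute nodes roughly corresponding to the declaration int a, c above (the node names and the adjacency-list encoding are illustrative):

# Edge c -> b means "b depends on c", so c must be evaluated before b.
edges = {
    "T.type":  ["L.in"],
    "L.in":    ["L1.in", "c.entry"],
    "L1.in":   ["a.entry"],
    "a.entry": [],
    "c.entry": [],
}

def evaluation_order(edges):
    """Return the nodes in an allowable order: for every edge Ni -> Nj, i < j."""
    indegree = {n: 0 for n in edges}
    for succs in edges.values():
        for s in succs:
            indegree[s] += 1
    ready = [n for n, d in indegree.items() if d == 0]
    order = []
    while ready:                          # standard topological sort
        n = ready.pop()
        order.append(n)
        for s in edges[n]:
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)
    return order

# One allowable order: ['T.type', 'L.in', 'c.entry', 'L1.in', 'a.entry']
print(evaluation_order(edges))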