Compiler Design (21CS3001) Unit 3 - PDF
Document Details
Uploaded by Deleted User
SRM University
Atif Mahmood
Tags
Summary
These lecture notes cover Compiler Design (21CS3001) Unit 3. Topics include formal grammar, various categories of grammars, ambiguity in context-free grammars (CFGs), and fundamental parsing methods.
Full Transcript
COMPILER DESIGN (21CS3001) UNIT-3 Atif Mahmood Asst. Professor-CSED...
COMPILER DESIGN (21CS3001) UNIT-3 Atif Mahmood Asst. Professor-CSED SRM University, Delhi-NCR-Sonipat Disclaimer: The contents in the slides are assembled and modified by the instructor, purely for academic & teaching purposes with great acknowledgement of many others who made their course material freely available over internet. Instructor do not claims about the entire contents as his own. Todays Agenda Formal Grammar and their application to Syntax Analysis Formal Grammar Formal Grammar Formal Grammar Formal Grammar Formal Grammar Formal Grammar Formal Grammar Formal Grammar Formal Grammar Formal Grammar Categories of Grammar Formal Grammar Formal Grammar Formal Grammar Formal Grammar Formal Grammar Formal Grammar Formal Grammar Formal Grammar Formal Grammar Formal Grammar Formal Grammar Ambiguity in CFG Formal Grammar Todays Agenda Ambiguity & Capabilities of CFG Left Recursion and Left Factoring Ambiguity & Capabilities of CFG Ambiguity & Capabilities of CFG Ambiguity & Capabilities of CFG Ambiguity & Capabilities of CFG Ambiguity & Capabilities of CFG Ambiguity & Capabilities of CFG Ambiguity & Capabilities of CFG Ambiguity & Capabilities of CFG How to resolve ? Ambiguity & Capabilities of CFG Ambiguity & Capabilities of CFG Ambiguity & Capabilities of CFG Ambiguity & Capabilities of CFG Ambiguity & Capabilities of CFG Ambiguity & Capabilities of CFG Ambiguity & Capabilities of CFG Ambiguity & Capabilities of CFG Ambiguity & Capabilities of CFG Ambiguity & Capabilities of CFG Capabilities of CFG: ▪ Context free grammar is useful to describe most of the programming languages. ▪ If the grammar is properly designed then an efficient parser can be constructed automatically. ▪ Using the features of associatively and precedence information, grammars for expressions can be constructed. ▪ Context free grammar is capable of describing nested structures like : balanced parenthesis, matching begin-end, corresponding if- then-else’s and so on. Left Recursion and Left Factoring Left Recursion and Left Factoring Left Recursion and Left Factoring Left Recursion and Left Factoring Left Recursion and Left Factoring Left Recursion and Left Factoring Left Recursion and Left Factoring Left Recursion and Left Factoring Left Recursion and Left Factoring Left Recursion and Left Factoring Left Recursion and Left Factoring Left Recursion and Left Factoring Left Recursion and Left Factoring Left Recursion and Left Factoring Left Recursion and Left Factoring Left Recursion and Left Factoring Left Recursion and Left Factoring Left Recursion and Left Factoring Left Recursion and Left Factoring ▪ This is the grammar which will not suffer from non-determinism. ▪ But will it be accepted by top-down parsers? No Left Recursion and Left Factoring Non-Deterministic Deterministic Left Recursion and Left Factoring ▪ Even if we remove the Non-Determinism, that doesn’t guarantee unambiguity. ▪ So lets understand, Why this intermediate form is ambiguous? Left Recursion and Left Factoring Left Recursion and Left Factoring Some more examples on left factoring Left Recursion and Left Factoring Left Recursion and Left Factoring Todays Agenda Basic Parsing Techniques Top-Down Parsing & Bottom-Up Parsing Basic Parsing Techniques ▪ A syntax analyzer or parser takes the input from a lexical analyzer in the form of token streams. ▪ The parser analyzes the source code (token stream) against the production rules to detect any errors in the code. ▪ The output of this phase is a parse tree. Working principle Syntax Analyzer Basic Parsing Techniques Basic Parsing Techniques In a parse tree: ▪ 1. All leaf nodes are terminals. ▪ 2. All interior nodes are non-terminals. ▪ 3. In-order traversal gives original input string. Limitations of Syntax Analyzers ▪ It cannot determine if a token is valid. ▪ It cannot determine if a token is declared before it is being used. ▪ It cannot determine if a token is initialized before it is being used. ▪ It cannot determine if an operation performed on a token type is valid or not. These tasks are accomplished by the semantic analyzer. Basic Parsing Techniques Basic Parsing Techniques Basic Parsing Techniques Top-Down Parser: ▪ The top-down parser is the parser that generates parse for the given input string with the help of grammar productions by expanding the non-terminals i.e. it starts from the start symbol and ends on the terminals. ▪ It uses left most derivation. Bottom-Up Parser: ▪ Bottom-up Parser is the parser that generates the parse tree for the given input string with the help of grammar productions by compressing the non-terminals i.e. it starts from non-terminals and ends on the start symbol. ▪ It uses the reverse of the rightmost derivation. Basic Parsing Techniques Problems with Top-down Parsing There are certain problems in top-down parsing. In order to implement the parsing, we need to eliminate these problems. 1. Backtracking ▪ Backtracking is a technique in which for expansion of non-terminal symbol we choose one alternative and if some mismatch occurs then we try another alternative if any. ▪ If for a non-terminal there are multiple production rules beginning with the same input symbol then to get the correct derivation we need to try all these alternatives. ▪ In backtracking, we need to move some levels upward in order to check the possibilities. This increase a lot of overhead in the implementation of parsing. And hence it becomes necessary to eliminate the backtracking by modifying the grammar. Basic Parsing Techniques Backtracking required After Backtracking Basic Parsing Techniques 2. Left Recursion ▪ A left recursion grammar is a grammar which is as given below. A→ Aα ▪ If left recursion is present in the grammar then it creates serious problems. Because of left recursion the top down parser can enter in infinite loop. ▪ To eliminate left recursion we need to modify the grammar. Let G be a context free grammar having a production rule with left recursion. A→ Aα A→ β Then we eliminate left recursion by re-writing the production rule as: A→ β A’ A’→ α A’ A’→ ℇ Basic Parsing Techniques 3. Left Factoring ▪ If the grammar is left factored then it becomes suitable for the use. ▪ Basically left factoring is used when it is not clear that which of the two alternatives is used to expand the non-terminal. ▪ By left factoring we may be able to re-write the production in which the decision can be deferred until enough of the input is seen to make the right choice. Basic Parsing Techniques 4. Ambiguity ▪ Ambiguous grammar is not desirable in top-down parsing. ▪ For removing the ambiguity we will apply one rule: If grammar has left associativity operator (such as: +, -, *, /) then induce the left recursion, and if the grammar has right associative operator (exponential operator) then induce the right recursion. Basic Parsing Techniques Basic Parsing Techniques Basic Parsing Techniques Basic Parsing Techniques Basic Parsing Techniques Recursive Descent Parser Basic Parsing Techniques LL(1) Parser Basic Parsing Techniques Basic Parsing Techniques LL(1) Parser ▪. ▪. ▪. ▪. ▪. Basic Parsing Techniques Rules For Calculating First Function ▪ Rule-1: For a production rule X → ∈, First(X) = { ∈ } ▪ Rule-2: For any terminal symbol ‘a’, First(a) = { a } ▪ Rule-3: For a production rule X → Y1Y2Y3, Calculating First(X) If ∈ ∉ First(Y1), then First(X) = First(Y1) If ∈ ∈ First(Y1), then First(X) = { First(Y1) – ∈ } ∪ First(Y2Y3) Basic Parsing Techniques Calculating First(Y2Y3) If ∈ ∉ First(Y2), then First(Y2Y3) = First(Y2) If ∈ ∈ First(Y2), then First(Y2Y3) = { First(Y2) – ∈ } ∪ First(Y3) Similarly, we can make expansion for any production rule X → Y1Y2Y3…..Yn. Rules For Calculating Follow Function ▪ Rule-1: For the start symbol S, place $ in Follow(S). ▪ Rule-2: For any production rule A → αB, Follow(B) = Follow(A) ▪ Rule-3: For any production rule A → αBβ If ∈ ∉ First(β), then Follow(B) = First(β) If ∈ ∈ First(β), then Follow(B) = { First(β) – ∈ } ∪ Follow(A) Basic Parsing Techniques Important Notes: Note-1: ∈ may appear in the first function of a non-terminal. ∈ will never appear in the follow function of a non-terminal. Note-2: Before calculating the first and follow functions, eliminate Left Recursion from the grammar, if present. Note-3: We calculate the follow function of a non-terminal by looking where it is present on the RHS of a production rule. Basic Parsing Techniques Example 1. Basic Parsing Techniques Example 2. Basic Parsing Techniques Example 3. Basic Parsing Techniques Example 4. Basic Parsing Techniques Example 5. Basic Parsing Techniques Example 6. Basic Parsing Techniques Example 7. Basic Parsing Techniques Example 8. Basic Parsing Techniques Example-9: Consider the Grammar: E → TE’ E’→ +TE' | ε T → FT’ T’→ *FT' | ε F → id | (E) Calculate FIRST() and FOLLOW() Solution: Basic Parsing Techniques FIRST & FOLLOW Parsing Table Basic Parsing Techniques To parse the given grammar using LL(1) parsing table consider a string: id + id * id Basic Parsing Techniques Rules to remember: ▪ All the ε-productions are placed under FOLLOW sets ▪ Remaining productions are placed under the FIRSTs Basic Parsing Techniques Example-10: Consider the Grammar: S → iEtSS’ | a S’ → eS |ε E→b Calculate FIRST() and FOLLOW() Basic Parsing Techniques FIRST & FOLLOW Parsing Table Basic Parsing Techniques No ambiguous grammar or left recursive grammar can be LL(1). Thus , the given grammar is not LL(1). Basic Parsing Techniques Draw the LL(1) parsing table for the given grammar Basic Parsing Techniques Because of X→ Xb and X → ϵ , going in same block, given grammar is not LL(1). Difference between Top Down Parsing and Bottom Up Parsing. Todays Agenda Bottom-Up Parsing ✓ Shift-Reduce Parser ✓ LR Parser Bottom-up Parsing ▪ Bottom-up parsing is more general than top-down parsing. ▪ Bottom-up parsers handle a large class of grammars. ▪ It is the preferred method in practice. ▪ It is also called LR parsing; o L means that tokens are read left to right and, o R means that the parser constructs a rightmost derivation. ▪ LR parsers do not need left-factored grammars. ▪ LR parsers can handle left-recursive grammars. ▪ LR parsing reduces a string to the start symbol by inverting productions. Bottom-up Parsing Consider the grammar S → aABe A → Abc | b B→ d The sentence abbcde can be reduced to S: abbcde aAbcde aAde aABe S Bottom-up Parsing These reductions, in fact, trace out the following right-most derivation in reverse: S → aABe → aAde → aAbcde → abbcde Bottom-up Parsing Handle ▪ The handle is the substring that matches the body of a production whose reduction represents one step in right-most derivation of input stream. Example: S → aABe A → Abc | b B→d Steps: abbcde : γ = abbcde , A→b; Handle = b aAbcde : γ = aAbcde , A→Abc; Handle = Abc aAde : γ = aAde , B→d; Handle = d aABe : γ = aABe, S→ aABe; Handle = aABe S Bottom-up Parsing Handle Pruning ▪ Removing the children of the left-hand side non-terminal from the parse tree is called Handle Pruning. ▪ A rightmost derivation in reverse can be obtained by handle pruning. Example Bottom-up Parsing Example Bottom-up Parsing Notation ▪ Split input into two substrings ✓ Right substring (a string of terminals) is as yet unexamined by parser ✓ Left substring has terminals and non-terminals ▪ The dividing point is marked by a ✓ The is not part of the string ▪ Initially, all input is unexamined: x1x2... xn Bottom-up Parsing Shift-Reduce Parsing Bottom-up parsing uses only two kinds of actions: Shift and Reduce ▪ Shift: Pushes a terminal on the stack. ✓ Move one place to the right ✓ Shifts a terminal to the left string E + ( int ) E + (int ) ▪ Reduce: Pops 0 or more symbols off of the stack. And pushes a non-terminal on the stack (production LHS) ✓ Apply an inverse production at the right end of the left string ✓ If E → E + ( E ) is a production, then E + (E + ( E ) ) E +(E ) Bottom-up Parsing Shift–Reduce Parsing Algorithm push $ onto stack sym ←nextToken() repeat until (sym == $ and the stack contains exactly Goal on top of $) if a handle for A → β on top of stack pop |β| symbols off the stack push A onto the stack else if (sym ≠ $ ) push sym onto stack sym← nextToken() else report error and halt Bottom-up Parsing Shift-Reduce Example Bottom-up Parsing When to Shift or Reduce Idea: ▪ Use a finite automaton (FA) to decide when to shift or reduce ✓ The input is the stack ✓ The language consists of terminals and non-terminals ▪ We run the FA on the stack and we examine the resulting state X and the token tok after ✓ If X has a transition labeled tok then shift ✓ If X is labeled with “A → b on tok” then reduce Bottom-up Parsing Anatomy of an LR Parser Bottom-up Parsing LR Parser Algorithm Bottom-up Parsing LR(0) Parser ▪ Left-to-right scanning, Right-most derivation, “zero” look-ahead characters ▪ Too weak to handle most language grammars. ▪ But will help us lay the groundwork for more sophisticated parsers Bottom-up Parsing LR(0) States ▪ A state is a set of items keeping track of progress on possible upcoming reductions ▪ An LR(0) item is a production from the language with a separator “.” somewhere in the RHS of the production ▪ Stuff before “.” is already on stack (beginnings of possible γ’s to be reduced) ▪ Stuff after “.” : what we might see next ▪ The prefixes represented by state itself Bottom-up Parsing Start States & Closure S →( L ) | id L →S | L , S Constructing a DFA to read stack: ▪ First step: augment grammar with prod's S’ →S $ ▪ Start state of DFA: empty stack = S’ →. S $ ▪ Closure of a state adds items for all productions whose LHS occurs in an item in the state, just after “.” ✓ Set of possible productions to be reduced next ✓ Added items have the “.” located at the beginning: no symbols for these items on the stack yet. Bottom-up Parsing Applying Terminal Symbols S →( L ) | id L →S | L , S In new state, include all items that have appropriate input symbol just after dot, advance dot in those items, and take closure. Bottom-up Parsing Applying Non-Terminal Symbols S →( L ) | id L →S | L , S ▪ Non-terminals on stack treated just like terminals (except added by reductions) Bottom-up Parsing Applying Reduce Action S →( L ) | id L →S | L , S ▪ Pop RHS off stack, replace with LHS X (X→γ) Bottom-up Parsing Full DFA S →( L ) | id L →S | L , S ▪ reduce-only state: reduce ▪ if shift transition for look-ahead: shift otherwise: syntax error ▪ current state: push stack through DFA Thank You