lec01-Lexical Anaysis (1).pdf

SECTION 1.1 LEXICAL ANALYSIS: INTRODUCTION

LEXICAL ANALYZER
• The lexical analyzer reads the source program character by character to produce tokens.
• Normally a lexical analyzer does not return the whole list of tokens in one shot; it returns a token whenever the parser asks it for one.
[Figure: source program → Lexical Analyzer → token → Parser; the parser sends "get next token" back to the lexical analyzer, and both use the Symbol Table.]

ROLES OF THE LEXICAL ANALYZER
The lexical analyzer performs the following tasks:
• Identifies tokens and helps enter them into the symbol table
• Removes white space and comments from the source program
• Correlates error messages with the source program
• Expands macros if they are found in the source program
• Reads input characters from the source program

TOKENS, LEXEMES AND PATTERNS
• Token: a sequence of characters that can be treated as a single logical entity. Typical tokens are: 1) identifiers, 2) keywords, 3) operators, 4) special symbols, 5) constants.
• Lexeme: a sequence of characters in the source program that is matched by the pattern for a token.
• Pattern: the rule that describes the set of strings in the input for which the same token is produced as output.

Token     Lexeme (element of a kind)   Pattern
ID        x, y, n_0                    letter followed by letters and digits
NUM       -123, 1.456e-5               any numeric constant
IF        if                           if
LPAREN    (                            (
LITERAL   "Hello"                      any string of characters (except ") between " and "

• Regular expressions are widely used to specify patterns.
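To make the "get next token" interface concrete, here is a minimal hand-written scanner in C. It is a sketch under stated assumptions, not the method of any particular compiler: the names Scanner, Token, TokenKind and get_next_token are hypothetical, the keyword list is a small sample, identifiers accept an underscore (so n_0 from the table is an identifier), numbers are plain digit strings (no sign, fraction or exponent), and every other character is returned as a one-character operator.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Token classes used in this sketch (hypothetical; a real compiler defines its own set). */
typedef enum { TK_KEYWORD, TK_ID, TK_NUM, TK_OP, TK_EOF } TokenKind;

typedef struct {
    TokenKind kind;
    char lexeme[64];          /* matched characters; longer lexemes are cut at 63 */
} Token;

/* Scanner state: the source text and the index of the next unread character. */
typedef struct {
    const char *src;
    size_t pos;
} Scanner;

static const char *keywords[] = { "int", "if", "else", "while", "return" };

static int is_keyword(const char *s) {
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp(s, keywords[i]) == 0) return 1;
    return 0;
}

/* Returns the next token; the parser calls this each time it needs one. */
Token get_next_token(Scanner *sc) {
    Token t = { TK_EOF, "" };
    const char *s = sc->src;
    size_t n = 0;

    /* Skip non-tokens: white space and // comments. */
    for (;;) {
        while (isspace((unsigned char)s[sc->pos])) sc->pos++;
        if (s[sc->pos] == '/' && s[sc->pos + 1] == '/') {
            while (s[sc->pos] != '\0' && s[sc->pos] != '\n') sc->pos++;
        } else {
            break;
        }
    }

    unsigned char c = (unsigned char)s[sc->pos];
    if (c == '\0') return t;                          /* end of input */

    if (isalpha(c) || c == '_') {
        /* Identifier pattern: a letter followed by letters and digits. */
        while ((isalnum((unsigned char)s[sc->pos]) || s[sc->pos] == '_') && n < sizeof t.lexeme - 1)
            t.lexeme[n++] = s[sc->pos++];
        t.lexeme[n] = '\0';
        t.kind = is_keyword(t.lexeme) ? TK_KEYWORD : TK_ID;
    } else if (isdigit(c)) {
        /* Simplified numeric constant: a string of digits. */
        while (isdigit((unsigned char)s[sc->pos]) && n < sizeof t.lexeme - 1)
            t.lexeme[n++] = s[sc->pos++];
        t.lexeme[n] = '\0';
        t.kind = TK_NUM;
    } else {
        /* Any other character becomes a one-character operator or symbol. */
        t.lexeme[0] = s[sc->pos++];
        t.lexeme[1] = '\0';
        t.kind = TK_OP;
    }
    return t;
}
```

Each call skips white space and comments (non-tokens) and returns exactly one token, so the parser pulls tokens one at a time instead of receiving the whole list at once.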
EXAMPLE
Source fragment:
    #include
    int maximum(int x, int y){ // This will compare 2 numbers

Tokens generated:
Lexeme    Token
int       Keyword
maximum   Identifier
(         Operator
int       Keyword
x         Identifier
,         Operator
int       Keyword
y         Identifier
)         Operator
{         Operator

Non-tokens:
Type                      Examples
Comment                   // This will compare 2 numbers
Pre-processor directive   #include
Whitespace                \n \b \t

TERMINOLOGY OF LANGUAGES
• Alphabet: a finite set of symbols (e.g. the ASCII characters).
• String: a finite sequence of symbols over an alphabet.
  - "Sentence" and "word" are also used for "string".
  - ε is the empty string.
  - |s| is the length of string s.
• Language: a set of strings over some fixed alphabet.
  - ∅, the empty set, is a language.
  - {ε}, the set containing only the empty string, is a language.
  - The set of well-formed C programs is a language.
  - The set of all possible identifiers is a language.
• Operators on strings:
  - Concatenation: xy represents the concatenation of strings x and y.

OPERATIONS ON LANGUAGES
• Concatenation: $L_1 L_2 = \{\, s_1 s_2 \mid s_1 \in L_1 \text{ and } s_2 \in L_2 \,\}$
• Union: $L_1 \cup L_2 = \{\, s \mid s \in L_1 \text{ or } s \in L_2 \,\}$
• Exponentiation: $L^0 = \{\varepsilon\},\; L^1 = L,\; L^2 = LL$, and in general $L^i = L^{i-1} L$
• Kleene closure: $L^* = \bigcup_{i=0}^{\infty} L^i$
• Positive closure: $L^+ = \bigcup_{i=1}^{\infty} L^i$

EXAMPLE
$L_1 = \{a, b, c, d\}$, $L_2 = \{1, 2\}$
• $L_1 L_2 = \{a1, a2, b1, b2, c1, c2, d1, d2\}$
• $L_1 \cup L_2 = \{a, b, c, d, 1, 2\}$
• $L_1^3$ = all strings of length three over {a, b, c, d}
• $L_1^*$ = all strings over {a, b, c, d}, including the empty string
• $L_1^+$ = all strings over {a, b, c, d} except the empty string

REGULAR EXPRESSIONS
• We use regular expressions to describe the tokens of a programming language.
• A regular expression is built up from simpler regular expressions using defining rules.
• Each regular expression r denotes a language L(r).
• A language denoted by a regular expression is called a regular set.

REGULAR EXPRESSIONS (RULES)
Regular expressions over an alphabet Σ:

Reg. expr.     Language it denotes
ε              {ε}
a ∈ Σ          {a}
(r1) | (r2)    L(r1) ∪ L(r2)
(r1)(r2)       L(r1) L(r2)
(r)*           (L(r))*
(r)            L(r)

• (r)+ = (r)(r)*
• (r)? = (r) | ε
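The operations on languages above can be checked directly on small finite languages such as L1 and L2 of the example. The following C sketch assumes a fixed-capacity representation (the Language type and the MAX_WORDS and MAX_LEN limits are hypothetical choices) and implements concatenation, union and exponentiation only; L* and L+ are infinite sets, so a program can only build finite unions of powers.

```c
#include <string.h>

#define MAX_WORDS 256   /* capacity of one finite language in this sketch */
#define MAX_LEN   16    /* longest string, including the terminating '\0' */

/* A finite language: a set of distinct strings. */
typedef struct {
    int size;
    char words[MAX_WORDS][MAX_LEN];
} Language;

/* Is w a member of L? */
static int contains(const Language *L, const char *w) {
    for (int i = 0; i < L->size; i++)
        if (strcmp(L->words[i], w) == 0) return 1;
    return 0;
}

/* Adds w to L unless it is already present (sets have no duplicates);
   strings that do not fit the fixed capacity are silently skipped. */
static void add(Language *L, const char *w) {
    size_t len = strlen(w);
    if (len >= MAX_LEN || L->size >= MAX_WORDS || contains(L, w)) return;
    memcpy(L->words[L->size], w, len + 1);
    L->size++;
}

/* Concatenation: L1 L2 = { s1 s2 | s1 in L1 and s2 in L2 } */
Language concat(const Language *L1, const Language *L2) {
    Language R = { 0 };
    char buf[2 * MAX_LEN];
    for (int i = 0; i < L1->size; i++)
        for (int j = 0; j < L2->size; j++) {
            size_t n1 = strlen(L1->words[i]);
            memcpy(buf, L1->words[i], n1);
            strcpy(buf + n1, L2->words[j]);   /* buf now holds s1 s2 */
            add(&R, buf);
        }
    return R;
}

/* Union: L1 ∪ L2 = { s | s in L1 or s in L2 } */
Language lang_union(const Language *L1, const Language *L2) {
    Language R = { 0 };
    for (int i = 0; i < L1->size; i++) add(&R, L1->words[i]);
    for (int j = 0; j < L2->size; j++) add(&R, L2->words[j]);
    return R;
}

/* Exponentiation: L^0 = {ε}, L^i = L^(i-1) L; ε is stored as "". */
Language power(const Language *L, int n) {
    Language R = { 0 };
    add(&R, "");
    for (int k = 0; k < n; k++) R = concat(&R, L);
    return R;
}
```

With L1 = {a, b, c, d} and L2 = {1, 2}, concat yields the 8 strings a1 ... d2, lang_union yields 6 strings, and power(&L1, 3) yields the 4^3 = 64 strings of length three.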
• Parentheses may be omitted by using precedence rules:
  - * has the highest precedence
  - concatenation is next
  - | has the lowest precedence
  So ab*|c means (a(b)*)|(c).
• Examples, with Σ = {0, 1}:
    0|1 ⇒ {0, 1}
    (0|1)(0|1) ⇒ {00, 01, 10, 11}
    0* ⇒ {ε, 0, 00, 000, 0000, ...}
    (0|1)* ⇒ all strings of 0s and 1s, including the empty string

REGULAR DEFINITIONS
• Writing a regular expression for some languages can be difficult, because the expression can become quite complex. In those cases we may use regular definitions.
• We can give names to regular expressions and use these names as symbols in defining other regular expressions.
• A regular definition is a sequence of definitions of the form
    d1 → r1
    d2 → r2
    ...
    dn → rn
  where each di is a distinct name and each ri is a regular expression over the symbols in Σ ∪ {d1, d2, ..., di-1}, i.e. over basic symbols and previously defined names.
• Example: identifiers in Pascal
    letter → A | B | ... | Z | a | b | ... | z
    digit → 0 | 1 | ... | 9
    id → letter (letter | digit)*
  If we tried to write the regular expression for identifiers without regular definitions, it would be complex:
    (A|...|Z|a|...|z) ( (A|...|Z|a|...|z) | (0|...|9) )*
• Example: unsigned numbers in Pascal
    digit → 0 | 1 | ... | 9
    digits → digit+
    opt-fraction → ( . digits )?
    opt-exponent → ( E (+|-)? digits )?
    unsigned-num → digits opt-fraction opt-exponent

NOTATIONAL SHORTHAND
• The following shorthands are often used:
    r+ = rr*
    r? = r | ε
    [a-z] = a | b | c | ... | z
• Examples:
    digit → [0-9]
    digits → digit+
    optional_fraction → (. digits)?
    optional_exponent → ( E (+ | -)? digit+ )?
    num → digits optional_fraction optional_exponent

RECOGNITION OF TOKENS
Example:
Grammar:
    stmt → if expr then stmt
         | if expr then stmt else stmt
         | ε
    expr → term relop term
         | term
    term → id
         | num

Regular definitions:
    if    → if
    then  → then
    else  → else
    relop → < | <= | = | <> | > | >=
    id    → letter (letter | digit)*
    num   → digits optional_fraction optional_exponent

Assumptions:
    delim → blank | tab | newline

TRANSITION DIAGRAMS
Transition diagram for relop (0 is the start state; a state marked * means the last character read is not part of the lexeme, so the input is retracted by one character):
    0 --<--> 1 --=--> 2          return(relop, LE)
             1 -->--> 3          return(relop, NE)
             1 --other--> 4 *    return(relop, LT)
    0 --=--> 5                   return(relop, EQ)
    0 -->--> 6 --=--> 7          return(relop, GE)
             6 --other--> 8 *    return(relop, GT)

Transition diagram for id → letter (letter | digit)*:
    9 --letter--> 10
    10 --letter or digit--> 10
    10 --other--> 11 *           return(gettoken(), install_id())

TRANSITION DIAGRAMS: CODE
    token nexttoken()
    {
        while (1) {
            switch (state) {
            case 0:
                c = nextchar();
                if (c == blank || c == tab || c == newline) {
                    state = 0;
                    lexeme_beginning++;
                }
                else if (c == '<') state = 1;
                else if (c == '=') state = 5;
                else if (c == '>') state = 6;
                else state = fail();
                break;
            case 1:
                …
            case 9:
                c = nextchar();
                if (isletter(c)) state = 10;
                else state = fail();
                break;
            case 10:
                c = nextchar();
                if (isletter(c)) state = 10;
                else if (isdigit(c)) state = 10;
                else state = 11;
                break;
            …
            }
        }
    }

    int fail()    /* decides the next start state */
    {
        forward = lexeme_beginning;    /* retract to the start of the lexeme */
        switch (start) {
        case 0:  start = 9;  break;
        case 9:  start = 12; break;
        case 12: start = 20; break;
        case 20: start = 25; break;
        case 25: recover();  break;
        default: break;                /* compiler error */
        }
        return start;
    }

THE LEX AND FLEX SCANNER GENERATORS
• Lex and its newer cousin flex are scanner generators.
• They systematically translate regular definitions into C source code for efficient scanning.
• The generated code is easy to integrate into C applications.

CREATING A LEXICAL ANALYZER WITH LEX AND FLEX
    lex source program lex.l  →  lex or flex  →  lex.yy.c
    lex.yy.c                  →  C compiler   →  a.out
    input stream              →  a.out        →  sequence of tokens

LEX SPECIFICATION
• A lex specification consists of three parts:
    regular definitions, C declarations in %{ %}
    %%
    translation rules
    %%
    user-defined auxiliary procedures
• The translation rules are of the form:
    p1    { action1 }
    p2    { action2 }
    …
    pn    { actionn }

REGULAR EXPRESSIONS IN LEX
    x          match the character x
    \.         match the character .
    "string"   match the contents of the string of characters
    .          match any character except newline
    ^          match the beginning of a line
    $          match the end of a line
    [xyz]      match one character x, y, or z (use \ to escape -)
    [^xyz]     match any character except x, y, and z
    [a-z]      match one of a to z
    r*         closure (match zero or more occurrences)
    r+         positive closure (match one or more occurrences)
    r?         optional (match zero or one occurrence)
    r1 r2      match r1 then r2 (concatenation)
    r1|r2      match r1 or r2 (union)
    (r)        grouping
    r1/r2      match r1 when followed by r2
    {d}        match the regular expression defined by d

STAR OPERATION (KLEENE CLOSURE)
a* = {a^0, a^1, a^2, a^3, a^4, ...} = {ε, a, aa, aaa, aaaa, ...}
Important characteristics:
• The number of repetitions under * ranges from 0 upward, i.e. the elements of a* are a^0, a^1, a^2, a^3, a^4, a^5, ...
• a^0 means zero a's, and it is represented by ε.
• In a finite automaton, * is represented by a loop on a state: for a^3 the loop iterates 3 times; for a^0 the loop does not iterate at all.
[Figure: machine for a*: one state that is both start and final, with a self-loop on a.]

POSITIVE CLOSURE
a+ = {a^1, a^2, a^3, a^4, ...} = {a, aa, aaa, aaaa, ...}
Important characteristics:
• The number of repetitions under + ranges from 1 upward, i.e. the elements of a+ are a^1, a^2, a^3, a^4, a^5, ...
• There is no a^0 term, i.e. ε is not part of this set.
• At least one a must occur, which can be followed by zero or more further a's.
• Remember: a+ = a.a*
[Figure: machine for a+: q0 --a--> qf, with a self-loop on a at the final state qf.]

CONCATENATION OPERATION
Concatenation means joining (a.b).
Important note: a.b ≠ b.a, i.e.
the order of joining changes the design of the automaton.
[Figure: machine for a: q0 --a--> qf; machine for b: q0 --b--> qf; machine for a.b: q0 --a--> q1 --b--> qf; machine for b.a: q0 --b--> q1 --a--> qf.]

OR OPERATION
[Figure: machine for a: q0 --a--> qf; machine for b: q0 --b--> qf; NFA for a+b, i.e. a/b (here + denotes union, not positive closure): q0 --a--> qf and q0 --b--> qf.]

SECTION 1.2 INTRODUCTION TO FINITE AUTOMATA

FINITE AUTOMATA
"Automata" means machines. A finite automaton is a 5-tuple M = (Q, Σ, δ, q0, F):
    Q    a finite set of states
    Σ    a finite set of input symbols (the input alphabet)
    δ    a transition function
    q0   the initial (starting) state, q0 ∈ Q
    F    a set of final (accepting) states, F ⊆ Q

TYPES OF AUTOMATA
There are two types of finite automata:
• Deterministic Finite Automata (DFA)
• Non-deterministic Finite Automata (NFA)

DETERMINISTIC FINITE AUTOMATA
A Deterministic Finite Automaton is a machine in which, for every input symbol of Σ, there is exactly one transition from every state.
[Figure: DFA over Σ = {a, b} with states q0, q1, q2 and qf; every state has one transition on a and one on b, and no state has more than one transition on a or on b.]

NON-DETERMINISTIC FINITE AUTOMATA
A Non-deterministic Finite Automaton is a machine in which, for a single input symbol of Σ, there can be more than one transition from a particular state.
[Figure: NFA over Σ = {a, b} with states q0, q1, q2 and qf: state q0 has two moves on a, one to q1 and the other to q2; likewise, state q2 has two moves on b, one to q1 and another to qf.]

TYPES OF NFA
There are two types of NFA:
i. NFA without ε-moves
ii. NFA with ε-moves

NFA WITH ε-MOVES
Consider the following NFA, in which there is an ε-move out of state q1.
[Figure: NFA over {a, b} with states q0, q1 and qf, including an ε-move from q1 to qf.]

DIFFERENCE BETWEEN DFA AND NFA
Property              DFA                                   NFA
Moves per symbol      exactly one from every state          possibly more than one from a state
ε-moves               not allowed                           allowed

SECTION 1.3 THOMPSON'S CONSTRUCTION

THOMPSON'S CONSTRUCTION
We have three operations on regular expressions:
i) Star operation
ii) Concatenation
iii) OR operation
For each operation there is a rule for building an NFA with ε-moves.

THOMPSON'S CONSTRUCTION FOR STAR OPERATION
a* = {ε, a, aa, aaa, aaaa, ...}
• NFA for a* (without Thompson's construction): a single final state with a self-loop on a.
• NFA for a* using Thompson's construction:
    q0 --ε--> q1 --a--> q2 --ε--> qf, with q2 --ε--> q1 (loop back) and q0 --ε--> qf (bypass)
• Only ε: q0 → qf
• A single a: q0 → q1 → q2 → qf
• Two a's: q0 → q1 → q2 → q1 → q2 → qf
• N a's (N ≥ 2): q0 → q1 → q2, then the loop q2 → q1 → q2 is taken N - 1 times, then q2 → qf.

THOMPSON'S CONSTRUCTION FOR CONCATENATION OPERATION
• NFA for a: q0 --a--> qf
• NFA for b: q0 --b--> qf
• NFA for ab using Thompson's construction: q0 --a--> q1 --b--> qf

THOMPSON'S CONSTRUCTION FOR OR OPERATION
• NFA for a: q0 --a--> qf
• NFA for b: q0 --b--> qf
• NFA for a+b (a/b) using Thompson's construction:
    q0 --ε--> q1 --a--> q2 --ε--> qf
    q0 --ε--> q3 --b--> q4 --ε--> qf

Question 1: THOMPSON'S CONSTRUCTION FOR aa*b
• Thompson's for a: q0 --a--> qf
• Thompson's for b: q0 --b--> qf
• Thompson's for a*: q0 --ε--> q1 --a--> q2 --ε--> qf, with q2 --ε--> q1 and q0 --ε--> qf
• Thompson's construction for aa*b:
    q0 --a--> q1 --ε--> q2 --a--> q3 --ε--> q4 --b--> qf, with q3 --ε--> q2 and q1 --ε--> q4
• NFA without Thompson's construction: q0 --a--> q1 --b--> qf, with a self-loop on a at q1.

Question 2: THOMPSON'S CONSTRUCTION FOR a*b(a/b)
• Thompson's for a*: q0 --ε--> q1 --a--> q2 --ε--> qf, with q2 --ε--> q1 and q0 --ε--> qf
• Thompson's for b: q0 --b--> qf
• Thompson's for a/b: q0 --ε--> q1 --a--> q2 --ε--> qf; q0 --ε--> q3 --b--> q4 --ε--> qf
• NFA for a*b(a/b) using Thompson's construction:
    q0 --ε--> q1 --a--> q2 --ε--> q3, with q2 --ε--> q1 and q0 --ε--> q3
    q3 --b--> q4
    q4 --ε--> q5 --a--> q6 --ε--> qf
    q4 --ε--> q7 --b--> q8 --ε--> qf
• NFA without Thompson's construction: q0 --b--> q1 --a,b--> qf, with a self-loop on a at q0.

Question 3: THOMPSON'S CONSTRUCTION FOR (a/b/c)
• A first attempt joins all three alternatives at one start state:
    q0 --ε--> q1 --a--> q2 --ε--> qf
    q0 --ε--> q3 --b--> q4 --ε--> qf
    q0 --ε--> q5 --c--> q6 --ε--> qf
  This is not allowed: in Thompson's construction a state may not have three outgoing ε-moves.
• Final output: apply the OR rule twice, treating a/b/c as (a/b)/c. The NFA for a/b becomes one branch of a new OR and c is the other branch, so no state has more than two outgoing ε-moves.
[Figure: final NFA for (a/b/c), with states q0 to q8 and qf.]
Question 4: THOMPSON'S CONSTRUCTION FOR ab(a/b)*
• Thompson's for a: q0 --a--> qf
• Thompson's for b: q0 --b--> qf
• Thompson's for a/b: q0 --ε--> q1 --a--> q2 --ε--> qf; q0 --ε--> q3 --b--> q4 --ε--> qf
• Thompson's for (a/b)*:
    q0 --ε--> q1
    q1 --ε--> q2 --a--> q3 --ε--> q6
    q1 --ε--> q4 --b--> q5 --ε--> q6
    q6 --ε--> q1 (loop back), q6 --ε--> qf, q0 --ε--> qf (bypass)
• NFA for ab(a/b)* using Thompson's construction:
    q0 --a--> q1 --b--> q2 --ε--> q3
    q3 --ε--> q4 --a--> q5 --ε--> q8
    q3 --ε--> q6 --b--> q7 --ε--> q8
    q8 --ε--> q3 (loop back), q8 --ε--> qf, q2 --ε--> qf (bypass)
• NFA without Thompson's construction: q0 --a--> q1 --b--> qf, with a self-loop on a, b at qf.

SECTION 1.4 SUBSET CONSTRUCTION

HOW TO WORK WITH THE ε-CLOSURE FUNCTION
Steps for the ε-closure function:
• First, take the ε-closure of the start state; e.g. if the start state is 0, compute ε-closure(0).
• ε-closure(n) is the set of all states that can be reached from state n without consuming any input, i.e. through ε-moves only.
• Most important: the ε-closure of a state always includes that state itself, i.e. ε-closure(n) contains n.
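The ε-closure steps above, together with the stack-based ε-closure(T) procedure used by the subset construction, can be sketched in C. The NFA is hard-coded as an assumption: it is the Thompson NFA for (a/b)*ab with states 0 to 9 used in the subset-construction example (ε-edges 0→1, 0→7, 1→2, 1→4, 3→6, 5→6, 6→1, 6→7; a-edges 2→3 and 7→8; b-edges 4→5 and 8→9). The arrays eps, sym and target and the functions eps_closure and nfa_move are hypothetical names for this illustration.

```c
#include <stdbool.h>

#define NSTATES 10   /* states 0..9 of the Thompson NFA for (a/b)*ab */

/* eps[u][v] is true when the NFA has an ε-edge u -> v. */
static const bool eps[NSTATES][NSTATES] = {
    [0] = { [1] = true, [7] = true },
    [1] = { [2] = true, [4] = true },
    [3] = { [6] = true },
    [5] = { [6] = true },
    [6] = { [1] = true, [7] = true },
};

/* Non-ε edges: state s moves to target[s] on symbol sym[s] (sym 0 means no such edge).
   Every state of a Thompson NFA has at most one non-ε outgoing edge. */
static const char sym[NSTATES]    = { [2] = 'a', [4] = 'b', [7] = 'a', [8] = 'b' };
static const int  target[NSTATES] = { [2] = 3,   [4] = 5,   [7] = 8,   [8] = 9 };

/* ε-closure(T), computed in place with the stack algorithm.
   On entry set[s] is true exactly for the states of T; on exit, for ε-closure(T).
   Each state of T stays in the set, so it is part of its own closure. */
void eps_closure(bool set[NSTATES]) {
    int stack[NSTATES];                        /* each state is pushed at most once */
    int top = 0;
    for (int s = 0; s < NSTATES; s++)
        if (set[s]) stack[top++] = s;          /* push all states of T onto the stack */
    while (top > 0) {
        int t = stack[--top];                  /* pop t, the top element */
        for (int u = 0; u < NSTATES; u++)
            if (eps[t][u] && !set[u]) {        /* ε-edge t -> u, u not yet in the closure */
                set[u] = true;                 /* add u to ε-closure(T) */
                stack[top++] = u;              /* and push u */
            }
    }
}

/* move(T, a): the states reachable from some state of T by one edge labelled a. */
void nfa_move(const bool from[NSTATES], char a, bool to[NSTATES]) {
    for (int s = 0; s < NSTATES; s++) to[s] = false;
    for (int s = 0; s < NSTATES; s++)
        if (from[s] && sym[s] == a) to[target[s]] = true;
}
```

Starting from {0}, eps_closure yields {0, 1, 2, 4, 7}, the DFA state A of the example; applying nfa_move on a and then eps_closure yields {1, 2, 3, 4, 6, 7, 8}, the state B.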
SUBSET CONSTRUCTION FOR (a/b)*ab
[Figure: Thompson NFA for (a/b)*ab with states 0 to 9. ε-moves: 0→1, 0→7, 1→2, 1→4, 3→6, 5→6, 6→1, 6→7. Moves on a: 2→3, 7→8. Moves on b: 4→5, 8→9. State 0 is the start state and state 9 is the final state.]

In the steps below, {s,a} denotes the set of states reachable from state s on input a.

Start with the start state, state 0:
ε-closure(0) = {0, 1, 2, 4, 7} = A

(A, a) = ε-closure({0,a} ∪ {1,a} ∪ {2,a} ∪ {4,a} ∪ {7,a})
       = ε-closure(∅ ∪ ∅ ∪ {3} ∪ ∅ ∪ {8})
       = ε-closure(3) ∪ ε-closure(8)
       = {1,2,3,4,6,7} ∪ {8} = {1,2,3,4,6,7,8} = B

(A, b) = ε-closure({0,b} ∪ {1,b} ∪ {2,b} ∪ {4,b} ∪ {7,b})
       = ε-closure(∅ ∪ ∅ ∪ ∅ ∪ {5} ∪ ∅)
       = ε-closure(5) = {1,2,4,5,6,7} = C

(B, a) = ε-closure({1,a} ∪ {2,a} ∪ {3,a} ∪ {4,a} ∪ {6,a} ∪ {7,a} ∪ {8,a})
       = ε-closure(∅ ∪ {3} ∪ ∅ ∪ ∅ ∪ ∅ ∪ {8} ∪ ∅)
       = ε-closure(3) ∪ ε-closure(8) = {1,2,3,4,6,7,8} = B (computed above)

(B, b) = ε-closure({1,b} ∪ {2,b} ∪ {3,b} ∪ {4,b} ∪ {6,b} ∪ {7,b} ∪ {8,b})
       = ε-closure(∅ ∪ ∅ ∪ ∅ ∪ {5} ∪ ∅ ∪ ∅ ∪ {9})
       = ε-closure(5) ∪ ε-closure(9)
       = {1,2,4,5,6,7} ∪ {9} = {1,2,4,5,6,7,9} = D

(C, a) = ε-closure({1,a} ∪ {2,a} ∪ {4,a} ∪ {5,a} ∪ {6,a} ∪ {7,a})
       = ε-closure(∅ ∪ {3} ∪ ∅ ∪ ∅ ∪ ∅ ∪ {8})
       = ε-closure(3) ∪ ε-closure(8) = {1,2,3,4,6,7,8} = B (computed above)

(C, b) = ε-closure({1,b} ∪ {2,b} ∪ {4,b} ∪ {5,b} ∪ {6,b} ∪ {7,b})
       = ε-closure(∅ ∪ ∅ ∪ {5} ∪ ∅ ∪ ∅ ∪ ∅)
       = ε-closure(5) = {1,2,4,5,6,7} = C (computed above)

(D, a) = ε-closure({1,a} ∪ {2,a} ∪ {4,a} ∪ {5,a} ∪ {6,a} ∪ {7,a} ∪ {9,a})
       = ε-closure(∅ ∪ {3} ∪ ∅ ∪ ∅ ∪ ∅ ∪ {8} ∪ ∅)
       = ε-closure(3) ∪ ε-closure(8) = {1,2,3,4,6,7,8} = B (computed above)

(D, b) = ε-closure({1,b} ∪ {2,b} ∪ {4,b} ∪ {5,b} ∪ {6,b} ∪ {7,b} ∪ {9,b})
       = ε-closure(∅ ∪ ∅ ∪ {5} ∪ ∅ ∪ ∅ ∪ ∅ ∪ ∅)
       = ε-closure(5) = {1,2,4,5,6,7} = C (computed above)

SUBSET CONSTRUCTION FOR (a/b)*ab: FINAL OUTPUT
State   NFA states                a   b
→ A     {0, 1, 2, 4, 7}           B   C
  B     {1, 2, 3, 4, 6, 7, 8}     B   D
  C     {1, 2, 4, 5, 6, 7}        B   C
* D     {1, 2, 4, 5, 6, 7, 9}     B   C

• A is the start state, since set A contains state 0, the start state of the NFA built with Thompson's construction.
• D is the final state, since set D contains state 9, the final state of the NFA built with Thompson's construction.

ε-CLOSURE(T)
    push all states of T onto stack
    initialize ε-closure(T) to T
    while (stack is not empty) do
    begin
        pop t, the top element, off stack
        for (each state u with an edge from t to u labelled ε) do
            if (u is not in ε-closure(T)) then
            begin
                add u to ε-closure(T)
                push u onto stack
            end
    end

CONVERTING AN NFA INTO A DFA (SUBSET CONSTRUCTION)
    put ε-closure({s0}) as an unmarked state into the set of DFA states (DS)
    while (there is an unmarked state S1 in DS) do
    begin
        mark S1
        for each input symbol a do
        begin
            S2 ← ε-closure(move(S1, a))
            if (S2 is not in DS) then
                add S2 into DS as an unmarked state
            transfunc[S1, a] ← S2
        end
    end
• ε-closure({s0}) is the set of all states accessible from s0 by ε-transitions.
• move(S1, a) is the set of states to which there is a transition on a from some state s in S1.
• A state S in DS is an accepting state of the DFA if some state s in S is an accepting state of the NFA.
• The start state of the DFA is ε-closure({s0}).

SECTION 1.5 RE TO DFA THROUGH SYNTAX TREE METHOD OR DIRECT METHOD

CONVERTING REGULAR EXPRESSIONS DIRECTLY TO DFAS
• The method works with the important states of the NFA, which correspond to the positions of alphabet symbols in the regular expression.
• We may convert a regular expression into a DFA without creating an NFA first.
• First we augment the given regular expression by concatenating it with a special end marker #:
    r → (r)#   (the augmented regular expression)
• Then we create a syntax tree for this augmented regular expression.
• In this syntax tree, all alphabet symbols (plus # and the empty string) of the augmented regular expression are at the leaves, and all inner nodes are the operators of that augmented regular expression.
• Then each alphabet symbol (plus #) is numbered; these numbers are the position numbers.

FROM REGULAR EXPRESSION TO DFA DIRECTLY: SYNTAX TREE OF (a/b)*abb#
[Figure: syntax tree of (a/b)*abb#. The root is a concatenation node whose right child is the leaf # (position 6). Going down the left spine, concatenation nodes have right children b (position 5), b (position 4) and a (position 3). The bottom-left child is the closure node *, whose child is the alternation node | with leaves a (position 1) and b (position 2).]

FROM REGULAR EXPRESSION TO DFA DIRECTLY: ANNOTATING THE TREE
• nullable(n): true if the subtree at node n generates a language that includes the empty string
• firstpos(n): the set of positions that can match the first symbol of a string generated by the subtree at node n
• lastpos(n): the set of positions that can match the last symbol of a string generated by the subtree at node n
• followpos(i): the set of positions that can follow position i in the strings generated by the tree

Rules for computing nullable, firstpos and lastpos:
• Leaf labelled ε: nullable = true; firstpos = ∅; lastpos = ∅
• Leaf with position i: nullable = false; firstpos = {i}; lastpos = {i}
• Or-node c1 | c2: nullable = nullable(c1) or nullable(c2); firstpos = firstpos(c1) ∪ firstpos(c2); lastpos = lastpos(c1) ∪ lastpos(c2)
• Cat-node c1 c2: nullable = nullable(c1) and nullable(c2);
    firstpos = if nullable(c1) then firstpos(c1) ∪ firstpos(c2) else firstpos(c1);
    lastpos = if nullable(c2) then lastpos(c1) ∪ lastpos(c2) else lastpos(c2)
• Star-node c1*: nullable = true; firstpos = firstpos(c1); lastpos = lastpos(c1)

FROM REGULAR EXPRESSION TO DFA DIRECTLY: ANNOTATED SYNTAX TREE OF (a/b)*abb#
Node                  nullable   firstpos     lastpos
a (1)                 false      {1}          {1}
b (2)                 false      {2}          {2}
a | b                 false      {1, 2}       {1, 2}
(a | b)*              true       {1, 2}       {1, 2}
a (3)                 false      {3}          {3}
(a|b)* a              false      {1, 2, 3}    {3}
b (4)                 false      {4}          {4}
(a|b)* a b            false      {1, 2, 3}    {4}
b (5)                 false      {5}          {5}
(a|b)* a b b          false      {1, 2, 3}    {5}
# (6)                 false      {6}          {6}
(a|b)* a b b #        false      {1, 2, 3}    {6}

FROM REGULAR EXPRESSION TO DFA DIRECTLY: EXAMPLE
(a/b)* a b b #, with positions 1 2 3 4 5 6:

Position   Symbol   followpos
1          a        {1, 2, 3}
2          b        {1, 2, 3}
3          a        {4}
4          b        {5}
5          b        {6}
6          #        -

FROM RE TO DFA DIRECTLY
Let A = firstpos(root) = {1, 2, 3}.
• (A, a) = ({1,2,3}, a) = followpos(1) ∪ followpos(3) = {1,2,3,4} = B
• (A, b) = ({1,2,3}, b) = followpos(2) = {1,2,3} = A
• (B, a) = ({1,2,3,4}, a) = followpos(1) ∪ followpos(3) = {1,2,3,4} = B
• (B, b) = ({1,2,3,4}, b) = followpos(2) ∪ followpos(4) = {1,2,3,5} = C
• (C, a) = ({1,2,3,5}, a) = followpos(1) ∪ followpos(3) = {1,2,3,4} = B
• (C, b) = ({1,2,3,5}, b) = followpos(2) ∪ followpos(5) = {1,2,3,6} = D
• (D, a) = ({1,2,3,6}, a) = followpos(1) ∪ followpos(3) = {1,2,3,4} = B
• (D, b) = ({1,2,3,6}, b) = followpos(2) = {1,2,3} = A

State   a   b
→ A     B   A
  B     B   C
  C     B   D
* D     B   A

D = {1, 2, 3, 6} is the accepting state because it contains position 6, the position of #.
[Figure: DFA with states {1,2,3} (start), {1,2,3,4}, {1,2,3,5} and {1,2,3,6} (final), with the transitions of the table above.]

DIFFERENT DFAs FOR (a/b)*abb
• A five-state DFA for (a/b)*abb:
State   a   b
→ A     B   C
  B     B   D
  C     B   C
  D     B   E
* E     B   C
• The four-state DFA obtained above by the direct method:
State   a   b
→ A     B   A
  B     B   C
  C     B   D
* D     B   A

FROM REGULAR EXPRESSION TO DFA DIRECTLY: FOLLOWPOS
    for each node n in the tree do
        if n is a cat-node with left child c1 and right child c2 then
            for each i in lastpos(c1) do
                followpos(i) := followpos(i) ∪ firstpos(c2)
            end do
        else if n is a star-node then
            for each i in lastpos(n) do
                followpos(i) := followpos(i) ∪ firstpos(n)
            end do
        end if
    end do

FROM REGULAR EXPRESSION TO DFA DIRECTLY: ALGORITHM
    s0 := firstpos(root), where root is the root of the syntax tree
    Dstates := {s0}, and s0 is unmarked
    while there is an unmarked state T in Dstates do
        mark T
        for each input symbol a ∈ Σ do
            let U be the set of positions that are in followpos(p) for some position p in T,
                such that the symbol at position p is a
            if U is not empty and not in Dstates then
                add U as an unmarked state to Dstates
            end if
            Dtran[T, a] := U
        end do
    end do

SECTION 1.6 MINIMIZATION OF DFA

Question 1: MINIMIZE THE FOLLOWING DFA, IF POSSIBLE
[Figure: DFA over {a, b} with states A (start), B, C, D and E (final); its transitions are listed in the transition table below.]

USING FINAL AND NON-FINAL STATES
• Divide the entire set of states into two subsets: the set of final states and the set of non-final states.
• Consider each subset as a separate group and check whether it needs to be split further or whether its states can be combined.

Question 1: DFA MINIMIZATION USING THE PARTITIONING METHOD
Draw the transition table corresponding to the given DFA:
State   a   b
→ A     B   C
  B     B   D
  C     B   C
  D     B   E
* E     B   C

Step 1. Divide the states into two subsets, final and non-final:
    Set of non-final states (NF): {A, B, C, D}
    Set of final states (F): {E}
Step 2. Check the transitions of the grouped states {A, B, C, D} on a: all of them go to B, which is in {A, B, C, D}. No split.
Step 3. Check {A, B, C, D} on b: A → C, B → D and C → C stay within {A, B, C, D}, while D → E goes to the final state. Split into {A, B, C} and {D}.
    NF = {A, B, C}, {D};  F = {E}
Step 4. Check {A, B, C} on a: all of them go to B. No split.
Step 5. Check {A, B, C} on b: A → C and C → C stay within {A, B, C}, while B → D, which is already separated. Split into {A, C} and {B}.
    NF = {A, C}, {B}, {D};  F = {E}
Step 6. Check {A, C} on a: both go to B, which is already separated. No split.
Step 7. Check {A, C} on b: both go to C, i.e. to the same group {A, C}. No split.
Since the subset {A, C} remains together until the end, A and C are joined into a single state.

Question 1: MINIMIZED DFA (FINAL OUTPUT)
State   a     b
→ A,C   B     A,C
  B     B     D
  D     B     E
* E     B     A,C

Question 2: MINIMIZE THE FOLLOWING DFA, IF POSSIBLE
[Figure: DFA over {a, b} with states A (start) through H, where C is the only final state; its transitions are listed in the transition table below.]

Question 2: DFA MINIMIZATION USING THE PARTITIONING METHOD
Draw the transition table corresponding to the given DFA:
State   a   b
→ A     B   F
  B     G   C
* C     A   C
  D     C   G
  E     H   F
  F     C   G
  G     G   E
  H     G   C

Step 1. Divide the states into two subsets, final and non-final:
    Set of non-final states (NF): {A, B, D, E, F, G, H}
    Set of final states (F): {C}
Step 2. Check {A, B, D, E, F, G, H} on a: A → B, B → G, E → H, G → G and H → G stay within the set, while D → C and F → C go to the final state C. Split into {A, B, E, G, H} and {D, F}.
Step 3. Check {A, B, E, G, H} on a: the targets B, G, H, G, G all lie within {A, B, E, G, H}. No split.
Step 4. Check {A, B, E, G, H} on b: A → F and E → F go to {D, F}; B → C and H → C go to {C}; G → E stays within {A, B, E, G, H}. Split into {A, E}, {G} and {B, H}.
    NF = {A, E}, {G}, {B, H}, {D, F};  F = {C}
Step 5. Check {A, E}: on a, A → B and E → H, both in {B, H}; on b, both go to F. No split.
Step 6. Check {B, H}: on a, both go to G; on b, both go to C. No split.
Step 7. Check {D, F}: on a, both go to C; on b, both go to G. No split.

Question 2: MINIMIZED DFA (FINAL OUTPUT)
State   a     b
→ A,E   B,H   D,F
  B,H   G     C
* C     A,E   C
  D,F   C     G
  G     G     A,E
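The partitioning method of Questions 1 and 2 can be sketched in C as iterative partition refinement (commonly called Moore's algorithm). One difference from the slides: each round here compares states on all input symbols at once instead of one symbol at a time, so the intermediate partitions can differ, but the final (coarsest) partition is the same. Assumptions for this sketch: states are numbered 0 to n-1 (A = 0, B = 1, ...), the alphabet {a, b} is encoded as 0 and 1, there are at most MAX_STATES states, and unreachable states are not removed. The function minimize and its parameters are hypothetical names.

```c
#include <string.h>

#define MAX_STATES 16   /* assumption: the DFA has at most 16 states */
#define NSYMS 2         /* input alphabet {a, b}, encoded as 0 = a, 1 = b */

/* Partition refinement for DFA minimization.
   n          number of states, numbered 0 .. n-1
   delta      delta[s][c] is the state reached from s on symbol c
   accepting  accepting[s] is nonzero for final states
   group      output: group[s] is the block (merged state) that s belongs to
   Returns the number of blocks, i.e. the number of states of the minimized DFA. */
int minimize(int n, const int delta[][NSYMS], const int accepting[], int group[]) {
    /* Initial partition: non-final states in block 0, final states in block 1. */
    for (int s = 0; s < n; s++)
        group[s] = accepting[s] ? 1 : 0;

    int blocks = -1;                 /* block count after the previous round */
    for (;;) {
        int next[MAX_STATES];
        int count = 0;
        for (int s = 0; s < n; s++) {
            next[s] = -1;
            /* s joins the new block of an earlier state t if both are in the same
               block now and, on every symbol, move into the same block. */
            for (int t = 0; t < s && next[s] < 0; t++) {
                int same = group[t] == group[s];
                for (int c = 0; c < NSYMS && same; c++)
                    same = group[delta[t][c]] == group[delta[s][c]];
                if (same) next[s] = next[t];
            }
            if (next[s] < 0) next[s] = count++;   /* otherwise s starts a new block */
        }
        memcpy(group, next, (size_t)n * sizeof group[0]);
        /* A round only splits blocks, so an unchanged count means nothing was split. */
        if (count == blocks) return count;
        blocks = count;
    }
}
```

For the DFA of Question 1 the function returns 4 blocks with A and C together; for Question 2 it returns 5 blocks: {A, E}, {B, H}, {C}, {D, F} and {G}, matching the final outputs above.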
