Lecture 1: Lexical Analysis
SECTION 1.1 LEXICAL ANALYSIS - INTRODUCTION

LEXICAL ANALYZER
The lexical analyzer reads the source program character by character to produce tokens.
Normally a lexical analyzer does not return a list of tokens in one shot; it returns a token each time the parser asks for one.
[Figure: source program → Lexical Analyzer → token → Parser. The parser sends "get next token" requests to the lexical analyzer, and both components use the Symbol Table.]

ROLES OF THE LEXICAL ANALYZER
The lexical analyzer performs the following tasks:
  Helps to identify tokens and enter them in the symbol table
  Removes white space and comments from the source program
  Correlates error messages with the source program
  Expands macros if any are found in the source program
  Reads input characters from the source program

TOKENS, LEXEMES AND PATTERNS
Token: a sequence of characters that can be treated as a single logical entity. Typical tokens are: 1) identifiers 2) keywords 3) operators 4) special symbols 5) constants
Lexeme: a sequence of characters in the source program that is matched by the pattern for a token.
Pattern: a rule describing the set of input strings for which the same token is produced as output.

  Token     Lexeme (element of the kind)   Pattern
  ID        x  y  n_0                      letter followed by letters and digits
  NUM       -123  1.456e-5                 any numeric constant
  IF        if                             if
  LPAREN    (                              (
  LITERAL   ``Hello''                      any string of characters (except ``) between `` and ``

Regular expressions are widely used to specify patterns.
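To make the token, lexeme and pattern vocabulary concrete, here is a minimal sketch of a lexer that pairs each token name from the table with a regular expression and hands out one token per request, the way a parser would ask for them. The names TOKEN_SPEC and next_tokens, the exact regular expressions and the use of Python's re module are assumptions made for this illustration, not part of the lecture. In particular, the ID pattern also admits underscores so that the lexeme n_0 matches, and LITERAL uses ordinary double quotes.

```python
import re

# (token name, pattern) pairs; order matters because the first alternative wins.
TOKEN_SPEC = [
    ("NUM",     r"-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?"),  # -123, 1.456e-5
    ("IF",      r"if\b"),                              # keyword, tried before ID
    ("ID",      r"[A-Za-z][A-Za-z0-9_]*"),             # x, y, n_0
    ("LPAREN",  r"\("),
    ("LITERAL", r'"[^"]*"'),                           # "Hello"
    ("SKIP",    r"\s+"),                               # white space is not a token
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def next_tokens(source):
    """Yield (token, lexeme) pairs one at a time, on demand."""
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        pos = match.end()
        if match.lastgroup != "SKIP":
            yield match.lastgroup, match.group()


stream = next_tokens('if (n_0 -123 "Hello"')
print(next(stream))   # the parser asks for one token: ('IF', 'if')
print(list(stream))   # the remaining tokens
```

Because next_tokens is a generator, no token is produced until the caller asks for it, which mirrors the "get next token" request in the figure.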
EXAMPLE
  #include
  int maximum(int x, int y) {
      // This will compare 2 numbers

Tokens generated:
  Lexeme    Token
  int       Keyword
  maximum   Identifier
  (         Operator
  int       Keyword
  x         Identifier
  ,         Operator
  int       Keyword
  y         Identifier
  )         Operator
  {         Operator

Non-tokens:
  Type                      Examples
  Comment                   // This will compare 2 numbers
  Pre-processor directive   #include
  Whitespace                \n \b \t

TERMINOLOGY OF LANGUAGES
Alphabet: a finite set of symbols (e.g. the ASCII characters)
String: a finite sequence of symbols over an alphabet
  "Sentence" and "word" are also used to mean string
  ε is the empty string
  |s| is the length of string s
Language: a set of strings over some fixed alphabet
  ∅, the empty set, is a language
  {ε}, the set containing only the empty string, is a language
  The set of well-formed C programs is a language
  The set of all possible identifiers is a language
Operators on strings:
  Concatenation: xy represents the concatenation of strings x and y

OPERATIONS ON LANGUAGES
  Concatenation:     L1L2 = { s1s2 | s1 ∈ L1 and s2 ∈ L2 }
  Union:             L1 ∪ L2 = { s | s ∈ L1 or s ∈ L2 }
  Exponentiation:    L^0 = {ε}, L^1 = L, L^2 = LL
  Kleene closure:    L* = L^0 ∪ L^1 ∪ L^2 ∪ ... (union of L^i for i = 0 to ∞)
  Positive closure:  L+ = L^1 ∪ L^2 ∪ L^3 ∪ ... (union of L^i for i = 1 to ∞)

EXAMPLE
  L1 = {a,b,c,d}, L2 = {1,2}
  L1L2 = {a1,a2,b1,b2,c1,c2,d1,d2}
  L1 ∪ L2 = {a,b,c,d,1,2}
  L1^3 = all strings of length three (using a,b,c,d)
  L1* = all strings using the letters a,b,c,d, including the empty string
  L1+ = the same strings as L1*, but without the empty string

REGULAR EXPRESSIONS
We use regular expressions to describe the tokens of a programming language.
A regular expression is built up from simpler regular expressions (using defining rules).
Each regular expression denotes a language.
A language denoted by a regular expression is called a regular set.

REGULAR EXPRESSIONS (RULES)
Regular expressions over alphabet Σ:
  Reg. Expr      Language it denotes
  ε              {ε}
  a ∈ Σ          {a}
  (r1) | (r2)    L(r1) ∪ L(r2)
  (r1) (r2)      L(r1) L(r2)
  (r)*           (L(r))*
  (r)            L(r)
  (r)+ = (r)(r)*
  (r)? = (r) | ε

REGULAR EXPRESSIONS (CONT.)
We may remove parentheses by using precedence rules.
  *               highest
  concatenation   next
  |               lowest
ab*|c means (a(b)*)|(c)
Ex: Σ = {0,1}
  0|1         => {0,1}
  (0|1)(0|1)  => {00,01,10,11}
  0*          => {ε, 0, 00, 000, 0000, ...}
  (0|1)*      => all strings of 0s and 1s, including the empty string

REGULAR DEFINITIONS
Writing a regular expression for some languages can be difficult, because the expression can become quite complex. In those cases, we may use regular definitions.
We can give names to regular expressions, and we can use these names as symbols to define other regular expressions.
A regular definition is a sequence of definitions of the form:
  d1 → r1
  d2 → r2
  ...
  dn → rn
where each di is a distinct name and each ri is a regular expression over the symbols in Σ ∪ {d1,d2,...,di-1}, that is, the basic symbols and the previously defined names.

REGULAR DEFINITIONS (CONT.)
Ex: Identifiers in Pascal
  letter → A | B | ... | Z | a | b | ... | z
  digit → 0 | 1 | ... | 9
  id → letter (letter | digit)*
If we try to write the regular expression for identifiers without using regular definitions, it becomes complex:
  (A|...|Z|a|...|z) ( (A|...|Z|a|...|z) | (0|...|9) )*
Ex: Unsigned numbers in Pascal
  digit → 0 | 1 | ... | 9
  digits → digit+
  opt-fraction → ( . digits )?
  opt-exponent → ( E (+|-)? digits )?
  unsigned-num → digits opt-fraction opt-exponent

NOTATIONAL SHORTHAND
The following shorthands are often used:
  r+ = rr*
  r? = r | ε
  [a-z] = a | b | c | ... | z
Examples:
  digit → [0-9]
  digits → digit+
  optional_fraction → ( . digits )?
  optional_exponent → ( E (+ | -)? digit+ )?
  num → digits optional_fraction optional_exponent

RECOGNITION OF TOKENS
e.g.
Grammar:
  stmt → if expr then stmt
       | if expr then stmt else stmt
       | ε
  expr → term relop term
       | term
  term → id
       | num
Regular definitions:
  if → if
  then → then
  else → else
  relop → < | <= | = | <> | > | >=
  id → letter (letter | digit)*
  num → digits optional_fraction optional_exponent
Assumptions:
  delim → blank | tab | newline

TRANSITION DIAGRAMS
relop → < | <= | = | <> | > | >=
(States marked * retract the input by one character.)
  state 0 (start): on < go to 1; on = go to 5; on > go to 6
  state 1: on = go to 2, return(relop, LE); on > go to 3, return(relop, NE); on other go to 4*, return(relop, LT)
  state 5: return(relop, EQ)
  state 6: on = go to 7, return(relop, GE); on other go to 8*, return(relop, GT)
id → letter (letter | digit)*
  state 9 (start): on letter go to 10
  state 10: on letter or digit stay in 10; on other go to 11*, return(gettoken(), install_id())

TRANSITION DIAGRAMS: CODE
  token nexttoken()
  {
      while (1) {
          switch (state) {
          case 0:
              c = nextchar();
              if (c == blank || c == tab || c == newline) {
                  state = 0;
                  lexeme_beginning++;
              }
              else if (c == '<') state = 1;
              else if (c == '=') state = 5;
              else if (c == '>') state = 6;
              else state = fail();
              break;
          case 1:
              …
          case 9:
              c = nextchar();
              if (isletter(c)) state = 10;
              else state = fail();
              break;
          case 10:
              c = nextchar();
              if (isletter(c)) state = 10;
              else if (isdigit(c)) state = 10;
              else state = 11;
              break;
          …

fail() decides the next start state:
  int fail()
  {
      forward = token_beginning;
      switch (start) {
      case 0:  start = 9;  break;
      case 9:  start = 12; break;
      case 12: start = 20; break;
      case 20: start = 25; break;
      case 25: recover();  break;
      default: /* error */
      }
      return start;
  }

THE LEX AND FLEX SCANNER GENERATORS
Lex and its newer cousin flex are scanner generators.
They systematically translate regular definitions into C source code for efficient scanning.
The generated code is easy to integrate into C applications.

CREATING A LEXICAL ANALYZER WITH LEX AND FLEX
  lex source program (lex.l) → lex or flex compiler → lex.yy.c
  lex.yy.c → C compiler → a.out
  input stream → a.out → sequence of tokens

LEX SPECIFICATION
A lex specification consists of three parts:
  regular definitions, C declarations in %{ %}
  %%
  translation rules
  %%
  user-defined auxiliary procedures
The translation rules are of the form:
  p1 { action1 }
  p2 { action2 }
  …
  pn { actionn }

REGULAR EXPRESSIONS IN LEX
  x          match the character x
  \.         match the character .
  "string"   match the contents of the string of characters
  .          match any character except newline
  ^          match the beginning of a line
  $          match the end of a line
  [xyz]      match one character x, y, or z (use \ to escape -)
  [^xyz]     match any character except x, y, and z
  [a-z]      match one of a to z
  r*         closure (match zero or more occurrences)
  r+         positive closure (match one or more occurrences)
  r?         optional (match zero or one occurrence)
  r1r2       match r1 then r2 (concatenation)
  r1|r2      match r1 or r2 (union)
  (r)        grouping
  r1/r2      match r1 when followed by r2
  {d}        match the regular expression defined by d

STAR OPERATION (KLEENE CLOSURE)
a* = {a^0, a^1, a^2, a^3, a^4, ...} = {ε, a, aa, aaa, aaaa, ...}
Important characteristics:
  The exponent under * ranges from 0 upward without bound, i.e. the set a* includes {a^0, a^1, a^2, a^3, a^4, a^5, ...}. Every string in the set is finite.
  a^0 means zero a's and is represented by ε.
  * is represented in a finite automaton by a loop on a state; for a^3 the loop iterates 3 times, and for a^0 the loop does not iterate at all.
  m/c for a*: a single state that is both the start state and the final state, with a self-loop on a

POSITIVE CLOSURE
a+ = {a^1, a^2, a^3, a^4, ...} = {a, aa, aaa, aaaa, ...}
Important characteristics:
  The exponent under + ranges from 1 upward without bound, i.e. the set a+ includes {a^1, a^2, a^3, a^4, a^5, ...}.
  There is no a^0 move, i.e. ε is not part of this set.
  At least one a must occur, and it can be followed by zero or more a's.
  Please remember: a+ = a.a*
  m/c for a+: q0 --a--> qf, with a self-loop on a at qf

CONCATENATION OPERATION
Concatenation means joining (a.b).
Important note: a.b ≠ b.a, i.e.
the order of joining changes the design of the automaton.
  m/c for a:    q0 --a--> qf
  m/c for b:    q0 --b--> qf
  m/c for a.b:  q0 --a--> q1 --b--> qf
  m/c for b.a:  q0 --b--> q1 --a--> qf

OR OPERATION
  m/c for a:  q0 --a--> qf
  m/c for b:  q0 --b--> qf
  NFA for a+b (a/b): q0 goes to a final state on a and to a final state on b

SECTION 1.2 INTRODUCTION TO FINITE AUTOMATA

FINITE AUTOMATA
Automata means machines.
A finite automaton is a 5-tuple: M = (Q, Σ, δ, q0, F)
  Q    a finite set of states
  Σ    a finite input alphabet
  δ    a transition function
  q0   the initial/starting state, q0 is in Q
  F    a set of final/accepting states, which is a subset of Q

TYPES OF AUTOMATA
There are two types of finite automata:
  Deterministic Finite Automata (DFA)
  Non-deterministic Finite Automata (NFA)

DETERMINISTIC FINITE AUTOMATA
A Deterministic Finite Automaton is a machine where, for every input symbol of Σ, there is exactly one output (transition) from every state.
Here Σ = {a, b}, and at every state there is one O/P on 'a' and one O/P on 'b'. None of the states has more than one output for a or b.
[Diagram: DFA with states q0, q1, q2 and qf over Σ = {a, b}]

NON-DETERMINISTIC FINITE AUTOMATA
A Non-Deterministic Finite Automaton is a machine where, for a single input symbol of Σ (a, b), there can be more than one output (transition) from a particular state.
Here state q0 has two moves on 'a', one to q1 and the other to q2; likewise state q2 has two moves on 'b', one a self loop and another to qf.
[Diagram: NFA with states q0, q1, q2 and qf over Σ = {a, b}]

TYPES OF NFA
There are two types of NFA:
  i. NFA without ε-move
  ii. NFA with ε-move

NFA WITH ε-MOVE
Consider the following NFA; here there is an ε-move at q1.
[Diagram: NFA with states q0, q1 and qf, including an ε-move at q1]

DIFFERENCE BETWEEN DFA AND NFA
  Deterministic Finite Automata: for every input symbol of Σ, there can be only one output from every state. A DFA will not have ε-moves.
  Non-Deterministic Finite Automata: for a single input symbol of Σ (a, b), there can be more than one output from a particular state. An NFA can have ε-moves.

SECTION 1.3 THOMPSON'S CONSTRUCTION

THOMPSON'S CONSTRUCTION
We have three operations on regular expressions: i) Star operation ii) Concatenation iii) OR operation
For each operation we have defined rules to build an NFA with ε-moves.

THOMPSON'S CONSTRUCTION FOR STAR OPERATION
a* = {ε, a, aa, aaa, aaaa, ...}
NFA for a* (without Thompson's): a single final state qf with a self-loop on a
NFA for a* using Thompson's Construction:
  q0 --ε--> q1, q1 --a--> q2, q2 --ε--> qf
  q2 --ε--> q1 (loop back)
  q0 --ε--> qf (bypass)
Paths through this NFA:
  Only ε:         q0→qf
  Single a:       q0→q1→q2→qf
  Two a's:        q0→q1→q2→q1→q2→qf
  N number of a's: q1→q2→q1 loops N times, where N varies from 2 to ∞

THOMPSON'S CONSTRUCTION FOR CONCATENATION OPERATION
  NFA for a:  q0 --a--> qf
  NFA for b:  q0 --b--> qf
  NFA for ab using Thompson's Construction:  q0 --a--> q1 --b--> qf

THOMPSON'S CONSTRUCTION FOR OR OPERATION
  NFA for a:  q0 --a--> qf
  NFA for b:  q0 --b--> qf
  NFA for a+b (a/b) using Thompson's Construction:
    q0 --ε--> q1 --a--> q2 --ε--> qf
    q0 --ε--> q3 --b--> q4 --ε--> qf

THOMPSON'S CONSTRUCTION FOR aa*b (Question 1)
  Thompson's for a:   q0 --a--> qf
  Thompson's for b:   q0 --b--> qf
  Thompson's for a*:  q0 --ε--> q1 --a--> q2 --ε--> qf, q2 --ε--> q1, q0 --ε--> qf
  NFA for aa*b using Thompson's Construction:
    q0 --a--> q1 --ε--> q2 --a--> q3 --ε--> q4 --b--> qf
    q3 --ε--> q2 (loop back)
    q1 --ε--> q4 (bypass)
  NFA without Thompson's:  q0 --a--> q1 --b--> qf, with a self-loop on a at q1

THOMPSON'S CONSTRUCTION FOR a*b(a/b) (Question 2)
  Thompson's for a*:   q0 --ε--> q1 --a--> q2 --ε--> qf, q2 --ε--> q1, q0 --ε--> qf
  Thompson's for b:    q0 --b--> qf
  Thompson's for a/b:  q0 --ε--> q1 --a--> q2 --ε--> qf, q0 --ε--> q3 --b--> q4 --ε--> qf
  NFA using Thompson's Construction:
    q0 --ε--> q1 --a--> q2 --ε--> q3 --b--> q4
    q2 --ε--> q1 (loop back), q0 --ε--> q3 (bypass)
    q4 --ε--> q5 --a--> q6 --ε--> qf
    q4 --ε--> q7 --b--> q8 --ε--> qf
  NFA without Thompson's:  q0 --b--> q1 --a,b--> qf, with a self-loop on a at q0

THOMPSON'S CONSTRUCTION FOR (a/b/c) (Question 3)
  First attempt: q0 has ε-moves to q1, q3 and q5, followed by q1 --a--> q2, q3 --b--> q4, q5 --c--> q6, each joined to qf by ε. Three ε out-moves from a state are not allowed.
  Final output: the expression is built as (a/b)/c, so that no state has more than two outgoing ε-moves. The resulting NFA uses states q0 to q8 and qf.

THOMPSON'S CONSTRUCTION FOR ab(a/b)* (Question 4)
  Thompson's for a:    q0 --a--> qf
  Thompson's for b:    q0 --b--> qf
  Thompson's for a/b:  q0 --ε--> q1 --a--> q2 --ε--> qf, q0 --ε--> q3 --b--> q4 --ε--> qf
  Thompson's for (a/b)*:
    q0 --ε--> q1
    q1 --ε--> q2 --a--> q3 --ε--> q6
    q1 --ε--> q4 --b--> q5 --ε--> q6
    q6 --ε--> qf, q6 --ε--> q1 (loop back), q0 --ε--> qf (bypass)
  NFA using Thompson's Construction:
    q0 --a--> q1 --b--> q2 --ε--> q3
    q3 --ε--> q4 --a--> q5 --ε--> q8
    q3 --ε--> q6 --b--> q7 --ε--> q8
    q8 --ε--> qf, q8 --ε--> q3 (loop back), q2 --ε--> qf (bypass)
  NFA without Thompson's:  q0 --a--> q1 --b--> qf, with a self-loop on a,b at qf

SECTION 1.4 SUBSET CONSTRUCTION

HOW TO WORK WITH ε-CLOSURE FUNCTION
Steps for the ε-closure function:
  The first step is to take the ε-closure of the start state, e.g. if the start state is 0, take ε-closure(0).
  ε-closure(n) includes the set of all states that can be reached from state n without consuming any input, i.e. through ε-moves only.
  Most important: the ε-closure of a state includes that state itself, i.e. ε-closure(n) includes n in its set of states.
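The ε-closure steps above can be sketched in a few lines, together with the move step and the driver loop of the subset construction that this section builds on top of ε-closure. This is only an illustration under stated assumptions: the dictionary encoding, the names EPS, eps_closure, move and subset_construction, and the edge list of the NFA are all chosen for this example. The edges follow the usual Thompson layout for (a/b)*ab with states 0 to 9.

```python
EPS = ""  # label used for ε-edges in this sketch

# Assumed Thompson-style NFA for (a/b)*ab; state 0 is the start, 9 is accepting.
NFA = {
    0: {EPS: {1, 7}},
    1: {EPS: {2, 4}},
    2: {"a": {3}},
    3: {EPS: {6}},
    4: {"b": {5}},
    5: {EPS: {6}},
    6: {EPS: {1, 7}},
    7: {"a": {8}},
    8: {"b": {9}},
    9: {},
}


def eps_closure(nfa, states):
    """States reachable from `states` through ε-edges only (includes `states`)."""
    closure = set(states)
    stack = list(states)
    while stack:
        t = stack.pop()
        for u in nfa[t].get(EPS, ()):
            if u not in closure:
                closure.add(u)
                stack.append(u)
    return frozenset(closure)


def move(nfa, states, symbol):
    """States reachable from any state in `states` over one `symbol` edge."""
    return {u for s in states for u in nfa[s].get(symbol, ())}


def subset_construction(nfa, start, alphabet):
    """Return the DFA start set and its transitions; empty target sets are skipped."""
    start_set = eps_closure(nfa, {start})
    dtran, unmarked, seen = {}, [start_set], {start_set}
    while unmarked:
        current = unmarked.pop()          # "mark" the state by removing it
        for symbol in alphabet:
            target = eps_closure(nfa, move(nfa, current, symbol))
            if not target:
                continue
            if target not in seen:
                seen.add(target)
                unmarked.append(target)
            dtran[current, symbol] = target
    return start_set, dtran


A, dtran = subset_construction(NFA, 0, "ab")
print(sorted(A))               # [0, 1, 2, 4, 7]
print(sorted(dtran[A, "a"]))   # [1, 2, 3, 4, 6, 7, 8]
```

Using frozenset for each DFA state lets a set of NFA states act as a dictionary key, so the transition table can be indexed directly by (state set, symbol).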
SUBSET CONSTRUCTION FOR (a/b)*ab
NFA built with Thompson's construction (states 0 to 9):
  0 --ε--> 1, 0 --ε--> 7
  1 --ε--> 2, 1 --ε--> 4
  2 --a--> 3, 4 --b--> 5
  3 --ε--> 6, 5 --ε--> 6
  6 --ε--> 1, 6 --ε--> 7
  7 --a--> 8, 8 --b--> 9

Start with the start state, state 0:
  ε-closure(0) = {0,1,2,4,7} = A

(A, a) = ({0,1,2,4,7}, a)
       = {0,a} ⋃ {1,a} ⋃ {2,a} ⋃ {4,a} ⋃ {7,a}
       = Φ ⋃ Φ ⋃ {3} ⋃ Φ ⋃ {8}
       = ε-closure(3) ⋃ ε-closure(8)
       = {1,2,3,4,6,7} ⋃ {8}
       = {1,2,3,4,6,7,8} = B

(A, b) = ({0,1,2,4,7}, b)
       = {0,b} ⋃ {1,b} ⋃ {2,b} ⋃ {4,b} ⋃ {7,b}
       = Φ ⋃ Φ ⋃ Φ ⋃ {5} ⋃ Φ
       = ε-closure(5)
       = {1,2,4,5,6,7} = C

(B, a) = ({1,2,3,4,6,7,8}, a)
       = {1,a} ⋃ {2,a} ⋃ {3,a} ⋃ {4,a} ⋃ {6,a} ⋃ {7,a} ⋃ {8,a}
       = Φ ⋃ {3} ⋃ Φ ⋃ Φ ⋃ Φ ⋃ {8} ⋃ Φ
       = ε-closure(3) ⋃ ε-closure(8)
       = {1,2,3,4,6,7,8} = B (as computed for (A, a))

(B, b) = ({1,2,3,4,6,7,8}, b)
       = {1,b} ⋃ {2,b} ⋃ {3,b} ⋃ {4,b} ⋃ {6,b} ⋃ {7,b} ⋃ {8,b}
       = Φ ⋃ Φ ⋃ Φ ⋃ {5} ⋃ Φ ⋃ Φ ⋃ {9}
       = ε-closure(5) ⋃ ε-closure(9)
       = {1,2,4,5,6,7,9} = D

(C, a) = ({1,2,4,5,6,7}, a)
       = {1,a} ⋃ {2,a} ⋃ {4,a} ⋃ {5,a} ⋃ {6,a} ⋃ {7,a}
       = Φ ⋃ {3} ⋃ Φ ⋃ Φ ⋃ Φ ⋃ {8}
       = ε-closure(3) ⋃ ε-closure(8)
       = {1,2,3,4,6,7,8} = B (as computed for (A, a))

(C, b) = ({1,2,4,5,6,7}, b)
       = {1,b} ⋃ {2,b} ⋃ {4,b} ⋃ {5,b} ⋃ {6,b} ⋃ {7,b}
       = Φ ⋃ Φ ⋃ {5} ⋃ Φ ⋃ Φ ⋃ Φ
       = ε-closure(5) = {1,2,4,5,6,7} = C (as computed for (A, b))

(D, a) = ({1,2,4,5,6,7,9}, a)
       = {1,a} ⋃ {2,a} ⋃ {4,a} ⋃ {5,a} ⋃ {6,a} ⋃ {7,a} ⋃ {9,a}
       = Φ ⋃ {3} ⋃ Φ ⋃ Φ ⋃ Φ ⋃ {8} ⋃ Φ
       = ε-closure(3) ⋃ ε-closure(8)
       = {1,2,3,4,6,7,8} = B (as computed for (A, a))

(D, b) = ({1,2,4,5,6,7,9}, b)
       = {1,b} ⋃ {2,b} ⋃ {4,b} ⋃ {5,b} ⋃ {6,b} ⋃ {7,b} ⋃ {9,b}
       = Φ ⋃ Φ ⋃ {5} ⋃ Φ ⋃ Φ ⋃ Φ ⋃ Φ
       = ε-closure(5) = {1,2,4,5,6,7} = C (as computed for (A, b))

Final output:
  State                  a   b
  A (0,1,2,4,7)          B   C
  B (1,2,3,4,6,7,8)      B   D
  C (1,2,4,5,6,7)        B   C
  D (1,2,4,5,6,7,9)      B   C
State A is the start state, since set A contains state 0, which is the start state of the NFA built with Thompson's construction. D is the final state, since set D contains state 9, which is the final state of that NFA.

ε-CLOSURE(T)
  push all states of T onto stack
  initialize ε-closure(T) to T
  while (stack is not empty) do begin
      pop t, the top element, off stack;
      for (each state u with an edge from t to u labelled ε) do begin
          if (u is not in ε-closure(T)) do begin
              add u to ε-closure(T)
              push u onto stack
          end
      end
  end

CONVERTING AN NFA INTO A DFA (SUBSET CONSTRUCTION)
  put ε-closure({s0}) as an unmarked state into the set of DFA states (DS)
  while (there is an unmarked state S1 in DS) do
  begin
      mark S1
      for each input symbol a do
      begin
          S2 ← ε-closure(move(S1,a))
          if (S2 is not in DS) then
              add S2 into DS as an unmarked state
          transfunc[S1,a] ← S2
      end
  end
Notes:
  ε-closure({s0}) is the set of all states accessible from s0 by ε-transitions.
  move(S1,a) is the set of states to which there is a transition on a from a state s in S1.
  A state S in DS is an accepting state of the DFA if a state s in S is an accepting state of the NFA.
  The start state of the DFA is ε-closure({s0}).

SECTION 1.5 RE TO DFA THROUGH SYNTAX TREE METHOD OR DIRECT METHOD

CONVERTING REGULAR EXPRESSIONS DIRECTLY TO DFAS
We may convert a regular expression into a DFA (without creating an NFA first).
First we augment the given regular expression by concatenating it with a special symbol #:
  r → (r)#   (augmented regular expression)
Then we create a syntax tree for this augmented regular expression.
In this syntax tree, all alphabet symbols (plus # and the empty string) in the augmented regular expression are on the leaves, and all inner nodes are the operators in that augmented regular expression.
Then each alphabet symbol (plus #) is numbered (position numbers).

FROM REGULAR EXPRESSION TO DFA DIRECTLY: SYNTAX TREE OF (a/b)*abb#
  Leaves and their position numbers: a = 1 and b = 2 (children of the alternation node |), a = 3, b = 4, b = 5, # = 6
  The closure node * has the alternation node | as its child.
  The remaining inner nodes are concatenation nodes, chained to the left: ((((a/b)* a) b) b) #

FROM REGULAR EXPRESSION TO DFA DIRECTLY: ANNOTATING THE TREE
  nullable(n): the subtree at node n generates a language that includes the empty string
  firstpos(n): the set of positions that can match the first symbol of a string generated by the subtree at node n
  lastpos(n): the set of positions that can match the last symbol of a string generated by the subtree at node n
  followpos(i): the set of positions that can follow position i in the tree

Rules:
  Leaf ε:            nullable = true;  firstpos = Φ;  lastpos = Φ
  Leaf i:            nullable = false; firstpos = {i}; lastpos = {i}
  Or-node c1 | c2:   nullable = nullable(c1) or nullable(c2)
                     firstpos = firstpos(c1) ꓴ firstpos(c2)
                     lastpos = lastpos(c1) ꓴ lastpos(c2)
  Cat-node c1 c2:    nullable = nullable(c1) and nullable(c2)
                     firstpos = if nullable(c1) then firstpos(c1) ꓴ firstpos(c2) else firstpos(c1)
                     lastpos = if nullable(c2) then lastpos(c1) ꓴ lastpos(c2) else lastpos(c2)
  Star-node c1*:     nullable = true; firstpos = firstpos(c1); lastpos = lastpos(c1)

FROM REGULAR EXPRESSION TO DFA DIRECTLY: ANNOTATED SYNTAX TREE OF (a/b)*abb#
  Node                        firstpos     lastpos
  a (1)                       {1}          {1}
  b (2)                       {2}          {2}
  |                           {1, 2}       {1, 2}
  * (nullable)                {1, 2}       {1, 2}
  a (3)                       {3}          {3}
  concatenation (a/b)*a       {1, 2, 3}    {3}
  b (4)                       {4}          {4}
  concatenation (a/b)*ab      {1, 2, 3}    {4}
  b (5)                       {5}          {5}
  concatenation (a/b)*abb     {1, 2, 3}    {5}
  # (6)                       {6}          {6}
  concatenation (root)        {1, 2, 3}    {6}

FROM REGULAR EXPRESSION TO DFA DIRECTLY: EXAMPLE
  (a/b)* a b b #, with positions 1 2 3 4 5 6
  Node   Symbol   followpos
  1      a        {1, 2, 3}
  2      b        {1, 2, 3}
  3      a        {4}
  4      b        {5}
  5      b        {6}
  6      #        -

FROM RE TO DFA DIRECTLY
Let {1,2,3} = A.
  (A, a) = ({1,2,3}, a)   = followpos(1) ꓴ followpos(3) = {1,2,3,4} = B
  (A, b) = ({1,2,3}, b)   = followpos(2)                = {1,2,3}   = A
  (B, a) = ({1,2,3,4}, a) = followpos(1) ꓴ followpos(3) = {1,2,3,4} = B
  (B, b) = ({1,2,3,4}, b) = followpos(2) ꓴ followpos(4) = {1,2,3,5} = C
  (C, a) = ({1,2,3,5}, a) = followpos(1) ꓴ followpos(3) = {1,2,3,4} = B
  (C, b) = ({1,2,3,5}, b) = followpos(2) ꓴ followpos(5) = {1,2,3,6} = D
  (D, a) = ({1,2,3,6}, a) = followpos(1) ꓴ followpos(3) = {1,2,3,4} = B
  (D, b) = ({1,2,3,6}, b) = followpos(2)                = {1,2,3}   = A

  State           a   b
  A (1,2,3)       B   A
  B (1,2,3,4)     B   C
  C (1,2,3,5)     B   D
  D (1,2,3,6)     B   A
The resulting DFA starts in state {1,2,3} and moves between {1,2,3}, {1,2,3,4}, {1,2,3,5} and {1,2,3,6} as given by the table.

DIFFERENT DFA'S FOR (a/b)*abb
DFA with five states:
  State   a   b
  A       B   C
  B       B   D
  C       B   C
  D       B   E
  E       B   C
DFA with four states (start state {1,2,3}):
  State           a   b
  A (1,2,3)       B   A
  B (1,2,3,4)     B   C
  C (1,2,3,5)     B   D
  D (1,2,3,6)     B   A

FROM REGULAR EXPRESSION TO DFA DIRECTLY: FOLLOWPOS
  for each node n in the tree do
      if n is a cat-node with left child c1 and right child c2 then
          for each i in lastpos(c1) do
              followpos(i) := followpos(i) ꓴ firstpos(c2)
          end do
      else if n is a star-node then
          for each i in lastpos(n) do
              followpos(i) := followpos(i) ꓴ firstpos(n)
          end do
      end if
  end do

FROM REGULAR EXPRESSION TO DFA DIRECTLY: ALGORITHM
  s0 := firstpos(root) where root is the root of the syntax tree
  Dstates := {s0} and is unmarked
  while there is an unmarked state T in Dstates do
      mark T
      for each input symbol a do
          let U be the set of positions that are in followpos(p) for some position p in T, such that the symbol at position p is a
          if U is not empty and not in Dstates then
              add U as an unmarked state to Dstates
          end if
          Dtran[T,a] := U
      end do
  end do

SECTION 1.6 MINIMIZATION OF DFA

Question 1: MINIMIZE THE FOLLOWING DFA, IF POSSIBLE
[Diagram: DFA with states A, B, C, D and E over Σ = {a, b}]

USING FINAL AND NON FINAL STATE
Divide the entire set of states into two subsets: the set of final states and the set of non-final states.
Consider each subset as a separate entity and identify whether it needs to be split further or whether its states can be combined.

Question 1: DFA MINIMIZATION USING PARTITIONING METHOD
Draw the transition table corresponding to the given DFA (→ marks the start state, * the final state):
  State   a   b
  → A     B   C
    B     B   D
    C     B   C
    D     B   E
  * E     B   C

Divide the states into two subsets, final and non-final:
  Set of non-final states (NF): {A,B,C,D}
  Set of final states (F): {E}

Check O/P of all clubbed states (A,B,C,D) with Σ=a:
  NF = {A,B,C,D}, F = {E}
  NO SPLIT: all four states go to B.

Check O/P of all clubbed states (A,B,C,D) with Σ=b:
  NF = ({A,B,C}, {D}), F = {E}
  Split into two, since A, B and C go to states within {A,B,C,D} while state D goes to state {E}.

Check O/P of all clubbed states (A,B,C) with Σ=a:
  NF = ({A,B,C}, {D})
  NO SPLIT

Check O/P of all clubbed states (A,B,C) with Σ=b:
  NF = ({A,C}, {B}, {D})
  Split into two, since {A,C} go to state {C} while {B} goes to state {D}, which is already separated.
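The splitting steps above can be sketched as a small refinement loop. This is a hedged illustration, not the lecture's own code: the function name refine and the dictionary encoding of the transition table are assumptions, and the sketch checks all input symbols in one pass instead of one symbol at a time as on the slides. Both orders stop at the same partition.

```python
def refine(delta, alphabet, finals):
    """Split the states into groups until no group can be split any further."""
    states = set(delta)
    # Initial partition: non-final states and final states (empty groups dropped).
    partition = [g for g in (states - set(finals), set(finals)) if g]
    while True:
        group_index = {s: i for i, g in enumerate(partition) for s in g}
        new_partition = []
        for group in partition:
            # States stay together only if every symbol leads them to the same groups.
            buckets = {}
            for s in group:
                key = tuple(group_index[delta[s][a]] for a in alphabet)
                buckets.setdefault(key, set()).add(s)
            new_partition.extend(buckets.values())
        if len(new_partition) == len(partition):   # nothing was split
            return {frozenset(g) for g in new_partition}
        partition = new_partition


# Transition table of Question 1; E is the only final state.
DFA1 = {
    "A": {"a": "B", "b": "C"},
    "B": {"a": "B", "b": "D"},
    "C": {"a": "B", "b": "C"},
    "D": {"a": "B", "b": "E"},
    "E": {"a": "B", "b": "C"},
}
groups = refine(DFA1, "ab", {"E"})
print(sorted(sorted(g) for g in groups))   # [['A', 'C'], ['B'], ['D'], ['E']]
```

The loop can only ever split groups, so comparing the number of groups before and after a pass is enough to detect that the partition has stopped changing.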
Question 1 (continued):
Check O/P of all clubbed states (A,C) with Σ=a:
  NF = ({A,C}, {B}, {D})
  NO SPLIT: both A and C go to state B, which is already separated.

Check O/P of all clubbed states (A,C) with Σ=b:
  NF = ({A,C}, {B}, {D})
  NO SPLIT: both A and C go to the same group {A,C} on Σ=b.
  Since the subset {A,C} remains a single combined state till the end, both states are joined together as a single state.

Transition table before and after minimization:
  Before:                  After:
  State   a   b            State    a   b
  → A     B   C            → A,C    B   A,C
    B     B   D              B      B   D
    C     B   C              D      B   E
    D     B   E            * E      B   A,C
  * E     B   C
Final Output: the DFA given by the "After" table.

Question 2: MINIMIZE THE FOLLOWING DFA, IF POSSIBLE
[Diagram: DFA with states A to H over Σ = {a, b}]

Question 2: DFA MINIMIZATION USING PARTITIONING METHOD
Draw the transition table corresponding to the given DFA:
  State   a   b
  → A     B   F
    B     G   C
  * C     A   C
    D     C   G
    E     H   F
    F     C   G
    G     G   E
    H     G   C

Divide the states into two subsets, final and non-final:
  Set of non-final states (NF): {A,B,D,E,F,G,H}
  Set of final states (F): {C}

Check O/P of all clubbed states (A,B,D,E,F,G,H) with Σ=a:
  NF = {A,B,E,G,H}, {D,F}
  Split into two, since {A,B,E,G,H} go to states within the non-final set while {D,F} go to state {C}.

Check O/P of all clubbed states (A,B,E,G,H) with Σ=a:
  NO SPLIT

Check O/P of all clubbed states (A,B,E,G,H) with Σ=b:
  NF = {A,E}, {G}, {B,H}, {D,F}
  Split into three, since A and E go to F (in {D,F}), B and H go to C, and G goes to E.

Check O/P of all clubbed states (A,E) with Σ=a: NO SPLIT
Check O/P of all clubbed states (A,E) with Σ=b: NO SPLIT
Check O/P of all clubbed states (B,H) with Σ=a: NO SPLIT
Check O/P of all clubbed states (B,H) with Σ=b: NO SPLIT
Check O/P of all clubbed states (D,F) with Σ=a: NO SPLIT
Check O/P of all clubbed states (D,F) with Σ=b: NO SPLIT
  NF = {A,E}, {G}, {B,H}, {D,F}

Transition table after minimization:
  State    a     b
  → A,E    B,H   D,F
    B,H    G     C
  * C      A,E   C
    D,F    C     G
    G      G     A,E
Final Output: the DFA given by the table after minimization.
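Once the partition is stable, the last step is to replace every state by its group. The sketch below does this for Question 2, taking the final partition {A,E}, {B,H}, {D,F}, {G}, {C} as given. The names DFA2, PARTITION and merge_states and the string encoding of groups are assumptions made for this illustration.

```python
# Transition table of Question 2; A is the start state, C the final state.
DFA2 = {
    "A": {"a": "B", "b": "F"},
    "B": {"a": "G", "b": "C"},
    "C": {"a": "A", "b": "C"},
    "D": {"a": "C", "b": "G"},
    "E": {"a": "H", "b": "F"},
    "F": {"a": "C", "b": "G"},
    "G": {"a": "G", "b": "E"},
    "H": {"a": "G", "b": "C"},
}
# Each group is written as a string of its member states.
PARTITION = ["AE", "BH", "DF", "G", "C"]


def merge_states(delta, partition, alphabet):
    """Build the transition table whose states are the groups of `partition`."""
    group_of = {s: name for name in partition for s in name}
    merged = {}
    for name in partition:
        merged[name] = {}
        for a in alphabet:
            targets = {group_of[delta[s][a]] for s in name}
            if len(targets) != 1:
                # Members disagree, so the partition was not fully refined.
                raise ValueError(f"group {name} still needs splitting on {a}")
            merged[name][a] = targets.pop()
    return merged


for name, row in merge_states(DFA2, PARTITION, "ab").items():
    print(name, row["a"], row["b"])
```

The ValueError branch doubles as a check on the hand-computed partition: if any group still contained states that behave differently on some symbol, the merge would fail instead of silently producing a wrong table.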