Document Details

FortuitousSanctuary7499

Uploaded by FortuitousSanctuary7499

Dr. Salah Eldin Shaban

Tags

lexical analysis compiler design programming languages computer science

Summary

This document provides an outline of Lexical Analysis, a crucial step in compiler design. It covers the role of the lexical analyzer, specification of tokens, recognition of tokens, and finite automata. The document aims to provide a comprehensive understanding of these concepts.

Full Transcript

Chapter Two: Lexical Analysis Outlines:  The Role of the Lexical Analyzer  Specification of Tokens  Recognition of Tokens  Finite Automata Dr. Salah Eldin Shaban 1 The Role of Lexical Analyzer  The Front-end of a compil...

Chapter Two: Lexical Analysis Outlines:  The Role of the Lexical Analyzer  Specification of Tokens  Recognition of Tokens  Finite Automata Dr. Salah Eldin Shaban 1 The Role of Lexical Analyzer  The Front-end of a compiler reads and parses source program.  Lexical Analyzer or Scanner  The first phase of a compiler,  Reads the input Characters of the source program, groups them into Lexemes, and produces as output a sequence of Tokens for each lexeme. The parser uses Tokens for syntax analysis.  Construct tables including a table of identifiers (symbol table) and a table of numeric constants.  Reasons to make it a separate phase are:  Simplifies the design of the compiler.  Provides efficient implementation.  Improves portability. Dr. Salah Eldin Shaban 2 The Role of Lexical Analyzer Cont.  The Interaction of the Lexical Analyzer with the Parser is: Dr. Salah Eldin Shaban 3 Tokens, Patterns, and Lexemes  A Token is a classification of lexical units. It is a pair consisting of a token name and an optional attribute value.  For example: keyword, id and num  The token names are the input symbols that the parser processes.  A Pattern is a description of the form that the lexemes of a token may take.  For example an id pattern is: “letter followed by letters and digits” and “non-empty sequence of digits”  A Lexeme is a sequence of characters in the source program that matches the pattern for a token and is identified by the lexical analyzer as an instance of that token.  For example: abc and 123 Dr. Salah Eldin Shaban 4 Tokens, Patterns, and Lexemes Cont.  Examples of Tokens. Dr. Salah Eldin Shaban 5 Tokens, Patterns, and Lexemes Cont.  Examples show the difference between Token, Lexeme and Pattern. Token Lexeme Pattern if if if relation = < or or >= id Y,x Letter followed by letters and digits num 31, 28 Any numeric constant operator +, *, -, / Any arithmetic operator + or * or – or / Dr. Salah Eldin Shaban 6 Tokens, Patterns, and Lexemes Cont.  Attributes of Tokens: Dr. Salah Eldin Shaban 7 Tokens, Patterns, and Lexemes Cont.  Examples of token classes (names) are: 1. Keywords 2. Identifiers 3. Operators 4. Numeric constants 5. Character constants 6. Special characters 7. Comments 8. White space 9. Newline for ( count = 1 ; count = x2 + 3.4e+6 ; count = count + 1 ) //outer loop 1 6 2 3 46 2 3 2 3 4 6 2 3 2 3 46 7 Dr. Salah Eldin Shaban 8 Specification of Tokens  Regular Expressions are an important notation for specifying lexeme’s patterns.  While they cannot express all possible patterns, they are very effective in specifying those types of patterns that we need for tokens.  We consider:  The formal notation for regular expressions, and  How these expressions are used in a lexical-analyzer generator.  We show:  How to build the lexical analyzer by converting regular expressions to automata that perform the recognition of the specified tokens. Dr. Salah Eldin Shaban 9 Alphabets  In discrete mathematics, a set is a collection of unique objects.  Each element listed only once and may be listed in any order.  e.g., {boy, girl, animal} is a set of words, but it represents the same set as {girl, boy, animal}.  A set may contain an infinite number of objects.  The empty set contains no elements and designate it either by { } or .  Everything in mathematics or in automata is based on symbols.  The symbols are generally letters or digits. Dr. Salah Eldin Shaban 10 Alphabets Cont.  An alphabet is a finite and nonempty set of symbols.  For example, a set of decimal numbers alphabet S = {0, 1, 2, …, 9} an alphabet  S is representation for the alphabet.  Another example, a set of binary alphabet S = {0, 1} an alphabet S = {A, B, … Z} an alphabet of upper English letters S = {a, b, … z} an alphabet of lower English letters Dr. Salah Eldin Shaban 11 Strings  A string (or sometimes a word) is a list of characters or a finite sequence of symbols from a given alphabet.  For example, if S {a, b} is an alphabet then abab is a string over alphabet S. A string is generally denoted by w.  Elements of a string need not be unique, and their listed order is important. For example, "abc" and "cba" are different strings.  The length of the string is denoted by |w| and it is the number of positions for the symbols in the string.  The empty or null string is the string with zero occurrence of symbols. It consists of no characters and designate it by e, or l, or g. Dr. Salah Eldin Shaban 12 Strings Cont.  The set of strings, including the empty, over an alphabet S is denoted by S* For examples, S* = {e, 0, 1, 01, 10, 000, 010, 0000, 1111000...} of S = {0,1} S* = {e, 0, 00, 000, 0000, 00000,...} of S = {0} S* = {e, 1, 11, 111, 1111, 11111,...} of S = {1}  e is in S* regardless of what alphabet S is.  e is only string whose length is 0  For example, if S = {0, 1}, then S1 = {0, 1}, S2 = {00, 01, 10, 11}, S3 = {000, 001, 010, 100, 011, 101, 110, 111}, … etc.  The set of non-empty strings from S is denoted by S+ S + = S1  S2  S 3  … S* = S+  {e} Dr. Salah Eldin Shaban 13 Strings Cont.  The concatenation of two strings w1 and w2 is denoted by w1 w2  It is the string formed by making a copy of w1 and followed by a copy of w2 For example, if w1 = abc and w2 = xyz then w1 w2 = abc xyz  The exponentiation of string w is defined by w = wi = e for i = 0 w = wi-1 w for i ≥ 1 we=ew=w  The reverse of the string can be achieved by flipping over last symbols. For example, if w = abc then wR = cba |w| = |wR| Dr. Salah Eldin Shaban 14 Strings Cont.  Some commonly used string-related terms are:  A prefix of string s is any string obtained by removing zero or more symbols from the end of s. Examples, ban and e are prefixes of banana.  A suffix of string s is any string obtained by removing zero or more symbols from the beginning of s. Example, nana is a suffix of banana.  A substring of s is obtained by deleting any prefix and any suffix from s. For instance, banana, nan, and e are substrings of banana.  The proper prefixes, suffixes, and substrings of a string s are those, prefixes, suffixes, and substrings, respectively of s that are not e or not equal to s itself.  A subsequence of s is any string formed by deleting zero or more not necessarily consecutive positions of s. For example, baan is a subsequence of banana. Dr. Salah Eldin Shaban 15 Languages  A language is a set of strings all of which are chosen from some S*, whose S is a specific alphabet.  If S is an alphabet, and L  S*, then L is said to be a language over alphabet S  S* is a language for any alphabet S  f, the empty language, is a language over any alphabet (no strings).  {e}, the language consisting of only the empty string, is also a language over any alphabet (one string). So, f  {e}  All alphabets are finite.  Languages can have an infinite number of strings. Dr. Salah Eldin Shaban 16 Languages Cont.  A Natural language is normally spoken by people.  A Formal language can be specified precisely and is used with computers e.g. the syntax of a programming language.  A formal language is a set of strings from a given alphabet.  Examples of languages from the alphabet {0, 1}: finite sets such as {0, 10, 1011}, and { } Infinite sets such as {e, 0, 00, 000, 0000, 00000,...} The set of all strings of zeroes and ones having an even number of ones. Dr. Salah Eldin Shaban 17 Languages Cont.  The most important operations on languages are union, concatenation, and closure. If we assume L and M languages, then Dr. Salah Eldin Shaban 18 Languages Cont.  For example: Let L be the set of letters {A, B, … , Z, a, b, …}and let D be the set of digits {0, 1, 2, …, 9}.  Some other languages can be constructed from languages L and D, using the above operators as:  L Ս D is the set of letters and digits.  LD is the set of 520 strings of length two, each consisting of one letter followed by one digit.  L4 is the set of all 4-letter strings.  L* is the set of all strings of letters, including e, the empty string.  L(L Ս D)* is the set of all strings of letters and digits beginning with a letter.  D+ is the set of all strings of one or more digits. Dr. Salah Eldin Shaban 19 Regular Expressions  An Integer Constant token might be defined in English as: "At least one decimal digit, followed by zero or more additional digits.“  "zero or more" recognized in Kleene star notation as * So, an integer constant may be represented as "dd*“ where d represents a digit.  This means a constant is one digit followed by zero or more digits.  Similarly, an identifier may consist of one letter (represented as a) followed by zero or more of either letters or digits.  If we use the vertical bar "|" to mean "either/or," then we can express the idea "either an a or a d, but not both" by writing "a | d".  So, an identifier succinctly can be represented as "a(a | d)*".  Such compact notation is called a Regular Expression. Dr. Salah Eldin Shaban 20 Regular Expression Cont.  The rules define the regular expressions over some alphabet Ʃ and the languages that those expressions denote are: 1. ε is a regular expression, and L(ε) is {ε}; the language of empty string. 2. If a is a symbol in Ʃ, then a is a regular expression, and L(a) = {a}; the language with one string, of length one, with a in its one position.  Suppose r and s are regular expressions denoting L(r) and L(s).  (r) | (s) is a regular expression denoting the language L(r) Ս L(s).  (r) (s) is a regular expression denoting the language L(r) L(s).  (r)* is a regular expression denoting (L(r))*.  (r) is a regular expression denoting L(r). Adding additional parentheses around expressions without changing the language they denote. Dr. Salah Eldin Shaban 21 Regular Expression Cont.  The unary operator * has highest precedence and is left associative.  Concatenation has second highest precedence and is left associative.  | has lowest precedence and is left associative.  So, we may replace the regular expression (a) | ((b)* (c)) by a | b* c.  The set of strings that are either a single a or are zero or more b's followed by one c.  Example: Let Ʃ = {a, b}.  The regular expression a | b denotes the language {a, b}.  (a | b) (a | b) denotes {aa, ab, ba, bb}, the language of all strings of length two over the alphabet Ʃ. Dr. Salah Eldin Shaban 22 Regular Expression Cont.  Another regular expression for the same language is aa | ab | ba | bb.  a* denotes the language consisting of all strings of zero or more a's, that is, {a , aa , aaa, …}.  a | a* b denotes the language {a, b, ab, aab, aaab, …}, that is, the string a and all strings consisting of zero or more a's and ending in b.  (a | b)* denotes the set of all strings consisting of zero or more instances of a or b, that is, all strings of a's and b's: {a, b, aa, ab, ba, bb, aaa, …}.  Another regular expression for the same language is (a*b*)*. Dr. Salah Eldin Shaban 23 Regular Expression Cont.  Expressions consisting of three possible operations on languages. Union, Concatenation, and Kleene star  Union ('+') – since a language is a set, the union of two sets is that set which contains all the elements in each of the two sets; e.g., {abc, ab, ba} + {ba, bb} = {abc, ab, ba, bb}.  The union of any language with the empty set is that language: L+ { } = L  Concatenation (a raised dot) of languages is the same as that of strings. This is simply the juxtaposition of two strings forming a new string. e.g., abc. ba = abcba  Any string concatenated with the null string is that string itself: s. e = s.  The concatenation of two languages is that language formed by concatenating each string in one language with each string in the other language. Dr. Salah Eldin Shaban 24 Regular Expression Cont.  For example, {ab, a, c}. {b, e} = {ab. b, ab. e, a. b, a. e, c. b, c. e} = {abb, ab, a, cb, c}  If L1 and L2 are two languages, then L1. L2 is not necessarily equal to L2. L1. Also, L. e = L, but L. f = f.  Kleene * - This operation is a unary operation (designated by a postfix asterisk) and is often called closure. If L is a language, we define: L0 = {e}, L1 = L, L2 = L. L, Ln = L. Ln-1 L* = L0 + L1 + L2 + L3 + L4 + L5 + …  The f0 = {e}. Dr. Salah Eldin Shaban 25 Regular Expression Cont.  If x is a character in the input alphabet, then x = {"x"}; i.e., the character x represents the set consisting of one string of length 1 consisting of the character x. This simplifies some of the regular expressions will write: 0 + 1 = {0} + {1} = {0, 1} 0 + e = {0, e}  If L1, L2, and L3 are languages, then: L1 + L2. L3 = L1 + (L2. L3) L1. L2* = L1. (L2* )  An example of a regular expression is: (0+1)*  Let L = {0, 1}, to find L*: L0 = {e} L1 = {0, 1} L2 = L. L1 = {00, 01, 10, 11} L3 = L. L2 = {000, 001, 010, 011, 100, 101, 110, 111} L* = {e, 0, 1, 00, 01, 10, 11, 000, 001, 010, 011, 100, 101, 110, 111} = the set of all strings of zeros and ones. Dr. Salah Eldin Shaban 26 Regular Expression Cont.  Another example: 1.(0+1)*0 = 1(0+1)*0 = {10, 100, 110, 1000, 1010, 1100, 1110, …} = the set of all strings of zeros and ones which begin with a 1 and end with a 0.  We do not need to be concerned with the order of evaluation of several concatenations in one regular expression, since it is an associative operation. The same is true of union: L. (L. L) = (L. L). L L+ (L + L) = (L + L) + L  List six strings for each of the following regular expressions: Solution: (a(b+c)*)*d ad abd acd aad abbcbd (a+b)*(c+d) c d ac abd babc bad (a*b*)* e a b ab ba aa Note that (a*b*)* = (a+b)* Dr. Salah Eldin Shaban 27 Algebraic Laws for REs Dr. Salah Eldin Shaban 28 Algebraic Lows for REs Cont.  Consider the regular expression a (b a | c a) | (a c | a b) a = (aba|aca) | (aca|aba) (6,7) = (aba|(aca|(aca|aba)) (2) = aba|((aca|aca)|aba) (2) = aba|(aca|aba) (3) = aba|(aba|aca) (1) = (aba|aba)|aca) (2) = aba|aca (3) = a(b|c)a (6,7) Dr. Salah Eldin Shaban 29 Regular Definitions  If Ʃ is an alphabet of basic symbols, then a regular definition is a sequence of definitions of the form: d 1 → r1 d 2 → r2 … d n → rn  where:  Each di is a new symbol, not in Ʃ and not the same as any other of the d's, and  Each ri is a regular expression over the alphabet Ʃ Ս {d1, d2, …, di-1} Dr. Salah Eldin Shaban 30 Regular Definitions Cont.  Example: Identifiers  Example: Unsigned numbers (integer or floating point) Dr. Salah Eldin Shaban 31 Extensions of Regular Expressions  One or more instances: (r)+  Zero of one instances: r?  Character classes: [abc]  Example:  Example: Dr. Salah Eldin Shaban 32 Recognition of Tokens  Starting point is the language grammar to understand the tokens: stmt → if expr then stmt | if expr then stmt else stmt | e expr → term relop term | term term → id | number  The next step is to formalize the patterns: Dr. Salah Eldin Shaban 33 Recognition of Tokens Cont.  Also, handle whitespaces: ws → ( blank | tab | newline )+  The lexical analyzer goal is summarized as: Dr. Salah Eldin Shaban 34 Recognition of Tokens Cont.  Transition Diagrams:  Transition diagram for relop Dr. Salah Eldin Shaban 35 Recognition of Tokens Cont.  Transition diagram for reserved words and identifiers Dr. Salah Eldin Shaban 36 Recognition of Tokens Cont.  Transition diagram for unsigned numbers  Transition diagram for whitespace Dr. Salah Eldin Shaban 37 Finite Automata (FA)  Regular Expressions = Specification  Finite Automata = Implementation  A FSM is a mathematical or hypothetical method.  It is described in mathematical terms.  Its operation should be perfectly clear.  A FSM consists of:  A finite set of states S, one of which is designated the starting state, and zero or more of which are designated accepting states F  S.  The starting state may also be an accepting state.  A state transition function which has two arguments – a state and an input symbol (from a given input alphabet S) –and returns as result a state. state input state Dr. Salah Eldin Shaban 38 Finite Automata (FA) Cont.  Finite state machines (automata) (FSMs or FSA) mimic a computer that:  Sequentially, executes a finite number of instructions according to an algorithm,  Accepts a valid input and  Produces an output if the input is accepted.  If an FSM winds up in one of a set of final states the input strings is considered to be accepted. The language accepted by the machine is the set of strings, it accepts. Dr. Salah Eldin Shaban 39 Finite Automata (FA) Cont. Input Tape String Output “Accept” Finite or Automaton “Reject” Dr. Salah Eldin Shaban 40 Finite Automata (FA) Cont.  An FSA may be either deterministic (DFSA or DFA) or non- deterministic (NFSA or NFA).  An FSA is deterministic if its behavior during recognition is fully determined by the state it is in and the symbol to be consumed.  i.e., given an input string, only one path may be taken through the FSA.  Conversely, a FSA is non-deterministic if, given an input string, more than one path may be taken through the FSA.  One type of non-determinism is e-transitions, i.e. transitions which consume the empty string (no symbols). Dr. Salah Eldin Shaban 41 Finite Automata (FA) Cont.  In the designing of the FSA:  Firstly, analyze the set of strings i.e. language which is accepted by the FSA.  Make sure that every state is check for the output state for every input symbol of S.  Make sure that no state must have two different output state for a single input symbol.  Make sure that there is one initial state and at least one final state in transition diagram of FSA.  Start transition diagram drawing by satisfying the minimum condition of strings in the language. Dr. Salah Eldin Shaban 42 Transition Graph a, b q5 a a, b b a b q0 a q1 b q2 b q3 a q4 initial accepting state state transition state Dr. Salah Eldin Shaban Alphabet S  {a , b } a, b q5 a a, b b a b q0 a q1 b q2 b q3 a q4 For every state, there is a transition for every symbol in the alphabet Dr. Salah Eldin Shaban head Initial Configuration Input Tape a b b a Input String a, b q5 a a, b b a b q0 a q1 b q2 b q3 a q4 Initial state Dr. Salah Eldin Shaban Scanning the Input a b b a a, b q5 a a, b b a b q0 a q1 b q2 b q3 a q4 Dr. Salah Eldin Shaban a b b a a, b q5 a a, b b a b q0 a q1 b q2 b q3 a q4 Dr. Salah Eldin Shaban a b b a a, b q5 a a, b b a b q0 a q1 b q2 b q3 a q4 Dr. Salah Eldin Shaban Input finished a b b a a, b q5 a a, b b a b q0 a q1 b q2 b q3 a q4 accept Dr. Salah Eldin Shaban A Rejection Case a b a Input String a, b q5 a a, b b a b q0 a q1 b q2 b q3 a q4 Dr. Salah Eldin Shaban a b a a, b q5 a a, b b a b q0 a q1 b q2 b q3 a q4 Dr. Salah Eldin Shaban a b a a, b q5 a a, b b a b q0 a q1 b q2 b q3 a q4 Dr. Salah Eldin Shaban Input finished a b a a, b reject q5 a a, b b a b q0 a q1 b q2 b q3 a q4 Dr. Salah Eldin Shaban Another Rejection Case Tape is empty (l ) Input Finished a, b q5 a a, b b a b q0 a q1 b q2 b q3 a q4 reject Dr. Salah Eldin Shaban Language Accepted: L  {abba } a, b q5 a a, b b a b q0 a q1 b q2 b q3 a q4 Dr. Salah Eldin Shaban To accept a string: all the input string is scanned and the last state is accepting To reject a string: all the input string is scanned and the last state is non-accepting Dr. Salah Eldin Shaban Another Example L  {l, ab, abba } a, b q5 b a a a, b b q0 a q1 b q2 b q3 a q4 Accept Accept Accept state state state Dr. Salah Eldin Shaban Empty Tape (l ) Input Finished a, b q5 b a a a, b b q0 a q1 b q2 b q3 a q4 accept Dr. Salah Eldin Shaban Another Example a a, b q0 b q1 a, b q2 Accept trap state state Dr. Salah Eldin Shaban a a b Input String a a, b q0 b q1 a, b q2 Dr. Salah Eldin Shaban a a b a a, b q0 b q1 a, b q2 Dr. Salah Eldin Shaban a a b a a, b q0 b q1 a, b q2 Dr. Salah Eldin Shaban Input finished a a b a a, b accept q0 b q1 a, b q2 Dr. Salah Eldin Shaban A Rejection Case b a b Input String a a, b q0 b q1 a, b q2 Dr. Salah Eldin Shaban b a b a a, b q0 b q1 a, b q2 Dr. Salah Eldin Shaban b a b a a, b q0 b q1 a, b q2 Dr. Salah Eldin Shaban Input finished b a b a a, b q0 b q1 a, b q2 reject Dr. Salah Eldin Shaban Language Accepted: L  {a b : n  0 } n a a, b q0 b q1 a, b q2 Dr. Salah Eldin Shaban Another Example Alphabet: S  {1} 1 q0 q1 1 Language Accepted: EVEN  {x : x  S and x is even} *  { l, 11, 1111, 111111, } Dr. Salah Eldin Shaban Formal Definition Deterministic Finite Automaton (DFA) M  Q, S,  , q0 , F  Q : set of states S : input alphabet l S  : transition function q0 : initial state F : set of accepting states Dr. Salah Eldin Shaban Set of States Q Example Q  {q0 , q1, q2 , q3 , q4 , q5 } a, b q5 a a, b b a b q0 a q1 b q2 b q3 a q4 Dr. Salah Eldin Shaban Input Alphabet S l S :the input alphabet never contains l Example S  {a, b} a, b q5 b a a, b a b q0 a q1 b q2 b q3 a q4 Dr. Salah Eldin Shaban Initial State q0 Example a, b q5 a a, b b a b q0 a q1 b q2 b q3 a q4 Dr. Salah Eldin Shaban Set of Accepting States F  Q Example F  {q4 } a, b q5 a a, b b a b q0 a q1 b q2 b q3 a q4 Dr. Salah Eldin Shaban Transition Function  :QS Q  (q , x )  q  x q q Describes the result of a transition from state q with symbol x Dr. Salah Eldin Shaban Example:  q0 , a   q1 a, b q5 b a a, b a b q0 a q1 b q2 b q3 a q4 Dr. Salah Eldin Shaban  q0 , b   q5 a, b q5 b a a, b a b q0 a q1 b q2 b q3 a q4 Dr. Salah Eldin Shaban  q2 , b   q3 a, b q5 b a a, b a b q0 a q1 b q2 b q3 a q4 Dr. Salah Eldin Shaban Transition Table for  symbols  a b q0 q1 q5 q1 q5 q2 states q2 q5 q3 a, b q3 q4 q5 q4 q5 q5 q5 q5 q5 q5 a, b b a a b q0 a q1 b q2 b q3 a q4 Dr. Salah Eldin Shaban Extended Transition Function  :Q  S Q * *  (q ,w )  q  * Describes the resulting state after scanning string w from state q Dr. Salah Eldin Shaban Example:  q0 , ab   q2 * a, b q5 b a a, b a b q0 a q1 b q2 b q3 a q4 Dr. Salah Eldin Shaban  q0 , abbbaa   q5 * a, b q5 b a a, b a b q0 a q1 b q2 b q3 a q4 Dr. Salah Eldin Shaban  q1, bba   q4 * a, b q5 b a a, b a b q0 a q1 b q2 b q3 a q4 Dr. Salah Eldin Shaban Special case: for any state q  q , l   q * Dr. Salah Eldin Shaban In general:  q ,w   q  * implies that there is a walk of transitions w   1 2  k 1 2 k q q states may be repeated q w q Dr. Salah Eldin Shaban Language Accepted by DFA Language of DFA M: it is denoted as LM  and contains all the strings accepted by M We say that a language L  is accepted (or recognized) by DFA M if L M  L   Dr. Salah Eldin Shaban For a DFA M  Q, S,  , q0 , F  Language accepted by M: { LM   w  S :  q0 ,w   F * * } q0 w q q  F Dr. Salah Eldin Shaban Language rejected by M: { LM   w  S :  q0 ,w   F * * } q0 w q q  F Dr. Salah Eldin Shaban More DFA Examples S  {a , b } a, b a, b q0 q0 L (M )  { } L (M )  S * Empty language All strings Dr. Salah Eldin Shaban S  {a , b } a, b q0 a, b q0 L(M )  { l } Language of the empty string Dr. Salah Eldin Shaban S  {a , b } LM = { all strings with prefix ab } a, b q0 a q1 b q2 b a accept q3 a, b Dr. Salah Eldin Shaban LM  = { all binary strings containing substring 001 } 0,1 1 0 1 l 0 0 00 1 001 0 Dr. Salah Eldin Shaban LM  = { all binary strings without substring 001 } 1 0 0,1 1 0 1 l 0 00 001 0 Dr. Salah Eldin Shaban { L(M )  awa : w  {a , b } * } a b b q0 a q2 q3 b a q4 a, b Dr. Salah Eldin Shaban Regular Languages Definition: A language L is regular if there is a DFA M that accepts it ( L(M )  L) The languages accepted by all DFAs form the family of regular languages Dr. Salah Eldin Shaban Example regular languages: {abba} {l , ab, abba} {a b : n  0} {awa : w  {a , b} } n * { all strings in {a,b}* with prefix ab } { all binary strings without substring 001} {x : x  {1} * and x is even} { } { l } {a , b } * There exist automata that accept these languages (see previous slides). Dr. Salah Eldin Shaban There exist languages which are not Regular: L {a b : n  0} n n ADDITION  {x + y  z : x  1 , y  1 , z  1 , n m k n+m k} There is no DFA that accepts these languages Dr. Salah Eldin Shaban Nondeterministic Finite Automaton (NFA) Alphabet = {a} q1 a q2 a q0 a q3 Dr. Salah Eldin Shaban Alphabet = {a} Two choices q1 a q2 a q0 a q3 Dr. Salah Eldin Shaban Alphabet = {a} Two choices q1 a q2 No transition a q0 a q3 No transition Dr. Salah Eldin Shaban First Choice a a q1 a q2 a q0 a q3 Dr. Salah Eldin Shaban First Choice a a q1 a q2 a q0 a q3 Dr. Salah Eldin Shaban First Choice a a All input is consumed q1 a q2 “accept” a q0 a q3 Dr. Salah Eldin Shaban Second Choice a a q1 a q2 a q0 a q3 Dr. Salah Eldin Shaban Second Choice a a Input cannot be consumed q1 a q2 a q0 Automaton Halts a q3 “reject” Dr. Salah Eldin Shaban An NFA accepts a string: if there is a computation of the NFA that accepts the string i.e., all the input string is processed and the automaton is in an accepting state Dr. Salah Eldin Shaban aa is accepted by the NFA: “accept” q1 a q2 q1 a q2 a a q0 q0 a a q3 q3 “reject” because this this computation computation is ignored accepts aa Dr. Salah Eldin Shaban Rejection example a q1 a q2 a q0 a q3 Dr. Salah Eldin Shaban First Choice a “reject” q1 a q2 a q0 a q3 Dr. Salah Eldin Shaban Second Choice a q1 a q2 a q0 a q3 Dr. Salah Eldin Shaban Second Choice a q1 a q2 a q0 a q3 “reject” Dr. Salah Eldin Shaban Another Rejection example a a a q1 a q2 a q0 a q3 Dr. Salah Eldin Shaban First Choice a a a q1 a q2 a q0 a q3 Dr. Salah Eldin Shaban First Choice a a a Input cannot be consumed q1 a q2 “reject” a q0 a Automaton halts q3 Dr. Salah Eldin Shaban Second Choice a a a q1 a q2 a q0 a q3 Dr. Salah Eldin Shaban Second Choice a a a Input cannot be consumed q1 a q2 a q0 Automaton halts a q3 “reject” Dr. Salah Eldin Shaban An NFA rejects a string: if there is no computation of the NFA that accepts the string. For each computation: All the input is consumed and the automaton is in a non final state OR The input cannot be consumed Dr. Salah Eldin Shaban a is rejected by the NFA: “reject” q1 a q2 a q1 q2 a a q0 q0 a a q3 “reject” q3 All possible computations lead to rejection Dr. Salah Eldin Shaban aaa is rejected by the NFA: “reject” q1 a q2 q1 a q2 a a q0 q0 a a q3 q3 “reject” All possible computations lead to rejection Dr. Salah Eldin Shaban Language accepted: L  {aa} q1 a q2 a q0 a q3 Dr. Salah Eldin Shaban Lambda Transitions q0 a q1 l q2 a q3 Dr. Salah Eldin Shaban a a q0 a q1 l q2 a q3 Dr. Salah Eldin Shaban a a q0 a q1 l q2 a q3 Dr. Salah Eldin Shaban input tape head does not move a a q0 a q1 l q2 a q3 Dr. Salah Eldin Shaban all input is consumed a a “accept” q0 a q1 l q2 a q3 String aa is accepted Dr. Salah Eldin Shaban Rejection Example a a a q0 a q1 l q2 a q3 Dr. Salah Eldin Shaban a a a q0 a q1 l q2 a q3 Dr. Salah Eldin Shaban (read head doesn’t move) a a a q0 a q1 l q2 a q3 Dr. Salah Eldin Shaban Input cannot be consumed a a a Automaton halts “reject” q0 a q1 l q2 a q3 String aaa is rejected Dr. Salah Eldin Shaban Language accepted: L  {aa} q0 a q1 l q2 a q3 Dr. Salah Eldin Shaban Another NFA Example q0 a q1 b q2 l q3 l Dr. Salah Eldin Shaban a b q0 a q1 b q2 l q3 l Dr. Salah Eldin Shaban a b q0 a q1 b q2 l q3 l Dr. Salah Eldin Shaban a b “accept” q0 a q1 b q2 l q3 l Dr. Salah Eldin Shaban Another String a b a b q0 a q1 b q2 l q3 l Dr. Salah Eldin Shaban a b a b q0 a q1 b q2 l q3 l Dr. Salah Eldin Shaban a b a b q0 a q1 b q2 l q3 l Dr. Salah Eldin Shaban a b a b q0 a q1 b q2 l q3 l Dr. Salah Eldin Shaban a b a b q0 a q1 b q2 l q3 l Dr. Salah Eldin Shaban a b a b q0 a q1 b q2 l q3 l Dr. Salah Eldin Shaban a b a b “accept” q0 a q1 b q2 l q3 l Dr. Salah Eldin Shaban Language accepted L  {ab, abab, ababab,...} +  {ab} q0 a q1 b q2 l q3 l Dr. Salah Eldin Shaban Another NFA Example 0 q0 q1 0, 1 q2 1 l Dr. Salah Eldin Shaban Language accepted L(M ) = {λ, 10, 1010, 101010,...} = {10} * 0 q0 q1 0, 1 q2 1 (redundant state) l Dr. Salah Eldin Shaban Remarks: The l symbol never appears on the input tape Simple automata: M1 M2 q0 q0 L(M1 ) = {} L(M 2 ) = {λ} Dr. Salah Eldin Shaban NFAs are interesting because we can express languages easier than DFAs NFA M1 DFA M2 a q2 q0 a q1 a q0 a q1 L( M1 ) = {a} L( M 2 ) = {a} Dr. Salah Eldin Shaban Formal Definition of NFAs M  Q, S,  , q0 , F  Q : Set of states, i.e. {q0 , q1, q2 } S: Input aplhabet, i.e. {a, b} l S : Transition function q0 : Initial state F: Accepting states Dr. Salah Eldin Shaban Transition Function   q , x   {q1 , q2,, qk } q1 x resulting states with q x q1 following one transition x with symbol x qk Dr. Salah Eldin Shaban  q0 , 1  {q1} 0 q0 q1 0, 1 q 2 1 l Dr. Salah Eldin Shaban  (q1,0)  {q0 , q2} 0 q0 q1 0, 1 q 2 1 l Dr. Salah Eldin Shaban  (q0 , l )  {q2 } 0 q0 q1 0, 1 q 2 1 l Dr. Salah Eldin Shaban  (q2 ,1)   0 q0 q1 0, 1 q 2 1 l Dr. Salah Eldin Shaban * Extended Transition Function  Same with  but applied on strings  q0 , a   {q1 } * q4 q5 a a q0 a q1 b q2 l q3 l Dr. Salah Eldin Shaban  q0 , aa  {q4 , q5 } * q4 q5 a a q0 a q1 b q2 l q3 l Dr. Salah Eldin Shaban  * q0 , ab   {q2, q3, q0 } q4 q5 a a q0 a q1 b q2 l q3 l Dr. Salah Eldin Shaban Special case: for any state q q   q , l  * Dr. Salah Eldin Shaban In general q j   qi ,w  : there is a walk from qi to q j * with label w qi w qj w   1 2  k 1 2 k qi qj Dr. Salah Eldin Shaban The Language of an NFA M The language accepted by M is: LM   {w1 ,w2,...wn } where  (q0 ,w m )  {qi ,..., qk , , q j } * and there is some qk  F (accepting state) Dr. Salah Eldin Shaban w m  LM   (q0 ,wm ) * qi wm q0 w qk qk  F m wm qj Dr. Salah Eldin Shaban F  {q0 ,q5 } q4 q5 a a q0 a q1 b q2 l q3 l {  q0 , aa  q4 , q5 * } aa  L(M ) F Dr. Salah Eldin Shaban F  {q0 ,q5 } q4 q5 a a q0 a q1 b q2 l q3 l {  q0 , ab   q2 , q3, q0 * } ab  LM  F Dr. Salah Eldin Shaban F  {q0 ,q5 } q4 q5 a a q0 a q1 b q2 l q3 l {  q0 , abaa   q4 , q5 * } aaba  L(M ) F Dr. Salah Eldin Shaban F  {q0 ,q5 } q4 q5 a a q0 a q1 b q2 l q3 l  q0 , aba   {q1 } * aba  LM  F Dr. Salah Eldin Shaban q4 q5 a a q0 a q1 b q2 l q3 l L M   {ab } *  {ab } * {aa } Dr. Salah Eldin Shaban NFAs accept the Regular Languages Dr. Salah Eldin Shaban Equivalence of Machines Definition: Machine M1 is equivalent to machine M 2 if L M1   L M 2  Dr. Salah Eldin Shaban Example of equivalent machines NFA M1 LM1   {10} * 0 q0 q1 1 DFA M2 0,1 LM 2   {10} * 0 q0 q1 1 q2 1 0 Dr. Salah Eldin Shaban Theorem:  Languages Regular accepted Languages by NFAs Languages accepted by DFAs NFAs and DFAs have the same computation power, accept the same set of languages Dr. Salah Eldin Shaban Proof: we only need to show Languages accepted  Regular Languages by NFAs AND Languages accepted  Regular Languages by NFAs Dr. Salah Eldin Shaban Proof-Step 1 Languages accepted  Regular Languages by NFAs Every DFA is trivially an NFA Any language L accepted by a DFA is also accepted by an NFA Dr. Salah Eldin Shaban Proof-Step 2 Languages accepted  Regular Languages by NFAs Any NFA can be converted to an equivalent DFA Any language L accepted by an NFA is also accepted by a DFA Dr. Salah Eldin Shaban Conversion NFA to DFA NFA M a q a 0 q l q 1 2 b DFA M {q0 } Dr. Salah Eldin Shaban  * (q0, a )  {q1, q2 } NFA M a q0 a q1 l q2 b DFA M {q0 } a {q1,q2 } Dr. Salah Eldin Shaban  * (q0 , b )   empty set NFA M a q0 a q1 l q2 b DFA M {q0 } a {q1,q2 } b  trap state Dr. Salah Eldin Shaban  (q1, a )  {q1, q2 } * NFA M a  * (q2, a )   q0 a q1 l q2 union b {q1,q2 } a DFA M {q0 } a {q1,q2 } b  Dr. Salah Eldin Shaban  (q1, b )  {q0 } * NFA M a  * (q2, b )  {q0 } a l union q0 q1 q2 b {q0 } a DFA M b {q0 } a {q1,q2 } b  Dr. Salah Eldin Shaban NFA M a q0 a q1 l q2 b a DFA M b {q0 } a {q1,q2 } b  a, b trap state Dr. Salah Eldin Shaban END OF CONSTRUCTION NFA M a q0 a q1 l q2 q1  F b a DFA M b {q0 } a {q1,q2 } {q1, q2 } F  b  a, b Dr. Salah Eldin Shaban General Conversion Procedure Input: an NFA M Output: an equivalent DFA M  with LM   L(M ) Dr. Salah Eldin Shaban The NFA has states q0 , q1, q2 ,... The DFA has states from the power set , {q0 }, {q1 }, {q0 , q1 }, {q1 , q2 , q3 },.... Dr. Salah Eldin Shaban Conversion Procedure Steps step 1. Initial state of NFA: q0 Initial state of DFA: {q0 } Dr. Salah Eldin Shaban Example NFA M a q0 a q1 l q2 b DFA M {q0 } Dr. Salah Eldin Shaban step 2. For every DFA’s state {qi , q j ,..., qm} compute in the NFA  * qi , a     * qj , a  Union  {qk , ql,..., qn }...   * qm , a  add transition to DFA  {qi , q j ,..., qm }, a   {qk , ql,..., qn } Dr. Salah Eldin Shaban Example  * (q0 , a )  {q1, q2 } NFA M a q0 a q1 l q2 b  {q0 }, a   {q1, q2 } DFA M {q0 } a {q1,q2 } Dr. Salah Eldin Shaban step 3. Repeat Step 2 for every state in DFA and symbols in alphabet until no more states can be added in the DFA Dr. Salah Eldin Shaban Example NFA M a q0 a q1 l q2 b a DFA M b {q0 } a {q1,q2 } b  a, b Dr. Salah Eldin Shaban step 4. For any DFA state {qi , q j ,..., qm} if some q j is accepting state in NFA Then, {qi , q j ,..., qm } is accepting state in DFA Dr. Salah Eldin Shaban Example NFA M a q0 a q1 l q2 q1  F b a DFA M b {q0 } a {q1,q2 } {q1, q2 } F  b  a, b Dr. Salah Eldin Shaban Lemma: If we convert NFA M to DFA M  then the two automata are equivalent: L M   L M   Proof: We only need to show: L M   L M   AND L M   L M   Dr. Salah Eldin Shaban First we show: L M   L M   We only need to prove: w L(M ) w  L(M ) Dr. Salah Eldin Shaban NFA Consider w L(M ) q0 w qf symbols w   1 2  k 1 2 k q0 qf Dr. Salah Eldin Shaban symbol i qi qj denotes a possible sub-path like symbol l l i l qi qj Dr. Salah Eldin Shaban We will show that if w L(M ) w   1 2  k 1 2 k NFA M: q0 qf then 1 2 k DFA M: {q0 } {q f ,} state w  L(M ) state label label Dr. Salah Eldin Shaban More generally, we will show that if in M: (arbitrary string) v  a1a2 an a1 a2 an NFA M: q0 qi qj ql qm then M: a1 a2 an DFA {q0 } {qi ,} {q j ,} {ql ,} {qm ,} Dr. Salah Eldin Shaban Proof by induction on |v| Induction Basis: |v | 1 v  a1 a1 NFA M: q0 qi M: a1 DFA {q0 } {qi ,} is true by construction of M  Dr. Salah Eldin Shaban Induction hypothesis: 1 | v | k v  a1a2 ak Suppose that the following hold a1 a2 ak NFA M: q0 qi qj qc qd M: a1 a2 ak DFA {q0 } {qi ,} {q j ,} {qc ,} {qd ,} Dr. Salah Eldin Shaban Induction Step: | v | k + 1 v  a1a2  ak ak +1  vak +1   v Then this is true by construction of M  a1 a2 ak ak +1 NFA M: q0 qi qj qc qd qe v M: a1 a2 ak ak +1 DFA {q0 } {qi ,} {q j ,} {qc ,} {qd ,} {qe ,} v Dr. Salah Eldin Shaban Therefore if w L(M ) w   1 2  k 1 2 k q0 qf NFA M: then 1 2 k DFA M: {q0 } {q f ,} w  L(M ) Dr. Salah Eldin Shaban We have shown: L M   L M   With a similar proof we can show: L M   L M   Therefore: LM   LM  END OF LEMMA PROOF Dr. Salah Eldin Shaban Implementation with FSMs  FSM can be implemented very simply by an array; a row for each state and a column for each possible input symbol.  The array look very much like the table form of FSM.  In code the states and/or input symbols as integers.  Once the array has been initialized, the operation of the machine can be easily simulated, as shown below: Dr. Salah Eldin Shaban 200 Implementation with FSMs Cont. void main () { int fsm[STATES,INPUTS]; //state transition table int inp; //input symbol int start; //starting state int state; cin >> inp; state = start; while (not eof()) { state = fsm[state,inp]; cin >> inp; } } Dr. Salah Eldin Shaban 201 ? Dr. Salah Eldin Shaban 202

Use Quizgecko on...
Browser
Browser