Formal Language and Automata Theory PDF
Document Details
Uploaded by AccurateCombinatorics
Tsegaye A.
Tags
Summary
This document is an introduction to formal language and automata theory, a branch of computer science. It covers fundamental concepts and methodologies in this field, including readings assignments and various application areas.
Full Transcript
Formal Language and Automata Theory Compiled By Tsegaye A. 1 Contents ✓Introduction ✓Alphabets and Strings ✓ Languages ✓Grammars ✓Automata Reading assignments: set theory, relation, function, graphs and tree At end of this chapter students should able to understand...
Formal Language and Automata Theory Compiled By Tsegaye A. 1 Contents ✓Introduction ✓Alphabets and Strings ✓ Languages ✓Grammars ✓Automata Reading assignments: set theory, relation, function, graphs and tree At end of this chapter students should able to understand basic theory of computation of automata and formal language theory Compiled By Tsegaye A. 2 Introduction Theory of computation ▪ Theory has a number of distinct meanings in different fields of knowledge, depending on the context and their methodologies. ▪ Theory is relevant to practice. It provides conceptual tools that practitioners use in computer science or engineering. Hence, various application areas, such as: Modern cryptographic protocols, Designing a new programming language, Dealing with string searching and pattern matching, rely on theoretical principles that you will learn here. ▪ Theory provides concepts and principles that help us understand the general nature of the discipline. Compiled By Tsegaye A. 3 Introduction Theory of computation ▪ This course helps you to learn problem solving skills and teaches you how to think, prove, argue, solve problems, express, and abstract. ▪ Theory simplifies the complex computers to an abstract & simple mathematical model, and helps you to understand them better. ▪ Computer technology changes quickly. Specific technical knowledge, though useful today, becomes outdated in just a few years. Consider instead the abilities to think, to express yourself clearly and precisely, to solve problems, and to know when you haven't solved a problem. ▪ Computation is the process of solving problems through the mathematical, preprogrammed execution of a series of small, unambiguous steps using computers. Compiled By Tsegaye A. 4 Introduction ▪ Theory of computation comprises the fundamental mathematical properties of computer hardware, software, and certain applications. ▪ The study of the theory of computation has several purposes, most importantly: To better understand the development of formal mathematical models of computation that reflect the real-world of computer. To achieve deep understanding about the mathematical properties of computer hardware and software. To understand mathematical definitions of the computation and the algorithm. To rectify the limitations of computers and answer what kind of problems can be computed? ▪ Nowadays, the Theory of Computation can be divided into the following three areas: Complexity Theory, Computability Theory, and Automata Theory. Compiled By Tsegaye A. 5 Introduction ▪ Complexity Theory deals to answer the question "What makes some problems computationally hard and other problems easy?" i.e. it Classifies/identifies problems according to their degree of "difficulty". Give a rigorous proof that problems that seem to be "hard" or "easy". ▪ Computability Theory tries to answer "Which problems can be solved by computers and which ones cannot?" i.e. it classifies problems as being solvable or unsolvable. For instance, computers cannot solve the problem of determining whether a mathematical statement is true or false. ▪ Generally speaking, theories of computability and complexity are closely related. In complexity theory, the objective is to classify problems as easy ones and hard ones, whereas in computability theory the classification of problems is by those that are solvable and those that are not. ▪ Computability theory introduces several of the concepts used in complexity theory. Compiled By Tsegaye A. 6 Introduction ▪ Automata theory deals with the definitions and properties of mathematical models of computation. These models play a role in several applied areas of computer science. ▪ For example: one model, called the finite automaton, is used in text processing, compilers, and hardware design. Oher model, called the context-free grammar, is used in programming languages and in artificial intelligence. And other model called Turing Machines which form a simple abstract model of a "real" computer, such as your PC at home. Automata theory deals to find answer for the question "Do these models have the same power, or can one model solve more problems than the other?" ▪ Automata theory is an excellent place to begin the study of the theory of computation. ▪ Automata theory allows practice with formal definitions of computation as it introduces concepts relevant to other non theoretical areas of computer science. Compiled By Tsegaye A. 7 Alphabets and Strings An alphabet is a finite, non-empty set of symbols We use the symbol Σ (sigma) to denote an alphabet Examples: Binary: Σ = {0,1} All lower case letters: Σ = {a,b,c,..z} Alphanumeric: Σ = {a-z, A-Z, 0-9} Σ ={(0, 0), (0,1), (1, 0), (1, 1)}. NB:(this alphabet is composed of the ordered pairs in {0, 1} x {0, 1}. Strings are defined over an alphabet which is finite. Elements of an alphabet are called symbols. Compiled By Tsegaye A. 8 Alphabets and Strings A string or word is a finite sequence of symbols from the alphabet, usually written as concatenated symbols and not separated by gaps or commas. For example if S= {a,b}, a string abbab is a string or word over S. For any string w = a1,a2..anof length n, the reverse of w is written as wR which is the string an an-1 : : : a1, where each symbol a i belongs to the basic alphabet S. A string z that is appearing consecutively within another string w is called a substring or sub word of w. For example aab is a substring of baabb. Compiled By Tsegaye A. 9 Continued If w is a string over an alphabet S , then the length of w written as len (w) or |w| is the number of symbols it contains. i.e. For a given alphabet ∑, and a word x = a1a2... an over ∑, |x| denotes the length of x. That is, |ala2,... an |= n. NB: a2 b3 = aabbb, (ac)3 = acacac If |w| = 0, then w is called as empty string ε (or "epsilon") E.g., x = 010100 then, |x| = 6 x = 01ε0ε1ε00ε |x| = ? xy = concatenation of two strings x and y Given an alphabet ∑, let x = a1 an and y = b1... bm be strings over ∑. x and y are equal iff n = m and for each i = 1,2,..., n, ai = bi. Compiled By Tsegaye A. 10 Continued Given an alphabet ∑, and some b € ∑, the length of a word w with respect to b, denoted |w |b , is the number of occurrences of the letter b within that word. EXAMPLE: i. |abb|b = 2 ii. |abb |e = 0 iii. |1000000001118881888888|1 = 5 Given an alphabet ∑ and a nonnegative integer k E ∑, we define ∑k ={x | x is a word over∑ and |x |= k}. EXAMPLE If ∑ = {0, 1}, then, ∑0 = {λ} ∑1= {0, 1} ∑2 = {00, 01,10, 11} ∑3 = {000, 001, 010, 011,100,101,110, 111} λ is the only element of ∑0, the set of all words containing zero letters from ∑. Compiled By Tsegaye A. 11 Continued Given an alphabet ∑ , define ∑* is the set of all words that may be constructed from the letters of an alphabet ∑. Example − If ∑ = {a, b}, ∑*= {λ, a, b, aa, ab, ba, bb,………..} ∑+ is the set of all nonempty words that may be constructed from∑. i.e ∑+ = ∑* - { λ } Example − If ∑ = { a, b } , ∑+ ={ a, b, aa, ab, ba, bb,………..} Compiled By Tsegaye A. 12 Languages What is a language? May refer either to the (human) capacity for acquiring and using complex systems of communication, or to a specific instance of such a system of complex communication. "A language is a collection of sentences of finite length all constructed from a finite alphabet of symbols" There are two types of languages Informal Languages (Semantic languages): Focuses on semantics not on syntax alone. Eg. Human languages: Amharic, Ge’ez, Affan Oromo, English,… Compiled By Tsegaye A. 13 Languages Formal Languages (Syntactic languages): refers to the fact that all the rules for the language are explicitly stated in terms of what strings of symbols can occur. i.e. A language is formal if it is provided with a mathematically rigorous representation of its alphabet of symbols, and formation rules specifying which strings of symbols count as well- formed. Example1: The language L of strings of odd length, defined over Σ={a}, can be written as L={a, aaa, aaaaa,…..} Example2: The language L of strings that does not start with a, defined over Σ={a,b,c}, can be written as L={b, c, ba, bb, bc, ca, cb, cc, …} Let S={0,1}. 0,00,01,000,11.. Sentences of string Language is a set of sentences L={000,0001,01000,00011} Compiled By Tsegaye A. 14 Grammars Grammars: The description of a natural language is usually called a grammar; it should indicate how the sentences of a language are composed of elements, how elements form larger units, and how these units are related within the context of the sentence. The theory of formal languages finds its applicability extensively in the fields of Computer Science. Grammar is a mechanism to describe the formal language. Or it is a finite lists of rules that defining a language. egg Compiled By Tsegaye A. 15 Automata − What is it? Automata Theory is a branch of computer science that deals with designing abstract self-propelled computing devices that follow a predetermined sequence of operations automatically. It is the plural form of automaton, and it means "something that works automatically" It is the study of abstract computing devices or machines. automaton is a construct that possesses all the indispensable feature of digital computer. Compiled By Tsegaye A. 16 Automata − What is it? It accepts input, produces output, may have some temporary storage can make decisions in to transforming the input into output. Although finite automata theory is concerned with relatively simple machines, it is an important foundation of a large number of concrete and abstract applications. The finite-state control of a finite automaton is also at the heart of more complex computing devices such as finite state transducers, pushdown automata, and Turing machines Note: A "device" need not even be a physical hardware! Compiled By Tsegaye A. 17 Why Study Automata Theory? Since computer science is practical discipline, it includes a wide range of special topics, from machine design to programming. The use of computers in the real world involved a wealth of specific detail that must be learned for a successful application. A fundamental question in computer science: Find out what different models of machines can do and cannot do Automata (abstract machines) – able to recognize the elements of particular languages. Formal language theory/automata theory is important in programming language design and is at the heart of modern compiler architectures. Automata theory contributes the concept of regular expressions, used in many ways in pattern matching. Compiled By Tsegaye A. 18 Chapter two: Finite Automata Contents Finite automata (FA) and its behavior; DFA -Formal definition Simplified notations (state diagram, transition table) Language of a DFA. NFA-Formal definition NFA Equivalence of DFAs NFA to DFA conversion and vise versa Compiled By Tsegaye A. 19 What is a Finite Automaton? Informal definition Finite Automaton, Finite State Machine, FSA or FSM:- it is a mechanism to recognize a set of valid inputs before carrying out an action. Has a finite number of states, and a finite amount of memory (i.e., the current state). An abstract machine used to implement regular expressions (etc.). An automaton with a finite number of states is called a Finite Automaton (FA) or Finite State Machine (FSM). Compiled By Tsegaye A. 20 Characteristics of Finite Automata Input: the input values are as I1, I2, I3,….Ip, each of which can take a finite number of fixed values from the input alphabet ∑, are applied to the input side of the model. Output: O1, O2, O3, …., Oq, are the output of the discrete automata model, each of which can take a finite number of fixed values form an output O States: A state is a condition of processing the inputs. At any instant of time the automaton can be in one of the states q1, q2, q3, ….. qn. Compiled By Tsegaye A. 21 Characteristics of Finite Automata State relation: The next state of an automaton at any instant of time is determined by the present state and the present input (i.e. transition). Output relation: The output is related to either state only or to both the input and the current state. It should be noted that at any instant of time the automaton is in some state. On reading an input symbol the automaton moves to a next state which is given by the state relation. Finite automata are finite collections of states with transition rules that take you from one state to another. It can be represented by directed graphs or transition tables. Today, several kinds of software can be modeled by FA Compiled By Tsegaye A. 22 General framework for Automaton Compiled By Tsegaye A. 23 Finite Automaton Fig.1 Fig.2 Compiled By Tsegaye A. 24 Different Kinds of Automata finite automata Devices with a finite amount of memory. (no temporary Used to model "small" computers. memory) Ex Elevators, Lexical Analyzers push-down automata Devices with infinite memory that can be (stack) accessed in a restricted way. Ex Parsers for Programming Languages Turing Machines Devices with infinite memory. (random access Used to model any computer. memory) Ex Any Algorithm (highest known computing power) time-bounded Infinite memory, but bounded running time. Turing Machines Used to model any computer program that runs in a "reasonable" amount of time. Power of Automata Simple More complex Hardest problems problems problems Finite Pushdown Turing Automata Automata Machine Less power More power Solve more computational problems Compiled By Tsegaye A. 26 Acceptors, Classifiers, and Transducers Acceptor/ Recognizer An automaton that computes a Boolean function is called an acceptor. All the states of an acceptor is either accepting or rejecting the inputs given to it. Classifier A classifier has more than two final states and it gives a single output when it terminates. Transducer An automaton that produces outputs based on current input and/or previous state is called a transducer. Transducers can be of two types:− Mealy Machine − The output depends both on the current state and the current input. Moore Machine − The output depends only on the current state. Compiled By Tsegaye A. 27 Why Study Finite Automata? Used for both design and verification of circuits and communication protocols. Used for many text-processing applications. An important component of compilers. Describes simple patterns of events, etc. Compiled By Tsegaye A. 28 Finite Automaton – Formal Definition A finite automaton is a 5-tupple (Q, , , q0 , F ) where: 1. Q is a finite set called the states. 2. is a finite set called the alphabet. 3. : Q → Q is the transition function. 4. q0 Q is the start state(Initial), and 5. F Q is the set of accept states(Final). state indicates the status of the machine after, consuming some portion of the input. Summarizes the history of the computation that is relevant , to the future course of action. 29 Compiled By Tsegaye A. Finite-state Automata Representation An FSA may be represented as a directed graph; each node (or vertex) represents a state, and the edges (or arcs) connecting the nodes represent transitions. Each state is labeled. Each transition is labeled with a symbol from the alphabet over which the regular language represented by the FSA is defined, or with ,the empty string( ) Among the FSA’s states, there is a start state and at least one final state (or accepting state). NB: A DFA is represented by digraphs called state diagram. Compiled By Tsegaye A. 30 Strings and automata The input data are represented by a string over some alphabet and it determines how the machine progresses from state to state. Beginning in the start state, the characters of the input string cause the machine to change from one state to another. Accepting automata give only yes or no answers, depending on whether they end up in a ‘final state.’ Strings which end in a final state are accepted by the automaton. There are several common ways to represent a finite automaton. One is by a table showing the transitions defined by the machine, another is by a directed graph called a state diagram. Compiled By Tsegaye A. 31 Acceptance of Inputs Given a sequence of inputs (input string ), start in the starting/initial state and follow the transition from each symbol in turn. Given an input string, an FSA will either accept or reject the input. If the FSA is in a final (or accepting) state after all input symbols have been consumed, then the string is accepted (or recognized). Otherwise (including the case in which an input symbol cannot be consumed), the string is rejected. Generally, Acceptability by DFA and NDFA means A string is accepted by a DFA/NDFA iff the DFA/NDFA starting at the initial state ends in an accepting state (any of the final states) after reading the string wholly. A string S is accepted by a DFA/NDFA (Q, ∑, δ, q 0, F), iff δ*(q0, S) ∈ F The language L accepted by DFA/NDFA is {S | S ∈ ∑* and δ*(q0, S) ∈ F} Compiled By Tsegaye A. 32 Simpler Notations for DFA's Specifying a DFA as a five-tuple with a detailed description of the δ (transition function) is both tedious and hard to read. There are two preferred notations for describing automata: A transition diagram, which is a digraph using its vertices and edges as states and transitions respectively A transition table, which is a tabular listing of the δ function, which by implication tells us the set of states and the input alphabet. Compiled By Tsegaye A. 33 Finite Automaton - An Example 0 q0 0,1 qs 1 q1 0,1 States: Q = {qs , q0 , q1} Initial State: qs , Final State: q0 , 34 Compiled By Tsegaye A. Finite Automaton – An Example 0 q0 0,1 qs 1 q1 0,1 Transition (qs ,0) = q0 (qs ,1) = q1 Function: (q0 ,0) = (q0 ,1) = q0 (q1,0) = (q1,1) = q1 Alphabet: 0,1. , , Accepted words: 0,00,01,000,001,... 35 Compiled By Tsegaye A. Transition function table a b s0 (start) s2 s0 s1 (final) s0 s0 s2 s1 s1 The state diagram for this machine is shown here. Compiled By Tsegaye A. 36 Example The labeled graph in the figure above represents a FA over the alphabet Σ = {a, b} with start state 0 and final state 3. Final states are denoted by a double circle. Compiled By Tsegaye A. 37 Finite-state Automata state a b c a q0 q1 q2 q3 q4 = { a, b, c } start state transition final state Input Representation (continued) State a b c – An FSA may also be Q0 Q1 represented with a state-transition table. Q1 Q2 The table for the Q2 Q3 above FSA: Q3 Q4 Compiled By Tsegaye A. Q4 38 Regular language of Finite-state Automata The class of languages defined by Finite Automata are called Regular Languages (or Regular Sets of strings). These languages are described by Regular Expressions. We say that a language is a regular language if the set of strings in the language is a regular set. i.e. M is a DFA with alphabet Σ L(M) is the set of strings accepted by M – L(M) is the language of M – M recognizes L(M) NB: assume that, M→ is a DFA machine and L is language of Machine Compiled By Tsegaye A. 39 Regular language of Finite-state Automata An FSA defines a regular language over an alphabet Σ : ⱷ is a regular language: Any symbol from Σ is a regular language: Σ = { a, b, c}. egg Two concatenated regular languages is a regular language: Σ = { a, b, c} q0 b q1 q0 c q1 q0 b q1 c q2 Compiled By Tsegaye A. 40 Finite-state Automata Regular language (continued): The union (or disjunction) of two regular languages is a regular language: q0 b q1 q0 c q1 = { a, b, c} q1 b q0 q2 c q3 The Kleene closure (denoted by the Kleene star: *) of a regular language is a regular language: = { a, b, c} q0 b q1 Compiled By Tsegaye A. 41 Examples of regular languages: 1. Number of characters is even Input: Σ = {0}. Accept: all strings in which the number of characters is even. 2. Number of characters is divisible by 3 Input: Σ = {0}. Accept: all strings in which the number of characters is divisible by 3. 3. Number of characters is divisible by 6 Input: Σ = {0}. Accept: all strings in which the number of characters is divisible by 6. Compiled By Tsegaye A. 42 Examples of regular languages: 4. L = { w | w contains 001} is regular 5. Number of ones is even Input is a string over Σ = {0, 1}. Accept: all strings in which the number of ones is even. 0 43 Compiled By Tsegaye A. Exercises Draw each FSA for the following machines. 1. M1 accepts all words (strings) ending with 1: L(M 1) = w | w ends with 1 2. M2 accepts all words ending with 0: L(M 2 ) = w | w ends with 0 3. M3 accepts all words of the form 0m1n where m, n are integers and m, n > 0 4. M4 accepts all words in {0, 1, 00, 01, 10} 5. M5 accepts all words that contain even number of 0s and 1s. L(M) = {w | w has both an even number of 0's and an even number of l's} 6. Draw an equivalence FSA for the following transition table. NB → and * indicate the start state and final state respectively 44 Compiled By Tsegaye A. How to do it 1. Find some simple examples (short accepted and rejected words) 2. Think what should each state "remember" (represent). 3. Draw the states with a proper name. 4. Draw transitions that preserve the states’ "memory". 5. Validate or correct. 6. Write a correctness argument. Compiled By Tsegaye A. 45 Deterministic/Nondeterministic Finite State Automata An FSA may be either deterministic (DFSA or DFA) or non-deterministic (NFSA or NFA). An FSA is deterministic if its behavior during recognition is fully determined by the state it is in and the symbol to be consumed. I.e., given an input string, only one path may be taken through the FSA. Or i.e. When the machine is in a given state and reads the next input symbol, we know what the next state will be-it is determined. We call this deterministic computation. Conversely, an FSA is non-deterministic if, given an input string, more than one path may be taken through the FSA. i.e. In a nondeterministic machine, several choices may exist for the next state at any point. One type of non-determinism is e-transitions, i.e. transitions which consume the empty string (no symbols). NB: Nondeterminism is a generalization of determinism, so every deterministic finite automaton is automatically a nondeterministic finite automaton. Compiled By Tsegaye A. 46 DFA vs. NFA The difference between a DFA and an NFA, is immediately apparent. DFA NFA every state of a DFA always has exactly one In an NFA a state may have zero, exiting transition arrow for each symbol in the one, or many exiting arrows for each alphabet alphabet. symbol. Hence, fig1. NFA violates that rule. In general, an NFA may have arrows i.e. State q1 has one exiting arrow for 0, but it labeled with members of the alphabet or e has two exit arrows for 1; q2 has one arrow (epsilon). i.e. for 0, but it has none for 1. Zero, one, or many arrows may exit In a DFA, labels on the transition arrows are from each state with the label e. symbols from the alphabet. fig1. NFA Compiled By Tsegaye A. 47 Deterministic/Nondeterministic Finite State Automata Compiled By Tsegaye A. 48 DFA vs. NFA e.g. Fig2. For example, the computation of Fig2. on the input 010110 is Compiled By Tsegaye A. 49 DFA vs. NFA DFA NFA I. Deterministic Finite Automaton is an FA in 1. NFA or Non-Deterministic Finite Automaton which there is only one path for a specific is the one in which there exist many paths input from the current state to next state. for a specific input from a current state to There is a unique transition on each input next state. symbol. 2. NFA can use Empty String transition. II. DFA cannot use Empty String transition 3. NFA can be understood as multiple little III.DFA can be understood as one machine machines computing at the same time. IV.DFA will reject the string if it ends at other 4. If all of the branches of NFA dies or rejects the than accepting state string, we can say that NFA rejects the string. V. For Every symbol of the alphabet, there is 5. We do not need to specify how the NFA reacts only one state transition in DFA according to some symbol. Compiled By Tsegaye A. 50 Deterministic finite automata (DFA’s) A DFA over a finite alphabet Σ is a finite directed graph with the property that each node emits one labeled edge for each distinct element of Σ. For every state and every alphabet symbol there is exactly one move that the machine can make. δ : Q x Σ → Q A DFA accepts a string w in Σ* if there is a path from the start state to some final state such that w is the concatenation of the labels on the edges of the path. Otherwise, the DFA rejects w. The set of all strings accepted by a DFA M is called the language of M and is denoted by L(M) Compiled By Tsegaye A. 51 Formal Definition of a DFA A DFA is a five- tuples: M = (Q, Σ, δ, q0, F) Q A finite set of states Σ A finite input alphabet q0 The initial/starting state, q0 is in Q F A set of final/accepting states, which is a subset of Q δ A transition function, which is a total function from Q x Σ to Q δ: (Q x Σ) –> Q δ is defined for any q in Q and s in Σ, and δ(q, s) = q’ is equal to some state q’ in Q, could be q’=q Intuitively, δ(q, s) is the state entered by M after reading symbol s while in state q Compiled By Tsegaye A. 52 DFA State Diagrams L(M) ={ w | w has an even number of 1s} with input alphabet { a , b } That accepts strings contain at least 3 a’s. i.e. L(M)= {w | w has a length of a at least 3} Compiled By Tsegaye A. 53 DFA State Diagram Example #1: 1 0 q0 q1 1 0 1 0 0 1 1 q0 q0 q1 q0 q0 q0 The above DFA accepts those strings that contain an even number of 0’s Compiled By Tsegaye A. 54 DFA State Diagram For example #2: 1 0 Q = {q0, q1} q0 q1 1 Σ = {0, 1} 0 Start & final state is q0 F = {q0} δ: 0 1 q0 q1 q0 q1 q0 q1 55 Compiled By Tsegaye A. DFA State Diagram Example #3: There are states off and on, the automaton starts in off and tries to reach the "good state" on What sequences of fs lead to the good state? Answer: {f, fff, fffff, …} = {fn: n is odd} This is an example of a deterministic finite automaton over alphabet {f} Compiled By Tsegaye A. 56 Applications of DFA’s DFA’s are very often used for pattern matching, e.g. searching for words/structures in strings This is used often in UNIX, particularly by the grep command, which searches for combinations of strings and wild words (*, ?) grep stands for Global (search for) Regular Expressions Parser DFA’s are also used to design and check simple circuits, verifying protocols, etc. They are of use whenever significant memory is not required Embedded controllers in your car Compiled By Tsegaye A. 57 Non-deterministic finite automata In NDFA, for a particular input symbol, the machine can move to any combination of the states in the machine. In other words, the exact state to which the machine moves cannot be determined. i.e. Nondeterminism may be viewed as a kind of parallel computation wherein multiple independent "processes" or "threads" can be running concurrently. For NFA’s, sometimes there is more than one direction we can go with the same input character Non-determinism can occur, because following a particular string, one could be in many possible states, or taken different paths to end at the same state! NFAs can be constructed from DFAs using transitions: Called NFA- 58 Non-deterministic finite automata A non-deterministic finite automaton (NFA) over an alphabet Σ is a finite directed graph with each node having zero or more edges. Each edge is labelled either with a letter from Σ or with . Processing follows a lambda-arc no matter what its input is (even null input) Multiple edges may be emitted from the same node with the same label. Some letters may not have an edge associated with them and such strings following such paths are not recognised. Compiled By Tsegaye A. 59 Non-deterministic finite automata If an edge is labelled with the empty string λ, then we can travel the edge without consuming an input letter. Effectively we could be in either state, and so the possible paths could branch. If there are two edges with the same label, we can take either path. NFA’s recognise a string if any one of its many possible states following it is a final state Otherwise, it rejects it. Compiled By Tsegaye A. 60 Formal definition for NFA A Non-Deterministic Finite Automata is a 5-tuple (Q,Σ, δ, qo, F) where Q is a finite set (of states) Σ is a finite alphabet of symbols q0 ∈ Q is the start state F ⊆ Q is the set of final states δ is a function from Q x (Σ ∪ {λ}) to 2Q (transition function).2Q is the set of subsets of Q and Σ ∪ {λ} Compiled By Tsegaye A. 61 Nondeterministic Finite Acceptors /NFAs Example: It may have: Multiple outgoing edges 1 with the same letter a a λ- transitions λ b b 0 2 3 4 a a 5 6 b Compiled By Tsegaye A. 62 Non-deterministic finite automata Egg. Compiled By Tsegaye A. 63 Non-deterministic finite automata Egg. i.e. there is 2 exiting arrows for input 1 Notice that transition tables can be used to specify the transition function for an NFA as well as for a DFA. The only difference is that each entry in the table for the NFA is a set, even if the set is a singleton (has one member). Compiled By Tsegaye A. 64 Non-deterministic finite automata Egg. Let A be the language consisting of all strings over {0, 1} containing a 1 in the third position from the end (e.g., 000100 is in A but 0011 is not). NB: One good way to view the computation of this NFA is to say that it stays in the start state q1 until it "guesses" that it is three places from the end. At that point, If the input symbol is a 1, it branches to state q2 and uses q3 and q4 to "check" on whether its guess was correct. Compiled By Tsegaye A. 65 Non-deterministic finite automata Egg. Consider the following NFA that has an input alphabet {0} consisting of a single symbol. An alphabet containing only one symbol is called a unary alphabet. It accepts all strings of the form 0k where k is a multiple of 2 or 3. For example, it accepts the strings e, 00, 000, 0000, and 000000, but not 0 or 00000. Compiled By Tsegaye A. 66 Equivalence of DFA and NFA ▪ Nondeterminism is a generalization of determinism, so every deterministic finite automaton is automatically a nondeterministic finite automaton. ▪ What does it mean for two automata to be equivalent? ▪ Two finite automata M1 and M2 are equivalent if L(M1) = L (M2). ▪ If they accept/recognize the same language. ▪ Hence, Every nondeterministic finite automaton (NFA) has an equivalent deterministic finite automaton (DFA) Compiled By Tsegaye A. 67 Equivalence of DFA and NFA How we will show this 1 Given a DFA that accepts L, create an NFA that also accepts L 2. Given an NFA that accepts L, create a DFA that also accepts L Compiled By Tsegaye A. 68 Equivalence of DFAs and NFAs Automata are equivalent if they accept the same language. Example 1: Both NFA & DFA accept a common language. What is that language? a 1 a b 0 b 1 2 λ b 2 b L (M)= { w | w ∈ all strings consist of Σ and they can start with a or b and other positions with b or e.} I.e. L (M)= { w | w ∈ (a+b)b* , where * = 0, 1, 2, …..} 69 Compiled By Tsegaye A. NFA’s versus DFA’s NFA for a*a : DFA for a*a : Why is the top an NFA while the bottom is a DFA? Compiled By Tsegaye A. 70 NFA’s equivalent Example Draw two NFAs to recognize the language of the regular expression ab + a*a. ThisNFA has a λ edge, which allows us to travel to state 2 without consuming an input letter. The upper path corresponds to ab and the lower one to a*a Compiled By Tsegaye A. 71 An equivalent NFA This NFA also recognizes the same language. Perhaps it's easier to see this by considering the equality ab + a*a = ab + aa* Compiled By Tsegaye A. 72 Equivalence of DFAs and NFAs Example: a b 0 2 1 a,b a, b Construct an equivalent DFA. Compiled By Tsegaye A. 73 Application for non-deterministic FA All digital computers are deterministic; their state at any time is uniquely predictable from the input and the initial state. The usual mechanism for deterministic computers is to try one particular path and to backtrack to the last decision point if that path proves poor. Parallel computers make non-determinism almost realizable. We can let each process make a random choice at each branch point, thereby exploring many possible trees. Compiled By Tsegaye A. 74 NDFA to DFA Conversion Let X = (Qx, ∑, δx, q0, Fx) be an NDFA which accepts the language L(X). We have to design an equivalent DFA Y = (Qy, ∑, δy, {q0} , Fy) such that L(Y) = L(X). Notice that the input alphabets of the two automata are the same, and the start state of Y is the set containing only the start state of X. The following procedure converts the NDFA to its equivalent DFA Compiled By Tsegaye A. 75 NDFA to DFA Conversion Algorithm Input: an NDFA Output: equivalent DFA Step-1: Create state table from the given NDFA Step-2: Create a blank state table under possible input alphabets for the equivalent DFA Step-3: Mark the start state of DFA by the q0 (same as NDFA) Step-4: Find out the combination of states {q0, q1, … qn) for each input alphabet Step-5: Each time we generate a new DFA state under the input alphabet columns, we have to apply step 4 again, otherwise go to step 6 Step-6: The states which contain any of the final states of the NDFA are the final states of the equivalent DFA Compiled By Tsegaye A. 76 From NFA’s to DFA’s We prove the equivalence of NFA’s and DFA’s by showing how, for any NFA, to construct a DFA which recognizes the same language Generally, the DFA will have more possible states than the NFA. If the NFA has n states, then the DFA could have as many as 2n states! Example: NFA has three states {A}, {B}, {C} the DFA could have eight: {}, {A}, {B}, {C}, {A, B}, {A, C}, {B, C}, {A, B, C} These correspond to the possible states the NFA could be in after any string Compiled By Tsegaye A. 77 DFA construction Begin in the NFA start state, which could be a multiple state if it is connected to any by λ (epsilon) Determine the set of possible NFA states you could be in after receiving each character. Each set is a new DFA state, and is connected to the start by that character. Repeat for each new DFA state, exploring the possible results for each character until the system is closed DFA final states are any that contain a NFA final state Compiled By Tsegaye A. 78 From NFA To DFA: example Construct the equivalent DFA of the NFA given in figure Solution: Step 1: Transaction Table of NFA from Figure NFA Step 2,…: Transaction Table of DFA: 1 , q2} equivalent DFA 79 Compiled By Tsegaye A. Example (a+b)*ab A B C a b NFA a,b The start state is A, but following an input a you could be in A or B; following a b you could only be in state A A A,B A,C a b a a DFA b b Compiled By Tsegaye A. 80 From e-NFA To DFA conversion Step 1 : Take ∈ closure for the beginning state of NFA as beginning state of DFA. Step 2 : Find the states that can be traversed from the present for each input symbol (union of transition value and their closures for each states of NFA present in current state of DFA). Step 3 : If any new state is found take it as current state and repeat step 2. Step 4 : Do repeat Step 2 and Step 3 until no new state present in DFA transition table. Step 5 : Mark the states of DFA which contains final state of NFA as final states of DFA. Compiled By Tsegaye A. 81 From e-NFA To DFA conversion Convert the following e-NFA to DFA Solution: step1: ∈-closure (A) ={A,B,C} ∈-closure (B) ={B,C} ∈-closure (C) ={C} NFA with epsilon Step2: NFA transition table Presen Inputs alphabets t state 0 1 e A {B,C} {A} {B} B {B} {C} C {C} {C} Step3,…: DFA transition table DFA Inputs alphabets Present state 0 1 {A,B,C} {B,C} {A,B,C} {B,C} {C} {B,C} {C} {C} {C} Compiled By Tsegaye A. 82 Exercise: Convert this NFA to DFA λ q2 a q1 λ q4 q0 λ λ q6 q3 a q5 Compiled By Tsegaye A. 83 Chapter three: Regular expression and regular language Contents Regular expressions (RE), RE to FA, FA to RE, Applications of REs. Regular languages and FA Connection Between Regular Expressions and Regular Languages Pumping lemma and non-regular languages Regular grammars Compiled By Tsegaye A. 84 RE’s: Introduction Regular expressions are an algebraic way to describe particular kind of formal language called a regular language. Are Notation to specify a language (i.e. Regular expressions denote languages.) Regular expression is one way of describing a finite-state automaton(FSA) Are formula in algebraic notation for specifying a set of strings String: Any sequence of alphanumeric characters: Letters, numbers, spaces, tabs, punctuation marks Can use RegExps to specify the rules for any set of possible strings you want to match Sentences, e-mail addresses, etc. 85 Compiled By Tsegaye A. RE’s: Introduction Composed of symbols "ε", "Ø", "+", "*", ".", "(", ")" and alphabets Regular expressions represent the regular languages. DFA’s recognize the regular languages. NFA’s also recognize the regular languages. Example of regular expression whitespace: "\s" = "[ \t\r\n\f\v]" digit: "\d" = "[0-9]" word: "\w" = "[a-zA-Z0-9_ ]" non-whitespace: "\S" = "[^ \t\r\n\f]" non-digit: "\D" = "[^0-9]" non-word: "\W" = "[^a-zA-Z0-9_ ]" 86 Compiled By Tsegaye A. RE’s: Definition Basis 1: If a is any symbol, then a is a RE, and L(a) = {a}. Note: {a} is the language containing one string, and that string is of length 1. Basis 2: ε is a RE, and L(ε) = {ε}. (ε represents the language containing a single string-namely, the empty string ) Basis 3: ∅ is a RE, and L(∅) = ∅. ( i.e. ∅ represents the empty language that doesn't contain any strings.) Hence, ∅, ε, and a ∈ Σ are called primitive regular expressions and the regular expressions a and ε represent the languages {a} and {ε} respectively. Compiled By Tsegaye A. 87 RE’s: Definition – (2) Induction 1: If E1 and E2 are regular expressions, then E1+E2 is a regular expression, and L(E1+E2) = L(E1)L(E2). E.g. if E1 = {001, 10, 1ll} and E2 = {e,00l}, then E1 + E2 = E1 E2 ={e, 10, 001, 1ll} Induction 2: If E1 and E2 are regular expressions, then E1E2 is a regular expression, and L(E1E2) = L(E1)L(E2). e.g. if L = {001, 10, 1ll} and M = {e, 00l}, then L.M, or just LM, is {00l, 10, 111, 001001,10001, 11l00l}. The first three strings in LM are the strings in L concatenated with e. Because e is the identity for concatenation. 88 Compiled By Tsegaye A. RE’s: Definition – (3) Induction 3: If E is a RE, then E* is a RE, and L(E*) = (L(E))*. Concatenation : the set of strings wx such that w is in L(E1) and x is in L(E2). Closure, or "Kleene closure" = set of strings w1w2…wn, for some n > 0, where each wi is in L(E). Note: when n=0, the string is ε. 89 Compiled By Tsegaye A. Examples: RE’s L(01) = {01}. L(01+0) = {01, 0}. L(0(1+0)) = {01, 00}. L(0*) = {ε, 0, 00, 000,… }. L((0+10)*(ε+1)) = all strings of 0’s and 1’s without two consecutive 1’s. Note order of precedence of operators. Order of precedence is * (highest), then concatenation (.) , then union (+) (lowest). i.e. precedence of * > precedence of. > precedence of + Compiled By Tsegaye A. 90 Examples: RE’s Let the alphabet Ʃ is {0, 1} then, 0*10* = {w| w contains a single 1}. Ʃ* l Ʃ* = {w | w has at least one 1}. Ʃ*001Ʃ * = {w | w contains the string 001 as a substring}. (Ʃ Ʃ Ʃ )* = {w| the length of w is a multiple of 3} (Ʃ Ʃ )* = {w| w is a string of even length} 01 U 10 = {01, 10} Since U=+ 91 Compiled By Tsegaye A. Examples: RE’s Let R be any regular expression and then, R U ∅ =R (Adding the empty language to any other language will not change it) R o ε = R (Joining the empty string to any string will not change it) But, exchanging ∅ and ε in the preceding identities may cause the equalities to fail. i.e. R U ε may not equal R. For example, if R = 0, then L(R) = {0} but L(R U ε) = {0, ε} R o ∅ may not equal R. For example, if R = 0, then L(R) = {0} but L(R o ∅) = ∅. Note that ∅0 = {ε}, while ∅i , for any i>= 1, is empty, since we can't select any strings from the empty set. 92 Compiled By Tsegaye A. Characteristics of REs Two regular expression (or automaton) are EQUAL if they both generate the same languages Every regular language defined by a finite automaton is also defined by some regular expression defined by a regular expression is also defined by some finite automaton Regular Language plan for showing the equivalence of four different notations for regular languages Compiled By Tsegaye A. 93 Example Regular expression: (a + b ) a * L((a + b ) a *) = L((a + b )) L(a *) = L(a + b ) L(a *) = ( L(a ) L(b )) ( L(a )) * = (a b) (a) * = a, b , a, aa, aaa,... L((a + b ) a *) = a, aa, aaa,..., b, ba, baa,... Compiled By Tsegaye A. 94 Using FSA to Recognize Sheep talk Output: baa!, baaa!, baaaa!, baaaaa!, baaaaaa!,...... regular expression: (baa+!) nodes: states(q0: start state, q4: final state/accepting state) arcs: transitions: Present Inputs FSA: state a b ! a q0 - q1 - q1 q2 - - q2 q3 - - q3 q3 - q4 q4 - - - Compiled By Tsegaye A. 95 examples Regular expression = ab*c , then Regular language = {ac, abc, abbc, ….} NB: * =0,1 … Example: (a + b c) * = a, bc* Regular language = , a, bc, aa, abc, bca,... Draw DFA for above REs Compiled By Tsegaye A. 96 Regular expressions → Finite Automata Given a regular expression, we can find an automata which recognises its language. Start the algorithm with a machine that has a start state, a single final state, and an edge labelled with the given regular expression as follows: Compiled By Tsegaye A. 97 Regular expressions → Finite Automata Transform any diagram like NB: final state must be double circle Compiled By Tsegaye A. 98 Regular expressions → Finite Automata Transform any diagram like Compiled By Tsegaye A. 99 Regular expressions → Finite Automata Example: (a+b)* Construct a DFA to recognize the regular languages represented by the regular expression (a + b)* over alphabet Σ = {a, b}. This is the set {a, b}* of all strings over {a, b}. This can be recognised by Compiled By Tsegaye A. 100 Regular expressions → Finite Automata Example: a(a+b)* Find a DFA to recognize the language represented by the regular expression a(a + b)* over the alphabet Σ = {a, b}. a, b a 1 2 Compiled By Tsegaye A. 101 Regular expressions → Finite Automata Example: a(a+b)* Find a DFA to recognize the language represented by the regular expression a(a + b)* over the alphabet Σ = {a, b}. This is the set of all strings in Σ* which begin with a. One possible DFA is: Compiled By Tsegaye A. 102 Regular expressions → Finite Automata Example: pattern recognition Build a DFA to recognize the regular language represented by the regular expression (a + b)*abb over the alphabet Σ = {a, b}. Thelanguage is the set of strings that begin with anything (a or b), but must end with the substring abb. Effectively, we’re looking for strings which have a particular pattern to them b a a a b 1 2 b 3 40 a b Compiled By Tsegaye A. 103 Regular expressions → Finite Automata Example a*+ab Construct an NFA for the regular expression, a* + ab Start with a* + ab a* ab Compiled By Tsegaye A. 104 Regular expressions → Finite Automata Example a*+ab (continued) a a* λ λ ab ab a λ λ a*+ab a b Compiled By Tsegaye A. 105 Exercises Write a regular expression for the set of strings that contains an even number of 1’s over ={0,1}. Compiled By Tsegaye A. 106 Applications of REs Unix text search, search matching patterns (grep) Lexical/Parser analysis Parse text against a regular expression find set of first tokens at this expression root find set of last tokens at this expression root find set of next tokens after an alphabet position in a regular expression Efficient search of patterns in very large repository (web text search) Compiled By Tsegaye A. 107 Regular Language Definition A language (a set of strings) is defined to be a regular language if it can be defined by a finite automaton: DFA or an NFA or And by a RE or And by RG 108 Compiled By Tsegaye A. Pumping Lemma and non Regular Languages Pumping Lemma for Regular Languages Pumping lemma is used to show that a given languages is not regular. This lemma is used to prove whether certain sets of language (or strings) are regular or not regular. Let L be a regular language. Then there exists a constant n (which depends on L) such that for every string w in L such that |w| >= n, we can break w into three strings, w = xyz, such that: 1. y ≠ ε i.e. y > 0 2. |xy| = 0 i.e. When w is divided into xyz, either x or z may be ε, but condition 1 says that y ≠ ε That is, we can always find a nonempty string y not too far from the beginning of w that can be "pumped"; that is, repeating y any number of times, or deleting it (in the case of i = 0), to yield another string that is in the language L. we say that the middle string "pumped"; hence the term pumping lemma is for this result Compiled By Tsegaye A. 109 Pumping Lemma and non Regular Languages Example: prove that language L =anbn | n>=0 is not regular. Solution: to prove this language is not regular we have to follow Pumping Lemma rules/steps Step I: assume that L is a regular language and n is the number of states (in finite automaton for this language). Step II: choose a string w, so w= anbn , and find length of |w| = 2n ( number of a and b) it means |w|= 2n >=n (condition 1 is satisfied) Step III: divided chosen string in to three components x, y and z. i.e. w=xyiz for example, string may be w=aaaaaaabbbbbbb (equal no of a and b that is n=7). Consider some number of a in y out of 7 and rest number of a in x and all number of b in z. Assume w=xyz with |xy|=1 // m can not be zero because minimum length of y must be 1. then, w=x(bm)iz and replace the value of x,y,z in w=xyiz form w=an(bm)ibn-m //an is component of x, (bm)i is component of yi and bn-m is component of z Select value of i , egg. i=0, then w=an(bm)0bn-m = anbn-m (because of i=0 component (bm)i will be NULL) Hence, string w=anbn-m , and n ≠ n-m, (number of a ≠ number of b) As a result, anbn-m will not belong to given language L. Compiled By Tsegaye A. 112 Pumping Lemma example continued… case 3: suppose y has some number of a's and b's; let y=akbm | k, m >=1. then w=x(akbm)iz and replace the value of x,y,z in w=xyiz form. w=an-k(akbm)ibn-m //an-k is component of x, (akbm)i is component of yi and bn-m is component of z Select the value of i , egg. i=2; then, w=an-k(akbm)2bn-m (because of i=2 component (akbm)2 will be as akbmakbm ) Hence, string w=an-kakbmakbmbn-m and we can show that here a be come after any occurrence of b, As a result, an-kakbmakbmbn-m will not belong to given language L. With the help of three cases we can easily say that giving language L is not regular language. 113 Compiled By Tsegaye A. Pumping Lemma and non Regular Languages Example 2: Prove that L = {0k 1k : k ≥ 0} is NOT regular. Let w = 0n 1n and w € L and |w| >= n. We show that ∀ x, y, z do not satisfied rules 1–3 If (1), (2) hold then w = 0n 1n = xyz with |xy| ≤ n and |y| ≥ 1. So, x = 0s, y = 0t, z = 0p1n with |xy|=s + t ≤ n, (NB: here, 0 is divided to x , y and z) And with |y| ≥ 1= t ≥ 1, p ≥ 0, s + t + p = n. But then (3) fails for k = 0: xy0z = xz = 0s0p1n = 0s+p 1n not in L, since s + p ≠ n Compiled By Tsegaye A. 114 Pumping Lemma and non Regular Languages Example 3: Consider the language A = {w ∈ {0, 1} ∗ : number of 0s and 1s in w is equal}. Again, we prove by contradiction that A is not a regular language. Assume that A is a regular language. Let p ≥ 1 be the pumping length, as given by the pumping lemma. Consider the string s = 0p1p. Then s ∈ A and |s| = 2p ≥ p. By the pumping lemma, s can be written as s = xyz, where y ≠ ε , | xy| ≤ p, and xyiz ∈ A for all i ≥ 0. Since | xy| ≤ p, the string y contains only 0s. Since y ≠ ε , y contains at least one 0. Therefore, the string xy2z = xyyz contains more 0s than 1s, which implies that this string is not contained in A. But, by the pumping lemma, this string is contained in A. This is a contradiction and, therefore, A is not a regular language Compiled By Tsegaye A. 115 Connection Between Regular Expressions and Regular Languages As the terminology recommends, the connection between regular languages and regular expressions is a close one. The two concepts are essentially the same; for every regular language there is a regular expression, and for every regular expression there is a regular language. Regular Expressions Denote Regular Languages Examples ɸ is a regular expression for regular language ɸ. If a ∈ Σ (Σ represents the input alphabet), a is regular expression for regular language {a}. If a and b are regular expression, a + b is also a regular expression with language {a, b}. (a, b*) is a regular expression for regular language {a, ab, abb, abbb, abbbb,….} Compiled By Tsegaye A. 116 Connection Between Regular Expressions and Regular Languages Regular Languages expressed by Regular Expressions Examples Regular expressions describe regular languages Example: ( a + b c) * describes the language: a, bc* = , a, bc, aa, abc, bca,... Compiled By Tsegaye A. 117 Regular Grammars A grammar is a set of rules for a strings generation in a formal language. These rules describe how does strings forms from the language that are valid according to the language syntax. A grammar does not describe the meaning of the generated string Grammars are often alternative way of specifying languages. Grammars express languages Example: English language → → A regular grammar is any right- → linear or left-linear grammar → a → the Regular Grammars generate Regular → boy Languages (that is why they are → dog called Regular Grammars). → runs → walks Compiled By Tsegaye A. 118 Regular Grammars A derivation of "the boy walks" is: Compiled By Tsegaye A. 119 Regular Grammars A derivation of "a dog runs" is: Compiled By Tsegaye A. 120 Regular Grammars Then, the Language of the grammar is: L = { "a boy runs", "a boy walks", "the boy runs", "the boy walks", "a dog runs", "a dog walks", "the dog runs", "the dog walks" } Compiled By Tsegaye A. 121 Regular Grammars Notations noun boy noun dog Variable Terminal Production or rule Non-terminal 122 Compiled By Tsegaye A. 122 Grammars Definition of a grammar A grammar is a quadruple G = (V, T, S, P), Where: V: finite set of variables/ non terminals T: finite set of terminal symbols S: start variable € V P: set of production rules Compiled By Tsegaye A. 123 The four components ❖terminal symbols ▪ atomic components of statements in the language ▪ appear in source programs ▪ identifiers, keywords and so ❖Nonterminal symbols intermediate elements in producing terminal symbols never appear in source program ❖start (or goal) symbol a special nonterminal which is the starting symbol for producing statements ❖productions ▪ rules for transforming nonterminal symbols into terminals or other nonterminal ▪ "nonterminal" ::= terminals and/or nonterminals ▪ each has left-hand side (LHS) and right-hand side (RHS) ▪ every nonterminal must appear on LHS of at least one production Compiled By Tsegaye A. 124 Grammars Definition: Let G = (V,T, S, P) be a grammar. Then the set is the language generated by G, where represents an unspecified number of derivations (including zero, if not including zero, we then use ) that can be taken from S to w. Compiled By Tsegaye A. 125 Grammars Example: Find the grammar that generates the language Both grammar G1 defined by: with P1 consisting of the productions: S → aSb S→ λ and grammar G2 defined by: with P 2 consisting of the productions: will generate the language 126 Compiled By Tsegaye A. Grammars Compiled By Tsegaye A. 127 Grammars Compiled By Tsegaye A. 128 Grammars Compiled By Tsegaye A. 129 Linear Grammars A linear grammar is a grammar in which at most one variable can occur on the right side of any production, without restriction on the position of this variable. Examples: S → aSb S → Ab S → A → aAb A → λ In other words a linear grammar is one in which no right-hand side of any production contains more than one occurrence of a variable. A language is linear if it has a linear grammar. Compiled By Tsegaye A. 130 Linear Grammars Another Linear Grammar example Grammar: G S→A A → aB | B → Ab L(G ) = {a n b n : n 0} Compiled By Tsegaye A. 131 A Non-Linear Grammar A non-linear grammar is a grammar in which more variables can occur on the right side of any production, without restriction on the position of this variable. Example:Grammar: G S → SS S → S → aSb S → bSa 𝐿(𝐺) = {𝑤: |𝑤|𝑎 = |𝑤|𝑏 } The above grammar is non-linear because one of the rules (the first one) has two variables/ non-terminals on the right-hand side. Compiled By Tsegaye A. 132 Right-Linear Grammars A grammar G =(V, T, S, P) is said to be right-linear if all productions are of the form: A → xB, or A → x, where A, B ∈ V, and x ∈ T*. Now consider those grammars in which all of the rules contain only one non- terminal on the right-hand side, and that in every case, that non terminal is the right-most symbol. Such grammars are called "right linear." Compiled By Tsegaye A. 133 Right-Linear Grammars Example: The grammar G1 = ({S}, {a,b},S,P1), with P1 given as S → abS |a is right-linear. The sequence: S ⇒ abS ⇒ ababS ⇒ababa is a derivation with G1 The language L (G1) is denoted by the regular expression r = (ab)* a. Compiled By Tsegaye A. 134 Left-Linear Grammars A grammar G =(V, T, S, P) is said to be left-linear if all productions are of the form: A → Bx or A→ x where A, B ∈ V, and x ∈ T*. Now let’s consider a similar class of grammars in which all of the rules contain only one non-terminal on the left-hand side, and where in every case that non-terminal is the first symbol. Example 1: Example 2: S → Aab The grammar G2 = ({S, S1, S2}, {a, b}, S, P2), with productions S → S1ab, A → Aab | B S1 → S1ab|S2, B→a S2 → a, is left-linear. The language L(G2) is expressed by RE = (aab(ab)*). Compiled By Tsegaye A. 135 Right vs. Left-Linear Grammars Right versus left linear grammars Right Linear Grammars: Left Linear Grammars: Rules of the forms Rules of the forms A→ ε A→ ε A→a A→a A → aB A → Ba A,B: variables/non terminals A,B: variables/non terminals and and a: terminal a: terminal Left-linear/left regular grammars, in which all non-terminals in right hand sides are at the left ends. Right-linear/right regular grammars, in which all non-terminals in right hand sides are at the right ends. Compiled By Tsegaye A. 136 Regular Grammars con.. Note that, a regular grammar is always linear, but not all linear grammars are regular. Example: the grammar G: S → A, A → aB| λ, B → Ab, is not a regular grammar. Although every production is either in right-linear or left linear form, the above grammar itself is neither right-linear nor left-linear, and therefore is not regular grammar. Left-linear/left regular grammars, in which all non-terminals in right hand sides are at the left ends. Right-linear/right regular grammars, in which all non-terminals in right hand sides are at the right ends. Compiled By Tsegaye A. 137 Regular grammars generate regular languages Examples: G1 G2 S → abS S → Aab S →a A → Aab | B L(G1) = (ab) * a B→a L(G2 ) = aab(ab) * Compiled By Tsegaye A. 138 Exercises 1. Find the grammar that generates the language: 2. Find the language generated by the grammar with productions: Compiled By Tsegaye A. 139 Conversion of RG to NFA and vice versa This can be proven in the same constructive manner that we used previously to show that any NFA can be converted into an equivalent DFA. In this case, we (1) give a sequence of steps for converting a Regular Grammar into an equivalent NFA, and (2) give a sequence of steps for converting an NFA into an equivalent Regular Grammar. Compiled By Tsegaye A. 140 Converting (right-linear) Regular Grammar → NFA step 1 – create states for each non-terminal that present in grammar and special state for final state step 2 – productions of the form V → aW result in a transition from V to W that consumes symbol a. step 3 – productions of the form V → W result in a transition from V to W that consumes symbol λ. step 4 – productions of the form V → ab…cW result in a series of transitions starting at V and ending at W, that consume the symbols a, b,…,c and pass through intermediate series of states as necessary. step 5 – productions of the form V → ab…c| λ result in a series of transitions starting at V and ending in an accept state, passing through an intermediate series of states as necessary. 141 Compiled By Tsegaye A. Converting (right-linear) Regular Grammar → NFA Example Construct NFA M for the following right-linear grammar: S → aA | B A → aa B A B→b B|a S VF special final state B Note that every state is a grammar variable: Compiled By Tsegaye A. 142 Converting (right-linear) Regular Grammar → NFA Add edges for each production: a A S VF B S aA Compiled By Tsegaye A. 143 Converting (right-linear) Regular Grammar → NFA a A S VF B S aA |B Compiled By Tsegaye A. 144 Converting (right-linear) Regular Grammar → NFA A a a S a VF B S aA |B A aa B Compiled By Tsegaye A. 145 Converting (right-linear) Regular Grammar → NFA A a a S a VF B S aA| B b A. aa B B. bB Compiled By Tsegaye A. 47 146 Converting (right-linear) Regular Grammar → NFA A a a S a VF a B S aA| B b A. aa B B. bB |a Compiled By Tsegaye A. 48 147 Converting (right-linear) Regular Grammar → NFA A a a S a VF a B b S aA aaaB aaabB aaaba 49 Compiled By Tsegaye A. 148 Converting (right-linear) Regular Grammar → NFA Grammar G A a S aA| B a A aa B S a B bB |a VF a B L(M ) L(G) b aaab* a b*a Compiled By Tsegaye A. 149 Example: RG to NFA Transform the following Right Linear grammar in an equivalent NFAε. S → aS | bA Solution-1: A → cA | ε a c Solution: b ε S Α f L={ab, abc, abcc, …, b, bc, bcc, …} Solution-2: a c You can remove the final state f by making state A as final state since A→ε b S Α A Compiled By Tsegaye A. Example: RG to NFA Transform the following Right Linear grammar in an equivalent NFAε. r S → rA | r|ε Solution: L={ε, r, rt, rtr, rtrt, rtrtr, rtrtrt, …} S r Α t A → tS | t F t Since S is final state, A→t can link to S NB: state S be a final state i.e., productions A→t |tS used common since S→ε to generate ε string edge/label r F So, it can be looks like this: S r Α t or, we can make A as final state and connect S→r to final state A. r S A i.e., productions S→r|rA used common edge/label and looks like: t Compiled By Tsegaye A. Converting NFA → Regular Grammar (right linear) step 1 – convert edges from V to W consuming a, to productions of the form V → aW step 2 – convert null edges (λ) from V to W into productions of the form V→W step 3 – convert accept states (say, V) into rules of the form V→ λ Compiled By Tsegaye A. 152 Converting NFA → Regular Grammar (right linear) Compiled By Tsegaye A. 153 Example-2 NFA to Right Linear RG L(G) L(M ) Lb G q0 aq1 a a q0 q1 q2 q1 bq1 q1 aq2 b q2 bq3 q3 q3 q1 q3 67 Compiled By Tsegaye A. 154 Example-3 DFA to Right Linear RG Transform the following DFA to a right linear grammar b a q0 q1 b a Solution: q0 → aq1 | bq0 q1 → aq1 | bq0 | ε Compiled By Tsegaye A. 155 The case of Left-Linear Grammars Let G be a left-linear grammar We will prove: L(G) is regular Proof idea: We will construct a right-linear grammar G with L(G) L(G ) R Compiled By Tsegaye A. 156 Since G is left-linear grammar the productions looks like: Left linear G: A → Ba1a2…ak Or Left linear G: A→a1a2...ak A → Bv A→ v Right linear G’: A → ak…a2a1B Right linear G’: A → ak…a2a1 A → vRB A → vR So L(G) L(G ) R Compiled By Tsegaye A. 157 Example C → Bc B → Ab A→a The derivation of abc is: C → Bc → Abc → abc, or abc ← Abc ← Bc ← C Then So you should start creating the string abc from right to left. But this is equivalent with creating the reverse of cba. C → cB → cbA → cba and then take the reverse. Compiled By Tsegaye A. 158 Example (continued) The Right Linear grammar with the rules A → Ba reversed is C → cB B → bA A→ a and it produces the reverse language. So, just create the NFAε for the language produced by the Right Linear grammar and then compute the reverse (change start with final state and reverse the arrows). This is an NFAε for the Left Linear grammar. Compiled By Tsegaye A. 159 Example Left linear G Right linear G’ b b a A -> Ba/Ab/b A -> aB/bA/b A a B a C a F B -> Ca/Bb B -> aC/bB a b C -> Aa/Ca/a C -> aA/aC/a Finally, create reverse of the above FA Or FA can be as: A C b b a b a a a A B C F a b reverse of the above FA NB: here, Final states are A and C in both figures, because C they are having terminal b production Compiled By Tsegaye A.