lecture_notes-Week2.pdf
Document Details
Tags
Full Transcript
Chapter 4 Regular Expressions 4.1 Regular Expression - An algebraic way to represent regular languages. - Some practical applications: pattern matching in text editors, used in compiler design. Some examples Expression Language 0...
Chapter 4 Regular Expressions 4.1 Regular Expression - An algebraic way to represent regular languages. - Some practical applications: pattern matching in text editors, used in compiler design. Some examples Expression Language 0 {0} 1 {1} 0∪1 {0, 1} 0∗ {, 0, 00, 000,...} (0 ∪ 1)∗ {, 0, 1, 00, 01, 10,...} (0 ∪ 1) · 1∗ {0, 1, 01, 11, 011, 111,...} {} ∅ {} Each expression corresponds to a language. Regular expressions are defined inductively as shown below. Definition 4.1.1. R is said to be a regular expression (or RE in short) if R has one of the following forms: Language of the regular Regular Expression Comment expression or L(R) ∅ {} the empty set {} the set containing only a {a} a∈Σ for two regular expressions R1 and R1 ∪ R2 L(R1 ) ∪ L(R2 ) R2 for two regular expressions R1 and R1 · R2 L(R1 ) · L(R2 ) R2 R1∗ (L(R1 ))∗ for a regular expression R1 (R1 ) L(R1 ) for a regular expression R1 Remark. Note the following 27 - Regular expressions are well defined. In other words, each regular expression corresponds to a unique language. Is the converse true? - ∪ is often replaced by +. Hence R1 ∪ R2 is the same as R1 + R2. - The dot symbol is often discarded. - () gives precedence to an expression (similar to standard arithmetic). - Order of precedence (higher to lower): () ∗ · ∪ - The language corresponding to the RE ∅∗ is {}. (since is the concatenation of zero symbols from the set ∅) Some more examples of REs and their corresponding languages. R L(R) 01 {01} 01 + 1 {01, 1} (01 + )1 {011, 1} (0 + 10)∗ ( + 1) {, 0, 10, 00, 001, 010, 0101,...} Informally, L(R) consists of all those strings that “matches” the regular expression R. Let us see some examples of the other type. That is given a regular language, what is the corresponding regular expression. Language RE {w | w has a single 1} 0∗ 10∗ {w | w has at most a single 1} 0 + 0∗ 10∗ ∗ {w | |w| is a multiple of 3} ((0 + 1)(0 + 1)(0 + 1))∗ {w | w has a 1 at every odd position and |w| is odd} 1((0 + 1)1)∗ {w | w has a 1 at every even position} ((0 + 1)1)∗ + (0 + 1)((0 + 1)1)∗ We say that two regular expressions R1 and R2 are equivalent (denoted as R1 ≡ R2 ) if L(R1 ) = L(R2 ). Note 3. Some basic algebraic properties of REs. 1. R1 + (R2 + R3 ) = (R1 + R2 ) + R3 2. R1 (R2 R3 ) = (R1 R2 )R3 3. R1 (R2 + R3 ) = R1 R2 + R1 R3 4. (R1 + R2 )R3 = R1 R3 + R2 R3 5. R1 + R2 = R2 + R1 (only addition is commutative)) 6. (R∗ )∗ = R∗ 7. R = R = R 8. R∅ = ∅R = ∅ 9. R + ∅ = R 4.2 Regular Expressions and Regular Languages Theorem 5. A language L is regular if and only if L = L(R) for some regular expression R. In other words, REs are equivalent in power to NFAs/DFAs. 4.2.1 Converting an RE to an NFA Given a regular expression, we will convert it into an NFA N such that L(R) = L(N ). We will give a case based analysis based on the inductive definition of REs. Case 1: R = ∅. NFA is start Case 2: R = . NFA is start Case 3: R = a for some a ∈ Σ. NFA is a start Case 4: R = R1 + R2 , where R1 and R2 are two REs. Let N1 and N2 be the NFAs for R1 and R2 respectively. Then the NFA for R is q1 r1 N1 start q r q2 r2 N2 Case 5: R = R1 R2 , where R1 and R2 are two REs. Let N1 and N2 be the NFAs for R1 and R2 respectively. Then the NFA for R is q q1 r1 q2 r2 start r N1 N2 Case 6: R = R1∗ , where R1 is an RE. Let N1 be the NFA for R1. Then the NFA for R is q q1 r1 start r N1 The above construction constructs an NFA from an RE in an inductive manner. Therefore the class of languages accepted by regular expressions are a subset of regular languages. 4.2.2 Generalized Nondeterministic Finite Automaton We will now prove that for every regular language there exists a regular expression. For this we will introduce another type of finite automaton known as generalized non-deterministic finite automaton (or GNFA). A GNFA is a non-deterministic automaton with transitions being labeled with regular expressions instead of just symbols from the alphabet or . Here is an example of a GNFA. 11 q1 01∗ 0 + 01 (11)∗ 1 qstart ∅ q2 1∗ qaccept start 0 00∗ Strings accepted by the above GNFA: - 01101 : in multiple ways. - 00 : at least 3 ways. - 0100 Strings not accepted by the above GNFA: - 10 : no way to partition so that it matches a sequence from start to accept state - A string w ∈ Σ∗ is accepted by a GNFA if w = w1 w2... wk , where each wi ∈ Σ∗ and there exists a sequence of states q0 , q1 ,... qk , such that - q0 is the start state, - qk is the accept state, and - for each i, if the transition from qi−1 to qi is labeled with the regular expression Ri , then wi ∈ L(Ri ). We assume the following conditions on a GNFA without loss of generality. 1. Has a unique start state and a unique accept state. 2. The start state has a transition going out to every other state (excluding itself). 3. No transition coming into the start state from any other state. 4. The accept state has a transition coming in from every other state (excluding itself). 5. No transition going out of the accept state to any other state. 6. Except for the start and accept states, there are transitions between every pair of states (in both directions), and also from a state to itself. 4.2.3 Converting a DFA to an RE We will illustrate the algorithm with an example. 1. Consider the following DFA. a a, b start 1 2 b a b 3 2. We convert the DFA into a GNFA satisfying the above assumptions. - Create new start state s and new start accepting state t. Let the new set of states be Q - Add transition from s to old start state. - Add transitions from old accept states to t. - Make sure there are transitions from s to every state in the GNFA (except s itself), and from every state (except t) to t. - Add transitions from every state in Q \ {s, t} to every other state in Q \ {s, t}, putting the label ∅, if a transition did not exist there earlier. ∅ ∅ a a+b start s 1 ∅ 2 t ∅ ∅ b a ∅ b 3 ∅ 3. We now remove states in Q \ {s, t}, one at a time. replace the resulting transitions with suitable labels as described below. Consider the following set of 3 states with regular expressions labeled on the transitions, and qrip is the state that we want to remove. R1 qi qj R2 R4 qrip R3 Then on removing qrip , the resulting GNFA will be R1 + R2 R3∗ R4 qi qj - GNFA after removing state 1. a a+b start s 2 t ∅ b + a(a + b) b +a ∅ 3 ∅ - GNFA after removing state 2. (a + b)a∗ b +a start s 3 t (b + a(a + b))a∗ b - GNFA after removing state 3. + ((a + b)a∗ b)((b + a(a + b))a∗ b)∗ ( + a) start s t Therefore regular expression corresponding to the given DFA is + ((a + b)a∗ b)((b + a(a + b))a∗ b)∗ ( + a) Chapter 5 Properties of Regular Languages 5.1 Closure Properties We have already seen that regular languages are closed under union, concatenation and star operations. We will discuss some more closure properties of regular languages. 5.1.1 Complement, Intersection and Set Difference It is easy to see that regular languages are closed under complement. If D = (Q, Σ, δ, q0 , F ) is a DFA for a regular language L then a DFA for L is D0 = (Q, Σ, δ, s, Q \ F ). That is the DFA whose accept states are the non-accept states of the DFA D and vice versa. Then if w ∈ L(D) then w∈ / L(D0 ) and w ∈ / L(D) then w ∈ L(D0 ). Using De Morgan’s Law, A ∩ B = A ∪ B. Since regular languages are closed under union and complement, hence they are also closed under intersection. A \ B = A ∩ B. Hence regular languages are closed under set difference. 5.1.2 Reversal Let w = a1 a2... an be a string. Then rev(w) = an an−1... a1. Extending the definition, we say that for a language L ⊆ Σ∗ , rev(L) = {rev(w) | w ∈ L}. Theorem 6. If L is regular then rev(L) is also regular. Consider a DFA D = (Q, Σ, δ, q0 , F ) such that L = L(D). Now any string that is in the language L, will start at the start state q0 and end up at one of the accept states in F. To design an automaton for rev(L) we will invert the transitions of D. Since we do not know a priori in which state a string would be accepted, we would use nondeterminism to “guess” a starting position in the reversed automaton. Here is the formal construction. Let D0 = (Q0 , Σ, δ 0 , q00 , F 0 ) be an NFA for rev(L) defined as follows. - Q0 = Q ∪ {q00 }. - δ(q, a) = {r | δ(r, a) = q} - F 0 = {q0 } 35 5.1.3 First-Halves For a language L ⊆ Σ∗ , define FirstHalves(L) = {x | ∃y such that |x| = |y|, xy ∈ L}. For example, let L = {0, 10, 110, 1011, 100110} then FirstHalves(L) = {1, 10, 100}. Theorem 7. If L is regular then FirstHalves(L) is also regular. Let D = (Q, Σ, δ, q0 , F ) be a DFA such that L = L(D). We will design a pebble game on the DFA D, corresponding to the language FirstHalves(L) and then use the game to construct an automaton for FirstHalves(L). B, G R Figure 5.1: The DFA D: Initial configuration of the game 1. Starting configuration of the game - The red pebble R is placed at the start state of the DFA D. - The blue pebble B and the green pebble G are together placed at a nondeterministically chosen state of D (see the Figure 5.1.3). Let w ∈ Σ∗. Then R will correspond to tracing the first half of the string w, G will correspond to tracing the second half of the string, and B will remember the initial position of G. 2. Moves of the game. - R moves according to the transition function of D. - B remains static. - For every step of R, G takes one step nondeterministically. 3. Winning configuration of the game - R and B are in the same state. - G is in some accept state of D (see Figure 2). B, R G Figure 5.2: The DFA D: Winning configuration of the game We will now design an NFA for FirstHalves(L) based on the above game. Let N = (Q0 , Σ, δ 0 , q00 , F 0 ) where - Q0 = Q3 ∪ {qs }, where qs is an additional state. - δ 0 (qs , ) = {(q0 , q, q) | q ∈ Q} δ 0 ((p, q, r), a) = {(δ(p, a), q, (δ(r, b)) | b ∈ Σ} - q00 = qs. - F 0 = {(q, q, f ) | q ∈ Q, f ∈ F } Exercise 8. 1. For a language L, let MiddleThirds(L) = {y | ∃x, z and |x| = |y| = |z| and xyz ∈ L} For example, MiddleThirds({, a, ab, bab, bbab, aabbab}) = {, a, bb}. Prove that if L is regular, MiddleThirds(L) is also regular. 2. Given L ⊆ {0, 1}∗ , define L0 = {xy | x1y ∈ L}. Show that if L is regular then L0 is also regular. 3. For a language A, let A00 = {xz | ∃y and |x| = |y| = |z| and xyz ∈ A} Show that even if A is regular, A00 is not necessarily regular.