03 Lecture Notes PDF
Document Details
Uploaded by RestoredAzurite2265
TUM
Tags
Summary
These lecture notes cover context-free languages, including regular expressions, pushdown automata, and grammars. They provide a theoretical foundation in computer science. The notes also outline concepts like the Pumping Lemma and examples of non-regular languages.
Full Transcript
Administrivia Last time: regular expressions, properties of regular languages, pumping lemma Today: Context-free languages, pushdown automata, grammars Reading: Chapter 2 Test date poll Anonymous feedback now Recall The language of an auto...
Administrivia Last time: regular expressions, properties of regular languages, pumping lemma Today: Context-free languages, pushdown automata, grammars Reading: Chapter 2 Test date poll Anonymous feedback now Recall The language of an automaton is the set of strings it accepts For now computation simply means determining whether a given string is in the language or not (decision problem) Two FA variants: deterministic and non-deterministic Regular language () some FA recognizes it There are 3 regular operations: union, concatenation and Kleene star Regular expressions Regular language () some regular expression describes it Non-regular languages and the pumping lemma Regular Expressions R is a regular expression if R is 1 a for some a in the alphabet ⌃, 2 ✏, 3 ;, 4 (R1 [ R2 ), where R1 and R2 are regular expressions, 5 (R1 R2 ), where R1 and R2 are regular expressions, or 6 (R1⇤ ), where R1 is a regular expression. The Pumping Lemma (PL) for Regular Languages Theorem (PL) If A is a regular language, then there exists a number p (the pumping length) so that if s is any string in A of length at least p, then s may be divided into three pieces, s = xyz, satisfying the following conditions: 1 xy i z 2 A, for each i 0, 2 |y | > 0, and 3 |xy | p. Proving a Language Non-Regular Use PL to prove a language is non-regular. Think of it as a game: You are given the language L You are given a value p You choose a string s longer than p Choose carefully in order to break the claim that s satisfies the 3 conditions of the PL Note that you do not choose how s is broken into 3 parts s = xyz You must argue that no matter how s is broken into 3 parts there is always a contradiction. Choosing s poorly can make your job harder or impossible. Examples of Non-Regular Languages Claim L = {0n 1n |n 0} is not a regular language Claim L = {ww |w 2 {0, 1}⇤ } is not a regular language Claim L = {0i 1j |i > j} is not a regular language Context Free Languages Regular languages are characterized by FAs and REs CFLs are characterized by Context Free Grammars (CFGs) CFLs are characterized by pushdown automata (PDAs) PDAs are machines with memory (stack) Example of a CFL: L = {0n 1n |n 0} Unlike FAs, DPDAs and NPDAs are not equivalent in power CFGs are Type 2 grammars in Chomsky’s hierarchy Context Free Grammars Definition A context-free grammar is a 4-tuple (V , ⌃, R, S), where: 1 V is a finite set of variables, 2 ⌃ is a finite set of terminals, ⌃ \ V = ; 3 R is a finite set of rules, ↵ ! , where ↵ is a variable and a string of variables and terminals 4 S 2 V is the start variable Example: G = ({S}, {0, 1}, R, S), where the set of rules R is: S ! 0S1 S !✏ Context Free Grammars Definition A context-free grammar is a 4-tuple (V , ⌃, R, S), where: 1 V is a finite set of variables, 2 ⌃ is a finite set of terminals, ⌃ \ V = ; 3 R is a finite set of rules, ↵ ! , where ↵ is a variable and a string of variables and terminals 4 S 2 V is the start variable Example: G = ({S}, {0, 1}, R, S), where the set of rules R is: S ! 0S1 S !✏ And the language of G is L = {0n 1n |n 0} Context Free Grammars Definition If u, v , and w are strings of variables and terminals, and A ! w is a rule of the grammar, we say that uAv yields uwv , written ⇤ uAv ) uwv. We say that u derives v , written u = ) v , if u = v or if a sequence u1 , u2 ,... , uk exists for k 0 and u 1 ) u2 ) · · · ) uk ) v. ⇤ The language of the grammar is {w 2 ⌃⇤ |S = ) w }. Any language that can be generated by some context-free grammar is called a context-free language (CFL). Example: S ! 0S1|✏ Context Free Grammars Definition If u, v , and w are strings of variables and terminals, and A ! w is a rule of the grammar, we say that uAv yields uwv , written ⇤ uAv ) uwv. We say that u derives v , written u = ) v , if u = v or if a sequence u1 , u2 ,... , uk exists for k 0 and u 1 ) u2 ) · · · ) uk ) v. ⇤ The language of the grammar is {w 2 ⌃⇤ |S = ) w }. Any language that can be generated by some context-free grammar is called a context-free language (CFL). Example: S ! 0S1|✏ Parsing and parse trees: compilers and interpreters use parsers to extract the meaning of a program before generating the compiled code; the parser can be built, given a CFG. Context-Free Languages and Grammars hSENTENCE i ! hNOUN.PHRASE ihVERB.PHRASE i hNOUN.PHRASE i ! hCMPLX.NOUNi|hCMPLX.NOUNihPREP.PHRASE i hVERB.PHRASE i ! hCMPLX.VERBi|hCMPLX.VERBihPREP.PHRASE i hPREP.PHRASE i ! hPREPihCMPLX.NOUNi hCMPLX.NOUNi ! hARTICLE ihNOUNi hCMPLX.VERBi ! hVERBi|hVERBihNOUN.PHRASE i hARTICLE i ! a|the hNOUNi ! boy |girl|binoculars hVERBi ! sees|likes hPREPi ! with Where ⌃ = {a, b, c,..., z,“ ”} and “ ” is the blank symbol, placed invisibly after each word, so the words won’t run together. Strings generated by this grammar include: The boy sees the girl The girl likes the boy with the binoculars A boy sees hSENTENCE i ) hNOUN.PHRASE ihVERB.PHRASE i ) hCMPLX.NOUNihVERB.PHRASE i ) hARTICLE ihNOUNihVERB.PHRASE i ) a hNOUNihVERB.PHRASE i ) a boy hVERB.PHRASE i ) a boy hCMPLX.VERBi ) a boy hVERBi ) a boy sees CFLs and CFGs Context-free grammars are used to specify the syntax of a language. Programming languages such as Python and Java have context-free grammars and they are used for parsing. The O(n3 |G |) CYK algorithm uses dynamic programming to check whether a given string (program) can be generated by the context-free grammar for the language. Grammars We have seen two di↵erent, though equivalent, methods of describing regular languages: FAs and REs A grammar provides yet another way to describe a language While an automaton recognizes a language, a grammar generates the language The grammars for regular languages are also known as type 3 grammars The grammars for context-free languages are context-free grammars, also known as type 2 grammars There exists a finite automaton for a language L () there exists a type 3 grammar for L There exists a pushdown automaton for a language L () there exists a type 2 grammar for L Type 3 Grammars: Example Constructing a CFG for a language that happens to be regular is easy if you can first construct a DFA for that language. You can convert any DFA (Q, ⌃, , q0 , F ) into an equivalent CFG as follows: make a variable Ri for each state qi of the DFA add the rule Ri ! aRj to the CFG if (qi , a) = qj add the rule Ri ! ✏ if qi 2 F (qi is an accept state) make R0 the start variable of the grammar, where q0 is the start state of the machine. Let’s see an example. Type 3 Grammars: Example S ! 0S S ! 0S|1A S ! 1A A ! 1A|0B|✏ A ! 1A B ! 0A|1A A ! 0B A!✏ B ! 0A B ! 1A Designing Grammars 1 L1 = {w |w contains at least three 1s} Designing Grammars 1 L1 = {w |w contains at least three 1s} S1 ! R1R1R1R R ! 0R|1R|✏ Designing Grammars 1 L1 = {w |w contains at least three 1s} S1 ! R1R1R1R R ! 0R|1R|✏ 2 L2 = {w | the length of w is odd and its middle symbol is a 0} Designing Grammars 1 L1 = {w |w contains at least three 1s} S1 ! R1R1R1R R ! 0R|1R|✏ 2 L2 = {w | the length of w is odd and its middle symbol is a 0} S2 ! 0|0S0|0S1|1S0|1S1 3 L3 is strings over the alphabet {a, b} with more a’s than b’s Designing Grammars 1 L1 = {w |w contains at least three 1s} S1 ! R1R1R1R R ! 0R|1R|✏ 2 L2 = {w | the length of w is odd and its middle symbol is a 0} S2 ! 0|0S0|0S1|1S0|1S1 3 L3 is strings over the alphabet {a, b} with more a’s than b’s S3 ! TaT T ! TT |aTb|bTa|a|✏ Ambiguity Type3 grammars and languages are unambiguous: there’s exactly one way to generate a string, exactly one way to parse a string, exactly one way to interpret a string in the language Type2 grammars (CFGs) and CFL can be ambiguous. The following string is ambiguous: The girl sees the boy with the binoculars Such a string has several di↵erent parse trees and thus several di↵erent meanings This result may be undesirable for certain applications, such as programming languages, where a program should have a unique interpretation A grammar generates a string ambiguously, if the string has two di↵erent parse trees, not two di↵erent derivations Ambiguity Two derivations may di↵er merely in the order in which they replace variables yet not in their overall structure To concentrate on structure, we define a type of derivation that replaces variables in a fixed order A derivation of a string w in a grammar G is a leftmost derivation if at every step the leftmost remaining variable is the one replaced Definition A string w is derived ambiguously in CFG G if it has two or more di↵erent leftmost derivations. Grammar G is ambiguous if it generates some string ambiguously. Sometimes when we have an ambiguous grammar we can find an unambiguous grammar that generates the same language Some CFLs, however, can be generated only by ambiguous grammars; such languages are called inherently ambiguous An example of an inherently ambiguous language is {ai b j c k |i = j or j = k} Pushdown Automata Schematic Pushdown Automaton Schematic Finite Automaton Pushdown Automata Schematic Pushdown Automaton Schematic Finite Automaton PDA A pushdown automaton is a 6-tuple (Q, ⌃, , , qs , F ), where Q is a finite set called the states, ⌃ is a finite input alphabet, is a finite stack alphabet, : Q ⇥ ⌃✏ ⇥ ✏ ! P(Q ⇥ ✏ ) is the transition function, qs 2 Q is the start state, F ✓ Q is the set of accept states. Computation of a Pushdown Automaton Let M = (Q, ⌃, , , qS , F ) be a pushdown automaton and let w = w1 w2... wn be a string where each wi is a member of the alphabet ⌃✏. Then M accepts w if there exists a sequence of states r0 , r1 ,... , rn in Q and a sequence of strings s0 , s1 ,... , sn in ✏ that satisfy the three conditions: r0 = qS and s0 = ✏ (ri+1 , b) 2 (ri , wi+1 , a), for i = 0,..., n 1, where si = at and si+1 = bt for some a, b 2 ✏ and t 2 ⇤. rn 2 F. PDA Technicalities The formal PDA definition has no explicit mechanism to test for: Empty stack. We accomplish this by initially placing a special symbol $ on the stack. When we see this on top of the stack we know that the stack is empty. This gives us the ability to ”test for an empty stack.” End of input string. We assume that the accept state takes e↵ect only when the machine is at the end of the input, i.e., if the PDA is an accept state but the input has not yet been completely processed, the PDA does not yet accept. PDA Technicalities The many meanings of a, b ! c: on a from input, pop b, push c: a = ✏: don’t read from input b = ✏: don’t pop top of stack c = ✏: don’t push anything on top of stack a, b ! ✏: on a, pop b, (don’t push) a, ✏ ! c: on a (don’t pop), push c Let’s Build a PDA for a Language Consider the language L = {ai b j c k |i , j, k 0 and i = j or i = k} Let’s Build a PDA for a Language Consider the language L = {ai b j c k |i , j, k 0 and i = j or i = k} Let’s Build a PDA for a Language Consider the language L = {ww R |w 2 {0, 1}⇤ }, where w R means w written backwards (in reverse). Let’s Build a PDA for a Language Consider the language L = {ww R |w 2 {0, 1}⇤ }, where w R means w written backwards (in reverse). Context Free Grammars and Normal Forms Definition A context-free grammar is a 4-tuple (V , ⌃, R, S), where: 1 V is a finite set of variables, 2 ⌃ is a finite set of terminals, ⌃ \ V = ; 3 R is a finite set of rules, ↵ ! , where ↵ is a variable and a string of variables and terminals 4 S 2 V is the start variable Example S ! ASA ! AASAA ! S ! ASA BASAA ! BBSAA ! S ! aB BBSBA ! BBSBB ! A!B BBaBBB ! BBaBB ! A!S BaBB ! BaB ! aB ! a B!b B!✏ Normal Forms for Type 2 Grammars Simplified (standardized) rules such as those in the Chomsky Normal Form or the Greibach Normal Form, provide some nice properties of the derivations: the size of the generated string is proportional to the number of rules applied. Chomsky Normal Form (1959) makes it possible to use a polynomial time algorithm to decide whether a string can be generated by a grammar; see the Cocke-Younger-Kasami (CYK) algorithm. Greibach Normal Form (1965) makes it possible to prove that every CFG can be recognized by a PDA that works in real time (i.e., without ✏ transitions). A ! BC A ! aA1 A2... Ak , k 0 A!a S ! ✏ (if ✏ 2 L) S ! ✏ (if ✏ 2 L) Chomsky Normal Form Definition A context-free grammar is in Chomsky normal form if every rule is of the form: A ! BC A!a where a is any terminal and A, B, and C are any variables, except that B and C may not be the start variable. In addition, we permit the rule S ! ✏, where S is the start variable. - Compare and contrast with right linear type3 grammars (A ! 1B or A ! ✏). - Clearly CNF and GNF are much more restricted than the initial definition of type2 grammars (↵ ! , where ↵ 2 V and is a string over V and T ). So it is not obvious that they are as powerful... Any CFL has a CFG in CNF Theorem Any context-free language can be generated by a context-free grammar in Chomsky normal form. Proof. Let L be a CFL. Then there exists a type 2 grammar G for it. We show how to convert G into Chomsky normal form. The conversion has several stages wherein rules that violate the conditions are replaced with equivalent ones that are satisfactory. add a new start variable. eliminate all ✏-rules of the form A ! ✏. eliminate all unit rules of the form A ! B. convert the remaining rules into the proper form. CFG to CNF Example Phase 1 (new start) S ! ASA|aB S0 ! S A ! B|S S ! ASA|aB B ! b|✏ A ! B|S B ! b|✏ Phase 2 (remove B ! ✏) S0 ! S S0 ! S S ! ASA|aB S ! ASA|aB|a A ! B|S A ! B|S|✏ B ! b|✏ B!b Phase 2 (remove A ! ✏) S0 ! S S0 ! S S ! ASA|aB|a S ! ASA|aB|a|SA|AS|S A ! B|S|✏ A ! B|S B!b B!b CFG to CNF Example Phase 3 (remove S ! S) S0 ! S S0 ! S S ! ASA|aB|a|SA|AS|S S ! ASA|aB|a|SA|AS A ! B|S A ! B|S B!b B!b Phase 3 (remove S0 ! S) S0 ! S S0 ! ASA|aB|a|SA|AS S ! ASA|aB|a|SA|AS S ! ASA|aB|a|SA|AS A ! B|S A ! B|S B!b B!b CFG to CNF Example Phase 3 (remove A ! B) S0 ! ASA|aB|a|SA|AS S0 ! ASA|aB|a|SA|AS S ! ASA|aB|a|SA|AS S ! ASA|aB|a|SA|AS A ! B|S A ! S|b B!b B!b Phase 3 (remove A ! S) S0 ! ASA|aB|a|SA|AS S0 ! ASA|aB|a|SA|AS S ! ASA|aB|a|SA|AS S ! ASA|aB|a|SA|AS A ! S|b A ! b|ASA|aB|a|SA|AS B!b B!b CFG to CNF Example Phase 4 (fix remaining rules) S0 ! ASA|aB|a|SA|AS S0 ! AA1 |UB|a|SA|AS S ! ASA|aB|a|SA|AS S ! AA1 |UB|a|SA|AS A ! b|ASA|aB|a|SA|AS A ! b|AA1 |UB|a|SA|AS B!b A1 ! SA U! a B!b