Full Transcript

**University of Science and Technology**
**Faculty of Computer Science and Information Technology**
[Computer Science Department]

Subject: Theory of Computation
Lecture (8)
Instructor: Prof. Noureldien Abdelrahman
Date: 21-12-2022

**Types of Phrase-Structure Grammars**

Phrase-structure grammars can be classified according to the types of productions that are allowed.

A **type 0** grammar has no restrictions on its productions.

A **type 1** grammar can have productions of the form *w*₁ → *w*₂, where *w*₁ = *lAr* and *w*₂ = *lwr*, where *A* is a nonterminal symbol, *l* and *r* are strings of zero or more terminal or nonterminal symbols, and *w* is a nonempty string of terminal or nonterminal symbols. It can also have the production *S* → *λ* as long as *S* does not appear on the right-hand side of any other production.

A **type 2** grammar can have productions only of the form *w*₁ → *w*₂, where *w*₁ is a single symbol that is not a terminal symbol.

A **type 3** grammar can have productions only of the form *w*₁ → *w*₂ with *w*₁ = *A* and either *w*₂ = *aB* or *w*₂ = *a*, where *A* and *B* are nonterminal symbols and *a* is a terminal symbol, or with *w*₁ = *S* and *w*₂ = *λ*.

Type 2 grammars are called **context-free grammars** because a nonterminal symbol that is the left side of a production can be replaced in a string whenever it occurs, no matter what else is in the string. A language generated by a type 2 grammar is called a **context-free language**.

When there is a production of the form *lw*₁*r* → *lw*₂*r* (but not of the form *w*₁ → *w*₂), the grammar is called type 1 or **context-sensitive** because *w*₁ can be replaced by *w*₂ only when it is surrounded by the strings *l* and *r*. A language generated by a type 1 grammar is called a **context-sensitive language**.

Type 3 grammars are also called **regular grammars**. A language generated by a regular grammar is called **regular**.
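The four definitions above can be checked mechanically, one production at a time. The following is a minimal sketch, not part of the lecture: it assumes each symbol is a single character, that uppercase letters are nonterminals and everything else is a terminal, and that *λ* is written as the empty string. The names `classify`, `is_type1`, `is_type2`, and `is_type3` are hypothetical helpers introduced only for illustration.

```python
def is_nonterminal(symbol):
    # Assumption: uppercase letters are nonterminals, all other symbols are terminals.
    return symbol.isupper()


def is_type3(lhs, rhs, start):
    # A -> aB, A -> a, or S -> lambda (lambda is the empty string here).
    if len(lhs) != 1 or not is_nonterminal(lhs):
        return False
    if rhs == "":
        return lhs == start
    if len(rhs) == 1:
        return not is_nonterminal(rhs)
    return len(rhs) == 2 and not is_nonterminal(rhs[0]) and is_nonterminal(rhs[1])


def is_type2(lhs, rhs):
    # The left-hand side is a single nonterminal; the right-hand side is unrestricted.
    return len(lhs) == 1 and is_nonterminal(lhs)


def is_type1(lhs, rhs, start, productions):
    # S -> lambda is allowed only if S appears on no right-hand side.
    if rhs == "":
        return lhs == start and all(start not in r for _, r in productions)
    # Otherwise look for a split lhs = l A r with rhs = l w r and w nonempty.
    for i, symbol in enumerate(lhs):
        if not is_nonterminal(symbol):
            continue
        left, right = lhs[:i], lhs[i + 1:]
        if (len(rhs) > len(left) + len(right)
                and rhs.startswith(left) and rhs.endswith(right)):
            return True
    return False


def classify(productions, start="S"):
    # Return the largest type number whose definition every production satisfies.
    if all(is_type3(l, r, start) for l, r in productions):
        return 3
    if all(is_type2(l, r) for l, r in productions):
        return 2
    if all(is_type1(l, r, start, productions) for l, r in productions):
        return 1
    return 0


if __name__ == "__main__":
    print(classify([("S", "0A"), ("A", "1A"), ("A", "1")]))   # regular
    print(classify([("S", "0S1"), ("S", "")]))                # context-free
    print(classify([("S", "aAb"), ("aAb", "aBCb")]))          # context-sensitive
    print(classify([("S", "AB"), ("AB", "A")]))               # unrestricted
```

The function reports the type of the grammar as written, not of the language it generates: a language may have a more restrictive grammar than the one supplied.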
Section 13.4 deals with the relationship between regular languages and finite-state machines.

Context-free and regular grammars play an important role in programming languages. Context-free grammars are used to define the syntax of almost all programming languages. These grammars are strong enough to define a wide range of languages. Furthermore, efficient algorithms can be devised to determine whether and how a string can be generated. Regular grammars are used to search text for certain patterns and in lexical analysis, which is the process of transforming an input stream into a stream of tokens for use by a parser.

**EXAMPLE 9** It follows from Example 5 that {0ⁿ1ⁿ | *n* = 0, 1, 2, ...} is a context-free language, because the only productions in the grammar of that example are *S* → 0*S*1 and *S* → *λ*, each of which has a single nonterminal symbol on its left-hand side. However, it is not a regular language.

**EXAMPLE 10** The set {0ⁿ1ⁿ2ⁿ | *n* = 0, 1, 2, ...} is a context-sensitive language, because it can be generated by a type 1 grammar, as Example 7 shows, but not by any type 2 grammar. (This is shown in Exercise 28 in the supplementary exercises at the end of the chapter.)

Table 1 summarizes the terminology used to classify phrase-structure grammars.

**Derivation Trees**

A derivation in the language generated by a context-free grammar can be represented graphically using an ordered rooted tree, called a **derivation tree**, or **parse tree**. The root of this tree represents the starting symbol. The internal vertices of the tree represent the nonterminal symbols that arise in the derivation. The leaves of the tree represent the terminal symbols that arise. If the production *A* → *w* arises in the derivation, where *w* is a word, the vertex that represents *A* has as children vertices that represent each symbol in *w*, in order from left to right.

**EXAMPLE 11** Construct a derivation tree for the derivation of *the hungry rabbit eats quickly*, given in the introduction of this section.
*Solution:* The derivation tree is shown in Figure 1.

The problem of determining whether a string is in the language generated by a context-free grammar arises in many applications, such as in the construction of compilers. Two approaches to this problem are indicated in Example 12.

**EXAMPLE 12** Determine whether the word *cbab* belongs to the language generated by the grammar *G* = (*V*, *T*, *S*, *P*), where *V* = {*a*, *b*, *c*, *A*, *B*, *C*, *S*}, *T* = {*a*, *b*, *c*}, *S* is the starting symbol, and the productions are

*S* → *AB*, *A* → *Ca*, *B* → *Ba*, *B* → *Cb*, *B* → *b*, *C* → *cb*, *C* → *b*.

*Solution:* One way to approach this problem is to begin with *S* and attempt to derive *cbab* using a series of productions. Because there is only one production with *S* on its left-hand side, we must start with *S* ⇒ *AB*. Next we use the only production that has *A* on its left-hand side, namely, *A* → *Ca*, to obtain *S* ⇒ *AB* ⇒ *CaB*. Because *cbab* begins with the symbols *cb*, we use the production *C* → *cb*. This gives us *S* ⇒ *AB* ⇒ *CaB* ⇒ *cbaB*. We finish by using the production *B* → *b*, to obtain *S* ⇒ *AB* ⇒ *CaB* ⇒ *cbaB* ⇒ *cbab*. The approach that we have used is called **top-down parsing**, because it begins with the starting symbol and proceeds by successively applying productions.

There is another approach to this problem, called **bottom-up parsing**. In this approach, we work backward. Because *cbab* is the string to be derived, we can use the production *C* → *cb*, so that *Cab* ⇒ *cbab*. Then, we can use the production *A* → *Ca*, so that *Ab* ⇒ *Cab* ⇒ *cbab*. Using the production *B* → *b* gives *AB* ⇒ *Ab* ⇒ *Cab* ⇒ *cbab*. Finally, using *S* → *AB* shows that a complete derivation for *cbab* is *S* ⇒ *AB* ⇒ *Ab* ⇒ *Cab* ⇒ *cbab*.
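The top-down approach of Example 12 can be sketched as a breadth-first search over leftmost derivations. This is an illustrative sketch rather than an efficient parsing algorithm: the function name `leftmost_derivation` and the dictionary layout are assumptions made here, and the length-based pruning is valid only because this grammar has no production with *λ* on its right-hand side, so sentential forms never get shorter.

```python
from collections import deque

# Productions of the grammar G from Example 12, keyed by left-hand side.
PRODUCTIONS = {
    "S": ["AB"],
    "A": ["Ca"],
    "B": ["Ba", "Cb", "b"],
    "C": ["cb", "b"],
}


def leftmost_derivation(target, start="S", productions=PRODUCTIONS):
    """Return a list of sentential forms deriving target, or None if none exists.

    Assumes no production has an empty right-hand side, so any form longer
    than the target can be discarded.
    """
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        form = path[-1]
        if form == target:
            return path
        # Locate the leftmost nonterminal in the current sentential form.
        index = next((i for i, s in enumerate(form) if s in productions), None)
        if index is None:
            continue  # All terminals, but not the target.
        # The terminals to the left of it are final, so they must match the target.
        if form[:index] != target[:index]:
            continue
        for rhs in productions[form[index]]:
            new_form = form[:index] + rhs + form[index + 1:]
            if len(new_form) <= len(target) and new_form not in seen:
                seen.add(new_form)
                queue.append(path + [new_form])
    return None


if __name__ == "__main__":
    steps = leftmost_derivation("cbab")
    print(" => ".join(steps) if steps else "not in the language")
```

For *cbab* the search reproduces the derivation found by hand, *S* ⇒ *AB* ⇒ *CaB* ⇒ *cbaB* ⇒ *cbab*, and it returns `None` for a word such as *abc* that the grammar cannot generate.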
