**University of Science and Technology**

**Faculty of Computer Science and Information Technology**

Computer Science Department

Subject: Theory of Computation, Lecture (7)

Instructor: Prof. Noureldien Abdelrahman

Date: 14-12-2022

**Grammars**

**Introduction**

Grammars are used to generate the words of a language and to determine whether a word is in a language. Formal languages, which are generated by grammars, provide models both for natural languages, such as English, and for programming languages, such as Pascal, Fortran, Prolog, C, and Java. In particular, **grammars are extremely important in the construction and theory of compilers.** The grammars that we will discuss were first used by the American linguist Noam Chomsky in the 1950s.

Words in the English language can be combined in various ways. The grammar of English tells us whether a combination of words is a valid sentence. For instance, *the frog writes neatly* is a valid sentence, because it is formed from a noun phrase, *the frog*, made up of the article *the* and the noun *frog*, followed by a verb phrase, *writes neatly*, made up of the verb *writes* and the adverb *neatly*. We do not care that this is a nonsensical statement, because we are concerned only with the **syntax, or form, of the sentence, and not its semantics**, or meaning. We also note that the combination of words ***swims quickly mathematics*** is not a valid sentence because it does not follow the rules of English grammar.

The syntax of a **natural language**, that is, a spoken language such as English, French, German, Spanish, or Arabic, is extremely complicated. In fact, it does not seem possible to specify all the rules of syntax for a natural language. Research in the automatic translation of one language to another has led to the concept of a **formal language**, **which, unlike a natural language, is specified by a well-defined set of rules of syntax**.
Rules of syntax are important not only in linguistics, the study of natural languages, **but also in the study of programming languages.** We will describe the sentences of a formal language using a grammar. The use of grammars helps when we consider the two classes of problems that arise most frequently in applications to programming languages:

(1) How can we determine whether a combination of words is a valid sentence in a formal language?

(2) How can we generate the valid sentences of a formal language?

Before giving a technical definition of a grammar, we will describe an example of a grammar that generates a subset of English. This subset of English is defined using a list of rules that describe how a valid sentence can be produced. We specify that

1. a **sentence** is made up of a **noun phrase** followed by a **verb phrase**;
2. a **noun phrase** is made up of an **article** followed by an **adjective** followed by a **noun**, or
3. a **noun phrase** is made up of an **article** followed by a **noun**;
4. a **verb phrase** is made up of a **verb** followed by an **adverb**, or
5. a **verb phrase** is made up of a **verb**;
6. an **article** is *a*, or
7. an **article** is *the*;
8. an **adjective** is *large*, or
9. an **adjective** is *hungry*;
10. a **noun** is *rabbit*, or
11. a **noun** is *mathematician*;
12. a **verb** is *eats*, or
13. a **verb** is *hops*;
14. an **adverb** is *quickly*, or
15. an **adverb** is *wildly*.

From these rules we can form valid sentences using a series of replacements until no more rules can be used. For instance, we can follow the sequence of replacements

**sentence** ⇒ **noun phrase** **verb phrase** ⇒ **article** **adjective** **noun** **verb phrase** ⇒ **article** **adjective** **noun** **verb** **adverb** ⇒ *the* **adjective** **noun** **verb** **adverb** ⇒ *the large* **noun** **verb** **adverb** ⇒ *the large rabbit* **verb** **adverb** ⇒ *the large rabbit hops* **adverb** ⇒ *the large rabbit hops quickly*

to obtain the valid sentence *the large rabbit hops quickly*.

**Phrase-Structure Grammars**

Before we give a formal definition of a grammar, we introduce a little terminology.

**DEFINITION 1** **A *vocabulary* (or *alphabet*)** $V$ is a finite, nonempty set of elements called *symbols*. **A *word* (or *sentence*) over $V$** is a string of finite length of elements of $V$.
The ***empty string*** or ***null string***, denoted by $\lambda$, is the string containing no symbols. **The set of all words over $V$ is denoted by $V^*$.** **A *language over* $V$ is a subset of $V^*$.**

Note that $\lambda$, the empty string, is different from $\emptyset$, the empty set. It follows that $\{\lambda\}$ is the set containing exactly one string, namely, the empty string.

**DEFINITION 2** **A *phrase-structure grammar*** $G = (V, T, S, P)$ consists of **a vocabulary $V$**, a subset $T$ of $V$ **consisting of terminal symbols**, **a start symbol $S$ from $V$**, and a finite set of **productions $P$**. A production $z_0 \to z_1$, where $z_0$ and $z_1$ are strings over $V$, specifies that $z_0$ may be replaced by $z_1$. The set $V - T$ is **denoted by $N$**. Elements of $N$ are called ***nonterminal symbols***. Every production in $P$ must contain at least one nonterminal on its left side.

**EXAMPLE 1** Let $G = (V, T, S, P)$, where $V = \{a, b, A, B, S\}$, $T = \{a, b\}$, $S$ is the start symbol, and $P = \{S \to ABa,\ A \to BB,\ B \to ab,\ AB \to b\}$. $G$ is an example of a phrase-structure grammar.

**We will be interested in the words that can be generated by the productions of a phrase-structure grammar.**

**DEFINITION 3** Let $G = (V, T, S, P)$ be a phrase-structure grammar. Let $w_0 = l z_0 r$ (that is, the concatenation of $l$, $z_0$, and $r$) and $w_1 = l z_1 r$ be strings over $V$. If $z_0 \to z_1$ is a production of $G$, **we say that $w_1$ is *directly derivable* from $w_0$**, and we write $w_0 \Rightarrow w_1$. If $w_0, w_1, \ldots, w_n$ are strings over $V$ such that $w_0 \Rightarrow w_1, w_1 \Rightarrow w_2, \ldots, w_{n-1} \Rightarrow w_n$, then we say that $w_n$ is *derivable from* $w_0$, and **we write $w_0 \stackrel{*}{\Rightarrow} w_n$. The sequence of steps used to obtain $w_n$ from $w_0$ is called a *derivation*.**

**EXAMPLE 2** The string $Aaba$ is directly derivable from $ABa$ in the grammar of Example 1 because $B \to ab$ is a production in the grammar.
The string $abababa$ is derivable from $ABa$ because

$$ABa \Rightarrow Aaba \Rightarrow BBaba \Rightarrow Bababa \Rightarrow abababa,$$

using the productions $B \to ab$, $A \to BB$, $B \to ab$, and $B \to ab$ in succession.

**DEFINITION 4** Let $G = (V, T, S, P)$ be a phrase-structure grammar. The *language generated by* $G$ (or the *language of* $G$), denoted by $L(G)$, **is the set of all strings of terminals that are derivable from the start symbol $S$**. In other words,

$$L(G) = \{\, w \in T^* \mid S \stackrel{*}{\Rightarrow} w \,\}.$$

In Examples 3 and 4 we find the language generated by a phrase-structure grammar.

**EXAMPLE 3** Let $G$ be the grammar with vocabulary $V = \{S, A, a, b\}$, **set of terminals $T = \{a, b\}$**, start symbol $S$, and productions $P = \{S \to aA,\ S \to b,\ A \to aa\}$. What is $L(G)$, the language of this grammar?

***Solution:*** From the start symbol $S$ we can derive $aA$ using the production $S \to aA$. We can also use the production $S \to b$ to derive $b$. From $aA$ the production $A \to aa$ can be used to derive $aaa$. Since $b$ and $aaa$ contain only terminals, no production applies to them, and no additional words can be derived. Hence, $L(G) = \{b, aaa\}$.

**EXAMPLE 4** Let $G$ be the grammar with vocabulary $V = \{S, 0, 1\}$, **set of terminals $T = \{0, 1\}$**, start symbol $S$, and productions $P = \{S \to 11S,\ S \to 0\}$. What is $L(G)$, the language of this grammar?

***Solution:*** From $S$ we can derive $0$ using $S \to 0$, or $11S$ using $S \to 11S$. From $11S$ we can derive either $110$ or $1111S$. From $1111S$ we can derive $11110$ and $111111S$. At any stage of a derivation we can either add two 1s in front of the final $S$ or terminate the derivation by replacing $S$ with a 0. We surmise that $L(G) = \{0, 110, 11110, 1111110, \ldots\}$, the set of all strings consisting of an even number of 1s followed by a single 0, that is, $L(G) = \{(11)^k 0 \mid k \ge 0\}$. This can be proved using an inductive argument that shows that after $n$ productions have been used, the only string of terminals that can be generated is the one consisting of $n - 1$ concatenations of $11$ followed by $0$.
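Definitions 3 and 4 can be made concrete with a small program. The following Python sketch is illustrative only: it assumes every symbol is a single character, writes each production as a pair of strings with `""` standing for $\lambda$, and uses hypothetical helper names (`one_step`, `is_derivation`, `generate`). `one_step` applies Definition 3 at every position where a left side occurs, and `generate` lists the words of $L(G)$ up to a length bound by breadth-first search. The length bound is an assumption that keeps the search finite; it is exact for Examples 3 and 4 because none of their productions shortens a string, but for other grammars it may miss words.

```python
from collections import deque

# Sketch assumptions: every symbol is one character (uppercase letters are the
# nonterminals in these examples), a production z0 -> z1 is a pair of strings,
# and "" stands for the empty string λ.

def one_step(w, productions):
    """All strings directly derivable from w (Definition 3): for each production
    z0 -> z1 and each way of writing w = l z0 r, collect l z1 r."""
    results = set()
    for lhs, rhs in productions:
        i = w.find(lhs)
        while i != -1:
            results.add(w[:i] + rhs + w[i + len(lhs):])
            i = w.find(lhs, i + 1)
    return results

def is_derivation(steps, productions):
    """True if each string in steps is directly derivable from the previous one."""
    return all(b in one_step(a, productions) for a, b in zip(steps, steps[1:]))

def generate(productions, terminals, start="S", max_len=8, slack=1):
    """Words of L(G) (Definition 4) of length <= max_len, by breadth-first search
    over sentential forms. Forms longer than max_len + slack are pruned to keep
    the search finite; this is exact when no production shortens a string."""
    seen, queue, words = {start}, deque([start]), set()
    while queue:
        w = queue.popleft()
        if all(ch in terminals for ch in w):  # only terminals left: w is in L(G)
            if len(w) <= max_len:
                words.add(w)
            continue
        for nxt in one_step(w, productions):
            if len(nxt) <= max_len + slack and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(words, key=lambda s: (len(s), s))

# Example 1 grammar and the derivations of Example 2
P1 = [("S", "ABa"), ("A", "BB"), ("B", "ab"), ("AB", "b")]
print("Aaba" in one_step("ABa", P1))  # True
print(is_derivation(["ABa", "Aaba", "BBaba", "Bababa", "abababa"], P1))  # True

# Example 3
P3 = [("S", "aA"), ("S", "b"), ("A", "aa")]
print(generate(P3, {"a", "b"}))  # ['b', 'aaa']

# Example 4 (words up to length 7)
P4 = [("S", "11S"), ("S", "0")]
print(generate(P4, {"0", "1"}, max_len=7))  # ['0', '110', '11110', '1111110']
```

The search only lists words up to the chosen length, so it supports the surmise in Example 4 for short strings without replacing the inductive proof.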
**How to Construct a Grammar for a Language?**

The problem of constructing a grammar that generates a given language often arises. Examples 5, 6, and 7 describe problems of this kind.

**EXAMPLE 5** Give a phrase-structure grammar that generates the set $\{0^n 1^n \mid n = 0, 1, 2, \ldots\}$.

***Solution:*** Two productions can be used to generate all strings consisting of a string of 0s followed by a string of the same number of 1s, including the null string. The first builds up successively longer strings by adding a 0 at the start of the string and a 1 at the end, keeping $S$ in the middle. The second production replaces $S$ with the empty string. The solution is the grammar $G = (V, T, S, P)$, where $V = \{0, 1, S\}$, $T = \{0, 1\}$, $S$ is the start symbol, and the productions are

$$S \to 0S1, \qquad S \to \lambda.$$

The verification that this grammar generates the correct set is left as an exercise.

**Example 5** involved the set of strings made up of 0s followed by 1s, where the number of 0s and the number of 1s are the same. Example 6 considers the set of strings consisting of 0s followed by 1s, where the number of 0s and the number of 1s may differ.

**EXAMPLE 6** Find a phrase-structure grammar to generate the set $\{0^m 1^n \mid m \text{ and } n \text{ are nonnegative integers}\}$.

***Solution:*** We will give two grammars $G_1$ and $G_2$ that generate this set. This will illustrate that two grammars can generate the same language. The grammar $G_1$ has alphabet $V = \{S, 0, 1\}$; terminals $T = \{0, 1\}$; and productions $S \to 0S$, $S \to S1$, and $S \to \lambda$. $G_1$ generates the correct set, because using the first production $m$ times puts $m$ 0s at the beginning of the string, using the second production $n$ times puts $n$ 1s at the end of the string, and $S \to \lambda$ then removes $S$. Since every 0 is added to the left of $S$ and every 1 to its right, no other strings can be produced.

The grammar $G_2$ has alphabet $V = \{S, A, 0, 1\}$; terminals $T = \{0, 1\}$; and productions $S \to 0S$, $S \to 1A$, $S \to 1$, $A \to 1A$, $A \to 1$, and $S \to \lambda$. Verifying that this grammar generates the correct set is left as an exercise.
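The arguments in Examples 5 and 6 describe how to build a derivation for a given target string. The Python sketch below, with hypothetical function names `derive_0n1n`, `derive_g1`, and `derive_g2`, records each such derivation as a list of sentential forms, one per step, with `""` standing for $\lambda$. It only confirms that every string $0^m 1^n$ (for small $m$ and $n$) has a derivation; it does not show that the grammars generate nothing else, which is part of the verification left as an exercise.

```python
# Each derivation is a list of sentential forms. Every form contains at most
# one S or A, so str.replace rewrites exactly that one occurrence.

def derive_0n1n(n):
    """Derivation of 0^n 1^n in the Example 5 grammar (S -> 0S1, S -> λ)."""
    steps = ["S"]
    for _ in range(n):
        steps.append(steps[-1].replace("S", "0S1"))  # S -> 0S1
    steps.append(steps[-1].replace("S", ""))         # S -> λ
    return steps

def derive_g1(m, n):
    """Derivation of 0^m 1^n in G1 (S -> 0S, S -> S1, S -> λ)."""
    steps = ["S"]
    for _ in range(m):
        steps.append(steps[-1].replace("S", "0S"))   # S -> 0S adds a 0 on the left
    for _ in range(n):
        steps.append(steps[-1].replace("S", "S1"))   # S -> S1 adds a 1 on the right
    steps.append(steps[-1].replace("S", ""))         # S -> λ
    return steps

def derive_g2(m, n):
    """Derivation of 0^m 1^n in G2 (S -> 0S, S -> 1A, S -> 1, A -> 1A, A -> 1, S -> λ)."""
    steps = ["S"]
    for _ in range(m):
        steps.append(steps[-1].replace("S", "0S"))   # S -> 0S
    if n == 0:
        steps.append(steps[-1].replace("S", ""))     # S -> λ
    elif n == 1:
        steps.append(steps[-1].replace("S", "1"))    # S -> 1
    else:
        steps.append(steps[-1].replace("S", "1A"))   # S -> 1A
        for _ in range(n - 2):
            steps.append(steps[-1].replace("A", "1A"))  # A -> 1A
        steps.append(steps[-1].replace("A", "1"))    # A -> 1
    return steps

print(derive_0n1n(3))   # ['S', '0S1', '00S11', '000S111', '000111']
print(derive_g2(2, 3))  # ['S', '0S', '00S', '001A', '0011A', '00111']

# Both grammars of Example 6 reach 0^m 1^n for every small m and n.
for m in range(5):
    for n in range(5):
        assert derive_g1(m, n)[-1] == derive_g2(m, n)[-1] == "0" * m + "1" * n
```

Writing one helper per grammar keeps each step tied to a named production, which mirrors the counting argument given for $G_1$.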
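**EXAMPLE 7** One grammar that generates the set $\{0^n 1^n 2^n \mid n = 0, 1, 2, 3, \ldots\}$ is $G = (V, T, S, P)$ with $V = \{0, 1, 2, S, A, B, C\}$; **$T = \{0, 1, 2\}$**; start symbol $S$; and productions $S \to C$, $C \to 0CAB$, $C \to \lambda$, $S \to \lambda$, $BA \to AB$, $0A \to 01$, $1A \to 11$, $1B \to 12$, and $2B \to 22$. The production $C \to \lambda$ erases $C$ once the desired number of 0s has been produced; $BA \to AB$ then moves every $A$ to the left of every $B$, and the last four productions rewrite each $A$ as 1 and each $B$ as 2 from left to right. The grammar given is the simplest type of grammar that generates this set, in a sense that will be made clear later.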
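For a grammar whose productions have more than one symbol on the left, like the one in Example 7, it helps to let a program search for derivations. This Python sketch is a bounded breadth-first enumerator with hypothetical helper names (`one_step`, `generate`); it assumes single-character symbols and uses `""` for $\lambda$. It lists every word of length at most 9, using the productions of Example 7 including $C \to \lambda$, the rule that erases $C$; removing that rule leaves $C$ with no way to disappear, and the same search then finds only $\lambda$. A bounded search like this is evidence, not a proof, that the grammar generates exactly $\{0^n 1^n 2^n\}$.

```python
from collections import deque

def one_step(w, productions):
    """All strings directly derivable from w: replace one occurrence of a
    left side z0 by the matching right side z1."""
    results = set()
    for lhs, rhs in productions:
        i = w.find(lhs)
        while i != -1:
            results.add(w[:i] + rhs + w[i + len(lhs):])
            i = w.find(lhs, i + 1)
    return results

def generate(productions, terminals, start="S", max_len=9, slack=1):
    """Words of length <= max_len derivable from start, by breadth-first search.
    Sentential forms longer than max_len + slack are pruned; for Example 7 this
    is safe because only one S or C is ever erased, so no derivation of a word
    passes through a form more than one symbol longer than the word."""
    seen, queue, words = {start}, deque([start]), set()
    while queue:
        w = queue.popleft()
        if all(ch in terminals for ch in w):  # no nonterminals left
            if len(w) <= max_len:
                words.add(w)
            continue
        for nxt in one_step(w, productions):
            if len(nxt) <= max_len + slack and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(words, key=lambda s: (len(s), s))

# Productions of Example 7 with C -> λ; "" stands for the empty string λ.
P7 = [("S", "C"), ("C", "0CAB"), ("C", ""), ("S", ""), ("BA", "AB"),
      ("0A", "01"), ("1A", "11"), ("1B", "12"), ("2B", "22")]
T7 = {"0", "1", "2"}

words = generate(P7, T7, max_len=9)
print(words)  # ['', '012', '001122', '000111222']

# Every word found has the form 0^n 1^n 2^n.
print(all(w == "0" * (len(w) // 3) + "1" * (len(w) // 3) + "2" * (len(w) // 3)
          for w in words))  # True

# Dropping C -> λ: C can never be removed, so only λ (via S -> λ) remains.
without_erase = [p for p in P7 if p != ("C", "")]
print(generate(without_erase, T7, max_len=9))  # ['']
```

The derivation of $012$, for instance, is $S \Rightarrow C \Rightarrow 0CAB \Rightarrow 0AB \Rightarrow 01B \Rightarrow 012$.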