Formal Language Theory PDF
Document Details
Uploaded by Deleted User
Université Constantine 2 - Abdelhamid Mehri
2024
Wafa Mousser
Tags
Summary
This document is a set of lecture notes on formal language theory and the Chomsky hierarchy. The notes cover topics like grammars, languages, and the Chomsky hierarchy. These notes are from a course at Constantine 2 University during the 2024/2025 academic year.
Full Transcript
Abdelhamid Mehri - Constantine 2 University New Information Technologies Faculty Formal Language Theory Chapter 4 : Grammars and Chomsky Hierarchy Wafa Mousser 2024/2025 License 2 of...
Abdelhamid Mehri - Constantine 2 University New Information Technologies Faculty Formal Language Theory Chapter 4 : Grammars and Chomsky Hierarchy Wafa Mousser 2024/2025 License 2 of Mathematics-Computer Science Semester 3 Wafa Mousser Formal Language Theory 2024/2025 1 / 25 Outline 1 Grammars 2 Languages 3 Chomsky hierarchy 4 Proprieties 5 Applications W. Mousser Grammars and Chomsky Hierarchy 2024/2025 2 / 25 Grammars Denitions Grammar and language Denition (Grammar) A grammar is a set of rules which describe how to generate strings over a language Example The universal formula for a sentence in English is : Sentence = Subject + Verb + Object Denition (Language) A language is dened by the grammar which describe how to generate all the strings over the language W. Mousser Grammars and Chomsky Hierarchy 2024/2025 3 / 25 Grammars Formal denitions Denition (Grammar) A quadruple G = (VT , VN , S, R) where : VT is the set of terminal elements VN is the set of non-terminals where VT ∩ VN = ϕ and VT ∪ VN = V S ∈ VN is the starting rule R is a nite set of rules Example G = (VT , VN , S, R) where : VT = {she, he, eats, drinks, chocolate, juice} VN = {Sentence, Subject, Verb, Object} ⎧ 1 ⎪ ⎪ ⎪ Sentence Ð → Subject Verb Object ⎪ ⎪ 2 Sentence is the starting rule R = ⎪ ⎪ ⎪Subject Ð → she ∣3 he ⎨ R is a nite set of rules ⎪ ⎪ ⎪ Verb Ð 4 → eats ∣5 drinks ⎪ ⎪ ⎪ 6 ⎪ ⎩Object Ð ⎪ → chocolate ∣7 juice W. Mousser Grammars and Chomsky Hierarchy 2024/2025 4 / 25 Grammars Formal denitions Rules Denition (Rules) A nite set of pairs from V ⋆ VN V ⋆ × V ⋆ , denoted by → Example G = (VT , VN , S, R) where : VT = {a, b, c} ,VN = {S, B, C } ⎧ 1 ⎪ ⎪ ⎪ SÐ→ aSBC ⎪ ⎪ ⎪ 2 ⎪ ⎪ ⎪ SÐ→ abC ⎪ ⎪ ⎪ 3 ⎪CB Ð ⎪ → BC R =⎨ 4 ⎪ ⎪ ⎪ ⎪ bB Ð→ bb ⎪ ⎪ ⎪ 5 ⎪ ⎪ ⎪ bC Ð→ bc ⎪ ⎪ ⎪ 6 ⎩cC Ð ⎪ → cc W. Mousser Grammars and Chomsky Hierarchy 2024/2025 5 / 25 Grammars Formal denitions Productions Denition (Productions) The rules which generate (produce) all the strings over the language, denoted by ⇒ α ⇒ β / α ∈ (V ⋆ VN V ⋆ )+ and β ∈ V ⋆ with V = VN ∪ VT Note Productions may be done in : One-step Multi-step W. Mousser Grammars and Chomsky Hierarchy 2024/2025 6 / 25 Grammars Formal denitions Productions One-step production G = (VT , VN , S, R) µ, α ∈ V ⋆ VN V ⋆ ν, σ, γ, η, β, θ ∈ V ⋆ Denition µ produces ν in one step, denoted µ⇒ν i µ = σαγ and ν = ηβθ and (α→β) ∈ R Example One-step production : 1 2 SÔ ⇒ aSBC Ô ⇒aabC BC W. Mousser Grammars and Chomsky Hierarchy 2024/2025 7 / 25 Grammars Formal denitions Productions Multi-step production G = (VT , VN , S, R) µ ∈ V ⋆ VN V ⋆ i ∈ [0, n] with n ≥ 0 ν, ρn ∈ V ⋆ Denition ⇒ν i µ produces ν by multiple steps, denoted µÔ ⋆ G ∃ρi ∈ V ⋆ VN V ⋆ , ∃ρi+1 ∈ V ⋆ where µ = ρ0 , ν = ρn and ρi ⇒ρi+1 Example Multi-step production : S Ô ⋆ ⇒aabbcc G 1 2 3 4 5 ⇒ a S BC Ô SÔ ⇒ aab CB C Ô ⇒ aa bB CC Ô ⇒ aab bC C Ô ⇒ 6 aabb cC Ô ⇒ aabbcc W. Mousser Grammars and Chomsky Hierarchy 2024/2025 8 / 25 Languages Languages produced by grammars Languages & grammars G = (VT , VN , S, R) Denition (µ is generated by G ) Strings that can be generated by the grammar G are produced by S. µ ∈ VT⋆ and S Ô ⋆ ⇒µ Denition A language L is the set of strings produced by the grammar G. L(G ) = {ω/ω ∈ VT⋆ and S Ô ⋆ ⇒ ω} G Remarks A grammar produces one language A language can be produced by one or more grammars W. Mousser Grammars and Chomsky Hierarchy 2024/2025 9 / 25 Languages Languages produced by grammars Languages & grammars Example 2 5 ⇒ a bC Ô SÔ ⇒ abc 1 2 5 ⇒ a S BC Ô SÔ ⇒ aa bC BC Ô ⇒ aabcBC leads nowhere 1 2 3 4 5 ⇒ a S BC Ô SÔ ⇒ aab CB C Ô ⇒ aa bB CC Ô ⇒ aab bC C Ô ⇒ 6 aabb cC Ô ⇒ aabbcc 1 2 3 n 4 ⇒ an S (BC )n Ô SÔ ⇒ an ab C(B C )n Ô ⇒ an a bB CC n Ô ⇒ n n n 5 n 6 ⇒ an abb n cC Ô an abb n C C n Ô ⇒ aan bb n cc n n n ∗ SÔ ⇒ n+ 1 n+1 n+1 a b c G L(G ) = {an+1 b n+1 c n+1 with n ≥ 0} L(G ) is the set of strings that consist of 1 or more a, followed by the same number of b, followed by the same number of c W. Mousser Grammars and Chomsky Hierarchy 2024/2025 10 / 25 Chomsky hierarchy Chomsky Hierarchy Denition (Types of grammars) The type of a grammar is related to the restrictions present on the grammar's rules In 1957, the American professor, Noam Chomsky classies grammars into 4 included levels : Type-0 Type-1 Type-2 Figure Noam Chomsky Type-3 (2017) W. Mousser Grammars and Chomsky Hierarchy 2024/2025 11 / 25 Chomsky hierarchy Type-0 grammars Type-0 grammars & recursively enumerable languages Denition (Type-0 grammar) G is a type-0 grammar i ∀r ∈ R , r has the following form : µ → ν / µ ∈ V ⋆ VN V ⋆ and ν ∈ V ⋆ Theorem Type-0 grammars generate recursively enumerable languages Remark Recursively enumerable languages are recognized by a Turing machine Example : G = (VT , VN , S, R) with VT = {a, b, c}, VN = {S, A, B} R = {S → ACaB , Bc → acB , CB → DB ; aD → Db} W. Mousser Grammars and Chomsky Hierarchy 2024/2025 12 / 25 Chomsky hierarchy Type-1 grammars Type-1 grammars & context-sensitive languages Denition (Type-1 grammar) G is a type-1 grammar i ∀r ∈ R , r has the following form : µ → ν / µ ∈ V ⋆ VN V ⋆ and ν ∈ V ⋆ and ∣µ∣ ≤ ∣ν∣ Theorem Type-1 grammars generate context-sensitive languages Remark Context-sensitive languages are recognized by a linear bounded automaton Example : G = (VT , VN , S, R) with VT = {a, b, c}, VN = {S, A, B} R = {S → ABc, AB → AbBc, A → bcA, B → b} W. Mousser Grammars and Chomsky Hierarchy 2024/2025 13 / 25 Chomsky hierarchy Type-2 grammars Type-2 grammars & context-free languages Denition (Type-2 grammar) G is a type-2 grammar i ∀r ∈ R , r has the following form : µ → ν / µ ∈ VN and ν ∈ V ⋆ Theorem Type-2 grammars generate context-free languages Remark Context-free languages are recognized by a non-deterministic push-down automaton Example : G = (VT , VN , S, R) with VT = {a, b, c}, VN = {S, A, B} R = {S → Aa, A → a, A → aB, B → abc, B → b} W. Mousser Grammars and Chomsky Hierarchy 2024/2025 14 / 25 Chomsky hierarchy Type-3 grammars Type-3 grammars & regular languages Denition (Type-3 grammar) G is a type-3 grammar i µ, η ∈ VN and ν ∈ VT , ∀r ∈ R , r has the following form : µ → νη | ν Xor µ → ην | ν Left linear regular grammar Right linear regular grammar Theorem Type-3 grammars generate regular languages Remark Regular languages are recognized by nite state automata Example : G = (VT , VN , S, R) with VT = {a, b, c}, VN = {S, A, B} R = {S → ε, S → a, S → aA, A → aA, A → aB, B → bB, B → b} W. Mousser Grammars and Chomsky Hierarchy 2024/2025 15 / 25 Chomsky hierarchy Grammars & Languages Grammars & Languages ω ∈ VT⋆ and n > 0 Figure Chomsky hierarchy W. Mousser Grammars and Chomsky Hierarchy 2024/2025 16 / 25 Proprieties Grammar Grammar proper inclusions Theorem A type-i grammar is also type-(i-1) Remarks The grammar's type increases with the increase of the restrictions present on the grammar's rules The language type decreases with the increase of restrictions on strings (words) form Figure Grammar inclusions W. Mousser Grammars and Chomsky Hierarchy 2024/2025 17 / 25 Proprieties Language Language proper inclusions Theorem Every regular language is context-free Every context-free language is context-sensitive Every context-sensitive language is recursively enumerable Regular ⊂ context-free ⊂ context-sensitive ⊂ recursively enumerable R ⊂ CF ⊂ CS ⊂ RE Remarks ∃ context-free languages that are not regular as an b n / n ≥ 0 ∃ context-sensitive languages that are not context-free as an b n c n / n≥0 ∃ recursively enumerable languages that are not context-sensitive W. Mousser Grammars and Chomsky Hierarchy 2024/2025 18 / 25 Proprieties Language Language type Theorem A language L is type-i i : 1 L is generated by a type-i grammar G 2 ∄G ′ where L(G ′ ) = L and G ′ is type-j with j > i Example : G = (VT , VN , S, R) with VT = {harry, grune, tom, and, ','} , VN = {Sentence, List, Name, End} Sentence Ð→ Name |2 List End ⎧ 1 ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ Name Ð → harry |4 grune |5 tom ⎪ 3 ⎪ R =⎨ G 1 generates names list List Ð → Name | Name ',' List 6 ⎪ 7 ⎪ ⎪ ⎪ ⎪ ⎩ ',' Name End Ð → and Name ⎪ 8 ⎪ ⎪ 1. Grune, D., & Jacobs, C. J. (2007). Parsing techniques, Springer US W. Mousser Grammars and Chomsky Hierarchy 2024/2025 19 / 25 Proprieties Language Language type Example G is a type-0 grammar because : ∀r ∈ R , r has the following form : µ → ν / µ ∈ V + and ν ∈ V ⋆ L is the language generated by G. L is recursively enumerable i : ∄G1 where L(G1 ) = L(G ) and G1 is a type-1 grammar (CSG) Consider G1 = (VT , VN , S, R) with VT = {harry, grune, tom, and, ','} , VN = {Sentence, List, Name, EndName} Sentence Ð → Name |2 List ⎧ 1 ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ Name Ð → harry |4 grune |5 tom ⎪ 3 ⎪ R =⎨ L(G1 ) = L(G ) List EndName | Name ',' List 6 ⎪ 7 ⎪ ⎪ ⎪ Ð → ⎪ ⎩ ',' EndName Ð → and Name ⎪ 8 ⎪ ⎪ W. Mousser Grammars and Chomsky Hierarchy 2024/2025 20 / 25 Proprieties Language Language type Example G1 is a type-1 grammar because : ∀r ∈ R , r has the following form : µ → ν / µ ∈ V + and ν ∈ V ⋆ and ∣µ∣ ≤ ∣ν∣ L(G ) is a context-sensitive language i : ∄G2 where L(G2 ) = L(G ) and G2 is a type-2 grammar (CFG) Consider G2 = (VT , VN , S, R) with VT = {harry, grune, tom, and, ','} , VN = {Sentence, List, Name, End} Sentence Ð→ Name |2 List and Name ⎧ 1 ⎪ ⎪ ⎪ ⎪ ⎪ R = ⎨ Name Ð → harry |4 grune |5 tom 3 L(G2 ) = L(G ) ⎪ ⎪ ⎩ List Ð → Name |7 Name ',' List ⎪ 6 ⎪ ⎪ W. Mousser Grammars and Chomsky Hierarchy 2024/2025 21 / 25 Proprieties Language Language type Example G2 is a type-2 grammar because : ∀r ∈ R , r has the following form : µ → ν / µ ∈ VN and ν ∈ V ⋆ L(G ) is a context-free language i : ∄G3 where L(G3 ) = L(G ) and G3 is a type-3 grammar (RG) Consider G3 = (VT , VN , S, R) with VT = {harry, grune, tom, and, ','} , VN = {Sentence, List, Tail, ListTail, Name} Sentence Ð → harry List |2 grune List |3 tom List ⎧ 1 ⎪ ⎪ ⎪ ⎪ List Ð → ',' Tail |5 and Name |6 ε ⎪ ⎪ ⎪ 4 ⎪ ⎪ ⎪ ⎪ R = ⎨ Tail Ð→ harry ListTail |8 grune ListTail |9 tom ListTail 7 ⎪ ⎪ ListTail → , Tail |11 and Name ⎪ 10 ⎪ ⎪ ⎪ Ð ⎪ ⎪ ⎩ Name Ð→ harry | grune | tom ⎪ 12 ⎪ ⎪ 13 14 L(G3 ) = L(G ) W. Mousser Grammars and Chomsky Hierarchy 2024/2025 22 / 25 Proprieties Language Language type Example G3 is a type-3 grammar because : ∀r ∈ R , r has the following form : µ → νη | ν where : µ, η ∈ VN and ν ∈ VT L(G ) is a regular language because : ∃G3 where L(G3 ) = L(G ) and G3 is a regular grammar L(G ) is regular ⇒ ∃R a regular expression where L(G ) = L(R) R = (((h∣g ∣t), )⋆ (h∣g ∣t)and)? (h∣g ∣t) , Tail ListTail , h∣g ∣t and h∣g ∣t and h∣g ∣t S List Name W. Mousser Grammars and Chomsky Hierarchy 2024/2025 23 / 25 Proprieties Closure properties Closure properties L1 and L2 are type-i languages : Theorem Languages of type-i are closed with mirror, union, concatenation, recursive concatenation and Kleene closure i.e : L∼1 , L1 ∪ L2 , L1 ⋅ L2 and L⍟1 are type-i languages Theorem Languages of type-i are not closed with complement and intersection i.e : L1 ∩ L2 , L1 and L2 are not type-i languages W. Mousser Grammars and Chomsky Hierarchy 2024/2025 24 / 25 Applications Applications Popular tools TLM-project is a simple program to work with grammars Chomsky-Hierarchy a python package that returns grammar's type W. Mousser Grammars and Chomsky Hierarchy 2024/2025 25 / 25