Cis217 Concepts of Programming Languages - Chapter 3 PDF
Document Details
Uploaded by AccomplishedUniverse8955
Tags
Summary
This document is an academic chapter on concepts of programming languages. It details syntax and semantics of programming languages. It introduces Backus-Naur Form (BNF), context-free grammars, and related topics in programming language theory. It includes example programs and derivations.
Full Transcript
CIS217 CHAPTER-3 CONCEPTS OF Describing Syntax and Semantics PROGRAMMING LANGUAGES Outlines: 3.1 Introduction 3.2 The General Problem of Describing Syntax 3.3 Formal Methods of Describing Syntax 3.4 Attribute Grammars 3.5 Describing the Meanings of Programs: D...
CIS217 CHAPTER-3 CONCEPTS OF Describing Syntax and Semantics PROGRAMMING LANGUAGES Outlines: 3.1 Introduction 3.2 The General Problem of Describing Syntax 3.3 Formal Methods of Describing Syntax 3.4 Attribute Grammars 3.5 Describing the Meanings of Programs: Dynamic Semantics Summary 3.1 Introduction Syntax – the form of the expressions, statements, and program units Semantics - the meaning of the expressions, statements, and program units. Ex: the syntax of a Java while statement is while (boolean_expr) statement The semantics of this statement form is that when the current value of the Boolean expression is true, the embedded statement is executed. 3.2 The General Problem of Describing Syntax A sentence or “statement” is a string of characters over some alphabet. The syntax rules of a language specify which strings of characters from the language’s alphabet are in the language. A language is a set of sentences. A lexeme is the lowest level syntactic unit of a language. It includes identifiers, literals, operators, and special word (e.g. *, sum, begin). A program is strings of lexemes. A token is a category of lexemes (e.g., identifier). An identifier is a token that have lexemes, or instances, such as sum and total. Ex: index = 2 * count + 17; Lexemes Tokens Lexemes Tokens Index identifier 17 int_literal + plus_op = equal_sign ; semicolon 2 int_literal * mult_op Language Recognizers and Generators In general, language can be formally defined in two distinct ways: by recognition and by generation. Language Recognizers: A recognition device reads input strings of the language and decides whether the input strings belong to the language. It only determines whether given programs are in the language. Example: syntax analyzer part of a compiler. The syntax analyzer, also known as parsers, determines whether the given programs are syntactically correct. Language Generators: A device that generates sentences of a language One can determine if the syntax of a particular sentence is correct 3.3 Formal Methods of Describing Syntax The formal language generation mechanisms are usually called grammars. Grammars are commonly used to describe the syntax of programming languages. 3.3.1 Backus-Naur Form and Context-Free Grammars It is a syntax description formalism that became the most widely used method for programming language syntax. 3.3.1.1 Context-free Grammars Context-free and regular grammars are useful for describing the syntax of programming languages. Tokens of programming languages can be described by regular grammars. Whole programming languages can be described by context-free grammars. 3.3.1.2 Origins of Backus-Naur Form (1959) BNF (Backus-Naur Form) is equivalent to context-free grammars used for describing syntax. 3.3.1.3 Fundamentals A metalanguage is a language used to describe another language. BNF is a metalanguage for programming language. In BNF, abstractions are used to represent classes of syntactic structures. → = A rule has a left-hand side (LHS) “The abstraction being defined” and a right-hand side (RHS) “consists of some mixture of tokens, lexemes and references to other abstractions”, and consists of terminal and nonterminal symbols. Example: total = subtotal1 + subtotal2 A grammar is a finite nonempty set of rules and the abstractions are called nonterminal symbols, or simply nonterminal. The lexemes and tokens of the rules are called terminal symbols or terminals A BNF description, or grammar, is simply a collection of rules. An abstraction (or nonterminal symbol) can have more than one RHS For Example, a Java if statement can be described with the rule 3.3.1.4 Describing Lists Syntactic lists are described using recursion. → identifier | identifier, A rule is recursive if its LHS appears in its RHS. 3.3.1.5 Grammars and Derivations The sentences of the language are generated through a sequence of applications of the rules, beginning with a special nonterminal of the grammar called the start symbol. A derivation is a repeated application of rules, starting with the start symbol and ending with a sentence (all terminal symbols) Example 3.1 A Grammar for a Small language: An example derivation for a program begin A = B + C; B = C end → begin end → | ; → = => begin end → A|B|C => begin ; end → + | - | => begin = ; end => begin A = ; end => begin A = + ; end => begin A = B + ; end => begin A = B + C; end => begin A = B + C; end => begin A = B + C; = end => begin A = B + C; B = end => begin A = B + C; B = end => begin A = B + C; B = C end 3.3.1.6 Parse Trees Hierarchical structures of the language are called parse trees. A parse tree for the simple statement A = B * (A + C) 3.3.1.7 Ambiguity A grammar is ambiguous if it generates a sentential form that has two or more distinct parse trees. Example 3.3 An Ambiguous Grammar for Small Assignment Statements Two distinct parse trees for the same sentence, A = B + C * A 3.3.1.8 Operator Precedence When an expression includes two different operators, for example, x + y * z, one obvious semantic issue is the order of evaluation of the two operators. The fact that an operator in an arithmetic expression is generated lower in the parse tree can be used to indicate that it has higher precedence over an operator produced higher up in the tree. Example 3.4 An unambiguous Grammar for Expressions → = → A | B | C → + | → * | → () 3.3.1.9 Associativity of Operators When an expression includes two operators that have the same precedence (as * and / usually have)—for example, A / B * C— a semantic rule is required to specify which should have precedence. This rule is named associativity. When a grammar rule has LHS also appearing at beginning of its RHS, the rule is said to be left recursive and visa versa. 3.3.2 Extended BNF EBNF extensions do not enhance the descriptive powers of BNF; they only increase its readability and writability. Three extension are commonly included in various versions of EBNF : Optional parts are placed in brackets ([ ]) → if (expression>) [else ] Put repetitions (0 or more) in braces ({ }) → {, } Put multiple-choice options of RHSs in parentheses and separate them with vertical bars (|, OR operator) → (* | / | %) 3.4 Attribute Grammars An attribute grammar is a device used to describe more of the structure of a programming language than can be described with a context-free grammar. 3.4.1 Static Semantics The static semantics of a language is only indirectly related to the meaning of programs during execution; rather, it has to do with the legal forms of programs (syntax rather than semantics). Many static semantic rules of a language state its type constraints. Static semantics is so named because the analysis required to these specifications can be done at compile time. 3.4.2 Basic Concepts Attribute grammars have additions to are context-free grammars to carry some semantic information on parse tree nodes. Attribute grammars are context-free grammars to which have been added attributes, attribute computation functions, and predicate function. 3.4.3 Attribute Grammars Defined Synthesized attributes are used to pass semantic information up a parse tree. 3.4.4 Intrinsic Attributes Intrinsic attributes are synthesized attributes of leaf nodes whose values are determined outside the parse tree. For example, the type of an instance of a variable in a program could come form the symbol table, which is used to store variable names and their types. 3.4.5 Examples Attribute Grammars The syntax portion of our example attribute grammar is → = → + | → A | B | C actual_type - A synthesized attribute associated with nonterminal and. – It is used to store the actual type, int or real, or a variable or expression. – In the case of a variable, the actual type is intrinsic. – In the of an expression, it is determined from the actual types of child node or children nodes of the nonterminal. 3.4.6 Computing Attribute Values Consider the process of computing the attribute values of a parse tree, which is sometimes called decorating the parse tree. 3.4.7 Evaluation Checking the static semantic rules of a language is an essential part of all compiler. One of the main difficulties in using an attribute grammar to describe all of the syntax and static semantics of a real contemporary programming language is the size and complexity of the attribute grammar. –Furthermore, the attribute values on a large parse tree are costly to evaluate. 3.5 Describing the Meanings of Programs: Dynamic Semantics Three methods of semantic description: 3.5.1 Operational semantics: It is a method of describing the meaning of language constructs in terms of their effects on an ideal machine. Operational semantics depends on programming languages of lower level, not mathematics and logic. 3.5.2 Denotation semantics: Mathematical objects are used to represents the meanings of languages constructs. Language entities are converted to these mathematical objects with recursive functions. 3.5.3 Axiomatic semantics: It is based on formal(Mathematical) logic and devised as a tool for proving the correctness of programs. It is defined in conjunction with the development of a method to prove the correctness of programs. Such correction proofs, when they can be constructed, show that a program performs the computation described by its specification. In a proof, each statement of a program is both preceded and followed by a logical expression that specified constraints on program variables. Approach: Define axioms or inference rules for each statement type in the language (to allow transformations of expressions to other expressions.) The expressions are called assertions. 3.5.3.1 Assertions The logical expressions are called predicates, or assertions. An assertion before a statement (a precondition) states the relationships and constraints among variables that are true at that point in execution. An assertion following a statement is a postcondition. 3.5.3.2 Weakest Preconditions A weakest precondition is the least restrictive precondition that will guarantee the validity of the associated postcondition. An example: a = b + 1 {a > 1} One possible precondition: {b > 10} Weakest precondition: {b > 0} Program proof process: The postcondition for the whole program is the desired result. Work back through the program to the first statement. If the precondition on the first statement is the same as the program spec, the program is correct. An Axiom is a logical statement that is assumed to be true 3.5.3.3 Assignment Statements Ex: a = b / 2 – 1 {a < 10} The weakest precondition is computed by substituting b / 2 -1 in the assertion {a < 10} as follows: b / 2 – 1 < 10 b/2 < 11 b < 22 ∴ the weakest precondition for the given assignment and the postcondition is {b < 22} 3.5.3.4 Sequences The weakest precondition for Theaweakest sequence of statements precondition cannot is computed as follows:be described by an axiom, because the precondition depends on the particular kinds of statements in the y + 3 < 10 y 0} This produce precondition {y – 1 > 0} or {y > 1} Now we apply the same axiom to the else clause y = y + 1 {y > 0} 3.5.3.6 Logical Pretest Loops Computing the weakest precondition (wp) for a while loop is inherently more difficult than for a sequence b/c the number of iterations cannot always be predetermined. The corresponding step in the axiomatic semantics of a while loop is finding an assertion called a loop invariant, which is crucial to finding the weakest precondition. Ex: while y x do y = y + 1 end {y = x} For 0 iterations, the wp is → {y = x} For 1 iteration, wp(y = y + 1, {y = x}) = {y + 1 = x}, or {y = x – 1} For 2 iterations, wp(y = y + 1, {y = x - 1}) = {y + 1 = x - 1}, or {y = x – 2} For 3 iterations, wp(y = y + 1, {y = x - 2}) = {y + 1 = x - 2}, or {y = x – 3} It is now obvious that {y < x} will suffice for cases of one or more iterations. Summary BNF and context-free grammars are equivalent meta-languages – Well-suited for describing the syntax of programming languages An attribute grammar is a descriptive formalism that can describe both the syntax and the semantics of a language Three primary methods of semantics description – Operation, denotational, axiomatic