Context-Free Grammars
40 Questions
4 Views

Choose a study mode

Play Quiz
Study Flashcards
Spaced Repetition
Chat to Lesson

Podcast

Play an AI-generated podcast conversation about this lesson

Questions and Answers

What is the primary task of syntax analysis?

  • Recognizing syntactic structure in a token stream (correct)
  • Converting source code into machine code
  • Managing memory allocation during program execution
  • Optimizing the performance of a program

Regular expressions are sufficient for describing languages that require balanced parentheses.

False (B)

What type of grammar is similar to regular expressions but includes recursion?

context-free grammar

In a context-free grammar, symbols that cannot be broken down further are called ______ symbols.

<p>terminal</p> Signup and view all the answers

Match the following grammar notations with their descriptions:

<p>BNF (Backus-Naur Form) = A notation developed for Algol-60 that uses ::= to define productions. EBNF (Extended Backus-Naur Form) = An extension of BNF that includes symbols for optional and repetitive parts. Scheme Reports (R6RS) = A notation that uses ellipses (...) to indicate repetition zero or more times.</p> Signup and view all the answers

What does EBNF add to BNF?

<p>Symbols for optional and repetitive parts (A)</p> Signup and view all the answers

A left-recursive grammar is generally preferred for top-down parsers.

<p>False (B)</p> Signup and view all the answers

What is the main issue with ambiguous grammars?

<p>multiple parse trees</p> Signup and view all the answers

A grammar that generates more than one parse tree for a single input string is considered ______.

<p>ambiguous</p> Signup and view all the answers

Match the parser types with their derivation approach:

<p>LL(k) = Left-to-right, Leftmost derivation LR(k) = Left-to-right, Rightmost derivation</p> Signup and view all the answers

Which of the following parser types uses a bottom-up parsing approach?

<p>LR(1) (A)</p> Signup and view all the answers

A concrete parse tree is the same as an abstract syntax tree.

<p>False (B)</p> Signup and view all the answers

What is the root node of a concrete parse tree?

<p>start symbol</p> Signup and view all the answers

In a concrete parse tree, ______ nodes are terminal symbols (tokens).

<p>leaf</p> Signup and view all the answers

Match the tree type with its description.

<p>Concrete Parse Tree = Represents how the input was parsed and includes all terminal and non-terminal symbols. Abstract Parse Tree = Focuses on the syntactic structure of the input, omitting details of the parsing process.</p> Signup and view all the answers

What does an abstract syntax tree primarily represent?

<p>The syntactic structure of the input (A)</p> Signup and view all the answers

In right-most derivation, the left-most non-terminal is expanded at each step.

<p>False (B)</p> Signup and view all the answers

In parsing, what is the purpose of derivations?

<p>define language</p> Signup and view all the answers

Expanding the right-most non-terminal in each step corresponds to ______ parsing.

<p>bottom-up</p> Signup and view all the answers

Match the type of derivation to its corresponding parsing approach:

<p>Left-most derivation = Top-down parsing Right-most derivation = Bottom-up parsing</p> Signup and view all the answers

Which of the following is true about higher-precedence operators in a desired parse tree?

<p>They should be below lower-precedence operators. (A)</p> Signup and view all the answers

Parentheses must always be explicitly represented in a parse tree.

<p>False (B)</p> Signup and view all the answers

How is associativity of operators represented in a parse tree?

<p>tree structure</p> Signup and view all the answers

In an unambiguous expression grammar, lower-precedence operators are at the ______ of the tree.

<p>top</p> Signup and view all the answers

Match the concept with its description in the context of parsing:

<p>Precedence of Operators = Determines which operators are evaluated first in an expression. Associativity of Operators = Determines how operators of the same precedence are grouped in the absence of parentheses.</p> Signup and view all the answers

What is a common issue that arises with nested if-statements in ambiguous grammars?

<p>Dangling else problem (A)</p> Signup and view all the answers

An ambiguous grammar for if-statements is always preferred for its conciseness.

<p>False (B)</p> Signup and view all the answers

In the context of the dangling else problem, what does a matched statement refer to?

<p>if with else</p> Signup and view all the answers

The 'then' part is always matched, therefore each 'else' goes with the ______ 'if'.

<p>inner</p> Signup and view all the answers

Match the following terms related to shift-reduce conflicts with their meanings:

<p>Shift-Reduce Conflict = The parser can either shift the next input symbol onto the stack or reduce a sequence of symbols from the top of the stack according to a grammar rule. The CUP tool resolves it in favor of shift. Reduce-Reduce Conflict = The parser can reduce the same sequence of symbols on the top of the stack by two or more different grammar rules. The CUP tool resolves it based on the order of rules.</p> Signup and view all the answers

In JavaCUP, how is a shift-reduce conflict typically resolved by default?

<p>By shifting (B)</p> Signup and view all the answers

In Tiger, the assignment operator is right-associative.

<p>False (B)</p> Signup and view all the answers

What is a common approach to resolving reduce/reduce conflicts?

<p>rule order</p> Signup and view all the answers

A C++ statement like C x(a, b, c); can be ambiguous because it could be a forward declaration or a ______.

<p>constructor call</p> Signup and view all the answers

Which parser technology uses compressed LR(1) parse tables?

<p>LALR(1) (C)</p> Signup and view all the answers

LL parsing directly handles left recursion without modification.

<p>False (B)</p> Signup and view all the answers

What is the purpose of FIRST and FOLLOW sets in LL parsing?

<p>lookahead</p> Signup and view all the answers

In recursive descent parsing, the code structure corresponds to the ______ structure.

<p>grammar</p> Signup and view all the answers

Match the parsing term to its action:

<p>Shift = Moves a token from input to the stack Reduce = Replaces RHS of a rule on the stack by the LHS</p> Signup and view all the answers

Which type of parser generator is yacc?

<p>Bottom-Up Parser (B)</p> Signup and view all the answers

Flashcards

Syntax Analysis Task

Recognizing the syntactic composition of a token stream.

Parse Tree Production

Producing a tree-like depiction of a program's structure.

Terminal Symbols

Symbols that form the basic vocabulary of the language.

Non-terminal Symbols

Symbols representing higher-level syntactic constructs.

Signup and view all the flashcards

Start Symbol

A special non-terminal symbol where analysis begins.

Signup and view all the flashcards

Production Rule

Defines how non-terminals can be replaced by other terminals/non-terminals.

Signup and view all the flashcards

BNF (Backus-Naur Form)

A formalism develoepd for Algol-60.

Signup and view all the flashcards

EBNF (Extended Backus-Naur Form)

An extended variant of a specific form.

Signup and view all the flashcards

Top-level Alternation

Describes alternation, like (a | b).

Signup and view all the flashcards

Right-Recursive Grammar

A grammar that calls itself on the right.

Signup and view all the flashcards

Left-Recursive Grammar

A grammar that calls itself on the left.

Signup and view all the flashcards

Left-to-right Parsing

Parsing proceeds from left to right.

Signup and view all the flashcards

Concrete Parse Tree

Represents syntactic structure.

Signup and view all the flashcards

Leaves of Parse Tree

Symbols at the base of derivation.

Signup and view all the flashcards

Interior Nodes

Points that apply production rules.

Signup and view all the flashcards

Ambiguous Grammar

Grammar permits multiple parse trees.

Signup and view all the flashcards

Derivation

Expresses how to derive strings.

Signup and view all the flashcards

Left-most Derivation

Use left-most non-terminal.

Signup and view all the flashcards

Right-most Derivation

Use right-most non-terminal.

Signup and view all the flashcards

Abstract Parse Tree

Represents structure and ignores parsing steps.

Signup and view all the flashcards

Operator Precedence

Prioritization of operations.

Signup and view all the flashcards

Parentheses in Parse Trees

Grouping expressions implicitly.

Signup and view all the flashcards

Matched Statement

An if construct that needs an else.

Signup and view all the flashcards

Unmatched Statement

An if construct lacking an else.

Signup and view all the flashcards

Dangling Else

When an else matches the wrong if.

Signup and view all the flashcards

Parse Conflict

Table cell with multiple actions.

Signup and view all the flashcards

Shift-Reduce Conflict

Conflict between shifting/reducing.

Signup and view all the flashcards

Reduce-Reduce Conflict

Conflict between multiple reductions.

Signup and view all the flashcards

Error Recovery

Corrects syntax problems after the fact.

Signup and view all the flashcards

Error Token

A symbol used in grammar rules for error handling.

Signup and view all the flashcards

Error Productions

Grammar rules made for specific syntax errors.

Signup and view all the flashcards

Single-Pass Compiler

Single pass compilation.

Signup and view all the flashcards

Multi-Pass Compiler

Multi-Pass compilation.

Signup and view all the flashcards

Recursive Descent Parser

Parser that uses functions to parse.

Signup and view all the flashcards

LL(1) Grammar

A grammar that must not have loops or multi-entries in the parse table.

Signup and view all the flashcards

FIRST sets

A set of terminals. Derivations can start.

Signup and view all the flashcards

FOLLOW sets

A terminal that derivation can follow.

Signup and view all the flashcards

Burke-Fisher Error Repair

Technique of the Burke-Fisher Algorithm.

Signup and view all the flashcards

Nullable(X)

When an expression can derive the empty string.

Signup and view all the flashcards

One Token Lookahead

Looks ahead one Token

Signup and view all the flashcards

Study Notes

  • Syntax analysis involves recognizing syntactic structure in a token stream and producing a parse tree that represents the structure of the input program.

Regular Expressions

  • Regular expression macros can be used for summation expressions; an input of "10+2+30" is valid using digits = [0-9]+ & sum = ({digits} "+")* {digits}
  • Regular expressions cannot describe an input like "10+(2+30)", as it is not a regular language.

Context-Free Grammars

  • A context-free grammar is similar to regular expressions but include recursion for added functionality.
  • Top-level alternation translates into aux = c | d & expr = a b aux e where expr = a b (c | d) e.
  • Kleene closure translates into expr = a b c expr & expr = ε where expr = (a b c)*

Key Components

  • Terminal symbols (tokens) are fundamental units recognized by the scanner.
  • Non-terminal symbols represent higher-level syntactic constructs.
  • Start symbol is the initial non-terminal from which derivations begin.
  • Grammar rules have the form N -> X*, where N is a non-terminal, and X is a terminal or non-terminal.

CFG Syntax Variants

  • BNF (Backus-Naur Form) was developed for Algol-60; ::= 'if' 'then' | ...
  • EBNF (Extended Backus-Naur Form) has uses square brackets for optional parts [ ... ] and curly brackets for repetition zero or more times { ... }.
  • Scheme Reports (R6RS) utilizes ellipses (...) for repetition zero or more times.

Grammar Notation

digit = 0
...
digit = 9
digits = digit
digits = digit digits
sum = expr "+" sum
sum = expr
expr = digits
expr = "(" sum ")"

Recursion

  • Right-recursive grammar: N -> t & N -> t N
  • Left-recursive grammar: N -> t & N -> N t

Example grammar

S -> S ; S S -> id := E S -> print ( L ) E -> id E -> num E -> E + E E -> ( S , E ) L -> E L -> L , E

Lists of Expressions

  • Left-recursive version: L -> E & L -> L , E
  • Right-recursive version: L -> E & L -> E , L
  • Ambiguous version L -> E & L -> L , L

Grammar Rule Styles

  • Left-recursive rules are preferable for left-associative operators and do not work with top-down parsers; the cost is O(n).
  • Right-recursive rules are compatible with both top-down and bottom-up parsers; the cost is O(n).
  • Ambiguous rules results in multiple possible parse trees; the cost is O(n³), supported by Earley parser & GLR parser.

Grammar Classes

  • LL(k): Left-to-right, Left-most derivations, k tokens lookahead
  • LR(k): Left-to-right, Right-most derivations, k tokens lookahead

Technologies

  • LL(0) is the simplest top-down parser.
  • LL(1) is a typical handwritten recursive-descent parser.
  • LL(k) represents parsers generated by a top-down parser generator.
  • LR(0) is the simplest bottom-up parser.
  • SLR is the simplest bottom-up parser with 1 token lookahead.
  • LALR(1) is using a bottom-up parser generator.
  • LR(1) is a bottom-up parser with 1 token lookahead.

Concrete Parse Trees

  • Concrete Parse Trees represents syntactic structure of input and summarizes how the input was parsed.
  • Leaves are terminal symbols (tokens), interior nodes are non-terminal symbols, and the root is the start symbol.
  • Reading leaves left-to-right produces the input.

Ambiguous Grammars

  • Ambiguous grammars can produce multiple parse trees for the same input.

Derivations

  • Derivations start with the start symbol and replace non-terminals with the right-hand side (RHS) of grammar rules until a terminal string is derived.
  • Used to define language described by grammar; language is set of all possible strings that can be derived from start symbol.
  • Left-most derivations expand the left-most non-terminal by RHS of grammar rule and corresponds to top-down (LL) parsing.
  • Right-most derivations expand the right-most non-terminal by RHS of grammar rule and corresponds to bottom-up (LR) parsing.

Abstract Parse Treees

  • Represents syntactic structure of input and do not summarize how the input was parsed.

Ambiguous Expression Grammar

  • This grammar is convenient and compact:

E -> id E -> num E -> E * E E -> E / E E -> E + E E -> E - E E -> ( E )

Desired Parse Trees

  • Higher-precedence operators should be below lower-precedence operators. The right tree for 1-2*3 is prefered.
  • For operators with equal precedence, parse trees should reflect associativity. The left parse tree for 1-2-3 reflects (1-2)-3.
  • Parentheses do not need representation in the parse tree and are implicit in tree structure.

Unambiguous Expression Grammar

E -> E + T E -> E - T E -> T T -> T * F T -> T / F T -> F F -> id F -> num F -> ( E )

  • Lower-precedence operators are at the top.
  • Left recursion corresponds to left-associative operators.
  • Necessary for handwritten parsers.
  • C, C#, and Java have 15 precedence levels, while C++ has 17.

Ambiguous Grammar

  • The grammar is ambigious for two nested if-statements (aka. a dangling else); should "else" belong to the inner or outer "if".
S -> if E then S else S
S -> if E then S
S -> other

Unambiguous Grammar

  • M: matched statement (each if comes with an else) & U: unmatched statement.
S -> M
S -> U
M -> if E then M else M
M -> other
U -> if E then S
U -> if E then M else U
  • The "then" part is matched, therefore each "else" goes with the inner "if".
  • LR parsers automatically repair the ambiguous if-statement grammar.

JavaCUP Precedence Directives

precedence nonassoc EQ, NEQ; precedence left PLUS, MINUS; precedence left TIMES, DIV; precedence right EXP; precedence left UMINUS; exp ::= INT | exp PLUS exp | exp MINUS exp | exp TIMES exp | MINUS exp %prec UMINUS ;

Comparison Operators

  • In Tiger, comparison operators are non-associative and enforced with a rewrite.
  • Java, consecutive comparison operators can result in a type error.
  • In C and C++, comparison operators are left-associative.

Assignment Operators

  • In the C family, the assignment operator is right-associative and returns the value of the right hand side (RHS).
  • Tiger does not allow assignments inside expressions.

Parse Trees in JavaCUP

  • Not every grammar rule needs a constructor call
Exp ::= LPAREN SeqExp:e RPAREN
{: RESULT = e; :}
;
-  SeqExp represents a sequence of expressions separated by commas.
  • When constructing lists, an empty list might be represented as null
::=
|
{: RESULT = null; :}
Xs:1
{: RESULT = 1; :}
;
  • Xs is a list of one or more X's with/without a separator.

LR Parse Engine

  • Employs a stack and DFA
    • A DFA is applied to the stack where edges are labeled with (non-)terminals.
  • Actions are performed via the following commands: shift/reduce go to state, or accept (end of file - EOF) or reject.

LR Parse Actions

  • Conceptually, a shift moves a token from the input onto the stack.
  • Reduce(+goto) replaces the RHS of a rule on top of the stack by the LHS.
  • Actually, the stack contains DFA state numbers, not tokens or non-terminals.
  • Shift and goto put state numbers onto the stack

Parse Table

  • The goto-table is to the right, the rest is the parse table.
  • $ indicates EOF
  • The parse table is for an LALR(1) or LR(1) parser
  • The stack stores states, and when parsing a token the parser looks up the action to perform based on the state on top of the stack for the input token.
  • An LR(0) parser decides whether to shift or reduce based on the stack content alone before looking at the input token; the parse table uses same reduce action in each column.

Parse Conflict

  • Parse conflicts occur if the table construction algorithm results in two actions in the same table slot. CUP handles shift-reduce conflicts and reduce-reduce conflicts using rule order.

Grammar Rule

  • Grammar rule with dot is a parser configuration. Everything left of the dot is on the stack; everything right of the dot is on the input. Java CUP shows lookahead tokens for parser configurations.

Parse Actions

Shift(n)

  • Eat one token
  • Shift state n onto the stack Reduce(k)
  • Pop n states off the stack, where n is number of symbols on RHS of rule k
  • In a state, look up X (LHS of rule k) to find goto(m) in goto table to find.
  • Push state m onto the stack

Conflicts

  • Push decision to semantic analysis since there are incorrect expressions in the parser
stm ::= ID ASSIGN E;
E
::= E OR E
E AND E
E EQUAL E
E PLUS E
ID;
  • Push decision to Scanner because C++ has ambiguous contexts; needs type info from semantic analysis to be fed back to scanner, or requires lookahead on token stream if type info not available.

Operators

  • Use precedence declarations for operators
precedence left PLUS;
precedence right ASSIGN;
precedence nonassoc EQUAL;
  • Use %prec if there are no appropriate operators
precedence left UMINUS;
E ::= MINUS E %prec UMINUS;

Conflict Resolution

  • Shift/Reduce Conflict in Tiger is fixed by unrolling recursion for Var, ID before LBRACK should not be reduced to Var.
  • Parse Conflicts for Declarations in Tiger are solved by structuring grammar to allow correct constructor calls, avoid R/R. Goal: S/R that is automatically resolved via shift preference.
  • Solution is to "give preference to DOT in QualName".
  • Modify grammar or use precedence declarations.

Error Recovery

  • All empty entries in parse tables result in "Syntax error". The default behavior is to stops after the first error.

Error Reporting

  • Use grammar rules with special error token.
  • Error recovery reports the parse error, pops stack until there is a shift on error, and then discards the input until non-error at which point parsing is resumed.

Error Reporting

  • Use grammar rules with special error token.
  • Error recovery after reporting parse error includes popping stack until there is a shift on error, shift error token, and discarding input until a state with non-error action, then resume parsing
  • Easier with synchronizing token following error but too many error recovery rules lead to S/R and R/R parse conflicts.

Error recovery example

E -> ID
E -> E + E
E -> ( E )
E -> ( error )
Es -> E
Es -> Es ; E
Es > error ; E
  • Reports syntax error, "ID+" is deleted from stack, since with "(" on stack, error token shifts. "+ ID" is then removed from input, since "(error must be followed by )" so parser continues.
  • Can use grammar rules for reporting semantic errors:
Decl ::= Type ID LBRACK INT RBRACK
ASSIGN LBRACE Exp RBRACE SEM
{: ... :}
| Type ID LBRACK INT RBRACK
ASSIGN LBRACE RBRACE SEM
{:
error("empty array initializer");
RESULT = null;
:}

Error Repair

  • Global error repair with a let statement:
let type a := intArray[10] of 0;

  1. Error production: delete type ... 0
  2. Local repair: replace := with =
  3. Global repair: replace type with var

Burke-Fisher

  • Use single-token insertions/deletions/substitutions in the last K tokens where the repair farthers the parse by R tokens. K+N*K
  • Use semantic actions for insertions
%value ID ("bogus")
%value INT (1)
%value STRING ("")
· Programmer-specified substitutions
%change EQ -> ASSIGN
| ASSIGN -> EQ
| SEM ELSE -> ELSE
|--> IN INT END
  • Most bottom-up parser generators build LALR(1) parsers and merge states only distinguished by lookahead.

LR Parser Technologies

  • LR(O): parse table construction and parser don't use lookahead
  • SLR: Uses the LR(O) parse table construction, reduces lookahead only if it is in the FOLLOW set for LL parsing.
  • LALR(1): parse tables uses compressed LR(1).
  • LR(1): parse table construction, and use one token lookahead.
  • Burke-Fisher requires more data structure; Old stack: parse stack for stack content more than K tokens in the past, Token queue of K tokens, Current stack: parse stack including those K tokens; costs for window size K and and N tokens can be calculated via deletions K+(NK), insertions K+(NK), and substitutions K+(N*K).

Recursive Descent Parsing

void S() {
	switch (tok) {
		case IF:
			eat(IF);
			E();
			eat(THEN);
			S();
			eat(ELSE);
			S();
			break;
		case BEGIN:
			...
		default: error();
} }
  • Corresponds to grammar structure of the language. Has Concatenation in grammar corresponds to straight-line code with alternation in grammar corresponds to if-then-else or switch statements

Recusrive Descent Parsing

  • "tok" is a variable for the current token 1 Assume one token lookahead; var tok before returns, function eat () also checks it prior to getting next token 2 Each parse function returns a parse tree, and parse tree constructors are called in return statements of parse functions
  • Recursive Descent parsing has both Left recursion and common left factors, and solved by restructuring grammar to change from left recursion to right recursion, and restructuring grammar to remove common left factors.

Computing Sets

  • Translate Grammar into Code with X-> Y |Z must choose alternatives based on the look ahead and need knowledge of the sets of terminals that "YZ" can start. If considering following grammars, an empty RHS then those sets will also need knowledge of teriminals that can follow X, after the empty RHS Nullabe(X) - "Can X derive the empty string?" FIRST(X) - "Start with terminal" FOLLOW(X) - "Follow Terminal"

Predictive parsers

  • Construction of Predictive Parser table requires the user to do
  1. First enter production for each given grammar rules such as X-> Y
  2. Column T
  3. FOLLOW sets must also include "$", also IF if the table contains multiple productions, the grammar is not LL(1) - LL parse table can be translated into recursive-descent code or for use in table-driven parser
X-> Y
		|Z
 Y-> 
     |C
 X-> Y
      |a
- Grammar is not LL(1)

Single Pass Compilation

  • Type ID Param is Single-pass vs. multi-pass compiler must be semantic analysis vs main calls as a single vs multi
FunDec:: = Types ID Params
{
	ad Params into symta LBRACE StmtList RBRACE { type-check body generate code
}

Summary Includes

  • In Short, Bottom-up parsing is table driven due to the Push automate
  • Top-down parsing has recursion, predictive and grammar classes of LL
  • Parse has content-free syntax.

Studying That Suits You

Use AI to generate personalized quizzes and flashcards to suit your learning preferences.

Quiz Team

Related Documents

Description

Explore syntax analysis using context-free grammars in programming. Learn about terminal, non-terminal, and start symbols. Understand the concept of regular expressions and their limitations in describing complex inputs.

More Like This

Use Quizgecko on...
Browser
Browser