Podcast
Questions and Answers
What is the primary task of syntax analysis?
What is the primary task of syntax analysis?
- Recognizing syntactic structure in a token stream (correct)
- Converting source code into machine code
- Managing memory allocation during program execution
- Optimizing the performance of a program
Regular expressions are sufficient for describing languages that require balanced parentheses.
Regular expressions are sufficient for describing languages that require balanced parentheses.
False (B)
What type of grammar is similar to regular expressions but includes recursion?
What type of grammar is similar to regular expressions but includes recursion?
context-free grammar
In a context-free grammar, symbols that cannot be broken down further are called ______ symbols.
In a context-free grammar, symbols that cannot be broken down further are called ______ symbols.
Match the following grammar notations with their descriptions:
Match the following grammar notations with their descriptions:
What does EBNF add to BNF?
What does EBNF add to BNF?
A left-recursive grammar is generally preferred for top-down parsers.
A left-recursive grammar is generally preferred for top-down parsers.
What is the main issue with ambiguous grammars?
What is the main issue with ambiguous grammars?
A grammar that generates more than one parse tree for a single input string is considered ______.
A grammar that generates more than one parse tree for a single input string is considered ______.
Match the parser types with their derivation approach:
Match the parser types with their derivation approach:
Which of the following parser types uses a bottom-up parsing approach?
Which of the following parser types uses a bottom-up parsing approach?
A concrete parse tree is the same as an abstract syntax tree.
A concrete parse tree is the same as an abstract syntax tree.
What is the root node of a concrete parse tree?
What is the root node of a concrete parse tree?
In a concrete parse tree, ______ nodes are terminal symbols (tokens).
In a concrete parse tree, ______ nodes are terminal symbols (tokens).
Match the tree type with its description.
Match the tree type with its description.
What does an abstract syntax tree primarily represent?
What does an abstract syntax tree primarily represent?
In right-most derivation, the left-most non-terminal is expanded at each step.
In right-most derivation, the left-most non-terminal is expanded at each step.
In parsing, what is the purpose of derivations?
In parsing, what is the purpose of derivations?
Expanding the right-most non-terminal in each step corresponds to ______ parsing.
Expanding the right-most non-terminal in each step corresponds to ______ parsing.
Match the type of derivation to its corresponding parsing approach:
Match the type of derivation to its corresponding parsing approach:
Which of the following is true about higher-precedence operators in a desired parse tree?
Which of the following is true about higher-precedence operators in a desired parse tree?
Parentheses must always be explicitly represented in a parse tree.
Parentheses must always be explicitly represented in a parse tree.
How is associativity of operators represented in a parse tree?
How is associativity of operators represented in a parse tree?
In an unambiguous expression grammar, lower-precedence operators are at the ______ of the tree.
In an unambiguous expression grammar, lower-precedence operators are at the ______ of the tree.
Match the concept with its description in the context of parsing:
Match the concept with its description in the context of parsing:
What is a common issue that arises with nested if-statements in ambiguous grammars?
What is a common issue that arises with nested if-statements in ambiguous grammars?
An ambiguous grammar for if-statements is always preferred for its conciseness.
An ambiguous grammar for if-statements is always preferred for its conciseness.
In the context of the dangling else problem, what does a matched statement refer to?
In the context of the dangling else problem, what does a matched statement refer to?
The 'then' part is always matched, therefore each 'else' goes with the ______ 'if'.
The 'then' part is always matched, therefore each 'else' goes with the ______ 'if'.
Match the following terms related to shift-reduce conflicts with their meanings:
Match the following terms related to shift-reduce conflicts with their meanings:
In JavaCUP, how is a shift-reduce conflict typically resolved by default?
In JavaCUP, how is a shift-reduce conflict typically resolved by default?
In Tiger, the assignment operator is right-associative.
In Tiger, the assignment operator is right-associative.
What is a common approach to resolving reduce/reduce conflicts?
What is a common approach to resolving reduce/reduce conflicts?
A C++ statement like C x(a, b, c);
can be ambiguous because it could be a forward declaration or a ______.
A C++ statement like C x(a, b, c);
can be ambiguous because it could be a forward declaration or a ______.
Which parser technology uses compressed LR(1) parse tables?
Which parser technology uses compressed LR(1) parse tables?
LL parsing directly handles left recursion without modification.
LL parsing directly handles left recursion without modification.
What is the purpose of FIRST and FOLLOW sets in LL parsing?
What is the purpose of FIRST and FOLLOW sets in LL parsing?
In recursive descent parsing, the code structure corresponds to the ______ structure.
In recursive descent parsing, the code structure corresponds to the ______ structure.
Match the parsing term to its action:
Match the parsing term to its action:
Which type of parser generator is yacc?
Which type of parser generator is yacc?
Flashcards
Syntax Analysis Task
Syntax Analysis Task
Recognizing the syntactic composition of a token stream.
Parse Tree Production
Parse Tree Production
Producing a tree-like depiction of a program's structure.
Terminal Symbols
Terminal Symbols
Symbols that form the basic vocabulary of the language.
Non-terminal Symbols
Non-terminal Symbols
Signup and view all the flashcards
Start Symbol
Start Symbol
Signup and view all the flashcards
Production Rule
Production Rule
Signup and view all the flashcards
BNF (Backus-Naur Form)
BNF (Backus-Naur Form)
Signup and view all the flashcards
EBNF (Extended Backus-Naur Form)
EBNF (Extended Backus-Naur Form)
Signup and view all the flashcards
Top-level Alternation
Top-level Alternation
Signup and view all the flashcards
Right-Recursive Grammar
Right-Recursive Grammar
Signup and view all the flashcards
Left-Recursive Grammar
Left-Recursive Grammar
Signup and view all the flashcards
Left-to-right Parsing
Left-to-right Parsing
Signup and view all the flashcards
Concrete Parse Tree
Concrete Parse Tree
Signup and view all the flashcards
Leaves of Parse Tree
Leaves of Parse Tree
Signup and view all the flashcards
Interior Nodes
Interior Nodes
Signup and view all the flashcards
Ambiguous Grammar
Ambiguous Grammar
Signup and view all the flashcards
Derivation
Derivation
Signup and view all the flashcards
Left-most Derivation
Left-most Derivation
Signup and view all the flashcards
Right-most Derivation
Right-most Derivation
Signup and view all the flashcards
Abstract Parse Tree
Abstract Parse Tree
Signup and view all the flashcards
Operator Precedence
Operator Precedence
Signup and view all the flashcards
Parentheses in Parse Trees
Parentheses in Parse Trees
Signup and view all the flashcards
Matched Statement
Matched Statement
Signup and view all the flashcards
Unmatched Statement
Unmatched Statement
Signup and view all the flashcards
Dangling Else
Dangling Else
Signup and view all the flashcards
Parse Conflict
Parse Conflict
Signup and view all the flashcards
Shift-Reduce Conflict
Shift-Reduce Conflict
Signup and view all the flashcards
Reduce-Reduce Conflict
Reduce-Reduce Conflict
Signup and view all the flashcards
Error Recovery
Error Recovery
Signup and view all the flashcards
Error Token
Error Token
Signup and view all the flashcards
Error Productions
Error Productions
Signup and view all the flashcards
Single-Pass Compiler
Single-Pass Compiler
Signup and view all the flashcards
Multi-Pass Compiler
Multi-Pass Compiler
Signup and view all the flashcards
Recursive Descent Parser
Recursive Descent Parser
Signup and view all the flashcards
LL(1) Grammar
LL(1) Grammar
Signup and view all the flashcards
FIRST sets
FIRST sets
Signup and view all the flashcards
FOLLOW sets
FOLLOW sets
Signup and view all the flashcards
Burke-Fisher Error Repair
Burke-Fisher Error Repair
Signup and view all the flashcards
Nullable(X)
Nullable(X)
Signup and view all the flashcards
One Token Lookahead
One Token Lookahead
Signup and view all the flashcards
Study Notes
- Syntax analysis involves recognizing syntactic structure in a token stream and producing a parse tree that represents the structure of the input program.
Regular Expressions
- Regular expression macros can be used for summation expressions; an input of "10+2+30" is valid using digits = [0-9]+ & sum = ({digits} "+")* {digits}
- Regular expressions cannot describe an input like "10+(2+30)", as it is not a regular language.
Context-Free Grammars
- A context-free grammar is similar to regular expressions but include recursion for added functionality.
- Top-level alternation translates into aux = c | d & expr = a b aux e where expr = a b (c | d) e.
- Kleene closure translates into expr = a b c expr & expr = ε where expr = (a b c)*
Key Components
- Terminal symbols (tokens) are fundamental units recognized by the scanner.
- Non-terminal symbols represent higher-level syntactic constructs.
- Start symbol is the initial non-terminal from which derivations begin.
- Grammar rules have the form N -> X*, where N is a non-terminal, and X is a terminal or non-terminal.
CFG Syntax Variants
- BNF (Backus-Naur Form) was developed for Algol-60;
::= 'if' 'then' | ... - EBNF (Extended Backus-Naur Form) has uses square brackets for optional parts
[ ... ]
and curly brackets for repetition zero or more times{ ... }
. - Scheme Reports (R6RS) utilizes ellipses (...) for repetition zero or more times.
Grammar Notation
digit = 0
...
digit = 9
digits = digit
digits = digit digits
sum = expr "+" sum
sum = expr
expr = digits
expr = "(" sum ")"
Recursion
- Right-recursive grammar: N -> t & N -> t N
- Left-recursive grammar: N -> t & N -> N t
Example grammar
S -> S ; S S -> id := E S -> print ( L ) E -> id E -> num E -> E + E E -> ( S , E ) L -> E L -> L , E
Lists of Expressions
- Left-recursive version: L -> E & L -> L , E
- Right-recursive version: L -> E & L -> E , L
- Ambiguous version L -> E & L -> L , L
Grammar Rule Styles
- Left-recursive rules are preferable for left-associative operators and do not work with top-down parsers; the cost is O(n).
- Right-recursive rules are compatible with both top-down and bottom-up parsers; the cost is O(n).
- Ambiguous rules results in multiple possible parse trees; the cost is O(n³), supported by Earley parser & GLR parser.
Grammar Classes
- LL(k): Left-to-right, Left-most derivations, k tokens lookahead
- LR(k): Left-to-right, Right-most derivations, k tokens lookahead
Technologies
- LL(0) is the simplest top-down parser.
- LL(1) is a typical handwritten recursive-descent parser.
- LL(k) represents parsers generated by a top-down parser generator.
- LR(0) is the simplest bottom-up parser.
- SLR is the simplest bottom-up parser with 1 token lookahead.
- LALR(1) is using a bottom-up parser generator.
- LR(1) is a bottom-up parser with 1 token lookahead.
Concrete Parse Trees
- Concrete Parse Trees represents syntactic structure of input and summarizes how the input was parsed.
- Leaves are terminal symbols (tokens), interior nodes are non-terminal symbols, and the root is the start symbol.
- Reading leaves left-to-right produces the input.
Ambiguous Grammars
- Ambiguous grammars can produce multiple parse trees for the same input.
Derivations
- Derivations start with the start symbol and replace non-terminals with the right-hand side (RHS) of grammar rules until a terminal string is derived.
- Used to define language described by grammar; language is set of all possible strings that can be derived from start symbol.
- Left-most derivations expand the left-most non-terminal by RHS of grammar rule and corresponds to top-down (LL) parsing.
- Right-most derivations expand the right-most non-terminal by RHS of grammar rule and corresponds to bottom-up (LR) parsing.
Abstract Parse Treees
- Represents syntactic structure of input and do not summarize how the input was parsed.
Ambiguous Expression Grammar
- This grammar is convenient and compact:
E -> id E -> num E -> E * E E -> E / E E -> E + E E -> E - E E -> ( E )
Desired Parse Trees
- Higher-precedence operators should be below lower-precedence operators. The right tree for 1-2*3 is prefered.
- For operators with equal precedence, parse trees should reflect associativity. The left parse tree for 1-2-3 reflects (1-2)-3.
- Parentheses do not need representation in the parse tree and are implicit in tree structure.
Unambiguous Expression Grammar
E -> E + T E -> E - T E -> T T -> T * F T -> T / F T -> F F -> id F -> num F -> ( E )
- Lower-precedence operators are at the top.
- Left recursion corresponds to left-associative operators.
- Necessary for handwritten parsers.
- C, C#, and Java have 15 precedence levels, while C++ has 17.
Ambiguous Grammar
- The grammar is ambigious for two nested if-statements (aka. a dangling else); should "else" belong to the inner or outer "if".
S -> if E then S else S
S -> if E then S
S -> other
Unambiguous Grammar
- M: matched statement (each if comes with an else) & U: unmatched statement.
S -> M
S -> U
M -> if E then M else M
M -> other
U -> if E then S
U -> if E then M else U
- The "then" part is matched, therefore each "else" goes with the inner "if".
- LR parsers automatically repair the ambiguous if-statement grammar.
JavaCUP Precedence Directives
precedence nonassoc EQ, NEQ; precedence left PLUS, MINUS; precedence left TIMES, DIV; precedence right EXP; precedence left UMINUS; exp ::= INT | exp PLUS exp | exp MINUS exp | exp TIMES exp | MINUS exp %prec UMINUS ;
Comparison Operators
- In Tiger, comparison operators are non-associative and enforced with a rewrite.
- Java, consecutive comparison operators can result in a type error.
- In C and C++, comparison operators are left-associative.
Assignment Operators
- In the C family, the assignment operator is right-associative and returns the value of the right hand side (RHS).
- Tiger does not allow assignments inside expressions.
Parse Trees in JavaCUP
- Not every grammar rule needs a constructor call
Exp ::= LPAREN SeqExp:e RPAREN
{: RESULT = e; :}
;
- SeqExp represents a sequence of expressions separated by commas.
- When constructing lists, an empty list might be represented as null
::=
|
{: RESULT = null; :}
Xs:1
{: RESULT = 1; :}
;
- Xs is a list of one or more X's with/without a separator.
LR Parse Engine
- Employs a stack and DFA
- A DFA is applied to the stack where edges are labeled with (non-)terminals.
- Actions are performed via the following commands: shift/reduce go to state, or accept (end of file - EOF) or reject.
LR Parse Actions
- Conceptually, a shift moves a token from the input onto the stack.
- Reduce(+goto) replaces the RHS of a rule on top of the stack by the LHS.
- Actually, the stack contains DFA state numbers, not tokens or non-terminals.
- Shift and goto put state numbers onto the stack
Parse Table
- The goto-table is to the right, the rest is the parse table.
- $ indicates EOF
- The parse table is for an LALR(1) or LR(1) parser
- The stack stores states, and when parsing a token the parser looks up the action to perform based on the state on top of the stack for the input token.
- An LR(0) parser decides whether to shift or reduce based on the stack content alone before looking at the input token; the parse table uses same reduce action in each column.
Parse Conflict
- Parse conflicts occur if the table construction algorithm results in two actions in the same table slot. CUP handles shift-reduce conflicts and reduce-reduce conflicts using rule order.
Grammar Rule
- Grammar rule with dot is a parser configuration. Everything left of the dot is on the stack; everything right of the dot is on the input. Java CUP shows lookahead tokens for parser configurations.
Parse Actions
Shift(n)
- Eat one token
- Shift state n onto the stack Reduce(k)
- Pop n states off the stack, where n is number of symbols on RHS of rule k
- In a state, look up X (LHS of rule k) to find goto(m) in goto table to find.
- Push state m onto the stack
Conflicts
- Push decision to semantic analysis since there are incorrect expressions in the parser
stm ::= ID ASSIGN E;
E
::= E OR E
E AND E
E EQUAL E
E PLUS E
ID;
- Push decision to Scanner because C++ has ambiguous contexts; needs type info from semantic analysis to be fed back to scanner, or requires lookahead on token stream if type info not available.
Operators
- Use precedence declarations for operators
precedence left PLUS;
precedence right ASSIGN;
precedence nonassoc EQUAL;
- Use %prec if there are no appropriate operators
precedence left UMINUS;
E ::= MINUS E %prec UMINUS;
Conflict Resolution
- Shift/Reduce Conflict in Tiger is fixed by unrolling recursion for Var, ID before LBRACK should not be reduced to Var.
- Parse Conflicts for Declarations in Tiger are solved by structuring grammar to allow correct constructor calls, avoid R/R. Goal: S/R that is automatically resolved via shift preference.
- Solution is to "give preference to DOT in QualName".
- Modify grammar or use precedence declarations.
Error Recovery
- All empty entries in parse tables result in "Syntax error". The default behavior is to stops after the first error.
Error Reporting
- Use grammar rules with special
error
token. Error
recovery reports the parse error, pops stack until there is a shift onerror
, and then discards the input until non-error
at which point parsing is resumed.
Error Reporting
- Use grammar rules with special
error
token. - Error recovery after reporting parse error includes popping stack until there is a shift on
error
, shifterror
token, and discarding input until a state with non-error action, then resume parsing - Easier with synchronizing token following
error
but too many error recovery rules lead to S/R and R/R parse conflicts.
Error recovery example
E -> ID
E -> E + E
E -> ( E )
E -> ( error )
Es -> E
Es -> Es ; E
Es > error ; E
- Reports syntax error, "ID+" is deleted from stack, since with "(" on stack, error token shifts. "+ ID" is then removed from input, since "(error must be followed by )" so parser continues.
- Can use grammar rules for reporting semantic errors:
Decl ::= Type ID LBRACK INT RBRACK
ASSIGN LBRACE Exp RBRACE SEM
{: ... :}
| Type ID LBRACK INT RBRACK
ASSIGN LBRACE RBRACE SEM
{:
error("empty array initializer");
RESULT = null;
:}
Error Repair
- Global error repair with a let statement:
let type a := intArray[10] of 0;
- Error production: delete type ... 0
- Local repair: replace := with =
- Global repair: replace type with var
Burke-Fisher
- Use single-token insertions/deletions/substitutions in the last K tokens where the repair farthers the parse by R tokens. K+N*K
- Use semantic actions for insertions
%value ID ("bogus")
%value INT (1)
%value STRING ("")
· Programmer-specified substitutions
%change EQ -> ASSIGN
| ASSIGN -> EQ
| SEM ELSE -> ELSE
|--> IN INT END
- Most bottom-up parser generators build LALR(1) parsers and merge states only distinguished by lookahead.
LR Parser Technologies
- LR(O): parse table construction and parser don't use lookahead
- SLR: Uses the LR(O) parse table construction, reduces lookahead only if it is in the FOLLOW set for LL parsing.
- LALR(1): parse tables uses compressed LR(1).
- LR(1): parse table construction, and use one token lookahead.
- Burke-Fisher requires more data structure; Old stack: parse stack for stack content more than K tokens in the past, Token queue of K tokens, Current stack: parse stack including those K tokens; costs for window size K and and N tokens can be calculated via deletions K+(NK), insertions K+(NK), and substitutions K+(N*K).
Recursive Descent Parsing
void S() {
switch (tok) {
case IF:
eat(IF);
E();
eat(THEN);
S();
eat(ELSE);
S();
break;
case BEGIN:
...
default: error();
} }
- Corresponds to grammar structure of the language. Has Concatenation in grammar corresponds to straight-line code with alternation in grammar corresponds to if-then-else or switch statements
Recusrive Descent Parsing
- "tok" is a variable for the current token 1 Assume one token lookahead; var tok before returns, function eat () also checks it prior to getting next token 2 Each parse function returns a parse tree, and parse tree constructors are called in return statements of parse functions
- Recursive Descent parsing has both Left recursion and common left factors, and solved by restructuring grammar to change from left recursion to right recursion, and restructuring grammar to remove common left factors.
Computing Sets
- Translate Grammar into Code with X-> Y |Z must choose alternatives based on the look ahead and need knowledge of the sets of terminals that "YZ" can start. If considering following grammars, an empty RHS then those sets will also need knowledge of teriminals that can follow X, after the empty RHS Nullabe(X) - "Can X derive the empty string?" FIRST(X) - "Start with terminal" FOLLOW(X) - "Follow Terminal"
Predictive parsers
- Construction of Predictive Parser table requires the user to do
- First enter production for each given grammar rules such as X-> Y
- Column T
- FOLLOW sets must also include "$", also IF if the table contains multiple productions, the grammar is not LL(1) - LL parse table can be translated into recursive-descent code or for use in table-driven parser
X-> Y
|Z
Y->
|C
X-> Y
|a
- Grammar is not LL(1)
Single Pass Compilation
- Type ID Param is Single-pass vs. multi-pass compiler must be semantic analysis vs main calls as a single vs multi
FunDec:: = Types ID Params
{
ad Params into symta LBRACE StmtList RBRACE { type-check body generate code
}
Summary Includes
- In Short, Bottom-up parsing is table driven due to the Push automate
- Top-down parsing has recursion, predictive and grammar classes of LL
- Parse has content-free syntax.
Studying That Suits You
Use AI to generate personalized quizzes and flashcards to suit your learning preferences.
Related Documents
Description
Explore syntax analysis using context-free grammars in programming. Learn about terminal, non-terminal, and start symbols. Understand the concept of regular expressions and their limitations in describing complex inputs.