Compiler Design CSC 448: Scanning Techniques

Questions and Answers

What is the purpose of the function scanner() in the provided code?

  • To scan and identify symbols in the input stream (correct)
  • To generate output from the input stream
  • To handle syntax errors in the input
  • To parse input files

Which case in the scanner handles printing commands?

  • case 'i':
  • case 'f':
  • case 'p': (correct)
  • case '=':

What action does the scanner take when it encounters a whitespace character?

  • It skips the character and continues scanning (correct)
  • It returns an error
  • It adds the whitespace to the symbol table
  • It stops the scanning process

What will happen if the scanner reads an unexpected character?

  • It will throw an error message (correct)

In the given code snippet, what does scanDigits() likely do?

  • It scans and returns a digit symbol (correct)

Which data type is used for symbol objects in the scanner?

  • Symbol (correct)

What does the malloc function do in the context of the scanner?

  • It allocates memory for a new symbol object (correct)

What type of symbol does the scanner associate with a variable name read as a lowercase letter?

  • ID_SYMBOL (correct)

What is the purpose of the 'yywrap' function in the provided program?

  • It handles the end of the input file. (correct)

What will the lexer output if the input is '12345'?

  • Integer: 12345 (correct)

Which symbol is NOT accounted for in the lexer according to the requirements given?

  • / (correct)

How does the macro 'YY_INPUT' contribute to the lexer functionality?

  • It efficiently reads a buffer of characters. (correct)

What would be an appropriate modification to handle floating point numbers?

  • Include decimal points in the token patterns. (correct)

Which of the following represents the correct regular expression for an identifier in this lexer?

  • [A-Za-z_][A-Za-z_0-9]* (correct)

What is the main purpose of the lexical analyzer described in the content?

  • To recognize and process sequences of characters. (correct)

Which of the following actions should be performed for tokens categorized as comments?

  • Ignore and do not process them. (correct)

What will the function scanDigits() return if the input character stream contains a valid integer?

  • A pointer to an INT_SYMBOL (correct)

What is the purpose of the line 'symbolPtr= (Symbol*) malloc(sizeof(Symbol));' in the scanDigits() function?

  • To create a new Symbol instance (correct)

Which of the following tools was developed as a faster lexical analyzer compared to lex?

  • flex (correct)

In the lex/flex example program, what is the purpose of the line 'printf("%c",yytext);'?

  • To output each character matched by the scanner (correct)

When is 'yywrap()' called in a lex/flex program?

  • At the end of input processing (correct)

What is the primary role of the 'flex' tool in programming?

  • Lexical analysis of input strings (correct)

What type of symbol does the function assign when a decimal point is detected in scanDigits()?

  • FLOAT_SYMBOL (correct)

What is the expected behavior of the example lex/flex program when run?

  • It will echo back input characters (correct)

Which regular expression matches a string that starts with 'begin'?

  • ^begin (correct)

What does the regular expression '[^0-9]' represent?

  • Any character that is not a digit (correct)

In the context of regular expressions, what does '*' signify?

  • Matches zero or more occurrences of the previous character (correct)

What is the purpose of the '\' character in a regular expression?

  • To escape a special character (correct)

Which regular expression will match an optional '+' followed by one or more digits?

  • (\+)?[0-9]+ (correct)

In the given context, how would you interpret the expression 'A{1,3}'?

  • Matches the letter 'A' at least once and at most three times (correct)

Which command sequence correctly compiles a lex program named 'ex4_echoer.lex'?

  • $ flex -o ex4_echoer.c ex4_echoer.lex (correct)

How does the '$' symbol function in a regex?

  • It anchors a match to the end of a line (correct)

What does the line 'yylex();' accomplish in the program?

  • It triggers the lexical analyzer to start processing input. (correct)

What is the role of the function yywrap() in the lex/flex program?

  • It determines if more input files need to be opened. (correct)

What happens if yywrap() returns a value of 1?

  • It signifies the end of processing the current input file. (correct)

What does the regular expression '.*' mean in this context?

  • It matches any string of characters on a line, including spaces and tabs but not newlines. (correct)

In the context of the lex/flex program, what is the outcome of typing 'quit'?

  • The program terminates and prints 'quit'. (correct)

Which command compiles the lex file into a C source file?

  • $ flex -o ex1_echoer.c ex1_echoer.lex (correct)

What is the significance of the line 'printf("\n");' in the given program?

  • It outputs a newline to the console whenever a newline character is detected. (correct)

If you wanted to count characters and newlines, which task would be appropriate?

  • Implement a new rule for counting newline characters in the lex file. (correct)

Flashcards

Tokenization

The process of converting a stream of characters (source code) into a sequence of tokens, which represent meaningful units of the program.

Tokenizer

A program that performs tokenization on a source code file.

Token

An individual unit of meaning in a program, such as keywords, identifiers, operators, and literals.

Input Character Stream

A stream of characters representing the source code of a program.

Symbol

A data structure that stores information about a token, such as its type and its value.

Symbol Type

A category used to classify a token, such as operator, keyword, or identifier.

Symbol Value

A value associated with a token, such as the name of an identifier or the value of a literal.

End Symbol

A special token that marks the end of the input character stream.

EOF (End of File)

A special value that marks the end of a file or input. It is not a character stored in the file; input functions return it to signal that there are no more characters to be processed.

Regular Expression

A pattern of characters that is used to match a specific sequence of input text in a program.

Lexer (Lexical Analyzer)

A program that translates the source code into a sequence of tokens, which represent meaningful units of the program.

yywrap()

A C function that is used to tell the lexer whether to keep reading input or stop.

yylex()

The scanning function that lex/flex generates automatically. It reads input and produces tokens according to the defined regular expressions.

yytext

A variable in lex/flex-generated C scanners that holds the portion of the input most recently matched by a regular expression.

Lex Rules

A set of rules that tell a lexer how to tokenize the input.

Action Code

A special code block that is included in the body of a lexer, where C code can be used to perform actions when a particular regular expression is matched.

quoted strings

A sequence of characters enclosed within double quotes. Example: "Hello, world!"

identifiers

A unique name used to identify a variable, function, or other program element. Example: myVariable, calculateSum

reserved words

A predefined keyword that has a specific meaning in the programming language, such as "if", "else", "while", etc.

integers

A whole number without a decimal point. Example: 10, 25, 0

lexer

The compiler component that splits the source code into smaller, meaningful units called 'tokens' and passes them to the parser.

operators

A symbol of one or more characters that performs a specific operation. Example: +, -, *, /, =, !=, etc.

Period (.)

In regular expressions, a period (.) matches any single character except newline.

Vertical Bar (|)

In regular expressions, the vertical bar (|) represents the OR operator, allowing you to match either the expression before or after the bar.

Square Brackets ([...])

In regular expressions, square brackets ([...]) define a character class, which matches any single character within the brackets.

Negated Character Class ([^...])

In regular expressions, caret (^) within square brackets ([^...]) defines a negated character class, which matches any single character not within the brackets.

Parentheses ()

In regular expressions, parentheses () are used to group parts of the expression, allowing you to apply other operators to the entire group.

Asterisk (*)

In regular expressions, the asterisk (*) matches zero or more occurrences of the preceding expression.

Plus Sign (+)

In regular expressions, the plus sign (+) matches one or more occurrences of the preceding expression.

Question Mark (?)

In regular expressions, the question mark (?) matches zero or one occurrence of the preceding expression, making it optional.

String

A sequence of characters, commonly used to store and manipulate textual data in programming.

Lexical Analyzer

A tool used to convert a stream of characters (source code) into individual tokens that represent meaningful units of the program. Lexical analyzers are essential for compilers and interpreters to understand the structure of the source code.

Language Grammar

A set of rules, used by a compiler or interpreter, that defines how the tokens of a programming language combine into valid programs and what those combinations mean.

Flex

Flex is a tool used to generate lexical analyzers. It takes a set of regular expressions as input and outputs C source code for a lexical analyzer.

Multi-Component program

A program that consists of several components that communicate with each other to perform a task. The program may have different functions responsible for different aspects of the task.

Component

A component is a part of a program that performs a specific task. It is a self-contained unit of code responsible for its own functionality.

Study Notes

Course Information

  • Course: CSC 448 Compiler Design
  • Lecturer: Joseph Phillips
  • University: DePaul University
  • Date: 2018 April 9

Reading Material

  • Book Title: "Crafting a Compiler"
  • Authors: Charles Fischer, Ron Cytron, Richard LeBlanc Jr.
  • Publication Year: 2010
  • Chapter 3: Scanning - Theory and Practice

Topics

  • Scanning (Practice)
  • Flex

Compiler Structure

  • Program → Compiler → Symbol Table → Executable
  • Scanner → Parser → Type checker → Translator → Optimizer → Code generator

Hand-coded Tokenizer (Example)

  • The code snippet shows a hand-coded tokenizer that converts input characters into symbols.
  • It uses separate cases to identify and categorize tokens: assignment, addition, subtraction, print, integer declaration, float declaration, identifier, and so on.
  • Whitespace in the input stream is skipped before each token is scanned.
  • It handles integers and identifiers.
  • Unexpected characters are reported as errors.
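
The tokenizer described above could look roughly like the following sketch. It is not the lecture's code: scanner(), scanDigits(), Symbol, INT_SYMBOL, FLOAT_SYMBOL and ID_SYMBOL are names the quiz mentions, while the other enum names, the newSymbol() helper, the text field, and the choice to read from a string cursor instead of an input stream are assumptions made for illustration.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Token categories. Only INT_SYMBOL, FLOAT_SYMBOL and ID_SYMBOL come from
   the quiz text; the remaining names are assumed for this sketch. */
typedef enum {
    END_SYMBOL, ASSIGN_SYMBOL, PLUS_SYMBOL, MINUS_SYMBOL, PRINT_SYMBOL,
    INT_DECL_SYMBOL, FLOAT_DECL_SYMBOL, ID_SYMBOL, INT_SYMBOL, FLOAT_SYMBOL
} SymbolType;

typedef struct {
    SymbolType type;
    char text[32];              /* the lexeme, e.g. "42" or "a" (assumed field) */
} Symbol;

/* Allocate a Symbol holding a copy of the first len characters of text.
   The caller is responsible for calling free(). */
static Symbol *newSymbol(SymbolType type, const char *text, size_t len)
{
    Symbol *symbolPtr = (Symbol *)malloc(sizeof(Symbol));
    if (symbolPtr == NULL) {
        fprintf(stderr, "out of memory\n");
        exit(EXIT_FAILURE);
    }
    if (len >= sizeof symbolPtr->text)
        len = sizeof symbolPtr->text - 1;   /* truncate overlong lexemes */
    symbolPtr->type = type;
    memcpy(symbolPtr->text, text, len);
    symbolPtr->text[len] = '\0';
    return symbolPtr;
}

/* Scan a run of digits; switch to FLOAT_SYMBOL if a decimal point follows. */
static Symbol *scanDigits(const char **cursor)
{
    const char *start = *cursor;
    SymbolType type = INT_SYMBOL;
    while (isdigit((unsigned char)**cursor))
        (*cursor)++;
    if (**cursor == '.') {
        type = FLOAT_SYMBOL;
        (*cursor)++;
        while (isdigit((unsigned char)**cursor))
            (*cursor)++;
    }
    return newSymbol(type, start, (size_t)(*cursor - start));
}

/* Return the next symbol starting at *cursor and advance the cursor past it. */
Symbol *scanner(const char **cursor)
{
    while (isspace((unsigned char)**cursor))    /* skip whitespace */
        (*cursor)++;
    char c = **cursor;
    if (c == '\0')
        return newSymbol(END_SYMBOL, "", 0);
    if (isdigit((unsigned char)c))
        return scanDigits(cursor);
    (*cursor)++;
    switch (c) {
    case '=': return newSymbol(ASSIGN_SYMBOL, "=", 1);
    case '+': return newSymbol(PLUS_SYMBOL, "+", 1);
    case '-': return newSymbol(MINUS_SYMBOL, "-", 1);
    case 'p': return newSymbol(PRINT_SYMBOL, "p", 1);
    case 'i': return newSymbol(INT_DECL_SYMBOL, "i", 1);
    case 'f': return newSymbol(FLOAT_DECL_SYMBOL, "f", 1);
    default:
        if (islower((unsigned char)c))          /* any other lowercase letter */
            return newSymbol(ID_SYMBOL, &c, 1);
        fprintf(stderr, "Unexpected character '%c'\n", c);
        exit(EXIT_FAILURE);
    }
}
```

To use it, point a const char * at the source text and call scanner(&cursor) repeatedly until it returns a symbol of type END_SYMBOL, freeing each symbol after use.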

Flex (Alternative Tokenizer)

  • Flex is a tool for generating lexical analyzers (scanners) in C.
  • It allows specifying regular expressions to define tokens.
  • Flex automatically generates C code for the scanner.
  • A generated scanner is usually easier to modify than a hand-coded tokenizer, and its performance is typically competitive.
  • Flex's history:
    • Lex, the original lexical analyzer generator, was created in the 1970s by Mike Lesk and Eric Schmidt.
    • Flex, the fast lexical analyzer generator, was created in 1987 by Vern Paxson.

First Lex/Flex Example (Basic Echoer)

  • Shows how to compile and run a simple lex/flex program that echoes its input to its output.

First Lex/Flex Program (Variations & Examples)

  • Variations on the first lex/flex program demonstrate the structure of the code.
  • Example input and output text illustrate the various commands and compilation steps.

Lex/Flex Rules

  • Regular Expressions:
    • Period (.) matches any single character except newline.
    • Bracket expressions [ ] match any single character from the set inside.
    • Ranges such as 0-9 can be used inside brackets.
    • Negated bracket expressions [^ ] match any single character not in the brackets.
    • Repetition:
      • * matches zero or more occurrences of the preceding element.
      • + matches one or more occurrences of the preceding element.
      • ? matches zero or one occurrence of the preceding element.
      • {} matches a specific number of occurrences of the preceding element (for example, A{1,3}).
    • Trailing context:
      • / matches the preceding element only if it is followed by the following element.
    • Anchoring:
      • ^ matches the beginning of a line.
      • $ matches the end of a line.
    • Grouping and escaping:
      • Parentheses () group expressions.
      • \ escapes special characters and introduces escape sequences such as \n.
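
Several of these operators can be tried outside flex with the POSIX regex library. The sketch below is only an approximation: POSIX extended syntax is close to, but not identical with, flex's pattern syntax (for example, it has no / trailing-context operator), and regexec() searches anywhere in the text whereas a flex rule matches at the current input position. The helper name matches() is an assumption for this example.

```c
#include <regex.h>
#include <stdio.h>

/* Return 1 if text contains a match for the POSIX extended regular
   expression pattern, 0 if it does not, and -1 if the pattern is invalid. */
int matches(const char *pattern, const char *text)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int found = (regexec(&re, text, 0, NULL, 0) == 0);
    regfree(&re);
    return found;
}
```

For instance, matches("^begin", "begin here") succeeds while matches("^begin", "we begin") does not, and matches("^(\\+)?[0-9]+$", "+42") accepts an optional plus sign followed by digits. The doubled backslash is C string escaping for a single regex backslash.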

Counting Characters, Newlines And Vowels

  • Lex/flex programs are provided that show how to count characters and lines.
  • Further examples count vowel and non-vowel letters.
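
As a point of comparison, the same counting can be written by hand in plain C. This is a sketch, not the lecture's lex/flex program: the Counts struct and countText() are assumed names, and each branch corresponds to what would be a separate rule in a lex file (newline, vowel, other letter, any other character).

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Running totals; the struct and field names are assumptions for this sketch. */
typedef struct {
    size_t chars;      /* every character, including newlines */
    size_t lines;      /* newline characters */
    size_t vowels;     /* a, e, i, o, u in either case */
    size_t nonvowels;  /* all other letters */
} Counts;

Counts countText(const char *text)
{
    Counts c = {0, 0, 0, 0};
    for (const char *p = text; *p != '\0'; p++) {
        unsigned char ch = (unsigned char)*p;
        c.chars++;
        if (ch == '\n') {
            c.lines++;
        } else if (isalpha(ch)) {
            if (strchr("aeiouAEIOU", ch) != NULL)
                c.vowels++;
            else
                c.nonvowels++;
        }
        /* digits, punctuation and spaces only add to chars */
    }
    return c;
}
```

Calling countText("Hello, world!\nflex\n") yields 19 characters, 2 lines, 4 vowels and 10 non-vowel letters.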

Lex-Flex Functions and Variables

  • yyin: A FILE pointer to the input, used like stdin.
  • yywrap(): Called at the end of the input. It should return 0 if more input has been set up (for example, yyin now points to a new file), or 1 if there is nothing more to read.
  • yytext: Holds the text of the most recently matched lexeme as a character string.
  • getchar()/strdup(): C library functions; getchar() reads a character from stdin and strdup() returns a newly allocated copy of a string.
  • YY_INPUT: A macro that reads input efficiently, a bufferful of characters at a time.
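
One reason string copies matter: the text in yytext is only valid until the next token is matched, so a lexeme that must outlive the current match has to be copied. The sketch below imitates this without flex. The shared buffer lexeme stands in for yytext, nextWord() is a toy stand-in for yylex(), and copyLexeme() does what strdup() does; all three names are assumptions for illustration.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for yytext: a single buffer that every call overwrites. */
char lexeme[64];

/* Toy stand-in for yylex(): copy the next whitespace-delimited word into
   lexeme and advance the cursor. Returns 1 if a word was found, 0 at end. */
int nextWord(const char **cursor)
{
    size_t n = 0;
    while (isspace((unsigned char)**cursor))
        (*cursor)++;
    while (**cursor != '\0' && !isspace((unsigned char)**cursor)) {
        if (n < sizeof lexeme - 1)      /* silently truncate long words */
            lexeme[n++] = **cursor;
        (*cursor)++;
    }
    lexeme[n] = '\0';
    return n > 0;
}

/* Same job as strdup(): return a heap copy that the caller must free(). */
char *copyLexeme(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}
```

If a caller keeps only a pointer to lexeme after reading "alpha", that pointer shows "beta" once the next word is read, whereas a copy made with copyLexeme() still holds "alpha".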

Input Control

  • The input() function (called yyinput() when the scanner is compiled as C++) lets action code read characters directly from the input.
  • This allows the input to be controlled by hand, for example to skip over comments.

Nested Comments

  • Lex rule examples show how to read nested C-style comments correctly.
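
The core idea behind handling nested comments is a depth counter: each opening delimiter increments it, each closing delimiter decrements it, and the comment ends when the depth returns to zero. The following sketch shows that logic in plain C over a string; it is not the lecture's lex rule, and skipNestedComment() is an assumed name. In a flex action, the same loop could read its characters with input() instead of walking a pointer.

```c
#include <stddef.h>

/* If text begins with a comment opener (slash, star), return a pointer to
   the first character after the matching closer, counting nested comments.
   Returns text unchanged if it does not begin with a comment, and NULL if
   the comment is never closed. */
const char *skipNestedComment(const char *text)
{
    if (text[0] != '/' || text[1] != '*')
        return text;

    int depth = 0;
    const char *p = text;
    while (*p != '\0') {
        if (p[0] == '/' && p[1] == '*') {          /* opener: go one level deeper */
            depth++;
            p += 2;
        } else if (p[0] == '*' && p[1] == '/') {   /* closer: come back up */
            depth--;
            p += 2;
            if (depth == 0)
                return p;
        } else {
            p++;
        }
    }
    return NULL;    /* reached end of input inside a comment */
}
```

Given the text "/* a /* b */ c */ x", the function returns a pointer to " x"; a scanner without the counter would stop at the first closer and treat " c */ x" as code.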

Description

Explore the principles of scanning as discussed in Chapter 3 of 'Crafting a Compiler.' This quiz covers the theoretical and practical aspects of tokenization, focusing on hand-coded tokenizers and the use of Flex as an alternative. Test your understanding of compiler structures and the importance of scanning in the compilation process.
