2 - CSCI70-L3-Syntax-Translation-Intro.pdf

Syntax and Translation
CSCI 70 Structure and Interpretation of Programming Languages

Programming language
  Some definitions
    Notation for specifying programs or computation
    Set of words, symbols, and rules for constructing programs
    Set of instructions that can be used to write code for a computer
    A vocabulary and set of grammatical rules for instructing a computer to perform tasks

Syntax, semantics, pragmatics
  Syntax: the way a program is written (form)
  Semantics: what the program means (meaning)
  Pragmatics: how the program is executed (implementation)
  * Every programming language has rules and details for each of these facets
  * Most definitions of a programming language focus on rules for syntax

Reference
  Michael L. Scott. 2016. Programming Language Pragmatics. 4th Edition. Amsterdam: Elsevier/Morgan Kaufmann Publishers.

Compilation overview

Compilation
  From source code (text)
  To target code (binary instructions of target computing platform)

Phases
  Lexical analysis: from text/character stream to tokens
  Syntax analysis: from tokens to a parse tree
  Code generation: from parse tree to target code (can be broken down further to include semantic analysis, intermediate code generation, and optimization)
  Symbol table management: monitoring and processing of symbols/names across the above phases

Source code example
  // computes the surface area of a cylinder whose
  // height is 500cm and radius is 3.2cm
  height = 500;
  radius = 3.2;
  pi = 3.1415926;
  area = 2*pi*radius*radius + 2*pi*radius*height;

Lexical analysis
  Process the source program as a character stream
  Filter out white spaces and comments
  Group characters into smallest meaningful units (tokens)
  Output a token stream

After lexical analysis …
  height IDENT   = EQ   500 NUM   ; SCOL
  radius IDENT   = EQ   3.2 FLOAT   ; SCOL
  pi IDENT   = EQ   3.1415926 FLOAT   ; SCOL
  area IDENT   = EQ   2 NUM   * MULT   pi IDENT   * MULT   radius IDENT   * MULT   radius IDENT   + PLUS
  2 NUM   * MULT   pi IDENT   * MULT   radius IDENT   * MULT   height IDENT   ; SCOL
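The lexical analysis steps described above can be sketched in a few lines of Python. This is only an illustration, not the scanner used in the course: the token ids (IDENT, NUM, FLOAT, EQ, MULT, PLUS, SCOL) come from the slide, while the regular expressions, the COMMENT and WS helper categories, and the function name tokenize are assumptions made for this example.

```python
import re

# Token ids come from the slide; the patterns themselves are assumptions.
TOKEN_SPEC = [
    ("COMMENT", r"//[^\n]*"),      # line comment, filtered out
    ("WS",      r"\s+"),           # white space, filtered out
    ("FLOAT",   r"\d+\.\d+"),      # listed before NUM so 3.2 is not split
    ("NUM",     r"\d+"),
    ("IDENT",   r"[A-Za-z_]\w*"),
    ("EQ",      r"="),
    ("MULT",    r"\*"),
    ("PLUS",    r"\+"),
    ("SCOL",    r";"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (lexeme, token_id) pairs for the source text."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        if match.lastgroup not in ("COMMENT", "WS"):
            tokens.append((match.group(), match.lastgroup))
        pos = match.end()
    return tokens


SOURCE = """\
// computes the surface area of a cylinder whose
// height is 500cm and radius is 3.2cm
height = 500;
radius = 3.2;
pi = 3.1415926;
area = 2*pi*radius*radius + 2*pi*radius*height;
"""

for lexeme, token_id in tokenize(SOURCE):
    print(lexeme, token_id)
```

Each printed line is one token: the lexeme followed by its token id, in source order. Placing FLOAT ahead of NUM in the pattern list is a design choice of this sketch; with the order reversed, 3.2 would be read as the NUM 3 followed by an unexpected ".".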
A token is a lexeme together with its token id; for example, the lexeme 3.1415926 has the token id FLOAT.

Tokens, token ids, and lexemes
  Token: smallest meaningful unit in a source program
    Each token represents a sequence of 1 or more characters
    Includes identifiers, keywords, numbers, strings, operators, special symbols
    Not considered as tokens: white spaces and comments
    Exception: Python indentation (indent levels derived from number of spaces)
  Token id: token type
  Lexeme: actual sequence of characters that makes up the token

Syntax analysis and code generation
  From the token stream, ensure the program follows the syntax of the language
  Build an internal parse tree
    A program is viewed as a sequence of statements; for example, the token sequence IDENT EQ FLOAT SCOL is a valid statement
    Guided by grammar rules
  Code generation: “walk” the parse tree to generate target code
    Usually would first produce intermediate code independent of target platform
    Incorporates semantic analysis
    Would refer to a symbol table maintained in previous stages

Translation phases
  Scanning: regular expressions, finite automata, and lexical analysis
  Parsing: grammars and syntax analysis

Reference: Chapters 1 and 2 of the Scott textbook
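As a rough illustration of the syntax analysis and code generation steps described above, the sketch below parses a token stream into a small tree and then walks that tree to emit intermediate code. Everything beyond the token ids is an assumption made for this example: the grammar in the comment, the tuple-based tree, the stack-machine instruction names (PUSH, LOAD, ADD, MUL, STORE), and the use of a Python set as a very small symbol table.

```python
# Assumed grammar (not from the slides):
#   program   -> statement*
#   statement -> IDENT EQ expr SCOL
#   expr      -> term (PLUS term)*
#   term      -> factor (MULT factor)*
#   factor    -> NUM | FLOAT | IDENT

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens      # list of (lexeme, token_id) pairs
        self.pos = 0

    def peek(self):
        """Token id of the next token, or None at end of input."""
        return self.tokens[self.pos][1] if self.pos < len(self.tokens) else None

    def expect(self, token_id):
        """Consume the next token if it has the given id and return its lexeme."""
        if self.peek() != token_id:
            raise SyntaxError(f"expected {token_id}, found {self.peek()} at token {self.pos}")
        lexeme = self.tokens[self.pos][0]
        self.pos += 1
        return lexeme

    def program(self):
        statements = []
        while self.peek() is not None:
            statements.append(self.statement())
        return ("program", statements)

    def statement(self):
        name = self.expect("IDENT")
        self.expect("EQ")
        value = self.expr()
        self.expect("SCOL")
        return ("assign", name, value)

    def expr(self):
        node = self.term()
        while self.peek() == "PLUS":
            self.expect("PLUS")
            node = ("add", node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() == "MULT":
            self.expect("MULT")
            node = ("mul", node, self.factor())
        return node

    def factor(self):
        if self.peek() in ("NUM", "FLOAT"):
            return ("num", self.expect(self.peek()))
        return ("var", self.expect("IDENT"))


def generate(node, symbols, code):
    """Walk the tree and append hypothetical stack-machine instructions."""
    kind = node[0]
    if kind == "program":
        for statement in node[1]:
            generate(statement, symbols, code)
    elif kind == "assign":
        generate(node[2], symbols, code)
        symbols.add(node[1])              # the name is defined from here on
        code.append(("STORE", node[1]))
    elif kind == "num":
        code.append(("PUSH", node[1]))
    elif kind == "var":
        if node[1] not in symbols:        # a simple semantic check
            raise NameError(f"{node[1]} used before assignment")
        code.append(("LOAD", node[1]))
    else:                                 # "add" or "mul"
        generate(node[1], symbols, code)
        generate(node[2], symbols, code)
        code.append(("ADD",) if kind == "add" else ("MUL",))
    return code


# Token stream for: radius = 3.2; area = 2*radius + radius;
TOKENS = [
    ("radius", "IDENT"), ("=", "EQ"), ("3.2", "FLOAT"), (";", "SCOL"),
    ("area", "IDENT"), ("=", "EQ"), ("2", "NUM"), ("*", "MULT"),
    ("radius", "IDENT"), ("+", "PLUS"), ("radius", "IDENT"), (";", "SCOL"),
]

tree = Parser(TOKENS).program()
code = generate(tree, set(), [])
for instruction in code:
    print(*instruction)
```

The first statement here has exactly the token sequence IDENT EQ FLOAT SCOL, so it is accepted as a valid statement. Splitting expr and term into separate rules is one common way to make MULT bind tighter than PLUS; a real compiler would normally use a richer tree and symbol table than this sketch does.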
