Compiler Construction Lecture 1 PDF
Document Details
Uploaded by AppreciativeElectricOrgan
Pir Mehr Ali Shah Arid Agriculture University Rawalpindi
Anum Aleem
Tags
Summary
This document is a presentation on compiler construction, particularly the introductory lecture. It covers compilers, programming languages, and different types of languages.It also discusses the concept of compilation, giving a summary of different parts of the compiler and also introducing basic concepts and the theory behind this technology.
Full Transcript
COMPILER CONSTRUCTION LECTURE 1 WEEK 1 ANUM ALEEM Course Introduct Information ion Book: Compilers Principles, Techniques, and Tools by Alfred V.Aho, Ravi Sethi,...
COMPILER CONSTRUCTION LECTURE 1 WEEK 1 ANUM ALEEM Course Introduct Information ion Book: Compilers Principles, Techniques, and Tools by Alfred V.Aho, Ravi Sethi, Jeffery D.Ullman References: compiler design and construction by Alfred V.Aho, Ravi Sethi Pre Requisite: Theory of Automata, DS 2 Contact Details [email protected] / [email protected] Office hours: Monday to Friday (See Time Table) About me MCS ◦Arid Agriculture University Rawalpindi MS (CS) ◦Bahria University Some Rules Attendance during first 10 minutes for each hour After that you will be marked as absent Raise your hand before asking any question and then WAIT for the permission Keep mobile phones on silent mode in the class Whatever you do, please do not create disturbance in the class Dishonesty / Plagiarism Students involved in any kind of cheating in any exam (Quizzes/Assignments/projects) will get 0 (zero) in that exam OR get F in the course In case of quizzes with cheating case, their weightage will be highest Class and Evaluation Policy No extension in assignments deadline will be allowed. Strictly follow timeline to get good marks. Submission must be done on time No compensation for the missed quizzes, assignments & projects Internal marks will be awarded as actual. No annoying query about marks will be entertained at the end of the course. Ensure your presence during the lecture to avoid attendance shortage. Golden Policy: No class miss – No course headache Tentative Grading Policy Total 100 Quizzes 20 Assignments 10 Mid Terms 30 Final Terms 40 INTRODUCTION Computer Program: A self-contained set of instructions used to operate a computer to produce a specific result. Programming Language: A set of instructions that can be used to construct a program. Low-Level Languages use instructions that are directly tied to one type of computer. Programs written in such language are fastest in execution. E.g., Assembly, Machine language,… High Level Languages use instructions that resemble written languages (e.g., English language). Examples are C, C++, Fortran,… Source Code/Source Program: Programs written in a computer language (high or low level). Once a program is written in a high level language be translated in to machine language of that computer on which it will run. This translation can be accomplished in two ways: Interpretation or Compilation. 9 INTRODUCTION Interpreter: When each statement in a high level source program is translated individually and executed immediately upon translation, the program doing the translation is called as Interpreter. Compiler: When all of the statements in a high level source program are translated as a complete unit before anyone statement is executed, the program doing this translation is called as compiler. Source program may be converted into an executable file (executable program) by a compiler and later on executed by a CPU. A compiler is a computer program (or set of programs) that transform source code written in a programming language (the source program) into another computer language (the target language having a binary form known as object code or machine code). 10 Introducti Introduct on ion Simple taxonomy of compilation process: Compilat Source Code Object Code ion (human readable) Process (Non-executable machine (extensions are.c,.cpp, etc) Group of compilation code) phases (extension is.obj) Linked with different libraries (.h,.lib, etc) Executable Code (extension is.exe) The main role of compiler is to report any errors in the source program that it detects during the compilation process. (e.g. a missing semicolon at the end of a statement) 11 Introducti Introduct on ion Compilation process operates as a sequence of phases each of which transforms one representation of the source program to another. Two-Pass Compiler: Intermediate Source Code Front EndRepresentation Back End Machine Code Report errors if any Front End: Recognizes legal and illegal programs presented to it. Consists of two modules: Scanner & Parser. 12 Introducti Introduct on ion Tokens Intermediate Source Code Scanner Parser Representation Report errors if any Scanner: Maps the source code/source program character stream into words. It produces pairs (tokens) that consist of a word and its part of speech. Token Name Attribute Example: z = x – y Value It will be converted as; Tokens may be identifier, number, new, while, switch, if, +, -, / etc… 13 INTRODUCTION Parser: Takes tokens as input, recognizes context-free syntax, report errors, and builds IR for the source program. It analysis the semantics like type checking. It builds a parse thus named parser. Parse can be represented by a tree called a parse tree. Back End: Translates IR into target machine code. It selects instruction, allocate registers, and then schedule the instructions for execution. 14 Compilation Process Something that Something we computer can can understand understand easily easily Source Code Compilation Process Object Code Error Messages 1 5 Cousins of Compiler There are some translator programs that prepare the “input” to the compiler or further process the “output” from the compiler. These programs are called as cousins of compiler. Some of them are: 1. Pre-processor 2. Assembler 3. Loader/Linker 1 7 Cousins of Compiler (cont’d…) Pre-Processor Produces input for the compiler. Generally the program lines starting with # sign represent pre-processor directives. Assemblers Translators that translate the assembly code into relocatable machine code. Assembly code is mnemonic version of machine code in which names are used instead of binary code for different operations. E.g., MUL, ADD etc. 1 8 Cousins of Compiler (cont’d…) Linker / Loader It has four functions 1.Allocation: It means get the memory portions from operating system and storing the object data. 2.Relocation: It maps the relative address to the physical address and relocating the object code. 3.Linker: It combines all the executable object module to pre single executable file. 4.Loader: It loads the executable file into permanent storage. 1 9 Phases of a Compiler (Structure of Compiler) Analysis Synthesis ( front end of compiler) ( back end of compiler) Source Code Lexical Analyzer Token Syntax sAnalyzer Analysis Syntax Tree Semantic Analyzer Symbol Error Table Syntax Tree Handler Manager Intermediate Code Generator Intermediate Representation Code Optimizer Synthesis Intermediate Representation Code Generator Object Code 7 Analysis part breaks up the source program into constituent pieces and checks grammar and syntax. It then uses this structure to generate intermediate representation of the source program. If the source program is detected syntactically incorrect or semantically unsound then proper error messages are generated so that the user may take proper action. Symbol Table is a data structure that collects information about the source program and pass it to the Synthesis part along with the intermediate representation. Synthesis part constructs the desired target program from the intermediate representation and symbol table information. 2 2 Terminolog ies Lexeme – Actual sequence of characters that matches a pattern and has a given Token class. – Examples: Identifier: Name, Data, x Integer: 345, 2, 0, 629 Pattern – The rules that characterize the set of strings for a token – Example: Integer: A digit followed or not followed by digits Identifier: A character followed or not followed by characters or digits 23 2 8 TOKENS Typical classes of tokens in a programming language: –keywords, –operators, –one token for identifiers, –tokens for constants (numbers, literal strings,..), –tokens for punctuation symbols (comma, semicolon,..). TOKEN ATTRIBUTES Typical attribute values for the identifier token (id): –the precise lexeme, –its type/storage requirements, –location where first found Toke n The activity of breaking stream of characters into tokens s called lexical analysis. The lexical analyzer partition input string into substrings, called words, and classifies them according to their role. Example: Consider if(b == 0) a = b In the above programming sentence the words are “if”, “(”, “b”, “==”, “0”, “)”, “a”, “=” and “b”. The roles are keyword, variable, boolean operator, assignment operator. 27 The pairs made by lexical analyzer are: The pair is called token. 28 Specification/description of tokens Regular Languages are the most popular for specifying tokens. Regular languages can be described using regular expressions. Each regular expression is a notation for a regular language (a set of words). If A is a regular expression, we write L(A) to refer to language denoted by A. A regular expression (RE) is defined inductively a ordinary character from ∑ є the empty string R|S either R or S RS R followed by S (concatenation) R* concatenation of R zero or more times (R* = є |R|RR|RRR...) 29 Here are some REs and the strings of the language denoted by the RE. RE Strings in L(R) a “a” ab “ab” a|b “a” (ab “b” )* “” “ab” (a| “abab”... “ab” є)b “b” 30 Regular Expressions Identifier ( _ + letter)(letter | digits)* Digits (Digits) (Digits)* (Digits)*. (Digits)* Interaction between lexical analyzer and parser LEXICAL ANALYZER –Read input characters –identify lexeme patterns for tokens, Lexical analyz –identify token attributes, er –construct the symbol table, –error handling and reporting, –strip whitespace and comments. LEXICAL ANALYZER Other possible roles –Makes a copy of program with the inserted error messages –If source program uses macro-preprocessor, macro expansion is performed at this level LEXICAL ANALYSIS ALSO KNOWN AS SCANNER, LEXER, TOKENIZER Tokenization Give Error message Exceeding length Unmatched String Illegal character Eliminate Comments, white spaces, tab, new line, Interaction between lexical analyzer and parser Interaction between lexical analyzer and parser Lexical analyzer scans the entire source code of the program. It identifies each token one by one. Scanners are usually implemented to produce tokens only when requested by a parser. Here is how recognition of tokens in compiler design works- 1.“Get next token” is a command which is sent from the parser to the lexical analyzer. 2.On receiving this command, the lexical analyzer scans the input until it finds the next token. 3. It returns the token to Parser. 39