CSC 448: Compiler Design Lecture 3 PDF

CSC 448: Compiler Design Lecture 3 Joseph Phillips De Paul University 2018 April 9 Copyright © 2015-8 Joseph Phillips All rights reserved Reading Charles Fischer, Ron Cytron, Richard LeBlanc Jr. “Crafting a Compiler” Addison-Wesley. 2010. – Chapter 3: Scanning – Theory and Practice Topics: Scanning (Practice) Flex Overview: Remember our hand-coded tokenizer? Symbol* scanner () throw(const char*) case '-' : { symbolPtr->symbol_ = SUBTRACT_SYMBOL; while ( isspace(inputCharStream_.peek()) ) break; inputCharStream_.advance(); case 'p' : if ( inputCharStream_.isAtEnd() ) symbolPtr->symbol_ = PRINT_SYMBOL; return( &endSymbol ); break; if ( isdigit(inputCharStream_.peek()) ) case 'i' : return( scanDigits() ); symbolPtr->symbol_ = INT_DECLARE_SYMBOL; break; char ch = inputCharStream_.peek(); Symbol* symbolPtr = (Symbol*) case 'f' : malloc(sizeof(Symbol)); symbolPtr->symbol_ = FLOAT_DECLARE_SYMBOL; break; inputCharStream_.advance(); default : switch (ch) if ( islower(ch) ) { { case '=' : symbolPtr->symbol_ = ID_SYMBOL; symbolPtr->value_.varName_= ch; symbolPtr->symbol_ = ASSIGN_SYMBOL; break; break; } case '+' : throw "Unexpected character in input"; symbolPtr->symbol_ = ADD_SYMBOL; } break; return(symbolPtr); } Remember our hand-coded tokenizer? Symbol* scanDigits () throw() { else std::string lex(""); { lex += inputCharStream_.peek(); while ( isdigit(inputCharStream_.peek()) ) inputCharStream_.advance(); { while ( isdigit(inputCharStream_.peek()) ) lex += inputCharStream_.peek(); { inputCharStream_.advance(); lex += inputCharStream_.peek(); } inputCharStream_.advance(); } Symbol* symbolPtr; symbolPtr= (Symbol*) if (inputCharStream_.peek() != '.') malloc(sizeof(Symbol)); { symbolPtr->symbol_ = FLOAT_SYMBOL; symbolPtr= (Symbol*) symbolPtr->value_.floatPt_ malloc(sizeof(Symbol)); = strtod(lex.c_str(),NULL); symbolPtr->symbol_ = INT_SYMBOL; } symbolPtr->value_.integer_ = strtol(lex.c_str(),NULL,10); return(symbolPtr); } } Q: Eeww! What a pain-in-the-ass! Can’t we just specify regular expressions and someone else make a tokenizer in C? A: Yes you may! With flex History of flex lex: – “Lexical Analyzer” – Written by Mike Lesk and Eric Schmidt for ATT Unix in 1970s – Uses regular expression to define lexemes – C code actions do-able for each flex: – “fast lexical analyzer” – Written by Vern Paxson circa 1987 Our first lex/flex program (1) %{ // Compile with: // $ flex o ex1_echoer.c ex1_echoer.lex // $ gcc ex1_echoer.c o ex1_echoer %} %% \n printf("\n");. printf("%c",yytext); %% int yywrap () { return(1); } int main () { yylex(); return(0); } Our first lex/flex program (2) I know! I know! It looks funny... just compile and run it! $ flex o ex1_echoer.c ex1_echoer.lex $ gcc ex1_echoer.c o ex1_echoer $./ex1_echoer I type something I type something I type something else I type something else Hey! Are *YOU* copying me?!?! Hey! Are *YOU* copying me?!?! ^C # I typed CtrlC Our first lex/flex program (3) So, it just copies what I type: $./ex1_echoer < ex1_echoer.lex %{ // Compile with: // $ flex o ex1_echoer.c ex1_echoer.lex // $ gcc ex1_echoer.c o ex1_echoer %} %% \n printf("\n");. printf("%c",yytext); %% int yywrap () { return(1); } int main () { yylex(); return(0); } Our first lex/flex program (4) Now let’s try to understand the bad-boy: In general: %{ // #includes and Cexterns go here %} // regular expression defns go here %% // regular expressions rules go here %% C functions go here Our first lex/flex program (5) %{ // Compile with: // $ flex o ex1_echoer.c ex1_echoer.lex // $ gcc ex1_echoer.c o ex1_echoer %} Not much going on here... just storing some comments Our first lex/flex program (6) %% \n printf("\n");. printf("%c",yytext); %% Regular expression rules to match – Period (.) matches any char, – \n means newline printf(..) is the C-code to do when regular expression matches – yytext[ ] holds the input read so far Our first lex/flex program (7) %% int yywrap () { return(1); } int main () { yylex(); return(0); } yylex() is the C function that lex/flex produces yywrap() is a function/macro that yylex() uses: – Return value 0: keep on reading when reach EOF (yywrap() would open the next file and then return 0) – Return value 1: return the zero-token to report EOF Pressing Ctrl-C was annoying! Another solution? (1) %{ // Compile with: // $ flex o ex2_echoer.c ex2_echoer.lex // $ gcc ex2_echoer.c o ex2_echoer %} %% quit return(0); // This is our new rule! \n printf("\n");. printf("%c",yytext); %% int yywrap () { return(0); } int main () { printf("Type \"quit\" to quit:\n"); yylex(); return(0); } Pressing Ctrl-C was annoying! Another solution? (2) $ flex o ex2_echoer.c ex2_echoer.lex $ gcc ex2_echoer.c o ex2_echoer $./ex2_echoer Type "quit" to quit: Hello Hello Goodbye Goodbye quit $ Your Turn! Any problems with that rule for reading files? Your Turn Again! Write a Lex program that counts the number of newline characters, and the total number of chars. (Similar to Unix’s wc) And Again! Write a program to count the number of vowels in a file Lex regular expressions (1): Rules – Period (.) Matches any char – | The thing before or after – [] Defines character class (all included) and [0-9] both define digits – [^ ] Defines character class Everything but all included [^0-9] defines all chars other than digits Lex regular expressions (2): Rules – ( ) Just used for grouping – * Matches 0 or more of previous thing – + Matches 1 or more of previous thing – ? Matches 0 or 1 of previous thing (e.g. “optional”) – { } How many times the previous thing should match A{1,3} matches ‘A’ between 1 to 3 times – / Match preceding thing only if followed by following thing 0/1 matches 0 only when followed by 1 (1 still in input) Lex regular expressions (3): Rules: – ^ Match at beginning of line ^begin only matches “begin” at beginning of line – $ Match at end of line end$ only matches “end” at end of line – \ C-escape sequence – "(whatever)" Interpret all enclosed characters literally (other than C escape chars) Regular expression rule example (1) %{ // Compile with: // $ flex o ex4_echoer.c ex4_echoer.lex // $ gcc ex4_echoer.c o ex4_echoer %} %% \n // Ignore newlines ("+"|)?[09]+ { int i=atoi(yytext); printf("Integer: %d\n",i); } [AZaz_][AZaz_09]* { printf("Identifier: %s\n",yytext); }. // Ignore other chars %% int yywrap () { return(1); } int main () { yylex(); return(0); } // yytext[] array into which flex puts currently read lexeme Regular expression rule example (2) $./ex4_echoer < ex4_echoer.lex Identifier: Compile Identifier: with Identifier: flex Identifier: o Identifier: ex4_echoer Identifier: c Identifier: ex4_echoer Identifier: lex Identifier: gcc Identifier: ex4_echoer Identifier: c Identifier: o Identifier: ex4_echoer Identifier: n Identifier: ignore Identifier: newline Integer: 0 Integer: 9... You can define regular expressions at a higher level %{ // Compile with: // $ flex o ex5_echoer.c ex5_echoer.lex // $ gcc ex5_echoer.c o ex5_echoer %} DIGIT [09] ID [AZaz_][AZaz_09]* %% \n // Ignore newlines ("+"|)?{DIGIT}+ { int i=atoi(yytext); printf("Integer: %d\n",i); } {ID} { printf("Identifier: %s\n",yytext); }. // Ignore other chars %% int yywrap () { return(1); } int main () { yylex(); return(0); } Your Turn! Revise the previous program to handle floating point numbers (including scientific notation) Your Turn Again! Write a program that recognizes the tokens of 4 th example lex program: Ignore \n \t \r, spaces, , // comments Print these out as “Percent Begin curly”, etc: %{, %}, %%, \n, (, ), |, (char)-(char), (comma), {, }, (period), *, +, [, ], (semi-colon), Print "quoted strings" (don’t worry about \") Print identifiers (include reserved words) Print integers You control the input! input() (or yyinput() in C++) – Returns next read char. You may call it yourself, e.g. to fast-forward to the end of a comment. " Other helper vars and fncs (1) yyin – FILE* object from which yylex() reads input. – Could set to stdin, or to value from fopen() the file to read. yywrap() – Called when reach EOF. – If returns 0 then means “I set up yyin to read from the beginning of the next file.” – If return 1 then means “That’s it. There is no more input” Other helper vars and fncs (2) YY_INPUT: macro to efficiently read buffer-fulls of chars: YY_INPUT(destinationBuffer,intVarSetToNumCharsRead,buf ferSize) // Example definition: #define YY_INPUT(buf,result,max_size) \ { \ int c = getchar(); \ if (c == EOF) return(YY_NULL); \ buf = c; \ result = 1; \ } Other helper vars and fncs (3) Another example of YY_INPUT %{ #undef YY_INPUT #define YY_INPUT(buf,res,size) \ { res = ourInput(buf,size); } %} %%.. %% Other helper vars and fncs (4) int ourInput (char* buffer, int size) { int n; errno = 0; while ( (n=fread(buffer,1,size,yyin)) == 0 && ferror(yyin)) { if (errno != EINTR) { fprintf(stderr,"Reading file failed: %s\n",strerror(errno)); exit(EXIT_FAILURE); } errno = 0; clearerr(yyin); } return(n); }

CSC 448: Compiler Design Lecture 3 PDF

Document Details

Tags

Related

Summary

Full Transcript