Syntax Directed Translation in Compiler Design PDF

UNIT-3 SyntaxDirectedTranslationinCompilerDesign Parser uses a CFG(Context-free-Grammar) tovalidatetheinput string and produceoutput forthenext phaseof thecompiler. Output could beeithera parse tree or an abstractsyntax tree. Nowtointerleavesemantic analysis with thesyntaxanalysis phaseof the compiler, weuseSyntaxDirected Translation. Conceptually, with both syntax-directed definition and translation schemes,weparse the input token stream,build theparsetree,and then traverse the tree as needed to evaluate thesemantic rules attheparse tree nodes. Evaluation of thesemantic rules may generate code,saveinformation in a symbol table,issueerrormessages,orperform any other activities. Thetranslation of the token streamis theresultobtained by evaluating the semantic rules. Definition SyntaxDirected Translation has augmented rules to the grammar that facilitatesemantic analysis. SDT involves passing information bottom-up and/or top-down to the parsetree in formof attributes attached to the nodes. Syntax-directed translationrules use1) lexical values of nodes, 2) constants & 3) attributes associated with the non-terminals in their definitions. Thegeneral approach toSyntax-Directed Translation is toconstruct aparse treeor syntax treeand computethevalues of attributes atthenodesof thetreebyvisiting them insome order. In many cases, translation can bedone during parsing without building an explicit tree. Example E-> E+T |T T -> T*F | F F -> INTLIT This is a grammar tosyntactically validate an expression having additions and multiplications in it. Now,tocarry out semantic analysis wewillaugmentSDT rules tothis grammar,in ordertopass someinformationup the parsetreeand check for semantic errors,if any. In thisexample,wewill focus on theevaluationof thegiven expression,as we don’ thaveany semantic assertions to check in this verybasic example. E-> E+T {E.val = E.val +T.val } PR#1 E-> T { E.val =T.val } PR#2 T -> T*F { T.val = T.val * F.val} PR#3 T -> F {T.val =F.val } PR#4 F -> INTLIT {F.val= INTLIT.lexval } PR#5 For understanding translation rules further,wetakethefirstSDT augmented to[ E -> E+T ] production rule. Thetranslation ruleinconsiderationhas valas an attributeforboth the non-terminals– E &T. Right-hand side of thetranslation rule corresponds to attribute values of theright-sidenodes of theproduction ruleand vice-versa. Generalizing,SDT are augmented rules to a CFG thatassociate 1) setof attributes toevery nodeof the grammar and 2) a set of translation rules to every production rule using attributes,constants,and lexical values. Let’ s take astring to see how semantic analysis happens – S =2+3*4. Parsetree corresponding toS would be Toevaluatetranslation rules,we can employ one depth-first searchtraversal on theparse tree. This is possible only because SDT rulesdon’ t imposeany specific orderon evaluationuntilchildren’ s attributes arecomputed beforeparents fora grammar having allsynthesized attributes. Otherwise,wewould havetofigure out the best-suited plan to traversethroughtheparse treeand evaluatealltheattributesin one or moretraversals. For betterunderstanding,wewillmovebottom-up in the left to right fashion forcomputing thetranslation rulesof our example. Theabovediagram shows how semantic analysis could happen. The flow of information happens bottom-up and all the children’ s attributes arecomputed beforeparents, as discussed above. Right-hand side nodes are sometimes annotated with subscript 1 to distinguish between children and parents. AdditionalInformation SynthesizedAttributes aresuch attributesthat depend only on theattributevalues of childrennodes. Thus [ E-> E+T{ E.val = E.val+ T.val } ] has asynthesized attribute val corresponding to nodeE. If allthesemantic attributesin an augmented grammar are synthesized,onedepth- firstsearch traversalin any orderis sufficientforthesemantic analysisphase. InheritedAttributes aresuch attributes that depend onparentand/orsibling’ sattributes. Thus [ Ep -> E+T {Ep.val = E.val+ T.val,T.val =Ep.val }], where E& Ep are same production symbols annotated to differentiatebetweenparentand child,has an inherited attributeval corresponding tonodeT. AdvantagesofSyntaxDirectedTranslation: Easeofimplementation: SDT is a simple and easy-to-implement method fortranslating a programming language. It providesa clear and structured way tospecify translation rules using grammar rules. Separationofconcerns: SDT separates the translationprocess from the parsing process, making it easiertomodify and maintainthecompiler. It alsoseparates the translation concerns fromtheparsing concerns,allowing for moremodular and extensiblecompiler designs. Efficientcodegeneration: SDTenables thegeneration of efficientcodeby optimizing the translation process. Itallows for theuseof techniques such as intermediate code generation and code optimization. DisadvantagesofSyntaxDirectedTranslation: Limitedexpressiveness: SDT haslimited expressiveness in comparison toother translation methods, such as attributegrammars. This limits thetypes of translations that can beperformed using SDT. Inflexibility: SDT can be inflexible insituationswhere the translationrules arecomplex and cannotbeeasily expressed using grammarrules. Limitederrorrecovery: SDT is limited in its ability torecover from errors during the translation process. This can result in poorerror messages and may make it difficult to locate and fixerrors in the inputprogram. IntermediateCodeGenerationinCompilerDesign In theanalysis-synthesis modelof acompiler,thefrontend of a compiler translates a sourceprograminto an independent intermediatecode,thentheback end of thecompiler uses this intermediatecodetogeneratethetarget code (which can be understood by the machine). The benefits of using machine-independent intermediatecodeare: Becauseof the machine-independent intermediatecode,portabilitywillbe enhanced. For example,suppose,if a compilertranslatesthesourcelanguageto its targetmachinelanguagewithouthaving the option forgenerating intermediate code,then foreach new machine,a full nativecompiler is required. Because, obviously, there weresomemodifications inthecompiler itself according tothe machinespecifications. Retargeting isfacilitated. It iseasier to apply sourcecode modification toimprovetheperformanceof sourcecode by optimizing theintermediatecode. WhatisIntermediateCodeGeneration? IntermediateCodeGeneration is a stagein theprocessof compiling aprogram,wherethe compiler translates the source codeintoan intermediate representation. This representation is notmachinecodebut is simplerthan the originalhigh-level code. Here’ s how itworks: Translation: Thecompiler takes thehigh-levelcode(likeCor Java) and converts it into an intermediateform, which can beeasier to analyzeand manipulate. Portability:This intermediate codecan often run on different types of machines withoutneeding majorchanges,making it moreversatile. Optimization: Beforeturning it intomachine code, the compilercanoptimizethis intermediatecode to make the final programrun faster or use less memory. If wegeneratemachine codedirectlyfromsourcecodethen for n target machine wewill have optimizers and n codegenerator but if wewillhavea machine-independent intermediatecode,wewill have only one optimizer. Intermediatecodecanbeeither language-specific (e.g.,BytecodeforJava) or language. independent (three-address code). Thefollowing are commonly used intermediate coderepresentations: PostfixNotation Alsoknownas reversePolish notation or suffixnotation. In theinfixnotation,theoperator is placed between operands,e.g., Postfix notation positions theoperator attheright end, as in. For any postfix expressions and with a binary operator applying the operator yields Postfixnotation eliminatestheneed for parentheses,as theoperator’ s position and arity allow unambiguous expression decoding. In postfixnotation,theoperator consistently follows theoperand. Example1: The postfix representation of the expression(a+ b) * c is : ab +c * Example2: The postfix representation of the expression(a– b) * (c + d) +(a– b) is : ab – cd +*ab -+ Read more: InfixtoPostfix Three-AddressCode Athreeaddress statement involves a maximum of three references, consisting of two for operands and onefortheresult. Asequenceof threeaddress statements collectively forms a three address code. Thetypicalformof athree addressstatement isexpressed as ,where , and representmemory addresses. Each variable in a three address statementis associated with a specific memory location. Whilea standard threeaddress statementincludes threereferences,thereareinstances wherea statement may contain fewer than threereferences,yetit is stillcategorized as a threeaddress statement. Example: Thethreeaddress codefor theexpression a +b * c + d : T1 = b * c T2 = a+ T1 T3 = T2 + d; T1 ,T2 ,T3 are temporary variables. Thereare3 ways torepresenta Three-Address Code in compiler design: i) Quadruples ii) Triples iii) Indirect Triples Read more: Three-address code SyntaxTree Asyntaxtreeserves as a condensed representation of a parsetree. Theoperator and keyword nodes present in theparse treeundergoa relocation processtobecomepart of their respective parent nodes in the syntax tree. the internal nodes areoperators and child nodes are operands. Creating a syntaxtree involves strategically placing parentheses withinthe expression. Thistechniquecontributes toa moreintuitiverepresentation,making iteasier to discern thesequencein which operands should beprocessed. Thesyntaxtreenotonly condenses the parsetree but alsooffers an improved visual representation of theprogram’ s syntactic structure, Example: x= (a +b * c) / (a – b * c) AdvantagesofIntermediateCodeGeneration Easier toImplement: Intermediate code generation can simplify thecode generation process byreducing the complexity of theinput code,making it easier toimplement. FacilitatesCodeOptimization: Intermediate code generation can enabletheuse of various codeoptimization techniques,leading toimproved performanceand efficiency of thegenerated code. PlatformIndependence: Intermediate codeis platform-independent,meaning that it can betranslated intomachinecodeorbytecodefor anyplatform. CodeReuse: Intermediatecodecanbereused in the future to generate codefor otherplatforms or languages. Easier Debugging: Intermediate codecan beeasier to debug than machinecode or bytecode, as it isclosertotheoriginal source code. DisadvantagesofIntermediateCodeGeneration IncreasedCompilationTime: Intermediate codegeneration can significantly increase the compilation time, making it less suitable for real-timeortime-critical applications. AdditionalMemoryUsage: Intermediatecode generation requires additional memory to storetheintermediate representation,which can be aconcern for memory-limited systems. IncreasedComplexity: Intermediatecodegeneration can increase the complexity of the compilerdesign,making it harder to implementand maintain. ReducedPerformance: The process of generating intermediatecode can result in code that executes slower than codegenerated directly fromthesourcecode. Assignmentstatements consist of an expression. It involves only integer variables. AbstractTranslationScheme Consider thegrammar,which consists of anassignmentstatement. S → id = E E→ E +E E→ E ∗ E E→ − E E→ (E) E→ id HereTranslation of Ecanhavetwoattributes − 𝐄. 𝐏 𝐋 𝐀 𝐂 𝐄 − It tells aboutthenamethat willhold thevalue of theexpression. 𝐄. 𝐂 𝐎 𝐃 𝐄 − Itrepresents a sequence of threeaddress statements evaluating theexpression Ein grammarrepresents an Assignmentstatement. E. CODE represents the threeaddress codes of thestatement. CODEfornon-terminals on theleft is the concatenation of CODE for eachnon-terminal on theright of Production. AbstractTranslation Scheme Production SemanticAction S → id = E {S. CODE= E. CODE| |id. PLACE|| '=. '||E. PLACE} {T = newtemp( ); E→ E(1) +E(2) E. PLACE= T; E. CODE= E(1). CODE| |E(2). CODE|| E. PLACE Production SemanticAction | | '=' | |E(1). PLACE || '+' | |E(2). PLACE} {T = newtemp( ); E. PLACE= T; E→ E(1) ∗ E(2) E. CODE= E(1). CODE| |E(2). CODE || E. PLACE| | '=' | |E(1). PLACE | | '*'| |E(2). PLACE} {T = newtemp( ); E. PLACE= T; E→ − E(1) E. CODE= E(1). CODE | |E. PLACE || '=− ' ||E(1). PLACE } {E. PLACE= E(1). PLACE; E→ (E(1)) E. CODE= E(1). CODE} {E. PLACE= id. PLACE; E→ id E. CODE= null; } In the firstproduction S → id =E, id. PLACE|| '=' || E. PLACEis astring which follows S. CODE= E. CODE. In the secondproduction E → E(1) +E(2), E. PLACE| |'=' | |E(1). PLACE| | '+' || E(2). PLACEis astring which is appended with E. CODE = E(1). CODE||E(2). CODE. In the fifthproduction,i.e.,E→ (E(1)) does nothaveany string which follows E. CODE= E(1). CODE. This is becauseit does not have any operatoron its R.H.Sof the production. Similarly,the sixthproduction alsodoes nothaveany string appended after E. CODE =null. Thesixth production contains null becausethereis noexpression appears on R.H.S of production. So,noCODE attribute will exist as noexpression exists,becauseCODE represents a sequence of Three Address Statements evaluating the expression. It consists of id in itsR.H.S,which is the terminal symbol but not anexpression. We can alsousea procedure GEN(Statement) in place of S. CODE& E. CODE,As the GEN procedure will automatically generatethreeaddress statements. So, the GEN statementwillreplace the CODEStatements. GENStatementsreplacingCODEdefinitions Production SemanticAction S → id = E GEN(id. PLACE= E. PLACE) E→ E(1) +E(2) GEN(E. PLACE= E(1). PLACE+ E(2). PLACE E→ E(1) ∗ E(2) GEN(E. PLACE= E(1). PLACE∗ E(2). PLACE E→ − E(1) GEN(E. PLACE= − E(1). PLACE) E→ (E(1)) None E→ id None Controlstatements are the statements thatchangetheflowof execution of statements. Consider theGrammar S → if E then S1 |if EthenS1 elseS2 |while Edo S1 In this grammar, Eis the Boolean expression depending upon which S1 or S2 will be executed. Following representation shows the orderof executionof an instruction of if-then, ifthen-else,& while do. 𝐒 →𝐢 𝐟 𝐄 𝐭 𝐡 𝐞 𝐧 𝐒 𝟏 E.CODE& S.CODEarea sequenceof statements which generatethreeaddress code. E.TRUE is thelabel to which control flow if E istrue. E.FALSE is thelabeltowhich control flow if E is false. ThecodeforEgeneratesa jump to E.TRUEif Eis true and a jump toS.NEXT if Eis false. ∴ E.FALSE=S.NEXT inthefollowing table. In thefollowing table,a new labelis allocated to E.TRUE. WhenS1.CODEwillbeexecuted,and thecontrolwillbejumped tostatement following S, i.e.,toS1.NEXT. ∴ S1. NEXT = S. NEXT. SyntaxDirectedTranslationfor "IfEthenS1." Production SemanticRule E. TRUE= newlabel; E. FALSE= S. NEXT; 𝐒 →𝐢 𝐟 𝐄 𝐭 𝐡 𝐞 𝐧 𝐒 𝟏 S1. NEXT = S. NEXT; S. CODE= E. CODE| | GEN(E. TRUE'− ') || S1. CODE 𝐒 →𝐈 𝐟 𝐄 𝐭 𝐡 𝐞 𝐧 𝐒 𝟏 𝐞 𝐥 𝐬 𝐞 𝐒 𝟐 If E is true,control willgotoE.TRUE,i.e.,S1.CODEwill be executed and afterthatS.NEXT appears after S1.CODE. If E.CODEwillbefalse, then S2.CODE willbeexecuted. Initially,both E.TRUE& E.FALSE are taken asnew labels. Hen S1.CODEat label E.TRUE is executed,control will jump toS.NEXT. Therefore,afterS1,controlwill jump tothenext statementof completestatementS. S1.NEXT=S.NEXT Similarly,afterS2.CODE, the nextstatement of S will be executed. ∴ S2.NEXT=S.NEXT SyntaxDirectedTranslationfor "IfEthenS1elseS2." Production SemanticRule E. TRUE= newlabel; E. FALSE= newlabel; S1. NEXT = S. NEXT; S2. NEXT = S. NEXT; 𝐒 →𝐢 𝐟 𝐄 𝐭 𝐡 𝐞 𝐧 𝐒 𝟏 𝐞 𝐥 𝐬 𝐞 𝐒 𝟐 S. CODE= E. CODE| |GEN (E. TRUE '− ') | | S1. CODE GEN(gotoS. NEXT) || GEN(E. FALSE− ) | |S2. CODE 𝐒 →𝐰 𝐡 𝐢 𝐥 𝐞 𝐄 𝐝 𝐨 𝐒 𝟏 Anotherimportant control statementis while Edo S1,i.e.,statement S1 will be executed till ExpressionE istrue. Control willarriveoutof the loop as the expression E will become false. ALabelS. BEGIN iscreated whichpoints to the first instruction forE. Label E. TRUEis attached with the first instruction forS1. If Eis true,controlwill jump tothelabel E. TRUE& S1. CODEwill be executed. If E isfalse,controlwilljump toE. FALSE. After S1. CODE, again controlwilljump toS. BEGIN,which will again check E. CODEfortrueorfalse. ∴ S1. NEXT = S. BEGIN If E. CODEis false,control will jump toE. FALSE,which causes thenext statementafterS to be executed. ∴ E. FALSE= S. NEXT SyntaxDirectedTranslationfor " 𝐒 → 𝐰 𝐡 𝐢 𝐥 𝐞 𝐄 𝐝 𝐨 𝐒 𝟏 " Production SemanticRule S. BEGIN = newlabel; E. TRUE= newlabel; E. FALSE= S. NEXT; S1. NEXT = S. BEGIN; 𝐒 →𝐰 𝐡 𝐢 𝐥 𝐞 𝐄 𝐝 𝐨 𝐒 𝟏 S. CODE = GEN(S. BEGIN '− ') | | E. CODE| |GEN(E. TRUE '− ')| | S1. CODE| |GEN('goto' S. BEGIN PostfixTranslationSchemes: Thesyntax-directed translation whichhas its semantic actions attheend of the production is called the postfixtranslationscheme. This typeof translation of SDT has its corresponding semantics atthelast in the RHS of theproduction. SDTs which contain thesemantic actions at the right ends of theproduction are called postfixSDTs. ExampleofPostfixSDT S ⇢ A#B{S.val= A.val * B.val} A⇢B@1{A.val = B.val+1} B ⇢num{B.val= num.lexval} Parser-StackImplementationofPostfixSDTs: PostfixSDTsareimplemented when thesemantic actions areat the right end of the production and withthebottom-up parser(LRparser or shift-reduce parser) with the non-terminalshaving synthesized attributes. Theparserstack containstherecord for the non-terminals in the grammar and their corresponding attributes. Thenon-terminal symbols of theproductionarepushed ontotheparserstack. If theattributesaresynthesized and semantic actions areattheright ends then attributes of the non-terminals areevaluated forthesymbol in thetop of thestack. Whenthereduction occurs at the top of the stack,the attributes areavailable in the stack,and aftertheaction occurstheseattributes arereplaced by the corresponding LHS non-terminal and its attribute. Now,theLHS non-terminal and its attributesareat thetop of thestack. Production A⇢ BC{A.str = B.str. C.str} B ⇢a {B.str =a} C⇢b{C.str= b} Initially,theparserstack: B C Non-terminals B.str C.str Synthesized attributes ⇡ TopofStack Afterthereduction occurs A⇢BCthen afterB,Cand theirattributes arereplaced by Aand in the attribute. Now,thestack: A Non-terminals A.str Synthesized attributes ⇡ Topofstack SDTwithactioninsidetheproduction: Whenthesemantic actionsarepresent anywhereontheright sideof theproduction then it is SDTwithactioninsidetheproduction. It isevaluated and actions areperformed immediately after the left non-terminal is processed. This typeof SDT includes both S-attributed and L-attributed SDTs. If theSDTis parsed ina bottom-up parserthen, actions are performed immediately after theoccurrenceof a non-terminal atthetop of theparserstack. If theSDTis parsed ina top-down parser then,actions arebefore the expansion of the non-terminalorif theterminal checks forinput. ExampleofSDTwithactioninsidetheproduction S ⇢ A+{print'+'}B A⇢ {print'num'}B B ⇢ num{print'num'} EliminatingLeftRecursionfromSDT: Thegrammarwith leftrecursioncannot be parsed by the top-down parser. So,left recursion should be eliminated and the grammarcan be transformed by eliminating it. GrammarwithLeftRecursion Grammaraftereliminatingleftrecursion P ⇢ Pr| q P ⇢ qA A ⇢ rA |∈ SDTforL-attributedDefinitions: SDT with L-attributed definitions involves bothsynthesized and inherited attributes in the production. Toconvertan L-attributed definition into its equivalent SDT follow theunderlying rules: Whentheattributes are inherited attributes of any non-terminal then place the action immediately beforethenon-terminal in theproduction. Whentheattributes of thenon-terminalaresynthesized then,placetheaction at therightend of that production. 6.MoreaboutTranslation ArrayReferencesinArithmeticExpressions Converts references like A[i]+B[j]A[i] +B[j]A[i]+B[j] intoIR. Example: t1 = i* size_of_element t2 = A+t1 t3 = *t2 ProcedureCalls Translates function calls,parameter passing,and return statements. Example: param x param y call foo,2 Declarations Handles variable declarations, types,and initializations. Example: intx,y; x= 5; CaseStatements Translates switchswitchswitch-caseconstructsintojump tables or conditional jumps. Examplein TAC: switch(x): case 1: goto L1 case 2: goto L2 default: gotoL3

Syntax Directed Translation in Compiler Design PDF

Document Details

Tags

Related

Summary

Full Transcript

Upgrade to continue