Compiler Construction Slides PDF

Compiler Construction Mayer Goldberg \ Ben-Gurion University 2025-02-14 Mayer Goldberg \ Ben-Gurion University Compiler Construction 1 / 1309 Chapter 1 Goals ▶ Establishing common language & vocabulary ▶ Understanding the “big picture” Agenda ▶ Some background in programming languages ▶ Abstraction ▶ Dynamic vs Static ▶ Functional vs Imperative languages ▶ Introduction to compiler construction ▶ Introduction to the ocaml programming language Mayer Goldberg \ Ben-Gurion University Compiler Construction 2 / 1309 Abstraction ▶ Abstraction is a way of moving from a particular to the general ▶ Abstraction appears mathematics, logic, and in computer science ▶ Abstraction is a force-multiplier, and a great time-saver Mayer Goldberg \ Ben-Gurion University Compiler Construction 3 / 1309 Abstraction (continued) ▶ Abstraction in logic: going from propositions to quantified propositions. For example: Hx x is hungry Ex x goes to eat ▶ Specific: Hx0 → Ex0 means that if [the specific] x0 is hungry, then [the specific] x0 goes to eat ▶ General: ∃x(Hx → Ex) ∀x(Hx → Ex) Mayer Goldberg \ Ben-Gurion University Compiler Construction 4 / 1309 Abstraction (continued) ▶ Abstraction in mathematics: going from expression to function ▶ Consider the expression x · sin2 (1 + x) ▶ This expression denotes a number; not a function ▶ Writing x · sin2 (1 + x) ′ is actually an abuse of notation, because only functions have derivatives ▶ When we write x · sin2 (1 + x) ′ what we intend is the derivative of a function, the argument of which is x, and the body of which is x · sin2 (1 + x): ▶ f(x) = x · sin2 (1 + x) ▶ f ′ (x) = sin2 (1 + x) + 2x · sin(1 + x) · cos(1 + x) Mayer Goldberg \ Ben-Gurion University Compiler Construction 5 / 1309 Abstraction (continued) ▶ Abstraction in mathematics: going from expression to function ▶ The only thing “wrong” here is that we gave the function a name — f ▶ This is “wrong” because the choice of the name is arbitrary ▶ Functions can be written anonymously using λ-notation: ▶ The symbol λ (“lambda”) just means “anonymous function” ▶ In theory (the λ-calculus): λx.(x · sin2 (1 + x)) ▶ In programming (Scheme): (lambda (x) (* x (square (sin (+ 1 x))))) ▶ The expression x · sin2 (1 + x) is concrete, and for a specific x ▶ The expression λx.(x · sin2 (1 + x)) is an abstraction: We say that it abstracts the variable x over the concrete expression Mayer Goldberg \ Ben-Gurion University Compiler Construction 6 / 1309 Abstraction (continued) ▶ Abstraction in programming ▶ Functional programming: Similar to abstraction mathematics ▶ Expressions are abstracted into functions ▶ Functions are abstracted into higher order functions ▶ Collections of functions are abstracted into modules ▶ Modules are abstracted into functors ▶ Modules are abstracted into signatures ▶ Procedural programming ▶ Statements are abstracted into procedures Mayer Goldberg \ Ben-Gurion University Compiler Construction 7 / 1309 Abstraction (continued) ▶ Abstraction in programming (continued) ▶ Object-oriented programming ▶ Objects are abstracted into classes ▶ Classes are abstracted into generics & templates ▶ Classes are abstracted into packages ▶ Logic programming ▶ Similar to abstraction in logic ▶ Propositions are abstracted into predicates ▶ Textual abstraction ▶ Text is abstracted into templates Mayer Goldberg \ Ben-Gurion University Compiler Construction 8 / 1309 What textual abstraction looks like Concrete Abstract September 12, 2001 $(DATE) Dear John: Dear $(FIRST-NAME): I would like to interest I would like to you in our insurance interest you in our policies. With a wife insurance policies. With and three children to $(DEPENDENTS) to look look after, I am sure you after, I am sure you desire the extra sense of desire the extra sense of security afforded by the security afforded by the extended coverage of our extended coverage of our life-insurance policies. life-insurance policies....... Mayer Goldberg \ Ben-Gurion University Compiler Construction 9 / 1309 What textual abstraction looks like (cont) ▶ Notice the text variables that are embedded within the template: $(VARIABLE_1), $(VARIABLE_2), $(VARIABLE_3) ▶ All junk mail, whether printed or electronic, is generated in this way ▶ In word-processing, this functionality is known as mail merge: You are merging templates and tables from a database or a spreadsheet… Mayer Goldberg \ Ben-Gurion University Compiler Construction 10 / 1309 Further Reading 🕮 The Structure and Interpretation of Computer Programs (SICP), chapters 1.3, 2.1, 2.3.1, 2.3.2 🔗 The M4 textual macro language 🔗 Paul Graham’s article Beating the Averages Mayer Goldberg \ Ben-Gurion University Compiler Construction 11 / 1309 Dynamic vs Static ▶ Dynamic & Static are used in many areas of computer science ▶ In programming languages theory, these terms have a very specific meaning: Dynamic To say that something is dynamic, means that it can be —  known   evaluated    computed    performed no sooner than run-time determined    decided   has its address known    etc. Mayer Goldberg \ Ben-Gurion University Compiler Construction 12 / 1309 Dynamic vs Static (continued) Static (aka Lexical) To say that something is static, means that it can be —  known     compile-time evaluated        parsing-time computed        analysis-time performed before run-time, or at assembly-time determined      linking-time decided        loading-time has its address known      etc. etc. Mayer Goldberg \ Ben-Gurion University Compiler Construction 13 / 1309 Dynamic vs Static (continued) From a type-theoretic point of view, the distinction of dynamic vs static is called the phase distinction: ▶ Programming languages where types exist at compile-time only are said to observe the phase distinction. 🤔 This means that: ☞ In languages that observe the phase distinction: ▶ Variables have types ▶ Data have no type ☞ In languages that fail to observe the phase distinction: ▶ Variables have no type ▶ Data have types Mayer Goldberg \ Ben-Gurion University Compiler Construction 14 / 1309 Dynamic vs Static (continued) What is meant by compile-time type-information? ▶ Types are annotations that exist during compile-time ▶ Variables, expressions, sub-expressions are all assigned types ▶ Type checking is performed at compile-time, verifying that data is used according to its type ▶ Example: The function f : τ → σ can be applied to an argument of type τ , and returns a result of type σ ▶ Types are not represented during run-time ▶ Example: A 64-bit integer is represented in two’s complement, with bit #63 denoting the sign Mayer Goldberg \ Ben-Gurion University Compiler Construction 15 / 1309 Dynamic vs Static (continued) What is meant by run-time type-information (RTTI)? ▶ Much less type-information is available at compile-time ▶ Much less type-checking can be done at compile-time ▶ Example: The function (lambda (x) x) can be applied to anything, including itself. In general, nothing is known at compile-time about the type of x ▶ The type is a part of the binary encoding of the data at run-time ▶ Example: In an integer in Scheme is represented using a n-bits, some of those bits encode the type Mayer Goldberg \ Ben-Gurion University Compiler Construction 16 / 1309 Dynamic vs Static (continued) What is meant by run-time type-information (RTTI)? ▶ Example: In our compiler we encode the RTTI as follows: % define T_void 0 % define T_nil 1 % define T_char 2 % define T_string 3 % define T_closure 4 % define T_undefined 5 % define T_boolean 8 % define T_boolean_false ( T_boolean | 1) % define T_boolean_true ( T_boolean | 2) % define T_number 16 % define T_integer ( T_number | 1) % define T_fraction ( T_number | 2) % define T_real ( T_number | 3)... Mayer Goldberg \ Ben-Gurion University Compiler Construction 17 / 1309 Dynamic vs Static (continued) Open Question Can you explain the motivation/rationale behind the way we encode Booleans and numbers in our compiler? % define T_boolean 8 % define T_boolean_false ( T_boolean | 1) % define T_boolean_true ( T_boolean | 2) % define T_number 16 % define T_integer ( T_number | 1) % define T_fraction ( T_number | 2) % define T_real ( T_number | 3) Mayer Goldberg \ Ben-Gurion University Compiler Construction 18 / 1309 Dynamic vs Static (continued) Question Pick the feature that is strictly dynamic: 👎 The address of a variable in Java (within the JVM) 👎 Whether a C function is called with the syntactically-correct number of arguments 👎 Whether a variable is initialized in Java 👎 How much memory is taken up by an object (as in C++ or Java) 👍 Whether an array reference in C is within bounds Mayer Goldberg \ Ben-Gurion University Compiler Construction 19 / 1309 Dynamic vs Static (continued) Question Pick the feature that is strictly static: 👎 Whether an expression in the source code is positive 👎 Whether a function shall be called 👎 How many times a loop shall be performed 👎 Whether an array de-reference is valid 👍 Whether all parentheses are balanced. Mayer Goldberg \ Ben-Gurion University Compiler Construction 20 / 1309 Dynamic vs Static (continued) Question Pick the feature that is strictly static: 👎 Whether a while-loop terminates in C 👎 Whether an address on the stack is a frame pointer 👎 Whether a for-loop iterates more than n times in C 👎 Whether the value of an expression is greater than M 👍 The type of a variable in C Mayer Goldberg \ Ben-Gurion University Compiler Construction 21 / 1309 Dynamic vs Static (continued) Question Pick the feature that is strictly dynamic: 👎 What exceptions appear in the code 👎 The type of a method 👎 Whether all variables have been initialized before use 👎 Whether a function-call is in tail-position 👍 Whether a program de-allocates all dynamically-allocated resources Mayer Goldberg \ Ben-Gurion University Compiler Construction 22 / 1309 Dynamic vs Static (continued) Question Pick the correct statement: 👎 The terms static and dynamic only describe the way programming languages handle types 👎 There is no feature that is checked dynamically that could have been checked statically 👎 It is implossible to check/enforce/validate dymanic behavior/features 👎 No feature is both static and dynamic 👍 The success of opening a file cannot be determinted statically Mayer Goldberg \ Ben-Gurion University Compiler Construction 23 / 1309 Further Reading 🕮 The Dragon Book (2nd edition), page 25 (subsection 1.6.1) 🔗 Phase distinction in type theory (paper) 🔗 Phase distinction in the Wikipedia Mayer Goldberg \ Ben-Gurion University Compiler Construction 24 / 1309 Imperative vs functional languages ▶ You’ve already been exposed to functional programming in your PPL course ▶ Functional languages are often described as languages lacking in side effects ▶ Imperative languages have side effects Defining a language in terms of what it is not does not teach us much. ▶ What can we say positively about functional programming? Mayer Goldberg \ Ben-Gurion University Compiler Construction 25 / 1309 Imperative vs functional languages (continued) ▶ Computer science is the illegitimate child of mathematics and electrical engineering 👶 ☞ Electrical engineering gave us the hardware, the actual machine ☞ Nearly all ideas come from mathematics 🧠 Computers ▶ Digital electronics: Gates, flip-flops, latches, memory, etc ▶ Boolean functions that read their inputs from memory & write their outputs to memory ▶ Reading & writing are synchronized using a clock (with a frequency measured in Hz, KHz, MHz, GHz…) ▶ A finite-state machine Mayer Goldberg \ Ben-Gurion University Compiler Construction 26 / 1309 Imperative vs functional languages (continued) ▶ We cannot design large software-systems while thinking at the level of digital electronics ▶ We need some theoretical foundations for programming & computation Mayer Goldberg \ Ben-Gurion University Compiler Construction 27 / 1309 Imperative vs functional languages (continued) What is mathematics? ▶ A language ▶ A set of ideas, notions, definitions, techniques, results, all expressed in this language What computer science takes from mathematics? ▶ Programming is based on computable mathematics ▶ Theoretical computer science uses all of mathematics ☞ In short: Everything! Mayer Goldberg \ Ben-Gurion University Compiler Construction 28 / 1309 Imperative vs functional languages (continued) What programming languages take from mathematics? ▶ A language ▶ Notions such as functions, recursion, iteration, operators, variables, expressions, evaluation, numbers, types, sets, relations, ordered n-tuples, structures, graphs, etc ▶ Operations such as arithmetic, Boolean, structural (e.g., on n-tuples, sets, etc), abstraction, mapping, folding, etc. Nearly all of the ideas in computer science come from mathematics! Mayer Goldberg \ Ben-Gurion University Compiler Construction 29 / 1309 Imperative vs functional languages (continued) Mathematical objects are free of the limitations that beset computers: Real-world compromises ▶ Computers can only approximate real numbers ▶ Computers cannot implement infinite tape (Turing machines) ▶ Mathematical objects are cheaper than objects created on a physical computer: ▶ Functions are mappings; They take no time to compute or evaluate! ▶ Knowing that an object exists is often all we need! 🤯 We often don’t need to know its value ▶ 🧮 We often don’t need to know how to compute it ▶ Bad things cannot happen: ▶ No exceptions, errors, incorrect results ▶ Nothing is lost, nothing is “too big” or “too much” Mayer Goldberg \ Ben-Gurion University Compiler Construction 30 / 1309 Imperative vs functional languages (continued) Functional programming languages ▶ Closer to mathematics ▶ Easier to reason about ▶ Easier to transform ▶ Easier to generate automatically Imperative programming languages ▶ Farther from mathematics ▶ Harder to reason about ▶ Harder to transform ▶ Harder to generate automatically Mayer Goldberg \ Ben-Gurion University Compiler Construction 31 / 1309 Imperative vs functional languages (continued) Example of non-mathematical ideas: Side effects Imagine having to teach C programming to a logician ▶ To simplify matters, let’s pretend there is a string type in C (rather than char *) ▶ You teach them a simplified version of printf: ▶ Only takes a single string argument ▶ Returns an int: the number of characters in the string 🤔 Remember: stdio.h defines the prototype for printf to return int (the number of characters printed) ▶ Roughly: printf : string -> int Mayer Goldberg \ Ben-Gurion University Compiler Construction 32 / 1309 Imperative vs functional languages (continued) Example of non-mathematical ideas: Side effects ▶ But the logician objects: He already knows of a function from string -> int that returns the number of characters in the string ☞ strlen : string -> int 🤔 He wants to know the difference between printf and strlen Mayer Goldberg \ Ben-Gurion University Compiler Construction 33 / 1309 Imperative vs functional languages (continued) Side-effects: The Dialogue ▶ You: “Simple, printf also prints the string to the screen!” ▶ Logician: “What does it mean to print??” ▶ You: “Seriously?? The printf function prints its argument to the screen & also returns the number of characters it printed!” ▶ Logician: “But you said the domain of printf is string -> int, right?” ▶ You: “Yes, so?” ▶ Logician: “Then where’s the screen??” ▶ You: “In front of you!” ▶ Logician: “Where’s the screen in the domain of the function printf!” Mayer Goldberg \ Ben-Gurion University Compiler Construction 34 / 1309 Imperative vs functional languages (continued) Side-effects: The Dialogue (continued) ▶ You: “It isn’t in the domain. You can think of the screen as a global variable.” ▶ Logician: “I have no idea what you mean: How can the screen be a variable when it’s not passed as a parameter, and its type is not expressed in the domain of the function??” ▶ You: “But that’s the whole point of this printing being a side effect: It is not expressed in the type!” ▶ Logician: “Well, then printf isn’t a function!” ▶ You: “Ummm…” 🤯 Logician (having a Eureka!-moment): “I get it now! You got the domain of printf all wrong!” Mayer Goldberg \ Ben-Gurion University Compiler Construction 35 / 1309 Imperative vs functional languages (continued) Side-effects from a logical point of view ▶ The real type of printf is string × screen → int × screen ▶ The underlined parts of the type are implicit, i.e., they are not written explicitly in the original type given for printf ▶ The implicit parts of the type form the environment ▶ The function-call mentions only the explicit arguments ▶ Leaving out the implicit arguments in the function call creates an illusion of change, as if the environment has changed: An illusory notion of time has been created; We have the environment before the function call, and the environment after the function call ▶ In fact, nothing has changed: The screen in the domain has been mapped into another screen in the range. Mayer Goldberg \ Ben-Gurion University Compiler Construction 36 / 1309 Imperative vs functional languages (continued) Side-effects from a logical point of view (continued) ▶ Introducing side-effects introduces discrete time ▶ Having introduced time, we must now introduce sequencing: { printf("Hello "); printf("world!\n"); } Mayer Goldberg \ Ben-Gurion University Compiler Construction 37 / 1309 Imperative vs functional languages (continued) Side-effects from a logical point of view (continued) The notion of sequencing, like the notion of time, is illusory: ▶ The screen object in the range of printf("Hello "); is the screen object in the domain of printf("world!\n"); ▶ So the two printf expressions are nested, and this is why their “ordering” matters! 🤔 For the very same reason that f(g(x)) generally does not the same as g(f(x))… Mayer Goldberg \ Ben-Gurion University Compiler Construction 38 / 1309 Imperative vs functional languages (continued) Functional Rendition h"Hello ", screeni =⇒ h6, screen ⊕ "Hello "i Imperative C h"World!\n", screen ⊕ "Hello "i =⇒ { h7, screen ⊕ "Hello World!\n"i printf (" Hello "); printf (" World !\n"); } ▶ The return value of the second call to printf ▶ The screen after the second call to printf Mayer Goldberg \ Ben-Gurion University Compiler Construction 39 / 1309 Imperative vs functional languages (continued) Functional programming languages ▶ Closer to the mathematical notions of function, variable, evaluation ▶ Explicit about types ▶ Does not include notions that are not native to mathematics, such as time, side-effect, environment, etc ▶ Offers many other advantages Mayer Goldberg \ Ben-Gurion University Compiler Construction 40 / 1309 Imperative vs functional languages (continued) Imperative programming languages ▶ Farther away from the mathematical notions such as function, variable, evaluation ▶ Hides information through the use of implicit arguments (for convenience) ▶ Harder to reason about: Contains notions such as side effects, sequences, environment, etc., that require translation before we can reason about them ▶ Abstraction is harder, prone to errors ▶ Side-effects create implicit, knotty inter-dependencies between various parts of a software system, making it harder to debug Mayer Goldberg \ Ben-Gurion University Compiler Construction 41 / 1309 Imperative vs functional languages (continued) Abstraction in functional programming languages ▶ Values ⇒ Expressions ⇒ Functions ▶ Higher-order functions ▶ Mathematical operators: mapping, folding, filtering, partitioning, etc ▶ The interpreter evaluates expressions Abstraction in imperative programming languages ▶ State ⇒ Change ⇒ Commands ⇒ Procedures ▶ Object-oriented programming ▶ Imperative ≡ Based upon commands (imperare means to command in Latin) ▶ The interpreter performs commands Mayer Goldberg \ Ben-Gurion University Compiler Construction 42 / 1309 Programming paradigms: A reality check ▶ There are very few strictly functional languages, i.e., languages in which side-effects cannot be expressed ▶ Most functional languages are really quasi-functional: They don’t make it impossible to use side-effects, but they don’t make these easy or fun to use ▶ Most new imperative languages do include features from functional languages ▶ anonymous functions (lambda) ▶ higher-order functions ▶ modules/namespaces/functors Mayer Goldberg \ Ben-Gurion University Compiler Construction 43 / 1309 Imperative vs functional languages (continued) Question Languages based on the functional paradigm — 👎 do not allow us to perform side-effects 👎 do not support object-oriented programming 👎 are not suited to model and solve “real” problems 👎 are slow and ineﬀicient 👍 support high order functions Mayer Goldberg \ Ben-Gurion University Compiler Construction 44 / 1309 Further Reading 🔗 Comparing programming paradigms 🔗 What is functional programming (blog post) Mayer Goldberg \ Ben-Gurion University Compiler Construction 45 / 1309 Introduction to compiler construction ▶ L1 : Language ▶ L2 : Language ▶ PL : A program in language L ▶ Values: A set of values ▶ J·K: Semantic brackets Interpreters evaluate/perform The functional picture (evaluate) ▶ An interpreter is a function IntL : L → Values ▶ Interpreters map expressions to their values ▶ For example: In a functional subset of Scheme, we have IntfuncScheme J(+ 3 5)K = 8 Mayer Goldberg \ Ben-Gurion University Compiler Construction 46 / 1309 Introduction to compiler construction Interpreters evaluate/perform The imperative picture (perform) ▶ An interpreter is a function IntL : L × Environment → Values × Environment ▶ Interpreters map the product of an expression and an environment to the product of a value and an environment ▶ The environments are implicit in the imperative language ▶ When the environment in the domain is not equal to the environment in the range, an illusion of change has been created: “Something changed in the environment” ▶ For example: In the full, imperative Scheme, we have IntScheme Jh(define x 3), {x 7→ undefined}iK = h#hvoidi, {x 7→ 3}i Mayer Goldberg \ Ben-Gurion University Compiler Construction 47 / 1309 Introduction to compiler construction Compilers translate ▶ A compiler is a function CompLL21 : L1 → L2 ▶ Compilers translate programs from one language to another ▶ Let PL1 ∈ L1 , then CompLL21 JPL1 K ∈ L2 ▶ The correctness of the translation is established using interpreters for both the source and target language: IntL1 JPL1 K = IntL2 JCompLL21 JPL1 KK Mayer Goldberg \ Ben-Gurion University Compiler Construction 48 / 1309 Introduction to compiler construction Compilers translate (continued) ▶ We may chain any number of compilers together: IntL1 JPL1 K = IntL2 JCompLL21 JPL1 KK = IntL3 JCompLL32 JCompLL21 JPL1 KKK = IntL4 JCompLL43 JCompLL32 JCompLL21 JPL1 KKKK =... etc. Mayer Goldberg \ Ben-Gurion University Compiler Construction 49 / 1309 Introduction to compiler construction Question Compiled code is generally faster than interpreted code. So why do we need interpreters? Why not stick with compiled code? 👎 It’s easier to program in interpreted languages 👎 It’s easier to debug interpreted code 👎 Interpreters are more flexible than compilers 👎 The difference is pretty much a matter of personal taste 👍 Interpreters are the only way to reach values Mayer Goldberg \ Ben-Gurion University Compiler Construction 50 / 1309 Introduction to compiler construction Question When we program directly in machine language, where’s the interpreter? 👎 The operating system is the interpreter 👎 The shell is the interpreter 👎 There is no interpreter 👎 The process of translating code to machine language is the underlying interpretation 👍 The microprocessor is a hardware implementation of an interpreter for the given micro-architecture Mayer Goldberg \ Ben-Gurion University Compiler Construction 51 / 1309 Introduction to compiler construction Question If interpreters are a logical necessity, why do we need a compiler? 👎 For generality 👎 For ease 👎 For flexibility 👎 For portability 👍 For optimizations Mayer Goldberg \ Ben-Gurion University Compiler Construction 52 / 1309 Introduction to compiler construction Another way of looking at the question A compiler from language L to language L (itself!) is… …just an optimizer! Mayer Goldberg \ Ben-Gurion University Compiler Construction 53 / 1309 What is an optimization A compiler optimization is a semantics-preserving transformation that results in more eﬀicient code Semantics-preserving (≡ meaning-preserving) ▶ Returns the same value (as the original code) under interpretation More eﬀicient code ▶ The interpreter takes less resources to evaluate the code. Resources include: ☞ Execution time ▶ Computer memory ▶ Network traﬀic ▶ Microprocessor registers or any other resource consumed by the code Mayer Goldberg \ Ben-Gurion University Compiler Construction 54 / 1309 What is an optimization (continued) Special-purpose compilers may optimize for non-standard “resources”: ▶ Compilers to g-code, a language for CNCs, will try to minimize motion ▶ Compilers of graphs to visual presentations will try to minimize crossovers of edges ▶ Compilers of document-layout languages will try to minimize “widows”, “orphans”, hyphenations, and other typographically problematic conditions Mayer Goldberg \ Ben-Gurion University Compiler Construction 55 / 1309 What is an optimization (continued) Question Why do programmers write less-than-optimized code? What is the rationale for having compilers be in charge of optimizations? Are good programmers really that rare?? ▶ Good code is hard to write ▶ Some programmers are incompetent ▶ Compilers are faster at detecting opportunities for optimizations ▶ Consistent output: Once debugged, compilers make no mistakes All these reasons are true, but irrelevant! 🤔 Can you think of a reason why a programmer might intentionally write less-eﬀicient code? Mayer Goldberg \ Ben-Gurion University Compiler Construction 56 / 1309 What is an optimization (continued) Which code is better: Code I for (i = 0; i < 200; ++i) { A[i] = 0; } for (i = 0; i < 200; ++i) { B[i] = 0; } Code II for (i = 0; i < 200; ++i) { A[i] = 0; B[i] = 0; } Code III #define M 200 #define N 200... for (i = 0; i < M; ++i) { A[i] = 0; } for (i = 0; i < N; ++i) { B[i] = 0; } Mayer Goldberg \ Ben-Gurion University Compiler Construction 57 / 1309 What is an optimization (continued) Analysis ▶ Code I ▶ less general than Code III ▶ less eﬀicient than Code II ▶ less maintainable than Code III ▶ Code II ▶ less general than Code I ▶ more eﬀicient than Code I, Code III ▶ less maintainable than Code I ▶ Code III ▶ more general than Code I, Code II ▶ less eﬀicient than Code II ▶ more maintainable than Code I, Code II Mayer Goldberg \ Ben-Gurion University Compiler Construction 58 / 1309 What is an optimization (continued) So which code is better? FACT: Optimizing the loops in Code III, converting it to Code II, is performed by most compilers today ▶ Clearly Code III is better! Conclusion People write less-than-optimal code because: ▶ It is more general ▶ It is more maintainable ▶ It is more flexible ▶ Most compilers optimize away such ineﬀiciencies Mayer Goldberg \ Ben-Gurion University Compiler Construction 59 / 1309 What is an optimization (continued) How people use compilers ▶ Want to program in a more general, maintainable, flexible way ▶ Compilation is a point-in-time where generality is traded for eﬀiciency: ▶ Compilers are opportunistic ▶ Compilers identify opportunities to trade generality for eﬀiciency ▶ The resulting code is unreadable, unmaintainable, machine-specific, and fast ▶ We [normally] don’t touch the compiled code: For programmers, it is only the quality of the source code that matters 🤯 Without optimizing compilers, we would be forced to write unmaintainable, overly-specific, machine-dependent code in order to obtain eﬀicient code. Mayer Goldberg \ Ben-Gurion University Compiler Construction 60 / 1309 Composing & combining processors, interpreters, compilers, & programs Remember Lego? We can compose ▶ Processors running programs written in language L ▶ Interpreters written in L′ , running programs written in language L ▶ Compilers written in L′ , running on L, compile programs written in L′′ into equivalent programs written in L′′′ (we are talking about 4 languages!) ▶ Programs written in L′ , running on L We have 5 new blocks for you! Mayer Goldberg \ Ben-Gurion University Compiler Construction 61 / 1309 Composing & combing processors, interpreters, compilers, & programs What it looks like A processor provides ▶ Hardware-implementations of lang interpreters for some language processor ▶ They have only 1 docking point: The languages they provide Mayer Goldberg \ Ben-Gurion University Compiler Construction 62 / 1309 Composing & combing processors, interpreters, compilers, & programs A program What it looks like ▶ Programs have at least 1 docking point: The language they run on machine- ▶ Mach Lang programs are language runs on hand-written directly in hex or program binary. This is hardly ever done these days ▶ Other programs are written in a program source language and compiled to runs lang src some other language using a on compiler. Such programs have an additional docking point Mayer Goldberg \ Ben-Gurion University Compiler Construction 63 / 1309 Composing & combing processors, interpreters, compilers, & programs What it looks like An interpreter ▶ Interpreters come with 3 docking provides lang points: ▶ The language they provide interpreter ▶ The language [interpreter] on runs lang which they run src on ▶ The [source] language in which they were written Mayer Goldberg \ Ben-Gurion University Compiler Construction 64 / 1309 Composing & combing processors, interpreters, compilers, & programs A compiler What it looks like ▶ Compilers come with 4 docking points: compiling program ▶ The language they compile from compiler ▶ The language they compile to ▶ The language in which they were runs lang lang dst src on written ▶ The language [interpreter] on which they run Mayer Goldberg \ Ben-Gurion University Compiler Construction 65 / 1309 Composing & combing processors, interpreters, compilers, & programs Why bother with these pics?? ▶ Interpreters & compilers are often composed in complex ways ▶ Diagrams provide a simple, visual way to make sure that the compositions are correct Mayer Goldberg \ Ben-Gurion University Compiler Construction 66 / 1309 Composing & combing processors, interpreters, compilers, & programs What it looks like A machine language program running machine- runs language ▶ The program must be written in the on program same machine-language interpreted provides lang by the processor processor ▶ The two blocks join naturally Mayer Goldberg \ Ben-Gurion University Compiler Construction 67 / 1309 Composing & combing processors, interpreters, compilers, & programs A compiled program running What it looks like ▶ The program must have been program compiled into the same machine-language interpreted by the runs lang src on processor provides ▶ The two blocks join naturally lang ▶ We are not saying anything about processor the language the program was written in (though we could!) Mayer Goldberg \ Ben-Gurion University Compiler Construction 68 / 1309 Composing & combing processors, interpreters, compilers, & programs An interpreter running a compiled What it looks like program (I) program ▶ Interpreters are similar to processors runs lang src on ▶ They execute/evaluate programs provides lang in a given language ▶ Interpreters are programs too! interpreter ▶ Written in some source language runs lang src on ▶ Compiled into some target provides language lang ▶ They run on an interpreter: processor ▶ a processor (hardware) ▶ a program (another interpreter) Mayer Goldberg \ Ben-Gurion University Compiler Construction 69 / 1309 Composing & combing processors, interpreters, compilers, & programs An interpreter running a compiled program (II) What it looks like ▶ Interpreters can execute/evaluate program runs lang src on programs that were compiled from compiling program another language. Note that the compiler program compiler runs runs lang lang lang dst src src on on ▶ takes the program (on top) as provides lang interpreter source, and runs lang ▶ outputs the program (on the src on provides lang right) as target processor ▶ It is the target program that is executed Mayer Goldberg \ Ben-Gurion University Compiler Construction 70 / 1309 Composing & combing processors, interpreters, compilers, & programs An interpreter running a compiled What it looks like program (III) program ▶ We may add additional details runs lang src on ▶ the processor/interpreter on compiling program compiler program which the compiler executes. runs runs lang lang lang dst src src ▶ We are still missing details on on provides provides lang lang ▶ the compiler that compiled the processor interpreter runs lang interpreter src on ▶ the compiled that compiled the provides lang processor compiler ▶ …this can go on! Mayer Goldberg \ Ben-Gurion University Compiler Construction 71 / 1309 Composing & combing processors, interpreters, compilers, & programs Cross-compilers ▶ The processor on which the compiler runs is different from the one one which the [compiled] program runs ▶ It crosses the boundaries of architectures. ▶ Java compiler javac is an example of a cross-compiler: ▶ It runs on [e.g.,] x86 ▶ It generates Java-byte-code that runs on the JVM ▶ The JVM is an interpreter (java) running on [e.g.,] x86 Mayer Goldberg \ Ben-Gurion University Compiler Construction 72 / 1309 Composing & combing processors, interpreters, compilers, & programs Towers of interpreters What it looks like ▶ Interpreters may be stacked up in a program tower runs lang src on ▶ Towers of interpreters consume provides lang interpreter resources that are exponential to the runs lang src on height of the tower provides lang ▶ Unless there is a marked interpreter slowdown, this is not really a runs lang src on provides tower! lang interpreter ▶ Virtual machines (VMs) can be runs lang src on stacked up as towers of interpreters provides lang ▶ IBM mainframe architecture processor actually does this! Mayer Goldberg \ Ben-Gurion University Compiler Construction 73 / 1309 Composing & combing processors, interpreters, compilers, & programs How are compilers and interpreters made? ▶ Using previously-written compilers and interpreters, of course! ▶ This process is known as bootstrapping, and it is a specific form of composing & combining compilers and interpreters… Mayer Goldberg \ Ben-Gurion University Compiler Construction 74 / 1309 Bootstrapping (I) compiler written in compiled by runs on compiles outputs c1 pascal ibm pascalvs ibm 370 x86 asm x86 exe c2 x86 asm c1 x86 x86 asm x86 exe c3 x86 asm c2 x86 x86 asm x86 exe ▶ c1 is an assembler acting as a cross-compiler ▶ c2 already runs on our PC, but it was created on an IBM mainframe ▶ All the effort of writing an assembler (in Pascal) has to be re-done from scratch in x86 assembly language if we are to detach from the mainframe! ▶ Any updates, upgrades, bug fixes, changes, require that we re-compile c2 on the mainframe! ▶ c3 is essentially c2 compiling itself ▶ With c3 , we are free from our old environment: Pascal on IBM mainframe! Mayer Goldberg \ Ben-Gurion University Compiler Construction 75 / 1309 Bootstrapping (II) compiler written in compiled by runs on compiles outputs c3 x86 asm c2 x86 x86 asm x86 exe c4 x86 asm c3 x86 C (v. 0.1) x86 exe c5 C (v. 0.1) c4 x86 C (v. 0.2) x86 exe c6 C (v. 0.2) c5 x86 C (v. 0.2) x86 exe ▶ With c4 we’re diverging: ▶ c4 is a C compiler ▶ We don’t yet support many features ▶ c5 is a C compiler written in C! Notice that ▶ it is written in an older version of C (v. 0.1) ▶ it supports a newer version of C (v. 0.2) ▶ In writing c6 , we finally get to use all the language features our compiler supports! Mayer Goldberg \ Ben-Gurion University Compiler Construction 76 / 1309 Why study compiler construction ▶ Almost all of you shall use compilers ▶ Most of you shall never work on another compiler after this course ☞ So why study an entire course on writing compilers?? Mayer Goldberg \ Ben-Gurion University Compiler Construction 77 / 1309 Why study compiler construction (cont) There are many benefits to studying compilers ▶ Better understanding of programming languages ▶ Reduce the learning-curve for learning a new programming language ▶ Better understanding of what compilers can & cannot do ▶ Demystify the compilation process ☞ Compilers are examples of code that writes code Mayer Goldberg \ Ben-Gurion University Compiler Construction 78 / 1309 Code that writes code ▶ Like having a team of programmers working for you ▶ Save time; Write code quickly ▶ Gain consistency ▶ Spread your bugs everywhere 🎉 ▶ A bug in the code-generator will spread bugs to many places ▶ This actually makes the bugs easier to find! ▶ Once debugged, the code is generated again, and the same bugs never re-appear Mayer Goldberg \ Ben-Gurion University Compiler Construction 79 / 1309 Why study compiler construction (cont) Bottom line ▶ Knowledge of how compilers work will make you a more effective programmer ▶ Learning to write code that writes code is a force-multiplying technique you can use anywhere (think DSLs!) Mayer Goldberg \ Ben-Gurion University Compiler Construction 80 / 1309 Further Reading 🕮 Modern compiler design (2nd edition), Page 1 (Section 1: Introduction) 🔗 Self hosting 🔗 Bootstrapping 🔗 Interpreters Mayer Goldberg \ Ben-Gurion University Compiler Construction 81 / 1309 Chapter 1 Goals 🗸 Establishing common language & vocabulary 🗸 Understanding the “big picture” Agenda 🗸 Some background in programming languages ▶ Abstraction ▶ Dynamic vs Static ▶ Functional vs Imperative languages 🗸 Introduction to compiler construction ☞ Introduction to the ocaml programming language Mayer Goldberg \ Ben-Gurion University Compiler Construction 82 / 1309 Languages used in this course In this course, we shall be working with 3 languages: ▶ The language in which to write the compiler: ocaml ▶ The language we shall be compiling: Scheme ▶ The language we shall be compiling to: x86/64 assembly language Mayer Goldberg \ Ben-Gurion University Compiler Construction 83 / 1309 Introduction to ocaml (1) ▶ ML is a family of statically-typed, quasi-functional programming languages ▶ The main members of ML are ▶ SML (Standard ML) ▶ ocaml ▶ In Microsoftese, ocaml is called F#… Mayer Goldberg \ Ben-Gurion University Compiler Construction 84 / 1309 What kind of language is ocaml Ocaml — ▶ is used all over the world ▶ is used in commercial and open source projects ▶ is powerful, eﬀicient, convenient, modern, elegant, and has a rich toolset ▶ supports both functional and object-oriented programming ▶ The ocaml object system is very powerful! ▶ makes it very diﬀicult to have run-time errors! Mayer Goldberg \ Ben-Gurion University Compiler Construction 85 / 1309 The advantages of learning ocaml ▶ Very rich language ▶ Great support for abstractions of various kinds ▶ Great library support: dbms, networking, web programming, system programming, etc ▶ Compiles eﬀiciently, either to bytecode or native Mayer Goldberg \ Ben-Gurion University Compiler Construction 86 / 1309 Why we are using ocaml in the course ▶ Pattern-matching, modules, object-orientation, and types make programming sophisticated, abstract, clean, easy, elegant, re-usable, safe ▶ Easy to enforce an API Mayer Goldberg \ Ben-Gurion University Compiler Construction 87 / 1309 Getting, installing & using ocaml 🔗 The OCaml Homepage, or Mayer Goldberg \ Ben-Gurion University Compiler Construction 88 / 1309 Getting, installing & using ocaml ▶ I run ocaml under GNU Emacs (https://www.gnu.org/software/emacs/) which is the best text editor. Period. ▶ Create the file.ocamlinit in your home directory, and place in it the line: #use "topfind";; This will load some basic tools you will want to use. Mayer Goldberg \ Ben-Gurion University Compiler Construction 89 / 1309 Getting, installing & using ocaml Mayer Goldberg \ Ben-Gurion University Compiler Construction 90 / 1309 Getting, installing & using ocaml ▶ You are free to use ocaml under any editor/environment you like ▶ For example, for ocaml under Eclipse, try OcaIDE at http://www.algo-prog.info/ocaide/, or Mayer Goldberg \ Ben-Gurion University Compiler Construction 91 / 1309 Further Reading 🕮 OCaml from the Very Beginning, by John Whitington 🕮 More OCaml: Algorithms, Methods & Diversions, by John Whitington 🕮 Real World OCaml: Functional programming for the masses, by Yaron Minsky & Anil Madhavapeddy 🕮 Practical OCaml, by Joshua B. Smith 🔗 The online manual at http://caml.inria.fr/pub/docs/manual-ocaml/, or Mayer Goldberg \ Ben-Gurion University Compiler Construction 92 / 1309 Expressions & types ▶ Ocaml is a functional programming language ▶ Ocaml is interactive ▶ You enter expressions at the prompt, and get their values and their types ▶ Expressions are ended with ;; Mayer Goldberg \ Ben-Gurion University Compiler Construction 93 / 1309 Expressions & types An interaction at the ocaml prompt: # 3;; - : int = 3 # "asdf";; - : string = "asdf" # 'm';; - : char = 'm' # 3.1415;; - : float = 3.1415 # [1; 2; 3; 5; 8; 13];; - : int list = [1; 2; 3; 5; 8; 13] # [; [2; 3]; [4; 5; 6]];; - : int list list = [; [2; 3]; [4; 5; 6]] Mayer Goldberg \ Ben-Gurion University Compiler Construction 94 / 1309 What’s available?? Modules ▶ We shall learn about modules later on, as part of the module system ▶ In the meantime,modules are ways of aggregating functions & variables, while controlling their visibility ▶ Functionality in ocaml is managed via loading and using modules. Mayer Goldberg \ Ben-Gurion University Compiler Construction 95 / 1309 What’s available?? Directives ▶ Commands that start with # ▶ Non-programmable ▶ Tell you about the run-time system ▶ Change the run-time system Some useful directives ▶ #list;; to list available modules ▶ #cd ;; to change to a directory ▶ #require ;; to specify that a module is required ▶ #show_module ;; to see the signature of the module ▶ #trace ;; to trace a function ▶ #untrace ;; to untrace a function Mayer Goldberg \ Ben-Gurion University Compiler Construction 96 / 1309 What’s available?? What directives are available? Hashtbl.iter (fun k _v -> print_endline k) Toploop.directive_table;; This is nasty! We can define a function to do that: let directives () = Hashtbl.iter (fun k _v -> print_endline k) Toploop.directive_table;; and now we can just run directives();; to see the list of directives. Mayer Goldberg \ Ben-Gurion University Compiler Construction 97 / 1309 What’s available? Pervasives ▶ The module Pervasives contains all the “builtin” procedures in ocaml. ▶ Try executing #show_module Pervasives;; and see what you get! module Pervasives : sig external raise : exn -> 'a = "%raise" external raise_notrace : exn -> 'a = "%raise_notrace" val invalid_arg : string -> 'a val failwith : string -> 'a exception Exit... Mayer Goldberg \ Ben-Gurion University Compiler Construction 98 / 1309 Integers in ocaml # 1 + 2;; - : int = 3 # 3 * 4;; - : int = 12 # 8 / 3;; - : int = 2 # 8 mod 3;; - : int = 2 Mayer Goldberg \ Ben-Gurion University Compiler Construction 99 / 1309 Read your error messages! # 1.2 + 3.4;; Characters 0-3: 1.2 + 3.4;; ^^^ Error: This expression has type float but an expression was expected of type int # cos(3);; Characters 3-6: cos(3);; ^^^ Error: This expression has type int but an expression was expected of type float Mayer Goldberg \ Ben-Gurion University Compiler Construction 100 / 1309 Floating-point numbers in ocaml Operators take a. after them to denote floating-point ops: # 3.4 +. 4.5;; - : float = 7.9 # 3.2 *. 5.1;; - : float = 16.32 # cos(2.0);; - : float = -0.416146836547142407 Mayer Goldberg \ Ben-Gurion University Compiler Construction 101 / 1309 Booelans in ocaml # true;; - : bool = true # false;; - : bool = false # true && false;; - : bool = false # false || false;; - : bool = false # 3 = 3;; - : bool = true # 3 = 4;; - : bool = false # 3 != 4;; - : bool = true Mayer Goldberg \ Ben-Gurion University Compiler Construction 102 / 1309 Read your error messages! In ocaml, unlike in Scheme, the then-clause and else-clause of if-expressions must be of the same type! # if 3 = 3 then 123 else 456;; - : int = 123 # if 3 = 4 then " moshe " else " yosi ";; - : string = " yosi " # if 4 = 4 then 123 else " moshe ";; Characters 23 -30: if 4 = 4 then 123 else " moshe ";; ^^^^^^^ Error : This expression has type string but an expression was expected of type int Mayer Goldberg \ Ben-Gurion University Compiler Construction 103 / 1309 Bitwise Boolean functions over the integers # 5 land 3;; - : int = 1 # 8 lor 3;; - : int = 11 # 5 lxor 3;; - : int = 6 Mayer Goldberg \ Ben-Gurion University Compiler Construction 104 / 1309 Characters in ocaml # '*';; - : char = '*' # '\t';; - : char = '\t' # '\n';; - : char = '\n' # '\\';; - : char = '\\' # '\"';; - : char = '"' # '\065';; - : char = 'A' Mayer Goldberg \ Ben-Gurion University Compiler Construction 105 / 1309 Strings in ocaml # "moshe!";; - : string = "moshe!" # "moshe\n";; - : string = "moshe\n" # "hello" ^ " " ^ "world";; - : string = "hello world" # "moshe".;; - : char = 'h' #show_module String;; will show you what string functions are available Mayer Goldberg \ Ben-Gurion University Compiler Construction 106 / 1309 Conversion & coercion of types # char_of_int 97;; - : char = 'a' # int_of_char 'A';; - : int = 65 ▶ Chars in ocaml are encoded as single bytes in ASCII, not as Unicode! # char_of_int 1488;; Exception: Invalid_argument "char_of_int". ☞ There is Unicode support in ocaml (later!) ▶ The only characters supported in F# are Unicode… Mayer Goldberg \ Ben-Gurion University Compiler Construction 107 / 1309 Conversion & coercion of types # int_of_string "1234";; - : int = 1234 # int_of_string "12+34";; Exception: Failure "int_of_string". # string_of_int 496351;; - : string = "496351" # float_of_string "123.456";; - : float = 123.456 # string_of_float 3.1415927;; - : string = "3.1415927" Mayer Goldberg \ Ben-Gurion University Compiler Construction 108 / 1309 Tuples in ocaml Ocaml supports ordered n-tuples. If e1 : τ1 ,... en : τn , then he1 ,... , en i : (τ1 × · · · × τn ). # (3, 4, 5);; - : int * int * int = (3, 4, 5) # (3, "blind", "mice");; - : int * string * string = (3, "blind", "mice") The ordered 0-tuple is possible too, and its type is unit: # ();; - : unit = () Mayer Goldberg \ Ben-Gurion University Compiler Construction 109 / 1309 Lists in ocaml For a type α, A list of type α list is either empty, or it contains something of type α, followed by an α list. Ocaml supports lists as a builtin data type. Lists are lists of some type: ▶ lists of integers ▶ lists of strings ▶ lists of user-defined data-types etc. # [2; 3; 5; 8; 13];; - : int list = [2; 3; 5; 8; 13] # [true; false; false; false; true];; - : bool list = [true; false; false; false; true] Mayer Goldberg \ Ben-Gurion University Compiler Construction 110 / 1309 Lists in ocaml Elements in a list must all belong to the same type: # [ true ; 234; " moshe !" ];; Characters 7 -10: [ true ; 234; " moshe !" ];; ^^^ Error : This expression has type int but an expression was expected of type bool This is different from lists in dynamic languages (Scheme, Racket, Prolog, Python, etc). Mayer Goldberg \ Ben-Gurion University Compiler Construction 111 / 1309 Lists in ocaml The empty list has an interesting type: # [];; - : 'a list = [] In ocaml, ▶ 'a is called alpha and is often written using α ▶ 'b is called beta and is often written using β ▶ 'c is called gamma and is often written using γ ▶ … etc. The expressions 'a, 'b, 'c, etc., are known as type variables Mayer Goldberg \ Ben-Gurion University Compiler Construction 112 / 1309 Lists in ocaml ▶ With non-empty lists, ocaml can figure out the type of the list based on the type of the elements in the list. ▶ With empty lists, ocaml is unable to figure out the type of the list. This is why you see the type variable unresolved in the type of []. ▶ You may specify the type of α: # ([] : int list);; - : int list = [] # ([] : string list);; - : string list = [] # ([] : int list list list);; - : int list list list = [] Mayer Goldberg \ Ben-Gurion University Compiler Construction 113 / 1309 Read your error messages! 💣 Specifying the type is not the same as casting in C/C++/Java: # (2.345 : float );; - : float = 2.345 # (2 : float );; Characters 1 -2: (2 : float ) ;; ^ Error : This expression has type int but an expression was expected of type float There is no casting in ocaml, but there are procedures you can call to generate corresponding data in another type. Mayer Goldberg \ Ben-Gurion University Compiler Construction 114 / 1309 Lists in ocaml Working with lists: Adding an element to a list: # 3 :: [4; 5; 6];; - : int list = [3; 4; 5; 6] Appending elements to a list: # [2; 3] @ [5; 8; 13];; - : int list = [2; 3; 5; 8; 13] #show_module List;; will show you what more is available Mayer Goldberg \ Ben-Gurion University Compiler Construction 115 / 1309 Functions in ocaml Overview ▶ Syntax for named functions ▶ Syntax for anonymous functions ▶ Syntax for functions with pattern-matching ▶ Syntax for recursive functions ▶ Syntax for mutually recursive functions Mayer Goldberg \ Ben-Gurion University Compiler Construction 116 / 1309 Functions in ocaml To define the function square, that takes an integer argument and returns its square, we define: # let square n = n * n;; val square : int -> int = We can now use square as any builtin function: # square;; - : int -> int = # square 234;; - : int = 54756 # square(34);; - : int = 1156 # square 0;; - : int = 0 Mayer Goldberg \ Ben-Gurion University Compiler Construction 117 / 1309 Read your error messages! ▶ You don’t ordinarily need parenthesis ▶ It’s not an error to have unneeded parenthesis ▶ Sometimes it’s really needed! # square ((((((123))))));; - : int = 15129 # square -234;; Characters 0-6: square -234;; ^^^^^^ Error: This expression has type int -> int but an expression was expected of type int # square (-234);; - : int = 54756 Mayer Goldberg \ Ben-Gurion University Compiler Construction 118 / 1309 Read your error messages! 🤔 What is the ocaml type-checker thinking?? # square -5;; Line 1, characters 0 -6: 1 | square -5;; ^^^^^^ Error: This expression has type int -> int but an expression was expected of type int # square - 5;; Line 1, characters 0 -6: 1 | square - 5;; ^^^^^^ Error: This expression has type int -> int but an expression was expected of type int # ( square ) - (5) ;; Line 1, characters 0 -8: 1 | ( square ) - (5) ;; ^^^^^^^^ Error: This expression has type int -> int but an expression was expected of type int Mayer Goldberg \ Ben-Gurion University Compiler Construction 119 / 1309 Functions in ocaml Overview 🗸 Syntax for named functions ▶ Syntax for anonymous functions ▶ Syntax for functions with pattern-matching ▶ Syntax for recursive functions ▶ Syntax for mutually recursive functions Mayer Goldberg \ Ben-Gurion University Compiler Construction 120 / 1309 Functions in ocaml # fun n -> n * n;; - : int -> int = # ((fun n -> n * n) 123);; - : int = 15129 # List.map;; - : ('a -> 'b) -> 'a list -> 'b list = # List.map (fun n -> n * n) [1; 2; 3; 4; 5];; - : int list = [1; 4; 9; 16; 25] Mayer Goldberg \ Ben-Gurion University Compiler Construction 121 / 1309 Functions in ocaml Overview 🗸 Syntax for named functions 🗸 Syntax for anonymous functions ▶ Syntax for functions with pattern-matching ▶ Syntax for recursive functions ▶ Syntax for mutually recursive functions Mayer Goldberg \ Ben-Gurion University Compiler Construction 122 / 1309 Functions in ocaml # function 0 -> "zero" | 1 -> "one" | 2 -> "two" | n -> (string_of_int n) ^ " is too much!";; - : int -> string = # (function 0 -> "zero" | 1 -> "one" | 2 -> "two" | n -> (string_of_int n) ^ " is too much!")(1);; - : string = "one" # (function 0 -> "zero" | 1 -> "one" | 2 -> "two" | n -> (string_of_int n) ^ " is too much!")(3);; - : string = "3 is too much!" Mayer Goldberg \ Ben-Gurion University Compiler Construction 123 / 1309 Functions in ocaml Overview 🗸 Syntax for named functions 🗸 Syntax for anonymous functions 🗸 Syntax for functions with pattern-matching ▶ Syntax for recursive functions ▶ Syntax for mutually recursive functions Mayer Goldberg \ Ben-Gurion University Compiler Construction 124 / 1309 Functions in ocaml (continued) # let rec fact n = if (n = 0) then 1 else n * (fact (n - 1));; val fact : int -> int = # fact 5;; - : int = 120 Mayer Goldberg \ Ben-Gurion University Compiler Construction 125 / 1309 Functions in ocaml (continued) Computing logarithms It is straightforward to compute logarithms in any base:  b  1 + loga ( a ) a < b loga (b) = 1 a=b  1 a>b log (a) b ▶ You can carry this algorithm to any number of steps. The logarithm is then given as a continued fraction ▶ This algorithm can be implemented using a simple recursive function, that iterates over the number of times the third condition is met Mayer Goldberg \ Ben-Gurion University Compiler Construction 126 / 1309 Functions in ocaml (continued) let rec logarithm a b n = if n = 0 then 1.0 else if a < b then 1.0 +. logarithm a (b /. a) n else 1.0 /. logarithm b a (n - 1);; Testing the function to compute log10 (5): # logarithm 10. 5. 4;; - : float = 0.692307692307692291 # logarithm 10. 5. 14;; - : float = 0.698970004331193 # log(5.) /. log(10.);; - : float = 0.698970004336018746 Mayer Goldberg \ Ben-Gurion University Compiler Construction 127 / 1309 Functions in ocaml (continued) Overview 🗸 Syntax for named functions 🗸 Syntax for anonymous functions 🗸 Syntax for functions with pattern-matching 🗸 Syntax for recursive functions ▶ Syntax for mutually recursive functions Mayer Goldberg \ Ben-Gurion University Compiler Construction 128 / 1309 Functions in ocaml let epsilon = 1.0e -6 let square x = x *. x;; let rec our_sin x = if Float. abs x < epsilon then x else 2. *. our_sin (x /. 2.) *. our_cos (x /. 2.) and our_cos x = if Float. abs x < epsilon then sqrt (1. -. square ( our_sin x)) else square ( our_cos (x /. 2.) ) -. square ( our_sin (x /. 2.) );; Mayer Goldberg \ Ben-Gurion University Compiler Construction 129 / 1309 Functions in ocaml # our_cos 0.7;; - : float = 0.764842187303379606 # cos 0.7;; - : float = 0.764842187284488495 # our_cos (- 0.7);; - : float = 0.764842187303379606 # Float.cos (- 0.7);; - : float = 0.764842187284488495 # our_sin 0.123;; - : float = 0.122690090024220544 # our_sin (- 0.123);; - : float = -0.122690090024220544 # sin 0.123;; - : float = 0.122690090024315329 Mayer Goldberg \ Ben-Gurion University Compiler Construction 130 / 1309 Functions in ocaml Overview 🗸 Syntax for named functions 🗸 Syntax for anonymous functions 🗸 Syntax for functions with pattern-matching 🗸 Syntax for recursive functions 🗸 Syntax for mutually recursive functions Mayer Goldberg \ Ben-Gurion University Compiler Construction 131 / 1309 Currying arguments Functions in ocaml are Curried… Not this kind of Curry Mayer Goldberg \ Ben-Gurion University Compiler Construction 132 / 1309 Currying arguments So what is Currying ▶ A function is a subset of the Cartesian product of a domain-set times a range-set: f ⊆ D × R. ▶ Suppose the domain is itself a Cartesian product: D = D1 × · · · × Dn. ▶ Then f ⊆ ((D1 × · · · × Dn ) × R). ▶ The structure ((D1 × · · · × Dn ) × R) is isomorphic to D1 × (D2 · · · × (Dn × R) · · · ). ▶ The structure D1 × (D2 · · · × (Dn × R) · · · ) is the domain of a function of 1 argument, that returns a function of 1 argument, … that returns a function of 1 argument that returns something in R. This is a Curried function. Mayer Goldberg \ Ben-Gurion University Compiler Construction 133 / 1309 Currying arguments The American logician, Haskell B Curry is responsible for the idea of Currying. Haskell B Curry Curried functions So for any function of several variables f, we can defined a Curried function fCurry that Curries over its arguments, i.e., takes one at a time. Mayer Goldberg \ Ben-Gurion University Compiler Construction 134 / 1309 Currying arguments ▶ In applications, Ocaml Curries arguments naturally. This means that the application f x y z w t really means (((((f x) y) z) w) t). ▶ Parameters to named functions Curry naturally. For example: # let f x y z = x + y + z;; val f : int -> int -> int -> int = # f 2;; - : int -> int -> int = # f 2 3;; - : int -> int = # f 2 3 4;; - : int = 9 ▶ To avoid Currying, you may pass tuples: In C, f(x, y, z) is a procedure call with 3 arguments. In ocaml, this same code is a procedure call with a single argument: an ordered triple. Mayer Goldberg \ Ben-Gurion University Compiler Construction 135 / 1309 🤔 Self-Study The Language of FOPL Roadmap ☞ Motivation ▶ Boolean Connectives ▶ Predicates ▶ Quantifiers ▶ Syllogisms ▶ Formalization using FOPL Mayer Goldberg \ Ben-Gurion University Compiler Construction 136 / 1309 🤔 Self-Study The Language of FOPL Goals ▶ The goal of this tutorial is to make you proficient in the topic of formalizing statements in first-order predicate-logic (FOPL) ▶ You’ve studied some basic FOPL in your freshman course Logic & Set-Theory, but the emphasis was mainly on set-theory ▶ Traditionally, formalization of sentences from a natural language to FOPL is a subject taught in logic courses in the Faculty of the Humanities ▶ By the end of this tutorial — ▶ You should be able to read sentences in FOPL and grasp their meaning ▶ You should be able to translate sentences from a natural language into FOPL, capturing & preserving their meaning Mayer Goldberg \ Ben-Gurion University Compiler Construction 137 / 1309 🤔 Self-Study The Language of FOPL Motivation ▶ FOPL is the language of the exact sciences (mathematics & computer science) ▶ It is precise ▶ It is concise ▶ It is unambiguous ▶ It is language-neutral ▶ It is easy to check Mayer Goldberg \ Ben-Gurion University Compiler Construction 138 / 1309 🤔 Self-Study The Language of FOPL Motivation (cont) ▶ FOPL is used in your courses on calculus, discrete mathematics, algorithms, etc ▶ FOPL is also used in the Compiler-Construction Course ▶ In recent years, FOPL emerged as a great vehicle for testing student-knowledge on various topics ▶ This does not mean that questions that use FOPL are logic-questions ▶ We only use the language of FOPL as a way of expressing knowledge about various problem-domains Mayer Goldberg \ Ben-Gurion University Compiler Construction 139 / 1309 🤔 Self-Study The Language of FOPL Motivation (cont) ▶ In principle, most of this tutorial should be familiar to you, if you remember your freshman-courses ▶ Which is why this is a self-study tutorial! ▶ You may think of this tutorial as a refresher ▶ The aim of this tutorial is to ensure that all students have suﬀicient experience with FOPL so as to enable them to use it effectively during tests Mayer Goldberg \ Ben-Gurion University Compiler Construction 140 / 1309 🤔 Self-Study The Language of FOPL Roadmap 🗸 Motivation ☞ Boolean Connectives ▶ Predicates ▶ Quantifiers ▶ Syllogisms ▶ Formalization using FOPL Mayer Goldberg \ Ben-Gurion University Compiler Construction 141 / 1309 🤔 Self-Study The Language of FOPL Boolean Connectives: Negation ▶ If α is a proposition, then ¬α is a proposition ▶ If α is true, then ¬α is false, and vice versa ▶ The truth-table for negation is given by α ¬α F T T F ▶ The property of double negation states that for any proposition α, we have ¬¬α ⇔ α Mayer Goldberg \ Ben-Gurion University Compiler Construction 142 / 1309 🤔 Self-Study The Language of FOPL Boolean Connectives: Conjunction ▶ If α, β are propositions, then α ∧ β is a proposition ▶ For α ∧ β to be true, both α, β must be true ▶ The truth-table for conjunction is given by: α β α∧β F F F F T F T F F T T T Mayer Goldberg \ Ben-Gurion University Compiler Construction 143 / 1309 🤔 Self-Study The Language of FOPL Boolean Connectives: Disjunction ▶ If α, β are propositions, then α ∨ β is a proposition ▶ For α ∧ β to be true, either α, β must be true ▶ The truth-table for disjunction is given by: α β α∨β F F F F T T T F T T T T Mayer Goldberg \ Ben-Gurion University Compiler Construction 144 / 1309 🤔 Self-Study The Language of FOPL Boolean Connectives (cont) Conjuction and Disjunction are commutative and associative ▶ α∧β ⇔β∧α ▶ α∨β ⇔β∨α ▶ (α ∧ (β ∧ γ)) ⇔ ((α ∧ β) ∧ γ) ▶ (α ∨ (β ∨ γ)) ⇔ ((α ∨ β) ∨ γ) Boolean Connectives (cont) Conjuction and Disjunction satisfy two distributive laws: ▶ (α ∧ (β ∨ γ)) ⇔ ((α ∧ β) ∨ (α ∧ γ)) ▶ (α ∨ (β ∧ γ)) ⇔ ((α ∨ β) ∧ (α ∨ γ)) Mayer Goldberg \ Ben-Gurion University Compiler Construction 145 / 1309 🤔 Self-Study The Language of FOPL Boolean Connectives — de Morgan’s Laws ▶ (¬(α ∨ β)) ⇔ ((¬α) ∧ (¬β)) ▶ (¬(α ∧ β)) ⇔ ((¬α) ∨ (¬β)) Mayer Goldberg \ Ben-Gurion University Compiler Construction 146 / 1309 🤔 Self-Study The Language of FOPL Boolean Connectives — Material-Implication ▶ If α, β are propositions, then α → β is a proposition ▶ Material-Implication captures the idea of if-then-else: ▶ Read α → β as ▶ If α then β ▶ If α is true, then β is true ▶ α entails β ▶ From α being true it follows that β is true ▶ In α → β, α is called the antecedent of the implication, and β is called the conclusion of the implication ▶ Material-Implication is sometimes written as α ⊃ β 🤔 “We learned this as just ‘implication’; Why do you call it ‘material implication’? ▶ We’ll get to that soon! Mayer Goldberg \ Ben-Gurion University Compiler Construction 147 / 1309 🤔 Self-Study The Language of FOPL Boolean Connectives — Material-Implication (cont) ▶ Material-implication fails to hold only when the antecedent is true and the conclusion is false ▶ The truth-table for Material-Implication is given by: α β α→β F F T F T T T F F T T T Mayer Goldberg \ Ben-Gurion University Compiler Construction 148 / 1309 🤔 Self-Study The Language of FOPL Boolean Connectives — Material-Implication (cont) Consider the following example of material-implication: ▶ “If dogs are green, then cats are green too” ▶ This is a true statement ▶ Neither the antecedent nor the conclusion hold true ▶ Therefore the implication does hold true! ▶ In natural language, when we use conditional statements ▶ We hardly ever use them vacuously ▶ When we do, it’s only for comic effect ▶ We normally intend to underscore some deep, often causal relationship between the antecedent and the conclusion: ▶ “If you drop the clock, then it surely shall break” ▶ None of these are captured by the material-implication! Mayer Goldberg \ Ben-Gurion University Compiler Construction 149 / 1309 🤔 Self-Study The Language of FOPL Boolean Connectives — Material-Implication (cont) Here’s another example of material-implication: ▶ “If 1 + 2 = 3 then n! = Γ(n + 1)” ▶ Both the antecedent and the conclusion hold true ▶ Therefore the implication does hold true! ▶ There is no obvious way of getting from the antecedent to the conclusion: ▶ These are two, unrelated, true mathematical propositions ▶ This antecedent cannot serve as evidence in establishing the truth of this conclusion Mayer Goldberg \ Ben-Gurion University Compiler Construction 150 / 1309 🤔 Self-Study The Language of FOPL Boolean Connectives — Material-Implication (cont) ▶ There is something unnatural and superficial about the relationship implied by material-implication ▶ Philosophers have concerned themselves with many other kinds of implications, that attempt to capture some deeper relationship, possibly causal, between the antecedent and the conclusion ▶ Modal ▶ Counterfactual ▶ Temporal ▶ Causal etc ▶ The word “material” in the term “material-implication” is meant to express the superficiality of this relationship Mayer Goldberg \ Ben-Gurion University Compiler Construction 151 / 1309 🤔 Self-Study The Language of FOPL Boolean Connectives — Material-Implication (cont) ▶ Material-implication has one advantage though: It is the only notion of implication that is truth-functional ▶ The truth-functionality of material-implication means that the truth-value of α → β is a function of the truth-value of α and the truth-value of β ▶ Any deeper implicative relation between antecedent and conclusion requires more information about the antecedent and the conclusion than merely their truth values! Mayer Goldberg \ Ben-Gurion University Compiler Construction 152 / 1309 🤔 Self-Study The Language of FOPL Boolean Connectives — Material-Implication (cont) ▶ Material-implication can be written in terms of negation and disjunction, or negation and conjunction: (α → β) = (¬α) ∨ β = ¬(α ∧ ¬β) ▶ Material implication naturally associates to the right: α → β → γ should be interpreted to mean α → (β → γ) Mayer Goldberg \ Ben-Gurion University Compiler Construction 153 / 1309 🤔 Self-Study The Language of FOPL Boolean Connectives — Bi-implication (cont) ▶ If α, β are propositions, then α ↔ β is a proposition ▶ Bi-implication captures the idea of if-and-only-if: ▶ Read α ↔ β as ▶ α if-and-only-if (iff) β ▶ If α is true, then β is true, and vice versa ▶ Either α, β are both true, or both false ▶ α is equivalent to β ▶ Bi-implication is also known by the names equivalence (≡), and bi-conditional Mayer Goldberg \ Ben-Gurion University Compiler Construction 154 / 1309 🤔 Self-Study The Language of FOPL Boolean Connectives — Bi-implication (cont) ▶ Bi-implication captures the idea of two propositions having the same truth-value, whether both true or both false ▶ Here are two ways to define bi-implication: ▶ (α ↔ β) ⇔ ((α → β) ∧ (β → α)) ☞ This is why it’s called bi-implication or bi-conditional ▶ (α ↔ β) ⇔ ((α ∧ β) ∨ ((¬α) ∧ (¬β))) ☞ “either both true or both false” Mayer Goldberg \ Ben-Gurion University Compiler Construction 155 / 1309 🤔 Self-Study The Language of FOPL Boolean Connectives — Functional Completeness ▶ Let Bool = {F, T} be the set of Boolean values ▶ The set of all Boolean functions is the set of all functions in Booln → Boolm , for all natural numbers m, n ∈ N ▶ The set of Boolean functions {¬, ∧} can be used to express any Boolean function ▶ The property of being able to express any function is called functional completeness ▶ The set {¬, ∧} are said to be functionally complete ▶ Another set of functionally-complete Boolean functions is {¬, ∨} Mayer Goldberg \ Ben-Gurion University Compiler Construction 156 / 1309 🤔 Self-Study The Language of FOPL Boolean Connectives — Functional Completeness (cont) ▶ The functions nand, nor are also functionally-complete: α∼ ∧ β = ¬(α ∧ β) α∼ ∨ β = ¬(α ∨ β) ▶ Hint: Try to define negation, conjunction, disjunction using only nand or only nor ▶ Nand is also known as the Sheffer stroke, and is written as α | β, after the American logician Henry Maurice Sheffer Mayer Goldberg \ Ben-Gurion University Compiler Construction 157 / 1309 🤔 Self-Study The Language of FOPL Roadmap 🗸 Motivation 🗸 Boolean Connectives ☞ Predicates ▶ Quantifiers ▶ Syllogisms ▶ Formalization using FOPL Mayer Goldberg \ Ben-Gurion University Compiler Construction 158 / 1309 🤔 Self-Study The Language of FOPL Predicates ▶ You may think of predicates as functions from some type α to the set of Bool = {F, T} of Boolean values ▶ The number of arguments taken by a predicate is its arity ▶ We speak of ▶ 1-place, or unary predicates ▶ 2-place, or binary predicates ▶ n-place, or n-ary predicates Mayer Goldberg \ Ben-Gurion University Compiler Construction 159 / 1309 🤔 Self-Study The Language of FOPL Predicates (cont) ▶ Predicates extend our language to express properties and relations ▶ When α is the type of an object in our domain of discourse, we say that the predicate denotes a property ▶ When α is a product-type, populated by tuples of objects in our domain of discourse, we say that the predicate denotes a relation among those objects ▶ Predicates are written as Px or P(x), where x : α Mayer Goldberg \ Ben-Gurion University Compiler Construction 160 / 1309 🤔 Self-Study The Language of FOPL Predicates (cont) ▶ Example: ▶ We define the following predicates: ▶ Let Bx denote that x is a boy ▶ Let Gx denote that x is a girl ▶ Let Lxy denote that x loves y ▶ We can now express simple propositions using these predicates: ▶ “Tarzan is a boy”: B(Tarzan) ▶ “Jane is a girl”: G(Jane) ▶ “Tarzan loves Jane”: L(Tarzan, Jane) ▶ “Jane does not love Tarzan”: ¬L(Jane, Tarzan) ☞ When the predicate arguments are not single-letter variables, it’s often more convenient to write then in the format Pred(arg1 , arg2 , · · · ) Roadmap Mayer Goldberg \ Ben-Gurion University Compiler Construction 161 / 1309 🤔 Self-Study The Language of FOPL Quantifiers ▶ Quantifiers extend our language so we can talk about the quantities of objects that satisfy some proposition: ▶ The universal quantifier ∀ is used to assert that some proposition holds for all objects ▶ the existential quantifier ∃ is used to assert that some proposition holds for [at least] one object Mayer Goldberg \ Ben-Gurion University Compiler Construction 162 / 1309 🤔 Self-Study The Language of FOPL Quantifiers (cont) ▶ Example: ▶ Continuing our previous predicates: ▶ Let Bx denote that x is a boy ▶ Let Gx denote that x is a girl ▶ Let Lxy denote that x loves y ▶ How would we formalize the following: ▶ “There exist at least two boys” ▶ Answer: ∃x∃y(Bx ∧ By ∧ x 6= y) 🤔 Had we not added the caveat that x 6= y, this proposition would have been true for a universe consisting of one boy! Mayer Goldberg \ Ben-Gurion University Compiler Construction 163 / 1309 🤔 Self-Study The Language of FOPL Quantifiers (cont) ▶ Continuing the example with B, G, L: ▶ How would we formalize the following: ▶ “Not all girls love themselves” ▶ Answer: ¬∀x(Gx → Lxx) ▶ Later on, we shall see other ways of encoding this proposition ▶ How would we formalize the following: ▶ “All boys like Mary” ▶ Answer: ∀x(Bx → L(x, Mary)) Mayer Goldberg \ Ben-Gurion University Compiler Construction 164 / 1309 🤔 Self-Study The Language of FOPL Quantifiers (cont) We can abbreviate sequences of same quantifiers: ▶ Rather than writing ∀x∀y (α), we can write ∀x, y (α) ▶ Rather than writing ∃x∃y (α), we can write ∃x, y (α) ▶ The order of quantifiers can be switched among their own kind: ▶ Universal: ∀x, y (α) is equivalent to ∀y, x (α) ▶ Existential: ∃x, y (α) is equivalent to ∃y, x (α) ▶ But the order cannot be swapped when the quantifiers are different: ▶ ∀x∃y (α) is not equivalent to ∃y∀x (α) ▶ Example: “For every person, there exists a sandwich, such that this person eats that sandwich” is not equivalent to “There exists a sandwich, that is eaten by every person”! Mayer Goldberg \ Ben-Gurion University Compiler Construction 165 / 1309 🤔 Self-Study The Language of FOPL Roadmap 🗸 Motivation 🗸 Boolean Connectives 🗸 Predicates 🗸 Quantifiers ☞ Syllogisms ▶ Formalization using FOPL Mayer Goldberg \ Ben-Gurion University Compiler Construction 166 / 1309 🤔 Self-Study The Language of FOPL Syllogisms ▶ A syllogism is an argument-form; A way of reasoning deductively ▶ Many of the syllogisms we present were discovered by the Greek, Stoic Philosopher, Chrysippus of Soli, of the 3rd century BC ▶ Although this is a refresher tutorial with the aim of concentrating on formalization, we shall mention in passing some of the important syllogisms ▶ We present syllogisms in the form: assumption1 , assumption2 , · · · ∴ conclusion ▶ The symbol ∴ should be read as “therefore” Mayer Goldberg \ Ben-Gurion University Compiler Construction 167 / 1309 🤔 Self-Study The Language of FOPL Syllogisms — Modus Ponendo Ponens α, α→β ∴ β ▶ It’s raining; If it’s raining, John carries an umbrella =⇒ John is currently carrying an umbrella Mayer Goldberg \ Ben-Gurion University Compiler Construction 168 / 1309 🤔 Self-Study The Language of FOPL Syllogisms — Modus Tollendo Tollens ¬β, α→β ∴ ¬α ▶ John is not currently carrying an umbrella; If it’s raining, John carries an umbrella =⇒ It is currently not raining Mayer Goldberg \ Ben-Gurion University Compiler Construction 169 / 1309 🤔 Self-Study The Language of FOPL Syllogisms — Modus Tollendo Ponens ▶ Modus Tollendo Ponens is also known as Disjunctive Syllogism ¬α, α∨β ∴ β ▶ The coffee is not black; Coffee is either black, or with milk =⇒ The coffee has milk Mayer Goldberg \ Ben-Gurion University Compiler Construction 170 / 1309 🤔 Self-Study The Language of FOPL Syllogisms — Modus Ponendo Tollens α, ¬(α ∧ β) ∴ ¬β ▶ This food contains sugar; A food cannot both contain sugar and be dietetic =⇒ This food is not dietetic Mayer Goldberg \ Ben-Gurion University Compiler Construction 171 / 1309 🤔 Self-Study The Language of FOPL Syllogisms — Hypothetical Syllogism α→β ∴ ¬β → ¬α ▶ If it’s raining, John carries an umbrella =⇒ If John is not carrying an umbrella, it is not raining Mayer Goldberg \ Ben-Gurion University Compiler Construction 172 / 1309 🤔 Self-Study The Language of FOPL Syllogisms — Double Negation ¬¬α ∴ α ▶ It is not the case that this is not complicated =⇒ It is complicated ?

Compiler Construction Slides PDF

Document Details

Tags

Related

Summary

Full Transcript