Summary

This document provides a summary of programming languages, covering topics such as Language Design Issues, Machine Architecture Impact, and Language Translation Issues. It also discusses different language paradigms and how different languages are implemented.

Full Transcript

**Programming Languages** **Chapter 1 -- Language Design Issues** Why study programming languages? - To improve ability of developing effective algorithms - To improve the use and understand of programming languages - To increase vocabulary of useful programming constructs - To make it...

**Programming Languages** **Chapter 1 -- Language Design Issues** Why study programming languages? - To improve ability of developing effective algorithms - To improve the use and understand of programming languages - To increase vocabulary of useful programming constructs - To make it easier to learn a new programming language - To make it easier to design a new programming language - To allow better choice of programming languages - For apps requiring numerical calculations: C, Fortran, Ada - For apps related to AI: Python, Lisp, ML, Prolog - For internet apps: Perl, Java Syntax and Semantics of a Programming Language - Syntax of a language means how the program code looks like - Semantics of a language is the meaning given to various syntactic constructs Language Paradigms - Imperative Languages - Are languages that rely on statements to code the program - Examples are: C++, Pascal, Basic - Applicative Languages (Functional Languages) - Are languages that rely on functions to code the program - Examples are: Lisp, Ml - Rule-based Languages - Are languages that rely on conditions to code the program - Object-Oriented Programming Languages - Are languages that rely on Object-Oriented structure to code the program - Some core object-oriented features - Classes, Objects, Abstraction, Encapsulation, Polymorphism, Inheritance Spaghetti Code vs Structures Code - Spaghetti Code - Is unorganized unstructured code that is hard to understand and maintain - Structured Code - Is the opposite of spaghetti code. An organized and structured easy to maintain code **Chapter 2 -- Impact of Machine Architecture** The Definition of a Computer - An integrated set of algorithms and data structures capable of storing and executing programs Organization of a Computer Hardware-level Operations - Fetch-execute Cycle - The processor fetches an instruction from memory - The processor then executes the fetched instruction - Instruction Cycle - Processing that is required for a single instruction ![](media/image2.png) According to the figure above, the program works in the following steps: 1. Fetch instructions (1 + 2) 2. Decode instructions to find the operation (ex: +) and the operands (ex: 1, 2) 3. Fetch designated operands (1 and 2) 4. Branch to designated operation (1 + 2) 5. Execute primitive operation (execute 1 + 2) 6. Keep executing until all instructions completed 7. Stop program Fetch-Execute Cycle - Program Counter - Contains the address of the next instruction to be executed - It keeps tracks of instructions to be executed - Instruction Register - The fetched instruction is loaded and stored in the instruction register - Program counter is incremented to the next instruction after previous one is fetched into the instruction register - Processor - The processor interprets the bits stored in the IR (Instruction Register) and performs the required action Language Implementation Structure Binding and Binding Time - Definition of Binding - Binding is fixing a feature to have a specific value among a set of possible values - Binding Time Classes - Languages Definition - Is when the language is first designed and created - Example: Binding data types to the language - Example: Binding language operators to operations - Language Implementation - Is when the language is implemented on different operating systems - Example: Number representation and arithmetic operations - Translation - Is when the programmer codes the program - Choices: - By the programmer: variable types and assignments - By the compiler: relative locations of variables and arrays - By the loader: absolute locations - Execution - Is when the program in run and arguments are copied to parameter locations - Importance of Binding Times - Some languages perform early binding, some perform late binding - If done at translation time: **efficient** - If done at execution time: **flexible** - Example: Bindings for X = X + 10 - Set possible types for variable X - Set the type of variable X - Set possible values for variable X - Set value for variable X - Set representation of the constant 10 - Set the properties of the operator + **Chapter 3 -- Language Translation Issues** Programming Syntax - The arrangement of words as elements in a sentence to show their relationship - The sequence of symbols that make up valid programs General Syntactic Criteria - To provide a notation for communication between the programmer and the programming language processor - Readability - Writability - Ease of verifiability - Ease of translation - Lack of ambiguity - Readability and Writability - **Readability** of a program means how easy is it to read and understand a program written in a particular programming language - for i in range (1, 10): print(i) - car(cdr(abc(de))f) - Consider the difference between these two lines of code, the first one would be more understandable than the second one, therefore more readable - **Writability** of a program means how easy is it to write a program in a particular programming language - Some languages could be more difficult than others in terms of writing code. Take Python and C for example, surely, Python is far more writable than C - Sometimes, a more readable language is a less writable one, and vice versa Syntactic Elements of a Language - Character Set (ASCII) - A set of allowed characters in a language, such as letters and digits - An example of such sets is the ASCII set, a widely common and used set of characters - The choice of character sets is important in determining the type of I/O (Input/Output) devices needed in implementing the language - Identifiers (Variable and Function names) - An identifier is a string of letters and digits beginning with a letter - It is used to identify a specific variable or function. Languages may differ in the criteria of identifiers, as some may allow optional characters such as. or -- in identifiers - Some languages may even have length restriction, as the BASIC language restriction to a single letter and digit - Operator Symbols (+, -, PLUS, MINUS, etc..) - As in the symbols used for operations, like + for addition or -- for subtraction - Different language may use different operator symbols, such as lisp which uses PLUS for addition, TIMES for multiplication, and so on, or Fortran which uses EQ for equality and \*\* for exponentiation. - Keywords and Reserved Words (if, else if, elif) - A keyword is a special, fixed word reserved for special operations that the compiler understands - An example is the *if* keyword for condition statements. Different languages could have different keywords, such as *else if* in C or C++ and *elif* in Python - Such keywords and reserved words should not be used as identifiers across the program code - Noise Words (Optional keywords in statements, like TO in GO TO) - Are optional words that can be used or omitted in specific language statements. - An example is the GO TO statement in the COBOL language, in such example, GO is required while TO is optional - Such words are added to improve readability of programs - Comments (// for C single line comment, or ! for Fortran, /\*\*/ for C multiline comments) - Comments are lines of code that are omitted by the compiler. Such lines could be added to the code as documentation, explanation, or labeling to make the code understandable - Different languages could have different notations for declaring comments, C and C++ could use // for single line comments, while Python uses \# for single line comments, and Fortran uses ! for single line comments - Multiline comments could be done as well to create comments that extend for multiple lines. C and C++ use /\* to indicate the beginning of a multiline comment, and \*/ to indicate the end of it - Blanks (Empty spaces) - Blanks are basically empty spaces. Different language could have different rules regarding empty spaces in the code - In C, spaces are not significant in most cases except in string literals and operators such as +=. But spaces could be a significant syntax operator in other languages, Python for example, where indentation (spaces) specifies the scope of a function or a loop - Delimiters and Brackets ({} for defining scopes in functions or loops for example) - A delimiter is a syntactic element that marks the beginning or end of some syntactic unit as a statement or expression - An example is the use of brackets in C or C++ to specify the scope of a function or a loop. Other languages could use the words BEGIN and END instead of brackets as delimiters - Free and Fixed Field Formats (whether statement positioning matters or not) - A free-field format syntax is when language statements may be written anywhere on an input line without regard for positioning on the line or for breaks between lines - A fixed-field format syntax is when the positioning on an input line matters. Such syntax is most often seen in assembly languages. Such syntax is increasingly rare today, and free-syntax is the norm - Expressions (functions that return a value in programming languages) - Are functions that access data objects in a program and return some value - In imperative languages (C), expressions form the basic operations that allow for the machine state to change by each statement - In applicative languages (ML, LISP), expressions form the basic sequence control that drives the program execution - Statements (such as if statement) - The most prominent syntactic component in imperative languages. It has a critical effect on the overall readability and writability of a language Stages in Translation - Lexical Analysis (Scanning) - ![](media/image4.png)The phase of grouping code characters into categories: identifiers, delimiters, operator symbols, numbers, keywords, noise words, blanks, comments, etc.. - The output of this phase is lexical items (tokens) - Lexical analyzer (scanner) reads the program input and breaks it down to lexical items called lexemes and feeds them into later translation stages - Lexical analyzer identifies the type of each lexeme (number, identifier, delimiter, operator, etc..) and attaches a type tag to it - Example: x := x + 10\ tokens are: x := x + x ; - Syntactic (Syntax) Analysis - Syntactic analyzer identifies a sequence of lexical items that form a syntactic unit such as an expression, statement, declaration, etc... - Then the semantic analyzer is called to process this syntactic unit - Semantic Analysis - The central phase of translation - In the semantic analyzer, syntactic items from the syntactic analyzer are processed, and the structure of the executable object code begins to take shape - Elements in the semantic analysis: - Symbol-table Maintenance: A table that contains data and attributes about an identifier, such as variables, arrays, or functions - Insertion of Implicit Information: The translator inserts implicit information into the program, such as type information and default values for function parameters - Error Detection: The translator detects and reports any semantic errors found in the program - Macro Processing: The translator expands macros in the program, which are text substitutions that are used to simplify code reuse **Chapter 5 -- Elementary Data Types** Data Objects - A location in the memory with an assigned name in the actual computer - Types of Data Objects - Programmer defined data objects - Systems defined data objects Data Values - A bit pattern that is recognized by the computer - Elementary Data Object: Contains a data value that is manipulated as a unit - Data Structure: A combination of data objects Bindings about Data Objects - Type - Determines the set of possible data values that the object may take and the applicable operations of that data value - Name - The binding of a name to a data object - Component - The binding of a data object to one or more data objects - Location - The storage location of the data object in the memory - Storage location is assigned by the system - Value - The assignment of a bit pattern to a name Variables and Constants - In programs, data objects are represented as variables and constants - **Variables**: Data objects defined and named by the programmer explicitly - **Constants**: Data objects with a name that is permanently bound to a value for its lifetime - **Literals**: Constants whose name is the written representation of their value - **Persistence**: Existence of data beyond run time Data Types - A data type is a class of data objects with a set of possible operations for creating and manipulating them - Examples (Elementary Data Types) - Integer, real, character, boolean, enum, pointer - Specification of a Data Type - Attributes - Distinguish data objects of a given type - Approaches: - Stored in a descriptor and used during program execution - Used only to determine the storage representation, and not used explicitly during program execution - Values - The data type determines the values that a data object may have - Usually an ordered set, it has a least and a greatest value - Operations - Define the possible manipulations of data objects of that type - Types of operations: - **Primitive**: specified as part of the language definition - **Programmer-defined**: as functions or class methods - An operation is defined by: - Domain: set of possible input arguments - Range: set of possible results - Action: how the result is produced - Signature - Specifies the domain and the range - The number, order, and data types of the arguments in the domain - The number, order, and data types of the resulting range - Implementation of a data type: - When implementing a data type, two things must be considered - Storage representation - Implementation of operations - Hardware operation: integer addition operation - Function: square root operation - In-line code: Instead of using a function, the code is copied into the program at the point where the function would have been invoked - Declarations - Information about the name and data type of data objects needed during execution - Types of declarations: - **Explicit**: Programmer defined - **Implicit**: System defined - Declaration Examples: - FORTRAN: the first letter in the variable name determines the data type - Perl: the variable is declared by assigning a value - Declaration of Operations: - Programmer-defined prototypes of functions - Examples: - Declaration: float sub(int, float) - Signature: sub: int \* float -\> float - Purpose of Declaration: - Choice of storage representation - Storage management - Polymorphic operations - Static Type checking - Type Checking - Checking that each operation executed by a program receives the proper number of arguments of the proper data type - Static Type Checking: done at compilation - Dynamic Type Checking: done at runtime - Disadvantages of Dynamic Type Checking - Time consuming - Additional memory is required (Type Descriptor) - Difficulty in debugging - Strong Typing & Type Inference - Strong Typing: all type errors can be statically checked - Type Inference: Implicit data types, used if the interpretation is unambitious - Type Conversion - Coercion: implicit type conversion, performed by the system - Explicit Conversion: routines to change from one data type to another - Coercion - Programming languages may follow two approaches: - No coercion: any type mismatch is considered an error (Pascal, Ada) - Coercions: only if no implicit conversion is possible, error is reported (C, C++) - Advantages - Free the programmer from some low-level concerns - Disadvantages: - May hide serious programming errors - Assignments - Assignment is the basic operation for changing the binding of a value to a data object - It can be defined using two concepts: - L-Value: Location for an object - R-Value: Contents of that location - Example: - A = A + B - Pick up contents of location A: R-value of A - Add contents of location B: R-value of B - Store results into address A: L-value of A - Initialization - Uninitialized data object is a data object that has been created but with no assigned value. For example, an allocation of a storage block - Could be implicit or explicit - Elementary Data Types - Types of Elementary Data Types - Scalar Data Types - Numerical Data Types - Integers - Subranges - Floating-point real numbers - Fixed-point real numbers - Other Data Types - Complex numbers - Software simulated with two storage locations: one for the real portion and one for the imaginary portion - Rational numbers - The quotient of two integers - Enumerations - Ordered list of different values - Booleans - True or False - Characters - A single character - Composite Data Types - Characterized by a complex data structure organization, processed by the compiler - Character strings - Lists - Pointers - Programmer-constructed objects - Files - Integers \[Scalar - Numerical\] - Specification: Has specific maximal and minimal values - Implementation is hardware defined - Operations: - Arithmetic - Relational - Assignment - Bit Operations - Subranges \[Scalar - Numerical\] - Specification: A subtype of integers - A sequence of integer values within some restricted range - Implementation: has smaller storage requirements and better type checking - Example: Pascal declaration A: 1..10 means that the variable may be assigned integer values from 1 through 10 - Floating-point Real Numbers \[Scalar - Numerical\] - Specification: Has minimal and maximal values - May introduce roundoff issues: the check for equality may fail due to roundoff errors - Implementation - Defined by mantissa and exponent - Example: - 10.5 = 0.105 x 10\^2 - Mantissa is 105 and exponent is 2 - Fixed-point Real Numbers \[Scalar - Numerical\] - Specification: Real numbers with predefined decimal places - Implementation: Directly supported by hardware or simulated by software - Examples: - COBOL : x picture 99v999 - PL/l : fixed decimal(10,2) - Enumerations \[Scalar\] - Implementation: Represented during runtime as integers, corresponding to the listed values - Example: - enum StudentClass { Fresh, Soph, Junior, Senior } - The variable StudentClass may accept only one of the four listed values - Booleans \[Scalar\] - Specification: two possible values: true or false - Basic Operations: - And (&&) -- Or (\|\|) -- Not (!) - Implementation: Requires a single addressable unit such as byte or word - Use a particular bit for the value. Eg. 1 for true, 0 for false - Use the entire storage: zero value for false, otherwise true - Characters \[Scalar\] - Specification: Single character as a value of a data object - Operations: - Relational - Assignment - Testing the type of the character (digit, letter, symbol) - Implementation: supported by the underlying hardware - Character Strings \[Composite\] - Specification: - Fixed declared length - Storage allocation at translation time - Strings longer than the declared length are truncated - Implemented as a packed vector of characters - Variable length to a declared bound - Storage allocation at translation time - Length upper bound is set and any string longer than the upper bound is truncated - Implemented with a descriptor that contains the max length and the current length - Unbounded length - Storage allocation at run time - String can be of any length - Terminated by a null character - Two possible implementations: - Implemented as a linked storage of fixed-length data objects - Implemented as a continuous array of characters with dynamic runtime storage allocation - Operations: - Concatenation: joining two strings together - Relational Operations: == \< \> \= - Substring selection using positional subscripts - Substring selection using pattern matching - Input/output formatting - Dynamic Strings: evaluated at runtime - Pointers & Programmer-Constructed Objects - Specification: - Reference data objects only of a single type (C, Pascal, Ada) - Reference data objects of any type (Smalltalk) - C/C++ : pointers are data objects and can be manipulated by the program - Java : pointers are hidden data structures, managed by the language implementation - Implementation: - Absolute addresses - Stored in the pointer - Allows for storing the new objects anywhere in the memory - Relative addresses - Offset with respect to some base address - Advantage: the entire block can be moved to another location without invalidating the addresses in the pointers since they are relative, not absolute - Implementation Problems - Management of a general heap storage area: when creating objects of different sizes - Garbage: where the pointer is destroyed but the object still exists - Dangling reference: the object is destroyed but the pointer still contains the address of the used location and can be wrongly used by the program - Garbage Example: - Int \*a = new int\[10\]; - Int \*b = new int; - a = b; // dynamic array -\> garbage - Dangling Reference Example: - Int \*p = new int; - \*p = 15; - delete p; - int \*q = new int; - \*q = 24; - \*p = 67; // dangling reference occurred - Files - Characteristics: - Usually reside on secondary storage devices as disks, tapes - Lifetime is greater than the lifetime of the program that has created the files - Implementation: as a part of the operating system **Chapter 6 -- Encapsulation** - Structured Data Types (SDT) - A data structure is a data object that contains other data objects as its elements of components - Specifications: - Number of components and size - Fixed Size: Arrays - Variable Size: Stacks, Lists, Queues - Type of each component - Homogeneous: all components are the same type - Heterogeneous: components are of different types - Selection Mechanism - Identify components through indexes or pointers - Two-step process: - Reference the structure - Selection of a particular component - Maximum number of components - var s1: set of 1..20 - s1 := \[2,5,19\] - Organization of components - struct time { - int hour; - Int minute; - Int second; - } - time a\[100\] - Organization of SDT - Simple linear sequence: arrays, stacks, lists - Multidimensional structures - Separate types (Fortran) - A vector of vectors (C++) - Operations: - Component Selection operations - Sequential - Random - Insertion/Deletion of components - Whole-data structure operations - Creation/Destruction operations - Implementation: - Implementation of Storage Representation - Storage of components - Optional descriptor that contains some or all of the attributes - Types - Sequential Representation - The data structure is stored in a single contiguous block of storage that includes both descriptor and components - Usually used for fixed-size structures - Linked Representation - The data structure is stored in several noncontiguous blocks of storage, linked together through pointers - Used for variable-size structured data types like trees and lists - Implementation of Operations - Component selection in sequential representation - Base address plus offset calculation - Add component size to the current location to move to the next component - Component selection in linked representation - Follow the chain of pointers - Storage Management - Two Central Problems - Garbage - Data objects is bound but access path is destroyed - Memory cannot be unbound - Dangling Reference - Data object is destroyed but access path still exists - Declaration & Type Checking for SDT - What is to be checked - Existence of a selected component - Type of a selected component - Vectors and Arrays - Terms - Vector: one dimensional array - Matrix: two-dimensional array - Multidimensional arrays - Array slices - Associate Arrays - Specification of Arrays - Number of dimensions - Type of components - Number of components - Index range of dimensions - Implementation of Array Operations - Access: can be implemented efficiently if the length of the components of the array is known at compilation time - The address of each selected element can be computed using an arithmetic expression - Whole array operations: copying an array (may require much memory) - Multidimensional Arrays - Storage Representation - Sequential - Row Major Order - Column Major Order - Linked - Row Major Order - Column Major Order - Records - Specification - Attributes - Number of components - Type of each component - Names to be used for selecting components - Operations - Selection - Assignment - Implementation - Lists - Specification - Attributes - Number of components - Type of components - Lists can be homogeneous or heterogeneous - Sets - Specification - Number of components - Maximum number of components - Type of Components - Operations - Membership - Union - Intersection - Subtract - Subset - Implementation - Bit-string representation of sets - Hash-coded representation of sets - Subprograms - Subprogram is a mathematical function that maps each particular set of arguments into a particular set of results - Specification - Name of the subprogram - Signature of the subprogram: arguments, results - Action performed by the subprogram - Math Functions VS Programming Functions - Implicit arguments in the form of non-local variables - Implicit results (changes in non-local variables) - History sensitiveness (results may depend on previous executions) - Error conditions - Implementation - Uses the data structures and operations provided by the language - Defined by the subprogram body - Local data declarations - Statements defining the actions over the data - Interface with the user: arguments and returned result - Approaches of Implementation of Subprograms - Simple (inefficient approach): each time the subprogram is invoked, a copy of its executable statements, constants, and local variables is created - A better approach: The executable statements and constants are invariant parts of the subprogram; therefore, they do not need to be copied for each execution of the subprogram - Subprogram Definition & Activation - Subprogram Definition: The set of statements constituting the body of the subprogram - Static Property: The only information available during translation - Subprogram Activation: A data structure (record) created upon invoking the subprogram. It exists while the subprogram is being executed. After that the activation record is destroyed - Subprograms Implementation - Static Part (Code Segment) - A single copy is used for all activations of the subprogram. This copy is called code segment. This is the static part. - Dynamic Part (Activation Record) - The activation record contains only the parameters, results, and local data. This is the dynamic part. It has same structure but different values for the variables

Use Quizgecko on...
Browser
Browser