CIS217 Concepts of Programming Languages Chapter 6 PDF
Document Details
Uploaded by AccomplishedUniverse8955
Tags
Summary
This document is a chapter on data types from a course on programming languages. It covers various concepts like primitive data types, character strings, arrays, and more. It provides an overview of the different types of data that are used in programming, including numeric, character-based, and logical data types, as well as how they are structured and used in programs.
Full Transcript
CIS217 CONCEPTS OF PROGRAMMING LANGUAGES CHAPTER-6 Data Types Outline 6.1 Introduction 6.2 Primitive Data Types 6.3 Character String Types 6.4 Enumeration Types 6.5 Array Types 6.6 Associative Arrays 6.7 Record Types 6.8 Tuple Types...
CIS217 CONCEPTS OF PROGRAMMING LANGUAGES CHAPTER-6 Data Types Outline 6.1 Introduction 6.2 Primitive Data Types 6.3 Character String Types 6.4 Enumeration Types 6.5 Array Types 6.6 Associative Arrays 6.7 Record Types 6.8 Tuple Types 6.9 List Types 6.11 Pointer and Reference Types 6.12 Optional Types 6.13 Type Checking Summary 6.1 Introduction A data type defines a collection of data values and a set of predefined operations on those values. Computer programs produce results by manipulating data. A descriptor is the collection of the attributes of a variable. In an implementation a descriptor is a collection of memory cells that store variable attributes. If the attributes are static, descriptor are required only at compile time. These descriptors are built by the compiler, usually as a part of the symbol table, and are used during compilation. For dynamic attributes, part or all the descriptor must be maintained during execution. Descriptors are used for type checking and by allocation and deallocation operations. 6.2 Primitive Data Types Those not defined in terms of other data types are called primitive data types. Almost all programming languages provide a set of primitive data types. The primitive data types of a language are used, along with one or more type constructors. 6.2.1 Numeric Types Integer – The most common primitive numeric data type is integer. – The hardware of many computers supports several sizes of integers. – Java includes four signed integer sizes: byte, short, int, and long. – C++ and C#, include unsigned integer types, which are types for integer values without sings. Floating-point – Model real numbers, but only as approximations for most real values. – Languages for scientific use support at least two floating-point types; sometimes more (e.g. float, and double.) – The collection of values that can be represented by a floating-point type is defined in terms of precision and range. Precision: is the accuracy of the fractional part of a value, measured as the number of bits. Figure below shows single and double precision. Range: is the range of fractions and exponents. 6.2.1 Numeric Types Complex – Some languages support a complex type: e.g., Fortran and Python – Each value consists of two floats: the real part and the imaginary part – Literal form (in Python): (7 + 3j) where 7 is the real part and 3 is the imaginary part Decimal – Most larger computers that are designed to support business applications have hardware support for decimal data types – Decimal types store a fixed number of decimal digits, with the decimal point at a fixed position in the value – Advantage: accuracy of decimal values – Disadvantages: limited range since no exponents are allowed, and its representation wastes memory 6.2.2 Boolean Types Boolean Types – They are used to represent switched and flags in programs – The use of Booleans enhances readability – Range of values: two elements, one for “true” and one for “false” – One popular exception is C89, in which numeric expressions are used as conditionals. In such expressions, all operands with nonzero values are considered true, and zero is considered false – A Boolean value could be represented by a single bit, but often statured in the smallest efficiently addressable cell of memory, typically a byte 6.2.3 Character Types Char types are stored as numeric coding (ASCII / Unicode) – Traditionally, the most commonly used coding was the 8-bit code ASCII (American Standard Code for Information Interchange) – An alternative, 16-bit coding: Unicode (UCS-2) – Java was the first widely used language to use the Unicode character set. Since then, it has found its way into JavaScript, Python, Perl, C# and F# – After 1991, the Unicode Consortium, in cooperation with the International Standards Organization (ISO), developed a 4-byte character code named UCS-4 or UTF-32, which is described in the ISO/IEC 10646 Standard, published in 2000 6.3 Character String Types A character string type is one in which values are sequences of characters 6.3.1 Design Issues – The two most important design issues: – Is it a primitive type or just a special kind of array? – Is the length of objects static or dynamic? 6.3.1 String and Their Operations – Assignment – Comparison (=, >, etc.) – Catenation – Substring reference – Pattern matching C and C++ use char arrays to store char strings and provide a collection of string operations through a standard library whose header is string.h 6.3.2 String Length Options – Static Length String: The length can be static and set when the string is created. This is the choice for the immutable objects of Java’s String class as well as similar classes in the C++ standard class library and the.NET class library available to C# and F# – Limited Dynamic Length Strings: allow strings to have varying length up to a declared and fixed maximum set by the variable’s definition, as exemplified by the strings in C – Dynamic Length Strings: Allows strings various length with no maximum. This option requires the overhead of dynamic storage allocation and deallocation but provides flexibility. Ex: Perl and JavaScript 6.4 Enumeration Types All possible values, which are named constants, are provided, or enumerated, in the definition Enumeration types provide a way of defining and grouping collections of named constants, which are called enumeration constants – C# example enum days {mon, tue, wed, thu, fri, sat, sun}; – The enumeration constants are typically implicitly assigned the integer values, 0, 1, …, but can explicitly assigned any integer literal in the type’s definition 6.5 Array Types An array is a homogeneous aggregate of data elements in which an individual element is identified by its position in the aggregate, relative to the first element. The individual data elements of an array are of the same type. References to individual array elements are specified using subscription expressions. If any of the subscript expressions in a reference include variables, then the reference will require an addition run-time calculation to determine the address of the memory location being referenced. 6.5.1 Arrays and Indices Indexing (or subscripting) is a mapping from indices to elements. The mapping can be shown as: array_name (index_value_list) → an element C-based languages use [ ] to delimit array indices. Two distinct types are involved in an array type: – The element type, and – The type of the subscripts. The type of the subscript is often a sub-range of integers. 6.5.2 Subscript Bindings and Array Categories The binding of subscript type to an array variable is usually static, but the subscript value ranges are sometimes dynamically bound. Categories of arrays – There are four categories of arrays, based on the binding to subscript ranges, the binding to storage, and rom where the storage is allocated. Static Array is one in which the subscript ranges are statically bound, and storage allocation is static (done before run time). A fixed stack-dynamic array is one in which the subscript ranges are statically bound, but the allocation is done at elaboration time during execution. A fixed heap-dynamic array is similar to fixed stack-dynamic in which the subscript ranges are dynamically bound, and the storage allocation is dynamic, but they are both fixed after storage is allocated (i.e., binding is done when requested and storage is allocated from heap, not stack) A heap-dynamic array is one in which the subscript ranges are dynamically bound, and the storage allocation is dynamic, and can change any number of times during the array’s lifetime. 6.5.3 Array Initialization Some language allow initialization at the time of storage allocation. – Usually just a list of values that are put in the array in the order in which the array elements are stored in memory. C, C++, Java, and C# allow initialization of their arrays. Consider the following C declaration: int list [] = {4, 5, 7, 83} – The compiler sets the length of the array. Character Strings in C & C++ are implemented as arrays of char. These arrays can be initialized to string constants, as in char name [] = “Freddie”; //how many elements in array name? – The array will have 8 elements because all strings are terminated with a null character(zero), which is implicitly supplied by system for string constants. Arrays of strings in C and C++ can also be initialized with string literals. For example, char *names [] = {″Bob″, ″Jake″, ″Darcie″]; 6.5.4 Array Operations The most common array operations are assignment, catenation, comparison for equality and inequality, and slices. The C-based languages do not provide any array operations, except thought methods of Java, C++, and C#. Perl supports array assignments but does not support comparisons. Python’s arrays are called lists, although they have all the characteristics of dynamic arrays. Because the objects can be of any types, these arrays are heterogeneous. Python’s array assignments, but they are only reference changes. Python also supports array catenation and element membership operations Ruby also provides array catenation APL provides the most powerful array processing operations for vectors and matrixes as well as unary operators (for example, to reverse column elements) 6.5.5 Slices A slice of an array is some substructure of an array. It is a mechanism for referencing part of an array as a unit. If arrays cannot be manipulated as units in a language, that has no use for slices. Slices are only useful in languages that have array operations. Python vector = [2, 4, 6, 8, 10, 12, 14, 16] mat = [[1, 2, 3], [4, 5, 6], [7, 8, 9]] vector (3:6) is a three-element array, which is [8, 10, 12] mat[0:2] is the first and second element of the first row of mat, which is [1, 2] 6.6 Associative Arrays An associative array is an unordered collection of data elements that are indexed by an equal number of values called keys. – So each element of an associative array is in fact a pair of entities, a key and a value. Associative arrays are supported by the standard class libraries of Java, C++, C#, and F#. – Example: In Perl, associative arrays are often called hashes. Names begin with %; literals are delimited by parentheses. Hashes can be set to literal values with assignment statement, as in – Subscripting is done using braces and keys. So an assignment of 58850 to the element of %salaries with the key “Perry” would appear as follows: Python’s associative arrays, which are called dictionaries, are similar to those of Perl, except the values are all reference to objects. PHP’s arrays are both normal arrays and associative array. 6.7 Record Types A record is a possibly heterogeneous aggregate of data elements in which the individual elements are identified by names. In C, C++, and C#, records are supported with the struct data type. In C++, structures are a minor variation on classes. 6.7.1 Definitions of Records The fundamental difference between a record and an array is that record elements, or fields, are not referenced by indices. Instead, the fields are named with identifier, and references to the fields are made using these identifiers. The COBOL form of a record declaration, which is part of the data division of a COBOL program, is illustrated in the following example: COBOL uses level numbers to show nested records; others use recursive definition – The numbers 01, 02, and 05 that begin the lines of the record declaration are level numbers, which indicate by their relative values the hierarchical structure of the record. – The PICTURE clauses show the formats of the field storage locations, with X(20) specifying 20 alphanumeric characters and 99V99 specifying four decimal digits with decimal point in the middle. 6.8 Tuple Types A tuple is a data type that is similar to a record, except that the elements are not named Python – Closely related to its lists, but tuples are immutable – If a tuple needs to be changed, it can be converted to an array with the list function – Create with a tuple literal – Note that the elements of a tuple need not be of the same type – The elements of a tuple can be referenced with indexing in brackets, as in the following: – Tuple can be catenated with the plus (+) operator – They can be deleted with del statement 6.9 List Types Lists were first supported in the first functional programming language. Python Lists – The list data type also serves as Python’s arrays – Unlike Scheme, Common Lisp, ML, and F#, Python’s lists are mutable – Elements can be of any type – Create a list with an assignment – List elements are referenced with subscripting, with indices beginning at zero – List elements can be deleted with del 6.10 Pointer and Reference Types A pointer type in which the variables have a range of values that consists of memory addresses and a special value, nil. The value nil is not a valid address and is used to indicate that a pointer cannot currently be used to reference any memory cell. Pointers are designed for two distinct kinds of uses – Provide the power of indirect addressing – Provide a way to manage dynamic memory. A pointer can be used to access a location in the area where storage is dynamically created (usually called a heap) 6.10.1 Pointer Operations A pointer type usually includes two fundamental pointer operations, assignment and dereferencing. Assignment sets a pointer var’s value to some useful address. Dereferencing takes a reference through one level of indirection – In C++, dereferencing is explicitly specified with the (*) as a prefix unary operation. – – If ptr is a pointer var with the value 7080, and the cell whose address is 7080 has the value 206, then the assignment 6.11 Type Checking Type checking is the activity of ensuring that the operands of an operator are of compatible types. A compatible type is one that is either legal for the operator, or is allowed under language rules to be implicitly converted, by compiler-generated code, to a legal type. This automatic conversion is called a coercion. – Ex: an int variable and a float variable are added in Java, the value of the int variable is coerced to float and a floating-point is performed. A type error is the application of an operator to an operand of an inappropriate type. – Ex: in C, if an int value was passed to a function that expected a float value, a type error would occur (compilers did not check the types of parameters) If all type bindings are static, nearly all type checking can be static. If type bindings are dynamic, type checking must be dynamic and done at run-time. Summary A data type defines a collection of data values and a set of predefined operations on those values The primitive data types of most imperative languages include numeric, character, and Boolean types The user-defined enumeration and subrange types are convenient and add to the readability and reliability of programs An array is a homogeneous aggregate of data elements in which an individual element is identified by its position in the aggregate, relative to the first element An associative array is an unordered collection of data elements that are indexed by an equal number of values called keys. – Each element of an associative array is in fact a pair of entities, a key and a value There are four categories of arrays, based on the binding to subscript ranges, the binding to storage, and rom where the storage is allocated. – Static array: as in C++ array whose definition includes the static specifier – Fixed stack-dynamic array: as in C function (without the static specifier) – Fixed heap-dynamic array: as with Java’s objects – Heap dynamic array: as in Java’s ArrayList and C#'s List Summary A data type defines a collection of data values and a set of predefined operations on those values The primitive data types of most imperative languages include numeric, character, and Boolean types The user-defined enumeration and subrange types are convenient and add to the readability and reliability of programs An array is a homogeneous aggregate of data elements in which an individual element is identified by its position in the aggregate, relative to the first element An associative array is an unordered collection of data elements that are indexed by an equal number of values called keys. – Each element of an associative array is in fact a pair of entities, a key and a value There are four categories of arrays, based on the binding to subscript ranges, the binding to storage, and rom where the storage is allocated. – Static array: as in C++ array whose definition includes the static specifier – Fixed stack-dynamic array: as in C function (without the static specifier) – Fixed heap-dynamic array: as with Java’s objects – Heap dynamic array: as in Java’s ArrayList and C#'s List Tuples are similar to records, but do not have names for their constituent parts. They are part of Python, ML, and F#. Python’s tuples are closed to lists, but immutable Lists are staples of the functional programming languages, but are now also included in Python and C#. Python’s lists are mutable A pointer type in which the variables have a range of values that consists of memory addresses and a special value, nil – Pointers are used for addressing flexibility and to control dynamic storage management – Pointers have some inherent dangers: dangling pointers are difficult to avoid, and memory leakage can occur Reference types, such as those in Java and C#, provide heap management without the danger of pointers Strong typing is the concept of requiring that all type errors be detected The type equivalence rules of a language determine what operations are legal among the structured types of a language