CMPT 214 Programming Principles and Practice Lecture 19 PDF
University of Saskatchewan
2024
Noah Orensa
Summary
These are lecture notes for CMPT 214 Programming Principles and Practice, Lecture 19, covering miscellaneous and advanced Bash tools (AWK and sed), the C preprocessor, and assorted C features. The notes include reading references and examples.
Full Transcript
CMPT 214 Programming Principles and Practice
Lecture 19: Miscellaneous and advanced Bash tools, C, C preprocessor
Reading: Sobell: Chapter 14, 15; Kochan: Chapter 12; Kochan: Chapter 16
Original Slides Noah Orensa. Modified by Jon Lovering/Lauresa Stilling/Alexander Dumais/Dwight Makaroff. CMPT 214, 2024 Fall.

AWK - Sobell 14
The name is an acronym of the authors' names: Aho, Weinberger, and Kernighan.
Kernighan is also the co-author of the C book.
https://arstechnica.com/gadgets/2022/08/unix-legend-who-owes-us-nothing-keeps-fixing-foundational-awk-code/
Originally written in 1977, and still being developed and maintained.
A text parsing tool; data-driven as opposed to procedural.

Principles of AWK
Many C constructs
  conditionals, loops
  numeric and string variables
  regular expressions and relational expressions
  printf
  coprocess execution (gawk - GNU awk)
    gawk [options] [program] [file-list]
What does this look like?
    awk -f

Using AWK
Simple programs can go right after awk, with the program in quotes.
EX:
    % awk '/h/' cars
What do you think this does???
Options: look at the man page.
The main idea is data manipulation via pattern matching, so AWK is similar to grep, but it can also make changes to the lines.
    pattern [action]

AWK - patterns and actions
Patterns
  patterns are regular expressions
  BEGIN and END are 2 special patterns for pre/post processing
  a comma (,) between two patterns specifies a range of lines within the file
Actions
  the default is to just print the matched line
  print-type commands can be used with standard redirection
  | is a pipe, so the output goes to another command
  |& is a coprocess command (2-way exchange of data)

AWK - Variables/Functions/Operators
Variables
  $0, $1 - $n
  FILENAME, FS, NF, NR, OFS, ORS, RS
Built-in Functions
  what would you want for manipulating lines of text???
Operators
  just like C
Associative arrays - hash maps (this is where it came from)

AWK examples
    awk '$1 ~ /^h/' cars
    awk '$2 ~ /^[tm]/ {print $3, $2, "$" $5}' cars
    awk -f manip.awk *.txt
manip.awk:
    BEGIN {print "starting changes"}
    $1 ~ /z/ {print toupper($0)}
    END {print "done"}

SED - Stream Editor
batch editor -> edits a number of files at a time
also based on pattern matching and substitution
    sed [-n] program [file-list]
look familiar??
    sed [-n] -f program-file [file-list]

SED - Editor basics
    [address[, address]] instruction [argument-list]
for each line in the file
  for each line in the program/program file,
    if the line-number/address matches, perform the instruction on the line

SED - Examples
using what we already know
    sed '/line/ p' lines
    sed -n -f subs_demo lines
Contents of subs_demo:
    s/line/sentence/p

Program portability
Another use of the #define statement is that it helps to make programs more portable from one computer system to another.
E.g., recall the different outcomes of bitwise right shift on signed integers: logical right shift vs. arithmetic right shift.
One way to mitigate that is to use bitwise AND to force the leftmost bit to be 0 (logical right shift):
    #define INT_LEFTMOST_ZERO & 0x7FFFFFFF
    int x;
    ...
    (x >> 1) INT_LEFTMOST_ZERO;

Program portability
However, this requires knowing the size of the integer on the target machine and whether the machine performs arithmetic right shift.
E.g., on a machine that implements logical right shift, the #define could be
    #define INT_LEFTMOST_ZERO...
and on a machine with 16-bit integers and arithmetic right shift, the #define could be
    #define INT_LEFTMOST_ZERO & 0x7FFF
We will see later that #define statements can be added at compile time (in a makefile that detects the type of system).

Macros
Consider the statement that tests for leap years:
    if (year % 4 == 0 && year % 100 != 0 || year % 400 == 0)
One way to make the code more readable is to move this expression to a #define:
    #define IS_LEAP_YEAR year % 4 == 0 && year % 100 != 0 \
                         || year % 400 == 0
    ...
    if (IS_LEAP_YEAR)

Macros
The previous #define only works with a variable named year and will not work with any other variable name.
A #define statement can take arguments!
    #define IS_LEAP_YEAR(y) (y % 4 == 0 && y % 100 != 0 || y % 400 == 0)
    ...
    if (IS_LEAP_YEAR(year1) && IS_LEAP_YEAR(year2))
This is called a preprocessor macro.

Macros
It is also wise to always wrap the body of your #define, and each use of its parameters, in parentheses, so that the expansion cannot mix with the surrounding text in an unexpected way.
Consider the following:
    #define SQUARE(x) x * x
    ...
    int x = 5, y;
    y = SQUARE(x + 1);
SQUARE(x + 1) will be expanded as x + 1 * x + 1
This simplifies to 2x + 1 when order of operations is applied.
We mean (x + 1) * (x + 1), or x^2 + 2x + 1.

Macros
A safer way to define this macro:
    #define SQUARE(x) ((x) * (x))
    ...
    int x = 5, y;
    y = SQUARE(x + 1);
SQUARE(x + 1) will now be expanded as ((x + 1) * (x + 1))

Variable number of arguments to macros
A macro can be defined to take a variable number of arguments.
This is specified to the preprocessor by putting three dots at the end of the argument list.
The remaining arguments in the list are collectively referenced in the macro definition by the special identifier __VA_ARGS__
For example,
    #define DEBUG_PRINTF(...) printf("DEBUG: " __VA_ARGS__)

Variable number of arguments to macros
    DEBUG_PRINTF("Hello world!\ni = %i", i);
would then be expanded to
    printf("DEBUG: " "Hello world!\ni = %i", i)
Note: two string literals are concatenated if they appear one after the other in this way.
"DEBUG: " "Hello world!\ni = %i" is the same as "DEBUG: Hello world!\ni = %i"

The # operator
If you place a # in front of a parameter in a macro definition, the preprocessor creates a constant string out of the macro argument when the macro is invoked.
For example,
    #define STR(x) #x
This causes a statement like
    printf(STR(Programming in C is fun.\n));
to be expanded to
    printf("Programming in C is fun.\n");

The # operator
The preprocessor literally inserts double quotation marks around the actual macro argument.
Any double quotation marks or backslashes in the argument are preserved by the preprocessor.
e.g., STR("hello") is expanded to "\"hello\""

The # operator
A more practical example:
    #define PRINT_INT(var) printf(#var " = %i\n", var)
This causes the statement
    PRINT_INT(count)
to be expanded to
    printf("count" " = %i\n", count)

The ## operator
This operator is used in macro definitions to join two tokens together.
Suppose, for example, you have a list of variables x1 through x100.
You can write a macro called PRINT_X that simply takes as its argument an integer value 1 through 100 and displays the corresponding x variable:
    #define PRINT_X(n) PRINT_INT(x ## n)
This causes the statement
    PRINT_X(20)
to be expanded to
    PRINT_INT(x20)
which further expands to
    printf("x20" " = %i\n", x20)

Conditional compilation (#ifdef / #ifndef, #elif, #else, and #endif)
Unfortunately, a program sometimes must rely on system-dependent parameters (e.g., a pathname that might be specified differently on different systems).
    #ifdef UNIX
    # define DATADIR "/uxn1/data"
    #else
    # define DATADIR "\\usr\\data"
    #endif
The previous statements have the effect of defining DATADIR to "/uxn1/data" if the symbol UNIX has been previously defined, and to "\\usr\\data" otherwise. (The backslashes are doubled because a single backslash starts an escape sequence inside a C string literal.)

Conditional compilation (#ifdef / #ifndef, #elif, #else, and #endif)
To define the symbol UNIX, you could use the statement
    #define UNIX 1
or even
    #define UNIX

Conditional compilation (#ifdef / #ifndef, #elif, #else, and #endif)
Alternatively, most compilers also permit you to define a name to the preprocessor when the program is compiled, by using a special option to the compiler command.
The gcc command line
    gcc -D UNIX program.c
defines the name UNIX to the preprocessor, causing all #ifdef UNIX statements inside program.c to evaluate as TRUE.
Note that the -D UNIX must be typed before the program name on the command line.
A value can also be assigned to the defined name on the command line. For example,
    gcc -D GNUDIR=/c/gnustep program.c

The conditional operator
Perhaps the most unusual operator in the C language is one called the conditional operator.
Unlike all other operators in C, the conditional operator is a ternary operator (takes 3 operands).
The two symbols that are used to denote this operator are the question mark ? and the colon :
The general format of the operator is
    condition ?
    expression1 : expression2

The conditional operator
The condition is an expression, usually a relational expression, that is evaluated first whenever the conditional operator is encountered.
If the result of the evaluation of condition is TRUE (nonzero), then expression1 is evaluated and the result of the evaluation becomes the result of the operation.
If condition evaluates FALSE (zero), then expression2 is evaluated and its result becomes the result of the operation.
For example,
    int x, y, max;
    ...
    max = x > y ? x : y;

Type qualifiers
Type qualifiers can be used in front of variables to give the compiler more information about the intended use of the variable and, in some cases, to help it generate better code.

Type qualifiers: register
If a function uses a particular variable heavily, you can request that access to the variable be made as fast as possible by the compiler.
Typically, this means requesting that it be stored in one of the machine’s registers (you will learn about those in CMPT 215) when the function is executed.
This is done by prefixing the declaration of the variable by the keyword register, as follows:
    register int index;
    register char *textPtr;
(Strictly speaking, the C standard classifies register as a storage-class specifier rather than a type qualifier.)

Type qualifiers: volatile
The volatile qualifier is sort of the inverse to const.
It tells the compiler explicitly that the specified variable may change its value at any time.
It’s included in the language to prevent the compiler from optimizing away seemingly redundant assignments to a variable, or repeated examination of a variable without its value seemingly changing.
A good example is to consider an I/O port on a device (e.g., a network interface card or NIC).
Suppose you have an output port to a device that’s pointed to by a variable in your program called outPort.

Type qualifiers: volatile
If you want to write two characters to the port, for example an O followed by an N, you might have the following code:
    *outPort = 'O';
    *outPort = 'N';
A smart compiler might notice two successive assignments to the same location and, because outPort isn’t being modified in between, simply remove the first assignment from the program.
To prevent this from happening, you declare outPort to be a pointer to volatile data, as follows:
    volatile char *outPort;

Type qualifiers: restrict
Like the register qualifier, restrict is an optimization hint for the compiler. As such, the compiler can choose to ignore it.
It is used to tell the compiler that a particular pointer is the only reference (either indirect or direct) to the value it points to throughout its scope,
i.e., the same value is not referenced by any other pointer or variable within that scope.

Type qualifiers: restrict
    int * restrict intPtrA;
    int * restrict intPtrB;
The above lines tell the compiler that, for the duration of the scope in which intPtrA and intPtrB are defined, they will never access the same value.
Their use for pointing to integers inside an array, for example, is mutually exclusive.
Based on this information, the compiler can optimize away the redundant second read in statements like
    x = *intPtrA;
    *intPtrB = 2;
    x = *intPtrA;

Unions
One of the more unusual constructs in the C programming language is the union.
This construct is used mainly in more advanced programming applications in which it is necessary to store different types of data in the same storage area.
For example, if you want to define a single variable called x, which could be used to store a single character, a floating-point number, or an integer:
    union mixed {
        char c;
        float f;
        int i;
    };

Unions
You can then declare a variable to be of type union mixed:
    union mixed x;
Members can be accessed using the dot operator, like in structures.
The declaration for a union is identical to that of a structure, except the keyword union is used where the keyword struct is otherwise specified.
The real difference between structures and unions has to do with the way memory is allocated.
The declaration union mixed x does not define x to contain three distinct members called c, f, and i; rather, it defines x to contain a single member that is called either c, f, or i.

Unions
Because the float, char, and int members of x all exist in the same place in memory, only one value can be stored in x at a time.
Furthermore, it is your responsibility to ensure that the value retrieved from a union is consistent with the way it was last stored in the union.
A union can be defined to contain as many members as desired.
The C compiler ensures that enough storage is allocated to accommodate the largest member of the union.
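The storage rules described above can be checked with a small sketch. It reuses the union mixed declaration from the slides; the function names (check_union_layout, store_then_load_int, overwrite_with_float) are hypothetical helpers made up for this illustration, not part of the lecture.

```c
#include <assert.h>
#include <stddef.h>

/* Same declaration as on the slides: c, f, and i share one storage area. */
union mixed {
    char  c;
    float f;
    int   i;
};

/* Checks the two layout facts stated above: every member starts at
   offset 0, and the union is big enough for its largest member. */
void check_union_layout(void) {
    assert(offsetof(union mixed, c) == 0);
    assert(offsetof(union mixed, f) == 0);
    assert(offsetof(union mixed, i) == 0);
    assert(sizeof(union mixed) >= sizeof(float));
    assert(sizeof(union mixed) >= sizeof(int));
}

/* Stores an int and reads it back through the same member
   (hypothetical helper; reading x.f here instead would not be meaningful). */
int store_then_load_int(int value) {
    union mixed x;
    x.i = value;
    return x.i;
}

/* Overwrites the int with a float: the union now holds only the float. */
float overwrite_with_float(int first, float second) {
    union mixed x;
    x.i = first;
    x.f = second;   /* x.i is no longer the value stored above */
    return x.f;
}
```

Calling check_union_layout() from main is enough to confirm the layout on a given compiler; the other two helpers show that the value read back is only meaningful through the member that was written last.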
Structures can be defined that contain unions.
Pointers to unions can also be declared, and their syntax and rules for performing operations are the same as for structures.

Unions
One of the members of a union variable can be initialized.
If no member name is specified, the first member of the union is set to the specified value:
    union mixed x = { '#' };
This sets the first member of x, which is c, to the character '#'.
By specifying the member name, any one member of the union can be initialized:
    union mixed x = { .f = 123.456 };

Unions
Using unions, it is possible to implement dynamic typing. For example,
    struct dynamic_variable {
        char *name;
        enum symbolType type;
        union {
            int i;
            float f;
            char c;
        } data;
    };
The type member (of type enum symbolType) has to be checked every time we want to read the variable, to know which of i, f, or c to use.
Similarly, type has to be updated every time we want to write the variable.

Command line arguments
Command line arguments can also be passed to C programs:
    int main(int argc, char **argv) { ... }
or
    int main(int argc, char *argv[]) { ... }

Command line arguments
argc (argument count) holds the number of command line arguments passed to the program, including the program name itself.
argv (argument vector) is an array of char * values, each pointing to a string.
argv[1] points to the first argument
argv[2] points to the second argument
...
argv[argc - 1] points to the last argument
argv[0] is a special argument that shows how the program was called, similar to $0 in Bash

Things you should probably??? never do

The goto statement (it depends)
Execution of a goto statement causes a direct branch to be made to a specified point in the program.
This branch is made immediately and unconditionally upon execution of the goto.
To identify where in the program the branch is to be made, a label is needed.
A label is a name that is formed with the same rules as variable names and must be immediately followed by a colon.
The label is placed directly before the statement to which the branch is to be made and must appear in the same function as the goto.

The goto statement
For example:
    goto out_of_data;
The above statement causes the program to branch immediately to the statement that is preceded by the label out_of_data.
This label can be located anywhere in the function, before or after the goto, and might be used as shown:
    out_of_data: printf ("Unexpected end of data.\n");
    ...

The goto statement
Avoid using the goto statement as much as possible.
Programmers who are undisciplined frequently abuse the goto statement to branch to other portions of their code.
The goto statement interrupts the normal sequential flow of a program.
Using many gotos in a program can make it impossible to decipher.
This leads to “spaghetti code”.

The null statement
C permits a solitary semicolon to be placed wherever a normal program statement can appear.
The effect of such a statement, known as the null statement, is that nothing is done.
Although this might seem useless, it is often used by C programmers in while, for, and do loops.
We have seen an example of a null statement:
    while (loop-condition);

The null statement
Other examples (empty parts of a for loop header):
skipping the initialization in a for loop
    for (; loop-condition; loop-statement)
skipping the loop statement in a for loop
    for (initial-statement; loop-condition;)
infinite loop
    for (;;)

The comma operator
The comma operator can be used to separate multiple expressions anywhere that a valid C expression can be used.
The expressions are evaluated from left to right.
For example,
    for (i = 0, j = 100; i != 10; ++i, j -= 10)
    ...
The initial statement is i = 0, j = 100, and the loop statement is ++i, j -= 10.
Because all operators in C produce a value, the value of the comma operator is that of the rightmost expression.

The comma operator
Another example,
    while (i < 100)
        sum += data[i], ++i;
Note that the curly braces are not needed, since sum += data[i], ++i is one statement.
Note also that a comma used to separate arguments in a function call, or variable names in a list of declarations, for example, is a separate syntactic entity and is not an example of the use of the comma operator.
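The null statement and the comma operator can be seen working in a short sketch. The loops mirror the ones on the slides, but the function names (sum_array, string_length, rightmost_value, final_j) and their parameters are hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Sums data[0..n-1]; the comma operator joins two expressions into the
   single statement that forms the loop body, so no braces are needed. */
int sum_array(const int *data, size_t n) {
    int sum = 0;
    size_t i = 0;
    assert(data != NULL || n == 0);
    while (i < n)
        sum += data[i], ++i;
    return sum;
}

/* Counts characters using a null statement as the loop body:
   all of the work happens in the loop condition. */
size_t string_length(const char *s) {
    size_t len = 0;
    while (s[len++] != '\0')
        ;               /* null statement */
    return len - 1;     /* the terminating '\0' was counted once */
}

/* The value of a comma expression is that of its rightmost operand. */
int rightmost_value(void) {
    int a = 0;
    int b = (a = 3, a + 4);   /* a becomes 3 first, then b gets 3 + 4 */
    return b;
}

/* Walks i up and j down together, as in the for loop on the slide. */
int final_j(void) {
    int i, j;
    for (i = 0, j = 100; i != 10; ++i, j -= 10)
        ;               /* null statement again: the header does everything */
    return j;
}
```

Writing the null statement on its own line, with a comment, is a common way to make it clear that the empty body is intentional rather than a stray semicolon.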
Next
assorted principles topics
Review
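As a review aid, the preprocessor techniques from this lecture (parenthesized function-like macros, the # and ## operators, and __VA_ARGS__) can be exercised together in one sketch. SQUARE and STR follow the slides, and IS_LEAP_YEAR is the slide's macro with extra parentheses added; PASTE, FORMAT_DEBUG, and the wrapper functions are hypothetical names introduced only for this example, and snprintf is used in place of printf so that the result can be inspected.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parenthesized body and parameters, as the slides recommend. */
#define SQUARE(x) ((x) * (x))
#define IS_LEAP_YEAR(y) ((((y) % 4 == 0) && ((y) % 100 != 0)) || ((y) % 400 == 0))

/* # turns the argument into a string literal; ## joins two tokens. */
#define STR(x) #x
#define PASTE(a, b) a ## b          /* hypothetical helper name */

/* Variadic macro: __VA_ARGS__ begins with the format string, so the two
   adjacent string literals are concatenated. Hypothetical name. */
#define FORMAT_DEBUG(buf, size, ...) \
    snprintf((buf), (size), "DEBUG: " __VA_ARGS__)

int square_of_successor(int x) {
    return SQUARE(x + 1);           /* ((x + 1) * (x + 1)) */
}

int stringized_is_count(void) {
    return strcmp(STR(count), "count") == 0;
}

int read_x20(void) {
    int x20 = 20;
    return PASTE(x, 20);            /* expands to x20 */
}

/* Returns the number of characters written, as snprintf does. */
int format_debug(char *buf, size_t size, int i) {
    assert(buf != NULL && size > 0);
    return FORMAT_DEBUG(buf, size, "i = %i", i);
}
```

For instance, format_debug(buf, sizeof buf, 5) fills buf with "DEBUG: i = 5", and square_of_successor(5) evaluates ((5 + 1) * (5 + 1)) rather than the unparenthesized 5 + 1 * 5 + 1.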