The C Programming Language PDF

Summary

This book is a tutorial introduction to C programming for beginners. It covers various fundamental concepts, including variables, loops, functions, and arrays. This textbook is beneficial for those who want to learn the fundamental concepts of programming in C.

Full Transcript

1 2 Preface........................................................................................................................................ 6 Prefa...

1 2 Preface........................................................................................................................................ 6 Preface to the first edition.......................................................................................................... 8 Chapter 1 - A Tutorial Introduction........................................................................................... 9 1.1 Getting Started.................................................................................................................. 9 1.2 Variables and Arithmetic Expressions........................................................................... 11 1.3 The for statement............................................................................................................ 16 1.4 Symbolic Constants........................................................................................................ 17 1.5 Character Input and Output............................................................................................ 18 1.5.1 File Copying............................................................................................................ 18 1.5.2 Character Counting................................................................................................. 20 1.5.3 Line Counting.......................................................................................................... 21 1.5.4 Word Counting........................................................................................................ 22 1.6 Arrays............................................................................................................................. 23 1.7 Functions........................................................................................................................ 25 1.8 Arguments - Call by Value............................................................................................. 28 1.9 Character Arrays............................................................................................................ 29 1.10 External Variables and Scope...................................................................................... 31 Chapter 2 - Types, Operators and Expressions........................................................................ 35 2.1 Variable Names.............................................................................................................. 35 2.2 Data Types and Sizes..................................................................................................... 35 2.3 Constants........................................................................................................................ 36 2.4 Declarations.................................................................................................................... 39 2.5 Arithmetic Operators...................................................................................................... 40 2.6 Relational and Logical Operators................................................................................... 40 2.7 Type Conversions........................................................................................................... 41 2.8 Increment and Decrement Operators.............................................................................. 44 2.9 Bitwise Operators........................................................................................................... 46 2.10 Assignment Operators and Expressions....................................................................... 47 2.11 Conditional Expressions............................................................................................... 49 2.12 Precedence and Order of Evaluation............................................................................ 49 Chapter 3 - Control Flow......................................................................................................... 52 3.1 Statements and Blocks................................................................................................... 52 3.2 If-Else............................................................................................................................. 52 3.3 Else-If............................................................................................................................. 53 3.4 Switch............................................................................................................................. 54 3.5 Loops - While and For................................................................................................... 56 3.6 Loops - Do-While........................................................................................................... 58 3.7 Break and Continue........................................................................................................ 59 3.8 Goto and labels............................................................................................................... 60 Chapter 4 - Functions and Program Structure.......................................................................... 62 4.1 Basics of Functions........................................................................................................ 62 4.2 Functions Returning Non-integers................................................................................. 65 4.3 External Variables.......................................................................................................... 67 4.4 Scope Rules.................................................................................................................... 72 4.5 Header Files.................................................................................................................... 73 4.6 Static Variables.............................................................................................................. 75 4.7 Register Variables.......................................................................................................... 75 4.8 Block Structure............................................................................................................... 76 4.9 Initialization................................................................................................................... 76 4.10 Recursion...................................................................................................................... 78 4.11 The C Preprocessor...................................................................................................... 79 3 4.11.1 File Inclusion......................................................................................................... 79 4.11.2 Macro Substitution................................................................................................ 80 4.11.3 Conditional Inclusion............................................................................................ 82 Chapter 5 - Pointers and Arrays............................................................................................... 83 5.1 Pointers and Addresses................................................................................................... 83 5.2 Pointers and Function Arguments.................................................................................. 84 5.3 Pointers and Arrays........................................................................................................ 87 5.4 Address Arithmetic........................................................................................................ 90 5.5 Character Pointers and Functions................................................................................... 93 5.6 Pointer Arrays; Pointers to Pointers............................................................................... 96 5.7 Multi-dimensional Arrays.............................................................................................. 99 5.8 Initialization of Pointer Arrays..................................................................................... 101 5.9 Pointers vs. Multi-dimensional Arrays......................................................................... 101 5.10 Command-line Arguments......................................................................................... 102 5.11 Pointers to Functions.................................................................................................. 106 5.12 Complicated Declarations.......................................................................................... 108 Chapter 6 - Structures............................................................................................................. 114 6.1 Basics of Structures...................................................................................................... 114 6.2 Structures and Functions.............................................................................................. 116 6.3 Arrays of Structures..................................................................................................... 118 6.4 Pointers to Structures................................................................................................... 122 6.5 Self-referential Structures............................................................................................ 124 6.6 Table Lookup............................................................................................................... 127 6.7 Typedef......................................................................................................................... 129 6.8 Unions.......................................................................................................................... 131 6.9 Bit-fields....................................................................................................................... 132 Chapter 7 - Input and Output.................................................................................................. 135 7.1 Standard Input and Output........................................................................................... 135 7.2 Formatted Output - printf............................................................................................. 137 7.3 Variable-length Argument Lists................................................................................... 138 7.4 Formatted Input - Scanf................................................................................................ 140 7.5 File Access................................................................................................................... 142 7.6 Error Handling - Stderr and Exit.................................................................................. 145 7.7 Line Input and Output.................................................................................................. 146 7.8 Miscellaneous Functions.............................................................................................. 147 7.8.1 String Operations................................................................................................... 147 7.8.2 Character Class Testing and Conversion.............................................................. 148 7.8.3 Ungetc................................................................................................................... 148 7.8.4 Command Execution............................................................................................. 148 7.8.5 Storage Management............................................................................................. 148 7.8.6 Mathematical Functions........................................................................................ 149 7.8.7 Random Number generation................................................................................. 149 Chapter 8 - The UNIX System Interface................................................................................ 151 8.1 File Descriptors............................................................................................................ 151 8.2 Low Level I/O - Read and Write.................................................................................. 152 8.3 Open, Creat, Close, Unlink.......................................................................................... 153 8.4 Random Access - Lseek............................................................................................... 155 8.5 Example - An implementation of Fopen and Getc....................................................... 156 8.6 Example - Listing Directories...................................................................................... 159 8.7 Example - A Storage Allocator.................................................................................... 163 Appendix A - Reference Manual........................................................................................... 168 A.1 Introduction................................................................................................................. 168 4 A.2 Lexical Conventions.................................................................................................... 168 A.2.1 Tokens.................................................................................................................. 168 A.2.2 Comments............................................................................................................. 168 A.2.3 Identifiers.............................................................................................................. 168 A.2.4 Keywords.............................................................................................................. 169 A.2.5 Constants.............................................................................................................. 169 A.2.6 String Literals....................................................................................................... 171 A.3 Syntax Notation........................................................................................................... 171 A.4 Meaning of Identifiers................................................................................................. 171 A.4.1 Storage Class........................................................................................................ 171 A.4.2 Basic Types.......................................................................................................... 172 A.4.3 Derived types........................................................................................................ 173 A.4.4 Type Qualifiers..................................................................................................... 173 A.5 Objects and Lvalues.................................................................................................... 173 A.6 Conversions................................................................................................................. 173 A.6.1 Integral Promotion................................................................................................ 174 A.6.2 Integral Conversions............................................................................................. 174 A.6.3 Integer and Floating.............................................................................................. 174 A.6.4 Floating Types...................................................................................................... 174 A.6.5 Arithmetic Conversions........................................................................................ 174 A.6.6 Pointers and Integers............................................................................................ 175 A.6.7 Void...................................................................................................................... 176 A.6.8 Pointers to Void.................................................................................................... 176 A.7 Expressions.................................................................................................................. 176 A.7.1 Pointer Conversion............................................................................................... 177 A.7.2 Primary Expressions............................................................................................. 177 A.7.3 Postfix Expressions.............................................................................................. 177 A.7.4 Unary Operators................................................................................................... 179 A.7.5 Casts..................................................................................................................... 181 A.7.6 Multiplicative Operators....................................................................................... 181 A.7.7 Additive Operators............................................................................................... 182 A.7.8 Shift Operators..................................................................................................... 182 A.7.9 Relational Operators............................................................................................. 183 A.7.10 Equality Operators.............................................................................................. 183 A.7.11 Bitwise AND Operator....................................................................................... 183 A.7.12 Bitwise Exclusive OR Operator......................................................................... 184 A.7.13 Bitwise Inclusive OR Operator.......................................................................... 184 A.7.14 Logical AND Operator....................................................................................... 184 A.7.15 Logical OR Operator.......................................................................................... 184 A.7.16 Conditional Operator.......................................................................................... 184 A.7.17 Assignment Expressions..................................................................................... 185 A.7.18 Comma Operator................................................................................................ 185 A.7.19 Constant Expressions......................................................................................... 186 A.8 Declarations................................................................................................................. 186 A.8.1 Storage Class Specifiers....................................................................................... 187 A.8.2 Type Specifiers..................................................................................................... 188 A.8.3 Structure and Union Declarations........................................................................ 188 A.8.4 Enumerations........................................................................................................ 191 A.8.5 Declarators............................................................................................................ 192 A.8.6 Meaning of Declarators........................................................................................ 193 A.8.7 Initialization.......................................................................................................... 196 A.8.8 Type names........................................................................................................... 198 5 A.8.9 Typedef................................................................................................................. 199 A.8.10 Type Equivalence............................................................................................... 199 A.9 Statements................................................................................................................... 199 A.9.1 Labeled Statements............................................................................................... 200 A.9.2 Expression Statement........................................................................................... 200 A.9.3 Compound Statement........................................................................................... 200 A.9.4 Selection Statements............................................................................................. 201 A.9.5 Iteration Statements.............................................................................................. 201 A.9.6 Jump statements................................................................................................... 202 A.10 External Declarations................................................................................................ 203 A.10.1 Function Definitions........................................................................................... 203 A.10.2 External Declarations......................................................................................... 204 A.11 Scope and Linkage.................................................................................................... 205 A.11.1 Lexical Scope..................................................................................................... 205 A.11.2 Linkage............................................................................................................... 206 A.12 Preprocessing............................................................................................................. 206 A.12.1 Trigraph Sequences............................................................................................ 207 A.12.2 Line Splicing...................................................................................................... 207 A.12.3 Macro Definition and Expansion....................................................................... 207 A.12.4 File Inclusion...................................................................................................... 209 A.12.5 Conditional Compilation.................................................................................... 210 A.12.6 Line Control....................................................................................................... 211 A.12.7 Error Generation................................................................................................. 211 A.12.8 Pragmas.............................................................................................................. 212 A.12.9 Null directive...................................................................................................... 212 A.12.10 Predefined names............................................................................................. 212 A.13 Grammar.................................................................................................................... 212 Appendix B - Standard Library.............................................................................................. 220 B.1 Input and Output:........................................................................................ 220 B.1.1 File Operations..................................................................................................... 220 B.1.2 Formatted Output.................................................................................................. 222 B.1.3 Formatted Input.................................................................................................... 223 B.1.4 Character Input and Output Functions.................................................................. 225 B.1.5 Direct Input and Output Functions....................................................................... 225 B.1.6 File Positioning Functions.................................................................................... 226 B.1.7 Error Functions..................................................................................................... 226 B.2 Character Class Tests:................................................................................ 226 B.3 String Functions:........................................................................................ 227 B.4 Mathematical Functions:............................................................................. 228 B.5 Utility Functions:....................................................................................... 229 B.6 Diagnostics:................................................................................................ 231 B.7 Variable Argument Lists:.......................................................................... 231 B.8 Non-local Jumps:...................................................................................... 232 B.9 Signals:...................................................................................................... 232 B.10 Date and Time Functions:.......................................................................... 233 B.11 Implementation-defined Limits: and............................................................................................................................................ 234 Appendix C - Summary of Changes...................................................................................... 236 6 Preface The computing world has undergone a revolution since the publication of The C Programming Language in 1978. Big computers are much bigger, and personal computers have capabilities that rival mainframes of a decade ago. During this time, C has changed too, although only modestly, and it has spread far beyond its origins as the language of the UNIX operating system. The growing popularity of C, the changes in the language over the years, and the creation of compilers by groups not involved in its design, combined to demonstrate a need for a more precise and more contemporary definition of the language than the first edition of this book provided. In 1983, the American National Standards Institute (ANSI) established a committee whose goal was to produce ``an unambiguous and machine-independent definition of the language C'', while still retaining its spirit. The result is the ANSI standard for C. The standard formalizes constructions that were hinted but not described in the first edition, particularly structure assignment and enumerations. It provides a new form of function declaration that permits cross-checking of definition with use. It specifies a standard library, with an extensive set of functions for performing input and output, memory management, string manipulation, and similar tasks. It makes precise the behavior of features that were not spelled out in the original definition, and at the same time states explicitly which aspects of the language remain machine-dependent. This Second Edition of The C Programming Language describes C as defined by the ANSI standard. Although we have noted the places where the language has evolved, we have chosen to write exclusively in the new form. For the most part, this makes no significant difference; the most visible change is the new form of function declaration and definition. Modern compilers already support most features of the standard. We have tried to retain the brevity of the first edition. C is not a big language, and it is not well served by a big book. We have improved the exposition of critical features, such as pointers, that are central to C programming. We have refined the original examples, and have added new examples in several chapters. For instance, the treatment of complicated declarations is augmented by programs that convert declarations into words and vice versa. As before, all examples have been tested directly from the text, which is in machine-readable form. Appendix A, the reference manual, is not the standard, but our attempt to convey the essentials of the standard in a smaller space. It is meant for easy comprehension by programmers, but not as a definition for compiler writers -- that role properly belongs to the standard itself. Appendix B is a summary of the facilities of the standard library. It too is meant for reference by programmers, not implementers. Appendix C is a concise summary of the changes from the original version. As we said in the preface to the first edition, C ``wears well as one's experience with it grows''. With a decade more experience, we still feel that way. We hope that this book will help you learn C and use it well. 7 We are deeply indebted to friends who helped us to produce this second edition. Jon Bently, Doug Gwyn, Doug McIlroy, Peter Nelson, and Rob Pike gave us perceptive comments on almost every page of draft manuscripts. We are grateful for careful reading by Al Aho, Dennis Allison, Joe Campbell, G.R. Emlin, Karen Fortgang, Allen Holub, Andrew Hume, Dave Kristol, John Linderman, Dave Prosser, Gene Spafford, and Chris van Wyk. We also received helpful suggestions from Bill Cheswick, Mark Kernighan, Andy Koenig, Robin Lake, Tom London, Jim Reeds, Clovis Tondo, and Peter Weinberger. Dave Prosser answered many detailed questions about the ANSI standard. We used Bjarne Stroustrup's C++ translator extensively for local testing of our programs, and Dave Kristol provided us with an ANSI C compiler for final testing. Rich Drechsler helped greatly with typesetting. Our sincere thanks to all. Brian W. Kernighan Dennis M. Ritchie 8 Preface to the first edition C is a general-purpose programming language with features economy of expression, modern flow control and data structures, and a rich set of operators. C is not a ``very high level'' language, nor a ``big'' one, and is not specialized to any particular area of application. But its absence of restrictions and its generality make it more convenient and effective for many tasks than supposedly more powerful languages. C was originally designed for and implemented on the UNIX operating system on the DEC PDP-11, by Dennis Ritchie. The operating system, the C compiler, and essentially all UNIX applications programs (including all of the software used to prepare this book) are written in C. Production compilers also exist for several other machines, including the IBM System/370, the Honeywell 6000, and the Interdata 8/32. C is not tied to any particular hardware or system, however, and it is easy to write programs that will run without change on any machine that supports C. This book is meant to help the reader learn how to program in C. It contains a tutorial introduction to get new users started as soon as possible, separate chapters on each major feature, and a reference manual. Most of the treatment is based on reading, writing and revising examples, rather than on mere statements of rules. For the most part, the examples are complete, real programs rather than isolated fragments. All examples have been tested directly from the text, which is in machine-readable form. Besides showing how to make effective use of the language, we have also tried where possible to illustrate useful algorithms and principles of good style and sound design. The book is not an introductory programming manual; it assumes some familiarity with basic programming concepts like variables, assignment statements, loops, and functions. Nonetheless, a novice programmer should be able to read along and pick up the language, although access to more knowledgeable colleague will help. In our experience, C has proven to be a pleasant, expressive and versatile language for a wide variety of programs. It is easy to learn, and it wears well as on's experience with it grows. We hope that this book will help you to use it well. The thoughtful criticisms and suggestions of many friends and colleagues have added greatly to this book and to our pleasure in writing it. In particular, Mike Bianchi, Jim Blue, Stu Feldman, Doug McIlroy Bill Roome, Bob Rosin and Larry Rosler all read multiple volumes with care. We are also indebted to Al Aho, Steve Bourne, Dan Dvorak, Chuck Haley, Debbie Haley, Marion Harris, Rick Holt, Steve Johnson, John Mashey, Bob Mitze, Ralph Muha, Peter Nelson, Elliot Pinson, Bill Plauger, Jerry Spivack, Ken Thompson, and Peter Weinberger for helpful comments at various stages, and to Mile Lesk and Joe Ossanna for invaluable assistance with typesetting. Brian W. Kernighan Dennis M. Ritchie 9 Chapter 1 - A Tutorial Introduction Let us begin with a quick introduction in C. Our aim is to show the essential elements of the language in real programs, but without getting bogged down in details, rules, and exceptions. At this point, we are not trying to be complete or even precise (save that the examples are meant to be correct). We want to get you as quickly as possible to the point where you can write useful programs, and to do that we have to concentrate on the basics: variables and constants, arithmetic, control flow, functions, and the rudiments of input and output. We are intentionally leaving out of this chapter features of C that are important for writing bigger programs. These include pointers, structures, most of C's rich set of operators, several control- flow statements, and the standard library. This approach and its drawbacks. Most notable is that the complete story on any particular feature is not found here, and the tutorial, by being brief, may also be misleading. And because the examples do not use the full power of C, they are not as concise and elegant as they might be. We have tried to minimize these effects, but be warned. Another drawback is that later chapters will necessarily repeat some of this chapter. We hope that the repetition will help you more than it annoys. In any case, experienced programmers should be able to extrapolate from the material in this chapter to their own programming needs. Beginners should supplement it by writing small, similar programs of their own. Both groups can use it as a framework on which to hang the more detailed descriptions that begin in Chapter 2. 1.1 Getting Started The only way to learn a new programming language is by writing programs in it. The first program to write is the same for all languages: Print the words hello, world This is a big hurdle; to leap over it you have to be able to create the program text somewhere, compile it successfully, load it, run it, and find out where your output went. With these mechanical details mastered, everything else is comparatively easy. In C, the program to print ``hello, world'' is #include main() { printf("hello, world\n"); } Just how to run this program depends on the system you are using. As a specific example, on the UNIX operating system you must create the program in a file whose name ends in ``.c'', such as hello.c, then compile it with the command cc hello.c 10 If you haven't botched anything, such as omitting a character or misspelling something, the compilation will proceed silently, and make an executable file called a.out. If you run a.out by typing the command a.out it will print hello, world On other systems, the rules will be different; check with a local expert. Now, for some explanations about the program itself. A C program, whatever its size, consists of functions and variables. A function contains statements that specify the computing operations to be done, and variables store values used during the computation. C functions are like the subroutines and functions in Fortran or the procedures and functions of Pascal. Our example is a function named main. Normally you are at liberty to give functions whatever names you like, but ``main'' is special - your program begins executing at the beginning of main. This means that every program must have a main somewhere. main will usually call other functions to help perform its job, some that you wrote, and others from libraries that are provided for you. The first line of the program, #include tells the compiler to include information about the standard input/output library; the line appears at the beginning of many C source files. The standard library is described in Chapter 7 and Appendix B. One method of communicating data between functions is for the calling function to provide a list of values, called arguments, to the function it calls. The parentheses after the function name surround the argument list. In this example, main is defined to be a function that expects no arguments, which is indicated by the empty list ( ). #include include information about standard library main() define a function called main that received no argument values { statements of main are enclosed in braces printf("hello, world\n"); main calls library function printf to print this sequence of characters } \n represents the newline character The first C program The statements of a function are enclosed in braces { }. The function main contains only one statement, printf("hello, world\n"); 11 A function is called by naming it, followed by a parenthesized list of arguments, so this calls the function printf with the argument "hello, world\n". printf is a library function that prints output, in this case the string of characters between the quotes. A sequence of characters in double quotes, like "hello, world\n", is called a character string or string constant. For the moment our only use of character strings will be as arguments for printf and other functions. The sequence \n in the string is C notation for the newline character, which when printed advances the output to the left margin on the next line. If you leave out the \n (a worthwhile experiment), you will find that there is no line advance after the output is printed. You must use \n to include a newline character in the printf argument; if you try something like printf("hello, world "); the C compiler will produce an error message. printf never supplies a newline character automatically, so several calls may be used to build up an output line in stages. Our first program could just as well have been written #include main() { printf("hello, "); printf("world"); printf("\n"); } to produce identical output. Notice that \n represents only a single character. An escape sequence like \n provides a general and extensible mechanism for representing hard-to-type or invisible characters. Among the others that C provides are \t for tab, \b for backspace, \" for the double quote and \\ for the backslash itself. There is a complete list in Section 2.3. Exercise 1-1. Run the ``hello, world'' program on your system. Experiment with leaving out parts of the program, to see what error messages you get. Exercise 1-2. Experiment to find out what happens when prints's argument string contains \c, where c is some character not listed above. 1.2 Variables and Arithmetic Expressions The next program uses the formula oC=(5/9)(oF-32) to print the following table of Fahrenheit temperatures and their centigrade or Celsius equivalents: 12 1 -17 20 -6 40 4 60 15 80 26 100 37 120 48 140 60 160 71 180 82 200 93 220 104 240 115 260 126 280 137 300 148 The program itself still consists of the definition of a single function named main. It is longer than the one that printed ``hello, world'', but not complicated. It introduces several new ideas, including comments, declarations, variables, arithmetic expressions, loops , and formatted output. #include main() { int fahr, celsius; int lower, upper, step; lower = 0; upper = 300; step = 20; fahr = lower; while (fahr = '0' && c = '0' && c = '0' && c = 0 */ int power(int base, int n) { int i, p; p = 1; for (i = 1; i = 0 */ power(base, n) int base, n; { int i, p; p = 1; for (i = 1; i = 0; version 2 */ int power(int base, int n) { int p; for (p = 1; n > 0; --n) p = p * base; return p; } 29 The parameter n is used as a temporary variable, and is counted down (a for loop that runs backwards) until it becomes zero; there is no longer a need for the variable i. Whatever is done to n inside power has no effect on the argument that power was originally called with. When necessary, it is possible to arrange for a function to modify a variable in a calling routine. The caller must provide the address of the variable to be set (technically a pointer to the variable), and the called function must declare the parameter to be a pointer and access the variable indirectly through it. We will cover pointers in Chapter 5. The story is different for arrays. When the name of an array is used as an argument, the value passed to the function is the location or address of the beginning of the array - there is no copying of array elements. By subscripting this value, the function can access and alter any argument of the array. This is the topic of the next section. 1.9 Character Arrays The most common type of array in C is the array of characters. To illustrate the use of character arrays and functions to manipulate them, let's write a program that reads a set of text lines and prints the longest. The outline is simple enough: while (there's another line) if (it's longer than the previous longest) (save it) (save its length) print longest line This outline makes it clear that the program divides naturally into pieces. One piece gets a new line, another saves it, and the rest controls the process. Since things divide so nicely, it would be well to write them that way too. Accordingly, let us first write a separate function getline to fetch the next line of input. We will try to make the function useful in other contexts. At the minimum, getline has to return a signal about possible end of file; a more useful design would be to return the length of the line, or zero if end of file is encountered. Zero is an acceptable end-of-file return because it is never a valid line length. Every text line has at least one character; even a line containing only a newline has length 1. When we find a line that is longer than the previous longest line, it must be saved somewhere. This suggests a second function, copy, to copy the new line to a safe place. Finally, we need a main program to control getline and copy. Here is the result. 30 #include #define MAXLINE 1000 int getline(char line[], int maxline); void copy(char to[], char from[]); main() { int len; int max; char line[MAXLINE]; char longest[MAXLINE]; max = 0; while ((len = getline(line, MAXLINE)) > 0) if (len > max) { max = len; copy(longest, line); } if (max > 0) printf("%s", longest); return 0; } int getline(char s[],int lim) { int c, i; for (i=0; i < lim-1 && (c=getchar())!=EOF && c!='\n'; ++i) s[i] = c; if (c == '\n') { s[i] = c; ++i; } s[i] = '\0'; return i; } void copy(char to[], char from[]) { int i; i = 0; while ((to[i] = from[i]) != '\0') ++i; } The functions getline and copy are declared at the beginning of the program, which we assume is contained in one file. main and getline communicate through a pair of arguments and a returned value. In getline, the arguments are declared by the line int getline(char s[], int lim); which specifies that the first argument, s, is an array, and the second, lim, is an integer. The purpose of supplying the size of an array in a declaration is to set aside storage. The length of an array s is not necessary in getline since its size is set in main. getline uses return to 31 send a value back to the caller, just as the function power did. This line also declares that getline returns an int; since int is the default return type, it could be omitted. Some functions return a useful value; others, like copy, are used only for their effect and return no value. The return type of copy is void, which states explicitly that no value is returned. getline puts the character '\0' (the null character, whose value is zero) at the end of the array it is creating, to mark the end of the string of characters. This conversion is also used by the C language: when a string constant like "hello\n" appears in a C program, it is stored as an array of characters containing the characters in the string and terminated with a '\0' to mark the end. The %s format specification in printf expects the corresponding argument to be a string represented in this form. copy also relies on the fact that its input argument is terminated with a '\0', and copies this character into the output. It is worth mentioning in passing that even a program as small as this one presents some sticky design problems. For example, what should main do if it encounters a line which is bigger than its limit? getline works safely, in that it stops collecting when the array is full, even if no newline has been seen. By testing the length and the last character returned, main can determine whether the line was too long, and then cope as it wishes. In the interests of brevity, we have ignored this issue. There is no way for a user of getline to know in advance how long an input line might be, so getline checks for overflow. On the other hand, the user of copy already knows (or can find out) how big the strings are, so we have chosen not to add error checking to it. Exercise 1-16. Revise the main routine of the longest-line program so it will correctly print the length of arbitrary long input lines, and as much as possible of the text. Exercise 1-17. Write a program to print all input lines that are longer than 80 characters. Exercise 1-18. Write a program to remove trailing blanks and tabs from each line of input, and to delete entirely blank lines. Exercise 1-19. Write a function reverse(s) that reverses the character string s. Use it to write a program that reverses its input a line at a time. 1.10 External Variables and Scope The variables in main, such as line, longest, etc., are private or local to main. Because they are declared within main, no other function can have direct access to them. The same is true of the variables in other functions; for example, the variable i in getline is unrelated to the i 32 in copy. Each local variable in a function comes into existence only when the function is called, and disappears when the function is exited. This is why such variables are usually known as automatic variables, following terminology in other languages. We will use the term automatic henceforth to refer to these local variables. (Chapter 4 discusses the static storage class, in which local variables do retain their values between calls.) Because automatic variables come and go with function invocation, they do not retain their values from one call to the next, and must be explicitly set upon each entry. If they are not set, they will contain garbage. As an alternative to automatic variables, it is possible to define variables that are external to all functions, that is, variables that can be accessed by name by any function. (This mechanism is rather like Fortran COMMON or Pascal variables declared in the outermost block.) Because external variables are globally accessible, they can be used instead of argument lists to communicate data between functions. Furthermore, because external variables remain in existence permanently, rather than appearing and disappearing as functions are called and exited, they retain their values even after the functions that set them have returned. An external variable must be defined, exactly once, outside of any function; this sets aside storage for it. The variable must also be declared in each function that wants to access it; this states the type of the variable. The declaration may be an explicit extern statement or may be implicit from context. To make the discussion concrete, let us rewrite the longest-line program with line, longest, and max as external variables. This requires changing the calls, declarations, and bodies of all three functions. #include #define MAXLINE 1000 int max; char line[MAXLINE]; char longest[MAXLINE]; int getline(void); void copy(void); main() { int len; extern int max; extern char longest[]; max = 0; while ((len = getline()) > 0) if (len > max) { max = len; copy(); } if (max > 0) printf("%s", longest); return 0; } 33 int getline(void) { int c, i; extern char line[]; for (i = 0; i < MAXLINE - 1 && (c=getchar)) != EOF && c != '\n'; ++i) line[i] = c; if (c == '\n') { line[i] = c; ++i; } line[i] = '\0'; return i; } void copy(void) { int i; extern char line[], longest[]; i = 0; while ((longest[i] = line[i]) != '\0') ++i; } The external variables in main, getline and copy are defined by the first lines of the example above, which state their type and cause storage to be allocated for them. Syntactically, external definitions are just like definitions of local variables, but since they occur outside of functions, the variables are external. Before a function can use an external variable, the name of the variable must be made known to the function; the declaration is the same as before except for the added keyword extern. In certain circumstances, the extern declaration can be omitted. If the definition of the external variable occurs in the source file before its use in a particular function, then there is no need for an extern declaration in the function. The extern declarations in main, getline and copy are thus redundant. In fact, common practice is to place definitions of all external variables at the beginning of the source file, and then omit all extern declarations. If the program is in several source files, and a variable is defined in file1 and used in file2 and file3, then extern declarations are needed in file2 and file3 to connect the occurrences of the variable. The usual practice is to collect extern declarations of variables and functions in a separate file, historically called a header, that is included by #include at the front of each source file. The suffix.h is conventional for header names. The functions of the standard library, for example, are declared in headers like. This topic is discussed at length in Chapter 4, and the library itself in Chapter 7 and Appendix B. Since the specialized versions of getline and copy have no arguments, logic would suggest that their prototypes at the beginning of the file should be getline() and copy(). But for compatibility with older C programs the standard takes an empty list as an old-style declaration, and turns off all argument list checking; the word void must be used for an explicitly empty list. We will discuss this further in Chapter 4. You should note that we are using the words definition and declaration carefully when we refer to external variables in this section.``Definition'' refers to the place where the variable is 34 created or assigned storage; ``declaration'' refers to places where the nature of the variable is stated but no storage is allocated. By the way, there is a tendency to make everything in sight an extern variable because it appears to simplify communications - argument lists are short and variables are always there when you want them. But external variables are always there even when you don't want them. Relying too heavily on external variables is fraught with peril since it leads to programs whose data connections are not all obvious - variables can be changed in unexpected and even inadvertent ways, and the program is hard to modify. The second version of the longest-line program is inferior to the first, partly for these reasons, and partly because it destroys the generality of two useful functions by writing into them the names of the variables they manipulate. At this point we have covered what might be called the conventional core of C. With this handful of building blocks, it's possible to write useful programs of considerable size, and it would probably be a good idea if you paused long enough to do so. These exercises suggest programs of somewhat greater complexity than the ones earlier in this chapter. Exercise 1-20. Write a program detab that replaces tabs in the input with the proper number of blanks to space to the next tab stop. Assume a fixed set of tab stops, say every n columns. Should n be a variable or a symbolic parameter? Exercise 1-21. Write a program entab that replaces strings of blanks by the minimum number of tabs and blanks to achieve the same spacing. Use the same tab stops as for detab. When either a tab or a single blank would suffice to reach a tab stop, which should be given preference? Exercise 1-22. Write a program to ``fold'' long input lines into two or more shorter lines after the last non-blank character that occurs before the n-th column of input. Make sure your program does something intelligent with very long lines, and if there are no blanks or tabs before the specified column. Exercise 1-23. Write a program to remove all comments from a C program. Don't forget to handle quoted strings and character constants properly. C comments don't nest. Exercise 1-24. Write a program to check a C program for rudimentary syntax errors like unmatched parentheses, brackets and braces. Don't forget about quotes, both single and double, escape sequences, and comments. (This program is hard if you do it in full generality.) 35 Chapter 2 - Types, Operators and Expressions Variables and constants are the basic data objects manipulated in a program. Declarations list the variables to be used, and state what type they have and perhaps what their initial values are. Operators specify what is to be done to them. Expressions combine variables and constants to produce new values. The type of an object determines the set of values it can have and what operations can be performed on it. These building blocks are the topics of this chapter. The ANSI standard has made many small changes and additions to basic types and expressions. There are now signed and unsigned forms of all integer types, and notations for unsigned constants and hexadecimal character constants. Floating-point operations may be done in single precision; there is also a long double type for extended precision. String constants may be concatenated at compile time. Enumerations have become part of the language, formalizing a feature of long standing. Objects may be declared const, which prevents them from being changed. The rules for automatic coercions among arithmetic types have been augmented to handle the richer set of types. 2.1 Variable Names Although we didn't say so in Chapter 1, there are some restrictions on the names of variables and symbolic constants. Names are made up of letters and digits; the first character must be a letter. The underscore ``_'' counts as a letter; it is sometimes useful for improving the readability of long variable names. Don't begin variable names with underscore, however, since library routines often use such names. Upper and lower case letters are distinct, so x and X are two different names. Traditional C practice is to use lower case for variable names, and all upper case for symbolic constants. At least the first 31 characters of an internal name are significant. For function names and external variables, the number may be less than 31, because external names may be used by assemblers and loaders over which the language has no control. For external names, the standard guarantees uniqueness only for 6 characters and a single case. Keywords like if, else, int, float, etc., are reserved: you can't use them as variable names. They must be in lower case. It's wise to choose variable names that are related to the purpose of the variable, and that are unlikely to get mixed up typographically. We tend to use short names for local variables, especially loop indices, and longer names for external variables. 2.2 Data Types and Sizes There are only a few basic data types in C: char a single byte, capable of holding one character in the local character set int an integer, typically reflecting the natural size of integers on the host machine float single-precision floating point 36 double double-precision floating point In addition, there are a number of qualifiers that can be applied to these basic types. short and long apply to integers: short int sh; long int counter; The word int can be omitted in such declarations, and typically it is. The intent is that short and long should provide different lengths of integers where practical; int will normally be the natural size for a particular machine. short is often 16 bits long, and int either 16 or 32 bits. Each compiler is free to choose appropriate sizes for its own hardware, subject only to the the restriction that shorts and ints are at least 16 bits, longs are at least 32 bits, and short is no longer than int, which is no longer than long. The qualifier signed or unsigned may be applied to char or any integer. unsigned numbers are always positive or zero, and obey the laws of arithmetic modulo 2n, where n is the number of bits in the type. So, for instance, if chars are 8 bits, unsigned char variables have values between 0 and 255, while signed chars have values between -128 and 127 (in a two's complement machine.) Whether plain chars are signed or unsigned is machine-dependent, but printable characters are always positive. The type long double specifies extended-precision floating point. As with integers, the sizes of floating-point objects are implementation-defined; float, double and long double could represent one, two or three distinct sizes. The standard headers and contain symbolic constants for all of these sizes, along with other properties of the machine and compiler. These are discussed in Appendix B. Exercise 2-1. Write a program to determine the ranges of char, short, int, and long variables, both signed and unsigned, by printing appropriate values from standard headers and by direct computation. Harder if you compute them: determine the ranges of the various floating-point types. 2.3 Constants An integer constant like 1234 is an int. A long constant is written with a terminal l (ell) or L, as in 123456789L; an integer constant too big to fit into an int will also be taken as a long. Unsigned constants are written with a terminal u or U, and the suffix ul or UL indicates unsigned long. Floating-point constants contain a decimal point (123.4) or an exponent (1e-2) or both; their type is double, unless suffixed. The suffixes f or F indicate a float constant; l or L indicate a long double. The value of an integer can be specified in octal or hexadecimal instead of decimal. A leading 0 (zero) on an integer constant means octal; a leading 0x or 0X means hexadecimal. For example, decimal 31 can be written as 037 in octal and 0x1f or 0x1F in hex. Octal and 37 hexadecimal constants may also be followed by L to make them long and U to make them unsigned: 0XFUL is an unsigned long constant with value 15 decimal. A character constant is an integer, written as one character within single quotes, such as 'x'. The value of a character constant is the numeric value of the character in the machine's character set. For example, in the ASCII character set the character constant '0' has the value 48, which is unrelated to the numeric value 0. If we write '0' instead of a numeric value like 48 that depends on the character set, the program is independent of the particular value and easier to read. Character constants participate in numeric operations just as any other integers, although they are most often used in comparisons with other characters. Certain characters can be represented in character and string constants by escape sequences like \n (newline); these sequences look like two characters, but represent only one. In addition, an arbitrary byte-sized bit pattern can be specified by '\ooo' where ooo is one to three octal digits (0...7) or by '\xhh' where hh is one or more hexadecimal digits (0...9, a...f, A...F). So we might write #define VTAB '\013' #define BELL '\007' or, in hexadecimal, #define VTAB '\xb' #define BELL '\x7' The complete set of escape sequences is \a alert (bell) character \\ backslash \b backspace \? question mark \f formfeed \' single quote \n newline \" double quote \r carriage return \ooo octal number \t horizontal tab \xhh hexadecimal number \v vertical tab The character constant '\0' represents the character with value zero, the null character. '\0' is often written instead of 0 to emphasize the character nature of some expression, but the numeric value is just 0. A constant expression is an expression that involves only constants. Such expressions may be evaluated at during compilation rather than run-time, and accordingly may be used in any place that a constant can occur, as in #define MAXLINE 1000 char line[MAXLINE+1]; or #define LEAP 1 int days[31+28+LEAP+31+30+31+30+31+31+30+31+30+31]; 38 A string constant, or string literal, is a sequence of zero or more characters surrounded by double quotes, as in "I am a string" or "" The quotes are not part of the string, but serve only to delimit it. The same escape sequences used in character constants apply in strings; \" represents the double-quote character. String constants can be concatenated at compile time: "hello, " "world" is equivalent to "hello, world" This is useful for splitting up long strings across several source lines. Technically, a string constant is an array of characters. The internal representation of a string has a null character '\0' at the end, so the physical storage required is one more than the number of characters written between the quotes. This representation means that there is no limit to how long a string can be, but programs must scan a string completely to determine its length. The standard library function strlen(s) returns the length of its character string argument s, excluding the terminal '\0'. Here is our version: int strlen(char s[]) { int i; while (s[i] != '\0') ++i; return i; } strlen and other string functions are declared in the standard header. Be careful to distinguish between a character constant and a string that contains a single character: 'x' is not the same as "x". The former is an integer, used to produce the numeric value of the letter x in the machine's character set. The latter is an array of characters that contains one character (the letter x) and a '\0'. There is one other kind of constant, the enumeration constant. An enumeration is a list of constant integer values, as in enum boolean { NO, YES }; The first name in an enum has value 0, the next 1, and so on, unless explicit values are specified. If not all values are specified, unspecified values continue the progression from the last specified value, as the second of these examples: enum escapes { BELL = '\a', BACKSPACE = '\b', TAB = '\t', NEWLINE = '\n', VTAB = '\v', RETURN = '\r' }; enum months { JAN = 1, FEB, MAR, APR, MAY, JUN, JUL, AUG, SEP, OCT, NOV, DEC }; 39 Names in different enumerations must be distinct. Values need not be distinct in the same enumeration. Enumerations provide a convenient way to associate constant values with names, an alternative to #define with the advantage that the values can be generated for you. Although variables of enum types may be declared, compilers need not check that what you store in such a variable is a valid value for the enumeration. Nevertheless, enumeration variables offer the chance of checking and so are often better than #defines. In addition, a debugger may be able to print values of enumeration variables in their symbolic form. 2.4 Declarations All variables must be declared before use, although certain declarations can be made implicitly by content. A declaration specifies a type, and contains a list of one or more variables of that type, as in int lower, upper, step; char c, line; Variables can be distributed among declarations in any fashion; the lists above could well be written as int lower; int upper; int step; char c; char line; The latter form takes more space, but is convenient for adding a comment to each declaration for subsequent modifications. A variable may also be initialized in its declaration. If the name is followed by an equals sign and an expression, the expression serves as an initializer, as in char esc = '\\'; int i = 0; int limit = MAXLINE+1; float eps = 1.0e-5; If the variable in question is not automatic, the initialization is done once only, conceptionally before the program starts executing, and the initializer must be a constant expression. An explicitly initialized automatic variable is initialized each time the function or block it is in is entered; the initializer may be any expression. External and static variables are initialized to zero by default. Automatic variables for which is no explicit initializer have undefined (i.e., garbage) values. The qualifier const can be applied to the declaration of any variable to specify that its value will not be changed. For an array, the const qualifier says that the elements will not be altered. const double e = 2.71828182845905; const char msg[] = "warning: "; The const declaration can also be used with array arguments, to indicate that the function does not change that array: int strlen(const char[]); 40 The result is implementation-defined if an attempt is made to change a const. 2.5 Arithmetic Operators The binary arithmetic operators are +, -, *, /, and the modulus operator %. Integer division truncates any fractional part. The expression x % y produces the remainder when x is divided by y, and thus is zero when y divides x exactly. For example, a year is a leap year if it is divisible by 4 but not by 100, except that years divisible by 400 are leap years. Therefore if ((year % 4 == 0 && year % 100 != 0) || year % 400 == 0) printf("%d is a leap year\n", year); else printf("%d is not a leap year\n", year); The % operator cannot be applied to a float or double. The direction of truncation for / and the sign of the result for % are machine-dependent for negative operands, as is the action taken on overflow or underflow. The binary + and - operators have the same precedence, which is lower than the precedence of *, / and %, which is in turn lower than unary + and -. Arithmetic operators associate left to right. Table 2.1 at the end of this chapter summarizes precedence and associativity for all operators. 2.6 Relational and Logical Operators The relational operators are > >= < = '0' && s[i] = 'A' && c = '0' && c perform left and right shifts of their left operand by the number of bit positions given by the right operand, which must be non-negative. Thus x b) z = a; else z = b; compute in z the maximum of a and b. The conditional expression, written with the ternary operator ``?:'', provides an alternate way to write this and similar constructions. In the expression expr1 ? expr2 : expr3 the expression expr1 is evaluated first. If it is non-zero (true), then the expression expr2 is evaluated, and that is the value of the conditional expression. Otherwise expr3 is evaluated, and that is the value. Only one of expr2 and expr3 is evaluated. Thus to set z to the maximum of a and b, z = (a > b) ? a : b; It should be noted that the conditional expression is indeed an expression, and it can be used wherever any other expression can be. If expr2 and expr3 are of different types, the type of the result is determined by the conversion rules discussed earlier in this chapter. For example, if f is a float and n an int, then the expression (n > 0) ? f : n is of type float regardless of whether n is positive. Parentheses are not necessary around the first expression of a conditional expression, since the precedence of ?: is very low, just above assignment. They are advisable anyway, however, since they make the condition part of the expression easier to see. The conditional expression often leads to succinct code. For example, this loop prints n elements of an array, 10 per line, with each column separated by one blank, and with each line (including the last) terminated by a newline. for (i = 0; i < n; i++) printf("%6d%c", a[i], (i%10==9 || i==n-1) ? '\n' : ' '); A newline is printed after every tenth element, and after the n-th. All other elements are followed by one blank. This might look tricky, but it's more compact than the equivalent if- else. Another good example is printf("You have %d items%s.\n", n, n==1 ? "" : "s"); Exercise 2-10. Rewrite the function lower, which converts upper case letters to lower case, with a conditional expression instead of if-else. 2.12 Precedence and Order of Evaluation Table 2.1 summarizes the rules for precedence and associativity of all operators, including those that we have not yet discussed. Operators on the same line have the same precedence; rows are in order of decreasing precedence, so, for example, *, /, and % all have the same precedence, which is higher than that of binary + and -. The ``operator'' () refers to function call. The operators -> and. are used to access members of structures; they will be covered in 50 Chapter 6, along with sizeof (size of an object). Chapter 5 discusses * (indirection through a pointer) and & (address of an object), and Chapter 3 discusses the comma operator. Operators Associativity () [] ->. left to right ! ~ ++ -- + - * (type) sizeof right to left * / % left to right + - left to right > left to right < >= left to right == != left to right & left to right ^ left to right | left to right && left to right || left to right ?: right to left = += -= *= /= %= &= ^= |= = right to left , left to right Unary & +, -, and * have higher precedence than the binary forms. Table 2.1: Precedence and Associativity of Operators Note that the precedence of the bitwise operators &, ^, and | falls below == and !=. This implies that bit-testing expressions like if ((x & MASK) == 0)... must be fully parenthesized to give proper results. C, like most languages, does not specify the order in which the operands of an operator are evaluated. (The exceptions are &&, ||, ?:, and `,'.) For example, in a statement like x = f() + g(); f may be evaluated before g or vice versa; thus if either f or g alters a variable on which the other depends, x can depend on the order of evaluation. Intermediate results can be stored in temporary variables to ensure a particular sequence. Similarly, the order in which function arguments are evaluated is not specified, so the statement printf("%d %d\n", ++n, power(2, n)); can produce different results with different compilers, depending on whether n is incremented before power is called. The solution, of course, is to write ++n; 51 printf("%d %d\n", n, power(2, n)); Function calls, nested assignment statements, and increment and decrement operators cause ``side effects'' - some variable is changed as a by-product of the evaluation of an expression. In any expression involving side effects, there can be subtle dependencies on the order in which variables taking part in the expression are updated. One unhappy situation is typified by the statement a[i] = i++; The question is whether the subscript is the old value of i or the new. Compilers can interpret this in different ways, and generate different answers depending on their interpretation. The standard intentionally leaves most such matters unspecified. When side effects (assignment to variables) take place within an expression is left to the discretion of the compiler, since the best order depends strongly on machine architecture. (The standard does specify that all side effects on arguments take effect before a function is called, but that would not help in the call to printf above.) The moral is that writing code that depends on order of evaluation is a bad programming practice in any language. Naturally, it is necessary to know what things to avoid, but if you don't know how they are done on various machines, you won't be tempted to take advantage of a particular implementation. 52 Chapter 3 - Control Flow The control-flow of a language specify the order in which computations are performed. We have already met the most common control-flow constructions in earlier examples; here we will complete the set, and be more precise about the ones discussed before. 3.1 Statements and Blocks An expression such as x = 0 or i++ or printf(...) becomes a statement when it is followed by a semicolon, as in x = 0; i++; printf(...); In C, the semicolon is a statement terminator, rather than a separator as it is in languages like Pascal. Braces { and } are used to group declarations and statements together into a compound statement, or block, so that they are syntactically equivalent to a single statement. The braces that surround the statements of a function are one obvious example; braces around multiple statements after an if, else, while, or for are another. (Variables can be declared inside any block; we will talk about this in Chapter 4.) There is no semicolon after the right brace that ends a block. 3.2 If-Else The if-else statement is used to express decisions. Formally the syntax is if (expression) statement1 else statement2 where the else part is optional. The expression is evaluated; if it is true (that is, if expression has a non-zero value), statement1 is executed. If it is false (expression is zero) and if there is an else part, statement2 is executed instead. Since an if tests the numeric value of an expression, certain coding shortcuts are possible. The most obvious is writing if (expression) instead of if (expression != 0) Sometimes this is natural and clear; at other times it can be cryptic. Because the else part of an if-else is optional,there is an ambiguity when an else if omitted from a nested if sequence. This is resolved by associating the else with the closest previous else-less if. For example, in 53 if (n > 0) if (a > b) z = a; else z = b; the else goes to the inner if, as we have shown by indentation. If that isn't what you want, braces must be used to force the proper association: if (n > 0) { if (a > b) z = a; } else z = b; The ambiguity is especially pernicious in situations like this: if (n > 0) for (i = 0; i < n; i++) if (s[i] > 0) { printf("..."); return i; } else printf("error -- n is negative\n"); The indentation shows unequivocally what you want, but the compiler doesn't get the message, and associates the else with the inner if. This kind of bug can be hard to find; it's a good idea to use braces when there are nested ifs. By the way, notice that there is a semicolon after z = a in if (a > b) z = a; else z = b; This is because grammatically, a statement follows the if, and an expression statement like ``z = a;'' is always terminated by a semicolon. 3.3 Else-If The construction if (expression) statement else if (expression) statement else if (expression) statement else if (expression) statement else statement occurs so often that it is worth a brief separate discussion. This sequence of if statements is the most general way of writing a multi-way decision. The expressions are evaluated in order; if an expression is true, the statement associated with it is executed, and this terminates the whole chain. As always, the code for each statement is either a single statement, or a group of them in braces. 54 The last else part handles the ``none of the above'' or default case where none of the other conditions is satisfied. Sometimes there is no explicit action for the default; in that case the trailing else statement can be omitted, or it may be used for error checking to catch an ``impossible'' condition. To illustrate a three-way decision, here is a binary search function that decides if a particular value x occurs in the sorted array v. The elements of v must be in increasing order. The function returns the position (a number between 0 and n-1) if x occurs in v, and -1 if not. Binary search first compares the input value x to the middle element of the array v. If x is less than the middle value, searching focuses on the lower half of the table, otherwise on the upper half. In either case, the next step is to compare x to the middle element of the selected half. This process of dividing the range in two continues until the value is found or the range is empty. void reverse(char s[]) { int c, i, j; for (i = 0, j = strlen(s)-1; i < j; i++, j--) { c = s[i]; s[i] = s[j]; s[j] = c; } } The commas that separate function arguments, variables in declarations, etc., are not comma operators, and do not guarantee left to right evaluation. Comma operators should be used sparingly. The most suitable uses are for constructs strongly related to each other, as in the for loop in reverse, and in macros where a multistep computation has to be a single expression. A comma expression might also be appropriate for the exchange of elements in reverse, where the exchange can be thought of a single operation: for (i = 0, j = strlen(s)-1; i < j; i++, j--) c = s[i], s[i] = s[j], s[j] = c; Exercise 3-3. Write a function expand(s1,s2) that expands shorthand notations like a-z in the string s1 into the equivalent complete list abc...xyz in s2. Allow for letters of either case and digits, and be prepared to handle cases like a-b-c and a-z0-9 and -a-z. Arrange that a leading or trailing - is taken literally. 3.6 Loops - Do-While 59 As we discussed in Chapter 1, the while and for loops test the termination condition at the top. By contrast, the third loop in C, the do-while, tests at the bottom after making each pass through the loop body; the body is always executed at least once. The syntax of the do is do statement while (expression); The statement is executed, then expression is evaluated. If it is true, statement is evaluated again, and so on. When the expression becomes false, the loop terminates. Except for the sense of the test, do-while is equivalent to the Pascal repeat-until statement. Experience shows that do-while is much less used than while and for. Nonetheless, from time to time it is valuable, as in the following function itoa, which converts a number to a character string (the inverse of atoi). The job is slightly more complicated than might be thought at first, because the easy methods of generating the digits generate them in the wrong order. We have chosen to generate the string backwards, then reverse it. void itoa(int n, char s[]) { int i, sign; if ((sign = n) < 0) n = -n; i = 0; do { s[i++] = n % 10 + '0'; } while ((n /= 10) > 0); if (sign < 0) s[i++] = '-'; s[i] = '\0'; reverse(s); } The do-while is necessary, or at least convenient, since at least one character must be installed in the array s, even if n is zero. We also used braces around the single statement that makes up the body of the do-while, even though they are unnecessary, so the hasty reader will not mistake the while part for the beginning of a while loop. Exercise 3-4. In a two's complement number representation, our version of itoa does not handle the largest negative number, that is, the value of n equal to -(2wordsize-1). Explain why not. Modify it to print that value correctly, regardless of the machine on which it runs. Exercise 3-5. Write the function itob(n,s,b) that converts the integer n into a base b character representation in the string s. In particular, itob(n,s,16) formats s as a hexadecimal integer in s. Exercise 3-6. Write a version of itoa that accepts three arguments instead of two. The third argument is a minimum field width; the converted number must be padded with blanks on the left if necessary to make it wide enough. 3.7 Break and Continue 60 It is sometimes convenient to be able to exit from a loop other than by testing at the top or bottom. The break statement provides an early exit from for, while, and do, just as from switch. A break causes the innermost enclosing loop or switch to be exited immediately. The following function, trim, removes trailing blanks, tabs and newlines from the end of a string, using a break to exit from a loop when the rightmost non-blank, non-tab, non-newline is found. int trim(char s[]) { int n; for (n = strlen(s)-1; n >= 0; n--) if (s[n] != ' ' && s[n] != '\t' && s[n] != '\n') break; s[n+1] = '\0'; return n; } strlen returns the length of the string. The for loop starts at the end and scans backwards looking for the first character that is not a blank or tab or newline. The loop is broken when one is found, or when n becomes negative (that is, when the entire string has been scanned). You should verify that this is correct behavior even when the string is empty or contains only white space characters. The continue statement is related to break, but less often used; it causes the next iteration of the enclosing for, while, or do loop to begin. In the while and do, this means that the test part is executed immediately; in the for, control passes to the increment step. The continue statement applies only to loops, not to switch. A continue inside a switch inside a loop causes the next loop iteration. As an example, this fragment processes only the non-negative elements in the array a; negative values are skipped. for (i = 0; i < n; i++) if (a[i] < 0) continue;... The continue statement is often used when the part of the loop that follows is complicated, so that reversing a test and indenting another level would nest the program too deeply. 3.8 Goto and labels C provides the infinitely-abusable goto statement, and labels to branch to. Formally, the goto statement is never necessary, and in practice it is almost always easy to write code without it. We have not used goto in this book. Nevertheless, there are a few situations where gotos may find a place. The most common is to abandon processing in some deeply nested structure, such as breaking out of two or more loops at once. The break statement cannot be used directly since it only exits from the innermost loop. Thus: 61 for (... ) for (... ) {... if (disaster) goto error; }... error: This organization is handy if the error-handling code is non-trivial, and if errors can occur in several places. A label has the same form as a variable name, and is followed by a colon. It can be attached to any statement in the same function as the goto. The scope of a label is the entire function. As another example, consider the problem of determining whether two arrays a and b have an element in common. One possibility is for (i = 0; i < n; i++) for (j = 0; j < m; j++) if (a[i] == b[j]) goto found;... found:... Code involving a goto can always be written without one, though perhaps at the price of some repeated tests or an extra variable. For example, the array search becomes found = 0; for (i = 0; i < n && !found; i++) for (j = 0; j < m && !found; j++) if (a[i] == b[j]) found = 1; if (found)... else... With a few exceptions like those cited here, code that relies on goto statements is generally harder to understand and to maintain than code without gotos. Although we are not dogmatic about the matter, it does seem that goto statements should be used rarely, if at all. 62 Chapter 4 - Functions and Program Structure Functions break large computing tasks into smaller ones, and enable people to build on what others have done instead of starting over from scratch. Appropriate functions hide details of operation from parts of the program that don't need to know about them, thus clarifying the whole, and easing the pain of making changes. C has been designed to make functions efficient and easy to use; C programs generally consist of many small functions rather than a few big ones. A program may reside in one or more source files. Source files may be compiled separately and loaded together, along with previously compiled functions from libraries. We will not go into that process here, however, since the details vary from system to system. Function declaration and definition is the area where the ANSI standard has made the most changes to C. As we saw first in Chapter 1, it is now possible to declare the type of arguments when a function is declared. The syntax of function declaration also changes, so that declarations and definitions match. This makes it possible for a compiler to detect many more errors than it could before. Furthermore, when arguments are properly declared, appropriate type coercions are performed automatically. The standard clarifies the rules on the scope of names; in particular, it requires that there be only one definition of each external object. Initialization is more general: automatic arrays and structures may now be initialized. The C preprocessor has also been enhanced. New preprocessor facilities include a more complete set of conditional compilation directives, a way to create quoted strings from macro arguments, and better control over the macro expansion process. 4.1 Basics of Functions To begin with, let us design and write a program to print each line of its input that contains a particular ``pattern'' or string of characters. (This is a special case of the UNIX program grep.) For example, searching for the pattern of letters ``ould'' in the set of lines Ah Love! could you and I with Fate conspire To grasp this sorry Scheme of Things entire, Would not we shatter it to bits -- and then Re-mould it nearer to the Heart's Desire! will produce the output Ah Love! could you and I with Fate conspire Would not we shatter it to bits -- and then Re-mould it nearer to the Heart's Desire! The job falls neatly into three pieces: while (there's another line) if (the line contains the pattern) print it 63 Although it's certainly possible to put the code for all of this in main, a better way is to use the structure to advantage by making each part a separate function. Three small pieces are better to deal with than one big one, because irrelevant details can be buried in the functions, and the chance of unwanted interactions is minimized. And the pieces may even be useful in other programs. ``While there's another line'' is getline, a function that we wrote in Chapter 1, and ``print it'' is printf, which someone has already provided for us. This means we need only write a routine to decide whether the line contains an occurrence of the pattern. We can solve that problem by writing a function strindex(s,t) that returns the position or index in the string s where the string t begins, or -1 if s does not contain t. Because C arrays begin at position zero, indexes will be zero or positive, and so a negative value like -1 is convenient for signaling failure. When we later need more sophisticated pattern matching, we only have to replace strindex; the rest of the code can remain the same. (The standard library provides a function strstr that is similar to strindex, except that it returns a pointer instead of an index.) Given this much design, filling in the details of the program is straightforward. Here is the whole thing, so you can see how the pieces fit together. For now, the pattern to be searched for is a literal string, which is not the most general of mechanisms. We will return shortly to a discussion of how to initialize character arrays, and in Chapter 5 will show how to make the pattern a parameter that is set when the program is run. There is also a slightly different version of getline; you might find it instructive to compare it to the one in Chapter 1. #include #define MAXLINE 1000 int getline(char line[], int max) int strindex(char source[], char searchfor[]); char pattern[] = "ould"; main() { char line[MAXLINE]; int found = 0; while (getline(line, MAXLINE) > 0) if (strindex(line, pattern) >= 0) { printf("%s", line); found++; } return found; } int getline(char s[], int lim) { int c, i; i = 0; while (--lim > 0 && (c=getchar()) != EOF && c != '\n') s[i++] = c; if (c == '\n') s[i++] = c; 64 s[i] = '\0'; return i; } int strindex(char s[], char t[]) { int i, j, k; for (i = 0; s[i] != '\0'; i++) { for (j=i, k=0; t[k]!='\0' && s[j]==t[k]; j++, k++) ; if (k > 0 && t[k] == '\0') return i; } return -1; } Each function definition has the form return-type function-name(argument declarations) { declarations and statements } Various parts may be absent; a minimal function is dummy() {} which does nothing and returns nothing. A do-nothing function like this is sometimes useful as a place holder during program development. If the return type is omitted, int is assumed. A program is just a set of definitions of variables and functions. Communication between the functions is by arguments and values returned by the functions, and through external variables. The functions can occur in any order in the source file, and the source program can be split into multiple files, so long as no function is split. The return statement is the mechanism for returning a value from the called function to its caller. Any expression can follow return: return expression; The expression will be converted to the return type of the function if necessary. Parentheses are often used around the expression, but they are optional. The calling function is free to ignore the returned value. Furthermore, there need to be no expression after return; in that case, no value is returned to the caller. Control also returns to the caller with no value when execution ``falls off the end'' of the function by reaching the closing right brace. It is not illegal, but probably a sign of trouble, if a function returns a value from one place and no value from another. In any case, if a function fails to return a value, its ``value'' is certain to be garbage. The pattern-searching program returns a status from main, the number of matches found. This value is available for use by the environment that called the program The mechanics of how to compile and load a C program that resides on multiple source files vary from one system to the next. On the UNIX system, for example, the cc command mentioned in Chapter 1 does the job. Suppose that the three functions are stored in three files called main.c, getline.c, and strindex.c. Then the command 65 cc main.c getline.c strindex.c compiles the three files, placing the resulting object code in files main.o, getline.o, and strindex.o, then loads them all into an executable file called a.out. If there is an error, say in main.c, the file can be recompiled by itself and the result loaded with the previous object files, with the command cc main.c getline.o strindex.o The cc command uses the ``.c'' versus ``.o'' naming convention to distinguish source files from object files. Exercise 4-1. Write the function strindex(s,t) which returns the position of the rightmost occurrence of t in s, or -1 if there is none. 4.2 Functions Returning Non-integers So far our examples of functions have returned either no value (void) or an int. What if a function must return some other type? many numerical functions like sqrt, sin, and cos return double; other specialized functions return other types. To illustrate how to deal with this, let us write and use the function atof(s), which converts the string s to its double- precision floating-point equivalent. atof if an extension of atoi, which we showed versions of in Chapters 2 and 3. It handles an optional sign and decimal point, and the presence or absence of either part or f

Use Quizgecko on...
Browser
Browser