The C Programming Language (Second Edition) PDF

SECOND EDITION THE BRIAN W KERNIGHAN DENNIS M. RITCHIE PRENTICE HALL SOFTWARE SERIES THE c PROGRAMMING LANGUAGE Second Edition THE c PROGRAMMING LANGUAGE Second Edition Brian W. Kernighan...

SECOND EDITION THE BRIAN W KERNIGHAN DENNIS M. RITCHIE PRENTICE HALL SOFTWARE SERIES THE c PROGRAMMING LANGUAGE Second Edition THE c PROGRAMMING LANGUAGE Second Edition Brian W. Kernighan Dennis M. Ritchie AT & T Bell Laboratories Murray Hill, New Jersey PRENTICE HALL, Englewood Cliffs, New Jersey 07632 Ubrary of Congress Cataloging-in-Publication Data Kernighan, Brian W. The C programming language. Includes index. 1. C (Computer program language) I. Ritchie, Dennis M. II. Title. QA76.73.C15K47 1988 005.13'3 88-5934 ISBN 0-13-110370-9 ISBN 0-13-110362-8 (pbk.) Copyright e 1988, 1978 by Bell Telephone Laboratories, Incorporated. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopy- ing, recording, or otherwise, without the prior written permission of the publisher. Printed in the United States of America. Published simultaneouslyin Canada. UNIX is a registered trademark of AT&T. This book was typeset (pic Itbll eqn Itroff -ms) in Times Roman and Courier by the authors, using an Autologic APS-5 phototypesetter and a DEC VAX 8550 running the 9th Edition of the UNIX. operating system. Prentice Hall Software Series Brian Kernighan, Advisor Printed in the United States of America 10 9 8 7 ISBN 0-13-110362-8 {PBK} ISBN 0-l3-110370-9 Prentice-Hall International (UK) Limited, London Prentice-Hall of Australia Pty. Limited, Sydney Prentice-Hall Canada Inc., Toronto Prentice-Hall Hispanoamericana, S.A. , Mexico Prentice-Hall of India Private Limited, New Delhi Prentice-Hall of Japan, Inc., Tokyo Simon & Schuster Asia Pte. Ltd., Singapore Editora Prentice-Hall do Brasil, Ltda., Rio de Janeiro Contents Preface ix Preface to the First Edition xi Introduction I Chapter I. A Tutorial Introduction 5 1.1 Getting Started 5 1.2 Variables and Arithmetic Expressions 8 1.3 The For Statement 13 1.4 Symbolic Constants 14 1.5 Character Input and Output 15 1.6 Arrays 22 1.7 Functions 24 1.8 Arguments-Call by Value 27 1.9 Character Arrays 28 1.10 External Variables and Scope 31 Chapter 2. Types, Operators, and Expressions 35 2.1 Variable Names 35 2.2 Data Types and Sizes 36 2.3 Constants 37 2.4 Declarations 40 2.5 Arithmetic Operators 41 2.6 Relational and Logical Operators 41 2.7 Type Conversions 42 2.8 Increment and Decrement Operators 46 2.9 Bitwise Operators 48 2.10 Assignment Operators and Expressions 50 2.11 Conditional Expressions 51 2.12 Precedence and Order of Evaluation 52 Chapter 3. Control Flow 55 3.1 Statements and Blocks 55 3.2 If-Else 55 , vi THE C PROGRAMMING LANGUAGE CONTENTS 3.3 Else-If 57 3.4 Switch 58 3.5 Loops- While and For 60 3.6 Loops- Do-while 63 3.7 Break and Continue 64 3.8 Goto and Labels 65 Chapter 4. Functions and Program Structure 67 4.1 Basicsof Functions 67 4.2 Functions Returning Non-integers 71 4.3 External Variables 73 4.4 Scope Rules 80 4.5 Header Files 81 4.6 Static Variables 83 4.7 Register Variables 83 4.8 Block Structure 84 4.9 Initialization 85 4.10 Recursion 86 4.11 The C Preprocessor 88 Chapter 5. Pointers and Arrays 93 5.1 Pointers and Addresses 93 5.2 Pointers and Function Arguments 95 5.3 Pointers and Arrays 97 5.4 Address Arithmetic 100 5.5 Character Pointers and Functions 104 5.6 Pointer Arrays; Pointers to Pointers 107 5.7 Multi-dimensionalArrays 110 5.8 Initialization of Pointer Arrays 113 5.9 Pointers vs. Multi-dimensionalArrays 113 5.10 Command-line Arguments 114 5.11 Pointers to Functions 118 5.12 Complicated Declarations 122 Chapter 6. Structures 127 6.1 Basics of Structures 127 6.2 Structures and Functions 129 6.3 Arrays of Structures 132 6.4 Pointers to Structures 136 6.5 Self-referential Structures 139 6.6 Table Lookup 143 6.7 Typedef 146 6.8 Unions 147 6.9 Bit-fields 149 Chapter 7. Input and Output 151 7.1 Standard Input and Output 151 7.2 Formatted Output-Printf 153 THE C PROGRAMMING LANGUAGE CONTENTS vii 7.3 Variable-length Argument Lists 155 7.4 Formatted Input-Scanf 157 7.5 File Access 160 7.6 Error Handling-Stderr and Exit 163 7.7 Line Input and Output 164 7.8 Miscellaneous Functions 166 Chapter 8. The UNIX System Interface 169 8.1 File Descriptors 169 8.2 Low Level I/O-Read and Write 170 8.3 Open, Creat, Close, Unlink 172 8.4 Random Access- Lseek 174 8.5 Example-An Implementation of Fopen and Getc 175 8.6 Example - Listing Directories 179 8.7 Example- A Storage Allocator 185 Appendix A. Reference Manual 191 A 1 Introduction 191 A2 Lexical Conventions 191 A3 Syntax Notation 194 A4 Meaning of Identifiers 195~ A5 Objects and Lvalues 197 A6 Conversions 197 A7 Expressions 200 A8 Declarations 210 A9 Statements 222 AI0 External Declarations 225 A 11 Scope and Linkage 227 A 12 Preprocessing 228 A13 Grammar 234 Appendix B. Standard Library 241 Bl Input and Output: 241 B2 Character Class Tests: 248 B3 String Functions: < string.h > 249 B4 Mathematical Functions: 250 B5 Utility Functions: 251 B6 Diagnostics: < assert.h > 253 B7 Variable Argument Lists: 254 B8 Non-local Jumps: 254 B9 Signals: 255 BIODate and Time Functions: < time.h > 255 Bll Implementation-defined Limits: and 257 Appendix C. Summary of Changes 259 Index 263 Preface The computing world has undergone a revolution since the publication of The C Programming Language in 1978. Big computers are much bigger, and personal computers have capabilities that rival the mainframes of a decade ago. During this time, C has changed too, although only modestly, and it has spread far beyond its origins as the language of the UNIX operating system. The growing popularity of C, the changes in the language over the years, and the creation of compilers by groups not involved in its design, combined to demonstrate a need for a more precise and more contemporary definition of the language than the first edition of this book provided. In 1983, the American ' National Standards Institute (ANSI) established a committee whose goal was to produce "an unambiguous and machine-independent definition of the language C," while still retaining its spirit. The result is the ANSI standard for C. The standard formalizes constructions that were hinted at but not described in the first edition, particularly structure assignment and enumerations. It pro- vides a new form of function declaration that permits cross-checking of defini- tion with use. It specifies a standard library, with an extensive set of functions for performing input and output, memory management, string manipulation, and similar tasks. It makes precise the behavior of features that were not spelled out in the original definition, and at the same time states explicitly which aspects of the language remain machine-dependent. This second edition of The C Programming Language describes C as defined by the ANSI standard. Although we have noted the places where the language has evolved,we have chosen to write exclusivelyin the new form. For the most part, this makes no significant difference; the most visible change is the new form of function declaration and definition. Modern compilers already support most features of the standard. We have tried to retain the brevity of the first edition. C is not a big language, and it is not well served by a big book. We have improved the exposi- tion of critical features, such as pointers, that are central to C programming. We have refined the original examples, and have added new examples in several chapters. For instance, the treatment of complicated declarations is augmented by programs that convert declarations into words and vice versa. As before, all Ix X PREFACE examples have been tested directly from the text, which is in machine-readable form. Appendix A, the reference manual, is not the standard, but our attempt to convey the essentials of the standard in a smaller space. It is meant for easy comprehension by programmers, but not as a definition for compiler writers- that role properly belongs to the standard itself. Appendix B is a summary of the facilities of the standard library. It too is meant for reference by program- mers, not implementers. Appendix C is a concise summary of the changes from the original version. As we said in the preface to the first edition, C "wears well as one's experi- ence with it grows." With a decade more experience, we still feel that way. We hope that this book will help you to learn C and to use it well. We are deeply indebted to friends who helped us to produce this second edi- tion. Jon Bentley, Doug Gwyn, Doug Mcllroy, Peter Nelson, and Rob Pike gave us perceptive comments on almost every page of draft manuscripts. We are grateful for careful reading by Al Aho, Dennis Allison, Joe Campbell, G. R. Emlin, Karen Fortgang, Allen Holub, Andrew Hume, Dave Kristol, John Linderman, Dave Prosser, Gene Spafford, and Chris Van Wyk. We also received helpful suggestions from BHl Cheswick, Mark Kernighan, Andy Koenig, Robin Lake, Tom London, Jim Reeds, Clovis Tondo, and Peter Wein- berger. Dave Prosser answered many detailed questions about the ANSI stand- ard. We used Bjarne Stroustrup's C++ translator extensively for local testing of our programs, and Pave Kristol provided us with an ANSI C compiler for final testing, Rich Drechsler helped greatly with typesetting. Our sincerethanks to all. Brian W. Kernighan Dennis M. Ritchie Preface to the First Edition C is a general-purpose programming language which features economy of expression, modern control flow and data structures, and a rich set of operators. C is not a "very high level" language, nor a "big" one, and is not specialized to any particular area of application. But its absence of restrictions and its gen- erality make it more convenient and effective for many tasks than supposedly more powerful languages. C was originally designed for and implemented on the UNIX operating sys- tem on the DEC PDP-ll, by Dennis Ritchie. The operating system, the C com- piler, and essentially all UNIX applications programs (including all of the software used to prepare this book) are written in C. Production compilers also exist for several other machines, including the IBM System/370, the Honeywell 6000, and the Interdata 8/32. C is not tied to any particular hardware or sys- tem, however, and it is easy to write programs that will run without change on any machine that supports C. This book is meant to help the reader learn how to program in C. It con- tains a tutorial introduction to get new users started as soon as possible, separate chapters on each major feature, and a reference manual. Most of the treatment is based on reading, writing and revising examples, rather than on mere statements of rules. For the most part, the examples are complete, real programs, rather than isolated fragments. All examples have been tested directly from the text, which is in machine-readable form. Besides showing how to make effective use of the language, we have also tried where possible to illus- trate useful algorithms and principles of good style and sound design. The book is not an introductory programming manual; it assumes some fam- iliarity with basic programming concepts like variables, assignment statements, loops, and functions. Nonetheless, a novice programmer should be able to read along and pick up the language, although access to a more knowledgeable col- league will help. In our experience, C has proven to be a pleasant, expressive, and versatile language for a wide variety of programs. It is easy to learn, and it wears well as one's experience with it grows. We hope that this book will help you to use it well. xi xii PREFACE TO THE 1ST EDITION The thoughtful criticisms and suggestions of many friends and colleagues have added greatly to this book and to our pleasure in writing it. In particular, Mike Bianchi, Jim Blue, Stu Feldman, Doug McIlroy, Bill Roome, Bob Rosin, and Larry Rosier all read multiple versions with care. We are also indebted to Al Aho, Steve Bourne, Dan Dvorak, Chuck Haley, Debbie Haley, Marion Harris, Rick Holt, Steve Johnson, John Mashey, Bob Mitze, Ralph Muha, Peter Nelson, Elliot Pinson, Bill PIauger, Jerry Spivack, Ken Thompson, and Peter Weinberger for helpful comments at various stages, and to Mike Lesk and Joe Ossanna for invaluable assistance with typesetting. Brian W. Kernighan Dennis M. Ritchie Introduction C is a general-purpose programming language. It has been closely associ- ated with the UNIX system where it was developed, since both the system and most of the programs that run on it are written in C. The language, however, is not tied to anyone operating system or machine; and although it has been called a "system programming language" because it is useful for writing com- pilers and operating systems, it has been used equally well to write major pro- grams in many different domains. Many of the important ideas of C stem from the language BCPL, developed by Martin Richards. The influence of BCPL on C proceeded indirectly through the language B, which was written by Ken Thompson in 1970 for the first UNIX system on the DEC PDP-7. BCPL and Bare "typeless" languages. By contrast, C provides a variety of data types. The fundamental types are characters, and integers and floating- point numbers of several sizes. In addition, there is a hierarchy of derived data types created with pointers, arrays, structures, and unions. Expressions are formed from operators and operands; any expression, including an assignment or a function call, can be a statement. Pointers provide for machine-independent address arithmetic. C provides the fundamental control-flow constructions required for well- structured programs: statement grouping, decision making (if-else), selecting one of a set of possible cases (switch), looping with the termination test at the top (while, for) or at the bottom (do), and early loop exit (break). Functions may return values of basic types, structures, unions, or pointers. Any function may be called recursively. Local variables are typically "automatic," or created anew with each invocation. Function definitions may not be nested but variables may be declared in a block-structured fashion. The functions of a C program may exist in separate source files that are compiled separately. Variables may be internal to a function, external but known only within a single source file, or visible to the entire program. A preprocessing step performs macro substitution on program text, inclusion of other source files, and conditional compilation. C is a relatively "low level" language. This characterization is not 1 1 INTRODUCTION pejorative; it simply means that C deals with the same sort of objects that most computers do, namely characters, numbers, and addresses. These may be com- bined and moved about with the arithmetic and logical operators implemented by real machines. C provides no operations to deal directly with composite objects such as character strings, sets, lists, or arrays. There are no operations that manipulate an entire array or string, although structures may be copied as a unit. The language does not define any storage allocation facility other than static defini- tion and the stack discipline provided by the local variables of functions; there is no heap or garbage collection. Finally, C itself provides no input/output facili- ties; there are no READ or WRITE statements, and no built-in file access methods. All of these higher-level mechanisms must be provided by explicitly- called functions. Most C implementations have included a reasonably standard collection of such functions. Similarly, C offers only straightforward, single-thread control flow: tests, loops, grouping, and subprograms, but not multiprogramming, parallel opera- tions, synchronization, or coroutines. Although the absence of some of these features may seem like a grave defi- ciency ("You mean I have to call a function to compare two character strings?"), keeping the language down to modest size has real benefits. Since C is relatively small, it can be described in a small space, and learned quickly. A programmer can reasonably expect to know and understand and indeed regu- larly use the entire language. For many years, the definition of C was the reference manual in the first edition of The C Programming Language. In 1983, the American National Standards Institute (ANSI) established a committee to provide a modern, comprehensive definition of C. The resulting definition, the ANSI standard, or "ANSI C," was completed late in 1988. Most of the features of the standard are already supported by modern compilers. The standard is based on the original reference manual. The language is relatively little changed; one of the goals of the standard was to make sure that most existing programs would remain valid, or, failing that, that compilers could produce warnings of new behavior. For most programmers, the most important change is a new syntax for declaring and defining functions. A function declaration can now include a description of the arguments of the function; the definition syntax changes to match. This extra information makes it much easier for compilers to detect errors caused by mismatched arguments; in our experience, it is a very useful addition to the language. There are other small-scale language changes. Structure assignment and enumerations, which had been widely available, are now officially part of the language. Floating-point computations may now be done in single precision. The properties of arithmetic, especially for unsigned types, are clarified. The preprocessor is more elaborate. Most of these changes will have only minor THE C PROGRAMMING LANGUAGE 3 effects on most programmers. A second significant contribution of the standard is the definition of a library to accompany C. It specifies functions for accessing the operating system (for instance, to read and write files), formatted input and output, memory alloca- tion, string manipulation, and the like. A collection of standard headers pro- vides uniform access to declarations of functions and data types. Programs that use this library to interact with a host system are assured of compatible behavior. Most of the library is closely modeled on the "standard 1/0 library" of the UNIX system. This library was described in the first edition, and has been widely used on other systems as well. Again, most programmers will not see much change.. Because the data types and control structures provided by C are supported directly by most computers, the run-time library required to implement self- contained programs is tiny. The standard library functions are only called explicitly, so they can be avoided if they are not needed. Most can be written in C, and except for the operating system details they conceal, are themselves port- able. Although C matches the capabilities of many computers, it is independent of any particular machine architecture. With a little care' it is easy to write port- able programs, that is, programs that can be run without change on a variety of hardware. The standard makes portability issues explicit, and prescribes a set of constants that characterize the machine on which the program is run. C is not a strongly-typed language, but as it has evolved, its type-checking has been strengthened. The original definition of C frowned on, but permitted, the interchange of pointers and integers; this has long since been eliminated, and the standard now requires the proper declarations and explicit conversionsthat had already been enforced by good compilers. The new function declarations are another step in this direction. Compilers will warn of most type errors, and there is no automatic conversion of incompatible data types. Nevertheless, C retains the basic philosophy that programmers know what they are doing; it only requires that they state their intentions explicitly. C, like any other language, has its blemishes. Some of the operators have the wrong precedence; some parts of the syntax could be better. Nonetheless, C has proven to be an extremely effective and expressive language for a wide variety of programming applications. The book is organized as follows. Chapter 1 is a tutorial on the central part of C. The purpose is to get the reader started as quickly as possible, since we believe strongly that the way to learn a new language is to write programs in it. The tutorial does assume a working knowledge of the basic elements of pro- gramming; there is no explanation of computers, of compilation, nor of the meaning of an expression like n=n+ 1. Although we have tried where possibleto show useful programming techniques, the book is not intended to be a reference work on data structures and algorithms; when forced to make a choice, we have Concentratedon the language. 4 INTRODUCTION Chapters 2 through 6 discuss various aspects of C in more detail, and rather more formally, than does Chapter 1, although the emphasis is still on examples of complete programs, rather than isolated fragments. Chapter 2 deals with the basic data types, operators and expressions. Chapter 3 treats control flow: if-else, switcb, while, for, etc. Chapter 4 covers functions and program structure-external variables, scope rules, multiple source files, and so on-and also touches on the preprocessor. Chapter 5 discusses pointers and address arithmetic. Chapter 6 covers structures and unions. Chapter 7 describes the standard library, which provides a common interface to the operating system. This library is defined by the ANSI standard and is meant to be supported on all machines that support C, so programs that use it for input, output, and other operating system access can be moved from one sys- tem to another without change. Chapter 8 describes an interface between C programs and the UNIX operat- ing system, concentrating on input/output, the file system, and storage alloca- tion. Although some of this chapter is specific to UNIX systems, programmers who use other systems should still find useful material here, including some insight into how one version of the standard library is implemented, and sugges- tions on portability. Appendix A contains a language reference manual. The official statement of the syntax and semantics of C is the ANSI standard itself. That document, however, is intended foremost for compiler writers. The reference manual here conveys the definition of the language more concisely and without the same legalistic style. Appendix B is a summary of the standard library, again for users rather than implementers. Appendix C is a short summary of changes from the original language. In cases of doubt, however, the standard and one's own compiler remain the final authorities on the language. CHAPTER 1: A Tutorial Introduction Let us begin with a quick introduction to C. Our aim is to show the essen- tial elements of the language in real programs, but without getting bogged down in details, rules, and exceptions. At this point, we are not trying to be complete or even precise (save that the examples are meant to be correct). We want to get you as quickly as possible to the point where you can write useful programs, and to do that we have to concentrate on the basics: variables and constants, arithmetic, control flow, functions, and the rudiments of input and output. We are intentionally leaving out of this chapter features of C that are important for writing bigger programs. These include pointers, structures, most of C's rich set of operators, several control-flow statements, and the standard library. This approach has its drawbacks. Most notable is that the complete story on any particular language feature is not found here, and the tutorial, by being brief, may also be misleading. And because the examples do not use the full power of C, they are not as concise and elegant as they might be. We have tried to minimize these effects, but be warned. Another drawback is that later chapters will necessarily repeat some of this chapter. We hope that the repeti- tion will help you more than it annoys. In any case, experienced programmers should be able to extrapolate from the material in this chapter 'to their own programming needs. Beginners should sup- plement it by writing small, similar programs of their own. Both groups can use it as a framework on which to hang the more detailed descriptions that begin in Chapter 2. 1. 1 Getting Started The only way to learn a new programming language is by writing programs in it. The first program to write is the same for all languages: Print the words hello, world This is the big hurdle; to leap over it you have to be able to create the program s 6 A TUTORIAL INTRODUCTION CHAPTER 1 text somewhere, compile it successfully, load it, run it, and find out where your output went. With these mechanical details mastered, everything else is com- paratively easy. In C, the program to print "hello, world"is #include maine ) { printf("hello, world\n"); } Just how to run this program depends on the system you are using. As a specific example, on the UNIX operating system you must create the program in a file whose name ends in ". c", such as hello. c, then compile it with the command cc hello.c If you haven't botched anything, such as omitting a character or misspelling something, the compilation will proceed silently, and make an executable file called a. out. Ifyou run a. out by typing the command a.out it will print hello, world On other systems, the rules will be different; check with a local expert. Now for some explanations about the program itself. A C program, what- ever its.size, consists of functions and variables. A function contains state- ments that specify the computing operations to be done, and variables store values used during the computation. C functions are like the subroutines and functions of Fortran or the procedures and functions of Pascal. Our example is a function named. main. Normally you are at liberty to give functions whatever names you like, but "main" is special-your program begins executing at the beginning of main. This means that every program must have a main some- where. main will usually call other functions to help perform its job, some that you wrote, and others from libraries that are provided for you. The first line of the program, #include tells the compiler to include information about the standard input/output library; this line appears at the beginning of many C source files. The standard library is described in Chapter 7 and Appendix B. One method of communicating data between functions is for the calling function to provide a list of values, called arguments, to the function it calls. The parentheses after the function name surround the argument list. In this SECTION 1.1 GETTING STARTED 7 #include include information about standard library maine) define a Junction named main that receives no argument values { statements oj main are enclosed in braces printf( "hello, world\n"); main calls libraryJunction printf to print this sequence oj characters; } \n represents the newline character The first C program. example, main is defined to be a function that expects no arguments, which is indicated by the empty list ( ). The statements of a function are enclosed in braces {}. The function main contains only one statement, printf( "hello, world\n"); A function is called by naming it, followedby a parenthesized list of arguments, so this calls the function printf with the argument "hello, world\n". printf is a library function that prints output, in this case the string of char- acters between the quotes. A sequence of characters in double quotes, like "hello, world\n", is called a character string or string constant. For the moment our only use of character strings will be as arguments for printf and other functions. The sequence \n in the string is C notation for the newline character, which when printed advances the output to the left margin on the next line. If you leave out the \n (a worthwhile experiment), you will find that there is no line advance after the output is printed. You must use \n to include a newline character in the printf argument; if you try something like printf("hello, world ") ; the C compiler will produce an error message. printf never supplies a newline automatically, so several calls may be used to build up an output line in stages. Our "first program could just as well have been written 8 A TUTORIAL INTRODUCTION CHAPTER 1 #include maine ) { printf("hello, "); printf("world"); printf ("'n" ); } to produce identical output. Notice that \n represents only a single character. An escape sequence like \n provides a general and extensible mechanism for representing hard-to-type or invisible characters. Among the others that C provides are \ t for tab, \b for backspace, \ n for the double quote, and \ \ for the backslash itself. There is a complete list in Section 2.3. Exercise 1-1. Run the "hello, world" program on your system. Experiment with leaving out parts of the program, to see what error messages you get. 0 Exercise 1-2. Experiment to find out what happens when printf's argument string contains \c, where c is some character not listed above. 0 1.2 Variables and Arithmetic Expressions The next program uses the formula 0C - (519)(0 F-32) to print the follow- ing table of Fahrenheit temperatures and their centigrade or Celsius equivalents: o -17 20 -6 40 4 60 15 80 26 100 37 120 48 140 60 160 71 180 82 200 93 220 104 240 115 260 126 280 137 300 148 The program itself still consists of the definition of a single function named main. It is longer than the one that printed "hello, world", but not compli- cated. It introduces several new ideas, including comments, declarations, vari- ables, arithmetic expressions, loops, and formatted output. SECTION 1.2 VARIABLES AND ARITHMETIC EXPRESSIONS 9 #include 1* print Fahrenheit-Celsius table for fahr = 0, 20,..., 300 *1 maine ) { int fahr, celsius; int lower, upper, step; lower = 0; 1* lower limit of temperature table *1 upper = 300; 1* upper limit *1 step = 20; 1* step size *1 fahr = lower; while (fahr = 0 *1 int power(int base, int n) { int i, p; p = 1; for (i = 1; i = 0 *1 1* (old-style version) *1 power (base, n) int base, n; { int i, p; p = 1; for (i = 1; i =O; version 2 *1 int power(int base, int n) { int p; for (p = 1; n > 0; --n) p = p * base; return p; } The parameter n is used as a temporary variable, and is counted down (a for loop that runs backwards) until it becomes zero; there is no longer a need for the variable i. Whatever is done to n inside power has no effect on the argu- ment that power was originally called with. When necessary, it is possible to arrange for a function to modify a variable 28 A TUTORIAL INTRODUCTION CHAPTER 1 in a calling routine. The caller must provide the address of the variable to be set (technically a pointer to the variable), and the called function must declare the parameter to be a pointer and access the variable indirectly through it. We will cover pointers in Chapter 5. The story is different for arrays. When the name of an array is used as an argument, the value passed to the function is the location or address of the beginning of the array-there is no copying of array elements. By subscripting this value, the function can access and alter any element of the array. This is the topic of the next section. 1.9 Character Arrays The most common type of array in C is the array of characters. To illus- trate the use of character arrays and functions to manipulate them, let's write a program that reads a set of text lines and prints the longest. The outline is sim- ple enough: while (there's another line) if (it's longer than the previous longest) save it save its length print longest line This outline makes it clear that the program divides naturally into pieces. One piece gets a new line, another tests it, another saves it, and the rest controls the process. Since things divide so nicely, it would be well to write them that way too. Accordingly, let us first write a separate function getline to fetch the next line of input. We will try to make the function useful in other contexts. At the minimum, getline has to return a signal about possible end of file; a more useful design would be to return the length of the line, or zero if end of file is encountered. Zero is an acceptable end-of-file return because it is never a valid line length. Every text line has at least one character; even a line containing only a newline has length 1. When we find a line that is longer than the previous longest line, it must be saved somewhere. This suggests a second function, copy, to copy the new line to a safe place. Finally, we need a main program to control getline and copy. Here is the result. SECTION 1.9 CHARACTER ARRAYS 29 #include Idefine MAXLINE 1000 1* maximum input line size *1 int getline{char line[], int maxline); void copy{char to[], char from[]); 1* print longest input line *1 main{ ) { int len; 1* current line length *1 int max; 1* maximum length seen so far *1 char line[MAXLINE]; 1* current input line *1 char longest[MAXLINE]; 1* longest line saved here *1 max = 0; while {{len = getline(line, MAXLINE» > 0) if (len> max) { max = len; copy{longest, line); } if (max> 0) 1* there was a line *1 printf{"%s", longest); return 0; } 1* getline: read a line into s, return length *1 int getline{char s[], int lim) { int c, i; for (i=O; i= < b) z = a', else z = b; compute in z the maximum of a and b. The conditional expression, written with the ternary operator "?:", provides an alternate way to write this and similar constructions. In the expression expr I ? expr 2 : expr 3 the expression expr 1 is evaluated first. If it is non-zero (true), then the expres- sion expr-. is evaluated, and that is the value of the conditional expression. Otherwise expr , is evaluated, and that is the value. Only one of expr- and expr 3 is evaluated. Thus to set z to the maximum of a and b, z = (a > b) ? a : b; 1* z = max(a, b) *1 It should be noted that the conditional expression is indeed an expression, and it can be used wherever any other expression can be. If expr-. and expr3 52 TYPES, OPERA TORS AND EXPRESSIONS CHAPTER 2 are of different types, the type of the result is determined by the conversion rules discussed earlier in this chapter. For example, if f is a float and n is an int, then the expression (n > 0) ? f : n is of type float regardless of whether n is positive. Parentheses are not necessary around the first expression of a conditional expression, since the precedence of ?: is very low, just above assignment. They are advisable anyway, however, since they make the condition part of the expression easier to see. The conditional expression often leads to succinct code. For example, this loop prints n elements of an array, 10 per line, with each column separated by one blank, and with each line (including the last) terminated by a newline. for (i = 0; i < n; i++) printf( ""6d"c", a[i], (i%10==9 :: i==n-1) ? '\n' : ' '); A newline is printed after every tenth element, and after the n-th. All other elements are followed by one blank. This might look tricky, but it's more com- pact than the equivalent if-else. Another good example is printf("You have %d item%s.\n",· n, n==1 ? "" : "s"); Exercise 2-10. Rewrite the function lower, which converts upper case letters to lower case, with a conditional expression instead of if-else. 0 2. 12 Precedence and Order of Evaluation Table 2-1 summarizes the rules for precedence and associativity of all opera- tors, including those that we have not yet discussed. Operators on the same line have the same precedence; rows are in order of decreasing precedence, so, for example, *, I, and" all have the same precedence, which is higher than that of binary + and -. The "operator" () refers to function call. The operators -> and are used to access members of structures; they will be covered in Chapter 6, along with sizeof (size of an object). Chapter 5 discusses * (indirection through a pointer) and &. (address of an object), and Chapter 3 discusses the comma operator. Note that the precedenceof the bitwise operators &, ''', and I falls below == and I=. This implies that bit-testing expressions like if « x & MASK) == 0)... must be fully parenthesized to give proper results. C, like most languages, does not specify the order in which the operands of an operator are evaluated. (The exceptions are ss, I I, ?:, and ','.) For example, in a statement like SECTION 2.12 PRECEDENCE AND ORDER OF EVALUATION 53 TABLE 2-1. PRECEDENCE AND ASSOCIATIVITYOF OPERATORS OPERATORS ASSOCIATIVITY ( ) [] -> left to right.. ++ -- + - * s, (type) sizeof right to left left to right * + I " left to right « » left to right < >= left to right -- 1= left to right &. Ieit to right left to right left to right && left to right I I I I left to right ?: right to left = += -= *= 1= %=&= A= 1= «= »= right to left left to right Unary +, -, and * have higher precedence than the binary forms. x = f() + g(); f may be evaluated before g or vice versa; thus if either f or g alters a variable on which the other depends, x can depend on the order of evaluation. Inter- mediate results can be stored in temporary variables to ensure a particular sequence. Similarly, the order in which function arguments are evaluated is not speci- fied, so the statement printf (""d %d\n", ++n, power(2, n»; 1* WRONG *1 can produce different results with different compilers, depending on whether n is incremented before power is called. The solution, of course, is to write ++n; printf( ""d %d\n", n, power(2, n»; Function calls, nested assignment statements, and increment and decrement operators cause "side effects" -some variable is changed as a by-product of the evaluation of an expression. In any expression involving side effects, there can be subtle dependencies on the order in which variables taking part in the expres- sion are updated. One unhappy situation is typified by the statement a[i] = i++; The question is whether the subscript is the old value of i or the new. 54 TYPES, OPERATORS AND EXPRESSIONS CHAPTER 2 Compilers can interpret this in different ways, and generate different answers depending on their interpretation. The standard intentionally leaves most such matters unspecified. When side effects (assignment to variables) take place within an expression is left to the discretion of the compiler, since the best order depends strongly on machine architecture. (The standard does specify that all side effects on arguments take. effect before a function is called, but that would not help in the call to printf above.) The moral is that writing code that depends on order of evaluation is a bad programming practice in any language. Naturally, it is necessary to know what things to avoid, but if you don't know how they are done on various machines, you won't be tempted to take advantage of a particular implementation. CHAPTER 3: Control Flow The control-flow statements of a language specify the order in which compu- tations are performed. We have already met the most common control-flow constructions in earlier examples; here we will complete the set, and be more precise about the ones discussed before. 3. 1 Statements and Blocks An expression such as x = 0 or i++ or printf (...) becomes a statement when it is followedby a semicolon,as in x= 0; i++; printf (...) ; In C, the semicolon is a statement terminator, rather than a separator as it is in languages like Pascal. Braces { and} are used to group declarations and statements together into a compound statement, or block, so that they are syntactically equivalent to a single statement. The braces that surround the statements of a function are one obvious example; braces around multiple statements after an if, else, while, or for are another. (Variables can be declared inside any block; we will talk about this in Chapter 4.) There is no semicolon after the right brace that ends a block. 3.2 If-Else The if -else statement is used to express decisions. Formally, the syntax is if (expression) statement 1 else statement 2 55 56 CONTROL FLOW CHAPTER 3 where the else part is optional. The expression is evaluated; if it is true {that is, if>expression has a non-zero value}, statement 1 is executed. If it is false {expression is zero} and if there is an else part, statement 2 is executed instead. Since an if simply tests the numeric value of an expression, certain coding shortcuts are possible. The most obvious is writing if (expression) instead of if (expression 1= 0) Sometimes this is natural and clear; at other times it can be cryptic. Because the else part of an if-else is optional, there is an ambiguity when an else is omitted from a nested if sequence. This is resolved by asso- ciating the else with the closest previous else-less if. For example, in if (n > 0) if (a > b) z = a; else z = b; the else goes with the inner if, as we have shown by indentation. If that isn't what you want, braces must be used to force the proper association: if (n > 0) { if (a > b) z = a; } else z = b; The ambiguity is especially pernicious in situations like this: if (n >= 0) for (i = 0; i < n; i++) if (s [ i] > 0) { printf( "... n); return i; } printf("error -- n is neqative\n"); The indentation shows unequivocally what you want, but the compiler doesn't get the message, and associates the else with the inner if. This kind of bug can be hard to find; it's a good idea to use braces when there are nested ifs. By the way, notice that there is a semicolon after z = a in SECTION 3.3 ELSE·IF 57 if (a > b) z a; = else z = b; This is because grammatically, a statement follows the if, and an expression statement like "z = a;" is always terminated by a semicolon. 3.3 Else-If The construction if (expression) statement else if (expression) statement else if (expression) statement else if (expression) statement else statement occurs so often that it is worth a brief separate discussion. This sequence of if statements is the most general way of writing a multi-way decision. The expressions are evaluated in order; if any expression is true, the statement asso- ciated with it is executed, and this terminates the whole chain. As always, the code for each statement is either a single statement, or a group in braces. The last else part handles the "none of the above" or default case where none of the other conditions is satisfied. Sometimes there is no explicit action for the default; in that case the trailing else statement can be omitted, or it may be used for error checking to catch an "impossible" condition. To illustrate a three-way decision, here is a binary search function that decides if a particular value x occurs in the sorted array v. The elements of v must be in increasing order. The function returns the position (a number between 0 and n-1) if x occurs in v, and -1 if not. Binary search first compares the input value x to the middle element of the array v. If x is less than the middle value, searching focuses on the lower half of the table, otherwise on the upper half. In either case, the next step is to com- pare x to the middle element of the selected half. This process of dividing the range in two continues until the value is found or the range is empty. 58 CONTROL FLOW CHAPTER 3 1* binsearch: find x in v[O] =O && v[j]>v[j+gap]; j-=gap) { temp = v[ j]; v[j] = v[j+gap]; v[j+gap] = temp; } } There are three nested loops. The outermost controls the gap between com- pared elements, shrinking it from n/2 by a factor of two each pass until it becomes zero. The middle loop steps along the elements. The innermost loop compares each pair of elements that is separated by gap and reverses any that are out of order. Since gap is eventually reduced to one, all elements are even- tually ordered correctly. Notice how the generality of the for makes the outer loop fit the same form as the others, even though it is not an arithmetic progres- sion. One final C operator is the comma " , ", which most often finds use in the for statement. A pair of expressions separated by a comma is evaluated left to right, and the type and value of the result are the type and value of the right operand. Thus in a for statement, it is possible to place multiple expressionsin the various parts, for example to process.two indices in parallel. This is illus- trated in the function reverse (s ), which reverses the string s in place. #include 1* reverse: reverse string s in place *1 void reverse(char s[]) { int e , i, j; for (i = 0, j = strlen(s)-1; i < j; i++, j--) { c = s[i]; 8 = s[j]; s[j] c; = } } SECTION 3.6 LooPS-OO-WHILE 63 The commas that separate function arguments, variables in declarations, etc., are not comma operators, and do not guarantee left to right evaluation. Comma operators should be used sparingly. The most suitable uses are for constructs strongly related to each other, as in the for loop in reverse, and in macros where a multistep computation has to be a single expression. A comma expression might also be appropriate for the exchange of elements in reverse, where the exchange can be thought of as a single operation: for (i = 0, j = strlen(s)-1; i < j; i++, j--) C = s[i], s[i] = s[j], s[j] = c; Exercise 3-3. Write a function expand (s 1, s2) that expands shorthand nota- tions like a - z in the string s 1 into the equivalent complete list abc... xyz in s2. Allow for letters of either case and digits, and be prepared to handle cases like a-b-c and a-zO-9 and -a-z. Arrange that a leading or trailing - is taken literally. 0 3.6 Loops-Do-whlle As we discussed in Chapter 1, the while and for loops test the termination condition at the top. By contrast, the third loop in C, the do-while, tests at the bottom after making each pass through the loop body; the body is always executed at least once. The syntax of the do is do statement whi 1 e ( expression) ; The statement is executed, then expression is evaluated. If it is true, statement is evaluated again, and so on. When the expression becomes false, the loop ter- minates. Except for the sense of the test, do-while is equivalent to the Pascal repeat-until statement. Experience shows that do-while is much less used than while and for. Nonetheless, from time to time it is valuable, as in the following function i toa, which converts a number to a character string (the inverse of atoi). The job is slightly more complicated than might be thought at first, because the easy methods of generating the digits generate them in the wrong order. We have chosen to generate the string backwards, then reverse it. 64 CONTROL FLOW CHAPTER 3 1* itoa: convert n to characters in s *1 void itoa(int n, char s[]) { int i, sign; if «sign = n) < 0) 1* record sign *1 n = -n; 1* make n positive *1 i = 0; do { 1* generate digits in reverse order *1 s[i++] = n % 10 + '0'; 1* get next digit *1 } while «n 1= 10) > 0); 1* delete it *1 if (sign < 0) s[i++] = '-'; s[i] = '\0'; reverse(s); } The do-while is necessary, or at least convenient, since at least one character must be installed in the array s, even if n is zero. We also used braces around the single statement that makes up the body of the do-while, even though they are unnecessary, so the hasty reader will not mistake the while part for the beginning of a whi1e loop. Exercise 3-4. In a two's complement number representation, our version of i toa does not handle the largest negative number, that is, the value of n equal to _(2wordsize-l). Explain why not. Modify it to print that value correctly, regardless of the machine on which it runs. 0 Exercise 3-5. Write the function i tob (n, s ,b) that converts the integer n into a base b character representation in the string s. In particular, i tob (n, s, 16) formats n as a hexadecimal integer in s. 0 Exercise 3-6. Write a version of i toa that accepts three arguments instead of two. The third argument is a minimum field width; the converted number must be padded with blanks on the left if necessary to make it wide enough. 0 3.7 Break and Continue It is sometimes convenient to be able to exit from a loop other than by test- ing at the top or bottom. The break statement provides an early exit from for, while, and do, just as from switch. A break causes the innermost enclosing loop or switch to be exited immediately. The following function, trim, removes trailing blanks, tabs, and newlines from the end of a string, using a break to exit from a loop when the rightmost non-blank, non-tab, non-newline is found. SECTION 3.8 GOTO AND LABELS 65 1* trim: remove trailing blanks, tabs, newlines *1 int trim(char s[]) { int n; for (n = strlen(s)-1; n >= 0; n--) if (s[n ] I= ' , && s[n ] I= ' \ t' && s[n ] I= ' \n' ) break; s[n+1] = '\0'; return n; } strlen returns the length of the string. The for loop starts at the end and scans backwards looking for the first character that is not a blank or tab or newline. The loop is broken when one is found, or when n becomes negative (that is, when the entire string has been scanned). You should verify that this is correct behavior even when the string is empty or contains only white space characters. The continue statement is related to break, but less often used; it causes the next iteration of the enclosing for, while, or do loop to begin. In the while and do, this means that the test part is executed immediately; in the for, control passes to the increment step. The continue statement applies only to loops, not to switch. A continue inside a switch inside a loop causes the next loop iteration. As an example, this fragment processes only the non-negative elements in the array a; negative values are skipped. for (i = 0; i < n; i++) { if (a[i] < 0) 1* skip negative elements *1 continue; 1* do positive elements *1 } The continue statement is often used when the part of the loop that followsis complicated, so that reversing a test and indenting another level would nest the program too deeply. 3.8 Goto and Labels C provides the infinitely-abusable goto statement, and labels to branch to. Formally, the goto is never necessary, and in practice it is almost always easy to write code without it. We have not used goto in this book. Nevertheless, there are a few situations where gotos may find a place. The most common is to abandon processing in some deeply nested structure, such as breaking out of two or more loops at once. The break statement cannot be used directly since it only exits from the innermost loop. Thus: 66 CONTROL FLOW CHAPTER 3 for (...) for (...) { if (disaster) goto error; } error: clean up the mess This organization is handy if the error-handling code is non-trivial, and if errors can occur in several places. A label has the same form as a variable name, and is followedby a colon. It can be attached to any statement in the same function as the qoto. The scope of a label is the entire function. As another example, consider the problem of determining whether two arrays a and b have an element in common. One possibility is for (i = 0; i < n; i++) for (j = 0; j < m; j++) if (a[i] b[j]) == goto found; 1* didn't find any common element *1 found: 1* got one: a[i] == b[j] *1 Code involvinga qoto can always be written without one, though perhaps at the price of some repeated tests or an extra variable. For example, the array search becomes found = 0; for (i = 0; i < n && Ifound; i++) for (j = 0; j < m && Ifound; j++) if (a[i] b[j]) == found = 1; if (found) 1* got one: a[i-1] == b[j-1] *1 else 1* didn't find any common element *1 With a few exceptions like those cited here, code that relies on qoto state- ments is generally harder to understand and to maintain than code without qotos. Although we are not dogmatic about the matter, it does seem that qoto statements should be used rarely, if at all. CHAPTER 4: Functions and Program Structure Functions break large computing tasks into smaller ones, and enable people to build on what others have done instead of starting over from scratch. Appropriate functions hide details of operation from parts of the program that don't need to know about them, thus clarifying the whole, and easing the pain of making changes. C has been designed to make functions efficient and easy to use; C programs generally consist of many small functions rather than a few big ones. A pro- gram may reside in one or more source files. Source files may be compiled separately and loaded together, along with previously compiled functions from libraries. We will not go into that process here, however, since the details vary from system to system. Function declaration and. definition is the area where the ANSI standard has made the most visible changes to C. As we saw first in Chapter 1, it is now possible to declare the types of arguments when a function is declared. The syntax of function definition also changes, so that declarations and definitions match. This makes it possible for a compiler to detect many more errors than it could before. Furthermore, when arguments are properly declared, appropriate type coercions are performed automatically. The standard clarifies the rules on the scope of names; in particular, it requires that there be only one definition of each external object. Initialization is more general: automatic arrays and structures may now be initialized. The C preprocessor has also been enhanced. New preprocessor facilities include a more complete set of conditional compilation directives, a way to create quoted strings from macro arguments, and better control over the macro expansion process. 4.1 Basics of Functions To begin, let us design and write a program to print each line of its input that contains a particular "pattern" or string of characters. (This is a special case of the UNIX program grep.) for example, searching for the pattern of 67 68 FUNCTIONS AND PROGRAM STRUCTURE CHAPTER 4 letters "ould"in the set of lines Ah Lovel could you and I with Fate conspire To grasp this sorry Scheme of Things entire, Would not we shatter it to bits -- and then Re-mould it nearer to the Heart's Desire I will produce the output Ah Lovel could you and I with Fate conspire Would not we shatter it to bits -- and then Re-mould it nearer to the Heart's Desire I The job falls neatly into three pieces: while (there's another line) if (the line contains the pattern) print it Although it's certainly possible to put the code for all of this in main, a better way is to use the structure to advantage by making each part a separate function. Three small pieces are easier to deal with than one big one, because irrelevant details can be buried in the functions, and the chance of unwanted interactions is minimized. And the pieces may even be useful in other pro- grams. "While there's another line" is getline, a function that we wrote in Chapter 1, and "print it" is printf, which someone has already provided for us. This means we need only write a routine to decide whether the line contains an occurrence of the pattern. We can solve that problem by writing a function strindex (s , t) that returns the position or index in the string s where the string t begins;. or -1 if s doesn't contain t. Because C arrays begin at position zero, indexes will be zero or positive, and so a negative value like -1 is convenient for signaling failure. When we later need more sophisticated pattern matching, we only have to replace strindex; the rest of the code cart remain the same. (The standard library provides a function strstr that is similar to strindex, except that it returns a pointer instead of an index.) Given this much design, filling in the details of the program is straightfor- ward. Here is the whole thing, so you can see how the pieces fit together. For now, the pattern to be searched for is a literal string, which is not the most gen- eral of mechanisms. We will return shortly to a discussion of how to initialize character arrays, and in Chapter 5 will show how to make the pattern a param- eter that is set when the program is run. There is also a slightly different.ver- sion of getline; you might find it instructive to compare it to the one in Chapter 1. SECTION 4.1 BASICS OF FUNCTIONS 69 #include #define MAXLINE 1000 1* maximum input line length *1 int getline (char line [], int max); int strindex(char source[], char searchfor[]); char pattern[] = "ould"; 1* pattern to search for *1 1* find all lines matching pattern *1 main( ) { char line[MAXLINE]; int found = 0; while (getline(line, MAXLINE) > 0) if (strindex(line, pattern) >= 0) { printf("%s", line); found++; } return found; } 1* getline: get line into s, return length *1 int getline(char s[], int lim) { int c, i; i = 0; while (--lim> 0 && (c=getchar(» 1= EOF && c 1= '\n') s[i++] = c; if (c = = ' \n' ) s[i++] = c; s[i] = '\0'; return i; } 1* strindex: return index of t in s, -1 if none *1 int strindex(char s[], char t[]) { int i, j, k; for (i = 0; s[i] 1= '\0'; i++) { for (j=i, k=O; t[k]I='\O' && s[j]==t[k]; j++, k++) ; if (k > 0 && t[k] == '\0') return i; } return -1; } Each function definition has the form 70 FUNCTIONS AND PROGRAM STRUCTURE CHAPTER 4 return-type function-name (argument declarations) { declarations and statements } Various parts may be absent; a minimal function is dummy() {} which does nothing and returns nothing. A do-nothing function like this is sometimes useful as a place holder during program development. If the return type is omitted, int is assumed. A program is just a set of definitions of variables and functions. Communi- cation between the functions is by arguments and values returned by the func- tions, and through external variables. The functions can occur in any order in the source file, and the source program can be split into multiple files, so long as no function is split. The returh statement is the mechanism for returning a value from the called function to its caller. Any expressioncan follow return: return expression; The expression will be converted to the return type of the function if necessary. Parentheses are often used around the expression, but they are optional. The calling function is free to ignore the returned value. Furthermore, there need be no expression after return; in that case, no value is returned to the caller. Control also returns to the caller with no value when execution "falls off the end" of the function by reaching the closing right brace. It is.not illegal, but probably a sign of trouble, if a function returns a value from one place and no value from another. In any case, if a function fails to return a value, its "value" is certain to be garbage. The pattern-searching program returns a status from main,the number of matches found. This value is available for use by the environment that called the program. The mechanics of how to compile and load a C program that resides on mul- tiple source files vary from one system to the next. On the UNIX system, for example, the cc command mentioned in Chapter 1 does the job. Suppose that the three functions are stored in three files called main.c, qetline. c, and strindex. c. Then the command cc main.c getline.c strindex.c compiles the three files, placing the resulting object code in files main.0, qetline.o, and strindex. 0, then loads them all into an executable file called a. out. If there is an error, say in main.c, that file can be recompiled by itself and the result loaded with the previous object files, with the command cc main.c qetline.o strindex.o The cc command uses the " c" versus ".0" naming convention to distinguish SECTION 4.2 FUNCTIONS RETURNING NON-INTEGERS 71 source files from object files. Exercise 4-1. Write the function strrindex (s, t), which returns the position of the rightmost occurrence of t in s, or - 1 if there is none. 0 4.2 Functions Returning Non-Integers So far our examples of functions have returned either no value (void) or an into What if a function must return some other type? Many numerical func- tions like sqrt, sin, and cos return double; other specialized functions return other types. To illustrate how to deal with this, let us write and use the function atof (s), which converts the string s to its double-precision floating- point equivalent. atof is an extension of atoi, which we showed versionsof in Chapters 2 and 3. It handles an optional sign and decimal point, and the pres- ence or absence of either integer part or fractional part. Our version is not a high-quality input conversion routine; that would take more space than we care to use. The standard library includes an atof; the header declares it. First, atof itself must declare the type of value it returns, since it is not into The type name precedes the function name: #include 1* atof: convert string s to double *1 double atof(char s[]) { double val, power; int i, sign; for (i = 0; isspace(s[i]); i++) 1* skip white space *1 sign = (s[i] == '-') ? -1 : 1; if (s[i] == '+' I I sri] == '-') i++; for (val = 0.0; isdigit(s[i]); i++) val = 10.0 * val + (s[i] - '0'); if (s[i] == '.') i++; for (power = 1.0; isdigit(s[i]); i++) { val = 10.0 * val + (s[i] - '0'); power *= 10.0; } return sign * val I power; } Second, and just as important, the calling routine must know that atof returns a non-int value. One way to ensure this is to declare atof explicitly 71 FUNCTIONS AND PROGRAM STRUCTURE CHAPTER 4 in the calling routine. The declaration is shown in this primitive calculator (barely adequate for check-book balancing), which reads one number per line, optionally preceded by a sign, and adds them up, printing the running sum after each input: #include #define MAXLINE 100 1* rudimentary calculator *1 main( ) { double sum, atof(char []); char line[MAXLINE]; int qetline(char liner], int max); sum = 0; while (qetline(line, MAXLINE) > 0) printf("\t"q\n", sum += atof(line»; return 0; } The declaration double sum, atof(char []); says that sumis a double variable, and that atof is a function that takes one char [ ] argument and returns a double. The function atof must be declared and defined consistently. If atof itself and the call to it in main have inconsistent types in the same source file, the error will be detected by the compiler. But if (as is more likely) atof were compiled separately, the mismatch would not be detected, atof would return a double that main would treat as an int, and meaningless answers would result. In the light of what we have said about how declarations must match defini- tions, this might seem surprising. The reason a mismatch can happen is that if there is no function prototype, a function is implicitly declared by its first appearance in an expression, such as sum += atof(line) If a name that has.not been previously declared occurs in an expression and is followed by a left parenthesis, it is declared by context to be a function name, the function is assumed to return an int, and nothing is assumed about its arguments. Furthermore, if a function declaration does not include arguments, as in double atof(); that too is taken to mean that nothing is to be assumed about the arguments of atof; all parameter checking is turned off. This special meaning of the empty SECTION 4.3 EXTERNAL VARIABLES 73 argument list is intended to permit older C programs to compile with new com- pilers. But it's a bad idea to use it with new programs. If the function takes arguments, declare them; if it takes no arguments, use void. Given atof, properly declared, we could write atoi (convert a string to int) in terms of it: 1* atoi: convert string s to integer using atof *1 int atoi(char s[]) { double atof(char s[]); return (int) atof(s); } Notice the structure of the declarations and the return statement. The value of the expression in return expression; is converted to the type of the function before the return is taken. Therefore, the value of atof, a double, is converted automatically to int when it appears in this return, since the function atoi returns an into This opera- tion does potentially discard information, however, so some compilers warn of it. The cast states explicitly that the operation is intended, and suppresses any warning. Exercise 4-2. Extend atof to handle scientific notation of the form 123.45e-6 where a floating-point number may be followed by e or E and an optionally signed exponent. 0 4.3 External Variables A C program consists of a set of external objects, which are either variables or functions. The adjective "external" is used in contrast to "internal," which describes the arguments and variables defined inside functions. External vari- ables are defined outside of any function, and are thus potentially available to many functions. Functions themselves are always external, because C does not allow functions to be defined inside other functions. By default, external vari- ables and functions have the property that all references to them by the same name, even from functions compiled separately, are references to the same thing. (The standard calls this property external linkage.) In this sense, exter- nal variables are analogous to Fortran COMMON blocks or variables in the outermost block in Pascal. We will see later how to define external variables and functions that are visible only within a single source file. 74 FUNCTIONS AND PROGRAM STRUCTURE CHAPTER 4 Because external variables are globally accessible, they provide an alternative to function arguments and return values for communicating data between func- tions. Any function may access an external variable by referring to it by name, if the name has been declared somehow. If a large number of variables must be shared among functions, external variables are more convenient and efficient than long argument lists. As pointed out in Chapter 1, however, this reasoning should be applied with some caution, for it can have a bad effect on program structure, and lead to programs with too many data connections between functions. External variables are also useful because of their greater scope and lifetime. Automatic variables are internal to a function; they come into existence when the function is entered, and disappear when it is left. External variables, on the other hand, are permanent, so they retain values from one function invocation to the next. Thus if two functions must share some data, yet neither calls the other, it is often most convenient if the shared data is kept in external variables rather than passed in and out via arguments. Let us examine this issue further with a larger example. The problem is to write a calculator program that provides the operators +, -, *, and I. Because it is easier to implement, the calculator will use reverse Polish notation instead of infix. (Reverse Polish is used by some pocket calculators, and in languages like Forth and Postscript.) In reverse Polish notation, each operator follows its operands; an infix expression like (1 - 2) * (4 + 5) is entered as 1 2 - 45+ * Parentheses are not needed; the notation is unambiguous as long as we know how many operands each operator expects. The implementation is simple. Each operand is pushed onto a stack; when an operator arrives, the proper number of operands (two for binary operators) is popped, the operator is applied to them, and the result is pushed back onto the stack. In the example above, for instance, 1 and 2 are pushed, then replaced by their difference, -1. Next, 4 and 5 are pushed and then replaced by their sum, 9. The product of -1 and 9, which is -9, replaces them on the stack. The value on the top of the stack is popped and printed when the end of the input line is encountered. The structure of the program is thus a loop that performs the proper opera- tion on each operator and operand as it appears: SECTION 4.3 EXTERNAL VARIABLES 75 while (next operator or operand is not end-of-file indicator) if (number) push it else if (operator) pop operands do operation push result else if (newline) pop and print top of stack else error The operations of pushing and popping a stack are trivial, but by the time error detection and recovery are added, they are long enough that it is better to put each in a separate function than to repeat the code throughout the whole program. And there should be a separate function for fetching the next input operator or operand. The main design decision that has not yet been discussed is where the stack is, that is, which routines access it directly. One possibility is to keep it in main, and pass the stack and the current stack position to the routines that push and pop it. But main doesn't need to know about the variables that con- trol the stack; it only does push and pop operations. So we have decided to store the stack and its associated information in external variables accessible to the push and pop functions but not to main. Translating this outline into code is easy enough. If for now we think of the program as existing in one source file, it will look like this: 'includes.'defines function declarations for main main() {... } external variables for push and pop void push(double f) {... } double pop(void) {... } int getop (char s [ ]) {... } routines called by getop Later we will discuss how this might be split into two or more source files. The function main is a loop containing a big switch on the type of opera- tor or operand; this is a more typical use of swi tch than the one shown in Sec- tion 3.4. 76 FUNCTIONS AND PROGRAM STRUCTURE CHAPTER 4 #include #include 1* for atof() *1 #define MAXOP 100 1* max size of operand or operator *1 #define NUMBER '0' 1* signal that a number was found *1 int getop(char []); void push(double); double pop(void); 1* reverse Polish calculator *1 maine ) { int type; double op2;' char s[MAXOP] ; while «type = getop(s» 1= EOF) { switch (type) { case NUMBER: push(atof(s»; break; case '+' : push(pop() + pope»~; break; case '*': push(pop() * pope»~; break; case '-' : op2 = pop(); push (pop () - op2); break; case 'I': op2 = pop(); if (op2 1= 0.0) push(pop() lop2); else printf(nerror: zero divisor\nn); break; case '\n': printf (n\t".8g\n n, pop ()); break; default: printf(nerror: unknown command "s\nn, s); break; } } return 0; } SECTION 4.3 EXTERNAL VARIABLES 77 Because + and * are commutative operators, the order in which the popped operands are combined is irrelevant, but for - and / the left and right operands must be distinguished. In push(pop() - pOp(»; the order in which the two calls of pop are evaluated is not defined. To guarantee the right order, it is necessary to pop the first value into a temporary variable as we did in main. #define MAXVAL 100 1* maximum depth of val stack *1 int sp = 0; 1* next free stack position *1 double val[MAXVAL]; 1* value stack *1 1* push: push f onto value stack *1 void push(double f) { if (sp < MAXVAL) val[sp++] = f; else printf("error: stack full, can't push "9'\n",f); } 1* pop: pop and return top value from stack *1 double pop(void) { if (sp > 0) return val[--sp]; else { printf("error: stack empty\n"); return 0.0; } } A variable is external if it is defined outside of any function. Thus the stack and stack index that must be shared by push and pop are defined outside of these functions. But main itself does not refer to the stack or stack position- the representation can be hidden. Let us now turn to the implementation of getop, the function that fetches the next operator or operand. The task is easy. Skip blanks and tabs. If the next character is not a digit or a decimal point, return it. Otherwise, collect a string of digits (which might include a decimal point), and return NUMBER, the signal that a number has been collected. 78 FUNCTIONS AND PROGRAM STRUCTURE CHAPTER 4 #include int getch(void); void ungetch(int); 1* getop: get next operator or numeric operand *1 int getop(char s[]) { int i, c; while «s[O] = c = getch(» -- , , I I I I c == '\t') 8 = '\0'; if (Iisdigit(c) && c 1= '.') return c; 1* not a number *1 i = 0; if (isdigit(c» 1* collect integer part *1 while (isdigit(s[++i] c getch(») = = , if (c == '.')1* collect fraction part *1 while (isdigit(s[++i] = c = getch()}) , s[i] = '\0'; if (c 1= EOF) ungetch(c); return NUMBER; } What are getch and ungetch? It is often the case that a program cannot determine that it has read enough input until it has read too much. One instance is collecting the characters that make up a number: until the first non- digit is seen, the number is not complete. But then the program has read one character too far, a character that it is not prepared for. The problem would be solved if it were possible to "un-read" the unwanted character. Then, every time the program reads one character too many, it could push it back on the input, so the rest of the code could behave as if it had never been read. Fortunately, it's easy to simulate un-getting a character, by writing a pair of cooperating functions. getch delivers the next input character to be considered; ungetch remembers the characters put back on the input, so that subsequent calls to getch will return them before reading new input. How they work together is simple. unqe cch puts the pushed-back charac- ters into a shared buffer-a character array. getch reads from the buffer if there is anything there, and calls getc~ar if the buffer is empty. There must also be an index variable that records the Position of the current character in the buffer. Since the buffer and the index are shared by getch and ungetch and must retain their values between calls, they must be external to both routines. Thus we can write getch, ungetch, and their shared variables as: SECTION 4.3 EXTERNAL VARIABLES 79 #define BUFSIZE 100 char buf[BUFSIZE]; I. buffer for ungetch.1 int bufp = 0; I. next free position in buf.1 int getch(void) I. get a (possibly pushed back) character.1 { return (bufp > 0) ? buf[--bufp] : getchar(); } void ungetch(int c) I. push character back on input.1 { if (bufp >= BUFSIZE) printf("ungetch: too many characters\n"); else buf[bufp++] = c; } The standard library includes a function ungetc that provides one character of push back; we will discuss it in Chapter 7. We have used an array for the push- back, rather than a single character, to illustrate a more general approach. Exercise 4-3. Given the basic framework, it's straightforward to extend the cal- culator. Add the modulus (%) operator and provisions for negative numbers. 0 Exercise 4-4. Add commands to print the top element of the stack without pop- ping, to duplicate it, and to swap the top two elements. Add a command to clear the stack. 0 Exercise 4-5. Add access to library functions like sin, exp, and pow. See in Appendix B, Section 4. 0 Exercise 4-6. Add commands for handling variables. (It's easy to provide twenty-six variables with single-letter names.) Add a variable for the most recently printed value. 0 Exercise 4-7. Write a routine ungets (s) that will push back an entire string onto the input. Should ungets know about buf and bufp, or should it just use ungetch? 0 Exercise 4-8. Suppose that there will never be more than one character of pushback. Modify getch and ungetch accordingly. 0 Exercise 4-9. Our getch and ungetch do not handle a pushed-back EOF correctly. Decide what their properties ought to be if an EOFis pushed back, then implement your design. 0 Exercise 4-10. An alternate organization uses getline to read an entire input line; this makes getch and ungetch unnecessary. Revise the calculator to use this approach. 0 80 FUNCTIONS AND PROGRAM STRUCTURE CHAPTER 4 4.4 Scope Rules The functions and external variables that make up a C program need not all be compiled at the same time; the source text of the program may be kept in several files, and previously compiled routines may be loaded from libraries. Among the questions of interest are How are declarations written so that variables are properly declared during compilation? How are declarations arranged so;that all the pieces will be properly con- nected when the program is loaded? How are declarations organized so there is only one copy? How are external variables initialized? Let us discuss these topics by reorganizing the calculator program into several files. As a practical matter, the calculator is too small to be worth splitting, but it is a fine illustration of the issues that arise in larger programs. The scope of a name is the part of the program within which the name can be used. For an automatic variable declared at the beginning of a function, the scope is the function in which the name is declared. Local variables of the same name in different functions are unrelated. The same is true of the parameters of the function, which are in effect local variables. The scope of an external variable or a function lasts from the point at which it is declared to the end of the file being compiled. For example, if main, sp, val, push, and pop are defined in one file, in the order shown above, that is, main() {...} int sp = 0; double val[MAXVAL]; void push (double f) {... } double pop(void) {...} then the variables sp and val may be used in push and pop simply by nam- ing them; no further declarations are needed. But these names are not visible in main, nor are push and pop themselves. On the other hand, if an external variable is to be referred to before it is defined, or if it is defined in a different source file from the one where it is being used, then an extern declaration is mandatory. It is important to distinguish between the declaration of an external variable and its definition. A declaration announces the properties of a variable (pri- marily its type); a definition also causes storage to be set aside. If the lines int sp; double val[MAXVAL]; appear outside of any function, they define the external variables sp and val, SECTION 4.5 HEADER FILES 81 cause storage to be set aside, and also serve as the declaration for the rest of that source file. On the other hand, the lines extern int sp; extern double vall]; declare for the rest of the source file that sp is an int and that val is a double array (whose size is determined elsewhere), but they do not create the variables or reserve storage for them. There must be only one definition of an external variable among all the files that make up the source program; other files may contain extern declarations to access it. (There may also be extern declarations in the file containing the definition') Array sizes must be specified with the definition, but are optional with an extern declaration. Initialization of an external variable goes only with the definition. Although it is not a likely organization for this program, the functions push and pop could be defined in one file, and the variables val and sp defined and initialized in another. Then these definitions and declarations would be neces- sary to tie them together: Infilel: extern int sp; extern double vall]; void push(double f) { } double pop (void) {...} Infile2: int sp = 0; double val [MAXVAL] ; Because the extern declarations in file} lie ahead of and outside the function definitions, they apply to all functions; one set of declarations suffices for all of file}. This same organization would also be needed if the definitions of sp and val followed their use in one file. 4.5 Header Flies Let us now consider dividing the calculator program into several source files, as it might be if each of the components were substantially bigger. The main function would go in one file, which we will call main. c; push, pop, and their variables go into a second file, stack. c; getop goes into a third, getop. c. Finally, getch and ungetch go into a fourth file, getch. c; we separate them from the others because they would come from a separately-compiled library in a realistic program. 82 FUNCTIONS AND PROGRAM STRUCTURE CHAPTER 4 There is one more thing to worry about-the definitions and declarations shared among the files. As much as possible, we want to centralize this, so that there is only one copy to get right and keep right as the program evolves. Accordingly, we will place this common material in a header file, calc. h, which will be included as necessary. (The #include line is described in Sec- tion 4.11.) The resulting program then looks like this: calc.h: #define NUMBER '0' void push(double); double pop(void); int getop(char []); int getch(void); void ungetch(int); main.c: getop.c: stack.c: #include #include 'include #include #include 'include "calc.h" #include "calc.h" 'include "calc.h" #define MAXVAL 100 #define MAXOP 100 getop( ) int sp = 0; main() { double val[MAXVAL]; void push(double) { } getch.c: double pop(void) 'include #define BUFSIZE 100 char buf[BUFSIZE]; int bufp = 0; int getchtvoid) void ungetch(int) There is a tradeoff between the desire that each file have access only to the information it needs for its job and the practical reality that it is harder to maintain more header files. Up to some moderate program size, it is probably best to have one header file that contains everything that is to be shared between any two parts of the program; that is the decision we made here. For a much larger program, more organization and more headers would be needed. SECTION 4.7 REGISTER VARIABLES 83 4.6 Static Variables The variables sp and val in stack. c, and buf and bufp in getch. c, are for the private use of the functions in their respective source files, and are not meant to be accessed by anything else. The static declaration, applied to an external variable or function, limits the scope of that object to the rest of the source file being compiled. External static thus provides a way to hide names like buf and bufp in the getch-ungetch combination, which must be external so they can be shared, yet which should not be visible to users of getch and ungetch. Static storage is specified by prefixing the normal declaration with the word static. If the two routines and the two variables are compiled in one file, as in static char buf[BUFSIZE]; 1* buffer for ungetch *1 static int bufp = 0; 1* next free position in buf *1 int getch (void) {...} void ungetch (int c) {...} then no other routine will be able to access buf and bufp, and those names will not conflict with the same names in other files of the same program. In the same way, the variables that push and pop use for stack manipulation can be hidden, by declaring sp and val to be static. The external static declaration is most often used for variables, but it can be applied to functions as well. Normally, function names are global, visible to any part of the entire program. If a function is declared static, however, its name is invisible outside of the file in which it is declared. The static declaration can also be applied to internal variables. Internal static variables are local to a particular function just as automatic variables are, but unlike automatics, they remain in existence rather than coming and going each time the function is activated. This means that internal static variables provide private, permanent storage within a single function. Exercise 4

The C Programming Language (Second Edition) PDF

Document Details

Tags

Related

Summary

Full Transcript