UNIX Network Programming PDF - The Sockets Networking API, Volume 1, Third Edition
Document Details
2004
W. Richard Stevens
Summary
This book, UNIX Network Programming: The Sockets Networking API, Volume 1, Third Edition, by W. Richard Stevens, Bill Fenner, and Andrew M. Rudoff, covers network programming with the sockets API. It introduces TCP/IP and then works through socket programming for TCP, UDP, and SCTP on Unix systems, from elementary sockets to advanced topics such as multicasting, raw sockets, and client/server design alternatives.
Full Transcript
Stevens_title.fm Page i Tuesday, October 21, 2003 11:29 AM

UNIX Network Programming
The Sockets Networking API

Addison-Wesley Professional Computing Series
Brian W. Kernighan and Craig Partridge, Consulting Editors

Matthew H. Austern, Generic Programming and the STL: Using and Extending the C++ Standard Template Library
David R. Butenhof, Programming with POSIX® Threads
Brent Callaghan, NFS Illustrated
Tom Cargill, C++ Programming Style
William R. Cheswick/Steven M. Bellovin/Aviel D. Rubin, Firewalls and Internet Security, Second Edition: Repelling the Wily Hacker
David A. Curry, UNIX® System Security: A Guide for Users and System Administrators
Stephen C. Dewhurst, C++ Gotchas: Avoiding Common Problems in Coding and Design
Erich Gamma/Richard Helm/Ralph Johnson/John Vlissides, Design Patterns: Elements of Reusable Object-Oriented Software
Erich Gamma/Richard Helm/Ralph Johnson/John Vlissides, Design Patterns CD: Elements of Reusable Object-Oriented Software
Peter Haggar, Practical Java™ Programming Language Guide
David R. Hanson, C Interfaces and Implementations: Techniques for Creating Reusable Software
Mark Harrison/Michael McLennan, Effective Tcl/Tk Programming: Writing Better Programs with Tcl and Tk
Michi Henning/Steve Vinoski, Advanced CORBA® Programming with C++
Brian W. Kernighan/Rob Pike, The Practice of Programming
S. Keshav, An Engineering Approach to Computer Networking: ATM Networks, the Internet, and the Telephone Network
John Lakos, Large-Scale C++ Software Design
Scott Meyers, Effective C++ CD: 85 Specific Ways to Improve Your Programs and Designs
Scott Meyers, Effective C++, Second Edition: 50 Specific Ways to Improve Your Programs and Designs
Scott Meyers, More Effective C++: 35 New Ways to Improve Your Programs and Designs
Scott Meyers, Effective STL: 50 Specific Ways to Improve Your Use of the Standard Template Library
Robert B. Murray, C++ Strategies and Tactics
David R. Musser/Gillmer J. Derge/Atul Saini, STL Tutorial and Reference Guide, Second Edition: C++ Programming with the Standard Template Library
John K. Ousterhout, Tcl and the Tk Toolkit
Craig Partridge, Gigabit Networking
Radia Perlman, Interconnections, Second Edition: Bridges, Routers, Switches, and Internetworking Protocols
Stephen A. Rago, UNIX® System V Network Programming
Curt Schimmel, UNIX® Systems for Modern Architectures: Symmetric Multiprocessing and Caching for Kernel Programmers
W. Richard Stevens/Bill Fenner/Andrew M. Rudoff, UNIX Network Programming Volume 1, Third Edition: The Sockets Networking API
W. Richard Stevens, Advanced Programming in the UNIX® Environment
W. Richard Stevens, TCP/IP Illustrated, Volume 1: The Protocols
W. Richard Stevens, TCP/IP Illustrated, Volume 3: TCP for Transactions, HTTP, NNTP, and the UNIX® Domain Protocols
W. Richard Stevens/Gary R. Wright, TCP/IP Illustrated Volumes 1-3 Boxed Set
John Viega/Gary McGraw, Building Secure Software: How to Avoid Security Problems the Right Way
Gary R. Wright/W. Richard Stevens, TCP/IP Illustrated, Volume 2: The Implementation
Ruixi Yuan/W. Timothy Strayer, Virtual Private Networks: Technologies and Solutions

Visit www.awprofessional.com/series/professionalcomputing for more information about these titles.

UNIX Network Programming
The Sockets Networking API
Volume 1
Third Edition

W. Richard Stevens
Bill Fenner
Andrew M. Rudoff

Boston San Francisco New York Toronto Montreal London Munich Paris Madrid Capetown Sydney Tokyo Singapore Mexico City

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and Addison-Wesley was aware of a trademark claim, the designations have been printed with initial capital letters or in all capitals.
The authors and publisher have taken care in the preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein.

The publisher offers discounts on this book when ordered in quantity for bulk purchases and special sales. For more information, please contact:

U.S. Corporate and Government Sales
(800) 382-3419
[email protected]

For sales outside of the U.S., please contact:

International Sales
(317) 581-3793
[email protected]

Visit Addison-Wesley on the Web: www.awprofessional.com

Library of Congress Cataloging-in-Publication Data
A CIP catalog record for this book can be obtained from the Library of Congress.

Copyright © 2004 by Pearson Education, Inc.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form, or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior consent of the publisher. Printed in the United States of America. Published simultaneously in Canada.

For information on obtaining permission for use of material from this work, please submit a written request to:

Pearson Education, Inc.
Rights and Contracts Department
75 Arlington Street, Suite 300
Boston, MA 02116
Fax: (617) 848-7047

ISBN: 0-13-141155-1
Text printed on recycled paper
First printing

To Rich. Aloha nui loa.

Contents

Foreword
Preface

Part 1. Introduction and TCP/IP

Chapter 1. Introduction
1.1 Introduction
1.2 A Simple Daytime Client
1.3 Protocol Independence
1.4 Error Handling: Wrapper Functions
1.5 A Simple Daytime Server
1.6 Roadmap to Client/Server Examples in the Text
1.7 OSI Model
1.8 BSD Networking History
1.9 Test Networks and Hosts
1.10 Unix Standards
1.11 64-Bit Architectures
1.12 Summary

Chapter 2. The Transport Layer: TCP, UDP, and SCTP
2.1 Introduction
2.2 The Big Picture
2.3 User Datagram Protocol (UDP)
2.4 Transmission Control Protocol (TCP)
2.5 Stream Control Transmission Protocol (SCTP)
2.6 TCP Connection Establishment and Termination
2.7 TIME_WAIT State
2.8 SCTP Association Establishment and Termination
2.9 Port Numbers
2.10 TCP Port Numbers and Concurrent Servers
2.11 Buffer Sizes and Limitations
2.12 Standard Internet Services
2.13 Protocol Usage by Common Internet Applications
2.14 Summary

Part 2. Elementary Sockets

Chapter 3. Sockets Introduction
3.1 Introduction
3.2 Socket Address Structures
3.3 Value-Result Arguments
3.4 Byte Ordering Functions
3.5 Byte Manipulation Functions
3.6 inet_aton, inet_addr, and inet_ntoa Functions
3.7 inet_pton and inet_ntop Functions
3.8 sock_ntop and Related Functions
3.9 readn, writen, and readline Functions
3.10 Summary

Chapter 4. Elementary TCP Sockets
4.1 Introduction
4.2 socket Function
4.3 connect Function
4.4 bind Function
4.5 listen Function
4.6 accept Function
4.7 fork and exec Functions
4.8 Concurrent Servers
4.9 close Function
4.10 getsockname and getpeername Functions
4.11 Summary

Chapter 5. TCP Client/Server Example
5.1 Introduction
5.2 TCP Echo Server: main Function
5.3 TCP Echo Server: str_echo Function
5.4 TCP Echo Client: main Function
5.5 TCP Echo Client: str_cli Function
5.6 Normal Startup
5.7 Normal Termination
5.8 POSIX Signal Handling
5.9 Handling SIGCHLD Signals
5.10 wait and waitpid Functions
5.11 Connection Abort before accept Returns
5.12 Termination of Server Process
5.13 SIGPIPE Signal
5.14 Crashing of Server Host
5.15 Crashing and Rebooting of Server Host
5.16 Shutdown of Server Host
5.17 Summary of TCP Example
5.18 Data Format
5.19 Summary

Chapter 6. I/O Multiplexing: The select and poll Functions
6.1 Introduction
6.2 I/O Models
6.3 select Function
6.4 str_cli Function (Revisited)
6.5 Batch Input and Buffering
6.6 shutdown Function
6.7 str_cli Function (Revisited Again)
6.8 TCP Echo Server (Revisited)
6.9 pselect Function
6.10 poll Function
6.11 TCP Echo Server (Revisited Again)
6.12 Summary

Chapter 7. Socket Options
7.1 Introduction
7.2 getsockopt and setsockopt Functions
7.3 Checking if an Option Is Supported and Obtaining the Default
7.4 Socket States
7.5 Generic Socket Options
7.6 IPv4 Socket Options
7.7 ICMPv6 Socket Option
7.8 IPv6 Socket Options
7.9 TCP Socket Options
7.10 SCTP Socket Options
7.11 fcntl Function
7.12 Summary

Chapter 8. Elementary UDP Sockets
8.1 Introduction
8.2 recvfrom and sendto Functions
8.3 UDP Echo Server: main Function
8.4 UDP Echo Server: dg_echo Function
8.5 UDP Echo Client: main Function
8.6 UDP Echo Client: dg_cli Function
8.7 Lost Datagrams
8.8 Verifying Received Response
8.9 Server Not Running
8.10 Summary of UDP Example
8.11 connect Function with UDP
8.12 dg_cli Function (Revisited)
8.13 Lack of Flow Control with UDP
8.14 Determining Outgoing Interface with UDP
8.15 TCP and UDP Echo Server Using select
8.16 Summary

Chapter 9. Elementary SCTP Sockets
9.1 Introduction
9.2 Interface Models
9.3 sctp_bindx Function
9.4 sctp_connectx Function
9.5 sctp_getpaddrs Function
9.6 sctp_freepaddrs Function
9.7 sctp_getladdrs Function
9.8 sctp_freeladdrs Function
9.9 sctp_sendmsg Function
9.10 sctp_recvmsg Function
9.11 sctp_opt_info Function
9.12 sctp_peeloff Function
9.13 shutdown Function
9.14 Notifications
9.15 Summary

Chapter 10. SCTP Client/Server Example
10.1 Introduction
10.2 SCTP One-to-Many-Style Streaming Echo Server: main Function
10.3 SCTP One-to-Many-Style Streaming Echo Client: main Function
10.4 SCTP Streaming Echo Client: str_cli Function
10.5 Exploring Head-of-Line Blocking
10.6 Controlling the Number of Streams
10.7 Controlling Termination
10.8 Summary

Chapter 11. Name and Address Conversions
11.1 Introduction
11.2 Domain Name System (DNS)
11.3 gethostbyname Function
11.4 gethostbyaddr Function
11.5 getservbyname and getservbyport Functions
11.6 getaddrinfo Function
11.7 gai_strerror Function
11.8 freeaddrinfo Function
11.9 getaddrinfo Function: IPv6
11.10 getaddrinfo Function: Examples
11.11 host_serv Function
11.12 tcp_connect Function
11.13 tcp_listen Function
11.14 udp_client Function
11.15 udp_connect Function
11.16 udp_server Function
11.17 getnameinfo Function
11.18 Re-entrant Functions
11.19 gethostbyname_r and gethostbyaddr_r Functions
11.20 Obsolete IPv6 Address Lookup Functions
11.21 Other Networking Information
11.22 Summary

Part 3. Advanced Sockets

Chapter 12. IPv4 and IPv6 Interoperability
12.1 Introduction
12.2 IPv4 Client, IPv6 Server
12.3 IPv6 Client, IPv4 Server
12.4 IPv6 Address-Testing Macros
12.5 Source Code Portability
12.6 Summary

Chapter 13. Daemon Processes and the inetd Superserver
13.1 Introduction
13.2 syslogd Daemon
13.3 syslog Function
13.4 daemon_init Function
13.5 inetd Daemon
13.6 daemon_inetd Function
13.7 Summary

Chapter 14. Advanced I/O Functions
14.1 Introduction
14.2 Socket Timeouts
14.3 recv and send Functions
14.4 readv and writev Functions
14.5 recvmsg and sendmsg Functions
14.6 Ancillary Data
14.7 How Much Data Is Queued?
14.8 Sockets and Standard I/O
14.9 Advanced Polling
14.10 Summary

Chapter 15. Unix Domain Protocols
15.1 Introduction
15.2 Unix Domain Socket Address Structure
15.3 socketpair Function
15.4 Socket Functions
15.5 Unix Domain Stream Client/Server
15.6 Unix Domain Datagram Client/Server
15.7 Passing Descriptors
15.8 Receiving Sender Credentials
15.9 Summary

Chapter 16. Nonblocking I/O
16.1 Introduction
16.2 Nonblocking Reads and Writes: str_cli Function (Revisited)
16.3 Nonblocking connect
16.4 Nonblocking connect: Daytime Client
16.5 Nonblocking connect: Web Client
16.6 Nonblocking accept
16.7 Summary

Chapter 17. ioctl Operations
17.1 Introduction
17.2 ioctl Function
17.3 Socket Operations
17.4 File Operations
17.5 Interface Configuration
17.6 get_ifi_info Function
17.7 Interface Operations
17.8 ARP Cache Operations
17.9 Routing Table Operations
17.10 Summary

Chapter 18. Routing Sockets
18.1 Introduction
18.2 Datalink Socket Address Structure
18.3 Reading and Writing
18.4 sysctl Operations
18.5 get_ifi_info Function (Revisited)
18.6 Interface Name and Index Functions
18.7 Summary

Chapter 19. Key Management Sockets
19.1 Introduction
19.2 Reading and Writing
19.3 Dumping the Security Association Database (SADB)
19.4 Creating a Static Security Association (SA)
19.5 Dynamically Maintaining SAs
19.6 Summary

Chapter 20. Broadcasting
20.1 Introduction
20.2 Broadcast Addresses
20.3 Unicast versus Broadcast
20.4 dg_cli Function Using Broadcasting
20.5 Race Conditions
20.6 Summary

Chapter 21. Multicasting
21.1 Introduction
21.2 Multicast Addresses
21.3 Multicasting versus Broadcasting on a LAN
21.4 Multicasting on a WAN
21.5 Source-Specific Multicast
21.6 Multicast Socket Options
21.7 mcast_join and Related Functions
21.8 dg_cli Function Using Multicasting
21.9 Receiving IP Multicast Infrastructure Session Announcements
21.10 Sending and Receiving
21.11 Simple Network Time Protocol (SNTP)
21.12 Summary

Chapter 22. Advanced UDP Sockets
22.1 Introduction
22.2 Receiving Flags, Destination IP Address, and Interface Index
22.3 Datagram Truncation
22.4 When to Use UDP Instead of TCP
22.5 Adding Reliability to a UDP Application
22.6 Binding Interface Addresses
22.7 Concurrent UDP Servers
22.8 IPv6 Packet Information
22.9 IPv6 Path MTU Control
22.10 Summary

Chapter 23. Advanced SCTP Sockets
23.1 Introduction
23.2 An Autoclosing One-to-Many-Style Server
23.3 Partial Delivery
23.4 Notifications
23.5 Unordered Data
23.6 Binding a Subset of Addresses
23.7 Determining Peer and Local Address Information
23.8 Finding an Association ID Given an IP Address
23.9 Heartbeating and Address Failure
23.10 Peeling Off an Association
23.11 Controlling Timing
23.12 When to Use SCTP Instead of TCP
23.13 Summary

Chapter 24. Out-of-Band Data
24.1 Introduction
24.2 TCP Out-of-Band Data
24.3 sockatmark Function
24.4 TCP Out-of-Band Data Recap
24.5 Summary

Chapter 25. Signal-Driven I/O
25.1 Introduction
25.2 Signal-Driven I/O for Sockets
25.3 UDP Echo Server Using SIGIO
25.4 Summary

Chapter 26. Threads
26.1 Introduction
26.2 Basic Thread Functions: Creation and Termination
26.3 str_cli Function Using Threads
26.4 TCP Echo Server Using Threads
26.5 Thread-Specific Data
26.6 Web Client and Simultaneous Connections (Continued)
26.7 Mutexes: Mutual Exclusion
26.8 Condition Variables
26.9 Web Client and Simultaneous Connections (Continued)
26.10 Summary

Chapter 27. IP Options
27.1 Introduction
27.2 IPv4 Options
27.3 IPv4 Source Route Options
27.4 IPv6 Extension Headers
27.5 IPv6 Hop-by-Hop Options and Destination Options
27.6 IPv6 Routing Header
27.7 IPv6 Sticky Options
27.8 Historical IPv6 Advanced API
27.9 Summary

Chapter 28. Raw Sockets
28.1 Introduction
28.2 Raw Socket Creation
28.3 Raw Socket Output
28.4 Raw Socket Input
28.5 ping Program
28.6 traceroute Program
28.7 An ICMP Message Daemon
28.8 Summary

Chapter 29. Datalink Access
29.1 Introduction
29.2 BSD Packet Filter (BPF)
29.3 Datalink Provider Interface (DLPI)
29.4 Linux: SOCK_PACKET and PF_PACKET
29.5 libpcap: Packet Capture Library
29.6 libnet: Packet Creation and Injection Library
29.7 Examining the UDP Checksum Field
29.8 Summary

Chapter 30. Client/Server Design Alternatives
30.1 Introduction
30.2 TCP Client Alternatives
30.3 TCP Test Client
30.4 TCP Iterative Server
30.5 TCP Concurrent Server, One Child per Client
30.6 TCP Preforked Server, No Locking Around accept
30.7 TCP Preforked Server, File Locking Around accept
30.8 TCP Preforked Server, Thread Locking Around accept
30.9 TCP Preforked Server, Descriptor Passing
30.10 TCP Concurrent Server, One Thread per Client
30.11 TCP Prethreaded Server, per-Thread accept
30.12 TCP Prethreaded Server, Main Thread accept
30.13 Summary

Chapter 31. STREAMS
31.1 Introduction
31.2 Overview
31.3 getmsg and putmsg Functions
31.4 getpmsg and putpmsg Functions
31.5 ioctl Function
31.6 Transport Provider Interface (TPI)
31.7 Summary

Appendix A. IPv4, IPv6, ICMPv4, and ICMPv6
A.1 Introduction
A.2 IPv4 Header
A.3 IPv6 Header
A.4 IPv4 Addresses
A.5 IPv6 Addresses
A.6 Internet Control Message Protocols (ICMPv4 and ICMPv6)

Appendix B. Virtual Networks
B.1 Introduction
B.2 The MBone
B.3 The 6bone
B.4 IPv6 Transition: 6to4

Appendix C. Debugging Techniques
C.1 System Call Tracing
C.2 Standard Internet Services
C.3 sock Program
C.4 Small Test Programs
C.5 tcpdump Program
C.6 netstat Program
C.7 lsof Program

Appendix D. Miscellaneous Source Code
D.1 unp.h Header
D.2 config.h Header
D.3 Standard Error Functions

Appendix E. Solutions to Selected Exercises

Bibliography

Index

Foreword

When the original text of this book arrived in 1990, it was quickly recognized as the definitive reference for programmers to learn network programming techniques. Since then, the art of computer networking has changed dramatically. All it takes is a look at the return address for comments from the original text (“uunet!hsi!netbook”) to make this clear. (How many readers will even recognize this as an address in the UUCP dialup network that was commonplace in the 1980s?)

Today, UUCP networks are a rarity and new technologies such as wireless networks are becoming ubiquitous! With these changes, new network protocols and programming paradigms have been developed. But, programmers have lacked a good reference from which to learn the intricacies of these new techniques. This book fills that void.
Readers who have a dog-eared copy of the original book will want a new copy for the updated programming techniques and the substantial new material describing next-generation protocols such as IPv6. Everyone will want this book because it provides a great mix of practical experience, historical perspective, and a depth of understanding that only comes from being intimately involved in the field. I’ve already enjoyed and learned from reading this book, and surely you will, too.

Sam Leffler

Preface

Introduction

This book is for people who want to write programs that communicate with each other using an application program interface (API) known as sockets. Some readers may be very familiar with sockets already, as that model has become synonymous with network programming. Others may need an introduction to sockets from the ground up. The goal of this book is to offer guidance on network programming for beginners as well as professionals, for those developing new network-aware applications as well as those maintaining existing code, and for people who simply want to understand how the networking components of their system function.

All the examples in this text are actual, runnable code tested on Unix systems. However, many non-Unix systems support the sockets API and the examples are largely operating system-independent, as are the general concepts we present. Virtually every operating system (OS) provides numerous network-aware applications such as Web browsers, email clients, and file-sharing servers. We discuss the usual partitioning of these applications into client and server and write our own small examples of these many times throughout the text.

Presenting this material in a Unix-oriented fashion has the natural side effect of providing background on Unix itself, and on TCP/IP as well.
Where more extensive background may be interesting, we refer the reader to other texts. Four texts are so commonly mentioned in this book that we’ve assigned them the following abbreviations:

    APUE: Advanced Programming in the UNIX Environment [Stevens 1992]
    TCPv1: TCP/IP Illustrated, Volume 1 [Stevens 1994]
    TCPv2: TCP/IP Illustrated, Volume 2 [Wright and Stevens 1995]
    TCPv3: TCP/IP Illustrated, Volume 3 [Stevens 1996]

TCPv2 contains a high level of detail very closely related to the material in this book, as it describes and presents the actual 4.4BSD implementation of the network programming functions for the sockets API (socket, bind, connect, and so on). If one understands the implementation of a feature, the use of that feature in an application makes more sense.

Changes from the Second Edition

Sockets have been around, more or less in their current form, since the 1980s, and it is a tribute to their initial design that they have continued to be the network API of choice. Therefore, it may come as a surprise to learn that quite a bit has changed since the second edition of this book was published in 1998. The changes we’ve made to the text are summarized as follows:

This new edition contains updated information on IPv6, which was only in draft form at the time of publication of the second edition and has evolved somewhat.

The descriptions of functions and the examples have all been updated to reflect the most recent POSIX specification (POSIX 1003.1-2001), also known as the Single Unix Specification Version 3.

The coverage of the X/Open Transport Interface (XTI) has been dropped. That API has fallen out of common use and even the most recent POSIX specification does not bother to cover it.

The coverage of TCP for transactions (T/TCP) has been dropped.

Three chapters have been added to describe a relatively new transport protocol, SCTP.
This reliable, message-oriented protocol provides multiple streams between endpoints and transport-level support for multihoming. It was originally designed for transport of telephony signaling across the Internet, but provides some features that many applications could take advantage of.

A chapter has been added on key management sockets, which may be used with Internet Protocol Security (IPsec) and other network security services.

The machines used, as well as the versions of their variants of Unix, have all been updated, and the examples have been updated to reflect how these machines behave. In many cases, examples were updated because OS vendors fixed bugs or added features, but as one might expect, we’ve discovered the occasional new bug here and there. The machines used for testing the examples in this book were:

    Apple Power PC running MacOS/X 10.2.6
    HP PA-RISC running HP-UX 11i
    IBM Power PC running AIX 5.1
    Intel x86 running FreeBSD 4.8
    Intel x86 running Linux 2.4.7
    Sun SPARC running FreeBSD 5.1
    Sun SPARC running Solaris 9

See Figure 1.16 for details on how these machines were used.

Volume 2 of this UNIX Network Programming series, subtitled Interprocess Communications, builds on the material presented here to cover message passing, synchronization, shared memory, and remote procedure calls.

Using This Book

This text can be used as either a tutorial on network programming or as a reference for experienced programmers. When used as a tutorial or for an introductory class on network programming, the emphasis should be on Part 2, “Elementary Sockets” (Chapters 3 through 11), followed by whatever additional topics are of interest. Part 2 covers the basic socket functions for both TCP and UDP, along with SCTP, I/O multiplexing, socket options, and basic name and address conversions. Chapter 1 should be read by all readers, especially Section 1.4, which describes some wrapper functions used throughout the text.
Chapter 2 and perhaps Appendix A should be referred to as necessary, depending on the reader’s background. Most of the chapters in Part 3, “Advanced Sockets,” can be read independently of the others in that part of the book.

To aid in the use of this book as a reference, a thorough index is provided, along with summaries on the end papers of where to find detailed descriptions of all the functions and structures. To help those reading topics in a random order, numerous references to related topics are provided throughout the text.

Source Code and Errata Availability

The source code for all the examples that appear in the book is available on the Web at www.unpbook.com. The best way to learn network programming is to take these programs, modify them, and enhance them. Actually writing code of this form is the only way to reinforce the concepts and techniques. Numerous exercises are also provided at the end of each chapter, and most answers are provided in Appendix E.

A current errata for the book is also available from the same Web site.

Acknowledgments

The first and second editions of this book were written solely by W. Richard Stevens, who passed away on September 1, 1999. His books have set a high standard and are largely regarded as concise, laboriously detailed, and extremely readable works of art. In providing this revision, the authors struggled to maintain the quality and thorough coverage of Rich’s earlier editions and any shortcomings in this area are entirely the fault of the new authors.

The work of an author is only as good as the support from family members and friends. Bill Fenner would like to thank his dear wife, Peggy (beach ¼ mile champion), and their housemate, Christopher Boyd for letting him off all his household chores while working in the treehouse on this project. Thanks are also due to his friend, Jerry Winner, whose prodding and encouragement were invaluable.
Likewise, Andy Rudoff wants to specifically thank his wife, Ellen, and girls, Jo and Katie, for their understanding and encouragement throughout this project. We simply could not have done this without all of you.

Randall Stewart with Cisco Systems, Inc. provided much of the SCTP material and deserves a special acknowledgment for this much-valued contribution. The coverage of this new and interesting topic simply would not exist without Randall’s work.

The feedback from our reviewers was invaluable for catching errors, pointing out areas that required more explanation, and suggesting improvements to our text and code examples. The authors would like to thank: James Carlson, Wu-Chang Feng, Rick Jones, Brian Kernighan, Sam Leffler, John McCann, Craig Metz, Ian Lance Taylor, David Schwartz, and Gary Wright.

Numerous individuals and their organizations went beyond the normal call of duty to provide either a loaner system, software, or access to a system, all of which were used to test some of the examples in the text.

Jessie Haug of IBM Austin provided an AIX system and compilers.

Rick Jones and William Gilliam of Hewlett-Packard provided access to multiple systems running HP-UX.

The staff at Addison Wesley has been a true pleasure to work with: Noreen Regina, Kathleen Caren, Dan DePasquale, Anthony Gemellaro, and a very special thanks to our editor, Mary Franz.

In a trend that Rich Stevens instituted (but contrary to popular fads), we produced camera-ready copy of the book using the wonderful Groff package written by James Clark, created the illustrations using the gpic program (using many of Gary Wright’s macros), produced the tables using the gtbl program, performed all the indexing, and did the final page layout. Dave Hanson’s loom program and some scripts by Gary Wright were used to include the source code in the book. A set of awk scripts written by Jon Bentley and Brian Kernighan helped in producing the final index.
The authors welcome electronic mail from any readers with comments, suggestions, or bug fixes.

Bill Fenner, Woodside, California
Andrew M. Rudoff, Boulder, Colorado
October 2003
[email protected]
http://www.unpbook.com

Part 1. Introduction and TCP/IP

Chapter 1. Introduction

1.1 Introduction

When writing programs that communicate across a computer network, one must first invent a protocol, an agreement on how those programs will communicate. Before delving into the design details of a protocol, high-level decisions must be made about which program is expected to initiate communication and when responses are expected. For example, a Web server is typically thought of as a long-running program (or daemon) that sends network messages only in response to requests coming in from the network. The other side of the protocol is a Web client, such as a browser, which always initiates communication with the server. This organization into client and server is used by most network-aware applications. Deciding that the client always initiates requests tends to simplify the protocol as well as the programs themselves. Of course, some of the more complex network applications also require asynchronous callback communication, where the server initiates a message to the client. But it is far more common for applications to stick to the basic client/server model shown in Figure 1.1.

[Figure 1.1 Network application: client and server. The client and the server are joined by an application protocol.]

Clients normally communicate with one server at a time, although using a Web browser as an example, we might communicate with many different Web servers over, say, a 10-minute time period. But from the server’s perspective, at any given point in time, it is not unusual for a server to be communicating with multiple clients. We show this in Figure 1.2. Later in this text, we will cover several different ways for a server to handle multiple clients at the same time.
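To make the idea of a protocol concrete, here is a minimal sketch in C that is not taken from the book. It assumes a made-up protocol in which the client always speaks first by sending "PING\n" and the server answers "PONG\n". The function names (serve_one, send_request, read_reply, demo_exchange) are hypothetical, and both ends run in one process over a socketpair() purely to keep the example short.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Toy protocol (an assumption for this sketch): the client sends "PING\n",
 * the server replies "PONG\n". The server never sends first. */

/* Server side: wait for one request, then send exactly one response. */
static int serve_one(int fd)
{
    char req[16];
    ssize_t n = read(fd, req, sizeof(req) - 1);
    if (n <= 0)
        return -1;
    req[n] = '\0';
    const char *resp = (strcmp(req, "PING\n") == 0) ? "PONG\n" : "ERR\n";
    return write(fd, resp, strlen(resp)) == (ssize_t) strlen(resp) ? 0 : -1;
}

/* Client side, step 1: the client initiates by sending the request. */
static int send_request(int fd)
{
    return write(fd, "PING\n", 5) == 5 ? 0 : -1;
}

/* Client side, step 2: read the reply into a NUL-terminated buffer. */
static int read_reply(int fd, char *out, size_t cap)
{
    assert(cap > 0);
    ssize_t n = read(fd, out, cap - 1);
    if (n < 0)
        return -1;
    out[n] = '\0';
    return 0;
}

/* Run one request/response exchange over a connected pair of sockets.
 * Returns 0 and stores the reply in out on success, -1 on failure. */
int demo_exchange(char *out, size_t cap)
{
    int sv[2];
    int rc = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    if (send_request(sv[0]) == 0 && serve_one(sv[1]) == 0)
        rc = read_reply(sv[0], out, cap);
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

Because the client always initiates, the server code needs no logic for deciding when to speak; it only reacts. A real client and server would run as separate processes, usually on different hosts.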
[Figure 1.2 Server handling multiple clients at the same time.]

The client application and the server application may be thought of as communicating via a network protocol, but actually, multiple layers of network protocols are typically involved. In this text, we focus on the TCP/IP protocol suite, also called the Internet protocol suite. For example, Web clients and servers communicate using the Transmission Control Protocol, or TCP. TCP, in turn, uses the Internet Protocol, or IP, and IP communicates with a datalink layer of some form. If the client and server are on the same Ethernet, we would have the arrangement shown in Figure 1.3.

[Figure 1.3 Client and server on the same Ethernet communicating using TCP. Layers, top to bottom: application layer (Web client and Web server, user processes, application protocol); transport layer (TCP, TCP protocol); network layer (IP, IP protocol); datalink layer (Ethernet driver, Ethernet protocol). TCP and IP form the protocol stack within the kernel. The actual flow between client and server passes over the Ethernet.]

Even though the client and server communicate using an application protocol, the transport layers communicate using TCP. Note that the actual flow of information between the client and server goes down the protocol stack on one side, across the network, and up the protocol stack on the other side. Also note that the client and server are typically user processes, while the TCP and IP protocols are normally part of the protocol stack within the kernel. We have labeled the four layers on the right side of Figure 1.3.

TCP and IP are not the only protocols that we will discuss. Some clients and servers use the User Datagram Protocol (UDP) instead of TCP, and we will discuss both protocols in more detail in Chapter 2. Furthermore, we have used the term “IP,” but the protocol, which has been in use since the early 1980s, is officially called IP version 4 (IPv4).
A new version, IP version 6 (IPv6), was developed during the mid-1990s and could potentially replace IPv4 in the years to come. This text covers the development of network applications using both IPv4 and IPv6. Appendix A provides a comparison of IPv4 and IPv6, along with other protocols that we will discuss.

The client and server need not be attached to the same local area network (LAN) as we show in Figure 1.3. For instance, in Figure 1.4, we show the client and server on different LANs, with both LANs connected to a wide area network (WAN) using routers.

Figure 1.4 Client and server on different LANs connected through a WAN.
[Diagram: a client application and a server application, each on a host with TCP/IP, attached to separate LANs; a router on each LAN connects to a WAN made up of further routers.]

Routers are the building blocks of WANs. The largest WAN today is the Internet. Many companies build their own WANs and these private WANs may or may not be connected to the Internet.

The remainder of this chapter provides an introduction to the various topics that are covered in detail later in the text. We start with a complete example of a TCP client, albeit a simple one, that demonstrates many of the function calls and concepts that we will encounter throughout the text. This client works with IPv4 only, and we show the changes required to work with IPv6. A better solution is to write protocol-independent clients and servers, and we will discuss this in Chapter 11. This chapter also shows a complete TCP server that works with our client.

To simplify all our code, we define our own wrapper functions for most of the system functions that we call throughout the text. We can use these wrapper functions most of the time to check for an error, print an appropriate message, and terminate when an error occurs. We also show the test network, hosts, and routers used for most examples in the text, along with their hostnames, IP addresses, and operating systems.
Most discussions of Unix these days include the term “POSIX,” which is the standard that most vendors have adopted. We describe the history of POSIX and how it affects the Application Programming Interfaces (APIs) that we describe in this text, along with the other players in the standards arena.

1.2 A Simple Daytime Client

Let’s consider a specific example to introduce many of the concepts and terms that we will encounter throughout the book. Figure 1.5 is an implementation of a TCP time-of-day client. This client establishes a TCP connection with a server and the server simply sends back the current time and date in a human-readable format.

intro/daytimetcpcli.c
     1 #include "unp.h"

     2 int
     3 main(int argc, char **argv)
     4 {
     5     int     sockfd, n;
     6     char    recvline[MAXLINE + 1];
     7     struct sockaddr_in servaddr;

     8     if (argc != 2)
     9         err_quit("usage: a.out <IPaddress>");

    10     if ( (sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0)
    11         err_sys("socket error");

    12     bzero(&servaddr, sizeof(servaddr));
    13     servaddr.sin_family = AF_INET;
    14     servaddr.sin_port = htons(13);      /* daytime server */
    15     if (inet_pton(AF_INET, argv[1], &servaddr.sin_addr) <= 0)
    16         err_quit("inet_pton error for %s", argv[1]);

    17     if (connect(sockfd, (SA *) &servaddr, sizeof(servaddr)) < 0)
    18         err_sys("connect error");

    19     while ( (n = read(sockfd, recvline, MAXLINE)) > 0) {
    20         recvline[n] = 0;                /* null terminate */
    21         if (fputs(recvline, stdout) == EOF)
    22             err_sys("fputs error");
    23     }
    24     if (n < 0)
    25         err_sys("read error");

    26     exit(0);
    27 }
intro/daytimetcpcli.c

Figure 1.5 TCP daytime client.

This is the format that we will use for all the source code in the text. Each nonblank line is numbered. The text describing portions of the code notes the starting and ending line numbers in the left margin, as shown shortly. Sometimes a paragraph is preceded by a short, descriptive, bold heading, providing a summary statement of the code being described. The horizontal rules at the beginning and end of a code fragment specify the source code filename: the file daytimetcpcli.c in the directory intro for this example.
Since the source code for all the examples in the text is freely available (see the Preface), this lets you locate the appropriate source file. Compiling, running, and especially modifying these programs while reading this text is an excellent way to learn the concepts of network programming.

Throughout the text, we will use indented, parenthetical notes such as this to describe implementation details and historical points.

If we compile the program into the default a.out file and execute it, we will have the following output:

    solaris % a.out 206.168.112.96          our input
    Mon May 26 20:58:40 2003                the program’s output

Whenever we display interactive input and output, we will show our typed input in bold and the computer output like this. Comments are added on the right side in italics. We will always include the name of the system as part of the shell prompt (solaris in this example) to show on which host the command was run. Figure 1.16 shows the systems used to run most of the examples in this book. The hostnames usually describe the operating system (OS) as well.

There are many details to consider in this 27-line program. We mention them briefly here, in case this is your first encounter with a network program, and provide more information on these topics later in the text.

Include our own header

1 We include our own header, unp.h, which we will show in Section D.1. This header includes numerous system headers that are needed by most network programs and defines various constants that we use (e.g., MAXLINE).

Command-line arguments

2–3 This is the definition of the main function along with the command-line arguments. We have written the code in this text assuming an American National Standards Institute (ANSI) C compiler (also referred to as an ISO C compiler).

Create TCP socket

10–11 The socket function creates an Internet (AF_INET) stream (SOCK_STREAM) socket, which is a fancy name for a TCP socket.
The function returns a small integer descriptor that we can use to identify the socket in all future function calls (e.g., the calls to connect and read that follow).

The if statement contains a call to the socket function, an assignment of the return value to the variable named sockfd, and then a test of whether this assigned value is less than 0. While we could break this into two C statements,

    sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd < 0)

it is a common C idiom to combine the two lines. The set of parentheses around the function call and assignment is required, given the precedence rules of C (the less-than operator has a higher precedence than assignment). As a matter of coding style, the authors always place a space between the two opening parentheses, as a visual indicator that the left-hand side of the comparison is also an assignment. (This style is copied from the Minix source code [Tanenbaum 1987].) We use this same style in the while statement later in the program.

We will encounter many different uses of the term “socket.” First, the API that we are using is called the sockets API. In the preceding paragraph, we referred to a function named socket that is part of the sockets API. In the preceding paragraph, we also referred to a TCP socket, which is synonymous with a TCP endpoint.

If the call to socket fails, we abort the program by calling our own err_sys function. It prints our error message along with a description of the system error that occurred (e.g., “Protocol not supported” is one possible error from socket) and terminates the process. This function, and a few others of our own that begin with err_, are called throughout the text. We will describe them in Section D.3.

Specify server’s IP address and port

12–16 We fill in an Internet socket address structure (a sockaddr_in structure named servaddr) with the server’s IP address and port number.
We set the entire structure to 0 using bzero, set the address family to AF_INET, set the port number to 13 (which is the well-known port of the daytime server on any TCP/IP host that supports this service, as shown in Figure 2.18), and set the IP address to the value specified as the first command-line argument (argv[1]). The IP address and port number fields in this structure must be in specific formats: we call the library function htons (“host to network short”) to convert the binary port number, and we call the library function inet_pton (“presentation to numeric”) to convert the ASCII command-line argument (such as 206.168.112.96 when we ran this example) into the proper format.

bzero is not an ANSI C function. It is derived from early Berkeley networking code. Nevertheless, we use it throughout the text, instead of the ANSI C memset function, because bzero is easier to remember (with only two arguments) than memset (with three arguments). Almost every vendor that supports the sockets API also provides bzero, and if not, we provide a macro definition of it in our unp.h header.

Indeed, the author of TCPv3 made the mistake of swapping the second and third arguments to memset in 10 occurrences in the first printing. A C compiler cannot catch this error because both arguments are of the same type. (Actually, the second argument is an int and the third argument is size_t, which is typically an unsigned int, but the values specified, 0 and 16, respectively, are still acceptable for the other type of argument.) The call to memset still worked, but did nothing. The number of bytes to initialize was specified as 0. The programs still worked, because only a few of the socket functions actually require that the final 8 bytes of an Internet socket address structure be set to 0.
Nevertheless, it was an error, and one that could be avoided by using bzero, because swapping the two arguments to bzero will always be caught by the C compiler if function prototypes are used.

This may be your first encounter with the inet_pton function. It is new with IPv6 (which we will talk more about in Appendix A). Older code uses the inet_addr function to convert an ASCII dotted-decimal string into the correct format, but this function has numerous limitations that inet_pton corrects. Do not worry if your system does not (yet) support this function; we will provide an implementation of it in Section 3.7.

Establish connection with server

17–18 The connect function, when applied to a TCP socket, establishes a TCP connection with the server specified by the socket address structure pointed to by the second argument. We must also specify the length of the socket address structure as the third argument to connect, and for Internet socket address structures, we always let the compiler calculate the length using C’s sizeof operator.

In the unp.h header, we #define SA to be struct sockaddr, that is, a generic socket address structure. Every time one of the socket functions requires a pointer to a socket address structure, that pointer must be cast to a pointer to a generic socket address structure. This is because the socket functions predate the ANSI C standard, so the void * pointer type was not available in the early 1980s when these functions were developed. The problem is that “struct sockaddr” is 15 characters and often causes the source code line to extend past the right edge of the screen (or page, in the case of a book), so we shorten it to SA. We will talk more about generic socket address structures when explaining Figure 3.3.

Read and display server’s reply

19–25 We read the server’s reply and display the result using the standard I/O fputs function.
We must be careful when using TCP because it is a byte-stream protocol with no record boundaries. The server’s reply is normally a 26-byte string of the form

    Mon May 26 20:58:40 2003\r\n

where \r is the ASCII carriage return and \n is the ASCII linefeed. With a byte-stream protocol, these 26 bytes can be returned in numerous ways: a single TCP segment containing all 26 bytes of data, 26 TCP segments each containing 1 byte of data, or any other combination that totals 26 bytes. Normally, a single segment containing all 26 bytes of data is returned, but with larger data sizes, we cannot assume that the server’s reply will be returned by a single read. Therefore, when reading from a TCP socket, we always need to code the read in a loop and terminate the loop when either read returns 0 (i.e., the other end closed the connection) or a value less than 0 (an error).

In this example, the end of the record is being denoted by the server closing the connection. This technique is also used by version 1.0 of the Hypertext Transfer Protocol (HTTP). Other techniques are available. For example, the Simple Mail Transfer Protocol (SMTP) marks the end of a record with the two-byte sequence of an ASCII carriage return followed by an ASCII linefeed. Sun Remote Procedure Call (RPC) and the Domain Name System (DNS) place a binary count containing the record length in front of each record that is sent when using TCP. The important concept here is that TCP itself provides no record markers: if an application wants to delineate the ends of records, it must do so itself, and there are a few common ways to accomplish this.

Terminate program

26 exit terminates the program. Unix always closes all open descriptors when a process terminates, so our TCP socket is now closed.

As we mentioned, the text will go into much more detail on all the points we just described.

1.3 Protocol Independence

Our program in Figure 1.5 is protocol-dependent on IPv4.
We allocate and initialize a sockaddr_in structure, we set the family of this structure to AF_INET, and we specify the first argument to socket as AF_INET.

To modify the program to work under IPv6, we must change the code. Figure 1.6 shows a version that works under IPv6, with the changes highlighted in bold (lines 7, 10, 13, 14, and 15).

intro/daytimetcpcliv6.c
     1 #include "unp.h"

     2 int
     3 main(int argc, char **argv)
     4 {
     5     int     sockfd, n;
     6     char    recvline[MAXLINE + 1];
     7     struct sockaddr_in6 servaddr;

     8     if (argc != 2)
     9         err_quit("usage: a.out <IPaddress>");

    10     if ( (sockfd = socket(AF_INET6, SOCK_STREAM, 0)) < 0)
    11         err_sys("socket error");

    12     bzero(&servaddr, sizeof(servaddr));
    13     servaddr.sin6_family = AF_INET6;
    14     servaddr.sin6_port = htons(13);     /* daytime server */
    15     if (inet_pton(AF_INET6, argv[1], &servaddr.sin6_addr) <= 0)
    16         err_quit("inet_pton error for %s", argv[1]);

    17     if (connect(sockfd, (SA *) &servaddr, sizeof(servaddr)) < 0)
    18         err_sys("connect error");

    19     while ( (n = read(sockfd, recvline, MAXLINE)) > 0) {
    20         recvline[n] = 0;                /* null terminate */
    21         if (fputs(recvline, stdout) == EOF)
    22             err_sys("fputs error");
    23     }
    24     if (n < 0)
    25         err_sys("read error");

    26     exit(0);
    27 }
intro/daytimetcpcliv6.c

Figure 1.6 Version of Figure 1.5 for IPv6.

Only five lines are changed, but what we now have is another protocol-dependent program; this time, it is dependent on IPv6. It is better to make a program protocol-independent. Figure 11.11 will show a version of this client that is protocol-independent by using the getaddrinfo function (which is called by tcp_connect).

Another deficiency in our programs is that the user must enter the server’s IP address as a dotted-decimal number (e.g., 206.168.112.219 for the IPv4 version). Humans work better with names instead of numbers (e.g., www.unpbook.com). In Chapter 11, we will discuss the functions that convert between hostnames and IP addresses, and between service names and ports. We purposely put off the discussion of these functions and continue using IP addresses and port numbers so we know exactly what goes into the socket address structures that we must fill in and examine.
This also avoids complicating our discussion of network programming with the details of yet another set of functions.

1.4 Error Handling: Wrapper Functions

In any real-world program, it is essential to check every function call for an error return. In Figure 1.5, we check for errors from socket, inet_pton, connect, read, and fputs, and when one occurs, we call our own functions, err_quit and err_sys, to print an error message and terminate the program. We find that most of the time, this is what we want to do. Occasionally, we want to do something other than terminate when one of these functions returns an error, as in Figure 5.12, when we must check for an interrupted system call.

Since terminating on an error is the common case, we can shorten our programs by defining a wrapper function that performs the actual function call, tests the return value, and terminates on an error. The convention we use is to capitalize the name of the function, as in

    sockfd = Socket(AF_INET, SOCK_STREAM, 0);

Our wrapper function is shown in Figure 1.7.

lib/wrapsock.c
    236 int
    237 Socket(int family, int type, int protocol)
    238 {
    239     int     n;

    240     if ( (n = socket(family, type, protocol)) < 0)
    241         err_sys("socket error");
    242     return (n);
    243 }
lib/wrapsock.c

Figure 1.7 Our wrapper function for the socket function.

Whenever you encounter a function name in the text that begins with an uppercase letter, that is one of our wrapper functions. It calls a function whose name is the same but begins with the lowercase letter.

When describing the source code that is presented in the text, we always refer to the lowest level function being called (e.g., socket), not the wrapper function (e.g., Socket).

While these wrapper functions might not seem like a big savings, when we discuss threads in Chapter 26, we will find that thread functions do not set the standard Unix errno variable when an error occurs; instead, the errno value is the return value of the function.
This means that every time we call one of the pthread_ functions, we must allocate a variable, save the return value in that variable, and then set errno to this value before calling err_sys. To avoid cluttering the code with braces, we can use C’s comma operator to combine the assignment into errno and the call of err_sys into a single statement, as in the following:

    int     n;

    if ( (n = pthread_mutex_lock(&ndone_mutex)) != 0)
        errno = n, err_sys("pthread_mutex_lock error");

Alternatively, we could define a new error function that takes the system’s error number as an argument. But we can make this piece of code much easier to read as just

    Pthread_mutex_lock(&ndone_mutex);

by defining our own wrapper function, as shown in Figure 1.8.

lib/wrappthread.c
    72 void
    73 Pthread_mutex_lock(pthread_mutex_t *mptr)
    74 {
    75     int     n;

    76     if ( (n = pthread_mutex_lock(mptr)) == 0)
    77         return;
    78     errno = n;
    79     err_sys("pthread_mutex_lock error");
    80 }
lib/wrappthread.c

Figure 1.8 Our wrapper function for pthread_mutex_lock.

With careful C coding, we could use macros instead of functions, providing a little run-time efficiency, but these wrapper functions are rarely the performance bottleneck of a program.

Our choice of capitalizing the first character of a function name is a compromise. Many other styles were considered: prefixing the function name with an “e” (as done on p. 182 of [Kernighan and Pike 1984]), appending “_e” to the function name, and so on. Our style seems the least distracting while still providing a visual indication that some other function is really being called.

This technique has the side benefit of checking for errors from functions whose error returns are often ignored: close and listen, for example.

Throughout the rest of this book, we will use these wrapper functions unless we need to check for an explicit error and handle it in some way other than terminating the process.
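The alternative mentioned in passing above, an error function that takes the error number as an argument, might be sketched as follows. This is only an illustration under stated assumptions: the names fmt_errnum and err_num are hypothetical and are not part of the book's library, and the sketch relies only on the standard strerror function.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the book's library): build
 * "msg: <system error text>" for an explicit error number.
 * It neither reads nor writes errno. strerror is used for brevity;
 * a threaded program may prefer strerror_r. */
int
fmt_errnum(char *buf, size_t buflen, int errnum, const char *msg)
{
    return snprintf(buf, buflen, "%s: %s", msg, strerror(errnum));
}

/* Hypothetical counterpart to err_sys: print the message for the given
 * error number on standard error and terminate the process. */
void
err_num(int errnum, const char *msg)
{
    char    buf[256];

    fmt_errnum(buf, sizeof(buf), errnum, msg);
    fputs(buf, stderr);
    fputc('\n', stderr);
    exit(1);
}

/* Possible use with a pthread_ function, with no comma operator needed:
 *
 *     if ( (n = pthread_mutex_lock(&ndone_mutex)) != 0)
 *         err_num(n, "pthread_mutex_lock error");
 */
```

Splitting the formatting from the exit is a choice made here so that the message can be inspected without terminating the caller; a wrapper such as the one in Figure 1.8 still reads more cleanly at the call site.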
We do not show the source code for all our wrapper functions, but the code is freely available (see the Preface).

Unix errno Value

When an error occurs in a Unix function (such as one of the socket functions), the global variable errno is set to a positive value indicating the type of error and the function normally returns −1. Our err_sys function looks at the value of errno and prints the corresponding error message string (e.g., “Connection timed out” if errno equals ETIMEDOUT).

The value of errno is set by a function only if an error occurs. Its value is undefined if the function does not return an error. All of the positive error values are constants with all-uppercase names beginning with “E,” and are normally defined in the <sys/errno.h> header. No error has a value of 0.

Storing errno in a global variable does not work with multiple threads that share all global variables. We will talk about solutions to this problem in Chapter 26.

Throughout the text, we will use phrases such as “the connect function returns ECONNREFUSED” as shorthand to mean that the function returns an error (typically with a return value of −1), with errno set to the specified constant.

1.5 A Simple Daytime Server

We can write a simple version of a TCP daytime server, which will work with the client from Section 1.2. We use the wrapper functions that we described in the previous section and show this server in Figure 1.9.

Create a TCP socket

10 The creation of the TCP socket is identical to the client code.

Bind server’s well-known port to socket

11–15 The server’s well-known port (13 for the daytime service) is bound to the socket by filling in an Internet socket address structure and calling bind. We specify the IP address as INADDR_ANY, which allows the server to accept a client connection on any interface, in case the server host has multiple interfaces.
Later we will see how we can restrict the server to accepting a client connection on just a single interface.

Convert socket to listening socket

16 By calling listen, the socket is converted into a listening socket, on which incoming connections from clients will be accepted by the kernel. These three steps, socket, bind, and listen, are the normal steps for any TCP server to prepare what we call the listening descriptor (listenfd in this example).

The constant LISTENQ is from our unp.h header. It specifies the maximum number of client connections that the kernel will queue for this listening descriptor. We say much more about this queueing in Section 4.5.

intro/daytimetcpsrv.c
     1 #include "unp.h"
     2 #include <time.h>

     3 int
     4 main(int argc, char **argv)
     5 {
     6     int     listenfd, connfd;
     7     struct sockaddr_in servaddr;
     8     char    buff[MAXLINE];
     9     time_t  ticks;

    10     listenfd = Socket(AF_INET, SOCK_STREAM, 0);

    11     bzero(&servaddr, sizeof(servaddr));
    12     servaddr.sin_family = AF_INET;
    13     servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
    14     servaddr.sin_port = htons(13);      /* daytime server */

    15     Bind(listenfd, (SA *) &servaddr, sizeof(servaddr));

    16     Listen(listenfd, LISTENQ);

    17     for ( ; ; ) {
    18         connfd = Accept(listenfd, (SA *) NULL, NULL);

    19         ticks = time(NULL);
    20         snprintf(buff, sizeof(buff), "%.24s\r\n", ctime(&ticks));
    21         Write(connfd, buff, strlen(buff));

    22         Close(connfd);
    23     }
    24 }
intro/daytimetcpsrv.c

Figure 1.9 TCP daytime server.

Accept client connection, send reply

17–21 Normally, the server process is put to sleep in the call to accept, waiting for a client connection to arrive and be accepted. A TCP connection uses what is called a three-way handshake to establish a connection. When this handshake completes, accept returns, and the return value from the function is a new descriptor (connfd) that is called the connected descriptor. This new descriptor is used for communication with the new client. A new descriptor is returned by accept for each client that connects to our server.
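The three preparation steps described above (socket, bind, and listen) can also be written out without the wrapper functions. The following is a minimal sketch rather than the book's code: the function names make_listener and listener_port are hypothetical, and the sketch assumes that port 0 may be passed to let the kernel choose an unused port, since binding a low-numbered port such as 13 normally requires superuser privileges.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP/IPv4 listening descriptor bound to the
 * wildcard address. Returns the descriptor, or -1 on failure. */
int
make_listener(unsigned short port, int backlog)
{
    struct sockaddr_in servaddr;
    int     listenfd;

    if ( (listenfd = socket(AF_INET, SOCK_STREAM, 0)) < 0)
        return -1;

    memset(&servaddr, 0, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_addr.s_addr = htonl(INADDR_ANY);  /* any local interface */
    servaddr.sin_port = htons(port);               /* 0: kernel chooses */

    if (bind(listenfd, (struct sockaddr *) &servaddr, sizeof(servaddr)) < 0 ||
        listen(listenfd, backlog) < 0) {
        close(listenfd);
        return -1;
    }
    return listenfd;
}

/* Hypothetical helper: report the port actually bound, in host byte order,
 * or -1 on error. */
int
listener_port(int listenfd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);

    if (getsockname(listenfd, (struct sockaddr *) &addr, &len) < 0)
        return -1;
    return ntohs(addr.sin_port);
}
```

A caller would then loop on accept(listenfd, NULL, NULL), as in lines 17–23 of Figure 1.9. Returning -1 instead of terminating is a choice made for this sketch so that the caller decides how to report the error.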
The style used throughout the book for an infinite loop is

    for ( ; ; ) {
        ...
    }

The current time and date are returned by the library function time, which returns the number of seconds since the Unix Epoch: 00:00:00 January 1, 1970, Coordinated Universal Time (UTC). The next library function, ctime, converts this integer value into a human-readable string such as

    Mon May 26 20:58:40 2003

A carriage return and linefeed are appended to the string by snprintf, and the result is written to the client by write.

If you’re not already in the habit of using snprintf instead of the older sprintf, now’s the time to learn. Calls to sprintf cannot check for overflow of the destination buffer. snprintf, on the other hand, requires that the second argument be the size of the destination buffer, and this buffer will not overflow.

snprintf was a relatively late addition to the ANSI C standard, introduced in the version referred to as ISO C99. Virtually all vendors provide it as part of the standard C library, and many free implementations are also available. We use snprintf throughout the text, and we recommend using it instead of sprintf in all your programs for reliability.

It is remarkable how many network break-ins have occurred by a hacker sending data to cause a server’s call to sprintf to overflow its buffer. Other functions that we should be careful with are gets, strcat, and strcpy, normally calling fgets, strncat, and strncpy instead. Even better are the more recently available functions strlcat and strlcpy, which ensure the result is a properly terminated string. Additional tips on writing secure network programs are found in Chapter 23 of [Garfinkel, Schwartz, and Spafford 2003].

Terminate connection

22 The server closes its connection with the client by calling close. This initiates the normal TCP connection termination sequence: a FIN is sent in each direction and each FIN is acknowledged by the other end.
We will say much more about TCP’s three-way handshake and the four TCP packets used to terminate a TCP connection in Section 2.6.

As with the client in the previous section, we have only examined this server briefly, saving all the details for later in the book. Note the following points:

• As with the client, the server is protocol-dependent on IPv4. We will show a protocol-independent version that uses the getaddrinfo function in Figure 11.13.

• Our server handles only one client at a time. If multiple client connections arrive at about the same time, the kernel queues them, up to some limit, and returns them to accept one at a time. This daytime server, which requires calling two library functions, time and ctime, is quite fast. But if the server took more time to service each client (say a few seconds or a minute), we would need some way to overlap the service of one client with another client.

  The server that we show in Figure 1.9 is called an iterative server because it iterates through each client, one at a time. There are numerous techniques for writing a concurrent server, one that handles multiple clients at the same time. The simplest technique for a concurrent server is to call the Unix fork function (Section 4.7), creating one child process for each client. Other techniques are to use threads instead of fork (Section 26.4), or to pre-fork a fixed number of children when the server starts (Section 30.6).

• If we start a server like this from a shell command line, we might want the server to run for a long time, since servers often run for as long as the system is up. This requires that we add code to the server to run correctly as a Unix daemon: a process that can run in the background, unattached to a terminal. We will cover this in Section 13.4.
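The fork-per-client technique named in the points above can be sketched roughly as follows. This is an illustration under assumptions, not the version the book develops in Section 4.7: the names daytime_service, spawn_child_for, and concurrent_loop are hypothetical, error handling is minimal, and terminated children are not reaped.

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Service routine run in the child: send the 26-byte daytime string in the
 * same format as Figure 1.9. Returns 0 on success, -1 on a write error. */
int
daytime_service(int connfd)
{
    char    buff[64];
    time_t  ticks = time(NULL);

    snprintf(buff, sizeof(buff), "%.24s\r\n", ctime(&ticks));
    return (write(connfd, buff, strlen(buff)) < 0) ? -1 : 0;
}

/* Hypothetical helper: fork one child to serve connfd. The parent closes its
 * copy of connfd and returns the child's PID (or -1 if fork fails).
 * Pass listenfd as -1 if there is no listening descriptor to close. */
pid_t
spawn_child_for(int listenfd, int connfd, int (*service)(int))
{
    pid_t   pid;

    if ( (pid = fork()) == 0) {         /* child */
        int rc;

        if (listenfd >= 0)
            close(listenfd);            /* child does not accept */
        rc = service(connfd);
        close(connfd);
        _exit(rc == 0 ? 0 : 1);
    }
    close(connfd);                      /* parent: child owns the connection */
    return pid;
}

/* Concurrent accept loop built on the helper above. */
void
concurrent_loop(int listenfd)
{
    for ( ; ; ) {
        int connfd = accept(listenfd, NULL, NULL);

        if (connfd < 0)
            continue;                   /* a real server would inspect errno */
        spawn_child_for(listenfd, connfd, daytime_service);
        /* terminated children must still be reaped, e.g., with waitpid */
    }
}
```

Because the parent closes its copy of the connected descriptor immediately, it can return to accept while the child is still serving its client, which is what distinguishes this from the iterative server of Figure 1.9.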
1.6 Roadmap to Client/Server Examples in the Text

Two client/server examples are used predominantly throughout the text to illustrate the various techniques used in network programming:

• A daytime client/server (which we started in Figures 1.5, 1.6, and 1.9)

• An echo client/server (which will start in Chapter 5)

To provide a roadmap for the different topics that are covered in this text, we will summarize the programs that we will develop, and give the starting figure number and page number in which the source code appears. Figure 1.10 lists the versions of the daytime client, two versions of which we have already seen. Figure 1.11 lists the versions of the daytime server. Figure 1.12 lists the versions of the echo client, and Figure 1.13 lists the versions of the echo server.

    Figure  Page  Description
    1.5     6     TCP/IPv4, protocol-dependent
    1.6     10    TCP/IPv6, protocol-dependent
    11.4    313   TCP/IPv4, protocol-dependent, calls gethostbyname and getservbyname
    11.11   328   TCP, protocol-independent, calls getaddrinfo and tcp_connect
    11.16   336   UDP, protocol-independent, calls getaddrinfo and udp_client
    16.11   450   TCP, uses nonblocking connect
    31.8    859   TCP, protocol-dependent, uses TPI instead of sockets
    E.1     917   TCP, protocol-dependent, generates SIGPIPE
    E.5     920   TCP, protocol-dependent, prints socket receive buffer sizes and MSS
    E.11    931   TCP, protocol-dependent, allows hostname (gethostbyname) or IP address
    E.12    932   TCP, protocol-independent, allows hostname (gethostbyname)

Figure 1.10 Different versions of the daytime client developed in the text.
    Figure  Page  Description
    1.9     14    TCP/IPv4, protocol-dependent
    11.13   332   TCP, protocol-independent, calls getaddrinfo and tcp_listen
    11.14   334   TCP, protocol-independent, calls getaddrinfo and tcp_listen
    11.19   339   UDP, protocol-independent, calls getaddrinfo and udp_server
    13.5    371   TCP, protocol-independent, runs as standalone daemon
    13.12   378   TCP, protocol-independent, spawned from inetd daemon

Figure 1.11 Different versions of the daytime server developed in the text.

    Figure  Page  Description
    5.4     124   TCP/IPv4, protocol-dependent
    6.9     168   TCP, uses select
    6.13    174   TCP, uses select and operates on buffers
    8.7     244   UDP/IPv4, protocol-dependent
    8.9     247   UDP, verifies server’s address
    8.17    256   UDP, calls connect to obtain asynchronous errors
    14.2    384   UDP, times out when reading server’s reply using SIGALRM
    14.4    386   UDP, times out when reading server’s reply using select
    14.5    387   UDP, times out when reading server’s reply using SO_RCVTIMEO
    15.4    418   Unix domain stream, protocol-dependent
    15.6    419   Unix domain datagram, protocol-dependent
    16.3    438   TCP, uses nonblocking I/O
    16.10   447   TCP, uses two processes (fork)
    16.21   462   TCP, establishes connection then sends RST
    14.15   404   TCP, uses /dev/poll for multiplexing
    14.18   407   TCP, uses kqueue for multiplexing
    20.5    537   UDP, broadcasts with race condition
    20.6    540   UDP, broadcasts with race condition
    20.7    542   UDP, broadcasts, race condition fixed by using pselect
    20.9    544   UDP, broadcasts, race condition fixed by using sigsetjmp and siglongjmp
    20.10   547   UDP, broadcasts, race condition fixed by using IPC from signal handler
    22.6    600   UDP, reliable using timeout, retransmit, and sequence number
    26.2    680   TCP, uses two threads
    27.6    716   TCP/IPv4, specifies a source route
    27.13   729   UDP/IPv6, specifies a source route

Figure 1.12 Different versions of the echo client developed in the text.
    Figure  Page  Description
    5.2     123   TCP/IPv4, protocol-dependent
    5.12    139   TCP/IPv4, protocol-dependent, reaps terminated children
    6.21    178   TCP/IPv4, protocol-dependent, uses select, one process handles all clients
    6.25    186   TCP/IPv4, protocol-dependent, uses poll, one process handles all clients
    8.3     242   UDP/IPv4, protocol-dependent
    8.24    263   TCP and UDP/IPv4, protocol-dependent, uses select
    14.14   400   TCP, uses standard I/O library
    15.3    417   Unix domain stream, protocol-dependent
    15.5    418   Unix domain datagram, protocol-dependent
    15.15   431   Unix domain stream, with credential passing from client
    22.4    593   UDP, receives destination address and received interface; truncated datagrams
    22.15   609   UDP, binds all interface addresses
    25.4    668   UDP, uses signal-driven I/O
    26.3    682   TCP, one thread per client
    26.4    684   TCP, one thread per client, portable argument passing
    27.6    716   TCP/IPv4, prints received source route
    27.14   730   UDP/IPv6, prints and reverses received source route
    28.31   773   UDP, uses icmpd to receive asynchronous errors
    E.15    943   UDP, binds all interface addresses

Figure 1.13 Different versions of the echo server developed in the text.

1.7 OSI Model

A common way to describe the layers in a network is to use the International Organization for Standardization (ISO) open systems interconnection (OSI) model for computer communications. This is a seven-layer model, which we show in Figure 1.14, along with the approximate mapping to the Internet protocol suite.

We consider the bottom two layers of the OSI model as the device driver and networking hardware that are supplied with the system. Normally, we need not concern ourselves with these layers other than being aware of some properties of the datalink, such as the 1500-byte Ethernet maximum transfer unit (MTU), which we describe in Section 2.11.

The network layer is handled by the IPv4 and IPv6 protocols, both of which we will describe in Appendix A.
The transport layers that we can choose from are TCP and UDP, and we will describe these in Chapter 2. We show a gap between TCP and UDP in Figure 1.14 to indicate that it is possible for an application to bypass the transport layer and use IPv4 or IPv6 directly. This is called a raw socket, and we will talk about this in Chapter 28.

     OSI model        Internet protocol suite
7    application      application                  application details      user process
6    presentation     application                  application details      user process
5    session          application                  application details      user process
     ---------------- sockets, XTI ----------------
4    transport        TCP      UDP                 communications details   kernel
3    network          IPv4, IPv6                   communications details   kernel
2    datalink         device driver and hardware   communications details   kernel
1    physical         device driver and hardware   communications details   kernel

Figure 1.14 Layers in OSI model and Internet protocol suite.

The upper three layers of the OSI model are combined into a single layer called the application. This is the Web client (browser), Telnet client, Web server, FTP server, or whatever application we are using. With the Internet protocols, there is rarely any distinction between the upper three layers of the OSI model.

The sockets programming interfaces described in this book are interfaces from the upper three layers (the “application”) into the transport layer. This is the focus of this book: how to write applications using sockets that use either TCP or UDP. We already mentioned raw sockets, and in Chapter 29 we will see that we can even bypass the IP layer completely to read and write our own datalink-layer frames.

Why do sockets provide the interface from the upper three layers of the OSI model into the transport layer? There are two reasons for this design, which we note on the right side of Figure 1.14. First, the upper three layers handle all the details of the application (FTP, Telnet, or HTTP, for example) and know little about the communication details. The lower four layers know little about the application, but handle all the communication details: sending data, waiting for acknowledgments, sequencing data that arrives out of order, calculating and verifying checksums, and so on.
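The mapping from socket type to the layers of Figure 1.14 can be sketched in C. This is only a minimal illustration and not code from the text: the helper names layer_for_socktype and probe_socket_type are hypothetical, and the sketch assumes nothing beyond the standard socket, getsockopt (with the SO_TYPE option), and close calls. A raw socket is described but never opened here, because creating one normally requires superuser privileges.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/* Hypothetical helper: name the layer of Figure 1.14 that an
 * application reaches when it asks for the given socket type. */
const char *layer_for_socktype(int socktype)
{
    switch (socktype) {
    case SOCK_STREAM:
        return "TCP (transport layer)";
    case SOCK_DGRAM:
        return "UDP (transport layer)";
    case SOCK_RAW:
        return "IPv4 or IPv6 directly (transport layer bypassed)";
    default:
        return "unknown socket type";
    }
}

/* Hypothetical helper: create a socket in the given address family,
 * ask the kernel which type it really is, and close it again.
 * Returns the socket type reported by SO_TYPE, or -1 on any error. */
int probe_socket_type(int family, int socktype)
{
    int fd = socket(family, socktype, 0);   /* 0 = default protocol */
    if (fd < 0)
        return -1;

    int type = -1;
    socklen_t len = sizeof(type);
    if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &type, &len) < 0)
        type = -1;

    close(fd);
    return type;
}
```

On a host with IPv4 support, probe_socket_type(AF_INET, SOCK_STREAM) would be expected to return SOCK_STREAM, which layer_for_socktype then reports as TCP. In both cases the application only names the transport it wants; everything below the socket call is left to the lower layers.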
The second reason is that the upper three layers often form what is called a user process while the lower four layers are normally provided as part of the operating system (OS) kernel. Unix provides this separation between the user process and the kernel, as do many other contemporary operating systems. Therefore, the interface between layers 4 and 5 is the natural place to build the API.

1.8 BSD Networking History

The sockets API originated with the 4.2BSD system, released in 1983. Figure 1.15 shows the development of the various BSD releases, noting the major TCP/IP developments. A few changes to the sockets API also took place in 1990 with the 4.3BSD Reno release, when the OSI protocols went into the BSD kernel.

The path down the figure from 4.2BSD through 4.4BSD shows the releases from the Computer Systems Research Group (CSRG) at Berkeley, which required the recipient to already have a source code license for Unix. But all the networking code, both the kernel support (such as the TCP/IP and Unix domain protocol stacks and the socket interface), along with the applications (such as the Telnet and FTP clients and servers), were developed independently from the AT&T-derived Unix code. Therefore, starting in 1989, Berkeley provided the first of the BSD networking releases, which contained all the networking code and various other pieces of the BSD system that were not constrained by the Unix source code license requirement. These releases were “publicly available” and eventually became available by anonymous FTP to anyone.

The final releases from Berkeley were 4.4BSD-Lite in 1994 and 4.4BSD-Lite2 in 1995. We note that these two releases were then used as the base for other systems: BSD/OS, FreeBSD, NetBSD, and OpenBSD, most of which are still being actively developed and enhanced.
More information on the various BSD releases, and on the history of the various Unix systems in general, can be found in Chapter 1 of [McKusick et al. 1996].

Many Unix systems started with some version of the BSD networking code, including the sockets API, and we refer to these implementations as Berkeley-derived implementations. Many commercial versions of Unix are based on System V Release 4 (SVR4). Some of these versions have Berkeley-derived networking code (e.g., UnixWare 2.x), while the networking code in other SVR4 systems has been independently derived (e.g., Solaris 2.x). We also note that Linux, a popular, freely available implementation of Unix, does not fit into the Berkeley-derived classification: Its networking code and sockets API were developed from scratch.

4.2BSD (1983): first widely available release of TCP/IP and sockets API
4.3BSD (1986): TCP performance improvements
4.3BSD Tahoe (1988): slow start, congestion avoidance, fast retransmit
    BSD Networking Software Release 1.0 (1989): Net/1
4.3BSD Reno (1990): fast recovery, TCP header prediction, SLIP header compression, routing table changes; length field added to sockaddr{}; control information added to msghdr{}
    BSD Networking Software Release 2.0 (1991): Net/2
4.4BSD (1993): multicasting, long fat pipe modifications
4.4BSD-Lite (1994): referred to in text as Net/3
    BSD/OS, FreeBSD, NetBSD, OpenBSD
4.4BSD-Lite2 (1995)

Figure 1.15 History of various BSD releases.

1.9 Test Networks and Hosts

Figure 1.16 shows the various networks and hosts used in the examples throughout the text. Fo