CIS*2750 Midterm 1 PDF - C Tool Chain - University of Guelph

CIS*2750 Lecture 2a: the C tool chain
Based on CIS*2750 notes from previous generations of CIS*2750 instructors

Announcement: volunteer note taker needed
- Student Accessibility Services are looking for volunteer notetakers who are willing to share their class notes.
- As a volunteer notetaker, you can contribute to academic success and make a difference in the lives of your fellow peers.
- The volunteer notetaking position is a one-semester commitment, and volunteers can request to be removed from the program at any time.
- Your course notes will only be shared with students registered with our office, who have a documented disability and who have been approved for notetaking services.
- Link: uoguel.ph/notetaker

What is a “tool chain”?
- Developing a software project is inherently tied to the tools used to develop the project - language(s), libraries, as well as the additional software that forms part of the development tool chain
- The tool chain is an informal term describing the sequence of software tools that converts your source code (one or more text files) into binary code
- On the surface it consists of just the compiler, but things are more complex than that - you use a sequence (chain) of tools

Lecture overview
- In this course, we will use C for the "native" portion (A1-A2), and Python/SQL for the UI/database portion (A3).
- C covers the first half of the course, so we will look at the Big Picture of the tools that many C programmers - at least the ones working with *nix-family operating systems - use regularly
  - Expose them all to view
  - Some are “hiding” inside others!
  - How they “chain” together in sequence
  - How to control them
- Bring them up from the level of ritualized mumbo-jumbo to one of knowledge

The usual suspects
- C Preprocessor
- C Compiler
- Linker
- Loader
- Make

IDE
- The same tools are in use “under the hood,” but normally managed for you by the IDE
- “IDE” = Integrated Development Environment
  - Windows - Visual Studio (becoming available on other platforms)
  - macOS - Xcode
  - Multi-platform: Eclipse, IntelliJ, CodeLite, Code::Blocks, etc.
- Different languages have different tool chains
  - Although often one IDE can accommodate more than one language/toolchain

A word on compilers
- There is more than one C compiler out there
- The GNU Compiler Collection (GCC) compiler is a widespread and popular choice
  - Free, open-source (FOSS), available on many platforms
  - Often installed by default with Linux
- Clang is a more recent compiler and an alternative to GCC
  - Free, open-source, available on many platforms
  - Installed on macOS with Xcode
- Both implement the C standards, and provide custom extensions
- There are MANY others
- We mostly use the C compiler from GCC as the platform in SoCS

The Big Picture
[Figure: components of the C compiler toolchain. Build time: your project files (.h, .c) go through the C Preprocessor (cpp, controlled by CPPFLAGS) to give .i, the C Compiler (gcc, CFLAGS) to give .s, the Assembler (gas, SFLAGS) to give .o/.obj, and the Linker (ld, LDFLAGS), which also reads the System Libraries (.h, .a/.lib, .so/.dll) and produces prog / a.out / .exe or a .so/.dll. Run time: the Loader (ld.so, dlopen) places the program and its .so/.dll files into process memory. The tools marked * (cpp, gcc, ld) are managed by Make, and the whole build chain appears to the user to be “gcc”.]

Note: creating intermediate files
- The intermediate files discussed here (.i, .s, and .o) are not always generated by default
  - We sometimes want to create .o files
  - We almost never need the .i and .s files
- If you want to see them, compile your code with the -save-temps flag

(1) C Preprocessor
- Purpose: interpret all the # directives in the .h and .c files before the compiler sees the source code (the result is the intermediate .i file)
  - #include: merge in header files
  - #define: macros (with and without arguments)
    - macros are “expanded” by the preprocessor through find-and-replace
  - #ifdef, #if, #else, #endif: conditional compilation
- Exception: #pragma is passed on as a compiler directive (ignored if the compiler does not recognize it)
- Note: The C preprocessor is often abbreviated to "cpp" - which is confusing, because "cpp" also happens to be a common abbreviation for C++ (and an extension for C++ source files). We are not using C++ in this course, so "cpp" for us means "C preprocessor"

Prevent multiple/circular includes
- “Best practice” .h pattern, in file headername.h:

    #ifndef HEADERNAME_H
    #define HEADERNAME_H
    ...body of .h file...
    #endif

- On the first #include "headername.h", the symbol HEADERNAME_H gets defined
- If it is #included again (even indirectly), the file’s contents are skipped

Headers and paths
- The #include directive has slightly different syntax for different headers
- #include <file>
  - This variant is used for system header files
  - It searches for a file named file in a standard list of system directories
- #include "file"
  - This variant is used for header files of your own program
  - It searches for a file named file as follows:
    - first in the directory containing the current file
    - then in the "quote" directories
    - then in the same directories used for <file>

Headers and paths
- You must not include the path to the file in the #include statement
  - e.g. #include "../includeDir/Parser.h" - do not do this!
  - If the header and/or the file that uses the header moves to a different location, the path breaks - and your code no longer compiles. That's bad.
- Instead, you always include the file itself, e.g. #include "Parser.h"
- Let the compiler toolchain take care of the paths

Headers and paths
- A directory that contains a header file can be passed to the compiler using the -I option, e.g.

    gcc -I../includeDir myFile.c

- These paths can be relative or absolute
- You can pass more than one include directory:

    gcc -I../includeDir -I/home/username/stuff/otherIncludeDir myFile.c

(2) C compiler
- Purpose: compile C language source code (.i file) into assembly language (.s)
- Diagnoses abuses of language, and issues warnings and/or errors
- Intermediate .i and .s files are normally deleted after assembly → not seen by the user unless requested (-save-temps option)
- See the sample .c, .i, .s, etc. files in the GCC_intermediate_files.zip example posted on the course website (Week 1)
  - Compile with gcc -save-temps test.c

(2+) Assembler = [G]AS
- Purpose: assemble assembly code (.s) from the compiler into object code (.o)
- This tool step is normally transparent
- There are options for controlling it, but it is rare to use them
- You can insert assembly statements into your C program, e.g., to utilize vector instructions for the CPU's vector processing unit

(3) Linker (link editor) = ld
- Purpose: to stitch together objects (.o) into a single binary program file - executable or library
  - All the .o object files that make up the user program, plus
  - All the referenced system libraries
- Linker creates libraries
  - Static (.a / .lib): inserted in the executable file
  - Dynamic (.so / .dll): linked in at run time
  - We will see this in one of the upcoming lectures

What is the linker doing?
- In the object files, there are tables of external references (e.g., call “printf”)
- It reads all the libraries until it finds a matching external definition (e.g. the actual code of the printf() function, which lives in the standard C library; stdio.h only declares it)
- Libraries come in two flavours - static and dynamic.
- They are linked differently:
  - Static linking: it pulls the definitions (function code and global variables) into the program file, and fixes up all the refs to point to their locations
  - Dynamic linking: it “makes a note” of what library file to get the definition from at run time

Linking and paths: linking a library
- The linker will only automatically link the standard C library - excluding the C math library
- The names of all other libraries must be explicitly passed to the linker using the -l option
- For example, to link the C math library, you add the -lm option to the compiler command
  - where m is the name of the math library
  - the convention for library naming will be discussed in a later lecture on creating shared libraries
- E.g.

    gcc myCode.c -lm             //Links the C math library
    gcc myCode.c -lsomeLibrary   //Links the library called someLibrary

- But how does the linker know where the libraries m and someLibrary are?

Linking and paths: giving the linker a path to a library
- The linker has to know the disk location of the binary files containing the libraries
- By default, it only looks in a small number of directories containing system libraries, e.g. /usr/lib, /usr/local/lib, etc.
- Notably, it does not look in the current directory!
- So if myCode.c and someLibrary are in the same directory, and you run gcc myCode.c -lsomeLibrary, you will get an error:

    /usr/bin/ld: cannot find -lsomeLibrary: No such file or directory

Linking and paths: giving the linker a path to a library
- We can explicitly give the linker additional paths to look in using the -L flag, e.g.

    gcc myCode.c -L/path/to/a/library -lsomeLibrary

- So if myCode.c and someLibrary are in the same directory, and you are running the compiler from that directory, you need to run:

    gcc myCode.c -L. -lsomeLibrary

Recap: preprocessor and linker paths
- -I - provides paths for the preprocessor (locations of header files for #include, AKA include paths)
- -L - provides paths to the linker (locations of libraries)
- -l - provides the names of libraries that the linker has to link with

Why is gcc involved in tools 1-3?
- GCC acts as a front end for {cpp, gcc, ld}
- You keep executing “gcc” to get all these tools invoked
- Nonetheless, gcc is not cpp or ld; it is just calling them for you!
- Reason: convenience
  - gcc knows where all standard libraries are installed, so it secretly beefs up *FLAGS for you; otherwise your commands would be much longer/messier and less portable

(4) Loader = ld.so
- The steps we examined so far are concerned with creating binary files - e.g. executables using libraries
- What happens when we execute a file?
- Command shell (or program) tells the OS to execute a program file…
- OS opens a fresh process (ref. CIS*3110) and calls the loader to “fill it up” by copying the segments of the program file into different regions of the (virtual) memory for the process
  - Program instructions (“text” segment)
  - Static data (“data” segment)
- OS creates a new stack and an empty heap, then transfers control to the first instruction of the program

(4+) Loader = dlopen
- Program using dynamically linked libraries: ld.so will load these when the program starts, but references (calls) will be fixed up on demand
  - Because much/most of the libraries will not be referenced
- Some programs use dlopen to load shared object “plugins” on demand at run time
  - OS pauses process execution while the loader grabs the object and links it into already-loaded code, then continues where it left off

Loader and paths
- On Linux, to see what shared library dependencies your executable or shared object has, use the ldd command, e.g.

    ldd a.out

- The loader has to know where to find these shared libraries at run time
- To see what directories the loader knows about, look at the files in the directory /etc/ld.so.conf.d

Adding a path for the loader
- By default, the current directory is not included in any of the paths in /etc/ld.so.conf.d - so the loader will not look for your shared library in the current directory
- You can fix this by updating the environment variable LD_LIBRARY_PATH:

    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:.

- If you do not want to retype this statement every time you log in, add it at the very end of the .bashrc file in your home directory
- For example, we will need to do this to load the List API library we will use in this course
- We will also need to do this when we link the A1 parser library with a main program that would test it

Example: compiling a list library
- Your assignments will use a simple list library, and the sample code consists of three files:
  - LinkedListAPI.c - implementations of library functions
  - LinkedListAPI.h - headers for the library (function prototypes, constants, etc.)
  - StructListDemo.c - demo executable with a main() function
- To run the example, we need to
  - Compile the library (more on that in an upcoming lecture)
  - Compile the demo, which needs
    - The location of library headers
    - The location of the library binary file
    - The name of the library binary file

Example: compiling a list library
- For now, we will compile the library using the provided Makefile (which we will examine soon)
  - It creates a binary file called liblist.so
- NOTE: when we link this library, we refer to it as just list - this weird naming convention will be explained later

Example: compiling a list library
- If we then try to compile the executable demo file with gcc StructListDemo.c, we get a bunch of errors:

    /usr/bin/ld: /tmp/ccUliKAO.o: in function `main':
    StructListDemo.c:(.text+0x1f4): undefined reference to `initializeList'
    /usr/bin/ld: StructListDemo.c:(.text+0x2d4): undefined reference to `insertBack'
    ...

- What component of the C compiler toolchain generates this error? - the linker
- What do we need to fix? - tell the linker what library to link and where it is

Example: compiling a list library
- Solution:

    gcc StructListDemo.c -L. -llist

- Where
  - -L. tells the linker to look for a library in the current directory, in addition to the global path
  - -llist tells the linker to link with the library called list
- This library happens to be in the file called liblist.so - again, the naming convention will be explained later

Example: executing a list demo
- When we try to execute the file StructListDemo, we also get another error:

    ./StructListDemo: error while loading shared libraries: liblist.so: cannot open shared object file: No such file or directory

- What causes this error? The loader
- How do we fix it? We tell the loader to look in the current directory
  - See "Adding a path for the loader" above for details
- Once the loader knows the path, the program executes correctly

CIS*2750 Lecture 2b: makefiles
Based on CIS*2750 notes from previous generations of CIS*2750 instructors

Makefiles
- The make utility executes a sequence of commands from a description file
- It is generally used to create executables, but it can perform other tasks:
  - removing files
  - reporting the status of a project
  - packaging multiple files into a distribution
  - installing files in a directory
  - building libraries
- The make utility is not unique
  - the ant utility for Java is similar
  - CMake is supposed to be an improved make
  - However, make is still a very common one, so we stick with it

Makefiles and the compiler toolchain
- The make utility and the compiler toolchain are completely separate, unrelated entities
- However, the make utility is usually used to automate code building, so you will regularly use it to pass flags to compilers, specify paths, specify library names, etc.
- Which is why it is essential to understand how the compiler toolchain works - otherwise, you can't use makefiles effectively, and won't be able to understand sample makefiles (e.g. my examples)

Make - the vanilla approach
- There are many (crazy) ways to achieve the same effects with makefiles
  - Programmers tend to develop their preferred ways of coding them
- Good to get exposure to several people’s approaches to makefiles
  - Learn/understand features you didn’t realize exist
  - Avoid treating the makefile as a “magical incantation”
- We will use a very straightforward and simple convention, for the sake of readability
- I recommend you stick with it - writing code you do not fully understand is always a bad idea

Makefiles
- Make examines dependencies between files.
- The date and existence of a file are also examined.
- If files do not exist, then make attempts to build them.
- If the compiled files do exist, but they depend on a file which has a newer date, then they are recompiled.

Makefile Structure
- Each entry in a makefile consists of three parts:
  - Target
  - Prerequisites or dependencies
  - Command line

    myprog: myprog.c myprog.h
            gcc myprog.c -o myprog

- In this entry, myprog is the target, myprog.c and myprog.h are the prerequisites (dependencies), and gcc myprog.c -o myprog is the command line
- The whitespace in front of the command line is a tab

Part 1: target
- The target is what a particular makefile entry aims to build
- Typically a target is a filename
  - Can be an executable, e.g. myprog, or a library, e.g. librecord.so
- Although not always, as we will see later

Part 2: prerequisites
- The prerequisites are files that must exist in order for a target to be "buildable"
  - E.g. if we want to build myprog, the files myprog.c and myprog.h must exist
- If any of the prerequisites are newer than the target, then the command line is executed
  - In other words, if any of the dependencies were modified after the target was created, we recompile the target
- Only the files which need to be recompiled will be recompiled - instead of all of them

Makefiles and efficiency
- Recompiling only the files which require it is important for large applications where recompiling everything is slow
- https://xkcd.com/303/

Part 3: command line
- The command line is the command - typically, a Unix/Linux command line utility with arguments - that must be executed to build the target using the dependencies
- The command line must be prefaced by a single tab
  - It must be a single tab character, not spaces. This might cause you pain and anguish, so be careful.

Invoking Make
- Typing only

    make

  at the command line will look for a file named either Makefile or makefile and build the first target that appears in the file.
- Typing

    make <target>

  will build the specified target.

Invoking Make
- E.g. making a specific target, with the following makefile:

    myprog: myprog.c
            gcc …
    fred: fred.c
            gcc …

- Typing make myprog will build myprog; make fred will build fred.
- Typing make will build myprog
  - Because myprog is the first target in the makefile

Other common targets
- As mentioned earlier, targets do not have to be files, e.g.

    clean:
            rm *.o core

- These can be used to clean up files that you no longer need.
- In my examples, the clean target usually deletes
  - all targets, i.e. executables and library files
  - all temporary .o files
  - the core file
- Other people have other conventions

Checking the commands
- You can view the commands make would execute, without actually executing them, using the -n flag
- For example:
  - make -n will display the commands that would be executed for the first target
  - make someTarget -n will display the commands that would be executed for someTarget

Multiline command lines
- You can put multiple commands in the command line by separating them with a semicolon, e.g.

    libawesome.a: awesome.c
            gcc awesome.c -o awesome.o -c ;\
            ar cr libawesome.a awesome.o

- The backslash means continue to the next line.
  - The backslash must be the last character on the line
- Note: not all versions of make require the backslash.

Makefile macros
- Macros are used to avoid repeatedly typing a lot of text in makefiles.
  - Things like paths, compiler flags, and lists of libraries can appear in multiple locations in a makefile and are annoying to retype.
- Macros replace long strings with shorter text.
- They are defined using the equal sign, e.g.

    LIBS = -L/usr/local/lib -lm -llibname

- They are referenced using a $ and brackets, e.g. $(LIBS) or ${LIBS}

Example: sample makefile
- Macro names are normally in upper case.

    CC = gcc
    LIBS = -lm -L/usr/local/lib -L. -lmyLib

    prog: prog.c
            $(CC) prog.c -o prog $(LIBS)

- This example:
  - compiles using gcc, but we can easily specify a different compiler (e.g. clang) by changing the CC macro
  - links in libraries m (C math library) and myLib
  - specifies that the linker should look for libraries in /usr/local/lib and the current directory (.) in addition to the linker's standard search path

Undefined macros
- Undefined macros are replaced with a null string (nothing).

    LIB = -lm -L/usr/local/lib -L. -lmyLib

    a1: a1.c a1.h
            gcc a1.c -o a1 $(LIBS)

- This will run the command gcc a1.c -o a1, because the makefile defines LIB but the rule references LIBS, which is undefined

Predefined macros
- Macro CC is predefined as command cc.
  - cc is usually a symbolic link to the default C compiler of your *nix distribution
  - Typically either the GNU C compiler or the LLVM C compiler (Clang)
    - other options exist
    - depends on your *nix distribution
- Macro LD is predefined as command ld.
- You can use these without defining them.
- Or you can replace them with something else, e.g. CC = gcc

Macro string substitutions
- This allows you to use a macro and substitute a string in the macro, e.g.

    SRCS = a.c b.c c.c

- SRCS can be referenced with $(SRCS:.c=.o) (no spaces around the = sign)
- This will translate the list to: a.o b.o c.o

Macro string substitutions
- $(SRCS:.c=) translates to: a b c
- The names of the executables can be created using the names of the source files a.c, b.c, and c.c.
- These allow you to do things like specify prerequisites without too much typing
- Can also reduce clutter in a makefile for a large project, where a makefile entry might have dozens of prerequisites

Suffix rules
- The following makefile will compile the executables a, b, and c.

    SRCS = a.c b.c c.c
    all: $(SRCS:.c=)

- Question: Why will this work? There are no instructions describing how to convert the C files into executables.

Suffix rules
- The answer is suffix rules.
- These tell the system how to compile different types of files.
- By convention C files end with .c, Fortran files end with .f, C++ files end with .cc.
- Make has built-in rules which will use the correct compiler to turn a source code file into an executable.
- Type make -p to see all of the predefined macros and rules.

Comments
- Comments begin with a # and continue to the end of the line.

    #build all assignments
    all: a1 a2

    #a1 requires the math library
    a1: a1.c a1.h
            gcc a1.c -o a1 -lm -std=c11 -Wall

    a2: a2.c
            gcc a2.c -o a2 -std=c11 -Wall

Flags control options for the tools
- The components of the C toolchain mentioned earlier can be controlled using makefiles
- Note: Some IDEs let you set flags indirectly by clicking options on property pages
  - Others - primarily the multi-platform ones - use make (or cmake) under the hood and provide lists of flags that you can modify directly

Selecting the compiler in a Makefile
- We have already seen this: CC=gcc selects the gcc C compiler as the front end
- Note: on a Mac, the command line utility "gcc" is an alias for the Clang compiler, unless you specifically install GNU gcc
- We can also pass various options to the elements of the compiler toolchain

Preprocessor flags
- Preprocessor: CPPFLAGS=
- -Iinclude_file_dir
  - Adds include_file_dir to the include paths of both <> and "" includes.
  - We use this to avoid hardcoding relative or absolute paths into #include statements
  - E.g. -I$(HOME)/myproj/include
    - Will add the myproj/include directory in your home directory to the include path, so all files in that directory can be included with a simple #include "filename.h" directive
    - Write the home directory as $(HOME) rather than ~ here: the shell does not expand a ~ that is glued to -I
- Other useful flags
  - -Dsymbol[=value]  Equivalent to #define symbol value
  - -DNDEBUG  disable assertions

Compiler flags
- Compiler: CFLAGS=
  - -g  save symbols for debugger
  - -On  optimization level (n = 0, 1, 2, 3)
  - -Wall -std=c11  all warnings, C11 standard
  - -fpic  position-independent code (for shared object library)
- This flag goes in the command, not in CFLAGS:
  - -c  compile to .o file (don’t link)

Linker flags
- Linker: LDFLAGS=
  - -Llibrary_dir  pass a library path to the linker
    - Example: -L$(HOME)/myproj/lib adds the myproj/lib directory in your home directory to the paths containing linked library files
  - -llibrary  link in library
    - Example: -lfoo links in libfoo.so (or .a)
  - -shared  create shared object lib.
- These flags go in the command, not in LDFLAGS:
  - -o filename  create an output file with a specific name, instead of the default
    - Example: -o caltest

A note on code organization
- Large C codebases are often broken up into different directories, which often follow certain naming conventions:
- The headers go into the include directory
  - This can be more complicated - e.g. public headers for the final product go into include, other headers get placed into other directories
- The source code goes into the src directory
  - Again, this can get a lot more complex
- The various binary files go into one or more directories - e.g. bin, lib, etc.

Real-world example
- The curl library (used for command-line URL data transfer): https://github.com/curl/curl

Standards for our course
- We will follow a similar convention in our course
  - code in src
  - headers in include
  - Makefile in the main directory
- The A1 description will state the exact details
- The Makefile and file structure for the List example have been updated accordingly, so you can use them as a template

Standards for our course: using the Makefile macros
- This is where the macros shine
- We use macros to define locations of code and headers:

    INC = include/
    SRC = src/

- We then use them in our Makefile - in the dependencies and the command line, e.g.

    StructListDemo.o: $(SRC)StructListDemo.c $(INC)LinkedListAPI.h
            $(CC) $(CFLAGS) -I$(INC) -c $(SRC)StructListDemo.c

- See the files in the updated ListExample.zip, posted in the Week 2 module on the course website

CIS*2750 Lecture 3: List API; creating libraries
Based on CIS*2750 notes from previous generations of CIS*2750 instructors

A review of a simple C library: List ADT
- Fixed-length arrays are not very useful for dealing with optional data, so we want something flexible
- Many data structure implementations rely on linked data records
  - Obvious example: linked list
  - Other examples:
    - Trees
    - Hash tables - linked elements are used to resolve collisions, when more than one element hashes into the same bucket
- We are using C and do not have equivalents to Java’s ArrayList or Vector, so we need something like a linked list

Common list operations
- create a list
- insert a node at front/back
- remove a node from front/back
- retrieve an arbitrary node
- iterate through a list
- clear/delete list
- create a human-readable representation of the list with its data

Storing data
- A good data structure implementation should be generic - i.e. be able to store data of any type
  - Re-declaring a new list ADT for every data type we might want to store is a non-starter
- In C, this means that we store data of type void* in the list
- So a list node has links to other nodes and a pointer to the data

Storing data
- We do not know the data type at compile time, so we do not know how to
  - Safely free the data (a single call to free() might not be enough)
  - Compare data values if we want to sort the list or insert elements in order
  - Convert the contents of a list to a human-readable representation - think toString() in Java
- We solve this by having the List structure contain a few function pointers
  - These point to the functions for freeing and comparing data on the list, as well as converting that data into a string
- In essence, we are creating a pseudo-class with public members

List API: C
- Check LinkedListAPI.h in the Week1-2 examples.zip on the course web page
- The example is extensively documented

Iterating through a list
- “Stepping through” every element in a data collection is a very common operation
- For an indexed collection (e.g. an array) we can use a loop with a counter
- For a linked data collection, we use
  - An iterator
  - A “smart” for-loop (which C doesn't have, so it will not help us)
- An iterator is an object that allows you to iterate through (traverse) a data collection
  - Typically used with a linked data collection
  - It allows us to have a generic way of traversing a list without relying on the list’s implementation

Iterator example: Java

    ArrayList<StudentRecord> al = new ArrayList<>();
    //Create and add student records
    al.add(new StudentRecord(...));
    Iterator<StudentRecord> itr = al.iterator();
    while (itr.hasNext()) {
        StudentRecord element = itr.next();
        System.out.print(element);
    }
    System.out.println();

Iterator example: C

    // Allocate the data. It must be dynamically allocated, since
    // our deleteFunc will try to free the contents of each node
    char* str[4];
    for (int i = 0; i < 4; i++){
        str[i] = (char*)malloc(10*sizeof(char));
    }

    //Initialize the string array
    strcpy(str[0], "Hello");
    strcpy(str[1], " ");
    strcpy(str[2], "world");
    strcpy(str[3], "!");

    //Create and populate the list
    List* list = initializeList(&printFunc, &deleteFunc, &compareFunc);
    for (int i = 0; i < 4; i++){
        insertBack(list, (void*)str[i]);
    }

Iterator example: C

    void* elem;

    //Create an iterator
    ListIterator iter = createIterator(list);

    //Traverse the list, examining one element at a time
    while ((elem = nextElement(&iter)) != NULL){
        char* str = (char*)elem;
        printf("%s", str);
    }
    printf("\n");

    deleteList(list);

Iterator example: C
- See StructListDemo.c

Libraries
- In this course, you will spend a lot of time developing libraries
- Libraries are reusable collections of precompiled functions and ancillary data - new types, enums, constants, etc.
- A large part of software development involves designing the library API - application programming interface
- The API is the "interface" for the user of your library - i.e. another programmer
- It is made of all the public components of your interface - functions, constants, custom types, etc.
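
To make the "Storing data" and iterator slides concrete, here is a minimal sketch of a generic list that stores void* data, keeps function pointers for freeing and printing that data, and is traversed through an iterator. Every name in it (SketchList, sketch_insert_back, sketch_next, sketch_demo and so on) is hypothetical and chosen for illustration; it is not the course's LinkedListAPI, whose real declarations are in LinkedListAPI.h. The compare function pointer is left out to keep the sketch short.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical names: a teaching sketch, NOT the course's LinkedListAPI. */
typedef struct SketchNode {
    void *data;                      /* the list never learns the real type */
    struct SketchNode *next;
} SketchNode;

typedef struct {
    SketchNode *head;
    SketchNode *tail;
    void (*deleteData)(void *);      /* frees one element's data */
    char *(*printData)(void *);      /* returns a newly allocated string */
} SketchList;

typedef struct { SketchNode *current; } SketchIterator;

SketchList *sketch_create(void (*del)(void *), char *(*print)(void *)) {
    SketchList *list = malloc(sizeof *list);
    if (list == NULL) return NULL;
    list->head = list->tail = NULL;
    list->deleteData = del;
    list->printData = print;
    return list;
}

/* Append data to the back; returns 0 on success, -1 if malloc fails. */
int sketch_insert_back(SketchList *list, void *data) {
    SketchNode *node = malloc(sizeof *node);
    if (node == NULL) return -1;
    node->data = data;
    node->next = NULL;
    if (list->tail == NULL) list->head = node;
    else list->tail->next = node;
    list->tail = node;
    return 0;
}

SketchIterator sketch_iterator(SketchList *list) {
    SketchIterator it = { list->head };
    return it;
}

/* Return the next element's data, or NULL once the end is reached. */
void *sketch_next(SketchIterator *it) {
    if (it->current == NULL) return NULL;
    void *data = it->current->data;
    it->current = it->current->next;
    return data;
}

/* Free every node; the stored callback knows how to free the data. */
void sketch_delete(SketchList *list) {
    SketchNode *node = list->head;
    while (node != NULL) {
        SketchNode *next = node->next;
        list->deleteData(node->data);
        free(node);
        node = next;
    }
    free(list);
}

/* Callbacks for a list that stores heap-allocated C strings. */
static void delete_string(void *data) { free(data); }

static char *print_string(void *data) {
    char *copy = malloc(strlen((char *)data) + 1);
    if (copy != NULL) strcpy(copy, (char *)data);
    return copy;
}

/* Store four strings, then join them by walking the list with the
   iterator. Returns a heap buffer the caller must free, or NULL. */
char *sketch_demo(void) {
    const char *words[] = { "Hello", " ", "world", "!" };
    SketchList *list = sketch_create(delete_string, print_string);
    char *out = calloc(32, 1);           /* big enough for this demo only */
    if (list == NULL || out == NULL) { free(list); free(out); return NULL; }
    for (int i = 0; i < 4; i++) {
        char *copy = print_string((void *)words[i]);    /* heap copy */
        if (copy == NULL || sketch_insert_back(list, copy) != 0) {
            free(copy); sketch_delete(list); free(out);
            return NULL;
        }
    }
    SketchIterator it = sketch_iterator(list);
    void *elem;
    while ((elem = sketch_next(&it)) != NULL) {
        char *text = list->printData(elem);             /* element -> string */
        if (text != NULL) strcat(out, text);
        free(text);
    }
    sketch_delete(list);
    return out;
}
```

Calling sketch_demo() from a main() function returns the string "Hello world!", which the caller then frees. The list code itself never mentions char*: everything type-specific sits in the two callbacks, which is the design choice the slides describe as a pseudo-class.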
- The List API we have just described is used as a library in StructListDemo.c

Library design
- Programmer decides on
  - what functionality goes into library components
  - what each function does
  - what arguments it takes and what it returns
  - how it handles errors
  - how to name it so its role is clear
- Since for the first part of the course we are using C, we will focus on creating libraries of functions
- However, in an object-oriented language we would be creating libraries of classes
- Later, we will use libraries in Python (which calls them "modules")

Libraries in action
- You have been using libraries since you started programming in C
- For example, the standard C library has an API consisting of all functions and constants that you are familiar with:
  - functions: printf()/scanf(), strcpy(), pow(), etc.
  - constants (#define macros, actually): NULL, RAND_MAX, EOF, etc.

Library design - public aspects
- Since the API is the public interface for your library, designing a good API is important
- Deciding what makes up the API is similar to deciding on public methods of a class
  - For this course (A1 and A2) I have designed the API for you, and your job is to implement it
  - In later courses, you will be learning to design the public API yourselves
- Think of the end-user (programmer)
  - What does the library allow them to do?
  - e.g. parse files in a specific format, do math, manipulate strings, etc.
- You expose the important functionality to library users through functions
  - Think of these as public functions in your interface

Library design - private aspects
- It is equally important to decide what functionality you do not want other programmers to see / use - i.e. what components will be private
- This is the concept of information hiding you saw in object-oriented programming
- The purpose is to hide implementation details, so the user of the library does not mess things up
  - e.g. why let the user manipulate the next pointer of a list struct - possibly incorrectly?
instead, we create insertFront / insertBack methods for the user C has very limited tools for actually making library internals inaccessible - we will see them in an upcoming lecture Libraries in practice On Linux, system-wide libraries are normally stored in /lib and /usr/lib. They always have an associated header.h le which contains constants and function de nitions. Standard headers are normally located in /usr/include One library you would have used by now is the math library Library le is libm.so and the associated header le is math.h Notice they do not necessarily have the same name. fi fi fi Static and dynamic libraries There are two types of libraries: For static libraries, when a program is compiled, the library contents are copied into the executable and stored with it Library copy is stored in the executable le Compile-time linking For dynamic libraries, when the program is executed, the library is linked (connected) to the executable in memory The library copy is not stored in the executable le Run-time linking Also known as shared libraries in the Unix/Linux world, dynamically linked libraries (DLLs) in the Windows world fi Static and dynamic libraries Static libraries Pros: one self-contained executable, a bit more beginner-friendly Cons: executable can be very large; in exible - library cannot be updated Dynamic libraries Pros: exible - an executable can be linked to di erent versions of the library, as long as the library API is the same; executable is not bloated Cons: In some applications a single self-contained executable might be more convenient; require more experience to use Dynamic libraries tend to be the default in most real-world applications fl fl Naming libraries All library names begin with lib Static libraries end with.a Dynamic libraries end with.so (shared object on Linux) or.dll (dynamic linked library on Windows) Can also be.dylib (dynamic library) in addition to.so on macOS Shared objects can have version numbers after the.so: 
  libm.so.3
  libm.so.3.2
These would be version 3 and 3.2 of the math library

Libraries and symbolic links
Symbolic links may be used to create multiple names for the same library
For example, liba.so.3.2 may have links liba.so.3 and liba.so which point to it
  The library itself is called "a"
Some common library names:
  libm.so.6 - the math library, m
  libc.so.6 - the standard C library, c
  libstdc++.so.2.8 - the C++ standard library, stdc++

Creating a dynamic (shared) library
Easy to do it with gcc
We need to enable creation of Position-Independent Code (PIC).
  It works no matter where in memory it is placed
  The location of the library in memory will vary from program to program, and more than one program might be using it at any given time
  So we need to use appropriate compiler/linker arguments
Done in two steps
  gcc -c -fpic record.c - creates the .o file
  gcc -shared -o librecord.so record.o - creates the .so file
See the Makefile in the List API sample code

Creating a dynamic (shared) library requires two steps:
  1. The source code file record.c is compiled into a record.o object file:
     gcc -c -fpic record.c
  2. The object file is turned into the shared library librecord.so:
     gcc -shared -o librecord.so record.o
  The -shared flag creates a shared library
  The -o flag specifies the output file name

Creating an executable using a shared library
Given a library and associated header file:
  librecord.so
  record.h
And a program which uses the library:
  prog.c - source code which uses the library
  prog.h - header file associated with prog.c
First compile the source code (prog.c) and then link the library (librecord.so).

Creating an executable using a shared library
We need to include the header for the library (record.h) in prog.c so the compiler knows the structure of the library function calls.
We may need to specify the location of the header using the -I preprocessor option (Lectures 3a and 3b)
Compile step (prog.c, prog.h, record.h -> prog.o):
  gcc prog.c -o prog.o -c
Link step (prog.o, librecord.so -> prog):
  gcc prog.o -o prog -lrecord -L.
  Converts the compiled program (prog.o) and the library into an executable.

Using a library
The compile step converts the source code into an object file. Use -c with gcc.
The link step links the library to the compiled program
  Use -lxyz (lower case L followed by the library name)
  You only need to provide the unique part of the library name. Do not use the entire library name.
  E.g. Correct: -lxml2   Incorrect: libxml2.so
Do use -l (a lower case L) to identify the library
Do not include lib or .so in the library name when you link with it

CIS*2750 Lecture 4: memory and valgrind refresher

Allocating memory in a function: The rule
Rule: to modify a thing in a function, we must pass its address - i.e. a pointer to the thing
If we want to modify a value in a function, we must pass it by reference
  In other words, we must pass its address, or a pointer to it
  Otherwise, what is modified in the function is the copy of the value
For example, if we want to modify the value of an int in a function, we pass a pointer to it - i.e. its address

Modifying an int in a function
main:
    int n = 2;        //We want to change a thing
    add2(&n);         //We pass the address of the thing
    printf("%d", n);  //prints 4
add2:
    void add2(int* val){
        *val = *val + 2; //we add 2 to the thing that val points to - i.e. the value in variable n
    }

Allocating memory in a function
If we want to allocate memory to a pointer, we can do it in a function
The value of a pointer variable p is a memory address
  Could be NULL initially - e.g.
  the value of p is 0
We want to change the value of p to be the address of some block of memory we have allocated
We apply the same rule: to modify a thing in a function, we must pass its address
  to modify a pointer, we must pass its address
  an address is a pointer
  the address of a pointer is a … double pointer - an address of an address

Modifying a pointer in a function
main:
    int arrLen = 10;
    int* array = NULL;         //We want to change a thing
    allocate(&array, arrLen);  //We pass the address of the thing
    //the value of array is now the address of some freshly allocated memory block
allocate:
    void allocate(int** p, int len){
        *p = malloc(sizeof(int)*len);
        //We have modified the thing that p points to - i.e. the value (address) stored in the variable array
    }

Modifying a pointer in a function
[Diagram: array lives at address 05fd and holds 0]
    int* array = NULL;
    allocate(&array, arrLen);
We have a variable named array that lives at memory address 05fd
The value stored at memory address 05fd is whatever NULL happens to be - which is usually the integer value 0

Modifying a pointer in a function
[Diagram: array at 05fd holds 0; p holds 05fd]
    int* array = NULL;
    allocate(&array, arrLen);
    void allocate(int** p, int len){
        *p = malloc(sizeof(int)*len);
    }
We call the function allocate and pass it the address of array
Inside allocate:
  p = 05fd - i.e. the address of array
  *p = 0 - i.e. the value stored in array

Modifying a pointer in a function
[Diagram: array at 05fd holds 0; p holds 05fd; a new block of 10 ints starts at f23c]
    int* array = NULL;
    allocate(&array, arrLen);
    void allocate(int** p, int len){
        *p = malloc(sizeof(int)*len);
    }
Inside allocate, we call malloc - which asks the operating system for 10*sizeof(int) bytes of memory (typically 40 bytes, since an int is usually 4 bytes even on a 64-bit system) and returns the start address of that block

Modifying a pointer in a function
[Diagram: array at 05fd now holds f23c; p holds 05fd; the block starts at f23c]
    int* array = NULL;
    allocate(&array, arrLen);
    void allocate(int** p, int len){
        *p = malloc(sizeof(int)*len);
    }
We then assign this address to the thing that p points to
In other words, we go to the address stored inside p - i.e.
05fd - and overwrite whatever is stored there with the value returned by malloc, so we replace 0 with f23c

Modifying a pointer in a function
[Diagram: array at 05fd holds f23c, the start of the allocated block]
    int* array = NULL;
    allocate(&array, arrLen);
    void allocate(int** p, int len){
        *p = malloc(sizeof(int)*len);
    }
Control passes back to main()
The value in array is now the address of the newly allocated block of memory

Why not do this? Let's see…
main:
    int arrLen = 10;
    int* array = NULL;
    allocate(array, arrLen);  //Address, shmaddress
allocate:
    void allocate(int* p, int len){  //Screw all those extra *'s!
        p = malloc(sizeof(int)*len);
    }

So far so good
[Diagram: array lives at address 05fd and holds 0]
    int* array = NULL;
    allocate(array, arrLen);
We have a variable named array that lives at memory address 05fd
The value stored at memory address 05fd is whatever NULL happens to be - which is usually the integer value 0

this looks a bit different…
[Diagram: array at 05fd holds 0; p holds 0]
    int* array = NULL;
    allocate(array, arrLen);
    void allocate(int* p, int len){
        p = malloc(sizeof(int)*len);
    }
We call the function allocate and pass it the value of array
Inside allocate: p = 0 - i.e. the value stored in array

the malloc() bit is the same
[Diagram: array at 05fd holds 0; p holds 0; a new block of 10 ints starts at f23c]
    int* array = NULL;
    allocate(array, arrLen);
    void allocate(int* p, int len){
        p = malloc(sizeof(int)*len);
    }
Inside allocate, we call malloc - which asks the operating system for 10*sizeof(int) bytes of memory (typically 40 bytes, since an int is usually 4 bytes even on a 64-bit system) and returns the start address of that block

uh oh…
[Diagram: array at 05fd still holds 0; p now holds f23c]
    int* array = NULL;
    allocate(array, arrLen);
    void allocate(int* p, int len){
        p = malloc(sizeof(int)*len);
    }
We then assign this address to p
In other words, we change the value of p from 0 to f23c
The value of array remains 0 - which is NOT what we want!

fail!
[Diagram: array at 05fd still holds 0; nothing points to the block at f23c]
    int* array = NULL;
    allocate(array, arrLen);
    void allocate(int* p, int len){
        p = malloc(sizeof(int)*len);
    }
Control passes back to main()
The value in array remains the same
The freshly allocated memory block still exists, but nothing points to it
All we got is a brand new memory leak!

Moral of the story
To modify a thing in a function, we must pass its address - otherwise we're modifying a copy!

Usability note
Dereferencing double-pointers is a pain
In your Assignment 1, I recommend you do this:
    VCardErrorCode createCard(char* fileName, Card** newCardObject){
        Card* tmp = malloc… //Allocate memory
        //Fill in tmp with values from the file...
        //at the end of the function, modify the thing that newCardObject points to
        *newCardObject = tmp;
        //return stuff here
    }

Valgrind
Valgrind is a memory debugging tool that can check for memory leaks (failing to free dynamically-allocated memory) and diagnose various other memory errors
Unix/Linux only. Works on Ubuntu shell on Windows, and macOS up to 10.11 (El Capitan)
Run valgrind with a compiled program to check for memory which has been allocated and not freed. e.g.
    valgrind ./myprog
You need to compile your program using the -g flag (same as for use with GDB):
    gcc a1.c -o a1 -g
valgrind can print out a lot of information about the memory usage of your program. The most important sections are the HEAP SUMMARY, the LEAK SUMMARY, and the ERROR SUMMARY.

Output of a program with no errors
$ valgrind ./exampleCode1
==24731== Memcheck, a memory error detector
==24731== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==24731== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==24731== Command: ./tree1 Reyn-Rozh_parking(0rt_1trk_1seg_400m).gpx
==24731==
==24731==
==24731== HEAP SUMMARY:
==24731== in use at exit: 0 bytes in 0 blocks
==24731== total heap usage: 175 allocs, 175 frees, 132,127 bytes allocated
==24731==
==24731== All heap blocks were freed -- no leaks are possible
==24731==
==24731== For counts of detected and suppressed errors, rerun with: -v
==24731== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Valgrind and memory leaks
Memory blocks are marked by valgrind as one of four types:
  "definitely lost" means your program is leaking memory
    fix these leaks
  "indirectly lost" means your program is leaking memory in a pointer-based structure. (E.g. if the root node of a binary tree is "definitely lost", all the children will be "indirectly lost".)
    fix these leaks
    If you fix the "definitely lost" leaks, the "indirectly lost" leaks should go away
  "possibly lost" means your program is leaking memory, unless you're doing unusual things with pointers that could cause them to point into the middle of an allocated block; see the user manual for some possible causes
    fix these leaks
    you are unlikely to see these errors
  "still reachable" means the memory was not freed, but a pointer to it still existed when the program exited
http://valgrind.org/docs/manual/faq.html

Valgrind output: program with a memory leak created by a missing free() call
$ valgrind ./exampleCode2
==24702== Memcheck, a memory error detector
==24702== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==24702== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==24702== Command: ./tree1 Reyn-Rozh_parking(0rt_1trk_1seg_400m).gpx
==24702==
==24702==
==24702== HEAP SUMMARY:
==24702== in use at exit: 14,865 bytes in 113 blocks
==24702== total heap usage: 175 allocs, 62 frees, 132,127 bytes allocated
==24702==
==24702== LEAK SUMMARY:
==24702== definitely lost: 176 bytes in 1 blocks
==24702== indirectly lost: 14,689 bytes in 112 blocks
==24702== possibly lost: 0 bytes in 0 blocks
==24702== still reachable: 0 bytes in 0 blocks
==24702== suppressed: 0 bytes in 0 blocks
==24702== Rerun with --leak-check=full to see details of leaked memory
==24702==
==24702== For counts of detected and suppressed errors, rerun with: -v
==24702== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Valgrind and memory errors
In addition to leaks, valgrind can detect memory errors:
  Using an uninitialized variable
  Writing into memory that was not allocated
  etc.
http://valgrind.org/docs/manual/faq.html

Valgrind output
Memory error: your malloc call did not allocate enough memory
$ valgrind ./StructListDemo
==32322== Memcheck, a memory error detector
==32322== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==32322== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==32322== Command: ./StructListDemo
==32322==
==32322== Invalid write of size 8
==32322== at 0x1095D2: main (StructListDemo.c:143)
==32322== Address 0x4a25d10 is 0 bytes inside a block of size 1 alloc'd
==32322== at 0x483577F: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==32322== by 0x1095B9: main (StructListDemo.c:142)
==32322==
... further errors omitted...
==32322== Invalid read of size 1
==32322== at 0x4839D20: strcmp (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==32322== by 0x10932D: compareFunc (StructListDemo.c:56)
==32322== by 0x48455AE: deleteDataFromList (LinkedListAPI.c:181)
==32322== by 0x1095F0: main (StructListDemo.c:145)
==32322== Address 0x4a25d11 is 0 bytes after a block of size 1 alloc'd
==32322== at 0x483577F: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==32322== by 0x1095B9: main (StructListDemo.c:142)
==32322==
==32322==
==32322== HEAP SUMMARY:
==32322== in use at exit: 0 bytes in 0 blocks
==32322== total heap usage: 33 allocs, 33 frees, 2,049 bytes allocated
==32322==
==32322== All heap blocks were freed -- no leaks are possible
==32322==
==32322== For counts of detected and suppressed errors, rerun with: -v
==32322== ERROR SUMMARY: 20 errors from 3 contexts (suppressed: 0 from 0)

Valgrind output
Memory error: you did not initialize a variable before use
$ valgrind ./StructListDemo
==308== Memcheck, a memory error detector
==308== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==308== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==308== Command: ./StructListDemo
==308==
==308== Conditional jump or move depends on uninitialised value(s)
==308== at 0x48B2029: vfprintf (vfprintf.c:1637)
==308== by 0x48D3320: vsprintf (iovsprintf.c:42)
==308== by 0x48B9773: sprintf (sprintf.c:32)
==308== by 0x1092D2: printFunc (StructListDemo.c:38)
==308== by 0x1094C0: main (StructListDemo.c:104)
==308==
==308== Use of uninitialised value of size 8
==308== at 0x48ADD1E: _itoa_word (_itoa.c:179)
==308== by 0x48B15F3: vfprintf (vfprintf.c:1637)
==308== by 0x48D3320: vsprintf (iovsprintf.c:42)
==308== by 0x48B9773: sprintf (sprintf.c:32)
==308== by 0x1092D2: printFunc (StructListDemo.c:38)
==308== by 0x1094C0: main (StructListDemo.c:104)
==308==
... further errors omitted...
==308==
==308==
==308== HEAP SUMMARY:
==308== in use at exit: 0 bytes in 0 blocks
==308== total heap usage: 33 allocs, 33 frees, 2,052 bytes allocated
==308==
==308== All heap blocks were freed -- no leaks are possible
==308==
==308== For counts of detected and suppressed errors, rerun with: -v
==308== Use --track-origins=yes to see where uninitialised values come from

Memory Reminders
All calls to malloc/calloc/realloc require an associated free, or a memory leak will be created
Some library functions will allocate memory, and it is up to you to free that memory if it is returned to your program (e.g. strdup())
  This is also how many of our List functions work - e.g. toString()
Others will use memory and only free it when you finish using the functions
  e.g. fopen() and fclose()

More information
Use the valgrind documentation page: http://valgrind.org/docs/manual/index.html
The Quick Start guide is always a good starting point: http://valgrind.org/docs/manual/QuickStart.html
You can also use the labs and talk to the TA about using valgrind

CIS*2750 Lecture 5: Advanced C Programming
Some material from Expert C Programming: Deep C Secrets by Peter van der Linden
Based on CIS*2750 notes from previous generations of CIS*2750 instructors

Topics
Scope of symbol names
  4 flavours
Precedence of operators
  associativity
  syntax of declarations with multiple operators

Scope, Access Control, Storage Class
Most programming languages separate the concepts of scope, access control, and storage class to some extent
Variable scope is the region over which you can access a variable by name.
  In other words, scope defines where a variable is "visible"
  E.g. variables declared within a method are only visible within that method (local scope)
Storage class determines how long the symbol - e.g.
variable - stays "alive"
Access control determines who can access a symbol that is visible to everyone

Scope vs Access Control
Most modern languages give us additional control for accessibility of functions/methods, as well as some of the variables
For example, in Java we have 4 access levels for methods and class/instance variables
  private, protected, package-level (default), public
So a class variable - that has class-wide scope - is visible to all classes/objects within a program, but we can make it inaccessible to various other classes through the use of access control
  e.g. private to make it visible only to the objects of that class

Scope in C
C has no access control as such - everything is "public" - but we have some control over who can see a specific variable
There are 4 types of scope:
  Program scope … widest
  File scope
  Function scope
  Block scope … narrowest

Scoping Principle
Always define a symbol in the narrowest scope that works
Reasons? Same as for controlling access in OO programming - error prevention and, to a lesser extent, security

1. Program Scope
The variable is accessible by all source files that make up the executable.
In C:
  all functions (unless declared static)
  global (extern) variables

Program Symbol Concepts
Names used for data and functions
  variable name, typedef, enum, struct (fields), class (data members, methods), and more
Definition: where the named thing “lives”
  actual memory location of data or function
Reference: some use of the thing by name
  load/store, call: must be “resolved” to location
Declaration: tells compiler about the name
  compiler can verify that references are correct

Examples
    int max(int a, int b);  // function prototype declaration
    int main(){
        ...
        float sum = 0.0;          // variable definition
        sum = sum*10 + max(x,y);  // references: sum (store), sum (load), max (call)
        ...
    }
    // function definition
    int max(int a, int b) { return a>b?
    a : b; }
    // if this definition was up top before the reference, the definition
    // would serve as a declaration, too

External Symbols
Program scope symbols are passed to linker (ref. Lecture 2a) in a .o file
  External definition, “extdef”
  External reference, “extref”
In linked executable, for each external symbol:
  Exactly one extdef, or else we get an error: “undefined external”, “multiply defined external”
  Any number of extrefs
    substituted with final memory address of symbol

“Externals”
Having “program scope” (external symbols) is a common requirement
  assembly language
  all kinds of programming languages
  allows big program to be linked together out of small modules
Each language has its own convention for designating extdef & extref

Using Program Scope in C: function
extdef: void insertBack(List* list, void* toBeAdded){…}
  definition only appears in one .c file (LinkedListAPI.c)
declaration: void insertBack(List* list, void* toBeAdded);
  seems to appear only once, in LinkedListAPI.h
  however, the prototype declaration (LinkedListAPI.h) is actually included in many .c files
  we use include guards to prevent repeated (recursive) inclusion of the same declarations
extref: insertBack(list, data);
  call - definitely happens in multiple files

Using Program Scope in C: variable
extdef: FILE* inputfile;
  definition only appears in one .c file, outside any function
  can initialize: type varname = initial_value;
declaration: extern FILE* inputfile;
  declaration appears anywhere in a file, in/outside functions
extref: fclose(inputfile);
  appears anywhere we use the variable

Using Program Scope
You have to decide when to use program scope
In general, variables should never be globally accessible by the end user
Program scope functions should be part of the program's public interface
Try to avoid program scope for internal functions, if possible

2. File Scope
A variable or a function is accessible from its declaration (definition) point to the end of the file.
In C, static things that are global within a file, but not the whole program
CAUTION: static keyword has multiple uses!
If variable defined outside any function…
  would normally be “program scope” (global)
  static keyword keeps definition from being passed to linker → doesn't become external

Using File Scope
The file scope in C sits somewhere between private and package-level in Java
File scope is perfect for "internal-use-only" functions
Can be used for variables, but you really should avoid variables that are global to a file
  Keep variables local to each function body (local scope) and pass them as function arguments instead
  Global variables (file or program scope) are usually a bad idea - they lead to hard-to-find errors

3. Function Scope
Accessible throughout a function.
In C, only goto labels have function scope; therefore you will never see them
“Throughout” means you can jump ahead:
    goto bummer;
    …
    bummer: printf("Outta here!");

4. Block (local) Scope
The variable is accessible after its declaration point to the end of the block in which it was declared.
  Remember, a block is anything between {} braces
In C, variables declared within a block - e.g. the body of a function or a loop - are local variables.
  I will usually use the terms local scope and block scope interchangeably
Includes:
  function's parameters
  local variables declared in a function
  loop counters declared in a for (int i = 0; …) statement
  variables declared within the loop or branching statement body

What Happens?
    void func() {
        int a = 11;
        {
            int b = 10;
        }
        printf("%d %d\n", a, b);
    }
Won't work! The variable b is inside a block and therefore is not visible to the rest of the function.

What Happens?
    void newfunc() {
        int a = 11;
        {
            int b = 10;
            printf("%d\n", b);
        }
        printf("%d\n", a);
    }
Fixed!

Using local variables - badly!
As a rule, declare variables with the narrowest scope
  E.g.
  loop counters can be declared right in the for-loop statement
  Temporary variables used only inside one loop can be declared within that loop
Move a variable to a wider local scope only when necessary
Avoid using the same variable name in overlapping scopes - i.e. shadowing
Bad example
    void func(){
        int x = 7;
        for (int i = 0; i < 10; i++){
            int x = i;  // The "outer" x never changes
        }
    }

Scope vs. Storage Class
Storage class applies to where & how long variable is kept
Different from scope - i.e. who can see a variable
Typically, variable scope and storage class are unrelated to each other
C is a bit weird, since declaring a variable static affects both its storage and its scope!

Automatic Storage
Associated with functions
  Arguments
  Local variables inside function - or inside any other block
    remember, a block is stuff between a pair of {}s
Fresh temporary copy created on the stack every time function called
  Copy can be initialized (same value each time)
  Copy goes away when function returns to caller
  Allows recursion to work!

Automatic Storage - warning
Never return pointers to automatic variables!
Remember, the auto variable goes away when the scope ends / function returns
So by returning a pointer to an automatic variable, we return a pointer to memory that has been freed!
To make matters worse, this will sometimes "work"

Automatic Storage - BAD example
    char* func(){
        char x[20];
        //Do stuff with x
        return x;  //BAD: x is an automatic array that disappears when func returns
    }
    int main(void){
        char* s = func();
        printf("%s\n", s);
    }
This might print something useful. Or it might not. Never do this!
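One way to avoid the problem above is to hand back dynamically allocated storage instead, which outlives the function call. The following is only a sketch under stated assumptions: makeGreeting is a hypothetical helper that is not part of the lecture code, and it assumes the caller takes responsibility for calling free() on the result.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the course code): builds "Hello, <name>"
   in heap memory. The block stays valid after the function returns,
   so returning the pointer is safe; the caller must free() it. */
char* makeGreeting(const char* name){
    const char* prefix = "Hello, ";
    size_t len = strlen(prefix) + strlen(name) + 1;  /* +1 for the '\0' */

    char* result = malloc(len);
    if (result == NULL){
        return NULL;  /* allocation failed; the caller has to check for this */
    }

    strcpy(result, prefix);
    strcat(result, name);
    return result;
}
```

A caller would write something like char* s = makeGreeting("CIS"); check s against NULL, use the string, and then call free(s). Forgetting that last step trades the dangling pointer for a memory leak, which valgrind would report.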
Static Storage
static storage: means there is only one instance of the variable in the executable program
Applies to
  program scope (global variables)
  “static” file scope variables
  “static” local variables
If you add the static keyword, a variable is not global anymore - it is restricted to file scope
The static keyword changes a local variable from automatic to static storage class
  Initialization effective once, when program started

Static Storage
Typically, the only static variables are the ones we want to restrict to file scope
Occasionally we want to have a function that "remembers" its local value between function calls
  In that case, we declare a variable with local (block) scope, and static storage
For example, we want to create a constructor-style function that allocates a struct: a 2D point
  We want to keep track of how many structs were allocated, i.e. how many times the function was called
  we can create an internal counter in the function: static int count = 0;

Example: static storage
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct {
        float x;
        float y;
    } Point2D;

    Point2D* createPoint2D(){
        static int count = 0;
        Point2D* tmp = malloc(sizeof(Point2D));
        if (tmp != NULL){
            printf("Allocated %d structs\n", ++count);
        }
        return tmp;
    }

Example: static storage
    int main(void){
        Point2D* p1 = createPoint2D();  //Prints Allocated 1 structs
        Point2D* p2 = createPoint2D();  //Prints Allocated 2 structs
        Point2D* p3 = createPoint2D();  //Prints Allocated 3 structs
        return 0;
    }

Global / external storage
Program scope variables exist as long as the program is running
Example (all in one .c file)
    int i;          // Program scope, program storage
    static int j;   // File scope, static storage
    void func( int k ) {
        int m;         // Block/local scope, automatic storage
        static int x;  // Block/local scope, static storage
    }

Dynamic Storage
Third class of storage, contrasted with static and automatic
Created (temporarily) on the heap via malloc(), calloc(), realloc()
  Must be explicitly freed via free()
  Address (pointer) has to go in some variable
  That variable has scope and
  storage class itself
If dynamic storage is not explicitly freed, it stays allocated for as long as the program runs
  Modern operating systems reclaim it when the process exits, but in a long-running program the leaked memory keeps accumulating

TL;DR: General rules of scope and access
Keep variables local, use the narrowest scope
  No program scope variables
  Few to no file scope variables
Only the necessary functions with program-level access
  Give the internal, utility functions file scope, if possible
Only return pointers to dynamically allocated storage
  Never return pointers to automatic (stack-allocated) variables from functions
  Always remember to manually free dynamically allocated storage

Precedence of Operators
Operator precedence determines the order in which operators are evaluated
    x = 25 * a + c / 2.1
Operators are used to calculate values for both numeric and pointer expressions
Operators also have an associativity, which determines how operands are grouped when adjacent operators have the same precedence.

Associativity
Applies with 2 or more operators of the same precedence:
    A op1 B op2 C op3 D
Answers the question: which op is done first?
Associativity can be either Left-to-Right or Right-to-Left

Associativity
Left-to-Right (AKA left associative) is most common
    a + b - c;
The + and - operators are both evaluated left-to-right, so the expression is “a plus b, then subtract c”
Equivalent to: (a + b) - c;

Associativity
Right-to-Left (AKA right associative) is rare
    a = b = c = 1;
This expression is read “assign 1 to c, then assign the result to b, then to a”
Equivalent to: a = (b = (c = 1));
Only meaningful because in C, an assignment is an expression, resulting in a value
  In some other languages, e.g.
  Swift, the assignment operator does not return a value

Problems with Precedence
The precedence of some operators causes problems when it produces unexpected behaviour
Don't get clever, use parentheses

Problems with Precedence
Pointer to structure: *p.f
Expectation: the member f of what p points to: (*p).f
Actually: means *(p.f)
  p.f gives a compile error if p is a pointer
Why? . is higher precedence than *
Note: The -> operator was made to correct this: p->f

Problems with Precedence
int *ap[]
Expectation: ap is a ptr to an array of ints: int (*ap)[]
Actually: ap is an array of pointers-to-int: int *(ap[])
Why? [] is higher precedence than *
Note: usually found in declarations.

Problems with Precedence
int *fp()
Expectation: fp is a ptr to a function returning an int: int (*fp)()
Actually: fp is a function returning a ptr-to-int: int *(fp())
Why? () is higher precedence than *
Note: usually found in declarations
This is particularly bad, since in a C declaration f() means a function with an unspecified number of arguments - not a function with no arguments
  If you want a pointer to a function that has no arguments and returns an int, declare int (*fp)(void)

Problems with Precedence
c = getchar() != EOF
Expectation: ( c = getchar() ) != EOF
Actually: c = (getchar() != EOF)
  c is set equal to the true/false value
Why?
  comparators == and != have higher precedence than assignment
Fix: ( c = getchar() ) != EOF

Solution to Precedence Problems
When in doubt, use parentheses
Better still, always use parentheses
  You may not be in doubt, but the next reader could be
Resist the temptation to write “clever” code - you might impress yourself at the time, but
  You will hate yourself later
  Your coworkers/teammates will hate you as well
All other aspects being equal, always strive for clarity and readability

CIS*2750 Lecture 5b: C odds and ends
- types and typedefs, enumerated types
- searching with predicates
Based on CIS*2750 notes from previous generations of CIS*2750 instructors

Review: typedefs
When we are dealing with large-scale software development - modules, libraries, etc. - we often have to define new types - so let's remember how to define them in C
The typedef keyword can be used to create aliases or shorthand notations for existing C types
Typedefs do not really create new types - they are used to give existing types a new name (alias).
The intent is to make code more readable and easier to write.
Syntax:
    typedef existing_type new_name;
    (keyword, then the type, then the new name, i.e. the alias)

Typedef Examples
Consider the following definition:
    struct Vec2 { float x, y; };
This defines a new type - a structure
We refer to this type as struct Vec2 in our code, e.g.:
    struct Vec2 u = { 0, 1 };
    struct Vec2 v = { 1, 0 };

Typedef Examples
We can use a typedef now:
    struct Vec2 { float x, y; };
    typedef struct Vec2 Vector;
This defines a structure (struct Vec2) and an alias (Vector), which allows us to write this:
    Vector u = { 0, 1 };
    Vector v = { 1, 0 };
Instead of this:
    struct Vec2 u = { 0, 1 };
    struct Vec2 v = { 1, 0 };

Typedef Examples
The following does exactly what we did on the previous slide, but in a single statement:
    typedef struct Vec2 { float x, y; } Vector;
Again, we can write this:
    Vector u = { 0, 1 };
    Vector v = { 1, 0 };
instead of this:
    struct Vec2 u = { 0, 1 };
    struct Vec2 v = { 1, 0 };

Typedef Examples
The name of the alias can be the same as the name of the structure:
    typedef struct Vec2 { float x, y; } Vec2;
Now, we can write this:
    Vec2 u = { 0, 1 };
    Vec2 v = { 1, 0 };
instead of this:
    struct Vec2 u = { 0, 1 };
    struct Vec2 v = { 1, 0 };
In this case, we simply created a “shorthand” notation for struct Vec2.
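To make the "alias, not a new type" point concrete, here is a small sketch in which the two spellings are mixed freely. The function addVec2 is a hypothetical helper introduced only for illustration; it is not part of the course code.

```c
typedef struct Vec2 {
    float x, y;
} Vec2;

/* Hypothetical helper: component-wise sum of two vectors.
   One parameter is spelled Vec2 and the other struct Vec2 on purpose:
   both names refer to the same type, so the compiler accepts either. */
Vec2 addVec2(Vec2 a, struct Vec2 b){
    Vec2 sum = { a.x + b.x, a.y + b.y };
    return sum;
}
```

Calling addVec2 with one argument declared as Vec2 and the other as struct Vec2 compiles without any cast, which would not be the case if the typedef had introduced a distinct type.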
Types
Types will be a continuous theme in this course
  You've been using them since you started programming in C
  We will define what a type actually is later in the course
Since we are using a typed programming language, we might as well use types to our advantage
  Make the code more readable
  Offload some of the error checking onto the compiler
  Build good programming habits that will be useful when using languages with stronger type systems
Using types correctly also helps you avoid precision issues
  Remember, ints are precise, but floats are often NOT

Rules for using types
Always pick the appropriate type for the job and use the most specific ("narrowest") type
For example:
  use an int instead of, say, a float if you need to store counts, lengths, and other inherently integer values
  use the C bool type instead of int if your value is always true or false
Outside of C, this will also let you use a compiler to keep you from making accidental mistakes
For example, a language with strict type rules
  will not allow you to assign an int to a boolean variable
  will not allow you to assign a signed int to an unsigned int

Rules for using types
Create new, specific types that help you model and solve a problem
For example, let's say you need to store an array of 2D vectors

Solution 1: use existing types
We define a 2D array of floats
We do not know the array size at compile time, so we have to use pointers/malloc
    //declaration and memory allocation
    float** vectors;
    vectors = malloc(numVec*sizeof(float*));
    for (int i = 0; i < numVec; i++){
        vectors[i] = malloc(2*sizeof(float));
    }
    //use
    for (int i = 0; i < numVec; i++){
        vectors[i][0] = xVal;
        vectors[i][1] = yVal;
    }
    //Deallocation
    for (int i = 0; i < numVec; i++){
        free(vectors[i]);
    }
    free(vectors);

Solution 2: use a dedicated type
We can use the Vec2 we've defined earlier
Again, we do not know the array size at compile time, so we use malloc
    typedef struct Vec2 { float x, y; } Vec2;
    //allocation
    Vec2* vectors = malloc(numVec*sizeof(Vec2));
    //Use
    for (int i = 0; i < numVec; i++){
        vectors[i].x = xVal;
        vectors[i].y = yVal;
    }

    //Deallocation
    free(vectors);

Analysis

Advantages of Solution 2

Efficiency:
  Solution 1 makes numVec + 1 calls to malloc() and numVec + 1 calls to free(); Solution 2 uses one malloc and one free
  We use less memory (why?)
Reliability: less manual memory management usually means fewer errors and less time spent debugging
Readability:
  we know what vectors[i].x is
  we have to guess what vectors[i][0] is
  The 2D array is n×2. What if it were 2×n?
Maintainability and information hiding:
  If we want to change the storage type from float to double,
  in Solution 1 we have to change multiple lines of code (every array declaration, all malloc calls)
  in Solution 2 we have to change only one line:

    typedef struct Vec2 { double x, y; } Vec2;

Moral of the story

Types are useful, so let's use them!

Reminder: enumerated types

Enumerations

Definition: an enumeration, also called enum or enumerated type, is a data type consisting of a finite set of named values.
The values are internally represented as integers, at least in C and closely related languages.
When declaring an enumeration:

    enum CardSuit {CLUBS, DIAMONDS, HEARTS, SPADES};

each of the four elements (a.k.a. enumerants) is assigned a unique integer value.
By default, the first element (CLUBS) is assigned the value 0, and subsequent elements are given incremental integer values (DIAMONDS=1, HEARTS=2, SPADES=3).

It is possible to override the default value assignments using explicit assignments at definition time.
In the following, all elements are given custom values:

    enum CardSuit {CLUBS = 1, DIAMONDS = 2, HEARTS = 4, SPADES = 8};

We do not have to specify all values. If some values are omitted, the compiler will “fill in the blanks” using the incremental numbering scheme.
For example:

    enum CardSuit {CLUBS, DIAMONDS, HEARTS = 100, SPADES};

Here, CLUBS is assigned the default value 0, DIAMONDS becomes 1, HEARTS is given the custom value 100, and SPADES becomes 101.

Enumerations are used to represent simple symbolic information such as classifying the “type” of an object.
Because enumerants are represented by integers, they can be used in arithmetic expressions just like other integer data types.
This is unlike Java enums, which are considered separate types from integers.
To declare an enum variable, use the following syntax:

    enum CardSuit mysuit = CLUBS;

The leading enum keyword, followed by the name of the enumeration, forms a complete type specification.

Enumerations Example

Consider the following enumeration definition:

    enum Direction { NORTH, EAST, SOUTH, WEST };

We can define a function turnRight() which takes the old direction and returns the new direction. The function would be used like so:

    enum Direction dir = NORTH;
    dir = turnRight(dir);

(Figure: compass showing N, E, S, W.)

One definition of the turnRight() function is as follows:

    enum Direction turnRight(enum Direction dir){
        if (dir == NORTH) return EAST;
        else if (dir == EAST) return SOUTH;
        else if (dir == SOUTH) return WEST;
        else return NORTH;
    }

Another possible definition is this:

    enum Direction turnRight(enum Direction dir){
        return (dir + 1) % 4;
    }

The latter definition takes advantage of the fact that enums are represented as integers in a sequence.
It is shorter, but potentially less readable.

Enums and typedefs

Typedefs also work with enumerations, like this:

    enum Direction { NORTH, EAST, SOUTH, WEST };
    typedef enum Direction Direction;

Or like this:

    typedef enum Direction { NORTH, EAST, SOUTH, WEST } Direction;

The two definitions are equivalent and let us declare variables like this:

    Direction dir1 = NORTH;
    Direction dir2 = WEST;

Complete Enumeration Example

See directions.c in Week 5 examples for a more complete example.

Enum guidelines

Use them for types with a small number of discrete values, e.g. days, months, cardinal directions, error codes, etc.
Advantages:
  They make your code more readable - using named labels instead of magic constants is always a big win
  They allow for some elegant code (enum values could be used as array indices, for example)
They also help develop good programming habits:
  In languages other than C that have enums (C#, C++, Swift, Java, etc.), the compiler enforces strong rules for enum use, and prevents you from making mistakes
  If you are used to using enums in C, you will carry this habit into other languages, and will be able to take advantage of stronger type systems

As a beginner, avoid relying too much on the integer representation, and always use the labels.
  Relying on the integer representation can decrease readability
  It can also break your code if the enum changes
  In later assignments we might add other error codes to an enum (i.e. type VCardErrorCode)
  If the order of existing labels changes and you rely on a specific int value, your code will not work correctly

Searching with predicates

Mental exercise

Imagine an array of Person structs:

    typedef struct name {
        char* firstName;
        char* lastName;
        unsigned int age;
    } Person;

How would you like to retrieve values from it?
Now imagine that you want to write a function (or functions) for searching such an array.
  Searching based on what?
  How would you design and implement this?
  What changes if the array contains values of a different type, instead of the Person struct?

After working on this exercise, you will find that your searching depends on conditions.
Return the element (or elements) for which a condition holds - or an indication that there are no elements for which the condition holds.

Predicates

In discrete math, predicates (or predicate functions) are boolean-valued functions.
  Given a logical proposition, they return true or false
The notion of a predicate can be extended to concrete functions implemented in a programming language.
A predicate would be a function that accepts one or more arguments and returns a boolean value.
You can think of predicates as condition testers:
  If a condition - that depends on the arguments of the function - holds, the function returns true
  Otherwise the function returns false

The compare() function that we use in our List API is somewhat similar to a predicate:
  If the two arguments are equal, it returns 0
  Otherwise, it returns a positive or negative value that indicates how the arguments should be ordered
However, the function is not boolean-valued - it returns an int.
Moreover, returned values of -1, 0, and 1 mean three different things.

To turn it into a predicate, we can simplify it:
  return true if the two arguments are equal
  return false otherwise
Use the bool type for clarity.
However, what does it mean for two “things” to be equal?

Searching with predicates

When we search a data collection, we often want to search by different criteria, e.g.
  Find a student with first name “Rick” and last name “Sanchez”
  Find a student with age 25
  Find if the class has any students with the last name starting with an “F”
  etc.
This can get very complex quickly, and for advanced functionality we need database query support.
However, we can do a relatively simple implementation: when we search, we pass the search condition as a predicate.

Generic search function for an array:

    Person* search(Person* array[], int arrayLen,
                   bool (*compare)(const void*, const void*),
                   Person* searchName);

1st argument is the array we wish to search
2nd argument is the array length
3rd argument is the predicate - the function that specifies the comparison condition
4th argument contains the value we should be searching for
Returns:
  data associated with the 1st element that satisfies the condition
  NULL if no elements in the array satisfy the condition

See StructListDemo.c (lines 161 - 172)
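The search() prototype above can be fleshed out roughly as follows. This is only a sketch under stated assumptions, not the code from StructListDemo.c: the loop body and the NULL checks are one plausible implementation, and the two predicates sameLastName() and sameAge() are hypothetical names made up for this illustration. It also assumes the predicate is called with the array element first and the search value second.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct name {
    char* firstName;
    char* lastName;
    unsigned int age;
} Person;

/* Returns the first element for which compare(element, searchName)
   is true, or NULL if no element satisfies the predicate. */
Person* search(Person* array[], int arrayLen,
               bool (*compare)(const void*, const void*),
               Person* searchName) {
    if (array == NULL || compare == NULL) {
        return NULL;
    }
    for (int i = 0; i < arrayLen; i++) {
        /* Skip empty slots so the predicate never sees NULL. */
        if (array[i] != NULL && compare(array[i], searchName)) {
            return array[i];
        }
    }
    return NULL;
}

/* Hypothetical predicate: true if both records share a last name. */
bool sameLastName(const void* first, const void* second) {
    const Person* a = first;
    const Person* b = second;
    return strcmp(a->lastName, b->lastName) == 0;
}

/* Hypothetical predicate: true if both records have the same age. */
bool sameAge(const void* first, const void* second) {
    const Person* a = first;
    const Person* b = second;
    return a->age == b->age;
}
```

Changing the search criterion means passing a different predicate, e.g. search(people, n, sameAge, &key) instead of search(people, n, sameLastName, &key); the search loop itself stays the same.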
