Summary

This document is a review of the C tool chain, covering topics such as the preprocessor, compilers (like GCC and Clang), intermediate and object files, linking, and the roles of the loader. It explains how the tool chain converts source code into executable or library files, touching on makefiles, libraries, and memory allocation, for undergraduate computer science students.

Full Transcript

The C Tool Chain
- Tool chain: an informal term for the sequence of software tools that converts your source code into binary code.

IDE
- IDE: Integrated Development Environment
  - Windows: Visual Studio
  - macOS: Xcode
  - Multi-platform: Eclipse, IntelliJ, CodeLite, Code::Blocks

Compilers
- C compilers: GNU Compiler Collection (GCC), Clang

Intermediate Files
- Consist of the .i, .s and .o files

C Preprocessor
- Purpose: interpret all the # directives in the .h and .c files before the compiler sees the source code (producing the intermediate .i file)
  - #include: merge in header files
  - #define: macros (with and without arguments)
  - #ifndef, #if, #else, #endif: conditional compilation
- Exception: #pragma is passed on as a compiler directive and is ignored if the compiler does not recognize it.
- Note: the C preprocessor is abbreviated to cpp

Preventing Multiple/Circular Includes
- Best practice .h pattern: an include guard (#ifndef HEADERNAME_H, #define HEADERNAME_H, the header contents, #endif)
- On the first #include "headername.h", the symbol HEADERNAME_H gets defined
- If the header gets included again, the file's contents are skipped

Headers and Paths
- The #include directive has different syntax for different headers.
- #include <file>: this variant is used for system header files
  - It searches for a file named file in a standard list of system directories
- #include "file": this variant is used for header files of your own program
  - It searches for a file named file as follows:
    - First in the directory containing the current file
    - Then in the "quote" directories
    - Then in the same directories used for <file>

Headers and Paths
- You must not include the path to the file in the #include statement
  - This is because if the header and/or the file that uses the header moves to a different location, the path breaks and the code will no longer compile
- Instead you should always include the file itself, e.g. #include "Parser.h"
- Let the compiler toolchain take care of the paths.
- A directory that contains a header file can be passed to the compiler using the -I option
- The paths can be relative or absolute, and you can pass more than one include directory

Compiler
- Purpose: compile preprocessed C source code (.i files) into assembly language (.s files)
- Diagnoses abuses of the language and issues warnings and/or errors
- Intermediate .i and .s files are normally deleted after assembly

Assembler
- Purpose: assemble the assembly code (.s) from the compiler into object code (.o)
- Tool setup is normally transparent
- There are options for controlling it
- You can insert assembly statements into your C program

Linker
- Purpose: to stitch together object files into a single binary program file (executable or library)
  - All the .o object files that make up the user program, plus
  - All the referenced system libraries
- The linker handles two kinds of libraries:
  - Static (.a/.lib): inserted into the executable file
  - Dynamic (.so/.dll): linked at run time
- Note: when reading error messages, "undefined reference" errors are almost always caused by the linker being unable to find the definition of a symbol.

What is the linker doing?
- In the object files there are tables of external references (e.g. printf)
- It reads all the libraries until it finds a matching external definition (the actual code of the printf function, which lives in the C library; stdio.h only declares it)
- Libraries come in static and dynamic form
  - Static linking: it pulls the definitions (function code and global variables) into the program file and fixes up all the references to point to their locations.
  - Dynamic linking: it makes a note of the library file from which to get the definition at run time

Linking and Paths
- The linker automatically links only the standard C library, which does not include the math library.
- By default it only looks in a small number of directories containing system libraries
- It does not look in the current directory
- We can explicitly give the linker additional paths to search using the -L flag
- Note that -l provides the names of the libraries that the linker has to link with

Loader = ld.so
- The command shell or a program tells the OS to execute a program file
- The OS opens a fresh process and calls the loader to fill it up by copying the segments of the program file into different regions of the process's memory
  - Program instructions
  - Static data
- The OS creates a new stack and an empty heap, then transfers control to the first instruction of the program.
- Note that errors such as "error while loading shared libraries" and other issues with .so files are reported by the loader.

Loader = dlopen
- Programs using dynamically linked libraries
  - ld.so will load these when the program starts
  - But referenced calls will be fixed up on demand
- Some programs use dlopen to load shared object plugins on demand at run time

Makefile

Makefiles
- The make utility executes a sequence of commands from a description file
- It is generally used to create executables, but it can perform other tasks
  - Removing files
  - Reporting the status of a project
  - Packaging multiple files into a distribution
  - Installing files in a directory
  - Building libraries
- The make utility is not unique
  - The ant utility for Java is similar
  - CMake is a higher-level tool that generates makefiles
- It also examines dependencies between files
  - If files don't exist, it attempts to build them
  - If the compiled files do exist but depend on a file with a newer date, they are recompiled.
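The dependency rule above (build if the target is missing, rebuild if any prerequisite is newer) can be sketched in C. This is only an illustration of the decision, not how make is implemented: the function name needs_rebuild and its parameters are hypothetical, and a real tool would read the modification times from the file system instead of receiving them as arguments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/*
 * Hypothetical helper: decides whether a target must be (re)built.
 * target_exists  - false if the target file is missing
 * target_mtime   - modification time of the target (ignored if missing)
 * prereq_mtimes  - modification times of the prerequisites
 * prereq_count   - number of entries in prereq_mtimes
 */
bool needs_rebuild(bool target_exists, time_t target_mtime,
                   const time_t *prereq_mtimes, size_t prereq_count)
{
    assert(prereq_count == 0 || prereq_mtimes != NULL);

    /* A missing target always has to be built. */
    if (!target_exists) {
        return true;
    }

    /* Rebuild as soon as one prerequisite is newer than the target. */
    for (size_t i = 0; i < prereq_count; i++) {
        if (difftime(prereq_mtimes[i], target_mtime) > 0.0) {
            return true;
        }
    }

    /* The target exists and is at least as new as every prerequisite. */
    return false;
}
```

Calling needs_rebuild with an existing target and no newer prerequisite returns false, which corresponds to make reporting that the target is up to date.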
Makefile and the Compiler Toolchain
- The make utility and the compiler toolchain are separate
- However, the make utility is usually used to automate code building

Makefile Structure
- Each entry in a makefile consists of three parts
  - Target
  - Prerequisites (or dependencies)
  - Command line
- ![](media/image5.png)

Part 1: Target
- The target is what a particular makefile entry aims to build
- Typically this is a filename, but it can also be an executable or a library

Part 2: Prerequisites
- The prerequisites are the files that must exist in order for a target to be buildable
- If any of the prerequisites is newer than the target, the command line is executed
- In other words, if any of the dependencies were modified after the target was created, we rebuild the target

Part 3: The Command Line
- The command line is the command that must be executed to build the target from the dependencies.
- The command line must be prefaced by a single tab

Invoking Make
- Typing only: make
  - This looks for a file named Makefile or makefile and builds the first target that appears in the file
- Typing: make <target>
  - Will build the specified target

Other Common Targets
- Targets don't have to be files, as for example with make clean
- In class examples, make clean usually deletes
  - All targets (exe and lib files)
  - All temporary .o files
  - And the core file

Checking Commands
- You can view the commands that make would execute, without running them, by adding the -n flag (make -n)

Multiline Command Lines
- You can put multiple commands on one command line by separating them with a semicolon
- The backslash means to continue on the next line and must be the last character on the line

Makefile Macros
- Macros are used to avoid repeatedly typing a lot of text in makefiles
- They are defined using the equals sign:
  - ![](media/image7.png)
- And they are referenced using a $ and parentheses, e.g. $(CC)
- Note that undefined macros are replaced with an empty string

Predefined Macros
- The macro CC is predefined as the command cc
  - cc is usually a symbolic link to the default C compiler of your *nix distribution
  - Typically either the GNU C compiler or the LLVM C compiler
- The macro LD is predefined as the command ld
- You can use these without defining them, or you can replace them with something else
  - E.g. CC = gcc

Macro Strings and Substitutions
- This allows you to use a macro and substitute a string in the macro
- E.g. SRCS = a.c b.c c.c
- SRCS can be referenced with:
  - $(SRCS:.c=.o)
    - This will translate the list to a.o b.o c.o
  - $(SRCS:.c=)
    - This will translate to a b c
- The names of the executables can be created using the names of the source files

Suffix Rules
- The following makefile will compile the executables a, b and c
  - all: $(SRCS:.c=)
- Suffix rules tell the system how to compile different types of files
- By convention, C files end with .c, and so on
- As a result, make has built-in rules which will use the correct compiler to turn a source code file into an executable
- make -p lets us see all the predefined macros and rules

Comments
- Comments begin with a # and continue to the end of the line
- ![](media/image9.png)

Flags Control Options for the Tools
- The components of the C toolchain can be controlled using makefiles

Preprocessor Flags
- Preprocessor: CPPFLAGS =
  - -Iinclude_file_dir
  - This adds include_file_dir to the include paths of both <> and "" includes
- Other useful symbols
  - ![](media/image11.png)

Compiler Flags
- [Figure: compiler flags]

Linker Flags
- ![](media/image13.png)

Creating Libraries

Libraries
- Libraries are reusable collections of precompiled functions and ancillary data
- A large part of software development involves designing the library API (application programming interface)
- The API is the interface for the user of your library
- It is made up of all the public components of your library: functions, constants, custom types, etc.

Library Design
- The programmer decides what functionality goes into the library components
  - What each function does
  - What arguments it takes and what it returns
  - How it handles errors
  - How to name it so its role is clear
- Note that in C we create libraries of functions, but in OOP languages we would be creating libraries of classes

Library Design: Public Aspects
- Since the API is the public interface for your library, designing a good API is important
- Deciding what makes up the API is similar to deciding on the public methods of a class

Library Design: Private Aspects
- This is done through the concept of information hiding
- The purpose is to hide implementation details so the user of the library does not mess things up
  - E.g. why let the user manipulate the next pointer of a list struct, possibly incorrectly?
  - Instead we can create insertFront/insertBack functions for the user
- C has very limited tools for actually making library internals inaccessible

Libraries in Practice
- On Linux, system-wide libraries are normally stored in /lib and /usr/lib
- Standard headers are normally located in /usr/include
- For the math library
  - The library file is libm.so and the associated header file is math.h
  - Notice they do not necessarily have the same name

Static and Dynamic Libraries
- Static libraries: when a program is built, the library contents are copied into the executable and stored with it
  - A copy of the library is stored in the executable file.
  - Compile-time linking
- Dynamic libraries: when the program is executed, the library is linked to the executable in memory.
  - The library copy is not stored in the executable file
  - Run-time linking
  - Also known as shared libraries in the Unix/Linux world and dynamically linked libraries (DLLs) in the Windows world.
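The information-hiding idea from the library design notes above (offer insertFront/insertBack instead of letting the user touch the next pointer) might look roughly like the sketch below. Only the insertBack signature comes from these notes; the Node and List layouts, createNode, clearList and the insertFront signature are assumptions made for illustration. Everything is shown in one file for brevity, whereas a real library would put only the public prototypes in its header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Internal node type (assumed layout). Users should never edit next. */
typedef struct Node {
    void *data;
    struct Node *next;
} Node;

/* List handle (assumed layout). Start from {NULL, NULL, 0}. */
typedef struct List {
    Node *head;
    Node *tail;
    size_t length;
} List;

/* Internal helper: static keeps it out of the library's public interface. */
static Node *createNode(void *data)
{
    Node *node = malloc(sizeof *node);
    if (node != NULL) {
        node->data = data;
        node->next = NULL;
    }
    return node;
}

/* Public API: add an element at the front of the list. */
void insertFront(List *list, void *toBeAdded)
{
    assert(list != NULL);
    Node *node = createNode(toBeAdded);
    if (node == NULL) {
        return; /* allocation failed: the list is left unchanged */
    }
    node->next = list->head;
    list->head = node;
    if (list->tail == NULL) {
        list->tail = node;
    }
    list->length++;
}

/* Public API: add an element at the back of the list. */
void insertBack(List *list, void *toBeAdded)
{
    assert(list != NULL);
    Node *node = createNode(toBeAdded);
    if (node == NULL) {
        return; /* allocation failed: the list is left unchanged */
    }
    if (list->tail == NULL) {
        list->head = node;
    } else {
        list->tail->next = node;
    }
    list->tail = node;
    list->length++;
}

/* Public API: free every node. The stored data stays owned by the caller. */
void clearList(List *list)
{
    assert(list != NULL);
    Node *current = list->head;
    while (current != NULL) {
        Node *next = current->next;
        free(current);
        current = next;
    }
    list->head = NULL;
    list->tail = NULL;
    list->length = 0;
}
```

Because all pointer updates happen inside the three public functions, a caller can build and tear down a list without ever writing to a next pointer, which is the point of the design.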
- Static libs
  - Pros: one self-contained executable; more beginner friendly
  - Cons: the executable can be very large, and it is inflexible because the library can't be updated without rebuilding
- Dynamic libs
  - Pros: flexible, as the executable can be linked to different versions of the library as long as the library API is the same; the executable is not bloated
  - Cons: in some applications a single self-contained executable might be more convenient; they require more experience to use
- Dynamic libraries tend to be the default in real-world applications

Naming Libraries
- All library names begin with lib
- Static libraries end with .a
- Dynamic libraries end with .so (on Linux) or .dll (dynamically linked library, on Windows)
  - Can be .dylib or .so on macOS
- Shared objects can have version numbers after the .so:
  - libm.so.3

Libraries and Symbolic Links
- Symbolic links may be used to create multiple names for the same library
- For example, liba.so.3.2 may have links liba.so.3 and liba.so which point to it
- The library itself is called "a"

Creating a Dynamic (Shared) Library
- Easy to do with gcc
- Must enable creation of position-independent code (PIC).
  - This means the code works no matter where in memory it is placed
  - The location of the library in memory will vary from program to program
- Requires the appropriate compiler/linker arguments

Using a Library
- ![](media/image15.png)

Memory and Valgrind

Allocating Memory in a Function
- In order to modify something in a function we must pass its address, i.e. a pointer to that thing
- If we want to modify a value in a function, we must pass it by reference
  - Otherwise we will only modify a copy of the value
- If we want to allocate memory to a pointer, we can do it in a function
  - The value of a pointer variable p is a memory address
  - It can initially be NULL
  - To change p itself, the function must receive the address of p (a pointer to a pointer)

Modifying a Pointer in a Function
- [Figure: program screenshot]

Valgrind
- Valgrind is a memory debugging tool that can check for memory leaks and diagnose various other memory errors
- Note that to get useful reports from valgrind (file names and line numbers) you need to compile your program with the -g flag

Valgrind and Memory Leaks
- Memory blocks are marked by valgrind as one of four types
  - Definitely lost: means your program is leaking memory
  - Indirectly lost: means your program is leaking memory in a pointer-based structure (e.g. if the root node of a binary tree is definitely lost, the children will be indirectly lost)
    - Note that if you fix the definitely lost leaks, the indirectly lost leaks should go away
  - Possibly lost: means your program is leaking memory, unless you're doing unusual things with pointers that could cause them to point into the middle of an allocated block
  - Still reachable: the memory was not freed, but a pointer to it still existed when the program exited
- In addition to leaks, valgrind can detect memory errors such as:
  - Using an uninitialized value
  - Writing into memory that was not allocated

Advanced C Programming

Scope, Access Control, Storage Class
- Most programming languages separate the concepts of scope, access control and storage class to some extent
- Variable scope is the region over which you can access a variable by name
  - In other words, scope defines where a variable is visible
- Storage class determines how long the symbol stays alive
- Access control determines who can access a symbol that is visible to everyone

Scope vs Access Control
- In Java there are 4 access levels for methods and class/instance variables
  - Private, protected, package level and public
- A class variable that has class-wide scope is visible to all classes/objects within a program, but we can make it inaccessible to various other classes through the use of access control

Scope in C
- C has no access control, but we do have some control over who can see a specific variable
- There are four types of scope
  - Program scope
  - File scope
  - Function scope
  - Block scope

Scoping Principle
- Always define a symbol in the narrowest scope that works
- This is to prevent errors and, to a lesser extent, for security reasons

Program Scope
- The variable is accessible by all source files that make up the executable
- In C: all non-static functions and global (extern) variables

Program Symbol Concepts
- Definition: where the named thing lives
  - Actual memory location of data or function
- Reference: some use of the thing by name
  - Load/store, call: must be resolved to a location
- Declaration: tells the compiler about the name
  - The compiler can verify that references are correct

External Symbols
- Program scope symbols are passed to the linker in a .o file
  - External definition: "extdef"
  - External reference: "extref"
- In the linked executable, for each external symbol:
  - Exactly one extdef, or we get an error:
    - Undefined external, multiply defined external
  - Any number of extrefs
    - Substituted with the final memory address of the symbol

Externals
- Having program scope is a common requirement in assembly
- It allows a big program to be linked together out of small modules
- Each language has its own convention for extdef and extref

Using Program Scope in C: Function
- extdef: void insertBack(List* list, void* toBeAdded){...}
  - The definition appears in only one .c file
- declaration: void insertBack(List* list, void* toBeAdded);
  - The prototype declaration is included in many .c files via the header files
- extref: insertBack(list, data); (a call)
  - Definitely happens in multiple files

Using Program Scope in C: Variable
- extdef: FILE* inputfile;
  - The definition appears in only one .c file, outside any function
  - Can initialize: type varname = initial_value
- declaration: extern FILE* inputfile;
  - The declaration can appear anywhere in a file
- extref: fclose(inputfile)
  - Appears anywhere we use the variable

Using Program Scope
- You have to decide when to use program scope
- In general, variables should never be globally accessible by the end user
- Program scope functions should be part of the program's public interface
- Try to avoid program scope for internal functions if possible

File Scope
- A variable or a function is accessible from its declaration (definition) point to the end of the file
- In C, static things are global within a file but not within the whole program
- If a variable is defined outside any function
  - It would normally have program scope
  - The static keyword keeps the definition from being passed to the linker: it doesn't become external

Using File Scope
- File scope in C sits between private and package level in Java
- File scope is perfect for internal-use-only functions
- It can be used for variables, but you really should avoid variables that are global to a file
  - Keep variables local to each function body (local scope) and pass them as function arguments instead.
- Global variables (file or program scope) are usually a bad idea, as they lead to hard-to-find errors

Function Scope
- Accessible throughout a function
- In C only goto labels have function scope; therefore you will never see them
- "Throughout" means you can jump ahead to a label that appears later in the function

Block (Local) Scope
- The variable is accessible from its declaration point to the end of the block in which it was declared.
- Note that a block is anything between curly braces
- Includes
  - Function parameters
  - Local variables declared in a function
  - Loop counters declared in for loops
  - Variables declared within a loop or branching statement body

What Happens? Don't Do This with Local Variables
- Avoid using the same variable names in overlapping scopes (shadowing)

Scope vs Storage Class
- Storage class applies to where and how long a variable is kept
- Different from scope
- Typically variable scope and storage class are unrelated to each other
- C is unusual in that static affects both a variable's storage and its scope

Automatic Storage
- Associated with functions
- Local variables inside a function or inside any other block
- A fresh temporary copy is created on the stack every time the function is called
  - The copy can be initialized
  - The copy goes away when the function returns to its caller
- Enables recursion

Automatic Storage Warning
- Never return pointers to automatic variables
- This is because the automatic variable goes away when the scope ends or the function returns

Static Storage
- Means there is only one instance of the variable in the executable program
- Applies to
  - Program scope variables
  - Static file scope variables
  - Static local variables
- If you add the static keyword to a global variable, it is no longer visible program-wide, as it is restricted to file scope
- The static keyword changes local variables from automatic to static storage class
- Initialization takes effect once, when the program starts.

Global/External Storage
- Program scope variables that exist as long as the program is running

Dynamic Storage
- Third class of storage, contrasted with static and automatic
- Created (temporarily) on the heap via malloc, calloc and realloc
- Must be freed via free
- The address (pointer) has to go in some variable
  - That variable has scope and storage class itself
- If dynamic storage is not explicitly freed, it will exist until the program terminates

Problems with Precedence
- ![](media/image17.png)
- ![](media/image19.png)

C Odds and Ends: Types and Typedefs, Enumerated Types, Searching with Predicates

Typedefs
- The typedef keyword can be used to create aliases or shorthand notations for existing C types
- Typedefs don't really create new types; rather, they give existing types a new name

Typedef Examples
- Consider the following definition:
  - struct Vec2 {
- This defines a new type, a structure
- We refer to this type as struct Vec2 in our code
- ![](media/image21.png)

Types
- They help to make code more readable
- Offload some of the error checking onto the compiler
- Build good programming habits that will be useful when using languages with stronger type systems
- Using types correctly also helps you to avoid precision issues

Rules for Using Types
- Always pick the appropriate type for the job and use the narrowest type
  - For example, use an int instead of, say, a float if you need to store counts, lengths and other inherently integer values
  - Use the C bool type instead of int if your value is always true or false
- Outside of C, this will also let you use a compiler to keep you from making accidental mistakes
  - For example, languages with strict type rules
    - Will not allow you to assign an int to a Boolean variable
    - Will not allow you to assign a signed int to an unsigned int
- [Figure: program screenshot]
- ![](media/image23.png)

Analysis
- Efficiency
  - Solution 1 makes numVec calls to malloc and numVec calls to free; solution 2 uses one malloc and one free
- Readability
  - Less manual memory management usually means fewer errors and less time spent debugging
- Maintainability and information hiding
  - If we want to change the storage type from float to double
    - In solution 1 we would have to change multiple lines of code
    - In solution 2 we only have to change one line

Enumerations
- Enumerations: an enumeration, also called an enum or enumerated type, is a data type consisting of a finite set of named values.
- The values are internally represented as integers, at least in C and closely related languages
- When declaring an enumeration:
  - enum CardSuit {CLUBS, DIAMONDS, HEARTS, SPADES};
  - Each of the four elements is assigned a unique integer value
  - By default the first element, CLUBS, is assigned the value 0 and each subsequent element is given the next integer value.
- It is possible to override the default value assignments using explicit assignments at definition time
  - Example: enum CardSuit {CLUBS=1, DIAMONDS=2, HEARTS=4, SPADES=8};
  - Note that we don't have to specify all the values: if some are omitted, the compiler fills in the blanks using the incremental numbering scheme
- Enumerations are used to represent simple symbolic information, such as classifying the type of an object
- Because enumerants are represented by integers, they can be used in arithmetic expressions just like other integer data types.
- To declare an enum variable, use the following syntax:
  - enum CardSuit mysuit = CLUBS;
  - The leading enum keyword, followed by the name of the enumeration, forms a complete type specification

Enumerations and Typedefs
- Typedefs also work with enumerations
  - typedef enum Direction {NORTH, EAST, SOUTH, WEST} Direction;

Enum Guidelines
- Use them for types with a small number of discrete values
  - E.g. days, months, cardinal directions, error codes
- Advantages:
  - They make your code more readable: using named labels instead of magic constants is a big win
  - They allow for more elegant code (enum values can be used as array indices)

Predicates
- In discrete math, predicates are Boolean-valued functions
  - Given a logical proposition, they return true or false
- The notion of a predicate can be extended to concrete functions implemented in a programming language.
  - A predicate would be a function that accepts one or more arguments and returns a Boolean value.
- You can think of predicates as condition testers
  - If a condition that depends on the arguments of the function holds, the function returns true
  - Otherwise it returns false
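As a rough illustration of how typedefs, enumerations and predicates fit together when searching, the sketch below passes a predicate to a search function through a function pointer. The CardSuit values come from the notes above; the Card struct, its rank field, the CardPredicate type and the functions isRedSuit, isFaceCard and findFirst are hypothetical names introduced only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* typedef lets us write CardSuit instead of enum CardSuit. */
typedef enum CardSuit { CLUBS, DIAMONDS, HEARTS, SPADES } CardSuit;

/* Hypothetical card type: rank runs from 1 (ace) to 13 (king). */
typedef struct Card {
    CardSuit suit;
    int rank;
} Card;

/* A predicate: takes a card and answers true or false. */
typedef bool (*CardPredicate)(const Card *card);

/* Predicate: true for diamonds and hearts. */
bool isRedSuit(const Card *card)
{
    return card->suit == DIAMONDS || card->suit == HEARTS;
}

/* Predicate: true for jack (11), queen (12) and king (13). */
bool isFaceCard(const Card *card)
{
    return card->rank >= 11 && card->rank <= 13;
}

/* Returns the first card for which the predicate holds, or NULL if none does. */
const Card *findFirst(const Card *cards, size_t count, CardPredicate matches)
{
    assert(matches != NULL);
    assert(count == 0 || cards != NULL);

    for (size_t i = 0; i < count; i++) {
        if (matches(&cards[i])) {
            return &cards[i];
        }
    }
    return NULL;
}
```

The search loop never needs to change: a new search condition only requires a new predicate function, for example findFirst(hand, count, isFaceCard) for a hypothetical array hand.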
