LabSO_handout.pdf Notes from the slides 2022/2023
Università di Torino
2023
Enrico Bini
Summary
These notes cover Unix basics, C programming fundamentals, and system calls. Content includes topics like shell usage, file systems, variables, memory management, operators, debugging, and inter-process communication through pipes and message queues. The document is likely course notes from an academic setting in Computer Science or a similar field.
Laboratorio C+Unix: Notes from the slides
Academic year 2022/23
Enrico Bini

Contents
1 Unix: introduction and basic usage
    1.1 Shell
    1.2 File system
    1.3 Accounts
2 C: minimal program
    2.1 Overview
    2.2 Variables and memory
    2.3 Output: basics
3 C: arrays and strings
4 C: operators and control
    4.1 Operators
    4.2 Control constructs
5 C: pointers to memory
    5.1 scanf, copying memory
6 Debugging by gdb
7 Understanding all 2519 options of gcc
    7.1 Pre-processing
    7.2 Compiling
    7.3 Assembling
    7.4 Linking
8 C: types
    8.1 Integers
    8.2 “Boolean”
    8.3 Floating-point numbers
    8.4 Type conversion
9 C: functions
10 C: scope of variables
11 C: storage classes
    11.1 Variables on the BSS
    11.2 Variables on the stack
    11.3 Variables on the heap
    11.4 Variables in memory: comparison
    11.5 Variables stored in processor registers
12 C: more on operators
13 C: composite data types
    13.1 Data structures: struct
    13.2 “Overlapping data structures”: union
    13.3 Enumerating constants: enum
    13.4 Defining new data types: typedef
    13.5 Dynamic lists
14 C: more on pointers
15 Modular programming and libraries in C
    15.1 Modules: overview
    15.2 Modules in C
    15.3 Libraries
16 The make utility
17 Files
    17.1 Streams
    17.2 File descriptors
18 Error handling
19 Environment variables
20 Process control
    20.1 Process creation
    20.2 Waiting for termination of child processes
21 Replacing the image of a process
22 Signals
    22.1 Sending signals
    22.2 Handling signals
    22.3 Lifecycle of signals: delivering, masking, merging
    22.4 Getting a signal when in waiting state
23 Pipes
    23.1 Recalling file descriptors
    23.2 Pipes in C
    23.3 Redirecting input/output via pipes
24 FIFOs
25 SysV: IPC
26 SysV: message queues
27 SysV: semaphores
28 SysV: shared memory

1 Unix: introduction and basic usage

Operating System (OS)
An operating system is the software interface between the user and the hardware of a system.
We say that the operating system manages the available resources.
  – Whether your operating system is Unix-like (Linux), Android, Windows, or iOS, everything you do as a user or programmer interacts with the hardware in some way.
The components that make up a Unix-like operating system are:
  1. device drivers: make the hardware work properly (coded in C and assembly),
  2. the kernel: CPU scheduling, memory management, etc. (coded in C),
  3. the shell: allows the interaction with the OS,
  4. the file system: organizes all data present in the system,
  5. applications: used by the user (coded in fancy languages: Java, Python, or others).

1.1 Shell

Shell
shell (Italian: “guscio”) versus kernel (Italian: “nucleo”).
The shell is a command line interpreter that enables the user to access the services offered by the kernel.
The shell is used almost exclusively via the command line, a text-based mechanism by which the user interacts with the system.
Terminals (the “black window”) allow the user to enter shell commands.
When entering commands in a terminal, the TAB key helps to complete them.
The real hacker uses the terminal only. The mouse and the graphical interfaces are for kids: is it more efficient to use 10 fingers over a keyboard, or one finger over a strange device?
Exercise: open a terminal and try
  cat /etc/shells    (cat shows the content of a file)
  echo $SHELL        (SHELL is an environment variable)

System calls
System calls (“syscalls” for short) are the “access point” to the kernel: the way programs ask the kernel for any service.
Examples of services asked of the kernel:
  – reading a file from the disk,
  – reading the keyboard,
  – printing on the screen,
  – reading from the network card,
  – ...
Syscalls are identified by a unique number.
strace <command> shows all the system calls made when <command> is invoked. Example:
  echo ciao
  strace echo ciao
strace -wC also shows a summary of the invoked system calls.

Help on commands
Unix manual pages (or man pages) are the best way to learn about any given command.
Man pages are invoked by “man <command>”.
  – Space to scroll down, b to scroll up, q to quit.
Man pages are divided into sections:

  Sec.  Description
  1     General commands
  2     System calls
  3     Library functions, covering in particular the C standard library
  4     Special files (usually devices, those found in /dev) and drivers
  5     File formats and conventions
  6     Games and screensavers
  7     Miscellanea
  8     System administration commands and daemons

If the same entry appears in more than one section, the page from the lowest-numbered section is returned.
Try: man printf, man 1 printf, man 3 printf

1.2 File system

File system
The file system enables the user to view, organize, store, and interact with all data available on the system.
Files have names: the file extension does not imply anything about the content, it is just part of the name.
Files are arranged in a tree structure.
Directories are special files which may contain other files.
The root of the tree is “/”.
The full pathname of a file is the list of all directories from the root “/” down to the directory of the file.
“.” is the current directory.
“..” is the parent directory.
“~” is the home directory of the user.
Files may be links to other files: the command ln creates links.

File types
Files are an abstraction of anything that can be viewed as a sequence of bytes: the disk is a (special) file.
More in general, there are 7 types of files:
  1. (marked by “-” in ls -l) regular file: contains data, stored on disk
  2. (marked by “d” in ls -l) directory: contains names of other files
  3. (marked by “c” in ls -l) character special file: used to read/write devices byte by byte (try stat /dev/urandom)
  4. (marked by “b” in ls -l) block special file: used to read/write devices in blocks (disks). Try stat /dev/nvme0n1
  5. (marked by “p” in ls -l) FIFO: a special file used for interprocess communication (IPC)
  6. (marked by “s” in ls -l) socket: used for network communication
  7. (marked by “l” in ls -l) symbolic link: it just points to another file
Try stat <file>, for example stat /dev/nvme0n1, to view the status and type of any file.
The disk is a file: cat /dev/nvme0n1 shows its content (this typically requires root privileges).

  Directory  Content
  /bin       common programs, executables (often a subdirectory of /usr or /usr/local)
  /boot      the startup files and the kernel
  /etc       configuration files
  /home      parent of the home directories of common users
  /tmp       place for temporary files, writable by everybody, cleaned upon reboot
  /root      home directory of the administrator
  /lib       library files
  /proc      information on processes and resources (only on some Unix-like machines)
  /dev       references to special files (disks, terminals, etc.)
The content of directories follows the “Filesystem Hierarchy Standard” https://refspecs.linuxbase.org/fhs.shtml so that programmers can expect to find things in the right place.

Common commands
The most important key is TAB: it helps auto-complete.

  cd      Change directory: moves you to the directory identified
  cat     Concatenate: displays a file
  cp      Copy: copies one file/directory to the specified location
  du      Disk usage
  echo    Display a line of text
  grep    Print lines matching a pattern
  ls      List: shows the contents of the directory specified
  mkdir   Make directory: creates the specified directory
  more    Browses through a file (has an advanced version: less)
  mv      Move: moves or renames a file/directory
  pwd     Shows the current directory the user is in
  rm      Remove: removes a file
  sort    Sort lines of text
  tail    Shows the end of a file
  touch   Creates a blank file or modifies an existing file’s attributes

Input/Output redirection
To work properly, every command uses a source of input and a destination for output.
Unless specified differently:
  – the input is read from the keyboard
  – the output is written to the terminal
Unix allows the redirection of the input, the output, or both:
  – redirection of the input from a file (with “<”)
  – redirection of the output to a file (with “>”)
  – redirection of the output of command A as input to command B (“pipe” with “|”)
Examples:
  – ls > my_list
  – wc < my_list
  – ls -latr | less
  – du -a | sort -n

Metacharacters
Wildcards are special characters that can be used to match multiple files at the same time:
  – ? matches any one character
  – * matches any sequence of characters in a filename
  – [ ] matches one of the characters included inside the [ ] symbols
Examples:
  – ls *.tex
  – ls *.[tl]*
  – ls *t*
  – ls ?t*

1.3 Accounts

Accounts
Unix is a multi-user system: more than one user can use “simultaneously” the available resources (computing capacity, memory, etc.).
  – Once upon a time there were single-user operating systems such as MS-DOS.
  – In contexts where the resources must be used by a single application, multi-user is not needed (example: embedded systems).
Accounts are used to distinguish between different types of usage of resources.
There are three primary types of accounts on a Unix system:
  – the root user (or superuser) account,
  – system accounts, and
  – user accounts.

All accounts
cat /etc/passwd to see all accounts. Each row has seven colon-separated (“:”) fields:
  1. login name
  2. encrypted password (today passwords are in /etc/shadow, accessible only with root privileges)
  3. numeric user ID
  4. numeric group ID
  5. a comment field (used to store the name of the user or the name of the service associated with a system account)
  6. the home directory of the account
  7. the default shell
The command usermod [OPTIONS] <login> changes any of the fields above, and more.
usermod -c "New Name" bini changes the comment field into “New Name”.

Root accounts
The user of the root account has complete control of the system: they can run commands that completely destroy the software system as well as some hardware components.
The root user (also called root) can do absolutely anything on the system, with no restrictions on files that can be accessed, removed, and modified.
The Unix methodology assumes that root users know what they want to do, so if they issue a command that will completely destroy the system, Unix allows it.
People generally use root for only the most important tasks, and then use it only for the time required and very cautiously.
“With great power comes great responsibility”
The command sudo allows running a command as another user (even root, if allowed).
Example: packages are installed by sudo apt install <package>
The command su allows becoming another user (even root, if allowed).

System accounts
System accounts are specialized accounts dedicated to specific functions (see cat /etc/passwd):
  – the “mail” account is used to manage email
  – the “sshd” account handles the SSH server
  – web servers run as a dedicated account
  – ...
They assist in running services or programs that the users require.
They are needed because running some services (mail, SSH, ...) often requires some root privileges. Hence:
  – running these services with user privileges is not possible
  – running these services with root privileges is too risky
  – that’s why system accounts are useful
They are the main entry point for attackers: they are reachable by users, yet hold some root privileges.
Services running with system accounts must be super safe!

User accounts
User accounts are needed to allow users to run applications and use system resources. They are “protected” by passwords.
Most common passwords: 123456 qwerty password 987654321 mynoob 666666 18atcskd2w 1q2w3e4r zaq1zaq1 zxcvbn
Some users may be fully trusted, and the OS would like to give them the possibility to do anything.
Some others may be authorized to do only a subset of the possible actions.
How are privileges managed?

Groups
Users with similar privileges are assigned to the same group.
The administrator (root) can then manage all the users belonging to the group by simply assigning privileges to the group.
An account may belong to more than one group, if needed.
cat /etc/group to view the list of groups. Each row has:
  1. group name
  2. group password (very rarely used. From man gpasswd: “Group passwords are an inherent security problem since more than one person is permitted to know the password.”)
  3. group ID
  4. list of users belonging to the group
Example: cat /etc/group | grep sudo shows all users belonging to the sudo group (who can launch sudo).
groups bini shows the groups a user belongs to.

File ownership, permission
Each “file” (which may be the disk, the terminal, and other strange things) has
  – an owner and
  – a group.
Permissions are divided into three subsets:
  – u permissions of the user (owner)
  – g permissions of the users in the group
  – o permissions of all others
Permissions are of three types:
  – read (r) if the file can be read
  – write (w) if the file can be written
  – execute (x) if the file can be executed (“search” permission if it is a directory)
chown to change the owner of a file
chgrp to change the group of a file
chmod to change the permissions of a file
Example: chmod u+rw <file> adds read/write for the owner
Example: chmod o-r <file> removes read for the others

File permission, octal representation
File permissions are often represented in octal (base 8):

  user     group    other
  r w x    r w x    r w x
  1 1 1    1 1 0    1 0 0    = 764 (octal)

Equivalent commands:
  – chmod u=rwx,g=rw,o=r <file>
  – chmod 764 <file>
Examples:
  – ls -l to view permissions (try it in /dev/)
  – chmod to change the permissions of a file
  – chown to change the owner and group of a file

2 C: minimal program

2.1 Overview

C vs. Java
Founding principles of C programming:
  1. Trust the programmer.
  2. Don’t prevent the programmer from doing what needs to be done.
  3. Keep the language small and simple.
  4. Make it fast, even at the risk of portability.
Efficiency is favoured over abstraction (no objects or fancy stuff).
C is the standard language for device drivers and kernels. It is widely used in embedded systems (all contexts where high efficiency is a must).

How to write a C program
  1. Verify the presence of the C compiler gcc by gcc -v
     If not installed, then sudo apt-get install gcc
  2. Edit a program with a text editor (nano, vim, emacs, gedit on GNOME, kate on KDE, ...)
     You should know what the editor writes into the saved file.
     Sophisticated development environments “help” you to write the code. Sometimes they take decisions for you and you don’t know about it.
  3. Compile the program by gcc <file>.c
     If there are no compilation errors, execute the program.
     If there are errors, try to understand them, fix them and recompile.

My first C program
  1. Create and edit the following program: hello.c
  2. Compile it by gcc hello.c
     By default the executable is a.out
     Launch it by ./a.out (why not just a.out?)
Usually we want the executable to have a name similar to the program. We do it with the “-o” option:
  gcc hello.c -o hello

Basic structure of a C program
  1. Pre-processor directives (#include <...>)
     #include is used to include the headers of libraries
     in the example, #include <stdio.h> is needed to use the function printf()
  2. Declaration of types (not in hello.c)
  3. Declaration of global variables (not in hello.c)
  4. Declaration of functions (not present in hello.c)
  5. main function: the first function invoked at execution
  6. Declaration of local variable v (array of characters)
  7. Body of function main

Coding style
C is powerful. C programs must be clean and understandable.
It is highly recommended to adopt a coding style.
Suggested: the Linux kernel coding style (may be useful if one day you’ll write kernel code) https://www.kernel.org/doc/html/v4.10/process/coding-style.html
In short:
  – indentation made with TAB (8 characters long).
    ∗ TAB is one byte only (ASCII character number 9). Not 8 spaces (8 bytes!!). C programmers like to be efficient and not to waste bytes, energy, ...
  – no new line before “{” (unless it is the first brace of a function)
  – new line after “}”, unless there is a continuation of the previous statement as in “} else {”
  – do your best to stay within 80 columns
Check the examples at the website.

The opposite of coding style: Obfuscated C Code
Some people enjoy writing “obfuscated code”. An example by Yusuke Endoh from the 2018 International Obfuscated C Code Contest:
  download 2018.c and smily.txt
  compile by gcc 2018.c -o 2018
  run by ./2018 < smily.txt > smily.gif
  open smily.gif with any image viewer

2.2 Variables and memory

What is a variable?
In C, a view closer to the implementation is taken.
In C, the best way to think of a variable is as a portion of memory.
In some sense, variables “do not exist”. Only the memory exists! “Variables” are just a convenient way to refer to pieces of memory.
In C, we care only about:
  – the amount of memory taken by a variable
  – where in memory the variable is allocated
  – the content of the variable: the bytes in memory
For this memory-centric view, C has operators for
  – “give me the memory address of this variable” and
  – “give me the variable at this memory address”
The type of a variable describes how to interpret the bytes in memory:
  – two’s complement integers,
  – floating-point, ...
C is a weakly typed language: no strict control is made on the types of operands.

Memory: the abstraction
In C the memory is abstracted as a loooong sequence of bytes.
Memory is byte-addressable: at every memory address, there is only one byte.

  address            content
  ...                ...
  7FFF0040671A8107   A7
  7FFF0040671A8108   E8
  7FFF0040671A8109   03
  7FFF0040671A810A   00
  7FFF0040671A810B   00
  7FFF0040671A810C   50
  ...                ...
Addresses in memory are represented by a machine-dependent number of bytes: in the slide, by 8 bytes (16 hex digits).
  – with 8-byte-long addresses, it is possible to address up to 2^(8×8) = 2^64 ≈ 1.8 × 10^19 bytes (16 EiB, about 17 billion GiB), roughly a billion times larger than the current size of large RAMs
The address space is not used only to store data:
  – files or I/O devices (video) may be mapped onto a portion of the address space

Memory: endianness
How to store variables needing more than one byte?
  – by using contiguous memory locations
Example:
  – an int variable var is represented over 4 bytes
  – if stored at address 7FFF0040671A8108, then it occupies the cells at:
    1. 7FFF0040671A8108
    2. 7FFF0040671A8109
    3. 7FFF0040671A810A
    4. 7FFF0040671A810B
  – its value is var = 1000 (1000 decimal = the 4 bytes 00 00 03 E8 in hexadecimal)

little-endian: starts from the least significant byte

  address            content
  7FFF0040671A8108   E8   \
  7FFF0040671A8109   03    | var
  7FFF0040671A810A   00    |
  7FFF0040671A810B   00   /

big-endian: starts from the most significant byte

  address            content
  7FFF0040671A8108   00   \
  7FFF0040671A8109   00    | var
  7FFF0040671A810A   03    |
  7FFF0040671A810B   E8   /

x86 processors: multi-byte data is stored as little-endian.
test-endian.c (the code is for experts, still questions are welcome)

Variables in C
  1. The declaration of a variable informs the compiler of the size of the variable and its type (example: int a; informs that a is an integer).
  2. The identifier is the “name” of the variable. Identifiers may be composed of alphanumeric characters and the underscore “_”. They cannot start with a digit and cannot be a reserved C keyword (for, while, etc.).
  3. An optional initialization:
       int a;
       int b = 91;
       int c = my_function();
  4. The value of the variable is the interpretation of the bytes in memory according to the variable type.
  5. A variable is a portion of memory. The amount of memory used depends on the type of the variable.
A variable is never empty: it always has the value of the bytes in the memory.
Do not assume that the initial content of a variable is zero (or anything else).
Always initialize it.

Variable types
Possible types of C variables are:
  – integer types of increasing length: (char), (short), (int), (long)
  – floating-point types: (float), (double)
  – addresses of memory, aka pointers: (char *), (int *), (double *), (void *)
  – no other standard C types (example: no boolean)
The size (in bytes) of these types is highly machine dependent.
The operator sizeof() returns the number of bytes of a type or of a variable:
  – sizeof(int), number of bytes of any variable of type int
  – sizeof(a), number of bytes of the variable a
Check the size of the types of variables on your machine: test-sizeof.c

2.3 Output: basics

Printing to the terminal
The classic function to print is printf.
It needs the directive #include <stdio.h> to be used.
printf can print strings, the value of variables, and special characters.
The format is printf(<format string>, <list of expressions>)
test-printf.c
For each expression in the list, the format string must specify how this expression should be printed. Format specifiers must be as many as the expressions.

  %d   print integer, base 10
  %o   print integer, base 8
  %X   print integer, base 16
  %e   print floating point, notation 1.23e1
  %f   print floating point, notation 12.3
  %s   string of characters
  %c   the ASCII character

man 3 printf for full reference

Printing: escape character
The format string may contain escape sequences to print special characters:

  \n           new line
  \t           tab
  \"           character "
  \'           character '
  \\           character \
  %%           character %
  \uXXXX       Unicode character coded by the 4 hex digits XXXX
  \UXXXXXXXX   Unicode character coded by the 8 hex digits XXXXXXXX

test-printf.c

3 C: arrays and strings

Arrays
An array is not a class (as in Java).
An array is a contiguous area of memory allocated to several variables of the same type.
An array is declared by <type> <name>[<size>]; it has size sizeof(<name>) = sizeof(<type>) * <size>
Example: int v[10]; declares the variable v as an array of 10 int variables. Its elements are v[0], v[1], ..., v[9] and are stored contiguously in memory.
C does NOT check array boundaries!
WARNING: v[-1] is syntactically correct

  address   content   variable
  ...       ...       ...
  0080F8    ...
  0080FC              v[-1]   outside v!!!
  008100              v[0]    \
  008104              v[1]     | v
  ...       ...       ...      |
  008124              v[9]    /
  008128              v[10]   outside v!!!
  00812C    ...

Arrays: length
The length of an array is not saved in the data structure.
  – Do not ever try to invoke the “method” length() on an “array object”
  – “methods” and “objects” do not exist in C
The programmer must record the length of the array in some way:
  – by storing a special value terminating the useful content (as in strings, which are terminated by the byte 0)
  – by recording the length in another (additional) variable
Still, the following constant expression is useful to compute the length of an array:
  int v[10], len;
  len = sizeof(v) / sizeof(v[0]);

Strings: arrays of non-zero bytes terminated by 0
The String “object” or “class” does not exist in C.
  – (again, “object” and “class” do not exist at all in C)
In C, the term “string” is used to denote
  1. an array of char, as declared by: char s[<size>];
  2. the bytes of such an array are interpreted as ASCII codes of characters
  3. the byte 0 is written in s after the last character, to terminate the string
Constant strings are enclosed in double quotes:
  "constant"   a constant string
  'wrong'      not a string: single quotes enclose a single character
  'A'          a single character
  "A"          a string holding the character A followed by the terminating byte 0
A string may be printed with the %s placeholder of printf, as in
  printf("The string s is \"%s\"\n", s);

Initialization of arrays
Arrays may be initialized by a sequence of values enclosed within { and }.
  1. The size of the array may be left unspecified and determined by the length of the initialization, as follows:
       int v[] = {1,2,3};
  2. The following declarations with initialization
       char v1[] = {'C', 'i', 'a', 'o', 0};
       char v2[] = "Ciao";
     are equivalent and create an array of 5 bytes (NOT 4 bytes), since strings are arrays of characters terminated by 0.
  3. If the size is specified and larger than the number of listed values, as in
       int v[<size>] = {3, -1, 4};
     then all following elements are set equal to zero. Hence,
       int v[<size>] = {0};
     is a convenient way to initialize all elements of the array to zero.
Strings in memory, converting string into integers
A string is stored as an array (sequence) of characters, terminated by the null character (0).

  char v[] = "258";

  address (hex)      content
  7FFF0040671A8108   32 = '2'   \
  7FFF0040671A8109   35 = '5'    | v
  7FFF0040671A810A   38 = '8'    |
  7FFF0040671A810B   00         /

  int n = 258;

  address (hex)      content
  7FFF0040671A8108   02   \
  7FFF0040671A8109   01    | n
  7FFF0040671A810A   00    |
  7FFF0040671A810B   00   /

Converting a string into an integer:
  #include <stdlib.h>
  int a;
  a = strtol(s, NULL, 10);
  – stores the value represented by the string s in the integer variable a
  – the second parameter is for advanced users

Strings in memory, converting string into floating point
Converting a string into a floating-point number:
  #include <stdlib.h>
  double a;
  a = atof("123.45");
  – stores the value represented by the string (here "123.45") in the floating-point variable a

Strings: manipulation by including string.h
By including the header with #include <string.h>, some useful functions to manipulate strings may be used.
  1. The following function returns the number of bytes in s before the terminating byte 0:
       strlen(s);
  2. To append string src to string dest:
       strcat(dest, src);
     – dest must be allocated with at least strlen(dest)+strlen(src)+1 bytes
     – otherwise (quoting from man strcat): “If dest is not large enough, program behavior is unpredictable; buffer overruns are a favorite avenue for attacking secure programs.”
  3. To append up to n bytes of src to string dest:
       strncat(dest, src, n);
     – if there is no 0 byte terminating src among its first n bytes, only the first n bytes are concatenated (followed by a terminating 0)
     – it prevents the user from writing arbitrarily long data

Reading input from the keyboard: fgets()
The function fgets(...) reads a string of characters.
#include <stdio.h> must be added on top to use it.
Syntax:
  char s[<size>];
  fgets(s, sizeof(s), stdin);
  – s is a pre-allocated array of characters (string of characters)
  – reads a string from stdin (standard input)
  – stores the string, up to sizeof(s)-1 characters, into s.
    The string cannot be sizeof(s) long because the terminating zero must be stored too
  – the string is read until EOF (end-of-file, Ctrl+D) or newline
  – if “Enter” is pressed, then the ASCII code of “new line” (=10) is also stored in s
man fgets
test-read.c, try with input from file

4 C: operators and control

4.1 Operators

Operators with conditions
Comparison operators:
  – == “equal to” (WARNING: not =)
  – != “different from”
  – <, >, <=, >=
Logical operators:
  – !, logic NOT
  – &&, logic AND
  – ||, logic OR
The boolean type does not exist.
Example of operations among conditions:
  int cond;
  cond = x >= 3;
  cond = cond && x ...

  ...

4.2 Control constructs

if construct
  if (<cond-expr>) {
          ...
  } else {
          ...
  }
“block TRUE” is executed if <cond-expr> is not zero.
“block FALSE”, if present, is executed if <cond-expr> is zero.

while loop
  while (<cond-expr>) {
          ...
  }
The body of the loop is repeated until <cond-expr> becomes zero (which represents “false”).
If <cond-expr> is zero at the start, the loop is never executed.
If <cond-expr> is always non-zero (not necessarily 1), it loops forever:
  while (1) {
          ...
          break;
          ...
  }

do-while loop
  do {
          ...
  } while (<cond-expr>);

for loop
  for (<expr1>; <expr2>; <expr3>) {
          ...
  }
More natural for looping a known number of times.
<expr1> is evaluated once, before the first iteration of the for.
<expr2> is evaluated at the beginning of every iteration. If zero, then the for exits.
<expr3> is evaluated at the end of every iteration.
Classic example (n-times loop):
  for (i = 0; i < n; i++) {
          ...
  }

switch construct
  switch (letter) {
  case 'A':
  case 'a':
          break;
  case 'M':
  case 'm':
  case 'K':
  case 'k':
          break;
  default:
          break;
  }

5 C: pointers to memory

Pointers: declaration
All variables are represented by a sequence of bytes:
  – int, long are interpreted as integers in two’s complement
  – float, double are interpreted as floating-point numbers according to the standard IEEE 754-1985
A pointer variable is interpreted as an address in memory.
It is declared by specifying the type of the variable it points to:
  <type> * <name>;
Only the pointer is allocated, not the variable it points to!!
Example:
  int *pi1, *pi2, i, j;
declares pi1 and pi2 as pointers to integer; i and j are just integers.
Usually names of pointers contain “p” or “ptr”.

Pointers: example of usage

  memory address   content   variable   type      size
  ...
  8100             26        v          (int)     4
  ...
  93A0             8100      p          (int *)   8
  ...

  int v;
  int * p;
  v = 25;
  p = &v;
  *p += 1;
  printf("%d", v);

Operations with pointers
  int v;
  int * p;
  v = 25;
  p = &v;
  *p += 1;
“address of”: from a variable to its address in memory
  – The unary operator & can be applied to any variable
  – &v is the address in memory of the variable v
  – if v is declared by <type> v, then &v is of type (<type> *)
dereferencing: from the pointer to the variable it points to
  – The unary operator * can only be applied to a pointer (any variable p declared by <type> * p;)
  – If p is a pointer, *p is the variable pointed to by p
  – Warning: “*” is used both to declare a pointer and to dereference it
Can we write &p?

Arrays and pointers
The array “variable” is a constant pointer to the first cell of the array.
If p is declared by <type> * p;, then p[i] is the i-th element of an array of <type> starting at address p.

  int v[10], *p;
  *v = 8;
  p = v;
  p = &v[<i>];
  p[<j>] = 5;

After these statements v[0] = 8 and v[<i>+<j>] = 5, while the other elements of v still hold unspecified values (“?”).
The difference between a pointer p and an array v is that
  1. the name of an array is constant: it cannot be assigned a value (v = p; is an error)
  2. at declaration time
     – int v[10] allocates a contiguous area to store 10 variables of type int
     – int * p allocates a variable p to store only a pointer
  3. sizeof(p) is the size of the address p, sizeof(v) is the size of the array v
If v is an array, the pointers v and &v[0] are the same.

Casting a pointer to another type
What is the difference between (char *), (int *), (<type> *)? They are all addresses in memory, right?
  – The type of a pointer p is necessary to properly interpret the pointed data *p, when the pointer is dereferenced
Casting a pointer changes the type of the pointed data.

  memory address   content   variable   type       size
  ...
  8100             256       v          (int)      4
  ...
  93A0             8100      p          (char *)   8
  ...
int v = 256; char * p ; p = ( char *) & v ; * p = 1; printf ( " % d " , v ) ; test-ptr-cast. c 22 Segmentation fault Segmentation fault is a run time error signaled by the operating system when the user attempts to read/write to some memory areas where the user has no right to access to int * p ; v = *p; * p = 5; The following code tries to read and write everywhere test-seg-fault. c Generic pointer (void *) C allows defining a generic pointer by void * p; p is a simple address of a memory location, however no type of the pointed variable is specified It is possible to have int v=4; void * p; p = &v; however, it is not possible to dereference it by *p. The compiler doesn’t know how to interpret the byte at the memory location pointed by p. Pointer arithmetics If p is a pointer to , (p+i) is a pointer to p[i] of the array p of elements of type The address pointed by p+i, then is p+i*dim, with dim=sizeof(*p) Example: assuming that the following variables are declared int v = {1, 9, 1000}, *q = v + 3; among the following expressions, which one is correct? For the correct ones, what is the action taken? q = v+1; v = q+1; q++; *q = *(v+1); *q = *v+1; q = *(v+2); v = (int)*((char *)q-3); q[-1] = *(((int *)&q)-9); v[-1] = *(--q); 23 address content variable......... 008100 1 008104 9 008108 03E8=100010 v 00810C 0...... 008124 0 008128 00810C q...... 5.1 scanf, copying memory scanf: a printf-like method to read the input fgets(...)+strtol(...) 
require invoking two functions and a preallocated string buffer
scanf reads a string from stdin and stores the converted input into the pointed variable
Standard example of usage
    int n;
    scanf("%i", &n);
'i': reads an integer (hex: if it starts with 0x, octal: if it starts with 0, decimal: otherwise)
The input format is similar to that of printf
The input is read until a “white-space”: space, tab, newline
do not use scanf with “%s” to read a string: you may get a segmentation fault (by writing beyond the allocated memory). fgets should be used to read strings
man scanf for more format conversions and specifications

Copying/setting memory blocks
To use the following functions, you must add the following line at the top of your program
    #include <string.h>
To copy n bytes from the memory pointed to by src to the memory pointed to by dest, we can use
    void * memcpy(void * dest, const void * src, size_t n);
– we must have access to both *src and *dest
– troubles if the two memory areas overlap (check bcopy(...) or memmove(...) in case of overlap)
To fill the first n bytes pointed to by p with the character c, use
    void * memset(void *p, int c, size_t n);
– the memory area pointed to by p must be allocated
– bzero(p,n) is the same as memset(p, 0, n)

6 Debugging by gdb
Debugging
Debugging is very helpful to find issues in programs
gdb is the debugging engine
as everything in Unix/Linux, it is very powerful and very cryptic

Launching gdb
1. As an example, we debug a code terminating with Segmentation Fault
   test-seg-fault.c
2. To properly debug a program, it must be compiled with the flags:
   -g, to add extra information to the object files
   -O0, to turn all code optimization off (-Og is a valid alternative)
       gcc -g -O0 test-seg-fault.c
3. To run a program within the debugger
       gdb <executable>
4.
Even if the executable should have some command-line options, just ignore them

gdb commands 1/2
The prompt (gdb) appears: here gdb commands may be entered
– help <command>, help on commands
– list, list the source code
  ∗ list <n>, list the source code starting from line <n> of the current file being debugged
– break <n>, insert a breakpoint at line <n> (always insert a breakpoint before running)
  ∗ break <file>:<n>, insert a breakpoint at line <n> in file <file>
– info b, show current breakpoints. Each breakpoint is identified by a numeric ID
– del <id>, delete the breakpoint number <id>
– run, run the executable (until the first breakpoint)
  ∗ run <args>, run the executable with the specified command-line arguments

gdb commands 2/2
next, execute a line of code: if it is a function call, the whole call is executed
step, execute a line of code: if it is a function call, step into the function
cont, continue the execution until the next breakpoint
print <expr>, evaluate and print <expr>
display <expr>, evaluate and print <expr> at every step
bt, “backtrace”: shows all the called functions on the stack
quit, to quit the debugger

7 Understanding all 2519 options of gcc
From C program to an executable
A C program (which is a text file) becomes an executable after a sequence of transformations
Each transformation takes a file as input and produces a file as output
gcc is called the “compiler”; however, it performs the next 4 steps (compiling is just one step)
1. Pre-processing: the pre-processor syntactically replaces pre-processor directives (starting with “#”: #include, #define, #ifdef,...)
2. Compiling: the compiler translates the C code into assembly code
3. Assembling: the assembler translates assembly instructions into machine code or object code
4.
Linking: object code is linked to the library code
(figure: source file (.c) and header files (.h) → pre-processing → headers included, macros expanded → compiling → assembly code (.s) → assembling → object file (.o), joined with library objects (.a, .o) → linking → executable)

7.1 Pre-processing
Pre-processing: overview
input: the original C program (text file) written by the programmer
output: another text file with all pre-processor directives replaced/expanded (still a C program)
The pre-processor replaces text typographically
The “instructions” of the pre-processor are called directives
Pre-processor directives start with the symbol “#”
Pre-processor directives are not indented: they always begin at the first character of the line
A brief list of directives is:
– #define, defines a “macro” to be replaced
– #include, inserts another file
– #if, #ifdef, insert/remove portions of text depending on conditions

Pre-processing: #define directive, constants
#define is used to define constants and macros. Classic example:
    #define VEC_LEN 80
    int v[VEC_LEN], i;
    for (i=0; i<VEC_LEN; i++) {
        v[i] = ...;
    }
If VEC_LEN is changed, it is sufficient to change the value in only one place and not everywhere the length of the vector is used
by convention macro names are always in UPPER CASE
macros are used to configure the code (try make menuconfig to configure the Linux sources)
a macro can be defined when invoking gcc. Example:
    gcc -D PI=3.14
is equivalent to adding at the head of the file
    #define PI 3.14
Empty constants are possible: they are removed from the source file
    #define EMPTY_CONST

Pre-processing: #define directive, macros
#define can be used to define parametric macros, which may look like functions but are not!!
    #define SQUARE(x) x*x
    a = SQUARE(2) + SQUARE(3);
what happens with
    #define SUM(x,y) x+y
    a = SUM(1,2) * SUM(1,2);
it is expanded into
    a = 1+2*1+2;
macros with parameters must always have round brackets
    #define SUM(x,y) ((x)+(y))
    a = SUM(1,2) * SUM(1,2);

Pre-processing: #define directive, long macros
#define macros must fit in one line!
long definitions are possible, but the character \ must be used to break the line
Example:
    #define EXCHANGE(type,a,b) {\
        type aux;\
        aux = a; \
        a = b; \
        b = aux; }
to be used as EXCHANGE(int, a, b);
If v is a parameter of a macro, #v is the string of v. Useful for printing a variable in debugging
    #define PRINT_INTV(v) printf("%s=%i\n",#v,v);
    PRINT_INTV(var1);

Pre-processing: #include directive
#include is used to include an external file
– if the included file is in angular brackets
      #include <...>
  the file is searched in standard paths (usually /usr/include/)
– if the included file is in double quotes
      #include "my_header.h"
  the file is first searched in the current directory (used to include user-defined headers)
#include is usually used to include header files
A header file exports some functions of a library
The C standard library, often called libc (glibc is the GNU libc), collects many useful functions
– stdio.h, functions for input/output, files, etc.
– string.h, string handling, copying blocks of memory
– math.h, mathematical functions (sin, cos, pow, etc.)
– errno.h, to test error codes set by functions
– limits.h, architecture-dependent min/max values of different types
– stdlib.h, random numbers, memory allocation, process control
– ctype.h, for testing the type of characters (upper/lower case, etc.)
Pre-processing: conditional inclusion
portions of code may be conditionally inserted by
– “#if, #else, #endif” directives
      #if <integer-const>
      ...
      #else
      ...
      #endif
– “#ifdef, #ifndef, #else, #endif” directives
      #ifdef macro
      ...
      #endif
      #ifndef macro
      ...
      #endif
conditions of #if cannot be specified by C variables!! (they must be evaluated at pre-processing time, not at run time)

Pre-processing: how to avoid multiple inclusions
It may happen that a C program includes the following header files
    #include <...>
    #include <...>
however, they both include
    #include <...>
which would give a “double definition” warning/error for many functions/variables
to prevent multiple inclusions, all header files start and end as follows (example: /usr/include/stdio.h)
    #ifndef _STDIO_H
    #define _STDIO_H
    ...
    #endif
try gedit /usr/include/stdio.h

Pre-processing: temporarily removing code
the directive #if offers a convenient way to add and remove code
this is useful for testing purposes
    #if 0
    ...
    #endif
    #if 1
    ...
    #endif

Pre-processing: pre-defined macros for debugging
To support debugging, the following macros are predefined
__FILE__ string expanded with the name of the file where the macro appears; useful with programs made of many files
__LINE__ integer of the line number where the macro appears
__DATE__ string with the date of compilation
__TIME__ string with the time of compilation
A good example of debugging code is:
    #ifdef DEBUG
    #define MY_DBG printf("File %s, line %i\n",\
                          __FILE__,\
                          __LINE__)
    #else
    #define MY_DBG
    #endif

Pre-processing: the NULL pointer macro
The macro NULL represents a pointer (address in memory) which is invalid
    #define NULL ((void *) 0)
The value of the NULL macro is zero. After int * p; p = NULL; all bits of the variable p are zero.
A NULL pointer cannot be dereferenced: it does not point to any useful memory location

Pre-processing: invoking preprocessor only
By running
    gcc -E filename
only the pre-processor is executed on filename and the output is written to the terminal (stdout)
(figure: compilation pipeline; gcc -E stops after pre-processing, once headers are included and macros expanded)
Hence, by
    gcc -E filename > after-pre-proc
the output of the pre-processor is written to after-pre-proc
test-preproc.c

Using #define macro to declare standard used
The development of C libraries and Unix is 50 years long!
Over the years, many different libraries, standards, and APIs were proposed
Feature Test Macros are a way to declare the desired standard
Examples:
    #define _GNU_SOURCE
    #define _BSD_SOURCE
    #define _POSIX_C_SOURCE
    #define __STRICT_ANSI__
man feature_test_macros or gedit /usr/include/features.h for the full description
Important: the availability of some functions may depend on these macros
– This can be seen in the man page. Example: man sigaction
These macros must appear before any #include directive
gedit /usr/include/stdio.h

Pre-processing: options
-E, stop after pre-processing and write the output to the terminal (stdout). It must be redirected to a file if it needs to be saved
-D <macro>, defines a macro
-I <dir>, search directory <dir> before the standard include directories
– useful if you want to override the standard declaration of functions

7.2 Compiling
Compiling: invoking compiler only
After the pre-processor is run, the C program (text file) is translated into a sequence of assembly instructions (still a text file)
gcc can be stopped after the pre-processor and the compilation by gcc -S ....
(figure: compilation pipeline; gcc -S stops after compiling, which produces the assembly code (.s))
by default gcc -S <file>.c saves the assembly instructions in <file>.s

Compiling: options
billions of options for compiling: man gcc
1. Cross-compiling: produce the assembly for different architectures:
   -m32, 32-bit architectures
   -marm, ARM architectures
2. Optimization of the code
   -O2, some typical optimizations (such as loop unrolling): optimizations depend very much on the architecture
   -Os, optimize the size of the object file
3. Debugging
   -g, add debugging symbols (used by the debugger gdb)
   -O0, no optimization (optimized code is hard to debug)
4. Try compiling by
       gcc -S -g -O0 test-print-char.c

Compiling: syntax to be used for the exercises/project
1. -std=c89, select the ANSI C standard (the first standardized C in 1989)
   variables are declared only at the top of the block. It is not allowed to declare variables “on the fly” as in for (int i=0; i<...
....
you can use the macros INT_MIN, INT_MAX, USHRT_MAX, etc. (defined in limits.h) for the maximum/minimum constants of all the types

Integers: constants
In C code integer constants are
1. sequences of digits without a decimal dot “.”
   – if they start with “0x”, they are interpreted in hexadecimal
   – if they start with “0”, they are interpreted in octal
   – otherwise they are interpreted as decimal
2. single characters within single quotes (as in 'a') to represent the ASCII code of that character
3. The best expression to write the ASCII code of the digit n is '0'+n
constants may also be explicitly declared as unsigned, long or both; otherwise they are int
– “345U” for unsigned
– “234L” for long
– “2367LU” for unsigned long
test-int-const.c

Integer promotion
The usage of variables shorter than int, such as char or short, may be good to save memory in memory-constrained devices (embedded systems)
Still, operations by the CPU are more conveniently performed over the “word”, which is as long as an int
char and short variables are promoted to int when they appear in expressions
test-promote-char.c
also, the effect of int-promotion and of mixing signed and unsigned integers in the same expression may generate unexpected results

8.2 “Boolean”
The type boolean does not exist
Although conditions do exist
When evaluated as a condition, a numerical expression is
    false if it is equal to zero
    true otherwise
Example of a for loop
    for (i=10; i; i--) {
    }

8.3 Floating-point numbers
Floating point: representation
Two types for floating-point representation: float, double
A floating-point number n is represented by
– one bit for the sign s of the number;
– a “biased” exponent e (the biased exponent is introduced to give a special meaning to e = 0)
– a fraction f, that is the sequence of digits after the “1.”;
Standard IEEE 754-1985
Floating-point constants are written in C with the decimal dot “.” or with the letter e (or E)
    double a;
    a = 10.0;
    a = .3;
    a = 84753933.;
    a = 918.7032E-4;
    a = 4e+12;
    a = 3.5920E12;

Floating point: imprecise arithmetic
The finite number of bits used to represent real numbers introduces an approximation error
The approximation error may even lead to the violation of basic properties, such as the associativity of addition
    double d1 = 1e30, d2 = -1e30, d3 = 1.0;
    printf("%lf\n", (d1 + d2) + d3);
    printf("%lf\n", d1 + (d2 + d3));
Also, if a floating-point number needs to be tested for equality to zero, never use == 0 or != 0
Always test proximity to zero (not equality) by some code such as
    double a, b, tol;
    ...
    tol = 1e-6;
    if (fabs(a - b) < tol * fabs(a)) {
        ...
    }
Testing now many conditions
test-constants.c

8.4 Type conversion
Automatic type conversion
In expressions with operands of different types, each operand is converted to the most expressive format
Order of expressiveness
    char < short < int < long < float < double
Example of automatic conversion in expressions
    if (3/2 == 3/2.0) {
        printf("TRUE :-)\n");
    } else {
        printf("FALSE :-(\n");
    }
It prints FALSE :-(

Conversion by assignment
An expression assigned to a variable is converted to the type of the assigned variable
Assignments to the same type of smaller size are truncated
Example of conversion by assignment
    double a=1025.12;
    int i;
    unsigned char c;
    i = a;   /* i gets 1025 (fractional part truncated) */
    c = i;   /* c gets 1 (least significant byte of int) */

Explicit conversion: cast
The programmer may specify a type conversion explicitly: cast
    (type) expression
Example of explicit conversion in expressions
    if (3/2 == (int) (3/2.0)) {
        printf("TRUE :-)\n");
    } else {
        printf("FALSE :-(\n");
    }
It prints TRUE :-)
The content of a variable may be altered after an (explicit/implicit) type conversion
Example: type conversion
test-celsius.c

9 C: functions
Functions
Functions are used to break down a complex problem into smaller ones
If you find yourself copying/pasting lines of code which “do something”, then you may need a function for that code
Functions are not parametric macros
As in mathematics with $f : \underbrace{\mathbb{R}^2}_{\text{input}} \to \underbrace{\mathbb{R}}_{\text{output}}$, a C function gets an input and produces an output
A C function is characterized by
1. the declaration of the function (aka function prototype), which holds information about
   – the name of the function (mandatory)
   – the list of types of input parameters (optional)
   – the type of one output parameter (optional)
2.
the body of the function, which is the code that processes the inputs to produce the output
the void type is specified for missing input, output or both

Functions: declaration (or prototype)
The compiler requires that a function is declared before being used
The declaration of a function (or prototype) is a line of code with
– the type of one output parameter (void if none)
– the name of the function (mandatory)
– a comma-separated list of types of input parameters within round brackets (optional)
– terminated by a semi-colon “;”
Notice: the compiler (step “2” of gcc) allows using a function by only knowing its declaration!!
Example of function declaration
    void sort(int * v, unsigned int num);
and an equivalent way without the parameter names
    void sort(int *, unsigned int);

Functions: definition (or body)
The definition of a function includes
1. its declaration and
2. its body
    int min(int a, int b)
    {
        if (a < b) {
            return a;
        } else {
            return b;
        }
    }
The body is needed by the linker only (step “4” of gcc)
Why may it be useful to have a declaration without a body?
– Libraries of functions expose to the user the declaration of the functions only (in the header file, such as stdio.h)
– The body may be intentionally hidden to protect the code
– The function body and the code using the function may both be developed and compiled separately (by different teams)
test-declare-fun.c

Functions: invocation
The declaration or the full definition of a function must appear above its first usage
– otherwise the compiler doesn’t recognize the function name
– try to move the #include line to the bottom in hello.c
A function is invoked by passing the parameters in accordance with the declaration
    int min(int, int);
    int main() {
        int a;
        a = min(4, -2);
    }
at compile time, only the function declaration is needed
a function fun with a void list of parameters is invoked by fun()

Functions: passing parameters
When a function is invoked, the invocation parameters are copied into additional variables
A function can use and modify the parameters
These modifications, however, have no effect outside the function
    int mul(int x, int y)
    {
        x *= y;
        return x;
    }
    int main() {
        int a, b=4;
        a = mul(b, 3);
    }

Functions: how to modify a parameter?
Often a function needs to modify one or more parameters. Example: to sort an array
However, parameters are always copies: any change to a parameter is lost after returning
Solution: if some data needs to be modified by a function, then we declare a function that receives a pointer to the data, not the data itself
Through the pointer the original data may be modified
Example
    void sort(int * v, unsigned int n)
    {
    }
Only the pointer is copied internally to the function. The function can then access and modify the data through the (copied) pointer
– often called “call by reference”

Functions: passing const parameters
Sometimes it is needed to pass a large amount of data to functions (a long vector, etc.)
To avoid copying all the data as parameters (which is inefficient), it is advisable to pass only a reference to the data (a pointer)
In this way, however, the function may accidentally (or maliciously) modify the data
To pass a pointer to a data structure that we don’t want to modify, we use the keyword const before the parameter
For example (man 3 printf)
    int printf(const char * format, ...);

Functions: returning
the keyword return is used to return the value of a function
once return is executed, no other statement of the function is executed
there may be more than one return in the function body: the first one that is encountered is the one executed
functions with void output:
– may have no return statement: the function completes once the closing bracket “}” is reached
– may have a return; with no value

Functions vs. parametric macros
stage of gcc: macros are expanded by the pre-processor, functions are compiled
type checking: in macros, the type of operands is not checked
efficiency: macros may be more efficient than functions: no parameter passing, no call instruction
size of executable: if macros are used, the size of the executable grows
parameters: macros are expanded by the pre-processor; if a parameter is modified, it remains modified after the macro as well. A modification of parameters within functions isn’t seen outside
return value: macros do not return any value. Still, a macro may be an expression
recursion: obviously, no recursion with macros
debugging: programs with many macros may be harder to debug

Functions vs. parametric macros: conclusion
Macros may be a good replacement for functions when:
1. the lines of code are few (say, 10)
2. the function code is used many times
3. high efficiency is needed
4. neither a return value nor recursion is used
5. we are ready to face hard-to-debug errors
6. gcc -E is your friend
Macros may be good for:
1. computing the minimum between two values
Functions may be good for:
1.
sorting an array

10 C: scope of variables
Scope of a variable
The scope of a name (variable or function) is the portion of code where that name is visible and may therefore be used
The scope of a name (variable) declared within a function is restricted to that function
Parameters of functions have the same scope as a variable declared inside a function
The scope of a name declared outside any function (global variables or function declarations) is from the place of declaration until the end of the file
If a variable with the same name is declared both outside and inside a function, the one inside the function prevails

Global variables
Global variables are declared outside any function
Global variables are visible to all functions from the declaration to the end of the file
Usage of global variables:
– good: when many functions share a large amount of data, the usage of global variables is more efficient (it avoids parameter passing)
– bad: code relying heavily on global variables may be:
  1. hardly portable: functions are not really “isolated” pieces of code
  2. hard to comprehend/debug: when the reader finds a global variable, it may not be obvious where it is declared
If a function uses a global variable as input or output, it is strongly recommended to add a comment on top of the function
Names of global variables should be highly informative, to spare the reader from browsing much code:
– number_students is good
– n is bad

11 C: storage classes
Storage classes
All C variables have a storage class, which determines where variables are stored
Normally, variables are stored in memory. Three possible areas of memory:
1. variables over the BSS (Block Started by Symbol, historical acronym)
2. variables over the stack segment
3. variables over the heap
Moreover,
4. variables may be stored in registers
Finally,
5. the storage class may be delegated to other places in the code (external variables).
More details in “Modular programming”

Memory segments
Depending on the needs, the OS assigns a few memory segments to processes (which are programs in execution)
Segments are mapped over the process address space
Each segment has:
1. start and end addresses (meaningful in the process address space)
2. flags that determine the access modes:
   – read: it can be read
   – write: it can be written
   – execute: it contains code which may be executed
   – private/shared: if it isn’t/is shared among other processes
To view the memory mapping of a process, try:
    ps -Af | grep sh
to get a Process ID (PID)
    cat /proc/<PID>/maps
to print the memory map of <PID>

11.1 Variables on the BSS
Variables on the BSS
BSS is a read-write memory segment
The size of the BSS is decided at compile time (depending on the size of the allocated variables, plus some padding for alignment)
Two ways to allocate variables over the BSS
1. global variables
2. local variables declared with the static qualifier
    void func(void) {
        static int my_static_var;
        ...
    }
Allocated: at the beginning of the program
Deallocated: at the end of the program

static variables within functions
Scope: same as local variables (only within the function)
Lifetime: same as global variables (from the start to the end of the program)
Typical usage: to keep some state between consecutive invocations of a function
    void func(void) {
        static int count_invocations = 0;
        ++count_invocations;
        ...
    }
Example: test-static.c

11.2 Variables on the stack
Variables on the stack
When variables are declared at the top of a function, they are allocated onto the stack (unless the static qualifier is prefixed)
Variables are allocated over the stack by decreasing the stack pointer as needed by the size of the variables
Allocated: when the function is entered
Deallocated: when returning from the function
Hence, we cannot rely on their initial value
The prefix auto in a variable declaration, such as in
    void func(void) {
        auto int my_stack_var;
        ...
    }
may be used.
However, since it is the default allocation, it is rarely (never?) used explicitly
How much space is available on the stack?
test-stack-killer.c

Content of the stack
The stack is a memory area with LIFO (Last-In First-Out) policy
– the push assembly instruction stores data on top of the stack
– the pop assembly instruction extracts data from the top of the stack
Its main purpose is to store the parameters and the return address of function invocations
– when a function is invoked (call assembly instruction),
  1. the parameters of the function invocation are pushed to the stack
  2. the return address is pushed to the stack
  3. then the control flow goes to the invoked function
– when we return from a function (ret assembly instruction)
  1. the return address is fetched from the stack
  2. the control goes back to the invoking function
The example test-stack.c
– shows the content of the stack (the parameters, the return address, etc.)
– alters the content of the stack to modify the execution flow (typical attack)

11.3 Variables on the heap
Variables on the heap: dynamic allocation
The heap (in Italian “mucchio”, “cumulo”) is a memory area available to the program upon specific request to the operating system
The program may ask the OS for some memory via the following calls
    #include <stdlib.h>
    void * malloc(size_t size);
    void * calloc(size_t nmemb, size_t size);
    void * realloc(void * ptr, size_t size);
The memory is returned via a pointer (void *)
– malloc allocates size bytes in memory
– calloc allocates nmemb elements of size bytes in memory and sets them to zero
– realloc changes the size of a previously allocated area
Allocating memory via malloc is called dynamic memory allocation because the size of the allocated memory is decided at runtime
When variables are declared, the size of memory is decided at compile time (static memory allocation)

Standard ways for dynamic allocation
Standard code to allocate an array of num elements
    int * p;
    p = malloc(num * sizeof(*p));
calloc(...) has a slightly different syntax and it clears the memory (sets all bytes equal to zero)
    int * p;
    p = calloc(num, sizeof(*p));
After the allocation, the memory can be used as needed
    p = calloc(num, sizeof(*p));
    for (i=0; i<num; i++) {
        p[i] = i*i;
    }

Memory must be freed
All memory areas allocated by malloc, calloc and realloc must be released by free
standard code to deallocate a memory area pointed to by p is
    free(p);
free(p) is an error if p was not returned by malloc/calloc/realloc
Special care must be taken to free a memory area before the pointer to the area is lost
    p = malloc(N * sizeof(*p));
    ...
    p = &v;
To avoid forgetting to free the memory, it is recommended to write the free code immediately, possibly at the bottom of the file.
Lifetime of memory allocated onto the heap
– Allocated: when malloc()/calloc() is invoked
– Deallocated: when free() is invoked (or at the end of the program)

Static vs. dynamic allocation
Which is better: static allocation
    int v[100];
or dynamic allocation
    int * v;
    v = malloc(100*sizeof(*v));
both are used with the same syntax: v[...] = 412;
Dynamic allocation can use less memory than static allocation (static allocation requires overallocating; with dynamic allocation memory can be allocated when needed)
Static allocation is faster since it avoids expensive calls such as malloc and free (which may in turn invoke system calls)
Example of usage of malloc
test-malloc.c

11.4 Variables in memory: comparison
Allocation of data in memory
Let us have a look at the following examples:
– test-show-addr.c
Remember the difference between
    char v[] = "string0";
    char * p = "string1";
– string0 may be modified (it belongs to a page with “w” permission)
– string1 may not be modified (no “w” permission)

11.5 Variables stored in processor registers
register variables
The compiler may be informed that some variable should be allocated to a register of the processor by adding the keyword register at the declaration
    register int my_register_var;
register variables are used for frequently accessed variables: the access time to a register is 10–100 times faster than the access to memory
The number of registers is limited: the compiler cannot guarantee the allocation to a register

12 C: more on operators
Conditional and comma operators
The conditional “?:” is a ternary operator
    <cond> ? <expr1> : <expr2>
returns:
– <expr1> if <cond> is non-zero
– <expr2> if <cond> is zero
Typical usage is the maximum
    #define MAX(x,y) ( (x) > (y) ? (x) : (y) )
The comma “,” operator
    <expr1> , <expr2>
<expr1> and <expr2> are evaluated in this order and <expr2> is returned
– sometimes used in for loops in the first and third expressions, when more assignments or increments are needed
    int v[VEC_LEN], *p, i, somma = 0;
    for (p = v, i = 0; i < VEC_LEN; p++, i++) {
        somma = somma + *p;
    }

Precedence of operators
Operators (such as “+” or “*”) are used to combine operands and then produce expressions
When evaluating a complex expression, in what order are operators evaluated?
    a = 2; b = 3;
    c = a+a *b;
In math we know that multiplications are made before additions
This is called precedence of operators
C operators have a precedence (to be illustrated later on)

Associativity of operators
When combining operators of the same precedence, in what order do we proceed?
    a = 2; b = 3; c = 4;
    d = a - b - c;
    a = b = c = d;
Associativity can be “right-to-left” or “left-to-right”

Table Precedence/Associativity 1/3
Available at http://en.cppreference.com/w/c/language/operator_precedence
Starting from highest precedence
Prec.  Operator    Description                           Associativity
1      ++ --       Suffix/postfix incr. and decr.        Left-to-right
       ()          Function call
       []          Array subscripting
       .           Struct/union access
       ->          Struct/union access via pointer
2      ++ --       Prefix increment and decrement        Right-to-left
       + -         Unary plus and minus
       ! ~         Logical NOT and bitwise NOT
       (type)      Type cast
       *           Indirection (dereference)
       &           Address-of
       sizeof      Size-of

Table Precedence/Associativity 2/3
Prec.  Operator    Description                           Associativity
3      * / %       Mul., div., and remainder             Left-to-right
4      + -         Addition and subtraction              Left-to-right
5      << >>       Bitwise left shift and right shift    Left-to-right
6      < <=        Comparison < and ≤ respectively       Left-to-right
       > >=        Comparison > and ≥ respectively
7      == !=       For relational = and ≠ respectively   Left-to-right
8      &           Bitwise AND                           Left-to-right
9      ^           Bitwise XOR                           Left-to-right
10     |           Bitwise OR                            Left-to-right
11     &&          Logical AND                           Left-to-right
12     ||          Logical OR                            Left-to-right

Table Precedence/Associativity 3/3
Prec.  Operator    Description                           Associativity
13     ?:          Ternary conditional                   Right-to-left
14     =           Simple assignment                     Right-to-left
       += -=       Assign. by sum/difference
       *= /= %=    Assign. by prod./quot./remainder
       <<= >>=     Assign. by bitwise left/right shift
       &= ^= |=    Assign. by bitwise AND/XOR/OR
15     ,           Comma                                 Left-to-right

13 C: composite data types
13.1 Data structures: struct
Structures: declaration
primitive data types: int, char, double, etc.
collection of homogeneous data: arrays
collection of heterogeneous data: structures
How to declare a structure? Example:
    struct point {
        double x;
        double y;
    };
Each piece of data is called a field of the struct
In the example, the struct point has 2 double fields with names x and y
The name of the type is “struct point”.
Hence, variables of that type are declared by
    struct point p1, p2;

Structures: initialization
Initialization by listing values within curly braces {...} separated by commas
    struct info {
        int id;
        char * name;
        int age;
    };
    struct info el1 = {3, "Aldo", 45};
the initialization of each field must follow the order of declaration.

Structures: usage
Each field of a struct is referred to by the “dot” notation
    struct info {
        int id;
        char * name;
        int age;
    };
    struct info v1;
    v1.id = 10;
When structures are accessed by pointers, each field of the pointed struct is referred to by the notation “->”
    struct info * p;
    p = malloc(sizeof(*p));
    p->age = 35;

Structures: byte alignment, padding
How much memory is allocated to a struct? Where?
    struct myrecord {
        int field1;
        double field2;
    };
(figure: memory layout of struct myrecord in 8-byte rows: (int) followed by padding, then (double))
Normally, fields are allocated in memory in the order they are declared
The amount of memory of a struct may be more than the sum of the memory of each field
    sizeof(struct myrecord) = sizeof(field1) + sizeof(field2) + ... + “padding”
“padding” may be added to align the fields to “good” memory boundaries (multiples of 4, 8, or 16)
test-struct.c

Structures: assignment
structs may be assigned
    struct info a, b;
    a = b;
however, they cannot be compared with the equality operator. The following code is incorrect
    struct info a, b;
    if (a == b) {
        ...
    }

13.2 “Overlapping data structures”: union
Unions
The union data type is declared similarly to struct
    union my_union_t {
        double f;
        long i;
    };
(figure: double f and long i overlapping in the same 8 bytes)
however, all fields overlap in memory, starting from the same address!! (the term “field” may sound a bit inappropriate for unions)
if you modify one field, the others are modified too!!
test-union.c
hence, sizeof(<union>) is the size of the largest field
unions are used to store alternatives
unions are used to save memory (especially in embedded systems)

13.3 Enumerating constants: enum
Enumerations
Enumerations are used to define “labelled constants”
A labelled constant is an integer constant with a name
Example of declaration
    enum month { Gen, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec };
test-enum.c
– The value of the first constant is set to zero unless explicitly specified by the programmer (for example, with “Gen = 1”)
– From the second constant on, the value is incremented unless the programmer explicitly specifies another value (for example, with “May = 2”)
The purpose of enum is to improve the readability of code
variables of enum type are replaced by their value in the assembly code

13.4 Defining new data types: typedef
Defining new types
typedef allows defining “new” types (to rename an old type)
    typedef <old type> <new type>;
Used to hide the real type used
– good: when you do not trust who will read your code
– bad: when you trust who will read your code (it may be complicated to go through many include files to understand the type of a variable)
for example, /usr/include/stdint.h defines many integer types which specify the exact size of the integer
gedit /usr/include/stdint.h
often types are also defined by pre-processor macros with #define.
    #define MY_TYPE double
    MY_TYPE my_var;
Differences: a macro-defined type is just a replacement by the pre-processor

13.5 Dynamic lists
Dynamic lists by struct, typedef, malloc,...
In C, dynamic lists are created by
– defining the element of the list by a struct
      typedef struct node {
          int value;
          struct node * next;
      } node;
      typedef node * list;
– the struct has a pointer to the next element
– setting a pointer head to the head of the list
– the .next field of the last element has value NULL
– node insertion:
  1. new node allocated by malloc(...)
  2. the new node is properly linked
– node removal:
  1.
node is unlinked 2. node memory deallocated by free(...) test-list. c memory.value.next.value NULL.next head.value.next 14 C: more on pointers Array of arrays (“matrices”) C allows declaring bi-dimensional arrays v[][]; – allocates an array v of DIM1 elements. – each element is an array of DIM2 contiguous variables of type Overall, it is allocated for v a contiguous amount of memory of sizeof()*DIM1*DIM2 bytes – The first element in memory is v then v and so on – The element v[i][DIM2-1] (last element of “row” i) is followed by v[i+1] (first element of next “row”) – The last element in memory is v[DIM1-1][DIM2-1] Example: test-bi-array.c int v; – v[i][j] is the j-th element of the i-th array in v – v[i] is the array of 3 int at position i in v – v is an array of 10 arrays of 3 int variables WARNING: elements are addressed by v[i][j] and not by v[i,j] 50 Array of pointers Pointers are variables – Arrays of pointers can be declared and used as arrays of any variable An array of pointers is declared by < type > * v [ < size >]; which statically allocates an array of pointers to Example of initialization: char * p [] = { " defghi " , " jklmnopqrst " , " abc " }; initializes: – a vector p with three pointers p, p and p (read/write) – three strings pointed respectively by p, p and p (read-only) test-array-ptr. c Usage of array of pointers: command-line arguments When commands are invoked at the shell, they may have a sequence of space-separated “command-line arguments” Example: gcc -c my_file.c -o my_file – the command is gcc – 4 command-line arguments follow Command-line arguments can be read and used within a program We have been writing the main as int main () { } however, to read command-line arguments it must be written as int main ( int argc , char * argv []) { } – argc: number of space-separated strings at command line – argv: array of pointers to each string test-command-line. 
c 51 Pointers to pointers C allows the declaration of pointers to pointers, for example by int ** p ; in this case: – p is a pointer of type (int **) pointing to a memory address containing a variable of type int * – *p is a pointer of type (int *) pointing to a memory address containing a variable of type int – **p is an (int) variable memory (int) **p p *p Also variables of type int **** p ; are possible. Then p is a pointer to a pointer to a pointer to a pointer (4 times!!) to a variable of type int I never saw in the code more than 2 levels of dereferencing test-ptr-ptr. c Pointers to functions The code of functions is in memory It is then possible to declare pointers to functions A pointer to a function is the address of the first instruction executed after the CALL assembly instruction A pointer to a function is declared by: