201.pdf Threading Concepts PDF
Document Details
Uploaded by Deleted User
Tags
Summary
This document provides an overview of threading concepts, including POSIX threads, synchronization, and locks. It details how threads are similar to processes, but more lightweight, and share an address space making data sharing easier.
Full Transcript
Threading Threads: Similar to processes, you can run a thread and it’ll run in parallel They are more lightweight. Processes will get their own virtual memory but with threads, they share an address space, hence less overhead. ○ Data sharing is easy. Any thread can read fr...
Threading Threads: Similar to processes, you can run a thread and it’ll run in parallel They are more lightweight. Processes will get their own virtual memory but with threads, they share an address space, hence less overhead. ○ Data sharing is easy. Any thread can read from or write to a global variable. ○ A process always has at least one thread called the main thread. POSIX Threads: pthread_t: int pthread_create(pthread_t *restrict thread, const pthread_attr_t *restrict attr, void *(*start_routine)(void *), void *restrict arg); ○ Starts a new thread in the calling process. We pass in a pthread_t for the thread parameter, attr specifies the various attributes of the new thread (can just be set as NULL), then the thread function to start, and the attributes being passed to the thread function when it starts. [[noreturn]] void pthread_exit(void *retval); ○ Terminates the calling thread. Performing a return from the thread function implicitly calls the pthread_exit() with its return value. pthread_t pthread_self(): ○ returns the ID of the calling thread. Same value that is returned in *thread in the pthread_create(3) call that created this thread. Always succeeds, returning the calling thread’s ID. int pthread_join(pthread_t thread, void **retval); ○ Waits until the thread identified by the parameter terminates. You can get the return value from the thread with void **retval. If retval is not null, then pthread_join() copies the exit status of the target thread into the location pointed to by retval. int pthread_detach(pthread_t thread); ○ lets the calling thread just run. You can use this when you don't need to return anything. Synchronization Data Race: Data Race: multiple threads overriding each other and overriding the value. Accessing values at the same time. Race Condition: condition where the outcome is based on the timing of the operations. Non Deterministic vs Deterministic Behaviour: ○ Non-deterministic, meaning the outputs are different every time the program runs. Behaviour is different. Non-deterministic behaviour does not always lead to non-deterministic output. More often we expect non-deterministic behaviour to produce deterministic output. ○ Deterministic, outputs where a program outputs the same result every time it runs. Locks Mutexes: Mutual Exclusion. A lock ensures only 1 thread can execute a piece of code A mutex or a lock is a primitive that gives us the ability to prevent the mix-up of sub-operations from different threads. Lock mechanism Consists of: a. A lock variable that defines a lock b. lock() function that grabs a lock c. unlock() function that releases a lock pthread Lock Mechanism: pthread_mutex_t ○ The data type to define a lock int pthread_mutex_lock(pthread_mutex_t *mutex) ○ The lock function that grabs the lock passed as the argument int pthread_mutex_unlock(pthread_mutex_t *mutex) ○ Unlock function that releases the lock passed as the argument Only a single thread can hold a lock: ○ If thread A calls lock() on mutex, the underlying lock mechanism allows thread A to grab the lock mutex. ○ If thread B then tries to lock the mutex, it will not be allowed to as only a single thread can grab a lock. The lock mechanism now becomes blocking for thread B until lock A releases the lock. Lock Usage: Atomicity ○ Turning the multiple operations into 1 single operation inside of a lock and unlock(). All or nothing, as it runs either all operations or no operations at all. Serialization and Interleaving ○ Seralization: Ensuring tasks or resource access are executed sequentially to avoid conflicts and maintain thread safety. ○ Using a lock effectively serializes operations, eg. only one thread at a time can run the operations guarded by a lock. ○ Operations from different threads are interleaved in some order. But we can’t control the order in which different threads run. Protecting Shared Variables ○ Whenever multiple threads share a variable, we need to use a lock to control the access to the variable. Typically the shared variable is a global variable in C. Multiple Locks ○ Lock Contention: Occurs when multiple threads compete for the same lock, causing delays as threads wait to acquire the lock before accessing a shared resource. ○ Reducing lock contention is important for performance. ○ Fine grained Locks: using multiple locks for different pieces of data pthread_mutex_trylock() and pthread_mutex_timedlock() ○ Allows us to control the blocking behaviour of pthread_mutex_lock() ○ pthread_mutex_trylock() Is the equivalent to pthread_mutex_lock() except that if the mutex is occupied, then return immediately. ○ pthread_mutex_timedlock() Locks the mutex object referenced, if it is already locked, the calling thread will block until the mutex becomes available. If the mutex can’t be locked without waiting for another thread to unlock the mutex, this wait should be terminated when the specified timeout expires. Critical Section and Thread Safety: Critical Section: ○ A piece of code that accesses a shared variable and must not be concurrently executed by more than one thread. ○ If a thread is executing the CS, no other threads should execute the CS. A solution for the critical section problem should satisfy: ○ Mutual exclusion: only one thread should be allowed to run in the CS ○ Progress: A thread should eventually complete ○ Bounded Waiting: An upper bound must exist for the amount of time a thread waits to enter the CS. A thread should only be blocked for a finite amount of time. Thread Safety: ○ Thread Safe Function: Function that multiple threads can run safely Either does not access shared resources or provides proper protection for its critical sections that access shared resources. Reentrant Functions: Function that produces the correct output when it is called again while executing (via. Different threads, signal handlers, etc.) Function that can be safely called simultaneously by multiple threads or interrupted and resumed without affecting its behaviour or data, as it does not rely on shared or static data. printf() is not safe since it uses a buffer so that if a main thread was running, then a signal handler came and interrupted it, it would then be overwritten since it uses a shared buffer. For write() it is safe since we have to provide a buffer for it or also called a caller-allocated buffer. Deadlock and Livelock: Deadlock: Condition where a set of threads each hold a resource and wait to acquire a resource held by another thread. The threads get stuck and make no progress. Eg. 2 threads are grabbing locks and they both get stuck. They both require a lock from each other but they locked both of them so they are at a standstill. Conditions for a Deadlock to happen: Although a deadlock may not occur even if the conditions below hold but mutual exclusion with non-preemptive guarantees a deadlock a. Hold and Wait: Threads are already holding resources but are also waiting for additional resources being held by other threads b. Circular Wait: There exists a set {T0, T1,.., Tn-1} of threads such that T0 is waiting for a resource that is held by T1, T1 is waiting for T2, …, Tn-1 is waiting for T0. c. Mutual Exclusion: Threads hold resources exclusively. There is some section of the code that only one thread can run. No two threads run at the same time at that point, no 2 locks can share that same section. d. Non-preemptive: Resources released only voluntarily by the thread holding it. This condition is something out of our control. Deadlock Prevention: a. Grabbing all the locks at once atomicity. Breaks hold-and-wait since you grab all the locks together or no locks at all. b. Acquiring locks in the same global order for all threads. Breaks circular wait condition as all the threads try to grab locks in the exact same order. Livelock: Conditions where a set of threads each execute instructions actively, but they still don’t make any progress. The threads still execute but don’t make any progress. Eg. an infinite loop, if there are two threads: T0, T1, each of which tries to acquire two resources, R0 and R1. ○ They run a function that acquires the first resource, then tries to access the second. If the second is not available, the function releases the first resource and tries again. However, if: T0 acquires R0 T1 acquires R1 T0 now wants to acquire R1 but T1 has it T1 now wants to acquire R0 but T0 has it They both give up, release, then try again forever Condition Variables: Producer Consumer: Set of threads producing data and another set of threads consuming the data. There is typically a shared resource, eg. a variable or a buffer that these threads access for either producing or consuming data. The consumer must continuously check if there is something to consume, but if the producer doesn’t produce, the consumer keeps locking, unlocking, checking over and over. Using a Condition Variable: The consumer doesn’t need to check over and over, it can just wait for a conditional that there is new data. When waiting, it will give up the CPU pthread_cond_t ○ The data type to define a conditional variable pthread_cond_signal(pthread_cond_t *cond) ○ Sends a signal to cond. If there is a thread waiting on cond, this wakes up the thread. Only wakes up a single thread pthread_cond_wait(pthread_cond_t *cond, pthread_mutex_t *mutex) ○ Waits for a signal sent to cond ○ For mutex, it behaves like a lock-safe sleep. Internally it releases mutex, waits for a signal on cond, and once a signal is sent to cond, wakes up and re-grabs mutex. Doesn’t go to sleep holding a lock pthread_cond_broadcast(pthread_cond_t *cond) ○ The function that wakes up all threads waiting on cond ○ All threads wake up and try to grab mutex. ○ Used if all threads do the same thing and it doesn’t matter which thread does the work When checking to see if there is something to consume, we should use a while because there may be other threads that may have woken up first and changed the shared variable. Semaphores: Availability Entries, like a lock with a counter ○ A lot is either available or not available. ○ There is a counter with availability, when it is available, it tells you how many are available. ○ sem_init(sem_t *sem, int pshared, unsigned int value): Same as the mutex. Initializing an unnamed semaphore at the address pointed to by the sem. ○ sem_wait(sem_t *sem): Function that grabs sem. Internally if the count is 0, it blocks until the count is greater than 0. If the count is > 0, decrements the count by one and returns. ○ sem_post(sem_t *sem): The function that releases sem. Internally it increments the count by 1 Used in producer-consumer with a bounded or circular buffer ○ Circular Buffer: a way to emulate an infinite buffer with limited space. We fill it up in a circular fashion, it goes back to the beginning. ○ Multiple threads share a buffer ○ Producer threads place items into the buffer, they must wait if the buffer is full ○ Consumer threads take items from the buffer, they must wait if buffer is empty. ○ The buffer emulates an infinite buffer with a limited size ○ In the producer, we apply a sem_wait, as long as the availableCount > 0, return right away, if there is anything available, fill in the buffer ○ In the consumer, we will use a sem_post() after consuming, which will increase the availability count. Read Write Lock: Allows either unlimited readers or a single writer but not both (XOR) ○ pthread_rwlock_rdlock(pthread_rwlock_t *rwlock): Grabs rwlock for reading. Allows any threads to grab rwlock as long as there is no thread that holds rwlock for writing. ○ pthread_rwlock_wrlock(pthread_rwlock_t *rwlock): Allows only one thread to grab rwlock Dining Philosophers: Philosophers sit at a round table and alternate between eating and thinking, there is a fork between any two philosophers. To eat, each philosopher needs 2 forks (at their left and right). Solutions: ○ Have at least one philosopher grab forks in a different order. Have one philosopher grab their left fork and then the right fork while having all other philosophers grab their right fork and then the left fork. Breaks hold and wait condition since there will be at least one philosopher who can’t grab any fork ○ Try grabbing both locks at once. Try left lock, try right lock, if you can’t grab it, give up and try again Could lead to livelock. File Input Output Basic System Calls: open() ○ int open(const char *pathname, int flags); ○ int open(const char *pathname, int flags, mode_t mode); int flags Specifies an access mode and a creation mode O_RDONLY, O_WRONLY, O_RDWR mode_t mode If creation is specified (either O_CREAT or O_TMPFILE), this specifies the file permissions ○ Returns a file descriptor: A handle to your pathname of the file A small non-negative integer, could change every time you open the file ○ File Offset File opened for read and write is associated with the file offset, which is a pointer to a specific byte within a file read() automatically increments the file offset as it reads new bytes from the file ssize_t write(int fd, const void *buf, size_t count); ○ Writes to a file descriptor and returns the number of bytes written ○ The number of bytes written may be less than count ssize_t read(int fd, void *buf, size_t count); ○ Reads from file descriptor and returns the number of bytes read. On files that support seeking, the read operation commences at the file offset, and the file offset is incremented by the number of bytes read. ○ If file offset is at or past the end of file, no bytes read, and read() returns zero ○ It is not an error if the return value is ambler than the number of bytes requested. May happen because fewer bytes are available. close() ○ Closes the file descriptor off_t lseek(int fd, off_t offset, int whence); ○ Can manually adjust the file offset ○ whence - indicates from which location we want to adjust the file offset. offset is always added. ○ 3 Possibilities: SEEK_SET Beginning of the file SEEK_CUR Current file offset SEEK_END End of the file fcntl() ○ Used for file control ○ Many use cases but one use is to modify the flags and mode used when opening a file with F_SETFL. We can open a file as read-only and later change it to read-write with fcntl() and F_SETFL Buffered I/O: 3 Categories ○ Functions that start with f: fprintf(), fscanf(), fputs(), fgets(), fput(), fget(), etc. Manages a buffer in memory and writes to the buffer. Eg. fprintf() manages a buffer in memory and writes to the buffer using write() under the hood. This is called buffered I/O The reason behind this is because there is overhead associated with system calls A system call needs to cross the boundary between user space and kernel space. Less syscalls, better performance When a user program has user data to write -> writes data to the associated buffer in memory -> data will be passed to the kernel via a system call (maybe write()) -> kernel will write to the actual disk int fprintf(FILE *stream, const char *format,...); ○ Takes a stream to write Stream is a convenient file descriptor. Used by the stdio functions. Like a file descriptor plus a buffer backing it up fdopen(): can get the file stream from a file descriptor fileno(): can get the file descriptor from a file stream ○ Same functions without f: printf(), scanf(), puts(), gets(), etc. Standard library functions ○ I/O functions that are systems calls: write(), read(), etc. Sends data directly to or from the kernel Kernel Buffering: When we send data to the kernel, it does not immediately write to the disk We can force this with: fsync() ○ Using O_SYNC with open() automatically does fsync() With user buffering and kernel buffering, we can use: ○ fflush() and fsync(): both flush their buffer. ○ setbuf() with a NULL buffer and O_SYNC: both automatically perform no buffering. Universality of I/O: UNIX I/O model is often referred to as the “everything is a file” model ○ Used not only for actual files on stored on disks but also for all I/O devices such as the terminal, network, etc. Disk Partitions: A disk is divided into partitions A partition is used as a file system ○ A system that manages files and directories. There are different ways of doing it, and there are many file systems in use. ○ The file tree starts with the root directory / ○ Think of the partition as a storage entity that contains a file tree Also used as swap space for memory management i-Nodes: Contains metadata about the file, eg. file type, permissions, owner, timestamps, etc. Identified by number Functions that show file metadata, mostly from the i-node: ○ stat(), lstat(), and fstat() Hard/Soft Links: ln Hard Links: We can give many names to the same file ○ A hard link is giving a name to a file ○ No hard link is allowed for a directory to prevent circular links. Eg. child directory that links to the parent directory. Hard link should be within the same file system. This is because a hard link is giving another name to an existing file ln -s Soft Links: Symbolic Links. Content of the file is the path to the og file ○ Dangling Link: Even if we delete the original, the sym link will point to nothing. VFS: Mounting and Unmounting: We can stitch together different partitions on a single disk Mounting: All file systems (from different partitions/disks) are mounted and form a single file tree Unmounting: Removing a file system from a file tree Network Programming: Basics of the Networking Stack: Layers: ○ Physical: Hardware control Generate and receive signals Need to know how to physically send and receive something ○ Link (MAC): Local network addressing and routing For only local network and wireless local networking ○ IP: Inter-network addressing and routing Addressing and routing for the entire internet ○ Transport: Delivers Packets: In charge of delivering packets TCP: Connection UDP: Connectionless, doesn’t provide any protection ○ HTTP: App layer protocol The Socket Interface Protocol: defines a set of rules that an entity needs to follow to communicate with another entity There are five key syscalls int socket(int domain, int type, int protocol) ○ Used to communicate with another process ○ Returns a file descriptor ○ Domain: specifies a protocol: AF_UNIX: Local communication AF_INET: IPv4 Internet protocols AF_INET6: IPv6 Internet protocols ○ Type: SOCK_STREAM: TCP reliable, connection-oriented SOCK_DGRAM: UDP datagrams, connectionless ○ Protocol: Not used for common domains, hence 0 bind() ○ Binds the socket to an address listen() ○ Marks the socket as passive. Used to wait for a connection to come. By default a socket is active accept() ○ Accepts a new connection ○ Returns a new socket to use for the new connection ○ The original socket is only used to accept new connections connect() ○ Connects to a passive socket Datagram vs Stream Socket Sequence: Datagram: ○ No passive or active socket ○ sendto() needs to specify the receiver’s address every time Stream Socket Sequence: ○ Requires a passive or active socket ○ listen() marks the socket as passive ○ accept() accepts a new connection ○ connect() connects to a passive socket AF_INET and AF_INET6 Overview AF_INET: ○ Uses 4 bytes addresses ○ Uses: struct sockaddr_in { sa_family_t sin_family; in_port_t sin_port; struct in_addr sin_addr; unsigned char __pad[X]; } sin_addr: Represents an IPv4 address 4 bytes Functions that convert human-readable to a binary address: ○ inet_pton(): presentation to network ○ inet_ntop(): network to presentation defines two constants: INET_ADDRSTRLEN INET6_ADDRSTRLEN Two Special Addresses to Know: ○ loopback address: 127.0.0.1 The constant to use for sin_addr.s_addr is INADDR_LOOPBACK. This is for local communication, similar to the UNIX domain sockets. ○ Wildcard address: The constant to use for sin_addr.s_addr is INADDR_ANY. A machine can have multiple network cards, e.g., a typical laptop has a wireless card and a wired (Ethernet) card. If this is the case, each network card gets a different IP address. If we bind() a socket to the wildcard address, we tell the OS that we want to listen on any address. sin_port: When you bind() a socket to an address, you also bind to a port number If you do not assign one, there will be one given as default, called an ephemeral port AF_INET6: ○ Uses 16 bytes for addresses Network Byte Order Little endian: the little end goes first ("little end"---the least significant byte; "first"---the lowest address). Big endian: the big end goes first ("big end"---the most significant byte; "first"---the lowest address). When sending multi-byte data (not byte by byte) we need to convert each byte to the network order, You can use these functions to translate: ○ #include ○ uint32_t htonl(uint32_t hostlong); ○ uint16_t htons(uint16_t hostshort); ○ uint32_t ntohl(uint32_t netlong); ○ uint16_t ntohs(uint16_t netshort); Host names We can use a host name instead of an IP address getaddrinfo(): given a host name, returns a set of all possible options (structs containing an IP and a port number) as it's possible to associate multiple IP&port addresses to a single host name. getnameinfo(): performs reverse---IP to host name. recv() and send() ssize_t recv(int sockfd, void *buf, size_t len, int flags); ○ Similar to read() but socket specific. ○ Provides more control, e.g.: MSG_DONTWAIT: Non-blocking MSG_PEEK: read but don't remove ssize_t send(int sockfd, const void *buf, size_t len, int flags); ○ Similar to write() but socket specific. ○ Provides more control, e.g.: MSG_DONTWAIT: Non-blocking Multiple Clients Use a non-blocking socket to handle multiple clients: ○ A non-blocking accept() will either accept a new connection or return immediately if there’s no incoming connection ○ Non-blocking read() and write() will either perform I/O or return immediately if there’s no incoming connection ○ We can then define a socket array and write an infinite loop that uses the socket array We call accept() to receive an incoming connection if any. Add the new socket to the socket array. If there's no incoming connection, accept() will return immediately. We call either read() or write() or both appropriately (depending on what we want to do with each socket) on all sockets in the array, and see if we can read or write immediately. epoll() ○ An ability to monitor file descriptors to check if a file descriptor is read for read or write We add FD’s to the monitored list Indicate what events we want to monitor the file descriptors for eg. read and write We call the function eg. select() When it returns, we check which file descriptors can perform I/O We perform the I/O ○ epoll_create() Returns an epoll instance. Like the monitor object that maintains the monitoring list ○ epoll_ctl() Allows add, remove, or modify a file descriptor to the epoll instance When you add a file descriptor, you store struct epoll_event that later you receive when the file descriptor has something new to process ○ int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout); Waits for a file descriptor to be available for I/O *events is a buffer passed to epoll_wait() Each entry is for a fd that has something new to process. There can be multiple entries if different file descriptors each have something new to process Inter-Process Communication Allows different processes (as well as threads) to communicate with each other Pipes | ○ The output of the first becomes input to the second int pipe(int filedes) ○ Returns to us two file descriptors: filedes gives us the read end. filedes gives us the write end. ○ Uses a buffer in the kernel ○ Unidirectional: Meaning that once you determine who’s the sender and who’s the receiver, you can’t switch that ○ We can use regular file I/O Parent-Child Communication: ○ Common use case for pipes ○ First we create a pipe, then we call fork() to create a child ○ Both file descriptors will be available for both the parent and the child. This is because memory is cloned ○ After that, the parent and the child can communicate with each other with the pipe ○ Each process will close the end that they do not use PIPE_BUF ○ If we call write() with less than PIPE_BUF, it will be atomic (will be performed in a single step) ○ If we call write() with more than PIP_BUF, it may be atomic (data will be written interleaving) Closing the write end will completely trigger read() to return 0 after returning everything in the buffer. ○ Can be used as a signaling mechanism where if the parent gets 0 from read(), it knows that all child processes have closed the write end int dup2(int oldfd, int newfd) ○ Used in the case where we want to redirect one program’s standard output to another program’s standard input, we can combine dup2() and pipe() ○ Creates a copy of the file descriptor oldfd using the file descriptor number specified in newfd. ○ We can redirect all our program’s standard output to the write end of the pipe If we call dup2(filedes, STDOUT_FILENO), it duplicates filedes to STDOUT_FILENO. Thus, if our program writes to STDOUT_FILENO, it will write to filedes, i.e., the write end of the pipe. FILE *popen(const char *command, const char *mode) ○ Used for when we want to create a pipe to an existing program. ○ Command: Used for when we want to execute and connect with a pipe. ○ Mode: determines read or write ○ Use pclose() to close. FIFOs Two or more related processes (parent, child, grandchild, etc.) can share a pipe as above. However, unrelated processes can’t share a pipe FIFO: a named pipe ○ int mkfifo(const char *pathname, mode_t mode) pathname is the name of the FIFO to be created. mode is the permission, same as open(). This is similar to UNIX domain sockets as it creates a file. Use unlink() to remove a FIFO, just like a file. ○ As long as a process knows the pathname, it can accesses a FIFO Unidirectional One process should open it for read and the other should open it for write POSIX Message Queues Similar to a FIFO but it is typically used to send structured data (eg. struct or union) rather than a byte stream 5 Important Functions: ○ mq_open(), mq_send(), mq_receive(), mq_close(), and mq_unlink() int mq_send(mqd_t mqdes, const char *msg_ptr, size_t msg_len, unsigned int msg_prio); A message queue sends a structured data using a pointer (msg_ptr) to the structured data. The last parameter msg_prio determines a priority of the message Memory Mapping void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset) ○ Creates a memory mapping ○ addr: starting address of the new mapping. NULL is typical to let the OS pick the address for the new mapping. ○ length: the length of the mapping. ○ prot: executable, readable, writable, or not accessible. ○ flags: MAP_SHARED, MAP_PRIVATE, and MAP_ANONYMOUS. (Explained below.) ○ fd: file to be mapped. (Explained below.) ○ offset: the offset of the file to be mapped. ○ Returns a pointer to the beginning of the new mapping. Two Types of Memory Mappings: ○ File Mapping: Maps a file to a memory region, allowing us to use a file as a memory region. File I/O becomes memory access. Instead of read()/write() calls, we can just use pointers and variables to read from or write to a file. Called a memory-mapped file ○ Anonymous Mapping: Another way to allocate memory to our process in addition to sbrk(). Shared or Private Memory Mappings: ○ Multiple processes can share a mapping: We can create a file/anonymous mapping and fork a child() since memory is cloned, the parent and the child will share the same mapping. Multiple processes can map the same file If the file has a value change, the value in memory will also change ○ Process can create a private mapping: Each process will have its own private copy We map a file to its own process. Four Cases of Memory Mapping: ○ Private File Mapping: A file is mapped to a process as a private mapping. If multiple processes map the same file, they will each have a private copy. If they write to its mapping, it won’t be written to the actual file. ○ Private Anonymous Mapping: More memory gets allocated to the calling process. Fork() copies the memory but each process has a private copy. Changes to the mapped memory are not propagated to the child (or vice versa) ○ Shared File Mappings: A file is mapped to a process as a shared mapping. If multiple processes map the same file, changes will be propagated to all the processes and the actual file will be written. ○ Shared Anonymous Mapping: More memory gets allocated to the calling process. Memory is shared and changes are propagated. int munmap(void *addr, size_t length); ○ Un,aps the mapped memory Shared Memory Allows sharing across unrelated processes. ○ mmap with MAP_SHARED | MAP_ANONYMOUS (i.e., shared anonymous) allows memory sharing for related processes via fork(). int shm_open(const char *name, int oflag, mode_t mode) ○ Opens a shared memory object ○ Name is assumed to be known and shared among processes. General rule for naming is to use the form /somename ○ Returns a file descriptor ○ Similar to file open but provides a file descriptor for a shared memory object ○ O_CREATE flag determines whether we open a new object or an existing one ○ Mode is for permissions on creation int ftruncate(int fd, off_t length) ○ When a memory object is created, the size is 0. We use this function to change the size void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset) ○ Creates a memory mapping for a shared memory object after its size has been adjusted ○ Need to give the shared memory object file descriptor as fd int munmap(void *addr, size_t length) ○ When we no longer need to access the shared memory, we can unmap it int shm_unlink(const char *name) ○ Once we are done with the shared memory, we can remove the shared memory object. ○ Removes the shared memory object from /dev/shm/ ○ If there are still processes that still use the shared memory object, they can keep using it Basic Crypto: CIA Model: ○ C - Confidentiality Information is only disclosed to those authorized to know it ○ I - Integrity Only modify info in allowed ways by authorized parties ○ A - Availability Those authorized for access are not prevented from it Cryptography: ○ Modern encryption: Algorithms are public Key are secret and provide security May be Symmetric (secret) or asymmetric (public) ○ Algorithms Goal: Given a key, it should be relatively easy to compute Window of validity ○ The minimum time to compromise a cryptographic algorithm Three Types of Cryptography: ○ Cryptographic Hash Functions: Zero Keys One way function: given a hash function, it should be difficult to find x. The reverse of the hash function should be difficult to compute, but it should be easy to compute the other way. Weak collision resistance: given a value and a hash function, it should be difficult to find another value that produces the same hash. Strong collision resistance: given a hash function, it should be difficult to find two values that produce the same hash. Eg. SHA-256 ○ Secret-Key Functions: One Key Symmetric Key Crypto: One key is shared between encryption and decryption Encryption with shared key, decryption with shared key ○ Public-key functions: two keys Two keys: Public Key: can be known to anyone, used to encrypt and verify signatures Private Key: should be only known to the owner of the key, used to decrypt and sign signatures RSA Applications: ○ Storing passwords: When checking a password, check hash(typed) == stored hash Salt: Added when hashing. A random number added to input, stored together with hash. Avoids pre-computation of all possible hashes in rainbow tables ○ Secure-Digest Used to verify a downloaded file Fixed-length that characterizes an arbitrary length message Usually produced by a hash function ○ Digital Signature: Combines public key crypto and hashing Verifies a message or a document is unaltered and produced by a signer Two parties: Signer ○ Computes a message digest: h(m) ○ Encrypts the digest with the private key Signing: The signer is the only one who can do this ○ Sends the message and the signature Verifier ○ Typically the message recipient and verifies that the message was indeed sent by the signer ○ Takes the message and computes a message digest and verifies the equality ○ Digital Certificate: Digital certificates ensure secure communication by proving the authenticity of a public key. HTTP vs HTTPS: HTTPS has encryption ○ Instagram sends its public key inside a digital certificate. ○ Your OS verifies the authenticity of the certificate. ○ Once verified, your browser encrypts data using Instagram's public key, ensuring only Instagram (with the corresponding private key) can decrypt it. How they Work: Certificate Creation: ○ Instagram provides its public key to a trusted Certificate Authority (CA) like VeriSign. ○ The CA creates a document stating, "This public key belongs to Instagram," and signs it with its private key. ○ This signed document is the digital certificate. Certificate Verification: ○ Instagram sends this certificate to your browser. ○ The certificate contains the CA's signature. ○ To verify the signature, your OS uses the CA's public key, which is pre-installed in the OS (e.g., VeriSign’s public key). ○ If verified, you know Instagram’s public key is legitimate. Chain of Trust and the Root of Trust ○ We rely on the chain of trust In order to trust the public key sent by Instagram, we need to trust VeriSign’s signature In order to trust VeriSign’s signature we need to trust VeriSign’s public key shipped with our OS In order to trust VeriSign’s public key, we need to trust that our OS is not compromised RPC (Remote Procedure Call): Appears as if the programmer were calling a local function Mechanism to enable function calls between different processes RPC System: ○ IDL: Interface Definition Language Used to define a server function and data structures for parameters and return values IDL File that contains function definitions and such -> IDL Compiler -> Creates the Client and Server Library ○ Generates a Client and Server Stub ○ Client Stub: Client-side library that contains a function to be calls. The client will call on this function to communicate with the server. It will then use the socket API to communicate with the server. ○ Server Stub: Server-side library that communicates with a client using the socket API. Also defines a function to be executed When a new client requests comes in through the socket API, the server stub makes a call to execute it. ○ Marshalling/Unmarshalling: Take parameters as they are and transform them into a byte stream so it can send it to the client/server. Same as serialization and deserialization. Marshalling: client -> server. Turning data into byte stream Unmarshalling: server takes in what it got and transfers to the data structure Invocation Semantics: ○ Local Calls: exactly once, you call it, you execute it, get response, end. ○ Remote Call: Possible Failures: Lost request, Lost Reply, Crash before execution and reply, Client Crash Client Side Failures: Restransmit the request when a response is not received Server-side Failures: Re-execution: simply undo operation, but risk duplicate processing Duplicate Filtering: ○ Ensures no duplicate actions, requiring more implementation ○ Types of Invocation Semantics: Depending on which server-side mechanism is used. Maybe: An RPC call may be invoked at the server side. This is when no fault-tolerance mechanisms are used. At-most-once: An RPC call will be invoked at most once. This is when (client-side) request retransmission and (server-side) duplicate filtering are combined. Zero-or-more: An RPC call will be invoked zero or more times. This is when (client-side) request retransmission and (server-side) re-execution are combined. RPC is ideal for idempotent functions. An idempotent function is a function that (i) produces the same output even if it is called multiple times, given the input is the same, and (ii) doesn't alter the system state after the first execution. ○ For RPC, an idempotent function is safe even with re-executions and there are no side effects.