Questions and Answers
What is the primary motivation for using condition variables?
What is a common issue with using mutexes to synchronize access to a global state?
What is the purpose of a condition variable in synchronization?
Why is active waiting not a suitable solution for synchronization?
What happens when the counter reaches the thread count in the barrier pseudocode?
What is necessary to associate with a condition variable?
What problem can condition variables be used to solve?
What is the purpose of the 'notEmptyforConsumer' condition variable in the producer-consumer example?
What happens when a producer thread finds the buffer full in the producer-consumer example?
What is the purpose of the mutex in the producer-consumer example?
What is the role of the 'cond_wait' function in the barrier pseudocode?
What happens when a consumer thread finds the buffer empty in the producer-consumer example?
What is the purpose of the 'cond_broadcast' function in the barrier pseudocode?
What is the primary goal of the Mutual Exclusion mechanism in thread synchronization?
In the Producer-Consumer problem, what is the main condition for the Consumer thread to consume resources?
In the Reader-Writer problem, what is the restriction when a Writer thread wants to alter the data?
What is the purpose of the barrier() function in thread synchronization?
In the Producer-Consumer problem, what happens when the shared buffer is full and the Producer thread wants to produce more resources?
What is a key feature of the Mutual Exclusion mechanism?
In the Reader-Writer problem, what is the main advantage of allowing multiple Reader threads to access the data concurrently?
What is the main difference between the Producer-Consumer problem and the Reader-Writer problem?
What is the primary issue with using active waiting to synchronize access to a global state?
In the context of the Producer-Consumer problem, what is the purpose of the mutex?
What is a key feature of barrier synchronization?
What is the main advantage of using condition variables over active waiting?
In the context of the Reader-Writer problem, what is the main restriction when a Writer thread wants to alter the data?
What is the purpose of associating a mutex with a condition variable?
What is the purpose of the pthread_cond_wait function?
What happens when a thread invokes the pthread_cond_signal function?
What is the purpose of the pthread_cond_broadcast function?
What is the type of auxiliary structure used for condition variables?
What is the purpose of the mutex in the producer-consumer problem?
What problem can be solved using condition variables and mutexes?
In the barrier pseudocode, what happens when the counter reaches zero?
What is the purpose of the 'notFullforProducer' condition variable in the producer-consumer example?
In the producer-consumer problem, what happens when a consumer thread finds the buffer empty?
What is the main difference between the producer-consumer problem and the reader-writer problem?
In the barrier pseudocode, what is the purpose of the 'cond_wait' function?
What is the purpose of the mutex in the producer-consumer example?
In the reader-writer problem, what is the main advantage of allowing multiple reader threads to access the data concurrently?
What is the primary goal of the mutual exclusion mechanism in thread synchronization?
What is the purpose of the MPI_Init function?
What is the purpose of the MPI_Comm_rank function?
What is the purpose of the MPI_Datatype argument in the MPI_Send function?
What is the purpose of the mpirun command?
What is the purpose of the MPI_Recv function?
What is the purpose of the MPI_Status argument in the MPI_Recv function?
What is the purpose of the MPI_Comm_size function?
What is the purpose of the --use-hwthread-cpus option in the mpirun command?
What is the purpose of the MPI_Get_count function?
What is the difference between the Send function and the Ssend function?
What is necessary for a message communication to be successful?
What is the primary motivation for using collectives?
What is the purpose of the MPI_Datatype parameter in the MPI_Get_count function?
What happens if the next message received does not match the reception parameters?
What is the behavior of the Recv function?
What is the purpose of the status parameter in the MPI_Get_count function?
What is the purpose of the MPI_Init function?
What is the purpose of the MPI_Comm_rank function?
What is the purpose of the Open MPI library?
What is the purpose of the MPI_Finalize function?
What is the purpose of the mpirun script?
What is the purpose of the MPI_Comm_size function?
What is the purpose of the #include header file?
What is the purpose of the MPI_COMM_WORLD constant?
What happens when a kernel is launched in CUDA?
What type of memory does the host have in the CUDA memory model?
What is the purpose of blockIdx and threadIdx?
How are blocks scheduled in a grid?
What function is used to allocate memory in the device's global memory?
What is the purpose of the cudaMemcpy function?
What is the primary advantage of using GPUs for parallel computing?
What is a warp?
How many threads can be in a block?
What is the direction of data transfer when cudaMemcpyHostToDevice is used?
What is the role of the CUDA compiler in the CUDA programming architecture?
What is a CUDA kernel?
What is the purpose of gridDim?
Where is local thread memory physically stored?
How are threads organized in CUDA?
What is the purpose of the __global__ tag in a CUDA kernel function?
What is the storage type of automatic scalar variables in CUDA?
What is the significance of the number of Streaming Multiprocessors (SM)?
What is the result of compiling a CUDA program?
What is the difference between the host and device in a CUDA program?
What is the purpose of launching a kernel function on the device?
What is a key feature of GPUs that makes them suitable for parallel computing?
What is the primary advantage of using OpenMP over Pthreads?
What is the purpose of the #pragma omp parallel directive?
What is the purpose of the -fopenmp compiler option?
What is the purpose of the omp_get_thread_num() function?
What is the purpose of the num_threads clause in the #pragma omp parallel directive?
What happens when the code following the #pragma omp parallel directive is finished?
What is the purpose of the omp.h header file?
What is the purpose of the #ifdef _OPENMP pre-compiler directive?
What is the primary operating system assumed by the Pthreads library?
What is the purpose of the pthread_create function?
What is the purpose of the linker option '-lpthread'?
What is the type of the 'start_routine' argument in the pthread_create function?
What is the purpose of the pthread_join function?
What is the purpose of the 'attr_p' argument in the pthread_create function?
What is the purpose of the 'thread_p' argument in the pthread_create function?
What is the return type of the 'start_routine' function?
What is the primary purpose of the mutual exclusion mechanism in thread synchronization?
What is a critical section in the context of threads?
Why is it necessary to use mutual exclusion mechanisms in parallel programming?
What is a race condition in the context of threads?
What is the purpose of implementing a parallel version of a program?
What is the expected performance improvement of a parallel program with n threads compared to its sequential version?
What is the primary motivation for using mutual exclusion mechanisms in parallel programming?
What is the purpose of implementing a critical section in a parallel program?
What is the primary purpose of the reduction clause in OpenMP?
What is a characteristic of shared variables in OpenMP?
What is the purpose of the critical directive in OpenMP?
What happens to local variables created within a parallel section in OpenMP?
What is a disadvantage of using critical sections in OpenMP?
What is a characteristic of private variables in OpenMP?
What is the purpose of the reduction clause in OpenMP, in terms of synchronization?
What is the difference between a shared variable and a private variable in OpenMP?
Study Notes
Thread Execution Ordering
- Threads sometimes have to respect an implicit execution order, even though they run concurrently
- Mutual Exclusion mechanism guarantees that only one thread executes code that manipulates the same resource at a time
- Typical cases of synchronization:
- Producer - Consumer
- Reader - Writer
- Barrier
Producer-Consumer
- Problem consists of two types of threads: Producer and Consumer
- Producer produces resources and stores them in a shared buffer
- Consumer consumes produced resources from the shared buffer
- Conditions:
- Only resources that have already been produced can be consumed
- Production is limited by the space available to store produced resources
- Example: Bounded Buffer
- Producer generates items for a buffer with limited size
- Consumer extracts items from the buffer
- Cases:
- Consumer can only execute when there is at least 1 item in the buffer
- Producer can only execute if the buffer has space available for 1 more item
Producer-Consumer Pseudo-code
- uses mutex_t mutex, cond_t notEmptyforConsumer, and cond_t notFullforProducer
- variables: buffer, firstptr, lastptr, count, and buf_size
- produce function:
- waits for buffer to have space available
- adds item to buffer
- signals notEmptyforConsumer
- consume function:
- waits for buffer to have at least 1 item
- removes item from buffer
- signals notFullforProducer
Reader-Writer
- Problem with threads of two types: Readers and Writers
- Restrictions:
- several readers can access data concurrently without creating inconsistencies
- when a writer wants to alter the data, there can be no simultaneous access
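The two restrictions above can be illustrated with a POSIX read-write lock. This is only a sketch under assumptions: the notes do not say which primitive the course uses, and the names `shared_data`, `reader`, `writer`, `NUM_READERS` and `run_reader_writer_demo` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes pthread_rwlock_* in strict C modes */
#include <pthread.h>
#include <stddef.h>

#define NUM_READERS 4            /* hypothetical value chosen for the sketch */

static pthread_rwlock_t rwlock = PTHREAD_RWLOCK_INITIALIZER;
static int shared_data = 0;      /* data protected by rwlock */

/* Reader: several readers may hold the read lock at the same time. */
static void *reader(void *arg) {
    int *out = arg;
    pthread_rwlock_rdlock(&rwlock);
    *out = shared_data;
    pthread_rwlock_unlock(&rwlock);
    return NULL;
}

/* Writer: the write lock excludes all readers and all other writers. */
static void *writer(void *arg) {
    int value = *(int *)arg;
    pthread_rwlock_wrlock(&rwlock);
    shared_data = value;
    pthread_rwlock_unlock(&rwlock);
    return NULL;
}

/* Runs one writer to completion, then NUM_READERS concurrent readers.
 * Returns how many readers observed the written value. */
int run_reader_writer_demo(int value) {
    pthread_t w, r[NUM_READERS];
    int seen[NUM_READERS];
    int matches = 0;

    pthread_create(&w, NULL, writer, &value);
    pthread_join(w, NULL);

    for (int i = 0; i < NUM_READERS; i++)
        pthread_create(&r[i], NULL, reader, &seen[i]);
    for (int i = 0; i < NUM_READERS; i++) {
        pthread_join(r[i], NULL);
        if (seen[i] == value)
            matches++;
    }
    return matches;
}
```

Calling `run_reader_writer_demo(42)` from `main` should report that all four readers saw 42, since the writer is joined before the readers start.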
Barrier
- Problem of synchronization between threads
- Threads invoke barrier() function, which blocks execution until all threads are in this position
- When all threads execute this function, it terminates and unblocks for all threads
- Acts as a synchronization point in the execution of each thread
Barrier Example using Condition Variables
- uses mutex_t mutex and cond_t cond_var
- Pseudocode:
- increments counter
- if counter == thread_count, signals all threads and resets counter
- else, waits for broadcast()
Condition Variables
- allows threads to suspend execution until a certain event or condition occurs
- when the event occurs, a signal wakes the blocked threads so they can continue execution
- must be associated with a mutual exclusion (mutex) mechanism
- avoids both active waiting and holding the mutex (which would block other threads) while a thread checks a global state condition
Motivation for Condition Variables
- Sometimes, threads need to check that a global condition is fulfilled.
- Access to a common global state is necessary, but it also requires waiting for the state to change by another thread.
- This cannot be done with a mutex alone: a thread that keeps the mutex locked while waiting would prevent other threads from changing the global state.
Condition Variables
- A condition variable is a structure that allows threads to suspend execution until a certain event occurs.
- It must be associated with a mutual exclusion (mutex) mechanism.
- When the event occurs, a signalling mechanism "wakes up" the blocked threads to continue execution.
Condition Variables API
- pthread_cond_wait(cv, mt): passive waiting for the event to occur.
- pthread_cond_signal(cv): signalling to wake up one blocked thread.
- pthread_cond_broadcast(cv): signalling to wake up all blocked threads.
- pthread_cond_init(): creation of a condition variable.
- pthread_cond_destroy(): destruction of a condition variable.
- pthread_cond_t: auxiliary structure type.
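A minimal sketch of how these calls fit together, assuming a single boolean event. The names `event_happened`, `waiter` and `run_wait_signal_demo` are hypothetical, and the static initializers `PTHREAD_MUTEX_INITIALIZER` and `PTHREAD_COND_INITIALIZER` are used here in place of `pthread_cond_init()` and `pthread_cond_destroy()`.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

static pthread_mutex_t mt = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static bool event_happened = false;    /* global state guarded by mt */

/* Waiter: sleeps passively until event_happened becomes true. */
static void *waiter(void *arg) {
    int *result = arg;
    pthread_mutex_lock(&mt);
    while (!event_happened)            /* loop guards against spurious wakeups */
        pthread_cond_wait(&cv, &mt);   /* releases mt while blocked, reacquires it on return */
    *result = 1;
    pthread_mutex_unlock(&mt);
    return NULL;
}

/* Starts a waiter, raises the event, and returns 1 if the waiter saw it. */
int run_wait_signal_demo(void) {
    pthread_t t;
    int result = 0;

    pthread_mutex_lock(&mt);
    event_happened = false;            /* reset so the demo can be repeated */
    pthread_mutex_unlock(&mt);

    pthread_create(&t, NULL, waiter, &result);

    pthread_mutex_lock(&mt);
    event_happened = true;             /* change the state with the mutex held */
    pthread_cond_signal(&cv);          /* wake one blocked thread, if any */
    pthread_mutex_unlock(&mt);

    pthread_join(t, NULL);
    return result;
}
```

The waiter checks the condition in a `while` loop rather than an `if`, so it behaves correctly whether it starts before or after the signal is sent.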
Condition Variables Usage Pattern
- Problem: synchronization between threads without active waiting.
- Threads invoke the wait() function, which only returns when the event is signalled.
- All threads update the global state (with the mutex held) and block.
- The last thread to detect the event signals all the others using broadcast().
Barrier Example using Condition Variables
- Pseudocode:
- Mutex lock
- Counter increment
- If counter equals thread count, reset counter and broadcast
- Else, wait on condition variable
- Mutex unlock
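The pseudocode above could be written in C roughly as follows. This is a sketch, not the course's reference code: the `generation` variable is an addition (not in the pseudocode) that makes the barrier reusable and robust to spurious wakeups, and `THREAD_COUNT`, `worker`, `phase1_done` and `run_barrier_demo` are hypothetical names.

```c
#include <pthread.h>
#include <stddef.h>

#define THREAD_COUNT 4       /* hypothetical number of threads */

static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond_var = PTHREAD_COND_INITIALIZER;
static int counter = 0;      /* threads that have reached the barrier */
static int generation = 0;   /* assumption: bumped each time the barrier opens */
static int phase1_done = 0;  /* work finished before the barrier */

void barrier(void) {
    pthread_mutex_lock(&mutex);
    int my_generation = generation;
    counter++;
    if (counter == THREAD_COUNT) {
        counter = 0;                         /* reset for the next use */
        generation++;
        pthread_cond_broadcast(&cond_var);   /* release every waiting thread */
    } else {
        while (my_generation == generation)  /* ignore spurious wakeups */
            pthread_cond_wait(&cond_var, &mutex);
    }
    pthread_mutex_unlock(&mutex);
}

/* Each worker does some "phase 1" work, waits at the barrier, then checks
 * that every thread finished phase 1. */
static void *worker(void *arg) {
    int *saw_all = arg;
    pthread_mutex_lock(&mutex);
    phase1_done++;
    pthread_mutex_unlock(&mutex);

    barrier();

    pthread_mutex_lock(&mutex);
    *saw_all = (phase1_done == THREAD_COUNT);
    pthread_mutex_unlock(&mutex);
    return NULL;
}

/* Returns how many threads observed all phase-1 work after the barrier. */
int run_barrier_demo(void) {
    pthread_t threads[THREAD_COUNT];
    int saw_all[THREAD_COUNT];
    int total = 0;

    phase1_done = 0;
    for (int i = 0; i < THREAD_COUNT; i++)
        pthread_create(&threads[i], NULL, worker, &saw_all[i]);
    for (int i = 0; i < THREAD_COUNT; i++) {
        pthread_join(threads[i], NULL);
        total += saw_all[i];
    }
    return total;
}
```

Because no thread leaves `barrier()` until the last one arrives, every worker should see `phase1_done == THREAD_COUNT` afterwards.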
Producer-Consumer Example
- Bounded buffer example:
- Producer generates items for a buffer with limited size.
- Consumer extracts items from the same buffer.
- Cases:
- Consumer can only execute when there is at least 1 item in the buffer.
- Producer can only execute if the buffer has space available for 1 more item.
Producer-Consumer Bounded Buffer Example (Pseudocode)
- produce() function:
- Mutex lock
- Wait while buffer is full
- Add item to buffer
- Signal to consumer
- Mutex unlock
- consume() function:
- Mutex lock
- Wait while buffer is empty
- Extract item from buffer
- Signal to producer
- Mutex unlock
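One possible C rendering of produce() and consume(), reusing the variable names listed earlier in these notes (mutex, notEmptyforConsumer, notFullforProducer, buffer, firstptr, lastptr, count). It is a sketch under assumptions: the capacity is a macro `BUF_SIZE` rather than a `buf_size` variable, items are plain `int` values, and `NUM_ITEMS`, `producer_thread` and `run_producer_consumer_demo` are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

#define BUF_SIZE 4      /* hypothetical buffer capacity */
#define NUM_ITEMS 100   /* hypothetical number of items for the demo */

static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t notEmptyforConsumer = PTHREAD_COND_INITIALIZER;
static pthread_cond_t notFullforProducer = PTHREAD_COND_INITIALIZER;
static int buffer[BUF_SIZE];
static int firstptr = 0, lastptr = 0, count = 0;

void produce(int item) {
    pthread_mutex_lock(&mutex);
    while (count == BUF_SIZE)                       /* buffer full: wait passively */
        pthread_cond_wait(&notFullforProducer, &mutex);
    buffer[lastptr] = item;
    lastptr = (lastptr + 1) % BUF_SIZE;             /* circular buffer */
    count++;
    pthread_cond_signal(&notEmptyforConsumer);      /* at least one item available */
    pthread_mutex_unlock(&mutex);
}

int consume(void) {
    pthread_mutex_lock(&mutex);
    while (count == 0)                              /* buffer empty: wait passively */
        pthread_cond_wait(&notEmptyforConsumer, &mutex);
    int item = buffer[firstptr];
    firstptr = (firstptr + 1) % BUF_SIZE;
    count--;
    pthread_cond_signal(&notFullforProducer);       /* one slot has been freed */
    pthread_mutex_unlock(&mutex);
    return item;
}

static void *producer_thread(void *arg) {
    (void)arg;
    for (int i = 1; i <= NUM_ITEMS; i++)
        produce(i);
    return NULL;
}

/* One producer thread sends 1..NUM_ITEMS; the caller consumes them all
 * and returns their sum. */
long run_producer_consumer_demo(void) {
    pthread_t p;
    long sum = 0;

    pthread_create(&p, NULL, producer_thread, NULL);
    for (int i = 0; i < NUM_ITEMS; i++)
        sum += consume();
    pthread_join(p, NULL);
    return sum;
}
```

With `NUM_ITEMS` set to 100 the demo should return 5050, the sum of 1 through 100, regardless of how the two threads interleave.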
MPI Basics
- MPI stands for Message Passing Interface, a standard for message passing in parallel computing
- MPI is used for high-performance computing in parallel architectures with multiple nodes, each with its own processor and memory
Compilation and Execution
- Compile MPI programs using the mpicc wrapper around the system compiler (e.g., gcc)
- Execute MPI programs with the mpirun command, specifying the number of processes (e.g., mpirun -n 4 ./exemplo)
Point-to-Point Communication
- Communication between processes occurs through sending and receiving messages
- Each process has a unique rank value
- Processes execute different parts of the same code using if statements
- Send and receive calls must be matched: every send needs a corresponding receive
MPI Send
- MPI_Send function sends a message from one process to another
- Parameters:
- buf: memory pointer to data
- count: number of elements in the message
- datatype: type of data sent (MPI constant)
- dest: rank of the destination process
- tag: tag (integer value) used to distinguish message channels
- comm: process group (usually MPI_COMM_WORLD)
MPI Receive
- MPI_Recv function receives a message from another process
- Parameters:
- buf: memory pointer to receive message
- count: maximum number of possible elements to receive
- datatype: type of data of the message
- source: rank of the sender process (use MPI_ANY_SOURCE to accept any sender)
- tag: message tag (use MPI_ANY_TAG to accept any tag)
- comm: set of processes in communication (usually MPI_COMM_WORLD)
- status: status of the result of the operation, to be consulted later
MPI DataTypes
- Data types in MPI must match between sender and receiver
- MPI_Get_count function returns the number of elements received in a message, based on its status and datatype
Message Exchange Example
- Synchronization model:
- MPI_Send can be synchronous or deferred (buffered), depending on the implementation
- MPI_Recv is always synchronous and blocking
- MPI_Ssend function is always synchronous and blocking, ensuring the message reaches the destination
Collectives
- Collectives are optimized functions for group communication between all processes
- Motivation: need to exchange messages between all processes, not just two
MPI API Initialization
- MPI_Init function must be invoked before any other MPI function in the program
- MPI_Finalize function terminates the MPI library in the process
- MPI_Comm_rank function returns a process identifier within the process set
- MPI_Comm_size function returns the size of the process set
- MPI_COMM_WORLD is a constant representing the set of all processes in an execution
Parallel Programming on GPU
- GPUs are capable of massive data parallelism through SIMD (Single Instruction, Multiple Data)
- Compared to CPUs, GPUs have more parallel ALU units, focused on arithmetic data operations
- GPUs are not suitable for task parallelism, i.e., running different operations concurrently
CUDA Programming Architecture
- CUDA program integrates two types of processors: host (CPU) and device (GPU)
- Application code includes host and device codes
- Device variables and code are marked with keywords
- CUDA annotated program is not a valid C program
- CUDA compiler (nvcc) takes .cu input program and outputs:
- Standard C host-only source for host compiler
- Device code for GPU compiler
CUDA Kernel and Thread
- Kernel represents the device function that will be launched into the GPU on multiple threads
- Kernel is described as a standard C/C++ function with the __global__ tag
- Kernel function is specified as C code, returns void, and parameters serve as input and output
- Local variables are independent for each thread
Kernel Launch
- Execution of a kernel function on the device is named "kernel launch"
- Host starts the launch, kernel specifies the code for a single thread
- Launching a kernel specifies the number of threads to be used
Thread Hierarchy
- Threads are organized in a two-level hierarchy: grid and block
- When launching a kernel, a grid is created
- A grid can have multiple blocks of threads (up to 3 dimensions)
- A block can have multiple threads (up to 3 dimensions)
- Example: launching a grid with 4 blocks, each with 32 threads
Thread Identification
- Each thread has available the following private variables:
- gridDim: grid dimension (x, y, z coordinates)
- blockDim: block dimension (x, y, z coordinates)
- blockIdx: block identification (per thread)
- threadIdx: thread identification
Block-oriented Kernel Scheduling
- Scheduling of kernel within a grid is block-oriented
- Blocks are required to run independent from each other
- Device can schedule them in any order
- Depending on hardware resources, multiple blocks can run at a time
- SM (Streaming Multiprocessor) schedules a group of threads of a block in parallel, called a warp
- A warp has 32 threads on current NVIDIA hardware
CUDA Memory Model
- Host has its own RAM: host memory
- Device has its own RAM: global memory
- Data transfer is necessary between these memories:
- Initially from host to device (with input data)
- In the end, from device to host (with the result)
Device Memory Initialisation & Transfer
- Memory in the device Global Memory needs to be allocated from the host
- cudaMalloc() and cudaFree() are used for allocation and deallocation
- cudaMemcpy() is used for data transfer between host and device
- Direction of transfer can be specified: cudaMemcpyHostToDevice or cudaMemcpyDeviceToHost
CUDA Device Memory Layout
- The storage type, scope, and lifetime of device variables are specified by tags (qualifiers)
- Automatic scalar (non-array) vars are register-based
- Local thread memory is physically on Global Memory
OpenMP Motivation
- A library for developing parallel applications using shared memory
- Provides high-level abstraction for adapting sequential applications in a simple way
- Alternative to Pthreads, which has a more complex API
Pragma Mechanism in C
- OMP commands are based on pragmas
- Pragmas allow adding functionality to a C program without affecting its generic compilation
- The #pragma omp directive is used
- Compilers that support these directives make use of them, while those that don't ignore them and compile the program as usual
- Correctly coded OpenMP programs always compile on any platform, even without support
Compiler Extensions for OpenMP
- Auxiliary library "omp.h" is included with #include <omp.h>
- Pre-compiler directive to validate support: #ifdef _OPENMP ... #else ... #endif
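The guard can be sketched as below, so that the same file builds with or without OpenMP support. The wrapper names `current_thread_id` and `current_thread_count` are hypothetical, and the fallback values of 0 and 1 are an assumption about what a sequential build should report.

```c
#ifdef _OPENMP
#include <omp.h>   /* only available when the compiler enables OpenMP */
#endif

/* Returns the calling thread's id, or 0 when compiled without OpenMP. */
int current_thread_id(void) {
#ifdef _OPENMP
    return omp_get_thread_num();
#else
    return 0;
#endif
}

/* Returns the size of the current thread team, or 1 without OpenMP. */
int current_thread_count(void) {
#ifdef _OPENMP
    return omp_get_num_threads();
#else
    return 1;
#endif
}
```

Outside a parallel region both builds agree: the thread id is 0 and the team size is 1.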
Compiling and Executing OpenMP Programs
- Compiling option: -fopenmp
- Example: gcc -fopenmp hello_omp.c -o hello_omp
- Linker option: -lomp (optional if including -f)
Parallel Directive
- #pragma omp parallel creates multiple threads of execution in the same process
- Each thread executes the code immediately following the pragma
- When the code is finished, all threads wait for the completion of the remaining threads
- Example code and result:
- Code:
  #pragma omp parallel
  printf("This is thread %d, num threads %d\n", omp_get_thread_num(), omp_get_num_threads());
  printf("End\n");
- Result:
  This is thread 2, num threads 3
  This is thread 1, num threads 3
  This is thread 0, num threads 3
  End
num_threads Option
- #pragma omp parallel num_threads(thrcnt) defines the number of threads used to parallelize the application
- If not specified, the number of threads created is set by the system during execution (e.g., number of cores)
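A small sketch of the num_threads clause, assuming a hypothetical helper `count_threads` that counts how many threads actually execute the parallel region. The `critical` directive used to protect the counter is not covered in this part of the notes.

```c
/* Counts how many threads execute the parallel region.
 * thrcnt must be a positive integer. */
int count_threads(int thrcnt) {
    int total = 0;                /* shared by all threads in the region */
#pragma omp parallel num_threads(thrcnt)
    {
#pragma omp critical
        total++;                  /* one increment per thread, done one at a time */
    }
    return total;
}
```

Compiled with -fopenmp, `count_threads(3)` normally returns 3, although the runtime may provide fewer threads. Compiled without OpenMP, the pragmas are ignored and the function returns 1.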
Pthreads Overview
- Pthreads is a library for developing parallel applications using shared memory.
- It assumes a POSIX-compliant operating system as its base.
- The library can be used from several programming languages, most commonly C.
Compilation and Execution of Pthreads Programs
- To compile a Pthreads program, include the library header using #include <pthread.h>.
- Use the linker option -lpthread to link the pthread library.
- Example compilation command: gcc hello.c -o hello -lpthread (the library is listed after the source file so that the linker can resolve its symbols).
Pthread API to Create and Join Threads
- pthread_create function is used to create a new thread.
- Syntax: pthread_create(pthread_t* thread_p, const pthread_attr_t* attr_p, void* (*start_routine)(void*), void* arg_p).
- thread_p: thread object reference.
- attr_p: creation attributes, usually set to NULL.
- start_routine: function to execute in the new thread.
- arg_p: function argument.
- pthread_join function is used to wait for a thread to finish.
- Syntax: pthread_join(pthread_t thread, void** ret_val_p).
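The two calls above can be combined as in the following sketch. The worker_arg struct, the square_rank routine, and sum_of_squared_ranks are hypothetical names chosen for illustration; only pthread_create and pthread_join come from the API described above.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread argument: rank goes in, result comes out. */
typedef struct {
    long rank;
    long result;
} worker_arg;

/* Thread routine with the void* (*)(void*) shape pthread_create expects. */
static void *square_rank(void *arg_p) {
    worker_arg *arg = (worker_arg *)arg_p;
    arg->result = arg->rank * arg->rank;
    return NULL;
}

/* Creates thread_count threads, joins them all, and returns the sum of
 * their results, or -1 on failure. */
long sum_of_squared_ranks(int thread_count) {
    if (thread_count <= 0) return -1;

    pthread_t *threads = malloc((size_t)thread_count * sizeof *threads);
    worker_arg *args = malloc((size_t)thread_count * sizeof *args);
    long total = 0;
    int created = 0;

    if (threads == NULL || args == NULL) {
        free(threads);
        free(args);
        return -1;
    }
    for (int i = 0; i < thread_count; i++) {
        args[i].rank = i;
        args[i].result = 0;
        /* attr_p is NULL, so default creation attributes are used */
        if (pthread_create(&threads[i], NULL, square_rank, &args[i]) != 0) break;
        created++;
    }
    for (int i = 0; i < created; i++) {
        pthread_join(threads[i], NULL);  /* wait before reading the result */
        total += args[i].result;
    }
    free(threads);
    free(args);
    return created == thread_count ? total : -1;
}
```

Each thread writes only to its own worker_arg, and the results are read after the join, so no further synchronization is needed in this sketch.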
Example Incremental Application
- The example application demonstrates parallel processing using multiple threads.
- Global variables: n (number of iterations), thread_count (number of threads), and sum (global sum value).
- The Increment function is the thread operation that each thread executes.
- Each thread calculates a portion of the total sum based on its rank and the number of iterations.
Parallel Programming Challenge
- Develop a program that increments a global variable N times
- Sequential version uses a single thread, while parallel version uses more than 1 thread to increment a global counter variable
- Correctness: the result of both applications must always be the same
- Performance: the parallel version is expected to be n times faster for n threads
Sequential Version
- Implement code that increments a global variable
- Consider parameter #iterations
- Time measurement serves for comparison
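A minimal sketch of the sequential version with time measurement is shown below. The names counter, increment_sequential, and time_sequential are hypothetical, and the C11 function timespec_get is assumed as the wall-clock source.

```c
#include <time.h>

long counter = 0;  /* the global variable being incremented */

/* Increments the global counter `iterations` times; returns its final value. */
long increment_sequential(long iterations) {
    counter = 0;
    for (long i = 0; i < iterations; i++)
        counter++;
    return counter;
}

/* Wall-clock seconds taken by increment_sequential(iterations). */
double time_sequential(long iterations) {
    struct timespec start, end;

    timespec_get(&start, TIME_UTC);  /* C11 wall-clock timestamp */
    increment_sequential(iterations);
    timespec_get(&end, TIME_UTC);

    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

The measured time gives the baseline against which a parallel version can be compared; an optimizing compiler may simplify such a trivial loop, so the numbers are only indicative.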
Parallel Version
- Implement parallel version that makes use of threads
- Allow configuring the number of threads and distribute the cycle size among the threads
Race Condition and Critical Section
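- A race condition arises when several threads access the same shared variable at the same time and at least one of them modifies it
- Because an update such as an increment is carried out by the hardware as separate read, modify, and write steps, interleaved updates can leave shared variables with inconsistent values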
- A critical section is the code executed by several threads that accesses a shared variable and must be "protected" from simultaneous access
Mutual Exclusion Mechanism
- Mechanisms are needed to ensure that only one thread at a time can access a resource shared between threads
- Mutual exclusion mechanisms prevent multiple threads from executing critical sections simultaneously
Solution to Get it Correct in the Parallel Version
- Use mutual exclusion mechanism to prevent threads from freely accessing a shared variable
- Identify:
- Shared variable
- Thread code that accesses the variable
- Surround this code with a mutual exclusion mechanism
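The steps above can be sketched with a Pthreads mutex, which is one possible mutual exclusion mechanism and an assumption of this example. The names counter_lock, increment_worker, increment_parallel, and the MAX_THREADS bound are hypothetical.

```c
#include <pthread.h>

#define MAX_THREADS 64  /* assumed upper bound, for illustration only */

static long counter = 0;  /* the shared variable */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Thread code that accesses the shared variable. */
static void *increment_worker(void *arg_p) {
    long my_iterations = *(long *)arg_p;
    for (long i = 0; i < my_iterations; i++) {
        pthread_mutex_lock(&counter_lock);    /* enter the critical section */
        counter++;
        pthread_mutex_unlock(&counter_lock);  /* leave the critical section */
    }
    return NULL;
}

/* Splits `iterations` among thread_count threads; returns the final counter
 * value, or -1 on failure. */
long increment_parallel(long iterations, int thread_count) {
    pthread_t threads[MAX_THREADS];
    long shares[MAX_THREADS];
    int created = 0;

    if (thread_count < 1 || thread_count > MAX_THREADS) return -1;
    counter = 0;
    for (int t = 0; t < thread_count; t++) {
        /* the first ranks absorb the remainder of the division */
        shares[t] = iterations / thread_count
                  + (t < iterations % thread_count ? 1 : 0);
        if (pthread_create(&threads[t], NULL, increment_worker, &shares[t]) != 0)
            break;
        created++;
    }
    for (int t = 0; t < created; t++)
        pthread_join(threads[t], NULL);
    return created == thread_count ? counter : -1;
}
```

Locking around every single increment keeps the result correct but makes the threads take turns, so this version is not expected to run faster than the sequential one.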
Mutual Exclusion Mechanisms in OpenMP
- Variable Visibility: shared, private
- Shared variables are accessed by all threads, while private variables are internal to each thread and only visible to the thread itself
- Changes to shared variables are visible to all other threads
- Private variables have a different value for each thread
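The difference between shared and private variables might look like this in code. The function fill_scaled_ids and the request for 4 threads are assumptions made for illustration.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Sequential fallbacks so the sketch still builds without OpenMP. */
static int omp_get_thread_num(void) { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Each thread computes tid * 10 in its own private copy of `scaled` and
 * publishes it through the shared array. Returns the team size. */
int fill_scaled_ids(int *results, int capacity) {
    int team = 1;    /* shared: written by thread 0, read after the region */
    int scaled = 0;  /* private below: every thread works on its own copy */

    #pragma omp parallel num_threads(4) shared(results, team) private(scaled)
    {
        int tid = omp_get_thread_num();
        scaled = tid * 10;  /* no interference between threads */
        if (tid == 0) team = omp_get_num_threads();
        if (tid < capacity) results[tid] = scaled;
    }
    return team;
}
```

Writes to results and team are visible after the region because both are shared, while each thread's copy of scaled disappears when the region ends.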
Critical Directive
- The #pragma omp critical directive marks a "critical section" that is protected by a mutual exclusion mechanism between all threads
- It ensures that only one thread at a time can execute the code
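A minimal sketch of the critical directive protecting a shared counter follows. The function count_with_critical and its parameters are hypothetical; every thread of the team performs per_thread increments.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Sequential fallbacks so the sketch still builds without OpenMP. */
static int omp_get_thread_num(void) { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Every thread increments a shared counter per_thread times.
 * Stores the team size in *team_out and returns the final counter. */
long count_with_critical(long per_thread, int *team_out) {
    long counter = 0;  /* shared between all threads */
    int team = 1;

    #pragma omp parallel
    {
        if (omp_get_thread_num() == 0) team = omp_get_num_threads();
        for (long i = 0; i < per_thread; i++) {
            #pragma omp critical
            counter++;  /* only one thread at a time executes this statement */
        }
    }
    *team_out = team;
    return counter;
}
```

With the directive in place, the final counter should equal the team size multiplied by per_thread; without it, increments could be lost.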
Accumulation of Results in Shared Global Variable
- Using the critical directive guarantees correctness but serializes execution of the protected code, losing the advantage of parallelism
- Critical sections can collect local results to be "accumulated" into a global value
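Accumulating local results might be sketched as follows: each thread sums its share into a local variable and enters the critical section only once. The function sum_to_n_critical and the cyclic distribution of iterations are assumptions made for illustration.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Sequential fallbacks so the sketch still builds without OpenMP. */
static int omp_get_thread_num(void) { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Returns 1 + 2 + ... + n, computed by the whole team. */
long sum_to_n_critical(long n) {
    long total = 0;  /* shared global result */

    #pragma omp parallel
    {
        int tid = omp_get_thread_num();
        int team = omp_get_num_threads();
        long local = 0;  /* declared inside the region: private to the thread */

        /* cyclic distribution: thread tid handles tid+1, tid+1+team, ... */
        for (long i = tid + 1; i <= n; i += team)
            local += i;

        #pragma omp critical
        total += local;  /* one protected update per thread */
    }
    return total;
}
```

The loop itself runs without any locking, so the serialization is limited to one short update per thread.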
Reduction Clause
- The reduction(operator: variable) clause allows using a shared global variable as if it were private during thread execution
- At the end of execution, the private values of each thread are accumulated into the global variable
- Equivalent to using an auxiliary private variable to accumulate internal values and then integrating them into the shared global value
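The reduction clause could be applied to a summation as in this sketch. The function sum_to_n_reduction and the cyclic distribution of iterations are assumptions made for illustration.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Sequential fallbacks so the sketch still builds without OpenMP. */
static int omp_get_thread_num(void) { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Returns 1 + 2 + ... + n using a reduction over `total`. */
long sum_to_n_reduction(long n) {
    long total = 0;

    #pragma omp parallel reduction(+: total)
    {
        int tid = omp_get_thread_num();
        int team = omp_get_num_threads();

        /* inside the region, `total` is a private copy initialized to 0 */
        for (long i = tid + 1; i <= n; i += team)
            total += i;
    }   /* the private copies are added into the original `total` here */
    return total;
}
```

Compared with a manual local variable plus a critical section, the clause leaves the accumulation step to the OpenMP runtime.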
Description
Quiz on thread synchronization problems in high performance computing, covering mutual exclusion mechanism and thread execution ordering. For Master in Applied Artificial Intelligence students.