Thread Synchronization in High Performance Computing
Questions and Answers

What is the primary motivation for using condition variables?

  • To synchronize access to a global state without active waiting or blocking threads (correct)
  • To implement barrier synchronization
  • To implement a mutual exclusion mechanism
  • To solve the producer-consumer problem

What is a common issue with using mutexes to synchronize access to a global state?

  • It is used to solve the reader-writer problem
  • It is not suitable for implementing barrier synchronization
  • It blocks other threads from changing the state (correct)
  • It allows multiple threads to access the state simultaneously

What is the purpose of a condition variable in synchronization?

  • To block threads until a certain condition is met (correct)
  • To solve the producer-consumer problem
  • To enable active waiting
  • To implement a mutual exclusion mechanism

Why is active waiting not a suitable solution for synchronization?

    It consumes too much CPU time.

    What happens when the counter reaches the thread count in the barrier pseudocode?

    The condition variable is broadcast, waking all waiting threads.

    What is necessary to associate with a condition variable?

    A mutual exclusion mechanism (mutex).

    What problem can condition variables be used to solve?

    All of the above (producer-consumer, reader-writer, and barrier synchronization).

    What is the purpose of the 'notEmptyforConsumer' condition variable in the producer-consumer example?

    To signal the consumer that the buffer is not empty.

    What happens when a producer thread finds the buffer full in the producer-consumer example?

    It waits on the 'notFullforProducer' condition variable.

    What is the purpose of the mutex in the producer-consumer example?

    To protect the buffer from concurrent access.

    What is the role of the 'cond_wait' function in the barrier pseudocode?

    To wait on the condition variable until it is signaled.

    What happens when a consumer thread finds the buffer empty in the producer-consumer example?

    It waits on the 'notEmptyforConsumer' condition variable.

    What is the purpose of the 'cond_broadcast' function in the barrier pseudocode?

    To signal all threads to proceed.

    What is the primary goal of the Mutual Exclusion mechanism in thread synchronization?

    To guarantee that only one thread executes code that manipulates a shared resource at a time.

    In the Producer-Consumer problem, what is the main condition for the Consumer thread to consume resources?

    Resources can only be consumed if they have already been produced.

    In the Reader-Writer problem, what is the restriction when a Writer thread wants to alter the data?

    No other threads can access the data, reading or writing.

    What is the purpose of the barrier() function in thread synchronization?

    To act as a synchronization point in the execution of each thread.

    In the Producer-Consumer problem, what happens when the shared buffer is full and the Producer thread wants to produce more resources?

    The Producer thread waits until the shared buffer has available space.

    What is a key feature of the Mutual Exclusion mechanism?

    It ensures that only one thread executes code that manipulates a shared resource at a time.

    In the Reader-Writer problem, what is the main advantage of allowing multiple Reader threads to access the data concurrently?

    It increases the throughput of the system.

    What is the main difference between the Producer-Consumer problem and the Reader-Writer problem?

    The restrictions on thread access to shared resources.

    What is the primary issue with using active waiting to synchronize access to a global state?

    It can cause the thread to occupy the CPU unnecessarily.

    In the context of the Producer-Consumer problem, what is the purpose of the mutex?

    To synchronize access to the shared buffer.

    What is a key feature of barrier synchronization?

    It ensures that all threads reach a certain point before continuing execution.

    What is the main advantage of using condition variables over active waiting?

    It enables threads to wait for a condition to be met without occupying the CPU.

    In the context of the Reader-Writer problem, what is the main restriction when a Writer thread wants to alter the data?

    No Reader threads can access the data.

    What is the purpose of associating a mutex with a condition variable?

    To synchronize access to the condition variable.

    What is the purpose of the pthread_cond_wait function?

    To release the mutex and wait for a signal on the condition variable, re-acquiring the mutex before returning.

    What happens when a thread invokes the pthread_cond_signal function?

    Only one waiting thread is awakened.

    What is the purpose of the pthread_cond_broadcast function?

    To awaken all waiting threads.

    What is the type of the auxiliary structure used for condition variables?

    pthread_cond_t

    What is the purpose of the mutex in the producer-consumer problem?

    To synchronize access to the shared buffer.

    What problem can be solved using condition variables and mutexes?

    Synchronization between threads.

    In the barrier pseudocode, what happens when the counter reaches zero?

    The barrier is reset and the threads proceed.

    What is the purpose of the 'notFullforProducer' condition variable in the producer-consumer example?

    To prevent the producer thread from producing when the buffer is full.

    In the producer-consumer problem, what happens when a consumer thread finds the buffer empty?

    The consumer thread waits on the 'notEmptyforConsumer' condition variable.

    What is the main difference between the producer-consumer problem and the reader-writer problem?

    The access pattern to the shared resource.

    In the barrier pseudocode, what is the purpose of the 'cond_wait' function?

    To release the mutex and wait for the condition variable.

    What is the purpose of the mutex in the producer-consumer example?

    To protect the shared buffer from concurrent access.

    In the reader-writer problem, what is the main advantage of allowing multiple reader threads to access the data concurrently?

    Reduced contention for the shared resource.

    What is the primary goal of the mutual exclusion mechanism in thread synchronization?

    To prevent data corruption and ensure thread safety.

    What is the primary goal of the mutual exclusion mechanism in thread synchronization?

    <p>To prevent data corruption and ensure thread safety</p> Signup and view all the answers

    What is the purpose of the MPI_Init function?

    <p>To initialize the MPI environment</p> Signup and view all the answers

    What is the purpose of the MPI_Comm_rank function?

    <p>To get the rank of the process</p> Signup and view all the answers

    What is the purpose of the MPI_Datatype argument in the MPI_Send function?

    <p>To specify the type of data being sent</p> Signup and view all the answers

    What is the purpose of the mpirun command?

    <p>To execute MPI programs</p> Signup and view all the answers

    What is the purpose of the MPI_Recv function?

    <p>To receive a message</p> Signup and view all the answers

    What is the purpose of the MPI_Status argument in the MPI_Recv function?

    <p>To specify the status of the receive operation</p> Signup and view all the answers

    What is the purpose of the MPI_Comm_size function?

    <p>To get the total number of processes</p> Signup and view all the answers

    What is the purpose of the --use-hwthread-cpus option in the mpirun command?

    <p>To use hardware threads as CPU resources</p> Signup and view all the answers

    What is the purpose of the MPI_Get_count function?

    <p>To return the number of elements received by the last message</p> Signup and view all the answers

    What is the difference between the Send function and the Ssend function?

    MPI_Send may return as soon as the message is buffered (possibly before delivery), while MPI_Ssend is always synchronous and blocks until the matching receive has started.

    What is necessary for a message communication to be successful?

    The sender and receiver must have symmetry in their function calls.

    What is the primary motivation for using collectives?

    To exchange messages between all processes.

    What is the purpose of the MPI_Datatype parameter in the MPI_Get_count function?

    To specify the datatype of the message.

    What happens if the next message received does not match the reception parameters?

    The program will block.

    What is the behavior of the Recv function?

    It is always synchronous.

    What is the purpose of the status parameter in the MPI_Get_count function?

    To provide the status returned by the receive operation.

    What is the purpose of the MPI_Init function?

    To initialize the MPI environment and set up the MPI communicator.

    What is the purpose of the MPI_Comm_rank function?

    To return a process identifier within the process set.

    What is the purpose of the Open MPI library?

    To provide an open-source implementation of the MPI standard.

    What is the purpose of the MPI_Finalize function?

    To finalize the MPI library in the process.

    What is the purpose of the mpirun script?

    To execute MPI programs.

    What is the purpose of the MPI_Comm_size function?

    To return the size of the process set.

    What is the purpose of the #include <mpi.h> header file?

    To include the MPI API.

    What is the purpose of the MPI_COMM_WORLD constant?

    To represent the set of all processes in an execution.

    What happens when a kernel is launched in CUDA?

    A grid is created.

    What type of memory does the host have in the CUDA memory model?

    Host memory.

    What is the purpose of blockIdx and threadIdx?

    To identify a block within the grid (blockIdx) and a thread within its block (threadIdx).

    How are blocks scheduled in a grid?

    In any order, independently of one another.

    What function is used to allocate memory in the device's global memory?

    cudaMalloc

    What is the purpose of the cudaMemcpy function?

    To transfer data between the host and device.

    What is the primary advantage of using GPUs for parallel computing?

    Massive data parallelism through SIMD.

    What is a warp?

    A group of threads of a block that an SM schedules to run in parallel.

    How many threads can be in a block?

    Multiple threads, organized in up to 3 dimensions.

    What is the direction of data transfer when cudaMemcpyHostToDevice is used?

    From host to device.

    What is the role of the CUDA compiler in the CUDA programming architecture?

    To split the input program into host and device code.

    What is a CUDA kernel?

    A device function that will be launched on multiple threads.

    What is the purpose of gridDim?

    To provide the grid dimensions (the number of blocks in each dimension).

    Where is local thread memory physically stored?

    Global memory.

    How are threads organized in CUDA?

    In a double hierarchy: grid and block.

    What is the purpose of the __global__ tag in a CUDA kernel function?

    To mark the function as a kernel function.

    What is the storage type of automatic scalar variables in CUDA?

    Registers.

    What is the significance of the number of Streaming Multiprocessors (SMs)?

    It determines the number of warps that can be scheduled in parallel.

    What is the result of compiling a CUDA program?

    Two separate outputs: one for the host and one for the device.

    What is the difference between the host and device in a CUDA program?

    The host is the CPU and the device is the GPU.

    What is the purpose of launching a kernel function on the device?

    To run the function on multiple threads on the device.

    What is a key feature of GPUs that makes them suitable for parallel computing?

    More parallel ALU units than CPUs.

    What is the primary advantage of using OpenMP over Pthreads?

    Higher level of abstraction.

    What is the purpose of the #pragma omp parallel directive?

    To create multiple threads of execution.

    What is the purpose of the -fopenmp compiler option?

    To compile OpenMP code.

    What is the purpose of the omp_get_thread_num() function?

    To get the current thread number.

    What is the purpose of the num_threads clause in the #pragma omp parallel directive?

    To specify the number of threads.

    What happens when the code following the #pragma omp parallel directive is finished?

    Each thread waits at an implicit barrier until all threads have completed.

    What is the purpose of the omp.h header file?

    To include OpenMP functionality.

    What is the purpose of the #ifdef _OPENMP pre-compiler directive?

    To validate OpenMP support.

    What is the primary operating system assumed by the Pthreads library?

    A POSIX-compliant operating system.

    What is the purpose of the pthread_create function?

    To create a new thread.

    What is the purpose of the linker option '-lpthread'?

    To link the pthread library to the program.

    What is the type of the 'start_routine' argument in the pthread_create function?

    void* (*)(void*)

    What is the purpose of the pthread_join function?

    To wait for a thread to finish.

    What is the purpose of the 'attr_p' argument in the pthread_create function?

    Creation attributes.

    What is the purpose of the 'thread_p' argument in the pthread_create function?

    Thread object reference.

    What is the return type of the 'start_routine' function?

    void*

    What is the primary purpose of the mutual exclusion mechanism in thread synchronization?

    To ensure that only one thread at a time can access shared resources.

    What is a critical section in the context of threads?

    A section of code that accesses shared variables and must be protected from simultaneous access.

    Why is it necessary to use mutual exclusion mechanisms in parallel programming?

    To prevent race conditions.

    What is a race condition in the context of threads?

    A situation where multiple threads access shared variables simultaneously and the result depends on the order of access.

    What is the purpose of implementing a parallel version of a program?

    To reduce the execution time of a program.

    What is the expected performance improvement of a parallel program with n threads compared to its sequential version?

    The parallel version is expected to be n times faster.

    What is the primary motivation for using mutual exclusion mechanisms in parallel programming?

    To prevent race conditions and ensure correctness.

    What is the purpose of implementing a critical section in a parallel program?

    To prevent race conditions and ensure correctness.

    What is the primary purpose of the reduction clause in OpenMP?

    To accumulate local results into a global value.

    What is a characteristic of shared variables in OpenMP?

    Changes to the variable are visible to all threads.

    What is the purpose of the critical directive in OpenMP?

    To synchronize access to a shared variable.

    What happens to local variables created within a parallel section in OpenMP?

    They are only visible to the thread that created them.

    What is a disadvantage of using critical sections in OpenMP?

    It serializes the execution of code, losing parallelism.

    What is a characteristic of private variables in OpenMP?

    They are only visible to one thread.

    What is the purpose of the reduction clause in OpenMP, in terms of synchronization?

    To synchronize access to a shared variable.

    What is the difference between a shared variable and a private variable in OpenMP?

    A shared variable is accessible by all threads, while a private variable is only visible to the thread that created it.

    Study Notes

    Thread Execution Ordering

    • Threads have an implicit execution order, even though they are concurrent
    • Mutual Exclusion mechanism guarantees that only one thread executes code that manipulates the same resource at a time
    • Typical cases of synchronization:
      • Producer - Consumer
      • Reader - Writer
      • Barrier

    Producer-Consumer

    • Problem consists of two types of threads: Producer and Consumer
    • Producer produces resources and stores them in a shared buffer
    • Consumer consumes produced resources from the shared buffer
    • Conditions:
      • Only resources that have already been produced can be consumed
      • Production capacity depends on the space available for storing produced resources
    • Example: Bounded Buffer
      • Producer generates items for a buffer with limited size
      • Consumer extracts items from the buffer
      • Cases:
        • Consumer can only execute when there is at least 1 item in the list
        • Producer can only execute if the list has space available for 1 more item

    Producer-Consumer Pseudo-code

    • uses mutex_t mutex, cond_t notEmptyforConsumer, and cond_t notFullforProducer
    • variables: buffer, firstptr, lastptr, count, and buf_size
    • produce function:
      • waits for buffer to have space available
      • adds item to buffer
      • signals notEmptyforConsumer
    • consume function:
      • waits for buffer to have at least 1 item
      • removes item from buffer
      • signals notFullforProducer

    Reader-Writer

    • Problem with threads of two types: Readers and Writers
    • Restrictions:
      • several readers can access data concurrently without creating inconsistencies
      • when a writer wants to alter the data, there can be no simultaneous access

    Barrier

    • Problem of synchronization between threads
    • Threads invoke barrier() function, which blocks execution until all threads are in this position
    • When all threads have invoked the function, it returns and unblocks them all
    • Acts as a synchronization point in the execution of each thread

    Barrier Example using Condition Variables

    • uses mutex_t mutex and cond_t cond_var
    • Pseudocode:
      • increments counter
      • if counter == thread_count, signals all threads and resets counter
      • else, waits on the condition variable until broadcast() is called

    Condition Variables

    • allows threads to suspend execution until a certain event or condition occurs
    • when the event occurs, signals locked threads to continue execution
    • must be associated with a mutual exclusion (mutex) mechanism
    • solves the problem of active waiting and blocking threads when checking for a global state condition

    Motivation for Condition Variables

    • Sometimes, threads need to check that a global condition is fulfilled.
    • Access to a common global state is necessary, but it also requires waiting for the state to change by another thread.
    • This cannot be done using only Mutex, as it would block other threads from changing the global state.

    Condition Variables

    • A condition variable is a structure that allows threads to suspend execution until a certain event occurs.
    • It must be associated with a mutual exclusion (mutex) mechanism.
    • When the event occurs, a signalling mechanism "wakes up" the locked threads to continue execution.

    Condition Variables API

    • pthread_cond_wait(cv, mt): passive waiting for the event to occur.
    • pthread_cond_signal(cv): signalling to wake up one locked thread.
    • pthread_cond_broadcast(cv): signalling to wake up all locked threads.
    • pthread_cond_init(): creation of a condition variable.
    • pthread_cond_destroy(): destruction of a condition variable.
    • pthread_cond_t: auxiliary structure type.

    Condition Variables Usage Pattern

    • Problem: synchronization between threads without active waiting.
    • Threads invoke wait() function, which only returns when the event is signalled.
    • All threads update global state (with mutex) and block.
    • The last thread detecting the event signals to all using broadcast().

    Barrier Example using Condition Variables

    • Pseudocode:
      • Mutex lock
      • Counter increment
      • If counter equals thread count, reset counter and broadcast
      • Else, wait on condition variable
      • Mutex unlock
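
    A minimal C sketch of this barrier with the Pthreads API follows. The function name barrier_point and the generation counter are illustrative additions beyond the pseudocode; the generation counter makes the barrier safely reusable across rounds and guards against spurious wakeups:

        #include <pthread.h>

        static int counter = 0;            /* number of threads that have arrived */
        static int thread_count;           /* total number of threads, set before use */
        static unsigned long generation = 0;
        static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
        static pthread_cond_t cond_var = PTHREAD_COND_INITIALIZER;

        /* Each thread calls this; it returns only when all threads have arrived. */
        void barrier_point(void) {
            pthread_mutex_lock(&mutex);
            unsigned long my_gen = generation;
            counter++;
            if (counter == thread_count) {
                counter = 0;                          /* reset for the next round */
                generation++;
                pthread_cond_broadcast(&cond_var);    /* wake all waiting threads */
            } else {
                /* cond_wait releases the mutex while blocked, re-acquires on wakeup */
                while (my_gen == generation)
                    pthread_cond_wait(&cond_var, &mutex);
            }
            pthread_mutex_unlock(&mutex);
        }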

    Producer-Consumer Example

    • Bounded buffer example:
      • Producer generates items for a buffer with limited size.
      • Consumer extracts items from the same list.
      • Cases:
        • Consumer can only execute when there is at least 1 item in the list.
        • Producer can only execute if the list has space available for 1 more item.

    Producer-Consumer Unbounded Buffer Example (Pseudocode)

    • produce() function:
      • Mutex lock
      • Wait if buffer is full
      • Add item to buffer
      • Signal to consumer
      • Mutex unlock
    • consume() function:
      • Mutex lock
      • Wait if buffer is empty
      • Extract item from buffer
      • Signal to producer
      • Mutex unlock
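
    The pseudocode maps to C with Pthreads roughly as below; a circular buffer of ints with an illustrative capacity BUF_SIZE stands in for the notes' buffer, firstptr, lastptr, and count variables:

        #include <pthread.h>

        #define BUF_SIZE 16                             /* illustrative capacity */

        static int buffer[BUF_SIZE];
        static int firstptr = 0, lastptr = 0, count = 0;
        static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
        static pthread_cond_t notEmptyforConsumer = PTHREAD_COND_INITIALIZER;
        static pthread_cond_t notFullforProducer  = PTHREAD_COND_INITIALIZER;

        void produce(int item) {
            pthread_mutex_lock(&mutex);
            while (count == BUF_SIZE)                   /* buffer full: wait */
                pthread_cond_wait(&notFullforProducer, &mutex);
            buffer[lastptr] = item;
            lastptr = (lastptr + 1) % BUF_SIZE;
            count++;
            pthread_cond_signal(&notEmptyforConsumer);  /* buffer is no longer empty */
            pthread_mutex_unlock(&mutex);
        }

        int consume(void) {
            pthread_mutex_lock(&mutex);
            while (count == 0)                          /* buffer empty: wait */
                pthread_cond_wait(&notEmptyforConsumer, &mutex);
            int item = buffer[firstptr];
            firstptr = (firstptr + 1) % BUF_SIZE;
            count--;
            pthread_cond_signal(&notFullforProducer);   /* buffer is no longer full */
            pthread_mutex_unlock(&mutex);
            return item;
        }

    The while loops (rather than a single if) re-check the condition after waking, the standard defensive pattern against spurious wakeups.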

    MPI Basics

    • MPI stands for Message Passing Interface, a standard for message passing in parallel computing
    • MPI is used for high-performance computing in parallel architectures with multiple nodes, each with its own processor and memory

    Compilation and Execution

    • Compile MPI programs using the mpicc wrapper around the system compiler (e.g., gcc)
    • Execute MPI programs with the mpirun command, specifying the number of processes (e.g., mpirun -n 4 ./exemplo)

    Point-to-Point Communication

    • Communication between processes occurs through sending and receiving messages
    • Each process has a unique rank value
    • Processes execute different parts of the same code using if statements
    • Send and receive functions must be aligned in their execution

    MPI Send

    • MPI_Send function sends a message from one process to another
    • Parameters:
      • buf: memory pointer to data
      • count: number of elements in the message
      • datatype: type of data sent (MPI constant)
      • dest: rank of the destination process
      • tag: tag (integer value) used to distinguish message channels
      • comm: process group (general: MPI_COMM_WORLD)

    MPI Receive

    • MPI_Recv function receives a message from another process
    • Parameters:
      • buf: memory pointer to receive message
      • count: maximum number of possible elements to receive
      • datatype: type of data of the message
      • source: rank of the sender process (general: MPI_ANY_SOURCE)
      • tag: message tag (general: MPI_ANY_TAG)
      • comm: set of processes in communication (general: MPI_COMM_WORLD)
      • status: status of the result of the operation, to be consulted later

    MPI DataTypes

    • Data types in MPI must match between sender and receiver
    • MPI_Get_count function returns the number of elements received by the last message

    Message Exchange Example

    • Synchronization model:
      • MPI_Send can be synchronous or buffered (it may return before the message is delivered)
      • MPI_Recv is always synchronous and blocking
    • MPI_Ssend function is always synchronous and blocking, ensuring the message reaches the destination

    Collectives

    • Collectives are optimized functions for group communication between all processes
    • Motivation: need to exchange messages between all processes, not just two

    MPI API Initialization

    • MPI_Init function must be invoked before any other MPI function in the program
    • MPI_Finalize function terminates the MPI library in the process
    • MPI_Comm_rank function returns a process identifier within the process set
    • MPI_Comm_size function returns the size of the process set
    • MPI_COMM_WORLD is a constant representing the set of all processes in an execution
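
    A minimal sketch tying these calls together (the file name and values are illustrative; run with at least two processes, e.g. mpirun -n 2 ./example):

        #include <mpi.h>
        #include <stdio.h>

        int main(int argc, char *argv[]) {
            int rank, size;
            MPI_Init(&argc, &argv);                  /* must precede all other MPI calls */
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* this process's identifier */
            MPI_Comm_size(MPI_COMM_WORLD, &size);    /* total number of processes */

            if (rank == 0) {
                int value = 42;
                /* send one int to process 1 on tag 0 */
                MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
            } else if (rank == 1) {
                int value, received;
                MPI_Status status;
                MPI_Recv(&value, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                         MPI_COMM_WORLD, &status);
                MPI_Get_count(&status, MPI_INT, &received);  /* elements actually received */
                printf("rank 1 got %d (%d element(s)) from rank %d\n",
                       value, received, status.MPI_SOURCE);
            }

            MPI_Finalize();                          /* no MPI calls after this */
            return 0;
        }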

    Parallel Programming on GPU

    • GPUs are capable of massive data parallelism through SIMD (Single Instruction, Multiple Data)
    • Compared to CPUs, GPUs have more parallel ALU units, focused on arithmetic data operations
    • GPUs are not suitable for task parallelism, i.e., running different operations concurrently

    CUDA Programming Architecture

    • CUDA program integrates two types of processors: host (CPU) and device (GPU)
    • Application code includes host and device codes
    • Device variables and code are marked with keywords
    • CUDA annotated program is not a valid C program
    • CUDA compiler (nvcc) takes .cu input program and outputs:
      • Standard C host-only source for host compiler
      • Device code for GPU compiler

    CUDA Kernel and Thread

    • Kernel represents the device function that will be launched into the GPU on multiple threads
    • Kernel is described as a standard C/C++ function with the __global__ tag
    • Kernel function is specified as C code, returns void, and parameters serve as input and output
    • Local variables are independent for each thread

    Kernel Launch

    • Execution of a kernel function on the device is named "kernel launch"
    • Host starts the launch, kernel specifies the code for a single thread
    • Launching a kernel specifies the number of threads to be used

    Thread Hierarchy

    • Threads are organized in a double hierarchy: grid and block
    • When launching a kernel, a grid is created
    • A grid can have multiple blocks of threads (up to 3 dimensions)
    • A block can have multiple threads (up to 3 dimensions)
    • Example: launching a grid with 4 blocks, each with 32 threads

    Thread Identification

    • Each thread has available the following private variables:
      • gridDim: grid dimension (x, y, z coordinates)
      • blockDim: block dimension (x, y, z coordinates)
      • blockIdx: block identification (per thread)
      • threadIdx: thread identification
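
    Inside a kernel, these variables are typically combined into a unique global index; for a 1-D grid of 1-D blocks the usual pattern is:

        int i = blockIdx.x * blockDim.x + threadIdx.x;   /* unique index across the grid */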

    Block-oriented Kernel Scheduling

    • Scheduling of a kernel within a grid is block-oriented
    • Blocks are required to run independently of each other
    • The device can schedule them in any order
    • Depending on hardware resources, multiple blocks can run at a time
    • An SM (Streaming Multiprocessor) schedules groups of threads of a block in parallel, called warps
    • Warp size is fixed by the hardware (32 threads on current NVIDIA GPUs)

    CUDA Memory Model

    • Host has its own RAM: host memory
    • Device has its own RAM: global memory
    • Data transfer is necessary between these memories:
      • Initially from host to device (with input data)
      • In the end, from device to host (with the result)

    Device Memory Initialisation & Transfer

    • Memory in the device Global Memory needs to be allocated from the host
    • cudaMalloc() and cudaFree() are used for allocation and deallocation
    • cudaMemcpy() is used for data transfer between host and device
    • Direction of transfer can be specified: cudaMemcpyHostToDevice or cudaMemcpyDeviceToHost
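
    A minimal CUDA sketch of this host-side flow, using the standard vector-addition example (names such as vecAdd and add_on_gpu are illustrative, and error checking is omitted):

        __global__ void vecAdd(const float *a, const float *b, float *c, int n) {
            int i = blockIdx.x * blockDim.x + threadIdx.x;  /* global thread index */
            if (i < n)                                      /* the grid may overshoot n */
                c[i] = a[i] + b[i];
        }

        void add_on_gpu(const float *h_a, const float *h_b, float *h_c, int n) {
            float *d_a, *d_b, *d_c;
            size_t bytes = n * sizeof(float);

            cudaMalloc((void **)&d_a, bytes);               /* allocate in global memory */
            cudaMalloc((void **)&d_b, bytes);
            cudaMalloc((void **)&d_c, bytes);

            cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);  /* input: host -> device */
            cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

            int threads = 256;                              /* threads per block */
            int blocks = (n + threads - 1) / threads;       /* enough blocks to cover n */
            vecAdd<<<blocks, threads>>>(d_a, d_b, d_c, n);  /* kernel launch creates the grid */

            cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);  /* result: device -> host */

            cudaFree(d_a);
            cudaFree(d_b);
            cudaFree(d_c);
        }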

    Cuda Device Memory Layout

    • Specification of the storage type, scope, and lifetime of device variables is defined by tag
    • Automatic scalar (non-array) vars are register-based
    • Local thread memory is physically on Global Memory

    OpenMP Motivation

    • A library for developing parallel applications using shared memory
    • Provides high-level abstraction for adapting sequential applications in a simple way
    • Alternative to Pthreads, which has a more complex API

    Pragma Mechanism in C

    • OMP commands are based on pragmas
    • Pragmas allow adding functionality to a C program without affecting its generic compilation
    • The #pragma omp directive is used
    • Compilers that support these directives make use of them, while those that don't ignore them and compile the program as usual
    • Correctly coded OpenMP programs always compile on any platform, even without support

    Compiler Extensions for OpenMP

    • The auxiliary library is included with #include <omp.h>
    • Pre-compiler directive to validate support: #ifdef _OPENMP ... #else ... #endif
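
    One common form of that guard (the fallback stubs are an illustrative choice, not mandated by OpenMP):

        #ifdef _OPENMP
        #include <omp.h>                                /* OpenMP API available */
        #else
        /* Stubs so the program still compiles and runs sequentially */
        static int omp_get_thread_num(void)  { return 0; }
        static int omp_get_num_threads(void) { return 1; }
        #endif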

    Compiling and Executing OpenMP Programs

    • Compiling option: -fopenmp
    • Example: gcc -fopenmp hello_omp.c -o hello_omp
    • Linker option: -lomp (optional when -fopenmp is used)

    Parallel Directive

    • #pragma omp parallel creates multiple threads of execution in the same process
    • Each thread executes the code immediately following the pragma
    • When that code finishes, each thread waits at an implicit barrier until all threads have completed
    • Example code:

        #pragma omp parallel
        printf("This is thread %d, num threads %d\n", omp_get_thread_num(), omp_get_num_threads());
        printf("End\n");

    • Possible result (order of lines may vary):

        This is thread 2, num threads 3
        This is thread 1, num threads 3
        This is thread 0, num threads 3
        End

    num_threads Option

    • #pragma omp parallel num_threads(thrcnt) defines the number of threads to parallelize the application
    • If not specified, the number of threads created is set by the system during execution (e.g., number of cores)
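
    For example (assuming stdio.h and omp.h are included):

        #pragma omp parallel num_threads(4)             /* create exactly 4 threads */
        {
            printf("Hello from thread %d\n", omp_get_thread_num());
        }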

    Pthreads Overview

    • Pthreads is a library for developing parallel applications using shared memory.
    • It assumes a POSIX-compliant operating system as its base.
    • The library can be embedded in any programming language, usually C.

    Compilation and Execution of Pthreads Programs

    • To compile a Pthreads program, include the library header with #include <pthread.h>.
    • Use the linker option -lpthread to link the pthread library.
    • Example compilation command: gcc hello.c -o hello -lpthread.

    Pthread API to Create and Join Threads

    • pthread_create function is used to create a new thread.
    • Syntax: pthread_create(pthread_t* thread_p, const pthread_attr_t* attr_p, void* (*start_routine)(void*), void* arg_p).
    • thread_p: thread object reference.
    • attr_p: creation attributes, usually set to NULL.
    • start_routine: function to execute in the new thread.
    • arg_p: function argument.
    • pthread_join function is used to wait for a thread to finish.
    • Syntax: pthread_join(pthread_t thread, void** ret_val_p).
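
    A minimal create/join sketch (the function name hello and the thread count are illustrative):

        #include <pthread.h>
        #include <stdio.h>

        #define THREAD_COUNT 4

        /* Thread body: must match the void* (*)(void*) signature */
        void *hello(void *arg) {
            long rank = (long) arg;
            printf("Hello from thread %ld\n", rank);
            return NULL;
        }

        int main(void) {
            pthread_t threads[THREAD_COUNT];

            for (long t = 0; t < THREAD_COUNT; t++)
                pthread_create(&threads[t], NULL, hello, (void *) t);  /* NULL: default attributes */

            for (long t = 0; t < THREAD_COUNT; t++)
                pthread_join(threads[t], NULL);          /* wait for each thread to finish */

            return 0;
        }

    Compile with: gcc hello.c -o hello -lpthread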

    Example Incremental Application

    • The example application demonstrates parallel processing using multiple threads.
    • Global variables: n (number of iterations), thread_count (number of threads), and sum (global sum value).
    • The Increment function is the thread operation that each thread executes.
    • Each thread calculates a portion of the total sum based on its rank and the number of iterations.

    Parallel Programming Challenge

    • Develop a program that increments a global variable N times
    • Sequential version uses a single thread, while parallel version uses more than 1 thread to increment a global counter variable
    • Correctness: the result of both applications must always be the same
    • Performance: the parallel version is expected to be n times faster for n threads

    Sequential Version

    • Implement code that increments a global variable
    • Consider parameter #iterations
    • Time measurement serves for comparison

    Parallel Version

    • Implement parallel version that makes use of threads
    • Allow configuring the number of threads and distribute the cycle size among the threads

    Race Condition and Critical Section

    • A race condition arises when several threads try to access the same shared variable at the same time
    • Characteristics of the hardware can cause inconsistencies in the values of shared variables
    • A critical section is the code executed by several threads that accesses a shared variable and must be "protected" from simultaneous access

    Mutual Exclusion Mechanism

    • Mechanisms are needed to ensure that only one thread at a time can access the resources shared between them
    • Mutual exclusion mechanisms prevent multiple threads from executing critical sections simultaneously

    Solution to Get it Correct in the Parallel Version

    • Use mutual exclusion mechanism to prevent threads from freely accessing a shared variable
    • Identify:
      • The shared variable
      • The thread code that accesses the variable
      • Surround that code with a mutual exclusion mechanism (as sketched below)
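
    Applied to the increment challenge, a sketch with a Pthreads mutex (variable names are illustrative):

        #include <pthread.h>

        static long long sum = 0;                        /* shared variable */
        static pthread_mutex_t sum_mutex = PTHREAD_MUTEX_INITIALIZER;

        /* Thread code: each thread performs its share of the increments. */
        void *increment(void *arg) {
            long iterations = (long) arg;
            for (long i = 0; i < iterations; i++) {
                pthread_mutex_lock(&sum_mutex);          /* enter critical section */
                sum++;                                   /* protected access to sum */
                pthread_mutex_unlock(&sum_mutex);        /* leave critical section */
            }
            return NULL;
        }

    Locking on every increment is correct but serializes the threads; accumulating into a local variable and locking once per thread, as discussed below for OpenMP, performs far better.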

    Mutual Exclusion Mechanisms in OpenMP

    • Variable Visibility: shared, private
    • Shared variables are accessed by all threads, while private variables are internal to each thread and only visible to the thread itself
    • Changes to shared variables are visible to all other threads
    • Private variables have a different value for each thread

    Critical Directive

    • #pragma omp critical directive indicates a "critical section" and is protected by a mutual exclusion mechanism between all threads
    • It ensures that only one thread can execute the code at a time and not simultaneously

    Accumulation of Results in Shared Global Variable

    • Using the critical clause guarantees correctness but creates a serialization of execution, losing the advantage of parallelism
    • Critical sections can collect local results to be "accumulated" into a global value
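
    A sketch of that pattern in OpenMP (n and the loop body are illustrative):

        long long sum = 0;
        #pragma omp parallel
        {
            long long local = 0;                 /* private to each thread */
            #pragma omp for
            for (long i = 0; i < n; i++)
                local++;                         /* no synchronization needed here */

            #pragma omp critical
            sum += local;                        /* one protected update per thread */
        }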

    Reduction Clause

    • The reduction(operator : variable) clause allows using a shared global variable as if it were private during thread execution
    • At the end of execution, the private values of each thread are accumulated into the global variable
    • Equivalent to using an auxiliary private variable to accumulate internal values and then integrating them into the shared global value
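
    The same accumulation expressed with the reduction clause (illustrative loop):

        long long sum = 0;
        #pragma omp parallel for reduction(+: sum)
        for (long i = 0; i < n; i++)
            sum++;                               /* each thread updates a private copy;
                                                    copies are combined into sum at the end */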



    Description

    Quiz on thread synchronization problems in high performance computing, covering mutual exclusion mechanism and thread execution ordering. For Master in Applied Artificial Intelligence students.
