
Shared Memory Programming

with Pthreads

T. Yang. UCSB CS240A.


Outline

• Shared memory programming: Overview
• POSIX pthreads
• Critical section & thread synchronization
  § Mutexes
  § Producer-consumer synchronization and semaphores
  § Barriers and condition variables
  § Read-write locks
• Thread safety

Copyright © 2010, Elsevier Inc. All rights Reserved
Processes/Threads in Shared Memory
Architecture
• A process is an instance of a running (or
suspended) program.
• A thread is analogous to a “light-weight” process.
• In a shared memory program a single process may
have multiple threads of control.

[Figure: a process whose threads T1–T5 share code, data, and kernel context; beside it, a process hierarchy in which P1 spawns several sh shells, one of which runs foo]
Execution Flow on one-core or multi-core
systems
Concurrent execution on a single core system: Two threads run
concurrently if their logical flows overlap in time

Parallel execution on a multi-core system


Benefits of multi-threading

• Responsiveness

• Resource Sharing
§ Shared memory

• Economy

• Scalability
§ Explore multi-core CPUs
Thread Programming with Shared Memory
• Program is a collection of threads of control.
§ Can be created dynamically
• Each thread has a set of private variables, e.g., local stack
variables
• Also a set of shared variables, e.g., static variables, shared
common blocks, or global heap.
§ Threads communicate implicitly by writing and reading
shared variables.
§ Threads coordinate by synchronizing on shared
variables
Shared memory
[Figure: threads P0 … Pn each hold a private copy of i (e.g., i: 2, i: 5, i: 8) and communicate through the shared variable s]
Shared Memory Programming

Several Thread Libraries/systems


• Pthreads is the POSIX Standard
§ Relatively low level
§ Portable but possibly slow; relatively heavyweight
• OpenMP standard for application level programming
§ Support for scientific programming on shared memory
§ http://www.openMP.org
• Java Threads
• TBB: Thread Building Blocks
§ Intel
• CILK: Language of the C “ilk”
§ Lightweight threads embedded into C
Overview of POSIX Threads

• POSIX: Portable Operating System Interface for


UNIX
§ Interface to Operating System utilities
• PThreads: The POSIX threading interface
§ System calls to create and synchronize threads
§ In CSIL, compile a C program with gcc -lpthread
• PThreads contain support for
§ Creating parallelism and synchronization
§ No explicit support for communication, because
shared memory is implicit; a pointer to shared data
is passed to a thread
Creation of Unix processes vs. Pthreads
C function for starting a thread, declared in pthread.h.
One pthread_t object is needed for each thread.

int pthread_create(
    pthread_t*            thread_p               /* out */,
    const pthread_attr_t* attr_p                 /* in  */,
    void*               (*start_routine)(void*)  /* in  */,
    void*                 arg_p                  /* in  */);
A closer look (1)

int pthread_create(
    pthread_t*            thread_p               /* out */,
    const pthread_attr_t* attr_p                 /* in  */,
    void*               (*start_routine)(void*)  /* in  */,
    void*                 arg_p                  /* in  */);

thread_p: allocate the pthread_t object before calling.
attr_p: we won’t be using thread attributes, so we just pass NULL.

A closer look (2)

int pthread_create(
    pthread_t*            thread_p               /* out */,
    const pthread_attr_t* attr_p                 /* in  */,
    void*               (*start_routine)(void*)  /* in  */,
    void*                 arg_p                  /* in  */);

start_routine: the function that the thread is to run.
arg_p: pointer to the argument that should be passed to start_routine.

Function started by pthread_create
• Prototype:
void* thread_function ( void* args_p ) ;

• void* can be cast to any pointer type in C.

• So args_p can point to a list containing one or


more values needed by thread_function.
• Similarly, the return value of thread_function can
point to a list of one or more values.

Wait for Completion of Threads

int pthread_join(pthread_t thread, void** result_p);
§ Wait for the specified thread to finish; its exit value is
  placed into *result_p. (Note the first argument is a
  pthread_t value, not a pointer.)
• We call the function pthread_join once for each
thread.
• A single call to pthread_join will wait for the thread
associated with the pthread_t object to complete.

Example of Pthreads
#include <pthread.h>
#include <stdio.h>

void *PrintHello(void *id) {
    printf("Thread %ld: Hello World!\n", (long) id);
    return NULL;
}

int main() {
    pthread_t thread0, thread1;
    pthread_create(&thread0, NULL, PrintHello, (void *) 0);
    pthread_create(&thread1, NULL, PrintHello, (void *) 1);
    return 0;  /* main may exit before the threads print! */
}
Example of Pthreads with join
#include <pthread.h>
#include <stdio.h>

void *PrintHello(void *id) {
    printf("Hello from thread %ld\n", (long) id);
    return NULL;
}

int main() {
    pthread_t thread0, thread1;
    pthread_create(&thread0, NULL, PrintHello, (void *) 0);
    pthread_create(&thread1, NULL, PrintHello, (void *) 1);
    pthread_join(thread0, NULL);
    pthread_join(thread1, NULL);
    return 0;
}
Some More Pthread Functions

• pthread_yield();   // non-standard; POSIX names it sched_yield()
  § Informs the scheduler that the thread is willing to yield the CPU
• pthread_exit(void *value);
  § Exit the thread and pass value to a joining thread (if one exists)
Others:
• pthread_t me; me = pthread_self();
  § Allows a thread to obtain its own pthread_t identifier
• Synchronizing access to shared variables
§ pthread_mutex_init, pthread_mutex_[un]lock
§ pthread_cond_init, pthread_cond_[timed]wait
Compiling a Pthread program

gcc -g -Wall -o pth_hello pth_hello.c -lpthread

link in the Pthreads library

Running a Pthreads program

./pth_hello
Hello from thread 1
Hello from thread 0

./pth_hello
Hello from thread 0
Hello from thread 1

Difference between Single and Multithreaded
Processes
Shared memory access for code/data
Separate control flow -> separate stack/registers
CRITICAL SECTIONS

Data Race Example

static int s = 0;

Thread 0                      Thread 1
for i = 0, n/2-1              for i = n/2, n-1
    s = s + f(A[i])               s = s + f(A[i])

• Also called critical section problem.


• A race condition or data race occurs when:
- two processors (or two threads) access the same variable,
and at least one does a write.
- The accesses are concurrent (not synchronized) so they
could happen simultaneously
Synchronization Solutions

1. Busy waiting
2. Mutex (lock)
3. Semaphore
4. Condition variables
Example of Busy Waiting

static int s = 0;
static int flag = 0;

Thread 0 (my_rank = 0)          Thread 1 (my_rank = 1)
int temp, my_rank               int temp, my_rank
for i = 0, n/2-1                for i = n/2, n-1
    temp = f(A[i])                  temp = f(A[i])
    while flag != my_rank ;         while flag != my_rank ;
    s = s + temp                    s = s + temp
    flag = (flag+1) % 2             flag = (flag+1) % 2

• A thread repeatedly tests a condition and, effectively, does no
  useful work until the condition has the appropriate value.
• Weakness: wastes CPU cycles, and is sometimes not safe under
  compiler optimization (the compiler may cache flag in a register).
Mutexes (Locks)
• Code structure:
    Acquire mutex lock
    Critical section
    Unlock/Release mutex
• Mutex (mutual exclusion) is a special type of variable


used to restrict access to a critical section to a single
thread at a time.
• Guarantees that one thread “excludes” all other threads
  while it executes the critical section.
• When a thread waits on a mutex/lock, the
  CPU can be used by others.
• Only the thread that has acquired the lock
  can release it.
Execution example with 2 threads

Thread 1 Thread 2

Acquire mutex lock Acquire mutex lock

Critical section

Unlock/Release mutex

Critical section

Unlock/Release mutex
Mutexes in Pthreads

• A special type for mutexes: pthread_mutex_t,
  initialized with pthread_mutex_init (or PTHREAD_MUTEX_INITIALIZER).
• To gain access to a critical section, call pthread_mutex_lock.
• To release it, call pthread_mutex_unlock.
• When finished using a mutex, call pthread_mutex_destroy.

Global sum function that uses a mutex
[Code figure from the Pacheco text not captured: each thread accumulates a private sum, then adds it to the global sum inside a mutex-protected critical section.]
Semaphore: Generalization from mutex
locks
• Semaphore S – integer variable
• Can only be accessed /modified via two
(atomic) operations with the following
semantics:
§ wait(S) {        // also called P()
      while (S <= 0) wait in a queue;
      S--;
  }
§ post(S) {        // also called V()
      S++;
      wake up a thread waiting in the queue;
  }
Why Semaphores?
Synchronization   Functionality/weakness
Busy waiting      Spinning for a condition; wastes resources; not safe
Mutex lock        Supports code with simple mutual exclusion
Semaphore         Handles more complex signal-based synchronization

• Examples of complex synchronization


§ Allow a resource to be shared among multiple
threads.
– Mutex: no more than 1 thread for one protected region.
§ Allow a thread to wait until another thread signals a condition
  – E.g., control the order in which threads enter the
    critical section.
– For mutexes, the order is left to chance and the system.
Syntax of Pthread semaphore functions
Semaphores are not part of Pthreads; you need to
#include <semaphore.h>.
Producer-consumer
Synchronization and
Semaphores
Producer-Consumer Example

T0 T1 T2

• Thread x produces a message for Thread x+1.


§ Last thread produces a message for thread 0.
• Each thread prints a message sent from its source.
• Will null messages be printed?
  § Yes, if a consumer thread prints its message before
    that message has been produced.
  § How to avoid that?
Flag-based Synchronization with 3 threads

Thread 0 Thread 1 Thread 2


Write a msg to #1 Write a msg to #2 Write a msg to #0
Set msg[1] Set msg[2] Set msg[0]
If msg[0] is ready If msg[1] is ready If msg[2] is ready
Print msg[0] Print msg[1] Print msg[2]

To make sure a message is received/printed, use busy waiting.


First attempt at sending messages using pthreads
[Code figure not captured: each thread produces a message for its
destination thread, then consumes (prints) the message sent to it.]
Semaphore Synchronization with 3 threads

Thread 0              Thread 1              Thread 2
Write a msg to #1     Write a msg to #2     Write a msg to #0
Set msg[1]            Set msg[2]            Set msg[0]
Post(semp[1])         Post(semp[2])         Post(semp[0])
Wait(semp[0])         Wait(semp[1])         Wait(semp[2])
Print msg[0]          Print msg[1]          Print msg[2]
Message sending with semaphores

sprintf(my_msg, "Hello to %ld from %ld", dest, my_rank);
messages[dest] = my_msg;
sem_post(&semaphores[dest]);    /* signal the dest thread */

sem_wait(&semaphores[my_rank]); /* wait until my source message is created */
printf("Thread %ld > %s\n", my_rank, messages[my_rank]);
READERS-WRITERS PROBLEM

Synchronization Example for Readers-Writers Problem

• A data set is shared among a number of concurrent


threads.
§ Readers – only read the data set; they do not perform any
updates
§ Writers – can both read and write
• Requirement:
§ allow multiple readers to read at the same time.
§ Only one writer can access the shared data at the same
time.
• Reader/writer access permission table:
              Reader    Writer
    Reader    OK        No
    Writer    No        No
Readers-Writers (First try with 1 mutex lock)
• Writer:
    do {
        mutex_lock(w);
        // writing is performed
        mutex_unlock(w);
    } while (TRUE);
• Reader:
    do {
        mutex_lock(w);
        // reading is performed
        mutex_unlock(w);
    } while (TRUE);
• What concurrency does this permit?
              Reader    Writer
    Reader    ?         ?
    Writer    ?         ?
Readers-Writers (First try with 1 mutex lock)
• Writer:
    do {
        mutex_lock(w);
        // writing is performed
        mutex_unlock(w);
    } while (TRUE);
• Reader:
    do {
        mutex_lock(w);
        // reading is performed
        mutex_unlock(w);
    } while (TRUE);
• Answer: no concurrency at all — not even reader with reader.
              Reader    Writer
    Reader    no        no
    Writer    no        no
2nd try using a lock + readcount
• Writer:
    do {
        mutex_lock(w);   // use writer mutex lock
        // writing is performed
        mutex_unlock(w);
    } while (TRUE);
• Reader:
    do {
        readcount++;     // add a reader counter
        if (readcount == 1) mutex_lock(w);
        // reading is performed
        readcount--;
        if (readcount == 0) mutex_unlock(w);
    } while (TRUE);
• Problem: readcount itself is updated without protection.
Readers-Writers Problem with semaphore

• Shared Data
§ Data set
§ Lock mutex (to protect readcount)
§ Semaphore wrt initialized to 1 (to
synchronize between
readers/writers)
§ Integer readcount initialized to 0
Readers-Writers Problem

• A writer:
    do {
        sem_wait(wrt);   // semaphore wrt
        // writing is performed
        sem_post(wrt);
    } while (TRUE);
Readers-Writers Problem (Cont.)

• Reader:
    do {
        mutex_lock(mutex);
        readcount++;
        if (readcount == 1)
            sem_wait(wrt);   // check if anybody is writing
        mutex_unlock(mutex);

        // reading is performed

        mutex_lock(mutex);
        readcount--;
        if (readcount == 0)
            sem_post(wrt);   // writing is allowed now
        mutex_unlock(mutex);
    } while (TRUE);
Barriers

• Synchronizing the threads to make sure that they all


are at the same point in a program is called a barrier.
• No thread can cross the barrier until all the threads
have reached it.
• Availability:
  § The core Pthreads library historically provided no barrier
    (POSIX later added pthread_barrier_t as an option), so a
    custom implementation is often needed.
  § A barrier is implicit in OpenMP and available in MPI.
Condition Variables

• Why?
• More programming primitives to simplify code for
synchronization of threads
Synchronization      Functionality
Busy waiting         Spinning for a condition; wastes resources; not safe
Mutex lock           Supports code with simple mutual exclusion
Semaphore            Signal-based synchronization; allows sharing
                     (no wait unless semaphore = 0)
Barrier              Rendezvous-based synchronization
Condition variables  More complex synchronization: let threads wait
                     until a user-defined condition becomes true
Synchronization Primitive: Condition Variables

• Used together with a lock


• One can specify a more general waiting
  condition compared to semaphores.
• A thread is blocked when the condition is not
  true:
  § placed in a waiting queue, yielding the
    CPU to somebody else.
  § Wakes up only after receiving a signal.
Pthread synchronization: Condition
variables
int status; pthread_cond_t cond;
const pthread_condattr_t attr;
pthread_mutex_t mutex;
status = pthread_cond_init(&cond, &attr);
status = pthread_cond_destroy(&cond);
status = pthread_cond_wait(&cond, &mutex);
  - wait in a queue until somebody wakes us up; the mutex is
    then reacquired.
status = pthread_cond_signal(&cond);
  - wake up one waiting thread.
status = pthread_cond_broadcast(&cond);
  - wake up all threads waiting on that condition.
How to Use Condition Variables: Typical
Flow
§ Thread 1: // try to enter the critical section and
            // wait for the condition
    Mutex_lock(mutex);
    while (condition is not satisfied)
        Cond_wait(mutex, cond);
    Critical section;
    Mutex_unlock(mutex);

§ Thread 2: // try to create the condition
    Mutex_lock(mutex);
    When the condition can be satisfied: Signal(cond);
    Mutex_unlock(mutex);
Condition variables for the producer-consumer
problem with unbounded buffer
Producer deposits data in a buffer for others to consume
First version for consumer-producer problem
with unbounded buffer
• int avail = 0;  // # of data items available for consumption
• Consumer thread:
    while (avail <= 0);  // busy-wait
    Consume next item; avail = avail - 1;
• Producer thread:
    Produce next item; avail = avail + 1;
    // notify an item is available
Condition Variables for consumer-producer
problem with unbounded buffer
• int avail = 0;  // # of data items available for consumption
• Pthread mutex m and condition cond;
• Consumer thread:
    mutex_lock(&m);
    while (avail <= 0) Cond_wait(&cond, &m);
    Consume next item; avail = avail - 1;
    mutex_unlock(&m);
• Producer thread:
    mutex_lock(&m);
    Produce next item; avail = avail + 1;
    Cond_signal(&cond);  // notify an item is available
    mutex_unlock(&m);
When to use condition broadcast?

• When waking up one thread to run


is not sufficient.
• Example: concurrent malloc()/free()
for allocation and deallocation of
objects with non-uniform sizes.
Running trace of malloc()/free()
• Initially 10 bytes are free.
• m() stands for malloc(), f() for free().

Thread 1:             Thread 2:           Thread 3:
m(10) – succ          m(5) – wait         m(5) – wait
f(10) – broadcast
                      Resume m(5) – succ
                                          Resume m(5) – succ
m(7) – wait
m(3) – wait
                      f(5) – broadcast
Resume m(7) – wait                        Resume m(3) – succ

(time flows downward)
Issues with Threads: False Sharing,
Deadlocks, Thread-safety

Problem: False Sharing
• Occurs when two or more processors/cores access
different data in same cache line, and at least one
of them writes.
§ Leads to ping-pong effect.
• Let’s assume we parallelize code with p=2:
for( i=0; i<n; i++ )
a[i] = b[i];
§ Each array element takes 8 bytes
§ Cache line has 64 bytes (8 numbers)
False Sharing: Example (2 of 3)

Execute this program in two processors


for( i=0; i<n; i++ )
a[i] = b[i];
cache line

a[0] a[1] a[2] a[3] a[4] a[5] a[6] a[7]

Written by CPU 0
Written by CPU 1
False Sharing: Example — two CPUs execute:
for (i = 0; i < n; i++)
    a[i] = b[i];
with iterations assigned cyclically: CPU 0 writes a[0], a[2], a[4], ...;
CPU 1 writes a[1], a[3], a[5], ...

a[0] a[1] a[2] a[3] a[4] a[5] a[6] a[7]   ← one cache line

[Figure: both CPUs cache the same line holding a[0]..a[7]; every write by
one CPU invalidates the line in the other CPU's cache, causing ping-ponging.]
Matrix-Vector Multiplication with
Pthreads

See the parallel programming book by Pacheco, pp. 159–162.
[Sequential code figure not captured.]
Block Mapping for Matrix-Vector Multiplication

• Task partitioning:
    for (i = 0; i < m; i++)        // Task Si for row i
        y[i] = 0;
        for (j = 0; j < n; j++)
            y[i] = y[i] + a[i][j] * x[j];
• Task graph: S0, S1, ..., Sm-1 (independent tasks)
• Mapping to threads: S0, S1 → Thread 0; S2, S3 → Thread 1; ...
Using 3 Pthreads for 6 rows: 2 rows per thread
Thread 0: S0, S1;  Thread 1: S2, S3;  Thread 2: S4, S5
[Code figure not captured.]
Pthread code for thread with ID rank
i-th thread calls Pth_mat_vect( &i)
m is # of rows in this matrix A.
n is # of columns in this matrix A.
local_m is # of rows handled by
this thread.

Task Si

Impact of false sharing on performance of
matrix-vector multiplication

(times are in seconds)

Why is the performance for the 8 × 8,000,000 matrix bad?
How can we fix that?
Deadlock and Starvation

• Deadlock – two or more threads are waiting
  indefinitely for an event that can only be caused by
  one of the other waiting threads.
• Starvation – indefinite blocking (waiting in a queue
  forever).
■ Let S and Q be two mutex locks:
P0 P1
Lock(S); Lock(Q);
Lock(Q); Lock(S);
. .
. .
. .
Unlock(Q); Unlock(S);
Unlock(S); Unlock(Q);
Deadlock Avoidance

• Order the locks and always acquire the locks in


that order.
• Eliminates circular waiting:
P0 P1
Lock(S); Lock(S);
Lock(Q); Lock(Q);
. .
. .
. .
Unlock(Q); Unlock(Q);
Unlock(S); Unlock(S);
Thread-Safety

• A block of code is thread-safe if it can be


simultaneously executed by multiple threads without
causing problems.
• When you program your own functions, you know if
they are safe to be called by multiple threads or not.
• You may forget to check whether the system library
  functions used are thread-safe.
  § Unsafe function: strtok() from the C string.h library.
  § Other examples:
    – The random number generator random() in stdlib.h.
    – The time conversion function localtime() in time.h.
Concluding Remarks

• A thread in shared-memory programming is analogous


to a process in distributed memory programming.
§ However, a thread is often lighter-weight than a full-
fledged process.
• When multiple threads access a shared resource
without coordination, the result may be an error: we have a
race condition.
§ A critical section is a block of code that updates a
shared resource that can only be updated by one
thread at a time
§ Mutex, semaphore, condition variables
• Issues: false sharing, deadlock, thread safety
