Pthreads
[Figure: a process whose threads T1 to T5 share code, data, and kernel context, shown next to a process hierarchy (P1, sh, sh, sh, foo).]
Execution Flow on one-core or multi-core
systems
Concurrent execution on a single core system: Two threads run
concurrently if their logical flows overlap in time
• Responsiveness
• Resource Sharing
§ Shared memory
• Economy
• Scalability
§ Exploit multi-core CPUs
Thread Programming with Shared Memory
• Program is a collection of threads of control.
§ Can be created dynamically
• Each thread has a set of private variables, e.g., local stack
variables
• Also a set of shared variables, e.g., static variables, shared
common blocks, or global heap.
§ Threads communicate implicitly by writing and reading
shared variables.
§ Threads coordinate by synchronizing on shared
variables
[Figure: shared memory holds the variable s (s = ...); each processor P0, P1, ..., Pn has private memory holding its own copy of i (i: 2, i: 5, i: 8).]
Shared Memory Programming
int pthread_create (
pthread_t* thread_p /* out */ ,
const pthread_attr_t* attr_p /* in */ ,
void* (*start_routine ) ( void* ) /* in */ ,
void* arg_p /* in */ ) ;
Copyright © 2010, Elsevier
Inc. All rights Reserved
A closer look (1)
int pthread_create (
pthread_t* thread_p /* out: receives the new thread's identifier */ ,
const pthread_attr_t* attr_p /* in: thread attributes, NULL for the defaults */ ,
void* (*start_routine ) ( void* ) /* in: function the thread runs */ ,
void* arg_p /* in: argument passed to start_routine */ ) ;
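The prototype can be exercised with a small sketch like the one below. It is only an illustration: the names `hello`, `run_hello`, and `MAX_THREADS` are made up for this example, and each thread gets a pointer to its own slot in a `ranks` array so the argument stays valid for the thread's lifetime.

```c
#include <pthread.h>
#include <stdio.h>

#define MAX_THREADS 16   /* hypothetical upper bound for this sketch */

/* Start routine: takes a void* argument and returns a void*. */
static void *hello(void *arg_p) {
    long rank = *(long *)arg_p;          /* argument supplied through pthread_create */
    printf("Hello from thread %ld\n", rank);
    return NULL;
}

/* Creates thread_count threads and waits for all of them.
   Returns 0 on success, -1 on a bad count or a failed create. */
int run_hello(int thread_count) {
    pthread_t threads[MAX_THREADS];
    long ranks[MAX_THREADS];             /* one slot per thread */
    int created = 0, status = 0;

    if (thread_count < 1 || thread_count > MAX_THREADS) return -1;
    for (; created < thread_count; created++) {
        ranks[created] = created;
        if (pthread_create(&threads[created], NULL, hello, &ranks[created]) != 0) {
            status = -1;
            break;
        }
    }
    for (int t = 0; t < created; t++)    /* join only the threads that were started */
        pthread_join(threads[t], NULL);
    return status;
}
```

Calling `run_hello(2)` from `main` prints one greeting per thread; the order of the lines is not guaranteed.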
Others:
• pthread_yield();
§ Informs the scheduler that the thread is willing to yield the CPU (non-standard; the portable call is sched_yield())
• pthread_exit(void *value);
§ Exits the thread and passes value to the joining thread (if one exists)
• pthread_t me; me = pthread_self();
§ Allows a thread to obtain its own identifier
• Synchronizing access to shared variables
§ pthread_mutex_init, pthread_mutex_[un]lock
§ pthread_cond_init, pthread_cond_[timed]wait
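A minimal sketch of the first three calls, under the assumption that `sched_yield()` is used in place of the non-standard `pthread_yield()`. The names `worker`, `worker_seen_id`, and `run_self_exit_demo` are invented for the example.

```c
#include <pthread.h>
#include <sched.h>
#include <stdint.h>

static pthread_t worker_seen_id;   /* written by the worker, read after the join */

static void *worker(void *arg) {
    (void)arg;
    worker_seen_id = pthread_self();       /* the thread's own identifier */
    sched_yield();                         /* offer the CPU to other threads */
    pthread_exit((void *)(intptr_t)42);    /* value handed to the joining thread */
    return NULL;                           /* not reached */
}

/* Returns the worker's exit value, or -1 if something went wrong. */
int run_self_exit_demo(void) {
    pthread_t tid;
    void *result = NULL;

    if (pthread_create(&tid, NULL, worker, NULL) != 0) return -1;
    if (pthread_join(tid, &result) != 0) return -1;
    /* pthread_t is opaque, so identifiers are compared with pthread_equal */
    if (!pthread_equal(tid, worker_seen_id)) return -1;
    return (int)(intptr_t)result;
}
```

The join makes the worker's write to `worker_seen_id` visible to the caller, so no extra locking is needed here.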
Compiling a Pthread program
./pth_hello
Hello from thread 0
Hello from thread 1
static int s = 0; /* shared by Thread 0 and Thread 1 */
1. Busy waiting
2. Mutex (lock)
3. Semaphore
4. Condition variables
Example of Busy Waiting
static int s = 0;
static int flag = 0;

Thread 0 (my_rank = 0)          Thread 1 (my_rank = 1)
int temp, my_rank               int temp, my_rank
for i = 0, n/2-1                for i = n/2, n-1
   temp = f(A[i])                  temp = f(A[i])
   while flag != my_rank;          while flag != my_rank;
   s = s + temp                    s = s + temp
   flag = (flag+1) % 2             flag = (flag+1) % 2
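The busy-waiting scheme can be written in C roughly as follows. This is a sketch under assumptions: `f` is a hypothetical stand-in, `N` is kept small, and `flag` is declared as a C11 `atomic_int` so the compiler cannot hoist or reorder the spin test (a plain `int` flag is not reliable). Error checks on `pthread_create` are omitted for brevity.

```c
#include <pthread.h>
#include <stdatomic.h>

#define N 200                       /* array length, assumed even */

static int A[N];
static int s = 0;                   /* shared sum, updated only in the critical section */
static atomic_int flag = 0;         /* rank of the thread allowed to enter next */

static int f(int x) { return 2 * x; }   /* hypothetical stand-in for f */

static void *partial_sum(void *arg) {
    int my_rank = *(int *)arg;
    int first = my_rank * (N / 2);
    int last = first + N / 2;
    for (int i = first; i < last; i++) {
        int temp = f(A[i]);
        while (atomic_load(&flag) != my_rank)
            ;                                   /* spin until it is this thread's turn */
        s = s + temp;                           /* critical section */
        atomic_store(&flag, (my_rank + 1) % 2); /* same as (flag+1)%2, since flag == my_rank */
    }
    return NULL;
}

/* Fills A with 0..N-1, runs both threads, and returns the sum of f(A[i]). */
int busy_wait_sum(void) {
    pthread_t t[2];
    int ranks[2] = {0, 1};
    s = 0;
    atomic_store(&flag, 0);
    for (int i = 0; i < N; i++) A[i] = i;
    for (int r = 0; r < 2; r++) pthread_create(&t[r], NULL, partial_sum, &ranks[r]);
    for (int r = 0; r < 2; r++) pthread_join(t[r], NULL);
    return s;
}
```

The two threads enter the critical section in strict alternation, so a thread that is ready early still burns CPU time until the other one has taken its turn.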
Critical section
• Code structure
Lock/Acquire mutex
Critical section
Unlock/Release mutex
• Thread 1 and Thread 2 each wrap their critical section in the same lock/unlock pair
Mutexes in Pthreads
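• To acquire: int pthread_mutex_lock(pthread_mutex_t* mutex_p);
• To release: int pthread_mutex_unlock(pthread_mutex_t* mutex_p);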
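As an illustration of the lock/critical-section/unlock structure, the sketch below protects a shared counter with a statically initialized mutex. The names `counter`, `add_many`, and `count_with_mutex` are made up for this example.

```c
#include <pthread.h>

static long counter = 0;                                    /* shared variable */
static pthread_mutex_t counter_mutex = PTHREAD_MUTEX_INITIALIZER;

static void *add_many(void *arg) {
    long increments = *(long *)arg;
    for (long i = 0; i < increments; i++) {
        pthread_mutex_lock(&counter_mutex);     /* lock/acquire */
        counter++;                              /* critical section */
        pthread_mutex_unlock(&counter_mutex);   /* unlock/release */
    }
    return NULL;
}

/* Runs thread_count threads (1 to 8), each adding `increments` to the counter.
   Returns the final counter value, or -1 for an unsupported thread count. */
long count_with_mutex(int thread_count, long increments) {
    pthread_t threads[8];
    int created = 0;

    if (thread_count < 1 || thread_count > 8) return -1;
    counter = 0;
    while (created < thread_count &&
           pthread_create(&threads[created], NULL, add_many, &increments) == 0)
        created++;
    for (int t = 0; t < created; t++)
        pthread_join(threads[t], NULL);
    return counter;
}
```

Without the lock/unlock pair the increments from different threads can overwrite each other and the final count is usually too small.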
[Figure: threads T0, T1, T2 passing messages]
sem_post(&semaphores[dest]);
/* signal the dest thread */
sem_wait(&semaphores[my_rank]);
/* wait until the source message is created */
Consume a message
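The fragment above can be placed in a complete sketch in which each thread sends one message to its right-hand neighbour and then waits for its own. The ring layout, the integer messages, and the names `send_and_receive` and `run_ring` are assumptions made for illustration; unnamed POSIX semaphores (`sem_init`) are assumed to be available, as on Linux.

```c
#include <pthread.h>
#include <semaphore.h>

#define THREADS 3

static int messages[THREADS];        /* messages[i] is the inbox of thread i */
static sem_t semaphores[THREADS];    /* semaphores[i] counts messages waiting for thread i */
static int received_from[THREADS];   /* what each thread read from its inbox */

static void *send_and_receive(void *arg) {
    int my_rank = *(int *)arg;
    int dest = (my_rank + 1) % THREADS;

    messages[dest] = my_rank;               /* create the message for dest */
    sem_post(&semaphores[dest]);            /* signal the dest thread */

    sem_wait(&semaphores[my_rank]);         /* wait until the source message is created */
    received_from[my_rank] = messages[my_rank];   /* consume the message */
    return NULL;
}

/* Returns 0 if every thread received the rank of its left-hand neighbour. */
int run_ring(void) {
    pthread_t t[THREADS];
    int ranks[THREADS];

    for (int i = 0; i < THREADS; i++) sem_init(&semaphores[i], 0, 0);  /* start at 0 */
    for (int i = 0; i < THREADS; i++) {
        ranks[i] = i;
        pthread_create(&t[i], NULL, send_and_receive, &ranks[i]);
    }
    for (int i = 0; i < THREADS; i++) pthread_join(t[i], NULL);
    for (int i = 0; i < THREADS; i++) sem_destroy(&semaphores[i]);

    for (int i = 0; i < THREADS; i++)
        if (received_from[i] != (i + THREADS - 1) % THREADS) return -1;
    return 0;
}
```

Because each semaphore starts at 0, a receiver that arrives first blocks in `sem_wait` instead of reading an empty inbox, and a `sem_post` that arrives first is remembered by the count.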
         Reader   Writer
Reader   OK       No
Writer   No       No
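Readers-Writers (First try with 1 mutex lock)
• writer
do {
mutex_lock(w);
// writing is performed
mutex_unlock(w);
} while (TRUE);
• Reader
do {
mutex_lock(w);
// reading is performed
mutex_unlock(w);
} while (TRUE);
Concurrent access allowed by this solution:
         Reader   Writer
Reader   no       no
Writer   no       no
The solution is safe but too restrictive: two readers can never read at the same time.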
2nd try using a lock + readcount
• writer
do {
mutex_lock(w);// Use writer mutex lock
// writing is performed
mutex_unlock(w);
} while (TRUE);
• Reader
do {
readcount++; // add a reader counter.
if(readcount==1) mutex_lock(w);
// reading is performed
readcount--;
if(readcount==0) mutex_unlock(w);
} while (TRUE);
Readers-Writers Problem with semaphore
• Shared Data
§ Data set
§ Lock mutex (to protect readcount)
§ Semaphore wrt initialized to 1 (to synchronize between readers/writers)
§ Integer readcount initialized to 0
Readers-Writers Problem
• A writer
do {
sem_wait(wrt); // semaphore wrt: wait until no reader or writer is active
// writing is performed
sem_post(wrt); // let waiting readers or writers proceed
} while (TRUE);
Readers-Writers Problem (Cont.)
• Reader
do {
mutex_lock(mutex);
readcount++;
if (readcount == 1)
sem_wait(wrt); //check if anybody is writing
mutex_unlock(mutex);
// reading is performed
mutex_lock(mutex);
readcount--;
if (readcount == 0)
sem_post(wrt); //writing is allowed now
mutex_unlock(mutex);
} while (TRUE);
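The reader and writer loops can be combined into a runnable sketch using a Pthreads mutex for `readcount` and a POSIX semaphore for `wrt`. The bounded `rounds` loops, the two-reader/two-writer setup, and the name `run_readers_writers` are assumptions for the example, not part of the slides.

```c
#include <pthread.h>
#include <semaphore.h>

static int shared_data = 0;          /* the data set */
static int readcount = 0;            /* number of readers currently reading */
static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;   /* protects readcount */
static sem_t wrt;                    /* held by one writer or by the group of readers */

static void *writer(void *arg) {
    int rounds = *(int *)arg;
    for (int i = 0; i < rounds; i++) {
        sem_wait(&wrt);
        shared_data++;                           /* writing is performed */
        sem_post(&wrt);
    }
    return NULL;
}

static void *reader(void *arg) {
    int rounds = *(int *)arg;
    long seen = 0;
    for (int i = 0; i < rounds; i++) {
        pthread_mutex_lock(&mutex);
        if (++readcount == 1) sem_wait(&wrt);    /* first reader locks out writers */
        pthread_mutex_unlock(&mutex);

        seen += shared_data;                     /* reading is performed */

        pthread_mutex_lock(&mutex);
        if (--readcount == 0) sem_post(&wrt);    /* last reader lets writers in */
        pthread_mutex_unlock(&mutex);
    }
    (void)seen;
    return NULL;
}

/* Runs 2 writers and 2 readers; returns the final value of shared_data. */
int run_readers_writers(int rounds) {
    pthread_t r[2], w[2];
    shared_data = 0;
    readcount = 0;
    sem_init(&wrt, 0, 1);
    for (int i = 0; i < 2; i++) {
        pthread_create(&w[i], NULL, writer, &rounds);
        pthread_create(&r[i], NULL, reader, &rounds);
    }
    for (int i = 0; i < 2; i++) {
        pthread_join(w[i], NULL);
        pthread_join(r[i], NULL);
    }
    sem_destroy(&wrt);
    return shared_data;
}
```

`wrt` is a semaphore rather than a mutex because the reader that releases it (the last one out) may be a different thread from the reader that acquired it (the first one in).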
Barriers
• Why?
• More programming primitives to simplify code for
synchronization of threads
Synchronization   Functionality
Busy waiting      Spinning for a condition. Wastes resources. Not safe (an optimizing compiler may reorder or remove the spin loop).
Mutex lock        Supports code with simple mutual exclusion.
Semaphore         Signal-based synchronization. Allows sharing (no wait unless semaphore = 0).
§ Producer thread:
mutex_lock(&m);
Produce next item; avail = avail+1;
cond_signal(&cond); //notify an item is available
mutex_unlock(&m);
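A matching consumer waits on the same condition variable. The sketch below pairs it with the producer using the real Pthreads calls; the `consumed` counter, the bounded loops, and the name `run_producer_consumer` are assumptions made for the example.

```c
#include <pthread.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int avail = 0;       /* items produced but not yet consumed */
static int consumed = 0;    /* items taken by the consumer */

static void *producer(void *arg) {
    int items = *(int *)arg;
    for (int i = 0; i < items; i++) {
        pthread_mutex_lock(&m);
        avail = avail + 1;                 /* produce next item */
        pthread_cond_signal(&cond);        /* notify that an item is available */
        pthread_mutex_unlock(&m);
    }
    return NULL;
}

static void *consumer(void *arg) {
    int items = *(int *)arg;
    for (int i = 0; i < items; i++) {
        pthread_mutex_lock(&m);
        while (avail <= 0)                 /* re-check the condition after every wakeup */
            pthread_cond_wait(&cond, &m);  /* releases m while blocked, reacquires it on return */
        avail = avail - 1;                 /* consume next item */
        consumed++;
        pthread_mutex_unlock(&m);
    }
    return NULL;
}

/* Runs one producer and one consumer; returns the number of items consumed. */
int run_producer_consumer(int items) {
    pthread_t p, c;
    avail = 0;
    consumed = 0;
    pthread_create(&c, NULL, consumer, &items);
    pthread_create(&p, NULL, producer, &items);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return consumed;
}
```

The wait sits in a `while` loop rather than an `if` because `pthread_cond_wait` may return spuriously, and because another consumer could take the item first when several are waiting.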
When to use condition broadcast?
Issues with Threads: False Sharing,
Deadlocks, Thread-safety
False Sharing: Example
Two CPUs execute:
for( i=0; i<n; i++ )
a[i] = b[i];
[Figure: a[0] to a[7] lie in one cache line; CPU 0 writes a[0], a[2], a[4], ... and CPU 1 writes the remaining elements.]
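The difference between an interleaved and a blocked split of the copy loop can be sketched as follows. Both routines produce the same result; only the memory access pattern differs. The array size, thread count, and function names are assumptions for the example, and no timing figures are claimed.

```c
#include <pthread.h>
#include <string.h>

#define N 4096        /* array length, a multiple of THREADS */
#define THREADS 2

static int a[N], b[N];

/* Interleaved split: thread r writes a[r], a[r+THREADS], ...
   Neighbouring elements of one cache line are written by different threads. */
static void *copy_interleaved(void *arg) {
    int rank = *(int *)arg;
    for (int i = rank; i < N; i += THREADS)
        a[i] = b[i];
    return NULL;
}

/* Blocked split: thread r writes one contiguous range, so two threads can
   share a cache line only where their ranges meet. */
static void *copy_blocked(void *arg) {
    int rank = *(int *)arg;
    int chunk = N / THREADS;
    for (int i = rank * chunk; i < (rank + 1) * chunk; i++)
        a[i] = b[i];
    return NULL;
}

/* Runs the given copy routine on THREADS threads; returns 1 if a equals b afterwards. */
int run_copy(void *(*routine)(void *)) {
    pthread_t t[THREADS];
    int ranks[THREADS];

    for (int i = 0; i < N; i++) { a[i] = 0; b[i] = i + 1; }
    for (int r = 0; r < THREADS; r++) {
        ranks[r] = r;
        pthread_create(&t[r], NULL, routine, &ranks[r]);
    }
    for (int r = 0; r < THREADS; r++) pthread_join(t[r], NULL);
    return memcmp(a, b, sizeof a) == 0;
}
```

With the interleaved split, every write by one CPU invalidates the line in the other CPU's cache even though the two never touch the same element, which is what makes the sharing "false".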
• Task partitioning
for (i=0; i<m; i=i+1) {
   /* Task Si for Row i */
   y[i]=0;
   for (j=0; j<n; j=j+1)
      y[i]=y[i] +a[i][j]*x[j];
}
Task graph: independent tasks S0, S1, ..., Sm
Mapping to threads: S0, S1 to Thread 0; S2, S3 to Thread 1; ...
Using 3 Pthreads for 6 rows: 2 rows per thread
Thread 0: S0, S1
Thread 1: S2, S3
Thread 2: S4, S5
Pthread code for thread with ID rank
• i-th thread calls Pth_mat_vect( &i)
• m is # of rows in this matrix A.
• n is # of columns in this matrix A.
• local_m is # of rows handled by this thread.
• Each thread runs the code of task Si for every row i assigned to it.
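The thread function described above might look like the sketch below. It is a reconstruction under assumptions, not the original listing: the sizes (6 rows, 4 columns, 3 threads), the test data, and the driver `run_mat_vect_demo` are invented, and each thread receives a pointer to its own entry of a `ranks` array rather than to a shared loop variable.

```c
#include <pthread.h>

enum { m = 6, n = 4, thread_count = 3 };   /* assumed sizes; m is divisible by thread_count */

static double A[m][n], x[n], y[m];

static void *Pth_mat_vect(void *rank_p) {
    int my_rank = *(int *)rank_p;
    int local_m = m / thread_count;            /* # of rows handled by this thread */
    int my_first_row = my_rank * local_m;
    int my_last_row = my_first_row + local_m - 1;

    for (int i = my_first_row; i <= my_last_row; i++) {   /* task Si */
        y[i] = 0.0;
        for (int j = 0; j < n; j++)
            y[i] += A[i][j] * x[j];
    }
    return NULL;
}

/* Fills A[i][j] = i + j and x[j] = 1, computes y = A*x with thread_count threads,
   and returns the last entry of y. */
double run_mat_vect_demo(void) {
    pthread_t t[thread_count];
    int ranks[thread_count];

    for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++)
            A[i][j] = i + j;
    for (int j = 0; j < n; j++) x[j] = 1.0;

    for (int r = 0; r < thread_count; r++) {
        ranks[r] = r;                          /* each thread gets its own rank slot */
        pthread_create(&t[r], NULL, Pth_mat_vect, &ranks[r]);
    }
    for (int r = 0; r < thread_count; r++) pthread_join(t[r], NULL);
    return y[m - 1];
}
```

No lock is needed because each thread writes a disjoint set of rows of `y` and only reads `A` and `x`.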
Why is performance of
8x8,000,000 matrix bad?
How to fix that?
Deadlock and Starvation