
Multithreaded Algorithms

Motivation

Serial algorithms are suitable for running on a uniprocessor computer. We will now extend our model to parallel algorithms that can run on a multiprocessor computer.
Computational Model

There exist many competing models of parallel computation that are essentially different. For example, one can have shared or distributed memory. Since multicore processors are ubiquitous, we focus on a parallel computing model with shared memory.
Threads

• A thread in computer science is short for a thread of execution.
• A thread of execution is the smallest sequence of programmed instructions that can be managed independently by a scheduler, which is typically a part of the operating system.
• Threads are a way for a program to divide (termed "split") itself into two or more simultaneously (or pseudo-simultaneously) running tasks.
• The implementation of threads and processes differs between operating systems, but in most cases a thread is a component of a process. Multiple threads can exist within one process, executing concurrently and sharing resources such as memory, while different processes do not share these resources.
Threading Types

Two types are feasible:
• Static threading: the OS controls the threads; typically used on single-core CPUs, though multi-core CPUs can use it if the compiler guarantees safe execution.
• Dynamic threading: the program controls the threads explicitly; threads are created and destroyed as needed. This is the parallel computing model we use.

Threads allow concurrent execution of two or more parts of a program for maximum utilization of the CPU.
Dynamic Multithreading

Programming a shared-memory parallel computer can be difficult and error-prone. In particular, it is difficult to partition the work among several threads so that each thread has approximately the same load.

A concurrency platform is a software layer that coordinates, schedules, and manages parallel-computing resources. We will use a simple extension of the serial programming model that uses the concurrency instructions parallel, spawn, and sync.
Spawn

Spawn: If spawn precedes a procedure call, then the procedure instance that executes the spawn (the parent) may continue to execute in parallel with the spawned subroutine (the child), instead of waiting for the child to complete.

The keyword spawn does not say that a procedure must execute concurrently, but simply that it may. At runtime, it is up to the scheduler to decide which subcomputations should run concurrently.
Sync

The keyword sync indicates that the procedure must wait for all its spawned children to complete.
Parallel

Many algorithms contain loops where all iterations can operate in parallel. If the parallel keyword precedes a for loop, then this indicates that the loop body can be executed in parallel.
OpenMP – uses a task scheduler to create threads

#include <stdio.h>
#include <omp.h>

int main() {
    int i, n = 5;
    int array[5] = {0, 1, 4, 9, 16};
    #pragma omp parallel for            // i is implicitly private to each thread
    for (i = 0; i < n; i++) {
        printf("Thread %d processes index %d\n", omp_get_thread_num(), i);
        array[i] = array[i] * 2;        // example operation: double each element
        printf("array[%d] = %d\n", i, array[i]);
    }
    return 0;
}
Fibonacci Numbers

Definition

The Fibonacci numbers (0, 1, 1, 2, 3, 5, 8, 13, …) are defined by the recurrence:

    F(0) = 0
    F(1) = 1
    F(i) = F(i-1) + F(i-2)   for i > 1.
Naive Algorithm

Computing the Fibonacci numbers can be done with the following algorithm:

Fibonacci(n)
    if n < 2 then return n
    x = Fibonacci(n-1)
    y = Fibonacci(n-2)
    return x + y
Running Time

Let T(n) denote the running time of Fibonacci(n). Since this procedure contains two recursive calls and a constant amount of extra work, we get

    T(n) = T(n-1) + T(n-2) + θ(1)

which yields T(n) = θ(F(n)) = θ( ((1+sqrt(5))/2)^n ).

Since this grows exponentially in n, this is a particularly bad way to calculate Fibonacci numbers.

How would you calculate the Fibonacci numbers?
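For comparison (a minimal sketch, not from the original slides), a simple bottom-up loop computes F(n) in θ(n) time and constant space:

#include <stdio.h>

// Bottom-up Fibonacci: θ(n) time, O(1) space.
long long fib_iter(int n) {
    if (n < 2) return n;
    long long prev = 0, curr = 1;   // F(0), F(1)
    for (int i = 2; i <= n; i++) {
        long long next = prev + curr;
        prev = curr;
        curr = next;
    }
    return curr;
}

int main(void) {
    printf("F(10) = %lld\n", fib_iter(10));   // prints F(10) = 55
    return 0;
}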
Fibonacci Example

Observe that within FIB(n), the two recursive calls in lines 3 and 4, to FIB(n-1) and FIB(n-2) respectively, are independent of each other: they could be called in either order, and the computation performed by one in no way affects the other. Therefore, the two recursive calls can run in parallel.
Fibonacci Example

Parallel algorithm to compute Fibonacci numbers: we augment our pseudocode to indicate parallelism by adding the concurrency keywords spawn and sync. Here is how we can rewrite the FIB procedure to use dynamic multithreading:
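The P-FIB pseudocode itself appeared as an image on the original slide; the reconstruction below follows the serial FIB procedure above, with line numbers included because the discussion on the next slides refers to them:

P-FIB(n)
1   if n < 2
2       return n
3   x = spawn P-FIB(n-1)
4   y = P-FIB(n-2)
5   sync
6   return x + y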
Spawn, Sync & Parallel

Notice that if we delete the concurrency keywords spawn and sync from P-FIB, the resulting pseudocode text is identical to FIB (other than renaming the procedure in the header and in the two recursive calls).

We define the serialization of a multithreaded algorithm to be the serial algorithm that results from deleting the multithreaded keywords: spawn, sync, and parallel.
Spawn

Nested parallelism occurs when the keyword spawn precedes a procedure call, as in line 3. It creates a concurrent process.

The semantics of a spawn differs from an ordinary procedure call in that the procedure instance that executes the spawn (the parent) may continue to execute in parallel with the spawned subroutine (its child), instead of waiting for the child to complete, as would normally happen in a serial execution.
Spawn

In this case, while the spawned child is computing P-FIB(n-1), the parent may go on to compute P-FIB(n-2) in line 4 in parallel with the spawned child.

Since the P-FIB procedure is recursive, these two subroutine calls themselves create nested parallelism, as do their children, thereby creating a potentially vast tree of subcomputations, all executing in parallel.
Spawn

The keyword spawn does not say, however, that a procedure must execute concurrently with its spawned children, only that it may. The concurrency keywords express the logical parallelism of the computation, indicating which parts of the computation may proceed in parallel. At runtime, it is up to a scheduler to determine which subcomputations actually run concurrently by assigning them to available processors as the computation unfolds.
Sync

A procedure cannot safely use the values returned by its spawned children until after it executes a sync statement, as in line 5. The keyword sync indicates that the procedure must wait as necessary for all its spawned children to complete execution before proceeding to the statement after the sync.

In the P-FIB procedure, a sync is required before the return statement in line 6 to avoid the anomaly that would occur if x and y were summed before x was computed.

In addition to the explicit synchronization provided by the sync statement, every procedure executes a sync implicitly before it returns, thus ensuring that all its children terminate before it does.
A Model for Multithreaded Execution

It helps to think of a multithreaded computation (the set of runtime instructions executed by a processor on behalf of a multithreaded program) as a directed acyclic graph G = (V, E), called a computation dag.
Computation DAG

Multithreaded computation can be better understood with the help of a computation directed acyclic graph G = (V, E). The vertices V in the graph are the instructions. The edges E represent dependencies between instructions: an edge (u, v) in E means that instruction u must execute before instruction v.
Strands and Threads

A sequence of instructions containing no parallel control (spawn, sync, return from a spawn, parallel) can be grouped into a single strand.

Thus, V represents a set of strands and E the dependencies between the strands introduced by parallel control.

A strand of maximal length will be called a thread.
Computation DAG

A computation directed acyclic graph G = (V, E) consists of a vertex set V that comprises the threads of the program. The edge set E contains an edge (u, v) if and only if thread u needs to execute before thread v.

If there is an edge between threads u and v, then they are said to be (logically) in series. If there is no edge, then they are said to be (logically) in parallel.
Edge Classification

• A continuation edge (u, v) connects a thread u to its successor v within the same procedure instance.
• When a thread u spawns a new thread v, then (u, v) is called a spawn edge.
• When a thread v returns to its calling procedure and x is the thread following the parallel control, then the return edge (v, x) is included in the graph.
Fibonacci Example

Parallel algorithm to compute Fibonacci numbers:

Fibonacci(4)

• Each circle represents one strand (a chain of instructions that contains no parallel control).
• Black dots: the base case, or the part of the procedure up to the spawn of P-FIB(n-1) in line 3.
• Grey dots: regular execution, i.e., the part of the procedure that calls P-FIB(n-2) in line 4, up to the sync in line 5.
• White dots: the part of the procedure after the sync, up to the point where it returns the result.
Performance Measures

DAG: directed acyclic graph. Vertices are the circles for spawn, sync, or procedure call. For a problem of size n:

• Span, S or T∞(n): the number of vertices on the longest directed path from start to finish in the computation DAG (the critical path). This is the running time if each vertex of the DAG has its own processor.
• Work, W or T1(n): the total time to execute the entire computation on one processor, defined as the number of vertices in the computation DAG.
• Tp(n): the total time to execute the entire computation with p processors.
• Speedup = T1/Tp: how much faster the computation is.
• Parallelism = T1/T∞: the maximum possible speedup.
Performance Measures

The work of a multithreaded computation is the total time to execute the entire computation on one processor.
    Work = sum of the times taken by each thread = 17 time units

The span is the longest time to execute the strands along any path of the computation DAG.
    Span = the number of vertices on a longest (critical) path = 8 time units
Performance Measure Example

In Fibonacci(4), we have 17 vertices = 17 threads, and 8 vertices on the longest path. Assuming unit time for each thread, we get
    work = 17 time units
    span = 8 time units

The actual running time of a multithreaded computation depends not just on its work and span, but also on how many processors (cores) are available, and on how the scheduler allocates strands to processors. The running time on P processors is indicated by the subscript P:
- T1: running time on a single processor
- TP: running time on P processors
- T∞: running time on an unlimited number of processors, also called the span
Work Law

An ideal parallel computer with P processors can do at most P units of work per time step, and thus in time Tp it can perform at most P·Tp work. Since the total work is T1, we have

    P·Tp >= T1

Dividing by P yields the work law:

    Tp >= T1/P
Span Law

A P-processor ideal parallel computer cannot run faster than a machine with an unlimited number of processors. However, a computer with an unlimited number of processors can emulate a P-processor machine by using just P of its processors. Therefore,

    Tp >= T∞

(with P processors, the execution time is greater than or equal to the execution time with an unlimited number of processors), which is called the span law.

- TP: running time on P processors
- T∞: running time on an unlimited number of processors

Span Law Explanation: why is this true?
• Unlimited processors: with unlimited processors you could ideally execute all operations in parallel. The only limitation is the critical path length (the longest sequence of dependent operations). Hence, the time to finish the algorithm in the best case is T∞.
• P processors: on a machine with only P processors, you cannot exploit the parallelism as fully, because there are fewer processors to distribute the work over. Some operations will inevitably have to wait for others, so the execution time will generally be greater than or equal to the time it would take with an unlimited number of processors.
Speedup and Parallelism

The speedup of a computation on P processors is defined as T1/Tp, i.e., how many times faster the computation runs on P processors than on 1 processor.

Since P·Tp >= T1 by the work law, the speedup on P processors can be at most P.
Speedup and Parallelism

The parallelism (maximum possible speedup) of a multithreaded computation is given by T1/T∞. We can view the parallelism from three perspectives:

• As a ratio, the parallelism denotes the average amount of work that can be performed in parallel for each step along the critical path.
• As an upper bound, the parallelism gives the maximum possible speedup that can be achieved on any number of processors.
• Finally, and perhaps most important, the parallelism provides a limit on the possibility of attaining perfect linear speedup. Specifically, once the number of processors exceeds the parallelism, the computation cannot possibly achieve perfect linear speedup.
Speedup and Parallelism

Consider the computation P-FIB(4) and assume that each strand takes unit time. Since the work is T1 = 17 and the span is T∞ = 8, the parallelism is T1/T∞ = 17/8 = 2.125.

Consequently, achieving much more than double the speedup is impossible, no matter how many processors we employ to execute the computation.
Scheduling

The performance depends not just on the work and span. Additionally, the strands must be scheduled efficiently onto the processors of the parallel machine. The strands must be mapped to static threads, and the operating system schedules the threads on the processors themselves. The scheduler must schedule the computation with no advance knowledge of when the strands will be spawned or when they will complete; it must operate online.
Greedy Scheduler

We will assume a greedy scheduler in our analysis, since this keeps things simple. A greedy scheduler assigns as many strands to processors as possible in each time step. On P processors, if at least P strands are ready to execute during a time step, then we say that the step is a complete step; otherwise we say that it is an incomplete step.
Greedy Scheduler Theorem

On an ideal parallel computer with P processors, a greedy scheduler executes a multithreaded computation with work T1 and span T∞ in time

    TP <= T1/P + T∞

(Given that the best we can hope for on P processors is TP = T1/P by the work law and TP = T∞ by the span law, this bound is the sum of those two lower bounds.)
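As a quick check with the P-FIB(4) numbers from earlier (work T1 = 17, span T∞ = 8): on P = 2 processors a greedy scheduler needs at most TP <= 17/2 + 8 = 16.5 time units, while the work and span laws alone only guarantee TP >= max(17/2, 8) = 8.5.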
Slackness

The parallel slackness of a multithreaded computation executed on an ideal parallel computer with P processors is the ratio of the parallelism to P:

    Slackness = (T1/T∞) / P

Informally, slackness in the context of parallel computing and scheduling refers to the amount of unused or idle time that a processor has while waiting for tasks to be assigned, i.e., the difference between the time available and the time actually required to complete a task.
Speedup

Let TP be the running time of a multithreaded computation produced by a greedy scheduler on an ideal computer with P processors. Let T1 be the work and T∞ the span of the computation. If the slackness is large, i.e., P << T1/T∞, then TP is approximately T1/P.
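For example (hypothetical numbers, not from the original slides): if T1 = 1,000,000 and T∞ = 1,000, the parallelism is 1,000. On P = 10 processors the slackness is 100, and the greedy-scheduler bound gives TP <= 1,000,000/10 + 1,000 = 101,000, within 1% of the ideal T1/P = 100,000.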
Back to Fibonacci

Parallel Fibonacci Computation

Parallel algorithm to compute Fibonacci numbers:

Fibonacci(n)
    if n < 2 then return n
    x = spawn Fibonacci(n-1)    // parallel execution
    y = spawn Fibonacci(n-2)    // parallel execution
    sync                        // wait for the results of x and y
    return x + y
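To make the spawn/sync pattern concrete, here is a rough OpenMP sketch (not from the original slides) in which spawn roughly corresponds to #pragma omp task and sync to #pragma omp taskwait:

#include <stdio.h>
#include <omp.h>

// Sketch of the parallel Fibonacci above using OpenMP tasks.
long pfib(int n) {
    if (n < 2) return n;
    long x, y;
    #pragma omp task shared(x)      // "spawn" Fibonacci(n-1)
    x = pfib(n - 1);
    #pragma omp task shared(y)      // "spawn" Fibonacci(n-2)
    y = pfib(n - 2);
    #pragma omp taskwait            // "sync": wait for both children
    return x + y;
}

int main(void) {
    long result;
    #pragma omp parallel            // create a team of threads
    {
        #pragma omp single          // one thread starts the top-level call
        result = pfib(20);
    }
    printf("Fibonacci(20) = %ld\n", result);   // prints 6765
    return 0;
}

A real implementation would stop creating tasks below some cutoff size, since spawning a task for every tiny subproblem costs more than it saves.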
Work of Fibonacci

We want to know the work and span of the Fibonacci computation, so that we can compute the parallelism (work/span) of the computation.

The work T1 is straightforward, since it amounts to computing the running time of the serialized algorithm:

    T1 = θ( ((1+sqrt(5))/2)^n )
Span of Fibonacci

Recall that the span T∞ is the longest path in the computation DAG. Since Fibonacci(n) spawns
• Fibonacci(n-1)
• Fibonacci(n-2)
we have

    T∞(n) = max( T∞(n-1), T∞(n-2) ) + θ(1) = T∞(n-1) + θ(1)

which yields T∞(n) = θ(n).
Parallelism of Fibonacci

The parallelism of the Fibonacci computation is

    T1(n)/T∞(n) = θ( ((1+sqrt(5))/2)^n / n )

which grows dramatically as n gets large. Therefore, even on the largest parallel computers, a modest value of n suffices to achieve near perfect linear speedup, since we have considerable parallel slackness.
Parallel Loops

Many algorithms contain loops all of whose iterations can operate in parallel. We can parallelize such loops using the spawn and sync keywords, but it is much more convenient to specify directly that the iterations of such loops can run concurrently. The pseudocode provides this functionality via the parallel concurrency keyword, which precedes the for keyword in a for loop statement.
Parallel Loops

The parallel for keywords in lines 3 and 5 of the MAT-VEC procedure indicate that the iterations of the respective loops may be run concurrently. A compiler can implement each parallel for loop as a divide-and-conquer subroutine using nested parallelism. For example, the parallel for loop in lines 5–7 can be implemented with the call MAT-VEC-MAIN-LOOP(A, x, y, n, 1, n).
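The MAT-VEC pseudocode itself appeared as a figure in the slides. As a rough equivalent (a sketch with assumed sizes and names, not the slides' own code), the outer loop of a matrix-vector product can be parallelized like this:

#include <stdio.h>
#include <omp.h>

#define N 4

// y = A * x, with the iterations of the outer loop running in parallel.
// Each iteration writes only y[i], so the iterations are independent.
void mat_vec(double A[N][N], double x[N], double y[N]) {
    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        y[i] = 0.0;
        for (int j = 0; j < N; j++)
            y[i] += A[i][j] * x[j];
    }
}

int main(void) {
    double A[N][N] = {{1,0,0,0},{0,2,0,0},{0,0,3,0},{0,0,0,4}};
    double x[N] = {1, 1, 1, 1};
    double y[N];
    mat_vec(A, x, y);
    for (int i = 0; i < N; i++) printf("y[%d] = %g\n", i, y[i]);   // prints 1 2 3 4
    return 0;
}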
Race Conditions

A multithreaded algorithm is deterministic if and only if it does the same thing on the same input, no matter how the instructions are scheduled. A multithreaded algorithm is nondeterministic if its behavior might vary from run to run.

Often, a multithreaded algorithm that is intended to be deterministic fails to be.
Determinacy Race

A determinacy race occurs when two logically parallel instructions access the same memory location and at least one of the instructions performs a write.

RACE-EXAMPLE()
    x = 0
    parallel for i = 1 to 2
        x = x + 1
    print x
Determinacy Race

When a processor increments x, the operation is not indivisible, but is composed of a sequence of instructions:
1) Read x from memory into one of the processor's registers.
2) Increment the value of the register.
3) Write the value in the register back into x in memory.
Determinacy Race

One possible interleaving of the two parallel increments:

    x = 0
    assign r1 = 0
    incr r1, so r1 = 1
    assign r2 = 0
    incr r2, so r2 = 1
    write back x = r1
    write back x = r2
    print x          // now prints 1 instead of 2
Determinacy Race

If the effect of the parallel execution were that processor 1 executed all its instructions before processor 2, the value 2 would be printed. Conversely, if the effect were that processor 2 executed all its instructions before processor 1, the value 2 would still be printed. When the instructions of the two processors execute at the same time, however, it is possible, as in this example execution, that one of the updates to x is lost.
Determinacy Race

Generally, most orderings produce correct results, but some orderings generate improper results when the instructions interleave. Consequently, races can be extremely hard to test for. You can run tests for days and never see the bug, only to experience a catastrophic system crash in the field when the outcome is critical.

Although we can cope with races in a variety of ways, including mutual exclusion locks and other methods of synchronization, for our purposes we shall simply ensure that strands that operate in parallel are independent: they have no determinacy races among them. Thus, in a parallel for construct, all the iterations should be independent. Between a spawn and the corresponding sync, the code of the spawned child should be independent of the code of the parent.
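As a concrete illustration (a sketch, not part of the original slides), the following OpenMP program contains exactly this determinacy race on a shared counter; making the update atomic removes the race:

#include <stdio.h>
#include <omp.h>

int main(void) {
    int racy = 0, safe = 0;
    const int iterations = 100000;

    #pragma omp parallel for
    for (int i = 0; i < iterations; i++) {
        racy = racy + 1;        // determinacy race: the read-modify-write is not indivisible
        #pragma omp atomic
        safe = safe + 1;        // atomic update: no race
    }

    // "racy" is often smaller than 100000 because some updates are lost;
    // "safe" is always exactly 100000.
    printf("racy = %d, safe = %d\n", racy, safe);
    return 0;
}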
Matrix Multiplication

Matrix Multiplication: Naïve Method

void multiply(int A[][N], int B[][N], int C[][N])
{
    // Standard triple loop: C = A * B for N x N matrices.
    for (int i = 0; i < N; i++)
    {
        for (int j = 0; j < N; j++)
        {
            C[i][j] = 0;
            for (int k = 0; k < N; k++)
            {
                C[i][j] += A[i][k] * B[k][j];
            }
        }
    }
}
Matrix Multiplication: Divide and Conquer

The following is a simple divide-and-conquer method to multiply two square matrices:
1) Divide matrices A and B into 4 sub-matrices of size N/2 x N/2 (writing A in block form with sub-matrices a, b, c, d and B with sub-matrices e, f, g, h).
2) Calculate the following values recursively: ae + bg, af + bh, ce + dg and cf + dh.

In the above method, we do 8 multiplications of matrices of size N/2 x N/2 and 4 additions. Addition of two matrices takes O(N^2) time, so the time complexity can be written as

    T(N) = 8T(N/2) + O(N^2)

From the Master theorem, the time complexity of the above method is O(N^3), which is unfortunately the same as the naive method.
Strassen's Matrix Multiplication Method

- In the divide-and-conquer method, the main contributor to the high time complexity is the 8 recursive calls.
- The idea of Strassen's method is to reduce the number of recursive calls to 7.
- Strassen's method is similar to the simple divide-and-conquer method in the sense that it also divides the matrices into sub-matrices of size N/2 x N/2, but in Strassen's method, the four sub-matrices of the result are calculated using the following formulae.
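The formulae themselves appeared as a figure in the original slides. With the block labels used above (A partitioned into a, b, c, d and B into e, f, g, h), the standard Strassen products and result blocks are:

    p1 = a(f - h)
    p2 = (a + b)h
    p3 = (c + d)e
    p4 = d(g - e)
    p5 = (a + d)(e + h)
    p6 = (b - d)(g + h)
    p7 = (a - c)(e + f)

    c11 = p5 + p4 - p2 + p6
    c12 = p1 + p2
    c21 = p3 + p4
    c22 = p1 + p5 - p3 - p7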
Strassen's Matrix Multiplication Method

Thus, to multiply two 2 x 2 matrices, Strassen's algorithm makes seven multiplications and 18 additions/subtractions, whereas the normal algorithm requires eight multiplications and four additions.
Matrix Multiplication

One can multiply n x n matrices serially in time θ(n^(log2 7)) = O(n^2.81) using Strassen's divide-and-conquer method. We will use multithreading for a simpler divide-and-conquer algorithm.
Simple Divide-and-Conquer

To multiply two n x n matrices, we perform 8 matrix multiplications of n/2 x n/2 matrices and one addition of n x n matrices.
Matrix Multiplication

Matrix-Multiply(C, A, B, n):
    // Multiplies matrices A and B, storing the result in C.
    // n is a power of 2 (for simplicity).
    if n == 1:
        C[1, 1] = A[1, 1] · B[1, 1]
    else:
        allocate a temporary matrix T[1...n, 1...n]
        partition A, B, C, and T into (n/2)x(n/2) submatrices
        spawn Matrix-Multiply(C11, A11, B11, n/2)
        spawn Matrix-Multiply(C12, A11, B12, n/2)
        spawn Matrix-Multiply(C21, A21, B11, n/2)
        spawn Matrix-Multiply(C22, A21, B12, n/2)
        spawn Matrix-Multiply(T11, A12, B21, n/2)
        spawn Matrix-Multiply(T12, A12, B22, n/2)
        spawn Matrix-Multiply(T21, A22, B21, n/2)
        Matrix-Multiply(T22, A22, B22, n/2)
        sync
        Matrix-Add(C, T, n)
Addition of Matrices

Matrix-Add(C, T, n):
    // Adds matrices C and T in-place, producing C = C + T.
    // n is a power of 2 (for simplicity).
    if n == 1:
        C[1, 1] = C[1, 1] + T[1, 1]
    else:
        partition C and T into (n/2)x(n/2) submatrices
        spawn Matrix-Add(C11, T11, n/2)
        spawn Matrix-Add(C12, T12, n/2)
        spawn Matrix-Add(C21, T21, n/2)
        spawn Matrix-Add(C22, T22, n/2)
        sync
Work of Matrix Multiplication

The work T1(n) of matrix multiplication satisfies the recurrence

    T1(n) = 8 T1(n/2) + θ(n^2) = θ(n^3)

by case 1 of the Master theorem.

Recall that the parallelism (maximum possible speedup) of a multithreaded computation is given by T1/T∞.
Span of Matrix Multiplication

The span T∞(n) of matrix multiplication is determined by:
- the span for partitioning, θ(1)
- the span of the parallel nested for loops at the end, θ(log n)
- the maximum span of the 8 matrix multiplications

    T∞(n) = T∞(n/2) + θ(log n)

Solving this recurrence, we get T∞(n) = θ((log n)^2).

The parallelism of matrix multiplication is given by

    T1(n) / T∞(n) = θ( n^3 / (log n)^2 )
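For a sense of scale (an illustrative calculation, not from the slides): for n = 1000, n^3 = 10^9 and (lg n)^2 ≈ 100, so the parallelism is on the order of 10^7, far more than the number of processors in any real machine.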


Merge Sort

Merge Sort – Serial Version
Multithreaded Merge Sort
The P-MERGE procedure assumes that the two subarrays to be merged lie within the same array. P-MERGE takes as an argument an output subarray A into which the merged values should be stored. The call P-MERGE(T, p1, r1, p2, r2, A, p3) merges the sorted subarrays T[p1..r1] and T[p2..r2] into the subarray A[p3..r3], where r3 = p3 + (r1 - p1 + 1) + (r2 - p2 + 1) - 1 = p3 + (r1 - p1) + (r2 - p2) + 1 and is not provided as an input.
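The P-MERGE and multithreaded merge sort pseudocode appeared as figures in the original slides. As a rough illustration of the overall structure (a sketch with assumed names that uses a plain serial merge rather than the parallel P-MERGE, so its span is worse than the slides' version), a task-parallel merge sort looks like this:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <omp.h>

// Serial merge of a[lo..mid] and a[mid+1..hi] using tmp as scratch space.
static void merge(int *a, int *tmp, int lo, int mid, int hi) {
    int i = lo, j = mid + 1, k = lo;
    while (i <= mid && j <= hi) tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    while (i <= mid) tmp[k++] = a[i++];
    while (j <= hi)  tmp[k++] = a[j++];
    memcpy(a + lo, tmp + lo, (size_t)(hi - lo + 1) * sizeof(int));
}

// The two recursive sorts are "spawned" as tasks and joined with taskwait.
static void merge_sort(int *a, int *tmp, int lo, int hi) {
    if (lo >= hi) return;
    int mid = lo + (hi - lo) / 2;
    #pragma omp task shared(a, tmp)      // "spawn" the left half
    merge_sort(a, tmp, lo, mid);
    merge_sort(a, tmp, mid + 1, hi);     // the parent sorts the right half
    #pragma omp taskwait                 // "sync" before merging
    merge(a, tmp, lo, mid, hi);
}

int main(void) {
    int a[] = {5, 2, 9, 1, 7, 3, 8, 6, 4, 0};
    int n = (int)(sizeof a / sizeof a[0]);
    int *tmp = malloc((size_t)n * sizeof(int));
    #pragma omp parallel
    {
        #pragma omp single
        merge_sort(a, tmp, 0, n - 1);
    }
    for (int i = 0; i < n; i++) printf("%d ", a[i]);   // prints 0..9 in order
    printf("\n");
    free(tmp);
    return 0;
}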

Parallelism:
Acknowledgements

• https://www.youtube.com/watch?v=UaCX8Iy00DA
• https://www.youtube.com/watch?v=VD8hY7kWjdc
• https://www.youtube.com/watch?v=7T-gjX24FR0
• https://www.slideshare.net/AndresMendezVazquez/24-multithreaded-algorithms
• https://homes.luddy.indiana.edu/achauhan/Teaching/B403/LectureNotes/11-multithreaded.html
• https://catonmat.net/mit-introduction-to-algorithms-part-thirteen
• https://www.youtube.com/watch?v=iFrmLRr9ke0
• https://www.youtube.com/watch?v=GvtgV2NkdVg&t=31s
• https://www.youtube.com/watch?v=_XOZ2IiP2nw
• Analysis of merge sort: https://www.youtube.com/watch?v=0nlPxaC2lTw
