1 Introduction
Why parallel computing?
▪ From 1986 to 2002
▪ Single-processor performance increased by about 50% per year
▪ Since then, the annual performance gain has dropped to roughly 20%
▪ As a result, computer designers started focusing on designing parallel computers
▪ Rather than designing ever more complex single-core processors
▪ Multi-core processors
▪ But software developers continued to write serial programs
▪ Aren’t single processor systems fast enough?
▪ Why build parallel systems?
▪ Why do we need parallel programs?
Why do we need ever-increasing performance?
▪ Past improvements in microprocessor performance have given us faster web
searches, quicker and more accurate medical diagnoses, more realistic
computer games, etc.
▪ Higher computation power means we can solve larger problems:
▪ Climate modelling
▪ Protein folding
▪ Drug discovery
▪ Energy research
▪ Data analysis
Why are we building parallel systems?
▪ The increase in single-processor performance has been due to the ever-increasing
density of transistors
▪ As transistors shrink, their switching speed can be increased
▪ But their power consumption also increases
▪ The power is dissipated as heat
▪ Which makes the chips unreliable
▪ Hence, it became impractical to keep increasing the clock speed of integrated circuits
▪ However, increasing the transistor density can continue
▪ Rather than building ever-faster, more complex, monolithic processors
▪ Manufacturers started putting multiple, relatively simple, complete processors
on a single chip
Why do we need to write parallel programs?
▪ Most serial programs are designed to run on a single core
▪ They are unaware of multiple processors
▪ At best, we can run multiple instances of the same serial program on multiple
cores
▪ This is not what we want. Why? (See the sketch below.)
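As an illustration of the point above, here is a minimal sketch (not from the slides; it assumes a C compiler with OpenMP support, e.g. gcc -fopenmp) contrasting a plain serial loop, which uses only one core, with the same loop annotated so that its iterations are divided among all available cores:

```c
/* serial_vs_parallel.c -- illustrative sketch only.
 * The plain loop below runs on a single core; the OpenMP-annotated loop
 * splits the same work across the cores the runtime makes available.
 * Build (assuming GCC with OpenMP): gcc -fopenmp serial_vs_parallel.c
 */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int main(void) {
    const long n = 10000000;                 /* problem size (arbitrary) */
    double *a = malloc(n * sizeof *a);
    double sum = 0.0;

    for (long i = 0; i < n; i++)             /* serial: uses one core only */
        a[i] = 0.5;

    /* Parallel version of the reduction: iterations are divided among the
     * threads; each thread accumulates a private partial sum, and the
     * partial sums are combined at the end. */
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; i++)
        sum += a[i];

    printf("sum = %.1f, threads available = %d\n", sum, omp_get_max_threads());
    free(a);
    return 0;
}
```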
Why parallelism?
▪ Transistors to FLOPS
▪ It is possible to fabricate devices with very large transistor counts
▪ How do we use these transistors to achieve ever-increasing rates of computation?
▪ Memory and Disk speed
▪ Overall speed of computation is determined not just by the speed of the
processor, but also by the ability of the memory system to feed data to it
▪ Bottleneck: the growing gap between processor speed and memory speed
▪ Data communication
▪ Data mining: mining of large datasets distributed over relatively low-bandwidth
networks
▪ Without parallelism it is not feasible to collect all the data at a central location
Applications of parallel computing
▪ Applications in Engineering and Design
▪ Optimization problems
▪ Internal combustion engines
▪ Airfoil design in aircraft
▪ Scientific Applications
▪ Sequencing of the human genome
▪ Weather modeling, mineral prospecting, flood prediction, etc.
▪ Commercial Applications
▪ Web and database servers
▪ Data mining
▪ Analysis for optimizing business and marketing decisions
▪ Applications in Computer Systems
Introduction
▪ Parallel Computing: the use of a parallel computer to reduce the
time needed to solve a computational problem
▪ Speedup: S = T_seq / T_par
[Figure: a program split into a serial fraction (1 − P) and a parallel fraction P;
two caches, each holding copies of data items A1 and A2, backed by a shared memory]
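To make the speedup definition concrete, here is a small illustrative C program. The timings and the parallel fraction P are made-up numbers, and the Amdahl's-law bound is an addition not spelled out on the slide, suggested by the (1 − P) / P split in the figure:

```c
/* speedup.c -- illustrative only; all numbers below are invented examples.
 * Speedup S = T_seq / T_par.  If a fraction P of the work can be
 * parallelized (and 1-P stays serial), Amdahl's law bounds the speedup
 * on p cores:  S <= 1 / ((1-P) + P/p).
 */
#include <stdio.h>

int main(void) {
    double t_seq = 120.0;   /* hypothetical serial run time (seconds)        */
    double t_par = 20.0;    /* hypothetical run time on several cores        */
    printf("measured speedup S = %.2f\n", t_seq / t_par);

    double P = 0.9;         /* assumed parallelizable fraction of the work   */
    for (int p = 2; p <= 64; p *= 2)
        printf("Amdahl bound, p = %2d cores: S <= %.2f\n",
               p, 1.0 / ((1.0 - P) + P / p));
    return 0;
}
```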
Cache-coherence (MESI protocol)
▪ Under the control of the cache-coherence logic, such discrepancies between the
caches can be avoided
▪ M (modified): The cache line has been modified in this cache, and it
resides in no other cache than this one. Only upon eviction does
memory reflect the most current state.
▪ E (exclusive): The cache line has been read from memory but not
(yet) modified. It resides in no other cache.
▪ S (shared): The cache line has been read from memory but not
(yet) modified. There may be other copies in other caches of the
machine.
▪ I (invalid): The cache line does not contain any valid data.
Under normal circumstances this happens if the cache line was
in the shared state and another processor has requested
exclusive ownership.
[Figure: two processors P1 and P2 with private caches C1 and C2, each holding
copies of data items A1 and A2, backed by a shared memory]
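The following is a deliberately simplified sketch of how the MESI states might change for a single cache line shared by the two caches C1 and C2 in the figure. The transition rules below are a textbook subset written for illustration (they ignore write-backs and other real-protocol details) and are not the slides' exact model:

```c
/* mesi_sketch.c -- a highly simplified, illustrative simulation of MESI
 * state transitions for ONE cache line shared by two caches (C1, C2).
 * The state names follow the slide (M, E, S, I); the rules below omit
 * write-backs, snooping traffic, and other details of a real protocol.
 */
#include <stdio.h>

typedef enum { M, E, S, I } mesi_t;
static const char *name[] = { "Modified", "Exclusive", "Shared", "Invalid" };

/* Processor `self` reads the line; `other` is the peer cache's state. */
static void read_line(mesi_t *self, mesi_t *other) {
    if (*self == I) {                        /* read miss in this cache       */
        if (*other == M || *other == E || *other == S) {
            *other = S;                      /* peer downgrades to shared     */
            *self  = S;                      /* this cache gets a shared copy */
        } else {
            *self = E;                       /* no other copy: exclusive      */
        }
    }                                        /* M/E/S read hits: no change    */
}

/* Processor `self` writes the line; any peer copy must be invalidated. */
static void write_line(mesi_t *self, mesi_t *other) {
    *other = I;                              /* request exclusive ownership   */
    *self  = M;                              /* line is now dirty here        */
}

int main(void) {
    mesi_t c1 = I, c2 = I;                   /* both caches start invalid     */
    read_line(&c1, &c2);  printf("P1 reads : C1=%s C2=%s\n", name[c1], name[c2]);
    read_line(&c2, &c1);  printf("P2 reads : C1=%s C2=%s\n", name[c1], name[c2]);
    write_line(&c2, &c1); printf("P2 writes: C1=%s C2=%s\n", name[c1], name[c2]);
    return 0;
}
```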
Uniform Memory Access (UMA)
▪ The simplest implementation of a UMA system is a dual-core processor, in
which two CPUs on one chip share a single path to memory
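As a small illustration of the UMA idea, the sketch below (assuming POSIX threads; the file and variable names are invented) runs two threads that read and write the same array, which both cores reach through the single shared path to memory:

```c
/* uma_shared.c -- a minimal sketch of shared memory on a UMA system:
 * both threads (standing in for the two CPUs of a dual-core chip) access
 * the same array directly, because there is one shared memory.
 * Build: gcc uma_shared.c -lpthread
 */
#include <stdio.h>
#include <pthread.h>

#define N 8
static int shared_data[N];               /* lives in the single shared memory */

static void *worker(void *arg) {
    long id = (long)arg;                 /* 0 or 1: which thread/"core"       */
    for (int i = (int)id; i < N; i += 2) /* each thread fills every other slot */
        shared_data[i] = (int)id + 1;
    return NULL;
}

int main(void) {
    pthread_t t[2];
    for (long id = 0; id < 2; id++)
        pthread_create(&t[id], NULL, worker, (void *)id);
    for (int id = 0; id < 2; id++)
        pthread_join(t[id], NULL);

    for (int i = 0; i < N; i++)          /* both threads' writes are visible */
        printf("%d ", shared_data[i]);
    printf("\n");
    return 0;
}
```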