Week 6 A
(CS 526)
Muhammad Awais
Definitions: Granularity
• Computation / Communication Ratio:
– In parallel computing, granularity is a qualitative
measure of the ratio of computation to
communication.
– Periods of computation are typically separated from
periods of communication by synchronization events.
1. Fine-grain parallelism
2. Coarse-grain parallelism
Fine-grain Parallelism
• Relatively small amounts of computational work are done between
communication events
• Low computation to communication ratio
• Implies high communication overhead and less opportunity for
performance enhancement
• If granularity is too fine it is possible that the overhead required for
communications and synchronization between tasks takes longer
than the computation.
Coarse-grain Parallelism
• Relatively large amounts of computational
work are done between
communication/synchronization events
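The computation/communication ratio above can be illustrated with a toy cost model (a sketch only; `N`, `COMM_COST`, and the chunk sizes are arbitrary assumptions, not measured values):

```python
# Toy cost model: work is summing N elements; every chunk sent to another
# task is charged a fixed communication cost. Larger chunks (coarser grain)
# mean fewer communication events and a higher computation/communication ratio.

N = 1_000_000
COMM_COST = 1  # cost units charged per communication/synchronization event

def total_cost(chunk_size):
    compute = N                    # one unit of work per element
    messages = N // chunk_size     # one communication event per chunk
    return compute + messages * COMM_COST

fine = total_cost(1)        # fine-grain: communicate after every element
coarse = total_cost(10_000) # coarse-grain: communicate after large chunks

# Fine grain pays as much for communication as for computation here;
# coarse grain makes the communication overhead negligible.
print(fine, coarse)
```

With chunk size 1, half of the total cost is communication overhead, matching the warning above that too-fine granularity can make overhead exceed useful computation.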
Approaches to Explicit Multithreading
2. Blocked Execution:
– Coarse-grained
– Thread executed until event causes delay
• E.g., Cache miss
3. Simultaneous Multi-Threading (SMT)
– Instructions simultaneously issued from multiple
threads to execution units of superscalar processor
(having multiple units for decoding and execution)
• Example (SMT):
• Intel calls it Hyper-Threading
• SMT with support for two or more threads
• Single multithreaded processor → logically appears
as two processors
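One way to see this from software is to query the logical processor count (a sketch; `os.cpu_count()` reports logical processors, which on a 2-way SMT chip is typically twice the number of physical cores — the exact ratio depends on the machine):

```python
import os

# os.cpu_count() returns the number of LOGICAL processors the OS sees.
# On an SMT/Hyper-Threading machine this exceeds the physical core count,
# because each core appears to the OS as multiple logical processors.
logical = os.cpu_count()
print("Logical processors visible to the OS:", logical)
```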
SMT Examples
4. Chip Multi-Processing (CMP)
– A complete processor is replicated on a single
chip (e.g., Multi-core processor)
– Each processor handles separate threads
Taxonomy of Processor Architectures
Tightly Coupled -
NUMA
• Non-Uniform Memory Access (NUMA)
– Access times to different regions of memory differ
CPU Topology of SunFire X4600M2 (NUMA machine)
Non-uniform Memory Access
(NUMA)
• Non-uniform memory access
– All processors have access to all parts of memory
– A processor's access time differs depending on the
region of memory accessed
– Different processors access different regions of
memory at different speeds
Interconnection Network
[Diagram: three nodes, each with two CPUs, a local memory, and a network interface, connected through an interconnection network]
Cluster vs. SMP
• SMPs:
– Easier to manage and control
– Closer to single-processor systems:
• Scheduling is the main difference
– Less physical space required
– Lower power consumption
• Clustering:
– Superior incremental scalability
– Superior availability
• Redundancy
Introduction to Grid Computing
What is a Grid?