Week 6 A
(CS 526)
Muhammad Awais
Definitions: Granularity
• Computation / Communication Ratio:
– In parallel computing, granularity is a qualitative
measure of the ratio of computation to
communication.
– Periods of computation are typically separated from
periods of communication by synchronization events.
1. Fine-grain parallelism
2. Coarse-grain parallelism
Fine-grain Parallelism
• Relatively small amounts of computational work are done between
communication events
• Low computation to communication ratio
• Implies high communication overhead and less opportunity for
performance enhancement
• If granularity is too fine it is possible that the overhead required for
communications and synchronization between tasks takes longer
than the computation.
Coarse-grain Parallelism
• Relatively large amounts of computational
work are done between
communication/synchronization events
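The computation/communication ratio above can be illustrated with a toy cost model (a sketch only; `N`, `COMM_COST`, and the chunk sizes are arbitrary assumptions, not measured values):

```python
# Toy cost model: work is summing N elements; every chunk sent to another
# task is charged a fixed communication cost. Larger chunks (coarser grain)
# mean fewer communication events and a higher computation/communication ratio.

N = 1_000_000
COMM_COST = 1  # cost units charged per communication/synchronization event

def total_cost(chunk_size):
    compute = N                    # one unit of work per element
    messages = N // chunk_size     # one communication event per chunk
    return compute + messages * COMM_COST

fine = total_cost(1)        # fine-grain: communicate after every element
coarse = total_cost(10_000) # coarse-grain: communicate after large chunks

# Fine grain pays as much for communication as for computation here;
# coarse grain makes the communication overhead negligible.
print(fine, coarse)
```

With chunk size 1, half of the total cost is communication overhead, matching the warning above that too-fine granularity can make overhead exceed useful computation.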
Approaches to Explicit Multithreading
2. Blocked Execution:
– Coarse-grained
– Thread executed until event causes delay
• E.g., Cache miss
3. Simultaneous Multi-Threading (SMT)
– Instructions simultaneously issued from multiple
threads to execution units of superscalar processor
(having multiple units for decoding and execution)
• Example (SMT):
• Intel calls it Hyper-Threading
• SMT with support for two or more threads
• Single multithreaded processor → logically appears
as two processors
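One way to see this from software is to query the logical processor count (a sketch; `os.cpu_count()` reports logical processors, which on a 2-way SMT chip is typically twice the number of physical cores — the exact ratio depends on the machine):

```python
import os

# os.cpu_count() returns the number of LOGICAL processors the OS sees.
# On an SMT/Hyper-Threading machine this exceeds the physical core count,
# because each core appears to the OS as multiple logical processors.
logical = os.cpu_count()
print("Logical processors visible to the OS:", logical)
```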
SMT Examples
4. Chip Multi-Processing (CMP)
– A complete processor is replicated on a single
chip (e.g., Multi-core processor)
– Each processor handles separate threads
Taxonomy of Processor Architectures
Tightly Coupled -
NUMA
• Non-Uniform Memory Access (NUMA)
– Access times to different regions of memory differ
CPU Topology of SunFire X4600M2 (NUMA machine)
Non-uniform Memory Access
(NUMA)
• Non-uniform memory access
– All processors have access to all parts of memory
– A processor's access time differs depending on the
region of memory accessed
– Different processors access different regions of
memory at different speeds
Interconnection Network
[Diagram: three nodes, each with two CPUs, a local memory, and a network interface, connected through an interconnection network]
Cluster vs. SMP
• SMPs:
– Easier to manage and control
– Closer to single-processor systems:
• Scheduling is the main difference
– Less physical space required
– Lower power consumption
• Clustering:
– Superior incremental scalability
– Superior availability
• Redundancy
Introduction to Grid Computing
What is a Grid?