PDC Lecture 01

The CS-402 Parallel and Distributed Systems course syllabus outlines the course structure, prerequisites, and objectives for Fall 2024, focusing on parallel computer architectures, programming, and algorithms. Students will learn to write efficient code for parallel systems and explore various programming paradigms while completing projects related to parallelization and performance comparison. The syllabus emphasizes attendance, academic integrity, and the possibility of changes with advance notice.

CS-402 Parallel and Distributed Systems

Syllabus, Fall 2024


Lecture No. 01
 Location: Computer Science Department
 Time:
TUESDAY (08:30 AM TO 11:30 AM, SECTION A)

WEDNESDAY (12:00 PM TO 3:00 PM, SECTION B and C)

 Instructor: Qamas Gul Khan Safi, Email: [email protected]


o Office hours: Monday and Thursday, 10:00 am to 12:30 pm; ad hoc times announced on MS-Teams,
and by appointment.
Course Requirements

 Prerequisites
o Operating Systems or equivalent
o No parallel programming/systems background required

 Course website: MS-Teams


Required textbook: There is no single required textbook for this course.
Please see the class lectures.
Course Requirements
 Reference materials:
o J. Hennessy and D. Patterson, "Computer Architecture: A Quantitative Approach," 6th Edition, 2017.
o Maurice Herlihy, et al, "The Art of Multiprocessor Programming," 2nd edition, 2020.
o William James Dally and Brian Patrick Towles, "Principles and Practices of Interconnection Networks,"
Morgan Kaufmann, 1st Edition, 2004.
o David B. Kirk and Wen-mei W. Hwu, "Programming Massively Parallel Processors: A Hands-on Approach,"
Morgan Kaufmann, 3rd edition, 2016.
o Kevin R. Wadleigh and Isom L Crawford, "Software Optimization for High Performance Computing: Creating
Faster Applications," Prentice Hall, 1st Edition, 2000.
o MPI: http://www-unix.mcs.anl.gov/mpi/
o OpenMP: http://www.openmp.org
o CUDA: NVIDIA CUDA Programming Guide
o Papers and tutorials in recent technical conferences and journals.
Course Description

 This class introduces parallel and distributed systems and programming, covering
three areas: parallel computer architectures, programming parallel and distributed
systems, and algorithms and systems issues in parallel and distributed systems.
o Architectures: architectural classes, Flynn's taxonomy, SIMD/vector architecture, shared memory
architecture, distributed memory architecture, GPU architecture, interconnection networks
o Programming: optimizing single-thread performance, SIMD and vector extensions, shared memory
programming, GPU programming, distributed memory programming, synchronization, concurrency,
deadlock, race condition, determinacy
o Algorithms and systems issues: PRAM, BSP, LogP models; systems issues: job scheduling,
power, performance, security
Course Objectives

Upon completion of the course, the student will be able to


o Explain the challenges and techniques in parallel computer architectures.
o Write efficient code to exploit parallelism in uniprocessor and multi-processor systems with different programming paradigms.
o Explain systems issues and techniques in contemporary parallel and distributed systems.
Term project

 Development projects, examples:
o Parallelize a kernel or an application (shared memory, GPU, MPI, Spark, Cloud).
o Implement a PDS-related technique or algorithm from a recent paper.
 Evaluation projects, examples:
o Comparing SIMD performance of Intel, AMD, and ARM processors.
o Comparing the performance of different All-reduce algorithms (heavily used in
distributed deep learning frameworks).
o Benchmarking the performance of unified memory between GPU and CPU.
 Research projects
o Survey an emerging area (e.g., emerging programming models for heterogeneous systems,
recent advances in interconnection networks for exa-scale computing systems).
o Develop a new technique related to PDS (e.g., a new algorithm to perform all-reduce for deep
learning, a new topology for interconnection networks).
Course policies

 Attendance: required.
 Late assignments: not accepted without a valid excuse.
 Missed exam: following the university rules.
o Let me know when you need to miss an exam ASAP.
 Incomplete grade:
o Miss the final with an accepted excuse
o Due to extraordinary circumstances with appropriate documentation.
Course policies

 Academic Integrity
o No copying from anywhere
o Don’t ask others for solutions and don’t give solutions to others.
 Violation
o The university requires all violations to be reported.
o First violation with level 1 agreement:
0 for the particular assignment/exam and the lowering of one letter (A->B) for course final
grade.
o Second violation: resolved through the office of the Dean and the Faculties
Syllabus Changes

 This syllabus is a guide for the course and is subject to change with
advance notice.
Parallel and Distributed Systems

 What is a parallel computer?


o A collection of processing elements that communicate and cooperate to solve
problems.

 What is a distributed system?


o A collection of independent computers that appears to its users as a single
coherent system.
Almost all Contemporary Computing Systems are
Parallel and Distributed Systems.

o Mobile devices, IoT devices – many have multi-core CPUs
 iPhone 13, A15 – 6 CPU cores, 16 Neural Engine cores, 4 GPU cores
 Apple iPhone 16 Pro Max – iOS 18, Apple A18 Pro chipset (3 nm), hexa-core CPU, Apple GPU
o Desktop or laptop – multi-core CPU (uniprocessor systems)
o A high-end gaming computer (CPU + GPU)
o Multi-core server (multi-processor systems)
o Cloud computing platforms (Amazon AWS, Google Cloud)
o Massive gaming platforms
o Internet of Things
o Fugaku supercomputer (No. 1 on November 11, 2021, 442 petaFLOPS peak performance)
o The supercomputer at Oak Ridge National Laboratory, Tennessee, U.S. – 1,194 petaFLOPS
(1.2 exaFLOPS), AMD EPYC 64-core CPUs and AMD Instinct MI250X GPUs, first online August 2022
The performance limit of a sequential program

 The CPU clock frequency implicitly determines how many operations per second the
computer can perform for a sequential (or single-thread) program.
o For more than 10 years, the highest CPU clock frequency has stayed around 4 GHz.
o For a sequential (single-thread) program, the time to perform 10^10 operations is on
the order of seconds.

 This is a physical limit: the CPU clock frequency is limited by the size of
the CPU and the speed of light.
The limit of clock frequency

 Speed of light = 3 × 10^8 m/s

 One cycle at 4 GHz frequency = 1 / (4 × 10^9) s = 0.25 × 10^-9 s

 The distance that light can travel in one cycle:
o 3 × 10^8 m/s × 0.25 × 10^-9 s = 0.75 × 10^-1 m = 7.5 cm

 Intel chip dimension = 1.47 in × 1.47 in = 3.73 cm × 3.73 cm

Not much room left for increasing the frequency!


Another physical limit: power

 One may think of reducing the size of the CPU to increase frequency.
 Increasing CPU frequency also increases CPU power density.
 We switched to multi-core around 2004 due to these physical limits.

 For a sequential (single-thread) program, the time to perform 10^10 operations is on
the order of seconds.
o If one needs more performance, making use of parallelism implicitly or explicitly in
the hardware is the only way to go.
Using the Multiple Computing Elements in
Contemporary Computing Systems
 In many cases, they support concurrent applications (multiple independent apps
running at the same time).
 They can also support individual parallel/distributed applications by pooling
more computing resources for one application. This requires a different type
of programming than conventional sequential programs.
o Partition the task among multiple computing threads, coordinating and communicating
among computing threads (a minimal sketch follows below).
 This course will look under the hood of such systems and examine their
architectures, how to write effective programs to exploit architectural features,
and issues and solutions at different levels to enable parallel and distributed
computing.
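
As a rough sketch of the task-partitioning and coordination just mentioned, the following OpenMP fragment splits a summation loop among threads (OpenMP is one of the shared memory programming systems listed in the reference materials; the array size and values are arbitrary choices for illustration, not from the lecture):

/* Minimal OpenMP sketch: partition a summation across threads.
   Compile with an OpenMP-capable compiler, e.g., gcc -fopenmp sum.c */
#include <stdio.h>
#include <omp.h>

#define N 1000000

int main(void) {
    static double a[N];
    double sum = 0.0;

    for (int i = 0; i < N; i++)          /* sequential initialization */
        a[i] = 1.0;

    /* The loop iterations are partitioned among the threads; the reduction
       clause coordinates the per-thread partial sums into one result. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += a[i];

    printf("sum = %f, threads available = %d\n", sum, omp_get_max_threads());
    return 0;
}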
Programming Parallel and Distributed Systems
 Two focuses of programming paradigms for PDS:
o Productivity
 Computing systems are fast enough for most applications. Coding is often where the bottleneck and cost are.
 Many programming systems are designed for productivity, for example Python, MATLAB, etc.
o Performance
 Computing systems are not fast enough for some applications (e.g. the training of very large deep learning
models). As a result, performance is also a focus.

 Programming systems in practice all claim to support both productivity and


performance. As computing systems become more heterogeneous and complicated,
the balance between the two is still under heavy investigation at this time.
 This class focuses on performance.
Why parallel/distributed computing?

 Some large scale applications can use any amount of computing power.
o Scientific computing applications
 Weather simulation. More computing power means finer granularity and prediction
further into the future.
 Japan’s K machine was built to provide enough computing power to better understand the
inner workings of the brain.
o Training of large machine learning models in many domains.

 In small scale, we would like our programs to run faster as the technology
advances – conventional sequential programs are not going to do that.
Why parallel/distributed computing?

 Bigger: Solving larger problems in the same amount of time.


 Faster: Solving the same sized problem in a shorter time.
More about parallel/distributed computing

 Parallel/distributed computing allows more hardware resources to


be utilized for a single problem. Parallel/distributed programs,
however, do not always solve bigger problems or solve the same
sized problems faster.
 Exploiting parallelism introduces overheads: work that is not necessary in the
sequential program.
 Not all applications have enough parallelism.
Naïve parallel programs are easy to write, but may not give you what you want.
What we will do in this class?

 Examine architectural features of PDS


 Introduce how to exploit the features and write efficient code for PDS
o Sequential code is a fundamental part of parallel code, so we will briefly discuss
how to write efficient sequential code.

 Study systems issues

 PDS and their programming are very broad; we will try to achieve a balance
between breadth and depth.
Classification and Performance
 Flynn’s Taxonomy (1966)
 Performance, peak performance and sustained performance
 Example of parallel computing
 Computation graph, scheduling and execution time
Flynn’s Taxonomy

 Computing is basically executing instructions that operate on data.


 Flynn's taxonomy classifies systems based on the parallelism in the
instruction stream and the parallelism in the data stream.
o single instruction stream or multiple instruction streams.
o single data stream or multiple data streams.
Flynn’s taxonomy
 Single Instruction Single Data (SISD)
 Single Instruction Multiple Data (SIMD)
 Multiple Instructions Multiple Data (MIMD)
 Multiple Instructions Single Data (MISD)
SISD

 At any given time, one instruction operates on one data item


o Traditional sequential architecture, Von Neumann architecture.
SIMD
 Single control unit and multiple processing units. The control unit
fetches an instruction and broadcasts it to all processing units; each
processing unit executes the same instruction on different data.
o Can achieve massive processing power with minimum control logic
o SIMD instructions allow for sequential reasoning.
SIMD
 Exploit data-level parallelism
o Matrix-oriented scientific computing and deep learning applications
o Media (image and sound) processing

 Vector machines, MMX, SSE (Streaming SIMD Extensions), AVX


(Advanced Vector eXtensions), GPU
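
To make the SIMD idea concrete, here is a minimal sketch in C using the AVX extensions mentioned above. It assumes an AVX-capable x86 processor and an array length that is a multiple of 8; the data values are made up for illustration.

/* Minimal AVX sketch: one instruction operates on 8 floats at a time.
   Compile with, e.g., gcc -mavx vadd.c on an AVX-capable x86 processor. */
#include <immintrin.h>
#include <stdio.h>

void vector_add(const float *a, const float *b, float *c, int n) {
    /* n is assumed to be a multiple of 8 in this sketch */
    for (int i = 0; i < n; i += 8) {
        __m256 va = _mm256_loadu_ps(&a[i]);   /* load 8 floats */
        __m256 vb = _mm256_loadu_ps(&b[i]);
        __m256 vc = _mm256_add_ps(va, vb);    /* one instruction, 8 additions */
        _mm256_storeu_ps(&c[i], vc);          /* store 8 results */
    }
}

int main(void) {
    float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    float b[8] = {8, 7, 6, 5, 4, 3, 2, 1};
    float c[8];
    vector_add(a, b, c, 8);
    for (int i = 0; i < 8; i++) printf("%.1f ", c[i]);
    printf("\n");
    return 0;
}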
MISD
 Not commonly seen, no general purpose MISD computer has been built.
 Systolic array is one example of an MISD architecture.
MIMD

 Multiple instruction streams operating on multiple data streams


o MIMD can be thought of as many copies of SISD machine.
o Distributed memory multi-computers, shared memory multi-processors,
multi-core computers.
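
A minimal MPI sketch (MPI is listed in the reference materials) gives a feel for the MIMD, distributed-memory style: every process runs its own instruction stream on its own data and cooperates through explicit messages. The per-rank computation and the message tag below are arbitrary choices for illustration.

/* Minimal MPI sketch: each rank runs independently (MIMD) and rank 0
   collects a value from every other rank via explicit messages.
   Build/run with, e.g., mpicc hello.c && mpirun -np 4 ./a.out */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int value = rank * rank;                 /* each process computes its own data */
    if (rank == 0) {
        int sum = value;
        for (int src = 1; src < size; src++) {
            int recv;
            MPI_Recv(&recv, 1, MPI_INT, src, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            sum += recv;
        }
        printf("sum of squares of ranks 0..%d = %d\n", size - 1, sum);
    } else {
        MPI_Send(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}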
Flynn’s Taxonomy

Type   Instruction Streams   Data Streams   Examples
SISD   1                     1              Early computers, Von Neumann architecture, Turing machine
SIMD   1                     N              Vector architectures, MMX, SSE, AVX, GPU
MISD   N                     1              No general-purpose machine; systolic array
MIMD   N                     N              Multi-core, multi-processor, multi-computer, cluster
Degree of Parallelism
Maximum degree of parallelism
The maximum number of binary digits that can be processed within a unit of time by a
computer system is called the maximum degree of parallelism P: if a processor processes P
bits in unit time, then P is its maximum degree of parallelism.

The maximum degree of parallelism depends on the structure of the arithmetic and logic unit. Higher degree of
parallelism indicates a highly parallel ALU or processing element. Average parallelism depends on both the
hardware and the software. Higher average parallelism can be achieved through concurrent programs.
Feng Taxonomy
In 1972, Tse-yun Feng proposed a system for classifying parallel processing systems based
on the number of bits processed in parallel within a word (the word length) and the number
of words processed in parallel. This classification focuses on the parallelism of bits and
words. Here are the four categories according to Feng's classification:

1. Word Serial Bit Serial (WSBS): In this case, one bit of a selected word is processed at a time. It corresponds
to serial processing and requires maximum processing time.
2. Word Serial Bit Parallel (WSBP): All the bits of a selected word are processed simultaneously, but one word
at a time. It provides slightly more parallelism than WSBS.
3. Word Parallel Bit Serial (WPBS): One selected bit from all specified words is processed at a time. WPBS can
be thought of as column parallelism.
4. Word Parallel Bit Parallel (WPBP): All the bits of all specified words are operated on simultaneously. This
category offers maximum parallelism and minimum execution time.
Feng Taxonomy

Processors like the IBM 370, Cray-1, and PDP-11 process one word at a time with all bits of
the word handled in parallel, with word sizes ranging from 16 to 64 bits, falling under the
WSBP category. On the other hand, processors like STARAN and MPP execute one bit of a
word at a time but multiple words together, categorizing them as WPBS processors. Finally,
processors like C.mmp and PEPE execute multiple bits and multiple words simultaneously,
fitting into the WPBP category.
Handler’s Taxonomy
In 1977, Wolfgang Handler proposed a computer architectural
classification scheme for determining the degree of parallelism and
pipelining built into the computer system hardware. His classification
focuses on pipeline processing systems and divides them into three
subsystems:
1. Processor Control Unit (PCU): Each PCU corresponds to one processor or one CPU.
2. Arithmetic Logic Unit (ALU): ALU is equivalent to the processing element (PE). It
performs arithmetic and logical calculations.
3. Bit Level Circuit (BLC): BLC corresponds to the combinational logic circuit required for
1-bit operations in ALU.
Handler’s Taxonomy
Handler’s classification uses three pairs of integers to describe the computer system:
Computer: K = number of processors (PCUs) within the computer, K’ = number of PCUs that can be pipelined.
ALU: D = number of ALUs (PEs) under the control of PCU, D’ = number of PEs that can be pipelined.
Word Length: W = word length of a PE, W’ = number of pipeline stages in all PEs.
For example:
Texas Instrument’s Advanced Scientific Computer (TI ASC) has one controller controlling 4 arithmetic pipelines,
each with a 64-bit word length and 8 pipeline stages. Representing TI ASC according to Handler’s classification:
TI ASC=(K=1,K′=1,D=4,D′=1,W=64,W′=8)
CDC 6600 has a single CPU with an ALU having 10 specialized hardware functions (each with a 60-bit word
length), and up to 10 of these functions can be linked into a longer pipeline. It also has 10 peripheral I/O
processors operating in parallel, each with 1 ALU and a 12-bit word length. The representation of the central processor:
CDC 6600 = (K=1, K′=1, D=1, D′=10, W=60, W′=1)
The 10 I/O processors can be represented separately as (K=10, K′=1, D=1, D′=1, W=12, W′=1).
Summary
 Flynn's taxonomy: SISD, SIMD, MISD, MIMD.

 Performance metrics: MIPS, GFLOPS.

 Peak performance and sustained performance.

 Computation graph: describes the dependencies between tasks in a parallel computation.

 Parallelism = Work(G) / span(G), an approximation of the number of processors that can be used
effectively in the computation.

 A greedy scheduler assigns tasks to processors whenever a task is ready and a processor is available. The
execution time with a greedy scheduler is at most 2 times that of the optimal scheduler.
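
A small worked example of the last two bullets, with made-up numbers: suppose a computation graph G has Work(G) = 1000 operations and span(G) = 50 operations on the longest dependency path. Then Parallelism = 1000 / 50 = 20, so roughly 20 processors can be used effectively. A greedy scheduler on P processors finishes in time T_P <= Work(G)/P + span(G); with P = 20 this gives T_P <= 50 + 50 = 100, while no scheduler can do better than max(Work(G)/P, span(G)) = 50, which is where the "at most 2 times the optimal scheduler" bound comes from.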
