
High Performance Computing (HPC) - Lecture 2

Agenda

- Parallel Computer Memory Architectures


- Multithreading vs. Multiprocessing
- Designing Parallel Programs
- HPC Cluster Architecture

Parallel Computer Memory Architectures

- Shared Memory
- All processors access a single global address space.
- Fast data sharing.

- Lack of scalability between memory and CPUs.

- Advantages:
- Global address space provides a user-friendly programming perspective.
- Data sharing between tasks is fast and uniform due to proximity of memory to CPUs.
- Disadvantages:
- Lack of scalability between memory and CPUs.
- Requires programmer responsibility for synchronization.
- Expensive to design and produce shared memory machines with many processors.

- Distributed Memory
- Each processor has its own memory.
- Scalable; no overhead for cache coherency.
- Advantages:
- Memory is scalable with the number of processors.
- Each processor accesses its own memory without interference or cache coherency
issues.
- Cost-effective, using off-the-shelf processors and networking.
- Disadvantages:
- Programmer responsible for communication between processors.
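
A minimal sketch of the shared-memory model, assuming a C compiler with OpenMP support (the
file name and build line, e.g. gcc -fopenmp, are illustrative). All threads read the same array
in one global address space, and the reduction clause handles the synchronization that would
otherwise be the programmer's responsibility. Under distributed memory, each process would
instead hold its own portion of the array and exchange partial sums explicitly over the network.

    /* shared_sum.c - shared-memory parallel sum (illustrative sketch)
     * Build (assumed toolchain): gcc -fopenmp shared_sum.c -o shared_sum */
    #include <stdio.h>
    #include <omp.h>

    #define N 1000000

    int main(void) {
        static double a[N];              /* one array in the single global address space */
        for (int i = 0; i < N; i++)
            a[i] = 1.0;

        double sum = 0.0;
        /* Every thread sees the same array a[]; the reduction clause gives each
         * thread a private partial sum and combines them safely at the end,
         * i.e. the synchronization burden noted under "Disadvantages". */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++)
            sum += a[i];

        printf("sum = %f (max threads: %d)\n", sum, omp_get_max_threads());
        return 0;
    }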
Multithreading vs. Multiprocessing

- Threads
- Run within one process and share the same memory space and global variables.
- Processes
- Each is a separate program with its own variables, stack, and memory allocation.
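
A minimal C sketch of the difference, assuming a POSIX system (pthreads and fork available);
the function and file names are illustrative. The thread increments a global counter that the
main thread can see afterwards, because threads share one memory space; the forked child gets
its own copy of the counter, so its increment never reaches the parent.

    /* threads_vs_processes.c - illustrative sketch, POSIX assumed
     * Build: gcc threads_vs_processes.c -o demo -lpthread */
    #include <stdio.h>
    #include <pthread.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int counter = 0;                      /* global variable */

    void *thread_body(void *arg) {
        counter++;                        /* same memory space: visible to main */
        return NULL;
    }

    int main(void) {
        pthread_t t;
        pthread_create(&t, NULL, thread_body, NULL);
        pthread_join(t, NULL);
        printf("after thread:  counter = %d\n", counter);   /* prints 1 */

        pid_t pid = fork();               /* child gets a separate copy of memory */
        if (pid == 0) {                   /* child process */
            counter++;                    /* modifies only the child's copy */
            return 0;
        }
        wait(NULL);
        printf("after process: counter = %d\n", counter);   /* still 1 in the parent */
        return 0;
    }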

Designing Parallel Programs

1. Understand the Problem and the Program


- Confirm the problem can be parallelized.
- Analyze any existing serial code for parallel suitability.

- Examples of non-parallelizable problems:


- Sequential Dependency Problems (e.g., Fibonacci series).

- Input/Output Bound Tasks (e.g., file compression).

- Dynamic Programming Problems with dependencies (e.g., Knapsack Problem).


- Embarrassingly Parallel Computations

- Computation can be divided into independent parts.


- Minimal or no communication needed between processes.
- Identify program hotspots: Know where most of the real work is being done. The
majority of scientific and technical programs usually accomplish most of their work in a few
places (functions).
- Focus on parallelizing sections with high CPU usage.
- Use profilers and performance analysis tools.

- Identify bottlenecks.

- Other considerations: Identify inhibitors to parallelism. One common class of inhibitor is
data dependence, as demonstrated by the Fibonacci sequence above (sketched at the end of this step).

- Investigate other algorithms if possible. This may be the single most important
consideration when designing a parallel application.
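
A small C sketch of the contrast mentioned above: each Fibonacci term depends on the two
previous terms, so the iterations must run in order, whereas the element-wise squaring loop
below it has fully independent iterations and could be split across tasks with no communication
(the file name is illustrative; OpenMP is assumed available for the parallel variant noted in
the comment).

    /* dependence.c - sequential dependency vs. embarrassingly parallel (sketch) */
    #include <stdio.h>

    #define N 16

    int main(void) {
        /* Sequential dependency: fib[i] needs fib[i-1] and fib[i-2],
         * so iteration i cannot start before iteration i-1 finishes. */
        long fib[N];
        fib[0] = 0; fib[1] = 1;
        for (int i = 2; i < N; i++)
            fib[i] = fib[i - 1] + fib[i - 2];

        /* Embarrassingly parallel: every iteration is independent, so the
         * loop can be divided among tasks with no communication.
         * (With OpenMP this loop could carry: #pragma omp parallel for) */
        double x[N], y[N];
        for (int i = 0; i < N; i++) x[i] = (double)i;
        for (int i = 0; i < N; i++)
            y[i] = x[i] * x[i];

        printf("fib[%d] = %ld, y[%d] = %f\n", N - 1, fib[N - 1], N - 1, y[N - 1]);
        return 0;
    }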

2. Partitioning
- Break the problem into chunks for distribution across tasks (decomposition).
- Types of Partitioning:
- Domain Decomposition: Split the data among the parallel tasks (see the block-partition sketch below).
- Functional Decomposition: Split based on the computation needed.
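
A minimal sketch of domain decomposition, assuming N data elements split into contiguous
blocks over p tasks; the helper name block_range is illustrative. Each task derives its own
half-open range [lo, hi) from its id, and the leftover elements when N is not divisible by p
go to the first N % p tasks.

    /* block_partition.c - contiguous block domain decomposition (sketch) */
    #include <stdio.h>

    /* Compute the half-open range [lo, hi) owned by task `rank` of `p` tasks
     * over N elements, spreading the remainder over the first N % p tasks. */
    void block_range(long N, int p, int rank, long *lo, long *hi) {
        long base = N / p, rem = N % p;
        *lo = rank * base + (rank < rem ? rank : rem);
        *hi = *lo + base + (rank < rem ? 1 : 0);
    }

    int main(void) {
        long N = 10, lo, hi;
        int p = 4;
        for (int rank = 0; rank < p; rank++) {
            block_range(N, p, rank, &lo, &hi);
            printf("task %d owns elements [%ld, %ld)\n", rank, lo, hi);
        }
        return 0;
    }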

3. Communication and Data Dependencies


- No Communication Needed
- Tasks execute with minimal data sharing.
- Communication Required
- Tasks need to share data (e.g., 3-D heat diffusion problems).

- Factors to Consider in Communication:


- Cost of Communication
- Communication consumes machine cycles and resources that could otherwise be spent on computation, and it often requires synchronization between tasks.
- Bandwidth saturation due to competing communication traffic.
- Communication Metrics
- Latency: Time to send a minimal message.
- Bandwidth: Data transmitted per unit time.
- Communication Types
- Synchronous Communication: Requires handshaking; blocking.
- Asynchronous Communication: Non-blocking, allows simultaneous tasks.
* The main advantage of asynchronous communication is the ability to overlap computation with
communication, improving overall efficiency (see the non-blocking MPI sketch after this list).

- Scope of Communication
- Point-to-Point: Two tasks communicate (producer and consumer).
- Collective: Multiple tasks share data in groups.
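
A minimal sketch of asynchronous (non-blocking) point-to-point communication using MPI,
assuming an MPI installation (the build/run line with mpicc and mpirun is illustrative).
Rank 0 posts a non-blocking send and keeps computing while the message is in flight; both
ranks block only at MPI_Wait, when the data is actually needed.

    /* nonblocking.c - overlapping computation with communication (sketch)
     * Build/run (assumed): mpicc nonblocking.c -o nb && mpirun -np 2 ./nb */
    #include <stdio.h>
    #include <mpi.h>

    #define N 1000

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        double buf[N], local = 0.0;
        MPI_Request req;

        if (rank == 0) {
            for (int i = 0; i < N; i++) buf[i] = i;
            /* Post the send and return immediately (non-blocking). */
            MPI_Isend(buf, N, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &req);
        } else if (rank == 1) {
            MPI_Irecv(buf, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &req);
        }

        /* Useful work that does not touch buf[] proceeds while the message is
         * in transit: this is the computation/communication overlap. */
        for (int i = 0; i < 1000000; i++) local += 1e-6;

        if (rank == 0 || rank == 1)
            MPI_Wait(&req, MPI_STATUS_IGNORE);   /* block only when the data is needed */

        if (rank == 1) printf("rank 1 received buf[N-1] = %f\n", buf[N - 1]);
        MPI_Finalize();
        return 0;
    }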

* Data Dependencies
- Dependencies affect program order and inhibit parallelism.
- Handling Dependencies:
- Distributed Memory: Communicate data at synchronization points.
- Shared Memory: Synchronize read/write operations.
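
A minimal OpenMP sketch of the shared-memory case (gcc -fopenmp assumed; the file name is
illustrative): several threads update the same shared counter, so the read-modify-write must
be synchronized, here with omp atomic; without it the updates race and the result is wrong.

    /* sync_update.c - synchronizing a shared read/write (sketch)
     * Build (assumed): gcc -fopenmp sync_update.c -o sync_update */
    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        long hits = 0;                    /* shared between all threads */

        #pragma omp parallel for
        for (long i = 0; i < 1000000; i++) {
            /* The increment reads and writes shared data, so it must be
             * synchronized; removing the atomic would create a data race. */
            #pragma omp atomic
            hits++;
        }

        printf("hits = %ld (expected 1000000)\n", hits);
        return 0;
    }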

4. Mapping (Load Balancing)


- Load Balancing: Distribute work to keep all tasks busy.
- Achieving Load Balance:
- Equally partition work among tasks.
- Use dynamic work assignment.
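
A minimal OpenMP sketch of dynamic work assignment (gcc -fopenmp assumed; the work function
is a made-up placeholder): iterations cost very different amounts, so schedule(dynamic) hands
small chunks to whichever thread is idle instead of splitting the loop into equal fixed blocks.

    /* dynamic_schedule.c - dynamic work assignment for load balance (sketch)
     * Build (assumed): gcc -fopenmp dynamic_schedule.c -o dyn */
    #include <stdio.h>
    #include <omp.h>

    /* Work whose cost grows with i, so equal static blocks would be unbalanced. */
    double work(int i) {
        double s = 0.0;
        for (int k = 0; k < i * 1000; k++) s += 1.0 / (k + 1);
        return s;
    }

    int main(void) {
        double total = 0.0;
        /* schedule(dynamic, 4): idle threads grab the next chunk of 4 iterations,
         * keeping all threads busy even though iteration costs differ. */
        #pragma omp parallel for schedule(dynamic, 4) reduction(+:total)
        for (int i = 0; i < 1000; i++)
            total += work(i);

        printf("total = %f\n", total);
        return 0;
    }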

HPC Platforms

- Vertical Scaling (Scale-Up)


- Integration: Tightly integrated components (CPUs, memory, storage).
- Scalability: Add powerful components within a single system.
- Efficiency: Optimized for specific tasks with reduced communication overhead.

- Horizontal Scaling (Scale-Out)


- Distributed Architecture: Independent nodes connected via a network.
- Scalability: Grow by adding nodes, though growth is limited by network bandwidth and latency.
- Flexibility: Adaptable to different workloads, complex management.

Measuring Computer Performance

- FLOPS (Floating-point Operations per Second)


- Metric for computational performance.
- Supercomputers measured in PFLOPS (PetaFLOPS).
- FLOPS Calculation: Nodes × cores per node × cycles/second × FLOPs per cycle.
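
Worked example (hypothetical figures, not taken from the lecture): a cluster with 100 nodes,
64 cores per node, a 2.5 GHz clock, and 16 double-precision FLOPs per core per cycle has a
theoretical peak of 100 × 64 × 2.5 × 10^9 × 16 = 2.56 × 10^14 FLOPS = 256 TFLOPS ≈ 0.26 PFLOPS.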

HPC Benchmarking

- LINPACK Benchmarks
- Measure floating-point computing power.
- Approximates real problem-solving performance.
- HPL (High-Performance Linpack)
- Portable Linpack implementation in C.
- Provides data for the TOP500 list.
- Metrics:
- Rmax: Achieved LINPACK performance.
- Rpeak: Theoretical peak performance.
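
Illustrative example (hypothetical numbers): if a system's theoretical peak Rpeak is 256 TFLOPS
and its measured HPL result Rmax is 180 TFLOPS, its LINPACK efficiency is Rmax / Rpeak ≈ 70%;
Rmax always falls below Rpeak because real runs include memory traffic, communication, and
other overheads.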

Top 500 Supercomputers


- Ranked by performance metrics such as Rmax, Rpeak, and power usage.

HPC Cluster Architecture

- Cluster Components:
- Nodes: Individual computers in the cluster.
- Cores (Threads): Processing units within each node’s CPU.
- Shared Disk: Storage accessible by all nodes.
