HPC Lecture 2 Points
Agenda
- Shared Memory
- All processors access a single global address space.
- Fast data sharing.
- Advantages:
- Global address space provides a user-friendly programming perspective.
- Data sharing between tasks is fast and uniform due to proximity of memory to CPUs.
- Disadvantages:
- Lack of scalability between memory and CPUs.
- Programmer is responsible for synchronizing access to global memory.
- Expensive to design and produce shared memory machines with many processors.
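A small illustrative sketch of the shared-memory model (assuming C with OpenMP, compiled with -fopenmp; not from the lecture): all threads work on the same variables in one address space, and the critical section is the kind of programmer-managed synchronization noted above.

```c
#include <stdio.h>
#include <omp.h>

int main(void) {
    const int n = 1000000;
    double sum = 0.0;            /* shared: lives in the single global address space */

    #pragma omp parallel         /* team of threads, all sharing 'sum' */
    {
        double local = 0.0;
        #pragma omp for          /* loop iterations divided among the threads */
        for (int i = 0; i < n; i++)
            local += (double)i;

        #pragma omp critical     /* programmer-supplied synchronization on the shared sum */
        sum += local;
    }

    printf("sum = %.0f using up to %d threads\n", sum, omp_get_max_threads());
    return 0;
}
```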
- Distributed Memory
- Each processor has its own memory.
- Scalable; no overhead for cache coherency.
- Advantages:
- Memory is scalable with the number of processors.
- Each processor accesses its own memory without interference or cache coherency issues.
- Cost-effective, using off-the-shelf processors and networking.
- Disadvantages:
- Programmer responsible for communication between processors.
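By contrast, a minimal distributed-memory sketch (assuming MPI, built with mpicc and launched with mpirun; illustrative only): every rank owns a private copy of its data, and values move only through explicit messages, which is the communication burden placed on the programmer.

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int local = rank * 10;       /* each process has its own private memory */

    if (rank == 0) {
        for (int src = 1; src < size; src++) {
            int incoming;        /* data arrives only via explicit receives */
            MPI_Recv(&incoming, 1, MPI_INT, src, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 0 received %d from rank %d\n", incoming, src);
        }
    } else {
        MPI_Send(&local, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);  /* explicit send to rank 0 */
    }

    MPI_Finalize();
    return 0;
}
```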
Multithreading vs. Multiprocessing
- Threads
- Share the same memory space and global variables.
- Processes
- Separate program with its own variables, stack, and memory allocation.
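A small sketch of the difference (assuming a POSIX system with pthreads and fork; illustrative, not from the lecture): the thread's write to the shared global is visible to main, while the forked process modifies only its own copy.

```c
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>
#include <pthread.h>

int counter = 0;                          /* global variable */

void *thread_body(void *arg) {
    (void)arg;
    counter = 42;                         /* threads share the same address space */
    return NULL;
}

int main(void) {
    pthread_t t;
    pthread_create(&t, NULL, thread_body, NULL);
    pthread_join(t, NULL);
    printf("after thread:  counter = %d\n", counter);   /* 42: write is visible */

    pid_t pid = fork();
    if (pid == 0) {                       /* child process: its own copy of 'counter' */
        counter = 99;
        _exit(0);
    }
    wait(NULL);
    printf("after process: counter = %d\n", counter);   /* still 42: child's write not visible */
    return 0;
}
```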
1. Understand the Problem
- Identify bottlenecks.
- Investigate other algorithms if possible. This may be the single most important consideration when designing a parallel application.
2. Partitioning
- Break the problem into chunks for distribution across tasks (decomposition).
- Types of Partitioning:
- Domain Decomposition: Split data for each parallel task.
- Functional Decomposition: Split based on the computation needed.
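A short sketch of domain decomposition (the helper block_range is hypothetical, not from the lecture): the data range is split into contiguous, nearly equal chunks, one per task.

```c
#include <stdio.h>

/* Hypothetical helper: the slice [lo, hi) owned by 'task' when n elements
   are split as evenly as possible across 'ntasks' workers.               */
static void block_range(int n, int ntasks, int task, int *lo, int *hi) {
    int base = n / ntasks;                /* minimum chunk size              */
    int rem  = n % ntasks;                /* first 'rem' tasks get one extra */
    *lo = task * base + (task < rem ? task : rem);
    *hi = *lo + base + (task < rem ? 1 : 0);
}

int main(void) {
    int n = 10, ntasks = 3;
    for (int t = 0; t < ntasks; t++) {
        int lo, hi;
        block_range(n, ntasks, t, &lo, &hi);
        printf("task %d owns elements [%d, %d)\n", t, lo, hi);
    }
    return 0;
}
```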
- Scope of Communication
- Point-to-Point: Two tasks communicate (producer and consumer).
- Collective: Multiple tasks share data in groups.
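To make the two scopes concrete, a hedged MPI sketch (illustrative values only): the MPI_Send/MPI_Recv pair is point-to-point between a producer and a consumer, while MPI_Reduce involves every task in the group.

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Point-to-point: rank 0 (producer) sends one value to rank 1 (consumer). */
    int token = 7;
    if (rank == 0 && size > 1)
        MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    else if (rank == 1)
        MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    /* Collective: all ranks contribute to a sum gathered on rank 0. */
    int contribution = rank + 1, total = 0;
    MPI_Reduce(&contribution, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("collective sum over %d ranks = %d\n", size, total);

    MPI_Finalize();
    return 0;
}
```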
- Data Dependencies
- Dependencies affect program order and inhibit parallelism.
- Handling Dependencies:
- Distributed Memory: Communicate data at synchronization points.
- Shared Memory: Synchronize read/write operations.
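A brief sketch of both cases (assuming OpenMP; illustrative): the first loop has a loop-carried dependency, so its iterations must stay in program order, while the second loop is independent and its shared read/write on the accumulator is synchronized by the reduction clause.

```c
#include <stdio.h>

#define N 1000

int main(void) {
    double a[N];
    a[0] = 1.0;

    /* Loop-carried dependency: a[i] needs a[i-1] from the previous
       iteration, which inhibits parallel execution of this loop.    */
    for (int i = 1; i < N; i++)
        a[i] = a[i - 1] * 1.001;

    /* No dependency between iterations; the shared update of 'sum'
       is synchronized by the OpenMP reduction clause.               */
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += a[i];

    printf("sum = %f\n", sum);
    return 0;
}
```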
HPC Platforms
HPC Benchmarking
- LINPACK Benchmarks
- Measure floating-point computing power.
- Solves a dense system of linear equations to approximate real-world problem-solving performance.
- HPL (High-Performance Linpack)
- Portable Linpack implementation in C.
- Provides data for the TOP500 list.
- Metrics:
- Rmax: Achieved LINPACK performance.
- Rpeak: Theoretical peak performance.
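To relate the two metrics, a back-of-the-envelope sketch (all hardware numbers below are assumed for illustration, not from the lecture): Rpeak is the product of core count, clock rate, and floating-point operations per cycle, while Rmax is whatever HPL actually measures, shown here with an assumed efficiency.

```c
#include <stdio.h>

int main(void) {
    /* Assumed, illustrative hardware figures (not from the lecture). */
    double nodes           = 100;   /* nodes in the cluster           */
    double cores_per_node  = 64;    /* cores per node                 */
    double clock_ghz       = 2.5;   /* clock rate in GHz              */
    double flops_per_cycle = 16;    /* e.g. wide vector FMA units     */

    /* Rpeak: every core performing its maximum FLOPs every cycle. */
    double rpeak_gflops = nodes * cores_per_node * clock_ghz * flops_per_cycle;

    /* Rmax is measured by running HPL; here we only assume an efficiency. */
    double assumed_efficiency = 0.70;
    double rmax_gflops = rpeak_gflops * assumed_efficiency;

    printf("Rpeak ~ %.1f TFLOP/s\n", rpeak_gflops / 1000.0);
    printf("Rmax  ~ %.1f TFLOP/s (at %.0f%% assumed HPL efficiency)\n",
           rmax_gflops / 1000.0, assumed_efficiency * 100.0);
    return 0;
}
```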
- Cluster Components:
- Nodes: Individual computers in the cluster.
- Cores (Threads): Processing units within each node’s CPU.
- Shared Disk: Storage accessible by all nodes.