High performance computing (HPC) drives many of the world’s major scientific advances. As one of the most trusted enterprise Linux platforms, Red Hat Enterprise Linux (RHEL) is the foundation for many of these HPC workloads, serving industries such as automotive, financial services, biomedical, energy, and beyond.
Meanwhile, the public cloud has continued to gain traction in the broader compute marketplace, offering tremendous flexibility and dynamic infrastructure. The same trend is emerging in HPC, with organizations looking to use that flexibility and extra compute capacity to scale HPC clusters on demand, shortening their product development or research cycles.
This is why we’re excited to launch a new offering: RHEL for HPC on Azure. We’ve partnered closely with Microsoft to identify the technical requirements to accelerate the time-to-deployment for our shared customers. With RHEL for HPC on Azure, you get the automation that installs the tools and libraries required for an accelerated HPC compute environment on Azure infrastructure.
Introducing the RHEL HPC system role
The RHEL HPC 9.6 for Azure cloud offering is based on RHEL system roles.
The RHEL HPC system role is a Red Hat Ansible Automation Platform role specifically designed to simplify the deployment and configuration of HPC environments. This system role installs necessary third-party components that customers would otherwise have to manually integrate, such as the NVIDIA CUDA Driver, CUDA Toolkit, NVIDIA Collective Communications Library (NCCL), NVIDIA Fabric Manager, NVIDIA RDMA packages, and Open MPI. It is modular, allowing users to selectively install or skip specific packages and offering functionalities such as configuring storage volumes to ensure enough disk space is allocated for these large installations on Azure.
You can now select the RHEL HPC image listing in the Azure Marketplace. After the virtual machine (VM) instance is launched, you only need to run a few basic commands to apply the RHEL HPC system role (already installed on the image). Once the system role has downloaded all relevant HPC packages, you can save this image as a golden image and create multiple HPC instances based on it.
The RHEL HPC system role enables Red Hat to continuously release HPC packages during the next 12 months (fast path) without having to fully align with the six-month RHEL release cadence (slow path). As the Red Hat offering grows, you can expect to have the option to consume either RHEL releases (RHEL 9.8, RHEL 9.9, RHEL 10.2, and so on) or the latest versions of the RHEL HPC system role.
What are we offering?
The goal of the RHEL HPC minimum viable product (MVP) is to produce an Azure-optimized image deployable by Azure CycleCloud, Microsoft's platform for end-to-end HPC cluster creation and management. HPC customers often use CycleCloud because it handles complex cluster management and provisioning tasks.
Red Hat is launching its streamlined RHEL HPC offering for the Azure Marketplace, centered around the newly developed RHEL HPC system role delivered through Ansible, targeting RHEL 9.6 images. This offering significantly enhances the deployment experience for HPC environments on RHEL images.
This system role is designed to integrate a number of core dependencies essential for modern HPC workloads:
- NVIDIA CUDA Driver: Installs the necessary proprietary kernel modules and drivers to enable the NVIDIA GPU for computation.
- NVIDIA CUDA Toolkit: Contains the development environment necessary for writing applications that use the CUDA infrastructure.
- NVIDIA Collective Communications Library (NCCL): Optimized primitives for inter-GPU communication. This library is crucial for multi-GPU scenarios and is included in the NVIDIA repository.
- NVIDIA Fabric Manager: Configures and manages the NVSwitch fabric, which is essential for high-speed interconnects between GPUs.
- Open MPI (Message Passing Interface): An open source implementation of the MPI standard, which is fundamental for distributed HPC jobs and enables communication between nodes in a cluster.
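After the system role has run, a quick check can confirm which of the components listed above are visible on the instance. The following Python sketch is illustrative only and is not part of the RHEL HPC system role. The executable and library names it looks for (`nvidia-smi`, `nvcc`, `nv-fabricmanager`, `mpirun`, and the `nccl` shared library) are assumptions based on how these components are commonly packaged, and a "missing" result may simply mean that a tool is installed outside the default `PATH` or requires an environment module to be loaded first.

```python
import ctypes.util
import shutil

# Component -> executable expected on PATH once the component is installed.
# ASSUMPTION: these names follow common upstream packaging; they are not
# taken from the RHEL HPC system role itself.
COMMAND_CHECKS = {
    "NVIDIA CUDA Driver": "nvidia-smi",
    "NVIDIA CUDA Toolkit": "nvcc",
    "NVIDIA Fabric Manager": "nv-fabricmanager",
    "Open MPI": "mpirun",
}

# Component -> shared library name (without the "lib" prefix or ".so" suffix).
LIBRARY_CHECKS = {
    "NCCL": "nccl",
}


def check_components(which=shutil.which, find_library=ctypes.util.find_library):
    """Return a dict mapping each component name to True if it was located.

    The lookup functions are injectable so the check can be exercised
    without GPU hardware or the packages being present.
    """
    status = {name: which(cmd) is not None for name, cmd in COMMAND_CHECKS.items()}
    status.update(
        {name: find_library(lib) is not None for name, lib in LIBRARY_CHECKS.items()}
    )
    return status


if __name__ == "__main__":
    for component, found in check_components().items():
        print(f"{component}: {'found' if found else 'missing'}")
```

Running the script on a freshly configured instance prints one line per component. Because the role is modular, components you chose to skip are expected to show as missing.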
Where we’re going
This initial MVP release is the first step toward a complete offering that provides even more of the tools, libraries, and configurations needed to run HPC workloads on Azure. Over the coming months, we will release updates that incorporate more of this critical HPC content, tested and validated by our experts here at Red Hat. Customers who purchase the MVP will have access to these updates and the expanded capabilities of this offering.
Unlock your cloud HPC capacity today
Red Hat has long been a trusted partner in the world of HPC, enabling scientific discovery and product development. We are excited to be a trusted partner in our customers’ HPC expansion into the cloud. With RHEL for HPC on Azure, customers can get their HPC clusters deployed on Azure infrastructure faster than ever.
This offering is available in the Azure Marketplace under the name Red Hat Enterprise Linux (RHEL) for High Performance Computing (HPC) on Azure. Give it a try today and accelerate your HPC deployments.
About the authors
James Huang is a Senior Product Manager for Red Hat Enterprise Linux, where he focuses on AI and High Performance Computing.