High performance computing (HPC) drives many of the major scientific advances around the world. As one of the most trusted enterprise Linux platforms, Red Hat Enterprise Linux (RHEL) is the foundation for many of these HPC workloads in industries such as automotive, financial services, biomedical, energy, and beyond.

Meanwhile, the public cloud has continued to gain traction in the broader compute marketplace, offering tremendous flexibility and dynamic infrastructure. HPC is following the same trend: organizations want that flexibility and extra compute capacity so they can scale HPC clusters on demand and shorten their product development or research cycles.

This is why we’re excited to launch a new offering: RHEL for HPC on Azure. We’ve partnered closely with Microsoft to identify the technical requirements for accelerating time-to-deployment for our shared customers. With RHEL for HPC on Azure, you get the automation that installs the tools and libraries required for an accelerated HPC compute environment on Azure infrastructure.

Introducing the RHEL HPC system role

The RHEL HPC 9.6 for Azure cloud offering is based on RHEL system roles.

The RHEL HPC system role is a Red Hat Ansible Automation Platform role specifically designed to simplify the deployment and configuration of HPC environments. This system role installs necessary third-party components that customers would otherwise have to integrate manually, such as the NVIDIA CUDA Driver, CUDA Toolkit, NVIDIA Collective Communications Library (NCCL), NVIDIA Fabric Manager, NVIDIA RDMA packages, and Open MPI. The role is modular: you can selectively install or skip specific packages, and it can also configure storage volumes so that enough disk space is allocated for these large installations on Azure.

You can now select the RHEL HPC image listing in the Azure Marketplace. After the virtual machine (VM) instance is launched, you only need to run a few basic commands to apply the RHEL HPC system role (already installed on the image). Once the system role has downloaded all relevant HPC packages, you can save this image as a golden image and create multiple HPC instances based on it.
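As a rough illustration of the golden image step, the sketch below builds (but does not run) the Azure CLI calls commonly used to capture a managed image from a VM: `az vm deallocate`, `az vm generalize`, and `az image create`. The resource group, VM, and image names are placeholders, and this is only one possible capture workflow, not necessarily the one this offering prescribes.

```python
import shlex


def golden_image_commands(resource_group, vm_name, image_name):
    """Return the Azure CLI calls (as argv lists) for capturing a managed image.

    All three names are placeholders supplied by the caller.
    """
    target = ["--resource-group", resource_group, "--name", vm_name]
    return [
        # Stop the VM and release its compute resources.
        ["az", "vm", "deallocate", *target],
        # Mark the VM as generalized so it can serve as an image source.
        ["az", "vm", "generalize", *target],
        # Create a managed image from the generalized VM.
        ["az", "image", "create", "--resource-group", resource_group,
         "--name", image_name, "--source", vm_name],
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for argv in golden_image_commands("hpc-rg", "rhel-hpc-build", "rhel-hpc-golden"):
        print(shlex.join(argv))
```

Printing the commands keeps the sketch safe to run anywhere. Before generalizing, a Linux VM is normally deprovisioned from inside the guest (for example with `sudo waagent -deprovision+user`), and a generalized VM can no longer be started, so treat the build VM as disposable.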

The RHEL HPC system role lets Red Hat release HPC packages continuously over the next 12 months (the fast path) without having to align fully with the six-month RHEL release cadence (the slow path). As the Red Hat offering grows, you can expect to have the option to consume either new RHEL releases (RHEL 9.8, RHEL 9.9, RHEL 10.2, and so on) or the latest versions of the RHEL HPC system role.

A screenshot of the Microsoft Marketplace page for the Red Hat Enterprise Linux (RHEL) for High Performance Computing (HPC) on Azure offering.

What are we offering?

The goal of the RHEL HPC minimum viable product (MVP) is to produce an Azure-optimized image that can be deployed by Azure CycleCloud, Microsoft's platform for end-to-end HPC cluster creation and management. HPC customers often use CycleCloud because it handles complex cluster management and provisioning tasks for them.

Red Hat is launching its streamlined RHEL HPC offering for the Azure Marketplace, centered around the newly developed RHEL HPC system role delivered through Ansible, targeting RHEL 9.6 images. This offering significantly enhances the deployment experience for HPC environments on RHEL images.

This system role is designed to integrate a number of core dependencies essential for modern HPC workloads: 

  • NVIDIA CUDA Driver: Installs the necessary proprietary kernel modules and drivers to enable the NVIDIA GPU for computation.
  • NVIDIA CUDA Toolkit: Contains the development environment necessary for writing applications that use the CUDA infrastructure.
  • NVIDIA Collective Communications Library (NCCL): Optimized primitives for inter-GPU communication. This library is crucial for multi-GPU scenarios and is included in the NVIDIA repository.
  • NVIDIA Fabric Manager: This package is related to InfiniBand and networking utilities, particularly supporting features like NVSwitch, essential for high-speed interconnects between GPUs.
  • Open MPI: An open source implementation of the Message Passing Interface (MPI) standard, which is fundamental to distributed HPC jobs and enables communication between processes running on different nodes in a cluster.
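After the role has run, a quick sanity check can confirm that these components are visible on the system. The sketch below is one possible check, not part of the offering: it looks for an executable that usually ships with each component and probes the dynamic linker for the NCCL shared library. The executable names are assumptions about typical installations and may differ on your image.

```python
import ctypes.util
import shutil

# Component -> executable that usually ships with it.
# These names are assumptions about typical installs; adjust as needed.
EXPECTED_TOOLS = {
    "NVIDIA CUDA Driver": "nvidia-smi",
    "NVIDIA CUDA Toolkit": "nvcc",
    "NVIDIA Fabric Manager": "nv-fabricmanager",
    "Open MPI": "mpirun",
}


def check_components(which=shutil.which, find_library=ctypes.util.find_library):
    """Map each component name to True if its probe succeeds, else False."""
    status = {name: which(tool) is not None for name, tool in EXPECTED_TOOLS.items()}
    # NCCL is a shared library rather than a command, so ask the linker instead.
    status["NCCL"] = find_library("nccl") is not None
    return status


if __name__ == "__main__":
    for name, found in check_components().items():
        print(f"{name}: {'found' if found else 'not found'}")
```

A "not found" result does not always mean the component is missing. For example, `nvcc` often lives under the CUDA installation directory rather than on the default `PATH`, and on RHEL the Open MPI binaries typically become available only after loading the matching environment module.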

Where we’re going

This initial MVP release is the first step toward a complete offering that provides even more of the tools, libraries, and configurations needed to run HPC workloads on Azure. Over the coming months, we will release updates that incorporate more of this critical HPC content, tested and validated by our experts here at Red Hat. Customers who purchase the MVP will have access to these updates and to the expanded capabilities of this offering.

Unlock your cloud HPC capacity today

Red Hat has long been a trusted partner in the world of HPC, enabling scientific discovery and product development. We are excited to be a trusted partner in our customers’ HPC expansion into the cloud. With RHEL for HPC on Azure, customers can get their HPC clusters deployed on Azure infrastructure faster than ever.

This offering can be found in the Azure Marketplace under the name Red Hat Enterprise Linux (RHEL) for High Performance Computing (HPC) on Azure. Give it a try today and accelerate your HPC deployments.


About the authors

James Huang is a Senior Product Manager for Red Hat Enterprise Linux, where he focuses on AI and High Performance Computing.
