[Initiative]: GPU-Aware Autoscaling in Cloud Native AI Infrastructure #2188

@pmady

Name: GPU-Aware Autoscaling in Cloud Native AI Infrastructure

Short description: Whitepaper on GPU autoscaling in Kubernetes using a KEDA external scaler with direct NVML metrics

Responsible group: TOC

Does the initiative belong to a subproject: No

Primary contact: @pmady (Pavan Madduri)

Additional contacts: @JulioPerez (Julio Perez - AI TCG Organizer)

Initiative description:
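The Kubernetes Horizontal Pod Autoscaler (HPA) can't see GPU utilization out of the box: it scales on CPU and memory metrics while GPUs sit at 100%. The current workaround (DCGM exporter → Prometheus → KEDA) adds 15-30 s of latency and several moving parts that must be deployed and kept in sync.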
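
To illustrate where latency in a chained polling pipeline can come from, here is a minimal sketch that models each hop as a stage refreshing on a fixed interval. The stage names and the 10-second intervals are hypothetical values chosen for illustration, not measurements or defaults from the whitepaper. The model assumes independent, unsynchronized polling, so each stage adds between zero and one full interval of staleness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PollingStage:
    name: str
    interval_s: float  # how often this stage refreshes its view of the metric


def worst_case_staleness(stages):
    """Upper bound: every stage has only just missed the newest sample."""
    return sum(stage.interval_s for stage in stages)


def mean_staleness(stages):
    """Expected age if each stage's polling phase is uniformly random."""
    return sum(stage.interval_s / 2 for stage in stages)


# Hypothetical intervals, for illustration only.
pipeline = [
    PollingStage("exporter collection", 10.0),
    PollingStage("Prometheus scrape", 10.0),
    PollingStage("KEDA polling", 10.0),
]

print(worst_case_staleness(pipeline))  # 30.0
print(mean_staleness(pipeline))        # 15.0
```

Under this simplified model, removing intermediate hops shortens the chain of intervals, which is the motivation for reading the metric directly on the node.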

This whitepaper documents a direct NVML approach using KEDA's external scaler pattern:

  • Architecture constraints: Why GPU support can't go in KEDA core (CGO, node-local device access)
  • DaemonSet + gRPC design: Direct NVML reads with sub-second latency
  • Scaling profiles: Pre-built configs for vLLM, Triton, training, batch workloads
  • NUMA integration: Works with Volcano's GPU NUMA-aware scheduling
  • Production data: 4-node A100 cluster running LLM inference
  • Ecosystem fit: Complements DCGM, HAMi, KubeAI (doesn't replace them)
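
As a rough illustration of how a scaling profile and per-GPU utilization readings could combine into a replica count, the sketch below applies the standard HPA proportional rule, `ceil(currentReplicas * currentMetric / targetMetric)`, and clamps the result to the profile's bounds. This is not the keda-gpu-scaler implementation: the `ScalingProfile` type, the profile names, and every threshold are hypothetical, and HPA's tolerance band and stabilization windows are omitted.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingProfile:
    target_utilization: float  # desired average GPU utilization, in percent
    min_replicas: int
    max_replicas: int


# Hypothetical profiles; the numbers are illustrative, not from the whitepaper.
PROFILES = {
    "inference": ScalingProfile(target_utilization=70.0, min_replicas=1, max_replicas=8),
    "batch": ScalingProfile(target_utilization=90.0, min_replicas=0, max_replicas=4),
}


def desired_replicas(current_replicas, gpu_utilizations, profile):
    """Apply the HPA proportional rule to averaged GPU utilization readings."""
    if not gpu_utilizations or current_replicas == 0:
        # No signal to scale on; fall back to the profile's floor.
        return profile.min_replicas
    average = sum(gpu_utilizations) / len(gpu_utilizations)
    raw = math.ceil(current_replicas * average / profile.target_utilization)
    # Clamp to the profile's configured bounds.
    return max(profile.min_replicas, min(profile.max_replicas, raw))


print(desired_replicas(2, [95.0, 90.0], PROFILES["inference"]))  # 3
print(desired_replicas(4, [100.0] * 4, PROFILES["batch"]))       # 4
```

In a real deployment, KEDA passes the metric and target to the HPA, which performs this calculation itself; the sketch only shows why a fresher utilization reading changes the scaling decision sooner.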

Draft status: Complete whitepaper ready for review (12 sections, 274 lines)
Code implementation: https://github.com/pmady/keda-gpu-scaler (in production)

Deliverable(s) or exit criteria:

  • Whitepaper reviewed and approved by TOC
  • Published on CNCF TAG Infrastructure website
  • Presented at CNCF KubeCon or TAG session
  • Reference implementation (keda-gpu-scaler) shows adoption

Labels: needs-group, needs-kind, needs-triage
