[Initiative]: GPU-Aware Autoscaling in Cloud Native AI Infrastructure #2188

**Name:** GPU-Aware Autoscaling in Cloud Native AI Infrastructure

**Short description:** Whitepaper on GPU autoscaling in Kubernetes using a KEDA external scaler with direct NVML metrics

**Responsible group:** TOC

**Does the initiative belong to a subproject:** No

**Primary contact:** @pmady (Pavan Madduri)

**Additional contacts:** @julioperez (Julio Perez - AI TCG Organizer)

**Initiative description:**
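The Kubernetes Horizontal Pod Autoscaler (HPA) can't see GPU utilization: it watches CPU and memory, so a GPU-bound workload can sit at 100% GPU utilization without triggering a scale-up. The current workaround (DCGM exporter → Prometheus → KEDA) adds 15-30s of latency and several moving parts.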
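
A delay of that size is what one would expect when several polling loops are chained. The sketch below adds up a rough staleness budget for such a pipeline. The stage names and interval values are illustrative assumptions, chosen only so that the totals fall in the range quoted above; they are not measurements from the whitepaper or defaults of any component.

```python
# Hypothetical polling intervals, in seconds, for each hop of a
# DCGM exporter -> Prometheus -> KEDA pipeline. These values are
# assumptions for illustration, not measured or default settings.
PIPELINE_INTERVALS_S = {
    "exporter_collection": 10.0,
    "prometheus_scrape": 15.0,
    "keda_polling": 5.0,
}


def worst_case_staleness(intervals: dict[str, float]) -> float:
    """Worst case: a change just misses every polling loop in turn."""
    return sum(intervals.values())


def average_staleness(intervals: dict[str, float]) -> float:
    """On average a change waits half an interval at each hop."""
    return sum(value / 2 for value in intervals.values())


if __name__ == "__main__":
    print("worst case (s):", worst_case_staleness(PIPELINE_INTERVALS_S))
    print("average (s):", average_staleness(PIPELINE_INTERVALS_S))
```

A scaler that reads the device directly removes the first two hops, leaving only its own sampling interval in the budget.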

This whitepaper documents a direct NVML approach using KEDA's external scaler pattern:

- **Architecture constraints:** Why GPU support can't go in KEDA core (CGO, node-local device access)
- **DaemonSet + gRPC design:** Direct NVML reads with sub-second latency
- **Scaling profiles:** Pre-built configs for vLLM, Triton, training, batch workloads
- **NUMA integration:** Works with Volcano's GPU NUMA-aware scheduling
- **Production data:** 4-node A100 cluster running LLM inference
- **Ecosystem fit:** Complements DCGM, HAMi, KubeAI (doesn't replace them)
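
To make the scaling-profile idea concrete, here is a minimal sketch of how per-GPU utilization samples could be turned into a replica count. It is not taken from keda-gpu-scaler: the `ScalingProfile` type, the profile names, and every threshold are hypothetical. The replica formula is the standard HPA ratio, `ceil(currentReplicas * currentMetric / targetMetric)`, without HPA's tolerance band or stabilization window. In a real deployment the samples would come from NVML on each node and be exposed to KEDA through the external scaler gRPC interface; here they are plain numbers.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingProfile:
    """Hypothetical profile; field names and values are illustrative."""
    target_gpu_util: float  # target average GPU utilization, in percent
    min_replicas: int
    max_replicas: int


# Assumed example profiles, not the whitepaper's pre-built configs.
PROFILES = {
    "inference": ScalingProfile(target_gpu_util=70.0, min_replicas=1, max_replicas=8),
    "batch": ScalingProfile(target_gpu_util=90.0, min_replicas=0, max_replicas=4),
}


def average_utilization(samples: list[float]) -> float:
    """Mean of per-GPU utilization samples; 0.0 when no samples exist."""
    if not samples:
        return 0.0
    return sum(samples) / len(samples)


def desired_replicas(current_replicas: int, samples: list[float],
                     profile: ScalingProfile) -> int:
    """Apply the HPA ratio formula, then clamp to the profile's bounds."""
    avg = average_utilization(samples)
    raw = math.ceil(current_replicas * avg / profile.target_gpu_util)
    return max(profile.min_replicas, min(profile.max_replicas, raw))


if __name__ == "__main__":
    # Two replicas with saturated GPUs under the "inference" profile.
    print(desired_replicas(2, [100.0, 100.0], PROFILES["inference"]))
```

Keeping the profile as plain data separates the workload-specific tuning (targets and bounds) from the scaling arithmetic, which is the same for every workload type.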

**Draft status:** Complete whitepaper ready for review (12 sections, 274 lines)

**Code implementation:** https://github.com/pmady/keda-gpu-scaler (in production)

**Deliverable(s) or exit criteria:**
- [ ] Whitepaper reviewed and approved by TOC
- [ ] Published on CNCF TAG Infrastructure website
- [ ] Presented at CNCF KubeCon or TAG session
- [ ] Reference implementation (keda-gpu-scaler) shows adoption
