[Initiative]: GPU-Aware Autoscaling in Cloud Native AI Infrastructure #2188
Open
Labels
needs-group: Indicates an issue or PR that has not been assigned a group (toc or tag/foo label applied)
needs-kind: Indicates an issue or PR that is missing an issue type or kind (a kind/foo label)
needs-triage: Indicates an issue or PR that has not been triaged yet (has a 'triage/foo' label applied)
Name: GPU-Aware Autoscaling in Cloud Native AI Infrastructure
Short description: Whitepaper on GPU autoscaling in Kubernetes using KEDA external scaler with direct NVML metrics
Responsible group: TOC
Does the initiative belong to a subproject: No
Primary contact: @pmady (Pavan Madduri)
Additional contacts: @JulioPerez (Julio Perez - AI TCG Organizer)
Initiative description:
The Kubernetes Horizontal Pod Autoscaler (HPA) cannot see GPU utilization: it watches CPU and memory while GPUs sit at 100%. The current fix (DCGM exporter → Prometheus → KEDA) adds 15-30 s of latency and a lot of moving parts.
This whitepaper documents a direct NVML approach using KEDA's external scaler pattern:
Draft status: Complete whitepaper ready for review (12 sections, 274 lines)
Code implementation: https://github.com/pmady/keda-gpu-scaler (in production)
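To make the idea concrete, the scaling decision that a GPU-aware scaler feeds can be sketched as below. This is a minimal illustration only, not code from the whitepaper or from the keda-gpu-scaler repository. The function names, the 60% target, and the hard-coded utilization samples are assumptions made for the example. In a real scaler the samples would presumably come from NVML (for instance `nvmlDeviceGetUtilizationRates`), and the replica count would be computed by the HPA from the metric that the KEDA external scaler reports. The formula is the standard HPA rule, desired = ceil(current × metric / target), with the HPA tolerance band omitted for brevity.

```python
import math
from typing import Sequence


def average_gpu_utilization(samples: Sequence[float]) -> float:
    """Return the mean utilization (0-100) across the sampled GPUs.

    The samples are plain numbers here; reading them from NVML is
    outside the scope of this sketch (assumption).
    """
    if not samples:
        raise ValueError("at least one GPU utilization sample is required")
    return sum(samples) / len(samples)


def desired_replicas(
    current_replicas: int,
    current_util: float,
    target_util: float,
    max_replicas: int,
) -> int:
    """Apply the HPA rule ceil(current * metric / target), clamped to [1, max].

    The target and the maximum are hypothetical configuration values.
    """
    if target_util <= 0:
        raise ValueError("target_util must be positive")
    raw = math.ceil(current_replicas * current_util / target_util)
    return max(1, min(max_replicas, raw))


if __name__ == "__main__":
    # Two GPUs reporting 100% and 80% utilization (made-up readings).
    util = average_gpu_utilization([100.0, 80.0])
    print(desired_replicas(current_replicas=2, current_util=util,
                           target_util=60.0, max_replicas=10))
```

With two replicas averaging 90% against a 60% target, the rule asks for three replicas. The sooner the utilization reading reaches this calculation, the sooner the scale-out starts, which is the motivation for reading NVML directly rather than going through a scrape pipeline.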
Deliverable(s) or exit criteria: