[Roadmap] Fall 2025 KernelBench Maintenance + Improvement Plan #74

@simonguozirui

This fall, the KernelBench team will continue to maintain and improve the repo. This issue serves as our roadmap, and we may continue to update it. If you have concrete feature requests, please post them below or, ideally, open an issue on the repo.

We have a fantastic group of Stanford undergrads: @AffectionateCurry @nathanjpaek @pythonomar22 @Marsella8 as core maintainers, with @ethanboneh on RL framework integration. We very much welcome community contributions in these directions (we try our best to review the PRs). Thank you to @alexzhang13 @hqjenny for the feedback.

Goal & Motivation

KernelBench has quickly become the standard for evaluating LLM kernel generation capabilities. As many others in the community have pointed out, and as we found in our follow-up work, there are aspects of the benchmark that could be improved to make it a more valuable tool for the community. We already started on this over the summer with KernelBench v0.1, by @AffectionateCurry @nataliakokoromyti @anneouyang.

Ultimately, we want to make KernelBench easy (push-button eval), usable (easy to integrate), and referenceable (comparable across various approaches).

Overall Milestone

  • Milestone 1: By October (SF GPU mode hackathon), resolve all previous PRs and issues (or at least give each one an answer)
  • Milestone 2: Integrate with community projects to support future research directions (RL, evolutionary search, more languages) and to let people experiment with various approaches
  • Milestone 3: Create a Referenceable, Reproducible Pipeline

We hope to have an update/announcement by early December / NeurIPS.

Below are the concrete goals and (temporary) assignments. We will try our best to realize all of these features, but we make no guarantees. We would love to welcome community contributions!

Milestone 1: Improve KernelBench itself

Milestone 2: Framework Integration

  • Curate a doc on how KernelBench has been used and the various approaches that tackle it.

DSL (NVIDIA hardware) support.

Alternative hardware platform support.

  • AMD HIP support
  • Google TPU support

RL and search framework integration. See #73 for details.

Milestone 3: Referenceable, Reproducible Pipeline

The goal is to make KernelBench an actual standard. Led by @pythonomar22 @AffectionateCurry.

Labels

documentation, enhancement, good first issue, help wanted
