Halide 10.0.0
We are pleased to announce the release of Halide 10.0.0!
This is a major update over the previous version, Halide 8.0.0, and contains many new features and a few breaking changes.
What happened to version 9?
For major version numbers, we now use the included LLVM version. We aim to release new versions of Halide at the same cadence as LLVM (every six months or so).
Autoschedulers
- There are now multiple autoschedulers, and they have been reworked as plugins. They are each named for the research paper that produced them. The existing autoscheduler is now Mullapudi2016. See the generator documentation for more details.
- The Adams2019 autoscheduler has been added. It is optimized for x86 CPUs and includes an autotuning mode.
- The Li2018 autoscheduler has been added and generates CUDA schedules. It is optimized for pipelines using gradient descent features.
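The plugin design can be illustrated in plain C++. This sketch is hypothetical and is not Halide's actual plugin interface. PipelineDesc, Autoscheduler, AutoschedulerRegistry, and make_default_registry are names invented for this example, and each registered "autoscheduler" is a placeholder that only returns a label. What it shows is the design choice: several schedulers live side by side, and one is selected by the name of its paper.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical description of a pipeline to schedule (illustration only).
struct PipelineDesc {
    std::string name;
};

// Hypothetical plugin signature: takes a pipeline, returns a schedule as text.
using Autoscheduler = std::function<std::string(const PipelineDesc &)>;

// Name-keyed registry: each plugin is stored under the name of its paper.
class AutoschedulerRegistry {
public:
    void add(const std::string &name, Autoscheduler fn) {
        assert(fn && "plugin must be callable");
        plugins_[name] = std::move(fn);
    }

    bool has(const std::string &name) const { return plugins_.count(name) != 0; }

    // Looks up a plugin by name and runs it; throws on an unknown name.
    std::string run(const std::string &name, const PipelineDesc &p) const {
        auto it = plugins_.find(name);
        if (it == plugins_.end()) {
            throw std::runtime_error("unknown autoscheduler: " + name);
        }
        return it->second(p);
    }

private:
    std::map<std::string, Autoscheduler> plugins_;
};

// Registers placeholders named after the three autoschedulers.
// A real plugin would analyze the pipeline; these only return a label.
AutoschedulerRegistry make_default_registry() {
    AutoschedulerRegistry registry;
    const std::vector<std::string> names = {"Mullapudi2016", "Adams2019", "Li2018"};
    for (const std::string &n : names) {
        registry.add(n, [n](const PipelineDesc &p) { return n + " schedule for " + p.name; });
    }
    return registry;
}
```

For example, make_default_registry().run("Adams2019", PipelineDesc{"blur"}) returns "Adams2019 schedule for blur", and an unregistered name throws std::runtime_error.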
Build
- The CMake build has been rewritten. See README_cmake.md for details.
- The minimum CMake version is now 3.16.
- The old halide.cmake module has been removed in favor of find_package(Halide).
- We no longer support the MinGW toolchain.
Language features
- The new atomic scheduling directive gives you another way to parallelize associative reductions (e.g., histograms or summations) by emitting atomic instructions when available, and compare-and-swap loops or locks when not.
- Support for horizontal vector reduction instructions, including dot-product instructions useful in machine learning, by combining the vectorize and atomic directives.
- Integer division or mod by zero now returns zero instead of being undefined behavior.
- The simplifier is now formally verified.
- You can now store Funcs that are compute_at GPU blocks in global memory, which is useful if they won't fit in shared memory.
- Allocation size inference is more precise in a variety of cases.
- Various bug fixes for compute_with.
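What the atomic directive enables can be pictured with a hand-written C++ analogue. This is not code generated by Halide; parallel_chunks, parallel_histogram, parallel_sum, and atomic_add_float are hypothetical helpers. The histogram shares one set of bins across threads and updates them with an atomic fetch_add. The float sum uses a compare-and-swap loop, the fallback the atomic bullet mentions for targets without a suitable atomic instruction.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Runs body(begin, end) on num_threads threads, one contiguous chunk of [0, n) each.
template <typename Body>
void parallel_chunks(std::size_t n, unsigned num_threads, Body body) {
    std::vector<std::thread> workers;
    const std::size_t chunk = (n + num_threads - 1) / num_threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(n, t * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        workers.emplace_back(body, begin, end);
    }
    for (auto &w : workers) w.join();  // join also makes the threads' writes visible
}

// Adds value to an atomic float with a compare-and-swap loop.
void atomic_add_float(std::atomic<float> &target, float value) {
    float expected = target.load();
    // On failure, compare_exchange_weak reloads expected with the current value.
    while (!target.compare_exchange_weak(expected, expected + value)) {
    }
}

// Parallel histogram: all threads increment the same shared bins atomically.
std::vector<int> parallel_histogram(const std::vector<std::uint8_t> &input,
                                    std::size_t num_bins, unsigned num_threads) {
    assert(num_bins > 0 && num_threads > 0);
    std::vector<std::atomic<int>> bins(num_bins);
    for (auto &b : bins) b.store(0);
    parallel_chunks(input.size(), num_threads, [&](std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i) {
            // Relaxed ordering suffices: counts are only read after all threads join.
            bins[input[i] % num_bins].fetch_add(1, std::memory_order_relaxed);
        }
    });
    std::vector<int> result;
    for (const auto &b : bins) result.push_back(b.load());
    return result;
}

// Parallel summation into one shared accumulator via the CAS loop above.
float parallel_sum(const std::vector<float> &input, unsigned num_threads) {
    assert(num_threads > 0);
    std::atomic<float> total{0.0f};
    parallel_chunks(input.size(), num_threads, [&](std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i) atomic_add_float(total, input[i]);
    });
    return total.load();
}
```

Because floating-point addition is not associative, the order in which threads land their updates can change the rounded float sum from run to run; the integer histogram has no such issue.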
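The new division rule can be written down as a small reference function. This is a sketch, not Halide source: div_or_zero and mod_or_zero are hypothetical names, and we assume the Euclidean convention Halide uses for signed integer division (the remainder is never negative). Only the zero-divisor case reflects the behavior change in this release.

```cpp
#include <cassert>
#include <cstdint>

// Euclidean division with the zero-divisor rule: a / 0 == 0.
std::int32_t div_or_zero(std::int32_t a, std::int32_t b) {
    if (b == 0) return 0;
    // Widen to 64 bits so INT32_MIN / -1 is not undefined behavior here;
    // that one result does not fit and wraps on the cast back (defined since C++20).
    std::int64_t q = static_cast<std::int64_t>(a) / b;
    std::int64_t r = static_cast<std::int64_t>(a) % b;
    if (r < 0) {
        // C++ truncates toward zero; shift the quotient so the remainder is non-negative.
        q += (b > 0) ? -1 : 1;
    }
    return static_cast<std::int32_t>(q);
}

// Euclidean remainder with the zero-divisor rule: a % 0 == 0.
// For b != 0 the result lies in [0, |b|).
std::int32_t mod_or_zero(std::int32_t a, std::int32_t b) {
    if (b == 0) return 0;
    std::int64_t r = static_cast<std::int64_t>(a) % b;
    if (r < 0) {
        r += (b > 0) ? b : -static_cast<std::int64_t>(b);
    }
    assert(r >= 0);  // a Euclidean remainder is never negative
    return static_cast<std::int32_t>(r);
}
```

For example, div_or_zero(-7, 2) is -4 and mod_or_zero(-7, 2) is 1, so that -7 == 2 * -4 + 1; both functions return 0 when the divisor is 0.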
Backends and targets
- Better Direct3D 12 support
- Added support for macOS and Windows on ARM.
- We no longer support the legacy buffer_t type.
- Explicit support for Volta, Turing, and Ampere GPUs