Introduction
Since the last release, the TVM community has worked to deliver the following exciting new improvements!
The main tags are below (bold text indicates areas with significant progress): Relax (especially the PyTorch frontend), CUDA, and more.
Please visit the full listing of commits for a complete view: v0.20.dev0...v0.20.0.rc0.
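As a quick, hedged illustration of the workflow targeted by the Relax PyTorch (ExportedProgram) frontend and the new `tvm.compile` interface highlighted in this release, a minimal sketch follows; the model, target string, and exact argument names are illustrative assumptions rather than part of the release itself.

```python
# A minimal sketch, assuming a recent torch and a TVM build containing the
# Relax ExportedProgram importer and the tvm.compile interface from this
# release; exact argument names and defaults may differ between versions.
import torch
from torch.export import export

import tvm
from tvm.relax.frontend.torch import from_exported_program


class MLP(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(16, 4)

    def forward(self, x):
        return torch.nn.functional.relu(self.fc(x))


# Export the model with torch.export, then translate the ExportedProgram
# into a Relax IRModule.
exported = export(MLP(), (torch.randn(1, 16),))
mod = from_exported_program(exported)
mod.show()

# Compile through the unified entry point introduced in #17710 / #17718;
# the "llvm" target here is an assumption for illustration.
executable = tvm.compile(mod, target="llvm")
```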
Community
None.
RFCs
None.
Adreno
- #17608 - [WINDOWS] Windows build dependencies for Adreno target
BugFix
- #17761 - [FIX][RELAX] fix fusion of transpose + matmul when constant weight
- #17762 - [Fix] Fix OpenCL header in attention utils
- #17711 - [Fix][dlight] add an explicit reduction loop check in Reduce
- #17697 - [Fix] Include `<chrono>` for `std::chrono`
- #17677 - Declare build backend for python package
- #17598 - [TIR][FIX] update FlopEstimator to include missing nodes
- #17601 - [Flashinfer][Fix] fix missing args in flashinfer test
- #17607 - [FIX][TVMC] Fix the mixed precision conversion pipeline
CI
- #17687 - Update images to 20250226-223225-63bc315f
- #17680 - update images to 20250225-035137-aeadc31c
- #17675 - [skip ci] Update github tvmbot
- #17635 - Cleanup legacy files
- #17634 - [skip ci] Improve build time
- #17629 - [skip ci] Robustify CI for SPOT failure
- #17620 - Unpin pytest-profiling
- #17621 - [skip ci] Remove legacy CI runners protection
- #17619 - [Refactor] Remove legacy frontend tests
Dlight
- #17754 - Fix general reduction rule to support non-last reduction axis
- #17663 - [CPU] Add CPU Backend Support for GEMV Optimization
Docker
- #17691 - Fix ml_dtypes downgrade issue introduced by TensorFlow
- #17686 - Update ml_dtypes to 0.5.1+
- #17676 - Use Torch GPU on gpu device
- #17648 - Tensorflow (aka TFLite) upgrade to 2.18.0
- #17643 - Update ml_dtypes version
- #17638 - [skip ci] Update ml_dtypes version
- #17617 - Tensorflow upgrade to 2.18.0
Docs
MetaSchedule
- #17104 - Adding post optimization in MetaSchedule to Improve Scheduling
OpenCL & CLML
- #17571 - [OPENCL][TEXTURE] Improved texture memory planning
Relax
- #17814 - [PyTorch] Add stack.default and sum.default to exported programs translator
- #17820 - [PyTorch] Add support for broadcast_to, narrow ops
- #17822 - [PyTorch] Cleanup tests for ExportedProgram frontend
- #17806 - [PyTorch] Add Softplus Op Support for Exported Program and FX graph
- #17817 - [PyTorch] Support dynamic shapes in ExportedProgram frontend
- #17813 - [PyTorch] Improve ExportedProgram frontend by supporting `unflatten.int`, `hardtanh_.default`, `dropout_.default`, `silu_.default`, `add_.Tensor` and `relu_.default`
- #17812 - [PyTorch] Support argsort, topk ops for ExportedProgram importer
- #17810 - [PyTorch] Add support for argsort, sort, topk ops
- #17809 - [PyTorch] Delete duplicate converter function `_to`
- #17807 - [PyTorch] Fix torch 2.6 compatibility issues
- #17797 - [Pytorch] Update SELU Implementation Using Decomposed Core-Level Ops
- #17802 - [Pytorch] support for arange in exported programs translator
- #17801 - [PyTorch] Support where, cumprod and reciprocal ops for ExportedProgram importer
- #17790 - [PyTorch] Add support for index_select
- #17786 - [PyTorch] Support softshrink op for ExportedProgram
- #17788 - [PyTorch] Add support for where, cumprod and reciprocal ops
- #17785 - [PyTorch] Support prod, std and var ops for ExportedProgram importer
- #17778 - [PyTorch] Support log2, log10 and log1p ops for ExportedProgram importer
- #17772 - [PyTorch] Add support for prod, std and var ops
- #17766 - [PyTorch] Add support for log2, log10 and log1p ops
- #17760 - [PyTorch] Add support for lerp, select and clone ops
- #17751 - [PyTorch] Support one_hot, empty_like ops for ExportedProgram importer
- #17747 - [PyTorch] Support flip, gather, take ops for ExportedProgram importer
- #17738 - [PyTorch] Support elu, celu, selu ops for ExportedProgram importer
- #17726 - [PyTorch] Add support for numel, empty_like and one_hot ops
- #17707 - [PyTorch] Add support for gather, flip and take ops
- #17702 - [PyTorch] Add support for celu, selu, is_floating_point ops
- #17694 - [PyTorch] Add support for elu, hardtanh ops
- #17689 - [PyTorch] Support several binary ops for ExportedProgram importer
- #17672 - [PyTorch] Refactor binary ops tests
- #17679 - [PyTorch] Support several unary ops for ExportedProgram importer
- #17668 - [PyTorch] Add support for and_, lshift, min, or_, rshift, xor ops
- #17664 - [PyTorch] Add support for ge, gt, le, mod, ne ops
- #17659 - [PyTorch] Add support for bitwise_not, isfinite, isinf, isnan, logical_not, sign and square ops
- #17622 - [PyTorch] Add support for abs, ceil, erf, floor, log ops and refactor unary tests
- #17566 - [ONNX] Add prim expression support to Neg converter and update Arange converter to use relax.op.arange
- #17642 - [ONNX] Replace topi.split with relax.op.split in the ONNX frontend
- #17674 - [KVCache] PagedKVCache refactor, FlashInfer JIT and MLA integration
- #17618 - [KVCache] TIR attention kernel support for MLA
- #17615 - [KVCache] Add KV Cache for CPU Runtime
- #17616 - [Runtime][KVCache] Initial interface setup for MLA
- #17782 - [Frontend] Support max/min in frontend op interface
- #17758 - Allow ingesting tensor.chunk() from exported torch program
- #17781 - Enable bfloat16 for softmax struct-info inference
- #17752 - Batch norm correctness on eval mode
- #17774 - check for tensor_meta in exported_program_translator
- #17757 - Tensor.split with uneven tensors
- #17749 - Move TIR backend to gpu_generic
- #17725 - Ingest Tensor.clamp from torch export
- #17724 - Add support to ingest Tensor.expand_as()
- #17723 - Add torch exported program ingestion capability for Tensor.detach(), Tensor.copy_, and aten.lift_fresh_copy
- #17721 - Allow ingesting Upsample module from torch.export either using Size or Scale Factor argument
- #17722 - Allow ingesting vector_norm from torch.export
- #17728 - ingest Tensor.contiguous from torch export
- #17700 - Fix tree attention for Qwen2-1.5 models
- #17682 - Add support for func attr inheritance in SplitLayoutRewritePreproc
- #17654 - [BYOC] OpenCLML offload support for Relax
- #17633 - Pipeline file reorganization
- #17626 - Initial setup of relax backend pipeline
- #17568 - [PASS] Convert layout pass and ops enhanced to support sub indexing
Runtime
- #17614 - [CLML] Profiling options enabled for CLML
- #17570 - [OPENCL] Bugfix
TIR
- #17799 - Fix reduce buffer allocation position
- #17783 - [REFACTOR] Remove legacy tir::any
- #17706 - Minor fix for default GPU schedule
- #17579 - [SoftwarePipeline] Ensure pipeline epilogue and prologue do not overlap
- #17584 - [LoopPartition] enforcement on loop partition control
TVMC
- #17606 - Bug fix
cuda & cutlass & tensorrt
- #17789 - [CUTLASS] Add blockwise scale gemm/bmm kernels
- #17741 - [Codegen][CUDA] Fix codegen of cast among vector bfloat16, fp8 and fp4
- #17708 - [CUDA] FP4 cast and reinterpret support
- #17639 - [CUDA] Remove htanh from unsupported math ops for CUDA 12.8
- #16950 - [Codegen, CUDA] Add FP8 Tensor Core Codegen
web
- #17695 - [WASM] Update wasm include in accordance to kv cache revamp
Misc
- #17796 - [Cublas] Added support for bfloat16 while dispatching to cublas kernels
- #17763 - [Flashinfer] Added jit flow for sampling kernel
- #17811 - [NFC] Fix `explict` typo
- #17780 - [3rdparty] Enable bfloat16 for custom allreduce kernel
- #17784 - [REFACTOR] Phase out StackVM
- #17750 - BugFix: Relax comment
- #17748 - [Codegen] Support codegen for vectorized tir.ShuffleNode
- #17743 - Fix: Change variable i to x in split operation in cross_compilation_and_rpc.py
- #17730 - [Attention] Added caching for flashinfer binaries during JIT
- #17733 - [Refactor] Clean up Relay references in the codebase
- #17739 - [BF16] Support ndarray.asnumpy() to bfloat16 tensor natively using ml_dtypes
- #17734 - Remove Google Analytics
- #17731 - [IR] Compact Functor vtable
- #17736 - Fix typos in comments and strings
- #17670 - [DataType] BF16 Support
- #17727 - [FFI] Fix dynamic FFI index to ensure compatibility
- #17718 - [Refactor] Migrate build API to `tvm.compile`
- #17714 - [FFI] Phase out ctypes fallback in favor of cython
- #17716 - Fix the get_target_compute_version for sm >= 100
- #17710 - [Refactor] Introduce base Executable class and `tvm.compile` interface
- #17713 - [REFACTOR] Cleanup legacy relay runtime data structures
- #17712 - [DataType] Rename FP8 dtypes to standard names
- #17703 - Fix typos in multiple files
- #17693 - updated the assert in BindParams to allow tvm.relax.Constant
- #17701 - [Refactor] Remove legacy TE schedule tag
- #17683 - [MSC] Remove relay
- #17688 - Fix relax.ccl.scatter_from_worker0 assert
- #17630 - [Codegen] FP4 support
- #17685 - [REFACTOR] Cleanup legacy TE-based passes
- #17681 - [REFACTOR] Followup cleanup of relay phase out
- #17678 - Bump 3rdparty/cutlass_fpA_intB_gemm
- #17669 - [REFACTOR] Allow target dependent default tir pipeline dispatch in tir.build()
- #17665 - [REFACTOR] move build flow from C++ to Python
- #17624 - Added support for normal MLA kernel
- #17641 - Pick up vector length from 'zvlXXXb' (RVV) mattr for riscv
- #17666 - [Refactor] Improve TargetHasSVE function with optional target handling
- #17661 - [Refactor] Phase out python dependency `decorator`
- #17662 - [REFACTOR] Phase out te.Schedule c++ components
- #17660 - [REFACTOR] Phase out relay c++ components
- #17655 - Upgrading onnx and onnxrt versions
- #17657 - Update argument order for relax.op.pad to make it round-trippable
- #17658 - [REFACTOR] Phase out te.schedule python components
- #17653 - Update images to 20250214-034537-bd1411f8
- #17656 - [REFACTOR] Phase out relay python components
- #17649 - [Refactor] Phase out python dependency attrs
- #17644 - Bump rollup from 2.79.1 to 2.79.2 in /web
- #17637 - [PYTHON] Build cython by default
- #17631 - Handle vector width (VLEN) for RISCV arches
- #17613 - Bug Fix: Removed unused code
- #17585 - [Relay] Disable InferType if it was done and no changes after previous pass
- #17605 - [Refactor] Phase out legacy example apps
- #17603 - [Refactor] Phase out legacy docs
- #17513 - [GRAPH RT] Additional API support