We aim to rewrite all Flash Linear Attention Triton kernels in TileLang for better performance.
To install TileLang, Triton (nightly), and FLA:
pip install tilelang==0.1.2
pip install -U --index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/Triton-Nightly/pypi/simple/ triton-nightly
pip uninstall fla && pip install -U git+https://github.com/fla-org/flash-linear-attention
Note for H100 users: a Triton nightly build is required to avoid errors. See issue #196 for details.
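
After running the commands above, it can help to confirm which versions actually ended up in the environment. The following is a minimal sketch using only the Python standard library. The distribution names in `PACKAGES` are assumptions based on the install commands (FLA may be registered as `fla` or `flash-linear-attention` depending on the version), and `installed_versions` is a hypothetical helper, not part of any of these projects.

```python
from importlib import metadata

# Distribution names are assumptions; adjust them to match your environment.
PACKAGES = ("tilelang", "triton-nightly", "triton", "fla", "flash-linear-attention")


def installed_versions(names=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

If `tilelang` does not report 0.1.2, or no Triton nightly distribution shows up on an H100 machine, rerun the corresponding install command before debugging kernel errors.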