Boosting Matrix Multiplication Speed and Flexibility with NVIDIA cuBLAS 12.9 | NVIDIA Technical Blog

Harun Bayraktar’s Post

Harun Bayraktar

5mo

#CUDA 12.9 is out, and so is BF16 Tensor Core-accelerated single-precision (FP32) matrix multiplication (GEMM), which delivers a 3X speed-up on Blackwell GPUs. To learn more about this and other additions, such as the new block-scaled FP4 and FP8 formats, check out our latest blog: https://lnkd.in/gsHCWtjv