Harun Bayraktar’s Post

#CUDA 12.9 is out, and so is BF16 Tensor Core-accelerated single-precision (FP32) matrix multiplication (GEMM), which delivers a 3X speed-up on Blackwell GPUs. To learn more about this and other features, like the new block-scaled FP4 and FP8 formats, check out our latest blog: https://lnkd.in/gsHCWtjv

Amit Kumar

NVIDIA | Thought Leadership | Multi-Agentic-Systems Powered by Gen AI | Stanford University | IIT Guwahati | Architecting Generative AI & LLMs across the Industries | Seeker-builder

5mo

Love this

Ditching tf32

Johnny Núñez Cano

Developer Advocate | PhD student in Computer Vision | Generative AI & Robotics | AI Creator Content & Educator

5mo

and the newer ISA?
