@mratsim mratsim commented Aug 4, 2024

🔥 🔥 🔥

This adds an end-to-end LLVM IR -> AMD GPU JIT compiler.
[Image: ctt_amdgpu]

The good news is that AMD GPUs support vectorized add-with-carry. The bad news is that, unlike on Nvidia GPUs, you cannot use inline assembly to guarantee it, so you need to cajole the compiler into producing those instructions.
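To illustrate what "cajoling the compiler" can look like, here is a minimal sketch in plain C, not the code from this PR (which emits LLVM IR directly). It writes a multi-limb addition so that the carry is expressed through a widening 64-bit add, a pattern that backends can often recognize as a carry chain. The function name `bigint_add` and the 32-bit limb layout are assumptions made for illustration, and whether a given AMDGPU toolchain actually lowers this to add-with-carry instructions is not guaranteed; inspect the generated ISA to confirm.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper (not from the PR): adds two little-endian
 * multi-limb integers of n 32-bit limbs each and stores the low
 * n limbs of the sum in r. Returns the final carry-out (0 or 1). */
static uint32_t bigint_add(uint32_t *r, const uint32_t *a,
                           const uint32_t *b, size_t n) {
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* Widen to 64 bits so the carry-out lands in bit 32. */
        uint64_t t = (uint64_t)a[i] + b[i] + carry;
        r[i] = (uint32_t)t;          /* low 32 bits: the sum limb */
        carry = (uint32_t)(t >> 32); /* bit 32: carry into the next limb */
    }
    return carry;
}
```

Because the carry never leaves the arithmetic (no branches, no comparisons against the operands), the optimizer has a fair chance of keeping the whole chain in carry-propagating instructions, but that remains a property of the compiler version, not of the source.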

More good news: the device function is properly vectorized without resorting to tricks like __forceinline__ or "Scalable Vector" types in LLVM.

@mratsim mratsim added the enhancement :shipit: New feature or request label Aug 4, 2024
@mratsim mratsim merged commit 1e34ec2 into master Aug 5, 2024
@mratsim mratsim deleted the amdgpu branch August 5, 2024 05:23