Pull requests: karpathy/llm.c
Pull requests list
Fix build errors by adding compute capability flags to the makefile
#235
opened Apr 23, 2024 by
PeterZhizhin
Speedup attention_forward_kernel2 by implementing Flash Attention 2 kernel
#60
opened Apr 11, 2024 by
leloykun
Fixed a TODO to calculate the max value neatly and use inv sum trick
#67
opened Apr 11, 2024 by
sirvan3tr
slightly faster gelu on smaller blocksize contexts
#76
opened Apr 11, 2024 by
AndreSlavescu
Use the command 'brew --prefix libomp' to retrieve the location where libomp would be installed on macOS.
#87
opened Apr 12, 2024 by
linmajia
~2x perf improvement beating PyTorch (cublasLt, TF32, CUDA graphs, kernel fusion, etc…)
#89
opened Apr 12, 2024 by
ademeure
Updated a few variables to use exact width integer types
#188
opened Apr 19, 2024 by
jonathanmarvens
feat(attention_forward.cu): Gentle introduction to CuTe(cutlass)
#233
opened Apr 23, 2024 by
FeSens
gelu_backwards cuda dev file and float4 dtype for parallel memory read
#241
opened Apr 24, 2024 by
ChrisDryden
•
Draft
Rewrite the encoder_forward float4 kernel with pack128
#302
opened Apr 30, 2024 by
lancerts
Added FlameGraphs for nsys reports and some nsys documentation
#333
opened May 2, 2024 by
PeterZhizhin