-
FlashAttention-2 in Triton: From GPU Mental Models to Kernel Performance
Implementing FlashAttention-2 in Triton, with a practical intro to GPUs and NVIDIA Nsight Systems.
-
Deriving the FlashAttention Backward Pass
This supplementary note provides a complete mathematical derivation of the FlashAttention backward pass.
-
Attention at Inference: Arithmetic Intensity & KV Cache
Why MHA is memory-bound and how KV cache tricks (MQA, GQA, MLA) fix it.
-
LLM from Scratch: Building a TinyGPT That Works
Transformer components, variants, and implementation details.
-
Gradient-Based Optimization: Theory, Practice, and Evolution
SGD, momentum, AdaGrad, Adam, and beyond.