-
FlashAttention-2 in Triton: From GPU Mental Models to Kernel Performance
Implementing FlashAttention-2 in Triton, with a practical intro to GPUs and NVIDIA Nsight Systems.
-
Deriving the FlashAttention Backward Pass
This supplementary note provides a complete mathematical derivation of the FlashAttention backward pass.
-
Attention at Inference: Arithmetic Intensity & KV Cache
Why MHA is memory-bound and how KV cache tricks (MQA, GQA, MLA) fix it.
-
LLM from Scratch: Building a TinyGPT That Works
Transformer components, variants, and implementation details.
-
Gradient-Based Optimization: Theory, Practice, and Evolution
SGD, momentum, AdaGrad, Adam, and beyond.