**Warning: drink ☕ before reading!**

Gradient descent is the grandfather of all optimization algorithms. The fundamental insight is fairly straightforward and falls directly out of calculus (finding the slope at a point on a curve), but there are many different algorithmic implementations, each with its own trade-offs.
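To make that insight concrete, here's a minimal sketch of the vanilla version: compute the slope at the current point and repeatedly step in the opposite (downhill) direction. The objective `f(x) = (x - 3)^2` and the parameter names are illustrative choices, not anything from the primer itself.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step opposite the gradient (the slope at the current point)."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad(x)  # move downhill along the slope
    return x

# Minimize f(x) = (x - 3)^2, whose derivative is f'(x) = 2 * (x - 3).
minimum = gradient_descent(grad=lambda x: 2 * (x - 3), x0=0.0)
print(round(minimum, 4))  # → 3.0, the minimizer of f
```

The learning rate is where the trade-offs start: too small and convergence crawls, too large and the iterates overshoot or diverge — which is exactly the design space the variants explore.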

This is an important topic, and I’ve never seen a better primer.