Min

05 Sep 2023

The high-level approach to optimization we usually think about in deep learning is Taylor approximation. We look at the gradients (1st order), maybe approximate the Hessian (2nd order), and use these to descend on the loss surface.
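
For concreteness (the notation here is mine, not the post's), the second-order Taylor model of the loss $\mathcal{L}$ around the current parameters $\theta_t$ is

$$
\mathcal{L}(\theta) \;\approx\; \mathcal{L}(\theta_t) + \nabla \mathcal{L}(\theta_t)^\top (\theta - \theta_t) + \tfrac{1}{2}\, (\theta - \theta_t)^\top H_t \,(\theta - \theta_t),
$$

where $H_t$ is the Hessian (or an approximation to it). Descending on the first-order part with a step-size penalty gives the gradient step $\theta_{t+1} = \theta_t - \eta \nabla \mathcal{L}(\theta_t)$; keeping the Hessian term and minimizing exactly gives the Newton step $\theta_{t+1} = \theta_t - H_t^{-1} \nabla \mathcal{L}(\theta_t)$.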

An alternative approach is majorization-minimization (MM). The idea is to construct an upper bound on the loss that is tight at the current point, and then move to a point that lowers this upper bound; because the bound touches the loss at the current point, any decrease in the bound guarantees a decrease in the loss itself. Think Expectation Maximization, K-means, variational lower bounds, etc.
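
A minimal sketch of one classic instance of this idea (the problem and function names below are illustrative, not from the post): for an $L$-smooth loss $f$, the quadratic $g(x; x_t) = f(x_t) + \nabla f(x_t)^\top (x - x_t) + \tfrac{L}{2}\|x - x_t\|^2$ upper-bounds $f$, is tight at $x_t$, and minimizing it in closed form recovers a gradient step of size $1/L$. So plain gradient descent with that step size is itself an MM algorithm, and the loss decreases monotonically.

```python
import numpy as np

def mm_minimize(grad, L, x0, n_steps=100):
    """Majorization-minimization with the L-smooth quadratic surrogate.

    At each step we minimize g(x; x_t) = f(x_t) + grad(x_t)^T (x - x_t)
    + (L/2) ||x - x_t||^2, whose minimizer is x_t - grad(x_t) / L.
    (This helper is a sketch for illustration, not the post's code.)
    """
    x = x0
    for _ in range(n_steps):
        x = x - grad(x) / L  # closed-form minimizer of the surrogate
    return x

# Example: least squares f(x) = ||A x - b||^2, with gradient 2 A^T (A x - b)
# and smoothness constant L = 2 * lambda_max(A^T A).
rng = np.random.default_rng(0)
A, b = rng.normal(size=(20, 5)), rng.normal(size=20)
f = lambda x: np.sum((A @ x - b) ** 2)
grad = lambda x: 2 * A.T @ (A @ x - b)
L = 2 * np.linalg.eigvalsh(A.T @ A).max()

x_star = mm_minimize(grad, L, np.zeros(5), n_steps=500)
print(f(np.zeros(5)), f(x_star))  # the loss can only go down under MM
```

EM and K-means follow the same template, just with different surrogates: EM majorizes the negative log-likelihood via Jensen's inequality, and K-means majorizes the clustering objective by fixing the current assignments.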