Lecture notes: optimization for ML

Spring semester is over, yay! To celebrate summer, I’ve compiled lecture notes from the graduate course COS 598D, a.k.a. “optimization for machine learning“.

The course is an aftermath of a few lectures and summer school tutorials given in various locations, in which  lectures goal of the course was to present the most useful methods and ideas in a rigorous-but-not-tedious way:

  • Suitable for a mathematically-prepared undergraduate student, and/or researcher beginning their endeavor into machine learning.
  • Focus on the important: we start with stochastic gradient descent for training deep neural networks.
  • Bring out the main ideas: i.e. projection-free methods, second-order optimization, the three main acceleration techniques, qualitatively discuss the advantages and applicability of each.
  • *short* proofs, that everyone can follow and extend in their future research. I prefer being off by a constant/log factor from the best known bound with a more insightful proof.
  • Not loose track of the goal: generalization/regret, rather than final accuracy. This means we talked about online learning, generalization and precision issues in ML.

The most recent version can be downloaded here:

This is still work in progress, please feel free to send me typos/corrections, as well as other topics you’d like to see (on my todos already: lower bounds, quasi-convexity, and the homotopy method).

Note: zero-order/bandit optimization is an obvious topic that’s not address. The reason is purely subjective – it appears as a chapter in this textbook (that also started as lecture notes!).