Control and Reinforcement Learning
Drawing on tools from online learning and improper convex relaxations, our group has been developing new algorithms for control and time-series prediction, including:
- A new framework for robust control called Non-Stochastic Control. It permits control in adversarial environments via a new type of algorithm, the Gradient Perturbation Controller, which also yields the first logarithmic-regret guarantees in online control. Within this framework we can also control unknown systems and partially observed states.
- Combining time series and control algorithms via the new technique of Boosting for Dynamical Systems.
- The Spectral Filtering technique, and its application to asymmetric linear dynamical systems.
- Learning auto-regressive moving-average time series with adversarial noise.
- Maximum-entropy exploration in partially observed and/or approximated Markov Decision Processes.
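To make the spectral filtering idea above concrete, here is a minimal sketch (not the published implementation): rather than learning the hidden state of a linear dynamical system directly, the input history is projected onto the top eigenvectors of a fixed Hankel matrix, and a linear predictor is learned over those features. The history length `T`, number of filters `k`, and the random input are illustrative choices.

```python
import numpy as np

# Fixed Hankel matrix whose entries decay as 2 / (s^3 - s), s = i + j
# (1-indexed); its top eigenvectors serve as the spectral filters.
T = 100  # length of the input history (assumed for illustration)
Z = np.array([[2.0 / ((i + j + 2) ** 3 - (i + j + 2)) for j in range(T)]
              for i in range(T)])

eigvals, eigvecs = np.linalg.eigh(Z)   # ascending eigenvalues
k = 10
filters = eigvecs[:, -k:]              # top-k eigenvectors as filters

u = np.random.randn(T)                 # an arbitrary input sequence
features = filters.T @ u               # k features for a linear predictor
```

The key design point is that the filters depend only on `T`, not on the data, so no system identification is needed before filtering.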
Optimization for Machine Learning
Machine learning moves us from the custom-designed algorithm to generic models, such as neural networks, that are trained by optimization algorithms. Some of the most useful and efficient methods for training convex as well as non-convex models that we have worked on include:
- The AdaGrad algorithm, and the technique of adaptive preconditioning.
- Sublinear optimization algorithms for linear classification, training support vector machines, semidefinite optimization, and other problems.
- Projection-free algorithms for online learning in the context of recommender systems, and the first linearly convergent projection-free algorithm.
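As an illustration of adaptive preconditioning, here is a minimal sketch of diagonal AdaGrad: each coordinate's effective step size shrinks with that coordinate's accumulated squared gradients, so poorly scaled directions are automatically rebalanced. The learning rate, step count, and test objective are illustrative choices, not prescribed values.

```python
import numpy as np

def adagrad(grad_fn, x0, lr=0.5, steps=200, eps=1e-8):
    """Diagonal AdaGrad: per-coordinate adaptive step sizes."""
    x = x0.astype(float).copy()
    g2 = np.zeros_like(x)                  # running sum of squared gradients
    for _ in range(steps):
        g = grad_fn(x)
        g2 += g * g
        x -= lr * g / (np.sqrt(g2) + eps)  # adaptive preconditioning
    return x

# usage: minimize the badly scaled quadratic f(x) = x0^2 + 100 * x1^2;
# plain gradient descent with one step size handles this poorly,
# while AdaGrad equalizes progress across both coordinates.
x_min = adagrad(lambda x: np.array([2.0 * x[0], 200.0 * x[1]]),
                np.array([1.0, 1.0]))
```

Note that both coordinates follow the same trajectory here despite the 100x difference in curvature, which is exactly the effect of the diagonal preconditioner.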
Online Convex Optimization
In recent years, convex optimization and the notion of regret minimization in games have been combined and applied to machine learning in a general framework called online convex optimization. For more information, see the graduate textbook on online convex optimization in machine learning, or the survey on the convex optimization approach to regret minimization. Our research spans efficient online algorithms, matrix prediction algorithms, decision making under uncertainty, and continuous multi-armed bandits.
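The basic algorithm of this framework, online gradient descent, can be sketched in a few lines: at each round the learner plays a point, observes a convex loss, steps against its gradient with a decaying step size, and projects back onto the feasible set. The unit-ball feasible set and linear losses below are illustrative assumptions.

```python
import numpy as np

def ogd(loss_grads, x0, radius=1.0):
    """Online gradient descent over a Euclidean ball of given radius.

    loss_grads: sequence of gradient oracles, one per round.
    Returns the list of points played; with eta_t ~ 1/sqrt(t) this
    guarantees O(sqrt(T)) regret against the best fixed point.
    """
    x = x0.astype(float).copy()
    plays = []
    for t, grad in enumerate(loss_grads, start=1):
        plays.append(x.copy())
        x = x - grad(x) / np.sqrt(t)   # step size eta_t = 1 / sqrt(t)
        norm = np.linalg.norm(x)
        if norm > radius:              # Euclidean projection onto the ball
            x = x * (radius / norm)
    return plays

# usage: an adversary picks linear losses f_t(x) = <c_t, x> each round
rng = np.random.default_rng(0)
costs = [rng.standard_normal(3) for _ in range(50)]
plays = ogd([(lambda x, c=c: c) for c in costs], np.zeros(3))
```

The projection step is exactly what the projection-free algorithms mentioned above avoid, replacing it with cheaper linear optimization over the feasible set.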