Courses at University of Oxford

• Hilary Term 2021: C21 Dynamic Programming and Reinforcement Learning

Syllabus: The dynamic programming framework: states, actions, transitions, costs. Modeling dynamic decision making problems under uncertainty, such as shortest path problems. Definition of a policy and the value/cost-to-go function of a policy. Bellman’s principle of optimality and the dynamic programming algorithm. Infinite horizon problems, stationary policies, and the Bellman equation. Algorithms for solving infinite horizon problems: policy iteration and value iteration. Solving dynamic programming problems from data: introduction to reinforcement learning approaches. Learning value functions and learning policies.
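
The value iteration algorithm from the syllabus can be sketched as follows. This is a minimal illustration, assuming NumPy; the 2-state, 2-action MDP (matrices P and c) is a made-up example, not taken from the course notes.

```python
import numpy as np

# Illustrative discounted infinite-horizon MDP (numbers are made up).
P = np.array([                      # P[a, s, s'] transition probabilities
    [[0.9, 0.1], [0.2, 0.8]],       # action 0
    [[0.5, 0.5], [0.0, 1.0]],       # action 1
])
c = np.array([                      # c[a, s] stage costs
    [1.0, 2.0],
    [0.5, 3.0],
])
gamma = 0.9                         # discount factor

# Value iteration: repeatedly apply the Bellman operator
#   (TV)(s) = min_a [ c(s, a) + gamma * sum_s' P(s' | s, a) V(s') ]
V = np.zeros(2)
for _ in range(1000):
    Q = c + gamma * P @ V           # Q[a, s]: cost-to-go of taking a in s
    V_new = Q.min(axis=0)
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new

policy = Q.argmin(axis=0)           # greedy (optimal) stationary policy
```

At convergence V satisfies the Bellman equation, and the greedy policy with respect to V is optimal.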

• Hilary Term 2021: C20 Robust Control

Syllabus: Analysis of dynamical systems using Lyapunov functions. Stability analysis by linear matrix inequalities (LMIs). Performance measures for systems with disturbances. Performance analysis by LMIs. Controller synthesis by semidefinite programming (SDP). The Schur complement. H2 optimal control. The linear quadratic regulator (LQR) and the Riccati equation. Stability of systems with uncertain dynamics by LMIs. Stability of systems with nonlinearities using the S-procedure.
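
The LQR/Riccati topic can be illustrated by iterating the discrete-time Riccati recursion to a fixed point. A sketch assuming NumPy; the system matrices A and B are an arbitrary double-integrator-like example, not from the course.

```python
import numpy as np

# Illustrative discrete-time system (numbers are made up).
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])          # state dynamics
B = np.array([[0.0],
              [0.1]])               # input matrix
Q = np.eye(2)                       # state cost weight
R = np.eye(1)                       # input cost weight

# Riccati recursion:
#   P <- Q + A'PA - A'PB (R + B'PB)^{-1} B'PA
P = Q.copy()
for _ in range(500):
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P_new = Q + A.T @ P @ (A - B @ K)
    if np.max(np.abs(P_new - P)) < 1e-12:
        break
    P = P_new

# Optimal state-feedback gain: u = -K x
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
```

The fixed point P solves the discrete algebraic Riccati equation, and the resulting closed loop A - BK is stable for this stabilizable, detectable pair.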

• Hilary Term 2020: C20 Robust Control

Courses at University of Pennsylvania

• Spring 2019: ESE 680 Safe Learning for Control

With Manfred Morari and George J. Pappas

This advanced topics course will expose students to new research problems in the application of data-driven learning methods to control systems and cyber-physical systems. In the first part of the course, basic theory and tools will be introduced, including but not limited to system identification, adaptive control, reinforcement learning, safe learning, and formal methods. In the second part, students will each lead a presentation on a paper from a selected list of recent publications. Student evaluation will be based on the paper presentation as well as a project of the student's choice. Students may choose a topic of interest for the project, for example implementing and evaluating the algorithmic tools discussed in class, extending the theoretical results of the presented papers, or applying the tools to their own research problems.

List of topics:
System identification, least squares, and persistence of excitation condition
Gaussian processes
Approximate dynamic programming/Reinforcement learning, Value function approximation, and Policy gradients
Learning for model-predictive control
Scenario-based control
Finite sample analysis and regret bounds for control of linear systems
Safe learning for control
Safe learning with formal methods
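
The first topic above, least-squares system identification, can be sketched in a few lines. This is an illustrative example assuming NumPy: a made-up linear system x_{t+1} = A x_t + B u_t + w_t is simulated with random (persistently exciting) inputs, and (A, B) are recovered by ordinary least squares.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up ground-truth system to identify.
A_true = np.array([[0.9, 0.2],
                   [0.0, 0.8]])
B_true = np.array([[0.0],
                   [1.0]])

# Simulate a trajectory under random inputs (persistence of excitation).
T = 500
X = np.zeros((T + 1, 2))
U = rng.normal(size=(T, 1))
for t in range(T):
    w = 0.01 * rng.normal(size=2)           # small process noise
    X[t + 1] = A_true @ X[t] + B_true @ U[t] + w

# Stack regressors z_t = [x_t; u_t] and solve
#   min_theta || X_next - Z theta ||_F,  where theta' = [A B].
Z = np.hstack([X[:-1], U])                  # shape (T, 3)
theta, *_ = np.linalg.lstsq(Z, X[1:], rcond=None)
A_hat = theta[:2].T                         # estimated dynamics matrix
B_hat = theta[2:].T                         # estimated input matrix
```

With exciting inputs and low noise, the estimates converge to the true matrices as the trajectory length grows.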

• Spring 2018: ESE 680 Learning for Control

With Manfred Morari and George J. Pappas

• Fall 2017: ESE 680 Dynamic Programming and Optimal Control

The course will describe the foundations of dynamic decision making under uncertainty. In this setting, we observe the state of an environment (system) over time and must take a decision at each time step in an effort to steer the state at future time steps. The evolution is subject to uncertainty, i.e., stochastic noise, and incurs a cost at each step; we are interested in minimizing the cost accumulated over time. The optimal solution therefore involves accounting for expected future system behavior and planning how to react in a closed-loop manner. Applications include traditional control, robotics, dynamic resource allocation, etc.

List of topics:
Finite Horizon Markov Decision Process
Principle of Optimality and Dynamic Programming Algorithm
Linear Quadratic Problems. Riccati Equation. Certainty equivalence.
Limited lookahead and rollout approaches.
Discounted Infinite Horizon Markov Decision Process
Bellman equation
Value and Policy Iteration algorithms. Contraction properties
Approximate Dynamic Programming and Reinforcement Learning, Value function approximation, and Policy gradients
Simulation-based Algorithms. TD-learning, Q-learning and convergence properties. Exploration
Imperfect State Information and POMDPS
Linear Quadratic Gaussian Control. Separation Principle
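
The simulation-based algorithms above can be illustrated with tabular Q-learning under epsilon-greedy exploration. A minimal sketch assuming NumPy; the 2-state, 2-action MDP and all hyperparameters are made-up illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative MDP (numbers are made up).
P = np.array([                      # P[a, s, s'] transition probabilities
    [[0.9, 0.1], [0.2, 0.8]],       # action 0
    [[0.5, 0.5], [0.0, 1.0]],       # action 1
])
c = np.array([                      # c[a, s] stage costs
    [1.0, 2.0],
    [0.5, 3.0],
])
gamma, alpha, eps = 0.9, 0.1, 0.2   # discount, step size, exploration rate

Q = np.zeros((2, 2))                # Q[s, a] estimates
s = 0
for _ in range(50_000):
    # Epsilon-greedy exploration: mostly greedy, sometimes random.
    a = rng.integers(2) if rng.random() < eps else int(Q[s].argmin())
    s_next = rng.choice(2, p=P[a, s])
    # TD update toward the sampled one-step Bellman target.
    target = c[a, s] + gamma * Q[s_next].min()
    Q[s, a] += alpha * (target - Q[s, a])
    s = s_next
```

Note that the learner only uses sampled transitions and costs, never the matrices P and c directly; with a diminishing step size the iterates converge to the optimal Q-function.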