**Courses at University of Oxford**

### • Hilary Term 2021: C21 Dynamic Programming and Reinforcement Learning

__Syllabus__:
The dynamic programming framework: states, actions, transitions, costs. Modeling dynamic decision making problems under uncertainty, such as shortest path problems. Definition of a policy and the value/cost-to-go function of a policy. Bellman's principle of optimality and the dynamic programming algorithm. Infinite horizon problems, stationary policies, and the Bellman equation. Algorithms for solving infinite horizon problems: policy iteration and value iteration. Solving dynamic programming problems from data: introduction to reinforcement learning approaches. Learning value functions and learning policies.
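As a small illustration of the value iteration algorithm mentioned above, the following sketch repeatedly applies the Bellman backup to a randomly generated discounted MDP (the states, costs, and transition probabilities are illustrative placeholders, not course material):

```python
import numpy as np

# Value iteration for a small infinite-horizon discounted MDP.
# The MDP below (P, cost) is randomly generated for illustration.
n_states, n_actions, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)
P = rng.random((n_actions, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)          # rows are probability distributions
cost = rng.random((n_actions, n_states))   # stage cost c(s, a)

V = np.zeros(n_states)
for _ in range(500):
    # Bellman backup: V(s) = min_a [ c(s,a) + gamma * sum_s' P(s'|s,a) V(s') ]
    Q = cost + gamma * (P @ V)             # shape (n_actions, n_states)
    V_new = Q.min(axis=0)
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new

policy = Q.argmin(axis=0)                  # greedy policy from converged values
```

Because the backup is a contraction in the sup-norm (with modulus gamma), the iterates converge geometrically to the unique fixed point of the Bellman equation.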

### • Hilary Term 2021: C20 Robust Control

__Syllabus__:
Analysis of dynamical systems using Lyapunov functions. Stability analysis by linear matrix inequalities (LMIs). Performance measures for systems with disturbances. Performance analysis by LMIs. Controller synthesis by semidefinite programming (SDP). The Schur complement. H2 optimal control. The linear quadratic regulator (LQR) and the Riccati equation. Stability of systems with uncertain dynamics by LMIs. Stability of systems with nonlinearities using the S-procedure.
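The connection between the LQR and the Riccati equation can be sketched via the backward Riccati recursion; the dynamics (A, B), weights (Q, R), and horizon below are illustrative placeholders, not taken from the course:

```python
import numpy as np

# Finite-horizon discrete-time LQR via the backward Riccati recursion.
# (A, B) is a double integrator; Q, R, and the horizon N are placeholders.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)                             # state cost weight
R = np.array([[1.0]])                     # input cost weight
N = 50                                    # horizon length

P = Q.copy()                              # terminal cost-to-go P_N = Q
for _ in range(N):
    # Riccati step: P = Q + A'PA - A'PB (R + B'PB)^{-1} B'PA
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)  # feedback gain
    P = Q + A.T @ P @ (A - B @ K)

# u = -K x is the optimal state-feedback law; for long horizons the
# recursion approaches the stabilizing solution of the algebraic
# Riccati equation, so A - BK has eigenvalues inside the unit circle.
```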

### • Hilary Term 2020: C20 Robust Control

**Courses at University of Pennsylvania**

### • Spring 2019: ESE 680 Safe Learning for Control

With Manfred Morari and George J. Pappas

This advanced topics course will expose students to new research problems in the application of data-driven learning methods to control systems and cyber-physical systems. The first part of the course introduces basic theory and tools, including but not limited to system identification, adaptive control, reinforcement learning, safe learning, and formal methods. In the second part, students will lead presentations of papers from a selected list of recent publications. Student evaluation will be based on the paper presentation as well as a project of the student's choice. Students may choose their own topic of interest for the project: for example, implementing and evaluating the algorithmic tools discussed in class, extending theoretical results from the presented papers, or applying the tools to their own research problem.

__List of topics__:

System identification, least squares, and persistence of excitation condition

Gaussian processes

Approximate dynamic programming/Reinforcement learning, Value function approximation, and Policy gradients

Learning for model-predictive control

Scenario-based control

Finite sample analysis and regret bounds for control of linear systems

Safe learning for control

Safe learning with formal methods
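To make the first topic above concrete, here is a minimal sketch of least-squares system identification for a scalar linear system; the true parameters, noise level, and input signal are invented for illustration:

```python
import numpy as np

# Least-squares identification of x_{t+1} = a x_t + b u_t + noise.
# a_true, b_true, the noise level, and the input are placeholders.
rng = np.random.default_rng(1)
a_true, b_true, T = 0.8, 0.5, 200
u = rng.standard_normal(T)                # a rich (persistently exciting) input
x = np.zeros(T + 1)
for t in range(T):
    x[t + 1] = a_true * x[t] + b_true * u[t] + 0.01 * rng.standard_normal()

# Stack regressors [x_t, u_t] and solve min_theta ||X theta - y||^2
X = np.column_stack([x[:-1], u])
y = x[1:]
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
a_hat, b_hat = theta
```

Persistence of excitation is what makes the regressor matrix well conditioned here: a white-noise input excites the system enough that X has full column rank, so the least-squares estimate is unique and converges to the true parameters as T grows.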

### • Spring 2018: ESE 680 Learning for Control

With Manfred Morari and George J. Pappas

### • Fall 2017: ESE 680 Dynamic Programming and Optimal Control

The course will describe the foundations of dynamic decision making under uncertainty. In this setup we observe the state of an environment (system) over time and must take a decision at each time step in an effort to steer the state at future time steps. The evolution of the state is subject to uncertainty, i.e., stochastic noise, and each decision incurs a cost; we are interested in minimizing the cost accumulated over time. Hence, the optimal solution involves accounting for expected future system behavior and planning how to react in a closed-loop manner. Applications include traditional control, robotics, dynamic resource allocation, and more.

__List of topics__:

Finite Horizon Markov Decision Process

Principle of Optimality and Dynamic Programming Algorithm

Linear Quadratic Problems. Riccati Equation. Certainty equivalence.

Limited lookahead and rollout approaches.

Discounted Infinite Horizon Markov Decision Process

Bellman equation

Value and Policy Iteration algorithms. Contraction properties

Approximate Dynamic Programming and Reinforcement Learning, Value function approximation, and Policy gradients

Simulation-based Algorithms. TD-learning, Q-learning and convergence properties. Exploration

Imperfect State Information and POMDPs

Linear Quadratic Gaussian Control. Separation Principle
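A simulation-based algorithm from the list above, tabular Q-learning with epsilon-greedy exploration, can be sketched as follows; the toy two-state MDP, exploration rate, and step-size schedule are illustrative assumptions:

```python
import numpy as np

# Tabular Q-learning on a toy 2-state, 2-action MDP (costs minimized).
# The MDP (P, cost), epsilon, and step sizes are placeholders.
rng = np.random.default_rng(2)
n_states, n_actions, gamma = 2, 2, 0.9
P = np.array([[[0.9, 0.1], [0.1, 0.9]],
              [[0.5, 0.5], [0.8, 0.2]]])   # P[a, s, s']
cost = np.array([[1.0, 0.0], [0.5, 0.5]])  # cost[a, s]

Q = np.zeros((n_states, n_actions))
counts = np.zeros((n_states, n_actions))
s = 0
for _ in range(50_000):
    # epsilon-greedy: explore with probability 0.1, otherwise act greedily
    a = rng.integers(n_actions) if rng.random() < 0.1 else int(Q[s].argmin())
    s_next = rng.choice(n_states, p=P[a, s])
    counts[s, a] += 1
    alpha = 1.0 / counts[s, a] ** 0.6      # diminishing step size
    # TD update toward the one-step Bellman target (model-free)
    target = cost[a, s] + gamma * Q[s_next].min()
    Q[s, a] += alpha * (target - Q[s, a])
    s = s_next
```

Note that the update uses only sampled transitions, never P directly; the diminishing step sizes are the standard condition under which Q-learning converges to the optimal Q-function.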