8:45 - 8:50 Opening remarks by organizers
8:50 - 10:00 Session 1: Reinforcement Learning for Control - Chair: Pramod P. Khargonekar
8:50 - 9:25 Reinforcement Learning and Optimal Control: An Overview -
Dimitri P. Bertsekas (Massachusetts Institute of Technology)
(link for book and slides)
Abstract: We discuss a new aggregation framework for approximate dynamic programming, which provides a connection with rollout algorithms, model predictive control, approximate policy iteration, and other single and multistep lookahead methods. The central novel characteristic is the use of a scoring function V of the state, which biases the values of the aggregate cost function towards their correct levels. Different choices for V yield a variety of interesting methods (the classical aggregation framework is obtained when V=0). When V is the cost function of some known policy, our scheme is equivalent to enhanced forms of the rollout algorithm and model predictive control. More generally, our scheme is equivalent to approximation in value space with lookahead function equal to V plus local corrections that are constant within each aggregate state. It can yield an arbitrarily close approximation to the optimal cost function, assuming a sufficiently large number of aggregate states are used.
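In symbols (our schematic reading of the abstract, not notation from the talk), the scheme performs approximation in value space with a lookahead function of the form

\[ \tilde{J}(x) = V(x) + r_{a(x)}, \]

where a(x) is the aggregate state containing x and the correction term r_{a(x)} is constant within each aggregate state. Setting V = 0 recovers classical aggregation, while taking V to be the cost function of a known policy yields the enhanced rollout interpretation mentioned above.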
References (available from the author’s website):
D. P. Bertsekas, "REINFORCEMENT LEARNING AND OPTIMAL CONTROL
," New Book preprint
D. P. Bertsekas, "Biased Aggregation, Rollout, and Enhanced Policy Improvement for Reinforcement Learning," Lab. for Information and Decision Systems Report, MIT, October 2018.
D. P. Bertsekas, "Feature-Based Aggregation and Deep Reinforcement Learning: A Survey and Some New Implementations," Lab. for Information and Decision Systems Report, MIT, April 2018; a version to appear in IEEE/CAA Journal of Automatica Sinica.
Bio: Dr. Bertsekas has held faculty positions at several universities, including Stanford University (1971-1974) and the University of Illinois, Urbana (1974-1979). Since 1979 he has been teaching at the Electrical Engineering and Computer Science Department of the Massachusetts Institute of Technology, where he is currently McAfee Professor of Engineering.
Professor Bertsekas was awarded the INFORMS 1997 Prize for Research Excellence in the Interface Between Operations Research and Computer Science for his book "Neuro-Dynamic Programming" (co-authored with John Tsitsiklis), the 2000 Greek National Award for Operations Research, the 2001 ACC John R. Ragazzini Education Award, the 2009 INFORMS Expository Writing Award, the 2014 ACC Richard E. Bellman Control Heritage Award, the 2014 INFORMS Khachiyan Prize, the 2015 SIAM/MOS George B. Dantzig Prize, and the 2018 INFORMS John von Neumann Theory Prize (jointly with John Tsitsiklis) for the contributions of the research monographs "Parallel and Distributed Computation" and "Neuro-Dynamic Programming". In 2001, he was elected to the United States National Academy of Engineering.
Dr. Bertsekas' recent books are "Convex Optimization Algorithms" (2015), "Nonlinear Programming" (3rd edition, 2016), "Dynamic Programming and Optimal Control" (4th edition, 2017), and "Abstract Dynamic Programming" (2nd edition, 2018), all published by Athena Scientific.
9:25 - 10:00 Reinforcement Learning Structures for Real-Time Optimal Control and Differential Games -
Frank L. Lewis (University of Texas at Arlington)
(link for slides)
Abstract: This talk will discuss some new adaptive control structures for learning online the solutions to optimal control problems and multi-player differential games. Techniques from reinforcement learning are used to design a new family of adaptive controllers based on actor-critic mechanisms that converge in real time to optimal control and game theoretic solutions. Continuous-time systems are considered. Application of reinforcement learning to continuous-time (CT) systems has been hampered because the system Hamiltonian contains the full system dynamics. Using our technique known as Integral Reinforcement Learning (IRL), we will develop reinforcement learning methods that do not require knowledge of the system drift dynamics. In the linear quadratic (LQ) case, the new RL adaptive control algorithms learn the solution to the Riccati equation by adaptation along the system motion trajectories. In the case of nonlinear systems with general performance measures, the algorithms learn the (approximate smooth local) solutions of HJ or HJI equations. New algorithms will be presented for solving online the nonzero-sum and zero-sum multi-player games. Each player maintains two adaptive learning structures, a critic network and an actor network. The result is an adaptive control system that learns from the interplay of agents in a game to deliver true online gaming behavior. A new Experience Replay technique is given that uses past data for present learning and significantly speeds up convergence. New methods of off-policy learning allow learning of optimal solutions without knowing any dynamic information. New RL methods in optimal tracking allow solution of the output regulator equations for heterogeneous multi-agent systems.
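For readers unfamiliar with IRL, the identity underlying it (standard in the literature; notation ours) replaces the differential Hamilton-Jacobi-Bellman condition with an integral Bellman equation over a reinforcement interval T,

\[ V\bigl(x(t)\bigr) = \int_t^{t+T} r\bigl(x(\tau), u(\tau)\bigr)\, d\tau + V\bigl(x(t+T)\bigr), \]

which can be evaluated from measured trajectory data alone and therefore does not require knowledge of the system drift dynamics.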
Bio: Member, National Academy of Inventors. Fellow IEEE, Fellow IFAC, Fellow AAAS, Fellow U.K. Institute of Measurement & Control, PE Texas, U.K. Chartered Engineer. UTA Distinguished Scholar Professor, UTA Distinguished Teaching Professor, and Moncrief-O’Donnell Chair at The University of Texas at Arlington Research Institute. Qian Ren Thousand Talents Consulting Professor, Northeastern University, Shenyang, China. Foreign Expert Scholar, Huazhong University of Science and Technology. IEEE Control Systems Society Distinguished Lecturer. Bachelor's Degree in Physics/EE and MSEE at Rice University, MS in Aeronautical Engineering at Univ. W. Florida, Ph.D. at Ga. Tech. He works in feedback control, reinforcement learning, intelligent systems, and distributed control systems. He is author of 7 U.S. patents, 384 journal papers, 426 conference papers, 20 books, 48 chapters, and 12 journal special issues. He received the Fulbright Research Award, NSF Research Initiation Grant, ASEE Terman Award, Int. Neural Network Soc. Gabor Award 2009, U.K. Inst. Measurement & Control Honeywell Field Engineering Medal 2009. Received IEEE Computational Intelligence Society Neural Networks Pioneer Award 2012 and AIAA Intelligent Systems Award 2016. Distinguished Foreign Scholar at Nanjing Univ. Science & Technology. Project 111 Professor at Northeastern University, China. Distinguished Foreign Scholar at Chongqing Univ. China. Received Outstanding Service Award from Dallas IEEE Section, selected as Engineer of the Year by Ft. Worth IEEE Section. Listed in Ft. Worth Business Press Top 200 Leaders in Manufacturing. Received the 2010 IEEE Region 5 Outstanding Engineering Educator Award and the 2010 UTA Graduate Dean’s Excellence in Doctoral Mentoring Award. Elected to UTA Academy of Distinguished Teachers 2012. Texas Regents Outstanding Teaching Award 2013. He served on the NAE Committee on Space Station in 1995.
10:00 - 10:30 Coffee Break
10:30 - 11:05 Session 1 continues
The Merits of Models in Continuous Reinforcement Learning -
Benjamin Recht (University of California, Berkeley)
(link for slides)
Abstract: Classical control theory and machine learning have similar goals: acquire data about the environment, perform a prediction, and use that prediction to positively impact the world. However, the approaches they use are frequently at odds. Controls is the theory of designing complex actions from well-specified models, while machine learning makes intricate, model-free predictions from data alone. For contemporary autonomous systems, some sort of hybrid may be essential in order to fuse and process the vast amounts of sensor data recorded into timely, agile, and safe decisions.
In this talk, I will examine the relative merits of model-based and model-free methods in data-driven control problems. I will discuss quantitative estimates on the number of measurements required to achieve high-quality control performance, and statistical techniques that can distinguish the relative power of different methods. I will also describe how notions of robustness, safety, constraint satisfaction, and exploration can be transparently incorporated into model-based methods. In light of these results, it remains unclear what model-free methods have to offer, given their high sample complexity and lack of reliability and versatility.
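As a concrete illustration of the model-based pipeline discussed here, the following sketch (our toy instance, not code from the talk) estimates the dynamics by least squares and then applies certainty-equivalent LQR:

    import numpy as np
    from scipy.linalg import solve_discrete_are

    rng = np.random.default_rng(0)

    # True system x_{t+1} = A x_t + B u_t + w_t (unknown to the learner;
    # used here only to generate data).
    A_true = np.array([[0.9, 0.2], [0.0, 0.9]])
    B_true = np.array([[1.0], [0.5]])
    n, m = B_true.shape

    # Collect a trajectory under random excitation.
    X, U, Xn = [], [], []
    x = np.zeros(n)
    for _ in range(500):
        u = rng.normal(size=m)
        x_next = A_true @ x + B_true @ u + 0.01 * rng.normal(size=n)
        X.append(x); U.append(u); Xn.append(x_next)
        x = x_next

    # Least-squares model fit: x_next ~ [A_hat B_hat] [x; u].
    Z = np.hstack([np.array(X), np.array(U)])
    Theta, *_ = np.linalg.lstsq(Z, np.array(Xn), rcond=None)
    A_hat, B_hat = Theta.T[:, :n], Theta.T[:, n:]

    # Certainty-equivalent control: solve the Riccati equation for the
    # estimated model and play the resulting static gain u = -K x.
    Q, R = np.eye(n), np.eye(m)
    P = solve_discrete_are(A_hat, B_hat, Q, R)
    K = np.linalg.solve(R + B_hat.T @ P @ B_hat, B_hat.T @ P @ A_hat)
    print("estimated LQR gain:\n", K)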
Bio: Benjamin Recht is an Associate Professor in the Department of Electrical Engineering and Computer Sciences at the University of California, Berkeley. Ben's research group studies the theory and practice of optimization algorithms with a particular focus on applications in machine learning, control systems, and data analysis. Ben is the recipient of a Presidential Early Career Award for Scientists and Engineers, an Alfred P. Sloan Research Fellowship, the 2012 SIAM/MOS Lagrange Prize in Continuous Optimization, the 2014 Jamon Prize, the 2015 William O. Baker Award for Initiatives in Research, and the 2017 NIPS Test of Time Award.
11:05 - 12:05 Keynote Session - Chair: Manfred Morari
Dynamical, Symplectic and Stochastic Perspectives on Gradient-Based Optimization -
Michael I. Jordan (University of California, Berkeley)
(link for slides)
Abstract: Many new theoretical challenges have arisen in the area of gradient-based optimization for large-scale control and inference problems, driven by the needs of applications and the opportunities provided by new hardware and software platforms. I discuss several recent, related results in this area: (1) a new framework for understanding Nesterov acceleration, obtained by taking a continuous-time, Lagrangian/Hamiltonian/symplectic perspective, (2) a discussion of how to escape saddle points efficiently in nonconvex optimization, and (3) the acceleration of Langevin diffusion.
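One concrete instance from this line of work is the continuous-time limit of Nesterov's accelerated gradient method, the ODE derived by Su, Boyd, and Candès (the Lagrangian/Hamiltonian framework in the talk generalizes this to a whole family of such dynamics):

\[ \ddot{X}(t) + \frac{3}{t}\,\dot{X}(t) + \nabla f\bigl(X(t)\bigr) = 0, \]

whose solutions achieve an O(1/t^2) convergence rate mirroring the O(1/k^2) rate of the discrete method.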
Bio: Michael I. Jordan is the Pehong Chen Distinguished Professor in the Department of Electrical Engineering and Computer Sciences and the Department of Statistics at the University of California, Berkeley. His research interests bridge the computational, statistical, cognitive and biological sciences. Prof. Jordan is a member of the National Academy of Sciences and a member of the National Academy of Engineering. He has been named a Neyman Lecturer and a Medallion Lecturer by the Institute of Mathematical Statistics. He received the IJCAI Research Excellence Award in 2016, the David E. Rumelhart Prize in 2015, and the ACM/AAAI Allen Newell Award in 2009.
12:00 - 1:30 Lunch Break
1:30 - 3:15 Session 2: Optimization and Statistical Learning - Chair: Konstantinos Gatsis
1:30 - 2:05 Dynamical Systems and the Alternating Direction Method of Multipliers -
Rene Vidal (Johns Hopkins University)
(link for slides)
Abstract: Recently, there has been an increasing interest in using tools from dynamical systems to analyze the behavior of simple optimization algorithms such as gradient descent and accelerated variants. This talk will present differential equations that model the continuous limit of the sequence of iterates generated by the alternating direction method of multipliers, as well as an accelerated variant. We employ the direct method of Lyapunov to analyze the stability of critical points of the dynamical systems and to obtain associated convergence rates.
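For reference, the iterates in question are the standard ADMM updates for a problem of the form min_{x,z} f(x) + g(z) subject to Ax + Bz = c, with augmented-Lagrangian parameter ρ:

\[ x^{k+1} = \arg\min_x L_\rho(x, z^k, y^k), \quad z^{k+1} = \arg\min_z L_\rho(x^{k+1}, z, y^k), \quad y^{k+1} = y^k + \rho\,(A x^{k+1} + B z^{k+1} - c); \]

the talk studies differential equations obtained as the continuous limit of these updates (and of an accelerated variant) as the step size vanishes.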
Bio: Rene Vidal is the Herschel L. Seder Professor of Biomedical Engineering and the Inaugural Director of the Mathematical Institute for Data Science at The Johns Hopkins University. His research focuses on the development of theory and algorithms for the analysis of complex high-dimensional datasets such as images, videos, time-series and biomedical data. Dr. Vidal has been Associate Editor of TPAMI and CVIU, Program Chair of ICCV and CVPR, co-author of the book "Generalized Principal Component Analysis" (2016), and co-author of more than 200 articles in machine learning, computer vision, biomedical image analysis, hybrid systems, robotics and signal processing. He is a fellow of the IEEE, IAPR and Sloan Foundation, an ONR Young Investigator, and has received numerous awards for his work, including the 2012 J.K. Aggarwal Prize for "outstanding contributions to generalized principal component analysis (GPCA) and subspace clustering in computer vision and pattern recognition" as well as best paper awards in machine learning, computer vision, controls, and medical robotics.
2:05 - 2:40 Convergence of Policy Gradient Methods for the Linear Quadratic Regulator -
Maryam Fazel (University of Washington)
Abstract: Policy gradient methods for reinforcement learning and continuous control are popular in practice, but lack theoretical guarantees even for the simplest case of linear dynamics and a quadratic cost, i.e., the Linear Quadratic Regulator (LQR) problem. A difficulty is that unlike the classical approaches, these methods must solve a nonconvex optimization problem to find the optimal control policy. We show that despite the nonconvexity, gradient descent starting from a stabilizing policy converges to the globally optimal policy. We then discuss how this can help understand policy gradient type methods that do not have access to exact gradients.
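A minimal numerical sketch of this setting (our toy instance; it uses the standard expression for the gradient of the discrete-time LQR cost, not code from the talk):

    import numpy as np
    from scipy.linalg import solve_discrete_lyapunov

    # Toy LQR instance; A is stable, so K = 0 is a stabilizing initial policy.
    A = np.array([[0.9, 0.2], [0.0, 0.9]])
    B = np.array([[1.0], [0.5]])
    Q, R = np.eye(2), np.eye(1)
    Sigma0 = np.eye(2)                       # covariance of the initial state

    def cost_and_grad(K):
        """LQR cost C(K) and its exact policy gradient at a stabilizing K."""
        Acl = A - B @ K
        P = solve_discrete_lyapunov(Acl.T, Q + K.T @ R @ K)   # cost-to-go
        Sigma = solve_discrete_lyapunov(Acl, Sigma0)          # state correlation
        E = (R + B.T @ P @ B) @ K - B.T @ P @ A
        return np.trace(P @ Sigma0), 2 * E @ Sigma

    K = np.zeros((1, 2))                     # start from a stabilizing gain
    for _ in range(500):
        cost, grad = cost_and_grad(K)
        K -= 1e-3 * grad                     # plain gradient descent on C(K)
    print("final cost:", cost, "\ngain K:", K)

Despite the nonconvexity of C(K) in K, the result discussed in the talk is that this iteration converges to the globally optimal policy from any stabilizing initial gain.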
Bio: Maryam Fazel is an Associate Professor of Electrical Engineering at the University of Washington, with adjunct appointments in Computer Science and Engineering, Mathematics, and Statistics. Maryam received her MS and PhD from Stanford University, her BS from Sharif University in Iran, and was a postdoctoral scholar at Caltech before joining UW. Her current interests are in mathematical optimization and applications in machine learning. She is a recipient of the NSF CAREER Award, the UWEE Outstanding Teaching Award, and a UAI conference Best Student Paper Award with her student, and coauthored a paper selected as a Fast-Breaking Paper by Science Watch in 2011. She co-leads the Algorithmic Foundations for Data Science Institute (ADSI), an NSF TRIPODS institute at UW, and is an associate editor of the SIAM Journal on Optimization (SIOPT) and the SIAM Journal on Mathematics of Data Science (SIMODS).
2:40 - 3:15 Scenario Optimization for Robust Design - Foundations and Recent Developments -
Giuseppe Carlo Calafiore (Politecnico di Torino)
(link for slides)
Abstract: Random convex programs (RCPs) are convex optimization problems subject to a finite number of constraints (scenarios) that are extracted according to some probability distribution. The optimal objective value of an RCP and its associated optimal solution (when it exists) are random variables: RCP theory is mainly concerned with providing probabilistic assessments on the objective and on the probability of constraint violation for RCPs. In this talk, we give a synthetic overview of RCP theory, discuss its practical impact, and illustrate some applicative examples, with a focus on control applications. Finally, we glimpse at recent developments in scenario theory, such as iterative scenario design and non-convex scenario optimization.
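A toy scenario program in this spirit (our illustration, not an example from the talk): sample N linear constraints, solve the resulting LP, and check the violation probability of the scenario solution empirically.

    import numpy as np
    from scipy.optimize import linprog

    rng = np.random.default_rng(1)
    d, N = 5, 500                       # decision dimension, number of scenarios

    # Scenario LP: minimize c^T x subject to a_i^T x <= 1 for N extracted
    # scenarios a_i ~ N(0, I). The solution x* is itself a random variable.
    c = -np.ones(d)                     # maximize sum(x) <=> minimize -sum(x)
    A_scen = rng.normal(size=(N, d))
    res = linprog(c, A_ub=A_scen, b_ub=np.ones(N), bounds=[(-10, 10)] * d)
    x_star = res.x

    # Empirical probability that a fresh scenario is violated by x*;
    # RCP theory bounds this quantity a priori in terms of N and d.
    A_test = rng.normal(size=(200_000, d))
    print("empirical violation probability:", np.mean(A_test @ x_star > 1.0))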
Bio: Giuseppe C. Calafiore received the "Laurea" degree in Electrical Engineering from Politecnico di Torino in 1993, and the Ph.D. degree in Information and System Theory from Politecnico di Torino in 1997. He is with the faculty of the Department of Electronics and Telecommunications, Politecnico di Torino, where he currently serves as a full professor and coordinator of the Systems and Data Science lab.
Dr. Calafiore has held several visiting positions at international institutions: at the Information Systems Laboratory (ISL), Stanford University, California, in 1995; at the École Nationale Supérieure de Techniques Avancées (ENSTA), Paris, in 1998; and at the University of California at Berkeley in 1999, 2003 and 2007. He held an appointment as a Senior Fellow at the Institute for Pure and Applied Mathematics (IPAM), University of California at Los Angeles, in 2010, and appointments as a Visiting Professor at EECS, UC Berkeley, in 2017 and at the Haas School of Business in 2018.
He is a Fellow of the Italian National Research Council (CNR) and has been a Fellow of the IEEE since 2018. He has been an Associate Editor for the IEEE Transactions on Systems, Man, and Cybernetics (T-SMC), the IEEE Transactions on Automation Science and Engineering (T-ASE), and the IEEE Transactions on Automatic Control. Dr. Calafiore is the author of more than 180 journal and conference proceedings papers and of eight books. He received the IEEE Control Systems Society "George S. Axelby" Outstanding Paper Award in 2008. His research interests are in the fields of convex optimization, randomized algorithms, machine learning, computational finance, and identification and control of uncertain systems.
3:15 - 3:30 Coffee Break
3:30 - 5:15 Session 3: Safe Learning for Control - Chair: George J. Pappas
3:30 - 4:05 Safe Model-Based Learning for Robot Control -
Angela Schoellig (University of Toronto)
(link for slides)
Abstract: In contrast to computers and smartphones, the promise of robotics is to design devices that can physically interact with the world. Envisioning robots that work in human-centered and interactive environments challenges current robot algorithm design, which has largely been based on a priori knowledge about the system and its environment. In this talk, we will show how we combine models and data to achieve safe and high-performance robot behavior in the presence of uncertainties and unknown effects. In particular, we combine learned models in the form of Gaussian processes with classic tools from stability theory in order to analyze the stability of a controller on the learned model. Next, we combine this with model predictive control in order to obtain a control algorithm that is provably safe during the learning process. We demonstrate these algorithms in several experiments with self-driving vehicles. More information and videos at www.dynsyslab.org and https://berkenkamp.me. Authors: Felix Berkenkamp and Angela Schoellig.
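As a flavor of the first ingredient, here is a minimal sketch (ours, not the authors' code) of fitting a Gaussian process to unknown dynamics residuals and extracting the confidence tube that a stability or MPC analysis would then have to certify against:

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    rng = np.random.default_rng(2)

    # Unknown residual dynamics g(x), observed through noisy samples.
    g = lambda x: 0.3 * np.sin(3.0 * x)
    X = rng.uniform(-2, 2, size=(30, 1))
    y = g(X).ravel() + 0.01 * rng.normal(size=30)

    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(1e-4)).fit(X, y)

    # Posterior mean and a scaled confidence tube; safe-learning analyses
    # treat [mean - beta*std, mean + beta*std] as the model uncertainty set.
    Xq = np.linspace(-2, 2, 200).reshape(-1, 1)
    mean, std = gp.predict(Xq, return_std=True)
    beta = 2.0
    print("max half-width of the uncertainty tube:", beta * std.max())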
Bio: Angela Schoellig is an assistant professor at the University of Toronto Institute for Aerospace Studies, an associate director of the Centre for Aerial Robotics Research and Education at U of T, and an instructor of Udacity's flying-car nanodegree program. She conducts research at the interface of robotics, controls, and machine learning. Her goal is to enhance the performance, safety, and autonomy of robots by enabling them to learn from past experiments and from each other. She is a recipient of a Sloan Research Fellowship (a US/Canada-wide award, one of two in robotics); a Canadian Ministry of Research, Innovation & Science Early Researcher Award; and a Connaught New Researcher Award. She is one of MIT Technology Review's Innovators Under 35 (2017), one of Robohub's "25 women in robotics you need to know about" (2013), winner of MIT's 2015 Enabling Society Tech Competition, a 2015 finalist in Dubai's $1 million "Drones for Good" competition, and the youngest member of the 2014 Science Leadership Program, which promotes outstanding scientists in Canada. Her PhD was awarded the ETH Medal and the 2013 Dimitris N. Chorafas Foundation Award (one of 35 worldwide).
4:05 - 4:40 Learning Model Predictive Control -
Francesco Borrelli (University of California, Berkeley)
(link for slides)
Abstract: Forecasts play an important role in autonomous and semi-autonomous systems. Applications include transportation, energy, manufacturing and healthcare systems. Predictions of system dynamics, human behavior and environmental conditions can improve the safety and performance of the resulting system. However, constraint satisfaction, performance guarantees and real-time computation are challenged by the growing complexity of the engineered system, the human/machine interaction and the uncertainty of the environment where the system operates.
Our research over the past years has focused on predictive control design for autonomous systems safely performing iterative tasks. In this talk I will present recent results on the use of data to efficiently formulate predictive control problems which safely improve performance in iterative tasks. Throughout the talk I will use autonomous cars to motivate our research and to show the benefits of the proposed techniques.
More info on: www.mpc.berkeley.edu
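A schematic of the bookkeeping behind learning MPC for iterative tasks (our sketch of the sampled safe set and terminal cost idea; the full formulation couples this with an MPC solver at every time step):

    import numpy as np

    # After each completed task iteration, store the visited states together
    # with their realized costs-to-go; the stored states form a sampled safe
    # set, and the stored costs a terminal cost, for the next iteration's MPC.
    safe_states, safe_costs = [], []

    def record_iteration(states, stage_costs):
        ctg = np.cumsum(np.asarray(stage_costs)[::-1])[::-1]  # tail sums
        safe_states.extend(np.asarray(s, dtype=float) for s in states)
        safe_costs.extend(ctg)

    def terminal_value(x, radius=0.1):
        # Cheapest stored cost-to-go near x; infinite outside the safe set,
        # so the MPC may only terminate in previously demonstrated territory.
        costs = [c for s, c in zip(safe_states, safe_costs)
                 if np.linalg.norm(s - x) <= radius]
        return min(costs, default=np.inf)

    # Example: one completed 3-step iteration that ends at the origin.
    record_iteration([[1.0, 0.0], [0.4, 0.0], [0.0, 0.0]], [1.0, 0.16, 0.0])
    print(terminal_value(np.array([0.05, 0.0])))   # inside the safe set -> 0.0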
Bio: Francesco Borrelli received the 'Laurea' degree in computer science engineering in 1998 from the University of Naples 'Federico II', Italy. In 2002 he received the PhD from the Automatic Control Laboratory at ETH Zurich, Switzerland. He is currently a Professor in the Department of Mechanical Engineering at the University of California at Berkeley, USA. He is the author of more than one hundred publications in the field of predictive control and of the book "Constrained Optimal Control of Linear and Hybrid Systems," published by Springer Verlag. He is the winner of the 2009 NSF CAREER Award and of the 2012 IEEE Control Systems Technology Award. In 2016 he was elected IEEE Fellow.
Since 2004 he has served as a consultant for major international corporations. He is the founder and CTO of BrightBox Technologies Inc, a company focused on cloud-computing optimization for autonomous systems. He is the co-director of the Hyundai Center of Excellence in Integrated Vehicle Safety Systems and Control at UC Berkeley.
His research interests include constrained optimal control, model predictive control and its application to advanced automotive control and energy efficient building operation.
4:40 - 5:15 Safe Learning in Robotics -
Claire J. Tomlin (University of California, Berkeley)
Abstract: A great deal of research in recent years has focused on robot learning. In many applications, guarantees that specifications are satisfied throughout the learning process are paramount. For the safety specification, we present a controller synthesis technique based on the computation of reachable sets, using optimal control and game theory. In the first part of the talk, we will review these methods and their application to collision avoidance and avionics design in air traffic management systems, and networks of unmanned aerial vehicles. In the second part, we will present a toolbox of methods combining reachability with data-driven techniques inspired by machine learning, to enable performance improvement while maintaining safety. We will illustrate these “safe learning” methods on a quadrotor UAV experimental platform which we have at Berkeley, including demonstrations of motion planning around people.
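The reachability computation referred to here solves, in one standard formulation (sign and player conventions vary with which agent is avoiding), a Hamilton-Jacobi-Isaacs PDE for a value function V whose zero sublevel set gives the backward reachable set:

\[ \frac{\partial V}{\partial t}(x,t) + \min\Bigl\{0,\; \max_{u}\,\min_{d}\ \nabla_x V(x,t)^{\top} f(x,u,d) \Bigr\} = 0, \qquad V(x,0) = \ell(x), \]

where \ell encodes the unsafe (or target) set and u and d are the control and disturbance inputs.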
Bio: Claire J. Tomlin received the B.A.Sc. degree in electrical engineering from the University of Waterloo, Waterloo, ON, Canada, the M.Sc. degree in electrical engineering from Imperial College London, London, U.K., and the Ph.D. degree in electrical engineering and computer sciences from the University of California at Berkeley, Berkeley, CA, USA.
She was an Assistant, an Associate, and a Full Professor with the Department of Aeronautics and Astronautics, Stanford University, Stanford, CA, USA, from 1998 to 2007. She has held visiting researcher positions with the NASA Ames Research Center, Mountain View, CA, USA, and Honeywell International, Inc., Morristown, NJ, USA. She is currently the Charles A. Desoer Professor with the Department of Electrical Engineering and Computer Sciences, University of California at Berkeley. Her current research interests include hybrid control systems, with applications in air-traffic systems, unmanned aerial vehicles, and systems biology.
Dr. Tomlin was a recipient of the MacArthur Fellowship in 2006, the Okawa Foundation Research Grant in 2006, and the Eckman Award from the American Automatic Control Council in 2003.
5:15 - 5:30 Closing remarks by organizers