8:45 - 8:50 Opening remarks by organizers
8:50 - 10:00 Session 1: Reinforcement Learning for Control - Chair: Pramod P. Khargonekar
8:50 - 9:25 Reinforcement Learning and Optimal Control: An Overview -
Dimitri P. Bertsekas (Massachusetts Institute of Technology)
(link for book and slides)
Abstract: We discuss a new aggregation framework for approximate dynamic programming, which provides a connection with rollout algorithms, model predictive control, approximate policy iteration, and other single and multistep lookahead methods. The central novel characteristic is the use of a scoring function V of the state, which biases the values of the aggregate cost function towards their correct levels. Different choices for V yield a variety of interesting methods (the classical aggregation framework is obtained when V=0). When V is the cost function of some known policy, our scheme is equivalent to enhanced forms of the rollout algorithm and model predictive control. More generally, our scheme is equivalent to approximation in value space with lookahead function equal to V plus local corrections that are constant within each aggregate state. It can yield an arbitrarily close approximation to the optimal cost function, assuming a sufficiently large number of aggregate states are used.
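In symbols (our schematic reading of the abstract, not notation from the talk), the scheme performs approximation in value space with a lookahead function of the form

\[ \tilde{J}(x) = V(x) + r_{a(x)}, \]

where a(x) is the aggregate state containing x and the correction term r_{a(x)} is constant within each aggregate state. Setting V = 0 recovers classical aggregation, while taking V to be the cost function of a known policy yields the enhanced rollout interpretation mentioned above.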
References (available from the author’s website):
D. P. Bertsekas, "REINFORCEMENT LEARNING AND OPTIMAL CONTROL
," New Book preprint
D. P. Bertsekas, "Biased Aggregation, Rollout, and Enhanced Policy Improvement for Reinforcement Learning," Lab. for Information and Decision Systems Report, MIT, October 2018.
D. P. Bertsekas, "Feature-Based Aggregation and Deep Reinforcement Learning: A Survey and Some New Implementations," Lab. for Information and Decision Systems Report, MIT, April 2018; a version to appear in IEEE/CAA Journal of Automatica Sinica.
Bio: Dr. Bertsekas has held faculty positions at several universities, including Stanford University (1971-1974) and the University of Illinois, Urbana (1974-1979). Since 1979 he has been teaching at the Electrical Engineering and Computer Science Department of the Massachusetts Institute of Technology, where he is currently McAfee Professor of Engineering.
Professor Bertsekas was awarded the INFORMS 1997 Prize for Research Excellence in the Interface Between Operations Research and Computer Science for his book "Neuro-Dynamic Programming" (co-authored with John Tsitsiklis), the 2000 Greek National Award for Operations Research, the 2001 ACC John R. Ragazzini Education Award, the 2009 INFORMS Expository Writing Award, the 2014 ACC Richard E. Bellman Control Heritage Award, the 2014 INFORMS Khachiyan Prize, the 2015 SIAM/MOS George B. Dantzig Prize, and the 2018 INFORMS John von Neumann Theory Prize (jointly with John Tsitsiklis) for the contributions of the research monographs "Parallel and Distributed Computation" and "Neuro-Dynamic Programming". In 2001, he was elected to the United States National Academy of Engineering.
Dr. Bertsekas' recent books are "Convex Optimization Algorithms" (2015), "Nonlinear Programming" (3rd edition, 2016), "Dynamic Programming and Optimal Control" (4th edition, 2017), and "Abstract Dynamic Programming" (2nd edition, 2018), all published by Athena Scientific.
9:25 - 10:00 Reinforcement Learning Structures for Real-Time Optimal Control and Differential Games -
Frank L. Lewis (University of Texas at Arlington)
(link for slides)
Abstract: This talk will discuss some new adaptive control structures for learning online the solutions to optimal control problems and multi-player differential games. Techniques from reinforcement learning are used to design a new family of adaptive controllers based on actor-critic mechanisms that converge in real time to optimal control and game theoretic solutions. Continuous-time systems are considered. Application of reinforcement learning to continuous-time (CT) systems has been hampered because the system Hamiltonian contains the full system dynamics. Using our technique known as Integral Reinforcement Learning (IRL), we will develop reinforcement learning methods that do not require knowledge of the system drift dynamics. In the linear quadratic (LQ) case, the new RL adaptive control algorithms learn the solution to the Riccati equation by adaptation along the system motion trajectories. In the case of nonlinear systems with general performance measures, the algorithms learn the (approximate smooth local) solutions of HJ or HJI equations. New algorithms will be presented for solving online the nonzero-sum and zero-sum multi-player games. Each player maintains two adaptive learning structures, a critic network and an actor network. The result is an adaptive control system that learns from the interplay of agents in a game to deliver true online gaming behavior. A new Experience Replay technique is given that uses past data for present learning and significantly speeds up convergence. New methods of off-policy learning allow learning of optimal solutions without knowing any dynamic information. New RL methods in optimal tracking allow solution of the output regulator equations for heterogeneous multi-agent systems.
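For readers unfamiliar with IRL, the identity underlying it (standard in the literature; notation ours) replaces the differential Hamilton-Jacobi-Bellman condition with an integral Bellman equation over a reinforcement interval T,

\[ V\bigl(x(t)\bigr) = \int_t^{t+T} r\bigl(x(\tau), u(\tau)\bigr)\, d\tau + V\bigl(x(t+T)\bigr), \]

which can be evaluated from measured trajectory data alone and therefore does not require knowledge of the system drift dynamics.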
Bio: Member, National Academy of Inventors. Fellow IEEE, Fellow IFAC, Fellow AAAS, Fellow U.K. Institute of Measurement & Control, PE Texas, U.K. Chartered Engineer. UTA Distinguished Scholar Professor, UTA Distinguished Teaching Professor, and Moncrief-O’Donnell Chair at The University of Texas at Arlington Research Institute. Qian Ren Thousand Talents Consulting Professor, Northeastern University, Shenyang, China. Foreign Expert Scholar, Huazhong University of Science and Technology. IEEE Control Systems Society Distinguished Lecturer. Bachelor's Degree in Physics/EE and MSEE at Rice University, MS in Aeronautical Engineering at Univ. W. Florida, Ph.D. at Ga. Tech. He works in feedback control, reinforcement learning, intelligent systems, and distributed control systems. He is author of 7 U.S. patents, 384 journal papers, 426 conference papers, 20 books, 48 chapters, and 12 journal special issues. He received the Fulbright Research Award, NSF Research Initiation Grant, ASEE Terman Award, Int. Neural Network Soc. Gabor Award 2009, U.K. Inst. Measurement & Control Honeywell Field Engineering Medal 2009. Received IEEE Computational Intelligence Society Neural Networks Pioneer Award 2012 and AIAA Intelligent Systems Award 2016. Distinguished Foreign Scholar at Nanjing Univ. Science & Technology. Project 111 Professor at Northeastern University, China. Distinguished Foreign Scholar at Chongqing Univ. China. Received Outstanding Service Award from Dallas IEEE Section, selected as Engineer of the Year by Ft. Worth IEEE Section. Listed in Ft. Worth Business Press Top 200 Leaders in Manufacturing. Received the 2010 IEEE Region 5 Outstanding Engineering Educator Award and the 2010 UTA Graduate Dean’s Excellence in Doctoral Mentoring Award. Elected to UTA Academy of Distinguished Teachers 2012. Texas Regents Outstanding Teaching Award 2013. He served on the NAE Committee on Space Station in 1995.
10:00 - 10:30 Coffee Break
10:30 - 11:05 Session 1 continues
The Merits of Models in Continuous Reinforcement Learning -
Benjamin Recht (University of California, Berkeley)
(link for slides)
Abstract: Classical control theory and machine learning have similar goals: acquire data about the environment, perform a prediction, and use that prediction to positively impact the world. However, the approaches they use are frequently at odds. Controls is the theory of designing complex actions from well-specified models, while machine learning makes intricate, model-free predictions from data alone. For contemporary autonomous systems, some sort of hybrid may be essential in order to fuse and process the vast amounts of sensor data recorded into timely, agile, and safe decisions.
In this talk, I will examine the relative merits of model-based and model-free methods in data-driven control problems. I will discuss quantitative estimates on the number of measurements required to achieve high-quality control performance, and statistical techniques that can distinguish the relative power of different methods. I will also describe how notions of robustness, safety, constraint satisfaction, and exploration can be transparently incorporated into model-based methods. In light of these results, it remains unclear what model-free methods have to offer, given their high sample complexity and lack of reliability and versatility.
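As a concrete illustration of the model-based pipeline discussed here, the following sketch (our toy instance, not code from the talk) estimates the dynamics by least squares and then applies certainty-equivalent LQR:

    import numpy as np
    from scipy.linalg import solve_discrete_are

    rng = np.random.default_rng(0)

    # True system x_{t+1} = A x_t + B u_t + w_t (unknown to the learner;
    # used here only to generate data).
    A_true = np.array([[0.9, 0.2], [0.0, 0.9]])
    B_true = np.array([[1.0], [0.5]])
    n, m = B_true.shape

    # Collect a trajectory under random excitation.
    X, U, Xn = [], [], []
    x = np.zeros(n)
    for _ in range(500):
        u = rng.normal(size=m)
        x_next = A_true @ x + B_true @ u + 0.01 * rng.normal(size=n)
        X.append(x); U.append(u); Xn.append(x_next)
        x = x_next

    # Least-squares model fit: x_next ~ [A_hat B_hat] [x; u].
    Z = np.hstack([np.array(X), np.array(U)])
    Theta, *_ = np.linalg.lstsq(Z, np.array(Xn), rcond=None)
    A_hat, B_hat = Theta.T[:, :n], Theta.T[:, n:]

    # Certainty-equivalent control: solve the Riccati equation for the
    # estimated model and play the resulting static gain u = -K x.
    Q, R = np.eye(n), np.eye(m)
    P = solve_discrete_are(A_hat, B_hat, Q, R)
    K = np.linalg.solve(R + B_hat.T @ P @ B_hat, B_hat.T @ P @ A_hat)
    print("estimated LQR gain:\n", K)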
Bio: Benjamin Recht is an Associate Professor in the Department of Electrical Engineering and Computer Sciences at the University of California, Berkeley. Ben's research group studies the theory and practice of optimization algorithms with a particular focus on applications in machine learning, control systems, and data analysis. Ben is the recipient of a Presidential Early Career Award for Scientists and Engineers, an Alfred P. Sloan Research Fellowship, the 2012 SIAM/MOS Lagrange Prize in Continuous Optimization, the 2014 Jamon Prize, the 2015 William O. Baker Award for Initiatives in Research, and the 2017 NIPS Test of Time Award.
11:05 - 12:05 Keynote Session - Chair: Manfred Morari
Dynamical, Symplectic and Stochastic Perspectives on Gradient-Based Optimization -
Michael I. Jordan (University of California, Berkeley)
(link for slides)
Abstract: Many new theoretical challenges have arisen in the area of gradient-based optimization for large-scale control and inference problems, driven by the needs of applications and the opportunities provided by new hardware and software platforms. I discuss several recent, related results in this area: (1) a new framework for understanding Nesterov acceleration, obtained by taking a continuous-time, Lagrangian/Hamiltonian/symplectic perspective, (2) a discussion of how to escape saddle points efficiently in nonconvex optimization, and (3) the acceleration of Langevin diffusion.
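One concrete instance from this line of work is the continuous-time limit of Nesterov's accelerated gradient method, the ODE derived by Su, Boyd, and Candès (the Lagrangian/Hamiltonian framework in the talk generalizes this to a whole family of such dynamics):

\[ \ddot{X}(t) + \frac{3}{t}\,\dot{X}(t) + \nabla f\bigl(X(t)\bigr) = 0, \]

whose solutions achieve an O(1/t^2) convergence rate mirroring the O(1/k^2) rate of the discrete method.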
Bio: Michael I. Jordan is the Pehong Chen Distinguished Professor in the Department of Electrical Engineering and Computer Sciences and the Department of Statistics at the University of California, Berkeley. His research interests bridge the computational, statistical, cognitive and biological sciences. Prof. Jordan is a member of the National Academy of Sciences and a member of the National Academy of Engineering. He has been named a Neyman Lecturer and a Medallion Lecturer by the Institute of Mathematical Statistics. He received the IJCAI Research Excellence Award in 2016, the David E. Rumelhart Prize in 2015, and the ACM/AAAI Allen Newell Award in 2009.
12:00 - 1:30 Lunch Break
1:30 - 3:15 Session 2: Optimization and Statistical Learning - Chair: Konstantinos Gatsis
1:30 - 2:05 Dynamical Systems and the Alternating Direction Method of Multipliers -
Rene Vidal (Johns Hopkins University)
(link for slides)
Abstract: Recently, there has been an increasing interest in using tools from dynamical systems to analyze the behavior of simple optimization algorithms such as gradient descent and accelerated variants. This talk will present differential equations that model the continuous limit of the sequence of iterates generated by the alternating direction method of multipliers, as well as an accelerated variant. We employ the direct method of Lyapunov to analyze the stability of critical points of the dynamical systems and to obtain associated convergence rates.
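For reference, the iterates in question are the standard ADMM updates for a problem of the form min_{x,z} f(x) + g(z) subject to Ax + Bz = c, with augmented-Lagrangian parameter ρ:

\[ x^{k+1} = \arg\min_x L_\rho(x, z^k, y^k), \quad z^{k+1} = \arg\min_z L_\rho(x^{k+1}, z, y^k), \quad y^{k+1} = y^k + \rho\,(A x^{k+1} + B z^{k+1} - c); \]

the talk studies differential equations obtained as the continuous limit of these updates (and of an accelerated variant) as the step size vanishes.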
Bio: Rene Vidal is the Herschel L. Seder Professor of Biomedical Engineering and the Inaugural Director of the Mathematical Institute for Data Science at The Johns Hopkins University. His research focuses on the development of theory and algorithms for the analysis of complex high-dimensional datasets such as images, videos, time-series and biomedical data. Dr. Vidal has been Associate Editor of TPAMI and CVIU, Program Chair of ICCV and CVPR, co-author of the book "Generalized Principal Component Analysis" (2016), and co-author of more than 200 articles in machine learning, computer vision, biomedical image analysis, hybrid systems, robotics and signal processing. He is a fellow of the IEEE, IAPR and Sloan Foundation, an ONR Young Investigator, and has received numerous awards for his work, including the 2012 J.K. Aggarwal Prize for "outstanding contributions to generalized principal component analysis (GPCA) and subspace clustering in computer vision and pattern recognition" as well as best paper awards in machine learning, computer vision, controls, and medical robotics.
2:05 - 2:40 Convergence of Policy Gradient Methods for the Linear Quadratic Regulator -
Maryam Fazel (University of Washington)
Abstract: Policy gradient methods for reinforcement learning and continuous control are popular in practice, but lack theoretical guarantees even for the simplest case of linear dynamics and a quadratic cost, i.e., the Linear Quadratic Regulator (LQR) problem. A difficulty is that unlike the classical approaches, these methods must solve a nonconvex optimization problem to find the optimal control policy. We show that despite the nonconvexity, gradient descent starting from a stabilizing policy converges to the globally optimal policy. We then discuss how this can help understand policy gradient type methods that do not have access to exact gradients.
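A minimal numerical sketch of this setting (our toy instance; it uses the standard expression for the gradient of the discrete-time LQR cost, not code from the talk):

    import numpy as np
    from scipy.linalg import solve_discrete_lyapunov

    # Toy LQR instance; A is stable, so K = 0 is a stabilizing initial policy.
    A = np.array([[0.9, 0.2], [0.0, 0.9]])
    B = np.array([[1.0], [0.5]])
    Q, R = np.eye(2), np.eye(1)
    Sigma0 = np.eye(2)                       # covariance of the initial state

    def cost_and_grad(K):
        """LQR cost C(K) and its exact policy gradient at a stabilizing K."""
        Acl = A - B @ K
        P = solve_discrete_lyapunov(Acl.T, Q + K.T @ R @ K)   # cost-to-go
        Sigma = solve_discrete_lyapunov(Acl, Sigma0)          # state correlation
        E = (R + B.T @ P @ B) @ K - B.T @ P @ A
        return np.trace(P @ Sigma0), 2 * E @ Sigma

    K = np.zeros((1, 2))                     # start from a stabilizing gain
    for _ in range(500):
        cost, grad = cost_and_grad(K)
        K -= 1e-3 * grad                     # plain gradient descent on C(K)
    print("final cost:", cost, "\ngain K:", K)

Despite the nonconvexity of C(K) in K, the result discussed in the talk is that this iteration converges to the globally optimal policy from any stabilizing initial gain.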
Bio: Maryam Fazel is an Associate Professor of Electrical Engineering at the University of Washington, with adjunct appointments in Computer Science and Engineering, Mathematics, and Statistics. Maryam received her MS and PhD from Stanford University, her BS from Sharif University in Iran, and was a postdoctoral scholar at Caltech before joining UW. Her current interests are in mathematical optimization and applications in machine learning. She is a recipient of the NSF CAREER Award, the UWEE Outstanding Teaching Award, and a UAI conference Best Student Paper Award with her student, and coauthored a paper selected as a Fast-Breaking Paper by Science Watch in 2011. She co-leads the Algorithmic Foundations for Data Science Institute (ADSI), an NSF TRIPODS institute at UW, and is an associate editor of the SIAM Journal on Optimization (SIOPT) and the SIAM Journal on Mathematics of Data Science (SIMODS).
2:40 - 3:15 Scenario Optimization for Robust Design - Foundations and Recent Developments -
Giuseppe Carlo Calafiore (Politecnico di Torino)
(link for slides)
Abstract: Random convex programs (RCPs) are convex optimization problems subject to a finite number of constraints (scenarios) that are extracted according to some probability distribution. The optimal objective value of an RCP and its associated optimal solution (when it exists) are random variables: RCP theory is mainly concerned with providing probabilistic assessments on the objective and on the probability of constraint violation for RCPs. In this talk, we give a synthetic overview of RCP theory, discuss its practical impact, and illustrate some applicative examples, with a focus on control applications. Finally, we glimpse at recent developments in scenario theory, such as iterative scenario design and non-convex scenario optimization.
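A toy scenario program in this spirit (our illustration, not an example from the talk): sample N linear constraints, solve the resulting LP, and check the violation probability of the scenario solution empirically.

    import numpy as np
    from scipy.optimize import linprog

    rng = np.random.default_rng(1)
    d, N = 5, 500                       # decision dimension, number of scenarios

    # Scenario LP: minimize c^T x subject to a_i^T x <= 1 for N extracted
    # scenarios a_i ~ N(0, I). The solution x* is itself a random variable.
    c = -np.ones(d)                     # maximize sum(x) <=> minimize -sum(x)
    A_scen = rng.normal(size=(N, d))
    res = linprog(c, A_ub=A_scen, b_ub=np.ones(N), bounds=[(-10, 10)] * d)
    x_star = res.x

    # Empirical probability that a fresh scenario is violated by x*;
    # RCP theory bounds this quantity a priori in terms of N and d.
    A_test = rng.normal(size=(200_000, d))
    print("empirical violation probability:", np.mean(A_test @ x_star > 1.0))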
Bio: Giuseppe C. Calafiore received the "Laurea" degree in Electrical Engineering from Politecnico di Torino in 1993, and the Ph.D. degree in Information and System Theory from Politecnico di Torino in 1997. He is with the faculty of the Department of Electronics and Telecommunications, Politecnico di Torino, where he currently serves as a full professor and coordinator of the Systems and Data Science lab.
Dr. Calafiore has held several visiting positions at international institutions: at the Information Systems Laboratory (ISL), Stanford University, California, in 1995; at the École Nationale Supérieure de Techniques Avancées (ENSTA), Paris, in 1998; and at the University of California at Berkeley in 1999, 2003 and 2007. He held an appointment as a Senior Fellow at the Institute for Pure and Applied Mathematics (IPAM), University of California at Los Angeles, in 2010, and appointments as a Visiting Professor at EECS, UC Berkeley, in 2017 and at the Haas School of Business in 2018.
He is a Fellow of the Italian National Research Council (CNR) and has been a Fellow of the IEEE since 2018. He has been an Associate Editor for the IEEE Transactions on Systems, Man, and Cybernetics (T-SMC), the IEEE Transactions on Automation Science and Engineering (T-ASE), and the IEEE Transactions on Automatic Control. Dr. Calafiore is the author of more than 180 journal and conference proceedings papers and of eight books. He received the IEEE Control Systems Society "George S. Axelby" Outstanding Paper Award in 2008. His research interests are in the fields of convex optimization, randomized algorithms, machine learning, computational finance, and identification and control of uncertain systems.
3:15 - 3:30 Coffee Break
3:30 - 5:15 Session 3: Safe Learning for Control - Chair: George J. Pappas
3:30 - 4:05 Safe Model-Based Learning for Robot Control -
Angela Schoellig (University of Toronto)
(link for slides)
Abstract: In contrast to computers and smartphones, the promise of robotics is to design devices that can physically interact with the world. Envisioning robots that work in human-centered and interactive environments challenges current robot algorithm design, which has largely been based on a priori knowledge about the system and its environment. In this talk, we will show how we combine models and data to achieve safe and high-performance robot behavior in the presence of uncertainties and unknown effects. In particular, we combine learned models in the form of Gaussian processes with classic tools from stability theory in order to analyze the stability of a controller on the learned model. Next, we combine this with model predictive control in order to obtain a control algorithm that is provably safe during the learning process. We demonstrate these algorithms in several experiments with self-driving vehicles. More information and videos at www.dynsyslab.org and https://berkenkamp.me. Authors: Felix Berkenkamp and Angela Schoellig.
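As a flavor of the first ingredient, here is a minimal sketch (ours, not the authors' code) of fitting a Gaussian process to unknown dynamics residuals and extracting the confidence tube that a stability or MPC analysis would then have to certify against:

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    rng = np.random.default_rng(2)

    # Unknown residual dynamics g(x), observed through noisy samples.
    g = lambda x: 0.3 * np.sin(3.0 * x)
    X = rng.uniform(-2, 2, size=(30, 1))
    y = g(X).ravel() + 0.01 * rng.normal(size=30)

    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(1e-4)).fit(X, y)

    # Posterior mean and a scaled confidence tube; safe-learning analyses
    # treat [mean - beta*std, mean + beta*std] as the model uncertainty set.
    Xq = np.linspace(-2, 2, 200).reshape(-1, 1)
    mean, std = gp.predict(Xq, return_std=True)
    beta = 2.0
    print("max half-width of the uncertainty tube:", beta * std.max())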
Bio: Angela Schoellig is an assistant professor at the University of Toronto Institute for Aerospace Studies, an associate director of the Centre for Aerial Robotics Research and Education at U of T, and an instructor of Udacity's flying-car nanodegree program. She conducts research at the interface of robotics, controls, and machine learning. Her goal is to enhance the performance, safety, and autonomy of robots by enabling them to learn from past experiments and from each other. She is a recipient of a Sloan Research Fellowship (a US/Canada-wide award, one of two in robotics); a Canadian Ministry of Research, Innovation & Science Early Researcher Award; and a Connaught New Researcher Award. She is one of MIT Technology Review's Innovators Under 35 (2017), one of Robohub's "25 women in robotics you need to know about" (2013), winner of MIT's 2015 Enabling Society Tech Competition, a 2015 finalist in Dubai's $1 million "Drones for Good" competition, and the youngest member of the 2014 Science Leadership Program, which promotes outstanding scientists in Canada. Her PhD was awarded the ETH Medal and the 2013 Dimitris N. Chorafas Foundation Award (one of 35 worldwide).
4:05 - 4:40 Learning Model Predictive Control -
Francesco Borrelli (University of California, Berkeley)
(link for slides)
Abstract: Forecasts play an important role in autonomous and semi-autonomous systems. Applications include transportation, energy, manufacturing and healthcare systems. Predictions of system dynamics, human behavior and environmental conditions can improve the safety and performance of the resulting system. However, constraint satisfaction, performance guarantees and real-time computation are challenged by the growing complexity of the engineered system, the human/machine interaction and the uncertainty of the environment where the system operates.
Our research over the past years has focused on predictive control design for autonomous systems safely performing iterative tasks. In this talk I will present recent results on the use of data to efficiently formulate predictive control problems which safely improve performance in iterative tasks. Throughout the talk I will use autonomous cars to motivate our research and to show the benefits of the proposed techniques.
More info on: www.mpc.berkeley.edu
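A schematic of the bookkeeping behind learning MPC for iterative tasks (our sketch of the sampled safe set and terminal cost idea; the full formulation couples this with an MPC solver at every time step):

    import numpy as np

    # After each completed task iteration, store the visited states together
    # with their realized costs-to-go; the stored states form a sampled safe
    # set, and the stored costs a terminal cost, for the next iteration's MPC.
    safe_states, safe_costs = [], []

    def record_iteration(states, stage_costs):
        ctg = np.cumsum(np.asarray(stage_costs)[::-1])[::-1]  # tail sums
        safe_states.extend(np.asarray(s, dtype=float) for s in states)
        safe_costs.extend(ctg)

    def terminal_value(x, radius=0.1):
        # Cheapest stored cost-to-go near x; infinite outside the safe set,
        # so the MPC may only terminate in previously demonstrated territory.
        costs = [c for s, c in zip(safe_states, safe_costs)
                 if np.linalg.norm(s - x) <= radius]
        return min(costs, default=np.inf)

    # Example: one completed 3-step iteration that ends at the origin.
    record_iteration([[1.0, 0.0], [0.4, 0.0], [0.0, 0.0]], [1.0, 0.16, 0.0])
    print(terminal_value(np.array([0.05, 0.0])))   # inside the safe set -> 0.0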
Bio: Francesco Borrelli received the 'Laurea' degree in computer science engineering in 1998 from the University of Naples 'Federico II', Italy. In 2002 he received the PhD from the Automatic Control Laboratory at ETH Zurich, Switzerland. He is currently a Professor in the Department of Mechanical Engineering at the University of California at Berkeley, USA. He is the author of more than one hundred publications in the field of predictive control and of the book "Constrained Optimal Control of Linear and Hybrid Systems," published by Springer Verlag. He is the winner of the 2009 NSF CAREER Award and of the 2012 IEEE Control Systems Technology Award. In 2016 he was elected IEEE Fellow.
Since 2004 he has served as a consultant for major international corporations. He is the founder and CTO of BrightBox Technologies Inc, a company focused on cloud-computing optimization for autonomous systems. He is the co-director of the Hyundai Center of Excellence in Integrated Vehicle Safety Systems and Control at UC Berkeley.
His research interests include constrained optimal control, model predictive control and its application to advanced automotive control and energy efficient building operation.
4:40 - 5:15 Safe Learning in Robotics -
Claire J. Tomlin (University of California, Berkeley)
Abstract: A great deal of research in recent years has focused on robot learning. In many applications, guarantees that specifications are satisfied throughout the learning process are paramount. For the safety specification, we present a controller synthesis technique based on the computation of reachable sets, using optimal control and game theory. In the first part of the talk, we will review these methods and their application to collision avoidance and avionics design in air traffic management systems, and networks of unmanned aerial vehicles. In the second part, we will present a toolbox of methods combining reachability with data-driven techniques inspired by machine learning, to enable performance improvement while maintaining safety. We will illustrate these “safe learning” methods on a quadrotor UAV experimental platform which we have at Berkeley, including demonstrations of motion planning around people.
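The reachability computation referred to here solves, in one standard formulation (sign and player conventions vary with which agent is avoiding), a Hamilton-Jacobi-Isaacs PDE for a value function V whose zero sublevel set gives the backward reachable set:

\[ \frac{\partial V}{\partial t}(x,t) + \min\Bigl\{0,\; \max_{u}\,\min_{d}\ \nabla_x V(x,t)^{\top} f(x,u,d) \Bigr\} = 0, \qquad V(x,0) = \ell(x), \]

where \ell encodes the unsafe (or target) set and u and d are the control and disturbance inputs.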
Bio: Claire J. Tomlin received the B.A.Sc. degree in electrical engineering from the University of Waterloo, Waterloo, ON, Canada, the M.Sc. degree in electrical engineering from Imperial College London, London, U.K., and the Ph.D. degree in electrical engineering and computer sciences from the University of California at Berkeley, Berkeley, CA, USA.
She was an Assistant, an Associate, and a Full Professor with the Department of Aeronautics and Astronautics, Stanford University, Stanford, CA, USA, from 1998 to 2007. She has held visiting researcher positions with the NASA Ames Research Center, Mountain View, CA, USA, and Honeywell International, Inc., Morristown, NJ, USA. She is currently the Charles A. Desoer Professor with the Department of Electrical Engineering and Computer Sciences, University of California at Berkeley. Her current research interests include hybrid control systems, with applications in air-traffic systems, unmanned aerial vehicles, and systems biology.
Dr. Tomlin was a recipient of the MacArthur Fellowship in 2006, the Okawa Foundation Research Grant in 2006, and the Eckman Award from the American Automatic Control Council in 2003.
5:15 - 5:30 Closing remarks by organizers