Apprentissage par Renforcement et Commande Optimale

Reinforcement Learning and Optimal Control

I won't be offering this course in the fall 2022 semester, but I will return to teaching it in 2023.

Horaire et lieu/Time and Location

Lieu/Location: Mila, Salle Agora Rez-de-chaussée, 6650 Saint-Urbain, Montréal, QC H2S 3G9

Préalables/Prerequisites

Il est primordial que vous ayez réussi tous les cours suivants avant d'entreprendre celui-ci:

De plus, vous devez maîtriser le langage Python, bien connaître NumPy/SciPy et être capable d'écrire du code vectorisé en JAX. Il est aussi attendu que vous puissiez puiser dans vos connaissances en algèbre linéaire, analyse, probabilité, statistique, processus stochastiques et optimisation avec et sans contraintes pour dériver vos propres preuves mathématiques. Vous devez également être capable de lire le jargon technique des conférences NeurIPS, ICML et ICLR.

You must have taken the following classes before taking this one:

IFT 6390, Fundamentals of machine learning

IFT 6269, Probabilistic Graphical Models

IFT 6135, Representation Learning

Furthermore, you need to master Python and know how to write NumPy/SciPy code as well as vectorized code in JAX. I also take for granted that you can draw on your background in linear algebra, analysis, probability, statistics, stochastic processes, and constrained and unconstrained optimization to derive your own proofs. You also need to be able to read technical jargon from the conferences NeurIPS, ICML and ICLR.

Sujets/Topics

Processus de décision markovien, formulation sous forme de programme linéaire, forme lisse des équations de Bellman, équations de Bellman projetées, analyse des algorithmes de type TD, estimation de dérivées, commande optimale en temps continu et discret, principe du maximum de Pontryagin, Hamiltonien en temps discret et en temps continu, méthode par état adjoint et méthode variationnelle pour le calcul de sensibilité, méthodes de contrôle directes et indirectes, apprentissage par renforcement inversé, et plus!

Markov Decision Processes, LP formulation, occupation measure, smooth Bellman equations, projected Bellman equations, analysis of TD algorithms, derivative estimation, discrete and continuous optimal control, Pontryagin's maximum principle, discrete and continuous time Hamiltonian, adjoint and forward sensitivity equations, single shooting, multiple shooting, collocation methods, inverse reinforcement learning, and more!

Évaluation/Evaluation

Devoirs/Assignments: 3 x 10 = 30%
Intra/Midterm: 20%
Final: 35%
Étude de cas/Case study: 15%

Livres/Textbooks

Le contenu du cours est basé sur les livres suivants:

The course content is based on the following books:

Markov Decision Processes: Discrete Stochastic Dynamic Programming by Martin Puterman
Reinforcement Learning and Optimal Control by Dimitri Bertsekas
Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto
Practical Methods for Optimal Control and Estimation Using Nonlinear Programming by John T. Betts
Simulation and the Monte Carlo Method, Second Edition by Rubinstein and Kroese (2008)

Bien-Être/Wellbeing

UdeM: Centre de santé et de consultation psychologique
McGill: Student wellness hub
Polytechnique: Soutien à la réussite
HEC: Psychological support and resources
ASEQ: Student Health Support Program - Mental Health Resources