Excursions in Reinforcement Learning

This course (IFT6760C) is intended for advanced graduate students with a good background in machine learning, mathematics, operations research, or statistics. You can register for IFT6760C on Synchro if you are affiliated with UdeM, or through the CREPUQ if you come from another institution. Due to the research-oriented nature of this class, you need to be comfortable with a teaching format involving open-ended questions and assignments. You will be required to think critically and to adopt an open mindset. My teaching goal with this course is for all participants to build their own understanding of reinforcement learning in relation to their primary research area, while sharing their unique perspective and insights with the entire class.


Origin: from the Latin verb excurrere, meaning "to run out." This is also the intended meaning behind the title of this course. I want us to deviate from the usual paths and explore the rich connections between reinforcement learning and other disciplines, in particular optimization, control theory, and simulation. And of course, I'm also hoping that this will be a fun activity for everyone.

Time and Location

We meet twice a week: Tuesday from 9:30 to 11:30 AM and Friday from 1:30 to 3:40 PM. The course is taught at Mila, in the Agora at 6650 Saint-Urbain. You don't need badge access to enter the classroom. Here's a video showing how to access the classroom from Saint-Zotique Ouest.


The following evaluation structure is subject to change depending on the class size.

There is no mandatory textbook. I will however be referencing content from:


Tuesday January 7

First class: Markov decision processes and the induced stochastic process

Friday January 10

Examples of MDPs; constrained MLE as sequential allocation; optimality criteria: finite horizon, infinite horizon, average reward; the random-horizon interpretation of the infinite-horizon discounted setting. Bellman optimality
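To make the Bellman optimality criterion concrete, here is a minimal value-iteration sketch: we repeatedly apply the Bellman optimality operator until it reaches its fixed point. The two-state, two-action MDP below is a made-up toy example, not one from the lectures.

```python
import numpy as np

# Toy 2-state, 2-action MDP (hypothetical numbers, for illustration only).
# P[a, s, s'] = transition probability, r[s, a] = expected immediate reward.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # transitions under action 0
    [[0.5, 0.5], [0.0, 1.0]],   # transitions under action 1
])
r = np.array([[1.0, 0.0],       # rewards in state 0 for actions 0, 1
              [0.0, 2.0]])      # rewards in state 1 for actions 0, 1
gamma = 0.9

def value_iteration(P, r, gamma, tol=1e-8):
    """Iterate the Bellman optimality operator to its fixed point."""
    V = np.zeros(P.shape[1])
    while True:
        # Q[s, a] = r[s, a] + gamma * sum_{s'} P[a, s, s'] V[s']
        Q = r + gamma * np.einsum('asn,n->sa', P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)   # optimal value, greedy policy
        V = V_new

V_star, pi_star = value_iteration(P, r, gamma)
```

Since the operator is a gamma-contraction in the sup norm, the loop is guaranteed to converge, and the greedy policy with respect to the fixed point is optimal.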

Tuesday January 14

Friday January 17

Tuesday January 21

Friday January 24

Tuesday January 28

Friday January 31

Tuesday February 4

Friday February 7

Tuesday February 11

Friday February 14

Week of February 17

Policy gradients: occupation measures, discounted objective, implicit differentiation and derivation in the infinite horizon case

Week of February 24

Policy gradients: derivative estimation, likelihood ratio methods (REINFORCE), reparametrization (IPA), baselines (control variates), actor-critic systems
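As a preview of the likelihood-ratio (REINFORCE) estimator with a baseline, here is a small sketch on a toy three-armed bandit with a softmax policy. The arm means and noise level are made up for illustration; the point is that subtracting a baseline changes the variance but not the expectation of the estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-armed bandit: true arm means (unknown to the learner).
means = np.array([0.2, 0.5, 0.8])
theta = np.zeros(3)   # softmax policy parameters

def sample_grad(theta, baseline=0.0, n=200_000):
    """Score-function (REINFORCE) gradient estimate with a baseline.

    grad J(theta) = E[(r - b) * grad log pi(a; theta)],
    where for a softmax policy grad log pi(a) = e_a - pi.
    """
    p = np.exp(theta) / np.exp(theta).sum()
    a = rng.choice(3, size=n, p=p)
    r = means[a] + rng.normal(0.0, 0.1, size=n)   # noisy rewards
    glogp = np.eye(3)[a] - p                      # score function per sample
    # The baseline is a control variate: E[b * glogp] = 0, so the
    # estimate stays unbiased while its variance can shrink.
    return ((r - baseline)[:, None] * glogp).mean(axis=0)

g_plain = sample_grad(theta)                  # no baseline
g_base = sample_grad(theta, baseline=0.5)     # baseline near E[r]
```

For the uniform policy the exact gradient is p_k (mu_k - J) with J = 0.5, i.e. (-0.1, 0, 0.1); both estimates concentrate around it.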

Week of March 2

Spring break

Week of March 9

Policy gradients: application for learning temporal abstractions, the option-critic architecture, hierarchical and goal-conditioned RL

Week of March 16

Policy gradients: the Linear-Quadratic Regulator (LQR), Lagrangian formulation, model predictive control (MPC), Monte Carlo Tree Search
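For the LQR topic, a minimal sketch of the finite-horizon solution via the backward Riccati recursion may help fix ideas. The double-integrator dynamics and cost weights below are illustrative choices, not taken from the course material.

```python
import numpy as np

# Finite-horizon discrete-time LQR: x_{t+1} = A x_t + B u_t,
# cost = sum_t (x_t' Q x_t + u_t' R u_t). Matrices are illustrative.
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])      # double integrator
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

def lqr_gains(A, B, Q, R, T):
    """Backward Riccati recursion; returns feedback gains K_0, ..., K_{T-1}."""
    P = Q.copy()                 # terminal cost-to-go
    Ks = []
    for _ in range(T):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        Ks.append(K)
    return Ks[::-1]              # reorder so Ks[t] is the gain at time t

T = 50
Ks = lqr_gains(A, B, Q, R, T)

# Closed-loop rollout with u_t = -K_t x_t drives the state to the origin.
x = np.array([5.0, 0.0])
for t in range(T):
    x = A @ x + B @ (-Ks[t] @ x)
```

The optimal controller is linear state feedback, which is one reason LQR serves as a clean benchmark when analyzing policy-gradient methods.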

Week of March 23

Automatic differentiation as discrete-time optimal control
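The correspondence named in this week's topic can be seen in a few lines: reverse-mode differentiation through a dynamical system is exactly the adjoint (costate) recursion of discrete-time optimal control. The linear dynamics and terminal loss below are a made-up minimal case.

```python
import numpy as np

# Linear dynamics x_{t+1} = A x_t with terminal loss L(x_T) = 0.5 ||x_T||^2.
# Reverse-mode autodiff of L w.r.t. x_0 is the adjoint recursion
#   lambda_T = dL/dx_T = x_T,   lambda_t = A^T lambda_{t+1}.
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])      # illustrative, stable dynamics
x0 = np.array([1.0, -1.0])
T = 5

# Forward pass: roll the dynamics out.
xs = [x0]
for _ in range(T):
    xs.append(A @ xs[-1])

# Backward (adjoint) pass: this is backpropagation through time.
lam = xs[-1].copy()             # lambda_T = x_T
for _ in range(T):
    lam = A.T @ lam             # lambda_t = A^T lambda_{t+1}
```

Since x_T = A^T x_0 (matrix power), the closed-form gradient is (A^T)^T x_T, which the adjoint sweep reproduces without ever forming the matrix power explicitly.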

Week of March 30

Formulation of inverse RL and meta-RL as bilevel optimization.

Week of April 6

Methods (cont'd): the KKT "trick"; forward-mode, reverse-mode, implicit, and competitive differentiation. Case studies
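Among the methods listed, implicit differentiation is easy to demonstrate on a scalar bilevel toy problem (the quadratic inner objective below is a made-up example): instead of differentiating through the inner solver's iterations, we differentiate the inner stationarity condition.

```python
import numpy as np

# Bilevel toy problem: outer variable x, inner problem
#   y*(x) = argmin_y g(x, y),   g(x, y) = 0.5 * a * y**2 - x * y.
# Stationarity: dg/dy = a*y - x = 0  =>  y*(x) = x / a.
# Implicit function theorem: dy*/dx = -(d2g/dy2)^{-1} (d2g/dy dx) = 1 / a.
a = 3.0   # illustrative curvature of the inner objective

def inner_solve(x, steps=200, lr=0.1):
    """Gradient descent on the inner problem (stands in for any solver)."""
    y = 0.0
    for _ in range(steps):
        y -= lr * (a * y - x)
    return y

def implicit_grad(x):
    """dy*/dx obtained from the stationarity condition, not the solver."""
    return 1.0 / a

x = 2.0
y_star = inner_solve(x)
eps = 1e-4
fd = (inner_solve(x + eps) - inner_solve(x - eps)) / (2 * eps)
```

The finite-difference derivative through the solver matches the implicit derivative, while the implicit route needs no memory of the solver's trajectory, which is the practical appeal in inverse RL and meta-RL formulations.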

Week of April 13

Challenges and opportunities

Week of April 20

Final project presentations


Academic life can sometimes be overwhelming. Don't hesitate to seek support: