Excursions in Reinforcement Learning

This course (IFT6760C) is intended for advanced graduate students with a good background in machine learning, mathematics, operations research, or statistics. You can register for IFT6760C on Synchro if you are affiliated with UdeM, or via the CREPUQ if you are from another institution. Due to the research-oriented nature of this class, you need to be comfortable with a teaching format involving open-ended questions and assignments. You will be required to think critically and adopt an open mindset. My teaching goal with this course is for all participants to build their own understanding of reinforcement learning in relation to their primary research area while sharing their unique perspective and insights with the entire class.


Origin: from the Latin verb excurrere, which means "to run out." This is also the intended meaning behind the title of this course. I want us to deviate from the usual paths and explore the rich connections between reinforcement learning and other disciplines, in particular optimization, control theory, and simulation. And of course, I'm also hoping that this will be a fun activity for everyone.

Time and Location

Twice a week: Tuesday from 9:30 to 11:30 and Friday from 13:30 to 15:40. The course is taught at Mila, in the Agora of 6650 Saint-Urbain. You don't need badge access to enter the classroom. Here's a video showing you how to access the classroom from Saint-Zotique Ouest.


The following evaluation structure is subject to change depending on the class size.

There is no mandatory textbook. I will however be referencing content from:


Tuesday January 7

First class. Markov decision processes (MDPs) and the induced stochastic process.

Friday January 10

Examples of MDPs; constrained MLE as sequential allocation; optimality criteria: finite horizon, infinite horizon, average reward; the random-horizon interpretation of the infinite-horizon discounted setting. Bellman optimality.
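The Bellman optimality criterion from this lecture can be illustrated with a minimal value-iteration sketch. The two-state MDP below is a made-up example for illustration only, not one used in class; the discount factor gamma can be read, per the random-horizon interpretation, as the probability that the process continues for one more step.

```python
import numpy as np

# A tiny made-up MDP: 2 states, 2 actions.
# P[a][s, s'] = transition probability; R[a][s] = expected reward.
P = [np.array([[0.9, 0.1], [0.2, 0.8]]),   # action 0
     np.array([[0.5, 0.5], [0.0, 1.0]])]   # action 1
R = [np.array([1.0, 0.0]), np.array([0.0, 2.0])]
gamma = 0.9  # discount factor (continuation probability under the
             # random-horizon interpretation of the discounted criterion)

V = np.zeros(2)
for _ in range(1000):
    # Bellman optimality update:
    # V(s) <- max_a [ R(s,a) + gamma * sum_s' P(s'|s,a) V(s') ]
    Q = np.stack([R[a] + gamma * P[a] @ V for a in range(2)])
    V_new = Q.max(axis=0)
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new

policy = Q.argmax(axis=0)  # greedy policy w.r.t. the optimal values
```

Because the Bellman optimality operator is a gamma-contraction, the loop converges to the unique fixed point V*, and the greedy policy extracted at the end is optimal for this MDP.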

Tuesday January 14

Friday January 17

Tuesday January 21

Friday January 24

Tuesday January 28

Friday January 31

Tuesday February 4

Friday February 7

Tuesday February 11

Friday February 14

Tuesday February 18

Friday February 21

Tuesday February 25

Friday February 28

Week of March 2

Spring break

Tuesday March 10

Friday March 13

Week of March 16

Week of March 23

Week of March 30

Formulation of inverse RL and meta-RL as bilevel optimization.

Week of April 6

Methods (contd.): the KKT "trick"; forward-mode, reverse-mode, implicit, and competitive differentiation. Case studies.
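The implicit-differentiation method listed above can be sketched on a toy bilevel problem. The example below (ridge-regression hyperparameter tuning) is an illustrative assumption on my part, not a case study from the course: the inner problem is a ridge fit with penalty lam, the outer objective is validation loss, and the implicit function theorem gives the hypergradient d(outer)/d(lam) without unrolling the inner solver.

```python
import numpy as np

rng = np.random.default_rng(0)
X, y = rng.normal(size=(20, 3)), rng.normal(size=20)     # inner (training) data
Xv, yv = rng.normal(size=(10, 3)), rng.normal(size=10)   # outer (validation) data

def inner_solution(lam):
    # argmin_w ||Xw - y||^2 + lam ||w||^2  (ridge regression, closed form)
    return np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)

def outer_loss(lam):
    w = inner_solution(lam)
    return np.sum((Xv @ w - yv) ** 2)

def implicit_hypergradient(lam):
    # Implicit function theorem on the inner stationarity condition
    # grad_w g = 2 X^T (Xw - y) + 2 lam w = 0 gives
    #   dw*/dlam = -H^{-1} (d^2 g / dw dlam),
    # with H = 2 (X^T X + lam I) and cross term 2 w*; the factors of 2 cancel.
    w = inner_solution(lam)
    dw_dlam = -np.linalg.solve(X.T @ X + lam * np.eye(3), w)
    grad_w_outer = 2 * Xv.T @ (Xv @ w - yv)
    return dw_dlam @ grad_w_outer  # chain rule through w*(lam)

lam = 0.5
hg = implicit_hypergradient(lam)
```

A finite-difference check on outer_loss confirms the hypergradient; the same pattern applies when the inner problem is an RL objective (inverse RL, meta-RL), with the linear solve replaced by Hessian-vector products.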

Week of April 13

Challenges and opportunities

Week of April 20

Final project presentations


Academic life can sometimes be overwhelming. Don't hesitate to find support: