Publications
2025
- Maxwell's Demon at Work: Efficient Pruning by Leveraging Saturation of Neurons. Simon Dufort-Labbé, Pierluca D'Oro, Evgenii Nikishin, Irina Rish, Pierre-Luc Bacon, Razvan Pascanu, Aristide Baratin. Transactions on Machine Learning Research (TMLR) 2025.
- MaestroMotif: Skill Design from Artificial Intelligence Feedback. Martin Klissarov, Mikael Henaff, Roberta Raileanu, Shagun Sodhani, Pascal Vincent, Amy Zhang, Pierre-Luc Bacon, Doina Precup, Marlos C. Machado, Pierluca D'Oro. ICLR 2025.
- Scaling Trends in Language Model Robustness. Nikolaus H. R. Howe, Ian R. McKenzie, Oskar John Hollinsworth, Michal Zajac, Tom Tseng, Aaron David Tucker, Pierre-Luc Bacon, Adam Gleave. ICML 2025.
- Network Sparsity Unlocks the Scaling Potential of Deep Reinforcement Learning. Guozheng Ma, Lu Li, Zilin Wang, Li Shen, Pierre-Luc Bacon, Dacheng Tao. ICML 2025.
- Mol-MoE: Training Preference-Guided Routers for Molecule Generation. Diego Calanzone, Pierluca D'Oro, Pierre-Luc Bacon. CoRR abs/2502.05633, 2025.
- Understanding Behavioral Metric Learning: A Large-Scale Study on Distracting Reinforcement Learning Environments. Ziyan Luo, Tianwei Ni, Pierre-Luc Bacon, Doina Precup, Xujie Si. CoRR abs/2506.00563, 2025.
- State Entropy Regularization for Robust Reinforcement Learning. Yonatan Ashlag, Uri Koren, Mirco Mutti, Esther Derman, Pierre-Luc Bacon, Shie Mannor. CoRR abs/2506.07085, 2025.
- Stable Gradients for Stable Learning at Scale in Deep Reinforcement Learning. Roger Creus Castanyer, Johan S. Obando-Ceron, Lu Li, Pierre-Luc Bacon, Glen Berseth, Aaron C. Courville, Pablo Samuel Castro. CoRR abs/2506.15544, 2025.
- Robust Reinforcement Learning for Discrete Compositional Generation via General Soft Operators. Marco Jiralerspong, Esther Derman, Danilo Vucetic, Nikolay Malkin, Bilun Sun, Tianyu Zhang, Pierre-Luc Bacon, Gauthier Gidel. CoRR abs/2506.17007, 2025.
- Discovery of Sustainable Refrigerants through Physics-Informed RL Fine-Tuning of Sequence Models. Adrien Goldszal, Diego Calanzone, Vincent Taboga, Pierre-Luc Bacon. CoRR abs/2509.19588, 2025.
- Planning with Unified Multimodal Models. Yihao Sun, Zhilong Zhang, Yang Yu, Pierre-Luc Bacon. CoRR abs/2509.23014, 2025.
- The Three Regimes of Offline-to-Online Reinforcement Learning. Lu Li, Tianwei Ni, Yihao Sun, Pierre-Luc Bacon. CoRR abs/2510.01460, 2025.
- Long-Horizon Model-Based Offline Reinforcement Learning Without Conservatism. Tianwei Ni, Esther Derman, Vineet Jain, Vincent Taboga, Siamak Ravanbakhsh, Pierre-Luc Bacon. CoRR abs/2512.04341, 2025.
2024
- Neural differential equations for temperature control in buildings under demand response programs. Vincent Taboga, Clement Gehring, Mathieu Le Cam, Hanane Dagdougui, Pierre-Luc Bacon. Applied Energy, Volume 368, 2024.
- Do Transformer World Models Give Better Policy Gradients? Michel Ma, Tianwei Ni, Clement Gehring, Pierluca D'Oro, Pierre-Luc Bacon. ICML 2024.
- Maximum entropy GFlowNets with soft Q-learning. Sobhan Mohammadpour, Emmanuel Bengio, Emma Frejinger, Pierre-Luc Bacon. AISTATS 2024.
- Decoupling regularization from the action space. Sobhan Mohammadpour, Pierre-Luc Bacon, Emma Frejinger. ICLR 2024.
- Bridging State and History Representations: Understanding Self-Predictive RL. Tianwei Ni, Benjamin Eysenbach, Erfan Seyedsalehi, Michel Ma, Clement Gehring, Aditya Mahajan, Pierre-Luc Bacon. ICLR 2024.
- Course Correcting Koopman Representations. Mahan Fathi, Clement Gehring, Jonathan Pilault, David Kanaa, Pierre-Luc Bacon, Ross Goroshin. ICLR 2024.
- Motif: Intrinsic Motivation from Artificial Intelligence Feedback. Martin Klissarov, Pierluca D'Oro, Shagun Sodhani, Roberta Raileanu, Pierre-Luc Bacon, Pascal Vincent, Amy Zhang, Mikael Henaff. ICLR 2024.
- Generative Active Learning for the Search of Small-molecule Protein Binders. Maksym Korablyov, Cheng-Hao Liu, Moksh Jain, Almer M. van der Sloot, Eric Jolicoeur, Edward Ruediger, Andrei Cristian Nica, Emmanuel Bengio, Pierre-Luc Bacon et al. CoRR abs/2405.01616, 2024.
- Exploring Scaling Trends in LLM Robustness. Nikolaus H. R. Howe, Michal Zajac, Ian R. McKenzie, Oskar John Hollinsworth, Tom Tseng, Pierre-Luc Bacon, Adam Gleave. CoRR abs/2407.18213, 2024.
2023
- When Do Transformers Shine in RL? Decoupling Memory from Credit Assignment. Tianwei Ni, Michel Ma, Benjamin Eysenbach, Pierre-Luc Bacon. NeurIPS 2023 (oral).
- Block-State Transformers. Jonathan Pilault, Mahan Fathi, Orhan Firat, Christopher Pal, Pierre-Luc Bacon, Ross Goroshin. NeurIPS 2023 (poster).
- Policy Optimization in a Noisy Neighborhood: On Return Landscapes in Continuous Control. Nathan Rahn, Pierluca D'Oro, Harley Wiltzer, Pierre-Luc Bacon, Marc G Bellemare. NeurIPS 2023 (poster).
- Double Gumbel Q-Learning. David Yu-Tung Hui, Aaron Courville, Pierre-Luc Bacon. NeurIPS 2023 (spotlight).
- Sample-Efficient Reinforcement Learning by Breaking the Replay Ratio Barrier. Pierluca D'Oro, Max Schwarzer, Evgenii Nikishin, Pierre-Luc Bacon, Marc G Bellemare, Aaron Courville. ICLR 2023 (notable top 5%).
2022
- Myriad: a real-world testbed to bridge trajectory optimization and deep learning. Nikolaus H. R. Howe, Simon Dufort-Labbé, Nitarshan Rajkumar, Pierre-Luc Bacon. NeurIPS 2022 Datasets and Benchmarks
- The Primacy Bias in Deep Reinforcement Learning. Evgenii Nikishin*, Max Schwarzer*, Pierluca D'Oro*, Pierre-Luc Bacon, Aaron Courville. ICML 2022 and RLDM 2022
- Direct Behavior Specification via Constrained Reinforcement Learning. Julien Roy, Roger Girgis, Joshua Romoff, Pierre-Luc Bacon, Christopher Pal. ICML 2022
- Continuous-Time Meta-Learning with Forward Mode Differentiation. Tristan Deleu, David Kanaa, Leo Feng, Giancarlo Kerg, Yoshua Bengio, Guillaume Lajoie, Pierre-Luc Bacon. ICLR 2022.
2021
- Pierluca D'Oro, Pierre-Luc Bacon. Meta Dynamic Programming. NeurIPS workshop "Metacognition in the Age of AI: Challenges and Opportunities", 2021.
- Michel Ma, Pierluca D'Oro, Pierre-Luc Bacon. Long-Term Credit Assignment via Model-based Temporal Shortcuts. NeurIPS Deep Reinforcement Learning Workshop, 2021.
- Andreea Deac, Petar Veličković, Ognjen Milinković, Pierre-Luc Bacon, Jian Tang, Mladen Nikolić. Neural Algorithmic Reasoners are Implicit Planners. NeurIPS, 2021.
- Evgenii Nikishin, Romina Abachi, Rishabh Agarwal, Pierre-Luc Bacon. "Control-Oriented Model-Based Reinforcement Learning with Implicit Differentiation". AAAI, 2022. (arXiv)
2020
- Michel Ma, Pierre-Luc Bacon. Counterfactual Policy Evaluation and the Conditional Monte Carlo Method. NeurIPS Workshop on Offline Reinforcement Learning, 2020.
- Yao Liu, Pierre-Luc Bacon, Emma Brunskill. "Understanding the Curse of Horizon in Off-Policy Evaluation via Conditional Importance Sampling". Thirty-seventh International Conference on Machine Learning (ICML), 2020. (arXiv)
- Jean Harb, Tom Schaul, Doina Precup, Pierre-Luc Bacon. "Policy Evaluation Networks". In submission. (arXiv)
- Joshua Romoff, Peter Henderson, David Kanaa, Emmanuel Bengio, Ahmed Touati, Pierre-Luc Bacon, Joelle Pineau. "TDprop: Does Jacobi Preconditioning Help Temporal Difference Learning?". Theoretical Foundations of Reinforcement Learning workshop at ICML 2020. (arXiv)
- Khimya Khetarpal, Martin Klissarov, Maxime Chevalier-Boisvert, Pierre-Luc Bacon, Doina Precup. Options of Interest: Temporal Abstraction with Interest Functions. Thirty-fourth AAAI Conference On Artificial Intelligence (AAAI), 2020.
2019
- Pierre-Luc Bacon, Florian T. Schaefer, Clement Gehring, Animashree Anandkumar, Emma Brunskill. "A Lagrangian Method for Inverse Problems in Reinforcement Learning". NeurIPS 2019 Optimization Foundations for Reinforcement Learning Workshop
- Benjamin Petit, Loren Amdahl-Culleton, Yao Liu, Jimmy Smith, Pierre-Luc Bacon. "All-Action Policy Gradient Methods: A Numerical Integration Approach". NeurIPS 2019 Optimization Foundations for Reinforcement Learning Workshop.
- Pierre-Luc Bacon, Dilip Arumugam, Emma Brunskill. "Goal-Directed Learning as a Bi-level Optimization Problem". 4th Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM), 2019.
2018
- Pierre-Luc Bacon. "Temporal Representation Learning". PhD Thesis. McGill University, Montreal, June 2018.
- Pierre-Luc Bacon and Doina Precup. "Constructing Temporal Abstractions Autonomously in Reinforcement Learning". Association for the Advancement of Artificial Intelligence (AAAI). p. 39. 2018.
- Ahmed Touati, Pierre-Luc Bacon, Doina Precup, Pascal Vincent. "Convergent Tree-Backup and Retrace with Function Approximation". In proceedings of the 35th International Conference on Machine Learning (ICML), 2018. (arXiv)
- Anna Harutyunyan, Peter Vrancx, Pierre-Luc Bacon, Doina Precup, Ann Nowe. "Learning with Options that Terminate Off-Policy". Thirty-second AAAI Conference On Artificial Intelligence (AAAI), 2018. (arXiv)
- Jean Harb*, Pierre-Luc Bacon*, Martin Klissarov, Doina Precup. "When Waiting is not an Option: Learning Options with a Deliberation Cost". Thirty-second AAAI Conference On Artificial Intelligence (AAAI), 2018. (arXiv)
- Peter Henderson, Wei-Di Chang, Pierre-Luc Bacon, David Meger, Joelle Pineau, Doina Precup. "OptionGAN: Learning Joint Reward-Policy Options using Generative Adversarial Inverse Reinforcement Learning". Thirty-second AAAI Conference On Artificial Intelligence (AAAI), 2018. (arXiv)
- Daniel J. Mankowitz, Timothy Mann, Pierre-Luc Bacon, Shie Mannor, Doina Precup. "Learning Robust Options". Thirty-second AAAI Conference On Artificial Intelligence (AAAI), 2018.
2017
- Pierre-Luc Bacon, Doina Precup. "Unifying Multi-Step Methods through Matrix Splitting". 3rd Multidisciplinary Conference on Reinforcement Learning and Decision Making, 2017.
- Martin Klissarov, Pierre-Luc Bacon, Jean Harb, Doina Precup. "Learnings Options End-to-End for Continuous Action Tasks". Hierarchical Reinforcement Learning Workshop (NIPS), 2017.
- Pierre-Luc Bacon, Doina Precup. "A Unified View on Multi-Steps Methods using Matrix Splittings". Data Learning and Inference (DALI), 2017.
- Pierre-Luc Bacon, Jean Harb, Doina Precup. "The Option-Critic Architecture". Thirty-first AAAI Conference On Artificial Intelligence (AAAI), 2017. (arXiv, slides)
2016
- Pierre-Luc Bacon, Doina Precup. "A Matrix Splitting Perspective on Planning with Options". Continual Learning and Deep Networks Workshop, NIPS 2016. (poster)
- Doina Precup, Pierre-Luc Bacon. "Advances in Option Construction: The option-critic architecture". Abstraction in RL Workshop, ICML 2016. (video)
- Pierre-Luc Bacon and Doina Precup. "The good, the bad and the discovery: the specification problem of options discovery". 10th Barbados Workshop on Reinforcement Learning, 2016.
- Emmanuel Bengio, Pierre-Luc Bacon, Joelle Pineau, Doina Precup. "Conditional Computation in Neural Networks for faster models". CoRR abs/1511.06297, 2016.
2015
- Pierre-Luc Bacon and Doina Precup. "Learning with options: Just deliberate and relax". Bounded Optimality and Rational Metareasoning Workshop, NIPS 2015. (poster)
- Pierre-Luc Bacon and Doina Precup. "The option-critic architecture". Deep Reinforcement Learning Workshop, NIPS 2015. (poster)
- Pierre-Luc Bacon, Borja Balle and Doina Precup. "Learning and Planning with Timing Information in Markov Decision Processes". 31st Conference on Uncertainty in Artificial Intelligence (UAI), 2015. (poster)
- Joelle Pineau, Pierre-Luc Bacon. "Analyzing Open Data from the City of Montreal". 2nd ICML Workshop on Mining Urban Data (MUD), 2015.
- Pierre-Luc Bacon, Doina Precup. "Learning Recognizers". 9th Barbados Workshop on Reinforcement Learning, 2015. (slides)
- Pierre-Luc Bacon, Emmanuel Bengio, Doina Precup, Joelle Pineau. "Conditional computation in neural networks using a decision-theoretic approach". 2nd Multidisciplinary Conference on Reinforcement Learning and Decision Making, 2015.
- Pierre-Luc Bacon, Borja Balle and Doina Precup. "Learning and Planning with Timing Information in Markov Decision Processes". 2nd Multidisciplinary Conference on Reinforcement Learning and Decision Making, 2015.
2014
- Pierre-Luc Bacon, Borja Balle and Doina Precup. "Predictive Timing Models". 2014 NIPS Workshop "From Bad Models to Good Policies". (poster, slides, video)
2013
- Pierre-Luc Bacon and Doina Precup. "Using Label Propagation for Learning Temporally Abstract Actions in Reinforcement Learning". AAMAS Workshop on "Multiagent Interaction Networks", 2013.
- Pierre-Luc Bacon. "On the Bottleneck Concept for Options Discovery: Theoretical Underpinnings and Extension in Continuous State Spaces". Master's thesis, McGill University, 2013.