Bibliography

[1]

Christopher Paul Adams and Van Vu Brantner. Spending on new drug development. Health Economics, 19(2):130–141, February 2009. URL: http://dx.doi.org/10.1002/hec.1454, doi:10.1002/hec.1454.

[2]

E. L. Allgower and K. Georg. Numerical Continuation Methods: An Introduction. Volume 13 of Springer Series in Computational Mathematics. Springer-Verlag, Berlin, Heidelberg, 1990.

[3]

Kenneth J Arrow, Leonid Hurwicz, and Hirofumi Uzawa. Studies in linear and non-linear programming. Stanford University Press, 1958.

[4]

Jordan T Ash and Ryan P Adams. Warm-starting and amortization in continual learning. In International Conference on Learning Representations (ICLR). 2020.

[5]

Dimitri P. Bertsekas. Distributed asynchronous computation of fixed points. Mathematical Programming, 27(1):107–120, September 1983. URL: http://dx.doi.org/10.1007/BF02591967, doi:10.1007/bf02591967.

[6]

Mark Chang. Monte Carlo Simulation for the Pharmaceutical Industry: Concepts, Algorithms, and Case Studies. CRC Press, September 2010. ISBN 9780429152382. URL: http://dx.doi.org/10.1201/EBK1439835920, doi:10.1201/ebk1439835920.

[7]

L. Chen and others. A review of real-world deployments of reinforcement learning and MPC in HVAC systems. 2025. White paper, accessed July 2025. URL: https://example.com/hvac-rl-review.

[8]

B. Coffey, K. Knudsen, M. Guo, and E. Haves. Reduced-order residential home modeling for model predictive control. Technical Report LBNL-4064E, Lawrence Berkeley National Laboratory, 2010. URL: https://www.nrel.gov/docs/fy10osti/47505.pdf.

[9]

Michael J. Conroy and James T. Peterson. Decision Making in Natural Resource Management: A Structured, Adaptive Approach. Wiley, January 2013. ISBN 9781118506196. URL: http://dx.doi.org/10.1002/9781118506196, doi:10.1002/9781118506196.

[10]

Pierluca D'Oro, Max Schwarzer, Evgenii Nikishin, Pierre-Luc Bacon, Marc G. Bellemare, and Aaron C. Courville. Sample-efficient reinforcement learning by breaking the replay ratio barrier. In The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net, 2023.

[11]

Damien Ernst, Pierre Geurts, and Louis Wehenkel. Tree-based batch mode reinforcement learning. Journal of Machine Learning Research, 6:503–556, 2005.

[12]

Damien Ernst, Pierre Geurts, and Louis Wehenkel. Tree-based batch mode reinforcement learning. J. Mach. Learn. Res., 6:503–556, 2005. URL: https://jmlr.org/papers/v6/ernst05a.html.

[13]

D. Evans and DeepMind. DeepMind AI reduces Google data centre cooling bill by 40%. 2018. URL: https://www.deepmind.com/blog/deepmind-ai-reduces-google-data-centre-cooling-bill.

[14]

Scott Fujimoto, Herke van Hoof, and David Meger. Addressing function approximation error in actor-critic methods. In International Conference on Machine Learning (ICML), 1587–1596. 2018.

[15]

Matthieu Geist, Bruno Scherrer, and Olivier Pietquin. A theory of regularized Markov decision processes. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, 2160–2169. PMLR, 09–15 Jun 2019. URL: https://proceedings.mlr.press/v97/geist19a.html.

[16]

Pierre Geurts, Damien Ernst, and Louis Wehenkel. Extremely randomized trees. Machine Learning, 63(1):3–42, March 2006. URL: http://dx.doi.org/10.1007/s10994-006-6226-1, doi:10.1007/s10994-006-6226-1.

[17]

Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, volume 27. 2014.

[18]

Geoffrey J. Gordon. Stable function approximation in dynamic programming. In Proceedings of the Twelfth International Conference on Machine Learning, ICML'95, 261–268. San Francisco, CA, USA, 1995. Morgan Kaufmann Publishers Inc.

[19]

Alexandra Ivanova Grancharova and Tor Arne Johansen. Explicit Nonlinear Model Predictive Control. Lecture Notes in Control and Information Sciences. Springer, Berlin, Germany, 2012.

[20]

J.T. Gravdahl and O. Egeland. Compressor surge control using a close-coupled valve and backstepping. In Proceedings of the 1997 American Control Conference (Cat. No.97CH36041), 982–986 vol.2. IEEE, 1997. URL: http://dx.doi.org/10.1109/ACC.1997.609673, doi:10.1109/acc.1997.609673.

[21]

Andreas Griewank. On automatic differentiation. Mathematical Programming: Recent Developments and Applications, 1989.

[22]

Tuomas Haarnoja, Haoran Tang, Pieter Abbeel, and Sergey Levine. Reinforcement learning with deep energy-based policies. Proceedings of the 34th International Conference on Machine Learning, 70:1352–1361, 2017.

[23]

Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor. In Proceedings of the 35th International Conference on Machine Learning (ICML), 1861–1870. PMLR, 2018.

[24]

Roland Hafner and Martin Riedmiller. Reinforcement learning in feedback control: challenges and benchmarks from technical process control. Machine Learning, 84(1-2):137–169, February 2011. URL: http://dx.doi.org/10.1007/s10994-011-5235-x, doi:10.1007/s10994-011-5235-x.

[25]

Warren A. Hall and William S. Butcher. Optimal timing of irrigation. Journal of the Irrigation and Drainage Division, 94(2):267–275, June 1968. URL: http://dx.doi.org/10.1061/JRCEA4.0000569, doi:10.1061/jrcea4.0000569.

[26]

John H Holland. Genetic algorithms. Scientific American, 267(1):66–73, 1992.

[27]

Fedor Iskhakov, John Rust, and Bertel Schjerning. Machine learning and structural econometrics: contrasts and synergies. The Econometrics Journal, 23(3):S81–S124, August 2020. URL: http://dx.doi.org/10.1093/ectj/utaa019, doi:10.1093/ectj/utaa019.

[28]

James Kennedy and Russell Eberhart. Particle swarm optimization. In Proceedings of ICNN'95-International Conference on Neural Networks, volume 4, 1942–1948. IEEE, 1995.

[29]

Scott Kirkpatrick, C Daniel Gelatt Jr, and Mario P Vecchi. Optimization by simulated annealing. Science, 220(4598):671–680, 1983.

[30]

J. Kleijnen and others. Scoping review of prospective evaluations of AI in healthcare decision-making. The Lancet Digital Health, 6(3):e200–e212, 2024.

[31]

Samuel Kortum. Value function approximation in an estimation routine. 1992. Manuscript, Boston University.

[32]

Yann LeCun. A theoretical framework for back-propagation. Proceedings of the 1988 Connectionist Models Summer School, pages 21–28, 1988.

[33]

Sergey Levine, Aviral Kumar, George Tucker, and Justin Fu. Reinforcement learning as a framework for control: a survey. arXiv preprint arXiv:1806.04222, 2018.

[34]

Timothy P Lillicrap, Jonathan J Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971, 2015.

[35]

Long-Ji Lin. Self-improving reactive agents based on reinforcement learning, planning, and teaching. PhD thesis, Carnegie Mellon University, Pittsburgh, PA, USA, 1992. Technical Report CMU-CS-92-170.

[36]

Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, and others. Playing Atari with deep reinforcement learning. In NIPS Deep Learning Workshop. 2013.

[37]

Dirk Ormoneit and Śaunak Sen. Kernel-based reinforcement learning. Machine Learning, 49(2/3):161–178, 2002. URL: http://dx.doi.org/10.1023/A:1017928328829, doi:10.1023/a:1017928328829.

[38]

J. M. Ortega and W. C. Rheinboldt. Iterative Solution of Nonlinear Equations in Several Variables. Computer Science and Applied Mathematics. Academic Press, New York, 1970.

[39]

Lev Semyonovich Pontryagin, Vladimir Grigor'evich Boltyanskii, Revaz Valerianovich Gamkrelidze, and Evgenii Frolovich Mishchenko. The Mathematical Theory of Optimal Processes. Interscience Publishers, 1962.

[40]

Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, New York, 1994. ISBN 978-0-471-61977-3.

[41]

Martin Riedmiller. Neural fitted Q iteration – first experiences with a data efficient neural reinforcement learning method. In Proceedings of the 16th European Conference on Machine Learning (ECML), 317–328. Berlin, Heidelberg, 2005. Springer.

[42]

Martin A. Riedmiller. Neural fitted Q iteration – first experiences with a data efficient neural reinforcement learning method. In João Gama, Rui Camacho, Pavel Brazdil, Alípio Jorge, and Luís Torgo, editors, Machine Learning: ECML 2005, 16th European Conference on Machine Learning, Porto, Portugal, October 3-7, 2005, Proceedings, volume 3720 of Lecture Notes in Computer Science, 317–328. Springer, 2005. URL: https://doi.org/10.1007/11564096_32, doi:10.1007/11564096_32.

[43]

David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. Learning representations by back-propagating errors. Nature, 323(6088):533–536, 1986.

[44]

John Rust. Optimal replacement of GMC bus engines: an empirical model of Harold Zurcher. Econometrica, 55(5):999–1033, 1987.

[45]

John Rust. Numerical dynamic programming in economics. In Handbook of Computational Economics, volume 1, chapter 14, pages 619–729. Elsevier, 1996. URL: http://dx.doi.org/10.1016/S1574-0021(96)01016-7, doi:10.1016/s1574-0021(96)01016-7.

[46]

Y. Sawaguchi, E. Furutani, G. Shirakami, M. Araki, and K. Fukuda. A model-predictive hypnosis control system under total intravenous anesthesia. IEEE Transactions on Biomedical Engineering, 55(3):874–887, March 2008. URL: http://dx.doi.org/10.1109/tbme.2008.915670, doi:10.1109/tbme.2008.915670.

[47]

Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 2nd edition, 2018. ISBN 978-0262039246. URL: http://incompleteideas.net/book/the-book-2nd.html.

[48]

Hado Van Hasselt, Arthur Guez, and David Silver. Deep reinforcement learning with double q-learning. Proceedings of the AAAI Conference on Artificial Intelligence, 2016.

[49]

Brian D. Ziebart, Andrew L. Maas, J. Andrew Bagnell, and Anind K. Dey. Maximum entropy inverse reinforcement learning. In Proceedings of the 23rd AAAI Conference on Artificial Intelligence, 1433–1438. 2008.

[50]

ANYbotics. ANYmal: RL-powered autonomous inspection. 2023. Company product documentation. URL: https://www.anybotics.com/anymal-inspection.

[51]

Meta AI. Reinforcement learning for sustainable cooling in data centers. 2024. Meta Engineering Blog, accessed July 2025. URL: https://engineering.fb.com/2024/09/10/data-center/rl-sustainable-cooling.

[52]

U.S. Food and Drug Administration. Artificial intelligence and machine learning (AI/ML)-enabled medical devices. 2025. Accessed July 2025. URL: https://www.fda.gov/medical-devices/software-medical-device-samd/artificial-intelligence-and-machine-learning-aiml-enabled-medical-devices.

[53]

Uber AI Labs. Reinforcement learning at scale for ride-matching optimization. 2025. Uber Engineering Blog, accessed July 2025. URL: https://www.uber.com/blog/rl-ride-matching.