Show cover of TalkRL: The Reinforcement Learning Podcast

TalkRL: The Reinforcement Learning Podcast

TalkRL podcast is All Reinforcement Learning, All the Time. In-depth interviews with brilliant people at the forefront of RL research and practice. Guests from places like MILA, OpenAI, MIT, DeepMind, Berkeley, Amii, Oxford, Google Research, Brown, Waymo, Caltech, and Vector Institute. Hosted by Robin Ranjit Singh Chauhan.

Tracks

Vincent Moens on TorchRL
Dr. Vincent Moens is an Applied Machine Learning Research Scientist at Meta, and an author of TorchRL and TensorDict in pytorch.  Featured References TorchRL: A data-driven decision-making library for PyTorch Albert Bou, Matteo Bettini, Sebastian Dittert, Vikash Kumar, Shagun Sodhani, Xiaomeng Yang, Gianni De Fabritiis, Vincent Moens  Additional References  TorchRL on github  TensorDict Documentation  
40:14 4/8/24
Arash Ahmadian on Rethinking RLHF
Arash Ahmadian is a Researcher at Cohere and Cohere For AI focussed on Preference Training of large language models. He’s also a researcher at the Vector Institute of AI.Featured ReferenceBack to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMsArash Ahmadian, Chris Cremer, Matthias Gallé, Marzieh Fadaee, Julia Kreutzer, Olivier Pietquin, Ahmet Üstün, Sara HookerAdditional ReferencesSelf-Rewarding Language Models, Yuan et al 2024 Reinforcement Learning: An Introduction, Sutton and Barto 1992Learning from Delayed Rewards, Chris Watkins 1989Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, Williams 1992
33:30 3/25/24
Glen Berseth on RL Conference
Glen Berseth is an assistant professor at the Université de Montréal, a core academic member of the Mila - Quebec AI Institute, a Canada CIFAR AI chair, member l'Institute Courtios, and co-director of the Robotics and Embodied AI Lab (REAL).  Featured Links  Reinforcement Learning Conference  Closing the Gap between TD Learning and Supervised Learning--A Generalisation Point of View Raj Ghugare, Matthieu Geist, Glen Berseth, Benjamin Eysenbach
21:38 3/11/24
Ian Osband
Ian Osband is a Research scientist at OpenAI (ex DeepMind, Stanford) working on decision making under uncertainty.  We spoke about: - Information theory and RL - Exploration, epistemic uncertainty and joint predictions - Epistemic Neural Networks and scaling to LLMs Featured References  Reinforcement Learning, Bit by Bit  Xiuyuan Lu, Benjamin Van Roy, Vikranth Dwaracherla, Morteza Ibrahimi, Ian Osband, Zheng Wen  From Predictions to Decisions: The Importance of Joint Predictive Distributions Zheng Wen, Ian Osband, Chao Qin, Xiuyuan Lu, Morteza Ibrahimi, Vikranth Dwaracherla, Mohammad Asghari, Benjamin Van Roy   Epistemic Neural Networks Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Morteza Ibrahimi, Xiuyuan Lu, Benjamin Van Roy  Approximate Thompson Sampling via Epistemic Neural Networks Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Morteza Ibrahimi, Xiuyuan Lu, Benjamin Van Roy   Additional References  Thesis defence, Ian Osband Homepage, Ian Osband Epistemic Neural Networks at Stanford RL Forum Behaviour Suite for Reinforcement Learning, Osband et al 2019 Efficient Exploration for LLMs, Dwaracherla et al 2024 
68:26 3/7/24
Sharath Chandra Raparthy
Sharath Chandra Raparthy on In-Context Learning for Sequential Decision Tasks, GFlowNets, and more!  Sharath Chandra Raparthy is an AI Resident at FAIR at Meta, and did his Master's at Mila.  Featured Reference  Generalization to New Sequential Decision Making Tasks with In-Context Learning   Sharath Chandra Raparthy , Eric Hambro, Robert Kirk , Mikael Henaff, , Roberta Raileanu  Additional References  Sharath Chandra Raparthy Homepage  Human-Timescale Adaptation in an Open-Ended Task Space, Adaptive Agent Team 2023Data Distributional Properties Drive Emergent In-Context Learning in Transformers, Chan et al 2022  Decision Transformer: Reinforcement Learning via Sequence Modeling, Chen et al  2021
40:41 2/12/24
Pierluca D'Oro and Martin Klissarov
Pierluca D'Oro and Martin Klissarov on Motif and RLAIF, Noisy Neighborhoods and Return Landscapes, and more!  Pierluca D'Oro is PhD student at Mila and visiting researcher at Meta.Martin Klissarov is a PhD student at Mila and McGill and research scientist intern at Meta.  Featured References  Motif: Intrinsic Motivation from Artificial Intelligence Feedback  Martin Klissarov*, Pierluca D'Oro*, Shagun Sodhani, Roberta Raileanu, Pierre-Luc Bacon, Pascal Vincent, Amy Zhang, Mikael Henaff  Policy Optimization in a Noisy Neighborhood: On Return Landscapes in Continuous Control  Nate Rahn*, Pierluca D'Oro*, Harley Wiltzer, Pierre-Luc Bacon, Marc G. Bellemare  To keep doing RL research, stop calling yourself an RL researcher Pierluca D'Oro 
57:24 11/13/23
Martin Riedmiller
Martin Riedmiller of Google DeepMind on controlling nuclear fusion plasma in a tokamak with RL, the original Deep Q-Network, Neural Fitted Q-Iteration, Collect and Infer, AGI for control systems, and tons more!  Martin Riedmiller is a research scientist and team lead at DeepMind.   Featured References   Magnetic control of tokamak plasmas through deep reinforcement learning  Jonas Degrave, Federico Felici, Jonas Buchli, Michael Neunert, Brendan Tracey, Francesco Carpanese, Timo Ewalds, Roland Hafner, Abbas Abdolmaleki, Diego de las Casas, Craig Donner, Leslie Fritz, Cristian Galperti, Andrea Huber, James Keeling, Maria Tsimpoukelli, Jackie Kay, Antoine Merle, Jean-Marc Moret, Seb Noury, Federico Pesamosca, David Pfau, Olivier Sauter, Cristian Sommariva, Stefano Coda, Basil Duval, Ambrogio Fasoli, Pushmeet Kohli, Koray Kavukcuoglu, Demis Hassabis & Martin Riedmiller Human-level control through deep reinforcement learning Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, Demis Hassabis  Neural fitted Q iteration–first experiences with a data efficient neural reinforcement learning method Martin Riedmiller  
73:56 8/22/23
Max Schwarzer
Max Schwarzer is a PhD student at Mila, with Aaron Courville and Marc Bellemare, interested in RL scaling, representation learning for RL, and RL for science.  Max spent the last 1.5 years at Google Brain/DeepMind, and is now at Apple Machine Learning Research.   Featured References Bigger, Better, Faster: Human-level Atari with human-level efficiency  Max Schwarzer, Johan Obando-Ceron, Aaron Courville, Marc Bellemare, Rishabh Agarwal, Pablo Samuel Castro  Sample-Efficient Reinforcement Learning by Breaking the Replay Ratio Barrier Pierluca D'Oro, Max Schwarzer, Evgenii Nikishin, Pierre-Luc Bacon, Marc G Bellemare, Aaron Courville  The Primacy Bias in Deep Reinforcement Learning Evgenii Nikishin, Max Schwarzer, Pierluca D'Oro, Pierre-Luc Bacon, Aaron Courville  Additional References   Rainbow: Combining Improvements in Deep Reinforcement Learning, Hessel et al 2017  When to use parametric models in reinforcement learning? Hasselt et al 2019 Data-Efficient Reinforcement Learning with Self-Predictive Representations, Schwarzer et al 2020  Pretraining Representations for Data-Efficient Reinforcement Learning, Schwarzer et al 2021  
70:18 8/8/23
Julian Togelius
Julian Togelius is an Associate Professor of Computer Science and Engineering at NYU, and Cofounder and research director at modl.ai  Featured References  Choose Your Weapon: Survival Strategies for Depressed AI AcademicsJulian Togelius, Georgios N. YannakakisLearning Controllable 3D Level GeneratorsZehua Jiang, Sam Earle, Michael Cerny Green, Julian TogeliusPCGRL: Procedural Content Generation via Reinforcement LearningAhmed Khalifa, Philip Bontrager, Sam Earle, Julian TogeliusIlluminating Generalization in Deep Reinforcement Learning through Procedural Level GenerationNiels Justesen, Ruben Rodriguez Torrado, Philip Bontrager, Ahmed Khalifa, Julian Togelius, Sebastian Risi
40:04 7/25/23
Jakob Foerster
Jakob Foerster on Multi-Agent learning, Cooperation vs Competition, Emergent Communication, Zero-shot coordination, Opponent Shaping, agents for Hanabi and Prisoner's Dilemma, and more.  Jakob Foerster is an Associate Professor at University of Oxford.  Featured References  Learning with Opponent-Learning Awareness Jakob N. Foerster, Richard Y. Chen, Maruan Al-Shedivat, Shimon Whiteson, Pieter Abbeel, Igor Mordatch  Model-Free Opponent Shaping Chris Lu, Timon Willi, Christian Schroeder de Witt, Jakob Foerster  Off-Belief Learning Hengyuan Hu, Adam Lerer, Brandon Cui, David Wu, Luis Pineda, Noam Brown, Jakob Foerster  Learning to Communicate with Deep Multi-Agent Reinforcement Learning Jakob N. Foerster, Yannis M. Assael, Nando de Freitas, Shimon Whiteson  Adversarial Cheap Talk Chris Lu, Timon Willi, Alistair Letcher, Jakob Foerster  Cheap Talk Discovery and Utilization in Multi-Agent Reinforcement Learning Yat Long Lo, Christian Schroeder de Witt, Samuel Sokota, Jakob Nicolaus Foerster, Shimon Whiteson  Additional References  Lectures by Jakob on youtube 
63:45 5/8/23
Danijar Hafner 2
Danijar Hafner on the DreamerV3 agent and world models, the Director agent and heirarchical RL,  realtime RL on robots with DayDreamer, and his framework for unsupervised agent design! Danijar Hafner is a PhD candidate at the University of Toronto with Jimmy Ba, a visiting student at UC Berkeley with Pieter Abbeel, and an intern at DeepMind.  He has been our guest before back on episode 11.  Featured References   Mastering Diverse Domains through World Models [ blog ] DreaverV3 Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap  DayDreamer: World Models for Physical Robot Learning [ blog ]  Philipp Wu, Alejandro Escontrela, Danijar Hafner, Ken Goldberg, Pieter Abbeel  Deep Hierarchical Planning from Pixels [ blog ]  Danijar Hafner, Kuang-Huei Lee, Ian Fischer, Pieter Abbeel   Action and Perception as Divergence Minimization [ blog ]  Danijar Hafner, Pedro A. Ortega, Jimmy Ba, Thomas Parr, Karl Friston, Nicolas Heess  Additional References  Mastering Atari with Discrete World Models [ blog ] DreaverV2 ; Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, Jimmy Ba  Dream to Control: Learning Behaviors by Latent Imagination [ blog ] Dreamer ; Danijar Hafner, Timothy Lillicrap, Jimmy Ba, Mohammad Norouzi  Planning to Explore via Self-Supervised World Models ; Ramanan Sekar, Oleh Rybkin, Kostas Daniilidis, Pieter Abbeel, Danijar Hafner, Deepak Pathak  
45:21 4/12/23
Jeff Clune
AI Generating Algos, Learning to play Minecraft with Video PreTraining (VPT), Go-Explore for hard exploration, POET and Open Endedness, AI-GAs and ChatGPT, AGI predictions, and lots more!  Professor Jeff Clune is Associate Professor of Computer Science at University of British Columbia, a Canada CIFAR AI Chair and Faculty Member at Vector Institute, and Senior Research Advisor at DeepMind.  Featured References  Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos [ Blog Post ] Bowen Baker, Ilge Akkaya, Peter Zhokhov, Joost Huizinga, Jie Tang, Adrien Ecoffet, Brandon Houghton, Raul Sampedro, Jeff Clune  Robots that can adapt like animals Antoine Cully, Jeff Clune, Danesh Tarapore, Jean-Baptiste Mouret  Illuminating search spaces by mapping elites Jean-Baptiste Mouret, Jeff Clune  Enhanced POET: Open-Ended Reinforcement Learning through Unbounded Invention of Learning Challenges and their Solutions Rui Wang, Joel Lehman, Aditya Rawal, Jiale Zhi, Yulun Li, Jeff Clune, Kenneth O. Stanley  Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their Solutions Rui Wang, Joel Lehman, Jeff Clune, Kenneth O. Stanley  First return, then explore Adrien Ecoffet, Joost Huizinga, Joel Lehman, Kenneth O. Stanley, Jeff Clune
71:11 3/27/23
Natasha Jaques 2
Hear about why OpenAI cites her work in RLHF and dialog models, approaches to rewards in RLHF, ChatGPT, Industry vs Academia, PsiPhi-Learning, AGI and more!  Dr Natasha Jaques is a Senior Research Scientist at Google Brain. Featured References Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog Natasha Jaques, Asma Ghandeharioun, Judy Hanwen Shen, Craig Ferguson, Agata Lapedriza, Noah Jones, Shixiang Gu, Rosalind Picard  Sequence Tutor: Conservative Fine-Tuning of Sequence Generation Models with KL-control Natasha Jaques, Shixiang Gu, Dzmitry Bahdanau, José Miguel Hernández-Lobato, Richard E. Turner, Douglas Eck  PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning Angelos Filos, Clare Lyle, Yarin Gal, Sergey Levine, Natasha Jaques, Gregory Farquhar  Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience Marwa Abdulhai, Natasha Jaques, Sergey Levine  Additional References  Fine-Tuning Language Models from Human Preferences, Daniel M. Ziegler et al 2019  Learning to summarize from human feedback, Nisan Stiennon et al 2020  Training language models to follow instructions with human feedback, Long Ouyang et al 2022  
46:02 3/14/23
Jacob Beck and Risto Vuorio
Jacob Beck and Risto Vuorio on their recent Survey of Meta-Reinforcement Learning.  Jacob and Risto are Ph.D. students at Whiteson Research Lab at University of Oxford.    Featured Reference   A Survey of Meta-Reinforcement LearningJacob Beck, Risto Vuorio, Evan Zheran Liu, Zheng Xiong, Luisa Zintgraf, Chelsea Finn, Shimon Whiteson   Additional References  VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning, Luisa Zintgraf et al  Mastering Diverse Domains through World Models (Dreamerv3), Hafner et al    Unsupervised Meta-Learning for Reinforcement Learning (MAML), Gupta et al  Decoupling Exploration and Exploitation for Meta-Reinforcement Learning without Sacrifices (DREAM), Liu et al  RL2: Fast Reinforcement Learning via Slow Reinforcement Learning, Duan et al  Learning to reinforcement learn, Wang et al  
67:05 3/7/23
John Schulman
John Schulman is a cofounder of OpenAI, and currently a researcher and engineer at OpenAI.Featured ReferencesWebGPT: Browser-assisted question-answering with human feedbackReiichiro Nakano, Jacob Hilton, Suchir Balaji, Jeff Wu, Long Ouyang, Christina Kim, Christopher Hesse, Shantanu Jain, Vineet Kosaraju, William Saunders, Xu Jiang, Karl Cobbe, Tyna Eloundou, Gretchen Krueger, Kevin Button, Matthew Knight, Benjamin Chess, John SchulmanTraining language models to follow instructions with human feedbackLong Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan LoweAdditional ReferencesOur approach to alignment research, OpenAI 2022Training Verifiers to Solve Math Word Problems, Cobbe et al 2021UC Berkeley Deep RL Bootcamp Lecture 6: Nuts and Bolts of Deep RL Experimentation, John Schulman 2017Proximal Policy Optimization Algorithms, Schulman 2017Optimizing Expectations: From Deep Reinforcement Learning to Stochastic Computation Graphs, Schulman 2016
44:21 10/18/22
Sven Mika
Sven Mika is the Reinforcement Learning Team Lead at Anyscale, and lead committer of RLlib. He holds a PhD in biomathematics, bioinformatics, and computational biology from Witten/Herdecke University. Featured ReferencesRLlib Documentation: RLlib: Industry-Grade Reinforcement LearningRay: DocumentationRLlib: Abstractions for Distributed Reinforcement LearningEric Liang, Richard Liaw, Philipp Moritz, Robert Nishihara, Roy Fox, Ken Goldberg, Joseph E. Gonzalez, Michael I. Jordan, Ion StoicaEpisode sponsor: AnyscaleRay Summit 2022 is coming to San Francisco on August 23-24.Hear how teams at Dow, Verizon, Riot Games, and more are solving their RL challenges with Ray's RLlib.Register at raysummit.org and use code RAYSUMMIT22RL for a further 25% off the already reduced prices.
34:56 8/19/22
Karol Hausman and Fei Xia
Karol Hausman is a Senior Research Scientist at Google Brain and an Adjunct Professor at Stanford working on robotics and machine learning. Karol is interested in enabling robots to acquire general-purpose skills with minimal supervision in real-world environments. Fei Xia is a Research Scientist with Google Research. Fei Xia is mostly interested in robot learning in complex and unstructured environments. Previously he has been approaching this problem by learning in realistic and scalable simulation environments (GibsonEnv, iGibson). Most recently, he has been exploring using foundation models for those challenges.Featured ReferencesDo As I Can, Not As I Say: Grounding Language in Robotic Affordances [ website ] Michael Ahn, Anthony Brohan, Noah Brown, Yevgen Chebotar, Omar Cortes, Byron David, Chelsea Finn, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog, Daniel Ho, Jasmine Hsu, Julian Ibarz, Brian Ichter, Alex Irpan, Eric Jang, Rosario Jauregui Ruano, Kyle Jeffrey, Sally Jesmonth, Nikhil J Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Kuang-Huei Lee, Sergey Levine, Yao Lu, Linda Luu, Carolina Parada, Peter Pastor, Jornell Quiambao, Kanishka Rao, Jarek Rettinghouse, Diego Reyes, Pierre Sermanet, Nicolas Sievers, Clayton Tan, Alexander Toshev, Vincent Vanhoucke, Fei Xia, Ted Xiao, Peng Xu, Sichun Xu, Mengyuan YanInner Monologue: Embodied Reasoning through Planning with Language ModelsWenlong Huang, Fei Xia, Ted Xiao, Harris Chan, Jacky Liang, Pete Florence, Andy Zeng, Jonathan Tompson, Igor Mordatch, Yevgen Chebotar, Pierre Sermanet, Noah Brown, Tomas Jackson, Linda Luu, Sergey Levine, Karol Hausman, Brian IchterAdditional ReferencesLarge-scale simulation for embodied perception and robot learning, Xia 2021QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation, Kalashnikov et al 2018MT-Opt: Continuous Multi-Task Robotic Reinforcement Learning at Scale, Kalashnikov et al 2021ReLMoGen: Leveraging Motion Generation in Reinforcement Learning for Mobile Manipulation, Xia et al 2020Actionable Models: Unsupervised Offline Reinforcement Learning of Robotic Skills, Chebotar et al 2021  Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language, Zeng et al 2022Episode sponsor: AnyscaleRay Summit 2022 is coming to San Francisco on August 23-24.Hear how teams at Dow, Verizon, Riot Games, and more are solving their RL challenges with Ray's RLlib.Register at raysummit.org and use code RAYSUMMIT22RL for a further 25% off the already reduced prices.
63:09 8/16/22
Sai Krishna Gottipati
Saikrishna Gottipati is an RL Researcher at AI Redefined, working on RL, MARL, human in the loop learning.Featured ReferencesCogment: Open Source Framework For Distributed Multi-actor Training, Deployment & OperationsAI Redefined, Sai Krishna Gottipati, Sagar Kurandwad, Clodéric Mars, Gregory Szriftgiser, François ChabotDo As You Teach: A Multi-Teacher Approach to Self-Play in Deep Reinforcement LearningCurrently under reviewLearning to navigate the synthetically accessible chemical space using reinforcement learningSai Krishna Gottipati, Boris Sattarov, Sufeng Niu, Yashaswi Pathak, Haoran Wei, Shengchao Liu, Karam J. Thomas, Simon Blackburn, Connor W. Coley, Jian Tang, Sarath Chandar, Yoshua BengioAdditional ReferencesAsymmetric self-play for automatic goal discovery in robotic manipulation, 2021 OpenAI et al Continuous Coordination As a Realistic Scenario for Lifelong Learning, 2021 Nekoei et alEpisode sponsor: AnyscaleRay Summit 2022 is coming to San Francisco on August 23-24.Hear how teams at Dow, Verizon, Riot Games, and more are solving their RL challenges with Ray's RLlib.Register at raysummit.org and use code RAYSUMMIT22RL for a further 25% off the already reduced prices.
68:11 8/1/22
Aravind Srinivas 2
Aravind Srinivas is back!  He is now a research Scientist at OpenAI.Featured ReferencesDecision Transformer: Reinforcement Learning via Sequence ModelingLili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor MordatchVideoGPT: Video Generation using VQ-VAE and TransformersWilson Yan, Yunzhi Zhang, Pieter Abbeel, Aravind Srinivas
58:33 5/9/22
Rohin Shah
Dr. Rohin Shah is a Research Scientist at DeepMind, and the editor and main contributor of the Alignment Newsletter.Featured ReferencesThe MineRL BASALT Competition on Learning from Human FeedbackRohin Shah, Cody Wild, Steven H. Wang, Neel Alex, Brandon Houghton, William Guss, Sharada Mohanty, Anssi Kanervisto, Stephanie Milani, Nicholay Topin, Pieter Abbeel, Stuart Russell, Anca DraganPreferences Implicit in the State of the WorldRohin Shah, Dmitrii Krasheninnikov, Jordan Alexander, Pieter Abbeel, Anca DraganBenefits of Assistance over Reward Learning Rohin Shah, Pedro Freire, Neel Alex, Rachel Freedman, Dmitrii Krasheninnikov, Lawrence Chan, Michael D Dennis, Pieter Abbeel, Anca Dragan, Stuart RussellOn the Utility of Learning about Humans for Human-AI CoordinationMicah Carroll, Rohin Shah, Mark K. Ho, Thomas L. Griffiths, Sanjit A. Seshia, Pieter Abbeel, Anca DraganEvaluating the Robustness of Collaborative AgentsPaul Knott, Micah Carroll, Sam Devlin, Kamil Ciosek, Katja Hofmann, A. D. Dragan, Rohin ShahAdditional ReferencesAGI Safety Fundamentals, EA Cambridge
97:04 4/12/22
Jordan Terry
Jordan Terry is a PhD candidate at University of Maryland, the maintainer of Gym, the maintainer and creator of PettingZoo and the founder of Swarm Labs.Featured ReferencesPettingZoo: Gym for Multi-Agent Reinforcement LearningJ. K. Terry, Benjamin Black, Nathaniel Grammel, Mario Jayakumar, Ananth Hari, Ryan Sullivan, Luis Santos, Rodrigo Perez, Caroline Horsch, Clemens Dieffendahl, Niall L. Williams, Yashas Lokesh, Praveen RaviPettingZoo on Githubgym on GithubAdditional ReferencesTime Limits in Reinforcement Learning, Pardo et al 2017Deep Reinforcement Learning at the Edge of the Statistical Precipice, Agarwal et al 2021
63:48 2/22/22
Robert Lange
Robert Tjarko Lange is a PhD student working at the Technical University Berlin.Featured ReferencesLearning not to learn: Nature versus nurture in silicoLange, R. T., & Sprekeler, H. (2020)On Lottery Tickets and Minimal Task Representations in Deep Reinforcement LearningVischer, M. A., Lange, R. T., & Sprekeler, H. (2021). Semantic RL with Action Grammars: Data-Efficient Learning of Hierarchical Task AbstractionsLange, R. T., & Faisal, A. (2019).MLE-Infrastructure on GithubAdditional ReferencesRL^2: Fast Reinforcement Learning via Slow Reinforcement Learning, Duan et al 2016Learning to reinforcement learn, Wang et al 2016Decision Transformer: Reinforcement Learning via Sequence Modeling, Chen et al 2021
70:57 12/20/21
NeurIPS 2021 Political Economy of Reinforcement Learning Systems (PERLS) Workshop
We hear about the idea of PERLS and why its important to talk about.Political Economy of Reinforcement Learning (PERLS) Workshop at NeurIPS 2021 on Tues Dec 14th NeurIPS 2021
24:07 11/18/21
Amy Zhang
Amy Zhang is a postdoctoral scholar at UC Berkeley and a research scientist at Facebook AI Research. She will be starting as an assistant professor at UT Austin in Spring 2023. Featured References Invariant Causal Prediction for Block MDPs Amy Zhang, Clare Lyle, Shagun Sodhani, Angelos Filos, Marta Kwiatkowska, Joelle Pineau, Yarin Gal, Doina Precup Multi-Task Reinforcement Learning with Context-based Representations Shagun Sodhani, Amy Zhang, Joelle Pineau MBRL-Lib: A Modular Library for Model-based Reinforcement Learning Luis Pineda, Brandon Amos, Amy Zhang, Nathan O. Lambert, Roberto Calandra Additional References Amy Zhang - Exploring Context for Better Generalization in Reinforcement Learning @ UCL DARK ICML 2020 Poster session: Invariant Causal Prediction for Block MDPs Clare Lyle - Invariant Prediction for Generalization in Reinforcement Learning @ Simons Institute 
69:35 9/27/21
Xianyuan Zhan
Xianyuan Zhan is currently a research assistant professor at the Institute for AI Industry Research (AIR), Tsinghua University.  He received his Ph.D. degree at Purdue University. Before joining Tsinghua University, Dr. Zhan worked as a researcher at Microsoft Research Asia (MSRA) and a data scientist at JD Technology.  At JD Technology, he led the research that uses offline RL to optimize real-world industrial systems. Featured References DeepThermal: Combustion Optimization for Thermal Power Generating Units Using Offline Reinforcement LearningXianyuan Zhan, Haoran Xu, Yue Zhang, Yusen Huo, Xiangyu Zhu, Honglei Yin, Yu Zheng 
41:30 8/30/21
Eugene Vinitsky
Eugene Vinitsky is a PhD student at UC Berkeley advised by Alexandre Bayen. He has interned at Tesla and Deepmind.  Featured References A learning agent that acquires social norms from public sanctions in decentralized multi-agent settings Eugene Vinitsky, Raphael Köster, John P. Agapiou, Edgar Duéñez-Guzmán, Alexander Sasha Vezhnevets, Joel Z. Leibo Optimizing Mixed Autonomy Traffic Flow With Decentralized Autonomous Vehicles and Multi-Agent RL Eugene Vinitsky, Nathan Lichtle, Kanaad Parvate, Alexandre Bayen Lagrangian Control through Deep-RL: Applications to Bottleneck Decongestion Eugene Vinitsky; Kanaad Parvate; Aboudy Kreidieh; Cathy Wu; Alexandre Bayen 2018 The Surprising Effectiveness of PPO in Cooperative Multi-Agent Games Chao Yu, Akash Velu, Eugene Vinitsky, Yu Wang, Alexandre Bayen, Yi Wu Additional References SUMO: Simulation of Urban MObility 
66:02 8/18/21
Jess Whittlestone
Dr. Jess Whittlestone is a Senior Research Fellow at the Centre for the Study of Existential Risk and the Leverhulme Centre for the Future of Intelligence, both at the University of Cambridge. Featured References The Societal Implications of Deep Reinforcement Learning Jess Whittlestone, Kai Arulkumaran, Matthew Crosby Artificial Canaries: Early Warning Signs for Anticipatory and Democratic Governance of AI Carla Zoe Cremer, Jess Whittlestone Additional References CogX: Cutting Edge: Understanding AI systems for a better AI policy, featuring Jack Clark and Jess Whittlestone 
91:36 7/20/21
Aleksandra Faust
Dr Aleksandra Faust is a Staff Research Scientist and Reinforcement Learning research team co-founder at Google Brain Research. Featured References Reinforcement Learning and Planning for Preference Balancing Tasks Faust 2014 Learning Navigation Behaviors End-to-End with AutoRL Hao-Tien Lewis Chiang, Aleksandra Faust, Marek Fiser, Anthony Francis Evolving Rewards to Automate Reinforcement Learning Aleksandra Faust, Anthony Francis, Dar Mehta Evolving Reinforcement Learning Algorithms John D Co-Reyes, Yingjie Miao, Daiyi Peng, Esteban Real, Quoc V Le, Sergey Levine, Honglak Lee, Aleksandra Faust Adversarial Environment Generation for Learning to Navigate the Web Izzeddin Gur, Natasha Jaques, Kevin Malta, Manoj Tiwari, Honglak Lee, Aleksandra Faust Additional References AutoML-Zero: Evolving Machine Learning Algorithms From Scratch, Esteban Real, Chen Liang, David R. So, Quoc V. Le  
54:30 7/6/21
Sam Ritter
Sam Ritter is a Research Scientist on the neuroscience team at DeepMind. Featured References Unsupervised Predictive Memory in a Goal-Directed Agent (MERLIN) Greg Wayne, Chia-Chun Hung, David Amos, Mehdi Mirza, Arun Ahuja, Agnieszka Grabska-Barwinska, Jack Rae, Piotr Mirowski, Joel Z. Leibo, Adam Santoro, Mevlana Gemici, Malcolm Reynolds, Tim Harley, Josh Abramson, Shakir Mohamed, Danilo Rezende, David Saxton, Adam Cain, Chloe Hillier, David Silver, Koray Kavukcuoglu, Matt Botvinick, Demis Hassabis, Timothy Lillicrap Meta-RL without forgetting:  Been There, Done That: Meta-Learning with Episodic Recall Samuel Ritter, Jane X. Wang, Zeb Kurth-Nelson, Siddhant M. Jayakumar, Charles Blundell, Razvan Pascanu, Matthew Botvinick Meta-Reinforcement Learning with Episodic Recall: An Integrative Theory of Reward-Driven Learning Samuel Ritter 2019 Meta-RL exploration and planning: Rapid Task-Solving in Novel Environments Sam Ritter, Ryan Faulkner, Laurent Sartran, Adam Santoro, Matt Botvinick, David Raposo Synthetic Returns for Long-Term Credit Assignment David Raposo, Sam Ritter, Adam Santoro, Greg Wayne, Theophane Weber, Matt Botvinick, Hado van Hasselt, Francis Song  Additional References Sam Ritter: Meta-Learning to Make Smart Inferences from Small Data , North Star AI 2019 The Bitter Lesson, Rich Sutton 2019 
100:35 6/21/21
Thomas Krendl Gilbert
Thomas Krendl Gilbert is a PhD student at UC Berkeley’s Center for Human-Compatible AI, specializing in Machine Ethics and Epistemology. Featured References Hard Choices in Artificial Intelligence: Addressing Normative Uncertainty through Sociotechnical Commitments Roel Dobbe, Thomas Krendl Gilbert, Yonatan Mintz Mapping the Political Economy of Reinforcement Learning Systems: The Case of Autonomous Vehicles Thomas Krendl Gilbert AI Development for the Public Interest: From Abstraction Traps to Sociotechnical Risks McKane Andrus, Sarah Dean, Thomas Krendl Gilbert, Nathan Lambert and Tom Zick Additional References Political Economy of Reinforcement Learning Systems (PERLS) The Law and Political Economy (LPE) Project The Societal Implications of Deep Reinforcement Learning, Jess Whittlestone, Kai Arulkumaran, Matthew Crosby Robot Brains Podcast: Yann LeCun explains why Facebook would crumble without AI 
72:14 5/17/21

Similar podcasts