


Reinforcement Learning Journal, Volume 3 (2024)
- Brett Daley, Marlos C. Machado, Martha White: Demystifying the Recency Heuristic in Temporal-Difference Learning. RLJ 3: 1019-1036 (2024)
- Johan S. Obando-Ceron, João Guilherme Madeira Araújo, Aaron C. Courville, Pablo Samuel Castro: On the consistency of hyper-parameter selection in value-based deep reinforcement learning. RLJ 3: 1037-1059 (2024)
- Frieda Rong, Max Kleiman-Weiner: Value Internalization: Learning and Generalizing from Social Reward. RLJ 3: 1060-1071 (2024)
- Timon Willi, Johan S. Obando-Ceron, Jakob Nicolaus Foerster, Gintare Karolina Dziugaite, Pablo Samuel Castro: Mixture of Experts in a Mixture of RL settings. RLJ 3: 1072-1105 (2024)
- Davide Corsi, Davide Camponogara, Alessandro Farinelli: Aquatic Navigation: A Challenging Benchmark for Deep Reinforcement Learning. RLJ 3: 1106-1123 (2024)
- Cyrus Cousins, Kavosh Asadi, Elita A. Lobo, Michael Littman: On Welfare-Centric Fair Reinforcement Learning. RLJ 3: 1124-1137 (2024)
- Jiayu Yao, Weiwei Pan, Finale Doshi-Velez, Barbara E. Engelhardt: Inverse Reinforcement Learning with Multiple Planning Horizons. RLJ 3: 1138-1167 (2024)
- Yixuan Zhang, Qiaomin Xie: Constant Stepsize Q-learning: Distributional Convergence, Bias and Extrapolation. RLJ 3: 1168-1210 (2024)
- Haque Ishfaq, Yixin Tan, Yu Yang, Qingfeng Lan, Jianfeng Lu, A. Rupam Mahmood, Doina Precup, Pan Xu: More Efficient Randomized Exploration for Reinforcement Learning via Approximate Sampling. RLJ 3: 1211-1235 (2024)
- Qining Zhang, Honghao Wei, Lei Ying: Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis. RLJ 3: 1236-1251 (2024)
- Kevin Tan, Ziping Xu: A Natural Extension To Online Algorithms For Hybrid RL With Limited Coverage. RLJ 3: 1252-1264 (2024)
- Zhiyuan Zhou, Shreyas Sundara Raman, Henry Sowerby, Michael Littman: Tiered Reward: Designing Rewards for Specification and Fast Learning of Desired Behavior. RLJ 3: 1265-1288 (2024)
- Bin Hu, Chenyang Zhao, Pu Zhang, Zihao Zhou, Yuanhang Yang, Zenglin Xu, Bin Liu: Enabling Intelligent Interactions between an Agent and an LLM: A Reinforcement Learning Approach. RLJ 3: 1289-1305 (2024)
- Kris De Asis, Richard S. Sutton: An Idiosyncrasy of Time-discretization in Reinforcement Learning. RLJ 3: 1306-1316 (2024)
- Sai Prasanna, Karim Farid, Raghu Rajan, André Biedenkapp: Dreaming of Many Worlds: Learning Contextual World Models aids Zero-Shot Generalization. RLJ 3: 1317-1350 (2024)
- Tetsuro Morimura, Kazuhiro Ota, Kenshi Abe, Peinan Zhang: Policy Gradient Algorithms with Monte Carlo Tree Learning for Non-Markov Decision Processes. RLJ 3: 1351-1376 (2024)
- Marin Vlastelica, Jin Cheng, Georg Martius, Pavel Kolev: Offline Diversity Maximization under Imitation Constraints. RLJ 3: 1377-1409 (2024)
- Léopold Maytié, Benjamin Devillers, Alexandre Arnold, Rufin VanRullen: Zero-shot cross-modal transfer of Reinforcement Learning policies through a Global Workspace. RLJ 3: 1410-1426 (2024)
- Motoki Omura, Takayuki Osa, Yusuke Mukuta, Tatsuya Harada: Stabilizing Extreme Q-learning by Maclaurin Expansion. RLJ 3: 1427-1440 (2024)
- Julian Dierkes, Emma Cramer, Holger H. Hoos, Sebastian Trimpe: Combining Automated Optimisation of Hyperparameters and Reward Shape. RLJ 3: 1441-1466 (2024)
- He Wang, Laixi Shi, Yuejie Chi: Sample Complexity of Offline Distributionally Robust Linear Markov Decision Processes. RLJ 3: 1467-1510 (2024)
- Raphaël Boige, Yannis Flet-Berliac, Lars C. P. M. Quaedvlieg, Arthur Flajolet, Guillaume Richard, Thomas Pierrot: PASTA: Pretrained Action-State Transformer Agents. RLJ 3: 1511-1532 (2024)
