


Adam Gleave
2020 – today
- 2025
[c10]Dillon Bowen, Brendan Murphy, Will Cai, David Khachaturov, Adam Gleave, Kellin Pelrine:
Scaling Trends for Data Poisoning in LLMs. AAAI 2025: 27206-27214
[c9]Tom Tseng, Euan McLean, Kellin Pelrine, Tony Tong Wang, Adam Gleave:
Can Go AIs Be Adversarially Robust? AAAI 2025: 27662-27670
[c8]Nikolaus H. R. Howe, Ian R. McKenzie, Oskar John Hollinsworth, Michal Zajac, Tom Tseng, Aaron David Tucker, Pierre-Luc Bacon, Adam Gleave:
Scaling Trends in Language Model Robustness. ICML 2025
[i33]Lewis Hammond, Alan Chan, Jesse Clifton, Jason Hoelscher-Obermaier, Akbir Khan, Euan McLean, Chandler Smith, Wolfram Barfuss, Jakob N. Foerster, Tomas Gavenciak, The Anh Han, Edward Hughes, Vojtech Kovarík, Jan Kulveit, Joel Z. Leibo, Caspar Oesterheld, Christian Schröder de Witt, Nisarg Shah, Michael P. Wellman, Paolo Bova, Theodor Cimpeanu, Carson Ezell, Quentin Feuillade-Montixi, Matija Franklin, Esben Kran, Igor Krawczuk, Max Lamparth, Niklas Lauffer, Alexander Meinke, Sumeet Motwani, Anka Reuel, Vincent Conitzer, Michael Dennis, Iason Gabriel, Adam Gleave, Gillian K. Hadfield, Nika Haghtalab, Atoosa Kasirzadeh, Sébastien Krier, Kate Larson, Joel Lehman, David C. Parkes, Georgios Piliouras, Iyad Rahwan:
Multi-Agent Risks from Advanced AI. CoRR abs/2502.14143 (2025)
[i32]Dillon Bowen, Ann-Kathrin Dombrowski, Adam Gleave, Chris Cundy:
AI Companies Should Report Pre- and Post-Mitigation Safety Evaluations. CoRR abs/2503.17388 (2025)
[i31]Chris Cundy, Adam Gleave:
Preference Learning with Lie Detectors can Induce Honesty or Evasion. CoRR abs/2505.13787 (2025)
[i30]Matthew Kowal, Jasper Timm, Jean-François Godbout, Thomas Costello, Antonio A. Arechar, Gordon Pennycook, David Rand, Adam Gleave, Kellin Pelrine:
It's the Thought that Counts: Evaluating the Attempts of Frontier LLMs to Persuade on Harmful Topics. CoRR abs/2506.02873 (2025)
[i29]Mohammad Taufeeque, Aaron David Tucker, Adam Gleave, Adrià Garriga-Alonso:
Interpreting learned search: finding a transition model and value function in an RNN that plays Sokoban. CoRR abs/2506.10138 (2025)
[i28]Yoshua Bengio, Tegan Maharaj, Luke Ong, Stuart Russell, Dawn Song, Max Tegmark, Lan Xue, Ya-Qin Zhang, Stephen Casper, Wan Sie Lee, Sören Mindermann, Vanessa Wilfred, Vidhisha Balachandran, Fazl Barez, Michael Belinsky, Imane Bello, Malo Bourgon, Mark Brakel, Siméon Campos, Duncan Cass-Beggs, Jiahao Chen, Rumman Chowdhury, Kuan Chua Seah, Jeff Clune, Juntao Dai, Agnès Delaborde, Nouha Dziri, Francisco Eiras, Joshua Engels, Jinyu Fan, Adam Gleave, Noah Goodman, Fynn Heide, Johannes Heidecke, Dan Hendrycks, Cyrus Hodes, Bryan Low Kian Hsiang, Minlie Huang, Sami Jawhar, Wang Jingyu, Adam Tauman Kalai, Meindert Kamphuis, Mohan S. Kankanhalli, Subhash Kantamneni, Mathias Bonde Kirk, Thomas Kwa, Jeffrey Ladish, Kwok-Yan Lam, Wan Lee Sie, Taewhi Lee, Xiaojian Li, Jiajun Liu, Chaochao Lu, Yifan Mai, Richard Mallah, Julian Michael, Nick Moës, Simon Möller, Kihyuk Nam, Kwan Yee Ng, Mark Nitzberg, Besmira Nushi, Seán Ó hÉigeartaigh, Alejandro Ortega, Pierre Peigné, James Petrie, Benjamin Prud'homme, Reihaneh Rabbany, Nayat Sanchez-Pi, Sarah Schwettmann, Buck Shlegeris, Saad Siddiqui, Aradhana Sinha, Martín Soto, Cheston Tan, Dong Ting, William-Chandra Tjhi, Robert Trager, Brian Tse, Anthony Tung K. H., John Willes, Denise Wong, Wei Xu, Rongwu Xu, Yi Zeng, HongJiang Zhang, Djordje Zikelic:
The Singapore Consensus on Global AI Safety Research Priorities. CoRR abs/2506.20702 (2025)
[i27]Ian R. McKenzie, Oskar John Hollinsworth, Tom Tseng, Xander Davies, Stephen Casper, Aaron D. Tucker, Robert Kirk, Adam Gleave:
STACK: Adversarial Attacks on LLM Safeguard Pipelines. CoRR abs/2506.24068 (2025)
[i26]Ann-Kathrin Dombrowski, Dillon Bowen, Adam Gleave, Chris Cundy:
The Safety Gap Toolkit: Evaluating Hidden Dangers of Open-Source Models. CoRR abs/2507.11544 (2025)
[i25]Brendan Murphy, Dillon Bowen, Shahrad Mohammadzadeh, Julius Broomfield, Adam Gleave, Kellin Pelrine:
Jailbreak-Tuning: Models Efficiently Learn Jailbreak Susceptibility. CoRR abs/2507.11630 (2025)
- 2024
[c7]Joar Max Viktor Skalse, Lucy Farnik, Sumeet Ramesh Motwani, Erik Jenner, Adam Gleave, Alessandro Abate:
STARC: A General Framework For Quantifying Differences Between Reward Functions. ICLR 2024
[i24]Pedro Freire, ChengCheng Tan, Adam Gleave, Dan Hendrycks, Scott Emmons:
Uncovering Latent Human Wellbeing in Language Model Embeddings. CoRR abs/2402.11777 (2024)
[i23]Tom Tseng, Euan McLean, Kellin Pelrine, Tony T. Wang, Adam Gleave:
Can Go AIs be adversarially robust? CoRR abs/2406.12843 (2024)
[i22]Adrià Garriga-Alonso, Mohammad Taufeeque, Adam Gleave:
Planning behavior in a recurrent neural network that plays Sokoban. CoRR abs/2407.15421 (2024)
[i21]Nikolaus H. R. Howe, Michal Zajac, Ian R. McKenzie, Oskar John Hollinsworth, Tom Tseng, Pierre-Luc Bacon, Adam Gleave:
Exploring Scaling Trends in LLM Robustness. CoRR abs/2407.18213 (2024)
[i20]Dillon Bowen, Brendan Murphy, Will Cai, David Khachaturov, Adam Gleave, Kellin Pelrine:
Scaling Laws for Data Poisoning in LLMs. CoRR abs/2408.02946 (2024)
- 2023
[c6]Joar Max Viktor Skalse, Matthew Farrugia-Roberts, Stuart Russell, Alessandro Abate, Adam Gleave:
Invariance in Policy Optimisation and Partial Identifiability in Reward Learning. ICML 2023: 32033-32058
[c5]Tony Tong Wang, Adam Gleave, Tom Tseng, Kellin Pelrine, Nora Belrose, Joseph Miller, Michael D. Dennis, Yawen Duan, Viktor Pogrebniak, Sergey Levine, Stuart Russell:
Adversarial Policies Beat Superhuman Go AIs. ICML 2023: 35655-35739
[i19]Lev McKinney, Yawen Duan, David Krueger, Adam Gleave:
On The Fragility of Learned Reward Functions. CoRR abs/2301.03652 (2023)
[i18]Joar Skalse, Lucy Farnik, Sumeet Ramesh Motwani, Erik Jenner, Adam Gleave, Alessandro Abate:
STARC: A General Framework For Quantifying Differences Between Reward Functions. CoRR abs/2309.15257 (2023)
[i17]Kellin Pelrine, Mohammad Taufeeque, Michal Zajac, Euan McLean, Adam Gleave:
Exploiting Novel GPT-4 APIs. CoRR abs/2312.14302 (2023)
- 2022
[b1]Adam Gleave:
Towards Trustworthy Machine Learning. University of California, Berkeley, USA, 2022
[i16]Adam Gleave, Geoffrey Irving:
Uncertainty Estimation for Language Reward Models. CoRR abs/2203.07472 (2022)
[i15]Joar Skalse, Matthew Farrugia-Roberts, Stuart Russell, Alessandro Abate, Adam Gleave:
Invariance in Policy Optimisation and Partial Identifiability in Reward Learning. CoRR abs/2203.07475 (2022)
[i14]Adam Gleave, Sam Toyer:
A Primer on Maximum Causal Entropy Inverse Reinforcement Learning. CoRR abs/2203.11409 (2022)
[i13]Erik Jenner, Adam Gleave:
Preprocessing Reward Functions for Interpretability. CoRR abs/2203.13553 (2022)
[i12]Pavel Czempin, Adam Gleave:
Reducing Exploitability with Population Based Training. CoRR abs/2208.05083 (2022)
[i11]Erik Jenner, Herke van Hoof, Adam Gleave:
Calculus on MDPs: Potential Shaping as a Gradient. CoRR abs/2208.09570 (2022)
[i10]Tony Tong Wang, Adam Gleave, Nora Belrose, Tom Tseng, Joseph Miller, Michael D. Dennis, Yawen Duan, Viktor Pogrebniak, Sergey Levine, Stuart Russell:
Adversarial Policies Beat Professional-Level Go AIs. CoRR abs/2211.00241 (2022)
[i9]Adam Gleave, Mohammad Taufeeque, Juan Rocamonde, Erik Jenner, Steven H. Wang, Sam Toyer, Maximilian Ernestus, Nora Belrose, Scott Emmons, Stuart Russell:
imitation: Clean Imitation Learning Implementations. CoRR abs/2211.11972 (2022)
- 2021
[j1]Antonin Raffin, Ashley Hill, Adam Gleave, Anssi Kanervisto, Maximilian Ernestus, Noah Dormann:
Stable-Baselines3: Reliable Reinforcement Learning Implementations. J. Mach. Learn. Res. 22: 268:1-268:8 (2021)
[c4]Adam Gleave, Michael Dennis, Shane Legg, Stuart Russell, Jan Leike:
Quantifying Differences in Reward Functions. ICLR 2021
- 2020
[c3]Adam Gleave, Michael Dennis, Cody Wild, Neel Kant, Sergey Levine, Stuart Russell:
Adversarial Policies: Attacking Deep Reinforcement Learning. ICLR 2020
[i8]Adam Gleave, Michael Dennis, Shane Legg, Stuart Russell, Jan Leike:
Quantifying Differences in Reward Functions. CoRR abs/2006.13900 (2020)
[i7]Pedro Freire, Adam Gleave, Sam Toyer, Stuart Russell:
DERAIL: Diagnostic Environments for Reward And Imitation Learning. CoRR abs/2012.01365 (2020)
[i6]Eric J. Michaud, Adam Gleave, Stuart Russell:
Understanding Learned Reward Functions. CoRR abs/2012.05862 (2020)
2010 – 2019
- 2019
[i5]Adam Gleave, Michael Dennis, Neel Kant, Cody Wild, Sergey Levine, Stuart Russell:
Adversarial Policies: Attacking Deep Reinforcement Learning. CoRR abs/1905.10615 (2019)
- 2018
[i4]Adam Gleave, Oliver Habryka:
Multi-task Maximum Entropy Inverse Reinforcement Learning. CoRR abs/1805.08882 (2018)
[i3]Sören Mindermann, Rohin Shah, Adam Gleave, Dylan Hadfield-Menell:
Active Inverse Reward Design. CoRR abs/1809.03060 (2018)
[i2]Aaron Tucker, Adam Gleave, Stuart Russell:
Inverse reinforcement learning for video games. CoRR abs/1810.10593 (2018)
- 2017
[c2]Adam Gleave, Christian Steinruecken:
Making Compression Algorithms for Unicode Text. DCC 2017: 441
[i1]Adam Gleave, Christian Steinruecken:
Making compression algorithms for Unicode text. CoRR abs/1701.04047 (2017)
- 2016
[c1]Ionel Gog, Malte Schwarzkopf, Adam Gleave, Robert N. M. Watson, Steven Hand:
Firmament: Fast, Centralized Cluster Scheduling at Scale. OSDI 2016: 99-115
last updated on 2025-12-09 00:31 CET by the dblp team
all metadata released as open data under CC0 1.0 license