Neel Nanda

2020 – today

2024
[i21]
persistent URL: https://dblp.org/rec/journals/corr/abs-2401-12181
Wes Gurnee, Theo Horsley, Zifan Carl Guo, Tara Rezaei Kheirkhah, Qinyi Sun, Will Hathaway, Neel Nanda, Dimitris Bertsimas:
Universal Neurons in GPT2 Language Models. CoRR abs/2401.12181 (2024)
[i20]
persistent URL: https://dblp.org/rec/journals/corr/abs-2402-07321
Bilal Chughtai, Alan Cooney, Neel Nanda:
Summing Up the Facts: Additive Mechanisms Behind Factual Recall in LLMs. CoRR abs/2402.07321 (2024)
[i19]
persistent URL: https://dblp.org/rec/journals/corr/abs-2402-15390
Cody Rushing, Neel Nanda:
Explorations of Self-Repair in Language Models. CoRR abs/2402.15390 (2024)
[i18]
persistent URL: https://dblp.org/rec/journals/corr/abs-2403-00745
János Kramár, Tom Lieberum, Rohin Shah, Neel Nanda:
AtP*: An efficient and scalable method for localizing LLM behaviour to components. CoRR abs/2403.00745 (2024)
2023
[c4]
persistent URL: https://dblp.org/rec/conf/blackboxnlp/NandaLW23
Neel Nanda, Andrew Lee, Martin Wattenberg:
Emergent Linear Representations in World Models of Self-Supervised Sequence Models. BlackboxNLP@EMNLP 2023: 16-30
[c3]
persistent URL: https://dblp.org/rec/conf/iclr/NandaCLSS23
Neel Nanda, Lawrence Chan, Tom Lieberum, Jess Smith, Jacob Steinhardt:
Progress measures for grokking via mechanistic interpretability. ICLR 2023
[c2]
persistent URL: https://dblp.org/rec/conf/icml/ChughtaiCN23
Bilal Chughtai, Lawrence Chan, Neel Nanda:
A Toy Model of Universality: Reverse Engineering how Networks Learn Group Operations. ICML 2023: 6243-6267
[i17]
persistent URL: https://dblp.org/rec/journals/corr/abs-2301-05217
Neel Nanda, Lawrence Chan, Tom Lieberum, Jess Smith, Jacob Steinhardt:
Progress measures for grokking via mechanistic interpretability. CoRR abs/2301.05217 (2023)
[i16]
persistent URL: https://dblp.org/rec/journals/corr/abs-2302-03025
Bilal Chughtai, Lawrence Chan, Neel Nanda:
A Toy Model of Universality: Reverse Engineering How Networks Learn Group Operations. CoRR abs/2302.03025 (2023)
[i15]
persistent URL: https://dblp.org/rec/journals/corr/abs-2304-12918
Alex Foote, Neel Nanda, Esben Kran, Ioannis Konstas, Fazl Barez:
N2G: A Scalable Approach for Quantifying Interpretable Neuron Representations in Large Language Models. CoRR abs/2304.12918 (2023)
[i14]
persistent URL: https://dblp.org/rec/journals/corr/abs-2305-01610
Wes Gurnee, Neel Nanda, Matthew Pauly, Katherine Harvey, Dmitrii Troitskii, Dimitris Bertsimas:
Finding Neurons in a Haystack: Case Studies with Sparse Probing. CoRR abs/2305.01610 (2023)
[i13]
persistent URL: https://dblp.org/rec/journals/corr/abs-2305-19911
Alex Foote, Neel Nanda, Esben Kran, Ioannis Konstas, Shay B. Cohen, Fazl Barez:
Neuron to Graph: Interpreting Language Model Neurons at Scale. CoRR abs/2305.19911 (2023)
[i12]
persistent URL: https://dblp.org/rec/journals/corr/abs-2307-09458
Tom Lieberum, Matthew Rahtz, János Kramár, Neel Nanda, Geoffrey Irving, Rohin Shah, Vladimir Mikulik:
Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla. CoRR abs/2307.09458 (2023)
[i11]
persistent URL: https://dblp.org/rec/journals/corr/abs-2309-00941
Neel Nanda, Andrew Lee, Martin Wattenberg:
Emergent Linear Representations in World Models of Self-Supervised Sequence Models. CoRR abs/2309.00941 (2023)
[i10]
persistent URL: https://dblp.org/rec/journals/corr/abs-2309-16042
Fred Zhang, Neel Nanda:
Towards Best Practices of Activation Patching in Language Models: Metrics and Methods. CoRR abs/2309.16042 (2023)
[i9]
persistent URL: https://dblp.org/rec/journals/corr/abs-2310-04625
Callum McDougall, Arthur Conmy, Cody Rushing, Thomas McGrath, Neel Nanda:
Copy Suppression: Comprehensively Understanding an Attention Head. CoRR abs/2310.04625 (2023)
[i8]
persistent URL: https://dblp.org/rec/journals/corr/abs-2310-15154
Curt Tigges, Oskar John Hollinsworth, Atticus Geiger, Neel Nanda:
Linear Representations of Sentiment in Large Language Models. CoRR abs/2310.15154 (2023)
[i7]
persistent URL: https://dblp.org/rec/journals/corr/abs-2311-00863
Lucia Quirke, Lovis Heindrich, Wes Gurnee, Neel Nanda:
Training Dynamics of Contextual N-Grams in Language Models. CoRR abs/2311.00863 (2023)
[i6]
persistent URL: https://dblp.org/rec/journals/corr/abs-2311-17030
Aleksandar Makelov, Georg Lange, Neel Nanda:
Is This the Subspace You Are Looking for? An Interpretability Illusion for Subspace Activation Patching. CoRR abs/2311.17030 (2023)
2022
[j1]
persistent URL: https://dblp.org/rec/journals/jmlr/CohenHN22
Michael K. Cohen, Marcus Hutter, Neel Nanda:
Fully General Online Imitation Learning. J. Mach. Learn. Res. 23: 334:1-334:30 (2022)
[c1]
persistent URL: https://dblp.org/rec/conf/fat/GanguliHLABCCDD22
Deep Ganguli, Danny Hernandez, Liane Lovitt, Amanda Askell, Yuntao Bai, Anna Chen, Tom Conerly, Nova DasSarma, Dawn Drain, Nelson Elhage, Sheer El Showk, Stanislav Fort, Zac Hatfield-Dodds, Tom Henighan, Scott Johnston, Andy Jones, Nicholas Joseph, Jackson Kernion, Shauna Kravec, Ben Mann, Neel Nanda, Kamal Ndousse, Catherine Olsson, Daniela Amodei, Tom B. Brown, Jared Kaplan, Sam McCandlish, Christopher Olah, Dario Amodei, Jack Clark:
Predictability and Surprise in Large Generative Models. FAccT 2022: 1747-1764
[i5]
persistent URL: https://dblp.org/rec/journals/corr/abs-2202-07785
Deep Ganguli, Danny Hernandez, Liane Lovitt, Nova DasSarma, Tom Henighan, Andy Jones, Nicholas Joseph, Jackson Kernion, Benjamin Mann, Amanda Askell, Yuntao Bai, Anna Chen, Tom Conerly, Dawn Drain, Nelson Elhage, Sheer El Showk, Stanislav Fort, Zac Hatfield-Dodds, Scott Johnston, Shauna Kravec, Neel Nanda, Kamal Ndousse, Catherine Olsson, Daniela Amodei, Dario Amodei, Tom B. Brown, Jared Kaplan, Sam McCandlish, Chris Olah, Jack Clark:
Predictability and Surprise in Large Generative Models. CoRR abs/2202.07785 (2022)
[i4]
persistent URL: https://dblp.org/rec/journals/corr/abs-2204-05862
Yuntao Bai, Andy Jones, Kamal Ndousse, Amanda Askell, Anna Chen, Nova DasSarma, Dawn Drain, Stanislav Fort, Deep Ganguli, Tom Henighan, Nicholas Joseph, Saurav Kadavath, Jackson Kernion, Tom Conerly, Sheer El Showk, Nelson Elhage, Zac Hatfield-Dodds, Danny Hernandez, Tristan Hume, Scott Johnston, Shauna Kravec, Liane Lovitt, Neel Nanda, Catherine Olsson, Dario Amodei, Tom B. Brown, Jack Clark, Sam McCandlish, Chris Olah, Benjamin Mann, Jared Kaplan:
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback. CoRR abs/2204.05862 (2022)
[i3]
persistent URL: https://dblp.org/rec/journals/corr/abs-2209-11895
Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova DasSarma, Tom Henighan, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen, Tom Conerly, Dawn Drain, Deep Ganguli, Zac Hatfield-Dodds, Danny Hernandez, Scott Johnston, Andy Jones, Jackson Kernion, Liane Lovitt, Kamal Ndousse, Dario Amodei, Tom Brown, Jack Clark, Jared Kaplan, Sam McCandlish, Chris Olah:
In-context Learning and Induction Heads. CoRR abs/2209.11895 (2022)
2021
[i2]
persistent URL: https://dblp.org/rec/journals/corr/abs-2102-08686
Michael K. Cohen, Marcus Hutter, Neel Nanda:
Fully General Online Imitation Learning. CoRR abs/2102.08686 (2021)
[i1]
persistent URL: https://dblp.org/rec/journals/corr/abs-2110-01577
Neel Nanda, Jonathan Uesato, Sven Gowal:
An Empirical Investigation of Learning from Biased Toxicity Labels. CoRR abs/2110.01577 (2021)
