Остановите войну!
for scientists:
default search action
Neel Nanda
Person information
Refine list
refinements active!
zoomed in on ?? of ?? records
view refined list in
export refined list as
2020 – today
- 2024
- [i21]Wes Gurnee, Theo Horsley, Zifan Carl Guo, Tara Rezaei Kheirkhah, Qinyi Sun, Will Hathaway, Neel Nanda, Dimitris Bertsimas:
Universal Neurons in GPT2 Language Models. CoRR abs/2401.12181 (2024) - [i20]Bilal Chughtai, Alan Cooney, Neel Nanda:
Summing Up the Facts: Additive Mechanisms Behind Factual Recall in LLMs. CoRR abs/2402.07321 (2024) - [i19]Cody Rushing, Neel Nanda:
Explorations of Self-Repair in Language Models. CoRR abs/2402.15390 (2024) - [i18]János Kramár, Tom Lieberum, Rohin Shah, Neel Nanda:
AtP*: An efficient and scalable method for localizing LLM behaviour to components. CoRR abs/2403.00745 (2024) - 2023
- [c4]Neel Nanda, Andrew Lee, Martin Wattenberg:
Emergent Linear Representations in World Models of Self-Supervised Sequence Models. BlackboxNLP@EMNLP 2023: 16-30 - [c3]Neel Nanda, Lawrence Chan, Tom Lieberum, Jess Smith, Jacob Steinhardt:
Progress measures for grokking via mechanistic interpretability. ICLR 2023 - [c2]Bilal Chughtai, Lawrence Chan, Neel Nanda:
A Toy Model of Universality: Reverse Engineering how Networks Learn Group Operations. ICML 2023: 6243-6267 - [i17]Neel Nanda, Lawrence Chan, Tom Lieberum, Jess Smith, Jacob Steinhardt:
Progress measures for grokking via mechanistic interpretability. CoRR abs/2301.05217 (2023) - [i16]Bilal Chughtai, Lawrence Chan, Neel Nanda:
A Toy Model of Universality: Reverse Engineering How Networks Learn Group Operations. CoRR abs/2302.03025 (2023) - [i15]Alex Foote, Neel Nanda, Esben Kran, Ioannis Konstas, Fazl Barez:
N2G: A Scalable Approach for Quantifying Interpretable Neuron Representations in Large Language Models. CoRR abs/2304.12918 (2023) - [i14]Wes Gurnee, Neel Nanda, Matthew Pauly, Katherine Harvey, Dmitrii Troitskii, Dimitris Bertsimas:
Finding Neurons in a Haystack: Case Studies with Sparse Probing. CoRR abs/2305.01610 (2023) - [i13]Alex Foote, Neel Nanda, Esben Kran, Ioannis Konstas, Shay B. Cohen, Fazl Barez:
Neuron to Graph: Interpreting Language Model Neurons at Scale. CoRR abs/2305.19911 (2023) - [i12]Tom Lieberum, Matthew Rahtz, János Kramár, Neel Nanda, Geoffrey Irving, Rohin Shah, Vladimir Mikulik:
Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla. CoRR abs/2307.09458 (2023) - [i11]Neel Nanda, Andrew Lee, Martin Wattenberg:
Emergent Linear Representations in World Models of Self-Supervised Sequence Models. CoRR abs/2309.00941 (2023) - [i10]Fred Zhang, Neel Nanda:
Towards Best Practices of Activation Patching in Language Models: Metrics and Methods. CoRR abs/2309.16042 (2023) - [i9]Callum McDougall, Arthur Conmy, Cody Rushing, Thomas McGrath, Neel Nanda:
Copy Suppression: Comprehensively Understanding an Attention Head. CoRR abs/2310.04625 (2023) - [i8]Curt Tigges, Oskar John Hollinsworth, Atticus Geiger, Neel Nanda:
Linear Representations of Sentiment in Large Language Models. CoRR abs/2310.15154 (2023) - [i7]Lucia Quirke, Lovis Heindrich, Wes Gurnee, Neel Nanda:
Training Dynamics of Contextual N-Grams in Language Models. CoRR abs/2311.00863 (2023) - [i6]Aleksandar Makelov, Georg Lange, Neel Nanda:
Is This the Subspace You Are Looking for? An Interpretability Illusion for Subspace Activation Patching. CoRR abs/2311.17030 (2023) - 2022
- [j1]Michael K. Cohen, Marcus Hutter, Neel Nanda:
Fully General Online Imitation Learning. J. Mach. Learn. Res. 23: 334:1-334:30 (2022) - [c1]Deep Ganguli, Danny Hernandez, Liane Lovitt, Amanda Askell, Yuntao Bai, Anna Chen, Tom Conerly, Nova DasSarma, Dawn Drain, Nelson Elhage, Sheer El Showk, Stanislav Fort, Zac Hatfield-Dodds, Tom Henighan, Scott Johnston, Andy Jones, Nicholas Joseph, Jackson Kernian, Shauna Kravec, Ben Mann, Neel Nanda, Kamal Ndousse, Catherine Olsson, Daniela Amodei, Tom B. Brown, Jared Kaplan, Sam McCandlish, Christopher Olah, Dario Amodei, Jack Clark:
Predictability and Surprise in Large Generative Models. FAccT 2022: 1747-1764 - [i5]Deep Ganguli, Danny Hernandez, Liane Lovitt, Nova DasSarma, Tom Henighan, Andy Jones, Nicholas Joseph, Jackson Kernion, Benjamin Mann, Amanda Askell, Yuntao Bai, Anna Chen, Tom Conerly, Dawn Drain, Nelson Elhage, Sheer El Showk, Stanislav Fort, Zac Hatfield-Dodds, Scott Johnston, Shauna Kravec, Neel Nanda, Kamal Ndousse, Catherine Olsson, Daniela Amodei, Dario Amodei, Tom B. Brown, Jared Kaplan, Sam McCandlish, Chris Olah, Jack Clark:
Predictability and Surprise in Large Generative Models. CoRR abs/2202.07785 (2022) - [i4]Yuntao Bai, Andy Jones, Kamal Ndousse, Amanda Askell, Anna Chen, Nova DasSarma, Dawn Drain, Stanislav Fort, Deep Ganguli, Tom Henighan, Nicholas Joseph, Saurav Kadavath, Jackson Kernion, Tom Conerly, Sheer El Showk, Nelson Elhage, Zac Hatfield-Dodds, Danny Hernandez, Tristan Hume, Scott Johnston, Shauna Kravec, Liane Lovitt, Neel Nanda, Catherine Olsson, Dario Amodei, Tom B. Brown, Jack Clark, Sam McCandlish, Chris Olah, Benjamin Mann, Jared Kaplan:
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback. CoRR abs/2204.05862 (2022) - [i3]Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova DasSarma, Tom Henighan, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen, Tom Conerly, Dawn Drain, Deep Ganguli, Zac Hatfield-Dodds, Danny Hernandez, Scott Johnston, Andy Jones, Jackson Kernion, Liane Lovitt, Kamal Ndousse, Dario Amodei, Tom Brown, Jack Clark, Jared Kaplan, Sam McCandlish, Chris Olah:
In-context Learning and Induction Heads. CoRR abs/2209.11895 (2022) - 2021
- [i2]Michael K. Cohen, Marcus Hutter, Neel Nanda:
Fully General Online Imitation Learning. CoRR abs/2102.08686 (2021) - [i1]Neel Nanda, Jonathan Uesato, Sven Gowal:
An Empirical Investigation of Learning from Biased Toxicity Labels. CoRR abs/2110.01577 (2021)
Coauthor Index
manage site settings
To protect your privacy, all features that rely on external API calls from your browser are turned off by default. You need to opt-in for them to become active. All settings here will be stored as cookies with your web browser. For more information see our F.A.Q.
Unpaywalled article links
Add open access links from to the list of external document links (if available).
Privacy notice: By enabling the option above, your browser will contact the API of unpaywall.org to load hyperlinks to open access articles. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the Unpaywall privacy policy.
Archived links via Wayback Machine
For web page which are no longer available, try to retrieve content from the of the Internet Archive (if available).
Privacy notice: By enabling the option above, your browser will contact the API of archive.org to check for archived content of web pages that are no longer available. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the Internet Archive privacy policy.
Reference lists
Add a list of references from , , and to record detail pages.
load references from crossref.org and opencitations.net
Privacy notice: By enabling the option above, your browser will contact the APIs of crossref.org, opencitations.net, and semanticscholar.org to load article reference information. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the Crossref privacy policy and the OpenCitations privacy policy, as well as the AI2 Privacy Policy covering Semantic Scholar.
Citation data
Add a list of citing articles from and to record detail pages.
load citations from opencitations.net
Privacy notice: By enabling the option above, your browser will contact the API of opencitations.net and semanticscholar.org to load citation information. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the OpenCitations privacy policy as well as the AI2 Privacy Policy covering Semantic Scholar.
OpenAlex data
Load additional information about publications from .
Privacy notice: By enabling the option above, your browser will contact the API of openalex.org to load additional information. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the information given by OpenAlex.
last updated on 2024-04-14 01:15 CEST by the dblp team
all metadata released as open data under CC0 1.0 license
see also: Terms of Use | Privacy Policy | Imprint