Ben Athiwaratkun
2020 – today
- 2025
[c19]Linda He, Jue Wang, Maurice Weber, Shang Zhu, Ben Athiwaratkun, Ce Zhang:
Scaling Instruction-tuned LLMs to Million-token Contexts via Hierarchical Synthetic Data Generation. ICLR 2025
[c18]James Liu, Pragaash Ponnusamy, Tianle Cai, Han Guo, Yoon Kim, Ben Athiwaratkun:
Training-Free Activation Sparsity in Large Language Models. ICLR 2025
[c17]Junlin Wang, Jue Wang, Ben Athiwaratkun, Ce Zhang, James Zou:
Mixture-of-Agents Enhances Large Language Model Capabilities. ICLR 2025
[c16]Junlin Wang, Roy Xie, Shang Zhu, Jue Wang, Ben Athiwaratkun, Bhuwan Dhingra, Shuaiwen Leon Song, Ce Zhang, James Zou:
Improving Model Alignment Through Collective Intelligence of Open-Source Models. ICML 2025
[c15]Muru Zhang, Mayank Mishra, Zhongzhu Zhou, William Brandon, Jue Wang, Yoon Kim, Jonathan Ragan-Kelley, Shuaiwen Leon Song, Ben Athiwaratkun, Tri Dao:
Ladder-Residual: Parallelism-Aware Architecture for Accelerating Large Model Inference with Communication Overlapping. ICML 2025
[i35]Muru Zhang, Mayank Mishra, Zhongzhu Zhou, William Brandon, Jue Wang, Yoon Kim, Jonathan Ragan-Kelley, Shuaiwen Leon Song, Ben Athiwaratkun, Tri Dao:
Ladder-residual: parallelism-aware architecture for accelerating large model inference with communication overlapping. CoRR abs/2501.06589 (2025)
[i34]Linda He, Jue Wang, Maurice Weber, Shang Zhu, Ben Athiwaratkun, Ce Zhang:
Scaling Instruction-Tuned LLMs to Million-Token Contexts via Hierarchical Synthetic Data Generation. CoRR abs/2504.12637 (2025)
[i33]Junlin Wang, Shang Zhu, Jon Saad-Falcon, Ben Athiwaratkun, Qingyang Wu, Jue Wang, Shuaiwen Leon Song, Ce Zhang, Bhuwan Dhingra, James Zou:
Think Deep, Think Fast: Investigating Efficiency of Verifier-free Inference-time-scaling Methods. CoRR abs/2504.14047 (2025)
[i32]Rahul Thapa, Andrew Li, Qingyang Wu, Bryan He, Yuki Sahashi, Christina Binder, Angela Zhang, Ben Athiwaratkun, Shuaiwen Leon Song, David Ouyang, James Zou:
How Well Can General Vision-Language Models Learn Medicine By Watching Public Educational Videos? CoRR abs/2504.14391 (2025)
[i31]Junlin Wang, Roy Xie, Shang Zhu, Jue Wang, Ben Athiwaratkun, Bhuwan Dhingra, Shuaiwen Leon Song, Ce Zhang, James Zou:
Improving Model Alignment Through Collective Intelligence of Open-Source LLMS. CoRR abs/2505.03059 (2025)
[i30]Rahul Thapa, Qingyang Wu, Kevin Wu, Harrison Zhang, Angela Zhang, Eric Wu, Haotian Ye, Suhana Bedi, Nevin Aresh, Joseph Boen, Shriya Reddy, Ben Athiwaratkun, Shuaiwen Leon Song, James Zou:
Disentangling Reasoning and Knowledge in Medical Large Language Models. CoRR abs/2505.11462 (2025)
[i29]Zhen Xu, Shang Zhu, Jue Wang, Junlin Wang, Ben Athiwaratkun, Chi Wang, James Zou, Ce Zhang:
When Does Divide and Conquer Work for Long Context LLM? A Noise Decomposition Framework. CoRR abs/2506.16411 (2025)
[i28]Jon Saad-Falcon, Estefany Kelly Buchanan, Mayee F. Chen, Tzu-Heng Huang, Brendan McLaughlin, Tanvir Bhathal, Shang Zhu, Ben Athiwaratkun, Frederic Sala, Scott W. Linderman, Azalia Mirhoseini, Christopher Ré:
Shrinking the Generation-Verification Gap with Weak Verifiers. CoRR abs/2506.18203 (2025)
[i27]Berkan Dokmeci, Qingyang Wu, Ben Athiwaratkun, Ce Zhang, Shuaiwen Leon Song, James Zou:
Data Diversification Methods In Alignment Enhance Math Performance In LLMs. CoRR abs/2507.02173 (2025)
[i26]Zhongzhu Zhou, Yibo Yang, Ziyan Chen, Fengxiang Bie, Haojun Xia, Xiaoxia Wu, Robert Wu, Ben Athiwaratkun, Bernard Ghanem, Shuaiwen Leon Song:
Imitate Optimal Policy: Prevail and Induce Action Collapse in Policy Gradient. CoRR abs/2509.02737 (2025)
[i25]Junlin Wang, Jue Wang, Zhen Xu, Ben Athiwaratkun, Bhuwan Dhingra, Ce Zhang, James Zou:
Staircase Streaming for Low-Latency Multi-Agent Inference. CoRR abs/2510.05059 (2025)
[i24]Costin-Andrei Oncescu, Qingyang Wu, Wai Tong Chung, Robert Wu, Bryan Gopal, Junxiong Wang, Tri Dao, Ben Athiwaratkun:
Opportunistic Expert Activation: Batch-Aware Expert Routing for Faster Decode Without Retraining. CoRR abs/2511.02237 (2025)
[i23]Jon Saad-Falcon, Avanika Narayan, Hakki Orhun Akengin, J. Wes Griffin, Herumb Shandilya, Adrian Gamarra Lafuente, Medhya Goel, Rebecca Joseph, Shlok Natarajan, Etash Kumar Guha, Shang Zhu, Ben Athiwaratkun, John Hennessy, Azalia Mirhoseini, Christopher Ré:
Intelligence per Watt: Measuring Intelligence Efficiency of Local AI. CoRR abs/2511.07885 (2025)
[i22]Zelei Shao, Vikranth Srivatsa, Sanjana Srivastava, Qingyang Wu, Alpay Ariyak, Xiaoxia Wu, Ameen Patel, Jue Wang, Percy Liang, Tri Dao, Ce Zhang, Yiying Zhang, Ben Athiwaratkun, Chenfeng Xu, Junxiong Wang:
Beat the long tail: Distribution-Aware Speculative Decoding for RL Training. CoRR abs/2511.13841 (2025)
[i21]Haojun Xia, Xiaoxia Wu, Jisen Li, Robert Wu, Junxiong Wang, Jue Wang, Chenxi Li, Aman Singhal, Alay Dilipbhai Shah, Alpay Ariyak, Donglin Zhuang, Zhongzhu Zhou, Ben Athiwaratkun, Zhen Zheng, Shuaiwen Leon Song:
Kitty: Accurate and Efficient 2-bit KV Cache Quantization with Dynamic Channel-wise Precision Boost. CoRR abs/2511.18643 (2025)
[i20]Minseo Kim, Chenfeng Xu, Coleman Hooper, Harman Singh, Ben Athiwaratkun, Ce Zhang, Kurt Keutzer, Amir Gholami:
CDLM: Consistency Diffusion Language Models For Faster Sampling. CoRR abs/2511.19269 (2025)
- 2024
[c14]Ben Athiwaratkun, Shiqi Wang, Mingyue Shang, Yuchen Tian, Zijian Wang, Sujan Kumar Gonugondla, Sanjay Krishna Gouda, Robert Kwiatkowski, Ramesh Nallapati, Parminder Bhatia, Bing Xiang:
Token Alignment via Character Matching for Subword Completion. ACL (Findings) 2024: 15725-15738
[c13]Junlin Wang, Siddhartha Jain, Dejiao Zhang, Baishakhi Ray, Varun Kumar, Ben Athiwaratkun:
Reasoning in Token Economies: Budget-Aware Evaluation of LLM Reasoning Strategies. EMNLP 2024: 19916-19939
[c12]Ben Athiwaratkun, Sujan Kumar Gonugondla, Sanjay Krishna Gouda, Haifeng Qian, Hantian Ding, Qing Sun, Jun Wang, Jiacheng Guo, Liangfu Chen, Parminder Bhatia, Ramesh Nallapati, Sudipta Sengupta, Bing Xiang:
Bifurcated Attention for Single-Context Large-Batch Sampling. ICML 2024
[c11]Maurice Weber, Daniel Y. Fu, Quentin Anthony, Yonatan Oren, Shane Adams, Anton Alexandrov, Xiaozhong Lyu, Huu Nguyen, Xiaozhe Yao, Virginia Adams, Ben Athiwaratkun, Rahul Chalamala, Kezhen Chen, Max Ryabinin, Tri Dao, Percy Liang, Christopher Ré, Irina Rish, Ce Zhang:
RedPajama: an Open Dataset for Training Large Language Models. NeurIPS 2024
[i19]Ben Athiwaratkun, Shiqi Wang, Mingyue Shang, Yuchen Tian, Zijian Wang, Sujan Kumar Gonugondla, Sanjay Krishna Gouda, Rob Kwiatkowski, Ramesh Nallapati, Bing Xiang:
Token Alignment via Character Matching for Subword Completion. CoRR abs/2403.08688 (2024)
[i18]Ben Athiwaratkun, Sujan Kumar Gonugondla, Sanjay Krishna Gouda, Haifeng Qian, Hantian Ding, Qing Sun, Jun Wang, Jiacheng Guo, Liangfu Chen, Parminder Bhatia, Ramesh Nallapati, Sudipta Sengupta, Bing Xiang:
Bifurcated Attention for Single-Context Large-Batch Sampling. CoRR abs/2403.08845 (2024)
[i17]Kezhen Chen, Rahul Thapa, Rahul Chalamala, Ben Athiwaratkun, Shuaiwen Leon Song, James Zou:
Dragonfly: Multi-Resolution Zoom Supercharges Large Visual-Language Model. CoRR abs/2406.00977 (2024)
[i16]Junlin Wang, Jue Wang, Ben Athiwaratkun, Ce Zhang, James Zou:
Mixture-of-Agents Enhances Large Language Model Capabilities. CoRR abs/2406.04692 (2024)
[i15]Junlin Wang, Siddhartha Jain, Dejiao Zhang, Baishakhi Ray, Varun Kumar, Ben Athiwaratkun:
Reasoning in Token Economies: Budget-Aware Evaluation of LLM Reasoning Strategies. CoRR abs/2406.06461 (2024)
[i14]James Liu, Pragaash Ponnusamy, Tianle Cai, Han Guo, Yoon Kim, Ben Athiwaratkun:
Training-Free Activation Sparsity in Large Language Models. CoRR abs/2408.14690 (2024)
[i13]Maurice Weber, Daniel Y. Fu, Quentin Anthony, Yonatan Oren, Shane Adams, Anton Alexandrov, Xiaozhong Lyu, Huu Nguyen, Xiaozhe Yao, Virginia Adams, Ben Athiwaratkun, Rahul Chalamala, Kezhen Chen, Max Ryabinin, Tri Dao, Percy Liang, Christopher Ré, Irina Rish, Ce Zhang:
RedPajama: an Open Dataset for Training Large Language Models. CoRR abs/2411.12372 (2024)
- 2023
[c10]Ben Athiwaratkun, Sanjay Krishna Gouda, Zijian Wang, Xiaopeng Li, Yuchen Tian, Ming Tan, Wasi Uddin Ahmad, Shiqi Wang, Qing Sun, Mingyue Shang, Sujan Kumar Gonugondla, Hantian Ding, Varun Kumar, Nathan Fulton, Arash Farahani, Siddhartha Jain, Robert Giaquinto, Haifeng Qian, Murali Krishna Ramanathan, Ramesh Nallapati:
Multi-lingual Evaluation of Code Generation Models. ICLR 2023
[c9]Xiaokai Wei, Sujan Kumar Gonugondla, Shiqi Wang, Wasi Uddin Ahmad, Baishakhi Ray, Haifeng Qian, Xiaopeng Li, Varun Kumar, Zijian Wang, Yuchen Tian, Qing Sun, Ben Athiwaratkun, Mingyue Shang, Murali Krishna Ramanathan, Parminder Bhatia, Bing Xiang:
Towards Greener Yet Powerful Code Generation via Quantization: An Empirical Study. ESEC/SIGSOFT FSE 2023: 224-236
[i12]Xiaokai Wei, Sujan K. Gonugondla, Wasi Uddin Ahmad, Shiqi Wang, Baishakhi Ray, Haifeng Qian, Xiaopeng Li, Varun Kumar, Zijian Wang, Yuchen Tian, Qing Sun, Ben Athiwaratkun, Mingyue Shang, Murali Krishna Ramanathan, Parminder Bhatia, Bing Xiang:
Greener yet Powerful: Taming Large Code Generation Models with Quantization. CoRR abs/2303.05378 (2023)
- 2022
[i11]Ben Athiwaratkun, Sanjay Krishna Gouda, Zijian Wang, Xiaopeng Li, Yuchen Tian, Ming Tan, Wasi Uddin Ahmad, Shiqi Wang, Qing Sun, Mingyue Shang, Sujan Kumar Gonugondla, Hantian Ding, Varun Kumar, Nathan Fulton, Arash Farahani, Siddhartha Jain, Robert Giaquinto, Haifeng Qian, Murali Krishna Ramanathan, Ramesh Nallapati, Baishakhi Ray, Parminder Bhatia, Sudipta Sengupta, Dan Roth, Bing Xiang:
Multi-lingual Evaluation of Code Generation Models. CoRR abs/2210.14868 (2022)
- 2021
[c8]Dheeru Dua, Cícero Nogueira dos Santos, Patrick Ng, Ben Athiwaratkun, Bing Xiang, Matt Gardner, Sameer Singh:
Generative Context Pair Selection for Multi-hop Question Answering. EMNLP (1) 2021: 7009-7015
[c7]Giovanni Paolini, Ben Athiwaratkun, Jason Krone, Jie Ma, Alessandro Achille, Rishita Anubhai, Cícero Nogueira dos Santos, Bing Xiang, Stefano Soatto:
Structured Prediction as Translation between Augmented Natural Languages. ICLR 2021
[i10]Giovanni Paolini, Ben Athiwaratkun, Jason Krone, Jie Ma, Alessandro Achille, Rishita Anubhai, Cícero Nogueira dos Santos, Bing Xiang, Stefano Soatto:
Structured Prediction as Translation between Augmented Natural Languages. CoRR abs/2101.05779 (2021)
[i9]Dheeru Dua, Cícero Nogueira dos Santos, Patrick Ng, Ben Athiwaratkun, Bing Xiang, Matt Gardner, Sameer Singh:
Generative Context Pair Selection for Multi-hop Question Answering. CoRR abs/2104.08744 (2021)
[i8]Yang Li, Ben Athiwaratkun, Cícero Nogueira dos Santos, Bing Xiang:
Joint Text and Label Generation for Spoken Language Understanding. CoRR abs/2105.05052 (2021)
- 2020
[c6]Ben Athiwaratkun, Cícero Nogueira dos Santos, Jason Krone, Bing Xiang:
Augmented Natural Language for Generative Sequence Labeling. EMNLP (1) 2020: 375-385
[i7]Ben Athiwaratkun, Cícero Nogueira dos Santos, Jason Krone, Bing Xiang:
Augmented Natural Language for Generative Sequence Labeling. CoRR abs/2009.13272 (2020)
2010 – 2019
- 2019
[c5]Ben Athiwaratkun, Marc Finzi, Pavel Izmailov, Andrew Gordon Wilson:
There Are Many Consistent Explanations of Unlabeled Data: Why You Should Average. ICLR (Poster) 2019
- 2018
[j1]Xilun Chen, Yu Sun, Ben Athiwaratkun, Claire Cardie, Kilian Q. Weinberger:
Adversarial Deep Averaging Networks for Cross-Lingual Sentiment Classification. Trans. Assoc. Comput. Linguistics 6: 557-570 (2018)
[c4]Ben Athiwaratkun, Andrew Gordon Wilson, Anima Anandkumar:
Probabilistic FastText for Multi-Sense Word Embeddings. ACL (1) 2018: 1-11
[c3]Ben Athiwaratkun, Andrew Gordon Wilson:
Hierarchical Density Order Embeddings. ICLR (Poster) 2018
[i6]Ben Athiwaratkun, Andrew Gordon Wilson:
Hierarchical Density Order Embeddings. CoRR abs/1804.09843 (2018)
[i5]Ben Athiwaratkun, Andrew Gordon Wilson, Anima Anandkumar:
Probabilistic FastText for Multi-Sense Word Embeddings. CoRR abs/1806.02901 (2018)
[i4]Ben Athiwaratkun, Marc Finzi, Pavel Izmailov, Andrew Gordon Wilson:
Improving Consistency-Based Semi-Supervised Learning with Weight Averaging. CoRR abs/1806.05594 (2018)
- 2017
[c2]Ben Athiwaratkun, Andrew Gordon Wilson:
Multimodal Word Distributions. ACL (1) 2017: 1645-1656
[c1]Ben Athiwaratkun, Jack W. Stokes:
Malware classification with LSTM and GRU language models and a character-level CNN. ICASSP 2017: 2482-2486
[i3]Ben Athiwaratkun, Andrew Gordon Wilson:
Multimodal Word Distributions. CoRR abs/1704.08424 (2017)
- 2016
[i2]Xilun Chen, Ben Athiwaratkun, Yu Sun, Kilian Q. Weinberger, Claire Cardie:
Adversarial Deep Averaging Networks for Cross-Lingual Sentiment Classification. CoRR abs/1606.01614 (2016)
- 2015
[i1]Ben Athiwaratkun, Keegan Kang:
Feature Representation in Convolutional Neural Networks. CoRR abs/1507.02313 (2015)
last updated on 2026-01-17 00:06 CET by the dblp team
all metadata released as open data under CC0 1.0 license