


22nd NSDI 2025: Philadelphia, PA, USA
- Theophilus A. Benson, Radhika Niranjan Mysore:
22nd USENIX Symposium on Networked Systems Design and Implementation, NSDI 2025, Philadelphia, PA, USA, April 28-30, 2025. USENIX Association 2025
Data Centers Queuing and Routing
- Xinle Du, Tong Li, Guangmeng Zhou, Zhuotao Liu, Hanlin Huang, Xiangyu Gao, Mowei Wang, Kun Tan, Ke Xu:
PRED: Performance-oriented Random Early Detection for Consistently Stable Performance in Datacenters. 1-20
- Jiali Xing, Akis Giannoukos, Paul Loh, Shuyue Wang, Justin Qiu, Henri Maxime Demoulin, Konstantinos Kallas, Benjamin C. Lee:
Rajomon: Decentralized and Coordinated Overload Control for Latency-Sensitive Microservices. 21-36
- Matthew Buckley, Parsa Pazhooheshy, Z. Morley Mao, Nandita Dukkipati, Hamid Hajabdolali Bazzaz, Priyaranjan Jha, Yingjie Bi, Steve Middlekauff, Yashar Ganjali:
Learnings from Deploying Network QoS Alignment to Application Priorities for Storage Services. 37-53
- Brice Ekane, Djob Mvondo, Renaud Lachaize, Yérom-David Bromberg, Alain Tchana, Daniel Hagimont:
DISC: Backpressure Mitigation In Multi-tier Applications With Distributed Shared Connection. 55-70
Data Plane Programmability 1
- Yinda Zhang, Liangcheng Yu, Gianni Antichi, Ran Ben Basat, Vincent Liu:
Enabling Silent Telemetry Data Transmission with InvisiFlow. 71-86
- Yadong Liu, Yunming Xiao, Xuan Zhang, Weizhen Dang, Huihui Liu, Xiang Li, Zekun He, Jilong Wang, Aleksandar Kuzmanovic, Ang Chen, Congcong Miao:
Unlocking ECMP Programmability for Precise Traffic Control. 87-106
- Jiaxin Lin, Zhiyuan Guo, Mihir Shah, Tao Ji, Yiying Zhang, Daehyeok Kim, Aditya Akella:
Enabling Portable and High-Performance SmartNIC Programs with Alkali. 107-126
- Robert Chang, Pradeep Dogga, Andy Fingerhut, Victor Rios, George Varghese:
Scaling IP Lookup to Large Databases using the CRAM Lens. 127-146
Data Center Resource Scheduling
- Zhenyuan Ruan, Shihang Li, Kaiyan Fan, Seo Jin Park, Marcos K. Aguilera, Adam Belay, Malte Schwarzkopf:
Quicksand: Harnessing Stranded Datacenter Resources with Granular Computing. 147-165
- Quanxi Li, Hong Huang, Ying Liu, Yanwen Xia, Jie Zhang, Mosong Zhou, Xiaobing Feng, Huimin Cui, Quan Chen, Yizhou Shan, Chenxi Wang:
Beehive: A Scalable Disaggregated Memory Runtime Exploiting Asynchrony of Multithreaded Programs. 167-187
- Tingjia Cao, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, Tyler Caraza-Harter:
Making Serverless Pay-For-Use a Reality with Leopard. 189-204
- Carlos Segarra, Simon Shillaker, Guo Li, Eleftheria Mappoura, Rodrigo Bruno, Lluís Vilanova, Peter R. Pietzuch:
GRANNY: Granular Management of Compute-Intensive Applications in the Cloud. 205-218
Verification 1
- Delong Zhang, Chong Ye, Fei He:
On Temporal Verification of Stateful P4 Programs. 219-235
- Zechun Li, Peng Zhang, Yichi Zhang, Hongkun Yang:
NDD: A Decision Diagram for Network Verification. 237-258
- Heidi Howard, Markus A. Kuppe, Edward Ashton, Amaury Chamayou, Natacha Crooks:
Smart Casual Verification of the Confidential Consortium Framework. 259-276
- Xiwei Wu, Yueyang Feng, Tianyi Huang, Xiaoyang Lu, Shengkai Lin, Lihan Xie, Shizhen Zhao, Qinxiang Cao:
VEP: A Two-stage Verification Toolchain for Full eBPF Programmability. 277-299
Failure and Diagnosis
- Naiqian Zheng, Tianshuo Qiao, Xuanzhe Liu, Xin Jin:
MeshTest: End-to-End Testing for Service Mesh Traffic Management. 301-316
- Hamid Hajabdolali Bazzaz, Yingjie Bi, Weiwu Pang, Minlan Yu, Ramesh Govindan, Neal Cardwell, Nandita Dukkipati, Meng-Jung Tsai, Chris DeForeest, Yuxue Jin, Charles J. Carver, Jan Kopanski, Liqun Cheng, Amin Vahdat:
Preventing Network Bottlenecks: Accelerating Datacenter Services with Hotspot-Aware Placement for Compute and Storage. 317-333
- Pooria Namyar, Arvin Ghavidel, Daniel Crankshaw, Daniel S. Berger, Kevin Hsieh, Srikanth Kandula, Ramesh Govindan, Behnaz Arzani:
Enhancing Network Failure Mitigation with Performance-Aware Ranking. 335-357
- Ruiming Lu, Yunchi Lu, Yuxuan Jiang, Guangtao Xue, Peng Huang:
One-Size-Fits-None: Understanding and Enhancing Slow-Fault Tolerance in Modern Distributed Systems. 359-378
All Things Transport
- Kexin Liu, Zhaochen Zhang, Chang Liu, Yizhi Wang, Vamsi Addanki, Stefan Schmid, Qingyue Wang, Wei Chen, Xiaoliang Wang, Jiaqi Zheng, Wenhao Sun, Tao Wu, Ke Meng, Fei Chen, Weiguang Wang, Bingyang Liu, Wanchun Dou, Guihai Chen, Chen Tian:
Pyrrha: Congestion-Root-Based Flow Control to Eliminate Head-of-Line Blocking in Datacenter. 379-405
- Zhongjie Chen, Qingkai Meng, ChonLam Lao, Yifan Liu, Fengyuan Ren, Minlan Yu, Yang Zhou:
eTran: Extensible Kernel Transport with eBPF. 407-425
- Chenxingyu Zhao, Jaehong Min, Ming Liu, Arvind Krishnamurthy:
White-Boxing RDMA with Packet-Granular Software Control. 427-449
- Konstantinos Prasopoulos, Ryan Kosta, Edouard Bugnion, Marios Kogias:
SIRD: A Sender-Informed, Receiver-Driven Datacenter Transport Protocol. 451-471
LLM Training and Resilience
- Fei Gui, Kaihui Gao, Li Chen, Dan Li, Vincent Liu, Ran Zhang, Hongbing Yang, Dian Xiong:
Accelerating Design Space Exploration for LLM Training Systems with Multi-experiment Parallel Simulation. 473-488
- Yinmin Zhong, Zili Zhang, Bingyang Wu, Shengyu Liu, Yukun Chen, Changyi Wan, Hanpeng Hu, Lei Xia, Ranchen Ming, Yibo Zhu, Xin Jin:
Optimizing RLHF Training for Large Language Models with Stage Fusion. 489-503
- Yangtao Deng, Xiang Shi, Zhuo Jiang, Xingjian Zhang, Lei Zhang, Zhang Zhang, Bo Li, Zuquan Song, Hang Zhu, Gaohong Liu, Fuliang Li, Shuguang Wang, Haibin Lin, Jianxi Ye, Minlan Yu:
Minder: Faulty Machine Detection for Large-scale Distributed Model Training. 505-521
- Zhiyi Yao, Pengbo Hu, Congcong Miao, Xuya Jia, Zuning Liang, Yuedong Xu, Chunzhi He, Hao Lu, Mingzhuo Chen, Xiang Li, Zekun He, Yachen Wang, Xianneng Zou, Junchen Jiang:
Holmes: Localizing Irregularities in LLM Training with Mega-scale GPU Clusters. 523-540
- Xizheng Wang, Qingxu Li, Yichi Xu, Gang Lu, Dan Li, Li Chen, Heyang Zhou, Linkang Zheng, Sen Zhang, Yikai Zhu, Yang Liu, Pengcheng Zhang, Kun Qian, Kunling He, Jiaqi Gao, Ennan Zhai, Dennis Cai, Binzhang Fu:
SimAI: Unifying Architecture Design and Performance Tuning for Large-Scale Large Language Model Training with Scalability and Precision. 541-558
- Borui Wan, Mingji Han, Yiyao Sheng, Yanghua Peng, Haibin Lin, Mofan Zhang, Zhichao Lai, Menghan Yu, Junda Zhang, Zuquan Song, Xin Liu, Chuan Wu:
ByteCheckpoint: A Unified Checkpointing System for Large Foundation Model Development. 559-578
Video and Cloud Gaming
- Neil Agarwal, Rui Pan, Francis Y. Yan, Ravi Netravali:
Mowgli: Passively Learned Rate Control for Real-Time Video. 579-594
- Yang Li, Jiaxing Qiu, Hongyi Wang, Zhenhua Li, Feng Qian, Jing Yang, Hao Lin, Yunhao Liu, Bo Xiao, Xiaokang Qin, Tianyin Xu:
Dissecting and Streamlining the Interactive Loop of Mobile Cloud Gaming. 595-611
- Weijun Wang, Liang Mi, Shaowei Cen, Haipeng Dai, Yuanchun Li, Xiaoming Fu, Yunxin Liu:
Region-based Content Enhancement for Efficient Video Analytics at the Edge. 613-633
- Congkai An, Huanhuan Zhang, Shibo Wang, Jingyang Kang, Anfu Zhou, Liang Liu, Huadong Ma, Zili Meng, Delei Ma, Yusheng Dong, Xiaogang Lei:
Tooth: Toward Optimal Balance of Video QoE and Redundancy Cost by Fine-Grained FEC in Cloud Gaming Streaming. 635-651
- Tong Meng, Wenfeng Li, Chao Yuan, Changqing Yan, Le Zhang:
AsTree: An Audio Subscription Architecture Enabling Massive-Scale Multi-Party Conferencing. 653-666
Infra For ML
- Guanbin Xu, Zhihao Le, Yinhe Chen, Zhiqi Lin, Zewen Jin, Youshan Miao, Cheng Li:
AutoCCL: Automated Collective Communication Tuning for Accelerating Distributed and Parallel DNN Training. 667-683
- Ertza Warraich, Omer Shabtai, Khalid Manaa, Shay Vargaftik, Yonatan Piasetzky, Matty Kadosh, Lalith Suresh, Muhammad Shahbaz:
OptiReduce: Resilient and Tail-Optimal AllReduce for Distributed Deep Learning in the Cloud. 685-703
- Liangyu Zhao, Siddharth Pal, Tapan Chugh, Weiyang Wang, Jason Fantl, Prithwish Basu, Joud Khoury, Arvind Krishnamurthy:
Efficient Direct-Connect Topologies for Collective Communications. 705-737
- Alind Khare, Dhruv Garg, Sukrit Kalra, Snigdha Grandhi, Ion Stoica, Alexey Tumanov:
SuperServe: Fine-Grained Inference Serving for Unpredictable Workloads. 739-758
Fast Scalable Consensus
- Tigran Bantikyan, Jonathan Zarnstorff, Te-Yen Chou, Lewis Tseng, Roberto Palmieri:
Pineapple: Unifying Multi-Paxos and Atomic Shared Registers. 759-778
- Dengcheng Hu, Jianrong Wang, Xiulong Liu, Hao Xu, Xujing Wu, Muhammad Shahzad, Guyue Liu, Keqiu Li:
Ladder: A Convergence-based Structured DAG Blockchain for High Throughput and Low Latency. 779-794
- Tianjing Xu, Yongqi Zhong, Yiming Zhang, Ruofan Xiong, Jingjing Zhang, Guangtao Xue, Shengyun Liu:
Vegeta: Enabling Parallel Smart Contract Execution in Leaderless Blockchains. 795-811
- Balaji Arun, Zekun Li, Florian Suri-Payer, Sourav Das, Alexander Spiegelman:
Shoal++: High Throughput DAG BFT Can Be Fast and Robust! 813-826
Operational Experiences
- Xuan Zeng, Haoran Xu, Chen Chen, Xumiao Zhang, Xiaoxi Zhang, Xu Chen, Guihai Chen, Yubing Qiu, Yiping Zhang, Chong Hao, Ennan Zhai:
Learning Production-Optimized Congestion Control Selection for Alibaba Cloud CDN. 827-845
- Lingyun Yang, Yongchen Wang, Yinghao Yu, Qizhen Weng, Jianbo Dong, Kan Liu, Chi Zhang, Yanyi Zi, Hao Li, Zechao Zhang, Nan Wang, Yu Dong, Menglei Zheng, Lanlan Xi, Xiaowei Lu, Liang Ye, Guodong Yang, Binzhang Fu, Tao Lan, Liping Zhang, Lin Qu, Wei Wang:
GPU-Disaggregated Serving for Deep Learning Recommendation Models at Scale. 847-863
- Jianbo Dong, Kun Qian, Pengcheng Zhang, Zhilong Zheng, Liang Chen, Fei Feng, Yichi Xu, Yikai Zhu, Gang Lu, Xue Li, Zhihui Ren, Zhicheng Wang, Bin Luo, Peng Zhang, Yang Liu, Yanqing Chen, Yu Guan, Weicheng Wang, Chaojie Yang, Yang Zhang, Man Yuan, Hanyu Zhao, Yong Li, Zihan Zhao, Shan Li, Xianlong Zeng, Zhiping Yao, Binzhang Fu, Ennan Zhai, Wei Lin, Chao Wang, Dennis Cai:
Evolution of Aegis: Fault Diagnosis for AI Model Training Service in Production. 865-881
- Harish Srinivas, Graham Cormode, Mehrdad Honarkhah, Samuel Lurye, Jonathan Hehir, Lunwen He, George Hong, Ahmed Magdy, Dzmitry Huba, Kaikai Wang, Shen Guo, Shoubhik Bhattacharya:
PAPAYA Federated Analytics Stack: Engineering Privacy, Scalability and Practicality. 883-898
Middleboxes
- Haoyu Gu, Ali José Mashtizadeh, Bernard Wong:
HA/TCP: A Reliable and Scalable Framework for TCP Network Functions. 899-914
- Xiangfeng Zhu, Yuyao Wang, Banruo Liu, Yongtong Wu, Nikola Bojanic, Jingrong Chen, Gilbert Louis Bernstein, Arvind Krishnamurthy, Sam Kumar, Ratul Mahajan, Danyang Zhuo:
High-level Programming for Application Networks. 915-935
- Qiongwen Xu, Sebastiano Miano, Xiangyu Gao, Tao Wang, Adithya Murugadass, Songyuan Zhang, Anirudh Sivaraman, Gianni Antichi, Srinivas Narayana:
State-Compute Replication: Parallelizing High-Speed Stateful Packet Processing. 937-957
- Tao Ji, Rohan Vardekar, Balajee Vamanan, Brent E. Stephens, Aditya Akella:
MTP: Transport for In-Network Computing. 959-977
Rethinking Data Center Efficiency
- Shengkai Lin, Shizhen Zhao, Peirui Cao, Xinchi Han, Quan Tian, Wenfeng Liu, Qi Wu, Donghai Han, Xinbing Wang:
ONCache: A Cache-Based Low-Overhead Container Overlay Network. 979-998
- Kaiqiang Xu, Decang Sun, Han Tian, Junxue Zhang, Kai Chen:
GREEN: Carbon-efficient Resource Scheduling for Machine Learning Clusters. 999-1014
- Linsong Guo, Danial Zuberi, Tal Garfinkel, Amy Ousterhout:
The Benefits and Limitations of User Interrupts for Preemptive Userspace Scheduling. 1015-1032
- Sathiya Kumaran Mani, Kevin Hsieh, Santiago Segarra, Ranveer Chandra, Yajie Zhou, Srikanth Kandula:
Securing Public Cloud Networks with Efficient Role-based Micro-Segmentation. 1033-1048
RDMA
- Wei Liu, Kun Qian, Zhenhua Li, Feng Qian, Tianyin Xu, Yunhao Liu, Yu Guan, Shuhong Zhu, Hongfei Xu, Lanlan Xi, Chao Qin, Ennan Zhai:
Mitigating Scalability Walls of RDMA-based Container Networks. 1049-1065
- Anil Yelam, Stewart Grant, Saarth Deshpande, Nadav Amit, Radhika Niranjan Mysore, Amy Ousterhout, Marcos K. Aguilera, Alex C. Snoeren:
Eden: Developer-Friendly Application-Integrated Far Memory. 1067-1083
- Qing Wang, Jiwu Shu, Jing Wang, Yuhao Zhang:
Achieving Wire-Latency Storage Systems by Exploiting Hardware ACKs. 1085-1100
- Zixuan Wang, Xingda Wei, Jinyu Gu, Hongrui Xie, Rong Chen, Haibo Chen:
ODRP: On-Demand Remote Paging with Programmable RDMA. 1101-1115
Storage
- Yuyuan Kang, Ming Liu:
Understanding and Profiling NVMe-over-TCP Using ntprof. 1117-1136
- Sheng Jiang, Ming Liu:
Building an Elastic Block Storage over EBOFs Using Shadow Views. 1137-1153
- Gyuyeong Kim:
Pushing the Limits of In-Network Caching for Key-Value Stores. 1155-1168
Cellular and Wireless
- William Sentosa, Balakrishnan Chandrasekaran, Philip Brighten Godfrey, Haitham Hassanieh:
CellReplay: Towards accurate record-and-replay for cellular networks. 1169-1186
- Nakul Garg, Irtaza Shahid, Ramanujan K. Sheshadri, Karthikeyan Sundaresan, Nirupam Roy:
Large Network UWB Localization: Algorithms and Implementation. 1187-1203
- Anuj Kalia, Nikita Lazarev, Leyang Xue, Xenofon Foukas, Bozidar Radunovic, Francis Y. Yan:
Towards Energy Efficient 5G vRAN Servers. 1205-1219
- Xincheng Xie, Wentao Hou, Zerui Guo, Ming Liu:
Building Massive MIMO Baseband Processing on a Single-Node Supercomputer. 1221-1242
- Mary Hogan, Gerry Wan, Yiming Qiu, Sharad Agarwal, Ryan Beckett, Rachee Singh, Paramvir Bahl:
Efficient Multi-WAN Transport for 5G with OTTER. 1243-1267
Verification 2
- Tibor Schneider, Stefano Vissicchio, Laurent Vanbever:
Verifying maximum link loads in a changing world. 1269-1287
- Divya Raghunathan, Maria Apostolaki, Aarti Gupta:
A Layered Formal Methods Approach to Answering Queue-related Queries. 1289-1304
- Ding Ding, Zhanghan Wang, Jinyang Li, Aurojit Panda:
Runtime Protocol Refinement Checking for Distributed Protocol Implementations. 1305-1326
- Jianmin Liu, Li Chen, Dan Li, Yukai Miao:
CEGS: Configuration Example Generalizing Synthesizer. 1327-1347
Security
- Yosef Edery Anahory, Jie Kong, Nicholas Scaglione, Justin Furuness, Hemi Leibowitz, Amir Herzberg, Bing Wang, Yossi Gilad:
Suppressing BGP Zombies with Route Status Transparency. 1349-1366
- Jens Frieß, Haya Schulmann, Michael Waidner:
ValidaTor: Domain Validation over Tor. 1367-1380
- Zedong Ni, Yinbo Xu, Hui Zou, Yanbiao Li, Guang Cheng, Gaogang Xie:
From Address Blocks to Authorized Prefixes: Redesigning RPKI ROV with a Hierarchical Hashing Scheme for Fast and Memory-Efficient Validation. 1381-1397
- Shihan Lin, Suting Chen, Yunming Xiao, Yanqi Gu, Aleksandar Kuzmanovic, Xiaowei Yang:
PreAcher: Secure and Practical Password Pre-Authentication by Content Delivery Networks. 1399-1419
Data Plane Programmability 2
- Zhikang Chen, Haoyu Song, Zhiyu Zhang, Yang Xu, Bin Liu:
ClubHeap: A High-Speed and Scalable Priority Queue for Programmable Packet Scheduling. 1421-1436
- Erfan Sharafzadeh, Raymond Matson, Jean Tourrilhes, Puneet Sharma, Soudeh Ghorbani:
Self-Clocked Round-Robin Packet Scheduling. 1437-1465
- Albert Gran Alcoz, Balázs Vass, Pooria Namyar, Behnaz Arzani, Gábor Rétvári, Laurent Vanbever:
Everything Matters in Programmable Packet Scheduling. 1467-1485
- Hao Zheng, Xin Yan, Wenbo Li, Jiaqi Zheng, Xiaoliang Wang, Qingqing Zhao, Luyou He, Xiaofei Lai, Feng Gao, Fuguang Huang, Wanchun Dou, Guihai Chen, Chen Tian:
When P4 Meets Run-to-completion Architecture. 1487-1505
ML for Networks
- Lorenzo Pappone, Alessio Sacco, Flavio Esposito:
Mutant: Learning Congestion Control from Existing Protocols via Online Reinforcement Learning. 1507-1522
- Gerry Wan, Shinan Liu, Francesco Bronzino, Nick Feamster, Zakir Durumeric:
CATO: End-to-End Optimization of ML-Based Traffic Analysis Pipelines. 1523-1540
- Xizheng Wang, Libin Liu, Li Chen, Dan Li, Yukai Miao, Yu Bai:
Resolving Packets from Counters: Enabling Multi-scale Network Traffic Super Resolution via Composable Large Traffic Model. 1541-1561
- Chenyuan Wu, Haoyun Qin, Mohammad Javad Amiri, Boon Thau Loo, Dahlia Malkhi, Ryan Marcus:
BFTBrain: Adaptive BFT Consensus with Reinforcement Learning. 1563-1583
