


PPoPP 2023: Montreal, QC, Canada
- Maryam Mehri Dehnavi, Milind Kulkarni, Sriram Krishnamoorthy:

Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, PPoPP 2023, Montreal, QC, Canada, 25 February 2023 - 1 March 2023. ACM 2023, ISBN 979-8-4007-0015-6
Data Structures
- Weihua Zhang, Chuanlei Zhao, Lu Peng, Yuzhe Lin, Fengzhe Zhang, Yunping Lu:

Boosting Performance and QoS for Concurrent GPU B+trees by Combining-Based Synchronization. 1-13
- Raed Romanov, Nikita Koval:

The State-of-the-Art LCRQ Concurrent Queue Algorithm Does NOT Require CAS2. 14-26
- Zhe Wang, Jinhao Zhao, Kunal Agrawal, He Liu, Meng Xu, Jing Li:

Provably Good Randomized Strategies for Data Placement in Distributed Key-Value Stores. 27-38
Algorithms
- Pedro Ramalhete, Andreia Correia, Pascal Felber:

2PLSF: Two-Phase Locking with Starvation-Freedom. 39-51
- Xiaojun Dong, Letong Wang, Yan Gu, Yihan Sun:

Provably Fast and Space-Efficient Parallel Biconnectivity. 52-65
- Yuanhao Wei, Guy E. Blelloch, Panagiota Fatourou, Eric Ruppert:

Practically and Theoretically Efficient Garbage Collection for Multiversioning. 66-78
Programming Models
- Muhammad Osama, Serban D. Porumbescu, John D. Owens:

A Programming Model for GPU Load Balancing. 79-91
- Mohak Chadha, Nils Krueger, Jophin John, Anshul Jindal, Michael Gerndt, Shajulin Benedict:

Exploring the Use of WebAssembly in HPC. 92-106
- Nikita Koval, Dan Alistarh, Roman Elizarov:

Fast and Scalable Channels in Kotlin Coroutines. 107-118
- William S. Moses, Ivan R. Ivanov, Jens Domke, Toshio Endo, Johannes Doerfert, Oleksandr Zinenko:

High-Performance GPU-to-CPU Transpilation and Optimization via High-Level Parallel Constructs. 119-134
Applications
- Kehao Lin, Chunbao Zhou, Yan Zeng, Ningming Nie, Jue Wang, Shigang Li, Yangde Feng, Yangang Wang, Kehan Yao, Tiechui Yao, Jilin Zhang, Jian Wan:

A Scalable Hybrid Total FETI Method for Massively Parallel FEM Simulations. 135-147
- Yaojian Chen, Yong Liu, Xinmin Shi, Jiawei Song, Xin Liu, Lin Gan, Chu Guo, Haohuan Fu, Jie Gao, Dexun Chen, Guangwen Yang:

Lifetime-Based Optimization for Simulating Quantum Circuits on a New Sunway Supercomputer. 148-159
- Hunter McCoy, Steven A. Hofmeyr, Katherine A. Yelick, Prashant Pandey:

High-Performance Filters for GPUs. 160-173
- Lukas Breitwieser, Ahmad Hesam, Fons Rademakers, Juan Gómez-Luna, Onur Mutlu:

High-Performance and Scalable Agent-Based Simulation with BioDynaMo. 174-188
Task Parallelism
- Tao B. Schardl, I-Ting Angelina Lee:

OpenCilk: A Modular and Extensible Software Infrastructure for Fast Task-Parallel Code. 189-203
- Zhen Xie, Jie Liu, Jiajia Li, Dong Li:

Merchandiser: Data Placement on Heterogeneous Memory for Task-Parallel HPC Applications with Load-Balance Awareness. 204-217
- Michael Bauer, Elliott Slaughter, Sean Treichler, Wonchan Lee, Michael Garland, Alex Aiken:

Visibility Algorithms for Dynamic Dependence Analysis and Distributed Coherence. 218-231
Transactions
- Rati Gelashvili, Alexander Spiegelman, Zhuolun Xiang, George Danezis, Zekun Li, Dahlia Malkhi, Yu Xia, Runtian Zhou:

Block-STM: Scaling Blockchain Execution by Turning Ordering Curse to a Performance Blessing. 232-244
- Gal Assa, Andreia Correia, Pedro Ramalhete, Valerio Schiavoni, Pascal Felber:

TL4x: Buffered Durable Transactions on Disk as Fast as in Memory. 245-259
Decompositions
- Lizhi Xiang, Miao Yin, Chengming Zhang, Aravind Sukumaran-Rajam, P. Sadayappan, Bo Yuan, Dingwen Tao:

TDC: Towards Extremely Efficient CNNs on GPUs via Hardware-Aware Tucker Decomposition. 260-273
- Jieyang Chen, Xin Liang, Kai Zhao, Hadi Zamani Sabzi, Laxmi N. Bhuyan, Zizhong Chen:

Improving Energy Saving of One-Sided Matrix Decompositions on CPU-GPU Heterogeneous Systems. 274-287
- Yang Xia, Peng Jiang, Gagan Agrawal, Rajiv Ramnath:

End-to-End LU Factorization of Large Matrices on GPUs. 288-300
- Shaoshuai Zhang, Ruchi Shah, Hiroyuki Ootomo, Rio Yokota, Panruo Wu:

Fast Symmetric Eigenvalue Decomposition via WY Representation on Tensor Core. 301-312
Kernels
- Zhen Peng, Minjia Zhang, Kai Li, Ruoming Jin, Bin Ren:

iQAN: Fast and Accurate Vector Search with Efficient Intra-Query Parallelism on Multi-Core Architectures. 313-328
- Serif Yesil, Azin Heidarshenas, Adam Morrison, Josep Torrellas:

WISE: Predicting the Performance of Sparse Matrix Vector Multiplication with Machine Learning. 329-341
- Alexandre de Limas Santana, Adrià Armejach, Marc Casas:

Efficient Direct Convolution Using Long SIMD Instructions. 342-353
Attention
- Yufeng Wang, Charith Mendis:

TGOpt: Redundancy-Aware Optimizations for Temporal Graph Attention Networks. 354-368
- Zhaodong Chen, Zheng Qu, Yuying Quan, Liu Liu, Yufei Ding, Yuan Xie:

Dynamic N:M Fine-Grained Structured Sparse Attention Mechanism. 369-379
Training
- Zihao Chen, Chen Xu, Weining Qian, Aoying Zhou:

Elastic Averaging for Efficient Pipelined DNN Training. 380-391
- Zhenkun Cai, Qihui Zhou, Xiao Yan, Da Zheng, Xiang Song, Chenguang Zheng, James Cheng, George Karypis:

DSP: Efficient GNN Training with Multiple GPUs. 392-404
- Chunyang Wang, Desen Sun, Yuebin Bai:

PiPAD: Pipelined and Parallel Dynamic GNN Training on GPUs. 405-418
Posters
- Ricardo Jesus, Michèle Weiland:

AArch64 Atomics: Might They Be Harming Your Performance? 419-421
- Fei Dai, Yawen Chen, Zhiyi Huang, Haibo Zhang, Fangfang Zhang:

Efficient All-Reduce for Distributed DNN Training in Optical Interconnect Systems. 422-424
- Jiantong Jiang, Zeyi Wen, Atif Bin Mansoor, Ajmal Mian:

Fast Parallel Exact Inference on Bayesian Networks. 425-426
- Zhihao Li, Haipeng Jia, Yunquan Zhang, Yuyan Sun, Yiwei Zhang, Tun Chen:

Generating Fast FFT Kernels on CPUs via FFT-Specific Intrinsics. 427-428
- Muhammad Osama, Duane Merrill, Cris Cecka, Michael Garland, John D. Owens:

Stream-K: Work-Centric Parallel Decomposition for Dense Matrix-Matrix Multiplication on the GPU. 429-431
- Cheng Xu, Chao Li, Pengyu Wang, Xiaofeng Hou, Jing Wang, Shixuan Sun, Minyi Guo, Hanqing Wu, Dongbai Chen, Xiangwen Liu:

High-Throughput GPU Random Walk with Fine-Tuned Concurrent Query Processing. 432-434
- Gali Sheffi, Erez Petrank:

The ERA Theorem for Safe Memory Reclamation. 435-437
- Vitaly Aksenov, Trevor Brown, Alexander Fedorov, Ilya Kokorin:

Unexpected Scaling in Path Copying Trees. 438-440
- Wentao Cai, Haosen Wen, Michael L. Scott:

Transactional Composition of Nonblocking Data Structures. 441-443
- Ruobing Han, Jun Chen, Bhanu Garg, Jeffrey Young, Jaewoong Sim, Hyesoon Kim:

CuPBoP: A Framework to Make CUDA Portable. 444-446
- Yuchen Zhong, Guangming Sheng, Juncheng Liu, Jinhui Yuan, Chuan Wu:

Swift: Expedited Failure Recovery for Large-Scale DNN Training. 447-449
- Re'em Harel, Yuval Pinter, Gal Oren:

Learning to Parallelize in a Shared-Memory Environment with Transformers. 450-452















