


37th IPDPS 2023: St. Petersburg, FL, USA
- IEEE International Parallel and Distributed Processing Symposium, IPDPS 2023, St. Petersburg, FL, USA, May 15-19, 2023. IEEE 2023, ISBN 979-8-3503-3766-2

- Junhyeok Jang, Miryeong Kwon, Donghyun Gouk, Hanyeoreum Bae, Myoungsoo Jung:

GraphTensor: Comprehensive GNN-Acceleration Framework for Efficient Parallel Processing of Massive Datasets. 2-12 - Haiheng He, Dan Chen, Long Zheng, Yu Huang, Haifeng Liu, Chaoqiang Liu, Xiaofei Liao, Hai Jin:

GraphMetaP: Efficient MetaPath Generation for Dynamic Heterogeneous Graph Models. 13-24 - Prasun Gera, Hyesoon Kim:

Traversing Large Compressed Graphs on GPUs. 25-35 - Isuru Ranawaka, Md. Khaledur Rahman, Ariful Azad:

Distributed Sparse Random Projection Trees for Constructing K-Nearest Neighbor Graphs. 36-46 - Anisur Rahaman Molla, Kaushik Mondal, William K. Moses Jr.:

Fast Deterministic Gathering with Detection on Arbitrary Graphs: The Power of Many Robots. 47-57 - Sudipta Saha Shubha, Shohaib Mahmud, Haiying Shen, Geoffrey C. Fox, Madhav V. Marathe:

Accurate and Efficient Distributed COVID-19 Spread Prediction based on a Large-Scale Time-Varying People Mobility Graph. 58-68 - Zeyu Luan, Qing Li, Yi Wang, Yong Jiang:

H-Cache: Traffic-Aware Hybrid Rule-Caching in Software-Defined Networks. 69-78 - Jiaxin Lei, Manish Munikar, Hui Lu, Jia Rao:

Accelerating Packet Processing in Container Overlay Networks via Packet-level Parallelism. 79-89 - Haodi Lu, Haikun Liu, Chencheng Ye, Xiaofei Liao, Fubing Mao, Yu Zhang, Hai Jin:

Software-Defined, Fast and Strongly-Consistent Data Replication for RDMA-Based PM Datastores. 90-101 - Mohamed W. Hassan, Adel Dabah, Hatem Ltaief, Suhaib A. Fahmy:

Signal Detection for Large MIMO Systems Using Sphere Decoding on FPGAs. 102-111 - Ajay Singh, Trevor Brown, Michael Spear:

Efficient Hardware Primitives for Immediate Memory Reclamation in Optimistic Data Structures. 112-122 - Kaushik Kandadi Suresh, Benjamin Michalowicz, Bharath Ramesh, Nicholas Contini, Jinghan Yao, Shulei Xu, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:

A Novel Framework for Efficient Offloading of Communication Operations to Bluefield SmartNICs. 123-133 - Qinghua Zhou, Quentin Anthony, Lang Xu, Aamir Shafi, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar K. Panda:

Accelerating Distributed Deep Learning Training with Compression Assisted Allgather and Reduce-Scatter Communication. 134-144 - Sonia Rani Gupta, Nikela Papadopoulou, Miquel Pericàs:

Accelerating CNN inference on long vector architectures via co-design. 145-155 - Jianjin Liao, Mingzhen Li, Hailong Yang, Qingxiao Sun, Biao Sun, Jiwei Hao, Tianyu Feng, Fengwei Yu, Shengdong Chen, Ye Tao, Zicheng Zhang, Zhongzhi Luan, Depei Qian:

Exploiting Input Tensor Dynamics in Activation Checkpointing for Efficient Training on GPU. 156-166 - Zheng Zhang, Donglin Yang, Yaqi Xia, Liang Ding, Dacheng Tao, Xiaobo Zhou, Dazhao Cheng:

MPipeMoE: Memory Efficient MoE for Pre-trained Models with Adaptive Pipeline Parallelism. 167-177 - Hariharan Devarajan, Kathryn M. Mohror:

Mimir: Extending I/O Interfaces to Express User Intent for Complex Workloads in HPC. 178-188 - Di Zhang, Chris Egersdoerfer, Tabassum Mahmud, Mai Zheng, Dong Dai:

Drill: Log-based Anomaly Detection for Large-scale Storage Systems Using Source Code Analysis. 189-199 - Saisha Kamat, Abdullah Al Raqibul Islam, Mai Zheng, Dong Dai:

FaultyRank: A Graph-based Parallel File System Checker. 200-210 - John Ravi, Suren Byna, Quincey Koziol, Houjun Tang, Michela Becchi:

Evaluating Asynchronous Parallel I/O on HPC Systems. 211-221 - Qifan Xu, Yang You:

An Efficient 2D Method for Training Super-Large Deep Learning Models. 222-232 - Bingyi Zhang, Viktor K. Prasanna:

Dynasparse: Accelerating GNN Inference through Dynamic Sparsity Exploitation. 233-244 - Siddharth Singh, Abhinav Bhatele:

Exploiting Sparsity in Pruned Neural Networks to Optimize Large Model Training. 245-255 - Daning Cheng, Shigang Li, Yunquan Zhang:

Asynch-SGBDT: Train Stochastic Gradient Boosting Decision Trees in an Asynchronous Parallel Manner. 256-267 - Danlin Jia, Yiming Xie, Li Wang, Xiaoqian Zhang, Allen Yang, Xuebin Yao, Mahsa Bayati, Pradeep Subedi, Bo Sheng, Ningfang Mi:

SRC: Mitigate I/O Throughput Degradation in Network Congestion Control of Disaggregated Storage Systems. 268-278 - Qi Yu, Lin Wang, Yuchong Hu, Yumeng Xu, Dan Feng, Jie Fu, Xia Zhu, Zhen Yao, Wenjia Wei:

Boosting Multi-Block Repair in Cloud Storage Systems with Wide-Stripe Erasure Coding. 279-289 - Michael J. Brim, Adam T. Moody, Seung-Hwan Lim, Ross G. Miller, Swen Boehm, Cameron Stanavige, Kathryn M. Mohror, Sarp Oral:

UnifyFS: A User-level Shared File System for Unified Access to Distributed Local Storage. 290-300 - Kyu-Jin Cho, Injae Kang, Jin-Soo Kim:

ArkFS: A Distributed File System on Object Storage for Archiving Data in HPC Environment. 301-311 - Rory Hector, Ramachandran Vaidyanathan, Gokarna Sharma, Jerry L. Trahan:

On Doorway Egress by Autonomous Robots. 312-321 - Wissam M. Sid-Lakhdar, Sébastien Cayrols, Daniel Bielich, Ahmad Abdelfattah, Piotr Luszczek, Mark Gates, Stanimire Tomov, Hans Johansen, David B. Williams-Young, Timothy A. Davis, Jack J. Dongarra, Hartwig Anzt:

PAQR: Pivoting Avoiding QR factorization. 322-332 - Junqi Yin, Feiyi Wang, Mallikarjun Arjun Shankar:

DeepThermo: Deep Learning Accelerated Parallel Monte Carlo Sampling for Thermodynamics Evaluation of High Entropy Alloys. 333-343 - Yujia Zhai, Chengquan Jiang, Leyuan Wang, Xiaoying Jia, Shang Zhang, Zizhong Chen, Xin Liu, Yibo Zhu:

ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs. 344-355 - Emmanuel Agullo, Alfredo Buttari, Olivier Coulaud, Lionel Eyraud-Dubois, Mathieu Faverge, Alain Franc, Abdou Guermouche, Antoine Jego, Romain Peressoni, Florent Pruvost:

On the Arithmetic Intensity of Distributed-Memory Dense Matrix Multiplication Involving a Symmetric Input Matrix (SYMM). 357-367 - João Nuno Ferreira Alves, Luís M. S. Russo, Alexandre P. Francisco, Siegfried Benkner:

A Novel Triangular Space-Filling Curve for Cache-Oblivious In-Place Transposition of Square Matrices. 368-378 - Yichen Zhang, Shengguo Li, Fan Yuan, Dezun Dong, Xiaojian Yang, Tiejun Li, Zheng Wang:

Memory-aware Optimization for Sequences of Sparse Matrix-Vector Multiplications. 379-389 - Olivier Beaumont, Jean-Alexandre Collin, Lionel Eyraud-Dubois, Mathieu Vérité:

Data Distribution Schemes for Dense Linear Algebra Factorizations on Any Number of Nodes. 390-401 - Yongseok Soh, Ahmed E. Helal, Fabio Checconi, Jan Laukemann, Jesmin Jahan Tithi, Teresa M. Ranadive, Fabrizio Petrini, Jee W. Choi:

Dynamic Tensor Linearization and Time Slicing for Efficient Factorization of Infinite Data Streams. 402-412 - Max A. Deppert, Klaus Jansen, Marten Maack, Simon Pukrop, Malin Rau:

Scheduling with Many Shared Resources. 413-423 - Laurent Schares, Asser N. Tantawi, Pavlos Maniotis, Ming-Hung Chen, Claudia Misale, Seetharami Seelam, Hao Yu:

Chic-sched: a HPC Placement-Group Scheduler on Hierarchical Topologies with Constraints. 424-434 - Lanshun Nie, Yuqi Qiu, Fei Meng, Mo Yu, Jing Li:

Generalizable Reinforcement Learning-Based Coarsening Model for Resource Allocation over Large and Diverse Stream Processing Graphs. 435-445 - Bo Wang, Anara Kozhokanova, Christian Terboven, Matthias S. Müller:

RLP: Power Management Based on a Latency-Aware Roofline Model. 446-456 - Ke Liu, Kan Wu, Hua Wang, Ke Zhou, Ji Zhang, Cong Li:

SLAP: An Adaptive, Learned Admission Policy for Content Delivery Network Caching. 457-467 - Zahra Najafabadi Samani, Narges Mehran, Dragi Kimovski, Radu Prodan:

Proactive SLA-aware Application Placement in the Computing Continuum. 468-479 - Chuyao Ye, Hao Zheng, Zhigang Hu, Meiguang Zheng:

PFedSA: Personalized Federated Multi-Task Learning via Similarity Awareness. 480-488 - Jingjing Xue, Min Liu, Sheng Sun, Yuwei Wang, Hui Jiang, Xuefeng Jiang:

FedBIAD: Communication-Efficient and Accuracy-Guaranteed Federated Learning with Bayesian Inference-Based Adaptive Dropout. 489-500 - Ruibo Fan, Wei Wang, Xiaowen Chu:

Fast Sparse GPU Kernels for Accelerated Training of Graph Neural Networks. 501-511 - Süreyya Emre Kurt, Jinghua Yan, Aravind Sukumaran-Rajam, Prashant Pandey, P. Sadayappan:

Communication Optimization for Distributed Execution of Graph Neural Networks. 512-523 - Yufan Xia, Marco De La Pierre, Amanda S. Barnard, Giuseppe M. J. Barca:

A Machine Learning Approach Towards Runtime Optimisation of Matrix Multiplication. 524-534 - Akash Dutta, Jee Choi, Ali Jannesari:

Power Constrained Autotuning using Graph Neural Networks. 535-545 - Sairam Sri Vatsavai, Venkata Sai Praneeth Karempudi, Ishan G. Thakkar, Sayed Ahmad Salehi, Jeffrey Todd Hastings:

SCONNA: A Stochastic Computing Based Optical Accelerator for Ultra-Fast, Energy-Efficient Inference of Integer-Quantized CNNs. 546-556 - Yi-Chien Lin, Viktor K. Prasanna:

HyScale-GNN: A Scalable Hybrid GNN Training System on Single-Node Heterogeneous Architecture. 557-567 - William Ladd, Christopher Jensen, Madhurima Vardhan, Jeff Ames, Jeff R. Hammond, Erik W. Draeger, Amanda Randles:

Optimizing Cloud Computing Resource Usage for Hemodynamic Simulation. 568-578 - Archie Powell, Gihan R. Mudalige:

Predictive Analysis of Code Optimisations on Large-Scale Coupled CFD-Combustion Simulations using the CPX Mini-App. 579-589 - Kumar Saurabh, Masado Ishii, Makrand A. Khanwale, Hari Sundar, Baskar Ganapathysubramanian:

Scalable adaptive algorithms for next-generation multiphase flow simulations. 590-601 - Joshua Hoke Davis, Justin Shafner, Daniel Nichols, Nathan Grube, Pino Martin, Abhinav Bhatele:

Porting a Computational Fluid Dynamics Code with AMR to Large-scale GPU Platforms. 602-612 - Ignacio Gavier, Joshua Russell, Devdhar Patel, Edward A. Rietman, Hava T. Siegelmann:

Neural Network Compiler for Parallel High-Throughput Simulation of Digital Circuits. 613-623 - Olivia Grimes, Jacob Nelson-Slivon, Ahmed Hassan, Roberto Palmieri:

Opportunities and Limitations of Hardware Timestamps in Concurrent Data Structures. 624-634 - Younghyun Cho, James Weldon Demmel, Jacob King, Xiaoye S. Li, Yang Liu, Hengrui Luo:

Harnessing the Crowd for Autotuning High-Performance Computing Applications. 635-645 - Kawthar Shafie Khorassani, Chen-Chun Chen, Hari Subramoni, Dhabaleswar K. Panda:

Designing and Optimizing GPU-aware Nonblocking MPI Neighborhood Collective Communication for PETSc*. 646-656 - Yi Zhao, Juepeng Zheng, Haohuan Fu, Wenzhao Wu, Jie Gao, Mengxuan Chen, Jinxiao Zhang, Lixian Zhang, Runmin Dong, Zhenrong Du, Sha Liu, Xin Liu, Shaoqing Zhang, Le Yu:

SW-LCM: A Scalable and Weakly-supervised Land Cover Mapping Method on a New Sunway Supercomputer. 657-667 - Panagiotis Mpakos, Dimitrios Galanopoulos, Petros Anastasiadis, Nikela Papadopoulou, Nectarios Koziris, Georgios I. Goumas:

Feature-based SpMV Performance Analysis on Contemporary Devices. 668-679 - Ichitaro Yamazaki, Alexander Heinlein, Sivasankaran Rajamanickam:

An Experimental Study of Two-level Schwarz Domain-Decomposition Preconditioners on GPUs. 680-689 - Peter Sanders, Matthias Schimek:

Engineering Massively Parallel MST Algorithms. 691-701 - Peter Sanders, Tim Niklas Uhl:

Engineering a Distributed-Memory Triangle Counting Algorithm. 702-712 - Jue Wang, Fumihiko Ino, Jing Ke:

PRF: A Fast Parallel Relaxed Flooding Algorithm for Voronoi Diagram Generation on GPU. 713-723 - Christian Hellwig, Fabian Czappa, Martin Michel, Reinhold Bertrand, Felix Wolf:

Satellite Collision Detection using Spatial Data Structures. 724-735 - Michael Kenzel, Stefan Lemme, Richard Membarth, Matthias Kurtenacker, Hugo Devillers, Markus Steinberger, Philipp Slusallek:

AnyQ: An Evaluation Framework for Massively-Parallel Queue Algorithms. 736-745 - Tsung-Wei Huang:

qTask: Task-parallel Quantum Circuit Simulation with Incrementality. 746-756 - Milan Shah, Xiaodong Yu, Sheng Di, Danylo Lykov, Yuri Alexeev, Michela Becchi, Franck Cappello:

GPU-Accelerated Error-Bounded Compression Framework for Quantum Circuit Simulations. 757-767 - Fei Li, Arul Rhik Mazumder:

An Adaptive Hybrid Quantum Algorithm for the Metric Traveling Salesman Problem. 768-778 - Bradley H. Theilman, Yipu Wang, Ojas Parekh, William Severa, J. Darby Smith, James B. Aimone:

Stochastic Neuromorphic Circuits for Solving MAXCUT. 779-787 - Haohao Liao, Mahmoud A. Elmohr, Xuan Dong, Yanjun Qian, Wenzhe Yang, Zhiwei Shang, Yin Tan:

TurboHE: Accelerating Fully Homomorphic Encryption Using FPGA Clusters. 788-797 - Guang Fan, Fangyu Zheng, Lipeng Wan, Lili Gao, Yuan Zhao, Jiankuo Dong, Yixuan Song, Yuewu Wang, Jingqiang Lin:

Towards Faster Fully Homomorphic Encryption Implementation with Integer and Floating-point Computing Power of GPUs. 798-808 - Xujing Li, Min Liu, Sheng Sun, Yuwei Wang, Hui Jiang, Xuefeng Jiang:

FedTrip: A Resource-Efficient Federated Learning Method with Triplet Regularization. 809-819 - Pierre-François Dutot, Yeu-Shin Fu, Nikhil Prasad, Oliver Sinnen:

A Guaranteed Approximation Algorithm for Scheduling Fork-Joins with Communication Delay. 820-830 - Yifeng Tang, Cho-Li Wang:

SelB-k-NN: A Mini-Batch K-Nearest Neighbors Algorithm on AI Processors. 831-841 - Zhangchen Xu, Yuetai Li, Chenglin Feng, Lei Zhang:

Exact Fault-Tolerant Consensus with Voting Validity. 842-852 - Mark de Berg, Leyla Biabani, Morteza Monemizadeh:

k-Center Clustering with Outliers in the MPC and Streaming Model. 853-863 - Lu Zhang, Chao Li, Xinkai Wang, Weiqi Feng, Zheng Yu, Quan Chen, Jingwen Leng, Minyi Guo, Pu Yang, Shang Yue:

FIRST: Exploiting the Multi-Dimensional Attributes of Functions for Power-Aware Serverless Computing. 864-874 - Zhuo Huang, Hao Fan, Chaoyi Cheng, Song Wu, Hai Jin:

Duo: Improving Data Sharing of Stateful Serverless Applications by Efficiently Caching Multi-Read Data. 875-885 - Hao Wu, Junxiao Deng, Hao Fan, Shadi Ibrahim, Song Wu, Hai Jin:

QoS-Aware and Cost-Efficient Dynamic Resource Allocation for Serverless ML Workflows. 886-896 - Marcin Copik, Konstantin Taranov, Alexandru Calotoiu, Torsten Hoefler:

rFaaS: Enabling High Performance Serverless with RDMA and Leases. 897-907 - Tianyao Shi, Yingxuan Yang, Yunlong Cheng, Xiaofeng Gao, Zhen Fang, Yongqiang Yang:

Alioth: A Machine Learning Based Interference-Aware Performance Monitor for Multi-Tenancy Applications in Public Cloud. 908-917 - Ming Zhao, Kritshekhar Jha, Sungho Hong:

GPU-enabled Function-as-a-Service for Machine Learning Inference. 918-928 - Pouriya Zarbafian, Vincent Gramoli:

Lyra: Fast and Scalable Resilience to Reordering Attacks in Blockchains. 929-939 - Deepal Tennakoon, Yiding Hua, Vincent Gramoli:

Smart Redbelly Blockchain: Reducing Congestion for Web3. 940-950 - Weicong Chen, Hao Qi, Xiaoyi Lu, Curtis Tatsuoka:

SBGT: Scaling Bayesian-based Group Testing for Disease Surveillance. 951-962 - Vani Nagarajan, Milind Kulkarni:

RT-DBSCAN: Accelerating DBSCAN using Ray Tracing Hardware. 963-973 - Sajal Dash, Mohammad Alaul Haque Monil, Junqi Yin, Ramu Anandakrishnan, Feiyi Wang:

Distributing Simplex-Shaped Nested for-Loops to Identify Carcinogenic Gene Combinations. 974-984 - Tom Peterka, Dmitriy Morozov, Arnur Nigmetov, Orcun Yildiz, Bogdan Nicolae, Philip E. Davis:

LowFive: In Situ Data Transport for High-Performance Workflows. 985-995 - Quentin Anthony, Ammar Ahmad Awan, Jeff Rasley, Yuxiong He, Aamir Shafi, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar K. Panda:

MCR-DL: Mix-and-Match Communication Runtime for Deep Learning. 996-1006 - Shaomeng Li, Peter Lindstrom, John P. Clyne:

Lossy Scientific Data Compression With SPERR. 1007-1017 - Garima Singh, Baidyanath Kundu, Harshitha Menon, Alexander Penev, David J. Lange, Vassil Vassilev:

Fast And Automatic Floating Point Error Analysis With CHEF-FP. 1018-1028 - Nicolau Manubens, Tiago Quintino, Simon D. Smart, Emanuele Danovaro, Adrian Jackson:

DAOS as HPC Storage: a View From Numerical Weather Prediction. 1029-1040 - Bing Lu, Yida Li, Junqi Wang, Huizhang Luo, Kenli Li:

ZFP-X: Efficient Embedded Coding for Accelerating Lossy Floating Point Compression. 1041-1050















