


ICPP 2022: Bordeaux, France
- Proceedings of the 51st International Conference on Parallel Processing, ICPP 2022, Bordeaux, France, 29 August 2022 - 1 September 2022. ACM 2022, ISBN 978-1-4503-9733-9

Distributing Learning Algorithms
- Hao Zhang, Tingting Wu, Siyao Cheng, Jie Liu:
Aperiodic Local SGD: Beyond Local SGD. 1:1-1:10
- Yijun Li, Jiawei Huang, Zhaoyi Li, Shengwen Zhou, Wanchun Jiang, Jianxin Wang:
HSP: Hybrid Synchronous Parallelism for Fast Distributed Deep Learning. 2:1-2:11
- Refael Cohen, Ido Hakimi, Assaf Schuster:
SMEGA2: Distributed Asynchronous Deep Neural Network Training With a Single Momentum Buffer. 3:1-3:10
- Milan Shah, Reece Neff, Hancheng Wu, Marco Minutoli, Antonino Tumeo, Michela Becchi:
Accelerating Random Forest Classification on GPU and FPGA. 4:1-4:11
System Optimizations Through Deep Learning
- Liu Liu, Jian Yu, Zhijun Ding:
Adaptive and Efficient GPU Time Sharing for Hyperparameter Tuning in Cloud. 5:1-5:11
- Boqian Fu, Fahao Chen, Peng Li, Deze Zeng:
TCB: Accelerating Transformer Inference Services with Request Concatenation. 6:1-6:11
- Shengwei Li, Zhiquan Lai, Dongsheng Li, Yiming Zhang, Xiangyu Ye, Yabo Duan:
EmbRace: Accelerating Sparse Communication for Distributed Training of Deep Neural Networks. 7:1-7:11
- Guanghao Li, Yue Hu, Miao Zhang, Ji Liu, Quanjun Yin, Yong Peng, Dejing Dou:
FedHiSyn: A Hierarchical Synchronous Federated Learning Framework for Resource and Data Heterogeneity. 8:1-8:11
Parallel Algorithms
- Haonan Ji, Huimin Song, Shibo Lu, Zhou Jin, Guangming Tan, Weifeng Liu:
TileSpMSpV: A Tiled Algorithm for Sparse Matrix-Sparse Vector Multiplication on GPUs. 9:1-9:11
- Srdan Milakovic, Oguz Selvitopi, Israt Nisa, Zoran Budimlic, Aydin Buluç:
Parallel Algorithms for Masked Sparse Matrix-Matrix Products. 10:1-10:11
- Francisco López, Lars Karlsson, Paolo Bientinesi:
FLOPs as a Discriminant for Dense Linear Algebra Algorithms. 11:1-11:10
- Boxiang Wang, Qifan Xu, Zhengda Bian, Yang You:
Tesseract: Parallelize the Tensor Parallelism Efficiently. 12:1-12:11
- Jan Hückelheim, Laurent Hascoët:
Automatic Differentiation of Parallel Loops with Formal Methods. 13:1-13:11
- Andrey Prokopenko, Piyush Sao, Damien Lebrun-Grandié:
A single-tree algorithm to compute the Euclidean minimum spanning tree on GPUs. 14:1-14:10
- Haidong Lan, Wenxi Zhu, Du Wu, Qian Qiu, Honglin Zhu, Jingjing Zhao, Xinghui Fu, Liu Wei, Jintao Meng, Minwen Deng:
Efficient Phase-Functioned Real-time Character Control in Mobile Games: A TVM Enabled Approach. 15:1-15:9
- Yuhan Wu, Zhuochen Fan, Qilong Shi, Yixin Zhang, Tong Yang, Cheng Chen, Zheng Zhong, Junnan Li, Ariel Shtul, Yaofeng Tu:
SHE: A Generic Framework for Data Stream Mining over Sliding Windows. 16:1-16:12
Architectural Support for Learning
- Zhengbo Chen, Qi Yu, Fang Zheng, Feng Guo, Zuoning Chen:
DSSA: Dual-Side Sparse Systolic Array Architecture for Accelerating Convolutional Neural Network Training. 17:1-17:10
- Minjin Tang, Mei Wen, Yasong Cao, Junzhong Shen, Jianchao Yang, Jiawei Fei, Yang Guo, Sheng Liu:
Mentha: Enabling Sparse-Packing Computation on Systolic Arrays. 18:1-18:11
- Moiz Arif, Kevin Assogba, M. Mustafa Rafique, Sudharshan Vazhkudai:
Exploiting CXL-based Memory for Distributed Deep Learning. 19:1-19:11
- Jiazhi Jiang, Jiangsu Du, Dan Huang, Dongsheng Li, Jiang Zheng, Yutong Lu:
Characterizing and Optimizing Transformer Inference on ARM Many-core Processor. 20:1-20:11
Storage Recovery and Repair
- Lin Wang, Yuchong Hu, Qian Du, Dan Feng, Ray Wu, Ingo He, Kevin Zhang:
Exploiting Parallelism of Disk Failure Recovery via Partial Stripe Repair for an Erasure-Coded High-Density Storage Server. 21:1-21:11
- Hai Zhou, Dan Feng:
Boosting Cross-rack Multi-stripe Repair in Heterogeneous Erasure-coded Clusters. 22:1-22:11
- Shuang Ma, Si Wu, Cheng Li, Yinlong Xu:
Repair-Optimal Data Placement for Locally Repairable Codes with Optimal Minimum Hamming Distance. 23:1-23:11
- Shucheng Wang, Qiang Cao, Ziyi Lu, Jie Yao:
Mlog: Multi-log Write Buffer upon Ultra-fast SSD RAID. 24:1-24:11
Data Systems, Storage, I/O
- Kai Lu, Guokuan Li, Jiguang Wan, Ruixiang Ma, Wei Zhao:
ADSTS: Automatic Distributed Storage Tuning System Using Deep Reinforcement Learning. 25:1-25:13
- Jie Liu, Bogdan Nicolae, Dong Li:
Lobster: Load Balance-Aware I/O for Distributed DNN Training. 26:1-26:11
- Yuanzhang Wang, Fengkui Yang, Ji Zhang, Chunhua Li, Ke Zhou, Chong Liu, Zhuo Cheng, Wei Fang, Jinhu Liu:
LDPP: A Learned Directory Placement Policy in Distributed File Systems. 27:1-27:11
- Li Liu, Chunhua Li, Zhou Zhang, Yuhan Liu, Ke Zhou, Ji Zhang:
A Data-aware Learned Index Scheme for Efficient Writes. 28:1-28:11
Memory Systems and I/O
- Haodong Lin, Zhibing Sha, Jun Li, Zhigang Cai, Balazs Gerofi, Yuanquan Shi, Jianwei Liao:
DRAM Cache Management with Request Granularity for NAND-based SSDs. 29:1-29:10
- Xiaomin Zou, Fang Wang, Dan Feng, Tianjin Guan, Nan Su:
ROWE-tree: A Read-Optimized and Write-Efficient B+-tree for Persistent Memory. 30:1-30:11
- Christopher Stewart, Nathaniel Morris, Lydia Y. Chen, Robert Birke:
Performance Modeling for Short-Term Cache Allocation. 31:1-31:11
- Kai Zhang, Zhiqi Wang, Zili Shao:
BSCache: A Brisk Semantic Caching Scheme for Cloud-based Performance Monitoring Timeseries Systems. 32:1-32:10
- Lucia Pons, Julio Sahuquillo, Salvador Petit, Julio Pons:
Cache-Poll: Containing Pollution in Non-Inclusive Caches Through Cache Partitioning. 33:1-33:11
- Mengya Lei, Fang Wang, Dan Feng, Xiaoyu Shuai, Yuchao Cao:
A Dynamic and Recoverable BMT Scheme for Secure Non-Volatile Memory. 34:1-34:11
Graph Algorithms
- Christoph Klein, Robert Strzodka:
Highly Parallel Linear Forest Extraction from a Weighted Graph on GPUs. 35:1-35:11
- Jason Niu, Jaroslaw Zola, Ahmet Erdem Sariyüce:
Counting Induced 6-Cycles in Bipartite Graphs. 36:1-36:10
- Shuai Lin, Rui Wang, Yongkun Li, Yinlong Xu, John C. S. Lui, Fei Chen, Pengcheng Wang, Lei Han:
Towards Fast Large-scale Graph Analysis via Two-dimensional Balanced Partitioning. 37:1-37:11
- Anwesh Panda, Sathish Vadhiyar:
Dynamic Strategies for High Performance Training of Knowledge Graph Embeddings. 38:1-38:10
- Xianghao Xu, Hong Jiang, Fang Wang, Yongli Cheng, Peng Fang:
GraphSD: A State and Dependency aware Out-of-Core Graph Processing System. 39:1-39:11
Resource Management and Scheduling
- Taylan Özden, Tim Beringer, Arya Mazaheri, Hamid Mohammadi Fard, Felix Wolf:
ElastiSim: A Batch-System Simulator for Malleable Workloads. 40:1-40:11
- Huanle Xu, Yang Liu, Wing Cheong Lau:
Multi Resource Scheduling with Task Cloning in Heterogeneous Clusters. 41:1-41:11
- Anam Tahir, Kai Cui, Heinz Koeppl:
Learning Mean-Field Control for Delayed Information Load Balancing in Large Queuing Systems. 42:1-42:11
- Tapan Srivastava, Huazhe Zhang, Henry Hoffmann:
Penelope: Peer-to-peer Power Management. 43:1-43:11
- Avinash Kumar Chaurasia, Anshuj Garg, Bhaskaran Raman, Uday Kurkure, Hari Sivaraman, Lan Vu, Sairam Veeraswamy:
Simmer: Rate proportional scheduling to reduce packet drops in vGPU based NF chains. 44:1-44:11
- Md. Maruf Hossain, Erik Saule:
Postmortem Computation of Pagerank on Temporal Graphs. 45:1-45:11
- Yang Liu, Huanle Xu, Wing Cheong Lau:
Online Resource Optimization for Elastic Stream Processing with Regret Guarantee. 46:1-46:11
- Kangjin Wang, Ying Li, Cheng Wang, Tong Jia, Kingsum Chow, Yang Wen, Yaoyong Dou, Guoyao Xu, Chuanjia Hou, Jie Yao, Liping Zhang:
Characterizing Job Microarchitectural Profiles at Scale: Dataset and Analysis. 47:1-47:11
- Huijun Wang, Oliver Sinnen:
Scheduling Fork-Join Task Graphs with Communication Delays and Equal Processing Times. 48:1-48:9
- Wenda Tang, Senbo Fu, Yutao Ke, Qian Peng, Feng Gao:
Themis: Fair Memory Subsystem Resource Sharing with Differentiated QoS in Public Clouds. 49:1-49:12
- Yuxin Chen, Benjamin Brock, Serban D. Porumbescu, Aydin Buluç, Katherine A. Yelick, John D. Owens:
Atos: A Task-Parallel GPU Scheduler for Graph Analytics. 50:1-50:11
- Anne Benoit, Lucas Perotin, Yves Robert, Hongyang Sun:
Online Scheduling of Moldable Task Graphs under Common Speedup Models. 51:1-51:11
Programming Systems, Runtime Systems and Compilers
- Xiaohan Tao, Yu Zhu, Boyang Wang, Jinlong Xu, Jianmin Pang, Jie Zhao:
Automatically Generating High-performance Matrix Multiplication Kernels on the Latest Sunway Processor. 52:1-52:12
- Xin You, Changxi Liu, Hailong Yang, Pengbo Wang, Zhongzhi Luan, Depei Qian:
Vectorizing SpMV by Exploiting Dynamic Regular Patterns. 53:1-53:12
- Lijuan Jiang, Ping Xu, Qianchao Zhu, Xiuhong Li, Shengen Yan, Xingcheng Zhang, Dahua Lin, Wenjing Ma, Zhouyang Li, Jun Liu, Jinming Ma, Minxi Jin, Chao Yang:
EasyView: Enabling and Scheduling Tensor Views in Deep Learning Compilers. 54:1-54:11
- Jimmy Aguilar Mena, Omar Shaaban, Victor Lopez, Marta Garcia, Paul M. Carpenter, Eduard Ayguadé, Jesús Labarta:
Transparent load balancing of MPI programs using OmpSs-2@Cluster and DLB. 55:1-55:11
Networks and Communication
- Rongxin Han, Dezhi Chen, Song Guo, Xiaoyuan Fu, Jingyu Wang, Qi Qi, Jianxin Liao:
Parallel Network Slicing for Multi-SP Services. 56:1-56:11
- Jin Ye, Lin Li, Wenlu Zhang, Guihao Chen, Yuanchao Shan, Yijun Li, Weihe Li, Jiawei Huang:
UA-Sketch: An Accurate Approach to Detect Heavy Flow based on Uninterrupted Arrival. 57:1-57:11
- Qinzhe Wu, Ashen Ekanayake, Ruihao Li, Jonathan C. Beard, Lizy Kurian John:
SPAMeR: Speculative Push for Anticipated Message Requests in Multi-Core Systems. 58:1-58:12
- Joseph Izraelevitz, Gaukas Wang, Rhett Hanscom, Kayli Silvers, Tamara Silbergleit Lehman, Gregory V. Chockler, Alexey Gotsman:
Acuerdo: Fast Atomic Broadcast over RDMA. 59:1-59:11
- Yuan Liu, Wenxin Li, Wenyu Qu, Heng Qi:
BULB: Lightweight and Automated Load Balancing for Fast Datacenter Networks. 60:1-60:11
- Shan Huang, Dezun Dong, Lingbin Zeng, Zejia Zhou, Yukun Zhou, Xiangke Liao:
DC4: Reconstructing Data-Credit-Coupled Congestion Control for Data Centers. 61:1-61:11
- Haoyu Wang, Kevin Zheng, Charles Reiss, Haiying Shen:
NCC: Neighbor-aware Congestion Control based on Reinforcement Learning for Datacenter Networks. 62:1-62:10
- Mikhail Isaev, Nic McDonald, Jeffrey Young, Richard W. Vuduc:
ParaGraph: An application-simulator interface and toolkit for hardware-software co-design. 63:1-63:13
Performance Benchmarking and Auto-tuning
- Yiltan Hassan Temuçin, Ryan E. Grant, Ahmad Afsahi:
Micro-Benchmarking MPI Partitioned Point-to-Point Communication. 64:1-64:12
- Kohei Yoshida, Rio Sageyama, Shinobu Miwa, Hayato Yamaki, Hiroki Honda:
Analyzing Performance and Power-Efficiency Variations among NVIDIA GPUs. 65:1-65:12
- Cunyang Wei, Haipeng Jia, Yunquan Zhang, Liusha Xu, Ji Qi:
IATF: An Input-Aware Tuning Framework for Compact BLAS Based on ARMv8 CPUs. 66:1-66:11
- Hui Dou, Yilun Wang, Yiwen Zhang, Pengfei Chen:
DeepCAT: A Cost-Efficient Online Configuration Auto-Tuning Approach for Big Data Frameworks. 67:1-67:11
Edge and Cloud Computing
- Xiaoyu Xia, Feifei Chen, Qiang He, Guangming Cui, John C. Grundy, Mohamed Almorsy Abdelrazek, Fang Dong:
Formulating Interference-aware Data Delivery Strategies in Edge Storage Systems. 68:1-68:11
- Guangming Cui, Qiang He, Xiaoyu Xia, Feifei Chen, Yun Yang:
Energy-efficient Edge Server Management for Edge Computing: A Game-theoretical Approach. 69:1-69:11
- Liming Ge, Zizhao Wang, Wei Bao, Dong Yuan, Nguyen Hoang Tran, Bing Bing Zhou, Albert Y. Zomaya:
Semi-Online Multi-Machine with Restart Scheduling for Integrated Edge and Cloud Computing Systems. 70:1-70:13
- Zhaowu Huang, Fang Dong, Dian Shen, Huitian Wang, Xiaolin Guo, Shucun Fu:
Enabling Latency-Sensitive DNN Inference via Joint Optimization of Model Surgery and Resource Allocation in Heterogeneous Edge. 71:1-71:11
Optimization of Federated Learning
- Lina Su, Ruiting Zhou, Ne Wang, Guang Fang, Zongpeng Li:
An Online Learning Approach for Client Selection in Federated Edge Learning under Budget Constraint. 72:1-72:11
- Nang Hung Nguyen, Phi Le Nguyen, Thuy Dung Nguyen, Trung Thanh Nguyen, Duc Long Nguyen, Thanh Hung Nguyen, Huy Hieu Pham, Truong Thao Nguyen:
FedDRL: Deep Reinforcement Learning-based Adaptive Aggregation for Non-IID Data in Federated Learning. 73:1-73:11
- Shengyuan Ye, Liekang Zeng, Qiong Wu, Ke Luo, Qingze Fang, Xu Chen:
Eco-FL: Adaptive Federated Learning with Efficient Edge Collaborative Pipeline Training. 74:1-74:11
- Chuang Hu, Huanghuang Liang, Xiao Ming Han, Boan Liu, Dazhao Cheng, Dan Wang:
Spread: Decentralized Model Aggregation for Scalable Federated Learning. 75:1-75:12
- Jaehee Jang, Heonseok Ha, Dahuin Jung, Sungroh Yoon:
FedClassAvg: Local Representation Learning for Personalized Federated Learning on Heterogeneous Neural Networks. 76:1-76:10
Performance of Machine Learning
- Zining Zhang, Bingsheng He, Zhenjie Zhang:
HARL: Hierarchical Adaptive Reinforcement Learning Based Auto Scheduler for Neural Networks. 77:1-77:13
- Liang Liu, Mingzhu Shen, Ruihao Gong, Fengwei Yu, Hailong Yang:
NNLQP: A Multi-Platform Neural Network Latency Query and Prediction System with An Evolving Database. 78:1-78:14
- Muhammed Fatih Balin, Kaan Sancak, Ümit V. Çatalyürek:
MG-GCN: A Scalable multi-GPU GCN Training Framework. 79:1-79:11
- Rongxin Xu, Shiva Raj Pokhrel, Qiujun Lan, Gang Li:
FAIR-BFL: Flexible and Incentive Redesign for Blockchain-based Federated Learning. 80:1-80:11
Optimization of Applications
- Yuhao Liu, Xin Du, Zhihui Lu, Qiang Duan, Jianfeng Feng, Minglong Wang, Jie Wu:
Regularizing Sparse and Imbalanced Communications for Voxel-based Brain Simulations on Supercomputers. 81:1-81:11
- Giulia Guidi, Gabriel Raulet, Daniel Rokhsar, Leonid Oliker, Katherine A. Yelick, Aydin Buluç:
Distributed-Memory Parallel Contig Generation for De Novo Long-Read Genome Assembly. 82:1-82:11
- Changdae Kim, Kwangwon Koh, Taehoon Kim, Daegyu Han, Jiwon Seo:
BWA-MEM-SCALE: Accelerating Genome Sequence Mapping on Commodity Servers. 83:1-83:12
- Ricardo Nobre, Aleksandar Ilic, Sergio Santander-Jiménez, Leonel Sousa:
Tensor-Accelerated Fourth-Order Epistasis Detection on GPUs. 84:1-84:11
- Qingcai Jiang, Jielan Li, Junshi Chen, Xinming Qin, Lingyun Wan, Jinlong Yang, Jie Liu, Wei Hu, Hong An:
Accelerating Parallel First-Principles Excited-State Calculation by Low-Rank Approximation with K-Means Clustering. 85:1-85:11
- Sifan Long, Xiaowei Guo, Xiaokang Fan, Chao Li, Kelvin Wong, Ran Zhao, Yi Liu, Sen Zhang, Canqun Yang:
ParallelDualSPHysics: supporting efficient parallel fluid simulations through MPI-enabled SPH method. 86:1-86:11
- Frank Wanye, Vitaliy Gleyzer, Edward K. Kao, Wu-chun Feng:
On the Parallelization of MCMC for Community Detection. 87:1-87:13
- Dian-Lun Lin, Haoxing Ren, Yanqing Zhang, Brucek Khailany, Tsung-Wei Huang:
From RTL to CUDA: A GPU Acceleration Flow for RTL Simulation with Batch Stimulus. 88:1-88:12
