


ICPP 2024: Gotland, Sweden
- Proceedings of the 53rd International Conference on Parallel Processing, ICPP 2024, Gotland, Sweden, August 12-15, 2024. ACM 2024, ISBN 979-8-4007-1793-2

Algorithm Optimization
- Wojciech Kwedlo: Parallel Iterative Mistake Minimization (IMM) clustering algorithm for shared-memory systems. 1-10
- Subhajit Sahu, Kishore Kothapalli, Dip Sankar Banerjee: Fast Leiden Algorithm for Community Detection in Shared Memory Setting. 11-20
- Xianglin Wang, Xin Yi, Hengbiao Yu, Chun Huang, Lin Peng: Parallel Optimization for Accelerating the Generation of Correctly Rounded Elementary Functions. 21-31
- Abhishek V. N. Taraka Josyula, Pritesh Verma, Amar Gaonkar, Amlan Barua, Nikhil Hegde: Optimizing a Super-Fast Eigensolver for Hierarchically Semiseparable Matrices. 32-41
Best Paper Finalists
- Donney Fan, Ben Liang: Online Non-preemptive Multi-Resource Scheduling for Weighted Completion Time on Multiple Machines. 42-51
- Yi Zong, Peinan Yu, Haopeng Huang, Wei Xue: FP16 Acceleration in Structured Multigrid Preconditioner for Real-World Applications. 52-62
- Kan Zhong, Zhiwang Yu, Qiao Li, Xianqiang Luo, Linbo Long, Yujuan Tan, Ao Ren, Duo Liu: DPC: DPU-accelerated High-Performance File System Client. 63-72
Co-design
- Sonia Rani Gupta, Nikela Papadopoulou, Jing Chen, Miquel Pericàs: Co-Design of Convolutional Algorithms and Long Vector RISC-V Processors for Efficient CNN Model Serving. 73-83
- Quentin Anthony, Jacob Hatef, Deepak Narayanan, Stella Biderman, Stas Bekman, Junqi Yin, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda: The Case for Co-Designing Model Architectures with Hardware. 84-96
- Qifeng Pan, Ralf Schneider: Improving efficiency of Monte Carlo method via code intrinsic framework. 97-106
- Yongseok Soh, Ramakrishnan Kannan, Piyush Sao, Jee W. Choi: Accelerated Constrained Sparse Tensor Factorization on Massively Parallel Architectures. 107-116
Communication and Networks
- Ujjaini Mukhopadhyay, Alok Tripathy, Oguz Selvitopi, Katherine A. Yelick, Aydin Buluç: Sparsity-Aware Communication for Distributed Graph Neural Network Training. 117-126
- Jing Xu, Zhan Wang, Fan Yang, Ning Kang, Zhenlong Ma, Guojun Yuan, Guangming Tan, Ninghui Sun: FNCC: Fast Notification Congestion Control in Data Center Networks. 127-137
- Wen Xu, Juncheng Wang, Ben Liang, Gary Boudreau, Hamza Umit Sokun: Distributed Minimax Fair Optimization over Hierarchical Networks. 138-147
Communication and Scalability
- Jing Peng, Zihan Li, Shaohuai Shi, Bo Li: Sparse Gradient Communication with AlltoAll for Accelerating Distributed Deep Learning. 148-157
- Xinbiao Gan, Tiejun Li, Qiang Zhang, Bo Yang, Xinhai Chen, Jie Liu: SuperCSR: A Space-Time-Efficient CSR Representation for Large-scale Graph Applications on Supercomputers. 158-167
- Tim Beringer, Jakob Stock, Arya Mazaheri, Felix Wolf: Dissecting Convolutional Neural Networks for Runtime and Scalability Prediction. 168-178
GPU Memory
- Jiajian Zhang, Fangyu Wu, Hai Jiang, Guangliang Cheng, Genlang Chen, Qiufeng Wang: SyncMalloc: A Synchronized Host-Device Co-Management System for GPU Dynamic Memory Allocation across All Scales. 179-188
- Abdun Nihaal, Madhu Mutyam: Selective Memory Compression for GPU Memory Oversubscription Management. 189-198
- Gabin Schieffer, Jacob Wahlgren, Jie Ren, Jennifer Faj, Ivy Peng: Harnessing Integrated CPU-GPU System Memory for HPC: a first look into Grace Hopper. 199-209
In-situ Workflow
- Jiahui Liu, Tobias Edwards, Kristina Durovic, Philipp Schlatter, Tino Weinkauf: In-Situ Binary Segmentation of 3D time-dependent Flows into Laminar and Turbulent Regions. 210-219
- Dewi Yokelson, Mikhail Titov, Srinivasan Ramesh, Ozgur O. Kilic, Matteo Turilli, Shantenu Jha, Allen D. Malony: Enabling Performance Observability for Heterogeneous HPC Workflows with SOMA. 220-230
- Jaime Cernuda, Jie Ye, Anthony Kougkas, Xian-He Sun: HStream: A hierarchical data streaming engine for high-throughput scientific applications. 231-240
Parallel Algorithm
- Jialin Li, Zhichen Feng, Yaqian Gao, Shaobo Tian, Haoyuan Zhang, Huang Ye, Jian Zhang: High-Performance 3D convolution on the Latest Generation Sunway Processor. 241-251
- Gaurav Bhardwaj, Bapi Chatterjee, Abhinav Sharma, Sathya Peri, Siddharth Nayak: Kanva: A Lock-free Learned Search Data Structure. 252-261
- Haopeng Huang, Yuyang Jin, Wei Xue: BoostN: Optimizing Imbalanced Neighborhood Communication on Homogeneous Many-Core System. 262-272
Parallel Language
- Buddhi Ashan Mallika Kankanamalage, Satish Puri, Sushil K. Prasad: Extending Segment Tree for Polygon Clipping and Parallelizing using OpenMP and OpenACC Directives. 273-283
- Ruben Laso, Diego Krupitza, Sascha Hunold: Exploring Scalability in C++ Parallel STL Implementations. 284-293
- Zaman Lantra, Steven A. Wright, Gihan R. Mudalige: OP-PIC - an Unstructured-Mesh Particle-in-Cell DSL for Developing Nuclear Fusion Simulations. 294-304
Scheduling Cloud
- Svetlana Kulagina, Henning Meyerhenke, Anne Benoit: Mapping Large Memory-constrained Workflows onto Heterogeneous Platforms. 305-316
- Liang Zhang, Hongzi Zhu, Yunzhe Li, Jiangang Shen, Minyi Guo: The Blind and the Elephant: A Preference-aware Edge Video Analytics Scheduler for Maximizing System Benefit. 317-326
- Tomasz Kanas, Krzysztof Rzadca: Diminishing cold starts in serverless computing with approximation algorithms. 327-336
- Jiawei Huang, Qile Wang, Zhaoyi Li, Yijun Li, Zihao Chen, Sitan Li, Jing Shao, Jingling Liu, Min Zhan, Jianxin Wang: Achieving Efficient Scheduling based on Accurate Measurement of Small Flows in Data Center. 337-346
- Huadong Li, Hui Liu, Aoqi Chen, Xirui Ma, Junzhao Du: Thawbringer: An Orchestrator to Mitigate Cascading Cold Starts of Serverless Function Chains. 347-356
- Ying Zheng, Lei Jiao, Han Yang, Lulu Chen, Ying Liu, Yuxiao Wang, Yuedong Xu, Xin Wang, Zongpeng Li: Online Scheduling and Pricing for Multi-LoRA Fine-Tuning Tasks. 357-366
- Xin Tan, Jiamin Li, Yitao Yang, Jingzong Li, Hong Xu: Arlo: Serving Transformer-based Language Models with Dynamic Input Lengths. 367-376
Scientific Simulations
- Yaqian Gao, Jian Zhang, Huang Ye, Xuebin Chi: Large-scale Phase-Field Simulations for Solid-Solid Phase Transformations involving Elastic Energy. 377-387
- Shui Jiang, Rongliang Fu, Lukas Burgholzer, Robert Wille, Tsung-Yi Ho, Tsung-Wei Huang: FlatDD: A High-Performance Quantum Circuit Simulator using Decision Diagram and Flat Array. 388-399
- Yi Zhang, Ziyu Zhang, Yang Zhao, Junshi Chen, Hong An, Zhanming Wang, Longkui Chen: Multi-level Load Balancing Strategies for Massively Parallel Smoothed Particle Hydrodynamics Simulation. 400-410
- Ran Zhao, Chao Li, Xiaowei Guo, Sen Zhang, Xi Yang, Tao Tang, Canqun Yang: A Motion Trace Decomposition-based overset grid method for parallel CFD simulations with moving boundaries. 411-420
Distributed Systems
- Conor James Green, Mithuna Thottethodi: NetSmith: An Optimization Framework for Machine-Discovered Network Topologies. 421-432
- Chen Chen, Li Shen, Yingwen Chen: A Distributed Framework for Subgraph Isomorphism Leveraging CPU and GPU Heterogeneous Computing. 433-442
- Jinbin Hu, Ying Liu, Hao Wang, Jin Wang: AutoPipe: Automatic Configuration of Pipeline Parallelism in Shared GPU Cluster. 443-452
- Ruisong Zhou, Yuzhan Zhang, Chunhua Li, Ke Zhou, Peng Wang, Gong Zhang, Ji Zhang, Guangyu Zhang: HyperDB: a Novel Key Value Store for Reducing Background Traffic in Heterogeneous SSD Storage. 453-463
Federated Learning
- Fuyuan Xia, Chenhao Ying, David S. L. Wei, Wei Chen, Weiting Zhang, Haiming Jin, Yuan Luo: ChronusFed: Reinforcement-Based Adaptive Partial Training for Heterogeneous Federated Learning. 464-473
- Md Sirajul Islam, Simin Javaherian, Fei Xu, Xu Yuan, Li Chen, Nian-Feng Tzeng: FedClust: Tackling Data Heterogeneity in Federated Learning through Weight-Driven Client Clustering. 474-483
- Jiangshan Hao, Fang Dong, Bingheng Cen, Shucun Fu, Ruiting Zhou, Ding Ding: HASFL: Harnessing Heterogeneous Models Across Diverse Devices for Enhanced Federated Learning. 484-493
- Na Lv, Zhi Shen, Chen Chen, Zhifeng Jiang, Jiayi Zhang, Quan Chen, Minyi Guo: FedCA: Efficient Federated Learning with Client Autonomy. 494-503
GPU Cluster Optimization
- Bowen Zhang, Shuxin Li, Zhuozhao Li: MIGER: Integrating Multi-Instance GPU and Multi-Process Service for Deep Learning Clusters. 504-513
- Fei Yang, Shuang Peng, Ning Sun, Fangyu Wang, Yuanyuan Wang, Fu Wu, Jiezhong Qiu, Aimin Pan: Holmes: Towards Distributed Training Across Clusters with Heterogeneous NIC Environment. 514-523
- Bowen Yuchi, Heng Shi, Guoqing Bao: SPHINX: Search Space-Pruning Heterogeneous Task Scheduling for Deep Neural Networks. 524-533
Graph on GPU
- Chenle Yu, Sara Royuela, Eduardo Quiñones: Enhancing Heterogeneous Computing Through OpenMP and GPU Graph. 534-543
- Shinnung Jeong, Sungjun Cho, Yongwoo Lee, Hyunjun Park, Seonyeong Heo, Gwangsun Kim, Youngsok Kim, Hanjun Kim: CR2: Community-aware Compressed Regular Representation for Graph Processing on a GPU. 544-554
- Chen Zhao, Ting Yu, Zhigao Zheng, Yuanyuan Zhu, Song Jin, Bo Du, Dacheng Tao: SpeedCore: Space-efficient and Dependency-aware GPU Parallel Framework for Core Decomposition. 555-564
- Chih-Chun Chang, Boyang Zhang, Tsung-Wei Huang: GSAP: A GPU-Accelerated Stochastic Graph Partitioner. 565-575
- Mahesh Lakshminarasimhan, Mary W. Hall, Samuel Williams, Oscar Antepara: BrickDL: Graph-Level Optimizations for DNNs with Fine-Grained Data Blocking on GPUs. 576-586
- Mithinti Srikanth, Prashant Singh, G. Ramakrishna: GPU Algorithms for Fastest Path Problem in Temporal Graphs. 587-596
Memory and Storage
- Wenda Tang, Ying Han, Tianxiang Ai, Guanghui Li, Bin Yu, Xin Yang: Yggdrasil: Reducing Network I/O Tax with (CXL-Based) Distributed Shared Memory. 597-606
- Ziwei Xiong, Dejun Jiang, Jin Xiong: DiStore: A Fully Memory Disaggregation Friendly Key-Value Store with Improved Tail Latency and Space Efficiency. 607-617
- Liuying Ma, Zhenqing Liu, Jin Xiong, Yue Wu, Renhai Chen, Xi Peng, Ying Zhang, Gong Zhang, Dejun Jiang: zQoS: Unleashing full performance capabilities of NVMe SSDs while enforcing SLOs in distributed storage systems. 618-628
Memory Optimization
- Stavroula Zouzoula, Mohammad Ali Maleki, Muhammad Waqar Azhar, Pedro Trancoso: Scratchpad Memory Management for Deep Learning Accelerators. 629-639
- Jihu Guo, Rui Xia, Jie Liu, Xiaoxiong Zhu, Xiang Zhang: CAMLB-SpMV: An Efficient Cache-Aware Memory Load-Balancing SpMV on CPU. 640-649
- Qisheng Jiang, Lei Jia, Chundong Wang: GNNDrive: Reducing Memory Contention and I/O Congestion for Disk-based GNN Training. 650-659
- XinYu Piao, Jong-Kook Kim: GMM: An Efficient GPU Memory Management-based Model Serving System for Multiple DNN Inference Models. 660-668
Performance Optimization
- Kaveh Mahdavi: A Hybrid Machine Learning Method for Cross-Platform Performance Prediction of Parallel Applications. 669-678
- Fugeng Zhu, Xinxin Qi, Peng Zhang, Jianbin Fang, Tao Tang, Yonggang Che, Kainan Yu, Jing Xie, Chun Huang, Jie Ren: Optimizing Stencil Computation on Multi-core DSPs. 679-690
- Ricardo Jesus, Michèle Weiland: Evaluating and optimising compiler code generation for NVIDIA Grace. 691-700
Resource Allocation
- Yingwen Chen, Wenxin Li, Huan Zhou, Xiangrui Yang, Yanfei Yin: DeInfer: A GPU resource allocation algorithm with spatial sharing for near-deterministic inferring tasks. 701-711
- Jiazhen Zhu, Wenda Tang, Xianglong Meng, Nan Gong, Tianxiang Ai, Guanghui Li, Bin Yu, Xin Yang: PheCon: Fine-Grained VM Consolidation with Nimble Resource Defragmentation in Public Cloud Platforms. 712-721
- Dingyu Yang, Ziyang Xiao, Dongxiang Zhang, Shuhao Zhang, Jian Cao, Gang Chen: PREACT: Predictive Resource Allocation for Bursty Workloads in a Co-located Data Center. 722-731
Scheduling Cloud
- Siyuan Chen, Decheng Zuo, Zhan Zhang: FlexSP: (1 + β)-Choice based Flexible Stream Partitioning for Stateful Operators. 732-741
- Wen Gao, Zhiwen Yu, Hui Xiong, Bin Guo, Liang Wang, Yuan Yao: Parallel Task Scheduling in Autonomous Robotic Systems: An Event-Driven Multimodal Prediction Approach. 742-751
- Bin Gao, Zhehui Wang, Zhuomin He, Tao Luo, Weng-Fai Wong, Zhi Zhou: IMI: In-memory Multi-job Inference Acceleration for Large Language Models. 752-761
Scheduling Edge
- Bei Ouyang, Shengyuan Ye, Liekang Zeng, Tianyi Qian, Jingyi Li, Xu Chen: Pluto and Charon: A Time and Memory Efficient Collaborative Edge AI Framework for Personal LLMs Fine-tuning. 762-771
- Huadong Li, Hui Liu, Aoqi Chen, Xirui Ma, Qiaoqiao Liu, Junzhao Du: RIA: Return on Investment Auto-scaler for Serverless Edge Functions. 772-781
- Yan Zhuang, Zhenzhe Zheng, Yunfeng Shao, Bingshuai Li, Fan Wu, Guihai Chen: Nebula: An Edge-Cloud Collaborative Learning Framework for Dynamic Edge Environments. 782-791
- Jieyu Lin, Minghao Li, Sai Qian Zhang, Alberto Leon-Garcia: Murmuration: On-the-fly DNN Adaptation for SLO-Aware Distributed Inference in Dynamic Edge Environments. 792-801
Tools
- Praseetha M, Madhu Mutyam, Venkata Kalyan Tavva: Cache Line Pinning for Mitigating Row Hammer Attack. 802-811
- Jie Ye, Jaime Cernuda, Neeraj Rajesh, Keith Bateman, Orcun Yildiz, Tom Peterka, Arnur Nigmetov, Dmitriy Morozov, Xian-He Sun, Anthony Kougkas, Bogdan Nicolae: Viper: A High-Performance I/O Framework for Transparently Updating, Storing, and Transferring Deep Neural Network Models. 812-821
- Siyu Wu, Hailong Yang, Xin You, Ruihao Gong, Yi Liu, Zhongzhi Luan, Depei Qian: PRoof: A Comprehensive Hierarchical Profiling Framework for Deep Neural Networks with Roofline Analysis. 822-832
- Simon Schwitanski, Yussur Mustafa Oraji, Cornelius Pätzold, Joachim Jenke, Felix Tomski, Matthias S. Müller: RMASanitizer: Generalized Runtime Detection of Data Races in Remote Memory Access Applications. 833-844
Compression
- Tri Nguyen, Md Hasanur Rahman, Sheng Di, Michela Becchi: Significantly Improving Fixed-Ratio Compression Framework for Resource-limited Applications. 845-855
- André Weißenberger, Bertil Schmidt: Massively Parallel Inverse Block-sorting Transforms for bzip2 Decompression on GPUs. 856-865
- Zichen Tang, Junlin Huang, Rudan Yan, Yuxin Wang, Zhenheng Tang, Shaohuai Shi, Amelie Chi Zhou, Xiaowen Chu: Bandwidth-Aware and Overlap-Weighted Compression for Communication-Efficient Federated Learning. 866-875
Configurable Hardware
- Kyle Zhao Bin Chen, Tarek S. Abdelrahman, Reza Azimi, Tomasz S. Czajkowski, Maziar Goudarzi: RoDMap: A Reserve-on-Demand Mapper for Spatially-Configured Coarse-Grained Reconfigurable Arrays. 876-886
- Jie Cheng, Lifu Hu, Wei Xu, Hanhua Chen, Tian Xia: Hardware Acceleration of Minimap2 Genomic Sequence Alignment Algorithm. 887-897
- Weilin Zhu, Wei Tong, Hujun Ge, Zuoxian Zhang, Mengran Zhang, Wen Zhou: LpaqHP: A High-Performance FPGA Accelerator for LPAQ Compression. 898-907
Distributed Memory
- Piyush Sao, Andrey Prokopenko, Damien Lebrun-Grandié: PANDORA: A Parallel Dendrogram Construction Algorithm for Single Linkage Clustering on GPU. 908-918
- Yifan Li, Giulia Guidi: High-Performance Sorting-Based K-mer Counting in Distributed Memory with Flexible Hybrid Parallelism. 919-928
- Rui Zhang, Yukai Huang, Sicheng Liang, Shangyi Sun, Shaonan Ma, Chengying Huan, Lulu Chen, Zhihui Lu, Yang Xu, Ming Yan, Jie Wu: Revisiting Learned Index with Byte-addressable Persistent Storage. 929-938
Energy-aware Computing
- Hanfei Geng, Yi Sun, Yuanzhe Li, Jichao Leng, Xiangyu Zhu, Xianyuan Zhan, Yuanchun Li, Feng Zhao, Yunxin Liu: TESLA: Thermally Safe, Load-Aware, and Energy-Efficient Cooling Control System for Data Centers. 939-949
- Hanlong Liao, Guoming Tang, Deke Guo, Yi Wang, Ruide Cao: Rethinking Low-Carbon Edge Computing System Design with Renewable Energy Sharing. 950-960
- Tiago Da Silva Barros, Davide Ferré, Frédéric Giroire, Ramon Aparicio-Pardo, Stephane Perennes: Scheduling Machine Learning Compressible Inference Tasks with Limited Energy Budget. 961-970
Federated Learning
- Haoyu Chen, Yuxin Zhang, Jin Zhao, Xin Wang, Yuedong Xu: Gradient Free Personalized Federated Learning. 971-980
- Yinlong Li, Hao Zhang, Siyao Cheng, Jie Liu: Federated Edge Learning with Blurred or Pseudo Data Sharing. 981-990
- Dezhong Yao, Ziquan Zhu, Tongtong Liu, Zhiqiang Xu, Hai Jin: Rethinking Personalized Federated Learning from Knowledge Perspective. 991-1000
GPU Optimization
- Xu Zhang, Guangda Zhang, Lu Wang, Shiqing Zhang, Xia Zhao: AdCoalescer: An Adaptive Coalescer to Reduce the Inter-Module Traffic in MCM-GPUs. 1001-1011
- Jaebeom Jeon, Minseong Gil, Junsu Kim, Jaeyong Park, Gunjae Koo, Myung Kuk Yoon, Yunho Oh: VitBit: Enhancing Embedded GPU Performance for AI Workloads through Register Operand Packing. 1012-1021
- Qianchao Zhu: FreeStencil: A Fine-Grained Solver Compiler with Graph and Kernel Optimizations on Structured Meshes for Modern GPUs. 1022-1031
Memory-centric Computing
- Pingdan Xiao, Qinghui Hong, Sichun Du, Jiliang Zhang: CIM-KF: Efficient Computing-in-memory Circuits for Full-Process Execution of Kalman Filter Algorithm. 1032-1041
- Mohammad Sabri Abrebekoh, Marc Riera Villanueva, Antonio González: ReDy: A Novel ReRAM-centric Dynamic Quantization Approach for Energy-efficient CNNs. 1042-1051
- Tong Wu, Shuibing He, Jianxin Zhu, Weijian Chen, Siling Yang, Ping Chen, Yanlong Yin, Xuechen Zhang, Xian-He Sun, Gang Chen: AUTOHET: An Automated Heterogeneous ReRAM-Based Accelerator for DNN Inference. 1052-1061
- Meven Mognol, Dominique Lavenier, Julien Legriel: Parallelization of the Banded Needleman & Wunsch Algorithm on UPMEM PiM Architecture for Long DNA Sequence Alignment. 1062-1071
Simulations on GPU
- Zhiyi Zhang, Pengfei Zhang, Zhuopin Xu, Bingjie Yan, Qi Wang: Im2col-Winograd: An Efficient and Flexible Fused-Winograd Convolution for NHWC Format on GPUs. 1072-1081
- Taisuke Boku, Masatake Sugita, Ryohei Kobayashi, Shinnosuke Furuya, Takuya Fujie, Masahito Ohue, Yutaka Akiyama: Improving Performance on Replica-Exchange Molecular Dynamics Simulations by Optimizing GPU Core Utilization. 1082-1091
- Fazeleh S. Kazemian, Jorge L. Galvez Vallejo, Giuseppe M. J. Barca: High-Performance, Accurate Large-Scale Quantum Chemistry Calculations on GPU Supercomputers using Coulomb-Perturbed Fragmentation. 1092-1102
- Runfeng Jin, Wenhao Liang, Haoyuan Zhang, Yinxuan Song, Zhen Luo, Haibo Ma, Yingjin Ma, Zhong Jin: PASCI: A Scalable Framework for Heterogeneous Parallel Calculation of Dynamical Electron Correlation. 1103-1113
Sparse Tensor
- Seungbin Song, Ju Min Lee, Haeeun Jeong, Hyunho Kwon, Shinnung Jeong, Jaeho Lee, Hanjun Kim: TeMCO: Tensor Memory Compiler Optimization across Tensor Decompositions in Deep Learning Inference. 1114-1123
- Kaige Zhang, Xiaoyan Liu, Hailong Yang, Tianyu Feng, Xinyu Yang, Yi Liu, Zhongzhi Luan, Depei Qian: Jigsaw: Accelerating SpMM with Vector Sparsity on Sparse Tensor Core. 1124-1134
- YuAng Chen, Jeffrey Xu Yu: Bitmap-Based Sparse Matrix-Vector Multiplication with Tensor Cores. 1135-1144
SpMV
- Deshun Bi, Shengguo Li, Dezun Dong, Peng Zhang, Jianbin Fang: Optimizing SpMV on Heterogeneous Multi-Core DSPs through Improved Locality and Vectorization. 1145-1155
- Zhong Zheng, Junshi Chen, Yang Zhao, Longsheng Song, Xinming Qin, Hong An: DB-SpGEMM: A Massively Distributed Block-Sparse Matrix-Matrix Multiplication for Linear-Scaling DFT Calculations. 1156-1165
- Chuhe Hong, Qinglin Wang, Runzhang Mao, Yuechao Liang, Rui Xia, Jie Liu: SaSpGEMM: Sorting-Avoiding Sparse General Matrix-Matrix Multiplication on Multi-Core Processors. 1166-1175
- Haotian Mo, Qinglin Wang, Linyu Liao, Biao Li, Lihua Chi, Jie Liu: Detailed Analysis and Optimization of Irregular-Shaped Matrix Multiplication on Multi-Core DSPs. 1176-1186
Storage
- Junhyeok Park, Chang-Gyu Lee, Soon Hwang, Soonyeal Yang, Jungki Noh, Woosuk Chung, Junghee Lee, Youngjae Kim: BandSlim: A Novel Bandwidth and Space-Efficient KV-SSD with an Escape-from-Block Approach. 1187-1196
- Guantian Lin, Si Wu, Cheng Li, Yinlong Xu: Designing Non-uniform Locally Repairable Codes for Wide Stripes under Skewed File Accesses. 1197-1206
- Piao Hu, Huangzhen Xue, Chentao Wu, Jie Li, Minyi Guo: HMT: A Hybrid Mitigating and Transferring Approach on I/O Throughput Degradation for Erasure Coded Storage Systems. 1207-1216
- Renping Liu, Junhua Chen, Peng Chen, Linbo Long, Anping Xiong, Duo Liu: Hi-ZNS: High Space Efficiency and Zero-Copy LSM-Tree Based Stores on ZNS SSDs. 1217-1226
- Jiawei Huang, Zihao Chen, Yiting Wang, Hui Li, Zhaoyi Li, Qile Wang, Sitan Li, Zhidong He, Wanchun Jiang: Achieving High Efficiency for Datacenter Multicast using Skewed Bloom Filter. 1227-1236
- Shucheng Wang, Kaiye Zhou, Zhandong Guo, Qiang Cao, Jun Xu, Jie Yao: SIndex: An SSD-based Large-scale Indexing with Deterministic Latency for Cloud Block Storage. 1237-1246
- Jiawei Huang, Shengwen Zhou, Zhaoyi Li, Yijun Li, Zihao Chen, Xiaojun Zhu, Jing Shao, Sitan Li, Wanchun Jiang, Jianxin Wang, Ping Zhong: Coupling Congestion Control and Flow Pausing in Data Center Network. 1247-1256
