


ICS 2023: Orlando, FL, USA
- Kyle A. Gallivan, Efstratios Gallopoulos, Dimitrios S. Nikolopoulos, Ramón Beivide:
Proceedings of the 37th International Conference on Supercomputing, ICS 2023, Orlando, FL, USA, June 21-23, 2023. ACM 2023
Best Papers
- Jinyang Liu, Sheng Di, Kai Zhao, Xin Liang, Zizhong Chen, Franck Cappello:
FAZ: A flexible auto-tuned modular error-bounded compression framework for scientific data. 1-13
- Neil Lindquist, Piotr Luszczek, Jack J. Dongarra:
Using Additive Modifications in LU Factorization Instead of Pivoting. 14-24
- Jun Xiao, Yaocheng Xiang, Xiaolin Wang, Yingwei Luo, Andy D. Pimentel, Zhenlin Wang:
FLORIA: A Fast and Featherlight Approach for Predicting Cache Performance. 25-36
Compilation and Scheduling
- Thomas Randall, Jaehoon Koo, Brice Videau, Michael Kruse, Xingfu Wu, Paul D. Hovland, Mary W. Hall, Rong Ge, Prasanna Balaprakash:
Transfer-learning-based Autotuning using Gaussian Copula. 37-49
- Lukas Trümper, Tal Ben-Nun, Philipp Schaad, Alexandru Calotoiu, Torsten Hoefler:
Performance Embeddings: A Similarity-Based Transfer Tuning Approach to Performance Optimization. 50-62
- Xu Wen, Wanling Gao, Anzheng Li, Lei Wang, Zihan Jiang, Jianfeng Zhan:
CMLCompiler: A Unified Compiler for Classical Machine Learning. 63-74
- Pu Pang, Yaoxuan Li, Bo Liu, Quan Chen, Zhou Yu, Zhibin Yu, Deze Zeng, Jingwen Leng, Jieru Zhao, Minyi Guo:
PAC: Preference-Aware Co-location Scheduling on Heterogeneous NUMA Architectures To Improve Resource Utilization. 75-86
Tools and Libraries
- Kelun Lei, Xin You, Hailong Yang, Zhongzhi Luan, Depei Qian:
BiRFIA: Selective Binary Rewriting for Function Interception on ARM. 87-98
- Milan Shah, Xiaodong Yu, Sheng Di, Michela Becchi, Franck Cappello:
Lightweight Huffman Coding for Efficient GPU Compression. 99-110
- RuQing G. Xu, Field G. Van Zee, Robert A. van de Geijn:
Towards a Unified Implementation of GEMM in BLIS. 111-121
I/O and Storage
- Md. Arifuzzaman, Engin Arslan:
Use Only What You Need: Judicious Parallelism For File Transfers in High Performance Networks. 122-132
- Meghana Madhyastha, Robert Underwood, Randal C. Burns, Bogdan Nicolae:
DStore: A Lightweight Scalable Learning Model Repository with Fine-Grain Tensor-Level Access. 133-143
- Amelie Chi Zhou, Zhoubin Ke, Jianming Lao:
DyVer: Dynamic Version Handling for Array Databases. 144-154
Accelerator Programming I
- Minh Pham, Yicheng Tu, Xiaoyi Lv:
Accelerating BWA-MEM Read Mapping on GPUs. 155-166
- Lingqi Zhang, Mohamed Wahib, Peng Chen, Jintao Meng, Xiao Wang, Toshio Endo, Satoshi Matsuoka:
PERKS: a Locality-Optimized Execution Model for Iterative Memory-bound GPU Applications. 167-179
- Marcelo Orenes-Vera, Ilya Sharapov, Robert Schreiber, Mathias Jacquelin, Philippe Vandermersch, Sharan Chetlur:
Wafer-Scale Fast Fourier Transforms. 180-191
- Ismayil Ismayilov, Javid Baydamirli, Dogan Sagbili, Mohamed Wahib, Didem Unat:
Multi-GPU Communication Schemes for Iterative Solvers: When CPUs are Not in Charge. 192-202
Large Scale Applications I
- Siddharth Singh, Olatunji Ruwase, Ammar Ahmad Awan, Samyam Rajbhandari, Yuxiong He, Abhinav Bhatele:
A Hybrid Tensor-Expert-Data Parallelism Approach to Optimize Mixture-of-Experts Training. 203-214
- Han D. Tran, Siddharth Saurav, P. Sadayappan, Sandip Mazumder, Hari Sundar:
Scalable parallelization for the solution of phonon Boltzmann Transport Equation. 215-226
- Xiaojian Yang, Shengguo Li, Fan Yuan, Dezun Dong, Chun Huang, Zheng Wang:
Optimizing Multi-grid Computation and Parallelization on Multi-cores. 227-239
- Xinbiao Gan, Guang Wu, Ruigeng Zeng, Jiaqi Si, Ji Liu, Daxiang Dong, Chunye Gong, Cong Liu, Tiejun Li:
FT-topo: Architecture-Driven Folded-Triangle Partitioning for Communication-efficient Graph Processing. 240-250
Accelerator Programming II
- Lingqi Zhang, Mohamed Wahib, Peng Chen, Jintao Meng, Xiao Wang, Toshio Endo, Satoshi Matsuoka:
Revisiting Temporal Blocking Stencil Optimizations. 251-263
- Jou-An Chen, Hsin-Hsuan Sung, Xipeng Shen, Sutanay Choudhury, Ang Li:
BitGNN: Unleashing the Performance Potential of Binary Graph Neural Networks on GPUs. 264-276
- Shaofeng Yang, Xiandong Liu, Yunting Wang, Xin He, Guangming Tan:
Fast All-Pairs Shortest Paths Algorithm in Large Sparse Graph. 277-288
- Vani Nagarajan, Durga Mandarapu, Milind Kulkarni:
RT-kNNS Unbound: Using RT Cores to Accelerate Unrestricted Neighbor Search. 289-300
Large Scale Applications II
- Srinivas Eswar, Benjamin Cobb, Koby Hayashi, Ramakrishnan Kannan, Grey Ballard, Richard W. Vuduc, Haesun Park:
Distributed-Memory Parallel JointNMF. 301-312
- Yu Chen, Lucca Skon, James R. McCombs, Zhenming Liu, Andreas Stathopoulos:
Parallel Software for Million-scale Exact Kernel Regression. 313-323
- Chengming Zhang, Shaden Smith, Baixi Sun, Jiannan Tian, Jonathan Soifer, Xiaodong Yu, Shuaiwen Leon Song, Yuxiong He, Dingwen Tao:
HEAT: A Highly Efficient and Affordable Training System for Collaborative Filtering Based Recommendation on CPUs. 324-335
- Anqi Guo, Yuchen Hao, Chunshu Wu, Pouya Haghi, Zhenyu Pan, Min Si, Dingwen Tao, Ang Li, Martin C. Herbordt, Tong Geng:
Software-Hardware Co-design of Heterogeneous SmartNIC System for Recommendation Models Inference and Training. 336-347
Architecture and Interconnects
- Boyuan Zhang, Jiannan Tian, Sheng Di, Xiaodong Yu, Martin Swany, Dingwen Tao, Franck Cappello:
GPULZ: Optimizing LZSS Lossless Compression for Multi-byte Data on Modern GPUs. 348-359
- Shixun Wu, Yujia Zhai, Jinyang Liu, Jiajun Huang, Zizhe Jian, Bryan M. Wong, Zizhong Chen:
Anatomy of High-Performance GEMM with Online Fault Tolerance on GPUs. 360-372
- Marcin Copik, Roman Böhringer, Alexandru Calotoiu, Torsten Hoefler:
FMI: Fast and Cheap Message Passing for Serverless Functions. 373-385
- Maulein Pathak, Yogish Sabharwal, Neelima Gupta:
Scalable algorithms for compact spanners on real world graphs. 386-397
- Tun Chen, Haipeng Jia, Yunquan Zhang, Kun Li, Zhihao Li, Xiang Zhao, Jianyu Yao, Chendi Li:
OpenFFT: An Adaptive Tuning Framework for 3D FFT on ARM Multicore CPUs. 398-409
Parallel Algorithms
- Grigory Chirkov, David Wentzlaff:
Seizing the Bandwidth Scaling of On-Package Interconnect in a Post-Moore's Law World. 410-422
- Ruiqi Wang, Dezun Dong, Fei Lei, Junchao Ma, Ke Wu, Kai Lu:
Roar: A Router Microarchitecture for In-network Allreduce. 423-436
- Guangnan Feng, Dezun Dong, Shizhen Zhao, Yutong Lu:
GRAP: Group-level Resource Allocation Policy for Reconfigurable Dragonfly Network in HPC. 437-449
- Pouya Haghi, William Krska, Cheng Tan, Tong Geng, Po-Hao Chen, Connor Greenwood, Anqi Guo, Thomas M. Hines, Chunshu Wu, Ang Li, Anthony Skjellum, Martin C. Herbordt:
FLASH: FPGA-Accelerated Smart Switches with GCN Case Study. 450-462
- Gagandeep Singh, Alireza Khodamoradi, Kristof Denolf, Jack Lo, Juan Gómez-Luna, Joseph Melber, Andra Bisca, Henk Corporaal, Onur Mutlu:
SPARTA: Spatial Acceleration for Efficient and Scalable Horizontal Diffusion Weather Stencil Computation. 463-476
- Nicholas Contini, Bharath Ramesh, Kaushik Kandadi Suresh, Tu Tran, Benjamin Michalowicz, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar K. Panda:
Enabling Reconfigurable HPC through MPI-based Inter-FPGA Communication. 477-487
