ICS 2023: Orlando, FL, USA
- Kyle A. Gallivan, Efstratios Gallopoulos, Dimitrios S. Nikolopoulos, Ramón Beivide: Proceedings of the 37th International Conference on Supercomputing, ICS 2023, Orlando, FL, USA, June 21-23, 2023. ACM 2023
Best Papers
- Jinyang Liu, Sheng Di, Kai Zhao, Xin Liang, Zizhong Chen, Franck Cappello: FAZ: A flexible auto-tuned modular error-bounded compression framework for scientific data. 1-13
- Neil Lindquist, Piotr Luszczek, Jack J. Dongarra: Using Additive Modifications in LU Factorization Instead of Pivoting. 14-24
- Jun Xiao, Yaocheng Xiang, Xiaolin Wang, Yingwei Luo, Andy D. Pimentel, Zhenlin Wang: FLORIA: A Fast and Featherlight Approach for Predicting Cache Performance. 25-36
Compilation and Scheduling
- Thomas Randall, Jaehoon Koo, Brice Videau, Michael Kruse, Xingfu Wu, Paul D. Hovland, Mary W. Hall, Rong Ge, Prasanna Balaprakash: Transfer-learning-based Autotuning using Gaussian Copula. 37-49
- Lukas Trümper, Tal Ben-Nun, Philipp Schaad, Alexandru Calotoiu, Torsten Hoefler: Performance Embeddings: A Similarity-Based Transfer Tuning Approach to Performance Optimization. 50-62
- Xu Wen, Wanling Gao, Anzheng Li, Lei Wang, Zihan Jiang, Jianfeng Zhan: CMLCompiler: A Unified Compiler for Classical Machine Learning. 63-74
- Pu Pang, Yaoxuan Li, Bo Liu, Quan Chen, Zhou Yu, Zhibin Yu, Deze Zeng, Jingwen Leng, Jieru Zhao, Minyi Guo: PAC: Preference-Aware Co-location Scheduling on Heterogeneous NUMA Architectures To Improve Resource Utilization. 75-86
Tools and Libraries
- Kelun Lei, Xin You, Hailong Yang, Zhongzhi Luan, Depei Qian: BiRFIA: Selective Binary Rewriting for Function Interception on ARM. 87-98
- Milan Shah, Xiaodong Yu, Sheng Di, Michela Becchi, Franck Cappello: Lightweight Huffman Coding for Efficient GPU Compression. 99-110
- RuQing G. Xu, Field G. Van Zee, Robert A. van de Geijn: Towards a Unified Implementation of GEMM in BLIS. 111-121
I/O and Storage
- Md. Arifuzzaman, Engin Arslan: Use Only What You Need: Judicious Parallelism For File Transfers in High Performance Networks. 122-132
- Meghana Madhyastha, Robert Underwood, Randal C. Burns, Bogdan Nicolae: DStore: A Lightweight Scalable Learning Model Repository with Fine-Grain Tensor-Level Access. 133-143
- Amelie Chi Zhou, Zhoubin Ke, Jianming Lao: DyVer: Dynamic Version Handling for Array Databases. 144-154
Accelerator Programming I
- Minh Pham, Yicheng Tu, Xiaoyi Lv: Accelerating BWA-MEM Read Mapping on GPUs. 155-166
- Lingqi Zhang, Mohamed Wahib, Peng Chen, Jintao Meng, Xiao Wang, Toshio Endo, Satoshi Matsuoka: PERKS: a Locality-Optimized Execution Model for Iterative Memory-bound GPU Applications. 167-179
- Marcelo Orenes-Vera, Ilya Sharapov, Robert Schreiber, Mathias Jacquelin, Philippe Vandermersch, Sharan Chetlur: Wafer-Scale Fast Fourier Transforms. 180-191
- Ismayil Ismayilov, Javid Baydamirli, Dogan Sagbili, Mohamed Wahib, Didem Unat: Multi-GPU Communication Schemes for Iterative Solvers: When CPUs are Not in Charge. 192-202
Large Scale Applications I
- Siddharth Singh, Olatunji Ruwase, Ammar Ahmad Awan, Samyam Rajbhandari, Yuxiong He, Abhinav Bhatele: A Hybrid Tensor-Expert-Data Parallelism Approach to Optimize Mixture-of-Experts Training. 203-214
- Han D. Tran, Siddharth Saurav, P. Sadayappan, Sandip Mazumder, Hari Sundar: Scalable parallelization for the solution of phonon Boltzmann Transport Equation. 215-226
- Xiaojian Yang, Shengguo Li, Fan Yuan, Dezun Dong, Chun Huang, Zheng Wang: Optimizing Multi-grid Computation and Parallelization on Multi-cores. 227-239
- Xinbiao Gan, Guang Wu, Ruigeng Zeng, Jiaqi Si, Ji Liu, Daxiang Dong, Chunye Gong, Cong Liu, Tiejun Li: FT-topo: Architecture-Driven Folded-Triangle Partitioning for Communication-efficient Graph Processing. 240-250
Accelerator Programming II
- Lingqi Zhang, Mohamed Wahib, Peng Chen, Jintao Meng, Xiao Wang, Toshio Endo, Satoshi Matsuoka: Revisiting Temporal Blocking Stencil Optimizations. 251-263
- Jou-An Chen, Hsin-Hsuan Sung, Xipeng Shen, Sutanay Choudhury, Ang Li: BitGNN: Unleashing the Performance Potential of Binary Graph Neural Networks on GPUs. 264-276
- Shaofeng Yang, Xiandong Liu, Yunting Wang, Xin He, Guangming Tan: Fast All-Pairs Shortest Paths Algorithm in Large Sparse Graph. 277-288
- Vani Nagarajan, Durga Mandarapu, Milind Kulkarni: RT-kNNS Unbound: Using RT Cores to Accelerate Unrestricted Neighbor Search. 289-300
Large Scale Applications II
- Srinivas Eswar, Benjamin Cobb, Koby Hayashi, Ramakrishnan Kannan, Grey Ballard, Richard W. Vuduc, Haesun Park: Distributed-Memory Parallel JointNMF. 301-312
- Yu Chen, Lucca Skon, James R. McCombs, Zhenming Liu, Andreas Stathopoulos: Parallel Software for Million-scale Exact Kernel Regression. 313-323
- Chengming Zhang, Shaden Smith, Baixi Sun, Jiannan Tian, Jonathan Soifer, Xiaodong Yu, Shuaiwen Leon Song, Yuxiong He, Dingwen Tao: HEAT: A Highly Efficient and Affordable Training System for Collaborative Filtering Based Recommendation on CPUs. 324-335
- Anqi Guo, Yuchen Hao, Chunshu Wu, Pouya Haghi, Zhenyu Pan, Min Si, Dingwen Tao, Ang Li, Martin C. Herbordt, Tong Geng: Software-Hardware Co-design of Heterogeneous SmartNIC System for Recommendation Models Inference and Training. 336-347
Architecture and Interconnects
- Boyuan Zhang, Jiannan Tian, Sheng Di, Xiaodong Yu, Martin Swany, Dingwen Tao, Franck Cappello: GPULZ: Optimizing LZSS Lossless Compression for Multi-byte Data on Modern GPUs. 348-359
- Shixun Wu, Yujia Zhai, Jinyang Liu, Jiajun Huang, Zizhe Jian, Bryan M. Wong, Zizhong Chen: Anatomy of High-Performance GEMM with Online Fault Tolerance on GPUs. 360-372
- Marcin Copik, Roman Böhringer, Alexandru Calotoiu, Torsten Hoefler: FMI: Fast and Cheap Message Passing for Serverless Functions. 373-385
- Maulein Pathak, Yogish Sabharwal, Neelima Gupta: Scalable algorithms for compact spanners on real world graphs. 386-397
- Tun Chen, Haipeng Jia, Yunquan Zhang, Kun Li, Zhihao Li, Xiang Zhao, Jianyu Yao, Chendi Li: OpenFFT: An Adaptive Tuning Framework for 3D FFT on ARM Multicore CPUs. 398-409
Parallel Algorithms
- Grigory Chirkov, David Wentzlaff: Seizing the Bandwidth Scaling of On-Package Interconnect in a Post-Moore's Law World. 410-422
- Ruiqi Wang, Dezun Dong, Fei Lei, Junchao Ma, Ke Wu, Kai Lu: Roar: A Router Microarchitecture for In-network Allreduce. 423-436
- Guangnan Feng, Dezun Dong, Shizhen Zhao, Yutong Lu: GRAP: Group-level Resource Allocation Policy for Reconfigurable Dragonfly Network in HPC. 437-449
- Pouya Haghi, William Krska, Cheng Tan, Tong Geng, Po Hao Chen, Connor Greenwood, Anqi Guo, Thomas M. Hines, Chunshu Wu, Ang Li, Anthony Skjellum, Martin C. Herbordt: FLASH: FPGA-Accelerated Smart Switches with GCN Case Study. 450-462
- Gagandeep Singh, Alireza Khodamoradi, Kristof Denolf, Jack Lo, Juan Gómez-Luna, Joseph Melber, Andra Bisca, Henk Corporaal, Onur Mutlu: SPARTA: Spatial Acceleration for Efficient and Scalable Horizontal Diffusion Weather Stencil Computation. 463-476
- Nicholas Contini, Bharath Ramesh, Kaushik Kandadi Suresh, Tu Tran, Benjamin Michalowicz, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar K. Panda: Enabling Reconfigurable HPC through MPI-based Inter-FPGA Communication. 477-487