


ICS 2022: Virtual Event
- Lawrence Rauchwerger, Kirk W. Cameron, Dimitrios S. Nikolopoulos, Dionisios N. Pnevmatikatos:
ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28 - 30, 2022. ACM 2022, ISBN 978-1-4503-9281-5
Tools and modeling (I)
- Keren Zhou, Jonathon M. Anderson, Xiaozhu Meng, John M. Mellor-Crummey:
Low overhead and context sensitive profiling of CPU-accelerated applications. 1:1-1:13
- Hossein Golestani, Rathijit Sen, Vinson Young, Gagan Gupta:
Calipers: a criticality-aware framework for modeling processor performance. 2:1-2:14
- Larissa Schmid, Marcin Copik, Alexandru Calotoiu, Dominik Werle, Andreas Reiter, Michael Selzer, Anne Koziolek, Torsten Hoefler:
Performance-detective: automatic deduction of cheap and accurate performance models. 3:1-3:13
New hardware technologies
- Cheng Tan, Thierry Tambe, Jeff Jun Zhang, Bo Fang, Tong Geng, Gu-Yeon Wei, David Brooks, Antonino Tumeo, Ganesh Gopalakrishnan, Ang Li:
ASAP: automatic synthesis of area-efficient and precision-aware CGRAs. 4:1-4:13
- Zixuan Ma, Haojie Wang, Guanyu Feng, Chen Zhang, Lei Xie, Jiaao He, Shengqi Chen, Jidong Zhai:
Efficiently emulating high-bitwidth computation with low-bitwidth hardware. 5:1-5:12
- Daeyoung Park, Heehoon Kim, Jinpyo Kim, Taehyun Kim, Jaejin Lee:
SnuQS: scaling quantum circuit simulation using storage devices. 6:1-6:13
- Ardhi Wiratama Baskara Yudha, Jake Meyer, Shougang Yuan, Huiyang Zhou, Yan Solihin:
LITE: a low-cost practical inter-operable GPU TEE. 7:1-7:13
Graph processing
- Hans Vandierendonck:
Software-defined floating-point number formats and their application to graph processing. 8:1-8:17
- Mohsen Koohi Esfahani, Peter Kilpatrick, Hans Vandierendonck:
MASTIFF: structure-aware minimum spanning tree/forest. 9:1-9:13
- Zhuoran Ji, Cho-Li Wang:
Efficient exact K-nearest neighbor graph construction for billion-scale datasets using GPUs with tensor cores. 10:1-10:12
- Heng Zhang, Lingda Li, Hang Liu, Donglin Zhuang, Rui Liu, Chengying Huan, Shuang Song, Dingwen Tao, Yongchao Liu, Charles He, Yanjun Wu, Shuaiwen Leon Song:
Bring orders into uncertainty: enabling efficient uncertain graph processing via novel path sampling on multi-accelerator systems. 11:1-11:14
I/O and communication
- Chengming Zhang, Sian Jin, Tong Geng, Jiannan Tian, Ang Li, Dingwen Tao:
CEAZ: accelerating parallel I/O via hardware-algorithm co-designed adaptive lossy compression. 12:1-12:13
- Mingzhe Liu, Haikun Liu, Chencheng Ye, Xiaofei Liao, Hai Jin, Yu Zhang, Ran Zheng, Liting Hu:
Towards low-latency I/O services for mixed workloads using ultra-low latency SSDs. 13:1-13:12
- Guangnan Feng, Dezun Dong, Yutong Lu:
Optimized MPI collective algorithms for dragonfly topology. 14:1-14:11
Compilers
- Adhitha Dias, Kirshanthan Sundararajah, Charitha Saumya, Milind Kulkarni:
SparseLNR: accelerating sparse tensor computations using loop nest restructuring. 15:1-15:14
- Sharjeel Khan, Bodhisatwa Chatterjee, Santosh Pande:
VICO: demand-driven verification for improving compiler optimizations. 16:1-16:14
- Alexandru Calotoiu, Tal Ben-Nun, Grzegorz Kwasniewski, Johannes de Fine Licht, Timo Schneider, Philipp Schaad, Torsten Hoefler:
Lifting C semantics for dataflow optimization. 17:1-17:13
Algorithms and accelerators
- Jinpyo Kim, Hyungdal Kwon, Jintaek Kang, Jihwan Park, Seungwook Lee, Jaejin Lee:
SnuHPL: high performance LINPACK for heterogeneous GPUs. 18:1-18:12
- Kamalakkannan Kamalavasan, Gihan R. Mudalige, István Z. Reguly, Suhaib A. Fahmy:
High throughput multidimensional tridiagonal system solvers on FPGAs. 19:1-19:12
- André Müller, Bertil Schmidt, Richard Membarth, Roland Leißa, Sebastian Hack:
AnySeq/GPU: a novel approach for faster sequence alignment on GPUs. 20:1-20:11
- Mohammad Almasri, Izzat El Hajj, Rakesh Nagi, Jinjun Xiong, Wen-Mei Hwu:
Parallel K-clique counting on GPUs. 21:1-21:14
Memory systems and memory management
- Apostolos Kokolis, Namrata Mantri, Shrikanth Ganapathy, Josep Torrellas, John Kalamatianos:
Cloak: tolerating non-volatile cache read latency. 22:1-22:13
- Hugo Tárrega, Alejandro Valero, Vicente Lorente, Salvador Petit, Julio Sahuquillo:
Fast-track cache: a huge racetrack memory L1 data cache. 23:1-23:12
- Minh Pham, Hao Li, Yongke Yuan, Chengcheng Mou, Kandethody Ramachandran, Zichen Xu, Yicheng Tu:
Dynamic memory management in massively parallel systems: a case on GPUs. 24:1-24:13
- Zhongzhe Hu, Junmin Xiao, Zheye Deng, Mingyi Li, Kewei Zhang, Xiaoyang Zhang, Ke Meng, Ninghui Sun, Guangming Tan:
MegTaiChi: dynamic tensor-based memory management optimization for DNN training. 25:1-25:13
Dense and sparse linear algebra
- Andy Nguyen, Ahmed E. Helal, Fabio Checconi, Jan Laukemann, Jesmin Jahan Tithi, Yongseok Soh, Teresa M. Ranadive, Fabrizio Petrini, Jee W. Choi:
Efficient, out-of-memory sparse MTTKRP on massively parallel architectures. 26:1-26:13
- Serif Yesil, José E. Moreira, Josep Torrellas:
Dense dynamic blocks: optimizing SpMM for processors with vector and matrix units using machine learning techniques. 27:1-27:14
- Xiaoyan Liu, Yi Liu, Hailong Yang, Jianjin Liao, Mingzhen Li, Zhongzhi Luan, Depei Qian:
Toward accelerated stencil computation by adapting tensor core unit on GPU. 28:1-28:12
Applications
- Taha Shahroodi, Mahdi Zahedi, Abhairaj Singh, Stephan Wong, Said Hamdioui:
KrakenOnMem: a memristor-augmented HW/SW framework for taxonomic profiling. 29:1-29:14
- Bagus Hanindhito, Dimitrios Gourounas, Arash Fathi, Dimitar Trenev, Andreas Gerstlauer, Lizy K. John:
GAPS: GPU-acceleration of PDE solvers for wave simulation. 30:1-30:13
- Arthur Francisco Lorenzon, Sandro Matheus V. N. Marques, Antoni C. Navarro, Vicenç Beltran:
Seamless optimization of the GEMM kernel for task-based programming models. 31:1-31:11
Tools and modeling (II)
- Wesley Smith, Aidan Goldfarb, Chen Ding:
Beyond time complexity: data movement complexity analysis for matrix multiplication. 32:1-32:12
- Andreas Abel, Jan Reineke:
uiCA: accurate throughput prediction of basic blocks on recent Intel microarchitectures. 33:1-33:14
- Jonathon M. Anderson, Yumeng Liu, John M. Mellor-Crummey:
Preparing for performance analysis at exascale. 34:1-34:13
- Khalid Ayedh Alharthi, Arshad Jhumka, Sheng Di, Franck Cappello:
Clairvoyant: a log-based transformer-decoder for failure prediction in large-scale systems. 35:1-35:14
Machine learning
- Oliver Rausch, Tal Ben-Nun, Nikoli Dryden, Andrei Ivanov, Shigang Li, Torsten Hoefler:
A data-centric optimization framework for machine learning. 36:1-36:13
- Shulai Zhang, Weihao Cui, Quan Chen, Zhengnian Zhang, Yue Guan, Jingwen Leng, Chao Li, Minyi Guo:
PAME: precision-aware multi-exit DNN serving for reducing latencies of batched inferences. 37:1-37:12
- Jiangsu Du, Jiazhi Jiang, Yang You, Dan Huang, Yutong Lu:
Handling heavy-tailed input of transformer inference on GPUs. 38:1-38:11
- Shihui Song, Peng Jiang:
Rethinking graph data placement for graph neural network training on multiple GPUs. 39:1-39:10
