


ICS 2021: Virtual Event, USA
- Huiyang Zhou, Jose Moreira, Frank Mueller, Yoav Etsion: ICS '21: 2021 International Conference on Supercomputing, Virtual Event, USA, June 14-17, 2021. ACM 2021, ISBN 978-1-4503-8335-6
Loop optimizations
- Brandon Neth, Thomas R. W. Scogland, Bronis R. de Supinski, Michelle Mills Strout: Inter-loop optimization in RAJA using loop chains. 1-12
- Khaled Abdelaal, Martin Kong: Tile size selection of affine programs for GPGPUs using polyhedral cross-compilation. 13-26
- Kumudha Narasimhan, Aravind Acharya, Abhinav Baid, Uday Bondhugula: A practical tile size selection model for affine loop nests. 27-39
Program analysis and benchmarking
- Wenwen Wang, Pei-Hung Lin: Does it matter?: OMPSanitizer: an impact analyzer of reported data races in OpenMP programs. 40-51
- Xin Zhao, Jin Zhou, Hui Guan, Wei Wang, Xu Liu, Tongping Liu: NumaPerf: predictive NUMA profiling. 52-62
- Alexandros Nikolaos Ziogas, Tal Ben-Nun, Timo Schneider, Torsten Hoefler: NPBench: a benchmarking suite for high-performance NumPy. 63-74
- Xiaofan Sun, Rajiv Gupta: DSGEN: concolic testing GPU implementations of concurrent dynamic data structures. 75-87
Managing parallelism
- Seonmyeong Bak, Oscar R. Hernandez, Mark Gates, Piotr Luszczek, Vivek Sarkar: Task-graph scheduling extensions for efficient synchronization and communication. 88-101
- Amirhossein Mirhosseini, Thomas F. Wenisch: μSteal: a theory-backed framework for preemptive work and resource stealing in mixed-criticality microservices. 102-114
- Hongshi Tan, Xinyu Chen, Yao Chen, Bingsheng He, Weng-Fai Wong: ThundeRiNG: generating multiple independent random number sequences on FPGAs. 115-126
Resilience and security
- Yujia Zhai, Elisabeth Giem, Quan Fan, Kai Zhao, Jinyang Liu, Zizhong Chen: FT-BLAS: a high performance BLAS implementation with online fault tolerance. 127-138
- Shougang Yuan, Yan Solihin, Huiyang Zhou: PSSM: achieving secure memory for GPUs with partitioned and sectored security metadata. 139-151
New architectures for HPC
- Yaoyang Zhou, Zihao Yu, Chuanqi Zhang, Yinan Xu, Huizhe Wang, Sa Wang, Ninghui Sun, Yungang Bao: Omegaflow: a high-performance dependency-based architecture. 152-163
- Adrián Barredo, Adrià Armejach, Jonathan C. Beard, Miquel Moretó: PLANAR: a programmable accelerator for near-memory data rearrangement. 164-176
- Markos Kynigos, Jose Antonio Pascual, Javier Navaridas, John Goodacre, Mikel Luján: Power and energy efficient routing for Mach-Zehnder interferometer based photonic switches. 177-189
Exploiting non-volatile memory
- Jiawen Liu, Dong Li, Roberto Gioiosa, Jiajia Li: Athena: high-performance sparse tensor contraction sequence on heterogeneous memory. 190-202
- Jie Ren, Jiaolin Luo, Ivy Bo Peng, Kai Wu, Dong Li: Optimizing large-scale plasma simulations on persistent memory-based heterogeneous memory with effective data placement across memory hierarchy. 203-214
- Zhen Xie, Wenqian Dong, Jie Liu, Ivy Bo Peng, Yanbao Ma, Dong Li: MD-HM: memoization-based molecular dynamics simulations on big memory system. 215-226
Machine learning
- Xin He, Jiawen Liu, Zhen Xie, Hao Chen, Guoyang Chen, Weifeng Zhang, Dong Li: Enabling energy-efficient DNN training on hybrid GPU-FPGA accelerators. 227-241
- Yuliana Zamora, Logan T. Ward, Ganesh Sivaraman, Ian T. Foster, Henry Hoffmann: Proxima: accelerating the integration of machine learning in atomistic simulations. 242-253
- Gunduz Vehbi Demirci, Hakan Ferhatosmanoglu: Partitioning sparse deep neural networks for scalable training and inference. 254-265
- Chengming Zhang, Geng Yuan, Wei Niu, Jiannan Tian, Sian Jin, Donglin Zhuang, Zhe Jiang, Yanzhi Wang, Bin Ren, Shuaiwen Leon Song, Dingwen Tao: ClickTrain: efficient and accurate end-to-end deep learning training via fine-grained architecture-preserving pruning. 266-278
- Rohan Baskar Prabhakar, Sachit Kuhar, Rohit Agrawal, Christopher J. Hughes, Christopher W. Fletcher: SumMerge: an efficient algorithm and implementation for weight repetition-aware DNN inference. 279-290
- MohammadHossein Olyaiy, Christopher Ng, Mieszko Lis: Accelerating DNNs inference with predictive layer fusion. 291-303
- Siling Yang, Weijian Chen, Xuechen Zhang, Shuibing He, Yanlong Yin, Xian-He Sun: AUTO-PRUNE: automated DNN pruning and mapping for ReRAM-based accelerator. 304-315
Data locality and vectorization
- Peng Chen, Mohamed Wahib, Xiao Wang, Shin'ichiro Takizawa, Takahiro Hirofuchi, Hirotaka Ogawa, Satoshi Matsuoka: Performance portable back-projection algorithms on CPUs: agnostic data locality and vectorization optimizations. 316-328
- Doru-Thom Popovici, Andrew Canning, Zhengji Zhao, Lin-Wang Wang, John Shalf: A systematic approach to improving data locality across Fourier transforms and linear algebra operations. 329-341
Algorithms adapting to high-performance networks
- Archit Patke, Saurabh Jha, Haoran Qiu, Jim M. Brandt, Ann C. Gentile, Joe Greenseid, Zbigniew Kalbarczyk, Ravishankar K. Iyer: Delay sensitivity-driven congestion mitigation for HPC systems. 342-353
- Xiaodong Yu, Tekin Bicer, Rajkumar Kettimuthu, Ian T. Foster: Topology-aware optimizations for multi-GPU ptychographic image reconstruction. 354-366
Graph data structures and algorithms
- Xuan Huang, Pavol Klacansky, Steve Petruzza, Attila Gyulassy, Peer-Timo Bremer, Valerio Pascucci: Distributed merge forest: a new fast and scalable approach for topological analysis at scale. 367-377
- Xuhao Chen, Roshan Dathathri, Gurbinder Gill, Loc Hoang, Keshav Pingali: Sandslash: a two-level framework for efficient graph pattern mining. 378-391
Parallelization constrained by data dependencies
- Akshay Bhosale, Rudolf Eigenmann: On the automatic parallelization of subscripted subscript patterns using array property analysis. 392-403
- Ahmed E. Helal, Jan Laukemann, Fabio Checconi, Jesmin Jahan Tithi, Teresa M. Ranadive, Fabrizio Petrini, Jeewhan Choi: ALTO: adaptive linearized storage of sparse tensors. 404-416
- Ming Dun, Yunchun Li, Hailong Yang, Qingxiao Sun, Zhongzhi Luan, Depei Qian: An optimized tensor completion library for multiple GPUs. 417-430
- Oguz Selvitopi, Benjamin Brock, Israt Nisa, Alok Tripathy, Katherine A. Yelick, Aydin Buluç: Distributed-memory parallel algorithms for sparse times tall-skinny-dense matrix multiplication. 431-442
Best paper candidates
- Chen Zhang, Zeyu Song, Haojie Wang, Kaiyuan Rong, Jidong Zhai: HyQuas: hybrid partitioner based quantum circuit simulation system on GPU. 443-454
- Thomas Randall, Tyler N. Allen, Rong Ge: FULL-W2V: fully exploiting data reuse for W2V on GPU-accelerated systems. 455-466
- Nader Al Awar, Steven Zhu, George Biros, Milos Gligoric: A performance portability framework for Python. 467-478
- Mazen Al-Wadi, Aziz Mohaisen, Amro Awad: ProMT: optimizing integrity tree updates for write-intensive pages in secure NVMs. 479-490















