


23rd CLUSTER 2021: Portland, OR, USA
- IEEE International Conference on Cluster Computing, CLUSTER 2021, Portland, OR, USA, September 7-10, 2021. IEEE 2021, ISBN 978-1-7281-9666-4

- Bin-Rui Li, Shenggan Cheng, James Lin:
tcFFT: A Fast Half-Precision FFT Library for NVIDIA Tensor Cores. 1-11
- W. Pepper Marts, Matthew G. F. Dosanjh, Scott Levy, Whit Schonbein, Ryan E. Grant, Patrick G. Bridges:
MiniMod: A Modular Miniapplication Benchmarking Framework for HPC. 12-22
- Daniel Rosendo, Alexandru Costan, Gabriel Antoniu, Matthieu Simonin, Jean-Christophe Lombardo, Alexis Joly, Patrick Valduriez:
Reproducible Performance Optimization of Complex Applications on the Edge-to-Cloud Continuum. 23-34
- Bing Xie, Qiang Cao, Mayuresh Kunjir, Linli Wan, Jeffrey S. Chase, Anirban Mandal, Mats Rynge:
WIRE: Resource-efficient Scaling with Online Prediction for DAG-based Workflows. 35-46
- Zihan Jiang, Wanling Gao, Fei Tang, Lei Wang, Xingwang Xiong, Chunjie Luo, Chuanxin Lan, Hongxiao Li, Jianfeng Zhan:
HPC AI500 V2.0: The Methodology, Tools, and Metrics for Benchmarking HPC AI Systems. 47-58
- Wenyan Chen, Chengzhi Lu, Kejiang Ye, Yang Wang, Cheng-Zhong Xu:
RPTCN: Resource Prediction for High-dynamic Workloads in Clouds based on Deep Learning. 59-69
- Nathan Grinsztajn, Olivier Beaumont, Emmanuel Jeannot, Philippe Preux:
READYS: A Reinforcement Learning Based Strategy for Heterogeneous Dynamic Scheduling. 70-81
- Hongyuan Liu, Bogdan Nicolae, Sheng Di, Franck Cappello, Adwait Jog:
Accelerating DNN Architecture Search at Scale Using Selective Weight Transfer. 82-93
- Jing Cao, Zongwei Zhu, Xuehai Zhou:
SAP-SGD: Accelerating Distributed Parallel Training with High Communication Efficiency on Heterogeneous Clusters. 94-102
- Lizhi Zhang, Zhiquan Lai, Shengwei Li, Yu Tang, Feng Liu, Dongsheng Li:
2PGraph: Accelerating GNN Training over Large Graphs on GPU Clusters. 103-113
- Jaime Cernuda Garcia, Hariharan Devarajan, Luke Logan, Keith Bateman, Neeraj Rajesh, Jie Ye, Anthony Kougkas, Xian-He Sun:
HFlow: A Dynamic and Elastic Multi-Layered I/O Forwarder. 114-124
- Peng Xu, Nannan Zhao, Jiguang Wan, Wei Liu, Shuning Chen, Yuanhui Zhou, Hadeel Albahar, Hanyang Liu, Liu Tang, Changsheng Xie:
Building A Fast and Efficient LSM-tree Store by Integrating Local Storage with Cloud Storage. 125-134
- Ovidiu-Cristian Marcu, Alexandru Costan, Bogdan Nicolae, Gabriel Antoniu:
Virtual Log-Structured Storage for High-Performance Streaming. 135-145
- Pradeep Subedi, Philip E. Davis, Manish Parashar:
RISE: Reducing I/O Contention in Staging-based Extreme-Scale In-situ Workflows. 146-156
- Hanchen Guo, Zhehan Lin, Yunfei Gu, Chentao Wu, Li Jiang, Jie Li, Guangtao Xue, Minyi Guo:
Lazy-WL: A Wear-aware Load Balanced Data Redistribution Method for Efficient SSD Array Scaling. 157-168
- Frederic Schimmelpfennig, Marc-André Vef, Reza Salkhordeh, Alberto Miranda, Ramon Nou, André Brinkmann:
Streamlining distributed Deep Learning I/O with ad hoc file systems. 169-180
- Hao Wu, Jiangming Jin, Jidong Zhai, Yifan Gong, Wei Liu:
Accelerating GPU Message Communication for Autonomous Navigation Systems. 181-191
- Qingxiao Sun, Yi Liu, Hailong Yang, Zhonghui Jiang, Xiaoyan Liu, Ming Dun, Zhongzhi Luan, Depei Qian:
csTuner: Scalable Auto-tuning Framework for Complex Stencil Computation on GPUs. 192-203
- Patrick Diehl, Gregor Daiß, Dominic Marcello, Kevin A. Huck, Sagiv Shiber, Hartmut Kaiser, Juhan Frank, Geoffrey C. Clayton, Dirk Pflüger:
Octo-Tiger's New Hydro Module and Performance Using HPX+CUDA on ORNL's Summit. 204-214
- Manasi Tiwari, Sathish Vadhiyar:
Pipelined Preconditioned s-step Conjugate Gradient Methods for Distributed Memory Systems. 215-225
- Mohsen Koohi Esfahani, Peter Kilpatrick, Hans Vandierendonck:
Thrifty Label Propagation: Fast Connected Components for Skewed-Degree Graphs. 226-237
- Jonathan Lifflander, Nicole Lemaster Slattengren, Philippe P. Pébaÿ, Phil Miller, Francesco Rizzi, Matthew T. Bettencourt:
Optimizing Distributed Load Balancing for Workloads with Time-Varying Imbalance. 238-249
- Hrushit Parikh, Vinit Deodhar, Ada Gavrilovska, Santosh Pande:
Distributed Work Stealing at Scale via Matchmaking. 250-260
- Dominik Scheinert, Lauritz Thamsen, Houkun Zhu, Jonathan Will, Alexander Acker, Thorsten Wittkopp, Odej Kao:
Bellamy: Reusing Performance Models for Distributed Dataflow Jobs Across Contexts. 261-270
- Ping Chen, Shuibing He, Xuechen Zhang, Shuaiben Chen, Peiyi Hong, Yanlong Yin, Xian-He Sun, Gang Chen:
CSWAP: A Self-Tuning Compression Framework for Accelerating Tensor Swapping in GPUs. 271-282
- Jiannan Tian, Sheng Di, Xiaodong Yu, Cody Rivera, Kai Zhao, Sian Jin, Yunhe Feng, Xin Liang, Dingwen Tao, Franck Cappello:
Optimizing Error-Bounded Lossy Compression for Scientific Data on GPUs. 283-293
- Jinyang Liu, Sheng Di, Kai Zhao, Sian Jin, Dingwen Tao, Xin Liang, Zizhong Chen, Franck Cappello:
Exploring Autoencoder-based Error-bounded Compression for Scientific Data. 294-306
- Xiaodong Yu, Sheng Di, Ali Murat Gok, Dingwen Tao, Franck Cappello:
cuZ-Checker: A GPU-Based Ultra-Fast Assessment System for Lossy Compressions. 307-319
- Jialing Zhang, Jiaxi Chen, Xiaoyan Zhuo, Aekyeung Moon, Seung Woo Son:
DPZ: Improving Lossy Compression Ratio with Information Retrieval on Scientific Data. 320-331
- Subhadeep Bhattacharya, Weikuan Yu, Fahim Tahmid Chowdhury, Kathryn M. Mohror:
O(1) Communication for Distributed SGD through Two-Level Gradient Averaging. 332-343
- Nicholas O. Malott, Rishi R. Verma, Rohit P. Singh, Philip A. Wilsey:
Distributed Computation of Persistent Homology from Partitioned Big Data. 344-354
- Dalin Wang, Feng Zhang, Weitao Wan, Hourun Li, Xiaoyong Du:
FineQuery: Fine-Grained Query Processing on CPU-GPU Integrated Architectures. 355-365
- Shoichi Hirasawa, Hayato Yamaki, Michihiro Koibuchi:
Packet Forwarding Cache of Commodity Switches for Parallel Computers. 366-376
- Megan Grodowitz, Luis E. Peña, Curtis Dunham, Dong Zhong, Pavel Shamis, Steve Poole:
Two-Chains: High Performance Framework for Function Injection and Execution. 377-387
- Wei Liu, Haikun Liu, Xiaofei Liao, Hai Jin, Yu Zhang:
HNGraph: Parallel Graph Processing in Hybrid Memory Based NUMA Systems. 388-397
- Hoang-Dung Do, Valérie Hayot-Sasson, Rafael Ferreira da Silva, Christopher Steele, Henri Casanova, Tristan Glatard:
Modeling the Linux page cache for accurate simulation of data-intensive applications. 398-408
- Bo Fang, Daoce Wang, Sian Jin, Quincey Koziol, Zhao Zhang, Qiang Guan, Suren Byna, Sriram Krishnamoorthy, Dingwen Tao:
Characterizing Impacts of Storage Faults on HPC Applications: A Methodology and Insights. 409-420
- Kurt B. Ferreira, Scott Levy, Victor Kuhns, Nathan DeBardeleben, Sean Blanchard:
Understanding the Effects of DRAM Correctable Error Logging at Scale. 421-432
- Kun Suo, Junggab Son, Dazhao Cheng, Wei Chen, Sabur Baidya:
Tackling Cold Start of Serverless Applications by Efficient and Adaptive Container Runtime Reusing. 433-443
- Matthew Wolf, Jeremy Logan, Kshitij Mehta, Daniel A. Jacobson, Mikaela Cashman, Angelica M. Walker, Greg Eisenhauer, Patrick M. Widener, Ashley Cliff:
Reusability First: Toward FAIR Workflows. 444-455
- Teng Ma, Kang Chen, Shaonan Ma, Zhuo Song, Yongwei Wu:
Thinking More about RDMA Memory Semantics. 456-467
- Tapasya Patki, Adam Bertsch, Ian Karlin, Dong H. Ahn, Brian Van Essen, Barry Rountree, Bronis R. de Supinski, Nathan Besaw:
Monitoring Large Scale Supercomputers: A Case Study with the Lassen Supercomputer. 468-480
- Arnab Das, Tanmay Tirpankar, Ganesh Gopalakrishnan, Sriram Krishnamoorthy:
Robustness Analysis of Loop-Free Floating-Point Programs via Symbolic Automatic Differentiation. 481-491
- Elvis Rojas, Diego Pérez, Jon C. Calhoun, Leonardo Bautista-Gomez, Terry R. Jones, Esteban Meneses:
Understanding Soft Error Sensitivity of Deep Learning Models and Frameworks through Checkpoint Alteration. 492-503
- Edgar A. León, Marc Joos, Nathan Hanford, Adrien Cotte, Tony Delforge, François Diakhaté, Vincent Ducrot, Ian Karlin, Marc Pérache:
On-the-Fly, Robust Translation of MPI Libraries. 504-515
- Kaiming Ouyang, Min Si, Atsushi Hori, Zizhong Chen, Pavan Balaji:
Daps: A Dynamic Asynchronous Progress Stealing Model for MPI Communication. 516-527
- Kevin Sala, Sandra Macià, Vicenç Beltran:
Combining One-Sided Communications with Task-Based Programming Models. 528-541
- Wanrong Gao, Jianbin Fang, Chun Huang, Chuanfu Xu, Zheng Wang:
Optimizing Barrier Synchronization on ARMv8 Many-Core Architectures. 542-552
- Yuetsu Kodama, Masaaki Kondo, Mitsuhisa Sato:
Evaluation of SPEC CPU and SPEC OMP on the A64FX. 553-561
- Robert Schöne, Thomas Ilsche, Mario Bielert, Markus Velten, Markus Schmidl, Daniel Hackenberg:
Energy Efficiency Aspects of the AMD Zen 2 Architecture. 562-571
- Julita Corbalán, Oriol Vidal, Lluis Alonso, Jordi Aneas:
Explicit uncore frequency scaling for energy optimisation policies with EAR in Intel architectures. 572-581
- Robert Schöne, Markus Schmidl, Mario Bielert, Daniel Hackenberg:
FIRESTARTER 2: Dynamic Code Generation for Processor Stress Tests. 582-590
- Stefan A. Robila, David Grant, Chris DePrater, Vali Sorell, Terry L. Rodgers, David Martinez, Shlomo Novotny:
Cooling the Data Center: Design of a Mechanical Controls Owner Project Requirements (OPR) Template. 591-595
- Alessio Netti, Woong Shin, Michael Ott, Torsten Wilde, Natalie J. Bates:
A Conceptual Framework for HPC Operational Data Analytics. 596-603
- Thomas Jakobsche, Nicolas Lachiche, Aurélien Cavelan, Florina M. Ciorba:
An Execution Fingerprint Dictionary for HPC Application Recognition. 604-608
- Ashish Pal, Preeti Malakar:
An Integrated Job Monitor, Analyzer and Predictor. 609-617
- Kenneth Lamar, Alexander V. Goponenko, Christina L. Peterson, Benjamin A. Allan, Jim M. Brandt, Damian Dechev:
Backfilling HPC Jobs with a Multimodal-Aware Predictor. 618-622
- Louise Harding, Fabien Wernli, Frédéric Suter:
Sequence-RTG: Efficient and Production-Ready Pattern Mining in System Log Messages. 623-631
- Chengcheng Li, Ahmad Maroof Karimi, Woong Shin, Hairong Qi, Feiyi Wang:
The Challenge of Disproportionate Importance of Temporal Features in Predicting HPC Power Consumption. 632-636
- Shantenu Jha, Allen D. Malony:
Dynamic and Adaptive Monitoring and Analysis for Many-task Ensemble Computing. 637-641
- Jie Yin, Atsushi Hori, Balazs Gerofi, Yutaka Ishikawa:
A Scalability Study of Data Exchange in HPC Multi-component Workflows. 642-648
- Ricardo Macedo, Cláudia Correia, Marco Dantas, Cláudia Brito, Weijia Xu, Yusuke Tanimura, Jason Haga, João Paulo:
The Case for Storage Optimization Decoupling in Deep Learning Frameworks. 649-656
- Marco Dantas, Diogo Leitão, Cláudia Correia, Ricardo Macedo, Weijia Xu, João Paulo:
MONARCH: Hierarchical Storage Management for Deep Learning Frameworks. 657-663
- Luke Logan, Jay F. Lofstead, Scott Levy, Patrick M. Widener, Xian-He Sun, Anthony Kougkas:
pMEMCPY: a simple, lightweight, and portable I/O library for storing data in persistent memory. 664-670
- Sarah Neuwirth, Arnab Kumar Paul:
Parallel I/O Evaluation Techniques and Emerging HPC Workloads: A Perspective. 671-679
- Yuzhen Liu, Oana Marin:
Special function neural network (SFNN) models. 680-685
- Yuuichi Asahi, Sora Hatayama, Takashi Shimokawabe, Naoyuki Onodera, Yuta Hasegawa, Yasuhiro Idomura:
AMR-Net: Convolutional Neural Networks for Multi-resolution Steady Flow Prediction. 686-691
- Xavier Aguilar, Stefano Markidis:
A Deep Learning-Based Particle-in-Cell Method for Plasma Simulations. 692-697
- Li Zhong, Dennis Hoppe, Naweiluo Zhou, Oleksandr Shcherbakov:
Hybrid workflow of Simulation and Deep Learning on HPC: A Case Study for Material Behavior Determination. 698-704
- Martin Svedin, Artur Podobas, Steven Wei Der Chien, Stefano Markidis:
Higgs Boson Classification: Brain-inspired BCPNN Learning with StreamBrain. 705-710
- Md Abdullah Shahneous Bari, Barbara M. Chapman, Anthony Curtis, Robert J. Harrison, Eva Siegmann, Nikolay A. Simakov, Matthew D. Jones:
A64FX performance: experience on Ookami. 711-718
- Sarat Sreepathi, Mark Taylor:
Early Evaluation of Fugaku A64FX Architecture Using Climate Workloads. 719-727
- Miwako Tsuji, Mitsuhisa Sato:
Performance Evaluation and Analysis of A64FX many-core Processor for the Fiber Miniapp Suite. 728-735
- Jens Domke:
A64FX - Your Compiler You Must Decide! 736-740
- Fabio Banchelli, Kilian Peiro, Guillem Ramirez-Gargallo, Joan Vinyals, David Vicente, Marta Garcia-Gasulla, Filippo Mantovani:
Cluster of emerging technology: evaluation of a production HPC system based on A64FX. 741-750
- Jérôme Gurhem, Maxence Vandromme, Miwako Tsuji, Serge G. Petiton, Mitsuhisa Sato:
Sequences of Sparse Matrix-Vector Multiplication on Fugaku's A64FX processors. 751-758
- Karl F. A. Friebel, Stephanie Soldavini, Gerald Hempel, Christian Pilato, Jerónimo Castrillón:
From Domain-Specific Languages to Memory-Optimized Accelerators for Fluid Dynamics. 759-766
- Nick Brown:
Accelerating advection for atmospheric modelling on Xilinx and Intel FPGAs. 767-774
- Nick Brown, Mark Klaisoongnoen, Oliver Thomson Brown:
Optimisation of an FPGA Credit Default Swap engine by embracing dataflow techniques. 775-778
- Brad Green, Dillon Todd, Jon C. Calhoun, Melissa C. Smith:
TIGRA: A Tightly Integrated Generic RISC-V Accelerator Interface. 779-782
- Norihisa Fujita, Ryohei Kobayashi, Yoshiki Yamaguchi, Taisuke Boku:
HBM2 Memory System for HPC Applications on an FPGA. 783-786
- Takaaki Miyajima, Kentaro Sano:
A memory bandwidth improvement with memory space partitioning for single-precision floating-point FFT on Stratix 10 FPGA. 787-790
- Naoya Umezu, Yoshiki Yamaguchi, Taisuke Boku:
An FPGA-based storage control with load balancing. 791-794
- Yuting Li, Yun Xu, Xuehai Zhou:
CVFCC: CV-Based Framework for Container Consolidation in Cloud Data Centers. 795-796
- Sahil Sharma, Zhiling Lan, Xingfu Wu, Valerie Taylor:
A Dynamic Power Capping Library for HPC Applications. 797-798
- Shaoheng Luo, Lei Wang, Yufeng Liu, Changhai Zhao, Xudong Zhang:
SDIS: A PB-level seismic data index system with ML methods. 799-800
- Iker Martín-Álvarez, José Ignacio Aliaga, María Isabel Castillo, Rafael Mayo, Sergio Iserte:
Malleability Implementation in a MPI Iterative Method. 801-802
- Chen Zou, Andrew A. Chien, Robert W. Gardner, Ilija Vukotic:
Computational Storage to Increase the Analysis Capability of Tier-2 HEP Data Sites. 803-804
- Chan-Gyu Lee, Hyun-Wook Jin:
NUMA-aware I/O System Call Steering. 805-806
- Michela Taufer, Ewa Deelman, Rafael Ferreira da Silva, Trilce Estrada, Mary W. Hall, Miron Livny:
A Roadmap to Robust Science for High-throughput Applications: The Developers' Perspective. 807-808
- Menuka Warushavithana, Saptashwa Mitra, Mazdak Arabi, F. Jay Breidt, Sangmi Lee Pallickara, Shrideep Pallickara:
A Transfer Learning Scheme for Time Series Forecasting Using Facebook Prophet. 809-810
- Yuyang Wang, Fei Lei, Dezun Dong:
Exploring Node Connection Modes in Multi-Rail Fat-tree. 811-812
- Changhong Wang, Dezun Dong, Zicong Wang, Xiaoyun Zhang, Zhenyu Zhao:
RELAR: A Reinforcement Learning Framework for Adaptive Routing in Network-on-Chips. 813-814
- Saptashwa Mitra, Daniel Rammer, Shrideep Pallickara, Sangmi Lee Pallickara:
A Generative Approach to Visualizing Satellite Data. 815-816
- Lukas Reitz:
Load Balancing Policies for Nested Fork-Join. 817-818
- Xiaoliang Wang, Jianchuan Li, Peiquan Jin, Kuankuan Guo, Yuanjin Lin, Ming Zhao:
Supporting Elastic Compaction of LSM-tree with a FaaS Cluster. 819-820
- Gábor Dániel Balogh, István Z. Reguly:

Automatic Parallelisation of Structured Mesh Computations with SYCL. 821-822
- Kevin D. Colby, Shawn Rice:

Halcyon: Unified HPC Center Operations. 823-824
- Keshi Ge, Yiming Zhang, Yongquan Fu, Zhiquan Lai, Xiaoge Deng, Dongsheng Li:
CASQ: Accelerate Distributed Deep Learning with Sketch-Based Gradient Quantization. 825-826
- Sarah Neuwirth:
Toward a Comprehensive Benchmark Suite for Evaluating GASPI in HPC Environments. 827-828
- Trokon Johnson, Herman Lam:
Incorporating Fault-Tolerance Awareness into System-Level Modeling and Simulation. 829-830















