


ACM Transactions on Architecture and Code Optimization, Volume 16
Volume 16, Number 1, March 2019
- Ghassan Shobaki, Austin Kerbow, Christopher Pulido, William Dobson: Exploring an Alternative Cost Function for Combinatorial Register-Pressure-Aware Instruction Scheduling. 1:1-1:30
- Yu-Ping Liu, Ding-Yong Hong, Jan-Jan Wu, Sheng-Yu Fu, Wei-Chung Hsu: Exploiting SIMD Asymmetry in ARM-to-x86 Dynamic Binary Translation. 2:1-2:24
- Mohammad Sadrosadati, Seyed Borna Ehsani, Hajar Falahati, Rachata Ausavarungnirun, Arash Tavakkol, Mojtaba Abaee, Lois Orosa, Yaohua Wang, Hamid Sarbazi-Azad, Onur Mutlu: ITAP: Idle-Time-Aware Power Management for GPU Execution Units. 3:1-3:26
- Halit Dogan, Masab Ahmad, Brian Kahne, Omer Khan: Accelerating Synchronization Using Moving Compute to Data Model at 1,000-core Multicore Scale. 4:1-4:27
- Leonid Azriel, Lukas Humbel, Reto Achermann, Alex Richardson, Moritz Hoffmann, Avi Mendelson, Timothy Roscoe, Robert N. M. Watson, Paolo Faraboschi, Dejan S. Milojicic: Memory-Side Protection With a Capability Enforcement Co-Processor. 5:1-5:26
- Aamer Jaleel, Eiman Ebrahimi, Sam Duncan: DUCATI: High-performance Address Translation by Extending TLB Reach of GPU-accelerated Systems. 6:1-6:24
Volume 16, Number 2, May 2019
- Yemao Xu, Dezun Dong, Weixia Xu, Xiangke Liao: SketchDLC: A Sketch on Distributed Deep Learning Communication via Trace Capturing. 7:1-7:26
- Aristeidis Mastoras, Thomas R. Gross: Efficient and Scalable Execution of Fine-Grained Dynamic Linear Pipelines. 8:1-8:26
- Tae Jun Ham, Juan L. Aragón, Margaret Martonosi: Efficient Data Supply for Parallel Heterogeneous Architectures. 9:1-9:23
- Savvas Sioutas, Sander Stuijk, Luc Waeijen, Twan Basten, Henk Corporaal, Lou J. Somers: Schedule Synthesis for Halide Pipelines through Reuse Analysis. 10:1-10:22
- Xiaoyuan Wang, Haikun Liu, Xiaofei Liao, Ji Chen, Hai Jin, Yu Zhang, Long Zheng, Bingsheng He, Song Jiang: Supporting Superpages and Lightweight Page Migration in Hybrid Memory Systems. 11:1-11:26
- Sahar Sargaran, Naser Mohammadzadeh: SAQIP: A Scalable Architecture for Quantum Information Processors. 12:1-12:21
- Prerna Budhkar, Ildar Absalyamov, Vasileios Zois, Skyler Windh, Walid A. Najjar, Vassilis J. Tsotras: Accelerating In-Memory Database Selections Using Latency Masking Hardware Threads. 13:1-13:28
- Heinrich Riebler, Gavin Vaz, Tobias Kenter, Christian Plessl: Transparent Acceleration for Heterogeneous Platforms With Compilation to OpenCL. 14:1-14:26
- Xun Gong, Xiang Gong, Leiming Yu, David R. Kaeli: HAWS: Accelerating GPU Wavefront Execution through Selective Out-of-order Execution. 15:1-15:22
- Yang Song, Olivier Alavoine, Bill Lin: A Self-aware Resource Management Framework for Heterogeneous Multicore SoCs with Diverse QoS Targets. 16:1-16:23
- Pedro Yébenes, Jose Rocher-Gonzalez, Jesús Escudero-Sahuquillo, Pedro Javier García, Francisco J. Alfaro, Francisco J. Quiles, Crispín Gómez Requena, José Duato: Combining Source-adaptive and Oblivious Routing with Congestion Control in High-performance Interconnects using Hybrid and Direct Topologies. 17:1-17:26
- Mohammad A. Alshboul, Hussein Elnawawy, Reem Elkhouly, Keiji Kimura, James Tuck, Yan Solihin: Efficient Checkpointing with Recompute Scheme for Non-volatile Main Memory. 18:1-18:27
- Zacharias Hadjilambrou, Marios Kleanthous, Georgia Antoniou, Antoni Portero, Yiannakis Sazeides: Comprehensive Characterization of an Open Source Document Search Engine. 19:1-19:21
Volume 16, Number 3, August 2019
- Bingchao Li, Jizeng Wei, Jizhou Sun, Murali Annavaram, Nam Sung Kim: An Efficient GPU Cache Architecture for Applications with Irregular Memory Access Patterns. 20:1-20:24
- Stephen I. Roberts, Steven A. Wright, Suhaib A. Fahmy, Stephen A. Jarvis: The Power-optimised Software Envelope. 21:1-21:27
- Ram Srivatsa Kannan, Michael Laurenzano, Jeongseob Ahn, Jason Mars, Lingjia Tang: Caliper: Interference Estimator for Multi-tenant Environments Sharing Architectural Resources. 22:1-22:25
- Zhen Lin, Hongwen Dai, Michael Mantor, Huiyang Zhou: Coordinated CTA Combination and Bandwidth Partitioning for GPU Concurrent Kernel Execution. 23:1-23:27
- Keryan Didier, Dumitru Potop-Butucaru, Guillaume Iooss, Albert Cohen, Jean Souyris, Philippe Baufreton, Amaury Graillat: Correct-by-Construction Parallelization of Hard Real-Time Avionics Applications on Off-the-Shelf Predictable Hardware. 24:1-24:27
- Pantea Zardoshti, Tingzhe Zhou, Pavithra Balaji, Michael L. Scott, Michael F. Spear: Simplifying Transactional Memory Support in C++. 25:1-25:24
- Jungwoo Park, Myoungjun Lee, Soontae Kim, Minho Ju, Jeongkyu Hong: MH Cache: A Multi-retention STT-RAM-based Low-power Last-level Cache for Mobile Hardware Rendering Systems. 26:1-26:26
- Jakob Leben, George Tzanetakis: Polyhedral Compilation for Multi-dimensional Stream Processing. 27:1-27:26
- Mohammad Sadegh Sadeghi, Siavash Bayat Sarmadi, Shaahin Hessabi: Toward On-chip Network Security Using Runtime Isolation Mapping. 28:1-28:25
- Stéphane Louise: A First Step Toward Using Quantum Computing for Low-level WCETs Estimations. 29:1-29:22
- Artem Chikin, Taylor Lloyd, José Nelson Amaral, Ettore Tiotto, Muhammad Usman: Memory-access-aware Safety and Profitability Analysis for Transformation of Accelerator-bound OpenMP Loops. 30:1-30:26
- Sanghoon Cha, Bokyeong Kim, Chang Hyun Park, Jaehyuk Huh: Morphable DRAM Cache Design for Hybrid Memory Systems. 31:1-31:24
- Chao Luo, Yunsi Fei, David R. Kaeli: Side-channel Timing Attack of RSA on a GPU. 32:1-32:18
- Liang Yuan, Chen Ding, Wesley Smith, Peter J. Denning, Yunquan Zhang: A Relational Theory of Locality. 33:1-33:26
Volume 16, Number 4, January 2020
- Arun Thangamani, V. Krishna Nandivada: Optimizing Remote Communication in X10. 34:1-34:26
- Sriseshan Srikanth, Anirudh Jain, Joseph M. Lennon, Thomas M. Conte, Erik DeBenedictis, Jeanine E. Cook: MetaStrider: Architectures for Scalable Memory-centric Reduction of Sparse Data Streams. 35:1-35:26
- Mostafa Koraei, Omid Fatemi, Magnus Jahre: DCMI: A Scalable Strategy for Accelerating Iterative Stencil Loops on FPGAs. 36:1-36:24
- Leeor Peled, Uri C. Weiser, Yoav Etsion: A Neural Network Prefetcher for Arbitrary Memory Access Patterns. 37:1-37:27
- Nicolas Vasilache, Oleksandr Zinenko, Theodoros Theodoridis, Priya Goyal, Zachary DeVito, William S. Moses, Sven Verdoolaege, Andrew Adams, Albert Cohen: The Next 700 Accelerated Layers: From Mathematical Expressions of Network Computation Graphs to Accelerated GPU Kernels, Automatically. 38:1-38:26
- Wenbin Jiang, Yang Ma, Bo Liu, Haikun Liu, Bing Bing Zhou, Jian Zhu, Song Wu, Hai Jin: Layup: Layer-adaptive and Multi-type Intermediate-oriented Memory Optimization for GPU-based CNNs. 39:1-39:23
- Sergi Siso, Wes Armour, Jeyarajan Thiyagalingam: Evaluating Auto-Vectorizing Compilers through Objective Withdrawal of Useful Information. 40:1-40:23
- Salonik Resch, S. Karen Khatamifard, Zamshed Iqbal Chowdhury, Masoud Zabihi, Zhengyang Zhao, Jianping Wang, Sachin S. Sapatnekar, Ulya R. Karpuzcu: PIMBALL: Binary Neural Networks in Spintronic Memory. 41:1-41:26
- Zhen Hang Jiang, Yunsi Fei, David R. Kaeli: Exploiting Bank Conflict-based Side-channel Timing Leakage of GPUs. 42:1-42:24
- Kyle Daruwalla, Heng Zhuo, Rohit Shukla, Mikko H. Lipasti: BitSAD v2: Compiler Optimization and Analysis for Bitstream Computing. 43:1-43:25
- Aristeidis Mastoras, Thomas R. Gross: Chunking for Dynamic Linear Pipelines. 44:1-44:25
- Manuel Selva, Fabian Gruber, Diogo Sampaio, Christophe Guillon, Louis-Noël Pouchet, Fabrice Rastello: Building a Polyhedral Representation from an Instrumented Execution: Making Dynamic Analyses of Nonaffine Programs Scalable. 45:1-45:26
- Ahmad Yasin, Jawad Haj-Yahya, Yosi Ben-Asher, Avi Mendelson: A Metric-Guided Method for Discovering Impactful Features and Architectural Insights for Skylake-Based Processors. 46:1-46:25
- Jie Zhao, Albert Cohen: Flextended Tiles: A Flexible Extension of Overlapped Tiles for Polyhedral Compilation. 47:1-47:25
- Daniel Gerzhoy, Xiaowu Sun, Michael Zuzak, Donald Yeung: Nested MIMD-SIMD Parallelization for Heterogeneous Microprocessors. 48:1-48:27
- Chunwei Xia, Jiacheng Zhao, Huimin Cui, Xiaobing Feng, Jingling Xue: DNNTune: Automatic Benchmarking DNN Models for Mobile-cloud Computing. 49:1-49:26
- Ian Briggs, Arnab Das, Mark Baranowski, Vishal Chandra Sharma, Sriram Krishnamoorthy, Zvonimir Rakamaric, Ganesh Gopalakrishnan: FailAmp: Relativization Transformation for Soft Error Detection in Structured Address Generation. 50:1-50:21
- Khalid Ahmad, Hari Sundar, Mary W. Hall: Data-driven Mixed Precision Sparse Matrix Vector Multiplication for GPUs. 51:1-51:24
- Larisa Stoltzfus, Bastian Hagedorn, Michel Steuwer, Sergei Gorlatch, Christophe Dubach: Tiling Optimizations for Stencil Computations Using Rewrite Rules in Lift. 52:1-52:25
- Michiel A. van der Vlag, Georgios Smaragdos, Zaid Al-Ars, Christos Strydis: Exploring Complex Brain-Simulation Workloads on Multi-GPU Deployments. 53:1-53:25
- Reem Elkhouly, Mohammad A. Alshboul, Akihiro Hayashi, Yan Solihin, Keiji Kimura: Compiler-support for Critical Data Persistence in NVM. 54:1-54:25
- Lorenzo Chelini, Oleksandr Zinenko, Tobias Grosser, Henk Corporaal: Declarative Loop Tactics for Domain-specific Optimization. 55:1-55:25
- Asif Ali Khan, Fazal Hameed, Robin Bläsing, Stuart S. P. Parkin, Jerónimo Castrillón: ShiftsReduce: Minimizing Shifts in Racetrack Memory 4.0. 56:1-56:23
