


ACM Transactions on Architecture and Code Optimization, Volume 15
Volume 15, Number 1, April 2018
- Hochan Lee, Mansureh S. Moghaddam, Dongkwan Suh, Bernhard Egger: Improving Energy Efficiency of Coarse-Grain Reconfigurable Arrays Through Modulo Schedule Compression/Decompression. 1:1-1:26
- Karthik Sangaiah, Michael Lui, Radhika Jagtap, Stephan Diestelhorst, Siddharth Nilakantan, Ankit More, Baris Taskin, Mark Hempstead: SynchroTrace: Synchronization-Aware Architecture-Agnostic Traces for Lightweight Multicore Simulation of CMP and HPC Workloads. 2:1-2:26
- Long Zheng, Xiaofei Liao, Hai Jin: Efficient and Scalable Graph Parallel Processing With Symbolic Execution. 3:1-3:25
- Jae-Eon Jo, Gyu-hyeon Lee, Hanhwi Jang, Jaewon Lee, Mohammadamin Ajdari, Jangwoo Kim: DiagSim: Systematically Diagnosing Simulators for Healthy Simulations. 4:1-4:27
- Sushant Kondguli, Michael C. Huang: A Case for a More Effective, Power-Efficient Turbo Boosting. 5:1-5:22
- Kuan-Chung Chen, Chung-Ho Chen: Enabling SIMT Execution Model on Homogeneous Multi-Core System. 6:1-6:26
- Mingzhe Zhang, King Tin Lam, Xin Yao, Cho-Li Wang: SIMPO: A Scalable In-Memory Persistent Object Framework Using NVRAM for Reliable Big Data Computing. 7:1-7:28
- Bobin Deng, Sriseshan Srikanth, Eric R. Hein, Thomas M. Conte, Erik DeBenedictis, Jeanine E. Cook, Michael P. Frank: Extending Moore's Law via Computationally Error-Tolerant Computing. 8:1-8:27
- Dave Dice, Maurice Herlihy, Alex Kogan: Improving Parallelism in Hardware Transactional Memory. 9:1-9:24
- Namhyung Kim, Junwhan Ahn, Kiyoung Choi, Daniel Sánchez, Donghoon Yoo, Soojung Ryu: Benzene: An Energy-Efficient Distributed Hybrid Cache Architecture for Manycore Systems. 10:1-10:23
- Yulong Ao, Chao Yang, Fangfang Liu, Wanwang Yin, Lijuan Jiang, Qiao Sun: Performance Optimization of the HPCG Benchmark on the Sunway TaihuLight Supercomputer. 11:1-11:20
- Saeed Rashidi, Majid Jalili, Hamid Sarbazi-Azad: Improving MLC PCM Performance through Relaxed Write and Read for Intermediate Resistance Levels. 12:1-12:31
- Wenlai Zhao, Haohuan Fu, Jiarui Fang, Weijie Zheng, Lin Gan, Guangwen Yang: Optimizing Convolutional Neural Networks on the Sunway TaihuLight Supercomputer. 13:1-13:26
- Dimitrios Mbakoyiannis, Othon Tomoutzoglou, George Kornaros: Energy-Performance Considerations for Data Offloading to FPGA-Based Accelerators Over PCIe. 14:1-14:24
- Zhen Lin, Michael Mantor, Huiyang Zhou: GPU Performance vs. Thread-Level Parallelism: Scalability Analysis and a Novel Way to Improve TLP. 15:1-15:21
- Oleksandr Zinenko, Stéphane Huot, Cédric Bastoul: Visual Program Manipulation in the Polyhedral Model. 16:1-16:25
Volume 15, Number 2, June 2018
- Mustafa M. Shihab, Jie Zhang, Myoungsoo Jung, Mahmut T. Kandemir: ReveNAND: A Fast-Drift-Aware Resilient 3D NAND Flash Design. 17:1-17:26
- Seyed Majid Zahedi, Songchun Fan, Benjamin C. Lee: Managing Heterogeneous Datacenters with Tokens. 18:1-18:23
- Miquel Pericàs: Elastic Places: An Adaptive Resource Manager for Scalable and Portable Performance. 19:1-19:26
- Matthew Benjamin Olson, Joseph T. Teague, Divyani Rao, Michael R. Jantz, Kshitij A. Doshi, Prasad A. Kulkarni: Cross-Layer Memory Management to Improve DRAM Energy Efficiency. 20:1-20:27
- Davide Zoni, Luca Colombo, William Fornaciari: DarkCache: Energy-Performance Optimization of Tiled Multi-Cores by Adaptively Power-Gating LLC Banks. 21:1-21:26
- Yang Zhang, Dan Feng, Wei Tong, Yu Hua, Jingning Liu, Zhipeng Tan, Chengning Wang, Bing Wu, Zheng Li, Gaoxiang Xu: CACF: A Novel Circuit Architecture Co-optimization Framework for Improving Performance, Reliability and Energy of ReRAM-based Main Memory System. 22:1-22:26
- Nicolai Stawinoga, Tony Field: Predictable Thread Coarsening. 23:1-23:26
- Probir Roy, Shuaiwen Leon Song, Sriram Krishnamoorthy, Abhinav Vishnu, Dipanjan Sengupta, Xu Liu: NUMA-Caffe: NUMA-Aware Deep Learning Neural Networks. 24:1-24:26
- Ahsen Ejaz, Vassilios Papaefstathiou, Ioannis Sourdis: DDRNoC: Dual Data-Rate Network-on-Chip. 25:1-25:24
- Ying Cai, Yulong Ao, Chao Yang, Wenjing Ma, Haitao Zhao: Extreme-Scale High-Order WENO Simulations of 3-D Detonation Wave with 10 Million Cores. 26:1-26:21
Volume 15, Number 3, October 2018
- Yannis Sfakianakis, Christos Kozanitis, Christos Kozyrakis, Angelos Bilas: QuMan: Profile-based Improvement of Cluster Utilization. 27:1-27:25
- Engin Kayraklioglu, Michael P. Ferguson, Tarek A. El-Ghazawi: LAPPS: Locality-Aware Productive Prefetching Support for PGAS. 28:1-28:26
- Akrem Benatia, Weixing Ji, Yizhuo Wang, Feng Shi: BestSF: A Sparse Meta-Format for Optimizing SpMV on GPU. 29:1-29:27
- Pierre Michaud: An Alternative TAGE-like Conditional Branch Predictor. 30:1-30:23
- James Garland, David Gregg: Low Complexity Multiply-Accumulate Units for Convolutional Neural Networks with Weight-Sharing. 31:1-31:24
- Hyojong Kim, Ramyad Hadidi, Lifeng Nai, Hyesoon Kim, Nuwan Jayasena, Yasuko Eckert, Onur Kayiran, Gabriel H. Loh: CODA: Enabling Co-location of Computation and Data for Multiple GPU Systems. 32:1-32:23
- Madhavan Manivannan, Miquel Pericàs, Vassilis Papaefstathiou, Per Stenström: Global Dead-Block Management for Task-Parallel Programs. 33:1-33:25
- Roman Gareev, Tobias Grosser, Michael Kruse: High-Performance Generalized Tensor Operations: A Compiler-Oriented Approach. 34:1-34:27
- Hervé Yviquel, Lauro Cruz, Guido Araujo: Cluster Programming using the OpenMP Accelerator Model. 35:1-35:23
- Mohammad Khavari Tavana, Amir Kavyan Ziabari, David R. Kaeli: Block Cooperation: Advancing Lifetime of Resistive Memories by Increasing Utilization of Error Correcting Codes. 36:1-36:26
- Hai Jin, Bo Liu, Wenbin Jiang, Yang Ma, Xuanhua Shi, Bingsheng He, Shaofeng Zhao: Layer-Centric Memory Reuse and Data Migration for Extreme-Scale Deep Learning on Many-Core Architectures. 37:1-37:26
- Dani Voitsechov, Arslan Zulfiqar, Mark Stephenson, Mark Gebhart, Stephen W. Keckler: Software-Directed Techniques for Improved GPU Register File Utilization. 38:1-38:23
- Huanxin Lin, Cho-Li Wang, Hongyuan Liu: On-GPU Thread-Data Remapping for Branch Divergence Reduction. 39:1-39:24
Volume 15, Number 4, January 2019
- Stefan Kronawitter, Christian Lengauer: Polyhedral Search Space Exploration in the ExaStencils Code Generator. 40:1-40:25
- Jingheng Xu, Haohuan Fu, Wen Shi, Lin Gan, Yuxuan Li, Wayne Luk, Guangwen Yang: Performance Tuning and Analysis for Stencil-Based Applications on POWER8 Processor. 41:1-41:25
- Jiajun Wang, Reena Panda, Lizy K. John: SelSMaP: A Selective Stride Masking Prefetching Scheme. 42:1-42:21
- Xing Su, Xiangke Liao, Hao Jiang, Canqun Yang, Jingling Xue: SCP: Shared Cache Partitioning for High-Performance GEMM. 43:1-43:21
- Fernando Magno Quintão Pereira, Guilherme V. Leobas, Abdoulaye Gamatié: Static Prediction of Silent Stores. 44:1-44:26
- Neal Clayton Crago, Mark Stephenson, Stephen W. Keckler: Exposing Memory Access Patterns to Improve Instruction and Memory Efficiency in GPUs. 45:1-45:23
- Feng Zhang, Jingling Xue: Poker: Permutation-Based SIMD Execution of Intensive Tree Search by Path Encoding. 46:1-46:28
- Nicolas Belleville, Damien Couroussé, Karine Heydemann, Henri-Pierre Charles: Automated Software Protection for the Masses Against Side-Channel Attacks. 47:1-47:27
- Chao Yu, Yuebin Bai, Qingxiao Sun, Hailong Yang: Improving Thread-level Parallelism in GPUs Through Expanding Register File to Scratchpad Memory. 48:1-48:24
- Lois Orosa, Rodolfo Azevedo, Onur Mutlu: AVPP: Address-first Value-next Predictor with Value Prefetching for Improving the Efficiency of Load Value Prediction. 49:1-49:30
- Jun Zhang, Rui Hou, Wei Song, Sally A. McKee, Zhen Jia, Chen Zheng, Mingyu Chen, Lixin Zhang, Dan Meng: RAGuard: An Efficient and User-Transparent Hardware Mechanism against ROP Attacks. 50:1-50:21
- Ping Wang, Luke McHale, Paul V. Gratz, Alex Sprintson: GenMatcher: A Generic Clustering-Based Arbitrary Matching Framework. 51:1-51:22
- Ding-Yong Hong, Jan-Jan Wu, Yu-Ping Liu, Sheng-Yu Fu, Wei-Chung Hsu: Processor-Tracing Guided Region Formation in Dynamic Binary Translation. 52:1-52:25
- Yu Wang, Victor Lee, Gu-Yeon Wei, David M. Brooks: Predicting New Workload or CPU Performance by Analyzing Public Datasets. 53:1-53:21
- Hyukwoo Park, SungKook Kim, Jung-Geun Park, Soo-Mook Moon: Reusing the Optimized Code for JavaScript Ahead-of-Time Compilation. 54:1-54:20
- Han Zhao, Quan Chen, Yuxian Qiu, Ming Wu, Yao Shen, Jingwen Leng, Chao Li, Minyi Guo: Bandwidth and Locality Aware Task-stealing for Manycore Architectures with Bandwidth-Asymmetric Memory. 55:1-55:26
- Stefan Ganser, Armin Größlinger, Norbert Siegmund, Sven Apel, Christian Lengauer: Speeding up Iterative Polyhedral Schedule Optimization with Surrogate Performance Models. 56:1-56:27
- Song Wu, Fang Zhou, Xiang Gao, Hai Jin, Jinglei Ren: Dual-Page Checkpointing: An Architectural Approach to Efficient Data Persistence for In-Memory Applications. 57:1-57:27
- Mohsen Kiani, Amir Rajabzadeh: Efficient Cache Performance Modeling in GPUs Using Reuse Distance Analysis. 58:1-58:24
- Thomas Debrunner, Sajad Saeedi, Paul H. J. Kelly: AUKE: Automatic Kernel Code Generation for an Analogue SIMD Focal-Plane Sensor-Processor Array. 59:1-59:26
- You Zhou, Fei Wu, Zhonghai Lu, Xubin He, Ping Huang, Changsheng Xie: SCORE: A Novel Scheme to Efficiently Cache Overlong ECCs in NAND Flash Memory. 60:1-60:25
- Francisco J. Andújar, Salvador Coll, Marina Alonso, Pedro López, Juan-Miguel Martínez: POWAR: Power-Aware Routing in HPC Networks with On/Off Links. 61:1-61:22
- Rahim Mammadli, Felix Wolf, Ali Jannesari: The Art of Getting Deep Neural Networks in Shape. 62:1-62:21
- Stavros Tzilis, Pedro Trancoso, Ioannis Sourdis: Energy-Efficient Runtime Management of Heterogeneous Multicores using Online Projection. 63:1-63:26
- Matthew Kay Fei Lee, Yingnan Cui, Thannirmalai Somu, Tao Luo, Jun Zhou, Wai Teng Tang, Weng-Fai Wong, Rick Siow Mong Goh: A System-Level Simulator for RRAM-Based Neuromorphic Computing Chips. 64:1-64:24
- Evangelos Vasilakis, Vassilis Papaefstathiou, Pedro Trancoso, Ioannis Sourdis: Decoupled Fused Cache: Fusing a Decoupled LLC with a DRAM Cache. 65:1-65:23
- Peter Pirkelbauer, Amalee Wilson, Christina L. Peterson, Damian Dechev: Blaze-Tasks: A Framework for Computing Parallel Reductions over Tasks. 66:1-66:25
- Yukinori Sato, Tomoya Yuki, Toshio Endo: An Autotuning Framework for Scalable Execution of Tiled Code via Iterative Polyhedral Compilation. 67:1-67:23
- S. Kazem Shekofteh, Hamid Noori, Mahmoud Naghibzadeh, Hadi Sadoghi Yazdi, Holger Fröning: Metric Selection for GPU Kernel Classification. 68:1-68:27
- Angelos Bilas: List of 2018 Distinguished Reviewers ACM TACO. 69:1
