


ACM Transactions on Architecture and Code Optimization, Volume 9
Volume 9, Number 1, March 2012
- Walid J. Ghandour, Haitham Akkary, Wes Masri:
Leveraging Strength-Based Dynamic Information Flow Analysis to Enhance Data Value Prediction. 1:1-1:33
- Jaekyu Lee, Hyesoon Kim, Richard W. Vuduc:
When Prefetching Works, When It Doesn't, and Why. 2:1-2:29
- Bita Mazloom, Shashidhar Mysore, Mohit Tiwari, Banit Agrawal, Timothy Sherwood:
Dataflow Tomography: Information Flow Tracking For Understanding and Visualizing Full Systems. 3:1-3:26
- Jung Ho Ahn, Norman P. Jouppi, Christos Kozyrakis, Jacob Leverich, Robert S. Schreiber:
Improving System Energy Efficiency with Memory Rank Subsetting. 4:1-4:28
- Xuejun Yang, Li Wang, Jingling Xue, Qingbo Wu:
Comparability Graph Coloring for Optimizing Utilization of Software-Managed Stream Register Files for Stream Processors. 5:1-5:30
- Abhinandan Majumdar, Srihari Cadambi, Michela Becchi, Srimat T. Chakradhar, Hans Peter Graf:
A Massively Parallel, Energy Efficient Programmable Accelerator for Learning and Classification. 6:1-6:30
Volume 9, Number 2, June 2012
- Stijn Eyerman, Lieven Eeckhout:
Probabilistic modeling for job symbiosis scheduling on SMT processors. 7:1-7:27
- Rachid Seghir, Vincent Loechner, Benoît Meister:
Integer affine transformations of parametric ℤ-polytopes and applications to loop nest optimization. 8:1-8:27
- Yi Yang, Ping Xiang, Jingfei Kong, Mike Mantor, Huiyang Zhou:
A unified optimizing compiler framework for different GPGPU architectures. 9:1-9:33
- Choonki Jang, Jaejin Lee, Bernhard Egger, Soojung Ryu:
Automatic code overlay generation and partially redundant code fetch elimination. 10:1-10:32
- Zahra Abbasi, Georgios Varsamopoulos, Sandeep K. S. Gupta:
TACOMA: Server and workload management in internet data centers considering cooling-computing power trade-off and energy proportionality. 11:1-11:37
- Andreas Lankes, Thomas Wild, Stefan Wallentowitz, Andreas Herkersdorf:
Benefits of selective packet discard in networks-on-chip. 12:1-12:21
Volume 9, Number 3, September 2012
- Yangchun Luo, Antonia Zhai:
Dynamically dispatching speculative threads to improve sequential execution. 13:1-13:31
- Huimin Cui, Jingling Xue, Lei Wang, Yang Yang, Xiaobing Feng, Dongrui Fan:
Extendable pattern-oriented optimization directives. 14:1-14:37
- Adam Wade Lewis, Nian-Feng Tzeng, Soumik Ghosh:
Runtime energy consumption estimation for server workloads based on chaotic time-series approximation. 15:1-15:26
- Alejandro Valero, Julio Sahuquillo, Salvador Petit, Pedro López, José Duato:
Combining recency of information with selective random and a victim cache in last-level caches. 16:1-16:20
- Bin Li, Li-Shiuan Peh, Li Zhao, Ravi R. Iyer:
Dynamic QoS management for chip multiprocessors. 17:1-17:29
- Polychronis Xekalakis, Nikolas Ioannou, Marcelo Cintra:
Mixed speculative multithreaded execution models. 18:1-18:26
- Mageda Sharafeddine, Komal Jothi, Haitham Akkary:
Disjoint out-of-order execution processor. 19:1-19:32
- Diego Andrade, Basilio B. Fraguela, Ramon Doallo:
Static analysis of the worst-case memory performance for irregular codes with indirections. 20:1-20:32
- Yang Chen, Shuangde Fang, Yuanjie Huang, Lieven Eeckhout, Grigori Fursin, Olivier Temam, Chengyong Wu:
Deconstructing iterative optimization. 21:1-21:30
- Apala Guha, Kim M. Hazelwood, Mary Lou Soffa:
Memory optimization of dynamic binary translators for embedded systems. 22:1-22:29
- James R. Geraci, Sharon M. Sacco:
A transpose-free in-place SIMD optimized FFT. 23:1-23:21
Volume 9, Number 4, January 2013
- Bart Coppens, Bjorn De Sutter, Jonas Maebe:
Feedback-driven binary code diversification. 24:1-24:26
- Jeremy Fowers, Greg Brown, John Robert Wernsing, Greg Stitt:
A performance and energy comparison of convolution on GPUs, FPGAs, and multicore processors. 25:1-25:21
- Erven Rohou, Kevin Williams, David Yuste:
Vectorization technology to improve interpreter performance. 26:1-26:22
- Jimmy Cleary, Owen Callanan, Mark Purcell, David Gregg:
Fast asymmetric thread synchronization. 27:1-27:22
- Yong Li, Rami G. Melhem, Alex K. Jones:
PS-TLB: Leveraging page classification information for fast, scalable and efficient translation for future CMPs. 28:1-28:21
- Kristof Du Bois, Stijn Eyerman, Lieven Eeckhout:
Per-thread cycle accounting in multicore processors. 29:1-29:22
- Christian Wimmer, Michael Haupt, Michael L. Van de Vanter, Mick J. Jordan, Laurent Daynès, Doug Simon:
Maxine: An approachable virtual machine for, and in, java. 30:1-30:24
- Malik Murtaza Khan, Protonu Basu, Gabe Rudy, Mary W. Hall, Chun Chen, Jacqueline Chame:
A script-based autotuning compiler system to generate high-performance CUDA code. 31:1-31:25
- Kenzo Van Craeynest, Lieven Eeckhout:
Understanding fundamental design choices in single-ISA heterogeneous multicore architectures. 32:1-32:23
- Samuel Antao, Leonel Sousa:
The CRNS framework and its application to programmable and reconfigurable cryptography. 33:1-33:25
- Boubacar Diouf, Can Hantas, Albert Cohen, Özcan Özturk, Jens Palsberg:
A decoupled local memory allocator. 34:1-34:22
- Huimin Cui, Qing Yi, Jingling Xue, Xiaobing Feng:
Layout-oblivious compiler optimization for matrix computations. 35:1-35:20
- Stephen Dolan, Servesh Muralidharan, David Gregg:
Compiler support for lightweight context switching. 36:1-36:25
- Pablo Abad Fidalgo, Valentin Puente, José-Ángel Gregorio:
LIGERO: A light but efficient router conceived for cache-coherent chip multiprocessors. 37:1-37:21
- Jorge Albericio, Pablo Ibáñez, Víctor Viñals, José María Llabería:
Exploiting reuse locality on inclusive shared last-level caches. 38:1-38:19
- Paraskevas Yiapanis, Demian Rosas-Ham, Gavin Brown, Mikel Luján:
Optimizing software runtime systems for speculative parallelization. 39:1-39:27
- Cedric Nugteren, Pieter Custers, Henk Corporaal:
Algorithmic species: A classification of affine loop nests for parallel programming. 40:1-40:25
- Marco Gerards, Jan Kuper:
Optimal DPM and DVFS for frame-based real-time systems. 41:1-41:23
- Zhichao Yan, Hong Jiang, Yujuan Tan, Dan Feng:
An integrated pseudo-associativity and relaxed-order approach to hardware transactional memory. 42:1-42:26
- Doris Chen, Deshanand P. Singh:
Profile-guided floating- to fixed-point conversion for hybrid FPGA-processor applications. 43:1-43:25
- Yan Cui, Yingxin Wang, Yu Chen, Yuanchun Shi:
Lock-contention-aware scheduler: A scalable and energy-efficient method for addressing scalability collapse on multicore systems. 44:1-44:25
- Kishore Kumar Pusukuri, Rajiv Gupta, Laxmi N. Bhuyan:
ADAPT: A framework for coscheduling multithreaded programs. 45:1-45:24
- Michele Tartara, Stefano Crespi-Reghizzi:
Continuous learning of compiler heuristics. 46:1-46:25
- Grigorios Chrysos, Panagiotis Dagritzikos, Ioannis Papaefstathiou, Apostolos Dollas:
HC-CART: A parallel system implementation of data mining classification and regression tree (CART) algorithm on a multi-FPGA system. 47:1-47:25
- Jongwon Lee, Yohan Ko, Kyoungwoo Lee, Jonghee M. Youn, Yunheung Paek:
Dynamic code duplication with vulnerability awareness for soft error detection on VLIW architectures. 48:1-48:24
- Fabien Coelho, François Irigoin:
API compilation for image hardware accelerators. 49:1-49:25
- Carlos Luque, Miquel Moretó, Francisco J. Cazorla, Mateo Valero:
Fair CPU time accounting in CMP+SMT processors. 50:1-50:25
- Pavlos M. Mattheakis, Ioannis Papaefstathiou:
Significantly reducing MPI intercommunication latency and power overhead in both embedded and HPC systems. 51:1-51:25
- Riyadh Baghdadi, Albert Cohen, Sven Verdoolaege, Konrad Trifunovic:
Improved loop tiling based on the removal of spurious false dependences. 52:1-52:26
- Antoniu Pop, Albert Cohen:
OpenStream: Expressiveness and data-flow compilation of OpenMP streaming programs. 53:1-53:25
- Sven Verdoolaege, Juan Carlos Juega, Albert Cohen, José Ignacio Gómez, Christian Tenllado, Francky Catthoor:
Polyhedral parallel code generation for CUDA. 54:1-54:23
- Yu Du, Miao Zhou, Bruce R. Childers, Rami G. Melhem, Daniel Mossé:
Delta-compressed caching for overcoming the write bandwidth limitation of hybrid main memory. 55:1-55:20
- Suresh Purini, Lakshya Jain:
Finding good optimization sequences covering program space. 56:1-56:23
- Mehmet E. Belviranli, Laxmi N. Bhuyan, Rajiv Gupta:
A dynamic self-scheduling scheme for heterogeneous multiprocessor architectures. 57:1-57:20
- Anurag Negi, J. Rubén Titos Gil:
SCIN-cache: Fast speculative versioning in multithreaded cores. 58:1-58:26
- Thibaut Lutz, Christian Fensch, Murray Cole:
PARTANS: An autotuning framework for stencil computation on multi-GPU systems. 59:1-59:24
- Chunhua Xiao, M.-C. Frank Chang, Jason Cong, Michael Gill, Zhangqin Huang, Chunyue Liu, Glenn Reinman, Hao Wu:
Stream arbitration: Towards efficient bandwidth utilization for emerging on-chip interconnects. 60:1-60:27
