Wen-Mei W. Hwu
Wen-mei W. Hwu – Wen-Mei Hwu
Person information
- affiliation: University of Illinois at Urbana-Champaign, Department of Electrical and Computer Engineering, Urbana-Champaign, IL, USA
- award (1999): Grace Murray Hopper Award
2020 – today
- 2024
- [j84] Mohit Mahajan, Wen-Mei Hwu, Rakesh Nagi:
Determining optimal channel partition for 2:4 fine grained structured sparsity. Optim. Lett. 18(9): 2079-2090 (2024)
- [j83] Jeongmin Brian Park, Vikram Sharma Mailthody, Zaid Qureshi, Wen-Mei Hwu:
Accelerating Sampling and Aggregation Operations in GNN Frameworks with GPU Initiated Direct Storage Accesses. Proc. VLDB Endow. 17(6): 1227-1240 (2024)
- [c245] Chia-Hao Chang, Jihoon Han, Anand Sivasubramaniam, Vikram Sharma Mailthody, Zaid Qureshi, Wen-Mei Hwu:
GMT: GPU Orchestrated Memory Tiering for the Big Data Era. ASPLOS (3) 2024: 464-478
- [c244] Kun Wu, Mert Hidayetoglu, Xiang Song, Sitao Huang, Da Zheng, Israt Nisa, Wen-Mei Hwu:
Hector: An Efficient Programming and Compilation Framework for Implementing Relational Graph Neural Networks in GPU Architectures. ASPLOS (3) 2024: 528-544
- [c243] Mert Hidayetoglu, Simon Garcia De Gonzalo, Elliott Slaughter, Yu Li, Christopher Zimmer, Tekin Bicer, Bin Ren, William Gropp, Wen-Mei Hwu, Alex Aiken:
CommBench: Micro-Benchmarking Hierarchical Networks with Multi-GPU, Multi-NIC Nodes. ICS 2024: 426-436
- [i62] Ali Hassani, Wen-Mei Hwu, Humphrey Shi:
Faster Neighborhood Attention: Reducing the O(n^2) Cost of Self Attention at the Threadblock Level. CoRR abs/2403.04690 (2024)
- [i61] Jeongmin Brian Park, Kun Wu, Vikram Sharma Mailthody, Zaid Qureshi, Scott A. Mahlke, Wen-Mei W. Hwu:
LSM-GNN: Large-scale Storage-based Multi-GPU GNN Training by Optimizing Data Transfer Scheme. CoRR abs/2407.15264 (2024)
- [i60] Mert Hidayetoglu, Simon Garcia de Gonzalo, Elliott Slaughter, Pinku Surana, Wen-Mei W. Hwu, William Gropp, Alex Aiken:
HiCCL: A Hierarchical Collective Communication Library. CoRR abs/2408.05962 (2024)
- [i59] Kun Wu, Jeongmin Brian Park, Xiaofan Zhang, Mert Hidayetoglu, Vikram Sharma Mailthody, Sitao Huang, Steven S. Lumetta, Wen-Mei W. Hwu:
TBA: Faster Large Language Model Training Using SSD-Based Activation Offloading. CoRR abs/2408.10013 (2024)
- 2023
- [j82] Mohamed El-Hadedy, Xinfei Guo, Kazutomo Yoshii, Yichen Cai, Robert Herndon, Bryan Banta, Wen-Mei Hwu:
RECO-ASCON: Reconfigurable ASCON hash functions for IoT applications. Integr. 93: 102061 (2023)
- [c242] Mohammad Almasri, Yen-Hsiang Chang, Izzat El Hajj, Rakesh Nagi, Jinjun Xiong, Wen-mei W. Hwu:
Parallelizing Maximal Clique Enumeration on GPUs. PACT 2023: 162-175
- [c241] Jie Huang, Kevin Chen-Chuan Chang, Jinjun Xiong, Wen-Mei Hwu:
Can Language Models Be Specific? How? ACL (Findings) 2023: 716-727
- [c240] Zaid Qureshi, Vikram Sharma Mailthody, Isaac Gelado, Seungwon Min, Amna Masood, Jeongmin Brian Park, Jinjun Xiong, Chris J. Newburn, Dmitri Vainbrand, I-Hsin Chung, Michael Garland, William J. Dally, Wen-Mei W. Hwu:
GPU-Initiated On-Demand High-Throughput Storage Access in the BaM System Architecture. ASPLOS (2) 2023: 325-339
- [c239] Luyang Yu, Yizhen Lu, Meghna Mandava, Edward Richter, Vikram Sharma Mailthody, Seungwon Min, Wen-Mei W. Hwu, Deming Chen:
FSSD: FPGA-Based Emulator for SSDs. FPL 2023: 101-108
- [c238] Samiran Kawtikwar, Mohammad Almasri, Wen-Mei Hwu, Rakesh Nagi, Jinjun Xiong:
BEEP: Balanced Efficient subgraph Enumeration in Parallel. ICPP 2023: 142-152
- [c237] Mohamed El-Hadedy, Russell Hua, Kazutomo Yoshii, Wen-Mei Hwu, Martin Margala:
RECO-LFSR: Reconfigurable Low-power Cryptographic processor based on LFSR for Trusted IoT platforms. ISQED 2023: 1-7
- [c236] Arpandeep Khatua, Vikram Sharma Mailthody, Bhagyashree Taleka, Tengfei Ma, Xiang Song, Wen-Mei Hwu:
IGB: Addressing The Gaps In Labeling, Features, Heterogeneity, and Size of Public Graph Datasets for Deep Learning Research. KDD 2023: 4284-4295
- [c235] Mohamed El-Hadedy, Russell Hua, Shahzman Saqib, Kazutomo Yoshii, Wen-Mei Hwu, Martin Margala:
BLTESTI: Benchmarking Lightweight TinyJAMBU on Embedded Systems for Trusted IoT. SOCC 2023: 1-6
- [c234] Benjamin Reidys, Yuqi Xue, Daixuan Li, Bharat Sukhwani, Wen-Mei Hwu, Deming Chen, Sameh W. Asaad, Jian Huang:
RackBlox: A Software-Defined Rack-Scale Storage System with Network-Storage Co-Design. SOSP 2023: 182-199
- [i58] Kun Wu, Mert Hidayetoglu, Xiang Song, Sitao Huang, Da Zheng, Israt Nisa, Wen-Mei W. Hwu:
PIGEON: Optimizing CUDA Code Generator for End-to-End Training and Inference of Relational Graph Neural Networks. CoRR abs/2301.06284 (2023)
- [i57] Arpandeep Khatua, Vikram Sharma Mailthody, Bhagyashree Taleka, Tengfei Ma, Xiang Song, Wen-mei W. Hwu:
IGB: Addressing The Gaps In Labeling, Features, Heterogeneity, and Size of Public Graph Datasets for Deep Learning Research. CoRR abs/2302.13522 (2023)
- [i56] Jeongmin Brian Park, Vikram Sharma Mailthody, Zaid Qureshi, Wen-Mei Hwu:
Accelerating Sampling and Aggregation Operations in GNN Frameworks with GPU Initiated Direct Storage Accesses. CoRR abs/2306.16384 (2023)
- [i55] Jeongmin Brian Park, Zaid Qureshi, Vikram S. Mailthody, Andrew Gacek, Shunfan Shao, Mohammad Almasri, Isaac Gelado, Jinjun Xiong, Chris J. Newburn, I-Hsin Chung, Michael Garland, Nikolay Sakharnykh, Wen-Mei W. Hwu:
CODAG: Characterizing and Optimizing Decompression Algorithms for GPUs. CoRR abs/2307.03760 (2023)
- [i54] Benjamin Reidys, Yuqi Xue, Daixuan Li, Bharat Sukhwani, Wen-mei W. Hwu, Deming Chen, Sameh W. Asaad, Jian Huang:
RackBlox: A Software-Defined Rack-Scale Storage System with Network-Storage Co-Design. CoRR abs/2309.06513 (2023)
- 2022
- [j81] Omer Anjum, Mohammad Almasri, Simon Garcia de Gonzalo, Wen-Mei W. Hwu:
An efficient GPU implementation and scaling for higher-order 3D stencils. Inf. Sci. 586: 326-343 (2022)
- [j80] Xiaofan Zhang, Yuan Ma, Jinjun Xiong, Wen-Mei W. Hwu, Volodymyr V. Kindratenko, Deming Chen:
Exploring HW/SW Co-Design for Video Analysis on CPU-FPGA Heterogeneous Systems. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 41(6): 1606-1619 (2022)
- [j79] Mert Hidayetoglu, Tekin Biçer, Simon Garcia de Gonzalo, Bin Ren, Doga Gürsoy, Rajkumar Kettimuthu, Ian T. Foster, Wen-Mei W. Hwu:
MemXCT: Design, Optimization, Scaling, and Reproducibility of X-Ray Tomography Imaging. IEEE Trans. Parallel Distributed Syst. 33(9): 2014-2031 (2022)
- [c233] Jie Huang, Kevin Chang, Jinjun Xiong, Wen-Mei Hwu:
Open Relation Modeling: Learning to Define Relations between Entities. ACL (Findings) 2022: 297-308
- [c232] Mhd Ghaith Olabi, Juan Gómez-Luna, Onur Mutlu, Wen-Mei Hwu, Izzat El Hajj:
A Compiler Framework for Optimizing Dynamic Parallelism on GPUs. CGO 2022: 1-13
- [c231] Jie Huang, Hanyin Shao, Kevin Chen-Chuan Chang, Jinjun Xiong, Wen-Mei Hwu:
Understanding Jargon: Combining Extraction and Generation for Definition Modeling. EMNLP 2022: 3994-4004
- [c230] Jie Huang, Kerui Zhu, Kevin Chen-Chuan Chang, Jinjun Xiong, Wen-Mei Hwu:
DEER: Descriptive Knowledge Graph for Explaining Entity Relationships. EMNLP 2022: 6686-6698
- [c229] Mohammad Almasri, Izzat El Hajj, Rakesh Nagi, Jinjun Xiong, Wen-Mei Hwu:
Parallel K-clique counting on GPUs. ICS 2022: 21:1-21:14
- [c228] Vibhor Dodeja, Mohammad Almasri, Rakesh Nagi, Jinjun Xiong, Wen-Mei Hwu:
PARSEC: PARallel Subgraph Enumeration in CUDA. IPDPS 2022: 168-178
- [c227] Seungwon Min, Kun Wu, Mert Hidayetoglu, Jinjun Xiong, Xiang Song, Wen-Mei Hwu:
Graph Neural Network Training and Data Tiering. KDD 2022: 3555-3565
- [c226] Xiangdong Wei, Mohamed El-Hadedy, Sergiu Mosanu, Zhengping Zhu, Wen-Mei Hwu, Xinfei Guo:
RECO-HCON: A High-Throughput Reconfigurable Compact ASCON Processor for Trusted IoT. SOCC 2022: 1-6
- [d1] Zaid Qureshi, Vikram Sharma Mailthody, Isaac Gelago, Seungwon Min, Amna Masood, Jeongmin Brian Park, Jinjun Xiong, Chris J. Newburn, Dmitri Vainbrand, I-Hsin Chung, Michael Garland, William J. Dally, Wen-mei W. Hwu:
GPU-Initiated On-Demand High-Throughput Storage Access in the BaM System Architecture. Zenodo, 2022
- [i53] Mhd Ghaith Olabi, Juan Gómez-Luna, Onur Mutlu, Wen-Mei W. Hwu, Izzat El Hajj:
A Compiler Framework for Optimizing Dynamic Parallelism on GPUs. CoRR abs/2201.02789 (2022)
- [i52] Zaid Qureshi, Vikram Sharma Mailthody, Isaac Gelado, Seungwon Min, Amna Masood, Jeongmin Brian Park, Jinjun Xiong, Chris J. Newburn, Dmitri Vainbrand, I-Hsin Chung, Michael Garland, William J. Dally, Wen-Mei W. Hwu:
BaM: A Case for Enabling Fine-grain High Throughput GPU-Orchestrated Access to Storage. CoRR abs/2203.04910 (2022)
- [i51] Jie Huang, Kerui Zhu, Kevin Chen-Chuan Chang, Jinjun Xiong, Wen-Mei Hwu:
DKG: A Descriptive Knowledge Graph for Explaining Relationships between Entities. CoRR abs/2205.10479 (2022)
- [i50] Jie Huang, Kevin Chen-Chuan Chang, Jinjun Xiong, Wen-Mei Hwu:
Can Language Models Be Specific? How? CoRR abs/2210.05159 (2022)
- [i49] Omer Anjum, Alok Kamatar, Toby Liang, Jinjun Xiong, Wen-Mei Hwu:
Submission-Aware Reviewer Profiling for Reviewer Recommender System. CoRR abs/2211.04194 (2022)
- [i48] Mohammad Almasri, Yen-Hsiang Chang, Izzat El Hajj, Rakesh Nagi, Jinjun Xiong, Wen-mei W. Hwu:
Parallelizing Maximal Clique Enumeration on GPUs. CoRR abs/2212.01473 (2022)
- 2021
- [j78] Seungwon Min, Kun Wu, Sitao Huang, Mert Hidayetoglu, Jinjun Xiong, Eiman Ebrahimi, Deming Chen, Wen-mei W. Hwu:
Large Graph Convolutional Network Training with GPU-Oriented Data Communication Architecture. Proc. VLDB Endow. 14(11): 2087-2100 (2021)
- [j77] Sitao Huang, Kun Wu, Hyunmin Jeong, Chengyue Wang, Deming Chen, Wen-Mei Hwu:
PyLog: An Algorithm-Centric Python-Based FPGA Programming and Synthesis Flow. IEEE Trans. Computers 70(12): 2015-2028 (2021)
- [j76] Qin Li, Xiaofan Zhang, Jinjun Xiong, Wen-Mei Hwu, Deming Chen:
Efficient Methods for Mapping Neural Machine Translator on FPGAs. IEEE Trans. Parallel Distributed Syst. 32(7): 1866-1877 (2021)
- [c225] Sultan Durrani, Muhammad Saad Chughtai, Mert Hidayetoglu, Rashid Tahir, Abdul Dakkak, Lawrence Rauchwerger, Fareed Zaffar, Wen-Mei W. Hwu:
Accelerating Fourier and Number Theoretic Transforms using Tensor Cores and Warp Shuffles. PACT 2021: 345-355
- [c224] Jie Huang, Kevin Chang, Jinjun Xiong, Wen-Mei Hwu:
Measuring Fine-Grained Domain Relevance of Terms: A Hierarchical Core-Fringe Approach. ACL/IJCNLP (1) 2021: 3641-3651
- [c223] Ashutosh Dhar, Paul Reckamp, Jinjun Xiong, Wen-Mei Hwu, Deming Chen:
Graviton: A Reconfigurable Memory-Compute Fabric for Data Intensive Applications. ARC 2021: 254-264
- [c222] Sitao Huang, Aayush Ankit, Plínio Silveira, Rodrigo Antunes, Sai Rahul Chalamalasetti, Izzat El Hajj, Dong Eun Kim, Glaucimar Aguiar, Pedro Bruel, Sergey Serebryakov, Cong Xu, Can Li, Paolo Faraboschi, John Paul Strachan, Deming Chen, Kaushik Roy, Wen-Mei W. Hwu, Dejan S. Milojicic:
Mixed Precision Quantization for ReRAM-based DNN Inference Accelerators. ASP-DAC 2021: 372-377
- [c221] Jiachen Li, Bowen Cheng, Rogério Feris, Jinjun Xiong, Thomas S. Huang, Wen-Mei Hwu, Humphrey Shi:
Pseudo-IoU: Improving Label Assignment in Anchor-Free Object Detection. CVPR Workshops 2021: 2378-2387
- [c220] Chengyue Wang, Sitao Huang, Wen-Mei Hwu, Deming Chen:
Extending HLS with High-Level Descriptive Language for Configurable Algorithm-Level Spatial Structure Design. FCCM 2021: 261
- [c219] Sitao Huang, Kun Wu, Hyunmin Jeong, Chengyue Wang, Deming Chen, Wen-Mei Hwu:
PyLog: An Algorithm-Centric Python-Based FPGA Programming and Synthesis Flow. FPGA 2021: 227-228
- [c218] Carl Pearson, Kun Wu, I-Hsin Chung, Jinjun Xiong, Wen-Mei Hwu:
TEMPI: An Interposed MPI Library with a Canonical Representation of CUDA-aware Datatypes. HPDC 2021: 95-106
- [c217] Mohammad Almasri, Neo Vasudeva, Rakesh Nagi, Jinjun Xiong, Wen-Mei Hwu:
HyKernel: A Hybrid Selection of One/Two-Phase Kernels for Triangle Counting on GPUs. HPEC 2021: 1-7
- [c216] Zhonghao Wang, Kai Wang, Mo Yu, Jinjun Xiong, Wen-Mei Hwu, Mark Hasegawa-Johnson, Humphrey Shi:
Interpretable Visual Reasoning via Induced Symbolic Space. ICCV 2021: 1858-1867
- [c215] Sultan Durrani, Muhammad Saad Chughtai, Abdul Dakkak, Wen-Mei Hwu, Lawrence Rauchwerger:
FFT blitz: the tensor cores strike back. PPoPP 2021: 488-489
- [c214] Omer Anjum, Mohammad Almasri, Jinjun Xiong, Wen-Mei W. Hwu:
PhraseScope: An Effective and Unsupervised Framework for Mining High Quality Phrases. SDM 2021: 639-647
- [i47] Vikram Sharma Mailthody, James Wei, Nicholas Chen, Mohammad Behnia, Ruihao Yao, Qihao Wang, Vedant Agrawal, Churan He, Lijian Wang, Leihao Chen, Amit Agarwal, Edward Richter, Wen-Mei Hwu, Christopher W. Fletcher, Jinjun Xiong, Andrew Miller, Sanjay Patel:
Safer Illinois and RokWall: Privacy Preserving University Health Apps for COVID-19. CoRR abs/2101.07897 (2021)
- [i46] Seungwon Min, Kun Wu, Sitao Huang, Mert Hidayetoglu, Jinjun Xiong, Eiman Ebrahimi, Deming Chen, Wen-Mei W. Hwu:
PyTorch-Direct: Enabling GPU Centric Data Access for Very Large Graph Neural Network Training with Irregular Accesses. CoRR abs/2101.07956 (2021)
- [i45] Seungwon Min, Kun Wu, Sitao Huang, Mert Hidayetoglu, Jinjun Xiong, Eiman Ebrahimi, Deming Chen, Wen-Mei W. Hwu:
Large Graph Convolutional Network Training with GPU-Oriented Data Communication Architecture. CoRR abs/2103.03330 (2021)
- [i44] Mohammad Almasri, Izzat El Hajj, Rakesh Nagi, Jinjun Xiong, Wen-Mei W. Hwu:
K-Clique Counting on GPUs. CoRR abs/2104.13209 (2021)
- [i43] Jiachen Li, Bowen Cheng, Rogério Feris, Jinjun Xiong, Thomas S. Huang, Wen-Mei Hwu, Humphrey Shi:
Pseudo-IoU: Improving Label Assignment in Anchor-Free Object Detection. CoRR abs/2104.14082 (2021)
- [i42] Jie Huang, Kevin Chen-Chuan Chang, Jinjun Xiong, Wen-mei W. Hwu:
Measuring Fine-Grained Domain Relevance of Terms: A Hierarchical Core-Fringe Approach. CoRR abs/2105.13255 (2021)
- [i41] Jie Huang, Kevin Chen-Chuan Chang, Jinjun Xiong, Wen-Mei W. Hwu:
Open Relation Modeling: Learning to Define Relations between Entities. CoRR abs/2108.09241 (2021)
- [i40] Yen-Hsiang Chang, Jianhao Pu, Wen-Mei W. Hwu, Jinjun Xiong:
MLHarness: A Scalable Benchmarking System for MLCommons. CoRR abs/2111.05231 (2021)
- [i39] Seungwon Min, Kun Wu, Mert Hidayetoglu, Jinjun Xiong, Xiang Song, Wen-mei W. Hwu:
Graph Neural Network Training with Data Tiering. CoRR abs/2111.05894 (2021)
- 2020
- [j75] Seungwon Min, Vikram Sharma Mailthody, Zaid Qureshi, Jinjun Xiong, Eiman Ebrahimi, Wen-Mei Hwu:
EMOGI: Efficient Memory-access for Out-of-memory Graph-traversal In GPUs. Proc. VLDB Endow. 14(2): 114-127 (2020)
- [j74] Aayush Ankit, Izzat El Hajj, Sai Rahul Chalamalasetti, Sapan Agarwal, Matthew J. Marinella, Martin Foltin, John Paul Strachan, Dejan S. Milojicic, Wen-Mei Hwu, Kaushik Roy:
PANTHER: A Programmable Architecture for Neural Network Training Harnessing Energy-Efficient ReRAM. IEEE Trans. Computers 69(8): 1128-1142 (2020)
- [c213] Cheng Li, Abdul Dakkak, Jinjun Xiong, Wen-Mei W. Hwu:
The Design and Implementation of a Scalable Deep Learning Benchmarking Platform. CLOUD 2020: 414-425
- [c212] Abdul Dakkak, Tom Wickham-Jones, Wen-Mei Hwu:
The design and implementation of the wolfram language compiler. CGO 2020: 212-228
- [c211] Omer Anjum, Chak Ho Chan, Tanitpong Lawphongpanich, Yucheng Liang, Tianyi Tang, Shuchen Zhang, Wen-Mei Hwu, Jinjun Xiong, Sanjay Patel:
Vertext: An End-to-end AI Powered Conversation Management System for Multi-party Chat Platforms. CSCW Companion 2020: 1-6
- [c210] Zhonghao Wang, Yunchao Wei, Rogério Schmidt Feris, Jinjun Xiong, Wen-Mei W. Hwu, Thomas S. Huang, Honghui Shi:
Alleviating Semantic-level Shift: A Semi-supervised Domain Adaptation Method for Semantic Segmentation. CVPR Workshops 2020: 4043-4047
- [c209] Zhonghao Wang, Mo Yu, Yunchao Wei, Rogério Feris, Jinjun Xiong, Wen-Mei Hwu, Thomas S. Huang, Honghui Shi:
Differential Treatment for Stuff and Things: A Simple Unsupervised Domain Adaptation Method for Semantic Segmentation. CVPR 2020: 12632-12641
- [c208] Yuhong Li, Cong Hao, Xiaofan Zhang, Xinheng Liu, Yao Chen, Jinjun Xiong, Wen-mei W. Hwu, Deming Chen:
EDD: Efficient Differentiable DNN Architecture and Implementation Co-search for Embedded AI Solutions. DAC 2020: 1-6
- [c207] Jie Huang, Zilong Wang, Kevin Chang, Wen-Mei Hwu, Jinjun Xiong:
Exploring Semantic Capacity of Terms. EMNLP (1) 2020: 8509-8518
- [c206] Cong Hao, Yao Chen, Xiaofan Zhang, Yuhong Li, Jinjun Xiong, Wen-Mei Hwu, Deming Chen:
Effective Algorithm-Accelerator Co-design for AI Solutions on Edge Devices. ACM Great Lakes Symposium on VLSI 2020: 283-290
- [c205] Mert Hidayetoglu, Carl Pearson, Vikram Sharma Mailthody, Eiman Ebrahimi, Jinjun Xiong, Rakesh Nagi, Wen-Mei Hwu:
At-Scale Sparse Deep Neural Network Inference With Efficient GPU Implementation. HPEC 2020: 1-7
- [c204] Xiaofan Zhang, Hanchen Ye, Junsong Wang, Yonghua Lin, Jinjun Xiong, Wen-Mei Hwu, Deming Chen:
DNNExplorer: A Framework for Modeling and Exploring a Novel Paradigm of FPGA-based DNN Accelerator. ICCAD 2020: 61:1-61:9
- [c203] Mohamed El-Hadedy, Martin Margala, Sergiu Mosanu, Danilo Gligoroski, Jinjun Xiong, Wen-Mei Hwu:
Micro - GAGE: A Low-power Compact GAGE Hash Function Processor for IoT Applications. ICECS 2020: 1-4
- [c202] Cheng Li, Abdul Dakkak, Jinjun Xiong, Wei Wei, Lingjie Xu, Wen-Mei Hwu:
XSP: Across-Stack Profiling and Analysis of Machine Learning Models on GPUs. IPDPS 2020: 326-327
- [c201] Cheng Li, Abdul Dakkak, Jinjun Xiong, Wen-Mei Hwu:
Benanza: Automatic μBenchmark Generation to Compute "Lower-bound" Latency and Inform Optimizations of Deep Learning Models on GPUs. IPDPS 2020: 440-450
- [c200] Carl Pearson, Mert Hidayetoglu, Mohammad Almasri, Omer Anjum, I-Hsin Chung, Jinjun Xiong, Wen-Mei W. Hwu:
Node-Aware Stencil Communication for Heterogeneous Supercomputers. IPDPS Workshops 2020: 796-805
- [c199] Wen-Mei Hwu:
Advancing Computing Infrastructure for Very Large-Scale Deep Learning at C3SR. IPDPS Workshops 2020: 989
- [c198] Ashutosh Dhar, Xiaohao Wang, Hubertus Franke, Jinjun Xiong, Jian Huang, Wen-Mei W. Hwu, Nam Sung Kim, Deming Chen:
FReaC Cache: Folded-logic Reconfigurable Computing in the Last Level Cache. MICRO 2020: 102-117
- [c197] Xiaofan Zhang, Haoming Lu, Cong Hao, Jiachen Li, Bowen Cheng, Yuhong Li, Kyle Rupnow, Jinjun Xiong, Thomas S. Huang, Honghui Shi, Wen-Mei Hwu, Deming Chen:
SkyNet: a Hardware-Efficient Method for Object Detection and Tracking on Embedded Systems. MLSys 2020
- [c196] Abdul Dakkak, Cheng Li, Jinjun Xiong, Wen-mei W. Hwu:
DLSpec: A Deep Learning Task Exchange Specification. OpML 2020
- [c195] Mert Hidayetoglu, Tekin Bicer, Simon Garcia De Gonzalo, Bin Ren, Vincent De Andrade, Doga Gürsoy, Raj Kettimuthu, Ian T. Foster, Wen-mei W. Hwu:
Petascale XCT: 3D image reconstruction with hierarchical communications on multi-GPU nodes. SC 2020: 37
- [c194] Cheng Li, Abdul Dakkak, Jinjun Xiong, Wen-Mei Hwu:
DLBricks: Composable Benchmark Generation to Reduce Deep Learning Benchmarking Effort on CPUs. ICPE 2020: 202-209
- [e5] Eduard Ayguadé, Wen-mei W. Hwu, Rosa M. Badia, H. Peter Hofstee:
ICS '20: 2020 International Conference on Supercomputing, Barcelona Spain, June, 2020. ACM 2020, ISBN 978-1-4503-7983-0
- [i38] Abdul Dakkak, Cheng Li, Jinjun Xiong, Wen-Mei Hwu:
MLModelScope: A Distributed Platform for Model Evaluation and Benchmarking at Scale. CoRR abs/2002.08295 (2020)
- [i37] Abdul Dakkak, Cheng Li, Jinjun Xiong, Wen-Mei Hwu:
DLSpec: A Deep Learning Task Exchange Specification. CoRR abs/2002.11262 (2020)
- [i36] Zhonghao Wang, Mo Yu, Yunchao Wei, Rogério Schmidt Feris, Jinjun Xiong, Wen-Mei Hwu, Thomas S. Huang, Honghui Shi:
Differential Treatment for Stuff and Things: A Simple Unsupervised Domain Adaptation Method for Semantic Segmentation. CoRR abs/2003.08040 (2020)
- [i35] Zhonghao Wang, Yunchao Wei, Rogério Feris, Jinjun Xiong, Wen-Mei Hwu, Thomas S. Huang, Honghui Shi:
Alleviating Semantic-level Shift: A Semi-supervised Domain Adaptation Method for Semantic Segmentation. CoRR abs/2004.00794 (2020)
- [i34] Yuhong Li, Cong Hao, Xiaofan Zhang, Xinheng Liu, Yao Chen, Jinjun Xiong, Wen-Mei W. Hwu, Deming Chen:
EDD: Efficient Differentiable DNN Architecture and Implementation Co-search for Embedded AI Solutions. CoRR abs/2005.02563 (2020)
- [i33] Seungwon Min, Vikram Sharma Mailthody, Zaid Qureshi, Jinjun Xiong, Eiman Ebrahimi, Wen-Mei W. Hwu:
EMOGI: Efficient Memory-access for Out-of-memory Graph-traversal In GPUs. CoRR abs/2006.06890 (2020)
- [i32] Mert Hidayetoglu, Carl Pearson, Vikram Sharma Mailthody, Eiman Ebrahimi, Jinjun Xiong, Rakesh Nagi, Wen-mei W. Hwu:
Efficient Inference on GPUs for the Sparse Deep Neural Network Graph Challenge 2020. CoRR abs/2007.14152 (2020)
- [i31] Zaid Qureshi, Vikram Sharma Mailthody, Seungwon Min, I-Hsin Chung, Jinjun Xiong, Wen-Mei W. Hwu:
Tearing Down the Memory Wall. CoRR abs/2008.10169 (2020)
- [i30] Xiaofan Zhang, Hanchen Ye, Junsong Wang, Yonghua Lin, Jinjun Xiong, Wen-Mei W. Hwu, Deming Chen:
DNNExplorer: A Framework for Modeling and Exploring a Novel Paradigm of FPGA-based DNN Accelerator. CoRR abs/2008.12745 (2020)
- [i29] Mert Hidayetoglu, Tekin Bicer, Simon Garcia De Gonzalo, Bin Ren, Vincent De Andrade, Doga Gürsoy, Raj Kettimuthu, Ian T. Foster, Wen-mei W. Hwu:
Petascale XCT: 3D Image Reconstruction with Hierarchical Communications on Multi-GPU Nodes. CoRR abs/2009.07226 (2020)
- [i28] Jie Huang, Zilong Wang, Kevin Chen-Chuan Chang, Wen-Mei Hwu, Jinjun Xiong:
Exploring Semantic Capacity of Terms. CoRR abs/2010.01898 (2020)
- [i27] Cong Hao, Yao Chen, Xiaofan Zhang, Yuhong Li, Jinjun Xiong, Wen-Mei Hwu, Deming Chen:
Effective Algorithm-Accelerator Co-design for AI Solutions on Edge Devices. CoRR abs/2010.07185 (2020)
- [i26] Zhonghao Wang, Mo Yu, Kai Wang, Jinjun Xiong, Wen-Mei Hwu, Mark Hasegawa-Johnson, Humphrey Shi:
Interpretable Visual Reasoning via Induced Symbolic Space. CoRR abs/2011.11603 (2020)
- [i25] Carl Pearson, Kun Wu, I-Hsin Chung, Jinjun Xiong, Wen-Mei Hwu:
Fast CUDA-Aware MPI Datatypes without Platform Support. CoRR abs/2012.14363 (2020)
2010 – 2019
- 2019
- [c193] Abdul Dakkak, Cheng Li, Simon Garcia De Gonzalo, Jinjun Xiong, Wen-Mei Hwu:
TrIMS: Transparent and Isolated Model Sharing for Low Latency Deep Learning Inference in Function-as-a-Service. CLOUD 2019: 372-382
- [c192] Qin Li, Xiaofan Zhang, Jinjun Xiong, Wen-Mei Hwu, Deming Chen:
Implementing neural machine translation with bi-directional GRU and attention mechanism on FPGAs using HLS. ASP-DAC 2019: 693-698
- [c191] Aayush Ankit, Izzat El Hajj, Sai Rahul Chalamalasetti, Geoffrey Ndu, Martin Foltin, R. Stanley Williams, Paolo Faraboschi, Wen-mei W. Hwu, John Paul Strachan, Kaushik Roy, Dejan S. Milojicic:
PUMA: A Programmable Ultra-efficient Memristor-based Accelerator for Machine Learning Inference. ASPLOS 2019: 715-731
- [c190] Ahmed H. M. O. Abulila, Vikram Sharma Mailthody, Zaid Qureshi, Jian Huang, Nam Sung Kim, Jinjun Xiong, Wen-Mei W. Hwu:
FlatFlash: Exploiting the Byte-Accessibility of SSDs within a Unified Memory-Storage Hierarchy. ASPLOS 2019: 971-985
- [c189]