


Bowen Shi 0002
Person information
- affiliation: Meta, Meta AI, Fundamental AI Research (FAIR), Audiobox Team, USA
- affiliation: Toyota Technological Institute at Chicago, IL, USA
Other persons with the same name
- Bowen Shi — disambiguation page
- Bowen Shi 0001 — Communication University of China, School of Journalism, Beijing, China (and 1 more)
- Bowen Shi 0003 — Shanghai Jiao Tong University, School of Electronic Information and Electrical Engineering, China
2020 – today

2025
- [i18] Andros Tjandra, Yi-Chiao Wu, Baishan Guo, John Hoffman, Brian Ellis, Apoorv Vyas, Bowen Shi, Sanyuan Chen, Matt Le, Nick Zacharov, Carleigh Wood, Ann Lee, Wei-Ning Hsu: Meta Audiobox Aesthetics: Unified Automatic Quality Assessment for Speech, Music, and Sound. CoRR abs/2502.05139 (2025)

2024
- [j1] Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli: Scaling Speech Technology to 1,000+ Languages. J. Mach. Learn. Res. 25: 97:1-97:52 (2024)
- [c14] HyoJung Han, Mohamed Anwar, Juan Pino, Wei-Ning Hsu, Marine Carpuat, Bowen Shi, Changhan Wang: XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception. ACL (1) 2024: 12896-12911
- [c13] Peng-Jen Chen, Bowen Shi, Kelvin Niu, Ann Lee, Wei-Ning Hsu: M2BART: Multilingual and Multimodal Encoder-Decoder Pre-Training for Any-to-Any Machine Translation. ICASSP 2024: 11896-11900
- [c12] Alexander H. Liu, Matthew Le, Apoorv Vyas, Bowen Shi, Andros Tjandra, Wei-Ning Hsu: Generative Pre-training for Speech with Flow Matching. ICLR 2024
- [c11] K. R. Prajwal, Bowen Shi, Matthew Le, Apoorv Vyas, Andros Tjandra, Mahi Luthra, Baishan Guo, Huiyu Wang, Triantafyllos Afouras, David Kant, Wei-Ning Hsu: MusicFlow: Cascaded Flow Matching for Text Guided Music Generation. ICML 2024
- [c10] Chung-Ming Chien, Andros Tjandra, Apoorv Vyas, Matt Le, Bowen Shi, Wei-Ning Hsu: Learning Fine-Grained Controllability on Speech Generation via Efficient Fine-Tuning. INTERSPEECH 2024
- [c9] Lemeng Wu, Zhaoheng Ni, Bowen Shi, Gaël Le Lan, Anurag Kumar, Varun Nagaraja, Xinhao Mei, Yunyang Xiong, Bilge Soran, Raghuraman Krishnamoorthi, Wei-Ning Hsu, Yangyang Shi, Vikas Chandra: Data Efficient Reflow for Few Step Audio Generation. SLT 2024: 455-461
- [i17] HyoJung Han, Mohamed Anwar, Juan Pino, Wei-Ning Hsu, Marine Carpuat, Bowen Shi, Changhan Wang: XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception. CoRR abs/2403.14402 (2024)
- [i16] Chung-Ming Chien, Andros Tjandra, Apoorv Vyas, Matt Le, Bowen Shi, Wei-Ning Hsu: Learning Fine-Grained Controllability on Speech Generation via Efficient Fine-Tuning. CoRR abs/2406.06251 (2024)
- [i15] Gaël Le Lan, Bowen Shi, Zhaoheng Ni, Sidd Srinivasan, Anurag Kumar, Brian Ellis, David Kant, Varun Nagaraja, Ernie Chang, Wei-Ning Hsu, Yangyang Shi, Vikas Chandra: High Fidelity Text-Guided Music Generation and Editing via Single-Stage Flow Matching. CoRR abs/2407.03648 (2024)
- [i14] Adam Polyak, Amit Zohar, Andrew Brown, Andros Tjandra, Animesh Sinha, Ann Lee, Apoorv Vyas, Bowen Shi, Chih-Yao Ma, Ching-Yao Chuang, David Yan, Dhruv Choudhary, Dingkang Wang, Geet Sethi, Guan Pang, Haoyu Ma, Ishan Misra, Ji Hou, Jialiang Wang, Kiran Jagadeesh, Kunpeng Li, Luxin Zhang, Mannat Singh, Mary Williamson, Matt Le, Matthew Yu, Mitesh Kumar Singh, Peizhao Zhang, Peter Vajda, Quentin Duval, Rohit Girdhar, Roshan Sumbaly, Sai Saketh Rambhatla, Sam S. Tsai, Samaneh Azadi, Samyak Datta, Sanyuan Chen, Sean Bell, Sharadh Ramaswamy, Shelly Sheynin, Siddharth Bhattacharya, Simran Motwani, Tao Xu, Tianhe Li, Tingbo Hou, Wei-Ning Hsu, Xi Yin, Xiaoliang Dai, Yaniv Taigman, Yaqiao Luo, Yen-Cheng Liu, Yi-Chiao Wu, Yue Zhao, Yuval Kirstain, Zecheng He, Zijian He, Albert Pumarola, Ali K. Thabet, Artsiom Sanakoyeu, Arun Mallya, Baishan Guo, Boris Araya, Breena Kerr, Carleigh Wood, Ce Liu, Cen Peng, Dmitry Vengertsev, Edgar Schönfeld, Elliot Blanchard, Felix Juefei-Xu, Fraylie Nord, Jeff Liang, John Hoffman, Jonas Kohler, Kaolin Fire, Karthik Sivakumar, Lawrence Chen, Licheng Yu, Luya Gao, Markos Georgopoulos, Rashel Moritz, Sara K. Sampson, Shikai Li, Simone Parmeggiani, Steve Fine, Tara Fowler, Vladan Petrovic, Yuming Du: Movie Gen: A Cast of Media Foundation Models. CoRR abs/2410.13720 (2024)
- [i13] K. R. Prajwal, Bowen Shi, Matthew Le, Apoorv Vyas, Andros Tjandra, Mahi Luthra, Baishan Guo, Huiyu Wang, Triantafyllos Afouras, David Kant, Wei-Ning Hsu: MusicFlow: Cascaded Flow Matching for Text Guided Music Generation. CoRR abs/2410.20478 (2024)
- [i12] Mu Yang, Bowen Shi, Matthew Le, Wei-Ning Hsu, Andros Tjandra: Audiobox TTA-RAG: Improving Zero-Shot and Few-Shot Text-To-Audio with Retrieval-Augmented Generation. CoRR abs/2411.05141 (2024)

2023
- [c8] Wei-Ning Hsu, Tal Remez, Bowen Shi, Jacob Donley, Yossi Adi: ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Regeneration. CVPR 2023: 18796-18806
- [c7] Mohamed Anwar, Bowen Shi, Vedanuj Goswami, Wei-Ning Hsu, Juan Pino, Changhan Wang: MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation. INTERSPEECH 2023: 4064-4068
- [c6] Tu Anh Nguyen, Wei-Ning Hsu, Antony D'Avirro, Bowen Shi, Itai Gat, Maryam Fazel-Zarandi, Tal Remez, Jade Copet, Gabriel Synnaeve, Michael Hassid, Felix Kreuk, Yossi Adi, Emmanuel Dupoux: Expresso: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis. INTERSPEECH 2023: 4823-4827
- [c5] Matthew Le, Apoorv Vyas, Bowen Shi, Brian Karrer, Leda Sari, Rashel Moritz, Mary Williamson, Vimal Manohar, Yossi Adi, Jay Mahadeokar, Wei-Ning Hsu: Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale. NeurIPS 2023
- [i11] Mohamed Anwar, Bowen Shi, Vedanuj Goswami, Wei-Ning Hsu, Juan Pino, Changhan Wang: MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation. CoRR abs/2303.00628 (2023)
- [i10] Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli: Scaling Speech Technology to 1,000+ Languages. CoRR abs/2305.13516 (2023)
- [i9] Matthew Le, Apoorv Vyas, Bowen Shi, Brian Karrer, Leda Sari, Rashel Moritz, Mary Williamson, Vimal Manohar, Yossi Adi, Jay Mahadeokar, Wei-Ning Hsu: Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale. CoRR abs/2306.15687 (2023)
- [i8] Tu Anh Nguyen, Wei-Ning Hsu, Antony D'Avirro, Bowen Shi, Itai Gat, Maryam Fazel-Zarandi, Tal Remez, Jade Copet, Gabriel Synnaeve, Michael Hassid, Felix Kreuk, Yossi Adi, Emmanuel Dupoux: EXPRESSO: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis. CoRR abs/2308.05725 (2023)
- [i7] Alexander H. Liu, Matt Le, Apoorv Vyas, Bowen Shi, Andros Tjandra, Wei-Ning Hsu: Generative Pre-training for Speech with Flow Matching. CoRR abs/2310.16338 (2023)
- [i6] Apoorv Vyas, Bowen Shi, Matthew Le, Andros Tjandra, Yi-Chiao Wu, Baishan Guo, Jiemin Zhang, Xinyue Zhang, Robert Adkins, William Ngan, Jeff Wang, Ivan Cruz, Bapi Akula, Akinniyi Akinyemi, Brian Ellis, Rashel Moritz, Yael Yungster, Alice Rakotoarison, Liang Tan, Chris Summers, Carleigh Wood, Joshua Lane, Mary Williamson, Wei-Ning Hsu: Audiobox: Unified Audio Generation with Natural Language Prompts. CoRR abs/2312.15821 (2023)

2022
- [c4] Bowen Shi, Wei-Ning Hsu, Kushal Lakhotia, Abdelrahman Mohamed: Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction. ICLR 2022
- [c3] Bowen Shi, Wei-Ning Hsu, Abdelrahman Mohamed: Robust Self-Supervised Audio-Visual Speech Recognition. INTERSPEECH 2022: 2118-2122
- [c2] Bowen Shi, Abdelrahman Mohamed, Wei-Ning Hsu: Learning Lip-Based Audio-Visual Speaker Embeddings with AV-HuBERT. INTERSPEECH 2022: 4785-4789
- [c1] Wei-Ning Hsu, Bowen Shi: u-HuBERT: Unified Mixed-Modal Speech Pretraining And Zero-Shot Transfer to Unlabeled Modality. NeurIPS 2022
- [i5] Bowen Shi, Wei-Ning Hsu, Abdelrahman Mohamed: Robust Self-Supervised Audio-Visual Speech Recognition. CoRR abs/2201.01763 (2022)
- [i4] Bowen Shi, Wei-Ning Hsu, Kushal Lakhotia, Abdelrahman Mohamed: Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction. CoRR abs/2201.02184 (2022)
- [i3] Bowen Shi, Abdelrahman Mohamed, Wei-Ning Hsu: Learning Lip-Based Audio-Visual Speaker Embeddings with AV-HuBERT. CoRR abs/2205.07180 (2022)
- [i2] Wei-Ning Hsu, Bowen Shi: A Single Self-Supervised Model for Many Speech Modalities Enables Zero-Shot Modality Transfer. CoRR abs/2207.07036 (2022)
- [i1] Wei-Ning Hsu, Tal Remez, Bowen Shi, Jacob Donley, Yossi Adi: ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Enhancement. CoRR abs/2212.11377 (2022)
