


Bowen Shi 0002
Person information
- affiliation: Meta, Meta AI, Fundamental AI Research (FAIR), Audiobox Team, USA
- affiliation: Toyota Technological Institute at Chicago, IL, USA
Other persons with the same name
- Bowen Shi — disambiguation page
- Bowen Shi 0001 — Communication University of China, School of Journalism, Beijing, China (and 1 more)
- Bowen Shi 0003 — Shanghai Jiao Tong University, School of Electronic Information and Electrical Engineering, China
2020 – today

2025
- [i18] Andros Tjandra, Yi-Chiao Wu, Baishan Guo, John Hoffman, Brian Ellis, Apoorv Vyas, Bowen Shi, Sanyuan Chen, Matt Le, Nick Zacharov, Carleigh Wood, Ann Lee, Wei-Ning Hsu: Meta Audiobox Aesthetics: Unified Automatic Quality Assessment for Speech, Music, and Sound. CoRR abs/2502.05139 (2025)

2024
- [j1] Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli: Scaling Speech Technology to 1,000+ Languages. J. Mach. Learn. Res. 25: 97:1-97:52 (2024)
- [c14] HyoJung Han, Mohamed Anwar, Juan Pino, Wei-Ning Hsu, Marine Carpuat, Bowen Shi, Changhan Wang: XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception. ACL (1) 2024: 12896-12911
- [c13] Peng-Jen Chen, Bowen Shi, Kelvin Niu, Ann Lee, Wei-Ning Hsu: M2BART: Multilingual and Multimodal Encoder-Decoder Pre-Training for Any-to-Any Machine Translation. ICASSP 2024: 11896-11900
- [c12] Alexander H. Liu, Matthew Le, Apoorv Vyas, Bowen Shi, Andros Tjandra, Wei-Ning Hsu: Generative Pre-training for Speech with Flow Matching. ICLR 2024
- [c11] K. R. Prajwal, Bowen Shi, Matthew Le, Apoorv Vyas, Andros Tjandra, Mahi Luthra, Baishan Guo, Huiyu Wang, Triantafyllos Afouras, David Kant, Wei-Ning Hsu: MusicFlow: Cascaded Flow Matching for Text Guided Music Generation. ICML 2024
- [c10] Chung-Ming Chien, Andros Tjandra, Apoorv Vyas, Matt Le, Bowen Shi, Wei-Ning Hsu: Learning Fine-Grained Controllability on Speech Generation via Efficient Fine-Tuning. INTERSPEECH 2024
- [c9] Lemeng Wu, Zhaoheng Ni, Bowen Shi, Gaël Le Lan, Anurag Kumar, Varun Nagaraja, Xinhao Mei, Yunyang Xiong, Bilge Soran, Raghuraman Krishnamoorthi, Wei-Ning Hsu, Yangyang Shi, Vikas Chandra: Data Efficient Reflow for Few Step Audio Generation. SLT 2024: 455-461
- [i17] HyoJung Han, Mohamed Anwar, Juan Pino, Wei-Ning Hsu, Marine Carpuat, Bowen Shi, Changhan Wang: XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception. CoRR abs/2403.14402 (2024)
- [i16] Chung-Ming Chien, Andros Tjandra, Apoorv Vyas, Matt Le, Bowen Shi, Wei-Ning Hsu: Learning Fine-Grained Controllability on Speech Generation via Efficient Fine-Tuning. CoRR abs/2406.06251 (2024)
- [i15] Gaël Le Lan, Bowen Shi, Zhaoheng Ni, Sidd Srinivasan, Anurag Kumar, Brian Ellis, David Kant, Varun Nagaraja, Ernie Chang, Wei-Ning Hsu, Yangyang Shi, Vikas Chandra: High Fidelity Text-Guided Music Generation and Editing via Single-Stage Flow Matching. CoRR abs/2407.03648 (2024)
- [i14] Adam Polyak, Amit Zohar, Andrew Brown, Andros Tjandra, Animesh Sinha, Ann Lee, Apoorv Vyas, Bowen Shi, Chih-Yao Ma, Ching-Yao Chuang, David Yan, Dhruv Choudhary, Dingkang Wang, Geet Sethi, Guan Pang, Haoyu Ma, Ishan Misra, Ji Hou, Jialiang Wang, Kiran Jagadeesh, Kunpeng Li, Luxin Zhang, Mannat Singh, Mary Williamson, Matt Le, Matthew Yu, Mitesh Kumar Singh, Peizhao Zhang, Peter Vajda, Quentin Duval, Rohit Girdhar, Roshan Sumbaly, Sai Saketh Rambhatla, Sam S. Tsai, Samaneh Azadi, Samyak Datta, Sanyuan Chen, Sean Bell, Sharadh Ramaswamy, Shelly Sheynin, Siddharth Bhattacharya, Simran Motwani, Tao Xu, Tianhe Li, Tingbo Hou, Wei-Ning Hsu, Xi Yin, Xiaoliang Dai, Yaniv Taigman, Yaqiao Luo, Yen-Cheng Liu, Yi-Chiao Wu, Yue Zhao, Yuval Kirstain, Zecheng He, Zijian He, Albert Pumarola, Ali K. Thabet, Artsiom Sanakoyeu, Arun Mallya, Baishan Guo, Boris Araya, Breena Kerr, Carleigh Wood, Ce Liu, Cen Peng, Dmitry Vengertsev, Edgar Schönfeld, Elliot Blanchard, Felix Juefei-Xu, Fraylie Nord, Jeff Liang, John Hoffman, Jonas Kohler, Kaolin Fire, Karthik Sivakumar, Lawrence Chen, Licheng Yu, Luya Gao, Markos Georgopoulos, Rashel Moritz, Sara K. Sampson, Shikai Li, Simone Parmeggiani, Steve Fine, Tara Fowler, Vladan Petrovic, Yuming Du: Movie Gen: A Cast of Media Foundation Models. CoRR abs/2410.13720 (2024)
- [i13] K. R. Prajwal, Bowen Shi, Matthew Le, Apoorv Vyas, Andros Tjandra, Mahi Luthra, Baishan Guo, Huiyu Wang, Triantafyllos Afouras, David Kant, Wei-Ning Hsu: MusicFlow: Cascaded Flow Matching for Text Guided Music Generation. CoRR abs/2410.20478 (2024)
- [i12] Mu Yang, Bowen Shi, Matthew Le, Wei-Ning Hsu, Andros Tjandra: Audiobox TTA-RAG: Improving Zero-Shot and Few-Shot Text-To-Audio with Retrieval-Augmented Generation. CoRR abs/2411.05141 (2024)

2023
- [c8] Wei-Ning Hsu, Tal Remez, Bowen Shi, Jacob Donley, Yossi Adi: ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Regeneration. CVPR 2023: 18796-18806
- [c7] Mohamed Anwar, Bowen Shi, Vedanuj Goswami, Wei-Ning Hsu, Juan Pino, Changhan Wang: MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation. INTERSPEECH 2023: 4064-4068
- [c6] Tu Anh Nguyen, Wei-Ning Hsu, Antony D'Avirro, Bowen Shi, Itai Gat, Maryam Fazel-Zarandi, Tal Remez, Jade Copet, Gabriel Synnaeve, Michael Hassid, Felix Kreuk, Yossi Adi, Emmanuel Dupoux: Expresso: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis. INTERSPEECH 2023: 4823-4827
- [c5] Matthew Le, Apoorv Vyas, Bowen Shi, Brian Karrer, Leda Sari, Rashel Moritz, Mary Williamson, Vimal Manohar, Yossi Adi, Jay Mahadeokar, Wei-Ning Hsu: Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale. NeurIPS 2023
- [i11] Mohamed Anwar, Bowen Shi, Vedanuj Goswami, Wei-Ning Hsu, Juan Pino, Changhan Wang: MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation. CoRR abs/2303.00628 (2023)
- [i10] Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli: Scaling Speech Technology to 1,000+ Languages. CoRR abs/2305.13516 (2023)
- [i9] Matthew Le, Apoorv Vyas, Bowen Shi, Brian Karrer, Leda Sari, Rashel Moritz, Mary Williamson, Vimal Manohar, Yossi Adi, Jay Mahadeokar, Wei-Ning Hsu: Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale. CoRR abs/2306.15687 (2023)
- [i8] Tu Anh Nguyen, Wei-Ning Hsu, Antony D'Avirro, Bowen Shi, Itai Gat, Maryam Fazel-Zarandi, Tal Remez, Jade Copet, Gabriel Synnaeve, Michael Hassid, Felix Kreuk, Yossi Adi, Emmanuel Dupoux: EXPRESSO: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis. CoRR abs/2308.05725 (2023)
- [i7] Alexander H. Liu, Matt Le, Apoorv Vyas, Bowen Shi, Andros Tjandra, Wei-Ning Hsu: Generative Pre-training for Speech with Flow Matching. CoRR abs/2310.16338 (2023)
- [i6] Apoorv Vyas, Bowen Shi, Matthew Le, Andros Tjandra, Yi-Chiao Wu, Baishan Guo, Jiemin Zhang, Xinyue Zhang, Robert Adkins, William Ngan, Jeff Wang, Ivan Cruz, Bapi Akula, Akinniyi Akinyemi, Brian Ellis, Rashel Moritz, Yael Yungster, Alice Rakotoarison, Liang Tan, Chris Summers, Carleigh Wood, Joshua Lane, Mary Williamson, Wei-Ning Hsu: Audiobox: Unified Audio Generation with Natural Language Prompts. CoRR abs/2312.15821 (2023)

2022
- [c4] Bowen Shi, Wei-Ning Hsu, Kushal Lakhotia, Abdelrahman Mohamed: Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction. ICLR 2022
- [c3] Bowen Shi, Wei-Ning Hsu, Abdelrahman Mohamed: Robust Self-Supervised Audio-Visual Speech Recognition. INTERSPEECH 2022: 2118-2122
- [c2] Bowen Shi, Abdelrahman Mohamed, Wei-Ning Hsu: Learning Lip-Based Audio-Visual Speaker Embeddings with AV-HuBERT. INTERSPEECH 2022: 4785-4789
- [c1] Wei-Ning Hsu, Bowen Shi: u-HuBERT: Unified Mixed-Modal Speech Pretraining And Zero-Shot Transfer to Unlabeled Modality. NeurIPS 2022
- [i5] Bowen Shi, Wei-Ning Hsu, Abdelrahman Mohamed: Robust Self-Supervised Audio-Visual Speech Recognition. CoRR abs/2201.01763 (2022)
- [i4] Bowen Shi, Wei-Ning Hsu, Kushal Lakhotia, Abdelrahman Mohamed: Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction. CoRR abs/2201.02184 (2022)
- [i3] Bowen Shi, Abdelrahman Mohamed, Wei-Ning Hsu: Learning Lip-Based Audio-Visual Speaker Embeddings with AV-HuBERT. CoRR abs/2205.07180 (2022)
- [i2] Wei-Ning Hsu, Bowen Shi: A Single Self-Supervised Model for Many Speech Modalities Enables Zero-Shot Modality Transfer. CoRR abs/2207.07036 (2022)
- [i1] Wei-Ning Hsu, Tal Remez, Bowen Shi, Jacob Donley, Yossi Adi: ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Enhancement. CoRR abs/2212.11377 (2022)
