Qin Jin
2020 – today
- 2024
- [b1] Qin Jin: Robust Speaker Recognition. Karlsruhe University, Germany, 2024
- [j18] Yawen Zeng, Ning Han, Keyu Pan, Qin Jin: Temporally Language Grounding With Multi-Modal Multi-Prompt Tuning. IEEE Trans. Multim. 26: 3366-3377 (2024)
- [c173] Liang Zhang, Qin Jin, Haoyang Huang, Dongdong Zhang, Furu Wei: Respond in my Language: Mitigating Language Inconsistency in Response Generation based on Large Language Models. ACL (1) 2024: 4177-4192
- [c172] Dingyi Yang, Chunru Zhan, Ziheng Wang, Biao Wang, Tiezheng Ge, Bo Zheng, Qin Jin: Synchronized Video Storytelling: Generating Video Narrations with Structured Storyline. ACL (1) 2024: 9479-9493
- [c171] Zihao Yue, Liang Zhang, Qin Jin: Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective. ACL (1) 2024: 11766-11781
- [c170] Tenggan Zhang, Xinjie Zhang, Jinming Zhao, Li Zhou, Qin Jin: ESCoT: Towards Interpretable Emotional Support Dialogue Systems. ACL (1) 2024: 13395-13412
- [c169] Liang Zhang, Anwen Hu, Haiyang Xu, Ming Yan, Yichen Xu, Qin Jin, Ji Zhang, Fei Huang: TinyChart: Efficient Chart Understanding with Program-of-Thoughts Learning and Visual Token Merging. EMNLP 2024: 1882-1898
- [c168] Anwen Hu, Haiyang Xu, Jiabo Ye, Ming Yan, Liang Zhang, Bo Zhang, Ji Zhang, Qin Jin, Fei Huang, Jingren Zhou: mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding. EMNLP (Findings) 2024: 3096-3120
- [c167] Lei Sun, Jinming Zhao, Qin Jin: Revealing Personality Traits: A New Benchmark Dataset for Explainable Personality Recognition on Dialogues. EMNLP 2024: 19988-20002
- [c166] Fengyuan Zhang, Zhaopei Huang, Xinjie Zhang, Qin Jin: Adaptive Temporal Motion Guided Graph Convolution Network for Micro-expression Recognition. ICME 2024: 1-6
- [c165] Zhaopei Huang, Jinming Zhao, Qin Jin: ECR-Chain: Advancing Generative Language Models to Better Emotion-Cause Reasoners through Reasoning Chains. IJCAI 2024: 6288-6296
- [c164] Yuting Mei, Linli Yao, Qin Jin: UBiSS: A Unified Framework for Bimodal Semantic Summarization of Videos. ICMR 2024: 1034-1042
- [c163] Linli Yao, Yuanmeng Zhang, Ziheng Wang, Xinglin Hou, Tiezheng Ge, Yuning Jiang, Xu Sun, Qin Jin: Edit As You Wish: Video Caption Editing with Multi-grained User Control. ACM Multimedia 2024: 1924-1933
- [c162] Yang Du, Yuqi Liu, Qin Jin: Reversed in Time: A Novel Temporal-Emphasized Benchmark for Cross-Modal Video-Text Retrieval. ACM Multimedia 2024: 5260-5269
- [c161] Yuning Wu, Jiatong Shi, Yifeng Yu, Yuxun Tang, Tao Qian, Yueqian Lin, Jionghao Han, Xinyi Bai, Shinji Watanabe, Qin Jin: Muskits-ESPnet: A Comprehensive Toolkit for Singing Voice Synthesis in New Paradigm. ACM Multimedia 2024: 11279-11281
- [i85] Jiatong Shi, Yueqian Lin, Xinyi Bai, Keyi Zhang, Yuning Wu, Yuxun Tang, Yifeng Yu, Qin Jin, Shinji Watanabe: Singing Voice Data Scaling-up: An Introduction to ACE-Opencpop and KiSing-v2. CoRR abs/2401.17619 (2024)
- [i84] Zihao Yue, Liang Zhang, Qin Jin: Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective. CoRR abs/2402.14545 (2024)
- [i83] Boshen Xu, Sipeng Zheng, Qin Jin: POV: Prompt-Oriented View-Agnostic Learning for Egocentric Hand-Object Interaction in the Multi-View World. CoRR abs/2403.05856 (2024)
- [i82] Boshen Xu, Sipeng Zheng, Qin Jin: SPAFormer: Sequential 3D Part Assembly with Transformers. CoRR abs/2403.05874 (2024)
- [i81] Anwen Hu, Haiyang Xu, Jiabo Ye, Ming Yan, Liang Zhang, Bo Zhang, Chen Li, Ji Zhang, Qin Jin, Fei Huang, Jingren Zhou: mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding. CoRR abs/2403.12895 (2024)
- [i80] Zihao Yue, Yepeng Zhang, Ziheng Wang, Qin Jin: Movie101v2: Improved Movie Narration Benchmark. CoRR abs/2404.13370 (2024)
- [i79] Qingrong He, Kejun Lin, Shizhe Chen, Anwen Hu, Qin Jin: Think-Program-reCtify: 3D Situated Reasoning with Large Language Models. CoRR abs/2404.14705 (2024)
- [i78] Liang Zhang, Anwen Hu, Haiyang Xu, Ming Yan, Yichen Xu, Qin Jin, Ji Zhang, Fei Huang: TinyChart: Efficient Chart Understanding with Visual Token Merging and Program-of-Thoughts Learning. CoRR abs/2404.16635 (2024)
- [i77] Zhaopei Huang, Jinming Zhao, Qin Jin: ECR-Chain: Advancing Generative Language Models to Better Emotion-Cause Reasoners through Reasoning Chains. CoRR abs/2405.10860 (2024)
- [i76] Dingyi Yang, Chunru Zhan, Ziheng Wang, Biao Wang, Tiezheng Ge, Bo Zheng, Qin Jin: Synchronized Video Storytelling: Generating Video Narrations with Structured Storyline. CoRR abs/2405.14040 (2024)
- [i75] Boshen Xu, Ziheng Wang, Yang Du, Zhinan Song, Sipeng Zheng, Qin Jin: EgoNCE++: Do Egocentric Video-Language Models Really Understand Hand-Object Interactions? CoRR abs/2405.17719 (2024)
- [i74] Xuankai Chang, Jiatong Shi, Jinchuan Tian, Yuning Wu, Yuxun Tang, Yihan Wu, Shinji Watanabe, Yossi Adi, Xie Chen, Qin Jin: The Interspeech 2024 Challenge on Speech Processing Using Discrete Units. CoRR abs/2406.07725 (2024)
- [i73] Yuning Wu, Chunlei Zhang, Jiatong Shi, Yuxun Tang, Shan Yang, Qin Jin: TokSing: Singing Voice Synthesis based on Discrete Tokens. CoRR abs/2406.08416 (2024)
- [i72] Yuxun Tang, Yuning Wu, Jiatong Shi, Qin Jin: SingOMD: Singing Oriented Multi-resolution Discrete Representation Construction from Speech Models. CoRR abs/2406.08905 (2024)
- [i71] Fengyuan Zhang, Zhaopei Huang, Xinjie Zhang, Qin Jin: Adaptive Temporal Motion Guided Graph Convolution Network for Micro-expression Recognition. CoRR abs/2406.08997 (2024)
- [i70] Yuxun Tang, Jiatong Shi, Yuning Wu, Qin Jin: SingMOS: An extensive Open-Source Singing Voice Dataset for MOS Prediction. CoRR abs/2406.10911 (2024)
- [i69] Tenggan Zhang, Xinjie Zhang, Jinming Zhao, Li Zhou, Qin Jin: ESCoT: Towards Interpretable Emotional Support Dialogue Systems. CoRR abs/2406.10960 (2024)
- [i68] Yuting Mei, Linli Yao, Qin Jin: UBiSS: A Unified Framework for Bimodal Semantic Summarization of Videos. CoRR abs/2406.16301 (2024)
- [i67] Ye Wang, Yuting Mei, Sipeng Zheng, Qin Jin: QuadrupedGPT: Towards a Versatile Quadruped Agent in Open-ended Worlds. CoRR abs/2406.16578 (2024)
- [i66] Dingyi Yang, Qin Jin: What Makes a Good Story and How Can We Measure It? A Comprehensive Survey of Story Evaluation. CoRR abs/2408.14622 (2024)
- [i65] Anwen Hu, Haiyang Xu, Liang Zhang, Jiabo Ye, Ming Yan, Ji Zhang, Qin Jin, Fei Huang, Jingren Zhou: mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding. CoRR abs/2409.03420 (2024)
- [i64] Liangyu Chen, Zihao Yue, Boshen Xu, Qin Jin: Unveiling Visual Biases in Audio-Visual Localization Benchmarks. CoRR abs/2409.06709 (2024)
- [i63] Yuning Wu, Jiatong Shi, Yifeng Yu, Yuxun Tang, Tao Qian, Yueqian Lin, Jionghao Han, Xinyi Bai, Shinji Watanabe, Qin Jin: Muskits-ESPnet: A Comprehensive Toolkit for Singing Voice Synthesis in New Paradigm. CoRR abs/2409.07226 (2024)
- [i62] Jiatong Shi, Jinchuan Tian, Yihan Wu, Jee-weon Jung, Jia Qi Yip, Yoshiki Masuyama, William Chen, Yuning Wu, Yuxun Tang, Massa Baali, Dareen Alharthi, Dong Zhang, Ruifan Deng, Tejes Srivastava, Haibin Wu, Alexander H. Liu, Bhiksha Raj, Qin Jin, Ruihua Song, Shinji Watanabe: ESPnet-Codec: Comprehensive Training and Evaluation of Neural Codecs for Audio, Music, and Speech. CoRR abs/2409.15897 (2024)
- [i61] Lei Sun, Jinming Zhao, Qin Jin: Revealing Personality Traits: A New Benchmark Dataset for Explainable Personality Recognition on Dialogues. CoRR abs/2409.19723 (2024)
- [i60] Ye Wang, Sipeng Zheng, Bin Cao, Qianshan Wei, Qin Jin, Zongqing Lu: Quo Vadis, Motion Generation? From Large Language Models to Large Motion Models. CoRR abs/2410.03311 (2024)
- 2023
- [j17] Liang Zhang, Ludan Ruan, Anwen Hu, Qin Jin: Multimodal Pretraining from Monolingual to Multilingual. Mach. Intell. Res. 20(2): 220-232 (2023)
- [j16] Yun Zhang, Qi Lu, Qin Jin, Wanting Meng, Shuhu Yang, Shen Huang, Yanling Han, Zhonghua Hong, Zhansheng Chen, Weiliang Liu: Global Sea Surface Height Measurement From CYGNSS Based on Machine Learning. IEEE J. Sel. Top. Appl. Earth Obs. Remote. Sens. 16: 841-852 (2023)
- [c160] Yuqi Liu, Luhui Xu, Pengfei Xiong, Qin Jin: Token Mixing: Parameter-Efficient Transfer Learning from Image-Language to Video-Language. AAAI 2023: 1781-1789
- [c159] Yawen Zeng, Qin Jin, Tengfei Bao, Wenfeng Li: Multi-Modal Knowledge Hypergraph for Diverse Image Retrieval. AAAI 2023: 3376-3383
- [c158] Ludan Ruan, Anwen Hu, Yuqing Song, Liang Zhang, Sipeng Zheng, Qin Jin: Accommodating Audio Modality in CLIP for Multimodal Processing. AAAI 2023: 9641-9649
- [c157] Liang Zhang, Anwen Hu, Jing Zhang, Shuo Hu, Qin Jin: MPMQA: Multimodal Question Answering on Product Manuals. AAAI 2023: 13958-13966
- [c156] Tao Qian, Fan Lou, Jiatong Shi, Yuning Wu, Shuai Guo, Xiang Yin, Qin Jin: UniLG: A Unified Structure-aware Framework for Lyrics Generation. ACL (1) 2023: 983-1001
- [c155] Anwen Hu, Shizhe Chen, Liang Zhang, Qin Jin: InfoMetIC: An Informative Metric for Reference-free Image Caption Evaluation. ACL (1) 2023: 3171-3185
- [c154] Zihao Yue, Qi Zhang, Anwen Hu, Liang Zhang, Ziheng Wang, Qin Jin: Movie101: A New Movie Understanding Benchmark. ACL (1) 2023: 4669-4684
- [c153] Dingyi Yang, Qin Jin: Attractive Storyteller: Stylized Visual Storytelling with Unpaired Text. ACL (1) 2023: 11053-11066
- [c152] Ludan Ruan, Yiyang Ma, Huan Yang, Huiguo He, Bei Liu, Jianlong Fu, Nicholas Jing Yuan, Qin Jin, Baining Guo: MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation. CVPR 2023: 10219-10228
- [c151] Sipeng Zheng, Boshen Xu, Qin Jin: Open-Category Human-Object Interaction Pre-training via Language Modeling Framework. CVPR 2023: 19392-19402
- [c150] Jiabo Ye, Anwen Hu, Haiyang Xu, Qinghao Ye, Ming Yan, Guohai Xu, Chenliang Li, Junfeng Tian, Qi Qian, Ji Zhang, Qin Jin, Liang He, Xin Lin, Fei Huang: UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model. EMNLP (Findings) 2023: 2841-2858
- [c149] Yuning Wu, Jiatong Shi, Tao Qian, Dongji Gao, Qin Jin: Phoneix: Acoustic Feature Processing Strategy for Enhanced Singing Pronunciation With Phoneme Distribution Predictor. ICASSP 2023: 1-5
- [c148] Anwen Hu, Shizhe Chen, Liang Zhang, Qin Jin: Explore and Tell: Embodied Visual Captioning in 3D Environments. ICCV 2023: 2482-2491
- [c147] Jieting Chen, Junkai Ding, Wenping Chen, Qin Jin: Knowledge Enhanced Model for Live Video Comment Generation. ICME 2023: 2267-2272
- [c146] Hongpeng Lin, Ludan Ruan, Wenke Xia, Peiyu Liu, Jingyuan Wen, Yixin Xu, Di Hu, Ruihua Song, Wayne Xin Zhao, Qin Jin, Zhiwu Lu: TikTalk: A Video-Based Dialogue Dataset for Multi-Modal Chitchat in Real World. ACM Multimedia 2023: 1303-1313
- [c145] Boshen Xu, Sipeng Zheng, Qin Jin: POV: Prompt-Oriented View-Agnostic Learning for Egocentric Hand-Object Interaction in the Multi-view World. ACM Multimedia 2023: 2807-2816
- [c144] Dingyi Yang, Hongyu Chen, Xinglin Hou, Tiezheng Ge, Yuning Jiang, Qin Jin: Visual Captioning at Will: Describing Images and Videos Guided by a Few Stylized Sentences. ACM Multimedia 2023: 5705-5715
- [c143] Yuchen Liu, Haoyu Zhang, Shichao Liu, Xiang Yin, Zejun Ma, Qin Jin: Emotionally Situated Text-to-Speech Synthesis in User-Agent Conversation. ACM Multimedia 2023: 5966-5974
- [c142] Zihao Yue, Anwen Hu, Liang Zhang, Qin Jin: Learning Descriptive Image Captioning via Semipermeable Maximum Likelihood Estimation. NeurIPS 2023
- [c141] Zhaopei Huang, Jinming Zhao, Qin Jin: Two-Stage Adaptation for Cross-Corpus Multimodal Emotion Recognition. NLPCC (2) 2023: 431-443
- [c140] Weijing Chen, Linli Yao, Qin Jin: Rethinking Benchmarks for Cross-modal Image-text Retrieval. SIGIR 2023: 1241-1251
- [c139] Linli Yao, Weijing Chen, Qin Jin: CapEnrich: Enriching Caption Semantics for Web Images via Cross-modal Pre-trained Knowledge. WWW 2023: 2392-2401
- [i59] Hongpeng Lin, Ludan Ruan, Wenke Xia, Peiyu Liu, Jingyuan Wen, Yixin Xu, Di Hu, Ruihua Song, Wayne Xin Zhao, Qin Jin, Zhiwu Lu: TikTalk: A Multi-Modal Dialogue Dataset for Real-World Chitchat. CoRR abs/2301.05880 (2023)
- [i58] Ludan Ruan, Anwen Hu, Yuqing Song, Liang Zhang, Sipeng Zheng, Qin Jin: Accommodating Audio Modality in CLIP for Multimodal Processing. CoRR abs/2303.06591 (2023)
- [i57] Yuning Wu, Jiatong Shi, Tao Qian, Dongji Gao, Qin Jin: PHONEix: Acoustic Feature Processing Strategy for Enhanced Singing Pronunciation with Phoneme Distribution Predictor. CoRR abs/2303.08607 (2023)
- [i56] Liang Zhang, Anwen Hu, Jing Zhang, Shuo Hu, Qin Jin: MPMQA: Multimodal Question Answering on Product Manuals. CoRR abs/2304.09660 (2023)
- [i55] Weijing Chen, Linli Yao, Qin Jin: Rethinking Benchmarks for Cross-modal Image-text Retrieval. CoRR abs/2304.10824 (2023)
- [i54] Jieting Chen, Junkai Ding, Wenping Chen, Qin Jin: Knowledge Enhanced Model for Live Video Comment Generation. CoRR abs/2304.14657 (2023)
- [i53] Anwen Hu, Shizhe Chen, Liang Zhang, Qin Jin: InfoMetIC: An Informative Metric for Reference-free Image Caption Evaluation. CoRR abs/2305.06002 (2023)
- [i52] Linli Yao, Yuanmeng Zhang, Ziheng Wang, Xinglin Hou, Tiezheng Ge, Yuning Jiang, Qin Jin: Edit As You Wish: Video Description Editing with Multi-grained Commands. CoRR abs/2305.08389 (2023)
- [i51] Zihao Yue, Qi Zhang, Anwen Hu, Liang Zhang, Ziheng Wang, Qin Jin: Movie101: A New Movie Understanding Benchmark. CoRR abs/2305.12140 (2023)
- [i50] Zihao Yue, Anwen Hu, Liang Zhang, Qin Jin: Learning Descriptive Image Captioning via Semipermeable Maximum Likelihood Estimation. CoRR abs/2306.13460 (2023)
- [i49] Qi Zhang, Sipeng Zheng, Qin Jin: No-frills Temporal Video Grounding: Multi-Scale Neighboring Attention and Zoom-in Boundary Detection. CoRR abs/2307.10567 (2023)
- [i48] Dingyi Yang, Hongyu Chen, Xinglin Hou, Tiezheng Ge, Yuning Jiang, Qin Jin: Visual Captioning at Will: Describing Images and Videos Guided by a Few Stylized Sentences. CoRR abs/2307.16399 (2023)
- [i47] Yuning Wu, Yifeng Yu, Jiatong Shi, Tao Qian, Qin Jin: A Systematic Exploration of Joint-training for Singing Voice Synthesis. CoRR abs/2308.02867 (2023)
- [i46] Anwen Hu, Shizhe Chen, Liang Zhang, Qin Jin: Explore and Tell: Embodied Visual Captioning in 3D Environments. CoRR abs/2308.10447 (2023)
- [i45] Jiabo Ye, Anwen Hu, Haiyang Xu, Qinghao Ye, Ming Yan, Guohai Xu, Chenliang Li, Junfeng Tian, Qi Qian, Ji Zhang, Qin Jin, Liang He, Xin Alex Lin, Fei Huang: UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model. CoRR abs/2310.05126 (2023)
- 2022
- [j15] Ludan Ruan, Qin Jin: Survey: Transformer based video-language pre-training. AI Open 3: 1-13 (2022)
- [j14] Yuqing Song, Shizhe Chen, Qin Jin, Wei Luo, Jun Xie, Fei Huang: Enhancing Neural Machine Translation With Dual-Side Multimodal Awareness. IEEE Trans. Multim. 24: 3013-3024 (2022)
- [c138] Linli Yao, Weiying Wang, Qin Jin: Image Difference Captioning with Pre-training and Contrastive Learning. AAAI 2022: 3108-3116
- [c137] Jinming Zhao, Tenggan Zhang, Jingwen Hu, Yuchen Liu, Qin Jin, Xinchao Wang, Haizhou Li: M3ED: Multi-modal Multi-scene Multi-label Emotional Dialogue Database. ACL (1) 2022: 5699-5710
- [c136] Yuchen Liu, Jinming Zhao, Jingwen Hu, Ruichen Li, Qin Jin: DialogueEIN: Emotion Interaction Network for Dialogue Affective Analysis. COLING 2022: 684-693
- [c135] Liyu Meng, Yuchen Liu, Xiaolong Liu, Zhaopei Huang, Wenqiang Jiang, Tenggan Zhang, Chuanhe Liu, Qin Jin: Valence and Arousal Estimation based on Multimodal Temporal-Aware Features for Videos in the Wild. CVPR Workshops 2022: 2344-2351
- [c134] Sipeng Zheng, Shizhe Chen, Qin Jin: VRDFormer: End-to-End Video Visual Relation Detection with Transformers. CVPR 2022: 18814-18824
- [c133] Tenggan Zhang, Chuanhe Liu, Xiaolong Liu, Yuchen Liu, Liyu Meng, Lei Sun, Wenqiang Jiang, Fengyuan Zhang, Jinming Zhao, Qin Jin: Multi-Task Learning Framework for Emotion Recognition In-the-Wild. ECCV Workshops (6) 2022: 143-156
- [c132] Sipeng Zheng, Shizhe Chen, Qin Jin: Few-Shot Action Recognition with Hierarchical Matching and Contrastive Learning. ECCV (4) 2022: 297-313
- [c131] Yuqi Liu, Pengfei Xiong, Luhui Xu, Shengming Cao, Qin Jin: TS2-Net: Token Shift and Selection Transformer for Text-Video Retrieval. ECCV (14) 2022: 319-335
- [c130] Qi Zhang, Yuqing Song, Qin Jin: Unifying Event Detection and Captioning as Sequence Generation via Pre-training. ECCV (36) 2022: 363-379
- [c129] Qi Zhang, Zihao Yue, Anwen Hu, Ziheng Wang, Qin Jin: MovieUN: A Dataset for Movie Understanding and Narrating. EMNLP (Findings) 2022: 1873-1885
- [c128] Yuwen Chen, Jian Ma, Peihu Zhu, Xiaoming Huang, Qin Jin: Leveraging Trust Relations to Improve Academic Patent Recommendation. HICSS 2022: 1-10
- [c127] Jinming Zhao, Ruichen Li, Qin Jin, Xinchao Wang, Haizhou Li: Memobert: Pre-Training Model with Prompt-Based Learning for Multimodal Emotion Recognition. ICASSP 2022: 4703-4707
- [c126] Tao Qian, Jiatong Shi, Shuai Guo, Peter Wu, Qin Jin: Training Strategies for Automatic Song Writing: A Unified Framework Perspective. ICASSP 2022: 4738-4742
- [c125] Shuai Guo, Jiatong Shi, Tao Qian, Shinji Watanabe, Qin Jin: SingAug: Data Augmentation for Singing Voice Synthesis with Cycle-consistent Training Strategy. INTERSPEECH 2022: 4272-4276
- [c124] Jiatong Shi, Shuai Guo, Tao Qian, Tomoki Hayashi, Yuning Wu, Fangzheng Xu, Xuankai Chang, Huazhe Li, Peter Wu, Shinji Watanabe, Qin Jin: Muskits: an End-to-end Music Processing Toolkit for Singing Voice Synthesis. INTERSPEECH 2022: 4277-4281
- [c123] Xavier Alameda-Pineda, Qin Jin, Vincent Oria, Laura Toni: M4MM '22: 1st International Workshop on Methodologies for Multimedia. ACM Multimedia 2022: 7394-7396
- [c122] Si Liu, Qin Jin, Luoqi Liu, Zongheng Tang, Linli Lin: PIC'22: 4th Person in Context Workshop. ACM Multimedia 2022: 7418-7419
- [c121] Liang Zhang, Anwen Hu, Qin Jin: Multi-Lingual Acquisition on Multimodal Pre-training for Cross-modal Retrieval. NeurIPS 2022
- [c120] Yida Zhao, Yuqing Song, Qin Jin: Progressive Learning for Image Retrieval with Hybrid-Modality Queries. SIGIR 2022: 1012-1021
- [e2] João Magalhães, Alberto Del Bimbo, Shin'ichi Satoh, Nicu Sebe, Xavier Alameda-Pineda, Qin Jin, Vincent Oria, Laura Toni: MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10 - 14, 2022. ACM 2022, ISBN 978-1-4503-9203-7 [contents]
- [i44] Linli Yao, Weiying Wang, Qin Jin: Image Difference Captioning with Pre-training and Contrastive Learning. CoRR abs/2202.04298 (2022)
- [i43] Liyu Meng, Yuchen Liu, Xiaolong Liu, Zhaopei Huang, Yuan Cheng, Meng Wang, Chuanhe Liu, Qin Jin: Multi-modal Emotion Estimation for in-the-wild Videos. CoRR abs/2203.13032 (2022)
- [i42] Shuai Guo, Jiatong Shi, Tao Qian, Shinji Watanabe, Qin Jin: SingAug: Data Augmentation for Singing Voice Synthesis with Cycle-consistent Training Strategy. CoRR abs/2203.17001 (2022)
- [i41] Yida Zhao, Yuqing Song, Qin Jin: Progressive Learning for Image Retrieval with Hybrid-Modality Queries. CoRR abs/2204.11212 (2022)
- [i40] Jiatong Shi, Shuai Guo, Tao Qian, Nan Huo, Tomoki Hayashi, Yuning Wu, Frank Xu, Xuankai Chang, Huazhe Li, Peter Wu, Shinji Watanabe, Qin Jin: Muskits: an End-to-End Music Processing Toolkit for Singing Voice Synthesis. CoRR abs/2205.04029 (2022)
- [i39] Jinming Zhao, Tenggan Zhang, Jingwen Hu, Yuchen Liu, Qin Jin, Xinchao Wang, Haizhou Li: M3ED: Multi-modal Multi-scene Multi-label Emotional Dialogue Database. CoRR abs/2205.10237 (2022)
- [i38] Liang Zhang, Anwen Hu, Qin Jin: Generalizing Multimodal Pre-training into Multilingual via Language Acquisition. CoRR abs/2206.11091 (2022)
- [i37] Yuqi Liu, Pengfei Xiong, Luhui Xu, Shengming Cao, Qin Jin: TS2-Net: Token Shift and Selection Transformer for Text-Video Retrieval. CoRR abs/2207.07852 (2022)
- [i36] Qi Zhang, Yuqing Song, Qin Jin: Unifying Event Detection and Captioning as Sequence Generation via Pre-Training. CoRR abs/2207.08625 (2022)
- [i35] Sipeng Zheng, Qi Zhang, Bei Liu, Qin Jin, Jianlong Fu: Exploring Anchor-based Detection for Ego4D Natural Language Query. CoRR abs/2208.05375 (2022)
- [i34]