


Yu Qiao 0001
Person information
- affiliation: Shanghai AI Laboratory, OpenGVLab, China
- affiliation: Chinese Academy of Sciences, Shenzhen Institutes of Advanced Technology, China
- affiliation (former): University of Tokyo, Graduate School of Information Science and Technology, Japan
- affiliation (PhD 2006): University of Electro-Communications, Tokyo, Japan
Other persons with the same name
- Yu Qiao — disambiguation page
- Yu Qiao 0002 — Biomedical Imaging Lab, Singapore
- Yu Qiao 0003 — Shanghai Jiao Tong University, Department of Automation, Institute of Image Processing and Pattern Recognition, China (and 1 more)
- Yu Qiao 0004 — Kyung Hee University, School of Computing, Department of Artificial Intelligence, Yongin, South Korea (and 1 more)
- Yu Qiao 0005 — RWTH Aachen University, Germany
- Yu Qiao 0006 — Nanjing University, National Key Laboratory for Novel Software Technology, Department of Computer Science and Technology, China
2020 – today
- 2025
[j136]Xuhong Wang, Haoyu Jiang, Yi Yu, Jingru Yu, Yilun Lin, Ping Yi, Yingchun Wang, Yu Qiao, Li Li, Fei-Yue Wang:
Building intelligence identification system via large language model watermarking: a survey and beyond. Artif. Intell. Rev. 58(8): 249 (2025)
[j135]Shixiang Wu, Chao Dong, Yu Qiao:
Exploring Contextual Priors for Real-World Image Super-Resolution. Comput. Vis. Media 11(1): 159-177 (2025)
[j134]Yu Qiao, Xiaohui Yang, Jing Wang, Tongzhen Si, Qingbei Guo:
Driver Cognitive Distraction Detection based on eye movement behavior and integration of multi-view space-channel feature. Expert Syst. Appl. 266: 125975 (2025)
[j133]Yaohui Wang, Xin Ma, Xinyuan Chen, Cunjian Chen, Antitza Dantcheva, Bo Dai, Yu Qiao:
LEO: Generative Latent Image Animator for Human Video Synthesis. Int. J. Comput. Vis. 133(3): 1277-1289 (2025)
[j132]Yaohui Wang, Xinyuan Chen, Xin Ma, Shangchen Zhou, Ziqi Huang, Yi Wang, Ceyuan Yang, Yinan He, Jiashuo Yu, Peiqing Yang, Yuwei Guo, Tianxing Wu, Chenyang Si, Yuming Jiang, Cunjian Chen, Chen Change Loy, Bo Dai, Dahua Lin, Yu Qiao, Ziwei Liu:
LaVie: High-Quality Video Generation with Cascaded Latent Diffusion Models. Int. J. Comput. Vis. 133(5): 3059-3078 (2025)
[j131]Yuanhan Zhang, Qinghong Sun, Yichun Zhou, Zexin He, Zhenfei Yin, Kun Wang, Lu Sheng, Yu Qiao, Jing Shao, Ziwei Liu:
Bamboo: Building Mega-Scale Vision Dataset Continually with Human-Machine Synergy. Int. J. Comput. Vis. 133(8): 5806-5821 (2025)
[j130]Baoqi Pei, Yifei Huang, Guo Chen, Jilan Xu, Yali Wang, Limin Wang, Tong Lu, Yu Qiao, Fei Wu:
Guiding Audio-Visual Question Answering with Collective Question Reasoning. Int. J. Comput. Vis. 133(10): 6912-6929 (2025)
[j129]Yifei Huang, Jilan Xu, Baoqi Pei, Lijin Yang, Mingfang Zhang, Yuping He, Guo Chen, Xinyuan Chen, Yaohui Wang, Zheng Nie, Jinyao Liu, Dechen Lin, Fang Fang, Kunpeng Li, Chang Yuan, Yu Qiao, Yali Wang, Limin Wang:
Vinci: A Real-time Smart Assistant Based on Egocentric Vision-language Model for Portable Devices. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 9(3): 88:1-88:33 (2025)
[j128]Ziyan Huang, Zhongying Deng, Jin Ye, Haoyu Wang, Yanzhou Su, Tianbin Li, Hui Sun, Junlong Cheng, Jianpin Chen, Junjun He, Yun Gu, Shaoting Zhang, Lixu Gu, Yu Qiao:
A-Eval: A benchmark for cross-dataset and cross-modality evaluation of abdominal multi-organ segmentation. Medical Image Anal. 101: 103499 (2025)
[j127]Peng Xu, Wenqi Shao, Kaipeng Zhang, Peng Gao, Shuo Liu, Meng Lei, Fanqing Meng, Siyuan Huang, Yu Qiao, Ping Luo:
LVLM-EHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models. IEEE Trans. Pattern Anal. Mach. Intell. 47(3): 1877-1893 (2025)
[j126]Zhiqi Li, Wenhai Wang, Hongyang Li, Enze Xie, Chonghao Sima, Tong Lu, Yu Qiao, Jifeng Dai:
BEVFormer: Learning Bird's-Eye-View Representation From LiDAR-Camera via Spatiotemporal Transformers. IEEE Trans. Pattern Anal. Mach. Intell. 47(3): 2020-2036 (2025)
[j125]Xiaowei Hu, Min Shi, Weiyun Wang, Sitong Wu, Linjie Xing, Wenhai Wang, Xizhou Zhu, Lewei Lu, Jie Zhou, Xiaogang Wang, Yu Qiao, Jifeng Dai:
Demystify Transformers & Convolutions in Modern Image Deep Networks. IEEE Trans. Pattern Anal. Mach. Intell. 47(4): 2416-2428 (2025)
[j124]Haoyi Zhu, Honghui Yang, Xiaoyang Wu, Di Huang, Sha Zhang, Xianglong He, Hengshuang Zhao, Chunhua Shen, Yu Qiao, Tong He, Wanli Ouyang:
PonderV2: Improved 3D Representation With a Universal Pre-Training Paradigm. IEEE Trans. Pattern Anal. Mach. Intell. 47(8): 6550-6565 (2025)
[j123]Xiangchao Yan, Runjian Chen, Bo Zhang, Hancheng Ye, Renqiu Xia, Jiakang Yuan, Hongbin Zhou, Xinyu Cai, Botian Shi, Wenqi Shao, Ping Luo, Yu Qiao, Tao Chen, Junchi Yan:
SPOT: Scalable 3D Pre-Training via Occupancy Prediction for Learning Transferable 3D Representations. IEEE Trans. Pattern Anal. Mach. Intell. 47(11): 9609-9625 (2025)
[j122]Zhaokai Wang, Xizhou Zhu, Xue Yang, Gen Luo, Hao Li, Changyao Tian, Wenhan Dou, Junqi Ge, Lewei Lu, Yu Qiao, Jifeng Dai:
Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding. IEEE Trans. Pattern Anal. Mach. Intell. 47(11): 10142-10159 (2025)
[j121]Zihan Li, Diping Song, Zefeng Yang, Deming Wang, Fei Li, Xiulan Zhang, Paul E. Kinahan, Yu Qiao:
VisionUnite: A Vision-Language Foundation Model for Ophthalmology Enhanced With Clinical Knowledge. IEEE Trans. Pattern Anal. Mach. Intell. 47(12): 11848-11862 (2025)
[j120]Boyu Chen, Siran Chen, Kunchang Li, Qinglin Xu, Yu Qiao, Yali Wang:
Percept, Chat, Adapt: Knowledge transfer of foundation models for open-world video recognition. Pattern Recognit. 160: 111189 (2025)
[j119]Xiaohui Yang, Yu Qiao, Tongzhen Si, Jing Wang, Tao Xu:
Eye-SCAN: Eye-Movement-Attention-based Spatial Channel Adaptive Network for traffic accident prediction. Pattern Recognit. 165: 111590 (2025)
[j118]Qingsong Zhao, Yi Wang, Yinan He, Yu Qiao, Cairong Zhao:
Learning Discriminative Representations in Videos via Active Embedding Distance Correlation. IEEE Signal Process. Lett. 32: 56-60 (2025)
[j117]Wenqi Shao, Meng Lei, Yutao Hu, Peng Gao, Peng Xu, Kaipeng Zhang, Fanqing Meng, Siyuan Huang, Hongsheng Li, Yu Qiao, Ping Luo:
TinyLVLM-eHub: Towards Comprehensive and Efficient Evaluation for Large Vision-Language Models. IEEE Trans. Big Data 11(3): 933-947 (2025)
[j116]Lu Dong, Haiyu Zhang, Hongjie Zhang, Yifei Huang, Zhen-Hua Ling, Yu Qiao, Limin Wang, Yali Wang:
Weakly Supervised Temporal Sentence Grounding via Positive Sample Mining. IEEE Trans. Circuits Syst. Video Technol. 35(10): 10396-10409 (2025)
[j115]Hao Zhang, Wenqi Shao, Hong Liu, Yongqiang Ma, Ping Luo, Yu Qiao, Nanning Zheng, Kaipeng Zhang:
B-AVIBench: Toward Evaluating the Robustness of Large Vision-Language Model on Black-Box Adversarial Visual-Instructions. IEEE Trans. Inf. Forensics Secur. 20: 1434-1446 (2025)
[j114]Weidong Zhang, Yu Qiao, Ying Liu, Ran Song, Wei Zhang:
Fast 3D Room Layout Estimation Based on Compact High-Level Representation. IEEE Trans. Image Process. 34: 3930-3943 (2025)
[j113]Xin Ma, Yaohui Wang, Xinyuan Chen, Gengyun Jia, Ziwei Liu, Yuan-Fang Li, Cunjian Chen, Yu Qiao:
Latte: Latent Diffusion Transformer for Video Generation. Trans. Mach. Learn. Res. 2025 (2025)
[j112]Weigao Sun, Zhen Qin, Dong Li, Xuyang Shen, Yu Qiao, Yiran Zhong:
LASP: Linear Attention Sequence Parallelism. Trans. Mach. Learn. Res. 2025 (2025)
[j111]Xu Liu, Tong Zhou, Chong Wang, Yuping Wang, Yuanxin Wang, Qinjingwen Cao, Weizhi Du, Yonghuan Yang, Junjun He, Yu Qiao, Yiqing Shen:
Toward the unification of generative and discriminative visual foundation model: a survey. Vis. Comput. 41(5): 3371-3412 (2025)
[c424]Junyi Chen, Weicai Ye, Yifan Wang, Danpeng Chen, Di Huang, Wanli Ouyang, Guofeng Zhang, Yu Qiao, Tong He:
GigaGS: 3D Gaussian Based Planar Representation for Large-Scene Surface Reconstruction. AAAI 2025: 2088-2096
[c423]Siran Chen, Yuxiao Luo, Yue Ma, Yu Qiao, Yali Wang:
H-MBA: Hierarchical MamBa Adaptation for Multi-Modal Video Understanding in Autonomous Driving. AAAI 2025: 2212-2220
[c422]Yanbo Ding, Shaobin Zhuang, Kunchang Li, Zhengrong Yue, Yu Qiao, Yali Wang:
Muses: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration. AAAI 2025: 2753-2761
[c421]Qiushi Sun, Kanzhi Cheng, Zichen Ding, Chuanyang Jin, Yian Wang, Fangzhi Xu, Zhenyu Wu, Chengyou Jia, Liheng Chen, Zhoumianze Liu, Ben Kao, Guohao Li, Junxian He, Yu Qiao, Zhiyong Wu:
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis. ACL (1) 2025: 5555-5579
[c420]Fan Zhang, Shulin Tian, Ziqi Huang, Yu Qiao, Ziwei Liu:
Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models. ACL (1) 2025: 7561-7582
[c419]Fangzhi Xu, Qiushi Sun, Kanzhi Cheng, Jun Liu, Yu Qiao, Zhiyong Wu:
Interactive Evolution: A Neural-Symbolic Self-Training Framework For Large Language Models. ACL (1) 2025: 12975-12993
[c418]Jiakang Yuan, Xiangchao Yan, Bo Zhang, Tao Chen, Botian Shi, Wanli Ouyang, Yu Qiao, Lei Bai, Bowen Zhou:
Dolphin: Moving Towards Closed-loop Auto-research through Thinking, Practice, and Feedback. ACL (1) 2025: 21768-21789
[c417]Haochen Xue, Feilong Tang, Ming Hu, Yexin Liu, Qidong Huang, Yulong Li, Chengzhi Liu, Zhongxing Xu, Chong Zhang, Chun-Mei Feng, Yutong Xie, Imran Razzak, Zongyuan Ge, Jionglong Su, Junjun He, Yu Qiao:
MMRC: A Large-Scale Benchmark for Understanding Multimodal Large Language Model in Real-World Conversation. ACL (1) 2025: 22477-22503
[c416]Qibing Ren, Hao Li, Dongrui Liu, Zhanxu Xie, Xiaoya Lu, Yu Qiao, Lei Sha, Junchi Yan, Lizhuang Ma, Jing Shao:
LLMs know their vulnerabilities: Uncover Safety Gaps through Natural Distribution Shifts. ACL (1) 2025: 24763-24785
[c415]Pengfei Zhou, Xiaopeng Peng, Jiajun Song, Chuanhao Li, Zhaopan Xu, Yue Yang, Ziyao Guo, Hao Zhang, Yuqi Lin, Yefei He, Lirui Zhao, Shuo Liu, Tianhua Li, Yuxuan Xie, Xiaojun Chang, Yu Qiao, Wenqi Shao, Kaipeng Zhang:
OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation. CVPR 2025: 56-66
[c414]Bingjie Gao, Xinyu Gao, Xiaoxue Wu, Yujie Zhou, Yu Qiao, Li Niu, Xinyuan Chen, Yaohui Wang:
The Devil is in the Prompts: Retrieval-Augmented Prompt Optimization for Text-to-Video Generation. CVPR 2025: 3173-3183
[c413]Ying Chen, Guoan Wang, Yuanfeng Ji, Yanjun Li, Jin Ye, Tianbin Li, Ming Hu, Rongshan Yu, Yu Qiao, Junjun He:
SlideChat: A Large Vision-Language Assistant for Whole-Slide Pathology Image Understanding. CVPR 2025: 5134-5143
[c412]Chenxin Tao, Shiqian Su, Xizhou Zhu, Chenyu Zhang, Zhe Chen, Jiawen Liu, Wenhai Wang, Lewei Lu, Gao Huang, Yu Qiao, Jifeng Dai:
HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding. CVPR 2025: 14559-14569
[c411]Yongting Zhang, Lu Chen, Guodong Zheng, Yifeng Gao, Rui Zheng, Jinlan Fu, Zhenfei Yin, Senjie Jin, Yu Qiao, Xuanjing Huang, Feng Zhao, Tao Gui, Jing Shao:
SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Models. CVPR 2025: 19867-19878
[c410]Gen Luo, Xue Yang, Wenhan Dou, Zhaokai Wang, Jiawen Liu, Jifeng Dai, Yu Qiao, Xizhou Zhu:
Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training. CVPR 2025: 24960-24971
[c409]Ziang Yan, Zhilin Li, Yinan He, Chenting Wang, Kunchang Li, Xinhao Li, Xiangyu Zeng, Zilei Wang, Yali Wang, Yu Qiao, Limin Wang, Yi Wang:
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment. CVPR 2025: 29880-29892
[c408]Mingzhou Liu, Ching-Wen Lee, Xinwei Sun, Xueqing Yu, Yu Qiao, Yizhou Wang:
Learning Causal Alignment for Reliable Disease Diagnosis. ICLR 2025
[c407]Zhiyong Wu, Zhenyu Wu, Fangzhi Xu, Yian Wang, Qiushi Sun, Chengyou Jia, Kanzhi Cheng, Zichen Ding, Liheng Chen, Paul Pu Liang, Yu Qiao:
OS-ATLAS: Foundation Action Model for Generalist GUI Agents. ICLR 2025
[c406]Hengwei Bian, Lingdong Kong, Haozhe Xie, Liang Pan, Yu Qiao, Ziwei Liu:
DynamicCity: Large-Scale 4D Occupancy Generation from Dynamic Scenes. ICLR 2025
[c405]Yuchen Duan, Weiyun Wang, Zhe Chen, Xizhou Zhu, Lewei Lu, Tong Lu, Yu Qiao, Hongsheng Li, Jifeng Dai, Wenhai Wang:
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures. ICLR 2025
[c404]Peng Gao, Le Zhuo, Dongyang Liu, Ruoyi Du, Xu Luo, Longtian Qiu, Yuhang Zhang, Rongjie Huang, Shijie Geng, Renrui Zhang, Junlin Xie, Wenqi Shao, Zhengkai Jiang, Tianshuo Yang, Weicai Ye, Tong He, Jingwen He, Junjun He, Yu Qiao, Hongsheng Li:
Lumina-T2X: Scalable Flow-based Large Diffusion Transformer for Flexible Resolution Generation. ICLR 2025
[c403]Zhengyao Lv, Chenyang Si, Junhao Song, Zhenyu Yang, Yu Qiao, Ziwei Liu, Kwan-Yee K. Wong:
FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality. ICLR 2025
[c402]Fanqing Meng, Jin Wang, Chuanhao Li, Quanfeng Lu, Hao Tian, Tianshuo Yang, Jiaqi Liao, Xizhou Zhu, Jifeng Dai, Yu Qiao, Ping Luo, Kaipeng Zhang, Wenqi Shao:
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models. ICLR 2025
[c401]Baoqi Pei, Yifei Huang, Jilan Xu, Guo Chen, Yuping He, Lijin Yang, Yali Wang, Weidi Xie, Yu Qiao, Fei Wu, Limin Wang:
Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning. ICLR 2025
[c400]Chongjie Si, Xuehui Wang, Xue Yang, Zhengqin Xu, Qingyun Li, Jifeng Dai, Yu Qiao, Xiaokang Yang, Wei Shen:
Maintaining Structural Integrity in Parameter Spaces for Parameter Efficient Fine-tuning. ICLR 2025
[c399]Zun Wang, Jialu Li, Yicong Hong, Songze Li, Kunchang Li, Shoubin Yu, Yi Wang, Yu Qiao, Yali Wang, Mohit Bansal, Limin Wang:
Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel. ICLR 2025
[c398]Xiangyu Zeng, Kunchang Li, Chenting Wang, Xinhao Li, Tianxiang Jiang, Ziang Yan, Songze Li, Yansong Shi, Zhengrong Yue, Yi Wang, Yali Wang, Yu Qiao, Limin Wang:
TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning. ICLR 2025
[c397]Jie Zhang, Dongrui Liu, Chen Qian, Linfeng Zhang, Yong Liu, Yu Qiao, Jing Shao:
REEF: Representation Encoding Fingerprints for Large Language Models. ICLR 2025
[c396]Kaiwen Zhu, Jinjin Gu, Zhiyuan You, Yu Qiao, Chao Dong:
An Intelligent Agentic System for Complex Image Restoration Problems. ICLR 2025
[c395]Yu Qiao, Tianyu Meng, Huilin Ge, Xinning Wang, Jiayue Zhao, Qianchen Xia, Xin Yang:
Localization Hints Exploration for Object Matting. ICME 2025: 1-6
[c394]Guoqing Zhao, Qi Zhang, Shaopeng Zhai, Dazhong Shen, Tianyi Zhang, Yu Qiao, Tong Xu:
I-Lora: Iterative Merging of Routing-Tuned Low-Rank Adapters for Multi-Task Learning. ICME 2025: 1-6
[c393]Zhaodong Wu, Qiaochu Zhao, Ming Hu, Yulong Li, Haochen Xue, Zhengyong Jiang, Angelos Stefanidis, Qiufeng Wang, Imran Razzak, Zongyuan Ge, Junjun He, Yu Qiao, Zhong Zheng, Feilong Tang, Kang Dang, Jionglong Su:
MSWAL: 3D Multi-class Segmentation of Whole Abdominal Lesions Dataset. MICCAI (2) 2025: 378-388
[i513]Xinhao Li, Yi Wang, Jiashuo Yu, Xiangyu Zeng, Yuhan Zhu, Haian Huang, Jianfei Gao, Kunchang Li, Yinan He, Chenting Wang, Yu Qiao, Yali Wang, Limin Wang:
VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling. CoRR abs/2501.00574 (2025)
[i512]Jiakang Yuan, Xiangchao Yan, Botian Shi, Tao Chen, Wanli Ouyang, Bo Zhang, Lei Bai, Yu Qiao, Bowen Zhou:
Dolphin: Closed-loop Open-ended Auto-research through Thinking, Practice, and Feedback. CoRR abs/2501.03916 (2025)
[i511]Siran Chen, Yuxiao Luo, Yue Ma, Yu Qiao, Yali Wang:
H-MBA: Hierarchical MamBa Adaptation for Multi-Modal Video Understanding in Autonomous Driving. CoRR abs/2501.04302 (2025)
[i510]Zhaokai Wang, Xizhou Zhu, Xue Yang, Gen Luo, Hao Li, Changyao Tian, Wenhan Dou, Junqi Ge, Lewei Lu, Yu Qiao, Jifeng Dai:
Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding. CoRR abs/2501.07783 (2025)
[i509]Weichen Fan, Chenyang Si, Junhao Song, Zhenyu Yang, Yinan He, Long Zhuo, Ziqi Huang, Ziyue Dong, Jingwen He, Dongwei Pan, Yi Wang, Yuming Jiang, Yaohui Wang, Peng Gao, Xinyuan Chen, Hengjie Li, Dahua Lin, Yu Qiao, Ziwei Liu:
Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models. CoRR abs/2501.08453 (2025)
[i508]Chenyang Si, Weichen Fan, Zhengyao Lv, Ziqi Huang, Yu Qiao, Ziwei Liu:
RepVideo: Rethinking Cross-Layer Representation for Video Generation. CoRR abs/2501.08994 (2025)
[i507]Xiaohui Li, Yihao Liu, Shuo Cao, Ziyan Chen, Shaobin Zhuang, Xiangyu Chen, Yinan He, Yi Wang, Yu Qiao:
DiffVSR: Enhancing Real-World Video Super-Resolution with Diffusion Models for Advanced Visual Quality and Temporal Consistency. CoRR abs/2501.10110 (2025)
[i506]Yi Wang, Xinhao Li, Ziang Yan, Yinan He, Jiashuo Yu, Xiangyu Zeng, Chenting Wang, Changlian Ma, Haian Huang, Jianfei Gao, Min Dou, Kai Chen, Wenhai Wang, Yu Qiao, Yali Wang, Limin Wang:
InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling. CoRR abs/2501.12386 (2025)
[i505]Jia Yu, Fei Yuan, Rui Min, Jing Yu, Pei Chu, Jiayang Li, Wei Li, Ruijie Zhang, Zhenxiang Li, Zhifei Ren, Dong Zheng, Wenjian Zhang, Yan Teng, Lingyu Meng, Zhenjiang Jin, Jiantao Qiu, ShaSha Wang, Zhongying Tu, Dahua Lin, Yu Wang, Yu Qiao, Yanfeng Wang, Conghui He:
WanJuanSiLu: A High-Quality Open-Source Webtext Dataset for Low-Resource Languages. CoRR abs/2501.14506 (2025)
[i504]Dongyang Liu, Shicheng Li, Yutong Liu, Zhen Li, Kai Wang, Xinyue Li, Qi Qin, Yufei Liu, Yi Xin, Zhongyu Li, Bin Fu, Chenyang Si, Yuewen Cao, Conghui He, Ziwei Liu, Yu Qiao, Qibin Hou, Hongsheng Li, Peng Gao:
Lumina-Video: Efficient and Flexible Video Generation with Multi-scale Next-DiT. CoRR abs/2502.06782 (2025)
[i503]Daocheng Fu, Naiting Zhong, Xu Han, Pinlong Cai, Licheng Wen, Song Mao, Botian Shi, Yu Qiao:
LimSim Series: An Autonomous Driving Simulation Platform for Validation and Enhancement. CoRR abs/2502.09170 (2025)
[i502]Haochen Xue, Feilong Tang, Ming Hu, Yexin Liu, Qidong Huang, Yulong Li, Chengzhi Liu, Zhongxing Xu, Chong Zhang, Chun-Mei Feng, Yutong Xie, Imran Razzak, Zongyuan Ge, Jionglong Su, Junjun He, Yu Qiao:
MMRC: A Large-Scale Benchmark for Understanding Multimodal Large Language Model in Real-World Conversation. CoRR abs/2502.11903 (2025)
[i501]Baoqi Pei, Yifei Huang, Jilan Xu, Guo Chen, Yuping He, Lijin Yang, Yali Wang, Weidi Xie, Yu Qiao, Fei Wu, Limin Wang:
Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning. CoRR abs/2503.00986 (2025)
[i500]Yifei Huang, Jilan Xu, Baoqi Pei, Yuping He, Guo Chen, Mingfang Zhang, Lijin Yang, Zheng Nie, Jinyao Liu, Guoshun Fan, Dechen Lin, Fang Fang, Kunpeng Li, Chang Yuan, Xinyuan Chen, Yaohui Wang, Yali Wang, Yu Qiao, Limin Wang:
An Egocentric Vision-Language Model based Portable Real-time Smart Assistant. CoRR abs/2503.04250 (2025)
[i499]AgiBot-World-Contributors, Qingwen Bu, Jisong Cai, Li Chen, Xiuqi Cui, Yan Ding, Siyuan Feng, Shenyuan Gao, Xindong He, Xu Huang, Shu Jiang, Yuxin Jiang, Cheng Jing, Hongyang Li, Jialu Li, Chiming Liu, Yi Liu, Yuxiang Lu, Jianlan Luo, Ping Luo, Yao Mu, Yuehan Niu, Yixuan Pan, Jiangmiao Pang, Yu Qiao, Guanghui Ren, Cheng Ruan, Jiaqi Shan, Yongjian Shen, Chengshi Shi, Mingkang Shi, Modi Shi, Chonghao Sima, Jianheng Song, Huijie Wang, Wenhao Wang, Dafeng Wei, Chengen Xie, Guo Xu, Junchi Yan, Cunbiao Yang, Lei Yang, Shukai Yang, Maoqing Yao, Jia Zeng, Chi Zhang, Qinglin Zhang, Bin Zhao, Chengyue Zhao, Jiaqi Zhao, Jianchao Zhu:
AgiBot World Colosseo: A Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems. CoRR abs/2503.06669 (2025)
[i498]Fanqing Meng, Lingxiao Du, Zongkai Liu, Zhixiang Zhou, Quanfeng Lu, Daocheng Fu, Botian Shi, Wenhai Wang, Junjun He, Kaipeng Zhang, Ping Luo, Yu Qiao, Qiaosheng Zhang, Wenqi Shao:
MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning. CoRR abs/2503.07365 (2025)
[i497]Weiyun Wang, Zhangwei Gao, Lianjie Chen, Zhe Chen, Jinguo Zhu, Xiangyu Zhao, Yangzhou Liu, Yue Cao, Shenglong Ye, Xizhou Zhu, Lewei Lu, Haodong Duan, Yu Qiao, Jifeng Dai, Wenhai Wang:
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning. CoRR abs/2503.10291 (2025)
[i496]Zhaodong Wu, Qiaochu Zhao, Ming Hu, Yulong Li, Haochen Xue, Kang Dang, Zhengyong Jiang, Angelos Stefanidis, Qiufeng Wang, Imran Razzak, Zongyuan Ge, Junjun He, Yu Qiao, Zhong Zheng, Feilong Tang, Jionglong Su:
MSWAL: 3D Multi-class Segmentation of Whole Abdominal Lesions Dataset. CoRR abs/2503.13560 (2025)
[i495]Haiyu Zhang, Xinyuan Chen, Yaohui Wang, Xihui Liu, Yunhong Wang, Yu Qiao:
AccVideo: Accelerating Video Diffusion Model with Synthetic Dataset. CoRR abs/2503.19462 (2025)
[i494]Zhi Hou, Tianyi Zhang, Yuwen Xiong, Haonan Duan, Hengjun Pu, Ronglei Tong, Chengyang Zhao, Xizhou Zhu, Yu Qiao, Jifeng Dai, Yuntao Chen:
Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy. CoRR abs/2503.19757 (2025)
[i493]Shitian Zhao, Qilong Wu, Xinyue Li, Bo Zhang, Ming Li, Qi Qin, Dongyang Liu, Kai Zhang, Hongsheng Li, Yu Qiao, Peng Gao, Bin Fu, Zhen Li:
LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis. CoRR abs/2503.21749 (2025)
[i492]Dian Zheng, Ziqi Huang, Hongbo Liu, Kai Zou, Yinan He, Fan Zhang, Yuanhan Zhang, Jingwen He, Wei-Shi Zheng, Yu Qiao, Ziwei Liu:
VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness. CoRR abs/2503.21755 (2025)
[i491]Qi Qin, Le Zhuo, Yi Xin, Ruoyi Du, Zhen Li, Bin Fu, Yiting Lu, Jiakang Yuan, Xinyue Li, Dongyang Liu, Xiangyang Zhu, Manyuan Zhang, Will Beddow, Erwann Millon, Victor Perez, Wenhai Wang, Conghui He, Bo Zhang, Xiaohong Liu, Hongsheng Li, Yu Qiao, Chang Xu, Peng Gao:
Lumina-Image 2.0: A Unified and Efficient Image Generative Framework. CoRR abs/2503.21758 (2025)
[i490]Ruifeng Luo, Zhengjie Liu, Tianxiao Cheng, Jie Wang, Tongjie Wang, Xingguang Wei, Haomin Wang, Yanpeng Li, Fu Chai, Fei Cheng, Shenglong Ye, Wenhai Wang, Yanting Zhang, Yu Qiao, Hongjie Zhang, Xianzhong Zhao:
ArchCAD-400K: An Open Large-Scale Architectural CAD Dataset and New Baseline for Panoptic Symbol Spotting. CoRR abs/2503.22346 (2025)
[i489]Yuandong Pu, Le Zhuo, Kaiwen Zhu, Liangbin Xie, Wenlong Zhang, Xiangyu Chen, Peng Gao, Yu Qiao, Chao Dong, Yihao Liu:
Lumina-OmniLV: A Unified Multimodal Framework for General Low-Level Vision. CoRR abs/2504.04903 (2025)
[i488]Xinhao Li, Ziang Yan, Desen Meng, Lu Dong, Xiangyu Zeng, Yinan He, Yali Wang, Yu Qiao, Yi Wang, Limin Wang:
VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning. CoRR abs/2504.06958 (2025)
[i487]Jinguo Zhu, Weiyun Wang, Zhe Chen, Zhaoyang Liu, Shenglong Ye, Lixin Gu, Hao Tian, Yuchen Duan, Weijie Su, Jie Shao, Zhangwei Gao, Erfei Cui, Xuehui Wang, Yue Cao, Yangzhou Liu, Xingguang Wei, Hongjie Zhang, Haomin Wang, Weiye Xu, Hao Li, Jiahao Wang, Nianchen Deng, Songze Li, Yinan He, Tan Jiang, Jiapeng Luo, Yi Wang, Conghui He, Botian Shi, Xingcheng Zhang, Wenqi Shao, Junjun He, Yingtong Xiong, Wenwen Qu, Peng Sun, Penglong Jiao, Han Lv, Lijun Wu, Kaipeng Zhang, Huipeng Deng, Jiaye Ge, Kai Chen, Limin Wang, Min Dou, Lewei Lu, Xizhou Zhu, Tong Lu, Dahua Lin, Yu Qiao, Jifeng Dai, Wenhai Wang:
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models. CoRR abs/2504.10479 (2025)
[i486]Bingjie Gao, Xinyu Gao, Xiaoxue Wu, Yujie Zhou, Yu Qiao, Li Niu, Xinyuan Chen, Yaohui Wang:
The Devil is in the Prompts: Retrieval-Augmented Prompt Optimization for Text-to-Video Generation. CoRR abs/2504.11739 (2025)
[i485]Daocheng Fu, Zijun Chen, Renqiu Xia, Qi Liu, Yuan Feng, Hongbin Zhou, Renrui Zhang, Shiyang Feng, Peng Gao, Junchi Yan, Botian Shi, Bo Zhang, Yu Qiao:
TrustGeoGen: Scalable and Formal-Verified Data Engine for Trustworthy Multi-modal Geometric Problem Solving. CoRR abs/2504.15780 (2025)
[i484]Siqi Li, Yufan Shen, Xiangnan Chen, Jiayi Chen, Hengwei Ju, Haodong Duan, Song Mao, Hongbin Zhou, Bo Zhang, Bin Fu, Pinlong Cai, Licheng Wen, Botian Shi, Yong Liu, Xinyu Cai, Yu Qiao:
GDI-Bench: A Benchmark for General Document Intelligence with Vision and Reasoning Decoupling. CoRR abs/2505.00063 (2025)
[i483]Lu Dong, Haiyu Zhang, Hongjie Zhang, Yifei Huang, Zhen-Hua Ling, Yu Qiao, Limin Wang, Yali Wang:
Weakly Supervised Temporal Sentence Grounding via Positive Sample Mining. CoRR abs/2505.06557 (2025)
[i482]Jianbiao Mei, Tao Hu, Daocheng Fu, Licheng Wen, Xuemeng Yang, Rong Wu, Pinlong Cai, Xinyu Cai, Xing Gao, Yu Yang, Chengjun Xie, Botian Shi, Yong Liu, Yu Qiao:
O2-Searcher: A Searching-based Agent Model for Open-Domain Open-Ended Question Answering. CoRR abs/2505.16582 (2025)
[i481]Xingguang Wei, Haomin Wang, Shenglong Ye, Ruifeng Luo, Yanting Zhang, Lixin Gu, Jifeng Dai, Yu Qiao, Wenhai Wang, Hongjie Zhang:
Point or Line? Using Line-based Representation for Panoptic Symbol Spotting in CAD Drawings. CoRR abs/2505.23395 (2025)
[i480]Chenyu Yang, Shiqian Su, Shi Liu, Xuan Dong, Yue Yu, Weijie Su, Xuehui Wang, Zhaoyang Liu, Jinguo Zhu, Hao Li, Wenhai Wang, Yu Qiao, Xizhou Zhu, Jifeng Dai:
ZeroGUI: Automating Online GUI Learning at Zero Human Cost. CoRR abs/2505.23762 (2025)
[i479]Gen Luo, Ganlin Yang, Ziyang Gong, Guanzhou Chen, Haonan Duan, Erfei Cui, Ronglei Tong, Zhi Hou, Tianyi Zhang, Zhe Chen, Shenglong Ye, Lewei Lu, Jingbo Wang, Wenhai Wang, Jifeng Dai, Yu Qiao, Rongrong Ji, Xizhou Zhu:
Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces. CoRR abs/2506.00123 (2025)
[i478]Yue Yang, MingKang Chen, Qihua Liu, Mengkang Hu, Qiguang Chen, Gengrui Zhang, Shuyue Hu, Guangtao Zhai, Yu Qiao, Yu Wang, Wenqi Shao, Ping Luo:
Truly Assessing Fluid Intelligence of Large Language Models through Dynamic Reasoning Evaluation. CoRR abs/2506.02648 (2025)
[i477]Zhengyao Lv, Chenyang Si, Tianlin Pan, Zhaoxi Chen, Kwan-Yee K. Wong, Yu Qiao, Ziwei Liu:
DCM: Dual-Expert Consistency Model for Efficient and High-Quality Video Generation. CoRR abs/2506.03123 (2025)
[i476]Zikang Wang, Boyu Chen, Zhengrong Yue, Yi Wang, Yu Qiao, Limin Wang, Yali Wang:
VideoChat-A1: Thinking with Long Videos by Chain-of-Shot Reasoning. CoRR abs/2506.06097 (2025)
[i475]Boyu Chen, Siran Chen, Kunchang Li, Qinglin Xu, Yu Qiao, Yali Wang:
Super Encoding Network: Recursive Association of Multi-Modal Encoders for Video Understanding. CoRR abs/2506.07576 (2025)
[i474]Jiashuo Yu, Yue Wu, Meng Chu, Zhifei Ren, Zizheng Huang, Pei Chu, Ruijie Zhang, Yinan He, Qirui Li, Songze Li, Zhenxiang Li, Zhongying Tu, Conghui He, Yu Qiao, Yali Wang, Yi Wang, Limin Wang:
VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos. CoRR abs/2506.10857 (2025)
[i473]Zhen Li, Chuanhao Li, Xiaofeng Mao, Shaoheng Lin, Ming Li, Shitian Zhao, Zhaopan Xu, Xinyue Li, Yukang Feng, Jianwen Sun, Zizhen Li, Fanrui Zhang, Jiaxin Ai, Zhixiang Wang, Yuwei Wu, Tong He, Jiangmiao Pang, Yu Qiao, Yunde Jia, Kaipeng Zhang:
Sekai: A Video Dataset towards World Exploration. CoRR abs/2506.15675 (2025)
[i472]Teng Li, Quanfeng Lu, Lirui Zhao, Hao Li, Xizhou Zhu, Yu Qiao, Jun Zhang, Wenqi Shao:
UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation. CoRR abs/2506.17202 (2025)
[i471]Nianchen Deng, Lixin Gu, Shenglong Ye, Yinan He, Zhe Chen, Songze Li, Haomin Wang, Xingguang Wei, Tianshuo Yang, Min Dou, Tong He, Wenqi Shao, Kaipeng Zhang, Yi Wang, Botian Shi, Yanting Zhang, Jifeng Dai, Yu Qiao, Hongjie Zhang, Wenhai Wang:
InternSpatial: A Comprehensive Dataset for Spatial Reasoning in Vision-Language Models. CoRR abs/2506.18385 (2025)
[i470]Hongbo Liu, Jingwen He, Yi Jin, Dian Zheng, Yuhao Dong, Fan Zhang, Ziqi Huang, Yinan He, Yangguang Li, Weichao Chen, Yu Qiao, Wanli Ouyang, Shengjie Zhao, Ziwei Liu:
ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models. CoRR abs/2506.21356 (2025)
[i469]Gen Luo, Wenhan Dou, Wenhao Li, Zhaokai Wang, Xue Yang, Changyao Tian, Hao Li, Weiyun Wang, Wenhai Wang, Xizhou Zhu, Yu Qiao, Jifeng Dai:
Mono-InternVL-1.5: Towards Cheaper and Faster Monolithic Multimodal Large Language Models. CoRR abs/2507.12566 (2025)
[i468]Shuo Cao, Nan Ma, Jiayang Li, Xiaohui Li, Lihao Shao, Kaiwen Zhu, Yu Zhou, Yuandong Pu, Jiarui Wu, Jiaquan Wang, Bo Qu, Wenhai Wang, Yu Qiao, Dajuin Yao, Yihao Liu:
ArtiMuse: Fine-Grained Image Aesthetics Assessment with Joint Scoring and Expert-Level Understanding. CoRR abs/2507.14533 (2025)
[i467]Xiangyu Chen, Kaiwen Zhu, Yuandong Pu, Shuo Cao, Xiaohui Li, Wenlong Zhang, Yihao Liu, Yu Qiao, Jiantao Zhou, Chao Dong:
Exploring Scalable Unified Modeling for General Low-Level Vision. CoRR abs/2507.14801 (2025)
[i466]Chuanhao Yan, Fengdi Che, Xuhan Huang, Xu Xu, Xin Li, Yizhi Li, Xingwei Qu, Jingzhe Shi, Zhuangzhuang He, Chenghua Lin, Yaodong Yang, Binhang Yuan, Hang Zhao, Yu Qiao, Bowen Zhou, Jie Fu:
Re:Form - Reducing Human Priors in Scalable Formal Software Verification with RL in LLMs: A Preliminary Study on Dafny. CoRR abs/2507.16331 (2025)
[i465]Xiaofeng Mao, Shaoheng Lin, Zhen Li, Chuanhao Li, Wenshuo Peng, Tong He, Jiangmiao Pang, Mingmin Chi, Yu Qiao, Kaipeng Zhang:
Yume: An Interactive World Generation Model. CoRR abs/2507.17744 (2025)
[i464]Yi Xin, Juncheng Yan, Qi Qin, Zhen Li, Dongyang Liu, Shicheng Li, Victor Shea-Jay Huang, Yupeng Zhou, Renrui Zhang, Le Zhuo, Tiancheng Han, Xiaoqing Sun, Siqi Luo, Mengmeng Wang, Bin Fu, Yuewen Cao, Hongsheng Li, Guangtao Zhai, Xiaohong Liu, Yu Qiao, Peng Gao:
Lumina-mGPT 2.0: Stand-Alone AutoRegressive Image Modeling. CoRR abs/2507.17801 (2025)
[i463]Yang Chen, Yufan Shen, Wenxuan Huang, Sheng Zhou, Qunshu Lin, Xinyu Cai, Zhi Yu, Jiajun Bu, Botian Shi, Yu Qiao:
Learning Only with Images: Visual Reinforcement Learning with Reasoning, Rendering, and Visual Feedback. CoRR abs/2507.20766 (2025)
[i462]Jianxiong Gao, Zhaoxi Chen, Xian Liu, Jianfeng Feng, Chenyang Si, Yanwei Fu, Yu Qiao, Ziwei Liu:
LongVie: Multimodal-Guided Controllable Ultra-Long Video Generation. CoRR abs/2508.03694 (2025)
[i461]Jingwen He, Hongbo Liu, Jiajun Li, Ziqi Huang, Yu Qiao, Wanli Ouyang, Ziwei Liu:
Cut2Next: Generating Next Shot via In-Context Tuning. CoRR abs/2508.08244 (2025)
[i460]Yaohui Wang, Di Yang, Xinyuan Chen, François Brémond, Yu Qiao, Antitza Dantcheva:
LIA-X: Interpretable Latent Portrait Animator. CoRR abs/2508.09959 (2025)
[i459]Xiaoxue Wu, Bingjie Gao, Yu Qiao, Yaohui Wang, Xinyuan Chen:
CineTrans: Learning to Generate Videos with Cinematic Transitions via Masked Diffusion Models. CoRR abs/2508.11484 (2025)
[i458]Tianyi Zhang, Haonan Duan, Haoran Hao, Yu Qiao, Jifeng Dai, Zhi Hou:
Grounding Actions in Camera Space: Observation-Centric Vision-Language-Action Policy. CoRR abs/2508.13103 (2025)
[i457]Lei Bai, Zhongrui Cai, Yuhang Cao, Maosong Cao, Weihan Cao, Chiyu Chen, Haojiong Chen, Kai Chen, Pengcheng Chen, Ying Chen, Yongkang Chen, Yu Cheng, Pei Chu, Tao Chu, Erfei Cui, Ganqu Cui, Long Cui, Ziyun Cui, Nianchen Deng, Ning Ding, Nanqing Dong, Peijie Dong, Shihan Dou, Sinan Du, Haodong Duan, Caihua Fan, Ben Gao, Changjiang Gao, Jianfei Gao, Songyang Gao, Yang Gao, Zhangwei Gao, Jiaye Ge, Qiming Ge, Lixin Gu, Yuzhe Gu, Aijia Guo, Qipeng Guo, Xu Guo, Conghui He, Junjun He, Yili Hong, Siyuan Hou, Caiyu Hu, Hanglei Hu, Jucheng Hu, Ming Hu, Zhouqi Hua, Haian Huang, Junhao Huang, Xu Huang, Zixian Huang, Zhe Jiang, Lingkai Kong, Linyang Li, Peiji Li, Pengze Li, Shuaibin Li, Tianbin Li, Wei Li, Yuqiang Li, Dahua Lin, Junyao Lin, Tianyi Lin, Zhishan Lin, Hongwei Liu, Jiangning Liu, Jiyao Liu, Junnan Liu, Kai Liu, Kaiwen Liu, Kuikun Liu, Shichun Liu, Shudong Liu, Wei Liu, Xinyao Liu, Yuhong Liu, Zhan Liu, Yinquan Lu, Haijun Lv, Hongxia Lv, Huijie Lv, Qitan Lv, Ying Lv, Chengqi Lyu, Chenglong Ma, Jianpeng Ma, Ren Ma, Runmin Ma, Runyuan Ma, Xinzhu Ma, Yichuan Ma, Zihan Ma, Sixuan Mi, Junzhi Ning, Wenchang Ning, Xinle Pang, Jiahui Peng, Runyu Peng, Yu Qiao:
Intern-S1: A Scientific Multimodal Foundation Model. CoRR abs/2508.15763 (2025)
[i456]Weiyun Wang, Zhangwei Gao, Lixin Gu, Hengjun Pu, Long Cui, Xingguang Wei, Zhaoyang Liu, Linglin Jing, Shenglong Ye, Jie Shao, Zhaokai Wang, Zhe Chen, Hongjie Zhang, Ganlin Yang, Haomin Wang, Qi Wei, Jinhui Yin, Wenhao Li, Erfei Cui, Guanzhou Chen, Zichen Ding, Changyao Tian, Zhenyu Wu, JingJing Xie, Zehao Li, Bowen Yang, Yuchen Duan, Xuehui Wang, Zhi Hou, Haoran Hao, Tianyi Zhang, Songze Li, Xiangyu Zhao, Haodong Duan, Nianchen Deng, Bin Fu, Yinan He, Yi Wang, Conghui He, Botian Shi, Junjun He, Yingtong Xiong, Han Lv, Lijun Wu, Wenqi Shao, Kaipeng Zhang, Huipeng Deng, Biqing Qi, Jiaye Ge, Qipeng Guo, Wenwei Zhang, Songyang Zhang, Maosong Cao, Junyao Lin, Kexian Tang, Jianfei Gao, Haian Huang, Yuzhe Gu, Chengqi Lyu, Huanze Tang, Rui Wang, Haijun Lv, Wanli Ouyang, Limin Wang, Min Dou, Xizhou Zhu, Tong Lu, Dahua Lin, Jifeng Dai, Weijie Su, Bowen Zhou, Kai Chen, Yu Qiao, Wenhai Wang, Gen Luo:
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency. CoRR abs/2508.18265 (2025)
[i455]Ming Hu, Chenglong Ma, Wei Li, Wanghan Xu, Jiamin Wu, Jucheng Hu, Tianbin Li, Guohang Zhuang, Jiaqi Liu, Yingzhou Lu, Ying Chen, Chaoyang Zhang, Cheng Tan, Jie Ying, Guocheng Wu, Shujian Gao, Pengcheng Chen, Jiashi Lin, Haitao Wu, Lulu Chen, Fengxiang Wang, Yuanyuan Zhang, Xiangyu Zhao, Feilong Tang, Encheng Su, Junzhi Ning, Xinyao Liu, Ye Du, Changkai Ji, Cheng Tang, Huihui Xu, Ziyang Chen, Ziyan Huang, Jiyao Liu, Pengfei Jiang, Yizhou Wang, Chen Tang, Jianyu Wu, Yuchen Ren, Siyuan Yan, Zhonghua Wang, Zhongxing Xu, Shiyan Su, Shangquan Sun, Runkai Zhao, Zhisheng Zhang, Yu Liu, Fudi Wang, Yuanfeng Ji, Yanzhou Su, Hongming Shan, Chun-Mei Feng, Jiahao Xu, Jiangtao Yan, Wenhao Tang, Diping Song, Lihao Liu, Yanyan Huang, Lequan Yu, Bin Fu, Shujun Wang, Xiaomeng Li, Xiaowei Hu, Yun Gu, Ben Fei, Zhongying Deng, Benyou Wang, Yuewen Cao, Minjie Shen, Haodong Duan, Jie Xu, Yirong Chen, Fang Yan, Hongxia Hao, Jielan Li, Jiajun Du, Yanbo Wang, Imran Razzak, Chi Zhang, Lijun Wu, Conghui He, Zhaohui Lu, Jinhai Huang, Yihao Liu, Fenghua Ling, Yuqiang Li, Aoran Wang, Qihao Zheng, Nanqing Dong, Tianfan Fu, Dongzhan Zhou, Yan Lu, Wenlong Zhang, Jin Ye, Jianfei Cai, Wanli Ouyang, Yu Qiao, Zongyuan Ge, Shixiang Tang, Junjun He:
A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers. CoRR abs/2508.21148 (2025)
[i454]Zhaokai Wang, Penghao Yin, Xiangyu Zhao, Changyao Tian, Yu Qiao, Wenhai Wang, Jifeng Dai, Gen Luo:
GenExam: A Multidisciplinary Text-to-Image Exam. CoRR abs/2509.14232 (2025)
[i453]Zhaoyang Liu, JingJing Xie, Zichen Ding, Zehao Li, Bowen Yang, Zhenyu Wu, Xuehui Wang, Qiushi Sun, Shi Liu, Weiyun Wang, Shenglong Ye, Qingyun Li, Xuan Dong, Yue Yu, Chenyu Lu, YunXiang Mo, Yao Yan, Zeyue Tian, Xiao Zhang, Yuan Huang, Yiqian Liu, Weijie Su, Gen Luo, Xiangyu Yue, Biqing Qi, Kai Chen, Bowen Zhou, Yu Qiao, Qifeng Chen, Wenhai Wang:
ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data. CoRR abs/2509.15221 (2025)
[i452]Ziang Yan, Xinhao Li, Yinan He, Zhengrong Yue, Xiangyu Zeng, Yali Wang, Yu Qiao, Limin Wang, Yi Wang:
VideoChat-R1.5: Visual Test-Time Scaling to Reinforce Multimodal Reasoning by Iterative Perception. CoRR abs/2509.21100 (2025)
[i451]Junbo Niu, Zheng Liu, Zhuangcheng Gu, Bin Wang, Linke Ouyang, Zhiyuan Zhao, Tao Chu, Tianyao He, Fan Wu, Qintong Zhang, Zhenjiang Jin, Guang Liang, Rui Zhang, Wenzheng Zhang, Yuan Qu, Zhifei Ren, Yuefeng Sun, Yuanhong Zheng, Dongsheng Ma, Zirui Tang, Boyu Niu, Ziyang Miao, Hejun Dong, Siyi Qian, Junyuan Zhang, Jingzhou Chen, Fangdong Wang, Xiaomeng Zhao, Liqun Wei, Wei Li, Shasha Wang, Ruiliang Xu, Yuanyuan Cao, Lu Chen, Qianqian Wu, Huaiyu Gu, Lindong Lu, Keming Wang, Dechen Lin, Guanlin Shen, Xuanhe Zhou, Linfeng Zhang, Yuhang Zang, Xiaoyi Dong, Jiaqi Wang, Bo Zhang, Lei Bai, Pei Chu, Weijia Li, Jiang Wu, Lijun Wu, Zhenxiang Li, Guangyu Wang, Zhongying Tu, Chao Xu, Kai Chen, Yu Qiao, Bowen Zhou, Dahua Lin, Wentao Zhang, Conghui He:
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing. CoRR abs/2509.22186 (2025)
[i450]Yangzhou Liu, Yue Cao, Hao Li, Gen Luo, Zhe Chen, Weiyun Wang, Xiaobo Liang, Biqing Qi, Lijun Wu, Changyao Tian, Yanting Zhang, Yuqiang Li, Tong Lu, Yu Qiao, Jifeng Dai, Wenhai Wang:
Sequential Diffusion Language Models. CoRR abs/2509.24007 (2025)
[i449]Yang Chen, Minghao Liu, Yufan Shen, Yunwen Li, Tianyuan Huang, Xinyu Fang, Tianyu Zheng, Wenxuan Huang, Cheng Yang, Daocheng Fu, Jianbiao Mei, Rong Wu, Licheng Wen, Xuemeng Yang, Song Mao, Qunshu Lin, Zhi Yu, Yongliang Shen, Yu Qiao, Botian Shi:
IWR-Bench: Can LVLMs reconstruct interactive webpage from a user interaction video? CoRR abs/2509.24709 (2025)
[i448]Songze Li, Zun Wang, Gengze Zhou, Jialu Li, Xiangyu Zeng, Limin Wang, Yu Qiao, Qi Wu, Mohit Bansal, Yi Wang:
Learning Goal-Oriented Language-Guided Navigation with Self-Improving Demonstrations at Scale. CoRR abs/2509.24910 (2025)
[i447]Daocheng Fu, Jianbiao Mei, Licheng Wen, Xuemeng Yang, Cheng Yang, Rong Wu, Tao Hu, Siqi Li, Yufan Shen, Xinyu Cai, Pinlong Cai, Botian Shi, Yong Liu, Yu Qiao:
RE-Searcher: Robust Agentic Search with Goal-oriented Planning and Self-reflection. CoRR abs/2509.26048 (2025)
[i446]Yi Xin, Qi Qin, Siqi Luo, Kaiwen Zhu, Juncheng Yan, Yan Tai, Jiayi Lei, Yuewen Cao, Keqi Wang, Yibin Wang, Jinbin Bai, Qian Yu, Dengyang Jiang, Yuandong Pu, Haoxing Chen, Le Zhuo, Junjun He, Gen Luo, Tianbin Li, Ming Hu, Jin Ye, Shenglong Ye, Bo Zhang, Chang Xu, Wenhai Wang, Hongsheng Li, Guangtao Zhai, Tianfan Xue, Bin Fu, Xiaohong Liu, Yu Qiao, Yihao Liu:
Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding. CoRR abs/2510.06308 (2025)
[i445]Cheng Yang, Xuemeng Yang, Licheng Wen, Daocheng Fu, Jianbiao Mei, Rong Wu, Pinlong Cai, Yufan Shen, Nianchen Deng, Botian Shi, Yu Qiao, Haifeng Li:
Learning on the Job: An Experience-Driven Self-Evolving Agent for Long-Horizon Tasks. CoRR abs/2510.08002 (2025)
[i444]Yuan Xu, Zimu Zhang, Xiaoxuan Ma, Wentao Zhu, Yu Qiao, Yizhou Wang:
Seeing My Future: Predicting Situated Interaction Behavior in Virtual Reality. CoRR abs/2510.10742 (2025)
[i443]Ganlin Yang, Tianyi Zhang, Haoran Hao, Weiyun Wang, Yibin Liu, Dehui Wang, Guanzhou Chen, Zijian Cai, Junting Chen, Weijie Su, Wengang Zhou, Yu Qiao, Jifeng Dai, Jiangmiao Pang, Gen Luo, Wenhai Wang, Yao Mu, Zhi Hou:
Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning. CoRR abs/2510.11027 (2025)
[i442]Haomin Wang, Jinhui Yin, Qi Wei, Wenguang Zeng, Lixin Gu, Shenglong Ye, Zhangwei Gao, Yaohui Wang, Yanting Zhang, Yuanqi Li, Yanwen Guo, Wenhai Wang, Kai Chen, Yu Qiao, Hongjie Zhang:
InternSVG: Towards Unified SVG Tasks with Multimodal Large Language Models. CoRR abs/2510.11341 (2025)
[i441]Yicheng Xu, Yue Wu, Jiashuo Yu, Ziang Yan, Tianxiang Jiang, Yinan He, Qingsong Zhao, Kai Chen, Yu Qiao, Limin Wang, Manabu Okumura, Yi Wang:
ExpVid: A Benchmark for Experiment Video Understanding & Reasoning. CoRR abs/2510.11606 (2025)
[i440]Zhenxin Lei, Zhangwei Gao, Changyao Tian, Erfei Cui, Guanzhou Chen, Danni Yang, Yuchen Duan, Zhaokai Wang, Wenhao Li, Weiyun Wang, Xiangyu Zhao, Jiayi Ji, Yu Qiao, Wenhai Wang, Gen Luo:
MetaCaptioner: Towards Generalist Visual Captioning with Open-source Suites. CoRR abs/2510.12126 (2025)
[i439]Long Cui, Weiyun Wang, Jie Shao, Zichen Wen, Gen Luo, Linfeng Zhang, Yanting Zhang, Yu Qiao, Wenhai Wang:
ViCO: A Training Strategy towards Semantic Aware Dynamic High-Resolution. CoRR abs/2510.12793 (2025)
[i438]Kai Zou, Ziqi Huang, Yuhao Dong, Shulin Tian, Dian Zheng, Hongbo Liu, Jingwen He, Bin Liu, Yu Qiao, Ziwei Liu:
Uni-MMMU: A Massive Multi-discipline Multimodal Unified Benchmark. CoRR abs/2510.13759 (2025)
[i437]Xinyi Chen, Yilun Chen, Yanwei Fu, Ning Gao, Jiaya Jia, Weiyang Jin, Hao Li, Yao Mu, Jiangmiao Pang, Yu Qiao, Yang Tian, Bin Wang, Bolun Wang, Fangjing Wang, Hanqing Wang, Tai Wang, Ziqin Wang, Xueyuan Wei, Chao Wu, Shuai Yang, Jinhui Ye, Junqiu Yu, Jia Zeng, Jingjing Zhang, Jinyu Zhang, Shi Zhang, Feng Zheng, Bowen Zhou, Yangkun Zhu:
InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy. CoRR abs/2510.13778 (2025)
[i436]Yuandong Pu, Le Zhuo, Songhao Han, Jinbo Xing, Kaiwen Zhu, Shuo Cao, Bin Fu, Si Liu, Hongsheng Li, Yu Qiao, Wenlong Zhang, Xi Chen, Yihao Liu:
PICABench: How Far Are We from Physically Realistic Image Editing? CoRR abs/2510.17681 (2025)
[i435]Bingjie Gao, Qianli Ma, Xiaoxue Wu, Shuai Yang, Guanzhou Lan, Haonan Zhao, Jiaxuan Chen, Qingyang Liu, Yu Qiao, Xinyuan Chen, Yaohui Wang, Li Niu:
RAPO++: Cross-Stage Prompt Optimization for Text-to-Video Generation via Data Alignment and Test-Time Scaling. CoRR abs/2510.20206 (2025)
[i434]Baoqi Pei, Yifei Huang, Jilan Xu, Yuping He, Guo Chen, Fei Wu, Yu Qiao, Jiangmiao Pang:
EgoThinker: Unveiling Egocentric Reasoning with Spatio-Temporal CoT. CoRR abs/2510.23569 (2025)
- 2024
[j110]Zhe Chen, Weiyun Wang, Hao Tian, Shenglong Ye, Zhangwei Gao, Erfei Cui, Wenwen Tong, Kongzhi Hu, Jiapeng Luo, Zheng Ma, Ji Ma, Jiaqi Wang, Xiaoyi Dong, Hang Yan, Hewei Guo, Conghui He, Botian Shi, Zhenjiang Jin, Chao Xu, Bin Wang, Xingjian Wei, Wei Li, Wenjian Zhang, Bo Zhang, Pinlong Cai, Licheng Wen, Xiangchao Yan, Min Dou, Lewei Lu, Xizhou Zhu, Tong Lu, Dahua Lin, Yu Qiao, Jifeng Dai, Wenhai Wang:
How far are we to GPT-4V? Closing the gap to commercial multimodal models with open-source suites. Sci. China Inf. Sci. 67(12) (2024)
[j109]Yangzhou Liu, Yue Cao, Zhangwei Gao, Weiyun Wang, Zhe Chen, Wenhai Wang, Hao Tian, Lewei Lu, Xizhou Zhu, Tong Lu, Yu Qiao, Jifeng Dai:
MMInstruct: a high-quality multi-modal instruction tuning dataset with extensive diversity. Sci. China Inf. Sci. 67(12) (2024)
[j108]Yihao Liu, Hengyuan Zhao, Kelvin C. K. Chan, Xintao Wang, Chen Change Loy, Yu Qiao, Chao Dong:
Temporally consistent video colorization with deep feature propagation and self-regularization learning. Comput. Vis. Media 10(2): 375-395 (2024)
[j107]Peng Gao, Shijie Geng, Renrui Zhang, Teli Ma, Rongyao Fang, Yongfeng Zhang, Hongsheng Li, Yu Qiao:
CLIP-Adapter: Better Vision-Language Models with Feature Adapters. Int. J. Comput. Vis. 132(2): 581-595 (2024)
[j106]Kaiyang Zhou, Yongxin Yang, Yu Qiao, Tao Xiang:
MixStyle Neural Networks for Domain Generalization and Adaptation. Int. J. Comput. Vis. 132(3): 822-836 (2024)
[j105]Peng Gao, Ziyi Lin, Renrui Zhang, Rongyao Fang, Hongyang Li, Hongsheng Li, Yu Qiao:
Mimic before Reconstruct: Enhancing Masked Autoencoders with Feature Mimicking. Int. J. Comput. Vis. 132(5): 1546-1556 (2024)
[j104]Haibin He, Xinyuan Chen, Chaoyue Wang, Juhua Liu, Bo Du, Dacheng Tao, Yu Qiao:
Diff-Font: Diffusion Model for Robust One-Shot Font Generation. Int. J. Comput. Vis. 132(11): 5372-5386 (2024)
[j103]Hao Zhang, Lumin Xu, Shenqi Lai, Wenqi Shao, Nanning Zheng, Ping Luo, Yu Qiao, Kaipeng Zhang:
Open-Vocabulary Animal Keypoint Detection with Semantic-Feature Matching. Int. J. Comput. Vis. 132(12): 5741-5758 (2024)
[j102]Yuhui Wang, Yahan Xie, Yu Qiao, Zhaohui Xia, Yanying Chen:
Chinese CSUQ: Cross-Cultural Adaptation and Evaluation of Measurement Properties. Int. J. Hum. Comput. Interact. 40(22): 7623-7641 (2024)
[j101]Yi Liu, Yu Qiao, Yali Wang:
F2S-Net: learning frame-to-segment prediction for online action detection. J. Real Time Image Process. 21(3): 73 (2024)
[j100]Hongyang Li, Chonghao Sima, Jifeng Dai, Wenhai Wang, Lewei Lu, Huijie Wang, Jia Zeng, Zhiqi Li, Jiazhi Yang, Hanming Deng, Hao Tian, Enze Xie, Jiangwei Xie, Li Chen, Tianyu Li, Yang Li, Yulu Gao, Xiaosong Jia, Si Liu, Jianping Shi, Dahua Lin, Yu Qiao:
Delving Into the Devils of Bird's-Eye-View Perception: A Review, Evaluation and Recipe. IEEE Trans. Pattern Anal. Mach. Intell. 46(4): 2151-2170 (2024)
[j99]Yuexin Ma, Tai Wang, Xuyang Bai, Huitong Yang, Yuenan Hou, Yaming Wang, Yu Qiao, Ruigang Yang, Xinge Zhu:
Vision-Centric BEV Perception: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 46(12): 10978-10997 (2024)
[j98]Yingqi Liu, Jingwen He, Yihao Liu, Xinqi Lin, Fanghua Yu, Jinfan Hu, Yu Qiao, Chao Dong:
AdaptBIR: Adaptive Blind Image Restoration with latent diffusion prior for higher fidelity. Pattern Recognit. 155: 110659 (2024)
[j97]Mingfei Han, Yali Wang, Mingjie Li, Xiaojun Chang, Yi Yang, Yu Qiao:
Progressive Frame-Proposal Mining for Weakly Supervised Video Object Detection. IEEE Trans. Image Process. 33: 1560-1573 (2024)
[j96]Siran Chen, Qinglin Xu, Yue Ma, Yu Qiao, Yali Wang:
Attentive Snippet Prompting for Video Retrieval. IEEE Trans. Multim. 26: 4348-4359 (2024)
[j95]Yuer Ma, Yi Liu, Limin Wang, Wenxiong Kang, Yu Qiao, Yali Wang:
Dual Masked Modeling for Weakly-Supervised Temporal Boundary Discovery. IEEE Trans. Multim. 26: 5694-5704 (2024)
[j94]Mingye Xu, Zhipeng Zhou, Hongbin Xu, Yu Qiao, Yali Wang:
CP-Net: Contour-Perturbed Reconstruction Network for Self-Supervised Point Cloud Learning. IEEE Trans. Multim. 26: 8799-8810 (2024)
[j93]Zhangwei Gao, Zhe Chen, Erfei Cui, Yiming Ren, Weiyun Wang, Jinguo Zhu, Hao Tian, Shenglong Ye, Junjun He, Xizhou Zhu, Lewei Lu, Tong Lu, Yu Qiao, Jifeng Dai, Wenhai Wang:
Mini-InternVL: a flexible-transfer pocket multi-modal model with 5% parameters and 90% performance. Vis. Intell. 2(1): 32 (2024)
[c392]Siran Chen, Yue Ma, Yu Qiao, Yali Wang:
M-BEV: Masked BEV Perception for Robust Autonomous Driving. AAAI 2024: 1183-1191
[c391]Ziteng Cui, Lin Gu, Xiao Sun, Xianzheng Ma, Yu Qiao, Tatsuya Harada:
Aleth-NeRF: Illumination Adaptive NeRF with Concealing Field Assumption. AAAI 2024: 1435-1444
[c390]Bo Peng, Xinyuan Chen, Yaohui Wang, Chaochao Lu, Yu Qiao:
ConditionVideo: Training-Free Condition-Guided Video Generation. AAAI 2024: 4459-4467
[c389]Wenshuo Peng, Kaipeng Zhang, Yue Yang, Hao Zhang, Yu Qiao:
Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification. AAAI 2024: 4506-4514
[c388]Shilin Yan, Renrui Zhang, Ziyu Guo, Wenchao Chen, Wei Zhang, Hongyang Li, Yu Qiao, Hao Dong, Zhongjiang He, Peng Gao:
Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation. AAAI 2024: 6449-6457
[c387]Lingjun Zhang, Xinyuan Chen, Yaohui Wang, Yue Lu, Yu Qiao:
Brush Your Text: Synthesize Any Scene Text on Images via Diffusion Model. AAAI 2024: 7215-7223
[c386]Yuanfu Wang, Chao Yang, Ying Wen, Yu Liu, Yu Qiao:
Critic-Guided Decision Transformer for Offline Reinforcement Learning. AAAI 2024: 15706-15714
[c385]Yan Ma, Yu Qiao, Pengfei Liu:
MoPS: Modular Story Premise Synthesis for Open-Ended Automatic Story Generation. ACL (1) 2024: 2135-2169
[c384]Lijun Li, Bowen Dong, Ruohui Wang, Xuhao Hu, Wangmeng Zuo, Dahua Lin, Yu Qiao, Jing Shao:
SALAD-Bench: A Hierarchical and Comprehensive Safety Benchmark for Large Language Models. ACL (Findings) 2024: 3923-3954
[c383]Chen Qian, Jie Zhang, Wei Yao, Dongrui Liu, Zhenfei Yin, Yu Qiao, Yong Liu, Jing Shao:
Towards Tracing Trustworthiness Dynamics: Revisiting Pre-training Period of Large Language Models. ACL (Findings) 2024: 4864-4888
[c382]Guoxin Chen, Kexin Tang, Chao Yang, Fuying Ye, Yu Qiao, Yiming Qian:
SEER: Facilitating Structured Reasoning and Explanation via Reinforcement Learning. ACL (1) 2024: 5901-5921
[c381]Fanqing Meng, Wenqi Shao, Quanfeng Lu, Peng Gao, Kaipeng Zhang, Yu Qiao, Ping Luo:
ChartAssistant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning. ACL (Findings) 2024: 7775-7803
[c380]Zhanhui Zhou, Jie Liu, Jing Shao, Xiangyu Yue, Chao Yang, Wanli Ouyang, Yu Qiao:
Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization. ACL (Findings) 2024: 10586-10613
[c379]Fangzhi Xu, Zhiyong Wu, Qiushi Sun, Siyu Ren, Fei Yuan, Shuai Yuan, Qika Lin, Yu Qiao, Jun Liu:
Symbol-LLM: Towards Foundational Symbol-centric Interface For Large Language Models. ACL (1) 2024: 13091-13116
[c378]Zaibin Zhang, Yongting Zhang, Lijun Li, Jing Shao, Hongzhi Gao, Yu Qiao, Lijun Wang, Huchuan Lu, Feng Zhao:
PsySafe: A Comprehensive Framework for Psychological-based Attack, Defense, and Evaluation of Multi-agent System Safety. ACL (1) 2024: 15202-15231
[c377]Zhanhui Zhou, Jie Liu, Zhichen Dong, Jiaheng Liu, Chao Yang, Wanli Ouyang, Yu Qiao:
Emulated Disalignment: Safety Alignment for Large Language Models May Backfire! ACL (1) 2024: 15810-15830
[c376]Yuan Xu, Xiaoxuan Ma, Jiajun Su, Wentao Zhu, Yu Qiao, Yizhou Wang:
ScoreHypo: Probabilistic Human Mesh Estimation with Hypothesis Scoring. CVPR 2024: 979-989
[c375]Xiaoliang Ju, Zhaoyang Huang, Yijiin Li, Guofeng Zhang, Yu Qiao, Hongsheng Li:
DiffInDScene: Diffusion-Based High-Quality 3D Indoor Scene Generation. CVPR 2024: 4526-4535
[c374]Xiaoyang Wu, Li Jiang, Peng-Shuai Wang, Zhijian Liu, Xihui Liu, Yu Qiao, Wanli Ouyang, Tong He, Hengshuang Zhao:
Point Transformer V3: Simpler, Faster, Stronger. CVPR 2024: 4840-4851
[c373]Yuwen Xiong, Zhiqi Li, Yuntao Chen, Feng Wang, Xizhou Zhu, Jiapeng Luo, Wenhai Wang, Tong Lu, Hongsheng Li, Yu Qiao, Lewei Lu, Jie Zhou, Jifeng Dai:
Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications. CVPR 2024: 5652-5661
[c372]Ziyan Chen, Jingwen He, Xinqi Lin, Yu Qiao, Chao Dong:
Towards Real-world Video Face Restoration: A New Benchmark. CVPR Workshops 2024: 5929-5939
[c371]Lirui Zhao, Yue Yang, Kaipeng Zhang, Wenqi Shao, Yuxin Zhang, Yu Qiao, Ping Luo, Rongrong Ji:
DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model. CVPR 2024: 6390-6399
[c370]Yuming Jiang, Tianxing Wu, Shuai Yang, Chenyang Si, Dahua Lin, Yu Qiao, Chen Change Loy, Ziwei Liu:
VideoBooth: Diffusion-based Video Generation with Image Prompts. CVPR 2024: 6689-6700
[c369]Bin Fu, Fanghua Yu, Anran Liu, Zixuan Wang, Jie Wen, Junjun He, Yu Qiao:
Generate Like Experts: Multi-Stage Font Generation by Incorporating Font Transfer Process into Diffusion Models. CVPR 2024: 6892-6901
[c368]Shaobin Zhuang, Kunchang Li, Xinyuan Chen, Yaohui Wang, Ziwei Liu, Yu Qiao, Yali Wang:
Vlogger: Make Your Dream A Vlog. CVPR 2024: 8806-8817
[c367]Zehuan Huang, Hao Wen, Junting Dong, Yaohui Wang, Yangguang Li, Xinyuan Chen, Yan-Pei Cao, Ding Liang, Yu Qiao, Bo Dai, Lu Sheng:
EpiDiff: Enhancing Multi-View Synthesis via Localized Epipolar-Constrained Diffusion. CVPR 2024: 9784-9794
[c366]Bo Zou, Chao Yang, Yu Qiao, Chengbin Quan, Youjian Zhao:
LLaMA-Excitor: General Instruction Tuning via Indirect Feature Interaction. CVPR 2024: 14089-14099
[c365]Jiazhi Yang, Shenyuan Gao, Yihang Qiu, Li Chen, Tianyu Li, Bo Dai, Kashyap Chitta, Penghao Wu, Jia Zeng, Ping Luo, Jun Zhang, Andreas Geiger, Yu Qiao, Hongyang Li:
Generalized Predictive Model for Autonomous Driving. CVPR 2024: 14662-14672
[c364]Yiran Qin, Enshen Zhou, Qichang Liu, Zhenfei Yin, Lu Sheng, Ruimao Zhang, Yu Qiao, Jing Shao:
MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception. CVPR 2024: 16307-16316
[c363]Hao Li, Xue Yang, Zhaokai Wang, Xizhou Zhu, Jie Zhou, Yu Qiao, Xiaogang Wang, Hongsheng Li, Lewei Lu, Jifeng Dai:
Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft. CVPR 2024: 16426-16435
[c362]Yi Yu, Xue Yang, Qingyun Li, Feipeng Da, Jifeng Dai, Yu Qiao, Junchi Yan:
Point2RBox: Combine Knowledge from Synthetic Visual Patterns for End-to-End Oriented Object Detection with Single Point Supervision. CVPR 2024: 16783-16793
[c361]Zhiyu Zhao, Bingkun Huang, Sen Xing, Gangshan Wu, Yu Qiao, Limin Wang:
Asymmetric Masked Distillation for Pre-Training Small Foundation Models. CVPR 2024: 18516-18526
[c360]Hao Wu, Huabin Liu, Yu Qiao, Xiao Sun:
DIBS: Enhancing Dense Video Captioning with Unlabeled Videos via Pseudo Boundary Enrichment and Online Refinement. CVPR 2024: 18699-18708
[c359]Ziqi Huang, Yinan He, Jiashuo Yu, Fan Zhang, Chenyang Si, Yuming Jiang, Yuanhan Zhang, Tianxing Wu, Qingyang Jin, Nattapol Chanpaisit, Yaohui Wang, Xinyuan Chen, Limin Wang, Dahua Lin, Yu Qiao, Ziwei Liu:
VBench: Comprehensive Benchmark Suite for Video Generative Models. CVPR 2024: 21807-21818
[c358]Yifei Huang, Guo Chen, Jilan Xu, Mingfang Zhang, Lijin Yang, Baoqi Pei, Hongjie Zhang, Lu Dong, Yali Wang, Limin Wang, Yu Qiao:
EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World. CVPR 2024: 22072-22086
[c357]Yutao Hu, Tianbin Li, Quanfeng Lu, Wenqi Shao, Junjun He, Yu Qiao, Ping Luo:
OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM. CVPR 2024: 22170-22183
[c356]Kunchang Li, Yali Wang, Yinan He, Yizhuo Li, Yi Wang, Yi Liu, Zun Wang, Jilan Xu, Guo Chen, Ping Lou, Limin Wang, Yu Qiao:
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark. CVPR 2024: 22195-22206
[c355]Zhe Chen, Jiannan Wu, Wenhai Wang, Weijie Su, Guo Chen, Sen Xing, Muyan Zhong, Qinglong Zhang, Xizhou Zhu, Lewei Lu, Bin Li, Ping Luo, Tong Lu, Yu Qiao, Jifeng Dai:
Intern VL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks. CVPR 2024: 24185-24198
[c354]Fanghua Yu, Jinjin Gu, Zheyuan Li, Jinfan Hu, Xiangtao Kong, Xintao Wang, Jingwen He, Yu Qiao, Chao Dong:
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild. CVPR 2024: 25669-25680
[c353]Yufei Wang, Wenhan Yang, Xinyuan Chen, Yaohui Wang, Lanqing Guo, Lap-Pui Chau, Ziwei Liu, Yu Qiao, Alex C. Kot, Bihan Wen:
SinSR: Diffusion-Based Image Super-Resolution in a Single Step. CVPR 2024: 25796-25805
[c352]Jiaming Han, Kaixiong Gong, Yiyuan Zhang, Jiaqi Wang, Kaipeng Zhang, Dahua Lin, Yu Qiao, Peng Gao, Xiangyu Yue:
OneLLM: One Framework to Align All Modalities with Language. CVPR 2024: 26574-26585
[c351]Bo Zou, Chao Yang, Yu Qiao, Chengbin Quan, Youjian Zhao:
Language-aware Visual Semantic Distillation for Video Question Answering. CVPR 2024: 27103-27113
[c350]Ziyi Lin, Dongyang Liu, Renrui Zhang, Peng Gao, Longtian Qiu, Han Xiao, Han Qiu, Wenqi Shao, Keqin Chen, Jiaming Han, Siyuan Huang, Yichi Zhang, Xuming He, Yu Qiao, Hongsheng Li:
SPHINX: A Mixer of Weights, Visual Embeddings and Image Scales for Multi-modal Large Language Models. ECCV (62) 2024: 36-55
[c349]Yuchen Yang, Yu Qiao, Xiao Sun:
Mask as Supervision: Leveraging Unified Mask Information for Unsupervised 3D Pose Estimation. ECCV (44) 2024: 38-55
[c348]Haoyu Wang, Sizheng Guo, Jin Ye, Zhongying Deng, Junlong Cheng, Tianbin Li, Jianpin Chen, Yanzhou Su, Ziyan Huang, Yiqing Shen, Bin Fu, Shaoting Zhang, Junjun He, Yu Qiao:
SAM-Med3D: Towards General-Purpose Segmentation Models for Volumetric Medical Images. ECCV Workshops (16) 2024: 51-67
[c347]Shuo Cao, Yihao Liu, Wenlong Zhang, Yu Qiao, Chao Dong:
GRIDS: Grouped Multiple-Degradation Restoration with Image Degradation Similarity. ECCV (70) 2024: 70-87
[c346]Xiangyu Chen, Zheyuan Li, Yuandong Pu, Yihao Liu, Jiantao Zhou, Yu Qiao, Chao Dong:
A Comparative Study of Image Restoration Networks for General Backbone Network Design. ECCV (71) 2024: 74-91
[c345]Zhaoyang Liu, Zeqiang Lai, Zhangwei Gao, Erfei Cui, Ziheng Li, Xizhou Zhu, Lewei Lu, Qifeng Chen, Yu Qiao, Jifeng Dai, Wenhai Wang:
ControlLLM: Augment Language Models with Tools by Searching on Graphs. ECCV (12) 2024: 89-105
[c344]Yunsong Zhou, Linyan Huang, Qingwen Bu, Jia Zeng, Tianyu Li, Hang Qiu, Hongzi Zhu, Minyi Guo, Yu Qiao, Hongyang Li:
Embodied Understanding of Driving Scenarios. ECCV (62) 2024: 129-148
[c343]Gang Li, Wenhai Wang, Xiang Li, Ziheng Li, Jian Yang, Jifeng Dai, Yu Qiao, Shanshan Zhang:
Distilling Knowledge from Large-Scale Image Models for Object Detection. ECCV (84) 2024: 142-160
[c342]Renrui Zhang, Dongzhi Jiang, Yichi Zhang, Haokun Lin, Ziyu Guo, Pengshuo Qiu, Aojun Zhou, Pan Lu, Kai-Wei Chang, Yu Qiao, Peng Gao, Hongsheng Li:
MATHVERSE: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems? ECCV (8) 2024: 169-186
[c341]Jiakang Yuan, Bo Zhang, Kaixiong Gong, Xiangyu Yue, Botian Shi, Yu Qiao, Tao Chen:
Reg-TTA3D: Better Regression Makes Better Test-Time Adaptive 3D Object Detection. ECCV (43) 2024: 197-213
[c340]Kunchang Li, Xinhao Li, Yi Wang, Yinan He, Yali Wang, Limin Wang, Yu Qiao:
VideoMamba: State Space Model for Efficient Video Understanding. ECCV (26) 2024: 237-255
[c339]Zhihang Zhong, Gurunandan Krishnan, Xiao Sun, Yu Qiao, Sizhuo Ma, Jian Wang:
Clearer Frames, Anytime: Resolving Velocity Ambiguity in Video Frame Interpolation. ECCV (33) 2024: 346-363
[c338]Xin Liu, Yichen Zhu, Jindong Gu, Yunshi Lan, Chao Yang, Yu Qiao:
MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models. ECCV (56) 2024: 386-403
[c337]Yi Wang, Kunchang Li, Xinhao Li, Jiashuo Yu, Yinan He, Guo Chen, Baoqi Pei, Rongkun Zheng, Zun Wang, Yansong Shi, Tianxiang Jiang, Songze Li, Jilan Xu, Hongjie Zhang, Yifei Huang, Yu Qiao, Yali Wang, Limin Wang:
InternVideo2: Scaling Foundation Models for Multimodal Video Understanding. ECCV (85) 2024: 396-416
[c336]Xinqi Lin, Jingwen He, Ziyan Chen, Zhaoyang Lyu, Bo Dai, Fanghua Yu, Yu Qiao, Wanli Ouyang, Chao Dong:
DiffBIR: Toward Blind Image Restoration with Generative Diffusion Prior. ECCV (59) 2024: 430-448
[c335]Weiyun Wang, Yiming Ren, Haowen Luo, Tiantong Li, Chenxiang Yan, Zhe Chen, Wenhai Wang, Qingyun Li, Lewei Lu, Xizhou Zhu, Yu Qiao, Jifeng Dai:
The All-Seeing Project V2: Towards General Relation Comprehension of the Open World. ECCV (33) 2024: 471-490
[c334]Yutong Chen, Yifan Zhan, Zhihang Zhong, Wei Wang, Xiao Sun, Yu Qiao, Yinqiang Zheng:
Within the Dynamic Context: Inertia-Aware 3D Human Modeling with Pose Sequence. ECCV (49) 2024: 491-508
[c333]Zhixuan Liu, Zhanhui Zhou, Yuanfu Wang, Chao Yang, Yu Qiao:
Inference-Time Language Model Alignment via Integrated Value Guidance. EMNLP (Findings) 2024: 4181-4195
[c332]Yinquan Lu, Wenhao Zhu, Lei Li, Yu Qiao, Fei Yuan:
LLaMAX: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages. EMNLP (Findings) 2024: 10748-10772
[c331]Zhaoxun Ju, Chao Yang, Fuchun Sun, Hongbo Wang, Yu Qiao:
Rethinking Mutual Information for Language Conditioned Skill Discovery on Imitation Learning. ICAPS 2024: 301-309
[c330]Yue Yang, Kaipeng Zhang, Yuying Ge, Wenqi Shao, Zeyue Xue, Yu Qiao, Ping Luo:
Align, Adapt and Inject: Audio-Guided Image Generation, Editing and Stylization. ICASSP 2024: 3475-3479
[c329]Yuwei Guo, Ceyuan Yang, Anyi Rao, Zhengyang Liang, Yaohui Wang, Yu Qiao, Maneesh Agrawala, Dahua Lin, Bo Dai:
AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning. ICLR 2024
[c328]Bo Zhang, Xinyu Cai, Jiakang Yuan, Donglin Yang, Jianfei Guo, Xiangchao Yan, Renqiu Xia, Botian Shi, Min Dou, Tao Chen, Si Liu, Junchi Yan, Yu Qiao:
ReSimAD: Zero-Shot 3D Domain Transfer for Autonomous Driving with Source Reconstruction and Target Simulation. ICLR 2024
[c327]Xinyuan Chen, Yaohui Wang, Lingjun Zhang, Shaobin Zhuang, Xin Ma, Jiashuo Yu, Yali Wang, Dahua Lin, Yu Qiao, Ziwei Liu:
SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction. ICLR 2024
[c326]Mengkang Hu, Yao Mu, Xinmiao Yu, Mingyu Ding, Shiguang Wu, Wenqi Shao, Qiguang Chen, Bin Wang, Yu Qiao, Ping Luo:
Tree-Planner: Efficient Close-loop Task Planning with Large Language Models. ICLR 2024
[c325]Wenqi Shao, Mengzhao Chen, Zhaoyang Zhang, Peng Xu, Lirui Zhao, Zhiqian Li, Kaipeng Zhang, Peng Gao, Yu Qiao, Ping Luo:
OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models. ICLR 2024
[c324]Weigao Sun, Zhen Qin, Weixuan Sun, Shidi Li, Dong Li, Xuyang Shen, Yu Qiao, Yiran Zhong:
CO2: Efficient Distributed Training with Full Communication-Computation Overlap. ICLR 2024
[c323]Weiyun Wang, Min Shi, Qingyun Li, Wenhai Wang, Zhenhang Huang, Linjie Xing, Zhe Chen, Hao Li, Xizhou Zhu, Zhiguo Cao, Yushi Chen, Tong Lu, Jifeng Dai, Yu Qiao:
The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World. ICLR 2024
[c322]Yi Wang, Yinan He, Yizhuo Li, Kunchang Li, Jiashuo Yu, Xin Ma, Xinhao Li, Guo Chen, Xinyuan Chen, Yaohui Wang, Ping Luo, Ziwei Liu, Yali Wang, Limin Wang, Yu Qiao:
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation. ICLR 2024
[c321]Licheng Wen, Daocheng Fu, Xin Li, Xinyu Cai, Tao Ma, Pinlong Cai, Min Dou, Botian Shi, Liang He, Yu Qiao:
DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models. ICLR 2024
[c320]Peng Xu, Wenqi Shao, Mengzhao Chen, Shitao Tang, Kaipeng Zhang, Peng Gao, Fengwei An, Yu Qiao, Ping Luo:
BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation. ICLR 2024
[c319]Renrui Zhang, Zhengkai Jiang, Ziyu Guo, Shilin Yan, Junting Pan, Hao Dong, Yu Qiao, Peng Gao, Hongsheng Li:
Personalize Segment Anything Model with One Shot. ICLR 2024
[c318]Renrui Zhang, Jiaming Han, Chris Liu, Aojun Zhou, Pan Lu, Yu Qiao, Hongsheng Li, Peng Gao:
LLaMA-Adapter: Efficient Fine-tuning of Large Language Models with Zero-initialized Attention. ICLR 2024
[c317]Wenlong Zhang, Xiaohui Li, Xiangyu Chen, Xiaoyun Zhang, Yu Qiao, Xiao-Ming Wu, Chao Dong:
SEAL: A Framework for Systematic Evaluation of Real-World Super-Resolution. ICLR 2024
[c316]Mingzhou Liu, Xinwei Sun, Yu Qiao, Yizhou Wang:
Causal Discovery via Conditional Independence Testing with Proxy Variables. ICML 2024
[c315]Yihao Liu, Xiangyu Chen, Xianzheng Ma, Xintao Wang, Jiantao Zhou, Yu Qiao, Chao Dong:
Unifying Image Processing as Visual Prompting Question Answering. ICML 2024
[c314]Yao Mu, Junting Chen, Qinglong Zhang, Shoufa Chen, Qiaojun Yu, Chongjian Ge, Runjian Chen, Zhixuan Liang, Mengkang Hu, Chaofan Tao, Peize Sun, Haibao Yu, Chao Yang, Wenqi Shao, Wenhai Wang, Jifeng Dai, Yu Qiao, Mingyu Ding, Ping Luo:
RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis. ICML 2024
[c313]Dongyang Liu, Renrui Zhang, Longtian Qiu, Siyuan Huang, Weifeng Lin, Shitian Zhao, Shijie Geng, Ziyi Lin, Peng Jin, Kaipeng Zhang, Wenqi Shao, Chao Xu, Conghui He, Junjun He, Hao Shao, Pan Lu, Yu Qiao, Hongsheng Li, Peng Gao:
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models. ICML 2024
[c312]Yue Yang, Yuqi Lin, Hong Liu, Wenqi Shao, Runjian Chen, Hailong Shang, Yu Wang, Yu Qiao, Kaipeng Zhang, Ping Luo:
Position: Towards Implicit Prompt For Text-To-Image Models. ICML 2024
[c311]Kaining Ying, Fanqing Meng, Jin Wang, Zhiqian Li, Han Lin, Yue Yang, Hao Zhang, Wenbo Zhang, Yuqi Lin, Shuo Liu, Jiayi Lei, Quanfeng Lu, Runjian Chen, Peng Xu, Renrui Zhang, Haozhe Zhang, Peng Gao, Yali Wang, Yu Qiao, Ping Luo, Kaipeng Zhang, Wenqi Shao:
MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI. ICML 2024
[c310]Yu Qiao, Xiaohui Yang, Jing Wang, Tongzhen Si, Qingbei Guo:
Driver Cognitive Distraction Detection Based on Eye Movement Behavior and Spatio-Temporal Information Fusion. ICONIP (10) 2024: 92-107
[c309]Xin Liu, Yichen Zhu, Yunshi Lan, Chao Yang, Yu Qiao:
Safety of Multimodal Large Language Models on Images and Text. IJCAI 2024: 8151-8159
[c308]Daocheng Fu, Wenjie Lei, Licheng Wen, Pinlong Cai, Song Mao, Min Dou, Botian Shi, Yu Qiao:
LimSim++: A Closed-Loop Platform for Deploying Multimodal LLMs in Autonomous Driving. IV 2024: 1084-1090
[c307]Xiangyu Chen, Yihao Liu, Yuandong Pu, Wenlong Zhang, Jiantao Zhou, Yu Qiao, Chao Dong:
Learning A Low-Level Vision Generalist via Visual Task Prompt. ACM Multimedia 2024: 2671-2680
[c306]Yixu Wang, Yan Teng, Kexin Huang, Chengqi Lyu, Songyang Zhang, Wenwei Zhang, Xingjun Ma, Yu-Gang Jiang, Yu Qiao, Yingchun Wang:
Fake Alignment: Are LLMs Really Aligned Well? NAACL-HLT 2024: 4696-4712
[c305]Zhichen Dong, Zhanhui Zhou, Chao Yang, Jing Shao, Yu Qiao:
Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey. NAACL-HLT 2024: 6734-6747
[c304]Tao Ma, Hongbin Zhou, Qiusheng Huang, Xuemeng Yang, Jianfei Guo, Bo Zhang, Min Dou, Yu Qiao, Botian Shi, Hongsheng Li:
ZOPP: A Framework of Zero-shot Offboard Panoptic Perception for Autonomous Driving. NeurIPS 2024
[c303]Lin Chen, Xilin Wei, Jinsong Li, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Zehui Chen, Haodong Duan, Lin Bin, Zhenyu Tang, Li Yuan, Yu Qiao, Dahua Lin, Feng Zhao, Jiaqi Wang:
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions. NeurIPS 2024
[c302]Lin Chen, Jinsong Li, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Zehui Chen, Haodong Duan, Jiaqi Wang, Yu Qiao, Dahua Lin, Feng Zhao:
Are We on the Right Way for Evaluating Large Vision-Language Models? NeurIPS 2024
[c301]Pengcheng Chen, Jin Ye, Guoan Wang, Yanjun Li, Zhongying Deng, Wei Li, Tianbin Li, Haodong Duan, Ziyan Huang, Yanzhou Su, Benyou Wang, Shaoting Zhang, Bin Fu, Jianfei Cai, Bohan Zhuang, Eric J. Seibel, Junjun He, Yu Qiao:
GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI. NeurIPS 2024
[c300]Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Bin Wang, Linke Ouyang, Songyang Zhang, Haodong Duan, Wenwei Zhang, Yining Li, Hang Yan, Yang Gao, Zhe Chen, Xinyue Zhang, Wei Li, Jingwen Li, Wenhai Wang, Kai Chen, Conghui He, Xingcheng Zhang, Jifeng Dai, Yu Qiao, Dahua Lin, Jiaqi Wang:
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD. NeurIPS 2024
[c299]Tianle Gu, Zeyang Zhou, Kexin Huang, Dandan Liang, Yixu Wang, Haiquan Zhao, Yuanqi Yao, Xingge Qiao, Keqing Wang, Yujiu Yang, Yan Teng, Yu Qiao, Yingchun Wang:
MLLMGuard: A Multi-dimensional Safety Evaluation Suite for Multimodal Large Language Models. NeurIPS 2024
[c298]Yiwei Guo, Shaobin Zhuang, Kunchang Li, Yu Qiao, Yali Wang:
TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration. NeurIPS 2024
[c297]Zhen Huang, Zengzhi Wang, Shijie Xia, Xuefeng Li, Haoyang Zou, Ruijie Xu, Run-Ze Fan, Lyumanshan Ye, Ethan Chern, Yixin Ye, Yikai Zhang, Yuqing Yang, Ting Wu, Binjie Wang, Shichao Sun, Yang Xiao, Yiyuan Li, Fan Zhou, Steffi Chern, Yiwei Qin, Yan Ma, Jiadi Su, Yixiu Liu, Yuxiang Zheng, Shaoting Zhang, Dahua Lin, Yu Qiao, Pengfei Liu:
OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI. NeurIPS 2024
[c296]Chuanhao Li, Zhen Li, Chenchen Jing, Shuo Liu, Wenqi Shao, Yuwei Wu, Ping Luo, Yu Qiao, Kaipeng Zhang:
SearchLVLMs: A Plug-and-Play Framework for Augmenting Large Vision-Language Models by Searching Up-to-Date Internet Knowledge. NeurIPS 2024
[c295]Ziyu Liu, Tao Chu, Yuhang Zang, Xilin Wei, Xiaoyi Dong, Pan Zhang, Zijian Liang, Yuanjun Xiong, Yu Qiao, Dahua Lin, Jiaqi Wang:
MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs. NeurIPS 2024
[c294]Shuo Liu, Kaining Ying, Hao Zhang, Yue Yang, Yuqi Lin, Tianle Zhang, Chuanhao Li, Yu Qiao, Ping Luo, Wenqi Shao, Kaipeng Zhang:
ConvBench: A Multi-Turn Conversation Evaluation Benchmark with Hierarchical Ablation Capability for Large Vision-Language Models. NeurIPS 2024
[c293]Jianbiao Mei, Yukai Ma, Xuemeng Yang, Licheng Wen, Xinyu Cai, Xin Li, Daocheng Fu, Bo Zhang, Pinlong Cai, Min Dou, Botian Shi, Liang He, Yong Liu, Yu Qiao:
Continuously Learning, Adapting, and Improving: A Dual-Process Approach to Autonomous Driving. NeurIPS 2024
[c292]Chenxin Tao, Xizhou Zhu, Shiqian Su, Lewei Lu, Changyao Tian, Xuan Luo, Gao Huang, Hongsheng Li, Yu Qiao, Jie Zhou, Jifeng Dai:
Learning 1D Causal Visual Representation with De-focus Attention Networks. NeurIPS 2024
[c291]Weiyun Wang, Shuibo Zhang, Yiming Ren, Yuchen Duan, Tiantong Li, Shuo Liu, Mengkang Hu, Zhe Chen, Kaipeng Zhang, Lewei Lu, Xizhou Zhu, Ping Luo, Yu Qiao, Jifeng Dai, Wenqi Shao, Wenhai Wang:
Needle In A Multimodal Haystack. NeurIPS 2024
[c290]Jiannan Wu, Muyan Zhong, Sen Xing, Zeqiang Lai, Zhaoyang Liu, Zhe Chen, Wenhai Wang, Xizhou Zhu, Lewei Lu, Tong Lu, Ping Luo, Yu Qiao, Jifeng Dai:
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks. NeurIPS 2024
[c289]Chenyu Yang, Xizhou Zhu, Jinguo Zhu, Weijie Su, Junjie Wang, Xuan Dong, Wenhai Wang, Lewei Lu, Bin Li, Jie Zhou, Yu Qiao, Jifeng Dai:
Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning. NeurIPS 2024
[c288]Haiyu Zhang, Xinyuan Chen, Yaohui Wang, Xihui Liu, Yunhong Wang, Yu Qiao:
4Diffusion: Multi-view Video Diffusion Model for 4D Generation. NeurIPS 2024
[c287]Tianle Zhang, Langtian Ma, Yuchen Yan, Yuchen Zhang, Yue Yang, Ziyao Guo, Wenqi Shao, Kai Wang, Yang You, Yu Qiao, Ping Luo, Kaipeng Zhang:
Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability, Reproducibility, and Practicality. NeurIPS 2024
[c286]Qingsong Zhao, Yi Wang, Jilan Xu, Yinan He, Zifan Song, Limin Wang, Yu Qiao, Cairong Zhao:
Does Video-Text Pretraining Help Open-Vocabulary Online Action Detection? NeurIPS 2024
[c285]Rongkun Zheng, Lu Qi, Xi Chen, Yi Wang, Kun Wang, Yu Qiao, Hengshuang Zhao:
SyncVIS: Synchronized Video Instance Segmentation. NeurIPS 2024
[c284]Zhanhui Zhou, Zhixuan Liu, Jie Liu, Zhichen Dong, Chao Yang, Yu Qiao:
Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models. NeurIPS 2024
[c283]Xizhou Zhu, Xue Yang, Zhaokai Wang, Hao Li, Wenhan Dou, Junqi Ge, Lewei Lu, Yu Qiao, Jifeng Dai:
Parameter-Inverted Image Pyramid Networks. NeurIPS 2024
[c282]Le Zhuo, Ruoyi Du, Han Xiao, Yangguang Li, Dongyang Liu, Rongjie Huang, Wenze Liu, Xiangyang Zhu, Fu-Yun Wang, Zhanyu Ma, Xu Luo, Zehan Wang, Kaipeng Zhang, Lirui Zhao, Si Liu, Xiangyu Yue, Wanli Ouyang, Yu Qiao, Hongsheng Li, Peng Gao:
Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT. NeurIPS 2024
[c281]Jia Zeng, Qingwen Bu, Bangjun Wang, Wenke Xia, Li Chen, Hao Dong, Haoming Song, Dong Wang, Di Hu, Ping Luo, Heming Cui, Bin Zhao, Xuelong Li, Yu Qiao, Hongyang Li:
Learning Manipulation by Predicting Interaction. Robotics: Science and Systems 2024
[c280]Daocheng Fu, Xin Li, Licheng Wen, Min Dou, Pinlong Cai, Botian Shi, Yu Qiao:
Drive Like a Human: Rethinking Autonomous Driving with Large Language Models. WACV (Workshops) 2024: 910-919
[c279]Zeyu Lu, Chengyue Wu, Xinyuan Chen, Yaohui Wang, Lei Bai, Yu Qiao, Xihui Liu:
Hierarchical Diffusion Autoencoders and Disentangled Image Manipulation. WACV 2024: 5362-5371
[i433]Fanqing Meng, Wenqi Shao, Quanfeng Lu, Peng Gao, Kaipeng Zhang, Yu Qiao, Ping Luo:
ChartAssisstant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning. CoRR abs/2401.02384 (2024)
[i432]Xin Ma, Yaohui Wang, Gengyun Jia, Xinyuan Chen, Ziwei Liu, Yuan-Fang Li, Cunjian Chen, Yu Qiao:
Latte: Latent Diffusion Transformer for Video Generation. CoRR abs/2401.03048 (2024)
[i431]Yuwen Xiong, Zhiqi Li, Yuntao Chen, Feng Wang, Xizhou Zhu, Jiapeng Luo, Wenhai Wang, Tong Lu, Hongsheng Li, Yu Qiao, Lewei Lu, Jie Zhou, Jifeng Dai:
Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications. CoRR abs/2401.06197 (2024)
[i430]Shaobin Zhuang, Kunchang Li, Xinyuan Chen, Yaohui Wang, Ziwei Liu, Yu Qiao, Yali Wang:
Vlogger: Make Your Dream A Vlog. CoRR abs/2401.09414 (2024)
[i429]Changyao Tian, Xizhou Zhu, Yuwen Xiong, Weiyun Wang, Zhe Chen, Wenhai Wang, Yuntao Chen, Lewei Lu, Tong Lu, Jie Zhou, Hongsheng Li, Yu Qiao, Jifeng Dai:
MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer. CoRR abs/2401.10208 (2024)
[i428]Zaibin Zhang, Yongting Zhang, Lijun Li, Hongzhi Gao, Lijun Wang, Huchuan Lu, Feng Zhao, Yu Qiao, Jing Shao:
PsySafe: A Comprehensive Framework for Psychological-based Attack, Defense, and Evaluation of Multi-agent System Safety. CoRR abs/2401.11880 (2024)
[i427]Guoxin Chen, Kexin Tang, Chao Yang, Fuying Ye, Yu Qiao, Yiming Qian:
SEER: Facilitating Structured Reasoning and Explanation via Reinforcement Learning. CoRR abs/2401.13246 (2024)
[i426]Fanghua Yu, Jinjin Gu, Zheyuan Li, Jinfan Hu, Xiangtao Kong, Xintao Wang, Jingwen He, Yu Qiao, Chao Dong:
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild. CoRR abs/2401.13627 (2024)
[i425]Chaochao Lu, Chen Qian, Guodong Zheng, Hongxing Fan, Hongzhi Gao, Jie Zhang, Jing Shao, Jingyi Deng, Jinlan Fu, Kexin Huang, Kunchang Li, Lijun Li, Limin Wang, Lu Sheng, Meiqi Chen, Ming Zhang, Qibing Ren, Sirui Chen, Tao Gui, Wanli Ouyang, Yali Wang, Yan Teng, Yaru Wang, Yi Wang, Yinan He, Yingchun Wang, Yixu Wang, Yongting Zhang, Yu Qiao, Yujiong Shen, Yurong Mou, Yuxi Chen, Zaibin Zhang, Zhelun Shi, Zhenfei Yin, Zhipin Wang:
From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities. CoRR abs/2401.15071 (2024)
[i424]Weigao Sun, Zhen Qin, Weixuan Sun, Shidi Li, Dong Li, Xuyang Shen, Yu Qiao, Yiran Zhong:
CO2: Efficient Distributed Training with Full Communication-Computation Overlap. CoRR abs/2401.16265 (2024)
[i423]Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Bin Wang, Linke Ouyang, Xilin Wei, Songyang Zhang, Haodong Duan, Maosong Cao, Wenwei Zhang, Yining Li, Hang Yan, Yang Gao, Xinyue Zhang, Wei Li, Jingwen Li, Kai Chen, Conghui He, Xingcheng Zhang, Yu Qiao, Dahua Lin, Jiaqi Wang:
InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model. CoRR abs/2401.16420 (2024)
[i422]Xin Liu, Yichen Zhu, Yunshi Lan, Chao Yang, Yu Qiao:
Safety of Multimodal Large Language Models on Images and Text. CoRR abs/2402.00357 (2024)
[i421]Daocheng Fu, Wenjie Lei, Licheng Wen, Pinlong Cai, Song Mao, Min Dou, Botian Shi, Yu Qiao:
LimSim++: A Closed-Loop Platform for Deploying Multimodal LLMs in Autonomous Driving. CoRR abs/2402.01246 (2024)
[i420]Lijun Li, Bowen Dong, Ruohui Wang, Xuhao Hu, Wangmeng Zuo, Dahua Lin, Yu Qiao, Jing Shao:
SALAD-Bench: A Hierarchical and Comprehensive Safety Benchmark for Large Language Models. CoRR abs/2402.05044 (2024)
[i419]Shikun Ban, Juling Fan, Wentao Zhu, Xiaoxuan Ma, Yu Qiao, Yizhou Wang:
Real-time Holistic Robot Pose Estimation with Unknown States. CoRR abs/2402.05655 (2024)
[i418]Peng Gao, Renrui Zhang, Chris Liu, Longtian Qiu, Siyuan Huang, Weifeng Lin, Shitian Zhao, Shijie Geng, Ziyi Lin, Peng Jin, Kaipeng Zhang, Wenqi Shao, Chao Xu, Conghui He, Junjun He, Hao Shao, Pan Lu, Hongsheng Li, Yu Qiao:
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models. CoRR abs/2402.05935 (2024)
[i417]Yutao Hu, Tianbin Li, Quanfeng Lu, Wenqi Shao, Junjun He, Yu Qiao, Ping Luo:
OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM. CoRR abs/2402.09181 (2024)
[i416]Zhichen Dong, Zhanhui Zhou, Chao Yang, Jing Shao, Yu Qiao:
Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey. CoRR abs/2402.09283 (2024)
[i415]Renqiu Xia, Bo Zhang, Hancheng Ye, Xiangchao Yan, Qi Liu, Hongbin Zhou, Zijun Chen, Min Dou, Botian Shi, Junchi Yan, Yu Qiao:
ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning. CoRR abs/2402.12185 (2024)
[i414]Zhanhui Zhou, Jie Liu, Zhichen Dong, Jiaheng Liu, Chao Yang, Wanli Ouyang, Yu Qiao:
Emulated Disalignment: Safety Alignment for Large Language Models May Backfire! CoRR abs/2402.12343 (2024)
[i413]Junting Chen, Yao Mu, Qiaojun Yu, Tianming Wei, Silang Wu, Zhecheng Yuan, Zhixuan Liang, Chao Yang, Kaipeng Zhang, Wenqi Shao, Yu Qiao, Huazhe Xu, Mingyu Ding, Ping Luo:
RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation. CoRR abs/2402.14623 (2024)
[i412]Yao Mu, Junting Chen, Qinglong Zhang, Shoufa Chen, Qiaojun Yu, Chongjian Ge, Runjian Chen, Zhixuan Liang, Mengkang Hu, Chaofan Tao, Peize Sun, Haibao Yu, Chao Yang, Wenqi Shao, Wenhai Wang, Jifeng Dai, Yu Qiao, Mingyu Ding, Ping Luo:
RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis. CoRR abs/2402.16117 (2024)
[i411]Peng Xu, Wenqi Shao, Mengzhao Chen, Shitao Tang, Kaipeng Zhang, Peng Gao, Fengwei An, Yu Qiao, Ping Luo:
BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation. CoRR abs/2402.16880 (2024)
[i410]Zhaoxun Ju, Chao Yang, Hongbo Wang, Yu Qiao, Fuchun Sun:
Rethinking Mutual Information for Language Conditioned Skill Discovery on Imitation Learning. CoRR abs/2402.17511 (2024)
[i409]Boyu Chen, Siran Chen, Kunchang Li, Qinglin Xu, Yu Qiao, Yali Wang:
Percept, Chat, and then Adapt: Multimodal Knowledge Transfer of Foundation Models for Open-World Video Recognition. CoRR abs/2402.18951 (2024)
[i408]Jiantao Qiu, Haijun Lv, Zhenjiang Jin, Rui Wang, Wenchang Ning, Jia Yu, ChaoBin Zhang, Zhenxiang Li, Pei Chu, Yuan Qu, Jin Shi, Lindong Lu, Runyu Peng, Zhiyuan Zeng, Huanze Tang, Zhikai Lei, Jiawei Hong, Keyu Chen, Zhaoye Fei, Ruiliang Xu, Wei Li, Zhongying Tu, Dahua Lin, Yu Qiao, Hang Yan, Conghui He:
WanJuan-CC: A Safe and High-Quality Open-sourced English Webtext Dataset. CoRR abs/2402.19282 (2024)
[i407]Chen Qian, Jie Zhang, Wei Yao, Dongrui Liu, Zhenfei Yin, Yu Qiao, Yong Liu, Jing Shao:
Towards Tracing Trustworthiness Dynamics: Revisiting Pre-training Period of Large Language Models. CoRR abs/2402.19465 (2024)
[i406]Weiyun Wang, Yiming Ren, Haowen Luo, Tiantong Li, Chenxiang Yan, Zhe Chen, Wenhai Wang, Qingyun Li, Lewei Lu, Xizhou Zhu, Yu Qiao, Jifeng Dai:
The All-Seeing Project V2: Towards General Relation Comprehension of the Open World. CoRR abs/2402.19474 (2024)
[i405]Zishi Li, Xiaoxuan Ma, Qiuyan Shang, Wentao Zhu, Hai Ci, Yu Qiao, Yizhou Wang:
Efficient Action Counting with Dynamic Queries. CoRR abs/2403.01543 (2024)
[i404]Yue Yang, Yuqi Lin, Hong Liu, Wenqi Shao, Runjian Chen, Hailong Shang, Yu Wang, Yu Qiao, Kaipeng Zhang, Ping Luo:
Towards Implicit Prompt For Text-To-Image Models. CoRR abs/2403.02118 (2024)
[i403]Yuchen Duan, Weiyun Wang, Zhe Chen, Xizhou Zhu, Lewei Lu, Tong Lu, Yu Qiao, Hongsheng Li, Jifeng Dai, Wenhai Wang:
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures. CoRR abs/2403.02308 (2024)
[i402]Yunsong Zhou, Linyan Huang, Qingwen Bu, Jia Zeng, Tianyu Li, Hang Qiu, Hongzi Zhu, Minyi Guo, Yu Qiao, Hongyang Li:
Embodied Understanding of Driving Scenarios. CoRR abs/2403.04593 (2024)
[i401]Kunchang Li, Xinhao Li, Yi Wang, Yinan He, Yali Wang, Limin Wang, Yu Qiao:
VideoMamba: State Space Model for Efficient Video Understanding. CoRR abs/2403.06977 (2024)
[i400]Qibing Ren, Chang Gao, Jing Shao, Junchi Yan, Xin Tan, Yu Qiao, Wai Lam, Lizhuang Ma:
Exploring Safety Generalization Challenges of Large Language Models via Code. CoRR abs/2403.07865 (2024)
[i399]Hao Zhang, Wenqi Shao, Hong Liu, Yongqiang Ma, Ping Luo, Yu Qiao, Kaipeng Zhang:
AVIBench: Towards Evaluating the Robustness of Large Vision-Language Model on Adversarial Visual-Instructions. CoRR abs/2403.09346 (2024)
[i398]Jiazhi Yang, Shenyuan Gao, Yihang Qiu, Li Chen, Tianyu Li, Bo Dai, Kashyap Chitta, Penghao Wu, Jia Zeng, Ping Luo, Jun Zhang, Andreas Geiger, Yu Qiao, Hongyang Li:
Generalized Predictive Model for Autonomous Driving. CoRR abs/2403.09630 (2024)
[i397]Enshen Zhou, Yiran Qin, Zhenfei Yin, Yuzhou Huang, Ruimao Zhang, Lu Sheng, Yu Qiao, Jing Shao:
MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control. CoRR abs/2403.12037 (2024)
[i396]Yi Wang, Kunchang Li, Xinhao Li, Jiashuo Yu, Yinan He, Guo Chen, Baoqi Pei, Rongkun Zheng, Jilan Xu, Zun Wang, Yansong Shi, Tianxiang Jiang, Songze Li, Hongjie Zhang, Yifei Huang, Yu Qiao, Yali Wang, Limin Wang:
InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding. CoRR abs/2403.15377 (2024)
[i395]Yifei Huang, Guo Chen, Jilan Xu, Mingfang Zhang, Lijin Yang, Baoqi Pei, Hongjie Zhang, Lu Dong, Yali Wang, Limin Wang, Yu Qiao:
EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World. CoRR abs/2403.16182 (2024)
[i394]Zhelun Shi, Zhipin Wang, Hongxing Fan, Zaibin Zhang, Lijun Li, Yongting Zhang, Zhenfei Yin, Lu Sheng, Yu Qiao, Jing Shao:
Assessment of Multimodal Large Language Models in Alignment with Human Values. CoRR abs/2403.17830 (2024)
[i393]Yutong Chen, Yifan Zhan, Zhihang Zhong, Wei Wang, Xiao Sun, Yu Qiao, Yinqiang Zheng:
Within the Dynamic Context: Inertia-aware 3D Human Modeling with Pose Sequence. CoRR abs/2403.19160 (2024)
[i392]Zeren Chen, Zhelun Shi, Xiaoya Lu, Lehan He, Sucheng Qian, Haoshu Fang, Zhenfei Yin, Wanli Ouyang, Jing Shao, Yu Qiao, Cewu Lu, Lu Sheng:
RH20T-P: A Primitive-Level Robotic Dataset Towards Composable Generalization Agents. CoRR abs/2403.19622 (2024)
[i391]Shuo Liu, Kaining Ying, Hao Zhang, Yue Yang, Yuqi Lin, Tianle Zhang, Chuanhao Li, Yu Qiao, Ping Luo, Wenqi Shao, Kaipeng Zhang:
ConvBench: A Multi-Turn Conversation Evaluation Benchmark with Hierarchical Capability for Large Vision-Language Models. CoRR abs/2403.20194 (2024)
[i390]Lin Chen, Jinsong Li, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Zehui Chen, Haodong Duan, Jiaqi Wang, Yu Qiao, Dahua Lin, Feng Zhao:
Are We on the Right Way for Evaluating Large Vision-Language Models? CoRR abs/2403.20330 (2024)
[i389]Bo Zou, Chao Yang, Yu Qiao, Chengbin Quan, Youjian Zhao:
LLaMA-Excitor: General Instruction Tuning via Indirect Feature Interaction. CoRR abs/2404.00913 (2024)
[i388]Bo Zou, Chao Yang, Yu Qiao, Chengbin Quan, Youjian Zhao:
VideoDistill: Language-aware Vision Distillation for Video Question Answering. CoRR abs/2404.00973 (2024)
[i387]Lirui Zhao, Yue Yang, Kaipeng Zhang, Wenqi Shao, Yuxin Zhang, Yu Qiao, Ping Luo, Rongrong Ji:
DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model. CoRR abs/2404.01342 (2024)
[i386]Hao Wu, Huabin Liu, Yu Qiao, Xiao Sun:
DIBS: Enhancing Dense Video Captioning with Unlabeled Videos via Pseudo Boundary Enrichment and Online Refinement. CoRR abs/2404.02755 (2024)
[i385]Weigao Sun, Zhen Qin, Dong Li, Xuyang Shen, Yu Qiao, Yiran Zhong:
Linear Attention Sequence Parallelism. CoRR abs/2404.02882 (2024)
[i384]Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Bin Wang, Linke Ouyang, Songyang Zhang, Haodong Duan, Wenwei Zhang, Yining Li, Hang Yan, Yang Gao, Zhe Chen, Xinyue Zhang, Wei Li, Jingwen Li, Wenhai Wang, Kai Chen, Conghui He, Xingcheng Zhang, Jifeng Dai, Yu Qiao, Dahua Lin, Jiaqi Wang:
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD. CoRR abs/2404.06512 (2024)
[i383]Kaining Ying, Fanqing Meng, Jin Wang, Zhiqian Li, Han Lin, Yue Yang, Hao Zhang, Wenbo Zhang, Yuqi Lin, Shuo Liu, Jiayi Lei, Quanfeng Lu, Runjian Chen, Peng Xu, Renrui Zhang, Haozhe Zhang, Peng Gao, Yali Wang, Yu Qiao, Ping Luo, Kaipeng Zhang, Wenqi Shao:
MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI. CoRR abs/2404.16006 (2024)
[i382]Zhe Chen, Weiyun Wang, Hao Tian, Shenglong Ye, Zhangwei Gao, Erfei Cui, Wenwen Tong, Kongzhi Hu, Jiapeng Luo, Zheng Ma, Ji Ma, Jiaqi Wang, Xiaoyi Dong, Hang Yan, Hewei Guo, Conghui He, Botian Shi, Zhenjiang Jin, Chao Xu, Bin Wang, Xingjian Wei, Wei Li, Wenjian Zhang, Bo Zhang, Pinlong Cai, Licheng Wen, Xiangchao Yan, Min Dou, Lewei Lu, Xizhou Zhu, Tong Lu, Dahua Lin, Yu Qiao, Jifeng Dai, Wenhai Wang:
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites. CoRR abs/2404.16821 (2024)
[i381]Ziyan Chen, Jingwen He, Xinqi Lin, Yu Qiao, Chao Dong:
Towards Real-world Video Face Restoration: A New Benchmark. CoRR abs/2404.19500 (2024)
[i380]Sirui Chen, Bo Peng, Meiqi Chen, Ruiqi Wang, Mengying Xu, Xingyu Zeng, Rui Zhao, Shengjie Zhao, Yu Qiao, Chaochao Lu:
Causal Evaluation of Language Models. CoRR abs/2405.00622 (2024)
[i379]Peng Gao, Le Zhuo, Dongyang Liu, Ruoyi Du, Xu Luo, Longtian Qiu, Yuhang Zhang, Chen Lin, Rongjie Huang, Shijie Geng, Renrui Zhang, Junlin Xi, Wenqi Shao, Zhengkai Jiang, Tianshuo Yang, Weicai Ye, He Tong, Jingwen He, Yu Qiao, Hongsheng Li:
Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers. CoRR abs/2405.05945 (2024)
[i378]Chuanhao Li, Zhen Li, Chenchen Jing, Shuo Liu, Wenqi Shao, Yuwei Wu, Ping Luo, Yu Qiao, Kaipeng Zhang:
UDKAG: Augmenting Large Vision-Language Models with Up-to-Date Knowledge. CoRR abs/2405.14554 (2024)
[i377]Chongjie Si, Xuehui Wang, Xue Yang, Zhengqin Xu, Qingyun Li, Jifeng Dai, Yu Qiao, Xiaokang Yang, Wei Shen:
FLoRA: Low-Rank Core Space for N-dimension. CoRR abs/2405.14739 (2024)
[i376]Jianbiao Mei, Yukai Ma, Xuemeng Yang, Licheng Wen, Xinyu Cai, Xin Li, Daocheng Fu, Bo Zhang, Pinlong Cai, Min Dou, Botian Shi, Liang He, Yong Liu, Yu Qiao:
Continuously Learning, Adapting, and Improving: A Dual-Process Approach to Autonomous Driving. CoRR abs/2405.15324 (2024)
[i375]Zhanhui Zhou, Zhixuan Liu, Jie Liu, Zhichen Dong, Chao Yang, Yu Qiao:
Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models. CoRR abs/2405.19262 (2024)
[i374]Haiyu Zhang, Xinyuan Chen, Yaohui Wang, Xihui Liu, Yunhong Wang, Yu Qiao:
4Diffusion: Multi-view Video Diffusion Model for 4D Generation. CoRR abs/2405.20674 (2024)
[i373]Jia Zeng, Qingwen Bu, Bangjun Wang, Wenke Xia, Li Chen, Hao Dong, Haoming Song, Dong Wang, Di Hu, Ping Luo, Heming Cui, Bin Zhao, Xuelong Li, Yu Qiao, Hongyang Li:
Learning Manipulation by Predicting Interaction. CoRR abs/2406.00439 (2024)
[i372]Hao Wen, Zehuan Huang, Yaohui Wang, Xinyuan Chen, Yu Qiao, Lu Sheng:
Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion. CoRR abs/2406.03184 (2024)
[i371]Lin Chen, Xilin Wei, Jinsong Li, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Zehui Chen, Haodong Duan, Bin Lin, Zhenyu Tang, Li Yuan, Yu Qiao, Dahua Lin, Feng Zhao, Jiaqi Wang:
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions. CoRR abs/2406.04325 (2024)
[i370]Xizhou Zhu, Xue Yang, Zhaokai Wang, Hao Li, Wenhan Dou, Junqi Ge, Lewei Lu, Yu Qiao, Jifeng Dai:
Parameter-Inverted Image Pyramid Networks. CoRR abs/2406.04330 (2024)
[i369]Chenxin Tao, Xizhou Zhu, Shiqian Su, Lewei Lu, Changyao Tian, Xuan Luo, Gao Huang, Hongsheng Li, Yu Qiao, Jie Zhou, Jifeng Dai:
Learning 1D Causal Visual Representation with De-focus Attention Networks. CoRR abs/2406.04342 (2024)
[i368]Weiyun Wang, Shuibo Zhang, Yiming Ren, Yuchen Duan, Tiantong Li, Shuo Liu, Mengkang Hu, Zhe Chen, Kaipeng Zhang, Lewei Lu, Xizhou Zhu, Ping Luo, Yu Qiao, Jifeng Dai, Wenqi Shao, Wenhai Wang:
Needle In A Multimodal Haystack. CoRR abs/2406.07230 (2024)
[i367]Chenyu Yang, Xizhou Zhu, Jinguo Zhu, Weijie Su, Junjie Wang, Xuan Dong, Wenhai Wang, Lewei Lu, Bin Li, Jie Zhou, Yu Qiao, Jifeng Dai:
Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning. CoRR abs/2406.07543 (2024)
[i366]Tianle Gu, Zeyang Zhou, Kexin Huang, Dandan Liang, Yixu Wang, Haiquan Zhao, Yuanqi Yao, Xingge Qiao, Keqing Wang, Yujiu Yang, Yan Teng, Yu Qiao, Yingchun Wang:
MLLMGuard: A Multi-dimensional Safety Evaluation Suite for Multimodal Large Language Models. CoRR abs/2406.07594 (2024)
[i365]Jiannan Wu, Muyan Zhong, Sen Xing, Zeqiang Lai, Zhaoyang Liu, Wenhai Wang, Zhe Chen, Xizhou Zhu, Lewei Lu, Tong Lu, Ping Luo, Yu Qiao, Jifeng Dai:
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks. CoRR abs/2406.08394 (2024)
[i364]Qingyun Li, Zhe Chen, Weiyun Wang, Wenhai Wang, Shenglong Ye, Zhenjiang Jin, Guanzhou Chen, Yinan He, Zhangwei Gao, Erfei Cui, Jiashuo Yu, Hao Tian, Jiasheng Zhou, Chao Xu, Bin Wang, Xingjian Wei, Wei Li, Wenjian Zhang, Bo Zhang, Pinlong Cai, Licheng Wen, Xiangchao Yan, Zhenxiang Li, Pei Chu, Yi Wang, Min Dou, Changyao Tian, Xizhou Zhu, Lewei Lu, Yushi Chen, Junjun He, Zhongying Tu, Tong Lu, Yali Wang, Limin Wang, Dahua Lin, Yu Qiao, Botian Shi, Conghui He, Jifeng Dai:
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text. CoRR abs/2406.08418 (2024)
[i363]Quanfeng Lu, Wenqi Shao, Zitao Liu, Fanqing Meng, Boxuan Li, Botong Chen, Siyuan Huang, Kaipeng Zhang, Yu Qiao, Ping Luo:
GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices. CoRR abs/2406.08451 (2024)
[i362]Tianle Zhang, Langtian Ma, Yuchen Yan, Yuchen Zhang, Kai Wang, Yue Yang, Ziyao Guo, Wenqi Shao, Yang You, Yu Qiao, Ping Luo, Kaipeng Zhang:
Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability, Reproducibility, and Practicality. CoRR abs/2406.08845 (2024)
[i361]Renqiu Xia, Song Mao, Xiangchao Yan, Hongbin Zhou, Bo Zhang, Haoyang Peng, Jiahao Pi, Daocheng Fu, Wenjie Wu, Hancheng Ye, Shiyang Feng, Bin Wang, Chao Xu, Conghui He, Pinlong Cai, Min Dou, Botian Shi, Sheng Zhou, Yongwei Wang, Bin Wang, Junchi Yan, Fei Wu, Yu Qiao:
DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models. CoRR abs/2406.11633 (2024)
[i360]Fangzhi Xu, Qiushi Sun, Kanzhi Cheng, Jun Liu, Yu Qiao, Zhiyong Wu:
Interactive Evolution: A Neural-Symbolic Self-Training Framework For Large Language Models. CoRR abs/2406.11736 (2024)
[i359]Fanqing Meng, Wenqi Shao, Lixin Luo, Yahong Wang, Yiran Chen, Quanfeng Lu, Yue Yang, Tianshuo Yang, Kaipeng Zhang, Yu Qiao, Ping Luo:
PhyBench: A Physical Commonsense Benchmark for Evaluating Text-to-Image Models. CoRR abs/2406.11802 (2024)
[i358]Ziyu Liu, Tao Chu, Yuhang Zang, Xilin Wei, Xiaoyi Dong, Pan Zhang, Zijian Liang, Yuanjun Xiong, Yu Qiao, Dahua Lin, Jiaqi Wang:
MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs. CoRR abs/2406.11833 (2024)
[i357]Yongting Zhang, Lu Chen, Guodong Zheng, Yifeng Gao, Rui Zheng, Jinlan Fu, Zhenfei Yin, Senjie Jin, Yu Qiao, Xuanjing Huang, Feng Zhao, Tao Gui, Jing Shao:
SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model. CoRR abs/2406.12030 (2024)
[i356]Zhen Huang, Zengzhi Wang, Shijie Xia, Xuefeng Li, Haoyang Zou, Ruijie Xu, Run-Ze Fan, Lyumanshan Ye, Ethan Chern, Yixin Ye, Yikai Zhang, Yuqing Yang, Ting Wu, Binjie Wang, Shichao Sun, Yang Xiao, Yiyuan Li, Fan Zhou, Steffi Chern, Yiwei Qin, Yan Ma, Jiadi Su, Yixiu Liu, Yuxiang Zheng, Shaoting Zhang, Dahua Lin, Yu Qiao, Pengfei Liu:
OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI. CoRR abs/2406.12753 (2024)
[i355]Baoqi Pei, Guo Chen, Jilan Xu, Yuping He, Yicheng Liu, Kanghua Pan, Yifei Huang, Yali Wang, Tong Lu, Limin Wang, Yu Qiao:
EgoVideo: Exploring Egocentric Foundation Model and Downstream Adaptation. CoRR abs/2406.18070 (2024)
[i354]Le Zhuo, Ruoyi Du, Han Xiao, Yangguang Li, Dongyang Liu, Rongjie Huang, Wenze Liu, Lirui Zhao, Fu-Yun Wang, Zhanyu Ma, Xu Luo, Zehan Wang, Kaipeng Zhang, Xiangyang Zhu, Si Liu, Xiangyu Yue, Dingning Liu, Wanli Ouyang, Ziwei Liu, Yu Qiao, Hongsheng Li, Peng Gao:
Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT. CoRR abs/2406.18583 (2024)
[i353]Pan Zhang, Xiaoyi Dong, Yuhang Zang, Yuhang Cao, Rui Qian, Lin Chen, Qipeng Guo, Haodong Duan, Bin Wang, Linke Ouyang, Songyang Zhang, Wenwei Zhang, Yining Li, Yang Gao, Peng Sun, Xinyue Zhang, Wei Li, Jingwen Li, Wenhai Wang, Hang Yan, Conghui He, Xingcheng Zhang, Kai Chen, Jifeng Dai, Yu Qiao, Dahua Lin, Jiaqi Wang:
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output. CoRR abs/2407.03320 (2024)
[i352]Yinquan Lu, Wenhao Zhu, Lei Li, Yu Qiao, Fei Yuan:
LLaMAX: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages. CoRR abs/2407.05975 (2024)
[i351]Jingwen He, Tianfan Xue, Dongyang Liu, Xinqi Lin, Peng Gao, Dahua Lin, Yu Qiao, Wanli Ouyang, Ziwei Liu:
VEnhancer: Generative Space-Time Enhancement for Video Generation. CoRR abs/2407.07667 (2024)
[i350]Wenshuo Peng, Kaipeng Zhang, Yue Yang, Hao Zhang, Yu Qiao:
Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification. CoRR abs/2407.08787 (2024)
[i349]Hanqing Wang, Jiahe Chen, Wensi Huang, Qingwei Ben, Tai Wang, Boyu Mi, Tao Huang, Siheng Zhao, Yilun Chen, Sizhe Yang, Peizhou Cao, Wenye Yu, Zichao Ye, Jialun Li, Junfeng Long, Zirui Wang, Huiling Wang, Ying Zhao, Zhongying Tu, Yu Qiao, Dahua Lin, Jiangmiao Pang:
GRUtopia: Dream General Robots in a City at Scale. CoRR abs/2407.10943 (2024)
[i348]Mengzhao Chen, Wenqi Shao, Peng Xu, Jiahao Wang, Peng Gao, Kaipeng Zhang, Yu Qiao, Ping Luo:
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models. CoRR abs/2407.11062 (2024)
[i347]Xuhong Wang, Haoyu Jiang, Yi Yu, Jingru Yu, Yilun Lin, Ping Yi, Yingchun Wang, Yu Qiao, Li Li, Fei-Yue Wang:
Building Intelligence Identification System via Large Language Model Watermarking: A Survey and Beyond. CoRR abs/2407.11100 (2024)
[i346]Yi Yu, Jingru Yu, Xuhong Wang, Juanjuan Li, Yilun Lin, Conghui He, Yanqing Yang, Yu Qiao, Li Li, Fei-Yue Wang:
Navigating the Data Trading Crossroads: An Interdisciplinary Survey. CoRR abs/2407.11466 (2024)
[i345]Shuo Cao, Yihao Liu, Wenlong Zhang, Yu Qiao, Chao Dong:
GRIDS: Grouped Multiple-Degradation Restoration with Image Degradation Similarity. CoRR abs/2407.12273 (2024)
[i344]Jie Zhang, Dongrui Liu, Chen Qian, Ziyue Gan, Yong Liu, Yu Qiao, Jing Shao:
The Better Angels of Machine Personality: How Personality Relates to LLM Safety. CoRR abs/2407.12344 (2024)
[i343]Rongkun Zheng, Lu Qi, Xi Chen, Yi Wang, Kun Wang, Yu Qiao, Hengshuang Zhao:
ViLLa: Video Reasoning Segmentation with Large Language Model. CoRR abs/2407.14500 (2024)
[i342]Xin Ma, Yaohui Wang, Gengyun Jia, Xinyuan Chen, Yuan-Fang Li, Cunjian Chen, Yu Qiao:
Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models. CoRR abs/2407.15642 (2024)
[i341]Yangzhou Liu, Yue Cao, Zhangwei Gao, Weiyun Wang, Zhe Chen, Wenhai Wang, Hao Tian, Lewei Lu, Xizhou Zhu, Tong Lu, Yu Qiao, Jifeng Dai:
MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity. CoRR abs/2407.15838 (2024)
[i340]Jingru Yu, Yi Yu, Xuhong Wang, Yilun Lin, Manzhi Yang, Yu Qiao, Fei-Yue Wang:
The Shadow of Fraud: The Emerging Danger of AI-powered Social Engineering and its Possible Cure. CoRR abs/2407.15912 (2024)
[i339]Lirui Zhao, Tianshuo Yang, Wenqi Shao, Yuxin Zhang, Yu Qiao, Ping Luo, Kaipeng Zhang, Rongrong Ji:
Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model. CoRR abs/2407.16982 (2024)
[i338]Xuemeng Yang, Licheng Wen, Yukai Ma, Jianbiao Mei, Xin Li, Tiantian Wei, Wenjie Lei, Daocheng Fu, Pinlong Cai, Min Dou, Botian Shi, Liang He, Yong Liu, Yu Qiao:
DriveArena: A Closed-loop Generative Simulation Platform for Autonomous Driving. CoRR abs/2408.00415 (2024)
[i337]Dongyang Liu, Shitian Zhao, Le Zhuo, Weifeng Lin, Yu Qiao, Hongsheng Li, Peng Gao:
Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining. CoRR abs/2408.02657 (2024)
[i336]Fanqing Meng, Jin Wang, Chuanhao Li, Quanfeng Lu, Hao Tian, Jiaqi Liao, Xizhou Zhu, Jifeng Dai, Yu Qiao, Ping Luo, Kaipeng Zhang, Wenqi Shao:
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models. CoRR abs/2408.02718 (2024)
[i335]Zihan Li, Diping Song, Zefeng Yang, Deming Wang, Fei Li, Xiulan Zhang, Paul E. Kinahan, Yu Qiao:
VisionUnite: A Vision-Language Foundation Model for Ophthalmology Enhanced with Clinical Knowledge. CoRR abs/2408.02865 (2024)
[i334]Pengcheng Chen, Jin Ye, Guoan Wang, Yanjun Li, Zhongying Deng, Wei Li, Tianbin Li, Haodong Duan, Ziyan Huang, Yanzhou Su, Benyou Wang, Shaoting Zhang, Bin Fu, Jianfei Cai, Bohan Zhuang, Eric J. Seibel, Junjun He, Yu Qiao:
GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI. CoRR abs/2408.03361 (2024)
[i333]Xiangyu Chen, Yihao Liu, Yuandong Pu, Wenlong Zhang, Jiantao Zhou, Yu Qiao, Chao Dong:
Learning A Low-Level Vision Generalist via Visual Task Prompt. CoRR abs/2408.08601 (2024)
[i332]Yanbo Ding, Shaobin Zhuang, Kunchang Li, Zhengrong Yue, Yu Qiao, Yali Wang:
MUSES: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration. CoRR abs/2408.10605 (2024)
[i331]Xiangtao Kong, Jinjin Gu, Yihao Liu, Wenlong Zhang, Xiangyu Chen, Yu Qiao, Chao Dong:
A Preliminary Exploration Towards General Image Restoration. CoRR abs/2408.15143 (2024)
[i330]Junyi Chen, Weicai Ye, Yifan Wang, Danpeng Chen, Di Huang, Wanli Ouyang, Guofeng Zhang, Yu Qiao, Tong He:
GigaGS: Scaling up Planar-Based 3D Gaussians for Large Scene Surface Reconstruction. CoRR abs/2409.06685 (2024)
[i329]Weifeng Lin, Xinyu Wei, Renrui Zhang, Le Zhuo, Shitian Zhao, Siyuan Huang, Junlin Xi, Yu Qiao, Peng Gao, Hongsheng Li:
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions. CoRR abs/2409.15278 (2024)
[i328]Fuxian Huang, Qi Zhang, Shaopeng Zhai, Jie Wang, Tianyi Zhang, Haoran Zhang, Ming Zhou, Yu Liu, Yu Qiao:
CLSP: High-Fidelity Contrastive Language-State Pre-training for Agent State Representation. CoRR abs/2409.15806 (2024)
[i327]Zhixuan Liu, Zhanhui Zhou, Yuanfu Wang, Chao Yang, Yu Qiao:
Inference-Time Language Model Alignment via Integrated Value Guidance. CoRR abs/2409.17819 (2024)
[i326]Bin Wang, Chao Xu, Xiaomeng Zhao, Linke Ouyang, Fan Wu, Zhiyuan Zhao, Rui Xu, Kaiwen Liu, Yuan Qu, Fukai Shang, Bo Zhang, Liqun Wei, Zhihao Sui, Wei Li, Botian Shi, Yu Qiao, Dahua Lin, Conghui He:
MinerU: An Open-Source Solution for Precise Document Content Extraction. CoRR abs/2409.18839 (2024)
[i325]Fanqing Meng, Jiaqi Liao, Xinyu Tan, Wenqi Shao, Quanfeng Lu, Kaipeng Zhang, Yu Cheng, Dianqi Li, Yu Qiao, Ping Luo:
Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation. CoRR abs/2410.05363 (2024)
[i324]Qingwen Bu, Hongyang Li, Li Chen, Jisong Cai, Jia Zeng, Heming Cui, Maoqing Yao, Yu Qiao:
Towards Synergistic, Generalized, and Efficient Dual-System for Robotic Manipulation. CoRR abs/2410.08001 (2024)
[i323]Yifan Zhan, Qingtian Zhu, Muyao Niu, Mingze Ma, Jiancheng Zhao, Zhihang Zhong, Xiao Sun, Yu Qiao, Yinqiang Zheng:
ToMiE: Towards Modular Growth in Enhanced SMPL Skeleton for 3D Human with Animatable Garments. CoRR abs/2410.08082 (2024)
[i322]Gen Luo, Xue Yang, Wenhan Dou, Zhaokai Wang, Jifeng Dai, Yu Qiao, Xizhou Zhu:
Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training. CoRR abs/2410.08202 (2024)
[i321]Qibing Ren, Hao Li, Dongrui Liu, Zhanxu Xie, Xiaoya Lu, Yu Qiao, Lei Sha, Junchi Yan, Lizhuang Ma, Jing Shao:
Derail Yourself: Multi-turn LLM Jailbreak Attack through Self-discovered Clues. CoRR abs/2410.10700 (2024)
[i320]Ying Chen, Guoan Wang, Yuanfeng Ji, Yanjun Li, Jin Ye, Tianbin Li, Bin Zhang, Nana Pei, Rongshan Yu, Yu Qiao, Junjun He:
SlideChat: A Large Vision-Language Assistant for Whole-Slide Pathology Image Understanding. CoRR abs/2410.11761 (2024)
[i319]Yiwei Guo, Shaobin Zhuang, Kunchang Li, Yu Qiao, Yali Wang:
TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration. CoRR abs/2410.12183 (2024)
[i318]Jie Zhang, Dongrui Liu, Chen Qian, Linfeng Zhang, Yong Liu, Yu Qiao, Jing Shao:
REEF: Representation Encoding Fingerprints for Large Language Models. CoRR abs/2410.14273 (2024)
[i317]Zhi Hou, Tianyi Zhang, Yuwen Xiong, Hengjun Pu, Chengyang Zhao, Ronglei Tong, Yu Qiao, Jifeng Dai, Yuntao Chen:
Diffusion Transformer Policy. CoRR abs/2410.15959 (2024)
[i316]Zhangwei Gao, Zhe Chen, Erfei Cui, Yiming Ren, Weiyun Wang, Jinguo Zhu, Hao Tian, Shenglong Ye, Junjun He, Xizhou Zhu, Lewei Lu, Tong Lu, Yu Qiao, Jifeng Dai, Wenhai Wang:
Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance. CoRR abs/2410.16261 (2024)
[i315]Kaiwen Zhu, Jinjin Gu, Zhiyuan You, Yu Qiao, Chao Dong:
An Intelligent Agentic System for Complex Image Restoration Problems. CoRR abs/2410.17809 (2024)
[i314]Hengwei Bian, Lingdong Kong, Haozhe Xie, Liang Pan, Yu Qiao, Ziwei Liu:
DynamicCity: Large-Scale LiDAR Generation from Dynamic Scenes. CoRR abs/2410.18084 (2024)
[i313]Zhengyao Lv, Chenyang Si, Junhao Song, Zhenyu Yang, Yu Qiao, Ziwei Liu, Kwan-Yee K. Wong:
FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality. CoRR abs/2410.19355 (2024)
[i312]Xiangyu Zeng, Kunchang Li, Chenting Wang, Xinhao Li, Tianxiang Jiang, Ziang Yan, Songze Li, Yansong Shi, Zhengrong Yue, Yi Wang, Yali Wang, Yu Qiao, Limin Wang:
TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning. CoRR abs/2410.19702 (2024)
[i311]Zhiyong Wu, Zhenyu Wu, Fangzhi Xu, Yian Wang, Qiushi Sun, Chengyou Jia, Kanzhi Cheng, Zichen Ding, Liheng Chen, Paul Pu Liang, Yu Qiao:
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents. CoRR abs/2410.23218 (2024)
[i310]Tao Ma, Hongbin Zhou, Qiusheng Huang, Xuemeng Yang, Jianfei Guo, Bo Zhang, Min Dou, Yu Qiao, Botian Shi, Hongsheng Li:
ZOPP: A Framework of Zero-shot Offboard Panoptic Perception for Autonomous Driving. CoRR abs/2411.05311 (2024)
[i309]Weiyun Wang, Zhe Chen, Wenhai Wang, Yue Cao, Yangzhou Liu, Zhangwei Gao, Jinguo Zhu, Xizhou Zhu, Lewei Lu, Yu Qiao, Jifeng Dai:
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization. CoRR abs/2411.10442 (2024)
[i308]Ziyi Yang, Zaibin Zhang, Zirui Zheng, Yuxian Jiang, Ziyue Gan, Zhiyu Wang, Zijian Ling, Jinsong Chen, Martz Ma, Bowen Dong, Prateek Gupta, Shuyue Hu, Zhenfei Yin, Guohao Li, Xu Jia, Lijun Wang, Bernard Ghanem, Huchuan Lu, Chaochao Lu, Wanli Ouyang, Yu Qiao, Philip Torr, Jing Shao:
OASIS: Open Agent Social Interaction Simulations with One Million Agents. CoRR abs/2411.11581 (2024)
[i307]Ziqi Huang, Fan Zhang, Xiaojie Xu, Yinan He, Jiashuo Yu, Ziyue Dong, Qianli Ma, Nattapol Chanpaisit, Chenyang Si, Yuming Jiang, Yaohui Wang, Xinyuan Chen, Ying-Cong Chen, Limin Wang, Dahua Lin, Yu Qiao, Ziwei Liu:
VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models. CoRR abs/2411.13503 (2024)
[i306]Tianbin Li, Yanzhou Su, Wei Li, Bin Fu, Zhe Chen, Ziyan Huang, Guoan Wang, Chenglong Ma, Ying Chen, Ming Hu, Yanjun Li, Pengcheng Chen, Xiaowei Hu, Zhongying Deng, Yuanfeng Ji, Jin Ye, Yu Qiao, Junjun He:
GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI. CoRR abs/2411.14522 (2024)
[i305]Pengfei Zhou, Xiaopeng Peng, Jiajun Song, Chuanhao Li, Zhaopan Xu, Yue Yang, Ziyao Guo, Hao Zhang, Yuqi Lin, Yefei He, Lirui Zhao, Shuo Liu, Tianhua Li, Yuxuan Xie, Xiaojun Chang, Yu Qiao, Wenqi Shao, Kaipeng Zhang:
GATE OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation. CoRR abs/2411.18499 (2024)
[i304]Rongkun Zheng, Lu Qi, Xi Chen, Yi Wang, Kun Wang, Yu Qiao, Hengshuang Zhao:
SyncVIS: Synchronized Video Instance Segmentation. CoRR abs/2412.00882 (2024)
[i303]Zhe Chen, Weiyun Wang, Yue Cao, Yangzhou Liu, Zhangwei Gao, Erfei Cui, Jinguo Zhu, Shenglong Ye, Hao Tian, Zhaoyang Liu, Lixin Gu, Xuehui Wang, Qingyun Li, Yimin Ren, Zixuan Chen, Jiapeng Luo, Jiahao Wang, Tan Jiang, Bo Wang, Conghui He, Botian Shi, Xingcheng Zhang, Han Lv, Yi Wang, Wenqi Shao, Pei Chu, Zhongying Tu, Tong He, Zhiyong Wu, Huipeng Deng, Jiaye Ge, Kai Chen, Min Dou, Lewei Lu, Xizhou Zhu, Tong Lu, Dahua Lin, Yu Qiao, Jifeng Dai, Wenhai Wang:
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling. CoRR abs/2412.05271 (2024)
[i302]Zun Wang, Jialu Li, Yicong Hong, Songze Li, Kunchang Li, Shoubin Yu, Yi Wang, Yu Qiao, Yali Wang, Mohit Bansal, Limin Wang:
Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel. CoRR abs/2412.08467 (2024)
[i301]Pan Zhang, Xiaoyi Dong, Yuhang Cao, Yuhang Zang, Rui Qian, Xilin Wei, Lin Chen, Yifei Li, Junbo Niu, Shuangrui Ding, Qipeng Guo, Haodong Duan, Xin Chen, Han Lv, Zheng Nie, Min Zhang, Bin Wang, Wenwei Zhang, Xinyue Zhang, Jiaye Ge, Wei Li, Jingwen Li, Zhongying Tu, Conghui He, Xingcheng Zhang, Kai Chen, Yu Qiao, Dahua Lin, Jiaqi Wang:
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions. CoRR abs/2412.09596 (2024)
[i300]Fan Zhang, Shulin Tian, Ziqi Huang, Yu Qiao, Ziwei Liu:
Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models. CoRR abs/2412.09645 (2024)
[i299]Chenxin Tao, Shiqian Su, Xizhou Zhu, Chenyu Zhang, Zhe Chen, Jiawen Liu, Wenhai Wang, Lewei Lu, Gao Huang, Yu Qiao, Jifeng Dai:
HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding. CoRR abs/2412.16158 (2024)
[i298]Ziang Yan, Zhilin Li, Yinan He, Chenting Wang, Kunchang Li, Xinhao Li, Xiangyu Zeng, Zilei Wang, Yali Wang, Yu Qiao, Limin Wang, Yi Wang:
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment. CoRR abs/2412.19326 (2024)
[i297]Qiushi Sun, Kanzhi Cheng, Zichen Ding, Chuanyang Jin, Yian Wang, Fangzhi Xu, Zhenyu Wu, Chengyou Jia, Liheng Chen, Zhoumianze Liu, Ben Kao, Guohao Li, Junxian He, Yu Qiao, Zhiyong Wu:
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis. CoRR abs/2412.19723 (2024)
[i296]Yifei Huang, Jilan Xu, Baoqi Pei, Yuping He, Guo Chen, Lijin Yang, Xinyuan Chen, Yaohui Wang, Zheng Nie, Jinyao Liu, Guoshun Fan, Dechen Lin, Fang Fang, Kunpeng Li, Chang Yuan, Yali Wang, Yu Qiao, Limin Wang:
Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model. CoRR abs/2412.21080 (2024)
- 2023
[j92]Ruyun Hu, Lihao Fu, Yongcan Chen, Junyu Chen, Yu Qiao, Tong Si:
Protein engineering via Bayesian optimization-guided evolutionary algorithm and robotic experiments. Briefings Bioinform. 24(1) (2023)
[j91]Mingye Xu, Zhipeng Zhou, Yali Wang, Yu Qiao:
Towards robustness and generalization of point cloud representation: A geometry coding method and a large-scale object-level dataset. Comput. Vis. Media 10(1): 27-43 (2023)
[j90]Kaiyang Zhou, Ziwei Liu, Yu Qiao, Tao Xiang, Chen Change Loy:
Domain Generalization: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 45(4): 4396-4415 (2023)
[j89]Anran Liu, Yihao Liu, Jinjin Gu, Yu Qiao, Chao Dong:
Blind Image Super-Resolution: A Survey and Beyond. IEEE Trans. Pattern Anal. Mach. Intell. 45(5): 5461-5480 (2023)
[j88]Mingye Xu, Yali Wang, Yihao Liu, Tong He, Yu Qiao:
CP3: Unifying Point Cloud Completion by Pretrain-Prompt-Predict Paradigm. IEEE Trans. Pattern Anal. Mach. Intell. 45(8): 9583-9594 (2023)
[j87]Kunchang Li, Yali Wang, Junhao Zhang, Peng Gao, Guanglu Song, Yu Liu, Hongsheng Li, Yu Qiao:
UniFormer: Unifying Convolution and Self-Attention for Visual Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 45(10): 12581-12600 (2023)
[j86]Yihao Liu, Hengyuan Zhao, Jinjin Gu, Yu Qiao, Chao Dong:
Evaluating the Generalization Ability of Super-Resolution Networks. IEEE Trans. Pattern Anal. Mach. Intell. 45(12): 14497-14513 (2023)
[j85]Weicong Su, Yali Wang, Kunchang Li, Peng Gao, Yu Qiao:
Hybrid token transformer for deep face recognition. Pattern Recognit. 139: 109443 (2023)
[j84]Shihua Li, Haobin Chen, Shijie Yu, Zhiqun He, Feng Zhu, Rui Zhao, Jie Chen, Yu Qiao:
COCAS+: Large-Scale Clothes-Changing Person Re-Identification With Clothes Templates. IEEE Trans. Circuits Syst. Video Technol. 33(4): 1839-1853 (2023)
[j83]Ming Li, Bin Fu, Zhengfu Zhang, Yu Qiao:
Character-Aware Sampling and Rectification for Scene Text Recognition. IEEE Trans. Multim. 25: 649-661 (2023)
[j82]Shixiang Wu, Chao Dong, Yu Qiao:
Blind Image Restoration Based on Cycle-Consistent Network. IEEE Trans. Multim. 25: 1111-1124 (2023)
[j81]Ming Li, Bin Fu, Han Chen, Junjun He, Yu Qiao:
Dual Relation Network for Scene Text Recognition. IEEE Trans. Multim. 25: 4094-4107 (2023)
[j80]Yihao Liu, Jingwen He, Xiangyu Chen, Zhengwen Zhang, Hengyuan Zhao, Chao Dong, Yu Qiao:
Very Lightweight Photo Retouching Network With Conditional Sequential Modulation. IEEE Trans. Multim. 25: 4638-4652 (2023)
[j79]Qitong Wang, Bin Fu, Ming Li, Junjun He, Xi Peng, Yu Qiao:
Region-Aware Arbitrary-Shaped Text Detection With Progressive Fusion. IEEE Trans. Multim. 25: 4718-4729 (2023)
[j78]Yu Qiao, Yuhao Liu, Ziqi Wei, Yuxin Wang, Qiang Cai, Guofeng Zhang, Xin Yang:
Hierarchical and Progressive Image Matting. ACM Trans. Multim. Comput. Commun. Appl. 19(2): 52:1-52:23 (2023)
[j77]Shidong Wang, Wei Zeng, Xi Chen, Yu Ye, Yu Qiao, Chi-Wing Fu:
ActFloor-GAN: Activity-Guided Adversarial Networks for Human-Centric Floorplan Design. IEEE Trans. Vis. Comput. Graph. 29(3): 1610-1624 (2023)
[c278]Zhenyu Wu, Yaoxiang Wang, Jiacheng Ye, Zhiyong Wu, Jiangtao Feng, Jingjing Xu, Yu Qiao:
OpenICL: An Open-Source Framework for In-context Learning. ACL (demo) 2023: 489-498
[c277]Fei Yuan, Yinquan Lu, Wenhao Zhu, Lingpeng Kong, Lei Li, Yu Qiao, Jingjing Xu:
Lego-MT: Learning Detachable Models for Massively Multilingual Machine Translation. ACL (Findings) 2023: 11518-11533
[c276]Jia Zeng, Li Chen, Hanming Deng, Lewei Lu, Junchi Yan, Yu Qiao, Hongyang Li:
Distilling Focal Knowledge from Imperfect Expert for 3D Object Detection. CVPR 2023: 992-1001
[c275]Chenxin Tao, Xizhou Zhu, Weijie Su, Gao Huang, Bin Li, Jie Zhou, Yu Qiao, Xiaogang Wang, Jifeng Dai:
Siamese Image Modeling for Self-Supervised Vision Representation Learning. CVPR 2023: 2132-2141
[c274]Hao Li, Jinguo Zhu, Xiaohu Jiang, Xizhou Zhu, Hongsheng Li, Chun Yuan, Xiaohua Wang, Yu Qiao, Xiaogang Wang, Wenhai Wang, Jifeng Dai:
Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks. CVPR 2023: 2691-2700
[c273]Jilan Xu, Junlin Hou, Yuejie Zhang, Rui Feng, Yi Wang, Yu Qiao, Weidi Xie:
Learning Open-Vocabulary Semantic Segmentation Models From Natural Language Supervision. CVPR 2023: 2935-2944
[c272]Mingye Xu, Mutian Xu, Tong He, Wanli Ouyang, Yali Wang, Xiaoguang Han, Yu Qiao:
MM-3DScene: 3D Scene Understanding by Customizing Masked Modeling with Informative-Preserved Reconstruction and Self-Distilled Consistency. CVPR 2023: 4380-4390
[c271]Runnan Chen, Youquan Liu, Lingdong Kong, Xinge Zhu, Yuexin Ma, Yikang Li, Yuenan Hou, Yu Qiao, Wenping Wang:
CLIP2Scene: Towards Label-efficient 3D Scene Understanding by CLIP. CVPR 2023: 7020-7030
[c270]Bo Zhang, Jiakang Yuan, Botian Shi, Tao Chen, Yikang Li, Yu Qiao:
Uni3D: A Unified Baseline for Multi-Dataset 3D Object Detection. CVPR 2023: 9253-9262
[c269]Xuyang Shen, Dong Li, Jinxing Zhou, Zhen Qin, Bowen He, Xiaodong Han, Aixuan Li, Yuchao Dai, Lingpeng Kong, Meng Wang, Yu Qiao, Yiran Zhong:
Fine-grained Audible Video Description. CVPR 2023: 10585-10596
[c268]Wenhai Wang, Jifeng Dai, Zhe Chen, Zhenhang Huang, Zhiqi Li, Xizhou Zhu, Xiaowei Hu, Tong Lu, Lewei Lu, Hongsheng Li, Xiaogang Wang, Yu Qiao:
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions. CVPR 2023: 14408-14419
[c267]Limin Wang, Bingkun Huang, Zhiyu Zhao, Zhan Tong, Yinan He, Yi Wang, Yali Wang, Yu Qiao:
VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking. CVPR 2023: 14549-14560
[c266]Renrui Zhang, Xiangfei Hu, Bohao Li, Siyuan Huang, Hanqiu Deng, Yu Qiao, Peng Gao, Hongsheng Li:
Prompt, Generate, Then Cache: Cascade of Foundation Models Makes Strong Few-Shot Learners. CVPR 2023: 15211-15222
[c265]Jiakang Yuan, Bo Zhang, Xiangchao Yan, Tao Chen, Botian Shi, Yikang Li, Yu Qiao:
Bi3D: Bi-Domain Active Learning for Cross-Domain 3D Object Detection. CVPR 2023: 15599-15608
[c264]Weijie Su, Xizhou Zhu, Chenxin Tao, Lewei Lu, Bin Li, Gao Huang, Yu Qiao, Xiaogang Wang, Jie Zhou, Jifeng Dai:
Towards All-in-One Pre-Training via Maximizing Multi-Modal Mutual Information. CVPR 2023: 15888-15899
[c263]Xin Li, Tao Ma, Yuenan Hou, Botian Shi, Yuchen Yang, Youquan Liu, Xingjiao Wu, Qin Chen, Yikang Li, Yu Qiao, Liang He:
LoGoNet: Towards Accurate 3D Object Detection with Local-to-Global Cross-Modal Fusion. CVPR 2023: 17524-17534
[c262]Zhaoyang Xia, Youquan Liu, Xin Li, Xinge Zhu, Yuexin Ma, Yikang Li, Yuenan Hou, Yu Qiao:
SCPNet: Semantic Scene Completion on Point Cloud. CVPR 2023: 17642-17651
[c261]Chenyu Yang, Yuntao Chen, Hao Tian, Chenxin Tao, Xizhou Zhu, Zhaoxiang Zhang, Gao Huang, Hongyang Li, Yu Qiao, Lewei Lu, Jie Zhou, Jifeng Dai:
BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision. CVPR 2023: 17830-17839
[c260]Yihan Hu, Jiazhi Yang, Li Chen, Keyu Li, Chonghao Sima, Xizhou Zhu, Siqi Chai, Senyao Du, Tianwei Lin, Wenhai Wang, Lewei Lu, Xiaosong Jia, Qiang Liu, Jifeng Dai, Yu Qiao, Hongyang Li:
Planning-oriented Autonomous Driving. CVPR 2023: 17853-17862
[c259]Jiaqi Xu, Xiaowei Hu, Lei Zhu, Qi Dou, Jifeng Dai, Yu Qiao, Pheng-Ann Heng:
Video Dehazing via a Multi-Range Temporal Alignment Network with Physical Prior. CVPR 2023: 18053-18062
[c258]Yurui Zhu, Tianyu Wang, Xueyang Fu, Xuanyu Yang, Xin Guo, Jifeng Dai, Yu Qiao, Xiaowei Hu:
Learning Weather-General and Weather-Specific Features for Image Restoration Under Multiple Adverse Weather Conditions. CVPR 2023: 21747-21758
[c257]Renrui Zhang, Liuhui Wang, Yu Qiao, Peng Gao, Hongsheng Li:
Learning 3D Representations from 2D Pre-Trained Models via Image-to-Point Masked Autoencoders. CVPR 2023: 21769-21780
[c256]Xiangyu Chen, Xintao Wang, Jiantao Zhou, Yu Qiao, Chao Dong:
Activating More Pixels in Image Super-Resolution Transformer. CVPR 2023: 22367-22377
[c255]Bin Fu, Junjun He, Jianjun Wang, Yu Qiao:
Neural Transformation Fields for Arbitrary-Styled Font Generation. CVPR 2023: 22438-22447
[c254]Rui Tian, Zuxuan Wu, Qi Dai, Han Hu, Yu Qiao, Yu-Gang Jiang:
ResFormer: Scaling ViTs with Multi-Resolution Training. CVPR 2023: 22721-22731
[c253]Hongwei Xue, Peng Gao, Hongyang Li, Yu Qiao, Hao Sun, Houqiang Li, Jiebo Luo:
Stare at What You See: Masked Image Modeling without Reconstruction. CVPR 2023: 22732-22741
[c252]Yihao Liu, Jingwen He, Jinjin Gu, Xiangtao Kong, Yu Qiao, Chao Dong:
DegAE: A New Pretraining Paradigm for Low-Level Vision. CVPR 2023: 23292-23303
[c251]Lingdong Kong, Youquan Liu, Runnan Chen, Yuexin Ma, Xinge Zhu, Yikang Li, Yuenan Hou, Yu Qiao, Ziwei Liu:
Rethinking Range View Representation for LiDAR Segmentation. ICCV 2023: 228-240
[c250]Kunchang Li, Yali Wang, Yinan He, Yizhuo Li, Yi Wang, Limin Wang, Yu Qiao:
UniFormerV2: Unlocking the Potential of Image ViTs for Video Understanding. ICCV 2023: 1632-1643
[c249]Tao Ma, Xuemeng Yang, Hongbin Zhou, Xin Li, Botian Shi, Junjie Liu, Yuchen Yang, Zhizheng Liu, Liang He, Yu Qiao, Yikang Li, Hongsheng Li:
DetZero: Rethinking Offboard 3D Object Detection with Long-term Sequential Point Clouds. ICCV 2023: 6713-6724
[c248]Renrui Zhang, Han Qiu, Tai Wang, Ziyu Guo, Ziteng Cui, Yu Qiao, Hongsheng Li, Peng Gao:
MonoDETR: Depth-guided Transformer for Monocular 3D Object Detection. ICCV 2023: 9121-9132
[c247]Zun Wang, Jialu Li, Yicong Hong, Yi Wang, Qi Wu, Mohit Bansal, Stephen Gould, Hao Tan, Yu Qiao:
Scaling Data Generation in Vision-and-Language Navigation. ICCV 2023: 11975-11986
[c246]Mingfei Han, Yali Wang, Zhihui Li, Lina Yao, Xiaojun Chang, Yu Qiao:
HTML: Hybrid Temporal-scale Multimodal Learning Framework for Referring Video Object Segmentation. ICCV 2023: 13368-13377
[c245]Bingkun Huang, Zhiyu Zhao, Guozhen Zhang, Yu Qiao, Limin Wang:
MGMAE: Motion Guided Masking for Video Masked Autoencoding. ICCV 2023: 13447-13458
[c244]Lihe Yang, Zhen Zhao, Lei Qi, Yu Qiao, Yinghuan Shi, Hengshuang Zhao:
Shrinking Class Space for Enhanced Certainty in Semi-Supervised Learning. ICCV 2023: 16141-16150
[c243]Mengzhao Chen, Wenqi Shao, Peng Xu, Mingbao Lin, Kaipeng Zhang, Fei Chao, Rongrong Ji, Yu Qiao, Ping Luo:
DiffRate: Differentiable Compression Rate for Efficient Vision Transformers. ICCV 2023: 17118-17128
[c242]Kunchang Li, Yali Wang, Yizhuo Li, Yi Wang, Yinan He, Limin Wang, Yu Qiao:
Unmasked Teacher: Towards Training-Efficient Video Foundation Models. ICCV 2023: 19891-19903
[c241]Youquan Liu, Runnan Chen, Xin Li, Lingdong Kong, Yuchen Yang, Zhaoyang Xia, Yeqi Bai, Xinge Zhu, Yuexin Ma, Yikang Li, Yu Qiao, Yuenan Hou:
UniSeg: A Unified Multi-Modal LiDAR Segmentation Network and the OpenPCSeg Codebase. ICCV 2023: 21605-21616
[c240]Yu Qiao, Bo Dong, Ao Jin, Yu Fu, Seung-Hwan Baek, Felix Heide, Pieter Peers, Xiaopeng Wei, Xin Yang:
Multi-view Spectral Polarization Propagation for Video Glass Segmentation. ICCV 2023: 23161-23171
[c239]Junting Pan, Ziyi Lin, Yuying Ge, Xiatian Zhu, Renrui Zhang, Yi Wang, Yu Qiao, Hongsheng Li:
Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen Large Language Models. ICCV (Workshops) 2023: 272-283
[c238]Zhe Chen, Yuchen Duan, Wenhai Wang, Junjun He, Tong Lu, Jifeng Dai, Yu Qiao:
Vision Transformer Adapter for Dense Predictions. ICLR 2023
[c237]Runjian Chen, Yao Mu, Runsen Xu, Wenqi Shao, Chenhan Jiang, Hang Xu, Yu Qiao, Zhenguo Li, Ping Luo:
CO3: Cooperative Unsupervised 3D Representation Learning for Autonomous Driving. ICLR 2023
[c236]Penghao Wu, Li Chen, Hongyang Li, Xiaosong Jia, Junchi Yan, Yu Qiao:
Policy Pre-training for Autonomous Driving via Self-supervised Geometric Modeling. ICLR 2023
[c235]Jiashuo Yu, Yaohui Wang, Xinyuan Chen, Xiao Sun, Yu Qiao:
Long-Term Rhythmic Video Soundtracker. ICML 2023: 40339-40353
[c234]Yu Qiao, Hengyi Zhang, Pengfei Sun, Yuan Tian, Yong Guan, Zhenzhou Shao, Zhiping Shi:
Parallelizable Simple Recurrent Units with Hierarchical Memory. ICONIP (15) 2023: 380-392
[c233]Licheng Wen, Daocheng Fu, Song Mao, Pinlong Cai, Min Dou, Yikang Li, Yu Qiao:
LimSim: A Long-Term Interactive Multi-Scenario Traffic Simulator. ITSC 2023: 1255-1262
[c232]Yunkun Zhang, Jin Gao, Mu Zhou, Xiaosong Wang, Yu Qiao, Shaoting Zhang, Dequan Wang:
Text-Guided Foundation Model Adaptation for Pathological Image Classification. MICCAI (5) 2023: 272-282
[c231]Hongjie Zhang, Yi Liu, Yali Wang, Limin Wang, Yu Qiao:
Learning Discriminative Feature Representation for Open Set Action Recognition. ACM Multimedia 2023: 7696-7705
[c230]Jinjin Gu, Xianzheng Ma, Xiangtao Kong, Yu Qiao, Chao Dong:
Networks are Slacking Off: Understanding Generalization Problem in Image Deraining. NeurIPS 2023
[c229]Linyan Huang, Zhiqi Li, Chonghao Sima, Wenhai Wang, Jingdong Wang, Yu Qiao, Hongyang Li:
Leveraging Vision-Centric Multi-Modal Expertise for 3D Object Detection. NeurIPS 2023
[c228]Fanqing Meng, Wenqi Shao, Zhanglin Peng, Chonghe Jiang, Kaipeng Zhang, Yu Qiao, Ping Luo:
Foundation Model is Efficient Multimodal Multitask Model Selector. NeurIPS 2023
[c227]Yao Mu, Qinglong Zhang, Mengkang Hu, Wenhai Wang, Mingyu Ding, Jun Jin, Bin Wang, Jifeng Dai, Yu Qiao, Ping Luo:
EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought. NeurIPS 2023
[c226]Keqiang Sun, Junting Pan, Yuying Ge, Hao Li, Haodong Duan, Xiaoshi Wu, Renrui Zhang, Aojun Zhou, Zipeng Qin, Yi Wang, Jifeng Dai, Yu Qiao, Limin Wang, Hongsheng Li:
JourneyDB: A Benchmark for Generative Image Understanding. NeurIPS 2023
[c225]Wenhai Wang, Zhe Chen, Xiaokang Chen, Jiannan Wu, Xizhou Zhu, Gang Zeng, Ping Luo, Tong Lu, Jie Zhou, Yu Qiao, Jifeng Dai:
VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks. NeurIPS 2023
[c224]Jiakang Yuan, Bo Zhang, Xiangchao Yan, Botian Shi, Tao Chen, Yikang Li, Yu Qiao:
AD-PT: Autonomous Driving Pre-Training with Large-scale Point Cloud Dataset. NeurIPS 2023
[c223]Wenlong Zhang, Xiaohui Li, Guangyuan Shi, Xiangyu Chen, Yu Qiao, Xiaoyun Zhang, Xiao-Ming Wu, Chao Dong:
Real-World Image Super-Resolution as Multi-Task Learning. NeurIPS 2023
[c222]Rongkun Zheng, Lu Qi, Xi Chen, Yi Wang, Kun Wang, Yu Qiao, Hengshuang Zhao:
TMT-VIS: Taxonomy-aware Multi-dataset Joint Training for Video Instance Segmentation. NeurIPS 2023
[i295]Penghao Wu, Li Chen, Hongyang Li, Xiaosong Jia, Junchi Yan, Yu Qiao:
Policy Pre-training for End-to-end Autonomous Driving via Self-supervised Geometric Modeling. CoRR abs/2301.01006 (2023)
[i294]Runnan Chen, Youquan Liu, Lingdong Kong, Xinge Zhu, Yuexin Ma, Yikang Li, Yuenan Hou, Yu Qiao, Wenping Wang:
CLIP2Scene: Towards Label-efficient 3D Scene Understanding by CLIP. CoRR abs/2301.04926 (2023)
[i293]Jilan Xu, Junlin Hou, Yuejie Zhang, Rui Feng, Yi Wang, Yu Qiao, Weidi Xie:
Learning Open-vocabulary Semantic Segmentation Models From Natural Language Supervision. CoRR abs/2301.09121 (2023)
[i292]Renrui Zhang, Xiangfei Hu, Bohao Li, Siyuan Huang, Hanqiu Deng, Hongsheng Li, Yu Qiao, Peng Gao:
Prompt, Generate, then Cache: Cascade of Foundation Models makes Strong Few-shot Learners. CoRR abs/2303.02151 (2023)
[i291]Zhenyu Wu, Yaoxiang Wang, Jiacheng Ye, Jiangtao Feng, Jingjing Xu, Yu Qiao, Zhiyong Wu:
OpenICL: An Open-Source Framework for In-context Learning. CoRR abs/2303.02913 (2023)
[i290]Xin Li, Tao Ma, Yuenan Hou, Botian Shi, Yuchen Yang, Youquan Liu, Xingjiao Wu, Qin Chen, Yikang Li, Yu Qiao, Liang He:
LoGoNet: Towards Accurate 3D Object Detection with Local-to-Global Cross-Modal Fusion. CoRR abs/2303.03595 (2023)
[i289]Zhongying Deng, Xiaoyu Ren, Jin Ye, Junjun He, Yu Qiao:
FCN+: Global Receptive Convolution Makes FCN Great Again. CoRR abs/2303.04589 (2023)
[i288]Lingdong Kong, Youquan Liu, Runnan Chen, Yuexin Ma, Xinge Zhu, Yikang Li, Yuenan Hou, Yu Qiao, Ziwei Liu:
Rethinking Range View Representation for LiDAR Segmentation. CoRR abs/2303.05367 (2023)
[i287]Ziteng Cui, Lin Gu, Xiao Sun, Yu Qiao, Tatsuya Harada:
Aleth-NeRF: Low-light Condition View Synthesis with Concealing Fields. CoRR abs/2303.05807 (2023)
[i286]Jiakang Yuan, Bo Zhang, Xiangchao Yan, Tao Chen, Botian Shi, Yikang Li, Yu Qiao:
Bi3D: Bi-domain Active Learning for Cross-domain 3D Object Detection. CoRR abs/2303.05886 (2023)
[i285]Bo Zhang, Jiakang Yuan, Botian Shi, Tao Chen, Yikang Li, Yu Qiao:
Uni3D: A Unified Baseline for Multi-dataset 3D Object Detection. CoRR abs/2303.06880 (2023)
[i284]Zhaoyang Xia, Youquan Liu, Xin Li, Xinge Zhu, Yuexin Ma, Yikang Li, Yuenan Hou, Yu Qiao:
SCPNet: Semantic Scene Completion on Point Cloud. CoRR abs/2303.06884 (2023)
[i283]Jiaqi Xu, Xiaowei Hu, Lei Zhu, Qi Dou, Jifeng Dai, Yu Qiao, Pheng-Ann Heng:
Video Dehazing via a Multi-Range Temporal Alignment Network with Physical Prior. CoRR abs/2303.09757 (2023)
[i282]Xuyang Shen, Dong Li, Jinxing Zhou, Zhen Qin, Bowen He, Xiaodong Han, Aixuan Li, Yuchao Dai, Lingpeng Kong, Meng Wang, Yu Qiao, Yiran Zhong:
Fine-grained Audible Video Description. CoRR abs/2303.15616 (2023)
[i281]Kunchang Li, Yali Wang, Yizhuo Li, Yi Wang, Yinan He, Limin Wang, Yu Qiao:
Unmasked Teacher: Towards Training-Efficient Video Foundation Models. CoRR abs/2303.16058 (2023)
[i280]Renrui Zhang, Jiaming Han, Aojun Zhou, Xiangfei Hu, Shilin Yan, Pan Lu, Hongsheng Li, Peng Gao, Yu Qiao:
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention. CoRR abs/2303.16199 (2023)
[i279]Limin Wang, Bingkun Huang, Zhiyu Zhao, Zhan Tong, Yinan He, Yi Wang, Yali Wang, Yu Qiao:
VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking. CoRR abs/2303.16727 (2023)
[i278]Tianyu Li, Li Chen, Xiangwei Geng, Huijie Wang, Yang Li, Zhenbo Liu, Shengyin Jiang, Yuting Wang, Hang Xu, Chunjing Xu, Feng Wen, Ping Luo, Junchi Yan, Wei Zhang, Xiaogang Wang, Yu Qiao, Hongyang Li:
Topology Reasoning for Driving Scenes. CoRR abs/2304.05277 (2023)
[i277]Ziyan Huang, Haoyu Wang, Zhongying Deng, Jin Ye, Yanzhou Su, Hui Sun, Junjun He, Yun Gu, Lixu Gu, Shaoting Zhang, Yu Qiao:
STU-Net: Scalable and Transferable Medical Image Segmentation Models Empowered by Large-Scale Supervised Pre-training. CoRR abs/2304.06716 (2023)
[i276]Xiaoliang Ju, Yiyang Sun, Yiming Hao, Yikang Li, Yu Qiao, Hongsheng Li:
Perception Imitation: Towards Synthesis-free Simulator for Autonomous Vehicles. CoRR abs/2304.09365 (2023)
[i275]Huijie Wang, Zhenbo Liu, Yang Li, Tianyu Li, Li Chen, Chonghao Sima, Yuting Wang, Shengyin Jiang, Feng Wen, Hang Xu, Ping Luo, Junchi Yan, Wei Zhang, Jun Yao, Yu Qiao, Hongyang Li:
Road Genome: A Topology Reasoning Benchmark for Scene Understanding in Autonomous Driving. CoRR abs/2304.10440 (2023)
[i274]Zeyu Lu, Chengyue Wu, Xinyuan Chen, Yaohui Wang, Lei Bai, Yu Qiao, Xihui Liu:
Hierarchical Diffusion Autoencoders and Disentangled Image Manipulation. CoRR abs/2304.11829 (2023)
[i273]Peng Gao, Jiaming Han, Renrui Zhang, Ziyi Lin, Shijie Geng, Aojun Zhou, Wei Zhang, Pan Lu, Conghui He, Xiangyu Yue, Hongsheng Li, Yu Qiao:
LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model. CoRR abs/2304.15010 (2023)
[i272]Jiashuo Yu, Yaohui Wang, Xinyuan Chen, Xiao Sun, Yu Qiao:
Long-Term Rhythmic Video Soundtracker. CoRR abs/2305.01319 (2023)
[i271]Yaohui Wang, Xin Ma, Xinyuan Chen, Antitza Dantcheva, Bo Dai, Yu Qiao:
LEO: Generative Latent Image Animator for Human Video Synthesis. CoRR abs/2305.03989 (2023)
[i270]Mingzhou Liu, Xinwei Sun, Yu Qiao, Yizhou Wang:
Causal Discovery with Unobserved Variables: A Proxy Variable Approach. CoRR abs/2305.05281 (2023)
[i269]Zhaoyang Liu, Yinan He, Wenhai Wang, Weiyun Wang, Yi Wang, Shoufa Chen, Qinglong Zhang, Zeqiang Lai, Yang Yang, Qingyun Li, Jiashuo Yu, Kunchang Li, Zhe Chen, Xue Yang, Xizhou Zhu, Yali Wang, Limin Wang, Ping Luo, Jifeng Dai, Yu Qiao:
InternGPT: Solving Vision-Centric Tasks by Interacting with Chatbots Beyond Language. CoRR abs/2305.05662 (2023)
[i268]Kunchang Li, Yinan He, Yi Wang, Yizhuo Li, Wenhai Wang, Ping Luo, Yali Wang, Limin Wang, Yu Qiao:
VideoChat: Chat-Centric Video Understanding. CoRR abs/2305.06355 (2023)
[i267]Wenhai Wang, Zhe Chen, Xiaokang Chen, Jiannan Wu, Xizhou Zhu, Gang Zeng, Ping Luo, Tong Lu, Jie Zhou, Yu Qiao, Jifeng Dai:
VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks. CoRR abs/2305.11175 (2023)
[i266]Siyuan Huang, Zhengkai Jiang, Hao Dong, Yu Qiao, Peng Gao, Hongsheng Li:
Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model. CoRR abs/2305.11176 (2023)
[i265]Guo Chen, Yin-Dong Zheng, Jiahao Wang, Jilan Xu, Yifei Huang, Junting Pan, Yi Wang, Yali Wang, Yu Qiao, Tong Lu, Limin Wang:
VideoLLM: Modeling Video Sequence with Large Language Models. CoRR abs/2305.13292 (2023)
[i264]Yao Mu, Qinglong Zhang, Mengkang Hu, Wenhai Wang, Mingyu Ding, Jun Jin, Bin Wang, Jifeng Dai, Yu Qiao, Ping Luo:
EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought. CoRR abs/2305.15021 (2023)
[i263]Jinjin Gu, Xianzheng Ma, Xiangtao Kong, Yu Qiao, Chao Dong:
Networks are Slacking Off: Understanding Generalization Problem in Image Deraining. CoRR abs/2305.15134 (2023)
[i262]Shilin Yan, Renrui Zhang, Ziyu Guo, Wenchao Chen, Wei Zhang, Hongyang Li, Yu Qiao, Zhongjiang He, Peng Gao:
Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation. CoRR abs/2305.16318 (2023)
[i261]Xizhou Zhu, Yuntao Chen, Hao Tian, Chenxin Tao, Weijie Su, Chenyu Yang, Gao Huang, Bin Li, Lewei Lu, Xiaogang Wang, Yu Qiao, Zhaoxiang Zhang, Jifeng Dai:
Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory. CoRR abs/2305.17144 (2023)
[i260]Mengzhao Chen, Wenqi Shao, Peng Xu, Mingbao Lin, Kaipeng Zhang, Fei Chao, Rongrong Ji, Yu Qiao, Ping Luo:
DiffRate: Differentiable Compression Rate for Efficient Vision Transformers. CoRR abs/2305.17997 (2023)
[i259]Xiaoliang Ju, Zhaoyang Huang, Yijin Li, Guofeng Zhang, Yu Qiao, Hongsheng Li:
DiffRoom: Diffusion-based High-Quality 3D Room Reconstruction and Generation with Occupancy Prior. CoRR abs/2306.00519 (2023)
[i258]