


Zheng Shou 0001
Mike Zheng Shou
Person information
- affiliation: National University of Singapore
- affiliation (former): Columbia University, New York, NY, USA
Other persons with the same name
- Zheng Shou — disambiguation page
2020 – today
- 2025
- [j8]Weijia Wu, Yuzhong Zhao, Zhuang Li, Jiahong Li, Hong Zhou, Mike Zheng Shou, Xiang Bai:
A large cross-modal video retrieval dataset with reading comprehension. Pattern Recognit. 157: 110818 (2025)
- 2024
- [j7]Henry Hengyuan Zhao, Pichao Wang, Yuyang Zhao, Hao Luo, Fan Wang, Mike Zheng Shou:
SCT: A Simple Baseline for Parameter-Efficient Fine-Tuning via Salient Channels. Int. J. Comput. Vis. 132(3): 731-749 (2024)
- [j6]Alex Jinpeng Wang, Pan Zhou, Mike Zheng Shou, Shuicheng Yan:
Enhancing Visual Grounding in Vision-Language Pre-Training With Position-Guided Text Prompts. IEEE Trans. Pattern Anal. Mach. Intell. 46(5): 3406-3421 (2024)
- [j5]Weijia Wu, Yuzhong Zhao, Zhuang Li, Lianlei Shan, Hong Zhou, Mike Zheng Shou:
Continual Learning for Image Segmentation With Dynamic Query. IEEE Trans. Circuits Syst. Video Technol. 34(6): 4874-4886 (2024)
- [j4]Bingxue Zhang, Gang Chen, Beng Chin Ooi, Mike Zheng Shou, Kian-Lee Tan, Anthony K. H. Tung, Xiaokui Xiao, James Wei Luen Yip, Meihui Zhang:
Managing Metaverse Data Tsunami: Actionable Insights. IEEE Trans. Knowl. Data Eng. 36(12): 7423-7441 (2024)
- [j3]Ming Li, Huazhu Fu, Shengfeng He, Hehe Fan, Jun Liu, Jussi Keppo, Mike Zheng Shou:
DR-FER: Discriminative and Robust Representation Learning for Facial Expression Recognition. IEEE Trans. Multim. 26: 6297-6309 (2024)
- [c89]Zhongcong Xu, Jianfeng Zhang, Jun Hao Liew, Hanshu Yan, Jia-Wei Liu, Chenxu Zhang, Jiashi Feng, Mike Zheng Shou:
MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model. CVPR 2024: 1481-1490
- [c88]Yuchao Gu, Yipin Zhou, Bichen Wu, Licheng Yu, Jia-Wei Liu, Rui Zhao, Jay Zhangjie Wu, David Junhao Zhang, Mike Zheng Shou, Kevin Tang:
VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence. CVPR 2024: 7621-7630
- [c87]Yuchao Gu, Xintao Wang, Yixiao Ge, Ying Shan, Mike Zheng Shou:
Rethinking the Objectives of Vector-Quantized Tokenizers for Image Synthesis. CVPR 2024: 7631-7640
- [c86]Jia-Wei Liu, Yan-Pei Cao, Jay Zhangjie Wu, Weijia Mao, Yuchao Gu, Rui Zhao, Jussi Keppo, Ying Shan, Mike Zheng Shou:
DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing. CVPR 2024: 7664-7674
- [c85]Lingmin Ran, Xiaodong Cun, Jia-Wei Liu, Rui Zhao, Song Zijie, Xintao Wang, Jussi Keppo, Mike Zheng Shou:
X-Adapter: Universal Compatibility of Plugins for Upgraded Diffusion Model. CVPR 2024: 8775-8784
- [c84]Difei Gao, Lei Ji, Zechen Bai, Mingyu Ouyang, Peiran Li, Dongxing Mao, Qinchen Wu, Weichen Zhang, Peiyi Wang, Xiangwu Guo, Hengxu Wang, Luowei Zhou, Mike Zheng Shou:
AssistGUI: Task-Oriented PC Graphical User Interface Automation. CVPR 2024: 13289-13298
- [c83]Jinheng Xie, Songhe Deng, Bing Li, Haozhe Liu, Yawen Huang, Yefeng Zheng, Jürgen Schmidhuber, Bernard Ghanem, Linlin Shen, Mike Zheng Shou:
Tune-an-Ellipse: CLIP Has Potential to Find what you Want. CVPR 2024: 13723-13732
- [c82]Ziteng Gao, Zhan Tong, Kevin Qinghong Lin, Joya Chen, Mike Zheng Shou:
Bootstrapping SparseFormers from Vision Foundation Models. CVPR 2024: 17710-17721
- [c81]Joya Chen, Zhaoyang Lv, Shiwei Wu, Kevin Qinghong Lin, Chenan Song, Difei Gao, Jia-Wei Liu, Ziteng Gao, Dongxing Mao, Mike Zheng Shou:
VideoLLM-online: Online Video Large Language Model for Streaming Video. CVPR 2024: 18407-18418
- [c80]Jingtao Sun, Yaonan Wang, Mingtao Feng, Yulan Guo, Ajmal Mian, Mike Zheng Shou:
L4D-Track: Language-to-4D Modeling Towards 6-DoF Tracking and Shape Reconstruction in 3D Point Cloud Stream. CVPR 2024: 21146-21156
- [c79]Weixian Lei, Yixiao Ge, Kun Yi, Jianfeng Zhang, Difei Gao, Dylan Sun, Yuying Ge, Ying Shan, Mike Zheng Shou:
VIT-LENS: Towards Omni-modal Representations. CVPR 2024: 26637-26647
- [c78]Henry Hengyuan Zhao, Pan Zhou, Mike Zheng Shou:
GENIXER: Empowering Multimodal Large Language Model as a Powerful Data Generator. ECCV (23) 2024: 129-147
- [c77]Rui Zhao, Yuchao Gu, Jay Zhangjie Wu, David Junhao Zhang, Jia-Wei Liu, Weijia Wu, Jussi Keppo, Mike Zheng Shou:
MotionDirector: Motion Customization of Text-to-Video Diffusion Models. ECCV (56) 2024: 273-290
- [c76]Weijia Wu, Zhuang Li, Yuchao Gu, Rui Zhao, Yefei He, David Junhao Zhang, Mike Zheng Shou, Yan Li, Tingting Gao, Di Zhang:
DragAnything: Motion Control for Anything Using Entity Representation. ECCV (22) 2024: 331-348
- [c75]Hai Ci, Pei Yang, Yiren Song, Mike Zheng Shou:
RingID: Rethinking Tree-Ring Watermarking for Enhanced Multi-key Identification. ECCV (28) 2024: 338-354
- [c74]Yiqi Lin, Conghui He, Alex Jinpeng Wang, Bin Wang, Weijia Li, Mike Zheng Shou:
Parrot Captions Teach CLIP to Spot Text. ECCV (42) 2024: 368-385
- [c73]Kevin Qinghong Lin, Pengchuan Zhang, Difei Gao, Xide Xia, Joya Chen, Ziteng Gao, Jinheng Xie, Xuhong Xiao, Mike Zheng Shou:
Learning Video Context as Interleaved Multimodal Sequences. ECCV (49) 2024: 375-396
- [c72]David Junhao Zhang, Mutian Xu, Jay Zhangjie Wu, Chuhui Xue, Wenqing Zhang, Xiaoguang Han, Song Bai, Mike Zheng Shou:
Free-ATM: Harnessing Free Attention Masks for Representation Learning on Diffusion-Generated Images. ECCV (40) 2024: 465-482
- [c71]Difei Gao, Siyuan Hu, Qinghong Lin, Mike Zheng Shou:
AssistGPT: Towards Multi-modal Agent for Human-Centric AI Assistant. HCMA@MM 2024: 3-5
- [c70]Zeyang Song, Jibin Wu, Malu Zhang, Mike Zheng Shou, Haizhou Li:
Spiking-Leaf: A Learnable Auditory Front-End for Spiking Neural Networks. ICASSP 2024: 226-230
- [c69]Ziteng Gao, Zhan Tong, Limin Wang, Mike Zheng Shou:
SparseFormer: Sparse Visual Recognition via Limited Latent Tokens. ICLR 2024
- [c68]Yang Wang, Haiyang Mei, Qirui Bao, Ziqi Wei, Mike Zheng Shou, Haizhou Li, Bo Dong, Xin Yang:
Apprenticeship-Inspired Elegance: Synergistic Knowledge Distillation Empowers Spiking Neural Networks for Efficient Single-Eye Emotion Recognition. IJCAI 2024: 3160-3168
- [c67]Juan Hu, Xin Liao, Difei Gao, Satoshi Tsutsui, Qian Wang, Zheng Qin, Mike Zheng Shou:
Delocate: Detection and Localization for Deepfake Videos with Randomly-Located Tampered Traces. IJCAI 2024: 5862-5871
- [c66]Qi Mao, Lan Chen, Yuchao Gu, Zhen Fang, Mike Zheng Shou:
MAG-Edit: Localized Image Editing in Complex Scenarios via Mask-Based Attention-Adjusted Guidance. ACM Multimedia 2024: 6842-6850
- [c65]Difei Gao, Siyuan Hu, Zechen Bai, Qinghong Lin, Mike Zheng Shou:
AssistEditor: Multi-Agent Collaboration for GUI Workflow Automation in Video Creation. ACM Multimedia 2024: 11255-11257
- [c64]Zechen Bai, Tong He, Haiyang Mei, Pichao Wang, Ziteng Gao, Joya Chen, liulei, Zheng Zhang, Mike Zheng Shou:
One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos. NeurIPS 2024
- [c63]Feipeng Ma, Hongwei Xue, Yizhou Zhou, Guangting Wang, Fengyun Rao, Shilin Yan, Yueyi Zhang, Siying Wu, Mike Zheng Shou, Xiaoyan Sun:
Visual Perception by Large Language Model's Weights. NeurIPS 2024
- [c62]Alex Jinpeng Wang, Linjie Li, Yiqi Lin, Min Li, Lijuan Wang, Mike Zheng Shou:
Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning. NeurIPS 2024
- [c61]Pei Yang, Hai Ci, Yiren Song, Mike Zheng Shou:
Can Simple Averaging Defeat Modern Watermarks? NeurIPS 2024
- [c60]Yiren Song, Shijie Huang, Chen Yao, Hai Ci, Xiaojun Ye, Jiaming Liu, Yuxuan Zhang, Mike Zheng Shou:
ProcessPainter: Learning to draw from sequence data. SIGGRAPH Asia 2024: 18:1-18:10
- [i148]Alex Jinpeng Wang, Linjie Li, Kevin Qinghong Lin, Jianfeng Wang, Kevin Lin, Zhengyuan Yang, Lijuan Wang, Mike Zheng Shou:
COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training. CoRR abs/2401.00849 (2024)
- [i147]David Junhao Zhang, Dongxu Li, Hung Le, Mike Zheng Shou, Caiming Xiong, Doyen Sahoo:
Moonshot: Towards Controllable Video Generation and Editing with Multimodal Conditions. CoRR abs/2401.01827 (2024)
- [i146]Jay Zhangjie Wu, Guian Fang, Haoning Wu, Xintao Wang, Yixiao Ge, Xiaodong Cun, David Junhao Zhang, Jia-Wei Liu, Yuchao Gu, Rui Zhao, Weisi Lin, Wynne Hsu, Ying Shan, Mike Zheng Shou:
Towards A Better Metric for Text-to-Video Generation. CoRR abs/2401.07781 (2024)
- [i145]Juan Hu, Xin Liao, Difei Gao, Satoshi Tsutsui, Qian Wang, Zheng Qin, Mike Zheng Shou:
Delocate: Detection and Localization for Deepfake Videos with Randomly-Located Tampered Traces. CoRR abs/2401.13516 (2024)
- [i144]Zongbo Han, Zechen Bai, Haiyang Mei, Qianli Xu, Changqing Zhang, Mike Zheng Shou:
Skip \n: A Simple Method to Reduce Hallucination in Large Vision-Language Models. CoRR abs/2402.01345 (2024)
- [i143]Zechen Bai, Peng Chen, Xiaolan Peng, Lu Liu, Hui Chen, Mike Zheng Shou, Feng Tian:
Bring Your Own Character: A Holistic Solution for Automatic Facial Animation Generation of Customized Characters. CoRR abs/2402.13724 (2024)
- [i142]Weijia Wu, Zhuang Li, Yuchao Gu, Rui Zhao, Yefei He, David Junhao Zhang, Mike Zheng Shou, Yan Li, Tingting Gao, Di Zhang:
DragAnything: Motion Control for Anything using Entity Representation. CoRR abs/2403.07420 (2024)
- [i141]Jingtao Sun, Yaonan Wang, Mingtao Feng, Chao Ding, Mike Zheng Shou, Ajmal Saeed Mian:
Diffusion-Driven Self-Supervised Learning for Shape Reconstruction and Pose Estimation. CoRR abs/2403.12728 (2024)
- [i140]Wentian Zhang, Haozhe Liu, Jinheng Xie, Francesco Faccio, Mike Zheng Shou, Jürgen Schmidhuber:
Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models. CoRR abs/2404.02747 (2024)
- [i139]Hai Ci, Pei Yang, Yiren Song, Mike Zheng Shou:
RingID: Rethinking Tree-Ring Watermarking for Enhanced Multi-Key Identification. CoRR abs/2404.14055 (2024)
- [i138]Jinheng Xie, Jiajun Feng, Zhaoxu Tian, Kevin Qinghong Lin, Yawen Huang, Xi Xia, Nanxu Gong, Xu Zuo, Jiaqi Yang, Yefeng Zheng, Mike Zheng Shou:
Learning Long-form Video Prior via Generative Pre-Training. CoRR abs/2404.15909 (2024)
- [i137]Zechen Bai, Pichao Wang, Tianjun Xiao, Tong He, Zongbo Han, Zheng Zhang, Mike Zheng Shou:
Hallucination of Multimodal Large Language Models: A Survey. CoRR abs/2404.18930 (2024)
- [i136]Henry Hengyuan Zhao, Pan Zhou, Difei Gao, Mike Zheng Shou:
LOVA3: Learning to Visual Question Answering, Asking and Assessment. CoRR abs/2405.14974 (2024)
- [i135]Feipeng Ma, Hongwei Xue, Guangting Wang, Yizhou Zhou, Fengyun Rao, Shilin Yan, Yueyi Zhang, Siying Wu, Mike Zheng Shou, Xiaoyan Sun:
Multi-Modal Generative Embedding Model. CoRR abs/2405.19333 (2024)
- [i134]Feipeng Ma, Hongwei Xue, Guangting Wang, Yizhou Zhou, Fengyun Rao, Shilin Yan, Yueyi Zhang, Siying Wu, Mike Zheng Shou, Xiaoyan Sun:
Visual Perception by Large Language Model's Weights. CoRR abs/2405.20339 (2024)
- [i133]Alex Jinpeng Wang, Linjie Li, Yiqi Lin, Min Li, Lijuan Wang, Mike Zheng Shou:
Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning. CoRR abs/2406.02547 (2024)
- [i132]Yiren Song, Shijie Huang, Chen Yao, Xiaojun Ye, Hai Ci, Jiaming Liu, Yuxuan Zhang, Mike Zheng Shou:
ProcessPainter: Learn Painting Process from Sequence Data. CoRR abs/2406.06062 (2024)
- [i131]Hai Ci, Yiren Song, Pei Yang, Jinheng Xie, Mike Zheng Shou:
WMAdapter: Adding WaterMark Control to Latent Diffusion Models. CoRR abs/2406.08337 (2024)
- [i130]Pei Yang, Hai Ci, Yiren Song, Mike Zheng Shou:
Steganalysis on Digital Watermarking: Is Your Defense Truly Impervious? CoRR abs/2406.09026 (2024)
- [i129]Kevin Qinghong Lin, Linjie Li, Difei Gao, Qinchen Wu, Mingyi Yan, Zhengyuan Yang, Lijuan Wang, Mike Zheng Shou:
VideoGUI: A Benchmark for GUI Automation from Instructional Videos. CoRR abs/2406.10227 (2024)
- [i128]Joya Chen, Zhaoyang Lv, Shiwei Wu, Kevin Qinghong Lin, Chenan Song, Difei Gao, Jia-Wei Liu, Ziteng Gao, Dongxing Mao, Mike Zheng Shou:
VideoLLM-online: Online Video Large Language Model for Streaming Video. CoRR abs/2406.11816 (2024)
- [i127]Qinchen Wu, Difei Gao, Kevin Qinghong Lin, Zhuoyu Wu, Xiangwu Guo, Peiran Li, Weichen Zhang, Hengxu Wang, Mike Zheng Shou:
GUI Action Narrator: Where and When Did That Action Take Place? CoRR abs/2406.13719 (2024)
- [i126]Yang Wang, Haiyang Mei, Qirui Bao, Ziqi Wei, Mike Zheng Shou, Haizhou Li, Bo Dong, Xin Yang:
Apprenticeship-Inspired Elegance: Synergistic Knowledge Distillation Empowers Spiking Neural Networks for Efficient Single-Eye Emotion Recognition. CoRR abs/2407.09521 (2024)
- [i125]Kevin Qinghong Lin, Pengchuan Zhang, Difei Gao, Xide Xia, Joya Chen, Ziteng Gao, Jinheng Xie, Xuhong Xiao, Mike Zheng Shou:
Learning Video Context as Interleaved Multimodal Sequences. CoRR abs/2407.21757 (2024)
- [i124]Zechen Bai, Tianjun Xiao, Tong He, Pichao Wang, Zheng Zhang, Thomas Brox, Mike Zheng Shou:
GQE: Generalized Query Expansion for Enhanced Text-Video Retrieval. CoRR abs/2408.07249 (2024)
- [i123]Jinheng Xie, Weijia Mao, Zechen Bai, David Junhao Zhang, Weihao Wang, Kevin Qinghong Lin, Yuchao Gu, Zhijie Chen, Zhenheng Yang, Mike Zheng Shou:
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation. CoRR abs/2408.12528 (2024)
- [i122]Shiwei Wu, Joya Chen, Kevin Qinghong Lin, Qimeng Wang, Yan Gao, Qianli Xu, Tong Xu, Yao Hu, Enhong Chen, Mike Zheng Shou:
VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation. CoRR abs/2408.16730 (2024)
- [i121]Zongbo Han, Jialong Yang, Junfan Li, Qinghua Hu, Qianli Xu, Mike Zheng Shou, Changqing Zhang:
DOTA: Distributional Test-Time Adaptation of Vision-Language Models. CoRR abs/2409.19375 (2024)
- [i120]Zhongcong Xu, Chaoyue Song, Guoxian Song, Jianfeng Zhang, Jun Hao Liew, Hongyi Xu, You Xie, Linjie Luo, Guosheng Lin, Jiashi Feng, Mike Zheng Shou:
High Quality Human Image Animation using Regional Supervision and Motion Blur Condition. CoRR abs/2409.19580 (2024)
- [i119]Zechen Bai, Tong He, Haiyang Mei, Pichao Wang, Ziteng Gao, Joya Chen, Lei Liu, Zheng Zhang, Mike Zheng Shou:
One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos. CoRR abs/2409.19603 (2024)
- [i118]Ziyu Wang, Shuangpeng Han, Mike Zheng Shou, Mengmi Zhang:
Unsupervised Prior Learning: Discovering Categorical Pose Priors from Videos. CoRR abs/2410.03858 (2024)
- [i117]Yepeng Liu, Yiren Song, Hai Ci, Yu Zhang, Haofan Wang, Mike Zheng Shou, Yuheng Bu:
Image Watermarks are Removable Using Controllable Regeneration from Clean Noise. CoRR abs/2410.05470 (2024)
- [i116]Rui Zhao, Hangjie Yuan, Yujie Wei, Shiwei Zhang, Yuchao Gu, Lingmin Ran, Xiang Wang, Jay Zhangjie Wu, Junhao Zhang, Yingya Zhang, Mike Zheng Shou:
EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models. CoRR abs/2410.07133 (2024)
- [i115]Hongbin Xu, Weitao Chen, Zhipeng Zhou, Feng Xiao, Baigui Sun, Mike Zheng Shou, Wenxiong Kang:
ControLRM: Fast and Controllable 3D Generation via Large Reconstruction Model. CoRR abs/2410.09592 (2024)
- [i114]Zijie Ye, Jia-Wei Liu, Jia Jia, Shikun Sun, Mike Zheng Shou:
Skinned Motion Retargeting with Dense Geometric Interaction Perception. CoRR abs/2410.20986 (2024)
- [i113]David Junhao Zhang, Roni Paiss, Shiran Zada, Nikhil Karnad, David E. Jacobs, Yael Pritch, Inbar Mosseri, Mike Zheng Shou, Neal Wadhwa, Nataniel Ruiz:
ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning. CoRR abs/2411.05003 (2024)
- [i112]Siyuan Hu, Mingyu Ouyang, Difei Gao, Mike Zheng Shou:
The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use. CoRR abs/2411.10323 (2024)
- [i111]Binqian Xu, Xiangbo Shu, Haiyang Mei, Guosen Xie, Basura Fernando, Mike Zheng Shou, Jinhui Tang:
FedMLLM: Federated Fine-tuning MLLM on Multimodal Heterogeneity Data. CoRR abs/2411.14717 (2024)
- [i110]Weijia Wu, Mingyu Liu, Zeyu Zhu, Xi Xia, Haoen Feng, Wen Wang, Kevin Qinghong Lin, Chunhua Shen, Mike Zheng Shou:
MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation. CoRR abs/2411.15262 (2024)
- [i109]Zechen Bai, Jianxiong Gao, Ziteng Gao, Pichao Wang, Zheng Zhang, Tong He, Mike Zheng Shou:
Factorized Visual Tokenization and Generation. CoRR abs/2411.16681 (2024)
- [i108]Kevin Qinghong Lin, Linjie Li, Difei Gao, Zhengyuan Yang, Shiwei Wu, Zechen Bai, Weixian Lei, Lijuan Wang, Mike Zheng Shou:
ShowUI: One Vision-Language-Action Model for GUI Visual Agent. CoRR abs/2411.17465 (2024)
- [i107]Yuchao Gu, Yipin Zhou, Yunfan Ye, Yixin Nie, Licheng Yu, Pingchuan Ma, Kevin Qinghong Lin, Mike Zheng Shou:
ROICtrl: Boosting Instance Control for Visual Generation. CoRR abs/2411.17949 (2024)
- [i106]Yiren Song, Shengtao Lou, Xiaokang Liu, Hai Ci, Pei Yang, Jiaming Liu, Mike Zheng Shou:
Anti-Reference: Universal and Immediate Defense Against Reference-Based Generation. CoRR abs/2412.05980 (2024)
- [i105]Muhammet Ilaslan, Ali Koksal, Kevin Qinhong Lin, Burak Satar, Mike Zheng Shou, Qianli Xu:
VG-TVP: Multimodal Procedural Planning via Visually Grounded Text-Video Prompting. CoRR abs/2412.11621 (2024)
- [i104]Yiren Song, Pei Yang, Hai Ci, Mike Zheng Shou:
IDProtector: An Adversarial Noise Encoder to Protect Against ID-Preserving Image Generation. CoRR abs/2412.11638 (2024)
- [i103]Yiren Song, Xiaokang Liu, Mike Zheng Shou:
DiffSim: Taming Diffusion Models for Evaluating Visual Similarity. CoRR abs/2412.14580 (2024)
- 2023
- [j2]Wenqian Wang, Faliang Chang, Junhao Zhang, Rui Yan, Chunsheng Liu, Bin Wang, Mike Zheng Shou:
Magi-Net: Meta Negative Network for Early Activity Prediction. IEEE Trans. Image Process. 32: 3254-3265 (2023)
- [c59]Stan Weixian Lei, Difei Gao, Jay Zhangjie Wu, Yuxuan Wang, Wei Liu, Mengmi Zhang, Mike Zheng Shou:
Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task. AAAI 2023: 1250-1259
- [c58]Rui Yan, Mike Zheng Shou, Yixiao Ge, Jinpeng Wang, Xudong Lin, Guanyu Cai, Jinhui Tang:
Video-Text Pre-training with Learned Regions for Retrieval. AAAI 2023: 3100-3108
- [c57]Binjie Zhang, Shupeng Su, Yixiao Ge, Xuyuan Xu, Yexin Wang, Chun Yuan, Mike Zheng Shou, Ying Shan:
Darwinian Model Upgrades: Model Evolving with Selective Compatibility. AAAI 2023: 3393-3400
- [c56]Zhijian Hou, Wanjun Zhong, Lei Ji, Difei Gao, Kun Yan, Wing Kwong Chan, Chong-Wah Ngo, Mike Zheng Shou, Nan Duan:
CONE: An Efficient COarse-to-fiNE Alignment Framework for Long Video Temporal Grounding. ACL (1) 2023: 8013-8028
- [c55]Shuning Chang, Pichao Wang, Fan Wang, Jiashi Feng, Mike Zheng Shou:
DOAD: Decoupled One Stage Action Detection Network. CVPR Workshops 2023: 3123-3232
- [c54]Shuning Chang, Pichao Wang, Ming Lin, Fan Wang, David Junhao Zhang, Rong Jin, Mike Zheng Shou:
Making Vision Transformers Efficient from A Token Sparsification View. CVPR 2023: 6195-6205
- [c53]Jinpeng Wang, Yixiao Ge, Rui Yan, Yuying Ge, Kevin Qinghong Lin, Satoshi Tsutsui, Xudong Lin, Guanyu Cai, Jianping Wu, Ying Shan, Xiaohu Qie, Mike Zheng Shou:
All in One: Exploring Unified Video-Language Pre-Training. CVPR 2023: 6598-6608
- [c52]Joya Chen, Difei Gao, Kevin Qinghong Lin, Mike Zheng Shou:
Affordance Grounding from Demonstration Video to Target Image. CVPR 2023: 6799-6808
- [c51]Difei Gao, Luowei Zhou, Lei Ji, Linchao Zhu, Yi Yang, Mike Zheng Shou:
MIST : Multi-modal Iterative Spatial-Temporal Transformer for Long-form Video Question Answering. CVPR 2023: 14773-14783
- [c50]Xudong Lin, Simran Tiwari, Shiyuan Huang, Manling Li, Mike Zheng Shou, Heng Ji, Shih-Fu Chang:
Towards Fast Adaptation of Pretrained Contrastive Models for Multi-channel Video-Language Retrieval. CVPR 2023: 14846-14855
- [c49]Jinpeng Wang, Pan Zhou, Mike Zheng Shou, Shuicheng Yan:
Position-Guided Text Prompt for Vision-Language Pre-Training. CVPR 2023: 23242-23251
- [c48]Muhammet Ilaslan, Chenan Song, Joya Chen, Difei Gao, Weixian Lei, Qianli Xu, Joo Lim, Mike Zheng Shou:
GazeVQA: A Video Question Answering Dataset for Multiview Eye-Gaze Task-Oriented Collaborations. EMNLP 2023: 10462-10479
- [c47]Weijia Wu, Yuzhong Zhao, Mike Zheng Shou, Hong Zhou, Chunhua Shen:
DiffuMask: Synthesizing Images with Pixel-level Annotations for Semantic Segmentation Using Diffusion Models. ICCV 2023: 1206-1217
- [c46]Kevin Qinghong Lin, Pengchuan Zhang, Joya Chen, Shraman Pramanick, Difei Gao, Alex Jinpeng Wang, Rui Yan, Mike Zheng Shou:
UniVTG: Towards Unified Video-Language Temporal Grounding. ICCV 2023: 2782-2792
- [c45]Alex Jinpeng Wang, Kevin Qinghong Lin, David Junhao Zhang, Stan Weixian Lei, Mike Zheng Shou:
Too Large; Data Reduction for Vision-Language Pre-Training. ICCV 2023: 3124-3134
- [c44]Ming Li, Xiangyu Xu, Hehe Fan, Pan Zhou, Jun Liu, Jia-Wei Liu, Jiahe Li, Jussi Keppo, Mike Zheng Shou, Shuicheng Yan:
STPrivacy: Spatio-Temporal Privacy-Preserving Action Recognition. ICCV 2023: 5083-5092
- [c43]Shraman Pramanick, Yale Song, Sayan Nag, Kevin Qinghong Lin, Hardik Shah, Mike Zheng Shou, Rama Chellappa, Pengchuan Zhang:
EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone. ICCV 2023: 5262-5274
- [c42]Jinheng Xie, Yuexiang Li, Yawen Huang, Haozhe Liu, Wentian Zhang, Yefeng Zheng, Mike Zheng Shou:
BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion. ICCV 2023: 7418-7427
- [c41]Jay Zhangjie Wu, Yixiao Ge, Xintao Wang, Stan Weixian Lei, Yuchao Gu, Yufei Shi, Wynne Hsu, Ying Shan, Xiaohu Qie, Mike Zheng Shou:
Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation. ICCV 2023: 7589-7599
- [c40]Parantak Singh, You Li, Ankur Sikarwar, Weixian Lei, Difei Gao, Morgan B. Talbot, Ying Sun, Mike Zheng Shou, Gabriel Kreiman, Mengmi Zhang:
Learning to Learn: How to Continuously Teach Humans and Machines. ICCV 2023: 11674-11685
- [c39]Ke Fan, Zechen Bai, Tianjun Xiao, Dominik Zietlow, Max Horn, Zixu Zhao, Carl-Johann Simon-Gabriel, Mike Zheng Shou, Francesco Locatello, Bernt Schiele, Thomas Brox, Zheng Zhang, Yanwei Fu, Tong He:
Unsupervised Open-Vocabulary Object Localization in Videos. ICCV 2023: 13701-13709
- [c38]Jia-Wei Liu, Yan-Pei Cao, Tianyuan Yang, Zhongcong Xu, Jussi Keppo, Ying Shan, Xiaohu Qie, Mike Zheng Shou:
HOSNeRF: Dynamic Human-Object-Scene Neural Radiance Fields from a Single Video. ICCV 2023: 18437-18448
- [c37]Jay Zhangjie Wu, David Junhao Zhang, Wynne Hsu, Mengmi Zhang, Mike Zheng Shou:
Label-Efficient Online Continual Object Detection in Streaming Video. ICCV 2023: 19189-19198
- [c36]Shuning Chang, Pichao Wang, Hao Luo, Fan Wang, Mike Zheng Shou:
Revisiting Vision Transformer from the View of Path Ensemble. ICCV 2023: 19832-19842
- [c35]