


32nd MM 2024: Melbourne, VIC, Australia
- Jianfei Cai, Mohan S. Kankanhalli, Balakrishnan Prabhakaran, Susanne Boll, Ramanathan Subramanian, Liang Zheng, Vivek K. Singh, Pablo César, Lexing Xie, Dong Xu: Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024 - 1 November 2024. ACM 2024, ISBN 979-8-4007-0686-8
Keynote Talks
- Pascale Fung: From Assistants to Agents in the LLM Era. 1
- Benoit Huet: Revolutionizing Lung Cancer Diagnostics with eyonis TM LCS: Cutting-edge AI/ML Technology-based SaMD for Enhanced Patient Care. 2-3
- Judy Kay: Empowering People to Harness and Control their Multimodal Data in Scrutable User models. 4-5
- Jiebo Luo: Large Multimodal Models as Social Multimedia Analysis Engines. 6-7
Oral Session 1: Large Language Models & Applications 1
- Haicheng Liao, Yongkang Li, Chengyue Wang, Yanchen Guan, Kahou Tam, Chunlin Tian, Li Li, Chengzhong Xu, Zhenning Li: When, Where, and What? A Benchmark for Accident Anticipation and Localization with Large Language Models. 8-17
- Haonan Zheng, Xinyang Deng, Wen Jiang, Wenrui Li: A Unified Understanding of Adversarial Vulnerability Regarding Unimodal Models and Vision-Language Pre-training Models. 18-27
- Xiang Fang, Wanlong Fang, Daizong Liu, Xiaoye Qu, Jianfeng Dong, Pan Zhou, Renfu Li, Zichuan Xu, Lixing Chen, Panpan Zheng, Yu Cheng: Not All Inputs Are Valid: Towards Open-Set Video Moment Retrieval using Language. 28-37
- Huishan Ji, Qingyi Si, Zheng Lin, Weiping Wang: Towards Flexible Evaluation for Generative Visual Question Answering. 38-47
- Jiaqi Zhu, Shaofeng Cai, Fang Deng, Beng Chin Ooi, Junran Wu: Do LLMs Understand Visual Anomalies? Uncovering LLM's Capabilities in Zero-shot Anomaly Detection. 48-57
- Yudong Li, Xianxu Hou, Dezhi Zheng, Linlin Shen, Zhe Zhao: FLIP-80M: 80 Million Visual-Linguistic Pairs for Facial Language-Image Pre-Training. 58-67
Oral Session 2: Large Language Models & Applications 2
- Esmée Henrieke Anne de Haas, Lik-Hang Lee, Yiming Huang, Carlos Bermejo, Pan Hui, Zijun Lin: Towards Trustworthy MetaShopping: Studying Manipulative Audiovisual Designs in Virtual-Physical Commercial Platforms. 68-77
- Weiqi Li, Shijie Zhao, Bin Chen, Xinhua Cheng, Junlin Li, Li Zhang, Jian Zhang: ResVR: Joint Rescaling and Viewport Rendering of Omnidirectional Images. 78-87
- Yunqiang Pei, Kaiyue Zhang, Hongrong Yang, Yong Tao, Qihang Tang, Jialei Tang, Guoqing Wang, Zhitao Liu, Ning Xie, Peng Wang, Yang Yang, Hengtao Shen: Improving Interaction Comfort in Authoring Task in AR-HRI through Dynamic Dual-Layer Interaction Adjustment. 88-97
- Yang Lu, Junxian Li, Zhitong Cui, Jiapeng Hu, Yanna Lin, Shijian Luo: Designing Spatial Visualization and Interactions of Immersive Sankey Diagram in Virtual Reality. 98-107
- Zhang Wan, Sheng Tang, Jiawei Wei, Ruize Zhang, Juan Cao: DragEntity: Trajectory Guided Video Generation using Entity and Positional Relationships. 108-116
- Kento Shigyo, Yifan Cao, Kentaro Takahira, Mingming Fan, Huamin Qu: VR-Mediated Cognitive Defusion: A Comparative Study for Managing Negative Thoughts. 117-126
Oral Session 3: Novel Multimedia Applications 1
- Yinxuan Gui, Bin Zhu, Jingjing Chen, Chong Wah Ngo, Yu-Gang Jiang: Navigating Weight Prediction with Diet Diary. 127-136
- Feiyu Chen, Cong Xu, Qi Jia, Yihua Wang, Yuhan Liu, Haotian Zhang, Endong Wang: Egocentric Vehicle Dense Video Captioning. 137-146
- Jinyue Chen, Lingyu Kong, Haoran Wei, Chenglong Liu, Zheng Ge, Liang Zhao, Jianjian Sun, Chunrui Han, Xiangyu Zhang: OneChart: Purify the Chart Structural Extraction via One Auxiliary Token. 147-155
- Jiawei Lin, Zhaoyun Jiang, Jiaqi Guo, Shizhao Sun, Ting Liu, Zijiang Yang, Jian-Guang Lou, Dongmei Zhang: IconDM: Text-Guided Icon Set Expansion Using Diffusion Models. 156-165
- Haipeng Zhou, Hongqiu Wang, Tian Ye, Zhaohu Xing, Jun Ma, Ping Li, Qiong Wang, Lei Zhu: Timeline and Boundary Guided Diffusion Network for Video Shadow Detection. 166-175
- Yichang Qu, Bing Li, Jie Huang, Feng Zhao: Training Pansharpening Networks at Full Resolution Using Degenerate Invariance. 176-185
Oral Session 4: Graph and Diffusion Models
- Jielong Lu, Zhihao Wu, Zhaoliang Chen, Zhiling Cai, Shiping Wang: Towards Multi-view Consistent Graph Diffusion. 186-195
- Liyuan Ma, Xueji Fang, Guo-Jun Qi: Equilibrated Diffusion: Frequency-aware Textual Embedding for Equilibrated Image Customization. 196-204
- Weilun Feng, Chuanguang Yang, Zhulin An, Libo Huang, Boyu Diao, Fei Wang, Yongjun Xu: Relational Diffusion Distillation for Efficient Image Generation. 205-213
- Hongjie Wu, Linchao He, Mingqin Zhang, Dongdong Chen, Kunming Luo, Mengting Luo, Jizhe Zhou, Hu Chen, Jiancheng Lv: Diffusion Posterior Proximal Sampling for Image Restoration. 214-223
- Yiheng Huang, Hui Yang, Chuanchen Luo, Yuxi Wang, Shibiao Xu, Zhaoxiang Zhang, Man Zhang, Junran Peng: StableMoFusion: Towards Robust and Efficient Diffusion-based Motion Generation Framework. 224-232
- Yichi Zhang, Zhuo Chen, Lingbing Guo, Yajing Xu, Wen Zhang, Huajun Chen: Making Large Language Models Perform Better in Knowledge Graph Completion. 233-242
Oral Session 5: Multimodal Models and Applications
- Rishikesh Devanathan, Apoorva Singh, A. S. Poornash, Sriparna Saha: Seeing Beyond Words: Multimodal Aspect-Level Complaint Detection in Ecommerce Videos. 243-252
- Hsiang-Hui Hung, Huu-Phu Do, Yung-Hui Li, Ching-Chun Huang: TimeNeRF: Building Generalizable Neural Radiance Fields across Time from Few-Shot Input Views. 253-262
- Xiaoxuan Shen, Fenghua Yu, Yaqi Liu, Ruxia Liang, Qian Wan, Kai Yang, Jianwen Sun: Revisiting Knowledge Tracing: A Simple and Powerful Model. 263-272
- Peiming Li, Ziyi Wang, Mengyuan Liu, Hong Liu, Chen Chen: ClickDiff: Click to Induce Semantic Contact Map for Controllable Grasp Generation with Diffusion Models. 273-281
- Bochao Liu, Pengju Wang, Weijia Guo, Yong Li, Liansheng Zhuang, Weiping Wang, Shiming Ge: Private Gradient Estimation is Useful for Generative Modeling. 282-290
- Ke Zhu, Liang Zhao, Zheng Ge, Xiangyu Zhang: Self-Supervised Visual Preference Alignment. 291-300
Oral Session 6: Innovations in Medical Imaging and Physiological Measurement
- Yuxin Hong, Xiao Zhang, Xin Zhang, Joey Tianyi Zhou: Evolution-aware VAriance (EVA) Coreset Selection for Medical Image Classification. 301-310
- Ruiqi Wang, Jinyang Huang, Jie Zhang, Xin Liu, Xiang Zhang, Zhi Liu, Peng Zhao, Sigui Chen, Xiao Sun: FacialPulse: An Efficient RNN-based Depression Detection via Temporal Facial Landmarks. 311-320
- Wei Zhang, En Zhu, Juan Chen, Yunpeng Li: MDDR: Multi-modal Dual-Attention aggregation for Depression Recognition. 321-329
- Wei Qian, Kun Li, Dan Guo, Bin Hu, Meng Wang: Cluster-Phys: Facial Clues Clustering Towards Efficient Remote Physiological Measurement. 330-339
- Zhenxi Song, Ruihan Qin, Huixia Ren, Zhen Liang, Yi Guo, Min Zhang, Zhiguo Zhang: EEG-MACS: Manifold Attention and Confidence Stratification for EEG-based Cross-Center Brain Disease Diagnosis under Unreliable Annotations. 340-349
- Xueyuan Xu, Li Zhuo, Jinxin Lu, Xia Wu: WSEL: EEG Feature Selection with Weighted Self-expression Learning for Incomplete Multi-dimensional Emotion Recognition. 350-359
Oral Session 7: Imaging, Computer Vision & Graphics
- Yuanbo Wen, Tao Gao, Ting Chen: Unpaired Photo-realistic Image Deraining with Energy-informed Diffusion Model. 360-369
- Zeyu Li, Ruitong Gan, Chuanchen Luo, Yuxi Wang, Jiaheng Liu, Ziwei Zhu, Qing Li, Xucheng Yin, Man Zhang, Zhaoxiang Zhang, Junran Peng: MaterialSeg3D: Segmenting Dense Materials from 2D Priors for 3D Assets. 370-379
- Xiao Han, Yiming Ren, Peishan Cong, Yujing Sun, Jingya Wang, Lan Xu, Yuexin Ma: Gait Recognition in Large-scale Free Environment via Single LiDAR. 380-389
- Tang Tao, Longfei Gao, Guangrun Wang, Yixing Lao, Peng Chen, Hengshuang Zhao, Dayang Hao, Xiaodan Liang, Mathieu Salzmann, Kaicheng Yu: LiDAR-NeRF: Novel LiDAR View Synthesis via Neural Radiance Fields. 390-398
- Mu Chen, Zhedong Zheng, Yi Yang: Transferring to Real-World Layouts: A Depth-aware Framework for Scene Adaptation. 399-408
- Yujian Mo, Yan Wu, Junqiao Zhao, Zhenjie Hou, Weiquan Huang, Yinghao Hu, Jijun Wang, Jun Yan: Sparse Query Dense: Enhancing 3D Object Detection with Pseudo Points. 409-418
Oral Session 8: Multimodal Reasoning & Inference
- Changmeng Zheng, Dayong Liang, Wengyu Zhang, Xiaoyong Wei, Tat-Seng Chua, Qing Li: A Picture Is Worth a Graph: A Blueprint Debate Paradigm for Multimodal Reasoning. 419-428
- Qian Guo, Xinyan Liang, Yuhua Qian, Zhihua Cui, Jie Wen: A Progressive Skip Reasoning Fusion Method for Multi-Modal Classification. 429-437
- Wenxin Xu, Hexin Jiang, Xuefeng Liang: Leveraging Knowledge of Modality Experts for Incomplete Multimodal Learning. 438-446
- Bo Xu, Junzhe Zheng, Jiayuan He, Yuxuan Sun, Hongfei Lin, Liang Zhao, Feng Xia: Generating Multimodal Metaphorical Features for Meme Understanding. 447-455
- Junjie Shi, Caozhi Shang, Zhaobin Sun, Li Yu, Xin Yang, Zengqiang Yan: PASSION: Towards Effective Incomplete Multi-Modal Medical Image Segmentation with Imbalanced Missing Rates. 456-465
- Mengze Li, Kairong Han, Jiahe Xu, Yueying Li, Tao Wu, Zhou Zhao, Jiaxu Miao, Shengyu Zhang, Jingyuan Chen: Cross-modal Observation Hypothesis Inference. 466-475
Oral Session 9: Image, Video, and Multimedia Processing
- Jiyang Li, Lechao Cheng, Zhangye Wang, Tingting Mu, Jingxuan He: LoopGaussian: Creating 3D Cinemagraph with Multi-view Images via Eulerian Motion Field. 476-485
- Chaofeng Chen, Sensen Yang, Haoning Wu, Liang Liao, Zicheng Zhang, Annan Wang, Wenxiu Sun, Qiong Yan, Weisi Lin: Q-Ground: Image Quality Grounding with Large Multi-modality Models. 486-495
- Cheng Ye, Weidong Chen, Jingyu Li, Lei Zhang, Zhendong Mao: Dual-path Collaborative Generation Network for Emotional Video Captioning. 496-505
- Hu Lin, Chengjiang Long, Yifeng Fei, Qianchen Xia, Erwei Yin, Baocai Yin, Xin Yang: Exploring Matching Rates: From Keypoint Selection to Camera Relocalization. 506-514
- Zhihong Zhu, Xuxin Cheng, Zhaorun Chen, Yuyan Chen, Yunyan Zhang, Xian Wu, Yefeng Zheng, Bowen Xing: InMu-Net: Advancing Multi-modal Intent Detection via Information Bottleneck and Multi-sensory Processing. 515-524
- Chaoya Jiang, Hongrui Jia, Mengfan Dong, Wei Ye, Haiyang Xu, Ming Yan, Ji Zhang, Shikun Zhang: Hal-Eval: A Universal and Fine-grained Hallucination Evaluation Framework for Large Vision Language Models. 525-534
Oral Session 10: Speech and Audio in Multimedia Processing
- Zhongxu Wang, Yujia Wang, Mingzhu Li, Hua Huang: ArtSpeech: Adaptive Text-to-Speech Synthesis with Articulatory Representations. 535-544
- Shuai Yu, Xiaoliang He, Ke Chen, Yi Yu: HKDSME: Heterogeneous Knowledge Distillation for Semi-supervised Singing Melody Extraction Using Harmonic Supervision. 545-553
- Yixuan Zhou, Xiaoyu Qin, Zeyu Jin, Shuoyi Zhou, Shun Lei, Songtao Zhou, Zhiyong Wu, Jia Jia: VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling. 554-563
- Navonil Majumder, Chia-Yu Hung, Deepanway Ghosal, Wei-Ning Hsu, Rada Mihalcea, Soujanya Poria: Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization. 564-572
- Xihua Wang, Yuyue Wang, Yihan Wu, Ruihua Song, Xu Tan, Zehua Chen, Hongteng Xu, Guodong Sui: TiVA: Time-Aligned Video-to-Audio Generation. 573-582
- Alejandro Galán-Cuenca, Jose J. Valero-Mas, Juan C. Martinez-Sevilla, Antonio Hidalgo-Centeno, Antonio Pertusa, Jorge Calvo-Zaragoza: MUSCAT: A Multimodal mUSic Collection for Automatic Transcription of Real Recordings and Image Scores. 583-591
Oral Session 11: Emotion & Sentiment
- Jianing Zhao, Jingjing Wang, Yujie Jin, Jiamin Luo, Guodong Zhou: Hawkeye: Discovering and Grounding Implicit Anomalous Sentiment in Recon-videos via Scene-enhanced Video Large Language Model. 592-601
- Daiqing Wu, Dongbao Yang, Yu Zhou, Can Ma: Bridging Visual Affective Gap: Borrowing Textual Knowledge by Learning from Noisy Image-Text Pairs. 602-611
- Tan Yu, Jingjing Wang, Jiawen Wang, Jiamin Luo, Guodong Zhou: Towards Emotion-enriched Text-to-Motion Generation via LLM-guided Limb-level Emotion Manipulating. 612-621
- Wenjie Zheng, Jianfei Yu, Rui Xia: A Unimodal Valence-Arousal Driven Contrastive Learning Framework for Multimodal Multi-Label Emotion Recognition. 622-631
- Xinji Mai, Junxiong Lin, Haoran Wang, Zeng Tao, Yan Wang, Shaoqi Yan, Xuan Tong, Jiawen Yu, Boyang Wang, Ziheng Zhou, Qing Zhao, Shuyong Gao, Wenqiang Zhang: All rivers run into the sea: Unified Modality Brain-Inspired Emotional Central Mechanism. 632-641
- Xin Li, Shangfei Wang, Xuandong Huang: Temporal Enhancement for Video Affective Content Analysis. 642-650
Poster Session 1
- Pei He
, Licheng Jiao
, Lingling Li
, Xu Liu
, Fang Liu
, Wenping Ma
, Shuyuan Yang
, Ronghua Shang
:
Domain Generalization-Aware Uncertainty Introspective Learning for 3D Point Clouds Segmentation. 651-660 - Yi Ma
, Peiqi Duan
, Yuchen Hong
, Chu Zhou
, Yu Zhang
, Jimmy S. J. Ren
, Boxin Shi
:
Color4E: Event Demosaicing for Full-color Event Guided Image Deblurring. 661-670 - Jiajie Zhu
, Xia Du
, Jizhe Zhou
, Chi-Man Pun
, Qizhen Xu
, Xiaoyuan Liu
:
DP-RAE: A Dual-Phase Merging Reversible Adversarial Example for Image Privacy Protection. 671-680 - Xinyi Zhang
, Qinpeng Cui
, Qiqi Bao
, Wenming Yang
, Qingmin Liao
:
Geometry-Guided Diffusion Model with Masked Transformer for Robust Multi-View 3D Human Pose Estimation. 681-690 - Meiqi Cao
, Rui Yan
, Xiangbo Shu
, Guangzhao Dai
, Yazhou Yao
, Guo-Sen Xie
:
AdaFPP: Adapt-Focused Bi-Propagating Prototype Learning for Panoramic Activity Recognition. 691-700 - Junsheng Wang
, Tiantian Gong
, Yan Yan
:
Partially Aligned Cross-modal Retrieval via Optimal Transport-based Prototype Alignment Learning. 701-709 - Hu Gao, Bowen Ma, Ying Zhang, Jingfan Yang, Jing Yang, Depeng Dang:
Learning Enriched Features via Selective State Spaces Model for Efficient Image Deblurring. 710-718 - Hangjun Che
, Xinyu Pu
, Deqiang Ouyang
, Beibei Li
:
Enhanced Tensorial Self-representation Subspace Learning for Incomplete Multi-view Clustering. 719-728 - Jian-Jun Qiao
, Meng-Yu Duan
, Xiao Wu
, Yu-Pei Song
:
CartoonNet: Cartoon Parsing with Semantic Consistency and Structure Correlation. 729-737 - Qianyu Guo
, Jieji Ren
, Haofen Wang
, Tianxing Wu
, Weifeng Ge
, Wenqiang Zhang
:
Visual-Language Collaborative Representation Network for Broad-Domain Few-Shot Image Classification. 738-747 - Wenzhuo Xu
, Kai Chen
, Ziyi Gao
, Zhipeng Wei
, Jingjing Chen, Yu-Gang Jiang
:
Highly Transferable Diffusion-based Unrestricted Adversarial Attack on Pre-trained Vision-Language Models. 748-757 - Hongzhi Wang
, Xiubo Liang
, Tao Zhang
, Yue Gu
, Weidong Geng
:
PSSD-Transformer: Powerful Sparse Spike-Driven Transformer for Image Semantic Segmentation. 758-767 - Zengsheng Kuang
, Changxing Ding
, Huan Yao
:
Learning Context with Priors for 3D Interacting Hand-Object Pose Estimation. 768-777 - Yang Chen
, Jingcai Guo
, Tian He
, Xiaocheng Lu
, Ling Wang
:
Fine-Grained Side Information Guided Dual-Prompts for Zero-Shot Skeleton Action Recognition. 778-786 - Shuo Zhang
, Yupeng Zhai
, Jilin Mei
, Yu Hu
:
FusionOcc: Multi-Modal Fusion for 3D Occupancy Prediction. 787-796 - Shaokun Wang
, Yifan Yu
, Yuhang He
, Yihong Gong
:
Enhancing Pre-trained ViTs for Downstream Task Adaptation: A Locality-Aware Prompt Learning Method. 797-806 - Fangming Cui
, Xun Yang
, Chao Wu
, Liang Xiao
, Xinmei Tian
:
Advancing Prompt Learning through an External Layer. 807-816 - Hanzi Wang
, Jiamin Ren
, Yifeng Ding
, Lei Ren
, Huixing Jiang
, Wei Chen
, Fangxiang Feng
, Xiaojie Wang
:
Q-MoE: Connector for MLLMs with Text-Driven Routing. 817-825 - Guozhen Peng
, Yunhong Wang
, Yuwei Zhao
, Shaoxiong Zhang
, Annan Li
:
GLGait: A Global-Local Temporal Receptive Field Network for Gait Recognition in the Wild. 826-835 - Qiang Wang
, Yuning Cui
, Yawen Li
, Yaping Ruan
, Ben Zhu
, Wenqi Ren
:
RFFNet: Towards Robust and Flexible Fusion for Low-Light Image Denoising. 836-845 - Minghe Gao
, Shuang Chen
, Liang Pang
, Yuan Yao
, Jisheng Dang
, Wenqiao Zhang
, Juncheng Li
, Siliang Tang
, Yueting Zhuang
, Tat-Seng Chua
:
Fact : Teaching MLLMs with Faithful, Concise and Transferable Rationales. 846-855 - Yue Zhang
, Parisa Kordjamshidi
:
Narrowing the Gap between Vision and Action in Navigation. 856-865 - Zequn Zeng
, Jianqiao Sun
, Hao Zhang
, Tiansheng Wen
, Yudi Su
, Yan Xie
, Zhengjue Wang
, Bo Chen
:
HICEScore: A Hierarchical Metric for Image Captioning Evaluation. 866-875 - Chen Feng
, Georgios Tzimiropoulos
, Ioannis Patras
:
CLIPCleaner: Cleaning Noisy Labels with CLIP. 876-885 - Haochen Zhao
, Hui Meng
, Deqian Yang
, Xiaozheng Xie
, Xiaoze Wu
, Qingfeng Li
, Jianwei Niu
:
GuidedNet: Semi-Supervised Multi-Organ Segmentation via Labeled Data Guide Unlabeled Data. 886-895 - Kin-Chung Chan
, Jun Xiao
, Hana Lebeta Goshu
, Kin-Man Lam
:
Point Cloud Densification for 3D Gaussian Splatting from Sparse Input Views. 896-904 - Xiaorui Huang
, Gen Luo
, Chaoyang Zhu
, Bo Tong
, Yiyi Zhou
, Xiaoshuai Sun
, Rongrong Ji
:
Deep Instruction Tuning for Segment Anything Model. 905-914 - Ziyi Wang
, Yiming Rong
, Deyang Jiang
, Haoran Wu
, Shiyu Zhou
, Bo Xu
:
CIEASR: Contextual Image-Enhanced Automatic Speech Recognition for Improved Homophone Discrimination. 915-924 - Jinxu Zhang
, Yongqi Yu
, Yu Zhang
:
CREAM: Coarse-to-Fine Retrieval and Multi-modal Efficient Tuning for Document VQA. 925-934 - Hebaixu Wang
, Hao Zhang
, Xunpeng Yi
, Xinyu Xiang
, Leyuan Fang
, Jiayi Ma
:
TeRF: Text-driven and Region-aware Flexible Visible and Infrared Image Fusion. 935-944 - Ruonan Zhang
, Ziwei Shang
, Fengjuan Wang
, Zhaoqilin Yang
, Shan Cao
, Yigang Cen
, Gaoyun An
:
Synergetic Prototype Learning Network for Unbiased Scene Graph Generation. 945-954 - Jiawei Zhu
, Yishu Liu
, Huanjia Zhu
, Hui Lin
, Yuncheng Jiang
, Zheng Zhang
, Bingzhi Chen
:
Combating Visual Question Answering Hallucinations via Robust Multi-Space Co-Debias Learning. 955-964 - Qian Cao
, Xu Chen
, Ruihua Song
, Xiting Wang
, Xinting Huang
, Yuchen Ren
:
See or Guess: Counterfactually Regularized Image Captioning. 965-974 - Shuai Li
, Fan Qi
, Zixin Zhang
, Changsheng Xu
:
Cross-Modal Meta Consensus for Heterogeneous Federated Learning. 975-984 - Xiang He
, Xiangxi Liu
, Yang Li
, Dongcheng Zhao
, Guobin Shen
, Qingqun Kong
, Xin Yang
, Yi Zeng
:
CACE-Net: Co-guidance Attention and Contrastive Enhancement for Effective Audio-Visual Event Localization. 985-993 - Jiabao Guo
, Huan Liu
, Yizhi Luo
, Xueli Hu
, Hang Zou
, Yuan Zhang
, Hui Liu
, Bo Zhao
:
Style-conditional Prompt Token Learning for Generalizable Face Anti-spoofing. 994-1003 - Bowen Chen
, Yun Sing Koh
, Gillian Dobbie
:
SSAT-Adapter: Enhancing Vision-Language Model Few-shot Learning with Auxiliary Tasks. 1004-1013 - Haoyu Tong
, Xiaoyu Zhang
, Yulin Jin
, Jian Lou
, Kai Wu
, Xiaofeng Chen
:
Balancing Generalization and Robustness in Adversarial Training via Steering through Clean and Adversarial Gradient Directions. 1014-1023 - Shuo Zheng
, Yuanjie Dang
, Peng Chen
, Ruohong Huan
, Dongdong Zhao
, Ronghua Liang
:
Saliency-Guided Fine-Grained Temporal Mask Learning for Few-Shot Action Recognition. 1024-1033 - Mengyin Liu
, Chao Zhu
, Shiqi Ren
, Xu-Cheng Yin
:
Unsupervised Multi-view Pedestrian Detection. 1034-1042 - Zhilin Huang
, Yijie Yu
, Ling Yang
, Chujun Qin
, Bing Zheng
, Xiawu Zheng
, Zikun Zhou
, Yaowei Wang
, Wenming Yang
:
Motion-aware Latent Diffusion Models for Video Frame Interpolation. 1043-1052 - Zongxin Ye
, Wenyu Li
, Sidun Liu
, Peng Qiao
, Yong Dou
:
AbsGS: Recovering Fine Details in 3D Gaussian Splatting. 1053-1061 - Ziming Wang
, Boxiang Zhang, Ming Ma, Yue Wang
, Taoli Du
, Wenhui Li:
Multi-fineness Boundaries and the Shifted Ensemble-aware Encoding for Point Cloud Semantic Segmentation. 1062-1071 - Yubo Wang
, Chaohu Liu
, Yanqiu Qu
, Haoyu Cao
, Deqiang Jiang
, Linli Xu
:
Break the Visual Perception: Adversarial Attacks Targeting Encoded Visual Tokens of Large Vision-Language Models. 1072-1081 - Wenhao Li
, Qiangchang Wang
, Peng Zhao
, Yilong Yin
:
KNN Transformer with Pyramid Prompts for Few-Shot Learning. 1082-1091 - Lu Zhang
, Ke Yan
, Shouhong Ding
:
AlignCLIP: Align Multi Domains of Texts Input for CLIP models with Object-IoU Loss. 1092-1100 - Pengfei Yue
, Jianghang Lin
, Shengchuan Zhang
, Jie Hu
, Yilin Lu
, Hongwei Niu
, Haixin Ding
, Yan Zhang
, Guannan Jiang
, Liujuan Cao
, Rongrong Ji
:
Adaptive Selection based Referring Image Segmentation. 1101-1110 - Shanshan Wang
, ALuSi
, Xun Yang
, Ke Xu
, Huibin Tan
, Xingyi Zhang
:
Dual-stream Feature Augmentation for Domain Generalization. 1111-1119 - Yang Liu
, Xiang Huang
, Minghan Qin
, Qinwei Lin
, Haoqian Wang
:
Animatable 3D Gaussian: Fast and High-Quality Reconstruction of Multiple Human Avatars. 1120-1129 - Wei Feng
, Dongyuan Wei
, Qianqian Wang
, Bo Dong
, Quanxue Gao
:
Multi-View Clustering Based on Deep Non-negative Tensor Factorization. 1130-1138 - Aoqi Li
, Saihui Hou
, Chenye Wang
, Qingyuan Cai
, Yongzhen Huang
:
AerialGait: Bridging Aerial and Ground Views for Gait Recognition. 1139-1147 - Zefan Zhang
, Weiqi Zhang
, Yanhui Li
, Tian Bai
:
Caption-Aware Multimodal Relation Extraction with Mutual Information Maximization. 1148-1157 - Xiaochen Li
, Jian Cheng
, Ziying Xia
, Zichong Chen
, Junhao Shi
, Zhicheng Dong
, Nyima Tashi
:
TS-ILM: Class Incremental Learning for Online Action Detection. 1158-1167 - Yuxiang Cai
, Yongheng Shang
, Jianwei Yin
:
MultiDAN: Unsupervised, Multistage, Multisource and Multitarget Domain Adaptation for Semantic Segmentation of Remote Sensing Images. 1168-1177 - Yu Tong
, Weihai Lu
, Zhe Zhao
, Song Lai
, Tong Shi
:
MMDFND: Multi-modal Multi-Domain Fake News Detection. 1178-1186 - Minghang Zheng
, Jiahua Zhang
, Qingchao Chen
, Yuxin Peng
, Yang Liu
:
ResVG: Enhancing Relation and Semantic Understanding in Multiple Instances for Visual Grounding. 1187-1196 - Shilong Jia
, Tingting Wu
, Yingying Fang
, Tieyong Zeng
, Guixu Zhang
, Zhi Li
:
Purified Distillation: Bridging Domain Shift and Category Gap in Incremental Object Detection. 1197-1205 - Haonan Zhang
, Pengpeng Zeng
, Lianli Gao
, Jingkuan Song
, Heng Tao Shen
:
MPT: Multi-grained Prompt Tuning for Text-Video Retrieval. 1206-1214 - Ziwei Zheng
, Zechuan Zhang
, Yulin Wang
, Shiji Song
, Gao Huang
, Le Yang
:
Rethinking the Architecture Design for Efficient Generic Event Boundary Detection. 1215-1224 - Jinglun Li
, Xinyu Zhou
, Kaixun Jiang
, Lingyi Hong
, Pinxue Guo
, Zhaoyu Chen
, Weifeng Ge
, Wenqiang Zhang
:
TagOOD: A Novel Approach to Out-of-Distribution Detection via Vision-Language Representations and Class Center Learning. 1225-1234 - Zihan Cao
, Xiao Wu
, Liang-Jian Deng
, Yu Zhong
:
A Novel State Space Model with Local Enhancement and State Sharing for Image Fusion. 1235-1244 - Zhenyu Yang
, Shengsheng Qian
, Dizhan Xue
, Jiahong Wu
, Fan Yang
, Weiming Dong
, Changsheng Xu
:
Semantic Editing Increment Benefits Zero-Shot Composed Image Retrieval. 1245-1254 - Zeyu Jin
, Jia Jia
, Qixin Wang
, Kehan Li
, Shuoyi Zhou
, Songtao Zhou
, Xiaoyu Qin
, Zhiyong Wu
:
SpeechCraft: A Fine-Grained Expressive Speech Dataset with Natural Language Description. 1255-1264 - Lihao Liu
, Yanqi Cheng
, Zhongying Deng
, Shujun Wang
, Dongdong Chen
, Xiaowei Hu
, Pietro Liò
, Carola-Bibiane Schönlieb
, Angelica E. Avilés-Rivero
:
TrafficMOT: A Challenging Dataset for Multi-Object Tracking in Complex Traffic Scenarios. 1265-1273 - Jing Yang, Xiaowen Jiang, Yuan Gao, Laurence T. Yang
, Jieming Yang:
Generalize to Fully Unseen Graphs: Learn Transferable Hyper-Relation Structures for Inductive Link Prediction. 1274-1282 - Panjun Liu
, Jiacheng Li
, Lizhi Wang
, Zheng-Jun Zha
, Zhiwei Xiong
:
MLP Embedded Inverse Tone Mapping. 1283-1291 - Mingkai Lin
, Wenzhong Li
, Xiaobin Hong
, Sanglu Lu
:
Scalable Multi-Source Pre-training for Graph Neural Networks. 1292-1301 - Xiaole Zhao
, Linze Li
, Chengxing Xie
, Xiaoming Zhang
, Ting Jiang
, Wenjie Lin
, Shuaicheng Liu
, Tianrui Li
:
Efficient Single Image Super-Resolution with Entropy Attention and Receptive Field Augmentation. 1302-1310 - Minsu Kim
, Jeong Hun Yeo
, Se Jin Park
, Hyeongseop Rha
, Yong Man Ro
:
Efficient Training for Multilingual Visual Speech Recognition: Pre-training with Discretized Visual Speech Representation. 1311-1320 - Shoutong Luo
, Zhengxing Sun
, Yi Wang
, Yunhan Sun
, Chendi Zhu
:
LDCNet: Long-Distance Context Modeling for Large-Scale 3D Point Cloud Scene Semantic Segmentation. 1321-1330 - Yiming Cui
, Liang Li
, Jiehua Zhang
, Chenggang Yan
, Hongkui Wang
, Shuai Wang
, Heng Jin
, Li Wu
:
Stochastic Context Consistency Reasoning for Domain Adaptive Object Detection. 1331-1340 - Zhuoling Li
, Yong Wang
, Kaitong Li
:
FewVS: A Vision-Semantics Integration Framework for Few-Shot Image Classification. 1341-1350 - Yuyan Bu
, Qiang Sheng
, Juan Cao
, Peng Qi
, Danding Wang
, Jintao Li
:
FakingRecipe: Detecting Fake News on Short Video Platforms from the Perspective of Creative Process. 1351-1360 - Subash Khanal
, Eric Xing
, Srikumar Sastry
, Aayush Dhakal
, Zhexiao Xiong
, Adeel Ahmad
, Nathan Jacobs
:
PSM: Learning Probabilistic Embeddings for Multi-scale Zero-Shot Soundscape Mapping. 1361-1369 - Zizhao Wu
, Haohan Li
, Gongyi Chen
, Zhou Yu
, Xiaoling Gu
, Yigang Wang
:
3D Question Answering with Scene Graph Reasoning. 1370-1378 - Liang He
, Hongke Wang
, Zhen Wu
, Jianbing Zhang
, Xinyu Dai
, Jiajun Chen
:
Focus & Gating: A Multimodal Approach for Unveiling Relations in Noisy Social Media. 1379-1388 - Yuanchen Wu
, Xiaoqiang Li
, Jide Li
, Kequan Yang
, Pinpin Zhu
, Shaohua Zhang
:
DINO is Also a Semantic Guider: Exploiting Class-aware Affinity for Weakly Supervised Semantic Segmentation. 1389-1397 - Dongshuo Yin
, Xueting Han
, Bin Li
, Hao Feng
, Jing Bai
:
Parameter-efficient is not Sufficient: Exploring Parameter, Memory, and Time Efficient Adapter Tuning for Dense Predictions. 1398-1406 - Rongwen Li
, Haiyang Hu
, Liang Du
, Jiarong Chen
, Bingbing Jiang
, Peng Zhou
:
One-Stage Fair Multi-View Spectral Clustering. 1407-1416 - Jingfan Tan
, Hyunhee Park
, Ying Zhang
, Tao Wang
, Kaihao Zhang
, Xiangyu Kong
, Pengwen Dai
, Zikun Liu
, Wenhan Luo
:
Blind Face Video Restoration with Temporal Consistent Generative Prior and Degradation-Aware Prompt. 1417-1426 - Yinghui Sun
, Xingfeng Li
, Quansen Sun
, Min-Ling Zhang
, Zhenwen Ren
:
Improved Weighted Tensor Schatten p-Norm for Fast Multi-view Graph Clustering. 1427-1436 - Xinjie Jiang
, Chenxi Zheng
, Xuemiao Xu
, Bangzhen Liu
, Weiying Zheng
, Huaidong Zhang
, Shengfeng He
:
VrdONE: One-stage Video Visual Relation Detection. 1437-1446 - Chenxi Ma
, Weimin Tan
, Shili Zhou
, Bo Yan
:
Learning Cross-Spectral Prior for Image Super-Resolution. 1447-1455 - Dayu Hu
, Suyuan Liu
, Jun Wang
, Junpu Zhang
, Siwei Wang
, Xingchen Hu
, Xinzhong Zhu, Chang Tang
, Xinwang Liu
:
Reliable Attribute-missing Multi-view Clustering with Instance-level and feature-level Cooperative Imputation. 1456-1466 - Duc Dang Trung Tran
, Byeongkeun Kang
, Yeejin Lee
:
MSTA3D: Multi-scale Twin-attention for 3D Instance Segmentation. 1467-1475 - Jingjing Hu
, Dan Guo
, Kun Li
, Zhan Si
, Xun Yang
, Meng Wang
:
Maskable Retentive Network for Video Moment Retrieval. 1476-1485 - Junming Hou
, Zihan Cao
, Naishan Zheng
, Xuan Li
, Xiaoyu Chen
, Xinyang Liu
, Xiaofeng Cong
, Danfeng Hong
, Man Zhou
:
Linearly-evolved Transformer for Pan-sharpening. 1486-1494 - Zhenhao Yang
, Xin Liu
, Deqiang Ouyang
, Guiduo Duan
, Dongyang Zhang
, Tao He
, Yuan-Fang Li
:
Towards Open-vocabulary HOI Detection with Calibrated Vision-language Models and Locality-aware Queries. 1495-1504 - Kang Zeng
, Hao Shi
, Jiacheng Lin
, Siyu Li
, Jintao Cheng
, Kaiwei Wang
, Zhiyong Li
, Kailun Yang
:
MambaMOS: LiDAR-based 3D Moving Object Segmentation with Motion-aware State Space Model. 1505-1513 - Tao Tang
, Hong Liu
, Yingxuan You
, Ti Wang
, Wenhao Li
:
ARTS: Semi-Analytical Regressor using Disentangled Skeletal Representations for Human Mesh Recovery from Videos. 1514-1523 - Xudong Lu
, Yuqi Jiang
, Haiwen Hong
, Qi Sun
, Cheng Zhuo
:
DCAFuse: Dual-Branch Diffusion-CNN Complementary Feature Aggregation Network for Multi-Modality Image Fusion. 1524-1533 - Wenbin Zou
, Hongxia Gao
, Weipeng Yang
, Tongtong Liu
:
Wave-Mamba: Wavelet State Space Model for Ultra-High-Definition Low-Light Image Enhancement. 1534-1543 - Junwei He, Qianqian Xu
, Yangbangyan Jiang
, Zitai Wang
, Yuchen Sun
, Qingming Huang
:
HGOE: Hybrid External and Internal Graph Outlier Exposure for Graph Out-of-Distribution Detection. 1544-1553 - Ke Liang
, Lingyuan Meng
, Yue Liu
, Meng Liu
, Wei Wei
, Suyuan Liu
, Wenxuan Tu
, Siwei Wang
, Sihang Zhou
, Xinwang Liu
:
Simple Yet Effective: Structure Guided Pre-trained Transformer for Multi-modal Knowledge Graph Reasoning. 1554-1563 - Yuning Ding
, Sifan Zhang
, Shenglan Liu
, Jinrong Zhang
, Wenyue Chen
, Haifei Duan
, Bingcheng Dong
, Tao Sun
:
2M-AF: A Strong Multi-Modality Framework For Human Action Quality Assessment with Self-supervised Representation Learning. 1564-1572 - Liqiu Chen
, Yuqing Huang
, Hengyu Li
, Zikun Zhou
, Zhenyu He
:
Simplifying Cross-modal Interaction via Modality-Shared Features for RGBT Tracking. 1573-1582 - Can Cui
, Siteng Huang
, Wenxuan Song
, Pengxiang Ding
, Min Zhang
, Donglin Wang
:
ProFD: Prompt-Guided Feature Disentangling for Occluded Person Re-Identification. 1583-1592 - Tianqi Wei
, Zhi Chen
, Zi Huang
, Xin Yu
:
Benchmarking In-the-Wild Multimodal Disease Recognition and A Versatile Baseline. 1593-1601 - Jiaming Lei
, Lin Li
, Chunping Wang
, Jun Xiao
, Long Chen
:
Seeing Beyond Classes: Zero-Shot Grounded Situation Recognition via Language Explainer. 1602-1611 - Jinyong Wen
:
Gaussian Mutual Information Maximization for Efficient Graph Self-Supervised Learning: Bridging Contrastive-based to Decorrelation-based. 1612-1621 - Haowei Kuang
, Yiyang Ma
, Wenhan Yang
, Zongming Guo
, Jiaying Liu
:
Consistency Guided Diffusion Model with Neural Syntax for Perceptual Image Compression. 1622-1631 - Zhangchi Feng
, Richong Zhang
, Zhijie Nie
:
Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives and Negatives. 1632-1641 - Guanchen Ding
, Lingbo Liu
, Zhenzhong Chen
, Changwen Chen
:
Domain-Agnostic Crowd Counting via Uncertainty-Guided Style Diversity Augmentation. 1642-1651 - Cunhang Fan
, Jingjing Zhang
, Hongyu Zhang
, Wang Xiang
, Jianhua Tao
, Xinhui Li
, Jiangyan Yi
, Dianbo Sui
, Zhao Lv
:
MSFNet: Multi-Scale Fusion Network for Brain-Controlled Speaker Extraction. 1652-1661 - Zhong Ji
, Changxu Meng
, Yan Zhang
, Haoran Wang
, Yanwei Pang
, Jungong Han
:
Eliminate Before Align: A Remote Sensing Image-Text Retrieval Framework with Keyword Explicit Reasoning. 1662-1671 - Jinyan Zhang
, Mengyuan Liu
, Hong Liu
, Guoquan Wang
, Wenhao Li
:
APP: Adaptive Pose Pooling for 3D Human Pose Estimation from Videos. 1672-1681 - Jing Bi
, Yunlong Tang
, Luchuan Song
, Ali Vosoughi
, Nguyen Nguyen
, Chenliang Xu
:
EAGLE: Egocentric AGgregated Language-video Engine. 1682-1691
- Kai Yin, Jie Shen:
Expanded Convolutional Neural Network Based Look-Up Tables for High Efficient Single-Image Super-Resolution. 1692-1700
- Zheng Han, Xiaobin Zhu, Chun Yang, Hongyang Zhou, Jingyan Qin, Xu-Cheng Yin:
Exploring Stable Meta-Optimization Patterns via Differentiable Reinforcement Learning for Few-Shot Classification. 1701-1710
- Yixin Guo, Yu Liu, Jianghao Li, Weimin Wang, Qi Jia:
Unseen No More: Unlocking the Potential of CLIP for Generative Zero-shot HOI Detection. 1711-1720
- Jiangbin Zheng, Han Zhang, Qianqing Xu, An-Ping Zeng, Stan Z. Li:
MetaEnzyme: Meta Pan-Enzyme Learning for Task-Adaptive Redesign. 1721-1730
- Yiming Zhong, Xiaolin Zhang, Yao Zhao, Yunchao Wei:
DreamLCM: Towards High Quality Text-to-3D Generation via Latent Consistency Model. 1731-1740
- Anna Zhu, Ke Xiao, Bo Zhou, Runmin Wang:
Trust Prophet or Not? Taking a Further Verification Step toward Accurate Scene Text Recognition. 1741-1750
- Gongli Xi, Ye Tian, Mengyu Yang, Lanshan Zhang, Xirong Que, Wendong Wang:
Global Patch-wise Attention is Masterful Facilitator for Masked Image Modeling. 1751-1760
- Chenghao Deng, Haote Xu, Xiaolu Chen, Haodi Xu, Xiaotong Tu, Xinghao Ding, Yue Huang:
SimCLIP: Refining Image-Text Alignment with Simple Prompts for Zero-/Few-shot Anomaly Detection. 1761-1770
- Yuanhe Tian, Fei Xia, Yan Song:
Diffusion Networks with Task-Specific Noise Control for Radiology Report Generation. 1771-1780
- Yun Xing, Qing Guo, Xiaofeng Cao, Ivor W. Tsang, Lei Ma:
MetaRepair: Learning to Repair Deep Neural Networks from Repairing Experiences. 1781-1790
- Xingtao Wang, Xianqi Zhang, Wenxue Cui, Ruiqin Xiong, Xiaopeng Fan, Debin Zhao:
Mesh Denoising Using Filtering Coefficients Jointly Aware of Noise and Geometry. 1791-1799
- Yan Zhuang, Yanru Zhang, Zheng Hu, Xiaoyue Zhang, Jiawen Deng, Fuji Ren:
GLoMo: Global-Local Modal Fusion for Multimodal Sentiment Analysis. 1800-1809
- Yuhui Wu, Guoqing Wang, Zhiwen Wang, Yang Yang, Tianyu Li, Malu Zhang, Chongyi Li, Heng Tao Shen:
JoReS-Diff: Joint Retinex and Semantic Priors in Diffusion Model for Low-light Image Enhancement. 1810-1818
- Zichen Wen, Tianyi Wu, Yazhou Ren, Yawen Ling, Chenhang Cui, Xiaorong Pu, Lifang He:
Dual-Optimized Adaptive Graph Reconstruction for Multi-View Graph Clustering. 1819-1828
- Xiaobin Lu, Xiaobin Hu, Jun Luo, Ben Zhu, Yaping Ruan, Wenqi Ren:
3D Priors-Guided Diffusion for Blind Face Restoration. 1829-1838
- Hao Wu, Likun Zhang, Shucheng Li, Fengyuan Xu, Sheng Zhong:
CoAst: Validation-Free Contribution Assessment for Federated Learning based on Cross-Round Valuation. 1839-1847
- Kang Xia, Wenzhong Li, Yimiao Shao, Sanglu Lu:
Vi2ACT: Video-enhanced Cross-modal Co-learning with Representation Conditional Discriminator for Few-shot Human Activity Recognition. 1848-1856
- Seonggwan Ko, Yeong Jun Koh, Donghyeon Cho:
Reference-based Burst Super-resolution. 1857-1865
- Yi Zhang, Zhefeng Wang, Rui Hu, Xinyu Duan, Yi Zheng, Baoxing Huai, Jiarun Han, Jitao Sang:
Poisoning for Debiasing: Fair Recognition via Eliminating Bias Uncovered in Data Poisoning. 1866-1874
- Dizhan Xue, Shengsheng Qian, Changsheng Xu:
Few-Shot Multimodal Explanation for Visual Question Answering. 1875-1884
- Jingtao Wang, Zechao Li:
3DPCP-Net: A Lightweight Progressive 3D Correspondence Pruning Network for Accurate and Efficient Point Cloud Registration. 1885-1894
- Jiawei Ge, Jiuxin Cao, Xuelin Zhu, Xinyu Zhang, Chang Liu, Kun Wang, Bo Liu:
Consistencies are All You Need for Semi-supervised Vision-Language Tracking. 1895-1904
- Zhen Zou, Hu Yu, Jie Huang, Feng Zhao:
FreqMamba: Viewing Mamba from a Frequency Perspective for Image Deraining. 1905-1914
- Zhida Zhao, Jia Li, Lijun Wang, Yifan Wang, Huchuan Lu:
MaskMentor: Unlocking the Potential of Masked Self-Teaching for Missing Modality RGB-D Semantic Segmentation. 1915-1923
- Linli Yao, Yuanmeng Zhang, Ziheng Wang, Xinglin Hou, Tiezheng Ge, Yuning Jiang, Xu Sun, Qin Jin:
Edit As You Wish: Video Caption Editing with Multi-grained User Control. 1924-1933
- Wenlin Li, Yucheng Xu, Xiaoqing Zheng, Suoya Han, Jun Wang, Xiaobo Sun:
Dual Advancement of Representation Learning and Clustering for Sparse and Noisy Images. 1934-1942
- Zhiwei Hao, Zhongyu Xiao, Yong Luo, Jianyuan Guo, Jing Wang, Li Shen, Han Hu:
PrimKD: Primary Modality Guided Multimodal Fusion for RGB-D Semantic Segmentation. 1943-1951
- Kaixin Shen, Ruijie Quan, Linchao Zhu, Jun Xiao, Yi Yang:
Neural Interaction Energy for Multi-Agent Trajectory Prediction. 1952-1960
- Hao Gu, Jiangyan Yi, Chenglong Wang, Yong Ren, Jianhua Tao, Xinrui Yan, Yujie Chen, Xiaohui Zhang:
Utilizing Speaker Profiles for Impersonation Audio Detection. 1961-1970
- Zejun Li, Ye Wang, Mengfei Du, Qingwen Liu, Binhao Wu, Jiwen Zhang, Chengxing Zhou, Zhihao Fan, Jie Fu, Jingjing Chen, Zhongyu Wei, Xuanjing Huang:
ReForm-Eval: Evaluating Large Vision Language Models via Unified Re-Formulation of Task-Oriented Benchmarks. 1971-1980
- Jiankang Chen, Ling Deng, Zhiyong Gan, Wei-Shi Zheng, Ruixuan Wang:
FodFoM: Fake Outlier Data by Foundation Models Creates Stronger Visual Out-of-Distribution Detector. 1981-1990
- Xudong Wang, Weihong Ren, Xi'ai Chen, Huijie Fan, Yandong Tang, Zhi Han:
Uni-YOLO: Vision-Language Model-Guided YOLO for Robust and Fast Universal Detection in the Open World. 1991-2000
- Junliu Zhong, Zhiyi Li, Dan Xiang, Maotang Han, Changsheng Li, Yanfen Gan:
A Lightweight Multi-domain Multi-attention Progressive Network for Single Image Deraining. 2001-2010
- Weijia Zhang, Dongnan Liu, Weidong Cai, Chao Ma:
Cross-View Consistency Regularisation for Knowledge Distillation. 2011-2020
- Zikai Song, Ying Tang, Run Luo, Lintao Ma, Junqing Yu, Yi-Ping Phoebe Chen, Wei Yang:
Autogenic Language Embedding for Coherent Point Tracking. 2021-2030
- Yuwen Pan, Rui Sun, Yuan Wang, Tianzhu Zhang, Yongdong Zhang:
Rethinking the Implicit Optimization Paradigm with Dual Alignments for Referring Remote Sensing Image Segmentation. 2031-2040
- Zhaopeng Gu, Bingke Zhu, Guibo Zhu, Yingying Chen, Hao Li, Ming Tang, Jinqiao Wang:
FiLo: Zero-Shot Anomaly Detection by Fine-Grained Description and High-Quality Localization. 2041-2049
- Yi Lei, Huilin Zhu, Jingling Yuan, Guangli Xiang, Xian Zhong, Shengfeng He:
DenseTrack: Drone-Based Crowd Tracking via Density-Aware Motion-Appearance Synergy. 2050-2058
- Fengze Jiang, Shuling Wang, Xiaojin Gong:
Task-Conditional Adapter for Multi-Task Dense Prediction. 2059-2068
- Yitai Lin, Zhijie Wei, Wanfa Zhang, Xiping Lin, Yudi Dai, Chenglu Wen, Siqi Shen, Lan Xu, Cheng Wang:
HmPEAR: A Dataset for Human Pose Estimation and Action Recognition. 2069-2078
- Deji Zhao, Donghong Han, Ye Yuan, Bo Ning, Mengxiang Li, Zhongjiang He, Shuangyong Song:
AutoGraph: Enabling Visual Context via Graph Alignment in Open Domain Multi-Modal Dialogue Generation. 2079-2088
- Jiaxin Zhang, Yiqi Wang, Xihong Yang, Siwei Wang, Yu Feng, Yu Shi, Ruichao Ren, En Zhu, Xinwang Liu:
Test-Time Training on Graphs with Large Language Models (LLMs). 2089-2098
- Yujia Xiao, Xi Wang, Xu Tan, Lei He, Xinfa Zhu, Sheng Zhao, Tan Lee:
Contrastive Context-Speech Pretraining for Expressive Text-to-Speech Synthesis. 2099-2107
- Junyu Lin, Yan Zheng, Xinyue Chen, Yazhou Ren, Xiaorong Pu, Jing He:
Cross-view Contrastive Unification Guides Generative Pretraining for Molecular Property Prediction. 2108-2116
- Bo Yuan, Danpei Zhao, Zhuoran Liu, Wentao Li, Tian Li:
Continual Panoptic Perception: Towards Multi-modal Incremental Interpretation of Remote Sensing Images. 2117-2126
- Shidi Chen, Lili Wei, Liqian Liang, Congyan Lang:
Joint Homophily and Heterophily Relational Knowledge Distillation for Efficient and Compact 3D Object Detection. 2127-2135
- Zhiwen Wang, Yuhui Wu, Zheng Wang, Jiwei Wei, Tianyu Li, Guoqing Wang, Yang Yang, Hengtao Shen:
Cascaded Adversarial Attack: Simultaneously Fooling Rain Removal and Semantic Segmentation Networks. 2136-2145
- Jiexuan Yan, Sheng Huang, Nankun Mu, Luwen Huangfu, Bo Liu:
Category-Prompt Refined Feature Learning for Long-Tailed Multi-Label Image Classification. 2146-2155
- Penglei Sun, Yaoxian Song, Xiang Liu, Xiaofei Yang, Qiang Wang, Tiefeng Li, Yang Yang, Xiaowen Chu:
3D Question Answering for City Scene Understanding. 2156-2165
- Qiuyu Kong, Jiangming Chen, Jie Jiang, Zanxi Ruan, Lai Kang:
Dual-Branch Fusion with Style Modulation for Cross-Domain Few-Shot Semantic Segmentation. 2166-2174
- Jiaqi Wang, Lu Lu, Mingmin Chi, Jian Chen:
MDR: Multi-stage Decoupled Relational Knowledge Distillation with Adaptive Stage Selection. 2175-2183
- Xiongjun Zhao, Zhengyu Liu, Fen Liu, Guanting Li, Yutao Dou, Shaoliang Peng:
Report-Concept Textual-Prompt Learning for Enhancing X-ray Diagnosis. 2184-2193
- Jianzhi Lu, Ruian He, Shili Zhou, Weimin Tan, Bo Yan:
FacialFlowNet: Advancing Facial Optical Flow Estimation with a Diverse Dataset and a Decomposed Model. 2194-2203
- Wei-Bang Jiang, Yu-Ting Lan, Bao-Liang Lu:
REmoNet: Reducing Emotional Label Noise via Multi-regularized Self-supervision. 2204-2213
- Shuxun Wang, Yunfei Lei, Ziqi Zhang, Wei Liu, Haowei Liu, Li Yang, Bing Li, Wenjuan Li, Jin Gao, Weiming Hu:
NFT1000: A Cross-Modal Dataset For Non-Fungible Token Retrieval. 2214-2222
- Haoyang Su, Wenzhe Du, Xiaoliang Wang, Cam-Tu Nguyen:
Sample Efficiency Matters: Training Multimodal Conversational Recommendation Systems in a Small Data Setting. 2223-2232
- Xincheng Ju, Dong Zhang, Suyang Zhu, Junhui Li, Shoushan Li, Guodong Zhou:
ECFCON: Emotion Consequence Forecasting in Conversations. 2233-2241
- Xiangbo Yin, Jiangming Shi, Yachao Zhang, Yang Lu, Zhizhong Zhang, Yuan Xie, Yanyun Qu:
Robust Pseudo-label Learning with Neighbor Relation for Unsupervised Visible-Infrared Person Re-Identification. 2242-2251
- Yubo Li, De Cheng, Chaowei Fang, Changzhe Jiao, Nannan Wang, Xinbo Gao:
Disentangling Identity Features from Interference Factors for Cloth-Changing Person Re-identification. 2252-2261
- Bing Wang, Shengsheng Wang, Changchun Li, Renchu Guan, Ximing Li:
Harmfully Manipulated Images Matter in Multimodal Misinformation Detection. 2262-2271
- Wuliang Huang, Yiqiang Chen, Xinlong Jiang, Chenlong Gao, Qian Chen, Teng Zhang, Bingjie Yan, Yifan Wang, Jianrong Yang:
Correlation-Driven Multi-Modality Graph Decomposition for Cross-Subject Emotion Recognition. 2272-2281
- Wenbin Wang, Liang Ding, Li Shen, Yong Luo, Han Hu, Dacheng Tao:
WisdoM: Improving Multimodal Sentiment Analysis by Fusing Contextual World Knowledge. 2282-2291
- Zhanpeng Chen, Zhihong Zhu, Wanshi Xu, Yunyan Zhang, Xian Wu, Yefeng Zheng:
Aspects are Anchors: Towards Multimodal Aspect-based Sentiment Analysis via Aspect-driven Alignment and Refinement. 2292-2300
- Haodong Chen, Haojian Huang, Junhao Dong, Mingzhe Zheng, Dian Shao:
FineCLIPER: Multi-modal Fine-grained CLIP for Dynamic Facial Expression Recognition with AdaptERs. 2301-2310
- Honghao Li, Lei Sang, Yi Zhang, Yiwen Zhang:
SimCEN: Simple Contrast-enhanced Network for CTR Prediction. 2311-2320
- Yuanyuan Shi, Yunan Li, Siyu Liang, Huizhou Chen, Qiguang Miao:
MGR-Dark: A Large Multimodal Video Dataset and RGB-IR Benchmark for Gesture Recognition in Darkness. 2321-2330
- Shuanglin Yan, Jun Liu, Neng Dong, Liyan Zhang, Jinhui Tang:
Prototypical Prompting for Text-to-image Person Re-identification. 2331-2340
- Kexiang Feng, Chuanmin Jia, Siwei Ma, Wen Gao:
Unifying Spike Perception and Prediction: A Compact Spike Representation Model Using Multi-scale Correlation. 2341-2349
- Feifei Zhang, Sijia Qu, Fan Shi, Changsheng Xu:
Overcoming the Pitfalls of Vision-Language Model for Image-Text Retrieval. 2350-2359
- Francesco Tonini, Nicola Dall'Asen, Lorenzo Vaquero, Cigdem Beyan, Elisa Ricci:
AL-GTD: Deep Active Learning for Gaze Target Detection. 2360-2369
- Yuxiang Zhou, Zhe Sun, Rui Liu, Yong Chen, Dell Zhang:
AVHash: Joint Audio-Visual Hashing for Video Retrieval. 2370-2378
- Xin Jiang, Hao Tang, Rui Yan, Jinhui Tang, Zechao Li:
DVF: Advancing Robust and Accurate Fine-Grained Image Retrieval with Retrieval Guidelines. 2379-2388
- Qian Li, Yucheng Zhou, Cheng Ji, Feihong Lu, Jianian Gong, Shangguang Wang, Jianxin Li:
Multi-Modal Inductive Framework for Text-Video Retrieval. 2389-2398
- Hancheng Zhu, Ju Shi, Zhiwen Shao, Rui Yao, Yong Zhou, Jiaqi Zhao, Leida Li:
Attribute-Driven Multimodal Hierarchical Prompts for Image Aesthetic Quality Assessment. 2399-2408
- Zeyu Xiao, Dachun Kai, Yueyi Zhang, Xiaoyan Sun, Zhiwei Xiong:
Asymmetric Event-Guided Video Super-Resolution. 2409-2418
- Yuanfeng Pan, Wenkang Su, Jiangqun Ni, Qingliang Liu, Yulin Zhang, Donghua Jiang:
Model-Based Non-Independent Distortion Cost Design for Effective JPEG Steganography. 2419-2427
- Xianghu Yue, Xueyi Zhang, Yiming Chen, Chengwei Zhang, Mingrui Lao, Huiping Zhuang, Xinyuan Qian, Haizhou Li:
MMAL: Multi-Modal Analytic Learning for Exemplar-Free Audio-Visual Class Incremental Tasks. 2428-2437
- Yuzheng Wang, Zhaoyu Chen, Jie Zhang, Dingkang Yang, Zuhao Ge, Yang Liu, Siao Liu, Yunquan Sun, Wenqiang Zhang, Lizhe Qi:
Sampling to Distill: Knowledge Transfer from Open-World Data. 2438-2447
- Xi Wu, Chuang Huang, Xinliu Liu, Fei Zhou, Zhenwen Ren:
Multiple Kernel Clustering with Shifted Laplacian on Grassmann Manifold. 2448-2456
- Guangyao Li, Yajun Jian, Yan Yan, Hanzi Wang:
GLATrack: Global and Local Awareness for Open-Vocabulary Multiple Object Tracking. 2457-2466
- Xuze Hao, Wenqian Ni, Xuhao Jiang, Weimin Tan, Bo Yan:
Addressing Imbalance for Class Incremental Learning in Medical Image Classification. 2467-2476
- Qiwei Li, Yuxin Peng, Jiahuan Zhou:
Progressive Prototype Evolving for Dual-Forgetting Mitigation in Non-Exemplar Online Continual Learning. 2477-2486
- Fengfan Zhou, Qianyu Zhou, Bangjie Yin, Hui Zheng, Xuequan Lu, Lizhuang Ma, Hefei Ling:
Rethinking Impersonation and Dodging Attacks on Face Recognition Systems. 2487-2496
- Xin Chen, Bin Wang, Jinzheng Jiang, Kunkun Zhang, Yongsheng Gao:
SDePR: Fine-Grained Leaf Image Retrieval with Structural Deep Patch Representation. 2497-2505
- Yuhan Liu, Qianxin Huang, Siqi Hui, Jingwen Fu, Sanping Zhou, Kangyi Wu, Pengna Li, Jinjun Wang:
Semantic-aware Representation Learning for Homography Estimation. 2506-2514
- Chen Hui, Haiqi Zhu, Shuya Yan, Shaohui Liu, Feng Jiang, Debin Zhao:
S2-CSNet: Scale-Aware Scalable Sampling Network for Image Compressive Sensing. 2515-2524
- Gangyan Zeng, Yuan Zhang, Jin Wei, Dongbao Yang, Peng Zhang, Yiwen Gao, Xugong Qin, Yu Zhou:
Focus, Distinguish, and Prompt: Unleashing CLIP for Efficient and Flexible Scene Text Retrieval. 2525-2534
- Hua Yu, Weiming Liu, Jiapeng Bai, Xu Gui, Yaqing Hou, Yew-Soon Ong, Qiang Zhang:
Towards Efficient and Diverse Generative Model for Unconditional Human Motion Synthesis. 2535-2544
- Dan Zeng, Yu Zhu, Shuiwang Li, Qijun Zhao, Qiaomu Shen, Bo Tang:
Towards Labeling-free Fine-grained Animal Pose Estimation. 2545-2553
- Rui Xie, Anlong Ming, Shuai He, Yi Xiao, Huadong Ma:
"Special Relativity" of Image Aesthetics Assessment: a Preliminary Empirical Perspective. 2554-2563
- Zhengwei Yin, Mingze Ma, Guixu Lin, Yinqiang Zheng:
Exploring Data Efficiency in Image Restoration: A Gaussian Denoising Case Study. 2564-2573
- Yuntao Wang, Jinpu Zhang, Ruonan Wei, Wenbo Gao, Yuehuan Wang:
MFRGN: Multi-scale Feature Representation Generalization Network for Ground-to-Aerial Geo-localization. 2574-2583
- Chang Wu, Guancheng Quan, Gang He, Xin-Quan Lai, Yunsong Li, Wenxin Yu, Xianmeng Lin, Cheng Yang:
QS-NeRV: Real-Time Quality-Scalable Decoding with Neural Representation for Videos. 2584-2592
- Xiaoyu Han, Shunyuan Zheng, Zonglin Li, Chenyang Wang, Xin Sun, Quanling Meng:
Shape-Guided Clothing Warping for Virtual Try-On. 2593-2602
- Richen Liu, Hansheng Wang, Hailong Wang, Siru Chen, Chufan Lai, Ayush Kumar, Siming Chen:
ScaleTraversal: Creating Multi-Scale Biomedical Animation with Limited Hardware Resources. 2603-2612
- Chenrui Wu, Haishuai Wang, Xiang Zhang, Zhen Fang, Jiajun Bu:
Spatio-temporal Heterogeneous Federated Learning for Time Series Classification with Multi-view Orthogonal Training. 2613-2622
- Yaopeng Peng, Milan Sonka, Danny Z. Chen:
Group Vision Transformer. 2623-2631
- Zhichao Yang, Leida Li, Pengfei Chen, Jinjian Wu, Weisheng Dong:
Semantics-Aware Image Aesthetics Assessment using Tag Matching and Contrastive Ranking. 2632-2641
- Pengcheng Zhang, Xiaohan Yu, Xiao Bai, Jin Zheng, Xin Ning:
Prompting Continual Person Search. 2642-2651
- Xiao Zhao, Xukun Zhang, Dingkang Yang, Mingyang Sun, Mingcheng Li, Shunli Wang, Lihua Zhang:
MaskBEV: Towards A Unified Framework for BEV Detection and Map Segmentation. 2652-2661
- Yong Yang, Aoqi Zhao, Shuying Huang, Xiaozheng Wang, Yajing Fan:
SCPSN: Spectral Clustering-based Pyramid Super-resolution Network for Hyperspectral Images. 2662-2670
- Xiangyu Chen, Yihao Liu, Yuandong Pu, Wenlong Zhang, Jiantao Zhou, Yu Qiao, Chao Dong:
Learning A Low-Level Vision Generalist via Visual Task Prompt. 2671-2680
- Wenxu Shi, Bochuan Zheng:
Alleviating the Equilibrium Challenge with Sample Virtual Labeling for Adversarial Domain Adaptation. 2681-2689
- Federico Espositi, Andrea Bonarini:
The Room: Design and Embodiment of Spaces as Social Beings. 2690-2699
- Chunjie Ma, Lina Du, Zan Gao, Li Zhuo, Meng Wang:
A Coarse to Fine Detection Method for Prohibited Object in X-ray Images Based on Progressive Transformer Decoder. 2700-2708
- Qizhi Xie, Kun Yuan, Yunpeng Qu, Mingda Wu, Ming Sun, Chao Zhou, Jihong Zhu:
QPT-V2: Masked Image Modeling Advances Visual Scoring. 2709-2718
- Shengguang Wu, Zhenglun Chen, Qi Su:
Knowledge-Aware Artifact Image Synthesis with LLM-Enhanced Prompting and Multi-Source Supervision. 2719-2728
- Yu Feng, Zhen Tian, Yifan Zhu, Zongfu Han, Haoran Luo, Guangwei Zhang, Meina Song:
CP-Prompt: Composition-Based Cross-modal Prompting for Domain-Incremental Continual Learning. 2729-2738
- Huixiang Wen, Shizong Yan, Shan Chang, Jie Xu, Hongzi Zhu, Yanting Zhang, Bo Li:
DepthCloak: Projecting Optical Camouflage Patches for Erroneous Monocular Depth Estimation of Vehicles. 2739-2747
- Keming Wu, Man Yao, Yuhong Chou, Xuerui Qiu, Rui Yang, Bo Xu, Guoqi Li:
RSC-SNN: Exploring the Trade-off Between Adversarial Robustness and Accuracy in Spiking Neural Networks via Randomized Smoothing Coding. 2748-2756
- Xueying Mao, Xiaoxiao Hu, Wanli Peng, Zhenliang Gan, Zhenxing Qian, Xinpeng Zhang, Sheng Li:
From Covert Hiding To Visual Editing: Robust Generative Video Steganography. 2757-2765
- Wu Ran, Peirong Ma, Zhiquan He, Hong Lu:
Rainmer: Learning Multi-view Representations for Comprehensive Image Deraining and Beyond. 2766-2775
- Haoxuan Li, Zhengmao Yang, Yunshan Ma, Yi Bin, Yang Yang, Tat-Seng Chua:
MM-Forecast: A Multimodal Approach to Temporal Event Forecasting with Large Language Models. 2776-2785
- Shuyuan Wen, Bingrui Hu, Wenchao Li:
CDEA: Context- and Detail-Enhanced Unsupervised Learning for Domain Adaptive Semantic Segmentation. 2786-2794
- Xitong Ling, Minxi Ouyang, Yizhi Wang, Xinrui Chen, Renao Yan, Hongbo Chu, Junru Cheng, Tian Guan, Sufang Tian, Xiaoping Liu, Yonghong He:
Agent Aggregator with Mask Denoise Mechanism for Histopathology Whole Slide Image Analysis. 2795-2803
- Kepeng Xu, Zijia Ma, Li Xu, Gang He, Yunsong Li, Wenxin Yu, Taichu Han, Cheng Yang:
An End-to-End Real-World Camera Imaging Pipeline. 2804-2813
- Lijian Yang, Weisheng Li, Yucheng Shu, Jian-Xun Mi, Yuping Huang, Bin Xiao:
ShiftMorph: A Fast and Robust Convolutional Neural Network for 3D Deformable Medical Image Registration. 2814-2823
- Ximing Wu, Kongyange Zhao, Xu Chen, Teng Liang:
Edge-assisted Real-time Dynamic 3D Point Cloud Rendering for Multi-party Mobile Virtual Reality. 2824-2832
- Nannan Yu, Tao Ma, Jiqing Zhang, Yuji Zhang, Qirui Bao, Xiaopeng Wei, Xin Yang:
Adaptive Vision Transformer for Event-Based Human Pose Estimation. 2833-2841
- Litian Zhang, Xiaoming Zhang, Chaozhuo Li, Ziyi Zhou, Jiacheng Liu, Feiran Huang, Xi Zhang:
Mitigating Social Hazards: Early Detection of Fake News via Diffusion-Guided Propagation Path Generation. 2842-2851
- Yuzhen Du, Teng Hu, Ran Yi, Lizhuang Ma:
LD-BFR: Vector-Quantization-Based Face Restoration Model with Latent Diffusion Enhancement. 2852-2860
- Jie Huang, Zhao-Min Chen, Xiaoqin Zhang, Yisu Ge, Lusi Ye, Guodao Zhang, Huiling Chen:
Label Decoupling and Reconstruction: A Two-Stage Training Framework for Long-tailed Multi-label Medical Image Recognition. 2861-2869
- Chengpei Xu, Hao Fu, Long Ma, Wenjing Jia, Chengqi Zhang, Feng Xia, Xiaoyu Ai, Binghao Li, Wenjie Zhang:
Seeing Text in the Dark: Algorithm and Benchmark. 2870-2878
- Ye Tian, Zhe Wang, Jianguo Sun, Liguo Zhang:
Time-Frequency Domain Fusion Enhancement for Audio Super-Resolution. 2879-2887
- Lei Liu, Li Liu, Yawen Cui:
Prior-free Balanced Replay: Uncertainty-guided Reservoir Sampling for Long-Tailed Continual Learning. 2888-2897
- Tianjiao Xu, Aoxuan Chen, Yuxi Zhao, Jinfei Gao, Tian Gan:
A Chinese Multimodal Social Video Dataset for Controversy Detection. 2898-2907
- Zhe Ji, Qiansiqi Hu, Yicheng Zheng, Liyao Xiang, Xinbing Wang:
A Principled Approach to Natural Language Watermarking. 2908-2916
- Hao Wu, Fan Xu, Chong Chen, Xian-Sheng Hua, Xiao Luo, Haixin Wang:
PastNet: Introducing Physical Inductive Biases for Spatio-temporal Video Prediction. 2917-2926
- Jiawei Yao, Yingxin Lai, Hongrui Kou, Tong Wu, Ruixi Liu:
QE-BEV: Query Evolution for Bird's Eye View Object Detection in Varied Contexts. 2927-2935
- Xiangrui Liu, Xinju Wu, Pingping Zhang, Shiqi Wang, Zhu Li, Sam Kwong:
CompGS: Efficient 3D Scene Representation via Compressed Gaussian Splatting. 2936-2944
- Shengyu Hao, Wenhao Chai, Zhonghan Zhao, Meiqi Sun, Wendi Hu, Jieyang Zhou, Yixian Zhao, Qi Li, Yizhou Wang, Xi Li, Gaoang Wang:
Ego3DT: Tracking Every 3D Object in Ego-centric Videos. 2945-2954
- Junkang Liu, Fanhua Shang, Yuanyuan Liu, Hongying Liu, Yuangang Li, YunXiang Gong:
FedBCGD: Communication-Efficient Accelerated Block Coordinate Gradient Descent for Federated Learning. 2955-2963
- Yiran Cheng, Bintao He, Fa Zhang, Renmin Han:
Serial Section Microscopy Image Inpainting Guided by Axial Optical Flow. 2964-2972
- Han Fang, Kejiang Chen, Yupeng Qiu, Zehua Ma, Weiming Zhang, Ee-Chien Chang:
DERO: Diffusion-Model-Erasure Robust Watermarking. 2973-2981
- Yin Wang, Hao Lu, Ying-Cong Chen, Li Kuang, Mengchu Zhou, Shuiguang Deng:
rPPG-HiBa: Hierarchical Balanced Framework for Remote Physiological Measurement. 2982-2991
- Huan Chen, Tingfa Xu, Zhenxiang Chen, Peifu Liu, Huiyan Bai, Jianan Li:
Multi-scale Change-Aware Transformer for Remote Sensing Image Change Detection. 2992-3000
- Yinyin Peng, Yaofei Wang, Donghui Hu, Kejiang Chen, Xianjin Rong, Weiming Zhang:
LDStega: Practical and Robust Generative Image Steganography based on Latent Diffusion Models. 3001-3009
- Lei Lu, Yanyue Xie, Wei Jiang, Wei Wang, Xue Lin, Yanzhi Wang:
HybridFlow: Infusing Continuity into Masked Codebook for Extreme Low-Bitrate Image Compression. 3010-3018
- Linfei Li, Lin Zhang, Zhong Wang, Ying Shen:
GS3LAM: Gaussian Semantic Splatting SLAM. 3019-3027
- Shuang Wang, Pengyi Hao, Fuli Wu, Cong Bai:
Live on the Hump: Self Knowledge Distillation via Virtual Teacher-Students Mutual Learning. 3028-3036
- Xuhan Zhu, Yifei Xing, Ruiping Wang, Yaowei Wang, Xiangyuan Lan:
Calibration for Long-tailed Scene Graph Generation. 3037-3046
- Minjing Yu, Lingzhi Zeng, Xinxin Du, Jenny Sheng, Qiantian Liao, Yong-Jin Liu:
VisHanfu: An Interactive System for the Promotion of Hanfu Knowledge via Cross-Shaped Flat Structure. 3047-3055
- Xiuquan Du, Jiajia Chen, Xuejun Zhang:
CBNet: Cooperation-Based Weakly Supervised Polyp Detection. 3056-3064
- Zeyu Xiao, Zhihe Lu, Michael Bi Mi, Zhiwei Xiong, Xinchao Wang:
Unraveling Motion Uncertainty for Local Motion Deblurring. 3065-3074
- Yi Wang, Ningze Zhong, Minglin Chen, Longguang Wang, Yulan Guo:
Tangram-Splatting: Optimizing 3D Gaussian Splatting Through Tangram-inspired Shape Priors. 3075-3083
- Jiali Chen, Yi Cai, Ruohang Xu, Jiexin Wang, Jiayuan Xie, Qing Li:
Deconfounded Emotion Guidance Sticker Selection with Causal Inference. 3084-3093
- Zhijian Wu, Jun Li, Yang Hu, Dingjiang Huang:
Compacter: A Lightweight Transformer for Image Restoration. 3094-3103
- Xiuli Bi, Yang Hu, Bo Liu, Weisheng Li, Pamela C. Cosman, Bin Xiao:
PriFU: Capturing Task-Relevant Information Without Adversarial Learning. 3104-3112
- Zan Chen, Xiao Yu, Yuanjing Feng:
Connectivity-based Cerebrovascular Segmentation in Time-of-Flight Magnetic Resonance Angiography. 3113-3121
- Jiawei Chen, Dingkang Yang, Yue Jiang, Mingcheng Li, Jinjie Wei, Xiaolu Hou, Lihua Zhang:
Efficiency in Focus: LayerNorm as a Catalyst for Fine-tuning Medical Visual Language Models. 3122-3130
- Keke Tang, Zhensu Wang, Weilong Peng, Lujie Huang, Le Wang, Peican Zhu, Wenping Wang, Zhihong Tian:
SymAttack: Symmetry-aware Imperceptible Adversarial Attacks on 3D Point Clouds. 3131-3140
- Jie Liang, Rongjie Wang, Rui Peng, Zhe Zhang, Kaiqiang Xiong, Ronggang Wang:
High Fidelity Aggregated Planar Prior Assisted PatchMatch Multi-View Stereo. 3141-3150
- Tao Huang, Xinjia Ou, Huali Yang, Shengze Hu, Jing Geng, Junjie Hu, Zhuoran Xu:
Remembering is Not Applying: Interpretable Knowledge Tracing for Problem-solving Processes. 3151-3159
- Kien T. Pham, Jingye Chen, Qifeng Chen:
TALE: Training-free Cross-domain Image Composition via Adaptive Latent Manipulation and Energy-guided Optimization. 3160-3169
- Lingyu Xiong, Xize Cheng, Jintao Tan, Xianjia Wu, Xiandong Li, Lei Zhu, Fei Ma, Minglei Li, Huang Xu, Zhihui Hu:
SegTalker: Segmentation-based Talking Face Generation with Mask-guided Local Editing. 3170-3179
- Changshuo Wang, Mingzhe Yu, Lei Wu, Lei Meng, Xiang Li, Xiangxu Meng:
InstantAS: Minimum Coverage Sampling for Arbitrary-Size Image Generation. 3180-3188
- Du Chen, Zhengqiang Zhang, Jie Liang, Lei Zhang:
SSL: A Self-similarity Loss for Improving Generative Image Super-resolution. 3189-3198
- Zhengze Xu, Mengting Chen, Zhao Wang, Linyu Xing, Zhonghua Zhai, Nong Sang, Jinsong Lan, Shuai Xiao, Changxin Gao:
Tunnel Try-on: Excavating Spatial-temporal Tunnels for High-quality Virtual Try-on in Videos. 3199-3208
- Lixing Tan, Shuang Song, Kangneng Zhou, Chengbo Duan, Lanying Wang, Huayang Ren, Linlin Liu, Wei Zhang, Ruoxiu Xiao:
Multi-view X-ray Image Synthesis with Multiple Domain Disentanglement from CT Scans. 3209-3218
- Zecheng Wang, Xinye Li, Zhanyue Qin, Chunshan Li, Zhiying Tu, Dianhui Chu, Dianbo Sui:
Can We Debias Multimodal Large Language Models via Model Editing? 3219-3228
- Shuqi Dai, Ming-Yu Liu, Rafael Valle, Siddharth Gururani:
ExpressiveSinger: Multilingual and Multi-Style Score-based Singing Voice Synthesis with Expressive Performance Control. 3229-3238
- Dehao Ying, Fengchang Yu, Haihua Chen, Wei Lu:
DIG: Complex Layout Document Image Generation with Authentic-looking Text for Enhancing Layout Analysis. 3239-3247
- Shibo Hong, Xuhong Zhang, Tianyu Du, Sheng Cheng, Xun Wang, Jianwei Yin:
Cons2Plan: Vector Floorplan Generation from Various Conditions via a Learning Framework based on Conditional Diffusion Models. 3248-3256
- Qihe Pan, Zhen Zhao, Zicheng Wang, Sifan Long, Yiming Wu, Wei Ji, Haoran Liang, Ronghua Liang:
Towards Small Object Editing: A Benchmark Dataset and A Training-Free Approach. 3257-3265
- Xiaofeng Mao, Zhengkai Jiang, Qilin Wang, Chencan Fu, Jiangning Zhang, Jiafu Wu, Yabiao Wang, Chengjie Wang, Wei Li, Mingmin Chi:
MDT-A2G: Exploring Masked Diffusion Transformers for Co-Speech Gesture Generation. 3266-3274
- Jihoon Lee, Yunhong Min, Hwidong Kim, Sangtae Ahn:
DAFT-GAN: Dual Affine Transformation Generative Adversarial Network for Text-Guided Image Inpainting. 3275-3283
- Boyong He, Yuxiang Ji, Zhuoyue Tan, Liaoni Wu:
Diffusion Domain Teacher: Diffusion Guided Domain Adaptive Object Detector. 3284-3293
- Weizhi Liu, Yue Li, Dongdong Lin, Hui Tian, Haizhou Li:
GROOT: Generating Robust Watermark for Diffusion-Model-Based Audio Synthesis. 3294-3302
- Feihong Lu, Weiqi Wang, Yangyifei Luo, Ziqin Zhu, Qingyun Sun, Baixuan Xu, Haochen Shi, Shiqi Gao, Qian Li, Yangqiu Song, Jianxin Li:
Miko: Multimodal Intention Knowledge Distillation from Large Language Models for Social-Media Commonsense Discovery. 3303-3312
- Guojin Zhong, Yihu Guo, Jin Yuan, Qianjun Zhang, Weili Guan, Long Chen:
PROMOTE: Prior-Guided Diffusion Model with Global-Local Contrastive Learning for Exemplar-Based Image Translation. 3313-3322
- Xiangcheng Zhai, Yingqi Jie, Xueguang Xie, Aimin Hao, Na Jiang, Yang Gao:
ANFluid: Animate Natural Fluid Photos base on Physics-Aware Simulation and Dual-Flow Texture Learning. 3323-3331
- Shoubin Yu, Jacob Zhiyuan Fang, Jian Zheng, Gunnar A. Sigurdsson, Vicente Ordonez, Robinson Piramuthu, Mohit Bansal:
Zero-Shot Controllable Image-to-Video Animation via Motion Decomposition. 3332-3341
- Goirik Chakrabarty, Aditya Chandrasekar, Ramya Hebbalaguppe, Prathosh AP:
LoMOE: Localized Multi-Object Editing via Multi-Diffusion. 3342-3351
- Yuyan Chen, Songzhou Yan, Zhihong Zhu, Zhixu Li, Yanghua Xiao:
XMeCap: Meme Caption Generation with Sub-Image Adaptability. 3352-3361
- Zhenqiang Li, Jie Li, Yangjie Cao, Jiayi Wang, Runfeng Lv:
ImageBind3D: Image as Binding Step for Controllable 3D Generation. 3362-3371
- Pengxiang Cai, Zhiwei Liu, Guibo Zhu, Yunfang Niu, Jinqiao Wang:
Auto DragGAN: Editing the Generative Image Manifold in an Autoregressive Manner. 3372-3380
- Chengwei Zhang, Xueyi Zhang, Xianghu Yue, Mingrui Lao, Tao Jiang, Jiawei Wang, Fubo Zhang, Longyong Chen:
PD-Refiner: An Underlying Surface Inheritance Refiner with Adaptive Edge-Aware Supervision for Point Cloud Denoising. 3381-3390
- Yue Jiang, Yueming Lyu, Ziwen He, Bo Peng, Jing Dong:
Mitigating Social Biases in Text-to-Image Diffusion Models via Linguistic-Aligned Attention Guidance. 3391-3400
- Peng Zhou, Dunbo Cai, Yujian Du, Runqing Zhang, Bingbing Ni, Jie Qin, Ling Qian:
Edit3D: Elevating 3D Scene Editing with Attention-Driven Multi-Turn Interactivity. 3401-3410
- Ziyu Yao, Xuxin Cheng, Zhiqi Huang:
FD2Talk: Towards Generalized Talking Head Generation with Facial Decoupled Diffusion Model. 3411-3420
- Xiaomin Li, Xu Jia, Qinghe Wang, Haiwen Diao, Mengmeng Ge, Pengxiang Li, You He, Huchuan Lu:
MoTrans: Customized Motion Transfer with Text-driven Video Diffusion Models. 3421-3430
- Qi Xu, Yaxin Li, Xuanye Fang, Jiangrong Shen, Qiang Zhang, Gang Pan:
Reversing Structural Pattern Learning with Biologically Inspired Knowledge Distillation for Spiking Neural Networks. 3431-3439
- Xiaogang Wang, Yuhang Cheng, Ziyang Fan, Kai Xu:
Learning to Transfer Heterogeneous Translucent Materials from a 2D Image to 3D Models. 3440-3448
- Zonglin Lyu, Ming Li, Jianbo Jiao, Chen Chen:
Frame Interpolation with Consecutive Brownian Bridge Diffusion. 3449-3458
- Teng Hu, Jiangning Zhang, Ran Yi, Yating Wang, Jieyu Weng, Hongrui Huang, Yabiao Wang, Lizhuang Ma:
COMD: Training-free Video Motion Transfer With Camera-Object Motion Disentanglement. 3459-3468
- Yihao Liu, Feng Xue, Anlong Ming, Mingshuai Zhao, Huadong Ma, Nicu Sebe:
SM4Depth: Seamless Monocular Metric Depth Estimation across Multiple Cameras and Scenes by One Model. 3469-3478
- Qinfeng Li, Zhiqiang Shen, Zhenghan Qin, Yangfan Xie, Xuhong Zhang, Tianyu Du, Sheng Cheng, Xun Wang, Jianwei Yin:
TransLinkGuard: Safeguarding Transformer Models Against Model Stealing in Edge Deployment. 3479-3488
- Tao Wu, Mengze Li, Jingyuan Chen, Wei Ji, Wang Lin, Jinyang Gao, Kun Kuang, Zhou Zhao, Fei Wu:
Semantic Alignment for Multimodal Large Language Models. 3489-3498
- Wenxuan Yang, Weimin Tan, Yuqi Sun, Bo Yan:
A Medical Data-Effective Learning Benchmark for Highly Efficient Pre-training of Foundation Models. 3499-3508
- Jin Liu, Huaibo Huang, Jie Cao, Ran He:
ZePo: Zero-Shot Portrait Stylization with Faster Sampling. 3509-3518
- Yiding Li, Lingyun Yu, Li Wang, Hongtao Xie:
Control-Talker: A Rapid-Customization Talking Head Generation Method for Multi-Condition Control and High-Texture Enhancement. 3519-3527
- Zhaoyang Li, Zhu Teng, Baopeng Zhang, Jianping Fan:
Boosting Non-causal Semantic Elimination: An Unconventional Harnessing of LVM for Open-World Deepfake Interpretation. 3528-3537
- Zhihao Sun, Haipeng Fang, Juan Cao, Xinying Zhao, Danding Wang:
Rethinking Image Editing Detection in the Era of Generative AI Revolution. 3538-3547 - Hongyun Yu
, Zhan Qu
, Qihang Yu
, Jianchuan Chen
, Zhonghua Jiang
, Zhiwen Chen
, Shengyu Zhang
, Jimin Xu
, Fei Wu
, Chengfei Lv
, Gang Yu
:
GaussianTalker: Speaker-specific Talking Head Synthesis via 3D Gaussian Splatting. 3548-3557 - Xingqi Wang
, Xiaoyuan Yi
, Xing Xie
, Jia Jia
:
Embedding an Ethical Mind: Aligning Text-to-Image Synthesis via Lightweight Value Optimization. 3558-3567 - Weili Zeng
, Yichao Yan
, Qi Zhu
, Zhuo Chen
, Pengzhi Chu
, Weiming Zhao
, Xiaokang Yang
:
Infusion: Preventing Customized Text-to-Image Diffusion from Overfitting. 3568-3577 - Yi Liu
, Chengjun Cai
, Xiaoli Zhang
, Xingliang Yuan
, Cong Wang
:
Arondight: Red Teaming Large Vision Language Models with Auto-generated Multi-modal Jailbreak Prompts. 3578-3586 - Yisu Liu
, Jinyang An
, Wanqian Zhang
, Dayan Wu
, Jingzi Gu
, Zheng Lin
, Weiping Wang
:
Disrupting Diffusion: Token-Level Attention Erasure Attack against Diffusion-based Customization. 3587-3596 - Yiren Lu
, Jing Ma
, Yu Yin
:
View-consistent Object Removal in Radiance Fields. 3597-3606 - Shaocong Long
, Qianyu Zhou
, Xiangtai Li
, Xuequan Lu
, Chenhao Ying
, Yuan Luo
, Lizhuang Ma
, Shuicheng Yan
:
DGMamba: Domain Generalization via Generalized State Space Model. 3607-3616 - Wangguandong Zheng
, Haifeng Xia
, Rui Chen
, Libo Sun
, Ming Shao
, Siyu Xia
, Zhengming Ding
:
Sketch3D: Style-Consistent Guidance for Sketch-to-3D Generation. 3617-3626 - Ziyin Zhou
, Ke Sun
, Zhongxi Chen
, Huafeng Kuang
, Xiaoshuai Sun
, Rongrong Ji
:
StealthDiffusion: Towards Evading Diffusion Forensic Detection through Diffusion Model. 3627-3636 - Hong Chen
, Xin Wang
, Yipeng Zhang
, Yuwei Zhou
, Zeyang Zhang
, Siao Tang
, Wenwu Zhu
:
DisenStudio: Customized Multi-Subject Text-to-Video Generation with Disentangled Spatial Control. 3637-3646 - Ziqi Yu
, Jing Zhou
, Zhongyun Bao
, Gang Fu
, Weilei He
, Chao Liang
, Chunxia Xiao
:
CFDiffusion: Controllable Foreground Relighting in Image Compositing via Diffusion Model. 3647-3656 - Hao Wang
, Shangwei Guo
, Jialing He
, Kangjie Chen
, Shudong Zhang
, Tianwei Zhang
, Tao Xiang
:
EvilEdit: Backdooring Text-to-Image Diffusion Models in One Second. 3657-3665 - Haiyan Jiang
, Leiyu Song
, Dongdong Weng
, Zhe Sun
, Huiying Li
, Xiaonuo Dongye
, Zhenliang Zhang
:
In Situ 3D Scene Synthesis for Ubiquitous Embodied Interfaces. 3666-3675 - Haoning Wu
, Xiele Wu
, Chunyi Li
, Zicheng Zhang
, Chaofeng Chen
, Xiaohong Liu
, Guangtao Zhai
, Weisi Lin
:
T2I-Scorer: Quantitative Evaluation on Text-to-Image Generation via Fine-Tuned Large Multi-Modal Models. 3676-3685 - Shiwei Li
, Yingyi Cheng
, Haozhao Wang
, Xing Tang
, Shijie Xu
, Weihong Luo
, Yuhua Li
, Dugang Liu
, Xiuqiang He
, Ruixuan Li
:
Masked Random Noise for Communication-Efficient Federated Learning. 3686-3694 - Sa Yan
, Nuowen Kan
, Chenglin Li
, Wenrui Dai
, Junni Zou
, Hongkai Xiong
:
Task-Oriented Multi-Bitstream Optimization for Image Compression and Transmission via Optimal Transport. 3695-3703 - Tingting Li
, Ziming Zhao
, Jianwei Yin
:
Minerva: Enhancing Quantum Network Performance for High-Fidelity Multimedia Transmission. 3704-3712 - Xiaotong Yu
, Chang-Wen Chen
:
Semantic-aware Next-Best-View for Multi-DoFs Mobile System in Search-and-Acquisition based Visual Perception. 3713-3721 - Yu Chen
, Yanan Wu
, Na Han
, Xiaozhao Fang
, Bingzhi Chen
, Jie Wen
:
Partial Multi-label Learning Based On Near-Far Neighborhood Label Enhancement And Nonlinear Guidance. 3722-3731 - Ruofan Jia
, Weiying Xie
, Jie Lei
, Yunsong Li
:
Adaptive Hierarchical Aggregation for Federated Object Detection. 3732-3740 - Liang Xie
, Wei Gao
, Huiming Zheng
, Ge Li
:
ROI-Guided Point Cloud Geometry Compression Towards Human and Machine Vision. 3741-3750
Oral Session 12: Human-centric and Interactive Multimedia
- Xiyu Wang
, Yufei Wang
, Satoshi Tsutsui
, Weisi Lin
, Bihan Wen
, Alex C. Kot
:
Evolving Storytelling: Benchmarks and Methods for New Character Customization with Diffusion Models. 3751-3760 - Shiyu Liu
, Zibo Zhao
, Yihao Zhi
, Yiqun Zhao
, Binbin Huang
, Shuo Wang
, Ruoyu Wang
, Michael Xuan
, Zhengxin Li
, Shenghua Gao
:
HeroMaker: Human-centric Video Editing with Motion Priors. 3761-3770 - Yunze Liu
, Changxi Chen
, Chenjing Ding
, Li Yi
:
PhysReaction: Physically Plausible Real-Time Humanoid Reaction Synthesis via Forward Dynamics Guided 4D Imitation. 3771-3780 - Wenxuan Wang
, Haonan Bai
, Jen-tse Huang
, Yuxuan Wan
, Youliang Yuan
, Haoyi Qiu
, Nanyun Peng
, Michael R. Lyu
:
New Job, New Gender? Measuring the Social Bias in Image Generation Models. 3781-3789 - Mengzhen Liu
, Mengyu Wang
, Henghui Ding
, Yilong Xu
, Yao Zhao
, Yunchao Wei
:
Segment Anything with Precise Interaction. 3790-3799 - Zhihua Xu
, Tianshui Chen
, Zhijing Yang
, Chunmei Qing
, Yukai Shi
, Liang Lin
:
Self-Supervised Emotion Representation Disentanglement for Speech-Preserving Facial Expression Manipulation. 3800-3808
Oral Session 13: Machine Learning for Multimedia
- Dongyu Xie
, Chaofan Qiao
, Lanyue Liang
, Zhiwen Wang
, Tianyu Li
, Qiao Liu
, Chongyi Li
, Guoqing Wang
, Yang Yang
:
Generalizing ISP Model by Unsupervised Raw-to-raw Mapping. 3809-3817 - Yang Liu
, Daizong Liu
, Zongming Guo
, Wei Hu
:
Cross-Task Knowledge Transfer for Semi-supervised Joint 3D Grounding and Captioning. 3818-3827 - Yang Liu
, Qianqian Xu
, Peisong Wen
, Siran Dai
, Qingming Huang
:
Not All Pairs are Equal: Hierarchical Learning for Average-Precision-Oriented Video Retrieval. 3828-3837 - Dongjie Fu
, Xize Cheng
, Xiaoda Yang
, Hanting Wang
, Zhou Zhao
, Tao Jin
:
Boosting Speech Recognition Robustness to Modality-Distortion with Contrast-Augmented Prompts. 3838-3847 - Xingyu Zhu
, Beier Zhu
, Yi Tan
, Shuo Wang
, Yanbin Hao
, Hanwang Zhang
:
Selective Vision-Language Subspace Projection for Few-shot CLIP. 3848-3857 - Jin Liu
, Bo Wang
, Chuanming Wang
, Huiyuan Fu
, Huadong Ma
:
Learning Exposure Correction in Dynamic Scenes. 3858-3866
Oral Session 14: Multimodal Datasets, Models & Analytics
- Fuqiang Niu
, Zebang Cheng
, Xianghua Fu
, Xiaojiang Peng
, Genan Dai
, Yin Chen
, Hu Huang
, Bowen Zhang
:
Multimodal Multi-turn Conversation Stance Detection: A Challenge Dataset and Effective Model. 3867-3876 - Ruilin Yao
, Shengwu Xiong
, Yichen Zhao
, Yi Rong
:
Visual Grounding with Multi-modal Conditional Adaptation. 3877-3886 - Junhao Xu
, Jingjing Chen
, Xue Song
, Feng Han
, Haijun Shan
, Yu-Gang Jiang
:
Identity-Driven Multimedia Forgery Detection via Reference Assistance. 3887-3896 - Bowen Zhao
, Tianhao Cheng
, Yuejie Zhang
, Ying Cheng
, Rui Feng
, Xiaobo Zhang
:
CT2C-QA: Multimodal Question Answering over Chinese Text, Table and Chart. 3897-3906 - Zhanyu Wang
, Longyue Wang
, Zhen Zhao
, Minghao Wu
, Chenyang Lyu
, Huayang Li
, Deng Cai
, Luping Zhou
, Shuming Shi
, Zhaopeng Tu
:
GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation. 3907-3916 - Linmei Hu
, Duokang Wang
, Yiming Pan
, Jifan Yu
, Yingxia Shao
, Chong Feng
, Liqiang Nie
:
NovaChart: A Large-scale Dataset towards Chart Understanding and Generation of Multimodal Large Language Models. 3917-3925
Oral Session 15: Video Applications
- Jiaxu Li
, Songsong Yu
, Yifan Wang
, Lijun Wang
, Huchuan Lu
:
SelM: Selective Mechanism based Audio-Visual Segmentation. 3926-3935 - Yuqing Wang
, Lei Meng
, Haokai Ma
, Yuqing Wang
, Haibei Huang
, Xiangxu Meng
:
Modeling Event-level Causal Representation for Video Classification. 3936-3944 - Te Yang
, Jian Jia
, Bo Wang
, Yanhua Cheng
, Yan Li
, Dongze Hao
, Xipeng Cao
, Quan Chen
, Han Li
, Peng Jiang
, Xiangyu Zhu
, Zhen Lei
:
Spatiotemporal Fine-grained Video Description for Short Videos. 3945-3954 - Yili Li
, Jing Yu
, Keke Gai
, Bang Liu
, Gang Xiong
, Qi Wu
:
T2VIndexer: A Generative Video Indexer for Efficient Text-Video Retrieval. 3955-3963 - Haijie Yang
, Zhenyu Zhang
, Hao Tang
, Jianjun Qian
, Jian Yang
:
ConsistentAvatar: Learning to Diffuse Fully Consistent Talking Head Avatar with Temporal Guidance. 3964-3973 - Zhiyu Zhang
, Guo Lu
, Huanxiong Liang
, Zhengxue Cheng
, Anni Tang
, Li Song
:
Rate-aware Compression for NeRF-based Volumetric Video. 3974-3983
Oral Session 16: Biological and Health Applications
- Jingxiong Li
, Sunyi Zheng
, Chenglu Zhu
, Yuxuan Sun
, Pingyi Chen
, Zhongyi Shui
, Yunlong Zhang
, Honglin Li
, Lin Yang
:
PathUp: Patch-wise Timestep Tracking for Multi-class Large Pathology Image Synthesising Diffusion Model. 3984-3993 - Dian Xie
, Peiang Zhao
, Jiarui Zhang
, Kangqi Wei
, Xiaobao Ni
, Jiong Xia
:
BrainRAM: Cross-Modality Retrieval-Augmented Image Reconstruction from Human Brain Activity. 3994-4003 - Shuo Ma
, Yingwei Zhang
, Qiqi Zhang
, Yiqiang Chen
, Haoran Wang
, Ziyu Jia
:
SleepMG: Multimodal Generalizable Sleep Staging with Inter-modal Balance of Classification and Domain Discrimination. 4004-4013 - Zixuan Gong
, Qi Zhang
, Guangyin Bao
, Lei Zhu
, Yu Zhang
, Ke Liu
, Liang Hu
, Duoqian Miao
:
Lite-Mind: Towards Efficient and Robust Brain Representation Learning. 4014-4023 - Kun Dong
, Jian Xue
, Zehai Niu
, Xing Lan
, Ke Lu
, Qingyuan Liu
, Xiaoyu Qin
:
Realistic Full-Body Motion Generation from Sparse Tracking with State Space Model. 4024-4033 - Usman Naseem
, Adam G. Dunn
, Matloob Khushi
, Jinman Kim
:
Vaccine Misinformation Detection in X using Cooperative Multimodal Framework. 4034-4042
Oral Session 17: Person Modeling and Tracking
- Shizong Yan
, Huixiang Wen
, Shan Chang
, Hongzi Zhu
, Luo Zhou
:
Fooling 3D Face Recognition with One Single 2D Image. 4043-4052 - Fangyi Liu
, Mang Ye
, Bo Du
:
Cloth-aware Augmentation for Cloth-generalized Person Re-identification. 4053-4062 - Zhiqi Pang
, Lingling Zhao
, Chunyu Wang
:
Dual-Resolution Fusion Modeling for Unsupervised Cross-Resolution Person Re-Identification. 4063-4072 - Huilin Tian
, Jingke Meng
, Wei-Shi Zheng
, Yuan-Ming Li
, Junkai Yan
, Yunong Zhang
:
Loc4Plan: Locating Before Planning for Outdoor Vision and Language Navigation. 4073-4081 - Changcheng Xiao
, Qiong Cao
, Zhigang Luo
, Long Lan
:
MambaTrack: A Simple Baseline for Multiple Object Tracking with State Space Model. 4082-4091 - Ling Li
, Wenrui Yang
, Xinchun Yu
, Junliang Xing
, Xiao-Ping Zhang
:
Translating Motion to Notation: Hand Labanotation for Intuitive and Comprehensive Hand Movement Documentation. 4092-4100
Poster Session 2
- Xiang Gao
, Jiaying Liu
:
FBSDiff: Plug-and-Play Frequency Band Substitution of Diffusion Features for Highly Controllable Text-Driven Image Translation. 4101-4109 - Wen Yin
, Bin Benjamin Zhu
, Yulai Xie
, Pan Zhou
, Dan Feng
:
Backdoor Attacks on Bimodal Salient Object Detection with RGB-Thermal Data. 4110-4119 - Zhixiang Shen
, Haolan He
, Zhao Kang
:
Balanced Multi-Relational Graph Clustering. 4120-4128 - Jiyuan Wang
, Chunyu Lin
, Lang Nie
, Kang Liao
, Shuwei Shao
, Yao Zhao
:
Digging into Contrastive Learning for Robust Depth Estimation with Diffusion Models. 4129-4137 - Zhuoxiao Chen
, Zixin Wang
, Yadan Luo
, Sen Wang
, Zi Huang
:
DPO: Dual-Perturbation Optimization for Test-time Adaptation in 3D Object Detection. 4138-4147 - Xian Zhang
, Haokun Wen
, Jianlong Wu
, Pengda Qin
, Hui Xue'
, Liqiang Nie
:
Differential-Perceptive and Retrieval-Augmented MLLM for Change Captioning. 4148-4157 - Bingyan Liu
, Chengyu Wang
, Jun Huang
, Kui Jia
:
Attentive Linguistic Tracking in Diffusion Models for Training-free Text-guided Image Editing. 4158-4166 - Changhao He
, Hongyuan Zhu
, Peng Hu
, Xi Peng
:
Robust Variational Contrastive Learning for Partially View-unaligned Clustering. 4167-4176 - Shengxin Chen
, Gen Luo
, Yiyi Zhou
, Xiaoshuai Sun
, Guannan Jiang
, Rongrong Ji
:
QueryMatch: A Query-based Contrastive Learning Framework for Weakly Supervised Visual Grounding. 4177-4186 - Rui Liu
, Yifan Hu
, Yi Ren
, Xiang Yin
, Haizhou Li
:
Generative Expressive Conversational Speech Synthesis. 4187-4196 - Zhien Dai
, Zhaohui Tang
, Hu Zhang
, Can Tian
, Mingjun Pan
, Yongfang Xie
:
Eglcr: Edge Structure Guidance and Scale Adaptive Attention for Iterative Stereo Matching. 4197-4206 - Humen Zhong
, Zhibo Yang
, Zhaohai Li
, Peng Wang
, Jun Tang
, Wenqing Cheng
, Cong Yao
:
VL-Reader: Vision and Language Reconstructor is an Effective Scene Text Recognizer. 4207-4216 - Chaofan Gan
, Yuanpeng Tu
, Yuxi Li
, Weiyao Lin
:
DAC: 2D-3D Retrieval with Noisy Labels via Divide-and-Conquer Alignment and Correction. 4217-4226 - Zhenyu Hou
, Junjun Guo
:
Virtual Visual-Guided Domain-Shadow Fusion via Modal Exchanging for Domain-Specific Multi-Modal Neural Machine Translation. 4227-4235 - Yuxiang Yang
, Lu Wen
, Xinyi Zeng
, Yuanyuan Xu
, Xi Wu
, Jiliu Zhou
, Yan Wang
:
Learning with Alignments: Tackling the Inter- and Intra-domain Shifts for Cross-multidomain Facial Expression Recognition. 4236-4245 - Shuhuang Chen
, Dingjie Fu
, Shiming Chen
, Shuo Ye
, Wenjin Hou
, Xinge You
:
Causal Visual-semantic Correlation for Zero-shot Learning. 4246-4255 - Patrick Steinert
, Stefan Wagenpfeil
, Ingo Frommholz
, Matthias L. Hemmje
:
256 Metaverse Records Dataset. 4256-4263 - Yifeng Xie
, Zhihong Zhu
, Xin Chen
, Zhanpeng Chen
, Zhiqi Huang
:
MoBA: Mixture of Bi-directional Adapter for Multi-modal Sarcasm Detection. 4264-4272 - Jiulin Li
, Mengyu Yang
, Ye Tian
, Lanshan Zhang
, Yongchun Lu
, Jice Liu
, Wendong Wang
:
WaveDN: A Wavelet-based Training-free Zero-shot Enhancement for Vision-Language Models. 4273-4282 - Runkai Zhao
, Heng Wang
, Weidong Cai
:
LaneCMKT: Boosting Monocular 3D Lane Detection with Cross-Modal Knowledge Transfer. 4283-4291 - Wenju Sun
, Qingyong Li
, Siyu Zhang
, Wen Wang
, Yangli-ao Geng
:
Incremental Learning via Robust Parameter Posterior Fusion. 4292-4301 - Tao Jin
, Weicai Yan
, Ye Wang
, Sihang Cai
, Qifan Shuai
, Zhou Zhao
:
Calibrating Prompt from History for Continual Vision-Language Retrieval and Grounding. 4302-4311 - Pengyue Lin
, Ruifan Li
, Yuzhe Ji
, Zhihan Yu
, Fangxiang Feng
, Zhanyu Ma
, Xiaojie Wang
:
Triple Alignment Strategies for Zero-shot Phrase Grounding under Weak Supervision. 4312-4321 - Zhenni Yu
, Xiaoqin Zhang
, Li Zhao
, Yi Bin
, Guobao Xiao
:
Exploring Deeper! Segment Anything Model with Depth Perception for Camouflaged Object Detection. 4322-4330 - Jiawei Wang
, Da Cao
, Shaofei Lu
, Zhanchang Ma
, Junbin Xiao
, Tat-Seng Chua
:
Causal-driven Large Language Models with Faithful Reasoning for Knowledge Question Answering. 4331-4340 - Zijian Yi
, Ziming Zhao
, Zhishu Shen
, Tiehua Zhang
:
Multimodal Fusion via Hypergraph Autoencoder and Contrastive Learning for Emotion Recognition in Conversation. 4341-4348 - Cheng Shen
, Liquan Shen
, Mengyao Li
, Meng Yu
:
EPL-UFLSID: Efficient Pseudo Labels-Driven Underwater Forward-Looking Sonar Images Object Detection. 4349-4357 - Shuiping Gou
, Xin Wang
, Xinlin Wang
, Yunzhi Chen
:
Interpretable Matching of Optical-SAR Image via Dynamically Conditioned Diffusion Models. 4358-4367 - Xiaohuan Ding
, Yangrui Gong
, Tianyi Shi
, Zihang Huang
, Gangwei Xu
, Xin Yang
:
Masked Snake Attention for Fundus Image Restoration with Vessel Preservation. 4368-4376 - Yajie Zhang
, Zhi-An Huang
, Zhiliang Hong
, Songsong Wu
, Jibin Wu
, Kay Chen Tan
:
Mixed Prototype Correction for Causal Inference in Medical Image Classification. 4377-4386 - Yi Zhang
, Ke Yu
, Angelica I. Avilés-Rivero
, Jiyuan Jia
, Yushun Tang
, Zhihai He
:
Training-Free Feature Reconstruction with Sparse Optimization for Vision-Language Models. 4387-4396 - Nan Wang
, Zonglin Di
, Houlin He
, Qingchao Jiang
, Xiaoxiao Li
:
A Simple and Provable Approach for Learning on Noisy Labeled Medical Images. 4397-4405 - Mengmeng Sheng
, Zeren Sun
, Gensheng Pei
, Tao Chen
, Haonan Luo
, Yazhou Yao
:
Enhancing Robustness in Learning with Noisy Labels: An Asymmetric Co-Training Approach. 4406-4415 - Muquan Li
, Dongyang Zhang
, Tao He
, Xiurui Xie
, Yuan-Fang Li
, Ke Qin
:
Towards Effective Data-Free Knowledge Distillation via Diverse Diffusion Augmentation. 4416-4425 - Qiuhui Chen
, Yi Hong
:
SMART: Self-Weighted Multimodal Fusion for Diagnostics of Neurodegenerative Disorders. 4426-4435 - Taoyu Su
, Jiawei Sheng
, Shicheng Wang
, Xinghua Zhang
, Hongbo Xu
, Tingwen Liu
:
IBMEA: Exploring Variational Information Bottleneck for Multi-modal Entity Alignment. 4436-4445 - Zhijun Jia
, Huaying Xue
, Xiulian Peng
, Yan Lu
:
Convert and Speak: Zero-shot Accent Conversion with Minimum Supervision. 4446-4454 - Yihan Zhao
, Wei Xi
, Yuhang Cui
, Gairui Bai
, Xinhui Liu
, Jizhong Zhao
:
CoPL: Parameter-Efficient Collaborative Prompt Learning for Audio-Visual Tasks. 4455-4464 - Junbo Hu
, Zhixin Li
:
Distilled Cross-Combination Transformer for Image Captioning with Dual Refined Visual Features. 4465-4474 - Siyuan Xu
, Guannan Li
, Haofei Song
, Jiansheng Wang
, Yan Wang
, Qingli Li
:
GeNSeg-Net: A General Segmentation Framework for Any Nucleus in Immunohistochemistry Images. 4475-4484 - Ziyi Gao
, Kai Chen
, Zhipeng Wei
, Tingshu Mou
, Jingjing Chen
, Zhiyu Tan
, Hao Li
, Yu-Gang Jiang
:
ReToMe-VA: Recursive Token Merging for Video Diffusion-based Unrestricted Adversarial Attack. 4485-4494 - Kunyu Peng
, David Schneider
, Alina Roitberg
, Kailun Yang
, Jiaming Zhang
, Chen Deng
, Kaiyu Zhang
, M. Saquib Sarfraz
, Rainer Stiefelhagen
:
Towards Video-based Activated Muscle Group Estimation in the Wild. 4495-4504 - Rui Xu
, Gaolei Li
, Changze Li
, Zhaohui Yang
, Yuchen Liu
, Mingzhe Chen
:
OSNeRF: On-demand Semantic Neural Radiance Fields for Fast and Robust 3D Object Reconstruction. 4505-4514 - Wenjie Li
, Heng Guo
, Xuannan Liu
, Kongming Liang
, Jiani Hu
, Zhanyu Ma
, Jun Guo
:
Efficient Face Super-Resolution via Wavelet-based Feature Enhancement Network. 4515-4523 - Ruoxi Deng
, Bin Yu
, Jinxuan Lu
, Caixia Zhou
, Zhao-Min Chen
, Jie Hu
:
Advancing Semantic Edge Detection through Cross-Modal Knowledge Learning. 4524-4532 - Jiacheng Zhang
, Jie Wu
, Huafeng Kuang
, Haiming Zhang
, Yuxi Ren
, Weifeng Chen
, Manlin Zhang
, Xuefeng Xiao
, Guanbin Li
:
TreeReward: Improve Diffusion Model via Tree-Structured Feedback Learning. 4533-4542 - Chaomin Shen
, Yaomin Huang
, Haokun Zhu
, Jinsong Fan
, Guixu Zhang
:
Student-Oriented Teacher Knowledge Refinement for Knowledge Distillation. 4543-4552 - Yanshan Zhou
, Pingrui Lai
, Jiaqi Yu
, Yingjie Xiong
, Hua Yang
:
Hydrodynamics-Informed Neural Network for Simulating Dense Crowd Motion Patterns. 4553-4561 - Zhidong Yu
, Zhenbo Shi
, Xiaoman Liu
, Wei Yang
:
PFFAA: Prototype-based Feature and Frequency Alteration Attack for Semantic Segmentation. 4562-4571 - Wenbo Huang
, Jinghui Zhang
, Xuwei Qian
, Zhen Wu
, Meng Wang
, Lei Zhang
:
SOAP: Enhancing Spatio-Temporal Relation and Motion Information Capturing for Few-Shot Action Recognition. 4572-4580 - Xiangyan Qu
, Jing Yu
, Keke Gai
, Jiamin Zhuang
, Yuanmin Tang
, Gang Xiong
, Gaopeng Gou
, Qi Wu
:
Visual-Semantic Decomposition and Partial Alignment for Document-based Zero-Shot Learning. 4581-4590 - Weixiang Han
, Chengjun Cai
, Yu Guo
, Jialiang Peng
:
ERL-MR: Harnessing the Power of Euler Feature Representations for Balanced Multi-modal Learning. 4591-4600 - Luca Rossetto
, Cristina Sarasua
, Abraham Bernstein
:
Estimating the Semantic Density of Visual Media. 4601-4609 - Shaokun Zhang
, Yiran Wu
, Zhonghua Zheng
, Qingyun Wu
, Chi Wang
:
HyperTime: Hyperparameter Optimization for Combating Temporal Distribution Shifts. 4610-4619 - Xiaomeng Chu
, Jiajun Deng
, Guoliang You
, Yifan Duan
, Yao Li
, Yanyong Zhang
:
RayFormer: Improving Query-Based Multi-Camera 3D Object Detection via Ray-Centric Strategies. 4620-4629 - Yi Bin
, Junrong Liao
, Yujuan Ding
, Haoxuan Li
, Yang Yang
, See-Kiong Ng
, Heng Tao Shen
:
Leveraging Weak Cross-Modal Guidance for Coherence Modelling via Iterative Learning. 4630-4639 - Chengyou Jia
, Minnan Luo
, Xiaojun Chang
, Zhuohang Dang
, Mingfei Han
, Mengmeng Wang
, Guang Dai
, Sizhe Dang
, Jingdong Wang
:
Generating Action-conditioned Prompts for Open-vocabulary Video Action Recognition. 4640-4649 - Jialu Zhang
, Xinyi Wang
, Chenglin Yao
, Jianfeng Ren
, Xudong Jiang
:
Visual-linguistic Cross-domain Feature Learning with Group Attention and Gamma-correct Gated Fusion for Extracting Commonsense Knowledge. 4650-4659 - Wenhan Wu
, Ce Zheng
, Zihao Yang
, Chen Chen
, Srijan Das
, Aidong Lu
:
Frequency Guidance Matters: Skeletal Action Recognition by Frequency-Aware Mixed Transformer. 4660-4669 - Xianwei Zhuang
, Xuxin Cheng
, Zhihong Zhu
, Zhanpeng Chen
, Hongxiang Li
, Yuexian Zou
:
Towards Multimodal-augmented Pre-trained Language Models via Self-balanced Expectation-Maximization Iteration. 4670-4679 - Hongze Zhu
, Guoyang Xie
, Chengbin Hou
, Tao Dai
, Can Gao
, Jinbao Wang
, Linlin Shen
:
Towards High-resolution 3D Anomaly Detection via Group-Level Feature Contrastive Learning. 4680-4689 - Kaixiang Wang
, Xiaojian Ding
, Fan Yang
:
Non-Overlapped Multi-View Weak-Label Learning Guided by Multiple Correlations. 4690-4698 - Xin Mei
, Rui Mao
, Xiaoyan Cai
, Libin Yang
, Erik Cambria
:
Medical Report Generation via Multimodal Spatio-Temporal Fusion. 4699-4708 - Guofan Fan
, Zekun Qi
, Wenkai Shi
, Kaisheng Ma
:
Point-GCC: Universal Self-supervised 3D Scene Pre-training via Geometry-Color Contrast. 4709-4718 - Menghao Zhang
, Jingyu Wang
, Qi Qi
, Pengfei Ren
, Haifeng Sun
, Zirui Zhuang
, Huazheng Wang
, Lei Zhang
, Jianxin Liao
:
Video Anomaly Detection via Progressive Learning of Multiple Proxy Tasks. 4719-4728 - Xingyu Zhang
, Siyu Zhao
, Zeen Song
, Huijie Guo
, Jianqi Zhang
, Changwen Zheng
, Wenwen Qiang
:
Not All Frequencies Are Created Equal: Towards a Dynamic Fusion of Frequencies in Time-Series Forecasting. 4729-4737 - Shijie Chen
, Junbao Zhuo
, Xin Li
, Haizhuang Liu
, Rongquan Wang
, Jiansheng Chen
, Huimin Ma
:
CMT: Co-training Mean-Teacher for Unsupervised Domain Adaptation on 3D Object Detection. 4738-4747 - Tianrui Pan
, Jie Liu
, Bohan Wang
, Jie Tang
, Gangshan Wu
:
RAVSS: Robust Audio-Visual Speech Separation in Multi-Speaker Scenarios with Missing Visual Cues. 4748-4756 - Siqi Wang
, Chao Liang
, Yunfan Gao
, Yang Liu
, Jing Li
, Haofen Wang
:
Decoding Urban Industrial Complexity: Enhancing Knowledge-Driven Insights via IndustryScopeGPT. 4757-4765 - Yuanbin Fu
, Jie Ying
, Houlei Lv
, Xiaojie Guo
:
Semi-supervised Camouflaged Object Detection from Noisy Data. 4766-4775 - Bolei Chen
, Jiaxu Kang
, Ping Zhong
, Yixiong Liang
, Yu Sheng
, Jianxin Wang
:
Embodied Contrastive Learning with Geometric Consistency and Behavioral Awareness for Object Navigation. 4776-4785 - Jia-Li Yin
, Menghao Chen
, Jin Han
, Bo-Hao Chen
, Ximeng Liu
:
Adversarial Example Quality Assessment: A Large-scale Dataset and Strong Baseline. 4786-4794 - Ye Jing
, Xinpei Zhao
:
DQ-Former: Querying Transformer with Dynamic Modality Priority for Cognitive-aligned Multimodal Emotion Recognition in Conversation. 4795-4804 - Xicong Wang
, Huiyuan Fu
, Jiaxuan Wang
, Xin Wang
, Heng Zhang
, Huadong Ma
:
Exploring in Extremely Dark: Low-Light Video Enhancement with Real Events. 4805-4813 - Qing Zhang
, Haocheng Lv
, Jie Liu
, Zhiyun Chen
, Jianyong Duan
, Hao Wang
, Li He
, Mingying Xu
:
An Entailment Tree Generation Approach for Multimodal Multi-Hop Question Answering with Mixture-of-Experts and Iterative Feedback Mechanism. 4814-4822 - Kangpeng Hu
, Quansen Sun
, Yinghui Sun
, Tao Wang
:
Interactive Segmentation by Considering First-Click Intentional Ambiguity. 4823-4831 - Leqi Shen
, Sicheng Zhao
, Yifeng Zhang
, Hui Chen
, Jundong Zhou
, Pengzhang Liu
, Yongjun Bao
, Guiguang Ding
:
Multi-Label Learning with Block Diagonal Labels. 4832-4840 - Wentao He
, Jianfeng Ren
, Ruibin Bai
, Xudong Jiang
:
Hierarchical Perceptual and Predictive Analogy-Inference Network for Abstract Visual Reasoning. 4841-4850 - Wenxi Li
, Yuchen Guo
, Jilai Zheng
, Haozhe Lin
, Chao Ma
, Lu Fang
, Xiaokang Yang
:
SparseFormer: Detecting Objects in HRW Shots via Sparse Vision Transformer. 4851-4860 - Bo Liu
, Zexin Lu
, Yan Wang
:
Towards Medical Vision-Language Contrastive Pre-training via Study-Oriented Semantic Exploration. 4861-4870 - Zihao Liu
, Xiaoyu Wu
, Shengjin Wang
, Jiayao Qian
:
Adaptively Building a Video-language Model for Video Captioning and Retrieval without Massive Video Pretraining. 4871-4880 - Wenhao Guo
, Peng Lu
, Xujun Peng
, Zhaoran Zhao
, Ji Qiu
, Xiangtao Dong
:
BCSCN: Reducing Domain Gap through Bézier Curve basis-based Sparse Coding Network for Single-Image Super-Resolution. 4881-4889 - Yi Tu
, Chong Zhang
, Ya Guo
, Huan Chen
, Jinyang Tang
, Huijia Zhu
, Qi Zhang
:
UNER: A Unified Prediction Head for Named Entity Recognition in Visually-rich Documents. 4890-4898 - Tao Ling
, Siping Shi
, Hao Wang
, Chuang Hu
, Dan Wang
:
Federated Morozov Regularization for Shortcut Learning in Privacy Preserving Learning with Watermarked Image Data. 4899-4908 - Jinfu Liu
, Chen Chen
, Mengyuan Liu
:
Multi-Modality Co-Learning for Efficient Skeleton-based Action Recognition. 4909-4918 - Zewen Du
, Zhenjiang Hu
, Guiyu Zhao
, Ying Jin
, Hongbin Ma
:
LDA-AQU: Adaptive Query-guided Upsampling via Local Deformable Attention. 4919-4927 - Shichen Lu
, Longteng Guo
, Wenxuan Wang
, Zijia Zhao
, Tongtian Yue
, Jing Liu
, Si Liu
:
Collaborative Training of Tiny-Large Vision Language Models. 4928-4937 - Xudong Zhou
, Tianxiang Chen
:
BSBP-RWKV: Background Suppression with Boundary Preservation for Efficient Medical Image Segmentation. 4938-4946 - Yuxing Zhang
, Siyuan Meng
, Chunchun Chen
, Mengyao Peng
, Hongyan Gu
, Xinli Huang
:
LinkThief: Combining Generalized Structure Knowledge with Node Similarity for Link Stealing Attack against GNN. 4947-4956 - Yeqing Shen
, Shang Li
, Kun Song
:
Restoring Real-World Degraded Events Improves Deblurring Quality. 4957-4966 - Xiao Liang
, Yanlei Zhang
, Di Wang
, Haodi Zhong
, Ronghan Li
, Quan Wang
:
Divide and Conquer: Isolating Normal-Abnormal Attributes in Knowledge Graph-Enhanced Radiology Report Generation. 4967-4975 - Zhen Wang
, Dongyuan Li
, Guang Li
, Ziqing Zhang
, Renhe Jiang
:
Multimodal Low-light Image Enhancement with Depth Information. 4976-4985 - Zishuo Wang
, Wenhao Zhou
, Jinglin Xu
, Yuxin Peng
:
SIA-OVD: Shape-Invariant Adapter for Bridging the Image-Region Gap in Open-Vocabulary Detection. 4986-4994 - Xu Han
, Yuan Tang
, Zhaoxuan Wang
, Xianzhi Li
:
Mamba3D: Enhancing Local Features for 3D Point Cloud Analysis via State Space Model. 4995-5004 - Wenqi Ren
, Ruihao Xia
, Meng Zheng
, Ziyan Wu
, Yang Tang
, Nicu Sebe
:
Cross-Class Domain Adaptive Semantic Segmentation with Visual Language Models. 5005-5014 - Xuefeng Yin
, Chenyang Zhu
, Shanglai Qu
, Yuqi Li
, Kai Xu
, Baocai Yin
, Xin Yang
:
CSO: Constraint-Guided Space Optimization for Active Scene Mapping. 5015-5024 - Luoyi Sun
, Xuenan Xu
, Mengyue Wu
, Weidi Xie
:
Auto-ACD: A Large-scale Dataset for Audio-Language Representation Learning. 5025-5034 - Xinyue Liu
, Jianyuan Wang
, Biao Leng
, Shuo Zhang
:
Dual-Modeling Decouple Distillation for Unsupervised Anomaly Detection. 5035-5044 - Huimin Ma
, Siwei Wang
, Shengju Yu
, Suyuan Liu
, Junjie Huang
, Huijun Wu
, Xinwang Liu
, En Zhu
:
Automatic and Aligned Anchor Learning Strategy for Multi-View Clustering. 5045-5054 - Shengyang Sun
, Jiashen Hua
, Junyi Feng
, Dongxu Wei
, Baisheng Lai
, Xiaojin Gong
:
TDSD: Text-Driven Scene-Decoupled Weakly Supervised Video Anomaly Detection. 5055-5064 - Yang Xin
, Yu Zhou
, Jianmin Jiang
:
RobustFace: Adaptive Mining of Noise and Hard Samples for Robust Face Recognitions. 5065-5073 - Xiang Ma
, Xuemei Li
, Lexin Fang
, Caiming Zhang
:
Bridging the Modality Gap: Dimension Information Alignment and Sparse Spatial Constraint for Image-Text Matching. 5074-5082 - Chunli Peng
, Xuan Dong
, Tiantian Cao
, Zhengqing Li
, Kun Dong
, Weixin Li
:
ReWiTe: Realistic Wide-angle and Telephoto Dual Camera Fusion Dataset via Beam Splitter Camera Rig. 5083-5091 - Yang Fang
, Xuefeng Rao
, Xinbo Gao
, Weisheng Li
, Zijian Min
:
MTSNet: Joint Feature Adaptation and Enhancement for Text-Guided Multi-view Martian Terrain Segmentation. 5092-5101 - Le Jiang
, Yan Huang
, Lianxin Xie
, Wen Xue
, Cheng Liu
, Si Wu
, Hau-San Wong
:
Hunting Blemishes: Language-guided High-fidelity Face Retouching Transformer with Limited Paired Data. 5102-5111 - Yijia Guo
, Yuanxi Bai
, Liwen Hu
, Ziyi Guo
, Mianzhi Liu
, Yu Cai
, Tiejun Huang
, Lei Ma
:
PRTGS: Precomputed Radiance Transfer of Gaussian Splats for Real-Time High-Quality Relighting. 5112-5120 - Mingcan Xiang
, Jiaxun Tang
, Qizheng Yang
, Hui Guan
, Tongping Liu
:
AdapMTL: Adaptive Pruning Framework for Multitask Learning Model. 5121-5130 - Xinwei Zhang
, Aishan Liu
, Tianyuan Zhang
, Siyuan Liang
, Xianglong Liu
:
Towards Robust Physical-world Backdoor Attacks on Lane Detection. 5131-5140 - Longtao Jiang
, Min Wang
, Zecheng Li
, Yao Fang
, Wengang Zhou
, Houqiang Li
:
SEDS: Semantically Enhanced Dual-Stream Encoder for Sign Language Retrieval. 5141-5150 - Pinxue Guo
, Wanyun Li
, Hao Huang
, Lingyi Hong
, Xinyu Zhou
, Zhaoyu Chen
, Jinglun Li
, Kaixun Jiang
, Wei Zhang
, Wenqiang Zhang
:
X-Prompt: Multi-modal Visual Prompt for Video Object Segmentation. 5151-5160 - Ling Huang
, Wenqian Dong
, Song Xiao
, Jiahui Qu
, Yuanbo Yang
, Yunsong Li
:
Language-Guided Visual Prompt Compensation for Multi-Modal Remote Sensing Image Classification with Modality Absence. 5161-5170 - Zening Lin
, Jiapeng Wang
, Teng Li
, Wenhui Liao
, Dayi Huang
, Longfei Xiong
, Lianwen Jin
:
PEneo: Unifying Line Extraction, Line Grouping, and Entity Linking for End-to-end Document Pair Extraction. 5171-5180 - Haojian Huang
, Xiaozhen Qiao
, Zhuo Chen
, Haodong Chen
, Bingyu Li
, Zhe Sun
, Mulin Chen
, Xuelong Li
:
CREST: Cross-modal Resonance through Evidential Deep Learning for Enhanced Zero-Shot Learning. 5181-5190 - Shuai Zhao
, Yongkun Du
, Zhineng Chen
, Yu-Gang Jiang
:
Decoder Pre-Training with only Text for Scene Text Recognition. 5191-5200 - Naibo Wang
, Yuchen Deng
, Wenjie Feng
, Shichen Fan
, Jianwei Yin
, See-Kiong Ng
:
One-Shot Sequential Federated Learning for Non-IID Data by Enhancing Local Model Diversity. 5201-5210 - Wendong Huang
, Jinwu Hu
, Xiuli Bi
, Bin Xiao
:
Anatomical Prior Guided Spatial Contrastive Learning for Few-Shot Medical Image Segmentation. 5211-5220 - Libo Long
, Xiao Hu
, Jochen Lang
:
Learning to Handle Large Obstructions in Video Frame Interpolation. 5221-5229 - Hefei Huang
, Xu Jia
, Xinyu Zhang
, Shengming Li
, Huchuan Lu
:
Event-Guided Rolling Shutter Correction with Time-Aware Cross-Attentions. 5230-5239