


32nd MM 2024: Melbourne, VIC, Australia
- Jianfei Cai, Mohan S. Kankanhalli, Balakrishnan Prabhakaran, Susanne Boll, Ramanathan Subramanian, Liang Zheng, Vivek K. Singh, Pablo César, Lexing Xie, Dong Xu:
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024 - 1 November 2024. ACM 2024, ISBN 979-8-4007-0686-8
Keynote Talks
- Pascale Fung:
From Assistants to Agents in the LLM Era. 1
- Benoit Huet:
Revolutionizing Lung Cancer Diagnostics with eyonis TM LCS: Cutting-edge AI/ML Technology-based SaMD for Enhanced Patient Care. 2-3
- Judy Kay:
Empowering People to Harness and Control their Multimodal Data in Scrutable User models. 4-5
- Jiebo Luo:
Large Multimodal Models as Social Multimedia Analysis Engines. 6-7
Oral Session 1: Large Language Models & Applications 1
- Haicheng Liao, Yongkang Li, Chengyue Wang, Yanchen Guan, Kahou Tam, Chunlin Tian, Li Li, Chengzhong Xu, Zhenning Li:
When, Where, and What? A Benchmark for Accident Anticipation and Localization with Large Language Models. 8-17
- Haonan Zheng, Xinyang Deng, Wen Jiang, Wenrui Li:
A Unified Understanding of Adversarial Vulnerability Regarding Unimodal Models and Vision-Language Pre-training Models. 18-27
- Xiang Fang, Wanlong Fang, Daizong Liu, Xiaoye Qu, Jianfeng Dong, Pan Zhou, Renfu Li, Zichuan Xu, Lixing Chen, Panpan Zheng, Yu Cheng:
Not All Inputs Are Valid: Towards Open-Set Video Moment Retrieval using Language. 28-37
- Huishan Ji, Qingyi Si, Zheng Lin, Weiping Wang:
Towards Flexible Evaluation for Generative Visual Question Answering. 38-47
- Jiaqi Zhu, Shaofeng Cai, Fang Deng, Beng Chin Ooi, Junran Wu:
Do LLMs Understand Visual Anomalies? Uncovering LLM's Capabilities in Zero-shot Anomaly Detection. 48-57
- Yudong Li, Xianxu Hou, Dezhi Zheng, Linlin Shen, Zhe Zhao:
FLIP-80M: 80 Million Visual-Linguistic Pairs for Facial Language-Image Pre-Training. 58-67
Oral Session 2: Large Language Models & Applications 2
- Esmée Henrieke Anne de Haas, Lik-Hang Lee, Yiming Huang, Carlos Bermejo, Pan Hui, Zijun Lin:
Towards Trustworthy MetaShopping: Studying Manipulative Audiovisual Designs in Virtual-Physical Commercial Platforms. 68-77
- Weiqi Li, Shijie Zhao, Bin Chen, Xinhua Cheng, Junlin Li, Li Zhang, Jian Zhang:
ResVR: Joint Rescaling and Viewport Rendering of Omnidirectional Images. 78-87
- Yunqiang Pei, Kaiyue Zhang, Hongrong Yang, Yong Tao, Qihang Tang, Jialei Tang, Guoqing Wang, Zhitao Liu, Ning Xie, Peng Wang, Yang Yang, Hengtao Shen:
Improving Interaction Comfort in Authoring Task in AR-HRI through Dynamic Dual-Layer Interaction Adjustment. 88-97
- Yang Lu, Junxian Li, Zhitong Cui, Jiapeng Hu, Yanna Lin, Shijian Luo:
Designing Spatial Visualization and Interactions of Immersive Sankey Diagram in Virtual Reality. 98-107
- Zhang Wan, Sheng Tang, Jiawei Wei, Ruize Zhang, Juan Cao:
DragEntity: Trajectory Guided Video Generation using Entity and Positional Relationships. 108-116
- Kento Shigyo, Yifan Cao, Kentaro Takahira, Mingming Fan, Huamin Qu:
VR-Mediated Cognitive Defusion: A Comparative Study for Managing Negative Thoughts. 117-126
Oral Session 3: Novel Multimedia Applications 1
- Yinxuan Gui, Bin Zhu, Jingjing Chen, Chong Wah Ngo, Yu-Gang Jiang:
Navigating Weight Prediction with Diet Diary. 127-136
- Feiyu Chen, Cong Xu, Qi Jia, Yihua Wang, Yuhan Liu, Haotian Zhang, Endong Wang:
Egocentric Vehicle Dense Video Captioning. 137-146
- Jinyue Chen, Lingyu Kong, Haoran Wei, Chenglong Liu, Zheng Ge, Liang Zhao, Jianjian Sun, Chunrui Han, Xiangyu Zhang:
OneChart: Purify the Chart Structural Extraction via One Auxiliary Token. 147-155
- Jiawei Lin, Zhaoyun Jiang, Jiaqi Guo, Shizhao Sun, Ting Liu, Zijiang Yang, Jian-Guang Lou, Dongmei Zhang:
IconDM: Text-Guided Icon Set Expansion Using Diffusion Models. 156-165
- Haipeng Zhou, Hongqiu Wang, Tian Ye, Zhaohu Xing, Jun Ma, Ping Li, Qiong Wang, Lei Zhu:
Timeline and Boundary Guided Diffusion Network for Video Shadow Detection. 166-175
- Yichang Qu, Bing Li, Jie Huang, Feng Zhao:
Training Pansharpening Networks at Full Resolution Using Degenerate Invariance. 176-185
Oral Session 4: Graph and Diffusion Models
- Jielong Lu, Zhihao Wu, Zhaoliang Chen, Zhiling Cai, Shiping Wang:
Towards Multi-view Consistent Graph Diffusion. 186-195
- Liyuan Ma, Xueji Fang, Guo-Jun Qi:
Equilibrated Diffusion: Frequency-aware Textual Embedding for Equilibrated Image Customization. 196-204
- Weilun Feng, Chuanguang Yang, Zhulin An, Libo Huang, Boyu Diao, Fei Wang, Yongjun Xu:
Relational Diffusion Distillation for Efficient Image Generation. 205-213
- Hongjie Wu, Linchao He, Mingqin Zhang, Dongdong Chen, Kunming Luo, Mengting Luo, Jizhe Zhou, Hu Chen, Jiancheng Lv:
Diffusion Posterior Proximal Sampling for Image Restoration. 214-223
- Yiheng Huang, Hui Yang, Chuanchen Luo, Yuxi Wang, Shibiao Xu, Zhaoxiang Zhang, Man Zhang, Junran Peng:
StableMoFusion: Towards Robust and Efficient Diffusion-based Motion Generation Framework. 224-232
- Yichi Zhang, Zhuo Chen, Lingbing Guo, Yajing Xu, Wen Zhang, Huajun Chen:
Making Large Language Models Perform Better in Knowledge Graph Completion. 233-242
Oral Session 5: Multimodal Models and Applications
- Rishikesh Devanathan, Apoorva Singh, A. S. Poornash, Sriparna Saha:
Seeing Beyond Words: Multimodal Aspect-Level Complaint Detection in Ecommerce Videos. 243-252
- Hsiang-Hui Hung, Huu-Phu Do, Yung-Hui Li, Ching-Chun Huang:
TimeNeRF: Building Generalizable Neural Radiance Fields across Time from Few-Shot Input Views. 253-262
- Xiaoxuan Shen, Fenghua Yu, Yaqi Liu, Ruxia Liang, Qian Wan, Kai Yang, Jianwen Sun:
Revisiting Knowledge Tracing: A Simple and Powerful Model. 263-272
- Peiming Li, Ziyi Wang, Mengyuan Liu, Hong Liu, Chen Chen:
ClickDiff: Click to Induce Semantic Contact Map for Controllable Grasp Generation with Diffusion Models. 273-281
- Bochao Liu, Pengju Wang, Weijia Guo, Yong Li, Liansheng Zhuang, Weiping Wang, Shiming Ge:
Private Gradient Estimation is Useful for Generative Modeling. 282-290
- Ke Zhu, Liang Zhao, Zheng Ge, Xiangyu Zhang:
Self-Supervised Visual Preference Alignment. 291-300
Oral Session 6: Innovations in Medical Imaging and Physiological Measurement
- Yuxin Hong, Xiao Zhang, Xin Zhang, Joey Tianyi Zhou:
Evolution-aware VAriance (EVA) Coreset Selection for Medical Image Classification. 301-310
- Ruiqi Wang, Jinyang Huang, Jie Zhang, Xin Liu, Xiang Zhang, Zhi Liu, Peng Zhao, Sigui Chen, Xiao Sun:
FacialPulse: An Efficient RNN-based Depression Detection via Temporal Facial Landmarks. 311-320
- Wei Zhang, En Zhu, Juan Chen, Yunpeng Li:
MDDR: Multi-modal Dual-Attention aggregation for Depression Recognition. 321-329
- Wei Qian, Kun Li, Dan Guo, Bin Hu, Meng Wang:
Cluster-Phys: Facial Clues Clustering Towards Efficient Remote Physiological Measurement. 330-339
- Zhenxi Song, Ruihan Qin, Huixia Ren, Zhen Liang, Yi Guo, Min Zhang, Zhiguo Zhang:
EEG-MACS: Manifold Attention and Confidence Stratification for EEG-based Cross-Center Brain Disease Diagnosis under Unreliable Annotations. 340-349
- Xueyuan Xu, Li Zhuo, Jinxin Lu, Xia Wu:
WSEL: EEG Feature Selection with Weighted Self-expression Learning for Incomplete Multi-dimensional Emotion Recognition. 350-359
Oral Session 7: Imaging, Computer Vision & Graphics
- Yuanbo Wen, Tao Gao, Ting Chen:
Unpaired Photo-realistic Image Deraining with Energy-informed Diffusion Model. 360-369
- Zeyu Li, Ruitong Gan, Chuanchen Luo, Yuxi Wang, Jiaheng Liu, Ziwei Zhu, Qing Li, Xucheng Yin, Man Zhang, Zhaoxiang Zhang, Junran Peng:
MaterialSeg3D: Segmenting Dense Materials from 2D Priors for 3D Assets. 370-379
- Xiao Han, Yiming Ren, Peishan Cong, Yujing Sun, Jingya Wang, Lan Xu, Yuexin Ma:
Gait Recognition in Large-scale Free Environment via Single LiDAR. 380-389
- Tang Tao, Longfei Gao, Guangrun Wang, Yixing Lao, Peng Chen, Hengshuang Zhao, Dayang Hao, Xiaodan Liang, Mathieu Salzmann, Kaicheng Yu:
LiDAR-NeRF: Novel LiDAR View Synthesis via Neural Radiance Fields. 390-398
- Mu Chen, Zhedong Zheng, Yi Yang:
Transferring to Real-World Layouts: A Depth-aware Framework for Scene Adaptation. 399-408
- Yujian Mo, Yan Wu, Junqiao Zhao, Zhenjie Hou, Weiquan Huang, Yinghao Hu, Jijun Wang, Jun Yan:
Sparse Query Dense: Enhancing 3D Object Detection with Pseudo Points. 409-418
Oral Session 8: Multimodal Reasoning & Inference
- Changmeng Zheng, Dayong Liang, Wengyu Zhang, Xiaoyong Wei, Tat-Seng Chua, Qing Li:
A Picture Is Worth a Graph: A Blueprint Debate Paradigm for Multimodal Reasoning. 419-428
- Qian Guo, Xinyan Liang, Yuhua Qian, Zhihua Cui, Jie Wen:
A Progressive Skip Reasoning Fusion Method for Multi-Modal Classification. 429-437
- Wenxin Xu, Hexin Jiang, Xuefeng Liang:
Leveraging Knowledge of Modality Experts for Incomplete Multimodal Learning. 438-446
- Bo Xu, Junzhe Zheng, Jiayuan He, Yuxuan Sun, Hongfei Lin, Liang Zhao, Feng Xia:
Generating Multimodal Metaphorical Features for Meme Understanding. 447-455
- Junjie Shi, Caozhi Shang, Zhaobin Sun, Li Yu, Xin Yang, Zengqiang Yan:
PASSION: Towards Effective Incomplete Multi-Modal Medical Image Segmentation with Imbalanced Missing Rates. 456-465
- Mengze Li, Kairong Han, Jiahe Xu, Yueying Li, Tao Wu, Zhou Zhao, Jiaxu Miao, Shengyu Zhang, Jingyuan Chen:
Cross-modal Observation Hypothesis Inference. 466-475
Oral Session 9: Image, Video, and Multimedia Processing
- Jiyang Li, Lechao Cheng, Zhangye Wang, Tingting Mu, Jingxuan He:
LoopGaussian: Creating 3D Cinemagraph with Multi-view Images via Eulerian Motion Field. 476-485
- Chaofeng Chen, Sensen Yang, Haoning Wu, Liang Liao, Zicheng Zhang, Annan Wang, Wenxiu Sun, Qiong Yan, Weisi Lin:
Q-Ground: Image Quality Grounding with Large Multi-modality Models. 486-495
- Cheng Ye, Weidong Chen, Jingyu Li, Lei Zhang, Zhendong Mao:
Dual-path Collaborative Generation Network for Emotional Video Captioning. 496-505
- Hu Lin, Chengjiang Long, Yifeng Fei, Qianchen Xia, Erwei Yin, Baocai Yin, Xin Yang:
Exploring Matching Rates: From Keypoint Selection to Camera Relocalization. 506-514
- Zhihong Zhu, Xuxin Cheng, Zhaorun Chen, Yuyan Chen, Yunyan Zhang, Xian Wu, Yefeng Zheng, Bowen Xing:
InMu-Net: Advancing Multi-modal Intent Detection via Information Bottleneck and Multi-sensory Processing. 515-524
- Chaoya Jiang, Hongrui Jia, Mengfan Dong, Wei Ye, Haiyang Xu, Ming Yan, Ji Zhang, Shikun Zhang:
Hal-Eval: A Universal and Fine-grained Hallucination Evaluation Framework for Large Vision Language Models. 525-534
Oral Session 10: Speech and Audio in Multimedia Processing
- Zhongxu Wang, Yujia Wang, Mingzhu Li, Hua Huang:
ArtSpeech: Adaptive Text-to-Speech Synthesis with Articulatory Representations. 535-544
- Shuai Yu, Xiaoliang He, Ke Chen, Yi Yu:
HKDSME: Heterogeneous Knowledge Distillation for Semi-supervised Singing Melody Extraction Using Harmonic Supervision. 545-553
- Yixuan Zhou, Xiaoyu Qin, Zeyu Jin, Shuoyi Zhou, Shun Lei, Songtao Zhou, Zhiyong Wu, Jia Jia:
VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling. 554-563
- Navonil Majumder, Chia-Yu Hung, Deepanway Ghosal, Wei-Ning Hsu, Rada Mihalcea, Soujanya Poria:
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization. 564-572
- Xihua Wang, Yuyue Wang, Yihan Wu, Ruihua Song, Xu Tan, Zehua Chen, Hongteng Xu, Guodong Sui:
TiVA: Time-Aligned Video-to-Audio Generation. 573-582
- Alejandro Galán-Cuenca, Jose J. Valero-Mas, Juan C. Martinez-Sevilla, Antonio Hidalgo-Centeno, Antonio Pertusa, Jorge Calvo-Zaragoza:
MUSCAT: A Multimodal mUSic Collection for Automatic Transcription of Real Recordings and Image Scores. 583-591
Oral Session 11: Emotion & Sentiment
- Jianing Zhao, Jingjing Wang, Yujie Jin, Jiamin Luo, Guodong Zhou:
Hawkeye: Discovering and Grounding Implicit Anomalous Sentiment in Recon-videos via Scene-enhanced Video Large Language Model. 592-601
- Daiqing Wu, Dongbao Yang, Yu Zhou, Can Ma:
Bridging Visual Affective Gap: Borrowing Textual Knowledge by Learning from Noisy Image-Text Pairs. 602-611
- Tan Yu, Jingjing Wang, Jiawen Wang, Jiamin Luo, Guodong Zhou:
Towards Emotion-enriched Text-to-Motion Generation via LLM-guided Limb-level Emotion Manipulating. 612-621
- Wenjie Zheng, Jianfei Yu, Rui Xia:
A Unimodal Valence-Arousal Driven Contrastive Learning Framework for Multimodal Multi-Label Emotion Recognition. 622-631
- Xinji Mai, Junxiong Lin, Haoran Wang, Zeng Tao, Yan Wang, Shaoqi Yan, Xuan Tong, Jiawen Yu, Boyang Wang, Ziheng Zhou, Qing Zhao, Shuyong Gao, Wenqiang Zhang:
All rivers run into the sea: Unified Modality Brain-Inspired Emotional Central Mechanism. 632-641
- Xin Li, Shangfei Wang, Xuandong Huang:
Temporal Enhancement for Video Affective Content Analysis. 642-650
Poster Session 1
- Pei He, Licheng Jiao, Lingling Li, Xu Liu, Fang Liu, Wenping Ma, Shuyuan Yang, Ronghua Shang:
Domain Generalization-Aware Uncertainty Introspective Learning for 3D Point Clouds Segmentation. 651-660
- Yi Ma, Peiqi Duan, Yuchen Hong, Chu Zhou, Yu Zhang, Jimmy S. J. Ren, Boxin Shi:
Color4E: Event Demosaicing for Full-color Event Guided Image Deblurring. 661-670
- Jiajie Zhu, Xia Du, Jizhe Zhou, Chi-Man Pun, Qizhen Xu, Xiaoyuan Liu:
DP-RAE: A Dual-Phase Merging Reversible Adversarial Example for Image Privacy Protection. 671-680
- Xinyi Zhang, Qinpeng Cui, Qiqi Bao, Wenming Yang, Qingmin Liao:
Geometry-Guided Diffusion Model with Masked Transformer for Robust Multi-View 3D Human Pose Estimation. 681-690
- Meiqi Cao, Rui Yan, Xiangbo Shu, Guangzhao Dai, Yazhou Yao, Guo-Sen Xie:
AdaFPP: Adapt-Focused Bi-Propagating Prototype Learning for Panoramic Activity Recognition. 691-700
- Junsheng Wang, Tiantian Gong, Yan Yan:
Partially Aligned Cross-modal Retrieval via Optimal Transport-based Prototype Alignment Learning. 701-709
- Hu Gao, Bowen Ma, Ying Zhang, Jingfan Yang, Jing Yang, Depeng Dang:
Learning Enriched Features via Selective State Spaces Model for Efficient Image Deblurring. 710-718
- Hangjun Che, Xinyu Pu, Deqiang Ouyang, Beibei Li:
Enhanced Tensorial Self-representation Subspace Learning for Incomplete Multi-view Clustering. 719-728
- Jian-Jun Qiao, Meng-Yu Duan, Xiao Wu, Yu-Pei Song:
CartoonNet: Cartoon Parsing with Semantic Consistency and Structure Correlation. 729-737
- Qianyu Guo, Jieji Ren, Haofen Wang, Tianxing Wu, Weifeng Ge, Wenqiang Zhang:
Visual-Language Collaborative Representation Network for Broad-Domain Few-Shot Image Classification. 738-747
- Wenzhuo Xu, Kai Chen, Ziyi Gao, Zhipeng Wei, Jingjing Chen, Yu-Gang Jiang:
Highly Transferable Diffusion-based Unrestricted Adversarial Attack on Pre-trained Vision-Language Models. 748-757
- Hongzhi Wang, Xiubo Liang, Tao Zhang, Yue Gu, Weidong Geng:
PSSD-Transformer: Powerful Sparse Spike-Driven Transformer for Image Semantic Segmentation. 758-767
- Zengsheng Kuang, Changxing Ding, Huan Yao:
Learning Context with Priors for 3D Interacting Hand-Object Pose Estimation. 768-777
- Yang Chen, Jingcai Guo, Tian He, Xiaocheng Lu, Ling Wang:
Fine-Grained Side Information Guided Dual-Prompts for Zero-Shot Skeleton Action Recognition. 778-786
- Shuo Zhang, Yupeng Zhai, Jilin Mei, Yu Hu:
FusionOcc: Multi-Modal Fusion for 3D Occupancy Prediction. 787-796
- Shaokun Wang, Yifan Yu, Yuhang He, Yihong Gong:
Enhancing Pre-trained ViTs for Downstream Task Adaptation: A Locality-Aware Prompt Learning Method. 797-806
- Fangming Cui, Xun Yang, Chao Wu, Liang Xiao, Xinmei Tian:
Advancing Prompt Learning through an External Layer. 807-816
- Hanzi Wang, Jiamin Ren, Yifeng Ding, Lei Ren, Huixing Jiang, Wei Chen, Fangxiang Feng, Xiaojie Wang:
Q-MoE: Connector for MLLMs with Text-Driven Routing. 817-825
- Guozhen Peng, Yunhong Wang, Yuwei Zhao, Shaoxiong Zhang, Annan Li:
GLGait: A Global-Local Temporal Receptive Field Network for Gait Recognition in the Wild. 826-835
- Qiang Wang, Yuning Cui, Yawen Li, Yaping Ruan, Ben Zhu, Wenqi Ren:
RFFNet: Towards Robust and Flexible Fusion for Low-Light Image Denoising. 836-845
- Minghe Gao, Shuang Chen, Liang Pang, Yuan Yao, Jisheng Dang, Wenqiao Zhang, Juncheng Li, Siliang Tang, Yueting Zhuang, Tat-Seng Chua:
Fact: Teaching MLLMs with Faithful, Concise and Transferable Rationales. 846-855
- Yue Zhang, Parisa Kordjamshidi:
Narrowing the Gap between Vision and Action in Navigation. 856-865
- Zequn Zeng, Jianqiao Sun, Hao Zhang, Tiansheng Wen, Yudi Su, Yan Xie, Zhengjue Wang, Bo Chen:
HICEScore: A Hierarchical Metric for Image Captioning Evaluation. 866-875
- Chen Feng, Georgios Tzimiropoulos, Ioannis Patras:
CLIPCleaner: Cleaning Noisy Labels with CLIP. 876-885
- Haochen Zhao, Hui Meng, Deqian Yang, Xiaozheng Xie, Xiaoze Wu, Qingfeng Li, Jianwei Niu:
GuidedNet: Semi-Supervised Multi-Organ Segmentation via Labeled Data Guide Unlabeled Data. 886-895
- Kin-Chung Chan, Jun Xiao, Hana Lebeta Goshu, Kin-Man Lam:
Point Cloud Densification for 3D Gaussian Splatting from Sparse Input Views. 896-904
- Xiaorui Huang, Gen Luo, Chaoyang Zhu, Bo Tong, Yiyi Zhou, Xiaoshuai Sun, Rongrong Ji:
Deep Instruction Tuning for Segment Anything Model. 905-914
- Ziyi Wang, Yiming Rong, Deyang Jiang, Haoran Wu, Shiyu Zhou, Bo Xu:
CIEASR: Contextual Image-Enhanced Automatic Speech Recognition for Improved Homophone Discrimination. 915-924
- Jinxu Zhang, Yongqi Yu, Yu Zhang:
CREAM: Coarse-to-Fine Retrieval and Multi-modal Efficient Tuning for Document VQA. 925-934
- Hebaixu Wang, Hao Zhang, Xunpeng Yi, Xinyu Xiang, Leyuan Fang, Jiayi Ma:
TeRF: Text-driven and Region-aware Flexible Visible and Infrared Image Fusion. 935-944
- Ruonan Zhang, Ziwei Shang, Fengjuan Wang, Zhaoqilin Yang, Shan Cao, Yigang Cen, Gaoyun An:
Synergetic Prototype Learning Network for Unbiased Scene Graph Generation. 945-954
- Jiawei Zhu, Yishu Liu, Huanjia Zhu, Hui Lin, Yuncheng Jiang, Zheng Zhang, Bingzhi Chen:
Combating Visual Question Answering Hallucinations via Robust Multi-Space Co-Debias Learning. 955-964
- Qian Cao, Xu Chen, Ruihua Song, Xiting Wang, Xinting Huang, Yuchen Ren:
See or Guess: Counterfactually Regularized Image Captioning. 965-974
- Shuai Li, Fan Qi, Zixin Zhang, Changsheng Xu:
Cross-Modal Meta Consensus for Heterogeneous Federated Learning. 975-984
- Xiang He, Xiangxi Liu, Yang Li, Dongcheng Zhao, Guobin Shen, Qingqun Kong, Xin Yang, Yi Zeng:
CACE-Net: Co-guidance Attention and Contrastive Enhancement for Effective Audio-Visual Event Localization. 985-993
- Jiabao Guo, Huan Liu, Yizhi Luo, Xueli Hu, Hang Zou, Yuan Zhang, Hui Liu, Bo Zhao:
Style-conditional Prompt Token Learning for Generalizable Face Anti-spoofing. 994-1003
- Bowen Chen, Yun Sing Koh, Gillian Dobbie:
SSAT-Adapter: Enhancing Vision-Language Model Few-shot Learning with Auxiliary Tasks. 1004-1013
- Haoyu Tong, Xiaoyu Zhang, Yulin Jin, Jian Lou, Kai Wu, Xiaofeng Chen:
Balancing Generalization and Robustness in Adversarial Training via Steering through Clean and Adversarial Gradient Directions. 1014-1023
- Shuo Zheng, Yuanjie Dang, Peng Chen, Ruohong Huan, Dongdong Zhao, Ronghua Liang:
Saliency-Guided Fine-Grained Temporal Mask Learning for Few-Shot Action Recognition. 1024-1033
- Mengyin Liu, Chao Zhu, Shiqi Ren, Xu-Cheng Yin:
Unsupervised Multi-view Pedestrian Detection. 1034-1042
- Zhilin Huang, Yijie Yu, Ling Yang, Chujun Qin, Bing Zheng, Xiawu Zheng, Zikun Zhou, Yaowei Wang, Wenming Yang:
Motion-aware Latent Diffusion Models for Video Frame Interpolation. 1043-1052
- Zongxin Ye, Wenyu Li, Sidun Liu, Peng Qiao, Yong Dou:
AbsGS: Recovering Fine Details in 3D Gaussian Splatting. 1053-1061
- Ziming Wang, Boxiang Zhang, Ming Ma, Yue Wang, Taoli Du, Wenhui Li:
Multi-fineness Boundaries and the Shifted Ensemble-aware Encoding for Point Cloud Semantic Segmentation. 1062-1071
- Yubo Wang, Chaohu Liu, Yanqiu Qu, Haoyu Cao, Deqiang Jiang, Linli Xu:
Break the Visual Perception: Adversarial Attacks Targeting Encoded Visual Tokens of Large Vision-Language Models. 1072-1081
- Wenhao Li, Qiangchang Wang, Peng Zhao, Yilong Yin:
KNN Transformer with Pyramid Prompts for Few-Shot Learning. 1082-1091
- Lu Zhang, Ke Yan, Shouhong Ding:
AlignCLIP: Align Multi Domains of Texts Input for CLIP models with Object-IoU Loss. 1092-1100
- Pengfei Yue, Jianghang Lin, Shengchuan Zhang, Jie Hu, Yilin Lu, Hongwei Niu, Haixin Ding, Yan Zhang, Guannan Jiang, Liujuan Cao, Rongrong Ji:
Adaptive Selection based Referring Image Segmentation. 1101-1110
- Shanshan Wang, ALuSi, Xun Yang, Ke Xu, Huibin Tan, Xingyi Zhang:
Dual-stream Feature Augmentation for Domain Generalization. 1111-1119
- Yang Liu, Xiang Huang, Minghan Qin, Qinwei Lin, Haoqian Wang:
Animatable 3D Gaussian: Fast and High-Quality Reconstruction of Multiple Human Avatars. 1120-1129
- Wei Feng, Dongyuan Wei, Qianqian Wang, Bo Dong, Quanxue Gao:
Multi-View Clustering Based on Deep Non-negative Tensor Factorization. 1130-1138
- Aoqi Li, Saihui Hou, Chenye Wang, Qingyuan Cai, Yongzhen Huang:
AerialGait: Bridging Aerial and Ground Views for Gait Recognition. 1139-1147
- Zefan Zhang, Weiqi Zhang, Yanhui Li, Tian Bai:
Caption-Aware Multimodal Relation Extraction with Mutual Information Maximization. 1148-1157
- Xiaochen Li, Jian Cheng, Ziying Xia, Zichong Chen, Junhao Shi, Zhicheng Dong, Nyima Tashi:
TS-ILM: Class Incremental Learning for Online Action Detection. 1158-1167
- Yuxiang Cai, Yongheng Shang, Jianwei Yin:
MultiDAN: Unsupervised, Multistage, Multisource and Multitarget Domain Adaptation for Semantic Segmentation of Remote Sensing Images. 1168-1177
- Yu Tong, Weihai Lu, Zhe Zhao, Song Lai, Tong Shi:
MMDFND: Multi-modal Multi-Domain Fake News Detection. 1178-1186
- Minghang Zheng, Jiahua Zhang, Qingchao Chen, Yuxin Peng, Yang Liu:
ResVG: Enhancing Relation and Semantic Understanding in Multiple Instances for Visual Grounding. 1187-1196
- Shilong Jia, Tingting Wu, Yingying Fang, Tieyong Zeng, Guixu Zhang, Zhi Li:
Purified Distillation: Bridging Domain Shift and Category Gap in Incremental Object Detection. 1197-1205
- Haonan Zhang, Pengpeng Zeng, Lianli Gao, Jingkuan Song, Heng Tao Shen:
MPT: Multi-grained Prompt Tuning for Text-Video Retrieval. 1206-1214
- Ziwei Zheng, Zechuan Zhang, Yulin Wang, Shiji Song, Gao Huang, Le Yang:
Rethinking the Architecture Design for Efficient Generic Event Boundary Detection. 1215-1224
- Jinglun Li, Xinyu Zhou, Kaixun Jiang, Lingyi Hong, Pinxue Guo, Zhaoyu Chen, Weifeng Ge, Wenqiang Zhang:
TagOOD: A Novel Approach to Out-of-Distribution Detection via Vision-Language Representations and Class Center Learning. 1225-1234
- Zihan Cao, Xiao Wu, Liang-Jian Deng, Yu Zhong:
A Novel State Space Model with Local Enhancement and State Sharing for Image Fusion. 1235-1244
- Zhenyu Yang, Shengsheng Qian, Dizhan Xue, Jiahong Wu, Fan Yang, Weiming Dong, Changsheng Xu:
Semantic Editing Increment Benefits Zero-Shot Composed Image Retrieval. 1245-1254
- Zeyu Jin, Jia Jia, Qixin Wang, Kehan Li, Shuoyi Zhou, Songtao Zhou, Xiaoyu Qin, Zhiyong Wu:
SpeechCraft: A Fine-Grained Expressive Speech Dataset with Natural Language Description. 1255-1264
- Lihao Liu, Yanqi Cheng, Zhongying Deng, Shujun Wang, Dongdong Chen, Xiaowei Hu, Pietro Liò, Carola-Bibiane Schönlieb, Angelica I. Avilés-Rivero:
TrafficMOT: A Challenging Dataset for Multi-Object Tracking in Complex Traffic Scenarios. 1265-1273
- Jing Yang, Xiaowen Jiang, Yuan Gao, Laurence T. Yang, Jieming Yang:
Generalize to Fully Unseen Graphs: Learn Transferable Hyper-Relation Structures for Inductive Link Prediction. 1274-1282
- Panjun Liu, Jiacheng Li, Lizhi Wang, Zheng-Jun Zha, Zhiwei Xiong:
MLP Embedded Inverse Tone Mapping. 1283-1291
- Mingkai Lin, Wenzhong Li, Xiaobin Hong, Sanglu Lu:
Scalable Multi-Source Pre-training for Graph Neural Networks. 1292-1301
- Xiaole Zhao, Linze Li, Chengxing Xie, Xiaoming Zhang, Ting Jiang, Wenjie Lin, Shuaicheng Liu, Tianrui Li:
Efficient Single Image Super-Resolution with Entropy Attention and Receptive Field Augmentation. 1302-1310
- Minsu Kim, Jeong Hun Yeo, Se Jin Park, Hyeongseop Rha, Yong Man Ro:
Efficient Training for Multilingual Visual Speech Recognition: Pre-training with Discretized Visual Speech Representation. 1311-1320
- Shoutong Luo, Zhengxing Sun, Yi Wang, Yunhan Sun, Chendi Zhu:
LDCNet: Long-Distance Context Modeling for Large-Scale 3D Point Cloud Scene Semantic Segmentation. 1321-1330
- Yiming Cui, Liang Li, Jiehua Zhang, Chenggang Yan, Hongkui Wang, Shuai Wang, Heng Jin, Li Wu:
Stochastic Context Consistency Reasoning for Domain Adaptive Object Detection. 1331-1340
- Zhuoling Li, Yong Wang, Kaitong Li:
FewVS: A Vision-Semantics Integration Framework for Few-Shot Image Classification. 1341-1350
- Yuyan Bu, Qiang Sheng, Juan Cao, Peng Qi, Danding Wang, Jintao Li:
FakingRecipe: Detecting Fake News on Short Video Platforms from the Perspective of Creative Process. 1351-1360
- Subash Khanal, Eric Xing, Srikumar Sastry, Aayush Dhakal, Zhexiao Xiong, Adeel Ahmad, Nathan Jacobs:
PSM: Learning Probabilistic Embeddings for Multi-scale Zero-Shot Soundscape Mapping. 1361-1369
- Zizhao Wu, Haohan Li, Gongyi Chen, Zhou Yu, Xiaoling Gu, Yigang Wang:
3D Question Answering with Scene Graph Reasoning. 1370-1378
- Liang He, Hongke Wang, Zhen Wu, Jianbing Zhang, Xinyu Dai, Jiajun Chen:
Focus & Gating: A Multimodal Approach for Unveiling Relations in Noisy Social Media. 1379-1388
- Yuanchen Wu, Xiaoqiang Li, Jide Li, Kequan Yang, Pinpin Zhu, Shaohua Zhang:
DINO is Also a Semantic Guider: Exploiting Class-aware Affinity for Weakly Supervised Semantic Segmentation. 1389-1397
- Dongshuo Yin, Xueting Han, Bin Li, Hao Feng, Jing Bai:
Parameter-efficient is not Sufficient: Exploring Parameter, Memory, and Time Efficient Adapter Tuning for Dense Predictions. 1398-1406
- Rongwen Li, Haiyang Hu, Liang Du, Jiarong Chen, Bingbing Jiang, Peng Zhou:
One-Stage Fair Multi-View Spectral Clustering. 1407-1416
- Jingfan Tan, Hyunhee Park, Ying Zhang, Tao Wang, Kaihao Zhang, Xiangyu Kong, Pengwen Dai, Zikun Liu, Wenhan Luo:
Blind Face Video Restoration with Temporal Consistent Generative Prior and Degradation-Aware Prompt. 1417-1426
- Yinghui Sun, Xingfeng Li, Quansen Sun, Min-Ling Zhang, Zhenwen Ren:
Improved Weighted Tensor Schatten p-Norm for Fast Multi-view Graph Clustering. 1427-1436
- Xinjie Jiang, Chenxi Zheng, Xuemiao Xu, Bangzhen Liu, Weiying Zheng, Huaidong Zhang, Shengfeng He:
VrdONE: One-stage Video Visual Relation Detection. 1437-1446
- Chenxi Ma, Weimin Tan, Shili Zhou, Bo Yan:
Learning Cross-Spectral Prior for Image Super-Resolution. 1447-1455
- Dayu Hu, Suyuan Liu, Jun Wang, Junpu Zhang, Siwei Wang, Xingchen Hu, Xinzhong Zhu, Chang Tang, Xinwang Liu:
Reliable Attribute-missing Multi-view Clustering with Instance-level and feature-level Cooperative Imputation. 1456-1466
- Duc Dang Trung Tran, Byeongkeun Kang, Yeejin Lee:
MSTA3D: Multi-scale Twin-attention for 3D Instance Segmentation. 1467-1475
- Jingjing Hu, Dan Guo, Kun Li, Zhan Si, Xun Yang, Meng Wang:
Maskable Retentive Network for Video Moment Retrieval. 1476-1485
- Junming Hou, Zihan Cao, Naishan Zheng, Xuan Li, Xiaoyu Chen, Xinyang Liu, Xiaofeng Cong, Danfeng Hong, Man Zhou:
Linearly-evolved Transformer for Pan-sharpening. 1486-1494
- Zhenhao Yang, Xin Liu, Deqiang Ouyang, Guiduo Duan, Dongyang Zhang, Tao He, Yuan-Fang Li:
Towards Open-vocabulary HOI Detection with Calibrated Vision-language Models and Locality-aware Queries. 1495-1504
- Kang Zeng, Hao Shi, Jiacheng Lin, Siyu Li, Jintao Cheng, Kaiwei Wang, Zhiyong Li, Kailun Yang:
MambaMOS: LiDAR-based 3D Moving Object Segmentation with Motion-aware State Space Model. 1505-1513
- Tao Tang, Hong Liu, Yingxuan You, Ti Wang, Wenhao Li:
ARTS: Semi-Analytical Regressor using Disentangled Skeletal Representations for Human Mesh Recovery from Videos. 1514-1523
- Xudong Lu, Yuqi Jiang, Haiwen Hong, Qi Sun, Cheng Zhuo:
DCAFuse: Dual-Branch Diffusion-CNN Complementary Feature Aggregation Network for Multi-Modality Image Fusion. 1524-1533
- Wenbin Zou, Hongxia Gao, Weipeng Yang, Tongtong Liu:
Wave-Mamba: Wavelet State Space Model for Ultra-High-Definition Low-Light Image Enhancement. 1534-1543
- Junwei He, Qianqian Xu, Yangbangyan Jiang, Zitai Wang, Yuchen Sun, Qingming Huang:
HGOE: Hybrid External and Internal Graph Outlier Exposure for Graph Out-of-Distribution Detection. 1544-1553
- Ke Liang, Lingyuan Meng, Yue Liu, Meng Liu, Wei Wei, Suyuan Liu, Wenxuan Tu, Siwei Wang, Sihang Zhou, Xinwang Liu:
Simple Yet Effective: Structure Guided Pre-trained Transformer for Multi-modal Knowledge Graph Reasoning. 1554-1563
- Yuning Ding, Sifan Zhang, Shenglan Liu, Jinrong Zhang, Wenyue Chen, Haifei Duan, Bingcheng Dong, Tao Sun:
2M-AF: A Strong Multi-Modality Framework For Human Action Quality Assessment with Self-supervised Representation Learning. 1564-1572
- Liqiu Chen, Yuqing Huang, Hengyu Li, Zikun Zhou, Zhenyu He:
Simplifying Cross-modal Interaction via Modality-Shared Features for RGBT Tracking. 1573-1582
- Can Cui, Siteng Huang, Wenxuan Song, Pengxiang Ding, Min Zhang, Donglin Wang:
ProFD: Prompt-Guided Feature Disentangling for Occluded Person Re-Identification. 1583-1592
- Tianqi Wei, Zhi Chen, Zi Huang, Xin Yu:
Benchmarking In-the-Wild Multimodal Disease Recognition and A Versatile Baseline. 1593-1601
- Jiaming Lei, Lin Li, Chunping Wang, Jun Xiao, Long Chen:
Seeing Beyond Classes: Zero-Shot Grounded Situation Recognition via Language Explainer. 1602-1611
- Jinyong Wen:
Gaussian Mutual Information Maximization for Efficient Graph Self-Supervised Learning: Bridging Contrastive-based to Decorrelation-based. 1612-1621
- Haowei Kuang, Yiyang Ma, Wenhan Yang, Zongming Guo, Jiaying Liu:
Consistency Guided Diffusion Model with Neural Syntax for Perceptual Image Compression. 1622-1631
- Zhangchi Feng, Richong Zhang, Zhijie Nie:
Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives and Negatives. 1632-1641
- Guanchen Ding, Lingbo Liu, Zhenzhong Chen, Changwen Chen:
Domain-Agnostic Crowd Counting via Uncertainty-Guided Style Diversity Augmentation. 1642-1651
- Cunhang Fan, Jingjing Zhang, Hongyu Zhang, Wang Xiang, Jianhua Tao, Xinhui Li, Jiangyan Yi, Dianbo Sui, Zhao Lv:
MSFNet: Multi-Scale Fusion Network for Brain-Controlled Speaker Extraction. 1652-1661
- Zhong Ji, Changxu Meng, Yan Zhang, Haoran Wang, Yanwei Pang, Jungong Han:
Eliminate Before Align: A Remote Sensing Image-Text Retrieval Framework with Keyword Explicit Reasoning. 1662-1671
- Jinyan Zhang, Mengyuan Liu, Hong Liu, Guoquan Wang, Wenhao Li:
APP: Adaptive Pose Pooling for 3D Human Pose Estimation from Videos. 1672-1681 - Jing Bi

, Yunlong Tang
, Luchuan Song
, Ali Vosoughi
, Nguyen Nguyen
, Chenliang Xu
:
EAGLE: Egocentric AGgregated Language-video Engine. 1682-1691 - Kai Yin

, Jie Shen
:
Expanded Convolutional Neural Network Based Look-Up Tables for High Efficient Single-Image Super-Resolution. 1692-1700 - Zheng Han

, Xiaobin Zhu
, Chun Yang
, Hongyang Zhou
, Jingyan Qin
, Xu-Cheng Yin
:
Exploring Stable Meta-Optimization Patterns via Differentiable Reinforcement Learning for Few-Shot Classification. 1701-1710 - Yixin Guo

, Yu Liu
, Jianghao Li
, Weimin Wang
, Qi Jia
:
Unseen No More: Unlocking the Potential of CLIP for Generative Zero-shot HOI Detection. 1711-1720 - Jiangbin Zheng

, Han Zhang
, Qianqing Xu
, An-Ping Zeng
, Stan Z. Li
:
MetaEnzyme: Meta Pan-Enzyme Learning for Task-Adaptive Redesign. 1721-1730 - Yiming Zhong

, Xiaolin Zhang
, Yao Zhao
, Yunchao Wei
:
DreamLCM: Towards High Quality Text-to-3D Generation via Latent Consistency Model. 1731-1740 - Anna Zhu

, Ke Xiao
, Bo Zhou
, Runmin Wang
:
Trust Prophet or Not? Taking a Further Verification Step toward Accurate Scene Text Recognition. 1741-1750 - Gongli Xi

, Ye Tian
, Mengyu Yang
, Lanshan Zhang
, Xirong Que
, Wendong Wang
:
Global Patch-wise Attention is Masterful Facilitator for Masked Image Modeling. 1751-1760 - Chenghao Deng

, Haote Xu
, Xiaolu Chen
, Haodi Xu
, Xiaotong Tu
, Xinghao Ding
, Yue Huang
:
SimCLIP: Refining Image-Text Alignment with Simple Prompts for Zero-/Few-shot Anomaly Detection. 1761-1770 - Yuanhe Tian

, Fei Xia
, Yan Song
:
Diffusion Networks with Task-Specific Noise Control for Radiology Report Generation. 1771-1780 - Yun Xing

, Qing Guo
, Xiaofeng Cao
, Ivor W. Tsang
, Lei Ma
:
MetaRepair: Learning to Repair Deep Neural Networks from Repairing Experiences. 1781-1790 - Xingtao Wang

, Xianqi Zhang
, Wenxue Cui
, Ruiqin Xiong
, Xiaopeng Fan
, Debin Zhao
:
Mesh Denoising Using Filtering Coefficients Jointly Aware of Noise and Geometry. 1791-1799 - Yan Zhuang

, Yanru Zhang
, Zheng Hu
, Xiaoyue Zhang
, Jiawen Deng
, Fuji Ren
:
GLoMo: Global-Local Modal Fusion for Multimodal Sentiment Analysis. 1800-1809 - Yuhui Wu

, Guoqing Wang
, Zhiwen Wang
, Yang Yang
, Tianyu Li
, Malu Zhang
, Chongyi Li
, Heng Tao Shen
:
JoReS-Diff: Joint Retinex and Semantic Priors in Diffusion Model for Low-light Image Enhancement. 1810-1818 - Zichen Wen

, Tianyi Wu
, Yazhou Ren
, Yawen Ling
, Chenhang Cui
, Xiaorong Pu
, Lifang He
:
Dual-Optimized Adaptive Graph Reconstruction for Multi-View Graph Clustering. 1819-1828 - Xiaobin Lu

, Xiaobin Hu
, Jun Luo
, Ben Zhu
, Yaping Ruan
, Wenqi Ren
:
3D Priors-Guided Diffusion for Blind Face Restoration. 1829-1838 - Hao Wu

, Likun Zhang
, Shucheng Li
, Fengyuan Xu
, Sheng Zhong
:
CoAst: Validation-Free Contribution Assessment for Federated Learning based on Cross-Round Valuation. 1839-1847 - Kang Xia

, Wenzhong Li
, Yimiao Shao
, Sanglu Lu
:
Vi2ACT: Video-enhanced Cross-modal Co-learning with Representation Conditional Discriminator for Few-shot Human Activity Recognition. 1848-1856 - Seonggwan Ko

, Yeong Jun Koh
, Donghyeon Cho
:
Reference-based Burst Super-resolution. 1857-1865 - Yi Zhang

, Zhefeng Wang
, Rui Hu
, Xinyu Duan
, Yi Zheng
, Baoxing Huai
, Jiarun Han
, Jitao Sang
:
Poisoning for Debiasing: Fair Recognition via Eliminating Bias Uncovered in Data Poisoning. 1866-1874 - Dizhan Xue

, Shengsheng Qian
, Changsheng Xu
:
Few-Shot Multimodal Explanation for Visual Question Answering. 1875-1884 - Jingtao Wang

, Zechao Li
:
3DPCP-Net: A Lightweight Progressive 3D Correspondence Pruning Network for Accurate and Efficient Point Cloud Registration. 1885-1894 - Jiawei Ge

, Jiuxin Cao
, Xuelin Zhu
, Xinyu Zhang
, Chang Liu
, Kun Wang
, Bo Liu
:
Consistencies are All You Need for Semi-supervised Vision-Language Tracking. 1895-1904 - Zhen Zou

, Hu Yu
, Jie Huang
, Feng Zhao
:
FreqMamba: Viewing Mamba from a Frequency Perspective for Image Deraining. 1905-1914 - Zhida Zhao

, Jia Li
, Lijun Wang
, Yifan Wang
, Huchuan Lu
:
MaskMentor: Unlocking the Potential of Masked Self-Teaching for Missing Modality RGB-D Semantic Segmentation. 1915-1923 - Linli Yao

, Yuanmeng Zhang
, Ziheng Wang
, Xinglin Hou
, Tiezheng Ge
, Yuning Jiang
, Xu Sun
, Qin Jin
:
Edit As You Wish: Video Caption Editing with Multi-grained User Control. 1924-1933 - Wenlin Li

, Yucheng Xu
, Xiaoqing Zheng
, Suoya Han
, Jun Wang
, Xiaobo Sun
:
Dual Advancement of Representation Learning and Clustering for Sparse and Noisy Images. 1934-1942 - Zhiwei Hao

, Zhongyu Xiao
, Yong Luo
, Jianyuan Guo
, Jing Wang
, Li Shen
, Han Hu
:
PrimKD: Primary Modality Guided Multimodal Fusion for RGB-D Semantic Segmentation. 1943-1951 - Kaixin Shen

, Ruijie Quan
, Linchao Zhu
, Jun Xiao
, Yi Yang
:
Neural Interaction Energy for Multi-Agent Trajectory Prediction. 1952-1960 - Hao Gu

, Jiangyan Yi
, Chenglong Wang
, Yong Ren
, Jianhua Tao
, Xinrui Yan
, Yujie Chen
, Xiaohui Zhang
:
Utilizing Speaker Profiles for Impersonation Audio Detection. 1961-1970 - Zejun Li

, Ye Wang
, Mengfei Du
, Qingwen Liu
, Binhao Wu
, Jiwen Zhang
, Chengxing Zhou
, Zhihao Fan
, Jie Fu
, Jingjing Chen
, Zhongyu Wei
, Xuanjing Huang
:
ReForm-Eval: Evaluating Large Vision Language Models via Unified Re-Formulation of Task-Oriented Benchmarks. 1971-1980 - Jiankang Chen

, Ling Deng
, Zhiyong Gan
, Wei-Shi Zheng
, Ruixuan Wang
:
FodFoM: Fake Outlier Data by Foundation Models Creates Stronger Visual Out-of-Distribution Detector. 1981-1990 - Xudong Wang

, Weihong Ren
, Xi'ai Chen
, Huijie Fan
, Yandong Tang
, Zhi Han
:
Uni-YOLO: Vision-Language Model-Guided YOLO for Robust and Fast Universal Detection in the Open World. 1991-2000 - Junliu Zhong

, Zhiyi Li
, Dan Xiang
, Maotang Han
, Changsheng Li
, Yanfen Gan
:
A Lightweight Multi-domain Multi-attention Progressive Network for Single Image Deraining. 2001-2010 - Weijia Zhang

, Dongnan Liu
, Weidong Cai
, Chao Ma
:
Cross-View Consistency Regularisation for Knowledge Distillation. 2011-2020 - Zikai Song

, Ying Tang
, Run Luo
, Lintao Ma
, Junqing Yu
, Yi-Ping Phoebe Chen
, Wei Yang
:
Autogenic Language Embedding for Coherent Point Tracking. 2021-2030 - Yuwen Pan

, Rui Sun
, Yuan Wang
, Tianzhu Zhang
, Yongdong Zhang
:
Rethinking the Implicit Optimization Paradigm with Dual Alignments for Referring Remote Sensing Image Segmentation. 2031-2040 - Zhaopeng Gu

, Bingke Zhu
, Guibo Zhu
, Yingying Chen
, Hao Li
, Ming Tang
, Jinqiao Wang
:
FiLo: Zero-Shot Anomaly Detection by Fine-Grained Description and High-Quality Localization. 2041-2049 - Yi Lei

, Huilin Zhu
, Jingling Yuan
, Guangli Xiang
, Xian Zhong
, Shengfeng He
:
DenseTrack: Drone-Based Crowd Tracking via Density-Aware Motion-Appearance Synergy. 2050-2058 - Fengze Jiang

, Shuling Wang
, Xiaojin Gong
:
Task-Conditional Adapter for Multi-Task Dense Prediction. 2059-2068 - Yitai Lin

, Zhijie Wei
, Wanfa Zhang
, Xiping Lin
, Yudi Dai
, Chenglu Wen
, Siqi Shen
, Lan Xu
, Cheng Wang
:
HmPEAR: A Dataset for Human Pose Estimation and Action Recognition. 2069-2078 - Deji Zhao

, Donghong Han
, Ye Yuan
, Bo Ning
, Mengxiang Li
, Zhongjiang He
, Shuangyong Song
:
AutoGraph: Enabling Visual Context via Graph Alignment in Open Domain Multi-Modal Dialogue Generation. 2079-2088 - Jiaxin Zhang

, Yiqi Wang
, Xihong Yang
, Siwei Wang
, Yu Feng
, Yu Shi
, Ruichao Ren
, En Zhu
, Xinwang Liu
:
Test-Time Training on Graphs with Large Language Models (LLMs). 2089-2098 - Yujia Xiao

, Xi Wang
, Xu Tan
, Lei He
, Xinfa Zhu
, Sheng Zhao
, Tan Lee
:
Contrastive Context-Speech Pretraining for Expressive Text-to-Speech Synthesis. 2099-2107 - Junyu Lin

, Yan Zheng
, Xinyue Chen
, Yazhou Ren
, Xiaorong Pu
, Jing He
:
Cross-view Contrastive Unification Guides Generative Pretraining for Molecular Property Prediction. 2108-2116 - Bo Yuan

, Danpei Zhao
, Zhuoran Liu
, Wentao Li
, Tian Li
:
Continual Panoptic Perception: Towards Multi-modal Incremental Interpretation of Remote Sensing Images. 2117-2126 - Shidi Chen

, Lili Wei
, Liqian Liang
, Congyan Lang
:
Joint Homophily and Heterophily Relational Knowledge Distillation for Efficient and Compact 3D Object Detection. 2127-2135 - Zhiwen Wang

, Yuhui Wu
, Zheng Wang
, Jiwei Wei
, Tianyu Li
, Guoqing Wang
, Yang Yang
, Hengtao Shen
:
Cascaded Adversarial Attack: Simultaneously Fooling Rain Removal and Semantic Segmentation Networks. 2136-2145 - Jiexuan Yan

, Sheng Huang
, Nankun Mu
, Luwen Huangfu
, Bo Liu
:
Category-Prompt Refined Feature Learning for Long-Tailed Multi-Label Image Classification. 2146-2155 - Penglei Sun

, Yaoxian Song
, Xiang Liu
, Xiaofei Yang
, Qiang Wang
, Tiefeng Li
, Yang Yang
, Xiaowen Chu
:
3D Question Answering for City Scene Understanding. 2156-2165 - Qiuyu Kong

, Jiangming Chen
, Jie Jiang
, Zanxi Ruan
, Lai Kang
:
Dual-Branch Fusion with Style Modulation for Cross-Domain Few-Shot Semantic Segmentation. 2166-2174 - Jiaqi Wang

, Lu Lu
, Mingmin Chi
, Jian Chen
:
MDR: Multi-stage Decoupled Relational Knowledge Distillation with Adaptive Stage Selection. 2175-2183 - Xiongjun Zhao

, Zhengyu Liu
, Fen Liu
, Guanting Li
, Yutao Dou
, Shaoliang Peng
:
Report-Concept Textual-Prompt Learning for Enhancing X-ray Diagnosis. 2184-2193 - Jianzhi Lu

, Ruian He
, Shili Zhou
, Weimin Tan
, Bo Yan
:
FacialFlowNet: Advancing Facial Optical Flow Estimation with a Diverse Dataset and a Decomposed Model. 2194-2203 - Wei-Bang Jiang

, Yu-Ting Lan
, Bao-Liang Lu
:
REmoNet: Reducing Emotional Label Noise via Multi-regularized Self-supervision. 2204-2213 - Shuxun Wang

, Yunfei Lei
, Ziqi Zhang
, Wei Liu
, Haowei Liu
, Li Yang
, Bing Li
, Wenjuan Li
, Jin Gao
, Weiming Hu
:
NFT1000: A Cross-Modal Dataset For Non-Fungible Token Retrieval. 2214-2222 - Haoyang Su

, Wenzhe Du
, Xiaoliang Wang
, Cam-Tu Nguyen
:
Sample Efficiency Matters: Training Multimodal Conversational Recommendation Systems in a Small Data Setting. 2223-2232 - Xincheng Ju

, Dong Zhang
, Suyang Zhu
, Junhui Li
, Shoushan Li
, Guodong Zhou
:
ECFCON: Emotion Consequence Forecasting in Conversations. 2233-2241 - Xiangbo Yin

, Jiangming Shi
, Yachao Zhang
, Yang Lu
, Zhizhong Zhang
, Yuan Xie
, Yanyun Qu
:
Robust Pseudo-label Learning with Neighbor Relation for Unsupervised Visible-Infrared Person Re-Identification. 2242-2251 - Yubo Li

, De Cheng
, Chaowei Fang
, Changzhe Jiao
, Nannan Wang
, Xinbo Gao
:
Disentangling Identity Features from Interference Factors for Cloth-Changing Person Re-identification. 2252-2261 - Bing Wang

, Shengsheng Wang
, Changchun Li
, Renchu Guan
, Ximing Li
:
Harmfully Manipulated Images Matter in Multimodal Misinformation Detection. 2262-2271 - Wuliang Huang

, Yiqiang Chen
, Xinlong Jiang
, Chenlong Gao
, Qian Chen
, Teng Zhang
, Bingjie Yan
, Yifan Wang
, Jianrong Yang
:
Correlation-Driven Multi-Modality Graph Decomposition for Cross-Subject Emotion Recognition. 2272-2281 - Wenbin Wang

, Liang Ding
, Li Shen
, Yong Luo
, Han Hu
, Dacheng Tao
:
WisdoM: Improving Multimodal Sentiment Analysis by Fusing Contextual World Knowledge. 2282-2291 - Zhanpeng Chen

, Zhihong Zhu
, Wanshi Xu
, Yunyan Zhang
, Xian Wu
, Yefeng Zheng
:
Aspects are Anchors: Towards Multimodal Aspect-based Sentiment Analysis via Aspect-driven Alignment and Refinement. 2292-2300 - Haodong Chen

, Haojian Huang
, Junhao Dong
, Mingzhe Zheng
, Dian Shao
:
FineCLIPER: Multi-modal Fine-grained CLIP for Dynamic Facial Expression Recognition with AdaptERs. 2301-2310 - Honghao Li

, Lei Sang
, Yi Zhang
, Yiwen Zhang
:
SimCEN: Simple Contrast-enhanced Network for CTR Prediction. 2311-2320 - Yuanyuan Shi

, Yunan Li
, Siyu Liang
, Huizhou Chen
, Qiguang Miao
:
MGR-Dark: A Large Multimodal Video Dataset and RGB-IR Benchmark for Gesture Recognition in Darkness. 2321-2330 - Shuanglin Yan

, Jun Liu
, Neng Dong
, Liyan Zhang
, Jinhui Tang
:
Prototypical Prompting for Text-to-image Person Re-identification. 2331-2340 - Kexiang Feng

, Chuanmin Jia
, Siwei Ma
, Wen Gao
:
Unifying Spike Perception and Prediction: A Compact Spike Representation Model Using Multi-scale Correlation. 2341-2349 - Feifei Zhang

, Sijia Qu
, Fan Shi
, Changsheng Xu
:
Overcoming the Pitfalls of Vision-Language Model for Image-Text Retrieval. 2350-2359 - Francesco Tonini

, Nicola Dall'Asen
, Lorenzo Vaquero
, Cigdem Beyan
, Elisa Ricci
:
AL-GTD: Deep Active Learning for Gaze Target Detection. 2360-2369 - Yuxiang Zhou

, Zhe Sun
, Rui Liu
, Yong Chen
, Dell Zhang
:
AVHash: Joint Audio-Visual Hashing for Video Retrieval. 2370-2378 - Xin Jiang

, Hao Tang
, Rui Yan
, Jinhui Tang
, Zechao Li
:
DVF: Advancing Robust and Accurate Fine-Grained Image Retrieval with Retrieval Guidelines. 2379-2388 - Qian Li

, Yucheng Zhou
, Cheng Ji
, Feihong Lu
, Jianian Gong
, Shangguang Wang
, Jianxin Li
:
Multi-Modal Inductive Framework for Text-Video Retrieval. 2389-2398 - Hancheng Zhu

, Ju Shi
, Zhiwen Shao
, Rui Yao
, Yong Zhou
, Jiaqi Zhao
, Leida Li
:
Attribute-Driven Multimodal Hierarchical Prompts for Image Aesthetic Quality Assessment. 2399-2408 - Zeyu Xiao

, Dachun Kai
, Yueyi Zhang
, Xiaoyan Sun
, Zhiwei Xiong
:
Asymmetric Event-Guided Video Super-Resolution. 2409-2418 - Yuanfeng Pan

, Wenkang Su
, Jiangqun Ni
, Qingliang Liu
, Yulin Zhang
, Donghua Jiang
:
Model-Based Non-Independent Distortion Cost Design for Effective JPEG Steganography. 2419-2427 - Xianghu Yue

, Xueyi Zhang
, Yiming Chen
, Chengwei Zhang
, Mingrui Lao
, Huiping Zhuang
, Xinyuan Qian
, Haizhou Li
:
MMAL: Multi-Modal Analytic Learning for Exemplar-Free Audio-Visual Class Incremental Tasks. 2428-2437 - Yuzheng Wang

, Zhaoyu Chen
, Jie Zhang
, Dingkang Yang
, Zuhao Ge
, Yang Liu
, Siao Liu
, Yunquan Sun
, Wenqiang Zhang
, Lizhe Qi
:
Sampling to Distill: Knowledge Transfer from Open-World Data. 2438-2447 - Xi Wu

, Chuang Huang
, Xinliu Liu
, Fei Zhou
, Zhenwen Ren
:
Multiple Kernel Clustering with Shifted Laplacian on Grassmann Manifold. 2448-2456 - Guangyao Li

, Yajun Jian
, Yan Yan
, Hanzi Wang
:
GLATrack: Global and Local Awareness for Open-Vocabulary Multiple Object Tracking. 2457-2466 - Xuze Hao

, Wenqian Ni
, Xuhao Jiang
, Weimin Tan
, Bo Yan
:
Addressing Imbalance for Class Incremental Learning in Medical Image Classification. 2467-2476 - Qiwei Li

, Yuxin Peng
, Jiahuan Zhou
:
Progressive Prototype Evolving for Dual-Forgetting Mitigation in Non-Exemplar Online Continual Learning. 2477-2486 - Fengfan Zhou

, Qianyu Zhou
, Bangjie Yin
, Hui Zheng
, Xuequan Lu
, Lizhuang Ma
, Hefei Ling
:
Rethinking Impersonation and Dodging Attacks on Face Recognition Systems. 2487-2496 - Xin Chen

, Bin Wang
, Jinzheng Jiang
, Kunkun Zhang
, Yongsheng Gao
:
SDePR: Fine-Grained Leaf Image Retrieval with Structural Deep Patch Representation. 2497-2505 - Yuhan Liu

, Qianxin Huang
, Siqi Hui
, Jingwen Fu
, Sanping Zhou
, Kangyi Wu
, Pengna Li
, Jinjun Wang
:
Semantic-aware Representation Learning for Homography Estimation. 2506-2514 - Chen Hui

, Haiqi Zhu
, Shuya Yan
, Shaohui Liu
, Feng Jiang
, Debin Zhao
:
S2-CSNet: Scale-Aware Scalable Sampling Network for Image Compressive Sensing. 2515-2524 - Gangyan Zeng

, Yuan Zhang
, Jin Wei
, Dongbao Yang
, Peng Zhang
, Yiwen Gao
, Xugong Qin
, Yu Zhou
:
Focus, Distinguish, and Prompt: Unleashing CLIP for Efficient and Flexible Scene Text Retrieval. 2525-2534 - Hua Yu

, Weiming Liu
, Jiapeng Bai
, Xu Gui
, Yaqing Hou
, Yew-Soon Ong
, Qiang Zhang
:
Towards Efficient and Diverse Generative Model for Unconditional Human Motion Synthesis. 2535-2544 - Dan Zeng

, Yu Zhu
, Shuiwang Li
, Qijun Zhao
, Qiaomu Shen
, Bo Tang
:
Towards Labeling-free Fine-grained Animal Pose Estimation. 2545-2553 - Rui Xie

, Anlong Ming
, Shuai He
, Yi Xiao
, Huadong Ma
:
"Special Relativity" of Image Aesthetics Assessment: a Preliminary Empirical Perspective. 2554-2563 - Zhengwei Yin

, Mingze Ma
, Guixu Lin
, Yinqiang Zheng
:
Exploring Data Efficiency in Image Restoration: A Gaussian Denoising Case Study. 2564-2573 - Yuntao Wang

, Jinpu Zhang
, Ruonan Wei
, Wenbo Gao
, Yuehuan Wang
:
MFRGN: Multi-scale Feature Representation Generalization Network for Ground-to-Aerial Geo-localization. 2574-2583 - Chang Wu

, Guancheng Quan
, Gang He
, Xin-Quan Lai
, Yunsong Li
, Wenxin Yu
, Xianmeng Lin
, Cheng Yang
:
QS-NeRV: Real-Time Quality-Scalable Decoding with Neural Representation for Videos. 2584-2592 - Xiaoyu Han

, Shunyuan Zheng
, Zonglin Li
, Chenyang Wang
, Xin Sun
, Quanling Meng
:
Shape-Guided Clothing Warping for Virtual Try-On. 2593-2602 - Richen Liu

, Hansheng Wang
, Hailong Wang
, Siru Chen
, Chufan Lai
, Ayush Kumar
, Siming Chen
:
ScaleTraversal: Creating Multi-Scale Biomedical Animation with Limited Hardware Resources. 2603-2612 - Chenrui Wu

, Haishuai Wang
, Xiang Zhang
, Zhen Fang
, Jiajun Bu
:
Spatio-temporal Heterogeneous Federated Learning for Time Series Classification with Multi-view Orthogonal Training. 2613-2622 - Yaopeng Peng

, Milan Sonka
, Danny Z. Chen
:
Group Vision Transformer. 2623-2631 - Zhichao Yang

, Leida Li
, Pengfei Chen
, Jinjian Wu
, Weisheng Dong
:
Semantics-Aware Image Aesthetics Assessment using Tag Matching and Contrastive Ranking. 2632-2641 - Pengcheng Zhang

, Xiaohan Yu
, Xiao Bai
, Jin Zheng
, Xin Ning
:
Prompting Continual Person Search. 2642-2651 - Xiao Zhao

, Xukun Zhang
, Dingkang Yang
, Mingyang Sun
, Mingcheng Li
, Shunli Wang
, Lihua Zhang
:
MaskBEV: Towards A Unified Framework for BEV Detection and Map Segmentation. 2652-2661 - Yong Yang

, Aoqi Zhao
, Shuying Huang
, Xiaozheng Wang
, Yajing Fan
:
SCPSN: Spectral Clustering-based Pyramid Super-resolution Network for Hyperspectral Images. 2662-2670 - Xiangyu Chen

, Yihao Liu
, Yuandong Pu
, Wenlong Zhang
, Jiantao Zhou
, Yu Qiao
, Chao Dong
:
Learning A Low-Level Vision Generalist via Visual Task Prompt. 2671-2680 - Wenxu Shi

, Bochuan Zheng
:
Alleviating the Equilibrium Challenge with Sample Virtual Labeling for Adversarial Domain Adaptation. 2681-2689 - Federico Espositi

, Andrea Bonarini
:
The Room: Design and Embodiment of Spaces as Social Beings. 2690-2699 - Chunjie Ma

, Lina Du
, Zan Gao
, Li Zhuo
, Meng Wang
:
A Coarse to Fine Detection Method for Prohibited Object in X-ray Images Based on Progressive Transformer Decoder. 2700-2708 - Qizhi Xie

, Kun Yuan
, Yunpeng Qu
, Mingda Wu
, Ming Sun
, Chao Zhou
, Jihong Zhu
:
QPT-V2: Masked Image Modeling Advances Visual Scoring. 2709-2718 - Shengguang Wu

, Zhenglun Chen
, Qi Su
:
Knowledge-Aware Artifact Image Synthesis with LLM-Enhanced Prompting and Multi-Source Supervision. 2719-2728 - Yu Feng

, Zhen Tian
, Yifan Zhu
, Zongfu Han
, Haoran Luo
, Guangwei Zhang
, Meina Song
:
CP-Prompt: Composition-Based Cross-modal Prompting for Domain-Incremental Continual Learning. 2729-2738 - Huixiang Wen

, Shizong Yan
, Shan Chang
, Jie Xu
, Hongzi Zhu
, Yanting Zhang
, Bo Li
:
DepthCloak: Projecting Optical Camouflage Patches for Erroneous Monocular Depth Estimation of Vehicles. 2739-2747 - Keming Wu

, Man Yao
, Yuhong Chou
, Xuerui Qiu
, Rui Yang
, Bo Xu
, Guoqi Li
:
RSC-SNN: Exploring the Trade-off Between Adversarial Robustness and Accuracy in Spiking Neural Networks via Randomized Smoothing Coding. 2748-2756 - Xueying Mao

, Xiaoxiao Hu
, Wanli Peng
, Zhenliang Gan
, Zhenxing Qian
, Xinpeng Zhang
, Sheng Li
:
From Covert Hiding To Visual Editing: Robust Generative Video Steganography. 2757-2765 - Wu Ran

, Peirong Ma
, Zhiquan He
, Hong Lu
:
Rainmer: Learning Multi-view Representations for Comprehensive Image Deraining and Beyond. 2766-2775 - Haoxuan Li

, Zhengmao Yang
, Yunshan Ma
, Yi Bin
, Yang Yang
, Tat-Seng Chua
:
MM-Forecast: A Multimodal Approach to Temporal Event Forecasting with Large Language Models. 2776-2785 - Shuyuan Wen

, Bingrui Hu
, Wenchao Li
:
CDEA: Context- and Detail-Enhanced Unsupervised Learning for Domain Adaptive Semantic Segmentation. 2786-2794 - Xitong Ling

, Minxi Ouyang
, Yizhi Wang
, Xinrui Chen
, Renao Yan
, Hongbo Chu
, Junru Cheng
, Tian Guan
, Sufang Tian
, Xiaoping Liu
, Yonghong He
:
Agent Aggregator with Mask Denoise Mechanism for Histopathology Whole Slide Image Analysis. 2795-2803 - Kepeng Xu

, Zijia Ma
, Li Xu
, Gang He
, Yunsong Li
, Wenxin Yu
, Taichu Han
, Cheng Yang
:
An End-to-End Real-World Camera Imaging Pipeline. 2804-2813 - Lijian Yang

, Weisheng Li
, Yucheng Shu
, Jian-Xun Mi
, Yuping Huang
, Bin Xiao
:
ShiftMorph: A Fast and Robust Convolutional Neural Network for 3D Deformable Medical Image Registration. 2814-2823 - Ximing Wu

, Kongyange Zhao
, Xu Chen
, Teng Liang
:
Edge-assisted Real-time Dynamic 3D Point Cloud Rendering for Multi-party Mobile Virtual Reality. 2824-2832 - Nannan Yu

, Tao Ma
, Jiqing Zhang
, Yuji Zhang
, Qirui Bao
, Xiaopeng Wei
, Xin Yang
:
Adaptive Vision Transformer for Event-Based Human Pose Estimation. 2833-2841 - Litian Zhang

, Xiaoming Zhang
, Chaozhuo Li
, Ziyi Zhou
, Jiacheng Liu
, Feiran Huang
, Xi Zhang
:
Mitigating Social Hazards: Early Detection of Fake News via Diffusion-Guided Propagation Path Generation. 2842-2851 - Yuzhen Du

, Teng Hu
, Ran Yi
, Lizhuang Ma
:
LD-BFR: Vector-Quantization-Based Face Restoration Model with Latent Diffusion Enhancement. 2852-2860 - Jie Huang

, Zhao-Min Chen
, Xiaoqin Zhang
, Yisu Ge
, Lusi Ye
, Guodao Zhang
, Huiling Chen
:
Label Decoupling and Reconstruction: A Two-Stage Training Framework for Long-tailed Multi-label Medical Image Recognition. 2861-2869 - Chengpei Xu

, Hao Fu
, Long Ma
, Wenjing Jia
, Chengqi Zhang
, Feng Xia
, Xiaoyu Ai
, Binghao Li
, Wenjie Zhang
:
Seeing Text in the Dark: Algorithm and Benchmark. 2870-2878 - Ye Tian

, Zhe Wang
, Jianguo Sun
, Liguo Zhang
:
Time-Frequency Domain Fusion Enhancement for Audio Super-Resolution. 2879-2887 - Lei Liu

, Li Liu
, Yawen Cui
:
Prior-free Balanced Replay: Uncertainty-guided Reservoir Sampling for Long-Tailed Continual Learning. 2888-2897 - Tianjiao Xu

, Aoxuan Chen
, Yuxi Zhao
, Jinfei Gao
, Tian Gan
:
A Chinese Multimodal Social Video Dataset for Controversy Detection. 2898-2907 - Zhe Ji

, Qiansiqi Hu
, Yicheng Zheng
, Liyao Xiang
, Xinbing Wang
:
A Principled Approach to Natural Language Watermarking. 2908-2916 - Hao Wu

, Fan Xu
, Chong Chen
, Xian-Sheng Hua
, Xiao Luo
, Haixin Wang
:
PastNet: Introducing Physical Inductive Biases for Spatio-temporal Video Prediction. 2917-2926 - Jiawei Yao

, Yingxin Lai
, Hongrui Kou
, Tong Wu
, Ruixi Liu
:
QE-BEV: Query Evolution for Bird's Eye View Object Detection in Varied Contexts. 2927-2935 - Xiangrui Liu

, Xinju Wu
, Pingping Zhang
, Shiqi Wang
, Zhu Li
, Sam Kwong
:
CompGS: Efficient 3D Scene Representation via Compressed Gaussian Splatting. 2936-2944 - Shengyu Hao

, Wenhao Chai
, Zhonghan Zhao
, Meiqi Sun
, Wendi Hu
, Jieyang Zhou
, Yixian Zhao
, Qi Li
, Yizhou Wang
, Xi Li
, Gaoang Wang
:
Ego3DT: Tracking Every 3D Object in Ego-centric Videos. 2945-2954 - Junkang Liu

, Fanhua Shang
, Yuanyuan Liu
, Hongying Liu
, Yuangang Li
, YunXiang Gong
:
FedBCGD: Communication-Efficient Accelerated Block Coordinate Gradient Descent for Federated Learning. 2955-2963 - Yiran Cheng

, Bintao He
, Fa Zhang
, Renmin Han
:
Serial Section Microscopy Image Inpainting Guided by Axial Optical Flow. 2964-2972 - Han Fang

, Kejiang Chen
, Yupeng Qiu
, Zehua Ma
, Weiming Zhang
, Ee-Chien Chang
:
DERO: Diffusion-Model-Erasure Robust Watermarking. 2973-2981 - Yin Wang

, Hao Lu
, Ying-Cong Chen
, Li Kuang
, Mengchu Zhou
, Shuiguang Deng
:
rPPG-HiBa: Hierarchical Balanced Framework for Remote Physiological Measurement. 2982-2991 - Huan Chen

, Tingfa Xu
, Zhenxiang Chen
, Peifu Liu
, Huiyan Bai
, Jianan Li
:
Multi-scale Change-Aware Transformer for Remote Sensing Image Change Detection. 2992-3000 - Yinyin Peng

, Yaofei Wang
, Donghui Hu
, Kejiang Chen
, Xianjin Rong
, Weiming Zhang
:
LDStega: Practical and Robust Generative Image Steganography based on Latent Diffusion Models. 3001-3009 - Lei Lu

, Yanyue Xie
, Wei Jiang
, Wei Wang
, Xue Lin
, Yanzhi Wang
:
HybridFlow: Infusing Continuity into Masked Codebook for Extreme Low-Bitrate Image Compression. 3010-3018 - Linfei Li

, Lin Zhang
, Zhong Wang
, Ying Shen
:
GS3LAM: Gaussian Semantic Splatting SLAM. 3019-3027 - Shuang Wang

, Pengyi Hao
, Fuli Wu
, Cong Bai
:
Live on the Hump: Self Knowledge Distillation via Virtual Teacher-Students Mutual Learning. 3028-3036 - Xuhan Zhu

, Yifei Xing
, Ruiping Wang
, Yaowei Wang
, Xiangyuan Lan
:
Calibration for Long-tailed Scene Graph Generation. 3037-3046 - Minjing Yu

, Lingzhi Zeng
, Xinxin Du
, Jenny Sheng
, Qiantian Liao
, Yong-Jin Liu
:
VisHanfu: An Interactive System for the Promotion of Hanfu Knowledge via Cross-Shaped Flat Structure. 3047-3055 - Xiuquan Du

, Jiajia Chen
, Xuejun Zhang
:
CBNet: Cooperation-Based Weakly Supervised Polyp Detection. 3056-3064 - Zeyu Xiao

, Zhihe Lu
, Michael Bi Mi
, Zhiwei Xiong
, Xinchao Wang
:
Unraveling Motion Uncertainty for Local Motion Deblurring. 3065-3074 - Yi Wang

, Ningze Zhong
, Minglin Chen
, Longguang Wang
, Yulan Guo
:
Tangram-Splatting: Optimizing 3D Gaussian Splatting Through Tangram-inspired Shape Priors. 3075-3083 - Jiali Chen

, Yi Cai
, Ruohang Xu
, Jiexin Wang
, Jiayuan Xie
, Qing Li
:
Deconfounded Emotion Guidance Sticker Selection with Causal Inference. 3084-3093 - Zhijian Wu

, Jun Li
, Yang Hu
, Dingjiang Huang
:
Compacter: A Lightweight Transformer for Image Restoration. 3094-3103 - Xiuli Bi

, Yang Hu
, Bo Liu
, Weisheng Li
, Pamela C. Cosman
, Bin Xiao
:
PriFU: Capturing Task-Relevant Information Without Adversarial Learning. 3104-3112 - Zan Chen

, Xiao Yu
, Yuanjing Feng
:
Connectivity-based Cerebrovascular Segmentation in Time-of-Flight Magnetic Resonance Angiography. 3113-3121 - Jiawei Chen

, Dingkang Yang
, Yue Jiang
, Mingcheng Li
, Jinjie Wei
, Xiaolu Hou
, Lihua Zhang
:
Efficiency in Focus: LayerNorm as a Catalyst for Fine-tuning Medical Visual Language Models. 3122-3130 - Keke Tang

, Zhensu Wang
, Weilong Peng
, Lujie Huang
, Le Wang
, Peican Zhu
, Wenping Wang
, Zhihong Tian
:
SymAttack: Symmetry-aware Imperceptible Adversarial Attacks on 3D Point Clouds. 3131-3140 - Jie Liang

, Rongjie Wang
, Rui Peng
, Zhe Zhang
, Kaiqiang Xiong
, Ronggang Wang
:
High Fidelity Aggregated Planar Prior Assisted PatchMatch Multi-View Stereo. 3141-3150 - Tao Huang

, Xinjia Ou
, Huali Yang
, Shengze Hu
, Jing Geng
, Junjie Hu
, Zhuoran Xu
:
Remembering is Not Applying: Interpretable Knowledge Tracing for Problem-solving Processes. 3151-3159 - Kien T. Pham

, Jingye Chen
, Qifeng Chen
:
TALE: Training-free Cross-domain Image Composition via Adaptive Latent Manipulation and Energy-guided Optimization. 3160-3169 - Lingyu Xiong

, Xize Cheng
, Jintao Tan
, Xianjia Wu
, Xiandong Li
, Lei Zhu
, Fei Ma
, Minglei Li
, Huang Xu
, Zhihui Hu
:
SegTalker: Segmentation-based Talking Face Generation with Mask-guided Local Editing. 3170-3179 - Changshuo Wang

, Mingzhe Yu
, Lei Wu
, Lei Meng
, Xiang Li
, Xiangxu Meng
:
InstantAS: Minimum Coverage Sampling for Arbitrary-Size Image Generation. 3180-3188 - Du Chen

, Zhengqiang Zhang
, Jie Liang
, Lei Zhang
:
SSL: A Self-similarity Loss for Improving Generative Image Super-resolution. 3189-3198 - Zhengze Xu

, Mengting Chen
, Zhao Wang
, Linyu Xing
, Zhonghua Zhai
, Nong Sang
, Jinsong Lan
, Shuai Xiao
, Changxin Gao
:
Tunnel Try-on: Excavating Spatial-temporal Tunnels for High-quality Virtual Try-on in Videos. 3199-3208 - Lixing Tan

, Shuang Song
, Kangneng Zhou
, Chengbo Duan
, Lanying Wang
, Huayang Ren
, Linlin Liu
, Wei Zhang
, Ruoxiu Xiao
:
Multi-view X-ray Image Synthesis with Multiple Domain Disentanglement from CT Scans. 3209-3218 - Zecheng Wang

, Xinye Li
, Zhanyue Qin
, Chunshan Li
, Zhiying Tu
, Dianhui Chu
, Dianbo Sui
:
Can We Debias Multimodal Large Language Models via Model Editing? 3219-3228 - Shuqi Dai

, Ming-Yu Liu
, Rafael Valle
, Siddharth Gururani
:
ExpressiveSinger: Multilingual and Multi-Style Score-based Singing Voice Synthesis with Expressive Performance Control. 3229-3238 - Dehao Ying

, Fengchang Yu
, Haihua Chen
, Wei Lu
:
DIG: Complex Layout Document Image Generation with Authentic-looking Text for Enhancing Layout Analysis. 3239-3247 - Shibo Hong

, Xuhong Zhang
, Tianyu Du
, Sheng Cheng
, Xun Wang
, Jianwei Yin
:
Cons2Plan: Vector Floorplan Generation from Various Conditions via a Learning Framework based on Conditional Diffusion Models. 3248-3256 - Qihe Pan

, Zhen Zhao
, Zicheng Wang
, Sifan Long
, Yiming Wu
, Wei Ji
, Haoran Liang
, Ronghua Liang
:
Towards Small Object Editing: A Benchmark Dataset and A Training-Free Approach. 3257-3265 - Xiaofeng Mao

, Zhengkai Jiang
, Qilin Wang
, Chencan Fu
, Jiangning Zhang
, Jiafu Wu
, Yabiao Wang
, Chengjie Wang
, Wei Li
, Mingmin Chi
:
MDT-A2G: Exploring Masked Diffusion Transformers for Co-Speech Gesture Generation. 3266-3274 - Jihoon Lee

, Yunhong Min
, Hwidong Kim
, Sangtae Ahn
:
DAFT-GAN: Dual Affine Transformation Generative Adversarial Network for Text-Guided Image Inpainting. 3275-3283 - Boyong He

, Yuxiang Ji
, Zhuoyue Tan
, Liaoni Wu
:
Diffusion Domain Teacher: Diffusion Guided Domain Adaptive Object Detector. 3284-3293 - Weizhi Liu

, Yue Li
, Dongdong Lin
, Hui Tian
, Haizhou Li
:
GROOT: Generating Robust Watermark for Diffusion-Model-Based Audio Synthesis. 3294-3302 - Feihong Lu

, Weiqi Wang
, Yangyifei Luo
, Ziqin Zhu
, Qingyun Sun
, Baixuan Xu
, Haochen Shi
, Shiqi Gao
, Qian Li
, Yangqiu Song
, Jianxin Li
:
Miko: Multimodal Intention Knowledge Distillation from Large Language Models for Social-Media Commonsense Discovery. 3303-3312 - Guojin Zhong

, Yihu Guo
, Jin Yuan
, Qianjun Zhang
, Weili Guan
, Long Chen
:
PROMOTE: Prior-Guided Diffusion Model with Global-Local Contrastive Learning for Exemplar-Based Image Translation. 3313-3322 - Xiangcheng Zhai

, Yingqi Jie
, Xueguang Xie
, Aimin Hao
, Na Jiang
, Yang Gao
:
ANFluid: Animate Natural Fluid Photos base on Physics-Aware Simulation and Dual-Flow Texture Learning. 3323-3331 - Shoubin Yu

, Jacob Zhiyuan Fang
, Jian Zheng
, Gunnar A. Sigurdsson
, Vicente Ordonez
, Robinson Piramuthu
, Mohit Bansal
:
Zero-Shot Controllable Image-to-Video Animation via Motion Decomposition. 3332-3341 - Goirik Chakrabarty

, Aditya Chandrasekar
, Ramya Hebbalaguppe
, Prathosh AP
:
LoMOE: Localized Multi-Object Editing via Multi-Diffusion. 3342-3351 - Yuyan Chen

, Songzhou Yan
, Zhihong Zhu
, Zhixu Li
, Yanghua Xiao
:
XMeCap: Meme Caption Generation with Sub-Image Adaptability. 3352-3361 - Zhenqiang Li

, Jie Li
, Yangjie Cao
, Jiayi Wang
, Runfeng Lv
:
ImageBind3D: Image as Binding Step for Controllable 3D Generation. 3362-3371 - Pengxiang Cai

, Zhiwei Liu
, Guibo Zhu
, Yunfang Niu
, Jinqiao Wang
:
Auto DragGAN: Editing the Generative Image Manifold in an Autoregressive Manner. 3372-3380 - Chengwei Zhang

, Xueyi Zhang
, Xianghu Yue
, Mingrui Lao
, Tao Jiang
, Jiawei Wang
, Fubo Zhang
, Longyong Chen
:
PD-Refiner: An Underlying Surface Inheritance Refiner with Adaptive Edge-Aware Supervision for Point Cloud Denoising. 3381-3390 - Yue Jiang

, Yueming Lyu
, Ziwen He
, Bo Peng
, Jing Dong
:
Mitigating Social Biases in Text-to-Image Diffusion Models via Linguistic-Aligned Attention Guidance. 3391-3400 - Peng Zhou

, Dunbo Cai
, Yujian Du
, Runqing Zhang
, Bingbing Ni
, Jie Qin
, Ling Qian
:
Edit3D: Elevating 3D Scene Editing with Attention-Driven Multi-Turn Interactivity. 3401-3410 - Ziyu Yao

, Xuxin Cheng
, Zhiqi Huang
:
FD2Talk: Towards Generalized Talking Head Generation with Facial Decoupled Diffusion Model. 3411-3420 - Xiaomin Li

, Xu Jia
, Qinghe Wang
, Haiwen Diao
, Mengmeng Ge
, Pengxiang Li
, You He
, Huchuan Lu
:
MoTrans: Customized Motion Transfer with Text-driven Video Diffusion Models. 3421-3430 - Qi Xu

, Yaxin Li
, Xuanye Fang
, Jiangrong Shen
, Qiang Zhang
, Gang Pan
:
Reversing Structural Pattern Learning with Biologically Inspired Knowledge Distillation for Spiking Neural Networks. 3431-3439 - Xiaogang Wang

, Yuhang Cheng
, Ziyang Fan
, Kai Xu
:
Learning to Transfer Heterogeneous Translucent Materials from a 2D Image to 3D Models. 3440-3448 - Zonglin Lyu

, Ming Li
, Jianbo Jiao
, Chen Chen
:
Frame Interpolation with Consecutive Brownian Bridge Diffusion. 3449-3458 - Teng Hu

, Jiangning Zhang
, Ran Yi
, Yating Wang
, Jieyu Weng
, Hongrui Huang
, Yabiao Wang
, Lizhuang Ma
:
COMD: Training-free Video Motion Transfer With Camera-Object Motion Disentanglement. 3459-3468 - Yihao Liu

, Feng Xue
, Anlong Ming
, Mingshuai Zhao
, Huadong Ma
, Nicu Sebe
:
SM4Depth: Seamless Monocular Metric Depth Estimation across Multiple Cameras and Scenes by One Model. 3469-3478 - Qinfeng Li

, Zhiqiang Shen
, Zhenghan Qin
, Yangfan Xie
, Xuhong Zhang
, Tianyu Du
, Sheng Cheng
, Xun Wang
, Jianwei Yin
:
TransLinkGuard: Safeguarding Transformer Models Against Model Stealing in Edge Deployment. 3479-3488 - Tao Wu

, Mengze Li
, Jingyuan Chen
, Wei Ji
, Wang Lin
, Jinyang Gao
, Kun Kuang
, Zhou Zhao
, Fei Wu
:
Semantic Alignment for Multimodal Large Language Models. 3489-3498 - Wenxuan Yang

, Weimin Tan
, Yuqi Sun
, Bo Yan
:
A Medical Data-Effective Learning Benchmark for Highly Efficient Pre-training of Foundation Models. 3499-3508 - Jin Liu

, Huaibo Huang
, Jie Cao
, Ran He
:
ZePo: Zero-Shot Portrait Stylization with Faster Sampling. 3509-3518 - Yiding Li

, Lingyun Yu
, Li Wang
, Hongtao Xie
:
Control-Talker: A Rapid-Customization Talking Head Generation Method for Multi-Condition Control and High-Texture Enhancement. 3519-3527 - Zhaoyang Li

, Zhu Teng
, Baopeng Zhang
, Jianping Fan
:
Boosting Non-causal Semantic Elimination: An Unconventional Harnessing of LVM for Open-World Deepfake Interpretation. 3528-3537 - Zhihao Sun

, Haipeng Fang
, Juan Cao
, Xinying Zhao
, Danding Wang
:
Rethinking Image Editing Detection in the Era of Generative AI Revolution. 3538-3547 - Hongyun Yu

, Zhan Qu
, Qihang Yu
, Jianchuan Chen
, Zhonghua Jiang
, Zhiwen Chen
, Shengyu Zhang
, Jimin Xu
, Fei Wu
, Chengfei Lv
, Gang Yu
:
GaussianTalker: Speaker-specific Talking Head Synthesis via 3D Gaussian Splatting. 3548-3557 - Xingqi Wang

, Xiaoyuan Yi
, Xing Xie
, Jia Jia
:
Embedding an Ethical Mind: Aligning Text-to-Image Synthesis via Lightweight Value Optimization. 3558-3567 - Weili Zeng

, Yichao Yan
, Qi Zhu
, Zhuo Chen
, Pengzhi Chu
, Weiming Zhao
, Xiaokang Yang
:
Infusion: Preventing Customized Text-to-Image Diffusion from Overfitting. 3568-3577 - Yi Liu

, Chengjun Cai
, Xiaoli Zhang
, Xingliang Yuan
, Cong Wang
:
Arondight: Red Teaming Large Vision Language Models with Auto-generated Multi-modal Jailbreak Prompts. 3578-3586 - Yisu Liu

, Jinyang An
, Wanqian Zhang
, Dayan Wu
, Jingzi Gu
, Zheng Lin
, Weiping Wang
:
Disrupting Diffusion: Token-Level Attention Erasure Attack against Diffusion-based Customization. 3587-3596 - Yiren Lu

, Jing Ma
, Yu Yin
:
View-consistent Object Removal in Radiance Fields. 3597-3606 - Shaocong Long

, Qianyu Zhou
, Xiangtai Li
, Xuequan Lu
, Chenhao Ying
, Yuan Luo
, Lizhuang Ma
, Shuicheng Yan
:
DGMamba: Domain Generalization via Generalized State Space Model. 3607-3616 - Wangguandong Zheng

, Haifeng Xia
, Rui Chen
, Libo Sun
, Ming Shao
, Siyu Xia
, Zhengming Ding
:
Sketch3D: Style-Consistent Guidance for Sketch-to-3D Generation. 3617-3626 - Ziyin Zhou

, Ke Sun
, Zhongxi Chen
, Huafeng Kuang
, Xiaoshuai Sun
, Rongrong Ji
:
StealthDiffusion: Towards Evading Diffusion Forensic Detection through Diffusion Model. 3627-3636 - Hong Chen

, Xin Wang
, Yipeng Zhang
, Yuwei Zhou
, Zeyang Zhang
, Siao Tang
, Wenwu Zhu
:
DisenStudio: Customized Multi-Subject Text-to-Video Generation with Disentangled Spatial Control. 3637-3646 - Ziqi Yu

, Jing Zhou
, Zhongyun Bao
, Gang Fu
, Weilei He
, Chao Liang
, Chunxia Xiao
:
CFDiffusion: Controllable Foreground Relighting in Image Compositing via Diffusion Model. 3647-3656 - Hao Wang

, Shangwei Guo
, Jialing He
, Kangjie Chen
, Shudong Zhang
, Tianwei Zhang
, Tao Xiang
:
EvilEdit: Backdooring Text-to-Image Diffusion Models in One Second. 3657-3665 - Haiyan Jiang

, Leiyu Song
, Dongdong Weng
, Zhe Sun
, Huiying Li
, Xiaonuo Dongye
, Zhenliang Zhang
:
In Situ 3D Scene Synthesis for Ubiquitous Embodied Interfaces. 3666-3675 - Haoning Wu

, Xiele Wu
, Chunyi Li
, Zicheng Zhang
, Chaofeng Chen
, Xiaohong Liu
, Guangtao Zhai
, Weisi Lin
:
T2I-Scorer: Quantitative Evaluation on Text-to-Image Generation via Fine-Tuned Large Multi-Modal Models. 3676-3685 - Shiwei Li

, Yingyi Cheng
, Haozhao Wang
, Xing Tang
, Shijie Xu
, Weihong Luo
, Yuhua Li
, Dugang Liu
, Xiuqiang He
, Ruixuan Li
:
Masked Random Noise for Communication-Efficient Federated Learning. 3686-3694 - Sa Yan

, Nuowen Kan
, Chenglin Li
, Wenrui Dai
, Junni Zou
, Hongkai Xiong
:
Task-Oriented Multi-Bitstream Optimization for Image Compression and Transmission via Optimal Transport. 3695-3703 - Tingting Li

, Ziming Zhao
, Jianwei Yin
:
Minerva: Enhancing Quantum Network Performance for High-Fidelity Multimedia Transmission. 3704-3712 - Xiaotong Yu

, Chang-Wen Chen
:
Semantic-aware Next-Best-View for Multi-DoFs Mobile System in Search-and-Acquisition based Visual Perception. 3713-3721 - Yu Chen

, Yanan Wu
, Na Han
, Xiaozhao Fang
, Bingzhi Chen
, Jie Wen
:
Partial Multi-label Learning Based On Near-Far Neighborhood Label Enhancement And Nonlinear Guidance. 3722-3731 - Ruofan Jia

, Weiying Xie
, Jie Lei
, Yunsong Li
:
Adaptive Hierarchical Aggregation for Federated Object Detection. 3732-3740 - Liang Xie

, Wei Gao
, Huiming Zheng
, Ge Li:
ROI-Guided Point Cloud Geometry Compression Towards Human and Machine Vision. 3741-3750
Oral Session 12: Human-centric and Interactive Multimedia
- Xiyu Wang

, Yufei Wang
, Satoshi Tsutsui
, Weisi Lin
, Bihan Wen
, Alex C. Kot
:
Evolving Storytelling: Benchmarks and Methods for New Character Customization with Diffusion Models. 3751-3760 - Shiyu Liu

, Zibo Zhao
, Yihao Zhi
, Yiqun Zhao
, Binbin Huang
, Shuo Wang
, Ruoyu Wang
, Michael Xuan
, Zhengxin Li
, Shenghua Gao
:
HeroMaker: Human-centric Video Editing with Motion Priors. 3761-3770 - Yunze Liu

, Changxi Chen
, Chenjing Ding
, Li Yi
:
PhysReaction: Physically Plausible Real-Time Humanoid Reaction Synthesis via Forward Dynamics Guided 4D Imitation. 3771-3780 - Wenxuan Wang

, Haonan Bai
, Jen-tse Huang
, Yuxuan Wan
, Youliang Yuan
, Haoyi Qiu
, Nanyun Peng
, Michael R. Lyu
:
New Job, New Gender? Measuring the Social Bias in Image Generation Models. 3781-3789 - Mengzhen Liu

, Mengyu Wang
, Henghui Ding
, Yilong Xu
, Yao Zhao
, Yunchao Wei
:
Segment Anything with Precise Interaction. 3790-3799 - Zhihua Xu

, Tianshui Chen
, Zhijing Yang
, Chunmei Qing
, Yukai Shi
, Liang Lin
:
Self-Supervised Emotion Representation Disentanglement for Speech-Preserving Facial Expression Manipulation. 3800-3808
Oral Session 13: Machine Learning for Multimedia
- Dongyu Xie

, Chaofan Qiao
, Lanyue Liang
, Zhiwen Wang
, Tianyu Li
, Qiao Liu
, Chongyi Li
, Guoqing Wang
, Yang Yang
:
Generalizing ISP Model by Unsupervised Raw-to-raw Mapping. 3809-3817 - Yang Liu

, Daizong Liu
, Zongming Guo
, Wei Hu
:
Cross-Task Knowledge Transfer for Semi-supervised Joint 3D Grounding and Captioning. 3818-3827 - Yang Liu

, Qianqian Xu
, Peisong Wen
, Siran Dai
, Qingming Huang
:
Not All Pairs are Equal: Hierarchical Learning for Average-Precision-Oriented Video Retrieval. 3828-3837 - Dongjie Fu

, Xize Cheng
, Xiaoda Yang
, Hanting Wang
, Zhou Zhao
, Tao Jin
:
Boosting Speech Recognition Robustness to Modality-Distortion with Contrast-Augmented Prompts. 3838-3847 - Xingyu Zhu

, Beier Zhu
, Yi Tan
, Shuo Wang
, Yanbin Hao
, Hanwang Zhang
:
Selective Vision-Language Subspace Projection for Few-shot CLIP. 3848-3857 - Jin Liu

, Bo Wang
, Chuanming Wang
, Huiyuan Fu
, Huadong Ma
:
Learning Exposure Correction in Dynamic Scenes. 3858-3866
Oral Session 14: Multimodal Datasets, Models & Analytics
- Fuqiang Niu

, Zebang Cheng
, Xianghua Fu
, Xiaojiang Peng
, Genan Dai
, Yin Chen
, Hu Huang
, Bowen Zhang
:
Multimodal Multi-turn Conversation Stance Detection: A Challenge Dataset and Effective Model. 3867-3876 - Ruilin Yao

, Shengwu Xiong
, Yichen Zhao
, Yi Rong
:
Visual Grounding with Multi-modal Conditional Adaptation. 3877-3886 - Junhao Xu

, Jingjing Chen, Xue Song
, Feng Han
, Haijun Shan
, Yu-Gang Jiang
:
Identity-Driven Multimedia Forgery Detection via Reference Assistance. 3887-3896 - Bowen Zhao

, Tianhao Cheng
, Yuejie Zhang
, Ying Cheng
, Rui Feng
, Xiaobo Zhang
:
CT2C-QA: Multimodal Question Answering over Chinese Text, Table and Chart. 3897-3906 - Zhanyu Wang

, Longyue Wang
, Zhen Zhao
, Minghao Wu
, Chenyang Lyu
, Huayang Li
, Deng Cai
, Luping Zhou
, Shuming Shi
, Zhaopeng Tu
:
GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation. 3907-3916 - Linmei Hu

, Duokang Wang
, Yiming Pan
, Jifan Yu
, Yingxia Shao
, Chong Feng
, Liqiang Nie
:
NovaChart: A Large-scale Dataset towards Chart Understanding and Generation of Multimodal Large Language Models. 3917-3925
Oral Session 15: Video Applications
- Jiaxu Li

, Songsong Yu
, Yifan Wang
, Lijun Wang
, Huchuan Lu
:
SelM: Selective Mechanism based Audio-Visual Segmentation. 3926-3935 - Yuqing Wang, Lei Meng, Haokai Ma, Yuqing Wang, Haibei Huang, Xiangxu Meng:

Modeling Event-level Causal Representation for Video Classification. 3936-3944 - Te Yang

, Jian Jia
, Bo Wang
, Yanhua Cheng
, Yan Li
, Dongze Hao
, Xipeng Cao
, Quan Chen
, Han Li
, Peng Jiang
, Xiangyu Zhu
, Zhen Lei
:
Spatiotemporal Fine-grained Video Description for Short Videos. 3945-3954 - Yili Li

, Jing Yu
, Keke Gai
, Bang Liu
, Gang Xiong
, Qi Wu
:
T2VIndexer: A Generative Video Indexer for Efficient Text-Video Retrieval. 3955-3963 - Haijie Yang

, Zhenyu Zhang
, Hao Tang
, Jianjun Qian
, Jian Yang
:
ConsistentAvatar: Learning to Diffuse Fully Consistent Talking Head Avatar with Temporal Guidance. 3964-3973 - Zhiyu Zhang

, Guo Lu
, Huanxiong Liang
, Zhengxue Cheng
, Anni Tang
, Li Song
:
Rate-aware Compression for NeRF-based Volumetric Video. 3974-3983
Oral Session 16: Biological and Health Applications
- Jingxiong Li

, Sunyi Zheng
, Chenglu Zhu
, Yuxuan Sun
, Pingyi Chen
, Zhongyi Shui
, Yunlong Zhang
, Honglin Li
, Lin Yang
:
PathUp: Patch-wise Timestep Tracking for Multi-class Large Pathology Image Synthesising Diffusion Model. 3984-3993 - Dian Xie

, Peiang Zhao
, Jiarui Zhang
, Kangqi Wei
, Xiaobao Ni
, Jiong Xia
:
BrainRAM: Cross-Modality Retrieval-Augmented Image Reconstruction from Human Brain Activity. 3994-4003 - Shuo Ma

, Yingwei Zhang
, Qiqi Zhang
, Yiqiang Chen
, Haoran Wang
, Ziyu Jia
:
SleepMG: Multimodal Generalizable Sleep Staging with Inter-modal Balance of Classification and Domain Discrimination. 4004-4013 - Zixuan Gong

, Qi Zhang
, Guangyin Bao
, Lei Zhu
, Yu Zhang
, Ke Liu
, Liang Hu
, Duoqian Miao
:
Lite-Mind: Towards Efficient and Robust Brain Representation Learning. 4014-4023 - Kun Dong

, Jian Xue
, Zehai Niu
, Xing Lan
, Ke Lu
, Qingyuan Liu
, Xiaoyu Qin
:
Realistic Full-Body Motion Generation from Sparse Tracking with State Space Model. 4024-4033 - Usman Naseem

, Adam G. Dunn
, Matloob Khushi
, Jinman Kim
:
Vaccine Misinformation Detection in X using Cooperative Multimodal Framework. 4034-4042
Oral Session 17: Person Modeling and Tracking
- Shizong Yan

, Huixiang Wen
, Shan Chang
, Hongzi Zhu
, Luo Zhou
:
Fooling 3D Face Recognition with One Single 2D Image. 4043-4052 - Fangyi Liu

, Mang Ye
, Bo Du
:
Cloth-aware Augmentation for Cloth-generalized Person Re-identification. 4053-4062 - Zhiqi Pang

, Lingling Zhao
, Chunyu Wang
:
Dual-Resolution Fusion Modeling for Unsupervised Cross-Resolution Person Re-Identification. 4063-4072 - Huilin Tian

, Jingke Meng
, Wei-Shi Zheng
, Yuan-Ming Li
, Junkai Yan
, Yunong Zhang
:
Loc4Plan: Locating Before Planning for Outdoor Vision and Language Navigation. 4073-4081 - Changcheng Xiao

, Qiong Cao
, Zhigang Luo
, Long Lan
:
MambaTrack: A Simple Baseline for Multiple Object Tracking with State Space Model. 4082-4091 - Ling Li

, Wenrui Yang
, Xinchun Yu
, Junliang Xing
, Xiao-Ping Zhang
:
Translating Motion to Notation: Hand Labanotation for Intuitive and Comprehensive Hand Movement Documentation. 4092-4100
Poster Session 2
- Xiang Gao

, Jiaying Liu
:
FBSDiff: Plug-and-Play Frequency Band Substitution of Diffusion Features for Highly Controllable Text-Driven Image Translation. 4101-4109 - Wen Yin

, Bin Benjamin Zhu
, Yulai Xie
, Pan Zhou
, Dan Feng
:
Backdoor Attacks on Bimodal Salient Object Detection with RGB-Thermal Data. 4110-4119 - Zhixiang Shen

, Haolan He
, Zhao Kang
:
Balanced Multi-Relational Graph Clustering. 4120-4128 - Jiyuan Wang

, Chunyu Lin
, Lang Nie
, Kang Liao
, Shuwei Shao
, Yao Zhao
:
Digging into Contrastive Learning for Robust Depth Estimation with Diffusion Models. 4129-4137 - Zhuoxiao Chen

, Zixin Wang
, Yadan Luo
, Sen Wang
, Zi Huang
:
DPO: Dual-Perturbation Optimization for Test-time Adaptation in 3D Object Detection. 4138-4147 - Xian Zhang

, Haokun Wen
, Jianlong Wu
, Pengda Qin
, Hui Xue'
, Liqiang Nie
:
Differential-Perceptive and Retrieval-Augmented MLLM for Change Captioning. 4148-4157 - Bingyan Liu

, Chengyu Wang
, Jun Huang
, Kui Jia
:
Attentive Linguistic Tracking in Diffusion Models for Training-free Text-guided Image Editing. 4158-4166 - Changhao He

, Hongyuan Zhu
, Peng Hu
, Xi Peng
:
Robust Variational Contrastive Learning for Partially View-unaligned Clustering. 4167-4176 - Shengxin Chen

, Gen Luo
, Yiyi Zhou
, Xiaoshuai Sun
, Guannan Jiang
, Rongrong Ji
:
QueryMatch: A Query-based Contrastive Learning Framework for Weakly Supervised Visual Grounding. 4177-4186 - Rui Liu

, Yifan Hu
, Yi Ren
, Xiang Yin
, Haizhou Li
:
Generative Expressive Conversational Speech Synthesis. 4187-4196 - Zhien Dai

, Zhaohui Tang
, Hu Zhang
, Can Tian
, Mingjun Pan
, Yongfang Xie
:
Eglcr: Edge Structure Guidance and Scale Adaptive Attention for Iterative Stereo Matching. 4197-4206 - Humen Zhong

, Zhibo Yang
, Zhaohai Li
, Peng Wang
, Jun Tang
, Wenqing Cheng
, Cong Yao
:
VL-Reader: Vision and Language Reconstructor is an Effective Scene Text Recognizer. 4207-4216 - Chaofan Gan

, Yuanpeng Tu
, Yuxi Li
, Weiyao Lin
:
DAC: 2D-3D Retrieval with Noisy Labels via Divide-and-Conquer Alignment and Correction. 4217-4226 - Zhenyu Hou

, Junjun Guo
:
Virtual Visual-Guided Domain-Shadow Fusion via Modal Exchanging for Domain-Specific Multi-Modal Neural Machine Translation. 4227-4235 - Yuxiang Yang

, Lu Wen
, Xinyi Zeng
, Yuanyuan Xu
, Xi Wu
, Jiliu Zhou
, Yan Wang
:
Learning with Alignments: Tackling the Inter- and Intra-domain Shifts for Cross-multidomain Facial Expression Recognition. 4236-4245 - Shuhuang Chen

, Dingjie Fu
, Shiming Chen
, Shuo Ye
, Wenjin Hou
, Xinge You
:
Causal Visual-semantic Correlation for Zero-shot Learning. 4246-4255 - Patrick Steinert

, Stefan Wagenpfeil
, Ingo Frommholz
, Matthias L. Hemmje
:
256 Metaverse Records Dataset. 4256-4263 - Yifeng Xie

, Zhihong Zhu
, Xin Chen
, Zhanpeng Chen
, Zhiqi Huang
:
MoBA: Mixture of Bi-directional Adapter for Multi-modal Sarcasm Detection. 4264-4272 - Jiulin Li

, Mengyu Yang
, Ye Tian
, Lanshan Zhang
, Yongchun Lu
, Jice Liu
, Wendong Wang
:
WaveDN: A Wavelet-based Training-free Zero-shot Enhancement for Vision-Language Models. 4273-4282 - Runkai Zhao

, Heng Wang
, Weidong Cai
:
LaneCMKT: Boosting Monocular 3D Lane Detection with Cross-Modal Knowledge Transfer. 4283-4291 - Wenju Sun

, Qingyong Li
, Siyu Zhang
, Wen Wang
, Yangli-ao Geng
:
Incremental Learning via Robust Parameter Posterior Fusion. 4292-4301 - Tao Jin

, Weicai Yan
, Ye Wang
, Sihang Cai
, Qifan Shuai
, Zhou Zhao
:
Calibrating Prompt from History for Continual Vision-Language Retrieval and Grounding. 4302-4311 - Pengyue Lin

, Ruifan Li
, Yuzhe Ji
, Zhihan Yu
, Fangxiang Feng
, Zhanyu Ma
, Xiaojie Wang
:
Triple Alignment Strategies for Zero-shot Phrase Grounding under Weak Supervision. 4312-4321 - Zhenni Yu

, Xiaoqin Zhang
, Li Zhao
, Yi Bin
, Guobao Xiao
:
Exploring Deeper! Segment Anything Model with Depth Perception for Camouflaged Object Detection. 4322-4330 - Jiawei Wang

, Da Cao
, Shaofei Lu
, Zhanchang Ma
, Junbin Xiao
, Tat-Seng Chua
:
Causal-driven Large Language Models with Faithful Reasoning for Knowledge Question Answering. 4331-4340 - Zijian Yi

, Ziming Zhao
, Zhishu Shen
, Tiehua Zhang
:
Multimodal Fusion via Hypergraph Autoencoder and Contrastive Learning for Emotion Recognition in Conversation. 4341-4348 - Cheng Shen

, Liquan Shen
, Mengyao Li
, Meng Yu
:
EPL-UFLSID: Efficient Pseudo Labels-Driven Underwater Forward-Looking Sonar Images Object Detection. 4349-4357 - Shuiping Gou

, Xin Wang
, Xinlin Wang
, Yunzhi Chen
:
Interpretable Matching of Optical-SAR Image via Dynamically Conditioned Diffusion Models. 4358-4367 - Xiaohuan Ding

, Yangrui Gong
, Tianyi Shi
, Zihang Huang
, Gangwei Xu
, Xin Yang
:
Masked Snake Attention for Fundus Image Restoration with Vessel Preservation. 4368-4376 - Yajie Zhang

, Zhi-An Huang
, Zhiliang Hong
, Songsong Wu
, Jibin Wu
, Kay Chen Tan
:
Mixed Prototype Correction for Causal Inference in Medical Image Classification. 4377-4386 - Yi Zhang

, Ke Yu
, Angelica I. Avilés-Rivero
, Jiyuan Jia
, Yushun Tang
, Zhihai He
:
Training-Free Feature Reconstruction with Sparse Optimization for Vision-Language Models. 4387-4396 - Nan Wang

, Zonglin Di
, Houlin He
, Qingchao Jiang
, Xiaoxiao Li
:
A Simple and Provable Approach for Learning on Noisy Labeled Medical Images. 4397-4405 - Mengmeng Sheng

, Zeren Sun
, Gensheng Pei
, Tao Chen
, Haonan Luo
, Yazhou Yao
:
Enhancing Robustness in Learning with Noisy Labels: An Asymmetric Co-Training Approach. 4406-4415 - Muquan Li

, Dongyang Zhang
, Tao He
, Xiurui Xie
, Yuan-Fang Li
, Ke Qin
:
Towards Effective Data-Free Knowledge Distillation via Diverse Diffusion Augmentation. 4416-4425 - Qiuhui Chen

, Yi Hong
:
SMART: Self-Weighted Multimodal Fusion for Diagnostics of Neurodegenerative Disorders. 4426-4435 - Taoyu Su

, Jiawei Sheng
, Shicheng Wang
, Xinghua Zhang
, Hongbo Xu
, Tingwen Liu
:
IBMEA: Exploring Variational Information Bottleneck for Multi-modal Entity Alignment. 4436-4445 - Zhijun Jia

, Huaying Xue
, Xiulian Peng
, Yan Lu
:
Convert and Speak: Zero-shot Accent Conversion with Minimum Supervision. 4446-4454 - Yihan Zhao

, Wei Xi
, Yuhang Cui
, Gairui Bai
, Xinhui Liu
, Jizhong Zhao
:
CoPL: Parameter-Efficient Collaborative Prompt Learning for Audio-Visual Tasks. 4455-4464 - Junbo Hu

, Zhixin Li
:
Distilled Cross-Combination Transformer for Image Captioning with Dual Refined Visual Features. 4465-4474 - Siyuan Xu

, Guannan Li
, Haofei Song
, Jiansheng Wang
, Yan Wang
, Qingli Li
:
GeNSeg-Net: A General Segmentation Framework for Any Nucleus in Immunohistochemistry Images. 4475-4484 - Ziyi Gao

, Kai Chen
, Zhipeng Wei
, Tingshu Mou
, Jingjing Chen, Zhiyu Tan
, Hao Li
, Yu-Gang Jiang
:
ReToMe-VA: Recursive Token Merging for Video Diffusion-based Unrestricted Adversarial Attack. 4485-4494 - Kunyu Peng

, David Schneider
, Alina Roitberg
, Kailun Yang
, Jiaming Zhang
, Chen Deng
, Kaiyu Zhang
, M. Saquib Sarfraz
, Rainer Stiefelhagen
:
Towards Video-based Activated Muscle Group Estimation in the Wild. 4495-4504 - Rui Xu

, Gaolei Li
, Changze Li
, Zhaohui Yang
, Yuchen Liu
, Mingzhe Chen
:
OSNeRF: On-demand Semantic Neural Radiance Fields for Fast and Robust 3D Object Reconstruction. 4505-4514 - Wenjie Li

, Heng Guo
, Xuannan Liu
, Kongming Liang
, Jiani Hu
, Zhanyu Ma
, Jun Guo
:
Efficient Face Super-Resolution via Wavelet-based Feature Enhancement Network. 4515-4523 - Ruoxi Deng

, Bin Yu
, Jinxuan Lu
, Caixia Zhou
, Zhao-Min Chen
, Jie Hu
:
Advancing Semantic Edge Detection through Cross-Modal Knowledge Learning. 4524-4532 - Jiacheng Zhang

, Jie Wu
, Huafeng Kuang
, Haiming Zhang
, Yuxi Ren
, Weifeng Chen
, Manlin Zhang
, Xuefeng Xiao
, Guanbin Li
:
TreeReward: Improve Diffusion Model via Tree-Structured Feedback Learning. 4533-4542 - Chaomin Shen

, Yaomin Huang
, Haokun Zhu
, Jinsong Fan
, Guixu Zhang
:
Student-Oriented Teacher Knowledge Refinement for Knowledge Distillation. 4543-4552 - Yanshan Zhou

, Pingrui Lai
, Jiaqi Yu
, Yingjie Xiong
, Hua Yang
:
Hydrodynamics-Informed Neural Network for Simulating Dense Crowd Motion Patterns. 4553-4561 - Zhidong Yu

, Zhenbo Shi
, Xiaoman Liu
, Wei Yang
:
PFFAA: Prototype-based Feature and Frequency Alteration Attack for Semantic Segmentation. 4562-4571 - Wenbo Huang

, Jinghui Zhang
, Xuwei Qian
, Zhen Wu
, Meng Wang
, Lei Zhang
:
SOAP: Enhancing Spatio-Temporal Relation and Motion Information Capturing for Few-Shot Action Recognition. 4572-4580 - Xiangyan Qu

, Jing Yu
, Keke Gai
, Jiamin Zhuang
, Yuanmin Tang
, Gang Xiong
, Gaopeng Gou
, Qi Wu
:
Visual-Semantic Decomposition and Partial Alignment for Document-based Zero-Shot Learning. 4581-4590 - Weixiang Han

, Chengjun Cai
, Yu Guo
, Jialiang Peng
:
ERL-MR: Harnessing the Power of Euler Feature Representations for Balanced Multi-modal Learning. 4591-4600 - Luca Rossetto

, Cristina Sarasua
, Abraham Bernstein
:
Estimating the Semantic Density of Visual Media. 4601-4609 - Shaokun Zhang

, Yiran Wu
, Zhonghua Zheng
, Qingyun Wu
, Chi Wang
:
HyperTime: Hyperparameter Optimization for Combating Temporal Distribution Shifts. 4610-4619 - Xiaomeng Chu

, Jiajun Deng
, Guoliang You
, Yifan Duan
, Yao Li
, Yanyong Zhang
:
RayFormer: Improving Query-Based Multi-Camera 3D Object Detection via Ray-Centric Strategies. 4620-4629 - Yi Bin

, Junrong Liao
, Yujuan Ding
, Haoxuan Li
, Yang Yang
, See-Kiong Ng
, Heng Tao Shen
:
Leveraging Weak Cross-Modal Guidance for Coherence Modelling via Iterative Learning. 4630-4639 - Chengyou Jia

, Minnan Luo
, Xiaojun Chang
, Zhuohang Dang
, Mingfei Han
, Mengmeng Wang
, Guang Dai
, Sizhe Dang
, Jingdong Wang
:
Generating Action-conditioned Prompts for Open-vocabulary Video Action Recognition. 4640-4649 - Jialu Zhang

, Xinyi Wang
, Chenglin Yao
, Jianfeng Ren
, Xudong Jiang
:
Visual-linguistic Cross-domain Feature Learning with Group Attention and Gamma-correct Gated Fusion for Extracting Commonsense Knowledge. 4650-4659 - Wenhan Wu

, Ce Zheng
, Zihao Yang
, Chen Chen
, Srijan Das
, Aidong Lu
:
Frequency Guidance Matters: Skeletal Action Recognition by Frequency-Aware Mixed Transformer. 4660-4669 - Xianwei Zhuang

, Xuxin Cheng
, Zhihong Zhu
, Zhanpeng Chen
, Hongxiang Li
, Yuexian Zou
:
Towards Multimodal-augmented Pre-trained Language Models via Self-balanced Expectation-Maximization Iteration. 4670-4679 - Hongze Zhu

, Guoyang Xie
, Chengbin Hou
, Tao Dai
, Can Gao
, Jinbao Wang
, Linlin Shen
:
Towards High-resolution 3D Anomaly Detection via Group-Level Feature Contrastive Learning. 4680-4689 - Kaixiang Wang

, Xiaojian Ding
, Fan Yang
:
Non-Overlapped Multi-View Weak-Label Learning Guided by Multiple Correlations. 4690-4698 - Xin Mei

, Rui Mao
, Xiaoyan Cai
, Libin Yang
, Erik Cambria
:
Medical Report Generation via Multimodal Spatio-Temporal Fusion. 4699-4708 - Guofan Fan

, Zekun Qi
, Wenkai Shi
, Kaisheng Ma
:
Point-GCC: Universal Self-supervised 3D Scene Pre-training via Geometry-Color Contrast. 4709-4718 - Menghao Zhang

, Jingyu Wang
, Qi Qi
, Pengfei Ren
, Haifeng Sun
, Zirui Zhuang
, Huazheng Wang
, Lei Zhang
, Jianxin Liao
:
Video Anomaly Detection via Progressive Learning of Multiple Proxy Tasks. 4719-4728 - Xingyu Zhang

, Siyu Zhao
, Zeen Song
, Huijie Guo
, Jianqi Zhang
, Changwen Zheng
, Wenwen Qiang
:
Not All Frequencies Are Created Equal: Towards a Dynamic Fusion of Frequencies in Time-Series Forecasting. 4729-4737 - Shijie Chen

, Junbao Zhuo
, Xin Li
, Haizhuang Liu
, Rongquan Wang
, Jiansheng Chen
, Huimin Ma
:
CMT: Co-training Mean-Teacher for Unsupervised Domain Adaptation on 3D Object Detection. 4738-4747 - Tianrui Pan

, Jie Liu
, Bohan Wang
, Jie Tang
, Gangshan Wu
:
RAVSS: Robust Audio-Visual Speech Separation in Multi-Speaker Scenarios with Missing Visual Cues. 4748-4756 - Siqi Wang

, Chao Liang
, Yunfan Gao
, Yang Liu
, Jing Li
, Haofen Wang
:
Decoding Urban Industrial Complexity: Enhancing Knowledge-Driven Insights via IndustryScopeGPT. 4757-4765 - Yuanbin Fu

, Jie Ying
, Houlei Lv
, Xiaojie Guo
:
Semi-supervised Camouflaged Object Detection from Noisy Data. 4766-4775 - Bolei Chen

, Jiaxu Kang
, Ping Zhong
, Yixiong Liang
, Yu Sheng
, Jianxin Wang
:
Embodied Contrastive Learning with Geometric Consistency and Behavioral Awareness for Object Navigation. 4776-4785 - Jia-Li Yin

, Menghao Chen
, Jin Han
, Bo-Hao Chen
, Ximeng Liu:
Adversarial Example Quality Assessment: A Large-scale Dataset and Strong Baseline. 4786-4794 - Ye Jing

, Xinpei Zhao
:
DQ-Former: Querying Transformer with Dynamic Modality Priority for Cognitive-aligned Multimodal Emotion Recognition in Conversation. 4795-4804 - Xicong Wang

, Huiyuan Fu
, Jiaxuan Wang
, Xin Wang
, Heng Zhang
, Huadong Ma
:
Exploring in Extremely Dark: Low-Light Video Enhancement with Real Events. 4805-4813 - Qing Zhang

, Haocheng Lv
, Jie Liu
, Zhiyun Chen
, Jianyong Duan
, Hao Wang
, Li He
, Mingying Xu
:
An Entailment Tree Generation Approach for Multimodal Multi-Hop Question Answering with Mixture-of-Experts and Iterative Feedback Mechanism. 4814-4822 - Kangpeng Hu

, Quansen Sun
, Yinghui Sun
, Tao Wang
:
Interactive Segmentation by Considering First-Click Intentional Ambiguity. 4823-4831 - Leqi Shen

, Sicheng Zhao
, Yifeng Zhang
, Hui Chen
, Jundong Zhou
, Pengzhang Liu
, Yongjun Bao
, Guiguang Ding
:
Multi-Label Learning with Block Diagonal Labels. 4832-4840 - Wentao He

, Jianfeng Ren
, Ruibin Bai
, Xudong Jiang
:
Hierarchical Perceptual and Predictive Analogy-Inference Network for Abstract Visual Reasoning. 4841-4850 - Wenxi Li

, Yuchen Guo
, Jilai Zheng
, Haozhe Lin
, Chao Ma
, Lu Fang
, Xiaokang Yang
:
SparseFormer: Detecting Objects in HRW Shots via Sparse Vision Transformer. 4851-4860 - Bo Liu

, Zexin Lu
, Yan Wang
:
Towards Medical Vision-Language Contrastive Pre-training via Study-Oriented Semantic Exploration. 4861-4870 - Zihao Liu

, Xiaoyu Wu
, Shengjin Wang
, Jiayao Qian
:
Adaptively Building a Video-language Model for Video Captioning and Retrieval without Massive Video Pretraining. 4871-4880 - Wenhao Guo

, Peng Lu
, Xujun Peng
, Zhaoran Zhao
, Ji Qiu
, Xiangtao Dong
:
BCSCN: Reducing Domain Gap through Bézier Curve basis-based Sparse Coding Network for Single-Image Super-Resolution. 4881-4889 - Yi Tu

, Chong Zhang
, Ya Guo
, Huan Chen
, Jinyang Tang
, Huijia Zhu
, Qi Zhang
:
UNER: A Unified Prediction Head for Named Entity Recognition in Visually-rich Documents. 4890-4898 - Tao Ling

, Siping Shi
, Hao Wang
, Chuang Hu
, Dan Wang
:
Federated Morozov Regularization for Shortcut Learning in Privacy Preserving Learning with Watermarked Image Data. 4899-4908 - Jinfu Liu

, Chen Chen
, Mengyuan Liu
:
Multi-Modality Co-Learning for Efficient Skeleton-based Action Recognition. 4909-4918 - Zewen Du

, Zhenjiang Hu
, Guiyu Zhao
, Ying Jin
, Hongbin Ma
:
LDA-AQU: Adaptive Query-guided Upsampling via Local Deformable Attention. 4919-4927 - Shichen Lu

, Longteng Guo
, Wenxuan Wang
, Zijia Zhao
, Tongtian Yue
, Jing Liu
, Si Liu
:
Collaborative Training of Tiny-Large Vision Language Models. 4928-4937 - Xudong Zhou

, Tianxiang Chen
:
BSBP-RWKV: Background Suppression with Boundary Preservation for Efficient Medical Image Segmentation. 4938-4946 - Yuxing Zhang

, Siyuan Meng
, Chunchun Chen
, Mengyao Peng
, Hongyan Gu
, Xinli Huang
:
LinkThief: Combining Generalized Structure Knowledge with Node Similarity for Link Stealing Attack against GNN. 4947-4956 - Yeqing Shen

, Shang Li
, Kun Song
:
Restoring Real-World Degraded Events Improves Deblurring Quality. 4957-4966 - Xiao Liang

, Yanlei Zhang
, Di Wang
, Haodi Zhong
, Ronghan Li
, Quan Wang
:
Divide and Conquer: Isolating Normal-Abnormal Attributes in Knowledge Graph-Enhanced Radiology Report Generation. 4967-4975 - Zhen Wang

, Dongyuan Li
, Guang Li
, Ziqing Zhang
, Renhe Jiang
:
Multimodal Low-light Image Enhancement with Depth Information. 4976-4985 - Zishuo Wang

, Wenhao Zhou
, Jinglin Xu
, Yuxin Peng
:
SIA-OVD: Shape-Invariant Adapter for Bridging the Image-Region Gap in Open-Vocabulary Detection. 4986-4994 - Xu Han

, Yuan Tang
, Zhaoxuan Wang
, Xianzhi Li
:
Mamba3D: Enhancing Local Features for 3D Point Cloud Analysis via State Space Model. 4995-5004 - Wenqi Ren

, Ruihao Xia
, Meng Zheng
, Ziyan Wu
, Yang Tang
, Nicu Sebe
:
Cross-Class Domain Adaptive Semantic Segmentation with Visual Language Models. 5005-5014 - Xuefeng Yin

, Chenyang Zhu
, Shanglai Qu
, Yuqi Li
, Kai Xu
, Baocai Yin
, Xin Yang:
CSO: Constraint-Guided Space Optimization for Active Scene Mapping. 5015-5024 - Luoyi Sun

, Xuenan Xu
, Mengyue Wu
, Weidi Xie
:
Auto-ACD: A Large-scale Dataset for Audio-Language Representation Learning. 5025-5034 - Xinyue Liu

, Jianyuan Wang
, Biao Leng
, Shuo Zhang
:
Dual-Modeling Decouple Distillation for Unsupervised Anomaly Detection. 5035-5044 - Huimin Ma

, Siwei Wang
, Shengju Yu
, Suyuan Liu
, Junjie Huang
, Huijun Wu
, Xinwang Liu
, En Zhu
:
Automatic and Aligned Anchor Learning Strategy for Multi-View Clustering. 5045-5054 - Shengyang Sun

, Jiashen Hua
, Junyi Feng
, Dongxu Wei
, Baisheng Lai
, Xiaojin Gong
:
TDSD: Text-Driven Scene-Decoupled Weakly Supervised Video Anomaly Detection. 5055-5064 - Yang Xin

, Yu Zhou
, Jianmin Jiang
:
RobustFace: Adaptive Mining of Noise and Hard Samples for Robust Face Recognitions. 5065-5073 - Xiang Ma

, Xuemei Li
, Lexin Fang
, Caiming Zhang
:
Bridging the Modality Gap: Dimension Information Alignment and Sparse Spatial Constraint for Image-Text Matching. 5074-5082 - Chunli Peng

, Xuan Dong
, Tiantian Cao
, Zhengqing Li
, Kun Dong
, Weixin Li
:
ReWiTe: Realistic Wide-angle and Telephoto Dual Camera Fusion Dataset via Beam Splitter Camera Rig. 5083-5091 - Yang Fang

, Xuefeng Rao
, Xinbo Gao
, Weisheng Li
, Zijian Min
:
MTSNet: Joint Feature Adaptation and Enhancement for Text-Guided Multi-view Martian Terrain Segmentation. 5092-5101 - Le Jiang

, Yan Huang
, Lianxin Xie
, Wen Xue
, Cheng Liu, Si Wu
, Hau-San Wong
:
Hunting Blemishes: Language-guided High-fidelity Face Retouching Transformer with Limited Paired Data. 5102-5111 - Yijia Guo

, Yuanxi Bai
, Liwen Hu
, Ziyi Guo
, Mianzhi Liu
, Yu Cai
, Tiejun Huang
, Lei Ma
:
PRTGS: Precomputed Radiance Transfer of Gaussian Splats for Real-Time High-Quality Relighting. 5112-5120 - Mingcan Xiang

, Jiaxun Tang
, Qizheng Yang
, Hui Guan
, Tongping Liu
:
AdapMTL: Adaptive Pruning Framework for Multitask Learning Model. 5121-5130 - Xinwei Zhang

, Aishan Liu
, Tianyuan Zhang
, Siyuan Liang
, Xianglong Liu
:
Towards Robust Physical-world Backdoor Attacks on Lane Detection. 5131-5140 - Longtao Jiang

, Min Wang
, Zecheng Li
, Yao Fang
, Wengang Zhou
, Houqiang Li
:
SEDS: Semantically Enhanced Dual-Stream Encoder for Sign Language Retrieval. 5141-5150 - Pinxue Guo

, Wanyun Li
, Hao Huang
, Lingyi Hong
, Xinyu Zhou
, Zhaoyu Chen
, Jinglun Li
, Kaixun Jiang
, Wei Zhang
, Wenqiang Zhang
:
X-Prompt: Multi-modal Visual Prompt for Video Object Segmentation. 5151-5160 - Ling Huang

, Wenqian Dong
, Song Xiao
, Jiahui Qu
, Yuanbo Yang
, Yunsong Li
:
Language-Guided Visual Prompt Compensation for Multi-Modal Remote Sensing Image Classification with Modality Absence. 5161-5170 - Zening Lin

, Jiapeng Wang
, Teng Li
, Wenhui Liao
, Dayi Huang
, Longfei Xiong
, Lianwen Jin
:
PEneo: Unifying Line Extraction, Line Grouping, and Entity Linking for End-to-end Document Pair Extraction. 5171-5180 - Haojian Huang

, Xiaozhen Qiao
, Zhuo Chen
, Haodong Chen
, Bingyu Li
, Zhe Sun
, Mulin Chen
, Xuelong Li
:
CREST: Cross-modal Resonance through Evidential Deep Learning for Enhanced Zero-Shot Learning. 5181-5190 - Shuai Zhao

, Yongkun Du
, Zhineng Chen
, Yu-Gang Jiang
:
Decoder Pre-Training with only Text for Scene Text Recognition. 5191-5200 - Naibo Wang

, Yuchen Deng
, Wenjie Feng
, Shichen Fan
, Jianwei Yin
, See-Kiong Ng
:
One-Shot Sequential Federated Learning for Non-IID Data by Enhancing Local Model Diversity. 5201-5210 - Wendong Huang

, Jinwu Hu
, Xiuli Bi
, Bin Xiao:
Anatomical Prior Guided Spatial Contrastive Learning for Few-Shot Medical Image Segmentation. 5211-5220 - Libo Long

, Xiao Hu
, Jochen Lang
:
Learning to Handle Large Obstructions in Video Frame Interpolation. 5221-5229 - Hefei Huang

, Xu Jia
, Xinyu Zhang
, Shengming Li
, Huchuan Lu
:
Event-Guided Rolling Shutter Correction with Time-Aware Cross-Attentions. 5230-5239 - Xibiao Wang

, Hang Gao
, Xindian Wei
, Liang Peng
, Rui Li
, Cheng Liu, Si Wu
, Hau-San Wong
:
Contrastive Graph Distribution Alignment for Partially View-Aligned Clustering. 5240-5249 - Xudong Cai

, Yongcai Wang
, Lun Luo
, Minhang Wang
, Deying Li
, Jintao Xu
, Weihao Gu
, Rui Ai
:
PRISM: PRogressive dependency maxImization for Scale-invariant image Matching. 5250-5259 - Yang Du

, Yuqi Liu
, Qin Jin
:
Reversed in Time: A Novel Temporal-Emphasized Benchmark for Cross-Modal Video-Text Retrieval. 5260-5269 - Wen Luo

, Yu Xia
, Tianshu Shen
, Sujian Li
:
Shapley Value-based Contrastive Alignment for Multimodal Information Extraction. 5270-5279 - Hao Yu

, Xin Yang
, Xin Gao
, Yihui Feng
, Hao Wang
, Yan Kang
, Tianrui Li
:
Overcoming Spatial-Temporal Catastrophic Forgetting for Federated Class-Incremental Learning. 5280-5288 - Haibo Wang

, Chenghang Lai
, Yixuan Sun
, Weifeng Ge
:
Weakly Supervised Gaussian Contrastive Grounding with Large Multimodal Models for Video Question Answering. 5289-5298 - Shudong Huang

, Hecheng Cai
, Hao Dai
, Wentao Feng
, Jiancheng Lv
:
Adaptive Instance-wise Multi-view Clustering. 5299-5307 - Ze Yuan

, Jinyang Guo
, Dakai An
, Junran Wu
, He Zhu
, Jianhao Li
, Xueyuan Chen
, Ke Xu
, Jiaheng Liu
:
VRDistill: Vote Refinement Distillation for Efficient Indoor 3D Object Detection. 5308-5317 - Sunoh Kim

, Daeho Um
, Hyunjun Choi
, Jin Young Choi
:
Learnable Negative Proposals Using Dual-Signed Cross-Entropy Loss for Weakly Supervised Video Moment Localization. 5318-5327 - Yansong Qu

, Shaohui Dai
, Xinyang Li
, Jianghang Lin
, Liujuan Cao
, Shengchuan Zhang
, Rongrong Ji
:
GOI: Find 3D Gaussians of Interest with an Optimizable Open-vocabulary Semantic-space Hyperplane. 5328-5337 - Huan Yao

, Changxing Ding
, Xuanda Xu
, Zhifeng Lin
:
Decoupling Heterogeneous Features for Robust 3D Interacting Hand Poses Estimation. 5338-5346 - Zhiyu Zhu

, Zhibo Jin
, Jiayu Zhang
, Huaming Chen
:
Enhancing Model Interpretability with Local Attribution over Global Exploration. 5347-5355 - Ruxue Yan

, Wenya Guo
, Xubo Liu
, Xumeng Liu
, Ying Zhang
, Xiaojie Yuan
:
Tracking-forced Referring Video Object Segmentation. 5356-5364 - Xin Zhang

, Shenghua Zhong
, Jianmin Jiang
:
Effective Optimization of Root Selection Towards Improved Explanation of Deep Classifiers. 5365-5373 - Guangchen Shi

, Wei Zhu
, Yirui Wu
, Danhuai Zhao
, Kang Zheng
, Tong Lu
:
Few-shot Semantic Segmentation via Perceptual Attention and Spatial Control. 5374-5383 - Zibo Ma

, Bo Zhang
, Zheng Zhang
, Wu Liu
, Wufan Wang
, Hui Gao
, Wendong Wang
:
ADDG: An Adaptive Domain Generalization Framework for Cross-Plane MRI Segmentation. 5384-5392 - Lixiang Ru

, Xin Guo
, Lei Yu
, Yingying Zhang
, Jiangwei Lao
, Jian Wang
, Jingdong Chen
, Yansheng Li
, Ming Yang
:
Parameter-Efficient Complementary Expert Learning for Long-Tailed Visual Recognition. 5393-5402 - Tianyuan Zhang

, Lu Wang
, Hainan Li
, Yisong Xiao
, Siyuan Liang
, Aishan Liu
, Xianglong Liu
, Dacheng Tao
:
LanEvil: Benchmarking the Robustness of Lane Detection to Environmental Illusions. 5403-5412 - Xinyue Zhang

, Tingjin Luo
, Yueying Liu
, Chenping Hou
:
Imbalanced Multi-instance Multi-label Learning via Coding Ensemble and Adaptive Thresholds. 5413-5422 - Pengxu Chen

, Huazhong Liu
, Jihong Ding
, Jiawen Luo
, Peng Tan
, Laurence T. Yang
:
Holistic-CAM: Ultra-lucid and Sanity Preserving Visual Interpretation in Holistic Stage of CNNs. 5423-5431 - Yihao Wang

, Meng Yang
, Rui Cao
:
Fine-grained Semantic Alignment with Transferred Person-SAM for Text-based Person Retrieval. 5432-5441 - Qijie Wang

, Guandu Liu
, Bin Wang
:
CapS-Adapter: Caption-based MultiModal Adapter in Zero-Shot Classification. 5442-5450 - Rongyu Zhang

, Zefan Cai
, Huanrui Yang
, Zidong Liu
, Denis A. Gudovskiy
, Tomoyuki Okuno
, Yohei Nakata
, Kurt Keutzer
, Baobao Chang
, Yuan Du
, Li Du
, Shanghang Zhang
:
VeCAF: Vision-language Collaborative Active Finetuning with Training Objective Awareness. 5451-5459 - Linhui Xiao

, Xiaoshan Yang
, Fang Peng
, Yaowei Wang
, Changsheng Xu
:
HiVG: Hierarchical Multimodal Fine-grained Modulation for Visual Grounding. 5460-5469 - Yunfeng Fan

, Wenchao Xu
, Haozhao Wang
, Junhong Liu
, Song Guo
:
Detached and Interactive Multimodal Learning. 5470-5478 - Chenglong Zhang

, Xinyan Liang
, Peng Zhou
, Zhaolong Ling
, Yingwei Zhang
, Xingyu Wu
, Weiguo Sheng
, Bingbing Jiang
:
Scalable Multi-view Unsupervised Feature Selection with Structure Learning and Fusion. 5479-5488 - Chengyi Yang

, Mingda Dong
, Xiaoyue Zhang
, Jiayin Qi
, Aimin Zhou
:
Introducing Common Null Space of Gradients for Gradient Projection Methods in Continual Learning. 5489-5497 - Masoumeh Zareapoor

, Pourya Shamsolmoali
, Huiyu Zhou
, Yue Lu
, Salvador García
:
Fractional Correspondence Framework in Detection Transformer. 5498-5506 - Geuntaek Lim

, Hyunwoo Kim
, Joonsoo Kim
, Yukyung Choi
:
Probabilistic Vision-Language Representation for Weakly Supervised Temporal Action Localization. 5507-5516 - Xihong Yang

, Erxue Min
, Ke Liang
, Yue Liu
, Siwei Wang
, Sihang Zhou
, Huijun Wu
, Xinwang Liu
, En Zhu
:
GraphLearner: Graph Node Clustering with Fully Learnable Augmentation. 5517-5526 - Hongqiu Wang

, Wei Wang
, Haipeng Zhou
, Huihui Xu
, Shaozhi Wu
, Lei Zhu
:
Language-Driven Interactive Shadow Detection. 5527-5536 - Jinyu Cai

, Yunhe Zhang
, Zhoumin Lu
, Wenzhong Guo
, See-Kiong Ng
:
Towards Effective Federated Graph Anomaly Detection via Self-boosted Knowledge Distillation. 5537-5546 - Chaofan Huo

, Ye Shi
, Jingya Wang
:
Monocular Human-Object Reconstruction in the Wild. 5547-5555 - Baoqi Gao

, Daoxu Sheng
, Lei Zhang
, Qi Qi
, Bo He
, Zirui Zhuang
, Jingyu Wang
:
STAR-VP: Improving Long-term Viewport Prediction in 360° Videos via Space-aligned and Time-varying Fusion. 5556-5565 - Hu Gao

, Jing Yang
, Ying Zhang
, Jingfan Yang
, Bowen Ma
, Depeng Dang
:
Learning Optimal Combination Patterns for Lightweight Stereo Image Super-Resolution. 5566-5574 - Yifan Wang

, Wuliang Huang
, Lei Li
, Chun Yuan
:
Semantic Distillation from Neighborhood for Composed Image Retrieval. 5575-5583 - Zhentao He

, Changqun Xia
, Shengye Qiao
, Jia Li
:
Text-prompt Camouflaged Instance Segmentation with Graduated Camouflage Learning. 5584-5593 - Zuyu Zhang

, Yan Li
, Byung-Seok Shin
:
Embracing Domain Gradient Conflicts: Domain Generalization Using Domain Gradient Equilibrium. 5594-5603 - Ting Zhe

, Jing Zhang
, Yongqian Li
, Yong Luo
, Han Hu
, Dacheng Tao
:
Multi-Granularity Hand Action Detection. 5604-5613 - Xingyuan Mao

, Yuwen Liu
, Lianyong Qi
, Li Duan
, Xiaolong Xu
, Xuyun Zhang
, Wanchun Dou
, Amin Beheshti
, Xiaokang Zhou
:
Cluster-driven Personalized Federated Recommendation with Interest-aware Graph Convolution Network for Multimedia. 5614-5622 - Yuan Sun

, Kaiming Liu
, Yongxiang Li
, Zhenwen Ren
, Jian Dai
, Dezhong Peng
:
Distribution Consistency Guided Hashing for Cross-Modal Retrieval. 5623-5632 - Luanyuan Dai

, Xiaoyu Du
, Jinhui Tang
:
TrGa: Reconsidering the Application of Graph Neural Networks in Two-View Correspondence Pruning. 5633-5642 - Han Jiang

, Haoyu Tang
, Ming Yan
, Ji Zhang
, Mingzhu Xu
, Yupeng Hu
, Jihua Zhu
, Liqiang Nie
:
Revisiting Unsupervised Temporal Action Localization: The Primacy of High-Quality Actionness and Pseudolabels. 5643-5652 - Yu Liao

, Xinfeng Zhang
, Rui Yang
, Jianwei Tao
, Bai Liu
, Zhipeng Hu
, Shuang Wang
, Zeng Zhao
:
Selection and Reconstruction of Key Locals: A Novel Specific Domain Image-Text Retrieval Method. 5653-5662 - Wei Yang

, Qingchen Yang
:
Multimodal-aware Multi-intention Learning for Recommendation. 5663-5672 - Liupeng Li

, Yuhua Zheng
, Shupeng Liu
, Xiaoyin Xu
, Taihao Li
:
Domain Knowledge Enhanced Vision-Language Pretrained Model for Dynamic Facial Expression Recognition. 5673-5682 - Yuting Zhang

, Zhao Zhang
, Yiqing Wu
, Ying Sun
, Fuzhen Zhuang
, Wenhui Yu
, Lantao Hu
, Han Li
, Kun Gai
, Zhulin An
, Yongjun Xu
:
Tag Tree-Guided Multi-grained Alignment for Multi-Domain Short Video Recommendation. 5683-5691 - Kai Shao

, Rui Wang
, Yixue Hao
, Long Hu
, Min Chen
, Hans-Arno Jacobsen
:
Multimodal Physiological Signals Representation Learning via Multiscale Contrasting for Depression Recognition. 5692-5701 - Xinyu Li

, Wenqing Ye
, Yueyi Zhang
, Xiaoyan Sun
:
GRACE: GRadient-based Active Learning with Curriculum Enhancement for Multimodal Sentiment Analysis. 5702-5711 - Yuchen Pan

, Junjun Jiang
, Kui Jiang
, Xianming Liu
:
Disentangled-Multimodal Privileged Knowledge Distillation for Depression Recognition with Incomplete Multimodal Data. 5712-5721 - Yuanyuan Liu

, Yuxuan Huang
, Shuyang Liu
, Yibing Zhan
, Zijing Chen
, Zhe Chen
:
Open-Set Video-based Facial Expression Recognition with Human Expression-sensitive Prompting. 5722-5731 - Aoqiang Zhu

, Min Hu
, Xiaohua Wang
, Jiaoyun Yang
, Yiming Tang
, Fuji Ren
:
KEBR: Knowledge Enhanced Self-Supervised Balanced Representation for Multimodal Sentiment Analysis. 5732-5741 - Zining Wang

, Jinyang Guo
, Ruihao Gong
, Yang Yong
, Aishan Liu
, Yushi Huang
, Jiaheng Liu
, Xianglong Liu
:
PTSBench: A Comprehensive Post-Training Sparsity Benchmark Towards Algorithms and Models. 5742-5751 - Longan Wang

, Yang Qin
, Yuan Sun
, Dezhong Peng
, Xi Peng
, Peng Hu
:
Robust Contrastive Cross-modal Hashing with Noisy Labels. 5752-5760 - Xiying Zheng

, Yukang Zhang
, Yang Lu
, Hanzi Wang
:
Semi-supervised Visible-Infrared Person Re-identification via Modality Unification and Confidence Guidance. 5761-5770 - Ziyang Zhou

, Pinghui Wang
, Zi Liang
, Ruofei Zhang
, Haitao Bai
:
PAIR: Pre-denosing Augmented Image Retrieval Model for Defending Adversarial Patches. 5771-5779 - Daiqing Wu

, Dongbao Yang
, Yu Zhou
, Can Ma
:
Robust Multimodal Sentiment Analysis of Image-Text Pairs by Distribution-Based Feature Recovery and Fusion. 5780-5789 - Kunlun Xu

, Haozhuo Zhang
, Yu Li
, Yuxin Peng
, Jiahuan Zhou
:
Mitigate Catastrophic Remembering via Continual Knowledge Purification for Noisy Lifelong Person Re-Identification. 5790-5799 - Wei Shen

, Mang Ye
, Wenke Huang
:
Resisting Over-Smoothing in Graph Neural Networks via Dual-Dimensional Decoupling. 5800-5809 - Junlin Fang

, Wenya Wang
, Guosheng Lin
, Fengmao Lv
:
Sentiment-oriented Sarcasm Integration for Video Sentiment Analysis Enhancement with Sarcasm Assistance. 5810-5819 - Fanfan Wang

, Heqing Ma
, Xiangqing Shen
, Jianfei Yu
, Rui Xia
:
Observe before Generate: Emotion-Cause aware Video Caption for Multimodal Emotion Cause Generation in Conversations. 5820-5828 - Yang Yang

, Liyuan Cao
, Haoyu Shi
, Huaiwen Zhang
:
Multi-Instance Multi-Label Learning for Text-motion Retrieval. 5829-5837 - Hongzu Su

, Jingjing Li
, Fengling Li
, Ke Lu
, Lei Zhu
:
SOIL: Contrastive Second-Order Interest Learning for Multimodal Recommendation. 5838-5846 - Jiansong Qi

, Yaping Huang
, Ying Zhang
, Sihui Zhang
, Mei Tian
, Yi Tian
, Fanchao Meng
, Lin Guan
, Tianyi Chang
:
Visual Question Answering Driven Eye Tracking Paradigm for Identifying Children with Autism Spectrum Disorder. 5847-5855 - Dongxiao He

, Jinghan Zhang
, Xiaobao Wang
, Meng Ge
, Zhiyong Feng
, Longbiao Wang
, Xiaoke Ma
:
TUT4CRS: Time-aware User-preference Tracking for Conversational Recommendation System. 5856-5864 - Guoqing Yang

, Zhiming Luo
, Jianzhe Gao
, Yingxin Lai
, Kun Yang
, Yifan He
, Shaozi Li
:
A Multilevel Guidance-Exploration Network and Behavior-Scene Matching Method for Human Behavior Anomaly Detection. 5865-5873 - Zekun Ai

, Xiaotong Luo
, Yanyun Qu
, Yuan Xie
:
SkipVSR: Adaptive Patch Routing for Video Super-Resolution with Inter-Frame Mask. 5874-5882 - Qianxin Huang

, Siyao Peng
, Xiaobo Shen
, Yunhao Yuan
, Shirui Pan
:
Similarity Preserving Transformer Cross-Modal Hashing for Video-Text Retrieval. 5883-5891 - Wenxiao Zhang

, Hossein Rahmani
, Xun Yang
, Jun Liu
:
Reverse2Complete: Unpaired Multimodal Point Cloud Completion via Guided Diffusion. 5892-5901 - Yitong Sun

, Yao Huang
, Xingxing Wei
:
Embodied Laser Attack: Leveraging Scene Priors to Achieve Agent-based Robust Non-contact Attacks. 5902-5910 - Yipo Huang

, Xiangfei Sheng
, Zhichao Yang
, Quan Yuan
, Zhichao Duan
, Pengfei Chen
, Leida Li
, Weisi Lin
, Guangming Shi
:
AesExpert: Towards Multi-modality Foundation Model for Image Aesthetics Perception. 5911-5920 - Ji Qiu

, Peng Lu
, Xujun Peng
, Wenhao Guo
, Zhaoran Zhao
, Xiangtao Dong
:
Learning Realistic Sketching: A Dual-agent Reinforcement Learning Approach. 5921-5929 - Xiaobo Shen

, Gaoyao Yu
, Yinfan Chen
, Xichen Yang
, Yuhui Zheng
:
Graph Convolutional Semi-Supervised Cross-Modal Hashing. 5930-5938 - Harry Cheng

, Yangyang Guo
, Tianyi Wang
, Liqiang Nie
, Mohan S. Kankanhalli
:
Diffusion Facial Forgery Detection. 5939-5948 - Hengxing Liu

, Mingjia Li
, Xiaojie Guo
:
Regional Attention For Shadow Removal. 5949-5957 - Hao Fang

, Haoyuan Zhao
, Jianxin Shi
, Miao Zhang
, Guanzhen Wu
, Yi Ching Chou
, Feng Wang
, Jiangchuan Liu
:
Robust Live Streaming over LEO Satellite Constellations: Measurement, Analysis, and Handover-Aware Adaptation. 5958-5966 - Qi Zang

, Shuang Wang
, Dong Zhao
, Yang Hu
, Dou Quan
, Jinlong Li
, Nicu Sebe
, Zhun Zhong
:
Generalized Source-Free Domain-adaptive Segmentation via Reliable Knowledge Propagation. 5967-5976 - Yunqiang Pei

, Jialei Tang
, Qihang Tang
, Mingfeng Zha
, Dongyu Xie
, Guoqing Wang
, Zhitao Liu
, Ning Xie
, Peng Wang
, Yang Yang
, Hengtao Shen
:
Emotion Recognition in HMDs: A Multi-task Approach Using Physiological Signals and Occluded Faces. 5977-5986 - Xiaochao Pan

, Jiawei Yao
, Hongrui Kou
, Tong Wu
, Canran Xiao
:
HarmonicNeRF: Geometry-Informed Synthetic View Augmentation for 3D Scene Reconstruction in Driving Scenarios. 5987-5996 - Guangyao Li

, Henghui Du
, Di Hu
:
Boosting Audio Visual Question Answering via Key Semantic-Aware Cues. 5997-6005 - Jiongming Qin

, Fei Luo
, Tuo Cao
, Wenju Xu
, Chunxia Xiao
:
HS-Surf: A Novel High-Frequency Surface Shell Radiance Field to Improve Large-Scale Scene Rendering. 6006-6014 - Gang Wu

, Junjun Jiang
, Kui Jiang
, Xianming Liu
:
Harmony in Diversity: Improving All-in-One Image Restoration via Multi-Task Collaboration. 6015-6023 - Meichen Liu

, Shuting He
, Songnan Lin
, Bihan Wen
:
Dual-head Genre-instance Transformer Network for Arbitrary Style Transfer. 6024-6032 - Yingjie Zhou

, Zicheng Zhang
, Wei Sun
, Xiaohong Liu
, Xiongkuo Min
, Guangtao Zhai
:
Subjective and Objective Quality-of-Experience Assessment for 3D Talking Heads. 6033-6042 - Zhi Zhou

, Junke Zhu
, Zhangjin Huang
:
Gaussian Splatting with Neural Basis Extension. 6043-6052 - Zhenyu Zhang

, Guangyao Chen
, Yixiong Zou
, Yuhua Li
, Ruixuan Li
:
Learning Unknowns from Unknowns: Diversified Negative Prototypes Generator for Few-shot Open-Set Recognition. 6053-6062 - Jinxiao Zhang

, Runmin Dong
, Juepeng Zheng
, Mengxuan Chen
, Lixian Zhang
, Yi Zhao
, Haohuan Fu
:
Spatial-Temporal Context Model for Remote Sensing Imagery Compression. 6063-6072 - Weiying Xie

, Mei Yuan
, Jitao Ma
, Yunsong Li
:
Adaptive Pruning of Channel Spatial Dependability in Convolutional Neural Networks. 6073-6082 - Heng Fang

, Sheng Huang
, Wenhao Tang
, Luwen Huangfu
, Bo Liu
:
SAM-MIL: A Spatial Contextual Aware Multiple Instance Learning Approach for Whole Slide Image Classification. 6083-6092 - Wenhao Shen

, Wanqi Yin
, Hao Wang
, Chen Wei
, Zhongang Cai
, Lei Yang
, Guosheng Lin
:
HMR-Adapter: A Lightweight Adapter with Dual-Path Cross Augmentation for Expressive Human Mesh Recovery. 6093-6102 - Shalayiding Sirejiding

, Bayram Bayramli
, Yuxiang Lu
, Yuwen Yang
, Tamam Alsarhan
, Hongtao Lu
, Yue Ding
:
Task-Interaction-Free Multi-Task Learning with Efficient Hierarchical Feature Representation. 6103-6112 - Yiyong Xiao

, Kai Shu
, Haoyi Zhang
, Baohua Yin
, Wai Seng Cheang
, Haoyang Wang
, Jiechao Gao
:
EGGesture: Entropy-Guided Vector Quantized Variational AutoEncoder for Co-Speech Gesture Generation. 6113-6122 - Yuqi Sun

, Qing Lin
, Weimin Tan
, Bo Yan
:
Audio-Driven Identity Manipulation for Face Inpainting. 6123-6132 - Leilei Ma

, Hongxing Xie
, Lei Wang
, Yanping Fu
, Dengdi Sun
, Haifeng Zhao
:
Text-Region Matching for Multi-Label Image Recognition with Missing Labels. 6133-6142 - Zhengwei Yin

, Guixu Lin
, Mengshun Hu
, Hao Zhang
, Yinqiang Zheng
:
FlexIR: Towards Flexible and Manipulable Image Restoration. 6143-6152 - Hamed Alimohammadzadeh

, Shahram Ghandeharizadeh
:
Swarical: An Integrated Hierarchical Approach to Localizing Flying Light Specks. 6153-6161 - Xiaowen Cai

, Yunbo Tao
, Daizong Liu
, Pan Zhou
, Xiaoye Qu
, Jianfeng Dong
, Keke Tang
, Lichao Sun
:
Frequency-Aware GAN for Imperceptible Transfer Attack on 3D Point Clouds. 6162-6171 - Mingjin Zhang

, Shilong Liu
, Yuanjun Ouyang
, Jie Guo
, Zhihong Tang
, Yunsong Li
:
Explore Hybrid Modeling for Moving Infrared Small Target Detection. 6172-6181 - Yuhui Quan

, Xiaoheng Tan
, Yan Huang
, Yong Xu
, Hui Ji
:
Enhancing Underwater Images via Asymmetric Multi-Scale Invertible Networks. 6182-6191 - Lishuang Zhan

, Enting Ying
, Jiabao Gan
, Shihui Guo
, Boyu Gao
, Yipeng Qin
:
SATPose: Improving Monocular 3D Pose Estimation with Spatial-aware Ground Tactility. 6192-6201 - Hongjian Zhan

, Yangfu Li
, Yu-Jie Xiong
, Umapada Pal
, Yue Lu
:
Free Lunch: Frame-level Contrastive Learning with Text Perceiver for Robust Scene Text Recognition in Lightweight Models. 6202-6211 - Xin Wang

, Kai Chen
, Xingjun Ma
, Zhineng Chen
, Jingjing Chen
, Yu-Gang Jiang
:
AdvQDet: Detecting Query-Based Adversarial Attacks with Adversarial Contrastive Prompt Tuning. 6212-6221 - Xudong Lv

, Zhiwei He
, Yuxiang Yang
, Jiahao Nie
, Jing Zhang
:
SAR-SLAM: Self-Attentive Rendering-based SLAM with Neural Point Cloud Encoding. 6222-6231 - Shao-Kui Zhang

, Junkai Huang
, Liang Yue
, Jia-Tong Zhang
, Jia-Hong Liu
, Yu-Kun Lai
, Song-Hai Zhang
:
SceneExpander: Real-Time Scene Synthesis for Interactive Floor Plan Editing. 6232-6240 - Long Tian

, Hongyi Zhao
, Ruiying Lu
, Rongrong Wang
, Yujie Wu
, Liming Wang
, Xiongpeng He
, Xiyang Liu
:
FOCT: Few-shot Industrial Anomaly Detection with Foreground-aware Online Conditional Transport. 6241-6249 - Chuang Liu

, Yichao Cao
, Xiu Su
, Haogang Zhu
:
Universal Frequency Domain Perturbation for Single-Source Domain Generalization. 6250-6259 - Yushun Tang

, Shuoshuo Chen
, Jiyuan Jia
, Yi Zhang
, Zhihai He
:
Domain-Conditioned Transformer for Fully Test-time Adaptation. 6260-6269 - Zhiru Wang

, Shiyun Xie
, Chengwei Pan
, Guoping Wang
:
SpecGaussian with Latent Features: A High-quality Modeling of the View-dependent Appearance for 3D Gaussian Splatting. 6270-6278 - Wencheng Han

, Chen Zhang
, Yang Zhou
, Wentao Liu
, Chen Qian
, Chengzhong Xu
, Jianbing Shen
:
Prior Metadata-Driven RAW Reconstruction: Eliminating the Need for Per-Image Metadata. 6279-6287 - Fulin Luo

, Yi Liu
, Xiuwen Gong
, Zhixiong Nan
, Tan Guo
:
EMVCC: Enhanced Multi-View Contrastive Clustering for Hyperspectral Images. 6288-6296 - Fan Nie

, Jiangqun Ni
, Jian Zhang
, Bin Zhang
, Weizhe Zhang
:
FRADE: Forgery-aware Audio-distilled Multimodal Learning for Deepfake Detection. 6297-6306 - Siru Zhong

, Xixuan Hao
, Yibo Yan
, Ying Zhang
, Yangqiu Song
, Yuxuan Liang
:
UrbanCross: Enhancing Satellite Image-Text Retrieval with Cross-Domain Adaptation. 6307-6315 - Yuzhen Niu

, Lifen Yang
, Rui Xu
, Yuezhou Li
, Yuzhong Chen
:
MiNet: Weakly-Supervised Camouflaged Object Detection through Mutual Interaction between Region and Edge Cues. 6316-6325 - Delong Zhang

, Yi-Xing Peng
, Xiao-Ming Wu
, Ancong Wu
, Weishi Zheng
:
PixelFade: Privacy-preserving Person Re-identification with Noise-guided Progressive Replacement. 6326-6334 - Wei He

, Xiang Li
, Shengtian Xu
, Yuzheng Chen
, Chan-In Sio
, Ge Lin Kan
, Lik-Hang Lee
:
MetaDragonBoat: Exploring Paddling Techniques of Virtual Dragon Boating in a Metaverse Campus. 6335-6344 - Yuxuan Lu

, Jiahao Nie
, Zhiwei He
, Hongjie Gu
, Xudong Lv
:
VoxelTrack: Exploring Multi-level Voxel Representation for 3D Point Cloud Object Tracking. 6345-6354 - Yu Liu

, Longhan Feng
, Qi Jia
, Zezheng Liu
, Zi-Huang Cao
:
Two Teachers Are Better Than One: Semi-supervised Elliptical Object Detection by Dual-Teacher Collaborative Guidance. 6355-6363 - Yao Luo

, Ming Yang
, Jinhui Tang
:
Dual-view Pyramid Network for Video Frame Interpolation. 6364-6373 - Junxiong Lin

, Zen Tao
, Xuan Tong
, Xinji Mai
, Haoran Wang
, Boyang Wang
, Yan Wang
, Qing Zhao
, Jiawen Yu
, Yuxuan Lin
, Shaoqi Yan
, Shuyong Gao
, Wenqiang Zhang
:
Suppressing Uncertainties in Degradation Estimation for Blind Super-Resolution. 6374-6383 - Wenxiao Zhang

, Ziqi Wang
, Li Xu
, Xun Yang
, Jun Liu
:
Informative Point cloud Dataset Extraction for Classification via Gradient-based Points Moving. 6384-6393 - Jia-Hong Liu

, Shao-Kui Zhang
, Chuyue Zhang
, Song-Hai Zhang
:
Controllable Procedural Generation of Landscapes. 6394-6403 - Fangjian Liao

, Xingxing Zou
, Waikeung Wong
:
Uni-DlLoRA: Style Fine-Tuning for Fashion Image Translation. 6404-6413 - Yusen Wang

, Kaixuan Zhou
, Wenxiao Zhang
, Chunxia Xiao
:
MegaSurf: Scalable Large Scene Neural Surface Reconstruction. 6414-6423 - Zherui Qiu

, Chenqu Ren
, Kaiwen Song
, Xiaoyi Zeng
, Leyuan Yang
, Juyong Zhang
:
Deformable NeRF using Recursively Subdivided Tetrahedra. 6424-6432 - Mamta

, Gopendra Vikram Singh
, Deepak Raju Kori
, Asif Ekbal
:
Aspect-Based Multimodal Mining: Unveiling Sentiments, Complaints, and Beyond in User-Generated Content. 6433-6442 - Zichen Liu

, Yuxin Peng
, Jiahuan Zhou
:
InsVP: Efficient Instance Visual Prompting from Image Itself. 6443-6452 - Zidu Wang

, Xiangyu Zhu
, Jiang Yu
, Tianshuo Zhang
, Zhen Lei
:
S2TD-Face: Reconstruct a Detailed 3D Face with Controllable Texture from a Single Sketch. 6453-6462 - Satoshi Kosugi

:
Prompt-Guided Image-Adaptive Neural Implicit Lookup Tables for Interpretable Image Enhancement. 6463-6471 - Xun Jiang

, Zhuoyuan Wei
, Shenshen Li
, Xing Xu
, Jingkuan Song
, Heng Tao Shen
:
Counterfactually Augmented Event Matching for De-biased Temporal Sentence Grounding. 6472-6481 - Bingzhi Chen

, Ruihan Liu
, Yishu Liu
, Xiaozhao Fang
, Jiahui Pan
, Guangming Lu
, Zheng Zhang
:
Stay Focused is All You Need for Adversarial Robustness. 6482-6491 - Zhi Zeng

, Minnan Luo
, Xiangzheng Kong
, Huan Liu
, Hao Guo
, Hao Yang
, Zihan Ma
, Xiang Zhao
:
Mitigating World Biases: A Multimodal Multi-View Debiasing Framework for Fake News Video Detection. 6492-6500 - Zibin Liu

, Banglei Guan
, Yang Shang
, Shunkun Liang
, Zhenbao Yu
, Qifeng Yu
:
Optical Flow-Guided 6DoF Object Pose Tracking with an Event Camera. 6501-6509 - Junran Wu

, Xueyuan Chen
, Shangzhe Li
:
Uncovering Capabilities of Model Pruning in Graph Contrastive Learning. 6510-6519 - Zheng Wei

, Yuzheng Chen
, Wai Tong
, Xuan Zong
, Huamin Qu
, Xian Xu
, Lik-Hang Lee
:
Hearing the Moment with MetaEcho! From Physical to Virtual in Synchronized Sound Recording. 6520-6529 - Cong Wang

, Chengjin Yu
, Jie Mu
, Wei Wang
:
PercepLIE: A New Path to Perceptual Low-Light Image Enhancement. 6530-6539 - Xin Cheng

, Hao Wang
, Jinwei Wang
, Xiangyang Luo
, Bin Ma
:
Advancing Quantization Steps Estimation: A Two-Stream Network Approach for Enhancing Robustness. 6540-6548 - Mingjin Zhang

, Longyi Li
, Wenxuan Shi
, Jie Guo
, Yunsong Li
, Xinbo Gao
:
VmambaSCI: Dynamic Deep Unfolding Network with Mamba for Compressive Spectral Imaging. 6549-6558 - Rui-Chen Zheng

, Yang Ai
, Zhen-Hua Ling
:
Speech Reconstruction from Silent Lip and Tongue Articulation by Diffusion Models and Text-Guided Pseudo Target Generation. 6559-6568 - Junyuan Guo

, Hao Tang
, Teng Wang
, Chao Wang
:
R4D-planes: Remapping Planes For Novel View Synthesis and Self-Supervised Decoupling of Monocular Videos. 6569-6577 - Wu Chen

, Hehe Fan
, Qiuping Jiang
, Chao Huang
, Yi Yang
:
Progressive Point Cloud Denoising with Cross-Stage Cross-Coder Adaptive Edge Graph Convolution Network. 6578-6587 - Mingyang Sun

, Qipeng Yan
, Zhuoer Liang
, Dongliang Kou
, Dingkang Yang
, Ruisheng Yuan
, Xiao Zhao
, Mingcheng Li
, Lihua Zhang
:
IF-Garments: Reconstructing Your Intersection-Free Multi-Layered Garments from Monocular Videos. 6588-6597 - Bo Dong

, Pichao Wang
, Hao Luo
, Fan Wang
:
Adaptive Query Selection for Camouflaged Instance Segmentation. 6598-6606 - Yuxin Mao

, Xuyang Shen
, Jing Zhang
, Zhen Qin
, Jinxing Zhou
, Mochu Xiang
, Yiran Zhong
, Yuchao Dai
:
TAVGBench: Benchmarking Text to Audible-Video Generation. 6607-6616 - Yuan Tang

, Xu Han
, Xianzhi Li
, Qiao Yu
, Yixue Hao
, Long Hu
, Min Chen
:
MiniGPT-3D: Efficiently Aligning 3D Point Clouds with Large Language Models using 2D Priors. 6617-6626 - Guan Luo

, Tian-Xing Xu
, Ying-Tian Liu
, Xiaoxiong Fan
, Fang-Lue Zhang
, Song-Hai Zhang
:
3D Gaussian Editing with A Single Image. 6627-6636 - Zhenhong Sun

, Junyan Wang
, Zhiyu Tan
, Daoyi Dong
, Hailan Ma
, Hao Li
, Dong Gong
:
EGGen: Image Generation with Multi-entity Prior Learning through Entity Guidance. 6637-6645 - Zhengzhong Kuang

, Jianan Lu
, Chenhui Hong
, Haobin Huang
, Suguo Zhu
, Xiaowei Zhao
, Jun Yu
, Jianping Fan
:
Latent Representation Reorganization for Face Privacy Protection. 6646-6655 - Wulin Xie

, Xiaohuan Lu
, Yadong Liu
, Jiang Long
, Bob Zhang
, Shuping Zhao
, Jie Wen
:
Uncertainty-Aware Pseudo-Labeling and Dual Graph Driven Network for Incomplete Multi-View Multi-Label Classification. 6656-6665 - Mingzhao Yang

, Shangchao Su
, Bin Li
, Xiangyang Xue
:
FedDEO: Description-Enhanced One-Shot Federated Learning with Diffusion Models. 6666-6675 - Ruiyang Xia

, Dawei Zhou
, Decheng Liu
, Lin Yuan
, Shuodi Wang
, Jie Li
, Nannan Wang
, Xinbo Gao
:
Advancing Generalized Deepfake Detector with Forgery Perception Guidance. 6676-6685 - Hongye Hou

, Xuehao Gao
, Zhan Liu
, Yang Yang
:
Dig into Detailed Structures: Key Context Encoding and Semantic-based Decoding for Point Cloud Completion. 6686-6695 - Tao Liu

, Feilong Chen
, Shuai Fan
, Chenpeng Du
, Qi Chen
, Xie Chen
, Kai Yu
:
AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding. 6696-6705 - Qi Chen

, Wenjie Liu
, Hu Ding
:
A Novel Confidence Guided Training Method for Conditional GANs with Auxiliary Classifier. 6706-6714 - Yukang Lin

, Haonan Han
, Chaoqun Gong
, Zunnan Xu
, Yachao Zhang
, Xiu Li
:
Consistent123: One Image to Highly Consistent 3D Asset Using Case-Aware Diffusion Priors. 6715-6724 - Zhaoyu Zhang

, Yang Hua
, Guanxiong Sun
, Hui Wang
, Seán F. McLoone
:
Improving the Training of the GANs with Limited Data via Dual Adaptive Noise Injection. 6725-6734 - Changgu Chen

, Libing Yang
, Xiaoyan Yang
, Lianggangxu Chen
, Gaoqi He
, Changbo Wang
, Yang Li
:
FIND: Fine-tuning Initial Noise Distribution with Policy Optimization for Diffusion Models. 6735-6744 - Tianyi Lu

, Xing Zhang
, Jiaxi Gu
, Renjing Pei
, Songcen Xu
, Xingjun Ma
, Hang Xu
, Zuxuan Wu
:
Fuse Your Latents: Video Editing with Multi-source Latent Diffusion Models. 6745-6754 - Zhichao Liao

, Fengyuan Piao
, Di Huang
, Xinghui Li
, Yue Ma
, Pingfa Feng
, Heming Fang
, Long Zeng
:
Freehand Sketch Generation from Mechanical Components. 6755-6764 - Qishan Zhang

, Shuangbing Wen
, Tao Hu
:
Audio Deepfake Detection with Self-Supervised XLS-R and SLS Classifier. 6765-6773 - Bohong Chen

, Yumeng Li
, Yao-Xiang Ding
, Tianjia Shao
, Kun Zhou
:
Enabling Synergistic Full-Body Control in Prompt-Based Co-Speech Motion Generation. 6774-6783 - Xiangcheng Du

, Zhao Zhou
, Xingjiao Wu
, Yanlong Wang
, Zhuoyao Wang
, Yingbin Zheng
, Cheng Jin
:
MultiColor: Image Colorization by Learning from Multiple Color Spaces. 6784-6792 - Haozhe Jia

, Yan Li
, Hengfei Cui
, Di Xu
, Yuwang Wang
, Tao Yu
:
DisControlFace: Adding Disentangled Control to Diffusion Autoencoder for One-shot Explicit Facial Image Editing. 6793-6802 - Lutao Jiang

, Hangyu Li
, Lin Wang
:
A General Framework to Boost 3D GS Initialization for Text-to-3D Generation by Lexical Richness. 6803-6812 - Yiluo Wei

, Gareth Tyson
:
Understanding the Impact of AI-Generated Content on Social Media: The Pixiv Case. 6813-6822 - Ruiqi Zhang

, Jie Chen
:
Mesh-Centric Gaussian Splatting for Human Avatar Modelling with Real-time Dynamic Mesh Reconstruction. 6823-6832 - Bo Xiong

, Changqing Su
, Zihan Lin
, Yanqin Chen
, You Zhou
, Zhen Cheng
, Zhaofei Yu
, Tiejun Huang
:
Real-time Parameter Evaluation of High-speed Microfluidic Droplets using Continuous Spike Streams. 6833-6841 - Qi Mao

, Lan Chen
, Yuchao Gu
, Zhen Fang
, Mike Zheng Shou
:
MAG-Edit: Localized Image Editing in Complex Scenarios via Mask-Based Attention-Adjusted Guidance. 6842-6850 - Guan-Yuan Chen

, Von-Wun Soo
:
Controllable Music Loops Generation with MIDI and Text via Multi-Stage Cross Attention and Instrument-Aware Reinforcement Learning. 6851-6859 - Weitian Zhang

, Yichao Yan
, Yunhui Liu
, Xingdong Sheng
, Xiaokang Yang
:
E3Gen: Efficient, Expressive and Editable Avatars Generation. 6860-6869 - Haibo Yang

, Yang Chen
, Yingwei Pan
, Ting Yao
, Zhineng Chen
, Chong-Wah Ngo
, Tao Mei
:
Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models. 6870-6879 - Shuo Huang

, Shikun Sun
, Zixuan Wang
, Xiaoyu Qin
, Yanmin Xiong
, Yuan Zhang
, Pengfei Wan
, Di Zhang
, Jia Jia
:
PlacidDreamer: Advancing Harmony in Text-to-3D Generation. 6880-6889 - Xiaodi Li

:
Streamable Portrait Video Editing with Probabilistic Pixel Correspondence. 6890-6899 - Xuan Hai

, Xin Liu
, Yuan Tan
, Gang Liu
, Song Li
, Weina Niu
, Rui Zhou
, Xiaokang Zhou
:
What's the Real: A Novel Design Philosophy for Robust AI-Synthesized Voice Detection. 6900-6909 - Xiangyang Luo

, Xin Zhang
, Yifan Xie
, Xinyi Tong
, Weijiang Yu
, Heng Chang
, Fei Ma
, Fei Richard Yu
:
CodeSwap: Symmetrically Face Swapping Based on Prior Codebook. 6910-6919 - Ruofan Wang

, Xingjun Ma
, Hanxu Zhou
, Chuanjun Ji
, Guangnan Ye
, Yu-Gang Jiang
:
White-box Multimodal Jailbreaks Against Large Vision-Language Models. 6920-6928 - Anwen Hu

, Yaya Shi
, Haiyang Xu
, Jiabo Ye
, Qinghao Ye
, Ming Yan
, Chenliang Li
, Qi Qian
, Ji Zhang
, Fei Huang
:
mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large Language Model. 6929-6938 - Weifeng Chen

, Tao Gu
, Yuhao Xu
, Arlene Chen
:
Magic Clothing: Controllable Garment-Driven Image Synthesis. 6939-6948 - Yiluo Wei

, Yiming Zhu
, Pan Hui
, Gareth Tyson
:
Exploring the Use of Abusive Generative AI Models on Civitai. 6949-6958 - Xiuliang Duan

, Dating Tan
, Liangda Fang
, Yuyu Zhou
, Chaobo He
, Ziliang Chen
, Lusheng Wu
, Guanliang Chen
, Zhiguo Gong
, Weiqi Luo
, Quanlong Guan
:
Reason-and-Execute Prompting: Enhancing Multi-Modal Large Language Models for Solving Geometry Questions. 6959-6968 - Weiye Xu

, Min Wang
, Wengang Zhou
, Houqiang Li
:
P-RAG: Progressive Retrieval Augmented Generation For Planning on Embodied Everyday Task. 6969-6978 - Wenjie Xuan

, Yufei Xu
, Shanshan Zhao
, Chaoyue Wang
, Juhua Liu
, Bo Du
, Dacheng Tao
:
When ControlNet Meets Inexplicit Masks: A Case Study of ControlNet on its Contour-following Ability. 6979-6988 - Wenshuo Chen

, Hongru Xiao
, Erhang Zhang
, Lijie Hu
, Lei Wang
, Mengyuan Liu
, Chen Chen
:
SATO: Stable Text-to-Motion Framework. 6989-6997 - Zhen Ye

, Zeqian Ju
, Haohe Liu
, Xu Tan
, Jianyi Chen
, Yiwen Lu
, Peiwen Sun
, Jiahao Pan
, Weizhen Bian
, Shulin He
, Wei Xue
, Qifeng Liu
, Yike Guo
:
FlashSpeech: Efficient Zero-Shot Speech Synthesis. 6998-7007 - Huadai Liu

, Rongjie Huang
, Yang Liu
, Hengyuan Cao
, Jialei Wang
, Xize Cheng
, Siqi Zheng
, Zhou Zhao
:
AudioLCM: Efficient and High-Quality Text-to-Audio Generation with Minimal Inference Steps. 7008-7017 - Jiaxu Zhang

, Xin Chen
, Gang Yu
, Zhigang Tu
:
Generative Motion Stylization of Cross-structure Characters within Canonical Motion Space. 7018-7026 - Fengqi Liu

, Hexiang Wang
, Jingyu Gong
, Ran Yi
, Qianyu Zhou
, Xuequan Lu
, Jiangbo Lu
, Lizhuang Ma
:
Emphasizing Semantic Consistency of Salient Posture for Speech-Driven Gesture Generation. 7027-7035 - Tianyi Zheng

, Cong Geng
, Peng-Tao Jiang
, Ben Wan
, Hao Zhang
, Jinwei Chen
, Jia Wang
, Bo Li
:
Non-uniform Timestep Sampling: Towards Faster Diffusion Model Training. 7036-7045 - Miaoxin Ye

, Saixing Zhou
, Weiqi Luo
, Shunquan Tan
, Jiwu Huang
:
GAN-based Symmetric Embedding Costs Adjustment for Enhancing Image Steganographic Security. 7046-7054 - Yaqi Li

, Han Fang
, Zerun Feng
, Kaijing Ma
, Chao Ban
, Xianghao Zang
, Lanxiang Zhou
, Zhongjiang He
, Jingyan Chen
, Jiani Hu
, Hao Sun
, Huayu Zhang
:
GOAL: Grounded text-to-image Synthesis with Joint Layout Alignment Tuning. 7055-7064 - Jinfeng Wei

, Xiaofeng Zhang
:
DOPRA: Decoding Over-accumulation Penalization and Re-allocation in Specific Weighting Layer. 7065-7074 - Yang Luo

, Yiheng Zhang
, Zhaofan Qiu
, Ting Yao
, Zhineng Chen
, Yu-Gang Jiang
, Tao Mei
:
FreeEnhance: Tuning-Free Image Enhancement via Content-Consistent Noising-and-Denoising Process. 7075-7084 - Wenquan Lu

, Yufei Xu
, Jing Zhang
, Chaoyue Wang
, Dacheng Tao
:
HandRefiner: Refining Malformed Hands in Generated Images by Diffusion-based Conditional Inpainting. 7085-7093 - Miao Liu

, Jing Wang
, Xinyuan Qian
, Haizhou Li
:
ListenFormer: Responsive Listening Head Generation with Non-autoregressive Transformers. 7094-7103 - Jie Hu

, Jie Li
, Yue Ma
, Liujuan Cao
, Songan Zhang
, Wei Zhang
, Guannan Jiang
, Rongrong Ji
:
Prompting to Adapt Foundational Segmentation Models. 7104-7112 - Zhiyuan Ma

, Guoli Jia
, Biqing Qi
, Bowen Zhou
:
Safe-SD: Safe and Traceable Stable Diffusion with Text Prompt Trigger for Invisible Generative Watermarking. 7113-7122 - Jin Sun

, Xiaoshuang Shi
, Zhiyuan Wang
, Kaidi Xu
, Heng Tao Shen
, Xiaofeng Zhu
:
Caterpillar: A Pure-MLP Architecture with Shifted-Pillars-Concatenation. 7123-7132 - Yuanbin Wang

, Weilun Dai
, Long Chan
, Huanyu Zhou
, Aixi Zhang
, Si Liu
:
GPD-VVTO: Preserving Garment Details in Video Virtual Try-On. 7133-7142 - Hengfei Wang

, Zhongqun Zhang
, Yihua Cheng
, Hyung Jin Chang
:
TextGaze: Gaze-Controllable Face Generation with Natural Language. 7143-7151 - Huiming Zheng

, Wei Gao
, Zhuozhen Yu
, Tiesong Zhao
, Ge Li:
ViewPCGC: View-Guided Learned Point Cloud Geometry Compression. 7152-7161 - Liyang He

, Zhenya Huang
, Chenglong Liu
, Rui Li
, Runze Wu
, Qi Liu
, Enhong Chen
:
One-bit Deep Hashing: Towards Resource-Efficient Hashing Model with Binary Neural Network. 7162-7171 - Xinghao Wu

, Xuefeng Liu
, Jianwei Niu
, Haolin Wang
, Shaojie Tang
, Guogang Zhu
, Hao Su
:
Decoupling General and Personalized Knowledge in Federated Learning via Additive and Low-rank Decomposition. 7172-7181 - Hengyi Wang

, Weiying Xie
, Jitao Ma
, Daixun Li
, Yunsong Li
:
FedSLS: Exploring Federated Aggregation in Saliency Latent Space. 7182-7190 - Zhongchi Wang

, Hailong Sun
, Zhengyang Zhao
:
FedEvalFair: A Privacy-Preserving and Statistically Grounded Federated Fairness Evaluation Framework. 7191-7199 - Weitao Tang

, Jianqiang Li
, Meijie Du
, Die Hu
, Qingyun Liu
:
Zenith: Real-time Identification of DASH Encrypted Video Traffic with Distortion. 7200-7209 - Beizhang Guo

, Juntao Bao
, Baili Chai
, Di Wu
, Miao Hu
:
Lumos: Optimizing Live 360-degree Video Upstreaming via Spatial-Temporal Integrated Neural Enhancement. 7210-7219 - Zhongnian Li

, Meng Wei
, Peng Ying
, Tongfeng Sun
, Xinzheng Xu
:
Learning from Concealed Labels. 7220-7228 - Xiangxiang Dai

, Zeyu Zhang
, Peng Yang
, Yuedong Xu
, Xutong Liu
, John C. S. Lui
:
AxiomVision: Accuracy-Guaranteed Adaptive Visual Model Selection for Perspective-Aware Video Analytics. 7229-7238 - Shuo Wang

, Yongcai Wang
, Zhimin Xu
, Yongyu Guo
, Wanting Li
, Zhe Huang
, Xuewei Bai
, Deying Li
:
GSLAMOT: A Tracklet and Query Graph-based Simultaneous Locating, Mapping, and Multiple Object Tracking System. 7239-7248 - Yiyang Jiang

, Wengyu Zhang
, Xulu Zhang
, Xiaoyong Wei
, Chang Wen Chen
, Qing Li
:
Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment Retrieval. 7249-7258
Oral Session 18: Fairness, Trust, Explainability & Interpretability in Multimedia
- Peiwen Sun, Honggang Zhang

, Di Hu
:
Unveiling and Mitigating Bias in Audio Visual Segmentation. 7259-7268 - Ying Liu

, Lihong Liu
, Cai Xu
, Xiangyu Song
, Ziyu Guan
, Wei Zhao
:
Dynamic Evidence Decoupling for Trusted Multi-view Learning. 7269-7277 - Wei Liu

, Yufei Chen
, Xiaodong Yue
:
Building Trust in Decision with Conformalized Multi-view Deep Classification. 7278-7287 - Daoming Zong

, Chaoyue Ding
, Kaitao Chen
:
Toward Explainable Physical Audiovisual Commonsense Reasoning. 7288-7297 - Jingjie Zeng

, Zhihao Yang
, Qi Yang
, Liang Yang
, Hongfei Lin
:
Peeling Back the Layers: Interpreting the Storytelling of ViT. 7298-7306 - Chihaya Matsuhira

, Marc A. Kastner
, Takahiro Komamizu
, Takatsugu Hirayama
, Ichiro Ide
:
Investigating Conceptual Blending of a Diffusion Model for Improving Nonword-to-Image Generation. 7307-7315
Oral Session 19: Multimodal Applications
- Minghui Wu

, Chenxu Zhao
, Anyang Su
, Donglin Di
, Tianyu Fu
, Da An
, Min He
, Ya Gao
, Meng Ma
, Kun Yan
, Ping Wang
:
Hypergraph Multi-modal Large Language Model: Exploiting EEG and Eye-tracking Modalities to Evaluate Heterogeneous Responses for Video Understanding. 7316-7325 - Yanglin Deng

, Tianyang Xu
, Chunyang Cheng
, Xiao-Jun Wu
, Josef Kittler
:
MMDRFuse: Distilled Mini-Model with Dynamic Refresh for Multi-Modality Image Fusion. 7326-7335 - Ziyan Li

, Jianfei Yu
, Jia Yang
, Wenya Wang
, Li Yang
, Rui Xia
:
Generative Multimodal Data Augmentation for Low-Resource Multimodal Named Entity Recognition. 7336-7345 - Zhiqi Ge

, Hongzhe Huang
, Mingze Zhou
, Juncheng Li
, Guoming Wang
, Siliang Tang
, Yueting Zhuang
:
WorldGPT: Empowering LLM as Multimodal World Model. 7346-7355 - Yiming Li

, Zhifang Guo
, Xiangdong Wang
, Hong Liu
:
Advancing Multi-grained Alignment for Contrastive Language-Audio Pre-training. 7356-7365 - Yingxuan Li

, Ryota Hinami
, Kiyoharu Aizawa
, Yusuke Matsui
:
Zero-Shot Character Identification and Speaker Prediction in Comics via Iterative Multimodal Fusion. 7366-7374
Oral Session 20: Datasets & Algorithms for Multimedia Analysis
- Chunyi Li

, Haoning Wu
, Hongkun Hao
, Zicheng Zhang
, Tengchuan Kou
, Chaofeng Chen
, Lei Bai
, Xiaohong Liu
, Weisi Lin
, Guangtao Zhai
:
G-Refine: A General Quality Refiner for Text-to-Image Generation. 7375-7384 - Wenqiang Xu

, Wenrui Dai
, Ziyang Zheng
, Chenglin Li
, Junni Zou
, Hongkai Xiong
:
Point Cloud Upsampling with Geometric Algebra Driven Inverse Heat Dissipation. 7385-7394 - Junyan Wu

, Wei Lu
, Xiangyang Luo
, Rui Yang
, Qian Wang
, Xiaochun Cao
:
Coarse-to-Fine Proposal Refinement Framework for Audio Temporal Forgery Detection and Localization. 7395-7403 - Fujun Han

, Peng Ye
, Shukai Duan
, Lidan Wang
:
Ada-iD: Active Domain Adaptation for Intrusion Detection. 7404-7413 - Zhixi Cai

, Shreya Ghosh
, Aman Pankaj Adatia
, Munawar Hayat
, Abhinav Dhall
, Tom Gedeon
, Kalin Stefanov
:
AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dataset. 7414-7423 - Rintaro Yanagi

, Ren Togo
, Takahiro Ogawa
, Miki Haseyama
:
DQG: Database Question Generation for Exact Text-based Image Retrieval. 7424-7433
Oral Session 21: Image Enhancement and Super-Resolution
- Tongshun Zhang

, Pingping Liu
, Ming Zhao
, Haotian Lv
:
DMFourLLIE: Dual-Stage and Multi-Branch Fourier Network for Low-Light Image Enhancement. 7434-7443 - Fei Gao

, Yuhao Lin
, Jiaqi Shi
, Maoying Qiao
, Nannan Wang
:
AesMamba: Universal Image Aesthetic Assessment with State Space Models. 7444-7453 - Yi Dong

, Yuxi Wang
, Zheng Fang
, Wenqi Ouyang
, Xianhui Lin
, Zhiqi Shen
, Peiran Ren
, Xuansong Xie
, Qingming Huang
:
MovingColor: Seamless Fusion of Fine-grained Video Color Enhancement. 7454-7463 - Ruibin Li

, Jingcai Guo
, Qihua Zhou
, Song Guo
:
FreePIH: Training-Free Painterly Image Harmonization with Diffusion Model. 7464-7473 - Qian Huang

, Cheng Xu
, Guiqing Li
, Ziheng Wu
, Shengxin Liu
, Shengfeng He
:
Portrait Shadow Removal via Self-Exemplar Illumination Equalization. 7474-7482 - Qiwen Zhu

, Yanjie Wang
, Shilv Cai
, Liqun Chen
, Jiahuan Zhou
, Luxin Yan
, Sheng Zhong
, Xu Zou
:
Perceptual-Distortion Balanced Image Super-Resolution is a Multi-Objective Optimization Problem. 7483-7492
Oral Session 22: Audio-visual Datasets and Applications
- Han Wang

, Tan Rui Yang
, Usman Naseem
, Roy Ka-Wei Lee
:
MultiHateClip: A Multilingual Benchmark Dataset for Hateful Video Detection on YouTube and Bilibili. 7493-7502 - Jiale Yu

, Baopeng Zhang
, Zhu Teng
, Jianping Fan
:
OpenAVE: Moving towards Open Set Audio-Visual Event Localization. 7503-7512 - Xinfa Zhu

, Wenjie Tian
, Xinsheng Wang
, Lei He
, Yujia Xiao
, Xi Wang
, Xu Tan
, Sheng Zhao
, Lei Xie
:
UniStyle: Unified Style Modeling for Speaking Style Captioning and Stylistic Speech Synthesis. 7513-7522 - Zhedong Zhang

, Liang Li
, Gaoxiang Cong
, Haibing Yin
, Yuhan Gao
, Chenggang Yan
, Anton van den Hengel
, Yuankai Qi
:
From Speaker to Dubber: Movie Dubbing with Prosody and Duration Consistency Learning. 7523-7532 - Ruohao Guo

, Liao Qu
, Dantong Niu
, Yanyu Qi
, Wenzhen Yue
, Ji Shi
, Bowei Xing
, Xianghua Ying
:
Open-Vocabulary Audio-Visual Semantic Segmentation. 7533-7541
Oral Session 23: Multimodal Learning and Recommendation Systems
- Hongcheng Li

, Yucan Zhou
, Xiaoyan Gu
, Bo Li
, Weiping Wang
:
Diversified Semantic Distribution Matching for Dataset Distillation. 7542-7550 - Jinghao Zhang

, Guofan Liu
, Qiang Liu
, Shu Wu
, Liang Wang
:
Modality-Balanced Learning for Multimedia Recommendation. 7551-7560 - Ziyi Ye

, Jingtao Zhan
, Qingyao Ai
, Yiqun Liu
, Maarten de Rijke
, Christina Lioma
, Tuukka Ruotsalo
:
Query Augmentation with Brain Signals. 7561-7570 - Lei Shi

, Jiapeng Yang
, Pengtao Lv
, Lu Yuan
, Feifei Kou
, Jia Luo
, Mingying Xu
:
Self-derived Knowledge Graph Contrastive Learning for Recommendation. 7571-7580 - Jiaye Lin

, Qing Li
, Guorui Xie
, Zhongxu Guan
, Yong Jiang
, Ting Xu
, Zhong Zhang
, Peilin Zhao
:
Mitigating Sample Selection Bias with Robust Domain Adaption in Multimedia Recommendation. 7581-7590 - Yangqin Jiang

, Lianghao Xia
, Wei Wei
, Da Luo
, Kangyi Lin
, Chao Huang
:
DiffMM: Multi-Modal Diffusion Model for Recommendation. 7591-7599
Oral Session 24: Novel Multimedia Applications 2
- Tongtong Feng

, Xin Wang
, Feilin Han
, Leping Zhang
, Wenwu Zhu
:
U2UData: A Large-scale Cooperative Perception Dataset for Swarm UAVs Autonomous Flight. 7600-7608 - Chaoqun Niu

, Dongdong Chen
, Jizhe Zhou
, Jian Wang
, Xiang Luo
, Quan-Hui Liu
, Yuan Li
, Jiancheng Lv
:
Neural Boneprint: Person Identification from Bones Using Generative Contrastive Deep Learning. 7609-7618 - Xueli Hu

, Huan Liu
, Haocheng Yuan
, Zhiyang Fu
, Yizhi Luo
, Ning Zhang
, Hang Zou
, Jianwen Gan
, Yuan Zhang
:
Fine-Grained Prompt Learning for Face Anti-Spoofing. 7619-7628 - Xiao Han

, Yiming Ren
, Yichen Yao
, Yujing Sun
, Yuexin Ma
:
Towards Practical Human Motion Prediction with LiDAR Point Clouds. 7629-7638 - Haodong Hong

, Sen Wang
, Zi Huang
, Qi Wu
, Jiajun Liu
:
Navigating Beyond Instructions: Vision-and-Language Navigation in Obstructed Environments. 7639-7648 - Minghe Gao

, Juncheng Li
, Hao Fei
, Liang Pang
, Wei Ji
, Guoming Wang
, Zheqi Lv
, Wenqiao Zhang
, Siliang Tang
, Yueting Zhuang
:
De-fine: Decomposing and Refining Visual Programs with Auto-Feedback. 7649-7657
Oral Session 25: Media and Communication Technologies
- Jingjing Liu

, Youyi Zheng
, Kun Zhou
:
Virtual Agent Positioning Driven by Personal Characteristics. 7658-7666 - Meng Luo

, Hao Fei
, Bobo Li
, Shengqiong Wu
, Qian Liu
, Soujanya Poria
, Erik Cambria
, Mong-Li Lee
, Wynne Hsu
:
PanoSent: A Panoptic Sextuple Extraction Benchmark for Multimodal Conversational Aspect-based Sentiment Analysis. 7667-7676 - Yawen Luo

, Min Shi
, Liao Shen
, Yachuan Huang
, Zixuan Ye
, Juewen Peng
, Zhiguo Cao
:
Video Bokeh Rendering: Make Casual Videography Cinematic. 7677-7685 - Zhenyu Zhang

, Guangyao Chen
, Yixiong Zou
, Zhimeng Huang
, Yuhua Li
, Ruixuan Li
:
MICM: Rethinking Unsupervised Pretraining for Enhanced Few-shot Learning. 7686-7695 - Zejun Zhang

, Xiao Zhu
, Anlan Zhang
, Feng Qian
:
An In-depth Study of Bandwidth Allocation across Media Sources in Video Conferencing. 7696-7704 - Zixuan Yang

, Yushu Zhang
, Tao Wang
, Zhongyun Hua
, Zhihua Xia
, Jian Weng
:
Once-for-all: Efficient Visual Face Privacy Protection via Person-specific Veils. 7705-7713
Oral Session 26: Cultural Heritage & Media Analysis
- Shipeng Zhu

, Hui Xue
, Na Nie
, Chenjie Zhu
, Haiyue Liu
, Pengfei Fang
:
Reproducing the Past: A Dataset for Benchmarking Inscription Restoration. 7714-7723 - Jiao Pan

, Liang Li
, Hiroshi Yamaguchi
, Kyoko Hasegawa
, Fadjar Ibnu Thufail
, Brahmantara
, Xiaojuan Ban
, Satoshi Tanaka
:
Reconstructing, Understanding, and Analyzing Relief Type Cultural Heritage from a Single Old Photo. 7724-7733 - Yi Bin

, Wenhao Shi
, Yujuan Ding
, Zhiqiang Hu
, Zheng Wang
, Yang Yang
, See-Kiong Ng
, Heng Tao Shen
:
GalleryGPT: Analyzing Paintings with Large Multimodal Models. 7734-7743 - Jun Ma

, Tuukka Ruotsalo
:
Cognition-Supervised Saliency Detection: Contrasting EEG Signals and Visual Stimuli. 7744-7753 - Yizhang Liu

, Weiwei Zhou
, Yanping Li
, Shengjie Zhao
:
RoSe: Rotation-Invariant Sequence-Aware Consensus for Robust Correspondence Pruning. 7754-7763 - Yujia Wang

, Fang-Lue Zhang
, Neil A. Dodgson
:
ScanTD: 360° Scanpath Prediction based on Time-Series Diffusion. 7764-7773
Oral Session 27: Security & Quality in Multimedia Systems
- Dunyun Chen

, Xin Liao
, Xiaoshuai Wu
, Shiwei Chen
:
SafePaint: Anti-forensic Image Inpainting with Domain Adaptation. 7774-7782 - Zicheng Zhang

, Haoning Wu
, Yingjie Zhou
, Chunyi Li
, Wei Sun
, Chaofeng Chen
, Xiongkuo Min
, Xiaohong Liu
, Weisi Lin
, Guangtao Zhai
:
LMM-PCQA: Assisting Point Cloud Quality Assessment with LMM. 7783-7792 - Tengchuan Kou

, Xiaohong Liu
, Zicheng Zhang
, Chunyi Li
, Haoning Wu
, Xiongkuo Min
, Guangtao Zhai
, Ning Liu
:
Subjective-Aligned Dataset and Metric for Text-to-Video Quality Assessment. 7793-7802 - Puyi Wang

, Wei Sun
, Zicheng Zhang
, Jun Jia
, Yanwei Jiang
, Zhichao Zhang
, Xiongkuo Min
, Guangtao Zhai
:
Large Multi-modality Model Assisted AI-Generated Image Quality Assessment. 7803-7812 - Xuemei Zhou

, Irene Viola
, Yunlu Chen
, Jiahuan Pei
, Pablo César
:
Deciphering Perceptual Quality in Colored Point Cloud: Prioritizing Geometry or Texture Distortion? 7813-7822 - Desen Yuan

, Lei Wang
:
Dual-Criterion Quality Loss for Blind Image Quality Assessment. 7823-7832
Oral Session 28: Complex Scene Processing
- Zhe Huang

, Shuo Wang
, Yongcai Wang
, Wanting Li
, Deying Li
, Lei Wang
:
RoCo: Robust Cooperative Perception By Iterative Object Matching and Pose Adjustment. 7833-7842 - Shao-Kui Zhang

, Hanxi Zhu
, Xuebin Chen
, Jinghuan Chen
, Zhike Peng
, Ziyang Chen
, Yong-Liang Yang
, Song-Hai Zhang
:
ScenePhotographer: Object-Oriented Photography for Residential Scenes. 7843-7851 - Changli Wu

, Yihang Liu
, Jiayi Ji
, Yiwei Ma
, Haowei Wang
, Gen Luo
, Henghui Ding
, Xiaoshuai Sun
, Rongrong Ji
:
3D-GRES: Generalized 3D Referring Expression Segmentation. 7852-7861 - Xuan Han

, Yihao Zhao
, Mingyu You
:
Scene Diffusion: Text-driven Scene Image Synthesis Conditioning on a Single 3D Model. 7862-7870 - Jinbo Yan

, Rui Peng
, Luyang Tang
, Ronggang Wang
:
4D Gaussian Splatting with Scale-aware Residual Field and Adaptive Optimization for Real-time Rendering of Temporally Complex Dynamic Scenes. 7871-7880 - Hongtao Wu

, Yijun Yang
, Huihui Xu
, Weiming Wang
, Jinni Zhou
, Lei Zhu
:
RainMamba: Enhanced Locality Learning with State Space Models for Video Deraining. 7881-7890
Oral Session 29: Enhancements in Video Streaming and Compression
- Bo Wu

, Tong Li
, Cheng Luo
, Xu Yan
, Fuyu Wang
, Xinle Du
, Ke Xu
:
Toward Timeliness-Enhanced Loss Recovery for Large-Scale Live Streaming. 7891-7899 - Fangtao Zhou

, Xiaofeng Huang
, Peng Zhang
, Meng Wang
, Zhao Wang
, Yang Zhou
, Haibing Yin
:
Enhanced Screen Content Image Compression: A Synergistic Approach for Structural Fidelity and Text Integrity Preservation. 7900-7908 - Miao Zhang

, Jiaxing Li
, Haoyuan Zhao
, Linfeng Shen
, Jiangchuan Liu
:
StarStream: Live Video Analytics over Space Networking. 7909-7917 - Pengqiang Bi

, Yifei Zou
, Mengbai Xiao
, Dongxiao Yu
, Yijun Li
, Zhixiong Liu
, Qun Xie
:
LiteQUIC: Improving QoE of Video Streams by Reducing CPU Overhead of QUIC. 7918-7927 - Yili Jin

, Xize Duan
, Fangxin Wang
, Xue Liu
:
HeadsetOff: Enabling Photorealistic Video Conferencing on Economical VR Headsets. 7928-7936 - Zihan Zheng

, Houqiang Zhong
, Qiang Hu
, Xiaoyun Zhang
, Li Song
, Ya Zhang
, Yanfeng Wang
:
HPC: Hierarchical Progressive Coding Framework for Volumetric Video. 7937-7946
Poster Session 3
- Lianghui Zhu

, Junwei Zhou
, Yan Liu
, Xin Hao
, Wenyu Liu
, Xinggang Wang
:
WeakSAM: Segment Anything Meets Weakly-supervised Instance-level Recognition. 7947-7956 - Xiangyu Sun

, Joo Chan Lee
, Daniel Rho
, Jong Hwan Ko
, Usman Ali
, Eunbyung Park
:
F-3DGS: Factorized Coordinates and Representations for 3D Gaussian Splatting. 7957-7965 - Sijing Wu

, Yunhao Li
, Yichao Yan
, Huiyu Duan
, Ziwei Liu
, Guangtao Zhai
:
MMHead: Towards Fine-grained Multi-modal 3D Facial Animation. 7966-7975 - Chunxiao Li

, Shuyang Wang
, Xuejing Kang
, Anlong Ming
:
Thinking Temporal Automatic White Balance: Datasets, Models and Benchmarks. 7976-7984 - Zhe Luo

, Weina Fu
, Shuai Liu
, Saeed Anwar
, Muhammad Saqib
, Sambit Bakshi
, Khan Muhammad
:
Cefdet: Cognitive Effectiveness Network Based on Fuzzy Inference for Action Detection. 7985-7994 - Wencan Huang

, Daizong Liu
, Wei Hu
:
Advancing 3D Object Grounding Beyond a Single 3D Scene. 7995-8004 - Bin Huang

, Feng He
, Qi Wang
, Hong Chen
, Guohao Li
, Zhifan Feng
, Xin Wang
, Wenwu Zhu
:
Neighbor Does Matter: Curriculum Global Positive-Negative Sampling for Vision-Language Pre-training. 8005-8014 - Haoyuan Jin

, Xuesong Nie
, Yunfeng Yan
, Xi Chen
, Zhihang Zhu
, Donglian Qi
:
Object-Level Pseudo-3D Lifting for Distance-Aware Tracking. 8015-8023 - Xinwei Liu

, Xiaojun Jia
, Yuan Xun
, Siyuan Liang
, Xiaochun Cao
:
Multimodal Unlearnable Examples: Protecting Data against Multimodal Contrastive Learning. 8024-8033 - Ge Luo

, Yuchen Ma
, Manman Zhang
, Junqiang Huang
, Sheng Li
, Zhenxing Qian
, Xinpeng Zhang
:
Engaging Live Video Comments Generation. 8034-8042 - Lu Chen

, Qiangchang Wang
, Zhaohui Li
, Yilong Yin
:
Hypergraph-guided Intra- and Inter-category Relation Modeling for Fine-grained Visual Recognition. 8043-8052 - Yuan Xie

, Yichen Zhang
, Yifang Yin
, Sheng Zhang
, Ying Zhang
, Rajiv Ratn Shah
, Roger Zimmermann
, Guoqing Xiao
:
Traj2Former: A Local Context-aware Snapshot and Sequential Dual Fusion Transformer for Trajectory Classification. 8053-8061 - Guilin Li

, Mengdan Zhang
, Xiawu Zheng
, Peixian Chen
, Zihan Wang
, Yunhang Shen
, Mingchen Zhuge
, Chenglin Wu
, Fei Chao
, Ke Li
, Xing Sun
, Rongrong Ji
:
Multimodal Inplace Prompt Tuning for Open-set Object Detection. 8062-8071 - Shengran Cheng

, Chuhang Ma
, Ye Pan
:
StylizedFacePoint: Facial Landmark Detection for Stylized Characters. 8072-8080 - Sheng Zhang

, Xi Yang
:
Information Fusion with Knowledge Distillation for Fine-grained Remote Sensing Object Detection. 8081-8089 - Bowen Zhao

, Qianqian Wang
, Zhiqiang Tao
, Wei Feng
, Quanxue Gao
:
DFMVC: Deep Fair Multi-view Clustering. 8090-8099 - Ruyu Liu

, Zhengzhe Liu
, Haoyu Zhang
, Guodao Zhang
, Jianhua Zhang
, Bo Sun
, Weiguo Sheng
, Xiufeng Liu
, Yaochu Jin
:
ColVO: Colonoscopic Visual Odometry Considering Geometric and Photometric Consistency. 8100-8109 - Xun Lin

, Yi Yu
, Zitong Yu
, Ruohan Meng
, Jiale Zhou
, Ajian Liu
, Yizhong Liu
, Shuai Wang
, Wenzhong Tang
, Zhen Lei
, Alex C. Kot
:
HideMIA: Hidden Wavelet Mining for Privacy-Enhancing Medical Image Analysis. 8110-8119 - Shuyuan Liu

, Jiawei Chen
, Shouwei Ruan
, Hang Su
, Zhaoxia Yin
:
Exploring the Robustness of Decision-Level Through Adversarial Attacks on LLM-Based Embodied Models. 8120-8128 - Jiahe Tian

, Cai Yu
, Xi Wang
, Peng Chen
, Zihao Xiao
, Jizhong Han
, Yesheng Chai
:
Dynamic Mixed-Prototype Model for Incremental Deepfake Detection. 8129-8138 - Tianshan Liu

, Kin-Man Lam
, Bing-Kun Bao
:
Label Text-aided Hierarchical Semantics Mining for Panoramic Activity Recognition. 8139-8148 - Xiaoda Yang

, Xize Cheng
, Dongjie Fu
, Minghui Fang
, Jialong Zuo
, Shengpeng Ji
, Zhou Zhao
, Tao Jin:
SyncTalklip: Highly Synchronized Lip-Readable Speaker Generation with Multi-Task Learning. 8149-8158 - Jingjun Yi

, Qi Bi
, Hao Zheng
, Haolan Zhan
, Wei Ji
, Yawen Huang
, Yuexiang Li
, Yefeng Zheng
:
Learning Spectral-Decomposited Tokens for Domain Generalized Semantic Segmentation. 8159-8168 - Peng Yin

, Xiaosu Zhu
, Jingkuan Song
, Lianli Gao
, Heng Tao Shen
:
SI-BiViT: Binarizing Vision Transformers with Spatial Interaction. 8169-8178 - Ao Li

, Huijun Liu
, Jinrong Sheng
, Zhongming Chen
, Yongxin Ge
:
Efficient Dual-Confounding Eliminating for Weakly-supervised Temporal Action Localization. 8179-8188 - Xuri Ge

, Junchen Fu
, Fuhai Chen
, Shan An
, Nicu Sebe
, Joemon M. Jose
:
Towards End-to-End Explainable Facial Action Unit Recognition via Vision-Language Joint Learning. 8189-8198 - Jongbhin Woo

, Hyeonggon Ryu
, Youngjoon Jang
, Jae-Won Cho
, Joon Son Chung
:
Let Me Finish My Sentence: Video Temporal Grounding with Holistic Text Understanding. 8199-8208 - Jiali Chen

, Xusen Hei
, Yuqi Xue
, Yuancheng Wei
, Jiayuan Xie
, Yi Cai
, Qing Li
:
Learning to Correction: Explainable Feedback Generation for Visual Commonsense Reasoning Distractor. 8209-8218 - Yu-Pei Song

, Yuantong Liu
, Xiao Wu
, Qi He
, Zhaoquan Yuan
, Ao Luo
:
MagicCartoon: 3D Pose and Shape Estimation for Bipedal Cartoon Characters. 8219-8227 - Ajian Liu

, Hui Ma
, Junze Zheng
, Haocheng Yuan
, Xiaoyuan Yu
, Yanyan Liang
, Sergio Escalera
, Jun Wan
, Zhen Lei
:
FM-CLIP: Flexible Modal CLIP for Face Anti-Spoofing. 8228-8237 - Jiaqi Guo

, Lianli Gao
, Junchen Zhu
, Jiaxin Zhang
, Siyang Li
, Jingkuan Song
:
MagicVFX: Visual Effects Synthesis in Just Minutes. 8238-8246 - Kangzheng Liu

, Feng Zhao
, Yu Yang
, Guandong Xu
:
DySarl: Dynamic Structure-Aware Representation Learning for Multimodal Knowledge Graph Reasoning. 8247-8256 - Weicai Yan

, Ye Wang
, Wang Lin
, Zirun Guo
, Zhou Zhao
, Tao Jin
:
Low-rank Prompt Interaction for Continual Vision-Language Retrieval. 8257-8266 - Jing Zhou

, Ziqi Yu
, Zhongyun Bao
, Gang Fu
, Weilei He
, Chao Liang
, Chunxia Xiao
:
Foreground Harmonization and Shadow Generation for Composite Image. 8267-8276 - Zhen-Xiang Ma

, Zhen-Duo Chen
, Li-Jun Zhao
, Zi-Chao Zhang
, Tai Zheng
, Xin Luo
, Xin-Shun Xu
:
Bi-directional Task-Guided Network for Few-Shot Fine-Grained Image Classification. 8277-8286 - Xiao He

, Chang Tang
, Xinwang Liu
, Chuankun Li
, Shan An
, Zhenglai Li
:
Heterogeneous Graph Guided Contrastive Learning for Spatially Resolved Transcriptomics Data. 8287-8295 - Yabing Wang

, Le Wang
, Qiang Zhou
, Zhibin Wang
, Hao Li
, Gang Hua
, Wei Tang
:
Multimodal LLM Enhanced Cross-lingual Cross-modal Retrieval. 8296-8305 - Zhiwen Yang

, Liang Li
, Jiehua Zhang
, Tingyu Wang
, Yaoqi Sun
, Chenggang Yan
:
Domain Shared and Specific Prompt Learning for Incremental Monocular Depth Estimation. 8306-8315 - Shuting He

, Henghui Ding
:
RefMask3D: Language-Guided Transformer for 3D Referring Segmentation. 8316-8325 - Yunwei Bai

, Bill Yang Cai
, Ying Kiat Tan
, Zangwei Zheng
, Shiming Chen
, Tsuhan Chen
:
FSL-QuickBoost: Minimal-Cost Ensemble for Few-Shot Learning. 8326-8335 - Jinhui Pang

, Changqing Lin
, Xiaoshuai Hao
, Rong Yin
, Zixuan Wang
, Zhihui Zhang
, Jinglin He
, Huang Tai Sheng
:
FTF-ER: Feature-Topology Fusion-Based Experience Replay Method for Continual Graph Learning. 8336-8344 - Fengmao Lv

, Changru Nie
, Jianyang Zhang
, Guowu Yang
, Guosheng Lin
, Xiao Wu
, Tianrui Li
:
Rethinking the Effect of Uninformative Class Name in Prompt Learning. 8345-8354 - Yuhan Wang

, Mofei Song
:
UniL: Point Cloud Novelty Detection through Multimodal Pre-training. 8355-8364 - Zeyu Xiao

, Zhihe Lu
, Xinchao Wang
:
P-BiC: Ultra-High-Definition Image Moiré Patterns Removal via Patch Bilateral Compensation. 8365-8373 - Jing Yang, Shundong Yang

, Yuan Gao, Jieming Yang, Laurence T. Yang
:
Multimodal Contextual Interactions of Entities: A Modality Circular Fusion Approach for Link Prediction. 8374-8382 - Chaolei Tan

, Zihang Lin
, Junfu Pu
, Zhongang Qi
, Wei-Yi Pei
, Zhi Qu
, Yexin Wang
, Ying Shan
, Wei-Shi Zheng
, Jian-Fang Hu
:
SynopGround: A Large-Scale Dataset for Multi-Paragraph Video Grounding from TV Dramas and Synopses. 8383-8392 - Buyu Liu

, Kai Wang
, Yansong Liu
, Jun Bao
, Tingting Han
, Jun Yu
:
MVPbev: Multi-view Perspective Image Generation from BEV with Test-time Controllability and Generalizability. 8393-8401 - Junzhang Liu

, Zhecan Wang
, Hammad A. Ayyubi
, Haoxuan You
, Chris Thomas
, Rui Sun
, Shih-Fu Chang
, Kai-Wei Chang
:
Detecting Multimodal Situations with Insufficient Context and Abstaining from Baseless Predictions. 8402-8411 - Yingchun Wang

, Jingcai Guo
, Song Guo
, Yi Liu
, Jie Zhang
, Weizhan Zhang
:
SFP: Spurious Feature-Targeted Pruning for Out-of-Distribution Generalization. 8412-8420 - Yao Li

, Jiajun Deng
, Yuxuan Xiao
, Yingjie Wang
, Xiaomeng Chu
, Jianmin Ji
, Yanyong Zhang
:
FARFusion V2: A Geometry-based Radar-Camera Fusion Method on the Ground for Roadside Far-Range 3D Object Detection. 8421-8430 - Fangdi Wang

, Jiaqi Jin
, Zhibin Dong
, Xihong Yang
, Yu Feng
, Xinwang Liu
, Xinzhong Zhu
, Siwei Wang
, Tianrui Liu
, En Zhu
:
View Gap Matters: Cross-view Topology and Information Decoupling for Multi-view Clustering. 8431-8440 - Wenjie Wei

, Yu Liang
, Ammar Belatreche
, Yichen Xiao
, Honglin Cao
, Zhenbang Ren
, Guoqing Wang
, Malu Zhang
, Yang Yang
:
Q-SNNs: Quantized Spiking Neural Networks. 8441-8450 - Shihua Zhang

, Jiayi Ma
:
DiffGlue: Diffusion-Aided Image Feature Matching. 8451-8460 - Xueyang Li

, Yu Song
, Yunzhong Lou
, Xiangdong Zhou
:
CAD Translator: An Effective Drive for Text to 3D Parametric Computer-Aided Design Generative Modeling. 8461-8470 - Weichen Xu, Jian Cao

, Tianhao Fu
, Ruilong Ren, Zicong Hu
, Xixin Cao
, Xing Zhang
:
Point Cloud Reconstruction Is Insufficient to Learn 3D Representations. 8471-8479 - Xiao Yu

, Kejiang Chen
, Kai Zeng
, Han Fang
, Zijin Yang
, Xiuwei Shang
, Yuang Qi
, Weiming Zhang
, Nenghai Yu
:
SemGIR: Semantic-Guided Image Regeneration Based Method for AI-generated Image Detection and Attribution. 8480-8488 - Jiahua Xiao

, Yang Liu
, Shizhou Zhang
, Xing Wei
:
Bridging Fourier and Spatial-Spectral Domains for Hyperspectral Image Denoising. 8489-8497 - Heng Jia

, Yunqiu Xu
, Linchao Zhu
, Guang Chen
, Yufei Wang
, Yi Yang:
MoS2: Mixture of Scale and Shift Experts for Text-Only Video Captioning. 8498-8507 - Qi Zhang

, Chi Huang
, Qian Zhang
, Nan Li
, Wei Feng
:
Learning Geometry Consistent Neural Radiance Fields from Sparse and Unposed Views. 8508-8517 - Zihan Fang

, Shide Du
, Yuhong Chen
, Shiping Wang
:
Beyond the Known: Ambiguity-Aware Multi-view Learning. 8518-8526 - Jingchao Wang

, Zhengnan Deng
, Tongxu Lin
, Wenyuan Li
, Shaobin Ling
, Junyu Lin
:
Beyond Direct Relationships: Exploring Multi-Order Label Pair Dependencies for Knowledge Distillation. 8527-8535 - Yuhang Li

, Jincen Jiang
, Xiaosong Yang
, Youdong Ding
, Jian Jun Zhang
:
Harmony Everything! Masked Autoencoders for Video Harmonization. 8536-8545 - Linfeng Tang

, Yuxin Deng
, Xunpeng Yi
, Qinglong Yan
, Yixuan Yuan
, Jiayi Ma
:
DRMF: Degradation-Robust Multi-Modal Image Fusion via Composable Diffusion Prior. 8546-8555 - Jintao Chen

, Fan Wang
, Shengye Pang
, Siwei Tan
, Mingshuai Chen
, Tiancheng Zhao
, Meng Xi
, Jianwei Yin
:
UniGM: Unifying Multiple Pre-trained Graph Models via Adaptive Knowledge Aggregation. 8556-8565 - Ziyue Wu

, Junyu Gao
, Changsheng Xu
:
Open-Vocabulary Video Scene Graph Generation via Union-aware Semantic Alignment. 8566-8575 - Li Zheng

, Boyu Chen
, Hao Fei
, Fei Li
, Shengqiong Wu
, Lizi Liao
, Donghong Ji
, Chong Teng
:
Self-Adaptive Fine-grained Multi-modal Data Augmentation for Semi-supervised Muti-modal Coreference Resolution. 8576-8585 - Daqin Luo

, Chengjian Feng
, Yuxuan Nong
, Yiqing Shen
:
AutoM3L: An Automated Multimodal Machine Learning Framework with Large Language Models. 8586-8594 - Xu Zhang

, Zhipeng Xie
, Haiyang Yu
, Qitong Wang
, Peng Wang
, Wei Wang
:
Enhancing Adaptive Deep Networks for Image Classification via Uncertainty-aware Decision Fusion. 8595-8603 - Ran Wang

, Hua Zuo
, Zhen Fang
, Jie Lu
:
Towards Robustness Prompt Tuning with Fully Test-Time Adaptation for CLIP's Zero-Shot Generalization. 8604-8612 - Lijun Zhang

, Wei Suo
, Peng Wang
, Yanning Zhang
:
A Plug-and-Play Method for Rare Human-Object Interactions Detection by Bridging Domain Gap. 8613-8622 - Haojie Wei

, Jun Yuan
, Rui Zhang
, Quanyu Dai
, Yueguo Chen
:
MAJL: A Model-Agnostic Joint Learning Framework for Music Source Separation and Pitch Estimation. 8623-8632 - Binbin Xu

, Jun Yin
, Nan Zhang
:
Graph based Consistency Learning for Contrastive Multi-View Clustering. 8633-8641 - Jiaxin Gao

, Yaohua Liu
:
Enhancing Images with Coupled Low-Resolution and Ultra-Dark Degradations: A Tri-level Learning Framework. 8642-8651 - Qian Qu

, Xinhang Wan
, Weixuan Liang
, Jiyuan Liu
, Yu Feng
, Huiying Xu
, Xinwang Liu
, En Zhu
:
A Lightweight Anchor-Based Incremental Framework for Multi-view Clustering. 8652-8661 - Yao Wu

, Mingwei Xing
, Yachao Zhang
, Yuan Xie
, Yanyun Qu
:
CLIP2UDA: Making Frozen CLIP Reward Unsupervised Domain Adaptation in 3D Semantic Segmentation. 8662-8671 - Zongqian Wu

, Yujing Liu
, Mengmeng Zhan
, Ping Hu
, Xiaofeng Zhu
:
Adaptive Multi-Modality Prompt Learning. 8672-8680 - Shiwei Zhang

, Wei Ke
, Shuai Liu
, Xiaopeng Hong
, Tong Zhang
:
Boosting Semi-supervised Crowd Counting with Scale-based Active Learning. 8681-8690 - Yingjie Gao

, Yanan Zhang
, Ziyue Huang
, Nanqing Liu
, Di Huang
:
PS-TTL: Prototype-based Soft-labels and Test-Time Learning for Few-shot Object Detection. 8691-8700 - Li Yuan

, Yi Cai
, Junsheng Huang
:
Few-Shot Joint Multimodal Entity-Relation Extraction via Knowledge-Enhanced Cross-modal Prompt Model. 8701-8710 - Yijia Wang

, Qianqian Xu
, Yangbangyan Jiang
, Siran Dai
, Qingming Huang
:
Regularized Contrastive Partial Multi-view Outlier Detection. 8711-8720 - Rui Liu

, Mingjie Li
, Shen Zhao
, Ling Chen
, Xiaojun Chang
, Lina Yao
:
In-Context Learning for Zero-shot Medical Report Generation. 8721-8730 - Guoliang Zou

, Yangdong Ye
, Tongji Chen
, Shizhe Hu
:
Learning Dual Enhanced Representation for Contrastive Multi-view Clustering. 8731-8739 - Yang Zhao, Gangwei Xu, Gang Wu:

Hybrid Cost Volume for Memory-Efficient Optical Flow. 8740-8749 - Xiao-Qian Liu

, Minghui Liu
, Zhen-Duo Chen
, Xin Luo
, Xin-Shun Xu
:
Hierarchical Multi-label Learning for Incremental Multilingual Text Recognition. 8750-8758 - Yuzhuo Wang

, Junwei He
, Hongzhi Wang
:
RHKH: Relational Hypergraph Neural Network for Link Prediction on N-ary Knowledge Hypergraph. 8759-8767 - Fengbo Lan

, Chang Wen Chen
:
Understanding and Tackling Scattering and Reflective Flare for Mobile Camera Systems. 8768-8776 - Ziyu Zhao

, Pingping Cai
, Canyu Zhang
, Xiaoguang Li
, Song Wang
:
Crossmodal Few-shot 3D Point Cloud Semantic Segmentation via View Synthesis. 8777-8785 - Jinkai Zheng

, Xinchen Liu
, Boyue Zhang
, Chenggang Yan
, Jiyong Zhang
, Wu Liu
, Yongdong Zhang
:
It Takes Two: Accurate Gait Recognition in the Wild via Cross-granularity Alignment. 8786-8794 - Kenan Huang

, Junbao Zhuo
, Shuhui Wang
, Chi Su
, Qingming Huang
, Huimin Ma
:
Unsupervised Image-to-Video Adaptation via Category-aware Flow Memory Bank and Realistic Video Generation. 8795-8804 - Lv Tang

, Peng-Tao Jiang
, Zhihao Shen
, Hao Zhang
, Jin-Wei Chen
, Bo Li
:
Chain of Visual Perception: Harnessing Multimodal Large Language Models for Zero-shot Camouflaged Object Detection. 8805-8814 - Xinyao Liao

, Wei Wei
, Dangyang Chen
, Yuanyuan Fu
:
UniQ: Unified Decoder with Task-specific Queries for Efficient Scene Graph Generation. 8815-8824 - Siyang Wang

, Jinghao Zhang
, Jie Huang
, Feng Zhao
:
Image-free Pre-training for Low-Level Vision. 8825-8834 - Jiacheng Ruan

, Jingsheng Gao
, Mingye Xie
, Suncheng Xiang
, Zefang Yu
, Ting Liu
, Yuzhuo Fu
, Xiaoye Qu
:
GIST: Improving Parameter Efficient Fine-Tuning via Knowledge Interaction. 8835-8844 - Xuechen Guo

, Wenhao Chai
, Shiyan Li
, Gaoang Wang
:
LLaVA-Ultra: Large Chinese Language and Vision Assistant for Ultrasound. 8845-8854 - Xiao Han

, Zhenduo Zhang
, Yiling Wu
, Xinfeng Zhang
, Zhe Wu
:
Event Traffic Forecasting with Sparse Multimodal Data. 8855-8864 - Wanru Xu

, Zhenjiang Miao
, Yi Tian
, Yigang Cen
, Lili Wan
, Xiaole Ma
:
Probabilistic Distillation Transformer: Modelling Uncertainties for Visual Abductive Reasoning. 8865-8873 - Shiye Wang

, Changsheng Li
, Jialin Tang
, Xing Gong
, Ye Yuan
, Guoren Wang
:
Importance-aware Shared Parameter Subspace Learning for Domain Incremental Learning. 8874-8883 - Chengshun Wang

, Na Zhao
:
GS2-GNeSF: Geometry-Semantics Synergy for Generalizable Neural Semantic Fields. 8884-8892 - Liang Du

, Yukai Shi
, Yan Chen
, Peng Zhou
, Yuhua Qian
:
Fast and Scalable Incomplete Multi-View Clustering with Duality Optimal Graph Filtering. 8893-8902 - Zhilin He

, Yawei Zhang
, Jingchang Mu
, Xiaoyue Gu
, Tianhao Gu
:
LiteGfm: A Lightweight Self-supervised Monocular Depth Estimation Framework for Artifacts Reduction via Guided Image Filtering. 8903-8912 - Chengyi Yang

, Wentao Liu
, Shisong Chen
, Jiayin Qi
, Aimin Zhou
:
Generating Prompts in Latent Space for Rehearsal-free Continual Learning. 8913-8922 - Choubo Ding

, Guansong Pang
:
Improving Out-of-Distribution Detection with Disentangled Foreground and Background Features. 8923-8931 - Yi Lu

, Shenghao Ren
, Qiu Shen
, Xun Cao
:
Leveraging RGB-Pressure for Whole-body Human-to-Humanoid Motion Imitation. 8932-8941 - Li Zhang

, Zean Han
, Yan Zhong
, Qiaojun Yu
, Xingyu Wu
, Xue Wang
, Rujing Wang
:
VoCAPTER: Voting-based Pose Tracking for Category-level Articulated Object via Inter-frame Priors. 8942-8951 - Jinpeng Yu

, Binbin Huang
, Yuxuan Zhang
, Huaxia Li
, Xu Tang
, Shenghua Gao
:
GeoFormer: Learning Point Cloud Completion with Tri-Plane Integrated Transformer. 8952-8961 - Sifan Wu

, Haipeng Chen
, Yifang Yin
, Sihao Hu
, Runyang Feng
, Yingying Jiao
, Ziqi Yang
, Zhenguang Liu
:
Joint-Motion Mutual Learning for Pose Estimation in Video. 8962-8971 - Jiaqi Wang

, Pichao Wang
, Yi Feng
, Huafeng Liu
, Chang Gao
, Liping Jing
:
Align2Concept: Language Guided Interpretable Image Recognition by Visual Prototype and Textual Concept Alignment. 8972-8981 - Siying Xiao

, Mao Ye
, Qichen He
, Shuaifeng Li
, Song Tang
, Xiatian Zhu
:
Adversarial Experts Model for Black-box Domain Adaptation. 8982-8991 - Yayun Wei

, Lei Cao
, Hao Li
, Yilin Dong
:
MB2C: Multimodal Bidirectional Cycle Consistency for Learning Robust Visual Neural Representations. 8992-9000 - Qiang Wang

, Ke Yan
, Shouhong Ding
:
Bilateral Adaptive Cross-Modal Fusion Prompt Learning for CLIP. 9001-9009 - Yifei Gao

, Jiaqi Wang
, Zhiyu Lin
, Jitao Sang
:
AIGCs Confuse AI Too: Investigating and Explaining Synthetic Image-induced Hallucinations in Large Vision-Language Models. 9010-9018 - Haizhuang Liu

, Junbao Zhuo
, Chen Liang
, Jiansheng Chen
, Huimin Ma
:
Affinity3D: Propagating Instance-Level Semantic Affinity for Zero-Shot Point Cloud Semantic Segmentation. 9019-9028 - Zhaojian Li

, Bin Zhao
, Yuan Yuan
:
TAS: Personalized Text-guided Audio Spatialization. 9029-9037 - Congqi Cao

, Yueran Zhang
, Yating Yu
, Qinyi Lv
, Lingtong Min
, Yanning Zhang
:
Task-Adapter: Task-specific Adaptation of Image Models for Few-shot Action Recognition. 9038-9047 - Quanjiang Li

, Tingjin Luo
, Mingdie Jiang
, Jiahui Liao
, Zhangqi Jiang
:
Deep Incomplete Multi-View Network Semi-Supervised Multi-Label Learning with Unbiased Loss. 9048-9056 - Xinyue Liu

, Jiahui Wan
, Linlin Zong
, Bo Xu
:
Conditional Diffusion Model for Open-ended Video Question Answering. 9057-9066 - Yulin He

, Siqi Wang
, Wei Chen
, Tianci Xun
, Yusong Tan
:
Sniffing Threatening Open-World Objects in Autonomous Driving by Open-Vocabulary Models. 9067-9076 - Haosen Sun

, Yiming Li
, Xixiang Lyu
, Jing Ma
:
Learning from Distinction: Mitigating Backdoors Using a Low-Capacity Model. 9077-9086 - Shen Lin

, Xiaoyu Zhang
, Willy Susilo
, Xiaofeng Chen
, Jun Liu
:
GDR-GMA: Machine Unlearning via Direction-Rectified and Magnitude-Adjusted Gradients. 9087-9095 - Timin Gao

, Peixian Chen
, Mengdan Zhang
, Chaoyou Fu
, Yunhang Shen
, Yan Zhang
, Shengchuan Zhang
, Xiawu Zheng
, Xing Sun
, Liujuan Cao
, Rongrong Ji
:
Cantor: Inspiring Multimodal Chain-of-Thought of MLLM. 9096-9105 - Shijie Li

, Yunbin Tu
, Qingyuan Xiang
, Zheng Li
:
MAGIC: Rethinking Dynamic Convolution Design for Medical Image Segmentation. 9106-9115 - Chao Wang

, Yang Zhou
, Liangtian He
, Fenglai Lin
, Hongming Chen
, Liang-Jian Deng
:
Illumination Distribution Prior for Low-light Image Enhancement. 9116-9125 - Pinhan Fu

, Xinyan Liang
, Yuhua Qian
, Qian Guo
, Zhifang Wei
, Wen Li
:
CoMO-NAS: Core-Structures-Guided Multi-Objective Neural Architecture Search for Multi-Modal Classification. 9126-9135 - Yi Liu

, Jiachen Li
, Yanchun Ma
, Qing Xie
, Yongjian Liu
:
HcaNet: Haze-concentration-aware Network for Real-scene Dehazing with Codebook Priors. 9136-9144 - Wenlong Liao

, Sunyuan Qiang
, Xianfei Li
, Xiaolei Chen
, Haoyu Wang
, Yanyan Liang
, Junchi Yan
, Tao He
, Pai Peng
:
CalibRBEV: Multi-Camera Calibration via Reversed Bird's-eye-view Representations for Autonomous Driving. 9145-9154 - Md Tanvir Islam

, Nasir Rahim
, Saeed Anwar
, Muhammad Saqib
, Sambit Bakshi
, Khan Muhammad
:
HazeSpace2M: A Dataset for Haze Aware Single Image Dehazing. 9155-9164 - Xiaojun Chen

, Jimeng Lou
, Wenxi Huang
, Ting Wan
, Qin Zhang
, Min Yang
:
ReCoS: A Novel Benchmark for Cross-Modal Image-Text Retrieval in Complex Real-Life Scenarios. 9165-9174 - Shicheng Yang

, Xiaoxu Li
, Dongliang Chang
, Zhanyu Ma
, Jing-Hao Xue
:
Channel-Spatial Support-Query Cross-Attention for Fine-Grained Few-Shot Image Classification. 9175-9183 - Xiaorui Jiang

, Zhongyi Ma
, Yulin Fu
, Yong Liao
, Pengyuan Zhou
:
Heterogeneity-Aware Federated Deep Multi-View Clustering towards Diverse Feature Representations. 9184-9193 - Jiyuan Zhang

, Kang Chen
, Shiyan Chen
, Yajing Zheng
, Tiejun Huang
, Zhaofei Yu
:
SpikeGS: 3D Gaussian Splatting from Spike Streams with High-Speed Camera Motion. 9194-9203 - Jiangyi Wang

, Zhongyao Cheng
, Na Zhao
, Jun Cheng
, Xulei Yang
:
On-the-fly Point Feature Representation for Point Clouds Analysis. 9204-9213 - Kun Wang

, Hao Liu
, Lirong Jie
, Zixu Li
, Yupeng Hu
, Liqiang Nie
:
Explicit Granularity and Implicit Scale Correspondence Learning for Point-Supervised Video Moment Localization. 9214-9223 - Shaoqing Xu

, Shengyin Jiang
, Fang Li
, Li Liu
, Ziying Song
, Bo Yang
, Zhixin Yang
:
SparseInteraction: Sparse Semantic Guidance for Radar and Camera 3D Object Detection. 9224-9233 - Mahiro Ukai

, Shuhei Kurita
, Atsushi Hashimoto
, Yoshitaka Ushiku
, Nakamasa Inoue
:
AdaCoder: Adaptive Prompt Compression for Programmatic Visual Question Answering. 9234-9243 - Shengwei Zhao

, Linhai Xu
, Yuying Liu
, Shaoyi Du
:
Multi-grained Correspondence Learning of Audio-language Models for Few-shot Audio Recognition. 9244-9252 - Song Wu

, Xiaoyu Wei
, Xinyue Chen
, Yazhou Ren
, Jing He
, Xiaorong Pu
:
Cross-View Mutual Learning for Semi-Supervised Medical Image Segmentation. 9253-9261 - Yunshan Qi

, Lin Zhu
, Yifan Zhao
, Nan Bao
, Jia Li
:
Deblurring Neural Radiance Fields with Event-driven Bundle Adjustment. 9262-9270 - Jingqiao Xiu

, Mengze Li
, Wei Ji
, Jingyuan Chen
, Hanbin Zhao
, Shin'ichi Satoh
, Roger Zimmermann
:
Hierarchical Debiasing and Noisy Correction for Cross-domain Video Tube Retrieval. 9271-9280 - Wenyu Yin

, Shuyuan Lin
, Yang Lu
, Hanzi Wang
:
Diverse Consensuses Paired with Motion Estimation-Based Multi-Model Fitting. 9281-9290 - Andong Lu

, Jiacong Zhao
, Chenglong Li
, Yun Xiao
, Bin Luo
:
Breaking Modality Gap in RGBT Tracking: Coupled Knowledge Distillation. 9291-9300 - Peng Wu

, Xuerong Zhou
, Guansong Pang
, Zhiwei Yang
, Qingsen Yan
, Peng Wang
, Yanning Zhang
:
Weakly Supervised Video Anomaly Detection and Localization with Spatio-Temporal Prompts. 9301-9310 - Pengfei Luo

, Tong Xu
, Che Liu
, Suojuan Zhang
, Linli Xu
, Minglei Li
, Enhong Chen
:
Bridging Gaps in Content and Knowledge for Multimodal Entity Linking. 9311-9320 - Shiyu Tang

, Zhaofan Luo
, Yifan Wang
, Lijun Wang
, Huchuan Lu
, Weibo Su
, Libo Liu
:
LOVD: Large-and-Open Vocabulary Object Detection. 9321-9329 - Cam-Van Thi Nguyen

, The-Son Le
, Anh-Tuan Mai
, Duc-Trong Le
:
Ada2I: Enhancing Modality Balance for Multimodal Conversational Emotion Recognition. 9330-9339 - Xinpeng Li

, Teng Wang
, Jian Zhao
, Shuyi Mao
, Jinbao Wang
, Feng Zheng
, Xiaojiang Peng
, Xuelong Li
:
Two in One Go: Single-stage Emotion Recognition with Decoupled Subject-context Transformer. 9340-9349 - Jingjia Huang

, Jingyan Tu
, Ge Meng
, Yingying Wang
, Yuhang Dong
, Xiaotong Tu
, Xinghao Ding
, Yue Huang
:
Efficient Perceiving Local Details via Adaptive Spatial-Frequency Information Integration for Multi-focus Image Fusion. 9350-9359 - Wonwoo Cho

, Kangyeol Kim
, Saemee Choi
, Jaegul Choo
:
Training Spatial-Frequency Visual Prompts and Probabilistic Clusters for Accurate Black-Box Transfer Learning. 9360-9368 - Ning Xu

, Yifei Gao
, Ting-Ting Zhang
, Hongshuo Tian
, An-An Liu
:
Cross-Modal Coherence-Enhanced Feedback Prompting for News Captioning. 9369-9377 - Yuzhen Li

, Zehang Deng
, Yuxin Cao
, Lihua Liu
:
GRFormer: Grouped Residual Self-Attention for Lightweight Single Image Super-Resolution. 9378-9386 - Muxin Pu

, Mei Kuan Lim
, Chun Yong Chong
:
Siformer: Feature-isolated Transformer for Efficient Skeleton-based Sign Language Recognition. 9387-9396 - Yue Duan

, Zhangxuan Gu
, Zhenzhe Ying
, Lei Qi
, Changhua Meng
, Yinghuan Shi
:
PC2: Pseudo-Classification Based Pseudo-Captioning for Noisy Correspondence Learning in Cross-Modal Retrieval. 9397-9406 - Wei Feng

, Zhenwei Wu
, Qianqian Wang
, Bo Dong
, Quanxue Gao
:
Federated Fuzzy C-means with Schatten-p Norm Minimization. 9407-9416 - Tianjiao Wan

, Kele Xu
, Long Lan
, Zijian Gao
, Dawei Feng
, Bo Ding
, Huaimin Wang
:
Tracing Training Progress: Dynamic Influence Based Selection for Active Learning. 9417-9425 - Ruohao Guo

, Dantong Niu
, Liao Qu
, Yanyu Qi
, Ji Shi
, Wenzhen Yue
, Bowei Xing
, Taiyan Chen
, Xianghua Ying
:
Instance-Level Panoramic Audio-Visual Saliency Detection and Ranking. 9426-9434 - Shenglin Yin

, Kelu Yao
, Zhen Xiao
, Jieyi Long
:
Embracing Adaptation: An Effective Dynamic Defense Strategy Against Adversarial Examples. 9435-9444 - Zitong Huang

, Ze Chen
, Yuanze Li
, Bowen Dong
, Erjin Zhou
, Yong Liu
, Rick Siow Mong Goh
, Chun-Mei Feng
, Wangmeng Zuo
:
Class Balance Matters to Active Class-Incremental Learning. 9445-9454 - Hao Zhang

, Ee Yeo Keat
, Basura Fernando
:
RCA: Region Conditioned Adaptation for Visual Abductive Reasoning. 9455-9464 - Jian-Yu Jiang-Lin

, Kang-Yang Huang
, Ling Lo
, Yi-Ning Huang
, Terence Lin
, Jhih-Ciang Wu
, Hong-Han Shuai
, Wen-Huang Cheng
:
ReCorD: Reasoning and Correcting Diffusion for HOI Generation. 9465-9474 - Xiaze Zhang

, Ziheng Ding
, Qi Jing
, Ying Cheng
, Wenchao Ding
, Rui Feng
:
DeepPointMap2: Accurate and Robust LiDAR-Visual SLAM with Neural Descriptors. 9475-9484 - Hongyu Li

, Tianrui Hui
, Zihan Ding
, Jing Zhang
, Bin Ma
, Xiaoming Wei
, Jizhong Han
, Si Liu
:
Dynamic Prompting of Frozen Text-to-Image Diffusion Models for Panoptic Narrative Grounding. 9485-9494 - Hengde Zhu

, Xiangyu Kong
, Weicheng Xie
, Xin Huang
, Linlin Shen
, Lu Liu
, Hatice Gunes
, Siyang Song
:
PerFRDiff: Personalised Weight Editing for Multiple Appropriate Facial Reaction Generation. 9495-9504 - Shiqin Liu

, Chaozhuo Li
, Xi Zhang
, Minjun Zhao
, Yuanbo Xu
, Jiajun Bu
:
Deeply Fusing Semantics and Interactions for Item Representation Learning via Topology-driven Pre-training. 9505-9514 - Yongsen Zheng

, Guohua Wang
, Yang Liu
, Liang Lin
:
Diversity Matters: User-Centric Multi-Interest Learning for Conversational Movie Recommendation. 9515-9524 - Yuanchen Shi

, Fang Kong
:
Integrating Stickers into Multimodal Dialogue Summarization: A Novel Dataset and Approach for Enhancing Social Media Interaction. 9525-9534 - Andreea-Maria Oncescu

, João F. Henriques
, A. Sophia Koepke
:
Dissecting Temporal Understanding in Text-to-Audio Retrieval. 9535-9543 - Yuhang Su

, Wei Hu
, Fan Zhang
, Qiming Xu
:
AMG-Embedding: A Self-Supervised Embedding Approach for Audio Identification. 9544-9553 - Xue Li

, Jiong Yu
, Ziyang Li
, Hongchun Lu
, Ruifeng Yuan
:
Dr. CLIP: CLIP-Driven Universal Framework for Zero-Shot Sketch Image Retrieval. 9554-9562 - Yan Zhuang

, Yanlu Cai
, Weizhong Zhang
, Cheng Jin
:
Future Motion Dynamic Modeling via Hybrid Supervision for Multi-Person Motion Prediction Uncertainty Reduction. 9563-9572 - Yupeng Zhang

, Shuqi Zheng
, Ruize Han
, Yuzhong Feng
, Junhui Hou
, Linqi Song
, Wei Feng
, Liang Wan
:
Rethinking the One-shot Object Detection: Cross-Domain Object Search. 9573-9581 - Yuhan Wu

, Xiyu Meng
, Yang He
, Junru Zhang
, Haowen Zhang
, Yabo Dong
, Dongming Lu
:
Multi-view Self-Supervised Contrastive Learning for Multivariate Time Series. 9582-9590 - Dongding Lin

, Jian Wang
, Chak Tou Leong
, Wenjie Li
:
SCREEN: A Benchmark for Situated Conversational Recommendation. 9591-9600 - Xiaowan Hu

, Yiyi Chen
, Yan Li
, Minquan Wang
, Haoqian Wang
, Quan Chen
, Han Li
, Peng Jiang
:
Spatiotemporal Graph Guided Multi-modal Network for Livestreaming Product Retrieval. 9601-9610 - Zheqi Lv

, Shaoxuan He
, Tianyu Zhan
, Shengyu Zhang
, Wenqiao Zhang
, Jingyuan Chen
, Zhou Zhao
, Fei Wu
:
Semantic Codebook Learning for Dynamic Recommendation Models. 9611-9620 - Geng Tu

, Feng Xiong
, Bin Liang
, Hui Wang
, Xi Zeng
, Ruifeng Xu
:
Multimodal Emotion Recognition Calibration in Conversations. 9621-9630 - Wuyou Xia

, Shengzhe Liu
, Rong Qin
, Guoli Jia
, Eunil Park
, Jufeng Yang
:
Perceive before Respond: Improving Sticker Response Selection by Emotion Distillation and Hard Mining. 9631-9640 - Yunshan Ma

, Yingzhi He
, Wenjun Zhong
, Xiang Wang
, Roger Zimmermann
, Tat-Seng Chua
:
CIRP: Cross-Item Relational Pre-training for Multimodal Product Bundling. 9641-9649 - Zixian Gao

, Disen Hu
, Xun Jiang
, Huimin Lu
, Heng Tao Shen
, Xing Xu
:
Enhanced Experts with Uncertainty-Aware Routing for Multimodal Sentiment Analysis. 9650-9659 - Zhenyang Li

, Fan Liu
, Yinwei Wei
, Zhiyong Cheng
, Liqiang Nie
, Mohan S. Kankanhalli
:
Attribute-driven Disentangled Representation Learning for Multimodal Recommendation. 9660-9669 - Ting Fu

, Yu-Wei Zhan
, Chong-Yu Zhang
, Xin Luo
, Zhen-Duo Chen
, Yongxin Wang
, Xun Yang
, Xin-Shun Xu
:
FedCAFE: Federated Cross-Modal Hashing with Adaptive Feature Enhancement. 9670-9679 - Feng Zhu

, Xinxing Yang
, Longfei Li
, Jun Zhou
:
An Active Masked Attention Framework for Many-to-Many Cross-Domain Recommendations. 9680-9689 - Zehao Qi

, Ruixu Zhang
, Xinyi Hu
, Wenxuan Liu
, Zheng Wang
:
Predicting the Unseen: A Novel Dataset for Hidden Intention Localization in Pre-abnormal Analysis. 9690-9698 - Ding Wang

, Wei Zhou
, Songlin Hu
:
Information Diffusion Prediction with Graph Neural Ordinary Differential Equation Network. 9699-9708 - Jian Chen

, Wei Wang
, Yuzhu Hu
, Junxin Chen
, Han Liu
, Xiping Hu
:
TGCA-PVT: Topic-Guided Context-Aware Pyramid Vision Transformer for Sticker Emotion Recognition. 9709-9718 - Rui Yang

, Shuang Wang
, Jianwei Tao
, Yingping Han
, Qiaoling Lin
, Yanhe Guo
, Biao Hou
, Licheng Jiao
:
Accurate and Lightweight Learning for Specific Domain Image-Text Retrieval. 9719-9728 - Xianbing Zhao

, Lizhen Qu
, Tao Feng
, Jianfei Cai
, Buzhou Tang
:
Learning in Order! A Sequential Strategy to Learn Invariant Features for Multimodal Sentiment Analysis. 9729-9738 - Yutong Wang

, Sidan Zhu
, Hongteng Xu
, Dixin Luo
:
An Inverse Partial Optimal Transport Framework for Music-guided Trailer Generation. 9739-9748 - Haonan Zheng

, Wen Jiang
, Xinyang Deng
, Wenrui Li
:
Sample-agnostic Adversarial Perturbation for Vision-Language Pre-training Models. 9749-9758 - Jiade Chen

, Jin Wang
, Yunhui Shi
, Nam Ling
, Baocai Yin
:
MVP-Net: Multi-View Depth Image Guided Cross-Modal Distillation Network for Point Cloud Upsampling. 9759-9768 - Zuoyan Zhao

, Hui Xue
, Pengfei Fang
, Shipeng Zhu
:
PEAN: A Diffusion-Based Prior-Enhanced Attention Network for Scene Text Image Super-Resolution. 9769-9778 - Yuzhi Huang

, Chenxin Li
, Zixu Lin
, Hengyu Liu
, Haote Xu
, Yifan Liu
, Yue Huang
, Xinghao Ding
, Xiaotong Tu
, Yixuan Yuan
:
P2SAM: Probabilistically Prompted SAMs Are Efficient Segmentator for Ambiguous Medical Images. 9779-9788 - Ran Yi

, Haokun Zhu
, Teng Hu
, Yu-Kun Lai
, Paul L. Rosin
:
AesStyler: Aesthetic Guided Universal Style Transfer. 9789-9798 - Wenxuan Wang

, Chenglei Wang
, Huihui Qi
, Menghao Ye
, Xuelin Qian
, Peng Wang
, Yanning Zhang
:
Sustainable Self-evolution Adversarial Training. 9799-9808 - Jian-Jun Qiao

, Meng-Yu Duan
, Xiao Wu
, Wei Li
:
CAPNet: Cartoon Animal Parsing with Spatial Learning and Structural Modeling. 9809-9817 - Xuanyu Zhang

, Youmin Xu
, Runyi Li
, Jiwen Yu
, Weiqi Li
, Zhipei Xu
, Jian Zhang
:
V2A-Mark: Versatile Deep Visual-Audio Watermarking for Manipulation Localization and Copyright Protection. 9818-9827 - Xian Zhong

, Shengwang Hu
, Wenxuan Liu
, Wenxin Huang
, Jianhao Ding
, Zhaofei Yu
, Tiejun Huang
:
Towards Low-latency Event-based Visual Recognition with Hybrid Step-wise Distillation Spiking Neural Networks. 9828-9836 - Junqi Shi

, Mingyi Jiang
, Ming Lu
, Tong Chen
, Xun Cao
, Zhan Ma
:
HINER: Neural Representation for Hyperspectral Image. 9837-9846 - Yaqiang Wu

, Zhen Xu
, Yong Duan
, Yanlai Wu
, Qinghua Zheng
, Hui Li
, Xiaochen Hu
, Lianwen Jin
:
RDLNet: A Novel and Accurate Real-world Document Localization Method. 9847-9855 - Xiao Teng

, Xingyu Shen
, Kele Xu
, Long Lan
:
Enhancing Unsupervised Visible-Infrared Person Re-Identification with Bidirectional-Consistency Gradual Matching. 9856-9865 - Zhen Zhang

, Jing Xiao
, Liang Liao
, Mi Wang
:
RefScale: Multi-temporal Assisted Image Rescaling in Repetitive Observation Scenarios. 9866-9874 - Chaoxiang He

, Xiaofan Bai
, Xiaojing Ma
, Bin B. Zhu
, Pingyi Hu
, Jiayun Fu
, Hai Jin
, Dongmei Zhang
:
Towards Stricter Black-box Integrity Verification of Deep Neural Network Models. 9875-9884 - Peibin Chen

, Xijin Zhang
, Daniel Kang Du
:
SimpliGuard: Robust Mesh Simplification In the Wild. 9885-9893 - Shixuan Gao

, Pingping Zhang
, Tianyu Yan
, Huchuan Lu
:
Multi-Scale and Detail-Enhanced Segment Anything Model for Salient Object Detection. 9894-9903 - Panjun Duan

, Yang Zhao
, Yuan Chen
, Wei Jia
, Zhao Zhang
, Ronggang Wang
:
Blind Video Bit-Depth Expansion. 9904-9912 - Xiaoheng Tan

, Jiabin Zhang
, Yuhui Quan
, Jing Li
, Yajing Wu
, Zilin Bian
:
Highly Efficient No-reference 4K Video Quality Assessment with Full-Pixel Covering Sampling and Training Strategy. 9913-9922 - Yujia Wang

, Zhongxu Wang
, Hua Huang
:
AutoSFX: Automatic Sound Effect Generation for Videos. 9923-9932 - Weiguang Zhang

, Qiufeng Wang
, Kaizhu Huang
, Xiaowei Huang
, Fengjun Guo
, Xiaomeng Gu
:
Document Registration: Towards Automated Labeling of Pixel-Level Alignment Between Warped-Flat Documents. 9933-9942 - Hao Yang

, Min Wang
, Zhengfei Yu
, Zhi Zeng
, Mingrui Lao
, Yun Zhou
:
Maximizing Feature Distribution Variance for Robust Neural Networks. 9943-9951 - Kai Han

, Jin Wang
, Yunhui Shi
, Nam Ling
, Baocai Yin
:
D3U-Net: Dual-Domain Collaborative Optimization Deep Unfolding Network for Image Compressive Sensing. 9952-9960 - Jiangtong Zhu

, Zhao Yang
, Yinan Shi
, Jianwu Fang
, Jianru Xue
:
IC-Mapper: Instance-Centric Spatio-Temporal Modeling for Online Vectorized Map Construction. 9961-9969 - Jianjun Xiang

, Yuanjie Dang
, Peng Chen
, Ronghua Liang
, Ruohong Huan
, Nan Gao
:
Semantic-Aware and Quality-Aware Interaction Network for Blind Video Quality Assessment. 9970-9979 - Zerui Zhang

, Jun Yu
, Liangxian Cui
, Qiang Ling
, Tianyu Liu
:
Part-level Reconstruction for Self-Supervised Category-level 6D Object Pose Estimation with Coarse-to-Fine Correspondence Optimization. 9980-9988 - Yachun Mi

, Yan Shu
, Yu Li
, Chen Hui
, Puchao Zhou
, Shaohui Liu
:
CLiF-VQA: Enhancing Video Quality Assessment by Incorporating High-Level Semantic Information related to Human Feelings. 9989-9998 - Xuntao Liu

, Yuzhou Yang
, Haoyue Wang
, Qichao Ying
, Zhenxing Qian
, Xinpeng Zhang
, Sheng Li
:
Multi-view Feature Extraction via Tunable Prompts is Enough for Image Manipulation Localization. 9999-10007 - Junfeng Yang

, Jing Fu
, Zhen Zhang
, Limei Liu
, Qin Li
, Wei Zhang
, Wenzhi Cao
:
Align-IQA: Aligning Image Quality Assessment Models with Diverse Human Preferences via Customizable Guidance. 10008-10017 - Zehang Lin

, Jiayuan Xie
, Zhenguo Yang
, Yi Yu
, Qing Li
:
Generalized News Event Discovery via Dynamic Augmentation and Entropy Optimization. 10018-10026 - Jiahao Cui

, Wei Jiang
, Zhan Peng
, Zhiyu Pan
, Zhiguo Cao
:
Exposure Completing for Temporally Consistent Neural High Dynamic Range Video Rendering. 10027-10035 - Lei Han

, Xuesong Zhang
:
Scalable Super-Resolution Neural Operator. 10036-10045 - Ling Zhang

, Yidong Ma
, Zhi Jiang
, Weilei He
, Zhongyun Bao
, Gang Fu
, Wenju Xu
, Chunxia Xiao
:
HighlightRemover: Spatially Valid Pixel Learning for Image Specular Highlight Removal. 10046-10054 - Yuhang Zhou

, Yushu Zhang
, Leo Yu Zhang
, Zhongyun Hua
:
DERD: Data-free Adversarial Robustness Distillation through Self-adversarial Teacher Group. 10055-10064 - Shuman Zhuang

, Sujia Huang
, Wei Huang
, Yuhong Chen
, Zhihao Wu
, Ximeng Liu
:
Enhancing Multi-view Graph Neural Network with Cross-view Confluent Message Passing. 10065-10074 - Fu Rong

, Wenjin Peng
, Meng Lan
, Qian Zhang
, Lefei Zhang
:
Driving Scene Understanding with Traffic Scene-Assisted Topology Graph Transformer. 10075-10084 - Chang'an Yi

, Haotian Chen
, Yifan Zhang
, Yonghui Xu
, Yan Zhou
, Lizhen Cui
:
From Question to Exploration: Can Classic Test-Time Adaptation Strategies Be Effectively Applied in Semantic Segmentation? 10085-10094 - Zehao Chen

, Zhan Lu
, De Ma
, Huajin Tang
, Xudong Jiang
, Qian Zheng
, Gang Pan
:
Event-ID: Intrinsic Decomposition Using an Event Camera. 10095-10104 - Xu Zhang

, Fan Ni
, Guannan Dong
, Aichun Zhu
, Jianhui Wu
, Mingcheng Ni
, Hui Liu
:
TVPR: Text-to-Video Person Retrieval and a New Benchmark. 10105-10113 - Haoyu Shi

, Huaiwen Zhang
:
Modal-Enhanced Semantic Modeling for Fine-Grained 3D Human Motion Retrieval. 10114-10123 - Hongyu Zhu

, Sichu Liang
, Wentao Hu
, Fangqi Li
, Ju Jia
, Shi-Lin Wang
:
Reliable Model Watermarking: Defending against Theft without Compromising on Evasion. 10124-10133 - Qian Qiao

, Yu Xie
, Jun Gao
, Tianxiang Wu
, Shaoyao Huang
, Jiaqing Fan
, Ziqiang Cao
, Zili Wang
, Yue Zhang
:
DNTextSpotter: Arbitrary-Shaped Scene Text Spotting via Improved Denoising Training. 10134-10143 - Yi Liu

, Xinyi Li
, Wenjing Shuai
:
3D Scene De-occlusion in Neural Radiance Fields: A Framework for Obstacle Removal and Realistic Inpainting. 10144-10153 - Xuannan Liu

, Peipei Li
, Huaibo Huang
, Zekun Li
, Xing Cui
, Jiahao Liang
, Lixiong Qin
, Weihong Deng
, Zhaofeng He
:
FKA-Owl: Advancing Multimodal Fake News Detection through Knowledge-Augmented LVLMs. 10154-10163 - Yalan Qin

, Li Qian
:
Fast Elastic-Net Multi-view Clustering: A Geometric Interpretation Perspective. 10164-10172 - Xiaojiao Guo

, Xuhang Chen
, Shenghong Luo
, Shuqiang Wang
, Chi-Man Pun
:
Dual-Hybrid Attention Network for Specular Highlight Removal. 10173-10181 - Yiyang Luo

, Ke Lin
, Chao Gu
:
Context-Aware Indoor Point Cloud Object Generation through User Instructions. 10182-10190 - Zhangli Hu

, Ye Chen
, Zhongyin Zhao
, Jinfan Liu
, Bilian Ke
, Bingbing Ni
:
Towards Artist-Like Painting Agents with Multi-Granularity Semantic Alignment. 10191-10199 - Zixuan Wang

, Jiayi Li
, Xiaoyu Qin
, Shikun Sun
, Songtao Zhou
, Jia Jia
, Jiebo Luo
:
DanceCamAnimator: Keyframe-Based Controllable 3D Dance Camera Synthesis. 10200-10209 - Sooho Kim

, Soyeon Hong
, Kyungsoo Park
, Hyunsouk Cho
, Kyung-Ah Sohn
:
OmniStitch: Depth-Aware Stitching Framework for Omnidirectional Vision with Multiple Cameras. 10210-10219 - Kaijiang Li

, Hao Li
, Haining Li
, Peisen Wang
, Chunyi Guo
, Wenfeng Jiang
:
SIRLUT: Simulated Infrared Fusion Guided Image-adaptive 3D Lookup Tables for Lightweight Image Enhancement. 10220-10228 - Bolin Jiang

, Yuqiu Xie
, Jiawei Li
, Naiqi Li
, Bin Chen
, Shu-Tao Xia
:
IGSPAD: Inverting 3D Gaussian Splatting for Pose-agnostic Anomaly Detection. 10229-10237 - Guobiao Li

, Sheng Li
, Zhenxing Qian
, Xinpeng Zhang
:
Cover-separable Fixed Neural Network Steganography via Deep Generative Models. 10238-10247 - Baorui Ma

, Yu-Shen Liu
, Matthias Zwicker
, Zhizhong Han
:
Inferring 3D Occupancy Fields through Implicit Reasoning on Silhouette Images. 10248-10257 - Rui Li

, Yishu Liu
, Huafeng Li
, Jinxing Li
, Guangming Lu
:
Prototype-Guided Dual-Transformer Reasoning for Video Individual Counting. 10258-10267 - Tao Wang

, Yushu Zhang
, Xiangli Xiao
, Lin Yuan
, Zhihua Xia
, Jian Weng
:
Make Privacy Renewable! Generating Privacy-Preserving Faces Supporting Cancelable Biometric Recognition. 10268-10276 - Green Rosh K. S

, B. H. Pawan Prasad
, Lokesh R. Boregowda
, Kaushik Mitra
:
R2SFD: Improving Single Image Reflection Removal using Semantic Feature Dictionary. 10277-10286 - Jiaming Shen

, Kun Hu
, Wei Bao
, Chang Wen Chen
, Zhiyong Wang
:
Bridging the Gap: Sketch-Aware Interpolation Network for High-Quality Animation Sketch Inbetweening. 10287-10295 - Yanghao Su

, Jie Zhang
, Ting Xu
, Tianwei Zhang
, Weiming Zhang
, Nenghai Yu
:
Model X-ray: Detecting Backdoored Models via Decision Boundary. 10296-10305 - Lize Zhou

, Xiaoqi Wang
, Jian Xiong
, Xianzhong Long
, Hao Gao
:
Towards Distortion-Debiased Blind Image Quality Assessment. 10306-10315 - Benhui Zhang

, Junyu Gao
, Yuan Yuan
:
A Descriptive Basketball Highlight Dataset for Automatic Commentary Generation. 10316-10325 - Cong Wang

, Liyan Wang
, Jie Mu
, Chengjin Yu
, Wei Wang
:
Progressive Local and Non-Local Interactive Networks with Deeply Discriminative Training for Image Deraining. 10326-10335 - Kaifang Yang

, Xinrong Zhao
, Yanchao Gong
:
Semantic Aware Just Noticeable Differences for VVC Compressed Text Screen Content Images. 10336-10344 - Jiaxuan Wu

, Zhengxian Wu
, Yiming Xue
, Juan Wen
, Wanli Peng
:
Generative Text Steganography with Large Language Model. 10345-10353 - Yuchen Wang

, Xingyu Zhu
, Guanhui Ye
, Shiyao Zhang
, Xuetao Wei
:
Achieving Resolution-Agnostic DNN-based Image Watermarking: A Novel Perspective of Implicit Neural Representation. 10354-10362 - Renshu Gu

, Jiajun Zhu
, Yixuan Si
, Fei Gao
, Jiamin Xu
, Gang Xu
:
3D Human Pose Estimation from Multiple Dynamic Views via Single-view Pretraining with Procrustes Alignment. 10363-10372 - Yang Ding

, Yi Dai
, Xin Wang
, Ling Feng
, Lei Cao
, Huijun Zhang
:
Integrating Content-Semantics-World Knowledge to Detect Stress from Videos. 10373-10381 - Xintian Mao

, Jiansheng Wang
, Xingran Xie
, Qingli Li
, Yan Wang
:
LoFormer: Local Frequency Transformer for Image Deblurring. 10382-10391 - Mingjin Zhang

, Chi Zhang
, Qiming Zhang
, Yunsong Li
, Xinbo Gao
, Jing Zhang
:
Unleashing the Power of Generic Segmentation Model: A Simple Baseline for Infrared Small Target Detection. 10392-10401 - Honglin Yuan

, Shiyun Lai
, Xingfeng Li
, Jian Dai
, Yuan Sun
, Zhenwen Ren
:
Robust Prototype Completion for Incomplete Multi-view Clustering. 10402-10411 - Changhao Peng

, Wei Gao
:
Laplacian Matrix Learning for Point Cloud Attribute Compression with Ternary Search-Based Adaptive Block Partition. 10412-10420 - Zhongwei Xuan

, Zunjie Zhu
, Shuai Wang
, Haibing Yin
, Hongkui Wang
, Ming Lu
:
Superpixel-based Efficient Sampling for Learning Neural Fields from Large Input. 10421-10430 - Zhaolin Wan

, Qiushuang Yang
, Zhiyang Li
, Xiaopeng Fan
, Wangmeng Zuo
, Debin Zhao
:
Dual-stream Perception-driven Blind Quality Assessment for Stereoscopic Omnidirectional Images. 10431-10439 - Weixuan Tang

, Haoyu Yang
, Yuan Rao
, Zhili Zhou
, Fei Peng
:
Dig a Hole and Fill in Sand: Adversary and Hiding Decoupled Steganography. 10440-10448 - Bin Wang

, Meishan Zhang
, Hao Fei
, Yu Zhao
, Bobo Li
, Shengqiong Wu
, Wei Ji
, Min Zhang
:
SpeechEE: A Novel Benchmark for Speech Event Extraction. 10449-10458 - Shouyu Chen

, Liang Hu
, Tangwei Ye
, Zhongyuan Lai
, Qi Zhang
, Ke Liu
, Usman Naseem
, Ke Sun
, Nengjun Zhu
:
VR-DiagNet: Medical Volumetric and Radiomic Diagnosis Networks with Interpretable Clinician-like Optimizing Visual Inspection. 10459-10467 - Minjing Yu

, Delong Pang
, Ziwen Kang
, Zhiyao Sun
, Tian Lv
, Jenny Sheng
, Ran Yi
, Yu-Hui Wen
, Yong-Jin Liu
:
ECAvatar: 3D Avatar Facial Animation with Controllable Identity and Emotion. 10468-10476 - Zhenyu Bao

, Guibiao Liao
, Zhongyuan Zhao
, Kanglin Liu
, Qing Li
, Guoping Qiu
:
3D Reconstruction and Novel View Synthesis of Indoor Environments Based on a Dual Neural Radiance Field. 10477-10486 - Zimo Liu

, Kangjun Liu
, Mingyue Guo
, Shiliang Zhang
, Yaowei Wang
:
CoTuning: A Large-Small Model Collaborating Distillation Framework for Better Model Generalization. 10487-10496 - Yanbin Deng

, Zheng Li
, Ning Xie
, Wei Zhang
:
PIMT: Physics-Based Interactive Motion Transition for Hybrid Character Animation. 10497-10505 - Kang Shen

, Haifeng Xia
, Guangxing Geng
, Guangyue Geng
, Siyu Xia
, Zhengming Ding
:
DEITalk: Speech-Driven 3D Facial Animation with Dynamic Emotional Intensity Modeling. 10506-10514 - Tianyi Wang

, Mengxiao Huang
, Harry Cheng
, Xiao Zhang
, Zhiqi Shen
:
LampMark: Proactive Deepfake Detection via Training-Free Landmark Perceptual Watermarks. 10515-10524 - Lintao Dong

, Wei Zhai
, Zheng-Jun Zha
:
UniDense: Unleashing Diffusion Models with Meta-Routers for Universal Few-Shot Dense Prediction. 10525-10534 - Henglei Lv

, Jiayu Xiao
, Liang Li
:
Pick-and-Draw: Training-free Semantic Guidance for Text-to-Image Personalization. 10535-10543 - Guoqing Zhu

, Honghu Pan
, Qiang Wang
, Chao Tian
, Chao Yang
, Zhenyu He
:
Data Generation Scheme for Thermal Modality with Edge-Guided Adversarial Conditional Diffusion Model. 10544-10553 - Qiao Li

, Xiaomeng Fu
, Xi Wang
, Jin Liu
, Xingyu Gao
, Jiao Dai
, Jizhong Han
:
Unveiling Structural Memorization: Structural Membership Inference Attack for Text-to-Image Diffusion Models. 10554-10562 - Zhaoda Ye

, Xinhan Zheng
, Yang Liu
, Yuxin Peng
:
RelScene: A Benchmark and baseline for Spatial Relations in text-driven 3D Scene Generation. 10563-10571 - Shilong Tian

, Hong Chen
, Chengtao Lv
, Yu Liu
, Jinyang Guo
, Xianglong Liu
, Shengxi Li
, Hao Yang
, Tao Xie
:
QVD: Post-training Quantization for Video Diffusion Models. 10572-10581 - Jingjing Xie

, Yuxin Zhang
, Mingbao Lin
, Liujuan Cao
, Rongrong Ji
:
Advancing Multimodal Large Language Models with Quantization-Aware Scale Learning for Efficient Adaptation. 10582-10591 - Pengfei Zhou

, Fangxiang Feng
, Guang Liu
, Ruifan Li
, Xiaojie Wang
:
DiffHarmony++: Enhancing Image Harmonization with Harmony-VAE and Inverse Harmonization Model. 10592-10601 - Qi Xu

, Xuanye Fang
, Yaxin Li
, Jiangrong Shen
, De Ma
, Yi Xu
, Gang Pan
:
RSNN: Recurrent Spiking Neural Networks for Dynamic Spatial-Temporal Information Processing. 10602-10610 - Wei Yang

, Tengfei Huo
, Zhiqiang Liu
:
Enhancing Transformer-based Semantic Matching for Few-shot Learning through Weakly Contrastive Pre-training. 10611-10620 - Stanislav Frolov

, Brian B. Moser
, Sebastian Palacio
, Andreas Dengel
:
ObjBlur: A Curriculum Learning Approach With Progressive Object-Level Blurring for Improved Layout-to-Image Generation. 10621-10629 - Rongjie Huang

, Yongqi Wang
, Ruofan Hu
, Xiaoshan Xu
, Zhiqing Hong
, Dongchao Yang
, Xize Cheng
, Zehan Wang
, Ziyue Jiang
, Zhenhui Ye
, Luping Liu
, Siqi Zheng
, Zhou Zhao
:
VoiceTuner: Self-Supervised Pre-training and Efficient Fine-tuning For Voice Generation. 10630-10639 - Yuran Wang

, Zhijing Wan
, Yansheng Qiu
, Zheng Wang
:
Devil is in Details: Locality-Aware 3D Abdominal CT Volume Generation for Self-Supervised Organ Segmentation. 10640-10648 - Minghui Li

, Jiangxiong Wang
, Hao Zhang
, Ziqi Zhou
, Shengshan Hu
, Xiaobing Pei
:
Transferable Adversarial Facial Images for Privacy Protection. 10649-10658 - Ming Tao

, Bing-Kun Bao
, Hao Tang
, Yaowei Wang
, Changsheng Xu
:
CoIn: A Lightweight and Effective Framework for Story Visualization and Continuation. 10659-10668 - Xulu Zhang

, Wengyu Zhang
, Xiaoyong Wei
, Jinlin Wu
, Zhaoxiang Zhang
, Zhen Lei
, Qing Li
:
Generative Active Learning for Image Synthesis Personalization. 10669-10677 - Zhijun Zhai

, Zengmao Wang
, Xiaoxiao Long
, Kaixuan Zhou
, Bo Du
:
SAT3D: Image-driven Semantic Attribute Transfer in 3D. 10678-10687 - Zihan Huang

, Xinyu Shi
, Zecheng Hao
, Tong Bu
, Jianhao Ding
, Zhaofei Yu
, Tiejun Huang
:
Towards High-performance Spiking Transformers from ANN to SNN Conversion. 10688-10697 - Jialiang Li

, Haoyue Wang
, Sheng Li
, Zhenxing Qian
, Xinpeng Zhang
, Athanasios V. Vasilakos
:
Are handcrafted filters helpful for attributing AI-generated images? 10698-10706 - Peng Ding

, Jingyu Wu
, Jun Kuang
, Dan Ma
, Xuezhi Cao
, Xunliang Cai
, Shi Chen
, Jiajun Chen
, Shujian Huang
:
Hallu-PI: Evaluating Hallucination in Multi-modal Large Language Models within Perturbed Inputs. 10707-10715 - Shaodong Wang

, Yunyang Ge
, Liuhan Chen
, Haiyang Zhou
, Qian Wang
, Xinhua Cheng
, Li Yuan
:
Prompt2Poster: Automatically Artistic Chinese Poster Creation from Prompt Only. 10716-10724 - Weijie Wang

, Jichao Zhang
, Chang Liu
, Xia Li
, Xingqian Xu
, Humphrey Shi
, Nicu Sebe
, Bruno Lepri
:
UVMap-ID: A Controllable and Personalized UV Map Generative Model. 10725-10734 - Tianshuo Peng

, Zuchao Li
, Lefei Zhang
, Hai Zhao
, Ping Wang
, Bo Du
:
Multi-modal Auto-regressive Modeling via Visual Tokens. 10735-10744 - Haining Wang

, Na Li
, Huijie Zhao
, Yan Wen
, Yi Su
, Yuqiang Fang
:
MappingFormer: Learning Cross-modal Feature Mapping for Visible-to-infrared Image Translation. 10745-10754 - Xiangping Zheng

, Xiuxin Hao
, Bo Wu
, Xigang Bao
, Xuan Zhang
, Wei Li
, Xun Liang
:
A Sample-driven Selection Framework: Towards Graph Contrastive Networks with Reinforcement Learning. 10755-10764 - Peiyong Wang

, Bohan Xiao
, Qisheng He
, Carri Glide-Hurst
, Ming Dong
:
Score-Based Image-to-Image Brownian Bridge. 10765-10773 - Tingfeng Cao

, Junsheng Kong
, Xue Zhao
, Wenqing Yao
, Junwei Ding
, Jinhui Zhu
, Jiandong Zhang
:
Product2IMG: Prompt-Free E-commerce Product Background Generation with Diffusion Model and Self-Improved LMM. 10774-10783 - Zhenyu Xie

, Haoye Dong
, Yufei Gao
, Zehua Ma
, Xiaodan Liang
:
DreamVTON: Customizing 3D Virtual Try-on with Personalized Diffusion Models. 10784-10793 - Chencan Fu

, Yabiao Wang
, Jiangning Zhang
, Zhengkai Jiang
, Xiaofeng Mao
, Jiafu Wu
, Weijian Cao
, Chengjie Wang
, Yanhao Ge
, Yong Liu
:
MambaGesture: Enhancing Co-Speech Gesture Generation with Mamba and Disentangled Multi-Modality Fusion. 10794-10803 - Wei Lou

, Guanbin Li
, Xiang Wan
, Haofeng Li
:
Multi-modal Denoising Diffusion Pre-training for Whole-Slide Image Classification. 10804-10813 - Xingyi Li

, Yizheng Wu
, Jun Cen
, Juewen Peng
, Kewei Wang
, Ke Xian
, Zhe Wang
, Zhiguo Cao
, Guosheng Lin
:
iControl3D: An Interactive System for Controllable 3D Scene Generation. 10814-10823 - Yibin Wang

, Weizhong Zhang
, Jianwei Zheng
, Cheng Jin
:
PrimeComposer: Faster Progressively Combined Diffusion for Image Composition with Attention Steering. 10824-10832 - Jiancheng Huang

, Mingfu Yan
, Songyan Chen
, Yi Huang
, Shifeng Chen
:
MagicFight: Personalized Martial Arts Combat Video Generation. 10833-10842 - Longfei Lu

, Huachen Gao
, Tao Dai
, Yaohua Zha
, Zhi Hou
, Junta Wu
, Shu-Tao Xia
:
Large Point-to-Gaussian Model for Image-to-3D Generation. 10843-10852 - Mingzhen Sun

, Weining Wang
, Yanyuan Qiao
, Jiahui Sun
, Zihan Qin
, Longteng Guo
, Xinxin Zhu
, Jing Liu
:
MM-LDM: Multi-Modal Latent Diffusion Model for Sounding Video Generation. 10853-10861 - Ruowei Wang

, Jiaqi Li
, Dan Zeng
, Xueqi Ma
, Zixiang Xu
, Jianwei Zhang
, Qijun Zhao
:
GenUDC: High Quality 3D Mesh Generation With Unsigned Dual Contouring Representation. 10862-10871 - Xiaopei Zhu, Peiyang Xu, Guanning Zeng, Yinpeng Dong, Xiaolin Hu:

Natural Language Induced Adversarial Images. 10872-10881 - Xin Lu

, Chuanqing Zhuang
, Zhengda Lu
, Yiqun Wang
, Jun Xiao
:
FC-4DFS: Frequency-controlled Flexible 4D Facial Expression Synthesizing. 10882-10890 - Jiaxing Li, Hongbo Zhao, Yijun Wang, Jianxin Lin:

Towards Photorealistic Video Colorization via Gated Color-Guided Image Diffusion Models. 10891-10900 - Mengmeng Ge

, Xu Jia
, Takashi Isobe
, Xiaomin Li
, Qinghe Wang
, Jing Mu
, Dong Zhou
, Li Wang
, Huchuan Lu
, Lu Tian
, Ashish Sirasao
, Emad Barsoum
:
Customizing Text-to-Image Generation with Inverted Interaction. 10901-10909 - Yunqiu Xu

, Linchao Zhu
, Yi Yang
:
GG-Editor: Locally Editing 3D Avatars with Multimodal Large Language Model Guidance. 10910-10919 - Xianqiang Lyu

, Hui Liu
, Junhui Hou
:
RainyScape: Unsupervised Rainy Scene Reconstruction using Decoupled Neural Rendering. 10920-10929 - Jingyu Lin

, Guiqin Zhao
, Jing Xu
, Guoli Wang
, Zejin Wang
, Antitza Dantcheva
, Lan Du
, Cunjian Chen
:
DiffTV: Identity-Preserved Thermal-to-Visible Face Translation via Feature Alignment and Dual-Stage Conditions. 10930-10938 - Yifan Li

, Yuhang Bai
, Shuai Yang
, Jiaying Liu
:
COCO-LC: Colorfulness Controllable Language-based Colorization. 10939-10947 - Yiying Bao

, Hao Zhou
, Chao Peng
, Chenyang Xu
, Shuo Shi
, Kecheng Cai
:
Boundary-Aware Periodicity-based Sparsification Strategy for Ultra-Long Time Series Forecasting. 10948-10956 - Ziyi Dong

, Yao Xiao
, Pengxu Wei
, Liang Lin
:
Decoder-Only LLMs are Better Controllers for Diffusion Models. 10957-10965 - Zhenqi Dai

, Ting Liu
, Xingxing Zhang
, Yunchao Wei
, Yanning Zhang
:
One-shot In-context Part Segmentation. 10966-10975 - Ziyang Yuan

, Mingdeng Cao
, Xintao Wang
, Zhongang Qi
, Chun Yuan
, Ying Shan
:
CustomNet: Object Customization with Variable-Viewpoints in Text-to-Image Diffusion Models. 10976-10984 - Kyusun Cho

, Joungbin Lee
, Heeji Yoon
, Yeobin Hong
, Jaehoon Ko
, Sangjun Ahn
, Seungryong Kim
:
GaussianTalker: Real-Time Talking Head Synthesis with 3D Gaussian Splatting. 10985-10994 - Huanpeng Chu

, Wei Wu
, Chengjie Zang
, Kun Yuan
:
QNCD: Quantization Noise Correction for Diffusion Models. 10995-11003 - Dan Wang

, Xinrui Cui
:
InNeRF: Learning Interpretable Radiance Fields for Generalizable 3D Scene Representation and Rendering. 11004-11012 - Zhongyi Fan

, Zixin Yin
, Gang Li
, Yibing Zhan
, Heliang Zheng
:
DreamBooth++: Boosting Subject-Driven Generation via Region-Level References Packing. 11013-11021 - Zhenghao Chen

, Luping Zhou
, Zhihao Hu
, Dong Xu
:
Group-aware Parameter-efficient Updating for Content-Adaptive Neural Video Compression. 11022-11031 - Lingfei Ren

, Ruimin Hu
, Zheng Wang
, Yilin Xiao
, Dengshi Li
, Junhang Wu
, Yilong Zang
, Jinzhang Hu
, Zijun Huang
:
Heterophilic Graph Invariant Learning for Out-of-Distribution of Fraud Detection. 11032-11040 - Haicheng Liao

, Haoyu Sun
, Huanming Shen
, Chengyue Wang
, Chunlin Tian
, KaHou Tam
, Li Li
, Chengzhong Xu
, Zhenning Li
:
CRASH: Crash Recognition and Anticipation System Harnessing with Context-Aware and Temporal Focus Attentions. 11041-11050 - Lehao Lin

, Hong Kang
, Xinyao Sun
, Wei Cai
:
SemNFT: A Semantically Enhanced Decentralized Middleware for Digital Asset Immortality. 11051-11059 - Guogang Zhu

, Xuefeng Liu
, Jianwei Niu
, Shaojie Tang
, Xinghao Wu
, Jiayuan Zhang
:
DualFed: Enjoying both Generalization and Personalization in Federated Learning via Hierachical Representations. 11060-11069 - Hui Zeng

, Minrui Xu
, Tongqing Zhou
, Xinyi Wu
, Jiawen Kang
, Zhiping Cai
, Dusit Niyato
:
One-shot-but-not-degraded Federated Learning. 11070-11079 - Miao Cao

, Lishun Wang
, Huan Wang
, Guoqing Wang
, Xin Yuan
:
Towards Real-time Video Compressive Sensing on Mobile Devices. 11080-11088 - Daheng Yin

, Jianxin Shi
, Miao Zhang
, Zhaowu Huang
, Jiangchuan Liu
, Fang Dong
:
FSVFG: Towards Immersive Full-Scene Volumetric Video Streaming with Adaptive Feature Grid. 11089-11098 - Huanhuan Zhang

, Liu zhuo
, Haotian Li
, Anfu Zhou
, Chuanming Wang
, Huadong Ma
:
AraLive: Automatic Reward Adaption for Learning-based Live Video Streaming. 11099-11108 - Jun Dan

, Weiming Liu
, Mushui Liu
, Chunfeng Xie
, Shunjie Dong
, Guofang Ma
, Yanchao Tan
, Jiazheng Xing
:
HOGDA: Boosting Semi-supervised Graph Domain Adaptation via High-Order Structure-Guided Adaptive Feature Alignment. 11109-11118
Reproducibility
- Xin Jin

, Longteng Jiang
, Yihao Zhang
, Lihua Lu
, Xiaobo Gao
, Boyan Dong
:
Reproducibility Companion Paper: Aesthetics-Driven Virtual Time-Lapse Photography Generation. 11119-11122
Panel
- Zi Helen Huang

, Phoebe Chen
, Shuicheng Yan
:
Generative AI in Multimedia: Challenges and Opportunities for Academic and Industrial Impact. 11123-11124
Industry Session
- Jianquan Liu

, Balu Adsumilli
, Yukiko Yanagawa
, Haiwei Dong
:
An Innovative Industry Program in A New Era of Multimedia with Generative AI. 11125-11126
Doctoral Symposium
- Wenmiao Hu

:
Utilizing Very High-resolution Optical RGB Satellite Imagery in Geo-information Extraction for Fine-scale Map-making. 11127-11131 - Cheng Zhang

:
Practical Deep Learning Models for QIM-based VoIP Steganalysis. 11132-11136
Brave New Ideas
- Jie An

, Zhengyuan Yang
, Linjie Li
, Jianfeng Wang
, Kevin Lin
, Zicheng Liu
, Lijuan Wang
, Jiebo Luo
:
OpenLEAF: A Novel Benchmark for Open-Domain Interleaved Image-Text Generation. 11137-11145 - Carlos de la Torre-Ortiz

, Tuukka Ruotsalo
:
Perceptual Visual Similarity from EEG: Prediction and Image Generation. 11146-11155 - Yifeng Gao

, Yuhua Sun
, Xingjun Ma
, Zuxuan Wu
, Yu-Gang Jiang
:
ModelLock: Locking Your Model With a Spell. 11156-11165 - Jiyi Zhang

, Han Fang
, Ee-Chien Chang
:
Finding Input Data Domains of Image Classification Models with Hard-Label Black-Box Access. 11166-11174 - Yudong Zhang

, Ruobing Xie
, Jiansheng Chen
, Xingwu Sun
, Yu Wang
:
PIP: Detecting Adversarial Examples in Large Vision-Language Models via Attention Patterns of Irrelevant Probe Questions. 11175-11183 - Taotao Zhou

, Teng Xu
, Dong Zhang
, Yuyang Jiao
, Peijun Xu
, Yaoyu He
, Lan Xu
, Jingyi Yu
:
Sophia-in-Audition: Virtual Production with a Robot Performer. 11184-11193
Open-Source
- Xiaodong Chen

, Kunlang He
, Wu Liu
, Xinchen Liu
, Zheng-Jun Zha
, Tao Mei
:
CLaM: An Open-Source Library for Performance Evaluation of Text-driven Human Motion Generation. 11194-11197 - Haodong Duan

, Junming Yang
, Yuxuan Qiao
, Xinyu Fang
, Lin Chen
, Yuan Liu
, Xiaoyi Dong
, Yuhang Zang
, Pan Zhang
, Jiaqi Wang
, Dahua Lin
, Kai Chen
:
VLMEvalKit: An Open-Source ToolKit for Evaluating Large Multi-Modality Models. 11198-11201 - Wei Gao

, Huiming Zheng
, Chenhao Zhang
, Kaiyu Zheng
, Zhuozhen Yu
, Yuan Li
, Hua Ye
, Yongchi Zhang
:
OpenDIC: An Open-Source Library and Performance Evaluation for Deep-learning-based Image Compression. 11202-11205 - Hung-Jui Guo

, Hiranya Garbha Kumar
, Minhas Kamal
, Balakrishnan Prabhakaran
:
Room2XR: Virtual Interactive Collaboration in Real-world Scenes. 11206-11209 - Jack Jansen

, Thomas Röggla
, Silvia Rossi
, Irene Viola
, Pablo César
:
Open-Sourcing VR2Gather: A Collaborative Social VR System for Adaptive Multi-Party Real Time Communication. 11210-11213 - Joni Räsänen

, Heikki Tampio
, Alexandre Mercat
, Jarno Vanne
:
uvgComm: Open Software for Low-Latency Multi-party Video Communication. 11214-11217 - Tomás Soucek

, Jakub Lokoc
:
TransNet V2: An Effective Deep Network Architecture for Fast Shot Transition Detection. 11218-11221 - Jingyuan Tang

, Yangang Cai
, Xuesong Gao
, Songlin Sun
:
Generalized Sampling of Non-Local Textural Clues Multi-View Stereo Framework. 11222-11225 - Yuan Tong

, Mengshun Hu
, Zheng Wang
:
NNVISR: Bring Neural Network Video Interpolation and Super Resolution into Video Processing Framework. 11226-11229 - Marko Viitanen

, Joose Sainio
, Kari Siivonen
, Alexandre Mercat
, Jarno Vanne
:
uvg266: Open-Source VVC Intra Encoder. 11230-11233 - Liang Xie

, Wei Gao
:
LearningPCC: A PyTorch Library for Learning-Based Point Cloud Compression. 11234-11238 - Liang Xie

, Wei Gao
:
PCHMVision: An Open-Source Library of Point Cloud Compression for Human and Machine Vision. 11239-11243 - Feng Ye

, Li Zhang
, Chuanmin Jia
:
Deep Video Compression with Scaled Hierarchical Bi-directional Motion Model. 11244-11247 - Hang Yuan

, Wei Gao
, Wenxu Gao
:
OpenSEP: An Open Source Subjective Experiment Platform. 11248-11251
Technical Demonstrations
- Ansel Blume

, Khanh Duy Nguyen
, Zhenhailong Wang
, Yangyi Chen
, Michal Shlapentokh-Rothman
, Xiaomeng Jin
, Jeonghwan Kim
, Zhen Zhu
, Jiateng Liu
, Kuan-Hao Huang
, Mankeerat Sidhu
, Xuanming Zhang
, Vivian Liu
, Raunak Sinha
, Te-Lin Wu
, Abhay Zala
, Elias Stengel-Eskin
, Da Yin
, Yao Xiao
, Utkarsh Mall
, Zhou Yu
, Kai-Wei Chang
, Camille Cobb
, Karrie Karahalios
, Lydia B. Chilton
, Mohit Bansal
, Nanyun Peng
, Carl Vondrick
, Derek Hoiem
, Heng Ji
:
MIRACLE: An Online, Explainable Multimodal Interactive Concept Learning System. 11252-11254 - Difei Gao

, Siyuan Hu
, Zechen Bai
, Qinghong Lin
, Mike Zheng Shou
:
AssistEditor: Multi-Agent Collaboration for GUI Workflow Automation in Video Creation. 11255-11257 - Feilin Han

, Leping Zhang
, Xin Wang
, Ke-Ao Zhao
, Ying Zhong
, Ziyi Su
, Tongtong Feng
, Wenwu Zhu
:
U2USim - A UAV Telepresence Simulation Platform with Multi-agent Sensing and Dynamic Environment. 11258-11260 - Zhanbin Hu

, Xiaodong He
, Renzhou Pan
, Xianzhou Zeng
, Chenming Fan
, Qiang Zhu
:
MAF-ID: Multi-Agent Framework for Interactive Dubbing through Deep Video Understanding. 11261-11263 - Xin Jin

, Liaoruxing Zhang
, Longteng Jiang
, Dandan Li
:
Unlimited Vision: Professional Composition by Yourself. 11264-11266 - Seongjean Kim, Jungwoo Huh, Yeseung Park, Jungsu Kim, Sanghoon Lee:

DanceMimic: Awaken Your Dancing Instinct through a Real-time Dance Imitation Capture System. 11267-11269 - Ying Ma

, Xinyan Yang
, Aiqi Wang
, Jianglin Zeng
, Shaofei Liu
:
Video Editing Chatbot: Language-Driven Video Compositing System. 11270-11272 - Liangyu Wang

, Yoko Yamakata
, Ryoma Maeda
, Kiyoharu Aizawa
:
Measure and Improve Your Food: Ingredient Estimation Based Nutrition Calculator. 11273-11275 - Mingyuan Wu

, Ruifan Ji
, Haozhen Zheng
, Jiaxi Li
, Beitong Tian
, Bo Chen
, Ruixiao Zhang
, Jacob Chakareski
, Michael Zink
, Ramesh K. Sitaraman
, Klara Nahrstedt
:
Scene Graph Driven Hybrid Interactive VR Teleconferencing. 11276-11278 - Yuning Wu

, Jiatong Shi
, Yifeng Yu
, Yuxun Tang
, Tao Qian
, Yueqian Lin
, Jionghao Han
, Xinyi Bai
, Shinji Watanabe
, Qin Jin
:
Muskits-ESPnet: A Comprehensive Toolkit for Singing Voice Synthesis in New Paradigm. 11279-11281 - Shengzhou Yi

, Junichiro Matsugami
, Takuya Yamamoto
, Toshihiko Yamasaki
:
Enhancing Speaking and Slide Design Skills with Deep Learning: An Online Presentation Assessment System. 11282-11284
Tutorial Presentations
- Rahel Arnold

, Werner Bailer
, Ralph Gasser
, Björn Þór Jónsson
, Omar Shahbaz Khan
, Heiko Schuldt
, Florian Spiess
, Lucia Vadicamo
:
Multimedia Information Retrieval in XR. 11285-11286 - Niccolò Biondi

, Simone Ricci
, Federico Pernici
, Alberto Del Bimbo
:
Learning Backward Compatible Representations. 11287-11288 - Hao Fei

, Xiangtai Li
, Haotian Liu
, Fuxiao Liu
, Zhuosheng Zhang
, Hanwang Zhang
, Shuicheng Yan
:
From Multimodal LLM to Human-level AI: Modality, Instruction, Reasoning and Beyond. 11289-11291 - Wei Gao

, Ge Li
:
Point Cloud Compression, Enhancement and Applications: From 3D Perception to Large Models. 11292-11293 - Soyeon Caren Han

, Feiqi Cao
, Josiah Poon
, Roberto Navigli
:
Multimodal Large Language Models and Tunings: Vision, Language, Sensors, Audio, and Beyond. 11294-11295 - Xin Wang

, Yuwei Zhou
, Hong Chen
, Wenwu Zhu
:
Curriculum Learning for Multimedia in the Era of Large Language Models. 11296-11297 - Kaicheng Yu

, Zhuang Shao
, Siyuan Qi
, Dongfang Liu
:
Tutorial: Large Language-Vision Model in Society. 11298-11299 - Sicheng Zhao

, Guoli Jia
, Xiaopeng Hong
, Yanyan Zhao
, Jianhua Tao
:
Label-Efficient Emotion and Sentiment Analysis. 11300-11301
Grand Challenges
- Yicheng Wu

, Yutong Xie
, Xiangde Luo
, Qi Wu
, Jianfei Cai
:
Dataset, Challenge, and Evaluation for Tumor Segmentation Variability. 11302-11303 - Dan Guo

, Xiaobai Li
, Kun Li
, Haoyu Chen
, Jingjing Hu
, Guoying Zhao
, Yi Yang
, Meng Wang
:
MAC 2024: Micro-Action Analysis Grand Challenge. 11304-11305 - Jun Yu

, Mohan Jing
, Guopeng Zhao
, Keda Lu
, Yifan Wang
, Feng Zhao
, Jiaqing Sun
, Qingsong Liu
, Jiaen Liang
:
End-to-end Spatio-Temporal Information Aggregation For Micro-Action Detection. 11306-11312 - Qiankun Li

, Xiaolong Huang
, Huabao Chen
, Feng He
, Qiupu Chen
, Zengfu Wang
:
Advancing Micro-Action Recognition with Multi-Auxiliary Heads and Hybrid Loss Optimization. 11313-11319 - Chen Wang

, Xun Mei
, Feng Zhang
:
Instance-aware Fine-grained Micro-action Recognition. 11320-11326 - Fan Gong

, Jialiang Chen
, Jiajun Zhu
, Qijian Bao
, Fei Gao
, Renshu Gu
, Gang Xu
:
Micro-Action Recognition via Hierarchical Fusion and Inference. 11327-11332 - Muhammad Saad Saeed, Shah Nawaz, Marta Moscati, Rohan Kumar Das, Muhammad Salman Tahir, Muhammad Zaigham Zaheer, Muhammad Irzam Liaqat, Muhammad Haris Khan, Karthik Nandakumar, Muhammad Haroon Yousaf, Markus Schedl:

A Synopsis of FAME 2024 Challenge: Associating Faces with Voices in Multilingual Environments. 11333-11334 - Jiehui Tang

, Xiaofei Wang
, Zhen Xiao
, Jiayi Liu
, Xueliang Liu
, Richang Hong
:
Exploring Robust Face-Voice Matching in Multilingual Environments. 11335-11341 - Ruijie Tao

, Zhan Shi
, Yidi Jiang
, Duc-Tuan Truong
, Eng Siong Chng
, Massimo Alioto
, Haizhou Li
:
Multi-Stage Face-Voice Association Learning with Keynote Speaker Diarization. 11342-11347 - Wuyang Chen

, Yanjie Sun
, Kele Xu
, Yong Dou
:
Contrastive Learning-based Chaining-Cluster for Multilingual Voice-Face Association. 11348-11354 - Zhixi Cai

, Abhinav Dhall
, Shreya Ghosh
, Munawar Hayat
, Dimitrios Kollias
, Kalin Stefanov
, Usman Tariq
:
1M-Deepfakes Detection Challenge. 11355-11359 - Diego Pérez-Vieites

, Juan José Moreira-Pérez
, Ángel Aragón-Kifute
, Raquel Román-Sarmiento
, Rubén Castro-González
:
Vigo: Audiovisual Fake Detection and Segment Localization. 11360-11364 - Yi Zhang

, Changtao Miao
, Man Luo
, Jianshu Li
, Wenzhong Deng
, Weibin Yao
, Zhe Li
, Bingyu Hu
, Weiwei Feng
, Tao Gong
, Qi Chu
:
MFMS: Learning Modality-Fused and Modality-Specific Features for Deepfake Detection and Localization Tasks. 11365-11369 - Yifan Wang

, Xuecheng Wu
, Jia Zhang
, Mohan Jing
, Keda Lu
, Jun Yu
, Wen Su
, Fang Gao
, Qingsong Liu
, Jianqing Sun
, Jiaen Liang
:
Building Robust Video-Level Deepfake Detection via Audio-Visual Local-Global Interactions. 11370-11376 - Philipp Müller

, Michal Balazia
, Tobias Baur
, Michael Dietz
, Alexander Heimerl
, Anna Penzkofer
, Dominik Schiller
, François Brémond
, Jan Alexandersson
, Elisabeth André
, Andreas Bulling
:
MultiMediate'24: Multi-Domain Engagement Estimation. 11377-11382 - Deepak Kumar

, Surbhi Madan
, Pradeep Singh
, Abhinav Dhall
, Balasubramanian Raman
:
Towards Engagement Prediction: A Cross-Modality Dual-Pipeline Approach using Visual and Audio Features. 11383-11389 - Fuyan Ma

, Yiran He
, Bin Sun
, Shutao Li
:
Less is More: Adaptive Feature Selection and Fusion for Eye Contact Detection. 11390-11396 - Jia Li

, Yangchen Yu
, Yin Chen
, Yu Zhang
, Peng Jia
, Yunbo Xu
, Ziqiang Li
, Meng Wang
, Richang Hong
:
DAT: Dialogue-Aware Transformer with Modality-Group Fusion for Human Engagement Estimation. 11397-11403 - Yu Zhao

, Hao Fei
, Bobo Li
, Meishan Zhang
, Min Zhang
:
The ACM Multimedia 2024 Viual Spatial Description Grand Challenge. 11404-11406 - Jun Yu

, Yunxiang Zhang
, Zerui Zhang
, Zhao Yang
, Gongpeng Zhao
, Fengzhao Sun
, Fanrui Zhang
, Qingsong Liu
, Jianqing Sun
, Jiaen Liang
, Yaohui Zhang
:
RAG-Guided Large Language Models for Visual Spatial Description with Adaptive Hallucination Corrector. 11407-11413 - Jiabao Wang, Fang Gao, Jingfeng Tang

, Shaodong Li
, Hanbo Zheng
, Shengheng Ma
, Feng Shuang
, Jun Yu
:
A Method for Visual Spatial Description Based on Large Language Model Fine-tuning. 11414-11419 - Yizhang Jin

, Jian Li
, Jiangning Zhang
, Jianlong Hu
, Zhenye Gan
, Xin Tan
, Yong Liu
, Yabiao Wang
, Chengjie Wang
, Lizhuang Ma
:
LLaVA-VSD: Large Language-and-Vision Assistant for Visual Spatial Description. 11420-11425 - Zhiqi Ge

, Juncheng Li
, Qifan Yu
, Wei Zhou
, Siliang Tang
, Yueting Zhuang
:
DEMON24: ACM MM24 Demonstrative Instruction Following Challenge. 11426-11428 - Xian Fu

:
Enhancing Multimodal Large Language Models on Demonstrative Multi-Image Instructions. 11429-11434 - Jingyu Wei

, Yi Su
, Kele Xu
, Lingbin Zeng
, Bo Liu
, Huaimin Wang
:
Demonstrative Instruction Following in Multimodal LLMs via Integrating Low-Rank Adaptation with Ensemble Learning. 11435-11441 - Bo Wu

, Peiye Liu
, Qiushi Huang
, Zhaoyang Zeng
, Jia Wang
, Bei Liu
, Jiebo Luo
, Wen-Huang Cheng
:
SMP Challenge Summary: Social Media Prediction Challenge. 11442-11444 - Yu-Shi Lin

, Anthony J. T. Lee
:
MMF: Winning Solution to Social Media Popularity Prediction Challenge 2024. 11445-11449 - Wenhao Hu

, Weilong Chen
, Weimin Yuan
, Yan Wang
, Shimin Cai
, Yanru Zhang
:
Dual-Stream Pre-Training Transformer to Enhance Multimodal Learning for Social Media Prediction. 11450-11456 - Mingsheng Tu

, Tianjiao Wan
, Qisheng Xu
, Xinhao Jiang
, Kele Xu
, Cheng Yang
:
Higher-Order Vision-Language Alignment for Social Media Prediction. 11457-11463 - Chih-Chung Hsu

, Chia-Ming Lee
, Yu-Fan Lin
, Yi-Shiuan Chou
, Chih-Yu Jian
, Chi-Han Tsai
:
Revisiting Vision-Language Features Adaptation and Inconsistency for Social Media Popularity Prediction. 11464-11469 - Shien Song

, Jie Yang
, Jin Chen
, Han Qi
, Yifei Xue
, Yizhen Lao
, Yi Yu
:
ACM Multimedia 2024 Grand Challenge Report for Artificial Intelligence Generated Image Detection. 11470-11471 - Huihui Fu

:
Optimizing AIGC Image Detection: Strategies in Data Augmentation and Model Architecture. 11472-11474 - ShiHang Li

, Haishan Wu
, Biao Wang
:
A Solution to ACMMM 2024 on Artificial Intelligence Generated Image Detection. 11475-11477 - Jin Chen

:
Optimizing the Baseline Approach for the 2024 ACM Multimedia Grand Challenge in Artificial Intelligence Generated Image Detection. 11478-11481 - John See

, Jingting Li
, Adrian K. Davison
, Gen-Bing Liong
, Moi Hoon Yap
, Wen-Huang Cheng
, Xiaobai Li
, Xiaopeng Hong
, Su-Jing Wang
:
MEGC2024: ACM Multimedia 2024 Facial Micro-Expression Grand Challenge. 11482-11483 - Jun Yu

, Gongpeng Zhao
, Yaohui Zhang
, Peng He
, Zerui Zhang
, Zhao Yang
, Qingsong Liu
, Jianqing Sun
, Jiaen Liang
:
Temporal-Informative Adapters in VideoMAE V2 and Multi-Scale Feature Fusion for Micro-Expression Spotting-then-Recognize. 11484-11489 - Jun Yu

, Yaohui Zhang
, Gongpeng Zhao
, Peng He
, Zerui Zhang
, Zhongpeng Cai
, Qingsong Liu
, Jianqing Sun
, Jiaen Liang
:
Micro-Expression Spotting Based on Optical Flow Feature with Boundary Calibration. 11490-11496 - Zhengye Zhang

, Sirui Zhao
, Xinglong Mao
, Shifeng Liu
, Hao Wang
, Tong Xu
, Enhong Chen
:
A Multi-scale Feature Learning Network with Optical Flow Correction for Micro- and Macro-expression Spotting. 11497-11502 - Yuhong He

, Wenchao Liu
, Guangyu Wang
, Lin Ma
, Haifeng Li
:
Enhancing Micro-Expression Analysis Performance by Effectively Addressing Data Imbalance. 11503-11507
