


CVPR 2025: Nashville, TN, USA
- IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2025, Nashville, TN, USA, June 11-15, 2025. Computer Vision Foundation / IEEE 2025, ISBN 979-8-3503-5300-6

2025-06-13
- Daniel Geng, Charles Herrmann, Junhwa Hur, Forrester Cole, Serena Zhang, Tobias Pfaff, Tatiana Lopez-Guevara, Yusuf Aytar, Michael Rubinstein, Chen Sun, Oliver Wang, Andrew Owens, Deqing Sun:

Motion Prompting: Controlling Video Generation with Motion Trajectories. 1-12 - Ryan D. Burgert, Yuancheng Xu, Wenqi Xian, Oliver Pilarski, Pascal Clausen, Mingming He, Li Ma, Yitong Deng, Lingxiao Li, Mohsen Mousavi, Michael S. Ryoo, Paul E. Debevec, Ning Yu:

Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise. 13-23 - Pascal Chang, Sergio Sancho, Jingwei Tang, Markus Gross, Vinicius C. Azevedo:

LookingGlass: Generative Anamorphoses via Laplacian Pyramid Warping. 24-33 - Yifan Zhou, Zeqi Xiao, Shuai Yang, Xingang Pan:

Alias-Free Latent Diffusion Models: Improving Fractional Shift Equivariance of Diffusion Latent Space. 34-44 - Ziqi Pang, Tianyuan Zhang, Fujun Luan, Yunze Man, Hao Tan, Kai Zhang, William T. Freeman, Yu-Xiong Wang:

RandAR: Decoder-only Autoregressive Visual Generation in Random Orders. 45-55 - Pengfei Zhou, Xiaopeng Peng, Jiajun Song, Chuanhao Li, Zhaopan Xu, Yue Yang, Ziyao Guo, Hao Zhang, Yuqi Lin, Yefei He, Lirui Zhao, Shuo Liu, Tianhua Li, Yuxuan Xie, Xiaojun Chang, Yu Qiao, Wenqi Shao, Kaipeng Zhang:

OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation. 56-66 - Faridoun Mehri, Mahdieh Soleymani Baghshah, Mohammad Taher Pilehvar:

LibraGrad: Balancing Gradient Flow for Universally Better Vision Transformer Attributions. 67-78 - Damien Teney, Liangze Jiang, Florin Gogianu, Ehsan Abbasnejad:

Do We Always Need the Simplicity Bias? Looking for Optimal Inductive Biases in the Wild. 79-90 - Matt Deitke, Christopher Clark, Sangho Lee, Rohun Tripathi, Yue Yang, Jae Sung Park, Mohammadreza Salehi, Niklas Muennighoff, Kyle Lo, Luca Soldaini, Jiasen Lu, Taira Anderson, Erin Bransom, Kiana Ehsani, Huong Ngo, Yen-Sung Chen, Ajay Patel, Mark Yatskar, Chris Callison-Burch, Andrew Head, Rose Hendrix, Favyen Bastani, Eli VanderBilt, Nathan Lambert, Yvonne Chou, Arnavi Chheda, Jenna Sparks, Sam Skjonsberg, Michael Schmitz, Aaron Sarnat, Byron Bischoff, Pete Walsh, Chris Newell, Piper Wolters, Tanmay Gupta, Kuo-Hao Zeng, Jon Borchardt, Dirk Groeneveld, Crystal Nam, Sophie Lebrecht, Caitlin Wittlif, Carissa Schoenick, Oscar Michel, Ranjay Krishna, Luca Weihs, Noah A. Smith, Hannaneh Hajishirzi, Ross B. Girshick, Ali Farhadi, Aniruddha Kembhavi:

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models. 91-104 - Xiao Guo, Xiufeng Song, Yue Zhang, Xiaohong Liu, Xiaoming Liu:

Rethinking Vision-Language Model in Face Forensics: Multi-Modal Interpretable Forged Face Detector. 105-116 - Nick Stracke, Stefan Andreas Baumann, Kolja Bauer, Frank Fundel, Björn Ommer:

CleanDIFT: Diffusion Features without Noise. 117-127 - Meng Lou, Yizhou Yu:

OverLoCK: An Overview-first-Look-Closely-next ConvNet with Context-Mixing Dynamic Kernels. 128-138 - Longyu Yang, Ping Hu, Shangbo Yuan, Lu Zhang, Jun Liu, Hengtao Shen, Xiaofeng Zhu:

Towards Explicit Geometry-Reflectance Collaboration for Generalized LiDAR Segmentation in Adverse Weather. 139-149 - Xiaoyi Liu, Hao Tang:

DiffFNO: Diffusion Fourier Neural Operator. 150-160 - Eric Kee, Adam Pikielny, Kevin Blackburn-Matzen, Marc Levoy:

Removing Reflections from RAW Photos. 161-171 - Zhedong Zhang, Liang Li, Chenggang Yan, Chunshan Liu, Anton van den Hengel, Yuankai Qi:

Prosody-Enhanced Acoustic Pre-training and Acoustic-Disentangled Prosody Adapting for Movie Dubbing. 172-182 - Hao Li, Ju Dai, Xin Zhao, Feng Zhou, Junjun Pan, Lei Li:

Wav2Sem: Plug-and-Play Audio Semantic Decoupling for 3D Speech-Driven Facial Animation. 183-192 - Xiaozhong Ji, Xiaobin Hu, Zhihong Xu, Junwei Zhu, Chuming Lin, Qingdong He, Jiangning Zhang, Donghao Luo, Yi Chen, Qin Lin, Qinglin Lu, Chengjie Wang:

Sonic: Shifting Focus to Global Audio Perception in Portrait Animation. 193-203 - Xuanchen Li, Jianyu Wang, Yuhao Cheng, Yikun Zeng, Xingyu Ren, Wenhan Zhu, Weiming Zhao, Yichao Yan:

Towards High-fidelity 3D Talking Avatar with Personalized Dynamic Texture. 204-214 - Tim Büchner, Christoph Anders, Orlando Guntinas-Lichius, Joachim Denzler:

Electromyography-Informed Facial Expression Reconstruction for Physiological-Based Synthesis and Analysis. 215-227 - Mingtao Guo, Guanyu Xing, Yanli Liu:

High-Fidelity Relightable Monocular Portrait Animation with Lighting-Controllable Video Diffusion Model. 228-238 - Tuur Stuyck, Gene Wei-Chin Lin, Egor Larionov, Hsiao-Yu Chen, Aljaz Bozic, Nikolaos Sarafianos, Doug Roble:

Quaffure: Real-Time Quasi-Static Neural Hair Simulation. 239-249 - Wei-Qi Feng, Dong Han, Ze-Kang Zhou, Shunkai Li, Xiaoqiang Liu, Pengfei Wan, Di Zhang, Miao Wang:

GPAvatar: High-fidelity Head Avatars by Learning Efficient Gaussian Projections. 250-259 - Hongrui Cai, Yuting Xiao, Xuan Wang, Jiafei Li, Yudong Guo, Yanbo Fan, Shenghua Gao, Juyong Zhang:

HERA: Hybrid Explicit Representation for Ultra-Realistic Head Avatars. 260-270 - Jack R. Saunders, Charlie Hewitt, Yanan Jian, Marek Kowalski, Tadas Baltrusaitis, Yiye Chen, Darren Cosker, Virginia Estellers, Nicholas Gyde, Vinay P. Namboodiri, Benjamin E. Lundell:

GASP: Gaussian Avatars with Synthetic Priors. 271-280 - Rong Wang, Fabian Prada, Ziyan Wang, Zhongshi Jiang, Chengxiang Yin, Junxuan Li, Shunsuke Saito, Igor Santesteban, Javier Romero, Rohan Joshi, Hongdong Li, Jason M. Saragih, Yaser Sheikh:

FRESA: Feedforward Reconstruction of Personalized Skinned Avatars from Few Images. 281-291 - Jingyu Zhuang, Di Kang, Linchao Bao, Liang Lin, Guanbin Li:

DAGSM: Disentangled Avatar Generation with GS-enhanced Mesh. 292-303 - Zedong Chu, Feng Xiong, Meiduo Liu, Jinzhi Zhang, Mingqi Shao, Zhaoxu Sun, Di Wang, Mu Xu:

HumanRig: Learning Automatic Rigging for Humanoid Character in a Large Scale Dataset. 304-313 - Yuanyou Xu, Zongxin Yang, Yi Yang:

SKDream: Controllable Multi-view and 3D Generation with Arbitrary Skeletons. 314-325 - Xingchao Yang, Takafumi Taketomi, Yuki Endo, Yoshihiro Kanamori:

FreeUV: Ground-Truth-Free Realistic Facial UV Texture Recovery via Cross-Assembly Inference Strategy. 326-337 - Gangjian Zhang, Nanjie Yao, Shunsi Zhang, Hanfeng Zhao, Guoliang Pang, Jian Shu, Hao Wang:

MultiGO: Towards Multi-level Geometry Learning for Monocular 3D Textured Human Reconstruction. 338-347 - Zichen Tang, Yuan Yao, Miaomiao Cui, Liefeng Bo, Hongyu Yang:

GaussianIP: Identity-Preserving Realistic 3D Human Generation via Human-Centric Diffusion Prior. 348-358 - Yingmao Miao, Zhanpeng Huang, Rui Han, Zibin Wang, Chenhao Lin, Chao Shen:

Shining Yourself: High-Fidelity Ornaments Virtual Try-on with Diffusion Model. 359-368 - Sumit Chaturvedi, Mengwei Ren, Yannick Hold-Geoffroy, Jingyuan Liu, Julie Dorsey, Zhixin Shu:

SynthLight: Portrait Relighting with Diffusion Model by Learning to Re-render Synthetic Faces. 369-379 - Junying Wang, Jingyuan Liu, Xin Sun, Krishna Kumar Singh, Zhixin Shu, He Zhang, Jimei Yang, Nanxuan Zhao, Tuanfeng Y. Wang, Simon S. Chen, Ulrich Neumann, Jae Shin Yoon:

Comprehensive Relighting: Generalizable and Consistent Monocular Human Relighting and Harmonization. 380-390 - Kenji Enomoto, Scott Cohen, Brian L. Price, T. J. Rhodes:

Polarized Color Screen Matting. 391-399 - Ning Ni, Libao Zhang:

SLVR: Super-Light Visual Reconstruction via Blueprint Controllable Convolutions and Exploring Feature Diversity Representation. 400-410 - Ping Wang, Lishun Wang, Gang Qu, Xiaodong Wang, Yulun Zhang, Xin Yuan:

Proximal Algorithm Unrolling: Flexible and Efficient Reconstruction Networks for Single-Pixel Imaging. 411-421 - Bojian Wu, Yifan Peng, Ruizhen Hu, Xiaowei Zhou:

Glossy Object Reconstruction with Cost-effective Polarized Acquisition. 422-431 - Wei Xu, Charles James Wagner, Junjie Luo, Qi Guo:

Blurry-Edges: Photon-Limited Depth Estimation from Defocused Boundaries. 432-441 - Xiaoyan Xing, Konrad Groh, Sezer Karaoglu, Theo Gevers, Anand Bhattad:

LumiNet: Latent Intrinsics Meets Diffusion Models for Indoor Scene Relighting. 442-452 - Chao Wang, Zhihao Xia, Thomas Leimkühler, Karol Myszkowski, Xuaner Zhang:

LEDiff: Latent Exposure Diffusion for HDR Generation. 453-464 - Chih-Hao Lin, Jia-Bin Huang, Zhengqin Li, Zhao Dong, Christian Richardt, Tuotuo Li, Michael Zollhöfer, Johannes Kopf, Shenlong Wang, Changil Kim:

IRIS: Inverse Rendering of Indoor Scenes from Low Dynamic Range Images. 465-474 - Hoon-Gyu Chung, Seokjun Choi, Seung-Hwan Baek:

Differentiable Inverse Rendering with Interpretable Basis BRDFs. 475-484 - Samuel Rota Bulò, Nemanja Bartolovic, Lorenzo Porzi, Peter Kontschieder:

Hardware-Rasterized Ray-Based Gaussian Splatting. 485-494 - Chun Gu, Xiaofei Wei, Li Zhang, Xiatian Zhu:

TensoFlow: Tensorial Flow-based Sampler for Inverse Rendering. 495-504 - Zhengqin Li, Dilin Wang, Ka Chen, Zhaoyang Lv, Thu Nguyen-Phuoc, Milim Lee, Jia-Bin Huang, Lei Xiao, Yufeng Zhu, Carl S. Marshall, Yuheng Ren, Richard A. Newcombe, Zhao Dong:

LIRM: Large Inverse Rendering Model for Progressive Reconstruction of Shape, Materials and View-dependent Radiance Fields. 505-517 - Yutao Feng, Xiang Feng, Yintong Shang, Ying Jiang, Chang Yu, Zeshun Zong, Tianjia Shao, Hongzhi Wu, Kun Zhou, Chenfanfu Jiang, Yin Yang:

Gaussian Splashing: Unified Particles for Versatile Motion Synthesis and Rendering. 518-529 - Aditya Chetan, Guandao Yang, Zichen Wang, Steve Marschner, Bharath Hariharan:

Accurate Differential Operators for Hybrid Neural Fields. 530-539 - Feixiang He, Jiangbei Yue, Jialin Zhu, Armin Seyfried, Dan Casas, Julien Pettré, He Wang:

Learning Extremely High Density Crowds as Active Matters. 540-550 - Bojun Xiong, Jialun Liu, Jiakui Hu, Chenming Wu, Jinbo Wu, Xing Liu, Chen Zhao, Errui Ding, Zhouhui Lian:

TexGaussian: Generating High-quality PBR Material via Octree-based 3D Gaussian Splatting. 551-561 - Guoxing Sun, Rishabh Dabral, Heming Zhu, Pascal Fua, Christian Theobalt, Marc Habermann:

Real-time Free-view Human Rendering from Sparse-view RGB Videos using Double Unprojected Textures. 562-573 - Zhipeng Huang, Wangbo Yu, Xinhua Cheng, ChengShu Zhao, Yunyang Ge, Mingyi Guo, Li Yuan, Yonghong Tian:

RoomPainter: View-Integrated Diffusion for Consistent Indoor Scene Texturing. 574-584 - Wei Cheng, Juncheng Mu, Xianfang Zeng, Xin Chen, Anqi Pang, Chi Zhang, Zhibin Wang, Bin Fu, Gang Yu, Ziwei Liu, Liang Pan:

MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D. 585-594 - Qiao Yu, Xianzhi Li, Yuan Tang, Xu Han, Long Hu, Yixue Hao, Min Chen:

Fancy123: One Image to High-Quality 3D Mesh Generation via Plug-and-Play Deformation. 595-604 - Nissim Maruani, Wang Yifan, Matthew Fisher, Pierre Alliez, Mathieu Desbrun:

ShapeShifter: 3D Variations Using Multiscale and Sparse Point-Voxel Diffusion. 605-617 - Daoyi Gao, Yawar Siddiqui, Lei Li, Angela Dai:

MeshArt: Generating Articulated Meshes with Structure-Guided Transformers. 618-627 - Aleksei Bokhovkin, Quan Meng, Shubham Tulsiani, Angela Dai:

SceneFactor: Factored Latent 3D Diffusion for Controllable 3D Scene Generation. 628-639 - Ziya Erkoç, Can Gümeli, Chaoyang Wang, Matthias Nießner, Angela Dai, Peter Wonka, Hsin-Ying Lee, Peiye Zhuang:

PrEditor3D: Fast and Precise 3D Shape Editing. 640-649 - Quan Meng, Lei Li, Matthias Nießner, Angela Dai:

LT3SD: Latent Trees for 3D Scene Diffusion. 650-660 - Yian Zhao, Wanshi Xu, Ruochong Zheng, Pengchong Qiao, Chang Liu, Jie Chen:

iSegMan: Interactive Segment-and-Manipulate 3D Gaussians. 661-670 - Jianxiong Shen, Yue Qian, Xiaohang Zhan:

LOD-GS: Achieving Levels of Detail using Scalable Gaussian Soup. 671-680 - Yifei Liu, Zhihang Zhong, Yifan Zhan, Sheng Xu, Xiao Sun:

MaskGaussian: Adaptive 3D Gaussian Representation from Probabilistic Masks. 681-690 - Kun Yang, Yuxiang Liu, Zeyu Cui, Yu Liu, Maojun Zhang, Shen Yan, Qing Wang:

NTR-Gaussian: Nighttime Dynamic Thermal Reconstruction with 4D Gaussian Splatting Based on Thermodynamics. 691-700 - Yexing Xu, Longguang Wang, Minglin Chen, Sheng Ao, Li Li, Yulan Guo:

DropoutGS: Dropping Out Gaussians for Better Sparse-view Rendering. 701-710 - Yecong Wan, Mingwen Shao, Yuanshuo Cheng, Wangmeng Zuo:

S2Gaussian: Sparse-View Super-Resolution 3D Gaussian Splatting. 711-721 - Yihao Wang, Marcus Klasson, Matias Turkulainen, Shuzhe Wang, Juho Kannala, Arno Solin:

DeSplat: Decomposed Gaussian Splatting for Distractor-Free Rendering. 722-732 - Zhihao Liu, Zhanglin Cheng, Naoto Yokoya:

Neural Hierarchical Decomposition for Single Image Plant Modeling. 733-742 - Xiang Li, Zixuan Huang, Anh Thai, James M. Rehg:

Symmetry Strikes Back: From Single-Image Symmetry Detection to 3D Generation. 743-752 - Zhao Dong, Ka Chen, Zhaoyang Lv, Hong-Xing Yu, Yunzhi Zhang, Cheng Zhang, Yufeng Zhu, Stephen Tian, Zhengqin Li, Geordie Moffatt, Sean Christofferson, James Fort, Xiaqing Pan, Mingfei Yan, Jiajun Wu, Carl Yuheng Ren, Richard A. Newcombe:

Digital Twin Catalog: A Large-Scale Photorealistic 3D Object Digital Twin Dataset. 753-763 - Vitor Guizilini, Muhammad Zubair Irshad, Dian Chen, Greg Shakhnarovich, Rares Ambrus:

Zero-Shot Novel View and Depth Synthesis with Multi-View Geometric Diffusion. 764-776 - Lingen Li, Zhaoyang Zhang, Yaowei Li, Jiale Xu, Wenbo Hu, Xiaoyu Li, Weihao Cheng, Jinwei Gu, Tianfan Xue, Ying Shan:

NVComposer: Boosting Generative Novel View Synthesis with Multiple Sparse and Unposed Images. 777-787 - Jingyu Lin, Jiaqi Gu, Lubin Fan, Bojian Wu, Yujing Lou, Renjie Chen, Ligang Liu, Jieping Ye:

HybridGS: Decoupling Transients and Statics with 2D and 3D Gaussian Splatting. 788-797 - Hanwen Liang, Junli Cao, Vidit Goel, Guocheng Qian, Sergei Korolev, Demetri Terzopoulos, Konstantinos N. Plataniotis, Sergey Tulyakov, Jian Ren:

Wonderland: Navigating 3D Scenes from a Single Image. 798-810 - Zhen Lv, Yangqi Long, Congzhentao Huang, Cao Li, Chengfei Lv, Hao Ren, Dian Zheng:

SpatialDreamer: Self-supervised Stereo Video Synthesis from Monocular Input. 811-821 - Yunzhi Yan, Zhen Xu, Haotong Lin, Haian Jin, Haoyu Guo, Yida Wang, Kun Zhan, Xianpeng Lang, Hujun Bao, Xiaowei Zhou, Sida Peng:

StreetCrafter: Street View Synthesis with Controllable Video Diffusion Models. 822-832 - Jiadong Tang, Yu Gao, Dianyi Yang, Liqi Yan, Yufeng Yue, Yi Yang:

DroneSplat: 3D Gaussian Splatting for Robust 3D Reconstruction from In-the-Wild Drone Imagery. 833-843 - Cong Ruan, Yuesong Wang, Tao Guan, Bin Zhang, Lili Ju:

IndoorGS: Geometric Cues Guided Gaussian Splatting for Indoor Scene Reconstruction. 844-853 - Xiaohao Xu, Feng Xue, Shibo Zhao, Yike Pan, Sebastian A. Scherer, Xiaonan Huang:

MAC-Ego3D: Multi-Agent Gaussian Consensus for Real-Time Collaborative Ego-Motion and Photorealistic 3D Reconstruction. 854-863 - Sangmin Kim, Seunguk Do, Jaesik Park:

ShowMak3r: Compositional TV Show Reconstruction. 864-874 - Qiang Hu, Zihan Zheng, Houqiang Zhong, Sihua Fu, Li Song, Xiaoyun Zhang, Guangtao Zhai, Yanfeng Wang:

4DGC: Rate-Aware 4D Gaussian Compression for Efficient Streamable Free-Viewpoint Video. 875-885 - Yiming Liang, Tianhan Xu, Yuta Kikuchi:

HiMoR: Monocular Deformable Gaussian Reconstruction with Hierarchical Motion Representation. 886-895 - Siyuan Shen, Tianjia Shao, Kun Zhou, Chenfanfu Jiang, Yin Yang:

EnliveningGS: Active Locomotion of 3DGS. 896-905 - Hongye Cheng, Tianyu Wang, Guangsi Shi, Zexing Zhao, Yanwei Fu:

HOP: Heterogeneous Topology-based Multimodal Entanglement for Co-Speech Gesture Generation. 906-916 - Haolin Liu, Xiaohang Zhan, Zizheng Yan, Zhongjin Luo, Yuxin Wen, Xiaoguang Han:

Stable-SCore: A Stable Registration-based Framework for 3D Shape Correspondence. 917-928 - Bohan Yu, Jinxiu Liang, Zhuofeng Wang, Bin Fan, Art Subpa-Asa, Boxin Shi, Imari Sato:

Active Hyperspectral Imaging Using an Event Camera. 929-939 - Yaniv Benny, Lior Wolf:

SphereUFormer: A U-Shaped Transformer for Spherical 360 Perception. 940-950 - Huan Zheng, Wencheng Han, Jianbing Shen:

Decoupling Fine Detail and Global Geometry for Compressed Depth Map Super-Resolution. 951-960 - Hongkai Lin, Dingkang Liang, Zhenghao Qi, Xiang Bai:

A Unified Image-Dense Annotation Generation Model for Underwater Scenes. 961-970 - Jianing Li, Yunjian Zhang, Haiqian Han, Xiangyang Ji:

Active Event-based Stereo Vision. 971-981 - Zidong Cao, Jinjing Zhu, Weiming Zhang, Hao Ai, Haotian Bai, Hengshuang Zhao, Lin Wang:

PanDA: Towards Panoramic Depth Anything with Unlabeled Panoramas and Mobius Spatial Augmentation. 982-992 - Xunzhi Zheng, Dan Xu:

Flow-NeRF: Joint Learning of Geometry, Poses, and Dense Flow within Unified Neural Representations. 993-1002 - Jiaxi Deng, Yushen Wang, Haitao Meng, Zuoxun Hou, Yi Chang, Gang Chen:

OmniStereo: Real-time Omnidireactional Depth Estimation with Multiview Fisheye Cameras. 1003-1012 - Luca Bartolomei, Fabio Tosi, Matteo Poggi, Stefano Mattoccia:

Stereo Anywhere: Robust Zero-Shot Deep Stereo Matching Even Where Either Stereo or Mono Fail. 1013-1027 - Luigi Piccinelli, Christos Sakaridis, Mattia Segù, Yung-Hsu Yang, Siyuan Li, Wim Abbeloos, Luc Van Gool:

UniK3D: Universal Camera Monocular 3D Estimation. 1028-1039 - Yihan Wang, Linfei Pan, Marc Pollefeys, Viktor Larsson:

Structure-from-Motion with a Non-Parametric Camera Model. 1040-1049 - Yohann Cabon, Lucas Stoffl, Leonid Antsfeld, Gabriela Csurka, Boris Chidlovskii, Jérôme Revaud, Vincent Leroy:

MUSt3R: Multi-view Network for Stereo 3D Reconstruction. 1050-1060 - Hana Bezalel, Dotan Ankri, Ruojin Cai, Hadar Averbuch-Elor:

Extreme Rotation Estimation in the Wild. 1061-1070 - Wonbong Jang, Philippe Weinzaepfel, Vincent Leroy, Lourdes Agapito, Jérôme Revaud:

Pow3R: Empowering Unconstrained 3D Reconstruction with Camera and Scene Priors. 1071-1081 - Maxime Pietrantoni, Gabriela Csurka, Torsten Sattler:

Gaussian Splatting Feature Fields for (Privacy-Preserving) Visual Localization. 1082-1092 - Jonathan Astermark, Anders Heyden, Viktor Larsson:

Dense Match Summarization for Faster Two-view Estimation. 1093-1102 - Honggyu An, Jin Hyeon Kim, Seonghoon Park, Jaewoo Jung, Jisang Han, Sunghwan Hong, Seungryong Kim:

Cross-View Completion Models are Zero-shot Correspondence Estimators. 1103-1115 - David Yifan Yao, Albert J. Zhai, Shenlong Wang:

Uni4D: Unifying Visual Foundation Models for 4D Modeling from a Single Video. 1116-1126 - Yuzhen Liu, Qiulei Dong:

EquiPose: Exploiting Permutation Equivariance for Relative Camera Pose Estimation. 1127-1137 - Krispin Wandel, Hesheng Wang:

SemAlign3D: Semantic Correspondence between RGB-Images through Aligning 3D Object-Class Representations. 1138-1147 - Yufu Wang, Yu Sun, Priyanka Patel, Kostas Daniilidis, Michael J. Black, Muhammed Kocabas:

PromptHMR: Promptable Human Mesh Recovery. 1148-1159 - Yalong Xu, Lin Zhao, Chen Gong, Guangyu Li, Di Wang, Nannan Wang:

DynPose: Largely Improving the Efficiency of Human Pose Estimation by a Simple Dynamic Framework. 1160-1169 - Huan Ren, Wenfei Yang, Shifeng Zhang, Tianzhu Zhang:

Rethinking Correspondence-based Category-Level Object Pose Estimation. 1170-1179 - Ming-Feng Li, Xin Yang, Fu-En Wang, Hritam Basak, Yuyin Sun, Shreekant Gayaka, Min Sun, Cheng-Hao Kuo:

UA-Pose: Uncertainty-Aware 6D Object Pose Estimation and Online Object Completion with Partial References. 1180-1189 - Bin Tan, Rui Yu, Yujun Shen, Nan Xue:

PlanarSplatting: Accurate Planar Surface Reconstruction in 3 Minutes. 1190-1199 - Xiuqiang Song, Li Jin, Zhengxian Zhang, Jiachen Li, Fan Zhong, Guofeng Zhang, Xueying Qin:

Prior-free 3D Object Tracking. 1200-1209 - Guiyu Zhao, Sheng Ao, Ye Zhang, Kai Xu, Yulan Guo:

Progressive Correspondence Regenerator for Robust 3D Registration. 1210-1219 - Amir Etefaghi Daryani, M. Usman Maqbool Bhutta, Byron Hernandez, Henry Medeiros:

CaMuViD: Calibration-Free Multi-View Detection. 1220-1229 - Théo Bodrito, Olivier Flasseur, Julien Mairal, Jean Ponce, Maud Langlois, Anne-Marie Lagrange:

A New Statistical Model of Star Speckles for Learning to Detect and Characterize Exoplanets in Direct Imaging Observations. 1230-1240 - Huy Nguyen, Kien Nguyen, Akila Pemasiri, Feng Liu, Sridha Sridharan, Clinton Fookes:

AG-VPReID: A Challenging Large-Scale Benchmark for Aerial-Ground Video-based Person Re-Identification. 1241-1251 - Shuo Wang, Wanting Li, Yongcai Wang, Zhaoxin Fan, Zhe Huang, Xudong Cai, Jian Zhao, Deying Li:

MambaVO: Deep Visual Odometry Based on Sequential Matching Refinement and Training Smoothing. 1252-1262 - Hongyu Sun, Qiuhong Ke, Ming Cheng, Yongcai Wang, Deying Li, Chenhui Gou, Jianfei Cai:

Point-Cache: Test-time Dynamic and Hierarchical Cache for Robust and Generalizable Point Cloud Analysis. 1263-1275 - Zimo Wang, Cheng Wang, Taiki Yoshino, Sirui Tao, Ziyang Fu, Tzu-Mao Li:

HotSpot: Signed Distance Function Optimization with an Asymptotically Sufficient Condition. 1276-1286 - Yuanqi Li, Jingcheng Huang, Hongshen Wang, Peiyuan Lv, Yansong Liu, Jiuming Zheng, Jie Guo, Yanwen Guo:

High-quality Point Cloud Oriented Normal Estimation via Hybrid Angular and Euclidean Distance Encoding. 1287-1296 - Jiangbei Hu, Yanggeng Li, Fei Hou, Junhui Hou, Zhebin Zhang, Shengfa Wang, Na Lei, Ying He:

A Lightweight UDF Learning Framework for 3D Reconstruction Based on Local Shape Functions. 1297-1307 - An Li, Zhe Zhu, Mingqiang Wei:

GenPC: Zero-shot Point Cloud Completion via 3D Generative Priors. 1308-1318 - Ziyi Wang, Yanran Zhang, Jie Zhou, Jiwen Lu:

UniPre3D: Unified Pre-training of 3D Point Cloud Models with Cross-Modal Gaussian Splatting. 1319-1329 - Ziyin Zeng, Mingyue Dong, Jian Zhou, Huan Qiu, Zhen Dong, Man Luo, Bijun Li:

DeepLA-Net: Very Deep Local Aggregation Networks for Point Cloud Analysis. 1330-1341 - Chengzhi Wu, Yuxin Wan, Hao Fu, Julius Pfrommer, Zeyun Zhong, Junwei Zheng, Jiaming Zhang, Jürgen Beyerer:

SAMBLE: Shape-Specific Point Cloud Sampling for an Optimal Trade-Off Between Local Detail and Global Uniformity. 1342-1352 - Jianan Ye, Weiguang Zhao, Xi Yang, Guangliang Cheng, Kaizhu Huang:

PO3AD: Predicting Point Offsets toward Better 3D Point Cloud Anomaly Detection. 1353-1362 - Shaocheng Yan, Yiming Wang, Kaiyan Zhao, Pengcheng Shi, Zhenjun Zhao, Yongjun Zhang, Jiayuan Li:

HeMoRa: Unsupervised Heuristic Consensus Sampling for Robust Point Cloud Registration. 1363-1373 - Zihui Zhang, Weisheng Dai, Hongtao Wen, Bo Yang:

LogoSP: Local-global Grouping of Superpoints for Unsupervised Semantic Segmentation of 3D Point Clouds. 1374-1384 - Runmao Yao, Yi Du, Zhuoqun Chen, Haoze Zheng, Chen Wang:

AirRoom: Objects Matter in Room Reidentification. 1385-1394 - Fajwel Fogel, Yohann Perron, Nikola Besic, Laurent Saint-André, Agnès Pellissier-Tanon, Martin Schwartz, Thomas Boudras, Ibrahim Fayad, Alexandre d'Aspremont, Loïc Landrieu, Philippe Ciais:

Open-Canopy: Towards Very High Resolution Forest Monitoring. 1395-1406 - Xin Jin, Haisheng Su, Kai Liu, Cong Ma, Wei Wu, Fei Hui, Junchi Yan:

UniMamba: Unified Spatial-Channel Representation Learning with Group-Efficient Mamba for LiDAR-based 3D Object Detection. 1407-1417 - Qiming Xia, Wenkai Lin, Haoen Xiang, Xun Huang, Siheng Chen, Zhen Dong, Cheng Wang, Chenglu Wen:

Learning to Detect Objects from Multi-Agent LiDAR Scans without Manual Labels. 1418-1428 - R. D. Lin, Pengcheng Weng, Yinqiao Wang, Han Ding, Jinsong Han, Fei Wang:

HiLoTs: High-Low Temporal Sensitive Representation Learning for Semi-Supervised LiDAR Segmentation in Autonomous Driving. 1429-1438 - Zakaria Laskar, Tomas Vojir, Matej Grcic, Iaroslav Melekhov, Shankar Gangisetty, Juho Kannala, Jiri Matas, Giorgos Tolias, C. V. Jawahar:

A Dataset for Semantic Segmentation in the Presence of Unknowns. 1439-1448 - Ben Agro, Sergio Casas, Patrick Wang, Thomas Gilles, Raquel Urtasun:

MAD: Memory-Augmented Detection of 3D Objects. 1449-1460 - Cédric Vincent, Taehyoung Kim, Henri Meeß:

High Temporal Consistency through Semantic Similarity Propagation in Semi-Supervised Video Semantic Segmentation for Autonomous Flight. 1461-1471 - Lingdong Kong, Dongyue Lu, Xiang Xu, Lai Xing Ng, Wei Tsang Ooi, Benoit R. Cottereau:

EventFly: Event Camera Perception from Ground to the Sky. 1472-1484 - Tianchen Deng, Guole Shen, Chen Xun, Shenghai Yuan, Tongxin Jin, Hongming Shen, Yanbo Wang, Jingchuan Wang, Hesheng Wang, Danwei Wang, Weidong Chen:

MNE-SLAM: Multi-Agent Neural SLAM for Mobile Robots. 1485-1494 - Xin Ye, Burhaneddin Yaman, Sheng Cheng, Feng Tao, Abhirup Mallik, Liu Ren:

BEVDiffuser: Plug-and-Play Diffusion Model for BEV Denoising with Ground-Truth Guidance. 1495-1504 - Dubing Chen, Huan Zheng, Jin Fang, Xingping Dong, Xianfei Li, Wenlong Liao, Tao He, Pai Peng, Jianbing Shen:

Rethinking Temporal Fusion with a Unified Gradient Descent View for 3D Semantic Occupancy Prediction. 1505-1515 - Zhimin Liao, Ping Wei, Shuaijia Chen, Haoxuan Wang, Ziyang Ren:

STCOcc: Sparse Spatial-Temporal Cascade Renovation for 3D Occupancy and Scene Flow Prediction. 1516-1526 - Pan Yin, Kaiyu Li, Xiangyong Cao, Jing Yao, Lei Liu, Xueru Bai, Feng Zhou, Deyu Meng:

Towards Satellite Image Road Graph Extraction: A Global-Scale Dataset and A Novel Method. 1527-1537 - Chenxu Zhou, Lvchang Fu, Sida Peng, Yunzhi Yan, Zhanhua Zhang, Yong Chen, Jiazhi Xia, Xiaowei Zhou:

LiDAR-RT: Gaussian-based Ray Tracing for Dynamic LiDAR Re-simulation. 1538-1548 - Jingqiu Zhou, Lue Fan, Linjiang Huang, Xiaoyu Shi, Si Liu, Zhaoxiang Zhang, Hongsheng Li:

FlexDrive: Toward Trajectory Flexibility in Driving Scene Gaussian Splatting Reconstruction and Rendering. 1549-1558 - Chaojun Ni, Guosheng Zhao, Xiaofeng Wang, Zheng Zhu, Wenkang Qin, Guan Huang, Chen Liu, Yuyin Chen, Yida Wang, Xueyang Zhang, Yifei Zhan, Kun Zhan, Peng Jia, Xianpeng Lang, Xingang Wang, Wenjun Mei:

ReconDreamer: Crafting World Models for Driving Scene Reconstruction via Online Restoration. 1559-1569 - Shuhan Tan, John Lambert, Hong Jeon, Sakshum Kulshrestha, Yijing Bai, Jing Luo, Dragomir Anguelov, Mingxing Tan, Chiyu Max Jiang:

SceneDiffuser++: City-Scale Traffic Simulation via a Generative World Model. 1570-1580 - Ziyang Xie, Zhizheng Liu, Zhenghao Peng, Wayne Wu, Bolei Zhou:

Vid2Sim: Realistic and Interactive Simulation from Video for Urban Navigation. 1581-1591 - Yuchen Xia, Quan Yuan, Guiyang Luo, Xiaoyuan Fu, Yang Li, Xuanhan Zhu, Tianyou Luo, Siheng Chen, Jinglin Li:

One is Plenty: A Polymorphic Feature Interpreter for Immutable Heterogeneous Collaborative Perception. 1592-1601 - Zebin Xing, Xingyu Zhang, Yang Hu, Bo Jiang, Tong He, Qian Zhang, Xiaoxiao Long, Wei Yin:

GoalFlow: Goal-Driven Flow Matching for Multimodal Trajectories Generation in End-to-End Autonomous Driving. 1602-1611 - Zikang Zhou, Hengjian Zhou, Haibo Hu, Zihao Wen, Jianping Wang, Yung-Hui Li, Yu-Kai Huang:

ModeSeq: Taming Sparse Multimodal Motion Prediction with Sequential Mode Modeling. 1612-1621 - Yichen Xie, Runsheng Xu, Tong He, Jyh-Jing Hwang, Katie Luo, Jingwei Ji, Hubert Lin, Letian Chen, Yiren Lu, Zhaoqi Leng, Dragomir Anguelov, Mingxing Tan:

S4-Driver: Scalable Self-Supervised Driving Multimodal Large Language Model with Spatio-Temporal Visual Representation. 1622-1632 - Yifan Wang, Jian Zhao, Zhaoxin Fan, Xin Zhang, Xuecheng Wu, Yudian Zhang, Lei Jin, Xinyue Li, Gang Wang, Mengxi Jia, Ping Hu, Zheng Zhu, Xuelong Li:

JTD-UAV: MLLM-Enhanced Joint Tracking and Description Framework for Anti-UAV Systems. 1633-1644 - Ruiqi Qiu, Jun Gong, Xinyu Zhang, Siqi Luo, Bowen Zhang, Yi Cen:

Adapting to Observation Length of Trajectory Prediction via Contrastive Learning. 1645-1654 - Dianze Li, Jianing Li, Xu Liu, Xiaopeng Fan, Yonghong Tian:

Asynchronous Collaborative Graph Representation for Frames and Events. 1655-1666 - Huangyue Yu, Baoxiong Jia, Yixin Chen, Yandan Yang, Puhao Li, Rongpeng Su, Jiaxin Li, Qing Li, Wei Liang, Song-Chun Zhu, Tengyu Liu, Siyuan Huang:

METASCENES: Towards Automated Replica Creation for Real-world 3D Scans. 1667-1679 - Dongyue Lu, Lingdong Kong, Tianxin Huang, Gim Hee Lee:

GEAL: Generalizable 3D Affordance Learning with Cross-Modal Consistency. 1680-1690 - Chunlin Yu, Hanqing Wang, Ye Shi, Haoyang Luo, Sibei Yang, Jingyi Yu, Jingya Wang:

SeqAfford: Sequential 3D Affordance Reasoning via Multimodal Large Language Model. 1691-1701 - Qingqing Zhao, Yao Lu, Moo Jin Kim, Zipeng Fu, Zhuoyang Zhang, Yecheng Wu, Zhaoshuo Li, Qianli Ma, Song Han, Chelsea Finn, Ankur Handa, Tsung-Yi Lin, Gordon Wetzstein, Ming-Yu Liu, Donglai Xiang:

CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models. 1702-1713 - Zhenyu Wu, Yuheng Zhou, Xiuwei Xu, Ziwei Wang, Haibin Yan:

MoManipVLA: Transferring Vision-language-action Models for General Mobile Manipulation. 1714-1723 - Yuheng Ji, Huajie Tan, Jiayu Shi, Xiaoshuai Hao, Yuan Zhang, Hengyuan Zhang, Pengwei Wang, Mengdi Zhao, Yao Mu, Pengju An, Xinda Xue, Qinghang Su, Huaihai Lyu, Xiaolong Zheng, Jiaming Liu, Zhongyuan Wang, Shanghang Zhang:

RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete. 1724-1734 - Tianxing Chen, Yao Mu, Zhixuan Liang, Zanxin Chen, Shijia Peng, Qiangyu Chen, Mingkun Xu, Ruizhen Hu, Hongyuan Zhang, Xuelong Li, Ping Luo:

G3Flow: Generative 3D Semantic Flow for Pose-aware and Generalizable Object Manipulation. 1735-1744 - Zhixuan Liang, Yao Mu, Yixiao Wang, Tianxing Chen, Wenqi Shao, Wei Zhan, Masayoshi Tomizuka, Ping Luo, Mingyu Ding:

DexHandDiff: Interaction-aware Diffusion Planning for Adaptive Dexterous Manipulation. 1745-1755 - Guangyan Chen, Te Cui, Meiling Wang, Chengcai Yang, Mengxiao Hu, Haoyang Lu, Yao Mu, Zicai Peng, Tianxing Zhou, Xinran Jiang, Yi Yang, Yufeng Yue:

GraphMimic: Graph-to-Graphs Generative Modeling from Videos for Policy Learning. 1756-1768 - Yun Liu, Chengwen Zhang, Ruofan Xing, Bingda Tang, Bowen Yang, Li Yi:

CORE4D: A 4D Human-Object-Human Interaction Dataset for Collaborative Object REarrangement. 1769-1782 - Alpár Cseke, Shashank Tripathi, Sai Kumar Dwivedi, Arjun S. Lakshmipathy, Agniv Chatterjee, Michael J. Black, Dimitrios Tzionas:

PICO: Reconstructing 3D People In Contact with Objects. 1783-1794 - Yiming Dou, Wonseok Oh, Yuqing Luo, Antonio Loquercio, Andrew Owens:

Hearing Hands: Generating Sounds from Physical Interactions in 3D Scenes. 1795-1804 - Jinglei Zhang, Jiankang Deng, Chao Ma, Rolandos Alexandros Potamias:

HaWoR: World-Space Hand Motion Reconstruction from Egocentric Videos. 1805-1815 - Jeonghwan Kim, Jisoo Kim, Jeonghyeon Na, Hanbyul Joo:

ParaHome: Parameterizing Everyday Home Activities Towards 3D Generative Modeling of Human-Object Interactions. 1816-1828 - Jing Gao, Ce Zheng, László A. Jeni, Zackory Erickson:

DiSRT-In-Bed: Diffusion-Based Sim-to-Real Transfer Framework for In-Bed Human Mesh Recovery. 1829-1838 - Songpengcheng Xia, Yu Zhang, Zhuo Su, Xiaozheng Zheng, Zheng Lv, Guidong Wang, Yongjie Zhang, Qi Wu, Lei Chu, Ling Pei:

EnvPoser: Environment-aware Realistic Human Motion Estimation from Sparse Observations with Uncertainty Modeling. 1839-1849 - Germán Barquero, Nadine Bertsch, Manojkumar Marramreddy, Carlos Chacón, Filippo Arcadu, Ferran Rigual, Nicky Sijia He, Cristina Palmero, Sergio Escalera, Yuting Ye, Robin Kips:

From Sparse Signal to Smooth Motion: Real-Time Motion Generation with Rolling Prediction Models. 1850-1860 - Dong Wei, Xiaoning Sun, Xizhan Gao, Shengxiang Hu, Huaijiang Sun:

ALIEN: Implicit Neural Representations for Human Motion Prediction under Arbitrary Latency. 1861-1870 - Cecilia Curreli, Dominik Muhle, Abhishek Saroha, Zhenzhang Ye, Riccardo Marin, Daniel Cremers:

Nonisotropic Gaussian Diffusion for Realistic 3D Human Motion Prediction. 1871-1882 - Jianwei Tang, Hong Yang, Tengyue Chen, Jianfang Hu:

Stochastic Human Motion Prediction with Memory of Action Transition and Action Characteristic. 1883-1893 - Jiayi Su, Youhe Feng, Zheng Li, Jinhua Song, Yangfan He, Botao Ren, Botian Xu:

ArtFormer: Controllable Generation of Diverse 3D Articulated Objects. 1894-1904 - Dian Shao, Mingfei Shi, Shengda Xu, Haodong Chen, Yongle Huang, Binglu Wang:

FinePhys: Fine-grained Human Action Generation by Explicitly Incorporating Physical Laws for Effective Skeletal Guidance. 1905-1916 - Ting-Hsuan Liao, Yi Zhou, Yu Shen, Chun-Hao Paul Huang, Saayan Mitra, Jia-Bin Huang, Uttaran Bhattacharya:

Shape My Moves: Text-Driven Shape-Aware Synthesis of Human Motions. 1917-1928 - Xuan Wang, Kai Ruan, Xing Zhang, Gaoang Wang:

AniMo: Species-Aware Model for Text-Driven Animal Motion Generation. 1929-1939 - Yifeng Ma, Jinwei Qi, Chaonan Ji, Peng Zhang, Bang Zhang, Zhidong Deng, Liefeng Bo:

Exploring Timeline Control for Facial Motion Generation. 1940-1950 - Ruineng Li, Daitao Xing, Huiming Sun, Yuanzhou Ha, Jinglin Shen, Chiuman Ho:

TokenMotion: Decoupled Motion Control via Token Disentanglement for Human-centric Video Generation. 1951-1961 - Inès Hyeonsu Kim, Seokju Cho, Jiahui Huang, Jung Yi, Joon-Young Lee, Seungryong Kim:

Exploring Temporally-Aware Features for Point Tracking. 1962-1972 - Yuhong Zhang, Guanlin Wu, Ling-Hao Chen, Zhuokai Zhao, Jing Lin, Xiaoke Jiang, Jiamin Wu, Zhuoheng Li, Hao (Frank) Yang, Haoqian Wang, Lei Zhang:

HumanMM: Global Human Motion Recovery from Multi-shot Videos. 1973-1983 - Daikun Liu, Lei Cheng, Teng Wang, Changyin Sun:

EDCFlow: Exploring Temporally Dense Difference Maps for Event-based Optical Flow Estimation. 1984-1993 - Zaoming Yan, Pengcheng Lei, Tingting Wang, Faming Fang, Junkang Zhang, Yaomin Huang, Haichuan Song:

Explicit Depth-Aware Blurry Video Frame Interpolation Guided by Differential Curves. 1994-2004 - Wenbo Hu, Xiangjun Gao, Xiaoyu Li, Sijie Zhao, Xiaodong Cun, Yong Zhang, Long Quan, Ying Shan:

DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos. 2005-2015 - Baorui Ma, Huachen Gao, Haoge Deng, Zhengxiong Luo, Tiejun Huang, Lulu Tang, Xinlong Wang:

You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale. 2016-2029 - Karran Pandey, Yannick Hold-Geoffroy, Matheus Gadelha, Niloy J. Mitra, Karan Singh, Paul Guerrero:

Motion Modes: What Could Happen Next? 2030-2039 - Wonjoon Jin, Qi Dai, Chong Luo, Seung-Hwan Baek, Sunghyun Cho:

FloVD: Optical Flow Meets Video Diffusion Model for Enhanced Camera-Controlled Video Synthesis. 2040-2049 - David Junhao Zhang, Roni Paiss, Shiran Zada, Nikhil Karnad, David E. Jacobs, Yael Pritch, Inbar Mosseri, Mike Zheng Shou, Neal Wadhwa, Nataniel Ruiz:

ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning. 2050-2062 - Zhenghao Zhang, Junchao Liao, Menghao Li, Zuozhuo Dai, Bingxue Qiu, Siyu Zhu, Long Qin, Weizhi Wang:

Tora: Trajectory-oriented Diffusion Transformer for Video Generation. 2063-2073 - Shengzhi Wang, Yingkang Zhong, Jiangchuan Mu, Kai Wu, Mingliang Xiong, Wen Fang, Mingqing Liu, Hao Deng, Bin He, Gang Li, Qingwen Liu:

Align-A-Video: Deterministic Reward Tuning of Image Diffusion Models for Consistent Video Editing. 2074-2083 - Xin Jin, Simon Niklaus, Zhoutong Zhang, Zhihao Xia, Chunle Guo, Yuting Yang, Jiawen Chen, Chongyi Li:

Classic Video Denoising in a Machine Learning World: Robust, Fast, and Controllable. 2084-2093 - Yifan Bian, Chuanbo Tang, Li Li, Dong Liu:

Augmented Deep Contexts for Spatially Embedded Video Coding. 2094-2104 - Zihao Zhang, Haoran Chen, Haoyu Zhao, Guansong Lu, Yanwei Fu, Hang Xu, Zuxuan Wu:

EDEN: Enhanced Diffusion for High-quality Large-motion Video Frame Interpolation. 2105-2115 - Yuantong Zhang, Zhenzhong Chen:

Continuous Space-Time Video Resampling with Invertible Motion Steganography. 2116-2126 - Xingguang Zhang, Nicholas Chimitt, Xijun Wang, Yu Yuan, Stanley H. Chan:

Learning Phase Distortion with Selective State Space Models for Video Turbulence Mitigation. 2127-2138 - Yiran Xu, Taesung Park, Richard Zhang, Yang Zhou, Eli Shechtman, Feng Liu, Jia-Bin Huang, Difan Liu:

VideoGigaGAN: Towards Detail-rich Video Super-Resolution. 2139-2149 - Yunpeng Qu, Kun Yuan, Qizhi Xie, Ming Sun, Chao Zhou, Jian Wang:

KVQ: Boosting Video Quality Assessment via Saliency-guided Local Perception. 2150-2160 - Jianyi Wang, Zhijie Lin, Meng Wei, Yang Zhao, Ceyuan Yang, Chen Change Loy, Lu Jiang:

SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video Restoration. 2161-2172 - Weichen Dai, Hexing Wu, Xiaoyang Weng, Yuxin Zheng, Yuhang Ming, Wanzeng Kong:

Multi-Modal Synergistic Implicit Image Enhancement for Efficient Optical Flow Estimation. 2173-2182 - Yutong Wang, Jiajie Teng, Jiajiong Cao, Yuming Li, Chenguang Ma, Hongteng Xu, Dixin Luo:

Efficient Video Face Enhancement with Enhanced Spatial-Temporal Consistency. 2183-2193 - Xinan Xie, Qing Zhang, Wei-Shi Zheng:

Diffusion-based Event Generation for High-Quality Image Deblurring. 2194-2203 - Yanis Benidir, Nicolas Gonthier, Clément Mallet:

The Change You Want To Detect: Semantic Change Detection In Earth Observation With Hybrid Data Generation. 2204-2214 - Hyejin Oh, Woo-Shik Kim, Sangyoon Lee, YungKyung Park, Je-Won Kang:

Illumination Spectrum Estimation for Multispectral Images via Surface Reflectance Modeling and Spatial-Spectral Feature Generation. 2215-2225 - Jinyuan Liu, Bowei Zhang, Qingyun Mei, Xingyuan Li, Yang Zou, Zhiying Jiang, Long Ma, Risheng Liu, Xin Fan:

DCEvo: Discriminative Cross-Dimensional Evolutionary Learning for Infrared and Visible Image Fusion. 2226-2235 - Junming Hou, Xiaoyu Chen, Ran Ran, Xiaofeng Cong, Xinyang Liu, Jian Wei You, Liang-Jian Deng:

Binarized Neural Network for Multi-spectral Image Fusion. 2236-2245 - Haitao Wu, Qing Li, Changqing Zhang, Zhen He, Xiaomin Ying:

Bridging the Vision-Brain Gap with an Uncertainty-Aware Blur Prior. 2246-2257 - Jiuchen Chen, Xinyu Yan, Qizhi Xu, Kaiqi Li:

Tokenize Image Patches: Global Context Fusion for Effective Haze Removal in Large Images. 2258-2268 - Woo Kyoung Han, Byeonghun Lee, Hyunmin Cho, Sunghoon Im, Kyong Hwan Jin:

Towards Lossless Implicit Neural Representation via Bit Plane Decomposition. 2269-2278 - Wei Long, Xingyu Zhou, Leheng Zhang, Shuhang Gu:

Progressive Focused Transformer for Single Image Super-Resolution. 2279-2288 - Yuxuan Jiang, Ho Man Kwan, Tianhao Peng, Ge Gao, Fan Zhang, Xiaoqing Zhu, Joel Sole, David Bull:

HIIF: Hierarchical Encoding based Implicit Image Function for Continuous Super-resolution. 2289-2299 - Yulu Bai, Jiahong Fu, Qi Xie, Deyu Meng:

A Regularization-Guided Equivariant Approach for Image Restoration. 2300-2310 - Fengjia Zhang, Samrudhdhi B. Rangrej, Tristan Aumentado-Armstrong, Afsaneh Fazly, Alex Levinshtein:

Augmenting Perceptual Super-Resolution via Image Quality Predictors. 2311-2322 - Tengyu Ma, Long Ma, Ziye Li, Yuetong Wang, Jinyuan Liu, Chengpei Xu, Risheng Liu:

Rethinking Reconstruction and Denoising in the Dark: New Perspective, General Architecture and Beyond. 2323-2332 - Lingchen Sun, Rongyuan Wu, Zhiyuan Ma, Shuaizheng Liu, Qiaosi Yi, Lei Zhang:

Pixel-level and Semantic-level Adjustable Super-resolution: A Dual-LoRA Approach. 2333-2343 - Xudong Li, Wenjie Nie, Yan Zhang, Runze Hu, Ke Li, Xiawu Zheng, Liujuan Cao:

Distilling Spatially-Heterogeneous Distortion Perception for Blind Image Quality Assessment. 2344-2354 - Jinho Jeong, Sangmin Han, Jinwoo Kim, Seon Joo Kim:

Latent Space Super-Resolution for Higher-Resolution Image Generation with Diffusion Models. 2355-2365 - Guangqian Guo, Yong Guo, Xuehui Yu, Wenbo Li, Yaoxing Wang, Shan Gao:

Segment Any-Quality Images with Generative Latent Space Enhancement. 2366-2376 - Yuhan Wang, Suzhi Bi, Ying-Jun Angela Zhang, Xiaojun Yuan:

Traversing Distortion-Perception Tradeoff using a Single Score-Based Generative Model. 2377-2386 - Zhifu Tian, Tao Hu, Chaoyang Niu, Di Wu, Shu Wang:

Sampling Innovation-Based Adaptive Compressive Sensing. 2387-2397 - Tomer Garber, Tom Tirer:

Zero-Shot Image Restoration Using Few-Step Guidance of Consistency Models (and Beyond). 2398-2407 - Zhi Jiang, Jingbo Hu, Ling Zhang, Gang Fu, Chunxia Xiao:

Hierarchical Adaptive Filtering Network for Text Image Specular Highlight Removal. 2408-2417 - Yi Liu, Hao Zhou, Benlei Cui, Wenxiang Shang, Ran Lin:

Erase Diffusion: Empowering Object Removal Through Calibrating Diffusion Pathways. 2418-2427 - Yichi Zhang, Zhihao Duan, Yuning Huang, Fengqing Zhu:

Balanced Rate-Distortion Optimization in Learned Image Compression. 2428-2438 - Sora Kim, Sungho Suh, Minsik Lee:

RAD: Region-Aware Diffusion Models for Image Inpainting. 2439-2448 - Lucas Relic, Roberto Azevedo, Yang Zhang, Markus Gross, Christopher Schroers:

Bridging the Gap between Gaussian Diffusion Models and Universal Quantization for Image Compression. 2449-2458 - Haosen Yang, Adrian Bulat, Isma Hadji, Hai X. Pham, Xiatian Zhu, Georgios Tzimiropoulos, Brais Martínez:

FAM Diffusion: Frequency and Attention Modulation for High-Resolution Image Generation with Stable Diffusion. 2459-2468 - Yuchuan Tian, Jing Han, Chengcheng Wang, Yuchen Liang, Chao Xu, Hanting Chen:

DiC: Rethinking Conv3x3 Designs in Diffusion Models. 2469-2478 - Yushu Wu, Zhixing Zhang, Yanyu Li, Yanwu Xu, Anil Kag, Yang Sui, Huseyin Coskun, Ke Ma, Aleksei Lebedev, Ju Hu, Dimitris N. Metaxas, Yanzhi Wang, Sergey Tulyakov, Jian Ren:

SnapGen-V: Generating a Five-Second Video within Five Seconds on a Mobile Device. 2479-2490 - Zijian Zhou, Shikun Liu, Xiao Han, Haozhe Liu, Kam Woh Ng, Tian Xie, Yuren Cong, Hang Li, Mengmeng Xu, Juan-Manuel Pérez-Rúa, Aditya Patel, Tao Xiang, Miaojing Shi, Sen He:

Learning Flow Fields in Attention for Controllable Person Image Generation. 2491-2501 - Xiao Zhang, Ruoxi Jiang, Rebecca Willett, Michael Maire:

Nested Diffusion Models Using Hierarchical Latent Priors. 2502-2512 - Myunsoo Kim, Donghyeon Ki, Seong-Woong Shim, Byung-Jun Lee:

Adaptive Non-Uniform Timestep Sampling for Accelerating Diffusion Model Training. 2513-2522 - Nanye Ma, Shangyuan Tong, Haolin Jia, Hexiang Hu, Yu-Chuan Su, Mingda Zhang, Xuan Yang, Yandong Li, Tommi S. Jaakkola, Xuhui Jia, Saining Xie:

Scaling Inference Time Compute for Diffusion Models. 2523-2534 - Hermann Kumbong, Xian Liu, Tsung-Yi Lin, Ming-Yu Liu, Xihui Liu, Ziwei Liu, Daniel Y. Fu, Christopher Ré, David W. Romero:

HMAR: Efficient Hierarchical Masked Auto-Regressive Image Generation. 2535-2544 - Liao Qu, Huichao Zhang, Yiheng Liu, Xu Wang, Yi Jiang, Yiming Gao, Hu Ye, Daniel K. Du, Zehuan Yuan, Xinglong Wu:

TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation. 2545-2555 - Subhadeep Koley, Tapas Kumar Dutta, Aneeshan Sain, Pinaki Nath Chowdhury, Ayan Kumar Bhunia, Yi-Zhe Song:

SketchFusion: Learning Universal Sketch Features through Fusing Foundation Models. 2556-2567 - Roberto Henschel, Levon Khachatryan, Hayk Poghosyan, Daniil Hayrapetyan, Vahram Tadevosyan, Zhangyang Wang, Shant Navasardyan, Humphrey Shi:

StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text. 2568-2577 - Hongjie Wang, Chih-Yao Ma, Yen-Cheng Liu, Ji Hou, Tao Xu, Jialiang Wang, Felix Juefei-Xu, Yaqiao Luo, Peizhao Zhang, Tingbo Hou, Peter Vajda, Niraj K. Jha, Xiaoliang Dai:

LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity. 2578-2588 - Yukun Wang, Longguang Wang, Zhiyuan Ma, Qibin Hu, Kai Xu, Yulan Guo:

VideoDirector: Precise Video Editing via Text-to-Video Models. 2589-2598 - Dohun Lee, Bryan Sangwoo Kim, Geon Yeong Park, Jong Chul Ye:

VideoGuide: Improving Video Diffusion Models without Training Through a Teacher's Guide. 2599-2608 - Xi Wang, Robin Courant, Marc Christie, Vicky Kalogeiton:

AKiRa: Augmentation Kit on Rays for Optical Video Generation. 2609-2619 - Mingi Kwon, Shin seong Kim, Jaeseok Jeong, Yi Ting Hsiao, Youngjung Uh:

TCFG: Tangential Damping Classifier-free Guidance. 2620-2629 - Zixuan Ye, Huijuan Huang, Xintao Wang, Pengfei Wan, Di Zhang, Wenhan Luo:

StyleMaster: Stylize Your Video with Artistic Generation and Translation. 2630-2640 - Nadav Z. Cohen, Oron Nir, Ariel Shamir:

Conditional Balance: Improving Multi-Conditioning Trade-Offs in Image Generation. 2641-2650 - Yufan Ren, Zicong Jiang, Tong Zhang, Søren Forchhammer, Sabine Süsstrunk:

FDS: Frequency-Aware Denoising Score for Text-Guided Latent Diffusion Image Editing. 2651-2660 - Fengyi Fu, Lei Zhang, Mengqi Huang, Zhendong Mao:

FeedEdit: Text-Based Image Editing with Dynamic Feedback Regulation. 2661-2670 - Duong H. Le, Tuan Pham, Sangho Lee, Christopher Clark, Aniruddha Kembhavi, Stephan Mandt, Ranjay Krishna, Jiasen Lu:

One Diffusion to Generate Them All. 2671-2682 - Yanfeng Li, Ka-Hou Chan, Yue Sun, Chan-Tong Lam, Tong Tong, Zitong Yu, Keren Fu, Xiaohong Liu, Tao Tan:

MoEdit: On Learning Quantity Perception for Multi-object Image Editing. 2683-2693 - Yingjing Xu, Jie Kong, Jiazhi Wang, Xiao Pan, Bo Lin, Qiang Liu:

InsightEdit: Towards Better Instruction Following for Image Editing. 2694-2703 - Mingdeng Cao, Xuaner Zhang, Yinqiang Zheng, Zhihao Xia:

Instruction-based Image Manipulation by Watching How Things Move. 2704-2713 - Mushui Liu, Dong She, Jingxuan Pang, Qihan Huang, Jiacheng Ying, Wanggui He, Yuanlei Hou, Siming Fu:

TFCustom: Customized Image Generation with Time-Aware Frequency Feature Guidance. 2714-2723 - Edurne Bernal-Berdun, Ana Serrano, Belén Masiá, Matheus Gadelha, Yannick Hold-Geoffroy, Xin Sun, Diego Gutierrez:

PreciseCam: Precise Camera Control for Text-to-Image Generation. 2724-2733 - Jialuo Li, Wenhao Chai, Xingyu Fu, Haiyang Xu, Saining Xie:

Science-T2I: Addressing Scientific Illusions in Image Synthesis. 2734-2744 - Wataru Shimoda, Naoto Inoue, Daichi Haraguchi, Hayato Mitani, Seiichi Uchida, Kota Yamaguchi:

Type-R: Automatically Retouching Typos for Text-to-Image Generation. 2745-2754 - Qihao Liu, Xi Yin, Alan L. Yuille, Andrew Brown, Mannat Singh:

Flowing from Words to Pixels: A Noise-Free Framework for Cross-Modality Evolution. 2755-2765 - Chao Feng, Ziyang Chen, Aleksander Holynski, Alexei A. Efros, Andrew Owens:

GPS as a Control Signal for Image Generation. 2766-2778 - Zijie Li, Henry Li, Yichun Shi, Amir Barati Farimani, Yuval Kluger, Linjie Yang, Peng Wang:

Dual Diffusion for Unified Image Generation and Understanding. 2779-2790 - Rishubh Parihar, Vaibhav Agrawal, Sachidanand VS, Venkatesh Babu Radhakrishnan:

Compass Control: Multi Object Orientation Control for Text-to-Image Generation. 2791-2801 - Jiaxiu Jiang, Yabo Zhang, Kailai Feng, Xiaohe Wu, Wenbo Li, Renjing Pei, Fan Li, Wangmeng Zuo:

MC^2: Multi-concept Guidance for Customized Multi-concept Generation. 2802-2812 - Bin Wu, Wuxuan Shi, Jinqiao Wang, Mang Ye:

Synthetic Data is an Elegant GIFT for Continual Vision-Language Models. 2813-2823 - Florinel-Alin Croitoru, Vlad Hondru, Radu Tudor Ionescu, Nicu Sebe, Mubarak Shah:

Curriculum Direct Preference Optimization for Diffusion and Consistency Models. 2824-2834 - Rui Zhao, Weijia Mao, Mike Zheng Shou:

DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles. 2835-2846 - Cong Xie, Han Zou, Ruiqi Yu, Yan Zhang, Zhenpeng Zhan:

SerialGen: Personalized Image Generation by First Standardization Then Personalization. 2847-2856 - Yuanbo Yang, Jiahao Shao, Xinyang Li, Yujun Shen, Andreas Geiger, Yiyi Liao:

Prometheus: 3D-Aware Latent Diffusion Models for Feed-Forward Text-to-3D Scene Generation. 2857-2869 - Silin Gao, Sheryl Mathew, Li Mi, Sepideh Mamooler, Mengjie Zhao, Hiromi Wakaki, Yuki Mitsufuji, Syrielle Montariol, Antoine Bosselut:

VinaBench: Benchmark for Faithful and Consistent Visual Narratives. 2870-2879 - Bonan Li, Zicheng Zhang, Xingyi Yang, Xinchao Wang:

CoSER: Towards Consistent Dense Multiview Text-to-Image Generator for 3D Creation. 2880-2890 - Zeqi Gu, Yin Cui, Zhaoshuo Li, Fangyin Wei, Yunhao Ge, Jinwei Gu, Ming-Yu Liu, Abe Davis, Yifan Ding:

ArtiScene: Language-Driven Artistic 3D Scene Generation Through Image Intermediary. 2891-2901 - Jiaxin Ge, Zora Zhiruo Wang, Xuhui Zhou, Yi-Hao Peng, Sanjay Subramanian, Qinyue Tan, Maarten Sap, Alane Suhr, Daniel Fried, Graham Neubig, Trevor Darrell:

AutoPresent: Designing Structured Visuals from Scratch. 2902-2911 - Xi Wang, Hongzhen Li, Heng Fang, Yichen Peng, Haoran Xie, Xi Yang, Chuntao Li:

LineArt: A Knowledge-guided Training-free High-quality Appearance Transfer for Design Drawing with Diffusion Model. 2912-2923 - Siyuan Bian, Chenghao Xu, Yuliang Xiu, Artur Grigorev, Zhen Liu, Cewu Lu, Michael J. Black, Yao Feng:

ChatGarment: Garment Estimation, Generation and Editing via Large Language Models. 2924-2934 - Haobin Zhong, Shuai He, Anlong Ming, Huadong Ma:

Rethinking Personalized Aesthetics Assessment: Employing Physique Aesthetics Assessment as An Exemplification. 2935-2944 - Zirun Guo, Tao Jin:

ConceptGuard: Continual Personalized Text-to-Image Generation with Forgetting and Confusion Mitigation. 2945-2954 - Qianlong Xiang, Miao Zhang, Yuzhang Shang, Jianlong Wu, Yan Yan, Liqiang Nie:

DKDM: Data-Free Knowledge Distillation for Diffusion Models with Any Architecture. 2955-2965 - Matan Rusanovsky, Shimon Malnick, Amir Jevnisek, Ohad Fried, Shai Avidan:

Memories of Forgotten Concepts. 2966-2975 - Basim Azam, Naveed Akhtar:

Plug-and-Play Interpretable Responsible Text-to-Image Generation via Dual-Space Multi-facet Concept Control. 2976-2985 - Yimeng Zhang, Tiancheng Zhi, Jing Liu, Shen Sang, Liming Jiang, Qing Yan, Sijia Liu, Linjie Luo:

ID-Patch: Robust ID Association for Group Photo Personalization. 2986-2996 - Hao Cheng, Erjia Xiao, Jiayan Yang, Jiahang Cao, Qiang Zhang, Jize Zhang, Kaidi Xu, Jindong Gu, Renjing Xu:

Not Just Text: Uncovering Vision Modality Typographic Threats in Image Generation Models. 2997-3007 - Xuanyu Zhang, Zecheng Tang, Zhipei Xu, Runyi Li, Youmin Xu, Bin Chen, Feng Gao, Jian Zhang:

OmniGuard: Hybrid Manipulation Localization via Augmented Versatile Deep Image Watermarking. 3008-3018 - Yiren Song, Pei Yang, Hai Ci, Mike Zheng Shou:

IDProtector: An Adversarial Noise Encoder to Protect Against ID-Preserving Image Generation. 3019-3028 - Mischa Dombrowski, Weitong Zhang, Sarah Cechnicka, Hadrien Reynaud, Bernhard Kainz:

Image Generation Diversity Issues and How to Tame Them. 3029-3039 - Tai D. Nguyen, Aref Azizpour, Matthew C. Stamm:

Forensic Self-Descriptions Are All You Need for Zero-Shot Detection, Open-Set Source Attribution, and Clustering of AI-generated Images. 3040-3050 - Jinwoo Kim, Sangmin Han, Jinho Jeong, Jiwoo Choi, Dongyeoung Kim, Seon Joo Kim:

ORIDa: Object-centric Real-world Image Composition Dataset. 3051-3060 - Dhananjaya Jayasundara, Sudarshan Rajagopalan, Yasiru Ranasinghe, Trac D. Tran, Vishal M. Patel:

SINR: Sparsity Driven Compressed Implicit Neural Representations. 3061-3070 - Tiago Novello, Diana Aldana, Andre Araujo, Luiz Velho:

Tuning the Frequencies: Robust Training for Sinusoidal Neural Networks. 3071-3080 - Yuki Kawana, Shintaro Shiba, Quan Kong, Norimasa Kobori:

GA3CE: Unconstrained 3D Gaze Estimation with Gaze-Aware 3D Context Encoding. 3081-3090 - Yunfeng Xiao, Xiaowei Bai, Baojun Chen, Hao Su, Hao He, Liang Xie, Erwei Yin:

De^2Gaze: Deformable and Decoupled Representation Learning for 3D Gaze Estimation. 3091-3100 - Tianyun Zhong, Chao Liang, Jianwen Jiang, Gaojie Lin, Jiaqi Yang, Zhou Zhao:

FADA: Fast Diffusion Avatar Synthesis with Mixed-Supervised Multi-CFG Distillation. 3101-3110 - Juncheng Wang, Chao Xu, Cheng Yu, Lei Shang, Zhe Hu, Shujun Wang, Liefeng Bo:

Synchronized Video-to-Audio Generation via Mel Quantization-Continuum Decomposition. 3111-3120 - Inho Kim, Youngkil Song, Jicheol Park, Won Hwa Kim, Suha Kwak:

Improving Sound Source Localization with Joint Slot Attention on Image and Audio. 3121-3130 - Chen Liu, Liying Yang, Peike Li, Dadong Wang, Lincheng Li, Xin Yu:

Dynamic Derivation and Elimination: Audio Visual Segmentation with Enhanced Audio Semantics. 3131-3141 - Eitan Shaar, Ariel Shaulov, Gal Chechik, Lior Wolf:

Adapting to the Unknown: Training-Free Audio-Visual Event Perception with Dynamic Thresholds. 3142-3151 - Zitang Zhou, Ke Mei, Yu Lu, Tianyi Wang, Fengyun Rao:

HarmonySet: A Comprehensive Dataset for Understanding Video-Music Semantic Alignment and Temporal Synchronization. 3152-3162 - Sanchayan Santra, Vishal M. Chudasama, Pankaj Wasnik, Vineeth N. Balasubramanian:

Precise Event Spotting in Sports Videos: Solving Long-Range Dependency and Class Imbalance. 3163-3172 - Bingjie Gao, Xinyu Gao, Xiaoxue Wu, Yujie Zhou, Yu Qiao, Li Niu, Xinyuan Chen, Yaohui Wang:

The Devil is in the Prompts: Retrieval-Augmented Prompt Optimization for Text-to-Video Generation. 3173-3183 - Xin Yan, Yuxuan Cai, Qiuyue Wang, Yuan Zhou, Wenhao Huang, Huan Yang:

Long Video Diffusion Generation with Segmented Cross-Attention and Content-Rich Video Data Curation. 3184-3194 - Zhengrong Yue, Shaobin Zhuang, Kunchang Li, Yanbo Ding, Yali Wang:

V-Stylist: Video Stylization via Collaboration and Reflection of MLLM Agents. 3195-3205 - Huiyu Duan, Qiang Hu, Jiarui Wang, Liu Yang, Zitong Xu, Lu Liu, Xiongkuo Min, Chunlei Cai, Tianxiao Ye, Xiaoyun Zhang, Guangtao Zhai:

FineVQ: Fine-Grained User Generated Content Video Quality Assessment. 3206-3217 - Kevin Qinghong Lin, Mike Zheng Shou:

VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary. 3218-3228 - Zicheng Zhang, Ziheng Jia, Haoning Wu, Chunyi Li, Zijian Chen, Yingjie Zhou, Wei Sun, Xiaohong Liu, Xiongkuo Min, Weisi Lin, Guangtao Zhai:

Q-Bench-Video: Benchmark the Video Quality Understanding of LMMs. 3229-3239 - Wei Li, Bing Hu, Rui Shao, Leyang Shen, Liqiang Nie:

LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant. 3240-3251 - Kaixuan Wu, Xinde Li, Xinling Li, Chuanfei Hu, Guoliang Wu:

AVQACL: A Novel Benchmark for Audio-Visual Question Answering Continual Learning. 3252-3261 - Huabin Liu, Filip Ilievski, Cees G. M. Snoek:

Commonsense Video Question Answering through Video-Grounded Entailment Tree Reasoning. 3262-3271 - Ziyang Wang, Shoubin Yu, Elias Stengel-Eskin, Jaehong Yoon, Feng Cheng, Gedas Bertasius, Mohit Bansal:

VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos. 3272-3283 - Haiyi Qiu, Minghe Gao, Long Qian, Kaihang Pan, Qifan Yu, Juncheng Li, Wenjie Wang, Siliang Tang, Yueting Zhuang, Tat-Seng Chua:

STEP: Enhancing Video-LLMs' Compositional Reasoning by Spatio-Temporal Graph-guided Self-Training. 3284-3294 - Kangsan Kim, Geon Park, Youngwan Lee, Woongyeong Yeo, Sung Ju Hwang:

VideoICL: Confidence-based Iterative In-context Learning for Out-of-Distribution Video Understanding. 3295-3305 - Zhuoming Liu, Yiquan Li, Khoi Duc Nguyen, Yiwu Zhong, Yin Li:

PAVE: Patching and Adapting Video Large Language Models. 3306-3317 - Shuming Liu, Chen Zhao, Tianqi Xu, Bernard Ghanem:

BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding. 3318-3327 - Zhenpeng Huang, Xinhao Li, Jiaqi Li, Jing Wang, Xiangyu Zeng, Cheng Liang, Tao Wu, Xi Chen, Liang Li, Limin Wang:

Online Video Understanding: OVBench and VideoChat-Online. 3328-3338 - Gengyuan Zhang, Mang Ling Ada Fok, Jialu Ma, Yan Xia, Daniel Cremers, Philip Torr, Volker Tresp, Jindong Gu:

Localizing Events in Videos with Multimodal Queries. 3339-3351 - Junho Kim, Hyunjun Kim, Hosu Lee, Yong Man Ro:

SALOVA: Segment-Augmented Long Video Assistant for Targeted Retrieval and Routing in Long-Form Video Analysis. 3352-3362 - Sheng Zhou, Junbin Xiao, Qingyun Li, Yicong Li, Xun Yang, Dan Guo, Meng Wang, Tat-Seng Chua, Angela Yao:

EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering. 3363-3373 - Felix Vogel, Walid Bousselham, Anna Kukleva, Nina Shvetsova, Hilde Kuehne:

VideoGEM: Training-free Action Grounding in Videos. 3374-3383 - Aaryan Garg, Akash Kumar, Yogesh S. Rawat:

STPro: Spatial and Temporal Progressive Learning for Weakly Supervised Spatio-Temporal Grounding. 3384-3394 - Claudia Cuttano, Gabriele Trivigno, Gabriele Rosi, Carlo Masone, Giuseppe Averta:

SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation. 3395-3405 - Nan Huang, Wenzhao Zheng, Chenfeng Xu, Kurt Keutzer, Shanghang Zhang, Angjoo Kanazawa, Qianqian Wang:

Segment Any Motion in Videos. 3406-3416 - Haiyang Mei, Pengyu Zhang, Mike Zheng Shou:

SAM-I2V: Upgrading SAM to Support Promptable Video Segmentation with Less than 0.2% Training Cost. 3417-3426 - Andrei Dumitriu, Florin Tatui, Florin Miron, Aakash Ralhan, Radu Tudor Ionescu, Radu Timofte:

RipVIS: Rip Currents Video Instance Segmentation Benchmark for Beach Monitoring and Safety. 3427-3437 - Olga Zatsarynna, Emad Bahrami, Yazan Abu Farha, Gianpiero Francesca, Juergen Gall:

MANTA: Diffusion Mamba for Efficient and Effective Stochastic Long-Term Dense Action Anticipation. 3438-3448 - Yilong Wang, Zilin Gao, Qilong Wang, Zhaofeng Chen, Peihua Li, Qinghua Hu:

TAMT: Temporal-Aware Model Tuning for Cross-Domain Few-Shot Action Recognition. 3449-3459 - Shaopeng Yang, Jilong Wang, Saihui Hou, Xu Liu, Chunshui Cao, Liang Wang, Yongzhen Huang:

Bridging Gait Recognition and Large Language Models Sequence Modeling. 3460-3469 - Lorenzo Mur-Labadia, Josechu Guerrero, Ruben Martinez-Cantin:

DIV-FF: Dynamic Image-Video Feature Fields For Environment Understanding in Egocentric Videos. 3470-3480 - Shengeng Tang, Jiayi He, Lechao Cheng, Jingjing Wu, Dan Guo, Richang Hong:

Discrete to Continuous: Generating Smooth Transition Poses from Sign Language Observations. 3481-3491 - Zezeng Li, Xiaoyu Du, Na Lei, Liming Chen, Weimin Wang:

NoPain: No-box Point Cloud Attack via Optimal Transport Singular Boundary. 3492-3502 - Li Lin, Santosh Santosh, Mingyang Wu, Xin Wang, Shu Hu:

AI-Face: A Million-Scale Demographically Annotated AI-Generated Face Dataset and Fairness Benchmark. 3503-3515 - Fengfan Zhou, Bangjie Yin, Hefei Ling, Qianyu Zhou, Wenxuan Wang:

Improving the Transferability of Adversarial Attacks on Face Recognition with Diverse Parameters Augmentation. 3516-3527 - Saeed Ebrahimi, Sahar Rahimi Malakshan, Ali Dabouei, Srinjoy Das, Jeremy M. Dawson, Nasser M. Nasrabadi:

GIF: Generative Inspiration for Face Recognition at Scale. 3528-3539 - Li Lun, Kunyu Feng, Qinglong Ni, Ling Liang, Yuan Wang, Ying Li, Dunshan Yu, Xiaoxin Cui:

Towards Effective and Sparse Adversarial Attack on Spiking Neural Networks via Breaking Invisible Surrogate Gradients. 3540-3551 - Ziqi Li, Tao Gao, Yisheng An, Ting Chen, Jing Zhang, Yuanbo Wen, Mengkun Liu, Qianxi Zhang:

Brain-Inspired Spiking Neural Networks for Energy-Efficient Object Detection. 3552-3562 - Tian Gao, Yu Zhang, Zhiyuan Zhang, Huajun Liu, Kaijie Yin, Chengzhong Xu, Hui Kong:

BHViT: Binarized Hybrid Vision Transformer. 3563-3572 - Zhenyu Cui, Jiahuan Zhou, Yuxin Peng:

DKC: Differentiated Knowledge Consolidation for Cloth-Hybrid Lifelong Person Re-identification. 3573-3582 - Jingwei Zhang, Anh Tien Nguyen, Xi Han, Vincent Quoc-Huy Trinh, Hong Qin, Dimitris Samaras, Mahdi S. Hosseini:

2DMamba: Efficient State Space Model for Image Representation with Applications on Giga-Pixel Whole Slide Image Classification. 3583-3592 - Jose Henrique Lima Marques, Jeffri Murrugarra-Llerena, Cláudio R. Jung:

GauCho: Gaussian Distributions with Cholesky Decomposition for Oriented Object Detection. 3593-3602 - Biplab Chandra Das, Viswanath Gopalakrishnan:

Camouflage Anything: Learning to Hide using Controlled Out-painting and Representation Engineering. 3603-3613 - Datao Tang, Xiangyong Cao, Xuan Wu, Jialin Li, Jing Yao, Xueru Bai, Dongsheng Jiang, Yin Li, Deyu Meng:

AeroGen: Enhancing Remote Sensing Object Detection with Diffusion-Driven Data Generation. 3614-3624 - Zhe Shan, Yang Liu, Lei Zhou, Cheng Yan, Heng Wang, Xia Xie:

ROS-SAM: High-Quality Interactive Segmentation for Remote Sensing Moving Object. 3625-3635 - Phuc Nguyen, Minh Luu, Anh Tuan Tran, Cuong Pham, Khoi Nguyen:

Any3DIS: Class-Agnostic 3D Instance Segmentation by 2D Mask Tracking. 3636-3645 - Joey Wilson, Marcelino Almeida, Sachit Mahajan, Martin Labrie, Maani Ghaffari, Omid Ghasemalizadeh, Min Sun, Cheng-Hao Kuo, Arnab Sen:

POp-GS: Next Best View in 3D-Gaussian Splatting with P-Optimality. 3646-3655 - Runsong Zhu, Shi Qiu, Zhengzhe Liu, Ka-Hei Hui, Qianyi Wu, Pheng-Ann Heng, Chi-Wing Fu:

Rethinking End-to-End 2D to 3D Scene Segmentation in Gaussian Splatting. 3656-3665 - Wenxuan Guo, Xiuwei Xu, Ziwei Wang, Jianjiang Feng, Jie Zhou, Jiwen Lu:

Text-guided Sparse Voxel Pruning for Efficient 3D Visual Grounding. 3666-3675 - Yijie Tang, Jiazhao Zhang, Yuqing Lan, Yulan Guo, Dezun Dong, Chenyang Zhu, Kai Xu:

OnlineAnySeg: Online Zero-Shot 3D Segmentation by Visual Foundation Model Guided 2D Mask Merging. 3676-3685 - Muzhi Zhu, Yuzhuo Tian, Hao Chen, Chunluan Zhou, Qingpei Guo, Yang Liu, Ming Yang, Chunhua Shen:

SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories. 3686-3696 - Savya Khosla, Sethuraman TV, Alexander G. Schwing, Derek Hoiem:

RELOCATE: A Simple Training-Free Baseline for Visual Query Localization Using Region-Based Representations. 3697-3706 - Rong Li, Shijie Li, Lingdong Kong, Xulei Yang, Junwei Liang:

SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding. 3707-3717 - Zhenyang Liu, Yikai Wang, Sixiao Zheng, Tongying Pan, Longfei Liang, Yanwei Fu, Xiangyang Xue:

ReasonGrounder: LVLM-Guided Hierarchical Feature Splatting for Open-Vocabulary 3D Visual Grounding and Reasoning. 3718-3727 - Ziyang Zhou, Pinghui Wang, Zi Liang, Haitao Bai, Ruofei Zhang:

Cross-Modal 3D Representation with Multi-View Images and Point Clouds. 3728-3739 - Austin Stone, Hagen Soltau, Robert Geirhos, Xi Yi, Ye Xia, Bingyi Cao, Kaifeng Chen, Abhijit Ogale, Jonathon Shlens:

Learning Visual Composition through Improved Semantic Guidance. 3740-3750 - Keyu Guo, Yongle Huang, Shijie Sun, Xiangyu Song, Mingtao Feng, Zedong Liu, Huansheng Song, Tiantian Wang, Jianxin Li, Naveed Akhtar, Ajmal Saeed Mian:

Beyond Human Perception: Understanding Multi-Object World from Monocular View. 3751-3760 - Hongyan Zhi, Peihao Chen, Junyan Li, Shuailei Ma, Xinyu Sun, Tianhang Xiang, Yinjie Lei, Mingkui Tan, Chuang Gan:

LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences. 3761-3771 - Jiajun Deng, Tianyu He, Li Jiang, Tianyu Wang, Feras Dayoub, Ian D. Reid:

3D-LLaVA: Towards Generalist 3D LMMs with Omni Superpoint Transformer. 3772-3782 - Benlin Liu, Yuhao Dong, Yiqin Wang, Zixian Ma, Yansong Tang, Luming Tang, Yongming Rao, Wei-Chiu Ma, Ranjay Krishna:

Coarse Correspondences Boost Spatial-Temporal Reasoning in Multimodal Language Model. 3783-3792 - Efstathios Karypidis, Ioannis Kakogeorgiou, Spyros Gidaris, Nikos Komodakis:

Advancing Semantic Future Prediction through Multimodal Visual Sequence Transformers. 3793-3803 - Weiming Ren, Huan Yang, Jie Min, Cong Wei, Wenhu Chen:

VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation. 3804-3814 - Haoqiang Kang, Enna Sachdeva, Piyush Gupta, Sangjae Bae, Kwonjoon Lee:

GFlowVLM: Enhancing Multi-step Reasoning in Vision-Language Models with Generative Flow Networks. 3815-3825 - Cheng Chen, Yunpeng Zhai, Yifan Zhao, Jinyang Gao, Bolin Ding, Jia Li:

Provoking Multi-modal Few-Shot LVLM via Exploration-Exploitation In-Context Learning. 3826-3835 - Mahtab Bigverdi, Zelun Luo, Cheng-Yu Hsieh, Ethan Shen, Dongping Chen, Linda G. Shapiro, Ranjay Krishna:

Perception Tokens Enhance Visual Reasoning in Multimodal Language Models. 3836-3845 - Akhil Perincherry, Jacob Krantz, Stefan Lee:

Do Visual Imaginations Improve Vision-and-Language Navigation Agents? 3846-3855 - Fan Yang, Ru Zhen, Jianing Wang, Yanhao Zhang, Haoxiang Chen, Haonan Lu, Sicheng Zhao, Guiguang Ding:

HEIE: MLLM-Based Hierarchical Explainable AIGC Image Implausibility Evaluator. 3856-3866 - Ailin Deng, Tri Cao, Zhirui Chen, Bryan Hooi:

Words or Vision: Do Vision-Language Models Have Blind Faith in Text? 3867-3876 - Christopher Chou, Lisa Dunlap, Koki Mashita, Krishna Mandal, Trevor Darrell, Ion Stoica, Joseph E. Gonzalez, Wei-Lin Chiang:

VisionArena: 230k Real World User-VLM Conversations with Preference Labels. 3877-3887 - Wen Yin, Yong Wang, Guiduo Duan, Dongyang Zhang, Xin Hu, Yuan-Fang Li, Tao He:

Knowledge-Aligned Counterfactual-Enhancement Diffusion Perception for Unsupervised Cross-Domain Visual Emotion Recognition. 3888-3898 - Yangyu Huang, Tianyi Gao, Haoran Xu, Qihao Zhao, Yang Song, Zhipeng Gui, Tengchao Lv, Hao Chen, Lei Cui, Scarlett Li, Furu Wei:

PEACE: Empowering Geologic Map Holistic Understanding with MLLMs. 3899-3908 - Chengyue Huang, Brisa Maneechotesuwan, Shivang Chopra, Zsolt Kira:

FRAMES-VQA: Benchmarking Fine-Tuning Robustness across Multi-Modal Shifts in Visual Question Answering. 3909-3918 - Miran Heo, Min-Hung Chen, De-An Huang, Sifei Liu, Subhashree Radhakrishnan, Seon Joo Kim, Yu-Chiang Frank Wang, Ryo Hachiuma:

Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks. 3919-3930 - Wenbo Chen, Zhen Xu, Ruotao Xu, Si Wu, Hau-San Wong:

Task-aware Cross-modal Feature Refinement Transformer with Large Language Models for Visual Grounding. 3931-3941 - Yue Han, Jiangning Zhang, Junwei Zhu, Runze Hou, Xiaozhong Ji, Chuming Lin, Xiaobin Hu, Zhucun Xue, Yong Liu:

GroundingFace: Fine-grained Face Understanding via Pixel Grounding Multimodal Large Language Model. 3942-3951 - Yang Bai, Yucheng Ji, Min Cao, Jinqiao Wang, Mang Ye:

Chat-based Person Retrieval via Dialogue-Refined Cross-Modal Alignment. 3952-3962 - Ruotian Peng, Haiying He, Yake Wei, Yandong Wen, Di Hu:

Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception. 3963-3973 - Likai Tian, Jian Zhao, Zechao Hu, Zhengwei Yang, Hao Li, Lei Jin, Zheng Wang, Xuelong Li:

CCIN: Compositional Conflict Identification and Neutralization for Composed Image Retrieval. 3974-3983 - You Li, Fan Ma, Yi Yang:

Imagine and Seek: Improving Composed Image Retrieval with an Imagined Proxy. 3984-3993 - Chuong Huynh, Jinyu Yang, Ashish Tawari, Mubarak Shah, Son Tran, Raffay Hamid, Trishul Chilimbi, Abhinav Shrivastava:

CoLLM: A Large Language Model for Composed Image Retrieval. 3994-4004 - Zhenxing Zhang, Yaxiong Wang, Lechao Cheng, Zhun Zhong, Dan Guo, Meng Wang:

ASAP: Advancing Semantic Alignment Promotes Multi-Modal Manipulation Detecting and Grounding. 4005-4014 - Yikun Liu, Yajie Zhang, Jiayin Cai, Xiaolong Jiang, Yao Hu, Jiangchao Yao, Yanfeng Wang, Weidi Xie:

LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant. 4015-4025 - Yuchen Duan, Zhe Chen, Yusong Hu, Weiyun Wang, Shenglong Ye, Botian Shi, Lewei Lu, Qibin Hou, Tong Lu, Hongsheng Li, Jifeng Dai, Wenhai Wang:

Docopilot: Improving Multimodal Models for Document-Level Understanding. 4026-4037 - Wenhui Liao, Jiapeng Wang, Hongliang Li, Chengyu Wang, Jun Huang, Lianwen Jin:

DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for Text-rich Document Understanding. 4038-4049 - Jeong Ryong Lee, Yejee Shin, Geonhui Son, Dosik Hwang:

Diffusion Bridge: Leveraging Diffusion Model to Reduce the Modality Gap Between Text and Vision for Zero-Shot Image Captioning. 4050-4059 - Guotao Liang, Baoquan Zhang, Zhiyuan Wen, Junteng Zhao, Yunming Ye, Kola Ye, Yao He:

Towards Improved Text-Aligned Codebook Learning: Multi-Hierarchical Codebook-Text Alignment with Long Text. 4060-4069 - Hyungyu Choi, Young Kyun Jang, Chanho Eom:

GOAL: Global-local Object Alignment Learning. 4070-4079 - Anjia Cao, Xing Wei, Zhiheng Ma:

FLAME: Frozen Large Language Models Enable Data-Efficient Language-Image Pre-training. 4080-4090 - Shuokai Pan, Gerti Tuzi, Sudarshan Sreeram, Dibakar Gope:

Data-Free Group-Wise Fully Quantized Winograd Convolution via Learnable Scales. 4091-4100 - Yassine Ouali, Adrian Bulat, Alexandros Xenos, Anestis Zaganidis, Ioannis Maniadis Metaxas, Brais Martínez, Georgios Tzimiropoulos:

VladVA: Discriminative Fine-tuning of LVLMs. 4101-4111 - Tianyu Chen, Xingcheng Fu, Yisen Gao, Haodong Qian, Yuecen Wei, Kun Yan, Haoyi Zhou, Jianxin Li:

Galaxy Walker: Geometry-aware VLMs For Galaxy-scale Understanding. 4112-4121 - Zhijian Liu, Ligeng Zhu, Baifeng Shi, Zhuoyang Zhang, Yuming Lou, Shang Yang, Haocheng Xi, Shiyi Cao, Yuxian Gu, Dacheng Li, Xiuyu Li, Haotian Tang, Yunhao Fang, Yukang Chen, Cheng-Yu Hsieh, De-An Huang, An-Chieh Cheng, Jinyi Hu, Sifei Liu, Ranjay Krishna, Pavlo Molchanov, Jan Kautz, Hongxu Yin, Song Han, Yao Lu:

NVILA: Efficient Frontier Visual Language Models. 4122-4134 - Jing Bi, Junjia Guo, Yunlong Tang, Lianggong Bruce Wen, Zhang Liu, Bingjie Wang, Chenliang Xu:

Unveiling Visual Perception in Language Models: An Attention Head Analysis Approach. 4135-4144 - Xudong Lu, Yinghao Chen, Cheng Chen, Hui Tan, Boheng Chen, Yina Xie, Rui Hu, Guanxin Tan, Renshou Wu, Yan Hu, Yi Zeng, Lei Wu, Liuyang Bian, Zhaoxiong Wang, Long Liu, Yanzhou Yang, Han Xiao, Aojun Zhou, Yafei Wen, Xiaoxin Chen, Shuai Ren, Hongsheng Li:

BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices. 4145-4155 - Junyan Lin, Haoran Chen, Yue Fan, Yingqi Fan, Xin Jin, Hui Su, Jinlan Fu, Xiaoyu Shen:

Multi-Layer Visual Feature Fusion in Multimodal LLMs: Methods, Analysis, and Best Practices. 4156-4166 - Shiyao Li, Yingchun Hu, Xuefei Ning, Xihui Liu, Ke Hong, Xiaotao Jia, Xiuhong Li, Yaqi Yan, Pei Ran, Guohao Dai, Shengen Yan, Huazhong Yang, Yu Wang:

MBQ: Modality-Balanced Quantization for Large Vision-Language Models. 4167-4177 - Qianhan Feng, Wenshuo Li, Tong Lin, Xinghao Chen:

Align-KD: Distilling Cross-Modal Alignment Knowledge for Mobile Vision-Language Large Model Enhancement. 4178-4188 - Xianwei Zhuang, Zhihong Zhu, Yuxin Xie, Liming Liang, Yuexian Zou:

VASparse: Towards Efficient Visual Hallucination Mitigation via Visual-Aware Token Sparsification. 4189-4199 - Dokyoon Yoon, Youngsook Song, Woomyoung Park:

Stop Learning it all to Mitigate Visual Hallucination, Focus on the Hallucination Target. 4200-4208 - Junzhe Chen, Tianshu Zhang, Shiyu Huang, Yuwei Niu, Linfeng Zhang, Lijie Wen, Xuming Hu:

ICT: Image-Object Cross-Level Trusted Intervention for Mitigating Object Hallucination in Large Vision-Language Models. 4209-4221 - Tobia Poppi, Tejaswi Kasarla, Pascal Mettes, Lorenzo Baraldi, Rita Cucchiara:

Hyperbolic Safety-Aware Vision-Language Models. 4222-4232 - Jin Wang, Chenghui Lv, Xian Li, Shichao Dong, Huadong Li, Kelu Yao, Chao Li, Wenqi Shao, Ping Luo:

Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models. 4233-4245 - Haoyu Zhang, Yangyang Guo, Mohan S. Kankanhalli:

Joint Vision-Language Social Bias Removal for CLIP. 4246-4255 - Shin'ya Yamaguchi, Dewei Feng, Sekitoshi Kanai, Kazuki Adachi, Daiki Chijiwa:

Post-pre-training for Modality Alignment in Vision-Language Foundation Models. 4256-4266 - Karsten Roth, Zeynep Akata, Dima Damen, Ivana Balazevic, Olivier J. Hénaff:

Context-Aware Multimodal Pretraining. 4267-4279 - Yi Zhang, Yi-Xuan Deng, Meng-Hao Guo, Shi-Min Hu:

Adaptive Parameter Selection for Tuning Vision-Language Models. 4280-4290 - Yabin Wang, Zhiwu Huang, Xiaopeng Hong:

OpenSDI: Spotting Diffusion-Generated Images in the Open World. 4291-4301 - Jianyu Lai, Sixiang Chen, Yunlong Lin, Tian Ye, Yun Liu, Song Fei, Zhaohu Xing, Hongtao Wu, Weiming Wang, Lei Zhu:

SnowMaster: Comprehensive Real-world Image Desnowing via MLLM with Multi-Model Feedback Optimization. 4302-4312 - Kevin Miller, Aditya Gangrade, Samarth Mishra, Kate Saenko, Venkatesh Saligrama:

SPARC: Score Prompting and Adaptive Fusion for Zero-Shot Multi-Label Recognition in Vision-Language Models. 4313-4321 - Zhaogeng Liu, Haozhen Zhang, Hualin Zhang, Xingchen Li, Wanli Shi, Bin Gu, Yi Chang:

Query Efficient Black-Box Visual Prompting with Subspace Learning. 4322-4331 - Xueyu Liu, Rui Wang, Yexin Lai, Guangze Shi, Feixue Shao, Fang Hao, Jianan Zhang, Jia Shen, Yongfei Wu, Wen Zheng:

Plug-and-Play PPO: An Adaptive Point Prompt Optimizer Making SAM Greater. 4332-4342 - Xueyi Ke, Satoshi Tsutsui, Yayun Zhang, Bihan Wen:

Discovering Hidden Visual Concepts Beyond Linguistic Input in Infant Learning. 4343-4352 - Li Ren, Chen Chen, Liqiang Wang, Kien Hua:

DA-VPT: Semantic-Guided Visual Prompt Tuning for Vision Transformers. 4353-4363 - Wenlong Yu, Qilong Wang, Chuang Liu, Dong Li, Qinghua Hu:

CoE: Chain-of-Explanation via Automatic Visual Concept Circuit Description and Polysemanticity Quantification. 4364-4374 - Arpita Chowdhury, Dipanjyoti Paul, Zheda Mai, Jianyang Gu, Ziheng Zhang, Kazi Sajeed Mehrab, Elizabeth G. Campolongo, Daniel I. Rubenstein, Charles V. Stewart, Anuj Karpatne, Tanya Y. Berger-Wolf, Yu Su, Wei-Lun Chao:

Prompt-CAM: Making Vision Transformers Interpretable for Fine-Grained Analysis. 4375-4385 - Aaron Serianni, Tyler Zhu, Olga Russakovsky, Vikram V. Ramaswamy:

Attention IoU: Examining Biases in CelebA using Attention Maps. 4386-4397 - Guangda Ji, Silvan Weder, Francis Engelmann, Marc Pollefeys, Hermann Blum:

ARKit LabelMaker: A New Scale for Indoor 3D Scene Understanding. 4398-4407 - Andrey Gizdov, Shimon Ullman, Daniel Harari:

Seeing More with Less: Human-like Representations in Vision Models. 4408-4417 - Weiming Zhuang, Chen Chen, Zhizhong Li, Sina Sajadmanesh, Jingtao Li, Jiabo Huang, Vikash Sehwag, Vivek Sharma, Hirotaka Shinozaki, Felan Carlo Garcia, Yihao Zhan, Naohiro Adachi, Ryoji Eki, Michael Spranger, Peter Stone, Lingjuan Lyu:

Argus: A Compact and Versatile Foundation Model for Vision. 4418-4429 - Unki Park, Seongmoon Jeong, Youngchan Jang, Gyeong-Moon Park, Jong Hwan Ko:

Test-Time Fine-Tuning of Image Compression Models for Multi-Task Adaptability. 4430-4440 - Sofia Casarin, Sergio Escalera, Oswald Lanz:

L-SWAG: Layer-Sample Wise Activation with Gradients Information for Zero-Shot NAS on Vision Transformers. 4441-4451 - Zekang Yang, Wang Zeng, Sheng Jin, Chen Qian, Ping Luo, Wentao Liu:

NADER: Neural Architecture Design via Multi-Agent Collaboration. 4452-4461 - Minghao Fu, Hao Yu, Jie Shao, Junjie Zhou, Ke Zhu, Jianxin Wu:

Quantization without Tears. 4462-4472 - Hongjun Wang, Wonmin Byeon, Jiarui Xu, Jinwei Gu, Ka Chun Cheung, Xiaolong Wang, Kai Han, Jan Kautz, Sifei Liu:

Parallel Sequence Modeling via Generalized Spatial Propagation Network. 4473-4483 - Weihao Yu, Xinchao Wang:

MambaOut: Do We Really Need Mamba for Vision? 4484-4496 - Haoyang He, Jiangning Zhang, Yuxuan Cai, Hongxu Chen, Xiaobin Hu, Zhenye Gan, Yabiao Wang, Chengjie Wang, Yunsheng Wu, Lei Xie:

MobileMamba: Lightweight Multi-Receptive Visual Mamba Network. 4497-4507 - Hao Yu, Tangyu Jiang, Shuning Jia, Shannan Yan, Shunning Liu, Haolong Qian, Guanghao Li, Shuting Dong, Chun Yuan:

ComRoPE: Scalable and Robust Rotary Position Embedding Parameterized by Trainable Commuting Angle Matrices. 4508-4517 - Yuwei Sun, Hideya Ochiai, Zhirong Wu, Stephen Lin, Ryota Kanai:

Associative Transformer. 4518-4527 - Jon Donnelly, Zhicheng Guo, Alina Jade Barnett, Hayden McTavish, Chaofan Chen, Cynthia Rudin:

Rashomon Sets for Prototypical-Part Networks: Editing Interpretable Models in Real-Time. 4528-4538 - Xin Lin, Chong Shi, Zuopeng Yang, Haojin Tang, Zhili Zhou:

SGC-Net: Stratified Granular Comparison Network for Open-Vocabulary HOI Detection. 4539-4549 - Kiet A. Nguyen, Adheesh Juvekar, Tianjiao Yu, Muntasir Wahed, Ismini Lourentzou:

CALICO: Part-Focused Semantic Co-Segmentation with Large Vision-Language Models. 4550-4561 - Zelin Peng, Zhengqin Xu, Zhilin Zeng, Changsong Wen, Yu Huang, Menglin Yang, Feilong Tang, Wei Shen:

Understanding Fine-tuning CLIP for Open-vocabulary Semantic Segmentation in Hyperbolic Space. 4562-4572 - Pei Wang, Zhaowei Cai, Hao Yang, Ashwin Swaminathan, R. Manmatha, Stefano Soatto:

Scaling up Image Segmentation across Data and Tasks. 4573-4583 - Amin Karimi, Charalambos Poullis:

DSV-LFS: Unifying LLM-Driven Semantic Cues with Visual Features for Robust Few-Shot Segmentation. 4584-4594 - Yuchen Zhu, Cheng Shi, Dingyou Wang, Jiajin Tang, Zhengxuan Wei, Yu Wu, Guanbin Li, Sibei Yang:

Rethinking Query-based Transformer for Continual Image Segmentation. 4595-4606 - Seun-An Choe, Keon-Hee Park, Jinwoo Choi, Gyeong-Moon Park:

Universal Domain Adaptation for Semantic Segmentation. 4607-4617 - Yuhan Liu, Yixiong Zou, Yuhua Li, Ruixuan Li:

The Devil is in Low-Level Features for Cross-Domain Few-Shot Segmentation. 4618-4627 - Yan Yang, Liyuan Pan, Dongxu Li, Liu Liu:

EZSR: Event-based Zero-Shot Recognition. 4628-4638 - Xianing Chen, Si Huo, Borui Jiang, Hailin Hu, Xinghao Chen:

Single Domain Generalization for Few-Shot Counting via Universal Representation Matching. 4639-4649 - Hao Tan, Zichang Tan, Jun Li, Ajian Liu, Jun Wan, Zhen Lei:

Recover and Match: Open-Vocabulary Multi-Label Recognition through Knowledge-Constrained Optimal Transport. 4650-4660 - Dongseob Kim, Hyunjung Shim:

Classifier-guided CLIP Distillation for Unsupervised Multi-label Classification. 4661-4671 - Phi Vu Tran:

SimLTD: Simple Supervised and Semi-Supervised Long-Tailed Object Detection. 4672-4681 - Aming Wu, Cheng Deng:

Percept, Memory, and Imagine: World Feature Simulating for Open-Domain Unknown Object Detection. 4682-4691 - Marc-Antoine Lavoie, Anas Mahmoud, Steven L. Waslander:

Large Self-Supervised Models Bridge the Gap in Domain Adaptive Object Detection. 4692-4702 - Zhixiong Nan, Xianghong Li, Jifeng Dai, Tao Xiang:

MI-DETR: An Object Detection Model with Multi-time Inquiries Mechanism. 4703-4712 - Huixin Sun, Runqi Wang, Yanjing Li, Linlin Yang, Shaohui Lin, Xianbin Cao, Baochang Zhang:

SET: Spectral Enhancement for Tiny Object Detection. 4713-4723 - Wenxi Chen, Raymond A. Yeh, Shaoshuai Mou, Yan Gu:

Leveraging Perturbation Robustness to Enhance Out-of-Distribution Detection. 4724-4733 - Kaichen Yang, Junjie Cao, Zeyu Bai, Zhixun Su, Andrea Tagliasacchi:

PIAD: Pose and Illumination agnostic Anomaly Detection. 4734-4743 - Wenxin Ma, Xu Zhang, Qingsong Yao, Fenghe Tang, Chenxu Wu, Yingtai Li, Rui Yan, Zihang Jiang, S. Kevin Zhou:

AA-CLIP: Enhancing Zero-Shot Anomaly Detection via Anomaly-Aware CLIP. 4744-4754 - Ziming Huang, Xurui Li, Haotian Liu, Feng Xue, Yuzhe Wang, Yu Zhou:

AnomalyNCD: Towards Novel Anomaly Class Discovery in Industrial Scenarios. 4755-4765 - Xiaofan Li, Xin Tan, Zhuo Chen, Zhizhong Zhang, Ruixin Zhang, Rizen Guo, Guannan Jiang, Yulong Chen, Yanyun Qu, Lizhuang Ma, Yuan Xie:

One-for-More: Continual Diffusion Model for Anomaly Detection. 4766-4775 - Shibin Mei, Hang Wang, Bingbing Ni:

GeoMM: On Geodesic Perspective for Multi-modal Learning. 4776-4786 - Seonggon Kim, Juncheol Shin, Seung-taek Woo, Eunhyeok Park:

HOT: Hadamard-based Optimized Training. 4787-4796 - Zhiqiang Shen, Ammar Sherif, Zeyuan Yin, Shitong Shao:

DELT: A Simple Diversity-driven EarlyLate Training for Dataset Distillation. 4797-4806 - Jiamu Zhang, Shaochen Zhong, Andrew Ye, Zirui Liu, Sebastian Zhao, Kaixiong Zhou, Li Li, Soo-Hyun Choi, Rui Chen, Xia Hu, Shuai Xu, Vipin Chaudhary:

Flexible Group Count Enables Hassle-Free Structured Pruning. 4807-4818 - Fu Feng, Yucheng Xie, Jing Wang, Xin Geng:

WAVE: Weight Templates for Adaptive Initialization of Variable-sized Models. 4819-4828 - Hongxu Chen, Zhen Wang, Runshi Li, Bowei Zhu, Long Chen:

IterIS: Iterative Inference-Solving Alignment for LoRA Merging. 4829-4838 - Qiang Wang, Xiang Song, Yuhang He, Jizhou Han, Chenhao Ding, Xinyuan Gao, Yihong Gong:

Boosting Domain Incremental Learning: Selecting the Optimal Parameters is All You Need. 4839-4849 - Yuhao Zhou, Yuxin Tian, Jindi Lv, Mingjia Shi, Yuanxi Li, Qing Ye, Shuhao Zhang, Jiancheng Lv:

Ferret: An Efficient Online Continual Learning Framework under Varying Memory Constraints. 4850-4861 - Xiaohan Zou, Wenchao Ma, Shu Zhao:

Learning Conditional Space-Time Prompt Distributions for Video Class-Incremental Learning. 4862-4873 - Hao Yu, Xin Yang, Le Zhang, Hanlin Gu, Tianrui Li, Lixin Fan, Qiang Yang:

Handling Spatial-Temporal Data Heterogeneity for Federated Continual Learning via Tail Anchor. 4874-4883 - Takuma Fukuda, Hiroshi Kera, Kazuhiko Kawamoto:

Adapter Merging with Centroid Prototype Mapping for Scalable Class-Incremental Learning. 4884-4893 - Guannan Lai, Yujie Li, Xiangkun Wang, Junbo Zhang, Tianrui Li, Xin Yang:

Order-Robust Class Incremental Learning: Graph-Driven Dynamic Similarity Grouping. 4894-4904 - Vaibhav Rathore, Shubhranil B, Saikat Dutta, Sarthak Mehrotra, Zsolt Kira, Biplab Banerjee:

When Domain Generalization meets Generalized Category Discovery: An Adaptive Task-Arithmetic Driven Approach. 4905-4915 - Yue Zhang, Mingyue Bin, Yuyang Zhang, Zhongyuan Wang, Zhen Han, Chao Liang:

Link-based Contrastive Learning for One-Shot Unsupervised Domain Adaptation. 4916-4926 - Weiming Liu, Jun Dan, Fan Wang, Xinting Liao, Junhao Dong, Hua Yu, Shunjie Dong, Lianyong Qi:

Distinguish Then Exploit: Source-free Open Set Domain Adaptation via Weight Barcode Estimation and Sparse Label Assignment. 4927-4938 - Shinnosuke Matsuo, Riku Togashi, Ryoma Bise, Seiichi Uchida, Masahiro Nomura:

Instance-wise Supervision-level Optimization in Active Learning. 4939-4947 - Sk Miraj Ahmed, Umit Yigit Basaran, Dripta S. Raychaudhuri, Arindam Dutta, Rohit Kundu, Fahim Faisal Niloy, Basak Guler, Amit K. Roy-Chowdhury:

Towards Source-Free Machine Unlearning. 4948-4957 - Taero Kim, Subeen Park, Sungjun Lim, Yonghan Jung, Krikamol Muandet, Kyungwoo Song:

Sufficient Invariant Learning for Distribution Shift. 4958-4967 - Zhiwei Ling, Yachen Chang, Hailiang Zhao, Xinkui Zhao, Kingsum Chow, Shuiguang Deng:

CADRef: Robust Out-of-Distribution Detection via Class-Aware Decoupled Relative Feature Leveraging. 4968-4977 - Zheng Wang, Zihui Wang, Zheng Wang, Xiaoliang Fan, Cheng Wang:

Federated Learning with Domain Shift Eraser. 4978-4987 - Run He, Kai Tong, Di Fang, Han Sun, Ziqian Zeng, Haoran Li, Tianyi Chen, Huiping Zhuang:

AFL: A Single-Round Analytic Approach for Federated Learning with Pre-trained Models. 4988-4998 - K. Naveen Kumar, Ranjeet Ranjan Jha, C. Krishna Mohan, Ravindra Babu Tallamraju:

Fortifying Federated Learning Towards Trustworthiness via Auditable Data Valuation and Verifiable Client Contribution. 4999-5009 - Tae-Young Lee, Sundong Park, Minwoo Jeon, Hyoseok Hwang, Gyeong-Moon Park:

ESC: Erasing Space Concept for Knowledge Deletion. 5010-5019 - Jiate Li, Meng Pang, Yun Dong, Binghui Wang:

Deterministic Certification of Graph Neural Networks against Graph Poisoning Attacks with Arbitrary Perturbations. 5020-5029 - Keke Tang, Chao Hou, Weilong Peng, Xiang Fang, Zhize Wu, Yongwei Nie, Wenping Wang, Zhihong Tian:

Simplification Is All You Need against Out-of-Distribution Overconfidence. 5030-5040 - Ping Guo, Cheng Gong, Xi Lin, Fei Liu, Zhichao Lu, Qingfu Zhang, Zhenkun Wang:

MOS-Attack: A Scalable Multi-objective Adversarial Attack Framework. 5041-5051 - Banglong Liu, Niuniu Qi, Xia Zeng, Lydia Dehbi, Zhengfeng Yang:

Automated Proof of Polynomial Inequalities via Reinforcement Learning. 5052-5060 - Haiming Xu, Qianqian Wang, Boyue Wang, Quanxue Gao:

Deep Fair Multi-View Clustering with Attention KAN. 5061-5070 - Yuzhuo Dai, Jiaqi Jin, Zhibin Dong, Siwei Wang, Xinwang Liu, En Zhu, Xihong Yang, Xinbiao Gan, Yu Feng:

Imputation-free and Alignment-free: Incomplete Multi-view Clustering Driven by Consensus Semantic Learning. 5071-5081 - Zijian Dong, Yilei Wu, Chongyao Chen, Yingtian Zou, Yichi Zhang, Juan Helen Zhou:

Improve Representation for Imbalanced Regression through Geometric Constraints. 5082-5091 - Shanglin Liu, Jianming Lv, Jingdan Kang, Huaidong Zhang, Zequan Liang, Shengfeng He:

MODfinity: Unsupervised Domain Adaptation with Multimodal Information Flow Intertwining. 5092-5101 - Yingxue Xu, Fengtao Zhou, Chenyu Zhao, Yihui Wang, Can Yang, Hao Chen:

Distilled Prompt Learning for Incomplete Multimodal Survival Prediction. 5102-5111 - Wei Li, Jiawei Jiang, Jie Wu, Kaihao Yu, Jianwei Zheng:

LMO: Linear Mamba Operator for MRI Reconstruction. 5112-5122 - Xiao Wang, Fuling Wang, Yuehang Li, Qingchuan Ma, Shiao Wang, Bo Jiang, Jin Tang:

CXPMRG-Bench: Pre-training and Benchmarking for X-ray Medical Report Generation on CheXpert Plus Dataset. 5123-5133 - Ying Chen, Guoan Wang, Yuanfeng Ji, Yanjun Li, Jin Ye, Tianbin Li, Ming Hu, Rongshan Yu, Yu Qiao, Junjun He:

SlideChat: A Large Vision-Language Assistant for Whole-Slide Pathology Image Understanding. 5134-5143 - Junxian Wu, Minheng Chen, Xinyi Ke, Tianwang Xun, Xiaoming Jiang, Hongyu Zhou, Lizhi Shao, Youyong Kong:

Learning Heterogeneous Tissues with Mixture of Experts for Gigapixel Whole Slide Images. 5144-5153 - Ziyuan Yang, Yingyu Chen, Zhiwen Wang, Hongming Shan, Yang Chen, Yi Zhang:

Patient-Level Anatomy Meets Scanning-Level Physics: Personalized Federated Low-Dose CT Denoising Empowered by Large Language Model. 5154-5163 - Shaohao Rui, Lingzhi Chen, Zhenyu Tang, Lilong Wang, Mianxin Liu, Shaoting Zhang, Xiaosong Wang:

Multi-modal Vision Pre-training for Medical Image Analysis. 5164-5174 - Qinghe Ma, Jian Zhang, Zekun Li, Lei Qi, Qian Yu, Yinghuan Shi:

Steady Progress Beats Stagnation: Mutual Aid of Foundation and Conventional Models in Mixed Domain Semi-Supervised Medical Image Segmentation. 5175-5185 - Tassilo Wald, Constantin Ulrich, Stanislav Lukyanenko, Andrei Goncharov, Alberto Paderno, Maximilian Miller, Leander Maerkisch, Paul F. Jaeger, Klaus H. Maier-Hein:

Revisiting MAE Pre-training for 3D Medical Image Segmentation. 5186-5196 - Feng Yu, Jiacheng Cao, Li Liu, Minghua Jiang:

SuperLightNet: Lightweight Parameter Aggregation Network for Multimodal Brain Tumor Segmentation. 5197-5206 - Jiongtong Hu, Wufeng Xue, Jun Cheng, Yingying Liu, Wei Zhuo, Dong Ni:

EchoONE: Segmenting Multiple Echocardiography Planes in One Model. 5207-5216 - Jinho Joo, Hyeseong Kim, Hyeyeon Won, Deukhee Lee, Taejoon Eo, Dosik Hwang:

AeSPa : Attention-guided Self-supervised Parallel Imaging for MRI Reconstruction. 5217-5226 - Xinxing Cheng, Tianyang Zhang, Wenqi Lu, Qingjie Meng, Alejandro F. Frangi, Jinming Duan:

SACB-Net: Spatial-awareness Convolutions for Medical Image Registration. 5227-5237 - Federico Bolelli, Kevin Marchesini, Niels van Nistelrooij, Luca Lumetti, Vittorio Pipoli, Elisa Ficarra, Shankeeth Vinayahalingam, Costantino Grana:

Segmenting Maxillofacial Structures in CBCT Volumes. 5238-5248 - Bowen Wen, Matthew Trepte, Joseph Aribido, Jan Kautz, Orazio Gallo, Stan Birchfield:

FoundationStereo: Zero-Shot Stereo Matching. 5249-5260 - Ruicheng Wang, Sicheng Xu, Cassie Dai, Jianfeng Xiang, Yu Deng, Xin Tong, Jiaolong Yang:

MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision. 5261-5271 - Haoyu Guo, He Zhu, Sida Peng, Haotong Lin, Yunzhi Yan, Tao Xie, Wenguan Wang, Xiaowei Zhou, Hujun Bao:

Multi-view Reconstruction via SfM-guided Monocular Depth Estimation. 5272-5282 - Zhenggang Tang, Yuchen Fan, Dilin Wang, Hongyu Xu, Rakesh Ranjan, Alexander G. Schwing, Zhicheng Yan:

MV-DUSt3R+: Single-Stage Scene Reconstruction from Sparse Views In 2 Seconds. 5283-5293 - Jianyuan Wang, Minghao Chen, Nikita Karaev, Andrea Vedaldi, Christian Rupprecht, David Novotný:

VGGT: Visual Geometry Grounded Transformer. 5294-5306 - Weiyu Li, Jiarui Liu, Hongyu Yan, Rui Chen, Yixun Liang, Xuelin Chen, Ping Tan, Xiaoxiao Long:

CraftsMan3D: High-fidelity Mesh Generation with 3D Native Diffusion and Interactive Geometry Refiner. 5307-5317 - Felix Taubner, Ruihang Zhang, Mathieu Tuli, David B. Lindell:

CAP4D: Creating Animatable 4D Portrait Avatars with Morphable Multi-View Diffusion Models. 5318-5330 - Jacob Yeung, Andrew F. Luo, Gabriel Sarch, Margaret M. Henderson, Deva Ramanan, Michael J. Tarr:

Reanimating Images using Neural Representations of Dynamic Stimuli. 5331-5343 - Fangzhou Hong, Vladimir Guzov, Hyo Jin Kim, Yuting Ye, Richard A. Newcombe, Ziwei Liu, Lingni Ma:

EgoLM: Multi-Modal Language Model of Egocentric Motions. 5344-5354 - Yan Xia, Xiaowei Zhou, Etienne Vouga, Qixing Huang, Georgios Pavlakos:

Reconstructing Humans with a Biomechanically Accurate Skeleton. 5355-5365 - Guénolé Fiche, Simon Leglaive, Xavier Alameda-Pineda, Francesc Moreno-Noguer:

MEGA: Masked Generative Autoencoder for Human Mesh Recovery. 5366-5378 - Liang Pan, Zeshi Yang, Zhiyang Dou, Wenjia Wang, Buzhen Huang, Bo Dai, Taku Komura, Jingbo Wang:

TokenHSI: Unified Synthesis of Physical Human-Scene Interactions through Task Tokenization. 5379-5391 - Laurie Bose, Jianing Chen, Piotr Dudek:

Descriptor-In-Pixel : Point-Feature Tracking For Pixel Processor Arrays. 5392-5400 - Anna Manasyan, Maximilian Seitzer, Filip Radovic, Georg Martius, Andrii Zadaianchuk:

Temporally Consistent Object-Centric Learning by Contrasting Slots. 5401-5411 - SuBeen Lee, WonJun Moon, Hyun Seok Seong, Jae-Pil Heo:

Temporal Alignment-Free Video Matching for Few-shot Action Recognition. 5412-5421 - Zhejun Zhang, Péter Karkus, Maximilian Igl, Wenhao Ding, Yuxiao Chen, Boris Ivanovic, Marco Pavone:

Closed-Loop Supervised Fine-Tuning of Tokenized Traffic Models. 5422-5432 - Otto Brookes, Maksim Kukushkin, Majid Mirmehdi, Colleen Stephens, Paula Dieguez, Thurston C. Hicks, Sorrel Jones, Kevin Lee, Maureen S. McCarthy, Amelia Meier, Emmanuelle Normand, Erin G. Wessling, Roman M. Wittig, Kevin Langergraber, Klaus Zuberbühler, Lukas Boesch, Thomas Schmid, Mimi Arandjelovic, Hjalmar S. Kühl, Tilo Burghardt:

The PanAf-FGBG Dataset: Understanding the Impact of Backgrounds in Wildlife Behaviour Recognition. 5433-5443 - Yichen Xiao, Shuai Wang, Dehao Zhang, Wenjie Wei, Yimeng Shan, Xiaoli Liu, Yulin Jiang, Malu Zhang:

Rethinking Spiking Self-Attention Mechanism: Implementing a-XNOR Similarity Calculation in Spiking Transformers. 5444-5454 - Kai Chen, Yunhao Gou, Runhui Huang, Zhili Liu, Daxin Tan, Jing Xu, Chunwei Wang, Yi Zhu, Yihan Zeng, Kuo Yang, Dingdong Wang, Kun Xiang, Haoyuan Li, Haoli Bai, Jianhua Han, Xiaohui Li, Weike Jin, Nian Xie, Yu Zhang, James T. Kwok, Hengshuang Zhao, Xiaodan Liang, Dit-Yan Yeung, Xiao Chen, Zhenguo Li, Wei Zhang, Qun Liu, Lanqing Hong, Lu Hou, Hang Xu:

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions. 5455-5466 - Xiumei Xie, Zikai Huang, Wenhao Xu, Peng Xiao, Xuemiao Xu, Huaidong Zhang:

Let's Chorus: Partner-aware Hybrid Song-Driven 3D Head Animation. 5467-5476 - Antoni Bigata Casademunt, Michal Stypulkowski, Rodrigo Mira, Stella Bounareli, Konstantinos Vougioukas, Zoe Landgraf, Nikita Drobyshev, Maciej Zieba, Stavros Petridis, Maja Pantic:

KeyFace: Expressive Audio-Driven Facial Animation for Long Sequences via KeyFrame Interpolation. 5477-5488 - Rang Meng, Xingyu Zhang, Yuming Li, Chenguang Ma:

EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation. 5489-5498 - Di Chang, Hongyi Xu, You Xie, Yipeng Gao, Zhengfei Kuang, Shengqu Cai, Chenxu Zhang, Guoxian Song, Chao Wang, Yichun Shi, Zeyuan Chen, Shijie Zhou, Linjie Luo, Gordon Wetzstein, Mohammad Soleymani:

X-Dyna: Expressive Dynamic Human Image Animation. 5499-5509 - Yiqun Mei, Mingming He, Li Ma, Julien Philip, Wenqi Xian, David M. George, Xueming Yu, Gabriel Dedic, Ahmet Levent Tasel, Ning Yu, Vishal M. Patel, Paul E. Debevec:

Lux Post Facto: Learning Portrait Performance Relighting with Conditional Video Diffusion and a Hybrid Dataset. 5510-5522 - Shengjie Gong, Haojie Li, Jiapeng Tang, Dongming Hu, Shuangping Huang, Hao Chen, Tianshui Chen, Zhuoman Liu:

Monocular and Generalizable Gaussian Talking Head Animation. 5523-5534 - Jiawei Zhang, Zijian Wu, Zhiyang Liang, Yicheng Gong, Dongfang Hu, Yao Yao, Xun Cao, Hao Zhu:

FATE: Full-head Gaussian Avatar with Textural Editing from Monocular Video. 5535-5545 - Jiapeng Tang, Davide Davoli, Tobias Kirschstein, Liam Schoneveld, Matthias Nießner:

GAF: Gaussian Avatar Reconstruction from Monocular Videos via Multi-view Diffusion. 5546-5558 - Chen Guo, Junxuan Li, Yash Kant, Yaser Sheikh, Shunsuke Saito, Chen Cao:

Vid2Avatar-Pro: Authentic Avatar from Videos in the Wild via Universal Prior. 5559-5570 - Yufan Wu, Xuanhong Chen, Wen Li, Shunran Jia, Hualiang Wei, Kairui Feng, Jialiang Chen, Yuhan Li, Ang He, Weimin Zhang, Bingbing Ni, Wenjun Zhang:

SinGS: Animatable Single-Image Human Gaussian Splats with Kinematic Priors. 5571-5580 - Suzhen Wang, Weijie Chen, Wei Zhang, Minda Zhao, Lincheng Li, Rongsheng Zhang, Zhipeng Hu, Xin Yu:

EasyCraft: A Robust and Efficient Framework for Automatic Avatar Crafting. 5581-5591 - Yuxin Yao, Zhi Deng, Junhui Hou:

RigGS: Rigging of 3D Gaussians for Modeling Articulated Objects in Videos. 5592-5601 - Yuxiang Mao, Zhenfeng Fan, ZhiJie Zhang, Zhiheng Zhang, Shihong Xia:

Learning Person-Specific Animatable Face Models from In-the-Wild Images via a Shared Base Model. 5602-5613 - Wooseok Jang, Youngjun Hong, Geonho Cha, Seungryong Kim:

ControlFace: Harnessing Facial Parametric Control for Face Rigging. 5614-5624 - Yifang Xu, Benxiang Zhai, Yunzhuo Sun, Ming Li, Yang Li, Sidan Du:

HiFi-Portrait: Zero-shot Identity-preserved Portrait Generation with High-fidelity Multi-face Fusion. 5625-5635 - Hyeongjin Nam, Donghwan Kim, Jeongtaek Oh, Kyoung Mu Lee:

DeClotH: Decomposable 3D Cloth and Human Body Reconstruction from a Single Image. 5636-5645 - Tengfei Xiao, Yue Wu, Yuelong Li, Can Qin, Maoguo Gong, Qiguang Miao, Wenping Ma:

Disentangled Pose and Appearance Guidance for Multi-Pose Generation. 5646-5655 - Yuanbo Wang, Zhaoxuan Zhang, Jiajin Qiu, Dilong Sun, Zhengyu Meng, Xiaopeng Wei, Xin Yang:

Touch2Shape: Touch-Conditioned 3D Diffusion for Shape Exploration and Reconstruction. 5656-5665 - Zhiheng Liu, Ka Leong Cheng, Xi Chen, Jie Xiao, Hao Ouyang, Kai Zhu, Yu Liu, Yujun Shen, Qifeng Chen, Ping Luo:

MangaNinja: Line Art Colorization with Precise Reference Following. 5666-5677 - Qingsen Yan, Yixu Feng, Cheng Zhang, Guansong Pang, Kangbiao Shi, Peng Wu, Wei Dong, Jinqiu Sun, Yanning Zhang:

HVI: A New Color Space for Low-light Image Enhancement. 5678-5687 - Tianfu Wang, Mingyang Xie, Haoming Cai, Sachin Shah, Christopher A. Metzler:

Flash-Split: 2D Reflection Removal with Flash Cues and Latent Diffusion Separation. 5688-5698 - Feiran Li, Haiyang Jiang, Daisuke Iso:

Noise Modeling in One Hour: Minimizing Preparation Efforts for Self-supervised Low-Light RAW Image Denoising. 5699-5708 - Hang Chen, Yin Xie, Xiaoxiu Peng, Lihu Sun, Wenkai Su, Xiaodong Yang, Chengming Liu:

Quad-Pixel Image Defocus Deblurring: A New Benchmark and Model. 5709-5719 - Jun Myeong Choi, Annie Wang, Pieter Peers, Anand Bhattad, Roni Sengupta:

ScribbleLight: Single Image Indoor Relighting with Scribbles. 5720-5731 - Xiulong Liu, Anurag Kumar, Paul Calamia, Sebastià Vicenc Amengual Garí, Calvin Murdock, Ishwarya Ananthabhotla, Philip W. Robinson, Eli Shlizerman, Vamsi Krishna Ithapu, Ruohan Gao:

Hearing Anywhere in Any Environment. 5732-5741 - Tao Xie, Xi Chen, Zhen Xu, Yiman Xie, Yudong Jin, Yujun Shen, Sida Peng, Hujun Bao, Xiaowei Zhou:

EnvGS: Modeling View-Dependent Appearance with Environment Gaussian. 5742-5751