


WACV 2025: Tucson, AZ, USA
- IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2025, Tucson, AZ, USA, February 26 - March 6, 2025. IEEE 2025, ISBN 979-8-3315-1083-1
- Joanna Kaleta, Kacper Kania, Tomasz Trzcinski, Marek Kowalski:
LumiGauss: Relightable Gaussian Splatting in the Wild. 1-10 - Junjie Wang, Tomas Nordström:
Latency Robust Cooperative Perception Using Asynchronous Feature Fusion. 1-10 - Jordan Voas, Wei-Cheng Tseng, Layne Berry, Xixi Hu, Puyuan Peng, James Stuedemann, David Harwath:
Temporally Streaming Audio-Visual Synchronization for Real-World Videos. 1-9 - Seul-Ki Yeom, Julian von Klitzing:
U-MixFormer: UNet-Like Transformer with Mix-Attention for Efficient Semantic Segmentation. 1-10 - Seong Jong Yoo, Snehesh Shrestha, Irina Muresanu, Cornelia Fermüller:
VioPose: Violin Performance 4D Pose Estimation by Hierarchical Audiovisual Inference. 1-12 - Jane Wu, Diego Thomas, Ronald Fedkiw:
Sparse-View 3D Reconstruction of Clothed Humans via Normal Maps. 11-22 - Théo Morales, Omid Taheri, Gerard Lacey:
A Versatile and Differentiable Hand-Object Interaction Representation. 23-33 - Kohei Matsuzaki, Keisuke Nonaka:
Point Cloud Color Upsampling with Attention-Based Coarse Colorization and Refinement. 34-43 - Vincenzo Polizzi, Marco Cannici, Davide Scaramuzza, Jonathan Kelly:
FaVoR: Features via Voxel Rendering for Camera Relocalization. 44-53 - Pallabjyoti Deka, Saumik Bhattacharya, Debashis Sen, Prabir Kumar Biswas:
3D Shape Completion using Multi-resolution Spectral Encoding. 54-63 - Alexander H. Berger, Laurin Lux, Suprosanna Shit, Ivan Ezhov, Georgios Kaissis, Martin J. Menten, Daniel Rueckert, Johannes C. Paetzold:
Cross-Domain and Cross-Dimension Learning for Image-to-Graph Transformers. 64-74 - Hossein Askari, Fred Roosta, Hongfu Sun:
Training-free Medical Image Inverses via Bi-level Guided Diffusion Models. 75-84 - Suhyun Ahn, Wonjung Park, Jihoon Cho, Jinah Park:
Volumetric Conditioning Module to Control Pretrained Diffusion Models for 3D Medical Images. 85-95 - Trong-Thang Pham, Tien-Phat Nguyen, Yuki Ikebe, Akash Awasthi, Zhigang Deng, Carol C. Wu, Hien Nguyen, Ngan Le:
GazeSearch: Radiology Findings Search Benchmark. 96-106 - Yitong Li, Morteza Ghahremani, Youssef Wally, Christian Wachinger:
DiaMond: Dementia Diagnosis with Multi-Modal Vision Transformers Using MRI and PET. 107-116 - Mevan Ekanayake, Zhifeng Chen, Gary F. Egan, Mehrtash Harandi, Zhaolin Chen:
SeCo-INR: Semantically Conditioned Implicit Neural Representations for Improved Medical Image Super-Resolution. 117-126 - Majed El Helou, Doruk Cetin, Petar Stamenkovic, Niko Benjamin Huber, Fabio Zünd:
VerA: Versatile Anonymization Applicable to Clinical Facial Photographs. 127-138 - Dixi Yao:
Towards Privacy-Preserving Split Learning for ControlNet. 139-148 - Stefan Smeu, Elisabeta Oneata, Dan Oneata:
DeCLIP: Decoding CLIP Representations for Deepfake Localization. 149-159 - Maciej Chrabaszcz, Hubert Baniecki, Piotr Komorowski, Szymon Plotka, Przemyslaw Biecek:
Aggregated Attributions for Explanatory Analysis of 3D Segmentation Models. 160-171 - Xin Hu, Janet Wang, Jihun Hamm, Rie Roselyne Yotsu, Zhengming Ding:
Enhancing Skin Disease Diagnosis: Interpretable Visual Concept Discovery with SAM. 172-181 - Hanxiao Tan:
Evaluating Sensitivity Consistency of Explanations. 182-191 - Pengxiao Wang, Tzu-Heng Lin, Chunyu Wang, Yizhou Wang:
Shift Equivariant Pose Network. 192-201 - Yunfei Li, Yuezun Li, Xin Wang, Baoyuan Wu, Jiaran Zhou, Junyu Dong:
Texture, Shape and Order Matter: A New Transformer Design for Sequential DeepFake Detection. 202-211 - Hai Wang, Jing-Hao Xue:
360PanT: Training-Free Text-Driven 360-Degree Panorama-to-Panorama Translation. 212-221 - Enis Simsar, Alessio Tonioni, Yongqin Xian, Thomas Hofmann, Federico Tombari:
LIME: Localized Image Editing via Attention Regularization in Diffusion Models. 222-231 - Rohit Jena, Ali Taghibakhshi, Sahil Jain, Gerald Shen, Nima Tajbakhsh, Arash Vahdat:
Elucidating Optimal Reward-Diversity Tradeoffs in Text-to-Image Diffusion Models. 232-242 - Qinpeng Cui, Xinyi Zhang, Qiqi Bao, Qingmin Liao:
Elucidating the Solution Space of Extended Reverse-Time SDE for Diffusion Models. 243-252 - Xiaofei Huang, Elaheh Hatamimajoumerd, Amal Mathew, Sarah Ostadabbas:
Infant Action Generative Modeling. 253-265 - Ali Bahri, Moslem Yazdanpanah, Mehrdad Noori, Sahar Dastani, Milad Cheraghalikhani, David Osowiechi, Farzad Beizaee, Gustavo Adolfo Vargas Hakim, Ismail Ben Ayed, Christian Desrosiers:
Test-Time Adaptation in Point Clouds: Leveraging Sampling Variation with Weight Averaging. 266-275 - Peizhi Yan, Rabab Ward, Qiang Tang, Shan Du:
Gaussian Déjà-vu: Creating Controllable 3D Gaussian Head-Avatars with Enhanced Generalization and Personalization Abilities. 276-286 - Kazuto Ichimaru, Diego Thomas, Takafumi Iwaguchi, Hiroshi Kawasaki:
Neural SDF for Shadow-Aware Unsupervised Structured Light. 287-296 - Mateusz Poleski, Jacek Tabor, Przemyslaw Spurek:
GeoGuide: Geometric Guidance of Diffusion Models. 297-305 - Simon Boeder, Benjamin Risse:
OccFlowNet: Occupancy Estimation via Differentiable Rendering and Occupancy Flow. 306-316 - Boyuan Zhang, Zhenliang He, Meina Kan, Shiguang Shan:
Precise Integral in NeRFs: Overcoming the Approximation Errors of Numerical Quadrature. 317-326 - Cagla Deniz Bahadir, Gozde Bozdagi Akar, Mert R. Sabuncu:
LLM-Generated Rewrite and Context Modulation for Enhanced Vision Language Models in Digital Pathology. 327-336 - Yiying Wang, Abhirup Banerjee, Robin P. Choudhury, Vicente Grau:
DeepCA: Deep Learning-Based 3D Coronary Artery Tree Reconstruction from Two 2D Non-Simultaneous X-Ray Angiography Projections. 337-346 - Daniel Kim, Mohammed A. Al-masni, Jaehun Lee, Dong-Hyun Kim, Kanghyun Ryu:
Improving Pelvic MR-CT Image Alignment with Self-Supervised Reference-Augmented Pseudo-CT Generation Framework. 347-356 - Felix Wagner, Wentian Xu, Pramit Saha, Ziyun Liang, Daniel Whitehouse, David K. Menon, Virginia F. J. Newcombe, Natalie Voets, J. Alison Noble, Konstantinos Kamnitsas:
Feasibility of Federated Learning from Client Databases with Different Brain Diseases and MRI Modalities. 357-367 - Shumpei Takezaki, Kiyohito Tanaka, Seiichi Uchida:
Self-Relaxed Joint Training: Sample Selection for Severity Estimation with Ordinal Noisy Labels. 368-377 - Tiancheng Gu, Kaicheng Yang, Xiang An, Ziyong Feng, Dongnan Liu, Tom Weidong Cai:
ORID: Organ-Regional Information Driven Framework for Radiology Report Generation. 378-387 - Snehashis Majhi, Mohammed Guermal, Antitza Dantcheva, Quan Kong, Lorenzo Garattoni, Gianpiero Francesca, François Brémond:
Guess Future Anomalies from Normalcy: Forecasting Abnormal Behavior in Real-World Videos. 388-398 - Seoyeon Gye, Junwon Ko, Hyounguk Shon, Minchan Kwon, Junmo Kim:
Reducing the Content Bias for AI-generated Image Detection. 399-408 - Jaehyeong Park, Juncheol Ye, Seungkook Lee, Hyun W. Ka, Dongsu Han:
NarrAD: Automatic Generation of Audio Descriptions for Movies with Rich Narrative Context. 409-419 - Tung Luu, Nam Le, Duc Le, Bac Le:
From Visual Explanations to Counterfactual Explanations with Latent Diffusion. 420-429 - Yu-Yun Tseng, Tanusree Sharma, Lotus Zhang, Abigale Stangl, Leah Findlater, Yang Wang, Danna Gurari:
BIV-Priv-Seg: Locating Private Content in Images Taken by People With Visual Impairments. 430-440 - Gayoon Choi, Taejin Jeong, Sujung Hong, Seong Jae Hwang:
Dragtext: Rethinking Text Embedding in Point-Based Image Editing. 441-450 - Brian B. Moser, Stanislav Frolov, Federico Raue, Sebastian Palacio, Andreas Dengel:
Dynamic Attention-Guided Diffusion for Image Super-Resolution. 451-460 - Shuang Chen, Haozheng Zhang, Amir Atapour-Abarghouei, Hubert P. H. Shum:
SEM-Net: Efficient Pixel Modelling for Image Inpainting with Spatially Enhanced SSM. 461-471 - Rahul Sajnani, Jeroen van Baar, Jie Min, Kapil Katyal, Srinath Sridhar:
GeoDiffuser: Geometry-Based Image Editing with Diffusion Models. 472-482 - Zitian Zhang, Frédéric Fortier-Chouinard, Mathieu Garon, Anand Bhattad, Jean-François Lalonde:
Zerocomp: Zero-Shot Object Compositing from Image Intrinsics via Diffusion. 483-494 - Diego Thomas, Briac Toussaint, Jean-Sébastien Franco, Edmond Boyer:
VortSDF: 3D Modeling with Centroidal Voronoi Tessellation on Signed Distance Field. 495-504 - Markus Plack, Hannah Dröge, Leif Van Holland, Matthias B. Hullin:
VHS: High-Resolution Iterative Stereo Matching with Visual Hull Priors. 505-514 - Ren Matsumoto, Takahiro Okabe, Ryo Kawahara:
Polarization as Texture: Microscale 3D Shape from Polarized Light Focus. 515-524 - Yuxin Huang, Andong Yang, Yuantao Chen, Runyi Yang, Zhenxin Zhu, Chao Hou, Hao Zhao, Guyue Zhou:
Self-Aligning Depth-Regularized Radiance Fields for Asynchronous RGB-D Sequences. 525-534 - Henrique Piñeiro Monteagudo, Leonardo Taccari, Aurel Pjetri, Francesco Sambo, Samuele Salti:
RendBEV: Semantic Novel View Synthesis for Self-Supervised Bird's Eye View Segmentation. 535-544 - Yujing Sun, Caiyi Sun, Yuan Liu, Yuexin Ma, Siu Ming Yiu:
Generalizable Single-View Object Pose Estimation by Two-Side Generating and Matching. 545-556 - Chetan Madan, Mayuna Gupta, Soumen Basu, Pankaj Gupta, Chetan Arora:
LQ-Adapter: ViT-Adapter with Learnable Queries for Gallbladder Cancer Detection from Ultrasound Images. 557-567 - Xiwei Liu, Mohamad Kassab, Min Xu, Qirong Ho:
J-Invariant Volume Shuffle for Self-Supervised Cryo-Electron Tomogram Denoising on Single Noisy Volume. 568-577 - Daniel Khalil, Christina Liu, Pietro Perona, Jennifer J. Sun, Markus Marks:
Learning Keypoints for Multi-Agent Behavior Analysis using Self-Supervision. 578-588 - Nirhoshan Sivaroopan, Chamuditha Jayanga Galappaththige, Chalani Ekanayake, Hasindri Watawana, Ranga Rodrigo, Chamira U. S. Edussooriya, Dushan N. Wadduwage:
Uncertainty Awareness Enables Efficient Labeling for Cancer Subtyping in Digital Pathology. 589-598 - Hyeongmin Park, Sungrae Hong, Chanjae Song, Jongwoo Kim, Mun Yong Yi:
Uncertainty-based Data-wise Label Smoothing for Calibrating Multiple Instance Learning in Histopathology Image Classification. 599-608 - Huimin Zeng, Jiacheng Li, Ziqiang Zheng, Zhiwei Xiong:
All-in-One Image Compression and Restoration. 609-619 - Sourajit Saha, Tejas Gokhale:
Improving Shift Invariance in Convolutional Neural Networks with Translation Invariant Polyphase Sampling. 620-629 - Pritam Karmokar, Quan H. Nguyen, William J. Beksi:
Secrets of Edge-Informed Contrast Maximization for Event-Based Vision. 630-639 - Sangwon Lee, Myungsub Choi, Nagyeong Lee, Hyong-Euk Lee:
Stable Autofocus with Focal Consistency Loss. 640-649 - Ashish Tiwari, Mihirkumar Sutariya, Shanmuganathan Raman:
LIPIDS: Learning-based Illumination Planning In Discretized (Light) Space for Photometric Stereo. 650-659 - Jiancheng Huang, Yi Huang, Jianzhuang Liu, Donghao Zhou, Yifan Liu, Shifeng Chen:
Dual-Schedule Inversion: Training- and Tuning-Free Inversion for Real Image Editing. 660-669 - Sanuwani Dayarathna, Kh Tohidul Islam, Bohan Zhuang, Guang Yang, Jianfei Cai, Meng Law, Zhaolin Chen:
McCaD: Multi-Contrast MRI Conditioned, Adaptive Adversarial Diffusion Model for High-Fidelity MRI Synthesis. 670-679 - Kyungri Park, Woohwan Jung:
Improving Detail in Pluralistic Image Inpainting with Feature Dequantization. 680-689 - Kyungmin Jo, Jaegul Choo:
Skip-and-Play: Depth-Driven Pose-Preserved Image Generation for Any Objects. 690-699 - Arya Bangun, Zhuo Cao, Alessio Quercia, Hanno Scharr, Elisabeth Pfaehler:
MRI Reconstruction with Regularized 3D Diffusion Model (R3DM). 700-710 - Chengjie Huang, Vahdat Abdelzad, Sean Sedwards, Krzysztof Czarnecki:
VADet: Multi-Frame LiDAR 3D Object Detection Using Variable Aggregation. 711-720 - Gursimran Singh, Tianxi Hu, Mohammad Akbari, Qiang Tang, Yong Zhang:
Towards Secure and Usable 3D Assets: A Novel Framework for Automatic Visible Watermarking. 721-730 - Xinyue Wei, Fanbo Xiang, Sai Bi, Anpei Chen, Kalyan Sunkavalli, Zexiang Xu, Hao Su:
NeuManifold: Neural Watertight Manifold Reconstruction with Efficient and High-Quality Rendering Support. 731-741 - Decai Chen, Brianne Oberson, Ingo Feldmann, Oliver Schreer, Anna Hilsmann, Peter Eisert:
Adaptive and Temporally Consistent Gaussian Surfels for Multi-View Dynamic Reconstruction. 742-752 - Gonzalo Martin Garcia, Karim Abou Zeid, Christian Schmidt, Daan de Geus, Alexander Hermans, Bastian Leibe:
Fine-Tuning Image-Conditional Diffusion Models is Easier than you Think. 753-762 - Mahdi Alehdaghi, Pourya Shamsolmoali, Rafael M. O. Cruz, Eric Granger:
Bidirectional Multi-Step Domain Generalization for Visible-Infrared Person Re-Identification. 763-773 - Jiahao Luo, Jing Liu, James Davis:
SplatFace: Gaussian Splat Face Reconstruction Leveraging an Optimizable Surface. 774-783 - Jui-Che Chiang, Hou-Ning Hu, Bo-Syuan Hou, Chia-Yu Tseng, Yu-Lun Liu, Min-Hung Chen, Yen-Yu Lin:
ORFormer: Occlusion-Robust Transformer for Accurate Facial Landmark Detection. 784-793 - Rohit Lal, Saketh Bachu, Yash Garg, Arindam Dutta, Calvin-Khang Ta, Hannah Dela Cruz, Dripta S. Raychaudhuri, M. Salman Asif, Amit K. Roy-Chowdhury:
STRIDE: Single-Video Based Temporally Continuous Occlusion-Robust 3D Pose Estimation. 794-803 - Kartik Narayan, Nithin Gopalakrishnan Nair, Jennifer Xu, Rama Chellappa, Vishal M. Patel:
PETALface: Parameter Efficient Transfer Learning for Low-Resolution Face Recognition. 804-814 - Zengqun Zhao, Yu Cao, Shaogang Gong, Ioannis Patras:
Enhancing Zero-Shot Facial Expression Recognition by LLM Knowledge Transfer. 815-824 - Elaine Sui, Xiaohan Wang, Serena Yeung-Levy:
Just Shift It: Test-Time Prototype Shifting for Zero-Shot Generalization with Vision-Language Models. 825-835 - Leon Sick, Dominik Engel, Pedro Hermosilla, Timo Ropinski:
Attention-Guided Masked Autoencoders for Learning Image Representations. 836-846 - Donghyeon Kwon, Inho Kim, Suha Kwak:
Boosting Semi-Supervised Video Action Detection with Temporal Context. 847-858 - Gabriele Spadaro, Marco Grangetto, Attilio Fiandrotti, Enzo Tartaglione, Jhony H. Giraldo:
WiGNet: Windowed Vision Graph Neural Network. 859-868 - Fei Wu, Pablo Márquez-Neila, Hedyeh Rafii-Tari, Raphael Sznitman:
Active Learning with Context Sampling and One-vs-Rest Entropy for Semantic Segmentation. 869-878 - Sucheng Ren, Fangyun Wei, Samuel Albanie, Zheng Zhang, Han Hu:
DeepMIM: Deep Supervision for Masked Image Modeling. 879-888 - Surojit Saha, Sarang C. Joshi, Ross T. Whitaker:
ARD-VAE: A Statistical Formulation to Find the Relevant Latent Dimensions of Variational Autoencoders. 889-898 - Wonjun Kang, Kevin Galim, Hyung Il Koo, Nam Ik Cho:
Counting Guidance for High Fidelity Text-to-Image Synthesis. 899-908 - Rui Xu, Mengya Hu, Deren Lei, Yaxi Li, David Lowe, Alex Gorevski, Mingyu Wang, Emily Ching, Alex Deng:
InvisMark: Invisible and Robust Watermarking for AI-generated Image Provenance. 909-918 - Zhiyuan Xu, Yinhe Chen, Huan-ang Gao, Weiyan Zhao, Guiyu Zhang, Hao Zhao:
Diffusion-based Visual Anagram as Multi-task Learning. 919-928 - Ashutosh Srivastava, Tarun Ram Menta, Abhinav Java, Avadhoot Jadhav, Silky Singh, Surgan Jandial, Balaji Krishnamurthy:
REEDIT: Multimodal Exemplar-Based Image Editing. 929-939 - Tanvir Mahmud, Mustafa Munir, Radu Marculescu, Diana Marculescu:
Ada-VE: Training-Free Consistent Video Editing Using Adaptive Motion Prior. 940-949 - Bo Lang, Mooi Choo Chuah:
Event-Guided Fusion-Mamba for Context-Aware 3D Human Pose Estimation. 950-960 - Luchao Qi, Jiaye Wu, Annie N. Wang, Shengze Wang, Roni Sengupta:
My3DGen: A Scalable Personalized 3D Generative Model. 961-972 - Ashkan Ganj, Hang Su, Tian Guo:
HybridDepth: Robust Metric Depth Fusion by Leveraging Depth from Focus and Single-Image Priors. 973-982 - Keon Moradi, Ethan Haque, Jasmeen Kaur, Alexandra B. Bentz, Eli S. Bridge, Golnaz Habibi:
Context-Aware Outlier Rejection for Robust Multi-View 3D Tracking of Similar Small Birds in An Outdoor Aviary. 983-991 - Stathis Galanakis, Alexandros Lattas, Stylianos Moschoglou, Stefanos Zafeiriou:
FitDiff: Robust Monocular 3D Facial Shape and Reflectance Estimation using Diffusion Models. 992-1004 - Junyi Cao, Chao Ma:
Towards Generalized Face Anti-Spoofing from a Frequency Shortcut View. 1005-1015 - Marco Huber, Naser Damer:
Beyond Spatial Explanations: Explainable Face Recognition in the Frequency Domain. 1016-1026 - Diana Voth, Leonidas Dane, Jonas Grebe, Sebastian Peitz, Philipp Terhörst:
Effective Backdoor Learning on Open-Set Face Recognition Systems. 1027-1039 - Hanwei Kung, Tuomas Varanka, Sanjay Saha, Terence Sim, Nicu Sebe:
Face Anonymization Made Simple. 1040-1050 - Yuxiang Guo, Anshul Shah, Jiang Liu, Ayush Gupta, Rama Chellappa, Cheng Peng:
GaitContour: Efficient Gait Recognition Based on a Contour-Pose Representation. 1051-1061 - Sanoojan Baliah, Qinliang Lin, Shengcai Liao, Xiaodan Liang, Muhammad Haris Khan:
Realistic and Efficient Face Swapping: A Unified Approach with Diffusion Models. 1062-1071 - Rui Li, Martin Trapp, Marcus Klasson, Arno Solin:
Flatness Improves Backbone Generalisation in Few-Shot Classification. 1072-1089 - Andrea Alfarano, Alberto Alfarano, Linda Friso, Andrea Bacciu, Irene Amerini, Fabrizio Silvestri:
STLight: A Fully Convolutional Approach for Efficient Predictive Learning by Spatio-Temporal Joint Processing. 1090-1100 - Do Huu Dat, Po Yuan Mao, Tien Hoang Nguyen, Wray L. Buntine, Mohammed Bennamoun:
HOPE: A Memory-Based and Composition-Aware Framework for Zero-Shot Learning with Hopfield Network and Soft Mixture of Experts. 1101-1110 - Kiran Kokilepersaud, Seulgi Kim, Mohit Prabhushankar, Ghassan AlRegib:
HEX: Hierarchical Emergence Exploitation in Self-Supervised Algorithms. 1111-1121 - Yunbei Zhang, Akshay Mehra, Jihun Hamm:
OT-VP: Optimal Transport-Guided Visual Prompting for Test-Time Adaptation. 1122-1132 - Marcelo Sanchez, Gil Triginer, Coloma Ballester, Ignacio Sarasua, Lara Raad:
A New Benchmark and Baseline for Real-Time High-Resolution Image Inpainting on Edge Devices. 1133-1143 - Mrinal Verghese, Brian Chen, Hamid Eghbalzadeh, Tushar Nagarajan, Ruta Desai:
User-in-the-Loop Evaluation of Multimodal LLMs for Activity Assistance. 1144-1154 - Yan-Bo Lin, Yu Tian, Linjie Yang, Gedas Bertasius, Heng Wang:
VMAs: Video-to-Music Generation via Semantic Alignment in Web Music Videos. 1155-1165 - Daeyoung Roh, Donghee Han, Jihyun Nam, Jungsoo Oh, Youngbin You, Jeongheon Park, Mun Yong Yi:
CTIP: Towards Accurate Tabular-to-Image Generation for Tire Footprint Generation. 1166-1175 - Younghyun Cho, Changhun Lee, Seonggon Kim, Eunhyeok Park:
PTQ4VM: Post-Training Quantization for Visual Mamba. 1176-1185 - Jay N. Paranjape, Celso de Melo, Vishal M. Patel:
A Mamba-Based Siamese Network for Remote Sensing Change Detection. 1186-1196 - Julian D. Santamaria, Claudia Isaza, Jhony H. Giraldo:
CATALOG: A Camera Trap Language-Guided Contrastive Learning Model. 1197-1206 - Faith M. Johnson, Ryan Meegan, Jack Lowry, Peter Oudemans, Kristin J. Dana:
Agtech Framework for Cranberry-Ripening Analysis Using Vision Foundation Models. 1207-1216 - Feng Chen, Sotirios A. Tsaftaris, Mario Valerio Giuffrida:
GMT: Guided Mask Transformer for Leaf Instance Segmentation. 1217-1226 - Shirin Qiam, Saipraneeth Devunuri, Lewis J. Lehe:
A Pipeline and NIR-Enhanced Dataset for Parking Lot Segmentation. 1227-1236 - Shao-Hao Lu, Ren Wang, Ching-Chun Huang, Wei-Chen Chiu:
Boosting Diffusion Guidance via Learning Degradation-Aware Models for Blind Super Resolution. 1237-1246 - Antoine Mercier, Ramin Nakhli, Mahesh Reddy, Rajeev Yasarla, Hong Cai, Fatih Porikli, Guillaume Berger:
HexaGen3D: StableDiffusion is One Step Away from Fast and Diverse Text-to-3D Generation. 1247-1257 - Ali Mollaahmadi Dehaghi, Reza Razavi, Mohammad Moshirpour:
Reversing the Damage: A QP-Aware Transformer-Diffusion Approach for 8K Video Restoration under Codec Compression. 1258-1267 - Jianyi Zhang, Yufan Zhou, Jiuxiang Gu, Curtis Wigington, Tong Yu, Yiran Chen, Tong Sun, Ruiyi Zhang:
ARTIST: Improving the Generation of Text-Rich Images with Disentangled Diffusion Models and Large Language Models. 1268-1278 - Lorenzo Mandelli, Stefano Berretti:
Generation of Complex 3D Human Motion by Temporal and Spatial Composition of Diffusion Models. 1279-1288 - S. Divakar Bhat, Amit More, Mudit Soni, Surbhi Agrawal:
Prior2Posterior: Model Prior Correction for Long-Tailed Learning. 1289-1298 - Prafful Kumar Khoba, Zijian Wang, Chetan Arora, Mahsa Baktashmotlagh:
Feature Space Perturbation: A Panacea to Enhanced Transferability Estimation. 1299-1308 - Hayeong Yu, Seungjae Han, Young-Gyu Yoon:
Design Principles of Multi-Scale J-Invariant Networks for Self-Supervised Image Denoising. 1309-1318 - Simon Damm, Mike Laszkiewicz, Johannes Lederer, Asja Fischer:
AnomalyDINO: Boosting Patch-based Few-Shot Anomaly Detection with DINOv2. 1319-1329 - Abu Zahid Bin Aziz, Mokshagna Sai Teja Karanam, Tushar Kataria, Shireen Y. Elhabian:
EFFICIENTMORPH: Parameter-Efficient Transformer-Based Architecture for 3D Image Registration. 1330-1341 - Wenxin Ma, Qingsong Yao, Xiang Zhang, Zhelong Huang, Zihang Jiang, S. Kevin Zhou:
Towards Accurate Unified Anomaly Segmentation. 1342-1352 - Junhyeong Go, Jongbin Ryu:
Channel Propagation Networks for Refreshable Vision Transformer. 1353-1362 - Muhammad Ali, Mamoona Javaid, Mubashir Noman, Mustansar Fiaz, Salman H. Khan:
COSNet: A Novel Semantic Segmentation Network using Enhanced Boundaries in Cluttered Scenes. 1363-1372 - Lucas Deregnaucourt, Hind Laghmara, Alexis Lechervy, Samia Ainouz:
A Conflict-Guided Evidential Multimodal Fusion for Semantic Segmentation. 1373-1382 - Monika Kwiatkowski, Simon Matern, Olaf Hellwich:
Swin-∇: Gradient-Based Image Restoration from Image Sequences using Video Swin-Transformers. 1383-1391 - Gautier Evennou, Antoine Chaffin, Vivien Chappelier, Ewa Kijak:
Reframing Image Difference Captioning with BLIP2IDC and Synthetic Augmentation. 1392-1402 - Mohammad Reza Taesiri, Cor-Paul Bezemer:
Videogamebunny: Towards Vision Assistants for Video Games. 1403-1413 - Aiyu Cui, Jay Mahajan, Viraj Shah, Preeti Gomathinayagam, Chang Liu, Svetlana Lazebnik:
Street TryOn: Learning In-the-Wild Virtual Try-On from Unpaired Person Images. 1414-1423 - Jeya Maria Jose Valanarasu, Rahul Garg, Andeep Toor, Xin Tong, Weijuan Xi, Andreas Lugmayr, Vishal M. Patel, Anne Menini:
ReBotNet: Fast Real-Time Video Enhancement. 1424-1435 - Souhail Bakkali, Sanket Biswas, Zuheng Ming, Mickaël Coustaty, Marçal Rusiñol, Oriol Ramos Terrades, Josep Lladós:
GlobalDoc: A Cross-Modal Vision-Language Framework for Real-World Document Image Retrieval and Classification. 1436-1446 - Yizhou Wang, Kuan-Chuan Peng, Yun Fu:
Towards Zero-shot 3D Anomaly Localization. 1447-1456 - Seoungyoon Kang, Youngsun Lim, Hyunjung Shim:
Label-Augmented Dataset Distillation. 1457-1466 - S. Madan, Shreya Ghosh, Lownish Rai Sookha, M. A. Ganaie, R. Subramanian, Abhinav Dhall, Tom Gedeon:
MIP-GAF: A MLLM-Annotated Benchmark for Most Important Person Localization and Group Context Understanding. 1467-1476 - Paritosh Parmar, Eric Peh, Basura Fernando:
Learning to Visually Connect Actions and Their Effects. 1477-1487 - Keren Ganon, Morris Alper, Rachel Mikulinsky, Hadar Averbuch-Elor:
WAFFLE: Multimodal Floorplan Understanding in the Wild. 1488-1497 - Dac Thai Nguyen, Trung Thanh Nguyen, Huu Tien Nguyen, Thanh Trung Nguyen, Huy Hieu Pham, Thanh Hung Nguyen, Truong Thao Nguyen, Phi Le Nguyen:
CT to PET Translation: A Large-Scale Dataset and Domain-Knowledge-Guided Diffusion Approach. 1498-1507 - Jiahao Xu, Zikai Zhang, Rui Hu:
Achieving Byzantine-Resilient Federated Learning via Layer-Adaptive Sparsified Model Aggregation. 1508-1517 - Jung Im Choi, Qizhen Lan, Qing Tian:
Improving Deep Detector Robustness via Detection-Related Discriminant Maximization and Reorganization. 1518-1527 - Rambod Azimi, Yijian Kong, Dusan Gostimirovic, James J. Clark, Odile Liboiron-Ladouceur:
SEMU-Net: A Segmentation-Based Corrector for Fabrication Process Variations of Nanophotonics with Microscopic Images. 1528-1536 - Seonguk Seo, Mustafa Gökhan Uzunbas, Bohyung Han, Sara Cao, Ser-Nam Lim:
Metric Compatible Training for Online Backfilling in Large-Scale Retrieval. 1537-1545 - Dimitrios Sinodinos, Narges Armanfard:
Cross-Task Affinity Learning for Multitask Dense Scene Predictions. 1546-1555 - Sourasekhar Banerjee, Debaditya Roy, Vigneshwaran Subbaraju, Monowar H. Bhuyan:
Predicting Event Memorability Using Personalized Federated Learning. 1556-1565 - Hamidreza Dastmalchi, Aijun An, Ali Cheraghian, Shafin Rahman, Sameera Ramasinghe:
Test-Time Adaptation of 3D Point Clouds via Denoising Diffusion Models. 1566-1576 - Dan-Sebastian Bacea, Florin Oniga:
ECF-YOLOv7-Tiny: Improving Feature Fusion and the Receptive Field for Lightweight Object Detectors. 1577-1586 - Giulia Rizzoli, Matteo Caligiuri, Donald Shenaj, Francesco Barbato, Pietro Zanuttigh:
When Cars Meet Drones: Hyperbolic Federated Learning for Source-Free Domain Adaptation in Adverse Weather. 1587-1596 - Alireza Hosseini, Amirhossein Kazerouni, Saeed Akhavan, Michael Brudno, Babak Taati:
SUM: Saliency Unification Through Mamba for Visual Attention Modeling. 1597-1607 - Nyle Siddiqui, Florinel-Alin Croitoru, Gaurav Kumar Nayak, Radu Tudor Ionescu, Mubarak Shah:
DLCR: A Generative Data Expansion Framework via Diffusion for Clothes-Changing Person Re-Id. 1608-1617 - Kuan-Hung Liu, Cheng-Kun Yang, Min-Hung Chen, Yu-Lun Liu, Yen-Yu Lin:
CorrFill: Enhancing Faithfulness in Reference-Based Inpainting with Correspondence Guidance in Diffusion Models. 1618-1627 - Shuo Wang, Chunlong Xia, Feng Lv, Yifeng Shi:
RT-DETRv3: Real-Time End-to-End Object Detection with Hierarchical Dense Positive Supervision. 1628-1636 - Elham Amin Mansour, Ozan Unal, Suman Saha, Benjamín Béjar, Luc Van Gool:
Language-Guided Instance-Aware Domain-Adaptive Panoptic Segmentation. 1637-1648 - Saheli Hazra, Sudip Das, Rohit Choudhary, Arindam Das, Ganesh Sistu, Ciarán Eising, Ujjwal Bhattacharya:
Reflective Teacher: Semi-Supervised Multimodal 3D Object Detection in Bird's-Eye-View via Uncertainty Measure. 1649-1659 - Maciej K. Wozniak, Hariprasath Govindarajan, Marvin Klingner, Camille Maurice, Ravi Kiran, Senthil Kumar Yogamani:
S3PT: Scene Semantics and Structure Guided Clustering to Boost Self-Supervised Pre-Training for Autonomous Driving. 1660-1670 - Rémy Sun, Li Yang, Diane Lingrand, Frédéric Precioso:
Mind the Map! Accounting for Existing Maps When Estimating Online HDMaps from Sensors. 1671-1681 - Adrien Lafage, Mathieu Barbier, Gianni Franchi, David Filliat:
Hierarchical Light Transformer Ensembles for Multimodal Trajectory Forecasting. 1682-1691 - Chaesong Park, Eunbin Seo, Jongwoo Lim:
HeightLane: BEV Heightmap Guided 3D Lane Detection. 1692-1701 - Roy Uziel, Oded Bialer:
Optimizing Vision-Language Model for Road Crossing Intention Estimation. 1702-1712 - Xiaoyu Zhang, Ziwei Wang, Hai Dong, Zhifeng Bao, Jiajun Liu:
On-the-Fly Object-aware Representative Point Selection in Point Cloud. 1713-1722 - Nikos Efthymiadis, Bill Psomas, Zakaria Laskar, Konstantinos Karantzalos, Yannis Avrithis, Ondrej Chum, Giorgos Tolias:
Composed Image Retrieval for Training-FREE DOMain Conversion. 1723-1733 - Zifu Wan, Pingping Zhang, Yuhao Wang, Silong Yong, Simon Stepputtis, Katia P. Sycara, Yaqi Xie:
Sigma: Siamese Mamba Network for Multi-Modal Semantic Segmentation. 1734-1744 - Hanoona Abdul Rasheed, Muhammad Maaz, Abdelrahman M. Shaker, Salman H. Khan, Hisham Cholakkal, Rao Muhammad Anwer, Tim Baldwin, Michael Felsberg, Fahad Shahbaz Khan:
Palo: A Polyglot Large Multimodal Model for 5B People. 1745-1754 - Anjishnu Mukherjee, Ziwei Zhu, Antonios Anastasopoulos:
Crossroads of Continents: Automated Artifact Extraction for Cultural Adaptation with Large Multimodal Models. 1755-1764 - Srikumar Sastry, Subash Khanal, Aayush Dhakal, Adeel Ahmad, Nathan Jacobs:
TaxaBind: A Unified Embedding Space for Ecological Applications. 1765-1774 - Qianyi Liu, Siqi Zhang, Yanyuan Qiao, Junyou Zhu, Xiang Li, Longteng Guo, Qunbo Wang, Xingjian He, Qi Wu, Jing Liu:
GroundingMate: Aiding Object Grounding for Goal-Oriented Vision-and-Language Navigation. 1775-1784 - Florian Hofherr, Bjoern Haefner, Daniel Cremers:
On Neural BRDFs: A Thorough Comparison of State-of-the-Art Approaches. 1785-1794 - Leif Van Holland, Michael Weinmann, Jan U. Müller, Patrick Stotko, Reinhard Klein:
NeRFs are Mirror Detectors: Using Structural Similarity for Multi-View Mirror Scene Reconstruction with 3D Surface Primitives. 1795-1807 - Hugo Blanc, Jean-Emmanuel Deschaud, Alexis Paljic:
RayGauss: Volumetric Gaussian-Based Ray Casting for Photorealistic Novel View Synthesis. 1808-1817 - Konstantinos Tzevelekakis, Shutong Zhang, Luc Van Gool, Christos Sakaridis:
Sun Off, Lights on: Photorealistic Monocular Nighttime Simulation for Robust Semantic Perception. 1818-1828 - Kengo Matsufuji, Lin Shi, Ryo Kawahara, Takahiro Okabe:
Separating Direct and Global Components from Novel Viewpoints. 1829-1838 - Tianshu Kuai, Sina Honari, Igor Gilitschenski, Alex Levinshtein:
Towards Unsupervised Blind Face Restoration Using Diffusion Prior. 1839-1849 - Naeun Ko, Yonghyun Jeong, Jong Chul Ye:
Text-to-Image Synthesis for Domain Generalization in Face Anti-Spoofing. 1850-1860 - Huawei Sun, Zixu Wang, Hao Feng, Julius Ott, Lorenzo Servadei, Robert Wille:
GET-UP: GEomeTric-aware Depth Estimation with Radar Points UPsampling. 1850-1860 - Bin Yan, Martin Sundermeyer, David Joseph Tan, Huchuan Lu, Federico Tombari:
Towards Real-Time Open-Vocabulary Video Instance Segmentation. 1861-1871 - Hakjin Lee, Minki Song, Jamyoung Koo, Junghoon Seo:
Hausdorff Distance Matching with Adaptive Query Denoising for Rotated Detection Transformer. 1872-1882 - Nikos Efthymiadis, Giorgos Tolias, Ondrej Chum:
Crafting Distribution Shifts for Validation and Training in Single Source Domain Generalization. 1883-1892 - Abbas Khan, Muhammad Asad, Martin Benning, Caroline H. Roney, Gregory G. Slabaugh:
CAMS: Convolution and Attention-Free Mamba-Based Cardiac Image Segmentation. 1893-1903 - Wenhao Gu, Li Gu, Ziqiang Wang, Ching Yee Suen, Yang Wang:
DocTTT: Test-Time Training for Handwritten Document Recognition Using Meta-Auxiliary Learning. 1904-1913 - Wulin Xie, Lian Zhao, Jiang Long, Xiaohuan Lu, Bingyan Nie:
Multi-View Factorizing and Disentangling: A Novel Framework for Incomplete Multi-View Multi-Label Classification. 1914-1923 - Shahriar Rifat, Jonathan D. Ashdown, Francesco Restuccia:
DARDA: Domain-Aware Real-Time Dynamic Neural Network Adaptation. 1924-1932 - Hidehisa Arai, Keita Miwa, Kento Sasaki, Kohei Watanabe, Yu Yamaguchi, Shunsuke Aoki, Issei Yamamoto:
CoVLA: Comprehensive Vision-Language-Action Dataset for Autonomous Driving. 1933-1943 - Alloy Das, Sanket Biswas, Prasun Roy, Subhankar Ghosh, Umapada Pal, Michael Blumenstein, Josep Lladós, Saumik Bhattacharya:
FASTER: A Font-Agnostic Scene Text Editing and Rendering Framework. 1944-1954 - Sreetama Sarkar, Gourav Datta, Souvik Kundu, Kai Zheng, Chirayata Bhattacharyya, Peter A. Beerel:
MaskVD: Region Masking for Efficient Video Object Detection. 1955-1964 - Lingdong Kong, Xiang Xu, Jun Cen, Wenwei Zhang, Liang Pan, Kai Chen, Ziwei Liu:
Calib3D: Calibrating Model Preferences for Reliable 3D Scene Understanding. 1965-1978 - Zhonghua Yi, Hao Shi, Qi Jiang, Kailun Yang, Ze Wang, Diyang Gu, Yufan Zhang, Kaiwei Wang:
EI-Nexus: Towards Unmediated and Flexible Inter-Modality Local Feature Extraction and Matching for Event-Image Data. 1979-1988 - Michael Schwingshackl, Fabio Francisco Oberweger, Markus Murschitz:
Few-shot Structure-Informed Machinery Part Segmentation with Foundation Models and Graph Neural Networks. 1989-1998 - Jiange Yang, Wenhui Tan, Chuhao Jin, Keling Yao, Bei Liu, Jianlong Fu, Ruihua Song, Gangshan Wu, Limin Wang:
Transferring Foundation Models for Generalizable Robotic Manipulation. 1999-2010 - Raktim Gautam Goswami, Naman Patel, Prashanth Krishnamurthy, Farshad Khorrami:
FlashMix: Fast Map-Free LiDAR Localization via Feature Mixing and Contrastive-Constrained Accelerated Training. 2011-2020 - Jianhao Zheng, Gábor Valasek, Daniel Barath, Iro Armeni:
Multi-HexPlanes: A Lightweight Map Representation for Rendering and 3D Reconstruction. 2021-2031 - Lin Shi, Kengo Matsufuji, Ryo Kawahara, Takahiro Okabe:
FluoNeRF: Fluorescent Novel-View Synthesis Under Novel Light Source Colors. 2032-2041 - Eito Ikuta, Yohan Lee, Akihiro Iohara, Yu Saito, Toshiyuki Tanaka:
Harmonizing Attention: Training-free Texture-aware Geometry Transfer. 2042-2051 - Chengyang Yan, Donald G. Dansereau:
TaCOS: Task-Specific Camera Optimization with Simulation. 2052-2062 - Daiki Miyake, Akihiro Iohara, Yu Saito, Toshiyuki Tanaka:
Negative-Prompt Inversion: Fast Image Inversion for Editing with Text-Guided Diffusion Models. 2063-2072 - Stanislav Frolov, Brian B. Moser, Andreas Dengel:
SpotDiffusion: A Fast Approach for Seamless Panorama Generation Over Time. 2073-2081 - Prajneya Kumar, Eshika Khandelwal, Makarand Tapaswi, Vishnu Sreekumar:
Seeing Eye to AI: Comparing Human Gaze and Model Attention in Video Memorability. 2082-2091 - Akshita Gupta, Gaurav Mittal, Ahmed Magooda, Ye Yu, Graham W. Taylor, Mei Chen:
LoSA: Long-Short-Range Adapter for Scaling End-to-End Temporal Action Localization. 2092-2102 - Minghui Lin, Shu Wang, Xiang Wang, Jianhua Tang, Longbin Fu, Zhengrong Zuo, Nong Sang:
DMPT: Decoupled Modality-Aware Prompt Tuning for Multi-Modal Object Re-Identification. 2103-2112 - Rita Pucci, Niki Martinel:
CE-VAE: Capsule Enhanced Variational AutoEncoder for Underwater Image Enhancement. 2113-2123 - Logan Servant, Michaël Clément, Laurent Wendling, Camille Kurtz:
Contrastive Learning of Image Representations Guided by Spatial Relations. 2124-2133 - Katharina Prasse, Isaac Bravo, Stefanie Walter, Margret Keuper:
I Spy with My Little Eye a Minimum Cost Multicut Investigation of Dataset Frames. 2134-2143 - Jingbo Zeng, Zaiwang Gu, Weide Liu, Lile Cai, Jun Cheng:
Uncertainty Aware Interest Point Detection and Description. 2144-2153 - Jiawei Yao, Jusheng Zhang, Xiaochao Pan, Tong Wu, Canran Xiao:
DepthSSC: Monocular 3D Semantic Scene Completion via Depth-Spatial Alignment and Voxel Adaptation. 2154-2163 - Yongkang Cheng, Mingjiang Liang, Shaoli Huang, Gaoge Han, Jifeng Ning, Wei Liu:
Conditional GAN for Enhancing Diffusion Models in Efficient and Authentic Global Gesture Generation from Audios. 2164-2173 - Chen Zhao, Mengyuan Yu, Fan Yang, Peiguang Jing:
VIIS: Visible and Infrared Information Synthesis for Severe Low-Light Image Enhancement. 2174-2184 - Saad Lahlali, Nicolas Granger, Hervé Le Borgne, Quoc-Cuong Pham:
ALPI: Auto-Labeller with Proxy Injection for 3D Object Detection using 2D Labels Only. 2185-2194 - Rao Fu, Jingyu Liu, Xilun Chen, Yixin Nie, Wenhan Xiong:
Scene-LLM: Extending Language Model for 3D Visual Reasoning. 2195-2206 - Tai D. Nguyen, Matthew C. Stamm:
MVFNet: Multipurpose Video Forensics Network using Multiple Forms of Forensic Evidence. 2207-2217 - Gaoge Han, Mingjiang Liang, Jinglei Tang, Yongkang Cheng, Wei Liu, Shaoli Huang:
ReinDiffuse: Crafting Physically Plausible Motions with Reinforced Diffusion Model. 2218-2227 - Shaoxiang Wang, Yaxu Xie, Chun-Peng Chang, Christen Millerdurai, Alain Pagani, Didier Stricker:
Uni-SLAM: Uncertainty-Aware Neural Implicit SLAM for Real-Time Dense Indoor Scene Reconstruction. 2228-2239 - Shijie Li, Farhad G. Zanjani, Haitam Ben Yahia, Yuki M. Asano, Jürgen Gall, Amirhossein Habibian:
Valid: Variable-Length Input Diffusion for Novel View Synthesis. 2240-2249 - Florian Chabot, Nicolas Granger, Guillaume Lapouge:
GaussianBeV: 3D Gaussian Representation meets Perception Models for BeV Segmentation. 2250-2259 - Longwei Li, Huajian Huang, Sai-Kit Yeung, Hui Cheng:
OmniGS: Fast Radiance Field Reconstruction Using Omnidirectional Gaussian Splatting. 2260-2268 - Tao Tu, Ming-Feng Li, Chieh Hubert Lin, Yen-Chi Cheng, Min Sun, Ming-Hsuan Yang:
DreaMo: Articulated 3D Reconstruction from a Single Casual Video. 2269-2279 - Danush Kumar Venkatesh, Dominik Rivoir, Micha Pfeiffer, Fiona R. Kolbinger, Stefanie Speidel:
Data Augmentation for Surgical Scene Segmentation with Anatomy-Aware Diffusion Models. 2280-2290 - Fotios Logothetis, Ignas Budvytis, Roberto Cipolla:
NPL-MVPS: Neural Point-Light Multi-View Photometric Stereo. 2291-2300 - Wenzhao Li, Tianhao Wu, Fangcheng Zhong, Cengiz Öztireli:
ARF-Plus: Controlling Perceptual Factors in Artistic Radiance Fields for 3D Scene Stylization. 2301-2310 - Sachin Raja, Ajoy Mandal, C. V. Jawahar:
Treading Towards Privacy-Preserving Table Structure Recognition. 2311-2321 - Tong Wei, Philipp Lindenberger, Jirí Matas, Daniel Barath:
Breaking the Frame: Visual Place Recognition by Overlap Prediction. 2322-2331 - G. Ujwal Sai, Arkadipta De, Vartika Sengar, Anuj Rathore, Daksh Thapar, Manohar Kaul:
Learning Semantic Part-Based Graph Structure for 3D Point Cloud Domain Generalization. 2332-2341 - Jiuxiang Gu, Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song:
Differential Privacy Mechanisms in Neural Tangent Kernel Regression. 2342-2356 - Adith Boloor, Weikai Lin, Tianrui Ma, Yu Feng, Yuhao Zhu, Xuan Zhang:
PrivateEye: In-Sensor Privacy Preservation Through Optical Feature Separation. 2357-2367 - Shogo Sato, Takuhiro Kaneko, Kazuhiko Murasaki, Taiga Yoshida, Ryuichi Tanida, Akisato Kimura:
Unsupervised Single-Image Intrinsic Image Decomposition with LiDAR Intensity Enhanced Training. 2368-2378 - Benjamin Salmon, Alexander Krull:
Unsupervised Denoising for Signal-Dependent and Row-Correlated Imaging Noise. 2379-2389 - Chen Wu, Ling Wang, Long Peng, Dianjie Lu, Zhuoran Zheng:
Dropout the High-Rate Downsampling: A Novel Design Paradigm for UHD Image Restoration. 2390-2399 - Ankit Dhiman, R. Srinath, Srinjay Sarkar, Lokesh R. Boregowda, R. Venkatesh Babu:
ChromaDistill: Colorizing Monochrome Radiance Fields with Knowledge Distillation. 2400-2410 - Chaohao Xie, Kai Han, Kwan-Yee K. Wong:
VipDiff: Towards Coherent and Diverse Video Inpainting via Training-Free Denoising Diffusion Models. 2411-2420 - Matias Turkulainen, Xuqian Ren, Iaroslav Melekhov, Otto Seiskari, Esa Rahtu, Juho Kannala:
DN-Splatter: Depth and Normal Priors for Gaussian Splatting and Meshing. 2421-2431 - Dongwoo Park, Suk Pil Ko:
NCAP: Scene Text Image Super-Resolution with Non-CAtegorical Prior. 2432-2441 - Bo Ji, Angela Yao:
High-Pass Kernel Prediction for Efficient Video Deblurring. 2442-2452 - Guoshan Liu, Hailong Yin, Bin Zhu, Jingjing Chen, Chong-Wah Ngo, Yu-Gang Jiang:
Retrieval Augmented Recipe Generation. 2453-2463 - Nan Cai, Pia Bideau:
Active Event Alignment for Monocular Distance Estimation. 2464-2473 - Hojun Jang, Young Min Kim:
ReMP: Reusable Motion Prior for Multi-domain 3D Human Pose Estimation and Motion Inbetweening. 2474-2483 - Zhiyu Pan, Zhicheng Zhong, Wenxuan Guo, Yifan Chen, Jianjiang Feng, Jie Zhou:
LiCamPose: Combining Multi-View LiDAR and RGB Cameras for Robust Single-timestamp 3D Human Pose Estimation. 2484-2494 - Laura O'Mahony, Nikola S. Nikolov, David J. P. O'Sullivan:
Towards Utilising a Range of Neural Activations for Comprehending Representational Associations. 2495-2506 - Amit Giloni, Omer Hofman, Ikuya Morikawa, Toshiya Shimizu, Yuval Elovici, Asaf Shabtai:
DiL: An Explainable and Practical Metric for Abnormal Uncertainty in Object Detection. 2507-2516 - Dongyu Yan, Guanyu Huang, Fengyu Quan, Haoyao Chen:
MSI-NeRF: Linking Omni-Depth with View Synthesis Through Multi-Sphere Image Aided Generalizable Neural Radiance Field. 2517-2526 - Giacomo Capitani, Lorenzo Bonicelli, Angelo Porrello, Federico Bolelli, Simone Calderara, Elisa Ficarra:
Towards Unbiased Continual Learning: Avoiding Forgetting in the Presence of Spurious Correlations. 2527-2537 - Juhyeon Park, Seokhyeon Jeong, Taesup Moon:
TLDR: Text Based Last-Layer Retraining for Debiasing Image Classifiers. 2538-2547 - Vito Paolo Pastore, Massimiliano Ciranni, Davide Marinelli, Francesca Odone, Vittorio Murino:
Looking at Model Debiasing through the Lens of Anomaly Detection. 2548-2557 - Mingqi Shao, Feng Xiong, Hang Zhang, Shuang Yang, Mu Xu, Wei Bian, Xueqian Wang:
Global-Guided Focal Neural Radiance Field for Large-Scale Scene Rendering. 2558-2567 - Weijing Tao, Biwen Lei, Kunhao Liu, Shijian Lu, Miaomiao Cui, Xuansong Xie:
DivAvatar: Diverse 3D Avatar Generation with a Single Prompt. 2568-2577 - Ugo Leone Cavalcanti, Matteo Poggi, Fabio Tosi, Valerio Cambareri, Vladimir Zlokolica, Stefano Mattoccia:
CabNIR: A Benchmark for In-Vehicle Infrared Monocular Depth Estimation. 2578-2590 - Muhammad Salman Ali, Sung-Ho Bae, Enzo Tartaglione:
ELMGS: Enhancing Memory and Computation Scalability Through coMpression for 3D Gaussian Splatting. 2591-2600 - Matías Mendieta, Guangyu Sun, Chen Chen:
Navigating Heterogeneity and Privacy in One-Shot Federated Learning with Diffusion Models. 2601-2610 - Feng Xu, David Ahmedt-Aristizabal, Lars Petersson, Dadong Wang, Xun Li:
Facial Expression Recognition with Controlled Privacy Preservation and Feature Compensation. 2611-2621 - Hermes McGriff, Renato Martins, Nicolas Andreff, Cédric Demonceaux:
Dense Scene Reconstruction from Light-Field Images Affected by Rolling Shutter. 2622-2630 - Shilin Hu, Hieu Le, ShahRukh Athar, Sagnik Das, Dimitris Samaras:
Shadow Removal Refinement via Material-Consistent Shadow Edges. 2631-2641 - Yiqing Liang, Numair Khan, Zhengqin Li, Thu Nguyen-Phuoc, Douglas Lanman, James Tompkin, Lei Xiao:
GauFRe: Gaussian Deformation Fields for Real-Time Dynamic Novel View Synthesis. 2642-2652 - Karam Park, Nam Ik Cho:
Partial Filter-Sharing: Improved Parameter-sharing Method for Single Image Super-Resolution Networks. 2653-2663 - Si-Yu Lu, Yung-Yao Chen, Yi-Tong Wu, Hsin-Chun Lin, Sin-Ye Jhong, Wen-Huang Cheng:
Radiance Field-Based Pose Estimation via Decoupled Optimization Under Challenging Initial Conditions. 2664-2673 - Yimu Wang, Krzysztof Czarnecki:
AiDe: Improving 3D Open-Vocabulary Semantic Segmentation by Aligned Vision-Language Learning. 2674-2685 - Yongjae Lee, Li Yang, Deliang Fan:
MFNeRF: Memory Efficient NeRF with Mixed-Feature Hash Table. 2686-2695 - Tu Vo, Chan Y. Park:
Deep Joint Unrolling for Deblurring and Low-Light Image Enhancement (JUDE). 2696-2705 - Hirunima Jayasekara, Khoi Pham, Nirat Saini, Abhinav Shrivastava:
Unified Framework for Open-World Compositional Zero-Shot Learning. 2706-2714 - Manh Duong Nguyen, Tuan Nghia Nguyen, Xuan Truong Nguyen:
ENAF: A Multi-Exit Network with an Adaptive Patch Fusion for Large Image Super Resolution. 2706-2714 - Yahan Chen, Wenzheng Liu, Xiaowei Luo:
Semantic Segmentation Method for Automated Indoor 3D Reconstruction based on Architectural-Knowledge-Aware Features. 2715-2724 - Asen Nachkov, Danda Pani Paudel, Martin Danelljan, Luc Van Gool:
Diffusion-Based Particle-DETR for BEV Perception. 2725-2735 - Aditya Dixit, Nischit Hosamani, Puneet Gupta, Ankur Garg:
VISIONARY: Novel Spatial-Spectral Attention Mechanism for Hyperspectral Image Denoising. 2736-2745 - Yujing Xue, Jiaxiang Liu, Jiawei Du, Joey Tianyi Zhou:
PVP: Polar Representation Boost for 3D Semantic Occupancy Prediction. 2746-2755 - Han Zou, Masanori Suganuma, Takayuki Okatani:
RefVSR++: Exploiting Reference Inputs for Reference-based Video Super-resolution. 2756-2765 - Aimon Rahman, Malsha V. Perera, Vishal M. Patel:
Frame by Familiar Frame: Understanding Replication in Video Diffusion Models. 2766-2776 - Gasser Elazab, Torben Gräber, Michael Unterreiner, Olaf Hellwich:
MonoPP: Metric-Scaled Self-Supervised Monocular Depth Estimation by Planar-Parallax Geometry in Automotive Applications. 2777-2787 - Alexandre Fournier-Montgieux, Michaël Soumm, Adrian Popescu, Bertrand Luvison, Hervé Le Borgne:
Fairer Analysis and Demographically Balanced Face Generation for Fairer Face Verification. 2788-2798 - Ziqiang Shi, Rujie Liu, Jun Takahashi, Takuma Yamamoto:
Bayesian Optimal Latent Projection for Noisy Image Restoration. 2799-2807 - Amartya Bhattacharya, Debarshi Brahma, Suraj Nagaje Mahadev, Anmol Asati, Vikas Verma, Soma Biswas:
Can Out-of-Domain Data Help to Learn Domain-Specific Prompts for Multimodal Misinformation Detection? 2808-2817 - Jiahui Li, Pourya Shamsolmoali, Yue Lu, Masoumeh Zareapoor:
ShapeMorph: 3D Shape Completion via Blockwise Discrete Diffusion. 2818-2827 - Inpyo Song, Sanghyeon Lee, Minjun Joo, Jangwon Lee:
Anomaly Detection for People with Visual Impairments Using an Egocentric 360-Degree Camera. 2828-2837 - Green Rosh K. S, Meghana Shankar, Prateek Kukreja, Anmol Namdev, B. H. Pawan Prasad:
XPose: Towards Extreme Low Light Hand Pose Estimation. 2838-2848 - Shaoxiong Zhang, Hiromitsu Awano, Takashi Sato:
Gaitcloud: Leveraging Spatial-Temporal Information for Lidar-Base Gait Recognition With a True-3D Gait Representation. 2849-2858 - Federico Nocentini, Claudio Ferrari, Stefano Berretti:
EmoVOCA: Speech-Driven Emotional 3D Talking Heads. 2859-2868 - Hugo Porta, Emanuele Dalsasso, Diego Marcos, Devis Tuia:
Multi-Scale Grouped Prototypes for Interpretable Semantic Segmentation. 2869-2880 - Aleksandr Matsun, Numan Saeed, Fadillah Adamsyah Maani, Mohammad Yaqub:
ConDiSR: Contrastive Disentanglement and Style Regularization for Single Domain Generalization. 2881-2889 - Vivek Madhavaram, Shivangana Rawat, Chaitanya Devaguptapu, Charu Sharma, Manohar Kaul:
Towards a Training Free Approach for 3D Scene Editing. 2890-2899 - Leonard Bruns, Jun Zhang, Patric Jensfelt:
Neural Graph Map: Dense Mapping with Efficient Loop Closure Integration. 2900-2909 - Julian Kaltheuner, Patrick Stotko, Reinhard Klein:
ROSA: Reconstructing Object Shape and Appearance Textures by Adaptive Detail Transfer. 2910-2920 - Hossein Resani, Behrooz Nasihatkon, Mohammadreza Alimoradi Jazi:
Continual Learning in 3D Point Clouds: Employing Spectral Techniques for Exemplar Selection. 2921-2931 - Sanjay S. J, Akash J, Sreehari Rajan, Dimple A. Shajahan, Charu Sharma:
Adversarial Learning Based Knowledge Distillation on 3D Point Clouds. 2932-2941 - Annie N. Wang, Luchao Qi, Roni Sengupta:
Continual Learning of Personalized Generative Face Models with Experience Replay. 2942-2951 - Jae Joong Lee, Bedrich Benes:
RGB2Point: 3D Point Cloud Generation from Single RGB Images. 2952-2962 - Thomas Walker, Octave Mariotti, Amir Vaxman, Hakan Bilen:
Spatially-Adaptive Hash Encodings for Neural Surface Reconstruction. 2963-2972 - Esmat Ghasemi Saghand, Susana K. Lai-Yuen:
MONAS-ESNN: Multi-Objective Neural Architecture Search for Efficient Spiking Neural Networks. 2963-2972 - Mingjiang Liang, Yongkang Cheng, Hualin Liang, Shaoli Huang, Wei Liu:
RopeTP: Global Human Motion Recovery via Integrating Robust Pose Estimation with Diffusion Trajectory Prior. 2973-2982 - Jiawei Liu, Wayne Lam, Zhigang Zhu, Hao Tang:
SMDAF: A Scalable Sidewalk Material Data Acquisition Framework with Bidirectional Cross-Modal Knowledge Distillation. 2983-2992 - Anvita A. Srinivas, Tuomas P. Oikarinen, Divyansh Srivastava, Wei-Hung Weng, Tsui-Wei Weng:
SAND: Enhancing Open-Set Neuron Descriptions through Spatial Awareness. 2993-3002 - Shreya Saha, Zekai Liang, Shan Lin, Jingpei Lu, Michael C. Yip, Sainan Liu:
BASED: Bundle-Adjusting Surgical Endoscopic Dynamic Video Reconstruction Using Neural Radiance Fields. 3003-3012 - Chuanmao Fan, Chenxi Zhao, Ye Duan:
PVT: An Implicit Surface Reconstruction Framework via Point Voxel Geometric-Aware Transformer. 3013-3023 - Katherine Xu, Lingzhi Zhang, Jianbo Shi:
Good Seed Makes a Good Crop: Discovering Secret Seeds in Text-to-Image Diffusion Models. 3024-3034 - Naga Venkata Sai Raviteja Chappa, Khoa Luu:
LiGAR: LiDAR-Guided Hierarchical Transformer for Multi-Modal Group Activity Recognition. 3035-3044 - Zhiyuan Gao, Wenbin Teng, Gonglin Chen, Jinsen Wu, Ningli Xu, Rongjun Qin, Andrew Feng, Yajie Zhao:
Skyeyes: Ground Roaming using Aerial View Images. 3045-3054 - Tingting Zhao, Chenguang Liu, Kamal Jnawali, Chang Su:
eLIR-Net: an Efficient AI Solution for Image Retouching. 3055-3063 - Haojie Cai, Dongfu Yin, Fei Richard Yu, Siting Xiong:
DSTR: Dual Scenes Transformer for Cross-Modal Fusion in 3D Object Detection. 3064-3073 - Wangduo Xie, Richard Schoonhoven, Tristan van Leeuwen, Matthew B. Blaschko:
AC-IND: Sparse CT Reconstruction Based on Attenuation Coefficient Estimation and Implicit Neural Distribution. 3074-3083 - Ziqi Gao, Wendi Yang, Yujia Li, Lei Xing, S. Kevin Zhou:
MS-Glance: Bio-Inspired Non-Semantic Context Vectors and Their Applications in Supervising Image Reconstruction. 3084-3095 - Ji Zhang, Yiran Ding, Zixin Liu:
OccLoff: Learning Optimized Feature Fusion for 3D Occupancy Prediction. 3096-3106 - Tung-Yu Wu, Sheng-Yu Huang, Yu-Chiang Frank Wang:
Data-Efficient 3D Visual Grounding via Order-Aware Referring. 3107-3117 - Brent Zoomers, Maarten Wijnants, Ivan Molenaers, Joni Vanherck, Jeroen Put, Nick Michiels:
PRoGS: Progressive Rendering of Gaussian Splats. 3118-3127 - Junjie Oscar Yin, Ting Li, Jiahao Wang, Yi Zhang, Alan L. Yuille:
EasyRet3D: Uncalibrated Multi-View Multi-Human 3D Reconstruction and Tracking. 3128-3137 - Jingtong Yue, Xin Lin, Zijiu Yang, Chao Ren:
Dual-Representation Interaction Driven Image Quality Assessment with Restoration Assistance. 3138-3147 - Chen Feng, Duolikun Danier, Fan Zhang, Alex Mackin, Andrew Collins, David Bull:
MVAD: A Multiple Visual Artifact Detector for Video Streaming. 3148-3158 - Katharina Bendig, René Schuster, Nicole Thiemer, Karen Joisten, Didier Stricker:
Supplementary Material AnonyNoise: Anonymizing Event Data with Smart Noise to Outsmart Re-Identification and Preserve Privacy. 3159-3161 - Jiahuan Li, Xiaoyu Dong, Wei He, Naoto Yokoya:
Wavelength- and Depth-Aware Deep Image Prior for Blind Hyperspectral Imagery Deblurring with Coarse Depth Guidance. 3162-3171 - Md Motiur Rahman, Mohamed Trabelsi, Hüseyin Uzunalioglu, Aidan Boyd:
Personalized Mixture of Experts for Multi-Site Medical Image Segmentation. 3172-3184 - Maor Dikter, Tsachi Blau, Chaim Baskin:
Conceptual Learning via Embedding Approximations for Reinforcing Interpretability and Transparency. 3185-3195 - Yilin Zheng, Chiang-Heng Chien, Ricardo Fabbri, Benjamin B. Kimia:
3D Edge Sketch from Multiview Images. 3196-3205 - Seonguk Seo, Dongwan Kim, Bohyung Han:
Revisiting Machine Unlearning with Dimensional Alignment. 3206-3215 - Arkadipta De, Vartika Sengar, Daksh Thapar, Mahesh Chandran, Manohar Kaul:
Elemental Composite Prototypical Network: Few-Shot Object Detection on Outdoor 3D Point Cloud Scenes. 3216-3226 - Nourhan Bayasi, Jamil Fayyad, Ghassan Hamarneh, Rafeef Garbi, Homayoun Najjaran:
Debiasify: Self-Distillation for Unsupervised Bias Mitigation. 3227-3236 - Haidong Wu, Snehal Bhayani, Janne Heikkilä:
A Conic Transformation Approach for Solving the Perspective-Three-Point Problem. 3237-3245 - Kunal Kathare, Ankit Dhiman, Vikas K. Gowda, Siddharth Aravindan, Shubham Monga, Basavaraja Shanthappa Vandrotti, Lokesh R. Boregowda:
Instructive3D: Editing Large Reconstruction Models with Text Instructions. 3246-3256 - Marco Garosi, Riccardo Tedoldi, Davide Boscaini, Massimiliano Mancini, Nicu Sebe, Fabio Poiesi:
3D Part Segmentation via Geometric Aggregation of 2D Visual Features. 3257-3267 - Kunal Chelani, Assia Benbihi, Torsten Sattler, Fredrik Kahl:
EdgeGaussians - 3D Edge Mapping via Gaussian Splatting. 3268-3279 - Haoran Wang, Nantheera Anantrasirichai, Fan Zhang, David Bull:
UW-GS: Distractor-Aware 3D Gaussian Splatting for Enhanced Underwater Scene Reconstruction. 3280-3289 - Mohammad Farazi, Yalin Wang:
A Recipe for Geometry-Aware 3D Mesh Transformers. 3290-3300 - Kurt H. W. Stolle:
Balancing Shared and Task-Specific Representations: A Hybrid Approach to Depth-Aware Video Panoptic Segmentation. 3301-3309 - Michal Byra, Henrik Skibbe:
Generating Visual Explanations from Deep Networks Using Implicit Neural Representations. 3310-3319 - Youpeng Wen, Yi Zhu, Zhihao Zhan, Pengzhen Ren, Jianhua Han, Hang Xu, Shen Zhao, Xiaodan Liang:
DisCo: Discovering Common Affordance from Large Models for Actionable Part Perception. 3320-3329 - Zhen Yao, Mooi Choo Chuah:
Event-Guided Low-Light Video Semantic Segmentation. 3330-3341 - Masahiro Yamaguchi, Takashi Shibata, Shoji Yachida, Keiko Yokoyama, Toshinori Hosoi:
MDCN-PS: Monocular-Depth-Guided Coarse Normal Attention for Robust Photometric Stereo. 3342-3351 - Eui Jun Hwang, Sukmin Cho, Huije Lee, Youngwoo Yoon, Jong C. Park:
A Spatio-Temporal Representation Learning as an Alternative to Traditional Glosses in Sign Language Translation and Production. 3352-3362 - Devendra Patel, Vikas Verma, Shreyas Kumar Tah, Shwetabh Biswas, Soma Biswas:
FRAUD-Net: Fraud News Detection Using Sample Uncertainty & Domain Aware Generalized Network. 3363-3371 - Priyanka Mishra, Nancy Mehta, Santosh Kumar Vipparthi, Subrahmanyam Murala:
USWformer: Efficient Sparse Wavelet Transformer for Underwater Image Enhancement. 3372-3382 - Arturo Miguel Russell Bernal, Jane Cleland-Huang, Walter J. Scheirer:
Psych-Occlusion: Using Visual Psychophysics for Aerial Detection of Occluded Persons During Search and Rescue. 3383-3395 - Yi Yang, Lei Zhong, Huiping Zhuang:
ReFu: Recursive Fusion for Exemplar-Free 3D Class-Incremental Learning. 3396-3405 - Juheon Son, Jang-Hwan Choi:
FMD: Comprehensive Data Compression in Medical Domain via Fused Matching Distillation. 3406-3415 - Rouqaiah Al-Refai, Philipp Hempel, Clara Biagi, Philipp Terhörst:
FALCON: Fair Face Recognition via Local Optimal Feature Normalization. 3416-3426 - Minh-Quan Le, Minh-Triet Tran, Trung-Nghia Le, Tam V. Nguyen, Thanh-Toan Do:
CamoFA: A Learnable Fourier-Based Augmentation for Camouflage Segmentation. 3427-3436 - Gianluca D'Amico, Federico Nesti, Giulio Rossolini, Mauro Marinoni, Salvatore Sabina, Giorgio C. Buttazzo:
SynDRA: Synthetic Dataset for Railway Applications. 3437-3446 - Abdul Mohaimen Al Radi, Prothito Shovon Majumder, Md. Mosaddek Khan:
Blind Image Deblurring with FFT-ReLU Sparsity Prior. 3447-3456 - Benjamin Coupry, Baptiste Brument, Antoine Laurent, Jean Mélou, Yvain Quéau, Jean-Denis Durou:
Assessing the Quality of 3D Reconstruction in the Absence of Ground Truth: Application to a Multimodal Archaeological Dataset. 3457-3466 - Gereziher Adhane, Mohammad Mahdi Dehshibi, Dennis Vetter, David Masip, Gemma Roig:
On Explaining Knowledge Distillation: Measuring and Visualising the Knowledge Transfer Process. 3467-3476 - Sebastian Janampa, Marios Pattichis:
DT-LSD: Deformable Transformer-Based Line Segment Detection. 3477-3486 - Marzieh Mohammadi, Amir Salarpour:
Point-GN: A Non-Parametric Network Using Gaussian Positional Encoding for Point Cloud Classification. 3487-3496 - Rohan Chacko, Nicolai Häni, Eldar Khaliullin, Lin Sun, Douglas Lee:
Lifting by Gaussians: A Simple, Fast and Flexible Method for 3D Instance Segmentation. 3497-3507 - Victor Rong, Jingxiang Chen, Sherwin Bahmani, Kiriakos N. Kutulakos, David B. Lindell:
GStex: Per-Primitive Texturing of 2D Gaussian Splatting for Decoupled Appearance and Geometry Modeling. 3508-3518 - Silvan Weder, Francis Engelmann, Johannes L. Schönberger, Akihito Seki, Marc Pollefeys, Martin R. Oswald:
ALSTER: A Local Spatio-Temporal Expert for Online 3D Semantic Reconstruction. 3519-3528 - Shyam Marjit, Harshit Singh, Nityanand Mathur, Sayak Paul, Chia-Mu Yu, Pin-Yu Chen:
DiffuseKronA: A Parameter Efficient Fine-tuning Method for Personalized Diffusion Models. 3529-3538 - Pengxiang Li, Kai Chen, Zhili Liu, Ruiyuan Gao, Lanqing Hong, Dit-Yan Yeung, Huchuan Lu, Xu Jia:
TrackDiffusion: Tracklet-Conditioned Video Generation via Diffusion Models. 3539-3548 - Yuan Zhang, Yutong Xie, Hu Wang, Jodie C. Avery, M. Louise Hull, Gustavo Carneiro:
A Novel Perspective for Multi-Modal Multi-Label Skin Lesion Classification. 3549-3558 - Youngjun Jun, Jiwoo Park, Kyobin Choo, Tae Eun Choi, Seong Jae Hwang:
Disentangling Disentangled Representations: Towards Improved Latent Units via Diffusion Models. 3559-3569 - Chengyin Li, Rafi Ibn Sultan, Prashant Khanduri, Yao Qiang, Chetty J. Indrin, Dongxiao Zhu:
AutoProSAM: Automated Prompting SAM for 3D Multi-Organ Segmentation. 3570-3580 - Chengyin Li, Hui Zhu, Rafi Ibn Sultan, Hassan Bagher-Ebadian, Prashant Khanduri, Chetty J. Indrin, Kundan Thind, Dongxiao Zhu:
MulModSeg: Enhancing Unpaired Multi-Modal Medical Image Segmentation with Modality-Conditioned Text Embedding and Alternating Training. 3581-3591 - Boqi Chen, Yuanzhi Zhu, Yunke Ao, Sebastiano Caprara, Reto Sutter, Gunnar Rätsch, Ender Konukoglu, Anna Susmelj:
Generalizable Single-Source Cross-Modality Medical Image Segmentation via Invariant Causal Mechanisms. 3592-3602 - Nikolas Adaloglou, Tim Kaiser, Felix Michels, Markus Kollmann:
Rethinking Cluster-Conditioned Diffusion Models for Label-Free Image Synthesis. 3603-3613 - Shwetha Ram, Tal Neiman, Qianli Feng, Andrew Stuart, Son Tran, Trishul Chilimbi:
DreamBlend: Advancing Personalized Fine-Tuning of Text-to-Image Diffusion Models. 3614-3623 - Delin An, Pengfei Gu, Milan Sonka, Chaoli Wang, Danny Z. Chen:
Sli2Vol+: Segmenting 3D Medical Images Based on an Object Estimation Guided Correspondence Flow Network. 3624-3634 - Jonghun Kim, Inye Na, Eun Sook Ko, Hyunjin Park:
Tumor Synthesis Conditioned on Radiomics. 3635-3646 - Nahid Ul Islam, Dongao Ma, Jiaxuan Pang, Shivasakthi Senthil Velan, Michael B. Gotway, Jianming Liang:
Foundation X: Integrating Classification, Localization, and Segmentation Through Lock-Release Pretraining Strategy for Chest X-Ray Analysis. 3647-3656 - Youyuan Zhang, Xuan Ju, James J. Clark:
FastVideoEdit: Leveraging Consistency Models for Efficient Text-to-Video Editing. 3657-3666 - Sharon Chokuwa, Muhammad Haris Khan:
Divergent Domains, Convergent Grading: Enhancing Generalization in Diabetic Retinopathy Grading. 3667-3677 - Zhi Xu, Shaozhe Hao, Kai Han:
CusConcept: Customized Visual Concept Decomposition with Diffusion Models. 3678-3687 - Benito Buchheim, Max Reimann, Jürgen Döllner:
Controlling Human Shape and Pose in Text-to-Image Diffusion Models via Domain Adaptation. 3688-3697 - Hsin-Ping Huang, Yu-Chuan Su, Deqing Sun, Lu Jiang, Xuhui Jia, Yukun Zhu, Ming-Hsuan Yang:
Fine-grained Controllable Video Generation via Object Appearance and Context. 3698-3708 - Hsin-Ping Huang, Yu-Chuan Su, Ming-Hsuan Yang:
Generating Long-Take Videos via Effective Keyframes and Guidance. 3709-3720 - Rishubh Parihar, Prasanna Balaji, Raghav Magazine, Sarthak Vora, Varun Jampani, R. Venkatesh Babu:
Attribute Diffusion: Diffusion Driven Diverse Attribute Editing. 3721-3731 - Ming Kang, Fung Fung Ting, Raphaël C.-W. Phan, Chee-Ming Ting:
PK-YOLO: Pretrained Knowledge Guided YOLO for Brain Tumor Detection in Multiplanar MRI Slices. 3732-3741 - Taewoo Kim, Geonsu Lee, Hyukgi Lee, Seongtae Kim, Younggun Lee:
PixSwap: High-Resolution Face Swapping for Effective Reflection of Identity via Pixel-Level Supervision with Synthetic Paired Dataset. 3742-3751 - Niklas Babendererde, Haozhe Zhu, Moritz Fuchs, Jonathan Stieber, Anirban Mukhopadhyay:
Federated-Continual Dynamic Segmentation of Histopathology Guided by Barlow Continuity. 3752-3761 - Yannik Frisch, Christina Bornberg, Moritz Fuchs, Anirban Mukhopadhyay:
GAUDA: Generative Adaptive Uncertainty-Guided Diffusion-Based Augmentation for Surgical Segmentation. 3762-3771 - Zhongpai Gao, Abhishek Sharma, Meng Zheng, Benjamin Planche, Terrence Chen, Ziyan Wu:
Automated Patient Positioning with Learned 3D Hand Gestures. 3772-3781 - Xingzhe He, Zhiwen Cao, Nicholas I. Kolkin, Lantao Yu, Kun Wan, Helge Rhodin, Ratheesh Kalarot:
A Data Perspective on Enhanced Identity Preservation for Diffusion Personalization. 3782-3791 - Kangfu Mei, Nithin Gopalakrishnan Nair, Vishal M. Patel:
Improving Conditional Diffusion Models through Re-Noising from Unconditional Diffusion Priors. 3792-3801 - Mario Wieser, Daniel Siegismund, Stephan Steigele:
Revisiting Deep Archetypal Analysis for Phenotype Discovery in High Content Imaging. 3802-3811 - Zhongrui Yu, Haoran Wang, Jinze Yang, Hanzhang Wang, Jiale Cao, Zhong Ji, Mingming Sun:
SGD: Street View Synthesis with Gaussian Splatting and Diffusion Prior. 3812-3822 - Ziyu Zhou, Haozhe Luo, Mohammad Reza Hosseinzadeh Taher, Jiaxuan Pang, Xiaowei Ding, Michael B. Gotway, Jianming Liang:
ACE: Anatomically Consistent Embeddings in Composition and Decomposition. 3823-3833 - Amin Ranem, John Kalkhof, Anirban Mukhopadhyay:
NCAdapt: Dynamic Adaptation with Domain-Specific Neural Cellular Automata for Continual Hippocampus Segmentation. 3834-3843 - Michele De Vita, Vasileios Belagiannis:
Diffusion Model Guided Sampling with Pixel-Wise Aleatoric Uncertainty Estimation. 3844-3854 - Ruyu Wang, Xuefeng Hou, Sabrina Schmedding, Marco F. Huber:
STAY Diffusion: Styled Layout Diffusion Model for Diverse Layout-to-Image Generation. 3855-3865 - Abdullah Al Rahat, Hemanth Venkateswara:
Dataset Augmentation by Mixing Visual Concepts. 3866-3875 - Chentianye Xu, Xueying Zhan, Min Xu:
CryoMAE: Few-Shot Cryo-EM Particle Picking with Masked Autoencoders. 3876-3885 - Roberto Di Via, Francesca Odone, Vito Paolo Pastore:
Self-Supervised Pre-Training with Diffusion Model for Few-Shot Landmark Detection in X-Ray Images. 3886-3896 - Ziyang Zheng, Ruiyuan Gao, Qiang Xu:
Non-Cross Diffusion for Semantic Consistency. 3897-3906 - Aiman Farooq, Deepak Mishra, Santanu Chaudhury:
Survival Prediction in Lung Cancer through Multi-Modal Representation Learning. 3907-3915 - Zakaria Patel, Kirill Serkh:
Enhancing Image Layout Control with Loss-Guided Diffusion Models. 3916-3924 - Zhenyue Qin, Yiqun Zhang, Yang Liu, Dylan Campbell:
HandCraft: Anatomically Correct Restoration of Malformed Hands in Diffusion Generated Images. 3925-3933 - Xin Jiang, Junwei Zheng, Ruiping Liu, Jiahang Li, Jiaming Zhang, Sven Matthiesen, Rainer Stiefelhagen:
@BENCH: Benchmarking Vision-Language Models for Human-centered Assistive Technology. 3934-3943 - Haoning Wu, Shaocheng Shen, Qiang Hu, Xiaoyun Zhang, Ya Zhang, Yanfeng Wang:
MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning. 3944-3953 - Abhishek Kumar Sinha, S. Manthira Moorthi:
CharDiff: Improving Sampling Convergence via Characteristic Function Consistency in Diffusion Models. 3955-3964 - Anuja Vats, Ivar Farup, Marius Pedersen, Kiran B. Raja:
Uncertainty-Aware Regularization for Image-to-Image Translation. 3965-3974 - Hongsuk Choi, Isaac Kasahara, Selim Engin, Moritz A. Graule, Nikhil Chavan Dafle, Volkan Isler:
FineControlNet: Fine-level Text Control for Image Generation with Spatially Aligned Text Control Injection. 3975-3984 - Souhaib Attaiki, Paul Guerrero, Duygu Ceylan, Niloy J. Mitra, Maks Ovsjanikov:
GANFusion: Feed-Forward Text-to-3D with Diffusion in GAN Space. 3985-3995 - Lucas N. Kirsten, Angelo Angonezi, Jose Marques, Fernanda Oliveira, Juliano Faccioni, Camila Cassel, Débora Santos de Sousa, Samlai Vedovatto, Guido Lenz, Cláudio R. Jung:
Oriented Cell Dataset: A Dataset and Benchmark for Oriented Cell Detection and Applications. 3996-4005 - Jinlin Xiang, Hillol Sarker, Bozhao Qi, Ruisu Zhang, Roger Trullo, Salvatore Badalamenti, Maria Wiekowski, Annie Kruger, Etienne Pochet, Qi Tang, Wei Zhao:
Endoscopic Scoring and Localization in Unconstrained Clinical Trial Videos. 4006-4015 - Vamsi Krishna Vasa, Peijie Qiu, Wenhui Zhu, Yujian Xiong, Oana M. Dumitrascu, Yalin Wang:
Context-Aware Optimal Transport Learning for Retinal Fundus Image Enhancement. 4016-4025 - Libing Zeng, Nima Khademi Kalantari:
Analyzing and Improving the Skin Tone Consistency and Bias in Implicit 3D Relightable Face Generators. 4026-4035 - Sheng Zhang, Jinge Wu, Junzhi Ning, Guang Yang:
DMRN: A Dynamical Multi-Order Response Network for the Robust Lung Airway Segmentation. 4036-4045 - Shahzad Ahmad, Sania Bano, Sukalpa Chanda, Santosh Kumar Vipparthi, Subrahmanyam Murala:
TRUST: Time-Domain Residual Unsupervised Stability Technique for Improved Heart Rate Estimation. 4046-4055 - Yoni Gozlan, Antoine Falisse, Scott D. Uhlrich, Anthony A. Gatti, Michael Black, Jennifer L. Hicks, Scott L. Delp, Akshay Chaudhari:
OpenCapBench: A Benchmark to Bridge Pose Estimation and Biomechanics. 4056-4065 - Jianyi Zhang, Hao Yang, Ang Li, Xin Guo, Pu Wang, Haiming Wang, Yiran Chen, Hai Li:
MLLM-LLaVA-FL: Multimodal Large Language Model Assisted Federated Learning. 4066-4076 - Danfeng Guo, Sanchit Agarwal, Yu-Hsiang Lin, Jiun-Yu Kao, Tagyoung Chung, Nanyun Peng, Mohit Bansal:
Improving Faithfulness of Text-to-Image Diffusion Models through Inference Intervention. 4077-4086 - Idan Kligvasser, Regev Cohen, George Leifman, Ehud Rivlin, Michael Elad:
Anchored Diffusion for Video Face Reenactment. 4087-4097 - Youssof Nawar, Nouran Soliman, Moustafa Wassel, Mohamed ElHabebe, Noha Adly, Marwan Torki, Ahmed Elmassry, Islam Ahmed:
DiffuPT: Class Imbalance Mitigation for Glaucoma Detection via Diffusion Based Generation and Model Pretraining. 4098-4107 - Zoltán Ádám Milacski, Koichiro Niinuma, Ryosuke Kawamura, Fernando De la Torre, László A. Jeni:
GHOST: Grounded Human Motion Generation with Open Vocabulary Scene-and-Text Contexts. 4108-4118 - Naveen Karunanayake, Suranga Seneviratne, Sanjay Chawla:
CRAFT: Class Ranking Aware Fine-Tuning for Enhanced Out-of-Distribution Detection. 4119-4128 - Fatemeh Haghighi, Michael B. Gotway, Jianming Liang:
Learning Anatomy-Disease Entangled Representation. 4129-4141 - Yilmaz Korkmaz, Vishal M. Patel:
MambaRecon: MRI Reconstruction with Structured State Space Models. 4142-4152 - Sai Bharath Chandra Gutha, Ricardo Vinuesa, Hossein Azizpour:
Inverse Problems with Diffusion Models: A MAP Estimation Perspective. 4153-4162 - Steven Hogue, Chenxu Zhang, Yapeng Tian, Xiaohu Guo:
Joint Co-Speech Gesture and Expressive Talking Face Generation Using Diffusion with Adapters. 4163-4172 - Fazle Rahat, M. Shifat Hossain, Md Rubel Ahmed, Sumit Kumar Jha, Rickard Ewetz:
Data Augmentation for Image Classification Using Generative AI. 4173-4182 - Qianwen Lu, Xingchao Yang, Takafumi Taketomi:
BeautyBank: Encoding Facial Makeup in Latent Space. 4183-4193 - Trung Dinh Quoc Dang, Huy Hoang Nguyen, Aleksei Tiulpin:
Image-Level Regression for Uncertainty-Aware Retinal Image Segmentation. 4194-4204 - Remi Chierchia, Léo Lebrat, David Ahmedt-Aristizabal, Olivier Salvado, Clinton Fookes, Rodrigo Santa Cruz:
SALVE: A 3D Reconstruction Benchmark of Wounds from Consumer-Grade Videos. 4205-4214 - Haeil Lee, Hansang Lee, Seoyeon Gye, Junmo Kim:
Beta Sampling is All You Need: Efficient Image Generation Strategy for Diffusion Models Using Stepwise Spectral Analysis. 4215-4224 - Chun-Hong Cheng, Jing Wei Chin, Kwan Long Wong, Tsz Tai Chan, Hau Ching Lo, Kwan Lok Pang, Richard Hau Yue So, Bryan Yan:
Remote Blood Pressure Estimation from Facial Videos Using Transfer Learning: Leveraging PPG to rPPG Conversion. 4225-4236 - Ali Karami, Thi Kieu Khanh Ho, Narges Armanfard:
Graph-Jigsaw Conditioned Diffusion Model for Skeleton-Based Video Anomaly Detection. 4237-4247 - Tawsifur Rahman, Alexander S. Baras, Rama Chellappa:
CEMIL: Contextual Attention Based Efficient Weakly Supervised Approach for Histopathology Image Classification. 4248-4257 - Rasel Ahmed Bhuiyan, Adam Czajka:
Forensic Iris Image-Based Post-Mortem Interval Estimation. 4258-4267 - Sabina Martyniak, Joanna Kaleta, Diego Dall'Alba, Michal Naskret, Szymon Plotka, Przemyslaw Korzeniowski:
SimuScope: Realistic Endoscopic Synthetic Dataset Generation Through Surgical Simulation and Diffusion Models. 4268-4278 - Tonmoy Hossain, Jing Ma, Jundong Li, Miaomiao Zhang:
Invariant Shape Representation Learning for Image Classification. 4279-4289 - Kaito Shiku, Kazuya Nishimura, Daiki Suehiro, Kiyohito Tanaka, Ryoma Bise:
Ordinal Multiple-instance Learning for Ulcerative Colitis Severity Estimation with Selective Aggregated Transformer. 4290-4299 - Koushik Biswas, Amit Reza, Meghana Karri, Debesh Jha, Hongyi Pan, Nikhil Kumar Tomar, Aliza Subedi, Smriti Regmi, Ulas Bagci:
Optimizing Neural Network Effectiveness via Non-monotonicity Refinement. 4300-4309 - Justin Theiss, Norman Müller, Daeil Kim, Aayush Prakash:
Multi-View Image Diffusion via Coordinate Noise and Fourier Attention. 4310-4319 - Pamela Osuna-Vargas, Maren H. Wehrheim, Lucas Zinz, Johanna V. Rahm, Ashwin Balakrishnan, Alexandra Kaminer, Mike Heilemann, Matthias Kaschube:
Denoising Diffusion Models for High-Resolution Microscopy Image Restoration. 4320-4330 - Utkarsh Nath, Rajeev Goel, Eun Som Jeon, Changhoon Kim, Kyle Min, Yezhou Yang, Yingzhen Yang, Pavan K. Turaga:
Deep Geometric Moments Promote Shape Consistency in Text-to-3D Generation. 4331-4341 - Juhyung Ha, Jong Sung Park, David Crandall, Eleftherios Garyfallidis, Xuhong Zhang:
Multi-Resolution Guided 3D GANs for Medical Image Translation. 4342-4351 - Muhammad Sohaib, Siyavash Shabani, Sahar A. Mohammed, Garrett Winkelmaier, Bahram Parvin:
Multi-Aperture Transformers for 3D (MAT3D) Segmentation of Clinical and Microscopic Images. 4352-4361 - Joy Dhar, Nayyar Zaidi, Maryam Haghighat, Sudipta Roy, Puneet Goyal, Azadeh Alavi, Vikas Kumar:
Multimodal Fusion Learning with Dual Attention for Medical Imaging. 4362-4371 - Sanyam Lakhanpal, Shivang Chopra, Vinija Jain, Aman Chadha, Man Luo:
Refining Text-to-Image Generation: Towards Accurate Training-Free Glyph-Enhanced Image Generation. 4372-4381 - Man Minh Ho, Shikha Dubey, Yosep Chong, Beatrice Knudsen, Tolga Tasdizen:
F2FLDM: Latent Diffusion Models with Histopathology Pre-Trained Embeddings for Unpaired Frozen Section to FFPE Translation. 4382-4391 - Vaibhav Ganatra, Siddhartha Gairola, Pallavi Joshi, Anand Balasubramaniam, Kaushik Murali, Arivunithi Varadharajan, Bellamkonda Mallikarjuna, Nipun Kwatra, Mohit Jain:
SmartKC++: Improving Performance of Smartphone-Based Corneal Topographers. 4392-4399 - Kai Wang, Fei Yang, Bogdan Raducanu, Joost van de Weijer:
Multi-Class Textual-Inversion Secretly Yields a Semantic-Agnostic Classifier. 4400-4409 - Antoine P. Sanner, Jonathan Stieber, Nils F. Grauhan, Suam Kim, Marc A. Brockmann, Ahmed E. Othman, Anirban Mukhopadhyay:
Federated Voxel Scene Graph for Intracranial Hemorrhage. 4410-4419 - Wenyi Mo, Tianyu Zhang, Yalong Bai, Bing Su, Ji-Rong Wen:
Uniform Attention Maps: Boosting Image Fidelity in Reconstruction and Editing. 4420-4429 - Pengfei Guo, Can Zhao, Dong Yang, Ziyue Xu, Vishwesh Nath, Yucheng Tang, Benjamin Simon, Mason Belue, Stephanie A. Harmon, Baris Turkbey, Daguang Xu:
MAISI: Medical AI for Synthetic Imaging. 4430-4441 - Sebastian Thiele, Jacqueline Kockwelp, Joachim Wistuba, Sabine Kliesch, Jörg Gromoll, Benjamin Risse:
Investigating Imaging, Annotation and Self-Supervision for the Classification of Continuously Developing Cells in Histological Whole Slide Images. 4442-4451 - Qiwen Deng, Yangcen Liu:
Structure-Aware Human Body Reshaping with Adaptive Affinity-Graph Network. 4452-4461 - Xiaoyang Wei, Camille Kurtz, Florence Cloppet:
Relaxing Binary Constraints in Contrastive Vision-Language Medical Representation Learning. 4462-4471 - Hyunsoo Lee, Minsoo Kang, Bohyung Han:
Diffusion-Based Conditional Image Editing Through Optimized Inference with Guidance. 4472-4480 - Ciprian A. Corneanu, Qianli Feng, Aleix M. Martínez:
Structured Human Assessment of Text-to-Image Generative Models. 4481-4490 - Raman Dutt, Ondrej Bohdal, Pedro Sanchez, Sotirios A. Tsaftaris, Timothy M. Hospedales:
MemControl: Mitigating Memorization in Diffusion Models via Automated Parameter Selection. 4491-4501 - Xuanzhao Dong, Vamsi Krishna Vasa, Wenhui Zhu, Peijie Qiu, Xiwen Chen, Yi Su, Yujian Xiong, Zhangsihao Yang, Yanxi Chen, Yalin Wang:
CUNSB-RFIE: Context-Aware Unpaired Neural Schrödinger Bridge in Retinal Fundus Image Enhancement. 4502-4511 - Shuhan Xiao, Lukas Klein, Jens Petersen, Philipp Vollmuth, Paul F. Jaeger, Klaus H. Maier-Hein:
Enhancing Predictive Imaging Biomarker Discovery Through Treatment Effect Analysis. 4512-4522 - Yan Zeng, Masanori Suganuma, Takayuki Okatani:
Inverting the Generation Process of Denoising Diffusion Implicit Models: Empirical Evaluation and a Novel Method. 4516-4524 - Chaewon Kim, Seung Jun Moon, Gyeong-Moon Park:
WINE: Wavelet-Guided GAN Inversion and Editing for High-Fidelity Refinement. 4523-4532 - Mingyu Sheng, Jianan Fan, Dongnan Liu, Ron Kikinis, Weidong Cai:
AMNCutter: Affinity-Attention-Guided Multi-View Normalized Cutter for Unsupervised Surgical Instrument Segmentation. 4533-4544 - Kenta Horikawa, Mariko Isogawa, Hideo Saito, Shohei Mori:
Dense Depth from Event Focal Stack. 4545-4553 - Xulin Fan, Heting Gao, Ziyi Chen, Peng Chang, Mei Han, Mark Hasegawa-Johnson:
SyncDiff: Diffusion-Based Talking Head Synthesis with Bottlenecked Temporal Visual Prior for Improved Synchronization. 4554-4563 - Sai Shashank Kalakonda, Shubh Maheshwari, Ravi Kiran Sarvadevabhatla:
Morag - Multi-Fusion Retrieval Augmented Generation for Human Motion. 4564-4573 - Shahzad Ahmad, Sania Bano, Sachin Verma, Yogesh Singh Rawat, Sukalpa Chanda, Santosh Kumar Vipparthi, Subrahmanyam Murala:
PULSE: Physiological Understanding with Liquid Signal Extraction. 4574-4584 - Xindi Wu, Uriel Singer, Zhaojiang Lin, Andrea Madotto, Xide Xia, Yifan Xu, Paul A. Crook, Xin Luna Dong, Seungwhan Moon:
Corgi: Cached Memory Guided Video Generation. 4585-4594 - Sungkyu Yang, Woohyun Park, Kwangil Yim, Mansu Kim:
MFTrans: A Multi-Resolution Fusion Transformer for Robust Tumor Segmentation in Whole Slide Images. 4595-4605 - Zhenyuan Dong, Sai Qian Zhang:
DiTAS: Quantizing Diffusion Transformers via Enhanced Activation Smoothing. 4606-4615 - Zhuoyi Yang, Liyue Shen:
TempA-VLP: Temporal-Aware Vision-Language Pretraining for Longitudinal Exploration in Chest X-Ray Image. 4625-4634 - Fang-Yi Su, Tzu-Hung Chang, Jung-Hsien Chiang:
DiffuCE: Expert-Level CBCT Image Enhancement Using a Novel Conditional Denoising Diffusion Model with Latent Alignment. 4635-4644 - Vasco Ramos, Yonatan Bitton, Michal Yarom, Idan Szpektor, João Magalhães:
Contrastive Sequential-Diffusion Learning: Non-Linear and Multi-Scene Instructional Video Synthesis. 4645-4654 - Tapas Kumar Dutta, Snehashis Majhi, Deepak Ranjan Nayak, Debesh Jha:
SAM-Mamba: Mamba Guided SAM Architecture for Generalized Zero-Shot Polyp Segmentation. 4655-4664 - Xinxi Zhang, Song Wen, Ligong Han, Felix Juefei-Xu, Akash Srivastava, Junzhou Huang, Vladimir Pavlovic, Hao Wang, Molei Tao, Dimitris N. Metaxas:
SODA: Spectral Orthogonal Decomposition Adaptation for Diffusion Models. 4665-4682 - Gurucharan Marthi Krishna Kumar, Janine D. Mendola, Amir Shmuel:
Nestedmorph: Enhancing Deformable Medical Image Registration With Nested Attention Mechanisms. 4683-4692 - Yaoxin Zhuo, Zachary Bessinger, Lichen Wang, Naji Khosravan, Baoxin Li, Sing Bing Kang:
TFM2: Training-Free Mask Matching for Open-Vocabulary Semantic Segmentation. 4693-4703 - Marvin Burges, Sebastian Zambanini, Robert Sablatnig:
Interactive Object Detection for Tiny Objects in Large Remotely Sensed Images. 4704-4713 - Jingchen Sun, Rohan Sharma, Vishnu Suresh Lokhande, Changyou Chen:
Cross-Modal Feature Alignment and MMD Improve Robustness of Prompt Tuning. 4714-4724 - Yicheng Wang, Zhikang Zhang, Jue Wang, David Fan, Zhenlin Xu, Linda Liu, Xiang Hao, Vimal Bhat, Xinyu Li:
GEXIA: Granularity Expansion and Iterative Approximation for Scalable Multi-Grained Video-Language Learning. 4725-4735 - Sanggeon Yun, Ryozo Masukawa, Minhyoung Na, Mohsen Imani:
Missiongnn: Hierarchical Multimodal GNN-Based Weakly Supervised Video Anomaly Recognition with Mission-Specific Knowledge Graph Generation. 4736-4745 - Wendi Yang, Zihang Jiang, Shang Zhao, S. Kevin Zhou:
PostoMETRO: Pose Token Enhanced Mesh Transformer for Robust 3D Human Mesh Recovery. 4746-4756 - Ayush Gupta, Rama Chellappa:
MimicGait: A Model Agnostic approach for Occluded Gait Recognition Using Correlational Knowledge Distillation. 4757-4766 - Ekin Celikkan, Timo Kunzmann, Yertay Yeskaliyev, Sibylle Itzerott, Nadja Klein, Martin Herold:
WeedsGalore: A Multispectral and Multitemporal UAV-Based Dataset for Crop and Weed Segmentation in Agricultural Maize Fields. 4767-4777 - Percy Lam, Sooyong Park, Weiwei Chen, Lavindra de Silva, Ioannis K. Brilakis:
CRAAC: Consistency Regularised Active Learning with Automatic Corrections for Real-Life Road Image Annotations. 4778-4787 - Sina Malakouti, Aysan Aghazadeh, Ashmit Khandelwal, Adriana Kovashka:
Benchmarking VLMs' Reasoning About Persuasive Atypical Images. 4788-4798 - I-Ting Tsai, Bharath Hariharan:
3D Synthesis for Architectural Design. 4799-4809 - Yan Yang, Utpal Bose, James Broadbent, Sally Stockwell, Keren Byrne, Md. Zakir Hossain, Eric A. Stone, Shannon Dillon:
Flowering Time Prediction of Wheat From DIA-MS Data. 4810-4820 - Xingjian Diao, Ming Cheng, Wayner Barrios, SouYoung Jin:
FT2TF: First-Person Statement Text-to-Talking Face Generation. 4821-4830 - Mayssa Zaier, Hazem Wannous, Hassen Drira:
Geometry-Aware Deep Learning for 3D Skeleton-Based Motion Prediction. 4831-4840 - Sanjana Sinha, Brojeshwar Bhowmick, Lokender Tiwari, Sushovan Chanda:
DisFlowEm : One-Shot Emotional Talking Head Generation Using Disentangled Pose and Expression Flow-Guidance. 4841-4851 - Sombit Dey, Ozan Unal, Christos Sakaridis, Luc Van Gool:
Fine-Grained Spatial and Verbal Losses for 3D Visual Grounding. 4852-4861 - Xiaoyu Xiang, Liat Sless Gorelik, Yuchen Fan, Omri Armstrong, Forrest N. Iandola, Yilei Li, Ita Lifshitz, Rakesh Ranjan:
Make-A-Texture: Fast Shape-Aware Texture Generation in 3 Seconds. 4872-4881 - Aishwarya Agarwal, Srikrishna Karanam, Balaji Vasan Srinivasan:
AlignIT: Enhancing Prompt Alignment in Customization of Text-to-Image Models. 4882-4890 - Ce Zheng, Xianpeng Liu, Qucheng Peng, Tianfu Wu, Pu Wang, Chen Chen:
DiffMesh: A Motion-Aware Diffusion Framework for Human Mesh Recovery from Videos. 4891-4901 - Bardia Safaei, Vishal M. Patel:
Active Learning for Vision-Language Models. 4902-4912 - Yoshitomo Matsubara, Matteo Mendula, Marco Levorato:
A Multi-Task Supervised Compression Model for Split Computing. 4913-4922 - Aashish Rai, Srinath Sridhar:
EgoSonics: Generating Synchronized Audio for Silent Egocentric Videos. 4935-4946 - Risako Tanigawa, Kenji Ishikawa, Noboru Harada, Yasuhiro Oikawa:
SoundSil-DS: Deep Denoising and Segmentation of Sound-field Images with Silhouettes. 4947-4956 - Bingqing Zhang, Zhuo Cao, Heming Du, Xin Yu, Xue Li, Jiajun Liu, Sen Wang:
TokenBinder: Text-Video Retrieval with One-to-Many Alignment Paradigm. 4957-4967 - Vittorio Pipoli, Federico Bolelli, Sara Sarto, Marcella Cornia, Lorenzo Baraldi, Costantino Grana, Rita Cucchiara, Elisa Ficarra:
Semantically Conditioned Prompts for Visual Recognition Under Missing Modality Scenarios. 4968-4977 - Shubham Agarwal, Raz Birman, Ofer Hadar:
WARLearn: Weather-Adaptive Representation Learning. 4978-4987 - Hai Nguyen-Truong, E-Ro Nguyen, Tuan-Anh Vu, Minh-Triet Tran, Binh-Son Hua, Sai-Kit Yeung:
Vision-Aware Text Features in Referring Image Segmentation: From Object Understanding to Context Understanding. 4988-4998 - Tetsushi Yamada, Simone Di Santo:
Partial Texture VAE: Color and Texture Encoder for Rock Particle Images. 4999-5008 - Pramook Khungurn:
Talking Head Anime 4: Distillation for Real-Time Performance. 5018-5029 - Anh-Quan Cao, Maximilian Jaritz, Matthieu Guillaumin, Raoul de Charette, Loris Bazzani:
LATTECLIP: Unsupervised CLIP Fine-Tuning via LMM-Synthetic Texts. 5030-5040 - Denys Rozumnyi, Nadine Bertsch, Othman Sbai, Filippo Arcadu, Yuhua Chen, Artsiom Sanakoyeu, Manoj Kumar, Catherine Herold, Robin Kips:
XR-MBT: Multi-Modal Full Body Tracking for XR Through Self-Supervision with Learned Depth Point Cloud Registration. 5041-5050 - Stefanos-Iordanis Papadopoulos, Christos Koutlis, Symeon Papadopoulos, Panagiotis C. Petrantonakis:
Similarity Over Factuality: Are we Making Progress on Multimodal Out-of-Context Misinformation Detection? 5041-5050 - Yanan Niu, Roy Sarkis, Demetri Psaltis, Mario Paolone, Christophe Moser, Luisa Lambertini:
Solar Multimodal Transformer: Intraday Solar Irradiance Predictor Using Public Cameras and Time Series. 5051-5060 - Sina Hajimiri, Ismail Ben Ayed, Jose Dolz:
Pay Attention to Your Neighbours: Training-Free Open-Vocabulary Semantic Segmentation. 5061-5071 - Xin Ye, Feng Tao, Abhirup Mallik, Burhaneddin Yaman, Liu Ren:
LORD: Large Models Based Opposite Reward Design for Autonomous Driving. 5072-5081 - Md Mahedi Hasan, Shoaib Meraj Sami, Nasser M. Nasrabadi:
CLFace: A Scalable and Resource-Efficient Continual Learning Framework for Lifelong Face Recognition. 5082-5091 - Dingkun Yan, Liang Yuan, Erwin Wu, Yuma Nishioka, Issei Fujishiro, Suguru Saito:
ColorizeDiffusion: Improving Reference-Based Sketch Colorization with Latent Diffusion Model. 5092-5102 - Lior Dikstein, Ariel Lapid, Arnon Netzer, Hai Victor Habi:
Data Generation for Hardware-Friendly Post-Training Quantization. 5103-5113 - Bo Lang, Mooi Choo Chuah:
Event-Guided Video Transformer for End-to-End 3D Human Pose Estimation. 5114-5124 - Wele Gedara Chaminda Bandara, Vishal M. Patel:
Deep Metric Learning for Unsupervised Remote Sensing Change Detection. 5125-5135 - Xuanchen Wang, Heng Wang, Dongnan Liu, Weidong Cai:
Dance any Beat: Blending Beats with Visuals in Dance Video Generation. 5136-5146 - Valentin Bieri, Marco Zamboni, Nicolas S. Blumer, Qingxuan Chen, Francis Engelmann:
OpenCity3D: What do Vision-Language Models Know About Urban Environments? 5147-5155 - Abid Ali, Rui Dai, Ashish Marisetty, Guillaume Astruc, Monique Thonnat, Jean-Marc Odobez, Susanne Thümmler, François Brémond:
Loose Social-Interaction Recognition in Real-World Therapy Scenarios. 5156-5165 - Julius Pesonen, Teemu Hakala, Väinö Karjalainen, Niko Koivumäki, Lauri Markelin, Anna-Maria Raita-Hakola, Juha Suomalainen, Ilkka Pölönen, Eija Honkavaara:
Detecting Wildfires on UAVs with Real-Time Segmentation Trained by Larger Teacher Models. 5166-5176 - Ying Shen, Daniel Bis, Cynthia Lu, Ismini Lourentzou:
ELBA: Learning by Asking for Embodied Visual Navigation and Task Completion. 5177-5186 - Tim Dieter Eberhardt, Tim Brühl, Robin Schwager, Tin Stribor Sohn, Wilhelm Stork:
Clarity Amidst Blur: A Deterministic Method for Synthetic Generation of Water Droplets on Camera Lenses. 5187-5196 - Siddharth Seth, Rishabh Dabral, Diogo C. Luvizon, Marc Habermann, Ming-Hsuan Yang, Christian Theobalt, Adam Kortylewski:
PocoLoco: A Point Cloud Diffusion Model of Human Shape in Loose Clothing. 5197-5206 - Hanyuan Xiao, Yingshu Chen, Huajian Huang, Haolin Xiong, Jing Yang, Pratusha Prasad, Yajie Zhao:
Localized Gaussian Splatting Editing with Contextual Awareness. 5207-5217 - Doyoung Park, Naresh Reddy Yarram, Sunjin Kim, Minkyu Kim, Seongho Cho, Taehee Lee:
Text Change Detection in Multilingual Documents Using Image Comparison. 5218-5227 - Zihao Zou, Jiaming Liu, Shirin Shoushtari, Yubo Wang, Ulugbek S. Kamilov:
FLAIR: A Conditional Diffusion Framework with Applications to Face Video Restoration. 5228-5238 - Wenjun Huang, Yang Ni, Arghavan Rezvani, Sungheon Jeong, Hanning Chen, Yezi Liu, Fei Wen, Mohsen Imani:
Recoverable Anonymization for Pose Estimation: A Privacy-Enhancing Approach. 5239-5249 - Wele Gedara Chaminda Bandara, Nithin Gopalakrishnan Nair, Vishal M. Patel:
DDPM-CD: Denoising Diffusion Probabilistic Models as Feature Extractors for Remote Sensing Change Detection. 5250-5262 - Yusuke Akamatsu, Terumi Umematsu, Hitoshi Imaoka, Shizuko Gomi, Hideo Tsurushima:
ComFace: Facial Representation Learning with Synthetic Data for Comparing Faces. 5263-5273 - Hankyeol Lee, Gawon Seo, Wonseok Choi, Geunyoung Jung, Kyungwoo Song, Jiyoung Jung:
Enhancing Visual Classification Using Comparative Descriptors. 5274-5283 - Luca Collorone, Stefano D'Arrigo, Massimiliano Pappa, Guido Maria D'Amely di Melendugno, Giovanni Ficarra, Fabio Galasso:
ANTHROPOS-V: Benchmarking the Novel Task of Crowd Volume Estimation. 5284-5294 - Raquel Panadero, Dominik Schörkhuber, Margrit Gelautz:
Importance-Guided Interpretability and Pruning for Video Transformers in Driver Action Recognition. 5295-5304 - Puneet Kumar, Shreshtha Misra, Zhuhong Shao, Bin Zhu, Balasubramanian Raman, Xiaobai Li:
Multimodal Interpretable Depression Analysis Using Visual, Physiological, Audio and Textual Data. 5305-5315 - Anudeep Vurity, Emanuela Marasco, Raghavendra Ramachandra, Jongwoo Park:
ColFigPhotoAttnNet: Reliable Finger Photo Presentation Attack Detection Leveraging Window-Attention on Color Spaces. 5316-5325 - Zhao-Yang Wang, Jiang Liu, Jieneng Chen, Rama Chellappa:
VM-Gait: Multi-Modal 3D Representation Based on Virtual Marker for Gait Recognition. 5326-5335 - Kevin Flanagan, Dima Damen, Michael Wray:
Moment of Untruth: Dealing with Negative Queries in Video Moment Retrieval. 5336-5345 - Hao Fu, Naman Patel, Prashanth Krishnamurthy, Farshad Khorrami:
CLIPScope: Enhancing Zero-Shot OOD Detection with Bayesian Scoring. 5346-5355 - Ahmad Arrabi, Xiaohan Zhang, Waqas Sultani, Chen Chen, Safwan Wshah:
Cross-View Meets Diffusion: Aerial Image Synthesis with Geometry and Text Guidance. 5356-5366 - Shakeeb Murtaza, Soufiane Belharbi, Marco Pedersoli, Eric Granger:
A Realistic Protocol for Evaluation of Weakly Supervised Object Localization. 5367-5376 - Mu Cai, Zeyi Huang, Yuheng Li, Utkarsh Ojha, Haohan Wang, Yong Jae Lee:
An Investigation on LLMs' Visual Understanding Ability Using SVG for Image-Text Bridging. 5377-5386 - Deepti Rawat, Keshav Gupta, Aryamaan Basu Roy, Ravi Kiran Sarvadevabhatla:
DashCop: Automated E-Ticket Generation for Two-Wheeler Traffic Violations Using Dashcam Videos. 5387-5397 - Bumsoo Kim, Wonseop Shin, Kyuchul Lee, Yonghoon Jung, Sanghyun Seo:
Make VLM Recognize Visual Hallucination on Cartoon Character Image with Pose Information. 5398-5407 - Yuhang He, Sangyun Shin, Anoop Cherian, Niki Trigoni, Andrew Markham:
SoundLoc3D: Invisible 3D Sound Source Localization and Classification Using a Multimodal RGB-D Acoustic Camera. 5408-5418 - Hiroki Nishizawa, Keitaro Tanaka, Asuka Hirata, Shugo Yamaguchi, Qi Feng, Masatoshi Hamanaka, Shigeo Morishima:
SyncViolinist: Music-Oriented Violin Motion Generation Based on Bowing and Fingering. 5419-5428 - Haoyu Jiang, Zhi-Qi Cheng, Gabriel Moreira, Jiawen Zhu, Jingdong Sun, Bukun Ren, Jun-Yan He, Qi Dai, Xian-Sheng Hua:
UCDR-Adapter: Exploring Adaptation of Pre-Trained Vision-Language Models for Universal Cross-Domain Retrieval. 5429-5438 - Jash Dalvi, Ali Dabouei, Gunjan Dhanuka, Min Xu:
Distilling Aggregated Knowledge for Weakly-Supervised Video Anomaly Detection. 5439-5448 - Raza Imam, Hanan Gani, Muhammad Huzaifa, Karthik Nandakumar:
Test-Time Low Rank Adaptation via Confidence Maximization for Zero-Shot Generalization of Vision-Language Models. 5449-5459 - Evelyn A. Stump, Francesco Luzi, Leslie M. Collins, Jordan M. Malof:
Meta-Learning for Color-to-Infrared Cross-Modal Style Transfer. 5460-5469 - Tevin Moodley, Dustin van der Haar:
I3D-AE-LSTM: A 2-Stream Autoencoder for Action Quality Assessment Using a Newly Created Cricket Batsman Video Dataset. 5470-5478 - Junno Yun, Mehmet Akçakaya:
Generative Model-Based Fusion for Improved Few-Shot Semantic Segmentation of Infrared Images. 5479-5488 - Pinrui Yu, Zhenglun Kong, Pu Zhao, Peiyan Dong, Hao Tang, Fei Sun, Xue Lin, Yanzhi Wang:
Q-TempFusion: Quantization-Aware Temporal Multi-Sensor Fusion on Bird's-Eye View Representation. 5489-5499 - Debolena Basak, Soham Bhatt, Sahith Kanduri, Maunendra Sankar Desarkar:
Aerial Mirage: Unmasking Hallucinations in Large Vision Language Models. 5500-5508 - Bhavin Jawade, João V. B. Soares, Kapil Thadani, Deen Dayal Mohan, Amir Erfan Eshratifar, Benjamin Culpepper, Paloma de Juan, Srirangaraj Setlur, Venu Govindaraju:
SCOT: Self-Supervised Contrastive Pretraining for Zero-Shot Compositional Retrieval. 5509-5519 - Dipu Manandhar, Paul Guerrero, Zhaowen Wang, John P. Collomosse:
CLASS: Conditional Latent Architecture for Search and Synthesis of Design Layouts. 5520-5529 - Seon-Ho Lee, Jue Wang, David Fan, Zhikang Zhang, Linda Liu, Xiang Hao, Vimal Bhat, Xinyu Li:
Now you see Me: Context-Aware Automatic Audio Description. 5530-5539 - Niharika Hegde, Shishir Muralidhara, René Schuster, Didier Stricker:
Modality-Incremental Learning with Disjoint Relevance Mapping Networks for Image-Based Semantic Segmentation. 5540-5549 - Donggeun Kim, Yujin Jo, Myungjoo Lee, Taesup Kim:
Retaining and Enhancing Pre-trained Knowledge in Vision-Language Models with Prompt Ensembling. 5550-5559 - Junha Lee, Sojung An, Sujeong You, Nam Ik Cho:
Self-Supervised Learning with Probabilistic Density Labeling for Rainfall Probability Estimation. 5560-5569 - Maksim Golyadkin, Ianis Plevokas, Ilya Makarov:
Closing the Domain Gap in Manga Colorization via Aligned Paired Dataset. 5580-5590 - Anurag Deo, Savita Bhat, Shirish S. Karande:
VisualFusion: Enhancing Blog Content with Advanced Infographic Pipeline. 5591-5600 - Daniel Steininger, Julia Simon, Andreas Trondl, Markus Murschitz:
TimberVision: A Multi-Task Dataset and Framework for Log-Component Segmentation and Tracking in Autonomous Forestry Operations. 5601-5610 - Idris Zakariyya, Linda Tran, Kaushik Bhargav Sivangi, Paul Henderson, Fani Deligianni:
Differentially Private Integrated Decision Gradients (IDG-DP) for Radar-Based Human Activity Recognition. 5611-5622 - Suguru Onda, Ryan Farrell:
The FineView Dataset: A 3D Scanned Multi-View Object Dataset of Fine-Grained Category Instances. 5623-5634 - Deepayan Das, Davide Talon, Massimiliano Mancini, Yiming Wang, Elisa Ricci:
One VLM to Keep it Learning: Generation and Balancing for Data-free Continual Visual Question Answering. 5635-5645 - Tom Gillooly, Jean-Baptiste Thomas, Jon Yngve Hardeberg, Giuseppe Claudio Guarnera:
Image Adaptation for Colour Vision Deficient Viewers Using Vision Transformers. 5646-5655 - Pallavi Jain, Dino Ienco, Roberto Interdonato, Tristan Berchoux, Diego Marcos:
SenCLIP: Enhancing Zero-Shot Land-Use Mapping for Sentinel-2 with Ground-Level Prompting. 5656-5665 - Zhuowen Zou, Prathyush Poduval, Narayan Srinivasa, Mohsen Imani:
Hyperdimensional Representation for Adaptive Information Association and Memorization. 5666-5675 - Sahil Goyal, Abhinav Mahajan, Swasti Mishra, Prateksha Udhayanan, Tripti Shukla, K. J. Joseph, Balaji Vasan Srinivasan:
Design-O-Meter: Towards Evaluating and Refining Graphic Designs. 5676-5686 - Muhammad Awais, Ali Husain Salem Abdulla Alharthi, Amandeep Kumar, Hisham Cholakkal, Rao Muhammad Anwer:
AgroGPT : Efficient Agricultural Vision-Language Model with Expert Tuning. 5687-5696 - Zhuo Xu, Xiang Xiang:
Learning Visual-Semantic Hierarchical Attribute Space for Interpretable Open-Set Recognition. 5697-5706 - Harini S. I, Somesh Singh, Yaman Kumar Singla, Aanisha Bhattacharyya, Veeky Baths, Changyou Chen, Rajiv Ratn Shah, Balaji Krishnamurthy:
Long-Term Ad Memorability: Understanding & Generating Memorable Ads. 5707-5718 - Debasmita Pal, Redwan Sony, Arun Ross:
A Parametric Approach to Adversarial Augmentation for Cross-Domain Iris Presentation Attack Detection. 5719-5729 - Abhishek Rajora, Shubham Gupta, Suman Kundu:
Cross-Aligned Fusion For Multimodal Understanding. 5730-5740 - Hanwen Zheng, Sijia Wang, Chris Thomas, Lifu Huang:
Advancing Chart Question Answering with Robust Chart Component Recognition. 5741-5750 - Moyuru Yamada, Nimish Dharamshi, Ayushi Kohli, Prasad Kasu, Ainulla Khan, Manu Ghulyani:
Unleashing Potentials of Vision-Language Models for Zero-Shot HOI Detection. 5751-5760 - Zi-Xiang Xia, Sudeep Fadadu, Yi Shi, Louis Foucard:
Robust Long-Range Perception Against Sensor Misalignment in Autonomous Vehicles. 5761-5770 - Felix Hertlein, Alexander Naumann, York Sure-Vetter:
DocMatcher: Document Image Dewarping via Structural and Textual Line Matching. 5771-5780 - Dulanga Weerakoon, Vigneshwaran Subbaraju, Joo Hwee Lim, Archan Misra:
NeuroViG - Integrating Event Cameras for Resource-Efficient Video Grounding. 5781-5790 - Haiyu Wu, Sicong Tian, Huayu Li, Kevin W. Bowyer:
LogicNet: A Logical Consistency Embedded Face Attribute Learning Network. 5791-5800 - Hasnat Md Abdullah, Tian Liu, Kangda Wei, Shu Kong, Ruihong Huang:
UAL-Bench: The First Comprehensive Unusual Activity Localization Benchmark. 5801-5811 - Neha Choudhary, Poonam Goyal, Devashish Siwatch, Atharva Chandak, Harsh Mahajan, Varun Khurana, Yaman Kumar:
AdQuestA: Knowledge-Guided Visual Question Answer Framework for Advertisements. 5812-5821 - Raymond Yu, Paul Han, Piper Wolters, Favyen Bastani:
OPTIMUS: Observing Persistent Transformations in Multi-Temporal Unlabeled Satellite-Data. 5822-5830 - María Escobar, Juanita Puentes, Cristhian Forigua, Jordi Pont-Tuset, Kevis-Kokitsi Maninis, Pablo Arbeláez:
EgoCast: Forecasting Egocentric Human Pose in the Wild. 5831-5841 - Cheng-En Wu, Jinhong Lin, Yu Hen Hu, Pedro Morgado:
Patch Ranking: Token Pruning as Ranking Prediction for Efficient CLIP. 5842-5851 - Tom Wehrbein, Marco Rudolph, Bodo Rosenhahn, Bastian Wandt:
Utilizing Uncertainty in 2D Pose Detectors for Probabilistic 3D Human Mesh Recovery. 5852-5862 - Samyak Rawlekar, Shubhang Bhatnagar, Narendra Ahuja:
PositiveCoOp: Rethinking Prompting Strategies for Multi-Label Recognition with Partial Annotations. 5863-5872 - Alexander Ponomarchuk, Ivan Kruzhilov, Gleb Mazanov, Ruslan Utegenov, Artem Shadrin, Galina Zubkova, Ivan Bessonov, Pavel Blinov:
CardioSyntax: End-to-End SYNTAX Score Prediction - Dataset, Benchmark and Method. 5873-5883 - Ziqiang Dang, Jianfang Li, Lin Liu:
Cascaded Dual Vision Transformer for Accurate Facial Landmark Detection. 5884-5894 - Charles Gaydon, Floryne Roche:
PureForest: A Large-Scale Aerial Lidar and Aerial Imagery Dataset for Tree Species Classification in Monospecific Forests. 5895-5904 - Ce Zhang, Simon Stepputtis, Katia P. Sycara, Yaqi Xie:
Enhancing Vision-Language Few-Shot Adaptation with Negative Learning. 5905-5915 - Jia-Wei Liao, Winston Wang, Tzu-Sian Wang, Li-Xuan Peng, Ju-Hsuan Weng, Cheng-Fu Chou, Jun-Cheng Chen:
DiffQRCoder: Diffusion-Based Aesthetic QR Code Generation with Scanning Robustness Guided Iterative Refinement. 5916-5925 - Jingjiao Zhao, Jiaju Li, Dongze Lian, Liguo Sun, Pin Lv:
DualCIR: Enhancing Training-Free Composed Image Retrieval via Dual-Directional Descriptions. 5926-5936 - Ee Yeo Keat, Hao Zhang, Alexander Matyasko, Basura Fernando:
Deduce and Select Evidences with Language Models for Training-Free Video Goal Inference. 5937-5947 - Luca Scofano, Alessio Sampieri, Edoardo De Matteis, Indro Spinelli, Fabio Galasso:
Social EgoMesh Estimation. 5948-5958 - Xiang Huang, Zhi-Qi Cheng, Jun-Yan He, Chenyang Li, Wangmeng Xiang, Baigui Sun:
DyRoNet: Dynamic Routing and Low-Rank Adapters for Autonomous Driving Streaming Perception. 5959-5968 - Siyuan Huang, Ram Prabhakar, Yuxiang Guo, Rama Chellappa, Cheng Peng:
VILLS: Video-Image Learning to Learn Semantics for Person Re-Identification. 5969-5979 - Sumin Hu, Youngmin Yoo, Jeeseong Kim, Changsoo Lim, Doohyun Cho, Bongnam Kang:
A Generic Vehicle-to-Sensor Calibration Framework. 5980-5989 - Christian Benz, Volker Rodehorst:
Crackstructures and Crackensembles: The Power of Multi-View for 2.5D Crack Detection. 5990-5999 - Shuo Chen, Zhen Han, Bailan He, Jianzhe Liu, Mark Buckley, Yao Qin, Philip Torr, Volker Tresp, Jindong Gu:
Can Multimodal Large Language Models Truly Perform Multimodal In-Context Learning? 6000-6010 - Rupanjali Kukal, Jay Patravali, Fuxun Yu, Simranjit Singh, Nikolaos Karianakis, Rishi Madhok:
Click&Describe: Multimodal Grounding and Tracking for Aerial Objects. 6011-6021 - Wenzhao Qiu, Shanmin Pang, Hao Zhang, Jianwu Fang, Jianru Xue:
HeightMapNet: Explicit Height Modeling for End-to-End HD Map Learning. 6022-6031 - Jinnan Chen, Chen Li, Gim Hee Lee:
DiHuR: Diffusion-Guided Generalizable Human Reconstruction. 6032-6041 - Ashutosh Chaubey, Anoubhav Agrawal, Sartaki Sinha Roy, Aayush Agrawal, Susmita Ghose:
ContextIQ: A Multimodal Expert-Based Video Retrieval System for Contextual Advertising. 6042-6052 - Aishwarya Agarwal, Srikrishna Karanam, Tripti Shukla, Balaji Vasan Srinivasan:
An Image is Worth Multiple Words: Multi-Attribute Inversion for Constrained Text-To-Image Synthesis. 6053-6062 - Xinhao Zhou, Tong Wang, Zhaodong Liu, Hao Wei, Guangyuan Pan:
A Regional-Level Resource-Saving Model for Winter Road Surface Snow Detection in Extreme Weathers. 6063-6072 - Nicola Fanelli, Gennaro Vessio, Giovanna Castellano:
I Dream My Painting: Connecting MLLMs and Diffusion Models via Prompt Generation for Text-Guided Multi-Mask Inpainting. 6073-6082 - Eman Ali, Sathira Silva, Muhammad Haris Khan:
DPA: Dual Prototypes Alignment for Unsupervised Adaptation of Vision-Language Models. 6083-6093 - Zijiao Yang, Xiangxi Shi, Eric Slyman, Stefan Lee:
Hijacking Vision-and-Language Navigation Agents with Adversarial Environmental Attacks. 6094-6103 - Ruoyu Wang, Yangfan He, Tengjiao Sun, Xiang Li, Tianyu Shi:
UniTMGE: Uniform Text-Motion Generation and Editing Model via Diffusion. 6104-6114 - Yehun Song, Sunyoung Cho:
Leveraging CLIP Encoder for Multimodal Emotion Recognition. 6115-6124 - Po-Hsuan Huang, Jeng-Lin Li, Chin-Po Chen, Ming-Ching Chang, Wei-Chao Chen:
Who Brings the Frisbee: Probing Hidden Hallucination Factors in Large Vision-Language Model via Causality Analysis. 6125-6135 - Daniel Panangian, Ksenia Bittner:
Can Location Embeddings Enhance Super-Resolution of Satellite Imagery? 6136-6145 - Dinghao Jin, Yuan Zeng, Yi Gong:
Bandwidth-Efficient Communication Modelling for Autonomous Vehicle Collaborative Perception. 6146-6155 - Mallika Garg, Debashis Ghosh, Pyari Mohan Pradhan:
ConvMixFormer- A Resource-Efficient Convolution Mixer for Transformer-Based Dynamic Hand Gesture Recognition. 6156-6166 - Mathieu Cocheteux, Julien Moreau, Franck Davoine:
Uncertainty-Aware Online Extrinsic Calibration: A Conformal Prediction Approach. 6167-6176 - Floriane Magera, Thomas Hoyoux, Olivier Barnich, Marc Van Droogenbroeck:
BroadTrack: Broadcast Camera Tracking for Soccer. 6177-6187 - Hah Min Lew, Sahng-Min Yoo, Hyunwoo Kang, Gyeong-Moon Park:
Towards High-fidelity Head Blending with Chroma Keying for Industrial Applications. 6188-6196 - Xiang Li, Yangfan He, Shuaishuai Zu, Zhengyang Li, Tianyu Shi, Yiting Xie, Kevin Zhang:
Multi-Modal Large Language Model with RAG Strategies in Soccer Commentary Generation. 6197-6206 - Niloufar Alipour Talemi, Hossein Kashiani, Fatemeh Afghah:
Style-Pro: Style-Guided Prompt Learning for Generalizable Vision-Language Models. 6207-6216 - Hung-Shuo Chang, Chien-Yao Wang, Richard Robert Wang, Gene Chou, Hong-Yuan Mark Liao:
Generalist YOLO: Towards Real-Time End-to-End Multi-Task Visual Language Models. 6217-6227 - Patrick Ramos, Nicolas Gonthier, Selina Khan, Yuta Nakashima, Noa Garcia:
No Annotations for Object Detection in Art Through Stable Diffusion. 6228-6237 - Manju R. A, Atul Kumar, Akshay Agarwal:
On Which Data Distribution (Synthetic or Real) We Should Rely for Soft Biometric Classification. 6238-6247 - Weixi Weng, Rui Zhang, Xiaojun Meng, Jieming Zhu, Qun Liu, Chun Yuan:
Unsupervised Domain Adaptive Visual Question Answering in the Era of Multi-Modal Large Language Models. 6248-6258 - Cole Hill, Florence Yellin, Krishna Regmi, Dawei Du, Scott McCloskey:
Re-identifying People in Video via Learned Temporal Attention and Multi-modal Foundation Models. 6259-6268 - Yao Zhang, Haokun Chen, Ahmed Frikha, Denis Krompass, Gengyuan Zhang, Jindong Gu, Volker Tresp:
CL-Cross VQA: A Continual Learning Benchmark for Cross-Domain Visual Question Answering. 6269-6278 - Abid Hasan Zim, Aquib Iqbal, Zaid Al-Huda, Asad Malik, Minoru Kuribayashi:
EfficientCrackNet: A Lightweight Model for Crack Segmentation. 6279-6289 - Shir Bar, Or Hirschorn, Roi Holzman, Shai Avidan:
Sifting Through the Haystack - Efficiently Finding Rare Animal Behaviors in Large-Scale Datasets. 6290-6299 - Huakun Shen, Boyue Caroline Hu, Krzysztof Czarnecki, Lina Marsso, Marsha Chechik:
Assessing Visually-Continuous Corruption Robustness of Neural Networks Relative to Human Performance. 6300-6310 - Simone Fobi Nsutezo, Amrita Gupta, Duncan Kebut, Seema Iyer, Luana Marotti, Rahul Dodhia, Juan M. Lavista Ferres, Anthony Ortiz:
PGRID: Power Grid Reconstruction in Informal Developments Using High-Resolution Aerial Imagery. 6311-6319 - Muhammad Arbab Arshad, Talukder Zaki Jubery, Tirtho Roy, Rim Nassiri, Asheesh K. Singh, Arti Singh, Chinmay Hegde, Baskar Ganapathysubramanian, Aditya Balu, Adarsh Krishnamurthy, Soumik Sarkar:
Leveraging Vision Language Models for Specialized Agricultural Tasks. 6320-6329