


ICMR 2024: Phuket, Thailand
- Cathal Gurrin, Rachada Kongkachandra, Klaus Schoeffmann, Duc-Tien Dang-Nguyen, Luca Rossetto, Shin'ichi Satoh, Liting Zhou:
Proceedings of the 2024 International Conference on Multimedia Retrieval, ICMR 2024, Phuket, Thailand, June 10-14, 2024. ACM 2024
Regular Long Papers
- Xinzhe Ni, Yong Liu, Hao Wen, Yatai Ji, Jing Xiao, Yujiu Yang:
Multimodal Prototype-Enhanced Network for Few-Shot Action Recognition. 1-10
- Kaixing Yang, Xukun Zhou, Xulong Tang, Ran Diao, Hongyan Liu, Jun He, Zhaoxin Fan:
BeatDance: A Beat-Based Model-Agnostic Contrastive Learning Framework for Music-Dance Retrieval. 11-19
- Yang Xu, Yifan Feng, Lin Bie:
Triadic Elastic Structure Representation for Open-Set Incremental 3D Object Retrieval. 20-28
- Stephan Repp, Ernst Georg Haffner:
Dynamic Segmentation for Efficient Retrieval of Podcasts: The Repping Algorithm. 29-36
- Zhaoxin Fan, Fengxin Li, Hongyan Liu, Jun He, Xiaoyong Du:
PoseRec: 3D Human Pose Driven Online Advertisement Recommendation for Micro-videos. 37-45
- Xiaoyu Qiu, Hao Feng, Yuechen Wang, Wengang Zhou, Houqiang Li:
Progressive Multi-modal Conditional Prompt Tuning. 46-54
- Zhaoxin Fan, Zhenbo Song, Zhicheng Wang, Jian Xu, Kejian Wu, Hongyan Liu, Jun He:
ACR-Pose: Adversarial Canonical Representation Reconstruction Network for Category Level 6D Object Pose Estimation. 55-63
- Yunfeng Yu, Longlong Lin, Qiyu Liu, Zeli Wang, Xi Ou, Tao Jia:
GSD-GNN: Generalizable and Scalable Algorithms for Decoupled Graph Neural Networks. 64-72
- Jiaxin Wu, Chong-Wah Ngo, Wing-Kwong Chan:
Improving Interpretable Embeddings for Ad-hoc Video Search with Generative Captions and Multi-word Concept Bank. 73-82
- Hua Gao, Chenchen Hu, Guang Han, Jiafa Mao, Wei Huang, Kaiyuan Wan:
HashNeck is a Boosting Tool for Deep Learning to Hashing. 83-91
- Di Wang, Feng Yan, Yifeng Wang, Lin Zhao, Xiao Liang, Haodi Zhong, Ronghua Zhang:
Fine-grained Semantics-aware Representation Learning for Text-based Person Retrieval. 92-100
- Guangzhe Zhao, Yanan Liu, Xueping Wang, Feihu Yan:
CMFF-Face: Attention-Based Cross-Modal Feature Fusion for High-Quality Audio-Driven Talking Face Generation. 101-110
- Meng Wei, Zhongnian Li, Yong Zhou, Xinzheng Xu:
Learning from Reduced Labels for Long-Tailed Data. 111-119
- Tianyi Wang, Shenghua Zhong:
Fingerprinting in EEG Model IP Protection Using Diffusion Model. 120-128
- Weixing Liu, Shenghua Zhong:
MarginFinger: Controlling Generated Fingerprint Distance to Classification boundary Using Conditional GANs. 129-136
- Chuang Zhao, Hefei Ling, Shijie Lu, Yuxuan Shi, Jiazhong Chen, Ping Li:
Improve Deep Hashing with Language Guidance for Unsupervised Image Retrieval. 137-145
- Yue Yang, Liangjun Ke:
Exploiting Degradation Prior for Personalized Federated Learning in Real-World Image Super-Resolution. 146-154
- Hui Liu, Xiaojun Wan:
QAVidCap: Enhancing Video Captioning through Question Answering Techniques. 155-164
- Fanlei Meng, Xiangru Chen, Yuan Cao:
Targeted Universal Adversarial Attack on Deep Hash Networks. 165-174
- Feifei Fu, Yizhao Gao, Zhiwu Lu:
Enhancing Class-Incremental Learning for Image Classification via Bidirectional Transport and Selective Momentum. 175-183
- Mingzhe Yu, Yunshan Ma, Lei Wu, Kai Cheng, Xue Li, Lei Meng, Tat-Seng Chua:
Smart Fitting Room: A One-stop Framework for Matching-aware Virtual Try-On. 184-192
- Mingyue Li, Yuting Zhu, Ruizhong Du, Chunfu Jia:
Secure Verification Encrypted Image Retrieval Scheme with Addition Homomorphic Bitmap Index. 193-201
- Xingquan Cai, Haoyu Zhang, Shanshan He, Haoyu Song, Haiyan Sun:
A Novel Auxiliary Task Framework in 3D Human Pose Estimation for Opera Videos. 202-210
- Donghuo Zeng, Yanan Wang, Kazushi Ikeda, Yi Yu:
Anchor-aware Deep Metric Learning for Audio-visual Retrieval. 211-219
- Jiaao Yu, Yunlai Ding, Junyu Dong, Yuezun Li:
Dynamic Soft Labeling for Visual Semantic Embedding. 220-228
- Feifei Xu, Ziheng Yu:
Navigating Style Variations in Scene Text Image Super-Resolution through Multi-Scale Perception. 229-238
- Depei Liu, Hongjie Fan, Junfei Liu:
ExpoGenius: Robust Personalized Human Image Generation using Diffusion Model for Exposure Variation and Pose Transfer. 239-247
- Xudong Ru, Haichuan Zhao, Xingce Wang, Zhongke Wu, Shaolong Liu, Yi-Cheng Zhu, Alejandro F. Frangi:
Vector-Aware Anisotropic Gauge Equivariant Mesh Convolution Network for 3D Aneurysm Detection. 248-256
- Junming Wang, Yi Shi:
NeurNCD: Novel Class Discovery via Implicit Neural Representation. 257-265
- Lin Bie, Siqi Li, Kai Cheng:
Image-to-Point Registration via Cross-Modality Correspondence Retrieval. 266-274
- Lilong Wen, Xiu Tang, Dongxiang Zhang:
TWIST: Text-only Weakly Supervised Scene Text Spotting Using Pseudo Labels. 275-284
- Xintao Jiao, Jiansheng Chen, Jiale Liu:
A Graph Convolution Network with a POS-aware Filter and Context Enhancement Mechanism for Event Detection. 285-292
- Florian Spiess, Nicolas Scharowski, Ariane Haller, Zgjim Memeti, Heiko Schuldt, Florian Brühlmann:
Bringing Video Browsing to Virtual Reality: Empirical Evaluation of a Novel Multimedia Drawer. 293-301
- Changgu Chen, Yang Li, Jian Zhang, Jiali Liu, Changbo Wang:
Generative Data Augmentation with Liveness Information Preserving for Face Anti-Spoofing. 302-310
- Lucas Joos, Bastian Jäckl, Daniel A. Keim, Maximilian T. Fischer, Ladislav Peska, Jakub Lokoc:
Known-Item Search in Video: An Eye Tracking-Based Study. 311-319
- Huixia Ben, Shuo Wang, Meng Wang, Richang Hong:
Pseudo Content Hallucination for Unpaired Image Captioning. 320-329
- Haiyang Zheng, Ruilin Zhang, Hongpeng Wang:
Deep Image Clustering Based on Curriculum Learning and Density Information. 330-338
- Jiaxin Li, Zhihan Yu, Guibo Luo, Yuesheng Zhu:
CodeDetector: Revealing Forgery Traces with Codebook for Generalized Deepfake Detection. 339-347
- Zeli Wang, Jian Li, Shuyin Xia, Longlong Lin, Guoyin Wang:
Text Adversarial Defense via Granular-Ball Sample Enhancement. 348-356
- Zeli Wang, Tuo Zhang, Shuyin Xia, Longlong Lin, Guoyin Wang:
GBRAIN: Combating Textual Label Noise by Granular-ball based Robust Training. 357-365
- Wei Tang, Yuanyi Wang:
Multi-modal Entity Alignment via Position-enhanced Multi-label Propagation. 366-375
- Zuheng Kang, Yayun He, Botao Zhao, Xiaoyang Qu, Junqing Peng, Jing Xiao, Jianzong Wang:
Retrieval-Augmented Audio Deepfake Detection. 376-384
- Yongcheng Zhang, Lingou Kong, Sheng Tian, Hao Fei, Changpeng Xiang, Huan Wang, Xiaomei Wei:
Multi-view Counterfactual Contrastive Learning for Fact-checking Fake News Detection. 385-393
- Danyang Hou, Liang Pang, Huawei Shen, Xueqi Cheng:
Improving Video Corpus Moment Retrieval with Partial Relevance Enhancement. 394-403
- Albatool Wazzan, Imtiaz Ahmad, Stephen MacNeil, Richard Souvenir:
Context or Clutter? Efficiently Matching Objects Across Scenes. 404-413
- Tianpeng Zhang, Xuesong Jiang:
A Lightweight Surface Defect Segmentation Network with External Semantics and High-frequency Information. 414-422
- Zhenghao Zhao, Hao Tang, Joy Wan, Yan Yan:
Monocular Expressive 3D Human Reconstruction of Multiple People. 423-432
- Mei Yu, Xiaoxi Zhou, Mankun Zhao, Tianyi Xu, Yue Zhao, Ruiguo Yu, Xuewei Li:
A Causal View for Multi-Interest User Modeling in News Recommendation. 433-441
- Yang Liu, Tongfei Shen, Dong Zhang, Qingying Sun, Shoushan Li, Guodong Zhou:
Comment-aided Video-Language Alignment via Contrastive Pre-training for Short-form Video Humor Detection. 442-450
- Yichen Yan, Xingjian He, Sihan Chen, Jing Liu:
Calibration & Reconstruction: Deeply Integrated Language for Referring Image Segmentation. 451-459
- Thao-Nhu Nguyen, Zongyao Li, Satoshi Yamazaki, Jianquan Liu, Cathal Gurrin:
A Parallel Transformer Framework for Video Moment Retrieval. 460-468
- Pengfei Wei, Hongjun Ouyang, Qintai Hu, Bi Zeng, Guang Feng, Qingpeng Wen:
VEC-MNER: Hybrid Transformer with Visual-Enhanced Cross-Modal Multi-level Interaction for Multimodal NER. 469-477
- Weiwei Zhou, Guoqiang Xiao, Michael S. Lew, Song Wu:
Causal Inference-based Few-Shot Class-Incremental Learning. 478-487
- Zixin Tang, Haihui Fan, Xiaoyan Gu, Yang Li, Bo Li, Xin Wang:
ELSEIR: A Privacy-Preserving Large-Scale Image Retrieval Framework for Outsourced Data Sharing. 488-496
- Yijing Zhao, Yuchao Xia, Yi Ding, Yumeng Liu, Shuai Liu, Hongan Wang:
S2F-Net: Shared-Specific Fusion Network for Infrared and Visible Image Fusion. 497-505
- Gullal S. Cheema, Judi Arafat, Chiao-I Tseng, John A. Bateman, Ralph Ewerth, Eric Müller-Budack:
Identification of Speaker Roles and Situation Types in News Videos. 506-514
- Tianwei Chen, Noa Garcia, Liangzhi Li, Yuta Nakashima:
Retrieving Emotional Stimuli in Artworks. 515-523
- Pengfei Wei, Zhaokang Huang, Hongjun Ouyang, Qintai Hu, Bi Zeng, Guang Feng:
CGI-MRE: A Comprehensive Genetic-Inspired Model For Multimodal Relation Extraction. 524-532
- Chenxiao Liu, Zheyong Xie, Sirui Zhao, Jin Zhou, Tong Xu, Minglei Li, Enhong Chen:
Speak From Heart: An Emotion-Guided LLM-Based Multimodal Method for Emotional Dialogue Generation. 533-542
- Zhirui Kuai, Yulu Zhou, Qi Xie, Li Kuang:
Multi-Source Augmentation and Composite Prompts for Visual Recognition with Missing Modality. 543-551
- Xiangyu Liu, Yanlei Shang, Yong Chen:
TriMPL: Masked Multi-Prompt Learning with Knowledge Mixing for Vision-Language Few-shot Learning. 552-560
- Zhongnian Li, Peng Ying, Meng Wei, Tongfeng Sun, Xinzheng Xu:
Prompt Expending for Single Positive Multi-Label Learning with Global Unannotated Categories. 561-569
- Yaqun Fang, Yi Shi, Jia Bei, Tongwei Ren:
Semantic-guided RGB-Thermal Crowd Counting with Segment Anything Model. 570-578
- Ruiqi Wu, Bingliang Jiao, Wenxuan Wang, Meng Liu, Peng Wang:
Enhancing Visible-Infrared Person Re-identification with Modality- and Instance-aware Visual Prompt Learning. 579-588
- Zhenyu Xie, Huanyu He, Gui Zou, Jie Wu, Guoliang Liu, Jun Zhao, Yingxue Wang, Hui Lin, Weiyao Lin:
Visibility-guided Human Body Reconstruction from Uncalibrated Multi-view Cameras. 589-598
- Yilin Li, Tszyin Guo, Ying Qiao, Zitong Bo, Hongan Wang:
FEST: A Multi-way Framework with Enhanced Spatial-Temporal Modeling for Traffic Forecasting. 599-607
- Yuchen Niu, Min Zhu, Zhihua Wei:
SamCap: Energy-based Controllable Image Captioning by Gradient-Based Sampling. 608-617
- Zhuoyuan Wei, Xun Jiang, Zheng Wang, Fumin Shen, Xing Xu:
PTAN: Principal Token-aware Adjacent Network for Compositional Temporal Grounding. 618-627
- Chao Ye, Qian Wang, Lanfang Dong:
A Hybrid Few-Shot Image Classification Framework Combining Gaussian Modeling and Label Propagation. 628-637
- Shizhou Huang, Bo Xu, Changqun Li, Jiabo Ye, Xin Lin:
A Sentimental Prompt Framework with Visual Text Encoder for Multimodal Sentiment Analysis. 638-646
- Zhikai Hu, Yiu-ming Cheung, Yonggang Zhang, Peiying Zhang, Pui-ling Tang:
Component-Level Oracle Bone Inscription Retrieval. 647-656
- Nico Hezel, Kai Uwe Barthel, Konstantin Schall, Klaus Jung:
An Exploration Graph with Continuous Refinement for Efficient Multimedia Retrieval. 657-665
- Siqi Wei, Bin Wu:
Intra and Inter-modality Incongruity Modeling and Adversarial Contrastive Learning for Multimodal Fake News Detection. 666-674
- Kaixing Yang, Xulong Tang, Ran Diao, Hongyan Liu, Jun He, Zhaoxin Fan:
CoDancers: Music-Driven Coherent Group Dance Generation with Choreographic Unit. 675-683
- Yuwen Yang, Yuxiang Lu, Suizhi Huang, Shalayiding Sirejiding, Hongtao Lu, Yue Ding:
Federated Multi-Task Learning on Non-IID Data Silos: An Experimental Study. 684-693
- Xiaoqian Liang, Jianji Wang, Yuanliang Lu, Xubin Duan, Xichun Liu, Nanning Zheng:
Refracting Once is Enough: Neural Radiance Fields for Novel-View Synthesis of Real Refractive Objects. 694-703
- Bo Li, You Wu, Zhixin Li:
Team HUGE: Image-Text Matching via Hierarchical and Unified Graph Enhancing. 704-712
- Peijia Chen, Ke Qi, Xi Tao, Wenhao Xu, Jingdong Zhang:
MFVG: A Visual Grounding Network with Multi-scale Fusion. 713-721
- Zhijian Wu, Wenhui Liu, Dingjiang Huang:
When Handcrafted Filter Meets CNN: A Lightweight Conv-Filter Mixer Network for Efficient Image Super-Resolution. 722-730
- Dahuang Liu, Jiuxiang You, Guobo Xie, Lap-Kei Lee, Fu Lee Wang, Zhenguo Yang:
Modality-specific and -shared Contrastive Learning for Sentiment Analysis. 731-739
- Zhuohua Li, Ruyun Wang, Fuqing Zhu, Jizhong Han, Songlin Hu:
Pyramidal Cross-Modal Transformer with Sustained Visual Guidance for Multi-Label Image Classification. 740-748
- Xuanhao Qi, Min Zhi, Yanjun Yin, Ping Ping, Yuening Zhang:
SFAM: Lightweight Spectrum Unreferenced Attention Network. 749-757
- Ioannis Sarridis, Christos Koutlis, Symeon Papadopoulos, Christos Diou:
FaceX: Understanding Face Attribute Classifiers through Summary Model Explanations. 758-766
- Weipeng Yang, Hongxia Gao, Wenbin Zou, Tongtong Liu, Shasha Huang, Jianliang Ma:
Low-Light Image Enhancement via Weighted Low-Rank Tensor Regularized Retinex Model. 767-775
- Lai Wei, Shanshan Song:
Multi-view Subspace Clustering via An Adaptive Consensus Graph Filter. 776-784
- Ruihai Wu, Yourong Zhang, Yu Qi, Andy Guanhong Chen, Hao Dong:
Pattern4Ego: Learning Egocentric Video Representation Using Cross-video Activity Patterns. 785-794
- Xigang Bao, Mengyuan Tian, Luyao Wang, Zhiyuan Zha, Biao Qin:
Contrastive Pre-training with Multi-level Alignment for Grounded Multimodal Named Entity Recognition. 795-803
- Jian Yang, Weize Quan, Zhen Shen, Dong-Ming Yan, Huaiyu Wu:
Neural Parametric Human Hand Modeling with Point Cloud Representation. 804-813
- Yi Li, Qingmeng Zhu, Changwen Zheng, Jiangmeng Li:
MSI: Multi-modal Recommendation via Superfluous Semantics Discarding and Interaction Preserving. 814-823
- Chao He, Hongxi Wei:
HybridHash: Hybrid Convolutional and Self-Attention Deep Hashing for Image Retrieval. 824-832
- Lisong Ou, Zhixin Li:
Modeling Multi-Task Joint Training of Aggregate Networks for Multi-Modal Sarcasm Detection. 833-841
- Ziyu Gong, Chengcheng Mai, Yihua Huang:
ML2MG-VLCR: A Multimodal LLM Guided Zero-shot Method for Visio-linguistic Compositional Reasoning with Autoregressive Generative Language Model. 842-850
- Ziqing Deng, Zhihui Lai, Yujuan Ding, Heng Kong, Xu Wu:
Deep Scaling Factor Quantization Network for Large-scale Image Retrieval. 851-859
- Yan Wang, Yawen Zeng, Junjie Liang, Xiaofen Xing, Jin Xu, Xiangmin Xu:
RetrievalMMT: Retrieval-Constrained Multi-Modal Prompt Learning for Multi-Modal Machine Translation. 860-868
- Runlai Hao, Jinlong Li, Qiuju Chen, Huanhuan Chen:
DualStyle3D: Real-time Exemplar-based Artistic Portrait View Synthesis Based on Radiance Field. 869-877
- Jiancheng Huang, Mingfu Yan, Yifan Liu, Shifeng Chen:
SBCR: Stochasticity Beats Content Restriction Problem in Training and Tuning Free Image Editing. 878-887
- Shenghao Liu, Yuqin Lan, Xianjun Deng, Lingzhi Yi, Chenlu Zhu, Laurence T. Yang, Jong Hyuk Park:
TrustGo: Trust Mining and Multi-semantic Regularization in Social Recommendation. 888-896
- Beiqi Liu, Fuqing Duan, Junli Zhao:
SkeletonFormer: Point Cloud Completion with Dynamic Selective Skeleton Points. 897-905
- Chen Huang, Zhijun Fan, Kui Xiao, Yan Zhang, Shihui Wang, Jianhua Song, Wei Wu, Chao Liu:
Research on Epilepsy Classification Model Based on Variational Mode Quadratic Decomposition. 906-914
- Xukun Zhou, Zhenbo Song, Jun He, Hongyan Liu, Zhaoxin Fan:
STDG: Semi-Teacher-Student Training Paradigm for Depth-guided One-stage Scene Graph Generation. 915-924
- Anrui Wang, Libo Weng, Fei Gao:
BFIDet: A YOLOv7-improved Vehicle and Pedestrian Detector via Balancing Feature Integration. 925-933
- Chun-Yen Chen, Mei-Chen Yeh:
Self-Supervised Multi-Label Classification with Global Context and Local Attention. 934-942
- Tianlong Zhang, Jing Lv, Ming Yang:
Semi-Parametric Style Transfer with Multi-Perspective Feature Fusion and Information-Guided Alignment. 943-950
- Kontawat Wisetpaitoon, Sattaya Singkul, Theerat Sakdejayont, Tawunrat Chalothorn:
End-to-End Thai Text-to-Speech with Linguistic Unit. 951-959
- Linhao Zhou, Sheng-Hua Zhong, Zhijiao Xiao:
Discovering Multi-Relational Integration for Knowledge Tracing with Retentive Networks. 960-968
- Qin Jiang, Qinglin Wang, Lihua Chi, Wentao Ma, Feng Li, Jie Liu:
DeepEnhancer: Temporally Consistent Focal Transformer for Comprehensive Video Enhancement. 969-977
- Hongyi Zhu, Jia-Hong Huang, Stevan Rudinac, Evangelos Kanoulas:
Enhancing Interactive Image Retrieval With Query Rewriting Using Large Language Models and Vision Language Models. 978-987
- Yitong Xing, Guoqiang Xiao, Michael S. Lew, Song Wu:
Lifelong Visible-Infrared Person Re-Identification via a Tri-Token Transformer with a Query-Key Mechanism. 988-997
- Wenzhuo Li, Yinghui Wang, Wei Li, Liangyi Huang, Kamoliddin Shukurov, Mingfeng Wang:
Wireless Capsule Endoscope Low-light Image Enhancement with Balanced Brightness and Saturation. 998-1005
- Sohail Ahmed Khan, Duc-Tien Dang-Nguyen:
CLIPping the Deception: Adapting Vision-Language Models for Universal Deepfake Detection. 1006-1015
- Boyue Xu, Ruichao Hou, Tongwei Ren, Gangshan Wu:
RGB-D Video Object Segmentation via Enhanced Multi-store Feature Memory. 1016-1024
- Yongkang Ding, Anqi Wang, Liyan Zhang:
Multidimensional Semantic Disentanglement Network for Clothes-Changing Person Re-Identification. 1025-1033
- Yuting Mei, Linli Yao, Qin Jin:
UBiSS: A Unified Framework for Bimodal Semantic Summarization of Videos. 1034-1042
- Ali Abdari, Alex Falcon, Giuseppe Serra:
AdOCTeRA: Adaptive Optimization Constraints for improved Text-guided Retrieval of Apartments. 1043-1050
- Ruiting Dai, Yuqiao Tan, Lisi Mo, Shuang Liang, Guohao Huo, Jiayi Luo, Yao Cheng:
G-SAP: Graph-based Structure-Aware Prompt Learning over Heterogeneous Knowledge for Commonsense Reasoning. 1051-1060
- Minyang Xu, Yunzhong Lou, Weijian Ma, Xueyang Li, Xiangdong Zhou:
Parametric CAD Primitive Retrieval via Multi-Modal Fusion and Deep Hashing. 1061-1069
- Lai Wei, Mingyuan Xi:
Subspace Clustering with A Hybrid Adaptive Graph Filter. 1070-1078
Regular Short Papers
- Cencen Liu, Dongyang Zhang, Ke Qin:
Knowledge Distillation for Single Image Super-Resolution via Contrastive Learning. 1079-1083
- Yuhang Zheng, Zhen Wang, Long Chen:
Improving Data Augmentation for Robust Visual Question Answering with Effective Curriculum Learning. 1084-1088
- Shuyang Zhang, Liangwu Wei, Qingyu Wang, Yuntao Wei, Yanzhi Song:
CLCP: Realtime Text-Image Retrieval for Retailing via Pre-trained Clustering and Priority Queue. 1089-1093
- Mengzhu Yu, Zhenjun Tang, Huijiang Zhuang, Xiaoping Liang, Zhixin Li, Xianquan Zhang:
Robust Video Hashing with Non-negative Tensor Factorization for Copy Detection. 1094-1098
- Yihua Chen, Xiaoping Liang, Mengzhu Yu, Zhenjun Tang:
Unifying Pictorial and Textual Features for Screen Content Image Quality Evaluation. 1099-1103
- Mingyong Li, Zongwei Zhao, Xiaolong Jiang, Zheng Jiang:
CLIP-ProbCR: CLIP-based Probability embedding Combination Retrieval. 1104-1109
- Peihao Li, Jie Huang, Shuaishuai Zhang, Chunyang Qi:
Proactive Privacy and Intellectual Property Protection of Multimedia Retrieval Models in Edge Intelligence. 1110-1114
- Ruonan Zhang, Xiaohang Liu, Ge Li, Thomas H. Li, Pengjun Zhao:
Sketch-aided Interactive Fusion Point Cloud Place Recognition. 1115-1119
- Huxiao Ji, Haitao Yang, Linchuan Li, Shunyu Zhang, Cunyi Zhang, Xuanping Li, Wenwu Ou:
TIM: Temporal Interaction Model in Notification System. 1120-1124
- Quan Li, Xike Xie, Chao Wang, Jiali Weng:
Local Deep Learning Quantization for Approximate Nearest Neighbor Search. 1125-1129
- Pengfei Zhou, Fangxiang Feng, Xiaojie Wang:
DiffHarmony: Latent Diffusion Model Meets Image Harmonization. 1130-1134
- Haoran Tong, Xinyan Liu, Guorong Li, Laiyun Qing:
Directly Locating Actions in Video with Single Frame Annotation. 1135-1139
- Ruoxi Sun, Xinyu Yang, Cong Qian, Chenyu Zhu, Wei Sui, Zeyd Boukhers, Cong Yang:
YawnNet: A Visual-Centric Approach for Yawning Detection. 1140-1144
- Eisaku Yoshikawa, Keishi Tajima:
Content-Based Exclusion Queries in Keyword-Based Image Retrieval. 1145-1149
- Zhikang Zhang, Zhongjie Zhu, Yongqiang Bai, Ming Wang, Zhijing Yu:
Octree-Retention Fusion: A High-Performance Context Model for Point Cloud Geometry Compression. 1150-1154
- Zhuo Lei, Qiang Yu, Lidan Shou, Shengquan Li, Yunqing Mao:
A GAN based Video Summarization Method with Representation Loss. 1155-1159
- Sherzod Hakimov, Gullal S. Cheema:
Unveiling Global Narratives: A Multilingual Twitter Dataset of News Media on the Russo-Ukrainian Conflict. 1160-1164
- Minh-Son Dao, Koji Zettsu:
Near-Miss Accident Prediction on the Edge: A Real-Time System for Safer Driving. 1165-1169
- Qinghua Sun, Jia Cui, Zhenyu Gu:
Extending CLIP for Text-to-font Retrieval. 1170-1174
- Xitie Zhang, Suping Wu:
CLTalk: Speech-Driven 3D Facial Animation with Contrastive Learning. 1175-1179
- Chih-Pin Tan, Shuen-Huei Guan, Yi-Hsuan Yang:
PiCoGen: Generate Piano Covers with a Two-stage Approach. 1180-1184
- Yueying Feng, Fan Ma, Wang Lin, Chang Yao, Jingyuan Chen, Yi Yang:
FedPAM: Federated Personalized Augmentation Model for Text-to-Image Retrieval. 1185-1189
Brave New Ideas Papers
- Lorin Sweeney, Graham Healy, Alan F. Smeaton:
Reconciling the Rift Between Recognition and Recall: Insights from a Video Memorability Drawing Experiment. 1190-1198
- Kai Uwe Barthel, Florian Tim Barthel, Peter Eisert, Nico Hezel, Konstantin Schall:
Creating Sorted Grid Layouts with Gradient-based Optimization. 1199-1206
- Christian Limberg, Zhe Zhang:
Mapping the Audio Landscape for Innovative Music Sample Generation. 1207-1213
Doctoral Symposium Papers
- Jia-Hong Huang:
Multi-modal Video Summarization. 1214-1218
- Maria Eirini Pegia:
Multimodality in Media Retrieval. 1219-1223
Reproducibility Track Papers
- Shuiying Liao, Yujuan Ding, P. Y. Mok, Qiushi Huang, Jialun Cao:
Reproducibility Companion Paper: Recommendation of Mix-and-Match Clothing by Modeling Indirect Personal Compatibility. 1224-1227
- Yankun Wu, Yuta Nakashima, Noa Garcia, Sheng Li, Zhaoyang Zeng:
Reproducibility Companion Paper: Stable Diffusion for Content-Style Disentanglement in Art Analysis. 1228-1231
- Fan Yu, Beibei Zhang, Yaqun Fang, Jia Bei, Tongwei Ren, Jiyi Li, Luca Rossetto:
Reproducibility Companion Paper of "MMSF: A Multimodal Sentiment-Fused Method to Recognize Video Speaking Style". 1232-1235
Technical Demonstrations
- Luca Rossetto:
OpenLifelogCam - A Low-Cost Open-Source Wearable Camera Platform. 1236-1240
- Panumate Chetprayoon, Sakol Tasanangam, Gayatri Tirumalasetty, Thanatwit Angsarawanee, Paveen Virameteekul, Wadeepas Lertwatanawanich, Theerat Sakdejayont:
CarAI: Car Inspection with Artificial Intelligence. 1241-1245
- Kuo-Yu Liu, Ting-Yu Guo, Ta-Shan Pan, Ping-Yi Tung, Yi-Rou Lin:
AI Batting Buddy: A Computational and Kinematic Approach for Enhancing Batting Performance and Analysis in Baseball. 1246-1250
- Supatta Viriyavisuthisakul, Parinya Sanguansat, Toshihiko Yamasaki:
A Web Demo Interface for Super-Resolution Reconstruction with Parametric Regularization Loss. 1251-1254
- Quang-Linh Tran, Binh T. Nguyen, Gareth J. F. Jones, Cathal Gurrin:
MemoriLens: a Low-cost Lifelog Camera Using Raspberry Pi Zero. 1255-1259
- Maria Eirini Pegia, Dimitris Georgalis, Nick Pantelidis, Björn Þór Jónsson, Anastasia Moumtzidou, Sotiris Diplaris, Ilias Gialampoukidis, Stefanos Vrochidis, Ioannis Kompatsiaris:
3DMSE: An Interactive 3D Media Search Engine. 1260-1264
- Daniel D. Braghis, Haiming Liu:
Conversational Image Search: A Sketch-based Approach. 1265-1269
- Wang Xia, Guodao Sun, Zihao Zhu, Pan Liang, Sujia Zhu, Yiming Wu, Haoran Liang, Ronghua Liang:
RE-IDVIS: Person Re-Identification System based on Interactive Visualization. 1270-1274
Challenge Papers
- Duc-Tien Dang-Nguyen, Sohail Ahmed Khan, Michael Riegler, Pål Halvorsen, Anh-Duy Tran, Minh-Son Dao, Minh-Triet Tran:
Overview of the Grand Challenge on Detecting Cheapfakes at ACM ICMR 2024. 1275-1281
- Hoa-Vien Vo-Hoang, Long-Khanh Pham, Minh-Son Dao:
Detecting Out-of-Context Media with LLaMa-Adapter V2 and RoBERTa: An Effective Method for Cheapfakes Detection. 1282-1287
- Long-Khanh Pham, Hoa-Vien Vo-Hoang, Anh-Duy Tran:
A Generative Adaptive Context Learning Framework for Large Language Models in Cheapfake Detection. 1288-1293
- Anh-Thu Le, Minh-Dat Nguyen, Minh-Son Dao, Anh-Duy Tran, Duc-Tien Dang-Nguyen:
TeGA: A Text-Guided Generative-based Approach in Cheapfake Detection. 1294-1299
- Van-Loc Nguyen, Bao-Tin Nguyen, Thanh-Son Nguyen, Duc-Tien Dang-Nguyen, Minh-Triet Tran:
A Unified Network for Detecting Out-Of-Context Information Using Generative Synthetic Data. 1300-1305
- Dang Vu, Minh-Nhat Nguyen, Quoc-Trung Nguyen:
Enhancing Cheapfake Detection: An Approach Using Prompt Engineering and Interleaved Text-Image Model. 1306-1311
- Jangwon Seo, Hyo-Seok Hwang, Jiyoung Lee, Minhyeok Lee, Wonsuk Kim, Junhee Seok:
A Multi-Stage Deep Learning Approach Incorporating Text-Image and Image-Image Comparisons for Cheapfake Detection. 1312-1316
Invited Talks Abstracts
- Alan F. Smeaton:
The LLM Wrecking Ball: Are We About to Lose Decades of Work in Multimedia because of MM-LLMs? 1317
- Yi-Ping Phoebe Chen:
Diversity in Multimedia. 1318
Tutorial Abstracts
- Frank Sommers, Alisa Kongthon, Sarawoot Kongyoung:
Fine-Tuning Large Language Models for Private Document Retrieval: A Tutorial. 1319-1320
- Vinh Dang, Thanh-Son Nguyen, Minh-Triet Tran, Duc-Tien Dang-Nguyen:
Detecting Misinformation in Photos Utilizing Reverse Image Search. 1321-1323
- Maria Pegia, Sotiris Diplaris, Stefanos Vrochidis, Heiko Schuldt, Florian Spiess, Rahel Arnold, Werner Bailer:
Multimedia Retrieval in and for XR. 1324-1325
- Shiqi Wang, Xinfeng Zhang:
Compact Visual Data Representation for Multimedia Search and Analytics. 1326-1327
Workshop Abstracts
- Tai Tan Mai, Quang-Linh Tran, Ly-Duyen Tran, Van-Tu Ninh, Duc-Tien Dang-Nguyen, Cathal Gurrin:
The First ACM Workshop on AI-Powered Question Answering Systems for Multimedia. 1328-1329
- Mahasak Ketcham, Kanyalag Phodong, Patiyuth Pramkeaw, Worawut Yimyam, Narumol Chumuang, Pokpong Songmuang, Thittaporn Ganokratanaa:
AI-SIPM 2024: International Workshop on Artificial Intelligence for Signal, Image Processing and Multimedia. 1330-1331
- Minh-Son Dao, Michael Alexander Riegler, Duc-Tien Dang-Nguyen, Hanh-Nhi Tran, Rage Uday Kiran, Takahiro Komamizu:
ICDAR 24: Intelligent Cross-Data Analysis and Retrieval. 1332-1333
- Cathal Gurrin, Liting Zhou, Graham Healy, Werner Bailer, Duc-Tien Dang-Nguyen, Steve Hodges, Björn Þór Jónsson, Jakub Lokoc, Luca Rossetto, Minh-Triet Tran, Klaus Schöffmann:
Introduction to the Seventh Annual Lifelog Search Challenge, LSC'24. 1334-1335
- Zhedong Zheng, Yaxiong Wang, Xuelin Qian, Zhun Zhong, Zheng Wang, Liang Zheng:
MORE'24 Multimedia Object Re-ID: Advancements, Challenges, and Opportunities. 1336-1338
- Cristian Lucian Stanciu, Bogdan Ionescu, Luca Cuccovillo, Symeon Papadopoulos, Giorgos Kordopatis-Zilos, Adrian Popescu, Roberto Caldelli:
MAD '24 Workshop: Multimedia AI against Disinformation. 1339-1341
- Marc A. Kastner, Gullal S. Cheema, Sherzod Hakimov, Noa Garcia:
MUWS 2024: The 3rd International Workshop on Multimodal Human Understanding for the Web and Social Media. 1342-1344
- Hui Wang, Josef Kittler, Mark J. F. Gales, Rob Cooper, Maurice D. Mulvenna, Wing W. Y. Ng, Yang Hua, Richard Gault, Abbas Haider, Guanfeng Wu:
MVRMLM 2024: Multimodal Video Retrieval and Multimodal Language Modelling. 1345-1346
- Hongzhang Mu, Shuili Zhang, Hongbo Xu:
A Knowledge-Driven Approach to Enhance Topic Modeling with Multi-Modal Representation Learning. 1347-1355
