


29th MMM 2023: Bergen, Norway - Part I
- Duc-Tien Dang-Nguyen, Cathal Gurrin, Martha A. Larson, Alan F. Smeaton, Stevan Rudinac, Minh-Son Dao, Christoph Trattner, Phoebe Chen:
MultiMedia Modeling - 29th International Conference, MMM 2023, Bergen, Norway, January 9-12, 2023, Proceedings, Part I. Lecture Notes in Computer Science 13833, Springer 2023, ISBN 978-3-031-27076-5
Detection, Recognition and Identification
- Ziyan Liao, Dening Di, Jingsong Hao, Jiang Zhang, Shulei Zhu, Jun Yin:
MMM-GCN: Multi-Level Multi-Modal Graph Convolution Network for Video-Based Person Identification. 3-15
- Chong-Jian Zhang, Song-Lu Chen, Qi Liu, Zhi-Yong Huang, Feng Chen, Xu-Cheng Yin:
Feature Enhancement and Reconstruction for Small Object Detection. 16-27
- Zhiyong Zhou, Yuanning Liu, Xiaodong Zhu, Shuai Liu, Shaoqiang Zhang, Zhen Liu:
Toward More Accurate Heterogeneous Iris Recognition with Transformers and Capsules. 28-40
- Xiaotian Wang, Letian Zhao, Wei Wu, Xi Jin:
MCANet: Multiscale Cross-Modality Attention Network for Multispectral Pedestrian Detection. 41-53
Human Action Understanding
- Yibo Hu, Chenyu Cao, Fangtao Li, Chenghao Yan, Jinsheng Qi, Bin Wu:
Overall-Distinctive GCN for Social Relation Recognition on Videos. 57-68
- Haoran Ren, Hao Ren, Hong Lu, Cheng Jin:
Weakly-Supervised Temporal Action Localization with Regional Similarity Consistency. 69-81
- Yanrui Niu, Jingyao Yang, Chao Liang, Baojin Huang, Zhongyuan Wang:
A Spatio-Temporal Identity Verification Method for Person-Action Instance Search in Movies. 82-94
- Hongfeng Han, Zhiwu Lu, Ji-Rong Wen:
Binary Neural Network for Video Action Recognition. 95-106
Image Quality Assessment and Enhancement
- Bowen Wan, Daming Shi, Yukun Liu:
STN: Stochastic Triplet Neighboring Approach to Self-supervised Denoising from Limited Noisy Images. 109-120
- Haodian Wang, Yang Wang, Yang Cao, Zheng-Jun Zha:
Fusion-Based Low-Light Image Enhancement. 121-133
- Ailin Li, Lei Zhao, Zhiwen Zuo, Zhizhong Wang, Wei Xing, Dongming Lu:
Towards Interactive Facial Image Inpainting by Text or Exemplar Image. 134-148
- Yihua Chen, Zhiyuan Chen, Mengzhu Yu, Zhenjun Tang:
Dual-Feature Aggregation Network for No-Reference Image Quality Assessment. 149-161
Multimedia Analytics Application
- Jiaying Lan, Lianglun Cheng, Guoheng Huang, Chi-Man Pun, Xiaochen Yuan, Shangyu Lai, Hongrui Liu, Wing-Kuen Ling:
Single Cross-domain Semantic Guidance Network for Multimodal Unsupervised Image Translation. 165-177
- Itthisak Phueaksri, Marc A. Kastner, Yasutomo Kawanishi, Takahiro Komamizu, Ichiro Ide:
Towards Captioning an Image Collection from a Combined Scene Graph Representation Approach. 178-190
- Jianghai Wang, Menghao Hu, Yaguang Song, Xiaoshan Yang:
Health-Oriented Multimodal Food Question Answering. 191-203
- Golsa Tahmasebzadeh, Eric Müller-Budack, Sherzod Hakimov, Ralph Ewerth:
MM-Locate-News: Multimodal Focus Location Estimation in News. 204-216
Multimedia Content Generation
- Jiyun Li, Yuan Gao, Chen Qian, Jiachen Lu, Zhongqin Chen:
C-GZS: Controllable Person Image Synthesis Based on Group-Supervised Zero-Shot Learning. 219-230
- Fan Zhang, Naye Ji, Fuxing Gao, Yongping Li:
DiffMotion: Speech-Driven Gesture Synthesis Using Denoising Diffusion Model. 231-242
- Dongjin Huang, Yue Zhang, Zhenyan Li, Jinhua Liu:
TG-Dance: TransGAN-Based Intelligent Dance Generation with Music. 243-254
- Zi Chai, Xiaojun Wan, Soyeon Caren Han, Josiah Poon:
Visual Question Generation Under Multi-granularity Cross-Modal Interaction. 255-266
Multimodal and Multidimensional Imaging Application
- Haoyi Xiu, Xin Liu, Weimin Wang, Kyoung-Sook Kim, Takayuki Shinohara, Qiong Chang, Masashi Matsuoka:
Optimizing Local Feature Representations of 3D Point Clouds with Anisotropic Edge Modeling. 269-281
- Tao Wen, Chao Liang, You-Ming Fu, Chun-Xia Xiao, Hai-Ming Xiang:
Floor Plan Analysis and Vectorization with Multimodal Information. 282-293
- Pengwei Tang, Huayi Tang, Wei Wang, Yong Liu:
Safe Contrastive Clustering. 294-305
- Shufan Dai, Yangjie Cao, Pengsong Duan, Xianfu Chen:
SRes-NeRF: Improved Neural Radiance Fields for Realism and Accuracy of Specular Reflections. 306-317
Real-Time and Interactive Application
- Zhi-Yong Huang, Song-Lu Chen, Qi Liu, Chong-Jian Zhang, Feng Chen, Xu-Cheng Yin:
LiteHandNet: A Lightweight Hand Pose Estimation Network via Structural Feature Enhancement. 321-333
- Nikhil Kumar Tomar, Debesh Jha, Ulas Bagci:
DilatedSegNet: A Deep Dilated Segmentation Network for Polyp Segmentation. 334-344
- Hsin-Hung Chen, Alexander Lerch:
Music Instrument Classification Reprogrammed. 345-357
- Mingqi Chen, Shaodong Li, Feng Shuang, Kai Luo:
Cascading CNNs with S-DQN: A Parameter-Parsimonious Strategy for 3D Hand Pose Estimation. 358-369
ICDAR: Intelligent Cross-Data Analysis and Retrieval
- Yuzhe Hao, Kuniaki Uto, Asako Kanezaki, Ikuro Sato, Rei Kawakami, Koichi Shinoda:
EvIs-Kitchen: Egocentric Human Activities Recognition with Video and Inertial Sensor Data. 373-384
- Longlong Zhou, Xiaojun Wu, Tianyang Xu:
COMIM-GAN: Improved Text-to-Image Generation via Condition Optimization and Mutual Information Maximization. 385-396
- Jakub Lokoc, Ladislav Peska:
A Study of a Cross-modal Interactive Search Tool Using CLIP and Temporal Fusion. 397-408
- Dinh-Duy Pham, Minh-Son Dao, Thanh-Binh Nguyen:
A Cross-modal Attention Model for Fine-Grained Incident Retrieval from Dashcam Videos. 409-420
- Mingliang Liang, Zhuoran Liu, Martha A. Larson:
Textual Concept Expansion with Commonsense Knowledge to Improve Dual-Stream Image-Text Matching. 421-433
- Alireza Hossein Zadeh Nik, Michael A. Riegler, Pål Halvorsen, Andrea M. Storås:
Generation of Synthetic Tabular Healthcare Data Using Generative Adversarial Networks. 434-446
- Quoc-Cuong Le, Minh-Quan Le, Mai-Khiem Tran, Ngoc-Quyen Le, Minh-Triet Tran:
FL-Former: Flood Level Estimation with Vision Transformer for Images from Cameras in Urban Areas. 447-459
MDRE: Multimedia Datasets for Repeatable Experimentation
- Tsung-Han Ho, Chen-Yin Yu, Tsai-Yen Ko, Wei-Ta Chu:
The NCKU-VTF Dataset and a Multi-scale Thermal-to-Visible Face Synthesis System. 463-475
- Viktor Lakic, Luca Rossetto, Abraham Bernstein:
Link-Rot in Web-Sourced Multimedia Datasets. 476-488
- Werner Bailer, Hannes Fassold:
People@Places and ToDY: Two Datasets for Scene Classification in Media Production and Archiving. 489-501
- Michael A. Riegler, Vajira Thambawita, Ayan Chatterjee, Binh T. Nguyen, Steven Alexander Hicks, Vibeke Telle-Hansen, Svein Arne Pettersen, Dag Johansen, Ramesh C. Jain, Pål Halvorsen:
ScopeSense: An 8.5-Month Sport, Nutrition, and Lifestyle Lifelogging Dataset. 502-514
- Yuan Lin, Zhaoqi Chu, Jari Korhonen, Jiayi Xu, Xiangrong Liu, Juan Liu, Min Liu, Lvping Fang, Weidi Yang, Debasish Ghose, Junyong You:
Fast Accurate Fish Recognition with Deep Learning Based on a Domain-Specific Large-Scale Fish Dataset. 515-526
- Maarten Sukel, Stevan Rudinac, Marcel Worring:
GIGO, Garbage In, Garbage Out: An Urban Garbage Classification Dataset. 527-538
- Quang-Trung Truong, Tuan-Anh Vu, Tan-Sang Ha, Jakub Lokoc, Yue Him Tim Wong, Ajay Joneja, Sai-Kit Yeung:
Marine Video Kit: A New Marine Video Dataset for Content-Based Analysis and Retrieval. 539-550
SNL: Sport and Nutrition Lifelogging
- Tor-Arne S. Nordmo, Michael A. Riegler, Håvard D. Johansen, Dag Johansen:
Arctic HARE: A Machine Learning-Based System for Performance Analysis of Cross-Country Skiers. 553-564
- Matthias Boeker, Cise Midoglu:
Soccer Athlete Data Visualization and Analysis with an Interactive Dashboard. 565-576
- Bjørn Aslak Juliussen, Jon Petter Rui, Dag Johansen:
Sport and Nutrition Digital Analysis: A Legal Assessment. 577-588
- Nitish Nagesh, Iman Azimi, Tom Andriola, Amir M. Rahmani, Ramesh C. Jain:
Towards Deep Personal Lifestyle Models Using Multimodal N-of-1 Data. 589-600
- Aakash Sharma, Katja Pauline Czerwinska, Dag Johansen, Håvard D. Johansen:
Capturing Nutrition Data for Sports: Challenges and Ethical Issues. 601-612
VBS: Video Browser Showdown
- Giuseppe Amato, Paolo Bolettieri, Fabio Carrara, Fabrizio Falchi, Claudio Gennaro, Nicola Messina, Lucia Vadicamo, Claudio Vairo:
VISIONE at Video Browser Showdown 2023. 615-621
- Florian Spiess, Silvan Heller, Luca Rossetto, Loris Sauter, Philipp Weber, Heiko Schuldt:
Traceable Asynchronous Workflows in Video Retrieval with vitrivr-VR. 622-627
- Jakub Lokoc, Zuzana Vopálková, Patrik Dokoupil, Ladislav Peska:
Video Search with CLIP and Interactive Text Query Reformulation. 628-633
- Sebastian Lubos, Massimiliano Rubino, Christian Tautschnig, Markus Tautschnig, Boda Wen, Klaus Schoeffmann, Alexander Felfernig:
Perfect Match in Video Retrieval. 634-639
- Weixi Song, Jiangshan He, Xinghan Li, Shiwei Feng, Chao Liang:
QIVISE: A Quantum-Inspired Interactive Video Search Engine in VBS2023. 640-645
- Loris Sauter, Ralph Gasser, Silvan Heller, Luca Rossetto, Colin Saladin, Florian Spiess, Heiko Schuldt:
Exploring Effective Interactive Text-Based Video Search in vitrivr. 646-651
- Nhat Hoang-Xuan, E-Ro Nguyen, Thang-Long Nguyen-Ho, Minh-Khoi Pham, Quang-Thuc Nguyen, Hoang-Phuc Trang-Trung, Van-Tu Ninh, Tu-Khiem Le, Cathal Gurrin, Minh-Triet Tran:
V-FIRST 2.0: Video Event Retrieval with Flexible Textual-Visual Intermediary for VBS 2023. 652-657
- Nick Pantelidis, Stelios Andreadis, Maria Pegia, Anastasia Moumtzidou, Damianos Galanopoulos, Konstantinos Apostolidis, Despoina Touska, Konstantinos Gkountakos, Ilias Gialampoukidis, Stefanos Vrochidis, Vasileios Mezaris, Ioannis Kompatsiaris:
VERGE in VBS 2023. 658-664
- Konstantin Schall, Nico Hezel, Klaus Jung, Kai Uwe Barthel:
Vibro: Video Browsing with Semantic and Visual Image Embeddings. 665-670
- Thao-Nhu Nguyen, Bunyarit Puangthamawathanakun, Annalina Caputo, Graham Healy, Binh T. Nguyen, Chonlameth Arpnikanondt, Cathal Gurrin:
VideoCLIP: An Interactive CLIP-based Video Retrieval System at VBS2023. 671-677
- Rahel Arnold, Loris Sauter, Heiko Schuldt:
Free-Form Multi-Modal Multimedia Retrieval (4MR). 678-683
- Klaus Schoeffmann, Daniela Stefanics, Andreas Leibetseder:
diveXplore at the Video Browser Showdown 2023. 684-689
- Zhixin Ma, Jiaxin Wu, Weixiong Loo, Chong-Wah Ngo:
Reinforcement Learning Enhanced PicHunter for Interactive Search. 690-696
