


29th ACM Multimedia 2021: Virtual Event, China
- Heng Tao Shen, Yueting Zhuang, John R. Smith, Yang Yang, Pablo César, Florian Metze, Balakrishnan Prabhakaran:
MM '21: ACM Multimedia Conference, Virtual Event, China, October 20 - 24, 2021. ACM 2021, ISBN 978-1-4503-8651-7
Keynote Talks I&II
- Wen Gao:
Video Coding for Machine. 1
- H. V. Jagadish:
Semantic Media Conversion: Possibilities and Limits. 2
Session 1: Deep Learning for Multimedia-I
- Rong Zhang, Wei Li, Yiqun Zhang, Hong Zhang, Jinhui Yu, Ruigang Yang, Weiwei Xu:
Image Re-composition via Regional Content-Style Decoupling. 3-11
- Hao Huang, Shinjae Yoo, Chenxiao Xu:
Deep Clustering based on Bi-Space Association Learning. 12-21
- Seogkyu Jeon, Kibeom Hong, Pilhyeon Lee, Jewook Lee, Hyeran Byun:
Feature Stylization and Domain-aware Contrastive Learning for Domain Generalization. 22-31
- Qi Zhang, Xuesong Zhang, Baoping Li, Yuzhong Chen, Anlong Ming:
HDA-Net: Horizontal Deformable Attention Network for Stereo Matching. 32-40
- Zhaoyang Jia, Han Fang, Weiming Zhang:
MBRS: Enhancing Robustness of DNN-based Watermarking by Mini-Batch of Real and Simulated JPEG Compression. 41-49
- Ye Liu, Lei Zhu, Shunda Pei, Huazhu Fu, Jing Qin, Qing Zhang, Liang Wan, Wei Feng:
From Synthetic to Real: Image Dehazing Collaborating with Unlabeled Real Data. 50-58
Session 2: Deep Learning for Multimedia-II
- Jiangtong Li, Wentao Wang, Junjie Chen, Li Niu, Jianlou Si, Chen Qian, Liqing Zhang:
Video Semantic Segmentation via Sparse Temporal Transformer. 59-68
- Yingchen Yu, Fangneng Zhan, Rongliang Wu, Jianxiong Pan, Kaiwen Cui, Shijian Lu, Feiying Ma, Xuansong Xie, Chunyan Miao:
Diverse Image Inpainting with Bidirectional and Autoregressive Transformers. 69-78
- Hanbang Liang, Xianxu Hou, Linlin Shen:
SSFlow: Style-guided Neural Spline Flows for Face Image Manipulation. 79-87
- Kotaro Kikuchi, Edgar Simo-Serra, Mayu Otani, Kota Yamaguchi:
Constrained Graphic Layout Generation via Latent Optimization. 88-96
- Xiaoya Zhang, Ling Zhou, Yong Li, Zhen Cui, Jin Xie, Jian Yang:
Transfer Vision Patterns for Multi-Task Pixel Learning. 97-106
- Yike Wu, Bo Zhang, Gang Yu, Weixi Zhang, Bin Wang, Tao Chen, Jiayuan Fan:
Object-aware Long-short-range Spatial Alignment for Few-Shot Fine-Grained Image Classification. 107-115
Session 3: Brave New Idea
- Yunan Zhu, Haichuan Ma, Jialun Peng, Dong Liu, Zhiwei Xiong:
Recycling Discriminator: Towards Opinion-Unaware Image Quality Assessment Using Wasserstein GAN. 116-125
- Liangchen Song, Sheng Liu, Celong Liu, Zhong Li, Yuqi Ding, Yi Xu, Junsong Yuan:
Learning Kinematic Formulas from Multiple View Videos. 126-134
- Pingyue Zhang, Mengyue Wu, Heinrich Dinkel, Kai Yu:
DEPA: Self-Supervised Audio Embedding for Depression Detection. 135-143
- Zhaodong Kang, Jianing Li, Lin Zhu, Yonghong Tian:
Retinomorphic Sensing: A Novel Paradigm for Future Multimedia Computing. 144-152
- Haihan Duan, Jiaye Li, Sizheng Fan, Zhonghao Lin, Xiao Wu, Wei Cai:
Metaverse for Social Good: A University Campus Prototype. 153-161
Session 4: Deep Learning for Multimedia-III
- Yueqi Xie, Ka Leong Cheng, Qifeng Chen:
Enhanced Invertible Encoding for Learned Image Compression. 162-170
- Shihao Zhou, Mengxi Jiang, Shanshan Cai, Yunqi Lei:
DC-GNet: Deep Mesh Relation Capturing Graph Convolution Network for 3D Human Shape Reconstruction. 171-180
- Xun Cai, Jiajing Chai, Yanbo Gao, Shuai Li, Bo Zhu:
Deep Marginal Fisher Analysis based CNN for Image Representation and Classification. 181-189
- Yuanzhouhan Cao, Yidong Li, Haokui Zhang, Chao Ren, Yifan Liu:
Learning Structure Affinity for Video Depth Estimation. 190-198
- Jingjing Jiang, Ziyi Liu, Yifan Liu, Zhixiong Nan, Nanning Zheng:
X-GGM: Graph Generative Modeling for Out-of-distribution Generalization in Visual Question Answering. 199-208
- Aichun Zhu, Zijie Wang, Yifeng Li, Xili Wan, Jing Jin, Tian Wang, Fangqiang Hu, Gang Hua:
DSSL: Deep Surroundings-person Separation Learning for Text-based Person Retrieval. 209-217
Session 5: Emerging Multimedia Applications-I
- David D. Nguyen, Surya Nepal, Salil S. Kanhere:
Diverse Multimedia Layout Generation with Multi Choice Learning. 218-226
- Liangchen Liu, Xi Yang, Nannan Wang, Xinbo Gao:
Viewing from Frequency Domain: A DCT-based Information Enhancement Network for Video Person Re-Identification. 227-235
- Yingqing He, Yazhou Xing, Tianjia Zhang, Qifeng Chen:
Unsupervised Portrait Shadow Removal via Generative Priors. 236-244
- Yi Huang, Xiaoshan Yang, Changsheng Xu:
Multimodal Global Relation Knowledge Distillation for Egocentric Action Anticipation. 245-254
- Liuan Wang, Li Sun, Mingjie Zhang, Huigang Zhang, Ping Wang, Rong Zhou, Jun Sun:
Exploring Pathologist Knowledge for Automatic Assessment of Breast Cancer Metastases in Whole-slide Image. 255-263
- Mingxing Duan, Kenli Li, Lingxi Xie, Qi Tian, Bin Xiao:
Towards Multiple Black-boxes Attack via Adversarial Example Generation Network. 264-272
Session 6: Emerging Multimedia Applications-II
- Hao Feng, Yuechen Wang, Wengang Zhou, Jiajun Deng, Houqiang Li:
DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction. 273-281
- Yiyang Gan, Ruize Han, Liqiang Yin, Wei Feng, Song Wang:
Self-supervised Multi-view Multi-Human Association and Tracking. 282-290
- Hongwei Xue, Bei Liu, Huan Yang, Jianlong Fu, Houqiang Li, Jiebo Luo:
Learning Fine-Grained Motion Embedding for Landscape Animation. 291-299
- Ying Li, Hongwei Zhou, Yeyu Yin, Jiaquan Gao:
Multi-label Pattern Image Retrieval via Attention Mechanism Driven Graph Convolutional Network. 300-308
- Na Zheng, Xuemeng Song, Qingying Niu, Xue Dong, Yibing Zhan, Liqiang Nie:
Collocation and Try-on Network: Whether an Outfit is Compatible. 309-317
- Rishabh Baghel, Abhishek Trivedi, Tejas Ravichandran, Ravi Kiran Sarvadevabhatla:
MeronymNet: A Hierarchical Model for Unified and Controllable Multi-Category Object Generation. 318-326
Session 7: Emerging Multimedia Applications-III
- Akash Gupta, Padmaja Jonnalagedda, Bir Bhanu, Amit K. Roy-Chowdhury:
Ada-VSR: Adaptive Video Super-Resolution with Meta-Learning. 327-336
- Minha Kim, Shahroz Tariq, Simon S. Woo:
CoReD: Generalizing Fake Media Detection with Continual Representation using Distillation. 337-346
- Xiaowen Ying, Xin Li, Mooi Choo Chuah:
SRNet: Spatial Relation Network for Efficient Single-stage Instance Segmentation in Videos. 347-356
- Zilong Shao, Siyang Song, Shashank Jaiswal, Linlin Shen, Michel F. Valstar, Hatice Gunes:
Personality Recognition by Modelling Person-specific Cognitive Processes using Graph Representation. 357-366
- Xiaopeng Guo, Zhijie Huang, Jie Gao, Mingyu Shang, Maojing Shu, Jun Sun:
Enhancing Knowledge Tracing via Adversarial Training. 367-375
- Gangyan Zeng, Yuan Zhang, Yu Zhou, Xiaomeng Yang:
Beyond OCR + VQA: Involving OCR into the Flow for Robust and Accurate TextVQA. 376-385
Poster Session 1
- Qing Guo, Xiaoguang Li, Felix Juefei-Xu, Hongkai Yu, Yang Liu, Song Wang:
JPGNet: Joint Predictive Filtering and Generative Network for Image Inpainting. 386-394
- Yihao Huang, Qing Guo, Felix Juefei-Xu, Lei Ma, Weikai Miao, Yang Liu, Geguang Pu:
AdvFilter: Predictive Perturbation-aware Filtering against Adversarial Attack via Multi-domain Learning. 395-403
- Zizheng Yan, Xianggang Yu, Yipeng Qin, Yushuang Wu, Xiaoguang Han, Shuguang Cui:
Pixel-level Intra-domain Adaptation for Semantic Segmentation. 404-413
- Xugong Qin, Yu Zhou, Youhui Guo, Dayan Wu, Zhihong Tian, Ning Jiang, Hongbin Wang, Weiping Wang:
Mask is All You Need: Rethinking Mask R-CNN for Dense and Arbitrary-Shaped Scene Text Detection. 414-423
- Chuanjun Zheng, Daming Shi, Yukun Liu:
Windowing Decomposition Convolutional Neural Network for Image Enhancement. 424-432
- Weiming Zhuang, Yonggang Wen, Shuai Zhang:
Joint Optimization in Edge-Cloud Continuum for Federated Unsupervised Person Re-identification. 433-441
- Zehai Niu, Ke Lu, Jian Xue, Haifeng Ma, Runchen Wei:
Multi-view 3D Smooth Human Pose Estimation based on Heatmap Filtering and Spatio-temporal Information. 442-450
- Yu-Ke Li, Pin Wang, Mang Ye, Ching-Yao Chan:
Imitative Learning for Multi-Person Action Forecasting. 451-459
- Ruikang Xu, Zeyu Xiao, Mingde Yao, Yueyi Zhang, Zhiwei Xiong:
Stereo Video Super-Resolution via Exploiting View-Temporal Correlations. 460-468
- Jiawei Zhao, Yifan Zhao, Jia Li:
M3TR: Multi-modal Multi-label Recognition with Transformer. 469-477
- Luchuan Song, Bin Liu, Guojun Yin, Xiaoyi Dong, Yufei Zhang, Jia-Xuan Bai:
TACR-Net: Editing on Deep Video and Voice Portraits. 478-486
- Yixiong Zou, Shanghang Zhang, Guangyao Chen, Yonghong Tian, Kurt Keutzer, José M. F. Moura:
Annotation-Efficient Untrimmed Video Action Recognition. 487-495
- Hsiao-Han Lu, Shao-En Weng, Ya-Fan Yen, Hong-Han Shuai, Wen-Huang Cheng:
Face-based Voice Conversion: Learning the Voice behind a Face. 496-505
- Xiongwei Wu, Xin Fu, Ying Liu, Ee-Peng Lim, Steven C. H. Hoi, Qianru Sun:
A Large-Scale Benchmark for Food Image Segmentation. 506-515
- Guowen Zhang, Pingping Zhang, Jinqing Qi, Huchuan Lu:
HAT: Hierarchical Aggregation Transformers for Person Re-identification. 516-525
- Qinglin Liu, Haozhe Xie, Shengping Zhang, Bineng Zhong, Rongrong Ji:
Long-Range Feature Propagating for Natural Image Matting. 526-534
- Ansheng You, Chenglin Zhou, Qixuan Zhang, Lan Xu:
Towards Controllable and Photorealistic Region-wise Image Manipulation. 535-543
- Zhuangzi Li, Ge Li, Thomas H. Li, Shan Liu, Wei Gao:
Information-Growth Attention Network for Image Super-Resolution. 544-552
- Jiale Li, Hang Dai, Ling Shao, Yong Ding:
Anchor-free 3D Single Stage Detector with Mask-Guided Attention for Point Cloud. 553-562
- Xin Gao, Zhenjiang Liu, Zunlei Feng, Chengji Shen, Kairi Ou, Haihong Tang, Mingli Song:
Shape Controllable Virtual Try-on for Underwear Models. 563-572
- Zhiwei Chen, Liujuan Cao, Yunhang Shen, Feihong Lian, Yongjian Wu, Rongrong Ji:
E2Net: Excitative-Expansile Learning for Weakly Supervised Object Localization. 573-581
- Jiahao Wang, Yunhong Wang, Sheng Liu, Annan Li:
Few-shot Fine-Grained Action Recognition via Bidirectional Attention and Contrastive Meta-Learning. 582-591
- Yi Tan, Yanbin Hao, Xiangnan He, Yinwei Wei, Xun Yang:
Selective Dependency Aggregation for Action Classification. 592-601
- Wenbo Hu, Changgong Zhang, Fangneng Zhan, Lei Zhang, Tien-Tsin Wong:
Conditional Directed Graph Convolution for 3D Human Pose Estimation. 602-611
- Gangming Zhao:
Cross Chest Graph for Disease Diagnosis with Structural Relational Reasoning. 612-620
- Qi Wen, Shuang Li, Bingfeng Han, Yi Yuan:
ZiGAN: Fine-grained Chinese Calligraphy Font Generation via a Few-shot Style Transfer Approach. 621-629
- Hao Wang, Guosheng Lin, Steven C. H. Hoi, Chunyan Miao:
Cycle-Consistent Inverse GAN for Text-to-Image Synthesis. 630-638
- Hu Wang, Peng Chen, Bohan Zhuang, Chunhua Shen:
Fully Quantized Image Super-Resolution Networks. 639-647
- Haonan Zhang, Longjun Liu, Hengyi Zhou, Wenxuan Hou, Hongbin Sun, Nanning Zheng:
AKECP: Adaptive Knowledge Extraction from Feature Maps for Fast and Efficient Channel Pruning. 648-657
- Qiangqiang Wu, Jia Wan, Antoni B. Chan:
Dynamic Momentum Adaptation for Zero-Shot Cross-Domain Crowd Counting. 658-666
- Miao Zhang, Tingwei Liu, Yongri Piao, Shunyu Yao, Huchuan Lu:
Auto-MSFNet: Search Multi-scale Fusion Network for Salient Object Detection. 667-676
- Shengqi Huang, Wanqi Yang, Lei Wang, Luping Zhou, Ming Yang:
Few-shot Unsupervised Domain Adaptation with Image-to-Class Sparse Similarity Encoding. 677-685
- Xuanhan Wang, Lianli Gao, Yan Dai, Yixuan Zhou, Jingkuan Song:
Semantic-aware Transfer with Instance-adaptive Parsing for Crowded Scenes Pose Estimation. 686-694
- Haoyu Zhang, Meng Liu, Zan Gao, Xiaoqiang Lei, Yinglong Wang, Liqiang Nie:
Multimodal Dialog System: Relational Graph-based Context-aware Question Understanding. 695-703
- Jingwei Liao, Yanli Liu, Guanyu Xing, Housheng Wei, Jueyu Chen, Songhua Xu:
Shadow Detection via Predicting the Confidence Maps of Shadow Detection Methods. 704-712
- Pengxiang Su, Zhenguang Liu, Shuang Wu, Lei Zhu, Yifang Yin, Xuanjing Shen:
Motion Prediction via Joint Dependency Modeling in Phase Space. 713-721
- Hao Su, Jianwei Niu, Xuefeng Liu, Qingfeng Li, Ji Wan, Mingliang Xu:
Q-Art Code: Generating Scanning-robust Art-style QR Codes by Deformable Convolution. 722-730
- Wenbo Zhang, Ge-Peng Ji, Zhuo Wang, Keren Fu, Qijun Zhao:
Depth Quality-Inspired Feature Manipulation for Efficient RGB-D Salient Object Detection. 731-740
- Yixiong Zou, Shanghang Zhang, Jianpeng Yu, Yonghong Tian, José M. F. Moura:
Revisiting Mid-Level Patterns for Cross-Domain Few-Shot Recognition. 741-749
- Yuqi Sun, Ri Cheng, Bo Yan, Shili Zhou:
Space-Angle Super-Resolution for Multi-View Images. 750-759
- Wei Wang, Junyu Gao, Changsheng Xu:
Weakly-Supervised Video Object Grounding via Stable Context Learning. 760-768
- Yukun Su, Guosheng Lin, Ruizhou Sun, Yun Hao, Qingyao Wu:
Modeling the Uncertainty for Self-supervised 3D Skeleton Action Representation Learning. 769-778
- Rongyun Mo, Yan Yan, Jing-Hao Xue, Si Chen, Hanzi Wang:
D³Net: Dual-Branch Disturbance Disentangling Network for Facial Expression Recognition. 779-787
- Yukang Zhang, Yan Yan, Yang Lu, Hanzi Wang:
Towards a Unified Middle Modality Learning for Visible-Infrared Person Re-Identification. 788-796
- Yuhao Cui, Zhou Yu, Chunqi Wang, Zhongzhou Zhao, Ji Zhang, Meng Wang, Jun Yu:
ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration. 797-806
- Xuanxiang Lin, Ke Chen, Kui Jia:
Object Point Cloud Classification via Poly-Convolutional Architecture Search. 807-815
- Xiao Wang, Weirong Ye, Zhongang Qi, Xun Zhao, Guangge Wang, Ying Shan, Hanzi Wang:
Semantic-Guided Relation Propagation Network for Few-shot Action Recognition. 816-825
- Yunjie Ge, Qian Wang, Baolin Zheng, Xinlu Zhuang, Qi Li, Chao Shen, Cong Wang:
Anti-Distillation Backdoor Attacks: Backdoors Can Really Survive in Knowledge Distillation. 826-834
- Yinglu Liu, Mingcan Xiang, Hailin Shi, Tao Mei:
One-stage Context and Identity Hallucination Network. 835-843
- Zhi Chen, Yadan Luo, Sen Wang, Ruihong Qiu, Jingjing Li, Zi Huang:
Mitigating Generation Shifts for Generalized Zero-Shot Learning. 844-852
- Yuan Ji, Xu Jia, Huchuan Lu, Xiang Ruan:
Weakly-Supervised Temporal Action Localization via Cross-Stream Collaborative Learning. 853-861
- Cheng Chen, Jiayin Cai, Yao Hu, Xu Tang, Xinggang Wang, Chun Yuan, Xiang Bai, Song Bai:
Deep Interactive Video Inpainting: An Invisibility Cloak for Harry Potter. 862-870
- Chenchen Liu, Yadong Mu:
Searching Motion Graphs for Human Motion Synthesis. 871-879
- Hanbin Zhao, Xin Qin, Shihao Su, Yongjian Fu, Zibo Lin, Xi Li:
When Video Classification Meets Incremental Classes. 880-889
- Yulin He, Wei Chen, Zhengfa Liang, Dan Chen, Yusong Tan, Xin Luo, Chen Li, Yulan Guo:
Fast and Accurate Lane Detection via Frequency Domain Learning. 890-898
- Yifang Yin, Ying Zhang, Zhenguang Liu, Yuxuan Liang, Sheng Wang, Rajiv Ratn Shah, Roger Zimmermann:
Learning Multi-context Aware Location Representations from Large-scale Geotagged Images. 899-907
- Xiaojing Zhong, Zhonghua Wu, Taizhe Tan, Guosheng Lin, Qingyao Wu:
MV-TON: Memory-based Video Virtual Try-on network. 908-916
- Hao Zhang, Yanbin Hao, Chong-Wah Ngo:
Token Shift Transformer for Video Classification. 917-925
- Rui Wang, Jian Chen, Gang Yu, Li Sun, Changqian Yu, Changxin Gao, Nong Sang:
Attribute-specific Control Units in StyleGAN for Fine-grained Image Manipulation. 926-934
- Zhihao Peng, Hui Liu, Yuheng Jia, Junhui Hou:
Attention-driven Graph Clustering Network. 935-943
- Tianhao Fu, Yingying Li, Xiaoqing Ye, Xiao Tan, Hao Sun, Fumin Shen, Errui Ding:
Lifting the Veil of Frequency in Joint Segmentation and Depth Estimation. 944-952
Panel 1
- João Magalhães, Tat-Seng Chua, Tao Mei, Alan F. Smeaton:
The Next Generation Multimodal Conversational Search and Recommendation. 953-954
Session 8: Emerging Multimedia Applications-IV
- Guanze Liu, Yu Rong, Lu Sheng:
VoteHMR: Occlusion-Aware Voting Network for Robust 3D Human Mesh Recovery from Partial Point Clouds. 955-964
- Shao-Kui Zhang, Yi-Xiao Li, Yu He, Yong-Liang Yang, Song-Hai Zhang:
MageAdd: Real-Time Interaction Simulation for Scene Synthesis. 965-973
- Gaowen Liu, Hao Tang, Hugo Latapie, Jason J. Corso, Yan Yan:
Cross-View Exocentric to Egocentric Video Synthesis. 974-982
- Sachin Mehta, Amit Kumar, Fitsum A. Reda, Varun Nasery, Vikram Mulukutla, Rakesh Ranjan, Vikas Chandra:
EVRNet: Efficient Video Restoration on Edge Devices. 983-992
- Jingru Gan, Jinchang Luo, Haiwei Wang, Shuhui Wang, Wei He, Qingming Huang:
Multimodal Entity Linking: A New Dataset and A Baseline. 993-1001
- Xichu Ma, Ye Wang, Min-Yen Kan, Wee Sun Lee:
AI-Lyricist: Generating Music and Vocabulary Constrained Lyrics. 1002-1011
Session 9: Emotional and Social Signals in Multimedia
- Yingjie Chen, Diqi Chen, Yizhou Wang, Tao Wang, Yun Liang:
CaFGraph: Context-aware Facial Multi-graph Representation for Facial Action Unit Recognition. 1029-1037
- Jingwei Yan, Jingjing Wang, Qiang Li, Chunmao Wang, Shiliang Pu:
Self-Supervised Regional and Temporal Auxiliary Tasks for Facial Action Unit Recognition. 1038-1046
- Ziyu Jia, Youfang Lin, Jing Wang, Zhiyang Feng, Xiangheng Xie, Caijie Chen:
HetEmotionNet: Two-Stream Heterogeneous Graph Recurrent Neural Network for Multi-modal Emotion Recognition. 1047-1056
- Xu Yan, Li-Ming Zhao, Bao-Liang Lu:
Simplifying Multimodal Emotion Recognition with Single Eye Movement Modality. 1057-1063