TopicCAT: Unsupervised Topic-Guided Co-Attention Transformer for Extreme Multimodal Summarisation


Research field: Artificial Intelligence



Conference: MM '23: Proceedings of the 31st ACM International Conference on Multimedia


Abstract

The exponential growth of multimedia data has sparked a surge of interest in multimodal summarisation with multimodal output (MSMO). A relatively unexplored but essential task within this field is extreme multimodal summarisation, a process that involves creating extremely concise multimodal summaries to further address the issue of multimedia information overload. In this study, we propose a novel Unsupervised Topic-guided Co-Attention Transformer (TopicCAT) neural network to produce extreme multimodal summaries for video-document pairs. The approach consists of two learning stages for a comprehensive multimodal understanding, guided by topic-based insights: a unimodal learning stage and a cross-modal learning stage, in which a cross-modal topic model is devised to capture the overarching themes present in both documents and videos. To achieve unsupervised learning, eliminating the need for resource-expensive collection of ground-truth multimodal summaries, we propose an optimal transport-based optimisation scheme to evaluate summary coverage from a semantic distribution perspective at the topic-level. Comprehensive experiments demonstrate the effectiveness of our proposed TopicCAT method on a multimodal news dataset, achieving a BERTScore of 84.46 and an accuracy of 0.60.
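The abstract does not spell out the optimal transport-based scheme, so the following is only a minimal sketch of the general idea: comparing two topic distributions (for example, one for the source and one for a candidate summary) with an entropy-regularised optimal transport cost computed by Sinkhorn iterations. Sinkhorn is a standard solver swapped in here for illustration, not necessarily what the authors use. The function name `sinkhorn_distance`, the parameters `reg` and `n_iters`, and the toy topic distributions and cost matrix are all assumptions and do not come from the paper.

```python
import math


def sinkhorn_distance(a, b, cost, reg=0.1, n_iters=200):
    """Entropy-regularised optimal transport cost between histograms a and b.

    a, b : lists of non-negative weights, each summing to 1 (hypothetical
           topic distributions for source and summary).
    cost : cost[i][j] is the assumed cost of moving mass from topic i to topic j.
    """
    n, m = len(a), len(b)
    # Gibbs kernel derived from the cost matrix.
    K = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan and its total cost.
    plan = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    return sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))


# Toy example: two topics, unit cost for moving mass between different topics.
toy_cost = [[0.0, 1.0], [1.0, 0.0]]
source_topics = [0.9, 0.1]       # hypothetical source topic distribution
good_summary = [0.9, 0.1]        # covers the same topics
poor_summary = [0.1, 0.9]        # focuses on the minor topic

good = sinkhorn_distance(source_topics, good_summary, toy_cost)
poor = sinkhorn_distance(source_topics, poor_summary, toy_cost)
```

Under these assumptions, a lower transport cost indicates that the summary's topic distribution covers the source more closely, so `good` comes out smaller than `poor`. In an unsupervised setting, a cost of this kind could serve as a training signal without ground-truth summaries.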


Authors

Peggy Tang, The University of Sydney, Sydney, NSW, Australia
Kun Hu, The University of Sydney, Sydney, NSW, Australia
Lei Zhang, International Digital Economy Academy, Shenzhen, China

Paper information

Publication year: 2023
Citations: 4
Publication countries: Australia, China, United States
Site: ACM
