ITRT(IT Research Trends)

AVA-AVD: Audio-visual Speaker Diarization in the Wild

Research field: Strategies

Paper keywords: #indoor #showlab #sitcoms #documentaries

Conference: MM '22: Proceedings of the 30th ACM International Conference on Multimedia

Abstract

Audio-visual speaker diarization aims to detect "who spoke when" using both auditory and visual signals. Existing audio-visual diarization datasets mainly focus on indoor environments such as meeting rooms or news studios, which differ considerably from in-the-wild videos in many scenarios such as movies, documentaries, and audience sitcoms. To develop diarization methods for these challenging videos, we create the AVA Audio-Visual Diarization (AVA-AVD) dataset. Our experiments demonstrate that adding AVA-AVD to the training set produces significantly better diarization models for in-the-wild videos, even though the dataset is relatively small. Moreover, this benchmark is challenging due to its diverse scenes, complicated acoustic conditions, and completely off-screen speakers. As a first step towards addressing these challenges, we design the Audio-Visual Relation Network (AVR-Net), which introduces a simple yet effective modality mask to capture discriminative information based on face visibility. Experiments show that our method not only outperforms state-of-the-art methods but is also more robust as the ratio of off-screen speakers varies. Our data and code have been made publicly available at https://github.com/showlab/AVA-AVD.

📄 Paper information

Publication year	2022
Citations	25
Country of publication	Singapore, China
Site	ACM
Likes	0

Eric Zhongcong Xu

Zeyang Song

Satoshi Tsutsui

Chao Feng

Mang Ye

Mike Zheng Shou
