Multimodal Dialog Act Classification for Digital Character Conversations


Research area: Infrastructure



Conference: CUI '24: Proceedings of the 6th ACM Conference on Conversational User Interfaces


Abstract

Dialog act classification is essential for enabling digital characters to understand and respond effectively to user intents, leading to more engaging and seamless interactions. Previous research has focused on classifying dialog acts from transcriptions alone due to missing multimodal data. We close this gap by collecting a new multimodal (i.e., text, audio, video) dyadic dialog dataset from 60 participants. Based on our dataset, we developed a novel multimodal Transformer-based dialog act classification model. We show that our model can predict dialog acts in real-time on four classes with a Macro F1 score up to 80.81, outperforming the unimodal baseline by 1.24%. Our analysis shows that the segments of a sentence associated with the highest acoustic energy are most predictive. By harnessing our new multimodal dataset, we pave the way for dynamic, real-time, and contextually rich conversations that enhance the experience of interactions with digital characters.
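
The abstract reports its headline result as a Macro F1 score over four classes. As a minimal sketch of how that metric is generally computed (not the authors' evaluation code), the snippet below averages the per-class F1 scores with equal weight. The four class names and the toy predictions are hypothetical placeholders; the abstract does not name the classes.

```python
def macro_f1(y_true, y_pred, labels):
    """Return the unweighted mean of the per-class F1 scores."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical class names, chosen only for illustration.
LABELS = ["question", "statement", "backchannel", "other"]

y_true = ["question", "question", "statement", "statement",
          "backchannel", "backchannel", "other", "other"]
y_pred = ["question", "statement", "statement", "statement",
          "backchannel", "backchannel", "other", "question"]

print(round(macro_f1(y_true, y_pred, LABELS), 4))  # 0.7417
```

Because every class contributes equally to the average, a rare class that is classified poorly lowers Macro F1 as much as a frequent one would, which is why the metric is often preferred for imbalanced label sets.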


Authors

Philine Witzig
Department of Computer Science, ETH Zurich, Switzerland

Rares Constantin
Department of Computer Science, ETH Zurich, Switzerland

Nikola Kovačević
Department of Computer Science, ETH Zurich, Switzerland

Paper information

Publication year: 2024
Citations: 3
Site: ACM