A Multimodal Activation Detection Model for Wake-Free Robots


Research field: Safety



Conference: Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM) Joint International Conference on Web and Big Data


Abstract

When interacting with a robot, the first step is usually to wake it with a wake word such as ‘XiaoduXiaodu’ or ‘Hey, Siri’. These wake words undoubtedly degrade the interaction experience between users and robots. In this work, we focus on interacting with the robot without wake words, even when the user is not within the robot’s field of view. To accomplish this task, we propose a multimodal activation detection model (MADM), which consists of three parts: primary feature extraction, high-level feature fusion, and fused feature classification. The first part extracts primary feature vectors from the raw video and audio. The second part uses our proposed local variable weight fusion strategy to convert the primary features into high-level features and combine them into fused features for classification. The third part uses a fully connected neural network to classify the fused features and determine whether a response is required. To evaluate MADM, we constructed a dataset of 7992 short videos recorded by 99 invited volunteers. Extensive experiments demonstrate the effectiveness of our model and the necessity of a feature fusion strategy.
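
The three-part structure described in the abstract can be pictured with a minimal sketch. The abstract does not specify the local variable weight fusion strategy, so the code below swaps in a generic per-sample softmax gate over the two modalities as a stand-in; it is not the authors' method. The function names (`fuse`, `classify`, `should_respond`), the parameter dictionary, and the 0.5 threshold are all hypothetical, and the first part (feature extraction) is assumed to have already produced equal-length video and audio vectors.

```python
import math
from typing import Dict, List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(video_feat: Sequence[float], audio_feat: Sequence[float],
         gate_video: Sequence[float], gate_audio: Sequence[float]) -> List[float]:
    """Part 2 (stand-in): weight each modality per sample, then sum.

    ASSUMPTION: a softmax gate replaces the paper's local variable
    weight fusion, whose details are not given in the abstract.
    """
    w_video, w_audio = softmax([dot(gate_video, video_feat),
                                dot(gate_audio, audio_feat)])
    return [w_video * v + w_audio * a for v, a in zip(video_feat, audio_feat)]


def classify(fused: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Part 3: one fully connected layer with a sigmoid output."""
    z = dot(weights, fused) + bias
    return 1.0 / (1.0 + math.exp(-z))


def should_respond(video_feat: Sequence[float], audio_feat: Sequence[float],
                   params: Dict[str, object], threshold: float = 0.5) -> bool:
    """Return True when the estimated probability reaches the threshold."""
    fused = fuse(video_feat, audio_feat,
                 params["gate_video"], params["gate_audio"])
    return classify(fused, params["fc_weights"], params["fc_bias"]) >= threshold
```

In a trained model the gate and classifier parameters would be learned from labeled clips; here they are plain lists so the data flow from primary features to a respond or ignore decision is easy to follow.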


Author Profile
Hangming Zhang

School of Computer Science and Technology Tiangong University Tianjin China

Andorra
Author Profile
Jianming Wang

School of Computer Science and Technology Tiangong University Tianjin China

Andorra
Author Profile
Shengjiao Yang

School of Psychology Guizhou Normal University Guiyang China

China

📄 Paper information

Publication year: 2023
Citations: 0
Country of publication: Andorra, China
Site: Springer
