Multimodal Agent AI: A Survey of Recent Advances and Future Directions


Research field: Artificial Intelligence



Venue: Journal of Computer Science and Technology


Abstract

In recent years, multimodal agent AI (MAA) has emerged as a pivotal area of research that promises to transform human-machine interaction. Agent AI systems that perceive and respond to inputs from multiple modalities (e.g., language, vision, audio) have made remarkable progress in understanding complex environments and executing intricate tasks. This survey reviews state-of-the-art developments in MAA, covering its fundamental concepts, key techniques, and applications across diverse domains. We first introduce the basics of agent AI and its multimodal interaction capabilities. We then examine the core technologies that enable agents to perform task planning, decision-making, and multi-sensory fusion. Next, we explore applications of MAA in robotics, healthcare, gaming, and beyond. Finally, we analyze the challenges and limitations of current systems and propose promising directions for future research, including human-AI collaboration and improved online learning methods. By reviewing existing work and highlighting open questions, this survey aims to provide a comprehensive roadmap for researchers and practitioners in the field of MAA.


Author Profile
Yu-Zhu Sun (孙玉柱)

School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an 710061, China

He-Li Sun (孙鹤立)

School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an 710061, China

Jian-Cong Ma (马健聪)

School of Electronic and Information, Northwestern Polytechnical University, Xi’an 710072, China


Paper Information

Publication year: 2025
Citations: 0
Country of publication: Andorra
Site: Springer
