Multi-level video captioning method based on semantic space


Research field: Strategies



Venue: Multimedia Tools and Applications


Abstract

Video captioning aims to generate natural language descriptions of video content. Traditional methods extract visual features and features describing the interactions between objects, but they ignore two problems: the isolation of video features and the hierarchy of semantics. This paper proposes a Multi-Level Video Captioning Method based on semantic space (S-MLM) to address these problems. S-MLM extracts visual elements and visual relationships at different levels and aggregates the visual information layer by layer, producing visual features that progress from low level to high level. It then constructs a multi-level structured semantic graph from a semantic perspective. Rather than relying on external knowledge bases, the graph uses the video's own information as guidance to enhance feature representation and improve semantic understanding. Experiments on the MSVD and MSR-VTT datasets show that S-MLM further improves video captioning performance.
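The abstract does not say how the layer-by-layer aggregation is implemented. As a rough illustration only, the following Python sketch assumes three levels (object, frame, video) and uses dot-product attention pooling to merge each level into the next. The function names `attend` and `aggregate_levels`, the query vector, and the choice of attention pooling are hypothetical assumptions for illustration, not details taken from the paper.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Numerically stable softmax: subtract the max score before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    # Plain dot product of two equal-length vectors.
    return sum(x * y for x, y in zip(a, b))


def attend(items: List[Vector], query: Vector) -> Vector:
    # Hypothetical aggregation step (an assumption, not the paper's method):
    # weight each lower-level feature by its similarity to the query,
    # then return the weighted sum of the features.
    weights = softmax([dot(v, query) for v in items])
    dim = len(items[0])
    return [sum(w * v[d] for w, v in zip(weights, items)) for d in range(dim)]


def aggregate_levels(objects_per_frame: List[List[Vector]], query: Vector) -> Vector:
    # Level 1 -> 2: merge the object features inside each frame into one frame feature.
    frame_feats = [attend(objs, query) for objs in objects_per_frame]
    # Level 2 -> 3: merge the frame features into a single video-level feature.
    return attend(frame_feats, query)


if __name__ == "__main__":
    frames = [[[1.0, 0.0], [3.0, 0.0]], [[0.0, 2.0], [0.0, 4.0]]]
    # A zero query gives uniform weights, so this reduces to a mean of means.
    print(aggregate_levels(frames, [0.0, 0.0]))  # [1.0, 1.5]
```

With a zero query every attention weight is equal, so the sketch degenerates to averaging objects within each frame and then averaging frames; a learned query would instead let salient objects and frames dominate each higher-level feature.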


Authors
Xiao Yao

The College of IoT Engineering, Hohai University, 200 Jinling North Road, Changzhou 21300, Jiangsu, China

China
Yuanlin Zeng

The College of IoT Engineering, Hohai University, 200 Jinling North Road, Changzhou 21300, Jiangsu, China

China
Min Gu

Department of Stomatology, The Third Affiliated Hospital of Soochow University, The First People’s Hospital of Changzhou, Changzhou 21303, Jiangsu, China

China

📄 Paper information

Publication year: 2024
Citations: 0
Country of publication: China
Site: Springer
