ITRT (IT Research Trends)

Combining Audio and Image Sequence for Video Moment Retrieval by Natural Language

Research field: Artificial Intelligence

Paper keywords: #computational #audio #video #extractors #retrieval

Conference: International Conference on Artificial Intelligence and Soft Computing

Abstract

Video moment retrieval by natural language aims to locate the segment (moment) of a video that is most relevant to a textual description. However, existing methods rely only on image sequence analysis and neglect the information carried by the audio. The main objective of this study is therefore to combine both kinds of features (from image and audio) to make retrieval more comprehensive and robust. To this end, the model is built on aligned audio and image sequence extractors whose outputs are related to the textual description in order to retrieve the desired moment of the video. We propose a weakly supervised model that uses attention mechanisms and the audio component for video moment retrieval by natural language. Results show that the proposed model outperforms the current state of the art on the mIoU metric by more than 27%, while also decreasing the response time of video moment retrieval (reducing the computational complexity from polynomial to linear).
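The abstract reports its gain in mIoU without defining the metric. The sketch below assumes the usual reading for moment retrieval, namely the temporal intersection over union between a predicted and a ground-truth segment, averaged over all queries. The names `Segment`, `temporal_iou`, and `mean_iou` are hypothetical and do not come from the paper; the authors' exact evaluation protocol may differ.

```python
from typing import Sequence, Tuple

# A moment is assumed to be a (start_seconds, end_seconds) pair.
Segment = Tuple[float, float]


def temporal_iou(pred: Segment, truth: Segment) -> float:
    """Intersection over union of two time intervals."""
    # Overlap is clamped at zero when the intervals are disjoint.
    inter = max(0.0, min(pred[1], truth[1]) - max(pred[0], truth[0]))
    union = (pred[1] - pred[0]) + (truth[1] - truth[0]) - inter
    return inter / union if union > 0 else 0.0


def mean_iou(preds: Sequence[Segment], truths: Sequence[Segment]) -> float:
    """Average temporal IoU over paired predictions and ground truths."""
    if len(preds) != len(truths):
        raise ValueError("preds and truths must have the same length")
    if not preds:
        return 0.0
    total = sum(temporal_iou(p, t) for p, t in zip(preds, truths))
    return total / len(preds)


if __name__ == "__main__":
    # Illustrative values only, not results from the paper.
    predictions = [(0.0, 10.0), (0.0, 5.0)]
    ground_truth = [(5.0, 15.0), (5.0, 10.0)]
    print(mean_iou(predictions, ground_truth))
```

Under this definition a prediction that overlaps half of a ground-truth moment of equal length scores one third, so the metric penalizes both missed and surplus footage.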

📄 Paper information

Publication year	2025
Citations	0
Country of publication	Brazil
Site	Springer

Luís G. de Souza

Sílvio R. R. Sanches

Pedro H. Bugatti

Priscila T. M. Saito
