ITRT(IT Research Trends)

AD2AT: Audio Description to Alternative Text, a Dataset of Alternative Text from Movies

Research field: Software Development

Paper keywords: #improving #audio #valuable #vision #movies

Conference: International Conference on Multimedia Modeling

Abstract

Alternative text (alt text) is often mistaken for image captions. However, alt text is intended to replace an image, whereas a caption supports an image. Effective alt text is essential for enhancing visual accessibility for blind and low vision (BLV) individuals. While there has been substantial research in image captioning, this work often falls short in assessing visual accessibility needs. In this paper, we introduce AD2AT, a dataset of alt text derived from professionally tailored audio descriptions in movies. Our dataset, comprising over 3,800 text-image pairs, represents a first step toward advancing the alt text generation task and serves as a valuable resource for a range of vision-language applications. Through a qualitative analysis, we demonstrate the limitations of state-of-the-art image captioning and text generation models in producing effective alt text. We provide insights into improving alt text generation and call for future work on developing robust, context-aware models and evaluation metrics that align with accessibility guidelines, to better serve BLV users across different domains.

📄 Paper information

Publication year	2025
Citations	0
Country of publication	France, Japan
Site	Springer

Elise Lincker

Camille Guinaudeau

Shin’ichi Satoh
