AD2AT: Audio Description to Alternative Text, a Dataset of Alternative Text from Movies


Research field: Software Development



Conference: International Conference on Multimedia Modeling


Abstract

Alternative text (alt text) is often mistaken for image captions. However, alt text is intended to replace an image, whereas a caption supports an image. Effective alt text is essential for enhancing visual accessibility for blind and low vision (BLV) individuals. While there has been substantial research in image captioning, this work often falls short in assessing visual accessibility needs. In this paper, we introduce AD2AT, a dataset of alt text derived from professionally tailored audio descriptions in movies. Our dataset, comprising over 3,800 text-image pairs, represents a first step toward advancing the alt text generation task and serves as a valuable resource for a range of vision-language applications. Through a qualitative analysis, we demonstrate the limitations of state-of-the-art image captioning and text generation models in producing effective alt text. We provide insights into improving alt text generation and call for future work on developing robust, context-aware models and evaluation metrics that align with accessibility guidelines, to better serve BLV users across different domains.


Authors

Elise Lincker
Cedric, CNAM, Paris, France

Camille Guinaudeau
National Institute of Informatics, Tokyo, Japan

Shin’ichi Satoh
National Institute of Informatics, Tokyo, Japan

Paper information

Publication year: 2025
Citations: 0
Country of publication: France, Japan
Publisher site: Springer
