CircuitVQA: A Visual Question Answering Dataset for Electrical Circuit Images


Research field: Software Development



Conference: Joint European Conference on Machine Learning and Knowledge Discovery in Databases


Abstract

A visual question answering (VQA) system for electrical circuit images could be useful as a quiz generator, a design and verification assistant, or an electrical diagnosis tool. Although there is a vast literature on VQA, to the best of our knowledge there is no existing work on VQA for electrical circuit images. To this end, we curate a new dataset, CIRCUITVQA, of 115K+ questions on 5725 electrical circuit images with 70 circuit symbols. The dataset contains schematic as well as hand-drawn images. The questions span categories such as counting, value, junction, and position-based questions. To be effective, models must demonstrate skills such as object detection, text recognition, spatial understanding, question intent understanding, and answer generation. We experiment with multiple foundational visio-linguistic models for this task and find that a finetuned BLIP model with component descriptions as additional input provides the best results. We make the code and dataset publicly available (https://github.com/rahcode7/Circuit-VQA).
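To make the idea of "component descriptions as additional input" concrete, the following is a minimal sketch of how one sample and its text input might be represented. The record fields, the `build_text_input` helper, the file name, and the prompt format are all assumptions made for illustration; they are not taken from the paper or from the released code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CircuitQuestion:
    """One hypothetical VQA sample (field names are illustrative assumptions)."""
    image_path: str
    question: str
    answer: str
    category: str  # e.g. "counting", "value", "junction", "position"
    # Component descriptions, e.g. labels produced by an object detector
    # (assumed format, not the dataset's actual schema).
    components: List[str] = field(default_factory=list)


def build_text_input(sample: CircuitQuestion) -> str:
    """Prepend component descriptions to the question (assumed prompt format)."""
    if not sample.components:
        return sample.question
    description = ", ".join(sample.components)
    return f"components: {description}. question: {sample.question}"


sample = CircuitQuestion(
    image_path="circuit_0001.png",  # hypothetical file name
    question="How many resistors are in the circuit?",
    answer="2",
    category="counting",
    components=["resistor", "resistor", "voltage source"],
)
text_input = build_text_input(sample)
print(text_input)
```

In a setup like this, the resulting string would be passed to the model's text encoder alongside the image, so the model can rely on the listed components rather than detecting every symbol itself.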


Authors
Rahul Mehta, IIIT Hyderabad, Hyderabad, India
Bhavyajeet Singh, IIIT Hyderabad, Hyderabad, India
Vasudeva Varma, Microsoft, Hyderabad, India

Paper information

Publication year: 2024
Citations: 0
Country of publication: India
Site: Springer
