ITRT (IT Research Trends)

GRiT: A Generative Region-to-Text Transformer for Object Understanding

Research field: Software Development

Paper keywords: #understanding #generative #encoder #nouns #grit

Conference: European Conference on Computer Vision

Abstract

This paper presents a Generative RegIon-to-Text transformer, GRiT, for object understanding. The spirit of GRiT is to formulate object understanding as <region, text> pairs, where region locates objects and text describes objects. Specifically, GRiT consists of a visual encoder to extract image features, a foreground object extractor to localize objects, and a text decoder to generate natural language for objects. With the same model architecture, GRiT describes objects via not only simple nouns, but also rich descriptive sentences. We define GRiT as open-set object understanding, as it has no limit on object description output from the model architecture perspective. Experimentally, we apply GRiT to dense captioning and object detection tasks. GRiT achieves superior dense captioning performance (15.5 mAP on Visual Genome) and competitive detection accuracy (60.4 AP on COCO test-dev). Code is available at https://github.com/JialianW/GRiT.
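The abstract describes a three-stage data flow (visual encoder, then foreground object extractor, then text decoder) whose output is a list of <region, text> pairs. The Python below is a minimal, framework-free sketch of that data flow only. It is not GRiT's implementation. The names `RegionText`, `region_to_text`, `decode_region`, and `make_scripted_step`, the component signatures, and the greedy token-by-token decoding loop are all hypothetical illustrative assumptions. The "decoder" here simply replays a fixed caption per box, standing in for a learned network. For the actual model, see the linked repository.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical types for illustration only (not GRiT's real API).
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
BOS, EOS = "[BOS]", "[EOS]"              # assumed begin/end-of-text tokens


@dataclass(frozen=True)
class RegionText:
    """One <region, text> pair: where the object is and how it is described."""
    box: Box
    text: str


Encoder = Callable[[object], object]               # image -> image features
Extractor = Callable[[object], List[Box]]          # features -> object boxes
StepFn = Callable[[object, Box, List[str]], str]   # (features, box, prefix) -> next token


def decode_region(step: StepFn, features: object, box: Box, max_len: int = 20) -> str:
    """Greedy autoregressive decoding: emit one token at a time until EOS or max_len."""
    tokens: List[str] = [BOS]
    for _ in range(max_len):
        nxt = step(features, box, tokens)  # condition on features, region, and prefix
        if nxt == EOS:
            break
        tokens.append(nxt)
    return " ".join(tokens[1:])  # drop BOS


def region_to_text(image: object, encoder: Encoder, extractor: Extractor,
                   step: StepFn) -> List[RegionText]:
    """Encode the image, localize objects, then describe each region with text."""
    features = encoder(image)
    return [RegionText(box, decode_region(step, features, box))
            for box in extractor(features)]


def make_scripted_step(scripts: Dict[Box, Sequence[str]]) -> StepFn:
    """Toy decoder that replays a fixed word sequence per box (placeholder for a model)."""
    def step(features: object, box: Box, prefix: List[str]) -> str:
        words = scripts[box]
        i = len(prefix) - 1  # number of words generated so far (prefix includes BOS)
        return words[i] if i < len(words) else EOS
    return step


if __name__ == "__main__":
    boxes = [(10.0, 20.0, 110.0, 220.0), (120.0, 40.0, 300.0, 200.0)]
    scripts = {
        boxes[0]: ["person"],                                     # noun, detection-style
        boxes[1]: ["a", "red", "bike", "parked", "on", "grass"],  # sentence, captioning-style
    }
    results = region_to_text(
        image=None,
        encoder=lambda img: img,        # identity placeholder for the visual encoder
        extractor=lambda feats: boxes,  # fixed boxes stand in for the object extractor
        step=make_scripted_step(scripts),
    )
    for r in results:
        print(r.box, "->", r.text)
    # prints:
    # (10.0, 20.0, 110.0, 220.0) -> person
    # (120.0, 40.0, 300.0, 200.0) -> a red bike parked on grass
```

Because the decoder's output is an open-ended token sequence rather than an index into a fixed class list, the same interface can return a single noun for one region and a full sentence for another. This is the sense in which the abstract calls GRiT open-set.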

📄 Paper information

Publication year	2024
Citations	0
Country of publication	United States, Austria
Site	Springer

Jialian Wu

Jianfeng Wang

Zhengyuan Yang

Zhe Gan

Zicheng Liu

Junsong Yuan

Lijuan Wang
