ITRT (IT Research Trends)

Think Before Placement: Common Sense Enhanced Transformer for Object Placement

Research field: Verification

Paper keywords: #experiments #performance #combines #dataset #extracting

Conference: European Conference on Computer Vision

Abstract

Object placement is the task of inserting a foreground object into a background scene at a suitable position and size. Existing methods mainly focus on extracting better visual features while neglecting common sense about the objects and the background, which leads to semantically unrealistic object positions. In this paper, we introduce Think Before Placement, a novel framework that effectively combines implicit and explicit knowledge to generate placements that are both visually coherent and contextually appropriate. Specifically, we first adopt a large multimodal model to generate a descriptive caption that identifies an appropriate position in the background for the foreground object (Think), and then output the proper position and size of the object (Place). The caption serves as explicit semantic guidance for the subsequent placement. Using this framework, we implement a model named CSENet, which outperforms baseline methods on the OPA dataset in extensive experiments. Further, we establish the OPAZ dataset to evaluate the zero-shot transfer capabilities of CSENet, on which it also shows impressive performance across different foreground objects and scenes.
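The abstract describes a two-stage data flow. A caption is produced first (Think) and then conditions the prediction of the object's position and size (Place). The sketch below illustrates only that data flow, under stated assumptions. The names `think_before_place`, `toy_think`, and `toy_place`, as well as the normalized `(cx, cy, s)` output format, are hypothetical and are not taken from the paper. CSENet uses a large multimodal model and a transformer, not the keyword rules used here as stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Box:
    x: float  # left edge, in background pixels
    y: float  # top edge, in background pixels
    w: float  # width, in background pixels
    h: float  # height, in background pixels


# Hypothetical interfaces (assumptions, not the paper's API):
# "think" maps (background description, foreground description) to a caption;
# "place" maps the caption to a normalized center (cx, cy) and a relative width s.
ThinkFn = Callable[[str, str], str]
PlaceFn = Callable[[str], Tuple[float, float, float]]


def think_before_place(bg_desc: str, fg_desc: str,
                       bg_size: Tuple[int, int], fg_size: Tuple[int, int],
                       think: ThinkFn, place: PlaceFn) -> Tuple[str, Box]:
    caption = think(bg_desc, fg_desc)   # Stage 1 (Think): explicit semantic guidance
    cx, cy, s = place(caption)          # Stage 2 (Place): conditioned on the caption
    bg_w, bg_h = bg_size
    fg_w, fg_h = fg_size
    w = s * bg_w                        # box width as a fraction of background width
    h = w * fg_h / fg_w                 # preserve the foreground aspect ratio
    # Convert the center to a top-left corner, then clamp so the box stays inside
    # the background (assumes the scaled box is no larger than the background).
    x = min(max(cx * bg_w - w / 2, 0.0), bg_w - w)
    y = min(max(cy * bg_h - h / 2, 0.0), bg_h - h)
    return caption, Box(x, y, w, h)


def toy_think(bg: str, fg: str) -> str:
    # Hypothetical stand-in for a large multimodal model's caption.
    return f"The {fg} should stand on the floor of the {bg}, to the left of the sofa."


def toy_place(caption: str) -> Tuple[float, float, float]:
    # Hypothetical keyword rules standing in for a learned placement network.
    cx = 0.3 if "left" in caption else 0.5
    cy = 0.8 if "floor" in caption else 0.5
    return cx, cy, 0.2


if __name__ == "__main__":
    caption, box = think_before_place("living room", "dog", (640, 480), (100, 200),
                                      toy_think, toy_place)
    print(caption)
    print(box)
```

Keeping `think` and `place` as injected callables makes the point of the framework visible: the placement stage receives the caption as an extra input, so swapping in a different captioner changes the guidance without touching the placement code.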

📄 Paper information

Publication year	2024
Citations	0
Country of publication	China, Anguilla
Site	Springer

Authors: Yaxuan Qin, Jiayu Xu, Ruiping Wang, Xilin Chen