Think Before Placement: Common Sense Enhanced Transformer for Object Placement


Research field: Verification



Conference: European Conference on Computer Vision (ECCV)


Abstract

Object placement is the task of inserting a foreground object into a background scene at a suitable position and size. Existing methods mainly focus on extracting better visual features while neglecting common-sense knowledge about the objects and the background, which leads to semantically unrealistic object positions. In this paper, we introduce Think Before Placement, a novel framework that combines implicit and explicit knowledge to generate placements that are both visually coherent and contextually appropriate. Specifically, we first adopt a large multi-modal model to generate a descriptive caption that identifies an appropriate position in the background for placing the foreground object (Think), and then output the proper position and size of the object (Place). The caption serves as explicit semantic guidance for the subsequent placement of the object. Using this framework, we implement a model named CSENet, which outperforms baseline methods on the OPA dataset in extensive experiments. Further, we establish the OPAZ dataset to evaluate the zero-shot transfer capability of CSENet, on which it also shows strong performance across different foreground objects and scenes.


Authors

Yaxuan Qin
Key Laboratory of AI Safety of CAS, Institute of Computing Technology, Chinese Academy of Sciences (CAS), Beijing 100190, China

Jiayu Xu
University of Chinese Academy of Sciences, Beijing 100049, China

Ruiping Wang
Key Laboratory of AI Safety of CAS, Institute of Computing Technology, Chinese Academy of Sciences (CAS), Beijing 100190, China

Paper Information

Publication year: 2024
Citations: 0
Country of publication: China
Source: Springer