ITRT(IT Research Trends)

PSNet: position-shift alignment network for image caption

Research field: Verification

Paper keywords: #competitive #popularity #advantages #improvement #transformer

Venue: International Journal of Multimedia Information Retrieval

Abstract

Recently, Transformer-based models have gained increasing popularity in the field of image captioning. The global attention mechanism of the Transformer facilitates the integration of region and grid features, leading to a significant improvement in accuracy. However, combining the two features through direct fusion may introduce semantic noise, caused by the non-synergistic issue of the region and grid features; meanwhile, the additional detector needed to extract region features also decreases the efficiency of the model. In this paper, we introduce a novel position-shift alignment network (PSNet) to exploit the advantages of the two features. Concretely, we embed a simple detector, DETR, into the model and extract region features based on grid features to improve model efficiency. Moreover, we propose a P-shift alignment module to address the semantic noise caused by the non-synergistic issue of the region and grid features. To validate our model, we conduct extensive experiments and visualization on the MS-COCO dataset, and the results show that PSNet is qualitatively competitive with existing models under comparable experimental conditions.

📄 Paper information

Publication year	2023
Citations	0
Country of publication	Andorra
Site	Springer
Likes	0

Lixia Xue

Awen Zhang

Ronggui Wang

Juan Yang

Related papers (24)
