Text semantic structure-guided correlation learning for cross-modal retrieval


Research field: Strategies



Venue: Multimedia Systems


Abstract

To integrate the structural relationships among data into the learned cross-modal features, many cross-modal retrieval methods use Graph Convolutional Networks (GCNs) for cross-modal feature learning. However, most of these methods learn the features of each modality independently, which limits the structural consistency across cross-modal features. Because text features carry semantic structural information, we introduce a Text Semantic Structure-Guided Correlation Learning (TSSCL) method, which uses text features as supervision to guide the learning of both image and label correlations. As a result, the structural relationships within the common representations and label embeddings are aligned with those inherent in the texts. Furthermore, we introduce a novel Structure Transfer Graph Convolutional Network (STGCN) to maintain the global structural relationships across images. Additionally, we propose a series of semantic consistency losses and structural InfoNCE losses, which help preserve both the semantic consistency and the structural consistency of the common representations and label embeddings. Experiments on the NUS-WIDE, MIRFlickr-25K, and MS-COCO datasets demonstrate that TSSCL outperforms current state-of-the-art cross-modal retrieval methods.
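The abstract names "structural InfoNCE losses" but does not give their formula. As a point of reference only, the following is a minimal sketch of the standard InfoNCE objective applied to paired image and text embeddings, where the i-th image and i-th text form a positive pair and every other text in the batch acts as a negative. This is not the TSSCL structural variant, whose exact form (and its use of label embeddings or STGCN outputs) is not specified here. The function names `cosine`, `info_nce`, and `symmetric_info_nce` and the default temperature of 0.1 are illustrative assumptions.

```python
import math
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors: List[Vector], positives: List[Vector],
             temperature: float = 0.1) -> float:
    """Standard (not TSSCL-specific) InfoNCE loss, averaged over the batch.

    anchors[i] is paired with positives[i]; positives[j] for j != i act as
    in-batch negatives. The loss for row i is -log softmax(logits)[i].
    """
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [cosine(anchors[i], positives[j]) / temperature for j in range(n)]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denom - logits[i]
    return total / n


def symmetric_info_nce(images: List[Vector], texts: List[Vector],
                       temperature: float = 0.1) -> float:
    """Average of image-to-text and text-to-image InfoNCE (a common cross-modal choice)."""
    return 0.5 * (info_nce(images, texts, temperature)
                  + info_nce(texts, images, temperature))
```

With correctly paired embeddings the loss is close to zero, while mismatched pairs push it up sharply, which is the behavior a contrastive alignment term relies on. The symmetric form is included because retrieval runs in both directions (image to text and text to image); whether the paper symmetrizes its losses this way is an assumption.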


Authors
Jie Zhu

Hebei Key Laboratory of Machine Learning and Computational Intelligence, College of Mathematics and Information Science, Hebei University, Baoding 071002, China

Jingjing Fan

Information Engineering College, Hebei University of Architecture, Zhangjiakou 075000, China

Jianguang Zhao

Information Engineering College, Hebei University of Architecture, Zhangjiakou 075000, China

📄 Paper information

Publication year: 2025
Citations: 0
Publication country: China
Site: Springer