Research field: Strategies
Venue: Multimedia Systems
To integrate the structural relationships between data into the generated cross-modal features, numerous cross-modal retrieval methods have employed Graph Convolutional Networks (GCNs) for cross-modal feature learning. However, most studies learn features for each modality independently, which limits the structural consistency of the resulting cross-modal features. Because text features carry semantic structural information, we introduce a Text Semantic Structure-Guided Correlation Learning (TSSCL) method. This method uses text features as supervision to guide the learning of both image and label correlations, so that the structural relationships within the common representations and label embeddings align with those inherent in the texts. Furthermore, we introduce a novel Structure Transfer Graph Convolutional Network (STGCN) to maintain the global structural relationships across images. Additionally, we propose a series of semantic consistency losses and structural InfoNCE losses, which help maintain both the semantic consistency and the structural consistency of the common representations and label embeddings. We perform experiments on the NUS-WIDE, MIRFlickr-25K, and MS-COCO datasets, and the results demonstrate that TSSCL outperforms current state-of-the-art cross-modal retrieval methods.
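The abstract does not give the form of STGCN or of the structural InfoNCE losses, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a thresholded cosine-similarity graph built from text features, a standard single GCN layer that propagates image features over that graph, and the standard InfoNCE loss between paired image and text embeddings. All function names, the threshold, and the temperature are hypothetical choices made for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit L2 norm."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def text_similarity_graph(text_feats, threshold=0.5):
    """Hypothetical helper: adjacency from thresholded text cosine similarity."""
    t = l2_normalize(text_feats)
    adjacency = (t @ t.T > threshold).astype(float)
    np.fill_diagonal(adjacency, 0.0)  # self-loops are added in gcn_layer
    return adjacency


def gcn_layer(adjacency, features, weights):
    """Standard GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adjacency + np.eye(adjacency.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))  # degree >= 1 due to self-loops
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ features @ weights, 0.0)


def info_nce(anchors, positives, temperature=0.1):
    """Standard InfoNCE: row i of `positives` is the positive for row i of
    `anchors`; every other row in the batch serves as a negative."""
    logits = l2_normalize(anchors) @ l2_normalize(positives).T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    text = rng.normal(size=(8, 16))    # stand-in text features
    image = rng.normal(size=(8, 32))   # stand-in image features
    weights = rng.normal(size=(32, 16)) * 0.1

    graph = text_similarity_graph(text)          # structure taken from text
    image_repr = gcn_layer(graph, image, weights)  # image features on that graph
    print("InfoNCE(image, text):", info_nce(image_repr, text))
```

In this sketch the graph over samples comes only from the text modality, which is one simple way to let text structure guide how image features are aggregated. The paper's actual structure-transfer mechanism and loss definitions may differ.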
| Publication year | 2025 |
|---|---|
| Citations | 0 |
| Country of publication | Andorra, China |
| Site | Springer |