Semantic-Orthogonal Multi-modal Attention Network for RGB-D Salient Object Detection


Research field: Strategies



Venue: The Visual Computer


Abstract

In recent years, RGB-D salient object detection has made significant progress in computer vision. However, existing methods still face challenges in feature extraction, cross-modal fusion, and multi-scale processing, which limit their performance in complex scenarios. To address these challenges, we propose SOMANet (Semantic-Orthogonal Multi-modal Attention Network), a novel and efficient RGB-D salient object detection model built on three key innovations. First, inspired by the "local focus-global reasoning" dual-path mechanism of the human visual system, we introduce Dual-Stage Sparse Semantic Enhancement (DSSE), a semantic token sparsification method based on the Swin Transformer architecture. DSSE filters out redundant semantic information, which improves generalization and lets the model focus on crucial semantics. It reduces FLOPs by over 33% relative to the original Swin Transformer backbone without sacrificing accuracy. Second, we propose the Orthogonal Multi-Modal Mutual Attention Fusion (O-MMAF) module, which integrates mutual attention with orthogonal channel attention. This module exploits the complementary relationship between RGB and depth features, improving the accuracy and robustness of cross-modal fusion. Finally, inspired by the visual processing mechanisms of primates, we design the Multi-Scale Self-Calibrating Spatial Recursive Attention (MSRA) module. By extracting multi-scale information and refining it recursively, MSRA mimics the brain's approach to information processing and generates high-precision saliency predictions in a coarse-to-fine manner. Experimental results show that SOMANet outperforms 12 state-of-the-art models on four evaluation metrics across nine publicly available RGB-D datasets, demonstrating its effectiveness. Our code is published at https://github.com/jiaweiXu1029/SOMANet.
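The token sparsification idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' DSSE implementation: the abstract does not say how tokens are scored or selected, so the L2-norm score, the function name sparsify_tokens, and the keep_ratio parameter are all assumptions made only for illustration. The sketch shows the general pattern of scoring tokens, keeping the top fraction, and passing only those on to later layers.

```python
import math


def sparsify_tokens(tokens, keep_ratio=0.5):
    """Keep the highest-scoring tokens, preserving their original order.

    tokens: list of feature vectors (lists of floats).
    The L2 norm serves as a stand-in importance score. This is an
    assumption; the abstract does not specify how DSSE scores tokens.
    Returns (kept_indices, kept_tokens).
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    # Number of tokens to keep, at least one.
    n_keep = max(1, math.ceil(len(tokens) * keep_ratio))
    # Hypothetical importance score: L2 norm of each token.
    scores = [math.sqrt(sum(x * x for x in t)) for t in tokens]
    # Rank token indices from most to least important.
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    # Restore spatial order among the survivors.
    kept = sorted(ranked[:n_keep])
    return kept, [tokens[i] for i in kept]


tokens = [[0.1, 0.0], [2.0, 1.0], [0.0, 0.2], [1.5, -1.5]]
kept_idx, kept_tokens = sparsify_tokens(tokens, keep_ratio=0.5)
```

Dropping half of the tokens roughly halves the cost of any later per-token computation, which is the kind of saving that sparsification targets. The actual selection rule and the resulting FLOPs figure in the paper may differ from this sketch.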


Author Profile
Jiawei Xu

School of Software, Jiangxi Normal University, Nanchang 330000, China
Author Profile
Qiangqiang Zhou

School of Software, Jiangxi Normal University, Nanchang 330000, China
Author Profile
Jiacong Yu

School of Software, Jiangxi Normal University, Nanchang 330000, China

Paper information

Publication year: 2025
Citations: 0
Publication country: Anguilla, China
Site: Springer

Related papers (58)