ITRT (IT Research Trends)

Semantic-Orthogonal Multi-modal Attention Network for RGB-D Salient Object Detection

Research field: Strategies

Keywords: #innovations #optimization #improving #experimental #fusion

Venue: The Visual Computer

Abstract

In recent years, RGB-D salient object detection has significantly advanced computer vision. However, existing methods still face challenges in feature extraction, cross-modal fusion, and multi-scale processing, which limit their performance in complex scenarios. To tackle these challenges, we propose SOMANet (Semantic Orthogonal Multi-Modal Attention Network), a novel and efficient RGB-D salient object detection model that incorporates three key innovations. First, inspired by the “local focus-global reasoning” dual-path mechanism of the human visual system, we introduce Dual-Stage Sparse Semantic Enhancement (DSSE), a novel semantic token sparsification method based on the Swin Transformer architecture. DSSE filters out redundant semantic information, improving generalization and enabling focus on crucial semantics. It reduces FLOPs by over 33% compared to the original Swin Transformer backbone without sacrificing accuracy. Second, we propose the Orthogonal Multi-Modal Mutual Attention Fusion (O-MMAF) module, which integrates mutual attention with orthogonal channel attention. This module leverages the complementary relationship between RGB and depth features, improving accuracy and robustness in cross-modal fusion. Finally, inspired by the visual processing mechanisms of primates, we design the Multi-Scale Self-Calibrating Spatial Recursive Attention (MSRA) module. By extracting multi-scale information and performing deep optimization, MSRA simulates the brain’s approach to information processing, generating high-precision saliency predictions in a coarse-to-fine manner. Experimental results show that SOMANet achieves outstanding performance across four evaluation metrics on nine publicly available RGB-D datasets, surpassing 12 state-of-the-art models and demonstrating its effectiveness in this field. Our code is available at https://github.com/jiaweiXu1029/SOMANet.
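
The abstract does not specify how DSSE selects tokens, so the following is only a generic illustration of what "semantic token sparsification" can look like: score each token, keep the highest-scoring fraction, and pass only those on. The L2-norm score, the `keep_ratio` parameter, and the function names are all assumptions made for this sketch; none of them are taken from the paper or the SOMANet repository.

```python
import math
from typing import List, Sequence, Tuple


def token_scores(tokens: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical importance score: the L2 norm of each token vector."""
    return [math.sqrt(sum(x * x for x in tok)) for tok in tokens]


def sparsify_tokens(
    tokens: Sequence[Sequence[float]], keep_ratio: float
) -> Tuple[List[int], List[Sequence[float]]]:
    """Keep the top ceil(keep_ratio * N) tokens by score.

    Returns the kept indices (in their original order) and the kept tokens.
    This is a generic top-k filter, not the DSSE method itself.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    n_keep = max(1, math.ceil(keep_ratio * len(tokens)))
    scores = token_scores(tokens)
    # Rank token indices from highest to lowest score.
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    # Restore the original ordering of the surviving tokens.
    kept = sorted(ranked[:n_keep])
    return kept, [tokens[i] for i in kept]


if __name__ == "__main__":
    demo = [[0.1, 0.0], [3.0, 4.0], [0.0, 0.2], [1.0, 1.0]]
    indices, kept_tokens = sparsify_tokens(demo, keep_ratio=0.5)
    print(indices, kept_tokens)
```

Later layers then process fewer tokens, which is the general route by which sparsification lowers compute. The actual saving depends on where and how aggressively tokens are dropped, which this sketch does not model.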

📄 Paper information

Publication year	2025
Citations	0
Country of publication	Anguilla, China
Site	Springer

Jiawei Xu

Qiangqiang Zhou

Jiacong Yu

Chen Liao

Dandan Zhu
