DSARFormer: A dynamic sparse axial rectangular transformer for real-time semantic segmentation


Research field: Verification



Venue: Signal, Image and Video Processing


Abstract

Transformer-based real-time semantic segmentation algorithms have demonstrated significant potential. Nonetheless, current mainstream Transformer methods typically overlook the correlation between regions when computing attention, even though limiting attention to the most important regions can further reduce model parameters and computational load. To address this issue, DSARFormer, a dynamic sparse axial rectangular attention Transformer for real-time semantic segmentation, was developed. DSARFormer comprises two key modules: the DSARFormer Block and the CNN-Transformer feature fusion module (CTFM). The DSARFormer Block contains dynamic sparse axial rectangular attention (ARAttention), which computes attention only over the most relevant rectangular regions in the horizontal and vertical directions. CTFM, in turn, effectively integrates CNN and Transformer features, making it suitable for real-time semantic segmentation. Both modules were evaluated on the ADE20K and Cityscapes datasets. DSARFormer achieved 39.3% mIoU at 48.5 FPS and 73.4% mIoU at 46.3 FPS on the two datasets, respectively, outperforming current mainstream real-time semantic segmentation algorithms. Code is available at https://github.com/Panyw1011/DSARFormer.
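
The idea of attending only to the most relevant rectangular regions along each axis can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the repository above for that). It assumes, purely for illustration, that regions are equal-height strips, that region relevance is the dot product of mean-pooled strip descriptors, and that queries, keys, and values are the raw features with no learned projections. The names `topk_strip_attention`, `num_strips`, and `k` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def topk_strip_attention(feat, num_strips, k):
    """Sparse attention over horizontal strips (illustrative assumption).

    feat: array of shape (H, W, C), with H divisible by num_strips.
    Each strip attends only to the k strips with the highest
    descriptor affinity. Returns (output, selected strip indices).
    """
    H, W, C = feat.shape
    assert H % num_strips == 0 and 1 <= k <= num_strips
    # (S, N, C): S strips, each holding N = (H / S) * W tokens.
    strips = feat.reshape(num_strips, -1, C)
    # One descriptor per strip: the mean of its tokens.
    desc = strips.mean(axis=1)                      # (S, C)
    affinity = desc @ desc.T                        # (S, S)
    # Indices of the k most related strips for each query strip.
    topk = np.argsort(-affinity, axis=1)[:, :k]     # (S, k)
    out = np.empty_like(strips)
    for s in range(num_strips):
        q = strips[s]                               # (N, C)
        kv = strips[topk[s]].reshape(-1, C)         # (k * N, C)
        attn = softmax(q @ kv.T / np.sqrt(C), axis=-1)
        out[s] = attn @ kv
    return out.reshape(H, W, C), topk

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 4))
# Horizontal pass, then a vertical pass on the transposed map.
y_h, idx = topk_strip_attention(x, num_strips=4, k=2)
y_v, _ = topk_strip_attention(x.transpose(1, 0, 2), num_strips=4, k=2)
y = y_h + y_v.transpose(1, 0, 2)
```

In this sketch each query strip attends to `k` of the `num_strips` strips instead of all of them, so the attention cost per axis shrinks by roughly a factor of `k / num_strips`. Setting `k = num_strips` recovers dense attention over all tokens.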


Author Profile
Xiaochun Lei

School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, Guangxi, China

Andorra
Author Profile
Yiwei Pan

Guangxi Key Laboratory of Image and Graphics Intelligent, Guilin University of Electronic Technology, Guilin 541004, Guangxi, China

Andorra
Author Profile
Yongya Zhang

School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, Guangxi, China

Andorra

Paper information

Publication year: 2025
Citations: 0
Country of publication: Andorra
Site: Springer
