Research field: Verification
Venue: Signal, Image and Video Processing
Transformer-based real-time semantic segmentation algorithms have demonstrated significant potential. Nonetheless, current mainstream Transformer methods typically overlook the correlation between regions when computing attention, even though restricting attention to the most relevant regions can further reduce model parameters and computational load. To address this issue, a dynamic sparse axial rectangular attention Transformer (DSARFormer) was developed for real-time semantic segmentation. DSARFormer comprises two key modules: the DSARFormer Block and the CNN-Transformer feature fusion module (CTFM). The DSARFormer Block contains dynamic sparse axial rectangular attention (ARAttention), which computes attention only over the most relevant rectangular regions in the horizontal and vertical directions. CTFM, in turn, effectively integrates CNN and Transformer features, making it suitable for real-time semantic segmentation. Both modules were evaluated on the ADE20K and Cityscapes datasets. DSARFormer achieved 39.3% mIoU at 48.5 FPS on ADE20K and 73.4% mIoU at 46.3 FPS on Cityscapes, outperforming current mainstream real-time semantic segmentation algorithms. Code is available at https://github.com/Panyw1011/DSARFormer.
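The accuracy figures above are reported as mIoU (mean intersection-over-union). As a reminder of what that metric measures, here is a minimal sketch in plain Python. It is not taken from the DSARFormer repository: the function name `mean_iou`, the flat label lists, and the `ignore_index=255` default (a common convention for unlabeled pixels) are assumptions made for illustration.

```python
from typing import Sequence


def mean_iou(
    pred: Sequence[int],
    target: Sequence[int],
    num_classes: int,
    ignore_index: int = 255,  # assumed "unlabeled" marker, not from the source
) -> float:
    """Average per-class IoU over classes present in pred or target."""
    intersection = [0] * num_classes
    pred_count = [0] * num_classes
    target_count = [0] * num_classes

    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # skip pixels without a ground-truth label
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        # |A ∪ B| = |A| + |B| - |A ∩ B|
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # classes absent from both pred and target are skipped
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Flattened toy label maps; the last pixel is unlabeled and ignored.
    prediction = [0, 0, 1, 1, 2, 2]
    ground_truth = [0, 1, 1, 1, 2, 255]
    print(f"mIoU = {mean_iou(prediction, ground_truth, num_classes=3):.4f}")
```

Benchmark toolkits usually accumulate the intersection and union counts over the whole validation set before dividing, rather than averaging per-image scores. The same function covers that case if the concatenated labels of all images are passed in.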
| Field | Value |
|---|---|
| Publication year | 2025 |
| Citations | 0 |
| Country of publication | Andorra |
| Site | Springer |
| Likes | 0 |