Research field: Strategies
Venue: The Visual Computer
Semantic segmentation, a fundamental task in computer vision, aims to label each pixel in an image with a semantic category. Despite recent advances, balancing segmentation accuracy against real-time inference speed remains challenging, particularly for lightweight networks. This paper proposes MSDSeg, a lightweight real-time semantic segmentation network built on multi-scale dilated convolutions. The encoder incorporates a Multi-scale Dilation Block (MSDB) that combines three dilated convolutions with distinct dilation rates and achieves good results without pre-training on large datasets. The decoder introduces a Cross-layer Attention Fusion Module (CAFM) that efficiently merges multi-level feature information and reduces the disparity between high-level and low-level features. Additionally, a Feature Enhancement Head (FEH) that uses global average pooling and global max pooling improves object and boundary detection. Extensive experiments show that MSDSeg balances accuracy and speed, reaching 74.0% mIoU at 204.7 FPS on Cityscapes and 75.3% mIoU at 175.0 FPS on CamVid. These results indicate that MSDSeg effectively addresses the trade-off between accuracy and efficiency in real-time semantic segmentation. The code for this work is publicly available at https://github.com/wangyunlei-wyl/MSDSeg.
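To illustrate the idea behind parallel dilated convolutions with distinct dilation rates, the following is a minimal 1D sketch in plain Python. It is not the MSDB implementation from the paper: the dilation rates `(1, 2, 4)`, the fusion by element-wise addition, the 1D simplification, and all function names are assumptions chosen for illustration only.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation):
    """Zero-padded, same-length 1D dilated convolution (cross-correlation form)."""
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("this sketch only supports odd kernel sizes")
    pad = (k - 1) * dilation // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    # Each output sample reads k input samples spaced `dilation` apart.
    return [
        sum(kernel[j] * padded[i + j * dilation] for j in range(k))
        for i in range(len(signal))
    ]


def multi_scale_block(signal, kernel, dilations=(1, 2, 4)):
    """Run parallel dilated branches and fuse them by addition (assumed fusion)."""
    branches = [dilated_conv1d(signal, kernel, d) for d in dilations]
    return [sum(values) for values in zip(*branches)]


if __name__ == "__main__":
    impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]
    for d in (1, 2, 4):
        print(d, effective_kernel_size(3, d), dilated_conv1d(impulse, [1, 1, 1], d))
    print(multi_scale_block(impulse, [1, 1, 1]))
```

With a kernel of size 3, the three assumed rates cover spans of 3, 5, and 9 samples, so each branch sees context at a different scale while using the same number of weights. This is the general property that makes dilated convolutions attractive for lightweight networks.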
| Publication year | 2025 |
|---|---|
| Citations | 0 |
| Country of publication | Andorra, China |
| Site | Springer |
| Likes | 0 |