Lightweight and real-time semantic segmentation network via multi-scale dilated convolutions


Research field: Strategies



Venue: The Visual Computer


Abstract

Semantic segmentation, a fundamental task in computer vision, assigns a semantic category to each pixel in an image. Despite recent advances, balancing segmentation accuracy against real-time inference speed remains challenging, particularly for lightweight networks. This paper proposes MSDSeg, a lightweight real-time semantic segmentation network built on multi-scale dilated convolutions. The encoder uses a Multi-scale Dilation Block (MSDB) containing three dilated convolutions with distinct dilation rates, and achieves good results without pre-training on large datasets. The decoder introduces a Cross-layer Attention Fusion Module (CAFM) that efficiently merges multi-level feature information and reduces the disparity between high-level and low-level features. In addition, a Feature Enhancement Head (FEH) based on global average pooling and global max pooling improves object and boundary detection. Extensive experiments show that MSDSeg balances accuracy and speed, reaching 74.0% mIoU at 204.7 FPS on Cityscapes and 75.3% mIoU at 175.0 FPS on CamVid. These results indicate that MSDSeg effectively addresses the trade-off between accuracy and efficiency in real-time semantic segmentation. The code is publicly available at: https://github.com/wangyunlei-wyl/MSDSeg.
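
To make the idea of parallel dilated convolutions with distinct dilation rates concrete, the following is a minimal pure-Python sketch in one dimension. It is not the authors' MSDB: the abstract does not state the dilation rates, kernel sizes, or how the branches are combined, so the rates (1, 2, 4), the 3-tap kernel, and the function names below are illustrative assumptions only. It shows how a fixed kernel covers a wider input span as the dilation rate grows, using the standard relation k_eff = k + (k - 1)(d - 1).

```python
def effective_kernel_size(kernel_size, dilation):
    """Input span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation):
    """1D dilated cross-correlation without padding ("valid" mode)."""
    span = effective_kernel_size(len(kernel), dilation)
    outputs = []
    for start in range(len(signal) - span + 1):
        # Kernel taps are spaced `dilation` samples apart in the input.
        total = sum(w * signal[start + i * dilation] for i, w in enumerate(kernel))
        outputs.append(total)
    return outputs


def multi_scale_branches(signal, kernel, dilations=(1, 2, 4)):
    """Apply the same kernel at several dilation rates (assumed rates, not from the paper)."""
    return {d: dilated_conv1d(signal, kernel, d) for d in dilations}


if __name__ == "__main__":
    signal = list(range(1, 11))   # toy input: 1, 2, ..., 10
    kernel = [1, 1, 1]            # hypothetical 3-tap kernel
    for d, out in multi_scale_branches(signal, kernel).items():
        print(f"dilation={d} span={effective_kernel_size(len(kernel), d)} output={out}")
```

With the same three weights, the branch at dilation 4 sees nine input samples while the branch at dilation 1 sees three, which is why dilated convolutions are a common way to enlarge the receptive field without adding parameters.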


Authors

Shan Zhao
School of Software, Henan Polytechnic University, 2001 Century Avenue, Jiaozuo 454000, China

Yunlei Wang
School of Software, Henan Polytechnic University, 2001 Century Avenue, Jiaozuo 454000, China

Zhanqiang Huo
College of Information Sciences and Technology, Donghua University, Shanghai 201620, China

Paper information

Publication year: 2025
Citations: 0
Publication country: Andorra, China
Site: Springer