CTNet: rethinking convolutional neural networks and vision transformer for medical image segmentation


Research field: Artificial Intelligence



Venue: Signal, Image and Video Processing


Abstract

Convolutional architectures have been highly successful across many vision tasks, and their inherent inductive bias lets them learn efficiently. However, that same bias may come at the cost of a lower performance ceiling. Vision transformers (ViTs), by contrast, rely on more flexible self-attention layers and have recently surpassed CNNs in image classification. Yet ViTs often require resource-intensive pre-training on large external datasets or distillation from pre-trained convolutional networks. In this paper, we propose an efficient integration of CNNs and vision transformers through a hierarchical, stage-wise transformer. We introduce convolutional operations for precise feature extraction and design a distinct module hierarchy that captures both local and global features. The CNN-based encoder and the Transformer-based segmentation network are implemented in parallel. To mitigate the feature misalignment that arises when CNN and Transformer features are combined, we introduce an adaptive feature fusion module. We evaluate the method comprehensively on several widely used benchmark datasets, and the results indicate that it effectively addresses this challenge. Importantly, these gains are achieved without significant computational overhead.
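The abstract does not describe the internals of the adaptive feature fusion module. As a rough illustration only, the sketch below assumes that the Transformer branch emits a token sequence that must first be reshaped back onto the CNN branch's (C, H, W) grid (one simple form of alignment), and that fusion then blends the two branches with channel-wise gates predicted from globally pooled features of both. The function names `tokens_to_map` and `adaptive_fusion`, the sigmoid gating formula, and the weight shapes are hypothetical choices for this sketch, not the authors' design.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def tokens_to_map(tokens, h, w):
    """Hypothetical alignment step: reshape (H*W, C) Transformer tokens
    into a (C, H, W) feature map matching the CNN branch layout."""
    n, c = tokens.shape
    if n != h * w:
        raise ValueError("token count must equal h * w")
    # Token index n = row * w + col, so transposing then reshaping
    # places token (row, col) at spatial position [:, row, col].
    return tokens.T.reshape(c, h, w)


def adaptive_fusion(cnn_feat, vit_feat, weight, bias):
    """Hypothetical gated fusion of two aligned (C, H, W) feature maps.

    weight: (C, 2C) projection, bias: (C,). One gate per channel in (0, 1)
    decides how much of the CNN branch vs. the Transformer branch to keep.
    """
    if cnn_feat.shape != vit_feat.shape:
        raise ValueError("branches must share the same (C, H, W) shape")
    # Global average pooling over spatial dims -> (2C,) joint descriptor
    pooled = np.concatenate([cnn_feat.mean(axis=(1, 2)),
                             vit_feat.mean(axis=(1, 2))])
    gate = sigmoid(weight @ pooled + bias)      # shape (C,)
    g = gate[:, None, None]                     # broadcast over H, W
    fused = g * cnn_feat + (1.0 - g) * vit_feat
    return fused, gate


# Usage: fuse a random CNN map with reshaped Transformer tokens.
rng = np.random.default_rng(0)
C, H, W = 4, 8, 8
cnn = rng.standard_normal((C, H, W))
vit = tokens_to_map(rng.standard_normal((H * W, C)), H, W)
w = 0.1 * rng.standard_normal((C, 2 * C))
b = np.zeros(C)
fused, gate = adaptive_fusion(cnn, vit, w, b)
print(fused.shape, gate.shape)  # (4, 8, 8) (4,)
```

A gated convex combination is used here because it keeps the fused map on the same scale as its inputs and lets each channel lean toward local (CNN) or global (Transformer) evidence; the actual CTNet module may differ.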


Authors

Zhixin Zhang
Information Engineering Department, Tianjin University of Commerce, Tianjin 300134, China

Shuhao Jiang
Information Engineering Department, Tianjin University of Commerce, Tianjin 300134, China

Xuhua Pan
Information Engineering Department, Tianjin University of Commerce, Tianjin 300134, China

📄 Paper information

Publication year: 2023
Citations: 5
Country of publication: China
Publisher site: Springer
