In Neural Networks: the official journal of the International Neural Network Society, by Zhiyong Zhou, Zhechen Zhou, Xusheng Qian, Jisu Hu, Bo Peng, Chen Geng, Bin Dai, He Huang, Wenbin Zhang, Yakang Dai
Segmenting multiple targets of varying sizes in medical images is important for disease diagnosis and pathological research. Transformer-based methods are emerging in medical image segmentation, leveraging the powerful yet computationally intensive self-attention mechanism. A variety of attention mechanisms have been proposed that reduce computation at the cost of accuracy, relying on handcrafted patterns within local or artificially defined receptive fields. Furthermore, common region-based loss functions are insufficient for guiding the transformer to focus on tissue regions, which makes them unsuitable for segmenting tissues with intricate boundaries. This paper presents a bi-level sparse attention network and a narrow-band (NB) loss function for accurate and efficient multi-target segmentation of medical images. In particular, we introduce a bi-level sparse attention module (BSAM) and build a segmentation network on this module. The BSAM consists of coarse-grained patch-level attention and fine-grained pixel-level attention; the pixel-level attention captures fine-grained contextual features within adaptive receptive fields learned by the patch-level attention. This enhances segmentation accuracy while reducing computational complexity. The proposed NB loss function constructs a target region in close proximity to the tissue boundary. The network is thus guided to perform boundary-aware segmentation, alleviating both over-segmentation and under-segmentation. Comprehensive experiments on whole brains, brain tumors, and abdominal organs demonstrate that our method outperforms other state-of-the-art segmentation methods. Furthermore, the BSAM and NB loss can be applied flexibly to a variety of network frameworks.