Abstract
Thermal and RGB images differ markedly in how they represent scene information, especially in low-light or nighttime environments. Thermal images provide temperature information that complements RGB images by restoring details and contextual cues the visible spectrum misses. However, the spatial discrepancy between modalities in RGB-Thermal (RGB-T) semantic segmentation complicates multimodal feature fusion, causing a loss of spatial contextual information and limiting model performance. This paper proposes a channel-space fusion nonlinear spiking neural P system model network (CSPM-SNPNet) to address these challenges. It introduces a novel color-thermal image fusion module to effectively integrate features from both modalities. During decoding, a nonlinear spiking neural P system enhances multi-channel information extraction through convolutional spiking neural P system (ConvSNP) operations, fully restoring the features learned by the encoder. Experimental results on the public MFNet and PST900 datasets demonstrate that CSPM-SNPNet significantly improves segmentation performance: compared with existing methods, it achieves a 0.5% mIoU gain on MFNet and 1.8% on PST900, showing its effectiveness in complex scenes.
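The abstract does not give implementation details of the color-thermal fusion module, so the block below is only a minimal, hypothetical PyTorch sketch of a channel-spatial attention fusion of RGB and thermal feature maps, in the spirit of the described module; every layer choice, name, and dimension is an assumption, not the authors' exact design.

```python
# Hypothetical sketch of a channel-spatial fusion block for RGB-T features.
# NOTE: layer choices, names, and dimensions are illustrative assumptions,
# not the exact CSPM-SNPNet fusion module from the paper.
import torch
import torch.nn as nn


class ChannelSpatialFusion(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Channel attention: squeeze-and-excitation style gating on the summed features.
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial attention: a 7x7 convolution over pooled channel statistics.
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, rgb: torch.Tensor, thermal: torch.Tensor) -> torch.Tensor:
        fused = rgb + thermal                      # coarse fusion of the two modalities
        fused = fused * self.channel_gate(fused)   # re-weight channels
        avg_map = fused.mean(dim=1, keepdim=True)  # per-pixel channel statistics
        max_map, _ = fused.max(dim=1, keepdim=True)
        attn = self.spatial_gate(torch.cat([avg_map, max_map], dim=1))
        return fused * attn                        # re-weight spatial positions


# Example usage with 64-channel feature maps at 120x160 resolution.
if __name__ == "__main__":
    block = ChannelSpatialFusion(channels=64)
    rgb_feat = torch.randn(1, 64, 120, 160)
    thermal_feat = torch.randn(1, 64, 120, 160)
    print(block(rgb_feat, thermal_feat).shape)  # torch.Size([1, 64, 120, 160])
```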
Overview
- The study investigates the differences in information representation between thermal and RGB images, particularly in low-light or nighttime environments.
- The authors propose the channel-space fusion nonlinear spiking neural P system model network (CSPM-SNPNet) to address the challenges of multimodal feature fusion.
- The primary objective is to design a novel color-thermal image fusion module to effectively integrate features from both modalities and improve segmentation performance.
Comparative Analysis & Findings
- CSPM-SNPNet outperforms existing methods in complex scenes, achieving a 0.5% improvement in mIoU on MFNet and 1.8% on PST900.
- The novel color-thermal image fusion module effectively integrates features from both modalities, recovering spatial contextual information that is otherwise lost during multimodal fusion in RGB-Thermal (RGB-T) semantic segmentation.
- The nonlinear spiking neural P system in the decoding phase enhances multi-channel information extraction through ConvSNP operations, fully restoring the features learned by the encoder (see the illustrative sketch after this list).
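The exact nonlinear firing rule of the spiking neural P system is not specified in this summary; the block below is a hypothetical ConvSNP-style decoder step whose convolution-plus-nonlinear-gating structure, names, and parameters are assumptions chosen for illustration only.

```python
# Hypothetical ConvSNP-style decoder block: convolution followed by a
# nonlinear spiking-inspired firing function. The firing rule used here
# (a sigmoid gate applied to the response) is an illustrative assumption,
# not the exact nonlinear spiking neural P system rule from the paper.
import torch
import torch.nn as nn


class ConvSNPBlock(nn.Module):
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(out_channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        u = self.bn(self.conv(x))   # membrane-potential-like response
        gate = torch.sigmoid(u)     # nonlinear "firing" gate in (0, 1)
        return u * gate             # response scaled by its firing gate


# Example: upsample-and-refine step in a decoder that restores encoder features.
if __name__ == "__main__":
    block = ConvSNPBlock(in_channels=128, out_channels=64)
    x = torch.randn(1, 128, 60, 80)
    x = nn.functional.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
    print(block(x).shape)  # torch.Size([1, 64, 120, 160])
```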
Implications and Future Directions
- The study demonstrates the effectiveness of CSPM-SNPNet in multimodal feature fusion, particularly in complex scenes and nighttime environments.
- Future research can explore the application of CSPM-SNPNet in other thermal and RGB image fusion tasks.
- The study highlights the importance of nonlinear spiking neural P systems in enhancing multi-channel information extraction and feature restoration.