Linlong Wang, Huaiqing Zhang, Rurao Fu, Kexin Lei, Yang Liu, Tingdong Yang, Jing Zhang, Xiaoning Ge
Accurate individual tree species classification is essential for forest inventory, management, and conservation. However, existing methods that rely primarily on single-source remote sensing data (e.g., spectral, LiDAR, or RGB) often suffer from insufficient feature representation and noise interference, particularly in subtropical forests with high species diversity, leading to increased classification errors. To address these challenges, we propose the Multi-source Tree Species Classification Fusion Network (MTSCFNet), a novel deep learning framework that integrates RGB imagery, LiDAR-derived feature maps, and GF-2 satellite data through a modified UNet backbone, which incorporates a three-branch encoder and a Triple Branch Feature Fusion (TBFF) module within a middle-fusion strategy. We evaluated MTSCFNet in Chinese-fir mixed forests at the Shanxia Forest Farm, Jiangxi Province, China. The results showed that: (1) MTSCFNet outperformed four baseline models, achieving a Macro F1 of 0.78 ± 0.01, Micro F1 of 0.93 ± 0.01, Weighted F1 of 0.93 ± 0.01, Matthews correlation coefficient (MCC) of 0.89 ± 0.01, Cohen's κ of 0.89 ± 0.01, and mIoU of 0.69 ± 0.01, with respective improvements of 4.05% in Macro F1, 1.89% in Micro F1, 0.09% in Weighted F1, 1.67% in MCC, 1.64% in Cohen's κ, and 5.92% in mIoU over the second-best model, SwinUNet; (2) compared with the best two-source combinations (R + S, R + L), MTSCFNet achieved up to 1.50%, 3.28%, 3.42%, 6.72%, 6.76%, and 3.51% higher Macro F1, Micro F1, Weighted F1, MCC, Cohen's κ, and mIoU, and up to 8.11%, 2.63%, 2.88%, 5.01%, 4.99%, and 11.48% improvements over single-source inputs, while also exhibiting the lowest variability, indicating strong robustness; (3) under different fusion strategies, MTSCFNet with middle fusion surpassed early and late fusion by up to 15.31%, 3.74%, 3.99%, 7.66%, 7.76%, and 22.33%, and by up to 24.13%, 5.76%, 6.20%, 11.48%, 11.57%, and 32.96%, in Macro F1, Micro F1, Weighted F1, MCC, Cohen's κ, and mIoU,
respectively, validating the effectiveness of feature-level multi-modal integration; (4) in cross-region transfer experiments, MTSCFNet demonstrated strong spatial generalizability, achieving average scores of 0.78 (Macro F1), 0.87 (Micro F1), 0.86 (Weighted F1), 0.59 (MCC), 0.59 (Cohen's κ), and 0.68 (mIoU), and outperformed SwinUNet by up to 38.80%, 9.40%, 18.58%, 22.48%, 26.17%, and 33.00% in Macro F1, Micro F1, Weighted F1, MCC, Cohen's κ, and mIoU across varying forest densities. Overall, MTSCFNet offers a robust, accurate, and transferable solution for tree species classification in complex subtropical forest environments.
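The middle-fusion idea central to MTSCFNet — encoding each modality in its own branch and merging at the feature level rather than stacking raw inputs (early fusion) or averaging predictions (late fusion) — can be illustrated with a minimal sketch. This is not the authors' implementation: the `encode` projection, the feature dimensions, and the simple concatenation-based fusion are hypothetical stand-ins for the three-branch encoder and TBFF module described above.

```python
import numpy as np

def encode(x, w):
    """Toy per-modality encoder: one linear projection followed by ReLU."""
    return np.maximum(x @ w, 0.0)

def middle_fusion(rgb, lidar, gf2, weights):
    """Feature-level (middle) fusion: each modality passes through its own
    encoder branch, and the resulting features are concatenated before any
    shared decoding — in contrast to early fusion (concatenate raw inputs)
    or late fusion (combine per-branch predictions)."""
    feats = [encode(x, w) for x, w in zip((rgb, lidar, gf2), weights)]
    return np.concatenate(feats, axis=-1)

rng = np.random.default_rng(0)
rgb   = rng.normal(size=(4, 3))   # 4 pixels, 3 RGB channels (toy sizes)
lidar = rng.normal(size=(4, 2))   # 2 LiDAR-derived features
gf2   = rng.normal(size=(4, 4))   # 4 GF-2 spectral bands

# One projection matrix per branch, each mapping into an 8-dim feature space
weights = [rng.normal(size=(c, 8)) for c in (3, 2, 4)]

fused = middle_fusion(rgb, lidar, gf2, weights)
print(fused.shape)  # (4, 24): three 8-dim branch outputs concatenated
```

In a real segmentation network the branches would be convolutional encoders and the fused features would feed a shared decoder; the sketch only shows where in the pipeline the three modalities meet under a middle-fusion strategy.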