SAM2-FNet: A Frequency-Domain Multi-Expert Fusion Network for Medical Image Segmentation

Authors

  • Zihua Zhang, Shenyang University of Technology, Shenyang 110870, China
  • Shaoli Li, Shenyang University of Technology, Shenyang 110870, China
  • Bing Ge, Shenyang Hanxi Mechanical Equipment LLC, Shenyang, Liaoning, China
  • Peng Lu, Shenyang Hanxi Mechanical Equipment LLC, Shenyang, Liaoning, China

DOI:

https://doi.org/10.63313/CS.8023

Keywords:

Medical image lesion segmentation, SAM2, frequency expert ensemble module

Abstract

In recent years, deep learning has achieved substantial progress in medical image segmentation; however, existing methods remain constrained by the challenge of effectively coupling local detail with global semantic context. To address this issue, we propose SAM2-FNet, a frequency-domain expert-fusion network built upon an improved SAM2-UNet. SAM2-FNet attains fine-grained, multi-scale feature enhancement through frequency-domain decoupling and a multi-expert collaborative mechanism, leading to marked improvements in segmentation accuracy. Specifically, SAM2-FNet provides three primary capabilities. (1) Frequency-domain decoupling and enhancement: input features are decomposed into high-frequency and low-frequency components via a two-dimensional Fourier transform; the resulting components are processed by a dual-branch heterogeneous pipeline and adaptively fused using channel-wise attention to preserve both texture details and global structure. (2) Dynamic expert decision-making: an improved Frequency Expert Module (FEM) implements a multi-expert architecture composed of five sub-networks (SubNets), enabling dynamic, data-driven expert weighting and enhancing the model's generalization to lesion heterogeneity. (3) Decoder optimization: a Local Refinement Module (LRM) is introduced in the decoder, where an atrous (dilated) convolutional pyramid combined with a spatial-attention mechanism narrows the encoder-decoder semantic gap and improves boundary-reconstruction fidelity. Experiments on the Kvasir-SEG dataset demonstrate that SAM2-FNet yields a 3.0% absolute increase in Dice coefficient and a 4.1% absolute increase in IoU relative to the baseline SAM2-UNet. Moreover, compared with benchmark models such as U-Mamba_Bot and Swin UNETR, SAM2-FNet exhibits superior robustness under high-noise and low-contrast imaging conditions.
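To make the frequency-domain decoupling concrete, the PyTorch sketch below reconstructs the idea from the abstract's description alone; it is not the authors' released implementation. The rectangular low-pass cutoff, the single-convolution branches, and the SE-style channel attention are illustrative assumptions, as is the cutoff hyperparameter.

    import torch
    import torch.nn as nn

    class FrequencyDecouple(nn.Module):
        # Hypothetical sketch of frequency-domain decoupling and enhancement:
        # split features into low-/high-frequency parts with a 2-D FFT, process
        # each part in its own branch, and fuse with channel-wise attention.
        # The cutoff ratio, branch design, and attention design are assumptions.
        def __init__(self, channels, cutoff=0.25):
            super().__init__()
            self.cutoff = cutoff
            self.low_branch = nn.Conv2d(channels, channels, 3, padding=1)   # global structure
            self.high_branch = nn.Conv2d(channels, channels, 3, padding=1)  # texture detail
            self.attn = nn.Sequential(  # SE-style channel attention over both branches
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(2 * channels, channels // 2, 1), nn.ReLU(),
                nn.Conv2d(channels // 2, 2 * channels, 1), nn.Sigmoid())
            self.proj = nn.Conv2d(2 * channels, channels, 1)

        def forward(self, x):
            _, _, H, W = x.shape
            freq = torch.fft.fftshift(torch.fft.fft2(x, norm="ortho"), dim=(-2, -1))
            # Centered low-pass mask: ones in a rectangle around the DC component.
            yy = torch.arange(H, device=x.device).view(H, 1)
            xx = torch.arange(W, device=x.device).view(1, W)
            mask = (((yy - H // 2).abs() < self.cutoff * H / 2)
                    & ((xx - W // 2).abs() < self.cutoff * W / 2)).to(x.dtype)

            def back(f):  # inverse shift + inverse FFT, keep the real part
                return torch.fft.ifft2(torch.fft.ifftshift(f, dim=(-2, -1)), norm="ortho").real

            low, high = back(freq * mask), back(freq * (1 - mask))
            fused = torch.cat([self.low_branch(low), self.high_branch(high)], dim=1)
            return x + self.proj(fused * self.attn(fused))  # residual enhancement

    # Shape check: FrequencyDecouple(64)(torch.randn(2, 64, 32, 32)) -> (2, 64, 32, 32)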
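The FEM's dynamic expert decision-making can be sketched the same way: five expert sub-networks whose outputs are mixed by data-dependent weights from a small gating head. The 3x3-convolution expert blocks and the pooled-feature gate are assumptions; the abstract specifies only the five-SubNet structure and dynamic weighting.

    import torch
    import torch.nn as nn

    class FEM(nn.Module):
        # Hypothetical sketch of the multi-expert module: five expert SubNets
        # combined with per-sample weights predicted by a gating head.
        def __init__(self, channels, num_experts=5):
            super().__init__()
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1),
                              nn.BatchNorm2d(channels), nn.ReLU())
                for _ in range(num_experts))
            self.gate = nn.Sequential(  # per-sample expert weights, summing to 1
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(channels, num_experts), nn.Softmax(dim=1))

        def forward(self, x):
            w = self.gate(x)                                          # (B, E)
            outs = torch.stack([e(x) for e in self.experts], dim=1)   # (B, E, C, H, W)
            return (w[:, :, None, None, None] * outs).sum(dim=1)      # weighted fusion

A softmax gate keeps the fusion a convex combination of experts, which is the usual way to make the weighting data-driven while keeping the output scale stable.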
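Likewise, a minimal sketch of the Local Refinement Module: an atrous-convolution pyramid followed by spatial attention on the fused features. The dilation rates (1, 2, 4) and the 7x7 attention kernel are assumptions not given in the abstract.

    import torch
    import torch.nn as nn

    class LRM(nn.Module):
        # Hypothetical sketch of the Local Refinement Module: a dilated-convolution
        # pyramid enlarges the receptive field at full resolution, and a spatial
        # attention map (from channel-avg and channel-max) re-weights the result.
        def __init__(self, channels):
            super().__init__()
            self.pyramid = nn.ModuleList(
                nn.Conv2d(channels, channels, 3, padding=d, dilation=d)
                for d in (1, 2, 4))
            self.fuse = nn.Conv2d(3 * channels, channels, 1)
            self.spatial = nn.Conv2d(2, 1, 7, padding=3)  # attention from avg+max maps

        def forward(self, x):
            feats = self.fuse(torch.cat([p(x) for p in self.pyramid], dim=1))
            pooled = torch.cat([feats.mean(dim=1, keepdim=True),
                                feats.max(dim=1, keepdim=True).values], dim=1)
            return x + feats * torch.sigmoid(self.spatial(pooled))  # refined residual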

References

[1] Li X, Meng X, Chen H, Fu X, Wang P, Chen X, Gu C, Zhou J. Integration of single sample and population analysis for understanding immune evasion mechanisms of lung cancer[J]. npj Systems Biology and Applications, 2023, 9(1): 4.

[2] Li X, Feng X, Zhou J, Luo Y, Chen X, Zhao J, Chen H, Xiong G, Luo G. A multi-modal feature fusion method based on deep learning for predicting immunotherapy response[J]. Journal of Theoretical Biology, 2024, 586: 111816.

[3] Cao H, Wang Y, Chen J, Jiang D, Zhang X, Tian Q, Wang M. Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation[M]. arXiv, 2021.

[4] Si C, Yu W, Zhou P, Zhou Y, Wang X, Yan S. Inception Transformer[C]//Advances in Neural Information Processing Systems. 2022, 35: 23495-23509.

[5] Kirillov A, Mintun E, Ravi N, Mao H, Rolland C, Gustafson L, Xiao T, Whitehead S, Berg AC, Lo WY, Dollar P, Girshick R. Segment Anything[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023: 4015-4026.

[6] Chen T, Zhu L, Ding C, Cao R, Wang Y, Li Z, Sun L, Mao P, Zang Y. SAM Fails to Segment Anything? -- SAM-Adapter: Adapting SAM in Underperformed Scenes: Camouflage, Shadow, Medical Image Segmentation, and More[M]. arXiv, 2023.

[7] Ma J, He Y, Li F, Han L, You C, Wang B. Segment Anything in Medical Images[J]. Nature Communications, 2024, 15(1): 654.

[8] Wu J, Ji W, Liu Y, Fu H, Xu M, Xu Y, Jin Y. Medical SAM Adapter: Adapting Segment Anything Model for Medical Image Segmentation[M]. arXiv, 2023.

[9] Cheng J, Ye J, Deng Z, Chen J, Li T, Wang H, Su Y, Huang Z, Chen J, Jiang L, Sun H, He J, Zhang S, Zhu M, Qiao Y. SAM-Med2D[M]. arXiv, 2023. arXiv:2308.16184.

[10] Xiong X, Wu Z, Tan S, Li W, Tang F, Chen Y, Li S, Ma J, Li G. SAM2-UNet: Segment Anything 2 Makes Strong Encoder for Natural and Medical Image Segmentation[M]. arXiv, 2024. arXiv:2408.08870.

[11] Jha D, Smedsrud PH, Riegler MA, Halvorsen P, De Lange T, Johansen D, Johansen HD. Kvasir-SEG: A Segmented Polyp Dataset[M]//Ro Y M, Cheng W H, Kim J, et al. MultiMedia Modeling: Vol. 11962. Cham: Springer International Publishing, 2020: 451-462.

[12] Fan DP, Ji GP, Zhou T, Chen G, Fu H, Shen J, Shao L. PraNet: Parallel Reverse Attention Network for Polyp Segmentation[M]. arXiv, 2020.

[13] Zheng J, Yan Y, Zhao L, Pan X. CGMA-Net: Cross-Level Guidance and Multi-Scale Aggregation Network for Polyp Segmentation[J]. IEEE Journal of Biomedical and Health Informatics, 2024, 28(3): 1424-1435.

[14] Oktay O, Schlemper J, Folgoc LL, Lee M, Heinrich M, Misawa K, Mori K, McDonagh S, Hammerla NY, Kainz B, Glocker B, Rueckert D. Attention U-Net: Learning Where to Look for the Pancreas[M]. arXiv, 2018.

[15] Zhou Z, Rahman Siddiquee MM, Tajbakhsh N, Liang J. UNet++: A Nested U-Net Architecture for Medical Image Segmentation[M]//Stoyanov D, Taylor Z, Carneiro G, et al. Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support: Vol. 11045. Cham: Springer International Publishing, 2018: 3-11.

[16] Hatamizadeh A, Tang Y, Nath V, Yang D, Myronenko A, Landman B, Roth HR, Xu D. UNETR: Transformers for 3D Medical Image Segmentation[C]//2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). Waikoloa, HI, USA: IEEE, 2022: 1748-1758.

[17] Myronenko A. 3D MRI brain tumor segmentation using autoencoder regularization[M]. arXiv, 2018.

[18] Hatamizadeh A, Nath V, Tang Y, Yang D, Roth H, Xu D. Swin UNETR: Swin Transformers for Semantic Segmentation of Brain Tumors in MRI Images[M]. arXiv, 2022.

[19] Ma J, Li F, Wang B. U-Mamba: Enhancing Long-range Dependency for Biomedical Image Segmentation[M]. arXiv, 2024.

[20] Huang CH, Wu HY, Lin YL. HarDNet-MSEG: A Simple Encoder-Decoder Polyp Segmentation Neural Network that Achieves over 0.9 Mean Dice and 86 FPS[M]. arXiv, 2021.

[21] Isensee F, Jaeger PF, Kohl SA, Petersen J, Maier-Hein KH. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation[J]. Nature Methods, 2021, 18(2): 203-211.

[22] Chen J, Lu Y, Yu Q, Luo X, Adeli E, Wang Y, Lu L, Yuille AL, Zhou Y. TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation[M]. arXiv, 2021.

[23] Sanderson E, Matuszewski BJ. FCN-Transformer Feature Fusion for Polyp Segmentation[M]//Yang G, Aviles-Rivero A, Roberts M, et al. Medical Image Understanding and Analysis: Vol. 13413. Cham: Springer International Publishing, 2022: 892-907.

Published

2026-01-30

How to Cite

SAM2-FNet: A Frequency-Domain Multi-Expert Fusion Network for Medical Image Segmentation. (2026). 计算机科学辑要, 1(3), 14-22. https://doi.org/10.63313/CS.8023