ZR2ViM: a recursive vision Mamba model for boundary-preserving medical image segmentation

Authors

  • Caijian Hua
  • Caorong Xiang
  • Liuying Li
  • Xia Zhou

Keywords:

Boundary preservation, Deep learning, Medical image segmentation, State space models, Vision mamba, Zigzag scanning

Abstract

Introduction: Medical image segmentation is fundamental to quantitative disease analysis and therapeutic decision-making. However, constrained by limited computational resources, existing deep learning methods often struggle to simultaneously model long-range dependencies and preserve boundary precision, particularly when delineating structures with complex morphology or blurred edges.

Method: To overcome these challenges, we propose ZR2ViM, a recursion-enhanced visual state space model designed for medical image segmentation. ZR2ViM augments the Vision Mamba framework with a Zigzag Recursive Reinforced (ZR2) Block that incorporates Stacked State Redistribution (SSR) and a Nested Recursive Connection (NRC). The NRC employs dual inner and outer pathways to iteratively fuse local details with global context while preserving 2D spatial adjacency. Furthermore, a Cross-directional Zigzag WKV (CZ-WKV) module executes multi-step recursive updates along multiple zigzag trajectories, injecting directional priors via Quad-Directional Token Shift (QShift). Collectively, these mechanisms mitigate serialization-induced banding artifacts and enhance the representation of fine, elongated, and low-contrast structures, all while maintaining near-linear computational complexity.
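
To illustrate why a zigzag trajectory helps preserve 2D spatial adjacency during serialization, the following minimal sketch compares a plain raster scan with a row-wise snake (boustrophedon) scan. It is only an illustration under stated assumptions: the abstract does not specify the exact trajectories used by CZ-WKV, and the function names `zigzag_order`, `raster_order`, and `max_step` are hypothetical, not taken from the authors' code.

```python
import numpy as np

def raster_order(height, width):
    """Flat token indices of an H x W grid in plain row-major order."""
    return np.arange(height * width)

def zigzag_order(height, width):
    """Flat token indices in a row-wise snake order (assumed zigzag variant).

    Even rows run left to right, odd rows run right to left, so the scan
    never jumps from the end of one row back to the start of the next.
    """
    idx = np.arange(height * width).reshape(height, width)
    idx[1::2] = idx[1::2, ::-1].copy()  # reverse every other row
    return idx.reshape(-1)

def max_step(order, width):
    """Largest Manhattan distance between consecutive tokens in a scan."""
    rows, cols = np.divmod(order, width)
    steps = np.abs(np.diff(rows)) + np.abs(np.diff(cols))
    return int(steps.max())

if __name__ == "__main__":
    h, w = 3, 4
    print(zigzag_order(h, w).tolist())
    print("raster max step:", max_step(raster_order(h, w), w))
    print("zigzag max step:", max_step(zigzag_order(h, w), w))
```

In the snake order every pair of consecutive tokens is a direct grid neighbor, whereas the raster order jumps across the full image width at each row boundary. Column-wise or reversed variants of the same idea would give additional scan directions.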

Results: Comprehensive evaluations on five public datasets spanning four medical imaging domains (dermatoscopic images, breast ultrasound, colorectal polyps, and abdominal multi-organ CT) demonstrate that ZR2ViM consistently outperforms representative convolutional, attention-based, and visual state space architectures in region consistency and boundary localization. Notably, ZR2ViM reduces the 95th-percentile Hausdorff distance (HD95) by 2.15 mm on the Synapse multi-organ CT dataset relative to the CC-ViM baseline, supporting its capability for precise, clinically relevant boundary delineation.
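
For readers unfamiliar with the boundary metric reported above, HD95 is the 95th percentile of the distances between the predicted and reference boundaries. The sketch below shows one common convention (pooling the surface distances from both directions before taking the percentile) for 2D binary masks. It is an assumption-laden illustration: the abstract does not state which HD95 implementation the authors used, evaluation on CT volumes would typically be done in 3D, and the names `boundary_points` and `hd95` are hypothetical.

```python
import numpy as np

def boundary_points(mask):
    """Coordinates of foreground pixels that touch background (4-neighborhood)."""
    mask = np.asarray(mask, dtype=bool)
    padded = np.pad(mask, 1, constant_values=False)
    # A pixel is interior if all four direct neighbors are foreground.
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                padded[1:-1, :-2] & padded[1:-1, 2:])
    return np.argwhere(mask & ~interior)

def hd95(pred, target, spacing=(1.0, 1.0)):
    """95th-percentile symmetric boundary distance between two 2D masks.

    `spacing` is the physical pixel size (for example in mm) per axis.
    """
    a = boundary_points(pred) * np.asarray(spacing, dtype=float)
    b = boundary_points(target) * np.asarray(spacing, dtype=float)
    if len(a) == 0 or len(b) == 0:
        raise ValueError("both masks must contain foreground pixels")
    # Pairwise Euclidean distances between the two boundary point sets.
    dists = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    a_to_b = dists.min(axis=1)  # each predicted point to nearest target point
    b_to_a = dists.min(axis=0)  # each target point to nearest predicted point
    return float(np.percentile(np.concatenate([a_to_b, b_to_a]), 95))

if __name__ == "__main__":
    pred = np.zeros((8, 8), dtype=bool)
    target = np.zeros((8, 8), dtype=bool)
    pred[2:6, 2:6] = True
    target[2:6, 3:7] = True  # same square shifted by one column
    print(hd95(pred, target))
```

Lower values indicate that the predicted boundary stays closer to the reference boundary; using the 95th percentile rather than the maximum makes the metric less sensitive to a few outlier pixels. The brute-force pairwise distance matrix is fine for small masks but a distance transform would be preferable for large ones.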

Conclusion: The ZR2ViM framework delivers accurate, boundary-preserving segmentation across diverse imaging modalities and anatomically complex structures, achieving these gains with near-linear computational complexity. These findings demonstrate that ZR2ViM offers a robust and efficient solution for medical image analysis, establishing a promising foundation for advanced clinical and research applications.

Published

2026-04-16