ZR²ViM: a recursive vision Mamba model for boundary-preserving medical image segmentation

Caijian Hua; Caorong Xiang; Liuying Li; Xia Zhou

Keywords:

Boundary preservation, Deep learning, Medical image segmentation, State space models, Vision mamba, Zigzag scanning

Abstract

Introduction: Medical image segmentation is fundamental to quantitative disease analysis and therapeutic decision-making. However, constrained by limited computational resources, existing deep learning methods often struggle to simultaneously model long-range dependencies and preserve boundary precision, particularly when delineating structures with complex morphology or blurred edges.

Method: To overcome these challenges, we propose ZR²ViM, a recursion-enhanced visual state space model designed for medical image segmentation. ZR²ViM augments the Vision Mamba framework with a Zigzag Recursive Reinforced (ZR²) Block that incorporates Stacked State Redistribution (SSR) and a Nested Recursive Connection (NRC). The NRC employs dual inner and outer pathways to iteratively fuse local details with global context while preserving 2D spatial adjacency. Furthermore, a Cross-directional Zigzag WKV (CZ-WKV) module executes multi-step recursive updates along multiple zigzag trajectories, injecting directional priors via Quad-Directional Token Shift (QShift). Collectively, these mechanisms mitigate serialization-induced banding artifacts and enhance the representation of fine, elongated, and low-contrast structures, all while maintaining near-linear computational complexity.
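The adjacency argument behind zigzag scanning can be illustrated with a small sketch. It is not the authors' CZ-WKV implementation: the abstract does not specify the exact trajectories, so the code assumes a boustrophedon (snake) order over rows or columns, optionally reversed, and the names `snake_scan` and `max_step` are hypothetical. It only shows why such an order keeps consecutive tokens spatially adjacent, whereas a plain raster scan jumps across the image at every row end.

```python
def snake_scan(height, width, column_major=False, reverse=False):
    """Return row-major flat indices of an H x W grid in snake (boustrophedon) order.

    Assumption: this is one plausible reading of "zigzag trajectory";
    the paper's actual scan paths may differ.
    """
    order = []
    # Outer loop walks over rows (or columns); inner loop alternates direction.
    outer, inner = (width, height) if column_major else (height, width)
    for i in range(outer):
        line = range(inner) if i % 2 == 0 else range(inner - 1, -1, -1)
        for j in line:
            row, col = (j, i) if column_major else (i, j)
            order.append(row * width + col)
    return order[::-1] if reverse else order


def max_step(order, width):
    """Largest Manhattan distance on the grid between consecutive tokens."""
    coords = [divmod(idx, width) for idx in order]
    return max(
        abs(r1 - r0) + abs(c1 - c0)
        for (r0, c0), (r1, c1) in zip(coords, coords[1:])
    )


if __name__ == "__main__":
    h, w = 3, 4
    raster = list(range(h * w))
    # Four hypothetical trajectories: row/column snake, each forward and reversed.
    paths = [
        snake_scan(h, w, column_major=cm, reverse=rv)
        for cm in (False, True)
        for rv in (False, True)
    ]
    print("raster max step:", max_step(raster, w))
    for p in paths:
        print(p, "max step:", max_step(p, w))
```

On a 3 x 4 grid the raster order has a largest step of 4 (the jump from the end of one row to the start of the next), while every snake trajectory has a largest step of 1, which is the property the abstract refers to as preserving 2D spatial adjacency.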

Results: Comprehensive evaluations on five public datasets across four medical imaging domains (dermatoscopic images, breast ultrasound, colorectal polyps, and abdominal multi-organ CT) demonstrate that ZR²ViM consistently outperforms representative convolutional, attention-based, and visual state space architectures in region consistency and boundary localization. Notably, ZR²ViM reduces the 95th-percentile Hausdorff distance (HD95) on the Synapse multi-organ CT dataset by 2.15 mm relative to the CC-ViM baseline, supporting its capability for precise, clinically relevant boundary delineation.
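For readers unfamiliar with the boundary metric, the following is a minimal sketch of how HD95 can be computed between two sets of boundary points. The abstract does not state which HD95 variant or toolkit was used, so this assumes the common symmetric definition (the larger of the two directed 95th-percentile nearest-neighbour distances) with a nearest-rank percentile; the function names are hypothetical, and coordinates are assumed to be already scaled to millimetres.

```python
import math


def directed_distances(src, dst):
    """For each point in src, the Euclidean distance to its nearest point in dst."""
    return [
        min(math.hypot(sx - dx, sy - dy) for dx, dy in dst)
        for sx, sy in src
    ]


def nearest_rank_percentile(values, percentile):
    """Nearest-rank percentile; other tools may interpolate and differ slightly."""
    ordered = sorted(values)
    rank = max(1, math.ceil(percentile / 100 * len(ordered)))
    return ordered[rank - 1]


def hd_percentile(pred_boundary, true_boundary, percentile=95):
    """Symmetric percentile Hausdorff distance between two boundary point sets."""
    forward = directed_distances(pred_boundary, true_boundary)
    backward = directed_distances(true_boundary, pred_boundary)
    return max(
        nearest_rank_percentile(forward, percentile),
        nearest_rank_percentile(backward, percentile),
    )


if __name__ == "__main__":
    # Toy boundaries: identical contours, plus one spurious outlier in the prediction.
    truth = [(float(x), 0.0) for x in range(20)]
    pred = truth + [(0.0, 100.0)]
    print("HD95:", hd_percentile(pred, truth))                   # ignores the single outlier
    print("HD100:", hd_percentile(pred, truth, percentile=100))  # classical Hausdorff
```

In this toy case the single outlier drives the classical Hausdorff distance to 100 mm while HD95 stays at 0 mm, which is why HD95 is generally preferred as a robust measure of boundary error.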

Conclusion: The ZR²ViM framework delivers accurate, boundary-preserving segmentation across diverse imaging modalities and anatomically complex structures, achieving these gains with near-linear computational complexity. These findings demonstrate that ZR²ViM offers a robust and efficient solution for medical image analysis, establishing a promising foundation for advanced clinical and research applications.
