Jiaxin Guo1
Wenzhen Dong2
Tianyu Huang1,2
Hao Ding3
Ziyi Wang1
Haomin Kuang4
Qi Dou1,2
Yun-Hui Liu1,2
1The Chinese University of Hong Kong 2Hong Kong Centre For Logistics Robotics
3Johns Hopkins University 4Shanghai Jiao Tong University
This repository contains the official implementation of the paper *Endo3R: Unified Online Reconstruction from Dynamic Monocular Endoscopic Video*.
## TODO

- Release model weights, inference and evaluation code
- Release training code
In this paper, we present Endo3R, a unified 3D surgical foundation model for online, scale-consistent reconstruction from monocular endoscopic video, without any prior information or extra optimization. It predicts globally aligned pointmaps, scale-consistent video depth, camera poses, and intrinsics.
The core contribution of our method is extending the capability of recent pairwise reconstruction models to long-term incremental dynamic reconstruction via an uncertainty-aware dual memory mechanism.
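As a rough illustration of the dual memory idea — a short-term buffer of recent frames paired with a long-term memory whose writes are gated by predicted uncertainty — here is a toy sketch. The buffer sizes, the uncertainty threshold, and the eviction rule are all invented for this sketch and are not the paper's actual implementation:

```python
from collections import deque

class DualMemory:
    """Toy uncertainty-aware dual memory: recent frames always enter a
    short-term buffer; only low-uncertainty features are promoted to the
    long-term memory that persists over the video."""

    def __init__(self, short_size=4, long_size=32, max_uncertainty=0.5):
        self.short = deque(maxlen=short_size)   # always keeps the latest frames
        self.long = deque(maxlen=long_size)     # keeps confident history only
        self.max_uncertainty = max_uncertainty

    def update(self, feature, uncertainty):
        self.short.append(feature)
        if uncertainty < self.max_uncertainty:  # uncertainty-gated write
            self.long.append(feature)

    def context(self):
        # Features a new frame would attend to: confident history + recent frames.
        return list(self.long) + list(self.short)
```

In this toy version, unreliable frames (e.g. with heavy tool occlusion or deformation) still provide short-term temporal context but never pollute the long-term map.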

## Installation

- Clone Endo3R:

```bash
git clone https://github.com/wrld/Endo3R.git
```

- Create the environment with the following commands:

```bash
conda create -n endo3r python=3.11 cmake=3.14.0
conda activate endo3r
conda install pytorch torchvision pytorch-cuda=12.1 -c pytorch -c nvidia  # use the CUDA version matching your system
pip install -r requirements.txt
```

- Optionally, compile the CUDA kernels for RoPE:

```bash
cd croco/models/curope/
python setup.py build_ext --inplace
cd ../../../
```

## Datasets

- We train our method on four datasets with GT/stereo depth and pose (datasets 1-7 of SCARED, StereoMIS, C3VD, EndoMapper) and four datasets without GT data (Cholec80, AutoLaparo, EndoVis17, EndoVis18).
- We evaluate our method on datasets 8-9 of SCARED and all scenes of Hamlyn.
## Pretrained Models

Please download the pretrained model:

```bash
mkdir checkpoints
cd checkpoints
gdown https://drive.google.com/uc?id=11hbBHEqBWes4oK2e8OeNi2RM-QtzhKE0
```

Also download the DUSt3R checkpoint:

```bash
wget https://download.europe.naverlabs.com/ComputerVision/DUSt3R/DUSt3R_ViTLarge_BaseDecoder_512_dpt.pth
```
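Before running the demo, you can sanity-check that both weights are in place. This is just a convenience sketch: it assumes both files end up under `checkpoints/` (run the `wget` inside `checkpoints/`) and that the downloaded model is named `endo3r.pth` as in the demo command below:

```python
from pathlib import Path

def missing_checkpoints(root="checkpoints",
                        names=("endo3r.pth",
                               "DUSt3R_ViTLarge_BaseDecoder_512_dpt.pth")):
    """Return the expected checkpoint files that are missing under `root`."""
    root = Path(root)
    return [n for n in names if not (root / n).is_file()]

if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing checkpoints:", ", ".join(missing))
    else:
        print("All checkpoints found.")
```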
## Demo

Run the demo with:

```bash
python demo.py --demo_path SEQ_PATH --kf_every IMG_INTERVAL --save_path SAVE_PATH --ckpt_path ./checkpoints/endo3r.pth --save_result
# example:
# python demo.py --demo_path examples/hamlyn_23/ --kf_every 1 --save_path outputs/hamlyn_23/ --ckpt_path ./checkpoints/endo3r.pth
```

To visualize the 3D reconstruction result:

```bash
python vis.py --recon_path SAVE_PATH
# example:
# python vis.py --recon_path outputs/hamlyn_23/
```

## Evaluation

To evaluate our method, run:

```bash
# SCARED
python eval.py --data_root EVAL_DATA_ROOT --data_type scared --ckpt_path ./checkpoints/endo3r.pth --resolution 320
# Hamlyn
python eval.py --data_root EVAL_DATA_ROOT --data_type hamlyn --ckpt_path ./checkpoints/endo3r.pth --resolution 320
```

## Acknowledgements

We would like to thank the authors of MonST3R, Spann3R, and CUT3R for their excellent work!
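For reference, monocular depth evaluation on these benchmarks typically median-scale-aligns the prediction to the ground truth before computing AbsRel and RMSE. The sketch below shows that common protocol; whether `eval.py` uses exactly this alignment is an assumption:

```python
import math

def median(xs):
    """Median of a non-empty list of numbers."""
    s = sorted(xs)
    n = len(s)
    return s[n // 2] if n % 2 else 0.5 * (s[n // 2 - 1] + s[n // 2])

def depth_metrics(pred, gt):
    """Align `pred` to `gt` by the ratio of medians, then compute
    AbsRel and RMSE. Inputs are flat lists of valid (masked) depths."""
    scale = median(gt) / median(pred)
    aligned = [scale * p for p in pred]
    abs_rel = sum(abs(a - g) / g for a, g in zip(aligned, gt)) / len(gt)
    rmse = math.sqrt(sum((a - g) ** 2 for a, g in zip(aligned, gt)) / len(gt))
    return abs_rel, rmse
```

Median alignment is needed because a monocular prediction is only defined up to scale; scale-consistent methods keep that scale stable across the whole video rather than per frame.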
## Citation

```bibtex
@inproceedings{guo2025endo3r,
  title={Endo3R: Unified Online Reconstruction from Dynamic Monocular Endoscopic Video},
  author={Jiaxin Guo and Wenzhen Dong and Tianyu Huang and Hao Ding and Ziyi Wang and Haomin Kuang and Qi Dou and Yun-Hui Liu},
  booktitle={International Conference on Medical Image Computing and Computer-Assisted Intervention},
  year={2025},
}
```