ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Articles | Volume XI-2-2026
https://doi.org/10.5194/isprs-annals-XI-2-2026-811-2026
03 Jul 2026

Evaluating the Performance of 3D Vision Foundation Models for DSM Reconstruction from Satellite Images

Liupeng Su, Yuhao Ye, Han Hu, Zeyuan Dai, Qianrui Guo, Heyi Li, Yulin Ding, and Qing Zhu

Keywords: 3D Vision Foundation Models (3D VFMs), Satellite Imagery, Multi-view Stereo, 3D Reconstruction, Digital Surface Model (DSM)

Abstract. Three-dimensional (3D) reconstruction from satellite imagery is a critical research topic in remote sensing and geoinformation science. Although 3D Vision Foundation Models (3D VFMs) have demonstrated remarkable performance in reconstructing natural scenes, their ability to handle high-resolution satellite imagery has not been systematically evaluated. This study presents a comprehensive assessment of seven representative 3D VFMs for satellite-based 3D reconstruction in combination with four point-cloud alignment strategies. The reconstructed Digital Surface Models (DSMs) were rigorously compared against high-precision LiDAR-derived DSMs on two publicly available multi-view satellite datasets: WHU-TLC and MVS3DM. The results show that Depth Anything V2 (DAV2) combined with an affine alignment strategy achieves the best overall performance among the evaluated methods. On the MVS3DM dataset, its reconstructed DSM reaches a Median Absolute Error (MedAE) of 1.693 m and a Root Mean Square Error (RMSE) of 3.649 m, an accuracy competitive with several traditional photogrammetric pipelines. In contrast, on the lower-resolution WHU-TLC dataset, all 3D VFMs exhibited notable performance degradation and the reconstructed results were of limited practical value, revealing persistent generalization challenges for current models in low-resolution scenarios. Overall, this study systematically quantifies the performance of 3D VFMs in satellite image-based 3D reconstruction, confirming their strong potential for high-resolution satellite applications and providing insights for improving model robustness and generalization across complex urban and low-resolution environments.
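The abstract reports DSM accuracy as MedAE and RMSE after an affine alignment step. The sketch below illustrates both ideas under loudly stated assumptions. It assumes the estimated and LiDAR DSMs are already co-registered grids flattened into equal-length lists, with None marking no-data cells. It also assumes that "affine alignment" can be reduced to a least-squares scale and offset on heights (ref ≈ a · pred + b). The paper's strategies operate on point clouds and may fit a full 3D transform, so this 1D version is only an illustration. The names `fit_scale_shift` and `dsm_errors` are hypothetical and do not come from the paper.

```python
import math
import statistics
from typing import Optional, Sequence, Tuple


def fit_scale_shift(pred: Sequence[float], ref: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical helper: least-squares fit of ref ~= a * pred + b.

    Assumption: 'affine alignment' is simplified here to a 1D scale and offset
    on heights; the evaluated strategies may instead align full 3D point clouds.
    """
    n = len(pred)
    if n != len(ref) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 samples")
    mean_p = statistics.fmean(pred)
    mean_r = statistics.fmean(ref)
    # Closed-form simple linear regression: a = cov(p, r) / var(p).
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(pred, ref))
    var = sum((p - mean_p) ** 2 for p in pred)
    if var == 0:
        raise ValueError("predicted heights are constant; scale is undetermined")
    a = cov / var
    b = mean_r - a * mean_p
    return a, b


def dsm_errors(
    est: Sequence[Optional[float]], ref: Sequence[Optional[float]]
) -> Tuple[float, float]:
    """Hypothetical helper: return (MedAE, RMSE) over cells valid in both DSMs.

    Grids are assumed co-registered and flattened; None marks a no-data cell.
    """
    diffs = [e - r for e, r in zip(est, ref) if e is not None and r is not None]
    if not diffs:
        raise ValueError("no overlapping valid cells")
    medae = statistics.median(abs(d) for d in diffs)  # median of |error|
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))  # root of mean squared error
    return medae, rmse


if __name__ == "__main__":
    # Toy relative heights from a model, and reference heights in metres.
    pred = [0.1, 0.2, 0.4, 0.5]
    ref = [12.0, 14.0, 18.0, 20.0]
    a, b = fit_scale_shift(pred, ref)
    aligned = [a * p + b for p in pred]
    print("scale, offset:", a, b)
    print("MedAE, RMSE:", dsm_errors(aligned, ref))
```

The median-based MedAE is robust to the large outliers that occur at building edges and occlusions, while RMSE penalizes them heavily. This asymmetry is one plausible reason why the reported RMSE (3.649 m) is roughly twice the MedAE (1.693 m). Note that fitting the alignment against the reference itself, as in this toy example, measures shape accuracy after the best possible scale and offset rather than absolute accuracy.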
