ISPRS-Annals

ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences

ISPRS-Annals

ISPRS Ann. Photogramm. Remote Sens. Spatial Inf. Sci.

2194-9050

Copernicus Publications

Göttingen, Germany

10.5194/isprs-annals-XI-2-2026-793-2026

Multimodal Large Language Models to road inventory with non-photorealistic Point Cloud visualization

Ameen

Horia

¹ Soilán

Mario

https://orcid.org/0000-0001-6545-2225

¹ Lorenzo

Henrique

¹ Balado

Jesús

https://orcid.org/0000-0002-3758-3102

CINTECX, Universidade de Vigo, GeoTECH, 36310 Vigo, Spain

03 07 2026

XI-2-2026 793 800

2026

This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/

This article is available from https://isprs-annals.copernicus.org/articles/XI-2-2026/793/2026/isprs-annals-XI-2-2026-793-2026.html

The full text article is available as a PDF file from https://isprs-annals.copernicus.org/articles/XI-2-2026/793/2026/isprs-annals-XI-2-2026-793-2026.pdf

Accurate road inventories are crucial for maintenance, safety, and resource allocation, with automation improving efficiency but often lacking user-friendly human-machine interaction. This paper evaluates how non-photorealistic rendering of 3D point clouds impacts Multimodal Large Language Models (MLLMs) interpretation for road inventory, testing three methods on real road data in Santarém (Portugal). From 3D point clouds coloured with RGB information, non-photorealistic techniques are implemented and compared: Ambient Occlusion (AO), Eye-Dome Lighting (EDL) and Multi Feature-Rich Synthetic Color (MFRSC). Several state-of-the-art MLLMs are also tested: GPT5, Gemini2.5-Pro, Gemini2.5-Flash, CogVLM2, MiniCPM-V, Llama4-scout-17b, Mistral-Small3.2, Qwen 2.5vl and Gemma3. The results indicate that non-photorealistic techniques do not hinder the identification of road elements by MLLMs, indicating their potential for 3D point cloud classification tasks even when true RGB colour is not available. Furthermore, the overall performance metrics, with F-scores over 80% for proprietary, state-of-the-art models (GPT5, Sonnet 4.5 and Gemini) show that 2D captures of 3D point clouds can be a suitable data source for zero-shot object classification. Rather than proposing new algorithms, this work contributes an empirical evaluation of how non-photorealistic point-cloud visualizations affect VLM-based road inventory interpretation.