Multimodal Large Language Models to road inventory with non-photorealistic Point Cloud visualization

Ameen, Horia; Soilán, Mario; Lorenzo, Henrique; Balado, Jesús

doi:10.5194/isprs-annals-XI-2-2026-793-2026

Articles | Volume XI-2-2026

© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.

03 Jul 2026

Keywords: LiDAR, Mobile Laser Scanning, Mobile Mapping Systems, Visual Language Models, Deep Learning

Abstract. Accurate road inventories are crucial for maintenance, safety, and resource allocation. Automation improves their efficiency but often lacks user-friendly human-machine interaction. This paper evaluates how non-photorealistic rendering of 3D point clouds affects their interpretation by Multimodal Large Language Models (MLLMs) for road inventory, testing three methods on real road data from Santarém (Portugal). Starting from 3D point clouds coloured with RGB information, three non-photorealistic techniques are implemented and compared: Ambient Occlusion (AO), Eye-Dome Lighting (EDL) and Multi Feature-Rich Synthetic Color (MFRSC). Several state-of-the-art MLLMs are also tested: GPT5, Gemini2.5-Pro, Gemini2.5-Flash, CogVLM2, MiniCPM-V, Llama4-scout-17b, Mistral-Small3.2, Qwen 2.5vl and Gemma3. The results indicate that non-photorealistic techniques do not hinder the identification of road elements by MLLMs, which suggests their potential for 3D point cloud classification tasks even when true RGB colour is not available. Furthermore, the overall performance metrics, with F-scores over 80% for proprietary, state-of-the-art models (GPT5, Sonnet 4.5 and Gemini), show that 2D captures of 3D point clouds can be a suitable data source for zero-shot object classification. Rather than proposing new algorithms, this work contributes an empirical evaluation of how non-photorealistic point-cloud visualizations affect VLM-based road inventory interpretation.
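
For readers unfamiliar with the metric, the F-score reported above is commonly computed as the harmonic mean of precision and recall. The following is a minimal sketch under that assumption; the abstract does not state how the scores were aggregated (per class, micro or macro), and the function name `f_score` and the counts used are hypothetical, chosen only for illustration.

```python
def f_score(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score from true-positive, false-positive and false-negative counts.

    With beta = 1 this is the usual F1 score (harmonic mean of precision and recall).
    """
    if tp == 0:
        # No correct detections: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)  # fraction of predicted objects that are correct
    recall = tp / (tp + fn)     # fraction of real objects that were found
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Illustrative counts only, not taken from the paper.
example = f_score(tp=8, fp=2, fn=2)
print(f"F1 = {example:.2f}")
```

With equal precision and recall, as in these illustrative counts, the F1 score equals that shared value.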
