Multimodal Large Language Models for road inventory with non-photorealistic point cloud visualization
Keywords: LiDAR, Mobile Laser Scanning, Mobile Mapping Systems, Visual Language Models, Deep Learning
Abstract. Accurate road inventories are crucial for maintenance, safety, and resource allocation. Automation improves their efficiency but often lacks user-friendly human-machine interaction. This paper evaluates how non-photorealistic rendering of 3D point clouds affects their interpretation by Multimodal Large Language Models (MLLMs) for road inventory, testing three rendering methods on real road data from Santarém (Portugal). Starting from 3D point clouds coloured with RGB information, three non-photorealistic techniques are implemented and compared: Ambient Occlusion (AO), Eye-Dome Lighting (EDL) and Multi Feature-Rich Synthetic Color (MFRSC). Several state-of-the-art MLLMs are also tested: GPT5, Gemini2.5-Pro, Gemini2.5-Flash, CogVLM2, MiniCPM-V, Llama4-scout-17b, Mistral-Small3.2, Qwen 2.5vl and Gemma3. The results indicate that non-photorealistic techniques do not hinder the identification of road elements by MLLMs, which suggests their potential for 3D point cloud classification tasks even when true RGB colour is not available. Furthermore, the overall performance metrics, with F-scores over 80% for proprietary state-of-the-art models (GPT5, Sonnet 4.5 and Gemini), show that 2D captures of 3D point clouds can be a suitable data source for zero-shot object classification. Rather than proposing new algorithms, this work contributes an empirical evaluation of how non-photorealistic point-cloud visualizations affect MLLM-based road inventory interpretation.
