ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Download
Publications Copernicus
Download
Citation
Articles | Volume X-4/W3-2022
https://doi.org/10.5194/isprs-annals-X-4-W3-2022-73-2022
https://doi.org/10.5194/isprs-annals-X-4-W3-2022-73-2022
14 Oct 2022
 | 14 Oct 2022

THE EFFECT OF CULTURE-SPECIFIC DIFFERENCES IN URBAN STREETSCAPES ON THE INFERENCE ACCURACY OF DEEP LEARNING MODELS

T. Inoue, R. Manabe, A. Murayama, and H. Koizumi

Keywords: Streetscape, Deep Learning, Semantic Segmentation, Intersection over Union, Inference Accuracy

Abstract. Owing to the increasing focus on places in urban planning and design, methods to evaluate the quality and value of urban places is crucially needed. Many studies use deep learning models to identify the proportion and composition of landscape elements in images for evaluation. The accuracy of semantic segmentation achieved with such models is often validated using Cityscapes, a street-level image dataset taken from German cities. However, few studies have quantitatively revealed the inference accuracy decrease caused by culture-specific characteristics of streetscapes.

In this study, we calculated by-class intersection over union (IoU) and newly-defined indices of false inferences to demonstrate how and to what extent deep learning models can infer each landscape element falsely when applied to Japanese street-level images. Our analysis revealed that certain landscape elements are more difficult to infer correctly based on specific causes, such as their appearances in images and unique characteristics of the fixed physical configuration of Japanese streets. By applying the false inference categorization framework presented in this study, researchers can adjust their approaches considering two aspects: a decrease in inference accuracies of deep learning models and the impact of culture-specific characteristics of streetscapes on people's perception and valuation of urban places. Based on the results and analyses, a future research direction is to develop and implement more accurate image recognition models considering culture-specific characteristics to understand people's perceptions of urban spaces and assess the value of urban places by using the big data including street-level images.