LEARNING GEOGRAPHICAL DISTRIBUTION OF VACANT HOUSES USING CLOSED MUNICIPAL DATA: A CASE STUDY OF WAKAYAMA CITY, JAPAN
Keywords: Vacant house, Municipal data, Closed data, Resident registration, Building registration, Machine learning, XGBoost
Abstract. Detecting vacant houses is an urgent problem for municipalities. It is also a suitable example for promoting the utilisation of data held by municipalities. This study proposes a vacant housing detection model that uses closed municipal data, with the aim of accelerating the use of public data in support of smart cities. The model employs a machine learning technique to achieve high predictive power. It can handle complex municipal data that include non-linear feature characteristics and substantial missing data. Handling missing data is particularly important in the practical use of closed municipal data, because not all records can necessarily be matched to a building unit. In this analysis, the model achieved an accuracy of 95.4 percent and a false positive rate of 3.7 percent, which are sufficient for detecting vacant houses. The true positive rate, however, is 77.0 percent. Although this rate is not especially low, feature selection and the collection of additional samples may improve it. Mapping the geographic distribution of vacant houses further allowed us to compare the actual and estimated numbers of vacant houses. In more than 80 percent of the 500-meter grid cells the error is below 10, which we believe gives city planners informative data for roughly grasping geographical tendencies.
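The evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the label convention (1 = vacant, 0 = occupied), the grid cell identifiers, and all numbers are hypothetical toy values chosen only to show how accuracy, true positive rate, false positive rate, and the share of grid cells with a small count error might be computed.

```python
def confusion_rates(y_true, y_pred):
    """Return (accuracy, true positive rate, false positive rate).

    Labels are assumed to be 1 for vacant and 0 for occupied, and both
    classes are assumed to be present in y_true.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    tpr = tp / (tp + fn)  # share of vacant houses that are detected
    fpr = fp / (fp + tn)  # share of occupied houses flagged as vacant
    return accuracy, tpr, fpr


def share_of_cells_below(actual, estimated, threshold=10):
    """Share of grid cells whose absolute count error is below threshold.

    actual and estimated map a grid cell id to a vacant house count;
    a cell missing from one mapping is treated as a count of zero.
    """
    cells = set(actual) | set(estimated)
    within = sum(
        1 for c in cells
        if abs(actual.get(c, 0) - estimated.get(c, 0)) < threshold
    )
    return within / len(cells)


# Toy building-level labels (hypothetical, not taken from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
accuracy, tpr, fpr = confusion_rates(y_true, y_pred)

# Toy per-cell counts for four hypothetical 500-meter grid cells.
actual_counts = {"A": 12, "B": 40, "C": 3, "D": 0}
estimated_counts = {"A": 9, "B": 25, "C": 5, "D": 1}
share = share_of_cells_below(actual_counts, estimated_counts)

print(f"accuracy={accuracy:.3f} TPR={tpr:.3f} FPR={fpr:.3f}")
print(f"share of cells with error below 10: {share:.2f}")
```

Reporting the true positive rate alongside accuracy matters when vacant houses are a small minority of buildings, since a model can reach high accuracy while still missing a noticeable share of the vacant ones.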