Per-pixel population estimates in the Western Amazon using limited remote sensing and spatial data
Keywords: Random Forest, Machine Learning, Demography, Remote Sensing, Population Estimate
Abstract. Detailed demographic data for northern Brazil are lacking for the period from the 1980s to the early 2000s. Demographic data for that period are available only at the municipal level, and municipalities in northern Brazil cover extensive territorial areas. This data gap may hinder the understanding of human settlement processes in the region, including the expansion of economic activities, deforestation, and violent conflicts. Machine learning algorithms such as Random Forest, combined with geospatial data from different sources, can be used to disaggregate demographic data, transforming the discrete space of municipal polygons into a continuous raster surface. This study therefore assesses the performance of this approach under limited data availability, in scenarios similar to those of the final decades of the twentieth century. To this end, a Random Forest model was implemented and evaluated against both the 2022 Brazilian census data and the WorldPop dataset. The results indicate that the proposed methodology is a viable solution in data-scarce contexts, yielding estimates comparable to official census figures and to more complex products such as WorldPop, while demanding significantly less computational effort. Future research should examine the model’s performance across broader and more heterogeneous regions to better assess its generalizability.
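The disaggregation idea described in the abstract can be illustrated with a minimal sketch. It is not the study's implementation: it assumes a common dasymetric scheme in which a Random Forest is trained on municipal-level log population density, applied per pixel to obtain weights, and the weights are then rescaled so that each municipality's pixels sum to its census total. The function name `disaggregate`, the two covariates, and all data are hypothetical and synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def disaggregate(X_pixel, muni_id, census_total, seed=0):
    """Spread municipal totals over pixels using Random Forest weights.

    Hypothetical helper for illustration only (assumed scheme, not the
    study's actual pipeline).
    """
    n_muni = census_total.shape[0]
    n_pixels = np.bincount(muni_id, minlength=n_muni)

    # Aggregate each per-pixel covariate to its municipal mean.
    X_muni = np.column_stack([
        np.bincount(muni_id, weights=X_pixel[:, j], minlength=n_muni) / n_pixels
        for j in range(X_pixel.shape[1])
    ])

    # Train on municipal log density (people per pixel).
    log_density = np.log(census_total / n_pixels)
    rf = RandomForestRegressor(n_estimators=200, random_state=seed)
    rf.fit(X_muni, log_density)

    # Predict a positive weight for every pixel.
    weights = np.exp(rf.predict(X_pixel))

    # Rescale so that pixels within a municipality sum to its census total.
    weight_sum = np.bincount(muni_id, weights=weights, minlength=n_muni)
    return weights / weight_sum[muni_id] * census_total[muni_id]


# Synthetic toy data: 40 municipalities of 50 pixels each.
rng = np.random.default_rng(0)
n_muni, px_per_muni = 40, 50
muni_id = np.repeat(np.arange(n_muni), px_per_muni)
X_pixel = rng.random((n_muni * px_per_muni, 2))  # two made-up covariates
census_total = rng.integers(1_000, 50_000, size=n_muni).astype(float)

pop_pixel = disaggregate(X_pixel, muni_id, census_total)
```

The rescaling step guarantees that the raster preserves the municipal totals by construction, so the Random Forest only determines how population is distributed within each polygon.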
