A SPARK-BASED COMPUTING FRAMEWORK FOR SPATIAL DATA
Keywords: Spatial Data, Spark, Index, Spatial Operations
Abstract. This paper proposes a novel Apache Spark-based framework for spatial data processing that comprises four layers: spatial data storage, spatial RDDs, spatial operations, and a spatial query language. The spatial data storage layer uses HDFS to store large volumes of spatial vector and raster data across a distributed cluster. The spatial RDDs are abstract logical datasets of spatial data types that can be distributed across the Spark cluster and processed with Spark transformations and actions. The spatial operations layer provides a set of operations on spatial RDDs, such as range query, k-nearest-neighbour query, and spatial join. The spatial query language is a user-friendly interface that gives people unfamiliar with Spark a convenient way to invoke these spatial operations. In contrast to other Spark-based spatial frameworks, the proposed framework uses spatial indexes such as grid and R-tree for both data storage and query. Extensive experiments on a real system prototype with real datasets show that better performance can be achieved.
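To make the role of a spatial index in a range query concrete, the following is a minimal single-machine sketch of a uniform grid index over 2D points. It is not the framework's implementation: the class name `GridIndex`, the `cell_size` parameter, and the representation of points as `(x, y)` tuples are all assumptions made for illustration, and no Spark API is used. In a distributed setting, one could imagine each grid cell mapping to a partition of a spatial RDD, but that mapping is likewise an assumption here.

```python
from collections import defaultdict
from math import floor


class GridIndex:
    """Uniform grid index over 2D points (illustrative sketch only)."""

    def __init__(self, cell_size):
        if cell_size <= 0:
            raise ValueError("cell_size must be positive")
        self.cell_size = cell_size
        # Maps a (column, row) cell id to the points that fall in that cell.
        self.cells = defaultdict(list)

    def _cell(self, x, y):
        # Cell id of the grid cell containing the coordinate (x, y).
        return (floor(x / self.cell_size), floor(y / self.cell_size))

    def insert(self, point):
        x, y = point
        self.cells[self._cell(x, y)].append(point)

    def range_query(self, xmin, ymin, xmax, ymax):
        """Return all points inside the closed rectangle [xmin, xmax] x [ymin, ymax]."""
        cmin, rmin = self._cell(xmin, ymin)
        cmax, rmax = self._cell(xmax, ymax)
        hits = []
        # Visit only the cells that overlap the query rectangle,
        # then filter the candidates in those cells exactly.
        for col in range(cmin, cmax + 1):
            for row in range(rmin, rmax + 1):
                for (x, y) in self.cells.get((col, row), ()):
                    if xmin <= x <= xmax and ymin <= y <= ymax:
                        hits.append((x, y))
        return hits


# Hypothetical usage with made-up points.
index = GridIndex(cell_size=10.0)
for p in [(1.0, 1.0), (5.0, 12.0), (25.0, 25.0), (9.9, 9.9)]:
    index.insert(p)
result = sorted(index.range_query(0.0, 0.0, 10.0, 10.0))
```

The point of the sketch is that the query inspects only the cells overlapping the query rectangle rather than scanning every point, which is the general benefit a grid or R-tree index brings to range queries.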