INTERESTING SPATIO-TEMPORAL REGION DISCOVERY COMPUTATIONS OVER GPU AND MAPREDUCE PLATFORMS
Keywords: GPU, MapReduce, Hadoop, Parallelism, GIS, Spatial, Spatio-Temporal, Data Mining
Abstract. Discovery of interesting paths and regions in spatio-temporal data sets is important to many fields, such as the earth and atmospheric sciences, GIS, public safety, and public health, both as a goal in itself and as a preliminary step in a larger series of computations. This discovery is usually an exhaustive procedure that quickly becomes extremely time consuming with traditional paradigms and hardware, and, given the rapidly growing sizes of today’s data sets, its cost is outpacing the rate at which computational capacity is growing. In our previous work (Prasad et al., 2013a) we achieved a 50-fold speedup over a sequential implementation using a single GPU. We achieved near-linear speedup over this result on interesting path discovery by using Apache Hadoop to distribute the workload across multiple GPU nodes. Leveraging the parallel architecture of GPUs, we also drastically reduced the computation time of a 3-dimensional spatio-temporal interesting region search on a single tile of Normalized Difference Vegetation Index (NDVI) data for Saudi Arabia. We further observed almost linear speedup in compute performance by distributing this workload across several GPUs with a simple MapReduce model. This increases processing speed 10-fold over the comparable sequential implementation while simultaneously increasing the amount of data processed 384-fold, which allowed us to process the entirety of the selected data set instead of a constrained window.
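To make the computation pattern concrete, the following is a minimal sketch of an exhaustive 3-dimensional window search split into independent map tasks and combined by a reduce step. It is an illustration under stated assumptions, not the implementation evaluated here: the interest measure (mean value of a fixed-size window), the function names, and the partitioning by time origin are all hypothetical, and a plain Python loop stands in for both the GPU kernel and the Hadoop mappers.

```python
from functools import reduce
from itertools import product


def window_mean(cube, t0, y0, x0, size):
    """Mean of the window of shape `size` whose origin is (t0, y0, x0).

    Assumption: "interest" is taken to be the window mean; the real
    measure may differ.
    """
    dt, dy, dx = size
    total = 0.0
    for t in range(t0, t0 + dt):
        for y in range(y0, y0 + dy):
            for x in range(x0, x0 + dx):
                total += cube[t][y][x]
    return total / (dt * dy * dx)


def map_chunk(cube, size, t_starts):
    """Map step: exhaustively score every window whose time origin is in
    `t_starts` and return the best (score, origin) pair for this chunk."""
    _, dy, dx = size
    ny, nx = len(cube[0]), len(cube[0][0])
    best = None
    for t0, y0, x0 in product(t_starts, range(ny - dy + 1), range(nx - dx + 1)):
        candidate = (window_mean(cube, t0, y0, x0, size), (t0, y0, x0))
        if best is None or candidate[0] > best[0]:
            best = candidate
    return best


def reduce_best(a, b):
    """Reduce step: keep the higher-scoring of two partial results."""
    return a if a[0] >= b[0] else b


def find_region(cube, size, n_chunks):
    """Partition the time origins into `n_chunks` independent map tasks,
    then reduce their partial results to a single best window."""
    starts = list(range(len(cube) - size[0] + 1))
    chunks = [starts[i::n_chunks] for i in range(n_chunks)]
    # Each map_chunk call is independent, so in a distributed setting
    # every chunk could be handed to a separate worker.
    partials = [map_chunk(cube, size, c) for c in chunks if c]
    return reduce(reduce_best, partials)
```

Because the map tasks share no state, the result does not depend on how many chunks the time origins are split into (ties aside), which is the property that lets this kind of search scale across several workers.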