In this project, we propose a framework for processing and analyzing large-scale geospatial and environmental data on a “Big Data” infrastructure. Existing Big Data solutions generally lack mechanisms for analyzing large-scale geospatial data. In this work, we extend HBase and HDFS to support geospatial data and demonstrate the analytical use of the extended system with several common geospatial data types and the data mining capabilities of the R language. The resulting Hadoop-based framework provides a robust way to share large-scale geospatial data, analyze it with spatial data mining, and make the results available to end users.
Figure 1. R and HBase
Figure 2. Shark Alert: Test Objects are Multiple Geospatial Data Files
Figure 3. Probability of Shark Appearance Calculated from Water Temperature and Salinity