Paper Title
GeoFlink: A Distributed and Scalable Framework for the Real-time Processing of Spatial Streams
Authors
Abstract
Apache Flink is an open-source system for scalable processing of batch and streaming data. Flink does not natively support efficient processing of spatial data streams, which many applications dealing with spatial data require. Other scalable spatial data processing platforms, including GeoSpark and Spatial Hadoop, do not support streaming workloads and can only handle static/batch workloads. To fill this gap, we present GeoFlink, which extends Apache Flink to support spatial data types, indexes, and continuous queries over spatial data streams. A grid-based index is introduced to enable efficient processing of continuous spatial queries and effective data distribution across Flink cluster nodes. GeoFlink currently supports spatial range, spatial $k$NN, and spatial join queries on the point data type. An extensive experimental study on real spatial data streams shows that GeoFlink achieves significantly higher query throughput than ordinary Flink processing.
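
The abstract does not describe the grid-based index in detail, so the following is only a minimal sketch of how a uniform grid can prune a spatial range query over a stream of points. It is not GeoFlink's API: the class `UniformGrid`, the function `range_query`, the square bounding box, and the use of Euclidean distance are all assumptions made for illustration.

```python
import math
from typing import Iterable, Iterator, Set, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int]


class UniformGrid:
    """Hypothetical uniform grid over a bounding box (illustrative, not GeoFlink's API)."""

    def __init__(self, min_x: float, min_y: float,
                 max_x: float, max_y: float, cells_per_side: int) -> None:
        self.min_x, self.min_y = min_x, min_y
        self.n = cells_per_side
        self.cell_w = (max_x - min_x) / cells_per_side
        self.cell_h = (max_y - min_y) / cells_per_side

    def cell_of(self, p: Point) -> Cell:
        # Map a point to (column, row); clamp so that boundary and
        # out-of-range points fall into the nearest edge cell.
        col = min(self.n - 1, max(0, int((p[0] - self.min_x) / self.cell_w)))
        row = min(self.n - 1, max(0, int((p[1] - self.min_y) / self.cell_h)))
        return (col, row)

    def candidate_cells(self, q: Point, r: float) -> Set[Cell]:
        # Every cell overlapping the square that encloses the query circle.
        lo = self.cell_of((q[0] - r, q[1] - r))
        hi = self.cell_of((q[0] + r, q[1] + r))
        return {(c, w)
                for c in range(lo[0], hi[0] + 1)
                for w in range(lo[1], hi[1] + 1)}


def range_query(grid: UniformGrid, stream: Iterable[Point],
                q: Point, r: float) -> Iterator[Point]:
    """Yield streamed points within distance r of q (Euclidean distance assumed)."""
    cells = grid.candidate_cells(q, r)
    for p in stream:
        # Cheap cell-membership test first; exact distance only for candidates.
        if grid.cell_of(p) in cells and math.dist(p, q) <= r:
            yield p


if __name__ == "__main__":
    grid = UniformGrid(0.0, 0.0, 10.0, 10.0, cells_per_side=10)
    points = [(1.0, 1.0), (1.5, 1.2), (9.0, 9.0), (2.5, 1.0)]
    print(list(range_query(grid, points, q=(1.0, 1.0), r=1.0)))
```

In a distributed setting, the same cell identifier could plausibly serve as a partitioning key so that nearby points are routed to the same worker; whether GeoFlink does exactly this is not stated in the abstract.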