OIQ-tree: Support Continuous k Nearest Neighbor Queries over Large-Scale Spatial-Textual Data Streams

Authors
Yang R. [1]
Niu B.-N. [1]
Affiliation
[1] College of Information and Computer, Taiyuan University of Technology, Jinzhong
Funding
National Natural Science Foundation of China
Keywords
Continuous query; Data streams; K nearest neighbor; Spatial-textual index; Spatial-textual query;
DOI
10.11897/SP.J.1016.2021.01732
Abstract
Continuous k nearest neighbor queries over spatial-textual data streams (CkQST for short) retrieve and continuously monitor at most k objects nearest to a user-specified location that contain all user-specified keywords, over data streams composed of spatial-textual objects. CkQST is one type of continuous query over spatial-textual data streams and is widely used, by way of subscriptions, in location-based applications such as location-aware advertisement targeting, microblog analysis, and mobile navigation services. Evaluating CkQST follows the general framework for continuous queries over spatial-textual data streams: a spatial index and a textual index are combined into a hybrid spatial-textual index that organizes the queries, and the continuously arriving objects are matched against them using the spatial and textual filtering capabilities of the index. In this framework, evaluation efficiency depends on the filtering ability of the index. The main way to improve that ability is to map the spatial search range of each query to the smallest possible area of the index structure, which reduces the number of queries that must be verified against each incoming object. This approach suits situations where the search range of a query rarely changes. For CkQST, however, the spatial range covering the k nearest qualifying objects changes frequently as the number of objects containing all the query keywords changes, and the index must be updated accordingly, which is very expensive. To solve this problem, this paper combines a Quad-tree with an inverted index into a hybrid spatial-textual index that organizes CkQST: the Quad-tree efficiently supports frequent changes in the spatial range of CkQST, and the inverted index efficiently supports keyword queries.
For spatial filtering, a memory-based cost model, VUMBCM (Verification and Update of Memory Based Cost Model), is proposed to optimize the mapping of the search range of CkQST to Quad-tree nodes by trading off the verification cost against the update cost of the index. For textual filtering, a block-based ordered inverted index is proposed to organize CkQST at the Quad-tree nodes; it quickly locates promising queries and avoids verifying a large number of unpromising queries in the posting lists. In addition, the ordered inverted index allows multiple stream objects that share common text to be processed in a batch, which improves throughput during textual verification. The resulting hybrid index, which integrates the Quad-tree with the block-based ordered inverted index and the cost model, is called OIQ-tree. Extensive experiments on real-world and synthetic datasets demonstrate that OIQ-tree evaluates CkQST efficiently. Compared with state-of-the-art techniques, when the number of subscribed queries reaches 20 million, the average index update time caused by incoming stream objects decreases by 46%, and the average processing time of incoming objects decreases by 22%. © 2021, Science Press. All rights reserved.
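
The query semantics described in the abstract can be illustrated with a minimal Python sketch. It is not the OIQ-tree: it has no Quad-tree, no cost model, and no block-based ordering. It only shows how a CkQST query keeps at most k nearest objects containing all of its keywords, and how its search radius shrinks as matching objects arrive, which is the behavior that forces index updates. The names `CkQSTQuery`, `NaiveIndex`, `offer`, and `process` are hypothetical, and posting each query under its lexicographically smallest keyword is an assumption made for brevity.

```python
import heapq
import math
from collections import defaultdict


class CkQSTQuery:
    """One continuous query: a location, required keywords, and a result size k."""

    def __init__(self, qid, x, y, keywords, k):
        self.qid = qid
        self.x, self.y = x, y
        self.keywords = frozenset(keywords)  # assumed non-empty
        self.k = k
        self._heap = []  # max-heap on distance, stored as (-distance, object id)

    def radius(self):
        """Current search radius: unbounded until k results are held."""
        if len(self._heap) < self.k:
            return math.inf
        return -self._heap[0][0]

    def offer(self, obj_id, ox, oy, terms):
        """Try to add an object; return True if the result set changed."""
        if not self.keywords <= terms:
            return False  # textual filter: object lacks a required keyword
        dist = math.hypot(ox - self.x, oy - self.y)
        if dist >= self.radius():
            return False  # spatial filter: outside the current search range
        if len(self._heap) == self.k:
            heapq.heapreplace(self._heap, (-dist, obj_id))  # evict the farthest
        else:
            heapq.heappush(self._heap, (-dist, obj_id))
        return True

    def results(self):
        """Object ids ordered from nearest to farthest."""
        return [oid for _, oid in sorted((-d, oid) for d, oid in self._heap)]


class NaiveIndex:
    """Flat inverted index; each query is posted under one of its keywords."""

    def __init__(self):
        self.postings = defaultdict(list)

    def subscribe(self, query):
        # Assumption: use the lexicographically smallest keyword as the posting key.
        self.postings[min(query.keywords)].append(query)

    def process(self, obj_id, x, y, terms):
        """Match one incoming object; return ids of queries whose results changed."""
        terms = frozenset(terms)
        updated = []
        for term in terms:
            for query in self.postings.get(term, ()):
                if query.offer(obj_id, x, y, terms):
                    updated.append(query.qid)
        return updated
```

In this sketch the radius of a query drops from unbounded to the k-th nearest distance as soon as k matching objects have arrived, and it can shrink again with every closer match. An index that maps the search range to spatial cells would have to be adjusted each time, which is the update cost the abstract refers to.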
Pages: 1732-1750
Number of pages: 18
Related papers
24 in total
  • [1] Chen L S, Cong G, Cao X., An efficient query indexing mechanism for filtering geo-textual data, Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, pp. 749-760, (2013)
  • [2] Mahmood A R, Aly A M, Aref W G., FAST: Frequency-aware indexing for spatio-textual data streams, Proceedings of the 2018 IEEE 34th International Conference on Data Engineering, pp. 305-316, (2018)
  • [3] Wang X, Zhang Y, Zhang W J, Et al., AP-tree: Efficiently support location-aware publish/subscribe, The VLDB Journal, 24, 6, pp. 823-848, (2015)
  • [4] Deng Z, Wang M, Wang L Z, Et al., An efficient indexing approach for continuous spatial approximate keyword queries over geo-textual streaming data, International Journal of Geo-Information, 8, 2, pp. 57-76, (2019)
  • [5] Guo L, Zhang D X, Li G L, Et al., Location-aware pub/sub system: When continuous moving queries meet dynamic event streams, Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pp. 843-857, (2015)
  • [6] Li G L, Wang Y, Wang T, Feng J H., Location-aware publish/subscribe, Proceedings of the 2013 ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 802-810, (2013)
  • [7] Hu H Q, Liu Y Q, Li G L, Et al., A location-aware publish/subscribe framework for parameterized spatio-textual subscriptions, Proceedings of the 2015 IEEE 31st International Conference on Data Engineering, pp. 711-722, (2015)
  • [8] Wang X, Zhang W J, Zhang Y, Et al., Top-k spatial-keyword publish/subscribe over sliding window, The VLDB Journal, 26, 3, pp. 301-326, (2017)
  • [9] Chen L S, Cong G, Cao X, Tan K L., Temporal spatial-keyword top-k publish/subscribe, Proceedings of the 2015 IEEE 31st International Conference on Data Engineering, pp. 255-266, (2015)
  • [10] Chen L S, Shang S., Approximate spatio-temporal top-k publish/subscribe, World Wide Web, 22, 5, pp. 2153-2175, (2019)