RE-STORE: A system for compressing, browsing, and searching large documents

Times cited: 9
Authors
Moffat, A [1]
Wan, R [1]
Affiliation
[1] Univ Melbourne, Dept Comp Sci & Software Engn, Parkville, Vic 3010, Australia
DOI
10.1109/SPIRE.2001.989752
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
We describe a software system for managing text files of up to several hundred megabytes that combines a number of useful facilities. First, the text is stored compressed using a variant of the RE-PAIR mechanism described by Larsson and Moffat, with space savings comparable to those obtained by other widely used general-purpose compression systems. Second, we provide, as a byproduct of the compression process, a phrase-based browsing tool that allows users to explore the contents of the source text in a natural and useful manner. And third, once a set of desired phrases has been determined through the use of the browsing tool, the compressed text can be searched to determine locations at which those phrases appear, without decompressing the whole of the stored text, and without use of an additional index. That is, we show how the RE-PAIR compression regime can be extended to allow phrase-based browsing and fast interactive searching.
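The compression step named in the abstract rests on the general Re-Pair idea: repeatedly replace the most frequent pair of adjacent symbols with a new phrase symbol until no pair repeats. The sketch below illustrates only that general idea, in Python as an assumption, since the record contains no code. It is not the RE-STORE variant described by Moffat and Wan: all function names are hypothetical, pair counting is recomputed from scratch each round (quadratic time, unlike the linear-time structures of the original Re-Pair algorithm), and neither the reduced sequence nor the rule set is entropy-coded. Because each rule expands to a phrase of the source text, the rule set also hints at how a phrase hierarchy can support browsing.

```python
from collections import Counter
from typing import Dict, List, Tuple, Union

# Terminals are single characters; phrase (nonterminal) symbols are ints.
Symbol = Union[str, int]
Rules = Dict[int, Tuple[Symbol, Symbol]]


def count_pairs(seq: List[Symbol]) -> Counter:
    """Count adjacent pairs, skipping overlaps inside runs such as 'aaa'."""
    counts: Counter = Counter()
    last_pair, last_i = None, -2
    for i in range(len(seq) - 1):
        pair = (seq[i], seq[i + 1])
        if pair == last_pair and last_i == i - 1:
            continue  # overlaps the occurrence counted at position i - 1
        counts[pair] += 1
        last_pair, last_i = pair, i
    return counts


def replace_pair(seq: List[Symbol], pair: Tuple[Symbol, Symbol],
                 new_symbol: int) -> List[Symbol]:
    """Replace non-overlapping occurrences of `pair`, scanning left to right."""
    out: List[Symbol] = []
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def re_pair(text: str) -> Tuple[List[Symbol], Rules]:
    """Hypothetical API: return the reduced sequence and the phrase rules."""
    seq: List[Symbol] = list(text)
    rules: Rules = {}
    while True:
        counts = count_pairs(seq)
        if not counts:
            break
        pair, freq = counts.most_common(1)[0]
        if freq < 2:  # stop once no pair occurs twice
            break
        new_symbol = len(rules)
        rules[new_symbol] = pair
        seq = replace_pair(seq, pair, new_symbol)
    return seq, rules


def expand(symbol: Symbol, rules: Rules) -> str:
    """Expand a symbol back into the phrase of source text it stands for."""
    if isinstance(symbol, str):
        return symbol
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)


def decompress(seq: List[Symbol], rules: Rules) -> str:
    return "".join(expand(s, rules) for s in seq)


if __name__ == "__main__":
    text = "abracadabra abracadabra"
    seq, rules = re_pair(text)
    assert decompress(seq, rules) == text
    for sym in rules:  # each rule corresponds to a browsable phrase
        print(sym, repr(expand(sym, rules)))
```

Running the script prints each learned rule alongside the phrase it expands to. A practical system would also store the reduced sequence and the rules compactly and support phrase search directly on the compressed form, both of which this sketch omits.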
Pages: 162–174
Number of pages: 13