Processing of Huffman compressed texts with a super-alphabet

Cited by: 0

Authors:

Fredriksson, K

Tarhio, J

Affiliations:

[1] Univ Joensuu, Dept CS, FIN-80101 Joensuu, Finland

[2] Aalto Univ, Dept CSE, FIN-02015 Espoo, Finland

Source:

STRING PROCESSING AND INFORMATION RETRIEVAL, PROCEEDINGS | 2003 / Vol. 2857

Keywords:

DOI:

Not available

Chinese Library Classification:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

We present an efficient algorithm for scanning Huffman compressed texts. The algorithm parses the compressed text in $O(n \log_2 \sigma / b)$ time, where $n$ is the size of the compressed text in bytes, $\sigma$ is the size of the alphabet, and $b$ is a user-specified parameter. The method uses a variable-size super-alphabet, with an average size of $O(b/H \log_2 \sigma)$ symbols, where $H$ is the entropy of the text. Each super-symbol is processed in $O(1)$ time. The algorithm uses $O(2^b)$ space and $O(b 2^b)$ preprocessing time. The method can easily be augmented by auxiliary functions, which can, for example, decompress the text or perform pattern matching in the compressed text. We give three example functions: decoding the text in average time $O(n \log_2 \sigma / Hw)$, where $w$ is the number of bits in a machine word; an Aho-Corasick dictionary matching algorithm, which works in time $O(n \log_2 \sigma / b + t)$, where $t$ is the number of occurrences reported; and a shift-or string matching algorithm, which works in time $O(n \log_2 \sigma / b \, \lceil (m + s)/w \rceil + t)$, where $m$ is the length of the pattern and $s$ depends on the encoding. The Aho-Corasick algorithm uses an automaton with variable-length moves, i.e. it processes a variable number of states at each step. The shift-or algorithm makes variable-length shifts, effectively also processing a variable number of states at each step. The number of states processed in $O(1)$ time is $O(b/H \log_2 \sigma)$. The method can be applied to several other algorithms as well. We conclude with some experimental results.
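The super-symbol idea in the abstract can be illustrated with a generic table-driven prefix-code decoder that consumes b bits per lookup. This is only a minimal sketch under stated assumptions, not the authors' algorithm: it keeps one 2^b-entry table per decoder state, so it uses more space than the O(2^b) bound quoted above. The code table `CODES`, the function names, and the convention of passing the original symbol count to strip padding are all hypothetical choices made for illustration.

```python
# Hypothetical complete prefix code (assumption, not taken from the paper).
CODES = {"a": "0", "b": "10", "c": "110", "d": "111"}


def build_table(codes, b):
    """Precompute, for every decoder state and every b-bit chunk,
    the symbols emitted and the state reached after the chunk."""
    by_code = {code: sym for sym, code in codes.items()}
    # A state is a proper prefix of some codeword ("" is the root).
    states = {code[:i] for code in codes.values() for i in range(len(code))}
    table = {}
    for state in states:
        for chunk in range(1 << b):
            cur, out = state, []
            for bit in format(chunk, f"0{b}b"):
                cur += bit
                if cur in by_code:          # reached a leaf: emit and restart
                    out.append(by_code[cur])
                    cur = ""
            table[state, chunk] = ("".join(out), cur)
    return table


def encode(text, codes, b):
    """Encode text and split the bit string into b-bit integer chunks."""
    bits = "".join(codes[ch] for ch in text)
    bits += "0" * (-len(bits) % b)          # pad the last chunk with zeros
    return [int(bits[i:i + b], 2) for i in range(0, len(bits), b)]


def decode(chunks, table, n_symbols):
    """Decode with one table lookup per chunk; each lookup emits a
    variable-length run of symbols (one 'super-symbol')."""
    state, out = "", []
    for chunk in chunks:
        emitted, state = table[state, chunk]
        out.append(emitted)
    return "".join(out)[:n_symbols]         # drop symbols caused by padding
```

For example, with `b = 4` the text `"abacabad"` is encoded into four chunks, and `decode(encode(text, CODES, 4), build_table(CODES, 4), len(text))` recovers it with four table lookups instead of one step per bit.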


Pages: 108-121

Page count: 14
