Regular Languages meet Prefix Sorting

Authors
Alanko, Jarno [1]
D'Agostino, Giovanna [2]
Policriti, Alberto [2]
Prezza, Nicola [3]
Affiliations
[1] Univ Helsinki, Helsinki, Finland
[2] Univ Udine, Udine, Italy
[3] Univ Pisa, Pisa, Italy
Keywords
AUTOMATA; GRAPHS;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Indexing strings via prefix (or suffix) sorting is, arguably, one of the most successful algorithmic techniques developed in recent decades. Can indexing be extended to languages? The main contribution of this paper is to initiate the study of the sub-class of regular languages accepted by an automaton whose states can be prefix-sorted. Starting from the recent notion of Wheeler graph [Gagie et al., TCS 2017], which naturally extends the concept of prefix sorting to labeled graphs, we investigate the properties of Wheeler languages, that is, regular languages admitting an accepting Wheeler finite automaton. We first characterize this family as the natural extension of regular languages endowed with the co-lexicographic ordering: the sorted prefixes of strings belonging to a Wheeler language are partitioned into a finite number of co-lexicographic intervals, each formed by elements from a single Myhill-Nerode equivalence class. We proceed by proving several results related to Wheeler automata: (i) We show that every Wheeler NFA (WNFA) with n states admits an equivalent Wheeler DFA (WDFA) with at most 2n - 1 - |Σ| states (Σ being the alphabet) that can be computed in O(n^3) time. (ii) We describe a quadratic algorithm to prefix-sort a proper superset of the WDFAs, an O(n log n)-time online algorithm to sort acyclic WDFAs, and an optimal linear-time offline algorithm to sort general WDFAs. (iii) We provide a minimization theorem that characterizes the smallest WDFA recognizing the same language as any input WDFA. The corresponding constructive algorithm runs in optimal linear time in the acyclic case, and in O(n log n) time in the general case. (iv) We show how to compute the smallest WDFA equivalent to any acyclic DFA in nearly-optimal time. Our contributions imply new results of independent interest. Contributions (i-iii) provide a new class of NFAs for which the minimization problem can be approximated within a constant factor in polynomial time. Contribution (iv) provides a provably minimum-size solution for the well-studied problem of indexing deterministic acyclic graphs for linear-time pattern matching queries.
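To make the notion of prefix-sortable states concrete, the following is a minimal sketch that checks whether a candidate total order on the states of an automaton satisfies the Wheeler graph axioms as usually stated following Gagie et al.: states without incoming edges come first; edges with a smaller label lead to strictly smaller target states; and, among edges with equal labels, the order of the sources is preserved (weakly) by the targets. The function name `is_wheeler_order` and the encoding of edges as `(source, target, label)` triples are assumptions made for illustration and do not come from the paper. The check is a naive quadratic comparison over edge pairs, not one of the sorting algorithms the abstract describes.

```python
from itertools import product

def is_wheeler_order(order, edges):
    """Check the Wheeler axioms for a candidate state order.

    order: list of states, smallest first (hypothetical encoding).
    edges: list of (source, target, label) triples with comparable labels.
    """
    rank = {q: i for i, q in enumerate(order)}
    has_incoming = {v for _, v, _ in edges}

    # Axiom 0: states with no incoming edge precede all states that have one.
    seen_incoming = False
    for q in order:
        if q in has_incoming:
            seen_incoming = True
        elif seen_incoming:
            return False

    # Axioms 1 and 2, checked naively on every ordered pair of edges.
    for (u1, v1, a1), (u2, v2, a2) in product(edges, repeat=2):
        # Axiom 1: smaller label implies strictly smaller target.
        if a1 < a2 and not rank[v1] < rank[v2]:
            return False
        # Axiom 2: equal labels and smaller source imply target not larger.
        if a1 == a2 and rank[u1] < rank[u2] and not rank[v1] <= rank[v2]:
            return False
    return True

# Toy automaton: state 0 is initial, 'a' edges enter state 1, 'b' edges enter state 2.
EXAMPLE_EDGES = [(0, 1, "a"), (1, 1, "a"), (0, 2, "b"), (1, 2, "b")]
```

On this toy automaton the order `[0, 1, 2]` passes the check, while `[0, 2, 1]` fails because an `a`-labeled edge would then reach a larger state than a `b`-labeled one.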
Pages: 911-930
Page count: 20