Regular Languages meet Prefix Sorting

Authors
Alanko, Jarno [1]
D'Agostino, Giovanna [2]
Policriti, Alberto [2]
Prezza, Nicola [3]
Affiliations
[1] Univ Helsinki, Helsinki, Finland
[2] Univ Udine, Udine, Italy
[3] Univ Pisa, Pisa, Italy
Keywords
AUTOMATA; GRAPHS;
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Indexing strings via prefix (or suffix) sorting is, arguably, one of the most successful algorithmic techniques developed in the last decades. Can indexing be extended to languages? The main contribution of this paper is to initiate the study of the sub-class of regular languages accepted by an automaton whose states can be prefix-sorted. Starting from the recent notion of Wheeler graph [Gagie et al., TCS 2017], which naturally extends the concept of prefix sorting to labeled graphs, we investigate the properties of Wheeler languages, that is, regular languages admitting an accepting Wheeler finite automaton. We first characterize this family as the natural extension of regular languages endowed with the co-lexicographic ordering: the sorted prefixes of strings belonging to a Wheeler language are partitioned into a finite number of co-lexicographic intervals, each formed by elements from a single Myhill-Nerode equivalence class. We proceed by proving several results related to Wheeler automata: (i) We show that every Wheeler NFA (WNFA) with n states admits an equivalent Wheeler DFA (WDFA) with at most 2n - 1 - |Σ| states (Σ being the alphabet) that can be computed in O(n^3) time. (ii) We describe a quadratic algorithm to prefix-sort a proper superset of the WDFAs, an O(n log n)-time online algorithm to sort acyclic WDFAs, and an optimal linear-time offline algorithm to sort general WDFAs. (iii) We provide a minimization theorem that characterizes the smallest WDFA recognizing the same language as any input WDFA. The corresponding constructive algorithm runs in optimal linear time in the acyclic case, and in O(n log n) time in the general case. (iv) We show how to compute the smallest WDFA equivalent to any acyclic DFA in nearly-optimal time. Our contributions imply new results of independent interest.
Contributions (i-iii) provide a new class of NFAs for which the minimization problem can be approximated within a constant factor in polynomial time. Contribution (iv) provides a provably minimum-size solution for the well-studied problem of indexing deterministic acyclic graphs for linear-time pattern matching queries.
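To make the notion of a prefix-sorted automaton more concrete, the following is a minimal sketch that checks whether a given total order on the states satisfies the Wheeler graph conditions as defined by Gagie et al.: states without incoming edges come first, edges with smaller labels lead to smaller states, and edges with equal labels preserve the order of their sources. The function name `is_wheeler_order` and the edge representation `(source, target, label)` are assumptions made for illustration. This is a brute-force check, quadratic in the number of edges, and is not one of the sorting algorithms described in the abstract.

```python
def is_wheeler_order(order, edges):
    """Return True if `order` is a Wheeler order for the labeled graph.

    order: list of states, smallest first (hypothetical representation).
    edges: list of (source, target, label) triples; labels must be comparable.
    """
    rank = {state: i for i, state in enumerate(order)}
    has_incoming = {target for _, target, _ in edges}

    # States with in-degree 0 must precede all states with incoming edges.
    seen_incoming = False
    for state in order:
        if state in has_incoming:
            seen_incoming = True
        elif seen_incoming:
            return False

    # Check every ordered pair of edges against the two Wheeler conditions.
    for u1, v1, a1 in edges:
        for u2, v2, a2 in edges:
            # (i) a smaller label must lead to a strictly smaller target.
            if a1 < a2 and not rank[v1] < rank[v2]:
                return False
            # (ii) equal labels must preserve the order of the sources.
            if a1 == a2 and rank[u1] < rank[u2] and not rank[v1] <= rank[v2]:
                return False
    return True


# A small acyclic automaton accepting {"ab", "bb"}; state 0 is initial.
example_edges = [(0, 1, "a"), (0, 2, "b"), (1, 3, "b"), (2, 3, "b")]
print(is_wheeler_order([0, 1, 2, 3], example_edges))
print(is_wheeler_order([0, 2, 1, 3], example_edges))
```

In this example the first order satisfies both conditions, while the second places the target of the `b` edge before the target of the `a` edge and so violates condition (i).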
Pages: 911-930
Page count: 20
Related papers
50 in total
  • [1] Regular Languages meet Prefix Sorting
    Alanko, Jarno
    D'Agostino, Giovanna
    Policriti, Alberto
    Prezza, Nicola
    PROCEEDINGS OF THE THIRTY-FIRST ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS (SODA'20), 2020 : 911 - 930
  • [2] Prefix Distance Between Regular Languages
    Ng, Timothy
    IMPLEMENTATION AND APPLICATION OF AUTOMATA, 2016, 9705 : 224 - 235
  • [3] Parameterized Prefix Distance between Regular Languages
    Kutrib, Martin
    Meckel, Katja
    Wendlandt, Matthias
    SOFSEM 2014: THEORY AND PRACTICE OF COMPUTER SCIENCE, 2014, 8327 : 419 - 430
  • [4] Complexity in Prefix-Free Regular Languages
    Jiraskova, Galina
    Krausova, Monika
    ELECTRONIC PROCEEDINGS IN THEORETICAL COMPUTER SCIENCE, 2010, (31) : 197 - 204
  • [5] PREFIX GRAMMARS - AN ALTERNATIVE CHARACTERIZATION OF THE REGULAR LANGUAGES
    FRAZIER, M
    PAGE, CD
    INFORMATION PROCESSING LETTERS, 1994, 51 (02) : 67 - 71
  • [6] Prefix-free regular languages and pattern matching
    Han, Yo-Sub
    Wang, Yajun
    Wood, Derick
    THEORETICAL COMPUTER SCIENCE, 2007, 389 (1-2) : 307 - 317
  • [7] Complexity of proper prefix-convex regular languages
    Brzozowski, Janusz A.
    Sinnamon, Corwin
    THEORETICAL COMPUTER SCIENCE, 2019, 787 : 2 - 13
  • [8] Kleene Closure on Regular and Prefix-Free Languages
    Jiraskova, Galina
    Palmovsky, Matus
    Sebej, Juraj
    IMPLEMENTATION AND APPLICATION OF AUTOMATA, CIAA 2014, 2014, 8587 : 226 - 237
  • [9] Equality sets of prefix morphisms and regular star languages
    Halava, V
    Harju, T
    Latteux, M
    INFORMATION PROCESSING LETTERS, 2005, 94 (04) : 151 - 154
  • [10] Non-regular Maximal Prefix-Free Subsets of Regular Languages
    Jirasek, Jozef, Jr.
    DEVELOPMENTS IN LANGUAGE THEORY, DLT 2016, 2016, 9840 : 229 - 242