Limitations of Mean-Based Algorithms for Trace Reconstruction at Small Edit Distance

Cited by: 1
Authors
Grigorescu, Elena [1]
Sudan, Madhu [2]
Zhu, Minshen [1]
Affiliations
[1] Purdue Univ, Comp Sci Dept, W Lafayette, IN 47907 USA
[2] Harvard Univ, Harvard John A Paulson Sch Engn & Appl Sci, Boston, MA 02134 USA
Keywords
Trace reconstruction; mean-based algorithms; complex analysis; multiplicity of zeros; Littlewood-type problems; efficient reconstruction; lower bounds
DOI
10.1109/TIT.2022.3168624
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Trace reconstruction considers the task of recovering an unknown string x ∈ {0,1}^n given a number of independent "traces", i.e., subsequences of x obtained by randomly and independently deleting every symbol of x with some probability p. The information-theoretic limit of the number of traces needed to recover a string of length n is still unknown. This limit is essentially the same as the number of traces needed to determine, given strings x and y and traces of one of them, which string is the source. The most-studied class of algorithms for the worst-case version of the problem is that of "mean-based" algorithms. These are a restricted class of distinguishers that only use the mean value of each coordinate on the given samples. In this work we study limitations of mean-based algorithms on strings at small Hamming or edit distance. We show that, on the one hand, distinguishing strings that are nearby in Hamming distance is "easy" for such distinguishers. On the other hand, we show that distinguishing strings that are nearby in edit distance is "hard" for mean-based algorithms. Along the way, we also describe a connection to the famous Prouhet-Tarry-Escott (PTE) problem, which shows a barrier to finding explicit hard-to-distinguish strings: namely, such strings would imply explicit short solutions to the PTE problem, a well-known difficult problem in number theory. Furthermore, we show that the converse is also true; thus, finding explicit solutions to the PTE problem is equivalent to the problem of finding explicit strings that are hard to distinguish by mean-based algorithms. Our techniques rely on complex analysis arguments that involve careful trigonometric estimates, and algebraic techniques that include applications of Descartes' rule of signs for polynomials over the reals.
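As an illustration of the objects named in the abstract, the following minimal Python sketch simulates the deletion channel and computes the per-coordinate mean of the traces, which is the only statistic a mean-based algorithm is allowed to use. The function names, the zero-padding of short traces to length n, and the fixed random seed are assumptions made for this illustration and are not taken from the paper. The closed form in `exact_mean_trace` is the standard expectation: bit i lands at position j when it survives and exactly j of the i bits before it survive.

```python
import random
from math import comb


def sample_trace(x, p, rng):
    """Pass x through the deletion channel: drop each symbol independently with probability p."""
    return [b for b in x if rng.random() >= p]


def empirical_mean_trace(x, p, num_traces, seed=0):
    """Average each coordinate over num_traces traces (short traces are padded with zeros)."""
    rng = random.Random(seed)  # fixed seed: an assumption, for reproducibility only
    totals = [0] * len(x)
    for _ in range(num_traces):
        for j, b in enumerate(sample_trace(x, p, rng)):
            totals[j] += b
    return [s / num_traces for s in totals]


def exact_mean_trace(x, p):
    """Expected value of each trace coordinate (0-indexed), again with zero padding.

    E_j(x) = sum_{i >= j} x_i * C(i, j) * (1 - p)^(j + 1) * p^(i - j)
    """
    n, q = len(x), 1 - p
    return [
        sum(x[i] * comb(i, j) * q ** (j + 1) * p ** (i - j) for i in range(j, n))
        for j in range(n)
    ]


if __name__ == "__main__":
    x = [1, 0, 1, 1, 0, 0, 1, 0]
    print(exact_mean_trace(x, 0.5))
    print(empirical_mean_trace(x, 0.5, 20000))
```

In this picture, a mean-based distinguisher for two candidate strings x and y compares the empirical mean trace against `exact_mean_trace(x, p)` and `exact_mean_trace(y, p)`; the closer those two vectors are, the more traces are needed to tell the strings apart.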
Pages: 6790-6801
Number of pages: 12