Automatic generation of regular expressions for the Regex Golf challenge using a local search algorithm

Authors
de Almeida Farzat, Andre [1]
de Oliveira Barros, Marcio [1]
Affiliations
[1] Fed Univ State Rio De Janeiro, Av Pasteur 458, BR-22290240 Rio De Janeiro, RJ, Brazil
Keywords
Regular expressions; Regex Golf; Local search; Heuristic search; Extraction
DOI
10.1007/s10710-021-09411-x
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Regular expressions are widely used in software development for extracting textual data, validating the structure of textual documents, and formatting data. Regex Golf is a challenge that consists of finding the shortest possible regular expression given one set of sentences that it must match and another set that it must not match. An algorithm capable of meeting the Regex Golf requirements is a relevant contribution to the area of data extraction from semi-structured documents. In this paper, we propose a heuristic search algorithm based on local search, combined with a regular expression shrinker, to find valid results for Regex Golf problems. An experimental study was conducted to compare the proposed technique with an exact algorithm and a genetic programming algorithm designed for the Regex Golf challenge. The proposed local search outperformed both competing algorithms in six out of fifteen problem instances and tied in another three. On the other hand, all of the algorithms still fall short of human software developers in designing regular expressions for the challenge.
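The Regex Golf task described in the abstract can be illustrated with a minimal Python sketch. It only shows how a candidate expression might be checked against the two sets and how the shortest valid candidate could be picked from a fixed list; it is not the authors' local search or shrinker. The function names `is_valid` and `best_candidate` and the toy word lists are assumptions made up for illustration.

```python
import re


def is_valid(pattern, must_match, must_not_match):
    """True if `pattern` matches every string in `must_match` and none in `must_not_match`."""
    try:
        rx = re.compile(pattern)
    except re.error:
        # A syntactically invalid expression cannot be a solution.
        return False
    return (all(rx.search(s) for s in must_match)
            and not any(rx.search(s) for s in must_not_match))


def best_candidate(candidates, must_match, must_not_match):
    """Return the shortest valid pattern among `candidates`, or None if none is valid."""
    valid = [p for p in candidates if is_valid(p, must_match, must_not_match)]
    return min(valid, key=len) if valid else None


# Toy instance (hypothetical, not taken from the paper).
must_match = ["foo", "food", "afoot"]
must_not_match = ["bar", "fob", "of"]
candidates = ["foo", "fo", "o", "f.o", "oo"]

print(best_candidate(candidates, must_match, must_not_match))  # → oo
```

Here `fo` and `o` are rejected because they also match `fob`, while `oo` is the shortest of the remaining valid candidates. A search algorithm such as the one the abstract proposes would generate and refine candidates instead of reading them from a fixed list.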
Pages: 105-131
Number of pages: 27