SemRegex: A Semantics-Based Approach for Generating Regular Expressions from Natural Language Specifications

Cited by: 0
Authors
Zhong, Zexuan [1]
Guo, Jiaqi [2]
Yang, Wei [3]
Peng, Jian [1]
Xie, Tao [1]
Lou, Jian-Guang [4]
Liu, Ting [2]
Zhang, Dongmei [4]
Affiliations
[1] Univ Illinois, Urbana, IL 61801 USA
[2] Xi An Jiao Tong Univ, Xian, Peoples R China
[3] Univ Texas Dallas, Richardson, TX 75083 USA
[4] Microsoft Res Asia, Beijing, Peoples R China
Funding
National Science Foundation (USA); National Natural Science Foundation of China
Keywords
INFERENCE;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recent research proposes syntax-based approaches to address the problem of generating programs from natural language specifications. These approaches typically train a sequence-to-sequence learning model using a syntax-based objective: maximum likelihood estimation (MLE). Such syntax-based approaches do not effectively address the goal of generating semantically correct programs, because these approaches fail to handle Program Aliasing, i.e., semantically equivalent programs may have many syntactically different forms. To address this issue, in this paper, we propose a semantics-based approach named SemRegex. SemRegex provides solutions for a subtask of the program-synthesis problem: generating regular expressions from natural language. Different from the existing syntax-based approaches, SemRegex trains the model by maximizing the expected semantic correctness of the generated regular expressions. The semantic correctness is measured using the DFA-equivalence oracle, random test cases, and distinguishing test cases. The experiments on three public datasets demonstrate the superiority of SemRegex over the existing state-of-the-art approaches.
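The notions of Program Aliasing and distinguishing test cases in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the function name `find_distinguishing_string`, the two-letter alphabet, and the length bound are assumptions made for illustration, and the bounded enumeration only approximates a true DFA-equivalence check.

```python
import itertools
import re
from typing import Optional


def find_distinguishing_string(
    regex_a: str, regex_b: str, alphabet: str = "ab", max_len: int = 6
) -> Optional[str]:
    """Return a string accepted by exactly one regex, or None if none exists
    among all strings over `alphabet` of length <= max_len (hypothetical helper)."""
    pattern_a = re.compile(regex_a)
    pattern_b = re.compile(regex_b)
    for length in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=length):
            text = "".join(chars)
            accepted_a = pattern_a.fullmatch(text) is not None
            accepted_b = pattern_b.fullmatch(text) is not None
            if accepted_a != accepted_b:
                return text  # a distinguishing test case
    return None


# Program aliasing: different syntax, same accepted strings (within the bound).
print(find_distinguishing_string(r"(a|b)*", r"[ab]*"))

# Similar syntax, different semantics: the empty string tells them apart.
print(repr(find_distinguishing_string(r"(a|b)*", r"(a|b)+")))
```

A syntax-based objective would penalize `[ab]*` when the reference is `(a|b)*`, even though no test string separates them; a semantics-based reward would treat the two as equally correct.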
Pages: 1608-1618
Number of pages: 11