Artificially Generated Text Fragments Search in Academic Documents

Cited by: 1
Authors
Gritsay, G. M. [1, 2]
Grabovoy, A. V. [1, 2, 3]
Kildyakov, A. S. [1]
Chekhovich, Yu. V. [1, 3]
Affiliations
[1] Antiplagiat Co, Moscow, Russia
[2] Natl Res Univ, Moscow Inst Phys & Technol, Moscow, Russia
[3] Russian Acad Sci, Fed Res Ctr Comp Sci & Control, Moscow, Russia
Keywords
machine-generated text; natural language processing; multiple hypothesis testing; paraphrase; detection of generated texts
DOI
10.1134/S1064562423701211
Chinese Library Classification (CLC) number
O1 [Mathematics]
Discipline classification code
0701; 070101
Abstract
Recent advances in generative text models make it possible to create artificial texts that resemble human-written ones. Many methods for detecting texts produced by large language models have already been developed. However, detection methods improve in parallel with generation methods, so it is necessary to study new generative models and update existing approaches to detecting their output. In this paper, we present an extensive analysis of existing detection methods, as well as a study of the lexical, syntactic, and stylistic features of generated fragments. Building on this analysis, we tested the methods for detecting machine-generated documents that we consider the strongest, with a view to their further application in the scientific domain. Experiments were conducted for Russian and English on the collected datasets. The developed methods raised detection quality to an F1-score of 0.968 for Russian and 0.825 for English. The described techniques can be applied to detect generated fragments in scientific, research, and graduate papers.
Pages: S434-S442
Page count: 9
Source: Doklady Mathematics, 2023, 108: S434-S442