WASTK: A Weighted Abstract Syntax Tree Kernel Method for Source Code Plagiarism Detection

Cited by: 27
Authors
Fu, Deqiang [1, 2]
Xu, Yanyan [1]
Yu, Haoran [2]
Yang, Boyang [2]
Affiliations
[1] Beijing Forestry Univ, Sch Informat Sci & Technol, 35 Qinghuadong Rd, Beijing 100083, Peoples R China
[2] Beijing Judao Youda Network Technol Co Ltd, Jisuan Inst Technol, 18 Suzhoujie St,Room 1204, Beijing 100080, Peoples R China
Funding
National Natural Science Foundation of China
DOI
10.1155/2017/7809047
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
In this paper, we introduce WASTK (Weighted Abstract Syntax Tree Kernel), a source code plagiarism detection method for computer science education. Unlike other plagiarism detection methods, WASTK takes into account aspects beyond the raw similarity between programs. WASTK first converts the source code of a program into an abstract syntax tree and then computes the similarity of two programs as the tree kernel of their abstract syntax trees. To avoid misjudgments caused by trivial code snippets or by frameworks supplied by instructors, an idea similar to TF-IDF (Term Frequency-Inverse Document Frequency) from information retrieval is applied: each node in an abstract syntax tree is assigned a TF-IDF weight. Evaluated on different datasets, WASTK performs much better than other popular methods such as Sim and JPlag.
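The pipeline described in the abstract (source code to abstract syntax tree, tree kernel, TF-IDF-style weighting) can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes Python submissions parsed with the standard `ast` module, swaps in a simplified kernel that counts identical complete subtrees, and uses a plain `log(N / df)` weight. The function names `subtree_signatures`, `idf_weights`, `weighted_kernel`, and `similarity` are hypothetical, and the kernel and weighting used in the paper may differ.

```python
import ast
import math
from collections import Counter


def subtree_signatures(source: str) -> Counter:
    """Count every complete AST subtree, keyed by a structural signature.

    Only node types are recorded, so renamed identifiers and changed
    literals produce the same signature (an assumption of this sketch).
    """
    counts: Counter = Counter()

    def visit(node: ast.AST) -> str:
        children = ",".join(visit(child) for child in ast.iter_child_nodes(node))
        signature = f"{type(node).__name__}({children})"
        counts[signature] += 1
        return signature

    visit(ast.parse(source))
    return counts


def idf_weights(corpus: list) -> dict:
    """IDF per signature: log(N / number of programs that contain it)."""
    doc_freq: Counter = Counter()
    for counts in corpus:
        doc_freq.update(counts.keys())  # each program counts once per signature
    n = len(corpus)
    return {sig: math.log(n / df) for sig, df in doc_freq.items()}


def weighted_kernel(a: Counter, b: Counter, idf: dict) -> float:
    """Sum over shared subtrees of (weighted count in a) * (weighted count in b)."""
    return sum(idf.get(sig, 0.0) ** 2 * a[sig] * b[sig] for sig in a.keys() & b.keys())


def similarity(a: Counter, b: Counter, idf: dict) -> float:
    """Kernel value normalized to [0, 1], like a cosine similarity."""
    norm = math.sqrt(weighted_kernel(a, a, idf) * weighted_kernel(b, b, idf))
    return weighted_kernel(a, b, idf) / norm if norm else 0.0


# Toy corpus: every submission starts with the same instructor-supplied helper.
template = "def read_input():\n    return input()\n"
submissions = [
    template + "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n",
    template + "def add_all(values):\n    acc = 0\n    for v in values:\n        acc += v\n    return acc\n",
    template + "def biggest(xs):\n    return max(xs)\n",
]
counts = [subtree_signatures(s) for s in submissions]
idf = idf_weights(counts)
print(similarity(counts[0], counts[1], idf))  # renamed copy of the same program
print(similarity(counts[0], counts[2], idf))  # shares only the template
```

In this toy corpus the shared `read_input` helper occurs in every submission, so its subtrees receive weight `log(3/3) = 0` and contribute nothing to the score. The first two submissions differ only in identifier names and should therefore score close to 1, while the third overlaps with them only through the zero-weight template.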
Pages: 8