WASTK: A Weighted Abstract Syntax Tree Kernel Method for Source Code Plagiarism Detection

Cited by: 27
Authors
Fu, Deqiang [1 ,2 ]
Xu, Yanyan [1 ]
Yu, Haoran [2 ]
Yang, Boyang [2 ]
Affiliations
[1] Beijing Forestry Univ, Sch Informat Sci & Technol, 35 Qinghuadong Rd, Beijing 100083, Peoples R China
[2] Beijing Judao Youda Network Technol Co Ltd, Jisuan Inst Technol, 18 Suzhoujie St,Room 1204, Beijing 100080, Peoples R China
Funding
National Natural Science Foundation of China
DOI
10.1155/2017/7809047
Chinese Library Classification
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
In this paper, we introduce WASTK (Weighted Abstract Syntax Tree Kernel), a source code plagiarism detection method for computer science education. Unlike other plagiarism detection methods, WASTK takes into account aspects beyond the similarity between programs. WASTK first converts the source code of a program into an abstract syntax tree and then obtains the similarity by computing the tree kernel of two abstract syntax trees. To avoid misjudgments caused by trivial code snippets or frameworks provided by instructors, an idea similar to TF-IDF (Term Frequency-Inverse Document Frequency) from information retrieval is applied: each node in an abstract syntax tree is assigned a weight by TF-IDF. WASTK is evaluated on different datasets and performs much better than other popular methods such as Sim and JPlag.
Pages: 8
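
The pipeline described in the abstract (parse to an abstract syntax tree, compare trees with a kernel, down-weight structures common to many submissions) can be illustrated with a minimal Python sketch. This is not the authors' implementation: it swaps in a simplified kernel that counts identical complete subtrees, applies an IDF weight per subtree signature rather than per node, and uses Python's own `ast` module as the parser. The function names `subtree_signatures`, `idf_weights`, `weighted_kernel`, and `similarity` are hypothetical.

```python
import ast
import math
from collections import Counter


def subtree_signatures(source):
    """Count every complete subtree of the AST, keyed by node types only.

    Identifiers and literal values are ignored, so renaming variables
    does not change the signatures (an assumption of this sketch).
    """
    counts = Counter()

    def visit(node):
        children = [visit(child) for child in ast.iter_child_nodes(node)]
        signature = type(node).__name__ + "(" + ",".join(children) + ")"
        counts[signature] += 1
        return signature

    visit(ast.parse(source))
    return counts


def idf_weights(corpus_counts):
    """IDF per signature: subtrees present in every submission get weight 0."""
    n_docs = len(corpus_counts)
    doc_freq = Counter()
    for counts in corpus_counts:
        doc_freq.update(counts.keys())
    return {sig: math.log(n_docs / df) for sig, df in doc_freq.items()}


def weighted_kernel(a, b, idf):
    """Dot product of the TF-IDF weighted subtree-count vectors."""
    shared = a.keys() & b.keys()
    return sum(a[s] * b[s] * idf.get(s, 0.0) ** 2 for s in shared)


def similarity(a, b, idf):
    """Kernel normalised to [0, 1]; 0.0 when either vector has zero norm."""
    denom = math.sqrt(weighted_kernel(a, a, idf) * weighted_kernel(b, b, idf))
    return weighted_kernel(a, b, idf) / denom if denom else 0.0


# Toy corpus: the second program is the first with identifiers renamed.
programs = [
    "def f(a, b):\n    return a + b\n",
    "def g(x, y):\n    return x + y\n",
    "for i in range(3):\n    print(i)\n",
]
counts = [subtree_signatures(p) for p in programs]
idf = idf_weights(counts)
print(similarity(counts[0], counts[1], idf))  # renamed copy
print(similarity(counts[0], counts[2], idf))  # unrelated program
```

Because the IDF term is `log(N / df)`, a subtree that occurs in every submission (for example, a skeleton handed out by the instructor) contributes nothing to the score, which is the effect the abstract attributes to TF-IDF weighting.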