SCC++: Predicting the programming language of questions and snippets of Stack Overflow

Cited by: 12
Authors
Alrashedy, Kamel [1]
Dharmaretnam, Dhanush [1]
German, Daniel M. [1]
Srinivasan, Venkatesh [1]
Gulliver, T. Aaron [2]
Affiliations
[1] Univ Victoria, Dept Comp Sci, Victoria, BC V8W 2Y2, Canada
[2] Univ Victoria, Dept Elect & Comp Engn, Victoria, BC V8W 2Y2, Canada
Keywords
Classification; Machine learning; Natural language processing; Programming languages
DOI
10.1016/j.jss.2019.110505
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
Stack Overflow is the most popular Q&A website among software developers. As a platform for knowledge sharing and acquisition, questions posted on Stack Overflow usually contain a code snippet. Determining the programming language of a source code file has been studied by the research community, and Machine Learning (ML) and Natural Language Processing (NLP) algorithms have been shown to be effective at identifying the programming language of whole source code files. However, determining the programming language of a code snippet, or of only a few lines of source code, remains a challenging task. Online forums such as Stack Overflow and code repositories such as GitHub contain a large number of such snippets. In this paper, we design and evaluate Source Code Classification (SCC++), a classifier that identifies the programming language of a question posted on Stack Overflow. By combining features from the title, body and code snippets of a question, the classifier achieves an accuracy of 88.9%. We also propose a classifier that uses only the title and body of the question, which achieves an accuracy of 78.9%, and a classifier that uses only the code snippets, which achieves an accuracy of 78.1%. These results show that applying ML techniques to the combination of a question's text and code snippets gives the best performance. In addition, the classifier can distinguish between code snippets from the same language family, such as C, C++ and C#, and can also identify the language version, such as C# 3.0, C# 4.0 and C# 5.0. (C) 2019 Elsevier Inc. All rights reserved.
Pages: 11
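The abstract reports that combining features from a question's title, body and code snippet outperforms using the text alone or the code alone. As a hedged illustration of what "combining features" can look like, the Python sketch below swaps in a hand-rolled multinomial Naive Bayes classifier. It is not the SCC++ model, and the tokenizer regex, the `title:`/`body:`/`code:` prefixing scheme, the `NaiveBayes` class and the four-question toy training set are all hypothetical. Restricting `fields` yields text-only and code-only variants analogous to the two additional classifiers the abstract describes.

```python
"""Toy language classifier over Stack Overflow-style questions.

Hypothetical sketch: a hand-rolled multinomial Naive Bayes, NOT the SCC++
authors' model. Each question is a dict with optional "title", "body" and
"code" strings; tokens are prefixed with their field so the same word can
carry different weight in prose and in code.
"""
import math
import re
from collections import Counter, defaultdict

FIELDS = ("title", "body", "code")

# Identifier-like words (letting '#' and '+' through so "c#" and "c++" survive)
# or any single punctuation character such as '{', ';' or '<'.
TOKEN_RE = re.compile(r"[a-z_#$@][a-z0-9_#+]*|[^\sa-z0-9_]")


def featurize(question, fields=FIELDS):
    """Return a bag of field-prefixed tokens, e.g. 'code:printf'."""
    feats = Counter()
    for field in fields:
        for tok in TOKEN_RE.findall(question.get(field, "").lower()):
            feats[f"{field}:{tok}"] += 1
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, fields=FIELDS, alpha=1.0):
        self.fields = fields  # which parts of the question to use
        self.alpha = alpha

    def fit(self, questions, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        self.vocab = set()
        for question, label in zip(questions, labels):
            feats = featurize(question, self.fields)
            self.token_counts[label].update(feats)
            self.vocab.update(feats)
        self.totals = {y: sum(c.values()) for y, c in self.token_counts.items()}
        return self

    def predict(self, question):
        feats = featurize(question, self.fields)
        n_docs = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            # log prior + sum of log likelihoods; tokens unseen in training are skipped
            score = math.log(self.class_counts[label] / n_docs)
            denom = self.totals[label] + self.alpha * len(self.vocab)
            for tok, cnt in feats.items():
                if tok in self.vocab:
                    num = self.token_counts[label][tok] + self.alpha
                    score += cnt * math.log(num / denom)
            if score > best_score:
                best, best_score = label, score
        return best


# Hypothetical toy data, one question per language.
TRAIN = [
    {"title": "printf prints garbage", "body": "My C program",
     "code": '#include <stdio.h>\nint main(void) { printf("%d", x); }'},
    {"title": "cout not printing", "body": "C++ iostream issue",
     "code": "#include <iostream>\nstd::cout << x << std::endl;"},
    {"title": "Console.WriteLine formatting", "body": "C# string question",
     "code": 'using System;\nConsole.WriteLine($"{x}");'},
    {"title": "list comprehension", "body": "Python question",
     "code": "def f(xs):\n    return [x * 2 for x in xs]"},
]
LABELS = ["c", "c++", "c#", "python"]

if __name__ == "__main__":
    full = NaiveBayes().fit(TRAIN, LABELS)                 # title + body + code
    text_only = NaiveBayes(("title", "body")).fit(TRAIN, LABELS)
    code_only = NaiveBayes(("code",)).fit(TRAIN, LABELS)
    print(full.predict({"code": "std::cout << y << std::endl;"}))  # c++
    print(text_only.predict({"body": "iostream problem"}))         # c++
    print(code_only.predict({"code": "def g(ys):\n    return [y for y in ys]"}))  # python
```

Prefixing each token with its field keeps a word such as `cout` in a title distinct from the same word inside a code block, which is one simple way to let a single model weigh prose and code separately. A four-question toy set only demonstrates the mechanics; it says nothing about the accuracies reported in the abstract.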