Effective Reformulation of Query for Code Search using Crowdsourced Knowledge and Extra-Large Data Analytics

Cited by: 46
Authors
Rahman, Mohammad Masudur [1]
Roy, Chanchal K. [1]
Affiliations
[1] Univ Saskatchewan, Dept Comp Sci, Saskatoon, SK, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Code search; query reformulation; crowd-sourced knowledge; extra-large data analytics; Stack Overflow; PageRank algorithm; Borda count; semantic similarity; SOFTWARE; CONTEXT
DOI
10.1109/ICSME.2018.00057
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
Software developers frequently issue generic natural language queries when using code search engines (e.g., GitHub native search, Krugle). Such queries often fail to return relevant results because of vocabulary mismatch problems. In this paper, we propose a novel technique that automatically identifies relevant and specific API classes from the Stack Overflow Q&A site for a programming task written as a natural language query, and then reformulates the query for improved code search. We first collect candidate API classes from Stack Overflow using pseudo-relevance feedback and two term weighting algorithms, and then rank the candidates using Borda count and the semantic proximity between query keywords and the API classes. The semantic proximity is determined by an analysis of 1.3 million questions and answers from Stack Overflow. Experiments using 310 code search queries report that our technique suggests relevant API classes with 48% precision and 58% recall, which are 32% and 48% higher, respectively, than those of the state-of-the-art. Comparisons with two state-of-the-art studies and three popular search engines (Google, Stack Overflow, and GitHub native search) report that our reformulated queries (1) outperform the queries of the state-of-the-art, and (2) significantly improve the code search results provided by these contemporary search engines.
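The abstract names Borda count as one of the signals used to rank candidate API classes. As a rough illustration only, the sketch below merges two ranked candidate lists with a plain Borda count. The class names, the two input lists, and the function name `borda_count` are hypothetical, and the paper's actual scoring, term weighting, and semantic proximity steps are not reproduced here.

```python
from collections import defaultdict


def borda_count(rankings):
    """Merge ranked lists with a plain Borda count.

    An item at zero-based position i in a list of length n earns n - i points.
    Items missing from a list earn nothing from that list.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, item in enumerate(ranking):
            scores[item] += n - position
    # Highest total first; ties are broken alphabetically so the output is deterministic.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical candidate API classes, as if ranked by two different term weighting schemes.
ranked_by_scheme_a = ["FileReader", "BufferedReader", "File"]
ranked_by_scheme_b = ["BufferedReader", "Scanner", "FileReader"]

combined = borda_count([ranked_by_scheme_a, ranked_by_scheme_b])
print(combined)
```

With these made-up inputs, a class that appears near the top of both lists rises above one that tops only a single list, which is the behaviour a rank aggregation step is meant to provide.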
Pages: 473-484
Page count: 12