Sourcerer: An infrastructure for large-scale collection and analysis of open-source code

Cited by: 55
Authors
Bajracharya, Sushil [1]
Ossher, Joel [1]
Lopes, Cristina [1]
Affiliations
[1] Univ Calif Irvine, Irvine, CA 92697 USA
Keywords
Open source; Internet-scale code retrieval; Data mining; Sourcerer; Static analysis; Software information retrieval; SOFTWARE; SEARCH; REUSE;
DOI
10.1016/j.scico.2012.04.008
Chinese Library Classification (CLC) number
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
A large amount of open source code is now available online, presenting a great potential resource for software developers. This has motivated software engineering researchers to develop tools and techniques to allow developers to reap the benefits of these billions of lines of source code. However, collecting and analyzing such a large quantity of source code presents a number of challenges. Although the current generation of open source code search engines provides access to the source code in an aggregated repository, they generally fail to take advantage of the rich structural information contained in the code they index. This makes them significantly less useful than Sourcerer for building state-of-the-art software engineering tools, as these tools often require access to both the structural and textual information available in source code. We have developed Sourcerer, an infrastructure for large-scale collection and analysis of open source code. By taking full advantage of the structural information extracted from source code in its repository, Sourcerer provides a foundation upon which state-of-the-art search engines and related tools can easily be built. We describe the Sourcerer infrastructure, present the applications that we have built on top of it, and discuss how existing tools could benefit from using Sourcerer. (C) 2012 Elsevier B.V. All rights reserved.
Pages: 241 - 259
Page count: 19
Related papers
50 in total
  • [1] SGL: A domain-specific language for large-scale analysis of open-source code
    Foo, Darius
    Yi, Ang Ming
    Yeo, Jason
    Sharma, Asankhaya
    2018 IEEE CYBERSECURITY DEVELOPMENT CONFERENCE (SECDEV 2018), 2018, : 61 - 68
  • [2] Code smells and their collocations: A large-scale experiment on open-source systems
    Walter, Bartosz
    Fontana, Francesca Arcelli
    Ferme, Vincenzo
    JOURNAL OF SYSTEMS AND SOFTWARE, 2018, 144 : 1 - 21
  • [3] MapQuant: Open-source software for large-scale protein quantification
    Leptos, KC
    Sarracino, DA
    Jaffe, JD
    Krastins, B
    Church, GM
    PROTEOMICS, 2006, 6 (06) : 1770 - 1782
  • [4] A Large-Scale Open-Source Acoustic Simulator for Speaker Recognition
    Ferras, Marc
    Madikeri, Srikanth
    Motlicek, Petr
    Dey, Subhadeep
    Bourlard, Herve
    IEEE SIGNAL PROCESSING LETTERS, 2016, 23 (04) : 527 - 531
  • [5] Evaluating Maintainability Prejudices with a Large-Scale Study of Open-Source Projects
    Roehm, Tobias
    Veihelmann, Daniel
    Wagner, Stefan
    Juergens, Elmar
    SOFTWARE QUALITY: THE COMPLEXITY AND CHALLENGES OF SOFTWARE ENGINEERING AND SOFTWARE QUALITY IN THE CLOUD, 2019, 338 : 151 - 171
  • [6] VisRepo: A Visual Retrieval Tool for Large-Scale Open-Source Projects
    Yue, Xiaoqi
    Liu, Chao
    Zhang, Neng
    Hu, Haibo
    Zhang, Xiaohong
    PROCEEDINGS OF THE 15TH ASIA-PACIFIC SYMPOSIUM ON INTERNETWARE, INTERNETWARE 2024, 2024, : 499 - 502
  • [7] A Scalable Open-Source Pipeline for Large-Scale Root Phenotyping of Arabidopsis
    Slovak, Radka
    Goeschl, Christian
    Su, Xiaoxue
    Shimotani, Koji
    Shiina, Takashi
    Busch, Wolfgang
    PLANT CELL, 2014, 26 (06) : 2390 - 2403
  • [8] Forward Modeling of Large-scale Structure: An Open-source Approach with Halotools
    Hearin, Andrew P.
    Campbell, Duncan
    Tollerud, Erik
    Behroozi, Peter
    Diemer, Benedikt
    Goldbaum, Nathan J.
    Jennings, Elise
    Leauthaud, Alexie
    Mao, Yao-Yuan
    More, Surhud
    Parejko, John
    Sinha, Manodeep
    Sipocz, Brigitta
    Zentner, Andrew
    ASTRONOMICAL JOURNAL, 2017, 154 (05):
  • [9] A Large-Scale Study of MPI Usage in Open-Source HPC Applications
    Laguna, Ignacio
    Marshall, Ryan
    Mohror, Kathryn
    Ruefenacht, Martin
    Skjellum, Anthony
    Sultana, Nawrin
    PROCEEDINGS OF SC19: THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2019,
  • [10] A large-scale study of architectural evolution in open-source software systems
    Behnamghader, Pooyan
    Duc Minh Le
    Garcia, Joshua
    Link, Daniel
    Shahbazian, Arman
    Medvidovic, Nenad
    EMPIRICAL SOFTWARE ENGINEERING, 2017, 22 (03) : 1146 - 1193