Sourcerer: An infrastructure for large-scale collection and analysis of open-source code

Cited by: 55
Authors
Bajracharya, Sushil [1]
Ossher, Joel [1]
Lopes, Cristina [1]
Affiliations
[1] Univ Calif Irvine, Irvine, CA 92697 USA
Keywords
Open source; Internet-scale code retrieval; Data mining; Sourcerer; Static analysis; Software information retrieval; SOFTWARE; SEARCH; REUSE;
DOI
10.1016/j.scico.2012.04.008
Chinese Library Classification (CLC)
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
A large amount of open source code is now available online, presenting a great potential resource for software developers. This has motivated software engineering researchers to develop tools and techniques to allow developers to reap the benefits of these billions of lines of source code. However, collecting and analyzing such a large quantity of source code presents a number of challenges. Although the current generation of open source code search engines provides access to the source code in an aggregated repository, they generally fail to take advantage of the rich structural information contained in the code they index. This makes them significantly less useful than Sourcerer for building state-of-the-art software engineering tools, as these tools often require access to both the structural and textual information available in source code. We have developed Sourcerer, an infrastructure for large-scale collection and analysis of open source code. By taking full advantage of the structural information extracted from source code in its repository, Sourcerer provides a foundation upon which state-of-the-art search engines and related tools can easily be built. We describe the Sourcerer infrastructure, present the applications that we have built on top of it, and discuss how existing tools could benefit from using Sourcerer. © 2012 Elsevier B.V. All rights reserved.
Pages: 241-259
Page count: 19