A feature location approach for mapping application features extracted from crowd-based screencasts to source code

Authors
Parisa Moslehi
Bram Adams
Juergen Rilling
Affiliations
[1] Concordia University
[2] Queen’s University
Keywords
Crowd-based documentation; Mining video content; Speech analysis; Feature location; Software traceability; Information extraction; Software documentation
DOI
Not available
Abstract
Crowd-based multimedia documents such as screencasts have emerged as a source for documenting requirements, workflows, and implementation issues of open source and agile software projects. For example, users can show and narrate how they manipulate an application’s GUI to perform a certain function, or a bug reporter can visually explain how to trigger a bug or a security vulnerability. Unfortunately, the streaming nature of programming screencasts and their binary format limit how developers can interact with a screencast’s content. In this research, we present an automated approach for mining the multimedia content found in screencasts and linking it to relevant software artifacts, more specifically, to source code. We apply LDA-based mining approaches that take as input a set of screencast artifacts, such as GUI text and spoken words, to make the screencast content accessible and searchable to users and to link it to its relevant source code artifacts. To evaluate the applicability of our approach, we report on the results of case studies that we conducted on existing WordPress and Mozilla Firefox screencasts. We found that our automated approach can significantly speed up the feature location process. For WordPress, our approach using screencast speech and GUI text can successfully link relevant source code files within the top 10 hits of the result set, with a median Reciprocal Rank (RR) of 50% (rank 2) and 100% (rank 1). In the case of Firefox, our approach can identify relevant source code directories within the top 100 hits using screencast speech and GUI text, with a median RR of 20%, meaning that the first true positive is ranked 5th or better in at least half of the cases. Source code related to the frontend implementation, which handles high-level or GUI-related aspects of an application, is also located with higher accuracy.
We also found that term frequency rebalancing can further improve the linking results when the scenarios are less noisy or when their implementation is less technical. An analysis of the original and weighted screencast data sources (speech, GUI text, and combined speech and GUI text) that yield the highest median RR values in both case studies shows that speech data is an important information source, one that can yield an RR of 100%.
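The reciprocal rank (RR) reported above is the inverse of the position of the first relevant artifact in a ranked result list, so an RR of 50% means rank 2 and a median RR of 20% corresponds to rank 5. The following is a minimal Python sketch of how such values could be computed, assuming each screencast query yields a ranked list of file names and a set of known-relevant files. The function names, the sample file names, and the choice to score a query as 0 when no relevant file falls within the top-k cutoff are illustrative assumptions, not details taken from the paper.

```python
from statistics import median


def reciprocal_rank(ranked_hits, relevant):
    """Return 1/rank of the first relevant hit (ranks are 1-based), or 0.0 if none is found."""
    for rank, hit in enumerate(ranked_hits, start=1):
        if hit in relevant:
            return 1.0 / rank
    return 0.0


def median_rr(queries, top_k=None):
    """Median RR over (ranked_hits, relevant) pairs.

    If top_k is given, only the first top_k hits are inspected, so a query whose
    first relevant file lies beyond the cutoff scores 0.0 (an assumed convention).
    """
    scores = []
    for ranked_hits, relevant in queries:
        hits = ranked_hits if top_k is None else ranked_hits[:top_k]
        scores.append(reciprocal_rank(hits, relevant))
    return median(scores)


# Hypothetical queries: (ranked result list, set of truly relevant files).
queries = [
    (["file_a.php", "file_b.php", "file_c.php"], {"file_b.php"}),      # first hit at rank 2 -> RR 0.5
    (["file_x.php", "file_y.php"], {"file_x.php"}),                    # rank 1 -> RR 1.0
    (["f1.php", "f2.php", "f3.php", "f4.php", "f5.php"], {"f5.php"}),  # rank 5 -> RR 0.2
]

print(median_rr(queries))  # median of [0.5, 1.0, 0.2]
```

With the three sample queries the median RR is 0.5, that is, half of the queries place a relevant file at rank 2 or better. Using the median rather than the mean keeps one very poorly ranked query from dominating the summary.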
Pages: 4873-4926
Number of pages: 53