Finding Reusable Machine Learning Components to Build Programming Language Processing Pipelines

Cited by: 0

Authors:

Flynn, Patrick [1,2]

Vanderbruggen, Tristan [1]

Liao, Chunhua [1]

Lin, Pei-Hung [1]

Emani, Murali [3]

Shen, Xipeng [4]

Affiliations:

[1] Lawrence Livermore Natl Lab, Livermore, CA 94550 USA

[2] Univ North Carolina Charlotte, Charlotte, NC 28223 USA

[3] Argonne Natl Lab, Lemont, IL 60439 USA

[4] North Carolina State Univ, Raleigh, NC 27695 USA

Source:

SOFTWARE ARCHITECTURE. ECSA 2022 TRACKS AND WORKSHOPS | 2023 / Vol. 13928

Keywords:

reusable datasets; reusable machine learning; programming language processing; interoperable pipelines

DOI:

10.1007/978-3-031-36889-9_27

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

Programming Language Processing (PLP) using machine learning has made vast improvements in the past few years. Increasingly more people are interested in exploring this promising field. However, it is challenging for new researchers and developers to find the right components to construct their own machine learning pipelines, given the diverse PLP tasks to be solved, the large number of datasets and models being released, and the set of complex compilers or tools involved. To improve the findability, accessibility, interoperability and reusability (FAIRness) of machine learning components, we collect and analyze a set of representative papers in the domain of machine learning-based PLP. We then identify and characterize key concepts including PLP tasks, model architectures and supportive tools. Finally, we show some example use cases of leveraging the reusable components to construct machine learning pipelines to solve a set of PLP tasks.

Pages: 402-417

Page count: 16
