Performance evaluation of machine learning models on large dataset of android applications reviews

Cited by: 5

Authors:

Qureshi, Ali Adil [1]

Ahmad, Maqsood [2]

Ullah, Saleem [1]

Yasir, Muhammad Naveed [3]

Rustam, Furqan [4]

Ashraf, Imran [5]

Affiliations:

[1] Khwaja Fareed Univ Engn & Informat Technol, Dept Comp Sci, Rahim Yar Khan 64200, Pakistan

[2] Islamia Univ Bahawalpur, Dept Informat Secur, Bahawalpur 63100, Punjab, Pakistan

[3] Univ Narowal, Dept Comp Sci, Narowal 51600, Pakistan

[4] Univ Coll Dublin, Sch Comp Sci, Dublin, Ireland

[5] Yeungnam Univ, Informat & Commun Engn, Gyongsan 38541, South Korea

Source:

MULTIMEDIA TOOLS AND APPLICATIONS | 2023 / Volume 82 / Issue 24

Keywords:

Opinion mining; Sentiment analysis; Mobile apps reviews; Google Play Store; Classification

DOI:

10.1007/s11042-023-14713-6

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

With an ever-increasing number of mobile users, the development of mobile applications (apps) has become a potential market during the past decade. Billions of users download mobile apps for diverse uses from the Google Play Store, complete tasks, and leave comments about their experience. Such reviews contain a variety of feedback that serves as a guide for improving existing apps and as inspiration for novel mobile apps. However, application reviews are broad in scope and challenging to analyze. Such reviews, when segregated into different classes, guide the user in the selection of suitable apps. This study proposes a framework for analyzing the sentiment of reviews for apps in eight different categories such as shopping, sports, and casual. A large dataset comprising 251,661 user reviews is scraped with the help of 'Regular Expression' and 'Beautiful Soup'. The framework uses different machine learning models along with term frequency-inverse document frequency (TF-IDF) for feature extraction. Extensive experiments are performed using preprocessing steps as well as the stats feature of app reviews to evaluate the performance of the models. Results indicate that combining the stats feature with TF-IDF shows better performance and that the support vector machine obtains the highest accuracy. Experimental results can potentially be used by other researchers to select appropriate models for the analysis of app reviews. In addition, the provided dataset is large, diverse, and balanced, with eight categories and 59 app reviews, and provides the opportunity to analyze reviews using state-of-the-art approaches.
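The feature-extraction idea in the abstract (TF-IDF vectors combined with a stats feature) can be illustrated with a minimal pure-Python sketch. Everything here is an assumption made for illustration, not the authors' implementation: the toy reviews, the function names, the unsmoothed TF-IDF variant (tf = count / review length, idf = ln(N / df)), and in particular the choice of token count as the "stats feature", which the abstract does not define. In the framework described, vectors of this kind would then be passed to a classifier such as a support vector machine.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a review and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_vectors(reviews):
    """Return (vocabulary, vectors) with one TF-IDF vector per review.

    Uses tf = count / review length and idf = ln(N / df), one common
    textbook variant; libraries often apply smoothing instead.
    """
    docs = [tokenize(r) for r in reviews]
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of reviews containing each token.
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty reviews
        vectors.append([
            (counts[tok] / total) * math.log(n_docs / doc_freq[tok])
            for tok in vocab
        ])
    return vocab, vectors


def add_stats_feature(reviews, vectors):
    """Append a hypothetical stats feature (token count) to each vector."""
    return [vec + [float(len(tokenize(r)))] for r, vec in zip(reviews, vectors)]


# Toy reviews, invented for illustration only.
reviews = [
    "great app love it",
    "bad app crashes often",
    "love the new update",
]
vocab, vectors = tfidf_vectors(reviews)
features = add_stats_feature(reviews, vectors)
```

Each row of `features` holds one weight per vocabulary term followed by the extra stats column, so a term that appears in every review gets weight zero while rarer terms are weighted more heavily.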


Pages: 37197-37219

Page count: 23
