Performance evaluation of machine learning models on large dataset of android applications reviews

Cited by: 5
Authors
Qureshi, Ali Adil [1]
Ahmad, Maqsood [2]
Ullah, Saleem [1]
Yasir, Muhammad Naveed [3]
Rustam, Furqan [4]
Ashraf, Imran [5]
Affiliations
[1] Khwaja Fareed Univ Engn & Informat Technol, Dept Comp Sci, Rahim Yar Khan 64200, Pakistan
[2] Islamia Univ Bahawalpur, Dept Informat Secur, Bahawalpur 63100, Punjab, Pakistan
[3] Univ Narowal, Dept Comp Sci, Narowal 51600, Pakistan
[4] Univ Coll Dublin, Sch Comp Sci, Dublin, Ireland
[5] Yeungnam Univ, Informat & Commun Engn, Gyongsan 38541, South Korea
Keywords
Opinion mining; Sentiment analysis; Mobile apps reviews; Google Play Store; Classification
DOI
10.1007/s11042-023-14713-6
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
With an ever-increasing number of mobile users, the development of mobile applications (apps) has become a promising market over the past decade. Billions of users download mobile apps for diverse uses from the Google Play Store, complete tasks, and leave comments about their experience. Such reviews contain a wide variety of feedback that serves as a guide for improving existing apps and as inspiration for novel ones. However, app reviews are broad in scope and challenging to analyze. When segregated into different classes, such reviews guide users in selecting suitable apps. This study proposes a framework for analyzing the sentiment of reviews for apps in eight different categories, such as shopping, sports, and casual. A large dataset comprising 251,661 user reviews is scraped with the help of regular expressions and Beautiful Soup. The framework applies different machine learning models with term frequency-inverse document frequency (TF-IDF) for feature extraction. Extensive experiments are performed using preprocessing steps as well as the stats feature of app reviews to evaluate the performance of the models. Results indicate that combining the stats feature with TF-IDF yields better performance and that the support vector machine obtains the highest accuracy. The experimental results can potentially be used by other researchers to select appropriate models for the analysis of app reviews. In addition, the provided dataset is large, diverse, and balanced, covering eight categories and reviews of 59 apps, and provides the opportunity to analyze reviews using state-of-the-art approaches.
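As a rough illustration of the feature-extraction idea in the abstract (TF-IDF combined with per-review statistics), the following is a minimal sketch and not the authors' implementation. The abstract does not define the "stats feature", so character count and word count are assumed here purely for illustration; the function name `tfidf_with_stats`, the whitespace tokenizer, the smoothed IDF variant, and the sample reviews are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split stands in for the paper's preprocessing.
    return text.lower().split()


def tfidf_with_stats(reviews):
    """Return (vocabulary, feature rows); each row is TF-IDF values plus two stats."""
    docs = [tokenize(r) for r in reviews]
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)

    # Document frequency: in how many reviews each term appears.
    df = Counter(term for doc in docs for term in set(doc))

    # Smoothed IDF (one common variant): log((1 + N) / (1 + df)) + 1
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}

    rows = []
    for review, doc in zip(reviews, docs):
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty reviews
        tfidf = [counts[t] / total * idf[t] for t in vocab]
        # Assumed "stats" features: character count and word count of the review.
        stats = [float(len(review)), float(len(doc))]
        rows.append(tfidf + stats)
    return vocab, rows


# Hypothetical sample reviews, not taken from the dataset.
reviews = ["great app love it", "bad app crashes a lot", "love the new update"]
vocab, features = tfidf_with_stats(reviews)
```

Each row in `features` could then be passed to a classifier such as a support vector machine; that step is omitted here because the abstract gives no model settings.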
Pages: 37197 - 37219
Number of pages: 23
Related papers
50 in total
  • [31] A new machine learning-based method for android malware detection on imbalanced dataset
    Dehkordy, Diyana Tehrany
    Rasoolzadegan, Abbas
    MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 (16) : 24533 - 24554
  • [32] Security Evaluation System for Android Applications Using User's Reviews and Permissions
    Okazaki, Naonobu
    Kita, Yoshihiro
    Aburada, Kentaro
    Park, Mirang
    JOURNAL OF ROBOTICS NETWORKING AND ARTIFICIAL LIFE, 2015, 2 (03) : 190 - 193
  • [33] Attribute-based Granular Evaluation for Performance of Machine Learning Models
    Trenquier, Henri
    Ishikawa, Fuyuki
    Tokumoto, Susumu
    2020 IEEE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE TESTING (AITEST), 2020 : 125 - 132
  • [34] Automatic dataset builder for Machine Learning applications to satellite imagery
    Sebastianelli, Alessandro
    Del Rosso, Maria Pia
    Ullo, Silvia Liberata
    SOFTWAREX, 2021, 15
  • [36] Performance Evaluation of Parametric and Non-Parametric Machine Learning Models using Statistical Analysis for RT-IoT2022 Dataset
    Sharmila, B. S.
    Nandini, B. M.
    Kavitha, S. S.
    Srivatsa, Anand
    JOURNAL OF SCIENTIFIC & INDUSTRIAL RESEARCH, 2024, 83 (08) : 864 - 872
  • [37] VegNet: Dataset of vegetable quality images for machine learning applications
    Suryawanshi, Yogesh
    Patil, Kailas
    Chumchu, Prawit
    DATA IN BRIEF, 2022, 45
  • [38] A Dataset Auditing Method for Collaboratively Trained Machine Learning Models
    Huang, Yangsibo
    Huang, Chun-Yin
    Li, Xiaoxiao
    Li, Kai
    IEEE TRANSACTIONS ON MEDICAL IMAGING, 2023, 42 (07) : 2081 - 2090
  • [39] Performance Evaluation and Comparative Analysis of Machine Learning Models on the UNSW-NB15 Dataset: A Contemporary Approach to Cyber Threat Detection
    Fathima, Afrah
    Khan, Amir
    Uddin, Md Faizan
    Waris, Mohammad Maqbool
    Ahmad, Sultan
    Sanin, Cesar
    Szczerbicki, Edward
    CYBERNETICS AND SYSTEMS, 2023
  • [40] Dataset of road surface images with seasons for machine learning applications
    Bhutad, Sonali
    Patil, Kailas
    DATA IN BRIEF, 2022, 42