Evaluation of text semantic features using latent Dirichlet allocation model

Authors
Zhou C. [1]
Li N. [2,3]
Zhang C. [2]
Yang X. [1]
Affiliations
[1] Beijing Key Laboratory of Information Service Engineering, Beijing Union University, Beijing
[2] Collaborative Innovation Center of eTourism, Tourism College, Beijing Union University, Beijing
[3] Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing
Keywords
Creative computing; Information architecture; Latent Dirichlet allocation (LDA); Online reviews; Semantic feature;
DOI
10.23940/ijpe.20.06.p15.968978
Chinese Library Classification (CLC) number
O212 [Mathematical Statistics];
Abstract
Obtaining useful information from mass data on the Internet has been a hot topic in information processing research in recent years. The task is more challenging for unstructured data written in natural language, such as online reviews. Online consumer reviews reflect customers' real experiences of and opinions on products or services. However, there is a shortage of methods and tools that help potential customers find high-quality, helpful reviews among a large number of reviews. This paper applied the concepts and ideas of creative computing to this problem. Tf-idf, a traditional method for extracting text features, measures the importance of words through word frequency and ignores the semantic information in the text data, while the topic model makes up for this deficiency. This paper proposed using the vector allocated to each review by the LDA topic model to represent its text semantic features. Based on the semantic features of the reviews, it calculated the cosine similarity between the thumbs-up reviews and the other reviews, and thus obtained simulated helpfulness scores for all reviews. Then, a linear regression was designed using two features, i.e., the syntactic and semantic features, to determine the simulated helpfulness scores. The proposed method was validated on online tourism reviews of the Forbidden City and Mount Huang collected from three representative Chinese online tourism platforms. The results showed that the proposed method can effectively obtain, and thus compare, the helpfulness of online reviews in a creative way. © 2020 Totem Publisher, Inc. All rights reserved.
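The similarity step described in the abstract can be illustrated with a minimal sketch. It assumes each review has already been mapped to a topic-distribution vector by an LDA model; the vectors below are made-up values, and the names `cosine_similarity` and `simulated_helpfulness` are hypothetical. The abstract does not say how similarities to several thumbs-up reviews are combined, so averaging them is an assumption of this sketch, not the authors' stated procedure.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


def simulated_helpfulness(review_topics, liked_topics):
    """Mean cosine similarity between one review and the thumbs-up reviews.

    Averaging is an assumption of this sketch; the abstract does not
    specify how the similarities are aggregated.
    """
    if not liked_topics:
        return 0.0
    sims = [cosine_similarity(review_topics, t) for t in liked_topics]
    return sum(sims) / len(sims)


# Hypothetical topic distributions over three topics (illustrative only).
liked = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]]
candidates = {
    "review_a": [0.65, 0.25, 0.10],  # topic mix close to the liked reviews
    "review_b": [0.05, 0.15, 0.80],  # dominated by a different topic
}

scores = {name: simulated_helpfulness(vec, liked)
          for name, vec in candidates.items()}

# Rank reviews from most to least similar to the thumbs-up reviews.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Because topic distributions are non-negative, the scores fall between 0 and 1, so they can be compared directly across reviews or passed on as one input feature to a regression.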
Pages: 968-978
Page count: 10