Evaluation of text semantic features using latent Dirichlet allocation model

Cited by: 0
Authors
Zhou C. [1]
Li N. [2, 3]
Zhang C. [2]
Yang X. [1]
Affiliations
[1] Beijing Key Laboratory of Information Service Engineering, Beijing Union University, Beijing
[2] Collaborative Innovation Center of eTourism, Tourism College, Beijing Union University, Beijing
[3] Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing
Keywords
Creative computing; Information architecture; Latent Dirichlet allocation (LDA); Online reviews; Semantic feature
DOI
10.23940/ijpe.20.06.p15.968978
CLC number (Chinese Library Classification)
O212 [Mathematical statistics]
Abstract
Obtaining useful information from the massive amount of data on the Internet has been a hot topic in information processing research in recent years, and the task becomes more challenging for unstructured data such as online reviews written in natural language. Online consumer reviews reflect customers' real experiences and opinions of products or services. However, there is a shortage of methods or tools that help potential customers find high-quality, helpful reviews among a large number of reviews. This paper applied the concept and idea of creative computing to solve this problem. TF-IDF, a traditional method for extracting text features, measures the importance of words through word frequency and ignores the semantic information in text data; topic models make up for this deficiency. This paper proposed using the topic vector that the LDA topic model allocates to each review to represent its text semantic features. Based on these semantic features, it calculated the cosine similarity between thumbs-up reviews and other reviews and thus obtained simulated helpfulness scores for all reviews. Then, a linear regression model was designed to relate two features, i.e., syntactic and semantic features, to the simulated helpfulness scores. The proposed method was validated on online tourism reviews of the Forbidden City and Mount Huang collected from three representative Chinese online tourism platforms. The results showed that the proposed method can effectively obtain, and thus compare, the helpfulness of online reviews in a creative way. © 2020 Totem Publisher, Inc. All rights reserved.
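The abstract describes scoring reviews by the cosine similarity between their LDA topic vectors and those of thumbs-up reviews. The following is a minimal sketch of that idea, not the authors' implementation, under several stated assumptions: topic vectors come from a small collapsed Gibbs sampler written here for illustration; reviews are already tokenized (Chinese reviews would first need word segmentation); and a review's simulated helpfulness is taken to be its mean cosine similarity to the thumbs-up reviews, because the abstract does not say how similarities are aggregated. The function names `lda_gibbs`, `cosine`, and `simulated_helpfulness`, the hyperparameters, and the toy reviews are all hypothetical. The linear-regression step is not shown.

```python
import math
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch, not the paper's code).

    docs: list of token lists. Returns one topic-proportion vector (theta)
    per document, estimated from the counts of the final sample.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    num_words = len(vocab)
    topics = range(num_topics)
    # Count tables: document-topic, topic-word, and per-topic totals.
    n_dk = [[0] * num_topics for _ in docs]
    n_kw = [[0] * num_words for _ in topics]
    n_k = [0] * num_topics
    # Randomly assign an initial topic to every token.
    z = []
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Take the current token out of the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Unnormalized full conditional p(z_i = t | all other assignments).
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + num_words * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                # Put the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1
    # Smoothed estimate of each document's topic distribution (rows sum to 1).
    return [
        [(n_dk[d][t] + alpha) / (len(doc) + num_topics * alpha) for t in topics]
        for d, doc in enumerate(docs)
    ]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def simulated_helpfulness(theta, liked):
    """Score each review not in `liked` by its mean cosine similarity to the
    thumbs-up reviews. ASSUMPTION: the mean is a guess; the abstract does not
    state how the similarities are aggregated."""
    if not liked:
        raise ValueError("need at least one thumbs-up review")
    refs = [theta[u] for u in liked]
    return {
        d: sum(cosine(vec, ref) for ref in refs) / len(refs)
        for d, vec in enumerate(theta)
        if d not in liked
    }


# Hypothetical, pre-tokenized toy reviews; review 0 is treated as thumbs-up.
reviews = [
    "palace emperor history architecture palace",
    "queue ticket crowded queue entrance",
    "emperor palace museum history architecture",
    "crowded ticket queue crowded",
    "sunrise peak pine cloud sunrise",
]
docs = [r.split() for r in reviews]
theta = lda_gibbs(docs, num_topics=3, iterations=300)
scores = simulated_helpfulness(theta, liked={0})
for d, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"review {d}: {s:.3f}")
```

Because Gibbs sampling is stochastic, the exact scores depend on the seed and iteration count; on real data a library implementation of LDA and many more reviews would normally be used.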
Pages: 968-978
Number of pages: 10
Related papers
50 in total
  • [31] Text classification using genetic algorithm oriented latent semantic features
    Uysal, Alper Kursat
    Gunal, Serkan
    EXPERT SYSTEMS WITH APPLICATIONS, 2014, 41 (13) : 5938 - 5947
  • [32] Indexing by Latent Dirichlet Allocation and an Ensemble Model
    Wang, Yanshan
    Lee, Jae-Sung
    Choi, In-Chan
    JOURNAL OF THE ASSOCIATION FOR INFORMATION SCIENCE AND TECHNOLOGY, 2016, 67 (07) : 1736 - 1750
  • [33] Blind Image Quality Assessment Using Latent Dirichlet Allocation Model
    Luo, Wang
    Zhang, Tianbing
    MECHANICAL ENGINEERING, MATERIALS AND ENERGY III, 2014, 483 : 594 - 598
  • [34] Human Action Recognition Using Labeled Latent Dirichlet Allocation Model
    Yang, Jiahui
    Chen, Changhong
    Gan, Zongliang
    Zhu, Xiuchang
    2013 INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS AND SIGNAL PROCESSING (WCSP 2013), 2013
  • [35] Latent Dirichlet Allocation complement in the vector space model for Multi-Label Text Classification
    Carrera-Trejo, Victor
    Sidorov, Grigori
    Miranda-Jimenez, Sabino
    Moreno Ibarra, Marco
    Cadena Martinez, Rodrigo
    INTERNATIONAL JOURNAL OF COMBINATORIAL OPTIMIZATION PROBLEMS AND INFORMATICS, 2015, 6 (01) : 7 - 19
  • [36] Latent Dirichlet allocation
    Blei, DM
    Ng, AY
    Jordan, MI
    JOURNAL OF MACHINE LEARNING RESEARCH, 2003, 3 (4-5) : 993 - 1022
  • [37] Bug localization using latent Dirichlet allocation
    Lukins, Stacy K.
    Kraft, Nicholas A.
    Etzkorn, Letha H.
    INFORMATION AND SOFTWARE TECHNOLOGY, 2010, 52 (09) : 972 - 990
  • [38] Latent Dirichlet allocation
    Blei, DM
    Ng, AY
    Jordan, MI
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 14, VOLS 1 AND 2, 2002, 14 : 601 - 608
  • [39] Author Identification Using Latent Dirichlet Allocation
    Calvo, Hiram
    Hernandez-Castaneda, Angel
    Garcia-Flores, Jorge
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, CICLING 2017, PT II, 2018, 10762 : 303 - 312
  • [40] Semantic similarity measure for topic modeling using latent Dirichlet allocation and collapsed Gibbs sampling
    Ajinaja, Micheal Olalekan
    Adetunmbi, Adebayo Olusola
    Ugwu, Chukwuemeka Christian
    Popoola, Olugbemiga Solomon
    IRAN JOURNAL OF COMPUTER SCIENCE, 2023, 6 (01) : 81 - 94