Mining Unstructured Economic Indicators Based on PSP_HDP Topic Model

Authors
Zhang Y.-T. [1, 2, 3]
Wan C.-X. [1, 3]
Liu X.-P. [1, 3]
Jiang T.-J. [1, 3]
Liu D.-X. [1, 3]
Liao G.-Q. [1, 3]
Affiliations
[1] School of Information Management, Jiangxi University of Finance and Economics, Nanchang
[2] School of Software, East China Jiaotong University, Nanchang
[3] Jiangxi Key Laboratory of Data and Knowledge Engineering, Jiangxi University of Finance and Economics, Nanchang
Source
Ruan Jian Xue Bao/Journal of Software | 2020 / Vol. 31 / No. 03
Funding
National Natural Science Foundation of China
Keywords
Economic factor; Economic taxonomy; HDP topic model; Semantic relevance; Unstructured economic indicator
DOI
10.13328/j.cnki.jos.005898
Abstract
With the increasing enrichment of economic activity data, a large number of financial texts have emerged on the Internet, and these texts contain the factors that influence economic development. Mining these economic factors from such texts is the key to conducting economic analysis based on unstructured data. To address the limitations of manually selecting economic indicators and the inaccuracy of modelling economic indicators in unstructured texts, the CRF (Chinese restaurant franchise) allocation processes in the HDP topic model are extended to a more efficient pattern. To describe the dish style in a restaurant, existing economic taxonomies are used to determine the domain membership of a document. The semantic similarity between words is exploited to define the semantic relevance between words and topics, which reflects the similarity of customers' requirements for dishes. For each word, its representativeness of each topic is used to evaluate its contribution to that topic, which explains the loyalty of a customer to each dish. By combining documents' domain properties, word semantics, and words' presence in topics with the HDP topic model, a novel model, the PSP_HDP topic model, is proposed. Because the PSP_HDP topic model improves the document-topic and topic-word allocation processes, it increases the accuracy of identifying economic topics and the distinctiveness of those topics, which leads to more effective mining of economic topics and economic factors. Experimental results show that the proposed model not only performs better in terms of topic diversity, topic perplexity, and topic complexity, but is also effective in finding more cohesive unstructured economic indicators and economic factors. © Copyright 2020, Institute of Software, the Chinese Academy of Sciences. All rights reserved.
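For readers unfamiliar with the restaurant metaphor in the abstract, the sketch below shows the seating rule of the standard Chinese restaurant process that underlies the CRF in an ordinary HDP topic model, where restaurants usually stand for documents, customers for words, and dishes for topics. It is not the PSP_HDP extension described above: the function name `crp_seating_probabilities` and the example counts are illustrative assumptions, and the domain, semantic-relevance, and representativeness terms of the paper are not modelled here.

```python
def crp_seating_probabilities(table_counts, alpha):
    """Seating probabilities for the next customer under a standard CRP.

    table_counts: customers already seated at each occupied table.
    alpha: concentration parameter (must be positive).
    The last entry of the result is the probability of opening a new table.
    Illustrative helper only; not taken from the paper.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    total = sum(table_counts) + alpha
    # An occupied table is chosen in proportion to its current size.
    probs = [count / total for count in table_counts]
    # A new table is chosen in proportion to alpha.
    probs.append(alpha / total)
    return probs


# Hypothetical example: two occupied tables with 3 and 1 customers.
probs = crp_seating_probabilities([3, 1], alpha=1.0)
```

Larger tables attract new customers more strongly, which is the "rich get richer" behaviour that the abstract's extensions adjust using document domains, word semantics, and word representativeness.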
Pages: 845-865
Number of pages: 20