Short Texts Classification Through Reference Document Expansion

Authors
Yang Zhen [1]
Fan Kefeng [2]
Lai Yingxu [1]
Gao Kaiming [1]
Wang Yong [3]
Affiliations
[1] Beijing Univ Technol, Coll Comp Sci, Beijing 100124, Peoples R China
[2] China Elect Standardizat Inst, Beijing 100007, Peoples R China
[3] Guilin Univ Elect Technol, CSIP Guangxi Sect, Guilin 541004, Peoples R China
Funding
National High Technology Research and Development Program of China (863 Program); Beijing Natural Science Foundation
Keywords
Text classification; Short texts; Language model; Document expansion; External reference
DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809
Abstract
With the rapid development of information technology, short texts arising from social interaction are gradually becoming predominant in network information streams, and demand is growing for more effective classification of these brief texts. However, traditional document classification models run into difficulty with short text documents, each of which contains only a few words. Aggressive document expansion works remarkably well in many cases but suffers from the assumption of independent, identically distributed observations. We formalize a view of classification using Bayesian decision theory, treat each short text as a set of observations from a probabilistic model, called a statistical language model, and encode classification preferences with a loss function defined by the language models and the external reference document. Following Vapnik's method of structural risk minimization (SRM), the optimal classification action is the one that minimizes the structural risk, which allows one to trade off errors on the training sample against improved generalization performance. We conduct experiments on several corpora of microblog-like data and analyze the results. Relative to established baselines, the experiments show that applying the proposed document expansion method gives a better chance of achieving improved classification performance.
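The general idea in the abstract (expand a short text with an external reference document, model it as a language model, and choose the class with the lowest risk) can be illustrated with a minimal Python sketch. This is not the authors' formulation, which the abstract does not specify. The additive smoothing, the linear interpolation weight `lam`, the use of KL divergence as the loss, and all function names and toy data are assumptions made for illustration only.

```python
import math
from collections import Counter


def unigram_model(tokens, vocab, alpha=1.0):
    """Additively smoothed unigram language model over a fixed vocabulary.

    The smoothing constant alpha is an illustrative assumption.
    """
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def expand(short_tokens, reference_tokens, vocab, lam=0.5):
    """Expand a short text by interpolating its model with a reference model.

    lam weights the short text itself; (1 - lam) weights the reference
    document. Linear interpolation is an assumed expansion scheme.
    """
    p_short = unigram_model(short_tokens, vocab)
    p_ref = unigram_model(reference_tokens, vocab)
    return {w: lam * p_short[w] + (1 - lam) * p_ref[w] for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) for two distributions defined over the same vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def classify(short_tokens, reference_tokens, class_docs, lam=0.5):
    """Return the class whose language model has the lowest assumed risk.

    KL divergence between the expanded text model and each class model
    stands in for the loss function; this choice is an assumption.
    """
    vocab = set(short_tokens) | set(reference_tokens)
    for tokens in class_docs.values():
        vocab |= set(tokens)
    p_text = expand(short_tokens, reference_tokens, vocab, lam)
    risks = {
        label: kl_divergence(p_text, unigram_model(tokens, vocab))
        for label, tokens in class_docs.items()
    }
    return min(risks, key=risks.get), risks


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the paper's corpora.
    class_docs = {
        "sports": "match goal team score win league".split(),
        "tech": "phone software update app release device".split(),
    }
    short = "new update".split()
    reference = "the software update ships to every phone app".split()
    label, risks = classify(short, reference, class_docs)
    print(label, risks)
```

In this sketch the reference document contributes words ("software", "phone", "app") that the two-word short text lacks, which is the intuition behind document expansion. A structural risk term that penalizes model complexity is not modeled here.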
Pages: 315-321
Number of pages: 7