Short Texts Classification Through Reference Document Expansion

Times cited: 0
Authors
Yang Zhen [1]
Fan Kefeng [2]
Lai Yingxu [1]
Gao Kaiming [1]
Wang Yong [3]
Affiliations
[1] Beijing Univ Technol, Coll Comp Sci, Beijing 100124, Peoples R China
[2] China Elect Standardizat Inst, Beijing 100007, Peoples R China
[3] Guilin Univ Elect Technol, CSIP Guangxi Sect, Guilin 541004, Peoples R China
Funding
National High Technology Research and Development Program of China (863 Program); Beijing Natural Science Foundation
Keywords
Text classification; Short texts; Language model; Document expansion; External reference
DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Subject classification code
0808; 0809
Abstract
With the rapid development of information technology, short texts arising from social interaction are becoming predominant in network information streams, and industry faces growing demand for more effective classification of these brief texts. However, because each short text document contains only a few words, traditional document classification models run into difficulty. Aggressive document expansion works remarkably well in many cases, but it is weakened by its reliance on the assumption of independent, identically distributed observations. We formalize classification within Bayesian decision theory: each short text is treated as an observation from a probabilistic model, a statistical language model, and classification preferences are encoded with a loss function defined by the language models and the external reference document. Following Vapnik's principle of structural risk minimization (SRM), the optimal classification action is the one that minimizes the structural risk, which lets one trade off errors on the training sample against improved generalization performance. We conduct experiments on several corpora of microblog-like data and analyze the results. Compared with established baselines, these results show that applying the proposed document expansion method is more likely to improve classification performance.
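The abstract does not state the model's equations, so the following is only a minimal sketch of the general idea under assumptions of our own, not the authors' method: unigram language models, Jelinek-Mercer smoothing against a background model, retrieval of the top-k reference documents by word overlap, and a KL-divergence loss between the expanded short-text model and each class model. The functions `unigram`, `expand`, `train`, and `classify`, the parameters `k`, `beta`, and `lam`, and the toy data are all hypothetical. The SRM step is not modelled; `k` and `beta` are simply fixed.

```python
"""Sketch: short-text classification with reference-document expansion.

Assumptions (not from the paper): unigram language models, Jelinek-Mercer
smoothing, top-k reference retrieval by word overlap, and a KL-divergence
loss. All names and data here are hypothetical.
"""
import math
from collections import Counter


def unigram(tokens):
    """Maximum-likelihood unigram model P(w | tokens)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def expand(short_text, references, k=2, beta=0.5):
    """Interpolate the short text's model with its k most similar references."""
    q = short_text.split()
    q_set = set(q)
    # Rank references by the number of distinct query words they share;
    # keep at most k, and only those sharing at least one word.
    ranked = sorted(references, key=lambda d: len(q_set & set(d.split())), reverse=True)
    top = [d for d in ranked[:k] if q_set & set(d.split())]
    q_model = unigram(q)
    if not top:
        return q_model
    ref_model = Counter()  # average of the selected references' models
    for doc in top:
        for w, p in unigram(doc.split()).items():
            ref_model[w] += p / len(top)
    vocab = set(q_model) | set(ref_model)
    return {w: (1 - beta) * q_model.get(w, 0.0) + beta * ref_model[w] for w in vocab}


def train(labelled, background_docs, lam=0.3):
    """Build one smoothed unigram model (a callable w -> P(w | class)) per class."""
    bg_counts = Counter(w for doc in background_docs for w in doc.split())
    vocab_size = len(bg_counts) + 1  # +1 reserves some mass for unseen words
    bg_total = sum(bg_counts.values())

    def p_bg(w):
        # Add-one smoothed background probability; never zero.
        return (bg_counts[w] + 1) / (bg_total + vocab_size)

    models = {}
    for label in {y for _, y in labelled}:
        tokens = [w for text, y in labelled if y == label for w in text.split()]
        ml = unigram(tokens)
        models[label] = lambda w, ml=ml: (1 - lam) * ml.get(w, 0.0) + lam * p_bg(w)
    return models


def classify(short_text, references, models, k=2, beta=0.5):
    """Pick the class whose model minimises KL(expanded text || class)."""
    theta = expand(short_text, references, k, beta)

    def cross_entropy(p_class):
        # KL(theta || class) = cross-entropy - H(theta); H(theta) is the same
        # for every class, so minimising cross-entropy minimises the KL loss.
        return -sum(p * math.log(p_class(w)) for w, p in theta.items() if p > 0)

    return min(models, key=lambda c: cross_entropy(models[c]))


# Hypothetical toy data: two labelled classes and two reference documents.
train_set = [
    ("goal scored in the final minute", "sports"),
    ("the striker missed a penalty", "sports"),
    ("new phone chip boosts battery life", "tech"),
    ("software update fixes security bug", "tech"),
]
refs = [
    "the football match ended after a late goal by the striker",
    "the chip maker released a faster processor for phones",
]
models = train(train_set, [t for t, _ in train_set] + refs)
print(classify("late goal", refs, models))  # prints: sports
```

In this toy run, "late" never occurs in the training texts; the expansion pulls in "striker" and "the" from the first reference, which the sports model covers. Interpolating rather than replacing the short text's model keeps its own words dominant when the references are noisy.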
Pages: 315–321
Number of pages: 7