Short Texts Classification Through Reference Document Expansion

Authors
Yang Zhen [1]
Fan Kefeng [2]
Lai Yingxu [1]
Gao Kaiming [1]
Wang Yong [3]
Affiliations
[1] Beijing Univ Technol, Coll Comp Sci, Beijing 100124, Peoples R China
[2] China Elect Standardizat Inst, Beijing 100007, Peoples R China
[3] Guilin Univ Elect Technol, CSIP Guangxi Sect, Guilin 541004, Peoples R China
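Funding
National High Technology Research and Development Program of China (863 Program); Beijing Natural Science Foundation
Keywords
Text classification; Short texts; Language model; Document expansion; External reference
DOI
Not available
Chinese Library Classification number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract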
With the rapid development of information technology, short texts arising from social interaction increasingly dominate network information streams, and growing demand requires the industry to classify these brief texts more effectively. However, traditional document classification models run into difficulty with short text documents, each of which contains only a few words. Aggressive document expansion works remarkably well in many cases but suffers from the assumption of independent, identically distributed observations. We formalize a view of classification using Bayesian decision theory, treat each short text as a set of observations from a probabilistic model, called a statistical language model, and encode classification preferences with a loss function defined by the language models and the external reference document. According to Vapnik's method of structural risk minimization (SRM), the optimal classification action is the one that minimizes the structural risk, which allows one to trade off errors on the training sample against improved generalization performance. We run experiments on several corpora of microblog-like data and analyze the results. Compared with established baselines, the results show that the proposed document expansion method is more likely to improve classification performance.
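The general idea in the abstract, expanding a short text with an external reference document before scoring it against class language models, can be illustrated with a toy sketch. This is not the authors' method: the abstract does not give the loss function or the SRM formulation, so the sketch assumes add-alpha smoothed unigram models, a uniform class prior, and a simple word-overlap rule for choosing the reference document. All function names and the sample data are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; any tokenizer would do)."""
    return text.lower().split()


def train_class_models(labeled_texts):
    """Count unigrams per class. Returns ({label: Counter}, vocabulary set)."""
    counts, vocab = {}, set()
    for text, label in labeled_texts:
        tokens = tokenize(text)
        counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return counts, vocab


def expand(text, references, top_k=1):
    """Append the top_k reference documents that share the most distinct
    words with the short text. Overlap ranking is an illustrative choice."""
    tokens = tokenize(text)
    query = set(tokens)
    ranked = sorted(references,
                    key=lambda ref: len(query & set(tokenize(ref))),
                    reverse=True)
    for ref in ranked[:top_k]:
        if query & set(tokenize(ref)):  # skip references with no overlap
            tokens.extend(tokenize(ref))
    return tokens


def classify(tokens, counts, vocab, alpha=1.0):
    """Return the class with the highest smoothed unigram log-likelihood.
    With a uniform prior this is the Bayes decision under 0-1 loss.
    Returns None when no token is in the training vocabulary."""
    known = [t for t in tokens if t in vocab]
    if not known:
        return None
    best_label, best_score = None, -math.inf
    for label, counter in counts.items():
        denom = sum(counter.values()) + alpha * len(vocab)
        score = sum(math.log((counter[t] + alpha) / denom) for t in known)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, for illustration only.
training = [
    ("the team won the match", "sports"),
    ("striker scored a late goal", "sports"),
    ("new phone has a fast processor", "tech"),
    ("laptop battery and screen review", "tech"),
]
references = [
    "messi scored a goal and the team won",
    "the chip maker released a new processor",
]
counts, vocab = train_class_models(training)
short_text = "messi again"

plain = classify(tokenize(short_text), counts, vocab)
expanded = classify(expand(short_text, references), counts, vocab)
print(plain, expanded)
```

In this toy setup the unexpanded short text contains no word seen in training, so the classifier has no evidence to work with, while the expanded version borrows vocabulary from the overlapping reference document. The sketch does not model the structural risk trade-off that the abstract describes.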
Pages: 315-321
Number of pages: 7
Source: Chinese Journal of Electronics, 2014, 23 (02): 315-321