Modeling Chinese Microblogs with Five Ws for Topic Hashtags Extraction

Cited by: 0
Authors
Zhibin Zhao [1]
Jiahong Sun [1]
Lan Yao [1]
Xun Wang [1]
Jiahong Chu [1]
Huan Liu [1]
Ge Yu [1]
Affiliation
[1] College of Computer Science and Engineering, Northeastern University
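Funding
National Natural Science Foundation of China
Keywords
hashtag; microblog; topic detection; short-message-style news; five Ws
DOI
Not available
Chinese Library Classification (CLC) number
TP393.092; TP391.1 [Text information processing]
Discipline classification codes
080402; 081203; 0835
Abstract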
Hashtags are important metadata in microblogs and are used to mark topics or index messages. However, statistics show that hashtags are absent from most microblogs, which poses great challenges for the retrieval and analysis of these tagless microblogs. In this paper, we summarize the similarity between microblogs and short-message-style news, and then propose an algorithm, named 5WTAG, for detecting microblog topics based on a model of the five Ws (When, Where, Who, What, hoW). Because the five-W attributes are the core components of an event description, 5WTAG is theoretically guaranteed to extract semantic topics from microblogs properly. We describe the detailed procedure of the algorithm, including spam microblog identification, microblog segmentation, and candidate hashtag construction. In addition, we propose a novel recommendation computing method for ranking candidate hashtags, which combines syntactic and semantic analysis and takes into account the distribution of artificial topic hashtags. Finally, using real data from Sina Weibo, we conduct comprehensive experiments to verify the semantic correctness and completeness of the candidate hashtags, as well as the accuracy of the recommendation method.
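The abstract does not give the details of 5WTAG, so the following is only a minimal sketch of the general idea of building candidate hashtags from five-W slots. It is not the authors' algorithm: the `FiveW` class, the `candidate_hashtags` function, the `min_slots` parameter, the `#...#` wrapping, and the slot-count ranking are all hypothetical choices made for illustration, and the slot values are assumed to have been extracted already by some earlier segmentation step.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Optional, Tuple


@dataclass
class FiveW:
    """Hypothetical container for the five-W slots of one microblog.

    A slot left as None means that attribute was not found in the text.
    """
    when: Optional[str] = None
    where: Optional[str] = None
    who: Optional[str] = None
    what: Optional[str] = None
    how: Optional[str] = None


def candidate_hashtags(slots: FiveW, min_slots: int = 2) -> List[Tuple[str, int]]:
    """Join every combination of filled slots into a candidate hashtag.

    Returns (hashtag, number_of_slots_covered) pairs. The ordering used
    here (more slots first) is a placeholder, NOT the paper's
    recommendation computing method.
    """
    # Keep only the slots that were actually filled, in a fixed order.
    filled = [v for v in (slots.when, slots.where, slots.who,
                          slots.what, slots.how) if v]
    candidates = []
    for size in range(min_slots, len(filled) + 1):
        for combo in combinations(filled, size):
            # Assumed output convention: wrap the tag in paired '#' signs.
            candidates.append(("#" + " ".join(combo) + "#", size))
    # Stable sort, so ties keep their generation order.
    candidates.sort(key=lambda item: -item[1])
    return candidates


if __name__ == "__main__":
    example = FiveW(where="Beijing", who="team A", what="wins final")
    for tag, covered in candidate_hashtags(example):
        print(covered, tag)
```

A real system would replace the slot-count ordering with a score that reflects the syntactic and semantic evidence the abstract mentions; the sketch only shows how partial five-W descriptions can yield several candidates of differing completeness.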
Pages: 135-148
Page count: 14