PBM: A NEW DATASET FOR BLOG MINING

Cited by: 0
Authors
Aziz, Mehwish
Rafi, Muhammad
Keywords
Web 2.0; blogosphere; text mining; clustering; natural language processing
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Text mining is becoming vital as Web 2.0 offers collaborative content creation and sharing. Researchers have a growing interest in text mining methods for discovering knowledge. Text mining researchers come from a variety of areas, such as natural language processing, computational linguistics, machine learning, and statistics. A typical text mining application involves preprocessing the text, stemming and lemmatization, tagging and annotation, deriving knowledge patterns, and evaluating and interpreting the results. There are numerous text mining tasks, such as clustering, categorization, sentiment analysis, and summarization, and there is a growing need to standardize their evaluation. One major component of such standardization is providing standard datasets for these tasks. Although various standard datasets are available for traditional text mining tasks, datasets for blog mining are few and expensive. Blogs, a new genre in Web 2.0, are digital diaries of web users with chronological entries; they contain a lot of useful knowledge and thus offer many challenges and opportunities for text mining. In this paper, we report a new indigenous dataset for the Pakistani political blogosphere. The paper describes the process of data collection, organization, and standardization. We have used this dataset to carry out various text mining tasks for the blogosphere, such as blog search, political sentiment analysis and tracking, identification of influential bloggers, and clustering of blog posts. We wish to offer this dataset free to others who aspire to pursue further work in this domain.
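The pipeline outlined in the abstract (preprocessing, stemming, deriving patterns, comparing posts) can be illustrated with a minimal sketch. This is not the authors' implementation: the sample posts, the stop-word list, the crude plural-stripping rule standing in for real stemming, and the function names `preprocess`, `tfidf_vectors`, and `cosine` are all assumptions made for illustration. It uses only the Python standard library and shows how blog posts might be turned into TF-IDF vectors and compared, which is the usual first step before clustering.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not from the paper).
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "is", "are", "to", "for"}


def preprocess(text):
    """Lowercase, tokenize, drop stop words, and strip a trailing 's'.

    The suffix rule is a crude stand-in for real stemming or lemmatization.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    kept = [t for t in tokens if t not in STOP_WORDS]
    return [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in kept]


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict term -> weight) per document."""
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # avoid division by zero for empty posts
        vectors.append({
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)  # smoothed IDF
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    return dot / (norm_u * norm_v)


# Hypothetical blog-post snippets, invented for this example.
posts = [
    "Elections and voters: the election campaign begins",
    "Voters discuss the election results",
    "Cricket team wins the match",
]
vectors = tfidf_vectors(posts)
sim_political = cosine(vectors[0], vectors[1])
sim_unrelated = cosine(vectors[0], vectors[2])
print(sim_political, sim_unrelated)
```

In this sketch the two political posts share terms and so score a higher similarity than the political and cricket posts, which share none. A clustering step such as k-means or hierarchical clustering would operate on these pairwise similarities; a real system would also replace the suffix rule with a proper stemmer.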
Pages: 443-449
Page count: 7
Related papers
  • [1] New metrics for blog mining
    Ulicny, Brian
    Baclawski, Ken
    Magnus, Amy
    DATA MINING, INTRUSION DETECTION, INFORMATION ASSURANCE, AND DATA NETWORKS SECURITY 2007, 2007, 6570
  • [2] Blog Summarization for Blog Mining
    Asbagh, Mohsen Jafari
    Sayyadi, Mohsen
    Abolhassani, Hassan
    SOFTWARE ENGINEERING, ARTIFICIAL INTELLIGENCE, NETWORKING AND PARALLEL/DISTRIBUTED COMPUTING, 2009, 209 : 157 - +
  • [3] A blog mining framework
    Chau, Michael
    Xu, Jennifer
    Cao, Jinwei
    Lam, Porsche
    Shiu, Boby
    IT Professional, 2009, 11 (01) : 36 - 41
  • [4] A Multidimensional Approach to Blog Mining
    Sandeep, K. S.
    Patil, Nagamma
    PROGRESS IN INTELLIGENT COMPUTING TECHNIQUES: THEORY, PRACTICE, AND APPLICATIONS, VOL 2, 2018, 719 : 51 - 58
  • [5] Blog mining for the fortune 500
    Geller, James
    Parikh, Sapankumar
    Krishnan, Sriram
    MACHINE LEARNING AND DATA MINING IN PATTERN RECOGNITION, PROCEEDINGS, 2007, 4571 : 379 - +
  • [6] Development of An Opinion Blog Mining System
    Al-Hamami, Alaa H.
    Shahrour, Suzan H.
    2015 4TH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTER SCIENCE APPLICATIONS AND TECHNOLOGIES (ACSAT), 2015, : 74 - 79
  • [7] Probabilistic techniques for corporate blog mining
    Tsai, Flora S.
    Chen, Yun
    Chan, Kap Luk
    EMERGING TECHNOLOGIES IN KNOWLEDGE DISCOVERY AND DATA MINING, 2007, 4819 : 35 - 44
  • [8] Investigating MOOCs Through Blog Mining
    Chen, Yong
    INTERNATIONAL REVIEW OF RESEARCH IN OPEN AND DISTRIBUTED LEARNING, 2014, 15 (02) : 85 - 106
  • [9] Mining the MACHO dataset
    Hegland, M
    Clarke, W
    Kahn, M
    COMPUTER PHYSICS COMMUNICATIONS, 2001, 142 (1-3) : 22 - 28
  • [10] Interest Mining Algorithm based on Blog Information
    Ou, Guohua
    Xu, Changjian
    Zhan, Haoxun
    Qin, Yong
    Huang, Han
    APPLIED MATERIALS AND TECHNOLOGIES FOR MODERN MANUFACTURING, PTS 1-4, 2013, 423-426 : 2712 - +