Experiments in multi-class e-mail categorization

Authors
Berger, Helmut [1]
Merkl, Dieter [1]
Dittenbach, Michael [1]
Affiliation
[1] E Commerce Competence Ctr, A-1220 Vienna, Austria
Keywords
document categorization; document representation; machine learning
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper reports on extensive experiments in multi-class document categorization with supervised and unsupervised machine learning techniques. In particular, experiments on a document collection consisting of personal e-mail messages are described. Two distinct document representation formalisms are employed to characterize these messages, namely a standard word-based approach and a character n-gram document representation. Moreover, two distinct data sets, both with and without e-mail header information, are compared. Based on these document representations, the categorization performance of the various machine learning approaches is assessed and a comparison is given. The results indicate a substantial increase in classification accuracy when header information is considered in the document representation. To a much lesser degree, word-based document representations are advantageous over n-gram representations.
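The two document representations named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the tokenization rule, the n-gram length `n=3`, and the function names `word_features`, `char_ngram_features`, and `with_header` are all assumptions made for illustration, and using only the subject line as "header information" is likewise a simplification.

```python
import re
from collections import Counter


def word_features(text):
    """Word-based representation: counts of lowercased alphanumeric tokens.

    The token pattern is an illustrative assumption, not taken from the paper.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def char_ngram_features(text, n=3):
    """Character n-gram representation: counts of overlapping n-character windows.

    Whitespace is collapsed to single spaces first; n=3 is an assumed default.
    """
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def with_header(subject, body):
    """Hypothetical helper: prepend a header field (here only the subject) to the body."""
    return subject + "\n" + body


if __name__ == "__main__":
    body = "The meeting is moved to Friday."
    message = with_header("Meeting update", body)
    print(word_features(message).most_common(3))
    print(char_ngram_features(message).most_common(3))
```

Either feature dictionary could then be passed to a classifier of choice; comparing runs on `body` alone against runs on `with_header(...)` mirrors the with-header and without-header data sets described above.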
Pages: 79-90
Page count: 12