E-mail Address Categorization based on Semantics of Surnames

Cited by: 0
Authors
Veluru, Suresh [1]
Rahulamathavan, Yogachandran [1]
Viswanath, P.
Longley, Paul [2]
Rajarajan, Muttukrishnan [1]
Affiliations
[1] City Univ London, Sch Engn & Math Sci, Informat Secur Grp, London EC1V 0HB, England
[2] UCL, Dept Geog, London, England
Source
2013 IEEE SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DATA MINING (CIDM) | 2013
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK;
Keywords
Vector space model; latent semantic analysis; surnames; average link clustering method; suffix tree;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Surname (family name) analysis is used in geography to understand population origins, migration, identity, social norms and cultural customs, some of which are thought to have evolved over generations. Surnames exhibit good statistical properties that can be used to extract information from names data sets, for example to detect ethnic or community groups automatically. An e-mail address often contains a surname as a substring, either in full or in part. The objective of this paper is to categorize e-mail addresses based on the semantics of surnames. This is achieved in two phases. The first phase deals with surname representation and clustering: a vector space model is proposed, latent semantic analysis is performed on it, and the surnames are clustered using the average-linkage method. In the second phase, an e-mail address is assigned to one of the categories discovered in the first phase. This requires substring matching, which is done efficiently using a suffix tree data structure. We perform an experimental evaluation on the 500 most frequently occurring surnames in India and the United Kingdom, and we categorize the e-mail addresses that contain these surnames as substrings.
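The second phase described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses an uncompressed suffix trie (quadratic to build) as a simple stand-in for a suffix tree, handles only full containment of a surname, and breaks ties by taking the longest match. The table `SURNAME_CLUSTERS`, its cluster labels, and the function names are all hypothetical and stand in for the output of the first phase.

```python
from typing import Dict, Optional

# Hypothetical surname-to-cluster table standing in for the output of phase one.
SURNAME_CLUSTERS: Dict[str, str] = {
    "patel": "cluster_0",
    "singh": "cluster_0",
    "smith": "cluster_1",
    "jones": "cluster_1",
}


def build_suffix_trie(text: str) -> dict:
    """Insert every suffix of `text` so each substring labels a path from the root."""
    root: dict = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.setdefault(ch, {})
    return root


def contains(trie: dict, pattern: str) -> bool:
    """Return True if `pattern` is a substring of the indexed text."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True


def categorize(address: str, clusters: Dict[str, str]) -> Optional[str]:
    """Return the cluster of the longest surname fully contained in the local part."""
    # Assumption: only the part before "@" is searched, case-insensitively.
    local_part = address.split("@", 1)[0].lower()
    trie = build_suffix_trie(local_part)
    matches = [s for s in clusters if contains(trie, s)]
    if not matches:
        return None
    return clusters[max(matches, key=len)]
```

Calling `categorize("j.smith82@example.com", SURNAME_CLUSTERS)` looks up each known surname as a path in the trie built from `j.smith82`. A compressed suffix tree built in linear time (for example with Ukkonen's algorithm) would replace `build_suffix_trie` where efficiency matters, and partial containment would need an additional matching rule that this sketch does not attempt.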
Pages: 222-229
Number of pages: 8