Multilingual sentence categorization and novelty mining

Cited by: 10
Authors
Zhang, Yi [1]
Tsai, Flora S. [1]
Kwee, Agus Trisnajaya [1]
Affiliation
[1] Nanyang Technol Univ, Sch Elect & Elect Engn, Singapore 639798, Singapore
Keywords
Multilingual categorization; Sentence retrieval; Novelty mining; Malay; Chinese
DOI
10.1016/j.ipm.2010.02.003
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
A challenge for sentence categorization and novelty mining is to detect not only when text is relevant to the user's information need, but also when it contains something new that the user has not seen before. This involves two tasks: the first is identifying relevant sentences (categorization), and the second is identifying new information within those relevant sentences (novelty mining). Many previous studies of relevant sentence retrieval and novelty mining have been conducted on English, but few papers have addressed multilingual sentence categorization and novelty mining. This is an important issue in global business environments, where mining knowledge from text in a single language is not sufficient. In this paper, we perform the first task by categorizing Malay and Chinese sentences and comparing their performance with that of English. We then conduct novelty mining to identify the sentences with new information. Experimental results on the TREC 2004 Novelty Track data show similar categorization performance on Malay and English sentences, both of which greatly outperform Chinese. In the second task, we achieve similar novelty mining results for all three languages, which indicates that our algorithm is suitable for novelty mining of multilingual sentences. In addition, benchmarking our results against novelty mining without categorization shows that categorization is necessary for novelty mining to perform well. (C) 2010 Elsevier Ltd. All rights reserved.
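The abstract describes a two-stage pipeline: first keep the sentences relevant to a topic (categorization), then flag which of those relevant sentences carry new information (novelty mining). The sketch below illustrates that flow under stated assumptions and is not the authors' algorithm. It assumes a hypothetical keyword-overlap relevance filter (`categorize`) in place of a trained classifier, whitespace tokenization, and a cosine-similarity novelty test in which a sentence counts as novel when its maximum similarity to earlier relevant sentences falls below an illustrative threshold of 0.7. The function names and the threshold are assumptions chosen for this example.

```python
from collections import Counter
import math


def tf_vector(sentence: str) -> Counter:
    """Bag-of-words term frequencies using whitespace tokenization.

    Assumption: splitting on whitespace works for English and Malay, but
    Chinese text has no spaces between words and would need a segmenter.
    """
    return Counter(sentence.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def categorize(sentences, topic_terms, min_overlap=1):
    """Stage 1 (hypothetical relevance filter): keep sentences that share at
    least `min_overlap` terms with the topic description."""
    topic = {t.lower() for t in topic_terms}
    return [s for s in sentences if len(topic & set(tf_vector(s))) >= min_overlap]


def novelty_mine(sentences, threshold=0.7):
    """Stage 2: a sentence is novel if its maximum cosine similarity to every
    previously seen sentence is below `threshold` (illustrative value)."""
    history, novel = [], []
    for s in sentences:
        vec = tf_vector(s)
        max_sim = max((cosine(vec, h) for h in history), default=0.0)
        if max_sim < threshold:
            novel.append(s)
        history.append(vec)  # every processed sentence counts as "seen"
    return novel


if __name__ == "__main__":
    sentences = [
        "The storm hit the coast on Monday",
        "The storm hit the coast on Monday",
        "Stock prices rose sharply today",
        "Rescue teams reached the storm damaged coast",
    ]
    relevant = categorize(sentences, ["storm", "coast"])
    print(novelty_mine(relevant))
    # → ['The storm hit the coast on Monday', 'Rescue teams reached the storm damaged coast']
```

In this toy run, the duplicate sentence is suppressed as redundant. Calling `novelty_mine(sentences)` without the categorization step additionally returns the off-topic stock-price sentence, because it is dissimilar to everything seen so far. That gives a small, hedged illustration of why filtering for relevance before testing for novelty can matter; it does not reproduce the paper's experiments.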
Pages: 667-675
Page count: 9
Related papers
50 in total
  • [21] The ICSI+ Multilingual Sentence Segmentation System
    Zimmerman, M.
    Hakkani-Tür, D.
    Fung, J.
    Mirghafori, N.
    Gottlieb, L.
    Shriberg, E.
    Liu, Y.
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006: 117-120
  • [22] Toward Computational Models of Multilingual Sentence Processing
    Frank, Stefan L.
    LANGUAGE LEARNING, 2021, 71: 193-218
  • [23] Multilingual Universal Sentence Encoder for Semantic Retrieval
    Yang, Yinfei
    Cer, Daniel
    Ahmad, Amin
    Guo, Mandy
    Law, Jax
    Constant, Noah
    Abrego, Gustavo Hernandez
    Yuan, Steve
    Tar, Chris
    Sung, Yun-Hsuan
    Strope, Brian
    Kurzweil, Ray
    58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020): SYSTEM DEMONSTRATIONS, 2020: 87-94
  • [24] Hierarchical Rhetorical Sentence Categorization for Scientific Papers
    Rachman, G. H.
    Khodra, M. L.
    Widyantoro, D. H.
    2ND INTERNATIONAL CONFERENCE ON COMPUTING AND APPLIED INFORMATICS 2017, 2018, 978
  • [25] A Highly Effective Hybrid Model for Sentence Categorization
    Chen, Zhenhong
    Yang, Kai
    Cai, Yi
    Huang, Dongping
    Leung, Ho-Fung
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, DASFAA 2016, 2016, 9645: 98-111
  • [26] Enhancing Text Categorization Using Sentence Semantics
    Shehata, Shady
    Karray, Fakhri
    Kamel, Mohamed
    ADVANCED DATA MINING AND APPLICATIONS, PROCEEDINGS, 2008, 5139: 87-98
  • [27] Information services for novelty mining
    Tsai, Flora S.
    Kwee, Agus T.
    KNOWLEDGE ENGINEERING REVIEW, 2014, 29 (02): 234-247
  • [28] Categorization in infancy based on novelty and co-occurrence
    Wu, Rachel
    Kurum, Esra
    Ahmed, Claire
    Sain, Debaleena
    Aslin, Richard N.
    INFANT BEHAVIOR & DEVELOPMENT, 2021, 62
  • [29] Sentence-Level Novelty Detection in English and Malay
    Kwee, Agus T.
    Tsai, Flora S.
    Tang, Wenyin
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS, 2009, 5476: 40-51
  • [30] Sentence Production in Bilingual and Multilingual Aphasia: A Scoping Review
    Norhan, Aslam
    Hassan, Fatimah Hani
    Razak, Rogayah
    Aziz, Mohd Azmarul
    LANGUAGES, 2023, 8 (01)