A Language-Independent Hybrid Approach for Multi-Word Expression Extraction

Cited by: 0
Authors
Liang, Yinghong [1]
Tan, Hongye [2]
Li, Hui [1]
Wang, Zhigang [1]
Gui, Wenming [1]
Affiliations
[1] Jingling Inst Technol, Dept Software Engn, Nanjing, Jiangsu, Peoples R China
[2] Shanxi Univ, Dept Comp & Informat Technol, Taiyuan, Shanxi, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multi-Word Expression; Bi-LSTM; Language-Independent;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Failing to identify multi-word expressions (MWEs) may cause serious problems for many Natural Language Processing (NLP) tasks. Previous approaches depend heavily on language-specific knowledge and pre-existing NLP tools. However, many languages (including Chinese) have fewer such resources and tools than English. An approach that automatically learns effective features from a corpus, without relying on language-specific resources, is needed. In this paper, we develop a hybrid approach that combines bidirectional long short-term memory (Bi-LSTM), word correlation degree calculation, and weakly supervised K-means clustering to capture both the sequence information and the correlation degree of phrases from specific contexts, and uses them to train a multi-word expression detector for multiple languages without any manually encoded features. Experimental results show that this hybrid approach extracts Chinese and English multi-word expressions better than the baseline algorithm, which verifies that the hybrid approach is effective.
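The abstract does not say how the word correlation degree is computed. As an illustration only, the sketch below assumes pointwise mutual information (PMI) over adjacent word pairs, a common association measure for MWE candidates; the function name `pmi_scores`, the `min_count` parameter, and the toy corpus are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def pmi_scores(sentences, min_count=1):
    """Return PMI (base 2) for each adjacent word pair in tokenized sentences.

    Assumption: PMI stands in for the paper's unspecified
    "word correlation degree"; the authors may use a different measure.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent pairs only

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # skip rare pairs, whose PMI estimates are unreliable
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


# Toy corpus (hypothetical); the input is already tokenized, so the same
# function applies to segmented Chinese text as well as English.
corpus = [
    ["new", "york", "is", "big"],
    ["new", "york", "is", "far"],
    ["it", "is", "big"],
]
scores = pmi_scores(corpus)
```

Because the function only needs token sequences and counts, it uses no language-specific resources, which is the property the abstract emphasizes. Pairs that co-occur more often than their individual frequencies predict receive higher scores.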
Pages: 3273 - 3279
Page count: 7
Related papers
50 items in total
  • [1] A hybrid Approach for Arabic Multi-Word Term Extraction
    Bounhas, Ibrahim
    Slimani, Yahya
    IEEE NLP-KE 2009: PROCEEDINGS OF INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND KNOWLEDGE ENGINEERING, 2009, : 429 - 436
  • [2] A multi-word term extraction program for Arabic language
    Boulaknadel, Siham
    Daille, Beatrice
    Aboutajdine, Driss
    SIXTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, LREC 2008, 2008, : 1485 - 1488
  • [3] Word Embedding Approach for Synonym Extraction of Multi-Word Terms
    Hazem, Amir
    Daille, Beatrice
    PROCEEDINGS OF THE ELEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2018), 2018, : 297 - 303
  • [4] A Combined Approach for the Extraction of the Multi-word and Nested Biomedical
    Gong, Lejun
    Feng, Jiacheng
    Yang, Ronggen
    Yang, Geng
    2015 IEEE INTERNATIONAL CONFERENCE ON DIGITAL SIGNAL PROCESSING (DSP), 2015, : 708 - 711
  • [5] A multi-word term extraction system
    Chen, Jisong
    Yeh, Chung-Hsing
    Chau, Rowena
    PRICAI 2006: TRENDS IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2006, 4099 : 1160 - 1165
  • [6] Hybrid Approach for Automatic Identification of Multi-Word Expressions in Lithuanian
    Mandravickaite, Justina
    Rimkute, Erika
    Krilavicius, Tomas
    HUMAN LANGUAGE TECHNOLOGIES - THE BALTIC PERSPECTIVE, 2016, 289 : 153 - 159
  • [7] A language-independent approach to the extraction of dependencies between source code entities
    Savic, Milos
    Rakic, Gordana
    Budimac, Zoran
    Ivanovic, Mirjana
    INFORMATION AND SOFTWARE TECHNOLOGY, 2014, 56 (10) : 1268 - 1288
  • [8] A Contrastive Approach to Multi-word Term Extraction from Domain Corpora
    Bonin, Francesca
    Dell'Orletta, Felice
    Venturi, Giulia
    Montemagni, Simonetta
    LREC 2010 - SEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2010,
  • [9] LHDiff: A Language-Independent Hybrid Approach for Tracking Source Code Lines
    Asaduzzaman, Muhammad
    Roy, Chanchal K.
    Schneider, Kevin A.
    Di Penta, Massimiliano
    2013 29TH IEEE INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE (ICSM), 2013, : 230 - 239
  • [10] AUGMENTED MUTUAL INFORMATION FOR MULTI-WORD EXTRACTION
    Zhang, Wen
    Yoshida, Taketoshi
    Ho, Tu Bao
    Tang, Xijin
    INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL, 2009, 5 (02) : 543 - 554