A Language-Independent Hybrid Approach for Multi-Word Expression Extraction

Cited by: 0
Authors
Liang, Yinghong [1 ]
Tan, Hongye [2 ]
Li, Hui [1 ]
Wang, Zhigang [1 ]
Gui, Wenming [1 ]
Affiliations
[1] Jingling Inst Technol, Dept Software Engn, Nanjing, Jiangsu, Peoples R China
[2] Shanxi Univ, Dept Comp & Informat Technol, Taiyuan, Shanxi, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multi-Word Expression; Bi-LSTM; Language-Independent;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Failing to identify multi-word expressions (MWEs) can cause serious problems for many natural language processing (NLP) tasks. Previous approaches depend heavily on language-specific knowledge and pre-existing NLP tools. However, many languages, including Chinese, have fewer such resources and tools than English. An approach that automatically learns effective features from a corpus, without relying on language-specific resources, is therefore needed. In this paper, we develop a hybrid approach that combines bidirectional long short-term memory (Bi-LSTM), word correlation degree calculation, and weakly supervised K-means clustering to capture both the sequence information and the correlation degree of phrases in specific contexts, and we use them to train a multi-word expression detector for multiple languages without any manually encoded features. Experimental results show that this hybrid approach extracts Chinese and English multi-word expressions better than the baseline algorithm, which verifies that the approach is effective.
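The abstract does not say how the word correlation degree is computed. As a rough illustration only, the sketch below scores adjacent word pairs with pointwise mutual information (PMI), a common language-independent association measure that needs nothing beyond token counts. PMI is an assumed stand-in here, not the paper's formula, and the function name `bigram_pmi` and the toy sentence are hypothetical.

```python
import math
from collections import Counter


def bigram_pmi(tokens):
    """Return {(w1, w2): PMI} for every adjacent word pair in tokens.

    PMI is an assumed stand-in for the paper's "word correlation
    degree"; the record does not give the actual measure.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)  # number of adjacent pairs

    scores = {}
    for (w1, w2), count in bigrams.items():
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        # log2 of observed pair probability over the probability
        # expected if the two words occurred independently
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


# Toy corpus (hypothetical); whitespace splitting is used only for brevity.
tokens = "new york is big and new york is old".split()
scores = bigram_pmi(tokens)
```

Because the score relies only on counts, the same function applies to any tokenized language, which is in the spirit of the language-independent goal stated above. Raw PMI is known to favor rare pairs, so in practice it is usually combined with a frequency threshold.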
Pages: 3273-3279
Page count: 7
Related papers
50 in total
  • [31] Towards language-independent approach for security concerns weaving
    Mourad, Azzam
    Alhadidi, Dima
    Debbabi, Mourad
    SECRYPT 2008: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON SECURITY AND CRYPTOGRAPHY, 2008, : 460 - 465
  • [32] A corpus-driven approach to formulaic language in English Multi-word patterns in speech and writing
    Biber, Douglas
    INTERNATIONAL JOURNAL OF CORPUS LINGUISTICS, 2009, 14 (03) : 275 - 311
  • [33] Topic Detection and Multi-word Terms Extraction for Arabic Unvowelized Documents
    Koulali, Rim
    Meziane, Ahdelouafi
    INFORMATION RETRIEVAL TECHNOLOGY, 2011, 7097 : 614 - 623
  • [34] Term Extraction For A Single & Multi-Word Based On Islamic Corpus English
    Abduljabbar, Waleed Khalid
    Tomah, Saadiyaa A.
    Ali, Ammar Abdulateef
    2018 1ST ANNUAL INTERNATIONAL CONFERENCE ON INFORMATION AND SCIENCES (AICIS 2018), 2018, : 107 - 111
  • [35] Rule-based Automatic Multi-Word Term Extraction and Lemmatization
    Stankovic, Ranka
    Krstev, Cvetana
    Obradovic, Ivan
    Lazic, Biljana
    Trtovac, Aleksandra
    LREC 2016 - TENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2016, : 507 - 514
  • [36] Semi-compositional Method for Synonym Extraction of Multi-Word Terms
    Hazem, Amir
    Daille, Beatrice
    LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2014, : 1202 - 1207
  • [37] Unsupervised Classification of Verb Noun Multi-Word Expression Tokens
    Diab, Mona T.
    Krishna, Madhav
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, 2009, 5449 : 98 - 110
  • [38] Learning to predict: Second language perception of reduced multi-word sequences
    Tizon-Couto, David
    Lorenz, David
    SECOND LANGUAGE RESEARCH, 2024,
  • [39] Language-Independent Word Acquisition Method Using a State-Transition Model
    Xu, Bin
    Yamagishi, Naohide
    Suzuki, Makoto
    Goto, Masayuki
    INDUSTRIAL ENGINEERING AND MANAGEMENT SYSTEMS, 2016, 15 (03) : 224 - 230
  • [40] Language-Independent Text-Line Extraction Algorithm for Handwritten Documents
    Ryu, Jewoong
    Koo, Hyung Il
    Cho, Nam Ik
    IEEE SIGNAL PROCESSING LETTERS, 2014, 21 (09) : 1115 - 1119