A Language-Independent Hybrid Approach for Multi-Word Expression Extraction

Citations: 0
Authors
Liang, Yinghong [1]
Tan, Hongye [2]
Li, Hui [1]
Wang, Zhigang [1]
Gui, Wenming [1]
Affiliations
[1] Jingling Inst Technol, Dept Software Engn, Nanjing, Jiangsu, Peoples R China
[2] Shanxi Univ, Dept Comp & Informat Technol, Taiyuan, Shanxi, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Multi-Word Expression; Bi-LSTM; Language-Independent
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Failing to identify multi-word expressions (MWEs) may cause serious problems for many natural language processing (NLP) tasks. Previous approaches depend heavily on language-specific knowledge and pre-existing NLP tools. However, many languages, including Chinese, have fewer such resources and tools than English. An approach that automatically learns effective features from a corpus, without relying on language-specific resources, is therefore needed. In this paper, we develop a hybrid approach that combines bidirectional long short-term memory (Bi-LSTM), word correlation degree calculation, and weakly supervised K-means clustering to capture both the sequence information and the correlation degree of phrases in specific contexts, and we use them to train a multi-word expression detector for multiple languages without any manually encoded features. Experimental results show that this hybrid approach extracts Chinese and English multi-word expressions better than the baseline algorithm, which verifies that the approach is effective.
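The abstract does not say how the word correlation degree is computed. As a rough illustration only, the sketch below uses pointwise mutual information (PMI) over adjacent word pairs, a common association measure that is swapped in here as an assumption and is not the authors' stated formula. The function name `pmi_scores` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def pmi_scores(sentences):
    """Return PMI (base 2) for every adjacent word pair in tokenized sentences.

    PMI is an assumed stand-in for the paper's "word correlation degree".
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        # Adjacent pairs within a sentence only; no pairs span sentences.
        bigrams.update(zip(tokens, tokens[1:]))

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (w1, w2), count in bigrams.items():
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


# Hypothetical toy corpus, already tokenized.
corpus = [
    ["natural", "language", "processing", "is", "fun"],
    ["natural", "language", "tools", "are", "useful"],
    ["the", "language", "is", "old"],
]
scores = pmi_scores(corpus)
```

On this toy corpus the pair ("natural", "language") scores higher than ("language", "is"), which is the kind of signal a candidate filter for multi-word expressions could use. Because the measure needs only token counts, it requires no language-specific resources.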
Pages: 3273 - 3279
Page count: 7