A Language-Independent Hybrid Approach for Multi-Word Expression Extraction

Times cited: 0

Authors:

Liang, Yinghong [1]

Tan, Hongye [2]

Li, Hui [1]

Wang, Zhigang [1]

Gui, Wenming [1]

Affiliations:

[1] Jingling Inst Technol, Dept Software Engn, Nanjing, Jiangsu, Peoples R China

[2] Shanxi Univ, Dept Comp & Informat Technol, Taiyuan, Shanxi, Peoples R China

Source:

2017 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN) | 2017

Funding:

National Natural Science Foundation of China

Keywords:

Multi-Word Expression; Bi-LSTM; Language-Independent

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Failing to identify multi-word expressions (MWEs) can cause serious problems for many Natural Language Processing (NLP) tasks. Previous approaches depend heavily on language-specific knowledge and pre-existing NLP tools. However, many languages, including Chinese, have fewer such resources and tools than English. An approach is therefore needed that automatically learns effective features from a corpus without relying on language-specific resources. In this paper, we develop a hybrid approach that combines a bidirectional long short-term memory (Bi-LSTM) network, word correlation degree calculation, and weakly supervised K-means clustering. Together these capture both the sequence information of phrases and their degree of word correlation in specific contexts. We use these features to train a multi-word expression detector for multiple languages without any manually encoded features. Experimental results show that the hybrid approach extracts Chinese and English multi-word expressions better than the baseline algorithm, which verifies that it is effective.
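The abstract names two non-neural components, a "word correlation degree" and "weakly supervised K-means clustering", but does not define either. The Python sketch below is an illustrative assumption and not the authors' method. It uses pointwise mutual information (PMI) over adjacent word pairs as a stand-in correlation measure. It models "weak supervision" as seeding a two-cluster, one-dimensional K-means with a few hand-labeled pairs. The Bi-LSTM sequence component is omitted. The function names `bigram_pmi` and `seeded_two_means` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def bigram_pmi(sentences):
    """Score every adjacent word pair by pointwise mutual information (PMI).

    Assumption: PMI stands in for the paper's unspecified
    "word correlation degree".
    """
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (w1, w2), count in bigrams.items():
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


def seeded_two_means(scores, mwe_seeds, non_mwe_seeds, iters=20):
    """1-D K-means with k=2 whose centroids start from labeled seed pairs.

    Assumption: this seeding is one possible reading of "weakly
    supervised K-means"; every seed pair must appear in `scores`.
    Returns the set of pairs assigned to the MWE cluster.
    """
    c_mwe = sum(scores[p] for p in mwe_seeds) / len(mwe_seeds)
    c_other = sum(scores[p] for p in non_mwe_seeds) / len(non_mwe_seeds)
    is_mwe = {}
    for _ in range(iters):
        # Assignment step: each pair joins the nearer centroid.
        is_mwe = {p: abs(s - c_mwe) <= abs(s - c_other) for p, s in scores.items()}
        mwe_vals = [scores[p] for p, flag in is_mwe.items() if flag]
        other_vals = [scores[p] for p, flag in is_mwe.items() if not flag]
        # Update step: move each centroid to the mean of its members.
        new_mwe = sum(mwe_vals) / len(mwe_vals) if mwe_vals else c_mwe
        new_other = sum(other_vals) / len(other_vals) if other_vals else c_other
        if (new_mwe, new_other) == (c_mwe, c_other):
            break  # centroids stopped moving, so the assignment is stable
        c_mwe, c_other = new_mwe, new_other
    return {p for p, flag in is_mwe.items() if flag}


# Toy usage on a hypothetical four-sentence corpus.
corpus = [
    ["new", "york", "is", "big"],
    ["i", "love", "new", "york"],
    ["new", "york", "city"],
    ["the", "city", "is", "big"],
]
scores = bigram_pmi(corpus)
mwes = seeded_two_means(scores, [("new", "york")], [("york", "is")])
print(sorted(mwes))
```

On a corpus this small, raw PMI also gives high scores to rare pairs such as ("i", "love"). This is a known weakness of PMI, and it shows why a correlation score alone is unlikely to be a sufficient MWE signal.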


Pages: 3273-3279

Page count: 7
