Sense disambiguation for Punjabi language using supervised machine learning techniques

Cited by: 2
Authors
Singh, Varinder Pal [1]
Kumar, Parteek [1]
Affiliation
[1] Thapar Inst Engn & Technol, Comp Sci & Engn Dept, Patiala 147004, Punjab, India
Keywords
Lexical features; syntactic features; word embedding; supervised learning techniques; word sense disambiguation
DOI
10.1007/s12046-019-1206-x
Chinese Library Classification (CLC) number
T [Industrial Technology]
Discipline classification code
08
Abstract
Word Sense Disambiguation (WSD) is the automatic identification of the meaning of a word in its context. It is a vital and hard artificial intelligence problem that arises in several natural language processing applications, such as machine translation, question answering and information retrieval. In this paper, an explicit WSD system for the Punjabi language using supervised techniques is analysed. A sense-tagged corpus of 150 ambiguous Punjabi nouns has been manually prepared. Six supervised machine learning techniques are investigated: Decision List, Decision Tree, Naive Bayes, K-Nearest Neighbour (K-NN), Random Forest and Support Vector Machines (SVM). Every classifier uses the same feature space, comprising count-based lexical (unigram, bigram, collocation and co-occurrence) and syntactic (part-of-speech) features. Semantic features of the Punjabi language are derived from unlabelled Punjabi Wikipedia text using the word2vec continuous bag-of-words and skip-gram shallow neural network models. Two deep learning neural network classifiers, multilayer perceptron and long short-term memory (LSTM), are also applied to WSD of Punjabi words. The word embedding features are also tested with the six classifiers on the Punjabi WSD task. The performance of the supervised classifiers on the Punjabi WSD task improves when word embedding features are used. An accuracy of 84% is achieved by the LSTM classifier using word embedding features.
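The count-based lexical features and one of the listed classifiers can be illustrated with a minimal sketch. This is not the authors' implementation: the toy English sentences, the feature names (`w=`, `left=`, `right=`), the window size and the `NaiveBayesWSD` class are all assumptions made for illustration, and only co-occurrence and collocation features are shown.

```python
import math
from collections import Counter, defaultdict


def lexical_features(tokens, target_index, window=2):
    """Co-occurrence and collocation features around the ambiguous word."""
    features = []
    lo = max(0, target_index - window)
    hi = min(len(tokens), target_index + window + 1)
    # Co-occurrence: context words inside the window, position ignored.
    for i in range(lo, hi):
        if i != target_index:
            features.append("w=" + tokens[i])
    # Collocations: immediate left and right neighbours, position kept.
    if target_index > 0:
        features.append("left=" + tokens[target_index - 1])
    if target_index + 1 < len(tokens):
        features.append("right=" + tokens[target_index + 1])
    return features


class NaiveBayesWSD:
    """Multinomial Naive Bayes over feature counts with add-one smoothing."""

    def fit(self, examples):
        # examples: list of (feature_list, sense_label) pairs
        self.sense_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for features, sense in examples:
            self.sense_counts[sense] += 1
            self.feature_counts[sense].update(features)
            self.vocab.update(features)
        return self

    def predict(self, features):
        total = sum(self.sense_counts.values())
        best_sense, best_score = None, float("-inf")
        for sense, count in self.sense_counts.items():
            # Log prior plus smoothed log likelihood of each feature.
            score = math.log(count / total)
            denom = sum(self.feature_counts[sense].values()) + len(self.vocab)
            for f in features:
                score += math.log((self.feature_counts[sense][f] + 1) / denom)
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense


# Hypothetical sense-tagged examples: (tokens, index of "bank", sense label).
TRAIN = [
    ("he sat on the bank of the river".split(), 4, "shore"),
    ("the river bank was muddy".split(), 2, "shore"),
    ("she deposited money in the bank today".split(), 5, "finance"),
    ("the bank approved the loan".split(), 1, "finance"),
]

model = NaiveBayesWSD().fit(
    [(lexical_features(tokens, idx), sense) for tokens, idx, sense in TRAIN]
)

test_tokens = "they walked along the river bank".split()
predicted = model.predict(lexical_features(test_tokens, 5))
print(predicted)
```

A word embedding variant would replace the count features with dense vectors for the context words, which is the change the abstract reports as improving classifier performance.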
Pages: 15