Industry Specific Word Embedding and its Application in Log Classification

Cited by: 10
Authors
Khabiri, Elham [1]
Gifford, Wesley M. [1]
Vinzamuri, Bhanukiran [1]
Patel, Dhaval [1]
Mazzoleni, Pietro [2]
Affiliations
[1] IBM Res, Yorktown Hts, NY 10598 USA
[2] IBM Corp, Armonk, NY USA
Keywords
natural language processing; word embeddings; text classification
DOI
10.1145/3357384.3357827
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Word, sentence, and document embeddings have become the cornerstone of most natural language processing-based solutions. Training an effective embedding depends on a large corpus of relevant documents. However, such a corpus is not always available, especially for specialized heavy industries such as oil, mining, or steel. To address this problem, this paper proposes a semi-supervised learning framework that creates a document corpus and embedding starting from an industry taxonomy, along with a very limited set of relevant positive and negative documents. Our solution organizes candidate documents into a graph and adopts different explore and exploit strategies to iteratively create the corpus and its embedding. At each iteration, two metrics, called Coverage and Context Similarity, are used as proxies to measure the quality of the results. Our experiments demonstrate that an embedding created by our solution is more effective than one created by processing thousands of industry-specific document pages. We also explore using our embedding in downstream tasks, such as building an industry-specific classification model given labeled training data, as well as classifying unlabeled documents according to industry taxonomy terms.
Pages: 2713-2721
Number of pages: 9