Industry Specific Word Embedding and its Application in Log Classification

Cited by: 10
Authors
Khabiri, Elham [1 ]
Gifford, Wesley M. [1 ]
Vinzamuri, Bhanukiran [1 ]
Patel, Dhaval [1 ]
Mazzoleni, Pietro [2 ]
Affiliations
[1] IBM Res, Yorktown Hts, NY 10598 USA
[2] IBM Corp, Armonk, NY USA
Keywords
natural language processing; word embeddings; text classification
DOI
10.1145/3357384.3357827
CLC number
TP301 [Theory and Methods]
Discipline code
081202
Abstract
Word, sentence, and document embeddings have become the cornerstone of most natural language processing solutions. Training an effective embedding depends on a large corpus of relevant documents. However, such a corpus is not always available, especially for specialized heavy industries such as oil, mining, or steel. To address this problem, this paper proposes a semi-supervised learning framework that creates a document corpus and embedding starting from an industry taxonomy, along with a very limited set of relevant positive and negative documents. Our solution organizes candidate documents into a graph and adopts different explore-and-exploit strategies to iteratively create the corpus and its embedding. At each iteration, two metrics, called Coverage and Context Similarity, are used as proxies to measure the quality of the results. Our experiments demonstrate that an embedding created by our solution is more effective than one created by processing thousands of industry-specific document pages. We also explore using our embedding in downstream tasks, such as building an industry-specific classification model given labeled training data, as well as classifying unlabeled documents according to industry taxonomy terms.
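The abstract describes an iterative loop that grows a corpus from a few seed documents, exploiting similarity to the current corpus and tracking a Coverage metric against the taxonomy. The following is only a minimal illustrative sketch of that loop, not the authors' implementation: the bag-of-words cosine "embedding", the centroid-based exploit step, and all function names (`build_corpus`, `coverage`, etc.) are assumptions introduced here for clarity.

```python
# Hypothetical sketch of an explore/exploit corpus-building loop in the
# spirit of the abstract. A toy bag-of-words vector stands in for a real
# document embedding.
from collections import Counter
import math

def vectorize(doc):
    """Toy stand-in for a document embedding: a bag-of-words Counter."""
    return Counter(doc.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse bag-of-words vectors."""
    dot = sum(u[t] * v[t] for t in u)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def coverage(corpus_vecs, taxonomy_terms):
    """Fraction of taxonomy terms that appear somewhere in the corpus."""
    seen = set()
    for v in corpus_vecs:
        seen.update(t for t in taxonomy_terms if t in v)
    return len(seen) / len(taxonomy_terms)

def build_corpus(seed_docs, candidates, taxonomy_terms, rounds=3, exploit_k=1):
    """Grow the corpus iteratively: each round, 'exploit' by adding the
    candidate most similar to the centroid of the corpus so far."""
    corpus = list(seed_docs)
    candidates = list(candidates)
    for _ in range(rounds):
        if not candidates:
            break
        centroid = Counter()
        for d in corpus:
            centroid.update(vectorize(d))
        # exploit: pick the candidate(s) closest to the current corpus
        candidates.sort(key=lambda d: cosine(vectorize(d), centroid),
                        reverse=True)
        corpus.extend(candidates[:exploit_k])
        candidates = candidates[exploit_k:]
    vecs = [vectorize(d) for d in corpus]
    return corpus, coverage(vecs, taxonomy_terms)
```

For example, seeding with one steel-industry sentence and two candidates pulls the on-topic candidate into the corpus first, and Coverage reports how much of the taxonomy the resulting corpus spans. A real explore step (occasionally adding dissimilar documents to widen coverage) would sit alongside the exploit step; it is omitted here for brevity.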
Pages: 2713-2721 (9 pages)
Related Papers (50 total)
  • [1] Learning Sentiment-Specific Word Embedding for Twitter Sentiment Classification
    Tang, Duyu
    Wei, Furu
    Yang, Nan
    Zhou, Ming
    Liu, Ting
    Qin, Bing
    PROCEEDINGS OF THE 52ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1, 2014, : 1555 - 1565
  • [2] Dual-Clustering Maximum Entropy with Application to Classification and Word Embedding
    Wang, Xiaolong
    Wang, Jingjing
    Zhai, Chengxiang
    THIRTY-FIRST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 3323 - 3329
  • [3] Malware Classification with Word Embedding Features
    Kale, Aparna Sunil
    Di Troia, Fabio
    Stamp, Mark
    ICISSP: PROCEEDINGS OF THE 7TH INTERNATIONAL CONFERENCE ON INFORMATION SYSTEMS SECURITY AND PRIVACY, 2021, : 733 - 742
  • [4] Classification of Taxonomical Relationship by Word Embedding
    Omine, Kazuki
    Paik, Incheon
    2018 IEEE INTERNATIONAL CONFERENCE ON COGNITIVE COMPUTING (ICCC), 2018, : 122 - 125
  • [5] Improving Text Classification with Word Embedding
    Ge, Lihao
    Moh, Teng-Sheng
    2017 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2017, : 1796 - 1805
  • [6] Study on the Chinese Word Semantic Relation Classification with Word Embedding
    Shijia, E.
    Jia, Shengbin
    Xiang, Yang
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, NLPCC 2017, 2018, 10619 : 849 - 855
  • [7] Document Sentiment Classification based on the Word Embedding
    Yin, Yanping
    Jin, Zhong
    PROCEEDINGS OF THE 4TH INTERNATIONAL CONFERENCE ON MECHATRONICS, MATERIALS, CHEMISTRY AND COMPUTER ENGINEERING 2015 (ICMMCCE 2015), 2015, 39 : 456 - 461
  • [8] Automated Patent Classification Using Word Embedding
    Grawe, Mattyws F.
    Martins, Claudia A.
    Bonfante, Andreia G.
    2017 16TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA), 2017, : 408 - 411
  • [9] Citation Intent Classification Using Word Embedding
    Roman, Muhammad
    Shahid, Abdul
    Khan, Shafiullah
    Koubaa, Anis
    Yu, Lisu
    IEEE ACCESS, 2021, 9 : 9982 - 9995
  • [10] Topic Classification Based on Improved Word Embedding
    Sheng, Liangliang
    Xu, Lizhen
    2017 14TH WEB INFORMATION SYSTEMS AND APPLICATIONS CONFERENCE (WISA 2017), 2017, : 117 - 121