Identification of Cybersecurity Specific Content Using the Doc2Vec Language Model

Cited by: 13

Authors:

Mendsaikhan, Otgonpurev [1]

Hasegawa, Hirokazu [2]

Yamaguchi, Yukiko [3]

Shimada, Hajime [3]

Affiliations:

[1] Nagoya Univ, Grad Sch Informat, Nagoya, Aichi, Japan

[2] Nagoya Univ, Informat Strategy Off, Nagoya, Aichi, Japan

[3] Nagoya Univ, Informat Technol Ctr, Nagoya, Aichi, Japan

Source:

2019 IEEE 43RD ANNUAL COMPUTER SOFTWARE AND APPLICATIONS CONFERENCE (COMPSAC), VOL 1 | 2019

Keywords:

text mining; cyber threat; document embedding; doc2vec;

DOI:

10.1109/COMPSAC.2019.00064

Chinese Library Classification number:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

It has become more challenging for security analysts to identify cyber-threat-related content on the Internet because of the vast amount of publicly available digital text. In this research, we propose an autonomous system for extracting cyber threat information from publicly available information sources. We tested a neural embedding method called doc2vec as a natural language filter for the proposed system. With cybersecurity-specific training data and custom preprocessing, we trained a doc2vec model and evaluated its performance. According to our evaluation, the natural language filter identified cybersecurity-specific natural language text with 83% accuracy.

Pages: 396-401

Number of pages: 6

50 records in total

[21] Chinese Text Keyword Extraction Based on Doc2vec And TextRank
Wang, Wei
Li, Xiangshun
Yu, Sheng
PROCEEDINGS OF THE 32ND 2020 CHINESE CONTROL AND DECISION CONFERENCE (CCDC 2020), 2020, : 369 - 373
[22] Vehicle Trajectory Clustering in Urban Road Network Environment Based on Doc2Vec Model
Kang, Jun
Ma, Haosen
Duan, Zongtao
He, Haojian
2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
[23] Sentiment Analysis on Chinese Hotel Reviews with Doc2Vec and Classifiers
Shuai, Qianjun
Huang, Yamei
Jin, Libiao
Pang, Long
PROCEEDINGS OF 2018 IEEE 3RD ADVANCED INFORMATION TECHNOLOGY, ELECTRONIC AND AUTOMATION CONTROL CONFERENCE (IAEAC 2018), 2018, : 1171 - 1174
[24] Unsupervised News Topic Modelling with Doc2Vec and Spherical Clustering
Budiarto, Arif
Rahutomo, Reza
Putra, Hendra Novyantara
Cenggoro, Tjeng Wawan
Kacamarga, Muhamad Fitra
Pardamean, Bens
5TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND COMPUTATIONAL INTELLIGENCE 2020, 2021, 179 : 40 - 46
[25] Automated Scoring of Interview Videos using Doc2Vec Multimodal Feature Extraction Paradigm
Chen, Lei
Feng, Gary
Leong, Chee Wee
Lehman, Blair
Martin-Raugh, Michelle
Kell, Harrison
Lee, Chong Min
Yoon, Su-Youn
ICMI'16: PROCEEDINGS OF THE 18TH ACM INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, 2016, : 161 - 168
[26] Long-term Performance of a Generic Intrusion Detection Method Using Doc2vec
Mimura, Mamoru
Tanaka, Hidema
2017 FIFTH INTERNATIONAL SYMPOSIUM ON COMPUTING AND NETWORKING (CANDAR), 2017, : 456 - 462
[27] Automated Functional Dependency Detection Between Test Cases Using Doc2Vec and Clustering
Tahvili, Sahar
Hatvani, Leo
Felderer, Michael
Afzal, Wasif
Bohlin, Markus
2019 IEEE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE TESTING (AITEST), 2019, : 19 - 26
[28] Retrieval of Semantically Similar Philippine Supreme Court Case Decisions using Doc2Vec
Barco Ranera, Lorenz Timothy
Solano, Geoffrey A.
Oco, Nathaniel
2019 INTERNATIONAL SYMPOSIUM ON MULTIMEDIA AND COMMUNICATION TECHNOLOGY (ISMAC), 2019,
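[29] Determining the Similarity of Chinese Patents Using Doc2Vec (利用Doc2Vec判断中文专利相似性)
Zhang, Haichao (张海超)
Zhao, Liangwei (赵良伟)
情报工程 (Technology Intelligence Engineering), 2018, 4 (02) : 64 - 72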
[30] Specialists, Scientists, and Sentiments: Word2Vec and Doc2Vec in Analysis of Scientific and Medical Texts
Chen Q.
Sokolova M.
SN Computer Science, 2021, 2 (5)