Identification of Cybersecurity Specific Content Using the Doc2Vec Language Model

Cited by: 13
Authors
Mendsaikhan, Otgonpurev [1]
Hasegawa, Hirokazu [2]
Yamaguchi, Yukiko [3]
Shimada, Hajime [3]
Affiliations
[1] Nagoya Univ, Grad Sch Informat, Nagoya, Aichi, Japan
[2] Nagoya Univ, Informat Strategy Off, Nagoya, Aichi, Japan
[3] Nagoya Univ, Informat Technol Ctr, Nagoya, Aichi, Japan
Source
2019 IEEE 43RD ANNUAL COMPUTER SOFTWARE AND APPLICATIONS CONFERENCE (COMPSAC), VOL 1 | 2019
Keywords
text mining; cyber threat; document embedding; doc2vec
DOI
10.1109/COMPSAC.2019.00064
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
It has become more challenging for security analysts to identify cyber-threat-related content on the Internet because of the vast amount of publicly available digital text. In this research, we propose building an autonomous system for extracting cyber threat information from publicly available information sources. We tested a neural embedding method called doc2vec as a natural language filter for the proposed system. With cybersecurity-specific training data and custom preprocessing, we trained a doc2vec model and evaluated its performance. According to our evaluation, the natural language filter identified cybersecurity-specific natural language text with 83% accuracy.
Pages: 396-401
Page count: 6
Related papers
50 in total
  • [21] Chinese Text Keyword Extraction Based on Doc2vec And TextRank
    Wang, Wei
    Li, Xiangshun
    Yu, Sheng
    PROCEEDINGS OF THE 32ND 2020 CHINESE CONTROL AND DECISION CONFERENCE (CCDC 2020), 2020, : 369 - 373
  • [22] Vehicle Trajectory Clustering in Urban Road Network Environment Based on Doc2Vec Model
    Kang, Jun
    Ma, Haosen
    Duan, Zongtao
    He, Haojian
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [23] Sentiment Analysis on Chinese Hotel Reviews with Doc2Vec and Classifiers
    Shuai, Qianjun
    Huang, Yamei
    Jin, Libiao
    Pang, Long
    PROCEEDINGS OF 2018 IEEE 3RD ADVANCED INFORMATION TECHNOLOGY, ELECTRONIC AND AUTOMATION CONTROL CONFERENCE (IAEAC 2018), 2018, : 1171 - 1174
  • [24] Unsupervised News Topic Modelling with Doc2Vec and Spherical Clustering
    Budiarto, Arif
    Rahutomo, Reza
    Putra, Hendra Novyantara
    Cenggoro, Tjeng Wawan
    Kacamarga, Muhamad Fitra
    Pardamean, Bens
    5TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND COMPUTATIONAL INTELLIGENCE 2020, 2021, 179 : 40 - 46
  • [25] Automated Scoring of Interview Videos using Doc2Vec Multimodal Feature Extraction Paradigm
    Chen, Lei
    Feng, Gary
    Leong, Chee Wee
    Lehman, Blair
    Martin-Raugh, Michelle
    Kell, Harrison
    Lee, Chong Min
    Yoon, Su-Youn
    ICMI'16: PROCEEDINGS OF THE 18TH ACM INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, 2016, : 161 - 168
  • [26] Long-term Performance of a Generic Intrusion Detection Method Using Doc2vec
    Mimura, Mamoru
    Tanaka, Hidema
    2017 FIFTH INTERNATIONAL SYMPOSIUM ON COMPUTING AND NETWORKING (CANDAR), 2017, : 456 - 462
  • [27] Automated Functional Dependency Detection Between Test Cases Using Doc2Vec and Clustering
    Tahvili, Sahar
    Hatvani, Leo
    Felderer, Michael
    Afzal, Wasif
    Bohlin, Markus
    2019 IEEE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE TESTING (AITEST), 2019, : 19 - 26
  • [28] Retrieval of Semantically Similar Philippine Supreme Court Case Decisions using Doc2Vec
    Barco Ranera, Lorenz Timothy
    Solano, Geoffrey A.
    Oco, Nathaniel
    2019 INTERNATIONAL SYMPOSIUM ON MULTIMEDIA AND COMMUNICATION TECHNOLOGY (ISMAC), 2019,
  • [29] Determining the Similarity of Chinese Patents Using Doc2Vec
    张海超
    赵良伟
    情报工程 (Technology Intelligence Engineering), 2018, 4 (02) : 64 - 72
  • [30] Specialists, Scientists, and Sentiments: Word2Vec and Doc2Vec in Analysis of Scientific and Medical Texts
    Chen Q.
    Sokolova M.
    SN Computer Science, 2021, 2 (5)