Identification of Cybersecurity Specific Content Using the Doc2Vec Language Model

Cited by: 13
Authors
Mendsaikhan, Otgonpurev [1]
Hasegawa, Hirokazu [2]
Yamaguchi, Yukiko [3]
Shimada, Hajime [3]
Affiliations
[1] Nagoya Univ, Grad Sch Informat, Nagoya, Aichi, Japan
[2] Nagoya Univ, Informat Strategy Off, Nagoya, Aichi, Japan
[3] Nagoya Univ, Informat Technol Ctr, Nagoya, Aichi, Japan
Keywords
text mining; cyber threat; document embedding; doc2vec
DOI
10.1109/COMPSAC.2019.00064
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
The vast amount of publicly available digital text has made it more challenging for security analysts to identify cyber-threat-related content on the Internet. In this research, we propose building an autonomous system for extracting cyber threat information from publicly available information sources. We tested a neural embedding method called doc2vec as a natural language filter for the proposed system. Using cybersecurity-specific training data and custom preprocessing, we trained a doc2vec model and evaluated its performance. According to our evaluation, the natural language filter identified cybersecurity-specific natural language text with 83% accuracy.
Pages: 396-401
Page count: 6
Related papers
  • [1] Topic recommendation using Doc2Vec
    Karvelis, Petros
    Gavrilis, Dimitris
    Georgoulas, George
    Stylios, Chrysostomos
    2018 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2018
  • [2] Bangla news recommendation using doc2vec
    Nandi, Rabindra Nath
    Zaman, M. M. Arefin
    Al Muntasir, Tareq
    Sumit, Sakhawat Hosain
    Sourov, Tanvir
    Rahman, Md. Jamil-Ur
    2018 INTERNATIONAL CONFERENCE ON BANGLA SPEECH AND LANGUAGE PROCESSING (ICBSLP), 2018
  • [3] Poem Generation using Transformers and Doc2Vec Embeddings
    Santillan, Marvin C.
    Azcarraga, Arnulfo P.
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020
  • [4] Deep Learning Based Classification Using Academic Studies in Doc2Vec Model
    Safali, Yasar
    Nergiz, Gozde
    Avaroglu, Erdinc
    Dogan, Emre
    2019 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND DATA PROCESSING (IDAP 2019), 2019
  • [5] Who is the Ringleader? Modelling Influence in Discourse using Doc2Vec
    Vyas, Priyank
    Smith, Tony
    Feldman, Philip
    Dant, Aaron
    Calude, Andreea
    Patros, Panos
    2021 IEEE INTERNATIONAL CONFERENCE ON AUTONOMIC COMPUTING AND SELF-ORGANIZING SYSTEMS COMPANION (ACSOS-C 2021), 2021, : 299 - 300
  • [6] Semantic Detection of Targeted Attacks Using DOC2VEC Embedding
    El-Rahmany, Mariam S.
    Mohamed, Ensaf Hussein
    Haggag, Mohamed H.
    JOURNAL OF COMMUNICATIONS SOFTWARE AND SYSTEMS, 2021, 17 (04) : 334 - 341
  • [7] Micro-blog sentiment classification using Doc2vec
    Liang, Yinghong
    Liu, Haitao
    Zhang, Su
    JOURNAL OF ENGINEERING-JOE, 2020, 2020 (13) : 407 - 410
  • [8] An Approach to Estimating Cited Sentences in Academic Papers Using Doc2vec
    Tanabe, Shunsuke
    Ohta, Manabu
    Takasu, Atsuhiro
    Adachi, Jun
    PROCEEDINGS OF THE 10TH INTERNATIONAL CONFERENCE ON MANAGEMENT OF DIGITAL ECOSYSTEMS (MEDES'18), 2018, : 118 - 125
  • [9] Classification of Customer Demands by Using Doc2Vec Feature Extraction Method
    Arslan, Halil
    Kaynar, Oguz
    Sahin, Sumeyye
    2019 27TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2019
  • [10] Using Collaborative Filtering Algorithms Combined with Doc2Vec for Movie Recommendation
    Liu, Gaojun
    Wu, Xingyu
    PROCEEDINGS OF 2019 IEEE 3RD INFORMATION TECHNOLOGY, NETWORKING, ELECTRONIC AND AUTOMATION CONTROL CONFERENCE (ITNEC 2019), 2019, : 1461 - 1464