Cyberbullying detection in social media text based on character-level convolutional neural network with shortcuts

Cited by: 35
Authors
Lu, Nijia [1]
Wu, Guohua [1]
Zhang, Zhen [1,4]
Zheng, Yitao [1]
Ren, Yizhi [1]
Choo, Kim-Kwang Raymond [2,3]
Affiliations
[1] Hangzhou Dianzi Univ, Sch Cyberspace, Hangzhou, Zhejiang, Peoples R China
[2] Univ Texas San Antonio, Dept Informat Syst & Cyber Secur, San Antonio, TX USA
[3] Univ Texas San Antonio, Dept Elect & Comp Engn, San Antonio, TX USA
[4] 1158, 2 St, Baiyang St, Hangzhou 310018, Zhejiang, Peoples R China
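Source
Funding
National Natural Science Foundation of China;
Keywords
convolutional neural networks; cyberbullying detection; social network; text classification;
DOI
10.1002/cpe.5627
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Subject classification code
081202; 0835;
Abstract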
As people spend increasingly more time on social networks, cyberbullying has become a social problem that machine learning methods are needed to address. Our research focuses on textual cyberbullying detection because text is the most common form of social media content. However, social media text is short, noisy, and unstructured, often containing misspellings and non-standard symbols, and this degrades the performance of traditional machine learning methods that rely on vocabulary knowledge. We therefore propose Char-CNNS (Character-level Convolutional Neural Network with Shortcuts), a model that identifies whether a social media text contains cyberbullying. We use characters as the smallest unit of learning, enabling the model to overcome spelling errors and intentional obfuscation in real-world corpora. Shortcuts are used to stitch together features from different levels so that the model learns finer-grained bullying signals, and a focal loss function is adopted to address the class imbalance problem. We also provide a new Chinese Weibo comment dataset built specifically for cyberbullying detection, and we perform experiments on both this Chinese Weibo dataset and an English Tweet dataset. The experimental results show that our approach is competitive with state-of-the-art techniques on the cyberbullying detection task.
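The abstract states that Char-CNNS treats characters as the smallest unit of learning. The following is a minimal sketch of one common way to turn text into character indices and one-hot rows, as earlier character-level CNNs did. The alphabet `ALPHABET`, the length `max_len`, and the helpers `build_index`, `quantize`, and `one_hot` are hypothetical. The paper's actual alphabet and input length, and its handling of Chinese characters (which would need a character vocabulary of their own), are not given in this record.

```python
from typing import Dict, List

# HYPOTHETICAL alphabet (lowercase letters, digits, punctuation, space); index 0 is
# reserved for padding and for any character not in the alphabet.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-,;.!?:'\"/\\|_@#$%^&*~`+=<>()[]{} "


def build_index(alphabet: str) -> Dict[str, int]:
    """Map each alphabet character to an index starting at 1."""
    return {ch: i + 1 for i, ch in enumerate(alphabet)}


def quantize(text: str, index: Dict[str, int], max_len: int) -> List[int]:
    """Lowercase, truncate to max_len, map to indices, and right-pad with 0."""
    ids = [index.get(ch, 0) for ch in text.lower()[:max_len]]
    return ids + [0] * (max_len - len(ids))


def one_hot(ids: List[int], vocab_size: int) -> List[List[int]]:
    """One-hot rows; index 0 (padding/unknown) becomes an all-zero row."""
    return [[1 if j == i and i != 0 else 0 for j in range(vocab_size)] for i in ids]


index = build_index(ALPHABET)
clean = quantize("You are so stupid", index, max_len=20)
obfuscated = quantize("you are so stup1d", index, max_len=20)
changed = sum(a != b for a, b in zip(clean, obfuscated))
print(changed)  # prints 1
```

With this encoding, the obfuscated spelling "stup1d" differs from "stupid" in a single input position. A word-level vocabulary, by contrast, would map it to an unrelated or out-of-vocabulary token.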
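The abstract adopts a focal loss to counter class imbalance. For reference, the standard binary focal loss of Lin et al. (2017) is FL(p_t) = -α_t (1 - p_t)^γ log(p_t), where p_t is the probability the model assigns to the true class. The sketch below implements that formula in plain Python. The defaults γ = 2 and α = 0.25 come from the original focal loss paper, not from Char-CNNS, whose settings are not stated in this record. The function name `binary_focal_loss` is hypothetical.

```python
import math
from typing import Sequence


def binary_focal_loss(
    probs: Sequence[float],
    labels: Sequence[int],
    gamma: float = 2.0,
    alpha: float = 0.25,
    eps: float = 1e-7,
) -> float:
    """Mean binary focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t).

    probs  -- predicted probability of the positive class (e.g. "bullying")
    labels -- 1 for positive, 0 for negative
    """
    if len(probs) != len(labels) or not probs:
        raise ValueError("probs and labels must be non-empty and the same length")
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)            # clamp to avoid log(0)
        p_t = p if y == 1 else 1.0 - p             # probability assigned to the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha # class-balancing weight
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


# Compare an easy and a hard positive example under focal loss vs. plain cross-entropy.
for name, p in (("easy", 0.95), ("hard", 0.30)):
    ce = -math.log(p)
    fl = binary_focal_loss([p], [1])
    print(f"{name}: cross-entropy={ce:.4f} focal={fl:.4f} ratio={fl / ce:.4f}")
```

Setting γ = 0 and α = 0.5 reduces this to half the ordinary binary cross-entropy. Raising γ shrinks the loss on well-classified examples, which in an imbalanced dataset are mostly from the majority class, so training concentrates on the harder ones.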
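The abstract says shortcuts "stitch different levels of features", but this record does not specify how they are wired. One plausible reading, assumed purely for illustration, is that the globally max-pooled outputs of a lower and a higher convolutional layer are concatenated before classification. The plain-Python forward pass below shows that idea with tiny hand-set weights. The helpers `conv1d`, `global_max_pool`, and `forward_with_shortcuts` are hypothetical, and this is not the paper's architecture.

```python
from typing import List

Matrix = List[List[float]]  # indexed as [time_step][channel]


def conv1d(x: Matrix, kernels: List[Matrix]) -> Matrix:
    """Valid 1D convolution (stride 1) followed by ReLU.

    Each kernel is a [width][in_channels] weight matrix producing one output channel.
    """
    width = len(kernels[0])
    in_channels = len(x[0])
    out: Matrix = []
    for t in range(len(x) - width + 1):
        row = []
        for w in kernels:
            s = sum(w[i][c] * x[t + i][c] for i in range(width) for c in range(in_channels))
            row.append(max(0.0, s))  # ReLU
        out.append(row)
    return out


def global_max_pool(x: Matrix) -> List[float]:
    """Max over time for each channel."""
    return [max(column) for column in zip(*x)]


def forward_with_shortcuts(x: Matrix, kernels1: List[Matrix], kernels2: List[Matrix]) -> List[float]:
    # ASSUMPTION: "shortcut" = concatenating pooled features from two conv levels.
    h1 = conv1d(x, kernels1)   # lower-level (short character n-gram) features
    h2 = conv1d(h1, kernels2)  # higher-level features built on top of h1
    return global_max_pool(h1) + global_max_pool(h2)


# Toy input: 5 time steps, 2 input channels (e.g. a two-symbol one-hot alphabet).
x = [[1, 0], [0, 1], [1, 0], [0, 1], [1, 0]]
kernels1 = [[[1, 0], [0, 1]], [[0, 1], [1, 0]]]  # 2 output channels, width 2
kernels2 = [[[1, 1], [1, 1]]]                    # 1 output channel, width 2
features = forward_with_shortcuts(x, kernels1, kernels2)
print(features)
```

The classifier would then see both the low-level and the high-level pooled features side by side, rather than only the output of the deepest layer.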
Pages: 11
Related papers
50 in total
  • [21] Character-Level Convolutional Neural Network for Predicting Severity of Software Vulnerability from Vulnerability Description
    Nakagawa, Shunta
    Nagai, Tatsuya
    Kanehara, Hideaki
    Furumoto, Keisuke
    Takita, Makoto
    Shiraishi, Yoshiaki
    Takahashi, Takeshi
    Mohri, Masami
    Takano, Yasuhiro
    Morii, Masakatu
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2019, E102D (09) : 1679 - 1682
  • [22] Automatic detection of cyberbullying in social media text
    Van Hee, Cynthia
    Jacobs, Gilles
    Emmery, Chris
    Desmet, Bart
    Lefever, Els
    Verhoeven, Ben
    De Pauw, Guy
    Daelemans, Walter
    Hoste, Veronique
    PLOS ONE, 2018, 13 (10)
  • [23] CLEval: Character-Level Evaluation for Text Detection and Recognition Tasks
    Baek, Youngmin
    Nam, Daehyun
    Park, Sungrae
    Lee, Junyeop
    Shin, Seung
    Baek, Jeonghun
    Lee, Chae Young
    Lee, Hwalsuk
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2020), 2020 : 2404 - 2412
  • [24] Sanskrit Word Segmentation Using Character-level Recurrent and Convolutional Neural Networks
    Hellwig, Oliver
    Nehrdich, Sebastian
    2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018), 2018 : 2754 - 2763
  • [25] CECoR-Net: A Character-Level Neural Network Model for Web Attack Detection
    Gong, Xinyu
    Lu, Jialiang
    Wang, Yuchen
    Qiu, Han
    He, Ruan
    Qiu, Meikang
    4TH IEEE INTERNATIONAL CONFERENCE ON SMART CLOUD (SMARTCLOUD 2019) / 3RD INTERNATIONAL SYMPOSIUM ON REINFORCEMENT LEARNING (ISRL 2019), 2019 : 98 - 103
  • [26] Novel Linguistic Steganography Based on Character-Level Text Generation
    Xiang, Lingyun
    Yang, Shuanghui
    Liu, Yuhang
    Li, Qian
    Zhu, Chengzhang
    MATHEMATICS, 2020, 8 (09)
  • [27] Character-level neural network for biomedical named entity recognition
    Gridach, Mourad
    JOURNAL OF BIOMEDICAL INFORMATICS, 2017, 70 : 85 - 91
  • [28] Chinese text classification based on character-level CNN and SVM
    Wu, H.
    Li, D.
    Cheng, M.
    International Journal of Intelligent Information and Database Systems, 2019, 12 (03) : 212 - 228
  • [29] Character-Level Neural Translation for Multilingual Media Monitoring in the SUMMA Project
    Barzdins, Guntis
    Renals, Steve
    Gosko, Didzis
    LREC 2016 - TENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2016 : 1789 - 1793
  • [30] Enhanced character-level deep convolutional neural networks for cardiovascular disease prediction
    Zhang, Zhichang
    Qiu, Yanlong
    Yang, Xiaoli
    Zhang, Minyu
    BMC MEDICAL INFORMATICS AND DECISION MAKING, 2020, 20 (Suppl 3)