A convolutional neural network approach for gender and language variety identification

Cited by: 7
Authors
Gomez-Adorno, Helena [1]
Fuentes-Alba, Roddy [2]
Markov, Ilia [3]
Sidorov, Grigori [2]
Gelbukh, Alexander [2]
Affiliations
[1] Univ Nacl Autonoma Mexico, Inst Invest Matemat Aplicadas & Sistemas IIMAS, Mexico City, DF, Mexico
[2] Inst Politecn Nacl, CIC, Mexico City, DF, Mexico
[3] INRIA, Le Chesnay, France
Keywords
Convolutional neural networks; deep learning; author profiling; gender identification; language variety identification; machine learning; character n-grams; Spanish
DOI
10.3233/JIFS-179032
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We present a method for gender and language variety identification using a convolutional neural network (CNN). We compare its performance with that of a traditional machine learning algorithm, support vector machines (SVM), trained on character n-grams (n = 3-8), lexical features (word unigrams and bigrams), and their combinations. We use a single multi-labeled corpus of news articles in different varieties of Spanish, developed specifically for these tasks. We present a convolutional neural network architecture, trained on word- and sentence-level embeddings, that can be successfully applied to gender and language variety identification on a relatively small corpus (fewer than 10,000 documents). Our experiments show that the deep learning approach outperforms the traditional machine learning approach on both tasks when named entities are present in the corpus. However, when all named entities are reduced to a single symbol "NE" to avoid topic-dependent features, the drop in accuracy is larger for the deep learning approach.
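The baseline features named in the abstract (character n-grams with n = 3-8, word unigrams and bigrams, and the "NE" masking step) can be illustrated with a minimal sketch. The record does not describe tokenization, feature weighting, or how named entities were detected, so the whitespace tokenizer, the raw counts, and the hand-supplied entity list below are assumptions, and the function names `char_ngrams`, `word_ngrams`, and `mask_entities` are hypothetical.

```python
from collections import Counter


def char_ngrams(text, n_min=3, n_max=8):
    """Count every character n-gram of length n_min..n_max in the text."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def word_ngrams(text, n_values=(1, 2)):
    """Count word unigrams and bigrams.

    Assumption: lowercasing and whitespace tokenization; the paper's
    actual preprocessing is not given in this record.
    """
    tokens = text.lower().split()
    counts = Counter()
    for n in n_values:
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def mask_entities(text, entities, symbol="NE"):
    """Replace each listed named entity with a single symbol.

    Assumption: the entity list is supplied by hand here; a real
    pipeline would obtain it from a named entity recognizer.
    """
    # Replace longer entities first so substrings do not break them up.
    for entity in sorted(entities, key=len, reverse=True):
        text = text.replace(entity, symbol)
    return text


if __name__ == "__main__":
    sentence = "Madrid gana en Madrid"
    masked = mask_entities(sentence, ["Madrid"])
    print(masked)
    print(word_ngrams(masked).most_common(3))
    print(len(char_ngrams(masked)))
```

The resulting count dictionaries could then be vectorized and passed to any linear classifier such as an SVM; that step is omitted here because the record gives no details about it.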
Pages: 4845-4855
Number of pages: 11