Improving Gender Identification in Movie Audio using Cross-Domain Data

Times cited: 9
Authors
Hebbar, Rajat [1]
Somandepalli, Krishna [1]
Narayanan, Shrikanth [1]
Affiliation
[1] University of Southern California, Signal Analysis and Interpretation Laboratory, Department of Electrical Engineering, Los Angeles, CA 90007, USA
Keywords
gender identification; voice activity detection; deep neural networks; recurrent neural networks; transfer learning; bi-directional long short-term memory; recognition; speech
DOI
10.21437/Interspeech.2018-1462
CLC classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Gender identification from audio is an important task, both for quantitative gender analysis in multimedia and for improving tasks such as speech recognition. Robust gender identification requires speech segmentation, which in turn relies on accurate voice activity detection (VAD). Both tasks are challenging in movie audio because of its diverse and often noisy acoustic conditions. In this work, we obtain VAD labels for movie audio by aligning it with subtitle text, and train a recurrent neural network model for VAD. We then apply transfer learning to predict gender using feature embeddings obtained from a model pre-trained for large-scale audio classification. To account for the diverse acoustic conditions in movie audio, we use audio clips from YouTube labeled for gender. We compare the proposed method with baseline experiments set up to assess the importance of the feature embeddings and of the training data used for the gender identification task. For systematic evaluation, we extend an existing benchmark dataset for movie VAD to include precise gender labels. The VAD system achieves results comparable to the state of the art in the movie domain. The proposed gender identification system outperforms the existing baselines, achieving an accuracy of 85% on movie audio. We have made the data and related code publicly available (1).
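The subtitle-based labelling step described in the abstract can be illustrated with a minimal sketch. It assumes the subtitle cues have already been parsed into (start, end) times in seconds and marks every analysis frame whose centre falls inside a cue as speech. The function name `subtitle_vad_labels` and the 10 ms default frame hop are hypothetical choices for illustration, not details taken from the paper, and the authors' actual alignment procedure may be more involved.

```python
import math
from typing import List, Tuple


def subtitle_vad_labels(
    cues: List[Tuple[float, float]],
    duration: float,
    hop: float = 0.01,  # hypothetical 10 ms frame hop; not taken from the paper
) -> List[int]:
    """Return one binary VAD label (1 = speech, 0 = non-speech) per frame.

    Assumption (illustrative only): each subtitle cue is a (start, end) pair
    in seconds, and a frame counts as speech when its centre time lies in
    the half-open interval [start, end).
    """
    n_frames = int(round(duration / hop))
    labels = [0] * n_frames
    for start, end in cues:
        # Frame i is centred at (i + 0.5) * hop; solve for the first and
        # one-past-last frame indices whose centres fall inside the cue,
        # clipping to the valid frame range of the clip.
        first = max(0, math.ceil(start / hop - 0.5))
        last = min(n_frames, math.ceil(end / hop - 0.5))
        for i in range(first, last):
            labels[i] = 1  # overlapping cues simply re-mark the same frames
    return labels


if __name__ == "__main__":
    # Two cues in a 1-second clip, using 100 ms frames for readability.
    print(subtitle_vad_labels([(0.2, 0.5), (0.7, 0.8)], duration=1.0, hop=0.1))
```

Run as a script, the example prints `[0, 0, 1, 1, 1, 0, 0, 1, 0, 0]`. Subtitle timings are usually only approximately aligned with the speech itself, so frame labels built this way should be treated as noisy.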
Pages: 282-286
Number of pages: 5