Improving Gender Identification in Movie Audio using Cross-Domain Data

Cited by: 9
Authors
Hebbar, Rajat [1 ]
Somandepalli, Krishna [1 ]
Narayanan, Shrikanth [1 ]
Affiliations
[1] Univ Southern Calif, Signal Anal & Interpretat Lab, Dept Elect Engn, Los Angeles, CA 90007 USA
Keywords
gender identification; voice activity detection; deep neural networks; recurrent neural networks; transfer learning; bi-directional long short-term memory; RECOGNITION; SPEECH;
DOI
10.21437/Interspeech.2018-1462
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Gender identification from audio is an important task for quantitative gender analysis in multimedia and for improving tasks such as speech recognition. Robust gender identification requires speech segmentation that relies on accurate voice activity detection (VAD). These tasks are challenging in movie audio due to diverse and often noisy acoustic conditions. In this work, we acquire VAD labels for movie audio by aligning it with subtitle text, and train a recurrent neural network model for VAD. Subsequently, we apply transfer learning to predict gender using feature embeddings obtained from a model pre-trained for large-scale audio classification. To account for the diverse acoustic conditions in movie audio, we use audio clips from YouTube labeled for gender. We compare the performance of our proposed method with baseline experiments that were set up to assess the importance of the feature embeddings and training data used for the gender identification task. For systematic evaluation, we extend an existing benchmark dataset for movie VAD to include precise gender labels. The VAD system shows results comparable to the state of the art in the movie domain. The proposed gender identification system outperforms existing baselines, achieving an accuracy of 85% for movie audio. We have made the data and related code publicly available.
Pages: 282-286
Page count: 5
Related papers
50 in total
  • [41] IMPROVING CROSS-DOMAIN SLOT FILLING WITH COMMON SYNTACTIC STRUCTURE
    Bu, Luchen
    Lin, Xixun
    Zhang, Peng
    Wang, Bin
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 7638 - 7642
  • [42] Cross-Domain Data Traceability Mechanism Based on Blockchain
    Zhao, Shoucai
    Cao, Lifeng
    Li, Jinhui
    Wan, Jiling
    Bai, Jinlong
    CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 76 (02): : 2531 - 2549
  • [43] Identifying intentions in forum posts with cross-domain data
    Tu Minh Phuong
    Le Cong Linh
    Ngo Xuan Bach
    Journal of Heuristics, 2022, 28 : 171 - 192
  • [44] ADDRESSING UNCERTAINTY AND CONFLICTS IN CROSS-DOMAIN DATA PROVENANCE
    Moitra, Abha
    Barnett, Bruce
    Crapo, Andrew
    Dill, Stephen J.
    MILITARY COMMUNICATIONS CONFERENCE, 2010 (MILCOM 2010), 2010, : 912 - 917
  • [45] Data Loss Prevention for Cross-Domain Instant Messaging
    Kongsgard, Kyrre Wahl
    Nordbotten, Nils Agne
    Mancini, Federico
    Engelstad, Paal E.
    2017 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (SSCI), 2017, : 3565 - 3572
  • [46] Data Augmentation for Cross-Domain Named Entity Recognition
    Chen, Shuguang
    Aguilar, Gustavo
    Neves, Leonardo
    Solorio, Thamar
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 5346 - 5356
  • [47] Identifying intentions in forum posts with cross-domain data
    Tu Minh Phuong
    Le Cong Linh
    Ngo Xuan Bach
    JOURNAL OF HEURISTICS, 2022, 28 (02) : 171 - 192
  • [48] A Cross-Domain Comparative Study of Big Data Architectures
    Macak, Martin
    Ge, Mouzhi
    Buhnova, Barbora
    INTERNATIONAL JOURNAL OF COOPERATIVE INFORMATION SYSTEMS, 2020, 29 (04)
  • [49] The Research on Key Techniques of Cross-Domain Data Services
    Yin, Xinming
    Jiang, Haiping
    Huang, Haiye
    Bi, Junhao
    Cao, Zhiwei
    PROCEEDINGS OF THE 2017 INTERNATIONAL CONFERENCE ON MECHANICAL, ELECTRONIC, CONTROL AND AUTOMATION ENGINEERING (MECAE 2017), 2017, 61 : 398 - 402
  • [50] On the Impact of Cross-Domain Data on German Language Models
    Dada, Amin
    Chen, Aokun
    Peng, Cheng
    Smith, Kaleb E.
    Idrissi-Yaghir, Ahmad
    Seibold, Constantin Marc
    Li, Jianning
    Heiliger, Lars
    Friedrich, Christoph M.
    Truhn, Daniel
    Egger, Jan
    Bian, Jiang
    Kleesiek, Jens
    Wu, Yonghui
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023), 2023, : 13801 - 13813