Improving Gender Identification in Movie Audio using Cross-Domain Data

Cited by: 9
Authors
Hebbar, Rajat [1]
Somandepalli, Krishna [1]
Narayanan, Shrikanth [1]
Affiliations
[1] Univ Southern Calif, Signal Anal & Interpretat Lab, Dept Elect Engn, Los Angeles, CA 90007 USA
Keywords
gender identification; voice activity detection; deep neural networks; recurrent neural networks; transfer learning; bi-directional long short-term memory; RECOGNITION; SPEECH
DOI
10.21437/Interspeech.2018-1462
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Gender identification from audio is an important task for quantitative gender analysis in multimedia and for improving tasks such as speech recognition. Robust gender identification requires speech segmentation that relies on accurate voice activity detection (VAD). These tasks are challenging in movie audio due to diverse and often noisy acoustic conditions. In this work, we acquire VAD labels for movie audio by aligning it with subtitle text, and train a recurrent neural network model for VAD. Subsequently, we apply transfer learning to predict gender using feature embeddings obtained from a model pre-trained for large-scale audio classification. To account for the diverse acoustic conditions in movie audio, we use audio clips from YouTube labeled for gender. We compare the performance of our proposed method with baseline experiments that were set up to assess the importance of the feature embeddings and the training data used for the gender identification task. For systematic evaluation, we extend an existing benchmark dataset for movie VAD to include precise gender labels. The VAD system shows results comparable to the state of the art in the movie domain. The proposed gender identification system outperforms existing baselines, achieving an accuracy of 85% for movie audio. We have made the data and related code publicly available.
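The abstract mentions deriving VAD labels from subtitles but gives no details of the alignment, frame size, or label format. As a rough illustration only, and not the authors' procedure, the sketch below assumes the subtitle timestamps have already been aligned to the audio and marks each fixed-length frame as speech when any subtitle interval overlaps it. The function name, the 10 ms frame length, and the millisecond interval format are all hypothetical choices made for this example.

```python
def subtitle_intervals_to_vad_labels(intervals_ms, duration_ms, frame_ms=10):
    """Turn aligned subtitle intervals into per-frame speech labels.

    intervals_ms: list of (start_ms, end_ms) integer pairs (assumed input format).
    duration_ms:  total audio length in milliseconds.
    frame_ms:     frame length in milliseconds (assumed value, not from the paper).
    Returns a list with one 0/1 label per full frame; 1 means "speech".
    """
    n_frames = duration_ms // frame_ms
    labels = [0] * n_frames
    for start_ms, end_ms in intervals_ms:
        # First frame touched by the interval, clipped to the audio start.
        first = max(0, start_ms // frame_ms)
        # One past the last frame touched (ceiling division), clipped to the end.
        last = min(n_frames, -(-end_ms // frame_ms))
        for i in range(first, last):
            labels[i] = 1
    return labels


if __name__ == "__main__":
    # A 100 ms clip with one subtitle spanning 20 ms to 50 ms.
    print(subtitle_intervals_to_vad_labels([(20, 50)], 100))
```

Labels obtained this way are only as accurate as the subtitle timing: subtitles often stay on screen through short pauses, so some frames marked as speech will in fact be silence or background sound.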
Pages: 282-286
Page count: 5