Optimized Active Learning Strategy for Audiovisual Speaker Recognition

Cited by: 4
Authors
Karlos, Stamatis [1]
Kaleris, Konstantinos [2]
Fazakis, Nikos [2]
Kanas, Vasileios G. [2]
Kotsiantis, Sotiris [1]
Affiliations
[1] Univ Patras, Dept Math, Rion 26504, Achaia, Greece
[2] Univ Patras, Dept Elect & Engn, Rion 26504, Achaia, Greece
Source
Keywords
Active Learning; Optimized learner; Speaker Recognition; Audiovisual features; Support Vector Machines; Hyperopt package tool; SPEECH; EXTRACTION
DOI
10.1007/978-3-319-99579-3_30
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The purpose of this work is to investigate how much recognition accuracy improves when optimization stages are used to tune the parameters of an Active Learning (AL) classifier. Since plenty of data can be available in Speaker Recognition (SR) tasks, the AL concept, which incorporates human annotators into its learning process to explore hidden insights in unlabeled data, seems highly suitable and does not demand much expertise from the annotator. Six datasets containing the utterances of 8 and 16 speakers under different recording setups are described by audiovisual features and evaluated through the time-efficient Uncertainty Sampling (UncS) query strategy. Both Support Vector Machines (SVMs) and Random Forest (RF) algorithms were tuned over a small subset of the initial training data and then applied iteratively to mine the most suitable instances from a corresponding pool of unlabeled instances. Useful conclusions are drawn concerning the values of the selected parameters, allowing future optimization attempts to be confined to more restricted regions, while remarkable improvement rates were obtained using an ideal annotator.
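The pool-based loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `CentroidClassifier` is a hypothetical stand-in for the tuned SVM or RF learner, the Hyperopt tuning stage is omitted, and the function names, batch size, and number of rounds are assumptions chosen for illustration. Uncertainty Sampling is implemented here in its least-confidence form, and the ideal annotator is simulated by returning the true label of every queried instance.

```python
import math


def least_confident(probabilities, batch_size):
    """Return indices of the pool instances whose top class probability is lowest."""
    ranked = sorted(range(len(probabilities)), key=lambda i: max(probabilities[i]))
    return ranked[:batch_size]


class CentroidClassifier:
    """Stand-in learner (assumption, not the paper's SVM/RF): nearest class centroid."""

    def fit(self, X, y):
        self.classes_ = sorted(set(y))
        self.centroids_ = []
        for c in self.classes_:
            rows = [x for x, label in zip(X, y) if label == c]
            self.centroids_.append([sum(col) / len(rows) for col in zip(*rows)])
        return self

    def predict_proba(self, X):
        # Softmax over negative distances to each class centroid.
        out = []
        for x in X:
            scores = [-math.dist(x, c) for c in self.centroids_]
            top = max(scores)
            exps = [math.exp(s - top) for s in scores]
            total = sum(exps)
            out.append([e / total for e in exps])
        return out

    def predict(self, X):
        return [self.classes_[p.index(max(p))] for p in self.predict_proba(X)]


def active_learning(X_lab, y_lab, X_pool, oracle_labels, rounds, batch_size):
    """Pool-based AL loop: query the least-confident instances, label them, retrain."""
    X_lab, y_lab = list(X_lab), list(y_lab)
    X_pool, oracle_labels = list(X_pool), list(oracle_labels)
    model = CentroidClassifier().fit(X_lab, y_lab)
    for _ in range(rounds):
        if not X_pool:
            break
        picked = least_confident(model.predict_proba(X_pool), batch_size)
        # Pop from the highest index down so earlier indices stay valid.
        for i in sorted(picked, reverse=True):
            X_lab.append(X_pool.pop(i))
            y_lab.append(oracle_labels.pop(i))  # ideal annotator: true label
        model = CentroidClassifier().fit(X_lab, y_lab)
    return model, len(y_lab)
```

In this sketch any learner exposing `fit` and `predict_proba` could replace `CentroidClassifier`, which is where a tuned SVM or RF would slot in.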
Pages: 281-290
Number of pages: 10