LOW-RESOURCE CONTEXTUAL TOPIC IDENTIFICATION ON SPEECH

Cited by: 0
Authors
Liu, Chunxi [1]
Wiesner, Matthew [1]
Watanabe, Shinji [1]
Harman, Craig [1]
Trmal, Jan [1, 2]
Dehak, Najim [1]
Khudanpur, Sanjeev [1, 2]
Affiliations
[1] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD 21218 USA
[2] Johns Hopkins Univ, Human Language Technol Ctr Excellence, Baltimore, MD 21218 USA
Keywords
Topic identification; universal acoustic modeling; recurrent neural networks; attention
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In topic identification (topic ID) on real-world unstructured audio, an audio instance of variable topic shifts is first broken into sequential segments, and each segment is independently classified. We first present a general purpose method for topic ID on spoken segments in low-resource languages, using a cascade of universal acoustic modeling, translation lexicons to English, and English-language topic classification. Next, instead of classifying each segment independently, we demonstrate that exploring the contextual dependencies across sequential segments can provide large improvements. In particular, we propose an attention-based contextual model which is able to leverage the contexts in a selective manner. We test both our contextual and non-contextual models on four LORELEI languages, and on all but one our attention-based contextual model significantly outperforms the context-independent models.
Pages: 656-663
Number of pages: 8