PHONETICS EMBEDDING LEARNING WITH SIDE INFORMATION

Authors
Synnaeve, Gabriel [1]
Schatz, Thomas [1, 2]
Dupoux, Emmanuel [1]
Affiliations
[1] CNRS, EHESS, IEC ENS, LSCP, Paris, France
[2] CNRS, ENS, SIERRA Project Team INRIA, Paris, France
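Keywords
speech; ABX; deep neural network; side information; semi-supervised; speech embeddings; acoustic model; DISCOVERY
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract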
We show that it is possible to learn an efficient acoustic model using only a small amount of easily available word-level similarity annotations. In contrast to the detailed phonetic labeling required by classical speech recognition technologies, the only information our method requires is pairs of speech excerpts which are known to be similar (same word) and pairs of speech excerpts which are known to be different (different words). An acoustic model is obtained by training shallow and deep neural networks, using an architecture and a cost function well-adapted to the nature of the provided information. The resulting model is evaluated in an ABX minimal-pair discrimination task and is shown to perform much better (11.8% ABX error rate) than raw speech features (19.6%), not far from a fully supervised baseline (best neural network: 9.2%, HMM-GMM: 11%).
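To make the evaluation metric in the abstract concrete, here is a minimal sketch of how an ABX error rate could be computed. It is not the authors' evaluation code. It assumes, for illustration only, that each speech excerpt has already been reduced to a fixed-length vector, that excerpts are compared by cosine distance, and that ties count as half an error. The names `cosine_distance`, `abx_error_rate`, and `toy_triplets` are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vector = Sequence[float]
Triplet = Tuple[Vector, Vector, Vector]  # (A, B, X), where X shares A's category


def cosine_distance(u: Vector, v: Vector) -> float:
    """Return 1 - cos(u, v). Assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def abx_error_rate(triplets: Sequence[Triplet]) -> float:
    """Fraction of triplets in which X is not closer to A than to B.

    Assumption for this sketch: a tie counts as half an error.
    """
    if not triplets:
        raise ValueError("at least one (A, B, X) triplet is required")
    errors = 0.0
    for a, b, x in triplets:
        d_ax = cosine_distance(a, x)
        d_bx = cosine_distance(b, x)
        if d_ax > d_bx:
            errors += 1.0  # X was judged closer to the wrong category
        elif d_ax == d_bx:
            errors += 0.5  # undecidable case
    return errors / len(triplets)


# Toy data (made up): two-dimensional stand-ins for excerpt embeddings.
toy_triplets = [
    ([1.0, 0.0], [0.0, 1.0], [0.9, 0.1]),  # X lies near A: counted as correct
    ([1.0, 0.0], [0.0, 1.0], [0.2, 0.8]),  # X lies near B: counted as an error
]
toy_error = abx_error_rate(toy_triplets)
```

Under this reading, the percentages quoted in the abstract would correspond to the share of triplets on which the representation places X closer to the wrong category.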
Pages: 106-111
Page count: 6