Long-term flexible 2D cepstral modeling of speech spectral amplitudes

Times cited: 1
Authors
Firouzmand, Mohammad [1]
Girin, Laurent [1]
Affiliations
[1] INPG, Grenoble Lab Images Speech Signal & Automat, Grenoble, France
Source
2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12 | 2008
Keywords
speech analysis; speech processing; speech coding; speech modeling; speech synthesis;
DOI
10.1109/ICASSP.2008.4518515
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
This paper presents a method for modeling the envelope of spectral amplitude parameters of speech signals in "two dimensions" (2D). It consists of two cascaded modelings: the first one along the frequency axis is the usual cepstrum technique, which consists of modeling the log-scaled spectral envelope with a Discrete Cosine Model (DCM). The second one, along the time axis, consists of modeling the trajectory of the envelope DCM coefficients by another similar DCM model. An iterative algorithm is proposed to optimally fit this 2D-model to the data according to a perceptual criterion based on frequency masking. This approach is shown to provide an efficient and flexible representation of spectral amplitude parameters in terms of coefficient rates, while providing good signal quality, opening new perspectives in very-low bit-rate sinusoidal speech coding.
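The two cascaded fits described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' algorithm: it assumes the log-amplitudes are already sampled on a uniform frequency grid for every frame, it uses plain least squares in place of the iterative, masking-based perceptual fit, and the names `cosine_basis`, `fit_2d_dcm`, and `reconstruct` are hypothetical.

```python
import numpy as np

def cosine_basis(n_points, order):
    """Basis B[i, k] = cos(pi * k * x_i), with x_i uniform on [0, 1]."""
    x = np.linspace(0.0, 1.0, n_points)
    return np.cos(np.pi * np.outer(x, np.arange(order)))

def fit_2d_dcm(log_amp, freq_order, time_order):
    """Fit a cosine model along frequency, then along time.

    log_amp: array of shape (n_frames, n_bins) holding log-scaled amplitudes.
    Returns an array of shape (time_order, freq_order).
    Assumption: ordinary least squares, no perceptual weighting.
    """
    n_frames, n_bins = log_amp.shape
    Bf = cosine_basis(n_bins, freq_order)     # (n_bins, freq_order)
    Bt = cosine_basis(n_frames, time_order)   # (n_frames, time_order)
    # Step 1 (frequency axis): one set of cepstrum-like coefficients per frame.
    ceps, *_ = np.linalg.lstsq(Bf, log_amp.T, rcond=None)   # (freq_order, n_frames)
    # Step 2 (time axis): model the trajectory of each coefficient over frames.
    coeffs, *_ = np.linalg.lstsq(Bt, ceps.T, rcond=None)    # (time_order, freq_order)
    return coeffs

def reconstruct(coeffs, n_frames, n_bins):
    """Rebuild the (n_frames, n_bins) log-amplitude surface from the 2D coefficients."""
    time_order, freq_order = coeffs.shape
    Bf = cosine_basis(n_bins, freq_order)
    Bt = cosine_basis(n_frames, time_order)
    return Bt @ coeffs @ Bf.T
```

Under these assumptions, a segment of `n_frames * n_bins` log-amplitude values is summarized by `time_order * freq_order` coefficients, which is where the flexibility in coefficient rate comes from: either order can be lowered to trade accuracy for a smaller representation.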
Pages: 3937 / 3940
Number of pages: 4
Related papers
50 in total
  • [41] Long-term quantization of speech LSF parameters
    Girin, Laurent
    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007, : 845 - 848
  • [42] Nonlinear long-term prediction of speech signal
    Lee, KS
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2002, E85D (08) : 1346 - 1348
  • [43] Speech Pathologists in Long-Term Care PREFACE
    Brush, Jennifer A.
    SEMINARS IN SPEECH AND LANGUAGE, 2013, 34 (01) : 3 - 4
  • [44] Nonlinear long-term prediction of speech signals
    Birgmeier, M
    Bernhard, HP
    Kubin, G
    1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I - V: VOL I: PLENARY, EXPERT SUMMARIES, SPECIAL, AUDIO, UNDERWATER ACOUSTICS, VLSI; VOL II: SPEECH PROCESSING; VOL III: SPEECH PROCESSING, DIGITAL SIGNAL PROCESSING; VOL IV: MULTIDIMENSIONAL SIGNAL PROCESSING, NEURAL NETWORKS - VOL V: STATISTICAL SIGNAL AND ARRAY PROCESSING, APPLICATIONS, 1997, : 1283 - 1286
  • [45] High durability and stability of 2D nanofluidic devices for long-term single-molecule sensing
    Thakur, Mukeshchand
    Cai, Nianduo
    Zhang, Miao
    Teng, Yunfei
    Chernev, Andrey
    Tripathi, Mukesh
    Zhao, Yanfei
    Macha, Michal
    Elharouni, Farida
    Lihter, Martina
    Wen, Liping
    Kis, Andras
    Radenovic, Aleksandra
    NPJ 2D MATERIALS AND APPLICATIONS, 2023, 7 (01)
  • [46] High durability and stability of 2D nanofluidic devices for long-term single-molecule sensing
    Mukeshchand Thakur
    Nianduo Cai
    Miao Zhang
    Yunfei Teng
    Andrey Chernev
    Mukesh Tripathi
    Yanfei Zhao
    Michal Macha
    Farida Elharouni
    Martina Lihter
    Liping Wen
    Andras Kis
    Aleksandra Radenovic
    npj 2D Materials and Applications, 7
  • [47] Long-Term Predictive Modelling of the Craniofacial Complex Using Machine Learning on 2D Cephalometric Radiographs
    Myers, Michael
    Brown, Michael D.
    Badirli, Sarkhan
    Eckert, George J.
    Johnson, Diane Helen-Marie
    Turkkahraman, Hakan
    INTERNATIONAL DENTAL JOURNAL, 2025, 75 (01) : 236 - 247
  • [49] Application of spectral element method based on convolution filtering in long-term wavefield modeling
    Ren JunSheng
    Zhang Huai
    Zhou YuanZe
    Zhang Zhen
    Shi YaoLin
    CHINESE JOURNAL OF GEOPHYSICS-CHINESE EDITION, 2024, 67 (05): : 1832 - 1838
  • [50] Combining Short-term Cepstral and Long-term Pitch Features for Automatic Recognition of Speaker Age
    Mueller, Christian
    Burkhardt, Felix
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 2268 - +