Exploiting time-frequency patterns with LSTM-RNNs for low-bitrate audio restoration

Cited by: 24
Authors
Deng, Jun [1]
Schuller, Bjoern [1]
Eyben, Florian [1]
Schuller, Dagmar [1]
Zhang, Zixing [1]
Francois, Holly [2]
Oh, Eunmi [3]
Affiliations
[1] AudEERING GmbH, Gilching, Germany
[2] Samsung Res UK, Staines, England
[3] Samsung Res, Seoul, South Korea
Source
NEURAL COMPUTING & APPLICATIONS | 2020 / Vol. 32 / No. 04
Keywords
Audio restoration; LSTM; MP3; Deep learning; Bandwidth extension; Telephone speech
DOI
10.1007/s00521-019-04158-0
Chinese Library Classification number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Perceptual audio coding is widely and successfully used for audio compression. However, perceptual audio coders may introduce audible coding artifacts when encoding audio at low bitrates. Low-bitrate audio restoration is a challenging problem: it aims to recover, from a low-quality encoded version, high-quality audio that is close to the uncompressed original. In this paper, we propose a novel data-driven method for audio restoration in which temporal and spectral dynamics are explicitly captured by deep time-frequency LSTM recurrent neural networks. Leveraging the captured temporal and spectral information can facilitate learning a nonlinear mapping from the magnitude spectrogram of low-quality audio to that of high-quality audio. The proposed method is conceptually straightforward and substantially attenuates audible artifacts introduced by codecs. Extensive experiments show that, for low-bitrate audio at 96 kbps (mono), 64 kbps (mono), and 96 kbps (stereo), the proposed method efficiently generates improved-quality audio whose perceptual quality is competitive with, or even superior to, that of audio produced by other state-of-the-art deep neural network methods and by the LAME-MP3 codec.
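The abstract describes the core idea only at a high level: a recurrent network that models dynamics along both the frequency axis and the time axis of a magnitude spectrogram, trained to map low-quality magnitudes to high-quality ones. The untrained NumPy sketch below illustrates one common way to build such a time-frequency LSTM (an LSTM scanning frequency patches within each frame, feeding an LSTM scanning across frames). It is not the authors' implementation: the patch size, hidden sizes, ReLU output layer, Hann-window STFT settings, reuse of the decoded signal's phase, and all names (`stft`, `LSTM`, `TFLSTMRestorer`) are hypothetical choices for illustration. Training (for example, a mean-squared-error loss against spectrograms of the uncompressed originals) is omitted.

```python
import numpy as np


def stft(x, n_fft=256, hop=128):
    """Hann-windowed STFT; returns a complex array of shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[t * hop : t * hop + n_fft] * window for t in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class LSTM:
    """Minimal unidirectional LSTM, forward pass only, randomly initialised (untrained)."""

    def __init__(self, n_in, n_hidden, rng):
        scale = 1.0 / np.sqrt(n_in + n_hidden)
        # One stacked weight matrix for the input, forget, candidate and output gates.
        self.W = rng.uniform(-scale, scale, size=(4 * n_hidden, n_in + n_hidden))
        self.b = np.zeros(4 * n_hidden)
        self.n_hidden = n_hidden

    def __call__(self, xs):
        """xs: (steps, n_in) -> hidden states of shape (steps, n_hidden)."""
        h = np.zeros(self.n_hidden)
        c = np.zeros(self.n_hidden)
        outputs = []
        for x in xs:
            z = self.W @ np.concatenate([x, h]) + self.b
            i, f, g, o = np.split(z, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
            h = sigmoid(o) * np.tanh(c)
            outputs.append(h)
        return np.stack(outputs)


class TFLSTMRestorer:
    """Hypothetical F-LSTM (over frequency patches) -> T-LSTM (over frames) -> linear layer."""

    def __init__(self, n_bins, patch, f_hidden, t_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.patch = patch
        self.pad = (-n_bins) % patch  # zero-pad the spectrum to a whole number of patches
        n_patches = (n_bins + self.pad) // patch
        self.f_lstm = LSTM(patch, f_hidden, rng)
        self.t_lstm = LSTM(n_patches * f_hidden, t_hidden, rng)
        self.W_out = rng.normal(0.0, 0.01, size=(n_bins, t_hidden))
        self.b_out = np.zeros(n_bins)

    def __call__(self, mag):
        """mag: (frames, n_bins) low-quality magnitudes -> (frames, n_bins) estimate."""
        frame_features = []
        for frame in mag:
            patches = np.pad(frame, (0, self.pad)).reshape(-1, self.patch)
            h_f = self.f_lstm(patches)  # scan along frequency within one frame
            frame_features.append(h_f.reshape(-1))
        h_t = self.t_lstm(np.stack(frame_features))  # scan along time across frames
        # ReLU keeps the estimated magnitudes non-negative.
        return np.maximum(h_t @ self.W_out.T + self.b_out, 0.0)


# Hypothetical usage: random noise stands in for a decoded low-bitrate signal.
rng = np.random.default_rng(0)
decoded = rng.standard_normal(4096)
spec = stft(decoded)
mag, phase = np.abs(spec), np.angle(spec)
model = TFLSTMRestorer(n_bins=mag.shape[1], patch=8, f_hidden=16, t_hidden=64)
est_mag = model(mag)
# Assumption: the restored magnitude is recombined with the decoded signal's phase.
restored_spec = est_mag * np.exp(1j * phase)
```

Because the weights are random, `est_mag` here is meaningless as audio; the sketch only shows how the frequency-direction and time-direction recurrences are stacked and how the shapes flow from spectrogram in to spectrogram out.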
Pages: 1095-1107
Number of pages: 13
Related papers (10 in total)
  • [1] Exploiting time-frequency patterns with LSTM-RNNs for low-bitrate audio restoration
    Jun Deng
    Björn Schuller
    Florian Eyben
    Dagmar Schuller
    Zixing Zhang
    Holly Francois
    Eunmi Oh
    Neural Computing and Applications, 2020, 32 : 1095 - 1107
  • [2] HIGH-FREQUENCY TONAL COMPONENTS RESTORATION IN LOW-BITRATE AUDIO CODING USING MULTIPLE SPECTRAL TRANSLATIONS
    Samaali, Imen
    Mahe, Gael
    Alouane, Monia Turki-Hadj
    2015 23RD EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2015, : 1053 - 1057
  • [3] ADVANCES IN LOW BITRATE TIME-FREQUENCY CODING
    Vaillancourt, Tommy
    Malenovsky, Vladimir
    Salami, Redwan
    Liu, Zexin
    Miao, Lei
    Gibbs, Jon
    Jelinek, Milan
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 5913 - 5917
  • [4] Exploiting Time-Frequency Conformers for Music Audio Enhancement
    Chae, Yunkee
    Koo, Junghyun
    Lee, Sungho
    Lee, Kyogu
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 2362 - 2370
  • [5] Audio Classification Using Dominant Spatial Patterns in Time-Frequency Space
    Molla, Md. Khademul Islam
    Hirose, Keikichi
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 2914 - 2918
  • [6] Modeling Time-Frequency Patterns with LSTM vs. Convolutional Architectures for LVCSR Tasks
    Sainath, Tara N.
    Li, Bo
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 813 - 817
  • [7] Low-Complexity Hybrid Time-Frequency Audio Signal Pattern Detection
    Martalo, Marco
    Ferrari, Gianluigi
    Malavenda, Claudio Santo
    IEEE SENSORS JOURNAL, 2013, 13 (02) : 501 - 509
  • [8] Adaptive Time-Frequency Analysis for Noise Reduction in an Audio Filter Bank With Low Delay
    Andersen, Kristian Timm
    Moonen, Marc
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2016, 24 (04) : 784 - 795
  • [9] Multimodal (audio-visual) source separation exploiting multi-speaker tracking, robust beamforming and time-frequency masking
    Naqvi, S. Mohsen
    Wang, W.
    Khan, M. Salman
    Barnard, M.
    Chambers, J. A.
    IET SIGNAL PROCESSING, 2012, 6 (05) : 466 - 477
  • [10] Low-intensity ultrasound stimulation modulates time-frequency patterns of cerebral blood oxygenation and neurovascular coupling of mouse under peripheral sensory stimulation state
    Yuan, Yi
    Wu, Qianqian
    Wang, Xingran
    Liu, Mengyang
    Yan, Jiaqing
    Ji, Hui
    NEUROIMAGE, 2023, 270