mdctGAN: Taming transformer-based GAN for speech super-resolution with Modified DCT spectra

Cited by: 2
Authors
Shuai, Chenhao [1 ,3 ,4 ]
Shi, Chaohua [2 ,3 ,4 ]
Gan, Lu [3 ]
Liu, Hongqing [4 ]
Affiliations
[1] Nanyang Technol Univ, Singapore, Singapore
[2] Xidian Univ, Xian, Shaanxi, Peoples R China
[3] Brunel Univ London, London, England
[4] Chongqing Univ Posts & Telecommun, Chongqing, Peoples R China
Source
Keywords
speech super-resolution; phase information; GAN;
DOI
10.21437/Interspeech.2023-113
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Speech super-resolution (SSR) aims to recover high-resolution (HR) speech from its low-resolution (LR) counterpart. Recent SSR methods focus on reconstructing the magnitude spectrogram and neglect phase reconstruction, which limits recovery quality. To address this issue, we propose mdctGAN, a novel SSR framework based on the modified discrete cosine transform (MDCT). Through adversarial learning in the MDCT domain, our method reconstructs HR speech in a phase-aware manner without vocoders or additional post-processing. Furthermore, by learning frequency-consistent features with a self-attentive mechanism, mdctGAN achieves high-quality speech reconstruction. On the VCTK corpus, experimental results show that our model produces natural auditory quality with high MOS and PESQ scores. It also achieves state-of-the-art log-spectral distance (LSD) performance at a 48 kHz target resolution from various input rates. Code is available at https://github.com/neoncloud/mdctGAN
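The phase-aware claim in the abstract rests on a property of the MDCT: its coefficients are real-valued and signed, so they can be inverted directly to a waveform with no separate phase estimate. The following is a minimal NumPy sketch of that property, not the authors' implementation (see the linked repository for that). The sine window, the frame size N, the zero-padding scheme, and the function names `sine_window`, `mdct_basis`, `mdct`, and `imdct` are all assumptions chosen for illustration.

```python
import numpy as np

def sine_window(N):
    # Sine window of length 2N; satisfies w[n]^2 + w[n+N]^2 = 1 (Princen-Bradley).
    n = np.arange(2 * N)
    return np.sin(np.pi * (n + 0.5) / (2 * N))

def mdct_basis(N):
    # Cosine basis of shape (N, 2N): maps a 2N-sample frame to N coefficients.
    n = np.arange(2 * N)
    k = np.arange(N)
    return np.cos(np.pi / N * np.outer(k + 0.5, n + 0.5 + N / 2))

def mdct(signal, N):
    """Windowed MDCT with hop size N; returns real coefficients (frames, N)."""
    signal = np.asarray(signal, dtype=float)
    if signal.size % N != 0:
        raise ValueError("signal length must be a multiple of N")
    # Pad N zeros on each side so every sample is covered by two frames.
    padded = np.concatenate([np.zeros(N), signal, np.zeros(N)])
    num_frames = padded.size // N - 1
    frames = np.stack([padded[i * N:(i + 2) * N] for i in range(num_frames)])
    return (frames * sine_window(N)) @ mdct_basis(N).T

def imdct(coeffs, N):
    """Inverse MDCT with windowed overlap-add; aliasing cancels between frames."""
    frames = (coeffs @ mdct_basis(N)) * sine_window(N) * (2.0 / N)
    out = np.zeros((coeffs.shape[0] + 1) * N)
    for i, frame in enumerate(frames):
        out[i * N:(i + 2) * N] += frame
    return out[N:-N]  # drop the padding added in mdct()

# Usage: a 440 Hz tone at 16 kHz survives the round trip.
t = np.arange(16 * 64) / 16000.0
tone = np.sin(2 * np.pi * 440.0 * t)
spectrum = mdct(tone, 64)          # real-valued array, one row per frame
restored = imdct(spectrum, 64)
print(np.max(np.abs(restored - tone)))
```

Because the round trip is exact up to floating-point error, a model that predicts MDCT coefficients can in principle output a waveform through `imdct` alone, which is consistent with the abstract's statement that no vocoder or post-processing is required.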
Pages: 5112-5116
Page count: 5