NOISE ROBUST SINGING VOICE SYNTHESIS USING GAUSSIAN MIXTURE VARIATIONAL AUTOENCODER

Heyang Xue1, Jie Wu3, Jian Luan3, Yujun Wang3, Lei Xie1,2
Audio, Speech and Language Processing Group (ASLP@NPU), 1School of Software, 2School of Computer Science, Northwestern Polytechnical University, Xi'an, China
3Xiaomi AI Lab

Abstract

Generating high-quality singing voice usually depends on a sizable studio-level singing corpus, which is difficult and expensive to collect. In contrast, plenty of singing voice data can be found on the Internet. However, such found data may be mixed with accompaniment or contaminated by environmental noise due to the recording conditions. In this paper, we propose a noise-robust singing voice synthesizer that incorporates a Gaussian Mixture Variational Autoencoder (GMVAE) as the noise encoder to handle different noise conditions, generating clean singing voice from lyrics for the target speaker. Specifically, the proposed synthesizer learns a multi-modal latent representation of various noise conditions in a continuous space, without requiring an auxiliary noise classifier for noise representation learning or clean reference audio during the inference stage. Experiments show that the proposed synthesizer can generate clean, high-quality singing voice for the target speaker, with a MOS close to that of singing voice reconstructed from the ground-truth mel-spectrogram with the Griffin-Lim vocoder. Experiments also show the robustness of our approach under complex noise conditions.
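To make the idea of a multi-modal latent noise representation more concrete, the following is a minimal sketch of a Gaussian mixture prior over a latent noise vector, assuming diagonal covariances. It shows how a latent vector can be softly assigned to a mixture component without a separate noise classifier. The function names, the 2-D latent space, the three components, and the choice of component 0 as the "clean" condition are all hypothetical and are not taken from the paper's implementation.

```python
import numpy as np

def log_diag_gaussian(z, mean, log_var):
    """Log density of z under a diagonal Gaussian N(mean, exp(log_var))."""
    return -0.5 * np.sum(
        np.log(2.0 * np.pi) + log_var + (z - mean) ** 2 / np.exp(log_var),
        axis=-1,
    )

def component_posterior(z, means, log_vars, weights):
    """Posterior p(k | z) over the K mixture components for one latent z."""
    log_joint = np.log(weights) + log_diag_gaussian(z, means, log_vars)
    log_joint = log_joint - log_joint.max()  # stabilise the exponentials
    probs = np.exp(log_joint)
    return probs / probs.sum()

# Hypothetical mixture prior: 2-D latent space, K = 3 noise conditions.
means = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
log_vars = np.zeros((3, 2))          # unit variance for every component
weights = np.full(3, 1.0 / 3.0)      # uniform mixture weights

# A latent vector such as a noise encoder might produce (made up here).
z = np.array([2.8, 0.1])
posterior = component_posterior(z, means, log_vars, weights)

# Assumption for illustration only: component 0 models the clean condition,
# so its mean could condition the decoder when no reference audio is given.
clean_latent = means[0]
```

Under these assumptions, `posterior` concentrates on the component whose mean is closest to `z`, which is how a mixture prior can separate noise conditions in a continuous space.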


[Figure: model architecture]


Examples of Baseline, VAE and GMVAE trained on different datasets


* Please ensure your web browser supports the WAV audio format.
* Aug-1: accompaniment noise mixed with the original dataset; Aug-2: environmental noise mixed with the original dataset; Aug-3: both accompaniment and environmental noise mixed with the original dataset.
* The ground-truth fundamental frequency and ground-truth phoneme durations are used.
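The three augmented conditions above can be sketched as noise mixing at a chosen signal-to-noise ratio (SNR). This is an illustrative sketch only: the page does not state the SNRs or the mixing procedure used, so the 10 dB value, the function name `mix_at_snr`, and the synthetic stand-in signals are assumptions.

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Add noise to clean audio, scaled so the mixture has the given SNR in dB."""
    # Repeat the noise if it is shorter than the clean signal, then trim it.
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[: len(clean)]
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    # Choose the gain so that clean_power / (gain^2 * noise_power) = 10^(snr_db / 10).
    gain = np.sqrt(clean_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return clean + gain * noise

# Synthetic stand-ins for real recordings (assumed 16 kHz sample rate).
sr = 16000
t = np.arange(sr) / sr
rng = np.random.default_rng(0)
singing = np.sin(2.0 * np.pi * 220.0 * t)                       # 1 s "vocal"
accompaniment = np.sin(2.0 * np.pi * 110.0 * t[: sr // 2])      # 0.5 s "music"
environment = rng.normal(size=sr)                               # 1 s "room noise"

snr_db = 10.0  # assumed value, not taken from the paper
aug1 = mix_at_snr(singing, accompaniment, snr_db)   # Aug-1: accompaniment
aug2 = mix_at_snr(singing, environment, snr_db)     # Aug-2: environmental noise
aug3 = aug1 + (aug2 - singing)                      # Aug-3: both noise types

def measured_snr_db(clean, mixture):
    """SNR in dB of a mixture relative to its clean component."""
    residual = mixture - clean
    return 10.0 * np.log10(np.mean(clean ** 2) / np.mean(residual ** 2))
```

In this sketch each noise type in Aug-3 is scaled relative to the clean singing separately, so the overall SNR of `aug3` is lower than 10 dB; other conventions are equally plausible.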


Target speaker: Female | Male
Test conditions (for each target speaker): Aug-1 | Aug-2 | Aug-3
Systems (for each test condition): Baseline | VAE | GMVAE
Sample-1
Sample-2
Sample-3