Uses a fully convolutional end-to-end speech enhancement system.
Implementation details of the paper accepted to ICASSP-2019:
Deepak Baby and Sarah Verhulst, SERGAN: Speech enhancement using relativistic generative adversarial networks with gradient penalty, IEEE-ICASSP, pp. 106-110, May 2019, Brighton, UK.
This work was funded with support from the EU Horizon 2020 programme under grant agreement No 678120 (RobSpear).
$ ./download_dataset.sh
Prepare the data for training and testing the various models. The folder path may be edited if you keep the database in a different folder. This script needs to be executed only once; all the models read from the same location.
python prepare_data.py
Running the models. The models available in this repository are listed below. Every implementation offers several cGAN configurations. Edit the opts
variable to choose the configuration. The results are saved automatically to separate folders. The folder name is generated by files_ops.py
and includes the chosen configuration options.
run_aecnn.py: Auto-encoder CNN model with L1 loss term (no discriminator)
run_lsgan_se.py: SEGAN with least-squares loss [1]
run_wgan-gp_se.py: GAN model with Wasserstein loss and gradient penalty
run_rsgan-gp_se.py: GAN model with relativistic standard GAN loss and gradient penalty
run_rasgan-gp_se.py: GAN model with relativistic average standard GAN loss and gradient penalty
run_ralsgan-gp_se.py: GAN model with relativistic average least-squares GAN loss and gradient penalty

Evaluation on the test set is done together with training. Set TEST_SEGAN = False to disable testing.
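For orientation, the three relativistic discriminator losses named in the list above can be sketched in NumPy as follows. This is a minimal sketch based on the standard definitions of relativistic GAN losses, not the code used in the run_*.py scripts; the function names and the `c_real` / `c_fake` arguments (raw, pre-sigmoid discriminator outputs for clean and enhanced speech) are assumptions made for illustration, and the gradient penalty term is omitted.

```python
import numpy as np

def softplus(x):
    # Numerically stable log(1 + exp(x)); note -log(sigmoid(x)) == softplus(-x).
    return np.logaddexp(0.0, x)

def rsgan_d_loss(c_real, c_fake):
    # Relativistic standard GAN: each real sample should score higher
    # than its paired fake sample.
    return np.mean(softplus(-(c_real - c_fake)))

def rasgan_d_loss(c_real, c_fake):
    # Relativistic average standard GAN: compare each sample with the
    # average score of the opposite class.
    real_rel = c_real - np.mean(c_fake)
    fake_rel = c_fake - np.mean(c_real)
    return np.mean(softplus(-real_rel)) + np.mean(softplus(fake_rel))

def ralsgan_d_loss(c_real, c_fake):
    # Relativistic average least-squares GAN: push the relative scores
    # towards +1 for real samples and -1 for fake samples.
    real_rel = c_real - np.mean(c_fake)
    fake_rel = c_fake - np.mean(c_real)
    return np.mean((real_rel - 1.0) ** 2) + np.mean((fake_rel + 1.0) ** 2)
```

In this formulation the generator losses are obtained by swapping the roles of `c_real` and `c_fake`.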
In run_<xxx>.py, the training data is loaded into memory with:
clean_train_data = np.array(fclean['feat_data'])
noisy_train_data = np.array(fnoisy['feat_data'])
To read the mini-batches from disk instead of loading everything into memory, change the above lines to:
clean_train_data = fclean['feat_data']
noisy_train_data = fnoisy['feat_data']
Note that this can slow training down by a factor of about 20 (on the test machine), because the mini-batches then have to be read from disk in every epoch.
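The trade-off can be illustrated with a small batching helper. This is a hedged sketch, not the loop used in the repository: `iter_minibatches` is a hypothetical name, and the arrays below are stand-in data. The helper only relies on `len()` and slicing, so it should behave the same whether it is given in-memory NumPy arrays or on-disk datasets that support slicing; in the second case each slice costs a disk read.

```python
import numpy as np

def iter_minibatches(clean, noisy, batch_size):
    """Yield aligned (clean, noisy) mini-batches as contiguous slices."""
    n = len(clean)
    for start in range(0, n, batch_size):
        stop = min(start + batch_size, n)
        # For a NumPy array this slice is a cheap in-memory view.
        # For an on-disk dataset it reads only rows start..stop-1.
        yield np.asarray(clean[start:stop]), np.asarray(noisy[start:stop])

# Stand-in data (assumption): 10 frames with 2 features each.
clean_train_data = np.arange(20, dtype=np.float32).reshape(10, 2)
noisy_train_data = clean_train_data + 0.5

batches = list(iter_minibatches(clean_train_data, noisy_train_data, batch_size=4))
```

The last batch is smaller than `batch_size` when the number of frames is not a multiple of it, and no shuffling is done in this sketch.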
[1] S. Pascual, A. Bonafonte, and J. Serra, SEGAN: Speech enhancement generative adversarial network, in INTERSPEECH, ISCA, Aug. 2017, pp. 3642–3646.
The Keras implementation of cGAN is based on the following repos