Advanced Speech Enhancement Using Deep Learning

Bandwidth extension of audio signals is a well-known problem in signal processing. Classic bandwidth extension methods rely on statistics in the spectral domain and use prediction-based techniques, which are simple and fast but usually yield overly smooth results. An analogous problem has been studied thoroughly in the image domain under the name super-resolution. Super-resolution methods train a convolutional neural network on pairs of downsampled and original images, so that the network produces a high-resolution image from a low-resolution input. Our project builds on one such method, described in the article "Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network", and we tried to implement it in order to extend the bandwidth of an audio signal. We aim to (a) train a stable GAN model by using a WGAN (Wasserstein Generative Adversarial Network) and (b) explore the trade-off between the L1 loss and the WGAN loss using a regularization parameter. Interestingly, on our data set we found that (a) the WGAN does not contribute to the CNN architecture for bandwidth extension and (b) the L1 loss by itself gives results similar to those obtained with the WGAN loss.
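The trade-off between the L1 loss and the WGAN loss can be illustrated with a minimal sketch. It assumes the common additive form, in which the generator loss is the L1 term plus a weight times the adversarial term; the abstract does not state the exact formula, so this form, the function names, and the parameter name lam are all hypothetical and chosen only for illustration.

```python
from typing import Sequence


def l1_loss(predicted: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error between two equal-length signals."""
    if len(predicted) != len(target):
        raise ValueError("signals must have the same length")
    return sum(abs(p - t) for p, t in zip(predicted, target)) / len(target)


def wgan_generator_loss(critic_scores: Sequence[float]) -> float:
    """Standard WGAN generator term: negative mean critic score
    assigned to the generated samples."""
    return -sum(critic_scores) / len(critic_scores)


def combined_generator_loss(
    predicted: Sequence[float],
    target: Sequence[float],
    critic_scores: Sequence[float],
    lam: float,
) -> float:
    """Assumed weighting: L1 + lam * WGAN term.
    Setting lam = 0 recovers training with the L1 loss alone."""
    return l1_loss(predicted, target) + lam * wgan_generator_loss(critic_scores)


# Toy values, not taken from any experiment.
predicted = [0.5, -0.5]
target = [1.0, 0.0]
critic_scores = [0.2, 0.4]

loss_l1_only = combined_generator_loss(predicted, target, critic_scores, lam=0.0)
loss_mixed = combined_generator_loss(predicted, target, critic_scores, lam=0.1)
```

Under this assumed form, sweeping lam from zero upward moves the objective from pure L1 training toward adversarial training, which is one way to read the comparison between the two losses described above.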
