Seeing Sound: Estimating Image From Sound

The goal of this work is to train a neural network so that it will reconstruct an image of an audio signal input source.
Under the assumption that the audio signal contains enough features of the image that created it, we tried to use an audio classifier to extract those features and transform them to a features vector, from which we can reconstruct the audio source image – using GAN.
The transformation was achieved using a simple deep neural network, which has been successful in reconstructing images in a small domain (only 2 classes of image, audio pairs of musical instruments) training and test cases.
Under this assumption that this simple training of the transformation can be applied to a broader domain, we tried to test this on animal and vehicles, and it has produced bad results.
Those results have led us to the conclusion that the transformation trained on a small domain cannot be applied to the broader domain, thus we need to train the transformation on a broader database.
After training a broader domain, our results demonstrated a lack of convergence of the network, there for we concluded that the broader transformation is more complex than we originally assumed and cannot be achieved using a simple linear neural net. Further literature survey led us to a new article supporting this conclusion.

Estimating Image From Sound