Many people listen to recorded music as part of their everyday lives, e.g. from radio or TV programs, downloads or, increasingly, online streaming services. Sometimes we may want to remix the balance within the music, perhaps to suppress the vocals for entertainment purposes such as karaoke, or to learn how to play an instrument. All of these applications are possible only if we have access to separated sound channels (stems) for each musical audio object. In this project we dealt with source separation of polyphonic music, focusing on two audio channels: vocals and accompaniment.
For this task we exploited signal processing and data-driven tools, using two deep learning systems, MSS and Open-Unmix, and two databases, MUSDB and DanyDB, and we compared the results between the systems and between the databases. We also added two databases built from the existing ones, in which the vocals target used for training consists of the true vocals channel plus the accompaniment channel attenuated by 40 dB.
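The construction of the modified training target can be sketched as follows. This is a minimal illustration, assuming the stems are available as NumPy arrays of equal length; the function name and signal shapes are hypothetical, not part of the original project code.

```python
import numpy as np

def make_bleed_target(vocals: np.ndarray, accompaniment: np.ndarray,
                      attenuation_db: float = 40.0) -> np.ndarray:
    """Return a training target: true vocals plus the accompaniment
    attenuated by `attenuation_db` dB (default 40 dB, i.e. a linear
    gain of 0.01)."""
    gain = 10.0 ** (-attenuation_db / 20.0)  # 40 dB -> factor 0.01
    return vocals + gain * accompaniment

# Example with synthetic signals standing in for real stems
rng = np.random.default_rng(0)
vocals = rng.standard_normal(1000)
accomp = rng.standard_normal(1000)
target = make_bleed_target(vocals, accomp)
```

The residual accompaniment in the target acts as a controlled "bleed", so the network is trained against a vocals reference that still contains a faint trace of the mixture rather than a perfectly isolated stem.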