Project A6 - Deep learning for robust audio-visual processing
In project A6, we will develop novel algorithms that jointly process visual and acoustic cues to improve signal processing in both domains. We will develop robust methods that cope with cluttered real-world data, various noise types, and deliberately crafted adversarial inputs. We will identify the most informative parts of the data using attentional mechanisms and the principle of information gain. Finally, we will exploit both visual and audio data for sound source separation and speech enhancement. The developed methods will be embedded into a multi-modal robotics platform, which entails practical limitations such as scarce training data, limited computational power, and real-time requirements.
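As a minimal illustration of the information-gain principle mentioned above, the following sketch (an assumption, not part of the project's actual codebase) scores candidate regions by the expected reduction in uncertainty about a latent variable, I(C; O) = H(C) - E_o[H(C | O=o)], so that attention can be directed to the most informative region:

```python
import math

def entropy(p):
    """Shannon entropy (bits) of a discrete distribution."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def information_gain(prior, likelihoods):
    """Expected entropy reduction about a latent class C after observing
    outcome O, with class-conditional outcome distributions
    likelihoods[c][o].  Computes I(C; O) = H(C) - sum_o P(o) H(C|O=o)."""
    n_classes = len(prior)
    n_outcomes = len(likelihoods[0])
    gain = entropy(prior)
    for o in range(n_outcomes):
        p_o = sum(prior[c] * likelihoods[c][o] for c in range(n_classes))
        if p_o == 0:
            continue
        posterior = [prior[c] * likelihoods[c][o] / p_o for c in range(n_classes)]
        gain -= p_o * entropy(posterior)
    return gain

# Hypothetical example: two candidate regions to attend to,
# with a uniform prior over two sound-source hypotheses.
prior = [0.5, 0.5]
informative = [[0.9, 0.1], [0.1, 0.9]]    # outcome depends strongly on class
uninformative = [[0.5, 0.5], [0.5, 0.5]]  # outcome independent of class
print(information_gain(prior, informative))    # positive gain
print(information_gain(prior, uninformative))  # zero gain
```

An attention policy built on this criterion would prefer the informative region, since observing it is expected to reduce uncertainty about the sound source, whereas the uninformative region yields no expected entropy reduction.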