A visual voice activity detector for resolving the permutation problem in speech source separation from a convolutive mixture
Bertrand Rivet, Christine Servière, Laurent Girin, Dinh-Tuan Pham, Christian Jutten.
Audio-visual speech source separation combines visual speech processing techniques (e.g., lip parameter tracking) with source separation methods to improve the extraction of a speech signal of interest from a mixture of acoustic signals. In this paper, we present a new approach that combines visual information with a separation method based on the sparseness of speech: the visual information serves as a voice activity detector (VAD), which is plugged into acoustic separation techniques. Results show that the approach is effective in the difficult case of realistic convolutive mixtures. Moreover, the overall process is considerably simpler than previously proposed audio-visual separation schemes.
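To illustrate how a VAD decision could help with the permutation problem named in the title, the following is a minimal sketch and not the implementation evaluated in the paper. It assumes that separation has already been carried out independently in each frequency bin, so that the order of the outputs is arbitrary from bin to bin, and that a visual VAD has flagged the frames in which the speaker of interest is silent. Under these assumptions, the output with the lowest power over the silent frames is taken as the target in each bin. The function name `resolve_permutation`, the array layout, and the minimum-power rule are all hypothetical choices made for this illustration.

```python
import numpy as np


def resolve_permutation(separated, silent_frames):
    """Pick, in each frequency bin, the output most likely to be the target.

    Assumptions (illustrative, not taken from the paper):
      separated     : array of shape (n_freq, n_src, n_frames) holding the
                      per-bin separated outputs, with an arbitrary source
                      order in each bin.
      silent_frames : boolean array of shape (n_frames,), True where the
                      visual VAD reports that the target speaker is silent.

    Returns (target, target_idx):
      target     : array of shape (n_freq, n_frames), the selected output.
      target_idx : array of shape (n_freq,), the selected output index per bin.
    """
    separated = np.asarray(separated)
    silent_frames = np.asarray(silent_frames, dtype=bool)
    if separated.ndim != 3:
        raise ValueError("separated must have shape (n_freq, n_src, n_frames)")
    if silent_frames.shape != (separated.shape[2],):
        raise ValueError("silent_frames must have one entry per frame")
    if not silent_frames.any():
        raise ValueError("at least one silent frame is required")

    # Mean power of every output in every bin, over the silent frames only.
    power = np.mean(np.abs(separated[:, :, silent_frames]) ** 2, axis=2)

    # The target should contribute almost nothing while its speaker is silent,
    # so the lowest-power output in each bin is taken as the target.
    target_idx = np.argmin(power, axis=1)

    # Gather the selected output for every bin.
    target = separated[np.arange(separated.shape[0]), target_idx, :]
    return target, target_idx
```

Given per-bin outputs stacked as `(n_freq, n_src, n_frames)` and a boolean silence mask from the VAD, `resolve_permutation(separated, silent_frames)` returns the realigned target spectrogram together with the index chosen in each bin. The rule relies on the VAD finding enough silent frames and on the interfering sources being active during them.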