Automatic Speech Recognition for the Somali Language
Abdillahi NIMAAN, Pascal NOCERA, Jean-François BONASTRE.
Most African countries rely on oral tradition to transmit their cultural, scientific and historical heritage across generations. This ancestral knowledge, accumulated over centuries, is now in danger of disappearing. Automatic transcription and indexing tools are a potential solution for preserving it. This paper presents the first results of automatic speech recognition (ASR) for the languages of Djibouti, with the aim of indexing the Djiboutian cultural heritage. This work focuses on the Somali language, which accounts for half of the targeted Djiboutian audio archives. We describe the principal characteristics of the audio (10 hours) and textual (3M words) training corpora we collected, and report the first ASR results for this language. By exploiting a specific property of Somali (words are formed by concatenating sub-words, called "roots" in this paper), we improve on these results. We also discuss directions for future research, such as root-based indexing of audio archives.