dc.description.abstract |
This paper evaluates the performance of a speech recognition system using acoustic models trained on multilingual data.
We use data from more than one language because there may not be enough data available for a new language to train a robust recogniser. Two general strategies are employed: firstly, pooling data from the different languages for training and, secondly, training models on data from one language and subsequently adapting them using data from the new target language. For the first approach, English and Afrikaans training data are pooled in order to train hidden Markov models (HMMs) for the target language, Afrikaans. For the second approach, the parameters of HMMs trained on English data are adapted using maximum a posteriori probability (MAP) and maximum likelihood linear regression (MLLR) methods on Afrikaans data. Continuous density HMMs are used to model the context-independent phones found in Afrikaans. Cross-language adaptation performance is evaluated in terms of phone recognition performance as well as for a continuous speech recognition task in Afrikaans. The interesting result is that, for continuous recognition, the best performance is obtained by simple pooling of the data, and this performance far exceeds that achievable using only data from the target language. The improvement arises because, in our database, there is no mismatch between the English and Afrikaans data (other than the language difference) and both languages were labelled with a consistent set of labels. Adaptation results indicate that both MAP adaptation and MLLR transformation of English models using Afrikaans adaptation data significantly improve model performance and also achieve better performance than is achievable by direct training on the adaptation data. |
en |