Experiments with syllable-based English-Zulu alignment
Authors
Kotzé, Gideon
Wolff, Friedel
Issue Date
2014-05
Type
Article
Language
en
Keywords
machine translation, morphology, bitext alignment, African languages, linguistics, computational linguistics
Abstract
As a morphologically complex language, Zulu poses notable challenges for alignment with English. One of the biggest concerns for statistical machine translation is that this morphological complexity produces a large number of word forms for which a corpus contains very few examples. To address the problem, we set out to establish an experimental baseline for lexical alignment by naively dividing the Zulu text into syllables, which resemble its morphemes. A small quantitative evaluation and a more thorough qualitative evaluation suggest that our approach has merit, although certain issues remain. Although we have not yet determined the effect of this approach on machine translation, our first experiments suggest that an aligned parallel corpus with reasonable alignment accuracy can be created in as little as a few days for a language pair in which one language is under-resourced. Furthermore, since very little language-specific knowledge was required for this task, our approach can almost certainly be applied to other language pairs and perhaps to other tasks as well.
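The abstract does not spell out how the text was divided into syllables, so the following is only a minimal sketch of what a naive syllable split could look like, not the authors' procedure. It assumes that a syllable is any run of non-vowel characters followed by a single vowel, and that sub-word units are marked with a joining suffix before being passed to a word aligner. The names `syllabify`, `segment_sentence`, and the `@@` marker are hypothetical choices made for illustration.

```python
import re

# Assumption: the five orthographic vowels are enough to delimit syllables.
VOWELS = "aeiouAEIOU"

# One naive "syllable": zero or more non-vowel characters, then one vowel.
_SYLLABLE = re.compile(rf"[^{VOWELS}]*[{VOWELS}]")


def syllabify(word):
    """Split a word into naive open syllables.

    Any trailing characters with no following vowel are attached to the
    last syllable, or returned as a single unit if the word has no vowel.
    """
    pieces = _SYLLABLE.findall(word)
    tail = word[sum(len(p) for p in pieces):]
    if tail:
        if pieces:
            pieces[-1] += tail
        else:
            pieces = [tail]
    return pieces


def segment_sentence(sentence, marker="@@"):
    """Replace each whitespace-separated word by its syllables.

    Every syllable except the last one in a word carries the (hypothetical)
    marker, so the original words can be restored after alignment.
    """
    tokens = []
    for word in sentence.split():
        syllables = syllabify(word)
        tokens.extend([s + marker for s in syllables[:-1]] + syllables[-1:])
    return tokens


if __name__ == "__main__":
    print(segment_sentence("ngiyabonga kakhulu"))
```

A rule this simple ignores syllabic nasals and treats punctuation as part of the neighbouring syllable, which is the kind of limitation a naive baseline accepts in exchange for needing almost no language-specific knowledge.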
Citation
Kotzé and Wolff, 2014
Publisher
European Language Resources Association (ELRA)