Experiments with syllable-based English-Zulu alignment

Kotzé, Gideon; Wolff, Friedel


dc.contributor.author	Kotzé, Gideon
dc.contributor.author	Wolff, Friedel
dc.date.accessioned	2014-08-27T08:53:56Z
dc.date.available	2014-08-27T08:53:56Z
dc.date.issued	2014-05
dc.identifier.citation	Kotzé and Wolff, 2014	en
dc.identifier.uri	http://hdl.handle.net/10500/13869
dc.description.abstract	As a morphologically complex language, Zulu presents notable challenges for alignment with English. One of the biggest concerns for statistical machine translation is that this morphological complexity leads to a large number of words with very few examples in a corpus. To address the problem, we set out to establish an experimental baseline for lexical alignment by naively dividing the Zulu text into syllables, which approximate its morphemes. A small quantitative evaluation and a more thorough qualitative evaluation suggest that our approach has merit, although certain issues remain. Although we have not yet determined the effect of this approach on machine translation, our first experiments suggest that a parallel corpus with reasonable alignment accuracy can be created in as little as a few days for a language pair in which one language is under-resourced. Furthermore, since very little language-specific knowledge was required for this task, our approach can almost certainly be applied to other language pairs, and perhaps to other tasks as well.	en
dc.language.iso	en	en
dc.publisher	European Language Resources Association (ELRA)	en
dc.subject	machine translation	en
dc.subject	morphology	en
dc.subject	bitext alignment	en
dc.subject	African languages	en
dc.subject	linguistics	en
dc.subject	computational linguistics	en
dc.title	Experiments with syllable-based English-Zulu alignment	en
dc.type	Article	en