Experiments with syllable-based English-Zulu alignment


Authors

Kotzé, Gideon
Wolff, Friedel

Issue Date

2014-05

Type

Article

Language

en

Keywords

machine translation, morphology, bitext alignment, African languages, linguistics, computational linguistics

Abstract

As a morphologically complex language, Zulu poses notable challenges when aligned with English. One of the biggest concerns for statistical machine translation is that this morphological complexity leads to a large number of words for which very few examples exist in a corpus. To address the problem, we set out to establish an experimental baseline for lexical alignment by naively dividing the Zulu text into syllables, which resemble its morphemes. A small quantitative evaluation, together with a more thorough qualitative one, suggests that our approach has merit, although certain issues remain. Although we have not yet determined the effect of this approach on machine translation, our first experiments suggest that an aligned parallel corpus with reasonable alignment accuracy can be created in as little as a few days for a language pair in which one language is under-resourced. Furthermore, since very little language-specific knowledge was required for this task, our approach can almost certainly be applied to other language pairs, and perhaps to other tasks as well.
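The abstract describes the segmentation step only as a naive division of Zulu text into syllables; the exact rules are not given in this record. The following Python sketch is one possible reading, under stated assumptions: it treats a, e, i, o, u as the only vowel letters and assumes every syllable ends in a vowel (Zulu syllables are predominantly open). It also marks word-internal syllables with a hypothetical `@@` suffix, a convention borrowed from subword-segmentation tooling, so that whole words can be restored after alignment. The names `syllabify`, `segment_sentence`, `rejoin` and `MARKER` are illustrative only. The sketch makes no attempt to handle cases such as syllabic nasals or punctuation attached to words.

```python
import re

# Assumption: a, e, i, o, u are the only vowel letters, and every syllable
# ends in a vowel (Zulu syllables are predominantly open, i.e. CV).
_SYLLABLE = re.compile(r"[^aeiou]*[aeiou]|[^aeiou]+$", re.IGNORECASE)

# Hypothetical continuation marker for word-internal syllables.
MARKER = "@@"


def syllabify(word: str) -> list[str]:
    """Naively split one word into vowel-final chunks.

    Consonants after the last vowel (e.g. in loanwords) are returned as a
    separate final chunk, so the chunks always concatenate back to `word`.
    """
    return _SYLLABLE.findall(word)


def segment_sentence(sentence: str, marker: str = MARKER) -> list[str]:
    """Turn a Zulu sentence into syllable tokens for alignment."""
    tokens: list[str] = []
    for word in sentence.split():
        syllables = syllabify(word)
        # Mark every syllable except the last so word boundaries survive.
        tokens.extend(s + marker for s in syllables[:-1])
        tokens.extend(syllables[-1:])
    return tokens


def rejoin(tokens: list[str], marker: str = MARKER) -> str:
    """Undo segment_sentence: glue marked syllables back onto their word."""
    return " ".join(tokens).replace(marker + " ", "")


if __name__ == "__main__":
    print(segment_sentence("Ngiyabonga kakhulu"))
```

Run as a script, this prints `['Ngi@@', 'ya@@', 'bo@@', 'nga', 'ka@@', 'khu@@', 'lu']`. The resulting tokens would stand in for Zulu words on the input side of a word aligner, so that frequent syllables can be counted even when the full word forms are rare.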

Citation

Kotzé and Wolff, 2014

Publisher

European Language Resources Association (ELRA)