dc.contributor.author | Bosch, Sonja E. |
dc.contributor.author | Pretorius, Laurette |
dc.date.accessioned | 2012-03-19T07:37:10Z |
dc.date.available | 2012-03-19T07:37:10Z |
dc.date.issued | 2011 |
dc.identifier.citation | Bosch, SE; Pretorius, L. 2011. Towards Zulu corpus clean-up, lexicon development and corpus annotation by means of computational morphological analysis. South African Journal of African Languages, vol. 31, no. 1, pp. 138-158. | en
dc.identifier.issn | 0257-2117 |
dc.identifier.uri | http://hdl.handle.net/10500/5539 |
dc.description.abstract | This article reports on a practical, semi-automated procedure for creating a clean, morphologically annotated Zulu corpus of tractable size that could eventually serve both as a gold standard for Zulu computational morphology and as a basis for further linguistic annotation. A corpus development architecture is proposed which includes the corpus in various stages of development, a pre-processing module, the Zulu morphological analyser and its guesser variant, the machine-readable lexicon that serves as a comprehensive lexical database for Zulu, and a human elicitation function for ensuring the integrity of the lexical database. The approach is novel in the sense that an existing rule-based, finite-state Zulu computational morphological analyser is used as a core technology in this procedure to accommodate the complex, agglutinative nature of Zulu morphology. The corpus, at present consisting of the Zulu version of the South African Constitution, will have morphological analysis and tagging as a first level of annotation. | en
dc.language.iso | en | en
dc.publisher | Unisa | en
dc.title | Towards Zulu corpus clean-up, lexicon development and corpus annotation by means of computational morphological analysis | en
dc.type | Article | en