Institutional Repository

Towards Zulu corpus clean-up, lexicon development and corpus annotation by means of computational morphological analysis

dc.contributor.author Bosch, Sonja E.
dc.contributor.author Pretorius, Laurette
dc.date.accessioned 2012-03-19T07:37:10Z
dc.date.available 2012-03-19T07:37:10Z
dc.date.issued 2011
dc.identifier.citation Bosch, SE; Pretorius, L. 2011. Towards Zulu corpus clean-up, lexicon development and corpus annotation by means of computational morphological analysis. South African Journal of African Languages, vol. 31, no. 1, pp. 138-158 en
dc.identifier.issn 0257-2117
dc.identifier.uri http://hdl.handle.net/10500/5539
dc.description.abstract This article reports on a practical, semi-automated procedure for creating a clean, morphologically annotated Zulu corpus of tractable size that could eventually serve both as a gold standard for Zulu computational morphology and as a basis for further linguistic annotation. A corpus development architecture is proposed which includes the corpus in various stages of development, a pre-processing module, the Zulu morphological analyser and its guesser variant, the machine-readable lexicon that serves as a comprehensive lexical database for Zulu, and a human elicitation function for ensuring the integrity of the lexical database. The approach is novel in the sense that an existing rule-based, finite-state Zulu computational morphological analyser is used as a core technology in this procedure to handle the complex, agglutinative nature of Zulu morphology. The corpus, at present consisting of the Zulu version of the South African Constitution, will have morphological analysis and tagging as a first level of annotation. en
dc.language.iso en en
dc.publisher Unisa en
dc.title Towards Zulu corpus clean-up, lexicon development and corpus annotation by means of computational morphological analysis en
dc.type Article en

