dc.contributor.advisor |
Van Staden, Wynand
|
|
dc.contributor.author |
Chohan, Farad Hoosen
|
|
dc.date.accessioned |
2021-11-09T14:25:31Z |
|
dc.date.available |
2021-11-09T14:25:31Z |
|
dc.date.issued |
2021-01 |
|
dc.identifier.uri |
https://hdl.handle.net/10500/28242 |
|
dc.description.abstract |
The ubiquitous and innocuous nature of spam email makes it an ideal carrier for covert communications, particularly linguistic steganography, which seeks to conceal sensitive information within a body of text through the use of lexical encoding schemes. Stegospam
generated using a Probabilistic Context-Free Grammar (PCFG) tool such as Spammimic, which employs linguistic steganography techniques, is extremely difficult to detect. Existing steganalysis approaches employ a combination of techniques that are either too specific to a particular stego-text or tool, or that require the original text in order to perform a comparative analysis to discover lexical irregularities that may indicate the existence of a concealed message.
Advancements in Natural Language Processing (NLP) have allowed for its application across various fields, including cyber-security. Hence, this research evaluates the implementation of NLP techniques for the detection or classification of stegospam generated using Spammimic.
An experimental approach is adopted to evaluate the ability of an NLP-based steganalysis approach to distinguish stegospam from non-stegospam files contained within known corpora. This is achieved through the implementation of a software prototype built on NLP algorithms and methods.
The results of the research demonstrate the ability of the proposed steganalysis approach to successfully distinguish non-stegospam files from stegospam files generated using Spammimic. |
en |
dc.format.extent |
1 online resource (xiv, 207 leaves) : illustrations, graphs (chiefly color) |
en |
dc.language.iso |
en |
en |
dc.subject |
Classification |
en |
dc.subject |
Classification algorithm |
en |
dc.subject |
Covert communications |
en |
dc.subject |
Feature extraction |
en |
dc.subject |
Linguistic steganalysis |
en |
dc.subject |
Lemmatization |
en |
dc.subject |
Naïve Bayes classifier |
en |
dc.subject |
Natural language processing |
en |
dc.subject |
Sentence disambiguation |
en |
dc.subject |
Spammimic |
en |
dc.subject |
Steganography |
en |
dc.subject |
Steganographic medium |
en |
dc.subject |
Stegospam |
en |
dc.subject |
Support vector machine |
en |
dc.subject |
Tokenization |
en |
dc.subject.ddc |
006.35 |
|
dc.subject.lcsh |
Natural language processing (Computer science) |
en |
dc.subject.lcsh |
Image steganography |
en |
dc.subject.lcsh |
Cryptography |
en |
dc.subject.lcsh |
Spam (Electronic mail) |
en |
dc.title |
Implementing natural language processing techniques for the detection of stegospam files generated using a probabilistic context-free grammar |
en |
dc.type |
Dissertation |
en |
dc.description.department |
School of Computing |
en |
dc.description.degree |
M. Tech. (Information Technology) |
en |