Institutional Repository

Implementing natural language processing techniques for the detection of stegospam files generated using a probabilistic context-free grammar

dc.contributor.advisor Van Staden, Wynand
dc.contributor.author Chohan, Farad Hoosen
dc.date.accessioned 2021-11-09T14:25:31Z
dc.date.available 2021-11-09T14:25:31Z
dc.date.issued 2021-01
dc.identifier.uri https://hdl.handle.net/10500/28242
dc.description.abstract The ubiquitous and innocuous nature of spam email makes it an ideal carrier for covert communications, particularly linguistic steganography, which seeks to conceal sensitive information within a body of text through the use of lexical encoding schemes. Stegospam generated using a Probabilistic Context-Free Grammar (PCFG), as in Spammimic, which employs linguistic steganography techniques, is extremely difficult to detect. Existing steganalysis approaches employ a combination of techniques that are either too specific to a particular stego-text or tool, or require the original text for a comparative analysis to discover lexical irregularities that may indicate the existence of a concealed message. Advancements in Natural Language Processing (NLP) have allowed for its application across various fields, including cyber-security. Hence, this research evaluates the implementation of NLP techniques for the detection or classification of stegospam generated using Spammimic. An experimental approach is adopted to evaluate the ability of an NLP-based steganalysis to distinguish stegospam from non-stegospam files contained within known corpora. This is achieved through the implementation of a software prototype built on NLP algorithms and methods. The results of the research demonstrate the ability of the proposed steganalysis to successfully distinguish non-stegospam from stegospam files generated using Spammimic. en
dc.format.extent 1 online resource (xiv, 207 leaves) : illustrations, graphs (chiefly color) en
dc.language.iso en en
dc.subject Classification en
dc.subject Classification algorithm en
dc.subject Covert communications en
dc.subject Feature extraction en
dc.subject Linguistic steganalysis en
dc.subject Lemmatization en
dc.subject Naïve Bayes classifier en
dc.subject Natural language processing en
dc.subject Sentence disambiguation en
dc.subject Spammimic en
dc.subject Steganography en
dc.subject Steganographic medium en
dc.subject Stegospam en
dc.subject Support vector machine en
dc.subject Tokenization en
dc.subject.ddc 006.35
dc.subject.lcsh Natural language processing (Computer science) en
dc.subject.lcsh Image steganography en
dc.subject.lcsh Cryptography en
dc.subject.lcsh Spam (Electronic mail) en
dc.title Implementing natural language processing techniques for the detection of stegospam files generated using a probabilistic context-free grammar en
dc.type Dissertation en
dc.description.department School of Computing en
dc.description.degree M. Tech. (Information Technology) en

