dc.contributor.advisor |
Wang, Zenghui
|
en |
dc.contributor.author |
Redelinghuys, Herman
|
|
dc.date.accessioned |
2022-05-30T10:16:40Z |
|
dc.date.available |
2022-05-30T10:16:40Z |
|
dc.date.issued |
2022-01 |
|
dc.identifier.uri |
https://hdl.handle.net/10500/28915 |
|
dc.description |
Summary in English |
en |
dc.description.abstract |
Research in speech processing, and in the analysis of audio content in general, requires extensive
datasets. Creating such a dataset manually is time-consuming and labour-intensive, but it is a
task well suited to automation through machine learning. This dissertation describes the
development of an algorithm for audio content analysis that discriminates between the speech and music
audio classes. To detect the different classes, the audio signal is expressed in terms of
spectral and temporal features, and statistical values of these features
are used to differentiate between the audio classes. Various low-level audio
features were evaluated to determine their suitability and efficiency in discriminating between the
audio classes. To gain a better understanding of the feature set, exploratory dimensionality
reduction analysis was performed with Principal Component Analysis (PCA). The mean accuracy of
the Support Vector Machine (SVM) classifier was used to rank combinations of features. A number of feature selection
techniques were employed to reduce the feature space and increase the mean accuracy. These
included univariate feature selection, Random Forest Regression, forward and backward Sequential
Feature Selection, and selection by the highest loading factors of the PCA components.
An optimal subset of features was selected and used to evaluate a Neural Network-based classifier
model, which was validated against a Support Vector Machine model. It was demonstrated that this
novel method of selecting the optimal combination of audio features achieved a 50% reduction in
dimensionality and a higher mean accuracy (99.95%), and that it is well suited as a
tool for extracting and compiling speech datasets.
The algorithm developed in this study meets the initial
motivation for the study: to efficiently create datasets for the study of language identification and
recognition for African and other indigenous languages and, beyond that, to analyse audio content
in general. The reduction in the dimensionality of the feature space, and the consequent reduction in
computational load, should enable a real-time audio content analysis tool to be implemented on low-power IoT devices. |
en |
dc.format.extent |
1 online resource (xvi, 128 leaves) : illustrations, graphs |
en |
dc.language.iso |
en |
en |
dc.subject |
Speech discrimination |
en |
dc.subject |
Audio Features |
en |
dc.subject |
Machine learning |
en |
dc.subject |
Short Time Energy Ratio |
en |
dc.subject |
Zero-Crossing Rate |
en |
dc.subject |
Spectral Roll-off |
en |
dc.subject |
Spectral Flux |
en |
dc.subject |
Spectral Centroid |
en |
dc.subject |
Spectral Entropy |
en |
dc.subject |
Mel Frequency Cepstral Coefficients |
en |
dc.subject |
Neural Network |
en |
dc.subject |
Support Vector Machine |
en |
dc.subject.ddc |
006.454 |
en |
dc.subject.lcsh |
Speech processing systems |
en |
dc.subject.lcsh |
Machine learning |
en |
dc.subject.lcsh |
Spectral sensitivity |
en |
dc.subject.lcsh |
Audio frequency |
en |
dc.subject.lcsh |
Automatic speech recognition |
en |
dc.title |
Development of neural network-based speech/non-speech discrimination algorithm for audio files |
en |
dc.type |
Dissertation |
en |
dc.description.department |
Electrical and Mining Engineering |
en |
dc.description.degree |
M. Tech (Electrical Engineering) |
en |