Institutional Repository

Development of neural network-based speech/non-speech discrimination algorithm for audio files

dc.contributor.advisor Wang, Zenghui en
dc.contributor.author Redelinghuys, Herman
dc.date.accessioned 2022-05-30T10:16:40Z
dc.date.available 2022-05-30T10:16:40Z
dc.date.issued 2022-01
dc.identifier.uri https://hdl.handle.net/10500/28915
dc.description Summary in English en
dc.description.abstract Research in speech processing, and in the analysis of audio content in general, requires extensive data sets. Creating such data sets manually is time-consuming and labour-intensive, but the task is well suited to automation through machine learning. This dissertation describes the development of an algorithm for audio content analysis that discriminates between speech and music audio classes. To detect the different classes, the audio signal is expressed in terms of spectral and temporal features, and statistical values of these features are used to differentiate between the audio classes. Various low-level audio features were evaluated for their suitability and efficiency in discriminating between different audio classes. To gain a better understanding of the feature set, exploratory dimensionality reduction analysis was performed with Principal Component Analysis (PCA). The mean accuracy of the Support Vector Machine (SVM) classifier was used to rank combinations of features. A number of feature selection techniques were employed to reduce the feature space and increase the mean accuracy: univariate feature selection, Random Forest Regression, forward and backward Sequential Feature Selection, and selection by the highest loading factors of the PCA components. An optimal subset of features was selected, used to evaluate a Neural Network-based classifier model, and validated against an SVM model. Using this novel method of selecting the optimal combination of audio features, a 50% reduction in dimensionality and a higher mean accuracy (99.95%) were achieved, and the method proved well suited as a tool for extracting and compiling speech data sets.
The algorithm developed in this study addresses the initial motivation for the study: to efficiently create data sets for the study of language identification and recognition for African and other indigenous languages and, beyond that, for analysing audio content in general. The reduction in the dimensionality of the feature space, and the consequent reduction in computational load, should enable a real-time audio content analysis tool implemented on low-power IoT devices. en
dc.format.extent 1 online resource (xvi, 128 leaves) : illustrations, graphs en
dc.language.iso en en
dc.subject Speech discrimination en
dc.subject Audio Features en
dc.subject Machine learning en
dc.subject Short Time Energy Ratio en
dc.subject Zero-Crossing Rate en
dc.subject Spectral Roll-off en
dc.subject Spectral Flux en
dc.subject Spectral Centroid en
dc.subject Spectral Entropy en
dc.subject Mel Frequency Cepstral Coefficients en
dc.subject Neural Network en
dc.subject Support Vector Machine en
dc.subject.ddc 006.454 en
dc.subject.lcsh Speech processing systems en
dc.subject.lcsh Machine learning en
dc.subject.lcsh Spectral sensitivity en
dc.subject.lcsh Audio frequency en
dc.subject.lcsh Automatic speech recognition en
dc.title Development of neural network-based speech/non-speech discrimination algorithm for audio files en
dc.type Dissertation en
dc.description.department Electrical and Mining Engineering en
dc.description.degree M. Tech (Electrical Engineering) en