dc.contributor.author |
Motembe, Dodi
|
|
dc.date.accessioned |
2021-05-31T10:35:30Z |
|
dc.date.available |
2021-05-31T10:35:30Z |
|
dc.date.issued |
2020-01 |
|
dc.identifier.uri |
http://hdl.handle.net/10500/27389 |
|
dc.description.abstract |
Facial expression recognition (FER) remains a challenging task, and machines struggle to
interpret the dynamic shifts in facial expressions of human emotions effectively. The
existing systems that have proven effective rely on deep network structures that
need powerful and expensive hardware. The deeper the network, the longer training and
testing take, and many systems use expensive GPUs to speed up the process. To address
these challenges while still improving recognition accuracy, we create a generic
hierarchical structure with variable settings. This generic
structure has a hierarchy of three convolutional blocks, two dropout blocks and one fully
connected block. From this generic structure we derived four different network structures
and compared their performance. From each of these cases we then derived six network
structures by varying the parameters under analysis: the filter sizes of the convolutional
maps and of the max-pooling, as well as the number of convolutional maps. In total, we
investigated 24 network structures, six per case. Repeated experiments showed that
case 1a was the top performer in the case 1 group, and that case 2a, case 3c and case 4c
outperformed the others in their respective groups. Comparing the four group winners
indicates that case 2a is the optimal structure with optimal parameters. The best
network structure was chosen on three criteria: the minimum, average and maximum
accuracy over 15 repeated training runs. All 24 proposed network structures were
tested on two of the most widely used FER datasets, CK+ and JAFFE. After repeated
simulations, the results demonstrate that our inexpensive optimal network architecture
achieved 98.11 % accuracy on the CK+ dataset. We also tested our optimal network
architecture on the JAFFE dataset, where it achieved 84.38 % accuracy using just a
standard CPU and simpler procedures. We also compared the four group winners with the
performance of existing FER models reported recently in two studies. These FER models used
the same two datasets, CK+ and JAFFE. Three of our four group winners (case 1a,
case 2a and case 4c) scored only 1.22 % below the accuracy of the top-performing model
on the CK+ dataset, and two of our network structures, case 2a and case 3c, came in
third, beating other models on the JAFFE dataset. |
en |
dc.language.iso |
en |
en |
dc.subject |
Facial Expression Recognition (FER) |
en |
dc.subject |
Deep Learning |
en |
dc.subject |
Convolutional Neural Network (CNN) |
en |
dc.subject |
Deep Convolutional Neural Network (DCNN) |
en |
dc.subject |
Artificial Intelligence |
en |
dc.subject |
Hierarchical Deep Neural Network Structure |
en |
dc.subject |
Face Detection |
en |
dc.subject |
Facial Feature Extraction |
en |
dc.subject |
Central Processing Unit (CPU) |
en |
dc.subject |
Graphics Processing Unit (GPU) |
en |
dc.title |
Investigation of hierarchical deep neural network structure for facial expression recognition |
en |
dc.type |
Dissertation |
en |
dc.description.department |
Electrical and Mining Engineering |
en |
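
The abstract describes 24 network structures (four cases with six parameter settings each) compared by minimum, average and maximum accuracy over repeated training runs. A minimal sketch of that bookkeeping follows. The function names, the variant letters a to f, and the ordering of the three criteria when ranking are assumptions made for illustration; the abstract names the criteria but does not say how they were combined.

```python
from itertools import product
from statistics import mean

CASES = (1, 2, 3, 4)   # four structures derived from the generic structure
VARIANTS = "abcdef"    # six settings per case; letters beyond "c" are assumed


def structure_labels():
    """Return the 24 labels, from 'case 1a' to 'case 4f'."""
    return [f"case {c}{v}" for c, v in product(CASES, VARIANTS)]


def summarize(accuracies):
    """Minimum, average and maximum accuracy over repeated training runs."""
    return {
        "min": min(accuracies),
        "avg": mean(accuracies),
        "max": max(accuracies),
    }


def best_structure(runs_by_label):
    """Pick the label whose runs give the highest (avg, min, max) tuple.

    Ranking by average first, then minimum, then maximum is an assumption.
    """
    def key(label):
        s = summarize(runs_by_label[label])
        return (s["avg"], s["min"], s["max"])

    return max(runs_by_label, key=key)
```

In use, `runs_by_label` would map each label from `structure_labels()` to the list of accuracies measured for it (15 values per structure in the study), and `best_structure` would be called once per case group and once more across the group winners.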