Software defect prediction using maximal information coefficient and fast correlation-based filter feature selection

Mpofu, Bongeka

dc.contributor.advisor	Mnkandla, E.
dc.contributor.author	Mpofu, Bongeka
dc.date.accessioned	2018-12-06T06:32:02Z
dc.date.available	2018-12-06T06:32:02Z
dc.date.issued	2017-11
dc.date.submitted	2018-12
dc.identifier.citation	Mpofu, Bongeka (2017) Software defect prediction using maximal information coefficient and fast correlation-based filter feature selection, University of South Africa, Pretoria, <http://hdl.handle.net/10500/25108>
dc.identifier.uri	http://hdl.handle.net/10500/25108
dc.description.abstract	Software quality assurance aims to ensure that developed applications are free of failures. Some modern systems are intricate due to the complexity of their information processes. Software fault prediction is an important quality assurance activity because it predicts the defect proneness of modules and classifies them, which saves resources, time and developers’ effort. This study proposed a model that selects relevant features for use in defect prediction. A review of the literature revealed that process metrics, which are based on the history of the source code over time, are better predictors of defects in version systems. These metrics are extracted from the source-code module and include, for example, the number of additions to and deletions from the source code, the number of distinct committers and the number of modified lines. In this research, defect prediction was conducted using open source software (OSS) of software product line(s) (SPL); hence process metrics were chosen. Data sets that are used in defect prediction may contain non-significant and redundant attributes that may affect the accuracy of machine-learning algorithms. To improve the prediction accuracy of classification models, features that are significant in the defect prediction process are used. In machine learning, feature selection is a pre-processing step that identifies the relevant data and helps to reduce its dimensionality. Feature selection techniques include information-theoretic methods that are based on the concept of entropy. This study evaluated the efficiency of the feature selection techniques experimentally. It was found that using significant attributes improves the accuracy of software defect prediction. 
A novel MICFastCR model was developed, which uses the Maximal Information Coefficient (MIC) to select significant attributes and the Fast Correlation-Based Filter (FCBF) to eliminate redundant attributes. Machine learning algorithms were then run to predict software defects. The MICFastCR model achieved the highest prediction accuracy as reported by various performance measures.	en
dc.format.extent	1 online resource (xvii, 196 leaves) : illustrations (some color), graphs (some color)	en
dc.language.iso	en	en
dc.subject	Defect prediction	en
dc.subject	Feature selection	en
dc.subject	Software metrics	en
dc.subject	Relevant metrics	en
dc.subject	Redundancy	en
dc.subject	Machine learning algorithms	en
dc.subject	Filter	en
dc.subject	Wrapper	en
dc.subject	Embedded	en
dc.subject	Information theory	en
dc.subject.ddc	005.14
dc.subject.lcsh	Software measurement	en
dc.subject.lcsh	Machine learning	en
dc.subject.lcsh	Embedded computer systems	en
dc.subject.lcsh	Information theory	en
dc.title	Software defect prediction using maximal information coefficient and fast correlation-based filter feature selection	en
dc.type	Thesis	en
dc.description.department	School of Computing	en
dc.description.degree	Ph. D. (Computer Science)	en