Institutional Repository

Software defect prediction using maximal information coefficient and fast correlation-based filter feature selection


dc.contributor.advisor Mnkandla, E.
dc.contributor.author Mpofu, Bongeka
dc.date.accessioned 2018-12-06T06:32:02Z
dc.date.available 2018-12-06T06:32:02Z
dc.date.issued 2017-11
dc.date.submitted 2018-12
dc.identifier.citation Mpofu, Bongeka (2017) Software defect prediction using maximal information coefficient and fast correlation-based filter feature selection, University of South Africa, Pretoria, <http://hdl.handle.net/10500/25108>
dc.identifier.uri http://hdl.handle.net/10500/25108
dc.description.abstract Software quality ensures that the applications developed are failure free. Some modern systems are intricate because of the complexity of their information processes. Software fault prediction is an important quality assurance activity: it is a mechanism that correctly predicts the defect proneness of modules and classifies them, which saves resources, time and developers’ effort. In this study, a model that selects relevant features for use in defect prediction was proposed. A review of the literature revealed that process metrics, which are based on the history of the source code over time, are better predictors of defects in versioned systems. These metrics are extracted from the source-code module and include, for example, the number of additions and deletions from the source code, the number of distinct committers and the number of modified lines. In this research, defect prediction was conducted using open source software (OSS) of software product lines (SPL); hence process metrics were chosen. Data sets used in defect prediction may contain non-significant and redundant attributes that can affect the accuracy of machine-learning algorithms. To improve the prediction accuracy of classification models, features that are significant in the defect prediction process are used. In machine learning, feature selection techniques are applied to identify the relevant data. Feature selection is a pre-processing step that helps to reduce the dimensionality of the data. Feature selection techniques include information-theoretic methods that are based on the concept of entropy. This study evaluated the efficiency of these feature selection techniques experimentally and found that software defect prediction using significant attributes improves prediction accuracy.
A novel MICFastCR model was developed, which uses the Maximal Information Coefficient (MIC) to select significant attributes and the Fast Correlation-Based Filter (FCBF) to eliminate redundant attributes. Machine learning algorithms were then run to predict software defects. MICFastCR achieved the highest prediction accuracy as reported by various performance measures. en
dc.format.extent 1 online resource (xvii, 196 leaves) : illustrations (some color), graphs (some color) en
dc.language.iso en en
dc.subject Defect prediction en
dc.subject Feature selection en
dc.subject Software metrics en
dc.subject Relevant metrics en
dc.subject Redundancy en
dc.subject Machine learning algorithms en
dc.subject Filter en
dc.subject Wrapper en
dc.subject Embedded en
dc.subject Information theory en
dc.subject.ddc 005.14
dc.subject.lcsh Software measurement en
dc.subject.lcsh Machine learning en
dc.subject.lcsh Embedded computer systems en
dc.subject.lcsh Information theory en
dc.title Software defect prediction using maximal information coefficient and fast correlation-based filter feature selection en
dc.type Thesis en
dc.description.department School of Computing en
dc.description.degree Ph. D. (Computer Science) en
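
The abstract describes a two-stage filter: rank attributes by relevance to the defect label, then drop attributes that are redundant with ones already kept. The sketch below illustrates that general idea only and is not the thesis's MICFastCR implementation. As an assumption, it uses symmetrical uncertainty (the entropy-based measure commonly associated with FCBF) for both stages, swapping it in for MIC to stay dependency-free. The feature names, the toy data and the `threshold` value are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def select_features(features, labels, threshold=0.1):
    """Keep relevant, non-redundant features (FCBF-style sketch).

    features: dict mapping a feature name to a list of discrete values.
    labels:   list of class labels, same length as each feature column.
    """
    # Stage 1: relevance of each feature to the class label.
    # (Assumption: SU stands in for MIC here.)
    relevance = {
        name: symmetrical_uncertainty(column, labels)
        for name, column in features.items()
    }
    ranked = sorted(
        (name for name in relevance if relevance[name] >= threshold),
        key=lambda name: relevance[name],
        reverse=True,
    )

    # Stage 2: drop a feature if an already kept (more relevant) feature
    # is at least as correlated with it as it is with the class label.
    selected = []
    for name in ranked:
        redundant = any(
            symmetrical_uncertainty(features[name], features[kept])
            >= relevance[name]
            for kept in selected
        )
        if not redundant:
            selected.append(name)
    return selected


# Hypothetical, already discretised process metrics for eight modules.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
features = {
    "many_commits": [1, 1, 1, 0, 0, 0, 1, 0],
    "many_committers": [1, 1, 1, 0, 0, 0, 1, 0],  # duplicates many_commits
    "noise": [0, 1, 0, 1, 0, 1, 1, 0],  # independent of the label
}
chosen = select_features(features, labels)
```

On this toy data, `noise` falls below the relevance threshold and `many_committers` is discarded as redundant with `many_commits`. Real process metrics are continuous counts, so they would need discretising (or a measure such as MIC that handles continuous data) before a filter like this could be applied.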

