Application of improved density-based algorithms to data stream and performance evaluation

Authors

Akinosho, Tajudeen Akanbi

Issue Date

2023-10

Type

Dissertation

Language

en

Keywords

Data stream clustering; Stream clustering; Data stream; Clustering; MOA; Clusters; CluStream; DenStream; ClusTree; Modified DenStream; Arbitrary shape; SDG 9 Industry, Innovation and Infrastructure

Abstract

Density-based algorithms are effective at detecting clusters of arbitrary shape and outliers, even when the number of clusters is not known in advance. Parameter specification in data stream clustering remains a challenge, and selecting suitable parameter values is essential for good clustering quality. The density-based algorithm DenStream is an example of a data stream clustering algorithm that requires several parameters to be specified. In this dissertation, an improved DenStream with a modified distance measure was proposed and demonstrated with parameter tuning in Massive Online Analysis (MOA) using synthetic and real-world datasets. The modified DenStream algorithm was compared against CluStream, ClusTree and DenStream at noise levels of 0%, 10%, and 30% with manually selected epsilon values of 0.02, 0.03, and 0.05, respectively. The epsilon parameter range [0.02–0.05] was not used because some of the algorithms did not work on the real-world datasets. The effects on clustering quality were evaluated using the performance metrics CMM, Purity, Silhouette Coefficient, and Rand index on the synthetic and real-world datasets. Finally, the results show that the effectiveness of the algorithms depends on parameter tuning and that no single algorithm performs best across all of the performance metrics.
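Three of the evaluation metrics named above (Purity, Rand index, Silhouette Coefficient) have short textbook definitions. The sketch below computes those textbook versions on small hand-made label arrays as an illustration only. It is not MOA's implementation (MOA's evaluation measures may treat noise points and micro-clusters differently), it does not cover CMM, and the function names `purity`, `rand_index` and `silhouette` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def purity(true_labels, cluster_labels):
    """Share of points that belong to the majority true class of their cluster."""
    members = {}
    for t, c in zip(true_labels, cluster_labels):
        members.setdefault(c, []).append(t)
    majority = sum(Counter(ts).most_common(1)[0][1] for ts in members.values())
    return majority / len(true_labels)


def rand_index(true_labels, cluster_labels):
    """Share of point pairs on which both labelings agree (same vs. different group)."""
    agree = total = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = cluster_labels[i] == cluster_labels[j]
        agree += int(same_true == same_pred)
        total += 1
    return agree / total


def silhouette(points, cluster_labels):
    """Mean silhouette coefficient with Euclidean distance; needs at least two clusters."""
    labels = set(cluster_labels)
    if len(labels) < 2:
        raise ValueError("silhouette needs at least two clusters")
    n = len(points)
    scores = []
    for i in range(n):
        own = cluster_labels[i]
        # a: mean distance to the other members of the point's own cluster
        same = [math.dist(points[i], points[j])
                for j in range(n) if j != i and cluster_labels[j] == own]
        if not same:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = sum(same) / len(same)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j])
                for j in range(n) if cluster_labels[j] == other)
            / cluster_labels.count(other)
            for other in labels - {own}
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / n


if __name__ == "__main__":
    truth = [0, 0, 0, 1, 1, 1]
    found = [0, 0, 1, 1, 1, 1]
    print(purity(truth, found))
    print(rand_index(truth, found))
    print(silhouette([(0, 0), (0, 1), (10, 0), (10, 1)], [0, 0, 1, 1]))
```

With true classes [0, 0, 0, 1, 1, 1] and found clusters [0, 0, 1, 1, 1, 1], purity is 5/6 (one point sits in a cluster dominated by the other class) and the Rand index is 10/15 (five of the fifteen point pairs are grouped inconsistently). Purity and the Rand index compare against ground-truth labels, whereas the silhouette uses only distances, which is one reason the metrics can rank the same algorithms differently.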