dc.description.abstract |
Density-based algorithms are effective at detecting outliers and clusters of arbitrary shape, even when the number of clusters is not known in advance. Parameter specification, however, remains a challenge in data stream clustering, and choosing suitable parameter values is essential to good clustering quality. The density-based algorithm DenStream is one example of a data stream clustering algorithm that requires several parameters to be specified. In this dissertation, an improved DenStream with a modified distance measure is proposed and demonstrated with parameter tuning in Massive Online Analysis (MOA) using synthetic and real-world datasets. The modified DenStream algorithm was compared against CluStream, ClusTree, and DenStream at noise levels of 0%, 10%, and 30%, with manually selected epsilon values of 0.02, 0.03, and 0.05, respectively. The epsilon parameter range [0.02 – 0.05] was not used because some of the algorithms did not work on the real-world datasets. The effects on clustering quality were evaluated on the synthetic and real-world datasets using the performance metrics CMM, Purity, Silhouette Coefficient, and Rand index. The results show that the effectiveness of the algorithms depends on parameter tuning and that no single algorithm performs best across all the performance metrics. |
en |