MWITONDI, Kassim and MOUSTAFA, Rida (2012). Determining optimal centroids in repeated K-means simulations. CODATA Data Science Journal. (Unpublished)
Full text not available from this repository.
Most data mining problems involve data with heavy tails or data overlaps, and so one of the main challenges data scientists face is separating noise from meaningful data. In most applications, standard non-linear approaches have not always been successful in detecting naturally arising structures in data. We propose an adaptive method for determining an optimal number of centroids in repeated K-Means simulations. The method derives from particle swarm optimisation (PSO), but rather than following the conventional heuristic approach of PSO, it adapts a mathematically formalised convergent approach. Applications to solar magnetic activity cycles data show that the proposed method can optimally identify K-Means centroids. We illustrate how the method can be extended to applications in seismology and oceanography.
Key Words: Data Mining, K-Means, PSO, Solar Magnetic Activity Cycles, Supervised Modelling, Unsupervised Modelling
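To make the idea of "repeated K-Means simulations" concrete, the sketch below runs K-Means many times from random starts for each candidate number of centroids, keeps the tightest run per candidate, and then compares candidates. It is not the PSO-derived method of the paper, whose details are not given in this record. As a stand-in, it ranks candidates by mean silhouette width, a generic cluster-quality score. The data are synthetic, and every name (`kmeans`, `silhouette`, `choose_k`, `make_blobs`) and parameter value is an assumption made for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, rng, iters=50):
    """One Lloyd's-algorithm run from a random initialisation.

    Returns (centroids, labels, inertia), where inertia is the
    within-cluster sum of squared distances.
    """
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, inertia


def silhouette(points, labels):
    """Mean silhouette width; values near 1 indicate well-separated clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton contributes a silhouette of 0
        a = sum(sq_dist(p, q) ** 0.5 for q in own) / (len(own) - 1)
        b = min(sum(sq_dist(p, q) ** 0.5 for q in other) / len(other)
                for m, other in clusters.items() if m != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def choose_k(points, k_values, repeats=20, seed=0):
    """For each k, keep the lowest-inertia run out of `repeats`, then score it."""
    rng = random.Random(seed)
    scores = {}
    for k in k_values:
        runs = [kmeans(points, k, rng) for _ in range(repeats)]
        _, best_labels, _ = min(runs, key=lambda run: run[2])
        scores[k] = silhouette(points, best_labels)
    return max(scores, key=scores.get), scores


def make_blobs(seed=1):
    """Synthetic 2-D data: three tight Gaussian blobs (illustrative only)."""
    rng = random.Random(seed)
    centres = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
    return [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
            for cx, cy in centres for _ in range(30)]


points = make_blobs()
best_k, scores = choose_k(points, range(2, 6))
print(best_k, scores)
```

Repeating the runs matters because a single K-Means run can settle in a poor local optimum depending on where its centroids start. Keeping the lowest-inertia run per candidate reduces that sensitivity before the candidates are compared.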
|Research Institute, Centre or Group:||Cultural Communication and Computing Research Institute > Communication and Computing Research Centre|
|Depositing User:||Kassim Mwitondi|
|Date Deposited:||24 Sep 2012 17:21|
|Last Modified:||24 Sep 2012 17:21|