MWITONDI, Kassim and SAID, Raed (2012). A data-based method for determining the most suitable model from a finite set of heterogeneous models. Journal of Statistics Applications and Probability (JSAP). (In Press)
Full text not available from this repository.
Detecting and labelling naturally arising structures in data are fundamental to modelling accuracy and reliability. This paper proposes a step-wise method for carrying out the two tasks while balancing these two modelling attributes. Its key strategy is to use methods originally developed for detecting outliers to identify naturally arising patterns in continuous data. The detected natural groupings are simultaneously labelled using internally devised, data-dependent parametric rules, and predictive modelling is then carried out using standard data mining models. Results from applying the method to both simulated and real data show that it provides robust features capable of resisting data over-fitting. The paper makes key recommendations relating to the future of knowledge extraction from data across applications and highlights future challenges and opportunities. In particular, it focuses on addressing model complexity issues by appropriately taming data sources, models and related parameters in a monitored triangular balance.
Key Words: Accuracy, data mining, data recycling, EM algorithm, forward search, over-fitting, supervised modelling, unsupervised modelling
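The general shape of the workflow described in the abstract (detect groupings without labels, label them, then fit a predictive model on those labels) can be illustrated with a minimal sketch. This is not the authors' method, whose outlier-based detection and parametric labelling rules are not given in this record. As stated assumptions, the sketch swaps in a two-component one-dimensional Gaussian mixture fitted by the EM algorithm for the unsupervised step and a nearest-class-mean classifier for the supervised step. All function names and the simulated data are hypothetical.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std sigma at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def em_two_gaussians(xs, iters=50):
    """Fit a two-component 1-D Gaussian mixture by EM (assumed stand-in
    for the paper's grouping step). Returns (weights, means, sigmas)."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n) or 1.0
    w, mu, sigma = [0.5, 0.5], [min(xs), max(xs)], [sd, sd]
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        resp = []
        for x in xs:
            p = [w[k] * normal_pdf(x, mu[k], sigma[k]) for k in range(2)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weights, means and standard deviations
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # leave an empty component unchanged
            w[k] = nk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sigma[k] = max(math.sqrt(var), 1e-6)
    return w, mu, sigma


def label_points(xs, w, mu, sigma):
    """Label each point with the index of its most probable component."""
    return [
        max(range(2), key=lambda k: w[k] * normal_pdf(x, mu[k], sigma[k]))
        for x in xs
    ]


def fit_nearest_mean(xs, labels):
    """Supervised step: store the mean of each labelled class."""
    means = {}
    for c in set(labels):
        members = [x for x, l in zip(xs, labels) if l == c]
        means[c] = sum(members) / len(members)
    return means


def predict(model, x):
    """Assign x to the class whose stored mean is closest."""
    return min(model, key=lambda c: abs(x - model[c]))


# Simulated data (hypothetical): two well-separated groups
random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(100)] + \
     [random.gauss(10.0, 1.0) for _ in range(100)]

w, mu, sigma = em_two_gaussians(xs)
labels = label_points(xs, w, mu, sigma)
model = fit_nearest_mean(xs, labels)
```

Calling `predict(model, x)` on a new value then returns the label of the group it most resembles. In a fuller treatment, the supervised model would be evaluated on held-out data to check for the over-fitting the abstract refers to.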
|Research Institute, Centre or Group:||Cultural Communication and Computing Research Institute > Communication and Computing Research Centre|
|Depositing User:||Kassim Mwitondi|
|Date Deposited:||20 Sep 2012 17:38|
|Last Modified:||20 Sep 2012 17:38|