MWITONDI, Kassim and SAID, Raed (2011). A step-wise method for labelling continuous data with a focus on striking a balance between predictive accuracy and model reliability. Kuwait Journal of Sciences and Engineering. (In Press)
Full text: PDF, Accepted Version (738kB)
Abstract
Detecting and labelling naturally arising structures in data are fundamental to modelling accuracy and reliability. This paper proposes a step-wise method for carrying out the two tasks while balancing the two modelling attributes. Its key strategy is to use methods originally developed for detecting outliers to identify naturally arising patterns in continuous data. The detected natural groupings are simultaneously labelled using internally devised, data-dependent parametric rules, and predictive modelling is then carried out with standard data mining models. Results from applying the method to both simulated and real data show that it provides robust features capable of resisting data over-fitting. The paper makes key recommendations relating to the future of knowledge extraction from data across applications and highlights future challenges and opportunities. In particular, it focuses on addressing model complexity issues by appropriately taming data sources, models and related parameters in a monitored triangular balance.

Key Words: Accuracy, data mining, data recycling, EM algorithm, forward search, over-fitting, supervised modelling, unsupervised modelling
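The general idea of detecting groupings in continuous data and attaching labels to them before any supervised modelling can be illustrated with a small sketch. The code below is not the paper's procedure: it substitutes a plain two-component Gaussian mixture fitted by the EM algorithm (one of the listed key words) for the outlier-based detection step, and it labels each point by its more probable component rather than by the paper's parametric rules, which the abstract does not specify. The function names, the simulated data and all parameter values are assumptions made for illustration only.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_two_groups(xs, n_iter=100):
    """Fit a two-component 1-D Gaussian mixture by EM and label each point.

    Hypothetical helper for illustration; not the method described in the paper.
    Returns (labels, means), where labels[i] is 0 or 1.
    """
    n = len(xs)
    # Crude initialisation: means at the data extremes, shared variance.
    means = [min(xs), max(xs)]
    overall_mean = sum(xs) / n
    var0 = max(sum((x - overall_mean) ** 2 for x in xs) / n, 1e-6)
    variances = [var0, var0]
    weights = [0.5, 0.5]
    resp = [0.5] * n

    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to group 1.
        resp = []
        for x in xs:
            p0 = weights[0] * normal_pdf(x, means[0], variances[0])
            p1 = weights[1] * normal_pdf(x, means[1], variances[1])
            denom = p0 + p1
            resp.append(p1 / denom if denom > 0.0 else 0.5)

        # M-step: re-estimate weight, mean and variance of each group.
        for k in (0, 1):
            r = [ri if k == 1 else 1.0 - ri for ri in resp]
            total = max(sum(r), 1e-12)
            weights[k] = total / n
            means[k] = sum(ri * x for ri, x in zip(r, xs)) / total
            var_k = sum(ri * (x - means[k]) ** 2 for ri, x in zip(r, xs)) / total
            variances[k] = max(var_k, 1e-6)  # floor avoids a collapsing component

    # Labelling step: assign each point to its more probable group.
    labels = [1 if ri >= 0.5 else 0 for ri in resp]
    return labels, means


# Simulated continuous data with two well-separated groups (assumed values).
rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(50)] + [rng.gauss(6.0, 1.0) for _ in range(50)]
labels, means = em_two_groups(xs)
```

The resulting `labels` could then serve as the class variable for a standard supervised model, which is the stage at which the trade-off between predictive accuracy and over-fitting would be assessed.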
| Item Type: | Article |
|---|---|
| Research Institute, Centre or Group: | Cultural Communication and Computing Research Institute > Communication and Computing Research Centre |
| Depositing User: | Kassim Mwitondi |
| Date Deposited: | 20 Sep 2012 16:36 |
| Last Modified: | 20 Sep 2012 16:36 |
| URI: | http://shura.shu.ac.uk/id/eprint/5264 |