A step-wise method for labelling continuous data with a focus on striking a balance between predictive accuracy and model reliability

MWITONDI, Kassim and SAID, Raed (2011). A step-wise method for labelling continuous data with a focus on striking a balance between predictive accuracy and model reliability. Kuwait Journal of Sciences and Engineering. (In Press)

PDF - Accepted Version
    Official URL: http://www.pubcouncil.kuniv.edu.kw/kjse/english/co...

    Abstract

    Detecting and labelling naturally arising structures in data are fundamental to modelling accuracy and reliability. This paper proposes a step-wise method for carrying out the two tasks while balancing the two modelling attributes. Its key strategy involves using methods initially developed for detecting outliers in data to identify naturally arising patterns in continuous data. The detected natural groupings are simultaneously labelled based on internally-devised data-dependent parametric rules before carrying out predictive modelling using standard data mining models. Results from applying the method on both simulated and real data show that the method provides robust features capable of resisting data over-fitting. The paper makes key recommendations relating to the future of knowledge extraction from data across applications and highlights future challenges and opportunities. In particular, it focuses on addressing model complexity issues by appropriately taming data sources, models and related parameters in a monitored triangular balance.

    Key Words: Accuracy, data mining, data recycling, EM algorithm, forward search, over-fitting, supervised modelling, unsupervised modelling
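The detect, label, then predict sequence described in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure: the forward search and the paper's data-dependent labelling rules are not reproduced here. As stand-ins, the sketch assumes a two-component one-dimensional Gaussian mixture fitted by EM for the detection step, a 0.5 posterior cut-off for labelling, and a nearest-class-mean classifier for the predictive step. The simulated group means (0 and 6), the sample size and the function names are all hypothetical choices made for illustration.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def em_two_gaussians(xs, iters=100):
    """Fit a two-component 1-D Gaussian mixture by EM.

    Returns (weights, means, sds, resp), where resp[i] is the posterior
    probability, from the final E-step, that xs[i] belongs to component 1.
    """
    lo, hi = min(xs), max(xs)
    means = [lo + 0.25 * (hi - lo), lo + 0.75 * (hi - lo)]
    sds = [max((hi - lo) / 4.0, 1e-6)] * 2
    weights = [0.5, 0.5]
    resp = [0.5] * len(xs)
    for _ in range(iters):
        # E-step: posterior membership probabilities.
        resp = []
        for x in xs:
            p0 = weights[0] * normal_pdf(x, means[0], sds[0])
            p1 = weights[1] * normal_pdf(x, means[1], sds[1])
            resp.append(p1 / (p0 + p1) if p0 + p1 > 0 else 0.5)
        # M-step: re-estimate weights, means and standard deviations.
        for k in (0, 1):
            r = [ri if k == 1 else 1.0 - ri for ri in resp]
            nk = max(sum(r), 1e-12)
            weights[k] = nk / len(xs)
            means[k] = sum(ri * x for ri, x in zip(r, xs)) / nk
            var = sum(ri * (x - means[k]) ** 2 for ri, x in zip(r, xs)) / nk
            sds[k] = max(math.sqrt(var), 1e-6)
    return weights, means, sds, resp


def demo(seed=0, n=200):
    """Run the three steps on simulated data; return (mixture means, accuracy)."""
    rng = random.Random(seed)
    # Simulated continuous data from two well-separated groups (assumed values).
    data = [(rng.gauss(0.0, 1.0), 0) for _ in range(n)]
    data += [(rng.gauss(6.0, 1.0), 1) for _ in range(n)]
    rng.shuffle(data)
    train, test = data[:n], data[n:]

    # Step 1: detect groupings in the unlabelled training values.
    xs = [x for x, _ in train]
    _, means, _, resp = em_two_gaussians(xs)

    # Step 2: label each training point by its more probable component.
    labels = [1 if r > 0.5 else 0 for r in resp]

    # Step 3: fit a simple supervised model (nearest class mean).
    centroids = []
    for k in (0, 1):
        members = [x for x, lab in zip(xs, labels) if lab == k]
        centroids.append(sum(members) / len(members))

    # Evaluate on held-out points against the known generating group.
    hits = sum(
        1 for x, truth in test
        if (0 if abs(x - centroids[0]) <= abs(x - centroids[1]) else 1) == truth
    )
    return means, hits / len(test)


if __name__ == "__main__":
    fitted_means, accuracy = demo()
    print("fitted component means:", fitted_means)
    print("held-out accuracy:", accuracy)
```

Scoring on a held-out split rather than on the points used for labelling is one simple way to check for the over-fitting the abstract refers to. With groups this well separated the check is easy to pass, so real data would need a more careful validation design.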

    Item Type: Article
    Research Institute, Centre or Group: Cultural Communication and Computing Research Institute > Communication and Computing Research Centre
    Depositing User: Kassim Mwitondi
    Date Deposited: 20 Sep 2012 16:36
    Last Modified: 20 Sep 2012 16:36
    URI: http://shura.shu.ac.uk/id/eprint/5264
