A data-driven method for selecting optimal models based on graphical visualisation of differences in sequentially fitted ROC model parameters.

MWITONDI, Kassim, MOUSTAFA, Rida and HADI, Ali (2011). A data-driven method for selecting optimal models based on graphical visualisation of differences in sequentially fitted ROC model parameters. CODATA Data Science Journal. (In Press)

PDF - Accepted Version
    Official URL: https://www.jstage.jst.go.jp

    Abstract

    Differences in modelling techniques and model performance assessments typically impinge on the quality of knowledge extraction from data. We propose an algorithm for determining optimal patterns in data by separately training and testing three decision tree models on the Pima Indians Diabetes and the BUPA Liver Disorders datasets. Model performance is assessed using ROC curves and the Youden Index. Moving differences between sequentially fitted parameters are then extracted, and their probability density estimates are used to track their variability through an iterative graphical data visualisation technique developed for this purpose. Our results show that the proposed strategy separates the groups more robustly than the plain ROC/Youden approach, eliminates obscurity and minimises over-fitting. Further, the algorithm can easily be understood by non-specialists and demonstrates multi-disciplinary compliance.

    Key Words: Bayesian Error, Data Mining, Decision Trees, Domain Partitioning, Data Visualisation, Optimal Bandwidth, ROC curves, Visual Analytics, Youden Index
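
The building blocks named in the abstract (ROC points, the Youden Index, moving differences between sequentially fitted values, and a kernel density estimate of those differences) can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names, the normal-reference bandwidth rule and the Gaussian kernel are all assumptions made for illustration, and the abstract does not say which parameters are differenced or how the bandwidth is chosen.

```python
import math
import statistics


def roc_points(scores, labels):
    """Return (fpr, tpr, threshold) triples, one per distinct score threshold.

    A case is predicted positive when its score is >= the threshold.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos, t))
    return points


def youden_index(points):
    """Return (J, threshold), where J = max over thresholds of TPR - FPR."""
    fpr, tpr, t = max(points, key=lambda p: p[1] - p[0])
    return tpr - fpr, t


def moving_differences(values):
    """First differences between consecutive values of a fitted sequence."""
    return [b - a for a, b in zip(values, values[1:])]


def gaussian_kde(sample, grid):
    """Gaussian kernel density estimate evaluated on `grid`.

    Assumption: bandwidth from the normal-reference rule of thumb,
    h = 1.06 * sd * n**(-1/5); the paper may use a different choice.
    """
    n = len(sample)
    h = 1.06 * statistics.stdev(sample) * n ** (-1 / 5)
    norm = n * h * math.sqrt(2 * math.pi)
    return [
        sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in sample) / norm
        for x in grid
    ]
```

In use, one would compute a Youden Index (or another fitted quantity) for each model in a sequence of fits, pass that sequence to `moving_differences`, and plot the output of `gaussian_kde` to inspect how variable the differences are.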

    Item Type: Article
    Research Institute, Centre or Group: Cultural Communication and Computing Research Institute > Communication and Computing Research Centre
    Depositing User: Kassim Mwitondi
    Date Deposited: 31 Aug 2012 16:55
    Last Modified: 31 Aug 2012 16:55
    URI: http://shura.shu.ac.uk/id/eprint/5263