Evolved Apache Lucene SpanFirst Queries are Good Text Classifiers

HIRSCH, Laurence (2010). Evolved Apache Lucene SpanFirst Queries are Good Text Classifiers. In: IEEE Congress on Evolutionary Computation, 2009. CEC '09. IEEE Press, 1-8.

    Link to published version: 10.1109/CEC.2010.5585955

    Abstract

    Human-readable text classifiers have a number of advantages over classifiers based on complex and opaque mathematical models. Search queries or rules, constructed either manually or automatically, have long been used for classification. We performed experiments using genetic algorithms to evolve text classifiers in search query format, with the combined objective of classifier accuracy and classifier readability. We found that a small set of disjunct Lucene SpanFirst queries effectively meets both goals. A query of this kind evaluates to true for a document if a particular word occurs within the first N words of that document. Previously researched classifiers based on queries that combine words with OR, AND and NOT were found to be generally less accurate and (arguably) less readable. The approach is evaluated on the standard test sets Reuters-21578 and Ohsumed and compared against several classification algorithms.
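
    The SpanFirst semantics described in the abstract can be illustrated with a minimal plain-Python sketch. It does not use the Lucene API and is not the author's implementation. The tokenizer, the function names (`tokenize`, `span_first`, `classify`) and the example rules for a "grain" category are all assumptions made for illustration, not values from the paper.

```python
import re


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (a simplifying assumption;
    a real Lucene analyzer would behave differently)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def span_first(tokens, word, n):
    """Return True if `word` occurs within the first `n` tokens."""
    return word in tokens[:n]


def classify(text, rules):
    """Disjunction of SpanFirst-style rules: the document is assigned to the
    category if any (word, n) rule matches."""
    tokens = tokenize(text)
    return any(span_first(tokens, word, n) for word, n in rules)


# Hypothetical rules for an illustrative "grain" category (not from the paper).
GRAIN_RULES = [("wheat", 20), ("corn", 10)]

doc_a = "Wheat exports rose sharply this quarter."
doc_b = ("The central bank raised interest rates again on Tuesday and "
         "analysts only later mentioned corn futures.")

print(classify(doc_a, GRAIN_RULES))  # "wheat" is the first token
print(classify(doc_b, GRAIN_RULES))  # "corn" appears only after the first 10 tokens
```

    Each rule is a (word, N) pair, so a whole classifier can be read as a short list such as "wheat within the first 20 words OR corn within the first 10 words", which is the kind of readability the abstract argues for.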

    Item Type: Book Section
    Identification Number: 10.1109/CEC.2010.5585955
    Depositing User: Laurence Hirsch
    Date Deposited: 05 Jun 2011 21:02
    Last Modified: 05 Jun 2011 21:02
    URI: http://shura.shu.ac.uk/id/eprint/3422
