Automatic offensive language detection from Twitter data using machine learning and feature selection of metadata

ARAUJO DE SOUZA, Gabriel and DA COSTA ABREU, Marjory (2020). Automatic offensive language detection from Twitter data using machine learning and feature selection of metadata. In: IEEE World Congress on Computational Intelligence (IEEE WCCI). IEEE. [Book Section]

Documents

PDF
IEEE_WCCI_Artificial_Intelligence.pdf - Accepted Version
Available under License All rights reserved.

Abstract

Social networks have continued to grow in popularity in recent years. In principle, social media exists so that people can share their views online, keep in contact with loved ones, or share good moments of their lives. In practice, some users share hate speech, bully specific individuals, or create bots whose only goal is to target specific situations or people. Identifying who wrote such text is not easy, and there are several possible approaches, such as natural language processing or machine learning algorithms that make predictions from the metadata associated with the text. In this work, we present an initial investigation into which machine learning techniques are best suited to detecting offensive language in tweets. After analysing current trends in the literature on text classification techniques, we selected the Linear SVM and Naive Bayes algorithms for our initial tests. For data preprocessing, we used different attribute selection techniques, which are justified in the literature section. In our experiments, we obtained 92% accuracy and 95% recall in detecting offensive language with Naive Bayes, and 90% accuracy and 92% recall with Linear SVM. To our understanding, these results surpass those reported in the related literature and are a good indication of the importance of the data description approach we used.
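To make the reported metrics concrete, the following is a minimal sketch of a multinomial Naive Bayes text classifier together with the accuracy and recall calculations. It is not the authors' implementation: the whitespace tokenizer, the add-one smoothing, the helper names (`train_nb`, `predict_nb`, `accuracy_and_recall`), and the tiny example messages are all assumptions made for illustration, and the paper's metadata features and attribute selection are not reproduced here.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: plain lowercase whitespace tokenization, for illustration only.
    return text.lower().split()


def train_nb(docs, labels):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, text):
    """Return the class with the highest log-posterior under add-one smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # ignore words never seen in training
            likelihood = (word_counts[label][tok] + 1) / (total_words + len(vocab))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy_and_recall(y_true, y_pred, positive):
    """Accuracy over all items; recall = TP / (TP + FN) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall


# Invented toy messages (not taken from the paper's dataset).
train_docs = [
    "you are a stupid idiot",
    "shut up you idiot",
    "what a stupid thing to say",
    "have a lovely day",
    "thank you for the lovely photo",
    "great day with friends",
]
train_labels = ["offensive"] * 3 + ["clean"] * 3

test_docs = ["you idiot", "lovely day friends", "stupid photo"]
test_labels = ["offensive", "clean", "offensive"]

model = train_nb(train_docs, train_labels)
predictions = [predict_nb(model, doc) for doc in test_docs]
accuracy, recall = accuracy_and_recall(test_labels, predictions, positive="offensive")
print(predictions, accuracy, recall)
```

Recall is computed for the offensive class only, since it measures how many truly offensive messages the classifier catches, while accuracy counts correct predictions across both classes. Scores on a toy set like this say nothing about performance on real tweets.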

More Information

Official URL:

https://ieeexplore.ieee.org/document/9207652

Additional Information:

© 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. ISSN: 2161-4407

Event Location:

Glasgow, UK

Related URLs: