Automatic offensive language detection from Twitter data using machine learning and feature selection of metadata

ARAUJO DE SOUZA, Gabriel and DA COSTA ABREU, Marjory (2020). Automatic offensive language detection from Twitter data using machine learning and feature selection of metadata. In: IEEE World Congress on Computational Intelligence (IEEE WCCI). IEEE. [Book Section]

Documents
IEEE_WCCI_Artificial_Intelligence.pdf - Accepted Version
Available under License All rights reserved.

Abstract
Social networks have grown steadily in popularity in recent years. In principle, social media lets us share our views online, keep in contact with loved ones and share good moments of our lives. In practice, some people use it to spread hate speech or to bully specific individuals, and some create bots whose only goal is to target specific situations or people. Identifying who wrote such text is not easy, and there are several possible approaches, such as natural language processing or machine learning algorithms that make predictions from the metadata associated with the text. In this work, we present an initial investigation into which machine learning techniques are best suited to detecting offensive language in tweets. After analysing current trends in the literature on text classification, we selected the Linear SVM and Naive Bayes algorithms for our initial tests. For data preprocessing, we used several attribute selection techniques, which are justified in the literature section. In our experiments, we obtained 92% accuracy and 95% recall in detecting offensive language with Naive Bayes, and 90% accuracy and 92% recall with Linear SVM. To our understanding, these results surpass those reported in the related literature and are a good indication of the importance of the data description approach we have used.
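
As a rough illustration of the kind of pipeline the abstract describes, the sketch below trains a small multinomial Naive Bayes text classifier and reports accuracy and recall. It is not the authors' code: the whitespace tokenizer, the add-one smoothing, the use of word counts instead of tweet metadata, and the four training sentences are all assumptions made for the example, and the toy data is hypothetical rather than taken from the paper's dataset.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: plain lowercase whitespace tokenization.
    return text.lower().split()


def train_naive_bayes(texts, labels):
    """Count documents per class and word occurrences per class."""
    class_docs = Counter(labels)
    word_counts = {c: Counter() for c in class_docs}
    for text, label in zip(texts, labels):
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_docs, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log-posterior (add-one smoothing)."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for c in sorted(class_docs):
        total_words = sum(word_counts[c].values())
        score = math.log(class_docs[c] / total_docs)
        for w in tokenize(text):
            if w in vocab:  # words never seen in training are ignored
                score += math.log(
                    (word_counts[c][w] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = c, score
    return best_label


def accuracy(y_true, y_pred):
    # Fraction of predictions that match the true label.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    # Fraction of truly positive items that were predicted positive.
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical toy data: 1 = offensive, 0 = not offensive.
train_texts = [
    "you are an idiot",
    "what a stupid idiot",
    "have a lovely day",
    "what a lovely photo",
]
train_labels = [1, 1, 0, 0]
test_texts = ["stupid idiot", "lovely day"]
test_labels = [1, 0]

model = train_naive_bayes(train_texts, train_labels)
predictions = [predict(model, t) for t in test_texts]
acc = accuracy(test_labels, predictions)
rec = recall(test_labels, predictions)
```

Recall is reported alongside accuracy because, in offensive language detection, a missed offensive message (a false negative) is usually the costlier error, and accuracy alone can hide it when the classes are imbalanced.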