Automatic offensive language detection from Twitter data using machine learning and feature selection of metadata

ARAUJO DE SOUZA, Gabriel and DA COSTA ABREU, Marjory (2020). Automatic offensive language detection from Twitter data using machine learning and feature selection of metadata. In: IEEE World Congress on Computational Intelligence (IEEE WCCI). IEEE.

PDF
IEEE_WCCI_Artificial_Intelligence.pdf - Accepted Version
All rights reserved.

Official URL: https://ieeexplore.ieee.org/document/9207652
Link to published version: https://doi.org/10.1109/IJCNN48605.2020.9207652

Abstract

The popularity of social networks has only increased in recent years. In theory, social media was proposed so that we could share our views online, keep in contact with loved ones or share good moments of life. The reality is less perfect: some people share hate speech-related messages or use social media to bully specific individuals, and others create robots whose only goal is to target specific situations or people. Identifying who wrote such text is not easy, and there are several possible ways of doing it, such as natural language processing or machine learning algorithms that can investigate the metadata associated with the text and make predictions from it. In this work, we present an initial investigation of which machine learning techniques are best at detecting offensive language in tweets. After analysing the current trends in the literature on recent text classification techniques, we selected the Linear SVM and Naive Bayes algorithms for our initial tests. For data preprocessing, we used different attribute selection techniques, which are justified in the literature section. In our experiments, we obtained 92% accuracy and 95% recall in detecting offensive language with Naive Bayes, and 90% accuracy and 92% recall with Linear SVM. To our understanding, these results surpass those in the related literature and are a good indication of the importance of the data description approach we have used.
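
The abstract reports its results as accuracy and recall. As a minimal sketch of how these two metrics are conventionally computed for a binary "offensive" versus "not offensive" task, the Python below may help. The function names and the label lists are hypothetical illustrations, not the authors' code or data, and the numbers they produce are unrelated to the figures reported above.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of all examples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def recall(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Fraction of truly positive examples that the classifier also marked positive."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must be of equal length")
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    false_neg = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if true_pos + false_neg == 0:
        raise ValueError("no positive examples in y_true; recall is undefined")
    return true_pos / (true_pos + false_neg)


# Hypothetical toy labels (1 = offensive, 0 = not offensive); assumed for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

print(f"accuracy = {accuracy(y_true, y_pred):.2f}")
print(f"recall   = {recall(y_true, y_pred):.2f}")
```

Recall is the more telling figure when missing an offensive tweet costs more than flagging a harmless one, which is presumably why the abstract reports it alongside accuracy.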

Item Type: Book Section
Additional Information: © 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. ISSN: 2161-4407
Identification Number: https://doi.org/10.1109/IJCNN48605.2020.9207652
SWORD Depositor: Symplectic Elements
Depositing User: Symplectic Elements
Date Deposited: 31 Mar 2020 17:09
Last Modified: 17 Mar 2021 20:01
URI: https://shura.shu.ac.uk/id/eprint/26018