Gender bias detection on hate speech classification: an analysis at feature-level

NASCIMENTO, Francimaria R. S., CAVALCANTI, George D. C. and COSTA-ABREU, Marjory Da (2025). Gender bias detection on hate speech classification: an analysis at feature-level. Neural Computing and Applications, 37 (5), 3887-3905.

Documents

521_2024_Article_10841.pdf - Published Version (715kB)
Available under License Creative Commons Attribution.

521_2024_10841_MOESM1_ESM.pdf - Supplemental Material (46kB)
Available under License Creative Commons Attribution.
Abstract
Hate speech is a growing problem on social media due to the large volume of content being shared. Recent works have demonstrated the usefulness of distinct machine learning algorithms combined with natural language processing techniques to detect hateful content. However, when not constructed with the necessary care, learning models can magnify discriminatory behaviour and incorrectly associate comments containing specific identity terms (e.g., woman, black, and gay) with a particular class, such as hate speech. Moreover, specific characteristics of the test set should be considered when evaluating the presence of bias, since the test set can follow the same biased distribution as the training set and compromise the results obtained by the bias metrics. This work argues that the potential bias in hate speech detection needs to be taken into account and focuses on developing an intelligent system to address these limitations. Firstly, we propose a comprehensive, unbiased dataset for unintended gender bias evaluation. Secondly, we propose a framework to help analyse bias arising from feature extraction techniques. Then, we evaluate several state-of-the-art feature extraction techniques, focusing specifically on bias towards identity terms. We consider six feature extraction techniques, namely TF, TF-IDF, FastText, GloVe, BERT, and RoBERTa, and six classifiers: LR, DT, SVM, XGB, MLP, and RF. The experimental study across hate speech datasets and a range of classification and unintended bias metrics demonstrates that the choice of feature extraction technique can affect the bias of predictions, and that its effectiveness can depend on the dataset analysed. For instance, combining TF and TF-IDF with DT and MLP resulted in higher bias, while BERT and RoBERTa showed lower bias with the same classifiers for the HE and WH datasets. The proposed dataset and source code will be made publicly available when the paper is published.
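To make the experimental setting concrete, the sketch below illustrates one cell of the study's grid (TF-IDF features with an LR classifier) together with a per-identity-term subgroup AUC, one common family of unintended-bias metric. This is a minimal illustration assuming scikit-learn; the toy comments, labels, and identity-term list are invented for the example and are not the authors' datasets, bias metrics, or code.

```python
# A minimal sketch of one feature-extraction/classifier pair from the study's
# grid (TF-IDF + LR), plus a per-identity-term subgroup AUC as a bias probe.
# The toy comments, labels, and identity-term list below are invented for
# illustration only; they are not the paper's data or its metric implementation.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

texts = [
    "i hate all of you",
    "have a nice day",
    "women are great engineers",
    "women should not have opinions",
    "gay people deserve respect",
    "i hate gay people",
    "black culture has shaped music history",
    "black people do not belong here",
]
labels = [1, 0, 0, 1, 0, 1, 0, 1]  # 1 = hate speech, 0 = non-hate

identity_terms = ["women", "gay", "black"]  # illustrative identity-term list

# Feature extraction (TF-IDF) and classification (LR): one cell of the grid.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)
clf = LogisticRegression().fit(X, labels)
scores = clf.predict_proba(X)[:, 1]

# Subgroup AUC: the AUC restricted to comments mentioning an identity term.
# A large gap against the overall AUC hints at unintended bias for that term.
print(f"overall AUC: {roc_auc_score(labels, scores):.3f}")
for term in identity_terms:
    mask = [term in text.split() for text in texts]
    y_true = [label for label, m in zip(labels, mask) if m]
    y_score = [score for score, m in zip(scores, mask) if m]
    if len(set(y_true)) < 2:
        print(f"{term}: subgroup contains a single class, AUC undefined")
        continue
    print(f"{term}: subgroup AUC {roc_auc_score(y_true, y_score):.3f}")
```

In the paper's setting, the same loop would run over all six feature extractors and six classifiers, with embeddings from FastText, GloVe, BERT, or RoBERTa substituted for the TF-IDF matrix.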