Gender bias detection on hate speech classification: an analysis at feature-level

NASCIMENTO, Francimaria R. S., CAVALCANTI, George D. C. and COSTA-ABREU, Marjory Da (2025). Gender bias detection on hate speech classification: an analysis at feature-level. Neural Computing and Applications, 37 (5), 3887-3905.

Documents

521_2024_Article_10841.pdf - Published Version (715kB)
Available under License Creative Commons Attribution.

521_2024_10841_MOESM1_ESM.pdf - Supplemental Material (46kB)
Available under License Creative Commons Attribution.
Abstract
Hate speech is a growing problem on social media due to the large volume of content being shared. Recent works have demonstrated the usefulness of distinct machine learning algorithms combined with natural language processing techniques to detect hateful content. However, when not constructed with the necessary care, learning models can magnify discriminatory behaviour and incorrectly associate comments containing specific identity terms (e.g., woman, black, and gay) with a particular class, such as hate speech. Moreover, specific characteristics of the test set should be considered when evaluating the presence of bias, since the test set can follow the same biased distribution as the training set and compromise the results obtained by the bias metrics. This work argues that the potential bias in hate speech detection needs to be taken into account and focuses on developing an intelligent system to address these limitations. Firstly, we propose a comprehensive, unbiased dataset for unintended gender bias evaluation. Secondly, we propose a framework to help analyse bias arising from feature extraction techniques. Then, we evaluate several state-of-the-art feature extraction techniques, focusing specifically on bias towards identity terms. We consider six feature extraction techniques, namely TF, TF-IDF, FastText, GloVe, BERT, and RoBERTa, and six classifiers: LR, DT, SVM, XGB, MLP, and RF. The experimental study across hate speech datasets and a range of classification and unintended bias metrics demonstrates that the choice of feature extraction technique can affect the bias of predictions, and that its effectiveness can depend on the dataset analysed. For instance, combining TF and TF-IDF with DT and MLP resulted in higher bias, while BERT and RoBERTa showed lower bias with the same classifiers for the HE and WH datasets. The proposed dataset and source code will be made publicly available when the paper is published.
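To make the experimental setting concrete, the sketch below illustrates one cell of the study's grid (TF-IDF features with an LR classifier) together with a per-identity-term subgroup AUC, one common family of unintended-bias metric. This is a minimal illustration assuming scikit-learn; the toy comments, labels, and identity-term list are invented for the example and are not the authors' datasets, bias metrics, or code.

```python
# A minimal sketch of one feature-extraction/classifier pair from the study's
# grid (TF-IDF + LR), plus a per-identity-term subgroup AUC as a bias probe.
# The toy comments, labels, and identity-term list below are invented for
# illustration only; they are not the paper's data or its metric implementation.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

texts = [
    "i hate all of you",
    "have a nice day",
    "women are great engineers",
    "women should not have opinions",
    "gay people deserve respect",
    "i hate gay people",
    "black culture has shaped music history",
    "black people do not belong here",
]
labels = [1, 0, 0, 1, 0, 1, 0, 1]  # 1 = hate speech, 0 = non-hate

identity_terms = ["women", "gay", "black"]  # illustrative identity-term list

# Feature extraction (TF-IDF) and classification (LR): one cell of the grid.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)
clf = LogisticRegression().fit(X, labels)
scores = clf.predict_proba(X)[:, 1]

# Subgroup AUC: the AUC restricted to comments mentioning an identity term.
# A large gap against the overall AUC hints at unintended bias for that term.
print(f"overall AUC: {roc_auc_score(labels, scores):.3f}")
for term in identity_terms:
    mask = [term in text.split() for text in texts]
    y_true = [label for label, m in zip(labels, mask) if m]
    y_score = [score for score, m in zip(scores, mask) if m]
    if len(set(y_true)) < 2:
        print(f"{term}: subgroup contains a single class, AUC undefined")
        continue
    print(f"{term}: subgroup AUC {roc_auc_score(y_true, y_score):.3f}")
```

In the paper's setting, the same loop would run over all six feature extractors and six classifiers, with embeddings from FastText, GloVe, BERT, or RoBERTa substituted for the TF-IDF matrix.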