Revue de l'Information Scientifique et Technique
Volume 27, Issue 2, Pages 36-43
2023-11-08

Hate Speech Detection Model Based on BERT for the Arabic Dialects

Authors: Chiker Nour Elhouda

Abstract

Hate speech spread through social media can cause personal harm and suffering as well as social tension. Social media platforms, however, cannot moderate all of the content that users post, which creates a demand for automatic hate speech detection. This demand is greater when posts are written in complex languages such as Arabic. This study contributes to hate speech and offensive language detection for Arabic dialects. We propose an approach based on deep learning and a pre-trained BERT model, built by adding GRU and LSTM layers on top of the BERT outputs. To deal with class imbalance in the dataset, we propose two methods: the first augments the data by oversampling the minority class using translation and back-translation, and the second uses focal loss for training. The best results are 88.51% accuracy and 97.46% F1-score with focal loss training, and 89.61% accuracy and 97.78% F1-score with data augmentation.
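
The abstract does not say how the focal loss was configured, so the following is only a minimal sketch of the standard focal loss formulation, FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t), for a single example. The function names, the values gamma = 2.0 and alpha = 1.0, and the two-class probability vectors are illustrative assumptions, not the author's settings. It shows why focal loss can help with class imbalance: confidently correct (easy) examples are down-weighted far more than poorly classified (hard) ones.

```python
import math


def focal_loss(probs, target, gamma=2.0, alpha=1.0):
    """Focal loss for one example (standard formulation).

    probs  -- predicted class probabilities, assumed to sum to 1
    target -- index of the true class
    gamma  -- focusing parameter (assumed value, not from the paper)
    alpha  -- class weight (assumed value, not from the paper)
    """
    # Clamp to avoid log(0) when the model assigns zero probability.
    p_t = min(max(probs[target], 1e-12), 1.0)
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


def cross_entropy(probs, target):
    """Plain cross-entropy, i.e. focal loss with gamma = 0 and alpha = 1."""
    return focal_loss(probs, target, gamma=0.0, alpha=1.0)


# Hypothetical predictions for two examples whose true class is 0.
easy = [0.95, 0.05]  # confidently correct
hard = [0.30, 0.70]  # misclassified

for name, probs in (("easy", easy), ("hard", hard)):
    ce = cross_entropy(probs, 0)
    fl = focal_loss(probs, 0)
    print(f"{name}: cross-entropy={ce:.4f} focal={fl:.4f} ratio={fl / ce:.4f}")
```

With gamma = 2, the easy example keeps only (1 - 0.95)^2 of its cross-entropy loss, while the hard example keeps (1 - 0.30)^2, so training gradients are dominated by the examples the model still gets wrong, which are often from the minority class.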

Keywords

hate speech ; offensive language ; detection ; Arabic dialects ; deep learning ; BERT model ; class imbalance ; oversampling ; focal loss