New machine learning model finds hate tweeting mainly originates from right-leaning figures

Social media platforms have struggled to accurately detect hate speech, especially given the different definitions and contexts of harmful content. A new study in Computer Speech & Language introduces a machine learning model that improves detection by training on multiple datasets. The researchers found that right-leaning figures generated significantly more hate speech and abusive posts than left-leaning figures. This innovative model shows promise in better identifying and moderating hate speech across platforms like Twitter and Reddit.

The rise of social media has created new challenges in managing harmful content, with hate speech being a major issue. Platforms like Twitter, Facebook, and Reddit have struggled to efficiently and accurately detect and remove such content. Automated detection methods, primarily based on machine learning, have been employed to identify hate speech. However, existing methods often fail when applied to new datasets, partly due to the inconsistent definitions of hate speech across different contexts and platforms.

For example, a model trained to detect racist language may perform poorly when tasked with identifying misogynistic or xenophobic comments. The absence of a universal definition of hate speech further complicates the issue. Given this limitation, the research team aimed to create a more robust model that could recognize hate speech across a variety of domains and datasets, improving the accuracy of detection across platforms.

“Our group’s long-term research goals include understanding the creation and spread of online harmful content,” said study author Marian-Andrei Rizoiu, an associate professor leading the Behavioral Data Science lab at the University of Technology Sydney.

“We, therefore, needed a detector for hate speech to be able to track such content online. The issue with existing classifiers is that they capture very narrow definitions of hate speech; our classifier works better because we account for multiple definitions of hate across different platforms. Historically, literature has trained hate speech classifiers on data manually labeled by human experts. This process is expensive (human expertise is slow and costly) and usually leads to biased definitions of hate speech that account for the labeller’s points of view.”

To tackle the issue of generalization, the researchers developed a new model using Multi-task Learning, a technique that allows a single model to learn from multiple datasets simultaneously and thereby capture broader patterns and definitions of hate speech. The idea is that learning from multiple sources at once reduces the biases of any single dataset and improves the model’s ability to detect hate speech in new or unseen contexts.
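To make the idea concrete, here is a minimal sketch (not the authors’ code) of what training on several datasets at once can look like: mini-batches are interleaved from corpora that each keep their own label scheme, so a single shared model is exposed to all of them. The dataset names and labels below are placeholders.

```python
# Minimal sketch of interleaving mini-batches from several hate speech corpora.
# Dataset names and label schemes are illustrative placeholders, not the study's data.
import random
from itertools import cycle

datasets = {
    "twitter_racism": [("example tweet A", "hateful"), ("example tweet B", "normal")],
    "twitter_sexism": [("example tweet C", "sexism"), ("example tweet D", "none")],
    "reddit_abuse":   [("example comment E", "abusive"), ("example comment F", "normal")],
}

def mixed_batches(datasets, batch_size=2, steps=6):
    """Yield (dataset_name, batch) pairs in round-robin order, so every training
    step can update the shared model with examples from a different dataset."""
    names = cycle(datasets)
    for _ in range(steps):
        name = next(names)
        batch = random.choices(datasets[name], k=batch_size)  # sample with replacement
        yield name, batch

for name, batch in mixed_batches(datasets):
    # In the full model, `name` would select which classification head scores this batch,
    # while the shared text encoder receives gradient updates on every step.
    print(name, batch)
```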

The researchers trained their model using eight publicly available hate speech datasets, gathered from platforms such as Twitter, Reddit, Gab, and others. These datasets varied in their definitions and classifications of hate speech, with some focusing on racism, others on sexism, and still others on abusive language more generally. This broad approach helped the model learn from diverse sources, making it less likely to overfit to a specific type of hate speech.

In addition to using existing datasets, the researchers also created a new dataset called “PubFigs,” which contains over 300,000 tweets from 15 American public figures. The figures selected for this dataset included both right-wing and left-wing political figures. By including this new dataset, the researchers tested how well their model could detect hate speech from high-profile individuals and in political contexts.

The model they developed was built on a pre-trained language model known as BERT (Bidirectional Encoder Representations from Transformers). BERT is widely used in natural language processing because it produces rich, context-aware representations of text. The researchers modified BERT by attaching a separate classification layer for each dataset, allowing the model to handle each dataset’s own labels and definition of hate speech. During training, these classification layers were optimized jointly, so the shared BERT encoder learned a more general representation of hate speech spanning all of the datasets.
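The paper’s exact implementation isn’t reproduced here, but a minimal sketch of this kind of architecture, assuming a standard PyTorch and Hugging Face setup, might look like the following; the head names, label counts, and encoder checkpoint are illustrative assumptions.

```python
# Illustrative multi-task classifier: one shared BERT encoder, one classification
# head per dataset. Head names and label counts are placeholders, not the paper's setup.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class MultiTaskHateSpeechModel(nn.Module):
    def __init__(self, heads, encoder_name="bert-base-uncased"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)  # shared across all tasks
        hidden = self.encoder.config.hidden_size
        # One linear classification layer per dataset, each with its own label set.
        self.heads = nn.ModuleDict({name: nn.Linear(hidden, n_labels)
                                    for name, n_labels in heads.items()})

    def forward(self, input_ids, attention_mask, task):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]   # [CLS] token representation of each text
        return self.heads[task](cls)        # logits for the chosen dataset's labels

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = MultiTaskHateSpeechModel({"twitter_racism": 2, "reddit_abuse": 3})

batch = tokenizer(["an example tweet"], return_tensors="pt", padding=True, truncation=True)
logits = model(batch["input_ids"], batch["attention_mask"], task="twitter_racism")
# The loss from any head backpropagates into the shared encoder, which is how the
# encoder ends up capturing a broader notion of hate speech than any single dataset.
loss = nn.functional.cross_entropy(logits, torch.tensor([0]))
```

Because every head shares the same encoder, updates driven by one dataset’s labels also shape how texts from the other datasets are represented, which is the mechanism behind the improved generalization.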

The Multi-task Learning model outperformed existing state-of-the-art models in detecting hate speech across different datasets. It showed improved accuracy in identifying hate speech, especially when applied to datasets it had not seen during training. This was a key improvement over previous models, which tended to perform well only on the specific datasets they were trained on but struggled when exposed to new data.

For example, in one of the experiments, the researchers used a “leave-one-out” approach, where the model was trained on all but one dataset and then tested on the remaining dataset. In most cases, the new model outperformed other hate speech detection models, particularly when tested on datasets that involved different definitions or types of hate speech. This demonstrates the model’s ability to generalize and adapt to new kinds of harmful content.
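In outline, that protocol can be written as a short loop over datasets; the loader, trainer, and prediction functions below are dummy stand-ins for the study’s pipeline, included only to show the structure of the experiment.

```python
# Sketch of the leave-one-out protocol: hold one dataset out entirely, train on the
# rest, then test on the held-out data. All functions below are dummy stand-ins.

def load_dataset(name):
    # Placeholder: would return (texts, labels) for the named corpus.
    return [f"{name} example {i}" for i in range(4)], [i % 2 for i in range(4)]

def train_model(train_sets):
    # Placeholder for multi-task training over several datasets at once.
    return {"trained_on": sorted(train_sets)}

def predict(model, texts):
    # Placeholder predictions; a real model would score each text.
    return [0 for _ in texts]

dataset_names = ["dataset_a", "dataset_b", "dataset_c", "dataset_d"]  # illustrative names

scores = {}
for held_out in dataset_names:
    train_sets = {n: load_dataset(n) for n in dataset_names if n != held_out}
    model = train_model(train_sets)            # train on everything except the held-out set
    texts, labels = load_dataset(held_out)     # evaluate on data the model never saw
    preds = predict(model, texts)
    scores[held_out] = sum(p == y for p, y in zip(preds, labels)) / len(labels)

print(scores)   # one generalization score per held-out dataset
```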

“There is typically no single definition of hate speech; hate speech is a continuum, as hate can be expressed overtly using slurs and direct references or covertly using sarcasm and even humor,” Rizoiu told PsyPost. “Our study develops tools to account for these nuances by leveraging multiple training datasets and a novel machine learning technique called transfer learning.”

Another interesting finding from the study came from applying the model to the PubFigs dataset. Of the 1,133 tweets classified as hate speech, 1,094 were posted by right-leaning figures, while only 39 came from left-leaning figures. In terms of abusive content, right-leaning figures contributed 5,029 of the 5,299 abusive tweets, with only 270 coming from the left-leaning group. In other words, left-leaning figures accounted for roughly 3.4% of the hate speech and about 5.1% of the abusive content in the dataset.

Among the right-leaning figures, certain individuals stood out for their high levels of problematic content. Ann Coulter, a conservative media pundit known for her provocative views, was responsible for the single largest share of hate speech in the dataset, contributing 464 of the 1,133 hate-labeled tweets (about 41 percent). Former President Donald Trump also posted a significant number of problematic tweets, with 85 classified as hate speech and 197 as abusive content. Other prominent right-wing figures, such as Alex Jones and Candace Owens, also had high levels of flagged content.

On the other hand, left-leaning figures posted far fewer problematic tweets. For example, Senator Bernie Sanders, former President Barack Obama, and former First Lady Michelle Obama had no tweets labeled as abusive. Alexandria Ocasio-Cortez had only four tweets classified as hate speech and four classified as abusive, while Ilhan Omar had 23 flagged as hate speech and 46 as abusive.

“What surprised us was the fact that abusive speech appears not to be solely the trait of right-leaning figures,” Rizoiu said. “Left-leaning figures also spread abusive content in their postings. While this content would not necessarily be considered hate speech in most definitions, it was abusive.”

The content of the hate speech and abusive posts also differed between right-leaning and left-leaning figures. For right-leaning figures, the hateful content often targeted specific groups, including Muslims, women, immigrants, and people of color.

“We find that most hate-filled tweets target topics such as religion (particularly Islam), politics, race and ethnicity, women and refugees and immigrants,” Rizoiu said. “It is interesting how most hate is directed towards the most vulnerable cohorts.”

In comparison, the left-leaning figures’ tweets were less focused on inflammatory rhetoric. The few instances of problematic content from this group were often related to discussions of social justice or political topics.

While the study showed significant improvements in hate speech detection, there were still some limitations. One issue was the challenge of handling subtle or covert forms of hate speech. The researchers noted that their model might miss more nuanced expressions of hate that don’t use overtly harmful language but still contribute to a hostile environment. Future research could explore how to enhance the model’s ability to detect these more subtle forms of hate.

Additionally, the study’s reliance on labeled datasets presents a potential limitation. While Multi-task Learning helps reduce the biases inherent in individual datasets, these biases are not completely eliminated. The datasets used in the study, like many others, are subject to human labeling, which can introduce inconsistencies or inaccuracies.

“While our model builds more encompassing definitions and detections of hate speech, they still depend on the original datasets’ labelling,” Rizoiu explained. “That is, we average over human expert viewpoints, but if they are all biased similarly (say, they are all academics who share a similar bias), then even our encompassing model will have these general biases.”

“Our group’s research is modelling the spread of online content via the digital word-of-mouth process. We concentrate particularly on harmful content (misinformation, disinformation, hate speech) and its effects on the offline world. For example, we want to understand why people engage with harmful content, what makes it attractive, and why it spreads widely.”

“Detection is only the first phase in addressing an online issue,” Rizoiu added. “The question is how we develop and deploy effective methods in the real online world that can protect against harmful content without impeding rights such as free speech. Works like our study provide effective detection approaches that online platforms could incorporate to protect their users, particularly the most vulnerable, such as children and teens, from hate speech.”

The study, “Generalizing Hate Speech Detection Using Multi-Task Learning: A Case Study of Political Public Figures,” was authored by Lanqin Yuan and Marian-Andrei Rizoiu.