Researchers discover surprising gender gap in online reviews on “a gigantic scale”

In a recent study published in Nature Human Behaviour, researchers uncovered a “gender rating gap” in online reviews: across more than a billion reviews from major platforms such as Amazon, Google, and Yelp, women consistently gave higher ratings to products and services than men did. Follow-up lab studies suggest that the discrepancy is largely due to women’s reluctance to post negative reviews.

Online reviews are a primary source of information for consumers, who rely on others’ experiences to make informed decisions. Such reviews are prone to biases, however, and previous research suggests that gender shapes a range of online behaviors and forms of expression. This study investigated whether men and women show different tendencies when rating products or experiences, and what drives any observed disparities.

“User-generated content (UGC) over digital platforms has been shown to have an incredible social and economic influence on our daily lives,” said study author Yaniv Dover, the Vice Dean of Research at the Jerusalem Business School and a member of the Federmann Center for the Study of Rationality at the Hebrew University of Jerusalem.

“Online reviews are an especially important example of UGC – they are very pervasive and were shown to impact markets, businesses and consumers. Supposedly, the purpose of online reviews is to increase transparency in markets, reduce scamming and help consumers share useful information – empowering consumers and pushing businesses to improve. But it all relies on the fact that reviews are not systematically biased. So, our motivation was to check whether that is true — is there any systematic bias? We tried something fundamental like gender — we were hoping to find that gender doesn’t matter. We were wrong.”

The researchers gathered observational data from five major online platforms (Amazon, Google, IMDb, TripAdvisor, and Yelp) to examine potential gender differences in review ratings. Their goal was to determine whether a consistent gender rating gap existed across different types of platforms, products, and services. In total, they collected 1.2 billion reviews, a dataset spanning many years, geographic locations, and categories of products and services.

Across all platforms, the study consistently found that women provided higher average ratings than men. The gap persisted after the researchers accounted for factors such as review category (e.g., restaurants or e-commerce) and geographic location, and it was consistent across years, indicating that the pattern was stable over time.

The team also analyzed the rating distributions and found that both men and women gravitated toward the positive end of the rating scale, but men were slightly more critical: a larger share of female reviewers gave five-star ratings, while male reviewers were more likely to give low ratings, producing a modest but consistent gender gap.
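To make that comparison concrete, here is a minimal sketch in Python of how such a gap could be computed from review data. The table, column names, and values are hypothetical placeholders; the study’s actual data pipeline is not described at this level of detail.

```python
import pandas as pd

# Hypothetical reviews table; columns and values are illustrative,
# not taken from the study's datasets.
reviews = pd.DataFrame({
    "gender": ["F", "M", "F", "M", "F", "M", "F", "M"],
    "rating": [5, 4, 5, 3, 4, 5, 5, 2],  # star ratings on a 1-5 scale
})

# Mean rating by gender; the "gender rating gap" is the difference.
means = reviews.groupby("gender")["rating"].mean()
print("gap (F - M):", means["F"] - means["M"])

# Share of five-star ratings by gender, mirroring the distributional
# comparison described above.
print(reviews["rating"].eq(5).groupby(reviews["gender"]).mean())
```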

“We have to admit that the fact that women are on average consistently more positive on such a gigantic scale was surprising,” Dover told PsyPost. “When we first started collecting data and were trying to guess what we’ll see, we each did a little not-so-scientific ‘survey’ to try and predict what we’ll find. We asked family, colleagues, students and anecdotally – when they had to guess, women usually got it wrong, i.e., women thought women are generally more negative. We haven’t done extensive surveys, but if our little convenient sample is representative – this may be an interesting misconception – women thinking they are themselves more negative when actually they are more positive.”

To investigate the reasons behind this gender gap, the researchers conducted two lab-based studies, Studies 2 and 3. These studies were intended to test whether differences in attitudes, rating adjustments, or the willingness to post negative reviews might contribute to the observed gap.

Study 2 used a controlled setting to examine how men and women rate something they can experience and evaluate on the spot. The researchers recruited over 1,100 participants from the United States and had them rate AI-generated paintings. The design allowed the researchers to measure both participants’ private attitudes (how they felt about a painting) and their willingness to post a public review when dissatisfied.

Study 2 showed that men and women did not significantly differ in their private attitudes toward the paintings; both genders reacted similarly when judging the art without any social stakes. When asked to post a public review, however, women were more hesitant to give low ratings, especially when dissatisfied. The gender rating gap emerged largely because women were less likely to post negative reviews, consistent with the idea that women may be more concerned about appearing overly critical or being judged for their opinions.

Study 3 aimed to replicate these findings in a new context, testing whether the results held for a different type of content. Instead of paintings, participants listened to short musical pieces and rated them in the same way. The researchers again recruited over 1,100 participants, with an even balance of male and female respondents.

The design of Study 3 was nearly identical to that of Study 2, but it added a new element: participants were asked about their fear of negative evaluation, a measure of their discomfort with receiving criticism or disapproval. This measure was intended to help determine if fear of negative evaluation might be a driving factor in women’s lower likelihood of posting negative reviews.

As with Study 2, the results of Study 3 showed that men and women did not differ in their private attitudes toward the music. However, women again demonstrated a reluctance to post negative reviews when they felt dissatisfied.

The analysis revealed that women reported a higher fear of negative evaluation than men, and that this fear was linked to a lower likelihood of posting negative feedback publicly. This provided further evidence that fear of negative evaluation is a significant factor behind the gender rating gap. The researchers concluded that concerns about social judgment or backlash may discourage women from posting reviews when they are dissatisfied, leading to more positive average ratings from female reviewers.
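As a rough illustration of how such a link can be tested, the sketch below fits a logistic regression of posting behavior on gender and a fear-of-negative-evaluation score. The data are simulated and all variable names and effect sizes are assumptions made for illustration; this is not the authors’ actual model.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated Study 3-style data: whether a dissatisfied participant posted
# a public review, their gender, and a fear-of-negative-evaluation (FNE)
# score. Names and effect sizes are illustrative assumptions.
rng = np.random.default_rng(0)
n = 1000
female = rng.integers(0, 2, n)                  # 1 = female, 0 = male
fne = rng.normal(3.0 + 0.4 * female, 1.0, n)    # assume women score higher on FNE
p_post = 1 / (1 + np.exp(-(1.5 - 0.6 * fne)))   # higher FNE -> less likely to post
posted = rng.binomial(1, p_post)

df = pd.DataFrame({"posted": posted, "female": female, "fne": fne})

# Does FNE predict posting over and above gender? If the gender effect
# shrinks once FNE is included, that is consistent with FNE accounting
# for part of the gap (a simple stand-in for a formal mediation test).
print(smf.logit("posted ~ female + fne", data=df).fit(disp=0).summary())
```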

“This may sound trivial, but it is not: you can’t take what you read online at face value,” Dover said. “When you read online reviews to help you make decisions you need to realize there is a real person behind the content, in a specific social context and influenced by social norms. It’s not just a person providing their authentic opinion, they are affected by their environment and how people may judge them for their opinions.”

“There is this thinking that because communication on the internet is done through screens and keyboards, that people will freely express themselves and write whatever is on their minds. On the contrary, we find that women more readily censor themselves when they think their opinions are negative, whether it is because they don’t want to appear judgmental or they have more empathy towards the review subject.”

“The internet should be a place where everyone feels free to contribute, especially if their opinion is economically important and useful. What we find hints that a big part of the population does not feel that way, and not only that this is not ‘fair,’ it may be hurting whole markets, reducing representativeness and biasing consumer decisions.”

The researchers highlighted some limitations of the study and proposed areas for future investigation.

“Like all scientific work, one should be very careful when interpreting data analysis,” Dover explained. “The main finding of this work is that there is a consistent gap in favor of women, across relatively very big data sets and different platforms and categories. While we ran a few experimental studies, these are just the first efforts in what should be a whole research agenda to go deeper and find out why this gap exists. We find initial evidence of this self-censoring effect, but to really establish it – future research should look at other contexts, try to replicate what we find in and out of the lab and study larger samples.”

“There is more and more evidence that user-generated content on digital platforms is strongly affecting social dynamics and our lives in good and bad ways,” Dover added. “Think about misinformation and negative influence in social media, increasing polarization, on the one hand but also positive social connections in the digital world, more transparency, improved access to useful information, etc.”

“We want to understand more broadly how to construct digital platforms such that they do more good than harm, how to shape technology such that humanity benefits from it towards a better future, while decreasing risks. This is probably one of the most important roles of social scientists studying human behavior on digital platforms, we believe.”

The study, “Gender rating gap in online reviews,” was authored by Andreas Bayerl, Yaniv Dover, Hila Riemer, and Daniel Shapira.