A recent study in The Leadership Quarterly revisits the claim that U.S. states led by women governors achieved better outcomes during COVID-19, including fewer deaths. The study shows that the earlier findings are highly sensitive to specific modeling assumptions. Once those assumptions are adjusted, governor gender has no statistically significant association with COVID-19 deaths.
The motivation behind the new study stemmed from the substantial media and academic attention given to the idea that women political leaders were particularly effective during the early stages of the COVID-19 pandemic. Reports highlighted leaders such as Angela Merkel of Germany and Jacinda Ardern of New Zealand as examples of women who handled the crisis better than many of their male counterparts.
Building on this narrative, academic studies, including an influential 2020 paper by Sergent and Stajkovic, indicated that U.S. states led by women governors experienced fewer COVID-19 deaths. The researchers behind the new study sought to test the robustness of this “women leadership advantage during crisis” hypothesis.
“I am a strong advocate of replication research. No single research project, including ours, should ever be seen as the final answer,” said study author William “Billy” Obenauer, an associate professor of management at the University of Maine.
“Instead, we should work to build on previous research and understand the boundary conditions of previously observed relationships. Given the implications of associating gender with leadership effectiveness, it seemed like using replication to gain a stronger understanding of the relationships reported by Sergent and Stajkovic (2020) was an important task to take on.”
The study was divided into several stages, starting with a literal replication of the original 2020 study by Sergent and Stajkovic. This replication used similar data and methods to see if the original findings—an association between women governors and lower COVID-19 death rates—could be reproduced. The researchers focused on COVID-19 death counts through May 2020 and utilized the same control variables, such as governor attributes (age, political affiliation, and tenure), state population, and various pandemic response measures (stay-at-home orders, travel bans, mask mandates, and more).
The literal replication successfully reproduced the original results, showing that women governors were associated with fewer COVID-19 deaths when the same variables and statistical methods were applied. This initial success validated the dataset and served as a foundation for the subsequent studies, which aimed to explore the robustness of these findings under different conditions.
Next, the researchers conducted four constructive replications (Studies 2A-D) designed to challenge the assumptions and methodological choices of the original study. Constructive replications are studies that repeat previous research but with modifications to the methods or assumptions, aiming to test the robustness and generalizability of the original findings under different conditions.
In the first constructive replication, the researchers tested the effects of removing potentially problematic control variables from the model, such as various non-pharmaceutical interventions. These variables, like stay-at-home orders and travel bans, were problematic because they may have been influenced by the very leaders whose effectiveness was being studied. This could create a bias in the results. Once these variables were removed, the relationship between governor gender and COVID-19 deaths no longer reached statistical significance, suggesting that the original finding was not robust.
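The concern about controlling for measures that leaders themselves influence can be illustrated with a small sketch. The numbers below are invented and are not the study's data; `severity`, `policy`, and `deaths` are hypothetical variables, and `adjusted_effect` is an illustrative helper name. In this toy setup the leader group has no effect on deaths at all, but the policy variable depends on both the leader and on outbreak severity. Adjusting for the policy then manufactures a group "effect" that does not exist.

```python
def mean(values):
    return sum(values) / len(values)

def slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residuals(y, x):
    """Part of y that a straight line in x does not explain."""
    b = slope(x, y)
    mx, my = mean(x), mean(y)
    return [(yi - my) - b * (xi - mx) for xi, yi in zip(x, y)]

def adjusted_effect(outcome, group, covariate):
    """Coefficient on `group` in outcome ~ group + covariate,
    computed by residualizing both on the covariate."""
    ry = residuals(outcome, covariate)
    rg = residuals(group, covariate)
    return sum(a * b for a, b in zip(rg, ry)) / sum(a * a for a in rg)

# Invented toy data (assumption): two leader groups facing identical severity.
group = [0, 0, 0, 0, 1, 1, 1, 1]
severity = [1, 2, 3, 4, 1, 2, 3, 4]
# The policy response depends on the leader AND on severity.
policy = [g + s for g, s in zip(group, severity)]
# Deaths depend on severity only; the leader group has no effect.
deaths = [float(s) for s in severity]

unadjusted = slope(group, deaths)                  # difference in group means
adjusted = adjusted_effect(deaths, group, policy)  # after "controlling" for policy

print(unadjusted, adjusted)  # close to 0.0 and -1.0
```

Without the policy control, the two groups look identical, which is the truth in this toy world. With the control, group 1 appears to have one fewer death per unit, purely as an artifact of conditioning on a variable the leader influenced.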
The second replication introduced a new control variable: the proximity of a state to New York City, the early epicenter of the pandemic in the U.S. States near New York, including New Jersey and Connecticut, experienced high COVID-19 case numbers early in the pandemic. These states were all led by male governors, potentially skewing the original results. When this geographic factor was included in the analysis, the relationship between governor gender and COVID-19 deaths disappeared, indicating that proximity to the pandemic’s epicenter, not leader gender, was the stronger predictor of death rates.
In the third replication, the researchers focused on a key statistical method used in the original study called analysis of covariance (ANCOVA). ANCOVA is a technique that helps control for the effects of other variables (called covariates) when examining the relationship between an independent variable (in this case, governor gender) and a dependent variable (COVID-19 deaths). However, ANCOVA assumes that the relationship between each covariate and the outcome is linear, and several of the variables in the original model did not meet this assumption. When the model was adjusted to account for these non-linear relationships, the association between governor gender and COVID-19 deaths again became non-significant.
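To see why the linearity assumption matters, consider a minimal sketch with invented numbers (not the study's data). The covariate `density` and the helper `adjusted_effect` are hypothetical names, and the squared relationship is an assumption chosen for illustration. Deaths depend only on the covariate, in a curved way, and the two groups cover different ranges of it. A straight-line adjustment leaves a spurious group gap, while modelling the curve removes it.

```python
def mean(values):
    return sum(values) / len(values)

def slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residuals(y, x):
    """Part of y that a straight line in x does not explain."""
    b = slope(x, y)
    mx, my = mean(x), mean(y)
    return [(yi - my) - b * (xi - mx) for xi, yi in zip(x, y)]

def adjusted_effect(outcome, group, covariate):
    """Coefficient on `group` in outcome ~ group + covariate."""
    ry = residuals(outcome, covariate)
    rg = residuals(group, covariate)
    return sum(a * b for a, b in zip(rg, ry)) / sum(a * a for a in rg)

# Invented toy data (assumption): group 1 covers a narrower covariate range.
group = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
density = [1, 2, 3, 4, 5, 1, 3, 5, 7, 9]
# Deaths are a curved function of the covariate; group plays no role.
deaths = [d ** 2 for d in density]

# Straight-line adjustment, as a linear covariate model would do.
linear_adjusted = adjusted_effect(deaths, group, density)
# Adjustment that respects the curvature of the relationship.
curve_adjusted = adjusted_effect(deaths, group, [d ** 2 for d in density])

print(linear_adjusted, curve_adjusted)  # close to -3.6 and 0.0
```

The apparent advantage of 3.6 for group 1 comes entirely from forcing a straight line through a curved relationship. It vanishes once the covariate enters the model in the right functional form.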
In the final constructive replication, the researchers applied all of the previous modifications simultaneously: removing problematic control variables, accounting for proximity to New York City, and correcting for non-linear relationships. In this fully adjusted model, there was no evidence of a relationship between governor gender and COVID-19 deaths, strongly suggesting that the original findings were largely the result of model specification errors.
To go beyond the correlational findings of the earlier studies, the researchers implemented two causal testing methods: a geographic matching design and a regression discontinuity design. These methodologies are better suited to identifying causal relationships by comparing more similar groups and reducing the likelihood of confounding factors.
First, the researchers compared U.S. counties located on either side of state borders where the governors had different genders. The idea was that counties close to each other would share many demographic, economic, and cultural similarities, allowing the researchers to isolate the effect of governor gender on COVID-19 deaths. The results showed no significant difference in COVID-19 deaths between counties in states led by women governors and counties in states led by men governors. This finding suggested that the earlier results linking governor gender to better crisis management did not hold up under more rigorous, localized analysis.
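The logic of a border-pair comparison can be sketched as follows. The death rates are invented, and the sign-flip permutation test is a stand-in chosen for illustration; the article does not describe the test the researchers actually used. Each pair holds one county under a woman governor and its neighbor across the state line under a man governor, and only the within-pair differences are analyzed.

```python
from itertools import product

# Invented toy data (assumption): deaths per 100,000 residents for
# (county under a woman governor, neighboring county under a man governor).
border_pairs = [
    (12.0, 11.0),
    (8.0, 9.5),
    (15.0, 14.0),
    (7.0, 7.5),
    (10.0, 11.0),
    (9.5, 8.0),
]

# Differencing within pairs cancels whatever the neighbors share.
differences = [woman - man for woman, man in border_pairs]
observed_total = sum(differences)
mean_difference = observed_total / len(differences)

# Sign-flip permutation test: if governor gender were irrelevant, each
# difference would be equally likely to have the opposite sign.
as_extreme = 0
total = 0
for signs in product((1, -1), repeat=len(differences)):
    flipped_total = sum(s * d for s, d in zip(signs, differences))
    if abs(flipped_total) >= abs(observed_total):
        as_extreme += 1
    total += 1
p_value = as_extreme / total

print(mean_difference, p_value)
```

In this toy example the average within-pair gap is small compared with how much the gaps vary from pair to pair, so the permutation test gives no reason to reject the idea that governor gender is unrelated to deaths.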
Next, the researchers extended their analysis to Brazilian municipalities. They used a regression discontinuity design, which compares municipalities that had very close elections between male and female candidates. This design is particularly powerful for causal inference because the close election results create a scenario similar to random assignment: the municipalities just above or below the vote threshold for electing a woman are likely very similar in other respects. Again, the analysis found no significant difference in COVID-19 deaths based on the gender of the elected mayor, reinforcing the conclusion that there is no evidence of a causal relationship between leader gender and crisis outcomes.
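A regression discontinuity estimate can be sketched in a few lines. This is a simplified illustration with invented numbers, not the researchers' specification: the bandwidth of 10 percentage points, the straight-line fits, and the names `fit_line` and `rd_estimate` are all assumptions. The running variable is the woman candidate's margin of victory, so the cutoff is zero. The estimate is the gap between two fitted lines at the cutoff.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b = sxy / sxx
    return my - b * mx, b

def rd_estimate(margin, outcome, bandwidth):
    """Jump in the outcome at margin = 0, using separate straight-line
    fits on each side and only elections within the bandwidth."""
    left = [(m, y) for m, y in zip(margin, outcome) if -bandwidth <= m < 0]
    right = [(m, y) for m, y in zip(margin, outcome) if 0 <= m <= bandwidth]
    intercept_left, _ = fit_line(*zip(*left))
    intercept_right, _ = fit_line(*zip(*right))
    return intercept_right - intercept_left

# Invented toy data (assumption): margin in percentage points,
# positive when the woman candidate won.
margin = [-30, -9, -7, -5, -3, -1, 1, 3, 5, 7, 9, 30]

# Scenario A: deaths drift smoothly with the margin, no jump at the cutoff.
deaths_smooth = [20 + 0.5 * m for m in margin]
# Scenario B: the same trend plus a jump of 2 where a woman takes office.
deaths_jump = [20 + 0.5 * m + (2 if m >= 0 else 0) for m in margin]

no_effect = rd_estimate(margin, deaths_smooth, bandwidth=10)
with_effect = rd_estimate(margin, deaths_jump, bandwidth=10)

print(no_effect, with_effect)  # close to 0.0 and 2.0
```

The lopsided elections at ±30 points fall outside the bandwidth and are ignored, which reflects the design's premise that only near-ties resemble random assignment. A finding like the one reported for Brazil corresponds to scenario A, where the fitted lines meet at the cutoff.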
In short, the results of the constructive replications and causal tests demonstrated that the initial findings were highly dependent on specific methodological choices. Once adjustments were made to the model—such as accounting for geographic proximity to pandemic hotspots and removing problematic control variables—the gender effect disappeared. While women leaders were initially celebrated for their pandemic response, this new research suggests that any perceived advantage may have been overstated.
“News organizations get clicks by promoting research with eye-catching findings,” Obenauer told PsyPost. “The public often sees these findings as ‘established knowledge,’ but they’re really just a piece of the puzzle. If you are interested in research that you learn about in the news, look into other academic research on the topic. Before your company implements something based on organizational research reported in the news, consider working with the business school at your local university to understand more about the topic.”
While this study makes a strong case for the importance of robust statistical methods in leadership research, it also has limitations of its own. One notable limitation is the relatively small sample of women governors in the U.S., as only 12 out of 55 states and territories were led by women during the study period. With so few women governors, the state-level analyses have limited statistical power, which makes any real difference harder to detect.
“I can’t stress this enough – no single research project should ever be seen as the final answer,” Obenauer said.
Replication studies are important because they test the reliability and validity of previous research findings, ensuring that results are not due to chance or specific conditions. By repeating studies with varied methods or in different contexts, replication helps confirm the generalizability and robustness of scientific conclusions.
“I am currently leading a global team of more than 30 researchers conducting a large-scale replication of a seminal piece of leadership research through the Advancement of Replications Initiative in Management,” Obenauer said. “We are in the midst of data collection and should have a first draft of a paper ready to submit to a journal this spring. We will also be recruiting collaborators for our next project this spring. I have four or five other active replication projects. I hope that by continuing to publish high-quality replication research that builds upon prior research, rather than tearing it down, my work will contribute to a culture where replication research is embraced by top journals in the organizational sciences.”
“A colleague and I are looking for a large organization that would be interested in having managers and subordinates participate in a research study—we would provide customized insights in exchange for collaboration. I would love to talk with organizational leaders who would like to explore such a partnership.”
The study, “Are women strategic leaders more effective during a crisis than men strategic leaders? A causal analysis of the relationship between strategic leader gender and outcomes during the COVID-19 crisis,” was authored by William G. Obenauer, Jost Sieweke, Nicolas Bastardoz, Paulo R. Arvate, Brooke A. Gazdag, and Tanja Hentschel.