Does the gender-equality paradox actually exist?

Testing the claim that more feminist countries have fewer women in STEM.

Jul 02, 2021

Act I

In 2018, Stoet and Geary published one of the most surprising results in social science in a decade. They took the Global Gender Gap Index (GGGI), which measures gender equality, and plotted it against the percentage of women among STEM graduates.

Finland has high equality but few women in STEM, while Algeria is the opposite. That’s the trend.

Why this would be true is unclear, but the result seems hard to dispute. It’s obvious from the graph that GGGI is measuring something, and you don’t need to trust any fancy statistics. You can just look at the data.

This was picked up by The Atlantic, The American Enterprise Institute, Ars Technica, MacLean’s, and Jordan Peterson. Stoet and Geary also published an article at Quillette, where they suggest the graph is partly due to different levels of interest in STEM and partly to comparative advantage—in places like Finland, girls perform similarly to boys in science but much better in reading, meaning fewer girls have science as their personal best subject.

Act II

Inevitably, this was disputed. Richardson and colleagues took the same data and found that the percentage of women among STEM graduates was completely different. They—I think—contacted the journal, which led to a corrigendum from Stoet and Geary in late 2019. This clarified what’s on the x-axis in the above graph:

The propensity of women to graduate with STEM degrees was a/(a + b), where a is the percentage of women who graduate with STEM degrees (relative to all women graduating) and b is the percentage of men who graduate with STEM degrees (relative to all men graduating).

Get that? Take a country with the following graduates each year:

              Men   Women
STEM degrees  100   10
All degrees   1000  100

Women make up 10/110 or around 9.1% of STEM graduates. However, their formula gives 50%, since 1/10th of women do STEM, just like 1/10th of men.

There’s a good argument for this. The most salient fact about the above country is that few women get degrees, rather than anything STEM-specific. Stoet and Geary’s formula is invariant to this kind of imbalance.
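
To make the difference concrete, here is a small sketch of both calculations applied to the example country. The function names are mine, not from either paper, and the second half just scales up the number of women graduating to show what "invariant" means here.

```python
def raw_share(stem_women, stem_men):
    """Women as a fraction of all STEM graduates."""
    return stem_women / (stem_women + stem_men)


def propensity(stem_women, all_women, stem_men, all_men):
    """The a / (a + b) quantity described in the corrigendum."""
    a = stem_women / all_women  # share of female graduates who took STEM
    b = stem_men / all_men      # share of male graduates who took STEM
    return a / (a + b)


# The example country from the table above.
print(round(raw_share(10, 100), 3))                 # 0.091
print(round(propensity(10, 100, 100, 1000), 3))     # 0.5

# Same country, but ten times as many women graduate (still 1 in 10 in STEM).
print(round(raw_share(100, 100), 3))                # 0.5
print(round(propensity(100, 1000, 100, 1000), 3))   # 0.5
```

The raw share moves from 9.1% to 50% when more women graduate, while the propensity stays at 50% in both cases.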

There’s also a good argument against this formula. Maybe you think that this imbalance is really important, and you don’t want to be invariant to it.

What there’s not a good argument for is calling this quantity “Women Among STEM Graduates (%)”! It’s not clear how this happened. In any case, Stoet and Geary didn’t change much about their paper other than adding the quote above and inserting the word “propensity” everywhere.

Act III

Simultaneously with Stoet and Geary’s corrigendum, Richardson and colleagues published a commentary on the corrected paper. They argue:

Propensities are bad.
It’s not cool to use GGGI because it “measures achieved outcomes, not propensities” and “is not intended to be used to causally explain outcomes”.
Better than GGGI is the ultra-simple Basic Indicator of Gender Inequality (BIGI). Stoet and Geary shouldn’t object to this, since it was proposed by… Stoet and Geary.
If they compute the actual percentage of STEM degrees earned by women and plot it against BIGI, they get this graph, along with a non-significant regression coefficient.

They also published articles in Slate and on their blog. This was picked up by Buzzfeed and The Scientist, but doesn’t seem to have gotten as much publicity as the original article.

Act IV

In 2020, Breda and colleagues joined the party. They published a paper, part of which uses the same propensities as Stoet and Geary. They argue this is worthwhile both because the original result is well-known and because it’s nice to be invariant to imbalances in the overall number of degrees.

Their first observation is that the propensities aren’t just correlated with GGGI, but with all sorts of other stuff as well:

GDP per capita.
The human development index.
Income inequality, measured via the Gini index.
The Coefficient of Human Inequality.

They do a regression to predict propensities from each of these variables (one variable at a time) and get these coefficients (from Table S5):

Everything “good” is associated with fewer women in STEM, be it more GDP, more development, less income/human inequality, or more gender equality.
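
If "one variable at a time" is unclear, here is a toy sketch of that kind of analysis. Everything in it is made up: the data is synthetic, the variable names are placeholders, and I assume a single hidden "development" factor drives all the indicators. It is not Breda et al.'s code or data. It only shows why correlated predictors tend to all get the same sign when each is regressed separately.

```python
import random


def standardize(v):
    """Rescale to mean 0 and standard deviation 1."""
    n = len(v)
    m = sum(v) / n
    sd = (sum((vi - m) ** 2 for vi in v) / n) ** 0.5
    return [(vi - m) / sd for vi in v]


def ols_slope(x, y):
    """Slope from regressing y on a single predictor x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


rng = random.Random(0)
n = 200  # made-up "countries"

# Assumption of this toy: one hidden factor drives every "good" indicator.
development = [rng.gauss(0, 1) for _ in range(n)]
predictors = {
    name: [d + 0.5 * rng.gauss(0, 1) for d in development]
    for name in ["gdp", "hdi", "gggi"]
}
propensity = [-0.5 * d + 0.3 * rng.gauss(0, 1) for d in development]

# One separate regression per predictor, on standardized variables.
y = standardize(propensity)
slopes = {name: ols_slope(standardize(x), y) for name, x in predictors.items()}
for name, slope in slopes.items():
    print(name, round(slope, 2))
```

In this toy, every slope comes out negative, even though none of the three indicators directly causes anything.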

Their goal was to test how all this relates to gender stereotypes. They took the PISA 2012 data, and looked at how boys and girls felt about these two statements:

“Whether or not I do well in mathematics is completely up to me.”
“My parents believe it’s important for me to study mathematics.”

These were chosen because they don’t directly mention gender, reducing the risk of social desirability bias.

Their stereotype score for each country reflects how much boys vs. girls agree with the above statements. If a boy (girl) of equal math ability is more likely to agree, the stereotype score is positive (negative).

Their main result is a second regression to predict STEM propensities, now controlling for the stereotype scores in each country:

Knowing stereotypes makes the other variables less predictive, dramatically so in some cases (Human Inequality), less so in others (GGGI).
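
Here is a sketch of what "controlling for" does to a coefficient, again on synthetic data. The effect sizes (0.8, -0.2, -0.6) are numbers I invented so that most of the link between equality and propensity runs through the stereotype score. They say nothing about the real world, and this is plain two-predictor least squares, not the paper's actual specification.

```python
import random


def centered(v):
    m = sum(v) / len(v)
    return [vi - m for vi in v]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def slope_alone(x, y):
    """Coefficient on x when it is the only predictor."""
    x, y = centered(x), centered(y)
    return dot(x, y) / dot(x, x)


def slopes_with_control(x, z, y):
    """Coefficients on x and z when both are in the regression."""
    x, z, y = centered(x), centered(z), centered(y)
    sxx, szz, sxz = dot(x, x), dot(z, z), dot(x, z)
    sxy, szy = dot(x, y), dot(z, y)
    det = sxx * szz - sxz ** 2  # solve the 2x2 normal equations directly
    return (szz * sxy - sxz * szy) / det, (sxx * szy - sxz * sxy) / det


rng = random.Random(1)
n = 500  # made-up "countries"
equality = [rng.gauss(0, 1) for _ in range(n)]
stereotype = [0.8 * e + 0.6 * rng.gauss(0, 1) for e in equality]
propensity = [-0.2 * e - 0.6 * s + 0.3 * rng.gauss(0, 1)
              for e, s in zip(equality, stereotype)]

b_alone = slope_alone(equality, propensity)
b_controlled, b_stereotype = slopes_with_control(equality, stereotype, propensity)
print(round(b_alone, 2), round(b_controlled, 2))
```

The equality coefficient shrinks toward the small direct effect once the stereotype score is in the regression. How much it shrinks is the thing to look at in the real tables.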

This paper is often summarized (e.g. on Wikipedia) with quotes like this (emphasis mine):

The stereotype associating math to men is stronger in more egalitarian and developed countries. It is also strongly associated with various measures of female underrepresentation in math-intensive fields and can therefore entirely explain the gender-equality paradox.

However, most of their paper is about predicting other things (e.g., the intention to study STEM) where controlling for stereotypes has a stronger effect. I think it’s misleading to take them as claiming to entirely explain Stoet and Geary’s paradox, when the reduction in the GGGI coefficient above is so modest.

Paradox dissolved?

After reading these follow-up papers, I had the impression the original study was debunked. But notice three things:

First, causality isn’t everything. Richardson et al. think that BIGI is better than GGGI for establishing causality. I don’t understand their reasoning in the slightest, but it doesn’t matter. None of these analyses establish causality.

Still, does the paradox actually exist? It can’t simultaneously be false (as Richardson et al. seem to claim) and true but explained by gender stereotypes (as Breda et al. claim.) Which is it? Let’s figure that out before worrying about causality.

Second, stereotypes don’t solve the paradox. Suppose that the paradox was entirely explained by gender stereotypes. That’s valuable but leaves the mystery of why more gender-equal countries should have stronger stereotypes!

It could be cultural. Gender-equal countries are generally richer. Maybe that leaves more resources for The Patriarchy to spend indoctrinating everyone, something it couldn’t afford to when worried about survival.
It could be intrinsic interest. Maybe women are less likely to have STEM as their #1 choice, but in unequal countries they have few other options and so they conclude math is important for them.
It could be some impossible-to-disentangle combination. Maybe parents in gender-unequal countries know that their daughters have fewer opportunities, and so they constantly tell them how amazing math is, resulting in those girls liking math.

Third, it’s unclear how fragile the result is. Richardson et al. say that the paradox only appears because of “contrived measures and selective data”. Certainly, if the paradox only appears after torturing the data in one way, we shouldn’t trust it. But their evidence is… what happened when they tortured the data in one other way. Shouldn’t we try a bunch of analyses, and see how robust things are?

A bunch of analyses

Let’s start with the original analysis, relating GGGI to propensities.

This is the same as the original Stoet and Geary figure, with three small changes:

Switch the axes.
Color countries according to continent.
Show a LOWESS smoothing (linearity is for wimps) along with a 95% confidence interval, computed using bootstrapping.
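
For the curious, here is a rough sketch of how a smoothed curve with a bootstrap band can be computed. It is a simplified, single-pass local linear smoother with tricube weights (no robustness iterations), not necessarily the exact routine behind these plots. The data is synthetic, and the `frac`, `n_boot`, and noise settings are arbitrary choices of mine.

```python
import random


def lowess_at(x0, xs, ys, frac=0.6):
    """Locally weighted linear fit evaluated at x0, using tricube weights."""
    n = len(xs)
    k = max(2, int(round(frac * n)))
    dists = sorted(abs(x - x0) for x in xs)
    h = dists[k - 1] or 1e-12  # bandwidth: distance to the k-th nearest point
    w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
    sw = sum(w)
    if sw == 0:  # degenerate resample: fall back to the plain mean
        return sum(ys) / n
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)


def bootstrap_band(xs, ys, grid, n_boot=200, frac=0.6, seed=0):
    """Percentile 95% band: resample (x, y) pairs and refit each time."""
    rng = random.Random(seed)
    n = len(xs)
    curves = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        curves.append([lowess_at(g, bx, by, frac) for g in grid])
    lo, hi = [], []
    for j in range(len(grid)):
        col = sorted(c[j] for c in curves)
        lo.append(col[int(0.025 * (n_boot - 1))])
        hi.append(col[int(0.975 * (n_boot - 1))])
    return lo, hi


# Synthetic stand-in data: 40 made-up countries with a downward trend.
rng = random.Random(42)
xs = [rng.uniform(0.55, 0.85) for _ in range(40)]
ys = [0.4 - 0.5 * (x - 0.7) + rng.gauss(0, 0.05) for x in xs]

grid = [0.55 + 0.03 * i for i in range(11)]
fit = [lowess_at(g, xs, ys) for g in grid]
lo, hi = bootstrap_band(xs, ys, grid)
for g, f, l, u in zip(grid, fit, lo, hi):
    print(round(g, 2), round(f, 3), round(l, 3), round(u, 3))
```

Resampling whole countries (rather than residuals) keeps the band honest about regions of the x-axis where there are few data points.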

A different calculation for STEM-participation

The above figure uses propensities, which is a major point of contention. Personally, I think this debate is silly. Propensities give one view of the data, while the raw fraction of women in STEM gives another. They both have value.

So, what if Stoet and Geary had just switched to using the actual percentage of women among people who earn STEM degrees, as Richardson et al. suggest they should have? They’d have gotten the following curve, where I’ve included non-STEM degrees for context.

In more-equal countries, women earn a larger share of non-STEM degrees, but a smaller share of STEM degrees. The paradox is still there.

Other measures of equality

Maybe this all depends on some weirdness with how GGGI measures equality? A newer alternative is the Gender Inequality Index (GII). I took the 2019 rankings and used them instead of GGGI.

Be careful interpreting this graph: While more equality meant more GGGI, it means less GII.

Again, the most gender-equal countries have a smaller fraction of women in STEM, but not non-STEM. With propensities, this effect is even stronger.

A third alternative is BIGI, as suggested by Richardson et al. Be very careful here. BIGI is negative when women are favored and positive when men are favored. Equality occurs around zero.

The more women are favored, the more non-STEM degrees they earn. With STEM degrees, women earn the smallest share for BIGI ≈ -.02, where women are just slightly favored. The fraction increases when there’s more inequality in either direction. Comparing BIGI to propensities gives a stronger, but less symmetric, effect.

While we’re on the subject… The red dots in the above graph show the same data as in Richardson et al.’s commentary above, which they used to claim that there was no gender-equality-paradox. What’s going on?

For one thing, I made the graph differently, switching the axes and using smaller markers so you can see the density of countries.

For another thing, they did a linear regression and found no significant result. That’s not too surprising, given that the effect above is nonlinear and symmetric.
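
A toy example of why that happens: if the outcome is lowest in the middle and rises on both sides, the best-fit line is flat, even though the relationship is perfectly deterministic. The numbers below are invented for illustration and have nothing to do with the real BIGI data.

```python
def ols_slope(x, y):
    """Slope from regressing y on a single predictor x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


x = list(range(-10, 11))   # symmetric around zero
y = [xi ** 2 for xi in x]  # U-shaped: smallest at zero, larger on both sides

# The straight-line fit sees nothing...
print(abs(ols_slope(x, y)) < 1e-9)             # True

# ...but regressing on distance from the center finds a clear upward slope.
print(ols_slope([abs(xi) for xi in x], y) > 0)  # True
```

A non-significant linear coefficient is what you would expect from a pattern like this, so it can’t distinguish "no effect" from "symmetric effect".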

Against BIGI

We have three different measures of gender inequality: GGGI, GII, and BIGI. Here’s a plot of GGGI against GII:

Are the Philippines more gender-equal than Japan (as GGGI implies) or the opposite (as GII implies)? I don’t know, but I’ll accept that it depends on different, reasonable definitions of gender-equal.

On the other hand, here’s a plot of GGGI against BIGI:

According to BIGI, Saudi Arabia—where women can only show their hands and eyes in public and must have a legal male guardian—is basically the same as Switzerland. Lesotho—the tiny country inside South Africa—is by far the most women-favored place in the entire world. Ooohkaaay.

This isn’t to say that BIGI is bad—they specifically discuss Saudi Arabia in their paper—but that it doesn’t capture what we have in mind when thinking about a gender-equality paradox.

Other measures of women in STEM

While the result seems robust to different measures of gender equality, everything above uses the same data from UNESCO on the number of STEM graduates. We’ve analyzed it both in terms of propensities and raw fractions, and the result is still robust. Still, what if we use a different data source entirely to measure STEM participation?

For variety, I looked at the female share of researchers in engineering and technology. If you compare this to GGGI, there’s really no paradox at all.

If you look at natural science researchers instead of engineering, you again see no paradox.

On the other hand, if you compare to GII instead of GGGI, you do see an effect in the most gender-equal countries:

Comparing GII to the natural sciences shows more of a leveling off than a full reversal. I’m not sure if that’s a “paradox” but it’s not something I’d have predicted.

Takeaways

So, is there a gender-equality paradox? Three points.

First, Stoet and Geary’s original paradox is robust. It doesn’t matter how you measure gender inequality, or whether you use propensities or raw fractions to measure women’s share of STEM degrees. It’s not fair to imply that they cherry-picked the details of their analysis to support some pre-determined conclusion.

Second, the paradox is somewhat limited. It appears with STEM degrees no matter how you define “equality” and how you torture the data. For STEM researchers, the effect depends on the definition of gender equality, and it is more modest when it does appear. This is weird, and I don’t understand it. Still, it shows that we need more nuance than “more gender equality → fewer women in STEM”.

Third, resist simplistic causal explanations! People choose degrees for lots of reasons: Economics, working conditions, family influences, cultural/media influences, intrinsic interest, and simply what degree programs are accessible. Most of these operate in feedback loops with each other. My love for scatterplots is vaster than the seas, but they’re at most vaguely suggestive of any single cause.