In case you missed it, there was a big-splash paper in PNAS recently entitled “National hiring experiments reveal 2:1 faculty preference for women on STEM tenure-track”. Williams & Ceci found one pretty huge result in their various experiments:
Given otherwise identical candidates, professors preferred female candidates over male candidates by 2:1.
When I first read that, I thought “Great!” But now, after thinking about it some more and reading the paper more carefully, I’m less enthused for two pretty gigantic reasons: methods and interpretation.
I see one major methodological flaw in the experimental design, which briefly is this: Williams & Ceci wrote narrative summaries describing two candidates. They then randomly used either female or male pronouns and asked real faculty members to rank the candidates as if they were potential hires for their department. In fairness, this seems like the most tractable way to hold all variables other than gender “constant”. However, this design relies on the assumption that the sentences “We view her as a 9.5/10.” and “We view him as a 9.5/10.” are identical in the mind of the reader, which they absolutely are not. Randomizing male and female names on job applications for “lab manager” positions results in everyone rating the male applicants as more competent, more hirable, and as deserving of a higher salary and more mentorship. Another study put male and female names at the top of a woman’s CV and found the same thing – the male version of her CV was consistently ranked higher. It seems we all judge women more harshly than men. “He” and “she” are simply not equivalent, and treating them as such greatly confounds the results presented in the paper. An alternate explanation for all the results in the study is that their methods are not comparing identical applicants at all and are instead showing underlying bias. I mean, riddle me this: if a woman has to be more competent to be viewed as equal, then given two “identical” candidates, isn’t the female one actually more competent?
Along that same vein, I don’t think the experiments were all that great in terms of controls. For example, they find that the single woman was preferred over the (“identical”) married man with children. This basically contradicts everything I’ve ever heard or read about the effects of gender and family on job prospects in academia – which, summarized, is this: women are professionally penalized for being or wanting to be mothers. Women in general are thought of as ticking biological clocks, and departments prefer candidates who will not stop the tenure clock, will not require any family leave and will devote 110% of their time to their labs. (There’s a good summary of the motherhood issue here and more info here or here, or there are some books available, or you can go through the wonderful blog Tenure, She Wrote to get all sorts of information.) There are abundant stories about women being hassled regarding their family plans while on job interviews (it’s illegal to ask about for a reason – and people do it anyway, for a reason). It seems the “risk” of a woman eventually having a baby is something that strongly affects hiring decisions and candidate rankings in the real world. If we’re starting to intentionally overlook that, I think it’s great. But to just say flat-out that a single woman is preferred over a married man is so unbelievable to me that I think it indicates a flaw in this type of study: there is no cost to choosing the candidate you think you should pick vs. the one you might pick in real life. There’s no need to put your money where your mouth is. And again, I wonder if “identical” candidates are not “identical” in this experiment – maybe the hypothetical female candidate is imagined to be 28 years old and the married man with kids is 35, and the seven-year age difference dilutes the man’s accomplishments (or something like that).
I just wonder if he’s ranked below her because people make unconscious assumptions about the person based on their lifestyle choices.
The final methods question I have involves a big “yippee” result from the paper – that people who chose to take a year off for family leave were not penalized for it; in fact, they were preferred (again, against otherwise identical candidates). Here’s the problem with this one: if they’re identical except for the fact that one person skipped a whole year, who wouldn’t want the one who achieved the same amount of work in one year’s less time? Again, they’re not identical. I guess it’s good that prioritizing family in and of itself isn’t punished, but really, we’re saying “Do you want awesome candidate number one, or do you want the equally awesome candidate number two, who also took a year off to do another awesome thing (stay home with their baby) without suffering any reduction in productivity?” The fact is, if you take time off, you get less (professional) work done. That’s what people hear when you talk about babies and maternity leave in the real world – you’re going to be a less (professionally) productive person, and that might mean some of your work is going to be pushed onto a colleague. So I’m not all that excited or encouraged about this particular result.
Regarding their interpretation of their results – the authors believe what they have found means that women are now the preferred sex in hiring situations. If that’s true – if faculty really prefer female candidates 2:1 over identical male candidates – it has to be an intentional act for this to be good news. If the results indicate that faculty are aware of and would like to correct a dearth of female professors in their department, for instance – that’s great, because if two candidates truly are identical, that’s the ideal situation to opt for the underrepresented group. Bravo. However, the authors specifically say their results do NOT represent faculty members actively choosing “socially desirable” outcomes (“i.e., endorsing gender diversity”) because one of their experiments evaluated a single candidate (instead of ranking three). According to them, this experimental design “avoids socially desirable reporting”. The single-candidate experiment still showed a preference for women; the female candidates were ranked an average of 10% higher than the identical male candidates. The authors say this “suggests that norms and values associated with gender diversity have become internalized in the population of US faculty.” Hm. I say again, favoring a female candidate because she’s female must be done intentionally. Otherwise, isn’t that just kind of maybe…sexism? (Cue me squeamishly opening a can of worms…) If we’re subconsciously awarding points for being female – that’s a problem. Identical candidates should be ranked identically, right? Correcting unbalanced diversity in departments needs to be done on purpose, not because we’ve “internalized” the message that women need extra help or are more deserving somehow. The authors briefly comment on this in the discussion:
Also, it is worth noting that female advantages come at a cost to men, who may be disadvantaged when competing against equally qualified women. Our society has emphasized increasing women’s representation in science, and many faculty members have internalized this goal. The moral implications of women’s hiring advantages are outside the scope of this article, but clearly deserve consideration.
Yikes! Everything about those sentences worries me. I guess the plus side of my reservations about their methods is that I’m not convinced the survey respondents weren’t intentionally choosing female candidates. Williams and Ceci did not inquire as to why the respondents gave the rankings that they did, and this would have been a really useful piece of the story. Perhaps the most useful piece, because it distinguishes between a conscious “I’m aware of a diversity problem in our department and I think gender should be a factor in future hiring decisions, especially when all the candidates are truly outstanding with no difference in qualifications” and an unconscious “Women deserve points for being women.”
I hope the next gigantic study looks at non-identical candidates. I hope they ask people whether gender affected their decisions. I hope I’m not secretly sexist for thinking this study – where identical candidates were not ranked identically – sounds sexist.
And finally, my most cynical thought about this whole thing: Do these results even matter? I mean, I guess they do, because if the study had found the opposite result, I would find that depressing. But I never really thought that this stage was the problem – that if I got the interview and nailed the interview, I wouldn’t get a fair ranking. There are so many other factors that I think are bigger problems (e.g., implicit bias, stereotype threat, the baby penalty, lack of mentors) that this is almost irrelevant. Almost. It’s still pretty good news to know that there’s not some overarching sex-based bias in ideal scenarios. But do I agree with the authors when they say it’s “a propitious time for women launching careers in academic science”? I’m not so sure we’re there yet (and this paper certainly hasn’t convinced me).