I got involved in a discussion (on Daily Nous) of Carolyn Dicey Jennings’ data about US (I think) philosophy hires. It was in the context of a characterisation of “the New Consensus”. This all seems somewhat mixed up with the recent Leiterevents, but a lot of themes remind me of stuff I heard in graduate school in the 1990s.
In any case, the initial claim is:
who suggests that Carolyn Dicey Jennings’ data (that women who receive TT jobs have on average half the publications of men who receive TT jobs) indicates that women get preferential treatment
With a follow up by a different commentator:
@JT – CDJ’s attempt to provide an alternative explanation for her data seems rather tortured, and has widely been recognised as such. I agree that we don’t know whether AA in hiring overcompensates for other, previous discrimination.
The alternative explanation (at least the first move):
What is the mean number of publications for women and men in this data set? For all of the jobs (tenure-track, postdoctoral, and VAP) and for all peer-reviewed publications, placed women have an average of 1.13 publications, whereas placed men have an average of 2.17 publications. Thus it looks as though placed men have one more publication, on average, than placed women. Yet, if we look at median number of publications, this difference evaporates: the midpoint of publications by both women and men is 1 publication. (The mode is 0 for each.) Why this difference between mean and median? The difference comes down to those at the extremes: 15% of men and 5% of women have 5+ publications.
Roughly, if you have a distribution of quantities with no upper bound and skewed right, with a kind of long tail at the high end, the mean as a measure of central tendency is vulnerable to outliers. (This is roughly what I was saying here.)
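To make the mean-versus-median point concrete, here is a minimal sketch. The publication counts are made up for illustration (they are not CDJ’s data); the only thing it is meant to show is that a couple of high-end values can move the mean a lot while leaving the median where it was.

```python
from statistics import mean, median

# HYPOTHETICAL publication counts, not CDJ's data: most candidates have 0-2.
typical = [0, 0, 0, 1, 1, 1, 2, 2]

# The same cohort plus two candidates in the long tail at the high end.
with_tail = typical + [12, 14]

# The median is unchanged by the two extra values; the mean is not.
print("typical:   mean", mean(typical), "median", median(typical))
print("with tail: mean", mean(with_tail), "median", median(with_tail))
```

Whether the real cohorts behave like this toy example is exactly what the breakdown by publication count is meant to check.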
There are several other interesting posts by Philippe Lemoine. I owe them a response, but I’ve started but not finished a line of analysis and want to get an interim report out on that, so I won’t really engage his points yet. Sorry Philippe!
First, it’s clear that this discussion could get pretty cantankerous esp. as things fit or fail to fit various political/policy positions. I’m not yet ready to discuss policy recommendations, but I want to get clear on the data.
Second, my bias is to suspect that the market is disproportionately adversarial to women. Considerations of implicit bias (though in conflict with positive action) etc. would suggest this straight off. However, I don’t know that initial tenure-track hiring is a place where this plays out strongly. Regardless, I will definitely be inclined to keep looking when analysis suggests otherwise, and this raises a risk of confirmation bias even if I don’t delude myself about any piece of analysis. Fortunately, Philippe seems to have different priors so this might help. I’m pretty cognisant of this problem, which can help.
Also, the current analyses are really just too shallow to say much in any direction. I think e.g., Philippe, me, and CDJ all agree on this.
Third, if it turns out that the TT job market isn’t unduly adversarial for women, I will be delighted. This is a great outcome. If it is unduly and unjustifiably adversarial for other groups that will not be good, but I don’t want to ignore the good. Lack of negative bias against women in hiring is a good thing.
Fourth, the data are probably not even close to sufficient for drawing strong conclusions, if only because we don’t know what the unsuccessful candidate pool looks like. But also,
- I’m pretty sure publication numbers are not the only consideration in determining a good candidate. Indeed, there are plenty of prima facie reasons to suppose they’re not even correlated with overall quality, e.g., possible trade-offs between teaching and research or quantity and quality.
- Gender might be correlated with other properties, e.g., program ranking which might dominate. I.e., when you control for the other factor, differences seemingly due to gender might disappear.
- Most of these candidates (I think!?) didn’t compete with each other as they weren’t all applying for the same pool of jobs. Some jobs might be out of reach due to AOS or might have been less desirable due to location or dept. We need a model of how the decision making might be unduly influenced and preferably at least an operational notion of problematic bias.
- And, I’ll just repeat, following on from the prior point: without some idea about the unsuccessful pool, it’s hard to draw conclusions about why the current set got in. After all, you don’t need to beat out the other successful candidates for other jobs, just the unsuccessful ones for your job. If the whole pool of unsuccessful candidates is worse than the whole pool of successful candidates (and the head to heads are appropriately distributed), then the differences between the male and female pools of successful candidates are not evidence of bias in selection, just differences in the cohort.
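The “control for the other factor” point in the list above can be sketched with a toy example. Everything here is hypothetical: the records, the two program tiers “A” and “B”, and the helper `mean_pubs` are invented for illustration and say nothing about what CDJ’s data actually contain. The sketch only shows how an overall gap can vanish once you compare within tiers.

```python
from collections import defaultdict
from statistics import mean

# HYPOTHETICAL records: (gender, program tier, publication count).
# Within each tier, men and women have identical counts; only the
# mix of tiers differs between the two cohorts.
records = [
    ("M", "A", 3), ("M", "A", 3), ("M", "A", 3), ("M", "A", 3), ("M", "B", 1),
    ("F", "A", 3), ("F", "B", 1), ("F", "B", 1), ("F", "B", 1), ("F", "B", 1),
]

def mean_pubs(rows, keys):
    """Average publication count per group, grouping on the named fields."""
    groups = defaultdict(list)
    for gender, tier, pubs in rows:
        record = {"gender": gender, "tier": tier}
        groups[tuple(record[k] for k in keys)].append(pubs)
    return {group: mean(values) for group, values in groups.items()}

overall = mean_pubs(records, ["gender"])          # gap between M and F
by_tier = mean_pubs(records, ["gender", "tier"])  # no gap within a tier

print(overall)
print(by_tier)
```

In this made-up case the whole difference in averages comes from the tier mix, which is the kind of thing the current data can’t yet rule in or out.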
Toward the “weight at the high end” hypothesis
So, as my first move, I’ve broken out the data in two ways:
- I separate out by year (2012 and 2013). There are three reasons for this: a) candidates primarily compete within a year (esp. successful ones…I presume most successful candidates for a TT position don’t go on the job market the very next year; if you did so, I’d love to hear why!), b) the selection committees and positions are different from year to year, and c) the first time through I tried to do it in Excel and for some reason I found it easier to start with 2012 alone and it kinda stuck through. What? Analysis isn’t always pretty, y’know!
- Within each year I break down the male and female cohorts by number of publications so we can get a more precise view of the distribution.
Method: I imported Data 2 from CDJ’s spreadsheet into BaseX and ran some queries to extract the first three columns for each set, then Excelled the rest. I’ll release the whole thing when I have it a bit further along. I’m using the “PR_Pubs”.
Here are the tables (sorry for the screenshots, but WP is sucking for me now; I need to decide whether to go premium or just move the blog):
Key: Pub Ct = Number of publications for a candidate. PubTotal = The cumulative number of publications for the cohort up to that row. CumAvg = The average number of publications for the cohort up to that row. Cum%=the percentage of that cohort up to that row. % of all=the percentage of the cohort appearing in that row. The totals are the total number of candidates.
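For anyone who wants to rebuild the columns without Excel, here is a rough sketch of how I read the key above. The function name `cumulative_table` and the sample cohort are hypothetical (the cohort is not taken from the spreadsheet), and the sketch adds a per-row candidate count that the key doesn’t name.

```python
from collections import Counter

def cumulative_table(pub_counts):
    """Return one row per distinct publication count, in ascending order.

    Each row is (pub_ct, n, pub_total, cum_avg, cum_pct, pct_of_all):
      pub_ct     - number of publications (Pub Ct)
      n          - candidates with exactly that many publications
      pub_total  - publications in the cohort up to this row (PubTotal)
      cum_avg    - average publications for candidates up to this row (CumAvg)
      cum_pct    - percentage of the cohort up to this row (Cum%)
      pct_of_all - percentage of the cohort in this row (% of all)
    """
    total = len(pub_counts)
    freq = Counter(pub_counts)
    rows = []
    cum_n = 0
    cum_pubs = 0
    for pub_ct in sorted(freq):
        n = freq[pub_ct]
        cum_n += n
        cum_pubs += pub_ct * n
        rows.append((pub_ct, n, cum_pubs, cum_pubs / cum_n,
                     100 * cum_n / total, 100 * n / total))
    return rows

# HYPOTHETICAL cohort of ten candidates, for illustration only.
cohort = [0, 0, 0, 0, 1, 1, 1, 2, 2, 5]
for pub_ct, n, pub_total, cum_avg, cum_pct, pct in cumulative_table(cohort):
    print(f"{pub_ct:>2} {n:>3} {pub_total:>4} {cum_avg:6.2f} {cum_pct:6.1f} {pct:6.1f}")
```

The last row’s CumAvg is just the ordinary mean for the cohort, which is a handy sanity check against the spreadsheet.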
I didn’t break out the medians per se, though you can sorta see where they’ll be. The first thing I noticed is where the “CumAvg”s diverge: In 2012…huh! Well, in one version when I rounded to one decimal place, they didn’t diverge until 3 pubs (whereas in 2013, they diverge after 0). Here, they diverge at 1 because I’m not rounding/truncing/whatever Excel does before then. Hmmm. And of course, if you Whatever to the integer, divergence starts happening at 7 (2012) or 6 (2013).
I’m really not sure how to go here. On the one hand, a difference in averages of 0.03 papers doesn’t seem very meaningful. On the other hand, a difference of 0.6 does seem meaningful. I guess the key thing is that there is a lean toward the male cohort even when the differences aren’t very meaningful. So I’ll leave that as it is for the moment.
In 2012, 94% of the women and 73% of the men had 3 or fewer publications. 2013 was a higher publication year for both cohorts. What’s interesting to me is that the 0-pub percentage stays roughly at 1/3 for the men and a bit under 1/2 for the women across both years. There’s a bit of shuffling at the 1s and 2s, with the 2012 women outperforming the men (as percentages) in 1s and 2s (which helps explain why their divergence is delayed in 2012).
Overall, men outnumber women 2 to 1. This means there’s more “room” for more exceptional candidates (publicationwise) in a sense.
So what does this mean? Got me. These years, the successful women candidate cohort had more 0s and fewer of the high end. But it’s not clear what the “natural” rate should be. (John Proveti mentioned that if we have a lot of female continental candidates, they may be more book than paper oriented and that might make a difference.)
The jump between the % of women with 1 pub in 2012 (30%) and 2013 (22%) makes me a bit wary (esp. when it’s the same number of women :))
Well, it’s all rather tentative at the moment. I guess my first thought is that these data don’t show any evidence that women are being discriminated against at the TT hiring level. If only like 2% of women had 0 and most had 1 while the male numbers stayed the same, that would be pretty striking. Similarly in the reverse. But that’s not the case. What we have is a lot of 0s, a fair bit of 1s and maybe 2s, and then a lot of variation. The curves look pretty similar:
My second thought is that I find the gap in the 0s more concerning than the gap at the high end. I’m not quite sure whether this is well grounded or not. My intuition is that large numbers of publications aren’t really typical, but 0 vs. 1 might be significant. Either way, I want to know what’s going on and whether this is predictive of publication in the future (or success in getting tenure).
My third thought is that I still don’t know if sex is a selection bias, but this data doesn’t rule it out for sure. Whether you find it suggestive of pro-woman bias depends at this point, I’d warrant, on your priors, more than anything else. But I think I agree with Philippe that my simple conceptual example (where a couple of outliers at the high end really mess things up) is probably not what’s going on here, though I don’t see that:
Of course, when the mean number of publications is greater for men than for women even though the median is the same, it’s also conceivable that it’s because a handful of men have a very large number of publications. But, for this to explain a difference between the mean numbers of publications as significant as that which Carolyn found, the number of publications of those men would really have to be ridiculous. So ridiculous that we can pretty much rule out this possibility at the outset, because we know that nobody goes on the market with that many publications.
I’m not sure what would count as a “handful”, but at least in 2012 we have 3 people with 12 and 1 with 14. If we added 3 with 14 (for 4 in total), we move the cumulative average from 2.06 to 2.27 for men. So significant movement can be made with small numbers within the bound of what existed. Now that’s not the full difference, but it’s non-negligible. So I’m not sure it was “ridiculous”. Of course, it’s not quite the case, so I’m happy to concede the point in this instance for the moment. (Hedge!)
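The arithmetic behind that shift is simple enough to sketch. One loud caveat: the size of the 2012 male cohort isn’t stated in this post, so `N_MEN_2012 = 168` below is a hypothetical value, chosen only because it is roughly consistent with the two averages quoted (2.06 before, 2.27 after). The helper name is made up too.

```python
def mean_after_adding(n, current_mean, extra_count, extra_value):
    """Mean of a cohort after adding extra_count candidates who each
    have extra_value publications."""
    total_pubs = n * current_mean + extra_count * extra_value
    return total_pubs / (n + extra_count)

# HYPOTHETICAL cohort size, back-solved from the quoted averages;
# it is not read off the spreadsheet.
N_MEN_2012 = 168

new_mean = mean_after_adding(N_MEN_2012, 2.06, extra_count=3, extra_value=14)
print(round(new_mean, 2))
```

With a cohort of about that size, three extra people at 14 publications move the average by about a fifth of a paper, which is the scale of movement I have in mind.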
I would hope this is “needless to say”, but all this is rather preliminary and there may be all sorts of errors not least in the translation to blog post. Corrections and suggestions most welcome.