What to the white American is Juneteenth?

June 20, 2015

This is really a “notes toward” rather than a fully fleshed out essay. 

The title is a play on the classic Frederick Douglass piece “What to the Slave is the Fourth of July?”, which I urge you to read.

Three paragraphs stand out to me when I read them this Juneteenth:

But, such is not the state of the case. I say it with a sad sense of the disparity between us. I am not included within the pale of this glorious anniversary! Your high independence only reveals the immeasurable distance between us. The blessings in which you, this day, rejoice, are not enjoyed in common. — The rich inheritance of justice, liberty, prosperity and independence, bequeathed by your fathers, is shared by you, not by me. The sunlight that brought life and healing to you, has brought stripes and death to me. This Fourth [of] July is yours, not mine. You may rejoice, I must mourn. To drag a man in fetters into the grand illuminated temple of liberty, and call upon him to join you in joyous anthems, were inhuman mockery and sacrilegious irony. Do you mean, citizens, to mock me, by asking me to speak to-day? If so, there is a parallel to your conduct. And let me warn you that it is dangerous to copy the example of a nation whose crimes, towering up to heaven, were thrown down by the breath of the Almighty, burying that nation in irrecoverable ruin! I can to-day take up the plaintive lament of a peeled and woe-smitten people!


Fellow-citizens; above your national, tumultuous joy, I hear the mournful wail of millions! whose chains, heavy and grievous yesterday, are, to-day, rendered more intolerable by the jubilee shouts that reach them. If I do forget, if I do not faithfully remember those bleeding children of sorrow this day, “may my right hand forget her cunning, and may my tongue cleave to the roof of my mouth!” To forget them, to pass lightly over their wrongs, and to chime in with the popular theme, would be treason most scandalous and shocking, and would make me a reproach before God and the world. My subject, then fellow-citizens, is AMERICAN SLAVERY. I shall see, this day, and its popular characteristics, from the slave’s point of view. Standing, there, identified with the American bondman, making his wrongs mine, I do not hesitate to declare, with all my soul, that the character and conduct of this nation never looked blacker to me than on this 4th of July! Whether we turn to the declarations of the past, or to the professions of the present, the conduct of the nation seems equally hideous and revolting. America is false to the past, false to the present, and solemnly binds herself to be false to the future.

and, finally:

What, to the American slave, is your 4th of July? I answer: a day that reveals to him, more than all other days in the year, the gross injustice and cruelty to which he is the constant victim. To him, your celebration is a sham; your boasted liberty, an unholy license; your national greatness, swelling vanity; your sounds of rejoicing are empty and heartless; your denunciations of tyrants, brass fronted impudence; your shouts of liberty and equality, hollow mockery; your prayers and hymns, your sermons and thanksgivings, with all your religious parade, and solemnity, are, to him, mere bombast, fraud, deception, impiety, and hypocrisy — a thin veil to cover up crimes which would disgrace a nation of savages. There is not a nation on the earth guilty of practices, more shocking and bloody, than are the people of these United States, at this very hour.

Juneteenth is an answer, or the start of an answer. We know what Juneteenth was to the American slave, since the American slave created it to celebrate the ending of their slavery. Between the first Juneteenth and today, we have not yet fully become one society where all our citizens are treated with fully equal respect and regard. But between the first 4th of July and the first Juneteenth, we moved from a society where slavery was deeply embedded to one where it was ended (though we are still dealing with the aftermath today).

Juneteenth says to me that the United States can change even on something that seemed so deep and powerful that it took treason and a bloody war to slay. So Juneteenth gives me hope that the full redemption of the 4th of July is possible.


The Liberty Principle, Gay Marriage, and Sleeping Under Bridges

January 6, 2015

There is much to dislike about McAdams’s bog-standard right-wing “omg, PCness in the university” attack on Cheryl Abbate, and a fair number of the issues have been articulated in several Daily Nous posts. There are a lot of academic freedom bits to think about in everything from how Abbate handled the student, to McAdams’s response, to the university’s response to McAdams. At first blush, basically everyone except Abbate has behaved rather badly. (Really, Mr. Undergrad? You secretly taped your instructor during a fishing expedition? Sheesh.)

I do think the question she raised in class (roughly, what are some positions that conflict with Rawls’ Liberty principle) and the particular proposition (that gay marriage bans, or the lack of gay marriage, conflict with the Liberty principle) are pretty interesting. So that’s what this blog post is about. I’m going to go with the minimal level of scholarship I can get away with, as I don’t have any texts handy and don’t feel like futzing around to get them.

Rawls’ Liberty principle goes roughly (since there are some variants):

Each person has the same indefeasible claim to a fully adequate scheme of equal basic liberties, which scheme is compatible with the same scheme of liberties for all;

Now, there is a range of possible anti-gay-marriage legal situations. Gay marriage might be unrecognised by the state in a variety of ways (e.g., there’s a legally identical status which is not called “marriage”; there’s a related status, but it doesn’t function the same way, e.g., it allows for joint tax returns but confers only overridable next-of-kin status). Gay marriage or gay marriage recognition might be affirmatively banned (again, in a variety of ways, up to making any sort of homosexual relationship illegal and harshly punished). The basic situation I’ll consider is that we have a legally recognised relationship called “marriage” which has roughly the set of formal and informal benefits and privileges that marriage in the US has and is restricted to opposite sex couples. (I’ll call this the Moderately Sucky Regime (MSR). It’s only moderately sucky because there aren’t punishments for being in a gay relationship, and yes, this is grading on a curve.) Is this permitted by the Liberty principle?

The “Duh It’s Incompatible” Line

I think this should be the obvious, default starting place. Take two women, Mary1 and Mary2, who differ only in that Mary1 loves Juan (a cis-hetero-man) and Mary2 loves Juanita (a cis-lesbian-woman). In the MSR, Mary1 has the right to marry Juan (assuming, e.g., they both want to get married, neither is otherwise currently married, etc., so ceteris paribus), but Mary2 does not have the right to marry Juanita. Marrying is either a fairly basic liberty, or it’s heavily implicated in a number of basic liberties, or it is implied by some basic liberties (various forms of association, for example).

I take it most people think it’s a basic liberty these days. So this argument sets the burden appropriately.

The Majestic Awesomeness of Freedom to Marry Only Outside Your Orientation

There is the oft-quoted Faux Liberty Principle (Anatole France):

The law, in its majestic equality, forbids rich and poor alike to sleep under bridges, beg in the streets or steal bread.

This is a principle driven by formalist equality: As long as there is no formal or perhaps explicit inclusion of group distinction, then the law treats those groups equally. The application of this variant of the principle to gay marriage would be something like:

Hey! Mary2 can get married…to a person of the opposite sex. EVERYONE can get married to someone of the opposite sex. Even straight folks can’t marry people of the same sex. So everyone has exactly the same rights!!!

I think this is a possibly non-homophobic attempt to reconcile anti-gay-marriage with the Liberty principle. Indeed, it could be offered as a reductio of the Liberty principle as a sufficient or correct or useful principle of justice.

Now, with respect to the Abbate case, it’s important to note that the gay marriage instance of the Majestic Equality reading, while justifying the MSR, is not the only instance. The original one will do nicely. One can run it for less controversial marriage situations as well as many other disparate impact laws. The gay marriage version is merely timely, not uniquely good. Timely topics can be pedagogically effective, but they can also be a pedagogic disaster. This is easily seen when the learning outcome has little to do with the timely topic per se. Because the topic is timely, you run the risk that people will be too engaged with it, either because they have settled and passionate opinions or because they just can’t easily separate the public focus from what’s needed to make the classroom point. So the benefit (the students have knowledge and interest) can be a problem.

This is putting aside the possibility that people might behave badly to the detriment of other students, or that relentlessly hammering on even the non-homophobic variant might be unduly and pointlessly upsetting to other students. You don’t have to think that one must shield students from every uncomfortable thing to acknowledge that upsetting students in a class when there is no pedagogic benefit attached to it is something that should be avoided. Confusing students can be pedagogically useful as well, but that doesn’t justify all confusings.

The Inadequacy of Majestic Equality

Majestic equality fails because a majestically equal scheme of basic liberties might not be a fully adequate scheme of equal basic liberties. Indeed, it’s trivial to generate loads of obviously bonkers schemes of majestically equal basic liberties: E.g., consider a law which forbids advocacy of Republican (or Democratic) political positions. Hey! They affect everyone equally! Or consider a law forbidding belonging to a Christian religion. Hey! Muslims and atheists are forbidden from joining Catholicism as well! EQUALITY!!! Etc. etc. etc.

Clearly, that a law doesn’t carve out a set of persons by name for specifically restricted liberty doesn’t mean it doesn’t, essentially, restrict liberty for some group. I don’t think it’s at all a stretch to read “fully adequate scheme of equal basic liberties” as excluding such shenanigans. It’s unlikely that purely formal criteria will do the job. (I feel like there must be a theorem to this effect somewhere.)

More Iterations

There are definitely more moves to be made, or these can be deepened. However, it’s really easy to get sucked into a US legal discussion or just go into a general discussion of gay marriage. For example, if an anti-er goes for a definitional move, “But ‘marriage’ just MEANS 1 man-1 woman because procreation.” (or the “compelling interest” variant), it’s not going to illuminate the Liberty principle very much. Similarly, denying that marriage is a basic right does mean that anti-gay marriage might not violate the Liberty principle per se (though it probably dies on the second principle), but then it’s a bad example. If you do concede it’s a basic right, then it’s hard to see how bans aren’t an immediate clash with the Liberty principle. If you don’t concede that, then it’s irrelevant. Debating whether it is a basic right is also irrelevant (much of the time) to a discussion of Liberty principle applicability.

Some Philosophy Hiring Data Analysis

December 29, 2014

I got involved in a discussion (on Daily Nous) of Carolyn Dicey Jennings’ data about US (I think) philosophy hires. It was in the context of a characterisation of “the New Consensus”. This all seems somewhat mixed up with the recent Leiterevents, but a lot of the themes remind me of stuff I heard in graduate school in the 1990s.


In any case, the initial claim is:

who suggests that Carolyn Dicey Jennings’ data (that women who receive TT jobs have on average half the publications of men who receive TT jobs) indicates that women get preferential treatment

With a follow up by a different commentator:

@JT – CDJ’s attempt to provide an alternative explanation for her data seems rather tortured, and has widely been recognised as such. I agree that we don’t know whether AA in hiring overcompensates for other, previous discrimination.

The alternative explanation (at least the first move):

What is the mean number of publications for women and men in this data set? For all of the jobs (tenure-track, postdoctral, and VAP) and for all peer-reviewed publications, placed women have an average of 1.13 publications, whereas placed men have an average of 2.17 publications. Thus it looks as though placed men have one more publication, on average, than placed women. Yet, if we look at median number of publications, this difference evaporates: the midpoint of publications by both women and  men is 1 publication. (The mode is 0 for each.) Why this difference between mean and median? The difference comes down to those at the extremes: 15% of men and 5% of women have 5+ publications.

Roughly, if you have a distribution of quantities with no upper bound that is skewed right, with a kind of long tail, the mean as a measure of central tendency is vulnerable to outliers. (This is roughly what I was saying here.)
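A toy illustration of that vulnerability, using made-up publication counts (not CDJ’s data): adding two high-end candidates to a small right-skewed cohort triples the mean while leaving the median and mode untouched. Only the shape (lots of 0s and 1s, a thin tail of high counts) is meant to resemble the real distribution; the numbers themselves are hypothetical.

```python
from statistics import fmean, median, mode

# Hypothetical per-candidate publication counts (NOT the real hiring data):
# mostly 0s and 1s, a few 2s and a 3.
typical = [0, 0, 0, 0, 1, 1, 1, 2, 2, 3]

# The same cohort plus two outliers out in the long right tail.
with_tail = typical + [12, 14]

for label, cohort in [("typical", typical), ("with_tail", with_tail)]:
    print(f"{label}: mean={fmean(cohort):.2f} median={median(cohort)} mode={mode(cohort)}")
# typical: mean=1.00 median=1.0 mode=0
# with_tail: mean=3.00 median=1.0 mode=0
```

Two candidates out of twelve are enough to move the mean by two whole publications, which is why comparing medians (or whole distributions) is the safer first look.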

There are several other interesting posts by Philippe Lemoine. I owe them a response, but I’ve started but not finished a line of analysis and want to get an interim report out on that, so I won’t really engage his points yet. Sorry Philippe!

Some Considerations

First, it’s clear that this discussion could get pretty cantankerous esp. as things fit or fail to fit various political/policy positions. I’m not yet ready to discuss policy recommendations, but I want to get clear on the data.

Second, my bias is to suspect that the market is disproportionately adversarial to women. Considerations of implicit bias (though in conflict with positive action) etc. would suggest this straight off. However, I don’t know that initial tenure track hiring is a place where this plays out strongly. Regardless, I will definitely be inclined to keep looking when analysis suggests otherwise, and this raises a risk of confirmation bias even if I don’t delude myself about any piece of analysis. Fortunately, Philippe seems to have different priors, so this might help. I’m pretty cognisant of this problem, which can also help.

Also, the current analyses are really just too shallow to say much in any direction. I think e.g., Philippe, me, and CDJ all agree on this.

Third, if it turns out that the TT job market isn’t unduly adversarial for women, I will be delighted. This is a great outcome. If it is unduly and unjustifiably adversarial for other groups that will not be good, but I don’t want to ignore the good. Lack of negative bias against women in hiring is a good thing.

Fourth, the data are probably not even close to sufficient for drawing strong conclusions, if only because we don’t know what the unsuccessful candidate pool looks like. But also,

  • I’m pretty sure publication numbers are not the only consideration in determining a good candidate. Indeed, there are plenty of prima facie reasons to suppose it’s not even correlated with overall quality, e.g., possible trade-offs between teaching and research or between quantity and quality.
  • Gender might be correlated with other properties, e.g., program ranking, which might dominate. I.e., when you control for the other factor, differences seemingly due to gender might disappear.
  • Most of these candidates (I think!?) didn’t compete with each other, as they weren’t all applying for the same pool of jobs. Some jobs might be out of reach due to AOS or might have been less desirable due to location or department. We need a model of how the decision making might be unduly influenced and, preferably, at least an operational notion of problematic bias.
  • And, to repeat and follow on from the prior point: without some idea about the unsuccessful pool, it’s hard to draw conclusions about why the current set got in. After all, you don’t need to beat out the other successful candidates for other jobs, just the unsuccessful ones for your job. If the whole pool of unsuccessful candidates is worse than the whole pool of successful candidates (and the head-to-heads are appropriately distributed), then the differences between the male and female pools of successful candidates are not evidence of bias in selection, just differences in the cohort.
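The second bullet above (gender correlated with something like program ranking) is the standard confounding worry. Here’s a minimal sketch with entirely made-up records: men and women have identical average publications within each hypothetical program tier, yet the pooled averages differ because the two groups are drawn from the tiers in different proportions. The tier labels, counts, and the helper name mean_pubs_by are invented for illustration.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical (gender, program_tier, publications) records, NOT real data.
# Tier "A" candidates average 3 pubs and tier "B" candidates average 1, for
# both genders; in this toy cohort men are mostly tier A, women mostly tier B.
records = (
    [("M", "A", p) for p in [2, 3, 4, 2, 3, 4]]
    + [("M", "B", p) for p in [0, 1, 2, 1]]
    + [("F", "A", p) for p in [2, 4]]
    + [("F", "B", p) for p in [0, 1, 2, 1, 0, 1, 2, 1]]
)

def mean_pubs_by(records, key):
    """Average publications for each group defined by key(gender, tier)."""
    groups = defaultdict(list)
    for gender, tier, pubs in records:
        groups[key(gender, tier)].append(pubs)
    return {k: fmean(v) for k, v in sorted(groups.items())}

pooled = mean_pubs_by(records, lambda gender, tier: gender)
within = mean_pubs_by(records, lambda gender, tier: (tier, gender))
print(pooled)   # {'F': 1.4, 'M': 2.2}
print(within)   # {('A', 'F'): 3.0, ('A', 'M'): 3.0, ('B', 'F'): 1.0, ('B', 'M'): 1.0}
```

Pool by gender and there’s a 0.8-publication gap; split by tier and it vanishes. Whether anything like this happens in the real data is exactly the kind of thing that needs checking.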

Toward the “weight at the high end” hypothesis

So, my first move, I’ve broken out the data in two ways:

  1. I separate out by year (2012 and 2013). There are three reasons for this: a) candidates primarily compete within a year (esp. successful ones…I presume most successful candidates for a TT position don’t go on the job market the very next year; if you did so, I’d love to hear why!), b) the selection committees and positions are different from year to year, and c) the first time through I tried to do it in Excel and for some reason I found it easier to start with 2012 alone and it kinda stuck. What? Analysis isn’t always pretty, y’know!
  2. Within each year I break down the male and female cohorts by number of publications so we can get a more precise view of the distribution.

Method: I imported Data 2 from CDJ’s spreadsheet into BaseX and ran some queries to extract the first three columns for each set, then Excelled the rest. I’ll release the whole thing when I have it a bit further along. I’m using the “PR_Pubs” column.

Here are the tables (sorry for the screenshots, but WP is sucking for me now; I need to decide whether to go premium or just move the blog):


Key: Pub Ct = Number of publications for a candidate. PubTotal = The cumulative number of publications for the cohort up to that row. CumAvg = The average number of publications for the cohort up to that row. Cum%=the percentage of that cohort up to that row. % of all=the percentage of the cohort appearing in that row. The totals are the total number of candidates.
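For anyone who wants to rebuild these columns outside Excel, here’s a sketch that computes them for one cohort (one gender in one year) from a list of per-candidate publication counts. The columns follow the key above; the per-row candidate count (n) is my addition, the function name is hypothetical, and the example cohort is made up. It only emits rows for publication counts that actually occur.

```python
from collections import Counter

def cumulative_table(pubs):
    """Rebuild Pub Ct / PubTotal / CumAvg / Cum% / % of all for one cohort
    from a list of per-candidate publication counts."""
    n = len(pubs)
    counts = Counter(pubs)
    rows = []
    cum_n = 0      # candidates with at most this many publications
    pub_total = 0  # publications held by those candidates
    for pub_ct in sorted(counts):
        k = counts[pub_ct]
        cum_n += k
        pub_total += pub_ct * k
        rows.append({
            "pub_ct": pub_ct,
            "n": k,  # candidates in this row (my addition, not in the key)
            "pub_total": pub_total,
            "cum_avg": pub_total / cum_n,
            "cum_pct": 100 * cum_n / n,
            "pct_of_all": 100 * k / n,
        })
    return rows

# Hypothetical cohort, NOT taken from CDJ's spreadsheet.
cohort = [0, 0, 0, 0, 1, 1, 1, 2, 2, 3, 12, 14]
for row in cumulative_table(cohort):
    print(f"{row['pub_ct']:>3} {row['n']:>3} {row['pub_total']:>4} "
          f"{row['cum_avg']:5.2f} {row['cum_pct']:6.1f} {row['pct_of_all']:6.1f}")
```

The last row’s CumAvg is always the cohort’s overall mean, so reading down that column shows exactly where the high end starts pulling the average up.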

I didn’t break out the medians per se, though you can sorta see where they’ll be. The first thing I noticed is where the “CumAvg”s diverge. In 2012…huh! Well, in one version, when I rounded to one decimal place, they didn’t diverge until 3 pubs (whereas in 2013, they diverge after 0). Here, they diverge at 1 because I’m not rounding/truncating/whatever Excel does before then. Hmmm. And of course, if you round (or whatever) to the integer, divergence starts happening at 7 (2012) or 6 (2013).

I’m really not sure where to go here. On the one hand, a difference in averages of 0.03 papers doesn’t seem very meaningful. On the other hand, a difference of 0.6 does seem meaningful. I guess the key thing is that there is a lean toward the male cohort even when the differences aren’t very meaningful. So I’ll leave that as it is for the moment.

In 2012, 94% of the women and 73% of the men had 3 or fewer publications. 2013 was a higher-publication year for both cohorts. What’s interesting to me is that the 0-pub percentage stays at roughly 1/3 for the men and a bit under 1/2 for the women across both years. There’s a bit of shuffling at the 1s and 2s, with the 2012 women’s cohort outperforming the men (as percentages) at 1 and 2 publications (which helps explain why the divergence is delayed in 2012).

Overall, men outnumber women 2 to 1. This means there’s more “room” for more exceptional candidates (publicationwise) in a sense.

So what does this mean? Got me. These years, the successful women candidate cohort had more 0s and fewer of the high end. But it’s not clear what the “natural” rate should be. (John Proveti mentioned that if we have a lot of female continental candidates, they may be more book than paper oriented and that might make a difference.)

The jump between the % of women with 1 pub in 2012 (30%) and 2013 (22%) makes me a bit wary (esp. when it’s the same number of women :))


Well, it’s all rather tentative at the moment. I guess my first thought is that these data don’t show any evidence that women are being discriminated against at the TT hiring level. If, say, only 2% of women had 0 and most had 1 while the male numbers stayed the same, that would be pretty striking. Similarly in reverse. But that’s not the case. What we have is a lot of 0s, a fair number of 1s and maybe 2s, and then a lot of variation. The curves look pretty similar:


My second thought is that I find the gap in the 0s more concerning than the gap at the high end. I’m not quite sure whether this is well grounded or not. My intuition is that large numbers of publications aren’t really typical, but 0 vs. 1 might be significant. Either way, I want to know what’s going on and whether this is predictive of publication in the future (or of success in getting tenure).

My third thought is that I still don’t know if sex is a selection bias, but this data doesn’t rule it out for sure. Whether you find it suggestive of pro-woman bias depends at this point, I’d warrant, on your priors, more than anything else. But I think I agree with Philippe that my simple conceptual example (where a couple of outliers at the high end really mess things up) is probably not what’s going on here, though I don’t see that:

Of course, when the mean number of publications is greater for men than for women even though the median is the same, it’s also conceivable that it’s because a handful of men have a very large number of publications. But, for this to explain a difference between the mean numbers of publications as significant as that which Carolyn found, the number of publications of those men would really have to be ridiculous. So ridiculous that we can pretty much rule out this possibility at the outset, because we know that nobody goes on the market with that many publications.

I’m not sure what would count as a “handful”, but at least in 2012 we have 3 people with 12 and 1 with 14. If we added 3 with 14 (for 4 in total), we would move the cumulative average from 2.06 to 2.27 for men. So significant movement can be made with small numbers within the bounds of what existed. Now that’s not the full difference, but it’s non-negligible. So I’m not sure it was “ridiculous”. Of course, it’s not quite the case, so I’m happy to concede the point in this instance for the moment. (Hedge!)
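The arithmetic behind this kind of what-if is just a weighted mean. A minimal sketch, reading “added 3 with 14” as adding three new candidates to the cohort (my assumption); the function name is hypothetical and the cohort size in the usage line is a placeholder, not the actual 2012 n.

```python
def mean_after_adding(n, current_mean, k, value):
    """Mean of a cohort of n candidates (averaging current_mean publications)
    after adding k more candidates who each have `value` publications."""
    return (n * current_mean + k * value) / (n + k)

# Placeholder cohort: 100 candidates averaging 2.0 publications (hypothetical).
print(round(mean_after_adding(100, 2.0, 3, 14), 2))  # 2.35
```

The smaller the cohort, the bigger the shift, so how much a “handful” matters depends directly on n.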

I would hope this is “needless to say”, but all this is rather preliminary and there may be all sorts of errors not least in the translation to blog post. Corrections and suggestions most welcome.

Music Monday: War on Xmas 2014

December 22, 2014

Xmas and New Year’s both fall on Thursdays. Wacky.

It does seem that in recent years the War on Xmas has wound down. As a dedicated soldier fighting against Xmas in all its forms, let me wish you a Happy Holidays, Season’s Greetings, and a hearty Solstice Salaam.

The most important front in the War on Xmas is, of course, music. The oppressive Xmas hegemony controls most public spaces and assaults us with endless aural evil of which we will not speak in detail lest we invoke insidious earworms! However, the revolutionary anti-Xmas cadres do, on occasion, produce effective musical blows against the saccharine onslaught! Here are a few of the more stirring.

The Waitresses: Xmas Wrapping

While it has been covered to its detriment (we’re hating on you, Spice Girls and Glee, for two), the original is still wonderful in spite of the happy ending.

The ultimate Xmas shaggy dog story. For some reason the following two lines:

“A&P” has provided me
With the world’s smallest turkey

make me really happy. I think it’s the combination of nostalgia for the “A&P” (I’ve not been in one in decades…do they still exist?) and the turkey. The poor poor inadequate yet appropriate turkey.

The Kinks: Father Xmas

Would any War on Xmas be complete without The Kinks delicately singing:

Father Christmas, give us some money
Don’t mess around with those silly toys
We’ll beat you up if you don’t hand it over
We want your bread so don’t make us annoyed
Give all the toys to the little rich boys

I think not!

(I’d prefer a flame thrower to a machine gun, of course.)

The Twelve Days after Xmas

This is also a favorite but there really is no worthwhile video. (OMG, it’s easy to do such a horrible performance of it that it almost becomes a Pro-Xmas song, even with all the bird carnage. “Enjoy” this one, if you dare.)

Of course, quality War on Xmas songs are rare. It’s not enough to be a parody or crabby Xmas song. We aim for quality! Here are some negative examples:

The Killers: Don’t Shoot Me Santa

Go ahead and shoot, Santa. Then eat a bullet.

After listening to this song and watching bits of the so-called video, I feel a Santanic murder-suicide is totally appropriate.

The Ramones: Merry Christmas (I Don’t Want To Fight Tonight)

Ok, I didn’t want someone to die listening to this, but it’s just sorta blandly catchy with a really boring set of lyrics.



December 15, 2014

Just a weird little thing.

When scraping data from a paper (or any source), I grab both the numbers they give including derived numbers and try to rederive the numbers. This provides a couple of sanity checks (e.g., that my scraping was accurate) and gives me their “model” (even if it’s trivial).
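A minimal sketch of that rederivation check, assuming the source gives raw counts, a population total, and percentages rounded to one decimal place. The category names, numbers, and function names are hypothetical. Note that Python’s round() works on binary floats and rounds exact ties to even, so on values sitting exactly on a ...5 boundary it can disagree with a spreadsheet’s round-half-up.

```python
import math

def rederive_percentages(counts, total, ndigits=1):
    """Recompute each category's percentage of `total` from its raw count."""
    return {name: round(100 * count / total, ndigits) for name, count in counts.items()}

def percentage_mismatches(reported, counts, total, ndigits=1):
    """Return {category: (reported, rederived)} wherever the two disagree."""
    derived = rederive_percentages(counts, total, ndigits)
    return {
        name: (reported[name], derived[name])
        for name in reported
        if not math.isclose(reported[name], derived[name], abs_tol=1e-9)
    }

# Hypothetical breakdown of a population of 1,000 with one mis-reported value.
counts = {"A": 500, "B": 300, "C": 200}
reported = {"A": 50.0, "B": 30.0, "C": 20.1}
print(percentage_mismatches(reported, counts, 1000))  # {'C': (20.1, 20.0)}
```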

Of course, you find stuff!

For example, I’m scraping the breakdown of a population across categories. The size of the population is 73,538 and they give both the number and the percentage in the breakdown. Thus, it’s trivial to rederive the percentage. So that’s what I do, but then I get four values that are off by one:

Paper (%) Rederived (%)
17.9 17.8
59.7 59.6
14.8 14.7
2.6 2.5

So, an off-by-one error in the last digit. GRRR! Clearly this is a rounding problem, and looking at the unrounded results confirms this:

Paper (%) Rederived (%) Unrounded (%)
17.9 17.8 17.8492752
59.7 59.6 59.64807311
14.8 14.7 14.74883734
2.6 2.5 2.548342354

Excel is doing the “right” thing here: it only looks at the digit immediately after the target digit. Of course, this sort of rounding is not equivalent to iterated rounding (i.e., if I round 17.849 to 2 places I get 17.85, and if I round that to 1 place I get 17.9). But it’s far more common to do things the “right” way. (And it makes a lot of sense.)
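Here’s a small sketch that reproduces both columns from the unrounded values above, using Python’s decimal module so binary floating point doesn’t muddy things. round_once rounds straight to one place, as Excel, Google Spreadsheets, and Python do; round_iterated rounds to a few places first and then steps down one digit at a time. That the paper did something like the latter is my guess, not something it states, and both function names are made up. Starting from 2 or 3 places gives the same four results here.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_once(x: str, places: int) -> Decimal:
    """Round a decimal string directly to `places` digits (round half up)."""
    return Decimal(x).quantize(Decimal(1).scaleb(-places), rounding=ROUND_HALF_UP)

def round_iterated(x: str, start: int, places: int) -> Decimal:
    """Round to `start` digits, then `start - 1`, ... down to `places` digits."""
    d = Decimal(x)
    for p in range(start, places - 1, -1):
        d = d.quantize(Decimal(1).scaleb(-p), rounding=ROUND_HALF_UP)
    return d

unrounded = ["17.8492752", "59.64807311", "14.74883734", "2.548342354"]
for x in unrounded:
    print(x, round_once(x, 1), round_iterated(x, start=3, places=1))
# 17.8492752 17.8 17.9
# 59.64807311 59.6 59.7
# 14.74883734 14.7 14.8
# 2.548342354 2.5 2.6
```

Each iterated step can turn a digit into a 5 that the next step then rounds up, which is how 17.849 creeps to 17.9.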

What confuses me is: how the heck did the paper get the iterated rounding version? Is there software out there that does it that way? My spot checking of Excel, Google Spreadsheets, and Python yields the same behaviour in all three.

Is this a big deal? Well, obviously not. Arguably, I don’t care about what’s beyond the decimal for these purposes, and nothing about these differences is critical, or even marginally relevant, for the paper’s results. However, it is an interop and validation problem. What should have been two seconds took me 20 minutes. And what software is doing this weird rounding? Is it causing problems elsewhere?

New on the blogroll: Now Face North by the invaluable JL

December 9, 2014

JL is one of my absolute favorite commenters on Lawyers, Guns, and Money. She is a true standout commenter with a wealth of activism and other experience that she readily shares with sharp insight. She now has a blog! Read her blog!

Music Monday: Bad Lip Reading’s “Gang Fight”

December 8, 2014

While I love quite a few of Bad Lip Reading’s videos (the schtick: the dude’s mom went substantially deaf and learned to lip read by watching TV; being a good son, he turned off the sound and tried to learn as well; he’s really and hilariously bad at it), the only song that I like is the Bad Lip Reading of Rebecca Black’s “Friday”.

Fortunately, I don’t know the original and never will!

How can you not love the (very catchy) chorus:

Gang Fight, Gang Fight!
The gang is down to fight, yeah
Have I brought this chicken for us to eat?
Gang Fight! Gang Fight!
The gang is down to fight, yeah
Have I brought this chicken for us to thaw?

I believe the answer to those two questions is always YES! And I’m vegetarian!

École Polytechnique

December 8, 2014

The anniversary was yesterday, but I’m still thinking about it. It comes into my thoughts every now and again. I spend a lot of time in classrooms, and sometimes I remember reading about it the day after it happened and am a little afraid of something like that happening again.

In remembrance:

  • Geneviève Bergeron (born 1968), civil engineering student
  • Hélène Colgan (born 1966), mechanical engineering student
  • Nathalie Croteau (born 1966), mechanical engineering student
  • Barbara Daigneault (born 1967), mechanical engineering student
  • Anne-Marie Edward (born 1968), chemical engineering student
  • Maud Haviernick (born 1960), materials engineering student
  • Maryse Laganière (born 1964), budget clerk in the École Polytechnique’s finance department
  • Maryse Leclair (born 1966), materials engineering student
  • Anne-Marie Lemay (born 1967), mechanical engineering student
  • Sonia Pelletier (born 1961), mechanical engineering student
  • Michèle Richard (born 1968), materials engineering student
  • Annie St-Arneault (born 1966), mechanical engineering student
  • Annie Turcotte (born 1969), materials engineering student
  • Barbara Klucznik-Widajewicz (born 1958), nursing student

Data and Method Sharing: Unnecessary sleuthing

December 8, 2014

Disclaimer: I do not endorse any inferences from my complaining here to “the paper authors suck”. In fact, I think all these paper authors are very cool and I value their work a lot. Stuff happens. Any venting that happens in this post is just that, venting, not considered judgement.

So, I’m trying to put together a paper about how computer-based risk assessment tools that use score-based risk assessment schemes unnecessarily homogenize risk strata, and in a bad way. (I’m coming more and more to the conclusion that score-based risk stratification is a pretty bad idea.) I have some good preliminary observations, but now I want to demonstrate how bad it can get. To do this, I’m trying to replicate the methods behind one tool both exactly (i.e., do what they did) and using a more granular set of probabilities (new contribution). But I’m running into problems.

First, no one, as far as I can tell, provides R scripts or spreadsheets of their data and calculations. Grrr. CSVs would be fine at this point. Ok, this means I’m pulling data out of PDFs. Such fun!

Second, and more importantly, I’m trying to pull data out of this paper’s PDF the same way the other paper did. And even in the supplement, it’s not clear what’s going on. Here’s an example:

We derived a theoretical annual risk of stroke without treatment by adjusting SSE [stroke and systemic embolism] rates from a large cohort (n = 73 538) of ‘real world’ patients in the Danish National Patient Registry[13] who have non-valvular AF and were not treated with warfarin. We chose to utilize the 10-year follow-up data from this study because the one-year follow-up data may overestimate the true event rate, given that all of the patients who were included in the analysis had recently been admitted to hospital. The rates in the Danish non-OAC cohort were adjusted to account for antiplatelet use within each group, assuming that antiplatelet use confers a 22% relative risk (RR) reduction.[2] The adjusted rates were calculated by prorating the patient years according to the percentage of patients within each group who were taking anti-platelets, and then dividing the number of events by the adjusted patient years. The reported and adjusted annual risks of SSE by the CHA2DS2-VASc score are shown in Figure 1A. The event rates and exact confidence intervals were calculated independently for each score assuming a Poisson distribution. [bolds added]

So, we need:

  1. Find the SSE rate from the 10-year follow-up data.
  2. Find the percentage of patients in each group who were on antiplatelets.
  3. Do some calculations.
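Step 3 is where the ambiguity bites. Here is a minimal Python sketch of one reading of “prorating the patient years”. It assumes (my assumption, not stated in the paper) that patient-years on antiplatelets carry 78% of the untreated hazard, given the 22% RR reduction. Observed events are then rate × PY × (1 − 0.22 f), where f is the fraction of the group on antiplatelets, so the untreated rate is events divided by that adjusted exposure. The function name and the example counts are hypothetical.

```python
def adjusted_rate(events, patient_years, antiplatelet_fraction, rr_reduction=0.22):
    """Untreated event rate per person-year after removing an assumed antiplatelet effect.

    Assumption (not confirmed by the paper): antiplatelet patient-years are
    down-weighted by (1 - rr_reduction), i.e. they carry 78% of the untreated risk.
    """
    if not 0.0 <= antiplatelet_fraction <= 1.0:
        raise ValueError("antiplatelet_fraction must be between 0 and 1")
    # Prorate: untreated patient-years count fully,
    # antiplatelet patient-years count at (1 - rr_reduction).
    adj_py = (patient_years * (1.0 - antiplatelet_fraction)
              + patient_years * antiplatelet_fraction * (1.0 - rr_reduction))
    return events / adj_py


# Hypothetical group: 100 events over 10,000 patient-years, using the pooled
# 34.7% antiplatelet figure from Table 1 for lack of a per-group number.
crude = 100 / 10_000
adj = adjusted_rate(100, 10_000, 0.347)
print(f"crude {crude:.4%} per PY, adjusted {adj:.4%} per PY")
```

Because the adjustment factor is 1 / (1 − 0.22 f), it matters a lot whether f is a per-score fraction or the pooled 34.7%.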

Ok, but where in the source paper is this data? They have 8 tables in there! Would it kill you to mention the particular table you used? Yes, I probably can figure it out, but why should I have to? Anyway, I look at Table 1, “Baseline characteristics of patients. Values are numbers (percentages)”. First hit: there’s a column with n=73 538, which is the n in the quote above! Woohoo! Table 1 also has a row, “Antiplatelet drug”, with the value “25 503 (34.7)” in the appropriate place. Yay! Except Boo!, because I don’t have a second number to cross-check this against. Plus, this is a value for the cohort as a whole, not “for each group”, by which I infer that a group is a set of people with the same CHA2DS2-VASc score. But it’s highly unlikely that antiplatelet therapy is evenly distributed over these scores. It could be that the sicker (more risk factors) groups have more people on antiplatelet therapy. Perhaps CHA2DS2-VASc=0 will have only 20% on aspirin. Or it could be the other way around.

Of course, for my purposes it doesn’t matter, since I just want to replicate. But I find this line ambiguous. I don’t like concluding that they just used 34.7% for every group merely because that’s the only number I could find. Tell me, please!

Now, I still need the event rate by CHA2DS2-VASc score from the 10-year follow-up. Enter Table 2, “Event rate (95% CI) of hospital admission and death due to thromboembolism* per 100 person years”.

Dear reader, you may think I’m being a bit of a lazybones. Table 1 and Table 2 don’t seem that hard to find. Yes, but finding isn’t the same as being told.

In any case, it’s found! Now I do a quick cross-check:

CHA2DS2-VASc   Table 2   Figure 1A (reported)   Figure 1A (adjusted)
0              0.66      0.6                    0.6
1              1.45      1.2                    1.3
2              2.92      2.6                    2.8
3              4.28      3.9                    4.2
4              6.46      6.0                    6.6
5              9.97      9.4                    10.5
6              12.52     11.6                   13.2
7              13.96     13.0                   15.0
8              14.10     13.2                   15.4
9              15.89     13.9                   15.9

Join in my confusion!
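One way to poke at the confusion: suppose (my assumption) the two Figure 1A columns are the reported and adjusted risks, and the adjustment works by prorating patient-years so that adjusted = reported / (1 − 0.22 f). Then you can back out the antiplatelet fraction f each score group would need. This is a rough Python sketch. The numbers are rounded to one decimal, and treating annual risks as if they were rates is only approximately right at these magnitudes, so read the output as a hint, not a result.

```python
RR_REDUCTION = 0.22  # relative risk reduction attributed to antiplatelets in the quoted methods

# Values transcribed from the cross-check table above (CHA2DS2-VASc 0..9).
# Assumption: the first Figure 1A column is "reported", the second "adjusted".
reported = [0.6, 1.2, 2.6, 3.9, 6.0, 9.4, 11.6, 13.0, 13.2, 13.9]
adjusted = [0.6, 1.3, 2.8, 4.2, 6.6, 10.5, 13.2, 15.0, 15.4, 15.9]


def implied_fraction(rep, adj, rr_reduction=RR_REDUCTION):
    """Antiplatelet fraction f that turns `rep` into `adj`
    if adj = rep / (1 - rr_reduction * f), i.e. prorated patient-years."""
    return (1.0 - rep / adj) / rr_reduction


for score, (rep, adj) in enumerate(zip(reported, adjusted)):
    print(f"score {score}: implied antiplatelet fraction ~ {implied_fraction(rep, adj):.2f}")
```

If the implied fractions drift with the score rather than sitting near 0.347, that would suggest per-group antiplatelet numbers were used rather than the pooled Table 1 value.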

So Table 2, by itself, is not the direct source of the numbers in Figure 1A. Some munging has been done. Hint: the units differ. Table 2 is events per 100 person-years, and Figure 1A is annual risk. I don’t believe this is a simple translation, frankly. I mean, I think I have a way to do the calculation, and maybe I’ll get the numbers in Figure 1A, but I think it’s a bit dodgy statistically (though what do I know). In any case, TELLING ME WHAT YOU DID would make my life easier. The supplement just has the same paragraph :(
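For comparison, the textbook conversion from an incidence rate to a one-year risk assumes a constant hazard: risk = 1 − exp(−λt), with λ in events per person-year and t = 1 year. Here is a minimal Python sketch applied to the Table 2 column. Whether the paper did anything like this is exactly what’s unclear, so treat it as a baseline, not as their method.

```python
import math

# Table 2 event rates per 100 person-years, CHA2DS2-VASc 0..9 (from the cross-check table).
table2_per_100py = [0.66, 1.45, 2.92, 4.28, 6.46, 9.97, 12.52, 13.96, 14.10, 15.89]


def rate_to_annual_risk(rate_per_100py, years=1.0):
    """Convert a rate per 100 person-years into a cumulative risk over `years`,
    assuming a constant hazard (exponential survival)."""
    hazard = rate_per_100py / 100.0  # events per person-year
    return 1.0 - math.exp(-hazard * years)


for score, rate in enumerate(table2_per_100py):
    risk_pct = 100 * rate_to_annual_risk(rate)
    print(f"score {score}: {rate:5.2f}/100 PY -> {risk_pct:5.2f}% annual risk")
```

At low scores this conversion barely moves the number (1 − e^(−x) ≈ x for small x), so 0.66 per 100 person-years comes out at about 0.66%. The unit change by itself doesn’t seem able to account for the 0.6 in Figure 1A.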

Update 1

It’s definitely not derived from Table 2, nor is the antiplatelet cohort estimated from the pooled percentage. The supplement contains a table with the actual numbers for the antiplatelet cohort and the total number of events. They also have raw numbers of patient-years broken down by category. That’s good data, but it’s not in the original paper. Time to look for the other data supplement!
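With events and patient-years per score in hand, the last sentence of the quoted methods (“event rates and exact confidence intervals ... assuming a Poisson distribution”) becomes something you can redo. Below is a stdlib-only Python sketch of the exact (Garwood) interval, found by bisection on the Poisson CDF. With SciPy you’d get the same bounds as `scipy.stats.chi2.ppf(alpha/2, 2*k)/2` and `scipy.stats.chi2.ppf(1 - alpha/2, 2*k + 2)/2`. The function names are mine, and I’m assuming “exact” means Garwood, since the paper doesn’t say which exact method it used.

```python
import math


def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu); terms computed in log space for stability."""
    if mu == 0:
        return 1.0
    return sum(math.exp(i * math.log(mu) - mu - math.lgamma(i + 1)) for i in range(k + 1))


def _bisect(f, lo, hi, tol=1e-10):
    """Root of a monotone function f with a sign change on [lo, hi], by bisection."""
    f_lo = f(lo)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def exact_poisson_ci(events, alpha=0.05):
    """Exact (Garwood) confidence interval for the mean of a Poisson count."""
    # Lower bound: mu with P(X >= events) = alpha/2 (zero when no events observed).
    if events == 0:
        lower = 0.0
    else:
        lower = _bisect(lambda mu: (1.0 - poisson_cdf(events - 1, mu)) - alpha / 2,
                        0.0, float(events))
    # Upper bound: mu with P(X <= events) = alpha/2; search well above the count.
    hi = events + 10.0 * math.sqrt(events + 1) + 10.0
    upper = _bisect(lambda mu: poisson_cdf(events, mu) - alpha / 2, float(events), hi)
    return lower, upper


def rate_with_ci(events, person_years, per=100.0, alpha=0.05):
    """Event rate per `per` person-years, with an exact Poisson CI."""
    lo, hi = exact_poisson_ci(events, alpha)
    scale = per / person_years
    return events * scale, lo * scale, hi * scale


# Hypothetical counts, NOT taken from the supplement: 150 events over 20,000 patient-years.
rate, lo, hi = rate_with_ci(150, 20_000)
print(f"{rate:.2f} per 100 PY (95% CI {lo:.2f} to {hi:.2f})")
```

Running this once per CHA2DS2-VASc score (“calculated independently for each score”) on the supplement’s counts should show whether their intervals are this kind of exact interval.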

Update 2

Alas, the other supplement does not have this data either. I’ll do a few more searches before contacting authors.

I’ll note that this discrepancy is still worrisome. Presumably the rates should come out the same if the calculations are done in a similar manner.

Update 3

Contact-the-author time! There’s an overlapping author, so I think they used the data directly and made some slight changes in method. That’s the only thing that makes sense to me right now.


My tentative thought is that I can’t repeat the derivation in the second paper as described based on the data published in the first paper. I could perhaps do a replication, but some bits will be inherently lower quality. (I.e., not just different, but worse. E.g., the anti-platelet adjustment.) I can’t just use the supplemental data from the second paper, because it’s missing all the stuff on co-variates, which is what I’m interested in. So I can’t do my preferred replication off the data in the second paper. I probably still can do two replications of the second paper based on the published data to compare them, and that might be adequate for my purpose. It’s not as nice as being able to compare against the second paper directly, but it will probably work.

Music Monday: Crayola doesn’t make a color for your eyes

November 17, 2014

Lots of travelling and the blog dies. However, last week at NERFA I saw Kristen Andreassen perform this song (slapping on her vest). It’s very fun and the video is adorable with extra dor!

Zoe did a clapping rearrangement of The Twelve Days of Christmas (called One Little Partridge) which is very awesome but only available on a sampler. We’re hoping to do a video in time for the season. While Andreassen sets the bar very high, I’ll just be pleased if we can get anything.

BTW, if you like folk music, NERFA is an interesting thing. It’s an “insiders” conference (i.e., where performers and venues go to make contact), but there is a ton of music going on everywhere, just about all the time.

