Archive for the 'Research' Category

Quantitative Social Sciences vs. the Humanities

December 29, 2016

Post Mortems

As we inch closer to realizing the Trump disaster, the election post-mortems continue. Obama has claimed that he would have beaten Trump. I'm unsure about the wisdom of that from either an analytical or political perspective. Qua analysis, it could be banal, reasonable, or silly:

  1. Banal: Your take on the election could be, roughly, that while the fundamentals favored a generic Republican, Trump was an unusually bad candidate running an unusually bad campaign, so that, absent extraordinary interventions esp. from the FBI, a reasonable Democrat would have won. A bit more subtly, he could be claiming that Democrats can win when they aren't also running against the press plus FBI plus Russia plus Wikileaks, and he is a candidate that the press (a key enabler of the others) doesn't run against.
    This isn't quite as banal as "A major party candidate always has a good shot in this polarised age" in that it posits that Clinton-specific features strengthened the Trump campaign just enough. However, it doesn't posit any Obama-specific features, hence the banality.
  2. Reasonable: Your take on the election could be, roughly, that given the closeness of Trump's victory, a bit more juicing of Democratic turnout would have been sufficient (esp. when combined with all the items under the banal scenario) for victory. Obama has a good record on turnout, which seems to come from some combination of his personal qualities and his GOTV operation. If we posit that Clinton had an equivalent GOTV operation, then we're left with his personal qualities, which are a superset of "not having the Clinton failings". I think you can probably make a case like this based on the exit polls. While reasonable, it's highly defeasible. What's more, it's not clear that you add much over the banal case. You need something like what's in the reasonable case to distinguish Obama vs. Sanders.
  3. Silly: Obama would have crushed Trump because Trump is an extremely bad candidate while Obama is an extremely good candidate. I feel like both those statements are true, but we really need to take seriously the idea that candidate quality matters, at best, at the margins. It's not just that fundamentals models tend to do well empirically, but that the causal mechanisms for candidate or even campaign quality mattering are opposed by a lot of evidence and a lot of alternative causal stories. What voters hear, how they come to make decisions, the small number of "true independents", etc. tend to point toward the partisan identity thesis of voting, to wit, voters tend to vote their party identity regardless of the policy implications or political behavior of the candidate. Voters' attributions of their decisions to campaign specifics can plausibly be attributed (for many voters) to things like (supported) rationalisation.

Politically, all this seems to do is set up Clinton as a scapegoat or perhaps, better, set up Obama as the leader of the opposition. The former is pointless. The latter is perhaps worthwhile. It's clear that Obama campaigning on behalf of others isn't effective (he's not had notably strong coattails, for example). More significantly, I rather suspect he's going to take a traditional ex-president role and be relatively quiet about Trump. If that's the case, it would be bad for him to become leader of the opposition.

There's lots to unpack about the election, and we have the problem that, on the one hand, good analysis and data gathering take time while, on the other hand, the further the election recedes into the past, the more evidence evaporates. This is all next to the fact that post-mortems serve political goals and thus are subject to motivated distortion.

The Loomis Hypotheses

Ok, that was a digression. What prompted this more directly is Erik Loomis' latest entry in his war/trolling on the scientific status of social sciences like economics and political science. This is a bit more general than attempts to use the election outcome against specific models/prognosticators/etc. and, of course, Erik is provocatively overstating:

It’s time to put my humanities hat on for a bit. Obviously there are political scientists and economists who do good work. And we need people studying politics and economics, of course. But the idea that there is anything scientific about these fields compared to what historians or philosophers or literature critics do is completely laughable. As I tweeted at some point right after the election, the silver lining to November 8 is that I never have to even pretend to take political science seriously as a field ever again. Of course that’s overstated, but despite the very good political scientists doing good work (including my blog colleagues!) the idea that this field (Sam Wang, Nate Silver, etc., very much included) had some sort of special magic formula to help us understand politics this year, um, did not turn out to be true. They are just telling stories like I do, but with the pretense of scientific inquiry and DATA(!!!) around it. It’s really the same with economists, far too many of whom are completely deluded by their own models and disconnected from the real life of people.

Before trying to structure these a bit, I want to point out that we have some serious challenges to making either a defensive or offensive claim about methodological validity or superiority based on prognostic outcomes of elections: all the models are probabilistic with extremely small test cases. So, even Sam Wang's prediction of a 99% chance of a Clinton win is consistent with what happened. Silver's higher odds for Trump aren't necessarily validated by Trump's winning! You have to dig into the details in order to find grounds for determining which one actually overstated the odds, and your arguments are going to be relatively weak. But conversely, your argument that these models serve no useful purpose has to do more than say, "They got the election outcome wrong!!!" Highly accurate models might be only "empirically valid", that is, they succeed but provide no insight and don't track the underlying causal structure. Highly uncertain models might tell you a lot about why certain outcomes aren't easily predictable.
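
To make the smallness of the test case concrete, here's a minimal sketch scoring two forecasts against the single observed outcome. The 99% figure is Wang's, as above; the other probability is purely illustrative and not Silver's actual number, and the choice of scoring rules is mine:

```python
import math

# One realized outcome: Trump wins, encoded as 1.
outcome = 1

# Pre-election probabilities of a Trump win.
# 0.01 corresponds to the ~99% Clinton forecast mentioned above;
# 0.30 is a purely illustrative "higher odds for Trump" model.
forecasts = {"model_A": 0.01, "model_B": 0.30}

for name, p_trump in forecasts.items():
    brier = (p_trump - outcome) ** 2   # squared error on this single event
    log_loss = -math.log(p_trump)      # log score; lower is better
    print(f"{name}: Brier = {brier:.3f}, log loss = {log_loss:.2f}")

# Model B scores better on this one draw, but with n = 1 neither score
# supports a statistically meaningful verdict about either model.
```

Model B looks better on this one draw, but that's exactly the point: a single binary outcome can't separate a well-calibrated model from an overconfident one with any force.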

Overall, I think the burden of argument is on the model proposers rather than the skeptics. First, this is the natural placement of burden: the person making the claim has to defend it. Models need content and if you rely on the fact that both Wang and Silver had a Trump win as a possibility, then you risk making them all essentially equivalent to coin toss models. In which case, Erik’s attack gets some purchase.

There seem to be three rough claims:

  1. (Quantitative) Social Science is no more scientific than history, philosophy, or literary criticism.
  2. (Quantitative) Social Science wrongly claims to have a “formula” that provides superior understanding of politics. Instead, they are “just telling stories.”
  3. The problem with (Quantitative) Social Science is that its practitioners are deluded by their own models and thus disconnected from the real lives of people.
    This could mean many things, including: current models are oversimplistic (i.e., disconnected) yet treated as gold; models in principle oversimplify and so will never be a good tool; or models are only useful in conjunction with other (qualitative) methods.

Claim 2 can be seen as a refinement of claim 1: the way that (Quantitative) Social Science is no more scientific than history, philosophy, or literary criticism is that it doesn't do anything more than "tell stories," albeit with a quantitative gloss. Obviously, there's some difference in what they do, as a novel about lost love is topic-distinct from a history of glass blowing in Egypt. Even when topic congruent, we expect a novel about the Civil War to be a different kind of thing than a history of the Civil War. Not all stories have the same structure or purpose or value for a task, after all.

A Standard Caveat

Many debates about the “scienciness” of a field are prestige fights and as a result tend to be pretty worthless. That something is or isn’t a science per se doesn’t necessarily tell you about the difficulty or significance of it or much at all about its practitioners. There are sensible versions but they tend to be more focused on specific methodological, evidential, sociological, or ontological questions.

Comparative Scientivisity

(I'm not going to resolve this issue in this post. But here are some gestures.)

While there's some degree of "qualitative humanities are superior" in Erik's posts (cf claim 3 and, wrt 1 and 2, the idea that they at least know their limits), let's stick to the comparative scienciness claim. These points (the categorical and the superiority claims) aren't fully separable. (I.e., science is successful in certain enviable ways, thus other fields try to glom on.)

Let's pick a distant pair: election forecasting and interpretative literary criticism. It does seem that these two things are really different. If the literary criticism teases out a possible interpretation of, say, a poem, then the evaluative criterion for the interpretation is whether it is "correct", or "valid", or "insightful", and the evaluative mechanism is (typically) either brute human judgement or more criticism (i.e., the presentation of other interpretations either of the criticism or of the original poem). The most obvious evaluative criterion for election forecasts is predictive success (and usually rather easy-to-verify predictive success). Prediction, of course, is a key indicator of science, so the fact that election forecasting (inherently) aims at prediction might be enough to cast a sciency feel on its parent discipline, political science.

Of course, astrology and tarot also aim at prediction. Their lack of science status doesn't rest solely on their predictive failure. Indeed, predictive failure alone won't give us a categorical judgement (science/nonscience), since it could just as easily indicate bad or failing science. Throwing in some math won't do the job either, as astrology and numerology are happy to generate lots of math. The fact that the math tends to generate models that reasonably cohere with other knowledge of the physical world is a better indicator.

If we move over to history, it's tempting to say that the main difference is analogous to autopsy vs. diagnosis: it's much easier to figure out what killed someone (and when) than what will kill someone (and when). Even the fact that there are epistemically or ontologically ambiguous cases (e.g., we can't tell which bullet killed them, or multiple simultaneous bullets were each sufficient to kill them) doesn't make autopsy the harder task. (For one, it's generally easier to tell when one is in such a situation.)

But there's plenty of backward-looking science. Cosmology and paleontology and historical climate studies come to mind. They do try to predict things we'll find (if we look in the right place), but it's hard to say that they are fundamentally easier. What's more, they all rest on a complex web of science.

I feel confident that history could (and probably does) do a lot of that as well. Surely more than most literary criticism would or perhaps should (even granting that some literary criticism, such as author attribution, has become fairly sciency).

What does this mean for Erik’s claims?

I’m not sure. A lot of what we want from understanding of phenomena is how to manipulate those phenomena. But one thing we can learn is that we don’t have the capacity to manipulate something the way we’d like. This goes for buildings as well as elections.

(Oops. Gotta run to a play. But I don't want to leave this hanging, so I'll leave it with a hanging ending. But I'm also genuinely unsure where to go with this. I still have trouble finding an interpretation of Erik's claims that leads me to any action.)

The EU’s all open access by 2020 push

May 30, 2016
The EU is going to push for all articles to be open access by 2020:

This week was a revolutionary week in the sciences—not because we discovered a new fundamental particle or had a new breakthrough in quantum computing—but because some of the most prominent world leaders announced an initiative which asserts that European scientific papers should be made freely available to all by 2020.

This would legally only impact research supported by public and public-private funds, which are a vast portion of the papers produced annually; however, the goal is to make all science freely available. Ultimately, the commitment rests on three main tenets: “Sharing knowledge freely,” “open access,” and “reusing research data.”

This is a big deal, but I’ve some mixed feelings about it.
  1. Academic publishers are grotesque rentiers for the most part. Just awful. It’s embarrassing that we don’t fix this problem.
  2. Most open access stuff doesn't fix this problem. Instead we have NEW classes of rentiers and scammers cropping up. Now every individual author has to keep an eye on which of the new pay-to-play journals are real and which aren't. Individual authors have to find cash to pay the open access fee (TYPICALLY PURE RENT!!!). Ridiculous.
  3. I'm inclined to think that we're pretty much solving a non-problem. Most people in places with richish universities can get access one way or another (most universities I've been affiliated with have community memberships). Most people don't use most papers and, more importantly, most papers don't get used. I don't see the same push from the funding agencies for open access monographs and textbooks. The latter is a particular disgrace that hits millions of people every semester.
  4. It does solve one real problem: Access for people around poor universities. That’s a big deal and good show.
  5. It does nothing to solve the over-publication and publication bias problems. Nor does it help with reproducibility.

"But the public paid for the research, they should have access to it." If we're going down this route, then universities should take out no patents, open source all their software, and businesses that get any public money should release their stuff for free. It's a coherent position, but it's nowhere near the general policy, nor is this a step toward it. Heck, cf monographs. It feels like a lot of lip service, plus more disruption for disruption's sake.

Shorten the term of copyright and you’ll have a bigger impact. Easier to administer too, in a lot of ways. Never going to happen!

Point 4 is important, though. That probably balances everything else.

Bernard Williams on Case Studies

January 6, 2016

From “A critique of utilitarianism” (in Utilitarianism: For and Against, pp 96-96):

For a lot of the time so far we have been operating at an exceedingly abstract level. This has been necessary in order to get clearer in general terms about the differences between consequentialist and other outlooks, an aim which is important if we want to know what features of them lead to what results for our thought.

I found this a bit confusing, but I think the point here is conceptual clarity. Somehow, being clear in general terms helps us understand causal (or conceptual) relationships. I’m not convinced (or even convinced I understand it), but ok. Clear formulation of the manipulations or treatments we are comparing is a good idea. Whether we need to do this in general terms or not isn’t critical. We want to know exactly how each moral theory works in the cases under examination. At least, enough to “run the simulation”.

Now, however, let us look more concretely at two examples, to see what utilitarianism might say about them, what we might say about utilitarianism and, most importantly of all, what would be implied by certain ways of thinking about the situation.

At this point, I don't know that it matters whether the cases are experiments or case studies. There are uses for either with these specific goals.

The examples are inevitably schematized, and they are open to the objection that they beg as many questions as they illuminate. There are two ways in particular in which examples in moral philosophy tend to beg important questions. One is that, as presented, they arbitrarily cut off and restrict the range of alternative courses of action…The second is that they inevitably present one with the situation as a going concern, and cut off questions about how the agent got into it, and correspondingly about moral considerations which might flow from that…

I’m not sure that these are quite matters of question begging. In general, moral reasoning (like most normal reasoning) is heavily non-monotonic: that is, the conclusion might change as you add new information (and change back as you add still more). And, with respect to the first, it’s clear that if we add a new possibility to a scenario that might change what’s right! (A moral dilemma is solved by finding a third, permitted, option, after all.) With respect to the second, obviously, backstory can matter quite a lot to our judgment: If a child takes a toy that another child is playing with, we might chide them, but it is a reasonable defense if the first child says, “This is my toy. I brought it here. They took it and won’t let me or anyone else play with it.”

These are threats to external and ecological validity only if there is never a reasonable attenuation of factors to consider. (Williams makes this point later, sort of, as I will quote.) We never know all the backstory or are aware of all the options, so the mere fact that a scenario necessarily elides some option or backstory details is not itself a reasonable objection. These specific ones might fail because, say, no conclusion can be drawn without some backstory (whose toy is it?) or because there's an obvious possible action not mentioned. But that's a different problem.

I think these are different worries than the ones Nussbaum raised. To requote:

This task cannot be easily accomplished by texts which speak in universal terms—for one of the difficulties of deliberation stressed by this view is that of grasping the uniqueness of the new particular.  Nor can it easily be done by texts which speak with the hardness or plainness which moral philosophy has traditionally chosen for its style—for how can this style at all convey the way in which the “matter of the practical” appears before the agent in all of its bewildering complexity, without its morally salient features stamped on its face?

The second problem (hardness and plainness) is clearly not a matter of missing propositions (as with Williams’ problems), but of richness of form. (In a future post, I’ll use Suzanne Langer to articulate this a bit more.) Obviously, Nussbaum can live with finite presentations, but she thinks that philosophical writing fails in some ways when compared to novelistic writing.

These difficulties, however, just have to be accepted, and if anyone finds these examples cripplingly defective in this sort of respect, then he must in his own thought rework them in richer and less question-begging form.

I kinda agree and am kinda annoyed by this. In one sense, Williams is correct. If these examples don't suit, one response is to enrich them. On the other, there's no justification of his examples. Are they sufficiently rich as not to be cripplingly defective? And there are other respects in which they may be problematic (e.g., are they typical? representative? do they cover problems in non-utilitarian theories?). Philosophy of this era isn't stylised in the way many scientific papers have become, but I kinda want a "materials" section that discusses the corpus of examples!

If he feels that no presentations of any imagined situation can ever be other than misleading in morality, and that there can never be any substitute for the concrete experienced complexity of actual moral situations

Note! Nussbaum thinks there is a substitute! But Williams isn’t writing no novel and his examples are pretty abstract and weird so he can still fail in Nussbaumian terms.

then this discussion, with him, must certainly grind to a halt: but one may legitimately wonder whether every discussion with him about conduct will not grind to a halt, including any discussion about the actual situations, since discussion about how one would think and feel about situations somewhat different from the actual (that is to say, situations to that extent imaginary) plays an important role in discussion of the actual.

One may legitimately wonder whether anyone would hold or has held such a silly position! Williams spends much more time defending against an extreme position that is so implausible he says that there is no talking to people who hold it than actually defending his actual examples. Indeed, he spends zero time defending his actual examples.

I, in general, love this essay. But whenever I dig in I really hate it. This is not good form. It gives the impression of giving due consideration as to whether the examples are useful and legit without even starting to do so.

I mean, consider that the imaginariness bit is just a red herring: we never have full knowledge of a situation, so we're always working with an incomplete description even "in the moment". So the real question is whether we're dealing with case descriptions of sufficient detail to allow for reasonably accurate simulation of moral deliberation. And I think we can answer that question, fallibly, partially, with the expectation that we can always do better. The Williams examples are not the worst ever, but they are much closer to thought experiments than thought case studies, for all that he gives actors cute names (the wife and older friend don't get names, nor do the captain or the Indians, but Pedro does).

(I find the universal “he” pretty damn distracting, fwiw! I’m glad we’re past that.)

Experiments vs. Case Studies

January 4, 2016

My recent post on validities was motivated by John Proveti posting a draft of an abstract he was submitting about the Salaita affair. John focused on exploring the use of case studies in moral analysis. This prompts me to write up (again) my spiel on experiments and case studies.

The primary aim of a controlled experiment is internal validity, that is, demonstrating causal relationships. The primary tool for this is isolation, that is, we try to remove as much as possible so that any correlations we see are more likely to be causal. If you manipulate variable v1, and variable v2 responds systematically, and there are no other factors that change through the manipulation, then you have a case that changes in v1 cause those changes in v2. (Lots of caveats. You want to repeat it to rule out spontaneous changes to v2. Etc.) Of course, you have lots of problems holding everything except v1 and v2 fixed. It's probably impossible in almost all cases. You may not know all the factors in play! This is especially true when it comes to people. So, you control as much as you can and use a large number of randomly selected participants to smooth out the unknowns (roughly). But critically, you shrink the number of variables (v) and up the n (i.e., repetitions).
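
A toy simulation of that logic (everything here, from the effect size to the noise, is made up for illustration): random assignment plus a large n lets a simple comparison of group means recover the effect of v1 on v2 even though an unknown factor keeps varying.

```python
import random

random.seed(0)

TRUE_EFFECT = 2.0   # made-up causal effect of v1 on v2

def outcome(v1):
    hidden = random.gauss(0, 3)               # unknown factor we can't hold fixed
    noise = random.gauss(0, 1)
    return TRUE_EFFECT * v1 + hidden + noise  # this is v2

def run_experiment(n):
    # Randomly assign half the units to v1 = 1 and half to v1 = 0,
    # then compare mean outcomes across the two arms.
    assignment = [0, 1] * (n // 2)
    random.shuffle(assignment)
    treated = [outcome(1) for a in assignment if a == 1]
    control = [outcome(0) for a in assignment if a == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

for n in (10, 100, 10_000):
    print(n, round(run_experiment(n), 2))     # estimates tighten around 2.0 as n grows
```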

Low v tends to hurt both external and ecological validity. In other circumstances, other factors might produce the changes in v2 (or block them!). For other controlled circumstances, it might be fairly easy to find the interaction. But for field circumstances, the number of factors potentially in play explodes.

Thus, the case study, where we lower n (to n=1) in order to explore arbitrary numbers of factors. Of course, the price we pay for that is weakening internal and external validity, indeed, any sort of generalisability.

Of course, in non-experimental philosophy, the main form of experiment is the thought experiment. But you can see the logic of the experiment at work: the reason philosophers dream up outlandish circumstances is to isolate and amplify the target v1 and v2. Thus, in the trolley problem, you have a simple choice. No one else is involved, and we pit number of lives vs. omission or commission, and the result is death. That the example is hard to relate to is a perfect example of a failure of ecological validity. But philosophers get so used to intuiting under thought-laboratory conditions that they become a bit like mice who have been bred to be susceptible to cancer: their reactions and thinking are suspect. (That it is all so clean and clever and pure makes it seem like one is thinking better. Bad mistake!)

Of course, we can have thought case studies as well. This is roughly what I take Martha Nussbaum to claim about novels in "Flawed Crystals: James's The Golden Bowl and Literature as Moral Philosophy":

To show forth the force and truth of the Aristotelian claim that “the decision rests with perception,” we need, then-either side by side with a philosophical “outline” or inside it—texts which display to us the complexity, the indeterminacy, the sheer difficulty of moral choice, and which show us, as this text does concerning Maggie Verver, the childishness, the refusal of life involved in fixing everything in advance according to some system of inviolable rules. This task cannot be easily accomplished by texts which speak in universal terms—for one of the difficulties of deliberation stressed by this view is that of grasping the uniqueness of the new particular.  Nor can it easily be done by texts which speak with the hardness or plainness which moral philosophy has traditionally chosen for its style—for how can this style at all convey the way in which the “matter of the practical” appears before the agent in all of its bewildering complexity, without its morally salient features stamped on its face? And how, without conveying this, can it convey the active adventure of the deliberative intelligence, the “yearnings of thought and excursions of sympathy” (p. 521) that make up much of our actual moral life?

I take this as precisely the point that more abstract explorations of moral reasoning lack ecological validity.

This, of course, has implications both for moral theorising and for moral education. Our moral theories are likely to be wrong about moral life in the field (and, I would argue, in the lab as well!). (I think this is what Bernard Williams was partly complaining about in Utilitarianism: For and Against.) But further, learning how to reason well about action in the circumstances of our lives won't work by ingesting abstract moral theories (even if they are more or less true). We still need to cultivate moral judgement.

I think we can do philosophical case studies that are not thought case studies just as we can do experimental philosophy without thought experiments. Indeed, I recommend it.

Data and Method Sharing: Unnecessary sleuthing

December 8, 2014

Disclaimer: I do not endorse any inferences from my complaining here to “the paper authors suck”. In fact, I think all these paper authors are very cool and I value their work a lot. Stuff happens. Any venting that happens in this post is just that, venting, not considered judgement.

So, I'm trying to put together a paper about how computer-based risk assessment tools which use score-based risk assessment schemes unnecessarily homogenize risk strata, and in a bad way. (I'm coming more and more to the conclusion that score-based risk stratification is a pretty bad idea.) I have some good preliminary observations, but now I want to demonstrate how bad it can get. To do this, I'm trying to replicate the methods behind one tool both exactly (i.e., do what they did) and using a more granular set of probabilities (new contribution). But I'm running into problems.

First, no one, as far as I can tell, provides R scripts or spreadsheets of their data and calculations. Grrr. CSVs would be fine at this point. Ok, this means I’m pulling data out of PDFs. Such fun!

Second, and more importantly, I'm trying to pull data out of this paper's PDF the same way the other paper did. And even in the supplement, it's not clear what's going on. Here's an example:

We derived a theoretical annual risk of stroke without treatment by adjusting SSE [stroke and systemic embolism] rates from a large cohort (n = 73 538) of ‘real world’ patients in the Danish National Patient Registry13 who have non-valvular AF and were not treated with warfarin. We chose to utilize the 10-year follow-up data from this study because the one-year follow-up data may overestimate the true event rate, given that all of the patients who were included in the analysis had recently been admitted to hospital. The rates in the Danish non-OAC cohort were adjusted to account for antiplatelet use within each group, assuming that antiplatelet use confers a 22% relative risk (RR) reduction.2 The adjusted rates were calculated by prorating the patient years according to the percentage of patients within each group who were taking anti-platelets, and then dividing the number of events by the adjusted patient years. The reported and adjusted annual risks of SSE by the CHA2DS2-VASc score are shown in Figure 1A. The event rates and exact confidence intervals were calculated independently for each score assuming a Poisson distribution. [bolds added]

So, we need:

  1. To find the SSE rate from the 10-year follow-up data.
  2. Find the percentage of patients in each group that were on anti-platelets.
  3. Do some calculations (my reading of those calculations is sketched below).
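
Here's a minimal sketch of how I read that adjustment, plus the exact Poisson interval. The function names and the input numbers are mine and purely illustrative, not the papers'; the discounting of person-years is one plausible reading of "prorating", not a confirmed reconstruction:

```python
from scipy.stats import chi2

def adjusted_rate(events, person_years, frac_on_antiplatelet, rr_reduction=0.22):
    """One reading of the adjustment: discount the person-years contributed
    by antiplatelet users by the assumed 22% relative risk reduction, then
    divide events by the adjusted person-years to get the 'untreated' rate."""
    adj_person_years = person_years * (1 - rr_reduction * frac_on_antiplatelet)
    return events / adj_person_years

def poisson_exact_ci(events, person_years, alpha=0.05):
    """Exact (Garwood) confidence interval for a Poisson rate."""
    lower = chi2.ppf(alpha / 2, 2 * events) / 2 if events > 0 else 0.0
    upper = chi2.ppf(1 - alpha / 2, 2 * (events + 1)) / 2
    return lower / person_years, upper / person_years

# Purely illustrative inputs, NOT numbers from either paper:
events, person_years, frac = 120, 8000.0, 0.35
print(adjusted_rate(events, person_years, frac))   # events per person-year, "untreated" scale
print(poisson_exact_ci(events, person_years))
```

If that reading is wrong, well, that's rather the point: the paragraph doesn't pin it down.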

Ok, but where in the source paper is this data? They have 8 tables in there! Would it kill you to mention the particular table you used? Yes, I probably can figure it out, but why should I have to? Anyway, I look at Table 1, "Baseline characteristics of patients. Values are numbers (percentages)". First hit, there's a column with n=73 538, which is the n in the quote above! Woohoo! Table 1 also has a row, "Antiplatelet drug", with the value "25 503 (34.7)" in the appropriate place. Yay! Except Boo! because I don't have a number to cross-check this against. Plus, this is a value for the cohort as a whole, not "for each group", by which I infer that a group is a set of people with the same CHA2DS2-VASc score. But it's highly unlikely that antiplatelet therapy is evenly distributed over these scores. It could be that the sicker (more risk factors) groups are going to have more people on antiplatelet therapy. Perhaps CHA2DS2-VASc=0 will have only 20% on aspirin. Or it could be the other way around.

Of course, for my purposes it doesn't matter, since I just want to replicate. But I find this line ambiguous. I don't like concluding that they just used 34% across each group simply because that's the only number I could find. Tell me, please!

Now, I still need the event rate by CHA2DS2-VASc score from the 10-year follow-up. Enter Table 2, “Event rate (95% CI) of hospital admission and death due to thromboembolism* per 100 person years”.

Dear reader, you may think I’m being a bit of a lazybones. Table 1 and Table 2 don’t seem that hard to find. Yes, but finding isn’t the same as being told.

In any case, it’s found! Now I do a quick cross check:

CHA2DS2-VASc    Table 2    Figure 1A "Reported"    Figure 1A "Adjusted"
0                0.66       0.6                     0.6
1                1.45       1.2                     1.3
2                2.92       2.6                     2.8
3                4.28       3.9                     4.2
4                6.46       6.0                     6.6
5                9.97       9.4                    10.5
6               12.52      11.6                    13.2
7               13.96      13.0                    15.0
8               14.10      13.2                    15.4
9               15.89      13.9                    15.9

Join in my confusion!

So Table 2, by itself, is not the direct source of the numbers in Figure 1A. Some munging has been done. Hint: the units differ. Table 2 is events per 100 person-years and Figure 1A is annual risk. I don't believe this is a simple translation, frankly. I mean, I think I have a way to do a calculation, and maybe I'll get the numbers in Figure 1A, but I think it's a bit dodgy statistically (though what do I know). In any case, TELLING ME WHAT YOU DID would make my life easier. The supplement just has the same paragraph 😦
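
For what it's worth, the calculation I have in mind is the standard constant-hazard conversion from a rate to a risk; a minimal sketch (whether this is anything like what they actually did is exactly what's unclear):

```python
import math

def rate_to_annual_risk(events_per_100_person_years):
    """Convert an event rate per 100 person-years to an annual risk,
    assuming a constant hazard: risk = 1 - exp(-rate)."""
    rate_per_person_year = events_per_100_person_years / 100.0
    return 1 - math.exp(-rate_per_person_year)

# Table 2 value for CHA2DS2-VASc = 0:
print(round(100 * rate_to_annual_risk(0.66), 2))   # ~0.66%, not obviously the 0.6 in Figure 1A
```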

Update 1

It’s definitely not derived from Table 2 nor is the antiplatelet cohort calculated. The supplement contains a table with particular numbers for the antiplatelet cohort and the total number of incidents. They also have raw numbers of patient years broken down by category. That’s good data, but not in the original paper. Time to look for the other data supplement!

Update 2

Alas, the other supplement does not have this data either. I’ll do a few more searches before contacting authors.

I’ll note that this discrepancy is still worrisome. Presumably the rates should come out the same if the calculations are done in a similar manner.

Update 3

Contact the author time! There's an overlapping author, so I think they used the data directly and made some slight changes in method. That's the only thing that makes sense to me right now.

Conclusion

My tentative thought is that I can’t repeat the derivation in the second paper as described based on the data published in the first paper. I could perhaps do a replication, but some bits will be inherently lower quality. (I.e., not just different, but worse. E.g., the anti-platelet adjustment.) I can’t just use the supplemental data from the second paper, because it’s missing all the stuff on co-variates, which is what I’m interested in. So I can’t do my preferred replication off the data in the second paper. I probably still can do two replications of the second paper based on the published data to compare them, and that might be adequate for my purpose. It’s not as nice as being able to compare against the second paper directly, but it will probably work.