I think there's this mindset from other non-social sciences that spills into social science in an unproductive way. There's no scientific finding that involves human subjects that can divorce itself from human behavior (in this case, selection into complying with treatment assignment). Instead of thinking about whether or not colonoscopies work, you think about whether the policy of nudging people to get colonoscopies works.
This type of evidence answers policy questions and is not necessarily applicable to individual choices or the science of a specific procedure. Maybe in some cases, for some reason, you have very high compliance and it's double-blinded, etc. etc. - but a lot of these studies that don't have these very unusually ideal experimental conditions don't really lend themselves well to extracting the strict science of a procedure; it's always going to be obfuscated by the social stuff.
Not only do you make excellent points, but you also gave me a chuckle as I pondered how one could possibly ever design a double-blind colonoscopy trial. On the patient side, it would really suck to go through all the prep only to discover later that you were in the placebo group. But on the clinician side, I can't imagine how the trial could be blinded.
I cannot imagine an *ethical* way for the clinician to be blinded.
You could blindly and randomly assign the video of the procedure for clinicians to review, like they are double checking someone else's work. But now you've introduced a subtle bias in the reviewer's mind.
But yea, getting a placebo colonoscopy past an IRB sounds fun.
Not only that, what would you do after the placebo colonoscopy? Tell the patients everything looks good, even though you didn't do one? Tell them, "oops, we lost the records"?
But I think blinding isn't as big a problem here as sample bias. In principle, I think you could reduce sample bias quite a lot by just using money: Tell people: "Show up at location X on date Y after having done all the prep for a colonoscopy, and agree to do a colonoscopy, and we will give you Z dollars". Then pay everyone who shows up and perform colonoscopies on a random subset. Colonoscopies are already so expensive that you could raise Z pretty high without increasing costs *that* much.
Not totally sure that would pass IRB, though... colonoscopy prep is pretty unpleasant, so maybe too much harm without benefit for the control group?
Single-blinded sham surgery has actually happened a few times in orthopedics - for some surgeries that involve the knee or the back to reduce pain - with patient consent. They were as successful as the real treatment in most cases too!
https://academic.oup.com/painmedicine/article/18/4/736/2924731?login=false
This post reminds me of Scott Alexander's excellent article, "Getting Eulered":
https://slatestarcodex.com/2014/08/10/getting-eulered/
Good catch! I actually thought I remembered a different SSC post which was also relevant (I remember "HARD MATH" being mentioned several times), but I wasn't able to find it.
@dynomight - did you intend for the brick estimates to disagree? I ask because the disagreement does not seem to be part of the analogy.
Alice = 1,000 kg, Bob = 1,985.5 kg
I did mean for them to disagree, but I also see the disanalogy you're pointing out and agree with you. The problem is that I couldn't find a simple brick-based analogy that preserved all the structure of the colonoscopy debate (including them calculating the same numbers). Ultimately, I felt like all the structure that really mattered was "two arguments, one simple, one fancy, different conclusions".
OK, but after all is said and done, do you have an opinion on the effectiveness (or lack thereof) of colonoscopies? I don't know math from mashed potatoes, so I can't really tell from your write-up what you now think. I recently finished reading an interesting book called "Outlive: The Science and Art of Longevity", by Peter Attia, MD. He seems to know what he's talking about, and he comes out strongly in favor of colonoscopies, to the point where I'm thinking of getting another one even though I already had one about six years ago.
I think I was subconsciously hiding my opinion because I don't think I'm enough of an expert to judge that.
But for what it's worth, my impression is that:
(1) There is clear evidence that colorectal cancer screening in general increases life expectancy (including sigmoidoscopy and fecal tests)
(2) There are fairly strong prior reasons to believe that colonoscopy would be more effective than sigmoidoscopy.
(3) So far the limited evidence about colonoscopy isn't all that encouraging in terms of showing an *extra* benefit, but it's *very very far* from conclusive so point (2) is still quite important.
So I would definitely urge getting some type of screening. And I think I agree that colonoscopy is probably the best bet. But colonoscopy is also more unpleasant and expensive, it has a somewhat higher (though still small - probably? unclear) rate of side effects, and it's hard to say how much better it is.
So please get screened (somehow)! I'm personally unsure if colonoscopy is "worth it", especially because I don't know how much you hate doing them, how much it would cost you, your financial situation, etc. But they don't seem like a bad option! In 50 years we might have decided they are indeed much better, or maybe they aren't worth it, but I doubt we'd find that they were drastically worse than, say, sigmoidoscopy.
I wish I had access to the full paper to see, but isn't the disagreement between your logic and the proposed solution to the logic (instrumental variables) over the validity of the instrumental variable selected? You seem to be arguing that the instrumental variable selected was invalid, because "receives colonoscopy" actually does correlate with lower risk of having colon cancer when the paper is arguing it doesn't and only instrumentally causes lower risk?
What was the justification provided in the paper for why that instrumental variable doesn't have an impact on the final variable?
You can read it (a preprint?) here: https://blueprintcdn.com/wp-content/uploads/2023/12/Blueprint-Published-Paper-2023-Instrumental-Variables-Methods-Reconcile-Intention-to-Screen-Effects-Across-Pragmatic-Cancer-Screening-Trials.pdf
But I *don't* think it comes down to the validity of the instrumental variable. I agree the instrumental variable is valid, because I agree that (being invited to participate in a colonoscopy trial) only influences the outcome (gets colorectal cancer) through the intermediate variable (gets colonoscopy). I agree with them that that's true, so I agree with them that the IV calculation is right.
The only disagreements are (1) the IV calculation only estimates causal effects *for acceptors*, which I think is best thought of as "biased" since acceptors are at higher risk and more likely to benefit from colonoscopy, though I suppose ultimately it's a semantic debate, and (2) I think IV is kinda pointless here since we already knew everything it says without using IV.
As I recall from learning to use IV some years ago in econ grad school, you are quite correct here: IV is good and unbiased when the instrument isn't endogenously related to what you are using it for. So as you say, since accepting the invitation is going to be biased towards "I probably should get one, yea" versus "Nah, I'm fine, no colon cancer in my family" for those who refuse, it won't be an unbiased IV if the outcome variable is "did they get colon cancer?" That's the tricky bit: picking an IV that isn't related to the decisions people make relevant to the topic, which is ridiculously hard in many (most?) cases.
Here's a point I heard from a doctor on a podcast:
If everyone got a colonoscopy every three months, no one would die of colon cancer.
(Ooops, I see that Alex C said the same thing-ish - it was Attia who made this point)
Some rough math:
[Global Burden of Disease 2019](https://www.thelancet.com/journals/langas/article/PIIS2468-1253(22)00044-9/fulltext) reports 295 DALYs per 100,000 due to colorectal cancer, which if I understand correctly, means the average person can expect to lose about 87 quality-adjusted life-days to colorectal cancer.
Screening every 3 months starting at age 40 means you will get screened around 120 times in your life. So it's not clear to me that that much screening is worth it. Seems worth it but not obviously so (assuming it has a 100% success rate at preventing cancer). (I've never gotten a colonoscopy so I don't know how unpleasant it is.)
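For what it's worth, here's a minimal sketch of that arithmetic in Python. The ~80-year lifespan and the age-40-to-70 screening window are assumptions I'm using to reproduce the figures quoted above (the ~87 life-days and the ~120 screens); they aren't numbers from the GBD paper or the trial.

```python
# Rough reconstruction of the arithmetic above. Assumptions (mine, not the
# GBD paper's): an ~80-year lifespan and screening every 3 months from 40 to 70.
dalys_per_100k_person_years = 295   # GBD 2019 figure quoted above
lifespan_years = 80                 # assumed

lifetime_loss_years = dalys_per_100k_person_years / 100_000 * lifespan_years
lifetime_loss_days = lifetime_loss_years * 365.25
print(f"~{lifetime_loss_days:.0f} quality-adjusted life-days lost per person")  # ~86

screens = (70 - 40) * 4             # every 3 months for 30 years
print(f"~{screens} colonoscopies")  # 120
```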
Economist here. I read the initial post and was very tempted to send a message exactly in line with what the other economists sent. But then I read your post carefully, noticed your claim was that the 0.443% is biased _for the entire_ population, noted that this is correct, and I didn't have any further objections. I fully endorse your "everyone is right" conclusion, and kudos for figuring it out.
Economists sometimes downplay the issue that the IV estimates are specific to the complier population, but that's something they should know. One classic reference about this is Imbens (2010) "Better LATE than nothing".
One note: you ask "I already figured out this number using the obvious assumptions and grade school algebra, why would I need stinking *instrumental variables*?" Turns out IV can get complicated in some cases, but it boils down to exactly the computation you did when you have a binary instrument and a binary independent variable. I think the intuition of "IV is rescaling" is underappreciated among economists.
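To make the "rescaling" concrete, here's a minimal sketch with a binary instrument (invited or not) and a binary treatment (screened or not). The rates are the ones quoted in this thread (1.2% cancer among controls, a 0.186 percentage-point decrease among the invited, 42% acceptance); treating colonoscopy take-up in the control arm as zero is an approximation, based on the remark elsewhere in the thread that no screening program existed outside the trial.

```python
# Wald / IV estimate for a binary instrument and binary treatment:
# (effect of invitation on the outcome) / (effect of invitation on take-up).
def wald_iv(outcome_invited, outcome_control, takeup_invited, takeup_control):
    itt = outcome_invited - outcome_control        # intention-to-treat effect
    first_stage = takeup_invited - takeup_control  # change in screening rate
    return itt / first_stage

late = wald_iv(
    outcome_invited=0.0120 - 0.00186,  # control rate minus the 0.186pp decrease
    outcome_control=0.0120,            # 1.2% cancer among controls
    takeup_invited=0.42,               # 42% of the invited accepted
    takeup_control=0.0,                # approximation: no screening off-trial
)
print(f"{late * 100:.3f} percentage points")  # ≈ -0.443, the number in the post
```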
Whew. Thanks so much for writing this. I hesitated to publish this because even after quadruple-checking things, I still worried that I might be missing something. So it's a huge relief to get some confirmation that I'm not completely crazy!
Great post as usual thank you.
But if you believe this 👇
> If two people disagree, it should be the responsibility Dr. Fancy to explain what’s wrong with Dr. Simple, not the reverse.
Then how would you solve Brandolini's law?
If we demand that scientists explain every axiom or theory to the public, then we may keep them from doing new research and making discoveries because they are busy explaining to Dr. Simple that his eyes are biased and the earth is not flat.
For an issue such as the shape of the earth, it should suffice for Dr. Fancy to publish a textbook containing the evidence that the earth is spheroidal and leave it at that. He doesn't have the responsibility to debate every random flat-earther Mr. Simple who comes along.
I think today the more traditional practice would be to state that those who believe the earth is flat are all racists and misogynists and no one should ever associate with them.
@dynomight - in the brick analogy, it seems like Occam's Razor would fit pretty well as a logical heuristic to put the explanatory burden on Dr. Fancy.
A couple of things that made me scratch my head.
You 'assume the “decrease” for refusers is zero', but then later say 'refusers had less colorectal cancer then controls'. Is the control group the group of people not invited? If so, wouldn't these two statements be contradictory?
It seems reasonable to guess that the acceptors are biased, and that also means the refusers would be biased. But if so, then surely you can't make the assumption in your calculation?
> You 'assume the “decrease” for refusers is zero', but then later say 'refusers had less colorectal cancer then controls'. Is the control group the group of people not invited? If so, wouldn't these two statements be contradictory?
No contradiction there. Let's clarify two different claims:
Claim 1: (colorectal cancer rate for refusers in world where NordICC happens) ≈ (colorectal cancer rate for refusers in world where NordICC never happens)
Claim 2: (colorectal cancer rate for refusers in world where NordICC never happens) ≠ (colorectal cancer rate for acceptors in world where NordICC never happens)
I think both of these are true. I think the first one is true because getting and refusing an invitation to a colonoscopy probably doesn't change your personal risk of getting colorectal cancer. And I think the second is true because people preferentially agree to participate in colonoscopy trials when their personal risk is higher.
Ok, thanks for the clarification.
So I'm trying to reproduce your math, i.e.,
(0.186% overall decrease)
= 0.42 × (decrease for acceptors)
+ 0.58 × (decrease for refusers)
where (0.186% overall decrease) = (rate of cancer for uninvited) - (rate of cancer for invited).
For the moment I'm just going to consider a 'rate' to be a proportion of the sampled population. Later we can consider them as estimates for probabilities of a general population.
I can get the equation by messing around with the sample proportions if
(decrease for refusers) = (rate of cancer for uninvited) - (rate of cancer for invited refusers).
Then assuming (decrease for refusers) = 0 gets you your result. But (decrease for refusers) = 0 is not the same as Claim 1.
So maybe the equation doesn't come from the above method?
Hmmm. I guess I'd derive that equation this way. Firstly, it's true that
(rate for invited group in world with trial) - (rate for invited group in world without trial)
= 0.42 × ((rate for acceptors in world with trial)-(rate for acceptors in world without trial))
+ 0.58 × ((rate for refusers in world with trial)-(rate for refusers in world without trial))
Next, I claim that
(rate for invited group in world without trial) = (rate for control group in world with trial)
so that
(rate for invited group in world with trial) - (rate for invited group in world without trial) = (observed rate for invited group) - (observed rate for control group) = 0.186.
The rest of the derivation is just being clear about what I mean:
(decrease for acceptors) = ((rate for acceptors in world with trial)-(rate for acceptors in world without trial))
(decrease for refusers) = ((rate for refusers in world with trial)-(rate for refusers in world without trial))
So, finally, we have
(0.186% overall decrease)
= 0.42 × (decrease for acceptors)
+ 0.58 × (decrease for refusers)
Ah cool, I think that makes sense \m/
I think you and (from what you describe) your economist friends are wrong.
In short: the issue you point out is that being called for colonoscopy likely affects the probability of getting cancer; if so that means that the exclusion restriction fails and the LATE formula does not apply, so your economist friends are wrong. Your argument correctly shows that the prob of cancer in the colonoscopied population differs from that in the baseline pop by less than 0.443%, but you are wrong to conclude from that that the decrease in prob from getting a colonoscopy in a random pop is less than 0.443%.
Let me explain (sorry for the length).
Let Pi be the prob of getting cancer for those invited (at random) to get colonoscopy.
Pa the prob for those invited that accepted.
Pr the prob for those that declined.
Pb the baseline probability (for those not randomly invited).
We have:
Pi = 0.42 Pa + 0.58 Pr. Subtracting Pb from both sides:
Pi - Pb = 0.42 (Pa - Pb) + 0.58 (Pr - Pb) (note that -0.42 Pb - 0.58 Pb = -Pb)
So we find:
Pa - Pb = (Pi - Pb) / 0.42 - (0.58/0.42) (Pr - Pb)
Now, as you very nicely point out, and your economist friends seem to miss, Pr is possibly < Pb (those invited that initially refused the colonoscopy might get some awareness of the importance of colonoscopy). THIS IS A KEY POINT: BEING INVITED MIGHT HAVE A CAUSAL EFFECT ON THE REFUSERS; this means that the EXCLUSION RESTRICTION FAILS and so the typical LATE formula is invalid. (On the other hand, the bias could go the other way around: those refusing the colonoscopy will likely be less health-concerned, and less healthy, than the average population, and so might get more cancer, but the data seems to indicate that the 1st effect dominates.)
Under your assumption that Pr < Pb, it does follow that Pa - Pb > (Pi - Pb)/0.42 = -0.186%/0.42, that is, the prob in those accepting the colonoscopy is larger than Pb - 0.186%/0.42 = Pb - 0.443%.
But from that you cannot conclude that getting a colonoscopy in a random pop decreases the prob of cancer by less than 0.443%.
What the economists get right is that Pa - Pb is not the (causal) effect of the colonoscopy; there is also a selection bias: those accepting the colonoscopy are likely more health-conscious, so likely Pa < Pb + causal effect of the colonoscopy on a random population (note, here the causal effect is likely negative). (On the other hand, maybe those accepting already have some minor symptom, in which case the selection bias is the other way around, Pa > Pb + causal effect.)
This selection bias is what the LATE formula nicely solves, BUT ONLY IF THERE IS EXCLUSION RESTRICTION (i.e. treatment assignment affects the outcome only by its effect on treatment); in addition we need a monotonicity assumption. If the exclusion restriction fails, I'm not sure we can say anything about the causal effect (even in a subset of the population), but I'd have to think a bit more about it.
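Just to separate the algebra from the exclusion-restriction question, here is a small symbolic check (a sketch using sympy) that the decomposition above is an identity, plus the numeric bound it gives when Pr < Pb.

```python
# Check that Pi = 0.42*Pa + 0.58*Pr implies
# Pa - Pb = (Pi - Pb)/0.42 - (0.58/0.42)*(Pr - Pb), as used above.
import sympy as sp

Pa, Pr, Pb = sp.symbols("Pa Pr Pb")
Pi = sp.Rational(42, 100) * Pa + sp.Rational(58, 100) * Pr

lhs = Pa - Pb
rhs = (Pi - Pb) / sp.Rational(42, 100) - sp.Rational(58, 42) * (Pr - Pb)
print(sp.simplify(lhs - rhs))  # 0, so the identity holds

# With Pi - Pb = -0.186%, the first term is the bound discussed above:
print(-0.186 / 0.42)  # ≈ -0.443, so Pa > Pb - 0.443% whenever Pr < Pb
```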
In the wiki article on LATE, the exclusion restriction is somewhat weirdly called "excludability condition":
https://en.wikipedia.org/wiki/Local_average_treatment_effect
Also, LATE usually refers to local average treatment effect, not latent average ... since it is the average treatment effect in a particular population: those who change treatment due to the intervention.
Hi there. I didn't understand all of that, but... it does seem like the causal effect of just getting invited is *probably* small, doesn't it? The only story I could come up with is that getting the invitation affects how well you take care of yourself. Seems possible but... likely not to be a big effect. And if it did happen, you'd think the effect would wear off over time, but that's not what the time-plots seem to show.
Also, while it's plausible that accepters would be more health-conscious than refusers, that isn't what the data suggests. The refusers had less cancer after 10 years than controls, indicating that people are smart and accept when their risk is higher. (Actually, as I recall, this was true in Poland where most of the subjects came from, but in the other countries, it was more like you suggest.)
OK, so I misunderstood your post.
If you accept that being selected only affects chances of cancer via getting the offered colonoscopy (the exclusion restriction holds), then your economist friends were right (though maybe their explanation wasn't great) and you are kind of wrong.
Basically, from the data we have you cannot claim that the effect in the overall population is lower than 0.443%. It could be higher or lower than that (unless the rate in the refusers is less than 0.443% - the decrease cannot be larger than the rate without intervention - in which case you are right).
How could it be that the random colonoscopy would help the refusers more than the acceptors, even when the refusers had lower rates of cancer? Accepters are likely more health-conscious, so even without the random colonoscopy many of them might have eventually gotten a colonoscopy early enough to avert cancer (for those the random colonoscopy had no effect), while those that end up with cancer among the refusers might be some who don't get a colonoscopy until it's too late, so for them a random colonoscopy would have been very impactful.
So we only know that the causal effect of the random colonosc in the population that accepted is 0.443%. And that is why this is called a Local Average Treatment Effect. The data we have doesn't tell us much about the rate in the refusers population if they got the treatment, so we are in the dark about the causal effect in that population.
What would be certainly biased as an estimate of the effect of the colonosc is to simply compare the rate of accepters to the rate in the overall population. This would be biased because we compare 2 diff populations (acceptors might be more health conscious or more cancer prone, so even if the colonosc had no effect we could see an effect simply due to the baseline population difference). This is the bias that the economists are very happy LATE or instrumental variables gets rid of.
Anyway, yours was a very nice re-discovery of the LATE in a situation where only selected people can get the treatment. If you consider a situation where some assigned to the control population can also get the treatment, you might figure out the derivation that largely got Angrist and Imbens the Econ Nobel prize a few years ago.
I agree that if a colonoscopy would help refusers more than it would help acceptors, then the benefit for a random person would be higher than 0.443%. But I think that's quite unlikely since refusers had much lower colorectal cancer rates than controls. Also the paper notes that no colonoscopy screening program was introduced in any of the study areas during the course of the trial, which I think means that very few people did screening colonoscopies outside of the trial.
Well, coming back to your original post I see:
Rate in control pop 1.2%, rate in refusers 1.05%, so
1.2% = 0.42 * (rate acceptors without rand colonosc) + 0.58 * 1.05
-> (rate acceptors without rand colonosc) = 1.41%
So, for acceptors, the rate went from 1.41% to 1.41 - 0.443 = 0.96%
while for refusers it was 1.05% without the rand colonosc.
It doesn't seem too implausible to me that with the rand colonosc the rate in refusers could go to 0.61 or lower. But we don't really know, that is the point I was making.
Now, we could make some assumptions. The relative decrease in cancer in the acceptors pop was 0.443/1.41 = 31%. If we assume this rate is equal in all populations, then the reduction in the refusers would be 1.05*.31 = 0.33pp, and the reduction in the overall population 0.443*0.42 + 0.33*0.58 = 0.377pp, and you would have been right (you might have had something like this in mind).
But since we agree that acceptors and refusers are different, this assumption might not be warranted, and this is why the LATE is the best measure we have of the causal effect of a random colonosc.
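Here is that back-of-the-envelope extrapolation as a few lines of Python, just to make the arithmetic explicit. The equal-relative-reduction assumption is the one stated above, and as noted it may well not hold for refusers.

```python
# Extrapolation under the (strong) assumption that the relative risk reduction
# seen for acceptors applies to refusers too. Numbers are from this thread.
rate_acceptors_untreated = 1.41   # %  (implied by 1.2% = 0.42*x + 0.58*1.05%)
rate_refusers_untreated = 1.05    # %
late_acceptors = 0.443            # percentage-point reduction for acceptors

relative_reduction = late_acceptors / rate_acceptors_untreated      # ≈ 31%
reduction_refusers = rate_refusers_untreated * relative_reduction   # ≈ 0.33 pp
reduction_overall = 0.42 * late_acceptors + 0.58 * reduction_refusers
print(f"refusers: {reduction_refusers:.2f} pp, overall: {reduction_overall:.2f} pp")
# -> roughly 0.33 pp and 0.38 pp, matching the 0.377 pp figure above
```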
It's been a while since I did econometrics in undergrad, but I'll try to take a stab at this. Basically, if the procedure is unbiased the estimate is unbiased, and we can show that this experimental method eliminates selection bias.
In summary:
1. We want to know the effect of a colonoscopy on cancer.
2. We can randomly assign people to the colonoscopy group or the not-colonoscopy group, but we cannot randomly assign people to actually get the colonoscopy (they can refuse).
3. The difference between the colonoscopy group and the not-colonoscopy group is an unbiased estimate, because this is randomized. There is a fundamental difference between people who accept colonoscopy once assigned and people who don't accept colonoscopy once assigned, but because of randomization, there is an equal amount of this "fundamental difference factor" in both treatment and control. Hence, there is no bias.
4. The only way being randomly assigned to the colonoscopy group can impact cancer is from actually getting the colonoscopy. Hence, the effect of being randomly assigned to the colonoscopy group gives us an unbiased estimate of the effect of colonoscopy.
The key bit here is that the randomization process controls for selection bias in step 3 already, so as long as we don't introduce any additional bias in step 4, we have an unbiased estimate of what we really care about (the effect of colonoscopy on cancer). It is critical that assignment can only impact cancer via colonoscopy itself — if that's not true, it's biased again.
Part of why your discussion with the economists may have been less productive is because bias is a property of the estimation procedure — by definition, if the procedure is unbiased, the estimate is unbiased. So you can only refute a claim that the estimate is biased by pointing to the process and saying "look, we can prove there's no bias". Rather than being a flawed "parallel argument", I think it's more that bias is process-dependent, so you can only really talk about it in such terms, plus there being too many layers of specialist terminology and math making it confusing.
(In more technical terms, the instrumental variable here is "assignment to the colonoscopy group". We use assignment to the colonoscopy group as an instrument to help estimate the effect of colonoscopy on cancer, since we cannot get an unbiased estimate of colonoscopy on cancer directly).
I believe that you are making invalid assumptions regarding burden of proof.
You seem to be assuming that a rebuttal of a simple argument must, necessarily, be simple (and, less importantly, that the rebuttal of a complex argument must be complex). This does not follow, and, if a rebuttal of the simple argument is actually complex (and I find this to be the most common case in my life) then there is little to be gained by rebutting it, as many people will view the complex rebuttal in the same way as they view the complex argument - you just end up wasting everybody's time and changing no minds. Contrariwise, I often find simple rebuttals to complex arguments - usually by simply disagreeing with one or more of the premises on which they are based.
I don't think my argument hinges on rebuttals of simple arguments being themselves simple (which I agree often is not true). I'm just arguing that *some* rebuttal should be done before giving a more complex parallel argument.
You are hiding the assumption of simplicity behind the *some*. If the simplest rebuttal is complex then *some* rebuttal must be either complex or wrong and if it is complex then TL;DR
I'm really not. It's OK if the *some* is complex. There's a difference between giving a complex argument that explicitly explains what mistake someone else is making and giving a complex totally unrelated argument that leads to a different conclusion.
I actually wrote (most of) an extra section going into this in more detail, but cut it in the interest of being concise. Since it's relevant, I guess I'll just post it here? It's an example of a case where a simple argument needs a complex rebuttal, but I feel everything is good.
---
Fancy arguments collide with simple ones in lots of other domains. As an example of how this often goes *right*, take dieting.
If you want to lose weight, a simple argument is "calories in versus calories out". If you eat less or exercise more, you're guaranteed to lose weight because of thermodynamics or whatever.
Basically everyone agrees that is correct. But many suggest that *practically*, it's the wrong thing to focus on, because your body has regulatory mechanisms that control how much you want to eat, how tired you are, how much you fidget, etc. And it's *extremely* difficult to fight those regulatory mechanisms using willpower. (Debates about this often degenerate into debates about the nature of "free will".)
So opponents of calories-in versus calories-out argue that the biggest *practical* problem of weight loss is how to find ways to play around with those regulatory mechanisms and get a lower set-point without requiring inhuman discipline—e.g. by eating only potatoes, or taking drugs that activate your GLP-1 receptors.
So: We often need fancy arguments. But—because simple arguments are more legible—they should be justified by first explaining what's wrong with simple alternatives.
This comment ended up being snarkier than I had originally intended. Sorry.
Disclaimer: I did not read the big study because it was paywalled.
I think this is mostly (entirely?) a problem of context. In the context of going from data to a summary statistic, bias means estimator bias, not selection bias. Would you still have a problem if everyone explicitly said estimator bias? Do you think you would get a better response if you consistently used selection bias / biased by selection effects? I think Recht in the sensible-med article you link knows that IV only gives the treatment effect for those who are treated: "There is no perfect way to correct for the misestimates inherent to the intention-to-treat analyses." As does wikipedia (https://en.wikipedia.org/wiki/Instrumental_variables_estimation#Interpretation_under_treatment_effect_heterogeneity): "Generally, different subjects will respond in different ways to changes in the "treatment" x. When this possibility is recognized, the average effect in the population of a change in x on y may differ from the effect in a given subpopulation."
The sensible-med article does include the line "If it varies from the per protocol analysis, we know the IV estimate is closer to the effect size and we know the per protocol analysis has bias." which I think is suspect, but it's not written by Recht but by the editor Prasad, so I'm willing to ignore it.
Also note, estimation means estimation of a parameter in a model. Everything is a model. Your model is the standard IV model + a difference in response rates correlated to underlying risk. Granted, this is probably a better model. But do you know how to take the data and estimate all the parameters in your model? No. You've used the standard IV model and then reasoned that in your model the treatment effect parameter would be less, but not by how much. The advantage of using simpler and standard models is that you know which calculations to do to get the estimates and even get the confidence intervals of those estimates. Maybe you are unimpressed by that, because the standard model is "too simple". As the adage goes, all models are wrong but some are useful.
I think you should read the big study. I've posted a non-paywalled link elsewhere here. I find it very unlikely you would find it "simpler" than my analysis.
You: "If you want to know how good colonoscopies are, probably you’d like to know what would have happened if everyone had agreed. Surely the decrease in colorectal cancer would have been larger than 0.186%. But how much larger?"
Article: "differential adherence makes ITS effects hard to compare across trials and sites. We show how instrumental variables (IV) methods address the nonadherence challenge"
I think they are quite clear about what they are calculating:
"LATE is also per-protocol effect, but not for everyone: as Eq. 5 shows, [parameter] gives the average causal effect of screening among experimental subjects screened as a result of the trial."
"Importantly, when all subjects not offered screening remain unscreened, LATE equals the average effect of screening on everyone in the study population who is screened."
You explicitly make two claims:
Claim 1. "If two people disagree, it should be the responsibility Dr. Fancy to explain what’s wrong with Dr. Simple, not the reverse."
Claim 2. "simple math doesn’t, like, disappear when a fancy alternative is presented"
I broadly agree with Claim 1, with exceptions for trolls and bad-faith actors, but I don't think you substantially disagree with the linked articles, so it doesn't apply here. And for Claim 2, simple math can be a nice check on fancy math, and both you and the article arrive at similar figures, huzzah!
I'm more interested in addressing these more concrete implicit claims
Claim 3. The answer could be derived more easily, so why use fancy math?
"Does the instrumental variable reframing add anything?"
"So while the second point is true, we already knew it, and we didn’t need no instrumental variables."
The article acknowledges that lots of people calculate these statistics without using an IV framework, in the following footnote:
"IV ideas applied to randomized trials appear in alternate forms in social science and medicine without referencing IV or potential outcomes. Bloom (15) adjusts trial data for treated never-takers. Newcombe (16) derives an adjustment for randomized trials with control-group crossovers. Hearst et al. (17) uses similar reasoning to obtain effects of Vietnam-era military service using the American draft lottery. Baker and Lindeman (18) and Baker et al. (19) use maximum likelihood to derive an IV-type adjustment for nonadherence in a model for Bernoulli outcomes. Some analyses of screening trials, including Atkin et al. (20) and Segnan et al. (21), reference an adherence adjustment due to Cuzick et al. (22). Also focusing on Bernoulli outcomes, the latter derives a maximum likelihood estimator that adjusts risk ratios for nonadherence. The Cuzick et al.’s (22) estimator is an instance of results in Imbens and Rubin (23), which uses IV to compute marginal distributions of potential outcomes for compliers."
So they must agree that the IV framework is not necessary to get the point estimate. They offer reasons for using an IV framework. A lot of these are ways that it is better than other frameworks (intention-to-screen, as-treated, whatever ref. 7 did). Some other reasons offered are "IV estimates and the associated standard errors are easily computed" and there is "an immediate path to off-the-shelf statistical inference." Given how often researchers mess up basic calculations, the more off-the-shelf tools they can use the better. And I'll point out that your simple math doesn't come with a confidence interval, which theirs does.
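To illustrate the confidence-interval point, here is a sketch of how even the simple rescaling estimate can be given an interval with off-the-shelf simulation. The cancer rates and acceptance rate are the ones quoted in this thread; the arm sizes are placeholders (the real NordICC group sizes aren't given here), and the acceptance rate is treated as fixed for simplicity.

```python
# Parametric bootstrap interval for the simple "ITT / acceptance" estimate.
import numpy as np

rng = np.random.default_rng(0)
N_INVITED, N_CONTROL = 28_000, 56_000          # placeholder arm sizes
RATE_INVITED, RATE_CONTROL = 0.01014, 0.0120   # 1.2% minus 0.186pp, and 1.2%
ACCEPTANCE = 0.42                              # treated as fixed here

estimates = []
for _ in range(10_000):
    invited = rng.binomial(N_INVITED, RATE_INVITED) / N_INVITED
    control = rng.binomial(N_CONTROL, RATE_CONTROL) / N_CONTROL
    estimates.append((invited - control) / ACCEPTANCE)

lo, hi = np.percentile(estimates, [2.5, 97.5])
print(f"95% interval for the per-acceptor effect: [{lo*100:.3f}, {hi*100:.3f}] pp")
```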
While the most important thing in the article is the calculation of LATE and its confidence intervals, in Section 3B it checks whether deviations between the trial sites can be explained by sampling variation and finds that they can. Imagine if it had found that one trial site was sharply different from all the others. This might have indicated a problem with how the trial was run at that site, an important thing to know.
There is a cost to using the framework that is not discussed in the article, namely that it has lots of terminology and mathematics that may be unfamiliar to those not used to the model, and simpler ad-hoc calculations may suffice for at least some of the results. I tend to think there is a large benefit of a research community using standard calculation methods at the expense of concreteness, but reasonable minds may differ.
Claim 4. Proponents of IV do not acknowledge selection bias.
"I was contacted by a few economists. They said something like this: .... Because of fancy math reasons, instrumental variables methods are unbiased."
"[PNAS article made] the instrumental variables argument again, except with even more math and even more insistence that selection bias has been solved."
"refusers had less colorectal cancer then controls, even though neither did colonoscopies. Presumably that happens because people have some idea of their cancer risk and are more likely to agree to a colonoscopy when that risk is higher."
I think the quotes in my first comment refute this for the sensible-med article.
For the big study article, they give their opinion on selection bias explicitly:
"The resulting selection bias can go either way. For instance, NORCCAP adherents are relatively educated, and therefore likely to be relatively healthy whether screened or not. But they are also older and more likely to be male, elevating risk. Beyond such demographic differences, adherence may be motivated by chronic health concerns such as diabetes, a history of polyps, or a family history that elevates CRC risk."
And they know that this something they should check:
"Are compliers mostly old or mostly young? Mostly male or mostly female? Do they have pre-existing conditions that predispose them to take advantage of screening? Are complier populations so unusual that the external validity of IV estimates is limited?"
Now I would say that in Section 3 they underdeliver on the above promise, because they only look at demographic factors (age, sex, residency in Oslo, lol). But as you say in your asteriskmag piece "But the dataset used was very limited — it didn’t have income, education, family history...." So it seems they are doing the best they can.
I cannot comment on your private correspondence with fancy-math-loving economists.
I don't know what to have as a conclusion here, but I think how you put the question "So is 0.443% biased, or isn’t it?" is not helpful. All estimates will have limitations. Let's make it an open question: "Our point estimate is 0.443%, what further considerations should be applied? In which direction do they move our point estimate?" I don't think you or the article makes a decisive argument either way on this issue. There are small demographic differences in the article between the compliers and refusers. Given that the effect size itself is quite small, I would like to see how robust the outcome is to changes in these differences. Likewise you presume that there is a difference in CRC risk between the refusers and the control, based on a difference in rates (from asteriskmag, "At the end of the study period, 1.2% of the controls were diagnosed with colorectal cancer, compared with 1.05% of refusers and 0.89% of acceptors"). But you also report "there weren’t that many colorectal cancer deaths during the trial (229 total), so the analysis was done on sparse data." So is this difference significant or just sampling error? You also offer the possibility "It’s also possible that refusers just hate going to the doctor and so just had fewer diagnoses, but never mind." And so on.
Once you have a big list of these further considerations, what do you do if some of them would increase the estimate and some of them would decrease the estimate, and some of them say the estimate is just right? Well you need to come up with a model that incorporates these effects, with parameters for their different strengths, and then you have to try to fit the data to the model to estimate these new parameters. Then the cycle of criticism repeats, and science improves.
Bob's task was to give a weight, not prove Alice wrong. So, why does Bob need to rebut anything? "There’s zero obligation to do anything else." He was asked for a weight and gave an answer of 1985kg. Yes, one of them has to be wrong. But, they can also both be wrong. Bob's pointing out of errors in Alice's method would not prove his method correct.
Not all simple mistakes are easy to find. Counting bricks sounds simple, but how can an error in the count be found other than by duplicating the work or offering proof through more complicated math?
Bob pointing out an error in Alice's method wouldn't prove him correct, but it would at least prove Alice *in*correct. That's something, no? And if he doesn't want to re-count the bricks, OK, but I think he should at least commit that that's where he thinks Alice's error lies, rather than leaving it ambiguous if she made an error in the count or if he thinks there's a flaw in her logic, or what.
I'll admit there could in principle be cases where there's a very simple-seeming argument that seems compelling, but is wrong and requires very difficult reasoning to explain why it's incorrect (reasoning that can't even be summarized or given a reference for), and where it's overall more efficient to, instead of refuting it, just throw a more complex argument back at the person making the simpler one and tell them to sort out the differences themselves. But I think such cases are quite rare! (And colonoscopies aren't one of them...)
I think there's this mindset from other non-social sciences that spills into social science in a unproductive way. There's no scientific finding that involves human subjects that can divorce itself from human behavior (in this case, selection to comply to treatment assignment). Instead of thinking about whether or not colonoscopies work, you think about whether the policy of nudging people to get colonoscopies work.
This type of evidence answers policy questions and are not necessarily applicable to individual choices or the science of a specific procedure. Maybe in some cases, for some reason, you have very high compliance and it's doubled-blinded, etc etc - but a lot of these studies that don't have these very unusually ideal experimental conditions don't really lend themselves well to extracting the strict science of a procedure, it's always going to be obfuscated by the social stuff.
Not only do you make excellent points, but you also gave me a chuckle as I pondered how one could possibly ever design a double-blind colonoscopy trial. On the patient side, it would really suck to go through all the prep only to discover later that you were in the placebo group. But on the clinician side, I can't imagine how the trial could be blinded.
I cannot imagine an *ethical* way for the clinician to be blinded.
You could blindly and randomly assign the video of the procedure for clinicians to review, like they are double checking someone else's work. But now you've introduced a subtle bias in the reviewer's mind.
But yea, getting a placebo colonoscopy past an IRB sounds fun.
Not only that, what would you do after the placebo colonoscopy? Tell the patients everything looks good, even though you didn't do one? Tell them, "oops, we lost the records"?
But I think blinding isn't as big a problem here as sample bias. In principle, I think you could reduce sample bias quite a lot by just using money: Tell people: "Show up at location X on date Y after having done all the prep for a colonoscopy, and agree to do a colonoscopy, and we will give you Z dollars". Then pay everyone who shows up and perform colonoscopies on a random subset. Colonoscopies are already so expensive that you could raise Z pretty high without increasing costs *that* much.
Not totally sure that would pass IRB, though... colonoscopy prep is pretty unpleasant, so maybe too much harm without benefit for the control group?
Single-blinded sham surgery has actually happened a few times in orthopedics - for some surgeries that involve the knee or the back to reduce pain - with patient consent. They were as successful as the real treatment in most cases too!
https://academic.oup.com/painmedicine/article/18/4/736/2924731?login=false
This post reminds me of Scott Alexander's excellent article, "Getting Eulered":
https://slatestarcodex.com/2014/08/10/getting-eulered/
Good catch! I actually thought I remembered a different SSC which was also relevant (I remember "HARD MATH" being mentioned several times), but I wasn't able to find it.
@dynomight - did you intend for the brick estimates to disagree? I ask because the disagreement does not seem to be part of the analogy.
Alice = 1,000 kg, Bob = 1,985.5 kg
I did mean for them to disagree, but I also see the disanalogy you're pointing out and agree with you. The problem is that I couldn't find a simple brick-based analogy that preserved all the structure of the colonoscopy debate (including them calculating the same numbers). Ultimately, I felt like all the structure that really mattered was "two arguments, one simple, one fancy, different conclusions".
OK, but after all is said and done, do you have an opinion on the effectiveness (or lack thereof) of colonoscopies? I don't know math from mashed potatoes, so I can't really tell from your write-up what you now think. I recently finished reading an interesting book called "Outlive: The Science and Art of Longevity", by Peter Attia, MD. He seems to know what he's talking about, and he comes out strongly in favor of colonoscopies, to the point where I'm thinking of getting another one even though I already had one about six years ago.
I think I was subconsciously hiding my opinion because I don't think I'm enough of an expert to judge that.
But for what it's worth, my impression is that:
(1) There is clear evidence that colorectal cancer screening in general increases life expectancy (including sigmoidoscopy and fecal tests)
(2) There are fairly strong prior reasons to believe that colonoscopy would be more effective than sigmoidoscopy.
(3) So far the limited evidence about colonoscopy isn't all that encouraging in terms of showing an *extra* benefit, but it's *very very far* from conclusive so point (2) is still quite important.
So I would definitely urge getting some type of screening. And I think I agree that colonoscopy is probably the best bet. But colonoscopy is also more unpleasant and expensive, and colonoscopy has a somewhat higher rate of side effects although still small (probably? unclear) and it's hard to say how much better colonoscopy is.
So please get screened (somehow)! I'm personally unsure if colonoscopy is "worth it", especially because I don't know how much you hate doing them, how much it would cost you, your financial situation, etc. But they don't seem like a bad option! In 50 years we might have decided they are indeed much better, or maybe they aren't worth it, but I doubt we'd find that they were drastically worse than, sigmoidoscopy.
I wish I had access to the full paper to see, but isn't the disagreement between your logic and the proposed solution to the logic (instrumental variables) over the validity of the instrumental variable selected? You seem to be arguing that the instrumental variable selected was invalid, because "receives colonscopy" actually does correlate with lower risk of having colon cancer when the paper is arguing it doesn't and only instrumentally causes lower risk?
What was the justification provided in the paper for why that instrumental variable doesn't have an impact on the final variable?
You can read it (a preprint?) here: https://blueprintcdn.com/wp-content/uploads/2023/12/Blueprint-Published-Paper-2023-Instrumental-Variables-Methods-Reconcile-Intention-to-Screen-Effects-Across-Pragmatic-Cancer-Screening-Trials.pdf
But I *don't* think it comes down to the validity of the instrumental variable. I agree the instrumental variable is valid, because I agree that (being invited to participate in a colonoscopy trial) only influences the outcome (gets colorectal cancer) though the intermediate variable (gets colonoscopy). I agree with them that that's true, so I agree with them that the IV calculation is right.
The only disagreements are (1) the IV calculation only estimates causal effects *for acceptors*, which I think is best thought of as "biased" since acceptors are at higher risk and more likely to benefit from colonoscopy though I suppose ultimately it's a semantic debate and (2) I think IV is kinda pointless here since we already knew everything it says without using IV.
As I recall from learning to use IV some years ago in econ grad school, you are quite correct here: IV is good and unbiased when the IV shouldn't be endogenously related to what you are using it for. So as you say, since accepting the invitation is going to be biased towards "I probably should get one, yea" vs "Nah, I am fine, no colon cancer in my family" for those who refuse, it won't be an unbiased IV if the outcome variable is "did they get colon cancer?" Which is the tricky bit, picking an IV that isn't related to the decisions people make relevant to the topic, which is ridiculously hard in many (most?) cases.
Here's a point I heard from a doctor on a podcast:
If everyone got a colonoscopy every three months, no one would die of colon cancer.
(Ooops, I see that Alex C said the same thing-ish - it was Attia who made this point)
Some rough math:
[Global Burden of Disease 2019](https://www.thelancet.com/journals/langas/article/PIIS2468-1253(22)00044-9/fulltext) reports 295 DALYs per 100,000 due to colorectal cancer, which if I understand correctly, means the average person can expect to lose about 87 quality-adjusted life-days to colorectal cancer.
Screening every 3 months starting at age 40 means you will get screened around 120 times in your life. So it's not clear to me that that much screening is worth it. Seems worth it but not obviously so (assuming it has a 100% success rate at preventing cancer). (I've never gotten a colonoscopy so I don't know how unpleasant it is.)
Economist here. I read the initial post and was very tempted to send a message exactly in line with what economists have sent. But then I read your post carefully and noticed your claim was that the 0.443% is biased _for the entire_ population, noted that is correct, and I didn't have any further objections. I fully endorse your "everyone is right" conclusion, and kudos for figuring it out.
Economists sometimes downplay the issue that the IV estimates are specific to the complier population, but that's something they should know. One classic reference about this is Imbens (2010) "Better LATE than nothing".
One note: you ask "I already figured out this number using the obvious assumptions and grade school algebra, why would I need stinking *instrumental variables*? Turns out IV can get complicated in some cases, but boils down to exactly the computation you did when you have binary instrument, binary independent variable. I think the intuition of "IV is rescaling" is underappreciated among economists.
Whew. Thanks so much for writing this. I hesitated to publish this because even after quadruple-checking things, I still worried that I might be missing something. So it's a huge relief to get some confirmation that I'm not completely crazy!
Great post as usual thank you.
But if you believe this 👇
> If two people disagree, it should be the responsibility Dr. Fancy to explain what’s wrong with Dr. Simple, not the reverse.
Than how you would solve the Brandolini's law?
If we demand from the scientists explaining every axiom or theory to the public than we may occupied them from doing new search and discoveries because they are busy explaining to Dr. Simple that their eyes are biased and the earth is not flat.
For an issue such as the shape of the earth, it should suffice for Dr. Fancy to publish a textbook containing the evidence that the earth is spheroidal and leave it at that. He doesn't have the responsibility to debate every random flat-earther Mr. Simple who comes along.
I think today the more traditional practice would be to state that those who believe the earth is flat are all racists and misogynists and no one should ever associate with them.
@dynomight - in the brick analogy, it seems like Occam's Razor would fit pretty well as a logical heuristic to put the explanatory burden on Dr. Fancy.
A couple of things that made me scratch my head.
You 'assume the “decrease” for refusers is zero', but then later say 'refusers had less colorectal cancer then controls'. Is the control group the group of people not invited? If so, wouldn't these two statements be contradictory?
It seems reasonable to guess that the acceptors are biased, and that also means the refusors would be biased. But if so, then surely you can't make the assumption in your calculation?
> You 'assume the “decrease” for refusers is zero', but then later say 'refusers had less colorectal cancer then controls'. Is the control group the group of people not invited? If so, wouldn't these two statements be contradictory?
No contradiction there. Let's clarify two different claims:
Claim 1: (colorectal cancer rate for refusers in world where NordICC happens) ≈ (colorectal cancer rate for refusers in world where NordICC never happens)
Claim 2: (colorectal cancer rate for refusers in world where NordICC never happens) ≠ (colorectal cancer rate for acceptors in world where NordICC never happens)
I think both of these are true. I think the first one is true because getting and refusing an invitation to a colonoscopy probably doesn't change your personal risk of getting colorectal cancer. And I think the second is true because people preferentially agree to participate in colonoscopy trials when their personal risk is higher.
Ok, thanks for the clarification.
So I'm trying to reproduce your math, i.e.,
(0.186% overall decrease)
= 0.42 × (decrease for acceptors)
+ 0.58 × (decrease for refusers)
where (0.186% overall decrease) = (rate of cancer for uninvited) - (rate of cancer for invited).
For the moment I'm just going to consider a 'rate' to be a proportion of the sampled population. Later we can consider them as estimates for probabilities of a general population.
I can get the equation by messing around with the sample proportions if
(decrease for refusers) = (rate of cancer for uninvited) - (rate of cancer for invited refusers).
Then assuming (decrease for refusers) = 0 gets you your result. But (decrease for refusers) = 0 is not the same as Claim 1.
So maybe the equation doesn't come from the above method?
Hmmm. I guess I'd derive that equation this way. Firstly, it's true that
(rate for invited group in world with trial) - (rate for invited group in world without trial)
= 0.42 × ((rate for acceptors in world with trial)-(rate for acceptors in world without trial))
+ 0.58 × ((rate for refusers in world with trial)-(rate for refusers in world without trial))
Next, I claim that
(rate for invited group in world without trial) = (rate for control group in world with trial)
so that
(rate for invited group in world with trial) - (rate for invited group in world without trial) = (observed rate for invited group) - (observed rate for control group) = 0.186.
The rest of the derivation is just being clear about what I mean:
(decrease for acceptors) = ((rate for acceptors in world with trial)-(rate for acceptors in world without trial))
(decrease for refusers) = ((rate for refusers in world with trial)-(rate for refusers in world without trial))
So, finally, we have
(0.186% overall decrease)
= 0.42 × (decrease for acceptors)
+ 0.58 × (decrease for refusers)
Ah cool, I think that makes sense \m/
I think you and (from what you describe) your economist friends are wrong.
In short: the issue you point out is that being called for colonoscopy likely affects the probability of getting cancer; if so that means that the exclusion restriction fails and the LATE formula does not apply, so your economists friends are wrong. Your argument correctly shows that the prob of cancer in the colonoscopied population differs from that in the basile pop by less than 0.443%, but you are wrong to conclude from that that the decrease in prob from getting a colonoscopy in a random pop is less than 0.443%.
Let me explain (sorry for the legnth).
Let Pi be the prob of getting cancer for those invited (at random) to get colonoscopy.
Pa the prob for those invited that accepted.
Pr the prob for those that declined.
Pb the baseline probability (for those not randomly invited).
We have:
Pi = 0.42 Pa +0.58 Pr. Subtracting Pb from both sides:
Pi - Pb = 0.42 (Pa -Pb) + 0.58 (Pr - Pb) (note that -0.42 Pb -0.58 Pb = -Pb )
So we find:
Pa - Pb = (Pi - Pb) / 0.42 - 0.58/0.42 (Pr - Pb)
Now, as you very nicely point out, and your economist friends seem to miss, Pr is possibly < Pb (those invited that initially refused the colonoscopy might get some awareness of the importance of colonoscopy). THIS IS A KEY POINT: BEING INVITED MIGHT HAVE A CAUSAL EFFECT ON THE REFUSERS; this means that the EXCLUSION RESTRICTION FAILS and so the typical LATE formula is invalid. On the other hand the bias could go the other way around: those refusing the colonoscopy will likely be less health-concerned, and healthy that the average population, and so might get more cancer, but the data seems to indicate that the 1st effect dominates).
Under your assumption that Pr<Pb it does follow that Pa - Pb > (Pi - Pb)/0.42 =-0.186%/0.42, that is, the Prob in those accepting the colonoscopy is larger than Pb -0.186% / 0.42 = Pb -0.443%
But from that you cannot conclude that gettin a colonoscopy in a random pop decreases the prob of cancer by less than 0.443%.
The economists get right is that Pa - Pb is not the (causal) effect of the colonoscopy, there is also a selection bias, those accepting the colonoscopy are likely more health-conscious, so likely Pa< Pb + causal effect of the colonoscopy on a random population -note, here causal effec is likely negative- (on the other hand, maybe those accepting have already some minor symptom, then the selection bias is the other way around, Pa>Pb +causal eff).
This selection bias is what the LATE formula nicely solves, BUT ONLY IF THERE IS EXCLUSION RESTRICTION (i.e. treatment assignment affects the outcome only by its effect on treatment); in addition we need a monotonicity assumption. If the exclusion restriction fails, I'm not sure we can say anything about the causal effect (even in a subset of the population), but i'd have to think a bit more about it.
In the wiki artilce on LATE, the exclusion restriction is somewhat weirdly called "excludability condition"
https://en.wikipedia.org/wiki/Local_average_treatment_effect
Also, LATE usually refers to local average treatment effect, not latent average ... since it is the average treatment effect in a particular population: those who change treatment due to the intervention.
Hi there. I didn't understand all of that, but... it does seem like the causal effect of just getting invited is *probably* small, doesn't it? The only story I could come up with is that getting the invitation affects how well you take care of yourself. Seems possible but... likely not to be a big effect And if it did happen, you'd think the effect would wear off over time, but that's not what the time-plots seem to show.
Also, while it's plausible that accepters would be more health-conscious than refusers, that isn't what the data suggests. The refusers has less cancer after 10 years than controls, indicating that people are smart and accept when their risk is higher. (Actually, as I recall, this was true in Poland where most of the subject came from, but in the other countries, it was more like you suggest.)
OK, so I misunderstood your post.
If you accept that being selected only affects the chances of cancer via getting the offered colonoscopy (the exclusion restriction holds), then your economist friends were right (though maybe their explanation wasn't great) and you were kind of wrong.
Basically, from the data we have you cannot claim that the effect in the overall population is lower than 0.443%. It could be higher or lower than that (unless the rate in the refusers is less than 0.443%, since the decrease cannot be larger than the rate without intervention; in that case you would be right).
How could it be that the random colonoscopy would help the refusers more than the acceptors, even though the refusers had lower rates of cancer? Acceptors are likely more health-conscious, so even without the random colonoscopy many of them might eventually have gotten a colonoscopy early enough to avert cancer (for those, the random colonoscopy had no effect), while those who end up with cancer among the refusers might be people who don't get a colonoscopy until it's too late, so for them a random colonoscopy would have been very impactful.
So we only know that the causal effect of the random colonoscopy in the population that accepted is 0.443%. And that is why this is called a Local Average Treatment Effect. The data we have don't tell us much about the rate in the refuser population if they had gotten the treatment, so we are in the dark about the causal effect in that population.
What would certainly be biased as an estimate of the effect of the colonoscopy is to simply compare the rate among acceptors to the rate in the overall population. This would be biased because we would be comparing two different populations (acceptors might be more health-conscious or more cancer-prone, so even if the colonoscopy had no effect we could see an effect simply due to the baseline population difference). This is the bias that the economists are very happy that LATE, or instrumental variables, gets rid of.
Anyway, yours was a very nice re-discovery of the LATE in a situation where only selected people can get the treatment. If you consider a situation where some people assigned to the control group can also get the treatment, you might figure out the derivation that largely got Angrist and Imbens the Econ Nobel prize a few years ago.
I agree that if a colonoscopy would help refusers more than it would help acceptors, then the benefit for a random person would be higher than 0.443%. But I think that's quite unlikely since refusers had much lower colorectal cancer rates than controls. Also the paper notes that no colonoscopy screening program was introduced in any of the study areas during the course of the trial, which I think means that very few people did screening colonoscopies outside of the trial.
Well, coming back to your original post I see:
Rate in the control population 1.2%, rate in refusers 1.05%, so
1.2% = 0.42 * (rate in acceptors without the random colonoscopy) + 0.58 * 1.05%
-> (rate in acceptors without the random colonoscopy) = 1.41%
So, for acceptors, the rate went from 1.41% to 1.41 - 0.443 = 0.96%,
while for refusers it was 1.05% without the random colonoscopy.
It doesn't seem too implausible to me that with the random colonoscopy the rate in refusers could go to 0.61% or lower. But we don't really know; that is the point I was making.
Now, we could make some assumptions. The relative decrease in cancer in the acceptor population was 0.443/1.41 = 31%. If we assume this relative decrease is equal in all populations, then the reduction in the refusers would be 1.05 * 0.31 = 0.33pp, and the reduction in the overall population 0.443 * 0.42 + 0.33 * 0.58 = 0.377pp, and you would have been right (you might have had something like this in mind).
But since we agree that acceptors and refusers are different, this assumption might not be warranted, and this is why the LATE is the best measure we have of the causal effect of a random colonoscopy.
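A quick Python sketch checking the arithmetic above (the numbers are the ones from this comment and the original post; nothing new is assumed beyond the 42%/58% acceptor/refuser split):

```python
# Back-of-the-envelope check of the rates discussed above (rates in %).
accept_share, refuse_share = 0.42, 0.58   # acceptors vs. refusers among the invited
control_rate = 1.2                        # colorectal cancer rate, control group
refuser_rate = 1.05                       # colorectal cancer rate, refusers
late = 0.443                              # percentage-point reduction among acceptors

# The control group mixes the same two subpopulations, so back out the
# acceptors' baseline rate:
acceptor_baseline = (control_rate - refuse_share * refuser_rate) / accept_share
print(f"acceptor baseline rate:         {acceptor_baseline:.2f}%")         # ~1.41%
print(f"acceptor rate with colonoscopy: {acceptor_baseline - late:.2f}%")  # ~0.96%

# If the *relative* reduction (~31%) applied to refusers too:
rel_reduction = late / acceptor_baseline               # ~0.31
refuser_reduction = refuser_rate * rel_reduction       # ~0.33 pp
overall_reduction = accept_share * late + refuse_share * refuser_reduction
print(f"overall reduction under that assumption: {overall_reduction:.3f} pp")
# ~0.38 pp, consistent with the ~0.377 pp above (the small gap is rounding).
```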
It's been a while since I did econometrics in undergrad, but I'll try to take a stab at this. Basically, if the procedure is unbiased the estimate is unbiased, and we can show that this experimental method eliminates selection bias.
In summary:
1. We want to know the effect of a colonoscopy on cancer.
2. We can randomly assign people to the colonoscopy group or the not-colonoscopy group, but we cannot randomly assign people to actually get the colonoscopy (they can refuse).
3. The difference between the colonoscopy group and the not-colonoscopy group is an unbiased estimate, because this is randomized. There is a fundamental difference between people who accept colonoscopy once assigned and people who don't accept colonoscopy once assigned, but because of randomization, there is an equal amount of this "fundamental difference factor" in both treatment and control. Hence, there is no bias.
4. The only way being randomly assigned to the colonoscopy group can impact cancer is from actually getting the colonoscopy. Hence, the effect of being randomly assigned to the colonoscopy group gives us an unbiased estimate of the effect of colonoscopy.
The key bit here is that the randomization process controls for selection bias in step 3 already, so as long as we don't introduce any additional bias in step 4, we have an unbiased estimate of what we really care about (the effect of colonoscopy on cancer). It is critical that assignment can only impact cancer via colonoscopy itself — if that's not true, it's biased again.
Part of why your discussion with the economists may have been less productive is that bias is a property of the estimation procedure: by definition, if the procedure is unbiased, the estimate is unbiased. So you can only refute a claim that the estimate is biased by pointing to the process and saying "look, we can prove there's no bias". Rather than being a flawed "parallel argument", I think it's more that bias is process-dependent, so you can only really talk about it in such terms, plus there being too many layers of specialist terminology and math making it confusing.
(In more technical terms, the instrumental variable here is "assignment to the colonoscopy group". We use assignment to the colonoscopy group as an instrument to help estimate the effect of colonoscopy on cancer, since we cannot get an unbiased estimate of colonoscopy on cancer directly).
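In case it helps, here is a minimal sketch of that Wald/IV calculation for the one-sided-noncompliance case (the 0.186 percentage-point intention-to-treat effect and the 42% acceptance rate are the figures from the post; the function name is mine):

```python
def wald_late(itt_effect_pp: float, acceptance_rate: float) -> float:
    """IV (Wald) estimate of the LATE under one-sided noncompliance.

    itt_effect_pp:   reduction in the cancer rate between invited and control
                     groups, in percentage points (the effect of *assignment*).
    acceptance_rate: share of the invited group that actually got the colonoscopy.
    """
    # Under the exclusion restriction, assignment moves the outcome only through
    # the treatment, so the assignment effect is diluted by nonadherence; dividing
    # by the acceptance rate scales it back up to the effect among acceptors.
    return itt_effect_pp / acceptance_rate

print(wald_late(0.186, 0.42))  # ~0.443 percentage points among acceptors
```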
Dr. Simple,
Your point that Dr. Big Brains needs to explain the problem is most excellent. Made my day complete to see how you laid out the argument.
Call me Molecule.
I believe that you are making invalid assumptions regarding burden of proof.
You seem to be assuming that a rebuttal of a simple argument must, necessarily, be simple (and, less importantly, that the rebuttal of a complex argument must be complex). This does not follow, and if a rebuttal of the simple argument is actually complex (and I find this to be the most common case in my life), then there is little to be gained by rebutting it, as many people will view the complex rebuttal in the same way as they view the complex argument: you just end up wasting everybody's time and changing no minds. Contrariwise, I often find simple rebuttals to complex arguments, usually by simply disagreeing with one or more of the premises on which they are based.
I don't think my argument hinges on rebuttals of simple arguments being themselves simple (which I agree often is not true). I'm just arguing that *some* rebuttal should be done before giving a more complex parallel argument.
You are hiding the assumption of simplicity behind the *some*. If the simplest rebuttal is complex, then *some* rebuttal must be either complex or wrong, and if it is complex, then TL;DR.
I'm really not. It's OK if the *some* is complex. There's a difference between giving a complex argument that explicitly explains what mistake someone else is making and giving a complex totally unrelated argument that leads to a different conclusion.
I actually wrote (most of) an extra section going into this in more detail, but cut it in the interest of being concise. Since it's relevant, I guess I'll just post it here? It's an example of a case where a simple argument needs a complex rebuttal, but I feel everything is good.
---
Fancy arguments collide with simple ones in lots of other domains. As an example of how this often goes *right*, take dieting.
If you want to lose weight, a simple argument is "calories in versus calories out". If you eat less or exercise more, you're guaranteed to lose weight because of thermodynamics or whatever.
Basically everyone agrees that is correct. But many suggest that *practically*, it's the wrong thing to focus on, because your body has regulatory mechanisms that control how much you want to eat, how tired you are, how much you fidget, etc. And it's *extremely* difficult to fight those regulatory mechanisms using willpower. (Debates about this often degenerate into debates about the nature of "free will".)
So opponents of calories-in versus calories-out argue that the biggest *practical* problem of weight loss is how to find ways to play around with those regulatory mechanisms and get a lower set-point without requiring inhuman discipline—e.g. by eating only potatoes, or taking drugs that activate your GLP-1 receptors.
So: We often need fancy arguments. But—because simple arguments are more legible—they should be justified by first explaining what's wrong with simple alternatives.
This comment ended up being snarkier than I had originally intended. Sorry.
Disclaimer: I did not read the big study because it was paywalled.
I think this is mostly (entirely?) a problem of context. In the context of going from data to a summary statistic, bias means estimator bias, not selection bias. Would you still have a problem if everyone explicitly said estimator bias? Do you think you would get a better response if you consistently used selection bias/biased by selection effects? I think Recht, in the sensible-med article you link, knows that IV only gives the treatment effect for those who are treated: "There is no perfect way to correct for the misestimates inherent to the intention-to-treat analyses." As does Wikipedia (https://en.wikipedia.org/wiki/Instrumental_variables_estimation#Interpretation_under_treatment_effect_heterogeneity): "Generally, different subjects will respond in different ways to changes in the "treatment" x. When this possibility is recognized, the average effect in the population of a change in x on y may differ from the effect in a given subpopulation."
The sensible-med article does include the line "If it varies from the per protocol analysis, we know the IV estimate is closer to the effect size and we know the per protocol analysis has bias." which I think is suspect, but it's not written by Recht but by the editor Prasad, so I'm willing to ignore it.
Also note, estimation means estimation of a parameter in a model. Everything is a model. Your model is the standard IV model plus a difference in response rates correlated with underlying risk. Granted, this is probably a better model. But do you know how to take the data and estimate all the parameters in your model? No. You've used the standard IV model and then reasoned that in your model the treatment effect parameter would be smaller, but not by how much. The advantage of using simpler, standard models is that you know which calculations to do to get the estimates, and even the confidence intervals of those estimates. Maybe you are unimpressed by that, because the standard model is "too simple". As the adage goes, all models are wrong but some are useful.
I think you should read the big study. I've posted a non-paywalled link elsewhere here. I find it very unlikely you would find it "simpler" than my analysis.
I read the article as a preprint, linked in another comment:
https://blueprintcdn.com/wp-content/uploads/2023/12/Blueprint-Published-Paper-2023-Instrumental-Variables-Methods-Reconcile-Intention-to-Screen-Effects-Across-Pragmatic-Cancer-Screening-Trials.pdf
First, I think you mostly agree with the article?
You: "If you want to know how good colonoscopies are, probably you’d like to know what would have happened if everyone had agreed. Surely the decrease in colorectal cancer would have been larger than 0.186%. But how much larger?"
Article: "differential adherence makes ITS effects hard to compare across trials and sites. We show how instrumental variables (IV) methods address the nonadherence challenge"
I think they are quite clear about what they are calculating:
"LATE is also per-protocol effect, but not for everyone: as Eq. 5 shows, [parameter] gives the average causal effect of screening among experimental subjects screened as a result of the trial."
"Importantly, when all subjects not offered screening remain unscreened, LATE equals the average effect of screening on everyone in the study population who is screened."
You explicitly make two claims:
Claim 1. "If two people disagree, it should be the responsibility Dr. Fancy to explain what’s wrong with Dr. Simple, not the reverse."
Claim 2. "simple math doesn’t, like, disappear when a fancy alternative is presented"
I broadly agree with Claim 1, with exceptions for trolls and bad-faith actors, but I don't think you substantially disagree with the linked articles, so it doesn't apply here. And for Claim 2, simple math can be a nice check on fancy math, and both you and the article arrive at similar figures, huzzah!
I'm more interested in addressing these more concrete implicit claims:
Claim 3. The answer could be derived more easily, so why use fancy math?
"Does the instrumental variable reframing add anything?"
"So while the second point is true, we already knew it, and we didn’t need no instrumental variables."
The article acknowledges that lots of people calculate these statistics without using an IV framework, in the following footnote:
"IV ideas applied to randomized trials appear in alternate forms in social science and medicine without referencing IV or potential outcomes. Bloom (15) adjusts trial data for treated never-takers. Newcombe (16) derives an adjustment for randomized trials with control-group crossovers. Hearst et al. (17) uses similar reasoning to obtain effects of Vietnam-era military service using the American draft lottery. Baker and Lindeman (18) and Baker et al. (19) use maximum likelihood to derive an IV-type adjustment for nonadherence in a model for Bernoulli outcomes. Some analyses of screening trials, including Atkin et al. (20) and Segnan et al. (21), reference an adherence adjustment due to Cuzick et al. (22). Also focusing on Bernoulli outcomes, the latter derives a maximum likelihood estimator that adjusts risk ratios for nonadherence. The Cuzick et al.’s (22) estimator is an instance of results in Imbens and Rubin (23), which uses IV to compute marginal distributions of potential outcomes for compliers."
So they must agree that the IV framework is not necessary to get the point estimate. They offer reasons for using an IV framework. A lot of these are ways in which it is better than other frameworks (intention-to-screen, as-treated, whatever ref. 7 did). Some other reasons offered are that "IV estimates and the associated standard errors are easily computed" and that there is "an immediate path to off-the-shelf statistical inference." Given how often researchers mess up basic calculations, the more off-the-shelf tools they can use the better. And I'll point out that your simple math doesn't come with a confidence interval, which theirs does.
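To illustrate what "off-the-shelf statistical inference" buys you, here is a rough bootstrap sketch of a confidence interval for the Wald/LATE estimate. The group sizes and event counts below are placeholders I made up so the point estimate lands near 0.44pp; they are not the trial's actual numbers, and the acceptance rate is held fixed for simplicity:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical counts, NOT the trial's actual numbers (just to show the mechanics):
n_invited, n_control = 28_000, 56_000
events_invited, events_control = 284, 672   # cancer cases per arm (made up)
acceptance_rate = 0.42                      # held fixed for simplicity

invited = np.zeros(n_invited); invited[:events_invited] = 1
control = np.zeros(n_control); control[:events_control] = 1

def wald(invited_outcomes, control_outcomes, acceptance_rate):
    reduction = control_outcomes.mean() - invited_outcomes.mean()  # ITT effect
    return reduction / acceptance_rate                             # scale to acceptors

point = wald(invited, control, acceptance_rate)

# Nonparametric bootstrap: resample subjects within each arm.
boot = [
    wald(rng.choice(invited, n_invited), rng.choice(control, n_control), acceptance_rate)
    for _ in range(2000)
]
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"LATE ~ {point:.3%}  (95% CI {lo:.3%} to {hi:.3%})")
```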
While the most important thing in the article is the calculation of LATE and its confidence intervals, in Section 3B it also checks whether deviations between the trial sites can be explained by sampling variation, and finds that they can. Imagine if it had found that one trial site was sharply different from all the others. This might have indicated a problem with how the trial was run at that site, an important thing to know.
There is a cost to using the framework that is not discussed in the article, namely that it has lots of terminology and mathematics that may be unfamiliar to those not used to the model, and simpler ad-hoc calculations may suffice for at least some of the results. I tend to think there is a large benefit to a research community using standard calculation methods at the expense of concreteness, but reasonable minds may differ.
Claim 4. Proponents of IV do not acknowledge selection bias.
"I was contacted by a few economists. They said something like this: .... Because of fancy math reasons, instrumental variables methods are unbiased."
"[PNAS article made] the instrumental variables argument again, except with even more math and even more insistence that selection bias has been solved."
"refusers had less colorectal cancer then controls, even though neither did colonoscopies. Presumably that happens because people have some idea of their cancer risk and are more likely to agree to a colonoscopy when that risk is higher."
I think the quotes in my first comment refute this for the sensible-med article.
For the big study article, they give their opinion on selection bias explicitly:
"The resulting selection bias can go either way. For instance, NORCCAP adherents are relatively educated, and therefore likely to be relatively healthy whether screened or not. But they are also older and more likely to be male, elevating risk. Beyond such demographic differences, adherence may be motivated by chronic health concerns such as diabetes, a history of polyps, or a family history that elevates CRC risk."
And they know that this is something they should check:
"Are compliers mostly old or mostly young? Mostly male or mostly female? Do they have pre-existing conditions that predispose them to take advantage of screening? Are complier populations so unusual that the external validity of IV estimates is limited?"
Now I would say that in Section 3 they underdeliver on the above promise, because they only look at demographic factors (age, sex, residency in Oslo, lol). But as you say in your asteriskmag piece, "But the dataset used was very limited — it didn’t have income, education, family history...." So it seems they are doing the best they can.
I cannot comment on your private correspondence with fancy-math-loving economists.
I don't know what to have as a conclusion here, but I think the way you put the question, "So is 0.443% biased, or isn’t it?", is not helpful. All estimates have limitations. Let's make it an open question: "Our point estimate is 0.443%; what further considerations should be applied? In which direction do they move our point estimate?" I don't think you or the article makes a decisive argument either way on this issue. There are small demographic differences in the article between the compliers and refusers. Given that the effect size itself is quite small, I would like to see how robust the outcome is to changes in these differences. Likewise, you presume that there is a difference in CRC risk between the refusers and the controls, based on a difference in rates (from asteriskmag: "At the end of the study period, 1.2% of the controls were diagnosed with colorectal cancer, compared with 1.05% of refusers and 0.89% of acceptors"). But you also report "there weren’t that many colorectal cancer deaths during the trial (229 total), so the analysis was done on sparse data." So is this difference significant or just sampling error? You also offer the possibility "It’s also possible that refusers just hate going to the doctor and so just had fewer diagnoses, but never mind." And so on.
Once you have a big list of these further considerations, what do you do if some of them would increase the estimate and some of them would decrease the estimate, and some of them say the estimate is just right? Well you need to come up with a model that incorporates these effects, with parameters for their different strengths, and then you have to try to fit the data to the model to estimate these new parameters. Then the cycle of criticism repeats, and science improves.
Bob's task was to give a weight, not to prove Alice wrong. So why does Bob need to rebut anything? "There’s zero obligation to do anything else." He was asked for a weight and gave an answer of 1985 kg. Yes, one of them has to be wrong. But they can also both be wrong. Bob pointing out errors in Alice's method would not prove his own method correct.
Not all simple mistakes are easy to find. Counting bricks sounds simple, but how can an error in the count be found other than by duplicating the work or offering proof through more complicated math?
Bob pointing out an error in Alice's method wouldn't prove him correct, but it would at least prove Alice *in*correct. That's something, no? And if he doesn't want to re-count the bricks, OK, but I think he should at least commit to the claim that that's where Alice's error lies, rather than leaving it ambiguous whether he thinks she made an error in the count, or there's a flaw in her logic, or what.
I'll admit there could in principle be cases where there's a very simple-seeming argument that seems compelling but is wrong, where very difficult reasoning is needed to explain why it's incorrect (reasoning that can't even be summarized or given a reference), and where it's overall more efficient, instead of refuting it, to just throw a more complex argument back at the person making the simpler one and tell them to sort out the differences themselves. But I think such cases are quite rare! (And colonoscopies aren't one of them...)