In Lancet’s house are many mansions; a place has been prepared for every cargo-cult scientist who sits on a large data set and is willing to join the great Covid vaccine praise chorus. Today we celebrate “The effectiveness of COVID-19 vaccines to prevent long COVID symptoms: staggered cohort study of data from the UK, Spain, and Estonia”.
Hallelujah! Three countries! Four databases! Millions of participants! A 136-page appendix, peer reviewed as well! Large-scale propensity scores! Logistic regression with LASSO regularisation! Overlap weighting! Fine-Gray models! Sub-distribution hazard ratios! Kaplan-Meier plots and log(–log) plots! Cox proportional hazards regression! Random effect meta-analysis! And a protective effect against long Covid of even the first vaccine dose, with hazard ratios between 0.48 and 0.71 across the different databases, corresponding to vaccine effectiveness between 1 – 0.71 = 29% and 1 – 0.48 = 52%!
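The conversion from hazard ratio to vaccine effectiveness used here is simply VE = 1 − HR. A minimal Python sketch (the function name is illustrative, not from the paper):

```python
def vaccine_effectiveness(hazard_ratio: float) -> float:
    """Vaccine effectiveness expressed as 1 - HR."""
    if hazard_ratio < 0:
        raise ValueError("hazard ratio must be non-negative")
    return 1.0 - hazard_ratio


# The range of hazard ratios reported across the databases.
for hr in (0.48, 0.71):
    print(f"HR = {hr:.2f} -> VE = {vaccine_effectiveness(hr):.0%}")
```

This prints 52% for HR = 0.48 and 29% for HR = 0.71.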
Alas, the mansion has been built on sand. To see this, one has to ignore the main paper, more or less, and dive into the appendix. I will concentrate on the Clinical Practice Research Datalink (CPRD) AURUM database, which mainly covers England. I have compiled AURUM data from three appendix tables (S28, S30, S32) and Table 2 from the main paper (reformatted to match the appendix tables):
Quite large numbers of individuals are included in the study: around 5.7 million as “vaccinated” and around 5.9 million as “unvaccinated”. Cohorts were formed during different periods, following the UK vaccination schedule (roughly, January 2021 for cohort 1, February for cohort 2, March and the first half of April for cohort 3, and the rest of April until the end of July for cohort 4). The four tables differ in their definition of long Covid (Table 2: certain symptoms between 90 and 365 days after infection; Table S28: the same symptoms, but between 28 and 365 days after infection; Table S30: post-acute Covid between 90 and 365 days after infection; Table S32: the same, but with a window starting before day 90). I do not care about the definitions: the post-acute definition is obviously more restrictive than the symptoms definition (hence fewer cases), and extending the observation window always yields more cases.
Observation for the AURUM cohorts ended in January 2022 (I assume this means the end of December 2021; the paper is not clear on this). Two facts leap to the eye:
The numbers and proportions of Covid infections are way lower for the “vaccinated” than for the “unvaccinated”
Nevertheless, the proportions of long Covid among the Covid-infected, no matter the definition of “long Covid”, are almost always higher for the “vaccinated”
How do you get effectiveness against long Covid from such data? Only by not computing effectiveness conditional on Covid infection, and instead setting the long Covid figures directly against the total sample. In fact, this seems to be what the authors mean when they say, in the introduction to the paper, that “the effect of vaccines to prevent SARS-CoV-2 infections is a crucial factor to include when estimating vaccine effectiveness to prevent long COVID”.
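A toy calculation makes the point concrete. The counts below are invented for illustration and are not from the paper: the vaccinated have fewer infections, but a higher share of long Covid among the infected. Measured against the whole sample, the vaccine looks protective; conditional on infection, it does not.

```python
from dataclasses import dataclass


@dataclass
class Group:
    n: int           # individuals in the group
    infected: int    # individuals with a recorded Covid infection
    long_covid: int  # individuals with long Covid


def effectiveness(risk_vaccinated: float, risk_unvaccinated: float) -> float:
    """Effectiveness as 1 minus the risk ratio."""
    return 1.0 - risk_vaccinated / risk_unvaccinated


# Invented counts: fewer infections among the vaccinated, but a higher
# share of long Covid among those infected (5% vs 4%).
vaccinated = Group(n=1_000_000, infected=20_000, long_covid=1_000)
unvaccinated = Group(n=1_000_000, infected=60_000, long_covid=2_400)

marginal = effectiveness(vaccinated.long_covid / vaccinated.n,
                         unvaccinated.long_covid / unvaccinated.n)
conditional = effectiveness(vaccinated.long_covid / vaccinated.infected,
                            unvaccinated.long_covid / unvaccinated.infected)

print(f"against the total sample:  {marginal:.0%}")     # 58%
print(f"conditional on infection:  {conditional:.0%}")  # -25%
```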
Without access to the data, I am not in a position to reproduce the results, but I call bullshit anyway. All results in the main body of the paper are based on a particular censoring rule. Hold your breath: that’s a technical term, and it means removing an individual from a cohort when a certain event occurs; in this case:
Unvaccinated people were censored when they received a first vaccine dose.
Vaccinated people were censored when they received a second vaccine dose.
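The censoring rule can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names, the assumed study end of 31 December 2021, and the example dates are hypothetical. It shows how an individual's follow-up shrinks to the gap between entry and the censoring dose, so that later infections simply drop out of the analysis.

```python
from datetime import date
from typing import Optional

# Assumed end of observation (see the remark on January 2022 above).
STUDY_END = date(2021, 12, 31)


def follow_up_end(censoring_dose: Optional[date], study_end: date = STUDY_END) -> date:
    """Last day of follow-up under the censoring rule.

    censoring_dose is the first dose for the unvaccinated and the second
    dose for the vaccinated; None means that dose never came.
    """
    if censoring_dose is None:
        return study_end
    return min(censoring_dose, study_end)


def infection_counted(infection: Optional[date], entry: date, end: date) -> bool:
    """An infection enters the analysis only if it falls within follow-up."""
    return infection is not None and entry <= infection <= end


# Hypothetical member of vaccinated cohort 1: first dose (entry) in mid-January,
# second dose in early April, infection in October.
entry = date(2021, 1, 15)
end = follow_up_end(date(2021, 4, 5))
print((end - entry).days)                                # 80
print(infection_counted(date(2021, 10, 1), entry, end))  # False
```

With 80 days of follow-up, an outcome window of 90 to 365 days after infection cannot even begin for this person.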
The effects are hilarious, as Table S54 in the appendix makes clear:
The figures in columns 3 and 4 seem to be the median and the quartiles (this is not stated here, but it is elsewhere in the appendix). For example, 75% of the people in vaccinated cohort 1 were censored after 79 days or less, most of them (97.2%) because they received their second dose. And the authors want to lecture us about long Covid between 90 and 365 days after infection…
Fortunately, the data without censoring are there, albeit only in the appendix. For those of you who wondered why I skipped the odd-numbered tables, here we go:
The captions only mention the absence of censoring for the vaccinated group, but the veil over the unvaccinated has been lifted as well (otherwise their Covid and long Covid figures would not have changed).
Now we have enough material to get our hands dirty and approximate some effectiveness measures ourselves. The first step is to go from numbers of individuals to person-days spent under observation. For the vaccinated, this is easier: ignoring death and other technical reasons for leaving the study, the average time spent in the study should be roughly the number of days between the middle of the cohort-forming period and the end of the year. I will be using 350, 320, 285 and 210 days for cohorts 1–4.
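The rule of thumb for the vaccinated can be checked with a short calculation. The cohort-forming periods are the rough ones given above and are assumptions, as is the end of observation on 31 December 2021; the midpoint is taken as the start date plus half the period, rounded down.

```python
from datetime import date, timedelta

# Assumed end of observation.
YEAR_END = date(2021, 12, 31)

# Approximate cohort-forming periods (assumption, based on the rough schedule above).
COHORT_PERIODS = {
    1: (date(2021, 1, 1), date(2021, 1, 31)),
    2: (date(2021, 2, 1), date(2021, 2, 28)),
    3: (date(2021, 3, 1), date(2021, 4, 15)),
    4: (date(2021, 4, 16), date(2021, 7, 31)),
}


def days_from_midpoint(start: date, end: date, year_end: date = YEAR_END) -> int:
    """Days from the middle of a cohort-forming period to the end of observation."""
    midpoint = start + timedelta(days=(end - start).days // 2)
    return (year_end - midpoint).days


follow_up = {c: days_from_midpoint(s, e) for c, (s, e) in COHORT_PERIODS.items()}
print(follow_up)  # {1: 349, 2: 320, 3: 283, 4: 206}
```

The results land within a few days of the 350, 320, 285 and 210 days used here.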
The unvaccinated, however, are censored once they receive a first dose. The minimum number of happy unvaccinated days is zero; the quartiles and the median come from Table S54 above; and the maximum should be roughly the number of days between the beginning of the cohort-forming period and the end of the year. There are several estimators¹ of the mean based on these five data points. I computed
and
and I will use mean(5), because the follow-up time distributions are so heavily skewed. These are the results (and the maxima I chose) by cohort:
Now we can convert numbers of individuals into person-years by multiplying by the respective average follow-up time and dividing by 365 (left/orange: vaccinated; right/green: unvaccinated):
These person-years serve as the basis for computing Covid and long Covid rates (I am restricting myself to long Covid symptoms because the post-acute figures are so small). Again, we use the uncensored version for the vaccinated (orange) and the censored one for the unvaccinated.
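The chain from follow-up time to a crude effectiveness figure can be sketched as follows. The formulas behind mean(5) are not shown in this text, so the estimator here is an assumption: the commonly cited five-number approximation (min + 2·Q1 + 2·median + 2·Q3 + max)/8 of Wan et al. (2014), which may or may not be what the author used. All counts and quartiles in the example are hypothetical, and the effectiveness is a crude 1 − rate ratio without any adjustment for confounding.

```python
def mean_from_five_numbers(minimum: float, q1: float, median: float,
                           q3: float, maximum: float) -> float:
    """Assumed five-number estimator of the mean (Wan et al. 2014 form)."""
    return (minimum + 2 * q1 + 2 * median + 2 * q3 + maximum) / 8.0


def person_years(n_individuals: int, mean_follow_up_days: float) -> float:
    """Convert a head count into person-years of observation."""
    return n_individuals * mean_follow_up_days / 365.0


def rate_per_1000_py(events: int, py: float) -> float:
    """Events per 1,000 person-years."""
    return 1000.0 * events / py


def crude_effectiveness(rate_vaccinated: float, rate_unvaccinated: float) -> float:
    """1 minus the rate ratio; no adjustment for confounding."""
    return 1.0 - rate_vaccinated / rate_unvaccinated


# Hypothetical unvaccinated cohort: censored follow-up summarised by
# min, Q1, median, Q3, max (in days).
mean_unvax = mean_from_five_numbers(0, 10, 30, 60, 300)  # 62.5 days
py_unvax = person_years(584_000, mean_unvax)             # 100,000 PY

# Hypothetical vaccinated cohort: uncensored, about 200 days on average.
py_vax = person_years(730_000, 200)                      # 400,000 PY

rate_vax = rate_per_1000_py(2_000, py_vax)      # 5.0 per 1,000 PY
rate_unvax = rate_per_1000_py(1_000, py_unvax)  # 10.0 per 1,000 PY
ve = crude_effectiveness(rate_vax, rate_unvax)  # 0.5
print(mean_unvax, py_unvax, py_vax, rate_vax, rate_unvax, ve)
```

Because the quartiles of a heavily skewed distribution enter with the largest weights, this estimator is pulled less towards the long right tail than a simple (min + max)/2.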
When the average follow-up times converge (as they do with increasing cohort number), so do the long Covid rates (and the Covid rates for the vaccinated even surpass those for the unvaccinated). What we see here, it seems to me, is mainly the effect of when the cohorts were formed. It makes all the difference whether a cohort passes through phases of high or low infection prevalence. To illustrate, I annotated a snapshot of His Majesty’s Dashboard:
The unvaccinated of cohort 1, with a median follow-up time of 20 days and a 75% quantile of 52 days, have most of their weight in January and February and miss the spring valley. The vaccinated of this cohort, on the other hand, stay vaccinated, and under observation, throughout the year.
Cohort 4, in contrast, remains stable until the end of 2021 (people who had rejected vaccination until the end of July usually stuck to their decision). For people in this cohort, the vaccines did not prevent long Covid, and even led to higher Covid infection rates (as had been clear from the early surveillance reports).
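The timing argument can be illustrated with a toy incidence curve. The monthly figures below are invented, shaped only loosely like a winter peak, a spring valley and a rise later in the year; they are not dashboard data. With exactly the same risk in any given month, a group observed only in January and February shows a much higher average incidence than a group observed all year.

```python
# Hypothetical monthly infection incidence per 1,000 people (invented, not dashboard data).
INCIDENCE = {1: 20, 2: 8, 3: 2, 4: 1, 5: 1, 6: 3,
             7: 10, 8: 8, 9: 8, 10: 12, 11: 10, 12: 25}


def mean_incidence(months: range) -> float:
    """Average monthly incidence over the months a group is under observation."""
    values = [INCIDENCE[m] for m in months]
    return sum(values) / len(values)


short_window = mean_incidence(range(1, 3))  # observed January-February only
full_year = mean_incidence(range(1, 13))    # observed January-December

print(short_window, full_year)  # 14.0 9.0
```

No vaccine effect is built into this toy at all; the difference comes entirely from which months each group happens to be observed in.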
To summarize: without the stupid censoring, the claim of effectiveness against long Covid (and Covid infection) does not hold water.
This, of course, does not prevent the authors from making more hay from their data. The BMJ has already been infected. Look out for “The role of COVID-19 vaccines in preventing post-COVID-19 athlete’s foot”, coming soon, probably in NEJM.