My new favourite toilet-paper manufacturer, Deutsches Ärzteblatt, informed me that The Lancet Infectious Diseases has a new study out on the “effectiveness of a bivalent mRNA vaccine booster dose to prevent severe COVID-19 outcomes”. The authors explain:
We did a retrospective, population-based, cohort study in Israel, using data from electronic medical records in Clalit Health Services (CHS), a large health-care organisation. CHS covers approximately two-thirds of the Israeli population aged 65 years or older, the age group who are prioritised for COVID-19 vaccination. The CHS data repositories were previously described in COVID-19 studies. The bivalent mRNA vaccine administered in Israel was the Pfizer-BioNTech COVID-19 vaccine (BA.4 and BA.5). The study observation period commenced on Sept 27, 2022, 7 days after the bivalent vaccination campaign was initiated in CHS, and ended 120 days later on Jan 25, 2023.
Please do not expect me to DESTROY the study, but allow me to guide you through my thought process. This is what I would have done had I been assigned as a peer reviewer to the paper (having almost no idea about medicine, but some experience with data and statistics).
This is not a double-blind cohort study, but then double-blind cohort studies sometimes only refer to the blindness of the authors, the blindness of the peer reviewers, and the cohorts of readers accepting them uncritically. It is an observational study with quite an impressive sample size. Altogether, “569,519 eligible participants were identified. Of those, 134,215 (24%) participants received a bivalent mRNA booster vaccination during the study period”.
Most booster recipients spent quite some time in unboostered purgatory. It therefore makes sense not to count individuals but days spent in the two possible states. The average number of days spent unboostered is reported to have been 102, which makes around 102 * 569,519 = 58,090,938 days (the first three digits should be accurate). Those accepting the booster then went on to spend 68 days in boostered heaven, so that’s another 68 * 134,215 = 9,126,620 days. In total, therefore, 67,217,558 person-days went into the study, which is around 118 per person. The study covered a period of 120 days, so why are two days per person missing?
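If you want to check my napkin, here is the bookkeeping in a few lines of Python (all inputs are from the paper; only the per-person average is mine):

```python
# Person-days bookkeeping, using the paper's figures.
n_total = 569_519                  # eligible participants
n_boosted = 134_215                # received the bivalent booster (24%)
unboosted_days = 102 * n_total     # 58,090,938 person-days
boosted_days = 68 * n_boosted      #  9,126,620 person-days
total_days = unboosted_days + boosted_days   # 67,217,558

print(total_days / n_total)        # ~118 days per person, of a 120-day window
```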
Now, participants of the study were old: around 75 years on average, with a standard deviation of 7.5 years, and a lower bound of 65. If you are 75 years old, your chances of dying within a year stand at around 3%, or 1% for the coming 120 days. When people died, they were removed from the study from that point in time. With a 1% chance of death, and death in the middle of the 120-day period on average, we can expect to lose 120 * 1% * 0.5 = 0.6 days of observation per person. Less than two days, yes, but the probability of dying might be larger (it rises roughly exponentially with age), and there might be other reasons for censoring (it is not easy to keep track of 569,519 people).
So, nothing to worry about here – but keep in mind that a 1% chance of dying means around 5,700 deaths to be expected.
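The same napkin, in code (the 3% annual mortality is my rough assumption, not a figure from the paper):

```python
# Rough censoring-by-death estimate; 3%/year mortality is my assumption.
p_death = 0.03 * 120 / 365         # ~1% over the 120-day window

lost_days = 120 * p_death * 0.5    # death mid-period on average: ~0.6 days
deaths = 569_519 * p_death         # ~5,600-5,700 expected deaths

print(round(lost_days, 2), round(deaths))
```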
Now, what happened? Based on patients’ primary diagnoses in hospital discharge letters, 541 hospitalisations due to Covid-19 were identified for the unboostered, and 32 for the boostered. Back-of-napkin computation of effectiveness against hospitalisation based on person-days gives 1 – (32 / 9,126,620) / (541 / 58,090,938) = 62%.
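Again, easy to verify:

```python
# Crude effectiveness from raw event rates per person-day.
rate_boosted = 32 / 9_126_620
rate_unboosted = 541 / 58_090_938

print(f"{1 - rate_boosted / rate_unboosted:.0%}")   # 62%
```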
But the paper claims 72%; why is that? That is because the authors have access to a statistics package, and use a fancy method called Cox regression – which immediately reminds me of my analysis of fecundity papers, and the strange effects therein. The idea is to control for all kinds of risk factors, and the authors have been busy identifying 24 of them: sex, age, population sector, socioeconomic status score, data on previous Covid vaccinations and infections, and various clinical risk factors. We might disagree about the exact number (do you count three for population sector or just one?), but it is still a lot of variables. The beauty (or the horror) of Cox regression is that this method is able to simultaneously incorporate them all, producing a hazard ratio per variable. A hazard ratio of 1 means that the variable does not affect the event to be observed (hospitalisation for Covid in this case). A hazard ratio of less than one is favourable, reducing the chance of hospitalisation.
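I do not have the study data, so here is only a minimal sketch of such a fit, on synthetic data, using the Python lifelines package – the variable names, the sample, and the data-generating process are all mine, and this is certainly not the authors' actual pipeline:

```python
# Minimal Cox regression sketch on synthetic data, using the lifelines
# package. Everything below (names, sample, true effects) is made up.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(seed=1)
n = 10_000
boosted = rng.integers(0, 2, size=n)
age = rng.normal(75, 7.5, size=n)

# True model: boosting multiplies the daily hazard by exp(-1.3) ~ 0.27.
daily_hazard = 1e-4 * np.exp(-1.3 * boosted + 0.03 * (age - 75))
time_to_event = rng.exponential(1 / daily_hazard)

df = pd.DataFrame({
    "days": np.minimum(time_to_event, 120),             # censored at day 120
    "hospitalised": (time_to_event < 120).astype(int),  # event indicator
    "boosted": boosted,
    "age": age,
})

cph = CoxPHFitter()
cph.fit(df, duration_col="days", event_col="hospitalised")
cph.print_summary()   # the exp(coef) column holds the hazard ratios
```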
And the hazard ratio for the bivalent booster is 0.28 (see Table 3 of the paper), leading to a claimed effectiveness of 1 – 0.28 = 72% (note that this conversion between hazard ratio and effectiveness is legitimate). Hazard ratios for clinical risk factors are greater than one, as might be expected. Previous vaccination or infection, on the other hand, seems to have a protective effect. Most of the p-values are tiny.
Are we happy now? Are we really happy about calibrating around 24 variables based on 541 + 32 = 573 events? I mean, this is like guessing, as a kid, all the presents in your advent calendar from knowing what your parents did during 573 random moments in November. The statistics software used by the authors of the paper does not care (maybe it should; please, AI, there is something to do for you), but I do. My prime suspect is the “population sector: Arab” variable: around 13% (56,259) of the unboostered belonged to that group, but only around 1% (1,194) of the boostered – together around 10% of the total sample. Still, the hazard ratio for hospitalisation is 0.29. What is going on there? Refusing the booster and refusing to be hospitalised for Covid? I do not have the data, but instead of around 57 hospitalisations (10% of all hospitalisations), the hazard ratio of 0.29 is compatible with something like 20. Are different groups more reluctant to go to hospital? Are they maybe treated in different hospitals? Are the discharge-letter procedures comparable across hospitals?
But these are just pygmy elephants. The big one in the room is the now-infamous Bayesian datacrime: the first seven days of limbo after injection are still counted as unboostered. We are talking about 7 * 134,215 = 939,505 person-days, 1.6% of unboostered, and 10.3% of boostered person-days. How many events have happened during such times, and what would happen if we classified them as boostered?
If no hospitalisations had happened during limbo, vaccine effectiveness would go from 62% to 66%. If limbo were like boostered heaven, we would have counted 10.3% * 32 ≈ 3 hospitalisations there, and effectiveness would have remained essentially unchanged (of course). If being in limbo were instead more like being unboostered, we would have seen something like 1.6% * 541 ≈ 9 hospitalisations, and an effectiveness of only 56%. We can even solve for an effectiveness of zero, and find that this corresponds to around 54 hospitalisations in limbo. But what might be a reasonable estimate?
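Here are the scenarios in code, including the zero-effectiveness solve (all inputs from the paper; the reclassification logic is mine):

```python
# Reclassifying the seven limbo days per booster recipient (paper's figures).
unboosted_days = 58_090_938 - 939_505   # limbo days moved out...
boosted_days = 9_126_620 + 939_505      # ...and counted as boostered

def effectiveness(limbo_hosp):
    """Effectiveness if limbo_hosp of the 541 'unboostered' events fell in limbo."""
    rate_b = (32 + limbo_hosp) / boosted_days
    rate_unb = (541 - limbo_hosp) / unboosted_days
    return 1 - rate_b / rate_unb

for h in (0, 3, 9, 54):
    print(h, f"{effectiveness(h):.1%}")   # ~66%, ~63%, ~56%, ~0%
```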
To find one, we turn to the paper’s only figure, which I have augmented with the development of Covid case rates in Israel (black line; taken from Our World in Data; the scale is not important).
The graphs are actually less interesting than the numbers below them. The observation period of 120 days has been divided into ten 12-day chunks. The number of booster non-recipients goes down while the number of hospitalisations among them goes up; the green figures, supplied by yours truly, indicate the increase during the respective chunk. I have no idea why the number of booster recipients (highlighted in yellow) is shown in decreasing order.
We are not really surprised, are we, that the bulk of hospitalisations nicely coincides with the bulk of vaccinations (in red, according to my assumption that the order of the numbers of booster recipients has to be reversed). For example, periods 5 to 8 account for only 40% of time but for 57% of hospitalisations, and for 57% of vaccinations as well.
Now, assuming that being in limbo is like being unboostered, we can weight hospitalisations accordingly. For example, in the first period there are 43 hospitalisations among 569,519 people, and 8,084 vaccinations. We therefore expect 43 * 8,084 / 569,519 * 7/12 ≈ 0.36 limbo hospitalisations. The factor of 7/12 accounts for the fact that the period lasts twelve days, while limbo lasts only seven. Of course, we can debate whether the factor should be a little lower for the first and last periods, but that has no great bearing on the result. Doing this for all ten periods, and summing up, we get around ten hospitalisations in limbo, and an adjusted effectiveness of 54%.
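The first-period expectation, as a function one could apply to every chunk (the inputs are the ones read off the paper’s figure above; the limbo-behaves-like-unboostered assumption is mine):

```python
# Expected limbo hospitalisations per 12-day chunk, assuming limbo carries
# the unboostered hospitalisation rate (my assumption, as in the text).
def limbo_hosp(hosp, at_risk, vaccinated, limbo_days=7, chunk_days=12):
    daily_rate = hosp / (at_risk * chunk_days)   # unboostered rate per person-day
    return daily_rate * vaccinated * limbo_days  # each vaccinee: 7 limbo days

print(round(limbo_hosp(43, 569_519, 8_084), 2))  # first period: ~0.36
```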
The authors of the paper surely know the correct values, but chose not to publish them. Why didn’t they include being in limbo as variable no. 25? Why didn’t they look at all-cause hospitalisation and all-cause mortality? I am a little annoyed by these run-of-the-mill papers. Sure, abiding by the playbook will get you published in the Lancet family, boostering your career, but don’t delude yourself by thinking that this was science. You can keep some of your beloved vaccine effectiveness in exchange for complete transparency. Deal?
Skeptically analysing the studies promoted in the Ärzteblatt is a great idea. As you've pointed out, many doctors rely on this "mainstream" newsletter to inform them about developments which they don't have time (or motivation?) to investigate more thoroughly themselves. It's a bit like one of those Testsieger magazines which we all know are so impartial ;)
Cox regression assumes that the influences of the independent variables are constant over time.
They write, "A Cox proportional hazards regression model with time-dependent covariates was used to estimate the association between the bivalent vaccine and hospitalization due to COVID-19 while adjusting for demographic factors and coexisting illnesses."
Fine, the model handles time-dependent variables, but does it also handle their time-dependent influences? My guess is no, and I bet they didn't check.
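For what it's worth, lifelines ships a test for exactly that; a minimal sketch, reusing the df and cph from the synthetic example further up:

```python
# lifelines can test the proportional-hazards assumption of a fitted model
# (df and cph as in the synthetic-data sketch above).
from lifelines.statistics import proportional_hazard_test

result = proportional_hazard_test(cph, df, time_transform="rank")
result.print_summary()   # small p-values flag time-varying influences

# Or, with plots and advice on what to do about violations:
cph.check_assumptions(df, p_value_threshold=0.05)
```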