Recently, el gato malo called for purr review on the Wesselink et al. study on the effects of covid vaccination and covid infection on fecundity. He noticed that the claimed effects (in particular, increased fecundity after vaccination) contradict the raw data.
Basically, the paper works like this: a sample of women and men is observed over a number of menstrual cycles, until pregnancy occurs or the observation ends. We will not be concerned with later steps (“adjustments”) since they are indistinguishable from magic. In the first step, however, an unadjusted fecundity ratio (FR) is computed. This FR is meant to estimate the ratio of the per-cycle probability of conception in the test case (e.g., vaccinated) to that in the base case (e.g., unvaccinated). It is, however, the result of employing a certain model, namely Cox regression. Alternatively, one might simply divide the number of pregnancies by the number of cycles in each group, and then compute a raw FR from those two ratios. El gato malo did exactly this and found that, based on the raw data, vaccination seems to have a negative effect on fecundity, whereas the Cox FR given in the paper claims a positive effect.
Now, it is difficult to judge the paper on its own merits since we do not have access to the raw data and to the machinery employed. However, by looking up Amelia Wesselink or the other authors, I got the impression that there is a kind of industry going on here where certain databases are being milked for papers. All kinds of phenomena are being thrown at couples trying to procreate, and the results are always evaluated in a similar fashion. I will go through some of the papers and compare the results. I will always list
the number of pregnancies and the number of cycles,
the (pregnancy) ratio, computed as [number of pregnancies / number of cycles],
the raw FR, computed as [ratio / ratio in the base case],
the Cox FR, as stated in the paper,
and raw/Cox, computed as [raw FR / Cox FR].
The base case is taken from the respective paper (more on that later) and displayed on green background. The idea behind raw/Cox is that it should be close to one, and more so if sample sizes are large, because in the limit raw FR and Cox FR should be estimates for the same quantity. I therefore always ordered the entries in the table by decreasing number of pregnancies. Usually I will not show the categories, except maybe in passing when commenting on the numbers.
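To make the bookkeeping concrete, here is a minimal Python sketch of how these columns can be computed. The category names and all counts below are made up for illustration and are not taken from any of the papers; `Category` and `raw_table` are hypothetical names, and the Cox FR is simply an input copied from a paper, not something this code estimates.

```python
from dataclasses import dataclass


@dataclass
class Category:
    name: str
    pregnancies: int
    cycles: int
    cox_fr: float  # Cox FR as stated in the paper (1.0 for the base case)


def raw_table(categories, base_name):
    """Compute ratio, raw FR and raw/Cox for each category.

    Rows are ordered by decreasing number of pregnancies.
    """
    base = next(c for c in categories if c.name == base_name)
    base_ratio = base.pregnancies / base.cycles
    rows = []
    for c in sorted(categories, key=lambda c: c.pregnancies, reverse=True):
        ratio = c.pregnancies / c.cycles   # pregnancies per cycle
        raw_fr = ratio / base_ratio        # relative to the base case
        rows.append({
            "name": c.name,
            "pregnancies": c.pregnancies,
            "cycles": c.cycles,
            "ratio": ratio,
            "raw_fr": raw_fr,
            "cox_fr": c.cox_fr,
            "raw_over_cox": raw_fr / c.cox_fr,
        })
    return rows


# Hypothetical numbers, for illustration only.
example = [
    Category("B", 30, 200, 0.80),
    Category("base", 200, 1000, 1.00),
    Category("A", 90, 500, 0.95),
]

rows = raw_table(example, base_name="base")
for r in rows:
    print(f"{r['name']:>5} {r['pregnancies']:>5} {r['cycles']:>6} "
          f"{r['ratio']:.3f} {r['raw_fr']:.3f} {r['cox_fr']:.2f} "
          f"{r['raw_over_cox']:.3f}")
```

A raw/Cox value near one in the last column means the two estimates roughly agree for that category.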
Let’s start with the effect of age (of the women first, and then of the men):
Our test variable raw/Cox indeed stays close to one in most cases, more so for the women. This might have to do with the choice of base case, which is the youngest age category (21-24). Both the number of cycles and the probability to conceive are low in this category.
Alcohol consumption might be another thing to worry about:
Here the base case has a large number of cycles and pregnancies, and all the raw/Cox ratios stay close to one.
Now, uptake of different kinds of antibiotics:
The sample in the base case (no antibiotics) is huge, but samples in other categories are small, leading to raw/Cox ratios deviating from one. And in one category (penicillin) something happens that is worth marking red: raw FR is smaller than one, but Cox FR is larger than one, so that the two estimators point in different directions regarding the effect.
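The "marked red" condition is just a sign check around one. A minimal sketch, with a hypothetical function name and made-up input values:

```python
def opposite_directions(raw_fr, cox_fr):
    """True if one estimate is below 1 and the other is above 1."""
    return (raw_fr - 1.0) * (cox_fr - 1.0) < 0


# Made-up values, for illustration only.
print(opposite_directions(0.95, 1.05))  # one below one, one above
print(opposite_directions(0.90, 0.95))  # both below one
```

An estimate of exactly one counts as "no direction" here, so such a pair is not flagged.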
And for the men, we have something more exotic, cellular telephone exposure:
Again, small sample in the base case, and one category with raw FR and Cox FR pointing in different directions.
I could go on, but maybe we have seen enough by now, and can return to the vaccine study:
Strange, isn’t it? Large sample sizes, and yet (at least for females) raw/Cox ratios far from one, and raw FR and Cox FR pointing in different directions.
Could be fraud, but why? Could be just a mistake, but is this likely when it’s the authors’ 365th paper using the same methodology? My guess is that it is just a bad idea to use Cox regression in this situation. Maximum likelihood estimation is rarely the best choice anyway. And I lack intuition about how it behaves with time-dependent but one-way covariates (like vaccination, which can switch on during observation but never off). I stand with el gato malo, and consider the raw ratios instructive. Which means that Covid vaccination is not the best of ideas.
Will admit to having never used a Cox model (or perhaps I did in graduate school and have since forgotten), but just reading the basics, I'm not sure it would be appropriate in this case. My understanding is that the model makes the assumption that the probability of the "hazard" (in this case, a pregnancy) does not change over time (i.e. the probability of getting pregnant in month n+1 is the same as the probability of getting pregnant in month n, controlling for the other variables). We know that births are seasonal, so the likelihood of getting pregnant in a given month is not constant. Given that vaccination rates were quickly changing during the enrollment period, it could be that the sample sizes of unvaccinated/vaccinated varied greatly between the low pregnancy months and the high pregnancy months.
Looking at the supplemental data from the paper, there are clues as to why the adjustments changed the direction of the signal. In particular, the unvaccinated women reported more frequent sex (smaller % < 1/wk, larger % >4x per wk), and a larger % of unvaxxed reported "doing something to increase chances of conception". As always, how we arrive at these "adjustments" can make all the difference. This is why we need more of an open source system so others can confirm the analyses, as the bad cat recommends.