Discover more from cm27874
Baselines and bootstrap
Most data are collected, processed, and reported by the Robert Koch Institut (RKI). Two files are of particular interest: one about (PCR) testing (number of tests, and number of positive test results, per week), and one about Covid cases.
The fun begins when one combines the two, and computes various proportions:
Throughout 2020 and 2021, testing was extensive, and only a small proportion of tests (“prop positive”, blue) came back positive. Every case was confirmed by a PCR test, and cases hovered between 75% and 100% of positive tests (“cases/positive”, red) because people were frequently tested several times until they tested negative, so a single case could produce more than one positive result. For most cases (around 80%), information about the presence of symptoms (“info on symptoms?”, green) and about hospitalization (“info on hospitalization?”, yellow) was recorded.
Then, at the end of 2021, they got sloppy, presumably by changing the testing strategy (see also Witzbold’s comments on England): a PCR test was no longer required to confirm a case (a simple antigen test would do). Consequently, the number of cases started to exceed the number of positive test results, with the ratio reaching around 140%. Unfortunately, this also means that there is much less data on the presence of symptoms and on hospitalizations (why the two descended to different levels escapes me). Still, in the “prop positive” ratio we can observe those curious 3-month waves.
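The proportions above are plain ratios of weekly counts. A minimal sketch, using a hypothetical `Week` record and made-up numbers (not RKI data), shows how “cases/positive” can climb above 1 once a case no longer needs a positive PCR test:

```python
from dataclasses import dataclass


@dataclass
class Week:
    """Hypothetical weekly counts; field names are illustrative, not RKI column names."""
    tests: int              # PCR tests performed
    positives: int          # positive PCR results
    cases: int              # reported Covid cases
    with_symptom_info: int  # cases with information on symptoms recorded


def ratios(w: Week) -> dict:
    """Return the weekly proportions plotted in the figure."""
    return {
        "prop positive": w.positives / w.tests,
        "cases/positive": w.cases / w.positives,
        "info on symptoms?": w.with_symptom_info / w.cases,
    }


# Made-up numbers, chosen only to illustrate the two regimes.
early = Week(tests=1_000_000, positives=50_000, cases=45_000, with_symptom_info=36_000)
late = Week(tests=400_000, positives=100_000, cases=140_000, with_symptom_info=70_000)

for label, week in (("early", early), ("late", late)):
    print(label, {k: round(v, 2) for k, v in ratios(week).items()})
```

In the “late” week, cases confirmed by antigen tests alone push “cases/positive” to 1.4, while the share of cases with symptom information drops.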
I suspect that Covid cases are now being underreported. One hint comes from the proportion of men among cases (“men/cases”, grey). This proportion has been declining throughout 2022 and now stands at only 44%. Yes, we have seen this before, at the end of 2020, when old people were being hit hard by Covid, and there are many more old women than old men in Germany (less than 40% of those aged 80 or older are men). Maybe we are missing cases among old people? Another explanation might be that men in general seem to be less interested in their health, and less inclined to follow official guidelines on testing and on reporting a positive test result.
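The composition effect behind this can be made concrete: the overall share of men among cases is a case-weighted average of the male share within each age group, so shifting cases toward the oldest group lowers it even if nothing changes within any group. The two age groups and all numbers below are hypothetical; only the “less than 40% of those 80 or older” bound comes from the text.

```python
def male_share(cases_by_age: dict, male_fraction_by_age: dict) -> float:
    """Overall share of men among cases, as a case-weighted average of the
    per-age-group male fractions. Assumes (for illustration) that within each
    age group men and women are equally likely to be reported."""
    total = sum(cases_by_age.values())
    return sum(n * male_fraction_by_age[age] for age, n in cases_by_age.items()) / total


# Hypothetical share of men in each age group.
male_fraction = {"0-79": 0.50, "80+": 0.38}

mostly_young = {"0-79": 95, "80+": 5}   # cases concentrated among younger people
more_old = {"0-79": 60, "80+": 40}      # a larger share of cases among the very old

print(round(male_share(mostly_young, male_fraction), 3))
print(round(male_share(more_old, male_fraction), 3))
```

The sketch only shows the direction of the effect; it does not reproduce the observed 44%.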
To validate such shaky claims, it helps to draw on an alternative, independent data set. One candidate is the DIVI data on ICU occupancy. Here I am updating my analysis from May 2022.
First I am going to look at weekly official Covid deaths (this is still RKI territory):
Covid deaths (blue) now seem to arrive in lower-amplitude but higher-frequency waves. At the beginning of 2022, in the early omicron days, the baseline (case rate for 1/2/3 weeks multiplied by all-cause deaths, “base(x)”, red) explained a large share of Covid deaths (around 60% for the 3-week case rate, “base(3)/deaths”, grey area, right axis). This proportion declined throughout the second half of 2022 and now stands at around 25%. Underreporting of cases (numerator) is one possible explanation, more Covid deaths (denominator, e.g., due to more dangerous recent variants) another. Curiously, ICU deaths reported by DIVI (“DIVI deaths”, green) have now converged to the baseline. Note that DIVI does not distinguish between ICU patients “with” Covid and those “because of” Covid.
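Taking the baseline to be the x-week case rate (as a fraction of the population) times all-cause deaths, which is my reading of the definition above, it counts the deaths that would carry a Covid label even if Covid itself changed nothing. A minimal sketch follows; the alignment of the x-week window (ending in the death week) and all numbers are assumptions for illustration, not RKI data.

```python
from typing import List, Optional


def baseline(cases: List[float], all_cause_deaths: List[float],
             population: float, weeks: int) -> List[Optional[float]]:
    """base(x): share of the population reported as cases over the last `weeks`
    weeks (current week included), times all-cause deaths in the current week.
    Returns None for weeks without enough history to fill the window."""
    out: List[Optional[float]] = []
    for t in range(len(cases)):
        if t + 1 < weeks:
            out.append(None)
            continue
        case_rate = sum(cases[t + 1 - weeks : t + 1]) / population
        out.append(case_rate * all_cause_deaths[t])
    return out


# Made-up weekly series; the population is a round hypothetical figure.
population = 80_000_000
cases = [400_000, 800_000, 1_200_000, 800_000]
all_cause = [20_000, 20_000, 20_000, 20_000]
covid_deaths = [700, 900, 1_000, 800]

base3 = baseline(cases, all_cause, population, weeks=3)
# The "base(3)/deaths" ratio: the share of Covid deaths the baseline accounts for.
share = [b / d if b is not None else None for b, d in zip(base3, covid_deaths)]
print(base3)
print(share)
```

A falling `share` can come from a shrinking numerator (fewer reported cases) or a growing denominator (more Covid deaths); the ratio alone cannot tell them apart.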
Going from ICU Covid deaths to ICU Covid cases (as reported by DIVI), we observe a similar pattern. A baseline (case rate for 1/2/3 weeks multiplied by a constant ICU occupancy of 20,000, scaled to one week assuming a four-day average ICU stay, “base(x)”, red) nicely explained everything that was happening at the beginning of 2022 (“base(3)/ICU Covid”, grey area, right axis), but has failed to do so recently.
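The one-week scaling presumably rests on a steady-state relation: with constant occupancy L and average stay W, admissions per unit of time equal L/W (Little’s law). Under the text’s assumptions (20,000 occupied beds, four-day stays) that gives 35,000 ICU admissions per week. A sketch, assuming the baseline multiplies this weekly admission volume by the case rate:

```python
ICU_OCCUPANCY = 20_000  # constant total ICU occupancy, from the text
AVG_STAY_DAYS = 4       # assumed average ICU stay in days, from the text


def weekly_icu_admissions(occupancy: float = ICU_OCCUPANCY,
                          avg_stay_days: float = AVG_STAY_DAYS) -> float:
    """Steady state (Little's law): admissions per day = occupancy / average stay."""
    return occupancy / avg_stay_days * 7


def icu_baseline(case_rate: float) -> float:
    """Weekly ICU admissions expected to test Covid-positive by chance alone,
    given the x-week case rate as a fraction of the population."""
    return case_rate * weekly_icu_admissions()


print(weekly_icu_admissions())   # 35000.0
print(icu_baseline(0.03))        # hypothetical 3% three-week case rate
```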
Now, a word of caution: as with all things Covid, age plays a significant role. Case rates for different age groups have never been close to synchronous throughout the pandemic. Therefore, particularly when comparing populations with very different age distributions (as is surely the case for the “ICU population” versus the total population), one should stratify by age. Unfortunately, the usual problems rear their heads: different agencies use different age groupings, DIVI provides age groups only for current ICU occupancy but not for new admissions, etc. I am still wondering whether the DIVI data can be operationalized to bootstrap (i.e., reconstruct) case rates, and so to validate the RKI data. I started by looking at data for children:
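To see why stratification matters, compare a crude baseline (one overall case rate applied to the whole ICU population) with an age-stratified one (each group’s case rate applied to that group’s ICU occupancy, the one quantity DIVI does report by age). This is a sketch with hypothetical age groups and numbers; when case rates and ICU patients are concentrated in different age groups, the two baselines can differ substantially.

```python
def crude_baseline(overall_case_rate: float, icu_occupancy: dict) -> float:
    """Ignore age: apply one population-wide case rate to all ICU patients."""
    return overall_case_rate * sum(icu_occupancy.values())


def stratified_baseline(case_rate: dict, icu_occupancy: dict) -> float:
    """Apply each age group's case rate to the ICU patients of that group.
    Both dicts must use the same age groups, which in practice means first
    mapping the RKI and DIVI groupings onto a common scheme."""
    missing = icu_occupancy.keys() - case_rate.keys()
    if missing:
        raise ValueError(f"no case rate for age groups: {sorted(missing)}")
    return sum(case_rate[g] * n for g, n in icu_occupancy.items())


# Hypothetical: younger people have high case rates but occupy few ICU beds.
case_rate = {"0-59": 0.04, "60+": 0.01}
icu_occupancy = {"0-59": 5_000, "60+": 15_000}
overall_case_rate = 0.028  # hypothetical population-weighted average of the two

print(crude_baseline(overall_case_rate, icu_occupancy))
print(stratified_baseline(case_rate, icu_occupancy))
```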
The vertical axis counts weekly cases per 100,000. The RKI publishes these by age group (0-4 red, 5-9 yellow, 10-14 green, 15-19 pink). By assuming a constant ICU occupancy of 2,000 (a reasonable assumption according to the DIVI reports), I scaled new admissions (reported since mid-2021) to a bootstrapped case rate (“bootstrap new admissions”, black). Note that, as the jagged graph shows, this is a low-resolution exercise: a single additional admission produces an increase of 100,000/2,000 = 50. The total number of admissions for the whole period since calendar week 31/2021 is only 1,129 (well, “only” should be read as “thank God”). Average occupancy is available from DIVI as well. Using the total number of admissions divided by the sum of average occupancy as a scaling factor for length of stay, I obtained a second bootstrapped case rate (“bootstrap occupancy”, blue). Since the above diagram might be a little hard to disentangle, here it is again with a logarithmic vertical axis:
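Both bootstrapped case rates can be sketched in a few lines, assuming the constant occupancy of 2,000 from the text and weekly series of Covid admissions and average Covid occupancy. The function names and numbers are hypothetical (made up, not DIVI data); the length-of-stay factor follows the description above.

```python
ICU_OCCUPANCY = 2_000  # assumed constant total occupancy, from the text
PER = 100_000          # case rates are expressed per 100,000


def bootstrap_from_admissions(admissions: list) -> list:
    """Scale weekly Covid admissions to a rate per 100,000 ICU patients."""
    return [a * PER / ICU_OCCUPANCY for a in admissions]


def bootstrap_from_occupancy(avg_occupancy: list, admissions: list) -> list:
    """Convert average weekly Covid occupancy into admission equivalents using
    the length-of-stay factor sum(admissions) / sum(avg_occupancy), then scale."""
    factor = sum(admissions) / sum(avg_occupancy)
    return [occ * factor * PER / ICU_OCCUPANCY for occ in avg_occupancy]


# Made-up weekly numbers for illustration.
admissions = [10, 12, 8, 15]
avg_occupancy = [6.0, 7.5, 5.5, 9.0]

print(bootstrap_from_admissions([1]))  # one admission -> [50.0]
print(bootstrap_from_admissions(admissions))
print(bootstrap_from_occupancy(avg_occupancy, admissions))
```

By construction, both series have the same total over the whole period; they differ only in how that total is spread across weeks, with the occupancy-based one typically being smoother.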
In 2021 and the first half of 2022, bootstrapped case rates could be interpreted as a weighted average of RKI case rates, indicating that most kids were on ICU “with”, and not “because of”, Covid. This seems to have changed recently.
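The “weighted average” reading can be checked week by week: a bootstrapped rate is a weighted average (non-negative weights summing to 1) of the four RKI age-group rates if and only if it lies between the smallest and the largest of them. A sketch with made-up numbers; given the coarse resolution (steps of 50), a week outside the range is a hint, not proof, of admissions “because of” Covid.

```python
def within_group_range(bootstrap: float, group_rates: list) -> bool:
    """True iff `bootstrap` is a convex combination of `group_rates`,
    i.e. lies between their minimum and maximum."""
    return min(group_rates) <= bootstrap <= max(group_rates)


# Hypothetical weekly RKI rates per 100,000 for ages 0-4, 5-9, 10-14, 15-19,
# paired with a bootstrapped rate for the same week.
weeks = {
    "week A": ([300, 500, 600, 700], 450),
    "week B": ([100, 150, 200, 250], 400),
}

for label, (rates, boot) in weeks.items():
    print(label, within_group_range(boot, rates))
```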
I will continue to contemplate doing something with the DIVI data on adults, but this means sailing carefully between the Scylla named Simpson and the Charybdis of Small Samples.