As I understand the spread of COVID, there is a delay between exposure and a positive test, so an intervention that either increases or decreases the spread should take about 5 days to show up in the case counts. In this case, the response shows up immediately, which makes me think something else is going on.
Note also that the top panel shows no visible effect from locking down: Wuhan had already stopped seeing many new infections, while New York and Italy both continued to have rapidly spreading infections.
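A minimal sketch of the reporting-delay argument, assuming a fixed 5-day lag from exposure to a positive test and a simple piecewise-exponential model of new infections. All parameters (`DELAY_DAYS`, `INTERVENTION_DAY`, the two growth rates) are hypothetical and not taken from the paper. The point is only that a change in the infection growth rate should appear in the reported series about `DELAY_DAYS` later, not on the day of the intervention.

```python
import math

# Hypothetical parameters for illustration; none come from the paper.
DELAY_DAYS = 5          # assumed fixed lag from exposure to a positive test
INTERVENTION_DAY = 20   # day a hypothetical intervention takes effect
R_BEFORE = 0.20         # assumed daily exponential growth rate before it
R_AFTER = 0.05          # assumed daily growth rate after it
N_DAYS = 40


def daily_infections(n_days):
    """New infections per day under a piecewise-constant exponential growth rate."""
    values = [10.0]
    for day in range(1, n_days):
        rate = R_BEFORE if day < INTERVENTION_DAY else R_AFTER
        values.append(values[-1] * math.exp(rate))
    return values


def reported_positives(infections, delay):
    """Positive tests on day t reflect infections from day t - delay (0 before any reports)."""
    return [infections[t - delay] if t >= delay else 0.0 for t in range(len(infections))]


def first_slowdown_day(series):
    """First day whose day-over-day log growth falls below the midpoint of the two rates."""
    threshold = (R_BEFORE + R_AFTER) / 2
    for t in range(1, len(series)):
        if series[t - 1] > 0 and math.log(series[t] / series[t - 1]) < threshold:
            return t
    return None


if __name__ == "__main__":
    infections = daily_infections(N_DAYS)
    reported = reported_positives(infections, DELAY_DAYS)
    print("slowdown in infections:", first_slowdown_day(infections))        # day 20
    print("slowdown in reported positives:", first_slowdown_day(reported))  # day 25
```

Real case counts also carry testing-capacity changes and day-of-week noise, so the lag would be blurred rather than a clean shift; the sketch only illustrates why an immediate response is surprising.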
Andrew Gelman discusses this paper on his blog; I would recommend checking out his post here. The basic point Dr. Gelman makes, or links to, is that discontinuity-based models are very sensitive to the analyst's modeling choices, especially when the data have a strong day-of-week effect. Others have shown that by changing some of the model's tuning parameters, they can reach almost any conclusion they want.
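To see how a day-of-week effect can fool a discontinuity model, here is a minimal local-linear sketch (uniform kernel, separate straight lines fitted on each side of a cutoff). It is not the model Zhang et al. used. The synthetic series has a smooth log-linear trend, a hypothetical weekend reporting dip, and no true jump, so any nonzero estimate is an artifact. The cutoff day, the `WEEKLY` pattern, and the bandwidths are arbitrary choices for illustration.

```python
# Hypothetical synthetic series: smooth log-linear trend plus a weekend dip
# in reported cases, and NO true discontinuity anywhere.
TREND = 0.1
WEEKLY = [0.0, 0.0, 0.0, 0.0, 0.0, -0.5, -0.5]  # assumed reporting dip on days 5 and 6 of each week


def log_cases(t):
    """Log of reported cases on day t under the synthetic model."""
    return TREND * t + WEEKLY[t % 7]


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def rd_jump(cutoff, bandwidth):
    """Local-linear discontinuity estimate: right fit minus left fit, both evaluated at the cutoff."""
    left = list(range(cutoff - bandwidth, cutoff))
    right = list(range(cutoff, cutoff + bandwidth))
    a_l, b_l = fit_line(left, [log_cases(t) for t in left])
    a_r, b_r = fit_line(right, [log_cases(t) for t in right])
    return (a_r + b_r * cutoff) - (a_l + b_l * cutoff)


if __name__ == "__main__":
    # The true jump is zero, yet the estimate moves as the bandwidth changes.
    for h in range(3, 15):
        print(h, round(rd_jump(30, h), 3))
```

Aligning the windows to whole weeks or adding day-of-week terms is one common mitigation, but the bandwidth, cutoff, and trend specification remain choices that can move the answer.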
What can I add?
I find the linear growth in the number infected very suspicious. As I understand it, epidemics usually spread exponentially. I believe this feature of the paper is an artifact of trusting the official numbers, something I cautioned about previously. Moreover, Zhang et al. discuss neither differing levels of compliance across areas nor collateral issues that affected the spread of COVID. Since discharging sick patients to nursing homes is a theorized driver of the spread of infection in Northern Italy, New York City, and the United Kingdom, this is a curious omission.
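One quick, informal way to probe the linear-versus-exponential question is to fit a straight line to the counts and a straight line to their logarithm (an exponential fit), then compare residuals on the original count scale. This is a sketch, not a formal model comparison: it assumes strictly positive counts, ignores reporting noise, and the exponential fit minimizes squared error on the logs rather than on the counts themselves. The helper names and the synthetic series are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def sse_linear_vs_exponential(counts):
    """Return (SSE of a linear fit, SSE of an exponential fit), both on the count scale.

    The exponential fit is a straight line on log(counts), so counts must be positive.
    """
    days = list(range(len(counts)))
    a, b = fit_line(days, counts)
    sse_lin = sum((c - (a + b * d)) ** 2 for d, c in zip(days, counts))
    la, lb = fit_line(days, [math.log(c) for c in counts])
    sse_exp = sum((c - math.exp(la + lb * d)) ** 2 for d, c in zip(days, counts))
    return sse_lin, sse_exp


if __name__ == "__main__":
    exponential = [10 * math.exp(0.2 * d) for d in range(20)]  # synthetic, not real data
    linear = [10 + 5 * d for d in range(20)]                    # synthetic, not real data
    print(sse_linear_vs_exponential(exponential))  # exponential fit has the smaller SSE
    print(sse_linear_vs_exponential(linear))       # linear fit has the smaller SSE
```

On real data, the reporting artifacts discussed here (testing capacity, day-of-week effects) can make either shape fit reasonably well, so a better fit is suggestive rather than conclusive.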
What would I do differently? This paper shows an issue common to junior statisticians with too high an opinion of their methods (e.g., the me that shows up in somebody else's memoir). You have these cool techniques and think they can reveal important insights in datasets that are lying around. But the world is complicated, and most of these methods make very strong assumptions. Those assumptions are violated; all models are wrong, but some are useful. You need to understand the data you are working with in greater detail and, more importantly, know how it was generated and what differences might exist between groups within your data. I believe the "fixed" version of this study would look a lot like the study in Germany I wrote about previously.
Huh... I usually only use linear regression for interpolation on datasets with unknown functions...
If I were to fit the NYC tea leaves, I would expect a rate change after almost 3 weeks (in my observation, unenforced compliance spreads about 1-2 weeks early with COVID-19) that I would attribute to policy onset, followed by an explosion induced by noncompliance and leakers, and then a steady recovery as the policy's long-term effects take over again. However, I don't have a feel for how the noise level affects an exponential growth curve, especially given changes in sampling at the early stages.
Which makes the linear, rather than exponential, growth before the masks all the more interesting.
Is it linear? At this level, how would you tell the difference between linear, exponential, S-curve, etc.? We know it is a multifunctional expression because of the wacky constraints on the time domain, but it's so noisy that you get into trouble. It is possible that the only time we could observe pure exponential growth is before the onset of identification and minimum defenses in a large population.
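A worked expansion shows why the shapes are hard to separate over a short window. It assumes a simple exponential model with a constant growth rate; the symbols $N_0$, $r$, and the doubling time $T_d$ are introduced here for illustration only:

$$N(t) = N_0 e^{rt} = N_0\left(1 + rt + \frac{(rt)^2}{2} + \cdots\right), \qquad T_d = \frac{\ln 2}{r}.$$

The curvature term is about $rt/2$ times the size of the linear term. Over a window of one doubling time ($rt = \ln 2 \approx 0.69$) that ratio is roughly 0.35; over a quarter of a doubling time it is roughly 0.09, small enough that ordinary reporting noise could hide it.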