On June 10th, there were 2,000,000 COVID cases in the US, with 113,000 deaths. In California, there were about 130,000 cases, with 4,600 deaths. What can we conclude from this? California has slightly over 10% of the US population; does this mean it has a lower infection rate than the country overall (even after removing New York and New Jersey’s infections)? Is its 3.5% fatality rate evidence of better medical care than average for the US with a 5.6% fatality rate? Or the world’s 5.7% fatality rate?
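For concreteness, here is the arithmetic behind those rates (a quick sketch; the population figures are my own approximate 2020 estimates, not part of the reported counts):

```python
# Back-of-the-envelope check of the headline numbers above.
us_cases, us_deaths = 2_000_000, 113_000
ca_cases, ca_deaths = 130_000, 4_600
us_pop, ca_pop = 328_000_000, 39_500_000  # approximate 2020 populations (my assumption)

print(f"CA share of US population: {ca_pop / us_pop:.2%}")       # ~12.04%
print(f"CA share of US cases:      {ca_cases / us_cases:.2%}")   # 6.50%
print(f"US case fatality rate:     {us_deaths / us_cases:.2%}")  # 5.65%
print(f"CA case fatality rate:     {ca_deaths / ca_cases:.2%}")  # ~3.54%
```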
We can conclude nothing.
Even assuming that the numbers are being correctly reported, many factors make definitive interpretation problematic. The first is how cases of infection are counted: a state which conducts more tests will report more infections. The next is the reliability of the tests themselves. No test is 100% accurate; are the tests being used good enough? Do clinical symptoms (perhaps with hospitalization) suffice to count a case, or must a test come back positive? China had a spike in diagnosed cases in early February when it switched from the more restrictive lab-confirmation criterion to allowing symptomatic diagnosis.
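To see why test accuracy matters, here is an illustrative calculation (all the numbers are made-up assumptions, not the characteristics of any real COVID test): when prevalence among those tested is low, even a fairly specific test can produce more false positives than true ones, and reported positives scale with testing volume.

```python
# Sketch: how an imperfect test distorts case counts (hypothetical values).
def positives(tested, prevalence, sensitivity, specificity):
    """Expected positive results when `tested` people are sampled
    from a population with the given true prevalence."""
    infected = tested * prevalence
    healthy = tested - infected
    true_pos = infected * sensitivity        # sick, correctly flagged
    false_pos = healthy * (1 - specificity)  # healthy, wrongly flagged
    return true_pos, false_pos

# Suppose 1% of those tested are actually infected, and the test is
# 90% sensitive and 98% specific (assumed numbers, for illustration only).
tp, fp = positives(tested=100_000, prevalence=0.01,
                   sensitivity=0.90, specificity=0.98)
print(f"true positives:  {tp:,.0f}")   # 900
print(f"false positives: {fp:,.0f}")   # 1,980 -- more than the real cases!
```

Note also that doubling the number of tests doubles both lines, so reported infections rise with testing even at a fixed true prevalence.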
Deaths are likely more reliable, since they do not depend on the rate and reliability of community testing. But given the large effect of comorbidities on COVID fatality rates, death counts can be artificially depressed by attributing deaths to a comorbidity even when the person tested positive for COVID. Is a positive test required to attribute a death to COVID, even when the clinical symptoms strongly suggest it? If you allow attributing a death to COVID without a lab test, what about the normal rate of death from influenza? Are you now over-counting COVID deaths? What about deaths that occur outside of hospitals? China had to significantly revise its COVID death count upward in April because of this; in the US, many COVID-related deaths have occurred in skilled nursing facilities. How well are those counted?
How large are these effects likely to be? I lack the clinical expertise to give reasonable numbers. If forced to pick numbers for evaluating claims, my intuition is that these issues could easily account for differences of a factor of 2, while my general optimism says a factor of 10 is too large to be explained this way. But I have little trust in these numbers. Even within-state comparisons can be problematic if the amount of testing (especially community testing!) changes over time.
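To make that "factor of 2" intuition concrete, here is a quick sensitivity check (the undercount factors are arbitrary assumptions, chosen only for illustration): if reported cases undercount true infections by some factor, the apparent fatality rate is inflated by that same factor.

```python
reported_cases, deaths = 130_000, 4_600   # the California numbers above

print(f"apparent fatality rate: {deaths / reported_cases:.2%}")  # 3.54%
for undercount in (2, 5, 10):             # hypothetical undercount factors
    true_cases = reported_cases * undercount
    print(f"if cases are undercounted x{undercount:>2}, "
          f"the true rate would be {deaths / true_cases:.2%}")
# x 2 -> 1.77%, x 5 -> 0.71%, x10 -> 0.35%
```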
The bottom line is that when criticizing research on COVID, I can always find flaws. I'm confident every study I've done has had significant flaws, even if only that the data I used is not representative of the data you care about. When evaluating whether these objections should matter, the careful reader should have an idea of what a reasonable effect size for the identified issues would be; if they are larger than the effect size claimed in the study, there is reason to be skeptical. A good scientific communicator will highlight these issues and help the reader with this analysis.
A prospective example of this: in the US, there are currently large-scale protests in many cities. If, two weeks after the protests began, there is an increase in new COVID cases, it could be explained by a lack of social distancing at the protests, by increased testing, or by some other coincident change. The data will have trouble distinguishing these. Even testing counts can be deceptive: there has been a scandal in the UK because the number of tests performed and the number of distinct people tested can be quite different. It is left to the reader's judgement which of these explanations (or others I haven't thought of!) is the most reasonable.
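The tests-versus-people distinction is easy to see with a toy log of test records (entirely made-up data; repeat tests of the same person are common in practice):

```python
# The same person can be tested multiple times, so the count of tests
# overstates the count of distinct people tested.
test_log = ["alice", "bob", "alice", "carol", "bob", "alice", "dave"]

print(f"tests performed: {len(test_log)}")       # 7
print(f"people tested:   {len(set(test_log))}")  # 4
```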