Evidential note. The questions I’m raising in this post are based on statistics of confirmed cases. The problem is that this makes any attempt at analysis, whether explicit analysis or the tacit analysis presupposed by the questions, prone to a whole host of measurement artifacts and sampling biases. The availability and turnaround time of tests have been a substantial bottleneck on finding out anything about the disease, and the bottleneck itself has shifted over time as testing procedures changed. Tests are not randomly distributed or performed. (Nor should they be, if their more acute clinical function of determining whom to treat is currently more important than their epidemiological function of finding out about the spread of the disease. Which of course it is.) But that’s another reason to note that it’s hard to find things out about the disease, and to exercise some caution about the possibility of being misled. In any case, this was written in the mid-morning of March 28, 2020; case numbers change rapidly, and at different rates in different places, so this very tentative set of questions may presuppose things that were true until recently but no longer are.
It’s hard to know for sure, but such numbers as we have seem to indicate (1) that Covid-19 has spread everywhere in the United States, but (2) that it has spread much more in the Northeast, and especially in New York and New Jersey, than anywhere else. There was a lot of reporting this past week when the U.S. surpassed 100,000 reported cases, and when it surpassed Italy and China in total reported cases. There are a lot of cases everywhere, but in fact the rapid jump in the U.S.’s reported case numbers is currently driven largely by the numbers from New York and New Jersey:
|U.S. Total (March 28, AM):
|#1. New York:
|#2. New Jersey:
The estimated total population of the U.S. in 2018 was 327,167,434 people; New York (19,453,561) and New Jersey (8,882,190) together have just under 8.7% of the total population of the United States. But they account for just over half (just under 52.1%) of all the reported cases of Covid-19 in the U.S. And while reported Covid-19 cases have increased everywhere in the country, the huge increases in reported cases in New York and New Jersey over the last week account for nearly half (49.3%) of the entire increase in reported U.S. cases since March 21. So there’s some reason to worry that discussions based heavily on aggregated nationwide numbers are likely to be misleading about the actual patterns of the outbreak. The fact that just two contiguous states have more than half of the reported cases, and account for just under half of the increase in reported cases in the last week, also raises a question: what is going on in New York and New Jersey that has led to such a heavy regional concentration?
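The population share is simple to reproduce. Here is a minimal Python sketch using only the 2018 population estimates quoted above; the helper name `percent_share` is hypothetical, and since the reported case counts behind the 52.1% and 49.3% figures are not reproduced here, the sketch computes only the population share.

```python
# 2018 population estimates quoted in the text.
US_POP = 327_167_434
NY_POP = 19_453_561
NJ_POP = 8_882_190


def percent_share(part: float, whole: float) -> float:
    """Return part as a percentage of whole (hypothetical helper)."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


pop_share = percent_share(NY_POP + NJ_POP, US_POP)
print(f"NY + NJ share of U.S. population: {pop_share:.2f}%")  # 8.66%

# The same helper would apply to reported cases, or to the increase in
# cases since a given date: percent_share(ny_cases + nj_cases, us_cases).
```

The contrast the paragraph draws is between this roughly 8.7% population share and the roughly 52% case share; computing both with one function keeps the comparison apples to apples.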
How much of the concentration is due to outbreaks in population centers? New York City is the megalopolis that connects New York State with New Jersey, and as of March 28, New York City alone accounts for 25,398 reported cases, over half of the reported cases in all of New York State. Of course, New York City is atypical compared to the rest of the United States.
Could New York and New Jersey simply be in the later phases of an epidemic pathway that, without intervention, everywhere else will also experience as exponential growth continues? Well, maybe. New York has close connections with, and heavy, constant travel to and from, both the Pacific Rim and Western Europe. Maybe the outbreak started earlier there and has progressed further, and if you give it time in other parts of the country, further out on the periphery of global social and economic graphs, you’ll see similar progress in infections. On the other hand, the cases in New York and New Jersey now dwarf those on the West Coast (California alone has a larger population than New York and New Jersey combined, but fewer than a tenth of the reported cases), even though the West Coast contains the vast, globally connected megalopolises of Los Angeles and the San Francisco Bay Area, and the earliest epidemic outbreaks in the United States were in Washington State. Those places plausibly ought to be at the same point on the epidemic curve as New York and New Jersey, or even a later one, but they haven’t had anything like the same huge spike in reported cases.
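To make the “later phase of the same curve” hypothesis a little more concrete: if reported cases in two places grew exponentially with the same doubling time, a place with k times fewer cases would be about log2(k) doubling times behind. The Python sketch below assumes pure exponential growth and a shared doubling time, neither of which is established here; the function names are hypothetical, and the 3-day doubling time in the example is an illustrative input, not a measured value.

```python
import math


def doubling_times_behind(case_ratio: float) -> float:
    """Doublings separating two places whose case counts differ by case_ratio,
    assuming both grow exponentially at the same rate (hypothetical helper)."""
    if case_ratio <= 0:
        raise ValueError("case_ratio must be positive")
    return math.log2(case_ratio)


def days_behind(case_ratio: float, doubling_time_days: float) -> float:
    """Convert the doubling gap into days for an assumed doubling time."""
    return doubling_time_days * doubling_times_behind(case_ratio)


# Illustrative inputs only: a tenfold gap and an assumed 3-day doubling time.
print(f"{doubling_times_behind(10):.2f} doublings")  # 3.32 doublings
print(f"{days_behind(10, 3):.1f} days")              # 10.0 days
```

Under those assumptions a tenfold gap corresponds to roughly 3.3 doublings, or about ten days at a 3-day doubling time. Comparing an implied lag like that with when each region’s first cases were actually reported is one rough way to check the timing explanation; since California has more people than New York and New Jersey combined, a per-capita case ratio would arguably be the fairer input.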
Could New York and New Jersey be more severely affected than the rest of the U.S. because of population differences? Well, maybe. They’re certainly more densely populated than a lot of states; New York City has a vast population, and both it and a number of the mid-size cities surrounding it have a much greater population density than anywhere else in the U.S., including even other large, dense megalopolises like San Francisco or Chicago. On the other hand, New York City is not ten times denser than, say, San Francisco. You might want to look not only at densities but at other features of how those populations go about their lives; for example, New York is unusual within the United States not only for its very dense population but also for its extremely high levels of transit and subway use within the inner city, its unusually low rates of car ownership per household and per capita, and so on.
Could New York and New Jersey be more severely affected because New York City is more severely affected, due to peculiar events and/or local political failures within New York? Derek Thompson at the Atlantic (warning: Twitter thread) thinks that at least some of it is down to fuck-ups and wavering by Bill de Blasio personally, or by the people around him. Maybe. But of course it will be pretty hard to measure how much regime uncertainty did or did not affect New Yorkers’ decisions; we’d need some comparison point with other mayors and decision-makers elsewhere in the country. It’s easy to ridicule or condemn politicians or policy fuck-ups, and often right to do so, but it may be hard, for some time to come, to tell how much actual difference in infection rates can be attributed to any given erroneous, wrongheaded, ridiculous, or contemptible behavior. What might we do to gather more information on this? I suppose you could, for example, try to put together timelines in a matrix of different decisions across several cities with notably different levels of reported cases (for example, when did the first reported cases show up, when were restaurants encouraged or forced to close, when were schools encouraged or forced to close, etc.; of course some questions, such as whether or not the city government decided to cancel a large public gathering like the St. Patrick’s Day Parade, depend on peculiar features that won’t be shared across all cities). Of course, differences might also have to do not with local policies but with state policies, since New York and New Jersey have state governments with very distinctive dynamics, and much of the relevant decision-making here is done by state departments of public health.
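The timeline matrix suggested above could start as a simple table keyed by city and decision, with each decision’s date expressed as days after the city’s first reported case, so that cities whose outbreaks began at different times can be compared. The Python sketch below is a hypothetical illustration: the city labels, decision names, and dates are made-up placeholders, not real data, and decisions a city never faced (like cancelling a particular public gathering) are recorded as `None`.

```python
from datetime import date
from typing import Optional

# Placeholder data only: invented cities and dates for illustration.
timelines: dict[str, dict[str, Optional[date]]] = {
    "City A": {
        "first_case": date(2020, 3, 1),
        "schools_closed": date(2020, 3, 16),
        "restaurants_closed": date(2020, 3, 17),
        "large_gathering_cancelled": date(2020, 3, 11),
    },
    "City B": {
        "first_case": date(2020, 3, 5),
        "schools_closed": date(2020, 3, 13),
        "restaurants_closed": date(2020, 3, 16),
        "large_gathering_cancelled": None,  # no comparable event in this city
    },
}


def lags_from_first_case(timeline: dict[str, Optional[date]]) -> dict[str, Optional[int]]:
    """Days from the first reported case to each decision (None if not applicable)."""
    start = timeline["first_case"]
    if start is None:
        raise ValueError("first_case date is required")
    return {
        decision: (when - start).days if when is not None else None
        for decision, when in timeline.items()
        if decision != "first_case"
    }


for city, timeline in timelines.items():
    print(city, lags_from_first_case(timeline))
```

Aligning on the first reported case is itself a judgment call, given how uneven testing has been; aligning on calendar date instead would be an equally reasonable variant to try alongside it.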
Or it might have to do with relationships between state departments of public health and city or county authorities. In some places these relationships are fairly cooperative or plainly deferential; in others they are fairly antagonistic, competitive, and turf-protecting. Has there been a difference in these dynamics between state-local politics in New York and/or New Jersey and state-local politics elsewhere in the country?
Could New York and New Jersey be more severely affected because of distinctive environmental factors? They’re way up north, and the weather there in early March is relatively chilly. Nobody knows very well whether Covid-19 infections will be limited seasonally by hotter weather, although there have been some pious hopes that they might be (as with some other airborne infections, especially seasonal influenza). If they are, this could be a relevant difference between the Northeast and the rest of the country and explain part of the gap; if they aren’t, then I suppose it wouldn’t. Or, of course, there might be other environmental factors.
Could New York and New Jersey have higher numbers of reported cases because there is something different about the testing or the reporting? Every state has had different access to test kits and different approaches to testing, in particular to the implementation of third-party laboratory testing. If those differences matter, the jump in reported cases might be partly accounted for by differences in the reporting, rather than differences in the disease or in the population. In that case the question would be less “how did so many more people get infected?” than “how did so many more people get tested?” If this does turn out to be the true explanation, of course, it might be relatively better news for New York and New Jersey (since it would indicate that there isn’t something they’re differentially doing wrong compared to the rest of the country, or a bad circumstance they’re differentially stuck with); on the other hand, it would be relatively worse news for the rest of the country, since it would suggest that the situation elsewhere might be worse than the reported numbers indicate.
Among people who are very worried about Covid-19, the outbreak’s effect on politics has often been to prompt calls, in very stark terms, for huge, drastic, nationally uniform policy responses. (For example, on March 24, the New York Times Editorial Board argued that the man in the White House should call for a two-week shelter-in-place order, now, as part of a coherent national strategy for the coronavirus.) Maybe they are right about that. If New York and New Jersey are just an early vision of the future for other states, then that would be one reason to think that what’s helpful for them now, or what would have been helpful for them if enacted a couple of weeks ago, may soon be helpful, or may be helpful now, for the rest of the country. Or if they have the highest reported numbers because they have done more testing and reporting than other places, then that would be another reason to think that the situation is less regionally concentrated and more uniform than the numbers indicate. On the other hand, if there are features distinctive of New York and New Jersey that help explain the regional concentration, that would also provide some information about how to respond intelligently in states where the situation might differ. The question should be: how much do we know, or how much are we guessing, about these issues now, and how good is the evidential basis for what we think or guess? What evidence could we gather that would help clarify the situation? Is that evidence accessible now, or could it be reasonably approximated in time to be useful?
- Evidential note: I’m using the numbers from the New York Times’s frequently updated Coronavirus in the U.S.: Latest Map and Case Count page, which are based on a published dataset maintained by New York Times reporters and compiled from state, federal, and territorial public health authorities, with some editorial intervention and normalization by the Times staff; they discuss their methodology in the README on the dataset’s GitHub page. I cross-checked those numbers and found that they show about the same results as taking the relevant aggregate numbers from the CDC’s counts of confirmed or presumptive positive cases of Covid-19 in the United States, and from the ECDC’s national-level data set on global cases.
- This was a totally meaningless and uninformative statistical milestone. The United States has more than 5 times as many people as Italy; as of March 28, 2020, the prevalence of reported Covid-19 cases per 100,000 population was about 450% higher in Italy than in the United States. (Evidential note: For these numbers, I used the cases and popData2018 columns helpfully provided with the ECDC’s data set, and corroborated the population estimates with a Google search sanity check.) Of course, the U.S.’s reported cases are growing rapidly, so depending on how things go over the next several days, the U.S. may overtake Italy in reported cases per capita in the near future.
- China has a much larger population than Italy, and a much larger population than the U.S., so this is somewhat more meaningful.
- Both of which, of course, have even more intense travel, economic, and population connections with China and the Pacific Rim.
- Although that’s a complicated question, too. If there are lots and lots of undetected Covid-19 cases in the country that aren’t reflected in the confirmed case numbers, that would mean both (1) that Covid-19 is potentially much more contagious than the reported cases alone indicate, maybe at the higher end of the range of estimated reproduction numbers, and (2) that Covid-19 is also much harder to track and contain, since lots of cases are going uncounted. On the other hand, if true, that would also suggest (3) that Covid-19 is significantly less lethal than the reported case numbers indicate: if the denominator in the ratio of deaths to infections is actually much larger than we can measure, because there are lots and lots of hidden cases, then the risk associated with getting an infection is correspondingly lower than it is in models based on officially reported case fatality rates.
- The editorial does later acknowledge that the President of the United States actually has no legal authority to issue a nationwide shelter-in-place order; but the board wishes he would use his position to emphatically cajole state authorities into issuing such orders in the several states.
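The denominator point in the footnote on undetected cases can be illustrated with a little arithmetic: if true infections are some multiple of confirmed cases, the deaths-per-infection ratio is smaller than the deaths-per-confirmed-case ratio by that same multiple. This Python sketch is hypothetical throughout; the function names, the 20 deaths, the 1,000 confirmed cases, and the fivefold undercount are illustrative inputs, not estimates for any real population, and it ignores complications such as the lag between infection and death.

```python
def fatality_ratio(deaths: int, cases: float) -> float:
    """Deaths divided by cases, as a fraction (hypothetical helper)."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return deaths / cases


def implied_infection_fatality_ratio(deaths: int, confirmed: int, undercount: float) -> float:
    """Fatality ratio if true infections = confirmed * undercount (undercount >= 1)."""
    if undercount < 1:
        raise ValueError("undercount must be at least 1")
    return fatality_ratio(deaths, confirmed * undercount)


# Illustrative inputs: 20 deaths, 1,000 confirmed cases, and an assumed
# fivefold undercount of true infections.
print(f"{fatality_ratio(20, 1_000):.1%}")                       # 2.0%
print(f"{implied_infection_fatality_ratio(20, 1_000, 5):.1%}")  # 0.4%
```

The same multiplier that makes the outbreak look harder to track is the one that shrinks the per-infection risk, which is why points (1) through (3) in that footnote stand or fall together.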