## Exponential growth and epidemics

The phrase “exponential growth” is familiar
to most people, and yet human intuition has a hard time really recognizing what it
means. We can anchor on a sequence of small-seeming
numbers, then be surprised when suddenly those numbers look big, even if the overall
trend follows an exponential perfectly consistently.

This right here is the data for recorded cases
of COVID-19, aka the Coronavirus, outside mainland China, at least as of the time I’m
writing this. Never one to waste an opportunity for a math
lesson, I thought this might be a good time for us all to go back to the basics on what
exponential growth is, where it comes from, what it implies, and maybe most pressingly,
how to know when it’s coming to an end.

Exponential growth means that as you go from one
day to the next, you multiply by some constant. In our data, the number of cases each day
tends to be between 1.15 and 1.25 times the number of cases the previous day. Viruses are a textbook example of this kind
of growth because what causes new cases are the existing cases. If the number of cases on a given day is N,
and we say each individual with the virus is, on average, exposed to E people on a given
day, and each exposure has a probability p of becoming an infection, the number of new
cases each day is E × p × N. The fact that N itself is a part of this is what really makes things
go fast, because as N gets big, the rate at which it grows also gets big. One way to think of this is that as you add
on these new cases to get the next day’s count, N + E × p × N, you can factor out the N to get (1 + E × p) × N, so it’s
just the same as multiplying by some constant bigger than 1.

This is sometimes easier to see if we put
the y-axis on a logarithmic scale, meaning each step of a fixed distance corresponds
to multiplying by a certain factor; in this case, each step is another power of 10. On this scale, exponential growth looks like
a straight line. With our data, it took 20 days to go from
100 to 1,000, and 13 days to go from that to 10,000. By doing a linear regression
to find the best-fit line, you can look at the slope of that line to say that cases tend to
multiply by 10 every 16 days on average. This regression also lets us be more quantitative
about how close the exponential fit really is, and to use the technical jargon here,
the answer is that it’s really freaking close.

It can be hard to digest what this really
means, if true. If you see one country with 6,000 cases while
another has 60, it’s easy to think the second is doing 100 times better, and hence doing
fine. But if you’re in a situation where numbers
multiply by 10 every 16 days, a factor of 100 is just two such steps, so another way to view the same fact is that the second country
is about a month behind the first.

This is, of course, rather worrying if you
draw out the line. I’m recording this on March 6th, and if
the present trend continues, it would mean hitting 1M cases in 30 days (April 5th), hitting
10M in 47 days (April 22nd), 100M in 64 days (May 9th), and 1 billion in 81 days (May 26th).

Needless to say, though, you can’t draw
out a line like this forever. It clearly must start slowing down at some point, but the
crucial question is when. Is it like the SARS outbreak of 2002, which capped
out at about 8,000 cases, or more like the Spanish Flu of 1918, which ultimately infected about
27% of the world’s population? In general, just drawing a line through your
data is not a great way to make predictions, but remember that there’s an actual reason
to expect an exponential here. If the number of new cases each day is proportional
to the number of existing cases, it means each day you multiply by some constant, so
moving forward d days is the same as multiplying by that constant d times.

It is inevitable, though, that this factor
in front of N eventually decreases. Even in the most perfectly pernicious model
for a virus, where every day each person with the virus is exposed to a
random subset of the world’s population, at some point most of the people they’re
exposed to will already be sick, and so can’t become new cases. In our equation, this means the probability
of infection should include some factor to account for the probability that a person
you’re exposed to isn’t already infected, which for a random exposure model would be
(1 – the proportion of people in the world who are infected). When you include a factor like that and solve
for how N grows, you get what’s known as a logistic curve, which is essentially indistinguishable
from an exponential at the beginning, but ultimately levels out as it approaches the total
population size, as you’d expect. True exponentials essentially never exist
in the real world; they’re all the beginnings of logistic curves.

The point where this curve goes from curving
up to instead curving down is known as the “inflection point”. At that point, the number of new cases each
day, represented by the slope of this curve, is roughly constant, and will soon start decreasing. So one number that people will often follow
with epidemics is the “growth factor”, which is defined as the ratio between the number
of new cases one day and the number of new cases the previous day. So, just to be clear, if you were looking
at the totals from one day to the next, then tracking the changes between these totals,
the growth factor is the ratio between two successive changes. While you’re growing exponentially, this
factor will stay consistently above 1, whereas seeing a growth factor around 1 is a sign
you’ve hit the inflection point.

This can make for another counterintuitive
fact while following the data. Think about what it would look like for the
number of new cases one day to be about 15% more than the number of new cases the previous
day, and contrast that with what it would feel like for it to be about the same. Just looking at the totals, they really don’t
feel that different, but if the growth factor is 1, it could mean you’re at the inflection
point of a logistic, which means the total number of cases will max out around 2 times
wherever you are now. But a growth factor bigger than 1 means you’re
on the exponential part, which could imply orders of magnitude of growth still lie ahead
of you.

While in the worst case this saturation point
would be the total population, it’s of course not true that people with the virus are randomly
shuffled around the world’s population like this; people are clustered in communities. But when you run simulations where there’s
even a little bit of travel between clusters like these, the growth is not actually much
different. What you end up with is a kind of fractal
pattern, where communities themselves function like individuals. Each one has some exposure to others, with
some probability of spreading the infection, so the same underlying exponential-inducing
laws apply.

Fortunately, saturating the whole population
is not the only thing that causes the growth factor to drop. The amount of exposure goes down when people
stop gathering and traveling, and the infection rate goes down when people wash their hands
more.

The other thing that’s counterintuitive
about exponential growth is how sensitive it is to this constant. For example, if the daily growth rate is 15%, and we’re at
21,000 cases now, that means 61 days from now it’s over 100 million. But if through a bit less exposure and infection
it drops to 5%, it doesn’t mean the projection drops by a factor of 3; it actually drops
to around 400,000. So if people are sufficiently worried, there’s
much less to worry about, but if no one is worried, that’s when you should worry.
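
To make the arithmetic above a little more concrete, here is a minimal Python sketch. It is only an illustration, not the model behind the real data. The function names are made up for this example, and the simulation values (1.5 exposures per day, a 10% chance of infection per exposure, a population of one million) are assumptions chosen to give a 15% daily growth rate early on. The logistic part uses a simple day-by-day update with the (1 – proportion infected) factor described earlier, which only approximates the smooth logistic curve.

```python
def project(cases_now, daily_rate, days):
    """Pure exponential projection: multiply by (1 + daily_rate) once per day."""
    return cases_now * (1 + daily_rate) ** days


def simulate_logistic(n0, exposures, p_infect, population, days):
    """Daily totals when new cases = E * p * (1 - N / population) * N.

    The (1 - N / population) factor is the chance that a randomly chosen
    person is not already infected (the random-exposure assumption).
    """
    totals = [float(n0)]
    for _ in range(days):
        n = totals[-1]
        not_yet_infected = 1 - n / population
        totals.append(n + exposures * p_infect * not_yet_infected * n)
    return totals


def growth_factors(totals):
    """Ratio of each day's new cases to the previous day's new cases."""
    new_cases = [later - earlier for earlier, later in zip(totals, totals[1:])]
    return [later / earlier
            for earlier, later in zip(new_cases, new_cases[1:])
            if earlier > 0]


if __name__ == "__main__":
    # Sensitivity to the constant: same starting point, same 61 days.
    for rate in (0.15, 0.05):
        cases = project(21_000, rate, 61)
        print(f"{rate:.0%} daily growth -> {cases:,.0f} cases after 61 days")

    # Hypothetical logistic run (all parameter values are assumptions).
    totals = simulate_logistic(n0=100, exposures=1.5, p_infect=0.1,
                               population=1_000_000, days=150)
    factors = growth_factors(totals)

    # The first day the growth factor dips below 1 marks the inflection.
    inflection_day = next(i for i, g in enumerate(factors) if g < 1)
    print(f"early growth factor: {factors[0]:.3f}")
    print(f"total at inflection: {totals[inflection_day]:,.0f}")
    print(f"final total:         {totals[-1]:,.0f}")
```

Running it shows the three points from the text side by side: the 15% projection lands above 100 million while the 5% one lands near 400,000, the growth factor starts close to 1.15 and falls below 1, and the final total is roughly twice the total at the inflection.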