The phrase “exponential growth” is familiar

to most people, and yet human intuition has a hard time really recognizing what it means

sometimes. We can anchor on a sequence of small-seeming

numbers, then become surprised when suddenly those numbers look big, even if the overall

trend follows an exponential perfectly consistently. This right here is the data for recorded cases

of COVID-19, aka the Coronavirus, outside mainland China, at least as of the time I’m

writing this. Never one to waste an opportunity for a math

lesson, I thought this might be a good time for us all to go back to the basics on what

exponential growth is, where it comes from, what it implies, and maybe most pressingly,

how to know when it’s coming to an end. Exponential growth means that as you go from one

day to the next, it involves multiplying by some constant. In our data, the number of cases each day

tends to be between 1.15 and 1.25 times the number of cases the previous day. Viruses are a textbook example of this kind

of growth because what causes new cases are the existing cases. If the number of cases on a given day is N,

and we say each individual with the virus is, on average, exposed to E people on a given

day, and each exposure has a probability p of becoming an infection, the number of new

cases each day is E*p*N. The fact that N itself is a part of this is what really makes things

go fast because as N gets big, the rate at which it grows also gets big. One way to think of this is that as you add

on these new cases to get the next day’s count, you can factor out the N, so it’s

just the same as multiplying by some constant bigger than 1. This is sometimes easier to see if we put

the y-axis on a logarithmic scale, meaning each step of a fixed distance corresponds

to multiplying by a certain factor; in this case, each step is another power of 10. On this scale, exponential growth looks like

a straight line. With our data, it took 20 days to go from

100 to 1,000, and 13 days to go from that to 10,000, and by doing a linear regression

to find the best fit line, you can look at the slope of that line to say it tends to

multiply by 10 every 16 days on average. This regression also lets us be more quantitative

about how close the exponential fit really is, and to use the technical jargon here,

the answer is that it’s really freaking close. It can be hard to digest what this really

means, if true. If you see one country with 6,000 cases, while

another has 60, it’s easy to think the second is doing 100 times better and hence doing

fine. But if you’re in a situation where numbers

multiply by 10 every 16 days, another way to view the same fact is that the second country

is about a month behind the first. This is, of course, rather worrying if you

draw out the line. I’m recording this on March 6th, and if

the present trend continues, it would mean hitting 1M cases in 30 days (April 5th), hitting

10M in 47 days (April 22nd), 100M in 64 days (May 9th), and 1 billion in 81 days (May 26th). Needless to say, though, you can’t draw

out a line like this forever; it clearly must start slowing down at some point, but the

crucial question is when. Is it like the SARS outbreak of 2002, which capped

out at about 8,000 cases, or more like the Spanish Flu of 1918, which ultimately infected about

27% of the world’s population? In general, just drawing a line through your

data is not a great way to make predictions, but remember that there’s an actual reason

to expect an exponential here. If the number of new cases each day is proportional

to the number of existing cases, it means each day you multiply by some constant, so

moving forward d days is the same as multiplying by that constant d times. It is inevitable, though, that this factor

in front of N eventually decreases. Even in the most perfectly pernicious model

for a virus, which would be where every day, each person with the virus is exposed to a

random subset of the world’s population, at some point most of the people they’re

exposed to will already be sick, and so can’t become new cases. In our equation, this means the probability

of infection should include some factor to account for the probability that a person

you’re exposed to isn’t already infected, which for a random exposure model would be

(1 – the proportion of people in the world who are infected). When you include a factor like that and solve

for how N grows, you get what’s known as a logistic curve, which is essentially indistinguishable

from an exponential at the beginning, but ultimately levels out upon approaching the total

population size, as you’d expect. True exponentials essentially never exist

in the real world; they’re all the beginnings of logistic curves. The point where this curve goes from curving

up to instead curving down is known as the “inflection point”. At that point, the number of new cases each

day, represented by the slope of this curve, is roughly constant, and will soon start decreasing. So one number that people will often follow

with epidemics is the “growth factor”, which is defined as the ratio between the number

of new cases one day, and the number of new cases the previous day. So, just to be clear, if you were looking

at the totals from one day to the next, then tracking the changes between these totals,

the growth factor is the ratio between two successive changes. While you’re growing exponentially, this

factor will stay consistently above 1, whereas seeing a growth factor around 1 is a sign

you’ve hit the inflection. This can make for another counterintuitive

fact while following the data. Think about what it would look like for the

number of new cases one day to be about 15% more than the number of new cases the previous

day, and contrast that with what it would feel like for it to be about the same. Just looking at the totals, they really don’t

feel that different, but if the growth factor is 1, it could mean you’re at the inflection

point of a logistic, which means the total number of cases will max out around 2 times

wherever you are now. But a growth factor bigger than 1 means you’re

on the exponential part, which could imply orders of magnitude of growth still lie ahead

of you. While in the worst case this saturation point

would be the total population, it’s of course not true that people with the virus are randomly

shuffled around the world’s population like this; people are clustered in communities. But when you run simulations where there’s

even a little bit of travel between clusters like these, the growth is not actually much

different. What you end up with is a kind of fractal

pattern, where communities themselves function like individuals. Each one has some exposure to others, with

some probability of spreading the infection, so the same underlying exponential-inducing

laws apply. Fortunately, saturating the whole population

is not the only thing that causes the growth factor to go down. The amount of exposure goes down when people

stop gathering and traveling, and the infection rate goes down when people wash their hands

more. The other thing that’s counterintuitive

about exponential growth is how sensitive it is to this constant. For example, if the daily increase is 15%, and we’re at

21,000 cases now, that means 61 days from now it’s over 100 million. But if through a bit less exposure and infection

it drops to 5%, it doesn’t mean the projection drops by a factor of 3, it actually drops

to around 400,000. So if people are sufficiently worried, there’s

much less to worry about, but if no one is worried, that’s when you should worry.
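
For anyone who wants to check the arithmetic behind those projections, here is a minimal Python sketch. The function names (project_cases, days_to_multiply, growth_factors) are made up for illustration, and the whole thing assumes the daily multiplier stays constant, which, as discussed above, cannot hold forever. It reproduces the 15% versus 5% comparison, the roughly 16 days per factor of 10, and the growth factor as the ratio of successive daily changes.

```python
import math


def project_cases(current, daily_growth, days):
    """Cases after `days` if the total is multiplied by (1 + daily_growth) each day.

    Assumption: the multiplier never changes, i.e. pure exponential growth.
    """
    return current * (1 + daily_growth) ** days


def days_to_multiply(factor, daily_growth):
    """Days needed for the total to grow by `factor` at a constant daily growth rate."""
    return math.log(factor) / math.log(1 + daily_growth)


def growth_factors(totals):
    """Ratio of each day's new cases to the previous day's new cases.

    `totals` is a list of cumulative counts, one per day. Pairs where the
    earlier day had no new cases are skipped to avoid dividing by zero.
    """
    new_cases = [b - a for a, b in zip(totals, totals[1:])]
    return [
        later / earlier
        for earlier, later in zip(new_cases, new_cases[1:])
        if earlier > 0
    ]


if __name__ == "__main__":
    # The example from the text: 21,000 cases now, looking 61 days ahead.
    for rate in (0.15, 0.05):
        cases = project_cases(21_000, rate, 61)
        print(f"{rate:.0%} per day: about {cases:,.0f} cases after 61 days")

    # How long a factor of 10 takes at 15% per day.
    print(f"days per factor of 10: {days_to_multiply(10, 0.15):.1f}")
```

Run as a script, the 15% case should come out a little over 100 million and the 5% case around 400,000, matching the figures quoted above. Feeding growth_factors a list of real daily totals gives the number to watch: consistently above 1 suggests the exponential phase, while values hovering around 1 suggest the inflection point may be near.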