In this post we see how we can track mutation rates to estimate when people were infected with HIV and even when the virus first crossed over into humans.
HIV is an evolution machine
Its polymerase enzyme is pretty sloppy and has an error rate of about 1 mistake for every 10 thousand nucleotide bases copied.
For a virus with a genome about 10 thousand bases in length, that means that basically every time HIV replicates itself, it makes a mistake.
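To make that back-of-the-envelope claim concrete, here is a minimal sketch of the arithmetic. It uses only the two round numbers quoted above and assumes every base is miscopied independently with the same probability, which is a simplification rather than a measured property of the enzyme. The function names are my own.

```python
def expected_errors(error_rate: float, genome_length: int) -> float:
    """Average number of copying errors in one full genome copy."""
    return error_rate * genome_length


def prob_at_least_one_error(error_rate: float, genome_length: int) -> float:
    """Chance that a copy contains one or more errors, assuming each base
    is miscopied independently with the same probability."""
    return 1.0 - (1.0 - error_rate) ** genome_length


# Round numbers quoted in the post, not precise measurements.
ERROR_RATE = 1 / 10_000      # about 1 mistake per 10 thousand bases
GENOME_LENGTH = 10_000       # about 10 thousand bases

print(expected_errors(ERROR_RATE, GENOME_LENGTH))
print(round(prob_at_least_one_error(ERROR_RATE, GENOME_LENGTH), 2))
```

Under these assumptions the average works out to about one error per copy, and roughly 63% of copies carry at least one error. So "basically every time" is best read as "about once per copy on average".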
Sometimes these errors result in a defective virus, but sometimes they give the virus some new property its predecessor didn’t have, such as resistance to an antiretroviral agent (the drugs we use to treat HIV). The high mutation rate of HIV has also led to extensive worldwide diversity in the epidemic, leading to groupings of related viruses called clades that are named with the letters A through K, and sometimes with two letters where it looks like two clades have recombined into a spliced version of HIV. The different clades are shown in this phylogenetic tree. The tree also shows how they relate to other immunodeficiency viruses that infect other primates, as well as how HIV (more precisely, HIV-1) is related to HIV-2, a distinct virus that also infects humans and causes AIDS and is mostly confined to West Africa.
This extensive diversity also makes it very difficult to develop an HIV vaccine.
Although the high mutation rate makes things difficult for scientific and medical advances in HIV, it does allow us to see evolution in action, and can lead to some pretty interesting discoveries.
Exploring historical viruses
This paper by Worobey and colleagues is fascinating because it details one of the oldest samples of HIV ever found, a preserved lymph node tissue biopsy from 1960 from what is now Kinshasa, Democratic Republic of the Congo. The HIV sequence they found, named DRC60, is from a real person who likely died of AIDS some two decades before AIDS was ever identified.
By comparing this sequence with another very early sequence from the same city (ZR59), the team was able to ask a number of interesting questions about the origins of the HIV epidemic. If HIV had crossed over from chimpanzees around 1960, one would expect DRC60 and ZR59 to be quite similar in their sequences. However, if the epidemic got started earlier, or had a number of different independent transmission events that got it started, then you’d expect these sequences to be more divergent.
In fact, DRC60 and ZR59 are quite different from one another. DRC60 appears to be most closely related to clade A viruses while ZR59 looks more like a clade D virus, and overall, they differ by about 12%. Thus in one city in Africa, there were already at least two distinct clades of HIV in 1960, suggesting that the virus had been spreading for a number of years prior to that date. In fact, the researchers could use these two sequences to “count backwards” and determine how far in the past these two viruses had a common ancestor.
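A figure like "differ by about 12%" is the kind of number you get by lining two sequences up and counting mismatches. Here is a minimal sketch of that simplest distance measure (often called the p-distance). The two toy sequences are made up for illustration and are not DRC60 or ZR59, and I am not claiming this is exactly how the paper computed its figure.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned positions at which two sequences differ.

    Both sequences must already be aligned (same length). Positions where
    either sequence has a gap character '-' are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore alignment gaps
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return differences / compared


# Made-up 25-base toy sequences that differ at 3 positions.
TOY_A = "ACGTACGTACGTACGTACGTACGTA"
TOY_B = "ACATACGTACTTACGTACGTCCGTA"

print(p_distance(TOY_A, TOY_B))
```

Real analyses usually correct this raw count with a substitution model, because the same site can change more than once and a simple count then underestimates the true amount of evolution.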
Timing the cross-over
Although HIV evolves rapidly, it does so at a fairly constant rate. In essence, you can use this constant rate like a clock that tells you roughly how many changes accumulate per year. Then, by working out how many changes separate two sequences from the single sequence they both descend from (their most recent common ancestor, “MRCA”), you can estimate the date at which the MRCA existed. Using these two historical sequences plus some more contemporary ones, Worobey et al. estimated that the MRCA for all these sequences probably arose around 1921, around 60 years before AIDS was identified in the US. The date for this common ancestor may represent when the virus first crossed over from chimps into humans.
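The "counting backwards" logic can be sketched with a strict clock and just two sequences. If both lineages accumulate changes at the same rate r (substitutions per site per year) and the sequences differ by d, then d = r × (year_a − T) + r × (year_b − T), which can be solved for the MRCA year T. In the example below, the 12% divergence comes from the post and the sampling years are read off the sequence names, but the rate is a purely illustrative placeholder of mine, not the rate estimated by Worobey et al. The actual study used many more sequences and more sophisticated models, so do not expect this toy calculation to reproduce its result.

```python
def mrca_year(year_a: float, year_b: float,
              divergence: float, rate: float) -> float:
    """Estimate the year of the most recent common ancestor of two
    sequences under a strict molecular clock.

    divergence: substitutions per site separating the two sequences
    rate:       substitutions per site per year (same on both lineages)
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    # d = r*(year_a - T) + r*(year_b - T)  =>  T = (year_a + year_b - d/r) / 2
    return (year_a + year_b - divergence / rate) / 2.0


# ASSUMPTION: illustrative rate only, chosen to keep the arithmetic simple.
ILLUSTRATIVE_RATE = 0.002

estimate = mrca_year(year_a=1959, year_b=1960,
                     divergence=0.12, rate=ILLUSTRATIVE_RATE)
print(round(estimate, 1))
```

Note how sensitive the answer is to the rate: halving it doubles the time back to the ancestor, which is one reason the published estimates come with wide ranges.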
Some estimates even ranged as far back as 1873.
The researchers posit that if the epidemic did indeed start that early, it likely grew very slowly at first (otherwise AIDS would likely have been identified earlier). They overlay the range of their estimate for when the MRCA arose onto population growth data for some cities in west-central Africa and conclude that it was the rise of cities after approximately 1910 that really allowed the epidemic to spread.
Estimating dates of infection
We can also use an approach similar to the one Worobey and colleagues used to estimate when a single individual was infected with HIV.
This paper by Poon et al. (Poon is a colleague of mine, for full disclosure) details a cohort of 19 patients from Montreal who were identified with primary HIV infection, meaning they had likely been infected within the past few months. These patients had a number of archived blood samples taken at different time points after they became infected.
The researchers used a next-generation sequencing approach called deep sequencing, which gives a high-resolution cross section of all the different versions of HIV that exist within a patient. This approach generates an average of about 2500 sequences per sample. They then made phylogenetic trees from these sequences and used sophisticated statistical and phylogenetic modelling programs to trace back to the most recent common ancestor of all the different variants within that patient. Since they had multiple blood samples from these patients with known dates, they could determine the rates of evolution for HIV for each specific patient independently, and use that rate to count back to when the MRCA was most likely to have occurred.
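One simple way to see how dated samples let you "count back" is a root-to-tip regression: plot how far each sample has diverged from the root of the tree against its sampling date, fit a straight line, and read off where the line hits zero divergence. The sketch below is a simplified stand-in for that idea, not the statistical machinery used in the paper, and the numbers in the example are invented.

```python
from typing import Sequence, Tuple


def fit_clock(sample_times: Sequence[float],
              divergences: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of divergence against sampling time.

    Returns (rate, origin_time), where rate is the slope (divergence per
    unit time) and origin_time is the time at which the fitted line
    reaches zero divergence, i.e. the estimated MRCA date.
    """
    n = len(sample_times)
    if n < 2 or n != len(divergences):
        raise ValueError("need at least two (time, divergence) pairs")
    mean_t = sum(sample_times) / n
    mean_d = sum(divergences) / n
    sxx = sum((t - mean_t) ** 2 for t in sample_times)
    sxy = sum((t - mean_t) * (d - mean_d)
              for t, d in zip(sample_times, divergences))
    if sxx == 0:
        raise ValueError("samples must come from different times")
    rate = sxy / sxx
    if rate <= 0:
        raise ValueError("no positive clock signal in the data")
    intercept = mean_d - rate * mean_t
    return rate, -intercept / rate


# Invented example: days on an arbitrary calendar and divergence from root.
days = [200, 300, 400, 500]
divergence = [0.0011, 0.0019, 0.0031, 0.0039]

rate, origin_day = fit_clock(days, divergence)
print(rate, origin_day)
```

With more than two time points per patient, the scatter around the fitted line also gives a feel for how clock-like the evolution really is in that individual.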
Because other work has indicated that most HIV infections actually arise from a single virus (or at least a set of viruses that all look identical), an estimate of when the MRCA occurred for a patient can also serve as an estimate for when that patient was actually infected.
Now, all this sounds fine, but without actually knowing when the patient was infected, how do we know whether these methods are correct or wildly off-base? Well, there are other, non-genetic ways of estimating when someone was infected with HIV. For example, around three-quarters of all people with HIV have a flu-like illness called an acute retroviral syndrome about 14 days after getting infected. So, if you know when that first occurred for someone, you can guess that they were infected about 2 weeks before. Patients in this phase of their HIV infection will test negative for HIV antibodies but will have detectable levels of HIV RNA in their blood, which is another indicator of acute HIV infection, again occurring approximately 14 days after infection.
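The non-genetic estimate is just date arithmetic. As a minimal sketch, assuming the roughly 14-day lag described above holds for a given patient (it is an approximation, and the example date is hypothetical):

```python
from datetime import date, timedelta

# Approximate lag between infection and acute retroviral syndrome,
# taken from the figure quoted in the post.
TYPICAL_LAG_DAYS = 14


def estimate_infection_date(symptom_onset: date,
                            lag_days: int = TYPICAL_LAG_DAYS) -> date:
    """Count back a fixed number of days from the onset of symptoms."""
    return symptom_onset - timedelta(days=lag_days)


# Hypothetical symptom onset date, for illustration only.
print(estimate_infection_date(date(2010, 3, 15)))
```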
Using these and other approaches, the researchers could compare their phylogenetic estimates of dates of infection to non-genetic estimates based on the timing of acute retroviral syndrome and similar clinical markers. As the figure indicates, the dates of infection estimated by the phylogenetic approach (Y axis) are actually quite similar to the dates derived from the non-genetic approaches. An independent dataset with known dates of seroconversion (the point at which HIV antibodies appear) again confirmed that this approach is quite accurate.
Thus overall, we can use these approaches, which incorporate what we know about the evolution of HIV, to determine how far in the past a most recent common ancestor existed. When we do that for a single patient, we get an estimate of their date of infection. And when we do it with sequences from the whole of the HIV epidemic (including very old archived sequences from before the epidemic took off), we can estimate the MRCA of all HIV strains and when the virus first crossed over into humans.