A guide to the science and pseudoscience of A Troublesome Inheritance, part I: The genetics of human populations

This is the first in a series of guest posts in which Chris Smith will examine the evolutionary claims made in Nicholas Wade’s book A Troublesome Inheritance. Chris is an Associate Professor of Evolutionary Ecology at Willamette University. He uses population genetic approaches to understand coevolution of plants and insects, and he teaches the interdisciplinary course “Race, Racism, and Human Genetics” with Emily Drew.

Last month the former New York Times writer Nicholas Wade released his latest book on human evolution, A Troublesome Inheritance: Genes, Race, and Human History (2014, Penguin Press). In it, Wade argues that the genomic data amassed over the past ten years reveal real and meaningful biological differences between races, and that these differences explain much of the cultural and socioeconomic differences between people. If you haven’t read a newspaper or picked up a magazine in the last month, you may not have noticed that Wade’s book has—predictably—prompted intense and impassioned reaction from scientists, sociologists, and commentators from across the political spectrum. Writing for the Wall Street Journal, Charles A. Murray, author of The Bell Curve, called Wade’s book, “A delight to read … [that] could be the textbook for a semester’s college course on human evolution.” On the other hand, Arthur Allen, in his review for the New York Times, predicts that many readers will find Wade’s book to be, “a rather unconvincing attempt to promote the science of racial difference.”

Writers with considerably more gravitas than I have already pointed out that Wade seems to have a rather poor handle on the literature he reviews. Mike Eisen, professor of Molecular Biology in the Howard Hughes Medical Institute (HHMI) at the University of California, Berkeley, writes that “the book is riddled with scientific and logical flaws” and that Wade’s “representation of modern genetics is simplistic and selective.” Likewise, Allen Orr, former president of the Society for the Study of Evolution, warns in his essay for the New York Review of Books that “[Wade] is not the surest guide to a technical literature.” This is an understatement, to say the least. Indeed, many of Wade’s claims represent significant misunderstandings or misinterpretations of the literature.

Here, I offer the first in what I hope to be a series of posts examining Wade’s scientific claims, with a particular focus on his arguments about evolution and human genetics. I aim to review these in greater detail than has already been done elsewhere, but in terms that are still accessible to a general audience. I will not deal here with Wade’s arguments about the history of Western civilization and the relative contributions of economics and culture to the ascendancy of the West, which are topics that are well outside of my expertise.

Wade begins with the premise that recent population genetic studies reveal that human evolution has been “recent, copious, and regional.” On these very general points, I have no disagreement. The idea that all changes in allele frequencies ceased with the invention of agriculture is a notion that no one—apart from some of my introductory biology students—takes at all seriously. Likewise, it is inarguable that human populations vary genetically. As the evolutionary geneticist Richard Lewontin put it, “you don’t need a population geneticist to tell you that.”

However, starting from these uncontroversial (and frankly, rather banal) premises, Wade goes on to draw all manner of dramatic conclusions. Among other things, Wade works out that modern genetics confirms the existence of “three primary races”, that Europeans were genetically preprogrammed to become the world’s dominant culture, that African Americans may have evolved through natural selection to be inherently violent and socially deviant, and that Jews are genetically predisposed to careers in banking. (Seriously. You can’t make this stuff up!) Needless to say, the available evidence does not support Wade’s grandiose conclusions, many of which are directly contradicted by the very work he cites in support of his arguments. Over the next several weeks I will review several of Wade’s major claims, and evaluate what, if anything, the available data say about them.

Does modern genetics confirm the existence of human races?

Humans as a whole are unusual among primates in that we are remarkably genetically similar to one another (Gagneux et al. 1999). Needless to say, however, humans are not all genetically identical, and variation among humans is not distributed randomly. Rather, as is true of most mammals, genetic variation has a measurable geographic pattern in that people who live near each other tend to be more genetically similar to one another. Within humans that pattern of geographic variation (or geographic structure) also bears a very strong mark of human history. Humans originated in Africa, and began to disperse into the rest of the world about 50,000 years ago, moving first into the Middle East, then into Europe, Asia, and finally into Oceania and the Americas. As a result, most of human genetic diversity is found within Africa, the source population.

Human populations outside of Africa show progressively less and less variation as one moves further from Africa (Wang et al. 2007); as humans colonized each part of the globe in turn, each group of colonists carried with them only a subset of the genetic variation found in its source population. The combined effects of geographic structure and the history of humanity’s spread from our African homeland mean that, for some genes, particular variants (alleles) are more common in some parts of the world than in others. So, given a sample of DNA from an individual, by looking at which genetic variants that individual carries at many, many genes we can estimate where in the world that individual originated, often with a stunning degree of precision (Novembre et al. 2008). In addition, work by Noah Rosenberg and colleagues showed that when you use statistical tools (for example, the program STRUCTURE) to group individuals together into a pre-determined number of idealized populations, these populations largely correspond to continents of origin, but with some important exceptions, as we will see (Rosenberg et al. 2002; Rosenberg et al. 2005).
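To make the idea of estimating geographic origin from allele frequencies concrete, here is a minimal sketch in Python. It is not the method of Novembre et al. (2008), who used principal components analysis on hundreds of thousands of markers; it is a simple likelihood-based assignment, and the two reference populations ("north" and "south"), their allele frequencies, and the function names are all hypothetical. It assumes random mating within each population and independence among loci.

```python
import math

# Hypothetical frequencies of allele "A" at four loci in two made-up
# reference populations. The numbers are illustrative only.
REFERENCE_FREQS = {
    "north": [0.9, 0.8, 0.7, 0.2],
    "south": [0.2, 0.3, 0.4, 0.8],
}


def genotype_log_likelihood(genotype, freqs):
    """Log-probability of a multilocus genotype in one population.

    genotype: copies of allele "A" (0, 1 or 2) carried at each locus.
    freqs: frequency of allele "A" at each locus in that population.
    Assumes random mating and independent loci.
    """
    total = 0.0
    for copies, p in zip(genotype, freqs):
        q = 1.0 - p
        prob = {0: q * q, 1: 2 * p * q, 2: p * p}[copies]
        total += math.log(prob)
    return total


def most_likely_origin(genotype, reference=REFERENCE_FREQS):
    """Return the reference population under which the genotype is most probable."""
    return max(
        reference,
        key=lambda pop: genotype_log_likelihood(genotype, reference[pop]),
    )


individual = [2, 2, 1, 0]
print(most_likely_origin(individual))  # prints "north"
```

With only four loci the assignment is crude; the precision described above comes from combining small frequency differences across a very large number of genes.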

A measure of genetic variation (expected heterozygosity) contained within human populations located progressively further from east Africa, where modern humans originated. Image is from Wang et al. (2007), figure 2A.
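The decline in expected heterozygosity with distance from Africa can be mimicked with a toy simulation of serial founder events. The sketch below is an illustration under strong simplifying assumptions, not a reconstruction of the analysis in Wang et al. (2007): every locus has two alleles starting at a frequency of 0.5, each new colony is founded by a fixed number of individuals drawn at random from the previous one, and nothing else (mutation, migration, later population growth) is modeled. The function names and parameter values are hypothetical.

```python
import random


def expected_heterozygosity(p):
    """Expected heterozygosity at a two-allele locus with allele frequency p."""
    return 1.0 - (p * p + (1.0 - p) * (1.0 - p))


def found_colony(freqs, n_founders, rng):
    """Draw 2 * n_founders gene copies per locus from the source frequencies."""
    copies = 2 * n_founders
    new_freqs = []
    for p in freqs:
        count = sum(1 for _ in range(copies) if rng.random() < p)
        new_freqs.append(count / copies)
    return new_freqs


def serial_founder_series(n_loci=500, n_founders=20, n_steps=10, seed=1):
    """Mean expected heterozygosity in the source and in each successive colony."""
    rng = random.Random(seed)
    freqs = [0.5] * n_loci  # the source population
    series = []
    for _ in range(n_steps + 1):
        mean_h = sum(expected_heterozygosity(p) for p in freqs) / n_loci
        series.append(mean_h)
        freqs = found_colony(freqs, n_founders, rng)
    return series


for step, h in enumerate(serial_founder_series()):
    print(step, round(h, 3))
```

The first value is 0.5 by construction. Each founding event then loses, on average, a fraction 1/(2N) of the remaining heterozygosity, where N is the number of founders, so the values drift downward with each step away from the source.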

In summarizing these facts about human genetic variation, Wade is largely on the mark. However, he misses two important points, both of which have major implications for his conclusions. The first of these, as has been pointed out by Jennifer Raff on her blog, Violent Metaphors, and as Jeremy Yoder explains in great technical detail at The Molecular Ecologist, is that STRUCTURE (the software used to cluster individuals into populations) does not, on its own, identify how many clusters actually exist. Rather, the investigator defines the number of populations in advance, and STRUCTURE then clusters the individuals accordingly[1], trying to find the statistically ‘best’ arrangement of individuals.

So, for example, a scientist might obtain a sample of genetic data from people living in each of several villages in the Alps, including some villages in Germany, and some in Switzerland. She would then feed these data into STRUCTURE. STRUCTURE will then ask for directions about how the data should be analyzed, including how many clusters it should use when grouping the individuals. In this case, since samples were taken from each of two countries, the scientist might tell STRUCTURE to assign the people into two populations. STRUCTURE will then assign each individual into a particular population, trying to create populations that—based on the frequency of genotypes within each resulting cluster—appear to be freely interbreeding.

Depending on how STRUCTURE organizes the individuals into these clusters, potentially interesting conclusions could be drawn. For example, we might find that the two clusters correspond to political boundaries—with people from villages in each country clustering together. Alternatively, the results might show that people from different villages that speak the same language cluster together, with the French and German speakers each forming separate groups, suggesting that language is more important than geography in determining who mates with whom. However, if the scientist had chosen to group the people into three clusters, instead of two, a different result might have emerged. For example, she might have found that both geography and language matter, with all the French speakers from Switzerland forming one cluster, the German-speaking Swiss another, and the people living in Germany forming a third.
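The dependence of the result on the number of clusters requested can be shown with a deliberately simple stand-in. The sketch below does not implement STRUCTURE, which fits a Bayesian model of allele frequencies; it uses plain k-means clustering on a single made-up "genetic score" per person. The nine scores, the three groups they are meant to evoke, and the function name are all hypothetical. The only point is that the algorithm returns exactly as many clusters as it is asked for.

```python
def kmeans_1d(values, k, n_iter=50):
    """Plain k-means on one-dimensional data; returns one cluster label per value."""
    ordered = sorted(values)
    # Deterministic start: centres spread evenly through the sorted data.
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(n_iter):
        # Assign every value to its nearest centre.
        labels = [min(range(k), key=lambda c: abs(v - centres[c])) for v in values]
        # Move every centre to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:  # keep the old centre if a cluster ends up empty
                centres[c] = sum(members) / len(members)
    return labels


# Hypothetical scores: three French-speaking Swiss, three German-speaking
# Swiss, and three people living in Germany.
scores = [0.0, 0.2, 0.4, 1.6, 1.8, 2.0, 6.0, 6.2, 6.4]

for k in (2, 3, 4):
    print(k, kmeans_1d(scores, k))
```

With two clusters, all six Swiss are lumped together and set against the Germans. With three, each group forms its own cluster. With four, one of the groups is split even though nothing in how the data were made up calls for it. The number of groups in the output is an input, not a discovery.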

Importantly, how many clusters are identified is a decision made by the scientist, not something that STRUCTURE determines. So, figuring out how many populations actually exist requires that we use some other criterion. At the time that Rosenberg and colleagues completed their initial analyses, appropriate statistical tools for identifying the “optimal” number of clusters had not been developed [2]. Rosenberg’s group did, however, evaluate how the number of clusters chosen in advance affected “clusteredness” (the extent to which each individual is identified as belonging to one population, as opposed to having ancestry in multiple populations). They found that the highest levels of clusteredness were reached when STRUCTURE was asked to group individuals into 5 or 6 clusters (both of these produced similar levels of clusteredness when all individuals and all the genetic data were included) (Rosenberg et al. 2005).
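Clusteredness can be computed directly from the membership fractions that a clustering program assigns to each individual. The sketch below uses the formula as I understand it from Rosenberg et al. (2005): for each individual, take the distance between that individual's membership fractions and the "evenly split" value 1/K, rescale it so that the result runs from 0 to 1, and average over individuals. Treat the exact form as an assumption to be checked against the paper; the function name and the example membership values are hypothetical.

```python
import math


def clusteredness(membership):
    """Average clusteredness of a set of individuals.

    membership: one row per individual, each row holding that individual's
    K membership fractions (summing to 1).
    Returns 1.0 if everyone belongs wholly to a single cluster and 0.0 if
    everyone is split evenly among all K clusters.
    """
    k = len(membership[0])
    total = 0.0
    for fractions in membership:
        spread = sum((q - 1.0 / k) ** 2 for q in fractions)
        total += math.sqrt(k / (k - 1) * spread)
    return total / len(membership)


# Hypothetical membership fractions for three individuals and K = 3.
example = [
    [1.0, 0.0, 0.0],    # wholly in one cluster
    [0.5, 0.5, 0.0],    # ancestry split between two clusters
    [1 / 3, 1 / 3, 1 / 3],  # evenly split among all three
]
print(round(clusteredness(example), 3))
```

Comparing this average across runs with different values of K is the kind of comparison described above; a high value says individuals were assigned cleanly, but it does not by itself say that K is the true number of populations.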

The second important point that Wade seems to miss is that these idealized populations (or population clusters) do not correspond to any conventional racial classifications. Although Wade, conveniently, never explicitly defines what he actually means by race, he repeatedly makes the claim that modern genetics identifies “three primary races,” which he identifies as Africans, East Asians, and Europeans. These groupings correspond to the ‘negroid’, ‘mongoloid’, and ‘caucasoid’ races described by classical physical anthropology. The trouble is that none of the contemporary studies of human genetic variation actually find this.

Rosenberg and colleagues’ work showed that, for five clusters, the resulting groups correspond reasonably well to continent of origin: Africa, Europe, Asia, Oceania, and the Americas. (Wade manages to fold this into his ‘three primary races’ narrative by calling the Oceanian and American groups ‘minor continental races’.) However, subsequent work by Sarah Tishkoff and colleagues, which used a statistical criterion to identify the ‘best’ number of population clusters, identified 14 groups, nine of which were contained entirely within Africa (Tishkoff et al. 2009). That is, if we allow the data to identify human ‘races’ without guidance, we find 14, not Wade’s “three primary races”.

References

Bryc K., T. Karafet, A. Moreno-Estrada, A. Reynolds, A. Auton, M. Hammer, C. D. Bustamante & H. Ostrer (2010). Genome-wide patterns of population structure and admixture among Hispanic/Latino populations, Proceedings of the National Academy of Sciences, 107 (Supplement_2) 8954-8961. DOI: 10.1073/pnas.0914618107

Cavalli-Sforza L.L., C.R. Cantor, R.M. Cook-Deegan & M.-C. King (1991). Call for a worldwide survey of human genetic diversity: A vanishing opportunity for the Human Genome Project, Genomics, 11 (2) 490-491. DOI: 10.1016/0888-7543(91)90169-f

Evanno G., S. Regnaut & J. Goudet (2005). Detecting the number of clusters of individuals using the software structure: a simulation study, Molecular Ecology, 14 (8) 2611-2620. DOI: 10.1111/j.1365-294x.2005.02553.x

Gagneux P., U. Gerloff, D. Tautz, P. A. Morin, C. Boesch, B. Fruth, G. Hohmann, O. A. Ryder & D. S. Woodruff (1999). Mitochondrial sequences show diverse evolutionary histories of African hominoids, Proceedings of the National Academy of Sciences, 96 (9) 5077-5082. DOI: 10.1073/pnas.96.9.5077

Moreno-Estrada A., J. C. Fernandez-Lopez, F. Zakharia, M. Sikora, A. V. Contreras, V. Acuna-Alonzo, K. Sandoval, C. Eng, S. Romero-Hidalgo, P. Ortiz-Tello et al. (2014). The genetics of Mexico recapitulates Native American substructure and affects biomedical traits, Science, 344 (6189) 1280-1285. DOI: 10.1126/science.1251688

Moreno-Estrada A., F. Zakharia, J. L. McCauley, J. K. Byrnes, C. R. Gignoux, P. A. Ortiz-Tello, R. J. Martínez, D. J. Hedges, R. W. Morris, C. Eng et al. (2013). Reconstructing the population genetic history of the Caribbean, PLoS Genetics, 9 (11) e1003925. DOI: 10.1371/journal.pgen.1003925

Morton SG (1839) Crania Americana: An essay on the varieties of the human species. J. Dobson, Philadelphia.

Novembre J., K. Bryc, Z. Kutalik, A. R. Boyko, A. Auton, A. Indap, K. S. King, S. Bergmann, M. R. Nelson, M. Stephens et al. (2008). Genes mirror geography within Europe, Nature, 456 (7218) 98-101. DOI: 10.1038/nature07331

Rosenberg N.A., S. Ramachandran, C. Zhao, J. K. Pritchard & M. W. Feldman (2005). Clines, clusters, and the effect of study design on the inference of human population structure, PLoS Genetics, 1 (6) e70. DOI: 10.1371/journal.pgen.0010070

Rosenberg N.A., J. K. Pritchard, J. L. Weber, H. M. Cann, K. K. Kidd, L. A. Zhivotovsky & M. W. Feldman (2002). Genetic structure of human populations, Science, 298 (5602) 2381-2385. DOI: 10.1126/science.1078311

Tishkoff S.A., F. R. Friedlaender, C. Ehret, A. Ranciaro, A. Froment, J. B. Hirbo, A. A. Awomoyi, J.-M. Bodo, O. Doumbo, M. Ibrahim et al. (2009). The genetic structure and history of Africans and African Americans, Science, 324 (5930) 1035-1044. DOI: 10.1126/science.1172257

Wang S., M. Jakobsson, S. Ramachandran, N. Ray, G. Bedoya, W. Rojas, M. V. Parra, J. A. Molina, C. Gallo, G. Mazzotti et al. (2007). Genetic variation and population structure in Native Americans, PLoS Genetics, 3 (11) e185. DOI: 10.1371/journal.pgen.0030185

[1] STRUCTURE groups individuals into a predetermined number of clusters ‘K’, in such a way that the distribution of genetic variation (the genotype frequencies) within each cluster makes it appear that mating is occurring at random. That is, the software seeks to arrange individuals into groups in a way that minimizes the overall departures from Hardy-Weinberg equilibrium and linkage equilibrium.
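A small worked example may help show why departures from Hardy-Weinberg equilibrium carry information about hidden population structure. The sketch below is not STRUCTURE's algorithm; it only illustrates the signal that such software exploits. It assumes a single gene with two alleles and two equally sized groups that each mate at random internally. The allele frequencies and function names are hypothetical.

```python
def hardy_weinberg(p):
    """Expected genotype frequencies (AA, Aa, aa) under random mating."""
    q = 1.0 - p
    return p * p, 2 * p * q, q * q


def pooled_heterozygosity(p1, p2):
    """Lump two equally sized, randomly mating groups into one sample.

    Returns (observed, expected): the heterozygote frequency actually
    present in the pooled sample, and the frequency Hardy-Weinberg would
    predict from the pooled allele frequency.
    """
    observed = (hardy_weinberg(p1)[1] + hardy_weinberg(p2)[1]) / 2
    pooled_p = (p1 + p2) / 2
    expected = hardy_weinberg(pooled_p)[1]
    return observed, expected


observed, expected = pooled_heterozygosity(0.9, 0.1)
print(round(observed, 2), round(expected, 2))  # prints 0.18 0.5
```

Each group on its own fits Hardy-Weinberg expectations perfectly, but the pooled sample has far fewer heterozygotes than random mating would predict. Splitting the sample back into two clusters removes the shortfall, which is the sense in which a clustering "minimizes departures from Hardy-Weinberg equilibrium."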

[2] Subsequently a statistical approach has been suggested (Evanno et al. 2005), which selects the optimal number of clusters based on the rate at which the probability of observing the data, given the number of clusters posited, increases as more clusters are proposed. To my knowledge this approach has not been used to identify the optimal number of clusters in the Rosenberg dataset.
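For readers who want to see the arithmetic, here is a minimal sketch of the "delta K" statistic from Evanno et al. (2005), which is based on the second-order rate of change of the log probability of the data, scaled by its variability across replicate runs. Implementations differ in detail; this version takes the absolute second difference of the run means, which is an assumption on my part. The log-probability values are made up for illustration and do not come from any real dataset.

```python
import statistics


def delta_k(log_probs):
    """Delta K for every K that has both neighbours K - 1 and K + 1.

    log_probs: dict mapping K to a list of ln P(data | K) values from
    replicate runs. Each list needs at least two runs that are not all
    identical, otherwise the standard deviation is undefined or zero.
    """
    means = {k: statistics.mean(runs) for k, runs in log_probs.items()}
    result = {}
    for k in sorted(log_probs):
        if k - 1 in means and k + 1 in means:
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            result[k] = second_diff / statistics.stdev(log_probs[k])
    return result


# Hypothetical log probabilities from three replicate runs at each K.
runs = {
    1: [-1000.0, -1002.0, -998.0],
    2: [-900.0, -901.0, -899.0],
    3: [-800.0, -802.0, -798.0],
    4: [-790.0, -795.0, -785.0],
    5: [-785.0, -795.0, -775.0],
}

scores = delta_k(runs)
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

In these made-up numbers the log probability climbs steeply up to K = 3 and then levels off, so delta K peaks at 3. Note that the statistic cannot be computed for the smallest or largest K tried, so it can never select K = 1.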

[3] The difference between genetic variation between races versus genetic variation within races is an important distinction (one that Wade entirely misses), which I will take up in a future post.

[4] Note that ‘genetic differentiation’ is not the same thing as genetic variation. Overall, Native American populations harbor far less genetic variation than the people of any other continent, having traveled further from Africa than any other group. The Moreno-Estrada result refers to a statistical measure of genetic differentiation called Wright’s FST, which reflects the extent of genetic exchange between two populations, or, more precisely, the degree to which the distribution of genetic variation differs from what we would expect if the people living in each population were as likely to mate with someone from the other population as with someone from their own population.
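The distinction between differentiation and variation is easy to see in a small calculation. The sketch below computes Wright's FST in its textbook form, (HT - HS) / HT, for one gene with two alleles in two equally sized populations, where HS is the average expected heterozygosity within populations and HT is the expected heterozygosity of the two populations combined. It is not the estimator used by Moreno-Estrada and colleagues, and the allele frequencies and function name are hypothetical.

```python
def fst_two_populations(p1, p2):
    """Wright's FST for one two-allele locus in two equally sized populations.

    p1, p2: frequency of the same allele in each population.
    """
    # Average expected heterozygosity within the two populations.
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    # Expected heterozygosity if the two populations were one.
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)
    if h_total == 0:
        return 0.0  # the locus does not vary at all
    return (h_total - h_within) / h_total


# Little variation within each population, strong differentiation between them.
print(round(fst_two_populations(0.99, 0.01), 4))  # prints 0.9604
# Plenty of variation within each population, little differentiation.
print(round(fst_two_populations(0.6, 0.4), 4))    # prints 0.04
```

In the first case each population is nearly uniform, yet the two are almost completely distinct; in the second, each population is highly variable, yet the two are nearly interchangeable. A pair of populations can therefore be strongly differentiated while harboring very little variation.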