Showing posts with the label population genetics.

Darwin the Newtonian. Part III. In what sense does genetic drift 'exist'?

It has been about 50 years since Motoo Kimura and King and Jukes proposed that a substantial fraction of genetic variation can be selectively neutral, meaning that the frequency of such an allele (sequence variant) in a population or among species changes by chance--genetic drift--and, furthermore, that selectively 'neutral' variation and its dynamics are a widespread characteristic of evolution (see Wikipedia: Neutral theory of molecular evolution). Because Darwin had been so influential with his Newtonian-like deterministic theory of natural selection, neutral evolution was and still is referred to as 'non-Darwinian' evolution. That label is convenient as a catch-phrase, and is often used to denigrate the idea of neutral evolution, but it is somewhat misleading, because even Darwin knew there were changes in life that were not due to selection (e.g., gradual loss of traits no longer useful, chance events affecting fitness).

First, of course, is the 'blind watchmaker' argument.  How else can one explain the highly organized functionally intricate traits of organisms, from the smallest microbe to the largest animals and plants?  No one can argue that such traits could plausibly just arise 'by chance'!

But beyond that, the reasoning basically coincides with what Darwin asserted.  It takes a basically thermodynamic belief and applies it to life.  Mother Nature can detect even the smallest difference between bearers of alternative genotypes and, in her Newtonian force-like way, will confer greater success on the better genotype.  If we're material scientists, not religious or other mystics, then it is almost axiomatic that a mutation changes the nature of the molecule, if for no other reason than that it requires the use of a different nucleotide and hence the use and/or production of at least slightly different molecules and at least slightly different amounts of energy.

The difference might be very tiny in a given cell, but an organism has countless cells--many many billions in a human, and what about a whale or tree! Every nonessential nucleotide has to be provided for each of the billions of cells, renewed each time any cell divides.  A mutation that deleted something with no important function would make the bearer more economical in terms of its need for food and energy. The difference might be small, but those who then don't waste energy on something nonessential must on average do better: they'll have to find less food, for example, meaning spend less time out scouting and hence exposed to predators, etc.  In short, even such a trivial change will confer at least a tiny advantage, and as Darwin said many times to describe natural selection, nature detects the smallest grain in the balance (scale) of the struggle for life.  So even if there is no direct 'function,' every nucleotide functions in the sense of needing to be maintained in every cell, creating a thermodynamic or energy demand.  In this Newtonian view, which some evolutionary biologists hold or invoke quite strongly, there simply cannot be true selective neutrality--no genetic drift!


The relative success of any two genotypes in a population sample will almost never be exactly the same, and how could one ever claim that there is no functional reason for this difference?  A statistical test may fail to find 'significant' differences, in the probabilistic sense that the result would not be particularly unusual if nothing were going on, yet tiny differences can nonetheless be real.  For example, a die that's biased in favor of 6 can, by chance, come up 3 or some other number more often in an experiment of just a few rolls (the reason for dice being this way is interesting, but beyond our point here). Significance cutoff values are, after all, nothing more than subjective criteria that we have chosen as conventions for making pragmatic decisions.

But what about the lightning strikes?  They are fortuitous events that, obviously, work randomly against individuals in a population in a way unrelated to their genotypes, thus adding some 'noise' to their relative reproductive success and hence to allele (genetic variant) frequencies in the population over time.  That noise would also be a form of true genetic drift, because it would be due to a cause unrelated to any function of the affected variants, whose frequencies would change, at least to some extent, by chance alone. A common, and not unreasonable, selectionist response is to acknowledge that, OK, there's a minor role for chance, but nonetheless, on average, over time, the more efficient version must still win out in the end: 'must', for purely physical and chemical energetic reasons if no others.  That is, there can be no such thing as genetic drift on average, over the long haul.  Of course, 'on average' and 'in the end' carry many unstated assumptions.  Among the most problematic is that sample sizes will eventually be sufficiently great for the underlying physical, deterministic truth to win out over the functionally unrelated lightning-strike types of factors.

On the other hand, the neutralists argue in essence that such minuscule energetic and many other differences are simply too weak to be detected by natural selection--that is, to affect the fitness of their bearers.  Our survival and reproduction are so heavily affected by those genotypes that really do affect them, that the remaining variants simply are not detectable by selection in life's real, finite daily hurly-burly competition. Their frequencies will evolve just by chance, even if the physical and energetic facts are real in molecular terms.
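The neutralist point can be illustrated with a small simulation. What follows is only a sketch, assuming a haploid Wright-Fisher population (a standard textbook idealization, not something specified in this post); the function names, population size, and selection coefficients are illustrative assumptions.

```python
import random

def wright_fisher(pop_size, start_freq, s, generations, rng):
    """Follow one allele in an idealized haploid Wright-Fisher population.

    pop_size is the number of gene copies; s is the selection coefficient
    favouring the tracked allele (s = 0 means strictly neutral).
    Returns the allele frequency when the run stops.
    """
    freq = start_freq
    for _ in range(generations):
        if freq in (0.0, 1.0):  # allele lost or fixed: nothing more can change
            break
        # Deterministic push from selection ...
        expected = freq * (1 + s) / (freq * (1 + s) + (1 - freq))
        # ... followed by random sampling of the next generation (drift).
        copies = sum(1 for _ in range(pop_size) if rng.random() < expected)
        freq = copies / pop_size
    return freq

def fixation_fraction(pop_size, s, replicates=200, seed=1):
    """Fraction of replicate populations in which the allele reaches 100%."""
    rng = random.Random(seed)
    fixed = 0
    for _ in range(replicates):
        if wright_fisher(pop_size, 0.5, s, 20 * pop_size, rng) == 1.0:
            fixed += 1
    return fixed / replicates

if __name__ == "__main__":
    for s in (0.0, 0.001, 0.1):
        print(f"s = {s}: fixed in {fixation_fraction(50, s):.2f} of runs")
```

Under these assumptions, a tiny advantage such as s = 0.001 in a population of 50 gene copies typically yields a fixation fraction close to the neutral case, while s = 0.1 fixes in nearly every run. The deterministic push is real in both cases; whether it is visible depends on how it compares with the sampling noise.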

But to say that variants that are chemically or physically different do not affect fitness is actually a rather strong assertion! It is at best a very vague 'theory', and a very strong assumption of Newtonian (classical physics) deterministic principles. It is by no means obvious how one could ever prove that two variants have no effect.


So we have two contending viewpoints.  Everyone accepts that there is a chance component in survival and reproduction, but the selectionist view sees that component as trivial in the face of the basic physical fact that two things that are different really are different and hence must be detectable by selection, while the neutralist view holds that true equivalence is not only possible but widespread in life.

When you think about it, both views are so vague and dogmatic that they become largely philosophical rather than actual scientific views.  That's not good, if we fancy that we are actually trying to understand the real world.  What is the problem with these assertions?

Can drift be proved?
Maybe the simplest thing in an empirical setting would just be to rule out genetic drift, and show that even if the differences between two genotypes are small in terms of fitness there is always at least some difference.  But it might be easier to take the opposite approach, and prove that genetic drift exists.  To do that, one must compare carriers of the different genotypes and show that in a real population context (because that's where evolution occurs) there is no, that is, zero, difference in their fitness. But to prove that something has a value of exactly zero is essentially impossible!


Is each outcome equally likely?  How to tell?


To return to the dice-rolling analogy, a truly unbiased die can still come up 6 a different number of times than 1/6th of the number of rolls: try any number of rolls not divisible by 6!  In the absence of any true theory of causation, or perhaps to contravene the pure thermodynamic consideration that different things really are different, we have to rely on statistical comparisons among samples of individuals with the different competing genotypes.  Since there is the lightning-strike source of at least some irrelevant chance effects and no way to know all the possible ways the genotypes' effects might differ truly but only slightly, we are stuck making comparisons of the realized fitness (e.g., number of surviving offspring) of the two groups.  That is what evolution does, after all.  But for us to make inferences we must apply some sort of statistical criterion, like a significance cut-off for the 'p-value', to decide. We may judge the result to be 'not different from chance', but that is an arbitrary and subjective criterion.  Indeed, in the context of these contending views, it is also an emotional criterion.  Really proving that a fitness difference is exactly zero, without any real external theory to guide us, is essentially impossible.
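The dice analogy can be put in numbers. The sketch below is an illustration under stated assumptions: a one-sided exact binomial test at the conventional 5% cut-off, and a hypothetical die whose chance of a 6 is 0.18 rather than 1/6. It computes how often such a test would flag the biased die for a given number of rolls.

```python
from math import lgamma, log, exp

def binom_pmf(n, k, p):
    """P(X = k) for X ~ Binomial(n, p), computed in log space (0 < p < 1)."""
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1 - p))
    return exp(log_pmf)

def upper_tails(n, p):
    """Return a list t with t[k] = P(X >= k) for k = 0 .. n + 1."""
    tails = [0.0] * (n + 2)
    for k in range(n, -1, -1):
        tails[k] = tails[k + 1] + binom_pmf(n, k, p)
    return tails

def power_to_detect_bias(n_rolls, p_fair=1 / 6, p_biased=0.18, alpha=0.05):
    """Chance that a one-sided exact test at level alpha flags the biased die."""
    fair = upper_tails(n_rolls, p_fair)
    biased = upper_tails(n_rolls, p_biased)
    # Smallest count of sixes that a fair die reaches with probability <= alpha.
    critical = next(k for k in range(n_rolls + 2) if fair[k] <= alpha)
    return biased[critical]

if __name__ == "__main__":
    for n in (60, 600, 6000, 60000):
        print(n, "rolls: power =", round(power_to_detect_bias(n), 3))
```

The bias is equally real at every sample size; only our ability to call it 'significant' changes with the number of rolls. The numbers 0.18 and 0.05 are arbitrary choices, which is rather the point.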

All we can really hope to do without better biological theory (if such were to exist) is to show that the fitness difference is very small.  But if there is even a small difference, and it is systematic, that is the very definition of natural selection!  Showing that the difference is 'systematic' is easier said than done, because there is no limit to the causal ideas we might hypothesize.  We cannot repeat the study exactly, and statistical tests relate to repeatable events.

There's another element making a test of real neutrality almost impossible.  We cannot sample groups of individuals who have this or that variant and who do not differ in anything else.  Every organism is different, and so are the details of their environment and lifestyle experiences.  So we really cannot ever prove that specific variants have no selective effect, except by this sort of weak statistical test averaging over non-replicable other effects that we assume are randomly distributed in our sample.  There are so many ways that selection might operate, that one cannot itemize them in a study and rule out all such things.  Again, selectionists can simply smile and be happy that their view is in a sense irrefutable.

A neutralist riposte to this smugness would be to say that, while it's literally true that we can't prove a variant to confer exactly zero effect, we can say that it has a trivially small effect--that it is effectively neutral.  But there is trouble with that argument besides its subjectivity: the variant in question may, in other times and in other genomic or environmental contexts, have some stronger effect, and so not be effectively neutral.


A related problem comes from the neutralists' own idea that by far most sequence variants seem to have no statistically discernible function or effect.  That is not the same as no effect.  Genomes are loaded with variants that are nearly or essentially neutral as judged by the usual criteria used in bioinformatic computing, such as that neutral sites have greater variation in populations or between species than is found in clearly functional elements.  But this in no way rules out the possibility that combinations of these do-almost-nothings might together have a substantial or even predominant effect on a trait and the carriers' fitness.


After all, is that not just what countless very large-scale GWAS have shown? Such studies repeatedly, and with great fanfare, report that there are tens, hundreds, or even thousands of genome sites that have very small but statistically identifiable individual effects, but that even these together still account for only a minority of the heritability, the estimate of the overall contribution that genetic variation makes to the trait's variation.  That is, it is likely that many variants that individually are not detectably different from being neutral may contribute to the trait, and thus potentially to its fitness value, in a functional sense.


This is one of the serious and, I think, deeply misperceived implications of the very high levels of complexity that are clearly and consistently observed. It raises the question of whether the concept of neutrality makes any empirical sense, or remains rather a metaphysical or philosophical idea.  This is related to the concept of phenogenetic drift that we discussed in Part II of this series, in which the same phenotype with its particular fitness can be produced by a multitude of different genotypes--the underlying alleles being exchangeable.  So are they neutral or not?

In the end, we must acknowledge that selective neutrality cannot be proved, and that there can always be some, even if slight, selective difference at work.  Drift is apparently a mythical or even mystical, or at least metaphoric concept.  We live in a selection-driven world, just as Darwin said more than a century ago.  Or do we?  Tune in tomorrow.

What population genetic diversity can and can't tell us

By Anne Buchanan and Ken Weiss

Genetic diversity is indisputably a marker of geographic origin and human migration.  The reason is very simple: new mutations arise independently and, to a great extent uniquely, and they arise in some local area with only a single copy of the newly arisen variant.  Over time, that variant will either disappear (not be passed down to any offspring) or may increase in frequency.  Because humans traditionally had but few surviving children per parent, and mated locally, only slow increases and spread of descendant copies of a variant would occur.  Local areas had a unique pattern of genomic variants and, depending on their population size and structure, different amounts of variation.  Because all humans originated from a smallish emigration from a source population in Africa, there is more, and more complex, genomic variation there than in Eurasia.

Beyond these clear facts about the amount and distribution of human genomic diversity, interpretations of what it means, implies, or involves get fuzzy, political, emotional and controversial.  Race is seen as either a genetic construct or a social one, and it is correlated in some ways with geographic location or origin, so it is not obvious how genetic variation per se can be interpreted in terms of traits like societal diversity in wealth, achievements and the like.

The danger of course is to assume that geographic correlation of some societal trait with genomic variation is caused by that variation, that is, that societal variation is 'genetic'.  It is natural for some in the developed world to want to see their achievements as being due to inherent genetic traits (read: superiority), and there is a very long history, all the way back to the Greeks in western tradition, of holding such views of inherency.  But this is hard to demonstrate.

An interesting new paper in the September issue of Genetics tries to make some sense of the meaning of genetic diversity ("Genetic Diversity and Societally Important Disparities," Rosenberg and Kang, 2015) by examining "the ways in which population differences in genetic diversity might contribute to consequential societal differences across populations." Rosenberg and Kang assess the importance of genetic diversity in forensics, organ transplants, and genome wide association studies, as well as its contribution to societal disparities.  They conclude that genetic diversity must be taken into account for biological purposes, but they find no association with societal diversity.  Here's why.

Their paper was at least in part occasioned by a controversy over a 2013 report concluding that population genetic variation can be used as a proxy for economic diversity, and success ("The 'Out of Africa' Hypothesis, Human Genetic Diversity, and Comparative Economic Development," American Economic Review, Ashraf and Galor, 2013).  Ashraf and Galor (A and G) write:
This research advances and empirically establishes the hypothesis that, in the course of the prehistoric exodus of Homo sapiens out of Africa, variation in migratory distance to various settlements across the globe affected genetic diversity and has had a persistent hump-shaped effect on comparative economic development, reflecting the trade-off between the beneficial and the detrimental effects of diversity on productivity. While the low diversity of Native American populations and the high diversity of African populations have been detrimental for the development of these regions, the intermediate levels of diversity associated with European and Asian populations have been conducive for development.
And, this was all determined at "the dawn of humankind."  Naturally, and conveniently, a hump-shaped pattern rather than a simple linear one was needed if one had to similarly denigrate Native Americans and Africans.  None of that sort of argument for inherency is qualitatively new but the attempt to make it genetic and hence inherently true had a juicy appeal.  Rosenberg and Kang (R and K), however, apply the same methods to an even larger data set and find no association with economic success.

R and K make it clear that, in their attempt to replicate A and G's study, they are considering within-population diversity, not between-population diversity.  This is important, because internal diversity is calculated from the population itself, not from a larger collection of populations, which raises various issues of sample selection, sample size, and the like. Within a population, when one can assume approximately random mating, one can estimate heterozygosity in ways that are far less clear when analyzing multiple populations at one go.  So, R and K are calculating expected heterozygosity, "the probability that two draws from a population at a specific site in the genome will produce different genetic types."
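The quoted definition translates directly into a short calculation: if the alleles at a site have frequencies p_1, p_2, ..., the chance that two independent draws differ is one minus the chance that they match, 1 - sum(p_i^2). Here is a minimal sketch of that basic population form; the function name is an illustrative assumption, and estimators used on real samples typically add a small-sample correction that is omitted here.

```python
def expected_heterozygosity(allele_freqs):
    """Probability that two random draws at a site carry different alleles.

    allele_freqs: frequencies of the alleles at one site; they must sum to 1.
    """
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    # Two draws match with probability sum(p_i^2); they differ otherwise.
    return 1.0 - sum(p * p for p in allele_freqs)

if __name__ == "__main__":
    print(expected_heterozygosity([1.0]))         # a single allele
    print(expected_heterozygosity([0.5, 0.5]))    # two equally common alleles
    print(expected_heterozygosity([0.25] * 4))    # four equally common alleles
```

More alleles, and more even frequencies among them, both push the value up, which is why it serves as a summary of within-population diversity.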

Expected heterozygosity follows a consistent geographic pattern,  
...occurring as a function of increasing distance from East Africa, measured over land-based routes. The highest heterozygosities appear in populations from Africa, followed by populations from the Middle East, Europe, and Central and South Asia. Populations of East Asia have still lower heterozygosities, and Pacific Islander and Native American populations, at the greatest geographic distance from Africa over migration paths traversed in human evolution, are the least heterozygous. The linear decrease in heterozygosity with increasing distance from Africa is a strong and replicable relationship, achieving correlation coefficients near -0.9 in a variety of studies of different genetic markers and sets of populations.
The explanation for the decreasing diversity out of Africa is that each new founding population is a subset of the original group, and thus carries with it less genetic diversity than the non-migrants.
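The subset-of-a-subset explanation can be sketched as a toy simulation. This is a deliberately stripped-down illustration, not R and K's model: it assumes one locus, a source population with eight equally common alleles, and a chain of founder events that each sample 20 gene copies from the previous population, with no growth, mutation, or admixture in between. All names and numbers are illustrative assumptions.

```python
import random
from collections import Counter

def heterozygosity(freqs):
    """1 - sum(p_i^2) for a dict mapping allele -> frequency."""
    return 1.0 - sum(p * p for p in freqs.values())

def founder_event(freqs, n_founders, rng):
    """Draw n_founders gene copies at random; return the new frequencies."""
    alleles = list(freqs)
    weights = [freqs[a] for a in alleles]
    sample = rng.choices(alleles, weights=weights, k=n_founders)
    counts = Counter(sample)  # alleles that were not drawn are lost
    return {allele: c / n_founders for allele, c in counts.items()}

def serial_founders(n_alleles=8, n_founders=20, n_events=10, seed=0):
    """Heterozygosity in the source and after each successive founder event."""
    rng = random.Random(seed)
    freqs = {a: 1.0 / n_alleles for a in range(n_alleles)}
    path = [heterozygosity(freqs)]
    for _ in range(n_events):
        freqs = founder_event(freqs, n_founders, rng)
        path.append(heterozygosity(freqs))
    return path

if __name__ == "__main__":
    for step, h in enumerate(serial_founders()):
        print(f"after {step} founder events: H = {h:.3f}")
```

Any single run is noisy, and heterozygosity can tick upward between neighboring steps by chance, but an allele lost along the way cannot come back in this toy model, so the trend along the chain is downward.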



Figure: The serial founder model in human evolution. (A) A schematic of the model. Each color represents a distinct allele. Migration events outward from Africa tend to carry with them only a subset of the genetic diversity from the source population, and some alleles are lost during migration events.  (B) An example of the model at a particular genetic locus, TGA012. Each set of vertical bars depicts the allele frequencies in a population, with different colors representing distinct alleles. Within continental regions, populations are plotted from left to right in decreasing order of expected heterozygosity at the locus [equation (3)]. This figure illustrates the loss of alleles across geographic regions; Native Americans all possess the same allele. The allele frequencies are taken from Rosenberg et al. (2005).  Source: Rosenberg and Kang, 2015

Other factors influence diversity as well, such as admixture between different groups, but distance from the original source is replicably the primary determining factor.  There are of course geographic irregularities, such as bodies of water or mountain ranges, but the general pattern is clear, consistent with archeology, linguistic patterns, and so on.

Tests of the interaction between genetic diversity and social factors
Forensics
Genetic diversity is used in forensics to identify a suspect with high probability if the DNA from the crime scene is a perfect match to an individual in the database.  If an exact match isn't found, the DNA profile may be used to identify relatives, which can be done because they will differ by theoretically predictable amounts.  The underlying genetic heterozygosity in a population, however, determines the likelihood that a partial match to a sample is from a genetic relative.  In a low diversity population, risk of a false positive is higher than in a high diversity population, because in the former a higher fraction of individuals will share each allele, which will mean it is less informative.
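The link between diversity and chance matches can be shown with a single-locus calculation. The sketch below assumes Hardy-Weinberg genotype proportions and two unrelated individuals, and it computes the probability that they share the same genotype at one locus. It is a simplification for illustration, not the relative-search calculation used in practice, and the function name and example frequencies are assumptions.

```python
from itertools import combinations

def genotype_match_probability(allele_freqs):
    """Chance that two unrelated people share a genotype at one locus.

    Assumes Hardy-Weinberg proportions: homozygote frequency p^2 and
    heterozygote frequency 2pq. Two people match when both carry the same
    genotype, so the squared genotype frequencies are summed.
    """
    homozygous = sum((p * p) ** 2 for p in allele_freqs)
    heterozygous = sum((2 * p * q) ** 2 for p, q in combinations(allele_freqs, 2))
    return homozygous + heterozygous

if __name__ == "__main__":
    low_diversity = [0.5, 0.5]     # two equally common alleles
    high_diversity = [0.25] * 4    # four equally common alleles
    print(genotype_match_probability(low_diversity))
    print(genotype_match_probability(high_diversity))
```

With fewer or more uneven alleles, coincidental matches become more likely, so each matching locus carries less information about identity or relatedness.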

The different levels of genetic diversity in different populations mean that the usefulness of DNA for identification purposes varies between populations.  And populations are unequally represented in forensic databases.  That is a social issue, not a biological one, and doesn't obviate the relationship between genetic diversity and identification of social relationships.

Transplants
Genetic diversity is important in determining matches for the purpose of organ transplantation, particularly bone marrow.  Here, higher diversity populations will have lower match probabilities -- that is, it's most difficult to find a match when diversity in the population is highest, and the difficulty decreases with decreasing diversity.  These are rather clear issues.

The difficulty is greater when populations are less likely to be well represented in match databases, which is, again, a social issue.
...the chance that no donor match is found is greatest for African Americans, followed by the Asian-American, Hispanic, Native American, and white groups. As in the forensic case, the population genetics of genetic diversity, together with societal factors that vary across populations, contributes to the quantity of ultimate interest. Both genetic diversity and its interaction with factors that affect participation in transplantation are important in increasing the probability that any given recipient can find a successful match.
GWAS
Genome wide association studies searching for alleles associated with disease rely on the relative proximity of SNPs, or DNA markers, to disease alleles.  In populations with high genetic diversity, such as African populations or African Americans, the longer history of genomic recombination events that scramble nearby nucleotide variants over the generations results in lower linkage disequilibrium (LD), so that the proximity of markers to causal alleles can't be relied upon with the same likelihoods as in more recent populations.  One needs more marker test sites to find the LD one needs to make associations with traits, for example.  R and K report that it has been estimated that 96% of subjects in GWAS are of European ancestry. The social implications of this are that disease alleles are even less likely to be identified in high diversity populations than in others.  The vast majority of GWAS and similar findings can be extrapolated only with great and unknown uncertainty at present (though many still attempt it, in what can be called expeditions of wishful thinking).
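The effect of a longer history of recombination can be seen in the textbook two-locus result that, in a large randomly mating population, the association D between two sites shrinks by a factor (1 - c) each generation, where c is the recombination rate between them. The sketch below assumes that idealized model (no drift, admixture, or founder events creating new LD); the function name and the example values are illustrative assumptions.

```python
def ld_after(generations, recomb_rate, d0=1.0):
    """Linkage disequilibrium remaining after some generations of recombination.

    Uses D_t = D_0 * (1 - c)^t, where c is the recombination rate between
    the two sites and D_0 is the starting association (scaled to 1 here).
    """
    return d0 * (1.0 - recomb_rate) ** generations

if __name__ == "__main__":
    c = 0.001  # hypothetical recombination rate between a marker and a causal site
    for t in (0, 100, 1000, 5000):
        print(f"{t} generations: {ld_after(t, c):.3f} of the initial LD remains")
```

The older the shared history of a marker and a causal site, the less a marker reveals about its neighbor, which is why more densely spaced markers are needed in populations with a longer recombination history.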

So, these are three examples of situations in which differences in genetic diversity between populations, interacting with social diversity, can have important social implications -- false positives in forensics, low probabilities of transplant matches, and low likelihood of inclusion in genetic research.
Each of these settings involves a problem that is fundamentally biological—DNA-based identification, transplantation, and genetics of disease. In each setting, principles from population-genetic theory in which aspects of genetic diversity feature prominently underlie the contribution of genetic diversity: theories of forensic and transplantation matching explicitly produce an inverse relationship between match probabilities and genetic diversity, and GWA statistics rely on models of the decay of genetic diversity and production of LD during migrations.  
Back to economics
R and K then return to the societal economics question, to re-examine whether population-level biological determinants are relevant to economic development, asking whether population genetic diversity is as useful when applied to a discipline in which population genetics theory is not relevant. Among other things, there are dangers of being statistically misled by phenomena such as Simpson's paradox and the ecological fallacy.

A and G used a small amount of genetic data to calculate genetic heterozygosity for a small number of populations, and imputed heterozygosity for many more based on geographic distance from Africa. Imputation generally takes sites found in one study that didn't look for variation between them, and assumes the states of those internal sites based on studies of other populations where they were typed.  This is a common, if iffy, practice in GWAS, but it at least works reasonably well when the samples are from the same geographic area, such as Europe. It is sometimes needed because different GWA studies of a given trait use different marker sites (because they use different genotyping platforms).

R and K recalculated the results by using actual genetic data for more populations, but retaining the same analytic methods used in the original study.  So, rather than actual data for 53 populations in 21 countries, R and K used genetic data from 237 populations in 39 countries.  And they found no effect of genetic diversity on economic success.

Further, they chose multiple different samples of 21 countries, and found a significant effect in at most 27% of them.  Thus, about three quarters of the time, had A and G chosen a different sample subset, they would have found no effect.  And, conclude R and K, even granting the assumption that studying population genetic diversity and its effect on economic development is valid, the effect didn't persist for an expanded set of populations and countries.  While genetic diversity affects differences between populations in a variety of other ways, when the effect is biological and population genetics theory applies, economic success is not one of them.  "[P]rinciples of population genetics produce no theory of the economic development of nations..."
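The logic of the subsampling check can be sketched in a few lines. To be clear, this is not R and K's analysis: it uses made-up, purely synthetic numbers for 39 hypothetical countries, with no built-in relationship between the two variables, and a simple correlation test in place of the regression models used in the actual studies. It only illustrates the procedure of drawing many 21-country subsets and counting how often a 'significant' result turns up.

```python
import random
from math import sqrt

SUBSET_SIZE = 21
T_CRIT_DF19 = 2.093  # two-sided 5% critical value of Student's t, 19 d.f.

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def make_synthetic(n_countries=39, seed=42):
    """Synthetic, unrelated 'diversity' and 'development' values (made up)."""
    rng = random.Random(seed)
    diversity = [rng.uniform(0.55, 0.77) for _ in range(n_countries)]
    development = [rng.gauss(0.0, 1.0) for _ in range(n_countries)]
    return diversity, development

def fraction_significant(xs, ys, draws=2000, seed=0):
    """Share of random 21-item subsets whose correlation passes the 5% test."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        chosen = rng.sample(range(len(xs)), SUBSET_SIZE)
        r = pearson_r([xs[i] for i in chosen], [ys[i] for i in chosen])
        t = abs(r) * sqrt((SUBSET_SIZE - 2) / (1.0 - r * r))
        if t > T_CRIT_DF19:
            hits += 1
    return hits / draws

if __name__ == "__main__":
    xs, ys = make_synthetic()
    print("fraction of subsets 'significant':", fraction_significant(xs, ys))
```

A finding that appears in only a minority of equally defensible subsets depends on which countries happened to be chosen, which is the worry R and K raise about the original result.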

It is of course plausible that overall variation patterns include variation that leads one population, overall, to have more, or less, of some societal attribute.  One can always construct post hoc stories that fit social prejudices, for example.  But plausibility is not the same as truth, and one can -- and should -- ask why the investigators are making their societal assertions in the first place.  Generally, we know the answer, and it isn't very savory.

Rare Disease Day and the promises of personalized medicine

Our daughter Ellen wrote the post that I republish below 3 years ago, and we've reposted it in commemoration of Rare Disease Day, Febru...