
Understanding Obesity? Fat Chance!

Obesity is one of our more widespread and serious health-threatening traits.  Many large-scale mapping studies of obesity, as well as extensive environmental/behavioral epidemiological studies, have been done over recent decades.  But if anything, the obesity epidemic seems to be getting worse.

There's deep meaning in that last sentence: the prevalence of obesity is changing rapidly.  This is being documented globally, and it is happening before our eyes.  Perhaps the most obvious implication is that this serious problem is not due to genetics!  That is, it is not due to genotypes that in themselves make you obese.  Although everyone's genotype is different, the changes are happening during lifetimes, so we can't attribute them to the different details of each generation's genotypes or to their evolution over time.  Instead, the trend is clearly due to lifestyle changes during lifetimes.

Of course, if you see everything through gene-colored lenses, you might argue (as people have) that sure, it's lifestyles, but only some key nutrient-responding genes are responsible for the surge in obesity.  These are the 'druggable' targets that we ought to be finding, and it should be rather easy since the change is so rapid that the genes must be few, so that even if we can't rein in McD and KFC toxicity, or passive TV-addiction, we can at least medicate the result.  That was always, at best, wishful thinking, and at worst, rationalization for funding Big Data studies.  Such a simple explanation would be good for KFC, and an income flood for BigPharma, the GWAS industry, DNA sequencer makers, and more.....except not so good for  those paying the medical price, and those who are trying to think about the problem in a disinterested scientific way.  Unfortunately, even when it is entirely sincere, that convenient hope for a simple genetic cause is being shown to be false.

A serious parody?
Year by year, more factors are identified that, by statistical association at least and sometimes by experimental testing, contribute to obesity.  A very fine review of this subject appeared in the mid-October 2017 Nature Reviews Genetics, by Ghosh and Bouchard.  It takes seriously not just genetics but all the plausible causes of obesity, including behavior and environment, and their relationships as best we know them, and it outlines the current state of knowledge.

Ghosh and Bouchard provide a well-caveated assessment of these various threads of evidence now in hand, and though they do end up with the pro forma plea for yet more funding to identify yet more details, they provide a clear picture that a serious reader can take seriously on its own merits.  However, we think that the proper message is not the usual one.  It is that we need to rethink what we've been investing so heavily on.

To their great credit, the authors melded behavioral, environmental, and genetic causation in their analysis. This is shown in this figure, from their summary; it is probably the best current causal map of obesity based on the studies the authors included in their analysis:



If this diagram were being discussed by John Cleese on Monty Python, we'd roar with laughter at what was an obvious parody of science.  But nobody's laughing, and this isn't a parody!  Nor is it of unusual shape or complexity: diagrams like this (but with little if any environmental component) have been produced by analyzing gene expression patterns even just of the early development of the simple sea urchin.  Not laughing is understandable, because these are serious diagrams.  What is strange is that we don't seem to be reacting other than by saying we need more of the same.  I think that is rather weird for scientists, whose job it is to understand, not just list, the nature of Nature.

We said at the outset of this post that 'the obesity epidemic seems to be getting worse'.  There's a deep message there, but one essentially missing even from this careful obesity paper: it is that many of the causal factors, including genetic variants, are changing before our eyes.  The frequency of genetic variants changes from population to population and generation to generation, so that all samples will look different.  And mutations happen in every meiosis, adding new variants to a population every time a baby is born.  The results of many studies, as reflected in the current summary by Ghosh and Bouchard, show that while many gene regions contribute to obesity, their total net contribution is still minor.  It is possible, though perhaps very difficult to demonstrate, that an individual site might account for more than a minimal effect in some individual carriers, in ways GWAS results can't really identify.  And the authors do cite published opinions claiming a higher efficacy of GWAS for obesity than we think is seriously defensible; but even if we're wrong, causation is very complex, as the figure shows.

The individual genomic variants will vary in their presence or absence or frequency or average effect among studies, not to mention populations.  In addition, most contributing genetic variants are too rare or weak to be detected by the methods used in mapping studies, because of the constraints on statistical significance criteria, which is why so much of the trait's heritability in GWAS is typically unaccounted for by mapping.  These aspects and their details will differ greatly among samples and studies.
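
To make the significance-threshold point concrete, here is a minimal simulation of our own (all numbers are hypothetical choices, not taken from any particular study): a rare allele with a weak effect almost never clears the conventional genome-wide threshold of p < 5e-8, even in a very large case-control sample, although it often clears the ordinary 0.05 level.  Many such variants together can carry a great deal of heritability while remaining individually invisible to mapping.

    # A toy power calculation with assumed numbers: 1% allele frequency,
    # odds ratio 1.1, 50,000 cases and 50,000 controls.  Requires numpy and scipy.
    import numpy as np
    from scipy.stats import chi2_contingency

    rng = np.random.default_rng(1)

    n_cases, n_controls = 50_000, 50_000   # a large GWAS by current standards
    p_controls = 0.01                       # rare risk allele in controls
    odds_ratio = 1.1                        # weak per-allele effect
    odds = odds_ratio * p_controls / (1 - p_controls)
    p_cases = odds / (1 + odds)             # implied allele frequency in cases

    hits_gw = hits_nominal = 0
    n_reps = 2000
    for _ in range(n_reps):
        a = rng.binomial(2 * n_cases, p_cases)         # risk-allele count in cases
        b = rng.binomial(2 * n_controls, p_controls)   # risk-allele count in controls
        table = [[a, 2 * n_cases - a], [b, 2 * n_controls - b]]
        chi2, p, dof, expected = chi2_contingency(table, correction=False)
        hits_gw += p < 5e-8        # genome-wide significance threshold
        hits_nominal += p < 0.05   # ordinary nominal threshold

    print("power at p < 5e-8:", hits_gw / n_reps)      # essentially zero
    print("power at p < 0.05:", hits_nominal / n_reps) # roughly half
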

Relevant risk factors will come or go or change in exposure levels in the future--but these cannot be predicted, not even in principle.  Their interactions and contributions are also manifestly context-specific, as secular trends clearly show.  Even with the set of known genetic variants and other contributing factors, there are essentially an unmanageable number of possible combinations, so that each person is genetically and environmentally unique, and the complex combinations of future individuals are not predictable.

Risk assessment is essentially based on replicability, which in a sense is why statistical testing (on which these sorts of results heavily rely) can be used.  However, because these risk-factor combinations are each unique, they're not replicable.  At best, as some advocate, the individual effects are additive, so that we can just measure each factor in a given individual, add up each factor's effect, and predict that person's obesity (if the effects are not additive, this won't work).  We can probably predict, if perhaps not control, at least some of the major risk factors (people will still down pizzas or fried chicken while sitting in front of a TV).  But even the known genetic factors in total account for only a small percentage of the trait's variance (the authors' Table 2), though the paper cites more optimistic authors.
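
As an illustration of that additive logic, here is a small sketch of our own (purely hypothetical numbers, not drawn from the review or its Table 2): build an additive score from a thousand small genetic effects, let that score explain about 20% of the variation in a simulated trait, and see how well it identifies the people who actually end up at the high end.

    # A minimal additive-score sketch; the site count, allele frequencies, effect
    # sizes, and the 20% variance-explained figure are all assumptions for illustration.
    import numpy as np

    rng = np.random.default_rng(0)
    n_people, n_sites = 10_000, 1_000

    freqs = rng.uniform(0.05, 0.5, n_sites)                  # allele frequencies
    genotypes = rng.binomial(2, freqs, (n_people, n_sites))  # 0/1/2 allele counts
    effects = rng.normal(0.0, 1.0, n_sites)                  # per-allele effect sizes

    score = genotypes @ effects                               # the additive prediction
    score = (score - score.mean()) / score.std()

    var_explained = 0.20                                      # what the score captures
    rest = rng.normal(0.0, np.sqrt(1 - var_explained), n_people)  # other genes, environment
    trait = np.sqrt(var_explained) * score + rest

    print("correlation of score with trait:",
          round(float(np.corrcoef(score, trait)[0, 1]), 2))
    top_scores = score > np.quantile(score, 0.9)              # highest 10% of scores
    top_traits = trait > np.quantile(trait, 0.9)              # highest 10% of trait values
    print("top-decile scorers also in the top trait decile:",
          round(float(top_traits[top_scores].mean()), 2))     # well under half

In this toy setting, even with purely additive effects and no measurement error, most of the people flagged by the score do not end up at the high end of the trait, and most of those at the high end were never flagged.
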

The result of these indisputable facts is that as long as our eyes are focused, for research strategic reasons or lack of better ideas, on the litter of countless minor factors, even those we can identify, we have a fat chance of really addressing the problem this way.

If you pick any of the arrows (links) in this diagram, you can ask how strong or necessary that link is, how much it may vary among samples or depend on the European nature of the data used here, or to what extent even its identification could be a sampling or statistical artifact.  Links like 'smoking' or 'medication', not to mention specific genes, even if they're wholly correct, surely have quantitative effects that vary among people even within the sample, and the effect sizes probably often have very large variance.  Many exposures are notoriously inaccurately reported or measured, or change in unmeasured ways.  Some are quite vague, like 'lifestyle', 'eating behavior', and many others--both hard to define and hard to assess with knowable precision, much less predictability.  Whether their many and various effects are additive or interact in more complex ways is another issue, and the connectivity diagram may be tentative in many places.  Maybe--probably?--for such traits, simple behavioral changes would override most of these behavioral factors, leaving only those persons for whom obesity really is due to their genotype, which would then be amenable to gene-focused approaches.

If this is a friable diagram, that is, if the items, strengths, connections and so on are highly changeable, even if through no fault of the authors whatever, we can ask when and where and how this complex map is actually useful, no matter how carefully it was assembled.  Indeed, even if this is a rigidly accurate diagram for the samples used, how applicable is it to other samples or to the future?  Or how useful is it in predicting not just group patterns, but individual risk?

Our personal view is that the rather ritual plea for more and more and bigger and bigger statistical association studies is misplaced, and, in truth, a way of maintaining funding and the status quo, something we've written much about--the sociopolitical economics of science today.  With obesity rising at a continuing rate and about a third of the US population recently reported as obese, we know that the future health care costs for the consequences will dwarf even the mega-scale genome mapping on which so much is currently being spent, if not largely wasted.  We know how to prevent much or most obesity in behavioral terms, and we think it is entirely fair to ask why we still pour resources into genetic mapping of this particular problem.

There are many papers on other complex traits that might seem to be simple, like stature and blood pressure, not to mention more mysterious ones like schizophrenia or intelligence, in which hundreds of genomewide sites are implicated, strewn across the genome.  Different studies find different sites, and in most cases most of the heritability is not accounted for, meaning that many more sites are at work (and this doesn't include environmental effects).  In many instances, even the trait's definition may be comparably vague, or may change over time.  This is a landscape 'shape' in which every detail differs within and between traits, yet the overall form is common to complex traits.  That in itself is a tipoff that there is something consistent about these landscapes, but we've not yet really awakened to it or learned how to approach it.

Rather than being skeptical about Ghosh and Bouchard's careful analysis or their underlying findings, I think we should accept their general nature, even if the details in any given study or analysis may not individually be so rigid and replicable, and ask: OK, this is the landscape--what do we do now?

Is there a different way to think about biological causation?  If not, what is the use or point of this kind of complexity enumeration, in which every person is different and the risks for the future may not be those estimated from the past data used to produce figures like the one above?  The rapid change in prevalence shows how unreliable these factors must be as predictors--they are retrospective, reflecting the particular patterns of the study subjects.  Since we cannot predict the strengths or even the presence of these or other new factors, what should we do?  How can we rethink the problem?

These are the harder questions, much harder than analyzing the data; but they are, in our view, the real scientific questions that need to be asked.

The GWAS hoax....or was it a hoax? Is it a hoax?

A long time ago, in 2000, in Nature Genetics, Joe Terwilliger and I critiqued the idea then being pushed by the powers-that-be, that the genomewide mapping of complex diseases was going to be straightforward, because of the 'theory' (that is, rationale) then being proposed that common variants caused common disease.  At one point, the idea was that only about 50,000 markers would be needed to map any such trait in any global population.  My collaborators and I can claim to have pointed out numerous reasons, based on what we know about evolution, why this was going to be a largely empty promise--in several papers in prominent journals, in a 1992 Cambridge University Press book, Genetic Variation and Human Disease, and many times on this blog.  It has been inconvenient for this message to be heard, much less heeded, for reasons we've also discussed in many blog posts.

Before we get into that, it's important to note that, unlike me, Joe has moved on to other things, like helping Dennis Rodman's diplomatic efforts in North Korea (here, Joe's shaking hands as he arrives on his most recent trip).  Well, I'm more boring by far, so I guess I'll carry on with my message for today.....




There's now a new paper, coining a new catch-word (omnigenic), to proclaim the major finding that complex traits are genetically complex.  The paper seems solid and clearly worthy of note.  The authors examine the chromosomal distribution of sites that seem to affect a trait, in various ways including chromosomal conformation.  They argue, convincingly, that mapping shows that complex traits are affected by sites strewn across the genome, and they provide a discussion of the pattern and findings.

The authors claim an 'expanded' view of complex traits, and as far as that goes it is justified in detail. What they are adding to the current picture is the idea that mapped traits are affected by 'core' genes but that other regions spread across the genome also contribute. In my view the idea of core genes is largely either obvious (as a toy example, the levels of insulin will relate to the insulin gene) or the concept will be shown to be unclear.  I say this because one can probably always retroactively identify mapped locations and proclaim 'core' elements, but why should any genome region that affects a trait be considered 'non-core'?

In any case, that would be just a semantic point if it were not predictably the phrase that launched a thousand grant applications.  I think neither the basic claim of conceptual novelty nor the breathless, exploitive treatment of it by the news media is warranted: we've known these basic facts about genomic complexity for a long time, even if the new analysis provides other ways to find or characterize the multiplicity of contributing genome regions.  This assumes that mapping markers are close enough to functionally relevant sites for the latter to be found, that the unmappable fraction of the heritability isn't leading to over-interpretation of what is 'mapped' (i.e., what reached significance), and that what isn't mapped won't change the picture.

However, I think the first thing we really need to do is understand the futility of thinking of complex traits as genetic in the 'precision genomic medicine' sense, and the last thing we need is yet another slogan by which hands can remain clasped around billions of dollars for Big Data resting on false promises.  Yet even the new paper itself ends with the ritual ploy, the assertion of the essential need for more information--this time, on gene regulatory networks.  I think it's already safe to assure any reader that these, too, will prove to be as obvious and as elusively ephemeral as genome wide association studies (GWAS) have been.

So was GWAS a hoax on the public?
No!  We've had a theory of complex (quantitative) traits since the early 1900s.  Other authors argued similarly, but RA Fisher's famous 1918 paper is the usual landmark.  His theory was, simply put, that infinitely many genome sites contribute to quantitative (what we now call polygenic) traits.  The general model has jibed with the age-old experience of breeders, who have used empirical strategies to improve crop and pet species.  Since association mapping (GWAS) became practicable, they have used mapping-related genotypes to help select animals for breeding; but genomic causation is so complex and changeable that they've recognized even this will have to be regularly updated.

But when genomewide mapping of complex traits was first really done (a prime example being BRCA genes and breast cancer), it seemed that apparently complex traits might, after all, have mappable genetic causes.  BRCA1 was found by linkage mapping in multiply affected families (an important point!), in which a strong-effect allele was segregating.  Association mapping was a tool of convenience: it used random samples (like cases vs controls) because one could hardly get sufficient multiply affected families for every trait one wanted to study.  GWAS rested on the assumption that genetic variants were identical by descent from common ancestral mutations, so that a current-day sample captured the latest descendants of an implied deep family: quite a conceptual coup, based on the ability to identify association marker alleles across the genome that are identical by descent from the unstudied shared remote ancestors.

Until it was tried, we really didn't know how tractable such mapping of complex traits might be.  Perhaps heritability estimates based on quantitative statistical models were hiding what could really be enumerable, replicable causes, in which case mapping could lead us to functionally relevant genes.  It was certainly worth a try!

But it was quickly clear that this was in important ways a fool's errand.  Yes, some good things were to be found here and there, but the hoped-for miracle findings generally weren't there to be found. This, however, was a success not a failure!  It showed us what the genomic causal landscape looked like, in real data rather than just Fisher's theoretical imagination.  It was real science.  It was in the public interest.

But that was then.  It taught us its lessons, in clear terms (of which the new paper provides some detailed aspects).  But it long ago reached the point of diminishing returns.  In that sense, it's time to move on.

So, then, is GWAS a hoax?
Here, the answer must now be 'yes'!  Once the lesson is learned, bluntly speaking, continuing on is more a matter of keeping the funds flowing than profound new insights.  Anyone paying attention should by now know very well what the GWAS etc. lessons have been: complex traits are not genetic in the usual sense of being due to tractable, replicable genetic causation.  Omnigenic traits, the new catchword, will prove the same.

There may not literally be infinitely many contributing sites, as in the original statistical models, be they core or peripheral, but infinitely many isn't so far off.  Hundreds or thousands of sites accounting for only a fraction of the heritability means essentially infinitely many contributors, for any practical purpose.  This is particularly so since the set is not a closed one: new mutations are always arising and current variants dying away, and, along with somatic mutation, the number of contributing sites is open ended, and not enumerable within or among samples.

The problem is actually worse.  All these data are retrospective statistical fits to samples of past outcomes (e.g., sampled individuals' blood pressures, or cases' vs controls' genotypes).  Past experience is not an automatic prediction of future risk.  Future mutations are not predictable, not even in principle.  Future environments and lifestyles, including major climatic dislocations, wars, epidemics and the like, are not predictable, not even in principle.  Future somatic mutations are not predictable, not even in principle.

GWAS almost uniformly have found (1) different mapping results in different samples or populations, (2) only a fraction of heritability is accounted for by tens, hundreds, or even thousands of genome locations and (3) even relatively replicable 'major' contributors, themselves usually (though not always) small in their absolute effect, have widely varying risk effects among samples.

These facts are all entirely expectable based on evolutionary considerations, and they have long been known, in principle, indirectly, and from detailed mapping of complex traits.  There are other well-known reasons why this kind of picture should be expected.  They involve the blatantly obvious redundancy in genetic causation, which results from the origin of genes by duplication and from the highly complex pathways to our traits, among other things.  We've written about them here in the past.  So, given what we now know, more of this kind of Big Data is a hoax, and as such, a drain on public resources and, perhaps worse, on the public trust in science.

What 'omnigenic' might really mean is interesting.  It could mean that we're pressing up ever more intensely against the log-jam of understanding based on an enumerative gestalt about genetics.  Ever more detail, always promising that if we just enumerate and catalog a bit more (in this case, the authors say we need to study gene regulatory networks) we'll understand.  But that is a failure to ask the right question: why and how could every trait be affected by every part of the genome?  Until someone starts looking at the deeper mysteries we've been identifying, we won't have the transformative insight that seems to be called for, in my view.

To use Kuhn's term, this really is normal science pressing up against a conceptual barrier, in my view.  The authors work the details, but there's scant hint that they recognize we need something more than more of the same.  What is called for, I think, is young people who haven't already been propagandized about the current way of thinking, the current grantsmanship path to careers.

Perhaps more importantly, I think the situation is at present an especially cruel hoax, because there are real health problems, and real, tragic, truly genetic diseases that a major shift in public funding could enable real science to address.

Some genetic non-sense about nonsense genes

The April 12 issue of Nature has a research report, and an accompanying editorial, about what is basically presented as the discovery that people typically carry doubly knocked-out genes but show no effect.  The editorial (p 171) notes that the report (p 235) uses an inbred population to identify double-knockout genes (that is, recessive homozygous null mutations) and look at their effects.  The population sampled, from Pakistan, has high levels of consanguineous marriage.  The criteria for calling a mutation a knockout were based on the protein-coding sequence.

We have no reason to question the technical accuracy of the papers, nor their relevance to biomedical and other genetics, but there are reasons to assert that this is nothing newly discovered, and that the story misses the really central point that should, I think, be undermining the expensive Big Data/GWAS approach to biological causation.

First, for some years now there have been reports of samples of individual humans (perhaps also of yeast, but I can't recall specifically) in which both copies of a gene appear to be inactivated.  The criteria for saying so are generally indirect, based on nonsense, frameshift, or splice-site mutations in the protein code.  That is, there are other aspects of coding regions that may be relevant to whether this is a truly thorough search to show that whatever is coded really is non-functional.  The authors mention some of these.  But, basically, costly as it is, this is science on the cheap, because it clearly only addresses some aspects of gene functionality.  It would obviously be almost impossible to show either that the gene was never expressed or that it never worked.  For our purposes here, we need not question the finding itself.  The fact that this is not a first discovery does raise the question of why a journal like Nature is so desperate for Dramatic Finding stories, since this one really should instead be a report in one of the many specialty human genetics journals.

Secondly, there are causes of gene inactivation other than coding mutations.  They have to do with regulatory sequences, and inactivating mutations in that part of a gene's functional structure are much more difficult, if not impossible, to detect with any completeness.  A gene's coding sequence itself may seem fine, but its regulatory sequences may simply not enable it to be expressed.  Gene regulation depends on epigenetic DNA modification, on multiple transcription factor binding sites, on the functional aspects of the many proteins required to activate a gene, and on other aspects of the local DNA environment (such as RNA editing or RNA interference).  The point here is that there are likely to be many other instances of people with complete or effectively complete double knockouts of genes.

Thirdly, the assertion that these double KOs have no effect depends on various assumptions.  Mainly, it assumes that the sampled individuals will not, in the future, experience the otherwise-expected phenotypic effects of their defunct genes.  Effects may depend on age, sex, and environment rather than necessarily being a congenital yes/no matter.

Fourthly, there may be many coding mutations that make the protein non-functional but are ignored by this sort of study because they aren't clear knockout mutations, yet they are in whatever data are used for comparison of phenotypic outcomes.  There are post-translational modifications, RNA editing, RNA modification, and other aspects of a 'gene' that this approach is not picking up.

Fifthly, and by far most important, I think, is that this is the tip of the iceberg of redundancy in genetic functions.  In that sense, the current paper is a kind of factoid that reflects what GWAS has been showing in great, if implicit, detail for a long time: there is great complexity and redundancy in biological functions.  Individual mapped genes typically affect trait values or disease risks only slightly.  Different combinations of variants at tens, hundreds, or even thousands of genome sites can yield essentially the same phenotype (and here we ignore the environment which makes things even more causally blurred).

Sixthly, other samples and certainly other populations, as well as individuals within the Pakistani data base, surely carry various aspects of redundant pathways, from plenty of them to none.  Indeed, the inbreeding that was used in this study obviously affects the rest of the genome, and there's no particular way to know in what way, or more importantly, in which individuals.  The authors found a number of basically trivial or no-effect results as it is, even after their hunt across the genome. Whether some individuals had an attributable effect of a particular double knockout is problematic at best.  Every sample, even of the same population, and certainly of other populations, will have different background genotypes (homozygous or not), so this is largely a fishing expedition in a particular pond that cannot seriously be extrapolated to other samples.

Finally, this study cannot address the effect of somatic mutation on phenotypes and their risk of occurrence.  Who knows how many local tissues have experienced double-knockout mutations and produced (or not produced) some disease or other phenotype outcome.  Constitutive genome sequencing cannot detect this.  Surely we should know this very inconvenient fact by now!

Given the well-documented and pervasive biological redundancy, it is not any sort of surprise that some genes can be non-functional and the individual phenotypically within a viable, normal range. Not only is this not a surprise, especially by now in the history of genetics, but its most important implication is that our Big Data genetic reductionistic experiment has been very successful!  It has, or should have, shown us that we are not going to be getting our money's worth from that approach.  It will yield some predictions in the sense of retrospective data fitting to case-control or other GWAS-like samples, and it will be trumpeted as a Big Success, but such findings, even if wholly correct, cannot yield reliable true predictions of future risk.

Does environment, by any chance, affect the studied traits?  We have, in principle, no way to know what environmental exposures (or somatic mutations) will be like.  The by now very well documented leaf-litter of rare and/or small-effect variants plagues GWAS for practical statistical reasons (and is why usually only a fraction of heritability is accounted for).  Naturally, finding a single doubly inactivated gene may, but by no means need, yield reliable trait predictions.

By now, we know of many individual genes whose coded function is so proximate or central to some trait that mutations in such genes can have predictable effects.  This is the case with many of the classical 'Mendelian' disorders and traits that we've known for decades.  Molecular methods have admirably identified the gene and mutations in it whose effects are understandable in functional terms (for example, because the mutation destroys a key aspect of a coded protein's function).  Examples are Huntington's disease, PKU, cystic fibrosis, and many others.

However, these are at best the exceptions that lured us to think that even more complex, often late-onset traits would be mappable, so that we could parlay massive investment in computerized data sets into solid predictions and identify the 'druggable' genes that Big Pharma could target.  This was predictably an illusion, as some of us were saying long ago, and for the right reasons.  Everyone should know better now, and this paper just reinforces the point, to the extent that one can assert that it's the political-economic aspects of science funding, science careers, and hungry publications, and not the science itself, that lead to the persistence of drives to continue or expand the same methods anyway.  Naturally (or should one say reflexively?), the authors advocate a huge Human Knockout Project to study every gene--today's reflex Big Data proposal.**

Instead, it's clearly time to recognize the relative futility of this, and change gears to more focused problems that might actually punch their weight in real genetic solutions!

** [NOTE added in a revision.  We should have a wealth of data by now, from many different inbred mouse and other animal strains, and from specific knockout experiments in such animals, to know that the findings of the Pakistani family paper are to be expected.  About 1/4 to 1/3 of knockout experiments in mice have no effect or not the same effect as in humans, or have no or different effect in other inbred mouse strains.  How many times do we have to learn the same lesson?  Indeed, with existing genomewide sequence databases from many species, one can search for 2KO'ed genes.  We don't really need a new megaproject to have lots of comparable data.]

Genomic causation....or not

By Ken Weiss and Anne Buchanan

The Big Story in the latest Nature ("A radical revision of human genetics: Why many ‘deadly’ gene mutations are turning out to be harmless," by Erika Check Hayden) is that genes thought to be clearly causal of important diseases aren't always (the link is to the People-magazine-like cover article in that issue).  This is a follow-up on an August Nature paper describing the database from which the results discussed in this week's Nature are drawn.  The apparent mismatch between a gene variant and a trait can be, according to the paper, the result of technical error, of a mis-call by a given piece of software, or of the assumption that identifying a given mutation in affected but not healthy individuals means the causal mutation has been found, without experimentally confirming the finding--which itself can be tricky, for reasons we'll discuss.  Insufficient documentation of 'normal' sequence variation has meant that the frequency of so-called causal mutations hasn't been available for comparative purposes.  Again, we'll mention below what 'insufficient' might mean, if anything.

People in general and researchers in particular need to be more than dismissively aware of these issues, but the conclusion that we still need to focus on single genes as causal of most disease, that is, do MuchMoreOfTheSame, which is an implication of the discussion, is not so obviously justified.   We'll begin with our usual contrarian statement that the idea here is being overhyped as if it were new, but we know that except for its details it clearly is not, for reasons we'll also explain.  That is important because presenting it as a major finding, and still focusing on single genes as being truly causal vs mistakenly identified, ignores what we think the deeper message needs to be.

The data come from a mega-project known as ExAC, a consortium of researchers sharing DNA sequences to document genetic variation and further understand disease causation, now including data from approximately 60,000 individuals (in itself rather small relative to the stated purpose).  The data are primarily exome sequences, that is, from protein-coding regions of the human genome, not from whole-genome sequences--again a major issue.  We have no reason at all to critique the original paper itself, which is large, sophisticated, and carefully analyzed as far as we can tell; but the claims about its novelty are, we think, very much hyperbolized, and that needs to be explained.

Some of the obvious complicating issues
We know that a gene generally does not act alone.  DNA in itself is basically inert.  We've been and continue to be misled by examples of gene causation in which context and interactions don't really matter much, but that leads us still to cling to these as though they are the rule.  This reinforces the yearning for causal simplicity and tractability.  Essentially even this ExAC story, or its public announcements, doesn't properly acknowledge causal context and complexity because it is critiquing some simplistic single-gene inferences, and assuming that the problems are methodological rather than conceptual.

There are many aspects of causal context that complicate the picture, that are not new and we're not making them up, but which the Bigger-than-Ever Data pleas don't address:
1.  Current data are from blood samples, which may not reflect the true constitutive genome because of early somatic mutation, and this will vary among study subjects;
2.  Life-long exposure to local somatic mutation is neither considered nor measured;
3.  Epigenetic changes, especially local tissue-specific ones, are not included;
4.  Environmental factors are not considered, and indeed would be hard to consider;
5.  Non-Europeans, and even many Europeans, are barely included, if at all, though this is beginning to be addressed;
6.  Regulatory variation, which GWAS has convincingly shown is much more important to most traits than coding variation, is not included.  Exome data have been treated naively by many investigators as if they were what matters, and exome-only data have been used as a major excuse for Great Big Grants that can't find what we know is probably far more important;
7.  Non-coding regions and non-regulatory RNA regions are not included in exome-only data;
8.  A mutation may be causal in one context but not in others, in one family or population and not others, making it difficult to determine that it's a false discovery;
9.  Single-gene analysis is still the basis of the new 'revelations'; that is, the idea being hinted at is that the 'causal' gene isn't really causal, but one implicit notion is that it was misidentified, which is perhaps sometimes true but probably not always so;
10.  The new reports are presented in the news, at least, as if the gene is being exonerated of its putative ill effects.  But that may not be the case, because if the regulatory regions near the mutated gene have little or no activity, the 'bad' gene may simply not be expressed, and its coding sequence could falsely be assumed to be harmless;
11.  Many aspects of this kind of work depend on statistical assumptions and subjective cutoff values, a problem only recently being openly recognized;
12.  Bigger studies introduce all sorts of statistical 'noise', which can make something appear causal or can weaken an actual causal signal.  Phenotypes can be measured in many ways, but we know very well that this can be changeable and subjective (and phenotypes are not very detailed in the initial ExAC database);
13.  Early reports of strong genetic findings have a well-known upward bias in effect size, the finder's curse, that later work fails to confirm.

Well, yes, we're always critical, but this new finding isn't really a surprise
To some readers we are too often critical, and at least some of us have to confess to a contrarian nature.  But here is why we say that these new findings, like so many that are by the grocery checkout in Nature, Science, and People magazines, while seemingly quite true, should not be treated as a surprise or a threat to what we've already known--nor a justification of just doing more, or much more of the same.

Gregor Mendel studied fully penetrant (deterministic) causation.  That is what we now know as 'genes', in which the presence of the causal allele (in 2-allele systems) always caused the trait (green vs yellow peas, etc.; the same is true of recessive as of dominant traits, given the appropriate genotype).  But this is generally wrong, save at best for exceptions such as those that Mendel himself knowingly and carefully chose to study.  And even those were not so clear!  Mendel has been accused of 'cheating' by ignoring inconsistent results.  This may have been data fudging, but it is at least as likely to have been a reaction to what we have known for a century as 'incomplete penetrance'.  (Ken wrote on this a number of years ago in one of his Evolutionary Anthropology columns.)  For whatever reason--and see below--the presence of a 'dominant' allele, or 'recessive' homozygosity at a 'causal' gene, doesn't always lead to the trait.

For most of the 20th century, the probabilistic nature of real-world, as opposed to textbook, Mendelism has been well known and accepted.  The reasons for incomplete penetrance were not known, and indeed we had no way to know them as a rule.  Various explanations were offered, but the statistical treatment of the inferences (estimates of penetrance probability, for example) was common practice and a textbook standard.  Even the original authors acknowledge incomplete penetrance, but this essentially shows that what the ExAC consortium is reporting are details, but nothing fundamentally new or surprising.  Clinicians or investigators acting as if a variant were always causal should be blamed for gross oversimplification, and so should hyperbolic news media.

Recent advances such as genomewide association studies (GWAS) in various forms have used stringent statistical criteria to minimize false discovery.  The result is that the mapped 'hits' satisfying those criteria account for only a fraction of estimated overall genomic causation.  This was legitimate in that it didn't leave us swamped with hundreds of very weak or very rare false-positive genome locations.  But even the acceptable, statistically safest genome sites typically showed small individual effects and risks far below 1.0.  They were not 'dominant' in the usual sense.  That means that people with the 'causal' allele don't always, and in fact do not usually, have the trait.  This has been the finding for quantitative traits like stature and qualitative ones like presence of diabetes, heart-attack-related events, psychiatric disorders, and essentially all traits studied by GWAS.  It is not exactly what the ExAC data were looking at, but it is highly relevant and is the relevant basic biological principle.
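
A back-of-envelope calculation, with hypothetical but typical-sized numbers, shows what 'risks far below 1.0' mean in practice for an individual carrier:

    # Hypothetical illustration: a variant with a relative risk of 1.2
    # for a trait whose risk in non-carriers is 10%.
    risk_noncarriers = 0.10
    relative_risk = 1.2

    risk_carriers = risk_noncarriers * relative_risk
    print("risk in carriers:", risk_carriers)                          # 0.12, i.e. 12%
    print("carriers who never develop the trait:", 1 - risk_carriers)  # 0.88, i.e. 88%

So even a replicated, genome-wide-significant 'risk' variant of this size leaves the great majority of its carriers unaffected, which is the sense in which such hits are not 'dominant'.
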

This does not necessarily mean that the target gene is not important for the disease trait, which seems to be one of the inferences headlined in the news splashes.  This is treated as a striking or even fundamental new finding, but it is nothing of that sort.  Indeed, the genes in question may not be falsely identified, but may very well contribute to risk in some people under some conditions at some age and in some environments.  The ExAC results don't really address this because (for example) to determine when a gene variant is a risk variant one would have to identify all the causes of 'incomplete penetrance' in every sample, but there are multiple explanations for incomplete penetrance, including the list of 1 - 13 above as well as methodological issues such as those pointed out by the ExAC project paper itself.

In addition, there may be 'protective' variants in the other regions of the genome (that is, the trait may need the contribution of many different genome regions), and working that out would typically involve "hyper astronomical" combinations of effects using unachievable, not to mention uninterpretable, sample sizes--from which one would have to estimate risk effects of almost uncountable numbers of sequence variants.  If there were, say, 100 other contributing genes, each with their own variant genotypes including regulatory variants, the number of combinations of backgrounds one would have to sort through to see how they affected the 'falsely' identified gene is effectively uncountable.
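
To see why "hyper astronomical" is not much of an exaggeration, here is that arithmetic made explicit (a deliberately understated toy count of our own, assuming only 100 contributing sites, three genotypes per site, and ignoring regulatory and environmental variation entirely):

    # A back-of-envelope count of genetic backgrounds; the numbers are assumptions.
    n_sites = 100
    genotypes_per_site = 3        # e.g. AA, Aa, aa at a biallelic site

    n_backgrounds = genotypes_per_site ** n_sites
    print("possible genetic backgrounds: %.2e" % n_backgrounds)   # about 5e47
    print("for scale, humans who have ever lived: roughly 1e11")

No sample could ever observe more than a vanishingly small fraction of these combinations, so estimating context-specific risks for each of them is not a data-collection problem that more funding can solve.
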

Even the most clearly causal genes such as variants of BRCA1 and breast cancer have penetrance far less than 1.0 in recent data (here referring to lifetime risk; risk at earlier ages is very far from 1.0). The risk, though clearly serious, depends on cohort, environmental and other mainly unknown factors.  Nobody doubts the role of BRCA1 but it is not in itself causal.  For example, it appears to be a mutation repair gene, but if no (or not enough) cancer-related mutations arise in the breast cells in a woman carrying a high-risk BRCA1 allele, she will not get breast cancer as a result of that gene's malfunction.

There are many other examples of mapping that identified genes that, even if strongly and truly associated with a test trait, have very far from complete penetrance.  A mutation in HFE and hemochromatosis comes to mind: in studies of some Europeans, a particular mutation seemed always to be present, but when the gene itself was tested in a general database, rather than just in affected people, it had little or no apparent causal effect.  This seems to be the sort of thing the ExAC report is finding.

The generic reason is again that genes, essentially all genes, work only in their context. That context includes 'environment', which refers to all the other genes and cells in the body and the external or 'lifestyle' factors, and also age and sex as well.  There is no obvious way to identify, evaluate or measure the effects of all possibly relevant lifestyle effects, and since these change, retrospective evaluation has unknown bearing on future risk (the same can be said of genomic variants for the same reason).  How could these even be sampled adequately?

Likewise, volumes of long-existing experimental and highly focused results tell the same tale.  Transgenic mice, for example, in which the same mutation is introduced into their 'same' gene as in humans, very often show little, no, or only strain-specific effects.  This is true in other experimental organisms.  The lesson, and it's far from a new or recent one, is that genomic context is vitally important; that is, it is the person-specific genomic background of a target gene that affects the latter's effect strength--and vice versa: the same is true for each of these other genes.  That is why we have so often noted the legerdemain being foisted on the research and public communities by the advocates of Big Data statistical testing.  Certainly methodological errors are also a problem, as the Nature piece describes, but they aren't the only problem.

So if someone reports some cases of a trait that seem too often to involve a given gene, such as the Nature piece seems generally to be about, but searches of unaffected people also occasionally find the same mutations in such genes (especially when only exomes are considered), then we are told that this is a surprise.  It is, to be sure, important to know, but it is just as important to know that essentially the same information has long been available to us in many forms.  It is not a surprise--even if it doesn't tell us where to go in search of genetic, much less genomic, causation.

Sorry, though it's important knowledge, it's not 'radical' nor dependent on these data!
The idea being suggested is that (surprise, surprise!) we need much more data to make this point or to find these surprisingly harmless mutations.  That is simply a misleading assertion, or attempted justification, though it has become the intentional industry standard closing argument.

It is of course very possible that we're missing some aspects of the studies and interpretations that are being touted, but we don't think that changes the basic points being made here.  They're consistent with the new findings but show that for many very good reasons this is what we knew was generally the case, that 'Mendelian' traits were the exception that led to a century of genetic discovery but only because it focused attention on what was then doable (while, not widely recognized by human geneticists, in parallel, agricultural genetics of polygenic traits showed what was more typical).

But now, if things are being recognized as contextual much more deeply than in Francis Collins' money-strategy-based Big Data dreams, or 'precision' promises, and our inferential (statistical) criteria are properly under siege, we'll repeat our oft-stated mantra: a deeply different, reformed understanding is needed, and a turn to research investment focused on basic science rather than exhaustive surveys, and on those many traits whose causal basis really is strong enough that it doesn't require this deeper knowledge.  In a sense, if you need massive data to find an effect, then that effect is usually very rare and/or very weak.

And by the way, the same must be true for normal traits, like stature, intelligence, and so on, for which we're besieged with genome-mapping assertions, and this must also apply to ideas about gene-specific responses to natural selection in evolution.  Responses to environment (diet etc.) manifestly have the same problem.  It is not just a strange finding of exome-mapping studies for disease.  Likewise, the 'normal' study subjects now being asked for in huge numbers may get the target trait later on in their lives, except for traits basically present early in life.  One can't doubt that misattributing the cause of such traits is an important problem, but we need to think of better solutions than Big Big Data, because not confirming a gene doesn't help, and finding that 'the' gene is only 'the' gene in some genomic or environmental backgrounds is the proverbial and historically frustrating needle-in-the-haystack search.  So the story's advocated huge samples of 'normals' (random individuals) cannot really address the causal issue definitively (except to show what we know, that there's a big problem to be solved).  Selected family data may--may--help identify a gene that really is causal, but even they have some of the same sorts of problems.  And may apply only to that family.

The ExAC story is focused on severe diseases, which is somewhat like Mendel's selective approach, because it is quite obvious that complex diseases are complex.  It is plausible that severe, especially early-onset, diseases are genetically tractable, but it is not obvious that ever more data will answer the challenge.  And, ironically, the ExAC study has removed just such diseases from its consideration!  So they're intentionally showing what is well known: that we're in needle-in-haystack territory, even when someone has reported big needles.

Finally, we have to add that these points have been made by various authors for many years, often based on principles that did not require mega-studies to show.  Put another way, we had reason to expect what we're seeing, and years of studies supported that expectation.  This doesn't even consider the deep problems about statistical inference that are being widely noted, or the deeply entrenched nature of that approach's conceptual and even material vested interests (see this week's Aeon essay, e.g.).  It's time to change, but doing so would involve deeply revising how resources are used--of course one of our common themes here on the MT--and that is a matter almost entirely of political economy, not science.  That is, it's as much about feeding the science industry as it is about medicine and public health.  And that is why it's mainly about business as usual rather than real reform.

FAS - Fishy Association Studies

           
                                  On Saturday, July 19, 1879, the brilliant opera 
                                  composer, Richard Wagner, "had a bad night; 
                                  he thinks that...he ate too much trout."  
                                             Quoted from Cosima Wagner's Diary, Vol. II, 1878-83.

As I was reading Cosima Wagner's doting diary of life with her famous husband, I chanced across the above quote, which seemed an appropriate, if snarky, way to frame today's post.  The incident she related exemplifies how we routinely assign causation even to one-off events in daily life.  Science, on the other hand, purports to be about causation of a deeper sort, with some sufficient form of regularity or replicability.

Cause and effect can be elusive concepts, especially difficult to winnow out from observations in the complex living world.  We've hammered on about this on MT over the years.  The best science at least tries to collect adequate evidence in order to infer causation in credible rather than casual ways. There are, for example, likely to be lots of reasons, other than eating trout, that could explain why a cranky genius like Wagner had a bad night.  It is all too easy to over-interpret associations in causal terms.










By such thinking, the above figures (from Wikimedia commons) might be interpreted as having the following predictive power:
     One fish = bad night
     Two fish = total insomnia
     Many fish = hours of nightmarish dissonance called Tristan und Isolde!

Too often, we salivate over GWAS (genomewide association studies) results as if they justify ever-bigger and longer studies.  But equally too often, these are FAS, fishy association studies.  That is what we get when the science community doesn't pay heed to the serious and often fundamental difficulties in determining causation that may well undermine their findings and the advice so blithely proffered to the public.

We are not the only ones who have been writing that the current enumerative, 'Big Data', approach to biomedical and even behavior genetic causation leaves, to say the least, much to be desired.  Among other issues, there's too much asserting conclusions on inadequate evidence, and not enough recognition of when assertions are effectively not that much more robust than saying one 'ate too much trout'.  Weak statistical associations, so typically the result of these association studies, are not the same as demonstrations of causation.

The idea of mapping complex traits by huge genomewide case-control or population sample studies is a captivating one for biomedical researchers.  It's mechanical, perfectly designed to be done by huge computer database analysis by people who may never have seen the inside of a wet lab (e.g., programmers and 'informatics' or statistical specialists who have little serious critical understanding of the underlying biology).  It's often largely thought-free, because that makes the results safe to publish, safe for getting more grants, and so on; but more than being 'captivating' it is 'capturing'.... a hog-trough's share of research resources.

The promise, not even always carefully hedged with escape-words lest it be shown to be wrong, is that from your genome your future biomedical (and behavioral) traits can be known.  A recent article by Joyner et al. in the July 28 issue of the Journal of the American Medical Association (JAMA) describes the stubborn persistence of under-performing but costly research that becomes entrenched, a perpetuation that NIH's misnomered 'precision-based genomic medicine' continues or even expands upon.  Below is our riff on the article, but it's open-access, so you can read the points they make and judge for yourself whether we have the right 'take' on what they say.  It is one of many articles that have been making similar points....in case anyone is listening.

The problem is complex causation
The underlying basic problem is the complex nature of causation of 'complex' traits, like many if not most behavioral or chronic or late-onset diseases.  The word 'complex', long used for such traits, refers not to identified causes but to the fact that the outcomes clearly did not have simple, identified causes.  It seemed clear that their causation was due mainly to countless combinations of many individually small causal factors, some of which were inherited; but the specifics were usually unknown.  Computer and various DNA technologies made it possible, in principle, to identify and sort through huge numbers of possible causes, or at least statistically associated factors, including DNA sequence variants.  But underlying this approach has been the idea, always a myth really, that identifying some enumerated set of causes in a statistical sample would allow accurate prediction of outcomes.  This has proven not to be the case nearly as generally as was promised.

To me, the push to do large-scale, huge-sample, survey-based genomewide risk analysis was at least partly justified, at least in principle, years ago, when there might have been some doubt about the nature of the causal biology underlying complex traits, including the increasingly common chronic disease problems that our aging population faces.  But the results are in, and in fact have been in for quite a long time.  Moreover, and a credit to the validity of the science, the results support what we had good reason to know for a long time.  They show that this approach is not, or at least is clearly no longer, the optimal way to do science in this area or to contribute to improving public health (and much of the same applies to evolutionary biology as well).

I think it fair to say that I was making these points, in print, in prominent places, starting nearly 30 years ago, in books and journal articles (and more recently here on MT), that is, ever since the relevant data were beginning to appear.  But neither I nor my collaborators were the original discoverers of this insight: the basic truth has been known, in principle and in many empirical experimental (such as agricultural breeding) and observational contexts, for nearly a century!  Struggling with the inheritance of causal elements ('genes', as they were generically known), the 1930s 'modern synthesis' of evolutionary biology reconciled (1) Darwin's idea of gradual evolution, mainly of quantitative traits, and the experimental evidence of the quantitative nature of their inheritance, with (2) the discrete inheritance of causal elements first systematically demonstrated by Mendel for selected two-state traits.  That was a powerful understanding, but in too many ways it has thoughtlessly been taken to imply that all traits, not just genes, are usefully 'Mendelian', due to substantial, enumerable, strongly causal genetic agents.  That has always been the exception, not the rule.

A view is possible that is not wholly cynical 
We have been outspoken about the sociocultural aspect of modern research, which can be understood by what one might call the FTM (Follow the Money) approach, in some ways a better way to understand where we are than looking at the science itself.  Who has what to gain by the current approaches?  Our understanding is aided by realizing that the science is presented to us by scientists and journalists, supplier industries and bureaucrats, who have vested interests that are served by promoting that way of doing business.

FTM isn't the only useful perspective, however.  A less cynical, and yet still appropriate way to look at this is in terms of diminishing returns.  The investment in the current way of doing science in this (and other areas) is part of our culture.  From a scientific point of view, the first forays into a new way or approach, or a theoretical idea, yield quick and, by definition, new results.  Eventually, it becomes more routine and the per-study yield diminishes. We asymptotically approach what we can glean from the approach.  Eventually some chance insight will yield some forms of better and more powerful approaches, whatever they'll be.

If current approaches were just yielding low-cost incremental gain, or were being done in well-off investigators' basement labs, it would be the normal course of scientific history, and nobody would have reason to complain.  But that isn't how it works these days.  These days, understanding via FTM is important: the science establishment's hands are in all our pockets, and we should expect more in return than the satisfaction that the trough has been feeding many very nice careers (including mine), in universities, journalism, and so on.  How, when, and where a properly raised expectation of societal benefit from science will be fulfilled is not predictable, because facts are elusive and Nature often opaque.  But simply more of the same, at its current cost and with its entrenched justification, isn't the best use of public resources.

There will always be a place for 'big data' resources.  A unified system of online biomedical records would save a lot of excess repeat testing and other clinical costs, if every doctor you consult could access those records.  The records could also be used for research, to the (limited) extent that they are informative.  For a variety of conditions that would be very useful and cost-effective indeed, though most of those conditions would be relatively rare.

Continuing to pour research funds into the idea that ever more 'data' will lead to dramatic improvements in 'precision' medicine is far more about the health of entrenched university labs and investigators than about that of the general citizenry. Focused laboratory work that is more rigorously supported by theory or definitive experiment, with some accountability (but no expectation or promise of miracles), is in order, given what the GWAS etc. era, plus a century of evolutionary genetics, has shown. There are countless areas, especially many serious early-onset diseases, for which we have a focused, persuasive, meaningful understanding of causation and where resources should now be invested more heavily.

Intentionally open-ended beetle-collecting ventures, joined at the hip to promises of 'precision' made by people who do not even know what the word means (but hint that it means 'perfection'), or that glorify the occasional genuinely good finding as if it were typical, or as though more focused, less open-ended research wouldn't be a better investment, are not a legitimate approach.  Yet that is largely what is going on today.  The scientists, at least the smart ones, know this very well and say so (in confidence, of course).

Understanding complex causation is complex, and we have to face up to that.  We can't demand inexpensive, instant, or even predictable answers.  These are inconvenient facts few want to acknowledge.  But we and others have said this ad nauseam before, so here we simply wanted to point out the current JAMA paper, by highly capable authors, as yet another formal and prominently published recognition of the costly inertia in which we are embedded. In any aspect of society, not just science, prying resources loose from the hands of a small elite is never easy, even when there are other ways to use them that might have a better payoff for all of us.

Usually, such resource reallocation seems to require some major new and imminent external threat, or some unpredicted discovery, which I think is far more likely to come from some smaller operation where thinking is more important than cranking out yet another mass-scale statistical survey of Big Data sausage.  Still, every push against wasteful inertia, like the Joyner et al. JAMA paper, helps.  Indeed, the many whose careers are entrapped in that part of the System have the skills and neuronal power to do something better if circumstances allowed it to happen more readily.  To encourage that, perhaps we should stop paying so much attention to Fishy stories.

How many diseases does it take to map a SNP? Fifteen years on

Ken and I are here in Finland, preparing to teach a week of Logical Reasoning in Human Genetics with Joe Terwilliger and colleagues.  Not statistical methods, not laboratory techniques, not the latest way to analyze sequence data.  Concepts, logical reasoning.

Ken and Joe have been reasoning logically for a long time.  They've taught this course together in many places, and they wrote at least one logically reasoned paper 15 years ago.  That paper was published in Nature Genetics.  That journal shortly afterwards made an editorial policy decision to be the loudspeaker for genetic association studies (GWAS), and would be unlikely in the extreme to publish such a view today.  But Joe often says that it could, and probably should, be published again, with very few wording changes.  (He also says that if overhead projectors were still available, he'd give the same talks he gave in 1995, since the issues in human genetics haven't changed.  We have lots more data, but no fundamentally new concepts or insights regarding SNP associations and complex traits.  In fairness, though, he does update his slides -- he adds photos of the latest places he has traveled.  Looking forward to photos of Crimea this week.)

The 2000 paper was called, "How many diseases does it take to map a SNP?"  They began:
There are more than a few parallels between the California gold rush and today's frenetic drive towards linkage disequilibrium (LD) mapping based on single-nucleotide polymorphisms (SNPs). This is fuelled by a faith that the genetic determinants of complex traits are tractable, and that knowledge of genetic variation will materially improve the diagnosis, treatment or prevention of a substantial fraction of cases of the diseases that constitute the major public health burden of industrialized nations. Much of the enthusiasm is based on the hope that the marginal effects of common allelic variants account for a substantial proportion of the population risk for such diseases in a usefully predictive way. A main area of effort has been to develop better molecular and statistical technologies often evaluated by the question: how many SNPs (or other markers) do we need to map genes for complex diseases? We think the question is inappropriately posed, as the problem may be one primarily of biology rather than technology.
Today, emphasis is more on ever larger sample sizes to find rare alleles, since common alleles turned out not to be the magical answer, but the issues are the same.  The problem is biological, rather than one of sample size.  And not only do we have at least as much causal complexity due to environmental factors, but to the mix have been added comparably complex 'genetic' causal factors, such as epigenetic modification of DNA affecting gene expression and the potential contributions of the highly complex microbiome.
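To see why sample sizes balloon as attention shifts to rarer alleles, here is a rough back-of-the-envelope sketch using the standard normal approximation for a quantitative-trait association test; the effect size, frequencies, and thresholds are illustrative assumptions, not estimates from any study:

```python
from scipy.stats import norm

def required_n(maf, beta, alpha=5e-8, power=0.80):
    """Rough sample size needed to detect a per-allele effect `beta`
    (in phenotype standard deviations) at minor allele frequency `maf`,
    using the usual normal approximation for a quantitative-trait test."""
    var_explained = 2 * maf * (1 - maf) * beta**2   # variance explained by the SNP
    z_alpha = norm.ppf(1 - alpha / 2)               # genome-wide significance threshold
    z_power = norm.ppf(power)
    return (z_alpha + z_power) ** 2 / var_explained

# Same hypothetical per-allele effect, at increasingly rare alleles
for maf in (0.30, 0.05, 0.01, 0.001):
    print(f"MAF {maf:>6}: ~{required_n(maf, beta=0.10):,.0f} people")
```

The point is only the scaling: the same per-allele effect at a much rarer allele requires orders of magnitude more people, which is the arithmetic driving ever-larger consortia, quite apart from whether the answer is worth having.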

The idea behind mapping diseases with SNPs is that markers will lie near the disease allele.  But there are problems with this, as GWAS are successfully showing.
If traits do not strongly predict underlying genotypes, that is, if P(Gp|Ph) is small, linkage and LD mapping may have very low power or may not work at all. As an extreme example, one's genotype cannot be reliably determined by merely stepping on the bathroom scale! But even if this could be done, there is a widespread but invalid belief that because something can be mapped (that is, P(Gp|Ph) is high), the causal predictive power of the genotype (P(Ph|Gp); Fig. 1, blue arrow) will also be high. In fact, we have surprisingly little data on this latter topic, which requires extensive sampling from the general population, rather than patients. Note that the opposite can also be untrue—that is, if P(Ph|Gp) is high it does not mean P(Gp|Ph) will be high, as in genetically heterogeneous mendelian disorders such as retinitis pigmentosa. It is important to note that when we speak of P(Ph|Gp) in this context, we speak of the marginal mode of inheritance, which is only valid for consideration of singletons, and relatives will not have independent and identically distributed penetrances (even without assuming epistasis or gene-environment interactions) because the other genetic and environmental factors are also correlated among them! Similar arguments can be made about detectance, P(Gp|Ph), which must always be a function of the ascertainment, something that is often overlooked in the literature when investigators make comparisons of power for different study designs.

 Figure 1. Schematic model of trait aetiology.
The phenotype under study, Ph, is influenced by diverse genetic, environmental and cultural factors (with interactions indicated in simplified form). Genetic factors may include many loci of small or large effect, GPi, and polygenic background. Marker genotypes, Gx, are near to (and hopefully correlated with) genetic factor, Gp, that affects the phenotype. Genetic epidemiology tries to correlate Gx with Ph to localize Gp. Above the diagram, the horizontal lines represent different copies of a chromosome; vertical hash marks show marker loci in and around the gene, Gp, affecting the trait. The red Pi are the chromosomal locations of aetiologically relevant variants, relative to Ph.
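To make the quoted asymmetry between detectance, P(Gp|Ph), and penetrance, P(Ph|Gp), concrete, here is a two-line application of Bayes' rule; the frequencies and risks are hypothetical, chosen only to illustrate the logic:

```python
# Hypothetical numbers, only to make the asymmetry in the quote concrete.
prevalence = 0.01      # P(Ph): overall disease risk in the population

# A common risk genotype: easy to 'map' (enriched among cases) but
# nearly useless for predicting any individual's outcome.
genotype_freq = 0.20   # P(Gp): frequency of carriers
detectance    = 0.40   # P(Gp | Ph): carried by 40% of cases

# Bayes' rule: P(Ph | Gp) = P(Gp | Ph) * P(Ph) / P(Gp)
penetrance = detectance * prevalence / genotype_freq
print(f"penetrance of the common variant: {penetrance:.3f}")     # ~0.02 vs 0.01 baseline

# The reverse: a rare, highly penetrant variant (as in heterogeneous
# Mendelian disorders) that the vast majority of cases do not carry.
rare_freq, rare_penetrance = 1e-5, 0.90
rare_detectance = rare_penetrance * rare_freq / prevalence
print(f"detectance of the rare variant: {rare_detectance:.5f}")  # <0.1% of cases
```

With these invented numbers, the common variant is easy to detect in a case-control comparison yet raises an individual carrier's risk only from 1% to 2%, while the rare, highly penetrant variant is nearly deterministic for its carriers but absent from more than 99.9% of cases, and so explains almost none of the public health burden.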
Other inconvenient biological issues mentioned in the paper include: linkage disequilibrium is stochastic, which has implications for the use of SNPs in disease mapping (see the sketch below); regulatory rather than protein-coding sites often affect disease risk, and these are generally impossible to identify (see below); late-onset chronic diseases are much more complex than clearly genetic pediatric diseases; the most effective mapping and association studies are done in "selective samples of individuals or families at high risk relative to the average risk in the population, and from populations with unusual histories" (hence Joe's eclectic and interesting travelogue); etiology tends to be very heterogeneous; phenotype can't predict genotype, and vice versa; and environmental effects can be significant but are unpredictable and often impossible to identify; and so on.
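The stochastic-LD point can be illustrated with a small simulation sketch (allele frequencies, LD, and penetrances are all made up for the purpose): the association signal at a marker is a diluted version of the signal at the causal site, and how diluted it is depends on an LD value that varies unpredictably across the genome and across populations.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical set-up: one causal variant and one nearby marker in partial LD.
p_causal, p_marker, r = 0.20, 0.25, 0.6                  # allele frequencies and LD correlation
D = r * np.sqrt(p_causal * (1 - p_causal) * p_marker * (1 - p_marker))

# Haplotype frequencies for (causal, marker) = (1,1), (1,0), (0,1), (0,0)
hap_freqs = [p_causal * p_marker + D,
             p_causal * (1 - p_marker) - D,
             (1 - p_causal) * p_marker - D,
             (1 - p_causal) * (1 - p_marker) + D]
haps = np.array([[1, 1], [1, 0], [0, 1], [0, 0]])

n = 200_000
genotypes = (haps[rng.choice(4, n, p=hap_freqs)] +
             haps[rng.choice(4, n, p=hap_freqs)])         # two haplotypes per person

# Disease risk rises with the causal allele only (hypothetical penetrances)
risk = np.array([0.01, 0.02, 0.04])[genotypes[:, 0]]
affected = rng.random(n) < risk

for name, col in (("causal site", 0), ("marker SNP", 1)):
    diff = genotypes[affected, col].mean() / 2 - genotypes[~affected, col].mean() / 2
    print(f"{name}: case-control allele-frequency difference = {diff:.4f}")
```

With r = 0.6 the marker shows only a fraction of the causal site's allele-frequency difference, so detecting it needs a correspondingly larger sample; since r itself is a product of population history, the dilution is not something one can count on in advance.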

Whole genome sequencing will not be a general miracle cure.  Exome sequencing can sometimes find coding variants that have strong effects, because we know how to identify exomes and how to read their code.  But many if not most mapped sites for complex traits, as might be expected, are in regulatory regions.  Yet we are still quite inept at identifying regulatory regions, for many reasons, not least their complexity and fluidity among individuals and populations.  So whole genome sequences will likely have to be analyzed by using markers, as in GWAS, and that will not automatically show us where key regulatory effects are located or how they work.  If those effects are highly heterogeneous, varying from person to person and population to population, mapping will still face the complexity problem.  Time will tell what transpires.
The problems faced in treating complex diseases as if they were Mendel's peas show, without invoking the term in its faddish sense, that 'complexity' is a subject that needs its own operating framework, a new twenty-first rather than nineteenth—or even twentieth—century genetics.
So, even if the data are better and less costly now than 15 years ago, the basic issues haven't changed.

Rare Disease Day and the promises of personalized medicine

Our daughter Ellen wrote the post republished below 3 years ago, and we've reposted it in commemoration of Rare Disease Day, Febru...