
Understanding Obesity? Fat Chance!

Obesity is one of our more widespread and serious health-threatening traits.  Many large-scale genetic mapping studies, as well as extensive environmental/behavioral epidemiological studies, of obesity have been done over recent decades.  But if anything, the obesity epidemic seems to be getting worse.

There's deep meaning in that last sentence: the prevalence of obesity is changing rapidly.  This is being documented globally, and it is happening before our eyes.  Perhaps the most obvious implication is that this serious problem is not due to genetics!  That is, it is not due to genotypes that in themselves make you obese.  Although everyone's genotype is different, the changes are happening during lifetimes, so we can't attribute them to the different details of each generation's genotypes or their evolution over time.  Instead, the trend is clearly due to lifestyle changes during lifetimes.

Of course, if you see everything through gene-colored lenses, you might argue (as people have) that sure, it's lifestyles, but only some key nutrient-responding genes are responsible for the surge in obesity.  These are the 'druggable' targets that we ought to be finding, and it should be rather easy since the change is so rapid that the genes must be few, so that even if we can't rein in McD and KFC toxicity, or passive TV-addiction, we can at least medicate the result.  That was always, at best, wishful thinking, and at worst, rationalization for funding Big Data studies.  Such a simple explanation would be good for KFC, and an income flood for BigPharma, the GWAS industry, DNA sequencer makers, and more.....except not so good for  those paying the medical price, and those who are trying to think about the problem in a disinterested scientific way.  Unfortunately, even when it is entirely sincere, that convenient hope for a simple genetic cause is being shown to be false.

A serious parody?
Year by year, more factors are identified that, by statistical association at least and sometimes by experimental testing, contribute to obesity.  A very fine review of this subject has appeared in the mid-October 2017 Nature Reviews Genetics, by Ghosh and Bouchard, which takes seriously not just genetics but all the plausible causes of obesity, including behavior and environment, and their relationships as best we know them, and outlines the current state of knowledge.

Ghosh and Bouchard provide a well-caveated assessment of these various threads of evidence now in hand, and though they do end up with the pro forma plea for yet more funding to identify yet more details, they provide a clear picture that a serious reader can take seriously on its own merits.  However, we think that the proper message is not the usual one.  It is that we need to rethink what we've been investing so heavily in.

To their great credit, the authors melded behavioral, environmental, and genetic causation in their analysis. This is shown in this figure, from their summary; it is probably the best current causal map of obesity based on the studies the authors included in their analysis:



If this diagram were being discussed by John Cleese on Monty Python, we'd roar with laughter at what was an obvious parody of science.  But nobody's laughing, and this isn't a parody!  Nor is it of unusual shape or complexity: diagrams like this (but with little if any environmental component) have been produced by analyzing gene expression patterns even in the early development of the simple sea urchin.  These are serious diagrams, which is understandable, but we don't seem to be reacting to them other than by saying we need more of the same.  I think that is rather weird for scientists, whose job it is to understand, not just list, the nature of Nature.

We said at the outset of this post that 'the obesity epidemic seems to be getting worse'.  There's a deep message there, but one essentially missing even from this careful obesity paper: many of the causal factors, including genetic variants, are changing before our eyes.  The frequency of genetic variants changes from population to population and generation to generation, so that all samples will look different.  And mutations happen in every meiosis, adding new variants to a population every time a baby is born.  The results of many studies, as reflected in the current summary by Ghosh and Bouchard, show that while many gene regions contribute to obesity, their total net contribution is still minor.  It is possible, though perhaps very difficult to demonstrate, that an individual site might account for more than a minimal effect in some individual carriers, in ways GWAS results can't really identify.  And the authors do cite published opinions that claim a higher efficacy of GWAS for obesity than we think is seriously defensible; but even if we're wrong, causation is very complex, as the figure shows.

The individual genomic variants will vary in their presence or absence or frequency or average effect among studies, not to mention populations.  In addition, most contributing genetic variants are too rare or weak to be detected by the methods used in mapping studies, because of the constraints on statistical significance criteria, which is why so much of the trait's heritability in GWAS is typically unaccounted for by mapping.  These aspects and their details will differ greatly among samples and studies.
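
To make that point concrete, here is a minimal sketch of how stringent significance thresholds leave most weak contributors on the cutting-room floor.  The marker names, p-values, and number of tests are invented for illustration; only the arithmetic of the conventional Bonferroni-style cutoff is real.

    # Illustration only: a genomewide threshold keeps false positives out, but it
    # also discards the many weak or rare true contributors to the trait.
    n_tests = 1_000_000
    genomewide_threshold = 0.05 / n_tests          # roughly the conventional 5e-8
    snp_p_values = {"rs_toy1": 2e-9, "rs_toy2": 3e-6, "rs_toy3": 0.004}
    mapped_hits = [snp for snp, p in snp_p_values.items() if p < genomewide_threshold]
    print(mapped_hits)   # only 'rs_toy1' survives; the weaker contributors go uncounted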

Relevant risk factors will come or go or change in exposure levels in the future--but these cannot be predicted, not even in principle.  Their interactions and contributions are also manifestly context-specific, as secular trends clearly show.  Even with the set of known genetic variants and other contributing factors, there are essentially an unmanageable number of possible combinations, so that each person is genetically and environmentally unique, and the complex combinations of future individuals are not predictable.

Risk assessment is essentially based on replicability, which in a sense is why statistical testing can be used (and these sorts of results rely heavily on it).  However, because these risk factor combinations are each unique, they're not replicable.  At best, as some advocate, the individual effects are additive, so that if we just measure each factor in an individual we can add up each factor's effect and predict the person's obesity (if the effects are not additive, this won't work).  We can probably predict, if perhaps not control, at least some of the major risk factors (people will still down pizzas or fried chicken while sitting in front of a TV).  But even the known genetic factors in total only account for a small percentage of the trait's variance (the authors' Table 2), though the paper cites more optimistic authors.
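
For what it's worth, the additive assumption is easy to state as a toy calculation.  Every factor, effect size, and value below is made up; the point is only that the prediction is a straight sum, which is exactly what fails if factors interact or change over time.

    # Toy additive model: predicted trait shift is just the sum of per-factor
    # effects.  Factors, effect sizes, and values are invented for illustration.
    effects = {"variant_A": 0.3, "variant_B": 0.1, "screen_hours": 0.05, "kcal_excess": 0.002}
    person  = {"variant_A": 1,   "variant_B": 2,   "screen_hours": 4,    "kcal_excess": 300}

    predicted_shift = sum(effects[f] * person[f] for f in effects)
    print(predicted_shift)   # 1.3 with these toy numbers; any interaction breaks the sum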

The result of these indisputable facts is that as long as our eyes are focused, for research strategic reasons or lack of better ideas, on the litter of countless minor factors, even those we can identify, we have a fat chance of really addressing the problem this way.

If you pick any of the arrows (links) in this diagram, you can ask how strong or necessary that link is, how much it may vary among samples or depend on the European nature of the data used here, or to what extent even its identification could be a sampling or statistical artifact.  Links like 'smoking' or 'medication', not to mention specific genes, even if they're wholly correct, surely have quantitative effects that vary among people even within the sample, and the effect sizes probably often have very large variance.  Many exposures are notoriously inaccurately reported or measured, or change in unmeasured ways.  Some are quite vague, like 'lifestyle', 'eating behavior', and many others--both hard to define and hard to assess with knowable precision, much less predictability.  Whether their many effects are additive or involve more complex interaction is another issue, and the connectivity diagram may be tentative in many places.  Maybe--probably?--in such traits simple behavioral changes would override most of these behavioral factors, leaving those persons for whom obesity really is due to their genotype, which would then be amenable to gene-focused approaches.

If this is a friable diagram, that is, if the items, strengths, connections and so on are highly changeable, even if through no fault of the authors whatever, we can ask when and where and how this complex map is actually useful, no matter how carefully it was assembled.  Indeed, even if this is a rigidly accurate diagram for the samples used, how applicable is it to other samples or to the future?  Or how useful is it in predicting not just group patterns, but individual risk?

Our personal view is that the rather ritual plea for more and more and bigger and bigger statistical association studies is misplaced, and, in truth, a way of maintaining funding and the status quo, something we've written much about--the sociopolitical economics of science today.  With obesity rising at a continuing rate and about a third of the US population recently reported as obese, we know that the future health care costs for the consequences will dwarf even the mega-scale genome mapping on which so much is currently being spent, if not largely wasted.  We know how to prevent much or most obesity in behavioral terms, and we think it is entirely fair to ask why we still pour resources into genetic mapping of this particular problem.

There are many papers on other complex traits that might seem to be simpler, like stature and blood pressure, not to mention more mysterious ones like schizophrenia or intelligence, in which hundreds of genomewide sites are implicated, strewn across the genome.  Different studies find different sites, and in most cases most of the heritability is not accounted for, meaning that many more sites are at work (and this doesn't include environmental effects).  In many instances, even the trait's definition itself may be comparably vague, or may change over time.  This is a landscape 'shape' in which every detail differs, within and between traits, yet the overall pattern is common to complex traits.  That in itself is a tipoff that there is something consistent about these landscapes, but we've not yet really awakened to it or learned how to approach it.

Rather than being skeptical about Ghosh and Bouchard's careful analysis or their underlying findings, I think we should accept their general nature, even if the details in any given study or analysis may not individually be so rigid and replicable, and ask: OK, this is the landscape--what do we do now?

Is there a different way to think about biological causation?  If not, what is the use or point of this kind of complexity enumeration, in which every person is different and the risks for the future may not be those estimated from past data to produce figures like the one above?  The rapid change in prevalence shows how unreliable these factors must be for prediction--they are retrospective descriptions of the particular patterns in the study subjects.  Since we cannot predict the strengths or even the presence of these or other new factors, what should we do?  How can we rethink the problem?

These are the harder questions, much harder than analyzing the data; but they are, in our view, the real scientific questions that need to be asked.

The GWAS hoax....or was it a hoax? Is it a hoax?

A long time ago, in 2000, in Nature Genetics, Joe Terwilliger and I critiqued the idea then being pushed by the powers-that-be, that the genomewide mapping of complex diseases was going to be straightforward, because of the 'theory' (that is, rationale) then being proposed that common variants caused common disease.  At one point, the idea was that only about 50,000 markers would be needed to map any such trait in any global population.  My collaborators and I can claim that, in several papers in prominent journals, in a 1992 Cambridge University Press book, Genetic Variation and Human Disease, and many times on this blog, we have pointed out numerous reasons, based on what we know about evolution, why this was going to be a largely empty promise.  It has been inconvenient for this message to be heard, much less heeded, for reasons we've also discussed in many blog posts.

Before we get into that, it's important to note that unlike me, Joe has moved on to other things, like helping Dennis Rodman's diplomatic efforts in North Korea (here, Joe's shaking hands as he arrives in his most recent trip).  Well, I'm more boring by far, so I guess I'll carry on with my message for today.....




There's now a new paper, coining a new catch-word (omnigenic), to proclaim the major finding that complex traits are genetically complex.  The paper seems solid and clearly worthy of note.  The authors examine the chromosomal distribution of sites that seem to affect a trait, in various ways including chromosomal conformation.  They argue, convincingly, that mapping shows that complex traits are affected by sites strewn across the genome, and they provide a discussion of the pattern and findings.

The authors claim an 'expanded' view of complex traits, and as far as that goes it is justified in detail. What they are adding to the current picture is the idea that mapped traits are affected by 'core' genes but that other regions spread across the genome also contribute. In my view the idea of core genes is largely either obvious (as a toy example, the levels of insulin will relate to the insulin gene) or the concept will be shown to be unclear.  I say this because one can probably always retroactively identify mapped locations and proclaim 'core' elements, but why should any genome region that affects a trait be considered 'non-core'?

In any case, that would be just a semantic point if it were not predictably the phrase that launched a thousand grant applications.  I think neither the basic claim of conceptual novelty, nor the breathless exploitive treatment of it by the news media, are warranted: we've known these basic facts about genomic complexity for a long time, even if the new analysis provides other ways to find or characterize the multiplicity of contributing genome regions.  This assumes that mapping markers are close enough to functionally relevant sites that the latter can be found, and that the unmappable fraction of the heritability isn't leading to over-interpretation of what is 'mapped' (reached significance) or that what isn't won't change the picture.

However, I think the first thing we really need to do is understand the futility of thinking of complex traits as genetic in the 'precision genomic medicine' sense, and the last thing we need is yet another slogan by which hands can remain clasped around billions of dollars for Big Data resting on false promises.  Yet even the new paper itself ends with the ritual ploy, the assertion of the essential need for more information--this time, on gene regulatory networks.  I think it's already safe to assure any reader that these, too, will prove to be as obvious and as elusively ephemeral as genome wide association studies (GWAS) have been.

So was GWAS a hoax on the public?
No!  We've had a theory of complex (quantitative) traits since the early 1900s.  Other authors argued similarly, but RA Fisher's famous 1918 paper is the typical landmark paper.  His theory was, simply put, that infinitely many genome sites contribute to quantitative (what we now call polygenic) traits.  The general model has jibed with the age-old experience of breeders who have used empirical strategies to improve crop or pet species.  Since association mapping (GWAS) became practicable, they have used mapping-related genotypes to help select animals for breeding; but genomic causation is so complex and changeable that they've recognized even this will have to be regularly updated.
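
A toy simulation of Fisher's idea, with arbitrarily chosen locus counts, allele frequencies, and effect sizes, shows why breeders' quantitative traits look smooth and roughly normal even though every contributing locus is trivial on its own.  This is only a sketch of the polygenic model, not of any real trait:

    import random

    def simulate_trait(n_loci=1000, allele_freq=0.5, effect_per_allele=0.01):
        # Each locus contributes 0, 1, or 2 copies of a tiny-effect allele;
        # the trait is the sum of those effects plus environmental noise.
        copies = (sum(random.random() < allele_freq for _ in range(2)) for _ in range(n_loci))
        genetic_value = sum(effect_per_allele * c for c in copies)
        environment = random.gauss(0, 0.5)
        return genetic_value + environment

    trait_values = [simulate_trait() for _ in range(5000)]
    # The distribution is approximately normal, and no single locus matters much,
    # which is the essence of the 'infinitely many small effects' view.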

But when genomewide mapping of complex traits was first really done (a prime example being BRCA genes and breast cancer) it seemed that apparently complex traits might, after all, have mappable genetic causes. BRCA1 was found by linkage mapping in multiply affected families (an important point!), in which a strong-effect allele was segregating.  The use of association mapping  was a tool of convenience: it used random samples (like cases vs controls) because one could hardly get sufficient multiply affected families for every trait one wanted to study.  GWAS rested on the assumption that genetic variants were identical by descent from common ancestral mutations, so that a current-day sample captured the latest descendants of an implied deep family: quite a conceptual coup based on the ability to identify association marker alleles across the genome identical by descent from the un-studied shared remote ancestors.
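
At its core, the association-mapping idea boils down to a simple comparison at each marker, repeated across the genome.  The allele counts below are invented, and the test shown (a chi-square on a 2x2 table) is just a generic stand-in for the various test statistics actually used:

    # One GWAS-style test: are allele counts at a marker distributed differently
    # in cases than in controls?  Counts here are hypothetical.
    from scipy.stats import chi2_contingency

    #            allele A1, allele A2
    cases    = [1100, 900]
    controls = [1000, 1000]
    chi2, p_value, dof, expected = chi2_contingency([cases, controls])
    print(p_value)
    # In practice this comparison is repeated at millions of markers, each assumed
    # to tag variants identical by descent from unobserved shared ancestors.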

Until it was tried, we really didn't know how tractable such mapping of complex traits might be.  Perhaps heritability estimates based on quantitative statistical models were hiding what really could be enumerable, replicable causes, in which case mapping could lead us to functionally relevant genes.  It was certainly worth a try!

But it was quickly clear that this was in important ways a fool's errand.  Yes, some good things were to be found here and there, but the hoped-for miracle findings generally weren't there to be found. This, however, was a success not a failure!  It showed us what the genomic causal landscape looked like, in real data rather than just Fisher's theoretical imagination.  It was real science.  It was in the public interest.

But that was then.  It taught us its lessons, in clear terms (of which the new paper provides some detailed aspects).  But it long ago reached the point of diminishing returns.  In that sense, it's time to move on.

So, then, is GWAS a hoax?
Here, the answer must now be 'yes'!  Once the lesson is learned, bluntly speaking, continuing on is more a matter of keeping the funds flowing than of gaining profound new insights.  Anyone paying attention should by now know very well what the GWAS etc. lessons have been: complex traits are not genetic in the usual sense of being due to tractable, replicable genetic causation.  Omnigenic traits, the new catchword, will prove the same.

There may not literally be infinitely many contributing sites as in the original statistical models, be they core or peripheral, but infinitely many isn't so far off.  Hundreds or thousands of sites, accounting for only a fraction of the heritability, mean essentially infinitely many contributors for any practical purpose.  This is particularly so since the set is not a closed one:  new mutations are always arising and current variants dying away, and along with somatic mutation, the number of contributing sites is open ended, and not enumerable within or among samples.

The problem is actually worse.  All these data are retrospective statistical fits to samples of past outcomes (e.g., sampled individuals' blood pressures, or cases' vs controls' genotypes).  Past experience is not an automatic prediction of future risk.  Future mutations are not predictable, not even in principle.  Future environments and lifestyles, including major climatic dislocations, wars, epidemics and the like, are not predictable, not even in principle.  Future somatic mutations are not predictable, not even in principle.

GWAS almost uniformly have found (1) different mapping results in different samples or populations, (2) only a fraction of heritability is accounted for by tens, hundreds, or even thousands of genome locations and (3) even relatively replicable 'major' contributors, themselves usually (though not always) small in their absolute effect, have widely varying risk effects among samples.

These facts are all entirely expectable based on evolutionary considerations, and they have long been known, both in principle, indirectly, and from detailed mapping of complex traits.  There are other well-known reasons why, among other things, this kind of picture should be expected.  They involve the blatantly obvious redundancy in genetic causation, which is the result of the origin of genes by duplication and the highly complex pathways to our traits, among other things.  We've written about them here in the past.  So, given what we now know, more of this kind of Big Data is a hoax, and as such, a drain on public resources and, perhaps worse, on the public trust in science.

What 'omnigenic' might really mean is interesting.  It could mean that we're pressing up ever more intensely against the log-jam of understanding based on an enumerative gestalt about genetics.  Ever more detail, always promising that if we just enumerate and catalog a bit more (in this case, the authors say we need to study gene regulatory networks) we'll understand.  But that is a failure to ask the right question: why and how could every trait be affected by every part of the genome?  Until someone starts looking at the deeper mysteries we've been identifying, we won't have the transformative insight that seems to be called for, in my view.

To use Kuhn's term, this really is normal science pressing up against a conceptual barrier, in my view.  The authors work the details, but there's scant hint that they recognize we need something more than more of the same.  What is called for, I think, is young people who haven't already been propagandized about the current way of thinking, the current grantsmanship path to careers.

Perhaps more importantly, I think the situation is at present an especially cruel hoax, because there are real health problems, and real, tragic, truly genetic diseases that a major shift in public funding could enable real science to address.

Causal complexity in life

Evolution is the process that generates the relationships between genomes and traits in organisms.  Although we have written extensively and repeatedly about the issues raised by causal complexity,  we were led to write this post by a recent paper, in the 21 October 2016 issue of Science, which discusses molecular pathways to hemoglobin (Hb) gene function.  Although one might expect this to be rather simple and genomically direct, it is in fact complex and there are many different ways to achieve comparable function.

The authors, C Natarajan et al., looked at the genetic basis of adaptation to habitats at different altitudes, focusing on genes coding for Hb molecules, which transport oxygen in the blood to provide the body's tissues with this vital fuel.  As a basic aspect of our atmosphere, oxygen concentrations differ at different altitudes, being low in mountainous regions compared to lowlands.  Species must somehow adapt to their localities, and at least one way to do this is for oxygen transport efficiency mechanisms to differ at different elevations.  Bird species have moved into and among these various environments on many independent occasions.

The affinity of Hb molecules for oxygen, that is, their ability to bind it, depends on their amino acid sequence, and the authors found that this varies by altitude.  The efficiency is similar among species at similar altitudes, even when it arose through independent population expansions.  But when they looked at the Hb coding sequences in different species, they found a variety of species-specific changes.  That is, there are multiple ways to achieve similar function, so that parallel evolution at the functional level, which is what Nature detects, is achieved by many different mutational pathways.  In that sense, while an adaptation can be predicted, a specific genetic reason cannot be.

The authors looked only at coding regions, but of course evolution also involves regulatory sequences (among other functional regions in DNA), so there is every reason to expect that there is even more complexity to the adaptive paths taken.

Important specific documentation....but not conceptually new, though unappreciated
The authors also looked at what they call 'resurrected ancestral' proteins, by experimentally testing the efficacy of some specific Hb mutations, and they found that genomic background made a major difference in how, or whether, a specific change would affect oxygen binding.  This shows that evolution is contingent on local conditions, and that a given genomic change depends on the genomic background.  The ad hoc, locally contingent nature of evolution is (or should be) a central aspect of evolutionary world views, but there is a widespread tendency to think in classical Mendelian terms, of a gene for this and a gene for that, so that one would expect similar results in similar, if independent areas or contexts.  This is a common, if often tacit, view underlying much of genome mapping to find genes 'for' some human trait, like important diseases.  But it is quite misleading, or more accurately, is very wrong.

In 2008 we wrote about this in Genetics, as we've done before and since here on MT and in other papers.  In the 2008 article we used the following image to suggest metaphorically the nature of this complex causation, with its alternative pathways and the like, where the 'trait' is the amount of water passing New Orleans on the Mississippi River.  The figure suggests how difficult it would be to determine 'the' causal source of the water, how many different ways there are to get the same river level.

Drainage complexity as a metaphor for genomic causal complexity.  Map by Richard Weiss and ArcInfo
One can go even further, and note that this is exactly the kind of finding that is to be expected from, and documented by, the huge list of association studies of human traits.  These typically find a great many genome regions whose variation contributes to the trait, usually each with a small individual effect, and mainly at low frequency in the population.  That means that individuals with similar trait values (say, diabetes, obesity, tall or short stature, etc.) have different genotypes, which overlap in incomplete and individually unique ways.

We have written about this aspect of life, in what we called evolution by phenotype, in various places.  Nature screens on traits directly and on genes only very indirectly in most situations in complex organisms.  This means that many genotypes yield the same phenotype; these genotypes will be equivalent in the face of natural selection and will drift relative to one another even under selection, again because selection screens the phenotype.  This is the process we called phenogenetic drift.  These papers were not 'discoveries' of ours but just statements of what is pretty obvious, even if inconvenient for those seeking simple genetic causation.

The Science paper on altitude adaptation shows this with representative sequences, one individual from each of a variety of species, rather than different individuals within each species--but one can expect the same pattern to exist within species as well.  The point is that a priori prediction of how hemoglobin adaptation will occur is problematic, except that each species must have some adaptation to available oxygen.  Parallel phenotype evolution need not be matched by parallel genotypic evolution, because selection 'sees' phenotypes and doesn't 'care' about how they are achieved.

The reason for this complexity is simple: this is how evolution, working via phenotypes rather than genotypes, molds the genetic aspects of causation.

Genomic causation....or not

By Ken Weiss and Anne Buchanan

The Big Story in the latest Nature ("A radical revision of human genetics: Why many ‘deadly’ gene mutations are turning out to be harmless," by Erika Check Hayden) is that genes thought to be clearly causal of important diseases aren't always (the link is to the People magazine-like cover article in that issue).  This is a follow-up on an August Nature paper describing the database from which the results discussed in this week's Nature are drawn.  The apparent mismatch between a gene variant and a trait can be, according to the paper, the result of technical error, a mis-call by a given piece of software, or the assumption that the identification of a given mutation in affected but not healthy individuals means the causal mutation has been found, without experimentally confirming the finding--which itself can be tricky for reasons we'll discuss.  Insufficient documentation of 'normal' sequence variation has meant that the frequency of so-called causal mutations hasn't been available for comparative purposes.  Again, we'll mention below what 'insufficient' might mean, if anything.

People in general and researchers in particular need to be more than dismissively aware of these issues, but the conclusion that we still need to focus on single genes as causal of most disease, that is, do MuchMoreOfTheSame, which is an implication of the discussion, is not so obviously justified.   We'll begin with our usual contrarian statement that the idea here is being overhyped as if it were new, but we know that except for its details it clearly is not, for reasons we'll also explain.  That is important because presenting it as a major finding, and still focusing on single genes as being truly causal vs mistakenly identified, ignores what we think the deeper message needs to be.

The data come from a mega-project known as ExAC, a consortium of researchers sharing DNA sequences to document genetic variation and further understand disease causation, and now including data from approximately 60,000 individuals (in itself rather small compared to the need for this purpose).  The data are primarily exome sequences, that is, from protein-coding regions of the human genome, not from whole genome sequences--again, a major issue.  We have no reason at all to critique the original paper itself, which is large, sophisticated, and carefully analyzed as far as we can tell; but the claims about its novelty are, we think, very much hyperbolized, and that needs to be explained.

Some of the obvious complicating issues
We know that a gene generally does not act alone.  DNA in itself is basically inert.  We've been and continue to be misled by examples of gene causation in which context and interactions don't really matter much, but that leads us still to cling to these as though they are the rule.  This reinforces the yearning for causal simplicity and tractability.  Essentially even this ExAC story, or its public announcements, doesn't properly acknowledge causal context and complexity because it is critiquing some simplistic single-gene inferences, and assuming that the problems are methodological rather than conceptual.

There are many aspects of causal context that complicate the picture, that are not new and we're not making them up, but which the Bigger-than-Ever Data pleas don't address:
1.  Current data are from blood-samples and that may not reflect the true constitutive genome because of early somatic mutation, and this will vary among study subjects,
2.  Life-long exposure to local somatic mutation is not considered nor measured, 
3.  Epigenetic changes, especially local tissue-specific ones, are not included, 
4.  Environmental factors are not considered, and indeed would be hard to consider,
5.  Non-Europeans, and even many Europeans are barely included, if at all, though this is  beginning to be addressed, 
6.  Regulatory variation, which GWAS has convincingly shown is much more important to most traits than coding variation, is not included. Exome data have been treated naively by many investigators as if that is what is important, and exome-only data have been used as a major excuse for Great Big Grants that can't find what we know is probably far more important, 
7.  Non-coding regions and non-regulatory RNA regions are not included in exome-only data,
8.  A mutation may be causal in one context but not in others, in one family or population and not others, rendering the determination that it's a false discovery difficult,
9.  Single gene analysis is still the basis of the new 'revelations', that is, the idea being hinted at that the 'causal' gene isn't really causal....but one implicit notion is that it was misidentified, which is perhaps sometimes true but probably not always so,
10.  The new reports are presented in the news, at least, as if the gene is being exonerated of its putative ill effects.  But that may not be the case, because if the regulatory regions near the mutated gene have no or little activity, the 'bad' gene may simply not be being expressed.  Its coding sequence could falsely be assumed to be harmless, 
11. Many aspects of this kind of work are dependent on statistical assumptions and subjective cutoff values, a problem recently being openly recognized, 
12.  Bigger studies introduce all sorts of statistical 'noise', which can make something appear causal or can weaken its actual apparent cause.  Phenotypes can be measured in many ways, but we know very well that this can be changeable and subjective (and phenotypes are not very detailed in the initial ExAC database), 
13.  Early reports of strong genetic findings have a well-known upward bias in effect size, the 'winner's curse', such that later work fails to confirm them.

Well, yes, we're always critical, but this new finding isn't really a surprise
To some readers we are too often critical, and at least some of us have to confess to a contrarian nature.  But here is why we say that these new findings, like so many that are by the grocery checkout in Nature, Science, and People magazines, while seemingly quite true, should not be treated as a surprise or a threat to what we've already known--nor a justification of just doing more, or much more of the same.

Gregor Mendel studied fully penetrant (deterministic) causation.  That is what we now know as 'genes', in which the presence of the causal allele (in 2-allele systems) always caused the trait (green vs yellow peas, etc.; the same is true of recessive as of dominant traits, given the appropriate genotype).  But this is generally wrong, save at best for exceptions such as those that Mendel himself knowingly and carefully chose to study.  But even this was not so clear!  Mendel has been accused of 'cheating' by ignoring inconsistent results.  This may have been data fudging, but it is at least as likely to have been a reaction to what we have known for a century as 'incomplete penetrance'.  (Ken wrote on this a number of years ago in one of his Evolutionary Anthropology columns.)  For whatever reason--and see below--the presence of a 'dominant' allele or 'recessive' homozygosity at a 'causal' gene doesn't always lead to the trait.

Throughout most of the 20th century, the probabilistic nature of real-world, as opposed to textbook, Mendelism has been well known and accepted.  The reasons for incomplete penetrance were not known, and indeed we had no way to know them as a rule.  Various explanations were offered, but the statistical nature of the inferences (estimates of penetrance probability, for example) was common practice and a textbook standard.  Even the original authors acknowledge incomplete penetrance, which essentially shows that what the ExAC consortium is reporting are details, but nothing fundamentally new or surprising.  Clinicians or investigators acting as if a variant were always causal should be blamed for gross oversimplification, and so should hyperbolic news media.

Recent advances such as genomewide association studies (GWAS) in various forms have used stringent statistical criteria to minimize false discovery.  This has led to mapped 'hits' that satisfied those criteria only accounting for a fraction of estimated overall genomic causation.  This was legitimate in that it didn't leave us swamped with hundreds of very weak or very rare false positive genome locations.  But even the acceptable, statistically safest genome sites showed typically small individual effects and risks far below 1.0. They were not 'dominant' in the usual sense.  That means that people with the 'causal' allele don't always, and in fact do not usually, have the trait.  This has been the finding for quantitative traits like stature and qualitative ones like presence of diabetes, heart attack-related events, psychiatric disorders and essentially all traits studied by GWAS. It is not exactly what the ExAC data were looking at, but it is highly relevant and is the relevant basic biological principle.
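
Penetrance itself is a simple conditional probability, and a toy calculation with invented counts shows how a variant can be a genuine, even strong, risk factor while most of its carriers never develop the trait:

    # Penetrance = probability of the trait given the genotype.  Counts are
    # hypothetical, chosen only to illustrate the arithmetic.
    carriers_affected, carriers_total = 45, 100
    noncarriers_affected, noncarriers_total = 10, 100

    penetrance = carriers_affected / carriers_total            # 0.45, far below 1.0
    baseline = noncarriers_affected / noncarriers_total        # 0.10
    relative_risk = penetrance / baseline                      # 4.5-fold increase
    print(penetrance, relative_risk)
    # A 4.5-fold risk is serious, yet most carriers in this toy example are unaffected.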

This does not necessarily mean that the target gene is not important for the disease trait, which seems to be one of the inferences headlined in the news splashes.  This is treated as a striking or even fundamental new finding, but it is nothing of that sort.  Indeed, the genes in question may not be falsely identified, but may very well contribute to risk in some people under some conditions at some age and in some environments.  The ExAC results don't really address this because (for example) to determine when a gene variant is a risk variant one would have to identify all the causes of 'incomplete penetrance' in every sample, but there are multiple explanations for incomplete penetrance, including the list of 1 - 13 above as well as methodological issues such as those pointed out by the ExAC project paper itself.

In addition, there may be 'protective' variants in the other regions of the genome (that is, the trait may need the contribution of many different genome regions), and working that out would typically involve "hyper astronomical" combinations of effects using unachievable, not to mention uninterpretable, sample sizes--from which one would have to estimate risk effects of almost uncountable numbers of sequence variants.  If there were, say, 100 other contributing genes, each with their own variant genotypes including regulatory variants, the number of combinations of backgrounds one would have to sort through to see how they affected the 'falsely' identified gene is effectively uncountable.
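
The 'hyper astronomical' phrase is not rhetorical exaggeration; the arithmetic is easy to check.  The numbers below are arbitrary round figures used only to show the scale of the combinatorial problem:

    # With 100 contributing genes and just the three possible genotypes at each
    # locus (ignoring regulatory variants entirely), the number of distinct
    # genetic backgrounds is:
    print(3 ** 100)   # about 5 x 10^47, vastly beyond any achievable sample size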

Even the most clearly causal genes such as variants of BRCA1 and breast cancer have penetrance far less than 1.0 in recent data (here referring to lifetime risk; risk at earlier ages is very far from 1.0). The risk, though clearly serious, depends on cohort, environmental and other mainly unknown factors.  Nobody doubts the role of BRCA1 but it is not in itself causal.  For example, it appears to be a mutation repair gene, but if no (or not enough) cancer-related mutations arise in the breast cells in a woman carrying a high-risk BRCA1 allele, she will not get breast cancer as a result of that gene's malfunction.

There are many other examples of mapping that identified genes that even if strongly and truly associated with a test trait have very far from complete penetrance.  A mutation in HFE and hemochromatosis comes to mind: in studies of some Europeans, a particular mutation seemed always to be present, but if the gene itself were tested in a general data base, rather than just in affected people, it had little or no causal effect.  This seems to be the sort of thing the ExAC report is finding.

The generic reason is again that genes, essentially all genes, work only in their context. That context includes 'environment', which refers to all the other genes and cells in the body and the external or 'lifestyle' factors, and also age and sex as well.  There is no obvious way to identify, evaluate or measure the effects of all possibly relevant lifestyle effects, and since these change, retrospective evaluation has unknown bearing on future risk (the same can be said of genomic variants for the same reason).  How could these even be sampled adequately?

Likewise, volumes of long-existing experimental and highly focused results tell the same tale.  Transgenic mice, for example, in which the same mutation is introduced into their 'same' gene as in humans, very often show little or no effect, or only strain-specific effects.  This is true in other experimental organisms.  The lesson, and it's by far not a new or very recent one, is that genomic context is vitally important; that is, it is the person-specific genomic background of a target gene that affects the latter's effect strength--and vice versa: the same is true for each of these other genes.  That is why we have so long noted the legerdemain being foisted on the research and public communities by the advocates of Big Data statistical testing.  Certainly methodological errors are also a problem, as the Nature piece describes, but they aren't the only problem.

So if someone reports some cases of a trait that seem too often to involve a given gene, such as the Nature piece seems generally to be about, but searches of unaffected people also occasionally find the same mutations in such genes (especially when only exomes are considered), then we are told that this is a surprise.  It is, to be sure, important to know, but it is just as important to know that essentially the same information has long been available to us in many forms.  It is not a surprise--even if it doesn't tell us where to go in search of genetic, much less genomic, causation.

Sorry, though it's important knowledge, it's not 'radical' nor dependent on these data!
The idea being suggested is that (surprise, surprise!) we need much more data to make this point or to find these surprisingly harmless mutations.  That is simply a misleading assertion, or attempted justification, though it has become the intentional industry standard closing argument.

It is of course very possible that we're missing some aspects of the studies and interpretations that are being touted, but we don't think that changes the basic points being made here.  They're consistent with the new findings but show that for many very good reasons this is what we knew was generally the case, that 'Mendelian' traits were the exception that led to a century of genetic discovery but only because it focused attention on what was then doable (while, not widely recognized by human geneticists, in parallel, agricultural genetics of polygenic traits showed what was more typical).

But now, if things are being recognized as being contextual much more deeply than in Francis Collins' money-strategy-based Big Data dreams, or 'precision' promises, and our inferential (statistical) criteria are properly under siege, we'll repeat our oft-stated mantra: deeply different, reformed understanding is needed, and a turn to research investment focused on basic science rather than exhaustive surveys, and on those many traits whose causal basis really is strong enough that it doesn't require this deeper knowledge.  In a sense, if you need massive data to find an effect, then that effect is usually very rare and/or very weak.

And by the way, the same must be true for normal traits, like stature, intelligence, and so on, for which we're besieged with genome-mapping assertions, and this must also apply to ideas about gene-specific responses to natural selection in evolution.  Responses to environment (diet etc.) manifestly have the same problem.  It is not just a strange finding of exome mapping studies for disease.  Likewise, 'normal' study subjects now being asked for in huge numbers may get the target trait later on in their lives, except for traits basically present early in life.  One can't doubt that misattributing the cause of such traits is an important problem, but we need to think of better solutions than Big Big Data, because not confirming a gene doesn't help, and finding that 'the' gene is only 'the' gene in some genomic or environmental backgrounds is the proverbial and historically frustrating needle-in-the-haystack search.  So the story's advocated huge samples of 'normals' (random individuals) cannot really address the causal issue definitively (except to show what we know, that there's a big problem to be solved).  Selected family data may--may--help identify a gene that really is causal, but even they have some of the same sorts of problems.  And they may apply only to that family.

The ExAC study is focused on severe diseases, which is somewhat like Mendel's selective approach, because it is quite obvious that complex diseases are complex.  It is plausible that severe, especially early onset, diseases are genetically tractable, but it is not obvious that ever more data will answer the challenge.  And, ironically, the ExAC study has removed just such diseases from its consideration!  So they're intentionally showing what is well known, that we're in needle-in-a-haystack territory, even when someone has reported big needles.

Finally, we have to add that these points have been made by various authors for many years, often based on principles that did not require mega-studies to show.  Put another way, we had reason to expect what we're seeing, and years of studies supported that expectation.  This doesn't even consider the deep problems about statistical inference that are being widely noted and the deeply entrenched nature of that approach's conceptual and even material invested interests (see this week's Aeon essay, e.g.).  It's time to change, but doing so would involve deeply revising how resources are used--of course one of our common themes here on the MT--and that is a matter almost entirely of political economy, not science.  That is, it's as much about feeding the science industry as it is about medicine and public health.  And that is why it's mainly about business as usual rather than real reform.

Another look at 'complexity'

A fascinating and clear description of one contemporary problem of sciences involved in 'complexity' can be found in an excellent discussion of how brains work, in yesterday's Aeon Magazine essay ("The Empty Brain," by Robert Epstein).  Or rather, of how brains don't work.  Despite the ubiquity of the metaphor, brains are not computers.  Newborn babies, Epstein says, are born with brains that can learn, respond to the environment and change as they grow.
But here is what we are not born with: information, data, rules, software, knowledge, lexicons, representations, algorithms, programs, models, memories, images, processors, subroutines, encoders, decoders, symbols, or buffers – design elements that allow digital computers to behave somewhat intelligently. Not only are we not born with such things, we also don’t develop them – ever.
We are absolutely unqualified to discuss or even comment on the details of the neurobiology discussed.  Indeed, even the author himself doesn't provide any sort of explanation of how brains actually work, using general hand-waving terms that are almost tautologically true, as when he says that experiences 'change' the brain.  This involves countless neural connections (it must, since what else is there in the brain that is relevant?), and would be entirely different in two different people.

In dismissing the computer metaphor as a fad based on current culture, which seems like a very apt critique, he substitutes vague reasons without giving a better explanation.  So, if we don't somehow 'store' an image of things in some 'place' in the brain, somehow we obviously do retain abilities to recall it.  If the data-processing imagery is misleading, what else could there be?

We have no idea!  But one important thing that this essay reveals is that the problem of understanding multiple-component phenomena is a general one.  The issues with the brain seem essentially the same as the issues in genomics, which we write about all the time, in which causation of the 'same' trait in different people is not due to the same causal factors (and we are struggling to figure out what those factors are in the first place).

A human brain, but what is it?  Wikipedia

In some fields, like physics, chemistry, and cosmology, each item of a given kind--an electron, a field, a photon, a mass--is identical, and their interactions are replicable (if current understanding is correct).  For complexities like the interactions or curves of motion among many galaxies, each with many stars, planets, and interstellar material and energy, the computational and mathematical details are far too intricate and extensive for simple solutions.  So one has to break the pattern down into subsets and simulate them on a computer.  This seems to work well, however, and the reason is that the laws of behavior in physics apply equally to every object or component.

Biology is composed of molecules, and at their level of course the same must be true.  But at anything close to the level of our needs for understanding, replicability is often very weak, except in the general sense that each person is 'more or less' alike in physiology, neural structures, and so on.  But at the level of underlying causation, we know that we're generally each different, often in ways that are important.  This applies to normal development, health, and even to behavior.  Evolution works by screening differences, because that's how new species and adaptations and so on arise.  So it is difference that is fundamental to us, and part of that is that each individual with the 'same' trait has it for different reasons.  They may be nearly the same or very different--we have no a priori way to know, no general theory that is of much use in predicting, and we should stop pouring resources into projects to nibble away at tiny details, a convenient distraction from the hard thinking that we should be doing (as well as from addressing many clearly tractable problems in genetics and behavior, where causal factors are strong and well known).

What are the issues?
There are several issues here and it's important to ask how we might think about them.  Our current scientific legacy has us trying to identify fundamental causal units, and then to show how they 'add up' to produce the trait we are interested in.  'Add up' means they act independently and each may, in a given individual, have its own particular strength (for example, variants at multiple contributing genes, with each person carrying a unique set of variants, and the variants having some specifiable independent effect).  When one speaks of 'interactions' in this context, what is usually meant is that (usually) two factors combine beyond just adding up.  The classical example within a given gene is 'dominance', in which the effect of the Aa genotype is not just the sum of the A and the a effects.  Statistical methods allow for two-way interactions in roughly this way, by including terms like z·A·B (some quantitative coefficient times the A and the B states in the individual), assuming that this is the same in every A-B instance (z is constant).
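
A generic sketch of that formulation, with arbitrary coefficients, makes the assumptions visible: additive terms, one pairwise interaction with a single constant coefficient z, and nothing else--no higher-order terms, no feedback, no person-specific coefficients:

    # Toy additive-plus-interaction model of the kind statistical packages fit.
    # All coefficients are invented; the structure, not the numbers, is the point.
    def predicted_trait(A, B, beta_A=0.4, beta_B=0.2, z=0.15, baseline=1.0):
        # additive parts plus one two-way interaction term with a constant z,
        # assumed identical for every individual
        return baseline + beta_A * A + beta_B * B + z * A * B

    print(predicted_trait(A=1, B=2))   # 1.0 + 0.4 + 0.4 + 0.3 = 2.1 with these toy values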

This is very generic (not based on any theory of how these factors interact), but for general inference that they do act in relevant ways, it seems fine.  Theories of causality invoke such patterns as paths of factor interaction, but they almost always assume various simplifications: that interactions are only pair-wise, that there is no looping (the presence of A and B sets up the effect, but A and B don't keep interacting in ways that might change that, and there's no feedback from other factors), and that the sizes of effects are fixed rather than being different in each individual context.

For discovery purposes this may be fine in many multivariate situations, and that's what the statistical package industry is about. But the assumptions may not be accurate and/or the number and complexity of interactions too great to be usefully inferred in practical data--too many interactions for achievable sample sizes, their parameters being affected by unmeasured variables, their individual effects too small to reach statistical 'significance' but in aggregate accounting for the bulk of effects, and so on.
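
Again the scale is easy to put in numbers.  The factor counts below are arbitrary round figures; the point is only how fast the number of interaction terms to be estimated grows relative to any realistic sample:

    # Counting interaction terms among a modest set of candidate factors.
    from math import comb
    print(comb(1000, 2))   # 499,500 possible two-way terms among 1,000 factors
    print(comb(1000, 3))   # 166,167,000 possible three-way terms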

These are not newly discovered issues, but often they can only be found by looking under the rug, where they've been conveniently swept because our statistical industry doesn't and cannot adequately deal with them.  This is not a fault of the statistics except in the sense that they are not modeling things accurately enough, and in really complex situations, which seem to be the rule rather than the exception, it is simply not an appropriate way to make inferences.

We need, or should seek, something different.  But what?
Finding better approaches is not easy, because we don't know what form they should take.  Can we just tweak what we have, or are we asking the wrong sorts of questions for the methods we know about?  Are our notions of causality somehow fundamentally inadequate?  We don't know the answers.  But what we now do have is a knowledge of the causal landscape that we face.  It tells us that enumerative approaches are what we know how to do, but which we also know are not an optimal way to achieve understanding.  The Aeon essay describes yet another such situation, so we know that we face the same sort of problem, which we call 'complexity' as a not very helpful catchword, in many areas.  Modern science has shown this to us.  Now we need to use appropriate science to figure it out.

Statistical Reform.....or Safe-harbor Treadmill Science?

We have recently commented on the flap in statistics circles about the misleading use of significance test results (p-values) rather than a more complete and forthright presentation of the nature of the results and their importance (three posts, starting here).  There has been a lot of criticism of what boils down to misrepresentative headlines publicizing what are in essence very minor results.  The American Statistical Association recently published a statement about this, urging clearer presentation of results.  But one may ask about this and the practice in general. Our recent set of posts discussed the science.  But what about the science politics in all of this?

The ASA is a trade organization whose job, in essence, is to advance the cause and use of statistical approaches in science.  The statistics industry is not a trivial one.  There are many companies that make and market statistical analytic software.  Then there are the statisticians themselves and their departments and jobs.  So one has to ask whether the ASA statement and the other hand-wringing are sincere and profound, or to what extent this is a vested interest protecting its interests.  Is it a matter of finding a safe harbor in a storm?

Statistical analysis can be very appropriate and sophisticated in science, but it is also easily mis- or over-applied.  Without it, it's fair to say that many academic and applied fields would be in deep trouble; sociopolitical sciences and many biomedical sciences as well fall into this category.  Without statistical methods to compare and contrast sampled groups, these areas rest on rather weak theory.  Statistical 'significance' can be used to mask what is really low level informativeness or low importance under a patina of very high quantitative sophistication.  Causation is the object of science, but statistical methods too often do little more than describe some particular sample.
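
The gap between 'significant' and 'important' is easy to demonstrate with a toy example.  The means, standard deviations, and sample sizes below are invented; the point is that with a large enough sample, a trivially small group difference produces a p-value that looks impressive:

    # Hypothetical two-group comparison: a 0.1-point difference on a scale with
    # standard deviation 15 is negligible, yet highly 'significant' at this n.
    from scipy.stats import ttest_ind_from_stats
    result = ttest_ind_from_stats(mean1=100.0, std1=15.0, nobs1=500_000,
                                  mean2=100.1, std2=15.0, nobs2=500_000)
    print(result.pvalue)   # well below 0.05, despite an effect nobody would care about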

When a problem arises, as here, there are several possible reactions.  One is to stop and realize that it's time for deeper thinking: that current theory, methods, or approaches are not adequately addressing the questions that are being asked.  Another reaction is to do public hand-wringing and say that what this shows is that our samples have been too small, or our presentations not clear enough, and we'll now reform.  

But if the effects being found are, as is the case in this controversy, typically very weak and hence not very important to society, then the enterprise and the promised reform seem rather hollow. The reform statements have had almost no component that suggests that re-thinking is what's in order. In that sense, what's going on is a stalling tactic, a circling of wagons, or perhaps worse, a manufactured excuse to demand even larger budgets and longer-term studies, that is to demand more--much more--of the same.

The treadmill problem

If that is what happens, it will keep scientists and software outfits and so on on the same treadmill they've been on, the one that has led to the problem.  It will also be contrary to good science.  Good science should be forced by its 'negative' results to re-think its questions.  This is, in general, how major discoveries and theoretical transformations have occurred.  But with the corporatization of academic professions, both commercial and in the sense of trade unions, we have an inertial factor that may actually impede real progress.  Of course, those dependent on the business will vigorously resist or resent such a suggestion.  That's normal and can be expected, but it won't help unless a spirited attack on the problems at hand goes beyond more-of-the-same.

Is it going to stimulate real new thinking, or mainly just strategized thinking for grants and so on?

So is the public worrying about this a holding action or a strategy?  Or will we see real, rather than just symbolic, pro forma, reform?  Based on the way things work these days, the likelihood is that we won't.

There is a real bind here. Everyone depends on the treadmill and keeping it in operation. The labs need their funding and publication treadmills, because staff need jobs and professors need tenure and nice salaries. But if by far most findings in this arena are weak at best, then what journals will want to publish them? They have to publish something and keep their treadmill going. What news media will want to trumpet them, to feed their treadmill? How will professors keep their jobs or research-gear outfits sell their wares?

There is fault here, but it's widespread, a kind of silent conspiracy, and not everyone is even aware of it.  It's been built up gradually over the past few decades, like the frog in slowly heating water who doesn't realize he's about to be boiled alive.  We wear the chains we've forged in our careers.  It's not just a costly matter, and one of understandable careerism.  It's a threat to the integrity of the enterprise itself.
We have known many researchers who have said they have to be committed to a genetic point of view because that's what you have to do to get funded, to keep your lab going, to get papers in the major journals, or to have a prominent, influential career.  One person applying for a gene mapping study to find even lesser genomic factors than the few that were already well-established said, when it was suggested that rather than find still more genes, perhaps the known genes might now be investigated instead, "But, mapping is what I do!"  Many a conversation I've heard has been quiet boasting about applying for funding for work that's already been done, so one can try something else (something not being proposed for reviewers to judge).

If this sort of 'soft' dishonesty is part of the game (and that's if you think it's only 'soft'), and yet science depends centrally on honesty, why do we think we can trust what's in the journals?  How many seriously negating details are not reported, or buried in huge 'supplemental' files, or not visible because of intricate data manipulation?  Gaming the system undermines the very core of science: its integrity.  Laughing about gaming the system adds insult to injury.  But gaming the system is being taught to graduate students early in their careers (it's called 'grantsmanship').


We have personally encountered this sort of attitude, expressed only in private of course, again and again over the last couple of decades, during which big studies and genetic studies have become the standard operating mode in universities, especially in biomedical science (it's rife in other areas, like space research, too, of course).


There's no bitter personal axe being ground here.  I've retired, had plenty of funding through the laboratory years, and our work was published and recognized.  The problem is one of science, not a personal one.  The challenge to understand genetics, development, causation and so forth is manifestly not an easy one, or these issues would not have arisen.

It's only human, perhaps, given that the last couple of generations of scientists systematically built up an inflated research community, and the industries that serve it, much of which depends on research grant funding, largely at the public trough, with jobs and labs at stake.  The members of the profession know this, but are perhaps too deeply immersed to do anything major to change it, unless some sort of crisis forces that upon us. People well-heeled in the system don't like these thoughts being expressed, but all but the proverbial 1%-ers, cruising along just fine in elite schools with political clout and resources, know there's a problem and know they dare not say too much about it.


The statistical issues are not the cause.  The problem is a combination of the complexity of biological organisms as they have evolved, and the simplicity of human desires to understand (and not to get disease).  We are pressured not just to understand, but to translate that into dramatically better public and individual health.  Sometimes it works very well, but we naturally press the boundaries, as science should.  But in our current system we can't afford to be patient.  So, we're on a treadmill, but it's largely a treadmill of our own making.

We're all fundamentalists now

If you're a foodie at all, you've heard of Yotam Ottolenghi, chef, restaurateur, and food writer.  Perhaps you've used some of his recipes, or even have one or more of his cookbooks.  And, if you're a fan you'll be happy to know that the Jan 11 episode of the Food Programme on BBC Radio 4 has an interview with him (starting at minute 7:15), and a brief overview of how he got to such a place of prominence in the food world.

Ottolenghi and his business partner, Sami Tamimi, both come from Jerusalem, Ottolenghi from the Jewish west side and Tamimi from the Arab east.  They both now live in London, where they have collaborated since the late 1990s on restaurants and delis and cookbooks, much of it with the aim of highlighting the food of their childhood.  Not only is their food amazing, but it's also worth noting that two men from two sides of the same strife-ridden Middle Eastern city have worked closely together for many years.  This isn't something that everyone could do.

One of the cookbooks Ottolenghi and Tamimi wrote together is called Jerusalem, written to accompany a BBC television program of the same name. For the show, they returned to their birthplace and described and prepared some of their favorite foods, but it wasn't just about the food.  Tamimi said it was difficult to return. He believes that people were much more naive when he was a child, having faith that the conflict between Israel and Palestine could be solved. Now, he says, people are much more entrenched in their belief in the rightness of their side, and it's much more difficult to imagine the differing sides agreeing on a solution.

To us in the West, the Middle East epitomizes fundamentalism, strict adherence to the literal interpretation of a religious text or dogma.  And, fundamentalism goes hand in hand with terrorism. Fundamentalism is our enemy.

But, it's not just in the Middle East that people are more entrenched in their beliefs about right and wrong. Here in the US we've got the Tea Party dictating what real conservatism is, we've got militiamen in Oregon, and homegrown 'terrorists' demanding whatever they're demanding. We've got a Congress that agrees only to disagree. Dare I say it, even the 'new atheists' are fundamentalists. Indeed, compromise has become a dirty word, immoral even. In so many ways, moderation, the ability to see more than one side of an issue, has lost its way.

Ken's view is that in a world in which fundamentalists are now our enemy, we've all become fundamentalists; we know what we believe, we hold to those beliefs without question, and we have no respect for the other side.  If there is a strongly ideological force that you disagree with or that threatens you, it pushes you toward an equal and opposite ideology. You listen only to Fox News or MSNBC, you turn off the radio when Trump comes on, or when Clinton comes on, depending on your predilection -- or you're waiting for the Libertarian candidate to be selected, and there's no way you'll listen to anyone else.

If you're here reading this it's likely that you've also picked a side in the nature/nurture 'debate'; 'genetic determinism' either nicely describes your view of biology, or you're very uncomfortable with the term. Genes will or will not be found 'for' most traits, including behaviors, and diseases will or won't be predictable once we've all got our genomes on a CD.

We've said this before but it's worth repeating.  In 1926, one of the great early geneticists, Thomas Hunt Morgan, wrote this about stature:
A man may be tall because he has long legs, or because he has a long body, or both. Some of the genes may affect all parts, but other genes may affect one region more than another. The result is that the genetic situation is complex and, as yet, not unraveled. Added to this is the probability that the environment may also to some extent affect the end-product.
                                  (TH Morgan, The Theory of the Gene, p 294, 1926)
Morgan would be totally comfortable with the recent GWAS results showing that there are hundreds if not thousands of genes that contribute to stature, as well as environmental factors.  He'd agree that complex traits, like stature, or many diseases (including schizophrenia, which Ken will talk about tomorrow) are polygenic, with some environmental effect.  This has been known for almost a century. So why are people still looking for genes (meaning single genes, or a few genes with individually strong effects) 'for' type 2 diabetes, or heart disease, or stature, or schizophrenia?  Why don't we still know what Morgan knew so long ago?


Because sometimes it's not true.  Sometimes there are single genes whose variants are by themselves responsible for traits, including disease.  Starting in the early 1980s, the role of single genes in various traits began to be discovered: oncogenes, Huntington's, cystic fibrosis, breast cancer, a whole host of single-gene pediatric diseases, and normal traits as well, like blood types, eye color and so on.  There are now about 6000 rare diseases for which genetic causation appears to be known, or at least claimed.  This history of successes misled geneticists, we would say, and others, into assuming they could always expect to find 'the' gene for this and 'the' gene for that.  In essence it is still the informal working model, in the back of geneticists' heads, that everything segregates like Mendel's pea traits.

We can and do have both -- single-gene traits and complex traits due to many genes, or many genes and environmental factors too.  Indeed, there are also traits that are completely environmental -- look at the havoc Zika virus seems to be wreaking, with apparently no help from genes, even if close examination might find some people to be slightly more immune than others. Most viruses are like that.

So, it's curious that even the field of genetics has its fundamentalists.  Every time Ken and I write about complexity, or insufficient understanding of disease causation, or question how we know what we think we know, someone will send us a link to a paper that shows we're wrong because autism, or schizophrenia, or intelligence, or whatever their favorite trait, has been shown to be clearly genetic. Genes, with names, have been found to explain it. Sometimes the comments are so emotionally unrestrained that you'll never see them because we don't publish them.

And, we'll often or even typically look at the paper and realize that we've been reprimanded by a fundamentalist yet again. Autism, schizophrenia, heart disease, stature, intelligence, and so on are just not yet predictable from genes, and, we believe, are unlikely ever to be for reasons we write about all the time.  Ken will discuss the new Nature paper on schizophrenia tomorrow, a paper that got huge amounts of press for finally beginning to explain the disease.  Yes, a paper someone offered to send us when they disapproved of a post Ken had written about the difficulties of predicting disease, proving he was wrong. Which, good as that paper may be, is not the case.

I think if Morgan were to come back to the modern field of genetics, he'd feel as Sami Tamimi did returning to Jerusalem.  I think he'd be nostalgic for his era, when fundamentalists didn't rule the field and geneticists weren't prisoners of Mendel, ideologues who knew what they'd find before they even looked; when, even if things seem rosier in retrospect, and certainly people had preferred views and were not always nice to each other, there was more agreement that things were not yet clearly understood, and complexity was not a dirty word.  I think Morgan would appreciate that some traits are explained more simply than others, but that even those aren't 'simple' -- there are more than 2000 alleles in the CFTR gene that seem to be associated with cystic fibrosis, and this kind of complexity is true of most 'simple' traits.

So, why did the field lose this understanding, and take a turn to fundamentalism?  The answer isn't just that we're in a fundamentalist age, of course.  That it's a lot easier to sell the search for a causal gene than a search for.....we're not really sure what, is a large part of the problem.  But, as a friend says, we should be looking for the molasses that explains biological complexity, that connects causal pathways and processes, which ain't just gonna be a gene, or an environmental risk factor.  It's going to be something we don't yet understand, and continuing to look for 'the' gene for your favorite complex trait is only going to slow down the search.  Acknowledging that what we've learned, and confirmed over and over again since Mendel was rediscovered in 1900, is that most traits are complex -- and unpredictable -- is a crucial step.

Life in 'trans'-it: Why genomic causation is often so elusive

We are in a time when genes are in the daily news, with reports of how this gene or that gene is related to disease, evolution, race, ancestry, and even social behavior.  But what are 'genes', and what do they do?  This is so often presented--in classes, even at higher levels of education--as a simple story in which genes are bits of DNA that code for a protein, and proteins are the molecules that do the functions of life.  We are still heavily influenced by the pioneering work of Gregor Mendel, who did his famous experiments with peas more than 150 years ago.  So, we still think of genes as elements with one or more variant states in a population, transmitted from parents to offspring, which cause some trait (he studied traits like size, shape, or color in his pea plants, trying to use this fact to breed better agricultural crops).

Mendel's intentionally focused, single-cause approach opened the way for an understanding of the mechanisms of inheritance and enabled one of the most powerful research strategies in all of science. But the idea of one gene and one function is a 19th century legacy that has put a conceptual cage around our thinking ever since.  Mendelian inheritance and its terms (like dominance and recessiveness, and even some of his notation) are still around, and indeed it all is rather ubiquitous even at the university level.  But we now know better, and can do better, and the many discoveries of the last century in biology and genetics present us with many 'mysterious' facts, basically unanticipated by the long, persistent shadow of Mendel's well-chosen simplifications.  It requires some thinking outside the Mendelian box to understand what they might mean.  

The cis image of the world
DNA is located in the nucleus of our cells, but where does genetic function take place?  The usual Mendelian way of thinking is that the action occurs in the particular place in our DNA where a 'gene' is.  The gene codes for a protein and (usually) has nearby DNA sequences that regulate the gene's usage--turning on its expression by transcribing the gene into messenger RNA.  That is, the gene itself determines how it's used.  It's in a given place in our DNA, and a complex of regulatory proteins that attach to nearby sequence causes the gene to be transcribed into messenger RNA, which exits the nucleus and is in turn translated into an amino acid chain specified by the sequence.  The amino acid chain is then folded up into a functional protein.

This local, focal view of gene action is what is called a cis perspective.  The Latin origin has a meaning like 'right here', or 'on this side'.  The specifics of this process differ depending on the gene, as no two genes work exactly alike, but the variation in the details is not central to the main point here: the widespread perception of genes as modular, chromosomally local, self-standing functional units.

But this common idea of how genes work is fundamentally inaccurate as a way to understand genes and genomic function.

The fundamental nature of life in trans-it
DNA is itself an essentially inert molecule.  It doesn't do anything by itself.  In turn that means that each nucleotide, and hence each new mutational change, cannot be said to have a function or effect, or effect size, on its own.  It only has an effect in terms of its interactions with other parts of the genome in the same cell, other materials in that cell, that cell in its respective organ, that organ in the organism as a whole, and indeed all of this in relation to environmental factors.  While some gene-regulatory regions are near a coding gene, and act in cis, most function involves things elsewhere, on the same chromosome or on others.  This is the trans causal world of life, and it means we cannot really understand what's 'here' without knowing what's elsewhere.

Indeed even Darwinian evolution is fundamentally an ecological phenomenon--it's about organisms' resources, threats, mates, and so on, at any given time.  Along with luck, there may be many levels and aspects of life that involve competition for resources and that matter for survival and reproduction.  But cooperating, in the sense of appropriate interaction, is by far the most prevalent, immediate, and vital aspect of life (Richard Dawkins' ideological 'selfish gene' excesses notwithstanding).

Trans means cooperation in life and evolution
Trans interactions are just that: interactions.  That means multiple components working together, which involves the 'right' combinations in the 'right' time and the 'right' cellular place.  By 'right' I mean functionally viable.  During development and subsequent life, organisms require suitable expression patterns of genes and suitable dispersion and processing of gene products.  If this combinatorial action--this cooperation--doesn't occur to a suitable degree, the organism fails and its reproduction is reduced.  The extent of this failure depends on the nature of the combinatorial action.

In this sense, trans interactions may be reproductively better or worse, and that can be a form of natural selection whose result is that the 'better' (more viably successful) patterns proliferate.  But this does not require Darwinian selection among organisms competing for limited resources.  Genomic variants whose cooperative interactions do not function can lead to embryonic lethality, for example, which need have nothing whatever to do with competition, and certainly not with other organisms seeking mates, food, or safety.  Ineffective cooperation is an evolutionary factor not identical to natural selection in its mechanism, but with similarly 'adaptive' effects.

In our view, cooperation based on trans interactions is more important, more prevalent, and more fundamental than Darwinian natural selection (as we write in our book The Mermaid's Tale).  Interactions that are successful become increasingly installed in the life history of organisms ('canalized' to use CH Waddington's venerable term for it), and this constrains the way and perhaps the rate at which evolution can occur.  This is neither heresy nor surprise.  For example, genes present today are the descendants of 4 billion years of evolutionary history, and most are used in multiple ways in the organism (at least in complex multicellular organisms; we don't know how true this is of simple or single-celled species).  They are less likely to suffer mutational change without serious effect, mainly negative. This is a very long-established idea, and is clearly supported by the high degree of sequence conservation of genes in genomes.

Genomewide mapping of most traits identifies many different genome regions that can statistically affect a trait's presence or measure.  But mapping rarely identifies coding regions.  Most 'hits' are in regulatory regions or regions with other (usually unknown) function.

This should surprise no one.  First, as noted above, 'genes' (protein-coding regions) are largely of long evolutionary standing and embedded in interaction patterns, usually in multiple contexts (they are 'pleiotropic'), so the coding parts are harder than regulatory parts to modify viably by mutation.  It is empirically much more likely that their expression patterns can be varied.  Second, every gene is a complex of many different components (protein code, splice and polyadenylation signals--where the required AAAAA... tail of an mRNA molecule is attached--promoter sites, enhancer sites, and so on).  Each of these is mutable in principle, and ample evidence shows that regulatory regions are especially so.  And each transcription factor or other gene product that is needed to activate a given gene (that is, the tens of proteins and their DNA binding sites that must assemble to cause a nearby gene to be expressed) is itself a gene with all the same sorts of complex modular structures.  RNA has to be processed, transported and translated by factors that, again, are potentially mutable.  And so on.  And then most final functions--physiological, developmental, metabolic, or physical--are the result of complex processes over time, involving many genes and systems.

In fact, in recognition of biological complexity, many investigators suggest that the proper level of analysis should be systems, that is, organized pathways of interaction that bring about some end result.  Gene regulation, physiology and metabolism, and so on, represent such entities.  The 'emergence' of the result cannot be predicted by listing the individual contributing elements, in the same sense that the effect of a new mutational change cannot be understood without considering its context.  However, systems themselves have overlap, redundancy, and elements that contribute to different systems at different times, and many systems may themselves interact in what one might call hyper-systems for a result--like you--to come about.  Analyzing emergent systems is at present an active but in many ways immature endeavor, because we still probably don't have adequate understanding, or perhaps even adequate technology, for the job.  But it's important that people are considering the trans world in this and other ways.

Causal complexity is predictable, and what we expect is what we see
Causation in life is fundamentally about cooperation, which is about trans interactions.  Because cells are isolated from each other, so that each senses and responds to its own environment, they actively signal to each other, and a major way gene expression is regulated is through complex signal-sending and -receiving mechanisms.  'Signals' can mean gene-coded proteins secreted from cells, or the detection by cells of ions or other chemicals in their environment, and so on.  Signaling and responding to environmental conditions involves large numbers of genes and their regulation in time and space.  Most genes, in fact, have such cooperative, communicative functions.

In turn, this implies that traits have many contributing genes, and their modular coding and regulatory sequences (and other forms of genome function, such as packaging and many different types of RNA), and each of these is potentially mutable and potentially variable within and between samples, populations, and species.  The result is the high level of causal complexity that is being so clearly documented.  A very large amount of viable contributing variation can be expected, if the individual variants have small effect.  The trait itself must be viable, but viability can coexist with large amounts of variation in the hundreds of contributing components.  This is what GWAS consistently finds, and is wholly consistent with how evolution works.
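As a purely illustrative sketch (not any particular study's model, and with invented parameters), the following Python simulation shows how a trait built from many variants of individually tiny effect, plus environment, yields exactly the picture described above: each variant explains a sliver of the variance, yet no single one 'causes' the trait.

```python
# Toy polygenic simulation (all parameters are assumptions for illustration).
import numpy as np

rng = np.random.default_rng(1)

n_people, n_variants = 10_000, 1_000
allele_freqs = rng.uniform(0.05, 0.95, n_variants)                  # assumed frequencies
genotypes = rng.binomial(2, allele_freqs, (n_people, n_variants))   # 0/1/2 allele counts
effects = rng.normal(0.0, 0.03, n_variants)                         # many tiny per-variant effects
environment = rng.normal(0.0, 1.0, n_people)                        # non-genetic contribution

trait = genotypes @ effects + environment

# Variance explained by each variant on its own: individually trivial, collectively real.
per_variant_r2 = np.array([np.corrcoef(genotypes[:, j], trait)[0, 1] ** 2
                           for j in range(n_variants)])
print(f"median variance explained per variant: {np.median(per_variant_r2):.5f}")
```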

Life is complex in these ways for very understandable (and predictable) reasons.  Enumerating causes, or even defining 'causes', is often a fool's errand, because different variants in different genome regions in different samples and populations are to be expected.

It's a highly cooperative trans world out there!

How do we know what we think we know?

Two stories collided yesterday to make me wonder, yet again, how we know what we think we know.  The first was from the latest BBC Radio 4 program The Inquiry, an episode called "Can we learn to live with nuclear power?" which discusses the repercussions of the 2011 disaster in the Fukushima nuclear power plant in Japan. It seems that some of us can live with nuclear power and some of us can't, even when we're looking at the same events and the same facts.  So, for example, Germans were convinced by the disaster that nuclear power isn't reliably safe and so they are abandoning it, but in France, nuclear power is still an acceptable option.  Indeed most of the electricity in France comes from nuclear power.

Why didn't the disaster convince everyone that nuclear power is unsafe?  Indeed, some saw the fact that there were no confirmed deaths attributable to the disaster as proof that nuclear power is safe, while others saw the whole event as confirmation that nuclear power is a disaster waiting to happen.  According to The Inquiry, a nation's history has a lot to do with how it reads the facts.  Germany's history is one of division and war, with nuclear power associated with bombs, whereas French researchers and engineers have long been involved in the development of nuclear power, so there's a certain amount of national pride in this form of energy.  It may not be an unrelated point that many people in France therefore have vested interests in nuclear power.  Still, same picture, different reading of it.

Cattenom nuclear power plant, France; Wikipedia


Reading ability is entirely genetic
And, I was alerted to yet another paper reporting that intelligence is genetic (h/t Mel Bartley); this time it's reading ability, for which no environmental effect was found (or acknowledged).  (This idea of little to no environmental effect is an interesting one, though, given that the authors, who are Dutch, report that heritability of dyslexia and reading fluency is higher among Dutch readers -- 80% compared with 45-70% elsewhere -- which they suggest is because Dutch orthography is simpler than that of English.  That sounds like an environmental effect to me.)

The authors assessed reading scores for twins, parents and siblings, and used these to evaluate additive and non-additive genetic effects, and family environmental factors.  As far as I can tell, subjects were asked to read aloud from a list of Dutch words, and the number they read correctly within a minute constituted their score.  And again, as far as I can tell, they did not test for nor select for children or parents with dyslexia, but they seem to be reporting results as though they apply to dyslexia.

The authors report a high correlation in reading ability between monozygotic twins, a lower correlation between dizygotic twins, and between twins and siblings, and a higher correlation between spouses, which to the authors is evidence of assortative mating (choice of mate based on traits associated with reading ability).  They conclude:
Such a pattern of correlation among family members is consistent with a model that attributes resemblance to additive genetic factors, these are the factors that contribute to resemblance among all biological relatives, and to non-additive genetic factors. Non-additive genetic factors, or genetic dominance, contributes to resemblance among siblings, but not to the resemblance of parents and offspring.  Maximum likelihood estimates for the additive genetic factors were 28% (CI: 0–43%) and for dominant genetic factors 36% (CI: 18–65%), resulting in a broad-sense heritability estimate of 64%. The remainder of the variance is attributed to unique environmental factors and measurement error (35%, CI: 29–44%).
Despite this evidence for environmental effect (right?), the authors conclude, "Our results suggest that the precursors for reading disability observed in familial risk studies are caused by genetic, not environmental, liability from parents. That is, having family risk does not reflect experiencing a less favorable literacy environment, but receiving less favorable genetic variants."

The ideas about additivity are technical and subtle.  Dominance effects, that is, non-additive interactions among the alleles within a gene in an individual's diploid copies, are not inherited the way additive effects are (if you are Dd and that combination determines your trait, only one of those alleles--not enough to determine the trait--is transmitted to any of your offspring).  Likewise, interactions between loci, called epistasis, are also not directly transmitted.
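For readers who want the bookkeeping behind the quoted percentages spelled out, the standard quantitative-genetics decomposition looks roughly like this (the partition into components is a model fitted to familial correlations, not a direct observation):

$$V_P = V_A + V_D + V_E, \qquad H^2 = \frac{V_A + V_D}{V_P} \approx 0.28 + 0.36 = 0.64, \qquad h^2 = \frac{V_A}{V_P} \approx 0.28$$

Only the narrow-sense (additive) part, $h^2$, straightforwardly predicts parent-offspring resemblance; the dominance part contributes to sibling resemblance but is not transmitted from parent to child in the same way, and $V_E$ (here roughly 35%, including measurement error) is environmental by definition.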

There are many practical as well as political reasons to want to believe that interactions can be ignored.  In a practical sense, even multiple 2-way interactions make impossible demands on sample size and structure.  But in a political sense, additive effects mean that traits can be reliably predicted from genotype data (meaning, even at birth): you estimate the effect of each allele at each place in the genome, and add them up to get the predicted phenotype.  There is money to be made from that, so to speak.  But it doesn't really work with complex interactions.  Strong incentives, indeed, to report additive effects--and very understandable ones!
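A toy sketch of that additive logic (every number here is invented for illustration, not taken from any real study): the predicted trait is just the sum, over loci, of allele count times estimated per-allele effect.

```python
# Additive ('polygenic score') logic, with made-up effect estimates.
import numpy as np

per_allele_effect = np.array([0.10, -0.05, 0.02, 0.08])  # hypothetical per-allele effects
genotype = np.array([2, 1, 0, 1])                         # copies of each effect allele carried

additive_prediction = float(genotype @ per_allele_effect)
print(additive_prediction)  # 0.23 -- simple to compute at birth, and simple to sell

# With dominance or gene-gene interaction (epistasis), the prediction depends on the
# particular combination of alleles, not just their sum, and this tidy bookkeeping breaks down.
```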

Secondly, all these various effects are estimated from samples, not derived from basic theory about molecular-level physiology, and often they are hardly informed by the latter at all.  This means that replication is not to be expected in any rigorous sense.  For example, dominance is estimated from the deviation of the average traits of AA, Aa, and aa individuals from the 0, 1, 2 dosage pattern expected if (say) each 'a' allele contributed one unit of trait measure.  Dominance deviations are thoroughly sample-dependent.  It is not easy to interpret those results when samples cannot be replicated (the concepts are very useful in agricultural and experimental breeding contexts, but far less so in natural human populations).  And this conveniently overlooks the environmental effects.
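To see what 'sample-dependent' means here, this minimal example (with made-up genotype means) computes a dominance deviation the way the text describes: as how far the heterozygote mean sits from the additive midpoint of the two homozygote means.

```python
# Hypothetical sample means for the three genotypes at one locus.
mean_AA, mean_Aa, mean_aa = 10.0, 11.8, 12.0

additive_expectation_Aa = (mean_AA + mean_aa) / 2   # where Aa would fall if effects were purely additive
dominance_deviation = mean_Aa - additive_expectation_Aa

print(additive_expectation_Aa)  # 11.0
print(dominance_deviation)      # 0.8 -- estimated entirely from this sample's means,
                                # so a different sample can easily give a different value
```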

This study is of a small sample, especially now that for many traits it seems de rigueur to have samples of hundreds of thousands to get reliable mapping results, not to mention a confusingly defined trait, so it's difficult, at least for me, to make sense of the results.  In theory, it wouldn't be terribly surprising to find a genetic component to risk of reading disability, but it would be surprising, particularly since disability is defined only by test score in this study, if none of that ability were substantially affected by environment.  In the extreme, if a child hasn't been to school or otherwise learned to read, that inability would be largely determined by environmental factors, right?  Even if an entire family couldn't read, it's not possible to know whether it's because no one ever had the chance to learn, or because they share some genetic risk allele.

In people, unlike in other animals, assortative mating has a huge cultural component, so, again, it wouldn't be surprising if two illiterate adults married, or if they then had no books in the house, and didn't teach their children that reading was valuable.  But this doesn't mean either reading or their mate-choice necessarily has any genetic component.  

So, again, same data, different interpretations  
But why?  Indeed, what makes some Americans hear Donald Trump and resonate with his message, while others cringe?  Why do we need 9 Supreme Court justices if the idea is that evidence for determination of the constitutionality of a law is to be found in the Constitution?  Why doesn't just one justice suffice?  And, why do they look at the same evidence and reliably and predictably vote along political lines?

Or, more uncomfortably for scientists, why did some people consider it good news when it was announced that only 34% of replicated psychology experiments agreed with the original results, while others considered this unfortunate?  Again, same facts, different conclusions.

Why do our beliefs determine our opinions, even in science, which is supposed to be based on the scientific method, and sober, unbiased assessment of the data?  Statistics, like anything, can be manipulated, but done properly they at least don't lie.  But, is IQ real or isn't it?  Are behavioral traits genetically determined or aren't they?  Have genome wide association studies been successful or not?

As Ken often writes, much of how we view these things is certainly determined by vested interest and careerism, not to mention the emotional positions we inevitably take on human affairs.  If your lab spends its time and money on GWAS, you're more likely to see them as successful.  That's undeniable, if you are candid.  But, I think it's more than that.  I think we're too often prisoners of induction, based on our experience, training, and predilections about which observations we make or count as significant; our conclusions are often underdetermined, but we don't know it.  Underdetermined conclusions are those for which the available evidence is not enough to settle the question.  It's the all-swans-are-white problem; they're all white until we see a black one.  At which point we either conclude we were wrong, or give the black swan a different species name.  But, we never know if or when we're going to see a black one.  Or a purple one.

John Snow determined to his own satisfaction during the cholera epidemic in London in 1854 that cholera was transmitted by a contagion in the water.  But in fact he didn't prove it.  The miasmatists, who believed cholera was caused by bad air, had stacks of evidence of their own -- e.g., infection was more common in smoggy, smelly cities, and in fact in the dirtier sections of cities.  But both Snow and the miasmatists had only circumstantial evidence, correlations, not enough data to definitively prove they were right.  Both arguments were underdetermined.  As it happened, John Snow was right, but that wasn't to be widely known for another few decades, when Vibrio cholerae was identified under Robert Koch's microscope.

"The scent lies strong here; do you see anything?"; Wikipedia

Both sides strongly (emotionally!) believed they were right, believed they had the evidence to support their argument. They weren't cherry-picking the data to better support their side, they were looking at the same data and drawing different conclusions.  They based their conclusions on the data they had, but they had no idea it wasn't enough.  

But it's not just that, either.  It's also that we're predisposed by our beliefs to form our opinions.  And that's when we're likely to cherry pick the evidence that supports our beliefs.  Who's right about immigrants to the US, Donald Trump or Bernie Sanders?  Who's right about whether corporations are people or not?  Who's right about genetically modified organisms?  Or climate change?  Who's right about behavior and genetic determinism?  

And it's even more than that!  If genetics and evolutionary biology have taught us anything, they've taught us about complexity.  Even simple traits turn out to be complex.  There are multiple pathways to most traits, most traits are due to interacting polygenes and environmental factors, and so on.  Simple explanations are less likely to be correct than explanations that acknowledge complexity, and that's because evolution doesn't follow rules, except that what works works--and what worked is, to an important degree, what is here for us to examine today.

Simplistic explanations are probably wrong.  But they are so appealing.

Rare Disease Day and the promises of personalized medicine

Our daughter Ellen wrote the post that I republish below 3 years ago, and we've reposted it in commemoration of Rare Disease Day, Febru...