
Everything is genetic, isn't it?

There is hardly a trait, physical or behavioral, for which there is not at least some familial resemblance, especially among close relatives.  That is what is meant when someone scolds you, saying, "You're just like your mother!"  The more distant the relatives, in terms of generations of separation, the less the similarity.  So you really can resist when told, "You're just like your great-grandmother!"  Genetic effects decline in a systematic way with more distant kinship.

The 'heritability' of a trait refers to the relative degree to which its variation is the result of variation in genes, the rest being due to variation in non-genetic factors we call 'environment'.  Heritability is a ratio that ranges from zero when genes have nothing to do with the trait, to 1.0 when all the variation is genetic.  The measure applies to a sample or population and cannot automatically be extended to other samples or populations, where both genetic and environmental variation will be different, often to an unknown extent.
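To make the ratio concrete, here is a toy simulation (all numbers invented for illustration): a trait is built as a genetic value plus an environmental value, and heritability is simply the genetic share of the total trait variance in that particular sample.

```python
import random

random.seed(1)

# Toy model: each individual's trait = genetic value + environmental value.
# Heritability, in this simple sketch, is the fraction of trait variance
# attributable to variance in the genetic values, in THIS sample only.
n = 10_000
genetic = [random.gauss(0, 2) for _ in range(n)]      # SD 2 -> variance ~4
environment = [random.gauss(0, 1) for _ in range(n)]  # SD 1 -> variance ~1
trait = [g + e for g, e in zip(genetic, environment)]

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

h2 = variance(genetic) / variance(trait)
print(f"heritability in this sample: {h2:.2f}")  # roughly 4 / (4 + 1) = 0.8
```

Change the environmental spread and the very same genetic values yield a different heritability, which is exactly why the measure cannot be extended automatically to other samples or populations.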

Most quantitative traits, like stature or blood pressure or IQ scores, show some amount, often quite substantial, of genetic influence.  It often happens that we are interested in some trait that we think must be produced or affected by genes, but for which no relevant factor, like a protein, is known.  The idea arose decades ago that if we could scan the genome and compare those with different manifestations of the trait, using mapping techniques like GWAS (genomewide association studies), we could identify those sites, genomewide, whose variation in our chosen sample may affect the trait's variation.  Qualitative traits, like the presence or absence of a disease (say, diabetes or hypertension), may often be due to the presence of some set of genetic variants whose joint impact exceeds some diagnostic threshold, and mapping studies can compare genotypes in affected cases to unaffected controls to identify those sites.
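The case-control logic can be sketched in a few lines: at a given site, compare allele counts in affected cases vs unaffected controls and ask whether they differ more than chance would allow.  Here is a minimal, single-site version (all counts invented; a real GWAS does this at millions of sites, with corrections for that multiplicity).

```python
# Hypothetical allele counts at one genetic site,
# in affected cases vs unaffected controls (numbers invented).
cases =    {"A": 340, "a": 160}   # allele counts among cases
controls = {"A": 300, "a": 200}   # allele counts among controls

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for a 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

chi2 = chi_square_2x2(cases["A"], cases["a"], controls["A"], controls["a"])
print(f"chi-square = {chi2:.2f}")  # values above 3.84 suggest association at p < 0.05 (1 df)
```

The same statistic that looks convincing in one sample may not replicate in the next, for all the reasons discussed below.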

Genes are involved in everything. . . . .
Many things can affect the amount of similarity among relatives, so one has to try to think carefully about attributing ideas of similarity and cause.  Some traits, like stature (height), have very high heritability, sometimes estimated to be about 0.9, that is, 90% of the variation being due to the effects of genetic variation.  Other traits have much lower heritability, but there's generally familial similarity.  And that's because we each develop from a single fertilized egg cell, which includes transmission of each of our parents' genomes, plus ingredients provided by the egg (and perhaps to a tiny degree the sperm), much of which was the result of gene action in our parents when they produced that sperm or egg (e.g., RNA, proteins).  This is why traits can usually be found to have some heritability--some contribution due to genetic variation among the sampled individuals.  In that sense, we can say that genes are involved in everything.

Understanding the genetic factors involved in disease can be important and laudable, even if tracking them down is a frustrating challenge.  But because genes are involved in everything, our society also seems to have an unending appetite for investigators to overstate the value of their findings or, in particular, to estimate or declaim on the heritability, and hence the supposed genetic determination, of the most societally sensitive traits, like sexuality, criminality, race, intelligence, physical abuse and the like.

. . . . . but not everything is 'genetic'!

If the estimated heritability for a trait we care about is substantial, then this does suggest the obvious: genes are contributing to the mechanisms of the trait, and so it is reasonable to acknowledge that genetic variation contributes to variation in the trait.  However, the mapping industry implies a somewhat different claim: that genes are a major factor in the sense that individual variants can be identified that are useful predictors of the trait of interest (NIH's lobbying machine has been saying we'll be able to predict future disease with 'precision').  There has been little constraint on the types of trait for which this approach, sometimes resting on little more than belief or wishful thinking, is deemed appropriate.

It is important to understand that our standard measures of genes' relative effect are affected both by genetic variation and by environmental and lifestyle factors.  That means that if environments were to change, the relative genetic effects, even in the very same individuals, would also change.  But it isn't just environments that change; genotypes change, too, when mutations occur, and as with environmental factors, these change in ways that we cannot predict even in principle.  That means that we cannot legitimately extrapolate the genetic or environmental effects we observe in a given sample or population to other samples or populations, much less to future ones, and the amount of error in doing so is unknowable.  This is not a secret problem, but it doesn't seem to temper claims of dramatic discoveries, in regard to disease or perhaps even more for societally sensitive traits.

But let's assume, correctly, that genetic variation affects a trait.  How does it work?  The usual finding is that tens or even hundreds of genome locations affect variation in the test trait.  Yet most of the effects of individual genes are very small or rare in the sample.  At least as important is that the bulk of the estimated heritability remains unaccounted for, and unless we're far off base somehow, the unaccounted fraction is due to the leaf-litter of variants individually too weak or too rare to reach significance.

Often it's also asserted that all the effects are additive, which makes things tractable: for every new person, not part of the study, just identify their variants and add up their estimated individual effects to get the total effect on that person for whatever publishable trait you're interested in.  That's the predictive objective of the mapping studies.  However, I think that for many reasons one cannot accept that these variable sites' actions are truly additive.  The reasons have to do with actual biology, not the statistical convenience of using the results to diagnose or predict traits.  Cells and their compounds vary in concentrations per volume (3D), binding properties (multiple dimensions), surface areas (2D), and in various other ways that affect how proteins are assembled and work, and so on.  In aggregate, additivity may come out in the wash, but the usual goal of applied measures is to extrapolate these average results to prediction in individuals.  There are many reasons to wish that were true, but few to believe it very strongly.
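For concreteness, the additive arithmetic that such prediction relies on looks like this toy sketch (effect sizes and genotypes all invented): score a 'new person' by summing, over every mapped site, their allele count times the per-allele effect estimated in the mapping sample.

```python
import random

random.seed(2)

# The additive assumption in miniature: a new person's predicted trait
# deviation is just the sum of per-variant effects estimated elsewhere.
# Effect sizes and genotypes below are invented for illustration.
n_sites = 200
effects = [random.gauss(0, 0.1) for _ in range(n_sites)]  # estimated per-allele effects

def polygenic_score(genotype, effects):
    """Additive score: sum of (allele count at site i) * (effect size at site i)."""
    return sum(g * b for g, b in zip(genotype, effects))

# A 'new person': 0, 1, or 2 copies of the scored allele at each site.
person = [random.choice([0, 1, 2]) for _ in range(n_sites)]
score = polygenic_score(person, effects)
print(f"predicted trait deviation: {score:.2f}")
```

This is the arithmetic of the assumption being criticized here, not an endorsement of it: the sum is only meaningful if the effects really combine additively and transfer to the new person's sample, which is exactly what is in question.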

Even if the effects were truly additive, the leaf-litter background, which clearly differs from sample to sample yet together accounts for the bulk of the heritability, can obscure the numerical amount of that additivity from sample to sample and person to person.  That is, what you estimated from this sample may not apply, to an unknowable extent, to the next sample.  If and when it does work, we're lucky that our assumptions weren't too far off.

Of course, the focus and promises from the genetics interests assume that environment has nothing serious to do with the genetic effects.  But it's a major, often by far the major, factor, and it may even in principle be far more changeable than genetic variation.  One would have to say that environmental rather than genetic measures are likely to be, by far, the most important things to change in society's interest.

We regularly write these things here not just to be nay-sayers, but to try to stress what the issues are, hoping that someone, by luck or insight, finds better solutions or different ways to approach the problems that a century of genetics, despite its incredibly huge progress, has not yet solved.  What it has done, in exquisite detail, is show us what the problems are.

A friend, Michael Joyner, himself a good scientist in relevant areas, has passed on a rather apt suggestion, which he says he saw in work by Denis Noble: we might be better off if we thought of the genome as a keyboard rather than as a code or program.  That is a good way to think about the subtle point that, in the end, yes, Virginia, there really are genomic effects: genes affect every trait....but not every trait is 'genetic'!

Spooky action at a (short) distance

Entanglement in physics is about action that seems to transfer some sort of 'information' across distances at speeds faster than that of light.  Roughly speaking (I'm not a physicist!), it is about objects whose states are not fixed in advance, could take various forms but must differ between them, and that are separated from each other.  When a measurement is made on one of them, whatever the result, the other object takes on the opposite state.  That means the states are not entirely due to local factors, and somehow the second object 'knows' what state the first was observed in and takes on a different state.

You can read about this in many places and understand it better than I do or than I've explained it here.  Albert Einstein was skeptical that this could occur, if the speed of light were the fastest possible speed.  So he famously called the findings as they stood at that time "Spooky action at a distance." But the findings have stood many specific tests, and seem to be real, however it happens.

Does life, too, have spooky action? 
I think the answer is: maybe so.  But it is at a very short distance, within the nuclei of individual cells.  Organisms have multiple chromosomes, and many species, like humans, have two instances of each (are 'diploid'), one inherited from each parent.  I say 'instances' rather than 'copies', because they are not identical to each other nor to those of the parent that transmitted each of them.  They are perhaps near copies, but mutation always occurs, even among the cells within each of us, so each cell differs from its contemporary somatic fellows and from what we inherited in our single-cell beginnings as a fertilized egg.

Many clever studies over many years have been documenting the 3-dimensional, context-specific conformation, or detailed physical arrangement of chromosomes within cells.  The work is variously known, but one catch-term is chromosome conformation capture, or 3C, and I'll use that here.  Unless or until this approach is shown to be too laden with laboratory artifact (it's quite sophisticated), we'll assume it's more or less right.

The gist of the phenomenon is that (1) a given cell type, under a given set of conditions, is using only a subset of its genes (for my purposes here this generally means protein-coding genes proper); (2) these active genes are scattered along and between the chromosomes, with intervening inactive regions (genes not being used at the moment); (3) the cell's gene expression pattern can change quickly when its circumstances change, as it responds to environmental conditions, during cell division, etc.; (4) at least to some extent the active regions seem to be clustered physically together in expression-centers in the nucleus; (5) this all implies that there is extensive trans communication, coordinating, and physically juxtaposing, parts within and among each chromosome--there is action at a very short distance.

Even more remarkably, I think, this phenomenon seems somehow robust to speciation because related species have similar functions and similar sets of genes, but often their chromosomes have been extensively rearranged during their evolutionary separation. More than this: each person has different DNA sequences due to mutation, and different numbers of genes due to copy number changes (duplications, deletions); yet the complex local juxtapositions seem to work anyway.  At present this is so complicated, so regular, and so changeable and thus so poorly understood, that I think we can reasonably parrot Einstein and call it 'spooky'.

What this means is that chromosomes are not just randomly floating around like a bowl of spaghetti.   Gene expression (including transcribed non-coding RNAs) is thought to be based on the sequence-specific binding of tens of transcription factors in an expression complex that is (usually) just upstream of the transcribed part.  Since a given cell under given conditions is expressing thousands of condition-specific genes, there must be very extensive interaction or 'communication' in trans, that is, across all the chromosomes. That's because the cell can change its expression set very quickly.

The 3C results show that in a given type of cell under given conditions, the chromosomes are physically very non-randomly arranged, with active centers physically very near or perhaps touching each other.  How this massive plate of apparent spaghetti physically rearranges to get these areas together, without getting totally tangled up, yet stays quickly rearrangeable, is, to me, spooky if anything in Nature is.  The entanglement, disentanglement, and re-entanglement happen genomewide, which is implicitly what the classical term 'polygenic' recognized about genetic causation, and which is now being documented.

The usual approach of genetics these days is to sequence and enumerate various short functional bits as being coding, regulatory, enhancing, inhibiting, transcribing etc. other parts nearby.  We have long been able to analyze cDNA and decide which parts are being used for protein coding, at least. Locally, we can see why or how this happens, in the sense that we can identify the transcription factors and their binding sites, called promoters, enhancers and the like, and the actual protein or functional RNA codes.  We can find expression correlates by extracting them from cells and enumerating them.  3C analysis appears to show that these coding elements are, at least to some extent, found juxtaposed in various transcription hot-spots.

Is gene expression 'entangled'?
What if the molecular aspects of the 3C research were shown to be technical artifacts, relative to what is really going on?  I have read some skepticism about that, concerning what is found in single cells vs aggregates of 'identical' cells.  If 3C stumbles, will our idea of polygenic, condition-specific gene usage change?  I think not.  We don't need 3C data to show the functional results, since they are already there to see (e.g., in cell-specific expression studies--cDNA and what ENCODE has found).  If 3C has been misleading for technical or other reasons, it would just mean that something else, just as spooky but different from the 3D arrangement that 3C detects, is responsible for coordinating the genomewide trans gene usage.  And it's of course 4-dimensional, since it's time-dependent, too.  So what I've said here will still apply, even if for some other, unknown or even unsuspected reason.

The existing observations on context-specific gene expression show that something 'entangles' different parts of the genome for coordinated use, and that can change very rapidly.  The same genome, among the different types of cells of an individual, can behave very differently in this sense.  Somehow, its various chromosomal regions 'know' how to be, or, better put, are, coordinated.  This seems at least plausibly to be more than just a context-specific set of transcription factors (TFs) binding selectively near regions to be transcribed and changing in its thousands of details almost instantly.  Which TFs?  And how does a given TF know which binding sites to grab or to release, moment by moment, since TFs typically bind enhancers or promoters of many different genes, not all of which are in use at the time?  And if you want to dismiss that, by saying for example that this has to do with which TFs are themselves being produced, or which parts of DNA are unwrapped at each particular time, then you're just bumping the same question about trans control up, or over, to a different level of what's involved.  That's no answer!

And there is an even simpler-seeming example showing that we really don't understand what's going on: the alignment of homologues in the first stage of meiosis.  We've been taught that empirical and necessary fact about meiosis for many decades.  But how do the two homologues find each other to align?  This is essentially just not mentioned in textbooks, if anyone was even asking.  I've seen some speculative ideas, again involving what I'll call 'electromagnetic' properties of each chromosome, but even their authors didn't really claim they were sufficient or definitive.  Just as examples, homologous chromosomes in a diploid individual have different rearrangements, deletions, duplications, and all sorts of heterozygous sequence details, yet by and large they still seem to find each other in meiosis.  Something's going on!

How might this be tested?
I don't have any answers, but I wonder, on the hypothesis that these thoughts are on target, how we might set up some critical experiments to test this.  I don't know if we can push the analogy with tests for quantum entanglement or not, but probably not.

One might hope that 'all' we have to do is enumerate sequence bits to account for this action-at-a-distance, this very detailed trans phenomenon.  But I wonder......I wonder if there may be something entirely unanticipated or even unknown that could be responsible.  Maybe there are 'electromagnetic' properties, or something akin to them, involved in such detailed, context-dependent 4D phenomena.

Suppose that what happens at one chromosomal location (let's just call it the binding of a TF) directly affects whether that or a different TF binds somewhere else at the same time.  Whatever causes the first event, if that's how it works, the distance effect would be a very non-local phenomenon, one so central to organized life in complex organisms that its causation is not just a set of local gene expressions.  Somehow, some sort of 'information' is at work, very fast and over very short distances.  It is the worst sort of arrogance to assume it is all just encoded in DNA as a code we can read off along the strand, and that it will succumb to enumerative, local, informatic sequence analysis.

The current kind of purely local hypothetical sequence enumeration-based account seems too ordinary--it's not spooky enough!

The GWAS hoax....or was it a hoax? Is it a hoax?

A long time ago, in 2000, in Nature Genetics, Joe Terwilliger and I critiqued the idea then being pushed by the powers-that-be that the genomewide mapping of complex diseases was going to be straightforward, because of the 'theory' (that is, rationale) then being proposed that common variants caused common disease.  At one point, the idea was that only about 50,000 markers would be needed to map any such trait in any global population.  In several papers in prominent journals, in a 1992 Cambridge Press book, Genetic Variation and Human Disease, and many times on this blog, my collaborators and I have pointed out numerous reasons, based on what we know about evolution, why this was going to be a largely empty promise.  It has been inconvenient for this message to be heard, much less heeded, for reasons we've also discussed in many blog posts.

Before we get into that, it's important to note that unlike me, Joe has moved on to other things, like helping Dennis Rodman's diplomatic efforts in North Korea (here, Joe's shaking hands as he arrives in his most recent trip).  Well, I'm more boring by far, so I guess I'll carry on with my message for today.....

There's now a new paper, coining a new catch-word (omnigenic), to proclaim the major finding that complex traits are genetically complex.  The paper seems solid and clearly worthy of note.  The authors examine the chromosomal distribution of sites that seem to affect a trait, in various ways including chromosomal conformation.  They argue, convincingly, that mapping shows that complex traits are affected by sites strewn across the genome, and they provide a discussion of the pattern and findings.

The authors claim an 'expanded' view of complex traits, and as far as that goes it is justified in detail. What they are adding to the current picture is the idea that mapped traits are affected by 'core' genes but that other regions spread across the genome also contribute. In my view the idea of core genes is largely either obvious (as a toy example, the levels of insulin will relate to the insulin gene) or the concept will be shown to be unclear.  I say this because one can probably always retroactively identify mapped locations and proclaim 'core' elements, but why should any genome region that affects a trait be considered 'non-core'?

In any case, that would be just a semantic point if it were not, predictably, the phrase that launched a thousand grant applications.  I think neither the basic claim of conceptual novelty, nor the breathless, exploitive treatment of it by the news media, is warranted: we've known these basic facts about genomic complexity for a long time, even if the new analysis provides other ways to find or characterize the multiplicity of contributing genome regions.  It also assumes that mapping markers are close enough to functionally relevant sites that the latter can be found, that the unmappable fraction of the heritability isn't leading to over-interpretation of what is 'mapped' (what reached significance), and that what isn't mapped won't change the picture.

However, I think the first thing we really need to do is understand the futility of thinking of complex traits as genetic in the 'precision genomic medicine' sense, and the last thing we need is yet another slogan by which hands can remain clasped around billions of dollars for Big Data resting on false promises.  Yet even the new paper itself ends with the ritual ploy, the assertion of the essential need for more information--this time, on gene regulatory networks.  I think it's already safe to assure any reader that these, too, will prove to be as obvious and as elusively ephemeral as genome wide association studies (GWAS) have been.

So was GWAS a hoax on the public?
No!  We've had a theory of complex (quantitative) traits since the early 1900s.  Other authors argued similarly, but RA Fisher's famous 1918 paper is the usual landmark.  His theory was, simply put, that infinitely many genome sites contribute to quantitative (what we now call polygenic) traits.  The general model has jibed with the age-old experience of breeders, who have used empirical strategies to improve crop and pet species.  Since association mapping (GWAS) became practicable, breeders have used mapping-related genotypes to help select animals for breeding; but genomic causation is so complex and changeable that they've recognized even this will have to be regularly updated.
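Fisher's point can be seen in a toy simulation (all parameters invented): give each of many loci a tiny additive effect, and the resulting trait distribution is continuous and roughly bell-shaped, even though every underlying cause is discrete.

```python
import random

random.seed(3)

# Fisher's 1918 insight, in miniature: a trait built from many loci of tiny
# additive effect looks continuous and roughly normal, even though each
# underlying locus is discrete.  Numbers here are invented for illustration.
n_loci, n_people = 500, 2000
freq = 0.5  # allele frequency at every locus, for simplicity

def trait_value():
    # Each of the 2 * n_loci allele copies contributes +1 if it is the '+' allele.
    return sum(random.random() < freq for _ in range(2 * n_loci))

values = [trait_value() for _ in range(n_people)]
mean = sum(values) / n_people
print(f"mean {mean:.0f}, range {min(values)}..{max(values)}")
```

With 500 loci the expected mean is 500 (= 2 * 500 * 0.5), and individual values cluster tightly around it; no single locus is distinguishable in the outcome, which is the breeder's, and the GWAS mapper's, predicament in a nutshell.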

But when genomewide mapping of complex traits was first really done (a prime example being BRCA genes and breast cancer), it seemed that apparently complex traits might, after all, have mappable genetic causes.  BRCA1 was found by linkage mapping in multiply affected families (an important point!), in which a strong-effect allele was segregating.  The use of association mapping was a tool of convenience: it used random samples (like cases vs controls) because one could hardly get sufficient multiply affected families for every trait one wanted to study.  GWAS rested on the assumption that genetic variants were identical by descent from common ancestral mutations, so that a current-day sample captured the latest descendants of an implied deep family: quite a conceptual coup, based on the ability to identify marker alleles across the genome identical by descent from the un-studied shared remote ancestors.

Until it was tried, we really didn't know how tractable such mapping of complex traits might be.  Perhaps heritability estimates based on quantitative statistical models were hiding what really could be enumerable, replicable causes, in which case mapping could lead us to functionally relevant genes.  It was certainly worth a try!

But it was quickly clear that this was in important ways a fool's errand.  Yes, some good things were to be found here and there, but the hoped-for miracle findings generally weren't there to be found. This, however, was a success not a failure!  It showed us what the genomic causal landscape looked like, in real data rather than just Fisher's theoretical imagination.  It was real science.  It was in the public interest.

But that was then.  It taught us its lessons, in clear terms (of which the new paper provides some detailed aspects).  But it long ago reached the point of diminishing returns.  In that sense, it's time to move on.

So, then, is GWAS a hoax?
Here, the answer must now be 'yes'!  Once the lesson is learned, bluntly speaking, continuing on is more a matter of keeping the funds flowing than profound new insights.  Anyone paying attention should by now know very well what the GWAS etc. lessons have been: complex traits are not genetic in the usual sense of being due to tractable, replicable genetic causation.  Omnigenic traits, the new catchword, will prove the same.

There may not literally be infinitely many contributing sites as in the original statistical models, be they core or peripheral, but infinitely many isn't so far off.  Hundreds or thousands of sites, and accounting for only a fraction of the heritability means essentially infinitely many contributors, for any practical purposes.  This is particularly so since the set is not a closed one:  new mutations are always arising and current variants dying away, and along with somatic mutation, the number of contributing sites is open ended, and not enumerable within or among samples.

The problem is actually worse.  All these data are retrospective statistical fits to samples of past outcomes (e.g., sampled individuals' blood pressures, or cases' vs controls' genotypes).  Past experience is not an automatic prediction of future risk.  Future mutations are not predictable, not even in principle.  Future environments and lifestyles, including major climatic dislocations, wars, epidemics and the like, are not predictable, not even in principle.  Future somatic mutations are not predictable, not even in principle.

GWAS almost uniformly have found (1) different mapping results in different samples or populations, (2) only a fraction of heritability is accounted for by tens, hundreds, or even thousands of genome locations and (3) even relatively replicable 'major' contributors, themselves usually (though not always) small in their absolute effect, have widely varying risk effects among samples.

These facts are all entirely expectable based on evolutionary considerations, and they have long been known, in principle and, indirectly, from detailed mapping of complex traits.  There are other well-known reasons why, based on evolutionary considerations among other things, this kind of picture should be expected.  They involve the blatantly obvious redundancy in genetic causation, which is the result of the origin of genes by duplication, and the highly complex pathways to our traits, among other things.  We've written about them here in the past.  So, given what we now know, more of this kind of Big Data is a hoax and, as such, a drain on public resources and, perhaps worse, on the public trust in science.

What 'omnigenic' might really mean is interesting.  It could mean that we're pressing up ever more intensely against the log-jam of understanding based on an enumerative gestalt about genetics.  Ever more detail, always promising that if we just enumerate and catalog a bit more (in this case, the authors say we need to study gene regulatory networks) we'll understand.  But that is a failure to ask the right question: why and how could every trait be affected by every part of the genome?  Until someone starts looking at the deeper mysteries we've been identifying, we won't have the transformative insight that seems to be called for, in my view.

To use Kuhn's term, this really is normal science pressing up against a conceptual barrier, in my view.  The authors work the details, but there's scant hint that they recognize we need something more than more of the same.  What is called for, I think, is young people who haven't already been propagandized into the current way of thinking, the current grantsmanship path to careers.

Perhaps more importantly, I think the situation is at present an especially cruel hoax, because there are real health problems, and real, tragic, truly genetic diseases that a major shift in public funding could enable real science to address.

Some genetic non-sense about nonsense genes

The April 12 issue of Nature has a research report and a main article about what is basically presented as the discovery that people typically carry doubly knocked-out genes, but show no effect.  The editorial (p 171) notes that the report (p 235) uses an inbred population to isolate double-knockout genes (that is, recessive homozygous null mutations) and look at their effects.  The population sampled, from Pakistan, has high levels of consanguineous marriage.  The criterion for calling a mutation a knockout was based on the protein-coding sequence.

We have no reason to question the technical accuracy of the papers, nor their relevance to biomedical and other genetics, but there are reasons to assert that this is nothing newly discovered, and that the story misses the really central point that should, I think, be undermining the expensive Big Data/GWAS approach to biological causation.

First, for some years now there have been reports of samples of individual humans (perhaps also of yeast, but I can't recall specifically) in which both copies of a gene appear to be inactivated.  The criteria for saying so are generally indirect, based on nonsense, frameshift, or splice-site mutations in the protein code.  That is, there are other aspects of coding regions that may be relevant, so it is questionable whether this is a truly thorough search to show that whatever is coded really is non-functional.  The authors mention some of these.  But, basically, costly as it is, this is science on the cheap, because it clearly only addresses some aspects of gene functionality.  It would obviously be almost impossible to show either that the gene was never expressed or that it never worked.  For our purposes here, we need not question the finding itself.  But the fact that this is not a first discovery does raise the question of why a journal like Nature is so desperate for Dramatic Finding stories, since this one really should instead be a report in one of many specialty human genetics journals.

Secondly, there are causes other than coding mutations for gene inactivation.  They have to do with regulatory sequences, and inactivating mutations in that part of a gene's functional structure are much more difficult, if not impossible, to detect with any completeness.  A gene's coding sequence itself may seem fine, but its regulatory sequences may simply not enable it to be expressed.  Gene regulation depends on epigenetic DNA modification, on multiple transcription factor binding sites, on the functional aspects of the many proteins required to activate a gene, and on other aspects of the local DNA environment (such as RNA editing or RNA interference).  The point here is that there are likely to be many other instances of people with complete, or effectively complete, double knockouts of genes.

Thirdly, the assertion that these double KOs have no effect depends on various assumptions.  Mainly, it assumes that the sampled individuals will not, in the future, experience the otherwise-expected phenotypic effects of their defunct genes.  Effects may depend on age, sex, and environmental effects rather than necessarily being a congenital yes/no functional effect.

Fourthly, there may be many coding mutations that make the protein non-functional but that are ignored by this sort of study because they aren't clear-cut knockout mutations, yet they are present in whatever data are used for comparison of phenotypic outcomes.  There are post-translational modifications, RNA editing, RNA modification, and other aspects of a 'gene' that this approach does not pick up.

Fifthly, and by far most important, I think, is that this is the tip of the iceberg of redundancy in genetic functions.  In that sense, the current paper is a kind of factoid that reflects what GWAS has been showing in great, if implicit, detail for a long time: there is great complexity and redundancy in biological functions.  Individual mapped genes typically affect trait values or disease risks only slightly.  Different combinations of variants at tens, hundreds, or even thousands of genome sites can yield essentially the same phenotype (and here we ignore the environment which makes things even more causally blurred).
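This redundancy point can be made concrete with a toy simulation (all numbers here are invented purely for illustration; this is not anyone's actual data).  If a trait is built from many loci, each with a tiny effect, two unrelated 'individuals' can have nearly identical trait values while carrying substantially different sets of variants:

```python
import random

# Toy simulation of polygenic redundancy (invented numbers, not real
# data): a trait built from many loci, each with a tiny effect.  Two
# independently drawn genotypes can yield nearly the same trait value
# despite sharing only about half of their risk variants.
random.seed(1)

N_LOCI = 1000
EFFECT = 0.05  # assumed tiny per-variant effect

def genotype():
    # carrier (1) or non-carrier (0) at each locus, 50% frequency
    return [random.randint(0, 1) for _ in range(N_LOCI)]

def trait(g):
    # additive model: trait is the sum of tiny per-variant effects
    return sum(x * EFFECT for x in g)

a, b = genotype(), genotype()
shared = sum(1 for x, y in zip(a, b) if x == y == 1)
print(f"individual A: trait {trait(a):.2f} from {sum(a)} variants")
print(f"individual B: trait {trait(b):.2f} from {sum(b)} variants")
print(f"risk variants shared: {shared}")
```

Of course real genetic architecture is not purely additive, which only strengthens the point: many different genomic routes lead to the same phenotype.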

Sixthly, other samples, and certainly other populations, as well as individuals within the Pakistani database, surely carry various parts of redundant pathways, from plenty of them to none.  Indeed, the inbreeding exploited in this study obviously affects the rest of the genome, and there's no particular way to know how, or more importantly, in which individuals.  The authors found a number of basically trivial or no-effect results as it is, even after their hunt across the genome.  Whether some individuals showed an effect attributable to a particular double knockout is problematic at best.  Every sample, even of the same population, and certainly of other populations, will have different background genotypes (homozygous or not), so this is largely a fishing expedition in a particular pond that cannot seriously be extrapolated to other samples.

Finally, this study cannot address the effect of somatic mutation on phenotypes and their risk of occurrence.  Who knows how many local tissues have experienced double-knockout mutations and produced (or not produced) some disease or other phenotypic outcome?  Constitutive genome sequencing cannot detect this.  Surely we should know this very inconvenient fact by now!

Given the well-documented and pervasive biological redundancy, it is not any sort of surprise that some genes can be non-functional and the individual phenotypically within a viable, normal range. Not only is this not a surprise, especially by now in the history of genetics, but its most important implication is that our Big Data genetic reductionistic experiment has been very successful!  It has, or should have, shown us that we are not going to be getting our money's worth from that approach.  It will yield some predictions in the sense of retrospective data fitting to case-control or other GWAS-like samples, and it will be trumpeted as a Big Success, but such findings, even if wholly correct, cannot yield reliable true predictions of future risk.

Does environment, by any chance, affect the studied traits?  We have, in principle, no way to know what environmental exposures (or somatic mutations) will be like.  The by now very well documented leaf-litter of rare and/or small-effect variants plagues GWAS for practical statistical reasons (and is why usually only a fraction of heritability is accounted for).  Naturally, finding a single doubly inactivated gene may, but by no means need, yield reliable trait predictions.

By now, we know of many individual genes whose coded function is so proximate or central to some trait that mutations in such genes can have predictable effects.  This is the case with many of the classical 'Mendelian' disorders and traits that we've known for decades.  Molecular methods have admirably identified the gene and mutations in it whose effects are understandable in functional terms (for example, because the mutation destroys a key aspect of a coded protein's function).  Examples are Huntington's disease, PKU, cystic fibrosis, and many others.

However, these are at best the exceptions that lured us into thinking that even more complex, often late-onset traits would be mappable, so that we could parlay massive investment in computerized data sets into solid predictions and identify the 'druggable' genes that Big Pharma could target.  This was predictably an illusion, as some of us were saying long ago, and for the right reasons.  Everyone should know better by now, and this paper just reinforces the point, to the extent that one can assert that it's the political economy of science funding, science careers, and hungry journals, not the science itself, that sustains the drive to continue or expand the same methods anyway.  Naturally (or should one say reflexively?), the authors advocate a huge Human Knockout Project to study every gene--today's reflex Big Data proposal.**

Instead, it's clearly time to recognize the relative futility of this, and change gears to more focused problems that might actually punch their weight in real genetic solutions!

** [NOTE added in a revision.  We should have a wealth of data by now, from many different inbred mouse and other animal strains, and from specific knockout experiments in such animals, to know that the findings of the Pakistani family paper are to be expected.  About 1/4 to 1/3 of knockout experiments in mice have no effect or not the same effect as in humans, or have no or different effect in other inbred mouse strains.  How many times do we have to learn the same lesson?  Indeed, with existing genomewide sequence databases from many species, one can search for 2KO'ed genes.  We don't really need a new megaproject to have lots of comparable data.]

The (bad) luck of the draw; more evidence

A while back, Vogelstein and Tomasetti (V-T) published a paper in Science arguing that most cancers cannot be attributed to known environmental factors, but are instead due simply to the errors in DNA replication that occur throughout life when cells divide.  See our earlier 2-part series on this.

Essentially the argument is that knowledge of the approximate number of at-risk cell divisions per unit of age could account for the age-related pattern of increase in cancers of different organs, if one ignored some obviously environmental causes like smoking.  Cigarette smoke is a mutagen and if cancer is a mutagenic disease, as it certainly largely is, then that will account for the dose-related pattern of lung and oral cancers.
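The at-risk-divisions logic can be sketched with back-of-envelope arithmetic (the per-division probability below is invented purely for illustration; it is not V-T's estimate):

```python
# Back-of-envelope sketch of the V-T logic, with an invented
# per-division probability P (not their estimate): if each stem-cell
# division carries a small chance of a cancer-initiating replication
# error, lifetime risk rises with the cumulative number of divisions,
# with no external carcinogen required.
P = 1e-11  # assumed chance of an initiating error per division

def lifetime_risk(divisions):
    # probability that at least one division produces the initiating error
    return 1.0 - (1.0 - P) ** divisions

for tissue, n_div in [("slowly dividing tissue", 1e8),
                      ("rapidly dividing tissue", 1e11)]:
    print(f"{tissue}: ~{n_div:.0e} divisions -> "
          f"lifetime risk {lifetime_risk(n_div):.4f}")
```

On this view, tissues whose stem cells divide a thousand times more often should show correspondingly higher lifetime cancer incidence, which is roughly the organ-by-organ pattern V-T report.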

This got enraged responses from environmental epidemiologists whose careers are vested in the idea that if people would avoid carcinogens they'd reduce their cancer risk.  Of course, this is partly just the environmental epidemiologists' natural reaction to their ox being gored--threats to their grant largesse and so on.  But it is also true that environmental factors of various kinds, in addition to smoking, have been associated with cancer; some dietary components, viruses, sunlight, even diagnostic x-rays if done early and often enough, and other factors.

Most associated risks from agents like these are small compared to smoking, but not zero, and at least one legitimate objection to V-T's paper might be that it seems to suggest that environmental pollution, dietary excess, and so on don't matter when it comes to cancer; that would be wrong, but I think V-T are saying no such thing.  Clearly some environmental exposures are mutagens, and only a really hard-core reactionary would deny that mutations are related to cancer.  Other external or lifestyle agents are mitogens; they stimulate cell division, and it would be silly not to think they could have a role in cancer.  If and when they do, it is not by causing mutations per se.  Instead, mitogenic exposures simply stimulate cell division, which is dangerous if a cell is already transformed into a cancer cell.  But it is also a way to increase cancer risk by just what V-T stress: the natural occurrence of mutations when cells divide.

There are a few who argue that cancer is due not to mutations but to transposable elements moving around and/or inserting into the genome where they can cause cells to misbehave, or to other, perhaps unknown, factors such as tissue organization.

These alternatives appear, currently, to account for at most a minor fraction of cancers.  In response to their critics, V-T have just published a new multi-national analysis that they suggest supports their theory.  They attempted to correct for the number of at-risk cells and so on, and found a convincing pattern that supports the intrinsic-mutation viewpoint.

This is at least in part an unnecessary food-fight.  When cells divide, DNA replication errors occur.  This seems well-documented; indeed, Vogelstein did some work years ago showing evidence of somatic mutation--that is, DNA changes that are not inherited--in the genomes of cancer cells compared to normal cells of the same individual.  This has been known for decades, in various levels of detail.  Of course, showing that this is causal rather than coincidental is a separate problem, because the fact that mutations occur during cell division doesn't necessarily mean that the mutations are causal.  However, for several cancers the repeated involvement of specific genes, the demonstration of mutations in the same gene or genes in many different individuals, the same effects in experimental mice, and so on, is persuasive evidence that mutational change is important in cancer.

The specifics of that importance are in a sense separate from the assertion that environmental epidemiologists are complaining about.  Unfortunately, to a great extent this is a silly debate.  In essence, professional pride and careerism aside, the debate should not be about whether mutations are involved in cancer causation but about whether specific environmental sources of mutation are identifiable and individually strong enough, as x-rays and tobacco smoke are, to be identified and avoided.  Smoking targets particular cells in the oral cavity and lungs.  But exposures that are more generic--individually rare, not associated with a specific item like smoking, and unavoidable--might raise the rate of somatic mutation generally.  Just having a body temperature may be one such factor, for example.

I would say that we are inevitably exposed to chemicals and other agents that can damage cells, mutation being one such effect.  V-T are substantially correct, from what the data look like, in saying that (in our words) namable, specific, and avoidable environmental mutagens are not the major systematic, organ-targeting cause of cancer.  Vague and/or generic exposure to mutagens will lead to mutations more or less randomly among our cells (though, depending on the agent, perhaps differently depending on how deep in our bodies the cells are relative to the outside world or other routes of exposure).  The more at-risk cells, and the longer they're at risk, the greater the chance that some cell will experience a transforming set of changes.

Most of us probably inherit mutations in some of these genes from conception, and have to await other events (whether mutational or of another nature, as mentioned above).  The age patterns of cancers seem very convincingly to show that.  The real key here is the degree to which specific, avoidable mutational agents can be identified.  It seems silly--or, perhaps as likely, mere professional jealousy--to resist that idea.

These statements apply even if cancers are not all, or not entirely, due to mutational effects.  And, remember, not all of the mutations required to transform a cell need be of somatic origin.  Since cancer is mostly, and obviously, a multi-factor disease genetically (not a single mutation as a rule), we should not have our hackles raised if we find what seems obvious, that mutations are part of cell division, part of life.

There are curious things about cancer, such as our large body size yet delayed onset ages relative to the occurrence of cancer in smaller, shorter-lived animals like mice.  And animals of different lifespans and body sizes, even different rodents, have different lifetime cancer risks (some of which may be the result of details of their inbreeding history, or of inbreeding itself).  Mouse cancer rates increase with age, and hence with the number of at-risk cell divisions, but their substantial risk at very young ages, despite many fewer cell divisions (yet similar genome sizes), shows that even the spontaneous-mutation idea of V-T has problems.  After all, elephants are huge and live very long lives; why don't they get cancer much earlier?

Overall, even if correct, V-T's view should not give too much comfort to our 'Precision' genomic medicine sloganeers (another aspect of budget protection), because the bad-luck mutations are generally somatic, not germline, and hence not susceptible to Big Data epidemiology, genetic or otherwise, that depends on germ-line variation as the predictor.

Related to this are the numerous reports of changes in life expectancy among various segments of society based on behaviors--most recently, for example, the opioid epidemic among whites in depressed areas of the US.  Such environmental changes are not specifically predictable, not even in principle, and can't be built into genome-based Big Data, or into the budget-promoting promises coming out of NIH about such 'precision'.  Even estimated lifetime cancer risks associated with mutations in clear-cut risk-affecting genes, like BRCA1 and breast cancer, vary greatly from population to population and study to study.  The V-T debate, and their obviously valid point, regardless of the details, is only part of the lifetime-cancer-risk story.

ADDENDUM 1
Just after posting this, I learned of a new story on this 'controversy' in The Atlantic.  It is really a silly debate, as noted in my original version.  It tacitly makes many different assumptions about whether this or that tinkering with our lifestyles will add to or reduce the risk of cancer and hence support the anti-V-T lobby.  If we're going to get into the nitty-gritty and typically very minor details about, for example, whether the statistical colon-cancer-protective effect of aspirin shows that V-T were wrong, then this really does smell of academic territory defense.

Why do I say that?  Because if we go down that road, we'll have to say that statins are cancer-causing, and so is exercise, and kidney transplants and who knows what else.  They cause cancer by allowing people to live longer, and accumulate more mutational damage to their cells.  And the supposedly serious opioid epidemic among Trump supporters actually is protective, because those people are dying earlier and not getting cancer!

The main point is that mutations are clearly involved in carcinogenesis, cell division life-history is clearly involved in carcinogenesis, environmental mutagens are clearly involved in carcinogenesis, and inherited mutations are clearly contributory to the additional effects of life-history events.  The silly extremism to which the objectors to V-T would take us would be to say that, obviously, if we avoided any interaction whatsoever with our environment, we'd never get cancer.  Of course, we'd all be so demented and immobilized with diverse organ-system failures that we wouldn't realize our good fortune in not getting cancer.

The story and much of the discussion on all sides is also rather naive even about the nature of cancer (and how many or of which mutations etc it takes to get cancer); but that's for another post sometime.

ADDENDUM 2
I'll add another new bit to my post that I hadn't thought of when I wrote the original.  We have many ways to estimate mutation rates, in nature and in the laboratory.  They include parent-offspring comparisons in genomewide sequencing samples, and there have been sperm-to-sperm comparisons.  I'm sure there are many other sets of data (see Michael Lynch in Trends in Genetics 2010 Aug; 26(8): 345–352).  These give a consistent picture, and one can say, if one wants to, that the inherent mutation rate is due to identifiable environmental factors, but given the breadth of the data that's not much different from saying that mutations are 'in the air'.  There are even sex-specific differences.

The numerous mutation detection and repair mechanisms built into genomes add to the idea that mutations are part of life--that they are not, for example, a product of modern human lifestyles.  Of course, evolution depends on mutation, so the mutation rate cannot be, and never has been, reduced to zero: a species that couldn't change doesn't last.  Mutations occur in plants and animals and prokaryotes, in all environments, and, I believe, generally at rather similar species-specific rates.

If you want to argue that every mutation has an external (environmental) cause rather than an internal molecular one, you are merely saying there's no randomness in life, no imperfection in molecular processes.  That is as much a philosophical as an empirical assertion (as perhaps any quantum physicist can tell you!).  The key, as asserted in the post here, is that for the environmentalists' claim to make sense--for a factor to be a mutational cause in the meaningful sense--the force or factor must be systematic, identifiable, and tissue-specific, and it must be shown how it reaches the internal tissue in question and not other tissues on the way in, etc.

Given how difficult it has been to chase down most environmental carcinogenic factors to which exposure is more than very rare, and that the search has been going on for a very long time, with only a few found that are, in themselves, clearly causal (ultraviolet radiation, Human Papilloma Virus, ionizing radiation, the ones mentioned in the post), whatever is left over must be very weak, non-tissue-specific, rare, and the like.  Even radiation-induced lung cancer in uranium miners has been challenging to prove (for example, because miners also largely were smokers).

It is not much of a stretch to say that even if, in principle, all the mutations arising in our body's lifetime were due to external exposures, and the relevant mutagens could be identified and shown in some convincing way to be specifically carcinogenic in specific tissues, in practice, if not in some ultimate reality, the aggregate exposure to such mutagens is unavoidable and epistemically random with respect to tissue and gene.  That, I would say, is the essence of the V-T finding.

Quibbling about that aspect of carcinogenesis is for those who have already determined how many angels dance on the head of a pin.

Higher resolution discrimination: The GOP wants to allow employers to require genetic testing

This morning, Ed Yong published an article that takes on issues that we at The Mermaid's Tale care very deeply about.
Link to article
The consequences for important medical research are not going to be pretty.

And I can't help but be angry about this for threatening to take away the fun of genetics too. If we can't have some control over our genetic testing, we can't do it for fun, for education, for finding out more about ourselves, for the awe of it, for innerspace exploration in the technology age. They're taking that away from us by eroding GINA.

I have lots of other thoughts... like about how this fits in so nicely with (not all of) the right's racist/eugenics inclinations.

And juxtapose this view from the political right where there is full-on acceptance of actually-more-than-genetics-can-even-deliver against their anti-science politics and policy...

It's like science is totally fine for Republicans as long as Mother Nature is a dictator.

If it's more complicated than that, then deny it, defund it, bulldoze it. The reality is, genetics is largely probabilistic; it is not a dictatorship. It's just so hard to convince people that it isn't. The ideological drive to justify behavioral differences and socioeconomic inequality with Nature above all is just too strong. If it's Nature, then we don't have to do the hard work of addressing the problems, because Nature is Nature is Nature. This is really old thinking that really new knowledge (through lots of science, lots of lived experience, lots of humanities, and lots of art) has overturned, but the new knowledge has not managed to catch on all that well. And as our understanding of genetics increases, these ancient beliefs can simply be spouted by politicians using new-fangled science jargon.

This is really hard to write about today as all the stories about the proposed (and highly probable) budget cuts to science and the arts are blasting through my newsfeeds. It's overwhelming me today. I'm feeling hopeless and angry on behalf of science, art, knowledge, medicine, humanity, humans, children, teenagers, grown-ups, geezers. It's too much today.

But, back to Ed's article, I do need to put this here because it mentions that I have taught with 23andMe and longtime readers of the MT might know about that:

I don't teach with 23andMe anymore. I was doing it for as long as my university would pay for the kits. It was totally voluntary and students had to read Misha Angrist's book and endure long discussions and pass a quiz before deciding whether to go through with the testing. It was so powerful for teaching evolution, genetics, anthropology, etc... and we critiqued the hell out of it. My university said I needed to pay for the kits through course fees from now on. Before any of these threats to GINA, I decided not to do that and to stop using 23andMe. Now, even if my university reconsidered and funded the kits, I still wouldn't take it up again as a teaching tool.

Post-truth science?

This year was one that shook normal politics to its core.  Our beliefs in free and fair elections, in the idea that politicians strive to tell the truth and are ashamed to be caught lying, in real news vs fake, in the importance of tradition and precedent, indeed in the importance of science in shaping our world, have all been challenged.  This has served to remind us that we can't take progress, world view, or even truth and its importance for granted.  The world is changing, like it or not.  And, for scientists who assume that truth actually exists and whose lives are devoted to searching for it, the changes are not in familiar directions.  We can disagree with our neighbors about many things, but when we can't even agree on what's true, this is not the 'normal' world we know.

To great fanfare, Oxford Dictionaries chose "post-truth" as its international word of the year.
The use of “post-truth” — defined as “relating to or denoting circumstances in which objective facts are less influential in shaping public opinion than appeals to emotion and personal belief” — increased by 2,000 percent over last year, according to analysis of the Oxford English Corpus, which collects roughly 150 million words of spoken and written English from various sources each month.  New York Times
I introduce this into a science blog because, well, I see some parallels with science.  As most of us know, Thomas Kuhn, in his iconic book, The Structure of Scientific Revolutions, wrote about "normal science", how scientists go about their work on a daily basis, theorizing, experimenting, and synthesizing based on a paradigm, a world view that is agreed upon by the majority of scientists.  (Although not well recognized, Kuhn was preceded in this by Ludwik Fleck, the Polish-Israeli physician and biologist who, way back in the 1930s, used the term 'thought collective' for the same basic idea.)

When thoughtful observers recognize that an unwieldy number of facts no longer fit the prevailing paradigm, and develop a new synthesis of current knowledge, a 'scientific revolution' occurs and matures into a new normal science.  In the 5th post in Ken's recent thought-provoking series on genetics as metaphysics, he reminded us of some major 'paradigm shifts' in the history of science -- plate tectonics, relativity and the theory of evolution itself.

We have learned a lot in the last century, but there are 'facts' that don't fit into the prevailing gene-centered, enumerative, reductive approach to understanding prediction and causation, our current paradigm.  If you've read the MT for a while, you know that this is an idea we've often kicked around.  In 2013 Ken made a list of 'strange facts' in a post he called "Are we there yet or do strange things about life require new thinking?" I repost that list below because I think it's worth considering again the kinds of facts that should challenge our current paradigm.

As scientists, our world view is supposed to be based on truth.  We know that climate change is happening, that it's automation not immigration that's threatening jobs in the US, that fossil fuels are in many places now more costly than wind or solar.  But by and large, we know these things not because we personally do research into them all -- we can't -- but because we believe the scientists who do carry out the research and who tell us what they find.  In that sense, our world views are faith-based.  Scientists are human, and have vested interests and personal world views, and seek credit, and so on, but generally they are trustworthy about reporting facts and the nature of actual evidence, even if they advocate their preferred interpretation of the facts, and even if scientists, like anyone else, do their best to support their views and even their biases.

Closer to home, as geneticists, our world view is also faith-based in that we interpret our observations based on a theory or paradigm that we can't possibly test every time we invoke it, but that we simply accept.  The current 'normal' biology is couched in the evolutionary paradigm often based on ideas of strongly specific natural selection, and genetics in the primacy of the gene.

The US Congress just passed a massive bill in support of normal science, the "21st Century Cures Act", with funding for the blatant marketing ploys of the brain connectome project, the push for "Precision Medicine" (first "Personalized Medicine", this endeavor has been rebranded -- cynically? -- yet again as "All of Us"), and the new war on cancer.  These projects are nothing if not born of our current paradigm in the life sciences: reductive enumeration of causation and the promise of predicting disease.  But the many well-known challenges to this paradigm lead us to predict that, like the Human Genome Project, which among other things was supposed to lead to the cure of all disease by 2020, these endeavors can't fulfill their promise.

To a great, if not fundamental, extent this branding is about securing societal resources for projects too big and costly to kill, in a way similar to any advertising, or even to the way churches promise heaven when they pass the plate.  But it relies on widespread acceptance of contemporary 'normal science', despite the unwieldy number of well-known, misfitting facts.  Even science is now perilously close to being 'post-truth'.  This sort of dissembling is deeply built into our culture at present.

We've got brilliant scientists doing excellent work, turning out interesting results every day, and brilliant science journalists who describe and publicize their new findings. But it's almost all done within, and accepting, the working paradigm. Too few scientists, and even fewer writers who communicate their science, are challenging that paradigm and pushing our understanding forward. Scientists, insecure and scrambling not just for insight but for their very jobs, are pressed explicitly or implicitly to toe the current party line. In a very real sense, we're becoming more dedicated to faith-based science than we are to truth.

Neither Ken nor I are certain that a new paradigm is necessary, or that it's right around the corner. How could we know? But, there are enough 'strange facts', that don't fit the current paradigm centered around genes as discrete, independent causal units, that we think it's worth thinking about whether a new synthesis, that can incorporate these facts, might be necessary. It's possible, as we've often said, that we already know everything we need to know: that biology is complex, genetics is interactive not iterative, every genome is unique and interacts with unique individual histories of exposures to environmental risk factors, evolution generates difference rather than replicability, and we will never be able to predict complex disease 'precisely'.

But it's also possible that there are new ways to think about what we know, beyond statistics and population-based observations, to better understand causation.  There are many facts that don't fit the current paradigm, and more smart scientists should be thinking about this as they carry on with their normal science.



---------------------------------
Do strange things about life require new concepts?
1.  The linear view of genetic causation (cis effects of gene function, for the cognoscenti) is clearly inaccurate.  Gene regulation and usage are largely, if not mainly, not just local to a given chromosome region (they are trans);
2.  Chromosomal usage is 4-dimensional within the nucleus, not even 3-dimensional, because arrangements are changing with circumstances, that is, with time;
3.  There is a large amount of inter-genic and inter-chromosomal communication leading to selective expression and non-expression at individual locations and across the genome (e.g., monoallelic expression).  Thousands of local areas of chromosomes wrap and unwrap dynamically depending on species, cell type, environmental conditions, and the state of other parts of the genome at a given time;
4.  There is all sorts of post-transcription modification (e.g., RNA editing, chaperoning) that is a further part of 4-D causation;
5.  There is environmental feedback in terms of gene usage, some of which (epigenetic marking) can be inherited and borders on being 'lamarckian';
6.  There are dynamic symbioses as a fundamental and pervasive rather than just incidental and occasional part of life (e.g., microbes in humans);
7.  There is no such thing as 'the' human genome from which deviations are measured.  Likewise, there is no evolution of 'the' human and chimpanzee genome from 'the' genome of a common ancestor.  Instead, perhaps conceptually like event cones in physics, where the speed of light constrains what has happened or can happen, there are descent cones of genomic variation descending from individual sequences--time-dependent spreading of variation, with time-dependent limitations.  These intertwine among individuals, though each individual's is unique.  There is a past cone of ancestry leading to each current instance of a genome sequence, from an ever-widening set of ancestors (as one goes back in time), and a future cone of descendants and their variation, affected by mutation.  There are descent cones in the genomes among organisms, among organisms in a species, and between species.  This is of course just a heuristic, not an attempt at a literal simile or to steal ideas from physics!
Light cone: Wikipedia

8.  Descent cones exist among the cells and tissues within each organism, because of somatic mutation, but here the metaphor breaks down: they have a strangely singular rather than complex ancestry, because within an individual they go back to a single point, the fertilized egg, and across individuals back to life's Big Bang;
9.  For the previous reasons, all genomes represent 'point' variations (instances) around a non-existent core that we conceptually refer to as 'species', 'organs', etc. ('the' human genome, 'the' giraffe, and so on);
10.  Enumerating causation by statistical sampling methods is often literally impossible, because rare variants don't have enough copies to generate 'significance', because significance criteria are subjective, and/or because many variants have effects too small to generate significance;
11.  Natural selection, which along with chance (drift) generated current variation, is usually so weak that it cannot be demonstrated, often even in principle, for similar statistical reasons: if the causation of a trait is too weak to show, its effect on fitness is too weak to show; and there is not just one way to be 'adapted';
12.  Alleles and genotypes have effects that are inherently relativistic.  They depend upon context, and each organism's context is different;
13.  Perhaps analogously with the ideal gas law and its like, phenotypes seem to have coherence.  We each have a height or blood pressure, despite all the variation noted above.  In populations of people, or of organs, we find ordinary (e.g., 'bell-shaped') distributions that may be the result of a 'law' of large numbers: just as human genomes are variation around a 'platonic' core, so blood pressure is the net result of the individual actions of many cells.  And biological traits are constantly changing;
14. 'Environment' (itself a vague catch-all term) has very unclear effects on traits.  Genomic-based risks are retrospectively assessed, but future environments cannot, in principle, be known, so that genomic-based prediction is an illusion of unclear precision;
15.  The typical picture is of many-to-many genomic (and other) causation for which many causes can lead to the same result (polygenic equivalence), and many results can be due to the same cause (pleiotropy);
16. Our reductionist models, even those that deal with networks, badly under-include interactions and complementarity.  We are prisoners of single-cause thinking, which is only reinforced by strongly adaptationist Darwinism that, to this day, makes us think deterministically and in terms of competition, even though life is manifestly a phenomenon of molecular cooperation (interaction).  We have no theory for the form of these interactions (simple multiplicative? geometric?).
17.  In a sense all molecular reactions are about entropy, energy, and interaction among different molecules or whatever.  But while ordinary nonliving molecular reactions converge on some result, life is generally about increasing difference, because life is an evolutionary phenomenon.
18. DNA is itself a quasi-random, inert sequence.  Its properties come entirely from spatial, temporal, combinatorial ('Boolean'-like) relationships.  This context works only because of what else is inside (and immediately outside) the cell at the given time, a regress back to the origin of life.
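Points 10 and 13 above can be illustrated with a small simulation (a sketch with hypothetical numbers, not data from any study, using only Python's standard library): summing many tiny allelic effects yields a coherent, bell-shaped trait, while any single locus explains almost none of the variation--far too little to reach 'significance' in a realistic sample.

```python
# Hypothetical polygenic sketch: many loci of tiny effect produce a
# bell-shaped trait, yet each locus alone explains almost nothing.
import random
import statistics

random.seed(42)
N_PEOPLE, N_LOCI = 2000, 500

# Genotype at each locus coded 0/1/2 (uniform, purely for simplicity)
genotypes = [[random.randint(0, 2) for _ in range(N_LOCI)]
             for _ in range(N_PEOPLE)]
trait = [sum(g) for g in genotypes]  # purely additive 'polygenic score'

mean = statistics.mean(trait)
sd = statistics.pstdev(trait)

# Variance in the trait 'explained' by locus 0 alone (squared correlation)
x = [g[0] for g in genotypes]
mx = statistics.mean(x)
cov = sum((xi - mx) * (ti - mean) for xi, ti in zip(x, trait)) / N_PEOPLE
r2 = (cov / (statistics.pstdev(x) * sd)) ** 2

print(f"trait mean={mean:.1f}, sd={sd:.1f}, single-locus r^2={r2:.4f}")
```

With 500 loci, each locus is expected to account for only about 1/500 of the trait variance, even though the trait itself is tidily 'bell-shaped'--which is the statistical bind described in points 10 and 13.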

Is genetics still metaphysical? Part VI. What might lead to a transformative insight in biology today--if we need one

It's easy to complain about the state of the world, in this case, of the life sciences, and much harder to provide the Big New Insights one argues might be due.  Senioritis makes it even easier: when my career in genetics began, not very much was known.  Genes figuratively had 2 alleles, with measurable rates of recurrence by mutation.  Genetically tractable traits were caused by the proteins these genes coded for; quantitative traits were too complex to be seriously attributed to individual genes, so were tacitly assumed to be the additive result of an essentially infinite number of them.

How many genes there were was essentially unknowable but, using identified proteins as a gauge, the number was widely thought to be around 100,000.  The 'modern evolutionary synthesis' solved the problem, conceptually, by treating these largely metaphorical causal items as largely equivalent, if distinct, entities whose identities were essentially unknowable.  That is, at least, we didn't have to think about them as specific entities, only about their collective actions.  Mendelian causal genes evolving by natural selection made for, even if a metaphorical or even seriously metaphysical notion, a highly viable worldview in which to operate.  A whole scientific enterprise grew around this worldview.  But things have changed.

Over the course of my career, we've learned a lot about these metaphysical units.  Whether or not they are now more physical than metaphysical is the question I've tried to address in this series of posts, and I think there's no easy answer--but what we have, or should have, understood is that they are not units!  If we have to have a word for them, perhaps it should be interactants.  But even that is misleading, because the referents are not in fact unitary.  For example, many if not most 'genes' are only active in context-dependent circumstances, are multiply spliced, may be post-transcriptionally edited, are chemically modified, and have function only when combined with other units (e.g., they don't code for discretely functioning proteins), etc.

Because interaction is largely a trans phenomenon--between factors here and there, rather than just everything here--the current gene concept, and the panselectionist view in which every trait has an adaptive purpose, whether tacit or explicit, is a serious or even fundamental impediment to a more synthetic understanding.  I feel it's worth piling on at this point, and adding that the current science is also pan-statistical in ways that in my view are just as damaging.  The reason, to me, is that these methods are almost entirely generic, based on internal comparisons among samples using subjective decision criteria (e.g., p-values), rather than on testing data against serious theory.

If this be so, then perhaps the gene-centered view of life, or even the gene concept itself as life's fundamental 'atomic' unit, needs to be abandoned as a crude if once important approximation to the nature of life.  I have no brilliant ideas, but will try here to present the sorts of known facts that might stimulate some original thinker's synthesizing insight--or, alternatively, might lead us to believe that no such thing is even needed, because we already understand the relevant nature of life: that as an evolutionary product it is inherently not 'regular' the way physics and chemistry are.  But if our understanding is already correct, then our public promises of precision medicine are culpably misleading slogans.

In part V of this series I mentioned several examples of deep scientific insight that seemed to have at least one thing in common: they were based on a synthesis that unified many seemingly disparate facts.  We have many facts confronting us.  How would or might we try to think differently about them?  One way might be to ask the following questions: What if biological causation is about difference, not replication?  What if 'the gene' is misleading, and we were to view life in terms of interactions rather than genes-as-things?  How would that change our view?

Here are some well-established facts that might be relevant to a new, synthetic rather than particulate view of life:

1. Evolution works by difference, not replication.  Since Newton, or perhaps back to the Greek geometers, what we now call 'science' was largely about understanding the regularities of existence.  What became known as 'laws' of Nature were, initially for theological reasons, assumed to be the basis of existence.  The same conditions led to the same outcomes anywhere.  Two colliding billiard balls here on Earth, or in any other galaxy, would react in identical ways (yes, I know, one can never have exactly the same conditions--or billiard balls--but the idea is that the parts of the cosmos were exchangeable).  But one aspect of life is that it is an evolved chemical phenomenon whose evolution occurred because elements were different rather than exchangeable.  Evolution, and hence life, is about interactions or context-specific relative effects (e.g., genetic drift, natural selection).
2.  Life is a phenomenon of nested (cladistic) tree-like relationships.  Life is not about separated, independent entities, but about entities that, from the biosphere down (at least) to individual organisms, are made of sets of variations of higher-level components.  Observations at one level, at least from cells up to organs, systems, individuals, populations, species, and ecosystems, are reflections of the nested level(s) that the observed level contains.
3.  Much genetic variation works before birth or at a population level.  Change may arise by genetic mutation, but function is about interactions, and success--which in life means reproduction--depends on the nature of the interactions at all levels.  That is, Darwinian competition among individuals of different species is only one, and perhaps one of the weakest, kinds of such interaction.  Embryonic development is a much more direct, and stronger, arena for filtering interactions than competition (natural selection) among adults for limited resources.  In a similar way, some biological and even genetic factors work only at the population level (bees and ants are an obvious instance, as are bacterial biofilms and the life cycles of sponges or slime molds).
4.  Homeostasis is one of the fundamental and essential ways that organisms interact.  Homeostasis is an obvious example of a trans phenomenon.  It is complexly trans because not only do gene-expression combinations change, but they are induced to change by extra-cellular and even extra-organismal factors, both intra- and inter-species.  Such a balance or stasis, with its organized and orchestrated combinatorial reactions, surely cannot be read off in cis.  We have long known about interactions and reactions and so on; this is not to invoke some vague Gaia notion, but to point out the deep level of interactions, which depend on many factors that themselves vary, etc.
5. Environments include non-living factors as well as social/interactional ones.  No gene is an island, even if we could identify what a 'gene' was; indeed, that no gene stands alone is partly why we can't.  Environments are like the celestial spheres: from each point of view everything else is the 'environment', including the rest of a cell, organ, system, organism, population, or ecosystem.  In humans and many other species, we must include behavioral or social kinds of interactions as 'environment'.  There is no absolute reference frame in life any more than in the cosmos.  Things may appear linear from one point of view, but not from another.  The 'causal' effects of a protein code (a classical 'gene') depend on its context--and vice versa.
6. The complexity of factors often implies weak or equivalent causation--and that's evolutionarily fundamental.  Factors or 'forces' that are too strong on their own--that is, that appear as individually identifiable 'units'--are often lethal to evolutionary survival.  Most outcomes we (or evolution) care about are causally complex, and they are always simultaneously multiple: a species isn't adapting to just one selective factor at a time, for example.  Polygenic causation (using the term loosely to refer to complex multifactorial causation) is the rule.  These facts mean that individually identified factors usually have weak effects and/or that there are alternative ways to achieve the same end, within or among individuals.  Selection, even of the classical kind, must typically be weak relative to any given involved factor.
7. The definition of traits is often subjective and affects their 'cause'.  Who decides what 'obesity', 'intelligence', or 'diabetes' is?  In general, we might say that 'Nature' decides what is a 'trait', but in practice it is often we, via our language and our scientific framework, who try to divide up the living world into discrete categories and hence to search for discrete causal factors.  It is no surprise that what we find is rather arbitrary, and gives the impression of biological causation as packaged into separate items rather than being fundamentally about a 'fabric' of interactions.  But the shoehorn is often a major instrument in our causal explanations.
8. The 'quantum mechanics' effect: interaction affects the interactors.  In many aspects of life, obviously but not exclusively applied to humans, when scientists ask a question or publicize a result, it affects the population in question.  This is much like the measurement effect in quantum mechanics.  Studying something affects it in ways relevant to the causal landscape we are studying.  Even in non-human life, the 'studying' of rabbits by foxes, or of forests by sunlight, affects what is being studied.  This is another way of pointing out the pervasive centrality of interaction.  Just like political polls, science 'news' in our media affects our behavior, and it is almost impossible to measure the breadth and impact of this phenomenon.
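Fact 7--that trait definitions shape their apparent 'causes'--can be sketched numerically (a hypothetical model, not data from any study): dichotomizing a continuous trait at different diagnostic cutoffs changes who counts as a 'case', and with it the measured strength of the very same allele.

```python
# Hypothetical illustration: one allele shifts a continuous trait slightly;
# the apparent 'risk ratio' of that allele depends on where we draw the
# diagnostic cutoff that defines 'disease'.
import random

random.seed(1)
N = 20000
carrier = [random.random() < 0.3 for _ in range(N)]  # 30% carry the allele
# Carriers' trait is shifted up by half a standard deviation (hypothetical)
trait = [random.gauss(0.5 if c else 0.0, 1.0) for c in carrier]

def risk_ratio(cutoff):
    """Rate of 'affected' (trait > cutoff) in carriers vs non-carriers."""
    aff_c = sum(1 for c, t in zip(carrier, trait) if c and t > cutoff)
    aff_n = sum(1 for c, t in zip(carrier, trait) if not c and t > cutoff)
    n_c = sum(carrier)
    return (aff_c / n_c) / (aff_n / (N - n_c))

# A stricter definition of 'disease' makes the same allele look stronger
print(f"lenient cutoff: {risk_ratio(1.0):.2f}  strict cutoff: {risk_ratio(2.5):.2f}")
```

The allele and its effect never change; only our shoehorn does, which is the arbitrariness the paragraph above describes.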

All of these phenomena can be shoe-horned into the 'gene' concept or a gene-centered view of life or of biomedical 'precision'.  But it's forced: each case has to be treated differently, by statistical tests rather than by a rigorous theory, and with all sorts of exceptions, involving things like those listed here, that have to be given post hoc explanations (if any).  In this sense, the gene concept is an outmoded, overly particulate and atomized view of a phenomenon--life--whose basic nature is that it is not so particularized.

Take all of these facts, and many others like them, and try to view them as a whole, and as a whole that, nonetheless, can evolve.  Yesterday's post on how I make doggerel was intended to suggest a similar kind of mental exercise.  There can be wholes, and they can evolve, but they do it as wholes.  If there is a new synthesis to be found, my own hunch is that it would be in these sorts of thoughts.  As with the examples I discussed a few days ago (plate tectonics, evolution itself, and relativity), there was a wealth of facts that were not secret or special, and were well known.  But they hadn't been put together until someone thinking hard about them, who was also smart and lucky, managed it.  Whether we have this in the offing for biology, or whether we even need it, is what I've tried to write about in this series of posts.

Of course, one shouldn't romanticize scientific 'revolutions'.  As I've also tried to say, these sorts of facts, which are ones I happen to have thought of to list, do not in any way prove that there is a grand new synthesis out there waiting to be discovered.  It is perfectly plausible that this kind of ad hoc, chaotic picture is simply what life is like.  But if that's the case, we should shed the particulate, gene-centered view we have and openly acknowledge the ad hoc, complex, fundamentally trans nature of life--and, therefore, of what we can promise in terms of health miracles.

Is genetics still metaphysical? Part V. Examples of conditions that lead to transformative insights

A commenter on this series asked what I thought "a theory of biology should (realistically) aspire to predict."  The series (part 1 here) has discussed aspects of the life sciences in which we don't currently seem to have the kind of unifying underlying theory found in other physical sciences.  I'm not convinced that many people even recognize the problem.

I couched the issues in the context of asking whether the 'gene' concept was metaphysical or was more demonstrably or rigorously concrete.  I don't think it is concrete, and I do think many areas of the life sciences are based on internal generic statistical or sampling comparison of one sort of data against another (e.g., genetic variants found in cases vs controls in a search for genetic causes of disease), rather than comparing data against some prior specific theory of causation other than vacuously true assertions like 'genes may contribute to risk of disease'.  I don't think there's an obvious current answer to my view that we need a better theory of biology, nor of course that I have that answer.   

I did suggest in this series that perhaps we should not expect biology to have the same kind of theory found in physics, because our current understanding doesn't (or at least shouldn't) lead us to expect the same kind of cause-effect replicability.  Evolution--which was one of the basic revolutionary insights in the history of science, and which is about life--specifically asserts that life got the way it is by not being replicable (e.g., in one process, by natural selection among different--non-replicate--individuals).  But that's also a very vanilla comment.

I'll try to answer the commenter's question in this and the next post.  I'll do it in a kind of 'meta' or very generic way, through the device of presenting examples of the kind of knowledge landscape that has stimulated new, deeply synthesizing insight in various areas of science.

1.  Relativity
History generally credits Galileo with the first modern understanding that some aspects of motion appear different from different points of view.  A classic case was of a ship gliding into the port of Genoa: if someone inside the ship dropped a ball, it would land at his feet, just as it would for someone on land.  But someone on land watching the sailor through a window would see the ball move not just down but also along an angled path toward the port, the hypotenuse of a right triangle, which is longer than the straight-down distance.  But if the two observations of the same event were quantitatively different, which was 'true'?  Eventually, Einstein extended this question using images such as trains and railroad stations: a passenger who switched on two lightbulbs, one at each end of a train, would see both flashes at the same time.  But a person at a station the train was passing through would see the rearmost flash before the frontward one.  So what does this say about simultaneity?

These and many other examples showed that, unlike Isaac Newton's view of space and time as existing in an absolute sense, they depend on one's point of view--in the sense that if you adjust for that, all observers will see the same laws of Nature at work.  Einstein was working in the Swiss patent office, and at the time there were problems inventors were trying to solve in keeping coordinated time--this affected European railroads, but also telecommunication, marine transport, and so on.  Thinking synthetically about various aspects of the problem later led Einstein to show that a similar answer applied to acceleration, and to a fundamentally different, viewpoint-dependent understanding of gravity as curvature in space and time itself, a deeply powerful understanding of the inherent structure of the universe.  A relativistic viewpoint helped account for the nature and speed of light, aspects of both motion and momentum, electromagnetism, the relationship between matter and energy, the composition of 'space', the nature of gravity, time and space as a unified matrix of existence, the dynamics of the cosmos, and so on, all essentially in one go.
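The train example can be made quantitative with the textbook Lorentz-transformation result (standard special relativity, not derived in the original post):

```latex
% Platform-frame time of an event at train-frame position $x'$ and time $t'$:
\[
  t = \gamma\left(t' + \frac{v\,x'}{c^{2}}\right),
  \qquad
  \gamma = \frac{1}{\sqrt{1 - v^{2}/c^{2}}} .
\]
% Two flashes simultaneous on the train (equal $t'$), separated by $\Delta x'$
% along its length, occur in the platform frame separated by
\[
  \Delta t = \frac{\gamma\, v\, \Delta x'}{c^{2}} ,
\]
% so the platform observer assigns the rear flash (smaller $x'$) the earlier
% time, exactly as in the train-and-station image above.
```

Simultaneity is thus not absolute but frame-dependent, which is the 'which was true?' puzzle the example poses.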

The mathematics is very complex (and beyond my understanding!).  But the idea itself was mainly based on rather simple observations (or thought experiments), and did not require extensive data or exotically remote theory, though it has been shown to fit very diverse phenomena better than the former non-relativistic views, and is required for aspects of modern life, as well as for our wish to understand the cosmos and our place in it.  That's how we should think of a unifying synthesis.

The insight that led to relativity as a modern concept--that there is no one 'true' viewpoint ('reference frame')--is a logically simple one, but one that united many different well-known facts and observations that had not previously been accounted for by the same underlying aspect of Nature.

2.  Geology and Plate Tectonics (Continental Drift)
Physics is very precise from a mathematical point of view, but transformative synthesis in human thinking does not require that sort of precision.  Two further examples will show this, and that principles or 'laws' of Nature can take various forms.

The prevailing western view until the last couple of centuries, even among scientists, was that the cosmos had a point Creation, basically in its present form, a few thousand years ago.  But the age of exploration, occasioned by better seagoing technology and a spirit of global investigation, turned up oddities, such as sea shells at high elevations, and fossils.  The orderly geographical nature of coral atolls, Pacific island chains, and volcanic and earthquake-prone regions was discovered.  Remnants of climates very different from present ones were found in some locations.  Similar-looking biological species (and fossils) were found in disjoint parts of the world, such as South Africa, South America, and eventually Antarctica.  These were given various local, ad hoc, one-off explanations.  There were hints in previous work, but an influential author was Alfred Wegener, who wrote (e.g., from 1912--see Wikipedia: Alfred Wegener) about the global map, showing evidence of continental drift, the continents being remnants of a separating jigsaw puzzle, as shown in the first image here; the second shows additional evidence of what were strange similarities in distantly separated lands.  This knowledge had been accumulated by the many collectors and travelers during the Age of Exploration.  Better maps showed that continents sometimes seemed to 'fit' each other like pieces of a jigsaw puzzle.



Geological ages and continental movement (from Hallam, A Revolution in the Earth Sciences, 1973; see text)


Evidence for the continental jigsaw puzzle (source Wikipedia: Alfred Wegener, see text)

If the world were young and static since some 'creation' event, these individual findings were hard to account for.  This complemented ideas by early geologists like Hutton and Lyell around the turn of the 19th century.  They noticed that deep time was also consistent with the idea of (pardon the pun) glacially slow observable changes in glaciers, river banks, and coastlines that geologists had since documented.  Their idea of 'uniformitarianism' was that processes observable today occurred as well during the deep past, meaning that extrapolation was a valid way to make inferences.  Before this, ad hoc, isolated, and unrelated explanations had generally been offered piecemeal for these sorts of facts: similar plants or animals on oceanically separated continents, for example, must have gotten there by rafting on detritus from rivers that had been borne to the sea.

Many very different kinds of evidence were then assembled, and a profound insight was the result, which we today refer to by terms such as 'plate tectonics' or 'continental drift'.  There are now countless sources for the details, but one that I think is interesting is A Revolution in the Earth Sciences, by A. Hallam, published by Oxford Press in 1973, only a few years after what is basically the modern view had been convincingly accepted.  His account is interesting because we now know so much more that reinforces the idea, but it was as stunning a thought-change as was biological evolution in Darwin's time.  I was a graduate student at the time, and we experienced the Aha! realization that was taking place: before our very observational eyes, so to speak, diverse facts were being fit under the same synthesizing explanation (even some of our faculty were still teaching the old, forced, stable-earth explanations).

Among much else, the magnetic orientation of geological formations, including symmetric stripes of magnetic reversals flanking the Mid-Atlantic Ridge, documented the sea-floor spreading that separated the broken-off continental fragments--the pieces of the jigsaw puzzle.  Mountain height and sea depth patterns gained new explanations on a geologic (and very deep time) scale, because the earth was accepted as being older than biblical accounts allow.  Atolls and the volcanic ring of fire are accounted for by continental motions.

This was not a sudden one-factor brilliant finding, but rather the accumulation of centuries of slowly collected global data from the age of sail (corresponding to today's fervor for 'Big Data'?).  A key is that the local facts were not really accounted for by locally specific explanations, but were globally united as instances of the same general, globally underlying processes.  Coastlines, river gorges, mountain building, fossil-site locations, current evidence of very different past climates, and so on were brought under the umbrella of one powerful, unifying theory.  It was the recognition of very disparate facts that could be synthesized that led to the general acceptance of the theory.  Indeed, subsequent and extensive global data continue to this day to make the hypothesis of early advocates like Wegener pay off.

3.  Evolution itself
It is a 100% irrefutable explanation for life's diversity to say that God created all the species on Earth.  But that is of no use in understanding the world, especially if we believe, as is quite obvious, that the world and the cosmos more broadly follow regular patterns or 'laws'.  Creationist views of life's diversity, of fossils, and so on, are all post hoc, special explanations for each instance.  Each living species can be credited to a separate divine reason or event of creation.  But when world traveling became more common and practicable, many facts and patterns were observed that seemed to make such explanations lame and tautological at best.  For example, fossils resembled crude forms of species present today in the same area.  Groups of similar species are found living in a given region, with clusters of somewhat less similar species elsewhere.  The structures of species, such as vertebrates or insects, showed similar organization, and one could extend this to deeper if more different patterns in other groups (e.g., what we now would call genera, phyla, and so on).  Basic aspects of inheritance seemed to apply to species, plant and animal alike.  If all species had been, say, on the same Ark, why were similar species so geographically clustered?

It dawned on investigators scanning the Victorian Age's global collections, and in particular on Darwin and Wallace, that because offspring resemble their parents, though are not identical to them, and because individuals and species have to feed on each other or compete for resources, those that did better would proliferate more.  If they became isolated, they could diverge in form; moreover, the traits of each species were suited to its circumstances, even if species fed off each other.  Over time this would also produce different, but related, species in a given area.  New species were not seen directly to arise, but precedents from breeders' history showed the effects of selective reproduction, and geologists like Lyell had made biologists aware of the slow but steady nature of geological change.  If one accepted the idea that, rather than the short history implied by biblical reading, life on earth had instead been here for a very long time, these otherwise very disparate facts about the nature of life and the reasons for its diversity might have a common 'uniformitarian' explanation--a real scientific explanation in terms of a shared causative process, rather than a series of unrelated creations: the synthesis of a world's worth of very diverse facts made the global pattern of life make causal and explanatory sense, in a way that it never had before.

Of course the fact of evolution does not directly inform us about genetic causation, which has been the motivating topic of this series of posts.  We'll deal with this in our next post in the series.

Insight comes from facing a problem by synthesis related to pattern recognition
The common feature of these examples of scientific insight is that they involve synthesis derived from pattern recognition.  There is a problem to be solved or something to be explained, and multiple facts that may not have seemed related and have been given local, ad hoc, one-off 'explanations'.  Often the latter are forced, far-fetched, or 'lazy' (as in Creationism, which required no understanding of the birds and the beasts).  And because such explanations are not based on any sort of real-world process, they cannot be tested, tempered, and improved.  Scientific accounts, unlike Creationist ones, can be shown to be wrong, and hence our understanding improved.

In our examples of the conditions in which major scientific insights have occurred, someone, or some few, looking at a wealth of disparate facts, or perhaps finding some new fact relevant to them, saw through the thicket of 'data' and found meaning.  In each case, the more facts a truly new idea incorporates--even facts not previously considered relevant--the more it strikes home.

Well!  If we don't have diverse, often seemingly disparate facts in genetics, then nobody does!  But the situation now seems somewhat different from the above examples: indeed, with precedents like those above, and several others including historic advances in chemistry, quantum physics, and astronomy, we seem to hasten to generalize and claim our own synthesizing 'laws'.  But how well are we actually doing, and have we identified the right primary units of causation on which to do the same sort of synthesizing?  Or do we need to?

I'll do my feeble best to offer some thoughts on this in the final part of this series.
