Is genetics still metaphysical? Part III. Or could that be right after all?

In the two prior parts of this little series (I and II), we discussed the way in which unknown, putatively causative entities were invoked to explain their purported consequences, even when the agent itself could not be seen or its essence characterized.  Atoms and an all-pervasive ether are examples.  In the last two centuries, many scientists followed some of the principles laid down in the prior Enlightenment period, and were intensely empirical, to avoid untrammeled speculation.  Others followed long tradition and speculated about the underlying essentials of Nature that could account for the empiricists' observations.  Of course, in reality I think most scientists, and even strongly religious people, believed that Nature was law-like: there were universally true underlying causative principles.  The idea of empiricism was to escape the unconstrained speculation that was the inheritance even from classical times (and, of course, from dogmatic religious explanations of Nature).  Repeated observation was the key to finding Nature's patterns, which could only be understood indirectly.  I'm oversimplifying, but this was largely the situation in 19th and early 20th century physics, and it became true of historical sciences like geology, and of biology, during the same period.

At these stages in the sciences, free-wheeling speculation was denigrated as delving into metaphysics, because only systematic empiricism--actual data!--could reveal how Nature worked.  I've used the term 'metaphysics' because in the post-Enlightenment era it has carried, and been used in, a pejorative sense.  On the other hand, if one cannot make generalizations, that is, infer Nature's 'laws', then one cannot really turn retrospective observation into prospective prediction.

By the turn of the century, we had Darwin's attempt at Newtonian law-like invocation of natural selection as a universal force for change in life, and we had Mendel's legacy that said that causative elements, that were dubbed 'genes', underlay the traits of Nature's creatures.  But a 'gene' had never actually been 'seen', or directly identified until well into the 20th century. What, after all, was a 'gene'? Some sort of thing?  A particle?  An action?  How could 'it' account for traits as well as their evolution?  To many, the gene was a convenient concept that was perhaps casually and schematically useful, but not helpful in any direct way.  Much has changed, or at least seems to have changed since then!

Genetics is today considered a mainline science, well beyond the descriptive beetle-collecting style of the 19th century.  We now routinely claim to identify life's causative elements as distinct, discrete segments of DNA sequence, and a gene is routinely treated as causing purportedly 'precisely' understandable effects.  If raw Big Data empiricism is the Justification du Jour for open-ended mega-funding, the implicit justifying idea is that genomics is predictive the way gravity and relativity and electromagnetism are--if only we had enough data!  Only with Big Data can we identify these distinct, discrete causal entities, characterize their individual effects and use that for prediction, based on some implicit theory or law of biological causation.  It's real science, not metaphysics!

But even with today's knowledge, how true is that?

The inherent importance of context-dependency and alternative paths
It seems obvious that biological causation is essentially relative in nature: it fundamentally involves context and relationships.  Treating genes as individual, discrete causal agents really is a form of metaphysical reification, not least because it clearly ignores what we know about genetics itself. As we saw earlier, today there is no such thing as 'the' gene, much less one we can define as the discrete unit of biological function.  Biological function seems inherently about interactions.  The gene remains in that sense, to this day, a metaphysical concept--perhaps even in the pejorative sense, because we know better!

We do know what some 'genes' are: sequences coding for protein or mature RNA structure.  But we also know that much of DNA has function unrelated to the stereotypical gene.  A gene has multiple exons and is often differently spliced (among many other things, including antisense RNA post-transcriptional regulation, and RNA editing), and is combined with other 'genes' to contribute to some function.  A given DNA coding sequence often is used in different contexts in which 'its' function depends on local context-specific combinations with other 'genes'.  There are regulatory DNA sequences, sequences related to the packaging and processing of DNA, and much more.  And this is just the tip of the current knowledge iceberg; that is, we know there's the rest of the iceberg not yet known to us.

Indeed, regardless of what is said and caveats offered here and there as escape clauses, in practice it is routinely assumed that genes are independent, discrete agents with additive functional effects, even though this additivity is a crude result of applying generic statistical rather than causal models, mostly to whole organisms rather than individual cells or gene products themselves.  Our methods of statistical inference are not causal models as a rule but really only indicate whether, more probably than not, in a given kind of sample and context a gene actually 'does' anything to what we've chosen to measure. Yes, Virginia, the gene concept really is to a great extent still metaphysical.
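To make that additivity problem concrete, here is a small simulation (a sketch only, with invented allele frequencies and effect sizes): the trait below depends purely on an interaction between two loci, yet a standard additive regression happily assigns each locus its own separate 'effect'.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Two biallelic loci, genotypes coded as allele counts 0/1/2.
# The 0.4 allele frequency is a made-up illustrative value.
g1 = rng.binomial(2, 0.4, n)
g2 = rng.binomial(2, 0.4, n)

# True model: the trait depends ONLY on the interaction g1*g2, plus noise.
# Neither locus has any effect on its own.
trait = 1.0 * (g1 * g2) + rng.normal(0, 1, n)

# Standard additive fit: trait ~ b0 + b1*g1 + b2*g2, with no interaction term.
X = np.column_stack([np.ones(n), g1, g2])
b0, b1, b2 = np.linalg.lstsq(X, trait, rcond=None)[0]
print(f"additive 'effects': b1={b1:.2f}, b2={b2:.2f}")
# Both b1 and b2 come out clearly nonzero (near 0.8 with these settings):
# the additive model misattributes a pure interaction to separate,
# discrete per-gene effects.
```

The point is not that additive models are useless, but that the additive 'effects' they report are properties of the statistical model fitted to a particular sample, not demonstrations that each gene is a discrete, independent causal agent.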

But isn't genomic empiricism enough?  Why bother with metaphysics (or whatever less pejorative-sounding term you prefer)? Isn't it enough to identify 'genes', however we do it, and estimate their functions empirically, regardless of what genes actually 'are'?  No, not at all.  As we noted yesterday, without an underlying theory, we may sometimes be able to make generic statistical 'fits' to retrospective data, but it is obvious, even in some of the clearest supposedly single-gene cases, that we do not have strong bases for extrapolating such findings in direct causal or predictive terms.  We may speak as if we know what we're talking about, but those who promise otherwise are sailing as close to the wind as possible.

That genetics today is still rather metaphysical, and rests heavily on fancifully phrased but basically plain empiricism, does not gainsay the fact that we are doing much more than just empiricism, in many areas, and we try to do that even in Big Promise biomedicine.  We do know a lot about functions of DNA segments.  We are making clear progress in understanding and combatting diseases and so on.  But we also know, as a general statement, that even in closely studied contexts, most organisms have alternative pathways to similar outcomes, and the same mutation introduced into different backgrounds (in humans, because the causal probabilities vary greatly and are generally low, and in different strains of laboratory animals) often has different effects.  We already know from even the strongest kind of genetic effects (e.g., BRCA1 mutations and breast cancer) that extrapolation of future risk from retrospective data-fitting can be grossly inaccurate.  So our progress is typically a lot cruder than our claims about it.

An excuse that is implicit and sometimes explicit is that today's Big Data 'precision, personalized' medicine, and much of evolutionary inference, are, by the same age-old argument, good simply because they are based on facts, on pure empiricism, not resting on any fancy effete intellectual snobs' theorizing:  We know genes cause disease (and everything else) and we know natural selection causes our traits.  And those in Darwinian medicine know that everything can be explained by the 'force' of natural selection.  So just let us collect Big Data and invoke these 'theories' superficially as justification, and mint our predictions!

But--could it be that the empiricists are right, despite not realizing why?  Could it be that the idea that there is an underlying theory or law-like causal reality, of which Big Data empiricism provides only imperfect reflections, really is, in many ways, only a hope, but not a reality?

Or is life essentially empirical--without a continuous underlying causal fabric?
What if Einstein's dream of a True Nature, one that doesn't play dice with causation, was a nightmare?  In biology, in particular, could it be that there isn't a single underlying, much less smooth and deterministic, natural law?  Maybe there isn't any causal element of the sort being invoked by terms like 'gene'.  If an essential aspect of life is its lack of law-like replicability, the living world may be essentially metaphysical in the usual sense of there being no 'true' laws or causative particles as such.  Perhaps better stated, the natural law of life may essentially be that life does not follow any particular law, but is determined by universally unique local ad hoc conditions.  Life is, after all, the product of evolution, and if our ideas about evolution are correct, it is a process of diversification rather than unity, of local ad hoc conditions rather than universal ones.

To the extent this is the reality, ideas like genes may be largely metaphysical in the common sense of the term.  Empiricism may in fact be the best way to see what's going on.  This isn't much solace, however, because if that's the case then promises of accurate predictability from existing data may be culpably misleading, even false in the sense that a proper understanding of life would be that such predictions won't work to a knowable extent.

I personally think that a major problem is our reliance on statistical analysis and its significance criteria, which we can easily apply but which have at best only a very indirect relationship to any underlying causal fabric--and 'indirect' means largely unknowably indirect.  Statistics in this situation is essentially about probabilistic comparisons, and has little or often no basis in causal theory, that is, in the reason for observed differences.  Statistics works very well for inference when properly distributed factors, such as measurement errors, are laid upon some properly framed, theoretically expected result.  But when we have no theory and must rely on internal comparisons and data fitting, as between cases and controls, then we often have no way to know what part of our results has to do with sampling etc., and where any underlying natural laws might be in the empirical mix--if such laws even exist.
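A toy illustration of that internal-comparison point (the counts are invented, and the calculation is just a standard two-proportion z-test, not any causal model): with a large enough sample, a one-percentage-point difference in allele-carrier frequency between cases and controls is 'highly significant', while saying nothing about mechanism.

```python
from math import sqrt, erfc

# Hypothetical counts: risk-allele carriers among 50,000 cases
# and 50,000 controls (31.0% vs 30.0%).
n = 50_000
carriers_cases, carriers_controls = 15_500, 15_000

p1, p2 = carriers_cases / n, carriers_controls / n
p_pool = (carriers_cases + carriers_controls) / (2 * n)

# Two-proportion z-test: a purely internal comparison with no model
# of how (or whether) the variant acts.
z = (p1 - p2) / sqrt(p_pool * (1 - p_pool) * (2 / n))
p_value = erfc(abs(z) / sqrt(2))  # two-sided normal tail probability

print(f"z = {z:.2f}, p = {p_value:.4g}")
# With these numbers z exceeds 3 and p < 0.001: 'highly significant',
# yet the test says nothing about whether, how, or in whom the
# variant actually does anything.
```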

Given this situation, the promise of 'precision' can be seen starkly as a marketing ploy rather than knowledgeable science.  It's a distraction to the public but also to the science itself, and that is the worst thing that can happen to legitimate science.  For example, if we can't really predict based on any serious-level theory, we can't tell how erroneous future predictions will be relative to existing retrospective data-fitting.  So we can't, largely even in principle, know how closely this Big Data romance will approximate any real risk truths, because true risks (of some disease or phenotype) may not exist as such, or may depend on things, like environmental exposures and behavior, that cannot be known empirically--perhaps not even in theory.

Rethinking is necessary, but in our current System of careerism and funding, we're not really even trying to lay out a playing field that will stimulate the required innovation in thought.  Big Data advocates sometimes openly, without any sense of embarrassment, say that serendipity will lead those with Big Data actually to find something important.  But deep insight may not be stimulated as long as we aren't even aware that we're eschewing theory basically in favor of pure extrapolated empiricism--and that we have scant theory even to build on.

There are those of us who feel that a lot more attention, and new kinds of thinking, need to be devoted to the deeper question of how living Nature 'is', rather than to the very shaky empiricism that is easy, if costly, to implement but whose implications are hard to evaluate.  Again, based on current understanding, it is quite plausible that life, based on evolution, which is in turn based on difference rather than replicability, simply is not a phenomenon that obeys natural law in the way oxygen atoms, gravity, and even particle entanglement do.

To the extent that is the case, we are still in a metaphysical age, and there may be no way out of it.

Is genetics still metaphysical? Part II. Is that wrong?

What is the role of theory vs empiricism in science?  How do these distinctions apply to genetics?

Yesterday, we discussed some of the history of contesting views on the subject.  Much of the division occurred before there was systematically theoretical biology.  In particular, when creationism, or divine creative acts rather than strictly material processes, was the main explanation for life and its diversity, the issues were contended in the burgeoning physical sciences, with their dramatic technological advances and experimental settings, and where mathematics was a well-established part of the science and its measurement aspects.


Around the turn of the 20th century, Darwinian evolution was an hypothesis that not even all the leading biologists could accept.  Inheritance was fundamental to any evolutionary view, and inherited somethings seemed obviously to be responsible for the development of organisms from single cells (fertilized eggs). Mendel had shown examples of discretely inherited traits, but not all traits were like that.  Ideas about what the inherited units were (Darwin called them gemmules, Mendel called them Elements, and hereafter I'll use the modern term 'genes') were simply guesses (or just words).  They were stand-ins for what was assumed to exist, but in the absence of their direct identification they were, essentially, only metaphysical or hypothetical constructs.


This lack of identity had serious implications.  For example, evolution is about inherited variation, but genes as conceived in Darwin's time and most of the later 19th century didn't seem to change over generations, except perhaps due to grotesquely nonviable effects called 'mutations'.  How could these 'genes', whatever they were, be related to evolution, which is inherently about change and about relative positive effects leading to selection among the organisms that carried them?


Many critics thought the gene was just a metaphysical concept, that is, used for something imagined, that could not in a serious way be related to the empirical facts about inherited traits. The data were real, but the alleged causal agent, the 'gene', was an unseen construct, yet there was a lot of dogma about genes.  Many felt that the life sciences should stick to what could be empirically shown, and shy away from metaphysical speculation.  As we saw yesterday, this contention between empiricism and theory was a serious part of the debate about fundamental physics at the time.


That was more than a century ago, however, and today almost everyone, including authors of textbooks and most biologists themselves, asserts that we definitely do know what a gene is, in great detail, and it is of course as real as rain and there's nothing 'metaphysical' about it.  To claim that genes are just imagined entities whose existential reality cannot be shown would today be held to be not just ignorant, but downright moronic.  After all, we spend billions of dollars each year studying genes and what they do!  We churn out a tsunami of papers about genes and their properties, and we are promised genetically based 'precision' medicine, and many other genetic miracles besides, that will be based on identifying 'genes for' traits and diseases, that is, enumerable individual genes that cause almost any trait of interest, be it physical, developmental, or behavioral.  That's why we're plowing full budget ahead to collect all sorts of Big Data in genetics and related areas.  If we know what a gene is, then the bigger the data the better, no?


Or could it be that much of this is marketing that invokes essentially metaphysical entities to cover what, despite good PR to the contrary, remains just empiricism?  And if it is just empiricism, why the 'just'?  Isn't it good that, whatever genes 'are', if we can measure them in some way we can predict what they do and live to ripe old ages with nary a health problem?  Can't we in fact make do with what is largely pure empiricism, without being distracted by any underlying law of biological causation, or the true nature of these causative entities--and deliver the miraculous promises? The answer might be a definitive no!


The metaphysical aspects of genes, still today

In essence, genes are not things, they are not always discrete DNA sequence entities with discrete functions, and they are not independently separable causative agents.  Instead, even the term 'gene' remains a vague, generically defined one.  We went through decades in the 20th century believing that a gene was a distinct bit of DNA sequence, carrying protein code. But it is not so simple.  Indeed, it is not simple at all. 

It is now recognized, by those who want to pay attention to reality, that the concept of the 'gene' is still very problematic, and to the extent that assertions are made about 'genes' they are metaphysical assertions, no matter how clothed in the rhetoric of empiricism they may be.  For example, many DNA regions code for functional RNA rather than protein.  Much DNA function has to do with expression of these coding regions.  Many coding regions are used in different ways (for example, different exon splicing) in different circumstances.  Some DNA regions act only when they are chemically modified by non-DNA molecules (and gene expression works exclusively in that way).  Some of 'our' DNA is in microbes that are colonizing us.  And 'traits' as we measure them are the result of many--often hundreds or more--DNA elements, and of interactions among cells.  Each cell's DNA is different at least in some details from that of its neighbors (due to somatic mutation, etc.).  And then there is 'the' environment!  This is central to our biological state but typically not accurately measurable.


Some discussion of these issues can be seen in a report of a conference on the gene concept in 2011 at the Santa Fe Institute.  Even earlier, in 2007, when it seemed we had really learned about genomes, hardly suspecting how much more there was (and is) still to be learned, a review in Genome Research defined the gene in an almost useless way, as follows:

Finally, we propose a tentative update to the definition of a gene: A gene is a union of genomic sequences encoding a coherent set of potentially overlapping functional products. Our definition sidesteps the complexities of regulation and transcription by removing the former altogether from the definition and arguing that final, functional gene products (rather than intermediate transcripts) should be used to group together entities associated with a single gene. It also manifests how integral the concept of biological function is in defining genes.
Really?!  Is that a definition or an academically couched but empty kicking of the can down the road while seeming to be knowledgeable and authoritative?  Or is it simply so empty as to be risible?

There are many now who advocate a 'Third Way', in the rather generic sense of advocating less dogma and more integrative, indeed innovative, approaches.  But even this doesn't say what the Third Way actually is, though one thing for sure is that it's every Third Way member's favorite way of coopting the concept of biological causation as his or her own.  I'm being cynical, and I'm associated with the Third Way myself and believe that serious rethinking about biological causation and evolution is in order, but that doesn't seem to be too unfair a way to characterize the Third Way's characterization of mainline genome-centered, or perhaps genome-obsessed, thinking.  At least, it acknowledges that we don't just have 'genes' and 'environment', but that biological causality is based fundamentally on interactions of many different kinds.

DNA is basically an inert molecule on its own
In genetic terminology, DNA is basically an inert molecule.  That is, whatever you want to call genes act in a context-specific way, and this goes beyond what is known as cis interactions among local DNA elements (like regulatory sequences flanking coding sequences) along a given strand. Instead, genetic function is largely a trans phenomenon, requiring interaction among many or even countless other parts of DNA on the different chromosomes in the cell.  And often if not typically, nothing happens until the coded product--RNA or protein--itself is modified by or interacts with other compounds in the cell (and responds to external things the cell detects).

Beyond even that complexity there is comparable evolutionary and physiological complexity.  There are many, perhaps often countless, alternative biological pathways to essentially the same empirical result (say, height or blood pressure or intelligence).  These causally equivalent combinations, if we can even use the term 'causal', are many and un-enumerated, and perhaps un-enumerable.  The alternatives may be biochemically different, but if they confer essentially no difference in terms of natural selection, they are evolutionarily as well as physiologically equivalent.  Indeed, the fact is that every cell, and hence every organism, is different in regard to the 'causal' bases of traits.  We may be able to define and hence measure some result, such as blood pressure or reproductive fitness; but to speak of causes as if they are individually distinct or discrete entities is still essentially being metaphysical.  Yet, for various sociocultural and economic reasons, we seem unwilling to acknowledge this.

You might object by saying that in fact most geneticists, from Francis Collins down to the peons who plead for his funding support, are being essentially empirical and not indulging in theory.  Yes, they drop words like 'gene' and 'epigenome' and 'microbiome' or 'network' or 'system', but these are on or over the edge of metaphysics (speculative guessing).  Many who feed at the NIH (and NSF) trough might proudly proclaim that they are in fact not dealing with airy-fairy theory, but simply delivering empirical and hence practical, useful results.  They do genomewide mapping because, as they even proudly declare, they have no causative theory for this disease or that behavioral trait.  Usually, however, they confound statistical significance with formal theory, even if they don't so declare explicitly.

For example, most studies of genotypes and genetic variation relative to traits like disease are based on internal comparisons (cases vs controls, tall vs short, smart vs not-smart, criminal vs non-criminal, addictive vs sober, etc.).  They don't rest on any sort of theory except that they do implicitly identify entities like 'genes'.  Often this is so metaphysical as to be rather useless, but it is only right to acknowledge that these results are occasionally supported by finding an indicated 'gene' (DNA sequence element), whose manipulation or variation can be shown to have molecular function relevant to the trait, at least under some experimental conditions.  But this causative involvement is usually quite statistical, providing only weak causative effects, rather than being in any clear sense deterministic.  We are enabled by this largely pure empiricism to argue that the association we saw in our retrospective study is what we'll see prospectively as causation in the future.  And we now know enough to know that when it seems to work, it is (as, indeed, in Mendel's own time) only the simplest tip of the causative iceberg.

We are tempted to believe, and to suggest, that this 'gene' (or genetic variant, an even cruder attempt at identifying a causative element) will be predictive of, say, a future disease at least in some above-average sense, even if we don't know the exact amount of associated risk.  But even that is not always the case: the associated risks are usually small and data-specific, and often vary hugely from study to study, over time, or among populations.  That means, for example, that people--typically by far most people--carrying the risk variant will not get the associated disease!  The variant may often do nothing when put into, say, a transgenic mouse.  The reason has to be context, but we usually have scant idea about those contexts (even when they are environmental, where the story is very similar).  That is a profound but far under-appreciated (or under-acknowledged) fact with very widespread empirical support!
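The arithmetic behind that point is worth spelling out.  With hypothetical but GWAS-typical numbers (a 1% lifetime risk and a 1.5-fold relative risk are assumptions chosen for illustration; many reported effects are far smaller):

```python
# Hypothetical illustrative values, not data from any particular study.
baseline_risk = 0.01   # lifetime risk in non-carriers
relative_risk = 1.5    # per-variant relative risk

carrier_risk = baseline_risk * relative_risk
unaffected_carriers = 1 - carrier_risk

print(f"carrier risk: {carrier_risk:.1%}")
print(f"carriers who never get the disease: {unaffected_carriers:.1%}")
# Carriers' risk rises only from 1% to 1.5%, so about 98.5% of people
# carrying the 'risk variant' never get the associated disease.
```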


Indeed, the defense of pure empiricism is one of convenience, funding-wise among other reasons; but it is perhaps, with today's knowledge, all we can do if we are wedded to Big Data science and public promises of 'precision' genomic prediction.  When or if we have a proper theory, a generalization about Nature, we can not only test our empirical data against the theory's predictions, but also use the theory to predict new, future outcomes with a convincing level of, yes, precision.  Prediction is our goal, and the promises (and, notably, research funding) rest on prediction, not just description.  So, as Einstein (and Darwin) felt, an underlying theory of Nature makes data make sense.  Without it we are just making hopeful guesses.  Anyone who thinks we have such a theory, based on all the public rhetoric by scientists, is, like most of the scientists themselves, confusing empiricism with theory, and description with understanding.  Those who are thoughtful know very well that they are doing this, but can't confess it publicly.  Retired people (like me) are often less inhibited!

Or could there perhaps be another way to think about this, in which genetics as currently understood remains largely metaphysical: genetics is real, but we simply don't yet have an adequate way of thinking that will unite empiricism with some underlying global reality, some theory in the proper scientific sense?


Tomorrow we'll address the possibility that genetics is inherently metaphysical in that there isn't any tractably useful universal natural law out there to be discovered.

Is genetics still metaphysical? Part I. Some general history.

In very broad terms, modern science has had debates about two basic kinds of approaches to understanding the world.  To over-simplify, they are the empirical and the theoretical approaches. Some argue that we can know only what we can detect with our sensory systems (and machines to extend them), but we can never know what general causal principles account for those data, or even if such real, true principles exist. Others view science's essential job as not just accumulating collections of data, which are necessarily imperfect, but to use such observations to build a picture of the true, or perfect underlying regularity--the 'laws' of Nature.

In the former case we just have to make measurements and try to show the ways in which comparable situations lead to comparable outcomes.  In the latter, we want what we call 'theory', that is, perfect generalizations that tell us how a given situation will turn out, and what the causal reasons are.  The standard assumption of the physical sciences is that Nature is, indeed, universally law-like.  Variables like the gravitational constant and the speed of light really are universally, precisely constant.

These are age-old differences, often 'just' philosophical, but they're quite important.  Comparably important is the still-unanswered question as to whether any phenomenon in Nature is irreducibly probabilistic rather than deterministic, or whether probabilistic aspects of Nature really just reflect our imperfect sampling and measurement.  This is the important distinction between epistemology--how we know things--and ontology--how things really are.  Can we ever tell the difference?

Empiricism is in some ways the easy part.  We just go out and make measurements and let them accumulate so we can generalize about them.  That's a lot of slogging to get the data, but all you have to do is be systematic and careful.  Don't give me airy generalizations, just the facts, please!

In other ways, theory is the easy part.  All you have to do is sit in your armchair, as the proverbial denigratory model has it, and make up something that sounds exotic (or even mathematically intricate) and claim you know how Nature 'is'.  Data are imperfect, so don't bother me about that! There are long traditions in both kinds of approach, and to a great extent it's only been the past few hundred years in which there has been melding of these two basic approaches.

Often, theory hypothesizes some fundamental objects whose properties and actions can only be seen indirectly, as they are manifest in measurable phenomena.  Here there is a delicate boundary between what is essentially 'metaphysical' as opposed to real.  Many object to the use of metaphysical concepts and claims as being essentially untestable, and argue that only empiricism is real and should be taken seriously.  In the 19th and early 20th centuries, as technology revealed more and more about unseen Nature, things that were not yet seen directly, but that could be hypothesized and tied to things we could measure, were taken as true by some but denigrated as metaphysical by pure empiricists.

These distinctions were never that clear, in my view (even if they provided jobs for philosophers to write about).  Empiricism is retrospective, but understanding requires some sort of predictability, which is prospective.  If we cannot reliably generalize, if the same conditions don't always lead to the same result, how can observing the former lead us to the latter?  Predictive power is largely what we want out of science, even if it's just to confirm our understanding of Nature's laws.

Until fairly recently, these issues have mainly been housed in the physical sciences, but since Linnaeus' time, and especially after Darwin and Wallace, they have applied to biology as well.
In this brief series we'll try to explore whether or how we can think of biology as the result of such universal laws, or whether all we can do is make observations and rough causal generalizations about them.  What is the place for strong causal theory in biology, or are empiricism and very general notions of process enough?

An example from the early prime era in modern science is the 'atom'.  Matter was conceived as being composed of these unseen particles, that accounted for the weight and properties of chemicals, and whose movement accounted for the weight, temperature, and pressure in gases.  Similar kinds of issues related to electromagnetism: what 'was' it?

An important late 19th-early 20th century example had to do with the existence of 'ether' as the medium through which electromagnetic radiation moved.  Ether could not be seen or felt, but wavelike radiation had to be waves in something, didn't it?  Late-century tests failed to find it (e.g., the famous Michelson-Morley experiment).  In well-known interchanges at the time, figures like Ernst Mach, Albert Einstein and Max Planck thought about and debated whether there was a 'real' underlying general 'fabric' of Nature, or whether specific empirical data simply showed us enough, and trying to delve deeper was dealing in metaphysics.  Many felt that was simply not justified--measurement, or empiricism, was what science could hope for.  On the other hand some, like Einstein, were convinced that Nature had a universal and real underlying reality of which measurements were reflections.  He felt that theory, and in this case mathematics, could reveal or even 'intuit' Nature's underlying fabric.  An interesting article by Amanda Gefter in Nautilus science magazine deals with some of this history, with useful references.

So what about biology?
Biology had been largely a descriptive or even theological field before it became a modern science. But then came Darwin and his idea of evolution.  He viewed natural selection as a kind of Newtonian universal force.  Was it a type of explanation fitted simply around the empirical data that had been collected by Naturalists, or did it constitute some form of universal theory of life as Darwin asserted? Selection as a force had to work through some 'medium' or elements of inheritance.   His causal elements ('gemmules') were (like Lamarck's before him) entirely invented to 'fit' what was being observed about the evolution of diversity.  Indeed, he modeled natural selection itself after intentional agricultural selection because the latter could be demonstrated by human intent, while the former was generally far too slow to observe directly.  But there had to be some 'units' of inheritance for it to work, so he essentially invented them out of thin air.  Even in the early 20th century, 'genes' (as they became known) were largely hypothesized units for whose physical nature--or even reality--there was only indirect empirical evidence.

Assuming these discrete causal particles could enable the force, natural selection, to work on adaptive change was much like assuming that electromagnetic radiation needed ether to do its job.  Since differential reproductive success is observable, one can always define it to be the result of selection and assume some gene(s) to be responsible. The test for relative success is, after all, only a statistical one with subjective decision-making criteria (like significance level) in empirical data.  In that sense, natural selection is a very metaphysical notion, because after the fact we can always empirically observe what has succeeded over time, or what functions have evolved, and call that the result of selection.  Such an explanation can hardly be falsified.  What is the reality of the underlying force, that Darwin likened to gravity?  Since it is always dependent on changing local conditions, what sort of a 'law' is it anyway?  And if it's basically metaphysical, should we reject it?

Mendelian genetics as metaphysics
If selection is a process, like gravity, it had to work on objects.  Because individual organisms are temporary (they all die), the objects in question had to be transmitted from parent to offspring.  That transmission was also found, by Mendel's experiments, to be a regular kind of process.  Mendel's causative 'elements', which we now call 'genes', appeared in his carefully chosen pea experiments to be transmitted as discrete things.  They fit the discretely causative world of the energized new field of atomic chemistry (see my Evolutionary Anthropology article on Mendel), with its idea that a chemical is made up of a particular kind of atom (thought by some to be multiples of hydrogen at the time), and Mendel's statistical tests showed a reasonably good fit to that discrete-unit worldview (indeed, accusations that he or his assistants cheated may reflect his acceptance of discrete underlying, but unseen and hence metaphysical, elements). But what were these genes?  In what serious sense did they exist as things, rather than just imaginary but essentially unconstrained variables conjured up to account for actual observations--of some sorts of inheritance, that of discretely varying traits--whose actual nature was entirely inaccessible?

These questions became very important in the debate about how evolution worked, since evolution required inheritance of favored states.  But what Mendelian analysis, the only 'genetic' analysis available at the time, showed was that the causal genes' effects did not change, and they only were shown to fit discretely varying traits, not the quantitative traits of Darwinian evolution.  For these reasons even many mainline evolutionary biologists felt that genes, whatever they were, couldn't account for evolution after all.  Maybe geneticists were indulging in metaphysics.

This was similar to the situation that engaged Einstein, Ernst Mach, and others about physics, but when it came to biology, the difference between empiricism and metaphysics became, literally, quite lethal!  The tragic impact of Trofim Lysenko in the Soviet Union stemmed from the scientific power structure he established, based on promises of rapid adaptation in plants, for example to the long, frozen Soviet winters, without adaptive 'genes' having to arise at evolution's slow pace.  As I summarized in another Ev. Anth article, it was in part the alleged 'metaphysical' nature of 'genes' in the early 20th century that Lysenko used to reject what most of us would call real science, and to put in place an agricultural regime that failed, with mortally disastrous consequences.  Along the way, Lysenko, with Stalin's help, purged many skilled Soviet geneticists, leading many of them to tragic ends.  The mass starvation of the era of Lysenkoist agriculture in the USSR may in part have been the result of this view of theoretical science (of course, Lysenko had his own theory, which basically didn't work, as it was as much wishful thinking as science).

But how wrong was it to think of genes as metaphysical concepts at the time?  Mendel had shown inheritance patterns that seemed to behave, statistically, as if they were caused by specific particles. But he knew many if not most traits did not follow the same pattern.  Darwin knew of Mendel's work (and he of Darwin's), but neither thought the other's theories were relevant to his own interests.

But in the first part of the 20th century, the great experimental geneticist TH Morgan used Mendelian ideas in careful breeding experiments to locate 'genes' relative to each other on chromosomes.  Yet even he was an empiricist, and avowedly didn't really deal with what genes 'were', just with how their causal agency was arranged.

Mendel's work also provided an experimental research approach that led, via Morgan and others, to the discovery of DNA and its protein-coding sequences.  We call those sequences 'genes' and research has documented what they are and how they work in great detail.  In that sense, and despite early vague guesses about their nature, for most of a century one could assert that genes were in fact quite real, not metaphysical, entities at all.  Not only that, but genes were the causal basis of biological traits and their evolution!

But things have turned out not to be so simple or straightforward.  Our concept of 'the gene' is in rather great flux, with each instance in some ways needing its own ad hoc treatment.  Is a regulatory element a 'gene', for example, or a modified epigenetic bit of DNA?  Is the 'gene' as still often taught in textbooks in fact largely a metaphysical concept, whose stereotypical properties are convenient but not nearly as informative as the commonly presented view, even in the scientific literature, suggests?

Are we still resting on empiricism, invoking genetic and evolutionary theory as a cover but, often without realizing it, fishing for an adequate underlying theory of biological causation, one that would correspond to the seamless reality Einstein (and Darwin, for that matter) felt characterized Nature? Is the gene, like a guest in Procrustes' bed, being surgically adapted after the fact to fit our desired tidy definition?  Is claiming a theory on which genetic-based predictions can be 'precise' a false if self-comforting claim, a marketing tool for NIH, when in fact we don't have the kind of true underlying theory of life that Einstein dreamed of for physics and the cosmos?

We'll deal with that in our next posts.

When scientific theory constrains

It's good from time to time to reflect on how we know what we think we know.  And to remember that, as at any time in history, much of what we now think is true will sooner or later be found to be false or, often, only inaccurately or partially correct.  Some of this is because values change -- not so long ago homosexuality was considered to be an illness, e.g.  Some is because of new discoveries -- when archaea were first discovered they were thought to be exotic microbes that inhabited extreme environments, but now they're known to live in all environments, even in and on us. And of course these are just two of countless examples.

But what we think we know can be influenced by our assumptions about what we think is true, too. It's all too easy to look at data and interpret it in a way that makes sense to us, even if there are multiple possible interpretations.  This can be a particular problem in social science, when we've got a favorite theory and the data can be seen to confirm it; this is perhaps easiest to notice if you yourself aren't wedded to any of the theories.  But it's also true in biology. It is understandable that we want to assert that we now know something, and are rewarded for insight and discoveries, rather than more humbly hesitating to make claims.

Charitable giving
The other day I was listening to the BBC Radio 4 program Analysis on the charitable impulse.  Why do people give to charity?  It turns out that a lot of psychological research has been done on this, to the point that charities are now able to manipulate us into giving.  If you call your favorite NPR station to donate during a fund drive, e.g., if you're told that the caller just before you gave a lot of money, you're more likely to make a larger donation than if you're told the previous caller pledged a small amount.

A 1931 advertisement for the British charity, Barnardo's Homes; Wikipedia

Or, if an advertisement pictures one child, and tells us the story of that one child, we're more likely to donate than if we're told about 30,000 needy children.  This works even if we're told the story of two children, one after the other.  But, according to one of the researchers, if we're shown two children at once, and told that if we give, the money will randomly go to just one of the children, we're less likely to give.  This researcher interpreted this to mean that two is too many.

But there seem to me to be other possible interpretations, given that the experiment changes more than one variable.  Perhaps it's that we don't like the idea that someone else will choose who gets our money.  Or that we feel uncomfortable knowing that we've helped only one child when two are needy.  But surely something more than 'two is too many' is going on, given that in 2004 so many people around the world donated so much money to organizations helping tsunami victims that many had to start turning down donations.  These were anonymous victims, in great numbers.  Though, as the program noted, people weren't nearly as generous to the great number of victims of the earthquake in Nepal in 2015, with no obvious explanation.

The researcher did seem to be wedded to his 'one vs too many' interpretation, despite the contradictory data.  In fact, I would suggest that the methods, given what was presented, don't allow him to legitimately draw any conclusion.  Yet he readily did.

Thinness microbes?
The Food Programme on BBC Radio 4 is on to the microbiome in a big way.  Two recent episodes (here and here) explore the connection between gut microbes, food, and health and the program promises to update us as new understanding develops.  As we all know by now, the microbiome, the bug intimates that accompany us through life, in and on our body, may affect our health, our weight, our behavior, and perhaps much more.  Or not.


Pseudomonas aeruginosa, Enterococcus faecalis and Staphylococcus aureus on Tryptic Soy Agar.  Wikipedia

Obesity, asthma, atopy, periodontal health, rheumatoid arthritis, Parkinson's, Alzheimer's, autism, and many many more conditions have been linked with, or are suggested to be linked with, in one way or another, our microbiome.  Perhaps we're hosting the wrong microbes, or not a diverse enough set of microbes, or we wipe the good ones out with antibiotics along with the bad, or with alcohol, and what we eat may have a lot to do with this.

One of the researchers interviewed for the program was experimenting with a set of identical twins in Scotland.  He varied their diets, having them eat, for example, lots of junk food and alcohol, or a very fibrous diet, and documented changes in their gut microbiomes, which apparently can change pretty quickly with changes in diet.  The most diverse microbiome was associated with the high-fiber diet. Researchers seem to feel that diversity is good.

Along with a lot of enthusiasm and hype, though, mostly what we've got in microbiome research so far is correlations.  Thin people tend to have a different set of microbes than obese people, and people with a given neurological disease might statistically share a specific subset of microbes.  But this tells us nothing about cause and effect -- which came first, the microbiome or the condition?  And because the microbiome can change quickly and often, how long and how consistently would an organism have to reside in our gut before it causes a disease?

There was some discussion of probiotics in the second program, the assumption being that controlling our microbiome affects our health.  Perhaps we'll soon have probiotic yogurt or kefir or even a pill that keeps us thin, or prevents Alzheimer's disease.  Indeed, this was the logical conclusion from all the preceding discussion.

But one of the researchers, inadvertently I think, suggested that perhaps this reductionist conclusion was unwarranted.  He cautioned that thinking about probiotic pills rather than lifestyle might be counterproductive.  But except for factors with large effects such as smoking, the effect of "lifestyle" on health is rarely obvious.  We know that poverty, for example, is associated with ill health, but it's not so easy to tease out how and why.  And, if the microbiome really does directly influence our health, as so many are promising, the only interesting relevant thing about lifestyle would be how it changes our microbiomic makeup.  Otherwise, we're talking about complexity, multiple factors with small effects -- genes, environmental factors, diet, and so on, and all bets about probiotics and "the thinness microbiome" are off.  But, the caution was, to my mind, an important warning about the problem of assuming we know what we think we know; in this case, that the microbiome is the ultimate cause of disease.

The problem of theory
These are just two examples of the problem of assumption-driven science. They are fairly trivial, but if you are primed to notice, you'll see it all around you. Social science research is essentially the interpretation of observational data from within a theoretical framework. Psychologists might interpret observations from the perspective of behavioral, or cognitive, or biological psychology, e.g., and anthropologists, at least historically, from, say, a functionalist or materialist or biological or post-modernist perspective. Even physicists interpret data based on whether they are string theorists or particle physicists.

And biologists' theoretical framework? I would suggest that two big assumptions that biologists make are reductionism and let's call it biological uniformitarianism. We believe we can reduce causation to a single factor, and we assume that we can extrapolate our findings from the mouse or zebrafish we're working on to other mice, fish and species, or from one or some people to all people. That is, we assume invariance rather than that what we can expect is variation. There is plenty of evidence to show that by now we should know better.

True, most biologists would probably say that evolutionary theory is their theoretical framework, and many would add that traits are here because they're adaptive, because of natural selection. Evolution does connect people to each other and people to other species, it has done so by working on differences, not replicated identity, and there is no rule for the nature or number of those differences or for extrapolating from one species or individual to another. We know nothing to contradict evolutionary theory, but that every trait is adaptive is an assumption, and a pervasive one.

Theory and assumption can guide us, but they can also improperly constrain how we think about our data, which is why it's good to remind ourselves from time to time to think about how we know what we think we know. As scientists we should always be challenging and testing our assumptions and theories, not depending on them to tell us that we're right.

The statistics of Promissory Science. Part II: The problem may be much deeper than acknowledged

Yesterday, I discussed current issues related to statistical studies of things like genetic or other disease risk factors.  Recent discussion has criticized the misuse of statistical methods, including a statement on p-values by the American Statistical Association.  As many have said, the over-reliance on p-values can give a misleading sense that significance means importance of a tested risk factor.  Many touted claims are not replicated in subsequent studies, and analysis has shown this may preferentially apply to the 'major' journals.  Critics have suggested that p-values not be reported at all, or only if other information, like confidence intervals (CIs) and risk-factor effect sizes, is (I would say prominently) included. Strict adherence will likely undermine what even expensive major studies can claim to have found, and it will become clear that many purported genetic, dietary, etc., risk factors are trivial, unimportant, or largely uninformative.

However, today I want to go farther, and question whether even making these correctives doesn't go far enough, and would perhaps serve as a convenient smokescreen for far more serious implications of the same issue. There is reason to believe the problem with statistical studies is more fundamental and broad than has been acknowledged.

Is reporting p-values really the problem?
Yesterday I said that statistical inference is only as good as the correspondence between the mathematical assumptions of the methods and what is being tested in the real world.  I think the issues at stake rest on a deep disparity between them.  Worse, we don't and often cannot know which assumptions are violated, or how seriously.  We can make guesses and do all auxiliary tests and the like, but as decades of experience in the social, behavioral, biomedical, epidemiological, and even evolutionary and ecological worlds show us, we typically have no serious way to check these things.

The problem is not just that significance is not the same as importance. A somewhat different problem with standard p-value cutoff criteria is that many of the studies in question involve many test variables, such as complex epidemiological investigations based on long questionnaires, or genomewide association studies (GWAS) of disease. Normally, p=0.05 means that by chance one test in 20 will seem to be significant, even if there's nothing causal going on in the data (e.g., if no genetic variant actually contributes to the trait).  If you do hundreds or even many thousands of 0.05 tests (e.g., of sequence variants across the genome), even if some of the variables really are causative, you'll get so many false positive results that follow-up will be impossible.  A standard way to avoid that is to correct for multiple testing by using only p-values that would be achieved by chance only once in 20 times of doing a whole multivariable (e.g., whole genome) scan.  That is a good, conservative approach, but means that to avoid a litter of weak, false positives, you only claim those 'hits' that pass that standard.
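The arithmetic behind this correction is simple to sketch. As a minimal illustration (the one-million-test figure is just a hypothetical GWAS-scale example, not from any particular study):

```python
# Bonferroni-style correction for multiple testing: to keep the chance
# of even one false positive across the whole scan near 5%, each
# individual test must clear a much stricter per-test threshold.

def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cutoff giving a family-wise error rate of alpha."""
    return alpha / n_tests

# A single test: the conventional 5% cutoff.
single = bonferroni_threshold(0.05, 1)

# A scan of a million variants: each must reach roughly p < 5e-8,
# the commonly cited 'genome-wide significance' level.
genome_wide = bonferroni_threshold(0.05, 1_000_000)

print(single, genome_wide)
```

The conservatism the text describes falls straight out of the division: the more variables tested, the harder it is for any one of them, causal or not, to pass.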

You know you're only accounting for a fraction of the truly causal elements you're searching for; the rest are the litter of weakly associated variables that you're willing to ignore in order to identify the most likely true ones.  This is good conservative science, but if your problem is to understand the beach, you are forced to ignore all the sand, though you know it's there.  The beach cannot really be understood by noting its few detectable big stones.

Sandy beach; Wikipedia, Lewis Clark

But even this sensible play-it-conservative strategy has deeper problems.

How 'accurate' are even these preferred estimates?
The metrics like CIs and effect sizes that critics are properly insisting be (clearly) presented along with or instead of p-values face exactly the same issues as the p-value: the degree to which what is modeled fits the underlying mathematical assumptions on which test statistics rest.

To illustrate this point, the Pythagorean Theorem in plane geometry applies exactly and universally to right triangles. But in the real world there are no right triangles!  There are approximations to right triangles, and the value of the Theorem is that the more carefully we construct our triangle the closer the square of the hypotenuse is to the sum of the squares of the other sides.  If your result doesn't fit, then you know something is wrong and you have ideas of what to check (e.g., you might be on a curved surface).

Right triangle; Wikipedia
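The point about an external theory can be made concrete: with the Theorem in hand, a measurement has something to be checked against, and the size of the misfit is itself informative. A small sketch (the tolerance and the measured side lengths are purely illustrative):

```python
# With an external theory (the Pythagorean Theorem), a measurement can
# be checked against a prediction: if c^2 deviates from a^2 + b^2, we
# know something is wrong (measurement error, or a curved surface).

def pythagorean_misfit(a: float, b: float, c: float) -> float:
    """Relative deviation of c^2 from the theoretical a^2 + b^2."""
    predicted = a ** 2 + b ** 2
    return abs(c ** 2 - predicted) / predicted

# A carefully constructed 3-4-5 triangle fits the theory essentially
# exactly; a sloppy or non-flat one does not, and the misfit tells us
# to go looking for a violated assumption.
print(pythagorean_misfit(3.0, 4.0, 5.0))    # near zero: theory fits
print(pythagorean_misfit(3.0, 4.0, 5.4))    # large: something is wrong
```

No analogous check exists for a risk-factor estimate from a survey, which is the disparity the next paragraph turns on.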

In our statistical study case, knowing an estimated effect size and how unusual it is seems to be meaningful, but we should ask how accurate these estimates are.  But that question often has almost no testable meaning: accurate relative to what?  If we were testing a truth derived from a rigorous causal theory, we could ask by how many decimal places our answers differ from that truth.  We could replicate samples and increase accuracy, because the signal to noise ratio would systematically improve.  Were that to fail, we would know something was amiss, in our theory or our instrumentation, and have ideas how to find out what that was.  But we are far, indeed unknowably far, from that situation.  That is because we don't have such an externally derived theory, no analog to the Pythagorean Theorem, in important areas where statistical study techniques are being used.

In the absence of adequate theory, we have to concoct studies that rest almost entirely on internal comparison to reveal whether 'something' of interest (often something we don't or cannot specify) is going on.  We compare data such as cases vs controls, which forces us to make statistical assumptions, such as that our diseased vs normal subjects differ only in (say) their coffee consumption, or that the distribution of variation in unmeasured variables is random with regard to coffee consumption among our case and control subjects. This is one reason, for example, that even statistically significant correlation does not imply causation or importance. The underlying, often unstated assumptions are often impossible to evaluate. The same problem relates to replicability: in genetics, for example, you can't assume that some other population is the same as the population you first studied.  Failure to replicate in this situation does not undermine a first positive study.  For example, a result of a genetic study in Finland cannot be replicated properly elsewhere because there's only one Finland!  Even another study sample within Finland won't necessarily replicate the original sample.  In my opinion, the need for internally based comparison is the core problem, and a major reason why theory-poor fields often do so poorly.

The problem is subtle
When we compare cases and controls and insist on a study-wide 5% significance level to avoid a slew of false-positive associations, we know we're being conservative as described above, but at least those variables that do pass the adjusted test criterion are really causal with their effect strengths accurately estimated.  Right?  No!

When you do gobs of tests, some very weak causal factor may by good luck pass your test. But of those many contributing causal factors, the estimated effect size of the lucky one that passes the conservative test is something of a fluke.  The estimated effect size may well be inflated, as experience in follow-up studies often or even typically shows.
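This selection-driven inflation (often called the 'winner's curse') is easy to demonstrate by simulation. A hedged sketch, with the true effect size, noise level, and reporting threshold all chosen purely for illustration:

```python
import random
import statistics

# Winner's curse sketch: many tests of an equally weak true effect.
# Each study estimates the effect with sampling noise; only estimates
# that clear a strict threshold get 'reported'.  The reported estimates
# are, on average, inflated relative to the true effect.

random.seed(1)
TRUE_EFFECT = 0.1   # the same small true effect in every test (illustrative)
NOISE_SD = 0.5      # per-study sampling noise (illustrative)
THRESHOLD = 1.0     # only estimates this large 'pass' and are reported

estimates = [random.gauss(TRUE_EFFECT, NOISE_SD) for _ in range(100_000)]
reported = [e for e in estimates if e > THRESHOLD]

# The full set of estimates centers near the true effect (~0.1),
# but the surviving 'hits' are all far above it by construction.
print(statistics.mean(estimates))
print(statistics.mean(reported))
```

The lucky passers are exactly the ones whose noise happened to push them upward, so follow-up studies, free of that selection, see smaller effects.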

In this sense it's not just p-values that are the problem, and providing ancillary values like CIs and effect sizes in study reports is something of a false pretense of openness, because all of these values are vulnerable to similar problems.  The promise to require these other data is a stopgap, or even a strategy to avoid adequate scrutiny of the statistical inference enterprise itself.

It is nobody's fault if we don't have adequate theory.  The fault, dear Brutus, is in ourselves, for practicing Promissory Science, and feigning far deeper knowledge than we actually have.  We do that rather than come clean about the seriousness of the problems.  Perhaps we are reaching a point where the let-down from over-claiming is so common that the cat can't be kept in the bag, and the paying public may get restless.  Leaking out a few bits of recognition and promising reform is very different from letting it all out and facing the problem bluntly and directly.  The core problem is not whether a reported association is strong or meaningful, but, more importantly, that we don't know, or know how to know.

This can be seen in a different way.  If all studies, including negative ones, were reported in the literature, then it would be only right that the major journals should carry those findings that are most likely true, positive, and important.  That's the actionable knowledge we want, and a top journal is where the most important results should appear.  But the first occurrence of a finding, even if it turns out later to be a lucky fluke, is after all a new finding!  So shouldn't investigators report it, even though lots of other similar studies haven't yet been done?  That could take many years or, as in the example of Finnish studies, be impossible.  We should expect negative results to be far more numerous and less interesting in themselves if we just tested every variable we could think of willy-nilly, but in fact we usually have at least some reason to look, so it is far from clear what fraction of negative results would undermine the traditional way of doing business.  Should we wait for years before publishing anything? That's not realistic.

If the big-name journals are still seen as the place to publish, and their every press conference and issue announcement is covered by the splashy press, why should they change?  Investigators may feel that if they don't stretch things to get into these journals, or just publish negative results, they'll be thought to have wasted their time or done poorly designed studies.  Besides normal human vanity, the risk is that they will not be able to get grants or tenure.  That feeling is the fault of the research, reputation, university, and granting systems, not the investigator.  Everyone knows the game we're playing. As it is, investigators and their labs have champagne celebrations when they get a paper in one of these journals, like winning a yacht race, which is a reflection of what one could call the bourgeois nature of the profession these days.

How serious is the problem?  Is it appropriate to characterize what's going on as fraud, hoax, or silent conspiracy?  Probably in some senses yes; at least there is certainly culpability among those who do understand the epistemological nature of statistics and their application.  Plow ahead anyway is not a legitimate response to fundamental problems.

When reality is closely enough approximated by statistical assumptions, causation can be identified, and we don't need to worry about the details.  Many biomedical and genetic, and probably even some sociological problems are like that.  The methods work very well in those cases.  But this doesn't gainsay the accusation that there is widespread over-claiming taking place and that the problem is a deep lack of sufficient theoretical understanding of our fields of interest, and a rush to do more of the same year after year.

It's all understandable, but it needs fixing.  To be properly addressed, an entrenched problem requires more criticism even than this one has been getting recently.  Until better approaches come along, we will continue wasting a lot of money in the rather socialistic support of research establishments that keep on doing science that has well-known problems.

Or maybe the problem isn't the statistics, after all?
The world really does, after all, seem to involve causation and at its basis seems to be law-like. There is truth to be discovered.  We know this because when causation is simple or strong enough to be really important, anyone can find it, so to speak, without big samples or costly gear and software. Under those conditions, numerous details that modify the effect are minor by comparison to the major signals.  Hundreds or even thousands of clear, mainly single-gene based disorders are known, for example.  What is needed is remediation, hard-core engineering to do something about the known causation.

However, these are not the areas where the p-value and related problems have arisen.  Those arise when very large and SASsy studies seem to be needed, and the reason is that the causal factors there are weak and/or complex.  Along with trying to root out misrepresentation and failure to report the truth adequately, we should ask whether, perhaps, the results showing frustrating complexity are correct.

Maybe there is not a need for better theory after all.  In a sense the defining aspect of life is that it evolves not by the application of external forces as in physics, but by internal comparison--which is just what survey methods assess.  Life is the result of billions of years of differential reproduction, by chance and various forms of selection--that is, continual relative comparison by local natural circumstances.  'Differential' is the key word here.  It is the relative success among peers today that determines the genomes and their effects that will be here tomorrow.  In a way, in effect and if often unwittingly and for lack of better ideas, that's just the sort of comparison made in statistical studies.

From that point of view, the problem is that we don't want to face up to the resulting truth, which is that a plethora of changeable, individually trivial causal factors is what we find because that's what exists.  That we don't like that, don't report it cleanly, and want strong individual causation is our problem, not Nature's.

Unknowns, yes, but are there unknowables in biology?

The old Rumsfeld jokes about the knowns and unknowns are pretty stale by now, so we won't really indulge in beating that dead horse.  But in fact his statement made a lot of sense.  There are things we think we know (like our age), things we think we don't know but might know (like whether there will be a new message in our inbox when we sign onto email), and things we don't know but don't know we don't know (such as how many undiscovered marine species there are). Rumsfeld is the subject of ridicule not for this pronouncement per se (at least to those who think about it), because it is actually reasonable, but for other things that he is said to have done or said (or failed to say) in regard to American politics.

Explaining what we don't know is a problem!  Source: Google images

The unknowns may be problems, but they are not Big problems.  What we don't know but might know are at least within the realm of learning.  We may eventually stumble across facts we don't know but don't yet even know are there.  The job of science is to learn what we know we don't know and even to discover what we don't yet know that we don't know.  We think there is nothing 'inside' an electron or photon, but there may be if we some day realize that possibility.  Then the guts of a photon will become a known unknown.

However, there's another, even more problematic, one may say truly problematic kind of mystery: things that are actually unknowable.  They present a Really Big problem.  For example, based on our current understanding of cosmology, there are parts of the universe that are so far away that energy (light etc.) from them simply has not, and can never, reach us.  We know that the details of this part of space are literally unknowable, but because we have reasonably rigorous physical theory we think we can at least reliably extrapolate from what we can see to the general contents (density of matter and galaxies etc.) of what we know must exist but cannot see.  That is, it's literally unknowable but theoretically known.

However, things like whether life exists in those unreachable regions are unknowable in principle, though at least we know very specifically why that is so.  And in the future, most of what we can see in the sky today is, according to current cosmological theories, going to become invisible as the universe expands, so that light from these now-visible but distant parts will no longer be able to reach us.  If we have living descendants then, they will know from today's records what was there to see and how it behaved, and will at least be able to make reasonable extrapolations about what it's like out there even though it can no longer be seen.

There are also 'multiverse' theories of various sorts (a book discussing these ideas is Our Mathematical Universe, by Max Tegmark).  At present, the various sorts of parallel universes are simply inaccessible, even in principle, so we can't really know anything about them (or, perhaps, even whether they exist).  Not only is electromagnetic radiation from them unable to reach us, so that we can't observe, even indirectly, what was going on when that light was emitted, but our universe is self-contained relative to these other universes (if they exist).

Again, all of this follows from the kind of rigorous theory that we have, and from the belief that even if that theory is wrong, there is at least a correct theory to be discovered--Nature does work by fixed 'laws', and while our current understanding may have flaws, the regularities we are finding are not imaginary even if they are approximations to something deeper (but comparably regular). In that sense, the theory we have tells us quite a lot about what seems likely to be the case even if unobserved. It was on such a basis that the Higgs boson was discovered (assuming the inferences from the LHC experiments are correct).

What about biology?
Biology has been remarkably successful in the last century and more.  The discoveries of evolution and genetics are as great as those in any other science.  But there remain plenty of unknowns about biological evolution and its genomic basis that are far deeper than questions about undiscovered species.  We know that these things are unknown, but we presume they are knowable and will be understood some day.

One example is the way that homologous chromosomes (one inherited from each of a person's parents) line up with each other in the first stage of meiosis (formation of sperm and egg cells).  How do they find each other?  We know they do line up when sex cells are produced, and there are some hypotheses and bits of relevant information about the process, but we don't yet really know how it works.

Homologous chromosomes pair up...somehow.  Wikimedia, public domain.

Chromosomes also are arranged in a very different 3-dimensional way during the normal life of every cell.  They form a spaghetti-like ball in the nucleus, with different parts of our 23 pairs of chromosomes very near to each other.  This 'chromosome conformation', the specific spaghetti ball, shown schematically in the figure, varies among cell types, and even within a cell as it does different things.  The reason seems to be at least in part that the juxtaposed bits of chromosomes contain DNA that is being transcribed (such as into messenger RNA to be translated into protein) in that particular cell under its particular circumstances.
Chromosomes arrange themselves systematically in the nucleus.  Source: image by Cutkosky, Tarazi, and Lieberman-Aiden from Manoharan, BioTechniques, 2011

It is easy to discuss what we don't know in evolution and genetics, and we do that a lot here on MT. Often we critique current practice for claiming to know far more than is actually known, or, equally seriously, making promises to the supporting public that suggest we know things that in truth (and in private) we know very well that we don't know.  In fact, we even know why some things that we promise are either unknown or known not to be correct (for example, causation of biological and behavioral traits is far more complex than is widely claimed).

There are pragmatic reasons why our current system of science does this, which we and many others have often discussed, but here we want to ask a different sort of question:  Are there things in biology that are unknowable, even in principle, and if so how do we know that?  The answer at least in part is 'yes', though that fact is routinely, and conveniently, ignored.

Biological causation involves genetic and environmental factors.  That is clearly known, in part because DNA is largely an inert molecule so any given bit of DNA 'does' something only in a particular context in the cell and related to whatever external factors affect the cell.  But we know that the future environmental exposures are unknown, and we know that they are unknowable.  What we will eat or do cannot be predicted even in principle, and indeed will be affected by what science learns but hasn't yet learned (if we find that some dietary factor is harmful, we will stop eating it and eat something else).  There is no way to predict such knowledge or the response to it.

What else may there be of this sort?
A human has tens of trillions of cells, a number that changes over time and varies among and within each of us.  Each cell has a slightly different genotype and is exposed to slightly different aspects of the physical environment as well.  One thing we know that we cannot now know is the genotype and environment of every cell at every time.  We can make some statistical approximations, based on guesses about the countless unknown details, but the number of variables exceeds the number of stars in the universe, and even in theory these cannot be known with knowable precision.

Unlike in much of physics, the use of statistical analytic techniques here is inapt, also to an unknowable degree.  We know, for example, that not all cells are identical observational units, so that aggregate statistics used for decision-making (e.g., significance tests) are simply guesses or gross assumptions whose accuracy is unknowable.  This is so in principle, because each cell and each individual is always changing.  We might call these 'numerical unknowables', because they are a matter of practicality rather than of theoretical limits on the phenomena themselves.
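The point about non-identical observational units can be made concrete with a toy simulation (purely illustrative, with all names and numbers invented for the sketch): if 'cells' within an individual share that individual's random effect, a test that treats every cell as an independent data point will reject a true null hypothesis far more often than its nominal 5% rate.

```python
import math, random

random.seed(1)

def naive_p_value(a, b):
    # Two-sample test that (wrongly) treats every cell as an independent unit.
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    z = (ma - mb) / math.sqrt(va / na + vb / nb)
    # Two-sided p-value by normal approximation.
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def clustered_sample(n_individuals=10, cells_per=50):
    # Cells within an individual share that individual's random effect,
    # so the 500 'observations' are not 500 independent units.
    out = []
    for _ in range(n_individuals):
        shift = random.gauss(0, 1.0)  # individual-level variation
        out += [shift + random.gauss(0, 1.0) for _ in range(cells_per)]
    return out

# No true group difference exists, yet the naive cell-level test
# rejects far more often than its nominal 5% rate.
trials = 500
false_pos = sum(
    naive_p_value(clustered_sample(), clustered_sample()) < 0.05
    for _ in range(trials)
)
print(false_pos / trials)  # typically well above 0.05
```

The inflation arises because the effective sample size is closer to the number of individuals than to the number of cells; the naive test understates the variance of the group means and so overstates its own certainty.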

So are there theoretical aspects of biology that in some way we know are unknowable and not just unknown?  We have no reason, based on current biological theory, to suspect truly unknowable phenomena analogous to cosmology's parallel universes.  One can speculate about all sorts of things, such as parallel yous, and we can make up stories about how quantum uncertainty may affect us.  But these are far from having the kind of cogency found in current physics.

Our lack of theory comparably rigorous to what physics and chemistry enjoy leaves open the possibility that life has its own knowably unknowable aspects. If so, we would at least like to know what those limits are, because much of biology bears on practical prediction (e.g., causes of disease). The state of knowledge in biology, however advanced it has become, is still far from adequate to tell us what may eventually be knowable, much less where the limits to knowability lie.  In a sense, unlike physics and cosmology, in biology we have no theory that tells us what we cannot know.

And unlike physics and cosmology, where some of these issues really are philosophical rather than of any practical relevance to daily life, in biology we have very strong reasons to want to know what we can know, and what we can promise.  But perhaps also unlike physics, because people expect benefits from biological research, we have strong incentives not to acknowledge the limits to our knowledge.
