
Everything is genetic, isn't it?

There is hardly a trait, physical or behavioral, for which there is not at least some familial resemblance, especially among close relatives.  I mean the kind of resemblance invoked when someone scolds you, saying, "You're just like your mother!"  The more distant the relatives in terms of generations of separation, the less the similarity.  So you really can resist when told, "You're just like your great-grandmother!"  Genetic effects decline in a systematic way with more distant kinship.

The 'heritability' of a trait refers to the relative degree to which its variation is the result of variation in genes, the rest being due to variation in the non-genetic factors we call 'environment'.  Heritability is a ratio that ranges from zero, when genes have nothing to do with the trait's variation, to 1.0, when all the variation is genetic.  The measure applies to a particular sample or population and cannot automatically be extended to other samples or populations, where both genetic and environmental variation will be different, often to an unknown extent.
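To make the ratio concrete, here is a minimal sketch in Python; the variance numbers are invented for illustration, not estimates from any real study:

```python
# Heritability as the share of trait variance due to genetic variation.
# The variance components below are illustrative, not from any real sample.

def heritability(v_genetic: float, v_environment: float) -> float:
    """Ratio of genetic variance to total (genetic + environmental) variance."""
    return v_genetic / (v_genetic + v_environment)

# Genes irrelevant to the trait's variation: heritability is 0.
print(heritability(0.0, 1.0))
# All of the variation genetic: heritability is 1.0.
print(heritability(1.0, 0.0))
# A stature-like trait: mostly genetic variance in this particular sample.
print(heritability(0.9, 0.1))
```

Note that the ratio is a property of the two variances in a particular sample: change either one (say, by changing environments) and the 'same' trait gets a different heritability.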

Most quantitative traits, like stature or blood pressure or IQ scores, show some amount of genetic influence, often quite substantial.  It often happens that we are interested in some trait that we think must be produced or affected by genes, but for which no relevant factor, like a protein, is known.  The idea arose decades ago that if we could scan the genome and compare those with different manifestations of the trait, using mapping techniques like GWAS (genomewide association studies), we could identify those sites, genomewide, whose variation in our chosen sample may affect the trait's variation.  Qualitative traits, like the presence or absence of a disease (say, diabetes or hypertension), may often be due to the presence of some set of genetic variants whose joint impact exceeds some diagnostic threshold, and mapping studies can compare genotypes in affected cases to unaffected controls to identify those sites.

Genes are involved in everything. . . . .
Many things can affect the amount of similarity among relatives, so one has to think carefully before attributing similarity to a cause.  Some traits, like stature (height), have very high heritability, sometimes estimated at about 0.9, that is, 90% of the variation being due to the effects of genetic variation.  Other traits have much lower heritability, but there is generally some familial similarity.  That's because we each develop from a single fertilized egg cell, which involves transmission of each of our parents' genomes, plus ingredients provided by the egg (and perhaps, to a tiny degree, the sperm), much of which was the result of gene action in our parents when they produced that sperm or egg (e.g., RNA, proteins).  This is why traits can usually be found to have some heritability--some contribution due to genetic variation among the sampled individuals.  In that sense, we can say that genes are involved in everything.

Understanding the genetic factors involved in disease can be important and laudable, even if tracking them down is a frustrating challenge.  But because genes are involved in everything, our society also seems to have an unending lust for investigators to overstate the value of their findings or, in particular, to estimate or declaim on the heritability, and hence the genetic determination, of the most societally sensitive traits, like sexuality, criminality, race, intelligence, physical abuse and the like.

. . . . . but not everything is 'genetic'!

If the estimated heritability for a trait we care about is substantial, then this does suggest the obvious: genes contribute to the mechanisms of the trait, and so it is reasonable to acknowledge that genetic variation contributes to variation in the trait.  However, the mapping industry implies a somewhat different claim: that genes are a major factor in the sense that individual variants can be identified that are useful predictors of the trait of interest (NIH's lobbying machine has been saying we'll be able to predict future disease with 'precision').  There has been little constraint on the types of trait for which this approach, sometimes little more than belief or wishful thinking, is deemed appropriate.

It is important to understand that our standard measures of genes' relative effects are affected both by genetic variation and by environmental and lifestyle factors.  That means that if environments were to change, the relative genetic effects, even in the very same individuals, would also change.  But it isn't just environments that change; genotypes change, too, as mutations occur, and as with environmental factors, they change in ways that we cannot predict even in principle.  That means that we cannot legitimately extrapolate, to any knowable extent, the genetic or environmental effects we observe in a given sample or population to other samples or populations, much less to future ones.  This is not a secret problem, but it doesn't seem to temper claims of dramatic discoveries, in regard to disease or, perhaps even more, to societally sensitive traits.

But let's assume, correctly, that genetic variation affects a trait.  How does it work?  The usual finding is that tens or even hundreds of genome locations affect variation in the test trait, yet most of the individual genes' effects are very small or rare in the sample.  At least as important, the bulk of the estimated heritability remains unaccounted for, and unless we're far off base somehow, the unaccounted fraction is due to the leaf-litter of variants individually too weak or too rare to reach statistical significance.

Often it's also asserted that all the effects are additive, which makes things tractable: for every new person, not part of the study, just identify their variants and add up their estimated individual effects to get the total effect on that person for whatever publishable trait you're interested in.  That's the predictive objective of the mapping studies.  However, I think that for many reasons one cannot accept that these variable sites' actions are truly additive.  The reasons have to do with actual biology, not the statistical convenience of using the results to diagnose or predict traits.  Cells and their compounds vary in concentrations per volume (3D), binding properties (multiple dimensions), surface areas (2D), and in various ways that affect how proteins are assembled and work, and so on.  In aggregate, additivity may come out in the wash, but the usual goal of applied measures is to extrapolate these average results to prediction in individuals.  There are many reasons to wish that were true, but few to believe it very strongly.
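The additive logic being questioned here can be sketched in a few lines of Python; the variant labels and effect sizes are hypothetical, purely for illustration:

```python
# A sketch of the additive model behind polygenic prediction:
# each variant gets an estimated per-allele effect, and a new person's
# predicted trait value is just the sum over their allele counts.
# Variant names and effect sizes are invented for illustration.

effect_sizes = {"variant_A": 0.30, "variant_B": -0.10, "variant_C": 0.05}

def additive_score(genotype):
    """genotype: variant -> count of effect alleles carried (0, 1, or 2)."""
    return sum(effect_sizes[v] * count for v, count in genotype.items())

new_person = {"variant_A": 2, "variant_B": 1, "variant_C": 0}
score = additive_score(new_person)  # 2*0.30 + 1*(-0.10) + 0*0.05
```

The biological objection in the text is precisely that real effects need not add up this way: interactions among sites, and with environments, can make such a sum a poor predictor for any given individual.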

Even if the effects really were additive, the clearly very different leaf-litter background that together accounts for the bulk of the heritability can obscure the numerical amount of that additivity from sample to sample and person to person.  That is, what you estimated from this sample may not apply, to an unknowable extent, to the next sample.  If and when it does work, we're lucky that our assumptions weren't too far off.

Of course, the focus and promises from the genetics interests assume that environment has nothing serious to do with the genetic effects.  But it's a major factor, often by far the major one, and it may even in principle be far more changeable than genetic variation.  One would have to say that environmental rather than genetic measures are likely to be, by far, the most important things to change in society's interest.

We regularly write these things here not just to be nay-sayers, but to try to stress what the issues are, hoping that someone, by luck or insight, finds better solutions or different ways to approach the problem that a century of genetics, despite its incredibly huge progress, has not yet solved.  What it has done, in exquisite detail, is show us what the problems are.

A friend, himself a good scientist in relevant areas, Michael Joyner, has passed on a rather apt suggestion, which he says he saw in work by Denis Noble: we might be better off if we thought of the genome as a keyboard rather than as a code or program.  That is a good way to capture the subtle point that, in the end, yes, Virginia, there really are genomic effects: genes affect every trait....but not every trait is 'genetic'!

The Law of No Restraint

There's a new law of science reporting or, perhaps more accurately put, of the science jungle.  The law: feed any story, no matter how fantastic, to science journalists (including your university's PR spinners), and they will pick up whatever can be spun into a Big Story and feed it to the eager mainstream media.  Caveats may appear somewhere in the stories, but not in the headlines, so that, however weak or tentative or incredible the finding, the story gets its exposure anyway.  Then on to tomorrow's over-sell.

One rationale for this is that unexpected findings--typically presented breathlessly as 'discoveries'--sell: they rate the headline.  The caveats and doubts that might un-headline the story may be reported as well, but they are often buried in minimal terms late in the report.  Even if the report balances skeptics and claimants, simply publishing the story is enough to give at least some credence to the discovery.

The science journalism industry is heavily inflated in our commercial, 24/7 news environment. It would be better for science, if not for sales, if all these hyped papers, rather than being publicized at the time the paper is published, first appeared in musty journals for specialists to argue over, and in the pop-sci news only after some mature judgments are made about them.  Of course, that's not good for commercial or academic business.

We have just seen a piece reporting that humans were in California something like 135,000 years ago, rather than the well-established continental dates of about 12,000 years.  The report, which I won't grace by citing here (you've probably seen it anyway), then went on to speculate about what 'species' of our ancestors these early guys might have been.

Why is this so questionable?  If it were a finding on its own, it might seem credible, but given the plethora of skeletal and cultural archeological findings up and down the Americas, such an ancient habitation seems a stretch.  There is no comparable trail of earlier settlements in northeast Asia or Alaska that might suggest it, and there are lots of animal and human archeological remains--all basically consistent with each other--so why has no earlier finding yet been made?  It is of course possible that this is the first such find and a correct one, but it is far too soon for this to merit a headline story, even with caveats.

Another piece we saw today reported that a new analysis casts doubt on whether diets high in saturated fat are bad for you.  This was a meta-analysis of various other studies that have been done, and it got some headline treatment because the authors report that, contrary to many findings over many years, saturated fats don't clog arteries.  Instead, they say, coronary heart disease is a chronic inflammatory condition.  Naturally, the study is being challenged, as reflected in the story's discussion, by critiques of its data and methods.  These get into details we're not qualified to judge, and we can't comment on the relative merits of the case.

However, one thing we can note is that, with respect to coronary heart disease, study after study has reported more or less the same, or at least consistent, findings about the correlation between saturated fats and risk.  Still, despite so much careful science, including physiological studies as well as statistical analyses of population samples, can we apparently still not be sure about a dietary component that we've been told for years should play a much reduced role in what we eat?  How on earth could we possibly still not know whether saturated-fat diets affect disease risk?

If this very basic issue is unresolved after so long, and the story is similar for risk factors for many complex diseases, then what is all this promise of 'precision' medicine about?  Causal explanations are still fundamentally unclear for many cancers, dementias, psychiatric disorders, heart disease, and so on.  So why isn't the most serious conclusion that our methods and approaches themselves are for some reason simply not adequate to answer such seemingly simple questions as 'is saturated fat bad for you?'  Were the plethora of previous studies all flawed in some way?  Is the current study?  Does the publicizing of the studies itself change behaviors in ways that affect future studies?

There may be no better explanation than that diets and physiology are hard to measure and are complex, and that no simple answer is true.  We may all differ for genetic and other reasons to such an extent that population averages are untrustworthy, or our habits may change enough that studies don't get consistent answers.  Or perhaps asking about one risk factor at a time, when diets and lifestyles are complex, is a modus operandi that developed for studying simpler things (like exposure to toxins or bacteria, the basis of classical epidemiology), and we simply need a better gestalt from which to work.

Clearly a contributory sociological factor is that the science industry has simply been cruising down the same rails, despite the constant popping of promise bubbles, for decades now.  It's always more money for more and bigger studies; it's rarely let's stop, take a deep breath, and think of some better way to understand (in this case) dietary relationships to physical traits.  In times past, at least, most stories like the ancient Californians didn't get ink so widely and rapidly.  But if I'm running a journal, or a media network, or am a journalist needing to earn my living and turn a buck, naturally I need to write about things that aren't yet understood.

Unfortunately, as we've noted before, the science industry is a hungry beast that needs continual feeding, and (like our 3 cats) always demands more, more, and more.  There are ways we could reform things, at least up to a point.  We'll never end the fact that some scientists will claim almost anything to get attention, and we'll always be faced with data that suggest one thing but don't turn out that way.  But we should be able to temper the level of BS and get back to sober science rather than sausage-factory 'productivity'.  And we should educate the public that some questions can't be answered the way we'd like, or aren't being asked in the right way.  That is something science might address effectively, if it weren't so rushed and pressured to 'produce'.

Genomic causation....or not

By Ken Weiss and Anne Buchanan

The Big Story in the latest Nature ("A radical revision of human genetics: Why many ‘deadly’ gene mutations are turning out to be harmless," by Erika Check Hayden) is that genes thought to be clearly causal of important diseases aren't always so (the link is to the People-magazine-like cover article in that issue).  This is a follow-up on an August Nature paper describing the database from which the results discussed in this week's Nature are drawn.  The apparent mismatch between a gene variant and a trait can be, according to the paper, the result of technical error, a mis-call by a given piece of software, or the assumption that identifying a given mutation in affected but not healthy individuals means the causal mutation has been found, without experimentally confirming the finding--which itself can be tricky, for reasons we'll discuss.  Insufficient documentation of 'normal' sequence variation has meant that the frequency of so-called causal mutations hasn't been available for comparison.  Again, we'll discuss below what 'insufficient' might mean, if anything.

People in general and researchers in particular need to be more than dismissively aware of these issues.  But the conclusion that we still need to focus on single genes as causal of most disease--that is, do MuchMoreOfTheSame, which is an implication of the discussion--is not so obviously justified.  We'll begin with our usual contrarian statement: the idea here is being overhyped as if it were new, but except for its details it clearly is not, for reasons we'll also explain.  That is important because presenting it as a major finding, and still focusing on whether single genes are truly causal vs mistakenly identified, ignores what we think the deeper message needs to be.

The data come from a mega-project known as ExAC, a consortium of researchers sharing DNA sequences to document genetic variation and further understand disease causation, now including data from approximately 60,000 individuals (in itself rather small compared to the stated need).  The data are primarily exome sequences, that is, from protein-coding regions of the human genome, not whole genome sequences--again, a major issue.  We have no reason at all to critique the original paper itself, which is large, sophisticated, and carefully analyzed as far as we can tell; but the claims about its novelty are, we think, very much hyperbolized, and that needs to be explained.

Some of the obvious complicating issues
We know that a gene generally does not act alone.  DNA in itself is basically inert.  We have been, and continue to be, misled by examples of gene causation in which context and interactions don't really matter much, and that leads us still to cling to these as though they were the rule.  This reinforces the yearning for causal simplicity and tractability.  Essentially even this ExAC story, or its public announcements, doesn't properly acknowledge causal context and complexity, because it critiques some simplistic single-gene inferences while assuming that the problems are methodological rather than conceptual.

There are many aspects of causal context that complicate the picture, that are not new and that we're not making up, but which the Bigger-than-Ever Data pleas don't address:
1.  Current data are from blood samples, which may not reflect the true constitutive genome because of early somatic mutation, and this will vary among study subjects,
2.  Life-long exposure to local somatic mutation is not considered or measured,
3.  Epigenetic changes, especially local tissue-specific ones, are not included,
4.  Environmental factors are not considered, and indeed would be hard to consider,
5.  Non-Europeans, and even many Europeans, are barely included, if at all, though this is beginning to be addressed,
6.  Regulatory variation, which GWAS has convincingly shown is much more important to most traits than coding variation, is not included.  Exome data have been treated naively by many investigators as if they were what is important, and exome-only data have been used as a major excuse for Great Big Grants that can't find what we know is probably far more important,
7.  Non-coding regions, including non-regulatory RNA regions, are not included in exome-only data,
8.  A mutation may be causal in one context but not in others, in one family or population and not others, rendering the determination that it is a false discovery difficult,
9.  Single-gene analysis is still the basis of the new 'revelations'; that is, the idea being hinted at is that the 'causal' gene isn't really causal....but one implicit notion is that it was misidentified, which is perhaps sometimes true but probably not always so,
10.  The new reports are presented in the news, at least, as if the gene is being exonerated of its putative ill effects.  But that may not be the case, because if the regulatory regions near the mutated gene have little or no activity, the 'bad' gene may simply not be expressed.  Its coding sequence could falsely be assumed to be harmless,
11.  Many aspects of this kind of work depend on statistical assumptions and subjective cutoff values, a problem recently being openly recognized,
12.  Bigger studies introduce all sorts of statistical 'noise', which can make something appear causal or can weaken its actual apparent cause.  Phenotypes can be measured in many ways, but we know very well that measurement can be changeable and subjective (and phenotypes are not very detailed in the initial ExAC database),
13.  Early reports of strong genetic findings have a well-known upward bias in effect size, the winner's curse, that later work fails to confirm.

Well, yes, we're always critical, but this new finding isn't really a surprise
To some readers we are too often critical, and at least some of us have to confess to a contrarian nature.  But here is why we say that these new findings--like so many that appear by the grocery checkout in Nature, Science, and People magazines--while seemingly quite true, should not be treated as a surprise or a threat to what we already knew, nor as a justification for just doing more, or much more, of the same.

Gregor Mendel studied fully penetrant (deterministic) causation in what we now know to be 'genes', in which the presence of the causal allele (in 2-allele systems) always caused the trait (green vs yellow peas, etc.; the same is true of recessive as of dominant traits, given the appropriate genotype).  But this is generally wrong, save at best for exceptions such as those that Mendel himself knowingly and carefully chose to study.  And even this was not so clear!  Mendel has been accused of 'cheating' by ignoring inconsistent results.  This may have been data fudging, but it is at least as likely to have been a reaction to what we have known for a century as 'incomplete penetrance'.  (Ken wrote on this a number of years ago in one of his Evolutionary Anthropology columns.)  For whatever reason--and see below--the presence of a 'dominant' allele, or 'recessive' homozygosity at a 'causal' gene, doesn't always lead to the trait.

Through most of the 20th century the probabilistic nature of real-world, as opposed to textbook, Mendelism was well known and accepted.  The reasons for incomplete penetrance were not known, and indeed we had no way to know them as a rule.  Various explanations were offered, but the statistical nature of the inferences (estimates of penetrance probability, for example) was common practice and a textbook standard.  Even the original authors acknowledge incomplete penetrance, which essentially shows that what the ExAC consortium is reporting adds details but nothing fundamentally new or surprising.  Clinicians or investigators acting as if a variant were always causal should be blamed for gross oversimplification, and so should hyperbolic news media.

Recent advances such as genomewide association studies (GWAS), in various forms, have used stringent statistical criteria to minimize false discovery.  The result is that mapped 'hits' satisfying those criteria account for only a fraction of estimated overall genomic causation.  This was legitimate in that it didn't leave us swamped with hundreds of very weak or very rare false-positive genome locations.  But even the statistically safest genome sites showed typically small individual effects, with risks far below 1.0; they were not 'dominant' in the usual sense.  That means that people with the 'causal' allele don't always, and in fact usually do not, have the trait.  This has been the finding for quantitative traits like stature and qualitative ones like the presence of diabetes, heart attack-related events, psychiatric disorders--essentially all traits studied by GWAS.  It is not exactly what the ExAC data were looking at, but it is highly relevant, and it is the relevant basic biological principle.
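The point that risks are 'far below 1.0' can be made concrete with a toy penetrance calculation in Python; the counts are invented, not drawn from ExAC or any real study:

```python
# Toy illustration of incomplete penetrance: even a 'risk' allele that
# triples risk can leave the vast majority of carriers unaffected.
# All counts below are invented for illustration.

carriers_with_trait = 30
carriers_total = 1000
noncarriers_with_trait = 10
noncarriers_total = 1000

# Penetrance: the fraction of risk-allele carriers who show the trait.
penetrance = carriers_with_trait / carriers_total  # 0.03, far below 1.0

# Relative risk of carriers vs non-carriers (integer math keeps it exact).
relative_risk = (carriers_with_trait * noncarriers_total) / \
                (carriers_total * noncarriers_with_trait)  # 3.0
```

A variant like this could be a genuine, replicable 'hit', yet 97% of carriers never show the trait--which is why a statistically real association and a useful individual prediction are very different things.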

This does not necessarily mean that the target gene is not important for the disease trait, which seems to be one of the inferences headlined in the news splashes.  This is treated as a striking or even fundamentally new finding, but it is nothing of the sort.  Indeed, the genes in question may not be falsely identified; they may very well contribute to risk in some people, under some conditions, at some ages, and in some environments.  The ExAC results don't really address this because, for example, to determine when a gene variant is a risk variant one would have to identify all the causes of 'incomplete penetrance' in every sample, and there are multiple explanations for incomplete penetrance, including items 1 - 13 above as well as methodological issues such as those pointed out by the ExAC project paper itself.

In addition, there may be 'protective' variants in other regions of the genome (that is, the trait may need the contribution of many different genome regions), and working that out would typically involve "hyper-astronomical" combinations of effects, using unachievable, not to mention uninterpretable, sample sizes--from which one would have to estimate the risk effects of almost uncountable numbers of sequence variants.  If there were, say, 100 other contributing genes, each with its own variant genotypes, including regulatory variants, the number of combinations of backgrounds one would have to sort through to see how they affected the 'falsely' identified gene is effectively uncountable.

Even the most clearly causal genes such as variants of BRCA1 and breast cancer have penetrance far less than 1.0 in recent data (here referring to lifetime risk; risk at earlier ages is very far from 1.0). The risk, though clearly serious, depends on cohort, environmental and other mainly unknown factors.  Nobody doubts the role of BRCA1 but it is not in itself causal.  For example, it appears to be a mutation repair gene, but if no (or not enough) cancer-related mutations arise in the breast cells in a woman carrying a high-risk BRCA1 allele, she will not get breast cancer as a result of that gene's malfunction.

There are many other examples of mapping that identified genes that, even if strongly and truly associated with a test trait, have very far from complete penetrance.  A mutation in HFE associated with hemochromatosis comes to mind: in studies of some Europeans, a particular mutation seemed always to be present in affected people, but when the gene was tested in a general database, rather than just in affected people, it had little or no apparent causal effect.  This seems to be the sort of thing the ExAC report is finding.

The generic reason is, again, that genes--essentially all genes--work only in context.  That context includes 'environment', which refers to all the other genes and cells in the body and to external or 'lifestyle' factors, as well as age and sex.  There is no obvious way to identify, evaluate, or measure the effects of all possibly relevant lifestyle factors, and since these change, retrospective evaluation has unknown bearing on future risk (the same can be said of genomic variants, for the same reason).  How could these even be sampled adequately?

Likewise, volumes of long-existing experimental and highly focused results tell the same tale.  Transgenic mice, for example, in which the same mutation is introduced into the 'same' gene as in humans, very often show little, no, or only strain-specific effects.  The same is true in other experimental organisms.  The lesson, and it is far from a new or recent one, is that genomic context is vitally important: it is the person-specific genomic background of a target gene that affects that gene's effect strength--and vice versa; the same is true for each of those other genes.  That is why we have so long noted the legerdemain being foisted on the research and public communities by the advocates of Big Data statistical testing.  Certainly methodological errors are also a problem, as the Nature piece describes, but they aren't the only problem.

So suppose someone reports some cases of a trait that seem too often to involve a given gene, as the Nature piece seems generally to be about, but searches of unaffected people also occasionally find the same mutations in such genes (especially when only exomes are considered); then we are told that this is a surprise.  It is, to be sure, important to know, but it is just as important to know that essentially the same information has long been available to us in many forms.  It is not a surprise--even if it doesn't tell us where to go in search of genetic, much less genomic, causation.

Sorry, though it's important knowledge, it's not 'radical' nor dependent on these data!
The idea being suggested is that (surprise, surprise!) we need much more data to make this point or to find these surprisingly harmless mutations.  That is simply a misleading assertion, or attempted justification, though it has become the industry's standard closing argument.

It is of course very possible that we're missing some aspects of the studies and interpretations being touted, but we don't think that changes the basic points being made here.  They're consistent with the new findings, but they show that, for many very good reasons, this is what we knew was generally the case: 'Mendelian' traits were the exception, and they led to a century of genetic discovery only because they focused attention on what was then doable (while, not widely recognized by human geneticists, agricultural genetics of polygenic traits showed in parallel what was more typical).

But now, if things are being recognized as contextual much more deeply than in Francis Collins' money-strategy-based Big Data dreams, or 'precision' promises, and our inferential (statistical) criteria are properly under siege, we'll repeat our oft-stated mantra: a deeply different, reformed understanding is needed, and a turn to research investment focused on basic science rather than exhaustive surveys, and on those many traits whose causal basis really is strong enough that it doesn't require this deeper knowledge.  In a sense, if you need massive data to find an effect, then that effect is usually very rare and/or very weak.

And by the way, the same must be true for normal traits, like stature, intelligence, and so on, for which we're besieged with genome-mapping assertions, and this must also apply to ideas about gene-specific responses to natural selection in evolution.  Responses to environment (diet etc.) manifestly have the same problem.  It is not just a strange finding of exome mapping studies of disease.  Likewise, the 'normal' study subjects now being asked for in huge numbers may get the target trait later in their lives, except for traits basically present early in life.  One can't doubt that misattributing the cause of such traits is an important problem, but we need to think of better solutions than Big Big Data, because not confirming a gene doesn't help, and finding that 'the' gene is only 'the' gene in some genomic or environmental backgrounds is the proverbial, and historically frustrating, needle-in-the-haystack search.  So the story's advocated huge samples of 'normals' (random individuals) cannot really address the causal issue definitively (except to show what we already know: that there's a big problem to be solved).  Selected family data may--may--help identify a gene that really is causal, but even they have some of the same sorts of problems.  And the finding may apply only to that family.

The ExAC study is focused on severe diseases, which is somewhat like Mendel's selective approach, because it is quite obvious that complex diseases are complex.  It is plausible that severe, especially early-onset, diseases are genetically tractable, but it is not obvious that ever more data will answer the challenge.  And, ironically, the ExAC study has removed just such diseases from its consideration!  So it is intentionally showing what is well known: that we're in needle-in-haystack territory, even when someone has reported big needles.

Finally, we have to add that these points have been made by various authors for many years, often based on principles that did not require mega-studies to show.  Put another way, we had reason to expect what we're seeing, and years of studies supported that expectation.  And this doesn't even consider the deep problems about statistical inference that are being widely noted, or the deeply entrenched nature of that approach's conceptual and even material vested interests (see this week's Aeon essay, e.g.).  It's time to change, but doing so would involve deeply revising how resources are used--of course one of our common themes here on the MT--and that is a matter almost entirely of political economy, not science.  That is, it's as much about feeding the science industry as it is about medicine and public health.  And that is why it's mainly about business as usual rather than real reform.

Another look at 'complexity'

A fascinating and clear description of one contemporary problem of sciences dealing in 'complexity' can be found in an excellent discussion of how brains work, in yesterday's Aeon Magazine essay ("The Empty Brain," by Robert Epstein).  Or rather, of how brains don't work.  Despite the ubiquity of the metaphor, brains are not computers.  Newborn babies, Epstein says, are born with brains that can learn, respond to the environment, and change as they grow.
But here is what we are not born with: information, data, rules, software, knowledge, lexicons, representations, algorithms, programs, models, memories, images, processors, subroutines, encoders, decoders, symbols, or buffers – design elements that allow digital computers to behave somewhat intelligently. Not only are we not born with such things, we also don’t develop them – ever.
We are absolutely unqualified to discuss, or even comment on, the details of the neurobiology involved.  Indeed, even the author himself doesn't provide any sort of explanation of how brains actually work, relying on general hand-waving terms that are almost tautologically true, as when he says that experiences 'change' the brain.  This involves countless neural connections (it must, since what else is there in the brain that is relevant?), and would be entirely different in two different people.

In dismissing the computer metaphor as a fad based on current culture, which seems like a very apt critique, he substitutes vague reasons without giving a better explanation.  So, if we don't somehow 'store' an image of things in some 'place' in the brain, somehow we obviously do retain the ability to recall it.  If the data-processing imagery is misleading, what else could there be?

We have no idea!  But one important thing this essay reveals is that the problem of understanding multiple-component phenomena is a general one.  The issues with the brain seem essentially the same as the issues in genomics that we write about all the time, in which causation of the 'same' trait in different people is not due to the same causal factors (and we are struggling to figure out what those factors are in the first place).

A human brain, but what is it?  Wikipedia

In some fields, like physics, chemistry, and cosmology, each item of a given kind--an electron, a field, a photon, a mass--is identical, and their interactions replicable (if current understanding is correct).  For complexities like the interacting motions of many galaxies, each with many stars, planets, and interstellar material and energy, the computational and mathematical details are far too intricate and extensive for simple solutions.  So one has to break the pattern down into subsets and simulate them on a computer.  This seems to work well, however, and the reason is that the laws of behavior in physics apply equally to every object or component.

Biology is composed of molecules, and at their level of course the same must be true.  But at anything close to the level of our needs for understanding, replicability is often very weak, except in the general sense that each person is 'more or less' alike in physiology, neural structures, and so on. At the level of underlying causation, we know that we're generally each different, often in ways that are important.  This applies to normal development, health, and even behavior.  Evolution works by screening differences, because that's how new species and adaptations and so on arise.  So it is difference that is fundamental to us, and part of that is that each individual with the 'same' trait has it for different reasons.  The reasons may be nearly the same or very different--we have no a priori way to know, no general theory that is of much use in predicting, and we should stop pouring resources into projects that nibble away at tiny details, a convenient distraction from the hard thinking that we should be doing (as well as from addressing many clearly tractable problems in genetics and behavior, where causal factors are strong and well known).

What are the issues?
There are several issues here, and it's important to ask how we might think about them.  Our current scientific legacy has us trying to identify fundamental causal units, and then to show how they 'add up' to produce the trait we are interested in.  'Add up' means they act independently, and each may, in a given individual, have its own particular strength (for example, variants at multiple contributing genes, with each person carrying a unique set of variants, and each variant having some specifiable independent effect).  When one speaks of 'interactions' in this context, what is usually meant is that two (usually) factors combine beyond just adding up.  The classical example within a given gene is 'dominance', in which the effect of the Aa genotype is not just the sum of the A and the a effects.  Statistical methods allow for two-way interactions in roughly this way, by including terms like zAxB (some quantitative coefficient z times the A and the B states in the individual), assuming that this is the same in every A-B instance (z is constant).
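As a toy numerical sketch of that additive-plus-interaction idea (our own made-up effect sizes and simulated data, not from any study), we can generate a trait from two binary factors plus a constant interaction term zAxB, and then recover z by ordinary least squares:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Two binary factors, e.g., presence/absence of a variant at loci A and B.
A = rng.integers(0, 2, n)
B = rng.integers(0, 2, n)

# Simulated trait: independent additive effects plus a fixed interaction
# term z*A*B, with z the same in every individual (z constant), plus noise.
# The effect sizes 0.8, 0.4, and 1.5 are purely hypothetical.
z_true = 1.5
y = 0.8 * A + 0.4 * B + z_true * A * B + rng.normal(0, 1, n)

# Fit y ~ intercept + A + B + A*B by ordinary least squares.
X = np.column_stack([np.ones(n), A, B, A * B])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
a_hat, b_hat, z_hat = coef[1], coef[2], coef[3]
print(round(a_hat, 2), round(b_hat, 2), round(z_hat, 2))
```

With a sample this size and z truly constant, the estimates land close to the simulated values; the fragile assumptions discussed next (pairwise-only interactions, no feedback, fixed effect sizes) are exactly what this little model takes for granted.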

This is very generic (not based on any theory of how these factors interact), but for general inference that they do act in relevant ways, it seems fine.  Theories of causality invoke such patterns as paths of factor interaction, but they almost always assume various substantial simplifications: that interactions are only pair-wise; that there is no looping (the presence of A and B sets up the effect, but A and B don't keep interacting in ways that might change it, and there's no feedback from other factors); and that effect sizes are fixed rather than different in each individual context.

For discovery purposes this may be fine in many multivariate situations, and that's what the statistical package industry is about. But the assumptions may not be accurate, and/or the number and complexity of interactions may be too great to be usefully inferred from practical data: too many interactions for achievable sample sizes; parameters affected by unmeasured variables; individual effects too small to reach statistical 'significance' but in aggregate accounting for the bulk of the effect; and so on.

These are not newly discovered issues, but often they can only be found by looking under the rug, where they've been conveniently swept because our statistical industry doesn't and cannot adequately deal with them.  This is not a fault of the statistics except in the sense that they are not modeling things accurately enough, and in really complex situations, which seem to be the rule rather than the exception, it is simply not an appropriate way to make inferences.

We need, or should seek, something different.  But what?
Finding better approaches is not easy, because we don't know what form they should take.  Can we just tweak what we have, or are we asking the wrong sorts of questions for the methods we know about?  Are our notions of causality somehow fundamentally inadequate?  We don't know the answers.  But what we now do have is knowledge of the causal landscape that we face.  It tells us that enumerative approaches are what we know how to do, but also, as we know, not an optimal way to achieve understanding.  The Aeon essay describes yet another such situation, so we know that we face the same sort of problem--which we call 'complexity' as a not very helpful catchword--in many areas.  Modern science has shown this to us.  Now we need to use appropriate science to figure it out.

Darwin the Newtonian. Part IV. What is 'natural selection'?

If, as I suggested yesterday, genetic drift is a rather unprovable or even metaphysical notion, then what is the epistemological standing of its opposite: not-drift?  That concept implies that the reproductive success of the alternative genotypes under consideration is not equal. But since we saw yesterday that showing that two things are exactly equal is something of a non-starter, how different is its negation?  

Before considering this, we might note that to most biologists--those who think and those who just invoke explanations--non-drift means natural selection.  That is what textbooks teach, even in biology departments (and in schools of medicine and public health, where simple-Simon is alive and well). But natural selection implies systematic, consistent favoring of one variant over others, and for the same reason.  That is by far the main rationale for the routine if unstated assumption that today's functions or adaptations are due to past selection for those same functions: we observe today and retroactively extrapolate to the past.  It's understandable that we do that, and it was a major indirect way (along with artificial selection) in which Darwin was able to construct an evolutionary theory that didn't require divine ad hoc creation events.   But there are problems with this sort of thinking--and some of them have long been known, even if essentially stifled by what amounts to a selectionist ideology, that is, a rather unquestioning belief in a kind of single-cause worldview.

What does exactly not-zero mean?
I suggested yesterday that drift, meaning exactly no systematic difference between states (like genotypes), was so elusive as to be essentially philosophical.  Zero-difference is a very specific value and may thus be especially hard to prove.  Non-zero, by contrast, is essentially an open-ended concept and might thus seem trivially easy to show.  But it's not!

One interpretation of two things differing by 'not zero' is simply that they differ by some amount.  But need that difference be specifiable, or of a fixed amount?  Need it be constant, or similar, over instances of time and place?  If not, we are again in rather spooky territory, because not being identical is not much, if any, help in understanding.  One wants to know by how much, and why--and whether it's consistent or a fluke of sample or local circumstance.  But this is not a fixed set of things to check.

Instead of just 'they're different', what is usually implied is that the genotypes being compared have some particular, specific fitness difference, not just that they differ. That is largely what asserting different functional effects of the variants implies, because otherwise one is left asserting that they are different...sort of, sometimes, which isn't very satisfying or useful.  It would be normal, and sensible, to argue that the difference need not be precisely, deterministically constant, because there's always a luck component, and ecological conditions change.  But if the difference varies widely among circumstances, it is far more difficult to make persuasive 'why' explanations. For example, small differences favoring variant A over variant B in one sample or setting might actually favor B over A in other times or places.  Then selection is a kind of willy-nilly affair--which probably is true!--but much more difficult to infer in a neat way, because it really is not different from being zero on average (though 'on average' is also easier to say than to account for causally).  If a difference is 'not zero', there are an infinity of ways that might be so, especially if it is acknowledged to be variable, as every sensible evolutionary biologist would probably agree is the case.

But then looking for causes becomes very difficult, because all the variants in a population, and all the variation in individual organisms' experience, mean that there may be an open-ended number of explanations one would have to test to account for an observed small fitness difference between A and B.  And that leads to serious issues about statistical 'significance' and inference criteria.  That's because most alleged fitness differences are essentially local and comparative.  In turn that means the variant is not inherently selected but is context-dependent: fitness doesn't have a universal value like, say, G, the universal Newtonian gravitational constant in physics, and to me that means that even an implicitly Newtonian view of natural selection is mistaken as a generality about life.

If selection were really force-like in that sense, rather than an ephemeral, context-specific statistical estimate, its amount (favoring A over B) should approach the force's parameter, analogous to G, asymptotically: the bigger the sample and greater the number of samples analyzed the closer the estimated value would get to the true value.  Clearly that is not the way life is, even in most well-controlled experimental settings.  Indeed, even Darwin's idea of a constant struggle for existence is incompatible with that idea.
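As a toy illustration of the point about sign-flipping advantages (our own deliberately simple Wright-Fisher sketch, with made-up numbers, not anything from the literature), an allele whose small selective advantage fluctuates in sign across generations behaves, on average, much like a purely neutral one:

```python
import numpy as np

rng = np.random.default_rng(42)

def wright_fisher(n_pop, s_per_gen, p0=0.5):
    """Final frequency of allele A after drift plus per-generation
    selection coefficients s_per_gen (one value per generation)."""
    p = p0
    for s in s_per_gen:
        # Selection shifts the expected frequency...
        w = p * (1 + s) / (p * (1 + s) + (1 - p))
        # ...then binomial sampling (drift) forms the next generation.
        p = rng.binomial(n_pop, w) / n_pop
        if p in (0.0, 1.0):  # allele lost or fixed
            break
    return p

n_pop, n_gen, reps = 1000, 500, 200

# Pure drift: s = 0 in every generation.
drift = [wright_fisher(n_pop, np.zeros(n_gen)) for _ in range(reps)]
# Fluctuating selection: small |s| whose sign flips at random (mean ~ 0).
fluct = [wright_fisher(n_pop, rng.choice([-0.005, 0.005], n_gen))
         for _ in range(reps)]

print(round(np.mean(drift), 2), round(np.mean(fluct), 2))
```

Across replicates, the mean final frequency in both scenarios stays near the starting value of 0.5: a 'not zero' but sign-varying fitness difference is, on average, hard to distinguish from drift, which is the inference problem at issue.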

There are clearly many instances in which selective explanations of the classical sort seem specifically or even generally credible.  Infectious disease and the evolution of resistance is an obvious example.  Parallel evolution, such as the independent evolution of, say, flight, or of similar dog-like animals in Australia and Africa, may be taken to support the general theory of adaptation to environments.  But what about all the not-dogs in those places?  We are largely in ad hoc explanatory territory, and the best of evolutionary theory clearly recognizes that.

So, in what sense does natural selection actually exist?  Or neutrality?  If they are purely comparative, local, ad hoc phenomena largely demonstrable only by subjective statistical criteria, we have trouble asserting causation beyond constructing Just-So stories.  Even with a plausible mechanism, this will often be the case, because plausibility is not the same as necessity.  Just-So stories can, of course, be true....but usually hard to prove in any serious sense.

Additionally, in regard to adaptive traits within or between populations or species, if genetic causation is due to contributions of many genes, as typically seems to be the case, there is phenogenetic drift, so that even with natural selection working force-like on a trait, there may be little if any selection on specific variants in that mix: even if the trait is under selection, a given allelic variant may not be.

Some other slippery issues
Natural selection is somewhat strange.  It is conceptually a passive screen of variation, but it is often treated as if it were an inherent property of a genotype (or an allele), even though its value depends on what else is at the same locus in the population.  It's also treated as an inherent and unchanging property of the genotype...until any competing genotypes disappear.  As the favored allele becomes more common, its amount of advantage will increasingly vary because, due to recombination and mutation, the many individuals carrying the variant will also vary in the rest of their genomes, which will introduce differences in fitness among them (likewise, early on most carriers of the favored 'A' variant will be heterozygotes, but later on more and more will be homozygotes).  When the A variant becomes very common in the population, its advantage will hardly be detectable, since almost all its peers will have the same genotype at that site.  Continued adaptation will have to shift to other genes, where there still is a difference.  Some AA carriers will have detrimental variants at another gene, say B, and hence reduced fitness. Relatively speaking, some A's, or eventually maybe all A's, will have become harmful, because even in classical Darwinian terms selection is only relative and local.  So selection, even in the force-like sense, is very non-Newtonian, because it is so thoroughly context-dependent.

Another issue is somatic mutation.  The genotypes that survive to be transmitted to the next generation are in the germ line.  But every cell division induces some mutations, and depending on when and where during development or later life a mutation occurs, it could affect the traits of the individual.  Even if selection were a deterministic force, it screens on individuals and hence that includes any effects of somatic mutation in those individuals.  But somatic mutations aren't inherited, so even if the mechanism is genetic their effects will appear as drift in evolutionary terms.  

Most models of adaptive selection are trait-specific.  But species do not evolve one trait at a time, except perhaps occasionally when a really major stressor sweeps through (like an epidemic).  Generally, a population is always subject to a huge diversity of threats and opportunities, contexts and changes.  Every one of our biological systems is always being tested, often in many ways at once. Traits are also often correlated with one another, so pushing on one may be pulling on another.  That means that even if each trait were being screened for separate reasons, the net effect on any one of them must typically be very, very small, even if it is Newtonian in its force-like nature.

The result is something like a Japanese pachinko machine.  Pachinko is a popular type of gambling in Japan. A flurry of small metal balls bounces down from the top, more or less randomly, through a jungle of pins and little wheels, before finally arriving at the bottom.  The balls bounce off each other on the way in basically random collisions. The payoff (we could say it's analogous to fitness) is based on the balls that, after all this apparent chaos, end up in a particular pocket at the bottom.  In the biological analogy, each ball can represent a different trait, or perhaps individuals in a population. They bounce around rather randomly, constrained only by the walls and objects there--nothing steers them specifically. What's in the pocket is the evolutionary result.

Pachinko machine (from Google images)
 (you can easily find YouTube videos showing pachinkos in action)

All similes limp, and these collisions are probably in truth deterministic, even if far too complex to predict in outcome.  Nonetheless, this sort of dynamics among individuals, with their differing genes of varying and context-specific effects, in diverse and complex environments, suggests why change related to a given trait will be a lot like drift; there are so many traits that if selection on each were too strongly force-like, extinction would be the more likely result.  Further, since most traits are affected by many parts of the genome, the intensity of selection on any one of them must be reduced to something close to the expectations of drift. Adaptive complexity is another reason to think that adaptive change must be glacially slow, as Darwin stressed many times, but also that selection is much less force-like, as a rule.  Only after the fact, seeing what managed to survive, does it look compatible with force-like, straight-line selection.

Here, the process seems to rest heavily on chance.  But as we discussed in a 2014 post in a series on the modes and nature of natural selection, we likened the course that species take through time to the geodesic paths that objects take through spacetime, which are determined (and there it really does seem to be 'determined') by the splattered matter and energy at any point they pass through.

An overall view
This leaves us in something of a quandary.  We can easily construct criteria for making some inferences in the stronger cases, and for testing them in some experimental settings.  We can proffer imaginative scenarios to account for the presence of organized traits and adaptations.  But evolutionary explanations are often largely or wholly speculative.  This applies comparably to natural selection and to genetic drift, and these are not new discoveries, although it seems to be in few people's interest to acknowledge them fully.

Darwin wanted to show by plausibility argument that life on earth was the result of natural processes, not ad hoc divine creation events.  He had scant concept of chance or genetic drift, because his ideas of the mechanism of inheritance were totally wrong.  Concepts of probabilism and statistical testing were still rather new and in only restricted use.  Darwin would have had no trouble acknowledging a role for drift.  How he would respond to the elusiveness of these factors, and to the fact that they really are not 'forces', is hard to say--but he probably would vigorously defend systematic selection by arguing that what exists must have gotten here by selection acting as a force.

The causal explanation of life's diversity still falls far short of the kind of mathematical or deterministic rigor of the core physical sciences, and even of more historical physical sciences like geology, oceanography, and meteorology.  Until someone finds better ways (if they indeed are there to be found), much of evolutionary biology verges on metaphysical philosophy for reasons we've tried to argue in this series.  We should be honest about that fact, and clearly acknowledge it.

One can say that small values are at least real values, or that small values can be ignored, as in genetic drift.  Likewise one can say that small selective effects will vary from sample to sample because of chance, and so on.  But such acknowledgments undermine the kinds of smooth inferences we naturally hunger for.  The assumption that what we see today was also the case in the past is usually little more than that: an assumption. This is a main issue we should confront in trying to understand evolution--and it applies as well to the promises being made of 'precision' prediction of genomic causation in health and medicine.  The moving tide of innumerable genotypic ways to get similar traits, at any time, within or between populations, and over evolutionary time, needs to be taken seriously.

It may be sufficient and correct to say, almost tautologically, that today's function evolved somehow, and we can certainly infer that it got here by some mix of evolutionary factors.  Our ancestors and their traits clearly were evolutionarily viable or we wouldn't be here.  So even if we can't really trace the history in specifics, we can usually be happy to say that, clearly, whales evolved to be able to live in the ocean.  Nobody can question that.  But the points I've tried to make in this series are serious ones worth thinking seriously about, if we really want to understand evolution, and the genetic causal mechanisms that it has produced.

Statistical Reform.....or Safe-harbor Treadmill Science?

We have recently commented on the flap in statistics circles about the misleading use of significance test results (p-values) rather than a more complete and forthright presentation of the nature of the results and their importance (three posts, starting here).  There has been a lot of criticism of what boils down to misrepresentative headlines publicizing what are in essence very minor results.  The American Statistical Association recently published a statement about this, urging clearer presentation of results.  But one may ask about this and the practice in general. Our recent set of posts discussed the science.  But what about the science politics in all of this?

The ASA is a trade organization whose job it is, in essence, to advance the cause and use of statistical approaches in science.  The statistics industry is not a trivial one.  There are many companies who make and market statistical analytic software.  Then there are the statisticians themselves and their departments and jobs.  So one has to ask: is the ASA statement, and the other hand-wringing, sincere and profound, or, and to what extent, is this a vested interest protecting itself?  Is it a matter of finding a safe harbor in a storm?

Statistical analysis can be very appropriate and sophisticated in science, but it is also easily mis- or over-applied.  Without it, it's fair to say that many academic and applied fields would be in deep trouble; sociopolitical sciences and many biomedical sciences as well fall into this category.  Without statistical methods to compare and contrast sampled groups, these areas rest on rather weak theory.  Statistical 'significance' can be used to mask what is really low level informativeness or low importance under a patina of very high quantitative sophistication.  Causation is the object of science, but statistical methods too often do little more than describe some particular sample.

When a problem arises, as here, there are several possible reactions.  One is to stop and realize that it's time for deeper thinking: that current theory, methods, or approaches are not adequately addressing the questions that are being asked.  Another reaction is to do public hand-wringing and say that what this shows is that our samples have been too small, or our presentations not clear enough, and we'll now reform.  

But if the effects being found are, as is the case in this controversy, typically very weak and hence not very important to society, then the enterprise and the promised reform seem rather hollow. The reform statements have had almost no component that suggests that re-thinking is what's in order. In that sense, what's going on is a stalling tactic, a circling of wagons, or perhaps worse, a manufactured excuse to demand even larger budgets and longer-term studies, that is to demand more--much more--of the same.

The treadmill problem

If that is what happens, it will keep scientists and software outfits and so on on the same treadmill they've been on--the treadmill that led to the problem.  It will also be contrary to good science.  Good science should be forced by its 'negative' results to re-think its questions. This is, in general, how major discoveries and theoretical transformations have occurred.  But with the corporatization of academic professions, both commercial and in the sense of trade-unions, we have an inertial factor that may actually impede real progress.  Of course, those dependent on the business will vigorously resist or resent such a suggestion. That's normal and can be expected, but it won't help unless a spirited attack on the problems at hand goes beyond more-of-the-same.


Is it going to stimulate real new thinking, or mainly just strategized thinking for grants and so on?

So is the public worrying about this a holding action or a real strategy for change? Will we see real rather than just symbolic, pro forma reform? Based on the way things work these days, the likelihood is that we won't.

There is a real bind here. Everyone depends on the treadmill and keeping it in operation. The labs need their funding and publication treadmills, because staff need jobs and professors need tenure and nice salaries. But if by far most findings in this arena are weak at best, then what journals will want to publish them? They have to publish something and keep their treadmill going. What news media will want to trumpet them, to feed their treadmill? How will professors keep their jobs or research-gear outfits sell their wares?

There is fault here, but it's widespread, a kind of silent conspiracy, and not everyone is even aware of it. It's been built up gradually over the past few decades, like the frog in slowly heating water who doesn't realize he's about to be boiled alive. We wear the chains we've forged in our careers. It's not just a costly matter, and one of understandable careerism. It's a threat to the integrity of the enterprise itself.
We have known many researchers who say they have to be committed to a genetic point of view because that's what you must do to get funded, to keep your lab going, to get papers into the major journals, or to have a prominent, influential career. One person applying for a gene-mapping study to find even lesser genomic factors than the few already well established said, when it was suggested that rather than find still more genes the known genes might now be investigated instead, "But mapping is what I do!"  Many a conversation I've heard has involved quiet boasting about applying for funding for work that's already been done, so one can try something else (which is not what the reviewers are judging).

If this sort of 'soft' dishonesty is part of the game (if indeed it's 'soft'), and yet science depends centrally on honesty, why do we think we can trust what's in the journals?  How many seriously negating details are not reported, or buried in huge 'supplemental' files, or invisible because of intricate data manipulation? Gaming the system undermines the very core of science: its integrity.  Laughing about gaming the system adds insult to injury.  Yet gaming the system is being taught to graduate students early in their careers (it's called 'grantsmanship').


We have personally encountered this sort of attitude, expressed only in private of course, again and again in the last couple of decades during which big studies and genetic studies have become the standard operating mode in universities, especially biomedical science (it's rife in other areas like space research, too, of course).  


There's no bitter personal axe being ground here.  I've retired, had plenty of funding through the laboratory years, and our work was published and recognized.  The problem is one of science, not persons.  The challenge of understanding genetics, development, causation, and so forth is manifestly not an easy one, or these issues would not have arisen.

It's only human, perhaps.  The last couple of generations of scientists systematically built up an inflated research community, and the industries that serve it, much of which depends on research-grant funding, largely at the public trough, with jobs and labs at stake.  The members of the profession know this, but are perhaps too deeply immersed to do anything major to change it, unless some sort of crisis forces change upon us. People well-heeled in the system don't like these thoughts being expressed, but all but the proverbial 1%-ers--cruising along just fine in elite schools with political clout and resources--know there's a problem, and know they dare not say too much about it.


The statistical issues are not the cause.  The problem is a combination of the complexity of biological organisms as they have evolved, and the simplicity of human desires to understand (and not to get disease).  We are pressured not just to understand, but to translate that into dramatically better public and individual health.  Sometimes it works very well, but we naturally press the boundaries, as science should.  But in our current system we can't afford to be patient.  So, we're on a treadmill, but it's largely a treadmill of our own making.

The statistics of Promissory Science. Part II: The problem may be much deeper than acknowledged

Yesterday, I discussed current issues related to statistical studies of things like genetic or other disease risk factors.  Recent discussion has criticized the misuse of statistical methods, including a statement on p-values by the American Statistical Association.  As many have said, over-reliance on p-values can give a misleading sense that significance means importance of a tested risk factor.  Many touted claims are not replicated in subsequent studies, and analysis has shown this may apply preferentially to the 'major' journals.  Critics have suggested that p-values not be reported at all, or only if other information, like confidence intervals (CIs) and risk-factor effect sizes, is included (I would say prominently included). Strict adherence will likely undermine what even expensive major studies can claim to have found, and it will become clear that many purported genetic, dietary, etc., risk factors are trivial, unimportant, or largely uninformative.

However, today I want to go farther, and question whether even these correctives go far enough--or whether they might instead serve as a convenient smokescreen for far more serious implications of the same issue.  There is reason to believe the problem with statistical studies is more fundamental and broad than has been acknowledged.

Is reporting p-values really the problem?
Yesterday I said that statistical inference is only as good as the correspondence between the mathematical assumptions of the methods and what is being tested in the real world.  I think the issues at stake rest on a deep disparity between them.  Worse, we don't and often cannot know which assumptions are violated, or how seriously.  We can make guesses and do all auxiliary tests and the like, but as decades of experience in the social, behavioral, biomedical, epidemiological, and even evolutionary and ecological worlds show us, we typically have no serious way to check these things.

The problem is not just that significance is not the same as importance. A somewhat different problem with standard p-value cutoff criteria is that many of the studies in question involve many test variables, such as complex epidemiological investigations based on long questionnaires, or genomewide association studies (GWAS) of disease. Normally, p=0.05 means that by chance one test in 20 will seem to be significant, even if there's nothing causal going on in the data (e.g., if no genetic variant actually contributes to the trait).  If you do hundreds or even many thousands of 0.05 tests (e.g., of sequence variants across the genome), even if some of the variables really are causative, you'll get so many false positive results that follow-up will be impossible.  A standard way to avoid that is to correct for multiple testing by using only p-values that would be achieved by chance only once in 20 times of doing a whole multivariable (e.g., whole genome) scan.  That is a good, conservative approach, but means that to avoid a litter of weak, false positives, you only claim those 'hits' that pass that standard.
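
The logic of that correction can be sketched in a few lines of Python.  This is a toy simulation, not data from any real scan: the number of tests and the assumption of uniform p-values under the null are illustrative choices.

```python
# A minimal sketch of the multiple-testing problem described above.
# We simulate a 'scan' of 10,000 test variables, none of which is truly
# causal, and count how many pass a naive p < 0.05 cutoff versus a
# Bonferroni-style study-wide threshold (0.05 / number of tests).
import random

random.seed(1)
n_tests = 10_000

# Under the null, a correctly calibrated p-value is uniform on (0, 1).
pvals = [random.random() for _ in range(n_tests)]

naive_hits = sum(p < 0.05 for p in pvals)
bonferroni_hits = sum(p < 0.05 / n_tests for p in pvals)

print(f"naive p<0.05 'hits' among {n_tests} null tests: {naive_hits}")
print(f"study-wide (Bonferroni) hits: {bonferroni_hits}")
```

Even with nothing causal in the data, the naive cutoff yields roughly 500 false 'hits', while the study-wide threshold yields essentially none--which is exactly why the conservative correction is used, and exactly why it discards so much.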

You know you're only accounting for a fraction of the truly causal elements you're searching for; the rest are the litter of weakly associated variables that you're willing to ignore in order to identify the most likely true ones.  This is good conservative science, but if your problem is to understand the beach, you are forced to ignore all the sand, though you know it's there.  The beach cannot really be understood by noting its few detectable big stones.

Sandy beach; Wikipedia, Lewis Clark

But even this sensible play-it-conservative strategy has deeper problems.

How 'accurate' are even these preferred estimates?
The metrics like CIs and effect sizes that critics are properly insisting be (clearly) presented along with or instead of p-values face exactly the same issues as the p-value: the degree to which what is modeled fits the underlying mathematical assumptions on which test statistics rest.

To illustrate this point, consider the Pythagorean Theorem in plane geometry, which applies exactly and universally to right triangles.  But in the real world there are no right triangles!  There are approximations to right triangles, and the value of the Theorem is that the more carefully we construct our triangle the closer the square of the hypotenuse is to the sum of the squares of the other sides.  If your result doesn't fit, then you know something is wrong and you have ideas of what to check (e.g., you might be on a curved surface).
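
As a toy numerical version of this point (the 'measurement errors' below are invented), one can check how a slightly mismeasured 3-4-5 triangle departs from the theorem:

```python
# The theorem is exact in the abstract: for an ideal 3-4-5 triangle,
# a^2 + b^2 - c^2 is exactly zero.
a, b, c = 3.0, 4.0, 5.0
print(a**2 + b**2 - c**2)        # exactly 0.0

# A slightly mismeasured version of the 'same' triangle: the small
# discrepancy tells you your measurements (or geometry) are imperfect.
a_m, b_m, c_m = 3.01, 3.98, 5.003
discrepancy = a_m**2 + b_m**2 - c_m**2
print(discrepancy)
```

The point is that the theorem gives an external standard against which the discrepancy is interpretable; the statistical studies at issue have no such standard.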

Right triangle; Wikipedia

In our statistical study case, knowing an estimated effect size and how unusual it is seems to be meaningful, but we should ask how accurate these estimates are.  But that question often has almost no testable meaning: accurate relative to what?  If we were testing a truth derived from a rigorous causal theory, we could ask by how many decimal places our answers differ from that truth.  We could replicate samples and increase accuracy, because the signal to noise ratio would systematically improve.  Were that to fail, we would know something was amiss, in our theory or our instrumentation, and have ideas how to find out what that was.  But we are far, indeed unknowably far, from that situation.  That is because we don't have such an externally derived theory, no analog to the Pythagorean Theorem, in important areas where statistical study techniques are being used.

In the absence of adequate theory, we have to concoct a kind of data that rests almost entirely on internal comparison to reveal whether 'something' of interest (often something we don't or cannot specify) is going on.  We compare data such as cases vs controls, which forces us to make statistical assumptions, such as that our diseased and normal subjects differ only in (say) their coffee consumption, or that the distribution of variation in unmeasured variables is random with respect to coffee consumption among our case and control subjects.  This is one reason, for example, that even a statistically significant correlation does not imply causation or importance.  The underlying, often unstated assumptions are often impossible to evaluate.  The same problem relates to replicability: in genetics, for example, you can't assume that some other population is the same as the population you first studied.  Failure to replicate in this situation does not undermine a first positive study.  A result of a genetic study in Finland, say, cannot be replicated properly elsewhere because there's only one Finland!  Even another study sample within Finland won't necessarily replicate the original sample.  In my opinion, this need for internally based comparison is the core problem, and a major reason why theory-poor fields often do so poorly.

The problem is subtle
When we compare cases and controls and insist on a study-wide 5% significance level to avoid a slew of false-positive associations, we know we're being conservative as described above, but at least those variables that do pass the adjusted test criterion are really causal with their effect strengths accurately estimated.  Right?  No!

When you do gobs of tests, some very weak causal factor may pass your test by good luck.  But among the many contributing causal factors, the lucky one that passes the conservative test is something of a fluke, and its estimated effect size may well be inflated, as experience in follow-up studies often or even typically shows.
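
This 'winner's curse' can be illustrated with a toy simulation (all parameter values here are invented): give every variable the same small true effect, add estimation noise, and look at the average estimate among the few that clear a conservative cutoff.

```python
# Sketch of the 'lucky fluke' effect described above: only estimates
# that got a favorable draw of noise pass a strict significance
# threshold, so the effects that survive are, on average, overestimated.
import random

random.seed(2)
true_effect = 0.10      # the same small true effect for every variable
se = 0.05               # standard error of each estimate
n_variables = 10_000
z_strict = 4.0          # a conservative, multiple-testing-style cutoff

estimates = [random.gauss(true_effect, se) for _ in range(n_variables)]
winners = [e for e in estimates if e / se > z_strict]  # pass the cutoff

mean_winner = sum(winners) / len(winners)
print(f"true effect: {true_effect}")
print(f"mean estimate among the {len(winners)} 'winners': {mean_winner:.3f}")
```

In runs like this the surviving estimates average roughly double the true effect--not because anything was done wrong, but because passing a strict threshold selects for favorable noise.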

In this sense it's not just p-values that are the problem, and providing ancillary values like CIs and effect sizes in study reports is something of a false pretense of openness, because all of these values are vulnerable to similar problems.  The promise to require these other data is a stopgap, or even a strategy to avoid adequate scrutiny of the statistical inference enterprise itself.

It is nobody's fault if we don't have adequate theory.  The fault, dear Brutus, is in ourselves, for using Promissory Science, and feigning far deeper knowledge than we actually have.  We do that rather than come clean about the seriousness of the problems.  Perhaps we are reaching a point where the let-down from over-claiming is so common that the secret can no longer be kept, and the paying public may get restless.  Leaking out a few bits of recognition and promising reform is very different from letting it all out and facing the problem bluntly and directly.  The core problem is not whether a reported association is strong or meaningful, but, more importantly, that we don't know, or know how to know.

This can be seen in a different way.  If all studies, including negative ones, were reported in the literature, then it would be only right that the major journals should carry those findings that are most likely true, positive, and important.  That's the actionable knowledge we want, and a top journal is where the most important results should appear.  But the first occurrence of a finding, even if it turns out later to be a lucky fluke, is after all a new finding!  So shouldn't investigators report it, even though lots of other similar studies haven't yet been done?  That could take many years or, as in the example of Finnish studies, be impossible.  We should expect negative results to be far more numerous and less interesting in themselves if we just tested every variable we could think of willy-nilly, but in fact we usually have at least some reason to look, so it is far from clear what fraction of negative results would undermine the traditional way of doing business.  Should we wait for years before publishing anything?  That's not realistic.

If the big-name journals are still seen as the place to publish, and their every press conference and issue announcement is covered by the splashy press, why should they change?  Investigators may feel that if they don't stretch things to get into these journals, or just publish negative results, they'll be thought to have wasted their time or done poorly designed studies.  Besides normal human vanity, the risk is that they will not be able to get grants or tenure.  That feeling is the fault of the research, reputation, university, and granting systems, not the investigator.  Everyone knows the game we're playing. As it is, investigators and their labs have champagne celebrations when they get a paper in one of these journals, like winning a yacht race, which is a reflection of what one could call the bourgeois nature of the profession these days.

How serious is the problem?  Is it appropriate to characterize what's going on as fraud, hoax, or silent conspiracy?  Probably in some senses yes; at least there is certainly culpability among those who do understand the epistemological nature of statistics and their application.  'Plow ahead anyway' is not a legitimate response to fundamental problems.

When reality is closely enough approximated by statistical assumptions, causation can be identified, and we don't need to worry about the details.  Many biomedical and genetic, and probably even some sociological problems are like that.  The methods work very well in those cases.  But this doesn't gainsay the accusation that there is widespread over-claiming taking place and that the problem is a deep lack of sufficient theoretical understanding of our fields of interest, and a rush to do more of the same year after year.

It's all understandable, but it needs fixing.  To be properly addressed, an entrenched problem requires more criticism even than this one has been getting recently.  Until better approaches come along, we will continue wasting a lot of money in the rather socialistic support of research establishments that keep on doing science that has well-known problems.

Or maybe the problem isn't the statistics, after all?
The world really does, after all, seem to involve causation and at its basis seems to be law-like. There is truth to be discovered.  We know this because when causation is simple or strong enough to be really important, anyone can find it, so to speak, without big samples or costly gear and software. Under those conditions, numerous details that modify the effect are minor by comparison to the major signals.  Hundreds or even thousands of clear, mainly single-gene based disorders are known, for example.  What is needed is remediation, hard-core engineering to do something about the known causation.

However, these are not the areas where the p-value and related problems have arisen.  Those arise when very large and SASsy studies seem to be needed, and the reason is that the causal factors there are weak and/or complex.  Along with trying to root out misrepresentation and failure to report the truth adequately, we should ask whether, perhaps, the results showing frustrating complexity are correct.

Maybe there is not a need for better theory after all.  In a sense the defining aspect of life is that it evolves not by the application of external forces as in physics, but by internal comparison--which is just what survey methods assess.  Life is the result of billions of years of differential reproduction, by chance and various forms of selection--that is, continual relative comparison by local natural circumstances.  'Differential' is the key word here.  It is the relative success among peers today that determines the genomes and their effects that will be here tomorrow.  In a way, in effect and if often unwittingly and for lack of better ideas, that's just the sort of comparison made in statistical studies.

From that point of view, the problem is that we don't want to face up to the resulting truth, which is that a plethora of changeable, individually trivial causal factors is what we find because that's what exists.  That we don't like that, don't report it cleanly, and want strong individual causation is our problem, not Nature's.

The statistics of Promissory Science. Part I: Making non-sense with statistical methods

Statistics is a form of mathematics, a way devised by humans for representing abstract relationships.  Mathematics comprises axiomatic systems: assumptions about basic units such as numbers, basic relationships like adding and subtracting, and rules of inference (deductive logic), which it then elaborates to draw conclusions that are typically too intricate to reason out in other, less formal ways.  Mathematics is an awesomely powerful way of doing this abstract mental reasoning, but when applied to the real world it is only as true or precise as the correspondence between its assumptions and real-world entities or relationships.  When that correspondence is high, mathematics is very precise indeed, a strong testament to the true orderliness of Nature.  But when the correspondence is not good, mathematical applications verge on fiction, and this occurs in many important applied areas of probability and statistics.

You can't drive without a license, but anyone with R or SAS can be a push-button scientist.  Anybody with a keyboard and some survey generating software can monkey around with asking people a bunch of questions and then 'analyze' the results. You can construct a complex, long, intricate, jargon-dense, expansive survey. You then choose who to subject to the survey--your 'sample'.  You can grace the results with the term 'data', implying true representation of the world, and be off and running.  Sample and survey designers may be intelligent, skilled, well-trained in survey design, and of wholly noble intent.  There's only one little problem: if the empirical fit is poor, much of what you do will be non-sense (and some of it nonsense).

Population sciences, including biomedical, evolutionary, social and political fields, are experiencing an increasingly widely recognized crisis of credibility.  The fault is not in the statistical methods on which these fields heavily depend, but in the degree of fit (or not) to their assumptions--with the emphasis these days on the 'or not', and a frequent dismissal of the underlying issues in favor of a patina of technical, formalized results.  Every capable statistician knows this, but might be out of business if it were paid too much open attention.  And many statisticians may be too uninterested in, or too foggy about, the philosophy of science to see what lies beyond the methodological technicalities.  Jobs and journals depend on not being too self-critical.  And therein lie rather serious problems.

Promissory science
There is the problem of the problems--the problems we want to solve, such as in understanding the cause of disease so that we can do something about it.  When causal factors fit the assumptions, statistical or survey study methods work very well.  But when causation is far from fitting the assumptions, the impulse of the professional community seems mainly to increase the size, scale, cost, and duration of studies, rather than to slow down and rethink the question itself.  There may be plenty of careful attention paid to refining statistical design, but basically this stays safely within the boundaries of current methods and beliefs, and the need for research continuity.  It may be very understandable, because one can't just quickly uproot everything or order up deep new insights.  But it may be viewed as abuse of public trust as well as of the science itself.

The BBC Radio 4 program called More Or Less keeps a watchful eye on sociopolitical and scientific statistical claims, revealing what is really known (or not) about them.  Here is a recent installment on the efficacy (or believability, or neither) of dietary surveys.  And here is a FiveThirtyEight link to what was the basis of the podcast.

The promotion of statistical survey studies to assert fundamental discovery has been referred to as 'promissory science'.  We are barraged daily with promises that if we just invest in this or that Big Data study, we will put an end to all human ills.  It's a strategy, a tactic, and at least the top investigators are very well aware of it.  Big long-term studies are a way to secure reliable funding and to defer delivering on promises into the vague future.  The funding agencies, wanting to seem prudent and responsible to taxpayers with their resources, demand some 'societal impact' section on grant applications.  But there is in fact little if any accountability in this regard, so one can say they are essentially bureaucratic window-dressing exercises.

Promissory science is an old game, practiced since time immemorial by preachers.  It boils down to promising future bliss if you'll just pay up now.  We needn't be (totally) cynical about this.  When we set up a system that depends on public decisions about resources, we will get what we've got.  But having said that, let's take a look at what is a growing recognition of the problem, and some suggestions as to how to fix it--and whether even these are really the Emperor of promissory science dressed in less gaudy clothing.

A growing at least partial awareness
The problem of results that are announced by the media, journals, universities, and so on, but that don't deliver the advertised promises, is complex but widespread.  In part because research has become so costly, some warning sirens are sounding when it becomes clear that the promised goods are not being delivered.

One widely known issue is the lack of reporting of negative results, or their burial in minor journals. Drug-testing research is notorious for this under-reporting.  It's too bad because a negative result on a well-designed test is legitimately valuable and informative.  A concern, besides corporate secretiveness, is that if the cost is high, taxpayers or share-holders may tire of funding yet more negative studies.  Among other efforts, including by NIH, there is a formal attempt called AllTrials to rectify the under-reporting of drug trials, and this does seem at least to be thriving and growing if incomplete and not enforceable.  But this non-reporting problem has been written about so much that we won't deal with it here.

Instead, there is a different sort of problem.  The American Statistical Association has recently noted an important issue, which is the use and (often) misuse of p-values to support claims of identified  causation (we've written several posts in the past about these issues; search on 'p-value' if you're interested, and the post by Jim Wood is especially pertinent).  FiveThirtyEight has a good discussion of the p-value statement.

The usual interpretation is that p represents the probability that, if the test variable in fact has no causal effect, its apparent effect arose just by chance.  So if the observed p in a study is less than some arbitrary cutoff, such as 0.05, it means essentially that if no causation were involved, the chance you'd see this association anyway is no greater than 5%; that is, there is some evidence for a causal connection.
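
That interpretation can be checked with a toy simulation (the sample sizes and number of repeated 'studies' below are arbitrary choices): when the null is true by construction, about 5% of comparisons still reach p < 0.05.

```python
# How often does a study reach p < 0.05 'by chance' when the test
# variable has no effect at all?  We repeatedly compare two samples
# drawn from the *same* distribution, so the null is true by design,
# and count the false alarms.
import math
import random

random.seed(3)

def two_sided_p(z):
    # two-sided p-value for a standard-normal test statistic
    return math.erfc(abs(z) / math.sqrt(2))

n_per_group, n_studies = 50, 2_000
false_alarms = 0
for _ in range(n_studies):
    cases = [random.gauss(0, 1) for _ in range(n_per_group)]
    controls = [random.gauss(0, 1) for _ in range(n_per_group)]
    diff = sum(cases) / n_per_group - sum(controls) / n_per_group
    z = diff / math.sqrt(2 / n_per_group)   # sd is known to be 1 here
    if two_sided_p(z) < 0.05:
        false_alarms += 1

print(f"fraction of null 'studies' with p < 0.05: {false_alarms / n_studies:.3f}")
```

The fraction comes out near 0.05, as it should: the 5% is a property of the procedure under the null, not evidence about any particular association.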

Trashing p-values is becoming a new cottage industry!  Now JAMA is on the bandwagon, with an article showing, in a survey of biomedical literature from the past 25 years including well over a million papers, that a far disproportionate and increasing number of studies reported statistically significant results.  Here is the study on the JAMA web page, though it is not in the public domain yet.

Besides the apparent reporting bias, the JAMA study found that those papers generally failed to provide adequate fleshing out of that result.  Where are all the negative studies that statistical principles might expect to be found?  We don't see them, especially in the 'major' journals, as has been noted many times in recent years.  Just as importantly, authors often did not report confidence intervals or other measures of the degree of 'convincingness' that might illuminate the p-value. In a sense that means authors didn't say what range of effects is consistent with the data.  They report a non-random effect, but often didn't give the effect size, that is, say how large the effect was even assuming that effect was unusual enough to support a causal explanation. So, for example, a statistically significant increase of risk from 1% to 1.01% is trivial, even if one could accept all the assumptions of the sampling and analysis.
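
A back-of-the-envelope calculation shows how a trivial absolute risk difference can nonetheless be overwhelmingly 'significant' in a big enough sample.  The risks and sample size below are invented purely for illustration (a slightly larger difference than the 1% vs 1.01% in the text, so that a feasible sample size suffices):

```python
# Significance is not importance: a two-proportion z-test on a tiny,
# clinically trivial risk difference, made 'significant' by sheer n.
import math

p_control, p_exposed = 0.010, 0.012   # 1.0% vs 1.2% risk (hypothetical)
n = 500_000                           # subjects per group (hypothetical)

diff = p_exposed - p_control
se = math.sqrt(p_control * (1 - p_control) / n
               + p_exposed * (1 - p_exposed) / n)
z = diff / se
p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided

print(f"absolute risk difference: {diff*100:.2f} percentage points")
print(f"z = {z:.1f}, p = {p_value:.2e}")
```

The p-value is astronomically small, yet the absolute difference in risk is a fifth of a percentage point--a result that would make headlines while changing almost nothing for any individual.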

Another vocal critic of what's afoot is John Ioannidis; in a recent article he levels both barrels against the misuse and mis- or over-representation of statistical results in biomedical sciences, including meta-analysis (the pooling of many diverse small studies into a single large analysis to gain sufficient statistical power to detect effects and test for their consistency).  This paper is a rant, but a well-deserved one, about how 'evidence-based' medicine has been 'hijacked', as he puts it.  The same must be said of 'precision genomic' or 'personalized' medicine, or 'Big Data', and other sorts of imitative sloganeering going on from many quarters who obviously see this sort of promissory science as what you have to do to get major funding.  We have set ourselves a professional trap, and it's hard to escape.  For example, the same author has been leading the charge against misrepresentative statistics for many years, and he and others have shown that the 'major' journals have in a sense the least reliable results in terms of their replicability.  But he's been raising these points in the same journals that he shows are culpable of the problem, rather than boycotting those journals.  We're in a trap!

These critiques of current statistical practice are the points getting most of the ink and e-ink.  There may be a lot of cover-ups of known issues, and even hypocrisy, in all of this, and perhaps more open or understandable tacit avoidance.  The industry (e.g., drug, statistics, and research equipment) has a vested interest in keeping the motor running.  Authors need to keep their careers on track.  And, in the fairest and non-political sense, the problems are severe.

But while these issues are real and must be openly addressed, I think the problems are much deeper. In a nutshell, I think they relate to the nature of mathematics relative to the real world, and the nature and importance of theory in science.  We'll discuss this tomorrow.

Rare Disease Day and the promises of personalized medicine

Our daughter Ellen wrote the post that I republish below 3 years ago, and we've reposted it in commemoration of Rare Disease Day, Febru...