
Genomic causation....or not

By Ken Weiss and Anne Buchanan

The Big Story in the latest Nature ("A radical revision of human genetics: Why many ‘deadly’ gene mutations are turning out to be harmless," by Erika Check Hayden) is that genes thought to be clearly causal of important diseases aren't always so (the link is to the People-magazine-like cover article in that issue).  This is a follow-up on an August Nature paper describing the database from which this week's results are drawn.  The apparent mismatch between a gene variant and a trait can, according to the paper, result from technical error, a mis-call by a given piece of software, or from the assumption that finding a given mutation in affected but not healthy individuals means the causal mutation has been found, without experimentally confirming the finding--which itself can be tricky for reasons we'll discuss.  Insufficient documentation of 'normal' sequence variation has meant that the frequency of so-called causal mutations hasn't been available for comparative purposes.  Again, we'll mention below what 'insufficient' might mean, if anything.

People in general and researchers in particular need to be more than dismissively aware of these issues, but the conclusion implied by the discussion--that we still need to focus on single genes as causal of most disease, that is, to do MuchMoreOfTheSame--is not so obviously justified.  We'll begin with our usual contrarian statement: the idea here is being hyped as if it were new, but except for its details it clearly is not, for reasons we'll also explain.  That is important because presenting it as a major finding, and still framing single genes as either truly causal or mistakenly identified, ignores what we think the deeper message needs to be.

The data come from a mega-project known as ExAC, a consortium of researchers sharing DNA sequences to document genetic variation and further understand disease causation, now including data from approximately 60,000 individuals (in itself rather small relative to its stated purpose). The data are primarily exome sequences--that is, from protein-coding regions of the human genome, not from whole genome sequences--again a major issue.  We have no reason at all to critique the original paper itself, which is large, sophisticated, and carefully analyzed as far as we can tell; but the claims about its novelty are, we think, very much hyperbolized, and that needs to be explained.

Some of the obvious complicating issues
We know that a gene generally does not act alone.  DNA in itself is basically inert.  We have been, and continue to be, misled by examples of gene causation in which context and interactions don't really matter much, and we still cling to these as though they were the rule.  This reinforces the yearning for causal simplicity and tractability.  Even the ExAC story, or its public announcements, doesn't properly acknowledge causal context and complexity, because it critiques some simplistic single-gene inferences while assuming that the problems are methodological rather than conceptual.

There are many aspects of causal context that complicate the picture--none of them new, and we're not making them up--but which the Bigger-than-Ever Data pleas don't address:
1.  Current data are from blood samples, which may not reflect the true constitutive genome because of early somatic mutation, and this will vary among study subjects,
2.  Life-long exposure to local somatic mutation is not considered nor measured,
3.  Epigenetic changes, especially local tissue-specific ones, are not included,
4.  Environmental factors are not considered, and indeed would be hard to consider,
5.  Non-Europeans, and even many Europeans, are barely included, if at all, though this is beginning to be addressed,
6.  Regulatory variation, which GWAS has convincingly shown is much more important to most traits than coding variation, is not included. Exome data have been treated naively by many investigators as if they were what matters, and exome-only data have been used as a major excuse for Great Big Grants that can't find what we know is probably far more important,
7.  Non-coding regions and non-regulatory RNA regions are not included in exome-only data,
8.  A mutation may be causal in one context but not in others, in one family or population and not others, rendering the determination that it's a false discovery difficult,
9.  Single-gene analysis is still the basis of the new 'revelations'; that is, the idea being hinted at is that the 'causal' gene isn't really causal--but one implicit notion is that it was misidentified, which is perhaps sometimes true but probably not always so,
10.  The new reports are presented in the news, at least, as if the gene is being exonerated of its putative ill effects.  But that may not be the case: if the regulatory regions near the mutated gene have little or no activity, the 'bad' gene may simply not be expressed, and its coding sequence could falsely be assumed to be harmless,
11.  Many aspects of this kind of work depend on statistical assumptions and subjective cutoff values, a problem only recently being openly recognized,
12.  Bigger studies introduce all sorts of statistical 'noise', which can make something appear causal or can weaken an actual causal signal.  Phenotypes can be measured in many ways, but we know very well that measurement can be changeable and subjective (and phenotypes are not very detailed in the initial ExAC database),
13.  Early reports of strong genetic findings have a well-known upward bias in effect size--the 'winner's curse'--that later work fails to confirm (see the sketch just after this list).
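
To make point 13 concrete, here is a minimal simulation sketch of the winner's curse (our own illustration, not anything from the ExAC paper; every number in it is invented): when many noisy effect estimates are screened and only the 'significant' ones are reported, the reported effects are biased upward.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup (all numbers invented): 10,000 variants, each
# with the same small true effect, estimated with sampling noise.
n_variants = 10_000
true_beta = 0.02      # assumed true effect size
se = 0.05             # assumed standard error of each estimate

beta_hat = rng.normal(true_beta, se, n_variants)  # noisy estimates
z = beta_hat / se
hits = np.abs(z) > 1.96                           # the 'significant' findings

print(f"true effect:                      {true_beta:.3f}")
print(f"mean |estimate| among the 'hits': {np.abs(beta_hat[hits]).mean():.3f}")
# The published 'hits' report effects several times larger than the
# truth, so replication studies, which see the true effect plus fresh
# noise, predictably 'fail to confirm' them.
```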

Well, yes, we're always critical, but this new finding isn't really a surprise
To some readers we are too often critical, and at least some of us have to confess to a contrarian nature.  But here is why we say that these new findings, like so many that are by the grocery checkout in Nature, Science, and People magazines, while seemingly quite true, should not be treated as a surprise or a threat to what we've already known--nor a justification of just doing more, or much more of the same.

Gregor Mendel studied fully penetrant (deterministic) causation.  That is, in what we now know to be 'genes', the presence of the causal allele (in 2-allele systems) always caused the trait (green vs yellow peas, etc.; the same holds for recessive as for dominant traits, given the appropriate genotype). But this is generally wrong, save at best for exceptions such as those that Mendel himself knowingly and carefully chose to study.  But even this was not so clear!  Mendel has been accused of 'cheating' by ignoring inconsistent results. This may have been data fudging, but it is at least as likely to have been a reaction to what we have known for a century as 'incomplete penetrance'.  (Ken wrote about this a number of years ago in one of his Evolutionary Anthropology columns.)  For whatever reason--and see below--the presence of a 'dominant' allele, or 'recessive' homozygosity at a 'causal' gene, doesn't always lead to the trait.

For most of the 20th century, the probabilistic nature of real-world as opposed to textbook Mendelism has been thoroughly known and accepted.  The reasons for incomplete penetrance were not known, and indeed we generally had no way to know them.  Various explanations were offered, but statistical inference (estimates of penetrance probability, for example) was common practice and a textbook standard.  Even the original authors acknowledge incomplete penetrance, and this essentially shows that what the ExAC consortium is reporting are details, but nothing fundamentally new or surprising.  Clinicians or investigators acting as if a variant were always causal should be blamed for gross oversimplification, and so should hyperbolic news media.

Recent advances such as genomewide association studies (GWAS), in various forms, have used stringent statistical criteria to minimize false discovery.  The result is that mapped 'hits' satisfying those criteria account for only a fraction of estimated overall genomic causation.  This was legitimate in that it didn't leave us swamped with hundreds of very weak or very rare false-positive genome locations.  But even the acceptable, statistically safest genome sites typically showed small individual effects, with risks far below 1.0. They were not 'dominant' in the usual sense.  That means that people with the 'causal' allele don't always, and in fact do not usually, have the trait.  This has been the finding for quantitative traits like stature and qualitative ones like the presence of diabetes, heart attack-related events, psychiatric disorders, and essentially all traits studied by GWAS. It is not exactly what the ExAC data were examining, but it is highly relevant, and it is the relevant basic biological principle.

This does not necessarily mean that the target gene is not important for the disease trait, which seems to be one of the inferences headlined in the news splashes.  This is treated as a striking or even fundamental new finding, but it is nothing of the sort.  Indeed, the genes in question may not be falsely identified, but may very well contribute to risk in some people, under some conditions, at some ages, and in some environments.  The ExAC results don't really address this, because (for example) to determine when a gene variant is a risk variant one would have to identify all the causes of 'incomplete penetrance' in every sample; and there are multiple explanations for incomplete penetrance, including points 1-13 above as well as methodological issues such as those pointed out by the ExAC project paper itself.

In addition, there may be 'protective' variants in other regions of the genome (that is, the trait may need contributions from many different genome regions), and working that out would typically involve "hyper-astronomical" combinations of effects, using unachievable, not to mention uninterpretable, sample sizes--from which one would have to estimate risk effects of almost uncountable numbers of sequence variants.  If there were, say, 100 other contributing genes, each with its own variant genotypes, including regulatory variants, the number of combgrounds one would have to sort through to see how they affected the 'falsely' identified gene is effectively uncountable.
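
The arithmetic behind "hyper-astronomical" is easy to check with a back-of-the-envelope sketch (the locus count is hypothetical, chosen only to match the example above):

```python
# If 100 contributing diallelic loci each have three possible
# genotypes (AA, Aa, aa), the number of multi-locus backgrounds is:
n_loci = 100
genotypes_per_locus = 3

combinations = genotypes_per_locus ** n_loci
print(f"{combinations:.2e}")  # about 5e47 distinct backgrounds
# For comparison, roughly 1e11 humans have ever lived, so no sample
# can come close to covering this space, even before counting
# regulatory variants and environments.
```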

Even the most clearly causal associations, such as that between BRCA1 variants and breast cancer, show penetrance far less than 1.0 in recent data (here referring to lifetime risk; risk at earlier ages is very far from 1.0). The risk, though clearly serious, depends on cohort, environmental, and other mainly unknown factors.  Nobody doubts the role of BRCA1, but it is not in itself causal.  For example, it appears to be a mutation-repair gene, but if no (or not enough) cancer-related mutations arise in the breast cells of a woman carrying a high-risk BRCA1 allele, she will not get breast cancer as a result of that gene's malfunction.

There are many other examples of mapping that identified genes that, even if strongly and truly associated with a test trait, have very far from complete penetrance.  A mutation in HFE and hemochromatosis comes to mind: in studies of some Europeans, a particular mutation seemed always to be present in affected individuals, but when the gene was tested in a general database, rather than just in affected people, it had little or no apparent causal effect.  This seems to be the sort of thing the ExAC report is finding.

The generic reason is again that genes--essentially all genes--work only in their context. That context includes 'environment', which refers to all the other genes and cells in the body and to external or 'lifestyle' factors, as well as age and sex.  There is no obvious way to identify, evaluate, or measure the effects of all possibly relevant lifestyle factors, and since these change, retrospective evaluation has unknown bearing on future risk (the same can be said of genomic variants, for the same reason).  How could these even be sampled adequately?

Likewise, volumes of long-existing experimental and highly focused results tell the same tale. Transgenic mice, for example, in which the same mutation is introduced into their 'same' gene as in humans, very often show little, no, or only strain-specific effects.  The same is true in other experimental organisms. The lesson, and it is far from a new or recent one, is that genomic context is vitally important: it is the person-specific genomic background of a target gene that affects the latter's effect strength--and vice versa, since the same is true for each of those other genes. That is why we have so long noted the legerdemain being foisted on the research and public communities by the advocates of Big Data statistical testing.  Certainly methodological errors are also a problem, as the Nature piece describes, but they aren't the only problem.

So if someone reports some cases of a trait that seem too often to involve a given gene, such as the Nature piece seems generally to be about, but searches of unaffected people also occasionally find the same mutations in such genes (especially when only exomes are considered), then we are told that this is a surprise.  It is, to be sure, important to know, but it is just as important to know that essentially the same information has long been available to us in many forms.  It is not a surprise--even if it doesn't tell us where to go in search of genetic, much less genomic, causation.

Sorry, though it's important knowledge, it's not 'radical' nor dependent on these data!
The idea being suggested is that (surprise, surprise!) we need much more data to make this point or to find these surprisingly harmless mutations.  That is simply a misleading assertion, or attempted justification, though it has become the intentional industry standard closing argument.

It is of course very possible that we're missing some aspects of the studies and interpretations being touted, but we don't think that changes the basic points being made here.  They're consistent with the new findings, but they show, for many very good reasons, that this is what we knew was generally the case: 'Mendelian' traits were the exception, and they led to a century of genetic discovery only because they focused attention on what was then doable (while, in parallel and not widely recognized by human geneticists, the agricultural genetics of polygenic traits showed what was more typical).

But now, if causation is being recognized as contextual much more deeply than in Francis Collins' money-strategy-based Big Data dreams, or 'precision' promises, and our inferential (statistical) criteria are properly under siege, we'll repeat our oft-stated mantra: a deeply different, reformed understanding is needed, and a turn of research investment toward basic science rather than exhaustive surveys, and toward those many traits whose causal basis really is strong enough not to require this deeper knowledge.  In a sense, if you need massive data to find an effect, then that effect is usually very rare and/or very weak.

And by the way, the same must be true for normal traits, like stature, intelligence, and so on, for which we're besieged with genome-mapping assertions, and this must also apply to ideas about gene-specific responses to natural selection in evolution.  Responses to environment (diet etc.) manifestly raise the same problem.  It is not just a strange finding of exome mapping studies for disease. Likewise, the 'normal' study subjects now being recruited in huge numbers may get the target trait later in their lives, except for traits basically present early in life.  One can't doubt that misattributing the cause of such traits is an important problem, but we need to think of better solutions than Big Big Data, because failing to confirm a gene doesn't help, and finding that 'the' gene is only 'the' gene in some genomic or environmental backgrounds is the proverbial, historically frustrating needle-in-the-haystack search.  So the story's advocated huge samples of 'normals' (random individuals) cannot really address the causal issue definitively (except to show what we know: that there's a big problem to be solved).  Selected family data may--may--help identify a gene that really is causal, but even families have some of the same sorts of problems.  And the finding may apply only to that family.

The ExAC study is focused on severe diseases, which is somewhat like Mendel's selective approach, because it is quite obvious that complex diseases are complex.  It is plausible that severe, especially early-onset, diseases are genetically tractable, but it is not obvious that ever more data will answer the challenge.  And, ironically, the ExAC study has removed just such diseases from its consideration! So they're intentionally showing what is well known: that we're in needle-in-haystack territory, even when someone has reported big needles.

Finally, we have to add that these points have been made by various authors for many years, often based on principles that did not require mega-studies to show.  Put another way, we had reason to expect what we're seeing, and years of studies supported that expectation.  This doesn't even consider the deep problems about statistical inference that are being widely noted, nor the deeply entrenched nature of that approach's conceptual and even material vested interests (see this week's Aeon essay, e.g.).  It's time to change, but doing so would involve deeply revising how resources are used--of course one of our common themes here on the MT--and that is a matter almost entirely of political economy, not science.  That is, it's as much about feeding the science industry as it is about medicine and public health.  And that is why it's mainly about business as usual rather than real reform.

FAS - Fishy Association Studies

           
          On Saturday, July 19, 1879, the brilliant opera composer
          Richard Wagner "had a bad night; he thinks that...he ate
          too much trout."
                    --Quoted from Cosima Wagner's Diary, Vol. II, 1878-83.

As I was reading Cosima Wagner's doting diary of life with her famous husband, I chanced across the above quote that seemed an appropriate, if snarky, way to frame today's post. The incident she related exemplifies how we routinely assign causation even to one-off events in daily life. Science, on the other hand, purports to be about causation of a deeper sort, with some sufficient form of regularity or replicability.

Cause and effect can be elusive concepts, especially difficult to winnow out from observations in the complex living world.  We've hammered on about this on MT over the years.  The best science at least tries to collect adequate evidence in order to infer causation in credible rather than casual ways. There are, for example, likely to be lots of reasons, other than eating trout, that could explain why a cranky genius like Wagner had a bad night.  It is all too easy to over-interpret associations in causal terms.

[Images: one fish, two fish, many fish; Wikimedia Commons]

By such thinking, the figures above might be interpreted as having the following predictive power:
     One fish = bad night
     Two fish = total insomnia
     Many fish = hours of nightmarish dissonance called Tristan und Isolde!

Too often, we salivate over GWAS (genomewide association studies) results as if they justify ever-bigger and longer studies.  But equally too often, these are FAS, fishy association studies.  That is what we get when the science community doesn't pay heed to the serious and often fundamental difficulties in determining causation that may well undermine their findings and the advice so blithely proffered to the public.

We are not the only ones who have been writing that the current enumerative, 'Big Data', approach to biomedical and even behavior genetic causation leaves, to say the least, much to be desired.  Among other issues, there's too much asserting conclusions on inadequate evidence, and not enough recognition of when assertions are effectively not that much more robust than saying one 'ate too much trout'.  Weak statistical associations, so typically the result of these association studies, are not the same as demonstrations of causation.

The idea of mapping complex traits by huge genomewide case-control or population sample studies is a captivating one for biomedical researchers.  It's mechanical, perfectly designed to be done by huge computer database analysis by people who may never have seen the inside of a wet lab (e.g., programmers and 'informatics' or statistical specialists who have little serious critical understanding of the underlying biology).  It's often largely thought-free, because that makes the results safe to publish, safe for getting more grants, and so on; but more than being 'captivating' it is 'capturing'.... a hog-trough's share of research resources.

The promise, not even always carefully hedged with escape-words lest it be shown to be wrong, is that from your genome your future biomedical (and behavioral) traits can be known.  A recent article by Joyner et al. in the July 28 issue of the Journal of the American Medical Association (JAMA) describes the stubborn persistence of under-performing but costly research that becomes entrenched, a perpetuation that NIH's misnamed 'precision based genomic medicine' continues or even expands upon. Below is our riff on the article, but it's open-access, so you can read the points they make and judge for yourself whether we have the right 'take' on what they say.  It is one of many articles that have been making similar points... in case anyone is listening.

The problem is complex causation
The underlying basic problem is the complex nature of causation of 'complex' traits, like many if not most behavioral or chronic or late-onset diseases. The word 'complex', long used for such traits, refers not to identified causes but to the fact that the outcomes clearly did not have simple, identified causes.  It seemed clear that their causation was due mainly to countless combinations of many individually small causal factors, some of which were inherited; but the specifics were usually unknown. Computer and various DNA technologies made it possible, in principle, to identify and sort through huge numbers of possible causes, or at least statistically associated factors, including DNA sequence variants.  But underlying this approach has been the idea, always a myth really, that identifying some enumerated set of causes in a statistical sample would allow accurate prediction of outcomes.  This has proven not to be the case nearly as generally as promised.

To me, the push to do large-scale, huge-sample, survey-based genomewide risk analysis was at least partly justified, at least in principle, years ago, when there might have been some doubt about the nature of the causal biology underlying complex traits, including the increasingly common chronic disease problems that our aging population faces.  But the results are in, and in fact have been in for quite a long time.  Moreover, to the credit of the science's validity, the results support what we had good reason to expect for a long time.  They show that this approach is not, or at least is clearly no longer, the optimal way to do science in this area or to contribute to improving public health (and much of the same applies to evolutionary biology as well).

I think it fair to say that I was making these points, in print, in prominent places, starting nearly 30 years ago, in books and journal articles (and more recently here on MT)--that is, ever since the relevant actual data were beginning to appear.  But neither I nor my collaborators were the original discoverers of this insight: the basic truth has been known in principle, and in many empirical experimental (such as agricultural breeding) and observational contexts, for nearly a century! Struggling with the inheritance of causal elements ('genes', as they were generically known), the 1930s 'modern synthesis' of evolutionary biology reconciled (1) Darwin's idea of gradual evolution, mainly of quantitative traits, with the experimental evidence of the quantitative nature of their inheritance, and (2) the discrete nature of inheritance of discrete causal elements, first systematically demonstrated by Mendel for selected 2-state traits.  That was a powerful understanding, but in too many ways it has thoughtlessly been taken to imply that all traits, and not just genes, are usefully 'Mendelian'--due to substantial, enumerable, strongly causal genetic agents.  That has always been the exception, not the rule.

A view is possible that is not wholly cynical 
We have been outspoken about the sociocultural aspect of modern research, which can be understood by what one might call the FTM (Follow the Money) approach, in some ways a better way to understand where we are than looking at the science itself.  Who has what to gain by the current approaches?  Our understanding is aided by realizing that the science is presented to us by scientists and journalists, supplier industries and bureaucrats, who have vested interests that are served by promoting that way of doing business.

FTM isn't the only useful perspective, however.  A less cynical, and yet still appropriate way to look at this is in terms of diminishing returns.  The investment in the current way of doing science in this (and other areas) is part of our culture.  From a scientific point of view, the first forays into a new way or approach, or a theoretical idea, yield quick and, by definition, new results.  Eventually, it becomes more routine and the per-study yield diminishes. We asymptotically approach what we can glean from the approach.  Eventually some chance insight will yield some forms of better and more powerful approaches, whatever they'll be.

If current approaches were just yielding low-cost incremental gain, or were being done in well-off investigators' basement labs, it would be a normal course of scientific history, and nobody would have reason to complain.  But that isn't how it works these days.  These days understanding via FTM is important: the science establishment's hands are in all our pockets, and we should expect more in return than the satisfaction that the trough has been feeding many very nice careers (including mine), in universities, journalism, and so on.  How, when, and where a properly increased expectation of science for societal benefits will be fulfilled is not predictable, because facts are elusive and Nature often opaque.  However, simply more-of-the-same, at its current costs, with continuing entrenched justification, isn't the best way for public resources to be used.

There will always be a place for 'big data' resources.  A unified system of online biomedical records would save a lot of excess repeat-testing and other clinical costs, if every doctor you consult could access those records.  The records could potentially be used for research purposes, to the (limited) extent that they could be informative.  For a variety of conditions that would be very useful and cost-effective indeed; but most of those would be relatively rare.

Continuing to pour research funds into the idea that ever more 'data' will lead to dramatic improvements of 'precision' medicine is far more about the health of entrenched university labs and investigators than that of the general citizenry. Focused laboratory work that is more rigorously supported by theory or definitive experiment, with some accountability (but no expectations nor promises of miracles) is in order, given what the GWAS etc. era, plus a century of evolutionary genetics, has shown. There are countless areas, especially many serious early onset diseases, for which we have a focused, persuasive, meaningful understanding of causation and where resources should now be invested more heavily.

Intentionally open-ended beetle-collecting ventures, joined at the hip to promises of 'precision' by promisers who don't even know what that word means (but hint that it means 'perfection'), or glorifying the occasional seriously good finding as if it were typical, or as though more focused, less open-ended research wouldn't be a better investment--none of this is a legitimate approach.  Yet that is largely what is going on today.  The scientists, at least the smart ones, know this very well and say so (in confidence, of course).

Understanding complex causation is complex, and we have to face up to that.  We can't demand inexpensive or instant or even predictable answers.  These are inconvenient facts few want to face up to.  But we and others have said this ad nauseam before, so here we wanted to point out the current JAMA paper as yet another formal and prominently published realization of the costly inertia in which we are embedded, and by highly capable authors. In any aspect of society, not just science, prying resources loose from the hands of a small elite is never easy, even when there are other ways to use those resources that might have better payoff for all of us.

Usually, such resource reallocation seems to require some major new and imminent external threat, or some unpredicted discovery, which I think is far more likely to come from some smaller operation where thinking is more important than cranking out yet another mass-scale statistical survey of Big Data sausage.  Still, every push against wasteful inertia, like the Joyner et al. JAMA paper, helps. Indeed, those many whose careers are entrapped by that part of the System have the skills and neuronal power to do something better, if circumstances enabled it to happen more readily.  To encourage that, perhaps we should stop paying so much attention to Fishy stories.

The statistics of Promissory Science. Part I: Making non-sense with statistical methods

Statistics is a form of mathematics, a way devised by humans for representing abstract relationships. Mathematics comprises axiomatic systems, which make assumptions about basic units such as numbers; basic relationships like adding and subtracting; and rules of inference (deductive logic); and then elaborates these to draw conclusions that are typically too intricate to reason out in other less formal ways.  Mathematics is an awesomely powerful way of doing this abstract mental reasoning, but when applied to the real world it is only as true or precise as the correspondence between its assumptions and real-world entities or relationships. When that correspondence is high, mathematics is very precise indeed, a strong testament to the true orderliness of Nature.  But when the correspondence is not good, mathematical applications verge on fiction, and this occurs in many important applied areas of probability and statistics.

You can't drive without a license, but anyone with R or SAS can be a push-button scientist.  Anybody with a keyboard and some survey generating software can monkey around with asking people a bunch of questions and then 'analyze' the results. You can construct a complex, long, intricate, jargon-dense, expansive survey. You then choose who to subject to the survey--your 'sample'.  You can grace the results with the term 'data', implying true representation of the world, and be off and running.  Sample and survey designers may be intelligent, skilled, well-trained in survey design, and of wholly noble intent.  There's only one little problem: if the empirical fit is poor, much of what you do will be non-sense (and some of it nonsense).

Population sciences, including biomedical, evolutionary, social, and political fields, are experiencing an increasingly widely recognized crisis of credibility.  The fault is not in the statistical methods on which these fields heavily depend, but in the degree of fit (or not) to their assumptions--with the emphasis these days on the 'or not', and an all-too-frequent dismissal of the underlying issues in favor of a patina of technical, formalized results.  Every capable statistician knows this, but of course might be out of business if it were paid too much open attention. And many statisticians may be rather uninterested in, or too foggy about, the philosophy of science to see what lies beyond the methodological technicalities.  Jobs and journals depend on not being too self-critical.  And therein lie rather serious problems.

Promissory science
There is the problem of the problems--the problems we want to solve, such as in understanding the cause of disease so that we can do something about it.  When causal factors fit the assumptions, statistical or survey study methods work very well.  But when causation is far from fitting the assumptions, the impulse of the professional community seems mainly to increase the size, scale, cost, and duration of studies, rather than to slow down and rethink the question itself.  There may be plenty of careful attention paid to refining statistical design, but basically this stays safely within the boundaries of current methods and beliefs, and the need for research continuity.  It may be very understandable, because one can't just quickly uproot everything or order up deep new insights.  But it may be viewed as abuse of public trust as well as of the science itself.

The BBC Radio 4 program called More Or Less keeps a watchful eye on sociopolitical and scientific statistical claims, revealing what is really known (or not) about them.  Here is a recent installment on the efficacy (or believability, or neither) of dietary surveys.  And here is a FiveThirtyEight link to what was the basis of the podcast.

The promotion of statistical survey studies to assert fundamental discovery has been referred to as 'promissory science'.  We are barraged daily with promises that if we just invest in this or that Big Data study, we will put an end to all human ills.  It's a strategy, a tactic, and at least the top investigators are very well aware of it.  Big long-term studies are a way to secure reliable funding and to defer delivering on promises into the vague future.  The funding agencies, wanting to seem prudent and responsible to taxpayers with their resources, demand some 'societal impact' section on grant applications.  But there is in fact little if any accountability in this regard, so one can say they are essentially bureaucratic window-dressing exercises.

Promissory science is an old game, practiced since time immemorial by preachers.  It boils down to promising future bliss if you'll just pay up now.  We needn't be (totally) cynical about this.  When we set up a system that depends on public decisions about resources, we will get what we've got.  But having said that, let's take a look at what is a growing recognition of the problem, and some suggestions as to how to fix it--and whether even these are really the Emperor of promissory science dressed in less gaudy clothing.

A growing, if partial, awareness
The problem of results that are announced by the media, journals, universities, and so on, but that don't deliver the advertised promises, is complex but widespread; in part because research has become so costly, warning sirens are sounding when it becomes clear that the promised goods are not being delivered.

One widely known issue is the lack of reporting of negative results, or their burial in minor journals. Drug-testing research is notorious for this under-reporting.  It's too bad because a negative result on a well-designed test is legitimately valuable and informative.  A concern, besides corporate secretiveness, is that if the cost is high, taxpayers or share-holders may tire of funding yet more negative studies.  Among other efforts, including by NIH, there is a formal attempt called AllTrials to rectify the under-reporting of drug trials, and this does seem at least to be thriving and growing if incomplete and not enforceable.  But this non-reporting problem has been written about so much that we won't deal with it here.

Instead, there is a different sort of problem.  The American Statistical Association has recently noted an important issue, which is the use and (often) misuse of p-values to support claims of identified causation (we've written several posts in the past about these issues; search on 'p-value' if you're interested, and the post by Jim Wood is especially pertinent).  FiveThirtyEight has a good discussion of the p-value statement.

The usual interpretation is that p represents the probability that, if the test variable in fact has no causal effect, an apparent association as strong as the one observed would arise just by chance.  So if the observed p in a study is less than some arbitrary cutoff, such as 0.05, it means essentially that if no causation were involved, the chance you'd see this association anyway is no greater than 5%; that is, there is some evidence for a causal connection.
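
As a sketch of what that means in practice (a generic simulation of our own, with arbitrary sample sizes; nothing here is from any particular study), here is the p-value machinery run on pure noise:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate 10,000 case-control 'studies' in which the null is true:
# the test variable has no effect, so any association is pure chance.
n_studies, n = 10_000, 50
cases = rng.normal(0.0, 1.0, (n_studies, n))
controls = rng.normal(0.0, 1.0, (n_studies, n))

diff = cases.mean(axis=1) - controls.mean(axis=1)
se = np.sqrt(cases.var(axis=1, ddof=1) / n + controls.var(axis=1, ddof=1) / n)
t = diff / se

# About 5% of the purely null studies clear the p < 0.05 bar anyway.
print(f"fraction 'significant': {(np.abs(t) > 1.96).mean():.3f}")
```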

Trashing p-values is becoming a new cottage industry!  Now JAMA is on the bandwagon, with an article showing that in a survey of biomedical literature from the past 25 years, covering well over a million papers, a far disproportionate and increasing number of studies reported statistically significant results.  Here is the study on the JAMA web page, though it is not public domain yet.

Besides the apparent reporting bias, the JAMA study found that those papers generally failed to flesh out that result adequately.  Where are all the negative studies that statistical principles would lead us to expect?  We don't see them, especially in the 'major' journals, as has been noted many times in recent years.  Just as importantly, authors often did not report confidence intervals or other measures of 'convincingness' that might illuminate the p-value. In a sense, that means authors didn't say what range of effects is consistent with the data.  They report a non-random effect, but often don't give its size--that is, say how large the effect was, even granting that it was unusual enough to support a causal explanation. So, for example, a statistically significant increase of risk from 1% to 1.01% is trivial, even if one could accept all the assumptions of the sampling and analysis.
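
To see how a trivial effect like that can nonetheless be 'significant', here is a quick sketch (the risks are taken from the example above; the sample sizes are arbitrary):

```python
import math

# Baseline risk 1.00% vs exposed risk 1.01%: the trivial difference
# discussed above.  How large a sample makes it 'significant'?
p0, p1 = 0.0100, 0.0101

def z_stat(n):
    """Two-proportion z statistic with n subjects per group."""
    se = math.sqrt(p0 * (1 - p0) / n + p1 * (1 - p1) / n)
    return (p1 - p0) / se

for n in (10_000, 1_000_000, 10_000_000, 100_000_000):
    print(f"n per group = {n:>11,}: z = {z_stat(n):.2f}")
# Around ten million subjects per group, z passes 1.96 (p < 0.05),
# yet the effect is still a 0.01-percentage-point change in risk.
```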

Another vocal critic of what's afoot is John Ioannidis; in a recent article he levels both barrels against the misuse and mis- or over-representation of statistical results in the biomedical sciences, including meta-analysis (the pooling of many diverse small studies into a single large analysis to gain sufficient statistical power to detect effects and test for their consistency).  This paper is a rant, but a well-deserved one, about how 'evidence-based' medicine has been 'hijacked', as he puts it.  The same must be said of 'precision genomic' or 'personalized' medicine, 'Big Data', and other sorts of imitative sloganeering coming from many quarters that obviously see this sort of promissory science as what you have to do to get major funding.  We have set ourselves a professional trap, and it's hard to escape.  For example, the same author has been leading the charge against misrepresentative statistics for many years, and he and others have shown that the 'major' journals have, in a sense, the least reliable results in terms of their replicability.  But he's been raising these points in the same journals he shows to be culpable, rather than boycotting them.  We're in a trap!

These critiques of current statistical practice are the points getting most of the ink and e-ink.  There may be a lot of covering-up of known issues, and even hypocrisy, in all of this, and perhaps also more open or understandable tacit avoidance.  The industry (e.g., drug, statistics, and research equipment) has a vested interest in keeping the motor running.  Authors need to keep their careers on track.  And, in the fairest and non-political sense, the problems are severe.

But while these issues are real and must be openly addressed, I think the problems are much deeper. In a nutshell, I think they relate to the nature of mathematics relative to the real world, and the nature and importance of theory in science.  We'll discuss this tomorrow.

"The Blizzard of 2016" and predictability: Part III: When is a health prediction 'precise' enough?

We've discussed the use of data and models to predict the weather in the last few days (here and here).  We've lauded the successes, which are many, and noted the problems, including people not heeding advice. Sometimes that's due, as a commenter on our first post in this series noted, to previous predictions that did not pan out, leading people to ignore predictions in the future.  It is the tendency of some weather forecasters, like all media these days, to exaggerate or dramatize things, a normal part of our society's way of getting attention (and resources).

We also noted the genuine challenges to prediction that meteorologists face.  Theirs is a science based on very sound physical principles and theory, which, as a meteorologist friend put it, constrain what can and might happen, and make good forecasting possible.  In that sense the challenge to accuracy lies in the complexity of global weather dynamics and in inevitably imperfect data, which may defy perfect analysis even by fast computers.  There are essentially random or unmeasured movements of molecules and so on, leading to 'chaotic' properties of weather, which is indeed the iconic example of chaos, known as the 'butterfly effect': if a butterfly flaps its wings, the initially tiny and unseen perturbation can proliferate through the atmosphere, leading to unpredicted, indeed wildly unpredictable, changes in what happens.
  
The Butterfly Effect, far-reaching effects of initial conditions; Wikipedia
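
A toy illustration of that sensitivity, using the standard logistic-map example of chaos rather than any real weather model (the starting values are arbitrary): two initial conditions differing by one part in a billion become completely unrelated within about forty iterations.

```python
# Sensitive dependence on initial conditions, shown with the chaotic
# logistic map x -> r*x*(1-x).  A toy system, not a weather model.
r = 4.0
x, y = 0.400000000, 0.400000001   # differ by only 1e-9

for step in range(1, 51):
    x = r * x * (1 - x)
    y = r * y * (1 - y)
    if step % 10 == 0:
        print(f"step {step:2d}: |x - y| = {abs(x - y):.2e}")
# The gap roughly doubles each step, so by step ~40 the trajectories
# are as different as two random numbers: the 'butterfly effect'.
```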

Reducing such effects is largely a matter of needing more data.  Radar and satellite data are more or less continuous, but many other key observations are only made many miles apart, both on the surface and into the air, so that meteorologists must try to connect them with smooth gradients, or estimates of change, between the observations.  Hence the limited number of future days (a few days to a week or so) for which forecasts are generally accurate.

Meteorologists' experience, given their resources, provides instructive parallels with, as well as differences from, the biomedical sciences, which aim for precise prediction, often of things decades in the future, such as disease risk based on genotype at birth or on lifestyle exposures.  We should pay attention to those parallels and differences.

When is the population average the best forecast?
Open physical systems, like the atmosphere, change but don't age.  Physical continuity means that today is a reflection of yesterday, but the atmosphere doesn't accumulate 'damage' the way people do, at least not in a way that makes a difference to weather prediction.  It can move, change, and refresh, with a continuing influx and loss of energy, evaporation and condensation, and circulating movement, and so on. By contrast, we are each on a one-way track, and a population continually has to start over with its continual influx of new births and loss to death. In that sense, a given set of atmospheric conditions today has essentially the same future risk profile as such conditions had a year or century or millennium ago. In a way, that is what it means to have a general atmospheric theory. People aren't like that.

By far, most individual genetic and even environmental risk factors identified by recent Big Data studies only alter lifetime risk by a small fraction.  That is why the advice changes so frequently and inconsistently.  Shouldn't it be that eggs and coffee either are good or harmful for you?  Shouldn't a given genetic variant definitely either put you at high risk, or not? 

The answer is typically no, and the fault is in the reporting of data, not the data themselves. This is for several very good reasons.  There is measurement error.  From everything we know, the kinds of outcomes we are struggling to understand are affected by a very large number of separate causally relevant factors.  Each individual is exposed to a different set or level of those factors, which may be continually changing.  The impact of risk factors also changes cumulatively with exposure time--because we age.  And we are trying to make lifetime predictions, that is, ones of open-ended duration, often decades into the future.  We don't ask "Will I get cancer by Saturday?", but "Will I ever get cancer?"  That's a very different sort of question.

Each person is unique, like each storm, but we rarely have the kind of replicable sampling of the entire 'space' of potentially risk-affecting genetic variants--and we never will, because many genetic and even environmental factors are very rare, their combinations essentially unique, and they interact and come and go.  More importantly, we simply do not have the kind of rigorous theoretical basis that meteorology does. That means we may not even know what sort of data we need to collect to get a deeper understanding or more accurate predictive methods.

That a multiplicity of risk factors contributes in unique combinations to a given outcome means that the effect of each factor is generally very small, and that even within individuals the mix is continually changing.  Lifetime risks for a trait are also necessarily averaged across all other traits--for example, all other competing causes of death or disease.  A fatal early heart attack is the best preventive against cancer!  There are exceptions of course, but generally, forecasts are weak to begin with, and over longer predictive time periods they simply approximate the population--public health--average.  In a way that is an analogy with weather forecasts, which, beyond a few days into the future, move toward the climate average.

Disease forecasts change peoples' behavior (we stop eating eggs or forego our morning coffee, say), each person doing so, or not, to his/her own extent.  That is, feedback from the forecast affects the very risk process itself, changing the risks themselves and in unknown ways.  By contrast, weather forecasts can change behavior as well (we bring our umbrella with us) but the change doesn't affect the weather itself.


Parisians in the rain with umbrellas, by Louis-Léopold Boilly (1803)

Of course, there are many genes in which variants have very strong effects.  For those, forecasts are not perfect but the details aren't worth worrying about: if there are treatments, you take them.  Many of these are due to single genes and the trait may be present at birth. The mechanism can be studied because the problem is focused.  As a rule we don't need Big Data to discover and deal with them.  

The epidemiological and biomedical problem is with attempts to forecast complex traits, in which almost every instance is causally unique.  Well, every weather situation is unique in its details, too--but those details can all be related to a single unifying theory that is very precise in principle.  Again, that's what we don't yet have in biology, and there is no really sound scientific justification for collecting reams of new data, which may refine predictions somewhat but may not go much farther.  We need to develop a better theory, or perhaps even to ask whether there is such a formal basis to be had--or is the complexity we see just what there is?

Meteorology has ways to check its 'precision' within days, whereas the biomedical sciences have to wait decades for their rewards and punishments.  In the absence of tight rules and ways to adjust errors, constraints on biomedical business-as-usual are weak.  We think a key reason for this is that we must rely not on externally applied theory but on internal comparisons, like cases vs controls.  We can test for statistical differences in risk, but there is no reason these will be the same in other samples, or in the future.  Even when a gene or dietary factor is identified by such studies, its effects are usually not very strong, even if the mechanism by which it affects risk can be discovered.  We see this repeatedly, even for risk factors that seemed obvious.

We are constrained not just to use internal comparisons but to extrapolate the past to the future.  Our comparisons, say between cases and controls, are retrospective and almost wholly empirical rather than resting on adequate theory.  The 'precision' predictions we are being promised are basically just applications of those retrospective findings to the future.  It's typically little more than extrapolation, and because risk factors are complex and each person is unique, the extrapolation largely assumes additivity: we just add up the risk estimates for the various factors measured in existing samples, and use that sum as our estimate of future risk.
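
As a concrete (and deliberately simplistic) sketch of what such additive extrapolation looks like in practice--all variant names, effect sizes, and the baseline risk below are invented for illustration--this is essentially how a polygenic-style risk score is built:

```python
# Minimal sketch of an additive risk score of the kind described
# above.  All numbers are invented; real scores use thousands of
# variants, but the additivity assumption is the same.
import math

# Per-allele log-odds effects estimated in some past, retrospective sample:
effect_sizes = {"variant_A": 0.05, "variant_B": 0.12, "variant_C": -0.03}

# One person's allele counts (0, 1, or 2 copies of each risk allele):
genotype = {"variant_A": 2, "variant_B": 0, "variant_C": 1}

baseline_log_odds = math.log(0.05 / 0.95)   # assumed 5% baseline risk

# Additivity assumption: total risk = baseline + sum of per-variant effects.
score = baseline_log_odds + sum(effect_sizes[v] * genotype[v] for v in genotype)
risk = 1 / (1 + math.exp(-score))
print(f"predicted 'lifetime' risk: {risk:.1%}")
# Everything the post criticizes is hidden in that sum: the weights
# came from the past, from other people, and simply add up.
```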

Thus, while for meteorology Big Data makes sense because there is strong underlying theory, in many aspects of the biomedical and evolutionary sciences this is simply not the case, at least not yet.  Unlike meteorology, the biomedical and genetic sciences are really the harder ones!  We are arguably just as likely to progress in our understanding by accumulating results from carefully focused questions, where we're tracing some real causal signal (e.g., traits with specific, known strong risk factors), as by just feeding the incessant demands of the Big Data worldview.  But this of course is a point we've written (ranted?) about many times.

You bet your life, or at least your lifestyle!
If you venture out on the highway despite a forecast snowstorm, you are placing your life in your hands.  You are also imposing dangers on others (because accidents often involve multiple vehicles). In the case of disease, if you are led by scientists or the media to take their 'precision' predictions too seriously, you are doing something similar, though most likely mainly affecting yourself.  

Actually, that's not entirely true.  If you smoke or hog up on MegaBurgers, you certainly put yourself at risk, but you risk others, too. That's because those instances of disease that truly are strongly, even mappably, genetic (which seems true of subsets of even most 'complex' diseases) are masked by the majority of cases that are due to easily avoidable lifestyle factors; the causal 'noise' from risky lifestyles makes genetic causation harder to tease out.

Of course, taking minor risks too seriously also has known, potentially serious consequences, such as intervening on something that was weakly problematic to begin with.  Operating on a slow-growing prostate or colon cancer in older people may do more damage than the cancer would.  There are countless other examples.


Life as a Garden Party
The need is to understand weak predictability, and to learn to live with it. That's not easy.

I'm reminded of a time when I was a weather officer stationed at an Air Force fighter base in the eastern UK.  One summer, on a Tuesday morning, the base commander called me over to HQ.  It wasn't for the usual morning weather briefing.....

"Captain, I have a question for you," said the Colonel.

"Yes, sir?"

"My wife wants to hold a garden party on Saturday.  What will the weather be?"

"It might rain, sir," I replied.

The Colonel was not very pleased with my non-specific answer, but this was England, after all!

And if I do say so myself, I think that was the proper, and accurate, forecast.**


Plus ça change... Rain drenches royal garden party, 2013; The Guardian


**(It did rain.  The wife was not happy! But I'd told the truth.)

"The Blizzard of 2016" and predictability: Part II: When is a prediction a good one? When is it good enough?

Weather forecasts require the prediction of many different parameter values.  These include temperature; wind at the ground and aloft (winds that steer storm systems, and where planes fly); humidity on the ground and in the air (which determines rain and snowfall); friction (related to tornadoes and thunderstorms); and the change of these over time and their track across the surface, with its own weather-affecting characteristics (like water, mountains, cities).  Forecasters have to model and predict all of these things.  In my day, we had to do it mainly with hand-drawn maps and ground observations (no satellites, basically no useful radar, only scattered ship reports over oceans, etc.), but of course now it's all computerized.

Other sciences are in the prediction business in various ways.  Genetic and other aspects of epidemiology are among them.  The widely made, now trendy promise of 'precision' medicine, or the predictions of what's good or bad for you, are clear daily examples.  But as with the weather, we need some criteria, or even some subjective sense of how good a prediction is.  Is it reliable enough to convince you to change how you live?

Yesterday, I discussed aspects of weather prediction and what people do in response, if anything.  Last weekend's big storm was predicted many days in advance, and it largely did what was being predicted.  But let's take a closer look and ask: How good is good enough for a prediction?  Did this one meet the standard?

Here are predicted patterns of snowfall depth, from the January 24th New York Times, the day after the storm, with data provided by the National Weather Service:

[Map: forecast snowfall depths]
And now here are the measured results, as reported by various observers:

[Map: observed snowfall totals]
Are these well-forecast depths, or not?  How would you decide?  Clearly, the maximum snowfall reported (42") in the Washington area was a lot more than the '20+"' forecast, but is that nit-picking?  "20+" does leave a lot of leeway for additional snowfall, after all.  And the prediction contour plot is very similar to the actual result. We are in State College, rather a weather capital because the Penn State Meteorology Department has long been top-rated and because AccuWeather is located here as a result.  Our snowfall was somewhere between 7 and 10 inches.  The top prediction map shows us in the very light area, with somewhere between 1-5" and 7-10" expected, and the forecasts called for a sharp boundary between virtually no snowfall and a large dump.  A town only a few miles north of us got only a few inches.

So was the forecast a good one, or a dud?

How good is a good forecast?
The answer to this fair question depends on the consequences.  No forecast can be perfect--not even in physics, where deterministic mathematical theory seems to apply.  At the very least, there will always be measurement errors, meaning you can never tell exactly how good a prediction was.
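
One standard way to put a number on forecast quality (a general scoring rule, not anything these particular services necessarily publish) is the Brier score: the mean squared difference between predicted probabilities and what actually happened.  A minimal sketch, with made-up forecasts:

```python
# Brier score: mean squared error of probabilistic forecasts.
# 0 is perfect; 0.25 is what always saying "50%" earns.
# The forecast/outcome numbers below are invented for illustration.
forecast_prob = [0.9, 0.8, 0.2, 0.6, 0.1]   # predicted P(snow) on 5 days
snowed =        [1,   1,   0,   0,   0  ]   # what actually happened

brier = sum((p - o) ** 2 for p, o in zip(forecast_prob, snowed)) / len(snowed)
print(f"Brier score: {brier:.3f}")
# Comparing such scores across forecasters, or against climatology,
# is one way to judge 'how good is good enough'.
```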

As a lead-up to the storm's arrival in the east, I began checking a variety of commercial weather companies (AccuWeather, WeatherUnderground, the Weather Channel, WeatherBug) as well as the US National and the European Weather Services, interested in how similar they were.

This is an interesting question, because they all rely on a couple of major computer models of the weather, including an 'ensemble' of their forecasts. The companies all use basically the same global data sources, the same physical theory of fluid dynamics, and the same resulting numerical models.  They try to be original (that's the nature of the commercial outfits, of course, since they need to make sales, and even the government services want to stay in the public eye).

In the vast majority of cases, as in this one, the shared data from weather balloons, radar, ground reports, and satellite imagery, together with the same physical theory, mean that there really are only minor differences in how the theory is applied in the computed models.  Data resources allow retrospective analysis to correct the various models, see how each has been doing, and adjust them.  For the curious, most of this is, rightly, freely available on the internet (thanks to its ultimately public nature).  Even the commercial services, as well as many universities, make data conveniently available.

In this case, the forecasts did vary. All more or less had us (State College) on a sharp edge of the advancing snow front.  Some forecasts had us getting almost no snow, others 1-3", others in the 5-8" range.  These varied within any given organization over time, as of course they should when better model runs become available.  But that's usually when D-day is closer and there is less extrapolation in the models--in that sense less accuracy or usefulness from a precision point of view.  At the same time, all made it clear that a big storm was coming and that our location was near the edge of real snowfall. They all also agreed about the big dump in the Washington area, but varied in what they foresaw for New York and, especially, Boston.  Where most snow and disruption occurred, they gave plenty of notice, so in that sense the rest can be said to be details.  But if you expected 3" of snow and got a foot, you might not feel that way.

If you're in the forecasting business--be it for the weather or for health risks based on, say, your genome or lifestyle exposures--you need to know how accurate forecasts are, since they can lead to costly or even life-or-death decisions.  Crying wolf--and weather companies seem ever tempted to be melodramatic to retain viewers--is not good, of course, but missing a major event could be worse, if people were not warned and didn't take precautions.  So it is important to have comparative predictions from various sources based on similar or even the same data, and for those sources to keep an eye on each other's reasoning, and to adjust.

As far as accuracy at longer lead times is concerned, precision is a different sort of thing.  Here is the forecast from our excellent local company, AccuWeather, for the next several days:

This and the figure below are from AccuWeather.com

And here is their forecast for the days after that.



How useful are these predictions, and how would you decide?  What minor or major decisions would you make, based on your answers?  Here nothing nasty is in the forecast, so if they blow the temperature or the cloud cover on the out-days of this span, you might grumble but you won't really care.

However, I'm writing this on Sunday, January 24.  The consensus of several online forecasts was roughly like the above figures: basically smooth sailing for the week, with a southerly and hence warm but not very stormy air flow, and no significant weather.  But late yesterday, I saw one forecast raising the possibility of another Big One like the storm we just had.  The forecaster outlined the similarities between today's conditions and those of ten days ago, and in a way played up the possibility of a repeat.  So I looked at the upper-air steering winds and found that they seem to be split into one branch that will steer cold arctic air down toward the southern and eastern US, and another that will sweep across the south, picking up moisture over the Gulf of Mexico, and join up with the first branch in the eastern US, which is basically what happened last week!

Now, literally as I write, one online forecast outfit has changed its forecast for the coming weekend (just 5 days from now) to rain and possibly ice pellets.  Another site now asks "Could the eastern US face more snow later this week?"  Another makes no such projection.  Go figure!

Now it's Monday.  One commercial site is forecasting basically nothing coming.  Another forecasts the probability of rain starting this weekend.  NOAA is forecasting basically nothing through Friday.

But here are screenshots from an AccuWeather video on Monday morning, discussing the coming week.  First, there is doubt as to whether the Low pressure system (associated with precipitation) will move up the east coast or farther out to sea.  The actual path taken, steered by upper-level winds, will make a big difference in the weather experienced in the east.

Source: AccuWeather.com

The difference in outcomes would essentially be because the relevant wind will be across the top of the Low, moving from east to west, that is, coming off the ocean onto land (air circulates as a counter-clockwise eddy around the center of the Low).  Rain or possibly snow will fall on land as a result.  How much falls, and how cold it will be, depend on which path is taken.  This next shot shows a possible late-week scenario.

Source:  AccuWeather.com
The grey is the upper-level steering winds, but their actual path is not certain, as the prior figure showed, meaning that exactly where the Low will go is uncertain at present.  There just isn't enough data, and so there's too much uncertainty in the analysis, to be more precise at this stage.  The dry, colder air shown coming from the west would flow underneath the moist air flowing in from offshore, pushing it up and causing precipitation.  If the flow takes the more eastward of the alternatives in the previous figure, the 'action' will mainly be out at sea.

Well, it's now Monday afternoon, and two sites I check are predicting little if anything for the weekend....but another site is predicting several days in a row of rain.  And....(my last 'update') a few hours later, that same site is predicting only a 'chance of rain' for those days.

To me, with my very rusty, and by now semi-amateur checking of various things, it looks as if there won't be anything dropping on us.  We'll see!

The point here is how much, and how fast, things change on little prior indication--and we are only talking about predicting a few days, not weeks, ahead.  The above AccuWeather video shows the uncertainty explicitly, so we're not being misled, just advised.

This level of uncertainty is relevant to biology, because meteorology is based on sophisticated, sound physical theory (hydrodynamics, etc.), and it lends itself to high-quality, very extensive, even exotic instrumentation and to mathematical computer simulation.  Most of the time, for most purposes, it is already an excellent system.  And yet, while major events like the Big Blizzard this January are predictable in general, if you want specific geographic details, things fall short.  It's a subjective judgment as to when one would say "short of perfection" rather than "short but basically right."

With more instrumentation (satellites, radar, air-column monitoring techniques) and faster computers, it will inevitably get better.  Here's a reasonable case for Big Data.  However, because of measurement errors and minor fluctuations that can't be detected, inaccuracies accumulate (an early example of what is meant by 'chaotic' systems: the farther ahead you want to predict, the greater your errors).  Today in meteorology, except in areas like deserts where things hardly change, I've been told by professional colleagues who are up to date that a week ahead is about the limit.  After that, at least in conditions and locations where weather change is common, a forecast of specific conditions is no better than the climate average for that location and time of year.
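The standard toy illustration of this--not a real weather model, but the famous Lorenz equations, which grew out of a simplified model of atmospheric convection--makes the point concrete: two starting states that differ by less than any instrument could measure end up on entirely different trajectories.  A minimal sketch in Python:

def lorenz_step(x, y, z, dt=0.01, s=10.0, r=28.0, b=8.0 / 3.0):
    # One crude (Euler) time step of the Lorenz system.
    return (x + dt * s * (y - x),
            y + dt * (x * (r - z) - y),
            z + dt * (x * y - b * z))

true_state = (1.0, 1.0, 1.0)
measured   = (1.000001, 1.0, 1.0)   # a one-in-a-million 'measurement error'

for step in range(1, 3001):
    true_state = lorenz_step(*true_state)
    measured = lorenz_step(*measured)
    if step % 500 == 0:
        sep = sum((p - q) ** 2 for p, q in zip(true_state, measured)) ** 0.5
        print(f"t = {step * 0.01:4.1f}: separation = {sep:.5f}")

The separation grows roughly exponentially until the 'forecast' from the mis-measured state tells you nothing about the true one, which is why adding ever more data pushes the prediction horizon out only slowly.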

The more dynamic a situation--changing seasons, rapidly altering air and moisture movement patterns, mountains or other local effects on air flow--the less predictable it is over more than a few days.  You have to take such longer-range predictions with a huge grain of salt, understanding that they're the best that theory, intuition, and experience can do at present (and taking into account that it is better to be safe--warned--than sorry, and that companies need to promote their services with what we might charitably call energetic presentations).  The reality is that under all but rather stable conditions, such long-term predictions are misleading and probably shouldn't even be made: weather services should 'just say no' to offering them.

An important aspect of prediction these days, where 'precision' has recently become a widely canted promise, is in health.  Epidemiologists promise prediction based on lifestyle data.  Geneticists promise prediction based on genotypes.  How reliable or accurate are they now, or likely to become in the predictable future?  At what point does population average do as well as sophisticated models? We'll discuss that in tomorrow's installment.
