When scientific theory constrains

It's good from time to time to reflect on how we know what we think we know. And to remember that, as at any other time in history, much of what we now think is true will sooner or later be found to be false or, often, only partially or imprecisely correct. Some of this is because values change -- not so long ago, for example, homosexuality was considered an illness. Some is because of new discoveries -- when archaea were first discovered they were thought to be exotic microbes inhabiting only extreme environments, but they're now known to live in all environments, even in and on us. And these are just two of countless examples.

But what we think we know can also be influenced by our assumptions about what we think is true. It's all too easy to look at data and interpret them in a way that makes sense to us, even when multiple interpretations are possible. This can be a particular problem in social science, when we've got a favorite theory and the data can be seen to confirm it; it's perhaps easiest to notice if you yourself aren't wedded to any of the theories. But it's also true in biology. This is understandable: we want to be able to assert that we now know something, and we're rewarded for insights and discoveries, not for humbly hesitating to make claims.

Charitable giving
The other day I was listening to the BBC Radio 4 program Analysis on the charitable impulse. Why do people give to charity? It turns out that a lot of psychological research has been done on this, to the point that charities are now able to manipulate us into giving. If you call your favorite NPR station to donate during a fund drive, for example, and you're told that the caller just before you gave a lot of money, you're likely to make a larger donation than if you're told the previous caller pledged a small amount.

A 1931 advertisement for the British charity, Barnardo's Homes; Wikipedia

Or, if an advertisement pictures one child, and tells us the story of that one child, we're more likely to donate than if we're told about 30,000 needy children. This works even if we're told the story of two children, one after the other. But, according to one of the researchers, if we're shown two children at once and told that, if we give, the money will randomly go to just one of them, we're less likely to give. The researcher interpreted this to mean that two is too many.

But there seem to me to be other possible interpretations, given that the experiment changes more than one variable. Perhaps we don't like the idea that someone else will choose who gets our money. Or we feel uncomfortable knowing that we've helped only one child when two are needy. Surely something other than 'two is too many' is going on, given that in 2004 so many people around the world donated so much money to organizations helping tsunami victims that many of those organizations had to start turning donations away. These were anonymous victims, in great numbers. Though, as the program noted, people weren't nearly as generous to the great number of victims of the 2015 earthquake in Nepal, with no obvious explanation.

The researcher did seem to be wedded to his 'one vs. too many' interpretation, despite the contradictory data. In fact, I would suggest that the methods, at least as presented, don't allow him to legitimately draw any conclusion at all. Yet he readily did.

Thinness microbes?
The Food Programme on BBC Radio 4 is on to the microbiome in a big way. Two recent episodes (here and here) explore the connection between gut microbes, food, and health, and the program promises to update us as new understanding develops. As we all know by now, the microbiome -- the bug intimates that accompany us through life, in and on our bodies -- may affect our health, our weight, our behavior, and perhaps much more. Or not.


Pseudomonas aeruginosa, Enterococcus faecalis and Staphylococcus aureus on Tryptic Soy Agar.  Wikipedia

Obesity, asthma, atopy, periodontal disease, rheumatoid arthritis, Parkinson's, Alzheimer's, autism, and many, many more conditions have been linked, or are suggested to be linked, in one way or another with our microbiome. Perhaps we're hosting the wrong microbes, or not a diverse enough set of microbes, or we wipe out the good ones along with the bad with antibiotics, or with alcohol; and what we eat may have a lot to do with this.

One of the researchers interviewed for the program was experimenting with a set of identical twins in Scotland. He varied their diets, having them consume, for example, lots of junk food and alcohol, or a very high-fiber diet, and documented changes in their gut microbiomes, which apparently can shift quite quickly with changes in diet. The most diverse microbiome was associated with the high-fiber diet, and researchers seem to feel that diversity is good.

Along with a lot of enthusiasm and hype, though, mostly what we've got in microbiome research so far is correlations.  Thin people tend to have a different set of microbes than obese people, and people with a given neurological disease might statistically share a specific subset of microbes.  But this tells us nothing about cause and effect -- which came first, the microbiome or the condition?  And because the microbiome can change quickly and often, how long and how consistently would an organism have to reside in our gut before it causes a disease?
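To see how easily a shared cause can produce such correlations, here's a toy simulation in Python (the variables and effect sizes are invented for illustration, not taken from any study): diet drives both microbial diversity and weight, so the two correlate strongly even though neither causes the other.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000

# Hypothetical confounding model, with invented effect sizes: diet drives
# BOTH microbiome diversity and body weight; diversity has no direct
# effect on weight at all.
diet_fiber = rng.normal(0, 1, n)                      # the shared cause
diversity = 0.8 * diet_fiber + rng.normal(0, 0.6, n)  # microbiome diversity
weight = -0.8 * diet_fiber + rng.normal(0, 0.6, n)    # body weight

# Diversity and weight are strongly (negatively) correlated anyway...
print(np.corrcoef(diversity, weight)[0, 1])           # about -0.64

# ...but within strata of the true cause the correlation weakens sharply,
# which is what confounding looks like.
high_fiber = diet_fiber > 0
print(np.corrcoef(diversity[high_fiber], weight[high_fiber])[0, 1])
```

A cross-sectional survey of this simulated population would happily report a 'thinness microbiome', when all the causal work is being done by diet.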

There was some discussion of probiotics in the second program, the assumption being that controlling our microbiome affects our health.  Perhaps we'll soon have probiotic yogurt or kefir or even a pill that keeps us thin, or prevents Alzheimer's disease.  Indeed, this was the logical conclusion from all the preceding discussion.

But one of the researchers, inadvertently I think, suggested that perhaps this reductionist conclusion was unwarranted. He cautioned that thinking about probiotic pills rather than lifestyle might be counterproductive. But except for factors with large effects, such as smoking, the effect of "lifestyle" on health is rarely obvious. We know that poverty, for example, is associated with ill health, but it's not so easy to tease out how and why. And if the microbiome really does directly influence our health, as so many are promising, the only relevant thing about lifestyle would be how it changes our microbiomic makeup. Otherwise, we're talking about complexity -- multiple factors with small effects: genes, environmental factors, diet, and so on -- and all bets about probiotics and "the thinness microbiome" are off. But the caution was, to my mind, an important warning about the problem of assuming we know what we think we know; in this case, that the microbiome is the ultimate cause of disease.

The problem of theory
These are just two examples of the problem of assumption-driven science. They are fairly trivial, but if you are primed to notice, you'll see it all around you. Social science research is essentially the interpretation of observational data from within a theoretical framework. Psychologists might interpret observations from the perspective of, say, behavioral, cognitive, or biological psychology, and anthropologists, at least historically, from a functionalist or materialist or biological or post-modernist perspective. Even physicists interpret data based on whether they are string theorists or particle physicists.

And biologists' theoretical framework? I would suggest that two big assumptions biologists make are reductionism and, let's call it, biological uniformitarianism. We believe we can reduce causation to a single factor, and we assume that we can extrapolate our findings from the mouse or zebrafish we're working on to other mice, fish, and species, or from one or some people to all people. That is, we assume invariance when what we should expect is variation. There is plenty of evidence by now to show that we should know better.

True, most biologists would probably say that evolutionary theory is their theoretical framework, and many would add that traits are here because they're adaptive, because of natural selection. Evolution does connect people to each other, and people to other species, but it has done so by working on differences, not replicated identity, and there is no rule for the nature or number of those differences, or for extrapolating from one species or individual to another. We know nothing that contradicts evolutionary theory, but the idea that every trait is adaptive is an assumption, and a pervasive one.

Theory and assumption can guide us, but they can also improperly constrain how we think about our data, which is why it's good to remind ourselves from time to time to think about how we know what we think we know. As scientists we should always be challenging and testing our assumptions and theories, not depending on them to tell us that we're right.

The statistics of Promissory Science. Part I: Making non-sense with statistical methods

Statistics is a form of mathematics, a way devised by humans for representing abstract relationships. Mathematics comprises axiomatic systems: assumptions about basic units such as numbers, basic relationships like adding and subtracting, and rules of inference (deductive logic), which are then elaborated to draw conclusions that are typically too intricate to reason out in less formal ways. Mathematics is an awesomely powerful way of doing this abstract mental reasoning, but when applied to the real world it is only as true or precise as the correspondence between its assumptions and real-world entities or relationships. When that correspondence is high, mathematics is very precise indeed, a strong testament to the true orderliness of Nature. But when the correspondence is poor, mathematical applications verge on fiction, and this occurs in many important applied areas of probability and statistics.

You can't drive without a license, but anyone with R or SAS can be a push-button scientist. Anybody with a keyboard and some survey-generating software can monkey around with asking people a bunch of questions and then 'analyze' the results. You can construct a complex, long, intricate, jargon-dense, expansive survey. You then choose whom to subject to the survey--your 'sample'. You can grace the results with the term 'data', implying true representation of the world, and be off and running. Sample and survey designers may be intelligent, skilled, well trained in survey design, and of wholly noble intent. There's only one little problem: if the empirical fit is poor, much of what you do will be non-sense (and some of it nonsense).

Population sciences, including the biomedical, evolutionary, social, and political fields, are experiencing an increasingly widely recognized crisis of credibility. The fault is not in the statistical methods on which these fields heavily depend, but in the degree of fit (or not) to the assumptions--with the emphasis these days on the 'or not', and a frequent dismissal of the underlying issues in favor of a patina of technical, formalized results. Every capable statistician knows this, but of course might be out of business if openly paying it enough attention. And many statisticians may be too uninterested in, or too foggy about, the philosophy of science to see what lies beyond the methodological technicalities. Jobs and journals depend on not being too self-critical. And therein lie rather serious problems.

Promissory science
There is the problem of the problems--the problems we want to solve, such as understanding the cause of disease so that we can do something about it. When causal factors fit the assumptions, statistical and survey methods work very well. But when causation is far from fitting the assumptions, the impulse of the professional community seems mainly to be to increase the size, scale, cost, and duration of studies, rather than to slow down and rethink the question itself. There may be plenty of careful attention paid to refining statistical design, but basically this stays safely within the boundaries of current methods, current beliefs, and the need for research continuity. This may be very understandable, because one can't just quickly uproot everything or order up deep new insights. But it may also be viewed as abuse of public trust, and of the science itself.

The BBC Radio 4 program called More Or Less keeps a watchful eye on sociopolitical and scientific statistical claims, revealing what is really known (or not) about them.  Here is a recent installment on the efficacy (or believability, or neither) of dietary surveys.  And here is a FiveThirtyEight link to what was the basis of the podcast.

The promotion of statistical survey studies to assert fundamental discovery has been referred to as 'promissory science'. We are barraged daily with promises that if we just invest in this or that Big Data study, we will put an end to all human ills. It's a strategy, a tactic, and at least the top investigators are very well aware of it. Big long-term studies are a way to secure reliable funding and to defer delivering on promises into the vague future. The funding agencies, wanting to seem prudent and responsible with taxpayers' resources, demand a 'societal impact' section on grant applications. But there is in fact little if any accountability in this regard, so these sections are essentially bureaucratic window-dressing exercises.

Promissory science is an old game, practiced since time immemorial by preachers.  It boils down to promising future bliss if you'll just pay up now.  We needn't be (totally) cynical about this.  When we set up a system that depends on public decisions about resources, we will get what we've got.  But having said that, let's take a look at what is a growing recognition of the problem, and some suggestions as to how to fix it--and whether even these are really the Emperor of promissory science dressed in less gaudy clothing.

A growing, if partial, awareness
The problem of results that are announced by the media, journals, and universities but that don't deliver on the advertised promises is complex but widespread. In part because research has become so costly, warning sirens are sounding as it becomes clear that the promised goods are not being delivered.

One widely known issue is the lack of reporting of negative results, or their burial in minor journals. Drug-testing research is notorious for this under-reporting. That's too bad, because a negative result from a well-designed test is legitimately valuable and informative. A concern, besides corporate secretiveness, is that if the cost is high, taxpayers or shareholders may tire of funding yet more negative studies. Among other efforts, including by NIH, there is a formal attempt called AllTrials to rectify the under-reporting of drug trials, and this does seem to be thriving and growing, even if it is incomplete and unenforceable. But this non-reporting problem has been written about so much that we won't deal with it here.

Instead, there is a different sort of problem. The American Statistical Association has recently addressed an important issue: the use and (often) misuse of p-values to support claims of causation (we've written several posts in the past about these issues; search on 'p-value' if you're interested, and the post by Jim Wood is especially pertinent). FiveThirtyEight has a good discussion of the p-value statement.

The usual interpretation is that p represents the probability that, if the test variable in fact has no causal effect, its apparent effect arose just by chance. So if the observed p in a study is less than some arbitrary cutoff, such as 0.05, it means essentially that if no causation were involved, the chance of seeing an association at least this strong is less than 5%; this is taken as some evidence for a causal connection.
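To make the cutoff concrete, here is a minimal simulation in Python (a sketch; the sample sizes and the 0.05 cutoff are arbitrary choices): when the test variable truly has no effect, about 5% of studies will nevertheless cross the 'significance' threshold by chance alone.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate the null hypothesis: two groups drawn from the SAME
# distribution, i.e., the test variable has no causal effect at all.
n_studies = 10_000
n_per_group = 50
a = rng.normal(0.0, 1.0, size=(n_studies, n_per_group))
b = rng.normal(0.0, 1.0, size=(n_studies, n_per_group))

# Two-sample t statistic for each simulated study.
diff = a.mean(axis=1) - b.mean(axis=1)
se = np.sqrt(a.var(axis=1, ddof=1) / n_per_group +
             b.var(axis=1, ddof=1) / n_per_group)
t = diff / se

# How often does pure chance exceed the nominal 5% two-sided cutoff?
cutoff = 1.984  # critical t value for roughly 98 degrees of freedom
print((np.abs(t) > cutoff).mean())  # about 0.05
```

Run enough null studies and one in twenty will look like a discovery; publish only those, and the literature fills with 'effects' that were never there.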

Trashing p-values is becoming a new cottage industry! Now JAMA is on the bandwagon, with an article showing, in a survey of well over a million biomedical papers from the past 25 years, that a far disproportionate and increasing number of studies reported statistically significant results. Here is the study on the JAMA web page, though it is not freely accessible yet.

Besides the apparent reporting bias, the JAMA study found that those papers generally failed to adequately flesh out that result. Where are all the negative studies that statistical principles would lead us to expect? We don't see them, especially in the 'major' journals, as has been noted many times in recent years. Just as importantly, authors often did not report confidence intervals or other measures of the degree of 'convincingness' that might illuminate the p-value; in a sense, that means the authors didn't say what range of effects is consistent with the data. They reported a non-random effect, but often didn't give the effect size -- that is, how large the effect was, even assuming it was unusual enough to support a causal explanation. So, for example, a statistically significant increase of risk from 1% to 1.01% is trivial, even if one could accept all the assumptions of the sampling and analysis.
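A back-of-the-envelope calculation shows how such a trivial effect can still earn a small p-value (a sketch in Python; the 5% significance level and 80% power are conventional choices, not figures from the JAMA study):

```python
# How large a study would make a trivial risk increase (1% -> 1.01%)
# "statistically significant"?  Standard two-proportion sample-size
# approximation; all numbers here are illustrative.
p1, p2 = 0.010, 0.0101        # baseline vs. exposed risk
z_alpha, z_beta = 1.96, 0.84  # two-sided 5% test, 80% power

n = ((z_alpha + z_beta) ** 2
     * (p1 * (1 - p1) + p2 * (1 - p2))
     / (p2 - p1) ** 2)
print(f"{n:,.0f} subjects per group")  # roughly 15-16 million per group
```

With tens of millions of subjects, any nonzero difference will cross the p < 0.05 line; only the effect size -- here, a hundredth of a percentage point -- tells you it doesn't matter.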

Another vocal critic of what's afoot is John Ioannidis; in a recent article he levels both barrels against the misuse and mis- or over-representation of statistical results in the biomedical sciences, including meta-analysis (the pooling of many diverse small studies into a single large analysis to gain sufficient statistical power to detect effects and test for their consistency; a minimal sketch of the arithmetic follows below). This paper is a rant, but a well-deserved one, about how 'evidence-based' medicine has been 'hijacked', as he puts it. The same must be said of 'precision genomic' or 'personalized' medicine, or 'Big Data', and the other sorts of imitative sloganeering going on in many quarters that obviously see this sort of promissory science as what you have to do to get major funding. We have set ourselves a professional trap, and it's hard to escape. For example, the same author has been leading the charge against misrepresentative statistics for many years, and he and others have shown that the 'major' journals have, in a sense, the least reliable results in terms of replicability. But he's been raising these points in the very journals he shows are culpable, rather than boycotting them. We're in a trap!
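For readers unfamiliar with the mechanics, here is a minimal sketch of the inverse-variance pooling at the heart of a fixed-effect meta-analysis; the study estimates and standard errors are invented for illustration.

```python
import numpy as np

# Hypothetical effect estimates (e.g., log odds ratios) and standard
# errors from five small studies; the numbers are made up.
effects = np.array([0.30, -0.10, 0.25, 0.05, 0.40])
ses     = np.array([0.20,  0.25, 0.30, 0.15, 0.35])

# Inverse-variance weighting: more precise studies count for more.
w = 1.0 / ses**2
pooled = np.sum(w * effects) / np.sum(w)
pooled_se = np.sqrt(1.0 / np.sum(w))
print(f"pooled effect = {pooled:.3f} +/- {1.96 * pooled_se:.3f}")

# Cochran's Q asks whether the studies are even estimating the same
# quantity; under homogeneity it is ~chi-squared with k - 1 df.
Q = np.sum(w * (effects - pooled) ** 2)
print(f"Q = {Q:.2f} on {len(effects) - 1} df")
```

The pooling is only as good as its assumptions: if the small studies are heterogeneous, selectively published, or individually biased, the pooled estimate inherits and can even amplify those problems, which is part of what the critique is about.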

These critiques of current statistical practice are the points getting most of the ink and e-ink. There may be a lot of covering up of known issues, and even hypocrisy, in all of this, and perhaps also more understandable tacit avoidance. Industry (drug companies, statistical software, research equipment) has a vested interest in keeping the motor running. Authors need to keep their careers on track. And, in the fairest and least political sense, the problems are severe.

But while these issues are real and must be openly addressed, I think the problems are much deeper. In a nutshell, I think they relate to the nature of mathematics relative to the real world, and the nature and importance of theory in science.  We'll discuss this tomorrow.
