
Spooky action at a (short) distance

Entanglement in physics is about action that seems to transfer some sort of 'information' across distances at speeds faster than that of light.  Roughly speaking (I'm not a physicist!), it involves pairs of objects whose states are not fixed in advance, could take various forms, but must differ between them, and that are separated from each other.  When a measurement is made on one of them, whatever the result, the other object takes on the opposite state.  That means the states are not entirely due to local factors; somehow the second object 'knows' what state the first was observed in and takes on a different state.

You can read about this in many places and understand it better than I do or than I've explained it here.  Albert Einstein was skeptical that this could occur if the speed of light were the fastest possible speed.  So he famously dismissed the findings as they stood at that time as "spooky action at a distance."  But the findings have since stood many specific tests, and the effect seems to be real, however it happens.

Does life, too, have spooky action? 
I think the answer is: maybe so.  But it is at a very short distance, that is, within the nuclei of individual cells.  Organisms have multiple chromosomes, and many species, like humans, have two instances of each (they are 'diploid'), one inherited from each parent.  I say 'instances' rather than 'copies' because they are not identical to each other, nor to those of the parent that transmitted each of them.  They are perhaps near copies, but mutation always occurs, even among the cells within each of us, so each cell differs from its contemporary somatic fellows and from what we inherited in our single-cell beginnings as a fertilized egg.

Many clever studies over many years have been documenting the 3-dimensional, context-specific conformation, or detailed physical arrangement of chromosomes within cells.  The work is variously known, but one catch-term is chromosome conformation capture, or 3C, and I'll use that here.  Unless or until this approach is shown to be too laden with laboratory artifact (it's quite sophisticated), we'll assume it's more or less right.

The gist of the phenomenon is that (1) a given cell type, under a given set of conditions, is using only a subset of its genes (for my purposes here this generally means protein-coding genes proper); (2) these active genes are scattered along and between the chromosomes, with intervening inactive regions (genes not being used at the moment); (3) the cell's gene expression pattern can change quickly when its circumstances change, as it responds to environmental conditions, during cell division, etc.; (4) at least to some extent the active regions seem to be clustered physically together in expression centers in the nucleus; (5) this all implies that there is extensive trans communication, coordinating and physically juxtaposing parts within and among chromosomes--there is action at a very short distance.

Even more remarkably, I think, this phenomenon seems somehow robust to speciation because related species have similar functions and similar sets of genes, but often their chromosomes have been extensively rearranged during their evolutionary separation. More than this: each person has different DNA sequences due to mutation, and different numbers of genes due to copy number changes (duplications, deletions); yet the complex local juxtapositions seem to work anyway.  At present this is so complicated, so regular, and so changeable and thus so poorly understood, that I think we can reasonably parrot Einstein and call it 'spooky'.

What this means is that chromosomes are not just randomly floating around like a bowl of spaghetti.  Gene expression (including transcribed non-coding RNAs) is thought to be based on the sequence-specific binding of tens of transcription factors in an expression complex that is (usually) just upstream of the transcribed part.  Since a given cell under given conditions is expressing thousands of condition-specific genes, and since the cell can change its expression set very quickly, there must be very extensive interaction or 'communication' in trans, that is, across all the chromosomes.

The 3C results show that in a given type of cell under given conditions, the chromosomes are physically very non-randomly arranged, with active centers physically very near or perhaps touching each other.  How this massive plate of apparent spaghetti physically rearranges to get these areas together, without getting totally tangled up, yet remains quickly rearrangeable is, to me, spooky if anything in Nature is.  The entanglement, disentanglement, and re-entanglement happen genome-wide, which is implicitly what the classical term 'polygenic' recognized about genetic causation, but which is now being documented directly.

The usual approach of genetics these days is to sequence and enumerate various short functional bits as being coding, regulatory, enhancing, inhibiting, transcribing etc. other parts nearby.  We have long been able to analyze cDNA and decide which parts are being used for protein coding, at least. Locally, we can see why or how this happens, in the sense that we can identify the transcription factors and their binding sites, called promoters, enhancers and the like, and the actual protein or functional RNA codes.  We can find expression correlates by extracting them from cells and enumerating them.  3C analysis appears to show that these coding elements are, at least to some extent, found juxtaposed in various transcription hot-spots.

Is gene expression 'entangled'?
What if the molecular aspects of the 3C research were shown to be technical artifacts, relative to what is really going on?  I have read some skepticism about that, concerning what is found in single cells vs aggregates of 'identical' cells.  If 3C stumbles, will our idea of polygenic, condition-specific gene usage change?  I think not.  We needn't have 3C data to show the functional results, since they are already there to see (e.g., in cell-specific expression studies--cDNA and what ENCODE has found).  If 3C has been misleading for technical or other reasons, it would just mean that something else, just as spooky but different from the 3D arrangement that 3C detects, is responsible for coordinating genome-wide trans gene usage.  And it's of course 4-dimensional, since it's time-dependent, too.  So what I've said here will still apply, even if for some other, unknown or even unsuspected reason.

The existing observations on context-specific gene expression show that something 'entangles' different parts of the genome for coordinated use, and that this can change very rapidly.  The same genome, among the different types of cells of an individual, can behave very differently in this sense.  Somehow, its various chromosomal regions 'know' how to be, or, better put, are coordinated.  This seems at least plausibly to be more than just that a specific, context-specific set of transcription factors (TFs) binds selectively near regions to be transcribed and changes in its thousands of details almost instantly.  Which TFs?  And how does a given TF know which binding sites to grab or release, moment by moment, since TFs typically bind enhancers or promoters of many different genes, not all of which are being expressed at the time?  And if you want to dismiss that, by saying for example that this has to do with which TFs are themselves being produced, or which parts of DNA are unwrapped at each particular time, then you're just bumping the same question about trans control up, or over, to a different level of what's involved.  That's no answer!

And there is an even simpler example to show that we really don't understand what's going on: the alignment of homologues in the first stage of meiosis.  We've been taught that empirical and necessary fact about meiosis for many decades.  But how do the two homologues find each other to align?  Textbooks essentially just don't mention this, if anyone was even asking.  I've seen some speculative ideas, again involving what I'll call 'electromagnetic' properties of each chromosome, but even their authors didn't really claim they were sufficient or definitive.  Just as examples, homologous chromosomes in a diploid individual have different rearrangements, deletions, duplications, and all sorts of heterozygous sequence details, yet by and large they still seem to find each other in meiosis.  Something's going on!

How might this be tested?
I don't have any answers, but I wonder, on the hypothesis that these thoughts are on target, how we might set up some critical experiments to test them.  I don't know whether we can push the analogy with tests for quantum entanglement, but probably not.

One might hope that 'all' we have to do is enumerate sequence bits to account for this action-at-a-distance, this very detailed trans phenomenon.  But I wonder......I wonder if there may be something entirely unanticipated or even unknown that could be responsible.  Maybe there are 'electromagnetic' properties or something akin to that, that are involved in such detailed 4D contextually relativistic phenomena.

Suppose that what happens at one chromosomal location (let's just call it the binding of a TF) directly affects whether that or a different TF binds somewhere else at the same time.  Whatever causes the first event, if that's how it works, the distance effect would be a very non-local phenomenon, one so central to organized life in complex organisms that, causally, life is not just a set of local gene expressions.  Somehow, some sort of 'information' is at work very fast and over very short distances.  It is the worst sort of arrogance to assume it is all just encoded in DNA as a code we can read off along the strand, and that it will succumb to enumerative, local, informatic sequence analysis.

The current kind of purely local hypothetical sequence enumeration-based account seems too ordinary--it's not spooky enough!

Is life itself a simulation of life?

It often happens in science that our theory of some area of reality is very precise, but the reality is too complex to work out precisely, or analytically.  This is when we may decide to use computer simulation of that reality to get at least a close approximation to the truth.  When a phenomenon is determined by a precise process, then as we increase the detail of our simulation--assuming the simulation really is simulating the underlying reality--the more computer power we apply, the closer we get to the truth; that is, our results approach that truth asymptotically.

For example, if you want to predict the rotation of galaxies in space relative to each other, and of the stars within the galaxies, the theories of physics will do the job, in principle. But solving the equations directly the way one does in algebra or calculus is not possible with so many variables.  However, you can use a computer to simulate the movement and get a very good approximation (we've discussed this here, among other places).  Thus, at each time interval, you take the position and motion of each object you want to follow, and those measures of nearby objects, and use Newton's law of gravity to predict the position of the objects one time interval later.
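For concreteness, here is a minimal sketch of that kind of time-stepped calculation in Python; the two bodies, their masses, positions, and time step are invented purely for illustration, and a plain Euler update stands in for the more careful integrators real codes use.

```python
import numpy as np

G = 6.674e-11  # gravitational constant, m^3 kg^-1 s^-2

def step(pos, vel, mass, dt):
    """Advance every body one time interval using Newton's law of gravity
    (a plain Euler update; production codes use finer integrators)."""
    acc = np.zeros_like(pos)
    for i in range(len(mass)):
        for j in range(len(mass)):
            if i == j:
                continue
            r = pos[j] - pos[i]                  # vector from body i to body j
            d = np.linalg.norm(r)
            acc[i] += G * mass[j] * r / d**3     # a_i += G * m_j * r / |r|^3
    vel = vel + acc * dt
    pos = pos + vel * dt
    return pos, vel

# Toy system: a star-like and a planet-like body, with made-up numbers.
pos = np.array([[0.0, 0.0], [1.5e11, 0.0]])      # metres
vel = np.array([[0.0, 0.0], [0.0, 3.0e4]])       # metres per second
mass = np.array([2.0e30, 6.0e24])                # kilograms
for _ in range(1000):                            # follow the motion over many intervals
    pos, vel = step(pos, vel, mass, dt=3600.0)
```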

If the motion you simulate doesn't match what you can observe, you suspect you've got something wrong with the theory you are using. In the case of cosmology, one such factor is known as 'dark matter'.  That can be built into models of galactic motion, to get better predictions.  In this way, simulation can tell you something you didn't already know, and because the equations can't be directly solved, simulation is an approach of choice.

In many situations, even if you think that the underlying causal process is deterministic, measurements are imperfect, and you may need to add a random 'noise' factor to each iteration of your simulation.  Each individual run will be slightly 'off' because of this, but if you run the same simulation thousands of times, the effect of the noise evens out, and the average result represents what you are trying to model.
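As a trivial sketch of that noise-averaging idea (the deterministic process and the noise level below are arbitrary, chosen only for illustration):

```python
import random

def one_run(n_steps=100, noise_sd=0.5):
    """One simulation run: a deterministic increment per step, plus a random
    'noise' draw added at each iteration (both choices are arbitrary)."""
    x = 0.0
    for _ in range(n_steps):
        x += 1.0 + random.gauss(0.0, noise_sd)
    return x

# Any single run is slightly 'off'; averaging thousands of runs recovers ~100.
runs = [one_run() for _ in range(10_000)]
print(sum(runs) / len(runs))
```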

Is life a simulation of life?
Just like other processes that we attempt to simulate, life is a complex reality.  We try to explain it with the very general theory of evolution, and we use genetics to try to explain how complex traits evolve, but there are far too many variables to predict future directions and the like analytically.  This is more than just a matter of biological complexity, however: the fundamental processes of life seem, as far as we can tell, to be inherently probabilistic (not just subject to measurement error).  This adds an additional twist that makes life itself seem to be a simulation of its underlying processes.

Life evolves by parents transmitting genes to offspring.  For those genes to be transmitted to the next generation, the offspring have to live long enough, must be able to acquire mates, and must be able to reproduce.  Genes vary because mutations arise.  For simplicity's sake, let's say that successful mating requires not falling victim to natural selection before offspring are produced, that that depends on an organism's traits, and that genes are causally responsible for those traits.  In reality, there are other processes to be considered, but these will illustrate our point.

Mutation and surviving natural selection seem to be probabilistic processes.  If we want to simulate life, we have to specify the probability of a mutation along some simulated genome, and the probability that a bearer of the mutation survives and reproduces.  Populations contain thousands of individuals, genomes incur thousands of mutations each generation, and reproductive success involves those same individuals.  This is far too hard to write tractable equations for in most interesting situations, unless we make almost uselessly simplifying assumptions.  So we simulate these phenomena.

How, basically, do we do this?  Here, generically and simplified, but illustrating the issues, is the typical way (and the way taken by my own elaborate simulation program, called ForSim, which is freely available):

For each individual in a simulated population, each generation, we draw a random number based on an assumed mutation rate, and add the resulting number and location of mutations to the genotype of the individual.  Then for each resulting simulated genotype, we draw a random number from the probability that such a genotype reproduces, and either remove or keep the individual depending on the result.  We keep doing this for thousands of generations, and see what happens.  As an example, the box lists some of the parameter values one specifies for a program like ForSim.
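To make that loop concrete, here is a bare-bones sketch of the same logic in Python.  It is not ForSim itself; the mutation rate, population size, generation count, and fitness penalty are invented placeholders, and each individual is reduced to a single count of accumulated mutations.

```python
import random

MU = 1e-3        # assumed per-genome, per-generation mutation probability (arbitrary)
POP_SIZE = 1000  # constant population size (a simplifying assumption)
GENS = 5000      # number of generations to simulate

def survives(n_mutations):
    """Toy survival model: each mutation multiplies the chance of reproducing
    by a small selective cost (the 0.99 is an arbitrary placeholder)."""
    return random.random() < 0.99 ** n_mutations

pop = [0] * POP_SIZE                     # each individual = its mutation count
for _ in range(GENS):
    offspring = []
    while len(offspring) < POP_SIZE:
        parent = random.choice(pop)      # pick a random parent for each birth
        child = parent + (1 if random.random() < MU else 0)   # mutation draw
        if survives(child):                                   # selection draw
            offspring.append(child)
    pop = offspring

print("mean mutations per individual:", sum(pop) / len(pop))
```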



Sometimes, if the simulation is accurate enough, the probability and other values we assume look like what ecologists or geneticists believe is going on in their field site or laboratory.  In the case of humans, however, we have little such data, so we make a guess at what we think might have been the case during our evolution.  Often these things are empirically estimated one at a time, but their real values affect each other in  many ways.  This is, of course, very far from the situation in physics, described above!  Still, we at least have a computer-based way to approximate our idea of evolutionary and genetic processes.

We run this for many, usually many thousand generations, and see the trait and genomic causal pattern that results (we've blogged about some of these issues here, among other posts).  This is a simulation since it seems to follow the principles we think are responsible for evolution and genetic function.  However, there is a major difference.

Unlike simulations in astronomy, life really does seem to involve random draws for probabilistic processes.  In that sense, life looks like it is, itself, a simulation of these processes.  The random draws it makes are not just practical stand-ins for some underlying phenomenon, but manifestations of the actual probabilistic nature of the phenomenon.

This is important, because when we simulate a process, we know that its probabilistic component can lead to different results each time through.  And yet life itself is a one-time run of those processes.  In that sense, life is a simulation, but we can only guess at the underlying causal values (like mutation and survival rates) from a single data set: what actually happened in its one time through.  Of course, we can test various examples, like looking at mutation rates in bacteria or in some samples of people, but these involve many problems and are at best general estimates from samples, often artificial or simplified ones.

But wait!  Is life a simulation after all?  If not, what is life?
I don't want us to be bogged down in pure semantics here, but I think the answer is that, in a very profound way, life is not a simulation in the sense we're discussing.  For the relevant variables, life is not based on an underlying theoretical process in the usual sense, one whose parameters we approximate with random draws in our simulations.

For example, we evaluate biological data in terms of 'the' mutation rate in genomes from parent to offspring.  But in fact, we know there is no such thing as 'the' mutation rate, one that applies to each nucleotide as it is replicated from one generation to the next, and from which each actual mutation is a random draw.  The observed rate of mutation at a given location in a given sample of a given species' genomes depends, among other things, on the sex, the particular nucleotides surrounding the site in question (and hence all sites along the DNA string), the nature of the mutation-detection proteins coded by that individual's genome, and mutagen levels in the environment.  In our theory, and in our simulations, we assume an average rate, and that the variation from that average will, so to speak, 'average out' in our simulations.
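As a toy illustration of the point that there is no single rate, here is a sketch in which each site's mutation probability depends on its flanking bases; the rule and the numbers are invented, not empirical values.

```python
# Invented context-dependent mutation probabilities: e.g., a C..G flank elevates
# the rate tenfold.  These numbers are placeholders, not measured rates.
def site_rate(left, right, base_rate=1e-8):
    if left == 'C' and right == 'G':
        return base_rate * 10
    if left == right:
        return base_rate * 2
    return base_rate

seq = "ACGTACGGCGTTAGC"
rates = [site_rate(seq[i - 1], seq[i + 1]) for i in range(1, len(seq) - 1)]
# Every interior site carries its own rate; a single 'average' rate hides this.
print(min(rates), max(rates), sum(rates) / len(rates))
```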

But I think that is fundamentally wrong.  In life, every condition today is a branch-point for the future.  The functional implications of a mutation here and now depend on the local circumstances, and that is built into the production of future local generations.  Averaging over the genome and over individuals does not in fact generate what life does; in a sense it does the opposite.  Each event has its own local dynamics and contingencies, and the effect of those conditions alters the rates of events in the future.  Everywhere it's different, and we have no theory about how different, especially over evolutionary time.

Indeed, one might say that the most fundamental single characteristic of life is that the variation generated here today is screened here today and not anyplace else or any time else.  In that sense, each mutation is not drawn from the same distribution.  The underlying causal properties vary everywhere and all the time.  Sometimes the difference may be slight, but we can't count on that being true and, importantly, we have no way of knowing when and to what extent it's true.

The same applies to foxes and rabbits.  Every time a fox chases a rabbit, the conditions (including the genotypes of the fox and rabbit) differ.  The chance of the rabbit being caught is not the same each time; the success 'rate' is not drawn from a single, fixed distribution.  In reality, each chase is unique.

After the fact, we can look back at net results, and it's all too tempting to think of what we see as a steady, deterministic process with a bit of random noise thrown in.  But that's not an accurate way to think, because we don't know how inaccurate it is, when each event is to some (un-prespecified) extent unique.  Overall, life is not, in fact, drawing from an underlying distribution.  It is ad hoc by its very nature and that's what makes life different from other physical phenomena.

Life, and we who partake of it, are unique. The fact of local, contingent uniqueness is an important reason that the study of life eludes much of what makes modern physical science work.  The latter's methods and concepts assume replicable law-like underlying regularity. That's the kind of thing we attempt to model, or simulate, by treating phenomena like mutation as if they are draws from some basic underlying causal distribution. But life's underlying regularity is its irregularity.

This means that one of the best ways we have of dealing with complex phenomena of life, simulating them by computer, smoothes over the very underlying process that we want to understand.  In that sense, strangely, life appears to be a simulation but is even more elusive than that.  To a great extent, except by some very broad generalities that are often too broad to be very useful, life isn't the way we simulate it, and doesn't even simulate itself in that way.

What would be a better approach to understanding life?  The next generation will have to discover that.

Somatic mutation beyond neurological traits. Part IV: the big mistake in genetics

The previous posts in this series were about the potential relevance of somatic mutation to neurologically relevant traits.  I commented about ideas I've long had about the possible genetic etiology of epilepsies, but then about the more general relevance of somatic mutation for behavior and other less clinical traits, indeed, to positively beneficial traits.  But the issues go much farther!

Fundamental units as the basis of science
Every science has fundamental units at the bottom of its causal scale, whose existence and properties can be assumed and tested, but below which we cannot go.  The science is about the behavior or consequences of these units and their interactions.  The fundamental unit's nature and existence per se are simply assumed.  Physicists generally don't ask what is inside a photon or electron or neutron (or they say that these 'particles' are really 'waves').  In that sense, fundamental 'causes' are also defined but not internally probed.  Physicists don't really attempt to define 'force' except empirically or, for that matter, 'curved space-time'.  You simply don't go there!  Or, more precisely, if and when you venture into the innards of fundamental units, you do that by defining other, even more fundamental units.  When string theorists try to delve into unreal dimensions, they leave most other physicists, certainly the day-to-day ones, behind.  Generally, I think, physicists are clearer about this than biologists.  The same holds in mathematics: we have fundamental axioms and the like that are accepted, not proven or tested.

Why is somatic mutation considered to be some sort of side-show in genetics?
What are biology's fundamental units?  For historical reasons, evolutionary biology, which became much of the conceptual and theoretical foundation for biology, was about organisms.  Before the molecular age, we simply didn't have the technology to think of organisms in the more detailed way we do now, but thought of them instead as a kind of unit in and of themselves.

Thus, the origins of ecology and phylogeny (before as well as after Darwin) were about whole organisms.  Of course, it was long known that plants had leaves and animals had organs, and these and their structures and behavior (and pathologies) were studied in a way that was known to involve dissecting the system from its normal context.  That is, organs were just integral parts of otherwise fundamental units.  This was true even after microscopes were developed and Virchow and others had established the cell theory of life.  Even after Pasteur and others began studying bacteria in detail, the bacterium itself was a fundamental unit.

Eukaryotic cell; figure from The Mermaid's Tale, Weiss and Buchanan, 2009


But this was a major mistake.  Dissecting organs to understand them did, when considered properly, allow the identification of digestion, circulation, muscle contraction, and the like.  But the focus then, and still today in the genetic age, on the whole organism as a basically fundamental unit has had some unfortunate consequences.  We know that genes in some senses 'cause' biological traits, but we treat an organism as a fundamental unit with a genotype, and that is where much trouble lies.

The cell theory made it obvious that you and I are not just an indivisible fundamental unit, with a genotype as its fundamental characteristic.  Theories of organisms, embryology, and evolution largely rest on that assumption, but it is a mistake, and somatic mutation is a major reason why.

The cell theory, or cell fact, really, makes it clear that you and I are not indivisible causal units with a genotype.  We know beyond dispute that cell division typically involves at least some DNA replication errors--'errors', that is, if you think life's purpose is to replicate faithfully.  That itself is a bit strange, because life is an evolutionary phenomenon that is fundamentally about variation.  Perhaps, like most things in the physical world, the important issues have to do with the amount of variation.

Mitotic spindle during cell division; from Wikipedia, Public Domain

The number of cell divisions from conception through adulthood in humans is huge.  It is comparable to, or exceeds, the number of generations in a species' history.  Modern humans have been around for, say, 100,000 generations (2 million years), far fewer than the number of cell divisions in a lifetime.  In addition, the number of cells in a human body at any given time is in the many billions, and many or even most cells continue to renew throughout life.  This is comparable to the population size of many species.  The point is that the amount of somatically generated variation among cells in any given individual is comparable to the amount of germline variation in a species, or even in a species' history.  And I have not included the ecological diversity of each individual organism, including the bacteria and viruses and other small organisms on, in, and through a larger organism.
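A back-of-envelope comparison, using round numbers that are only order-of-magnitude guesses, makes the scale argument concrete:

```python
# Rough, assumed round numbers -- order-of-magnitude guesses, not measurements.
cells_in_adult = 3e13                          # commonly cited ballpark for a human body
divisions_to_build_body = cells_in_adult - 1   # at minimum, ignoring lifelong cell turnover
human_generations = 100_000                    # ~2 million years at ~20 years per generation

print(divisions_to_build_body / human_generations)  # ~3e8: divisions dwarf generations
```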

By assuming that somatic mutational variation doesn't exist or is trivially unimportant--that is, by assuming that a whole organism is the fundamental unit of life--we are entirely ignoring this rich, variable, dynamic ecology.  Somatic mutation is hard to study.  There are many ways that a body can detect and rid itself of 'mutant' cells--that is, cells that differ from the parent cell at their bodily time and place.  But to treat each person as if s/he has 'the' genotype of his/her initial zygote is a rash assumption or, perhaps a bit more charitably, a convenient approximation.

Oversimplification, deeper and deeper
In the same way that we can understand the moon's orbit around the earth by ignoring the innards of both bodies, so long as we don't care about small orbital details, we can understand an organism's life and relations to others, including its kin, by ignoring the internal dynamics that life is actually mainly about.  But much of what the whole organism is or does is determined by the aggregate of its nature and the distribution of genotypes over its large collections of cells.  We have been avoiding these inconvenient facts for several decades now.  Before there was any real reason to think or know much about somatic mutation (except, for example, rearrangements in the adaptive immune system), the grossness of the approximation was at least more excusable.  But those days should be gone.

Genomewide mapping is one example, of course.  It can find those things which, when inherited in the germline and hence present in all other cells (except where they've been mutated), affect particular traits.  Typically, traits of interest are found by mapping studies to be affected by tens, hundreds, or even thousands of 'genes' (including transcribed RNAs, regulatory regions, etc.).  Each individual inherits one diploid genotype, unique to every person, and around this arises a largely randomly generated distribution of mutant cells.  When hundreds of genes contribute, it just makes no sense to think that what you inherit is what you are.

It should also be noted that we have no real way even to identify the 'constitutive' genome of an organism like a person.  We must use some tissue sample, like blood or a cheek swab.  But those will contain somatic mutations that arose subsequent to conception.  We basically don't look for them and indeed each cell in the sample will be different.  Sequencing will generally identify the common nucleotide(s) at each site, and that generally will be the inherited one(s), but that doesn't adequately characterize the variation among the cells; indeed, I think it largely ignores it as technical error.
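A minimal sketch of why this happens: with a simple frequency threshold for reporting alleles at a site, a somatic variant carried by only a few cells in the sample is indistinguishable from sequencing error (the read counts and the 20% threshold below are invented for illustration).

```python
from collections import Counter

def call_site(bases, min_fraction=0.2):
    """Report alleles supported by at least min_fraction of reads at a site;
    anything rarer is typically discarded as likely sequencing error."""
    counts = Counter(bases)
    depth = len(bases)
    return [b for b, n in counts.items() if n / depth >= min_fraction]

# 100 reads at one site: 97 carry the inherited 'A', 3 carry a somatic 'G'
# present in only a small fraction of the sampled cells.
pileup = ['A'] * 97 + ['G'] * 3
print(call_site(pileup))   # ['A'] -- the somatic variation is invisible to the call
```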

The roles and relevance of somatic mutation might be studiable by comparing large-bodied, long-lived species with small, short-lived ones in which not many cell divisions occur; the latter might be predicted to be more accurately described by their constitutive (inherited) genomes than larger species are.  Likewise, plants with diverse 'germ lines', such as trees whose countless meristems generate seeds, compared with simpler plants, might be illuminating.

How to understand and deal with these realities is not easy to suggest.  But it is easy to say that, for every plausible reason, somatic mutation must have substantial effects on traits good, bad, and otherwise.  And that means we have been wrong to consider the individual to be a fundamental unit of life.

Quantum spookiness is nothing compared to biology's mysteries!

The news is properly filled these days with reports of studies documenting various very mysterious aspects of the cosmos, on scales large and small.  News media feed on stories of outer space's inner secrets.  We have dark matter and dark energy that, if models of gravitational effects and other phenomena are correct, comprise the majority of the cosmos's contents. We have relativity, that shows that space and even time itself are curved.  We have ideas that there may be infinitely many universes (there are various versions of this, some called the multiverse).  We have quantum uncertainty by which a particle or wave or whatever can be everywhere at once and have multiple superposed states that are characterized in part only when we observe it.  We have space itself inflating (maybe faster than the speed of light).  And then there's entanglement, by which there seem to be instant correlated actions at unlimited distances.  And there is some idea that everything is just a manifestation of many-dimensional vibrations ('strings').

The general explanations are that these things make no 'sense' in terms of normal human experience, using just our built-in sensory systems (eyes, ears, touch, smell, etc.), but that, mathematically, observable data fit the above sorts of explanations to a huge degree of accuracy.  You cannot understand these phenomena in any natural, intuitive way but only by accustoming yourself to accept the mathematical results, the read-outs of instrumentation, and their interpretation.  Even the most thoughtful physicists routinely tell us this.

These kinds of ideas rightfully make the news, and biologists (perhaps not wanting to be left out, especially those in human-related areas) are thus led to concocting other-worldly ideas of their own, making promises of miraculous precision and more or less immortal health, based on genes and the like.  There is a difference, however: unlike physicists, biologists reduce things to concepts like individual genes and their enumerable effects, treating them as basically simple, primary, and independent causes.

In physics, if we could enumerate the properties of all the molecules in an object, like a baseball, comet, or a specified set of such objects, we (physicists, that is!) could write formal equations to describe their interactions with great precision.  Some of the factors might be probabilistic if we wanted to go beyond gravity and momentum and so on, to describe quantum-scale properties, but everything would follow the same set of rules for contributing to every interaction.  Physics is to a great, and perhaps ultimate extent, about replicable complexity.  A region of space or an object may be made of countless individual bits, but each bit is the same (in terms of things like gravity per unit mass and so on).  Each pair, say, of interactions of similar particles etc. follows the same rules. Every electron is alike as far as is known.  That is why physics can be expressed confidently as a manifestation of laws of nature, laws that seem to hold true everywhere in our detectable cosmos.

Of cats and Schroedinger's cat
Biology is very different.  We're clearly made of molecules and use energy just as inanimate objects do, and the laws of chemistry and physics apply 100% of the time at the molecular level.  But the nature of life is essentially the product of non-replicable complexity, of unique, non-repeating interactions.  Life is composed strictly of identifiable elements and forces and so on at the molecular level.  Yet the essence of life is descent with modification from a common origin, Darwin's key phrase, and this is all about differences.  Differences are essential when it comes to the adaptation of organisms, whether by natural selection, genetic drift, or whatever, because adaptation means change.  Without life's constituent units being different, there would be no evolution beyond purely mechanical changes like the formation of crystals.  Even if life is, in a sense, the assembling of molecular structures, it is the differences in their makeup that make us different from crystals.

Evolution and its genetic basis are often described in assertively simple terms, as if we understood them in a profound ultimate sense.  But that is a great exaggeration: the fact that some simple molecules interacted 4 billion years ago, in ways that captured energy and enabled the accretion of molecular complexity to generate today's magnificent biosphere, is every bit as mysterious, in the subjective sense of the term at least, as anything quantum mechanics or relativity can throw at us. Indeed, the essential nature of life itself is equally as non-intuitive. And that's just a start.

The evolution of complex organisms, like cats, built through developmental interactions of awe-inspiring complexity--leading to units made up of associated organ systems that communicate internally in molecular ways (physiology) and externally in basically different (sensory) ways--is as easy to sum up as "it's genetic!", but it is again as mysterious as quantum entanglement.  Organisms are the self-assembly of an assemblage of traits with interlocking functions, achievable in countless ways (because the genomes and environments of every individual are at least slightly different).  An important difference is that quantum entanglement may simply happen, but we--evolved bags of molecular reactions--can discover that it happens!

The poor cat in the box.  Source: "Schrödinger cat" by Martin Bahmann (File:Kamee01.jpg), Wikimedia Commons

This self-assembly is wondrous, even more so than the dual existence of Schroedinger's famous cat in a box.  That cat is alive and dead at the same time, depending on whether a probabilistic event has happened inside the box (see this interesting discussion), until you open the box, at which point the cat is either alive or dead.  This humorous illustration of quantum superposition garnered a lot of attention, though not so much from Schroedinger himself, for whom it was just a whimsical way to make the point about quantum strangeness.

But nobody seems to give a thought beyond sympathy for the poor cat!  That's too bad, because what's really amazing is the cat itself.  That feline construct makes most of physics pale by comparison.  A cat is not just a thing, but a massively well-organized entity, a phenomenon of interactions, thanks to the incredible dance of embryonic development.  Yet even development, and the lives that plants and animals (and, indeed, single-celled organisms) live, impressively elaborate as they are, pale by comparison with the awareness, self-awareness, and consciousness that these organisms variously have.

This is worth thinking about (so to speak) when inundated by the fully justified media blitz that weird physics evokes.  But then you should ask whether anything in the incomprehensibly grand worlds of physics and cosmology is even close to the elusiveness and amazing reality of these properties of life: how they could possibly come about, how they evolved, and how they develop in each individual--as particular traits, not just the result of some generic evolutionary process.

And there's even more:  If flies or cats are not 'conscious' in the way that we are, then it is perhaps as amazing that their behavior, which so seems to have aspects of those traits, could be achieved without conscious awareness.  But if that be so, then the mystery of the nature of consciousness having evolved, and the nature of its nature, are only augmented many-fold, and even farther from our intuition than quantum entanglement.

Caveat emptor
Of course, we may have evolved to perceive the world just the way the world really is (extending our native senses with sensitive instruments to do so).  Maybe what seems strange or weird is just our own misunderstanding or willingness to jump on strangeness bandwagons.  Here from Aeon Magazine is a recent and thoughtful expression of reservations about such concepts as dark matter and energy.

If quantum entanglement and superposition, or relativity's time dilation and length contraction, are inscrutable, and stump our intuition, then surely consciousness trumps those stumps.  Will anyone reading this blog live to see even a comparable level of understanding in biology to what we have in physics?

Unknowns, yes, but are there unknowables in biology?

The old Rumsfeld jokes about the knowns and unknowns are pretty stale by now, so we won't really indulge in beating that dead horse.  But in fact his statement made a lot of sense.  There are things we think we know (like our age), things we think we don't know but might know (like whether there will be a new message in our inbox when we sign onto email), and things we don't know but don't know we don't know (such as how many undiscovered marine species there are). Rumsfeld is the subject of ridicule not for this pronouncement per se (at least to those who think about it), because it is actually reasonable, but for other things that he is said to have done or said (or failed to say) in regard to American politics.

Explaining what we don't know is a problem!  Source: Google images

The unknowns may be problems, but they are not Big problems.  What we don't know but might know is at least within the realm of learning.  We may eventually stumble across facts we don't know but don't yet even know are there.  The job of science is to learn what we know we don't know, and even to discover what we don't yet know that we don't know.  We think there is nothing 'inside' an electron or photon, but there may be, and we may some day realize that possibility.  Then the guts of a photon will become a known unknown.

However, there's another, even more problematic--one may say truly problematic--kind of mystery: things that are actually unknowable.  They present a Really Big problem.  For example, based on the current understanding of cosmology, there are parts of the universe that are so far away that energy (light etc.) from them simply has not, and can never, reach us.  We know that the details of this part of space are literally unknowable, but because we have reasonably rigorous physical theory we think we can at least reliably extrapolate from what we can see to the general contents (density of matter and galaxies etc.) of what we know must exist but cannot see.  That is, it's literally unknowable but theoretically known.

However, things like whether life exists out there are in principle unknowable.  But at least we know very specifically why that is so.  In the future, most of what we can see in the sky today is, according to current cosmological theories, going to become invisible as the universe expands, so that the light from these visible but distant parts will no longer be able to reach us.  Any living descendants will at least know, from the record, what was there to see and its dynamics, and will be able to make reasonable extrapolations about what it's like out there even though it can no longer be seen.

There are also 'multiverse' theories of various sorts (a book discussing these ideas is Our Mathematical Universe, by Max Tegmark).  At present, the various sorts of parallel universes are simply inaccessible, even in principle, so we can't really know anything about them (or, perhaps, even whether they exist).  Not only is electromagnetic radiation from them unable to reach us, so we can't observe, even indirectly, what was going on when that light was emitted, but our universe is self-contained relative to these other universes (if they exist).

Again, all of this is because of the kind of rigorous theory that we have, and the belief that if that theory is wrong, there is at least a correct theory to be discovered--Nature does work by fixed 'laws', and while our current understanding may have flaws, the regularities we are finding are not imaginary even if they are approximations to something deeper (but comparably regular).  In that sense, the theory we have tells us quite a lot about what seems likely to be the case even if unobserved.  It was on such a basis that the Higgs boson was discovered (assuming the inferences from the LHC experiments are correct).

What about biology?
Biology has been rather incredibly successful in the last century and more.  The discoveries of evolution and genetics are as great as those in any other science.  But there remain plenty of unknowns about biological evolution and its genomic basis that are far deeper than questions about undiscovered species.  We know that these things are unknown, but we presume they are knowable and will be understood some day.

One example is the way that homologous chromosomes (one inherited from each of a person's parents) line up with each other in the first stage of meiosis (the formation of sperm and egg cells).  How do they find each other?  We know they do line up when sex cells are produced, and there are some hypotheses and bits of relevant information about the process, but we're aware of the fact that we don't yet really know how it works.

Homologous chromosomes pair up...somehow.  Wikimedia, public domain.

Chromosomes also are arranged in a very different 3-dimensional way during the normal life of every cell.  They form a spaghetti-like ball in the nucleus, with different parts of our 23 pairs of chromosomes very near to each other.  This 'chromosome conformation', the specific spaghetti ball, shown schematically in the figure, varies among cell types, and even within a cell as it does different things.  The reason seems to be at least in part that the juxtaposed bits of chromosomes contain DNA that is being transcribed (such as into messenger RNA to be translated into protein) in that particular cell under its particular circumstances.
Chromosomes arrange themselves systematically in the nucleus.  Source: image by Cutkosky, Tarazi, and Lieberman-Aiden from Manoharan, BioTechniques, 2011
It is easy to discuss what we don't know in evolution and genetics and we do that a lot here on MT. Often we critique current practice for claiming to know far more than is actually known, or, equally seriously, making promises to the supporting public that suggest we know things that in truth (and in private) we know very well that we don't know.  In fact, we even know why some things that we promise are either unknown or known not to be correct (for example, causation of biological and behavioral traits is far more complex than is widely claimed).

There are pragmatic reasons why our current system of science does this, which we and many others have often discussed, but here we want to ask a different sort of question:  Are there things in biology that are unknowable, even in principle, and if so how do we know that?  The answer at least in part is 'yes', though that fact is routinely conveniently ignored.

Biological causation involves genetic and environmental factors.  That is clearly known, in part because DNA is largely an inert molecule so any given bit of DNA 'does' something only in a particular context in the cell and related to whatever external factors affect the cell.  But we know that the future environmental exposures are unknown, and we know that they are unknowable.  What we will eat or do cannot be predicted even in principle, and indeed will be affected by what science learns but hasn't yet learned (if we find that some dietary factor is harmful, we will stop eating it and eat something else).  There is no way to predict such knowledge or the response to it.

What else may there be of this sort?
A human has hundreds of billions of cells, a number which changes and varies among and within each of us.  Each cell has a slightly different genotype and is exposed to slightly different aspects of the physical environment as well.  One thing we know that we cannot now know is the genotype and environment of every cell at every time.  We can make some statistical approximations, based on guesses about the countless unknowns of these details, but the number of variables exceeds that of stars in the universe, and even in theory cannot be known with knowable precision.

Unlike in much of physics, the use of statistical analytic techniques here is inapt, also to an unknowable degree.  We know that not all cells are identical observational units, for example, so the aggregate statistics that are used for decision-making (e.g., significance tests) are simply guesses or gross assumptions whose accuracy is unknowable.  This is so in principle, because each cell, each individual, is always changing.  We might call these 'numerical unknowables', because they are a matter of practicality rather than of theoretical limits on the phenomena themselves.

So are there theoretical aspects of biology that in some way we know are unknowable and not just unknown?  We have no reason, based on current biological theory, to suspect the kinds of truly unknowables, analogous to cosmology's parallel universes.  One can speculate about all sorts of things, such as parallel yous, and we can make up stories about how quantum uncertainty may affect us. But these are far from having the kind of cogency found in current physics.

Our lack of theory comparably rigorous to what physics and chemistry enjoy leaves open the possibility that life has its own knowable unknowables.  If so, we would like at least to know what those limits may be, because much of biology relates to practical prediction (e.g., causes of disease).  The state of knowledge in biology, no matter how advanced it has become, is still far from adequate to address what may eventually be knowable and what the limits to knowability are.  In a sense, unlike physics and cosmology, in biology we have no theory that tells us what we cannot know.

And unlike physics and cosmology, where some of these sorts of issues really are philosophical rather than of any practical relevance to daily life, we in biology have very strong reasons to want to know what we can know, and what we can promise....but perhaps also unlike physics, because people expect benefits from biological research, strong incentives not to acknowledge limits to our knowledge.

Who should take statins? Is heart disease predictable?

Who should take statins.....besides everyone?  I thought a lot about this when I was working on a lecture about predicting disease. The purpose of statins, of course, is to prevent atherosclerotic cardiovascular disease in people at risk (how well they do this is another issue). The challenge is to identify the people 'at risk'.  I wrote about this in July, but I've been playing some more with the ideas and wanted to follow up.

Statins are a class of drug that, in theory, work by lowering LDL (low-density lipoprotein) levels.  They do this by inhibiting HMG-CoA reductase, an enzyme that has a central role in the production of cholesterol in the liver.  LDL, the so-called 'bad' cholesterol, isn't actually just cholesterol, but has been linked to risk of heart disease because, as a lipoprotein, its job is to transport cholesterol to and from cells; it is bound to cholesterol.  What's measured when we have our blood drawn for a cholesterol test is the amount of cholesterol bound to LDL particles (LDL-C), as well as HDL-C, the 'good' cholesterol package, which carries cholesterol away from cells, leading to lower blood cholesterol levels.  Cholesterol makes plaque, and plaque lines and hardens arteries, which occludes them and leads to stroke and heart attack.  Lower the amount of LDL, and you lower the risk of arterial plaque deposits.

The connection between cholesterol and heart disease was first identified in the Framingham Study in the 1950's and 60's, and this led directly to the search for drugs to lower cholesterol.  Statins were developed in the 1970's and 80's, and after some fits and starts, began to be used in earnest in the late 1980's.  Statins work by inhibiting liver cells' synthesis of new cholesterol, that is, cholesterol that isn't taken in from the diet.

Akira Endo, one of the first scientists to look for cholesterol-lowering compounds, reviewed the history of statins in 2010.  He described the many studies of the effects of these drugs, saying "The results in all these studies have been consistent: treatment with statins lowers plasma LDL levels by 25–35% and reduces the frequency of heart attacks by 25–30%" (Akira Endo, Proc Japan Acad, Series B, 2010).

A systematic review of the literature on the effectiveness of statins was published by the Cochrane Collaboration in 2012.  The review reports, "Of 1000 people treated with a statin for five years, 18 would avoid a major CVD event which compares well with other treatments used for preventing cardiovascular disease."  This suggests, of course, that 982 people took statins with no benefit, and perhaps some risk, as statins are associated with muscle pain, slightly increased risk of type 2 diabetes, liver damage, neurological effects, digestive problems, rash and flushing, and other effects.  But more on this below.

So, who should take statins? 
Until 2013, the recommendation was that anyone with a modest risk, as assessed by the Framingham Risk Calculator (I've read that that means from 6.5% to 10% 10-year risk) would likely be prescribed statins.  The interesting thing, to me, about this risk calculator is that it's impossible to push the risk estimate past "greater than 30%", even at maximum allowable cholesterol, LDL, and systolic blood pressure, and being a smoker on blood pressure medication.  Which means that there's a lot that this calculator can't tell us about our risk of CVD, based on the best risk factors known.

Framingham Risk Calculator

In 2013, the American Heart Association/American College of Cardiology revised their criteria for statins.  Now, they are recommended for people who have had one CVD event, in order to prevent another; for people with primary elevations of LDL-C greater than 190 mg/dL; for people 40-75 years old who have diabetes and LDL-C between 70 and 189 mg/dL; and for people 40-75 years old with LDL-C between 70 and 189 mg/dL and an estimated 10-year cardiovascular disease risk of 7.5% or higher.

The first three criteria are straightforward.  If statins lower LDL, and lower LDL lowers risk of ASCVD (atherosclerotic cardiovascular disease), then taking them should be beneficial.  But then we're back to a risk calculator again to estimate 10-year risk.
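For readers curious what such a calculator does under the hood, here is a sketch of the general Cox-model shape these tools share: a linear combination of (mostly log-transformed) risk factors is compared with a population mean and fed through a baseline survival term.  The coefficients, baseline survival, and mean score below are made up for illustration only; the real pooled cohort equations publish separate, sex- and race-specific values.

```python
import math

# Made-up coefficients, for illustration only -- not the published ACC/AHA values.
COEFS = {
    "ln_age": 2.5,
    "ln_total_chol": 1.0,
    "ln_hdl": -0.8,
    "ln_sbp": 1.8,
    "smoker": 0.6,
    "diabetes": 0.6,
}
BASELINE_SURVIVAL = 0.95   # assumed 10-year event-free survival at the mean score
MEAN_SCORE = 20.0          # assumed population-mean linear score

def ten_year_risk(age, total_chol, hdl, sbp, smoker, diabetes):
    """Cox-style estimate: risk = 1 - S0 ** exp(individual score - mean score)."""
    score = (COEFS["ln_age"] * math.log(age)
             + COEFS["ln_total_chol"] * math.log(total_chol)
             + COEFS["ln_hdl"] * math.log(hdl)
             + COEFS["ln_sbp"] * math.log(sbp)
             + COEFS["smoker"] * smoker
             + COEFS["diabetes"] * diabetes)
    return 1.0 - BASELINE_SURVIVAL ** math.exp(score - MEAN_SCORE)

# A hypothetical non-smoking, non-diabetic 55-year-old (units: mg/dL, mm Hg).
print(ten_year_risk(age=55, total_chol=213, hdl=50, sbp=120, smoker=0, diabetes=0))
```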


ACC/AHA


The risk calculator has been revised.  Now included are ethnicity (well, White, African American, or other), diabetic status (yes/no), and estimated lifetime risk.  And now it's possible to push 10-year risk up past 70%, which I discovered by playing around with the calculator a bit.  Whether or not it's a more accurate predictor of a cardiovascular event is another question.

Here's the lowest risk I could come up with, 0.1% 10-year risk.  The recommendations offered are not to prescribe statins.

Lowest 10-year risk
Here's the highest risk I could force the calculator to estimate.  Ten-year risk for a female with these risk factors is higher than for a male, but lifetime risk is lower.  That seems strange, but ok, it must reflect association of risk factors including sex with disease at the population level.  


Compared with the Framingham calculator, risk estimation seems to be getting more precise.  Or at least bolder, with estimates up in the 70's.  But is the new calculator actually better at predicting risk than the old one?  A paper was recently published in JAMA addressing just this question ("Guideline-Based Statin Eligibility, Coronary Artery Calcification, and Cardiovascular Events," Pursnani et al.).  They identified 2435 people from the Framingham study who had never taken statins.  Their medical history allowed the authors to determine that, based on the old guidelines, 14% would have been 'statin eligible', compared with 39% based on the new 2013 guidelines.

Among those eligible by the old guidelines, 6.9% (24/348) developed CVD, compared with 2.4% (50/2087) among noneligible participants (HR, 3.1; 95% CI, 1.9-5.0; P < .001).  Under the new guidelines, among those eligible for statins, 6.3% (59/941) developed incident CVD, compared with only 1.0% (15/1494) among those not eligible (HR, 6.8; 95% CI, 3.8-11.9; P < .001).
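Those percentages can be checked directly from the quoted counts; the ratios computed at the end are only crude, unadjusted rate ratios, not the model-adjusted hazard ratios the paper reports.

```python
# (events, people) by statin eligibility under each guideline, as quoted above.
old_eligible, old_not = (24, 348), (50, 2087)
new_eligible, new_not = (59, 941), (15, 1494)

def rate(group):
    events, people = group
    return events / people

for label, grp in [("old, eligible", old_eligible), ("old, not eligible", old_not),
                   ("new, eligible", new_eligible), ("new, not eligible", new_not)]:
    print(label, round(100 * rate(grp), 1), "%")   # 6.9, 2.4, 6.3, 1.0

# Crude rate ratios (roughly 2.9 and 6.2); the reported HRs of 3.1 and 6.8 are adjusted.
print(rate(old_eligible) / rate(old_not), rate(new_eligible) / rate(new_not))
```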

So, put a whole lot more people on statins, and you prevent an additional very small number of CVD events; 1.0% vs 2.4%.  And, 93% of those ‘eligible’ for statins did not develop disease. Nor, of course, do statins prevent all disease.  Actually, if everyone in the population were covered, statins would be preventing as many events as they could possibly prevent, but in a small minority of the population.  That is, 90+% of people considered to be at 'high-risk' of disease don't go on to develop disease.  Is it worth the side effects and cost to put so many more people on statins to prevent the 1.4% more CVD that these new guidelines are preventing?  Well, heart disease is still the number one killer in rich countries, and 40+% of the population is currently taking statins, so a lot of people have decided that the benefits do outweigh the risks.

Another question, though, is more fundamental, and it concerns prediction.  The calculator seems to now be predicting risk with some confidence.  But, let's take a hypothetical person with a somewhat elevated risk.  Her cholesterol is higher than the person above who's at lowest risk, but that's due to her HDL.  Her systolic blood pressure is high at 180, which is apparently what bumps up her risk, but her 10-year risk is still not over 7.5% so the recommendation is not statins, but lifestyle and nutrition counseling.  (Though, the definition of 'heart-healthy diet' keeps changing, so what to counsel this person with low risk seems a bit problematic, but ok.)


Low enough risk that statins aren't advised.

Now here's the same hypothetical person, but she's now a smoker, on medication to lower her blood pressure (and her b.p. is still high) and she has diabetes.  Her 10-year risk of ASCVD jumps to 36.8%.  This makes sense, given what we know about risk factors, right?  The recommendation for her is high-intensity statins and lifestyle changes -- lose weight, do regular aerobic exercise, eat a heart-healthy diet, stop smoking (easy enough to say, so hard to do, which is another issue, of course, and the difficulty of changing all these behaviors is one reason that statins are so commonly prescribed).





But now I've lowered her total cholesterol by 70mg/dL, which is what statins ideally would do for her.  Even so, the American College of Cardiology/American Heart Association recommendation is for 'high-intensity statin therapy' and lifestyle counseling.  The calculator doesn't know this, but statins have already done everything they are likely to do for her.




So, let's add lifestyle changes.  But, even when she quits smoking, her 10-year risk is 20%.  So let's say we cure her diabetes -- even then, she's still at high enough risk (9%) that 'moderate to high-intensity statins' are recommended.  I'm confused.  I think even the calculator is confused.  It seems there's a fuzzy area where statins are being recommended when what's left to do is, say, lower blood pressure, which statins won't do.  This hypothetical woman probably needs to lower her weight to do that, and statins aren't going to help with that, either, but still they're recommended.  Indeed, one of the criticisms of this risk calculator when it was released in 2013 was that it overestimates risk.  Perhaps so, but it also seems to overestimate the benefit of statins.  


Further, it seems there are a lot of type 1 errors here.  That is, a lot of people are considered 'at-risk' who wouldn't actually develop cardiovascular disease.  Risk of 7.5% means 7.5 of 100 people with a given, equal set of risk factors are expected to develop disease.  That means that 92.5 would not.  And that means that we have a pretty rough understanding of heart disease risk.  The strongest risk factors we know -- smoking, high LDL-C, diabetes and hypertension -- can be expected to predict only a small fraction of events.

And that means that either something else is 'causing' cardiovascular disease in addition to these major known risk factors, or something is protecting people with these risk factors who don't go on to develop disease.  Family history is a good, or even the very best, single predictor (why isn't it taken into account in these calculators?), which suggests that genetic risk (or protection) may be involved, but genome-wide association studies haven't found genes with large effects.  Of course, family history is highly conflated with environmental factors, too, so we shouldn't simply assume we need to look for genes when family history indicates risk.  Anyway, it's unlikely that there are single genes responsible for ASCVD except in rare families, because that's the nature of complex diseases.  Instead, many genes would be involved, but again, as with most complex diseases, they would surely be interacting with environmental risk factors, and we don't yet understand how to identify or really characterize gene-by-environment interaction.

And then there's the truly wild card!  All of these risks are based on the combinations of past exposures to measured lifestyle factors, but the mix of those and the rise of other new lifestyle factors, or the demise of past ones, means that the most fundamental of all predictors can itself not be predicted, not even in principle!

So, statins are a very broad brush, and a lot more people are being painted with them than in fact need to be.  The problem is determining which people those are, but rather than zoom in with more precision, the updated calculator instead paints a whole lot more people with the brush.  This isn't the calculator's fault.  It's because understanding risk is difficult, ASCVD is a large and heterogeneous category, and prediction is very imprecise--even for many 'simple' Mendelian disorders.  If ASCVD were caused by a single gene, we'd say it had very low penetrance, and we'd want to understand the factors that affect its penetrance.  That's equivalent to where we are with cardiovascular disease.

I was interested to see that the 2013 ACC/AHA Guideline on the Assessment of Cardiovascular Risk says something that I have said so many times that I decided not to say it again in this post.  But, I'm happy to see it elsewhere now.  The guideline committee itself acknowledges the issue, so I'll let them explain the problem of assessing risk as their calculator does.
By its nature, such an approach requires a platform for reliable quantitative estimation of absolute risk based on data from representative population samples. It is important to note that risk estimation is based on group averages, which are then applied to individual patients in practice. This process is admittedly imperfect; no one has 10% or 20% of a heart attack during a 10-year period. Individuals with the same estimated risk will either have or not have the event of interest, and only those patients who are destined to have an event can have their event prevented by therapy.
It's the problem of using group data, which is all we've got, to make clinical decisions about individuals.  It's the meta-analysis problem -- meta-analyses compile data from many individual studies to produce a single result that certainly reflects all the studies, because they were all included in the statistics, but it doesn't represent any of them with precision.  Ultimately, it's the problem that these sorts of inferences must be based on statistical analysis of samples -- collections -- of individuals.  We do not have an easy way around this, including the N of 1 studies currently being proposed.

Finally, here's a meta-thought about all this.  Ken and I were in Finland this month co-teaching a course, Logical Reasoning in Human Genetics, with colleagues, including Joe Terwilliger.  Joe said multiple times, "We suck at finding candidate genes because we don't know anything about biology.  We're infants learning to crawl."  The same can be said about epidemiological risk factors for many complex diseases -- we suck at understanding the causes of these diseases, and thus we suck at prediction, because we don't really understand the biology.

Rare Disease Day and the promises of personalized medicine

Our daughter Ellen wrote the post that I republish below 3 years ago, and we've reposted it in commemoration of Rare Disease Day, Febru...