Gene fossilization and cavefish

When I started teaching biology, a long time ago, I didn’t know most of what I know today about the theoretical framework of evolutionary biology, and I was not properly prepared for many of my students’ questions. Mea culpa. That unpreparedness led me to take certain paths in the classroom that I certainly wouldn’t take today. This short post is about one such example.

Back then, as today, virtually all textbooks provided almost the same examples and analogies in their chapters on evolution: the giraffe’s neck when talking about selection, the wings of bats and birds when talking about homologies, the vermiform appendix and the absence of eyes in cavefish when talking about vestigial organs, etc. This is not only dull but, on occasion, incorrect: the selective advantage of the giraffe’s long neck is not the ability to reach higher leaves, contrary to what almost all books show. I remember teaching a high school class about the old concept of inheritance of acquired characters and the history of evolutionary biology in the nineteenth century, when a student asked me something like this: “If cavefish descend from ancestors with eyes, and if their eyes are simply no longer used, we should still find animals with eyes in that fish population, because there is no selective pressure for the loss of eyes”. The student didn’t express himself in exactly these words, of course, but that was the central idea of his question: why do cavefish have no eyes, if there is no inheritance of acquired characters and if fish with or without eyes are equally blind in the dark?

At that time, with my limited knowledge (I was a beginner, it was my first year teaching), I tried to explain that the lack of eyes would be selectively advantageous. I explained that the formation of an eye is a complex and energetically expensive developmental process; thus, variants that simply don’t develop eyes would have a selective advantage over variants that do, the former becoming more common and the latter disappearing. This explanation sounds plausible, and it is not absurd from the point of view of evolutionary biology, except for one fact: it is incorrect and unnecessary.

The explanation that I’d like to present here is much simpler, although it requires a more solid mathematical basis than most biology teachers (nostra culpa) have.

Let’s begin. Given the informational complexity of genes, which in some cases reaches millions of base pairs, the correct sequence (in other words, the functional allele) is the one that must be continuously maintained by selection, not the other way around. Hence, when a protein is no longer required, the gene that encodes it quickly becomes eroded, going so far as to disappear completely. This is called gene fossilization. “Fossil” comes from the Latin verb fodere, which means to dig; so fossilis means obtained by digging. I could have used another term, like “gene degeneration” or “gene erosion”. But I like “gene fossilization”, a term I became familiar with by reading Sean Carroll’s “The Making of the Fittest”. Furthermore, “fossil” gives us the idea of something that was functional in the past, but of which only fragments remain.


An excellent introduction to gene fossilization: “The Making of the Fittest”, by Sean Carroll.

To properly understand the concept of gene fossilization, it’s worth keeping in mind the concept of entropy. Consider a gene sequence as information, i.e., a set of data that makes sense. Mathematically, if S is the number of distinct elements in the sequence (which, in the case of DNA, is four: adenine, guanine, cytosine and thymine) and n is the length of the sequence, the number of possible combinations is S^n (S to the power of n). For example, for a given gene containing 5,000 base pairs (and this is not a lot), the number of possible combinations is 4^5000 (4 to the power of 5,000), a number with more than 3,000 digits, which is simply astonishing. And what does the concept of entropy have to do with it? Imagine, for the sake of simplicity, that only one gene sequence is the correct one, that is, only one of trillions and trillions of possible sequences is capable of properly producing the protein we want. Of course, there are many changes in the gene sequence (mutations) that either do not change the amino acid in the protein or, even if the amino acid is changed, do not change the function of the protein; but let’s just say here, again for the sake of simplicity, that only one sequence is correct and all the countless others are wrong: they are all incorrect sequences, “recessive” alleles, that is, unable to produce that given protein. So, if we have only a single correct and highly organized state from a physical point of view, and an astronomically large set of other states, all incorrect, disorganized and non-functional, it’s very easy to see that chance alone favours the emergence of incorrect variants of the gene (the non-functional or recessive allele).
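To get a feel for these numbers, here is a minimal Python sketch of the counting argument. The function names are mine, chosen only for illustration; the calculation is simply S to the power of n, plus a logarithm to count how many decimal digits the result has.

```python
import math


def count_sequences(alphabet_size, length):
    """Number of distinct sequences of `length` symbols: alphabet_size ** length."""
    return alphabet_size ** length


def decimal_digits(alphabet_size, length):
    """Decimal digits in alphabet_size ** length, computed with a logarithm."""
    return math.floor(length * math.log10(alphabet_size)) + 1


print(count_sequences(4, 3))    # 64 possible three-base sequences
print(decimal_digits(4, 5000))  # 3011 digits for a 5,000-base gene
```

Under the simplifying assumption that only one of these sequences works, a random change to a working gene has essentially nowhere to go but towards a non-working one.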

The direct biological consequence of this is that any given functional protein or, more precisely, any given gene sequence that correctly encodes a functional protein, must be constantly monitored by selection. Incorrect gene sequences appear all the time, created by mutations of all types and causes. If organisms carrying these incorrect gene sequences do not produce the correct protein, their fitness will be lower and, because of selection, these incorrect gene sequences will have their frequency reduced in the population. That is, the high frequency of correct gene sequences is actually due to a relentless selection process that constantly eliminates the incorrect variants.

Well, and what happens when a protein is no longer needed? There is simply no more selection (having or not having the protein makes no difference to the organism’s fitness), and mutant sequences begin to proliferate. What is more interesting is that, given the complexity of the genetic material and the cumulative effect of generations, this occurs in much less time than we usually presume. Thus, if there is no selective pressure to maintain a certain gene sequence, it quickly degenerates and, if we add genetic drift to this scenario, the functional allele will eventually disappear completely (frequency p = 0).
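The contrast between a gene that selection is still watching and one it has abandoned can be sketched with a toy simulation. What follows is only an illustration under assumptions of my own, not a model of any real cavefish population: a haploid Wright-Fisher population, one-way mutation from the functional allele to broken ones (no back mutation), and a mutation rate of 0.01 per generation that is deliberately far higher than realistic per-gene rates so that the run finishes quickly. The function name and all parameter values are hypothetical.

```python
import random


def simulate_functional_allele(pop_size, mutation_rate, selection_coeff,
                               generations, seed=0):
    """Haploid Wright-Fisher sketch.

    Returns the frequency p of the functional allele after `generations`.
    """
    rng = random.Random(seed)
    p = 1.0  # everyone starts with the functional allele
    for _ in range(generations):
        if p == 0.0:
            break  # lost for good: this sketch has no back mutation
        # Selection: carriers of a broken allele have relative fitness 1 - s.
        mean_fitness = p + (1.0 - p) * (1.0 - selection_coeff)
        p_after_selection = p / mean_fitness
        # Mutation: functional copies break at rate u per generation.
        p_after_mutation = p_after_selection * (1.0 - mutation_rate)
        # Drift: the next generation is a random sample of pop_size copies.
        functional = sum(rng.random() < p_after_mutation
                         for _ in range(pop_size))
        p = functional / pop_size
    return p


# Illustrative parameters only (assumptions, not measured values).
maintained = simulate_functional_allele(500, 0.01, 0.2, 2000)
relaxed = simulate_functional_allele(500, 0.01, 0.0, 2000)
print(f"with selection:    p = {maintained:.3f}")
print(f"without selection: p = {relaxed:.3f}")
```

With these made-up parameters, the functional allele should stay common as long as broken copies pay a fitness cost, and should be lost once the cost is set to zero. Note that nothing in the second run favours the broken alleles; removing the protection is enough.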

Therefore, in the case of cavefish, this would be a much more adequate explanation for the loss of eyes: when the original populations stopped using their eyes, the genes for the proteins related to the formation or function of eyes began to deteriorate quickly. Let’s make things clear here: mutations occur in all regions of the genetic material, with no distinction or preference, both in highly important sequences and in useless ones. However, in the latter, selection no longer screens out the changes. If a gene is of no use, recessive non-functional alleles start to accumulate and the functional allele disappears. In a surprisingly short time, the genes related to eye formation fossilize.

Cavefish are eyeless not (only) because the lack of eyes is a selective advantage, but mainly because the presence of functional eyes is a very organized and complex state, which must be continuously maintained and protected by selection.

