Thursday, July 7, 2016

Appendix 7.5 Modeling Evolution

It turns out there is a very good reason why it's difficult to include realistic selection in an equation for the probability of evolution over time. Selection causes the outcome of one random mutation event to affect that of the following event, breaking the independence between events. The general form would normally be a Poisson distribution, but that is only valid for independent events.

So let's keep things very simple and run the model for a more realistic gene pattern based on the A, T, C, G nucleotide bases of the DNA molecule. Same 10 genes, same 100% selection of the top scoring 5 genes, reproduced faithfully to maintain the population of 10. So if we start with one mutation per generation per gene, what happens..
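For concreteness, here is a minimal Python sketch of the kind of run described (the function name, the tie handling, and the detail that a 'mutation' may redraw the same base are my assumptions, not the actual program):

import random

BASES = "ATCG"

def evolve(length, pop_size=10, keep=5, seed=None):
    """Rough sketch: one random mutation per gene per generation,
    perfect selection of the top `keep` scorers, which are copied to
    restore the population. Returns the total mutation events used."""
    rng = random.Random(seed)
    target = ("ATCG" * length)[:length]            # ATCGATCG.. repeated
    pop = ["".join(rng.choice(BASES) for _ in range(length))
           for _ in range(pop_size)]
    mutations = 0
    while True:
        for i, gene in enumerate(pop):             # one mutation per gene
            pos = rng.randrange(length)
            pop[i] = gene[:pos] + rng.choice(BASES) + gene[pos + 1:]
            mutations += 1
        # fitness = number of bases matching the target at their position
        pop.sort(key=lambda g: sum(a == b for a, b in zip(g, target)),
                 reverse=True)
        if pop[0] == target:
            return mutations
        pop = pop[:keep] * (pop_size // keep)      # top 5 copied to refill 10

print(evolve(10, seed=1))    # total mutation events for a length-10 gene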

To give some statistical validity I averaged the total number of mutation events needed to achieve the pattern [ATCGATCG.. repeated] for various gene lengths, over 10 cycles each, and I got this..

Gene length (L)    Mutations (N)    log(N)
      10                   818      2.9129
      15                  4353      3.6388
      20                 22956      4.3609
      25                108139      5.0339
      30                918605      5.9631
      35               2453687      6.3898
      40              16080795      7.2063

Basically log(N) graphed against L gives a nice straight line, as expected, since the governing distribution would be exponential (Poisson). My computer would not solve beyond length 40 without taking too long (days) to complete 10 cycles of each..

A simple linear regression of that straight line, projected out to just 150 bases, gives a first approximation.. however the standard deviation is quite large..
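The projection itself is simple arithmetic. A short sketch using the table above (assuming, as stated below, one mutation per millisecond; the fit is ordinary least squares on log10(N)):

import numpy as np

# Data from the table above: gene length L vs. total mutation events N
L = np.array([10, 15, 20, 25, 30, 35, 40])
N = np.array([818, 4353, 22956, 108139, 918605, 2453687, 16080795])
slope, intercept = np.polyfit(L, np.log10(N), 1)

def years_to_evolve(length, ms_per_mutation=1.0):
    """Extrapolate the fitted line and convert the mutation count to
    years at one mutation per millisecond."""
    n_mut = 10 ** (intercept + slope * length)
    seconds = n_mut * ms_per_mutation / 1000.0
    return seconds / (3600 * 24 * 365.25)

print(round(slope, 3))                  # about 0.14 decades per extra base
print(f"{years_to_evolve(130):.1e}")    # a few billion years at 130 bases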

The answer, for a gene of just 130 DNA bases evolved using this model with perfect selection of every single mutation: if a mutation occurred every millisecond it would take 3.6 billion years on average to get there.. While this result is mathematically correct, it's not the most favorable model for a real evolutionary algorithm. What I have done here is pit a severe mutation rate against the very best selection rate.. and while the former won, it means I need a more realistic mutation rate versus a more realistic selection rate. Not simple, as Fred Hoyle's work shows..

I obviously need a more powerful computer and a bit more statistical work to tidy it up as a paper..

Wednesday, June 22, 2016

Appendix 7.4 Modelling Evolution

It seems that for the statistical simulation model the efficiency of selection is not completely set by one value of Sn (or, as in the stat model, the top % of the population selected). It is also affected by the scoring methodology used.. In the original Dr J model all substrings of '01' or '10' were counted, resulting in 2000 - 4000 generations (single mutation) to achieve the 100-long "010101.." pattern. I have settled on simply counting every correct digit (base), be it 0 or 1 or A or T or C or G etc., in its correct position. Running this model 50 times, the sample mean to 'evolve' the 100-long pattern was 443 generations, which with a 10 gene population = 4431 mutation events (min 2021, max 12870). Now that's with one mutation, on a simple 2-code-choice model, with perfect selection/copying of the top 50% after each single-mutation generation. A fair way from reality.
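To make the scoring difference concrete, here is a small sketch of the two fitness measures (the pair-counting version is only my reading of the original Dr J score):

def fitness_by_position(gene, target):
    """The score I settled on: count every digit/base that matches the
    target at its own position (works for 0/1 or A/T/C/G alike)."""
    return sum(a == b for a, b in zip(gene, target))

def fitness_by_pairs(gene):
    """My reading of the original score: count every '01' or '10'
    substring, regardless of alignment with the target."""
    return sum(gene[i:i + 2] in ("01", "10") for i in range(len(gene) - 1))

target = "01" * 50                              # the 100-long "0101.." pattern
print(fitness_by_position("10" * 50, target))   # 0  - nothing is in place
print(fitness_by_pairs("10" * 50))              # 99 - yet every pair counts

The example shows why the two methods select differently: a gene can score highly on pairs while having nothing in the right position.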

With just 2 mutations, however, these simple models reveal a real problem.. so far none have completed the pattern. As it turns out, a single mutation in a 2-code system has only a 25% chance of being detrimental, much as in Fred Hoyle's analysis of the 'naive' single beneficial mutation. With a second mutation, though, that all changes, as the second becomes predominantly detrimental.

So do two mutations acting on a 100-base genome (a 2% mutation rate) imply a massive overstatement of the rate of mutation? Let's pose the question.. how many DNA changes generally occur in concert before a selectable trait is produced? Putting it another way.. is every single DNA mutation normally selectable? Clearly not. Requiring just 2 mutations to act together before the selection criterion is applied is actually a huge concession to what occurs in the real world. Recall that I am only modelling the algorithm of evolution, and artificially increasing the rate of mutation just facilitates a quicker result, particularly since every beneficial change is selected.

Including realistic selection strengths in models and equations of evolution has proved very difficult. So to keep things simple I am going to assume 100% selection and reproduction of every beneficial change, which may mean allowing a more realistic number of mutations (ie 2) or a larger population size. The objective is a rigorous, simple, verifiable test of the evolutionary algorithm, to which end it is imperative that I give every possible concession to evolution theory.

So what's the result..

Sunday, April 24, 2016

Appendix 7.3 Modelling Evolution

The coin toss simulation is not a model of evolution but a model of the algorithm of evolution which allows exploration of the strength of natural selection to overcome the normal destructive effects of random mutation. Since natural selection does change the probability of a required outcome it would be nice to find an equation for that probability. In Appendix 5.1 I discovered a number which was effectively a boundary condition the 2nd Law imposes on the random assembly of any coded string from a finite alphabet. An equation for probability of evolving a given 'gene' after any number of generations including the effect of natural selection would allow me to relate the improbability of the state to the limit imposed by the Second Law.

I have an equation which, on preliminary testing, shows agreement with some statistical modelling, with Fred Hoyle's results [The Mathematics of Evolution], and with recently published papers like this one..

[http://dx.doi.org/10.1371/journal.pone.0000096].. noting..
"Although a great deal is known about the landscape structure near the fitness peaks of native proteins [5][7][9][15], little is known about structures near the bottom, which contain information regarding primordial protein evolution." and..
"Although it was shown to be possible for a single arbitrarily chosen polypeptide to evolve infectivity, the evolution stagnated after the 7th generation, which was probably due to the small mutant library size at each generation."

While such modelling does show some increase in fitness as complexity (sequence length) increases it effectively stagnates at some limiting value dependent upon the "mutant library size".. This is precisely what my Second Law boundary condition predicts..

The equation is proving difficult to verify, and I initially had problems with software handling very large/small numbers (now solved).. I started by defining selection success 'Sn' as the probability that positive mutations will, on average, succeed.. ie not die or get eaten before they can reproduce. Sn varies like this..

Sn = 0  (no selection)   to   Sn = 1  (100% selection)

Some program confirmation of stagnation occurred for all values of Sn at large enough gene lengths. However, the extreme sensitivity as Sn drops even minutely below 1 urges caution, so I'll hold that result until it is fully verified.
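For what it is worth, one way Sn could be wired into the selection step of the statistical model is sketched below. This is my own interpretation of the definition above, not the program actually used:

import random

def select_with_Sn(scored_pop, keep, Sn, rng=random):
    """scored_pop is a list of (gene, score) pairs. Each top-ranked gene
    survives selection with probability Sn; casualties are replaced by
    genes drawn from the rest of the population."""
    ranked = sorted(scored_pop, key=lambda pair: pair[1], reverse=True)
    survivors, rest = [], []
    for i, (gene, score) in enumerate(ranked):
        if i < keep and rng.random() < Sn:
            survivors.append(gene)
        else:
            rest.append(gene)
    while len(survivors) < keep:                # refill with unselected genes
        survivors.append(rng.choice(rest))
    return survivors

With Sn = 1 this reduces to the perfect selection used elsewhere in these appendices; values just below 1 are where the sensitivity mentioned above shows up.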

Wednesday, March 9, 2016

Appendix 7.2 Modeling Evolution

Well it turns out my 100 coin toss model exceeds all expectations.. Yes the one proposed in Appendix 5 and which Dr J programmed (see App 6).

The program demonstrates that with a single mutation applied at random to each of the 10 genes (each 100 long), and with 50% selection of the best (fittest = largest count of the 0101.. sequence), the model converges on the target 100-long 0101.. pattern in 2000 to 4000 generations. The advantage of this pattern is that it has an equal number of heads and tails, which makes the improbability a function only of the order, not of the number of heads or tails, which over any large random sample will be approximately equal. In absolute entropy terms it belongs to the macrostate with the largest number of microstates.
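Here is a minimal Python sketch along the lines of that program (not Dr J's actual code; I use the simpler digits-in-place score from Appendix 7.4, so it converges faster than the 2000 to 4000 generations quoted above for the pair-counting score, and the parameter names are mine):

import random

def coin_toss_model(length=100, pop_size=10, keep=5,
                    mutations_per_gen=1, max_gens=100_000, seed=None):
    """Re-toss the coin at `mutations_per_gen` random positions in every
    gene each generation, then copy the best half to refill the
    population. Returns the generation of convergence, or None."""
    rng = random.Random(seed)
    target = ("01" * length)[:length]
    score = lambda g: sum(a == b for a, b in zip(g, target))
    pop = [[rng.choice("01") for _ in range(length)] for _ in range(pop_size)]
    for gen in range(1, max_gens + 1):
        for gene in pop:
            for _ in range(mutations_per_gen):
                gene[rng.randrange(length)] = rng.choice("01")
        pop.sort(key=score, reverse=True)
        if "".join(pop[0]) == target:
            return gen
        pop = [list(g) for g in pop[:keep] for _ in range(pop_size // keep)]
    return None

print(coin_toss_model(seed=1))    # single mutation: converges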

If we now note that a single mutation (toss of a coin) at a random position in the 100-long gene has a 50% chance of being the correct value and a 50% chance of landing at a position that needs it, that gives a 25% chance of being favorable. It also has a 2 x 25% chance of being neutral, leaving only a 25% chance of being unfavorable. This approximates Hoyle's description of what he calls the naively simplistic model widely accepted by evolutionary biologists and their followers.. it's the single favorable mutation model.. and yes, it behaves exactly as predicted. But now, following Hoyle, what happens if we introduce a second mutation.. ie make 2 random mutations at random positions at each new generation, while still selecting the 50% = 5 best genes..
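Before running the two-mutation case, here is a quick Monte Carlo check of that 25/50/25 single-mutation split (my own sketch, treating the mutation as a fresh coin toss at a random position of a random gene):

import random

def single_toss_split(length=100, trials=200_000, seed=0):
    """Estimate the favorable / neutral / unfavorable fractions for one
    re-toss at a random position, scored against the 0101.. target."""
    rng = random.Random(seed)
    target = ("01" * length)[:length]
    up = down = 0
    for _ in range(trials):
        pos = rng.randrange(length)
        before = rng.choice("01") == target[pos]   # digit correct before?
        after = rng.choice("01") == target[pos]    # correct after the toss?
        up += (after and not before)
        down += (before and not after)
    return up / trials, 1 - (up + down) / trials, down / trials

print(single_toss_split())    # roughly (0.25, 0.50, 0.25)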

Go ahead.. run it..
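With the sketch above, the two-mutation run is just a parameter change (the parameter name is mine, and the comments state the expected outcome rather than a guaranteed one):

# Two random mutations per gene per generation, same 50% selection:
print(coin_toss_model(length=50, mutations_per_gen=2, seed=1))    # expected to complete
print(coin_toss_model(length=100, mutations_per_gen=2, seed=1))   # expected: None

The 50-long case is expected to complete, consistent with the note further down about shorter strings; for the full 100-long target the run should exhaust max_gens and return None.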

I let it run for two days and over 40 million generations and NO, it does not converge, just as Fred Hoyle's mathematical analysis predicted.. Beyond a certain point the second mutation becomes overwhelmingly unfavorable to the completion of the whole series.

You may think that a rather strange result.. adding just one more mutation completely annuls the power of 50% perfect selection..! So the real question it raises is: what exactly is the power of selection to 'create'? Consistent with standard probability rules, the length of the genome has a big effect on the improbability of the final outcome, since convergence can easily be obtained for, say, a 50-long string even with 2 mutations. Although it now becomes a statistical analysis problem.. it starts to look very much like it supports my original falsification in Ch 9, on the basis that the selection 'bias' may be so small as to be negligible for large genomes with large numbers of unfavorable mutations.. ie Fred Hoyle's conclusion.

Sunday, February 21, 2016

Appendix 7.1 Modeling Evolution

Fred Hoyle was a brilliant mathematician.. a lecturer in mathematics and later Plumian Professor of Astronomy at Cambridge.. he worked out the nucleosynthesis of the heavy elements in the interiors of stars.. He was an anti-creationist and made a genuine attempt to model the evolutionary algorithm mathematically and analytically.. He differs from what he calls the "new believers" (in evolution by natural selection) in one characteristic.. he told the truth in "The Mathematics of Evolution".. You need to hear what he had to say..

"Let us start naively with the feedback equation..
dx/dt = s.x    (t = time)  (1.1)

in which x is considered to be the fraction of some large population that possesses a particular property, 'A' say, the remaining fraction (1 - x) possessing a different property 'a', all the other individuals being otherwise similar to each other."

After integration to find x and some elaboration on the reproductive outcomes of this model for A being advantageous (s > 0) he gets..
x = xo exp(st)

"So it is agreed for s > 0, with A then a favorable property, that x rises to unity with all members of the population coming to possess it in a time span of ln xo/s generations..  ...  And if s < 0 the solution dies away in a time span of the order 1/s generations, thereby implying that if A is unfavorable it will be quickly rejected.

I am convinced it is this almost trivial simplicity that explains why the Darwinian theory of evolution is so widely accepted, why it has penetrated through the educational system so completely. As one student text puts it.. 'The theory is a two step process. First variation must exist in a population. Second the fittest members of the population have a selective advantage and are more likely to transmit their genes to the next generation.'

But what if individuals with a good gene A carry a bad gene B having the larger value of |s|. Does the bad gene not carry the good one down to disaster? What of the situation that bad mutations must enormously exceed good ones in number?"  (A fact acknowledged by all the research).. and so after some work he gets..
x ~= xo/(xo + exp(-st))      (1.6)

"Unlike the solution to (1.1) for s > 0, x does not increase to unity ... but only to 1/2... Property A does not "fix" itself in the species in any finite number of generations. A residuum of individuals remain with the disadvantageous property 'a'."

My verification of this next..

Tuesday, February 16, 2016

Appendix 7 Modelling Evolution

First a note about probability and entropy..

The probability of any 'event' is determined by the 'prior' expectation of it (by a given process). (ie if coins are intelligently placed in order (HTHT.. etc.) then the probability of that arrangement would appear to be 1, because the outcome is certain). So by this it would appear the probability of achieving a state of matter depends upon how it is produced (the process). But the absolute entropy of any state of matter is independent of the process that produced it! So it would appear the entropy cost of producing an end state is independent of the absolute entropy of the state itself! Yet we know they must be linked, because any low entropy state must be paid for by an increase in entropy (in the surroundings), and the only available place to get that increase is the process that produced the state..?

How do we reconcile this apparent contradiction..

If the improbability of a state is reduced by a natural bias (like selection) or even by direct intelligence, the entropy of that state is not changed, and the cost of that state must still be accounted for.. The answer must lie in the fact that I have only been considering the final steps (the placing or tossing of the coins) of a process which is in fact the result of a much larger 'system' that makes the outcome possible..

Consider where the coins (or dice) came from.. what about the table, the room, and even the person doing the placing? So is it valid to calculate the improbability simply by looking at the end result? The answer is in the term conditional probability. The probability of A given B is written Pr(A|B) (noting that improbability is just 1/probability). So when I am calculating the probability of a sequence of 100 coins (leave the DNA for now).. it is actually..

Pr(100 HT pattern | coins, table, room, person, land, earth etc etc.) with all those 'givens'..
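For the final step alone the arithmetic is straightforward (fair, independent tosses assumed, with everything else held as 'givens'):

from fractions import Fraction

# Probability of one specific 100-long H/T pattern given fair,
# independent tosses:
p = Fraction(1, 2) ** 100
print(float(p))        # probability   ~ 7.9e-31
print(float(1 / p))    # improbability ~ 1.3e+30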

It is therefore possible to simply state that the absolute entropy of the state is not changed by the process; it is just that I am calculating the conditional probability (or improbability) of the last part of the process (the system) that produced it. Most important here is the fact that each conditional part of the process must result in an entropy increase which exceeds the drop in entropy that it creates. So the drop in entropy resulting from a person placing the coins in order is different from the drop in entropy resulting from tossing the coins randomly to get the same order. But the absolute entropy (~probability) of the final state is the same in both cases. They simply have a different set of 'givens'.

I hope that is clear enough..